Statistical methodology relevant to the overall drug development program
Posted on: Tuesday, 9 September 2003, 06:00 CDT
The current role of statistics in the drug development program is dominated by the "planning and analyzing a single trial" paradigm. On the one hand, there is some need to structure such a program. On the other hand, a great deal of flexibility is required to allow researchers to incorporate lessons learned from experience into the inferential procedures. Some statistical methods for planning and analyzing more than one trial are sketched. The limitations of frequentist type inference in covering the decisions that follow a complex development process are addressed. For integrative statistical coverage of the decision procedures in a drug development program, extensive modelling based upon various assumptions will be required. This will lead to the main challenge, to make it real without losing credibility, transparency, and persuasiveness.
Key Words
Overall drug development program; Combining phases; Sequential meta analysis; Adaptive designs; Bayesian inference
INTRODUCTION
Statistical reasoning and methods play an important role in medical research. In the area of clinical trials where so many different interests collide this has led to a variety of national and international regulations and guidelines. A prominent example, the International Conference for Harmonisation's Note for Guidance on Statistical Principles for Clinical Trials, ICH9 (1), which was effective as of September 1998, is an attempt to harmonize the principles of statistical methodology applied to clinical trials for marketing applications submitted in Europe, Japan, and the United States.
When looking at the role of statistics in overall drug development programs, one can hardly find contributions in the scientific literature about this general issue. Section 2.1.1 of the ICH 9 document refers to a development plan that:
... usually requires an ordered programme of clinical trials, each with its own specific objectives. This should be specified in a clinical plan, or series of plans, with appropriate decision points and flexibility to allow modifications as knowledge accumulates. . . . A statistical summary, overview or meta-analysis may be informative when medical questions are addressed in more than one trial. Where possible this should be envisaged in the plan so that the relevant trials are clearly identified and any necessary common features of their designs are specified in advance. Other major statistical issues (if any) that are expected to affect a number of trials in a common plan should be addressed in that plan.
Section 7.2, "Summarizing the Clinical Database," starts with: "An overall summary and synthesis of the evidence on safety and efficacy from all reported clinical trials is required for a marketing application . . . This may be accompanied, when appropriate, by a statistical combination of results."
Subsection 7.2.1, "Efficacy Data," states that: "The use of meta-analytic techniques to combine these estimates is often a useful addition, because it allows a more precise estimate of the size of the treatment effects to be generated, and provides a complete and concise summary of the results of the trials." These statements imply, on the one hand, that there is a need for some preplanning of the whole program. On the other hand, they also concede that complete planning is not realistic because of the urgent need to learn from experience. This explains why there is no explicit demand for integrative statistical modeling of the whole decision process.
Beyond these very sparse notes on the overall development program, the guideline is fully dedicated to the single trial situation. This simply mirrors the common practice in which statisticians are involved in the development process in general, with drug development considered an important paradigm. When leaving the context of a single clinical trial the challenge of modeling the complex decision process arises from different sources: dimensionality (incorporating issues of efficacy, safety, burden to the patient, and costs), diversity (dealing with changing models, eg, when switching phases or when switching from surrogate to clinical endpoints), and unpredictability (dealing with the unexpected, eg, when a particular safety issue arises in a new compound).
In the following we will not consider important fields such as modeling and simulation of biological processes (2), simulation of clinical trials (3,4), or forecasting the outcome of rival development programs (5). Also, the techniques of (post hoc) meta-analyses will not be addressed even if accounting for the cumulative nature of the development process (6,7). We focus on a few inferential aspects that try to overcome the "planning and analyzing a single trial" philosophy:
1. The overall planning and analysis of a "confirmative" part of the process, for example, comprising the clinical trials in phase 3 of the development process,
2. Combining phases, for example, by performing multistage designs with dose selection in adaptive interim analyses, and
3. Bayesian methods to cover the decision process and their problems in applications.
We will end with a rather pragmatic view on statistical methods in this area. Statistical results, combined with sufficient transparency to the underlying principles and methods, are an issue of quality research, just as sites collecting data adhere to standard operating procedures and quality assurance handbooks.
OVERALL PLANNING AND ANALYSIS OF THE CONFIRMATIVE PART
THE TWO PIVOTAL STUDIES APPROACH
One of the decision strategies the United States Food and Drug Administration (FDA) asks for is "replicating the result in a second study, to constitute an adequate demonstration of effectiveness for a new product" (8). Assume that only two such pivotal trials are conducted, for example, in the form of treatment-control comparisons that are not necessarily identical in design. Then under the overall null hypothesis (no treatment-control difference exists in any of the trials) the probability of getting two false-positive results in favor of the treatment, each at the two-sided level $\alpha = 0.05$, is bounded by $0.025 \cdot 0.025 = 0.000625$ (assuming stochastic independence). Hence, there is a very small chance of getting an erroneous approval with such a strategy. Also, the interpretation is simple in this case: because both planned pivotal trials must reject at the level 0.05, the individual test decisions are controlled at the family-wise error rate 0.05 in the strong sense. The overall false-positive rate increases to 0.00184 if at least 2 significant out of 3 planned trials are considered confirmative and to 0.00363 if at least 2 significant out of 4 planned trials are considered sufficient.
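The false-positive rates quoted above are binomial tail probabilities. The following is a minimal sketch, assuming independent trials that each produce a false-positive result in favor of the treatment with probability 0.025 under the overall null hypothesis; the function name is illustrative and does not come from the cited literature.

```python
from math import comb

def overall_false_positive_rate(n_trials, n_required, level=0.025):
    """P(at least n_required of n_trials independent trials are falsely
    significant), each with one-sided false-positive probability `level`."""
    return sum(
        comb(n_trials, j) * level**j * (1 - level) ** (n_trials - j)
        for j in range(n_required, n_trials + 1)
    )

# Two significant results required out of 2, 3, or 4 planned trials.
for n in (2, 3, 4):
    print(n, overall_false_positive_rate(n, 2))
```

Running the loop shows how quickly the protection level erodes when the number of planned trials grows while the number of required successes stays at two.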
Lu & Huque (9) considered the case when a test in the sample pooled over the two planned pivotal trials at the one-sided level $\alpha^*$ is added as formal evidence whenever the goal of two significant test results at the two-sided level $\alpha = 0.05$ has not been achieved. They used the simple model of identically designed trials for the test of the normal mean with known variance. Clearly, by taking a chance of getting two positive results or a positive pooled follow-up test the overall false-positive error rate will exceed 0.000625 even if the pooled test is performed at a one-sided level as small as $\alpha^* = 0.000625$. The actual overall false-positive rate (under the overall no effect hypothesis) of such an "extended" procedure will increase by 48% to 0.000926. For $\alpha^* = 0.025$, the overall false-positive rate is as large as 0.025 because rejection of both individual tests at the one-sided level 0.025 never occurs without rejection of the pooled test.
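The size of the "extended" procedure can be checked numerically. The sketch below is not the derivation of Lu & Huque; it assumes two equally sized, independent trials with standard normal test statistics under the null hypothesis, so that the pooled statistic is the sum of the two z-scores divided by the square root of 2. The function name and the midpoint-rule integration are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def extended_rule_level(alpha_single=0.025, alpha_pooled=0.000625, steps=20000):
    """Overall false-positive rate of 'both trials significant' OR
    'pooled test significant' under the overall null hypothesis."""
    c = STD_NORMAL.inv_cdf(1 - alpha_single)             # per-trial critical value
    s = sqrt(2) * STD_NORMAL.inv_cdf(1 - alpha_pooled)   # pooled test rejects if z1 + z2 > s
    if s <= 2 * c:
        # Two individual rejections always imply a pooled rejection.
        return alpha_pooled
    # Add P(z1 > c, z2 > c, z1 + z2 <= s), integrating over z1 in (c, s - c).
    h = (s - 2 * c) / steps
    extra = 0.0
    for i in range(steps):
        z1 = c + (i + 0.5) * h
        extra += STD_NORMAL.pdf(z1) * (STD_NORMAL.cdf(s - z1) - STD_NORMAL.cdf(c))
    return alpha_pooled + extra * h

print(extended_rule_level())                    # pooled test at level 0.000625
print(extended_rule_level(alpha_pooled=0.025))  # pooled test at level 0.025
```

Under these assumptions the first call should come out close to the 0.000926 reported above, and the second call returns 0.025 because the pooled rejection region then contains the region where both individual tests reject.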
Protection levels for overall false-positive decisions will become disguised if unforeseen trials are added or the two pivotal studies rule is modified. The arguments above indicate that this may lead to considerable divergence in overall protection levels in frequentist terms.
SERIES OF MORE THAN TWO TRIALS, SEQUENTIAL META-ANALYSIS
Some proposals have been made to impose a certain structure on the decision procedure in the confirmative part of the process. One obvious constraint would be to fix a maximum number m of clinical trials to be performed in this phase. Some simple decision rules could be laid down. One of the most important features of such decision rules is flexibility for planning the remaining trials, that is, a requirement that accumulating information from the previous trials or from other sources be incorporated into the selection of the remaining trial designs and the statistical models applied. Therefore, relying on p-values as the test statistics summarizing the evidence from the separate trials has been proposed. Using independent samples in the different trials under the overall no effect hypothesis, the distributions of the p-values in general have some simple properties, for example, for continuous test statistics they quite generally are independent and uniform on [0,1].
The $(m, k, \alpha_*)$-Rule. Stop the process with the claim of efficacy if in k trials the null hypothesis of no treatment effect has been rejected at the modified individual level $\alpha_*$; give up if there are already (m - k + 1) trials where you have not been successful at this level (nonstochastic curtailment) (10). Basically, this rule asks for a prefixed number of k successful trials, and allows one to lay down an overall false-positive rate of $\alpha$ for the overall decision.
As an example, let us apply the same overall level $\alpha = 0.000625$ as applicable for the two pivotal studies procedure. For three planned trials we can stop if the first two trials show one-sided p-values below $\alpha_* = 0.0145$. If both trials failed at this level we cannot achieve the goal of two positive studies, so the third trial no longer needs to be conducted. The procedure could be modified in various ways (eg, choosing unequal $\alpha_{i*}$ for the trials); however, to my knowledge nobody has ever used such a procedure. The major objections may have been against the intention to organize (and formalize) such a complicated process and to aim at an overall false-positive error rate.
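The individual level in this example can be recovered numerically. The sketch below assumes independent trials and uses the fact that nonstochastic curtailment does not change the final decision, so the overall level equals the probability of at least k successes in m trials. The bisection solver and the function names are illustrative.

```python
from math import comb

def rule_level(m, k, alpha_star):
    """Overall false-positive rate of the (m, k, alpha_star) rule: at least
    k of m independent trials significant at the individual level alpha_star."""
    return sum(
        comb(m, j) * alpha_star**j * (1 - alpha_star) ** (m - j)
        for j in range(k, m + 1)
    )

def solve_alpha_star(m, k, alpha, tol=1e-12):
    """Bisection for the individual level that yields the overall level alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rule_level(m, k, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_alpha_star(3, 2, 0.000625))  # close to the 0.0145 quoted above
```

For m = k = 2 the same solver returns 0.025, which reproduces the two pivotal studies rule.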
Another critical point is the interpretation. Let $H_{01}, H_{02}, \ldots, H_{0m}$ be m individual null hypotheses tested in m trials. Then it is the global null hypothesis $H_0 = H_{01} \cap H_{02} \cap \ldots \cap H_{0m}$ ("There is no difference in all the trials") that is tested by such a procedure. Arguing strictly, the alternative is, "There is a difference in at least one of the trials." Considering the controversy about multiple inference in a single trial, however, it may not be reasonable here to push the concept of family-wise (multiple) error control forward to the m individual study hypotheses.
Repeated Significance Test for Fisher's Product Criterion for P-Values From Consecutive Trials. Another idea was based upon applying the principle of the repeated significance test to the products of all previous p-values, or equally to the cumulative sums of the logarithms of these p-values (10). Certain types of stopping boundaries can easily be constructed for independently and uniformly distributed p-values.
Inference in the above type of procedures relies upon disjoint test statistics for the consecutive trials which, under the global no treatment effect hypothesis, preserve their properties even in cases of data-dependent planning of future trials. This basic concept has been carried over to the concept of adaptive multistage designs; see "Combining Phases."
Sequential Meta-Analysis of Concurrent Trials. This situation refers to a series of k preplanned studies with "broadly" similar protocols where r interim analyses are planned. Again, it is an attempt to achieve an overall test decision at a protected level. At the time of the jth interim analysis, the increments of the efficient score Z and Fisher's information V (eg, 11) accrued since the (j-1)th interim analysis are cumulated per study (running in that interval) and suitably pooled over these studies. For the purpose of pooling, random and fixed models for the study effects are proposed (12). The magnitude of the overall level considered indicates that this sort of sequential analysis is intended to be applied in a series of rather closely related parallel trials which, for any reason, could not be integrated successfully into a single multicenter trial with a sequential design. This is affirmed by the lack of learning from experience, which would allow one to adapt later trials to needs arising from previous results.
COMBINING PHASES
In nearly all phases of the drug development process, the advantage of sequential (multi-stage) designs has been considered for ethical and economical reasons (eg, 11,13,14). However, few of the designs try to combine different steps of the process, for example, aiming at the combination of experiments from phases 2 and 3 (15,16).
A rough characterization of the tasks in phase 2 of a drug development program is the choice of the dose(s) to be carried over to the large phase 3 trials in the "confirmatory" part of the program. How can the information from the dose-finding phase also be used in a later "confirmative" analysis? Researchers may feel that when creating a development program with extensive experimentation for the selection of the dose(s) to be investigated later they will be punished: the information from early treatment comparisons may seem to be wasted.
The method described in the following shows how information from earlier stages can be combined with that of later stages although treatments (doses) are selected in adaptive interim analyses based upon all previous information from inside or outside the experiment. The method allows for many midtrial design modifications (which need not necessarily be prespecified a priori) without compromising the type I error rate.
MULTISTAGE DESIGNS WITH ADAPTIVE INTERIM ANALYSES
The basic idea of multistage designs with adaptive interim analyses can be derived by borrowing a few arguments from the concept of sequential meta-analysis. Here the m stages play the role of the m trials in the section on "Series of More than Two Trials, Sequential Meta-Analysis." For simplicity, we restrict ourselves to two-stage designs (m = 2).
Assume that we plan a two-stage trial design in a conventional way by prefixing the primary outcome measure(s), the sample sizes at the two stages, the randomization procedure, the test statistics, and so on. We conduct the first stage and get our first stage p-value, $p_1$. We look at all the data collected thus far in the trial or available from sources outside of the trial. Now we have two options:
1. If we are satisfied with our original planning we proceed as scheduled and take a sample for the second stage. After the second stage we calculate the second stage p-value, $p_2$, based upon the disjoint sample of the second stage only. The final analysis is conducted by combining the two p-values into a single test statistic by a predefined combination function $C(p_1, p_2)$. Note that this combination rule cannot be chosen in a data-dependent way; it must be laid down a priori in the planning phase; or
2. If we are not satisfied with the assumptions from the planning phase we may redesign the second stage by using all of the updated information. In a multiarm trial to establish a dose response relationship this may lead to dropping doses because of lack of efficacy or safety problems, or adding doses because they seem to be sufficiently safe and more effective. The sample size allocation may be changed, assigning a greater sample size to a particular treatment arm to get more information on safety. The total sample sizes may be modified based upon observed nuisance parameters such as the variability of the outcome variable. Even such essential features as the weights of how individual endpoints are aggregated into a single compound multiple endpoint criterion may be modified.
Before running the second stage we ought to fix the adapted protocol. After the second stage, we calculate the second stage p-value, $p_2$, from the disjoint second stage sample. The overall test in the final analysis is again performed by combining the two p-values, $p_1$ and $p_2$, by the function $C(p_1, p_2)$, which has been laid down in the planning phase of the trial.
The "Adaptive Combination Test" controls the overall $\alpha$ under all types of adaptations that preserve the simple properties of the distribution of the stage-wise p-values under the overall no effect hypothesis (17-21). It is easy to introduce a sequential test by incorporating stopping rules into the interim analysis, for example, $p_1 \le \alpha_1$ for early rejection and $p_1 > \alpha_0$ for early acceptance (stopping for futility), where $\alpha_1 < \alpha < \alpha_0$ (18). Note that this formulation in terms of combination of p-values provides an enormous generality with regard to hypotheses and statistical models. This may be useful when changes of hypotheses and the statistical model in a drug development program are indicated by the cumulating information. It is highly questionable whether the usual practice of dealing with such changes only formally by adding amendments to the study protocol is always appropriate.
Several combination functions have been discussed, for example, Fisher's product criterion $C(p_1, p_2) = p_1 p_2$ (17,18) because of its simplicity, or the inverse normal combination function $C(p_1, p_2) = w_1 \Phi^{-1}(1 - p_1) + w_2 \Phi^{-1}(1 - p_2)$, with $w_1, w_2 > 0$ and $w_1^2 + w_2^2 = 1$ (18,21). Both functions (among others) have been proposed for meta-analysis (22). The "inverse normal" has a simple appealing interpretation. Since $\Phi^{-1}$ is the inverse of the standard normal distribution function, the outcome of each stage is transformed into a standard normal z-score. These z-scores are used in frequentist inference to measure the "distance" of an observed mean from the hypothesized mean of a normal distribution. As an overall measure a weighted mean of the stage-wise distances is taken. The natural weights would be proportional to the square root of the respective preplanned sample sizes. As a consequence, conventional one-sided group sequential designs for the normal mean with known variance are a special case of adaptive multistage designs without midtrial design modifications (21).
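Both combination functions are simple to compute. The following sketch assumes independent stage-wise p-values that are uniform on (0, 1) under the null hypothesis; all function names are illustrative. The last function gives the overall level of a two-stage product test with the early stopping bounds $\alpha_1$ and $\alpha_0$, under the additional assumption that the critical value c for the product does not exceed $\alpha_1$.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def fisher_product_pvalue(p1, p2):
    """P(P1 * P2 <= p1 * p2) for independent uniform P1, P2; identical to
    the chi-square (4 df) tail probability of -2 * log(p1 * p2)."""
    c = p1 * p2
    return c * (1 - log(c))

def inverse_normal_pvalue(p1, p2, n1, n2):
    """Inverse normal combination with weights proportional to the square
    roots of the preplanned stage sample sizes n1 and n2."""
    w1 = sqrt(n1 / (n1 + n2))
    w2 = sqrt(n2 / (n1 + n2))
    z = w1 * STD_NORMAL.inv_cdf(1 - p1) + w2 * STD_NORMAL.inv_cdf(1 - p2)
    return 1 - STD_NORMAL.cdf(z)

def fisher_two_stage_level(alpha1, alpha0, c):
    """Overall level when rejecting at stage 1 if p1 <= alpha1, stopping for
    futility if p1 > alpha0, and otherwise rejecting if p1 * p2 <= c.
    Assumes c <= alpha1 < alpha0."""
    return alpha1 + c * (log(alpha0) - log(alpha1))

print(fisher_product_pvalue(0.05, 0.076))
print(inverse_normal_pvalue(0.05, 0.076, 100, 100))
print(fisher_two_stage_level(0.0102, 0.5, 0.0038))
```

With the illustrative bounds in the last call the overall level comes out at approximately 0.025. The weights in `inverse_normal_pvalue` are fixed by the preplanned sample sizes, not by the sample sizes actually used after an adaptation, which is what keeps the combination rule independent of the data.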
It has also been shown that a different approach to flexible designs via the conditional error function (23) can be looked at in terms of combination functions (24). The underlying idea is that a trial can be adapted as long as the probability of type I error in the forthcoming adapted design (conditionally on the results observed to date) does not exceed the conditional probability of type I error for the original design, if the latter can indeed be calculated. We will never switch to a design that has an increased risk of a type I error.
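For a concrete picture of the conditional error, consider the simplest case. The sketch below assumes a preplanned one-sided z-test with known variance and planned stage sizes n1 and n2; it illustrates the principle and is not the general construction of references (23) or (24). The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_error(z1, n1, n2, alpha=0.025):
    """Conditional type I error of the original fixed-sample z-test with
    total sample size n1 + n2, given the first-stage z-score z1."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    needed_z2 = (z_crit * sqrt(n1 + n2) - z1 * sqrt(n1)) / sqrt(n2)
    return 1 - STD_NORMAL.cdf(needed_z2)

def average_conditional_error(n1, n2, alpha=0.025, steps=4000):
    """Average of the conditional error over the null distribution of z1
    (midpoint rule on the interval [-8, 8])."""
    h = 16 / steps
    total = 0.0
    for i in range(steps):
        z1 = -8 + (i + 0.5) * h
        total += STD_NORMAL.pdf(z1) * conditional_error(z1, n1, n2, alpha)
    return total * h

print(conditional_error(1.0, 50, 50))
print(average_conditional_error(50, 50))
```

Averaged over the null distribution of the first stage, the conditional error equals the nominal level. Any redesigned second stage that is tested at the level `conditional_error(z1, ...)` therefore leaves the overall type I error rate unchanged.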
This allows one extended flexibility in the number of interim analyses. Assume that we have arrived at the interim analysis of a preplanned two-stage design. From the observed results we see a good chance to achieve an early decision before the scheduled end. So we may replace the second stage by a two-stage design with an additional interim analysis. As long as this two-stage design (to be started in the interim analysis) has the same conditional risk of a type I error as the single stage design of the original plan, no violation of the overall level will occur (25), even in situations when no interim analysis has been planned in advance (unpublished data; Muller HW, Schafer H; 2000). Such flexibility may help prevent the conduct of "unnecessary" interim analyses, or speed up the development procedure by inserting additional "promising" interim analyses. Confidence intervals can be derived from suitably defined p-values for two-stage tests in continuous families of null hypotheses by exploiting the duality to testing (20). The recursive application of two-stage combination tests also generalizes to flexible designs with a variable number of stages (20). This recursive principle, applied without any early stopping boundaries, covers the method of self-designing clinical trials (26,27).
Intense controversy surrounds the value of such flexible designs. The close relation to classical group sequential design is appealing. Due to the experimenter's extreme flexibility, however, it is difficult to systematically evaluate the advantages. Problems of interpretation may arise when adaptation leads to modification of hypotheses; see "Series of More than Two Trials, Sequential Meta-Analysis." But there are methods available to deal with multiplicity in adaptive designs (28,29). One of the prices to be paid for flexibility certainly is that, in the case of adaptation, the final test statistics, in general, will not be the conventional statistics in the total sample.
Further discussion refers to the value of overall or conditional power arguments to be used for sample size reassessment. An experimenter dealing with a special problem in a specific scientific and economic environment who has access to interim results may not want to rely upon long-run arguments averaging over outcomes (of similar equally powered trials) that he has definitely not observed in his own trial (30). Although this is primarily discussed in connection with sample size reassessment, the merits of this approach to drug development may go much beyond this. This approach has been exploited in trials to combine certain steps of the drug development program (31), where dose selection based upon efficacy, safety, and information possibly emerging from outside the trial are major issues. Sample size reassessment may be of major relevance if families of null hypotheses are considered. An example is the possibility for switching the goal from a noninferiority to a superiority trial (32) which, because of different efficacy margins for noninferiority and superiority, may require grossly diverging sample sizes (33-35).
BAYESIAN INFERENCE
Bayesian methods look at the problems differently (36-40). They try to model what is currently known about an unknown quantity $\theta$ (such as the effect size of a new treatment as compared to a control) by some distribution $p(\theta)$. If experimentation continues, we get new evidence formally denoted by $x$. An important quantity that is also considered in frequentist arguments is the likelihood for the occurrence of such evidence $x$ depending upon the value of $\theta$, expressed as a function $p(x \mid \theta)$. In order to quantify what is known about $\theta$ after having observed $x$, Bayes' theorem tells us that the updated distribution $p(\theta \mid x)$, called the posterior, is proportional to the product of the prior $p(\theta)$ and the likelihood $p(x \mid \theta)$: $p(\theta \mid x) \propto p(\theta) \, p(x \mid \theta)$.
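The updating step is easiest to see in the conjugate normal case. The sketch below assumes a normal prior for the effect size and a normally distributed effect estimate with known variance; the function name and the numbers are hypothetical and serve only as an illustration.

```python
def update_normal(prior_mean, prior_var, estimate, estimate_var):
    """Posterior mean and variance of theta for a normal prior and a normally
    distributed estimate with known variance (conjugate normal update)."""
    post_precision = 1 / prior_var + 1 / estimate_var
    post_var = 1 / post_precision
    post_mean = post_var * (prior_mean / prior_var + estimate / estimate_var)
    return post_mean, post_var

# Hypothetical numbers: a vague prior, updated by two successive trial estimates.
mean, var = 0.0, 4.0
for estimate, estimate_var in [(0.5, 0.25), (0.3, 0.04)]:
    mean, var = update_normal(mean, var, estimate, estimate_var)
    print(mean, var)
```

Each pass through the loop turns the current posterior into the prior for the next experimental step, and the posterior variance shrinks with every update.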
Clearly, this concept can be used to formally model the steps of an overall drug development program. At any time point before further experimentation we have our actual knowledge (prior) from previous experience. After the next experimental step, we simply update this prior by multiplication with the likelihood of the outcome of the experiment.
Having expressed the knowledge in the form of a distribution, things can be driven further. With regard to registration of a new drug, the consequences of the decisions will usually depend upon $\theta$. For large negative effects $\theta$, a registration may be of much more concern than it already is for $\theta = 0$. Missing a registration in the case of a large positive effect $\theta$ will be more of a major concern than missing a treatment with low effectiveness. Hence, the utility (or costs) of decisions will, in general, depend upon $\theta$. One may wish to quantify overall measures of utilities for rival decision strategies, for example, by weighting the utility of a particular decision procedure with the posterior and averaging this over all possible values of $\theta$.
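Averaging a utility over the posterior can be sketched as follows. The normal posterior, the two utility functions, and the numerical integration are hypothetical choices made for illustration only; in particular, the asymmetric penalty for registering a harmful drug is an assumed example and not a recommendation.

```python
from statistics import NormalDist

def expected_utility(utility, posterior, steps=4000, width=8.0):
    """Average utility(theta) over a normal posterior (midpoint rule over
    the posterior mean plus or minus `width` standard deviations)."""
    lo = posterior.mean - width * posterior.stdev
    h = 2 * width * posterior.stdev / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * h
        total += utility(theta) * posterior.pdf(theta)
    return total * h

# Hypothetical utilities: registration pays off in proportion to theta, and
# harmful drugs (theta < 0) are penalized three times as heavily.
def utility_register(theta):
    return theta if theta >= 0 else 3 * theta

def utility_reject(theta):
    return 0.0

posterior = NormalDist(mu=0.3, sigma=0.2)
print(expected_utility(utility_register, posterior))
print(expected_utility(utility_reject, posterior))
```

Comparing the two expected utilities yields a decision for the given posterior. Changing either the prior or the utility function can reverse that decision, which is exactly the sensitivity discussed in the following paragraphs.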
From these arguments it is obvious that Bayesian inference aims at much wider goals than frequentist methods. With regard to applications in overall drug development programs, however, there are some crucial problems to be considered. There is a never-ending discussion on the choice of a suitable prior. There seems to be general agreement that the possibility of allowing for individualization of priors that express the opinions of one or more experts involved at certain stages in drug development programs may help researchers make good early decisions. In later decisions with a reach far outside the institution of the drug developer (eg, registration) opinions diverge. Personally, having been closely involved as a statistician in medical faculties and ethics committees for decades, I am very skeptical with regard to the variety and diversity of medical knowledge expressed as opinions on a particular subject by different individuals or faculty of different schools. Why should this issue be less contradictory in a field where new drugs are investigated? It may be less crucial in health technology assessment; see the extensive survey of Spiegelhalter et al. (41). In drug development, small changes in the design may result in massive unpredictable changes in their properties (I guess that there are numerous examples for such bad surprises).
A second problem will arise if the knowledge must be carried between major steps of the drug development program, for example, when moving from experiments in animals to those in humans, when moving from surrogate to clinically-relevant endpoints, when changing the indication, and so forth. Here again, experts will be needed to model new priors when leaving the parameters of one statistical model and proceeding to those of another model. (We are not interested in dealing with a steadily increasing dimension of the parameter $\theta$.) Using noninformative priors (without accounting for previous observations or expert opinions) will lead to the disintegration of the development process and would be against the intention of Bayesian inference. The consistent use of noninformative priors would sacrifice its specific feature of modeling prior knowledge and lead to formal procedures similar to the classical ones.
The final comment refers to the ultimate goal of defining utilities comprising features such as efficacy, safety, burden to the patient, and costs into one measure. It is obvious that this is a difficult task. One may argue that these things are done implicitly anyway by those who make decisions on issues such as drug registration. Why not address it directly? It is my impression that the main objection is that things get less parsimonious for the observer and it becomes more difficult to separate the influence of assumptions and observed data. When reading through chapter 8 ("Bayeswatch: a Bayesian checklist for health technology assessment") in Spiegelhalter et al. (40) it is obvious that many points must be addressed seriously when communicating such results. Is this related specifically to Bayesian inference? It seems natural that modeling decisions that follow complicated processes in the living world will require indulging in some complexity. (This will also apply to modeling and simulation of clinical trials or rival strategies for the drug development program.) Thus, any type of inference to be used for an overall drug development decision procedure will face the primary problem of being real without losing credibility, transparency, and persuasiveness.
CONCLUSION
Controversies among statisticians over the correct philosophy of inference are longstanding. Looking at the important problem of implementing statistical decision tools in an overall drug development program provokes reasoning over the limitations we face. The frequentist approach reduces things down to the concept of protecting probabilities of erroneous decisions, given some hypothetical true state of nature. It has been shown that the rigidity with regard to a priori planning can be extremely relaxed when applying the concept of adaptive combination tests. Parts of a drug development program may be joined based upon this concept.
The consideration of continuous families of hypotheses (which is related to the dual concept of confidence intervals as has been known for a long time) can help researchers to escape the monolithic structure of performing an experiment only to test a single prespecified null hypothesis. Thus, it is now accepted practice to switch the goal of a trial from noninferiority to superiority (32). Not surprisingly, a lot of research in this field comes from scientists affiliated with the regulatory authorities. This may be attributed to the intention of relaxing unnecessary and counterintuitive rigidity without sacrificing all "objective" frequentist properties.
Nevertheless, classical frequentist methods by their basic intention will soon face limitations. The advantage of these methods is generally seen to be their long tradition, their wide distribution, their practicality (at least the older ones), and also their "inflexibility."
The Bayesian approach has a much wider (different) scope and is a particularly appealing concept for a stream of research. While it is based upon a simple comprehensive concept, the problem here is to make it real. There is no such thing as a free lunch: the more its potential is exploited, the more input is required in the form of assumptions that must be made, and the more involved and less transparent the inferential procedure will become. (Is this not an inevitable consequence?) This will, in the extreme, require reducing all features of the development process (effectiveness, safety, burden to the patients, costs, etc.) to one "utility" scale.
My long-term experiences with medical opinions and beliefs seem to suggest that medical researchers are ultimately able to find an interpretation for any type of statistical result (possibly even if it is produced with an error in sign). Hence, some skepticism about large-scale implementation of this methodology in drug development programs, where so many interests of different natures may clash, is appropriate. One must question whether all people involved in drug development would be happy with the option (or task) of "individualizing" inference. However, if it is conjectured that people who refuse such a chance are simply backward, we must also ask who in this area is interested in such individualizations, and why.
Rather than asking the dogmatic question of whether a certain type of inference is the "correct" principle, we should deal with the pragmatic question of how the promises can be redeemed. Much has already been done and there seem to be many activities ongoing within pharmaceutical companies beyond pure frequentist principles that have not yet been published (possibly also because of the intrinsic problems in communicating them). There is no doubt that we can expect further interesting projects.
Before statisticians can adequately model scientific progress (is not this ruled out by the logic of science anyway?) they should refrain from transporting their results as messages from those who are in command of the one and only key to science. They should, rather, transport their methods as tools to make observations persuasive for certain conclusions. A statistical analysis addressing the problem of multiplicity might be found to be more conclusive than one ignoring this problem. This does not mean that anybody knows what the "best" way of dealing with multiplicity would be (I am afraid that even in Bayesian analysis, different arguments will be made on the issue of multiplicity). But we also perform, for example, drug and dose finding without having access to any golden rule about how to proceed. Statistical results communicated with transparency to the planning modalities and the underlying methods of inference are an issue of quality of research, just as institutions that produce observations use appropriate standard operating procedures and quality assurance strategies. This does not diminish the important role statisticians may take in planning, guiding, and accompanying clinical trials based upon their experience in this field.
Drug Information Journal, Vol. 37, pp. 81-89, 2003 * 0092-8615/2003
Printed in the USA. All rights reserved. Copyright (C) 2003 Drug Information Association, Inc.
REFERENCES
1. Note for Guidance on Statistical Principles for Clinical Trials. ICH 9. London, England: European Agency for the Evaluation of Medicinal Products; 1998.
2. Gieschke R, Steimer JL. Pharmacometrics: modelling and simulation tools to improve decision making in clinical drug development. Eur J Drug Metab Pharmacokinet. 2000;25:49-58.
3. Holford NHG, Hale M, Ko HC, Steimer JL, Sheiner LB, Peck CC. Simulation in Drug Development: Good Practices. 2000. http://www.dml.georgetown.edu/cdds.
4. Holford NHG, Kimko HC, Monteleone JRR, Peck CC. Simulation of clinical trials. Annu Rev Pharmacol Toxicol. 2000;40:209-234.
5. Enas GG, Anderson JJ. Enhancing the value delivered by the statistician throughout drug discovery and development: putting statistical science into regulated pharmaceutical innovation. Stat Med. 2001;20:2697-2708.
6. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials. 1997;18:580-593.
7. Ioannidis JP, Contopoulos-Ioannidis DG, Lau J. Recursive cumulative meta-analysis: a diagnostic for the evolution of total randomized evidence from group and individual patient data. J Clin Epidemiol. 1999;52:281-291.
8. Food and Drug Administration. Statement Regarding the Demonstration of Effectiveness of Human Drug Products and Device. Federal Register, Vol. 60, No 147, August 1, 1995; Docket No 95-0230:39180-39181.
9. Lu HL, Huque M. Understanding on the pooled test for controlled clinical trials. Biom J. 2001;43:909-923.
10. Bauer P. Sequential tests of hypotheses in consecutive trials. Biom J. 1989;31:663-676.
11. Whitehead J. The Design and Analysis of Sequential Clinical Trials. Revised second edition, Chichester, United Kingdom: Wiley Verlag; 1997.
12. Whitehead A. A prospectively planned cumulative meta-analysis applied to a series of concurrent clinical trials. Stat Med. 1997;16:2901-2913.
13. Kramar A, Potvin D, Hill C. Multistage design for phase II clinical trials: statistical issues in cancer research. Br J Cancer. 1996;74:1317-1320.
14. Simon R, Thall PF, Ellenberg SS. New designs for the selection of treatments to be tested in randomized clinical trials. Stat Med. 1994;13:417-429.
15. Schaid DJ, Ingle JN, Wieand S, Ahmann DL. A design for phase II testing of anti-cancer agents within a phase III clinical trial. Control Clin Trials. 1988;9:107-118.
16. Storer BE. A sequential phase II/III trial for binary outcomes. Stat Med. 1990;9:229-235.
17. Bauer P. Multistage testing with adaptive interim analyses (with discussion). Biometrie und Informatik in Medizin und Biologie. 1989;20:130-148.
18. Bauer P, Köhne K. Evaluation of experiments with adaptive interim analyses. Biometrics. 1994;50:1029-1041.
19. Bauer P, Brannath W, Posch M. Flexible two-stage design: an overview. Methods Inf Med. 2001;40:117-121.
20. Brannath W, Posch M, Bauer P. Recursive combination tests. J Am Stat Assoc. 2002;97:236-244.
21. Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics. 1999;55:1286-1290.
22. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. New York: Academic Press; 1985.
23. Proschan MA, Hunsberger SA. A designed extension of studies based on conditional power. Biometrics. 1995;51:1315-1324.
24. Posch M, Bauer P. Adaptive two stage designs and the conditional error function. Biom J. 1999;41:689-696.
25. Müller HH, Schäfer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and classical group sequential approaches. Biometrics. 2001;57:886-891.
26. Fisher LD. Self-designing clinical trials. Stat Med. 1998;17:1551-1562.
27. Shen Y, Fisher L. Statistical inference for self-designing clinical trials with a one-sided hypothesis. Biometrics. 1999;55:190-197.
28. Bauer P, Kieser M. Combining different phases in the development of medical treatments within a single trial. Stat Med. 1999;18:1833-1848.
29. Hommel G. Clinical trials with an adaptive choice of hypotheses. Drug Inf J. 2001;35:1423-1429.
30. Posch M, Bauer P, Brannath W. "Issues in designing flexible trials SFdS." Presented at the Meeting on Statistic Methods in Biopharmacy, Paris, France, 2001.
31. Zeymer U, Suryapranata H, Monassier JP, et al. The Na+/H+ exchange inhibitor eniporide as an adjunct to early reperfusion therapy for acute myocardial infarction. J Am Coll Cardiol. 2001;38:1644-1651.
32. Committee for Proprietary Medicinal Products. Points to Consider on Switching Between Superiority and Non-Inferiority. London, United Kingdom: European Agency for the Evaluation of Medicinal Products; 2000.
33. Bauer P, Kieser M. A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika. 1996;83:934-937.
34. Wang SJ, Hung HMJ, Tsong Y, Cui L. Group sequential test strategies for superiority and non-inferiority hypotheses in active controlled clinical trials. Stat Med. 2001;20:1903-1912.
35. Brannath W, Bauer P, Posch M, Maurer W. Biometrics. Forthcoming.
36. Lindley DV. Introduction to Probability and Statistics from a Bayesian Viewpoint. Part 1. Cambridge, United Kingdom: Cambridge University Press; 1970.
37. Racine-Poon A, Grieve AP, Fluhler H, Smith AFM. Bayesian methods in practice: experiences in the pharmaceutical industry (with discussion). Appl Stat. 1986;35:93-150.
38. Spiegelhalter DJ, Freedman LS, Parmar MKB. Applying Bayesian ideas in drug development and clinical trials. Stat Med. 1993;12:1501-1511.
39. Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials (with discussion). J Roy Stat Soc Ser A. 1994;157:357-587.
40. Berry DA. Bayesian approaches to randomised trials: discussion. J Roy Stat Soc Ser A. 1994;157:387-476.
41. Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. Bayesian methods in health technology assessment: a review. Health Tech Assess. 2000;4:38.
Peter Bauer, PhD
Department of Medical Statistics, University of Vienna, Vienna, Austria
Reprint Address
Peter Bauer, PhD, Department of Medical Statistics, University of Vienna, Schwarzspanierstraße 17, A1090 Wien, Vienna, Austria (e-mail: peter.bauer@univie.ac.at).
Presented at the 13th DIA "Workshop on Statistical Methodology in Clinical R&D," April 8-10, 2002, Venice, Italy.