Eighteen-month-long randomized, placebo-controlled clinical trials are common for phase II and phase III drug development for Alzheimer's disease (AD). Yet, no 18-month trial has shown statistically significant outcomes favoring the test drug. We examined characteristics and underlying assumptions of these trials by assessing the placebo groups.
We searched the clinicaltrials.gov registry for randomized, placebo-controlled clinical trials for AD of at least 18-month duration and extracted demographic, clinical, and trials characteristics, and change in main outcomes from the placebo groups. We obtained additional information from presentations, abstracts, publications, and sponsors.
Of 23 trials identified, 11 were completed and had baseline data available; nine had follow-up data available; 17 were phase III. General inclusion criteria were very similar except that minimum Mini-Mental State Examination (MMSE) scores varied from 12 to 20. Sample sizes ranged from 402 to 1,684 for phase III trials and 80 to 400 for phase II. Cholinesterase inhibitor use was from 53% to 100%, and memantine use was from 13.5% to 78%. The AD Assessment Scale-cognitive (ADAS-cog) was the co-primary outcome in all trials; and activities of daily living, global severity, or global change ratings were the other co-primaries. APOE ε4 genotype carriers ranged from 58% to 67%; mean baseline ADAS-cog was 17.8 to 24.2. ADAS-cog worsening in the placebo groups during 18 months ranged from 4.34 to 9.10, with standard deviations from 8.17 to 9.39, increasing during 18 months.
Inclusion criteria are essentially similar to those of earlier 6-month and 12-month trials in which cholinesterase inhibitors were not allowed, as were mean ADAS-cog rates of change. Yet increasing variability and relatively little change overall in the ADAS-cog placebo groups (eg, about 25% of patients do not worsen by more than 1 point) might make it less likely than previously assumed that a modestly effective drug can be reliably recognized, especially when the drug might work only to attenuate decline in function and not to improve function. These observations would be strengthened by pooling individual trials data, and pharmaceutical sponsors should participate in such efforts.
Although 6-month trials are still the standard in regulatory guidelines [1,2], 18-month-long randomized, placebo-controlled clinical trials are very common for phase II and phase III drug development for Alzheimer's disease (AD). Many 18-month trials have been launched during the past 8 years, but there has been no completed trial with statistically significant outcomes in favor of the test drug. Although this is most likely due to the inefficacy of the drugs tested and underpowered trials, other possibilities include the insensitivity of the cognitive, global, and activities of daily living outcome measures and incorrect assumptions regarding underlying pathology and clinical course.
Despite some concerns about increasingly longer durations of clinical trials [3–5], an ad hoc group has formally suggested longer trials for disease modification coupled with slope analyses and biomarkers, specifically recommending that 18-month-long trials be used. We systematically compared and examined the methodology and some underlying assumptions of these trials with regard to outcomes.
We searched the clinicaltrials.gov registry to identify randomized, double-blinded, placebo-controlled clinical trials for AD of 18-month duration or longer. We separated the trials into completed and ongoing trials and extracted summary sociodemographic and clinical data characterizing patients and methodologic characteristics of the trials. The former included mean ages, gender, educational level, APOE genotype, cholinesterase inhibitor and memantine use, and clinical rating scales scores at baseline. Methodologic characteristics extracted included inclusion criteria, sample size, randomization allocation ratio, and clinical outcomes scores. Because the Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-cog) is frequently used and recommended as the primary cognitive outcome and the Clinical Dementia Rating scale (CDR), clinician's global impression of change, and activities of daily living scales [10,11] as co-primary outcomes, we retrieved those change scores from the placebo groups over the durations of the completed trials.
We obtained information from the clinicaltrials.gov registry, presentations at meetings, published abstracts, and publications on the trials. We searched Google and Google News and queried sponsors to seek additional information on the unpublished trials identified on clinicaltrials.gov. We summarized data into text and tables describing characteristics of the completed trials and ongoing trials and the changes on main outcomes scales of the placebo groups from the completed trials in order to facilitate review.
From 243 AD trials citations on clinicaltrials.gov (accessed January 15, 2009), we identified twenty-three 18-month trials. Eleven trials were completed as of May 2009; 12 were ongoing and recruiting. Ten of the 11 completed trials and seven of 12 ongoing trials were classified by their sponsors as phase III and the others as phase II. We obtained screening or baseline demographic and clinical information from 10 of the 11 completed trials, and we obtained clinical outcomes follow-up data from the placebo groups from nine trials. Two of the 11 completed trials were discontinued by their sponsors prematurely, after enrollment was complete but before the last patient completed the 18-month follow-up, because a previous trial with the same drug did not show statistically significant results, and the development programs were terminated.
Data were obtained and summarized from the 11 completed trials (Table 1). The sponsors of the trials were Pfizer (one trial), Sanofi-Aventis (two trials), Bellus (previously Neurochem, two trials), Myriad (two trials), Elan and Wyeth (one trial), and the National Institutes of Health Alzheimer's Disease Cooperative Study (NIH ADCS; three trials).
Inclusion criteria were very similar. All required participants to have probable AD diagnoses. Mini-Mental State Examination (MMSE) inclusion scores ranged from 12 to 26 (1 trial), 13 to 26 (1 trial), 14 to 26 (2 trials), 16 to 26 (5 trials), and 20 to 26 (2 trials). Differences among trials were mainly in certain trial-specific exclusion criteria in which a medical condition or concomitant use of a medication might confound effects of the test drug, eg, vitamin use, abnormal lipid profiles, excess fatty acid dietary intake, and diabetes were each an exclusion criterion in one trial but not others.
Sample sizes ranged from 402 to 1,684 for the phase III trials; sample size was 234 for the phase II. The three NIH ADCS–sponsored trials used sample sizes of 402, 406, and 409. Sample sizes for the pharmaceutical company–sponsored phase III trials of drugs under development ranged from 841 to 1,684. The numbers of clinical sites per trial ranged from 40 to 133; thus the average number of patients enrolled per site per trial ranged from 7.4 to 15.7. Five trials were conducted exclusively in the United States; two in the U.S. and Canada; two in North America and Europe; one exclusively in Europe; and one across the U.S., Europe, South Africa, Asia, Australia, and New Zealand.
Eight trials randomized patients to one dose of test medication or placebo; two randomized to two doses or placebo. The phase II trial was unique in that its methodology involved randomizing four cohorts of 60 patients to ascending doses of an amyloid-beta monoclonal antibody, bapineuzumab, or placebo and staggering the starts of each cohort. Therefore, the placebo group is the sum of the placebo groups from the four sequentially conducted comparisons.
Mean age per trial ranged from 73.6 to 76.3 years for the phase III trials and was 69.0 for the phase II trial. Gender distribution ranged from 50.1% to 59.4% female, and mean years of education were from 13.9 to 14.3, with 26% to 62% of patients per trial having some university education. The mean proportion of patients per trial that carried one or two APOE ε4 alleles ranged from 58.1% to 66.9%.
All trials allowed patients to use cholinesterase inhibitors; three required their use, and one required donepezil specifically. In seven of 11 trials, more than 91% of the participants used cholinesterase inhibitors. In the remaining four, cholinesterase inhibitor use was 82%, 75%, 68%, and 53%. All allowed memantine use, and the baseline prevalence for use ranged from 13.5% to 78% in the nine trials from which information was available.
The ADAS-cog was the primary cognitive outcome in all trials. The co-primary outcomes, used as measures of clinical meaning, were the ADCS Activities of Daily Living inventory (ADCS-ADL; two trials), Disability Assessment for Dementia (DAD; one trial), CDR (six trials), and ADCS-Clinical Global Impression of Change (ADCS-CGIC; two trials). An activities of daily living scale was used in all trials: the ADCS-ADL inventory (seven trials), DAD (three trials), and AD Functional Assessment and Change Scale (one trial). The CDR was used in 10 trials. The bapineuzumab phase II trial was originally designed with safety outcomes as primary but was changed to co-primary efficacy outcomes, the ADAS-cog and DAD, during the course of the trial.
Mean baseline ADAS-cog was 18.0 and 18.8 in the two trials of mild AD that restricted patients to MMSE scores from 20 to 26 and ranged from 21.1 to 24.2 in the other, more broadly inclusive trials. The lower allowable limits for the MMSE (ie, 12, 13, or 14) did not affect the mean baseline ADAS-cog score. Mean baseline CDR sums of the boxes scores were 4.90 and 5.30 in the two mild AD trials and ranged from 5.28 to 6.17 in the others. Mean ADCS-ADL scores were from 55.6 to 67.9.
Magnetic resonance imaging brain volume estimates were included in subsets from seven trials, and some included subsets from which cerebrospinal fluid (CSF) was obtained for biomarkers.
The 12 ongoing trials included seven phase III and five phase II, all with drugs or vaccines intended to lower or oppose the putative toxic effects of amyloid-beta protein fragments (Table 2). Inclusion criteria were very similar to those of the completed trials, requiring patients with mild to moderate AD. All but one required MMSE scores from 16 to 26, and the other, an ADCS-sponsored trial, allowed MMSE scores from 14 to 26. The lower age limit was 50 years except in two trials that set it at 55 years. All the trials allowed or required the use of cholinesterase inhibitors, and all allowed the use of memantine.
Four of the seven phase III trials were with the experimental amyloid-beta antibody bapineuzumab and were sponsored by Elan and Wyeth pharmaceutical companies. Two trials, identical in methods with each other, included only APOE ε4 allele carriers and randomized patients to one dose or placebo. Two others, also identical in methods with each other, included only APOE ε3 or APOE ε2 allele carriers and randomized patients to three doses of antibody or placebo.
Two phase III trials were with the gamma-secretase inhibitor, semagacestat, LY 450139, and were sponsored by Lilly pharmaceuticals. The intravenous immune globulin trial was different from the other phase III trials in that two doses are compared with placebo during a period of 9 months for the primary efficacy assessment, and the same comparison during a period of 18 months is a secondary efficacy assessment. It is also the smallest phase III trial, enrolling only 360 patients. Three of the five phase II trials included amyloid-beta vaccines with multiple doses or regimens. The other two were with an inhibitor of the advanced glycation end product receptor and an amyloid fibrillogenesis inhibitor, scyllo-inositol, each using two or three doses of medication compared with placebo.
Four of the five phase II trials listed safety as the primary outcomes. They had smaller planned sample sizes from 80 to 400 and placebo sizes from approximately 27 to 133, as compared with six of the seven phase III trials with sample sizes from 800 to 1,500 and placebo sizes from 400 to 550.
Six of the 12 trials are being conducted in North America and one in North America, the United Kingdom, and Australia. Four are more broadly international, although mostly in English-speaking countries and including Japan, Taiwan, and India. One small phase II vaccine trial is conducted in France, Germany, and Spain.
The seven phase III trials listed the ADAS-cog as the primary cognitive outcome. The co-primary outcomes for the phase III trials included the DAD (four bapineuzumab trials), the ADCS-ADLs (two trials), and the CDR (one trial). All the ongoing phase III trials included provisions for subgroups for blood, CSF, or brain imaging biomarkers. One ongoing trial was initiated in March 2007, five in November and December 2007, three in the first half of 2008, two in the last half of 2008, and one in March 2009.
Main outcomes from the placebo groups of the nine trials with available data are summarized in Table 3. One trial that provided baseline data has been completed and analyzed, but the outcomes have not been made available, and one trial was completed in May 2009 with results to be presented in July 2009.
Sample sizes of the placebo groups ranged from 169 to 809 for the phase III trials; it was 110 for the phase II trial. Dropouts from eight placebo groups ranged from 17.2% to 33%, and the ninth showed an unusually large 41%.
Mean ADAS-cog worsening in the placebo groups over 18 months was from 4.34 to 8.14 for the phase III studies, with standard deviations (SDs) from 8.17 to 9.39, and standardized change or effect sizes (ie, mean change/SD) from 0.51 to 0.94 SD units. Mean ADAS-cog changes at 6 and 12 months ranged from 1.04 to 2.35 and from 2.41 to 5.37, respectively. For the phase II trial, mean ADAS-cog change was 9.10 (SD, 8.33) at 18 months.
The mean CDR sum of boxes change was from 2.05 to 2.74 at 18 months, with SDs from 2.57 to 3.12 and effect sizes from 0.73 to 0.98 in the phase III trials and 2.99 (SD, 2.92), effect size 1.02 in the phase II trial. The mean CGIC scores at 18 months were 5.11 estimated from the atorvastatin trial report and 5.23 (SD, 0.97), effect size 1.27, for the simvastatin trial, where 4 is no change, 5 is minimal worsening, and 6 is moderate worsening. In the latter trial, 3.3% of patients were judged by clinicians as improved, 14.6% as not changed, and 44.4% as minimally worse. The mean ADCS-ADL inventory change was from 9.7 (SD, 14.0) to 11.4 (SD, 13.0), effect sizes from 0.69 to 0.90, in the four trials from which this information was available.
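The standardized change (effect size) used above is simply the mean change divided by its SD. The following is a minimal sketch that reproduces a few of the reported values from the summary statistics quoted in the text; the function name is hypothetical, and treating the CGIC change as the score minus 4 (the "no change" anchor) is an assumption about how the reported 1.27 was obtained.

```python
def effect_size(mean_change: float, sd: float) -> float:
    """Standardized change: mean change divided by the SD of the change."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return mean_change / sd


# Phase II placebo group at 18 months (values quoted in the text).
adas_cog_phase2 = effect_size(9.10, 8.33)
cdr_sb_phase2 = effect_size(2.99, 2.92)

# Assumption: CGIC change is measured from 4, the "no change" anchor.
CGIC_NO_CHANGE = 4.0
cgic_simvastatin = effect_size(5.23 - CGIC_NO_CHANGE, 0.97)

print(f"ADAS-cog, phase II:       {adas_cog_phase2:.2f} SD units")
print(f"CDR sum of boxes, ph. II: {cdr_sb_phase2:.2f} SD units")
print(f"CGIC, simvastatin trial:  {cgic_simvastatin:.2f} SD units")
```

The CDR and CGIC results agree with the effect sizes of 1.02 and 1.27 given in the text, which suggests the same definition was applied across scales.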
The 18-month trials are essentially similar to previous 6-month and 12-month-long trials, with standard criteria for probable AD and qualifying severity with MMSE ranges. All allowed the same top limit of 26 on the MMSE, but with the lower limit varying from 12 to 20. The patient groups were comparable among the trials with respect to age, gender, educational levels, and baseline rating scale scores. APOE ε4 genotype distributions were similar in both the mild and the mild to moderate trials and to population-based estimates, indicating that the patients selected on the whole were typical of AD patient samples.
The phase II bapineuzumab trial was different from the other completed trials, having the smallest sample size, fewest clinical sites, and the youngest mean age of 69.0, at least 4.6 years lower than the other trials. The placebo group showed the greatest mean worsening of the ADAS-cog over 18 months, 9.10 compared with the next greatest 8.14, and a median of 6.5 points among the trials. These apparent differences might represent selection bias, random variation, and the imprecision of point estimates in smaller samples.
The tarenflurbil trials attempted to identify a milder population by restricting the MMSE score range to 20 to 26 [24,25]. Despite scoring better on the ADAS-cog at baseline, however, this group did not perform better on the ADCS-ADL inventory, scoring within the 55.6 to 67.9 range of all the trials. Also, the mean changes on the ADAS-cog, ADCS-ADL inventory, and CDR over 18 months in these trials were similar to or larger than the changes seen in most of the trials that enrolled patients with lower MMSE scores. Moreover, the ADAS-cog changes were similar to the changes in the uncontrolled AD Neuroimaging Initiative (www.adni.org) that also enrolled mild AD patients (Table 3, footnote). These observations do not support the use of restricted MMSE scores to identify a reliably milder subgroup, and they contradict the suggestion that higher baseline scores will yield smaller changes over time.
It is not obvious why the two nearly identically designed xaliproden trials had the smallest ADAS-cog mean changes and effect sizes. Both were demographically and geographically comparable to the other trials. One distinction is that fewer patients in these trials used cholinesterase inhibitors and memantine than in the other trials. There might be unrecognized differences in how these particular trials were performed, perhaps with respect to sample selection, sites, trial management, cognitive test versions, or scoring methods. However, decline on the CDR and ADCS-ADL was also less in one of the trials than in the others, suggesting external and consensual validity to the observation.
A CGIC was used in only two of the 23 trials as the co-primary outcome assessing clinical significance, perhaps because of concern about sensitivity and stability of raters and ratings over such a long period of time. Evidence here suggests otherwise. The mean CGIC scores at 18 months were 5.11 for the atorvastatin trial and 5.23 for the simvastatin trial, both in the minimal worsening range, and the effect size for the simvastatin trial, 1.27 SD units, was substantially larger than the effect sizes for the CDR, ADAS-cog, and ADL in all the other trials. This large effect, however, might be consequent to the inherent expectation that patients will deteriorate or to the CGIC being, in fact, more sensitive to change than other measures.
Similarly, the CDR might be more sensitive to change than the ADAS-cog, in that the CDR effect sizes were nominally larger than the ADAS-cog effect sizes in all the trials that reported both (Table 3). Similar to the CGIC, the CDR relies on a clinician's judgment. However, it assesses current extent of impairments, severity, not change from baseline, and might be less influenced by the raters' expectations of worsening. Future trials could consider the CDR as a main cognitive outcome as well, because CDR scoring is heavily weighted toward assessing orientation, memory, and problem-solving. One caveat is that although a rating might show greater sensitivity to change over time within a treatment group, it does not follow that it will be better able to distinguish the effect of a particular drug from placebo.
Cognitive decline in the placebo group was observed despite the use of cholinesterase inhibitors, and the rates of ADAS-cog decline at 6 and 12 months were similar to those of historical placebo groups not treated with these drugs [3,26]. Moreover, the more recent trials in which more than 90% of patients used the drugs seemed to show greater worsening on the ADAS-cog than the trials in which fewer patients used the drugs at baseline. Trials with substantial European samples and that were started before memantine's U.S. introduction in 2004 had less memantine use than the more recent and predominantly North American trials in which at least half to more than three fourths of patients used the drug. The amount of memantine use is especially surprising, considering that the drug is not indicated in the U.S. for patients with MMSE scores higher than 14, and the Food and Drug Administration specifically refused to approve a new drug application for memantine for mild AD because of lack of efficacy. The potential effects of these drugs, indeed their continuing effects if any, could be better understood by analyses of individual patient data pooled from these trials.
The ongoing 18-month trials (Table 2) are all with drugs expected to have anti–amyloid-beta actions and direct effects on the pathologic progression of AD. The sizes of the phase II studies, with fewer than 240 patients included, are likely to be too small to show a reliable effect. By comparison, the ongoing phase III trials are virtually identical to each other and to the previous trials, with the main change being that the CDR is now the most common co-primary outcome, displacing ADLs. Although this might ultimately be demonstrated to be a reasonable choice, it is made absent evidence on its performance compared with ADLs or the CGIC.
The bapineuzumab phase III trials are unique in hypothesizing that tolerability and outcomes will differ by APOE genotype. Positive outcomes might lead to rather complicated labeling, with different indications, doses, efficacy, and safety considerations depending on a patient's APOE genotype.
Although durations of AD trials have increased to 18 months, none has shown statistical significance for the experimental intervention. Conceptual issues about the kinds of drugs that require this length of time to manifest their effects need to be addressed. Two important assumptions are that current drugs in development are expected only to attenuate worsening on outcomes and not improve them and that the therapeutic effect will persist and yet might not be detectable for 18 months. Indeed, a 12-month duration of efficacy for the marketed cholinesterase inhibitors has only been demonstrated in a few trials, and trials in mild cognitive impairment suggest that efficacy might be only fleeting, if at all, over the longer term [28–30]. Although these longer trial designs might be intended to observe changes in disease progression, it is not clear that they do anything other than extend the observation period for symptomatic effects.
Trial methods presuppose that patients decline generally in parallel with each other and do not drop out in excessive numbers. Inspection of available individual patient data shows that although, on average, placebo patients worsen, they do not worsen in parallel but rather “fan out,” showing broad intraindividual and interindividual variations as further indicated by the increasing SDs of ADAS-cog change over 18 months observed among the trials (Table 3). Moreover, on average, the cognitive progression is slight. The mean change of the ADAS-cog and the SDs over 18 months in these trials indicate that approximately 25% of placebo-treated patients worsen by no more than 1 point (eg, Z = (6.54 points – 1 point)/SD = 5.54/8.17 = 0.68, equivalent to 24.8%).
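The parenthetical calculation above can be checked with a short sketch. It assumes, as the Z-score calculation itself implies, that 18-month ADAS-cog change in a placebo group is approximately normally distributed; the mean (6.54) and SD (8.17) are the illustrative values used in the text, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Illustrative placebo-group values taken from the worked example in the text.
mean_change = 6.54   # mean ADAS-cog worsening at 18 months (points)
sd_change = 8.17     # SD of the change (points)
threshold = 1.0      # "worsen by no more than 1 point"

# Distance of the threshold below the mean, in SD units.
z = (mean_change - threshold) / sd_change

# Assumption: change scores are roughly normal, so the share of patients at
# or below the threshold is the normal CDF evaluated at that threshold.
fraction_not_worse = NormalDist(mu=mean_change, sigma=sd_change).cdf(threshold)

print(f"Z = {z:.2f}")
print(f"Share worsening by <= 1 point: {fraction_not_worse:.1%}")
```

The result lands close to the roughly 25% quoted; any small difference comes from rounding Z to two decimals before looking up the tail area. Real change-score distributions may be skewed, so this is an approximation rather than an observed proportion.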
The changes in the placebo groups from these trials are consistent with two observational studies. The Alzheimer's Disease Neuroimaging Initiative study in mild AD (ie, MMSE from 21 to 26) showed mean 4.3 (SD, 6.4) ADAS-cog points change at 12 months and 9.9 (SD, 9.2) at 24 months, and the Real.fr study reported 4.02 (SD, 6.83) change at 18 months, both mainly in patients receiving cholinesterase inhibitors from before study entry.
An ad hoc European expert group recommends that the ADAS-cog, ADCS-ADL scale, and CDR be used as outcomes because they are the most widely used in mild to moderate AD trials and “no available data suggest suitable alternatives” for disease-modifying trials. The group consensus was that a 2-point drug-placebo group difference at 18 months on the ADAS-cog should be the “minimal clinically important change (MCIC)” [6,32]. This difference, however, represents a shift of the placebo effect by less than 0.25 SD units, and it might be legitimately questioned whether such a difference is indeed clinically meaningful, as some of the expert group members have done [33,34], recommending elsewhere that a 7-point, within-patient ADAS-cog change be considered minimally significant. At this threshold, of course, more than half of the placebo patients in these trials would have been considered not to have worsened. This expert group also recommended slopes analyses from the perspective that diverging slopes of decline between drug and placebo groups are evidence for disease modification, yet the broad variability in progression, the fanning out, through 18 months suggests that modeling group change as a slope might be misleading.
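The two thresholds discussed in this paragraph can be put on a common footing with a brief sketch. It expresses the 2-point difference in SD units using the range of placebo SDs reported for the phase III trials (8.17 to 9.39), and it estimates the share of placebo patients falling short of a 7-point worsening. The second step assumes approximately normal change scores and reuses the illustrative mean of 6.54 and SD of 8.17 from the earlier worked example; trials with larger mean changes would give a smaller share.

```python
from statistics import NormalDist

# Proposed minimal clinically important drug-placebo difference (points).
MCIC_POINTS = 2.0

# Range of 18-month ADAS-cog change SDs reported for the phase III trials.
sd_range = (8.17, 9.39)
standardized = [MCIC_POINTS / sd for sd in sd_range]

# Assumption: normal change scores with the illustrative mean and SD
# used in the text's Z-score example.
placebo = NormalDist(mu=6.54, sigma=8.17)
share_below_7 = placebo.cdf(7.0)

for sd, d in zip(sd_range, standardized):
    print(f"SD {sd:.2f}: 2 points = {d:.3f} SD units")
print(f"Share of placebo patients worsening by < 7 points: {share_below_7:.1%}")
```

Under these assumptions the 2-point difference stays below 0.25 SD units across the reported SD range, and just over half of such a placebo group would not reach the 7-point within-patient threshold.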
At least two things need to occur in an 18-month trial to improve chances for detecting efficacy with current outcomes. A greater proportion of the placebo sample needs to measurably worsen on the primary outcome scale, and the drug group needs to improve over baseline to overcome the broad variances in change. In addition, sample sizes need to be fairly large, as large as the largest phase III trials, to reliably detect change. Because the change detected will be small and the placebo group will have changed only slightly, an interpretation of such change as clinically meaningful still will be controversial.
Many characteristics of these trials, such as the modest decline of the placebo groups, the expectation that new drugs will only attenuate this modest decline, heterogeneity and variability of clinical course over only 18 months, and the imprecision of the outcomes ratings, when considered together, markedly diminish the likelihood of discovering modestly effective drugs. Drugs with very different therapeutic expectations are developed by using essentially the same phase II and phase III clinical methods as were used with cholinesterase inhibitors and memantine [3,4]. Current phase II and phase III AD drug development traces a rut wherein the development program for each new candidate drug is modeled on the most recent competitor's program, with limited regard for the drug's unique characteristics, therapeutic expectations, or methodologic limitations. The lack of precedent success is itself a barrier to drug development.
Further insight can be attained by pooling and analyzing individual patient outcomes from these trials. Trial methods can be advanced by collaboratively examining factors that might be important in the progression of illness and the ability to detect change. Pharmaceutical companies could collaborate by allowing their clinical trials data to be further examined and pooled with other trials to better assess and develop trials methodology.
The authors thank Patrice Douillet, Sanofi-Aventis; Michael Grundman, Elan Pharmaceuticals; Joseph F. Quinn and Paul Aisen, NIH Alzheimer Disease Cooperative Study; Daniel Saumier, Bellus Health; and Kenton Zavitz, Myriad Pharmaceuticals for providing additional data and Philip Insel, University of California, San Francisco, for Alzheimer Disease Neuroimaging Initiative data.
L.S.S. is a member of the steering committee of the National Institute on Aging-Alzheimer Disease Cooperative Study (NIA-ADCS), the sponsor or co-sponsor of five trials discussed in this report, and has served as a consultant to the following companies who are developers or marketers of drugs for Alzheimer's disease and whose drugs are discussed in this review: Elan, Eli Lilly, Forest, Johnson and Johnson, Lundbeck, Medivation, Merck, Merz, Myriad, Novartis, Pfizer, Roche, Sanofi-Aventis, Schering Plough, and Wyeth. M.S. is a member of the steering committee of the National Institute on Aging-Alzheimer Disease Cooperative Study (NIA-ADCS), the sponsor or co-sponsor of five trials discussed in this report, is principal investigator of the simvastatin trial, and has served as a consultant to the following companies who are developers or marketers of drugs for Alzheimer's disease and whose drugs are discussed in this review: Elan, Forest, Genentech, Johnson and Johnson, Medivation, Merck, Novartis, Pfizer, Takeda, and Wyeth.