The 18-month trials are essentially similar to previous 6-month and 12-month trials, with standard criteria for probable AD and qualifying severity defined by MMSE ranges. All allowed the same upper limit of 26 on the MMSE, with the lower limit varying from 12 to 20. The patient groups were comparable among the trials with respect to age, gender, educational levels, and baseline rating scale scores.
APOE ε4 genotype distributions were similar in both the mild and the mild to moderate trials and to population-based estimates [23], indicating that the patients selected were, on the whole, typical of AD patient samples.
The phase II bapineuzumab trial differed from the other completed trials, having the smallest sample size, the fewest clinical sites, and the lowest mean age, 69.0 years, at least 4.6 years lower than in the other trials. Its placebo group showed the greatest mean worsening on the ADAS-cog over 18 months, 9.10 points, compared with the next greatest, 8.14, and a median of 6.5 points among the trials. These apparent differences might reflect selection bias, random variation, or the imprecision of point estimates in smaller samples.
The tarenflurbil trials attempted to identify a milder population by restricting the MMSE score range to 20 to 26 [24, 25]. Despite scoring better on the ADAS-cog at baseline, however, this group did not perform better on the ADCS-ADL inventory, scoring within the 55.6 to 67.9 range of all the trials. Also, the mean changes on the ADAS-cog, ADCS-ADL inventory, and CDR over 18 months in these trials were similar to or larger than the changes seen in most of the trials that enrolled patients with lower MMSE scores. Moreover, the ADAS-cog changes were similar to the changes in the uncontrolled AD Neuroimaging Initiative (www.adni.org), which also enrolled mild AD patients (, footnote). These observations do not support the use of restricted MMSE scores to identify a reliably milder subgroup, and they contradict the suggestion that higher baseline scores will yield smaller changes over time.
It is not obvious why the two nearly identically designed xaliproden trials had the smallest ADAS-cog mean changes and effect sizes. Both were demographically and geographically comparable to the other trials. One distinction is that fewer patients in these trials used cholinesterase inhibitors and memantine than in the other trials. There might be unrecognized differences in how these particular trials were performed, perhaps with respect to sample selection, sites, trial management, cognitive test versions, or scoring methods. However, decline on the CDR and ADCS-ADL was also less in one of the trials than in the others, suggesting external and consensual validity to the observation.
A CGIC was used in only two of the 23 trials as the co-primary outcome assessing clinical significance, perhaps because of concern about the sensitivity and stability of raters and ratings over such a long period of time. Evidence here suggests otherwise. The mean CGIC scores at 18 months were 5.11 for the atorvastatin trial and 5.23 for the simvastatin trial, both in the minimal worsening range, and the effect size for the simvastatin trial, 1.27 SD units, was substantially larger than the effect sizes for the CDR, ADAS-cog, and ADL in all the other trials. This large effect, however, might result from the inherent expectation that patients will deteriorate or from the CGIC being, in fact, more sensitive to change than other measures [9].
Similarly, the CDR might be more sensitive to change than the ADAS-cog, in that the CDR effect sizes were nominally larger than the ADAS-cog effect sizes in all the trials that reported both (). Like the CGIC, the CDR relies on a clinician's judgment. However, it assesses the current extent of impairment (severity), not change from baseline, and might be less influenced by the raters' expectations of worsening. Future trials could consider the CDR as a main cognitive outcome as well, because CDR scoring is heavily weighted toward assessing orientation, memory, and problem-solving. One caveat is that although a rating might show greater sensitivity to change over time within a treatment group, it does not follow that it will be better able to distinguish the effect of a particular drug from placebo.
Cognitive decline in the placebo group was observed despite the use of cholinesterase inhibitors, and the rates of ADAS-cog decline at 6 and 12 months were similar to those of historical placebo groups not treated with these drugs [3, 26]. Moreover, the more recent trials in which more than 90% of patients used the drugs seemed to show greater worsening on the ADAS-cog than the trials in which fewer patients used the drugs at baseline. Trials that had substantial European samples and were started before memantine's U.S. introduction in 2004 had less memantine use than the more recent and predominantly North American trials, in which at least half to more than three fourths of patients used the drug. The amount of memantine use is especially surprising, considering that the drug is not indicated in the U.S. for patients with MMSE scores higher than 14, and the Food and Drug Administration specifically refused to approve a new drug application for memantine for mild AD because of lack of efficacy [27]. The potential effects of these drugs, indeed their continuing effects if any, could be better understood by analyses of individual patient data pooled from these trials.
The ongoing 18-month trials () all involve drugs expected to have anti–amyloid-beta actions and direct effects on the pathologic progression of AD. The phase II studies, with fewer than 240 patients included, are likely to be too small to show a reliable effect. By comparison, the ongoing phase III trials are virtually identical to each other and to the previous trials, with the main addition that the CDR is now the most common co-primary outcome, displacing ADLs. Although this might ultimately be demonstrated to be a reasonable choice, it is made absent evidence on its performance compared with ADLs or the CGIC.
The bapineuzumab phase III trials are unique in hypothesizing that tolerability and outcomes will differ by APOE genotype. Positive outcomes might lead to rather complicated labeling, with different indications, doses, efficacy, and safety considerations depending on a patient's APOE genotype.
Although durations of AD trials have increased to 18 months, none has shown statistical significance for the experimental intervention. Conceptual issues about the kinds of drugs that require this length of time to manifest their effects need to be addressed. Two important assumptions are that current drugs in development are expected only to attenuate worsening on outcomes, not improve them, and that the therapeutic effect will persist and yet might not be detectable for 18 months. Indeed, a 12-month duration of efficacy for the marketed cholinesterase inhibitors has been demonstrated in only a few trials, and trials in mild cognitive impairment suggest that efficacy might be only fleeting, if present at all, over the longer term [28–30]. Although these longer trial designs might be intended to observe changes in disease progression, it is not clear that they do anything other than extend the observation period for symptomatic effects.
Trial methods presuppose that patients decline generally in parallel with each other and do not drop out in excessive numbers. Inspection of available individual patient data shows that although, on average, placebo patients worsen, they do not worsen in parallel but rather “fan out,” showing broad intraindividual and interindividual variations as further indicated by the increasing SDs of ADAS-cog change over 18 months observed among the trials (). Moreover, on average, the cognitive progression is slight. The mean change of the ADAS-cog and the SDs over 18 months in these trials indicate that approximately 25% of placebo-treated patients worsen by no more than 1 point (eg, Z = (6.54 - 1)/8.17 = 5.54/8.17 = 0.68, corresponding to a lower-tail probability of 24.8%).
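The arithmetic in the preceding example can be checked with a short script. This is only a sketch: it assumes that 18-month ADAS-cog change in a placebo group is approximately normally distributed, uses the mean (6.54) and SD (8.17) quoted in the example, and the function name `share_at_or_below` is a hypothetical helper introduced here for illustration.

```python
from statistics import NormalDist

# Values quoted in the worked example in the text.
MEAN_CHANGE = 6.54  # mean 18-month ADAS-cog worsening, in points
SD_CHANGE = 8.17    # SD of the 18-month change, in points


def share_at_or_below(threshold, mean_change, sd_change):
    """Share of patients whose change is <= threshold.

    Assumption (not from the source): change scores are roughly normal.
    """
    return NormalDist(mu=mean_change, sigma=sd_change).cdf(threshold)


# Distance of the 1-point threshold below the mean, in SD units.
z = (MEAN_CHANGE - 1.0) / SD_CHANGE
share = share_at_or_below(1.0, MEAN_CHANGE, SD_CHANGE)

print(f"z = {z:.2f}")
print(f"share worsening by no more than 1 point = {share:.1%}")
```

Under the normality assumption the result is roughly one quarter of the placebo group, in line with the figure given in the text; small differences in the last digit depend on whether Z is rounded before the tail probability is looked up.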
The changes in the placebo groups from these trials are consistent with two observational studies. The Alzheimer's Disease Neuroimaging Initiative study in mild AD (ie, MMSE from 21 to 26) showed a mean ADAS-cog change of 4.3 points (SD, 6.4) at 12 months and 9.9 points (SD, 9.2) at 24 months, and the Real.fr study [31] reported a change of 4.02 points (SD, 6.83) at 18 months, both mainly in patients who had been receiving cholinesterase inhibitors since before study entry.
An ad hoc European expert group recommends that the ADAS-cog, ADCS-ADL scale, and CDR be used as outcomes because they are the most widely used in mild to moderate AD trials and “no available data suggest suitable alternatives” for disease-modifying trials. The group consensus was that a 2-point drug-placebo group difference at 18 months on the ADAS-cog should be the “minimal clinically important change (MCIC)” [6, 32]. This difference, however, represents a shift of the placebo effect by less than 0.25 SD units, and it might be legitimately questioned whether such a difference is indeed clinically meaningful, as some of the expert group members have done [33, 34], recommending elsewhere that a 7-point, within-patient ADAS-cog change be considered minimally significant [34]. At this threshold, of course, more than half of the placebo patients in these trials would have been considered not to have worsened. This expert group also recommended slopes analyses from the perspective that diverging slopes of decline between drug and placebo groups are evidence for disease modification, yet the broad variability in progression, the fanning out, through 18 months suggests that modeling group change as a slope might be misleading.
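Both quantitative statements in this paragraph can be approximated with a few lines of code. The sketch below is illustrative only: it reuses the placebo mean change (6.54) and SD (8.17) from the worked example earlier in the text as stand-ins for a typical trial, assumes normally distributed change scores, and the helper names are hypothetical.

```python
from statistics import NormalDist

# Illustrative values taken from the worked example earlier in the text.
MEAN_CHANGE = 6.54  # mean 18-month placebo ADAS-cog worsening, in points
SD_CHANGE = 8.17    # SD of the 18-month change, in points


def standardized_difference(points, sd):
    """Express a raw point difference in SD units."""
    return points / sd


def share_below(threshold, mean, sd):
    """Share of patients changing by less than threshold (normality assumed)."""
    return NormalDist(mean, sd).cdf(threshold)


# The proposed 2-point drug-placebo difference, in SD units.
mcic_in_sd_units = standardized_difference(2.0, SD_CHANGE)

# Share of placebo patients below a 7-point within-patient threshold.
not_worsened = share_below(7.0, MEAN_CHANGE, SD_CHANGE)

print(f"2-point difference = {mcic_in_sd_units:.3f} SD units")
print(f"share below 7-point threshold = {not_worsened:.1%}")
```

With these inputs the 2-point difference falls just under 0.25 SD units, and slightly more than half of the placebo group falls below the 7-point threshold. Other trials with different means and SDs would give somewhat different figures.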
At least two things need to occur in an 18-month trial to improve chances for detecting efficacy with current outcomes. A greater proportion of the placebo sample needs to measurably worsen on the primary outcome scale, and the drug group needs to improve over baseline to overcome the broad variances in change. In addition, sample sizes need to be fairly large, as large as the largest phase III trials, to reliably detect change. Because the change detected will be small and the placebo group will have changed only slightly, an interpretation of such change as clinically meaningful still will be controversial.
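The dependence of sample size on a small expected difference and a large variance can be illustrated with the standard normal-approximation formula for comparing two means. This is a sketch, not the method used in any of the trials: the two-sided alpha of 0.05, 80% power, equal group sizes, and absence of dropout are all assumptions made here, the SD of 8.17 is borrowed from the worked example earlier in the text, and `patients_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate completers needed per arm to detect a mean difference.

    Uses n = 2 * ((z_alpha + z_power) * sd / delta) ** 2 for a two-sided
    test with equal arms. Dropout is ignored (an assumption).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


SD_CHANGE = 8.17  # SD of 18-month ADAS-cog change, from the earlier example

# Required size for several hypothetical drug-placebo differences.
for delta in (1.0, 2.0, 3.0):
    n = patients_per_group(delta, SD_CHANGE)
    print(f"difference of {delta:.0f} points: about {n} patients per group")
```

Because the required number scales with the square of SD/delta, halving the detectable difference roughly quadruples the sample size, and any allowance for dropout over 18 months would push the enrolled numbers higher still.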
Many characteristics of these trials, such as the modest decline of the placebo groups, the expectation that new drugs will only attenuate this modest decline, heterogeneity and variability of clinical course over only 18 months, and the imprecision of the outcomes ratings, when considered together, markedly diminish the likelihood of discovering modestly effective drugs. Drugs with very different therapeutic expectations are developed by using essentially the same phase II and phase III clinical methods as were used with cholinesterase inhibitors and memantine [3, 4]. Current phase II and phase III AD drug development traces a rut wherein the development program for each new candidate drug is modeled on the most recent competitor's program, with limited regard for the drug's unique characteristics, therapeutic expectations, or methodologic limitations. The lack of precedent success is itself a barrier to drug development.
Further insight can be attained by pooling and analyzing individual patient outcomes from these trials. Trial methods can be advanced by collaboratively examining factors that might be important in the progression of illness and the ability to detect change. Pharmaceutical companies could collaborate by allowing their clinical trials data to be further examined and pooled with other trials to better assess and develop trials methodology.