Low cerebrospinal fluid (CSF) amyloid-β1-42 concentration and high total-tau/Aβ1-42 ratio have been recommended to support the diagnosis of prodromal Alzheimer’s disease (AD) in patients with amnestic mild cognitive impairment (aMCI) and also to select patients for clinical trials.
We tested this recommendation with clinical trial simulations using patients from the Alzheimer’s Disease Neuroimaging Initiative who fulfilled the following entry criteria: (1) aMCI, (2) aMCI with CSF Aβ1-42 ≤192 pg/mL, and (3) aMCI with total-tau/Aβ1-42 >0.39. For each criterion, we randomly resampled the database to obtain samples for 1000 trials per trial scenario, planning for 1- or 2-year trials with 50 to 400 patients per treatment or placebo group, up to 40% dropouts, and outcomes on the AD assessment scale-cognitive subscale and the clinical dementia rating scale with effect sizes ranging from 0.15 to 0.75, and we calculated statistical power.
Approximately 70% to 74% of aMCI patients with CSF measures met biomarker criteria. The addition of the low Aβ1-42 or high tau/Aβ1-42 requirement resulted in minimal or no increase in the power of the trials compared with enrolling aMCI without requiring the biomarker criteria. Slightly larger mean differences between the placebo and treatment groups fulfilling biomarker criteria were offset by increased outcome variability within the groups.
Although patients with aMCI or patients with prodromal AD meeting CSF biomarkers criteria were slightly more cognitively impaired and showed greater decline than patients with aMCI diagnosed without considering the biomarkers, the requirement of biomarker-positive patients would most likely not result in more efficient clinical trials, and trials would take longer because fewer patients would be available. A CSF Aβ1-42 marker, however, could be useful as an explanatory variable or covariate when warranted by the action of a drug.
Using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), researchers concluded that low cerebrospinal fluid (CSF) amyloid-β1-42 concentrations or high total-tau protein to Aβ1-42 ratios in patients diagnosed with amnestic mild cognitive impairment (aMCI) predicted progression to Alzheimer’s dementia (e.g., 89% with high t-tau/Aβ1-42 ratio within 1 year). Those researchers suggested that biomarkers could be used as clinical trials entry requirements to improve the efficiency and reduce the sample sizes of trials. Some others have recommended CSF Aβ1-42 markers for this purpose as well [2-5], and at least one pharmaceutical manufacturer required similar biomarker criteria to support a prodromal Alzheimer’s disease (AD) diagnosis in a targeted design clinical trial.
Research criteria for prodromal or early AD, that is, AD before the onset of dementia, have been proposed on the basis of a patient fulfilling criteria equivalent to aMCI, with an episodic memory deficit that fails to improve on cueing, together with a biomarker associated with AD that could be identified and measured by either brain imaging or CSF Aβ1-42 or tau protein assays. Moreover, ad hoc groups have recommended that future clinical trials for prodromal AD could be made more efficient by introducing the requirement for a CSF Aβ1-42 biomarker [1,3]. Relying on the ADNI database [5,7], researchers calculated that, to demonstrate a 40% reduction in progression on clinical ratings, with 80% power, an alpha error P ≤ .05, and a 2-year drop-out rate of <40%, about 100 or 150 patients per drug and placebo group would be required for one or another primary outcome when patients are selected by using the biomarker criteria, as compared with twice as many without the biomarker criteria.
We empirically tested the potential efficiency of these recommendations by statistically simulating a range of clinical trials scenarios with aMCI patients with or without biomarker inclusion criteria using the same database on which the CSF biomarker recommendations were based.
ADNI is a natural history, nontreatment, observational study aimed at setting standards for brain imaging studies and biomarkers for diagnosis and treatment trials. A total of 59 sites, which were mostly academic, recruited 188 participants with mild AD (i.e., mini-mental state examination [MMSE] scores ranging from 21 to 26), 405 with mild cognitive impairment (MCI) (MMSE ranging from 24 to 30), and 229 with no cognitive impairment, who were followed up with regular assessments [5,7].
The MCI inclusion criteria, detailed in previously published data [5,7], are identical to criteria for the MCI of the amnestic type used in previous MCI clinical trials on cholinesterase inhibitors [8,9], which required a clinical dementia rating (CDR) score of 0.5 with the memory box scored at 0.5 or greater, and delayed recall from the logical memory II subscale of the Wechsler memory scale–revised to be ≤8 for 16 or more years of education, ≤4 for 8 to 15 years, or ≤2 for 0 to 7 years. Patients were required to be largely intact with regard to general cognition and functional performance, and could not qualify for a dementia diagnosis. As in most current clinical trials, participants could continue using marketed anti-dementia drugs if they had been on stable doses for at least 4 weeks before entry.
The main imaging measures and biomarkers include brain magnetic resonance imaging, positron-emission tomography, and CSF Aβ and tau protein concentrations [5,7]. The main clinical ratings reflected the following clinical trials outcomes: the Alzheimer’s Disease Assessment Scale-cognitive subscale (ADAS-cog), CDR, MMSE, and functional activities questionnaire. Clinical assessments were carried out at 6-month intervals over the first 2 years.
The ADAS-cog evaluates memory, reasoning, language, orientation, praxis, and word finding difficulty, and is scored from 0 to 70 errors. The CDR is used to rate impairment (from 0 = not impaired to 3 = severely impaired) in each of the following six categories: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care; the category ratings are summed into the CDR sum of boxes score (CDR-sb), a severity measure ranging from 0 to 18.
Simulations were conducted under a detailed protocol to reflect typical clinical trials for an experimental drug for aMCI or early AD with one treatment and one placebo group, a 1:1 allocation ratio, and parameters selected to be consistent with previously published trials [8,9]. For each trial scenario, a separate set of patients was constructed by randomly choosing from the ADNI dataset with replacement, that is, patients from the dataset could appear in the simulated groups more than once. Sample sizes of 50, 100, 200, and 400 per group were used; 12- and 24-month-long trials were considered; the ADAS-cog and CDR-sb were the primary outcomes. The placebo group outcome was the score for the patient at the specified time point in the ADNI database. For the treatment group, effect sizes from 0.15 to 0.75 in 0.10 increments were used to compute an expected treatment effect (or slowing of decline), reflecting very small to moderately large effect sizes. For each patient, an individual treatment effect was randomly generated from a χ2 distribution with a mean equal to the expected treatment effect to allow for a more realistic distribution of declines over time, where a few patients may fail or worsen more markedly than would be predicted by a normal distribution. The individual treatment effect was shifted by subtracting two times the expected treatment effect, and the result was then added to the patient’s score at the specified time point in the database. Therefore, even when a patient was reused in the analysis, the actual value used would be modified by this randomly selected amount in the treatment arm. In the placebo arm, use of the same patient would lead to a slight underestimation of the variance, thereby slightly improving the statistical power examined later in the text. Dropout rates of 20% and 40% in both the treatment and placebo groups were incorporated into the scenarios.
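The resampling and treatment-effect steps described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code (the published analyses were run in R): the function names, the list-based data layout, and the use of `None` for a missing observation are assumptions, and the expected treatment effect is taken as an input in score units because the text does not spell out how it was derived from the standardized effect size.

```python
import random

def chi_square(df, rng):
    # A chi-square variate with `df` degrees of freedom is Gamma(shape=df/2, scale=2);
    # its mean equals df, so df is set to the expected treatment effect.
    return rng.gammavariate(df / 2.0, 2.0)

def simulate_trial_arms(endpoint_scores, n_per_group, expected_effect, rng):
    """Draw placebo and treatment arms by resampling with replacement.

    endpoint_scores: hypothetical list of observed outcome scores at the chosen
    visit, with None where the observation is missing in the database.
    """
    # Placebo arm: the resampled patient's observed score is used unchanged.
    placebo = [rng.choice(endpoint_scores) for _ in range(n_per_group)]
    treated = []
    for _ in range(n_per_group):
        score = rng.choice(endpoint_scores)
        if score is None:
            treated.append(None)  # missing stays missing (simulated dropout)
            continue
        # Individual effect: chi-square draw shifted by minus twice the expected
        # effect, so its mean is -expected_effect (lower score = less worsening).
        individual_effect = chi_square(expected_effect, rng) - 2.0 * expected_effect
        treated.append(score + individual_effect)
    return placebo, treated

# Made-up endpoint scores, for illustration only.
rng = random.Random(2009)
scores = [12.0, 15.5, None, 9.0, 20.0, 11.5]
placebo_arm, treated_arm = simulate_trial_arms(scores, 50, 1.5, rng)
```

Because the χ2 draw is right-skewed, most simulated patients receive a benefit somewhat larger than the expected effect while a few receive none or worsen, which matches the stated motivation for not using a normal distribution.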
Patients were selected for the samples in the way they would be screened for entry into clinical trials, using the following three sets of inclusion criteria: (1) aMCI diagnosis at screening with the logical memory II delayed memory deficit defined previously; (2) aMCI with CSF Aβ1-42 ≤192 pg/mL; and (3) aMCI with t-tau/Aβ1-42 >0.39. The latter two criteria were specifically recommended as mentioned earlier, fulfill newly proposed research criteria for AD, and are consistent with criteria for a commercial prodromal AD clinical trial.
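The three inclusion rules can be written as a small filter. The sketch below is illustrative only: the dictionary keys (`amci`, `abeta42`, `ttau`) and the criterion labels are hypothetical names, not ADNI field names, and patients without CSF measurements are assumed to fail the two biomarker criteria.

```python
def meets_entry_criteria(patient, criterion):
    """Return True if a patient record satisfies the named inclusion criterion.

    patient: dict with hypothetical keys 'amci' (bool) and 'abeta42', 'ttau'
    (CSF concentrations in pg/mL, or None when no lumbar puncture was done).
    """
    if not patient["amci"]:
        return False
    if criterion == "amci":
        return True
    abeta, ttau = patient["abeta42"], patient["ttau"]
    if criterion == "low_abeta":
        # Criterion 2: aMCI with CSF Abeta1-42 <= 192 pg/mL
        return abeta is not None and abeta <= 192
    if criterion == "high_ratio":
        # Criterion 3: aMCI with t-tau/Abeta1-42 > 0.39
        return abeta is not None and abeta > 0 and ttau is not None and ttau / abeta > 0.39
    raise ValueError(f"unknown criterion: {criterion}")

# Made-up records, for illustration only.
patients = [
    {"amci": True, "abeta42": 150.0, "ttau": 90.0},
    {"amci": True, "abeta42": 240.0, "ttau": 60.0},
    {"amci": True, "abeta42": None, "ttau": None},
]
eligible_low_abeta = [p for p in patients if meets_entry_criteria(p, "low_abeta")]
```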
The primary analyses were conducted using a mixed effects linear model (covariance pattern model), which adjusts for missing data to test for differences between the baseline and endpoint values for the treatment and placebo groups. The mixed effects model minimizes bias and better controls for type I error in the presence of missing data. For each set of inclusion criteria, a full model was constructed with group effect, visit effect, and group-by-visit interactions, with age and gender as covariates, and a reduced model with visit, age, and gender effects. A compound symmetric covariance structure was used to model the correlation between visits for each participant. Parameters were estimated using maximum likelihood. Probability values for the group (treatment) effect were calculated using twice the difference in the logarithmic likelihood of the full and reduced models, which follows a χ2 distribution with the appropriate degrees of freedom. Secondary analyses examined last-observation-carried-forward samples (to impute missing values) and complete-case samples, using the nonparametric Wilcoxon test to detect differences between treatment and placebo groups because of the skewed distributions of the outcomes. For all analyses, the missing data pattern present in the ADNI database was used to realistically simulate dropouts; observations were missing in simulated datasets in cases where they were originally missing in the ADNI database.
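The likelihood ratio step can be illustrated with a short Python sketch. It assumes the two maximized log-likelihoods are already available from fitted full and reduced models (the fitting itself, done in the paper with the nlme package for R, is not reproduced here); the function names are hypothetical, and the degrees of freedom are passed in explicitly because the text only says "appropriate".

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base case of the regularized upper incomplete gamma function Q(a, y):
    # Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lrt_p_value(loglik_full, loglik_reduced, df):
    # Test statistic: twice the difference in maximized log-likelihoods.
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf(statistic, df)

# Made-up log-likelihoods; df=2 is an assumption for illustration.
p = lrt_p_value(-1200.0, -1203.0, df=2)
```

Maximum likelihood (rather than restricted maximum likelihood) matters here: the two models differ in their fixed effects, and likelihood ratio comparisons of fixed effects are only meaningful under ML.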
One thousand simulations were carried out for each scenario so that estimates of power could be reported to three decimal places. Power was calculated as the proportion of the 1000 simulated trials per trial scenario having a P value ≤.05. Analyses were performed using version 2.10.1 of the R programming environment. Mixed model analyses were performed using version 3.1-89 of the nlme package for R. The database was downloaded on December 7, 2009 from http://www.loni.ucla.edu/twiki/bin/view/ADNI/ADNIClinicalFAQ.
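The power estimate described above is a simple proportion, sketched below in Python. The Monte Carlo standard error is an addition for context and is not reported in the text; the function name and the example P values are hypothetical.

```python
import math

def estimate_power(p_values, alpha=0.05):
    """Empirical power: share of simulated trials with P <= alpha.

    Also returns the binomial (Monte Carlo) standard error of that share.
    """
    n = len(p_values)
    power = sum(1 for p in p_values if p <= alpha) / n
    mc_se = math.sqrt(power * (1.0 - power) / n)
    return power, mc_se

# Made-up P values: 800 of 1000 simulated trials reach significance.
power, mc_se = estimate_power([0.01] * 800 + [0.20] * 200)
```

With 1000 trials per scenario the estimate has a resolution of 0.001, but its sampling error is larger: near 80% power the standard error is roughly 1.3 percentage points, which is worth keeping in mind when comparing scenarios that differ by only a few percent.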
The three sets of inclusion criteria were similar with respect to demographic and clinical characteristics; patients were predominantly Caucasian, male, married, and highly educated, and 57.7% of those whose history was ascertained had a family history of dementia (Table 1).
Overall, 54.0% of the patients were APOE ε4 genotype carriers (one or two alleles) and 64% to 66% of the biomarker-positive patients were APOE ε4 genotype carriers. In all, 44.0% used cholinesterase inhibitors (84.9% donepezil), 9% used cholinesterase inhibitors and memantine, 2.4% used only memantine, and 53.5% used neither. Median duration of previous use of cholinesterase inhibitors was 0.97 years and for memantine it was 0.88 years.
Over 95% of the population was classified by investigators as MCI due to AD; 400 of 402 patients (99.5%) fulfilled criteria for aMCI. Aβ1-42 and total tau levels were determined in 199 and 196 patients, respectively, from CSF samples taken at enrollment. Mean and median Aβ1-42 levels were 163.6 pg/mL (SD = 54.8) and 146 pg/mL, respectively; 148 (74.4%) patients had CSF Aβ1-42 levels of <192 pg/mL. Mean and median total tau levels were 103.6 pg/mL (SD = 60.8) and 87 pg/mL, respectively; 137 (69.9%) patients had t-tau/Aβ1-42 of >0.39.
There were statistical trends for the biomarker-positive groups to have slightly worse rating scale scores. Among the aMCI patients, 95.5%, 90.0%, 80.6%, and 72.1% had outcomes available at 6-, 12-, 18- and 24-months, respectively; and 28.5% of them progressed to dementia at 24 months as compared with 35.8% of low Aβ1-42 and 38.8% of high t-tau/Aβ1-42 patients (P = .23).
Patients showed considerable heterogeneity in their clinical course and within each diagnostic group. Plots depicting change in ADAS-cog and CDR-sb over time for random samples of 25 participants in the aMCI and low Aβ1-42 diagnostic groups are available online in supplemental figures.
Power calculations for the mixed model analyses for the three groups are shown across a range of effect and sample sizes providing for 20% and 40% dropouts (Figs. 1-4). Power increased with increasing effect size and sample size, but there was generally little difference in power across the three inclusion criteria. For both the ADAS-cog and the CDR-sb, the requirement for biomarker criteria typically resulted in a 2% to 5% increase in power, and 7% in a few scenarios (Tables 2, 3; Figs. 1-4). Although there were greater mean differences between placebo and treatment groups with the biomarker criteria, there were also greater increases in variability, which tended to reduce the effect sizes.
Analyses of 12-month trials using the aforementioned scenarios also did not show a meaningful difference in power for the outcomes across the selection criteria. For both the ADAS-cog and the CDR-sb, aMCI alone typically resulted in a 2% to 4% increase in power as compared with the additional requirement for biomarker criteria. Secondary analyses of last-observation-carried-forward and complete-case samples showed similar, very small differences between diagnostic groups (results are available online in supplemental figures and tables).
ADNI was designed to provide information for future clinical trials and it is ideal for evaluating the benefits of using CSF biomarkers [1,3,5,7]. Expert-proposed targeted trial designs for AD, and the performance of Aβ1-42 diagnostic or predictive biomarkers under experimental, clinical trials conditions, have not been assessed previously. The results in this study provide an empirical estimation of the distribution and accuracy of clinical outcomes and potential biases for future AD trials that would use Aβ1-42 biomarkers or a prodromal AD diagnosis as entry criteria. The low Aβ1-42 and the high t-tau/Aβ1-42 criteria, when added to an aMCI diagnosis, did not meaningfully affect the efficiency of the trials as compared with the aMCI diagnosis alone.
In the more plausible trials scenarios of small effect sizes of 0.35 or less, 40% dropouts over 2 years, and 200 to 400 patients per group, the gain in power was typically 4% or less with either clinical outcome. This small gain must be weighed against the additional efforts of obtaining CSF, analyzing it, and excluding a proportion of aMCI patients. At least 26% of the ADNI aMCI patients who had lumbar punctures would not fulfill the biomarker criteria, increasing cost and time for recruitment by about one-third in exchange for very little or no gain in statistical power.
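The "about one-third" figure follows from the biomarker-positive fraction: if only about 74% of screened aMCI patients qualify, roughly 1/0.74 ≈ 1.35 patients must undergo lumbar puncture for each one enrolled. A minimal Python sketch of that arithmetic, with hypothetical function names and the simplifying assumption that every screened patient has usable CSF results:

```python
import math

def patients_to_screen(n_to_enroll, positive_fraction):
    # Number of aMCI patients needing CSF screening to enroll n biomarker-positive ones.
    return math.ceil(n_to_enroll / positive_fraction)

def screening_inflation(positive_fraction):
    # Extra screening effort relative to enrolling without the biomarker requirement.
    return 1.0 / positive_fraction - 1.0

# 74% positive (low Abeta1-42 criterion), 400 patients wanted for one arm.
needed = patients_to_screen(400, 0.74)
extra = screening_inflation(0.74)
```

For the t-tau/Aβ1-42 criterion, where about 70% were positive, the same calculation gives an inflation closer to 43%.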
The considerable heterogeneity among biomarker-positive participants is a likely explanation for our results. Despite greater clinical worsening of about 0.8 ADAS-cog and 0.4 CDR-sb points over 2 years in the biomarker-positive groups as compared with the overall aMCI group without regard to biomarkers, the standard deviations of the outcomes were larger, decreasing the power to detect treatment differences, that is, the within-group effect sizes were about the same. The use of these biomarker criteria for a targeted clinical trial may select from the extremes of the distribution, where increased within-group variability may offset any increase in mean difference between groups.
A notable difference between the ADAS-cog and CDR-sb outcomes was that the within-group effect sizes (i.e., mean change/SD of the change) were generally larger in the case of the CDR-sb. However, this did not translate to more efficient trials using the CDR-sb in preference to the ADAS-cog outcome in terms of treatment effects, power, and required sample sizes.
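The within-group effect size used here (mean change divided by the SD of the change) makes it easy to see why a larger mean decline need not improve power. The Python sketch below uses made-up change scores, not ADNI data, and the function name is hypothetical.

```python
import statistics

def within_group_effect_size(changes):
    # Mean change divided by the sample SD of the change scores.
    return statistics.mean(changes) / statistics.stdev(changes)

# Made-up change scores: the second group declines twice as much on average,
# but its spread is also twice as large, so the standardized change is identical.
all_amci = [1.0, 2.0, 3.0]
biomarker_positive = [2.0, 4.0, 6.0]

es_all = within_group_effect_size(all_amci)
es_positive = within_group_effect_size(biomarker_positive)
```

In this toy case both groups have a within-group effect size of 2.0, mirroring the observation that greater worsening in biomarker-positive patients was offset by larger standard deviations.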
Longitudinal studies [21-24], including ADNI [1,25], demonstrate that CSF Aβ1-42 and t-tau concentrations predict clinical progression in MCI patients, and that subgroups that progress to dementia at differential rates defined by CSF biomarkers can be identified [1,21,23-25]. Although this is a consistent finding, it has also been consistently observed that memory impairment and CSF abnormalities predicted approximately equal clinical declines, without differential sensitivity [24,26]. Similarly, in our analyses the biomarker-positive groups showed only fractional differences in mean baseline and change scores as compared with the aMCI group selected without consideration of CSF biomarkers. Therefore, it appears that positive CSF biomarkers, when obtained in a clinical research environment after an aMCI diagnosis is made (and therefore perhaps in clinical practice as well), may mainly identify more advanced aMCI or prodromal AD; and if so, then cognitive severity appears to be the more pragmatic predictor of decline [24,26]. Further evidence for this is that the 148 aMCI patients with low concentration values of CSF Aβ1-42 scored significantly worse on screening ADAS-cog, CDR-sb, logical memory-delayed, and functional activities than the 51 patients with high CSF Aβ1-42 (data not shown).
These results have substantial implications for clinical trials planning and interpretation. Assumptions that low CSF Aβ1-42 or high t-tau/Aβ1-42 are more relevant selection criteria for clinical trials are based on views that they aid diagnosis and index greater brain Aβ load and neurodegeneration. However, it is not known whether such biomarker-positive patients would be more likely to respond to an experimental drug or whether a therapeutic effect would be detected more readily. The opposite could be true, and targeted design trials that select only patients with Aβ1-42 biomarkers may inadvertently select those who are less likely to benefit because they are too advanced. In fact, the use of CSF Aβ1-42 biomarkers after a clinical aMCI diagnosis is made may not achieve the desired goal of identifying prodromal AD patients early enough in their illness course for a disease-modifying drug to show an effect.
Moreover, the efficiency of a targeted clinical trial design, where the premise is that the low CSF Aβ1-42 patients will both deteriorate more and be particularly responsive to treatment, depends on the effectiveness of the drug in both the biomarker-positive and -negative groups, the proportion of biomarker-positive patients in the sample, and the accuracy of the assay. When a small proportion of available patients are biomarker-positive and the drug has little benefit for biomarker-negative patients, choosing only biomarker-positive patients would indeed require fewer patients than a standard clinical trial design. In this study, 70% to 74% of aMCI patients were biomarker-positive, potentially limiting the usefulness of CSF Aβ1-42 for screening, and there was no meaningful effect on statistical power. For more efficient trials based on preferentially selecting biomarker-positive patients, the treatment in question must be substantially more effective in that group as compared with the biomarker-negative group. It is important to identify and validate biomarkers for diagnosis and prediction of both disease progression and treatment response when designing targeted clinical trials; however, CSF Aβ1-42 biomarkers may be differentially informative at different stages [4,24].
The results also demonstrate differences between modeling and simulations in estimating power for clinical trials. Typically, parameters for power calculations are obtained using summary statistics from reference groups and, assuming a range of effect sizes, corresponding sample sizes are calculated. This approach depends on the critical assumptions that the reference group adequately represents the characteristics of the planned trial sample and the summary statistics capture the heterogeneity among the trial participants. However, heterogeneity in the pattern of outcomes may be unrecognized using summary statistics, particularly, when the model requires scores to change linearly over time, and could explain why we observed no significant increase in statistical power in biomarker-positive patients, whereas others calculated greater power for the same sample sizes by using summary data. Therefore, the heterogeneity resulting from the sampling process of the simulations better anticipates the heterogeneity that would be observed in a prospective trial.
One limitation to making inferences from these results is that, although ADNI was meant to inform clinical trials methods, it is not itself a randomized trial. Patients volunteered for a study without planned treatment intervention in which lumbar puncture was optional and ratings were not done under the double-blinded conditions of a randomized, controlled trial. Investigators could obtain knowledge related to APOE ε4 genotype, clinical characteristics, test performance, course, severity, and medication use, which could in turn have influenced their diagnoses, clinical ratings, and decisions to perform lumbar punctures. Another potential limitation is that the substantial majority who underwent CSF examinations had low Aβ1-42 concentrations and high t-tau/Aβ1-42 ratios, and, although consistent with a European MCI sample, may not represent samples from broader communities or nonacademic clinics. Finally, although the use of cholinesterase inhibitors is allowed in all long-term AD clinical trials, the nearly half of patients using the drugs were slightly more impaired and declined more as compared with those not using them, and this may have affected illness course. The random treatment allocations and thousands of simulations ensured that results were not biased in this respect; however, future simulation studies and clinical trials might consider the potential effects of marketed medications on both internal and external validity.
In summary, selecting aMCI or prodromal AD patients for a clinical trial on the basis of CSF Aβ1-42 biomarker criteria will most likely identify relatively more severe patients and not enhance the statistical power of the trials. In the absence of a strong scientific rationale, it may be more practical and clinically relevant not to have Aβ1-42 CSF biomarkers as a criterion for trial entry in this setting and to restrict their use to explanatory or stratification variables when there are reasons to do so.
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI, NIA U01 AG024904) database (www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or the writing of this report. A complete list of investigators in ADNI is listed at: http://www.loni.ucla.edu/ADNI/About/About_InvestigatorsTable.shtml.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (NIH U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through contributions from the following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-Plough, Synarc, Inc., the Alzheimer’s Association and Alzheimer’s Drug Discovery Foundation, with participation from the U.S. Food and Drug Administration. Private sector contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514; the Dana Foundation; the USC Alzheimer’s Disease Research Center NIH P50 AG05142; and NIH T32 HL072757.
Contributors: All authors participated in the design of the study, statistical analysis, writing and editing of the manuscript, and all reviewed and approved the final draft.
Conflict of interest statement: [During the 36-month window period before submission] LSS reports being an editor on the Cochrane Collaboration Dementia and Cognitive Improvement Group, which oversees systematic reviews of drugs for cognitive impairment and dementia; receiving a grant from the Alzheimer’s Association for a registry for dementia and cognitive impairment trials; receiving grant or research support from Baxter, Elan Pharmaceuticals, Johnson & Johnson, Eli Lilly, Myriad, Novartis, and Pfizer; and having served as a consultant for or receiving consulting fees from Abbott Laboratories, AC Immune, Allergan, Allon, Alzheimer Drug Discovery Foundation, AstraZeneca, Bristol-Myers Squibb, Elan, Eli Lilly, Exonhit, Forest, GlaxoSmithKline, Ipsen, Johnson & Johnson, Lundbeck, Myriad, Medavante, Medivation, Merck, Novartis, Pfizer, Roche, Sanofi-Aventis, Schering-Plough, Servier, Toyama, and Transition Therapeutics.
REK declares that he has no conflict of interest. GRC reports having served on data and safety monitoring committees for AntiSense, Sanofi-Aventis, Bayhill, BioMS, Daichi-Sankyo, Eli Lilly, GlaxoSmithKline, Genmab, Medivation, Ono, PTC Therapeutics, Vivus, NHLBI, NINDS, and the NMSS; and having served as a consultant to or receiving consulting fees from Alexion, Accentia, Bayer, Barofold, Biogen-Idec, CibaVision, Enzo, Eisai, Genentech, Millenium, Novartis, Consortium of Multiple Sclerosis Centers, Peptimmune, Klein-Buendel Incorporated, Incyte, Somnus, Teva, Visioneering Technologies.