We examined the improvement in statistical power that could be obtained in therapeutic trials for early (pre-dementia) Alzheimer’s disease (AD) by constraining enrollment to individuals with amnestic mild cognitive impairment (MCI) and an atrophy pattern on a screening MRI previously found to be predictive of clinical decline, or to individuals with MCI and the apolipoprotein E ε4 genetic risk factor for AD. Treatable effects were defined either as absolute change or as change relative to healthy controls (HC). Data from 168 HC and 299 MCI participants were analyzed to determine sample sizes required to detect 25% slowing in mean rate of decline using global function, cognitive function, and structural measures as outcome variables. Reductions in estimated sample sizes of 10-43% were observed using the genetic enrichment strategy; reductions of 43-60% were observed with the neuroimaging enrichment strategy. Sample sizes needed to detect slowing in rate of atrophy in MCI relative to HC were dramatically larger than those needed to detect absolute change in atrophy rates. Constraining enrollment to MCI subjects with predictive atrophy on a screening MRI could improve the efficiency of clinical trials. Failure to take into account normal age-related changes risks under-powering trials designed to test disease-modifying properties of potential treatments.
There is growing concern that therapeutic interventions aimed at slowing or halting the neurobiological processes underlying Alzheimer’s disease may be only minimally effective when administered to individuals meeting clinical criteria for dementia. Clinical trials of such therapies are more likely to succeed if tested in individuals who are in an early, pre-dementia stage of AD but are nonetheless likely to experience clinical deterioration over a relatively short period of time, if left untreated. Individuals with Mild Cognitive Impairment (MCI) 1, 2 represent a possible target population for clinical trials since MCI is associated with an increased risk of progression to a diagnosis of probable AD, with rates of 5-16% per year relative to 1-2% for the general population 3-5. However, MCI is a heterogeneous condition, not a perfect predictor of AD: some individuals deteriorate rapidly, others remain stable for many years, and some revert to normal cognitive status. The ability to restrict enrollment in clinical trials to the subset of MCI individuals most likely to have pre-clinical AD and to experience rapid decline could improve statistical power of clinical trials, enabling the use of smaller sample sizes and shorter trial periods – factors that could substantially reduce the cost or required duration of the trial.
Here we investigate the relative benefit in statistical power that could be achieved by a genetic or a structural neuroimaging enrichment strategy. The apolipoprotein E (APOE) ε4 allele is a well-known risk factor for AD and confers a higher risk of developing AD in a dose-specific manner 6. Constraining enrollment to those MCI participants with at least one APOE ε4 allele is thus one potential method for enhancing the power of a clinical trial. Prior research suggests that structural atrophy, evident on MRI data, also confers a higher risk of rapid clinical decline in individuals with MCI 7-9. Thus another potential enrichment strategy is to constrain enrollment to MCI individuals who show, on a baseline structural MRI, an atrophy pattern previously shown to be predictive of rapid clinical decline 8. We assessed the impact of these potential enrichment strategies on statistical power for two commonly used outcome variables in AD treatment trials: the Clinical Dementia Rating Scale Sum of Boxes score (CDR-SB) 10, and the Cognitive subscale of the AD Assessment Scale (ADAS-Cog) 11. We also examined four structural measures as potential outcome variables, including non-specific measures that have been shown to be sensitive to progressive degeneration in AD (whole brain and ventricular volumes) 12, 13, 14 and mesial temporal structures (hippocampus and entorhinal cortex) that are early targets of AD pathology 15 and sensitive to changes in early and prodromal AD 12, 16-20. Sample sizes were estimated with and without considering the effects of normal aging.
Data used in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) 21, 22. ADNI is an ongoing longitudinal study carried out by a broad consortium of academic institutions and private corporations designed to test whether serial MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to effectively measure the progression of MCI and early AD. Determination of sensitive and specific markers of early AD progression may help researchers and clinicians develop new treatments and monitor their effectiveness, and lessen the time and cost of clinical trials. ADNI was launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration, private pharmaceutical companies, and non-profit organizations, as a $60 million, 5-year public-private partnership.
ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations. ADNI has recruited 229 cognitively normal individuals to be followed for 3 years, 398 people with amnestic MCI to be followed for 3 years, and 192 people with mild AD to be followed for 2 years (see www.adni-info.org). The research protocol was approved by each local institutional review board and written informed consent was obtained from each participant or participant’s guardian.
Data from 168 healthy control (HC) and 299 amnestic MCI participants were analyzed. ADNI general eligibility criteria are described at http://www.adni-info.org/index.php?option=com_content&task=view&id=9&Itemid=43. In brief, subjects are 55-90 years of age, generally healthy, non-depressed, have a modified Hachinski score of 4 or less, and a study partner able to provide an independent evaluation of functioning. Inclusion criteria for MCI subjects include Mini-Mental State Examination (MMSE) scores between 24 and 30, a subjective memory complaint, objective memory loss measured by education-adjusted scores on the Wechsler Memory Scale Logical Memory Test-II, a CDR of 0.5, preserved activities of daily living, and an absence of dementia 23. Full details of the ADNI study are publicly available at: http://www.adni-info.org/.
Subjects were included in this study if MRI data from the baseline scan and from at least one of the 6-, 12-, 18-, or 24-month follow-up intervals were available and met local quality review procedures. Seven HCs who converted to a diagnosis of MCI or AD during the time course of the study were excluded. Data included here reflect data available as of 08/24/2009 and processed locally as of 09/05/2009. Of the 168 HCs, 93% had acceptable 6-month follow-up MRI data, 86% had acceptable 12-month follow-up data, and 70% had acceptable 24-month follow-up data (HCs were not evaluated at 18 months in the ADNI). Of the 299 MCI subjects, 89% had acceptable 6-month follow-up data, 79% had acceptable 12-month follow-up data, 71% had acceptable 18-month follow-up data, and 61% had acceptable 24-month follow-up data.
Raw DICOM MRI scans (including two T1-weighted volumes per subject per visit) were downloaded from the public ADNI site (http://www.loni.ucla.edu/ADNI/Data/index.shtml). These data were collected across a variety of scanners with protocols individualized for each scanner 22, as defined at http://www.loni.ucla.edu/ADNI/Research/Cores/index.shtml. In our laboratory, MRI data were reviewed for quality, automatically corrected for spatial distortion due to gradient nonlinearity 24 and B1 field inhomogeneity 25, registered, and the two volumes per subject were averaged to improve signal-to-noise. Volumetric segmentation 26, 27 and cortical surface reconstruction 28-31 and parcellation methods 32, 33 based on the FreeSurfer software package were used to quantify regional atrophy. The morphometric procedures used to quantify regional thickness and volumes in baseline scans are described in detail elsewhere 34. Results of automated labeling of the hippocampus and entorhinal cortex are shown for a representative MCI participant in Figure 1. Approximately 5% of HC and 11% of MCI subjects’ baseline MRI data failed local quality review, primarily due to extreme white matter disease or atrophy.
To quantify longitudinal change in brain structural measures, all available MRI data from the 6, 12, 18 and 24 month follow-up visits were analyzed. Dual T1-weighted follow-up scans for each subject at each follow-up time point were corrected for spatial distortion due to gradient nonlinearity and B1 field inhomogeneity, rigid-body aligned, averaged, then affine aligned with 12 degrees of freedom to the subject’s baseline scan. A deformation field was calculated from nonlinear registration 35 and used to align scans at the sub-voxel level. The reduction of site-specific distortion effects and normalization of inhomogeneities improve the accuracy of the morphometric analysis 19 and minimize the effects of instrumental drift on atrophy rate estimates. The follow-up aligned image underwent skull stripping and subcortical segmentation, with labels applied from the baseline scan. For cortical reconstruction, surface coordinates for the white and pial boundaries were derived from the baseline images and mapped onto the follow-up images using the deformation field. Parcellation and labeling from the baseline image were applied to the follow-up image, giving a one-to-one correspondence between each vertex in the baseline image and the follow-up images. Visual quality control, blind to diagnosis, was performed on the volume change field to exclude cases with significant degradation in meaningful registration for at least one ROI, due to artifacts or major changes in scanner hardware between visits. The most common form of artifact was due to within-scan subject motion. Loss of scans due to motion artifacts may be greatly reduced in future trials by using real-time motion correction procedures 36, 37. Quality control procedures on longitudinal data resulted in rejection of approximately 10% of HC subjects and 12% of MCI subjects who had acceptable baseline MRI data, yielding the final sample size of 168 HC and 299 MCI.
To identify MCI individuals with the baseline atrophy pattern previously found to be predictive of decline, we used a linear discriminant model based on that obtained in our prior study 8. In that study, we trained a linear discriminant classifier on data from 139 HC and 84 AD subjects from the ADNI and found a pattern of baseline regional atrophy that discriminated HC from AD data with a fully cross-validated sensitivity of 83% and specificity of 93% 8. The pattern involved atrophy, relative to HCs, in the hippocampus, entorhinal cortex, middle temporal gyrus, bank of the superior temporal sulcus, isthmus cingulate, superior temporal gyrus, and medial and lateral orbital frontal gyri. Application of this discriminant model to the 299 MCI subjects here (including 151 subjects from our prior report 8) resulted in classification of 151 individuals as showing the atrophy pattern predictive of decline at baseline, and 148 who did not (Figure 2). Characteristics of the HC and the two MCI subgroups are shown in Table 1. For the genetic enrichment strategy, 170 MCI participants had at least one APOE ε4 allele, and 129 did not. Characteristics of these MCI subgroups are shown in Table 2.
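The classification step can be illustrated with a small two-class Fisher linear discriminant. This is only a sketch under stated assumptions: it uses two hypothetical z-scored features rather than the regional measures of the actual model, the data values are invented for illustration, and accuracy is computed on the training data rather than by cross-validation as in the prior study.

```python
def fit_lda(class0, class1):
    """Fit a two-class Fisher linear discriminant on rows of two features.

    Returns (weights, threshold). A row x is assigned to class 1 when
    weights . x exceeds the threshold, which is the projection of the
    midpoint between the two class means.
    """
    def mean(rows):
        return [sum(r[i] for r in rows) / len(rows) for i in range(2)]

    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class scatter matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]


def predict(model, row):
    """Return 1 if the row shows the class-1 (AD-like) pattern, else 0."""
    w, threshold = model
    return int(w[0] * row[0] + w[1] * row[1] > threshold)


def sensitivity_specificity(model, class0, class1):
    """Fraction of class-1 rows and of class-0 rows classified correctly."""
    tp = sum(predict(model, r) for r in class1)
    tn = sum(1 - predict(model, r) for r in class0)
    return tp / len(class1), tn / len(class0)


# Hypothetical z-scores (feature 1, feature 2); not ADNI data.
hc = [(0.2, 0.1), (0.5, -0.3), (-0.1, 0.4), (0.3, 0.6), (-0.4, 0.2)]
ad = [(-1.5, -1.2), (-2.0, -1.8), (-1.1, -2.1), (-1.8, -0.9), (-0.9, -1.6)]

model = fit_lda(hc, ad)
print(sensitivity_specificity(model, hc, ad))
# Apply the trained discriminant to a new (hypothetical) MCI subject.
print(predict(model, (-1.2, -1.0)))
```

In the study's workflow the discriminant is trained once on HC and AD data and then applied unchanged to each MCI participant's baseline measures, which is what the final `predict` call mimics.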
Differences between groups on demographic and outcome measures were assessed with chi-square analyses for categorical variables and analyses of variance (ANOVAs) for continuous variables. When significant main effects of group were observed in the ANOVA, post-hoc comparisons were performed to determine differences between pairs of groups, using Bonferroni corrections for multiple comparisons.
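As a brief illustration of the Bonferroni step, the sketch below adjusts a set of pairwise p-values for three groups. The group labels and the unadjusted p-values are hypothetical placeholders, not results from this study.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


groups = ["HC", "MCI without atrophy", "MCI with atrophy"]
pairs = list(combinations(groups, 2))   # three pairwise comparisons
raw_p = [0.030, 0.0004, 0.012]          # hypothetical unadjusted p-values

for pair, p_adj in zip(pairs, bonferroni_adjust(raw_p)):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(pair, round(p_adj, 4), verdict)
```

With three comparisons, an unadjusted p-value of 0.030 no longer meets the 0.05 threshold after correction, whereas 0.012 still does.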
The sample size required to detect 25% slowing in mean rate of decline for a hypothetical disease-modifying treatment versus placebo was estimated for a 24 month, two-arm, equal allocation trial, with a 6-month assessment interval. Power calculations were performed with the requirement that the trial have 80% power to detect the treatment effect using a 2-sided significance level of 5%. Power calculations are for a linear mixed effects model analysis 38 comparing mean rate of decline in the treatment arm versus the untreated arm using the formula:
$$ n = \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\left(\sigma_s^{2} + \sigma_\varepsilon^{2} \big/ \sum_{j}\left(t_j - \bar{t}\right)^{2}\right)}{\Delta^{2}} $$
where $n$ is the number of subjects per arm, $z_{1-\alpha/2}$ and $z_{1-\beta}$ are the standard normal quantiles corresponding to the significance level and power, $t_j$ are the assessment times with mean $\bar{t}$, $\sigma_s^{2}$ is the variance of the random slopes within the MCI group, and $\sigma_\varepsilon^{2}$ is the residual error variance of the mixed effects model. The detectable effect size, Δ, is set to 25% of the mean rate of decline observed in the MCI subjects when effects of normal aging are not taken into account (“absolute change”); otherwise the detectable effect size is set to 25% of the mean rate of decline observed in MCI subjects minus that observed in HC subjects (“relative change”). For structural outcome measures, change at each time interval was calculated as percent change from baseline values. Parameters for power calculations were estimated using the lmer function in the R package lme4. The 95% confidence intervals for the estimated sample sizes were obtained from 1000 bootstrap samples, where bootstrap resampling was with replacement. The number of values per bootstrap sample was equal to the number in the original sample within HCs and within MCI cases.
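The calculation can be sketched in Python as follows. This is a minimal illustration, assuming the standard per-arm sample size formula for comparing mean slopes in a linear mixed effects model; the function names and every numeric input are hypothetical rather than ADNI estimates. The bootstrap helper is a deliberate simplification: it resamples per-subject slopes directly instead of refitting the mixed model with lme4 on each resample.

```python
import math
import random
import statistics


def sample_size_per_arm(mean_slope, var_slope, var_resid, times,
                        slowing=0.25, alpha=0.05, power=0.80,
                        control_slope=0.0):
    """Subjects per arm needed to detect a fractional slowing in mean slope.

    Assumed formula:
        n = 2 (z_{1-alpha/2} + z_{power})^2 (var_slope + var_resid / S_t) / delta^2
    where S_t is the sum of squared deviations of visit times from their mean.
    control_slope = 0 gives "absolute change"; passing the HC slope gives
    "relative change".
    """
    delta = slowing * (mean_slope - control_slope)
    t_bar = sum(times) / len(times)
    s_t = sum((t - t_bar) ** 2 for t in times)
    norm = statistics.NormalDist()
    z = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return math.ceil(2 * z ** 2 * (var_slope + var_resid / s_t) / delta ** 2)


def bootstrap_sample_sizes(subject_slopes, var_resid, times,
                           n_boot=1000, seed=0):
    """Simplified bootstrap: resample hypothetical per-subject slopes with
    replacement (same number as the original sample) and recompute n."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_boot):
        resample = rng.choices(subject_slopes, k=len(subject_slopes))
        sizes.append(sample_size_per_arm(statistics.fmean(resample),
                                         statistics.variance(resample),
                                         var_resid, times))
    return sizes


def percentile_ci(values, level=0.95):
    """Percentile interval from a list of bootstrap estimates."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * len(ordered))]
    hi = ordered[min(len(ordered) - 1, int((1 + level) / 2 * len(ordered)))]
    return lo, hi


# 24-month trial with 6-month assessments, times in years.
visit_times = [0.0, 0.5, 1.0, 1.5, 2.0]

# Hypothetical inputs: slope 1.0 unit/year, slope variance 0.5, residual variance 0.25.
n_absolute = sample_size_per_arm(1.0, 0.5, 0.25, visit_times)
n_relative = sample_size_per_arm(1.0, 0.5, 0.25, visit_times, control_slope=0.48)
print(n_absolute, n_relative)

boot = bootstrap_sample_sizes([0.6, 0.8, 1.0, 1.2, 1.4, 0.9, 1.1], 0.25, visit_times)
print(percentile_ci(boot))
```

Passing a nonzero `control_slope` shrinks the detectable effect and therefore raises the required sample size, which is the mechanism behind the difference between the absolute and relative change estimates.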
To facilitate comparison with other studies that have estimated sample sizes needed to detect slowing in decline in MCI participants 39, 40, we also calculated sample sizes based on a linear mixed effects model ignoring between-subject variance in rate of change (i.e., with random intercepts but taking the group-specific rate of change as a fixed effect). These results are presented in Supplementary Table 1, Supplemental Digital Content 1, http://links.lww.com/WAD/A3.
Sample size implications of the genetic and neuroimaging enrichment strategies for clinical and structural outcome measures are summarized in Tables 3 and 4, respectively, and shown in Figure 3. A treatment trial using the CDR-SB as the primary outcome measure and employing the same recruitment criteria as the ADNI would expect a mean increase of 0.67 points per year on the CDR-SB in the placebo arm. If enrollment were constrained to individuals with predictive atrophy, an annual increase of 0.97 points would be expected. The resulting increase in power due to the larger potential treatment effect would enable a 58% reduction in sample size (from 492 to 207 subjects/arm), based on detecting a 25% reduction in change relative to HCs. This reduction in sample size is statistically significant, as indicated by the non-overlapping confidence intervals (Table 3). In contrast, if enrollment were constrained to MCI subjects with the genetic risk factor, the small increase in the potential treatment effect (0.72 points versus 0.67) would permit only a 10% reduction in sample size (from 492 to 443 subjects/arm). Similar results were observed for the ADAS-Cog outcome measure, where a larger potential treatment effect with a neuroimaging enrichment strategy relative to the full MCI group allowed for a 46% reduction in estimated sample size; the genetic enrichment strategy permitted only a 17% reduction in sample size.
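The percentage reductions quoted for the CDR-SB follow directly from the per-arm sample sizes. The short sketch below reproduces that arithmetic; the function name is illustrative, and the inputs are the subjects/arm figures given in the text.

```python
def percent_reduction(baseline_n, enriched_n):
    """Percent reduction in per-arm sample size relative to the unenriched trial."""
    return 100.0 * (baseline_n - enriched_n) / baseline_n


# Subjects per arm for CDR-SB, change relative to HCs (values from the text).
full_mci, imaging_enriched, genetic_enriched = 492, 207, 443

print(round(percent_reduction(full_mci, imaging_enriched), 1))  # 57.9
print(round(percent_reduction(full_mci, genetic_enriched), 1))  # 10.0

# Total participants saved in a two-arm trial with neuroimaging enrichment.
print(2 * (full_mci - imaging_enriched))  # 570
```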
For structural outcome measures, both the genetic and neuroimaging enrichment strategies resulted in smaller sample size estimates relative to that obtained using the full MCI group, but greater reductions were observed for the neuroimaging than for the genetic strategy (see Table 4 and Figure 3). Non-overlapping confidence intervals for estimates obtained with the neuroimaging strategy relative to the full MCI group indicated that the sample size reductions permitted by the neuroimaging enrichment strategy were significant for the hippocampus and entorhinal cortex outcome measures. Overall, the smallest estimated sample size, when the effects of age were taken into account, was observed for the neuroimaging enrichment strategy with the entorhinal cortex as the outcome variable (113 subjects/arm).
For all structural measures, HCs showed an annual rate of change that ranged from 30% of the amount observed in the full MCI cohort (entorhinal cortex), to 48% (whole brain volume). The reduction in the treatable effect size when change in HCs was subtracted from change observed in MCI participants resulted in substantial increases in sample size estimates (Table 4). For example, the sample size needed to detect slowing of atrophy in whole brain volume in MCI subjects relative to an untreated control group was 27% of the size needed to detect change in excess of that experienced by HCs (181 vs. 679 subjects/arm, respectively; Table 4).
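The size of this penalty follows from the inverse-square dependence of sample size on the detectable effect. As a rough check, and assuming the variance terms are the same in the absolute and relative calculations (both use the MCI group's variances), the ratio of the two sample sizes is approximately the square of the fraction of MCI change not attributable to normal aging. The function name below is illustrative.

```python
def absolute_to_relative_ratio(hc_fraction):
    """Ratio n_absolute / n_relative when HC change is hc_fraction of MCI change.

    With n proportional to 1 / delta**2 and
    delta_relative = (1 - hc_fraction) * delta_absolute,
    the ratio reduces to (1 - hc_fraction) ** 2.
    """
    return (1.0 - hc_fraction) ** 2


# Whole brain volume: HC change was about 48% of that in the full MCI cohort.
print(round(absolute_to_relative_ratio(0.48), 2))  # 0.27
# Ratio implied by the reported sample sizes (181 vs. 679 subjects/arm).
print(round(181 / 679, 2))                          # 0.27
# Entorhinal cortex: HC change was about 30% of MCI change.
print(round(absolute_to_relative_ratio(0.30), 2))  # 0.49
```

The agreement between the first two values suggests that the whole brain difference is explained almost entirely by the smaller treatable effect, not by any change in variability.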
Heterogeneity within the MCI population with regard to rate of disease progression is problematic for clinical trials, creating the need for large sample sizes and long follow-up periods to ensure adequate statistical power for assessing disease-modifying treatment effects. Strategies that would enable such trials to enroll a more homogeneous sample of MCI individuals who are at higher risk of imminent decline than the general MCI population could enhance the observed treatment effects, thereby increasing the power to observe those effects. Consideration of outcome variables and the definition of treatable effects are also vital to the design of clinical trials.
We evaluated the reduction in sample size that could be obtained by using two potential enrichment strategies relative to a trial using the same enrollment criteria as the ADNI, criteria that have been used in prior therapeutic trials 4. A genetic enrichment strategy, based on the presence of an APOE ε4 allele, could be easily and inexpensively implemented. However, restricting trial enrollment to those with an APOE ε4 allele could limit generalizability of the results since beneficial therapeutic effects and adverse side effects may differ as a function of genetic status. Furthermore, our results showed that although sample sizes were generally smaller with this strategy relative to an unenriched trial, the neuroimaging enrichment strategy offered even greater benefit.
Constraining enrollment to MCI individuals with a baseline pattern of atrophy previously found to be predictive of clinical decline would allow a 58% reduction in sample size using the CDR-SB as the primary outcome variable, or a 60% reduction in sample size, using change relative to controls in the entorhinal cortex as the outcome variable. Such a reduction in sample size would offer substantial savings in a clinical trial, exceeding the cost associated with the neuroimaging analysis required for this strategy. Since screening MRIs are routinely employed in clinical trials, collection of MRI data poses no additional expense. The semi-automated methods based on the FreeSurfer software package 26-31 used here to quantify regional atrophy in MCI subjects can be efficiently applied to large samples: MRI data processing required approximately 45 minutes per subject for a trained technologist to manually review and edit the cortical surface according to minimal, objective editing rules, with 24 hours’ computation time for image construction using a dual quad core Intel(R) Xeon(R) CPU E5420 with a processing speed of 2.50 GHz and 16 GB RAM. Use of several CPUs allows processing of multiple subjects’ scans to occur in parallel.
We also evaluated the sample size implications of two methods of defining treatable effect size: absolute change and change relative to controls. Since measures of global function are relatively stable in healthy individuals, use of absolute versus relative change had little impact on sample size estimates using CDR-SB as the outcome variable. For the ADAS-Cog, HCs showed a small improvement in scores over the 1 year period, presumably due to practice effects, whereas MCI subjects deteriorated, resulting in smaller sample size estimates for relative than for absolute change measures. Much larger differences in estimated sample sizes in the opposite direction were observed, however, for structural outcome measures.
Consistent with prior reports, we found that HCs exhibited significant 1 year reduction in whole brain, hippocampal and entorhinal cortex volumes, as well as ventricular expansion 19, 41-45. Since a treatment designed to halt or slow the progression of AD is unlikely to affect brain changes associated with normal aging, failure to take into account the magnitude of change experienced by HCs may lead to substantial underestimation in sample sizes needed to detect a beneficial treatment effect on disease-related brain atrophy. Prior studies have reported that failure to control for effects of normal aging would result in sample size estimates approximately 25-35% smaller than that required to detect a therapeutic effect on disease-related slowing of whole brain 14, hippocampal 46 or entorhinal 46 atrophy in patients with AD. Since the proportion of atrophy attributable to general aging is higher in individuals with MCI than in those with AD, we observed even greater differences in sample size estimates. For example, approximately 48% of the change in whole brain volume experienced in the full MCI sample could be attributed to general effects of aging, resulting in a sample size estimate 73% smaller than that needed to detect a reduction in disease-related whole brain atrophy.
Direct comparison of the current results to prior studies that have examined sample sizes needed to detect slowing in rate of whole brain 13, 14, hippocampal 17, 39, or entorhinal 46 atrophy, or ventricular expansion 13 in AD and MCI patients is hindered by differences in subject populations; number and timing of follow-up MRIs; MRI quantification methods; and statistical analysis methods. Nevertheless, comparison of our results to recent studies that have analyzed data from ADNI’s MCI participants reveals that our estimate for detecting slowing in rate of hippocampal atrophy is smaller than that reported by Schuff et al. 39, but consistent with the results of Hua et al. 40, who reported that 88 subjects per arm would be needed to detect 25% slowing in mean rate of temporal lobe atrophy. When we used a statistical model similar to that used by Hua et al. (i.e., by not fitting random slopes within the MCI group), we found that 73 subjects per arm would be sufficient for detecting 25% slowing in entorhinal atrophy and 95 subjects per arm for detecting slowing of hippocampal atrophy, relative to an untreated MCI group (see Supplementary Table 1, Supplemental Digital Content 1, http://links.lww.com/WAD/A3). However, failure to take into account between-subject variability in rate of progression may result in substantial sample size underestimation, as can be seen by comparing results in Tables 3 and 4 for the full MCI group with results presented in Supplementary Table 1, Supplemental Digital Content 1, http://links.lww.com/WAD/A3.
We investigated the implication of enrichment strategies using symptomatic changes in global and cognitive function, and structural atrophy as outcome measures, rather than attainment of the clinical diagnosis of probable AD (i.e., “conversion” to AD) within the period of the trial. As recently reviewed 47, symptomatic changes in cognition and function may be more effective outcome measures than diagnostic conversion since the definition of conversion is inherently arbitrary in a disease characterized by continuous progression. The lack of diagnostic precision can negatively affect the accuracy and generalizability of trial results 2, 47, 48. Furthermore, a continuous primary outcome measure, rather than a dichotomous measure based on conversion to probable AD, may allow for shorter treatment trials 47.
The ADAS-Cog is routinely used as a primary cognitive outcome measure in AD treatment trials, but has been reported to be suboptimal for use in MCI due to its lack of sensitivity at this mild stage of the disorder 47. The large sample size estimates needed to detect slowing in annual rate of change obtained here and in prior reports 39, 40 are consistent with this. The CDR-SB is another commonly used outcome variable in AD clinical trials, and has been found to be sensitive to longitudinal decline in MCI treatment trials 4, 49. Consistent with this, we found that CDR-SB was more sensitive than the ADAS-Cog, requiring a sample size approximately half that required by the ADAS-Cog to detect a treatment effect.
Structural outcome variables are of interest due to their face validity as markers of disease-modifying properties of a treatment, and to their reduced variability compared to symptomatic measures 12, 17. Of the structural measures assessed here, change in entorhinal cortex was the most sensitive outcome measure. Using an enrichment strategy of constraining enrollment to MCI individuals with atrophy predictive of decline, a sample size of only 113 subjects per arm would be needed to detect a 25% slowing in mean rate of decline relative to that experienced by HCs.
This study is limited by the lack of histological verification of clinical status. It is likely that some HCs have AD pathology 50, 51. To minimize the risk of including HCs with preclinical AD, HC subjects who converted to a diagnosis of MCI or AD during any follow-up visit were excluded. Additionally, some individuals with MCI may suffer from pathologies unrelated to AD. Continued follow-up with ADNI’s MCI sample may help address this issue. We note that the absence of the predictive atrophy pattern at baseline in MCI individuals does not imply that these individuals are not in a pre-clinical AD state. As a group, the MCI individuals without predictive atrophy experienced progressive clinical and structural decline consistent with AD, but progressed at a slower rate than MCI individuals with predictive atrophy. We also note that we did not consider the effect of study subject dropout in the power calculations reported here. Sample sizes for actual trials should be larger than the numbers reported here to account for loss of power due to dropout.
Results show that an enrichment strategy for clinical treatment trials of selectively enrolling MCI individuals with evidence of structural atrophy consistent with AD on a screening MRI can improve power to detect a treatment effect on global function and progressive structural deterioration. They also point to the necessity of taking into account the rate of change in healthy elderly when using structural variables as outcome measures to ensure that the trial is adequately powered to detect a disease-modifying effect of treatment.
We thank Robin Jennings, Michele Perry, Chris Pung, and Elaine Wu for downloading and preprocessing the ADNI MRI data.
Source of Funding: This research was supported by grants from the National Institute on Aging (AG031224; K01AG029218; RO3AG034439), and the National Center for Research Resources (#U24 RR021382).
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI; Principal Investigator: Michael Weiner; NIH grant U01 AG0-24904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and through generous contributions from the following: Pfizer Inc., Wyeth Research, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Merck & Co. Inc., AstraZeneca AB, Novartis Pharmaceuticals Corporation, Alzheimer’s Association, Eisai Global Clinical Development, Elan Corporation plc, Forest Laboratories, and the Institute for the Study of Aging, with participation from the U.S. Food and Drug Administration. Industry partnerships are coordinated through the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego (Principal Investigator: Paul Aisen; NIH Grant U01 AG0-10483). ADNI data are disseminated by the Laboratory of Neuro Imaging (LONI) at the University of California, Los Angeles.
Disclosure. Anders M. Dale is a founder and holds equity in CorTechs Labs, Inc, and also serves on the Scientific Advisory Board. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. Linda K McEvoy’s spouse is President, CorTechs Labs, Inc.