Here we show that stratifying MCI participants into dichotomized categories with respect to established AD biomarkers yields subgroups with different rates of clinical decline and brain atrophy, and correspondingly different potentially treatable effect sizes that can be leveraged to increase the efficiency of clinical trials. We further show that power for detecting change due to disease progression varies by outcome measure, so that the most powerful combination of outcome measure and enrichment strategy dramatically enhances the ability to detect therapeutic effects of investigational disease-altering treatments. The picture differs when CSF biomarkers are used to identify at-risk individuals in the asymptomatic stage. Small differences in atrophy rates relative to the control group were found for restricted brain regions, reaching significance for the amygdala and parahippocampal cortex. However, the variance relative to the small effect size, and the resulting extremely high upper bounds on the sample size estimates, suggest that preventive trials using even the most sensitive atrophy rate measure, let alone the standard clinical measure, would be prohibitively large.
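To illustrate why a small effect size relative to its variance inflates trial size, the following is a minimal sketch based on the textbook normal-approximation formula for a two-arm comparison of mean rates of change. It is an assumption made for illustration and should not be read as the procedure used to produce the estimates reported here. The function name, parameter names, and numeric inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(disease_related_rate, sd_rate, slowing=0.25, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-arm trial (illustrative only).

    disease_related_rate: mean rate of change attributable to disease
        (hypothetical input, e.g. % volume loss per year beyond controls).
    sd_rate: between-subject standard deviation of that rate (same units).
    slowing: fraction of the disease-related rate the treatment removes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired power
    delta = slowing * disease_related_rate         # detectable difference
    return ceil(2 * sd_rate ** 2 * (z_alpha + z_power) ** 2 / delta ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: same variance, large versus small disease effect.
    print(n_per_arm(1.0, 1.0))  # larger disease-related rate
    print(n_per_arm(0.5, 1.0))  # halved rate, same SD: about four times larger
```

Because the detectable difference enters as a square in the denominator, halving the disease-related rate at fixed variance roughly quadruples the required sample size, which is the pattern described above for the asymptomatic stage.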
As has long been known, a diagnosis of MCI does not reflect a homogeneous etiology: the category comprises individuals whose cognitive impairment may arise from a variety of causes, including AD pathology. Even among those with AD pathology, individuals are at different stages along the disease continuum, with corresponding differences in rate of expected decline. Given this heterogeneity, clinical trials aimed at the prodromal phase can benefit greatly from enrichment strategies that selectively enroll individuals on the basis of biomarker evidence of disease pathology. Not only can this ensure that enrolled individuals show the pathology that is targeted by the therapeutic agent under investigation (though Aβ pathology is most commonly targeted, therapies aimed at tau are also under investigation), it can also aid in the identification of individuals at increased risk of rapid disease progression, thereby enabling smaller and shorter duration trials. Alternatively, without enrollment restriction, biomarker stratification could enable potentially informative subgroup analyses.
In addition to providing a basis for clinical trial enrichment, structural MRI measures of change have emerged as the most promising biomarkers for detecting effects of therapy, whether beneficial or adverse, in AD clinical trials. They sensitively track the disease state, with rates of atrophy tending to accelerate as the disease progresses from preclinical to early AD dementia, and with regional rates of atrophy showing higher sensitivity than whole brain and clinical measures. Here, we observed that of the subregional measures, atrophy rate of the entorhinal cortex consistently provided the smallest estimated sample size, regardless of enrichment strategy. Atrophy rate for the amygdala was the next most powerful outcome measure, although sample size estimates obtained using this measure did not significantly differ from those obtained using the entorhinal cortex or the hippocampus as outcome measures. The relatively high power for rate of decline of the amygdala is in agreement with recent reports indicating that the amygdala is prominently affected in early AD. However, caution is warranted in interpreting the relative importance of the amygdala versus the hippocampus because of possible mislabeling of voxels for these ROIs due to their proximity and similar image contrast.
In contrast to MCI, there is a relatively high degree of similarity in rate-of-change outcome measures for HCs who may be in a preclinical stage of AD (those testing positive for CSF Aβ and ptau) and those unlikely to be in a preclinical stage of AD (those testing negative for CSF Aβ). Studies to date have not presented a clear picture on how amyloid is associated with increased brain atrophy rates in HCs. Bourgeat et al 
found that hippocampal atrophy was associated with β-amyloid deposition in the inferior temporal neocortex, as measured by PiB retention in PET imaging. Chételat et al 
recently found accelerated cortical atrophy, particularly in the middle temporal gyrus though not in medial temporal lobe structures, in cognitively normal elderly with PiB evidence of high β-amyloid deposition. It should be noted that cortical ‘atrophy’ averaged over the 54 PiB-negative participants appears to show large areas of the cortex expanding, particularly in sulcal regions, a biologically implausible effect that calls into question the accuracy of the method for serial MRI analysis. Effects that rely on differences between a study cohort and a control cohort should not be affected by additive bias, but recent findings of bias in image registration point to the need for establishing fidelity of longitudinal image analysis methods. Earlier, Fjell et al 
showed that in HCs with low levels of CSF Aβ, cortical atrophy rates were significantly correlated with CSF Aβ, particularly in regions not vulnerable in the early stages of AD. Desikan et al observed that atrophy rate in entorhinal cortex was associated with CSF Aβ only in the presence of ptau 
. Dickerson et al 
showed that a baseline MRI signature for AD – developed in a non-ADNI cohort – that was predictive of subsequent clinical decline in HCs was also associated with decreased CSF Aβ in HCs. Note that care must be taken when comparing results based on PiB, which binds to the neuritic – though not diffuse – amyloid plaques, and CSF Aβ for three reasons: (1) the CSF Aβ values are amyloid monomer concentrations 
, whereas PiB values reflect density of plaques composed of amyloid fibrils; (2) CSF Aβ is a global, not a local or regional measure of amyloid; (3) they are not correlates, but rather have different distributions with age, as shown in 
. Nevertheless, in the current study, a significantly elevated atrophy rate for CSF Aβ+ HCs relative to CSF Aβ– HCs was observed only in the isthmus cingulate (File S1, Table S2A). Atrophy rate in the parahippocampal gyrus and amygdala was significantly elevated in those additionally testing positive for ptau (File S1
The small difference in atrophy rates and rates of clinical decline observed here between HCs testing positive for CSF biomarkers and those testing negative implies that clinical trials, even if of longer duration than the typical 18 to 24 months, will lack power to detect treatment effects using currently available clinical or structural outcome measures. This conclusion is seemingly at odds with the results of a recent study by Schott and colleagues 
which reported that brain atrophy may be a useful outcome measure in preventive trials. In that study ADNI’s HCs were categorized with respect to CSF Aβ, using the same cut-off threshold applied here, and sample sizes estimated based on rate of atrophy of whole brain, hippocampus, and ventricles, using baseline and 12-month follow-up MRIs only; whole brain atrophy rate was calculated using the KN-BSI method 
, HMAPS with BSI 
was used for the hippocampus, and BSI was used for the ventricles. Results showed that, for a treatment effect reported to be equal to 48% of a disease effect calculated from rates of change in 40 Aβ+ HCs relative to rates of change in 65 Aβ– HCs, a sample size of 141 [86 to 287] participants per arm for whole brain atrophy as the outcome measure and 467 [197 to 2675] participants per arm for hippocampal atrophy as the outcome measure would provide 80% power at a significance level of 0.05. However, few clinical trials are powered on the basis of such a large effect size; most studies estimate sample sizes to provide sufficient power to detect a slowing in the disease-related rate of decline of 20% 
or 25%, as we have done here. Scaling Schott and colleagues’ results to an effect size of 25% slowing in disease-related atrophy, to enable comparison with this and prior studies, yields sample size estimates of 500 [317 to 1058] participants per arm for whole brain atrophy as an outcome, and 1722 [726 to 9861] participants per arm for hippocampal atrophy as an outcome. Though the large sample size, and large upper confidence bound, render hippocampal atrophy rate unsuitable for use as an outcome measure in a preclinical treatment trial, this analysis suggests that whole brain atrophy could be a feasible outcome measure in a large preclinical trial. However, there is another important difference in the analysis methods that must be considered. Schott and colleagues estimated sample sizes using two timepoints only: baseline and a single follow-up at 12 months. More reliable estimates of atrophy rates and associated variances, and sample sizes derived from these, would come from using all available follow-up timepoints (of which there are up to four, covering up to 36 months per HC participant), as we have done here. When we analyzed publicly available quality-controlled KN-BSI data for all available visits, as described in detail in 
, for the 39 Aβ+ HCs (including 4 converters) and 65 Aβ– HCs (excluding 2 converters) available, we obtained a sample size estimate for whole brain atrophy of 1179 [375 to 33090] per arm. As a check, we also analyzed the publicly available KN-BSI data using the baseline and 12-month time points only, and obtained an estimated sample size of 663 [307 to 2358] for 30 Aβ+ HCs (including 2 converters) and 53 Aβ– HCs (excluding 1 converter). This estimate is in reasonable agreement, given the smaller number of subjects available for our analysis, with the results of Schott and colleagues after translation to an effect size of 25% slowing in disease-related atrophy (sample size of 500 [317 to 1058] per arm). The sample size of 1179 [375 to 33090] participants per arm, with the large upper bound on the 95% confidence interval when all available time points are used, indicates that rate of whole brain atrophy is not feasible as an outcome measure for AD prevention studies if the effect size of interest is 25% slowing of disease-related atrophy.
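The translation of a published sample size from one assumed treatment effect to another can be sketched as follows, under the standard assumption that, with variance, power, and significance level held fixed, the required sample size varies as the inverse square of the effect size. The function name is hypothetical; the inputs are the hippocampal figures quoted above (48% treatment effect, 467 [197 to 2675] per arm).

```python
def rescale_sample_size(n, effect_from, effect_to):
    """Rescale a per-arm sample size to a different assumed treatment effect.

    Assumes n is proportional to 1 / effect**2, with variance, power,
    and significance level unchanged (illustrative assumption).
    """
    return round(n * (effect_from / effect_to) ** 2)


if __name__ == "__main__":
    # Hippocampal estimate and its confidence bounds, as quoted in the text.
    hippocampus_48 = (467, 197, 2675)
    hippocampus_25 = [rescale_sample_size(n, 0.48, 0.25) for n in hippocampus_48]
    print(hippocampus_25)  # [1722, 726, 9861]
```

The scaling factor here is (0.48 / 0.25)², about 3.69, so moving from a 48% to a 25% slowing multiplies every estimate and bound by nearly four.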
There is little information currently available on whether and how AD biomarkers change during the presymptomatic phase of the disease. Natural history studies of long duration will likely be required to establish estimates of biomarker trajectories in the presymptomatic phase so that estimates of the time to significant disease-related change can be established to inform needed duration of preventive clinical trials. Change in biomarkers of amyloid burden, which is thought to rise rapidly and subsequently rise more gently or even plateau during the predementia stage 
, might provide sufficient power in a clinical trial of reasonable duration, if the period during which these changes occur can be reliably identified. Given the known temporal-topographic amyloid plaque deposition pattern, detecting anti-amyloid therapeutic efficacy might further be enhanced by use of longitudinal subregional measures of amyloid deposition from PET imaging, requiring cross-modality registration of structural MRI with PET images.
While current structural measures do not provide feasible outcome measures for primary prevention trials, they can significantly reduce sample sizes compared with cognitive outcome measures in secondary prevention trials, aimed at the prodromal phase when mild impairment is evident. Using enrichment strategies to selectively enroll individuals at high risk of imminent decline can reduce sample sizes even further. However, a strict enrichment approach to clinical trial design means screening out many candidate participants. In ADNI, only about 23% of the MCI cohort would satisfy screening criteria if restricted to those testing positive for all biomarkers examined here, Aβ, Ptau, and atrophy; 77% would fail screening, making this a challenging selective enrollment strategy. The reduced costs enabled by the gain in power from selectively enrolling fewer participants would need to be balanced against the increased cost of screening out large numbers of individuals. Furthermore, given general difficulties in recruiting subjects in clinical trials 
, particularly when they may be associated with deleterious side effects, a selective enrollment criterion that eliminated the majority of potentially eligible candidates could make it very difficult to recruit a large enough sample. Lorenzi et al. 
explicitly assessed the screen-out cost for different single biomarker enrichment strategies, using change in ADAS-Cog and CDR-SB as outcome measures. They examined thresholds needed to either maximize inclusion of MCI-to-AD converters, or to minimize exclusion of these converters, where conversion took place within two years from baseline. The focus on participants who are known to convert in a short period, however, selects for younger participants 
and shifts standard thresholds more into the AD-range (e.g., the CSF Aβ threshold is shifted from 192 pg/ml to 165.8 pg/ml); the more pronounced AD phenotype selected leads to substantial reductions in sample sizes at the cost of a high rate of screen failures. Strategies that minimized exclusion of converters rather than maximizing their inclusion resulted in larger sample sizes, though still smaller than that of an unenriched trial, with a more acceptable rate of screen failures. This study did not examine enrichment that could be enabled by combinations of biomarkers, or examine structural outcome measures, as we have done here.
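The trade-off between enrichment and screen failures can be made concrete with a short calculation. This is a minimal sketch: the function name and the per-arm sample size of 100 are hypothetical, and only the 23% pass fraction comes from the ADNI figure quoted above. It ignores attrition and the per-candidate cost of biomarker assessment.

```python
from math import ceil


def candidates_to_screen(n_per_arm, pass_fraction, arms=2):
    """Number of candidates who must be screened to fill all trial arms.

    pass_fraction: share of screened candidates meeting the enrichment
        criteria (0 < pass_fraction <= 1).
    """
    if not 0 < pass_fraction <= 1:
        raise ValueError("pass_fraction must be in (0, 1]")
    return ceil(arms * n_per_arm / pass_fraction)


if __name__ == "__main__":
    # Hypothetical trial needing 100 participants per arm.
    print(candidates_to_screen(100, 0.23))  # enriched: all biomarkers positive
    print(candidates_to_screen(100, 1.0))   # no enrichment criteria applied
```

An enriched design is only cheaper overall if the reduction in enrolled participants outweighs the cost of the additional screening this calculation implies.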
In addition to weighing the costs of screen failures against improved trial power, ethical concerns must also be explicitly addressed during the design of a clinical trial that plans to incorporate an enrichment strategy 
. In such trials, individuals are likely to be informed of their biomarker status, and it is not yet clear what implications that may have for an individual’s future. Institutional review boards will have to be convinced that the risks associated with disclosure of risk status are adequately minimized before such trials can proceed. With the increasing move towards preventive trials, in which risk must be defined on the basis of biomarkers, much attention is currently focused on developing methods for accurately conveying biomarker risk information to potential participants while minimizing the negative effects of learning one’s risk status.
An alternative approach to enrichment strategies, which would ease recruitment and avoid the necessity of informing participants of their risk status, is to enroll a broader set of individuals, drawing a balance between selectively enrolling those at high risk while minimizing screen failures, then stratifying participants into biomarker-defined subgroups for analyses. This could determine whether a treatment that might not be effective in the full group showed promise in identifiable subgroups. Such subgroup analyses, and enrichment, could result in drug labeling requirements by regulatory agencies limiting prescription of a successful agent to those with the biomarkers used in the trial. However, given the current lack of any effective therapy for delaying the disease, and the enormous burden the coming epidemic will place on society, establishing efficacy even in a small subgroup would be a development of major importance, and one that could be followed by future trials on less select populations.
A different approach to stratification and enrichment for reducing sample sizes for MCI and AD treatment trials was recently proposed that increased effect sizes by reducing inter-individual variance through adjustment for several factors, including age, genetics, clinical measures of disease severity, baseline brain measures, and CSF biomarkers 
. The authors reported a 10–30% reduction in sample sizes with adjustment for 11 predefined variables. However, some variables might be identified as ‘nuisance’ variables 
, while others might be of crucial importance, depending on therapeutic targeting mechanisms. Thus, for example, if a treatment effect were found for a heterogeneous cohort, it could arise from a strong effect in a particular subset and little or no relevance or effect in another subset of participants. Therefore, though some ‘nuisance’ variability could be controlled for, subgroup analysis would still be needed to identify patients that might benefit most from a treatment, and those for whom risks might exceed the benefits.
A popular model of the sequence of biomarker changes in the AD pathological cascade postulates that amyloid deposition (and CSF Aβ-positivity) is an early event followed by neurofibrillary pathology (and CSF ptau-positivity), though this remains contentious. Since NFT pathology is strongly linked with synaptic and neuronal injury and loss, next in the postulated sequence of biomarkers is brain atrophy observable on MRI. Consistent with this, we found that in Aβ+
MCI individuals, annual atrophy rates were significantly higher for those who tested positive for ptau as compared with those who tested negative for ptau for all subregions examined, except the hippocampus. Interestingly, the hippocampus showed a trend for elevated atrophy rate earlier in the disease process, when evidence of Aβ pathology was present, but in the absence of ptau pathology. Although the statistical power is limited due to the low number of Aβ+
MCI participants, and bearing in mind that CSF measures are global and so do not fully inform on pathology within particular subregions, a possible interpretation of these findings is that elevation of the hippocampal atrophy rate is an early event occurring during the progression from the initial Aβ–
stage to the Aβ+
stage, with more widespread atrophy occurring at a later stage, when ptau pathology becomes evident. This interpretation is not obviously at variance with the neuropathological evidence, which shows that the entorhinal cortex and hippocampus are both affected by NFT lesions in pre-clinical Braak stage II, additionally with scattered neuritic plaques appearing in the CA1 region 
, while substantial neuron loss for both regions appears to begin in later Braak stages when clinical symptoms manifest: 35% in the entorhinal cortex and 46% in CA1 
. It is possible, perhaps likely, that the Aβ–
MCI participants do not have prodromal AD, but that their cognitive impairment (and subsequent dementia in the case of the seven who converted to a diagnosis of “AD” during follow-up) is due to some other condition, such as vascular dementia or hippocampal sclerosis.
It is also interesting to note that annual atrophy rates for the 48 MCI Aβ+ participants are relatively high, almost 2% per year for the entorhinal cortex, amygdala, and hippocampus, even though these participants do not exhibit a baseline atrophy pattern indicative of AD. However, 39 of these 48 participants are also ptau+, indicating that neuronal injury is likely taking place. Thus, although these participants have not yet lost substantial amounts of cortical tissue in AD-vulnerable areas, they are experiencing a rapid rate of degeneration in these areas.
A limitation of this study is that the ADNI HCs are not representative of the general population (although the MCI and AD cohorts have been shown to be representative of patients who might be recruited for therapeutic trials). Effect sizes for cognitively normal elderly Aβ+ individuals in a more representative sample might therefore differ from those found here. Also, our sample size estimates did not account for screening failures or patient attrition, which can significantly affect trial design.
Due to the failure of clinical trials of candidate disease modifying therapies to slow disease progression in patients already diagnosed with early AD, there is growing interest in conducting secondary and tertiary prevention trials and treatment trials for AD 
, targeting cognitively healthy individuals exhibiting biomarker evidence of the disease and those with mild cognitive impairment. In addition to arresting or slowing clinical decline, establishing disease-modifying properties of therapies will require demonstrating an effect on disease biomarkers. Structural MRI measures of change have emerged as the most promising biomarkers for detecting effects of therapy. The dominant component of structural atrophy is neuron loss, prior to which there will be synapse loss and reduction in neuropil complexity. In the preclinical stage of AD, cognition remains intact, reflecting the preservation of neurons, and structural atrophy on MRI is minimally different from that in older individuals who are not in the preclinical stage. In contrast, cellular biomarkers for AD, indicating advancing amyloid and tau pathologies, become manifest during this stage. Based on the observed atrophy rates in the HCs most likely to have preclinical AD, sample size estimates for preclinical trials are prohibitively large. Longer natural history studies of HCs likely to progress to AD are needed to inform on potential strategies for evaluating treatment effects in this group. It will also be important to take cohort age into account, as larger disease-related effects would be expected with younger cohorts.
In contrast to the preclinical stage, effect sizes are large enough in MCI cohorts to render clinical trials quite feasible at this disease stage. However, given the heterogeneity in etiology and in rates of change in outcome measures across individuals categorized as MCI, enrichment in this disease stage offers important benefits. MCI participants testing positive for the AD atrophy pattern at baseline (MRI+) are likely to be more advanced along the disease trajectory than those testing negative. As a result, stratification by this measure alone offers the single strongest enrichment. However, our results show that the presence of either CSF Aβ or ptau biomarker, regardless of atrophy status, is associated with increased rates of change. Thus, selective enrollment of individuals with the targeted pathology for either anti-amyloid or anti-tau compounds would offer the additional advantage of increasing trial power. For trials aimed at other putative disease targets, where selective enrollment based on amyloid or tau pathology may not be desired, analyses may be stratified by these biomarkers to enhance power for detecting effects in subgroups and to more finely monitor response to therapy by disease stage.
CDR-SB is the most sensitive clinical outcome measure used in clinical trials, and its power is strongly enhanced by enrichment. However, several subregional ROIs, particularly the entorhinal cortex, amygdala, and hippocampus, are significantly more powerful than CDR-SB or whole brain volume, the MRI measure currently used as a secondary outcome variable in clinical trials. The power of subregional MRI outcome measures is also enhanced by enrichment. MRI outcome measures have yet to be validated as surrogates for clinical outcome measures, a process that will require successful clinical trials, but they provide strong evidence for disease-modifying – and not just symptomatic – claims for therapies. The sensitivity of these measures, as demonstrated here, suggests that detecting efficacy of candidate therapies in MCI participants is unlikely to be a limiting factor in AD therapeutics research.