Within each sample, QD and ADNI, the differences in AUCs between predictor models were similar, suggesting robustness and generalizability across outpatient settings. When advising patients and families about the likelihood of transition from MCI to AD, a predictor model with specificity over 80% is essential because a false positive rate of over 20% (specificity less than 80%) is clinically unacceptable [
14,
15]. In the predictor model, adding hippocampal and entorhinal cortex atrophy to age, MMSE, and the episodic verbal memory and function measures increased sensitivity only to a small extent at fixed specificities of 80% and 90%. These findings suggest limited added utility for MRI hippocampal and entorhinal cortex volumes to clinical assessment of memory and function in predicting transition from MCI to AD. In contrast, adding measures of episodic verbal memory and function to the model that combined age, MMSE, and hippocampal and entorhinal cortex volumes appreciably increased sensitivity for fixed levels of 80% and 90% specificity in both samples. In both studies, the model that included AVLT/SRT, FAQ, and hippocampal and entorhinal cortex volumes with age and MMSE showed the strongest predictive accuracy.
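The comparison above rests on reading sensitivity off at a fixed specificity. As a minimal sketch of that calculation (not the analysis code used in QD or ADNI), the Python below picks the lowest score threshold at which specificity among non-converters reaches the target and then reports sensitivity among converters at that threshold. The function name `sensitivity_at_specificity`, the convention that higher scores mean higher predicted risk, and the toy scores and labels are all assumptions made for illustration; none of these numbers come from either sample.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, specificity, sensitivity) at the lowest threshold
    whose specificity is at least target_specificity.

    Assumption: a patient is predicted to convert when score > threshold,
    and labels use 1 = converted to AD, 0 = did not convert.
    """
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must be between 0 and 1")
    converters = [s for s, y in zip(scores, labels) if y == 1]
    non_converters = [s for s, y in zip(scores, labels) if y == 0]
    if not converters or not non_converters:
        raise ValueError("need at least one converter and one non-converter")

    # Try candidate thresholds from lowest to highest; specificity can only
    # rise as the threshold rises, so the first hit is the lowest valid one.
    for threshold in sorted(set(scores)):
        true_negatives = sum(s <= threshold for s in non_converters)
        specificity = true_negatives / len(non_converters)
        if specificity >= target_specificity:
            true_positives = sum(s > threshold for s in converters)
            sensitivity = true_positives / len(converters)
            return threshold, specificity, sensitivity
    # Not reached: the highest score always yields specificity 1.0.
    raise RuntimeError("no threshold found")


# Hypothetical predicted risks (illustration only, not study data).
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.70,  # non-converters
          0.30, 0.45, 0.60, 0.80, 0.90]                                # converters
labels = [0] * 10 + [1] * 5

at_80 = sensitivity_at_specificity(scores, labels, 0.80)
at_90 = sensitivity_at_specificity(scores, labels, 0.90)
print("80% specificity:", at_80)
print("90% specificity:", at_90)
```

In this toy example, raising the required specificity from 80% to 90% moves the threshold upward and lowers sensitivity, which is why the models are compared at the same fixed specificity rather than at their individually best cut-points.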
For episodic verbal memory measures, numerical ranges and cutoffs for specific ages and education levels can inform the likelihood of transition to AD. Although delayed recall deficit is typical in AD, both immediate recall (incorporates learning) and delayed recall show comparable predictive accuracy for the transition from MCI to AD [
4]. The use of a single episodic memory measure in the predictor models examined does not replace the need for a comprehensive neuropsychological evaluation for diagnostic purposes [
4]. Informant reports of FAQ scores reflect instrumental, social, and cognitive functional impairments, but specific cutoffs for prediction of transition to AD are not established [
5,
16]. International efforts to standardize MRI imaging parameters and methods of volumetric assessment [
17], both of which have varied widely across studies, may lead to the development of specific cutoffs for hippocampal and entorhinal cortex atrophy that improve predictive accuracy.
The use of cognitive markers has some advantages over neuroimaging: objectivity in scoring, comparative economy in expense and time, and reliability. One argument is that episodic verbal memory should not be used as a marker because it is used for inclusion criteria and in the diagnostic process. However, evaluation of severity of episodic verbal memory deficit as a predictor in patients with amnestic MCI who have episodic verbal memory deficits is analogous to the established strategy of evaluating severity of depression as a predictor of clinical course and treatment response in major depression [
18]. Further, using memory test scores in prediction creates a statistical handicap, rather than an advantage, by restricting the range in baseline memory test performance [
12]. Of note, the AVLT memory measure examined as a predictor in this paper was not part of the study inclusion criteria in ADNI (WMS-R logical memory was used). The same rationale applies to the incorporation of the MMSE, which is widely used and clinically relevant, in predictor analyses even though it is part of the screening criteria for study inclusion.
Informant report of functional impairment using the FAQ was not part of the inclusion criteria in either QD or ADNI, and the definition of MCI by the original Petersen criteria requires the absence of significant functional impairment [
1,
2]. Therefore, the use of informant report of functional impairment is independent of the diagnostic criteria for MCI, and our findings indicate that this type of assessment is important in predicting transition to AD [
3,
5].
Clinical and neurobiological markers have been incorporated recently into diagnostic classification systems. An international panel used the terms “prodromal dementia” and “predementia” to indicate that neurobiological markers may identify patients with incipient AD who cannot be diagnosed clinically [
19]. The new NIA diagnostic criteria separate core clinical criteria from research criteria that employ neurobiological markers [
20], partly because diagnostic and predictive accuracy for neurobiological markers has not been fully developed and validated. Our results emphasize the need for such validation.
There have been few comparisons of predictor models between studies. In a comparison of ADNI to a Finnish study, classification performance did not increase after the inclusion of 10 variables that included CSF measures, apolipoprotein E ε4, MRI measures, age, and education [
21]. The overall model was not strong, possibly because key cognitive and functional measures were excluded. Another study compared different samples of patients with MCI who had 18FDG PET with generally positive results [
22] but without cut-points for clinical application. Our report represents a novel independent validation of predictor models that included clinical, memory, functional, and MRI measures. The consistency in the differences between models in each study indicates that this two-study comparison is broader and more clinically relevant than prior validation attempts [
21,
22].
From the ADNI database, several reports show moderate predictive accuracy for weighted scores within a global cognitive test [
23] and moderately strong predictive accuracy for specific neuropsychological test scores [
24], consistent with other studies [
4]. The best possible fit from a high-dimensional pattern classification approach using ADNI MRI data [
25] led to results similar to our report that used volumetric measures, but other MRI analytic strategies using ADNI data have led to lower predictive accuracy [
26,
27]. Entorhinal cortex volume enhanced prediction in both ADNI and QD in our comparisons, supporting the evaluation of entorhinal cortex volume as a predictor [
7].
There were some limitations to this paper. The two samples differed in sex and age distribution and in cognitive test scores. Significant episodic verbal memory deficits were required in ADNI, whereas QD used broader inclusion criteria, which may partly account for the higher transition rates in ADNI. In addition, different episodic verbal memory measures and different MRI volumetric assessment methods were compared. Nonetheless, within each sample the differences in AUCs were similar for several combinations of predictors. The high transition rate in ADNI suggests that some patients who still had a diagnosis of MCI at 3-year followup may convert in subsequent years, likely leading to a higher rate of false negatives in ADNI. This may partly explain the lower accuracy for predictor combinations in ADNI. In ADNI, the smaller number of patients at 3-year followup was partly related to some recently recruited patients not yet having had the opportunity to reach 3-year followup at the time of data analysis for this paper. This issue also precluded the use of survival analysis in this sample. In QD, we derived the strongest predictors from a set of a priori measures in a large neuropsychological test battery and examined comparable measures from the shorter ADNI neuropsychological assessment. While administering a comprehensive neuropsychological test battery is important for diagnostic purposes, our clinically relevant approach of examining individual measures facilitates comparison across studies and demonstrates the predictive strength of even a single episodic verbal memory test. Baseline MRI measures were examined because serial MRI measures were not available in QD. It remains unclear whether serial imaging measures are superior to baseline imaging in predicting long-term outcome [
28]. Serial imaging measures provide useful information about structural changes associated with disease progression, but they are expensive, not current clinical practice, and not useful in early converters. Cerebrovascular disease may contribute to cognitive decline in these patients [
19,
20]. However, hyperintensities, lacunes, and infarcts could not be assessed systematically in QD because of the MRI sequences obtained (no FLAIR or comparable sequence) and therefore could not be compared with ADNI. Absent neuropathological validation, we considered examining CSF measures from ADNI (not done in QD) for in vivo validation of transition to AD, but CSF was not collected in approximately half the ADNI sample and neuropathological validation of CSF tau and Aβ abnormalities has not been established.
In QD, the pathophysiological measure [
19] of olfactory identification deficits (not done in ADNI) strongly predicted transition to AD with limited overlap in prediction with the SRT and MRI measures [
3,
29]. In ADNI, 18FDG indices (not done in QD) significantly predicted transition to AD and were superior to the ADAS-cog [
8], but the ADAS-cog is a global cognitive measure used primarily in clinical trials of AD patients and is not established as a strong predictor of transition from MCI to AD. PET amyloid imaging discriminates among AD, MCI, and controls [
30] and correlates at autopsy with amyloid plaques [
9]. However, approximately 10–30% of healthy controls show increased amyloid uptake [
30] and whether these subjects have incipient AD needs confirmation in long-term followup studies. The sensitivity and specificity of CSF levels of Aβ42 and tau/phospho tau, and their ratio, for predicting MCI transition to AD in ADNI [
31] and in a European multicenter study [
32] ranged from 65% to 75%, which is slightly lower than that in other reports [
10,
11]. For CSF markers, further refinement of assay technique and validation in long-term followup studies are needed to establish more definitive cut-points for individual and ratio measures that have varied to some extent across studies [
10,
11,
32].
This report suggests that volumetric evaluation of medial temporal lobe atrophy adds only marginally to the information obtained by cognitive testing and assessment of episodic memory, and it cannot yet be recommended for wide clinical use to assess the risk of patients with MCI being diagnosed with AD during followup. In the clinic, visual inspection ratings are likely to lead to lower predictive accuracy than either the QD or ADNI volumetric assessments. Structural neuroimaging with MRI remains useful to rule out specific causes of cognitive impairment, such as stroke or tumor. A key conclusion from this report is that conducting neuropsychological evaluation is important, and interviewing family members or other informants about the patient's functioning may be at least as important as conducting an MRI scan. Several clinical and neurobiological markers, including cognitive test scores, functional ability, and MRI and 18FDG PET measures, are influenced considerably by age and other demographic factors, and their utility needs to be evaluated in more heterogeneous samples. The comparative predictive utility of clinical and neurobiological markers needs further assessment across different populations as these measures improve in predictive accuracy.