Behavioral markers measured through neuropsychological testing in Mild Cognitive Impairment (MCI) were analyzed and combined in multivariate ways to predict conversion to Alzheimer’s disease (AD) in a longitudinal study of 43 MCI patients. The test measures taken at a baseline evaluation were first reduced to underlying components (Principal Components Analysis, PCA) and then the component scores were used in discriminant analysis to classify MCI individuals as likely to convert or not. When empirically weighted and combined, episodic memory, speeded executive functioning, recognition memory (false and true positives), visuospatial memory processing speed, and visuospatial episodic memory were together strong predictors of conversion to AD. These multivariate combinations of the test measures achieved through the PCA were good, statistically significant predictors of MCI conversion to AD (84% accuracy, 86% sensitivity, and 83% specificity). Importantly, the posterior probabilities of group membership that accompanied the binary prediction for each participant indicated the confidence of the prediction. Most of the subjects (81%) were in the highly confident probability bins (0.70 – 1.00), where the obtained prediction accuracy was more than 90%. The strength and reliability of this multivariate prediction method were tested by cross-validation and randomized resampling.
Recent large-scale studies of Mild Cognitive Impairment (MCI) have suggested that not all patients with MCI will convert to Alzheimer’s disease (AD) (Petersen, 2004). As a result, identifying those with MCI who are likely to convert to AD is becoming increasingly important. Early identification of MCI patients who will convert to AD is essential to timely administration of pharmacologic and therapeutic interventions as well as to determining with some confidence which subjects with memory disorders are appropriate for various research studies. Amnestic MCI is a clinical diagnosis commonly characterized by a memory deficit which does not interfere significantly with activities of daily living (Petersen et al., 1999). This memory deficit must fall at least 1.5 standard deviations below age-adjusted performance on standardized tests of memory and should be corroborated by an informant (Petersen, 2004). While the majority of MCI patients have primary memory deficits, the behavioral heterogeneity of MCI is becoming clearer. Numerous reports now suggest that MCI can present initially as a primary impairment in other cognitive domains including language, visuospatial or visuoperceptual abilities (Mapstone, Steffenella, & Duffy, 2003), executive function, or even affect. The notion of multi-domain MCI complicates early diagnosis as there are numerous disorders which may cause subtle cognitive deficits in multiple domains. Because the primary cognitive deficit in most cases of MCI and AD is in the domain of memory, it is not surprising that memory performance, particularly new learning, recall, and retention, is a strong predictor of conversion from MCI to AD. However, some studies have suggested that at baseline other cognitive domains, including executive function and lower cognitive abilities, better predict conversion from MCI to AD (e.g., Rozzini et al., 2008).
Still other studies suggest that non-amnestic or multi-domain MCI patients convert to AD at lower rates than amnestic MCI patients (Maioli et al., 2007). These discrepancies highlight the behavioral heterogeneity of the clinical presentation of MCI and AD and the difficulties of applying findings based on groups of patients to individual patients.
Measuring group differences is an important aspect of understanding the differing cognitive processes between progressing to dementia and remaining stable; however, predicting individual outcomes is essential to early intervention in patients. Despite the great interest in identifying MCI patients who are at high risk for developing AD, presently there are no clinical or imaging markers that predict conversion to AD with certainty or reliably establish which MCI subjects will convert (Brayne, 2007; Marcos et al., 2006). While ongoing research emphasizes biological markers found in analyses of blood and cerebrospinal fluid (Bateman, Wen, Morris, & Holtzman, 2007; Papaliagkas, Anogianakis, Tsolaki, Koliakos, & Kimiskidis, 2009; Simonsen et al., 2007), in anatomical and functional brain imaging studies (Klunk et al., 2004), in event-related potential studies (Chapman et al., 2009; Chapman et al., 2007), and in other biomedical techniques, here we will investigate if behavioral markers measured through neuropsychological testing can be analyzed and combined in multivariate ways to predict conversion to AD. Because cognitive changes are a prominent and early feature of AD, focusing on neuropsychological markers for conversion would seem appropriate. Neuropsychological testing, which is relatively inexpensive and noninvasive to the patient, has long been used in the clinical assessment of AD.
Studies have shown that neuropsychological measures of episodic memory have good power in predicting MCI progression to AD (Albert, Blacker, Moss, Tanzi, & McArdle, 2007; Albert, Moss, Tanzi, & Jones, 2001; Bondi et al., 1994; Lekeu et al., 2010; Marcos et al., 2006; Perri, Serra, Carlesimo, & Caltagirone, 2007). Also, non-memory measures have been studied in the interest of increasing predictive success (Babins, Slater, Whitehead, & Chertkow, 2008; Bäckman, Jones, Berger, Laukka, & Small, 2005; Lekeu et al., 2010; Marcos et al., 2006; Rozzini et al., 2008; Tabert et al., 2006; Tierney, Yao, Kiss, & McDowell, 2005). In this article, we test whether employing multivariate methods to combine data from differing cognitive domains can predict conversion to AD in MCI patients. We will study the predictive power of the tests using a multivariate method with two levels of empirically derived weighting. The test measures will first be reduced to underlying components (Principal Components Analysis) and then the component scores will be combined in a weighted, linear fashion (discriminant analysis) to classify individuals. PCA resolves a correlation matrix of test measures from a set of subjects into underlying components, and each subject receives a component score for each component. This data reduction from many test measures to a few components is also key in reducing concerns about degrees of freedom in the discriminant analyses. We hypothesize these multivariate composite measures will be able to predict AD in MCI individuals with strong, statistically significant success. The reliability and influence of chance will be assessed through validation and randomized resampling analyses.
We studied 43 elderly individuals diagnosed with Mild Cognitive Impairment (MCI) (Table 1). These subjects were recruited from the Memory Disorders Clinic at the University of Rochester and other affiliated University of Rochester clinics. The MCI subjects were evaluated by memory-disorders physicians and met current consensus criteria for the amnestic subtype of MCI (“a-MCI”) (Petersen, 2004; Petersen et al., 1999; Petersen et al., 2001). (In this article, we will use the term “MCI” to refer to amnestic MCI.) Each MCI participant was subsequently found either to have converted to clinically defined AD (by the NINCDS-ADRDA criteria (McKhann et al., 1984) and DSM-IV-TR criteria for Dementia of the Alzheimer’s Type (American Psychiatric Association, 2000)) or to have remained stable with regard to cognitive state. These determinations were made at a later date through clinical follow-ups by the same memory-disorders physicians, who were blind to our study data. Those who converted were given the clinical diagnosis of “probable” AD (but referred to here as AD for brevity’s sake). Of the 43 MCI patients, 14 were subsequently diagnosed with AD (the Conversion to AD group, or Conversion group) and 29 were not (the Stable group). The clinical diagnoses of MCI and AD were based on the history, relevant laboratory findings, and imaging studies routinely performed as part of the clinical assessment of dementia (Petersen et al., 2001). Separate cognitive testing was performed by the memory-disorders physicians to assist with their diagnoses; these tests included the Mini Mental State Examination (MMSE) (Folstein, Folstein, & McHugh, 1975), a clock face drawing, the Auditory Verbal Learning Test (Rey, 1964; Taylor, 1959), and a category fluency task (animal naming).
With the exception of the MMSE, the clock face drawing, and the category fluency task (animal naming) (all of which had small weights in the components used in discrimination), no cognitive test used in clinical decision making was repeated as part of our experimental cognitive test battery described below. Thus, our study maintained relative independence between predictors and diagnostic outcomes (Tierney et al., 2005).
The median number of months between the initial diagnosis of MCI and the subsequent diagnosis of AD was 19.7 (interquartile range p25-p75 = 10.1-24.4) for the Conversion group. For the Stable group, the median number of months between the initial MCI diagnosis and the most recent clinical work-up was 19.6 (interquartile range p25-p75 = 10.1-27.6). The gender, age, and education demographics for each group appear in Table 1. There were no significant group or gender differences for age and education. In the Conversion group, 7 of the 14 individuals were taking cholinesterase inhibitors and/or memantine at the time of testing. In the Stable group, 13 of the 29 individuals were taking these medications. The proportions taking these medications were not significantly different between these groups (Fisher’s Exact Test, χ2 (1, N = 43) = 0.04, p = 0.69).
Exclusion criteria for all groups included clinical (or imaging) evidence of stroke, Parkinson’s disease, HIV/AIDS, and reversible dementias, as well as treatment with benzodiazepines, antipsychotic, or antiepileptic medications. As an additional inclusion criterion for our study, all clinical subjects had a previously administered score of 21 or higher on the MMSE (this criterion included AD subjects; the MCI subjects used in this study had mean MMSE scores of 25 or 27 (Conversion or Stable) as shown in Table 2). There was no significant difference between the two MCI subgroups in comorbid depressive symptoms (as shown through the Geriatric Depression Scale) or in impact of disease on daily activities (indicated by the Blessed Dementia Scale) (Table 2). In general, the mean scores for the Geriatric Depression Scale for each group were considered “normal” for depressive symptoms (Hickie & Snowdon, 1987).
Our study received IRB approval from the University of Rochester Research Subjects Review Board, and informed consent was obtained from each subject.
More subjects were used to develop the neuropsychological component structure with PCA. In addition to the 43 MCI subjects used in discriminant analysis, for the PCA we also included 55 elderly individuals diagnosed with AD, 78 individuals with normal cognition (Controls), 5 individuals diagnosed with Age-Associated Memory Impairment (Crook et al., 1986), and 35 more MCI subjects (totaling 216 subjects). These 35 additional MCI individuals did not return for follow-up evaluations and therefore their subsequent clinical outcomes were not known at the time of this analysis. Enlarging the set of subjects for the development of the component structure was done for several reasons. First, increasing the number of observations added stability to the resultant structure. Second, including a variety of subject groups in the creation of the component structure allowed for greater generalizability to the population (Chapman, Mapstone, McCrary et al., 2010; John, Easton, Prichep, & Friedman, 1993). Third, using data from only one group would risk restricting the range in the test measures and attenuating correlations among variables, which could result in falsely low estimates of component loadings (Fabrigar, Wegener, MacCallum, & Strahan, 1999). This risk is reduced by involving data from multiple groups of individuals.
These additional subjects were also evaluated by the same memory-disorders physicians from area clinics. Demographic information for the additional subjects also appears in Table 1 (for more detailed demographic and neuropsychological information concerning the AD and Control subjects, see Chapman, Mapstone, Porsteinsson, et al., 2010).
The neuropsychological battery we administered to each MCI subject contained 17 common tests (total of 49 measures) (Table 2) that target all eight cognitive domains as defined by the NINCDS-ADRDA criteria, particularly memory. We designed the battery to produce a comprehensive sample of cognitive processes and their degeneration in AD. Among others, the tests included measures of memory retrieval and retention, generative fluency, executive function, visuospatial abilities, and attributes of mood and daily living. Each participant’s battery of raw scores was transformed to standard scores using established age/education corrected normative data when possible or laboratory-derived data (normal elderly) when published norms were not available. Normalizing the data limited the influence of age, education, and gender effects. The standard z scores were used for all statistical analyses described in this paper.
Group mean differences for each of the 49 neuropsychological test measures are included as baseline characterization of the Conversion and Stable groups.
Principal Components Analysis (PCA) was used to develop the component structure from the battery of neuropsychological tests. The 216 AD, MCI, and normal participants (observations) and 49 test measures (variables) were submitted to a PCA using the correlation matrix and with Varimax rotation (Kaiser, 1958). PCA produced both component loadings and component scores (Chapman & McCrary, 1995). The component loadings (the general underlying structure of the neuropsychological test results) were used to derive interpretations of the components by relating the test measures to the component structure.
PCA was an important step in our data analysis in three ways. First, it revealed underlying cognitive dimensions implicit in neuropsychological test performance through the component loadings. The loadings related the test measures to the components through each measure’s weighted contribution to the components’ structure (Albert et al., 2007; Carroll, 1993; Chapman, Mapstone, McCrary et al., 2010; Chapman, Mapstone, Porsteinsson et al., 2010; Harman, 1976). Secondly, PCA achieved data reduction by remapping the 49 test measures to a smaller number of component scores via the component structure without being influenced by the group to which the subjects belonged. These were important advantages both in organizing similar test measures into components and in reducing the number of variables while retaining the contributions of all the measures. Finally, PCA permits direct and easy computation of the component scores.
After PCA, the component scores of the 43 MCI participants were retained for discriminant analysis. This analysis developed discriminant functions, based on Bayesian posterior distributions (Ingelfinger, Mosteller, Thibodeau, & Ware, 1983), that predict individuals who will likely convert to AD or likely remain stable. Discriminant analysis provided the posterior probability of group membership for each subject as an integral part of the computation, which adds a key quantitative context when analyzing binary predictions of individuals.
First, a stepwise selection (PROC STEPDISC of SAS) was used to find a subset of the component scores to use as predictors in the analysis. A reduced set of predictors was desirable, and the stepwise discriminant procedure used statistical criteria to determine order of entry. A probability-to-enter criterion of 0.20 was used to ensure entry of important variables that best revealed differences between the Stable and Conversion groups. Then, the subset of selected components was used in a second multivariate procedure (PROC DISCRIM of SAS) to compute linear discriminant functions for classifying individuals. The linear discriminant function consisted of the weights (coefficients) to be used with each of the input variables. Each subject was classified into the group from which that individual had the smaller generalized squared distance.
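The link between generalized squared distance, group assignment, and posterior probability can be sketched as follows. This is a minimal illustration, assuming the usual normal-theory rule in which the posterior for group t is proportional to exp(-0.5 × D_t²); the function names and the example distances are hypothetical and are not values from this study.

```python
import math

def posterior_probabilities(sq_distances):
    """Convert generalized squared distances (one per group) to posteriors.

    Assumes p(t | x) = exp(-0.5 * D_t^2) / sum_u exp(-0.5 * D_u^2).
    """
    # Subtract the smallest distance first so the exponentials stay well scaled.
    d_min = min(sq_distances.values())
    weights = {g: math.exp(-0.5 * (d - d_min)) for g, d in sq_distances.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def classify(sq_distances):
    """Assign the group with the smaller generalized squared distance."""
    return min(sq_distances, key=sq_distances.get)

# Hypothetical distances for one subject (not values from the study).
example = {"Conversion": 1.2, "Stable": 4.0}
post = posterior_probabilities(example)
predicted_group = classify(example)
```

With two groups, the group at the smaller distance always receives a posterior above 0.50, so the binary assignment and the posterior agree by construction.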
Afterward, a jackknifed cross-validation was performed in which the data from each individual were left out when the coefficients used to assign that individual to a group were computed. Thus, a new discriminant function was developed for and tested on every subject individually. Jackknifed cross-validation gives a more realistic estimate of the ability of predictors to separate groups, and bias in classification is eliminated when the same predictors are forced into the equation, as was done here (Hora & Wilcox, 1982; Lachenbruch, 1975; Tabachnick & Fidell, 2001). We chose this method considering sample size limitations and our desire to use as much data as possible in the development of the discriminant function (Johnson & Wichern, 2002). This solution stability coupled with elimination of bias in classification makes for a better approach given a limited, fixed sample size. Hora and Wilcox (1982) indicated that the one-left-out method is a “superior alternative” to a split-half method, which has an unfortunate effect of reducing the effective sample size.
Required sample size depends upon a number of issues, including expected effect size and number of predictors (Tabachnick & Fidell, 2001). Green (1991) provides a thorough discussion of these concerns and some procedures to determine an appropriate number of cases, including a more complex rule of thumb that takes effect size into account. Expecting the squared multiple correlations to be 0.2 or greater, we computed a necessary sample size of 42 individuals, given 11 predictor variables. Our sample size of 43 individuals exceeds this rule of thumb. Additionally, the sample size of the smallest group should be larger than the number of predictor variables (Tabachnick & Fidell, 2001), which is true in our sample.
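The figure of 42 can be reproduced with a short calculation. The sketch below assumes the rule of thumb takes the form N ≥ 8/f² + (m − 1), with f² = R²/(1 − R²) and m predictors; that formula is not stated in the text and is supplied here as an assumption, although it does return the reported value.

```python
import math

def green_sample_size(r_squared, n_predictors):
    """Minimum N from the assumed rule N >= 8 / f^2 + (m - 1), f^2 = R^2 / (1 - R^2)."""
    f_squared = r_squared / (1.0 - r_squared)
    n_required = 8.0 / f_squared + (n_predictors - 1)
    # Round before taking the ceiling so floating-point noise cannot add a subject.
    return math.ceil(round(n_required, 6))

required = green_sample_size(r_squared=0.2, n_predictors=11)
```

With R² = 0.2, f² = 0.25, so 8/0.25 = 32 and 32 + 10 = 42.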
Nevertheless, to substantiate empirically that the sample size is sufficient in the present set of data, a randomized resampling procedure was done to assess baseline discriminant performance to compare with our nonrandomized performance. Classification success after randomizing the data largely depends on capitalizing on chance. We randomized our MCI sample such that each subject was randomly placed in a pseudo-Conversion or pseudo-Stable group regardless of his or her clinical diagnosis. The constraint of 14 members in the pseudo-Conversion group and 29 members in the pseudo-Stable group was maintained. The subset of components best able to discriminate between these pseudogroups was collected by stepwise discriminant analysis (PROC STEPDISC) and used in classification analysis (PROC DISCRIM), the methods being the same as those used with our nonrandomized (real) data. We randomized our subject groups 50 times, performed stepwise and discriminant analyses on each randomization, and collected the predictive accuracies for the development and cross-validation of the pseudogroups.
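The relabeling step of this procedure can be sketched as below. Only the random assignment to pseudogroups is shown; the stepwise selection and discriminant analysis that would be rerun on each randomization are omitted. The function name and the fixed seed are illustrative choices.

```python
import random

def make_pseudogroups(true_labels, n_conversion, rng):
    """Randomly relabel subjects, keeping the pseudo-Conversion group size fixed."""
    indices = list(range(len(true_labels)))
    pseudo_conversion = set(rng.sample(indices, n_conversion))
    return ["Conversion" if i in pseudo_conversion else "Stable" for i in indices]

true_labels = ["Conversion"] * 14 + ["Stable"] * 29
rng = random.Random(0)  # fixed seed so the sketch is repeatable

randomizations = [make_pseudogroups(true_labels, 14, rng) for _ in range(50)]

# Count truly Stable subjects that land in the pseudo-Stable group each time.
stable_overlap = [
    sum(t == "Stable" and p == "Stable" for t, p in zip(true_labels, pseudo))
    for pseudo in randomizations
]
```

Because the pseudo-Stable group holds 29 subjects and only 14 subjects are true converters, at least 15 truly Stable subjects fall in the pseudo-Stable group on every randomization.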
Statistical analyses were computed with SAS 9.1.3 (SAS Institute Inc., 2002). The primary procedures were the MULTTEST, FACTOR, STEPDISC, and DISCRIM procedures. These have also been applied to neuropsychological tests used to classify AD from normal elderly (Chapman, Mapstone, Porsteinsson et al., 2010), as well as applied to brain Event-Related Potentials used to study AD (Chapman et al., 2007) and MCI conversion to AD (Chapman et al., 2009). To evaluate the statistical significance of the classification results, we applied Fisher’s Exact Test with an alpha level of 0.05. This test is appropriate because each individual is placed in a cell in a 2×2 contingency table: test classification of Conversion or Stable by clinical diagnosis of Conversion or Stable. We corrected for multiple comparisons in the analysis of group mean differences with Bonferroni adjustments. Also, p values calculated from the Fisher’s Exact Tests on classification results were corrected with a Bonferroni adjustment (Shaffer, 1995).
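For readers without access to SAS, the 2×2 test can be sketched in a few lines. This is a minimal two-sided Fisher's Exact Test that sums the hypergeometric probabilities of all tables no more probable than the observed one; the function name is hypothetical, and the counts are the development-set classification counts reported later in this article.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9))

# Rows are clinical outcome (Conversion, Stable); columns are the predicted
# outcome (Conversion, Stable): 12 and 2 for converters, 5 and 24 for stable.
p_development = fisher_exact_two_sided(12, 2, 5, 24)
```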
The group mean scores for each of the 49 neuropsychological test measures for the Conversion and Stable groups are shown in Table 2. Generally the Conversion group performed worse than the Stable group, particularly on measures of retentive memory. However, none of the 49 test measures had a statistically significant group difference between the Stable and Conversion groups when adjusted for multiple comparisons.
Using mainly Kaiser’s (Eigenvalue > 1) criterion (Kaiser, 1960) as a guideline, we obtained 13 distinct, orthogonal, and interpretable components in the component structure. These 13 components accounted for 77% of the total variance of the data and included a General Episodic Memory component, a Generative Fluency component, a Speeded Executive Function component, a Mood/Activities of Daily Living component, and other components representative of learning and recognition memory. These neuropsychological components have been shown to have strong discriminatory power in differentiating AD from normal aging (Chapman, Mapstone, Porsteinsson et al., 2010).
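Kaiser's criterion and the share of variance accounted for can be illustrated with a small sketch. The eigenvalues below are hypothetical (a six-variable example, not the 49-measure solution of this study); for a correlation matrix the eigenvalues sum to the number of variables.

```python
def kaiser_retained(eigenvalues):
    """Return eigenvalues above 1 and the share of total variance they explain."""
    retained = [ev for ev in eigenvalues if ev > 1.0]
    # For a correlation matrix the eigenvalues sum to the number of variables.
    share = sum(retained) / sum(eigenvalues)
    return retained, share

# Hypothetical eigenvalues for a 6-variable correlation matrix (they sum to 6).
eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
retained, share = kaiser_retained(eigenvalues)
```

In this toy example three components are retained and together account for five sixths of the total variance.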
Of the 13 PCA components, we retained 11 component scores for each of the 43 MCI participants for discriminant analysis. The last two components (which accounted for little variation in the component solution) were not used in order to maintain a roughly 4:1 ratio between subjects and predictor variables entering the stepwise discriminant procedure. These first 11 components accounted for 72% of the total variance of the data. The group mean component scores appear in Table 3. While group differences between the component scores could be examined, it should be noted that the stepwise discriminant procedure classifies individuals and takes the correlations between the components into account when making its determinations; therefore, significant differences between the group means might not necessarily signify strong discriminatory power at the individual level given the rest of the components in the set used. From the 11 component scores entering the stepwise discriminant procedure, six component scores were selected as those that had the best discriminability between the Conversion and Stable groups (Table 4). These six components were weighted and combined in linear discriminant functions to classify each individual as a member of either the Conversion or Stable group (Table 5). Little credence should be placed in the meaning of the particular coefficients found for the sample unless all important variables are known to be included in the analysis or are known to be uncorrelated with the variables already included (Ahlgren, 1986). We show them here because they were used in the discriminant functions as the weights to be multiplied by the neuropsychological component scores of an individual and as a set were assessed to have favorable, statistically significant classification success. Furthermore, they may be used as a tool in analyzing additional data.
The discriminant functions performed well in the development set: 36 of the 43 subjects were correctly classified, resulting in 83.7% prediction accuracy (Fisher’s Exact Test, χ2 (1, N = 43) = 18.2, p<0.0001). Of the 14 members of the Conversion group, two were incorrectly predicted to have remained stable, resulting in a sensitivity of 0.86 and a positive predictive value of 0.71. Additionally, 24 of the 29 members of the Stable group were correctly predicted, resulting in a specificity of 0.83 and a negative predictive value of 0.92. The likelihood ratios for a positive and negative test were: LR+ = 5.06 and LR− = 0.17.
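These summary statistics follow directly from the four cell counts (12 true positives, 2 false negatives, 24 true negatives, 5 false positives). The sketch below recomputes them; the function name is hypothetical, and "positive" is taken to mean a prediction of conversion.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 classification summaries; 'positive' means predicted conversion."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Development-set counts from the text: 12 of 14 converters and 24 of 29
# stable subjects were classified correctly.
metrics = diagnostic_metrics(tp=12, fn=2, tn=24, fp=5)
```

Computed from unrounded sensitivity and specificity, LR+ comes out near 4.97; the reported 5.06 corresponds to using the rounded values (0.86 / 0.17). A likelihood ratio is a ratio of two probabilities, so LR− is about 0.17 and cannot be negative.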
To gauge the confidence in these predictions, we analyzed the posterior probabilities of group membership computed by the discriminant procedure for each individual. These probabilities added quantitative context to the binary predictive decision by supplying measures of likelihood that the group into which the subject was placed by our multivariate method was the correct group. The obtained prediction accuracy for each posterior probability bin was plotted (Figure 1). Subjects were placed in posterior probability bins by their probability of belonging to the group in which the discriminant function placed them (and placement was determined by the group for which the posterior probability was greater than 0.50). First, the obtained prediction accuracy dramatically rose with posterior probability. Second, most (35) of the 43 (81%) subjects lie in the highly confident probability bins (0.70 – 1.00, where the prediction accuracy curve reached its highest level). Only eight subjects were located in the least confident bins (0.50 – 0.69) where the obtained prediction accuracy was near 50% chance.
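The binning behind such a plot can be sketched as follows. The bin width of 0.10 is an assumption (the text names only the 0.50 – 0.69 and 0.70 – 1.00 ranges), and the posterior values are hypothetical rather than the study's data.

```python
from collections import defaultdict

def accuracy_by_bin(results, edges=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Group (posterior, was_correct) pairs into bins and return accuracy per bin."""
    counts = defaultdict(lambda: [0, 0])  # bin lower edge -> [n_correct, n_total]
    for posterior, was_correct in results:
        # With two groups the winning posterior is at least 0.5, so a bin always exists.
        lower = max(e for e in edges if posterior >= e)
        counts[lower][0] += int(was_correct)
        counts[lower][1] += 1
    return {lower: correct / total for lower, (correct, total) in sorted(counts.items())}

# Hypothetical (posterior, correct?) pairs; not the study's data.
results = [(0.55, False), (0.58, True), (0.74, True), (0.91, True), (0.97, True), (0.93, False)]
per_bin = accuracy_by_bin(results)
```

Each subject is binned by the posterior probability of the group to which he or she was assigned, and accuracy is then the fraction of correct assignments within each bin.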
The cross-validation provided good results: 34 of the 43 (79.0%) individuals were correctly placed in either the Conversion or Stable group (Fisher’s Exact Test, χ2 (1, N = 43) = 13.2, p<0.001). In the Conversion group, 11 subjects were correctly classified, resulting in a sensitivity of 0.79. Likewise, in the Stable group, 23 subjects were correctly classified, resulting in a specificity of 0.79.
Given the modest sample size (43), there might be some concern that we have arrived at seemingly impressive results solely by chance variation in the sample (Ahlgren, 1986). However, since only 11 variables entered the stepwise discriminant procedure, the ratio of subjects (43) to predictor variables was approximately 4:1. Also, our number of subjects exceeds the number suggested by analysis of expected effect size (see Methods and Green, 1991).
Still, to assess empirically that our predictive results were not capitalizing on chance, we measured chance performance with these data randomized. We discriminated randomized, resampled pseudogroups of our MCI subjects and determined if our nonrandomized results were statistical outliers. We randomized our subject groups 50 times, performed stepwise and discriminant analyses on each randomization, and collected the predictive accuracies for the development and cross-validation of the pseudogroups. The mean (with standard deviations in parentheses) percent accuracies for the development and cross-validation analyses of the pseudogroups were 73.0 (5.7) and 61.3 (9.0), respectively. Our nonrandomized (real) results reported above (83.7% accuracy for the development and 79.0% accuracy for the cross-validation) are nearly two standard deviations above the mean accuracies calculated from the randomized pseudogroups for each analysis. Despite the modest sample size, our real results were statistical outliers (p<0.05) from the mean predictive accuracies that chance in the pseudogroups could produce. It should be noted that, because of the constraints on the sample size in each pseudogroup, the mean predictive accuracies were higher than 50% chance (as with each randomization, at least 15 Stable MCI individuals must be placed in the pseudo-Stable group). Additionally, by chance there could have been randomizations where nearly all of the subjects were placed in their correct groups. Nonetheless, the prediction accuracies for our nonrandomized (real) groups were substantially higher than the averages of the pseudogroups. This finding indicated that the sample size was large enough for these data and that our results were not due simply to capitalizing on chance.
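The "nearly two standard deviations" comparison is simple arithmetic on the reported means and standard deviations. The sketch below also converts each z-score to an upper-tail probability under a normal approximation; that approximation is an assumption made for illustration, since the text does not state how the p value was obtained.

```python
import math

def z_and_one_sided_p(observed, null_mean, null_sd):
    """z-score of an observed accuracy against the randomized (pseudogroup) results."""
    z = (observed - null_mean) / null_sd
    # Upper-tail probability under a normal approximation (an assumption here).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

z_dev, p_dev = z_and_one_sided_p(83.7, 73.0, 5.7)  # development
z_cv, p_cv = z_and_one_sided_p(79.0, 61.3, 9.0)    # cross-validation
```

The resulting z-scores are about 1.88 for the development analysis and about 1.97 for the cross-validation analysis.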
Neuropsychological tests are sensitive to the cognitive deficits of MCI. We have examined whether weighted combinations of neuropsychological test measures derived from a battery of commonly used tests can predict conversion in MCI to AD at the individual level. The prediction results using PCA component scores combined by discriminant analysis were accurate with strong sensitivity and specificity. We will now explore the predictive success of the neuropsychological tests as obtained through our combinatory methods and what these tests may reveal about cognitive decline in MCI.
We studied whether or not a multivariate approach, where the correlations among all the test measures in the battery were taken into account in the resultant component structure before discrimination, could generate strong, statistically significant predictive results. PCA reorganized the neuropsychological test measures into 11 more interpretable components that were implicit in the test data. An advantage of PCA is that it allows contribution of all cognitive tests in the neuropsychological battery as measured by each test’s loading on each component. Bäckman et al. (2005) stated in a meta-analysis that few studies tapped all of the eight cognitive domains suggested by the NINCDS-ADRDA criteria in their analyses of conversion from MCI to AD using neuropsychological tests. Through PCA, we were able to represent all eight cognitive domains in our composite neuropsychological component scores.
The multivariate method discussed in this paper has essentially two layers of weighting: (1) the weighting applied by PCA in reorganizing the neuropsychological test measures via the correlations among them into the component structure (independent of knowledge about the differing groups), and (2) the differential weighting for the component scores added by the discriminant analysis in computing discriminant coefficients that are best able to differentiate between the Conversion and Stable groups. This method achieved 84% accuracy, a sensitivity of 0.86, and a specificity of 0.83 by simultaneously taking into account an individual’s performance in all the cognitive domains represented by the battery. The components become a new metric that more parsimoniously represents neuropsychological test performance than the many measures of the original battery, and each subject’s component scores, weighted through formal, data-driven methods, can place his or her performance along the dimensions of the metric as closer to (or farther from) conversion to AD.
Additionally, the posterior probabilities of group membership that accompanied the binary prediction for each participant indicated the confidence of the prediction (Figure 1). We have rarely seen these measures used or discussed in the literature on predicting MCI progression in individuals. Featuring the posterior probabilities is important for several reasons. It allows a determination of which subjects may be “too close to call” (e.g., the determination of a cut point of group membership for classification). These subjects, because of their low posterior probabilities (near chance), could be labeled indeterminate in their diagnosis.
One could consider the prediction outcomes to be binary (either the subject converted to AD or did not). However, the posterior probabilities might be used to measure disease progression (how similar or dissimilar an MCI patient is to other patients who have converted to AD). Evaluating the posterior probabilities may allow the physician or researcher both to identify the probable predictions in a group of individuals and to measure the stage of progression for each individual. Posterior probabilities could aid a physician in determining the appropriateness of treatment, and they could benefit researchers when selecting project participants.
There are examples in the literature of analyzing a neuropsychological battery for predicting conversion from MCI to AD. When memory alone was used as a predictor for conversion to AD, sensitivity and specificity were lower (Lekeu et al., 2010; Perri et al., 2007; Tierney et al., 2005). We have shown that the addition of other domains, particularly executive function, produces high predictive accuracy, a finding echoed by other studies (Marcos et al., 2006; Rozzini et al., 2008; Tabert et al., 2006). Removing the Speeded Executive Function Component (Component 2) from our discriminant analysis caused a 14% drop in cross-validation predictive accuracy (27% drop in sensitivity and 10% drop in specificity). Tabert et al. (2006) used regression analysis on both memory and executive function measures and reached a classification accuracy of 86%. Likewise, Marcos et al. (2006) examined the predictive power of another battery of tests which included similar neuropsychological measures. In their study, conversion to AD or stability was correctly identified in approximately 82% of their MCI sample using primarily the Global Cognitive Subscale of the Cambridge Mental Disorders (CAMCOG) and multiple regression analysis. Their results are roughly similar to our developmental findings using component scores, though an examination of their subject demographics suggested that their MCI individuals had less exposure to education (<10 years) than ours (Table 1). In these studies and in our own, the combination of measures from other cognitive domains with memory measures in a formal, multivariate manner improves predictive accuracy. However, none of these other studies provided validation analyses.
Our multivariate method may be generalizable and could be implemented in other settings. This would be easiest to do for a new subject if the tests administered are the same as those used in the development of the component structure. However, it might be possible to use different tests if their loadings on the same components could be reasonably estimated. This is an important point, considering different clinics and research centers might wish to employ their own battery of tests. An aid to doing this might be to calibrate the new measures in combination with marker variables that belong to some of the tests we used in this study that have strong loadings. For more information on this topic and a flow diagram depicting the application of this methodology to new individuals, see Chapman et al., 2010. Perhaps more important than the particular tests are the cognitive dimensions represented by those tests. Our multivariate methodology can be applied to different neuropsychological batteries that represent the same or similar cognitive dimensions.
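To make the application to a new individual more concrete, the sketch below computes one component score from raw test measures. It assumes the common approach of standardizing each measure with the development sample's mean and standard deviation and then forming a weighted sum with component score coefficients. The measure names, means, standard deviations, and coefficients are all hypothetical placeholders, not values from this study.

```python
def component_score(raw_scores, dev_means, dev_sds, coefficients):
    """Weighted sum of standardized test measures for one component.

    raw_scores:   new subject's raw test measures, keyed by measure name
    dev_means:    development-sample means for the same measures
    dev_sds:      development-sample standard deviations
    coefficients: component score coefficients from the development PCA
    """
    score = 0.0
    for measure, weight in coefficients.items():
        z = (raw_scores[measure] - dev_means[measure]) / dev_sds[measure]
        score += weight * z
    return score


# Hypothetical values for two measures, for illustration only.
new_subject = {"delayed_recall": 8.0, "trail_making_b_seconds": 120.0}
dev_means = {"delayed_recall": 10.0, "trail_making_b_seconds": 100.0}
dev_sds = {"delayed_recall": 4.0, "trail_making_b_seconds": 40.0}
coefficients = {"delayed_recall": 0.4, "trail_making_b_seconds": -0.2}

score = component_score(new_subject, dev_means, dev_sds, coefficients)
print(round(score, 3))
```

The important design point is that the means, standard deviations, and coefficients are fixed from the development sample, so a new subject's scores are never used to re-estimate the component structure.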
Additionally, after the component structure has been developed, it may be possible to reduce the number of tests administered and achieve essentially the same results. The 49 test measures we used came from 17 neuropsychological tests (Table 2) which were resolved into 11 PCA components (Table 3). Because in the discriminant analyses only 6 of the 11 component scores were used for prediction, some reduction in the number of neuropsychological tests may be possible without greatly harming the results of the analysis. This can be estimated by studying the PCA loadings of the test measures on the six components that were used in the discriminant analyses (Table 4). Test measures with high loadings on a component play the largest role in computing its component scores. Perhaps dropping test measures with low loadings might incur only small changes in the component scores. For example, since each of the six components selected by the stepwise discriminant procedure had one or more test measures with loadings as high as 0.74, perhaps test measures with loadings below 0.43 might be dropped with minor effects on the component scores.
A caveat is that it may be risky to completely remove measures with low loadings, and we have not studied the effects of doing so in these data. Here we are only proposing this as an idea for those who wish to use a smaller neuropsychological battery.
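The screening idea described above could be sketched as follows. This Python example flags test measures whose largest absolute loading on any retained component falls below a cutoff (0.43, the value floated above). The loadings and measure names are hypothetical, and, consistent with the caveat, the output should be read as a list of candidates to examine rather than measures that are safe to drop.

```python
def split_by_loading(loadings, cutoff=0.43):
    """Separate measures into those to keep and candidates for dropping.

    loadings: measure name -> {component name: loading} for the components
              retained in the discriminant analysis
    cutoff:   a measure is kept if |loading| >= cutoff on at least one component
    """
    keep, drop_candidates = [], []
    for measure, per_component in loadings.items():
        strongest = max(abs(value) for value in per_component.values())
        if strongest >= cutoff:
            keep.append(measure)
        else:
            drop_candidates.append(measure)
    return keep, drop_candidates


# Hypothetical loadings on two retained components, for illustration only.
loadings = {
    "delayed_recall": {"C1": 0.74, "C2": 0.10},
    "stroop_interference": {"C1": 0.05, "C2": -0.68},
    "digit_span": {"C1": 0.30, "C2": 0.21},
}

keep, drop_candidates = split_by_loading(loadings)
print("keep:", keep)
print("drop candidates:", drop_candidates)
```

Absolute values are used because a strong negative loading contributes as much to a component score as a strong positive one.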
Our results indicate that measures of memory strongly herald the decline of cognition from MCI to AD, a finding in concordance with much of the literature (e.g., Bäckman et al., 2005; Lekeu et al., 2010; Tabert et al., 2006). The PCA components selected by the stepwise discriminant procedure showed the General Memory Component as the first chosen for its discriminatory power between the Conversion and Stable groups (Table 5). It was not obvious from simple examination of mean differences between the Conversion and Stable groups that measures of non-memory cognitive domains might provide predictive power (Table 2). However, our multivariate method revealed that these cognitive processes may be involved in the decline from MCI to AD. These contributions might not be as prominent as the increased impairment evident in memory measures, but they still added discriminatory power to the discriminant functions. Our finding of the utility of executive functioning and perceptual speed (Component 2) supports the conclusion of Bäckman et al. (2005) that episodic memory alone may not be the best predictor of conversion to AD in MCI.
One would naturally expect memory measures to present strong predictive power of conversion from MCI to dementia given the nature of AD. The tests with the highest loadings on the General Memory Component generally reflect delayed recall (the Logical Memory II), which has typically shown greater power in predicting conversion to AD (Bäckman et al., 2005; Perri et al., 2007). Recognition memory is impacted to a lesser extent (Bäckman et al., 2005), and in our study, the Recognition Memory (True Positives) Component was selected later in the stepwise process. Conversion to AD in amnestic MCI is marked by increased difficulties with memory (Petersen, 2004). It is worth noting that employing the PCA allowed many memory measures to contribute to the composite component score along this cognitive dimension (Table 4), which can enhance the discriminatory power by including different types of memory (episodic, verbal, and visuospatial) in the component scores without threatening the degrees of freedom in subsequent discriminant analyses.
The second most powerful predictor of conversion to AD in MCI was the Speeded Executive Function Component (Component 2), which featured high loadings from the Stroop and Trail Making tests. Impairments in quickly switching attention as well as disinhibition may indicate progression toward or conversion to AD. We found this facet of impairment again in the selection of PCA Component 4, the Recognition Memory (False Positives) Component. Those MCI individuals who convert to AD seem to have difficulty inhibiting an incorrect response, a finding echoed by others (e.g., Marcos et al., 2006). The Speeded Executive Function Component provided an important enhancement of predictive power to the discriminant function, as removing it from the analysis caused a sizeable drop in prediction accuracy (14%) in the cross-validation.
In addition to emerging difficulties with executive function and response inhibition, individuals with amnestic MCI who convert to AD may also have impairments in processing visuospatial memory, as reflected in the Visuospatial Episodic Memory Component (Component 7). Visuospatial episodic memory retrieval may be more seriously or quickly impacted as the disease progresses. Component 11, the Speed in Processing Visuospatial Memory Component, also added discriminatory power. This measure was derived from the speed at which the subjects were capable of reproducing the Rey Complex Figure from memory in the immediate and delayed recall tests. Marcos et al. (2006) suggested that as impairment toward AD progresses, MCI subjects exhibit increased difficulty during complex or demanding visuospatial tasks. The notion that visuospatial memory is impacted at an earlier stage or to a greater extent in those MCI individuals who later develop AD could be consistent with the known pathology of posterior brain regions in visuospatial variants of AD (Mapstone et al., 2003).
The development of impairment in a secondary cognitive domain is a diagnostic hallmark of conversion from MCI to AD (Petersen, 2004). We have seen through our study of this neuropsychological battery for this group of amnestic MCI individuals that failures in executive function (specifically attention-switching and response inhibition) may represent impairment of the secondary cognitive domain. It would be interesting to relate these results to individuals with multiple-domain MCI or with single-domain MCI that presents without primary memory impairment. Perhaps these individuals may already be impaired in executive function or visuospatial memory and perception (Mapstone et al., 2003) and may develop memory difficulties as a “secondary” domain in a temporal sense. This concept requires further study to determine if our component scores are applicable to the wide range of cognitive and behavioral heterogeneity of MCI that often presents itself in clinics. Additionally, our study warrants further validation with a greater number of MCI subjects. Although others have indicated that the time between the initial diagnosis and the time of conversion is not a strong predictor (Jelic et al., 2000), our follow-up periods were relatively short and should be studied more extensively. Finally, we believe our battery of neuropsychological tests (found to have 13 dimensions) was representative of the myriad cognitive domains affected by AD. Still, it is possible that other tests that we did not study (for example, measures of olfactory discrimination) may offer better assessment of the decline from MCI to AD, and these tests should be analyzed in such a multivariate way to measure their contributions.
This article was intended to test whether or not a multivariate, composite marker of neuropsychological test performance developed with PCA can predict conversion to AD in MCI individuals with strong accuracy, sensitivity, and specificity. Multivariate weighting, achieved sequentially first through PCA and then through discriminant analysis, produced high, statistically significant accuracy, sensitivity, and specificity in predicting which MCI individuals will be members of a Stable or Conversion group. In addition, the posterior probabilities provided by discriminant analysis added an important measure of confidence to the predictions, allowing the identification of those subjects whose predictive diagnosis might be “too close to call”. Our results were tested for reliability by a statistically significant cross-validation and a randomized resampling method. This multivariate approach warrants further study in comparison to and in combination with other biological, neuropsychological, and behavioral methods of identifying the progression of cognitive impairment.
We thank: the Memory Disorders Clinic at the University of Rochester Medical Center, Monroe Community Hospital, the Alzheimer’s Disease Center, especially Paul Coleman, Charles Duffy, and Roger Kurlan, for their strong support of our research; Susan E. Chapman for help in writing; Robert Emerson and William Vaughn for their technical contributions; Courtney Vargas, Dustina Holt, Jonathan DeRight, Cendrine Robinson, Kristen Morie, Anna Fagan, Michael Garber-Barron, Leon Tsao, and Brittany Huber for technical help; and the many voluntary participants in this research. This research was supported by the National Institutes of Health grants P30-AG08665, R01-AG018880, and P30-EY01319.
Robert M. Chapman, Brain and Cognitive Sciences and Center for Visual Science, University of Rochester.
Mark Mapstone, Neurology, University of Rochester Medical Center.
John W. McCrary, Brain and Cognitive Sciences and Center for Visual Science, University of Rochester.
Margaret N. Gardner, Brain and Cognitive Sciences, University of Rochester.
Anton Porsteinsson, Psychiatry, University of Rochester Medical Center.
Tiffany C. Sandoval, Brain and Cognitive Sciences, University of Rochester; San Diego State University / University of California at San Diego Joint Doctoral Program in Clinical Psychology.
Maria D. Guillily, Brain and Cognitive Sciences, University of Rochester; Department of Pharmacology and Experimental Therapeutics at Boston University.
Elizabeth DeGrush, Brain and Cognitive Sciences, University of Rochester; Chicago College of Osteopathic Medicine at Midwestern University.
Lindsey A. Reilly, Brain and Cognitive Sciences, University of Rochester; Springer Publishing Company.