We report evidence that computer-based high-dimensional pattern classification of MRI detects patterns of brain structure characterizing mild cognitive impairment (MCI), often a prodromal phase of Alzheimer's Disease (AD). 90% diagnostic accuracy was achieved, using cross-validation, for 30 participants in the Baltimore Longitudinal Study of Aging. Retrospective evaluation of serial scans obtained during prior years revealed gradual increases in structural abnormality for the MCI group, often before clinical symptoms, but slower increase for individuals remaining cognitively normal. Detecting complex patterns of brain abnormality in very early stages of cognitive impairment has pivotal importance for the detection and management of AD.
Prevalence of AD doubles every 5 years of life after age 60, with more than 4 million individuals affected in the US alone. AD is the most common dementing illness and a major public health issue of increasing importance as life expectancy increases. Although noninvasive approaches for antemortem diagnosis of AD are under development, definitive diagnosis of AD requires neuropathologic confirmation of the characteristic amyloid plaques and neurofibrillary tangles . New drugs under development will target different stages of disease pathophysiology, and efficacious AD treatments likely will require early initiation before irreversible brain tissue damage. Thus, a great deal of attention has been paid recently to the prodromal stage of AD, referred to as mild cognitive impairment (MCI), which includes individuals with memory problems who do not meet criteria for dementia. Although MCI definitions vary across studies , MCI individuals convert to AD with rates of 6−15% annually . Therefore, MCI individuals are a high risk group likely to benefit from effective treatments.
Structural MRI promises to aid diagnosis and treatment monitoring of MCI and AD, offering the potential for easily obtainable surrogate markers of diagnostic status and disease progression. In contrast to the relatively more advanced stages of MCI and AD, early stages of AD and clinically normal stages present a major challenge for quantifying patterns of structural change. Brain atrophy in the early stages of AD may be relatively subtle and spatially distributed over many brain regions [6,7,15,28,30], including the entorhinal cortex, the hippocampus, lateral and inferior temporal structures, anterior and posterior cingulate, and possibly other regions that have only recently been investigated . Furthermore, spatially heterogeneous patterns of atrophy have been found within the hippocampus, with regions known to correspond to the CA1 field presenting relatively more pronounced atrophy [19,48]. Patterns of atrophy associated with pathology are confounded by complex patterns of atrophy associated with normal aging . Moreover, the error associated with structural measurements can vary throughout the brain, since some structures are more difficult to delineate, especially via computer algorithms, thereby rendering the measurement of certain brain regions more informative than others merely for methodological reasons . Therefore, powerful and sensitive statistical image analysis methods must be used to capture morphological characteristics that are different between normal aging and MCI, and to determine which are most informative from a diagnostic perspective.
Most MRI studies in MCI and AD have relied on measurement of volumes of specific brain regions [5,41], especially the hippocampus and the entorhinal cortex, which show histopathological changes at early stages of AD . Computational neuroanatomy has also been used to evaluate voxel-by-voxel brain changes in healthy aging, MCI and AD . These studies have confirmed patterns of atrophy involving medial temporal lobe structures in MCI and AD. They have reinforced the value of MRI as a potential surrogate marker of disease at the group-analysis level, i.e. for examining overall differences between individuals with and without pathology. However, their diagnostic value is limited, especially at early stages of brain pathology, since their sensitivity and specificity are not sufficient for prediction of the status of a given individual.
Herein we report results from a longitudinal study that provide strong evidence that there is a subtle and spatially-distributed pattern of brain structure that is characteristic of MCI, and which often begins developing prior to the recognition of cognitive deficits. Moreover, this pattern can be detected with high sensitivity and specificity using a high-dimensional image analysis and pattern classification method that examines spatial patterns of brain atrophy in their entirety, instead of applying separate region-by-region evaluations. Therefore, detection of this structural pattern can lead to very early diagnosis of prodromal AD. This study adds to mounting evidence in the literature for the importance of pattern classification methods in detecting subtle and complex structural and functional patterns [10,11,21,37].
MRI scans from 30 elderly individuals were obtained annually as part of the Baltimore Longitudinal Study of Aging neuroimaging substudy . At initial enrollment, all individuals were free of dementia and other central nervous system disorders, severe cardiovascular disease, and metastatic cancer (detailed in ). Screening of mental status by the Blessed-Information-Memory-Concentration (BIMC) test was performed at each annual visit in conjunction with a comprehensive neuropsychological assessment. Subject and informant based assessments with the Clinical Dementia Rating (CDR)  scale were administered by certified examiners to participants in the BLSA autopsy study annually (about 50% of participants) and to remaining participants scoring 3 or more BIMC errors. After as many as 9 annual neuroimaging and cognitive assessments, 20 participants with MCI were identified from a sample of 155 neuroimaging study participants who completed MRI studies. These 20 participants were characterized as MCI, based on CDR scores of 0.5 and/or consensus diagnosis indicating memory impairment that does not meet criteria for dementia. Of these individuals, 15 were eligible for inclusion in the current study. Five participants who developed cognitive impairment over the course of the study were excluded from these analyses due to documentation of other pathological processes (e.g., clinical stroke (1), brain trauma (1), heavy alcohol use (1), post-surgical confusion (1), absence of Alzheimer's pathology at autopsy (1)). A control sample of 15 individuals who remained unimpaired (CDR = 0), matched for age, sex, and follow-up interval, was identified from the remaining 135 participants. Subject characteristics are shown in Table 1. 
It is important to emphasize that the MCI participants in this study are identified within the context of prospective longitudinal follow-ups and typically represent relatively mild cases of cognitive impairment in contrast to those followed in other studies who typically present with memory complaints (e.g. ). None had CDR total scores greater than 0.5 for any of the visits used in these analyses, and the mean (SD) of the sum of the individual CDR box scores was 1.2 (0.9) for the most recent visit included in the analyses. Therefore, in this group AD pathology is likely to be at a relatively early stage. To date, 10 of the 15 MCI individuals have been assigned diagnoses of Alzheimer disease at subsequent follow-up visits, verifying that we are including individuals who are progressing as well as those who may be more stable.
MR acquisition procedures are detailed in . MR scanning was performed on a GE Signa 1.5 Tesla scanner. The current results are based on a high-resolution volumetric “spoiled GRASS” (SPGR) series (axial acquisition, TR = 35 ms, TE = 5 ms, flip angle = 45°, FOV = 24 cm, matrix = 256 × 256, NEX = 1, voxel dimensions of 0.94 × 0.94 mm in-plane with 1.5 mm slice thickness).
Images were pre-processed using the methods described in , which resulted in segmentations into grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF). To compare structural patterns across individuals, we spatially transformed each segmented image into a common coordinate system, often called stereotaxic space. A mass-preserving framework was adopted  to ensure that the volumes of brain tissue were preserved during the transformation and to provide tissue density maps of GM, WM and CSF for each individual that reflected the spatial distribution of these tissue volumes. For example, relatively lower GM tissue density in the hippocampus would be indicative of hippocampal atrophy.
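The mass-preserving idea can be illustrated with a minimal one-dimensional sketch in Python. This is not the three-dimensional registration framework used in this study; it assumes a hypothetical binary tissue mask along a single line of voxels and a hypothetical monotone warp, specified by the subject-space positions of the template voxel boundaries. The function name `tissue_density_map`, the masks, and the boundary positions are all illustrative assumptions, and for simplicity the same warp is reused for both toy subjects.

```python
def tissue_density_map(subject_mask, boundaries):
    """Toy 1D mass-preserving warp (illustrative assumption, not the study's method).

    subject_mask: 0/1 per subject voxel (1 = tissue); voxel i covers [i, i + 1).
    boundaries:   increasing subject-space positions; template voxel j
                  corresponds to the subject interval [boundaries[j], boundaries[j + 1]).
    Returns the amount of subject tissue assigned to each template voxel.
    """
    density = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        total = 0.0
        for i, value in enumerate(subject_mask):
            # Length of the overlap between subject voxel i and [lo, hi).
            overlap = max(0.0, min(hi, i + 1) - max(lo, i))
            total += value * overlap
        density.append(total)
    return density


# Hypothetical warp from an 8-voxel subject line to a 4-voxel template line.
boundaries = [0.0, 1.5, 3.0, 5.5, 8.0]

healthy_mask = [0, 1, 1, 1, 1, 0, 0, 0]    # 4 voxels of tissue
atrophied_mask = [0, 0, 1, 1, 0, 0, 0, 0]  # 2 voxels of tissue

healthy_density = tissue_density_map(healthy_mask, boundaries)
atrophied_density = tissue_density_map(atrophied_mask, boundaries)

print("healthy:", healthy_density, "total", sum(healthy_density))
print("atrophied:", atrophied_density, "total", sum(atrophied_density))
```

In this sketch the total amount of tissue is the same before and after the warp, and the subject with less tissue shows lower or equal density at every template voxel, which is the property that lets density maps be compared voxel by voxel across individuals.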
Patterns of the spatial distribution of GM, WM and CSF volumes were then examined via a pattern classification technique , and patterns specific to MCI were determined. In particular, regions in which the tissue density correlated well with the clinical variable (MCI = 1, normal = −1) were first identified, via a voxel-by-voxel calculation of the Pearson correlation coefficient. In order to render this calculation robust to outliers, a leave-one-out procedure was applied, i.e. given n training samples, the correlation coefficients were calculated n times, each time leaving one of the scans out. The minimum value was then retained, representing the worst case scenario. This approach allowed us to subsequently construct spatial patterns from brain regions that were not only good discriminators between normal and MCI groups, but also were robust. Additional robustness was achieved by examining the spatial consistency between a voxel and its spatial neighborhood, and retaining only the brain regions that displayed both robust correlation with clinical status and high spatial consistency. A watershed-based clustering method was then used to determine brain regions whose volumes had good discriminant features. Finally, a reverse feature elimination procedure was used to find a minimal set of features to be fed to the classifier. Because the predictive power varies somewhat as a function of the number of brain regions (features), we estimated predictive power by averaging the abnormality scores (see below) obtained from all classifiers built for cluster numbers ranging from 12 to 45 (where the predictive power reaches a plateau). Additional details of the feature construction and selection procedures can be found in .
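The leave-one-out correlation step described above can be sketched for a single voxel as follows. This is a minimal illustration in plain Python, not the implementation used in the study. It assumes that "minimum value" means the leave-one-out correlation of smallest magnitude, which is one reading of the worst-case criterion; the function names and the toy density values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def worst_case_correlation(values, labels):
    """Recompute the correlation n times, each time leaving one sample out,
    and keep the value of smallest magnitude (assumed worst-case criterion)."""
    correlations = []
    for k in range(len(values)):
        kept_values = values[:k] + values[k + 1:]
        kept_labels = labels[:k] + labels[k + 1:]
        correlations.append(pearson(kept_values, kept_labels))
    return min(correlations, key=abs)


labels = [1, 1, 1, -1, -1, -1]  # MCI = 1, normal = -1

# Hypothetical voxel with consistently lower tissue density in MCI.
robust_voxel = [0.20, 0.30, 0.25, 0.80, 0.70, 0.75]
# Hypothetical voxel whose correlation is driven entirely by one outlier.
outlier_voxel = [0.5, 0.5, 0.1, 0.5, 0.5, 0.5]

robust_r = worst_case_correlation(robust_voxel, labels)
outlier_r = worst_case_correlation(outlier_voxel, labels)
print("robust voxel:", robust_r)
print("outlier-driven voxel:", outlier_r)
```

The first voxel keeps a strong negative correlation no matter which scan is left out, whereas the second loses all correlation once its single outlying scan is removed, so only the first would survive a threshold on the worst-case value.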
This analysis was cross-sectional, in that it was based solely on the tissue density maps obtained for the most recent MRI assessment for each individual. For participants who developed dementia over the course of the study, the most recent MRI prior to the diagnosis was used, with a mean (SD) interval of 1.9 (1.9) years between most recent scan and diagnosis. Volumetric measurements from these brain regions were then used to build a classifier [35,47], which produced an abnormality score: positive values indicate a structural pattern resembling MCI, whereas negative values indicate a structural pattern resembling that of unimpaired individuals. A value of 0 would indicate a structural profile that is in-between normal and abnormal. Leave-one-out cross-validation was used to test the predictive power of this analysis on new datasets (datasets not involved in the selection of optimal brain clusters and training of the classifier) and construct Receiver Operating Characteristic (ROC) curves that summarize predictive value. In this analysis, the scan of one participant was put aside, and the classifier was constructed from the most recent scans of all other individuals. Thus, the individual being classified was not included in the training data set for development of the classifier. This classifier was then applied to all available scans of the left-out individual. In this way, the temporal evolution of these spatial patterns of brain abnormality was measured during earlier years for each individual.
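As an illustration of how an ROC curve is built from cross-validated abnormality scores, the following minimal Python sketch sweeps a decision threshold over a set of scores and records sensitivity against the false positive rate. The scores and labels are hypothetical and are not data from this study; the function names are assumptions introduced for the example.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    sweeping a threshold from the highest score to the lowest."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l != 1)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def accuracy_at_zero(scores, labels):
    """Fraction classified correctly when positive scores are called MCI."""
    correct = sum(1 for s, l in zip(scores, labels) if (s > 0) == (l == 1))
    return correct / len(labels)


# Hypothetical left-out abnormality scores (MCI = 1, normal = -1).
labels = [1, 1, 1, 1, -1, -1, -1, -1]
scores = [0.40, 0.30, 0.10, -0.10, 0.05, -0.20, -0.30, -0.50]

points = roc_points(scores, labels)
area = area_under_curve(points)
acc = accuracy_at_zero(scores, labels)
print(points)
print("AUC:", area)       # 0.9375 for these toy scores
print("accuracy:", acc)   # 0.75 for these toy scores
```

Each score in such a curve must come from a classifier that never saw the corresponding individual, otherwise the curve overstates the diagnostic value on new scans.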
In order to form an image of the brain regions that constitute a pattern of brain tissue distribution that is characteristic of MCI, we followed our earlier work , and formed a spatial map of brain regions whose volumes change as one follows the path of fastest change in the abnormality score from positive to negative. These regions jointly form a pattern that optimally characterizes the differences between MCI and healthy controls, from the perspective of classification. Moreover, a value from 0 to 1 is determined for each region, reflecting its relative importance in classification.
The path of fastest change was constructed by taking each support vector, i.e. each sample that was close to the interface between MCI and controls, and following the gradient of the SVM's decision function. This gradient is known to provide the direction of fastest change. In the context of our approach, when following the gradient direction, one moves from an MCI sample to the opposite side, i.e. to the realm of features corresponding to normal controls, thereby generating a spatial map of the regions whose volumes change when one “makes an MCI brain look like healthy”. This procedure was repeated for each support vector, and an average spatial map was generated for display purposes.
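The gradient-following procedure can be sketched in a few lines of Python. The sketch assumes a Gaussian (RBF) kernel, which the text does not specify, and uses two hypothetical support vectors in a two-feature space; `coefs` stands for the products of the dual coefficients and the class labels (MCI = 1), and all names and numbers are illustrative assumptions rather than the classifier trained in this study.

```python
import math


def rbf(x, z, gamma):
    """Gaussian kernel (assumed; the kernel is not specified in the text)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x, svs, coefs, bias, gamma):
    """SVM decision function: positive = MCI-like, negative = normal-like."""
    return sum(c * rbf(x, sv, gamma) for sv, c in zip(svs, coefs)) + bias


def gradient(x, svs, coefs, gamma):
    """Analytic gradient of the decision function with respect to x."""
    grad = [0.0] * len(x)
    for sv, c in zip(svs, coefs):
        k = rbf(x, sv, gamma)
        for d in range(len(x)):
            grad[d] += c * k * (-2.0 * gamma) * (x[d] - sv[d])
    return grad


def path_to_normal(x, svs, coefs, bias, gamma, step=0.05, max_steps=1000):
    """Move against the gradient from an MCI-side sample until the score
    becomes negative; returns the list of visited points."""
    path = [list(x)]
    for _ in range(max_steps):
        current = path[-1]
        if decision(current, svs, coefs, bias, gamma) < 0:
            break
        grad = gradient(current, svs, coefs, gamma)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break
        path.append([c - step * g / norm for c, g in zip(current, grad)])
    return path


# Hypothetical classifier: one MCI-side and one control-side support vector.
svs = [(1.0, 0.0), (-1.0, 0.0)]
coefs = [1.0, -1.0]
bias, gamma = 0.0, 0.5

start = [1.0, 0.0]
path = path_to_normal(start, svs, coefs, bias, gamma)
end = path[-1]
change = [e - s for e, s in zip(end, start)]
print("feature changes along the path:", change)
```

In this toy setting only the first feature changes along the path, so it would be the one highlighted in the resulting map; repeating the procedure from every MCI-side support vector and averaging the changes corresponds to the averaging step described above.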
The spatial map of brain regions that was formed as described in Methods is shown in Fig. 1a. This set of regions forms a structural network which, according to our classification approach, carries the most distinctive characteristics of MCI relative to unimpaired individuals. This map highlights several regions including the lateral and inferior aspects of both hippocampi, which is where the CA1 field of the hippocampus is located, bilateral superior, middle and inferior temporal gyri (GM), bilateral orbitofrontal GM, left fusiform gyrus (GM), right collateral sulcus, and cingulate, especially posteriorly. We also found clusters of reduced WM volumes in the inferior temporal gyri as well as middle and superior frontal gyri (see Fig. 1b).
Based on the pattern of spatial distribution of brain tissue in the regions of Fig. 1, individuals with MCI were distinguished initially with 100% accuracy from those without cognitive impairment, thereby demonstrating full separation between MCI and cognitively normal individuals, using this nonlinear classifier. Next, predictive power was determined via the leave-one-out cross-validation (Methods) to be 90%; this is an estimate of classification accuracy of a new individual's scan and therefore of direct diagnostic relevance. Fig. 2 shows the ROC curve of the leave-one-out analysis, indicating very good predictive power and sensitivity/specificity trade-off.
As described earlier, the classifier was determined from the latest scans of all participants and did not include prior longitudinal images. Next, the classifier derived from the leave-one-out analysis of the most recent scans was retrospectively applied to all available earlier scans for each participant, yielding an abnormality score for each individual scan for each participant. The longitudinal evolutions of the abnormality scores for all available scans of each participant are shown in Fig. 3a. As described above, the values for each individual were based on cross-validation, i.e. all scans of a given individual were omitted during the construction of the classifier for that individual based on the most recent scans of all other participants. Fig. 3a shows increases over time in the structural abnormality scores from normal levels (negative) to MCI-like levels (positive). Mixed effects regression models were used to compare longitudinal changes in abnormality scores for MCI and cognitively normal groups. Age at assessment and diagnostic status were independent variables and abnormality scores were dependent measures. The MCI group showed a highly significant increase in abnormality score over time (p < 0.0001), whereas the cognitively normal group showed a weaker increase over time, p=0.022 (Fig. 3a); the two slopes were significantly different (p=0.045, all one sided tests).
Examination of the individual longitudinal trajectories revealed that one of the clinically normal participants was the main contributor to the longitudinal increase of the abnormality scores of the normal group. We omitted this participant and repeated the mixed effects regression analysis, obtaining the plot of Fig. 3b. Excluding this outlier, there was no significant longitudinal increase of abnormality scores in the normal sample (p=0.38). This participant is now deceased, and autopsy findings indicated moderate plaques and a Braak score of 4, consistent with a pathological diagnosis of possible AD by CERAD criteria . This finding is in agreement with his highly abnormal MRI-based classification scores.
Examination of the classification scores in the year of conversion from normal to MCI in the subset of 8 subjects who converted over the course of the study showed that structural brain changes were already quite considerable at that stage. Specifically, the average score of the MCI participants in the year of conversion from normal to MCI was 0.15, whereas the average score of the same participants at their most recent scan at which they did not have a diagnosis of dementia was 0.26; the latter scores reflect a more advanced disease stage. The average score of all MCI participants at the latest year of available scans prior to dementia was also 0.26, and the average classification score of all scans of normal controls was −0.3. These numbers indicate that in the year of conversion, the group that progressed to MCI was already well into the range of abnormal brain structure. However, one of the limitations of this study was that we were not able to investigate a variety of specific intervals prior to conversion to MCI due to the modest sample size and the fact that 7 of the MCI individuals were labeled MCI at their first imaging assessment by retrospective ascertainment of date of onset.
We compared the discrimination power of our pattern analysis approach with that of volume measurements of the left and right hippocampus and entorhinal cortex, after normalization by total intracranial volume (ICV) that accounted for total head size. Fig. 4 shows the scatterplot of ICV-normalized volumes, which indicate modest group separation and diagnostic accuracy in these relatively mildly impaired subjects, even though a statistically significant difference between the two groups was measured (p=0.036 for the hippocampus and p=0.02 for the entorhinal cortex, after omission of one outlier; not excluding the outlier yielded respective p-values of 0.08 and 0.14, all one-sided t-tests). In fact, the best classification rate we were able to achieve using the hippocampus and ERC volumes jointly was 76.6%, via a nonlinear support vector machine. The respective ROC is shown in Fig. 5, reflecting a relatively limited diagnostic accuracy.
Comparison of our results with the widely used voxel-based morphometry method, which applied voxel-by-voxel t-tests on the smoothed tissue density maps , revealed that no single brain region carried adequate predictive power. In particular, the classification accuracy of the most discriminatory cluster, examined via cross-validation, was 63.3%, reflecting relatively poor predictive power of single-region measurements, and further supporting the importance of the network-type of analysis performed herein.
In our final experiment, we investigated the possibility of using a smaller number of brain regions without sacrificing predictive accuracy. Fig. 6 shows the predictive accuracy as a function of the number of clusters, revealing that at least 20 brain regions must be considered jointly, in order to capture the MCI-specific structural pattern.
To our knowledge, our study is the first to demonstrate that complex and subtle structural patterns that characterize the trajectory of increasing brain abnormalities in individuals with mild MCI can be identified from cross-sectional MR scans via high-dimensional image analysis and pattern classification methods. Importantly, these patterns of structural change can be measured even before cognitive decline brings the individuals to clinical attention. The spatial pattern of distribution of brain tissue that allowed us to accurately detect individuals with the MCI structural phenotype involved several structures that are known to be implicated in AD, including a number of temporal lobe structures, the cingulate, and parts of the orbitofrontal cortex that have dense connections with anterior temporal lobe structures and show pathology early in AD . Notable is the fact that it was mainly the lateral and anterior part of the hippocampus that contributed to the classification, which might imply that CA1 is relatively more affected in these individuals, and might explain, in part, the relatively limited diagnostic value of total hippocampal volume measurements in the subjects of our study (Fig. 4). A relatively higher CA1 atrophy would also agree with histopathological studies , although the resolution of current 1.5T MR images does not allow us to specifically define hippocampal subfields. Some recent studies using high-dimensional warping of the hippocampus have also observed spatially heterogeneous patterns of hippocampal atrophy in AD [19,48].
These results further bolster our confidence that sophisticated methods for feature selection and pattern classification are necessary to capture the complex patterns of brain atrophy that are most informative for diagnosis, whereas conventional ROI volumetric analysis is limited in that it combines regions that are relatively more affected by disease with regions that are relatively less affected, thereby potentially reducing the predictive power of these measurements. Our analysis also identified white matter regions both in the temporal lobe and in the superior and middle frontal gyri that were very important for accurate classification of MCI and have not been previously reported in the literature. These structures merit further investigation using imaging methods that are more suitable for analysis of white matter structure, especially diffusion tensor imaging. Finally, our results indicated bilateral hippocampal atrophy, in contrast to recent reports of lateralized atrophy, with both greater right hippocampal atrophy  and greater left hippocampal atrophy  reported.
Most importantly, our results provide the first evidence that integration of these regional measurements via nonlinear pattern classification provides very high diagnostic accuracy on an individual basis. Although several studies examining cross-sectional and longitudinal effects in volumes of brain regions have shown significant group differences between MCI patients who convert to AD and MCI patients who do not convert, or between healthy controls and AD or MCI patients [2,8,9,12-16,18,22,25,26,29,31-34,36,44,46,50], the ability to detect structural patterns that enable accurate prediction for specific individuals is ultimately what determines the clinical value of MRI and measurements obtained from it. Our results further confirmed that evaluation of all these brain regions jointly is necessary to obtain high predictive accuracy. Measurements restricted to volumes of the hippocampus and ERC, two structures that are known to be affected early in AD and show group differences in neuroimaging studies of AD and MCI [2,8,12,14-16,18,26,29,32,36,44,50], provided considerably lower predictive accuracy. Lower accuracy was observed even when these measures were evaluated jointly, due to the high overlap of the volume distributions of these two structures between MCI and healthy individuals (Fig. 4).
Analysis of predictive accuracy as a function of the number of brain regions contributing to the classification confirmed that one needs 20−25 clusters to be able to accurately capture all subtleties of the structural abnormality in these mild MCI cases and achieve sufficient predictive accuracy (Fig. 6). Further increase of the number of brain regions reduces predictive accuracy, a fact that is well-known in machine learning and is due to two factors: 1) increase of noise by including regions that are not relevant to classification; 2) insufficient training, as the number of variables exceeds the number of training samples (30).
Clinically important was also the finding that the pattern of spatial distribution of brain tissues characteristic of MCI was typically apparent prior to the recognition of clinical symptoms. In particular, abnormality scores progressed steadily in people that were later diagnosed with MCI, and at the time of conversion to MCI, their scores were well into the abnormal realm. In contrast, individuals who did not develop cognitive impairment showed much slower increase of brain abnormalities during earlier years. Therefore, our findings suggest that there may be a continuum of structural brain changes that eventually reaches some threshold associated with clinically detected cognitive impairment. Furthermore, we have demonstrated that these brain changes can be captured by pattern analysis and summarized by a score reflecting the level of structural abnormality at an individual level. Our results reveal the potential of this approach for early detection of MCI during the prodromal phase of AD, which will be critical in making treatment decisions with the anticipated availability of different treatment options in the future.
Further evaluation of the abnormality scores in the normal group revealed that only one clinically normal participant displayed considerable increase in abnormality score. Autopsy results in this individual demonstrated findings consistent with a pathological diagnosis of AD. Although a single case, this finding bolsters our confidence that our MRI-based classifier is able to detect structural patterns that indicate pathology, even in individuals who are cognitively normal.
The fact that the majority of our MCI participants were only very mildly impaired is both a strength and a possible weakness of our study. Our ability to robustly classify individuals with very mild impairment provides encouragement for future development of this approach as a diagnostic tool for prodromal AD. The majority of our participants showed only mild impairment in memory based on informant ratings. As they were identified within the context of prospective longitudinal follow-ups, they were likely at early stages of impairment and most would not yet have come to clinical attention. On the other hand, we do not know how many of the remaining MCI individuals will ultimately develop AD or when they will do so, and we do not know whether our method will distinguish MCI individuals who remain stable versus those who convert to AD. We also cannot determine at present which cognitively normal individuals with increasing structural abnormality scores will ultimately convert to MCI (Fig. 3) and how quickly they will do so. We continue to follow these participants with cognitive and MRI assessments and have initiated PET amyloid imaging studies in this sample, in order to fully characterize the process of progression from healthy brain structure and cognition to AD.
It is important to emphasize that our sensitivity/specificity estimates were obtained using leave-one-out cross-validation to obtain robust estimates of generalization and predictive power. This ensured that our predictive power was being evaluated on MRI scans not previously encountered by the classifier and the feature selector. This type of cross-validation guards against findings that can potentially be introduced purely by chance, a persistent problem in high-dimensionality analyses especially in relatively small samples. This type of potentially misleading result is illustrated by our observation of 100% separation between MCI and healthy controls in our initial analysis without cross-validation. However, when cross-validated, our analysis showed 90% predictive power, which is an estimate of the diagnostic value of these volumetric measurements on new individuals. While our study demonstrates that individuals with MCI can be distinguished from cognitively healthy controls with high accuracy, sensitivity and specificity, we did not test the potential utility of this approach in differential diagnosis among different types of dementia. A test of the validity of this approach for differential diagnosis of dementia subtypes would require training of the classifier on all types of dementia jointly and merits future investigation.
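The gap between apparent separation and cross-validated accuracy can be reproduced on pure noise. The following minimal Python sketch generates random features that carry no information about the labels, then compares accuracy measured on the training scans themselves with leave-one-out accuracy in which feature selection is repeated inside every fold. A simple nearest-centroid score stands in for the nonlinear classifier used in this study, and the sample size, feature count, seed, and function names are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_features(X, y, k):
    """Indices of the k features most correlated (in magnitude) with the labels."""
    strength = [abs(pearson([row[f] for row in X], y)) for f in range(len(X[0]))]
    return sorted(range(len(strength)), key=lambda f: -strength[f])[:k]


def fit_centroids(X, y, features):
    """Nearest-centroid stand-in: per feature, a weight and a midpoint."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    model = []
    for f in features:
        mean_pos = sum(row[f] for row in pos) / len(pos)
        mean_neg = sum(row[f] for row in neg) / len(neg)
        model.append((f, mean_pos - mean_neg, (mean_pos + mean_neg) / 2.0))
    return model


def score(model, row):
    """Positive = resembles the label 1 group, negative = resembles the -1 group."""
    return sum(w * (row[f] - mid) for f, w, mid in model)


def accuracy(scores, y):
    return sum(1 for s, l in zip(scores, y) if (s > 0) == (l == 1)) / len(y)


def resubstitution_accuracy(X, y, k):
    """Select features, train, and test on the same scans (optimistic)."""
    model = fit_centroids(X, y, select_features(X, y, k))
    return accuracy([score(model, row) for row in X], y)


def nested_loo_accuracy(X, y, k):
    """Leave one scan out; redo feature selection and training without it."""
    scores = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        model = fit_centroids(X_train, y_train, select_features(X_train, y_train, k))
        scores.append(score(model, X[i]))
    return accuracy(scores, y)


rng = random.Random(0)
y = [1] * 15 + [-1] * 15
X = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in y]  # noise only

resub = resubstitution_accuracy(X, y, k=10)
nested = nested_loo_accuracy(X, y, k=10)
print("accuracy on training scans:", resub)
print("leave-one-out accuracy:", nested)
```

Because the features are noise, any accuracy well above chance on the training scans is produced by selection alone; keeping both the feature selector and the classifier blind to the left-out scan removes that optimism.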
Our classification results and the estimates of predictive power were derived from a single structural MRI of each individual, rather than from longitudinal measurements of brain atrophy. The ability to accurately classify even mildly impaired individuals from a single cross-sectional MRI contrasts with prevailing thinking that effective prediction of early stages of AD will require measurement of longitudinal brain changes. This is very important from a clinical and financial perspective, since consistent and frequent follow-up of healthy individuals that might or might not be at risk for AD is extremely difficult, especially in a typical clinical setting. We believe that the high-dimensional multivariate nature of our analysis compensated for not using longitudinal measurements for classification. However, if necessary, our classification approach could readily incorporate both cross-sectional and longitudinal measurements, as well as physiological measurements, in order to construct more sensitive and specific early diagnostic tools for AD. It will also be important to continue longitudinal studies to determine how early in the prodromal period these changes are detected.
If our findings are replicated in other samples and imaging centers and, most importantly, if accuracy of classification can be improved by training on more extensive samples compared to the relatively modest number of scans used in this study, our results and the methodology adopted herein can play a significant role in detection of brain abnormalities associated with cognitive impairment prior to recognition of clinical deficits. Moreover, application of our classification methodology to longitudinal imaging data will enrich our understanding of the progression of the underlying brain abnormalities and will be critical in monitoring treatment and guiding therapeutic interventions for one of the most devastating diseases of the elderly.
We gratefully acknowledge the assistance of Yang An, M.Sc., National Institute on Aging, in statistical analysis, the BLSA participants and staff, the staff of the MRI facility at Johns Hopkins Hospital, and Dr. Juan Troncoso, Department of Pathology, Johns Hopkins University, for neuropathologic assessments. This study was supported in part by NIH funding sources N01-AG-3-2124 and R01-AG14971 and by the Intramural Research Program of the NIH, National Institute on Aging.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Disclosure Statement: All co-authors have seen and approved this submission. There are no conflicts of interest including any financial, personal or other relationships with other people or organizations, by any of the co-authors, related to the work described in the paper. The submission is not under review by any other archival journal. An oral and poster presentation of this work has been made at the International Conference of Alzheimer's Disease, in Madrid, July 2006.