Biomarkers of Alzheimer's disease (AD) are increasingly important. All modern AD therapeutic trials employ AD biomarkers in some capacity. In addition, AD biomarkers are an essential component of the recently updated diagnostic criteria for AD from the National Institute on Aging – Alzheimer's Association. Biomarkers serve as proxies for specific pathophysiological features of disease. The five best-established AD biomarkers include both brain imaging and cerebrospinal fluid (CSF) measures: CSF Aβ and tau, amyloid positron emission tomography (PET), fluorodeoxyglucose (FDG) PET, and structural magnetic resonance imaging (MRI). This article reviews evidence supporting the position that MRI is a biomarker of neurodegenerative atrophy. Topics covered include methods of extracting quantitative and semiquantitative information from structural MRI; imaging-autopsy correlation; and evidence supporting the diagnostic and prognostic value of MRI measures. Finally, the place of MRI in a hypothetical model of the temporal ordering of AD biomarkers is reviewed.
The topographic pattern of neurodegenerative atrophy in Alzheimer's disease (AD) captured by anatomic magnetic resonance imaging (MRI) mirrors that of neurofibrillary pathology (Braak and Braak, 1991; Whitwell et al., 2008a; Whitwell et al., 2007). Atrophy begins in, and is ultimately most severe in, the medial temporal lobe, particularly the entorhinal cortex and hippocampus, which is why these structures have been targeted in many MRI studies for diagnostic purposes. Atrophy later spreads to the inferior temporal lobe and paralimbic cortical areas. The transition from mild cognitive impairment (MCI) to full dementia is thought to result from the spread of degenerative atrophy to multimodal association neocortices. Below is a brief survey of methods to extract and/or visualize this information from 3D MRI scans in cross-sectional and longitudinal studies (modified from Vemuri, P. and Jack, 2010).
Visual assessment of the degree of atrophy in the medial temporal lobe is often used to assess disease severity and to add confidence in a clinical diagnosis of AD (Scheltens et al., 1992). Figure 1 shows the medial temporal lobe in cognitively normal elderly (CN), amnestic mild cognitive impairment (aMCI) and AD. While simple visual assessment is easily implemented and widely available, atrophy is a continuous process and this method does not lend itself to accurate or reproducible assessment of fine incremental grades of atrophy.
Manually tracing medial temporal lobe structures (e.g., the hippocampus or entorhinal cortex) and quantifying their volume has traditionally been employed. It provides an accurate quantitative measure of atrophy but is time-consuming (Fox et al., 1996; Jack et al., 1992).
Methods have been developed to automatically parcellate gray matter density (Tzourio-Mazoyer et al., 2002) or the thickness of cortical surfaces (Dale et al., 1999; Fischl et al., 2004) into regions of interest. This is computationally intensive but is reproducible and does not require manual intervention.
An advantage of measuring something like the hippocampus is that the measurements describe a known anatomic structure that (in the case of the hippocampus) is closely related to the pathological expression of the disease and is also functionally related to one of the cardinal early clinical symptoms – memory impairment. The disadvantage of using a single structure or region of interest (ROI) to consolidate 3D information is that it is topographically limited and does not make use of all the available information in a 3D MRI.
Whole-brain methods, by contrast, assess atrophy over the entire three-dimensional MRI scan.
Methods such as voxel based morphometry (VBM) (Ashburner and Friston, 2000) are a popular and useful way to test for group-wise differences in the topography of atrophy. However, the statistical testing portion of VBM is not designed to provide diagnostic information at the single subject level.
Several investigators have developed multivariate analysis and machine learning based algorithms which use the entire 3D MRI data to form a disease model against which individual subjects may be compared. A new incoming scan is scored based on the degree and the pattern of atrophy in comparison to the scans of a large database of well characterized subjects (Alexander and Moeller, 1994; Csernansky et al., 2000; Davatzikos et al., 2009; Fan et al., 2005; Kloppel et al., 2008; Stonnington et al., 2008; Vemuri, P. et al., 2008a). Such measures capture the severity of neuronal pathology, i.e., Braak staging, better than hippocampal volumes (Vemuri, P. et al., 2008b).
While change over time can be determined by simply measuring a volume independently on each scan in a series and performing arithmetic subtraction of the volumes, more sophisticated techniques have been developed to extract tissue loss information from serial MRI scans. In these techniques all MRI scans within a subject's time series are registered to each other and brain loss between scans is quantified as a measure of neurodegenerative disease progression.
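The simple subtraction approach mentioned above can be sketched in a few lines. This is only an illustration: the function name, the choice to express change as a percentage of baseline volume per year, and the example volumes are assumptions made here, not values from the studies cited.

```python
def annualized_percent_change(vol_baseline_mm3, vol_followup_mm3, interval_years):
    """Simple-subtraction atrophy rate, in percent of baseline volume per year.

    Each volume is assumed to have been measured independently on its own scan.
    """
    if vol_baseline_mm3 <= 0 or interval_years <= 0:
        raise ValueError("baseline volume and scan interval must be positive")
    change_mm3 = vol_followup_mm3 - vol_baseline_mm3
    return 100.0 * change_mm3 / vol_baseline_mm3 / interval_years


# Hypothetical hippocampal volumes (mm^3) from two scans acquired 2 years apart.
rate = annualized_percent_change(3000.0, 2880.0, 2.0)
print(f"annualized change: {rate:.2f} % per year")
```

A negative value indicates volume loss. Because the two measurements are made independently, their errors add, which is one motivation for the registration-based techniques described in the same paragraph.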
One of the earliest methods developed to quantify the global change in brain volume between two scans was the boundary shift integral (BSI) (Fox and Freeborough, 1997; Freeborough and Fox, 1997). BSI determines the total volume through which the surface of the brain has moved between scans acquired at two time points, i.e., as the brain volume decreases and the volume of the ventricles increases.
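The idea behind the BSI can be illustrated with a deliberately simplified one-dimensional sketch. It assumes the two intensity profiles are already registered and intensity-normalized, and that every sample lies in the boundary region. The published method works on 3D images and derives the boundary region from brain masks, so the function name, the intensity window, and the toy profiles below are illustrative assumptions only.

```python
def clip(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def boundary_shift_integral(base, repeat, i_low, i_high, voxel_volume=1.0):
    """Toy BSI: volume through which a brain/CSF boundary has moved.

    base, repeat : registered, intensity-normalized samples (equal length)
    i_low, i_high: intensity window spanning the brain/CSF transition
    """
    if len(base) != len(repeat):
        raise ValueError("profiles must have the same length")
    if i_high <= i_low:
        raise ValueError("i_high must exceed i_low")
    total = 0.0
    for b, r in zip(base, repeat):
        # Clipping restricts the contribution to the boundary transition itself.
        total += clip(b, i_low, i_high) - clip(r, i_low, i_high)
    return voxel_volume * total / (i_high - i_low)


# Hypothetical profiles: brain = 1.0, CSF = 0.0; the edge moves inward by 2 voxels.
baseline = [1.0] * 5 + [0.0] * 3
followup = [1.0] * 3 + [0.0] * 5
shift = boundary_shift_integral(baseline, followup, i_low=0.25, i_high=0.75)
print(f"boundary shift: {shift:.1f} voxel volumes")
```

In this toy case the result equals the two voxels that changed from brain to CSF. A positive value corresponds to tissue loss between the baseline and repeat scan.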
Evidence validating MRI as a neurodegenerative AD biomarker is reviewed below. Studies are classified on several criteria, including the method of measurement, number of subjects, and source of subjects. The ideal source is an epidemiological or population-based cohort. The next best option is a community-based sample. The least desirable but most common source of data is the referral sample, which carries the highest risk of bias. Evidence validating MRI as an AD biomarker takes the form of several different types of studies: cross-sectional clinical-MRI correlations; prediction of future clinical change; correlation of change over time on serial MRI with concurrent change on clinical indices; and MRI-autopsy correlation.
Many studies have been published describing the accuracy, sensitivity, specificity, or area under receiver operating characteristic curve (AUROC) with which clinically diagnosed AD subjects can be separated from cognitively normal elderly control subjects. This is the simplest type of data to acquire and hence this is the most frequent type of study found in the literature. This is the weakest category of validation data, since the gold standard against which the MRI is compared is a clinical diagnosis, which can be wrong. A clinical diagnosis is also available in the absence of any biomarker data. Accuracy ranges from 85% to 100%. Different methods have been employed as described above. The literature is too vast to describe each publication, but Table 1 contains some representative examples of studies demonstrating cross sectional separation of clinically diagnosed AD vs. controls. Results vary depending on measurement method, source of subjects, and statistical endpoints.
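For readers less familiar with the endpoints named above, the following sketch shows how accuracy, sensitivity, specificity, and AUROC are computed from a single MRI-derived score. The scores, the threshold, and the convention that higher scores mean more atrophy are all hypothetical and chosen only for illustration. The AUROC is computed as the fraction of AD/control pairs ranked correctly, with ties counted as one half.

```python
def classification_metrics(scores_ad, scores_cn, threshold):
    """Sensitivity, specificity and accuracy when score >= threshold is called AD."""
    tp = sum(s >= threshold for s in scores_ad)   # AD correctly called AD
    fn = len(scores_ad) - tp
    tn = sum(s < threshold for s in scores_cn)    # controls correctly called normal
    fp = len(scores_cn) - tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auroc(scores_ad, scores_cn):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    wins = 0.0
    for a in scores_ad:
        for c in scores_cn:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(scores_ad) * len(scores_cn))


# Hypothetical atrophy scores (higher = more atrophy) for 5 AD and 5 control subjects.
ad_scores = [2.1, 1.4, 0.9, 1.8, 0.3]
cn_scores = [0.2, -0.5, 1.0, 0.1, -1.2]

metrics = classification_metrics(ad_scores, cn_scores, threshold=0.5)
auc = auroc(ad_scores, cn_scores)
print(metrics, f"AUROC = {auc:.2f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUROC summarizes separation across all thresholds. This is one reason results reported with different statistical endpoints are not directly comparable.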
A related class of studies is those that demonstrate cross sectional separation of clinically diagnosed controls vs. subjects with mild cognitive impairment. Mild cognitive impairment may have been defined using the formal diagnostic criteria for MCI outlined by Petersen et al (Petersen, R. C., 2004) or may have been defined using other criteria. Table 2 contains some representative examples of studies demonstrating cross sectional separation of clinically diagnosed controls vs. subjects with mild cognitive impairment using quantitative MRI measures. Results vary depending on measurement method, source of subjects, and statistical endpoints.
MRI-autopsy studies have convincingly validated that quantitative measurements of brain volume loss correlate with pathological indices of neurodegenerative severity. Hippocampal volumes measured from ante mortem MRI scans correlate with Braak neurofibrillary tangle pathologic staging in both demented and non-demented subjects (Gosche et al., 2002; Jack et al., 2002). Ante mortem hippocampal volume as well as rates of brain and hippocampal atrophy from MRI correlate with hippocampal neurofibrillary tangle density at autopsy (Csernansky et al., 2004; Silbert et al., 2003). Excellent correlation is found between hippocampal volume measures obtained on either ante mortem MRI (Zarow et al., 2005) or post mortem MRI (Bobinski et al., 2000) and hippocampal neuron cell counts in autopsy specimens. On the basis of these imaging-to-pathology correlation studies, quantitative measures from structural MRI, such as hippocampal volume, are inferred to represent an approximate surrogate of the stage/severity of neuronal pathology (neuron loss, neuron shrinkage, and synapse loss) that occurs in AD. Voxel-wise studies demonstrate that the topographic distribution of gray matter loss closely mirrors the Braak and Braak spatial distribution of neurofibrillary pathology in subjects who have had ante mortem MRI and have come to autopsy (Figure 2). Fully automated multi-voxel analysis methods demonstrate close correlation between quantitative ante mortem MRI and Braak staging, as depicted in Figure 3 with STructural Abnormality iNDex (STAND) scores.
We point out that while MRI measures of atrophy do scale with pathological indices of neurodegeneration, brain atrophy is not specific to AD. It occurs in other conditions that may be associated with cognitive impairment, such as cerebrovascular disease, hippocampal sclerosis, frontotemporal lobar degeneration, and head trauma (Jack et al., 2002; Jagust et al., 2008; Zarow et al., 2005).
Because different AD biomarkers provide information about different AD-related pathological processes, it stands to reason that comprehensive in vivo assessment of the disease requires information from different classes of biomarkers. Based on the assumptions that MRI provides an index of neurodegenerative pathologic burden (above) and PiB PET a measure of amyloid plaque burden, a model of AD has been proposed in which the rate of amyloid deposition and the rate of neurodegeneration later in life are dissociated. The presence of brain amyloidosis is necessary but not sufficient to produce cognitive decline; the neurodegenerative component of AD pathology is the immediate substrate of cognitive impairment, and the rate of cognitive decline is driven by the rate of neurodegeneration. In this proposed model, amyloid deposition is dynamic early in the disease process (presymptomatically) while neurodegeneration is dynamic in the mid to late stage. This Amyloid and Neurodegeneration model (Jack et al., 2009) is reproduced in Figure 4. In the model, the lifetime course of the disease is divided into clinically defined presymptomatic, early symptomatic (MCI), and dementia phases. Neurodegeneration, detected by atrophy on volumetric MRI, is indicated by a dashed line. Cognitive function is indicated by a dot-dash line. Amyloid deposition, detected by PiB, is indicated by a solid line later in the course of AD (i.e., that portion of the disease for which PiB data are now available). The time course of Aβ42 deposition early in life is represented as two possible theoretical trajectories (dotted lines), reflecting uncertainty about the time course of the early PiB signal.
An expanded version of this disease biomarker model (Jack et al., 2010a) incorporates the five best-validated AD biomarkers into a comprehensive sequence of pathological events as subjects progress from cognitively normal in middle age to dementia in older age. Both CSF Aβ42 and amyloid PET imaging are biomarkers of Aβ plaque deposition. CSF tau is an indicator of tau pathology and associated neuronal injury. FDG PET measures AD-mediated neuronal dysfunction, while structural MRI measures AD-mediated neurodegeneration. This model rests on the assumption that these five AD biomarkers become abnormal in a sequential manner, although their time courses also overlap. The hypothesis is that amyloid PET imaging and CSF Aβ42 become abnormal first, perhaps as much as 20 years before the first clinical symptoms appear. CSF tau and FDG PET become abnormal later, and structural MRI is the last of the five major biomarkers to become abnormal. CSF tau, FDG PET and structural MRI correlate with clinical symptom severity while CSF Aβ42 and amyloid PET imaging may not. The hypothesis is that together these five biomarkers of AD are able to stage the complete trajectory of AD, which may span 20-30 years or more in affected individuals. Figure 5 illustrates this expanded model (Jack et al., 2010a).
MRI is used in several different ways in therapeutic trials. Therapeutic modification of the natural rate of atrophy has been used as an outcome measure in a number of AD and MCI trials. As a measure of the severity or stage of neurodegeneration, MRI has been used as a covariate in analyses, much the same way disease severity on clinical scales like the Mini Mental State Examination (MMSE) or AD Assessment Scale-Cognitive (ADAS-Cog) is used. In theory MRI can also be used to stratify trial subjects at baseline on the basis of disease severity. Although the discussion above has focused on structural MRI as a measure of the severity of AD-related neurodegeneration, MRI is also commonly used for inclusion/exclusion purposes in therapeutic trials. For example, hemispheric cerebral infarction, tumor, normal pressure hydrocephalus (NPH), prior surgery, major head trauma and cerebral hemorrhage are common exclusionary findings on screening MRI. Microhemorrhages that exceed a prespecified number are also a common exclusionary finding in anti-amyloid trials. The major barrier to the use of volumetric MRI as an outcome measure in clinical trials has been lack of standardization of MRI methods, particularly methods for extracting quantitative information from scans. This lack of standardization leads to different results (see Tables 1-4), which in turn undermines the credibility of the method in the minds of regulators. Although initiatives such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) have focused on standardizing imaging methods, universally accepted standards for MRI image quantification have not yet emerged.
At the present time, AD biomarkers have not yet been validated as surrogate endpoints for regulatory purposes. However, the impact of interventions on these biomarkers has been evaluated in a few trials and was found to be potentially useful in capturing the pharmacodynamic effects of an agent. The efficacy of donepezil, an acetylcholinesterase inhibitor, was evaluated using serial anatomic MRI (Hashimoto et al., 2005; Jack et al., 2008; Krishnan et al., 2003), and the drug was found to be possibly neuroprotective based on some evidence of decreased rates of atrophy in the treatment vs. placebo arms. In a different study, antibody responders immunized to Aβ had more rapid volume loss than placebo patients during a Phase IIa immunotherapy trial that was prematurely terminated due to meningoencephalitis in a small subset of patients (Fox et al., 2005).
About 12%-15% of MCI subjects progress to AD annually (Fischer et al., 2007; Petersen, R C, 2007); however, clinical criteria alone cannot identify with certainty which subjects will progress more rapidly than others. For this reason, predictive information from imaging has been sought to supplement clinical prognostic indicators. Studies demonstrating the ability of MRI to predict future progression have taken several forms. Studies using time-to-event methods are appropriate when follow-up times vary among subjects in the cohort, which is most commonly the case. Such studies typically employ Cox proportional hazards models in which cut-offs stratify a baseline MRI measurement into risk groups and the results are reported as hazard ratios (HR) (Jack et al., 1999). This type of analysis relates an imaging measurement to the time to progression from a diagnosis of MCI to AD, not to the lifetime risk of developing AD. A related method of analysis employs a rate of change at baseline as the predictor rather than a brain volume measurement at one point in time (Jack et al., 2005). If all subjects in the study have the same follow-up time, then simply comparing baseline MRI between progressors and non-progressors is appropriate. Unfortunately, several papers have simply compared baseline MRI measures between progressors and non-progressors when follow-up times were not the same across subjects in the cohort. Inferences about imaging as a predictor may be invalid in this situation because subjects classified as progressors may simply be those who have longer follow-up times than subjects classified as non-progressors. Table 3 illustrates examples of studies evaluating the ability of baseline MRI measures to predict time to progression from MCI to AD. Results vary depending on measurement method, source of subjects, and statistical endpoints.
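The follow-up bias described above can be made concrete with a small calculation. The sketch assumes, purely for illustration, that every MCI subject has the same constant annual probability of progression regardless of any MRI measure. The 12% figure is the lower end of the range quoted above; the constant-rate assumption and the function name are introduced here and are not part of the cited studies.

```python
def cumulative_progression(annual_rate, years):
    """Probability of having progressed within `years`, assuming a constant,
    independent annual progression probability (an illustrative assumption)."""
    if not 0.0 <= annual_rate <= 1.0 or years < 0:
        raise ValueError("annual_rate must be in [0, 1] and years non-negative")
    return 1.0 - (1.0 - annual_rate) ** years


# Same underlying risk, different lengths of follow-up.
for follow_up_years in (1, 3, 5):
    p = cumulative_progression(0.12, follow_up_years)
    print(f"{follow_up_years} y follow-up: {100 * p:.1f}% classified as progressors")
```

Under this assumption, subjects followed longer are more likely to be labeled progressors even though their underlying risk is identical. A naive comparison of baseline MRI between progressors and non-progressors can therefore be confounded by follow-up time, whereas time-to-event methods account for it.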
The idea of using change-over-time measures of brain volume on serial MRI was introduced by Fox and Freeborough (Freeborough and Fox, 1997). This approach has appeal as a means of measuring disease progression that is independent of clinical assessment. It has found utility in assessments of individual subjects; in longitudinal observational studies; and as an outcome measurement in therapeutic trials. The potential of change-over-time measures as outcomes in therapeutic trials is particularly appealing because longitudinal MRI measures have considerably better precision than traditional clinical assessment tools, so trials that use them can be powered with much smaller sample sizes. A number of different methodological approaches have been employed, ranging from simple manual tracing to sophisticated tensor-based morphometry (TBM) methods. Several investigators have shown that the lower variance in serial MRI measurements compared to clinical measures of cognition and function could potentially permit clinical trials with smaller sample sizes than would be possible using traditional clinical instruments (Fox et al., 2000; Hua, X. et al., 2008; Jack et al., 2003; Schott et al., 2006; Vemuri, P. et al., 2010). Table 4 illustrates examples of sample sizes needed to power AD or MCI trials. Results vary depending on measurement method, assumptions about the trial design, and statistical methods.
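The link between measurement variance and sample size can be sketched with the standard two-arm formula for comparing means, n = 2σ²(z₁₋α/₂ + z_power)² / Δ² per arm, where Δ is the treatment effect to be detected. This is a generic textbook calculation, not necessarily the method used in the studies of Table 4. The mean rate, standard deviations, and 25% treatment effect below are hypothetical values chosen only to show the dependence on variance.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(mean_rate, sd_rate, reduction, alpha=0.05, power=0.80):
    """Subjects per arm to detect a fractional `reduction` in the mean rate of
    change, using the normal-approximation formula for two equal-sized arms."""
    if not 0.0 < reduction <= 1.0 or sd_rate <= 0 or mean_rate == 0:
        raise ValueError("invalid design parameters")
    delta = reduction * abs(mean_rate)             # absolute effect to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * (sd_rate * (z_alpha + z_power) / delta) ** 2
    return math.ceil(n)


# Hypothetical outcome: mean decline of 2.0 units/year, 25% slowing to be detected.
n_precise = sample_size_per_arm(mean_rate=2.0, sd_rate=1.0, reduction=0.25)
n_noisy = sample_size_per_arm(mean_rate=2.0, sd_rate=2.0, reduction=0.25)
print(n_precise, n_noisy)
```

Because the required sample size scales with the square of the standard deviation relative to the effect, doubling the variability of an outcome measure roughly quadruples the number of subjects needed. This is the arithmetic behind the appeal of lower-variance serial MRI measures.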
This study was supported by the NIH/National Institute on Aging (R01 AG011378), the Alexander Family Alzheimer's Disease Research Professorship of the Mayo Foundation, USA, and the Robert H. and Clarice Smith Alzheimer's Disease Research Program of the Mayo Foundation, USA.