Neurodegenerative disorders, such as Alzheimer’s disease, are associated with changes in multiple neuroimaging and biological measures. These may provide complementary information for diagnosis and prognosis. We present a multi-modality classification framework in which manifolds are constructed based on pairwise similarity measures derived from random forest classifiers. Similarities from multiple modalities are combined to generate an embedding that simultaneously encodes information about all the available features. Multimodality classification is then performed using coordinates from this joint embedding. We evaluate the proposed framework by application to neuroimaging and biological data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Features include regional MRI volumes, voxel-based FDG-PET signal intensities, CSF biomarker measures, and categorical genetic information. Classification based on the joint embedding constructed using information from all four modalities out-performs classification based on any individual modality for comparisons between Alzheimer’s disease patients and healthy controls, as well as between mild cognitive impairment patients and healthy controls. Based on the joint embedding, we achieve classification accuracies of 89% between Alzheimer’s disease patients and healthy controls, and 75% between mild cognitive impairment patients and healthy controls. These results are comparable with those reported in other recent studies using multi-kernel learning. Random forests provide consistent pairwise similarity measures for multiple modalities, thus facilitating the combination of different types of feature data. We demonstrate this by application to data in which the number of features differs by several orders of magnitude between modalities. Random forest classifiers extend naturally to multi-class problems, and the framework described here could be applied to distinguish between multiple patient groups in the future.
Changes in multiple biomarkers may provide complementary information for the diagnosis and prognosis of neurodegenerative disorders such as Alzheimer’s disease (AD). At present, the clinical diagnosis of AD is based on assessments of cognition and behaviour, which start to decline in the later disease stages. Recently published revisions to the diagnostic criteria (Albert et al., 2011; McKhann et al., 2011; Sperling et al., 2011) incorporate suggestions that biological and neuroimaging measures of structural and molecular changes in the brain may be better suited for the early detection of AD, as well as for monitoring its progression. Automated classification of individual patients based on multiple biomarkers could provide valuable support for clinicians, when considered alongside cognitive assessment scores and traditional visual image analysis. This could be particularly useful for monitoring the progression of patients with mild cognitive impairment (MCI), which is often a transitional stage between the cognitive decline associated with normal ageing, and that of established AD.
The neuropathological changes associated with the development of AD begin many years before cognitive symptoms become apparent. According to the amyloid cascade hypothesis, the disease process begins with the formation of insoluble β-amyloid (Aβ) plaques, whose presence triggers the hyperphosphorylation of tau protein, ultimately leading to cell death (Selkoe, 1991). Surrogate measures of the levels of Aβ and tau in the brain may be obtained from the cerebrospinal fluid (CSF). Various studies have shown reduced CSF Aβ and elevated CSF tau in AD patients compared with cognitively normal individuals (e.g. Motter et al., 1995; Vandermeeren et al., 1993). MCI patients tend to have CSF Aβ and tau levels between those expected of AD patients and healthy controls, with AD-like biomarker levels associated with an increased likelihood of progression to AD (Hansson et al., 2006). Positron emission tomography (PET) imaging with radiotracers for amyloid provides an alternative method for assessing intracranial Aβ deposition (Klunk et al., 2004).
PET imaging with the radiotracer [18F]-fluorodeoxyglucose (FDG) can be used to assess brain function in terms of the rate of cerebral glucose metabolism. Both AD and MCI are associated with significantly reduced glucose metabolism in affected regions, including temporal and parietal lobes, and the posterior cingulate cortex (Herholz et al., 2002; Langbaum et al., 2009). Reduced metabolism in AD patients is predictive of their cognitive decline and histopathological diagnosis (Hoffman et al., 2000; Silverman et al., 2001), and in MCI patients can predict their progression to AD (Anchisi et al., 2005). Changes in metabolism can be detected on FDG-PET before corresponding structural changes are visible (Aisen et al., 2010). FDG-PET-based classification techniques can discriminate both AD and MCI patients from healthy controls (Hinrichs et al., 2009).
The progressive structural damage caused by AD can be non-invasively assessed by using magnetic resonance imaging (MRI) to measure cerebral atrophy or ventricular expansion. Temporal lobe atrophy is closely associated with AD, and histological studies show that the hippocampus, amygdala and entorhinal cortex are particularly vulnerable to AD pathology (Braak and Braak, 1998). Accelerated hippocampal atrophy compared with healthy controls has been measured using MRI in both AD and MCI patients (Schuff et al., 2009; van de Pol et al., 2007). A recent study (Cuingnet et al., 2011) comparing ten MRI-based high-dimensional classification methods reported good results in discriminating between AD patients and healthy controls (up to 81% sensitivity and 95% specificity). Two methods relied on only hippocampal shape or volume, while the remainder were whole-brain approaches, using either cortical thickness measures, or voxel-wise tissue class probabilities. Four of the ten methods achieved slightly better accuracy than a random classifier in distinguishing MCI patients who later progressed to AD from MCI patients remaining stable over 18 months.
Although age is the most significant risk factor for AD (Rocca et al., 1991), genetic and environmental factors also play a role. The ApoE gene is the only one so far shown to be associated with sporadic late-onset AD (Dawbarn and Allen, 2007). There are three major alleles of this gene: ε2, ε3 and ε4. The most common is ε3, while ε4 is associated with an increased risk of developing AD, and ε2 with a reduced risk (Corder et al., 1993). More extensive AD pathology is generally observed in carriers of the ApoE ε4 allele than in non-carriers (Roses and Saunders, 1997). Genetics can therefore impact the biological and neuroimaging biomarkers. For example, AD carriers of the ApoE ε4 allele typically have reduced CSF Aβ, elevated CSF tau, and accelerated hippocampal atrophy on MRI compared with non-carriers (Tapiola et al., 2000; Schuff et al., 2009). Cognitively normal carriers of the ApoE ε4 allele display reduced glucose metabolism on FDG-PET in AD-typical regions (Langbaum et al., 2009).
The above biomarker patterns, however, are not specific to AD. For example, reduced CSF Aβ and elevated CSF tau are also associated with Lewy body dementia and vascular dementia (Andreasen et al., 2001; Blennow and Hampel, 2003). Temporal lobe atrophy is also observed in hippocampal sclerosis and temporal lobe epilepsy (Keihaninejad et al., 2012), and an AD-like metabolic pattern can also indicate Creutzfeldt-Jakob disease (Hoffman et al., 1990). It has been suggested that there may be some complementary information between modalities which can be exploited to produce more powerful combined classifiers (Walhovd et al., 2010; Landau et al., 2010). The potential utility of biomarker combinations is also supported by the recently revised diagnostic criteria (Albert et al., 2011; McKhann et al., 2011; Sperling et al., 2011). For research applications, these incorporate a biomarker-based probability of AD etiology which is highest when evidence of both amyloid deposition and neuronal degeneration or injury is observed. This support has resulted in an increasing interest in using multi-modality imaging and biological data for classification. Studies reporting multi-modality AD classification are far less common than those based on data from a single modality (Weiner et al., 2012). However, two independent studies (Zhang et al., 2011; Hinrichs et al., 2011) using multi-kernel learning report that classification based on multi-modality data is superior to that based on any individual modality. Classification accuracies of 93% between AD patients and healthy controls, and 76% between MCI patients and healthy controls were reported in Zhang et al. (2011), based on a combination of FDG-PET, MRI and CSF measures.
We present a framework for multi-modality classification based on pairwise similarity measures derived from random forests (Breiman, 2001). The similarities are used to construct a manifold representation from labelled training data and then to infer the clinical labels of test data mapped into this space. Classical multidimensional scaling (MDS) (Torgerson, 1952) is applied to learn a manifold on which to perform classification, resulting in the construction of a coordinate embedding in which the distances between points preserve their pairwise similarities. MDS is commonly used to provide low-dimensional visualisations of similarity relationships, including those derived from random forests (Hastie et al., 2011). Random forest-derived similarities have been successfully applied in unsupervised clustering tasks, for example those involving high-dimensional genetic or tissue microarray data (Shi and Horvath, 2006; Shi et al., 2005). Here, random forests are used to derive supervised similarity measures, with the aim of generating manifolds that are optimal for the task of clinical group discrimination. The proposed method facilitates the incorporation of multi-modality data, since similarities derived from several datasets may be combined to generate an embedding that simultaneously encodes information from all features.
Manifold learning techniques based on pairwise similarities between images have been applied in a variety of neuroimaging studies. For example, Laplacian eigenmaps (Belkin and Niyogi, 2003) have been used to generate an embedding of brain MR images based on similarities derived from overlaps of their structural segmentations (Aljabar et al., 2008). Isomap (Tenenbaum et al., 2000) has also been used to estimate the manifold structure of brain MR images, using distance measures based on nonrigid transformations between image pairs (Gerber et al., 2010). A framework for fusing manifold learning steps based on different pairwise similarity measures was presented in Aljabar et al. (2010). The method proposed here uses random forests to derive consistent pairwise similarity measures for multiple modalities, thus facilitating the combination of different types of feature data. We evaluate the proposed multi-modality classification framework by application to neuroimaging and biological data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). These data include FDG-PET and MR imaging data, CSF biomarker measures, and categorical genetic information. This work is an extension of preliminary studies involving neuroimaging data which have been previously presented at a workshop (Gray et al., 2011).
A schematic overview of the proposed approach is illustrated in Figure 1. A random forest classifier was applied to the feature data from each modality independently, both to obtain single-modality classification results for comparison and to derive the similarities required for manifold learning. The resulting similarity matrices were combined, and classical MDS was applied to generate a joint embedding for multi-modality classification. Full details of the data collection and feature extraction are presented in Section 3.1.
A random forest (Breiman, 2001) is an ensemble classifier consisting of many decision trees, where the final predicted class for a test example is obtained by combining the predictions of all individual trees, as illustrated in Figure 2. Random forests combine bootstrap aggregation (bagging) (Breiman, 1996) and random feature selection (Amit and Geman, 1997; Ho, 1998) to construct a collection of decision trees exhibiting controlled variation. We used the R implementation of random forests, a port of Leo Breiman and Adele Cutler’s original Fortran code by Andy Liaw and Matthew Wiener, version 4.6–6 (http://cran.r-project.org/web/packages/randomForest).
The training set for each individual tree in a random forest is constructed by sampling N examples at random with replacement from the N available examples in the dataset. This is known as bootstrap sampling, and bagging describes the aggregation of predictions from the resulting collection of trees. As a result of the bootstrap sampling procedure, approximately one third of the available N examples are not present in the training set of each tree. These are referred to as the “out-of-bag” data of the tree, for which internal test predictions can be made. By aggregating the predictions of the out-of-bag data across all trees, an internal estimate of the generalisation error of the random forest can be determined.
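To make the "approximately one third" figure concrete: a given example is missed by all N draws of a bootstrap sample with probability (1 − 1/N)^N, which approaches e^(−1) ≈ 0.368 as N grows. The short Python sketch below checks this by simulation for N = 147, the size of the study population. It is an illustration only; the study used the R randomForest package, and the function names here are hypothetical.

```python
import math
import random


def bootstrap_sample(n, rng):
    """Draw n indices uniformly at random, with replacement, from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Indices that never appear in the bootstrap sample (the tree's out-of-bag data)."""
    chosen = set(in_bag)
    return [i for i in range(n) if i not in chosen]


def mean_oob_fraction(n, n_trees, seed=0):
    """Average fraction of the n examples left out-of-bag over n_trees bootstrap draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        total += len(out_of_bag(n, bootstrap_sample(n, rng))) / n
    return total / n_trees


n = 147  # size of the study population
expected = (1 - 1 / n) ** n  # probability that a given example is never drawn
simulated = mean_oob_fraction(n, n_trees=2000)
print(f"expected {expected:.3f}, simulated {simulated:.3f}, limit 1/e = {math.exp(-1):.3f}")
```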
At each node in a tree, $d \ll D$ features are randomly selected from the $D$ available features in the dataset, and the node is partitioned using the best possible binary split. A parent node $n_p$ is partitioned into child nodes $n_l$ and $n_r$ according to the Gini index (Breiman et al., 1984), which measures the likelihood that an example would be incorrectly labelled if it were randomly classified according to the distribution of labels within the node. For a two-class problem, the Gini index of a node $n$ may be expressed as $I_G(n) = 1 - \sum_{c=1}^{2} p_c^{2}$, where $p_c$ is the relative proportion of examples belonging to class $c$ present in node $n$. The best possible binary split is the one which maximises the improvement in the Gini index $\Delta I_G(n_p) = I_G(n_p) - p_l I_G(n_l) - p_r I_G(n_r)$, where $p_l$ and $p_r$ are the proportions of examples in node $n_p$ that are assigned to child nodes $n_l$ and $n_r$, respectively. The Gini index can also be used to assess the relative importances of features for classification. A measure of the importance of an individual feature may be computed by summing the decreases in the Gini index occurring at all nodes in the forest which are partitioned based on that feature.
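A minimal Python sketch of the node-splitting step described above, assuming numeric features and an exhaustive search over midpoints between sorted feature values. The function names are illustrative and not taken from the randomForest package, and the importance helper follows the description in the text literally; library implementations may weight each decrease differently.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index I_G(n) = 1 - sum_c p_c^2 of the labels reaching node n."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Delta I_G(n_p) = I_G(n_p) - p_l * I_G(n_l) - p_r * I_G(n_r)."""
    p_l, p_r = len(left) / len(parent), len(right) / len(parent)
    return gini(parent) - p_l * gini(left) - p_r * gini(right)


def best_split(values, labels):
    """Best threshold on one feature; candidates are midpoints of sorted distinct values.

    Returns (threshold, decrease), or (None, 0.0) if no split improves the Gini index.
    """
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        dec = gini_decrease(labels, left, right)
        if dec > best[1]:
            best = (t, dec)
    return best


def split_node(X, labels, d, rng):
    """Randomly pick d of the D features, then return the best (feature, threshold, decrease)."""
    candidates = rng.sample(range(len(X[0])), d)
    best = (None, None, 0.0)
    for j in candidates:
        t, dec = best_split([row[j] for row in X], labels)
        if dec > best[2]:
            best = (j, t, dec)
    return best


def gini_importance(split_log):
    """Sum the Gini decreases recorded as (feature, decrease) pairs over all nodes of a forest."""
    totals = {}
    for feature, dec in split_log:
        totals[feature] = totals.get(feature, 0.0) + dec
    return totals


X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]  # toy data: feature 1 is constant
y = ["AD", "AD", "HC", "HC"]
print(split_node(X, y, d=2, rng=random.Random(0)))  # (0, 2.5, 0.5)
```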
Random forests can provide measures of the similarity between pairs of examples in the dataset. Each of the N examples is represented by a feature vector, all of which are passed down each tree in the forest. The similarities are initialised to zero, and if examples i and j finish in the same terminal node of a tree, their similarity $s_{ij}$ is increased by one. The final pairwise similarity measures are normalised by the total number of trees in the forest.
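The similarity computation only needs the terminal node reached by each example in each tree. The sketch below assumes these assignments are already available as a hypothetical `leaf_ids` table (in the R randomForest package, a comparable matrix is returned when the forest is grown with `proximity = TRUE`).

```python
def rf_similarity(leaf_ids):
    """Random forest similarity matrix.

    leaf_ids[t][i] is the terminal node reached by example i in tree t. Each pair that
    shares a terminal node gains one count; counts are divided by the number of trees,
    so s_ii = 1 and 0 <= s_ij <= 1.
    """
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    S = [[0.0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    S[i][j] += 1.0
    return [[count / n_trees for count in row] for row in S]


def similarity_to_distance(S):
    """Distance matrix with elements d_ij = 1 - s_ij."""
    return [[1.0 - s for s in row] for row in S]


# Toy example: 4 trees, 3 examples; node ids are only compared within a tree.
leaf_ids = [[0, 0, 1], [0, 1, 1], [2, 2, 2], [0, 0, 1]]
S = rf_similarity(leaf_ids)
print(S)  # [[1.0, 0.75, 0.25], [0.75, 1.0, 0.5], [0.25, 0.5, 1.0]]
```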
The similarities thus form an N × N matrix with elements $s_{ij}$, and corresponding distance matrix elements $d_{ij} = 1 - s_{ij}$ (Cox and Cox, 2001). Manifold learning techniques may be applied to find an appropriate coordinate embedding for the feature vectors, such that the distance relationships between them are preserved. A review of the most popular manifold learning techniques, as applied to medical imaging, is provided in Aljabar et al. (2012). MDS (Torgerson, 1952) is commonly used in conjunction with random forest-derived similarity measures for low-dimensional data visualisation, as well as unsupervised clustering tasks. In this work, MDS is applied to similarity measures derived in a supervised manner from random forests, with the aim of generating manifolds that are optimal for the task of clinical group discrimination.
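Using MDS, the matrix of coordinates $X$ is derived by performing an eigenvalue decomposition on the matrix of scalar products
$$B = XX^{T} = -\tfrac{1}{2} J D^{(2)} J,$$
where $D^{(2)}$ is the matrix of squared distances $d_{ij}^{2}$ and $J = I - N^{-1}\mathbf{1}\mathbf{1}^{T}$ is the centring matrix. Writing $B = V \Lambda V^{T}$, the coordinates are given by $X = V \Lambda^{1/2}$. Retaining only the eigenvectors corresponding to the $k$ largest eigenvalues leads to a $k$-dimensional embedding for the data. A goodness-of-fit parameter $G$, describing the extent to which the selected $k$ eigenvectors represent the full matrix, can be useful in selecting an appropriate dimensionality for the embedding. The measure of goodness-of-fit used in this work is given by
$$G = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{N} \lvert \lambda_j \rvert},$$
where the eigenvalues $\lambda_j$ are sorted in decreasing order (Mardia et al., 1979).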
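Classical MDS can be sketched in plain Python for small matrices: square the distances, double-centre them to obtain $B = -\tfrac{1}{2} J D^{(2)} J$ with $J = I - N^{-1}\mathbf{1}\mathbf{1}^{T}$, take the eigen-decomposition of $B$, and scale the leading eigenvectors by the square roots of their eigenvalues. This is an illustration, not the code used in the study; a cyclic Jacobi eigensolver stands in for a numerical library, and the function names are hypothetical. (R's `cmdscale()` in the stats package reports comparable goodness-of-fit values through its `GOF` component when `eig = TRUE`.) Because random forest distances $d_{ij} = 1 - s_{ij}$ need not be Euclidean, $B$ may have negative eigenvalues, so the sketch clips them at zero before taking square roots and uses $|\lambda_j|$ in the denominator of the goodness of fit $G = \sum_{j \le k} \lambda_j / \sum_j |\lambda_j|$.

```python
import math


def double_centre(D):
    """B = -1/2 J D^(2) J, with J = I - (1/N) 1 1^T and D^(2) the squared distances."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_means = [sum(r) / n for r in D2]
    col_means = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[-0.5 * (D2[i][j] - row_means[i] - col_means[j] + grand)
             for j in range(n)] for i in range(n)]


def jacobi_eigh(A, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column p of V is the eigenvector for eigenvalue p.
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q (A <- A P)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q (A <- P^T A)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors (V <- V P)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def classical_mds(D, k):
    """k-dimensional coordinates X = V_k Lambda_k^(1/2) and the goodness of fit G."""
    vals, vecs = jacobi_eigh(double_centre(D))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    lam = [vals[i] for i in order]
    X = [[vecs[r][order[c]] * math.sqrt(max(lam[c], 0.0)) for c in range(k)]
         for r in range(len(D))]
    G = sum(lam[:k]) / sum(abs(v) for v in lam)
    return X, G


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy Euclidean configuration
D = [[math.dist(a, b) for b in pts] for a in pts]
X, G = classical_mds(D, k=2)
print(round(G, 6))  # 1.0: two dimensions recover the square exactly
```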
To generate an embedding that simultaneously incorporated information from multiple modalities, a joint similarity matrix $S$ was defined as a linear combination of the similarity matrices $S_i$ from each of the four modalities. Each modality was assigned a weighting factor $\alpha_i$, such that $S = \sum_{i=1}^{4} \alpha_i S_i$, where $\sum_{i=1}^{4} \alpha_i = 1$. To ensure the best combination of the four modalities for classification, the $\alpha_i$ parameters were optimised as part of the training process. This was achieved by performing a grid search within the training data to select the optimum modality weightings. The classifier was then trained using the selected set of parameters, before having its performance assessed on the test data.
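The modality weighting and grid search can be sketched as follows. The grid resolution (`step = 0.1`, which gives 286 candidate weightings for four modalities) and the `score` callback are assumptions for illustration: the text does not state the grid spacing or the exact criterion used to rank weightings within the training data.

```python
from itertools import product


def weight_grid(n_modalities=4, step=0.1):
    """All weight vectors on a regular grid with non-negative entries summing to one."""
    m = round(1 / step)
    return [tuple(c / m for c in combo)
            for combo in product(range(m + 1), repeat=n_modalities)
            if sum(combo) == m]


def combine_similarities(similarities, weights):
    """Joint similarity S = sum_i alpha_i S_i, computed element by element."""
    n = len(similarities[0])
    return [[sum(a * S[i][j] for a, S in zip(weights, similarities))
             for j in range(n)] for i in range(n)]


def grid_search(similarities, score, step=0.1):
    """Weights maximising score(S_joint); score is a hypothetical training-data criterion."""
    return max(weight_grid(len(similarities), step),
               key=lambda w: score(combine_similarities(similarities, w)))


S1 = [[1.0, 0.0], [0.0, 1.0]]  # toy 2 x 2 similarity matrices
S2 = [[1.0, 1.0], [1.0, 1.0]]
print(len(weight_grid(4, 0.1)))                          # 286
print(combine_similarities([S1, S2], (0.25, 0.75)))      # [[1.0, 0.75], [0.75, 1.0]]
```

In practice `score` would wrap an inner evaluation on the training folds only (for example, embedding the joint matrix and measuring out-of-bag accuracy), so that the test data never influence the selected weights.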
We applied the proposed methodology to neuroimaging and biological data from 147 ADNI participants for whom baseline 1.5 T MRI, FDG-PET and CSF biomarker measures were available, as well as ApoE genotype information.
Data used in the preparation of this article were obtained from the ADNI database (http://adni.loni.ucla.edu). The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. A more detailed description of the ADNI study is provided in Appendix A.
We use imaging and biological data from 147 ADNI participants, including 37 AD patients, 75 MCI patients, and 35 healthy controls (HC). These represent all participants for whom ApoE genotype information and baseline 1.5 T MRI, FDG-PET and CSF measures of Aβ, tau and phosphorylated tau (ptau) were available, subject to the exclusions described in Appendix B. The MCI patients were divided into those who have progressed to AD (pMCI) and those whose diagnoses have so far remained stable (sMCI) based on changes in clinical status occurring over 20 ± 11 months (range 6 – 36). Clinical and demographic information for the study population is provided in Table 1.
Automatic whole-brain segmentations into 83 anatomical regions were prepared in native MRI space using multi-atlas propagation with enhanced registration (MAPER), an approach that has been previously described and validated for use in AD (Heckemann et al., 2010). The segmentations are available to download through the ADNI website, and full details of the procedure and morphometric analysis are presented in Heckemann et al. (2011). The required atlas data consisted of manually segmented T1-weighted MR volumes from 30 young, healthy adults, as described in Hammers et al. (2003). Protocols for the manual delineation are described in Hammers et al. (2003) and Gousias et al. (2008).
To match the requirements of MAPER, additional pre-processing was applied for brain extraction and tissue classification. For brain extraction, intracranial masks were generated using binary masks covering intracranial white and grey matter as the starting point, as described in Heckemann et al. (2011). The binary masks had been generated with a semi-automatic procedure as part of a separate project using MIDAS (Freeborough et al., 1997; Leung et al., 2011). Individual tissue probability maps for grey matter, white matter and CSF were obtained using FSL FAST (http://www.fmrib.ox.ac.uk/fsl). Binary maximum-probability tissue class labels were employed to mask the anatomical segmentations. All regions except ventricles, central structures, cerebellum and brainstem were masked with a grey matter label. Lateral ventricles were masked with a CSF label. Regional volumes were normalised by the intracranial volume, resulting in 83 volumetric region-based features per image.
Each FDG-PET image was motion-corrected as necessary, converted to a 30-minute static image, examined for major artefacts, and affinely aligned with the corresponding MRI using the Image Registration Toolkit (IRTK; http://www.doc.ic.ac.uk/~dr/software). An affine transformation was preferred over a rigid one because it can account for scaling or voxel size errors remaining after phantom correction of the MRI (Clarkson et al., 2009).
Using the “Segment” module of SPM5 (http://www.fil.ion.ucl.ac.uk/spm), each MRI was linearly and non-linearly deformed to the Montreal Neurological Institute (MNI) template (Ashburner and Friston, 2005). The resulting transformation parameters were applied to the MRI-space PET images using trilinear interpolation. The MNI-space PET images (voxel sizes 2 × 2 × 2 mm) were then smoothed to a common isotropic spatial resolution of 8 mm full-width-at-half-maximum (FWHM) using scanner-specific kernels (Joshi et al., 2009), and then by an additional 8 mm FWHM isotropic Gaussian kernel. The smoothed images were intensity normalised to account for inter-subject variability in overall radioactivity using an independently-derived cluster of relatively preserved regions (Yakushev et al., 2009). The a priori brain mask available in SPM5 was thresholded at 50% probability, and applied to each normalised FDG-PET image to exclude voxels outside the brain. Signal intensities were extracted from all remaining voxels, resulting in 239,304 voxel-based features per image.
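As a rough check on the net smoothing, assuming the harmonised 8 mm resolution can be treated as a Gaussian point-spread function: convolving two Gaussians adds their variances, so the additional 8 mm kernel gives an effective resolution of about √(8² + 8²) ≈ 11.3 mm FWHM. The sketch below does this arithmetic and converts the extra kernel's FWHM to a standard deviation in 2 mm voxels; the helper names are hypothetical.

```python
import math

FWHM_PER_SIGMA = 2 * math.sqrt(2 * math.log(2))  # about 2.3548 for a Gaussian


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation (mm) of a Gaussian with the given FWHM (mm)."""
    return fwhm_mm / FWHM_PER_SIGMA


def combined_fwhm(*fwhms_mm):
    """Convolving Gaussians adds variances, so their FWHMs add in quadrature."""
    return math.sqrt(sum(f * f for f in fwhms_mm))


effective = combined_fwhm(8.0, 8.0)    # 8 mm harmonised resolution, then an extra 8 mm kernel
sigma_vox = fwhm_to_sigma(8.0) / 2.0   # extra kernel's sigma expressed in 2 mm voxels
print(f"effective FWHM ≈ {effective:.2f} mm; extra-kernel sigma ≈ {sigma_vox:.2f} voxels")
```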
The ADNI Biomarker Core provides biological data for the study participants. These data include CSF measures of Aβ, tau and ptau, as well as ApoE genotype information determined from a blood sample. Details of the biofluid collection and processing are provided in Trojanowski et al. (2010). The genetic feature data for each participant consisted of a single categorical variable describing their ApoE genotype. These data are summarised in Table 2.
Classification performance was assessed between three clinically relevant pairs of diagnostic groups (AD/HC, MCI/HC, pMCI/sMCI). Robust estimates of classifier performance were obtained using a stratified repeated random sampling approach. The mean accuracy, balanced accuracy, sensitivity and specificity were evaluated over 100 runs in which 75% of the data were randomly selected for training, with the remaining 25% used as test data. Balanced accuracy is the arithmetic mean of the sensitivity and specificity. This provides a more meaningful performance metric for groups of unequal sizes.
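A sketch of the evaluation protocol, with hypothetical function names: a stratified 75/25 split that preserves class proportions in each run, and the four reported metrics. The usage line shows why balanced accuracy matters for groups of unequal size, using the MCI/HC group sizes (75 and 35) and a degenerate classifier that always predicts MCI.

```python
import random


def stratified_split(labels, train_fraction=0.75, rng=None):
    """Split indices so that each class contributes ~train_fraction of its members to training."""
    rng = rng or random.Random()
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and balanced accuracy (mean of the last two)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    sens, spec = tp / n_pos, tn / n_neg
    return {"accuracy": (tp + tn) / len(y_true), "sensitivity": sens,
            "specificity": spec, "balanced_accuracy": (sens + spec) / 2}


m = binary_metrics(["MCI"] * 75 + ["HC"] * 35, ["MCI"] * 110, positive="MCI")
print(m)  # accuracy ≈ 0.682 but balanced accuracy = 0.5
```

Repeating the split 100 times with different random seeds, training and testing a classifier on each split, and averaging the metrics gives estimates of the kind described above.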
A random forest classifier was applied to the feature data from each of the four modalities independently, and the single-modality classification results obtained are presented in Table 3.
Before performing classification experiments, the number of trees grown in each forest, t, and the number of features randomly selected at each tree node, d, had to be chosen. We used t = 5,000 for all the experiments that follow, since stable estimates of the out-of-bag classification error were consistently observed for t ≥ 1,000. We used d = √D for all experiments, following the recommendation of Liaw and Wiener (2002). The value of d was also consistently observed to have little effect on the out-of-bag classification error estimate.
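The number of candidate features per node varies enormously across the four modalities. Assuming d follows the randomForest package default for classification, `mtry = floor(sqrt(p))`, a short Python sketch (with a hypothetical helper name) gives the values for each feature set:

```python
import math


def features_per_node(D):
    """Hypothetical helper mirroring the classification default d = floor(sqrt(D)), at least 1."""
    return max(1, math.isqrt(D))


feature_counts = {"MRI": 83, "FDG-PET": 239_304, "CSF": 3, "ApoE": 1}
for modality, D in feature_counts.items():
    print(f"{modality:8s} D = {D:>7,d}  d = {features_per_node(D)}")
```

With a single feature, as for the genetic data, d = D = 1 and there is no random feature selection at all.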
As described in Section 2.1, estimates of the relative importances of the features for classification may be extracted from the random forest. Feature importances for the two imaging modalities are shown in Figure 3. The most important features for MRI include volumes of the hippocampus, amygdala, and other medial temporal lobe structures. The most important features for FDG-PET include signal intensities of voxels located in the posterior cingulate gyrus, parietal lobe, posterior temporal lobe, and around the hippocampus.
The random forest classifiers described in Section 3.2.1 were used to derive pairwise similarity measures for each of the four modalities, as described in Section 2.2. Examples of the similarity matrices are shown in Figure 4.
MDS was applied to each similarity matrix, and a goodness-of-fit value of 90% was used to determine an appropriate dimensionality for the resulting embeddings. A random forest classifier was then applied to the embedded feature data from each of the four modalities independently. The single-modality classification results obtained are presented in Table 4, along with the dimensionality of each embedding.
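Choosing the dimensionality from a 90% goodness-of-fit threshold amounts to finding the smallest k for which $\sum_{j \le k} \lambda_j / \sum_j |\lambda_j| \ge 0.9$. A hypothetical helper, assuming the eigenvalues of the double-centred matrix are already available; the fallback used when the threshold is never reached is our own choice and is not specified in the text.

```python
def choose_dimension(eigenvalues, threshold=0.90):
    """Smallest k with G_k = sum_{j<=k} lambda_j / sum_j |lambda_j| >= threshold.

    If the threshold is never reached (possible when there are large negative
    eigenvalues), fall back to the number of positive eigenvalues.
    """
    lam = sorted(eigenvalues, reverse=True)
    denom = sum(abs(v) for v in lam)
    running = 0.0
    for k, v in enumerate(lam, start=1):
        running += v
        if running / denom >= threshold:
            return k
    return sum(1 for v in lam if v > 0)


print(choose_dimension([6.0, 2.0, 1.5, 0.3, -0.2]))  # 3, since G_3 ≈ 0.95
```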
MDS was applied to the joint similarity matrix constructed using information from all four modalities, and a goodness-of-fit value of 90% was again used to determine an appropriate dimensionality for the resulting embedding. A random forest classifier was applied to the embedded feature data, and the multimodality classification results obtained are presented in Table 5. The balanced accuracies based on multi-modality classification are significantly (p < 0.01) higher than those based on any individual modality for both the AD/HC and MCI/HC experiments. For the pMCI/sMCI experiment, however, the balanced accuracy based on multi-modality classification is not significantly different from that based on MRI information alone. Table 5 additionally shows results based on the application of a random forest classifier to the combined feature set generated by a simple concatenation of features from all four modalities. These show no significant differences from the results based on FDG-PET alone.
For each of the classification experiments, the distribution of modality weighting parameters selected over the 100 runs is illustrated in Figure 5.
We have presented a framework for multi-modality classification based on pairwise similarity measures derived from random forests. The similarities are used to construct a manifold representation from labelled training data and then to infer the clinical labels of test data mapped into this space. Random forests are used to derive similarity measures in a supervised manner, with the aim of generating manifolds that are optimal for the task of clinical group discrimination. The proposed method facilitates the incorporation of multi-modality data, since consistent similarities may be derived from several datasets, and combined to generate an embedding that simultaneously encodes information from all features. Multi-modality classification may then be performed using coordinates from this joint embedding. The applicability of the method to a large and diverse dataset is demonstrated using imaging and biological information from the ADNI study. This includes FDG-PET and MR imaging data, CSF biomarker measures, and categorical genetic information.
Classification based on the joint embedding constructed using information from all four modalities is superior to classification based on any individual modality for comparisons between AD patients and HC, as well as between MCI patients and HC. This finding is in agreement with other recent studies (Zhang et al., 2011; Hinrichs et al., 2011), and supports previous suggestions that there is some complementary information between modalities which can be exploited to produce more powerful combined classifiers (Walhovd et al., 2010; Landau et al., 2010). Classification performance is commonly reported in terms of accuracy, but here we perform statistical comparisons between experiments based on the balanced accuracy. This provides a more meaningful performance metric for groups of unequal sizes. In terms of accuracy, we achieve 89% classification between AD patients and HC, and 75% between MCI patients and HC.
Direct comparisons with existing work are complicated by factors such as the inclusion of different subjects and modalities, as well as the use of different methods for feature extraction and cross-validation. In assessing the proposed framework, we consider comparisons between the single- and multimodality classification performance more important, since these are based on the same method, data, and processing pipeline. Our multi-modality classification results are, however, comparable with those reported in other recent studies based on ADNI data. Zhang et al. (2011) combine MRI, FDG-PET and CSF data to achieve classification accuracies of 93% between AD patients and HC, and 76% between MCI patients and HC. Hinrichs et al. (2011) combine MRI, FDG-PET, CSF, genetic and cognitive data to achieve an accuracy of 92% between AD patients and HC. Both studies employ multi-kernel classifiers based on support vector machines, which are relatively well-established in the field (Weiner et al., 2012). Random forests have not been extensively applied in neuroimaging research, although they are becoming more widely used for applications including classification (Hope et al., 2008; Chincarini et al., 2011) and segmentation (Geremia et al., 2011; Iglesias et al., 2011). Classification of neurodegenerative disease based on the combination of imaging and non-imaging information is therefore a new application for random forests, and we are encouraged by the fact that our results are comparable with those of more established techniques.
The ability of random forests to extract consistent pairwise similarity measures for multiple modalities facilitates the combination of different types of feature data. We demonstrate this using datasets in which the number of features differs between modalities by several orders of magnitude. A simple concatenation of all features was shown not to combine these data optimally (Table 5), since the high-dimensional FDG-PET features dominate the results.
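One plausible way to see this dominance, assuming the concatenated forest uses the default d = ⌊√D⌋: with 239,304 FDG-PET features and only 87 others (83 MRI, 3 CSF, 1 ApoE), most nodes never see a single non-PET candidate. The hypothetical helper below computes the probability that at least one of the minority features is among the d features drawn without replacement at a node.

```python
import math


def prob_any_minor_feature(n_major, n_minor, d):
    """P(at least one of n_minor features is among d drawn without replacement)."""
    total = n_major + n_minor
    p_none = 1.0
    for i in range(d):
        p_none *= (n_major - i) / (total - i)  # i-th draw also comes from the majority
    return 1.0 - p_none


n_pet, n_other = 239_304, 83 + 3 + 1     # FDG-PET voxels vs MRI + CSF + ApoE features
d = math.isqrt(n_pet + n_other)          # floor(sqrt(D)) for the concatenated feature set
p_any = prob_any_minor_feature(n_pet, n_other, d)
print(d, round(p_any, 3))
```

This gives d = 489 and a probability of roughly 0.16, so over 80% of nodes would choose their split from FDG-PET voxels alone.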
In previous work (Gray et al., 2011), we obtained comparable multi-modality classification results using only information extracted from the two neuroimaging modalities (accuracies of 90% for AD/HC, and 76% for MCI/HC). The lack of improvement over these previously reported results is likely to be attributable to the considerable reduction in size of the subject group as a result of our requirement for CSF biomarker information. The subject group for this study is approximately half the size of that used in Gray et al. (2011). The present work additionally employs a more robust form of cross-validation, using a stratified repeated random sampling approach, as opposed to the single round of ten-fold cross-validation employed in Gray et al. (2011).
In the context of a neuroimaging application, estimates of the importances of the features for classification are valuable because they allow assessment of whether the features that contribute most to the classifier correspond to regions or structures with a biologically plausible connection to pathology. In this work, the most important features for discriminating between clinical groups correspond with those known to be visibly affected in AD on both FDG-PET and structural MR imaging (Hampel et al., 2008; Patwardhan et al., 2004). Important features for distinguishing between AD patients and HC are more localised to affected areas, with the more challenging distinctions between MCI patients and HC, or pMCI and sMCI patients, requiring features spread across a larger part of the brain. The motivation for extracting region-based features from the MR images and voxel-based features from the FDG-PET images was to demonstrate that these two different types of imaging features could be readily combined using the proposed method.
The lack of significant difference between classification performance based on the original feature data and that based on the embedding coordinates for each individual modality is expected, since a random forest is already a nonlinear classifier. However, a difference is observed for the comparison between MCI patients and HC based on the voxel-wise FDG-PET features. This may be due to the inhomogeneity of the MCI group, which comprises both pMCI and sMCI patients. It is possible that the high-dimensional voxel-based FDG-PET features are sensitive to differences in the pattern of glucose metabolism between these two groups, resulting in a reduced classification performance based on the associated embedding coordinates.
Random forests are ensemble classifiers that are often applied to high-dimensional datasets. In this work, random forests are also applied to low-dimensional biological data so that consistent pairwise similarity measures can be obtained for all modalities. In the case of a single feature, such as the categorical genetic information, a random forest reduces to bootstrap aggregation (bagging) of decision trees, because the random selection of candidate features at each split has no effect when only one feature is available.
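The pairwise similarity that makes the modalities comparable can be sketched as follows, assuming the standard random forest proximity: the fraction of trees in which two subjects reach the same terminal node. The input is a table of leaf indices; in scikit-learn, for instance, `RandomForestClassifier.apply(X)` returns such an array of shape `(n_samples, n_estimators)`. The function name `rf_similarity` is hypothetical, and this is a sketch of the general technique rather than the implementation used here.

```python
def rf_similarity(leaf_ids):
    """Pairwise random forest similarity (proximity) from leaf assignments.

    leaf_ids[i][t] is the index of the terminal node that subject i reaches
    in tree t. The similarity of subjects i and j is the fraction of trees in
    which they reach the same terminal node, so it lies in [0, 1] and equals
    1 on the diagonal.
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Compare leaves tree by tree; leaf indices are only meaningful within one tree
            shared = sum(a == b for a, b in zip(leaf_ids[i], leaf_ids[j]))
            S[i][j] = S[j][i] = shared / n_trees
    return S
```

Because the result depends only on leaf membership, it has the same range and meaning whether a modality contributes a single genetic feature or many thousands of FDG-PET voxels.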
Visualisation of the parameters selected to combine similarities for multi-modality classification (Figure 5) provides some interesting insights into the relationships among the modalities. The figure indicates the optimal way in which to combine MRI, FDG-PET, CSF and genetic information within the framework described. For distinguishing AD patients from HC, for example, FDG-PET and MR imaging features appear to provide the most complementary information. For distinguishing MCI patients from HC, genetic information appears to have relatively high importance. However, these indications do not necessarily have a direct biological interpretation, and are unlikely to be sufficient to draw firm conclusions about the relative importance of the four modalities. The optimal modality weightings for distinguishing AD patients from HC are more stable than those for the other two experiments. The weighting parameters selected for distinguishing MCI patients from HC may be more variable because the heterogeneity of the MCI group makes their selection dependent on the proportions of pMCI and sMCI patients in the training set. The selected modality weightings are not equal in any experiment, supporting the need to optimise the relative contributions of the four modalities for classification. The figure also suggests an interesting avenue for further research: estimates of inter-modality correlations could help to determine the amount of complementary information the modalities provide. This could inform decisions on how to acquire the maximum amount of diagnostically relevant information for a patient using a minimum number of assessments.
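A minimal sketch of the additive combination and of a grid search over modality weights is given below. The source states only that similarities are combined additively using weights selected for each experiment; the non-negativity and sum-to-one constraints, the 0.1 grid step and the function names `weight_grid` and `combine_similarities` are assumptions made for illustration. In practice each candidate weight vector would be scored by cross-validation within the training set only.

```python
from itertools import product


def weight_grid(n_modalities, step=0.1):
    """All non-negative weight vectors on a regular grid whose entries sum to 1."""
    n_steps = round(1 / step)
    grid = []
    # Choose the first n_modalities - 1 weights; the last takes up the remainder
    for counts in product(range(n_steps + 1), repeat=n_modalities - 1):
        remainder = n_steps - sum(counts)
        if remainder >= 0:
            grid.append(tuple(c / n_steps for c in counts + (remainder,)))
    return grid


def combine_similarities(similarities, weights):
    """Weighted sum of per-modality similarity matrices (square lists of lists)."""
    n = len(similarities[0])
    return [
        [sum(w * S[i][j] for w, S in zip(weights, similarities)) for j in range(n)]
        for i in range(n)
    ]
```

With four modalities (MRI, FDG-PET, CSF, genetics) and a step of 0.1, the grid contains 286 candidate weight vectors, including the single-modality corners such as (1.0, 0.0, 0.0, 0.0).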
The classification performance between pMCI and sMCI patients is not significantly improved by combining multi-modality information in this study. Other studies applying multi-modality classification to this challenging problem have incorporated longitudinal information to improve performance (Hinrichs et al., 2011; Zhang et al., 2012). Our previous work based on FDG-PET (Gray et al., 2012) has also shown that incorporating longitudinal information can improve the ability to distinguish between these two groups. This will be an important issue to address in the future. It is important to consider, however, that progression from MCI to AD occurs at a rate of 10–15% per year (Petersen et al., 1999), with up to 80% of MCI patients developing AD over a six-year period (Petersen, 2004). Some patients currently labelled sMCI may therefore still progress, and longer clinical follow-up is required to properly assess the utility of any classification method in separating pMCI from sMCI patients. This may be made possible by the continuation of the ADNI study in the form of ADNI-GO and ADNI-2. Further information about these studies is available via the ADNI website (http://adni.loni.ucla.edu/about/about-the-study/).
We have identified several areas for further research. Methodologically, the approach is generalisable: the manifold learning step could be performed using an alternative technique, and similarities could be combined using more sophisticated formulae. We applied MDS for manifold learning in this work because its application to random forest-derived similarities is straightforward and relatively common in the literature (Hastie et al., 2011). However, MDS may not be the optimal embedding approach, and a thorough exploration of alternative techniques could improve the performance of this step. Similarly, we chose to combine similarity matrices additively for simplicity, and a more sophisticated formula could yield further improvement. Random forest classifiers extend naturally to multi-class problems, and the framework described here could be applied to distinguish between multiple patient groups in the future. Exploration of the multi-class setting would better represent a real-world clinical scenario, including patients with other forms of dementia. In addition, the implementation of random forests used in this work could be modified to produce uncertainty information about the predicted diagnostic labels. This may be more useful to clinicians than a simple binary prediction, and could be achieved by having each leaf node store a probability distribution over labels rather than a point estimate. Criminisi et al. (2012), for example, describe this and other extensions to the original random forest algorithm, and present a unified model of random decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Further investigation of different data fusion strategies would also be of interest.
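The embedding step can be sketched with classical (Torgerson) MDS in NumPy. This is one standard form of MDS, and converting similarities to dissimilarities as 1 − s is an assumption; the text does not fix either choice, and the function names `classical_mds` and `embed_similarity` are hypothetical. The leading eigenvectors of the double-centred matrix of squared dissimilarities, scaled by the square roots of their eigenvalues, give the embedding coordinates, which would then serve as input features for classification.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS of a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centring matrix J = I - (1/n) * ones
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared dissimilarities (a Gram matrix if D is Euclidean)
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the n_components largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean dissimilarities can give negative eigenvalues; clip them to zero
    top = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top)


def embed_similarity(S, n_components=2):
    """Embed subjects from a similarity matrix in [0, 1], using 1 - S as dissimilarity."""
    S = np.asarray(S, dtype=float)
    return classical_mds(1.0 - S, n_components)
```

When the dissimilarities are exact Euclidean distances, classical MDS recovers the original configuration up to rotation, reflection and translation. Random forest dissimilarities are generally not Euclidean, which is why the selected eigenvalues are clipped at zero; alternative embedding techniques, as suggested above, would replace `classical_mds` while leaving the rest of the pipeline unchanged.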
Although the feature-level fusion approach proposed in this work can capture interdependencies between modalities, applying a data-level fusion technique to the neuroimaging data may be advantageous for capturing local inter-modality correlations.
The authors would like to thank Igor Yakushev (University of Mainz, Germany) for provision of the reference cluster image used for FDG-PET intensity normalisation, as well as Casper Nielsen and Kelvin Leung (Dementia Research Centre, University College London) for provision of the MIDAS brain masks used for MRI anatomical segmentation. Katherine R. Gray received a studentship from the Engineering and Physical Sciences Research Council. Rolf A. Heckemann was supported by a research grant from the Dunhill Medical Trust. The Image Registration Toolkit was used under licence from Ixico Ltd.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc., F. Hoffmann-La Roche, Schering-Plough, Synarc, Inc., as well as non-profit partners the Alzheimer’s Association and Alzheimer’s Drug Discovery Foundation, with participation from the U.S. Food and Drug Administration. Private sector contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation.
The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organisations, as a $60 million, five-year public-private partnership. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California - San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations. Subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit approximately 200 cognitively normal older individuals to be followed for three years, 400 MCI patients to be followed for three years, and 200 early AD patients to be followed for two years. Further up-to-date information is available on the ADNI information website (http://www.adni-info.org).
In this work, we use imaging and biological data from 147 ADNI participants. These represent all participants for whom ApoE genotype information and baseline 1.5 T MRI, FDG-PET and CSF measures of Aβ, tau and phosphorylated tau were available, subject to the following exclusions. FDG-PET images acquired using either the Siemens HRRT or BioGraph HiRez scanners were excluded due to differences in the pattern of FDG metabolism that were discovered during the ADNI quality control process. Further information is available on the ADNI PET Core website (http://www.loni.ucla.edu/twiki/bin/view/ADNI/ADNIPETCore). A small number of subjects (n = 14) whose imaging data could not be processed as required, or whose diagnosis did not clearly fall into one of the four diagnostic groups (AD, pMCI, sMCI, HC), were also excluded. ADNI subject identifiers for these participants are provided as supplementary material, along with more detailed reasons for their exclusion.