Alzheimer’s disease (AD) is the most common type of dementia, accounting for 60–80% of age-related dementia cases [1]. AD currently affects about 5.3 million people in the US, with a significant increase predicted in the near future if no disease-altering therapeutics are developed [1]. In AD patients, neurons and their connections are progressively destroyed, leading to loss of cognitive function and ultimately death. As therapeutic intervention is most likely to be beneficial in the early stage of the disease, identification of a biosignature that enables an earlier and more accurate diagnosis of AD is an important goal. Mild Cognitive Impairment (MCI), a transitional stage between normal aging and the development of dementia, has been introduced to account for the intermediate cognitive state where patients are impaired on one or more standardized cognitive tests but do not meet the criteria for clinical diagnosis of dementia [2]. The American Academy of Neurology has recognized MCI as an important clinical group to be identified and monitored [3]. Patients with MCI are at high risk of progression to dementia; it is estimated that 10–15% of these patients progress to AD annually. MCI has thus attracted increasing attention, because it offers an opportunity to target the disease process early. More recently, MCI has been further classified according to the presence or absence of a primary memory deficit (amnestic and nonamnestic MCI, respectively), either in relative isolation (single domain) or accompanied by other types of cognitive deficits (multiple domain). As the amnestic form of MCI, single or multiple domain, has the greatest risk of progression to dementia, it has been a primary focus of interest in aging studies. There is thus an urgent need to address two major research questions: (1) how can we identify MCI individuals with a high likelihood of progression to dementia, and (2) what is the biosignature most predictive of conversion from MCI to AD? Brain atrophy measured by MRI scans, positron emission tomography (PET) including imaging of amyloid burden, and CSF measurements including Aβ42 and total tau (t-tau) have been the prime candidate biosignatures for diagnosis and for tracking disease progression.
Neuroimaging has been shown to be a powerful tool for the exploration of disease progression and therapeutic efficacy in AD and MCI. Neuroimaging research offers great potential to discover features that can identify individuals early in the course of dementing illness; several candidate neuroimaging biosignatures have been examined in recent cross-sectional and longitudinal neuroimaging studies [4, 5]. Recognizing the importance of neuroimaging, the NIH funded the Alzheimer’s Disease Neuroimaging Initiative (ADNI) in 2003. All subjects in ADNI undergo 1.5T or 3T structural Magnetic Resonance Imaging (MRI) scans. Half of the subjects undergo Positron Emission Tomography (PET) scans. While FDG-PET scans may show a high sensitivity or specificity for the early detection of AD, the validation of structural MRI markers is the core project in ADNI because of MRI’s greater availability, faster data acquisition, and lower cost. Structural MRI, in particular, has great potential in enabling earlier clinical diagnosis and predicting disease progression. Previous studies have demonstrated that the hippocampus and the entorhinal cortex of MCI patients are typically smaller than those measured in normal controls, and that these measurements are predictive of future conversion to AD [4]. As the specificity of the prediction is still low [5], current work continues to examine additional regions and pattern changes for more accurate prediction.
Besides brain atrophy measured by MRI scans, CSF measurements including total tau (t-tau), phosphorylated tau (p-tau), and Aβ42 were identified as being among the most promising and informative AD biosignatures. Increased CSF concentrations of t-tau and p-tau and decreased concentrations of Aβ42 are found in MCI and AD, and their combination is considered to be characteristic of AD. However, there is considerable variability of published opinion on the utility of CSF measurements for predicting conversion from MCI to AD [6-8]. This may be attributable to the small number of subjects used in many of the previous studies and the variability in their measurement methodology.
In addition to MRI and CSF measurements, there are various clinical/cognitive assessment scores from the ADNI data set that are potentially useful for the prediction of MCI-to-AD conversion, including Mini Mental State Examination (MMSE), Clinical Dementia Rating Sum of Boxes (CDR-SB), Alzheimer’s Disease Assessment Scale-cognitive subscale (ADAS-cog), Logical Memory immediate (LIMM) and delayed (DELL) paragraph recall, Activities of Daily Living Score (from the Functional Activities Questionnaire, FAQ), and Trail Making Tests: Part A (TRAA) and Part B (TRAB). Clinical/cognitive assessments offer potential advantages over imaging or CSF biomarkers since the use of imaging and CSF biomarkers could severely limit the number of participants screened for a study. Although MRI, CSF, and clinical/cognitive assessments have been extensively studied in the past, few reports have compared and combined various measurements from MCI subjects. In this study, we use a large number of samples from ADNI to test:
(1) the ability of various baseline data (MRI, demographic, genetic and cognitive measures) to predict the conversion from MCI to probable AD;
(2) the power of integrating various baseline data in order to identify a biosignature (a small subset of predictive biomarkers) for prediction of the conversion from MCI to probable AD; and
(3) the use of CSF biomarkers for predicting the conversion from MCI to probable AD and the potential of increasing predictive accuracy by combining CSF biomarkers with other measurements.
The main technical challenge is how to effectively integrate various baseline data for classification (MCI Converters versus MCI Non-converters). A simple approach for data integration is to form a long vector for each sample (subject) by concatenating the features from all baseline data, which is then fed into a classifier such as a support vector machine (SVM) [9]. To deal with the high dimension/small sample size problem, feature selection, which selects a small subset of features for improved generalization performance, is commonly applied. Most existing feature selection algorithms, such as the t-test, perform univariate feature ranking [10] and fail to take the correlation between features into consideration. In this paper, we apply sparse logistic regression for feature selection, which selects a small subset of features using L1-norm regularization [11]. L1-norm regularization is appealing in many applications due to its sparsity-inducing property, convenient convexity, and strong theoretical guarantees [12]. An important issue in the practical application of sparse logistic regression is the selection of an appropriate amount of regularization, known as model selection. Cross validation is commonly used for model selection; however, it tends to select more features than needed. In this paper, we employ stability selection, a recently proposed method that addresses the problem of proper regularization using subsampling/bootstrapping [13].
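The feature selection idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the proximal gradient solver, the regularization grid, the number of subsamples, the 0.8 frequency threshold, and the synthetic data (a stand-in for the concatenated baseline features) are all hypothetical choices made for the example. Features are assumed to be standardized.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_sparse_logistic(X, y, lam, step=1.0, n_iter=60):
    """Fit L1-regularized logistic regression by proximal gradient descent.

    X is a list of (standardized) feature rows, y a list of 0/1 labels,
    lam the L1 penalty weight. The intercept is not penalized.
    Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding yields exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def stability_selection(X, y, lambdas, n_subsamples=20, seed=0):
    """Return, per feature, the highest selection frequency over `lambdas`.

    Each fit uses a random half of the samples, drawn without replacement.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    best = [0.0] * p
    for lam in lambdas:
        counts = [0] * p
        for _ in range(n_subsamples):
            idx = rng.sample(range(n), n // 2)
            weights, _intercept = fit_sparse_logistic(
                [X[i] for i in idx], [y[i] for i in idx], lam)
            for j in range(p):
                if weights[j] != 0.0:
                    counts[j] += 1
        best = [max(bj, c / n_subsamples) for bj, c in zip(best, counts)]
    return best


def make_toy_data(n=200, p=5, seed=1):
    """Synthetic data (hypothetical): only columns 0 and 1 drive the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if 2.0 * r[0] - 2.0 * r[1] + rng.gauss(0.0, 0.5) > 0 else 0
         for r in X]
    return X, y


X, y = make_toy_data()
frequencies = stability_selection(X, y, lambdas=[0.15, 0.25])
# The 0.8 cut-off is an illustrative assumption, not a value from this study.
stable = [j for j, f in enumerate(frequencies) if f >= 0.8]
print(frequencies, stable)
```

Because each feature is scored by how often it survives across many subsamples, the final subset is less sensitive to the exact amount of regularization than one chosen by a single cross-validated fit.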
Our study differs from others in three important respects: (1) we use a large cohort of MCI samples that are unbiased with respect to age or education status between cases and controls; (2) we integrate and test various types of baseline data available in ADNI, including MRI, demographic, genetic and cognitive measures; and (3) we apply sparse logistic regression with stability selection to ADNI data for robust feature selection. We have evaluated sparse logistic regression with stability selection on a set of 319 MCI subjects from ADNI, including 177 MCI Non-converters and 142 MCI Converters (conversion was assessed over a 4-year follow-up period). Our experiments show that a combination of 15 features from MRI scans, APOE genotyping, and cognitive measures, selected by sparse logistic regression with stability selection, achieves an AUC score of 0.8587.
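For readers less familiar with the AUC metric, the sketch below computes it directly from its rank-based definition: the probability that a randomly chosen Converter receives a higher predicted score than a randomly chosen Non-converter, with ties counted as one half. The function name and the four-subject example are illustrative assumptions and are unrelated to the ADNI results reported here.

```python
def auc_score(labels, scores):
    """Area under the ROC curve from 0/1 labels and predicted scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = Converter, 0 = Non-converter.
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.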