HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.

Arch Neurol. Author manuscript; available in PMC 2011 September 1.
Published in final edited form as:
PMCID: PMC3069805
NIHMSID: NIHMS262518

A Serum Protein-Based Algorithm for the Detection of Alzheimer's Disease

Abstract

Background

Alzheimer's disease (AD) is the most common form of age-related dementia and one of the most serious health problems in the industrialized world. Biomarker-based approaches to diagnosis would be more time- and cost-effective and may also be useful for identifying endophenotypes within AD patient populations.

Methods

We analyzed serum protein-based multiplex biomarker data from 197 patients diagnosed with AD and 203 controls enrolled in a longitudinal study of Alzheimer's disease conducted by the Texas Alzheimer's Research Consortium, with the aim of developing an algorithm that separates AD from controls. The total sample was randomized equally into training and test sets, and random forest methods were applied to the training set to create a biomarker risk score.

Findings

The biomarker risk score had a sensitivity and specificity of 0.80 and 0.91, respectively, and an area under the curve (AUC) of 0.91 in detecting AD. When age, gender, education, and APOE status were added to the algorithm, the sensitivity, specificity, and AUC were 0.94, 0.84, and 0.95, respectively.

Interpretation

These initial data suggest that serum protein-based biomarkers can be combined with clinical information to accurately classify AD. Of note, a disproportionate number of inflammatory and vascular markers were weighted most heavily in the analyses. Additionally, these markers consistently distinguished cases from controls in significance analysis of microarrays (SAM), logistic regression, and Wilcoxon analyses, suggesting the existence of an inflammatory-related endophenotype of AD that may provide targeted therapeutic opportunities for this subset of patients.

Introduction

There is a clear need for reliable and valid diagnostic and prognostic biomarkers of Alzheimer's disease (AD), and in recent years there has been an explosive increase in efforts aimed at identifying such markers. It has been argued previously that, because of its practical advantages, peripheral blood would be the ideal source of such biomarkers1. Peripheral blood can be collected at any clinic (or during an in-home visit), whereas most clinics are not capable of conducting lumbar punctures. Furthermore, advanced neuroimaging techniques are typically available only at large medical centers in heavily urbanized areas. A blood-based algorithm would therefore greatly increase access to advanced detection; moreover, while nearly all patients are willing to undergo venipuncture, fewer elderly patients agree to lumbar puncture, and many are unable to undergo neuroimaging for a range of reasons (e.g., pacemakers).

Even though a large literature demonstrates altered levels of a range of biomarkers (CSF, serum, and plasma) in AD patients (as well as in patients with mild cognitive impairment, MCI) relative to controls, attempts to identify a single biomarker specific to AD have failed. In the highly publicized study by Ray et al2, a large set of plasma-based proteins was analyzed in an effort to identify a biomarker profile indicative of AD. The overall classification accuracy of their algorithm was 90%; additionally, it accurately identified 81% of MCI patients who would progress to AD within a 2-6 year follow-up period. To date, however, these findings have not been cross-validated, nor has an independent blood-based (particularly serum-based) algorithm been published.

In addition to offering more accessible, rapid, and cost- and time-effective methods for assessment, biomarkers (or panels of biomarkers) also hold great potential for identifying endophenotypes within AD populations that are associated with particular disease mechanisms. Once such endophenotypes are identified, therapeutics specifically tailored to endophenotype status could be tested. An example from cardiovascular disease illustrates the idea: plasma cholesterol is a useful biomarker in the management of coronary artery disease because it identifies a subset of patients in whom atherosclerosis is pathogenically related to hypercholesterolemia, and plasma cholesterol measurements also serve as indicators of the efficacy of treatment with HMG-CoA reductase inhibitors. Translating this conceptual framework to AD would be a major advance in the field3. The identification of a pro-inflammatory endophenotype of AD would have implications for targeted therapeutics for a subgroup of patients: those with over-expression of the pro-inflammatory biomarker profile may benefit from treatment with anti-inflammatory compounds, whereas those with under-expression of this profile may get worse on such treatment.

In the current study we sought to (1) determine whether a serum-based biomarker algorithm would significantly predict AD status, (2) evaluate whether including demographic variables directly in the algorithm would improve overall classification accuracy, and (3) determine whether inflammatory-related markers predominated among the markers over- or under-expressed in AD, which would be an initial step toward the concept of an inflammatory-related AD endophenotype.

Methods

Participants

Participants included 400 individuals (197 AD subjects, 203 controls) enrolled in the Texas Alzheimer's Research Consortium (TARC). The methodology of the TARC project has been described in detail elsewhere4; each participant underwent a standardized annual examination at the respective site that included a medical evaluation, neuropsychological testing, and an interview. Each participant also provided blood for storage in the TARC biobank. AD diagnosis was based on NINCDS-ADRDA criteria5, and controls performed within normal limits on psychometric assessment. Institutional Review Board approval was obtained at each site, and written informed consent was obtained from all participants.

Assays

Non-fasting blood samples were collected in serum-separating tubes during clinical evaluations, allowed to clot at room temperature for 30 minutes, centrifuged, aliquoted, and stored at -80°C in plastic vials. Batched specimens from either baseline or year-one follow-up exams were sent frozen to Rules Based Medicine (RBM, www.rulesbasedmedicine.com, Austin, TX), where they were thawed and assayed, without additional freeze-thaw cycles, on RBM's multiplexed immunoassay human Multi-Analyte Profile (humanMAP). Multiple proteins were quantified through multiplex fluorescent immunoassay using colored microspheres with protein-specific antibodies. Information regarding the least detectable dose (LDD), inter-run coefficient of variation, dynamic range, overall spiked standard recovery, and cross-reactivity with other humanMAP analytes can be obtained from Rules Based Medicine. Because rapid evolution is expected with all such technologies, the complete list of humanMAP analytes used at the time of the current analyses is provided in Appendix 1.

Statistical Analyses

Analyses were performed using R (V 2.10) statistical software6. Fisher's exact tests were used to compare cases versus controls on categorical variables (APOE ε4 allele frequency, gender, race, and ethnicity), and Mann-Whitney U tests were used for continuous variables (age and education). The biomarker data were log-transformed and then standardized for each analyte. The random forest prediction model was built using the R package randomForest (V 4.5)7 with all default software settings. We used the method of Bair et al8 to de-correlate the RBM biomarker data from the clinical variables. Receiver operating characteristic (ROC) curves were analyzed, and the area under the curve (AUC) was calculated using the R package DiagnosisMed (V 0.2.2.2). Significance analysis of microarrays (SAM) was performed using the R package samr (V 1.27)9. False discovery rates (FDRs) were calculated to account for multiple comparisons. The FDR from the SAM analysis was determined by permutation, and those from the Wilcoxon tests and logistic regression models were determined by fitting the p values to beta-uniform mixture models10. The beta-uniform mixture models were fitted using the R package ClassComparison (V 2.5.0) (http://bioinformatics.mdanderson.org/Software/OOMPA/).
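The per-analyte preprocessing described above (log transform, then standardization) can be illustrated with a minimal plain-Python sketch. It assumes natural logarithms and sample standard deviations; the text specifies neither the log base, nor whether the scaling parameters were estimated on the training set only, nor how values below the least detectable dose were handled, so the function name `log_standardize` and the input layout are hypothetical.

```python
import math
from statistics import mean, stdev


def log_standardize(values_by_analyte):
    """Log-transform each analyte's measurements, then z-score them per analyte.

    values_by_analyte: dict mapping an analyte name (hypothetical) to a list of
    positive concentrations, one per subject, in the same subject order for
    every analyte.
    """
    standardized = {}
    for analyte, values in values_by_analyte.items():
        # Natural log is an assumption; the text does not state the base.
        logged = [math.log(v) for v in values]
        mu, sd = mean(logged), stdev(logged)  # stdev uses the n - 1 denominator
        standardized[analyte] = [(x - mu) / sd for x in logged]
    return standardized
```

For example, `log_standardize({"analyte_A": [1.0, math.e, math.e ** 2]})` gives values of approximately `[-1.0, 0.0, 1.0]` for `analyte_A`. Because z-scoring removes any constant scale factor, the choice of log base does not change the standardized values.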

Results

Demographic characteristics of the study population are shown in Table 1. Alzheimer's patients were significantly older (p < 0.001), less educated (p < 0.001), and more likely to carry at least one copy of the APOE ε4 allele (p < 0.001) than control participants.
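Categorical comparisons such as APOE ε4 carrier status reduce to a 2x2 table, for which Fisher's exact test can be sketched in plain Python. This is an illustrative implementation, not the R routine used in the analyses: it computes hypergeometric probabilities for every table sharing the observed margins and sums those no more likely than the observed table (with a small relative tolerance for floating-point ties). The function name and the counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows might be AD vs control and columns APOE e4 carrier vs non-carrier.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over all tables with the same margins that are at most as likely.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
```

For the made-up table `[[3, 1], [1, 3]]`, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70, approximately 0.486.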

Table 1
Participant Demographic Information

Subjects were randomized to a training set or a test set using a random number generator, and a random forest (RF) prediction model was then built on the training set using all of the markers in the RBM humanMAP. The resulting model assigned each subject in the test set a risk score reflecting the probability of an AD diagnosis. With the risk-score cut-off set to 0.47 to optimize performance (i.e., risk score > 0.47 = AD; ≤ 0.47 = control), the biomarker algorithm yielded an area under the curve (AUC) of 0.91 (95% CI = 0.88-0.95), with sensitivity and specificity of 0.80 (95% CI = 0.71-0.87) and 0.91 (95% CI = 0.81-0.94), respectively. When the non-optimized cut-off of 0.5 was used instead, the results did not change significantly (AUC = 0.91, sensitivity = 0.73, specificity = 0.91). To test robustness against the allocation of subjects to training and test sets, randomization was also performed by TARC site; this yielded an AUC of 0.88, indicating that the algorithm's performance is robust to the choice of allocation method. Figure 1 presents a variable importance plot of the protein markers from the random forest built on the training set.
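The evaluation steps above (an equal random split, a cut-off applied to risk scores, and an AUC) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the random forest itself is not reproduced, the risk scores are assumed to be values in [0, 1] such as the fraction of trees voting AD, the site-based allocation is not shown, and the function names and seed are hypothetical. The AUC is computed in its Mann-Whitney form (the probability that a randomly chosen case scores higher than a randomly chosen control); the confidence intervals reported above are not computed here.

```python
import random


def split_half(subject_ids, seed=0):
    """Randomly allocate subjects to equal-sized training and test sets.

    The seed is arbitrary; the study only states that a random number
    generator was used.
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def sensitivity_specificity(scores, labels, cutoff=0.47):
    """Sensitivity and specificity when a risk score > cutoff is called AD.

    scores: test-set risk scores; labels: 1 = AD, 0 = control.
    """
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        called_ad = s > cutoff  # a score <= cutoff is called control
        if y == 1 and called_ad:
            tp += 1
        elif y == 1:
            fn += 1
        elif called_ad:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability a random case outscores a random control (ties = 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

With made-up scores `[0.9, 0.6, 0.4, 0.3, 0.5, 0.2]` and labels `[1, 1, 1, 0, 0, 0]`, the 0.47 cut-off gives a sensitivity and specificity of 2/3 each, and the AUC is 8/9 (about 0.89). The pairwise AUC loop is quadratic, which is fine for a test set of about 200 subjects.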

Figure 1
Variable importance plot of protein biomarkers measured by the Random Forest built from training set.

Next, the biomarker data were de-correlated8 from the clinical variables of age, gender, education, and APOE status, and an additional random forest prediction model was generated. Results from the multivariate logistic regression model (Table 2) demonstrate that the biomarker risk score was a significant, independent predictor of case status. As Table 3 shows, clinical data alone correctly classified a large portion of the sample; this performance was comparable to, though somewhat less accurate than, that of the biomarker profile alone. However, a combined algorithm using both biomarker and clinical data was superior to either alone (see Table 3 and Figure 2). Using the non-optimized cut-off for the biomarker risk score did not change the findings for the algorithm using both clinical and biomarker data (AUC = 0.95, sensitivity = 0.90, specificity = 0.87).
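The text does not spell out the de-correlation step. One simple way to remove the linear association between a biomarker and clinical covariates, assumed here and not necessarily identical to the method of Bair et al, is to regress the biomarker on the covariates by ordinary least squares and keep the residuals. The sketch below does this in plain Python; the covariate encoding (for example, sex and APOE ε4 carrier status as 0/1) and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear covariates?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualize(y, covariates):
    """Residuals of one biomarker after OLS regression on clinical covariates.

    y: the biomarker's (standardized) values, one per subject.
    covariates: per-subject rows, e.g. [age, sex, education, apoe4_carrier].
    An intercept column is added automatically.
    """
    X = [[1.0] + list(row) for row in covariates]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [yi - sum(bk * xk for bk, xk in zip(beta, r)) for r, yi in zip(X, y)]
```

By construction the residuals sum to zero and are uncorrelated with each covariate, so a model built on them cannot re-use the clinical information that is entered separately. Solving the normal equations directly is adequate for a handful of covariates; a QR-based solver would be more numerically robust for larger designs.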

Figure 2
ROC curve for clinical variables alone and in conjunction with biomarker data
Table 2
Results from logistic regression models for test set
Table 3
Diagnostic accuracy of clinical variables alone and in conjunction with biomarker data when applied to test set

SAM analysis with an FDR < 0.001 identified a total of 23 proteins that were differentially expressed in AD relative to controls, either over-expressed (n = 14) or under-expressed (n = 9) (see Table 4). The Wilcoxon test identified 22 proteins with an FDR < 0.0025, and logistic regression identified 22 proteins with an FDR < 0.01. Figure 3 demonstrates the consistency between the methods used. Supporting our notion of a possible inflammatory-related endophenotype within AD patients, 10 of the 30 markers identified in Figure 1 (MIP1, eotaxin 1, TNFα, fibrinogen, IL5, IL7, IL10, CRP, MCP1, and von Willebrand factor) were inflammatory in nature.
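For reference, the statistics behind the FDR thresholds above can be written as follows, as we understand the cited papers9,10. The notation is introduced here for illustration: x_ij is the standardized log level of protein i in subject j, s_0 is SAM's small positive constant that stabilizes the denominator, and τ is the p-value threshold. SAM's own FDR is obtained by permutation rather than from a closed form, and how samr chooses s_0 and how ClassComparison fits λ and a are left to those sources.

```latex
% SAM relative difference for protein i (Tusher et al.)
d_i = \frac{\bar{x}_{i,\mathrm{AD}} - \bar{x}_{i,\mathrm{C}}}{s_i + s_0},
\qquad
s_i = \sqrt{\frac{1/n_{\mathrm{AD}} + 1/n_{\mathrm{C}}}{n_{\mathrm{AD}} + n_{\mathrm{C}} - 2}
  \left[\sum_{j \in \mathrm{AD}} \left(x_{ij} - \bar{x}_{i,\mathrm{AD}}\right)^2
      + \sum_{j \in \mathrm{C}} \left(x_{ij} - \bar{x}_{i,\mathrm{C}}\right)^2\right]}

% Beta-uniform mixture (BUM) density for the p values (Pounds and Morris)
f(p) = \lambda + (1 - \lambda)\, a\, p^{a-1},
\qquad 0 < \lambda < 1,\quad 0 < a < 1

% Estimated FDR when every p value \le \tau is called significant
\widehat{\mathrm{FDR}}(\tau)
  = \frac{\hat{\pi}_0\, \tau}{\hat{\lambda}\, \tau + (1 - \hat{\lambda})\, \tau^{\hat{a}}},
\qquad \hat{\pi}_0 = \hat{\lambda} + (1 - \hat{\lambda})\, \hat{a}
```

The denominator is the fitted BUM distribution function at τ, and the numerator is the expected share of p values at or below τ contributed by truly unchanged proteins, bounded above by the height of the fitted density at p = 1.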

Figure 3
Venn diagram demonstrating consistency across methods for identifying altered protein expression in AD in the test set
Table 4
Proteins with differential expression in AD cases based on SAM analysis

Discussion

In a recent, highly publicized study, Ray et al2 identified a subset of 18 plasma-based proteins that yielded excellent classification accuracy for cases versus controls; our serum protein-based algorithm yielded comparable accuracy. Notably, the markers from our study (Figure 1) overlap only minimally with those presented by Ray et al (ANG2 and TNFα). This difference in signature profiles likely results from the different sample media (serum versus plasma) as well as the different assay platforms. Our study has several distinct advantages over that of Ray et al. First, our serum protein assays were conducted by RBM, which has developed high-throughput methodologies for reliably assaying large volumes of samples and analytes. RBM is the leading biomarker company in the U.S., working with multiple pharmaceutical companies, the Alzheimer's Disease Neuroimaging Initiative, and several of the leading AD biomarker research labs both within and outside the U.S. Second, our sample of controls and AD cases is more than twice as large as the sample used by Ray and colleagues. Third, our study included demographic information (age, gender, education, APOE status) in the predictive algorithm, and we demonstrated that the combination of biomarker and clinical information yields superior results to either alone. Finally, our study is unique in that we are the first group to present serum-based findings.

In support of our hypothesis that an inflammatory endophenotype exists, many of the proteins with the highest importance in the RF analyses were inflammatory in nature (Figure 1). Additionally, in the SAM analyses, a large portion of the proteins identified as either over- or under-expressed were inflammatory in nature. Taken together, these data suggest the existence of an inflammatory endophenotype within Alzheimer's disease cases, which could offer targeted therapeutic options for this subgroup of patients.

Of note, the algorithm identified in the current study may not be AD-specific. The current findings are preliminary, and follow-up is necessary to test the algorithm's ability to detect AD among samples that also include non-AD dementias. It is also possible that the observed inflammatory signature is not specific to AD but instead reflects co-morbid factors (e.g., cardiovascular disease). In fact, a pro-inflammatory endophenotype likely also exists within patients diagnosed with other dementia syndromes. Such a finding would further support the utility of a pro-inflammatory endophenotype, as it would likely represent a common pathway for a wide array of diseases.

The identification of blood-based biomarker profiles with good diagnostic accuracy would have a profound impact worldwide, but such profiles require further validation. Likewise, the identification of pathway-specific endophenotypes among AD patients would have implications for targeted therapeutics as well as for understanding differential progression among diagnosed cases. With rapidly evolving technologies and analytic techniques, Alzheimer's disease researchers now have the tools to simultaneously analyze exponentially more information from a host of modalities, which is likely to be necessary to understand this highly complex disease.

Supplementary Material

Appendix 1

Acknowledgments

This study was made possible by the Texas Alzheimer's Research Consortium (TARC) funded by the state of Texas through the Texas Council on Alzheimer's Disease and Related Disorders. Investigators at the University of Texas Southwestern Medical Center at Dallas also acknowledge support from the UTSW Alzheimer's Disease Center NIH, NIA grant P30AG12300. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Investigators from the Texas Alzheimer's Research Consortium: Baylor College of Medicine: Eveleen Darby, Kinga Szigeti, Aline Hittle; Texas Tech University Health Science Center: Paula Grammas, Benjamin Williams, Andrew Dentino, Gregory Schrimsher, Parastoo Momeni, Larry Hill; University of North Texas Health Science Center: Janice Knebl, James Hall, Lisa Alvarez, Douglas Mains; University of Texas Southwestern Medical Center: Roger Rosenberg, Ryan Huebinger, Janet Smith, Mechelle Murray, Tomequa Sears

Footnotes

Disclosure: A patent has been filed in conjunction with Rules Based Medicine for the algorithm contained within this manuscript. The following authors are named on the patent: SE O'Bryant, RC Barber, R Diaz-Arrastia, G Xiao, PM Adams, JS Reisch, RS Doody, and TJ Fairchild.

References

1. Graff-Radford NR, Crook JE, Lucas J, et al. Association of low plasma Abeta42/Abeta40 ratios with increased imminent risk for mild cognitive impairment and Alzheimer disease. Archives of Neurology. 2007;64(3):354–362.
2. Ray S, Britschgi M, Herbert C, et al. Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins. Nature Medicine. 2007;13(11):1359–1362.
3. Thal LJ, Kantarci K, Reiman EM, et al. The role of biomarkers in clinical trials for Alzheimer disease. Alzheimer Disease & Associated Disorders. 2006;20(1):6–15.
4. Waring S, O'Bryant SE, Reisch JS, Diaz-Arrastia R, Knebl J, Doody R, for the Texas Alzheimer's Research Consortium. The Texas Alzheimer's Research Consortium longitudinal research cohort: Study design and baseline characteristics. Texas Public Health Journal. 2008;60(3):9–13.
5. McKhann G, Drachman D, Folstein M, et al. Clinical diagnosis of Alzheimer's disease: Report of the NINCDS-ADRDA Work Group. Neurology. 1984;34:939–944.
6. R Development Core Team. R: A language and environment for statistical computing. 2009. http://www.R-project.org.
7. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
8. Bair E, Hastie T, Paul D, Tibshirani R. Prediction by supervised principal components. Technical report, Department of Statistics, Stanford University. 2004.
9. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America. 2001;98(9):5116–5121.
10. Pounds S, Morris SW. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics. 2003;19(10):1236–1242.