Approach 1: Core Analytes, Single Threshold
The use of a single analyte with a single threshold value is perhaps the classification strategy most familiar to the practicing physician. Each clinical laboratory has a set of defined cut-off values, sometimes age- and gender-adjusted, for common systemic illnesses associated with neurological manifestations such as vitamin B12 deficiency and hypothyroidism. Multiple peptides have been proposed as such a litmus test for AD (reviewed by Shaw et al.[68]), and CSF levels of peptides associated with the underlying AD pathology[5] (total tau, p-tau181, and Aβ42) have been the most widely studied surrogate markers. Alterations of these peptides have been found in autopsy-confirmed cases of AD:[26] CSF Aβ42 levels are decreased in AD, presumably due to aggregation of soluble extracellular Aβ42 peptides in neuritic plaques, while CSF tau and p-tau181 levels are increased due to release from degenerating neurons.[15,68] Multiple monoclonal antibody combinations are available for each peptide, either commercially or through development in individual laboratories, and standardized kits for use in an enzyme-linked immunosorbent assay (ELISA) are commercially available. The clear advantages of such an assay system are its low upfront cost and convenience, as most investigators have expertise in developing or applying ELISA to peptide targets without highly specialized reagents or instruments.
At the same time, the ease of conducting such assays comes at the cost of relatively high inter-assay variability, which has been the main barrier to translating ELISA-based analyses into the clinical diagnosis of AD. While coefficients of variation (%CV) can be reduced significantly over a large number of samples and assays at single centers,[66] significant inter-center variability continues to be reported. In one study that addressed this type of variability across 14 clinical neurochemistry laboratories in Europe (Germany, Austria, and Switzerland) using commercially available ELISA kits (Innogenetics, Ghent, Belgium), the CVs of ELISA-based assays were in the 20–30% range (26% for tau, 27% for p-tau181, and 29% for Aβ42).[52] In another study in 2004, 13 international centers each received three CSF-pool samples, and the majority used the same commercially available ELISA kits to measure levels of AD biomarkers.[79] The CV for Aβ42 ranged from 17 to 53%, the CV for tau from 13 to 29%, and the CV for p-tau181 from 10 to 16%. The variability improved when the same samples were assayed again by the participating centers 4 years later, with CVs ranging from 19 to 26% for Aβ42, 13 to 18% for tau, and 14 to 16% for p-tau181.[79]
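As a point of reference for the %CV figures quoted above, the following is a minimal sketch of how an inter-center coefficient of variation could be computed for one pooled sample. The readings and the function name `percent_cv` are hypothetical and are not taken from any of the cited studies.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical Abeta42 readings (pg/mL) of the same pooled CSF sample
# reported by five different centers (illustrative values only).
center_readings = [480.0, 520.0, 610.0, 450.0, 540.0]

inter_center_cv = percent_cv(center_readings)
print(f"inter-center %CV: {inter_center_cv:.1f}")
```

The same function applied to repeated runs within one center would give the intra-center value, which is why the two figures are reported separately.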
This type of variability is especially troublesome when levels of different biomarkers are combined, as each analyte in the model contributes its own measurement variation. For example, in our search for the best combination of CSF AD biomarkers to differentiate between pathologically confirmed cases of AD and FTLD, our initial analysis derived a cut-off value of 1.06 for the CSF total tau to Aβ42 ratio (tau/Aβ42), which achieved 78.9% sensitivity and 96.6% specificity in distinguishing between FTLD and AD.[11] However, our subsequent re-analysis of a larger cohort, including CSF samples from both living patients and pathologically confirmed cases, derived a cut-off value of 1.45 using the same set of autopsy-confirmed cases as our initial analysis.[45] These differences were not due to cases having widely different relative levels of tau or Aβ42, but rather to small differences in the measured levels of tau and Aβ42. One potential solution is to include autopsy-confirmed cases within each experimental design to derive the receiver operating characteristic (ROC) curve for each assay, although the supply of autopsy-confirmed cases with ante-mortem CSF is limited and it would be impractical to extend this approach to additional sites where similar assays are performed. A second approach is systematic numeric normalization to account for inter-site and inter-assay variability, which has been demonstrated in a multi-center study.[55] The exact variability this method introduces is unclear and deserves further review.
An alternative to the use of ELISA to measure levels of CSF AD biomarkers is a bead-based approach (AlzBio3®, Innogenetics, Ghent, Belgium)[47] on the Luminex platform. Instead of plate-mounted antibodies, antibodies targeting total tau, p-tau181, and Aβ42 are attached to beads that can be suspended in solution. This method more closely resembles a solution-phase interaction between specific antibodies and their paired antigens, and has been empirically associated with less intra- and inter-assay variability. As part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Biomarker Core,[69] a seven-site round robin study was undertaken to determine the inter-site and inter-assay variability of pooled CSF samples.[76] The intra- and inter-assay CVs using 5 pooled CSF samples averaged 6.7–7.5% within single centers; between centers, CVs were 18–20% for Aβ42, 12–19% for tau, and 15–26% for p-tau181, compared to the 20–30% values typically seen in ELISA-based assays. Additional work is underway to explore and minimize such inter-center CVs, similar to what was done for ELISA-based assays. The advantages of a Luminex-based approach thus include increased precision and the ability to multiplex assays using a very small volume of biofluid, while the disadvantages include the upfront cost and the maintenance of specialized equipment. Using living cognitively normal subjects (n=52) and autopsy-confirmed cases of AD (n=56) from UPenn, Shaw and co-workers used the Luminex platform in a multiplex fashion to derive threshold values for each CSF AD biomarker (i.e. total tau, p-tau181, and Aβ42) to distinguish between AD and cognitively normal subjects.[69] The observed trade-off between sensitivity and specificity was similar to that seen in ELISA-based analyses, with CSF Aβ42 levels garnering higher sensitivity at the cost of specificity, and CSF tau levels associated with higher specificity at the cost of sensitivity. Also similar to work based on ELISA, a combination of the two biomarkers, either as a ratio or as a linear combination,[55,69] achieved a better balance between sensitivity and specificity, again suggesting that multiple biomarkers rather than any single biomarker may generate the best diagnostic profile for AD against cognitively healthy subjects.
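The sensitivity and specificity trade-off for a ratio such as tau/Aβ42 can be illustrated with a short sketch. The measurements, the candidate cut-offs, and the helper name `sens_spec` below are hypothetical and chosen only to show how moving a single threshold shifts the two measures in opposite directions.

```python
def sens_spec(values, is_ad, cutoff):
    """Call a sample AD-like when value > cutoff.

    Returns (sensitivity, specificity) against the reference labels.
    """
    tp = sum(1 for v, d in zip(values, is_ad) if d and v > cutoff)
    fn = sum(1 for v, d in zip(values, is_ad) if d and v <= cutoff)
    tn = sum(1 for v, d in zip(values, is_ad) if not d and v <= cutoff)
    fp = sum(1 for v, d in zip(values, is_ad) if not d and v > cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical CSF levels in pg/mL (not data from any cited study).
tau = [120.0, 95.0, 60.0, 150.0, 55.0, 70.0, 40.0, 90.0]
abeta42 = [100.0, 110.0, 90.0, 130.0, 250.0, 240.0, 200.0, 150.0]
is_ad = [True, True, True, True, False, False, False, False]

# Combine the two analytes into a single ratio per subject.
ratio = [t / a for t, a in zip(tau, abeta42)]

for cutoff in (0.5, 0.8):
    sens, spec = sens_spec(ratio, is_ad, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these illustrative values, raising the cut-off trades sensitivity for specificity, which mirrors the pattern described for the individual analytes.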
However, a number of questions remain unanswered in the use of a single cut-off value for each core analyte or analyte combination. For example, do the threshold values derived from studies of AD vs. cognitively healthy subjects apply to the distinction between AD and non-AD dementing illnesses? Given the lag time between clinical CSF collection and autopsy evaluation, most studies rely on analysis of living AD patients or living cognitively healthy subjects, and few reports are available on the relationship between these CSF AD biomarkers and the eventual pathologic diagnosis.[19,21,29] As detailed neuropathologic analysis remains the gold standard against which biomarkers are compared, cut-off values based on living patients and control subjects may over- or under-estimate the diagnostic accuracy at any given threshold level. Furthermore, such cut-off values are usually derived from comparisons between elderly AD patients (with or without autopsy confirmation) and age-matched cognitively normal control subjects. The proportion of cognitively healthy subjects with an “abnormal” (i.e. AD-like) CSF AD biomarker profile may increase with age as a reflection of the increasing incidence of either age-associated amyloidosis or pre-symptomatic AD.[27] In 168 living cognitively healthy subjects who underwent CSF analysis by ELISA, Morris and colleagues found age-associated trends in Aβ42, total tau, and p-tau181, all in the direction of pathologic alterations, in those with and without the ApoE ε4 allele.[58] When a cut-off value of 500 pg/mL was used to distinguish between normal and pathologic levels of CSF Aβ42, the proportion of cognitively healthy subjects with AD-like biomarker levels increased from 18.2% among those 45–49 years of age to 50% among those 80–89 years of age. Thus, cut-off values derived from studies using older cognitively healthy subjects must strike a balance between the sensitivity of detecting symptomatic and pre-symptomatic AD[28] and the specificity associated with clinical cognitive status. On the other hand, in younger subjects with cognitive concerns and patients with non-AD neurodegenerative disorders (FTLD, DLB, vascular dementia), who often have younger ages of onset than patients with AD,[31,41,44] applying threshold values from elderly AD vs. age-matched cognitively healthy subjects may underestimate the proportion of younger patients with pathologic AD. Therefore, in our first analysis using CSF from patients with known pathology (n=91) and living cognitively healthy subjects (n=33),[42] we sought to validate the diagnostic utility of established CSF AD biomarkers and their proposed threshold values on a Luminex platform in differentiating between pathologically confirmed AD and non-AD neurodegenerative disorders.
| Table 1. Subjects included in the initial CSF biomarker studies, including cognitively healthy subjects and demented subjects with known pathology. Neuropathologic diagnosis was made based on published consensus criteria. |
Using previously proposed threshold values,[69] CSF levels of total tau, p-tau181, and Aβ42 in general achieved adequate performance in distinguishing between cases of AD and non-AD neurodegenerative disorders. When each traditional AD biomarker or biomarker combination (e.g. CSF levels of total tau, p-tau181, and Aβ42) was examined, measurement of Aβ42 levels was again highly sensitive with moderate specificity (overall diagnostic accuracy of 90.1%; other performance characteristics are given in the accompanying table), while total tau and p-tau181 levels achieved high specificity at the cost of sensitivity. Compared to its performance in differentiating between AD and cognitively healthy subjects, p-tau181 was associated with much higher specificity when used to distinguish between AD and non-AD dementia cases (92.0% vs. 73.1%), although the tau/Aβ42 ratio continued to provide the best balance between sensitivity and specificity.
| Table 2. Sensitivity, specificity, and diagnostic accuracy of established CSF AD biomarkers or biomarker combinations in the distinction between autopsy-confirmed AD (n=66) and non-AD neurodegenerative disorders (n=25) using previously proposed cut-off values (in parentheses). |
It is also noteworthy that the overall diagnostic accuracy of total tau or p-tau181 as a single biomarker to separate AD from non-AD disorders was similar to that obtained by simply assuming all cases are AD (diagnostic accuracy=72.5%, with 100% sensitivity and 0% specificity). Thus, different threshold values may potentially be derived for the distinction between AD and non-AD dementia. Hence, in this set of core analytes, the single threshold approach could be improved by adjusting the cut-off value for Aβ42 to increase specificity without a significant change in sensitivity, whereas total tau and p-tau181 still performed sub-optimally as lone biomarkers after the same adjustment.
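The 72.5% baseline follows directly from the group sizes (66 AD and 25 non-AD cases). A brief sketch of the arithmetic, using a hypothetical helper named `overall_accuracy`:

```python
def overall_accuracy(sensitivity, specificity, n_cases, n_noncases):
    """Overall accuracy as the case-weighted mix of sensitivity and specificity."""
    correct = sensitivity * n_cases + specificity * n_noncases
    return correct / (n_cases + n_noncases)


# Cohort sizes from the text: 66 autopsy-confirmed AD, 25 non-AD disorders.
N_AD, N_NON_AD = 66, 25

# Labelling every subject as AD gives 100% sensitivity and 0% specificity.
baseline = overall_accuracy(1.0, 0.0, N_AD, N_NON_AD)
print(f"all-AD baseline accuracy: {100 * baseline:.1f}%")
```

Because accuracy depends on the case mix, any single biomarker has to exceed this baseline before it can be said to add diagnostic information in this cohort.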
| Table 3. Optimal threshold values and associated diagnostic performance for each traditional CSF AD biomarker or biomarker combination in the distinction between AD and non-AD neurodegenerative disorders. |
To improve upon the performance of these more established CSF AD biomarkers, we analyzed the levels of novel candidate biomarkers included in the RBM Human DiscoveryMAP™ panel (hereafter referred to as the MAP panel). This approach has been used for the discovery of biomarkers for ovarian cancers,[3,10] coronary artery disease,[38] Sjogren’s disease,[24] and pre-natal screening for Down syndrome.[50,63] The MAP panel currently includes 189 analytes, and the levels of many MAP analytes have been proposed to be altered in AD when contrasted against non-AD cases.[19,29,65,72] The initial version of the RBM MAP panel available to us for CSF biomarker studies included 151 analytes, although only 106 analytes had measurable levels in the CSF.[42] Among these 106 analytes, we searched for novel biomarkers that could substitute for or complement the classification performance of established AD biomarkers (i.e. tau, p-tau181, and Aβ42). This is important in two regards. First, as we demonstrated above in the single-analyte approach, there is a clear trade-off between sensitivity and specificity in using the more established CSF AD biomarkers to distinguish AD from non-AD cases (cognitively healthy subjects and non-AD demented cases), and one or more of the MAP analytes may improve the sensitivity or specificity of AD classification with or without the traditional CSF AD biomarkers. Second, as existing therapeutic strategies in AD largely target the molecular signature of AD, it will be important to have an independent set of biomarkers not directly linked to Aβ42 or tau metabolism as surrogate endpoints in monitoring therapeutic response. For example, if a gamma-secretase inhibitor[30] is successful in restoring CSF Aβ42 levels in patients clinically diagnosed with AD, restoration (partial or complete) of additional biochemical analytes altered in AD could be very useful in determining the extent of therapeutic success,[64] especially if clinical measures of cognition or imaging measures of brain atrophy do not demonstrate clear disease amelioration or regression.
In identifying novel MAP biomarkers, we compared levels of MAP analytes between autopsy-confirmed cases of AD and cognitively normal subjects, and then between autopsy-confirmed cases of AD and non-AD neurodegenerative disorders. This two-stage approach was preferred over a comparison between AD and a combined cohort of non-AD cases (cognitively healthy and demented), as the two-stage approach can isolate 1) analytes useful in the identification of AD, but whose levels are intermediate between those of cognitively normal subjects and non-AD disorders; and 2) analytes common between AD and one other group of subjects, but distinct from the third group. Univariate analysis using the Mann-Whitney U-test (a non-parametric test was chosen because MAP analyte levels in CSF usually had non-normal distributions even after transformation) showed that multiple analytes differ between AD and cognitively normal subjects, but only a few differ between AD and non-AD neurodegenerative disorders. This result was the first suggestion that the majority of MAP analytes likely represent common pathway elements in neurodegeneration downstream from the initial steps of pathogenesis. Indeed, as the MAP panel is biased towards peptides in inflammatory, cell signaling, and apoptotic pathways that are hypothesized to be activated in both AD and non-AD disorders, altered patterns of multiple such peptides in both AD and non-AD disorders should not be surprising.
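For readers less familiar with the rank-based comparison mentioned above, the sketch below computes the Mann-Whitney U statistic by direct pair counting and converts it to the area under the ROC curve via AUC = U / (n1 × n2). The analyte values and the function name `mann_whitney_auc` are hypothetical, and the p-value calculation is deliberately omitted.

```python
def mann_whitney_auc(group_a, group_b):
    """Return (U, AUC) for group_a versus group_b.

    U counts the pairs in which the group_a value exceeds the group_b
    value, with ties counted as one half. AUC = U / (n_a * n_b).
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u, u / (len(group_a) * len(group_b))


# Hypothetical levels of one analyte (arbitrary units) in two groups.
ad_levels = [5.1, 6.3, 7.0, 4.8]
control_levels = [3.9, 4.8, 5.0]

u_stat, auc = mann_whitney_auc(ad_levels, control_levels)
print(f"U={u_stat}, AUC={auc:.3f}")
```

Because the statistic depends only on the ordering of values, it makes no assumption about the shape of the underlying distribution, which is the property that motivated its use here.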
Since AD and non-AD disorders (and possibly normal aging) activate inflammatory and apoptotic pathways, any single MAP analyte is most likely insufficient to distinguish AD from other groups in the cohort. Indeed, on a single-analyte level, CSF Aβ42 outperformed every other analyte in distinguishing between AD and cognitively healthy control subjects, generating the largest area under the curve (AUC of 0.920), with total tau following (AUC of 0.813). At the same time, 35 other analytes outperformed p-tau181 (AUC of 0.613) in the ROC curve analysis, and platelet derived growth factor (PDGF; biological relevance to AD reviewed by us[42]) had an AUC intermediate between those of Aβ42 and tau. Using a cut-off value of 432 pg/ml, CSF PDGF levels were associated with a sensitivity of 84.8% and a specificity of 72.7%. When we replaced tau with PDGF in the traditional AD biomarker set, the PDGF/Aβ42 ratio achieved a sensitivity of 89.4% and a specificity of 87.9%. Similar to CSF tau levels, CSF PDGF levels likely improved upon the low specificity associated with using CSF Aβ42 levels alone. As we hypothesized that elevated CSF tau levels serve as a surrogate marker of tau pathology mediated cell death, the improved specificity (against control subjects) with the introduction of PDGF levels into the diagnostic model is probably due to better separation of cognitively normal subjects with decreased CSF Aβ42 levels from demented AD patients. When we examined this in detail, 4 of the 5 cognitively healthy subjects with low CSF Aβ42 levels had normal PDGF levels, and 3 of the 4 AD patients with normal CSF Aβ42 levels had elevated PDGF levels (78% accuracy in re-classification). In contrast, 3 of the 5 cognitively healthy subjects with low CSF Aβ42 levels had normal tau levels, and 1 of the 4 AD patients with normal CSF Aβ42 levels had elevated tau levels (44% accuracy in re-classification).
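The two re-classification percentages come from pooling the 5 cognitively healthy subjects and 4 AD patients whom CSF Aβ42 alone placed in the wrong group. The sketch below reproduces that arithmetic from the counts given in the text; the function name is a hypothetical label.

```python
def reclassification_accuracy(correct_controls, n_controls, correct_cases, n_cases):
    """Fraction of previously misclassified subjects that a second marker corrects."""
    return (correct_controls + correct_cases) / (n_controls + n_cases)


# Counts from the text: 5 healthy subjects with low Abeta42, 4 AD patients
# with normal Abeta42, i.e. 9 subjects misclassified by Abeta42 alone.
pdgf_accuracy = reclassification_accuracy(4, 5, 3, 4)  # 7 of 9 corrected
tau_accuracy = reclassification_accuracy(3, 5, 1, 4)   # 4 of 9 corrected

print(f"PDGF: {100 * pdgf_accuracy:.0f}%  tau: {100 * tau_accuracy:.0f}%")
```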
| Table 4. AUC for traditional and novel AD biomarkers in the diagnosis of AD compared to cognitively healthy subjects. Higher levels of analytes are diagnostic of AD, except underlined analytes where lower levels are diagnostic of AD. |
Approach 3: Multi-analyte Profiling
Since combining traditional CSF AD biomarkers (e.g., deriving the tau-to-Aβ42 ratio or a linear relationship among tau, p-tau181, and Aβ42)[55,69] can improve the overall diagnostic accuracy over using individual component biomarkers, we hypothesized that combining multiple candidate biomarkers in combinatorial analyte profiling would improve upon the diagnostic performance of any individual novel biomarker. This type of approach has been used in cancer diagnosis using gene expression profiling[14,35,75,81] and in one prior study of plasma biomarkers of AD.[65] A large number of classification schemes are available, including supervised and unsupervised algorithms,[56] and strategies that reduce the dimensionality of data through feature selection,[67] group membership with proxy features,[65,75] tree-based approaches,[81] or a combination of these methods.[62] As biomarker identification and performance may vary with the choice of analytical strategy, because each strategy depends on certain underlying assumptions (normal distribution, accuracy and precision of features, etc.), we sought to determine the optimal classification biomarkers for AD using three independent strategies for discovering novel CSF biomarkers. These were feature selection using univariate analysis followed by logistic regression, a classification tree-based strategy (random forests analysis), and a nearest-shrunken centroid approach (predictive analysis of microarrays or PAM).[65,75] Univariate analysis followed by logistic regression was included because it is common in the scientific literature, although it has a significant weakness in this instance because the data are high-dimensional relative to the number of samples. As a result, the smallest number of AD biomarkers emerged from the logistic regression model, largely due to sample size restrictions.[42] The nearest-shrunken centroid method is explained in detail by Tibshirani.[75] Briefly, it identifies analytes (genes in cancer expression profiles, or proteins in our case) that are most representative of a class (e.g., AD) while eliminating analytes whose levels are more similar between classes. This approach was used by Ray et al. to identify plasma biomarkers of AD,[65] and gives more weight to analytes whose levels are stable within samples of the same class. At the same time, success in deriving differences in gene expression from transformed tissue may not necessarily translate into success in finding differences in soluble protein levels in biofluids, which may or may not be in close contact with the organ of interest, in this case the CNS (CSF versus plasma, respectively). Thus, we chose a tree-based classification strategy as our last analytical approach, as random forests analysis is less influenced by the within-class variability that may be inherent to biofluid-based studies. Lastly, as these analytical approaches may differ in their power to identify unique vs. shared biomarker changes and two- vs. multi-group classifiers, we separately performed two-class comparisons (AD vs. healthy elderly subjects, AD vs. non-AD dementia) using all three statistical approaches.
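The elimination of uninformative analytes in the nearest-shrunken centroid method rests on a soft-thresholding step, which can be sketched in a few lines. This is a deliberately simplified illustration and not the full PAM procedure: it assumes the standardized differences between a class centroid and the overall centroid have already been computed, and the analyte names, values, and threshold are hypothetical. In practice the threshold is typically chosen by cross-validation.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; anything within delta of zero becomes 0."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0


def shrink_centroid(differences, delta):
    """Apply soft thresholding to each analyte's standardized difference."""
    return {name: soft_threshold(d, delta) for name, d in differences.items()}


# Hypothetical standardized differences (class centroid minus overall
# centroid, scaled by within-class variability) for the AD class.
ad_differences = {
    "analyte_A": 2.4,
    "analyte_B": -1.7,
    "analyte_C": 0.3,
    "analyte_D": -0.6,
}

shrunken = shrink_centroid(ad_differences, delta=1.0)

# Analytes shrunk exactly to zero no longer influence classification.
retained = sorted(name for name, d in shrunken.items() if d != 0.0)
print(retained)
```

Scaling by within-class variability before thresholding is what gives more weight to analytes that are stable within a class: a noisy analyte has a smaller standardized difference and is removed sooner.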
This parallel analytical approach thus permits voting on class membership (healthy elderly control, AD, non-AD dementias) both on an analyte-by-analyte basis and on an analysis-by-analysis basis, to generate biomarkers with the highest likelihood of distinguishing AD from controls and non-AD subjects from a CSF perspective. When only MAP biomarkers were used, a core group of analytes was identified by multiple (two or three) analytical strategies as useful in the distinction between AD and cognitively healthy subjects (analytes in the shaded areas of the Venn diagram, left panel). Analytes determined by random forests analysis achieved 86.4% sensitivity and 81.8% specificity, in ranges similar to those of a model containing traditional AD biomarkers (i.e. CSF levels of total tau, p-tau181, and Aβ42) and demographic factors (age and gender). Analytes determined by PAM achieved 89.4% sensitivity and 75.8% specificity, also in ranges similar to those of traditional AD biomarkers plus age and gender per PAM. With the exception of the logistic regression model, long lists of MAP analytes were necessary for optimal classification between AD and cognitively healthy subjects (17 analytes by random forests, and 33 by PAM). Thus, if CSF tau and Aβ42 levels were altered in patients by a therapeutic intervention such that they were no longer available for classification purposes, these MAP analytes could potentially maintain the ability to distinguish between AD and cognitively healthy subjects. However, this type of modeling does not take into account the relationship between MAP biomarker levels and the levels of traditional AD biomarkers such as CSF tau and Aβ42. When we introduced these traditional AD CSF biomarkers into each classification model (logistic regression, random forests, and PAM), the improvement in classification, in terms of both diagnostic accuracy and the reduction in the number of useful analytes, can be visualized in the Venn diagram (right panel).
First, CSF tau and Aβ42 levels were identified by all three algorithms as key CSF analytes for identifying AD cases. In addition, levels of PDGF and neuron-glia CAM-related cell adhesion molecule (NrCAM) continued to provide classification value in conjunction with CSF levels of tau and Aβ42, although the role of fatty acid binding protein (Fabp) was reduced in one algorithm (logistic regression). Similarly, many novel analytes in the MAP-biomarker-only model no longer contributed to the identification of AD cases, suggesting that some of the MAP biomarker combinations were likely substitutes for traditional AD biomarkers. Lastly, the total number of analytes necessary for the identification of AD was reduced in all models (especially PAM), which likely supports the notion that the majority of MAP analytes serve to refine the classification performed by the dominant biomarkers (Aβ42, tau, PDGF, NrCAM). The combined model (MAP and traditional AD biomarkers) achieved 92.4% sensitivity and 97.0% specificity in classification by random forests, and 97.0% sensitivity and 87.9% specificity in classification by PAM.
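The idea of voting across analytical strategies can be sketched as a simple tally of how many methods selected each analyte. The selections below are hypothetical placeholders and do not reproduce the actual membership of the Venn diagram; `analyte_X` and `analyte_Y` stand in for method-specific findings.

```python
from collections import Counter


def tally_votes(selections):
    """Count how many strategies selected each analyte."""
    return Counter(a for chosen in selections.values() for a in chosen)


# Hypothetical analyte lists returned by each strategy (illustrative only).
selections = {
    "logistic_regression": {"tau", "Abeta42", "PDGF"},
    "random_forests": {"tau", "Abeta42", "PDGF", "NrCAM", "analyte_X"},
    "pam": {"tau", "Abeta42", "NrCAM", "analyte_Y"},
}

votes = tally_votes(selections)

# Analytes supported by at least two strategies, and by all of them.
consensus = sorted(a for a, n in votes.items() if n >= 2)
unanimous = sorted(a for a, n in votes.items() if n == len(selections))

print("two or more strategies:", consensus)
print("all strategies:", unanimous)
```

Requiring agreement between strategies with different assumptions is a conservative filter: an analyte selected by only one method may reflect that method's particular sensitivity to the data rather than a robust signal.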
To better understand how MAP analytes improve the distinction between cognitively healthy subjects and AD, we analyzed PDGF levels according to CSF Aβ42 and tau categories using threshold values determined in this review. As we previously demonstrated, low CSF Aβ42 levels identified the majority of AD cases, while 13 cases were misclassified based on CSF Aβ42 alone. Using elevated tau levels (red circles in the corresponding figure) in addition to CSF Aβ42 levels did not significantly alter the outcome, as there was a trade-off between increased sensitivity and decreased specificity. Elevated PDGF levels were more specific than elevated tau levels in correctly identifying those with autopsy-confirmed AD but normal CSF Aβ42 levels. In fact, when we analyzed the relative levels of the MAP analytes in these 13 misclassified subjects alongside tau levels alone, it was apparent that coupled changes in MAP biomarkers could re-classify autopsy-confirmed AD cases with normal CSF Aβ42 levels more accurately than a tau threshold alone. At the same time, multiple cognitively healthy subjects had low CSF Aβ42 levels and MAP biomarker patterns suggestive of AD. As CSF biomarker patterns can change long before symptomatic onset, alterations in MAP biomarkers other than tau provide possible support that these cognitively healthy subjects may be at increased risk for future cognitive symptoms.
Compared to the core analyte approaches (with single or multiple thresholds), the combinatorial MAP approach is more tolerant of analytes in the borderline ranges and analytes with larger CVs, as the eventual class membership depends more on each sample’s composite profile (and its neighbors) than on the absolute relationship between a single analyte’s level and a cut-off value. Levels of multiple analytes can be measured simultaneously, as a fixed assay volume is mixed with multiple types of beads, each carrying specific antibodies. The disadvantages include the more sophisticated equipment and the need to develop multiple assays, which can be costly and time-consuming. While individual investigators may weigh the disadvantages of a MAP approach against those of the threshold approach using established targets and assays (such as Aβ42), a MAP approach is clearly superior to the single-analyte approach in biomarker discovery, as we discuss below.