The cerebrospinal fluid (CSF) biomarkers amyloid β (Aβ)-42, total-tau (T-tau), and phosphorylated-tau (P-tau) demonstrate good diagnostic accuracy for Alzheimer’s disease (AD). However, there are large variations in biomarker measurements between studies, and between and within laboratories. The Alzheimer’s Association has initiated a global quality control program to estimate and monitor variability of measurements, quantify batch-to-batch assay variations, and identify sources of variability. In this article, we present the results from the first two rounds of the program.
The program is open for laboratories using commercially available kits for Aβ, T-tau, or P-tau. CSF samples (aliquots of pooled CSF) are sent for analysis several times a year from the Clinical Neurochemistry Laboratory at the Molndal campus of the University of Gothenburg, Sweden. Each round consists of three quality control samples.
Forty laboratories participated. Twenty-six used INNOTEST enzyme-linked immunosorbent assay kits, 14 used Luminex xMAP with the INNO-BIA AlzBio3 kit (both measure Aβ-(1-42), P-tau(181P), and T-tau), and 5 used Meso Scale Discovery with the Aβ triplex (AβN-42, AβN-40, and AβN-38) or T-tau kits. The total coefficients of variation between the laboratories were 13% to 36%. Five laboratories analyzed the samples six times on different occasions. Within-laboratory precisions differed considerably between biomarkers within individual laboratories.
Measurements of CSF AD biomarkers show large between-laboratory variability, likely caused by factors related to analytical procedures and the analytical kits. Standardization of laboratory procedures and efforts by kit vendors to increase kit performance might lower variability, and will likely increase the usefulness of CSF AD biomarkers.
The three major brain hallmarks in Alzheimer’s disease (AD) are extracellular amyloid plaques, axonal degeneration, and intraneuronal neurofibrillary tangles, which may be monitored with the cerebrospinal fluid (CSF) biomarkers amyloid β-42 (Aβ-42), total-tau (T-tau), and phosphorylated-tau (P-tau), respectively [1–4]. These three biomarkers have high diagnostic accuracy for established AD. They may also be used to identify AD before onset of dementia at the stage of mild cognitive impairment, as shown in both single-center [6–8] and large-scale heterogeneous multicenter studies [9–11], and to predict mild cognitive impairment/AD in those who are cognitively normal [12,13]. However, measured biomarker levels differ greatly between studies (Supplementary Fig. 1 and Supplementary Table 1), and the reported diagnostic accuracy of the biomarkers varies significantly [14,15]. These variations could be the result of preanalytical, analytical, or manufacturing processes that affect assay-related factors. Preanalytical factors include selection of study participants, procedures of lumbar puncture, sample handling, and sample storage [16–20]. Possible analytical factors include various differences in laboratory procedures among centers and technicians. Assay-related factors (between-lot) arise from manufacturing variations in the source material for components and reagents in the analytical kits and random variability of the production process. These issues are summarized in Table 1.
There are several commercially available assays for the determination of CSF Aβ-42, T-tau, and P-tau. Most laboratories in the program used the INNOTEST enzyme-linked immunosorbent assays (ELISAs) or the bead-based Luminex xMAP platform with the INNO-BIA AlzBio3 (both Innogenetics, Ghent, Belgium, www.innogenetics.com), which quantifies Aβ(1-42) (called Aβ-42 later in the text), T-tau, and P-tau(181P) (called P-tau later in the text). Meso Scale Discovery (MSD, Gaithersburg, MD, www.mesoscale.com) technology was used by some laboratories for CSF AβN-42, AβN-40, AβN-38, and T-tau measurements. Although the observed biomarker concentrations may vary significantly between platforms, these techniques seem to have similar diagnostic accuracy for patients with AD versus controls. The within-center coefficients of variation (CV) are low, generally within 10% to 15%, and the intra-assay CVs are generally within 5% to 10% [18,22–25]. However, two control surveys of CSF Aβ-42, T-tau, and P-tau reported interassay and interlaboratory CVs of approximately 20% to 35% [25,26]. These values are in agreement with the variability seen in the largest published multicenter trial of early-stage AD so far, which included measurements performed at several laboratories.
Novel biomarker measurements may initially present significant intercenter differences before quality control (QC) programs have been established. To facilitate the worldwide use of CSF biomarkers in clinical dementia investigations and in research, it was decided at the International Conference on Alzheimer’s Disease (2009) in Vienna to initiate an international QC program for AD CSF biomarkers. The program is run by the Alzheimer’s Association and administered from the Clinical Neurochemistry Laboratory at the Molndal campus of the University of Gothenburg, Sweden. The program consists of (1) a standardized operating procedure (SOP) for lumbar puncture and CSF sample handling procedures, and (2) an external comparison program of CSF analyses between laboratories. The program is open to any laboratory using a commercially available assay for CSF Aβ, T-tau, or P-tau. In-house assays and assays for which samples must be sent to kit vendors (e.g., P-tau231) are not part of the program. The results of the first two rounds of the program, which were completed during the spring of 2010, are presented in this report.
CSF pools were constructed in Molndal, Sweden, from a large number of fresh, de-identified samples from the clinical routine workflow. All samples tested negative for human immunodeficiency virus and hepatitis B and C. Samples with suspected Creutzfeldt–Jakob disease were excluded. The pools were prepared by experienced and certified laboratory technicians. The pools were thoroughly mixed and underwent one freeze–thaw cycle before aliquotation in 500-μL portions in polypropylene screw-cap tubes (Sarstedt Art. No. 72.692, 1.5 mL, Sarstedt AG & Co., Numbrecht, Germany), were frozen at −80°C, and were distributed to the participating laboratories on dry ice by courier. All laboratories verified that the samples had arrived frozen. In total, the laboratories received six blinded QC samples, including one sample each from the pools 2009-1A and 2009-1B for the first round, and one sample each from the pools 2010-2A and 2010-2B for the second round. For each round, the laboratories also received one aliquot from the pool QC-L, which will be the same in the coming years, to evaluate longitudinal stability. The blinded challenge samples differed in their AD biomarker profiles. Samples 2009-1A, 2010-2A, and QC-L had levels of Aβ-42, T-tau, and P-tau essentially in the range for healthy subjects. Sample 2009-1B had a classical AD biomarker profile, with low Aβ-42 and high T-tau and P-tau. Sample 2010-2B had essentially normal levels of Aβ-42, combined with high T-tau and P-tau.
Laboratories used assay lots that were available in their laboratories. Samples were analyzed in duplicate as part of the laboratories’ ordinary activities. Five laboratories routinely processing a large number of samples assessed within-laboratory precision performance by analyzing the samples six times using different plates. These laboratories (Amsterdam, Molndal, Erlangen, Ghent, and Pennsylvania) are called reference laboratories later in the text. All results were reported back to Molndal for data analysis.
Biomarker results were statistically analyzed and grouped by rounds, samples, and analytical techniques. Mean levels, standard deviations, and total CVs were calculated. For the reference laboratories, within-laboratory CVs were calculated. Correlations were assessed using Pearson correlation coefficient. GraphPad Prism 5 (GraphPad Software, La Jolla, CA, USA) was used for these analyses.
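The summary statistics described above can be illustrated with a minimal Python sketch. This is not the GraphPad Prism workflow used in the study. The function names and all concentration values are hypothetical and serve only to show how a total CV and a Pearson correlation between two samples would be computed from one reported value per laboratory.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values (pg/mL): one reported mean per laboratory
# for an "A" sample and a "B" sample of the same round.
sample_a = [480.0, 520.0, 610.0, 455.0, 535.0]
sample_b = [300.0, 330.0, 390.0, 280.0, 350.0]

total_cv_a = cv_percent(sample_a)       # total CV among laboratories, sample A
r_ab = pearson_r(sample_a, sample_b)    # A-versus-B correlation across laboratories
```

Because each laboratory contributes a single mean per sample, a CV computed this way mixes within- and between-laboratory variability.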
Analysis of variance was performed with the mixed procedure of SAS software version 9.2 (SAS Institute Inc., Cary, NC, USA) using Restricted Maximum Likelihood estimation of covariances. Analyses were performed in line with International Organization for Standardization (ISO) standard ISO5725 and National Committee for Clinical Laboratory Standards (NCCLS) guideline Evaluation of Precision Performance of Quantitative Measurement Methods (EP5-A2). The estimated variance components were within-laboratory, between-laboratory, and between-lot variability. Following a widely accepted statistical convention, negative variance estimates were set to 0.
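To make the idea of variance components and the zero-truncation convention concrete, the following Python sketch uses a deliberately simpler technique than the one in the study: a balanced one-way random-effects ANOVA estimated by the method of moments, with laboratory as the only random factor. It is not Restricted Maximum Likelihood, it does not include a between-lot component, and the function name and data layout are assumptions made for illustration.

```python
from statistics import mean


def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects model.

    groups: list of equal-length lists, one list of replicate results per laboratory.
    Returns (between_lab_variance, within_lab_variance).
    """
    k = len(groups)       # number of laboratories
    n = len(groups[0])    # replicates per laboratory
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with >= 2 labs and >= 2 replicates")

    grand = mean(v for g in groups for v in g)
    lab_means = [mean(g) for g in groups]

    # Mean squares from the one-way ANOVA table
    ms_between = n * sum((m - grand) ** 2 for m in lab_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, lab_means) for v in g
    ) / (k * (n - 1))

    var_within = ms_within
    # Negative estimates are set to 0, as in the convention described above
    var_between = max(0.0, (ms_between - ms_within) / n)
    return var_between, var_within
```

With an unbalanced design and few results per assay lot, as in the program data, such estimates carry large uncertainty, which is why a likelihood-based mixed model is the more appropriate tool.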
Forty laboratories participated (Supplementary Table 2). Two laboratories participated only in the first round, and three laboratories participated only in the second round. The laboratories used INNOTEST ELISAs (n = 26), Luminex xMAP with the INNO-BIA AlzBio3 kit (n = 14), and MSD with the Aβ triplex kit (n = 4 in the first round, n = 5 in the second round) or T-tau kit (MSD) (n = 1). Aβ triplex may be used with different Aβ detection antibodies. The 4G8 antibody binds to Aβ amino acid residues 18-22, and the 6E10 antibody binds to residues 3-8. Both these antibodies were used by laboratories in the program. Every sample volume was enough for duplicate analyses with ELISA (T-tau: 2 × 25 μL, Aβ-42: 2 × 25 μL, and P-tau: 2 × 25 μL), xMAP (2 × 75 μL), and MSD (Aβ triplex: 2 × 25 μL and T-tau: 2 × 25 μL), or combinations of these. Several laboratories used multiple techniques.
Results were grouped according to analytical techniques and samples. The total CVs among centers were 16% to 28% for ELISA (Fig. 1 A–C), 13% to 36% for xMAP (Fig. 1D–F), and 16% to 36% for MSD (Fig. 1G–I). CVs for MSD must be interpreted with caution, because they include both the 4G8 and 6E10 assays, and the 6E10 and the 4G8 antibodies bind to different epitopes on the Aβ peptide. Note that, given the study design of one reported mean value per sample and laboratory, this total variability includes both within- and between-center variability.
For each round and analyte, correlations between results for the A and B samples were analyzed for ELISA, xMAP, and MSD (Supplementary Fig. 2). In the ideal situation, the measured concentration range is small, and the correlation is then of secondary interest. However, when the range is wide, as in the present results, a high correlation indicates differences between laboratories but consistency within laboratories, whereas a low correlation may indicate inconsistency within laboratories combined with other variation.
Within-laboratory CVs were examined at the reference laboratories for ELISA and xMAP in the first (Figs. 2 and 3) and the second round (Figs. 4 and 5). CVs were 3.2% to 24% for ELISA and 2.3% to 26% for xMAP, but differed between analytes within individual laboratories, indicating assay-dependent variations. For example, in xMAP runs for sample 2009-1A, reference laboratory 5 had low variations for Aβ-42 and T-tau but high variation for P-tau (Fig. 3A–C). For the same analyte and platform, important differences in within-center variability could be noticed among reference laboratories. Most striking are the consistently low CVs for Aβ-42 measured with xMAP by reference laboratory 5. Also, a platform-dependent variation was observed with larger differences in mean levels between laboratories for the xMAP format as compared with ELISA.
The QC-L sample was analyzed in both rounds. Mean levels and total CVs among the laboratories are presented in Supplementary Fig. 3. There were no major changes in total CVs over time, except a decrease in variation for T-tau measured by ELISA. We also calculated within-laboratory CVs between the two rounds. For ELISA, the means of these between-round CVs were 14%, 10%, and 11% for Aβ-42, T-tau, and P-tau, respectively. For xMAP, the means were 14%, 9%, and 11%, respectively.
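One way to read the between-round figures above is as a per-laboratory CV of the two QC-L results, averaged over laboratories. The sketch below follows that reading. The exact computation used in the study is not stated, so the approach, the function names, and the example values are assumptions.

```python
from statistics import mean, stdev


def between_round_cv(round1, round2):
    """CV (%) of one laboratory's QC-L results across two rounds."""
    pair = [round1, round2]
    return 100.0 * stdev(pair) / mean(pair)


def mean_between_round_cv(results):
    """Average the per-laboratory between-round CVs.

    results: dict mapping a laboratory label to (round 1 value, round 2 value).
    """
    return mean(between_round_cv(r1, r2) for r1, r2 in results.values())


# Hypothetical QC-L Aβ-42 results (pg/mL) for three laboratories
qc_l = {"lab_1": (510.0, 560.0), "lab_2": (640.0, 600.0), "lab_3": (475.0, 480.0)}
summary_cv = mean_between_round_cv(qc_l)
```

With only two rounds, each per-laboratory CV rests on two values and is therefore a rough estimate of longitudinal stability.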
The analytical techniques reported different absolute values for the biomarkers. ELISA values were higher than xMAP values, especially for Aβ-42 and T-tau. MSD values for Aβ-42 were intermediate to ELISA and xMAP in the first round and higher than ELISA values in the second round (Fig. 1).
Contributions of between-laboratory, within-laboratory, and between-lot variability to the total variability were estimated using variance component analysis for ELISA and Luminex measurements. Samples from lots that were used in a minimum of 10 repeats were included. Estimates for the within-laboratory components were based only on data pertaining to the QC-L sample that were repeated in round 1 and round 2. Because of the unbalanced design and limited information per assay lot, variance components were estimated with large uncertainties. Therefore, we decided to limit interpretation of analysis of variance to the rankings of the different factors in contribution to overall variability. The rankings of the contributing factors differed among techniques and analytes (Supplementary Table 3).
This is the first data report from the Alzheimer’s Association QC program for AD CSF biomarkers. The total CVs between laboratories ranged from 13% to 36%, which is comparable with what has been seen in earlier smaller investigations [25,26]. No major differences in CVs were seen between the two rounds, which was as expected because there were no active interventions between the rounds. As the QC program continues, the most likely causes for the variations can be identified and addressed. For example, if a laboratory consistently reports low-rank data, the divergence is probably because of analytical factors. Moreover, oscillations between low- and high-rank results suggest that the origin of the inconsistency may be either analytical or assay-related factors, or a combination of both. Well-established routine CSF parameters, such as albumin and immunoglobulin levels, often have between-laboratory CVs of less than 10% to 15% in external control assurance programs. Biomarker scientists and manufacturers should strive to achieve this level of reproducibility for CSF AD markers. Such a goal is already within reach for some of the markers.
The key question is what causes the total variability described. Because pooled QC samples prepared in bulk at a single site were used in this study, preanalytical confounding factors related to the sample preparation were eliminated. Detected variations must have been caused by differences in other preanalytical procedures (e.g., handling/storage of QC samples or commercial kits at individual sites), analytical procedures, or variations related to the commercial assays themselves. With only two program rounds analyzed and many different assay lots used, the estimates of the contributions from between-laboratory, within-laboratory, and between-lot components to the total variability could only be interpreted as rankings instead of quantitative CVs. In general, different kit batches were rather evenly spread among the reported results, indicating that the total variations were not mainly caused by batch-to-batch variability. Intrabatch variability will contribute to the observed variations but cannot be singled out in this study. It may be noted that variations between laboratories were less for the reference laboratories than for all participating laboratories. Because the reference laboratories routinely process large amounts of samples, this highlights the importance of experience to decrease variations.
Differences in within-laboratory CVs among the biomarkers within individual reference laboratories suggest that assay-related factors are important. For example, for the xMAP analyses of sample 2009-1A, reference laboratory 1 had low CV for P-tau and high for Aβ-42 and T-tau, whereas reference laboratory 5 had high CV for P-tau and low for Aβ-42 and T-tau (Fig. 3A–C). Because all analytes are measured simultaneously with the xMAP system, such discrepancies are difficult to explain by variations in laboratory procedures and more likely caused by variations inherent to the kit itself. However, it cannot be ruled out that individual analytes in a multiplexed assay might be more or less sensitive to certain laboratory procedures. Possible assay-related factors are variations in antibody purification, coating of plates and beads, and preparation and stability of standards. Such sources of variation need to be decreased to a minimum, which requires increased efforts by kit manufacturers. The ideal approach is a collaborative effort between commercial kit vendors, instrument platform manufacturers, reference standardization programs, and laboratories using these methods.
Mean levels of biomarkers differ between the analytical techniques ELISA, xMAP, and MSD. This is ultimately caused by the lack of certified reference materials (CRMs) and calibrators for CSF Aβ-42, T-tau, and P-tau. CRMs (also called standard reference materials) are developed by metrology institutes, such as the United States Pharmacopeia and the National Institute of Standards and Technology (NIST) in the United States, and the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), World Health Organization, and the National Institute for Biological Standards and Control. The reference materials include primary CRMs, produced with a certified value of purity, and secondary CRMs, which often are samples of human body fluids evaluated against primary CRMs. It is relatively easy to determine the purity of small molecules, such as glucose or cholesterol, which allow measurements in SI units in a defined matrix (e.g., serum or CSF). However, it is more difficult to establish the purity of proteins because of heterogeneities caused by post-translational modifications or contaminations. This makes it difficult to reach full SI traceability for proteins, and standardization is sometimes done with “artifact standards,” traced to the World Health Organization reference preparations, reporting concentrations in International Units (IUs) instead of SI units. One recent example of the complexity of establishing a protein CRM is the development of the troponin standard SRM 2921 (human cardiac troponin complex). The development of CRMs for CSF AD biomarkers would be a major challenge for the AD biomarker community. Such a complicated task would require devotion and orchestrated efforts by researchers, industry, and metrology institutes. If successful, it would allow full global traceability and comparability of biomarker results, also among analytical techniques and centers.
The QC program was recently extended with a standardization program for clinical studies, called University of Gothenburg CSF 2010 (UGOT CSF 2010). For this, a CSF pool of 2000 mL was constructed and aliquoted in 500-μL portions. Multiple aliquots have been analyzed in Gothenburg to determine biomarker concentrations with high precision in this center. These aliquots may be requested by contacting the QC program coordinator at neurochem/at/neuro.gu.se. When including UGOT CSF 2010 biomarker measurements in publications, researchers enable normalization of their data or comparison with other studies. Authors may, for example, report their measured concentrations in their publications and conclude that “The UGOT CSF 2010 samples were within mean ±2 SD for Aβ-42, T-tau and P-tau.”
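The "within mean ±2 SD" statement quoted above amounts to a simple range check of a laboratory's measured value against the reference mean and SD determined for the UGOT CSF 2010 pool. A minimal sketch follows. The function name and the numeric reference values are hypothetical, not published values for the pool.

```python
def within_two_sd(measured, ref_mean, ref_sd):
    """True if a measured value lies within the reference mean +/- 2 SD."""
    return abs(measured - ref_mean) <= 2.0 * ref_sd


# Hypothetical reference values (pg/mL) for illustration only
reference = {"Aβ-42": (500.0, 15.0), "T-tau": (300.0, 12.0), "P-tau": (45.0, 2.0)}
# Hypothetical measurements of the reference aliquot at a participating laboratory
measured = {"Aβ-42": 520.0, "T-tau": 310.0, "P-tau": 50.0}

report = {
    analyte: within_two_sd(measured[analyte], ref_mean, ref_sd)
    for analyte, (ref_mean, ref_sd) in reference.items()
}
```

In this made-up example, Aβ-42 and T-tau fall inside the interval and P-tau falls outside it, which would prompt a closer look at the P-tau assay before comparing results with other studies.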
The QC program will continue with multiple test rounds each year. The program is still open for enrollment, and inquiries regarding participation can be made to the coordinator at neurochem/at/neuro.gu.se. The next rounds will include checklists for each analytical technique, in an attempt to identify analytical factors differing between laboratories. These checklists include information on instrument calibration, use of manual or automated techniques, sample handling and storage, handling of assay reagents and calibrators, use of internal control samples, assay conditions during preincubation and incubation, settings for data analysis, and criteria for run acceptance (for more information and checklists, see the program homepage http://neurochem.gu.se/TheAlzAssQCProgram). The aim is that this information will serve as a basis to identify factors that influence within- and between-laboratory variations. The participating laboratories may use the summary data to alter their procedures to harmonize their measurements. The QC program can be used to monitor the progress of these efforts.
This initiative should be viewed in the larger context of the development of SOPs for the measurement of diagnostic markers for the early detection of AD. This is needed for all biomarker modalities, including biochemical markers, magnetic resonance imaging markers, and positron emission tomography imaging markers using fluorodeoxyglucose or amyloid ligands. An effort similar to the QC program described in this article is the development of SOPs for magnetic resonance imaging measurements of hippocampal atrophy, which is being carried out by an international workgroup. The development of SOPs for biochemical and imaging markers will be a mandatory step for the introduction of new revised diagnostic criteria for AD that include biomarker information.
It should be noted that the data presented in this article do not hinder the implementation of CSF biomarkers for research or clinical use, but they highlight the present difficulties in establishing universal cutoff levels for the biomarkers. The variations put great demands on each laboratory to develop routines to ensure longitudinal stability in the values it reports, for example, by testing multiple incoming kit lots and selecting the ones that best reproduce values in internal controls. Each laboratory must develop its own reference limits or check its method agreement against laboratories that have published such data. These efforts will increase the availability of AD CSF biomarkers as tools for researchers and clinicians.
The authors thank Åsa Källén, Monica Christiansson, Sara Hullberg, and Dzemila Secic for excellent technical assistance.
K.B., H.Z., N.M., and U.A. designed the study. N.M. and U.A. performed general statistical analyses, and E.C. performed the variance component analysis. N.M. drafted the manuscript. S.P. was the study coordinator. All authors participated in interpretation of data, revised the manuscript for intellectual content, and approved the final version.
A generous grant from the Alzheimer’s Association supported this study.