The purposes of this study were to develop protocols and procedures for a large-scale biomarker pipeline based on targeted MS and to test the hypothesis that the pipeline would be sufficient to support large-scale clinical biomarker studies in humans. Previous work had characterized the performance of the individual components, such as AIMS, SRM and immuno-SRM technologies, in isolation. Nonetheless, these components had not been assembled into a pipeline, and earlier evaluations were generally carried out on small numbers of reference samples and analytes, which is inadequate to demonstrate the feasibility of deploying them in clinical studies.
To be useful for human biomarker studies, a pipeline must enable (i) data-dependent prioritization of hundreds of candidate biomarkers, (ii) cost-effective de novo development of tens to hundreds of assays of sufficient precision, specificity and sensitivity for human studies and (iii) multiplex biomarker verification studies allowing testing of tens to hundreds of candidate biomarkers in hundreds of clinical samples while consuming a minimal volume of biospecimens.
Based on these criteria, the procedures and protocols we have described, along with our benchmarking activities, demonstrate that the analytical performance of the targeted proteomic pipeline should be sufficient to support large-scale verification studies in humans.
First, the AIMS and SQ-SRM stages of the pipeline enabled data-dependent prioritization of hundreds of candidate biomarkers. Because AIMS detection can be scheduled according to the known or predicted retention times of the target peptides, >1,000 parent ions can be monitored in a single LC-MS/MS run, so that a large number of candidates can be screened for detection in plasma at once. This high capacity makes prefractionation of plasma practical, which in turn allows detection of analytes in the low ng protein/ml plasma range (levels consistent with many known cancer biomarkers). Of the 1,908 candidates identified from genomic and proteomic data sets, proteotypic peptides could be identified for 1,551 candidates (81%). This already high coverage will expand further as ongoing projects designed to characterize proteotypic peptides for all human proteins mature (
http://www.mrmatlas.org/). Of the 1,144 candidates targeted for detection in plasma by AIMS, ~50% were detected in plasma, despite the fact that many of the candidates had been discovered in tumor tissues, not plasma.
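The capacity argument for scheduled monitoring can be illustrated with a minimal sketch. It assumes each target is monitored in a fixed-width window centered on its predicted retention time; the function name, the window width and the example retention times are hypothetical and are not taken from the study. The quantity of interest is the largest number of targets that must be monitored at the same moment, which is what limits the instrument, rather than the total number of targets in the run.

```python
def max_concurrent_targets(retention_times_min, window_min):
    """Return the peak number of overlapping monitoring windows.

    retention_times_min: predicted retention times (minutes), one per target.
    window_min: full width of the monitoring window around each target
                (hypothetical parameter, not a value from the study).
    """
    half = window_min / 2.0
    events = []
    for rt in retention_times_min:
        events.append((rt - half, 1))   # window opens
        events.append((rt + half, -1))  # window closes
    # At equal times, process closings (-1) before openings (+1) so that
    # windows that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    # Four targets in total, but at most three are ever monitored together.
    print(max_concurrent_targets([10.0, 10.5, 11.0, 30.0], window_min=2.0))
```

Under this simplification, spreading targets across the gradient keeps the concurrent load far below the total target count, which is why scheduling raises the number of parent ions that fit in one run.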
Although AIMS provided confirmation that a candidate biomarker could be detected in the plasma of tumor-bearing animals, it did not provide quantitative comparison of the levels of the candidates in cases versus controls. The gold standard for quantitative MS in the clinical laboratory is SRM-MS, which has been used for decades to quantify small molecules in human blood samples
30. An isotopically labeled internal standard typically is used for rigorous quantification, but label-free approaches provide a potential cost-effective method for rapid semiquantitative screening of large numbers of candidates
25. In this study we normalized the SRM signals of candidate biomarkers to those from a set of housekeeping proteins whose abundances did not, on average, differ between cases and controls (
Supplementary Results Section 3). Three hundred eighty-three of the 572 candidates (67%) detected in plasma were associated with proteotypic peptides that met criteria for SRM-based quantification (
Supplementary Results Section 3). Of these, 373 (97%) contained transitions that were reproducibly detectable with a signal-to-noise ratio ≥8, and were thus analyzed by SQ-SRM in pooled case versus control plasma to estimate whether the corresponding biomarker candidates were differentially abundant and thus might be useful biomarkers.
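The label-free normalization described above can be sketched as follows. The exact procedure is given in Supplementary Results Section 3 and is not reproduced here; this sketch assumes, purely for illustration, that each candidate peak area is divided by the median housekeeping peak area from the same pooled sample. The function names, the dictionary layout and the numbers are hypothetical.

```python
from statistics import median


def normalize_to_housekeeping(candidate_area, housekeeping_areas):
    """Scale a candidate SRM peak area by the median housekeeping peak area.

    Using the median is an assumption made for this sketch, not a detail
    confirmed by the source.
    """
    if not housekeeping_areas:
        raise ValueError("need at least one housekeeping signal")
    return candidate_area / median(housekeeping_areas)


def case_control_ratio(case_run, control_run, candidate):
    """Semiquantitative case/control ratio for one candidate peptide."""
    case_norm = normalize_to_housekeeping(
        case_run["candidates"][candidate], case_run["housekeeping"])
    control_norm = normalize_to_housekeeping(
        control_run["candidates"][candidate], control_run["housekeeping"])
    return case_norm / control_norm


if __name__ == "__main__":
    # Illustrative peak areas only; "pepA" is a made-up peptide name.
    case = {"candidates": {"pepA": 3000.0},
            "housekeeping": [900.0, 1000.0, 1100.0]}
    control = {"candidates": {"pepA": 1000.0},
               "housekeeping": [450.0, 500.0, 550.0]}
    print(case_control_ratio(case, control, "pepA"))
```

In this made-up example the raw signal is threefold higher in cases, but the housekeeping signals are twofold higher as well, so the normalized ratio is smaller than the raw one.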
By far, the most challenging stage of the pipeline was verifying the specificity of the detected transitions in SQ-SRM. The difficulty resulted from the lack of internal standards, which provide reference signals for the verification of analyte specificity. Thus, in our experiment, the choice of candidates for quantitative SRM assay development was limited to the most abundant proteins or peptides for which multiple transitions were identified. For example, of the 373 analytes tested, only 164 (44%) were associated with at least three transitions with perfectly aligned retention times, indicating high confidence for specificity of the detection for the targeted analyte. Indeed 171 of the 373 analytes (46%) could not be substantiated with confidence, owing to difficulty in detecting transitions. The use of more affordable, albeit less pure, stable isotope standards from a process generally referred to as spot or membrane synthesis
31 remains to be tested but may offer a viable, cost-effective, alternative strategy to circumvent this issue. Furthermore, the specificity in SQ-SRM could also be improved by using recent advances in the data-dependent inclusion of transitions, such as intelligent SRM
32. It is notable that the success rate in verifying the credentialed candidates was very good (36/91 candidates or 40%). Therefore, MS-based techniques can be used to triage candidate biomarkers.
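The specificity criterion used above (at least three transitions with aligned retention times) can be expressed as a simple check. This is a sketch under stated assumptions: "aligned" is approximated by a hypothetical tolerance around the median apex retention time, and the function names and default values are illustrative rather than the criteria applied in the study.

```python
from statistics import median


def count_coeluting(apex_rts_min, tolerance_min=0.05):
    """Count transitions whose peak apex lies within tolerance of the median apex."""
    center = median(apex_rts_min)
    return sum(1 for rt in apex_rts_min if abs(rt - center) <= tolerance_min)


def is_high_confidence(apex_rts_min, min_transitions=3, tolerance_min=0.05):
    """Return True when enough transitions co-elute to support specificity.

    tolerance_min is an assumed parameter; the source only states that
    retention times were aligned.
    """
    if not apex_rts_min:
        return False
    return count_coeluting(apex_rts_min, tolerance_min) >= min_transitions


if __name__ == "__main__":
    # Three transitions co-elute; the fourth is an interfering signal.
    print(is_high_confidence([22.41, 22.42, 22.40, 25.10]))
    # Only two transitions detected, and they do not co-elute.
    print(is_high_confidence([22.41, 25.10]))
```

Without an internal standard there is no reference retention time, so a check of this kind can only compare the transitions with one another, which is the limitation the paragraph above describes.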
The second activity that a biomarker pipeline must support for human studies is the cost-effective
de novo development of tens to hundreds of assays of sufficient precision, specificity and sensitivity for human studies. The cost of reagents for generating Q-SRM assays is that of the SIS peptides, for which heavy-light pairs can be obtained for <$1,000 per analyte. The reagent cost for generating a novel immuno-SRM assay with >90% success rate is <$5,000, and the average yield of an affinity-purified polyclonal antibody allows for testing of hundreds of plasma samples to assess the utility of a marker before investing in monoclonal antibody development
33. Lead time for assay generation is ~24 weeks, including selection of proteotypic peptides, synthesis of immunogens, generation of antibodies, optimization of assay conditions and generation of response curves. In this study, 88 novel assays were developed and characterized in <1 year by one laboratory. In contrast, it would be extraordinary for a single academic laboratory to successfully configure 1–10 ELISA assays
de novo in 1–2 years, especially at a comparable cost.
Regarding precision, as precision of the assay deteriorates, there is a nonlinear increase in the dispersion of the results, and thus the clinical signal may be drowned out by analytical noise. For example, biomarker measurements using an assay with poor precision will have a broader reference interval (due to analytical variation) and will thus be of less value for clinical classification of patients. As discussed elsewhere
20, statistical verification of a novel biomarker showing comparable biological variation to prostate-specific antigen would require testing plasma samples from a minimum of 500 cases and 500 well-matched controls using an assay technology associated with CV ≤ 20%. In this study, we observed CV < 15% for the majority of assays, demonstrating the feasibility of our approach for human biomarker verification studies.
Beyond verification studies, true clinical validation will require an even larger-scale case-control or cohort study to carefully examine the impact of other covariates on the proposed marker test, to determine the positive predictive values and false referral probabilities in real practice, and to compare or combine the new test with existing clinical tests. In the field of clinical chemistry, quality specifications for assay precision are routinely based on biological variation
34. It is widely accepted among clinical chemists
34 that the desirable level of imprecision in measurements should be <50% of the average within-subject variation (in which case the amount of variability added to true test-result variability is ~10%). Where desirable performance standards are not attainable with current methodology, the minimum level of imprecision in measurements should be <75% of the average within-subject variation (in which case the amount of variability added to true test-result variability is ~25%). Across 163 protein analytes for which human biological variation has been determined (
http://www.westgard.com/biodatabase1.htm), the median within-subject biologic variation is 10%; hence, the median desirable specification for assay precision is 5%. Achieving this very high bar for clinical implementation of the immuno-SRM assays in hospital laboratories will likely require optimization of the individual assays on an analyte-by-analyte basis.
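The relationship between analytical imprecision and added variability can be worked through numerically. The sketch assumes that analytical and within-subject variation are independent and therefore combine in quadrature, which is the usual assumption behind these specifications but is not stated explicitly in the text; the function names are illustrative.

```python
import math


def added_variability(ratio):
    """Fractional increase in total CV when analytical CV = ratio * within-subject CV.

    Assumes independent variance components: CV_total^2 = CV_i^2 + CV_a^2.
    """
    return math.sqrt(1.0 + ratio ** 2) - 1.0


def precision_specification(within_subject_cv, ratio=0.5):
    """Upper bound on analytical CV for a given within-subject CV (same units)."""
    return ratio * within_subject_cv


if __name__ == "__main__":
    print(round(added_variability(0.50), 3))   # desirable specification
    print(round(added_variability(0.75), 3))   # minimum specification
    print(precision_specification(10.0))       # median within-subject CV of 10%
```

Under this assumption the desirable specification adds about 11.8% to the true test-result variability, of the order of the ~10% quoted above, and the minimum specification adds 25%. A median within-subject variation of 10% gives the 5% target stated in the text.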
Regarding specificity, SRM-based assays have a distinct advantage over conventional immunoassays, which are prone to interferences
35. With SRM, assay specificity is ensured by monitoring multiple transitions from each analyte and by inclusion of an internal isotopically labeled standard. Furthermore, where interferences are present, they are readily detected and can be avoided by selection of alternative transitions.
Regarding sensitivity, many known human cancer protein biomarkers are present in plasma at ng-to-pg protein/ml concentrations; early disease detection, such as in cancer, may require much higher sensitivities. In this study, assay LOQs in the low ng protein/ml plasma range were typically achieved when starting with only 10 μl of plasma (small volumes were used in this study due to the limited plasma yield from individual mice). However, immuno-SRM assays can achieve low pg protein/ml plasma LOQs when capturing from higher plasma volumes
29. Immuno-SRM assays are readily amenable to the testing of hundreds of biospecimens in human studies, as sample handling is minimal and largely automated and performed in a 96-well format
29,33. In contrast, for Q-SRM studies (without an anti-peptide antibody to enrich the target analyte), sensitivity remains the primary obstacle to the use of SRM assays. Coupling SRM with abundant protein depletion and sample fractionation significantly improves LOQs
36–38, but multiplex assay configuration may be difficult for hundreds of candidates, and the intensive sample processing required has a great impact on sample throughput. Hence, for large-scale biomarker verification studies on hundreds of plasma samples, it is beneficial to generate an antibody for immuno-SRM measurements, obviating the need for depletion or fractionation.
The third activity that a biomarker pipeline must support for human studies is multiplex biomarker verification studies that permit testing of tens to hundreds of candidate biomarkers in hundreds of clinical samples
20, while consuming a minimal volume of biospecimens per analyte. The SRM technology is highly amenable to multiplex measurements, as demonstrated in this study and others
39. Current instrumentation and software allow for scheduling of the transitions being monitored (based on peptide retention times on the high-performance liquid chromatography (HPLC) column), theoretically enabling hundreds of peptide analytes to be quantified in a single LC-MS run. The high multiplex level of these technologies also allows a very large number of measurements to be made in a relatively short period of time, accelerating progress from the discovery step to verification. For example, in this study 80 plasma samples were run in triplicate for 88 analytes, totaling 21,120 assays run in one laboratory in <6 months, demonstrating the feasibility for human biomarker studies. The ability to test tens to hundreds of biomarker candidates in multiplex also makes the use of larger plasma volumes a viable option for early human biomarker studies, allowing for increased assay sensitivity (as discussed above).
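The measurement count quoted above follows directly from the study design, as this short sketch shows. The function name is illustrative; the inputs are the figures given in the text.

```python
def total_measurements(n_samples, n_replicates, n_analytes):
    """Number of individual analyte measurements in a multiplexed study."""
    return n_samples * n_replicates * n_analytes


if __name__ == "__main__":
    # 80 plasma samples, run in triplicate, 88 analytes each.
    print(total_measurements(80, 3, 88))
```

The same function makes it easy to see how the workload scales when the design moves to hundreds of human samples, since the count grows linearly in each factor.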
In this study, we demonstrate that a staged, targeted proteomic pipeline enables triage and follow-up quantitative testing of a far larger number of candidates than would have been possible using conventional technologies, marking a substantial improvement over the current state of biomarker evaluation. Although the true impact of this or any biomarker development pipeline will not become apparent until it is successfully used to discover novel biomarkers in humans, this study uses an animal model to benchmark the performance of the proposed pipeline and thereby demonstrates the feasibility of, and thus sets the stage for, applying the pipeline to human biomarker studies. Because biological and disease subtype heterogeneity (pre-analytical variables) likely differ between mouse models and humans, biomarkers identified using mice will not necessarily be of clinical utility in humans. Nonetheless, because the analytical performance of the technologies (e.g., sensitivity, precision, specificity and multiplex characteristics) as well as pre-analytical sample processing are not species-specific, we can confidently extrapolate the analytic performance observed using a mouse model to analogous studies with humans, even if the specific biomarkers are of limited clinical potential.