High-throughput technologies can now identify hundreds of candidate protein biomarkers for any disease with relative ease. However, because there are no assays for the majority of proteins and de novo immunoassay development is prohibitively expensive, few candidate biomarkers are tested in clinical studies. We tested whether the analytical performance of a biomarker identification pipeline based on targeted mass spectrometry would be sufficient for data-dependent prioritization of candidate biomarkers, de novo development of assays and multiplexed biomarker verification. We used a data-dependent triage process to prioritize a subset of putative plasma biomarkers from >1,000 candidates previously identified using a mouse model of breast cancer. Eighty-eight novel quantitative assays based on selected reaction monitoring mass spectrometry were developed, multiplexed and evaluated in 80 plasma samples. Thirty-six proteins were verified as being elevated in the plasma of tumor-bearing animals. The analytical performance of this pipeline suggests that it should support the use of an analogous approach with human samples.
For nearly a decade, hundreds of millions of dollars per year have been spent on the discovery of putative protein biomarkers of human diseases, especially plasma biomarkers. Despite this substantial investment, the number of new plasma biomarkers approved annually by the US Food and Drug Administration has remained relatively static, at no more than two per year1. This low yield is especially disappointing in light of the revolution in technologies for discovering biomarker candidates, which has enabled the identification of hundreds of biomarker candidates for each of the most extensively studied diseases.
Several factors have contributed to this low return on investment. First, although so-called ’omics technologies have revolutionized the discovery of candidate biomarkers, the majority of these candidates do not encode clinically actionable information even if they differ in abundance in disease and control samples. Second, biomarker discovery experiments are fraught with false discoveries resulting from biological variability and the large number of hypotheses being tested in small numbers of samples. Furthermore, because there are no validated approaches for prioritizing from among the droves of candidates those likely to be of clinical use, costly clinical validation studies must be performed on large numbers of candidates for a single novel biomarker of clinical utility to be identified. Third, because there are no quantitative assays for the majority of human proteins2, assays (typically enzyme-linked immunosorbent assays (ELISA)) must be developed de novo for clinical testing of candidate biomarkers, and de novo assay development is prohibitively expensive for testing large numbers of candidate biomarkers. As a result, few putative biomarkers undergo rigorous validation, and the literature is replete with lengthy lists of candidates without follow-up.
Hence, current practice in typical biomarker discovery projects consists of the following stages. First, ’omics technologies are applied to plasma, proximal fluids and/or solid tissues to identify hundreds of candidate biomarkers. Next, candidate biomarkers undergo ‘verification’ in which each putative biomarker is quantified in a limited number (tens to hundreds) of clinical samples to confirm differential expression of the candidate in plasma from cases versus controls. Finally, beyond verification studies, clinical validation requires a large-scale case-control or cohort study to carefully examine the impact of other covariates on the proposed marker test, to determine the positive predictive values and false referral probabilities in real practice, and to compare or combine the new test with existing clinical tests. Because the odds are extraordinarily low that any one candidate will encode clinically useful information, large numbers of candidates must be tested if there is to be any hope of identifying a clinically useful biomarker. Thus begins a desperate search of commercial sources for antibodies and immunoassays for quantifying the candidates. Unfortunately, no assays are available for the vast majority of human proteins, and 50–60% of commercially available antibodies are so poorly validated as to be useless3–5, resulting in a considerable waste of time and money. Novel ELISA assays are extremely expensive to generate and very difficult to multiplex. Ideally they also require a recombinant protein standard, and because many proteins cannot be purified to a single component in a soluble form, the failure rate can be quite high. Thus, if one is to test more than a few dozen candidate biomarkers (for which there are good commercially available assays) one must generate novel, analytically validated assays de novo. 
Because of the high costs associated with de novo assay generation, a small number of candidate biomarkers must be selected from the many hundreds of available candidates, and there is no validated method to guide the prioritization of candidates. As a result, despite tremendous effort, each biomarker project in the end faces little more than a stochastic chance of success.
Until technologies exist to enable high-content, quantitative proteomic profiling on large numbers of clinical samples, a successful biomarker development pipeline must enable triaging and measuring large numbers of candidates downstream of biomarker candidate discovery, thereby accommodating the presently inevitable high per-candidate failure rate. The recent emergence of targeted mass spectrometry (MS)-based proteomic technologies, such as accurate inclusion mass screening6 (AIMS) and selected reaction monitoring7–11 (SRM), has raised the possibility12–15 of using these technologies for data-dependent triage and testing of hundreds of candidate biomarkers in a time- and cost-effective manner. Unlike untargeted (shotgun) modes of MS frequently used in biomarker candidate discovery studies, targeted MS methods are peptide sequence–based modes of MS that focus the full analytical capacity of the instrument on tens to hundreds of selected peptides in a complex mixture16,17. By restricting detection and fragmentation to only those peptides derived from proteins of interest (that is, candidate biomarkers identified in biomarker discovery experiments), sensitivity and reproducibility are improved dramatically compared to discovery-mode MS methods.
Although targeted MS-based technologies have been proposed as the basis of a viable biomarker pipeline12,13,15,18–20, no such pipeline has been demonstrated. Implementing such a notional pipeline is a substantial undertaking involving protocol development and benchmarking. The purpose of this study was to test the hypothesis that the analytical performance of a staged biomarker pipeline based on targeted MS would be sufficient for (i) data-dependent prioritization of hundreds of candidate biomarkers, (ii) de novo development of tens to hundreds of assays of sufficient precision, specificity and sensitivity for human studies and (iii) multiplex biomarker verification studies allowing testing of tens to hundreds of candidate biomarkers in hundreds of clinical samples while consuming a minimal volume of biospecimen material.
For decades, animal models have been critical for testing hypotheses before undertaking human studies. Using a mouse cancer model for this technology benchmarking study was advantageous because (i) samples could be easily generated and collected under strict standard operating protocols in a cost-effective manner, (ii) biological and environmental variations could be minimized and strictly controlled for, allowing us to focus on the pre-analytic and analytic variations associated with the technologies and (iii) precious human clinical samples were not consumed. Although biological variation among humans will undoubtedly be greater than that among mice, the pre-analytic and analytic variations associated with the technologies are agnostic as to what species is used. Thus, although specific biomarkers identified in a mouse model may not be relevant to human disease, the mouse is a useful model system for benchmarking technologies intended for clinical applications.
We selected a mouse model of breast cancer that has been extensively characterized using genomic and proteomic technologies21–24. Therefore, considerable data were available for identifying putative circulating biomarkers at the outset of the study. Candidate biomarkers (1,908; Supplementary Worksheet 1) were identified by integrating 13 independent genomic and proteomic data sets21,22, including candidates discovered in tumor tissue (and predicted to be secreted, leaked or shed) and/or in plasma (fig. S1.1 and table S1.2 in Supplementary Results Section 1). Quantitative assays (e.g., ELISAs) were not available for the majority of the candidates, severely limiting our ability to perform follow-up verification studies of candidates.
To triage and verify a large number of biomarker candidates, we configured a pipeline (Fig. 1) using targeted proteomic technologies, strategically staged in a manner intended to enable us to test as many candidates as possible while containing costs. Specifically, the technology with the lower expense and higher capacity to triage large numbers of candidates (AIMS) was used first, whereas the lower capacity and more expensive quantification technology (SRM) was used last (on only the most highly credentialed candidates).
It is imperative to choose from among the large number of initial candidates and identify those most likely to be of use as a blood-based marker before committing resources to quantitative assay configuration. Because many candidates originate from discovery experiments based in tissue (or proximal fluids), and the number of candidates from this category may be very high (table S1.2 in Supplementary Results Section 1), the approach used should have a large capacity for determining the presence of each candidate in plasma. We chose to use AIMS6 for this purpose.
For AIMS analysis, peptide mass-to-charge ratios (m/z) on a user-generated software inclusion list are monitored in each scan using an Orbitrap MS system, and MS/MS spectra are acquired only when a peptide from the list is detected with both the correct accurate mass and charge state6. The fragmentation patterns from these acquired spectra are used to confirm the presence of the peptide. We used AIMS6 to confirm that a given peptide (and thus the protein from which it is derived) was detectable in the plasma (and thus might form the basis of a blood-based test), providing a bridge from discovery to MS-based targeted assay development. Only candidates detected in the plasma by AIMS were advanced to assay development.
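The inclusion-list trigger described above can be illustrated with a minimal sketch. It assumes a simple parts-per-million mass tolerance; the 10 ppm default, the `Target` type and the peptide labels are hypothetical and are not taken from the study or from any instrument software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    peptide: str  # hypothetical peptide label
    mz: float     # expected precursor m/z
    charge: int   # expected charge state

def ppm_error(observed_mz: float, expected_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6

def match_inclusion_list(observed_mz, observed_charge, targets, tol_ppm=10.0):
    """Return the targets whose m/z AND charge state both match the observed ion.

    tol_ppm is an assumed tolerance chosen only for illustration.
    """
    return [
        t for t in targets
        if t.charge == observed_charge
        and abs(ppm_error(observed_mz, t.mz)) <= tol_ppm
    ]

targets = [Target("PEPTIDE_A", 500.0000, 2), Target("PEPTIDE_B", 650.5000, 3)]
hits = match_inclusion_list(500.0030, 2, targets)  # about 6 ppm from PEPTIDE_A
print([t.peptide for t in hits])
```

In this sketch an MS/MS scan would be triggered only when `hits` is non-empty, mirroring the requirement that both accurate mass and charge state agree with the list.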
The AIMS study is described in detail in Supplementary Results Section 2. Briefly, a pool of plasma with equal contributions from 20 tumor-bearing animals was collected, abundant plasma proteins were depleted, and plasma was proteolyzed with trypsin and subjected to strong cation exchange (SCX) chromatography to generate ten fractions. Each fraction was subjected to 21 AIMS runs and six liquid chromatography (LC)-MS/MS shotgun runs. For AIMS analysis, we targeted 16,961 proteotypic peptides (experimentally observable peptides that each uniquely identify a specific protein or protein isoform) from 1,144 candidate biomarkers. The remaining 764 of the 1,908 candidates were not targeted owing to capacity limitations and the absence of empirical proteotypic peptides. Of the 1,144 targeted candidates, 572 (50%) were detected (Supplementary Worksheet 2). The highest success rates for detection (table S2.2 in Supplementary Results Section 2) came from those candidates identified in both tissue and plasma (79%) and those discovered in plasma but not tissue (77%). An interesting feature of this analysis is the reasonably high (17%) success rate for candidates originally discovered exclusively in the tissue data sets.
AIMS analysis added little benefit (1–5% additional candidates) over the standard shotgun analysis for candidates that had previously been discovered in plasma (table S2.3 in Supplementary Results Section 2). The added value of the inclusion list was quite apparent for candidates previously identified exclusively in tumor tissue, where a large number (n = 48; 26% additional candidates) of candidates discovered in tissue were confirmed by AIMS only (that is, not detected by shotgun analysis). This was expected because candidates originally discovered only in tissue are likely present at a lower concentration in plasma compared with those candidates discovered directly in plasma; thus the use of the inclusion list (and associated added sensitivity) is beneficial.
Reagent costs (e.g., synthetic stable isotope–labeled standard (SIS) peptides) prevented us from developing quantitative SRM assays for all the candidates confirmed by the AIMS analysis, necessitating a further prioritization step. For this we used a semiquantitative application of SRM25 that involves normalization of signals, as described below. In the targeted SRM mode of MS, a specific tryptic peptide is selected as a stoichiometric representative of the protein from which it is cleaved. For semiquantitative SRM (SQ-SRM) evaluation, SRM signals from housekeeping proteins (that is, plasma proteins with the same abundance in tumor-bearing and control animals) were used to normalize SRM signals from candidate proteins across LC-SRM-MS runs (Supplementary Worksheet 3). The AIMS/shotgun data were used to empirically select 9,647 transitions (i.e., a pair of precursor and fragment ion m/z for a given peptide) representing 383 candidates that were measured in triplicate on depleted plasma pooled from 20 tumor-bearing mice and 20 control mice. The remaining 189 of the 572 candidates detected by AIMS were not targeted because the observed peptides did not meet our SRM peptide criteria (Supplementary Results Section 3). SRM signals were curated by hand to ensure proper peak integration and retention time alignment of multiple transitions emanating from each peptide; only transitions showing consistency across runs with a signal-to-noise ratio ≥8 were included in the analysis. Candidates were binned based on confidence level (figure S3.2 in Supplementary Results Section 3). High confidence was assigned to the 164 candidates for which we detected at least three transitions from at least one proteotypic peptide. Medium confidence was assigned to the 38 candidates for which we detected two transitions from one proteotypic peptide. The remaining 171 candidates were detected by a single transition only and thus were considered to be unsubstantiated.
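The normalization and binning steps can be sketched as follows. The exact normalization formula is not given in the text, so scaling by the median housekeeping peak area within each run is an assumption, as are all function names and the treatment of candidates whose peptides each yield only one transition.

```python
from statistics import median

SN_CUTOFF = 8  # signal-to-noise threshold stated in the text

def keep_transition(signal_to_noise: float) -> bool:
    """Keep only transitions at or above the stated signal-to-noise cutoff."""
    return signal_to_noise >= SN_CUTOFF

def normalize_run(candidate_areas, housekeeping_areas):
    """Scale one run's candidate peak areas by that run's median housekeeping area.

    Using the median as the scaling factor is an assumption for illustration.
    """
    scale = median(housekeeping_areas)
    return {name: area / scale for name, area in candidate_areas.items()}

def confidence_bin(transitions_per_peptide):
    """Bin a candidate by the largest transition count seen for any one peptide."""
    best = max(transitions_per_peptide.values(), default=0)
    if best >= 3:
        return "high"
    if best == 2:
        return "medium"
    if best == 1:
        return "unsubstantiated"
    return "not detected"

print(normalize_run({"cand": 200.0}, [90.0, 100.0, 110.0]))
print(confidence_bin({"PEP1": 3, "PEP2": 1}))
```

Normalizing within each run keeps run-to-run differences in injection amount or instrument response from being mistaken for case-versus-control differences.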
The 164 high-confidence candidates were further prioritized by receiver operating characteristic (ROC) analysis (Supplementary Results Section 3), narrowing the list to those 118 candidates with ROC > 0.7. Thus the triage step was successful in narrowing the number of candidates to a reasonable and manageable list, and in determining the subset of candidates for which an antibody (that is, immuno-SRM assay) would be required for robust measurement.
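The triage funnel can be summarized with a short calculation using only the counts reported in the text. The stage labels and the `retention` helper are illustrative names, not terms from the study.

```python
# Candidate counts at each triage stage, as reported in the text.
stages = [
    ("candidates identified", 1908),
    ("targeted by AIMS", 1144),
    ("detected in plasma", 572),
    ("met SRM peptide criteria", 383),
    ("high confidence by SQ-SRM", 164),
    ("passed ROC cutoff (> 0.7)", 118),
]

def retention(stages):
    """Return (stage name, count, fraction of the previous stage retained)."""
    return [
        (name, n, n / prev_n)
        for (_, prev_n), (name, n) in zip(stages, stages[1:])
    ]

for name, n, frac in retention(stages):
    print(f"{name:28s} {n:5d}  ({frac:.0%} of previous stage)")
```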
Ninety-one proteins were selected for de novo quantitative assay generation (Supplementary Results Section 3). Seven candidates were selected from the housekeeping proteins and 49 candidates were selected from the high-confidence candidates, whose transition signals are likely to be sufficient for quantification by SRM-MS. To evaluate the false-negative rate of the pipeline for prioritizing viable lower abundance candidates, we also selected an additional 35 candidates for which signals in the SQ-SRM analysis were too low for assessment. For these 35 candidates, we generated affinity-purified, polyclonal antibodies for immunoaffinity enrichment of proteotypic peptides upstream of SRM-MS. This technique, stable isotope standards and capture by antipeptide antibodies26 (SISCAPA), achieves limits of quantification (LOQ) of proteins in the low ng protein/ml range21,27–29 from 10 μl and low pg protein/ml range29 from 1 ml plasma. Development and characterization of the SRM and immuno-SRM assays are described in the following sections.
We configured 57 peptide analytes (representing 49 candidates and seven housekeeping proteins) into a multiplex SRM-MS assay, and SIS peptides were used as internal standards (Supplementary Results Section 4 and Supplementary Worksheet 4a–c). A list of peptides and SRM transitions is provided as a Skyline document; see Supplementary Data 1. The analytical performance of the multiplex assay was determined by generating response curves for candidate biomarkers in a trypsinized, depleted plasma matrix (Supplementary Appendix A). Performance characteristics of the assays for the candidate biomarkers are summarized in Table 1. The median analytical coefficient of variation (CV) across all analytes was 5.7% at the LOQ. The median limit of detection (LOD) was 68 ng protein/ml plasma, and median LOQ was 157 ng protein/ml plasma.
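As a reminder of how the precision figure is defined, the sketch below computes a percent CV from replicate measurements and the median across assays. It assumes the sample standard deviation is used; the replicate values and assay names are made up for illustration and are not data from the study.

```python
from statistics import mean, median, stdev

def percent_cv(replicates):
    """Coefficient of variation in percent: sample SD divided by mean, times 100."""
    return stdev(replicates) / mean(replicates) * 100.0

def median_cv(assays):
    """Median percent CV across a mapping of assay name to replicate values."""
    return median(percent_cv(values) for values in assays.values())

# Hypothetical replicate peak-area ratios for three assays.
assays = {
    "assay_1": [95.0, 100.0, 105.0],
    "assay_2": [48.0, 50.0, 52.0],
    "assay_3": [9.0, 10.0, 11.0],
}
print(round(median_cv(assays), 1))
```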
We developed 31 affinity-purified polyclonal antibodies to proteotypic peptides from 30 of the 35 selected candidates described above, and these antibodies were used to construct a 31-plex immuno-SRM assay, using SIS peptides as internal standards (Supplementary Results Section 5 and Supplementary Worksheet 5a–d). A list of peptides and SRM transitions is provided as a Skyline document; see Supplementary Data 2. The analytical performance of the 31-plex assay was determined by generating response curves in a trypsinized plasma matrix (Supplementary Appendix B). Performance characteristics of the assays are summarized in Table 2; median CV across all analytes was 7.4% (triplicate SISCAPA captures), median LOD was 26 ng protein/ml plasma and median LOQ was 48 ng protein/ml plasma. Note that LOD/LOQs are not directly comparable to those for the multiplex Q-SRM assay described above, as the Q-SRM assays tested a different set of analytes and were run on depleted plasma using a Qtrap 5500 instrument, whereas immuno-SRM assays were run on nondepleted plasma using a Qtrap 4000.
The analytically validated, 57-plex SRM and 31-plex immuno-SRM assays were used in a quantitative biomarker verification study (Supplementary Results Section 6). No mice used for discovery, AIMS or SQ-SRM were used for the verification studies, which used a separate cohort of animals. All 88 assays were run in triplicate on 80 plasma samples derived from three cohorts of mice, as described below (Supplementary Worksheet 6a–g and Supplementary Appendix C). Median assay precision (during the verification studies) was 3.4% CV (analytical variation) for the 57-plex SRM and 12.5% CV (preanalytical and analytical variation combined) for the 31-plex SISCAPA assays.
The first cohort included ten control and ten tumor-bearing mice, where tumors were clinically apparent (~1 cm, equivalent to the tumor burden in the mice used for discovery). For the 57-plex assay, 46 of the targeted peptide analytes were present above their assay LOQs in at least 50% of the cancer-bearing animals (Supplementary Worksheet 6a). Thirty proteins (2310057J18Rik, Aldoc, Chi3l1, Comp, Ctsb, Ctsl, Dsg2, Ext2, Fetub, Fn1, Hsp90b1, Hspa5, Itih3, Kng2, Lcn2, Lcp1, Mug1, Nrp1, Orm2, Spp1, P4hb, Papln, Pdia3, Ppic, Ptk7, Ptprf, Spint1, Tnc, Itih1, Pzp) showed significant elevation in tumor-bearing animals (area under the curve (AUC) ≥ 0.8, P ≤ 0.01; Table 1). These success rates indicate that the SQ-SRM analysis of pooled plasma successfully identified candidates that (i) could be measured by SRM in depleted plasma without antibody enrichment, and (ii) had a high probability of being verified in quantitative studies examining individual (that is, not pooled) plasma samples.
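The AUC criterion used for verification can be illustrated with a minimal rank-based sketch: the AUC equals the probability that a randomly chosen tumor-bearing animal has a higher measured value than a randomly chosen control, with ties counted as one half. The concentrations below are invented for illustration, and the sketch does not reproduce the P-value calculation, whose method is not stated in this section.

```python
def auc(cases, controls):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = sum(
        (case > control) + 0.5 * (case == control)
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical plasma concentrations (arbitrary units).
tumor_bearing = [5.0, 6.0, 7.0, 8.0]
control = [1.0, 2.0, 3.0, 6.0]
print(auc(tumor_bearing, control))
```

An AUC of 0.5 indicates no separation between groups and 1.0 indicates complete separation, which is why a threshold such as AUC ≥ 0.8 selects analytes that discriminate well.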
For the 31-plex immuno-SRM assay, 17 of the analytes were present above their assay LOQs in at least 50% of the cancer-bearing animals (Supplementary Worksheet 6a). Of these, six proteins (B4galnt1, Ext1, Man2a1, Mfge8, Pcdh18, Prdx4) showed significant elevation in tumor-bearing animals (AUC ≥ 0.8, P ≤ 0.01). The results are summarized in Table 2, and representative examples are shown in Figures 2–4. Where available, commercial antibodies were used for western blot analysis or ELISA of plasma and confirmed the results of the SRM-based assays (Figs. 3 and 4, and figure S6.4 in Supplementary Results Section 6). These results demonstrate that generation of an anti-peptide antibody and configuration of an immuno-SRM assay enhanced the SRM signals from target analytes, allowing quantification in plasma.
A higher percentage of the candidates assayed by Q-SRM were verified (30/56 or 54%) than of those assayed by immuno-SRM (6/35 or 17%). This is expected because the immuno-SRM targets were unsubstantiated by SQ-SRM and likely represent lower abundance proteins. Consistent with this, 46/57 Q-SRM analytes were above their assay LOQs in at least 50% of the cancer-bearing animals, whereas only 17/31 immuno-SRM analytes were.
The second cohort included ten control and ten tumor-bearing mice, where the tumors were preclinical (that is, not palpable); hence, the tumor burden was considerably lower than that used in the biomarker discovery work. This second time point was examined to determine whether the SRM technology is sufficiently sensitive to quantify biomarkers that are only fractionally elevated as a consequence of an extremely low tumor burden. For the 57-plex SRM assay, 29 of the targeted analytes were present above their assay LOQs in at least 50% of the cancer-bearing animals (Supplementary Worksheet 6b). Of these, one protein (Spp1) was present at significantly (P ≤ 0.01) elevated levels in tumor-bearing animals. For the 31-plex SISCAPA assay, 20 of the analytes were present above their assay LOQs in at least 50% of the cancer-bearing animals. Of these, one protein (Mfge8) was present at significantly (P ≤ 0.01) elevated levels in tumor-bearing animals. Hence, as expected, detection of very low disease burden is far more challenging than detection of high disease burden.
To determine the specificity of the biomarkers elevated in cancer, we tested a third cohort of 10 control mice and 30 mice that had been injected with a compound to stimulate confounding conditions including inflammation, apoptosis, necrosis, wound healing and angiogenesis (Supplementary Results Section 6). Of the 36 proteins elevated in the plasma of animals with clinically apparent breast cancers, 17 were above their assay LOQ in at least 50% of the animals with the confounding condition. Only two proteins (Pdia3, Ppic) were elevated in a confounding condition (AUC ≥ 0.8, P ≤ 0.01), suggesting that the majority of biomarkers tested are not general indicators of inflammation or tissue remodeling.
The distribution of the discovery methods used to identify the verified candidates is shown in table S6.1 in Supplementary Results Section 6. The number of candidates tested in this study makes it difficult to draw conclusions regarding the general success rates of various discovery approaches. It is noteworthy that a high percentage of candidates that had been discovered exclusively in tissue-based analyses were substantiated by Q-SRM (4/4) and immuno-SRM (3/15).
The purposes of this study were to develop protocols and procedures for a large-scale biomarker pipeline based on targeted MS and to test the hypothesis that the pipeline would be sufficient to support large-scale clinical biomarker studies in humans. Previous work has characterized the performance of the individual components in isolation, such as AIMS, SRM and immuno-SRM technologies. Nonetheless, these components had not been assembled into a pipeline, and the earlier studies generally involved small numbers of reference samples and analytes, which is inadequate to demonstrate the feasibility of deploying them in clinical studies.
To be useful for human biomarker studies, a pipeline must enable (i) data-dependent prioritization of hundreds of candidate biomarkers, (ii) cost-effective de novo development of tens to hundreds of assays of sufficient precision, specificity and sensitivity for human studies and (iii) multiplex biomarker verification studies allowing testing of tens to hundreds of candidate biomarkers in hundreds of clinical samples while consuming a minimal volume of biospecimens.
Based on these criteria, the procedures and protocols we have described, along with our benchmarking activities, demonstrate that the analytical performance of the targeted proteomic pipeline should be sufficient to support large-scale verification studies in humans.
First, the AIMS and SQ-SRM stages of the pipeline enabled data-dependent prioritization of hundreds of candidate biomarkers. Because of the ability to schedule AIMS detection based on the known or predicted retention times of the target peptides, >1,000 parent ions can be monitored in a single LC-MS/MS run, enabling a large number of candidates to be screened for detection in plasma in a single run. This high capacity enables prefractionation of plasma, thus enabling detection of analytes in the low ng protein/ml plasma range (levels consistent with many known cancer biomarkers). Of the 1,908 candidates identified from genomic and proteomic data sets, proteotypic peptides could be identified for 1,551 candidates (81%). This already high coverage will expand further as ongoing projects designed to characterize proteotypic peptides for all human proteins mature (http://www.mrmatlas.org/). Of the 1,144 candidates targeted for detection in plasma by AIMS, ~50% were detected in plasma, despite the fact that many of the candidates had been discovered in tumor tissues, not plasma.
Although AIMS provided confirmation that a candidate biomarker could be detected in the plasma of tumor-bearing animals, it did not provide quantitative comparison of the levels of the candidates in cases versus controls. The gold standard for quantitative MS in the clinical laboratory is SRM-MS, which has been used for decades to quantify small molecules in human blood samples30. An isotopically labeled internal standard typically is used for rigorous quantification, but label-free approaches provide a potential cost-effective method for rapid semiquantitative screening of large numbers of candidates25. In this study we normalized the SRM signals of candidate biomarkers to those from a set of housekeeping proteins each of whose abundances did not, on average, differ between cases and controls (Supplementary Results Section 3). Three hundred eighty-three of the 572 candidates (67%) detected in plasma were associated with proteotypic peptides that met criteria for SRM-based quantification (Supplementary Results Section 3). Of these, 373 (97%) contained transitions that were reproducibly detectable with a signal-to-noise ratio ≥8, and were thus analyzed by SQ-SRM in pooled case versus control plasma to estimate whether the corresponding biomarker candidates were differentially abundant and thus might be useful biomarkers.
By far, the most challenging stage of the pipeline was verifying the specificity of the detected transitions in SQ-SRM. The difficulty resulted from the lack of internal standards, which provide reference signals for the verification of analyte specificity. Thus, in our experiment, the choice of candidates for quantitative SRM assay development was limited to the most abundant proteins or peptides for which multiple transitions were identified. For example, of the 373 analytes tested, only 164 (44%) were associated with at least three transitions with perfectly aligned retention times, indicating high confidence for specificity of the detection for the targeted analyte. Indeed 171 of the 373 analytes (46%) could not be substantiated with confidence, owing to difficulty in detecting transitions. The use of more affordable, albeit less pure, stable isotope standards from a process generally referred to as spot or membrane synthesis31 remains to be tested but may offer a viable, cost-effective, alternative strategy to circumvent this issue. Furthermore, the specificity in SQ-SRM could also be improved by using recent advances in the data-dependent inclusion of transitions, such as intelligent SRM32. It is notable that the success rate in verifying the credentialed candidates was very good (36/91 candidates or 40%). Therefore, MS-based techniques can be used to triage candidate biomarkers.
The second activity that a biomarker pipeline must support for human studies is the cost-effective de novo development of tens to hundreds of assays of sufficient precision, specificity and sensitivity for human studies. The reagent cost for generating Q-SRM assays is that of the SIS peptides, for which heavy-light pairs can be obtained for <$1,000 per analyte. The reagent cost for generating a novel immuno-SRM assay, with a >90% success rate, is <$5,000, and the average yield of an affinity-purified polyclonal antibody allows for testing of hundreds of plasma samples to assess the utility of a marker before investing in monoclonal antibody development33. Lead time for assay generation is ~24 weeks, including selection of proteotypic peptides, synthesis of immunogens, generation of antibodies, optimization of assay conditions and generation of response curves. In this study, 88 novel assays were developed and characterized in <1 year by one laboratory. In contrast, it would be extraordinary for a single academic laboratory to successfully configure 1–10 ELISA assays de novo in 1–2 years, especially at a comparable cost.
Regarding precision, as precision of the assay deteriorates, there is a nonlinear increase in the dispersion of the results, and thus the clinical signal may be drowned out by analytical noise. For example, biomarker measurements using an assay with poor precision will have a broader reference interval (due to analytical variation) and will thus be of less value for clinical classification of patients. As discussed elsewhere20, statistical verification of a novel biomarker showing comparable biological variation to prostate-specific antigen would require testing plasma samples from a minimum of 500 cases and 500 well-matched controls using an assay technology associated with CV ≤ 20%. In this study, we observed CV < 15% for the majority of assays, demonstrating the feasibility of our approach for human biomarker verification studies.
Beyond verification studies, true clinical validation will require an even larger-scale case-control or cohort study to carefully examine the impact of other covariates on the proposed marker test, to determine the positive predictive values and false referral probabilities in real practice, and to compare or combine the new test with existing clinical tests. In the field of clinical chemistry, quality specifications for assay precision are routinely based on biological variation34. It is widely accepted among clinical chemists34 that the desirable level of imprecision in measurements should be <50% of the average within-subject variation (in which case the amount of variability added to true test-result variability is ~10%). Where desirable performance standards are not attainable with current methodology, the minimum level of imprecision in measurements should be <75% of the average within-subject variation (in which case the amount of variability added to true test-result variability is ~25%). Across 163 protein analytes for which human biological variation has been determined (http://www.westgard.com/biodatabase1.htm), the median within-subject biologic variation is 10%; hence, the median desirable specification for assay precision is 5%. Achieving this very high bar for clinical implementation of the immuno-SRM assays in hospital laboratories will likely require optimization of the individual assays on an analyte-by-analyte basis.
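The relationship between analytical imprecision and added variability can be worked through under one common assumption: that analytical and within-subject variation are independent, so their CVs combine in quadrature. Under that assumption (which is not spelled out in the text), an analytical CV equal to a fraction r of the within-subject CV inflates the total CV by sqrt(1 + r²) − 1. The function names below are illustrative.

```python
from math import sqrt

def added_variability(ratio: float) -> float:
    """Fractional increase in total CV when analytical CV = ratio * within-subject CV.

    Assumes independent sources of variation that add in quadrature.
    """
    return sqrt(1.0 + ratio ** 2) - 1.0

def desirable_cv(within_subject_cv: float) -> float:
    """Desirable analytical CV: half of the within-subject biological CV."""
    return 0.5 * within_subject_cv

print(f"ratio 0.50 adds {added_variability(0.50):.1%}")
print(f"ratio 0.75 adds {added_variability(0.75):.1%}")
print(f"desirable CV for 10% within-subject variation: {desirable_cv(10.0)}%")
```

Under this assumption the 0.75 ratio gives exactly 25% added variability, and the 0.50 ratio gives roughly 12%, in line with the approximate figures quoted above.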
Regarding specificity, SRM-based assays have a distinct advantage over conventional immunoassays, which are prone to interferences35. With SRM, assay specificity is ensured by monitoring multiple transitions from each analyte and by inclusion of an internal isotopically labeled standard. Furthermore, where interferences are present, they are readily detected and can be avoided by selection of alternative transitions.
Regarding sensitivity, many known human cancer protein biomarkers are present in plasma at concentrations in the pg/ml to ng/ml range; early detection of diseases such as cancer may require even greater sensitivity. In this study, assay LOQs in the low ng protein/ml plasma range were typically achieved when starting with only 10 μl of plasma (small volumes were used in this study due to the limited plasma yield from individual mice). However, immuno-SRM assays can achieve low pg protein/ml plasma LOQs when capturing from larger plasma volumes29. Immuno-SRM assays are readily amenable to the testing of hundreds of biospecimens in human studies, as sample handling is minimal, largely automated and performed in a 96-well format29,33. In contrast, for Q-SRM studies (without an anti-peptide antibody), sensitivity remains the primary obstacle because the target analyte is not enriched. Coupling SRM with abundant protein depletion and sample fractionation significantly improves LOQs36–38, but multiplex assay configuration may be difficult for hundreds of candidates, and the intensive sample processing required substantially reduces sample throughput. Hence, for large-scale biomarker verification studies on hundreds of plasma samples, it is beneficial to generate an antibody for immuno-SRM measurements, obviating the need for depletion or fractionation.
The third activity that a biomarker pipeline must support for human studies is multiplex biomarker verification studies that permit testing of tens to hundreds of candidate biomarkers in hundreds of clinical samples20, while consuming a minimal volume of biospecimens per analyte. The SRM technology is highly amenable to multiplex measurements, as demonstrated in this study and others39. Current instrumentation and software allow for scheduling of the transitions being monitored (based on peptide retention times on the high-performance liquid chromatography (HPLC) column), theoretically enabling hundreds of peptide analytes to be quantified in a single LC-MS run. The high multiplex level of these technologies also allows a very large number of measurements to be made in a relatively short period of time, accelerating progress from the discovery step to verification. For example, in this study 80 plasma samples were run in triplicate for 88 analytes, totaling 21,120 assays run in one laboratory in <6 months, demonstrating feasibility for human biomarker studies. The ability to test tens to hundreds of biomarker candidates in multiplex also makes the use of larger plasma volumes a viable option for early human biomarker studies, allowing for increased assay sensitivity (as discussed above).
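To illustrate why retention-time scheduling raises the achievable multiplex level, the following sketch counts how many transitions an instrument would have to monitor at the same moment, with and without scheduling. All numbers (120 peptides, 3 transitions per peptide, elution every 0.5 min, a ±1 min window) are hypothetical and are not taken from this study. The function names are likewise illustrative and do not correspond to any instrument software.

```python
def scheduled_windows(retention_times, transitions_per_peptide, half_width):
    """One (start, end) monitoring window, in minutes, per transition,
    centered on the expected retention time of its peptide."""
    return [
        (rt - half_width, rt + half_width)
        for rt in retention_times
        for _ in range(transitions_per_peptide)
    ]


def max_concurrent(windows):
    """Peak number of windows open at the same time (sweep over sorted events)."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # window opens
        events.append((end, -1))    # window closes
    # At equal times, closings (-1) sort before openings (+1), so windows
    # that merely touch at an endpoint are not counted as overlapping.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    # Hypothetical method: 120 peptides eluting every 0.5 min, 3 transitions each
    retention_times = [0.5 * i for i in range(120)]
    windows = scheduled_windows(retention_times, transitions_per_peptide=3, half_width=1.0)

    unscheduled_load = len(windows)          # every transition monitored for the whole run
    scheduled_load = max_concurrent(windows) # only transitions whose window is open
    print(f"unscheduled: {unscheduled_load} concurrent transitions")
    print(f"scheduled:   {scheduled_load} concurrent transitions")
```

With these assumed values, the unscheduled method cycles through all 360 transitions throughout the run, whereas the scheduled method never monitors more than 12 at once. That reduction in concurrent load is what leaves room for hundreds of analytes in a single LC-MS run.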
In this study, we demonstrate that a staged, targeted proteomic pipeline enables triage and follow-up quantitative testing of a far larger number of candidates than would have been possible using conventional technologies, marking a substantial improvement over the current state of biomarker evaluation. Although the true impact of this or any biomarker development pipeline will not become apparent until it is successfully used to discover novel biomarkers in humans, this study uses an animal model to benchmark the performance of the proposed pipeline and thereby demonstrates the feasibility of, and thus sets the stage for, applying the pipeline to human biomarker studies. Because biological and disease subtype heterogeneity (pre-analytical variables) likely differ between mouse models and humans, biomarkers identified using mice will not necessarily be of clinical utility in humans. Nonetheless, because the analytical performance of the technologies (e.g., sensitivity, precision, specificity and multiplex characteristics) as well as pre-analytical sample processing are not species-specific, we can confidently extrapolate the analytical performance observed using a mouse model to analogous studies with humans, even if the specific biomarkers are of limited clinical potential.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
This work was funded by a grant from The Paul G. Allen Family Foundation and the National Institutes of Health grant U24 CA126476 from the National Cancer Institute Clinical Proteomic Technology Assessment Center (CPTAC). We thank our Biomarker Project advisory board members for advice and input: R. Aebersold, L. Anderson, E. Diamandis, R. Smith, F. Appelbaum, D. Gottschling, M. Groudine, J. Roberts, V. Vasioukhin and N. Urban. We thank all members of the Biomarker Project team for input: L. Hartwell, S.M. Hanash, S.J. Pitteri, H. Wong, K.E. Gurley, D. Liggitt, D.B. Martin, T. Whitmore, A. Peterson, R. Prueitt, M. Fitzgibbon, J.K. Eng, D. May, T. Holzman, Y. Zhang, A. Stimmel, S.L. Zriny, R. Dumpit, I. Coleman and T.D. Lorentzen.
AUTHOR CONTRIBUTIONS
J.R.W. oversaw all proteomic experiments, guided the data analysis and assisted in writing the manuscript. C.L. performed or oversaw all data analysis and assisted in writing the manuscript. J.K. performed all verification studies using the 57-plex SRM assay. L.H. performed all SQ-SRM studies, assisted with the AIMS studies and assisted with configuration of the multiplex SRM assays. M.T. performed plasma sample processing throughout the project. I.S. led all verification studies using the 31-plex immuno-SRM assay and performed all ELISA analyses. P.Y. performed analysis of all response curve SRM data. R.M.S. assisted with the 31-plex immuno-SRM assay characterization and verification studies, identification of candidate biomarkers, depletion of plasma samples for verification studies and editing of the manuscript. L.Z. assisted with the 31-plex immuno-SRM assay characterization and verification studies. U.J.V. performed all western blot analyses. L.A.C. provided the Her2 mouse model. K.S.K.-S. and C.J.K. helped design the mouse experiments, performed all mouse husbandry and provided the required plasma samples. A.K. assisted in selection of candidate biomarkers, assembly of the inclusion lists for the AIMS analyses and analysis of the AIMS data. P.R.G., J.M.H. and L.A.J. performed the AIMS analyses. P.S.N. managed the overall consortium, contributed data for candidate selection, made intellectual contributions to the design of the study and assisted with editing the manuscript. P.W., M.W.M. and L.A. provided statistical input for the design of the experiments and the analysis of the data and assisted with editing the manuscript. A.G.P. designed experiments, coordinated execution of the project, coordinated analysis/interpretation of the data and assumed primary responsibility for writing the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
Note: Supplementary information is available on the Nature Biotechnology website.
Accession codes. All data used by or generated during this study have been deposited in the public domain. Hash codes from the ProteomeCommons.org Tranche network for the plasma AIMS data, shotgun data and integrated SRM data are given in Table 3. Additionally, Skyline documents for implementing the 57-plex SRM (Supplementary Data 1) and the 31-plex immuno-SRM (Supplementary Data 2) assays are available.