The application of “omics” technologies to biological samples generates hundreds to thousands of biomarker candidates; however, a discouragingly small number make it through the pipeline to clinical use. This is in large part due to the incredible mismatch between the large numbers of biomarker candidates and the paucity of reliable assays and methods for validation studies. We desperately need a pipeline that relieves this bottleneck between biomarker discovery and validation. This paper reviews the requirements for technologies to adequately credential biomarker candidates for costly clinical validation and proposes methods and systems to verify biomarker candidates. Models involving pooling of clinical samples, where appropriate, are discussed. We conclude that current proteomic technologies are on the cusp of significantly affecting translation of molecular diagnostics into the clinic.
Biomarkers are a cornerstone of medical care. In the acute care setting (e.g., emergency rooms), blood biomarker measurements are routinely used to differentiate causes of patient symptoms such as chest pain (troponins for heart attack) or abdominal pain (transaminases for hepatitis, alkaline phosphatase for biliary problems, and human chorionic gonadotropin (β-hCG) for pregnancy). Biomarkers also have a proven track record in other clinical applications such as risk stratifying patients for preventive interventions, screening populations for early disease detection, subtyping disease to facilitate molecularly tailored therapy, and monitoring response to treatment. Additionally, biomarkers spur the development of new generations of therapeutics by providing accepted surrogates that reduce the cost of screening drugs in humans (e.g., LDL cholesterol for risk of stroke or heart attack, viral load for HIV).
Given the tremendous track record of biomarkers for improving patient care and the medical community’s growing interest in personalized medicine, there is considerable activity toward the development of more and better biomarkers. With the recent application of genomics and proteomics technologies, hundreds to thousands of biomarker candidates are routinely identified in biomarker discovery experiments, spawning great hope that a wave of new clinically useful biomarkers is imminent. Puzzlingly, however, very few new protein biomarkers are achieving FDA approval, leaving a disgruntled community that questions the value of proteomic technologies for biomarker discovery [6–10].
In this article, we will focus on the development of novel biomarkers that can be measured relatively noninvasively in plasma. We will review the current status of the biomarker development pipeline, with a focus on biomarker verification, the stage of the pipeline where our opportunities for improvement using emerging proteomic technologies are greatest. We will argue that clinical proteomics in the post-genomics era is in its infancy, and despite having produced no novel biomarkers to date, is poised to impact the clogged biomarker pipeline now more than at any other time in history. We will propose one possible path forward to apply emerging proteomic technologies in novel ways to improve flux through and overall success of the biomarker pipeline.
A schematic of the biomarker pipeline is shown in Fig. 1. The major stages in the pipeline are shown including biomarker candidate identification, prioritization, verification, and clinical validation. Additionally, typical numbers of candidate biomarkers making it through each stage are shown, highlighting the aforementioned enigma wherein despite our ability to generate long lists of candidates, a mere 0–2 per year are achieving FDA approval (across all diseases).
Low flux through the biomarker pipeline has been compared to that of the drug discovery pipeline, for which a high failure rate [11, 12] is responsible for the rising cost of bringing a new drug to market, which is now approaching $1 billion [13, 14]. Although the emergence of molecularly targeted therapies will likely change the situation, diagnostics have historically been perceived as being of lesser value than drugs. Hence, diagnostics are reimbursed at significantly lower levels than therapeutics, and inadequate reimbursement is a profound impediment to development of new diagnostics. Until this situation changes, we must minimize the cost of developing new diagnostics if we are to succeed in bringing new tests into clinical use.
In this article, we will discuss each stage of the protein biomarker pipeline, maintaining a focus on how each stage impacts the overall success rate of the pipeline. This will naturally lead us to a discussion of what can be done to improve our success rates (and reduce costs), especially in the verification stage, the “tar pit” of the pipeline wherein the largest bottleneck is encountered.
Although it is the last step of the pipeline, we will begin our discussion with clinical validation since it sets the context for all of the upstream steps. Validation is expensive and time consuming, and the bar for success is daunting. It is not sufficient that a protein’s mean abundance differs between populations of cases and controls and that a clinical-grade assay is available; the successful protein biomarker must also perform responsibly and economically in a given clinical scenario.
Each clinical scenario will have different requirements for success. For example, a biomarker to be used for population screening for early cancer detection must meet very high standards; millions of healthy people are screened, and the overall disease incidence is low leading to a low pretest probability and high risk for false positives. Hence, to be beneficial, the test must have extraordinarily high specificity (or it must have a more specific follow-up test), and there must be a clinical intervention that will improve the quality of or prolong life.
Let’s take prostate-specific antigen (PSA) as an example. PSA is a protein secreted by normal and cancerous prostate epithelial cells. Currently, PSA is the only serum- or plasma-based population screening tool we have for any cancer. Blood PSA levels are used to screen men over 50 years old for early diagnosis of prostate cancer. However, the specificity of this test is low; only 25–30% of men with a PSA elevated at 7.0 ng/mL will actually have prostate cancer on biopsy. Hence, this low-specificity screening marker (PSA) is coupled to a more specific follow-up test (biopsy) for definitive diagnosis. The end result is that the poor specificity of PSA leads to an annual cost of $750 million in unnecessary medical follow-up. There are other issues with PSA screening. Ideally, we would diagnose only disease that will become clinically significant; otherwise, intervention may cause more harm than good (this is called “overdiagnosis”). Since PSA screening has achieved widespread use, a man’s lifetime risk of being diagnosed with prostate cancer has increased to ~17%, yet his lifetime risk of dying from prostate cancer is only ~3% [18, 19]. So most men die “with” and not “of” their prostate cancer, and overdiagnosis in this population has been a major problem associated with significant treatment-related cost and morbidity.
In contrast to biomarkers for screening millions of mostly healthy individuals to detect asymptomatic disease, a biomarker intended to diagnose patients who have presented with a specific symptom must meet different minimum performance standards. For example, a patient with chest pain has an elevated prior probability that he is having a heart attack compared to the general population; because far fewer patients will be screened (only those with chest pain) and the prior probability of a heart attack is elevated over the general population, the number of false positive results requiring follow up will be much lower. These tailored clinical requirements, coupled to the need to validate markers in thousands of individuals with clinical follow-up information, create a situation wherein trials for validating biomarkers are lengthy, multimillion dollar endeavors. Hence, it is absolutely essential that we give priority to investing in clinical validation studies for only the most highly credentialed candidate biomarkers. In other words, each step in the biomarker pipeline must be designed with the clinical application in mind, rather than proceeding in a vacuum.
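The effect of pretest probability can be made concrete with Bayes’ rule. The sketch below computes the positive predictive value (the probability that a positive result reflects true disease) for a hypothetical marker with 90% sensitivity and 90% specificity; the function name and the two prevalence values (0.5% for a screened population, 20% for symptomatic patients) are illustrative assumptions, not figures from this article.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a positive test reflects true disease (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical marker (90% sensitive, 90% specific) at two assumed pretest probabilities
screening = positive_predictive_value(0.90, 0.90, prevalence=0.005)   # population screening
symptomatic = positive_predictive_value(0.90, 0.90, prevalence=0.20)  # patients with a symptom
print(f"screening PPV:   {screening:.3f}")
print(f"symptomatic PPV: {symptomatic:.3f}")
```

With these assumed inputs, the same test gives a PPV of roughly 4% in the screening setting but roughly 69% in the symptomatic setting, which is why a screening marker needs far higher specificity (or a more specific follow-up test).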
During the discovery of candidate biomarkers, a candidate database is populated via de novo discovery using genomic and/or proteomic technologies and/or through the curation of candidates from the scientific literature. Discovery efforts produce candidates (hypotheses), not biomarkers, and these efforts are inherently error-prone for multiple reasons. First, conventional technologies are not currently capable of globally interrogating biological proteomes, resulting in low sensitivity for directly detecting putative protein biomarkers. This is especially true for low-concentration biomarkers or biomarkers resulting from disease-associated mutations, aberrant post-translational modifications (PTMs), or alternative splicing. Second, although genomics technologies are more comprehensive (and quantitative) than proteomics for discovering candidates, the correlation between DNA or mRNA copy number and protein abundance is imperfect [21–25], and thus many candidates discovered based on gene or mRNA copy number will not be elevated at the protein level. Third, proteomic-based discovery of biomarkers directly in plasma is challenging because of a small number of overwhelmingly high-abundance proteins. Thus, increasingly, discovery efforts are focusing first on tissues or proximal fluids, moving to plasma only once candidates have been identified. However, we have no good rules for predicting which tissue proteins are likely to succeed as plasma biomarkers, and the error rate is high. Fourth, discovery efforts are often poorly designed, without a clear understanding of the nuances of interpreting high-dimensional datasets, often leading to biases and high false discovery rates (FDR). Finally, discovery efforts rarely use pertinent clinical information to prioritize markers that will ultimately have the highest likelihood of success in the desired clinical setting.
As a result, discovery efforts create large lists of candidate biomarkers, many of which may discriminate two classes of interest but are unlikely to meet the high bar of clinical validation, as discussed above.
Certainly, we can stack the deck in favor of success during the discovery process by using appropriate study designs that avoid bias, carefully estimating and controlling the FDR of our discovery technologies, and integrating clinical information early in the discovery or prioritization process. However, despite our earnest efforts to implement these measures, the success rate for translating biomarker candidates into biomarkers capable of meeting the high bar of clinical validation will still be disturbingly low. Our best hope for success is to relieve the bottleneck of candidate verification (Fig. 1) and allow the maximum number of candidate biomarkers to be tested, improving our odds of finding clinically useful markers. Strategies for achieving this goal are discussed in detail below.
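One standard way to control the FDR across many candidates is the Benjamini–Hochberg step-up procedure. The sketch below is a minimal pure-Python version; the function name and the example p-values are hypothetical and only illustrate the mechanics of the procedure.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[int]:
    """Return sorted indices of hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten discovery candidates
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))
```

Here only the first two candidates survive at q = 0.05, even though several others have nominal p-values below 0.05.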
Because discovery efforts generate more candidates (100s–1000s) than there are available resources for follow up, an ill-defined prioritization step ensues. Often, candidates that show the most significant differences between cases and controls in discovery datasets are prioritized, without any information as to whether these may be the most useful analytes for clinical decision making. For example, many such candidates are a part of a generalized inflammatory response, and as individual clinical markers these are of very little diagnostic use due to their lack of specificity.
Proteins discovered in diseased tissues and predicted to be secreted or on the cell surface (based on the presence of a signal sequence or an N-linked glycosylation site) are often given priority based on the assumption that they might have greater access to plasma. A better understanding of the biology of plasma biomarkers would help us develop meaningful criteria for prioritizing candidates. For example, what are the predominant modes by which cellular proteins from diseased tissues access the plasma? Should we be focused on proteins predicted to be secreted, based on the abundance of secreted proteins amongst the known plasma proteome, or does this rule not apply for proteins not in plasma by design, but rather leaked or shed from diseased tissue? Alternatively, an additional targeted proteomics step such as dynamic inclusion [30, 31] or multiple reaction monitoring (MRM) MS can be used as an empirical prioritization step to test selected biomarker candidates to determine if they can be detected in plasma.
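A simple in silico version of the N-linked glycosylation criterion is to scan each candidate’s sequence for the N-X-S/T sequon (X being any residue except proline). The sketch below assumes plain one-letter amino acid sequences; the candidate names and sequences are hypothetical, a sequon match does not prove the site is actually glycosylated, and signal-sequence prediction (the other criterion mentioned above) requires a dedicated predictor that is not shown.

```python
import re

# N-X-S/T sequon, X any residue except proline; the zero-width lookahead
# lets overlapping motifs (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence: str) -> list[int]:
    """Return 0-based positions of asparagines that start an N-X-S/T sequon."""
    return [match.start() for match in SEQUON.finditer(sequence.upper())]


def prioritize(candidates: dict[str, str]) -> list[str]:
    """Rank candidate names by sequon count (stable sort keeps input order on ties)."""
    return sorted(candidates, key=lambda name: len(sequon_positions(candidates[name])), reverse=True)


# Hypothetical candidates: protB has two sequons (its N-P-T site is excluded)
candidates = {"protA": "MKAAAGLLV", "protB": "MKNVSANPTQNGS"}
print(sequon_positions(candidates["protB"]))
print(prioritize(candidates))
```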
Other times, prioritization is based on a biological hypothesis. For example, proteins acting in cellular pathways known or hypothesized to be deregulated in the diseased state are targeted for testing. Although there are examples of success using this approach, our biological knowledge base is far too incomplete to rely entirely on this method for prioritizing candidates. In this sense, gene expression and protein interaction network analyses based on genomics data will likely be useful for refining candidate lists [34, 35] by selecting candidates from neighborhoods of interest within the network.
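Selecting candidates from a network neighborhood can be sketched as a breadth-first search outward from seed proteins of interest. In the example below, the interaction edges, seed set, hop limit, and protein identifiers (P1 to P6) are all hypothetical; a real analysis would draw edges from an interaction or co-expression resource.

```python
from collections import deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from a list of interaction pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighborhood(graph: dict[str, set[str]], seeds: set[str], max_hops: int) -> set[str]:
    """All nodes within max_hops edges of any seed (breadth-first search)."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical network, seed pathway protein, and discovery candidate list
graph = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")])
region = neighborhood(graph, {"P1"}, max_hops=2)
refined = [c for c in ["P3", "P4", "P6"] if c in region]
print(sorted(region), refined)
```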
To improve our success rate of moving candidate biomarkers successfully into clinical use, we must accommodate the harsh reality described above wherein even if our discovery efforts are honed to perfection, many candidates (perhaps the majority) will still not meet the high bar of clinical validation. In other words, despite our best efforts, it is highly likely that the majority of protein biomarker candidates will ultimately fail as useful clinical biomarkers. Hence, to succeed, we must develop a staged pipeline that incorporates a verification step that allows us to test (in pilot studies) the maximal number of candidates with highest possible throughput and lowest possible cost to ensure even a few successes.
Verification has a singular goal: to determine if there is sufficient evidence for potential clinical utility of a given candidate plasma biomarker to warrant further investment in that candidate for clinical validation studies. Because of the high cost (in terms of time, money, and consumption of clinical samples) of follow-up clinical validation studies, these “pilot” verification data are essential for credentialing a candidate to be moved forward.
In the current pipeline, the same assays are used for verification and clinical validation of biomarkers. Ideally, this is advantageous, since measured biomarker levels can vary across different assays. One assay often employed is the ELISA, which is understandable since a well-functioning ELISA can be relatively high throughput and has extraordinary sensitivity for quantifying the target analyte. Unfortunately, ELISA development is costly ($100 000–$2 million per biomarker candidate), has a long development lead time (>1 year), and has a high failure rate [37, 38], making it impractical to develop an ELISA for all putative biomarkers. As a result, even in the best-funded efforts, only a few percent of the total candidate biomarkers for any given disease are actually tested (Fig. 1), which, not surprisingly, leads to a high failure rate and an abysmal return on investment. Even if the immunoassay is to remain the gold standard for ultimate clinical application of validated biomarkers, we desperately need affordable bridging technologies to facilitate testing of a large number of potential candidates if we are to identify the few that are likely to be of clinical use. To succeed, we must aim to increase our capacity in the verification stage by 100-fold or more (Fig. 1).
The remainder of this article will focus on: (i) an exploration of the technological capabilities required for efficient verification of large numbers of candidates, (ii) a discussion of how current and emerging proteomic technologies measure up to these requirements, and (iii) a proposal for a staged approach to credentialing biomarkers in the most cost-effective manner.
Let’s assume that a rigorously determined list of 1000 candidate plasma protein biomarkers for detecting prostate cancer has been produced, that the discovery process was well orchestrated using a study design that avoided bias and well-characterized technologies with low FDRs, and that information on the likelihood of clinical relevance was also included in the prioritization or discovery of these candidates (e.g., markers were generated that correlate with clinical outcome). We lack sufficient resources to build ELISAs for all 1000 candidates, yet we would like to perform verification studies for as many of the candidates as possible to maximize our chances of finding the subset of most clinically promising candidates for validation studies. So we desperately need a novel experimental approach, and potentially a new assay method, capable of measuring 1000 candidates to resolve this conundrum.
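As a rough, back-of-envelope illustration using the per-candidate ELISA development cost quoted earlier ($100 000–$2 million) and ignoring any economies of scale, building ELISAs for all 1000 candidates would cost:

```latex
1000 \times \$100\,000 = \$1 \times 10^{8}
\quad \text{to} \quad
1000 \times \$2 \times 10^{6} = \$2 \times 10^{9}
```

that is, on the order of $100 million to $2 billion.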
Let’s use known information about the PSA biomarker to define the boundaries of performance that will be required of our new approach to candidate verification. As discussed above, despite its widespread use, the performance of the PSA marker is marginal at best. Hence, we will use its performance characteristics to set the minimum standard that we will accept in our ongoing search for new markers for the detection of prostate cancer. In other words, we do not want to aim to discover markers that perform worse than PSA; we will only aim to discover markers that perform as well as or better than PSA.
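Using the empirical distribution of PSA levels described in a previous population study, we will simulate different experimental scenarios for verification studies. We will consider two levels of candidate credentialing as part of the verification stage (Fig. 1): level one, determining whether the mean plasma level of a candidate differs significantly between case and control populations; and level two, estimating a candidate’s sensitivity and specificity to determine whether it is likely to meet the minimally acceptable performance in a given clinical setting.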
As we will demonstrate, it is useful to divide verification into these two levels of credentialing because the technology requirements (sample throughput, assay precision, assay multiplexing) differ between them. For example, we will argue that although biomarker candidates must be measured in individual patient samples for level two credentialing, a pooling strategy is possible for level one credentialing. Pooling is potentially advantageous since pooling plasma samples from multiple individuals provides an opportunity to reduce sample numbers (and hence throughput requirements), reduce the sample volumes required from individual clinical samples, and reduce the cost of verification. Reduced throughput requirements are a major advantage early on in verification, since this allows us to accommodate workflows that are too cumbersome and imprecise for validation studies, but that may provide a fast and relatively cheap way to screen a large number of candidates (see below).
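The variance argument behind pooling can be written out under simple assumptions (ours, not the article’s): k individuals with independent plasma concentrations x_i of mean μ and variance σ² are pooled in equal volumes, and the assay adds an independent multiplicative error ε with mean 0 and standard deviation equal to the CV.

```latex
\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad y = \bar{x}\,(1 + \varepsilon)

\operatorname{Var}(y)
= \mathbb{E}\!\left[\bar{x}^{2}\right]\mathbb{E}\!\left[(1+\varepsilon)^{2}\right] - \mu^{2}
= \left(\frac{\sigma^{2}}{k} + \mu^{2}\right)\left(1 + \mathrm{CV}^{2}\right) - \mu^{2}
= \frac{\sigma^{2}}{k}\left(1 + \mathrm{CV}^{2}\right) + \mu^{2}\,\mathrm{CV}^{2}
```

Pooling shrinks the between-individual term by a factor of k but leaves the assay term (μ² · CV²) untouched, so pooled designs still depend on assay precision; pooling also averages away any information about which individuals carry an elevated marker.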
Table 1A–C shows the results of simulating different experimental scenarios for level one (Table 1A, 1B) and level two (Table 1C) credentialing. The statistical power for detecting PSA as a potential biomarker in plasma is calculated for various assay precisions (coefficient of variation, CV), numbers of samples (N), and numbers of replicate assays performed. Here, statistical power is defined as the probability of detecting a biomarker with our assay given that the marker is differentially expressed between cases and controls; ideally, our experimental design should have as high a statistical power as possible (to avoid false negatives), minimally >90%. For level one credentialing, two study designs are considered: one using pooled plasma from multiple individuals (Table 1A.1, 1B.1) and another using individual plasma samples (i.e., not pooled; Table 1A.2, 1B.2). Several important conclusions can be drawn regarding the performance requirements of our ultimate verification workflow.
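The kind of simulation behind Table 1 can be sketched with a small Monte Carlo loop. This version is only an illustration under explicit assumptions: the marker is log-normally distributed with hypothetical log-scale means (case_mu, control_mu) and spread (sigma) rather than the empirical PSA distribution used for Table 1; the assay adds multiplicative Gaussian noise with the given CV; replicate readings are averaged; pool_size individuals can optionally be pooled per sample; and significance is judged by a Welch t statistic on log values against a fixed critical value (2.02, approximately the two-sided 5% cutoff for about 38 degrees of freedom, i.e., 20 samples per group). It will not reproduce the numbers in Table 1.

```python
import math
import random
import statistics


def simulate_power(
    n_per_group: int,
    cv: float,
    replicates: int = 1,
    pool_size: int = 1,
    case_mu: float = 1.5,     # hypothetical log-scale mean in cases
    control_mu: float = 0.3,  # hypothetical log-scale mean in controls
    sigma: float = 1.0,       # hypothetical log-scale biological SD
    t_crit: float = 2.02,     # approx. two-sided 5% cutoff for ~38 df
    n_sim: int = 1000,
    seed: int = 1,
) -> float:
    """Estimate power as the fraction of simulated studies with |t| > t_crit."""
    rng = random.Random(seed)

    def measure(mu: float) -> float:
        # Equal-volume pool of `pool_size` individuals (1 = individual sample)
        true_level = statistics.fmean(rng.lognormvariate(mu, sigma) for _ in range(pool_size))
        # Replicate readings with multiplicative noise, clamped to stay positive
        reads = [max(true_level * (1.0 + rng.gauss(0.0, cv)), 1e-9) for _ in range(replicates)]
        return math.log(statistics.fmean(reads))

    hits = 0
    for _ in range(n_sim):
        cases = [measure(case_mu) for _ in range(n_per_group)]
        controls = [measure(control_mu) for _ in range(n_per_group)]
        # Welch t statistic on log-transformed measurements
        se = math.sqrt(statistics.variance(cases) / n_per_group
                       + statistics.variance(controls) / n_per_group)
        t = (statistics.fmean(cases) - statistics.fmean(controls)) / se
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for cv in (0.1, 0.2, 0.5):
        for n in (5, 10, 20):
            print(f"CV={cv:.1f} N={n:2d} power={simulate_power(n, cv):.2f}")
```

Scanning CV, N, replicates, and pool_size in this way gives a feel for the trade-offs summarized in Table 1; a subset-only marker (as in Table 1B) could be explored by drawing only a fraction of cases from the elevated distribution.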
For level one credentialing, if we want ≥90% power to detect PSA as a potential biomarker worthy of further study (i.e., mean plasma levels significantly differ between case and control populations), we find that:
Based on the above considerations, in order to achieve level one credentialing (for a candidate typified by PSA and a homogeneous disease population), we will need plasma samples from 20 cases and 20 well-matched controls. Additionally, we need to devise an assay technology with the following characteristics
If the target biomarker is elevated in only a subset (S) of the case population of which we have no prior knowledge, our requirements are more stringent. In this scenario, we will need >100 samples (depending on the prevalence of the disease subtype in which the biomarker is present; Table 1B) and assay CV ≤ 0.2 to detect a marker elevated in at least 20% of the case population.
For level two credentialing, our goal is to identify the subset of candidate markers most likely to meet the minimally acceptable sensitivity and specificity in a given clinical setting. Hence, we must perform a pilot study to characterize the distribution of the marker in the population, allowing us to estimate its sensitivity and specificity. The success of this step relies on how accurately the sensitivity and specificity can be measured; therefore, assay precision (CV) again plays an important role. As we can see from Table 1C, a large CV will result in underestimation of sensitivity and specificity. For example, for PSA the actual sample sensitivity is 73.9% and sample specificity is 88%. In our simulation, when CV = 0.5 the estimated sensitivity is about 65%, which is 8.9 percentage points lower than the true sensitivity of PSA based on the population study. In addition to using a precise assay, a larger number (100s–1000s) of individual patient samples (Table 1C) will be needed compared with level one credentialing. For example, for the estimation of sensitivity (Table 1C.1), around 1000 cases and 1000 controls would be needed to obtain a 90% confidence interval (CI) spanning less than 5% (i.e., CI = (x − 2.5%, x + 2.5%)), and more than 5000 cases and 5000 controls would be needed to obtain a CI spanning less than 2% (i.e., (x − 1%, x + 1%)). These requirements can also be viewed from another angle. To have 90% power to identify PSA as a good candidate marker worthy of follow-up (i.e., 70% sample sensitivity and 85% sample specificity), we would need 500 cases and 500 controls with a CV = 0.15; with a CV = 0.25, a couple of thousand cases and controls would be required.
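The sample sizes quoted for estimating sensitivity can be cross-checked with the normal-approximation (Wald) confidence interval for a proportion. The sketch below assumes that approximation (not the simulation behind Table 1C) and computes the smallest number of cases whose 90% CI for a sensitivity of 73.9% spans no more than a given total width; the function name is hypothetical, and the same calculation applies to specificity using controls.

```python
import math
from statistics import NormalDist


def cases_needed(sensitivity: float, ci_width: float, confidence: float = 0.90) -> int:
    """Smallest n whose Wald CI for a proportion has total width <= ci_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for a 90% CI
    half_width = ci_width / 2
    return math.ceil(z**2 * sensitivity * (1 - sensitivity) / half_width**2)


print(cases_needed(0.739, 0.05))  # 90% CI spanning 5 percentage points
print(cases_needed(0.739, 0.02))  # 90% CI spanning 2 percentage points
```

These evaluate to 835 and 5219 cases, broadly consistent with the “around 1000” and “more than 5000” figures above.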
Based on the above considerations, in order to achieve level two credentialing for a marker similar to PSA, we will need plasma samples from a minimum of 500 cases and 500 well-matched controls. Additionally, we need to devise an assay technology with the following characteristics:
Note that level two credentialing is still just a pilot study using limited throughput assays to determine if a candidate is trending toward utility and therefore worthy of making a better high-throughput clinical-grade assay. True clinical validation, however, will require an even larger-scale case-control or cohort study in order to carefully examine the impact of other covariates on the proposed marker test, to determine the positive predictive values and false referral probabilities in real practice, and to compare or combine the new test with existing clinical tests. Although candidates showing promise in pilot level two credentialing studies may still not pass the test of ultimate clinical validation, level two credentialing is important because it allows us to advance only the most promising of candidates forward to clinical validation trials, thereby saving time, money, and clinical specimens and helping to maximize our return on investment.
It should be noted that the power calculations described in Table 1 are based on known distributions of PSA levels in the cancer and the normal populations. Hence, these results can be generalized to other biomarkers showing similar population distributions, but markers with vastly different distributions would require that new calculations be performed based on the specific behavior of that marker. In the absence of knowing the population variation for markers yet to be discovered, it is useful to look at a well-studied example such as PSA to provide general guidance in planning verification studies.
The further along in the biomarker pipeline that a candidate moves, the more time and resources become invested in that candidate. Hence, it is prudent for us to advance candidates through sequential, economical stages, each stage requiring additional credentialing for a given candidate to be advanced (Fig. 1). Based on the above statistical considerations (Table 1), one possible staged workflow for biomarker candidate verification and validation is described below.
The goal is to determine if the mean levels of each of ~1000 candidate protein biomarkers differ between case and control populations to allow selection of a subset of most promising markers for further investment in assay development. Table 2 summarizes the reagent cost, lead time, sensitivity, throughput (working assay), and sample consumption for several existing and emerging technologies for measuring protein levels. If resources were unlimited, we would generate antibodies/immunoassays for each candidate so that we could achieve the highest possible sensitivity for detecting the candidates in plasma. Unfortunately, for the vast majority of protein biomarker candidates there will be no commercially available antibody, and generating antibodies to all of our 1000 candidates is cost- and time-prohibitive (estimated cost >$2 million; lead time 9–12 months).
One emerging alternative to the immunoassay is to use MRM-MS/MS to verify candidate biomarkers. This targeted mode of MS is well entrenched in clinical chemistry laboratories, where it is used to measure “small” molecules such as drug metabolites [39, 40]. MRM differs from the typical shotgun MS/MS-based approaches used in discovering biomarker candidates in that MRM is a targeted technique directed to measure proteotypic peptides with known fragmentation properties. In MRM, a specific precursor ion (corresponding to a proteotypic peptide) and a specific fragment ion are selected by the MS1 and MS2 modes of the mass spectrometer, respectively. The instrument cycles sequentially through a number of precursor/fragment ion pairs (dubbed “transitions”) and records the signal over time (the chromatographic elution of the analyte). The combination of precursor/fragment ion masses and retention times of multiple transitions from the same peptide results in high specificity for the targeted peptide. In addition, because the instrument analyzes only a subset of the ions present in a complex mixture (reducing the overall chemical background), sensitivity increases substantially. The MRM experiment is ideally suited to triple quadrupole instruments [41, 42]. Recently, LC-MRM-MS/MS has been coupled to stable isotope dilution methods to measure concentrations of proteotypic peptides as surrogates for quantification of biomarker candidates in complex biological matrices such as tissue lysates and plasma [32, 43–50]. Assay linear ranges in plasma typically span four to five orders of magnitude with CVs <20%.
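In stable isotope dilution, a known amount of a heavy-labeled version of the proteotypic peptide is spiked into the sample, and the endogenous (light) peptide is quantified from the light-to-heavy peak-area ratio. The sketch below assumes identical MS response for the light and heavy forms and sums matched transitions per peptide; the function name, units, and peak areas are hypothetical, and converting the peptide result to a protein concentration would require further assumptions (e.g., complete digestion) not shown here.

```python
def sid_concentration(light_areas: list[float], heavy_areas: list[float],
                      heavy_spike: float) -> float:
    """Endogenous peptide amount = (summed light / summed heavy peak area) x spiked heavy amount."""
    if len(light_areas) != len(heavy_areas):
        raise ValueError("light and heavy transitions must be paired")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("internal standard signal must be positive")
    return sum(light_areas) / heavy_total * heavy_spike


# Hypothetical peak areas for three transitions of one peptide,
# with the heavy standard spiked at 5.0 fmol/uL (assumed units)
print(sid_concentration([1200.0, 800.0, 400.0], [600.0, 400.0, 200.0], 5.0))
```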
The capability of multiplexing many peptides into a single run is an important advantage of MRM-MS/MS. A triple quadrupole operated in MRM mode is capable of monitoring 100 peptides or more in a single run, depending on the number of time segments. Recent improvements in acquisition software allow MRM transitions to be scheduled at specific time points in a chromatographic separation, increasing the number of transitions that can be monitored in a run to around 1000 and drastically improving multiplexing capacity. Recent work demonstrating quantitative MRM using a MALDI source also has the potential to dramatically improve sample throughput.
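Why scheduling raises multiplexing capacity can be sketched by counting how many transitions must be monitored at the busiest moment of a run. The sketch below assumes each peptide is monitored only inside a retention-time window and that the busiest moment must fit into a fixed cycle time; the function names, window widths, cycle time, and peptide counts are hypothetical, and instrument overheads (e.g., inter-scan delays) are ignored.

```python
def max_concurrent(windows: list[tuple[float, float, int]]) -> int:
    """Peak number of transitions monitored at once.

    Each window is (start_min, end_min, n_transitions) for one peptide.
    """
    events = []
    for start, end, n in windows:
        events.append((start, n))
        events.append((end, -n))
    # Tuples sort by time, then delta, so window ends are processed before starts at equal times
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def dwell_time_ms(windows: list[tuple[float, float, int]], cycle_time_s: float) -> float:
    """Dwell time available per transition when the busiest moment fills one cycle."""
    return 1000.0 * cycle_time_s / max_concurrent(windows)


# Hypothetical run: 300 peptides x 3 transitions, retention times spread over 5-55 min
retention = [5.0 + 50.0 * i / 299 for i in range(300)]
scheduled = [(rt - 1.0, rt + 1.0, 3) for rt in retention]  # 2-min windows
unscheduled = [(0.0, 60.0, 3) for _ in retention]          # monitored for the whole run
for label, w in (("unscheduled", unscheduled), ("scheduled", scheduled)):
    print(f"{label}: {max_concurrent(w)} concurrent, {dwell_time_ms(w, 2.0):.1f} ms dwell")
```

Under these assumptions only about a dozen peptides overlap at any moment when scheduled, so each transition gets many times more dwell time than when all 900 transitions are monitored throughout the run.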
The primary limitation for applying LC-MRM-MS/MS directly to plasma samples for biomarker verification studies is sensitivity. Typical limits of quantification (LOQs) are in the range of 100–1000 ng/mL of target protein in plasma [32, 45, 48], whereas most novel and specific biomarkers are expected to occur at or below nanogram-per-milliliter levels.
Recently, it has been shown that coupling MRM-MS/MS with minimal fractionation of plasma dramatically improves sensitivity, raising hope for measuring candidates directly in plasma [46, 51]. In these approaches, plasma is subjected to minimal fractionation using N-glycopeptide enrichment or abundant protein depletion plus strong cation exchange (SCX) chromatography at the peptide level prior to LC-MRM-MS/MS. For example, when coupled with stable isotope dilution, the workflow combining abundant protein depletion and SCX is multiplexable and able to achieve LOQs in the 1–10 ng/mL range without immunoaffinity enrichment of either proteins or peptides. However, this workflow is somewhat laborious, and its many steps will likely introduce run-to-run experimental variation. In addition, the analysis time is lengthened by the number of fractions analyzed by LC-MRM-MS/MS, limiting sample throughput. Also, although the coefficients of variation for the MRM step range from 3 to 15%, the reproducibility of fractionation and/or enrichment steps, such as abundant protein depletion and SCX chromatography, has not been assessed, and these steps will almost certainly introduce additional noise. Nonetheless, as discussed above (Table 1A.1), although a workflow that is limited in sample throughput and associated with a CV ≤ 0.5 would not be sufficient for level two credentialing or clinical validation, it would be perfectly acceptable for level one credentialing, where pooled samples and higher CVs can be tolerated.
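The point that fractionation adds noise on top of the MRM step can be quantified under a simple assumption: if each stage contributes an independent multiplicative error, the CVs combine as 1 + CV_total² = Π(1 + CV_i²), which for small CVs is close to adding them in quadrature. The sketch below uses the upper end of the MRM range quoted above (0.15) and a purely hypothetical fractionation CV of 0.30, since the text notes that fractionation reproducibility has not been assessed.

```python
import math


def combined_cv(*stage_cvs: float) -> float:
    """Overall CV for independent multiplicative errors: 1 + CV^2 = prod(1 + CV_i^2)."""
    product = 1.0
    for cv in stage_cvs:
        product *= 1.0 + cv**2
    return math.sqrt(product - 1.0)


mrm_cv = 0.15            # upper end of the MRM-step range quoted in the text
fractionation_cv = 0.30  # hypothetical; not a measured value
print(f"{combined_cv(mrm_cv, fractionation_cv):.3f}")
```

With these inputs the overall CV is about 0.34: within the CV ≤ 0.5 tolerated for pooled level one credentialing, but well above the precision needed for level two credentialing.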
In applying this approach to our hypothetical biomarker candidates, purchasing stable isotope standard (SIS) peptides for each of the 1000 candidate proteins would be cost-prohibitive. Hence, we propose that before investing in costly SIS peptides, LC-MRM-MS/MS can be performed semiquantitatively by normalizing the amount of total peptide loaded on the column across samples, or by using MRM transitions from nonchanging proteins in the sample to normalize the candidate response. As has been demonstrated, this semiquantitative look at candidates successfully allows triage of only those showing initial promise for further resource investment, and allows us to test literally hundreds to thousands of candidates with minimal upfront investment. Ideally (depending on resources), all markers reaching significance (using a statistical test such as the t-test) in level one credentialing will enter level two credentialing. (For our specific PSA example, markers with a ≥4-fold change would have an approximately 90% chance of being truly different between the groups (Table 1).)
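The semiquantitative triage described above can be sketched as normalizing each candidate’s MRM peak area to the summed areas of transitions from proteins assumed not to change, then comparing pooled case and control samples. The function names, peak areas, and the use of a simple fold-change cutoff are illustrative assumptions; the 4-fold default mirrors the PSA example in the text, and in practice a significance test such as the t-test would be applied as well.

```python
import statistics


def normalized_response(candidate_area: float, reference_areas: list[float]) -> float:
    """Candidate peak area relative to the summed areas of nonchanging reference transitions."""
    return candidate_area / sum(reference_areas)


def fold_change(case_values: list[float], control_values: list[float]) -> float:
    """Ratio of mean normalized responses, expressed as >= 1 in either direction."""
    ratio = statistics.fmean(case_values) / statistics.fmean(control_values)
    return max(ratio, 1.0 / ratio)


def passes_triage(case_values: list[float], control_values: list[float],
                  min_fold: float = 4.0) -> bool:
    """Keep a candidate for further investment if its fold change meets the cutoff."""
    return fold_change(case_values, control_values) >= min_fold


# Hypothetical replicate injections of a pooled case and a pooled control sample
cases = [normalized_response(a, [40000.0, 60000.0]) for a in (9000.0, 8600.0, 9400.0)]
controls = [normalized_response(a, [41000.0, 59000.0]) for a in (2000.0, 2100.0, 1900.0)]
print(round(fold_change(cases, controls), 2), passes_triage(cases, controls))
```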
Even with limited fractionation, sensitivity remains a major limitation at this stage of the pipeline. For example, some candidates may be present in plasma at too low an abundance for detection in this workflow, yet be useful biomarkers that could be detected with higher-sensitivity (affinity-based) assays. We have no way of identifying these candidates without investing in high-sensitivity assays, so we will still potentially have a high false negative rate for this class of candidates.
The goal is to estimate each marker’s sensitivity and specificity to determine if the marker shows sufficient promise to warrant a full clinical validation trial. This will require quantitative measurement of the candidate markers in many hundreds of individual patient samples with a CV≤0.2 (Table 1C.3), and hence will require a different workflow than that proposed above for level one credentialing.
Specifically, level two credentialing will require generating an affinity reagent to enrich each candidate in a one-step, highly precise, preferably automatable process. A technique that achieves these goals has recently been described [37, 38, 53]. In this technique, Stable Isotope Standards and Capture by Antipeptide Antibodies (SISCAPA), immobilized affinity-purified antipeptide polyclonal antibodies are used to capture specific peptides of interest. Captured peptides are subsequently eluted and detected by MRM-MS/MS. Quantitative results can be obtained by spiking in SIS peptides at known concentrations before immunoaffinity capture. The concentrations of the measured peptides are then used as surrogates for the concentrations of the candidate biomarker proteins. Using affinity-purified polyclonal antibodies, this technique has been shown to achieve LOQs in the nanogram-per-milliliter range in plasma [32, 38]. Selecting very high-affinity mAbs is expected to improve the sensitivity of the SISCAPA method further.
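The isotope-dilution arithmetic behind SISCAPA quantitation can be sketched as follows. The sketch assumes that the light (endogenous) and heavy (SIS) forms of the peptide are recovered and ionized identically, that digestion is complete, and that one peptide is released per protein molecule. The function name, peak areas, spike amount, plasma volume, and 30 kDa molecular weight are all hypothetical.

```python
def protein_ng_per_ml(light_area, heavy_area, sis_fmol, plasma_ul, protein_mw_da):
    """Estimate protein concentration in plasma from a light/heavy MRM ratio.

    light_area / heavy_area * sis_fmol gives fmol of endogenous peptide;
    dividing by plasma volume in uL gives fmol/uL, which equals nmol/L.
    nmol/L * (g/mol) = ng/L, and dividing by 1000 converts ng/L to ng/mL.
    """
    peptide_fmol = light_area / heavy_area * sis_fmol
    nmol_per_l = peptide_fmol / plasma_ul
    return nmol_per_l * protein_mw_da / 1000


# Hypothetical run: 10 fmol SIS peptide spiked into a digest of 10 uL plasma.
conc = protein_ng_per_ml(light_area=2.0e5, heavy_area=4.0e5, sis_fmol=10,
                         plasma_ul=10, protein_mw_da=30_000)
print(f"{conc:.1f} ng/mL")
```

Because the SIS peptide is added before capture, losses during immunoaffinity enrichment cancel in the ratio. Losses during digestion do not cancel, which is the limitation of peptide surrogates discussed below.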
The need for both an antibody and a SIS peptide significantly raises the cost of level two credentialing over that of level one; reagents alone will cost ~$3000 per candidate tested (Table 2). Hence, we will likely be limited to testing hundreds of candidates.
Promising candidate biomarkers identified during level two credentialing will then be validated in real clinical practice, where the impact of other clinical covariates on the proposed test will be investigated. Positive predictive values and false referral probabilities at the population level will be determined. Additionally, panels of markers, or perhaps the predictive value of changes in marker levels over time within an individual, will need to be assessed. These complexities require a clinical-grade assay capable of high-throughput, accurate measurements. The assay reagents must therefore be well characterized and renewable, which requires generating a mAb. This further increases the cost and time invested in each candidate compared with level two credentialing (Table 2), and resources will likely limit the number of candidates that can be tested to tens.
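Bayes' rule shows why population-level positive predictive value must be determined separately from sensitivity and specificity. At low disease prevalence, even a test that looks strong in a case-control study yields mostly false positives. The sketch below uses hypothetical sensitivity, specificity, and prevalence values.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | test positive) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical marker with 90% sensitivity and 90% specificity.
for prevalence in (0.10, 0.01, 0.001):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

Under these assumptions, at 1% prevalence fewer than 1 in 10 positive results would be true positives.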
As discussed above, the immunoassay, of which the ELISA is a well-known example, is the conventional format for measuring protein concentrations in the clinical setting. Despite its widespread use and favorable characteristics (quantitative, sensitive, high-throughput), the ELISA does have some disadvantages [54, 55]. First, creating a sandwich immunoassay requires generating two different antibodies that both recognize the native protein and do not sterically interfere with one another. Second, interfering autoantibodies can mask the surface features recognized by reagent antibodies [56, 57], a rarely appreciated problem in the clinical laboratory. Third, endogenous, nonspecific heterophilic antireagent antibodies can cause falsely elevated protein concentrations in as many as 3% of human samples [56, 58–60]. A fourth disadvantage, less significant for verifying potential biomarkers but one that plagues immunoassays in clinical settings, is a lack of standardization. The clinical community very rarely has access to truly useful standard materials that permit comparison between the assays used to validate biomarkers and the many assays that might be used clinically at different centers.
The SISCAPA technology described above is one potential alternative to the ELISA. The advantages of SISCAPA are that (i) it requires only one antibody and so is cheaper; (ii) the antibody need not recognize the native protein, only a proteotypic peptide, and so is easier to generate; (iii) the mass spectrometer essentially acts as the secondary antibody, so specificity is absolute; (iv) it is highly multiplexable and consumes small volumes (microliters) of clinical plasma specimens; and (v) it directly detects antigen-derived peptide normalized to a stable isotope-labeled internal standard peptide, which could easily be standardized across laboratories. A disadvantage of SISCAPA is that quantitative targeted MS methods require the proteotypic peptide to be an accurate surrogate for protein biomarker abundance. The validity of this assumption is threatened by the imperfect nature of trypsin digestion, for which no standards currently exist. To avoid this source of error, recent work has demonstrated the use of stable isotope-labeled proteins as standards in immunoaffinity enrichment coupled to quantitative MS. The jury is still out as to whether, after further development, SISCAPA-MRM technology will ultimately replace (or complement) the ELISA as a gold standard for clinical diagnostics, or whether it will “simply” provide a desperately needed bridging technology between verification and validation studies [36, 37].
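A toy calculation makes the digestion caveat concrete. Suppose the endogenous protein has a hypothetical 60% digestion yield. An SIS peptide added after digestion cannot account for that loss. An isotope-labeled protein added before digestion, assumed here to digest with the same yield as the endogenous protein, cancels the loss in the light/heavy ratio. All amounts and function names are hypothetical.

```python
def estimate_with_peptide_standard(true_fmol, digestion_yield, sis_fmol):
    """Labeled peptide spiked after digestion: only the endogenous protein
    is incompletely digested, so the estimate is biased low by the yield."""
    light = true_fmol * digestion_yield
    heavy = sis_fmol
    return light / heavy * sis_fmol


def estimate_with_protein_standard(true_fmol, digestion_yield, sis_fmol):
    """Labeled protein spiked before digestion: both forms release peptide
    with the same (assumed) yield, so the yield cancels in the ratio."""
    light = true_fmol * digestion_yield
    heavy = sis_fmol * digestion_yield
    return light / heavy * sis_fmol


TRUE_FMOL, YIELD, SIS_FMOL = 50.0, 0.6, 20.0
print("peptide standard:", estimate_with_peptide_standard(TRUE_FMOL, YIELD, SIS_FMOL))
print("protein standard:", estimate_with_protein_standard(TRUE_FMOL, YIELD, SIS_FMOL))
```

The protein-standard estimate is unbiased only if the labeled and endogenous proteins really do digest identically in the sample matrix.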
Given conventional capabilities, it is imperative that we develop a practical biomarker pipeline allowing pilot testing (verification) of thousands of protein candidates in hundreds of patient samples in a reasonable timeframe (<1 year) so that only the most promising candidates are triaged for lengthy and costly clinical validation studies. There are many unmet needs that could dramatically impact our success in assembling such a pipeline.
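A back-of-the-envelope estimate of instrument time helps judge whether a given platform can meet this target. The sketch assumes one LC-MRM run per sample per multiplexed method, a hypothetical capacity of 100 candidate peptides per method, one monitored peptide per candidate, and a hypothetical 1 h per run. None of these figures comes from the text, and fractionation, which multiplies the run count, is ignored.

```python
import math


def instrument_days(n_candidates, n_samples, peptides_per_run,
                    hours_per_run, peptides_per_candidate=1):
    """Total LC-MRM runs and instrument-days for a verification study.

    Candidates are split into multiplexed methods of up to peptides_per_run
    peptides; every sample is analyzed once with every method.
    """
    methods = math.ceil(n_candidates * peptides_per_candidate / peptides_per_run)
    runs = methods * n_samples
    return runs, runs * hours_per_run / 24


for n_candidates in (1000, 3000):
    runs, days = instrument_days(n_candidates, n_samples=200,
                                 peptides_per_run=100, hours_per_run=1.0)
    print(f"{n_candidates} candidates x 200 samples: {runs} runs, ~{days:.0f} instrument-days")
```

Under these assumptions, a single instrument running continuously would handle 1000 candidates in 200 samples in under three months. The time budget shrinks quickly, however, as candidates, replicate injections, or fractions are added.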
For example, the poor availability and often unacceptable quality of commercial antibodies necessitate expensive and time-consuming de novo reagent generation for most candidates. As is being addressed (http://proteomics.cancer.gov/programs/reagents_resource/), there is a tremendous opportunity to partner with industry as well as with academic efforts (http://www.proteinatlas.org/) to generate well-characterized affinity reagents against the human proteome. If the immunogens were properly designed to support MS applications, these reagents would be invaluable for the biomarker pipeline described herein.
Additionally, ongoing efforts to clone, tag, and purify human proteins [64, 65] have the potential to greatly facilitate MRM-based verification of biomarker candidates. The choice of the proteotypic peptide to monitor is critical to constructing a successful assay. Small-scale purification of candidate biomarker proteins would allow LC-MS/MS analysis of those proteins and thereby facilitate empirical selection of high-performing proteotypic peptides and transitions for targeted MS/MS analyses in complex human specimens. Without the ability to generate these empirical data, one must mine large proteomic databases (e.g., PeptideAtlas, Global Proteome Machine Database, PRIDE) for peptides seen frequently or at relatively high intensity. Unfortunately, not all proteins of interest are represented in these databases, and the extent to which the best proteotypic peptides vary across manufacturers and instrument platforms remains to be determined. Recent attempts to predict proteotypic peptides computationally have been described, but the generality of this approach remains untested.
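When empirical data are unavailable, a common first step is to digest the candidate sequence in silico and rank the resulting peptides by how often they have been observed. The sketch below applies the classic trypsin rule (cleave after K or R, but not before P) with no missed cleavages. It then drops peptides outside a 7–25 residue window and those containing Met or Cys, which are frequently avoided because of variable oxidation and alkylation. Finally, it ranks the remaining peptides by observation count. The length window, exclusion set, toy sequence, and count dictionary are hypothetical stand-ins for data mined from a repository such as PeptideAtlas. A real pipeline would also check that each peptide is unique to its protein.

```python
import re


def tryptic_peptides(sequence, min_len=7, max_len=25):
    """Fully tryptic peptides: cleave C-terminal to K/R unless the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)  # zero-width split needs Python 3.7+
    return [p for p in pieces if min_len <= len(p) <= max_len]


def rank_proteotypic(sequence, observation_counts, avoid="MC"):
    """Rank candidate peptides by observation count, excluding residues in `avoid`."""
    candidates = [p for p in tryptic_peptides(sequence) if not set(p) & set(avoid)]
    return sorted(candidates, key=lambda p: observation_counts.get(p, 0), reverse=True)


# Toy sequence and hypothetical repository observation counts.
seq = "AAAAAAKPLLLLLLRGGGGGGGKDDDDDDDDREEMEEEEK"
counts = {"GGGGGGGK": 12, "DDDDDDDDR": 40}
print(rank_proteotypic(seq, counts))
```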
Technological improvements that increase the sensitivity of targeted LC-MS-based proteomic measurements of candidate proteins [70, 71] will also greatly improve our success rate by decreasing our false negative rate during level one credentialing. It is also conceivable that if the sensitivity of the instrument platforms can be improved ≥10⁴-fold, we may no longer need to rely on the generation of antibodies. This would tremendously decrease the cost and lead time for testing candidates and would allow the number of candidates tested in validation studies to increase ≥100-fold. In the meantime, high-throughput, affordable, highly reproducible depletion or fractionation technologies that further improve our sensitivity for low-abundance analytes will further improve the pipeline’s success.
On the biological front, the acute phase (aka host or inflammatory) response has been widely maligned in the biomarker world, because the highly abundant proteins whose levels change as part of this response are not altered in a disease-specific pattern. Owing to their low specificity, they are considered to be of little or no diagnostic value. However, little is known about this host response beyond a handful of proteins. An organized effort to characterize this response systematically on a more global level would aid the biomarker field, either by allowing us to eliminate these proteins from further consideration or by revealing that, when considered more comprehensively, they may actually have diagnostic value in some clinical settings.
As targeted proteomic platforms become more sensitive and high-quality reagents become available for all human proteins, it will someday become possible to build sensitive, targeted assays for the entire proteome. This will merge the discovery and verification stages of the biomarker pipeline and allow us to test truly comprehensively for protein biomarkers. Until then, we will depend on a hypothesis-driven approach to selecting biomarker candidates for testing. Genomic technologies provide excellent sources of biomarker candidates through gene expression profiling and DNA copy number measurements. Sequencing and tiling arrays provide the opportunity to discover disease-associated mutations, novel fusion proteins, or splice variants that may be highly specific biomarkers. Additionally, for many diseases, well-characterized disruptions in normal physiology or cell biology, such as angiogenesis in cancer, may provide a source of hypothesis-driven candidates.
Finally, the daunting, costly, complex, interdisciplinary effort required to move a candidate from discovery to validation leaves no feedback loop, because those doing the actual discovery are often unaware of the outcome of downstream follow-up. Proteomic core facilities are often paid (or collaborate) to generate biomarker discovery datasets for investigators studying a particular disease. Lists of identified protein candidates, with some estimate of their relative abundances in cases versus controls, are then passed back to the primary investigators for follow-up studies. These studies largely involve a painfully uninformed prioritization of candidates followed by costly and lengthy assay generation. Not surprisingly, the success rate of this approach is abysmal. These failures should not be misinterpreted as evidence that proteomics is not a worthwhile endeavor; rather, they are evidence that we need better integration. Currently, there is a disconnect between the discovery, verification, and clinical validation stages, making it impossible for paradigms that iteratively improve performance to emerge. The recent application of LC-MRM-MS/MS methods to candidate verification provides a new and exciting opportunity to keep proteomic centers engaged in the biomarker pipeline beyond basic discovery. This engagement will give them a valuable feedback loop about the issues unique to biomarker discovery proteomics, as compared with the more familiar protein-cataloging proteomics, and thereby facilitate iterative changes that improve overall success rates.
The authors are thankful for the generous support of the National Cancer Institute’s Clinical Proteomic Technologies for Cancer, the Entertainment Industry Foundation, and the Paul G. Allen Family Foundation.
The authors have declared no conflict of interest.