Emerging proteomics profiling technologies hold enormous promise for identifying new biomarkers. However, successful applications to human disease are still lacking. This is due in large part to the lack of a coherent, demonstrably successful pipeline for systematically building credentialing evidence around the candidate biomarker proteins that emerge from discovery proteomics experiments. Here we developed an MS-intensive pipeline that integrates high-performance LC-MS/MS, AIMS and SID-MRM-MS for biomarker candidate discovery, analytical qualification and quantitative verification, respectively. We applied the pipeline to cardiovascular disease and obtained novel cardiovascular biomarkers that merit further evaluation in large, heterogeneous patient cohorts. Each stage (discovery, qualification and verification) systematically informed the next, and the analyses at each stage exploited key attributes of the MS-based technology platform used there. An essential feature of this pipeline is the transition from analysis of a proximal fluid (or tissue) for candidate discovery to analysis of peripheral blood for qualification and verification of candidates.
We applied our pipeline, beginning with discovery proteomics, to a unique clinical model of MI that allowed precise kinetic analysis in patients who serve as their own biological controls. Coronary sinus catheterization provided the opportunity to sample directly from the organ of interest. This enabled the use of a proximal fluid of the heart for discovery of candidate biomarker proteins, rather than peripheral plasma, in which proteins released from the myocardium are more diluted. The consistent temporal changes in candidate biomarker levels within individual patients and across patients underscore the biological plausibility of the observed association between proteomic changes and MI. This study emphasizes that small numbers of samples may suffice for discovery if the effect size is large and if the initial findings are followed up with methods (specifically AIMS and SID-MRM-MS) that allow large numbers of candidates to be further credentialed or discarded by analysis of additional patient samples. In the current study, we began with samples from three time points in each of three patients undergoing PMI, and required a change of at least fivefold in protein abundance before designating a protein as a candidate. This experimental design enhanced our power to identify statistically meaningful changes. Importantly, the MS tools, data acquisition and analysis methods, and statistical tests used are not specific to this human model; they are broadly applicable to any perturbational experiment, including the more common biomarker discovery paradigm in which cases and controls come from different patients.
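The within-patient design described above can be sketched as a simple screen. This is a minimal illustration under stated assumptions, not the authors' statistical procedure: each protein's abundance at later time points is compared with the same patient's baseline, and a protein is kept only if the change reaches fivefold. The data layout, the helpers `max_fold_change` and `select_candidates`, the floor used for undetected values, and the default rule that every patient must show the change are all hypothetical.

```python
# Toy within-patient fold-change screen (hypothetical data and decision rule).
# Each patient serves as their own control: later time points are compared with
# that patient's own baseline rather than with a different individual.

FOLD_THRESHOLD = 5.0  # "at least fivefold", as stated in the text
FLOOR = 1e-6          # hypothetical floor so undetected (zero) values do not divide by zero


def max_fold_change(series):
    """Largest up- or down-fold change of any post-baseline point vs. baseline."""
    baseline = max(series[0], FLOOR)
    folds = []
    for value in series[1:]:
        value = max(value, FLOOR)
        folds.append(max(value / baseline, baseline / value))
    return max(folds)


def select_candidates(data, threshold=FOLD_THRESHOLD, min_patients=None):
    """Return proteins whose fold change meets `threshold` in >= `min_patients` patients.

    data maps patient -> {protein: [baseline, t1, t2, ...]} abundances.
    By default (an assumption, not a rule stated in the text) every patient must agree.
    """
    if min_patients is None:
        min_patients = len(data)
    shared = set.intersection(*(set(profiles) for profiles in data.values()))
    hits = []
    for protein in sorted(shared):
        n_passing = sum(
            max_fold_change(profiles[protein]) >= threshold for profiles in data.values()
        )
        if n_passing >= min_patients:
            hits.append(protein)
    return hits


# Three hypothetical patients, three time points each (arbitrary abundance units).
toy = {
    "patient1": {"PROT_A": [1.0, 8.0, 6.0], "PROT_B": [1.0, 1.5, 2.0]},
    "patient2": {"PROT_A": [2.0, 12.0, 10.0], "PROT_B": [1.0, 6.0, 1.0]},
    "patient3": {"PROT_A": [1.0, 5.0, 7.0], "PROT_B": [1.0, 1.2, 0.9]},
}
print(select_candidates(toy))                  # ['PROT_A']
print(select_candidates(toy, min_patients=1))  # ['PROT_A', 'PROT_B']
```

Requiring agreement across patients, rather than a large change in one patient, is one way a small "own control" design trades sample number for consistency; relaxing `min_patients` widens the candidate list.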
Using untargeted, data-dependent LC-MS/MS–based proteomics for discovery, we identified a total of 1,105 unique proteins (two or more peptides, FDR ≤ 1.5%) in plasma from the coronary sinus, or 999 proteins after excluding immunoglobulins and common contaminants such as keratins. The identified proteins spanned ~6 or 7 orders of magnitude in abundance (relative to the most abundant plasma proteins), based on detection of peptides from REG3, IGFBP4 and LCN2, all of which are known to be present at 1–130 ng/ml in the plasma of healthy people22. Consistent with prior studies23,24, our pipeline underscores the need for abundant-protein depletion combined with extensive peptide- or protein-level fractionation before LC-MS/MS to identify proteins present in the low-ng/ml range in plasma. In the present study, the nine discovery samples yielded >700 subfractions, requiring ~2,800 h of Orbitrap instrument time for LC-MS/MS analysis. The resulting list of proteins detected with high confidence in plasma also adds to existing high-quality studies of the human plasma proteome23,24.
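The identification criteria quoted above (two or more peptides, FDR ≤ 1.5%) are commonly implemented with a target-decoy database search. The sketch below assumes that approach and estimates FDR at the peptide-spectrum-match (PSM) level as decoys divided by targets above a score cutoff. The text does not say at which level, or with which software, the 1.5% FDR was applied, so the data layout, toy scores and function names here are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def score_cutoff(psms, max_fdr=0.015):
    """Lowest score at which the decoy-estimated FDR (decoys / targets) is <= max_fdr.

    psms: iterable of (score, is_decoy). PSMs with tied scores are accepted or
    rejected together. Returns None if no cutoff meets the target FDR.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, group in groupby(ranked, key=lambda psm: psm[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # keep moving deeper while the FDR estimate stays acceptable
    return cutoff


def confident_proteins(psms, max_fdr=0.015, min_peptides=2):
    """Target proteins with >= min_peptides distinct peptides passing the FDR cutoff.

    psms: list of (score, is_decoy, protein, peptide_sequence).
    """
    cutoff = score_cutoff([(s, d) for s, d, _, _ in psms], max_fdr)
    if cutoff is None:
        return set()
    peptides = defaultdict(set)
    for score, is_decoy, protein, peptide in psms:
        if not is_decoy and score >= cutoff:
            peptides[protein].add(peptide)
    return {protein for protein, peps in peptides.items() if len(peps) >= min_peptides}


# Toy search results (hypothetical scores and identifiers).
psms = [
    (50, False, "PROT_A", "pepA"),
    (45, False, "PROT_A", "pepB"),
    (40, False, "PROT_B", "pepC"),
    (35, False, "PROT_B", "pepD"),
    (30, False, "PROT_C", "pepE"),
    (20, True, "DECOY_1", "pepX"),
    (10, False, "PROT_D", "pepF"),
    (10, False, "PROT_D", "pepG"),
]
print(sorted(confident_proteins(psms)))  # ['PROT_A', 'PROT_B']
```

PROT_C passes the score cutoff but has only one peptide, and PROT_D has two peptides but both fall below the cutoff, so each illustrates one of the two criteria.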
Qualification by AIMS is an essential element of our pipeline for biomarker prioritization, providing a reliable and relatively high-throughput method to prioritize lengthy lists of biomarker candidates discovered in proximal fluids or tissues. AIMS, a targeted, label-free, relative quantification method, can be thought of as the MS equivalent of a highly multiplexed western blot. Using AIMS, we effectively configured 121 MS-based western blots in a single series of analyses without the need for antibody reagents. AIMS analyses in peripheral plasma, together with the temporal correlation analysis, qualified 52 of the 121 candidate proteins (43%) derived from discovery proteomics in the coronary sinus. This prioritization focused critical resources for quantitative assay development by MRM-MS on the qualified candidates with a high likelihood of being detected and quantified in peripheral patient plasma.
The failure of discovery proteomics to rediscover 17 of the 83 AIMS-qualified candidate biomarker proteins is important: it underscores that data-dependent acquisition (DDA) is less efficient than AIMS for candidate qualification. The high correlation (>0.85 for three of the four proteins verified by SID-MRM-MS) between the temporal profiles of protein abundance observed by AIMS and by the more quantitative SID-MRM-MS approach further demonstrates the utility of AIMS for prioritizing candidates for resource-intensive SID-MRM-MS assay development. Together, these results suggest that AIMS is an essential technology in a functioning biomarker discovery-through-verification pipeline, and that AIMS provides greater sensitivity for candidate biomarker qualification than data-dependent methods (Supplementary Results and Discussion). Although in principle it is possible to proceed directly from discovery proteomics to MRM-MS-based assay configuration, doing so is prohibitively costly and time consuming. Using AIMS to prioritize assay development yields considerable savings in time and cost (Supplementary Results and Discussion and Supplementary Methods).
The third step of our pipeline is verification6 using SID-MRM-MS, or ELISA for the minority of cases where antibodies are available (Supplementary Table 5). Antibodies suitable for constructing ELISAs were available for only four of the novel candidate biomarker proteins that emerged from discovery, and single-antibody reagents or commercial ELISAs were available for seven other candidate proteins. In principle, antibody-based measurements could be used at every step of the validation process. However, few immunoassay-grade antibodies of sufficient quality and number (two per candidate protein) are available. Moreover, because developing a new, clinically deployable immunoassay is both expensive and time consuming, such development is normally restricted to a short list of already highly credentialed candidates. The need for alternative methods to rapidly configure quantitative assays for credentialing novel candidate protein biomarkers is highlighted by a recent study of pancreatic cancer22. Over 600 proteins were quantified in plasma, of which 165 (~27%) changed in abundance with the development of pancreatic cancer. In the verification phase of that study, antibody reagents were available for only 11 of these proteins, including an antibody specific for CA-19-9, a marker of pancreatic cancer already in clinical use. Owing to the lack of antibody reagents, no follow-up studies were done for the remaining proteins of interest.
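The choice between ELISA and SID-MRM-MS described above reduces to a reagent check. A minimal sketch, assuming that an ELISA requires either a matched pair of immunoassay-grade antibodies (the "two per candidate protein" noted above) or a ready commercial kit, and that SID-MRM-MS is the default otherwise. The `Candidate` fields, the `verification_route` function and the toy panel are hypothetical; a real triage might also weigh antibody specificity and assay performance.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    immunoassay_grade_antibodies: int  # hypothetical count of validated antibodies
    commercial_elisa: bool = False


def verification_route(candidate: Candidate) -> str:
    """Choose a verification assay for one candidate.

    A sandwich ELISA needs a matched pair of immunoassay-grade antibodies (or a
    commercial kit); without them, SID-MRM-MS, which needs no antibodies, is used.
    """
    if candidate.commercial_elisa or candidate.immunoassay_grade_antibodies >= 2:
        return "ELISA"
    return "SID-MRM-MS"


panel = [
    Candidate("PROT_A", immunoassay_grade_antibodies=2),
    Candidate("PROT_B", immunoassay_grade_antibodies=1),
    Candidate("PROT_C", immunoassay_grade_antibodies=0, commercial_elisa=True),
    Candidate("PROT_D", immunoassay_grade_antibodies=0),
]
for c in panel:
    print(c.name, verification_route(c))
```

Under this rule a candidate with only one antibody still falls through to SID-MRM-MS, which is the situation the pancreatic cancer example illustrates at scale.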
As a proof of principle, we developed quantitative SID-MRM-MS assays for four of the novel, heart-specific proteins discovered, together with additional cardiovascular-related proteins already in clinical use or of growing interest10. Highly consistent temporal trends were observed when we measured two or three peptides for each of the novel candidate proteins in four patients. In addition, AIMS and SID-MRM-MS results for the novel candidates were highly correlated, further supporting our finding that AIMS is a useful initial method for label-free quantification. Levels of MYL3, TPM1 and FHL1 all remained sufficiently elevated 240 min after ablation to warrant further investigation in larger clinical cohorts. SID-MRM-MS quantification can be inaccurate owing to problems in MRM-MS data acquisition and analysis8,25, but these potential problems can be circumvented (Supplementary Results and Discussion).
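The quantification principle behind SID-MRM-MS can be sketched in a few lines: the endogenous ("light") peptide amount is the light/heavy peak-area ratio multiplied by the known amount of spiked, isotope-labeled ("heavy") standard. This minimal sketch assumes identical ionization and fragmentation of the light and heavy forms and summarizes the two or three peptides per protein by their median; the median, the function names, the units and the toy peak areas are hypothetical choices, not the study's processing pipeline.

```python
from statistics import median


def peptide_concentration(light_area, heavy_area, heavy_spike):
    """Stable isotope dilution for one signature peptide.

    Endogenous amount = (light peak area / heavy peak area) * spiked heavy amount.
    Assumes the light and heavy forms behave identically in LC-MS.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    return (light_area / heavy_area) * heavy_spike


def protein_concentration(peptide_measurements):
    """Summarize several signature peptides of one protein by their median.

    peptide_measurements: list of (light_area, heavy_area, heavy_spike) tuples.
    """
    return median(peptide_concentration(*m) for m in peptide_measurements)


# Three hypothetical peptides for one protein; heavy standard spiked at an
# illustrative 20 fmol/ul in each case.
peptides = [(5000.0, 10000.0, 20.0), (6000.0, 10000.0, 20.0), (4000.0, 10000.0, 20.0)]
print(protein_concentration(peptides))  # 10.0
```

Because each peptide is an independent readout of the same protein, a large spread among the per-peptide values is a simple flag for the kinds of acquisition or analysis problems mentioned above, such as an interfering transition.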
Our unbiased analysis also rediscovered many of the known cardiovascular biomarkers, including creatine kinase, MB, FABP and MPO, and extended prior work by identifying many new proteins not previously associated with acute myocardial injury in humans. Supplementary Results and Discussion details a number of candidate biomarkers for which published reports suggest a potential association with cardiovascular disease.
Our approach to enhanced biomarker discovery emphasized the in-depth analysis of a small, extensively phenotyped patient cohort. Promising proteins were then validated in additional, more heterogeneous cohorts. Some limitations are implicit in this approach. First, although serial sampling within patients constrains interindividual variability and improves signal-to-noise ratios, the small discovery population means that changes in proteins that failed to reach nominal significance in our study may still be scientifically important and warrant further investigation. Second, the marked cardiac perturbation that characterizes the PMI model may have influenced the type and magnitude of protein alterations and hence the ultimate clinical utility of our markers. Notably, however, the finding that several of the biomarkers appear elevated in subjects with spontaneous MI and reversible myocardial ischemia supports the clinical relevance of the model. Finally, although our proteomics markers had excellent discriminatory power in subjects with spontaneous ischemic disease and myocardial injury, these findings must be further evaluated in larger populations, improving estimates of predictive value, permitting comparison to and adjustment for traditional cardiovascular risk factors, and allowing evaluation within subgroups of interest including those defined by gender, race and comorbidities.
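Discriminatory power of the kind mentioned above is often summarized by the area under the ROC curve (AUC). The text does not specify which metric was used, so this is only an illustrative sketch: for a single marker, the empirical AUC is the fraction of case-control pairs in which the case has the higher level, with ties counted as one half. The function name and toy values are hypothetical.

```python
def auc(cases, controls):
    """Empirical ROC AUC for one marker (higher values assumed to indicate disease).

    Equals the Mann-Whitney U statistic divided by n_cases * n_controls: the
    fraction of (case, control) pairs in which the case has the higher level,
    with ties counted as one half.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(cases) * len(controls))


# Toy marker levels (arbitrary units), not study data.
print(auc([5, 7, 9], [1, 3, 7]))  # 0.833..., i.e. 7.5 of 9 pairs ranked correctly
print(auc([5, 7, 9], [1, 2, 3]))  # 1.0, perfect separation
```

The double loop is O(n·m), which is adequate for small cohorts; a rank-based computation scales better for the larger populations the paragraph calls for.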
In summary, we have developed a generalizable, proteomics-based, discovery-through-verification pipeline and demonstrated its value by identifying novel protein biomarkers of myocardial injury. We have demonstrated that this pipeline can successfully credential candidate biomarkers using MS-based targeted assays, and immunoassays when the appropriate reagents exist. We have developed and deployed assays for targets enriched in myocardium, and are applying our methods to interrogate the remaining candidates from our discovery proteomics studies. In addition to markers of infarction, our candidates include several proteins that may serve as markers of reversible myocardial ischemia, a condition for which no circulating biomarkers currently exist. Markers emerging from these studies can be integrated with established biomarkers to create multimarker risk scores, providing additional information to help guide cardiovascular disease management. We anticipate that our strategy could be used in many other clinically relevant scenarios in which planned perturbational experiments are performed to elicit pathological phenotypes. Such perturbations might include drug administration, an oral glucose challenge for diabetes26, exercise testing for cardiovascular disorders27 or dialysis for kidney disorders28.
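A multimarker risk score of the kind envisioned above is commonly built as a logistic model. The sketch below is illustrative only: the marker names, coefficients and intercept are hypothetical placeholders, not values from this study, and in practice such weights would be estimated in a large cohort with adjustment for traditional cardiovascular risk factors.

```python
import math

# Hypothetical model: names, weights and intercept are placeholders, not study values.
INTERCEPT = -4.0
WEIGHTS = {"established_marker": 1.2, "novel_marker_1": 0.8, "novel_marker_2": 0.5}


def risk_probability(levels, weights=WEIGHTS, intercept=INTERCEPT):
    """Logistic multimarker score.

    Each marker level (must be > 0) is log-transformed, weighted and summed with
    the intercept; the logistic function maps that sum to a probability in (0, 1).
    """
    z = intercept + sum(w * math.log(levels[name]) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


baseline = {"established_marker": 1.0, "novel_marker_1": 1.0, "novel_marker_2": 1.0}
elevated = dict(baseline, novel_marker_1=10.0)
print(round(risk_probability(baseline), 4))                     # 0.018
print(risk_probability(elevated) > risk_probability(baseline))  # True
```

Log-transforming the levels is a common choice for right-skewed biomarker concentrations; it makes each weight describe the effect of a fold change in that marker rather than an absolute increment.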