High-throughput genomic and proteomic studies have generated near-comprehensive catalogs of biological constituents within many model systems. Nevertheless, static catalogs are often insufficient to fully describe the dynamic processes that drive biology. Quantitative proteomic techniques address this need by providing insight into closely related biological states such as the stages of a therapeutic response or cellular differentiation. The maturation of quantitative proteomics in recent years has brought about a variety of technologies, each with its own strengths and weaknesses. It can be difficult for those unfamiliar with this evolving landscape to match the experiment at hand with the best tool for the job. Here, we outline quantitative methods for proteomic mass spectrometry and discuss their benefits and weaknesses from the perspective of the biologist aiming to generate meaningful data and address mechanistic questions.
Throughout the pioneering days of genomic and proteomic research, much effort was put into constructing comprehensive catalogs of biological data, exemplified by the sequencing of the human genome (1, 2) and proteome (3, 4). Although providing a necessary foundation, these catalogs remain insufficient for describing the complex biological mechanisms at work within cells. At its heart, biology is the study of dynamic processes in living organisms, and proteins are the operators that exert direct control over processes at the cellular level. Today, we seek to build upon these genomic and proteomic foundations to understand closely related biological states, including the stages of a therapeutic response, cellular differentiation, and cancer progression. Such studies promise to reveal the core of mechanistic cell biology by elucidating the relationships between proteins and their roles in cellular processes.
In the past half-century, key insights into many cellular processes have been revealed by demonstrating differential abundance of individual proteins across a small number of conditions. But as with a single photograph, information about a single protein in isolation provides only a narrow portal for viewing the dynamics of the cellular landscape. We now appreciate that the coordination of many proteins and the responses of multiprotein networks to the cellular environment define this landscape. To understand cellular mechanisms, experimental data showing the dynamic nature of the proteome under physiologic and experimentally manipulated conditions are required. In practice, this involves populating a multidimensional matrix of data “photographs” as either a function of time or of comparisons across many closely related states.
Over the last 50 years, our understanding of the levels, localization, interactions, and activation states for proteins has exploded alongside the emergence of increasingly complex biochemical, biophysical, and molecular tools. Biochemical assays using radioisotopes (5, 6) and spectrophotometric readouts (7) established a quantitative framework for understanding proteins and their functions, either individually or in small groups. With the emergence of monoclonal antibodies, molecular and cell biologists have gained the ability to probe any protein or any protein feature against which a specific reagent could be generated (8). A parallel explosion in molecular biology made it possible to overexpress or knock down genes, append affinity tags (9), and attach fluorescent reporters (10) to proteins in both cultured cells and in vivo. Additional techniques such as two-dimensional gel electrophoresis (11) allowed for the separation of complex protein mixtures. The most recent revolution in protein chemistry has come from the field of proteomic mass spectrometry (12), opening the door to the direct characterization of proteins and post-translational modifications (PTMs)1 with site-specific resolution.
Mass spectrometry (MS) provides access to the proteome through three main avenues: identifying the proteins present, assessing their post-translational modification states, and quantifying the relative abundance of each protein-modification state combination (13–17). Although characterization of protein-modification state combinations would ideally be done on intact proteins (18) to reveal the full repertoire of “proteoforms” (19), the majority of “proteomic” analyses are actually performed on peptides generated by proteolytic digestion of protein samples (12). With a focus on speed, sensitivity, and dynamic range in sequencing peptides from complex mixtures, improvements in mass spectrometry instrumentation have brought about dramatic improvements in cataloging the protein constituents of a sample; recent reports have shown the ability to identify the entire yeast proteome in an hour (20) and provided a draft of the human proteome (3, 4). Likewise, sophisticated metal- and immunoaffinity-based enrichment methods (21, 22) have proven useful in elucidating the repertoire of proteins carrying PTMs (23). Focused examples involving phosphorylation (17, 24), ubiquitination (25, 26), acetylation (27), and glycosylation (28, 29) profiling have been thoroughly reviewed elsewhere.
Obtaining abundance measurements for proteins and their various modification states across conditions produces a detailed picture of protein activity and is now possible via a myriad of technologies. In the simplest form of quantitative mass spectrometry, label-free analysis determines the signal intensities or peak areas associated with individual peptides (30). Binary and ternary comparisons provide additional accuracy over label-free techniques and have become routine using methods such as stable isotope labeling by amino acids in cell culture (SILAC) (31) and reductive methylation (32), as have comparisons of 4–10 conditions using isobaric tags for relative and absolute quantitation (iTRAQ) (33) and tandem mass tag (TMT) chemical-tagging reagents (34). In addition, isotopically labeled synthetic peptides and recombinant proteins have proven valuable for more focused experiments. Further detailed information about these approaches can be found in a number of previous reviews (35–39). Finally, although a full description of such methods falls outside the scope of this review, significant advances have been made in data-independent analysis (40). These methods include both hypothesis-driven, high-throughput targeted analyses by selected and multiple reaction monitoring (37, 41–44) and further advances to the approach through parallel-reaction monitoring of multiple peptides at the same time (45, 46). Such methods can be leveraged to generate proteome maps through targeted post hoc data extraction (47), providing a complement to conventional data-dependent analyses. Proteomics technologies have reached the stage where studies comparing many samples are now feasible, facilitating time course experiments, multiple condition comparisons, and facile introduction of biological replicates.
Each quantitative MS technique has its own strengths and weaknesses, and the pairing of an experiment with the right quantitative method is essential to maximize the utility of the results. A biologist should consider the following questions. What are the key points in designing a successful proteomics analysis? How does one choose among the available proteomics technologies? How and when is multiplexing valuable? Here, we provide background on quantitative proteomic techniques useful for the parallel analysis of multiple samples, and we discuss their benefits and limitations in addressing biological questions.
Many discovery proteomics experiments today utilize data-dependent tandem mass spectrometry. In data-dependent analysis, the instrument is programmed to first generate an MS1 spectrum that surveys the masses and signal intensities of intact peptide ions. As this information is frequently insufficient to conclusively match spectra to peptides, the instrument performs a series of secondary MS scans (termed MS/MS or MS2), where individual peptide ions are isolated and fragmented along their amide backbones. Differences in mass between these peptide fragment ions are used to decipher peptide sequence information (48).
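The core of the data-dependent loop described above can be caricatured in a few lines. This is a deliberately minimal sketch of "top-N" precursor selection, not real instrument control software, which also applies dynamic exclusion, charge-state filtering, and intensity thresholds; the peak values are invented for illustration.

```python
# Toy sketch of one data-dependent acquisition cycle: survey the MS1
# precursors, then queue the N most intense ions for MS2 fragmentation.
def dda_cycle(ms1_peaks, top_n=3):
    """ms1_peaks: list of (mz, intensity) tuples from an MS1 survey scan.
    Returns the precursor m/z values selected for MS2, most intense first."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]

# Hypothetical survey scan with four precursor ions:
survey = [(445.12, 8e5), (512.30, 2e6), (633.87, 5e5), (702.41, 1.2e6)]
print(dda_cycle(survey))  # -> [512.3, 702.41, 445.12]
```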
Although the intensities of ions observed in an MS1 scan are generally proportional to peptide abundance in the sample, absolute signal intensities can vary depending on a number of factors. Run-to-run differences in sample complexity, chromatography, data-dependent sampling, and peptide ionization efficiency have historically limited the reliability of comparing peptide ion signals across runs. Even with biological replicates in hand, the prevalence of missing values in proteomic data has posed a challenge for downstream statistical analysis.
A breakthrough for quantitative proteomics came in the application of stable isotope dilution approaches to the quantitation of proteins and digested peptides. Stable isotopes such as 13C, 15N, 18O, and 2H (deuterium) can be introduced to proteomic samples in a variety of ways (Fig. 1). An early approach, isotope-coded affinity tags (ICAT), employs a biotin affinity tag coupled to a stable isotope-labeled (i.e. 8 deuterium; 2H8) linker and a thiol-reactive group (49). These tags allow for quantitation of cysteine-containing peptides via MS1 survey scans where intensities of peptide ions labeled with the “light” and “heavy” ICAT reagents are directly compared. This approach provides relative ratio measurements of peptides between samples as a proxy for overall proteoform abundance. Although effective, this reliance on cysteine-containing peptides means that the majority of peptides, which lack cysteine, go unmeasured. Thus, alternative techniques such as dimethyl labeling, quantitative carbamylation, and the incorporation of other stable isotopes (18O and 15N) were subsequently developed (50–54).
An inherent challenge with chemical labeling approaches is that they only account for differences in sample preparation that occur after the labeling step. Seeking to minimize the variability imparted through sample handling, metabolic labeling was developed for proteomics (55). A popular application of metabolic labeling is SILAC (31), where one or more naturally occurring amino acids are replaced with synthetic counterparts, enriched in the stable isotopes 13C and 15N. In common practice, fully labeled [13C615N2]lysine and [13C615N4]arginine are used in combination so that all peptides arising from trypsin digestion (except for those at the C terminus of the protein) can be systematically quantified. Incorporating 13C and 15N labels alleviates the chromatographic retention time shifts sometimes observed with 2H labeling. Amino acid reagents such as [13C6]lysine, [13C6]arginine, and [15N4]arginine (with fewer stable isotopes incorporated) are also useful and can be combined with fully labeled reagents to extend multiplexing beyond simple paired comparisons. In fact, proof-of-concept studies examining adipocyte differentiation and tyrosine phosphorylation dynamics have demonstrated 5-plex quantitation using these and additional forms of arginine (i.e. 13C615N42H7) (56) or combinations of labeled lysine, arginine, and tyrosine (57), respectively. Nevertheless, a limiting factor of multiplexing capacity in traditional SILAC has been the overlap in the mass dimension between different isotopically labeled forms, even in high resolution MS1 spectra. Moreover, splitting the signal across labeled forms of the same peptide increases the complexity of signals seen by the mass spectrometer, thereby impacting sensitivity and peptide identification rate.
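The mass shifts conferred by these SILAC reagents follow directly from the per-atom isotope mass differences. As a back-of-the-envelope sketch (the isotope masses below are standard reference values, not drawn from this review):

```python
# Mass added per isotopic substitution, from standard monoisotopic masses.
D_13C = 13.0033548 - 12.0000000   # 12C -> 13C
D_15N = 15.0001089 - 14.0030740   # 14N -> 15N

def label_shift(n_13c, n_15n):
    """Mass shift (Da) for a residue carrying n_13c 13C and n_15n 15N atoms."""
    return n_13c * D_13C + n_15n * D_15N

heavy_lys = label_shift(6, 2)   # [13C615N2]lysine, the "K+8" reagent
heavy_arg = label_shift(6, 4)   # [13C615N4]arginine, the "R+10" reagent
print(f"K+8 shift:  {heavy_lys:.4f} Da")   # ~8.0142 Da
print(f"R+10 shift: {heavy_arg:.4f} Da")   # ~10.0083 Da
```

The same arithmetic explains why partially labeled reagents such as [13C6]lysine (+6.0201 Da) occupy distinct positions in the mass dimension and can be layered onto fully labeled channels for higher-plex designs.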
One solution for multiplexing is to use standard metabolic labeling to “unroll” a large sample set into a series of binary comparisons. Instead of labeling multiple samples, this approach involves the preparation of an isotopically labeled reference standard that can be added to each experimental sample. Comparing each sample against a single common reference mixture makes it possible to determine relative peptide abundance as a ratio of ratios. An advantage of this over label-free analysis is that the presence of reference features in each biological sample empowers longitudinal studies by permitting MS1-based quantification even when the corresponding features are absent (58–60). This approach is equally applicable in the context of both data-independent analysis and targeted experiments, where individual samples are compared with a common reference sample.
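The ratio-of-ratios calculation itself is simple arithmetic, sketched below with made-up intensity values; the point is that the common reference cancels out of the between-sample comparison.

```python
# Sketch of ratio-of-ratios quantitation: each sample is mixed with the same
# heavy-labeled reference standard, so any two samples can be compared
# indirectly through that shared reference.
def ratio_of_ratios(sample_a, sample_b, ref_in_a, ref_in_b):
    """Relative abundance of a peptide in sample A vs. sample B, given the
    light (sample) and heavy (reference) intensities measured in each run."""
    return (sample_a / ref_in_a) / (sample_b / ref_in_b)

# Peptide measured at 2x the reference in run A and 0.5x in run B:
print(ratio_of_ratios(2.0e6, 5.0e5, 1.0e6, 1.0e6))  # -> 4.0
```

Because each final value depends on two measured ratios, noise in either the sample or the reference channel propagates into the result, which is the caveat raised below.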
A variety of approaches for the construction of a common reference sample has been reported, ranging from focused studies using isotopically labeled synthetic peptides (i.e. AQUA) (61) or recombinant proteins (62) to more global studies employing labeled cell lysates (63) or mammalian tissues (64). In the case of cell lines, computational methods such as principal component analysis have been combined with initial screening experiments (either by proteomic or microarray profiling) to select an optimally comprehensive mix of cell lines (65, 66). By carefully combining cell lines that span the breadth and diversity of the proteome into a mixed protein reference standard, “Super-SILAC” (58) and similar approaches have been useful for comparative analysis of human tissues (67), primary neurons (68), cancer cells (69, 70), and model organisms such as nematodes (71), flies (72), zebrafish (73), and mice (74, 75).
Despite their utility, multiplexed approaches involving binary comparisons against a labeled reference sample have certain drawbacks. Considerable instrument acquisition time and bioinformatic analysis are necessary to analyze each biological sample and to select and optimize the common reference mixture. Undersampling of the proteome also remains a concern, as analysis time is spent examining redundant precursor ion species from the reference mix in each run (76). Furthermore, abundance determinations based upon the ratio-of-ratios approach are affected by the abundance of the feature in both the sample and the common reference. Nevertheless, the reliability and quantitative accuracy of these methods make them attractive for a variety of applications.
The metabolic and chemical labeling methods discussed thus far operate on the principle of incorporating stable isotopes into one or more samples to create peptide features that differ in their overall precursor ion mass. In contrast, isobaric tagging utilizes a carefully constructed molecular tag where each of the differentially labeled forms confers the same overall mass addition but yields a unique reporter ion upon fragmentation. Such isobaric tagging reagents consist of three parts: a reactive group, a reporter region, and a balance region. The reactive group is used to couple the isobaric reagent onto the peptide, although the complete reagent remains attached through MS1 analysis. Upon fragmentation, the isobaric reagent is cleaved between the balance and reporter regions to liberate the low m/z reporter. By varying the distribution of stable isotopes between the reporter and balance regions, a series of reporter ions with varying masses can be generated. As reviewed by Rauniyar and Yates (77), iTRAQ (33) and TMT (34) have each made significant contributions in multiplexed proteomic analysis.
iTRAQ was first reported in 2004, where its use as a 4-plex reagent allowed for the simultaneous quantification of proteomic changes in yeast resulting from nonsense-mediated mRNA decay (33). iTRAQ reagents use an amine-reactive group to derivatize both the peptide N terminus and the ε-amine of lysines. Upon tandem MS analysis, iTRAQ reagents release singly charged reporter ions with masses between 114 and 117 m/z. Subsequent optimization extended iTRAQ to 8-plex, utilizing masses between 113 and 121 Da (avoiding potential interference from naturally occurring phenylalanine immonium ions at 120 Da) (78). The TMT reagent has a different structure than that of iTRAQ (79) but achieves largely the same goal, generating a 6-plex reagent with reporter ions of 126 to 131 m/z (34). TMT was later extended to 10-plex by exploiting the mass deficit between 15N and 13C (80, 81). Despite a mass difference of only 6.32 mDa, pairs of reporter ions ranging from 127 to 130 m/z are readily distinguishable from one another with standard high resolution instrumentation.
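The 6.32 mDa figure can be recovered directly from standard isotope masses: swapping a 13C-for-12C substitution for a 15N-for-14N substitution changes the reporter mass by the difference between the two per-atom increments. A sketch (reference isotope masses, not values from this review):

```python
# The "mass deficit" exploited by 10-plex TMT: a 12C -> 13C substitution adds
# slightly more mass than a 14N -> 15N substitution.
D_13C = 13.0033548 - 12.0000000   # mass added per 13C substitution
D_15N = 15.0001089 - 14.0030740   # mass added per 15N substitution

deficit = D_13C - D_15N
print(f"{deficit * 1000:.2f} mDa")  # -> 6.32 mDa
```

Two reporter ions of the same nominal mass, one built with an extra 13C and the other with an extra 15N, therefore differ by this small but resolvable amount.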
Isobaric tagging approaches mitigate several issues presented by MS1-based quantitation. With isobaric tags, the precursor ion signal for all samples accumulates in the same mass window at the MS1 level, aiding sensitivity. Multiplexed analysis within a single tandem mass spectrum allows for direct comparisons that minimize run-to-run variability, maximize analysis time, and help to eliminate missing values (82, 83). Early isobaric labeling experiments were carried out on time-of-flight (TOF)-based mass spectrometers, as the mechanics of ion-trap instrumentation and collision-induced dissociation rendered the low m/z reporter ions unmeasurable (84). The advent of higher energy collision dissociation within the LTQ-Orbitrap sidestepped this low mass cutoff and opened the door to the examination of isobaric tag reporter ions on Orbitrap instruments (85). At the same time, early studies showed TMT data to be affected by ratio compression, a phenomenon in which observed ratios underestimate expected ratios for defined mixtures (86–90). This phenomenon stems from the presence of co-eluting peptides within the precursor ion isolation window used for MS/MS fragmentation. To rectify this issue, strategies involving gas-phase fragmentation (89) and MS3 analysis involving a second round of selection and fragmentation of one or more MS2 fragment ions (90, 91) have been introduced.
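A toy model makes the mechanics of ratio compression concrete: co-isolated background peptides, typically unchanged (1:1) across channels, add signal to every reporter and pull the observed ratio toward unity. The numbers below are illustrative only.

```python
# Toy model of ratio compression in isobaric tagging: interference from
# co-isolated 1:1 background contributes equally to both reporter channels.
def observed_ratio(true_a, true_b, interference):
    """Reporter-ion ratio when `interference` counts of unchanged background
    co-fragment with the target peptide in both channels."""
    return (true_a + interference) / (true_b + interference)

print(observed_ratio(10.0, 1.0, 0.0))  # clean isolation      -> 10.0
print(observed_ratio(10.0, 1.0, 1.0))  # with 1:1 background  -> 5.5
```

Gas-phase purification and MS3-based methods work, in effect, by shrinking the `interference` term before the reporter ions are read out.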
Technological advances promise to extend multiplexing beyond its current limits. For example, the combination of isobaric and metabolic labeling has been proposed to simultaneously measure up to 18 samples within a single instrument analysis. For an 18-sample comparison, Dephoure and Gygi (92) used a combination of triplex SILAC labeling along with 6-plex TMT tagging to study the effects of rapamycin in yeast. Pairing this approach with 10-plex TMT could conceivably empower 30-plex analysis, assuming that dedicated methods could be established to focus the mass spectrometer so that triplet features were consistently and equivalently sampled. Similar levels of multiplexing are proposed for the next-generation isobaric tagging reagents, including combinatorial isobaric mass tags (93).
Isobarically tagged protein-profiling methods have been employed successfully in a number of settings involving baseline proteomic profiling of ovarian cancer cell lines (94) and pancreatic organoid models (95), as well as in specialized applications such as validating protein-protein interaction networks (96) and characterizing site-specific acetylation (97). Grimsrud et al. (98) used TMT to characterize the in vivo mitochondrial phosphoproteome of over 50 mice across eight different conditions (strain, age, and obesity status) in a type 2 diabetes model. One particularly appealing application of multiplexed quantitation consists of matching small molecules with their cellular targets by thermal stability profiling. There, TMT-based quantitation has been used to generate pairs of melting curves for as many as 7000 proteins in the presence and absence of drug compounds (99).
Building on the concept of the neutron mass deficit between 15N and 13C (80, 81), Coon and co-workers (100) developed an MS1-centric multiplexing technique that takes advantage of isotopically labeled amino acids that differ in mass by as little as 6.32 mDa. The elegance of NeuCode stems from this compact representation of labeled samples within a narrow mass range. These isotopologues are spaced closely enough together that they appear as a single peak in MS1 scans at 30,000 resolution and are co-isolated for MS/MS fragmentation. Despite appearing isobaric in low resolution MS1 survey scans and ion trap MS2 scans, these isotopologues can be resolved in MS1 survey scans performed at ultra-high resolution (~500,000 at 400 m/z) to reveal quantitative data. The reduction in apparent sample complexity that comes from co-isolation and co-fragmentation in the MS2 offers the potential to mitigate the sensitivity limitations of SILAC, which stem from the distribution of ion signals into multiple precursor ion populations (101). Although this approach combines the most desirable qualities from isobaric multiplexing with the quantitative readout in an MS1 survey scan, to successfully implement this technique, instruments capable of achieving ultra-high resolution on a chromatographic time scale are necessary. This need for high resolving power also requires longer scan times than traditional SILAC, losing some of the time that was gained from simplifying the precursor ion population. Additional studies are also required to understand the phenomenon of peak coalescence, where closely spaced isotopologue signals from abundant or high m/z features can merge together. As with the ratio compression observed with isobaric tags, improvements to instrument performance and data analysis algorithms promise to mitigate these limitations as the technique matures.
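A rough lower bound on the resolving power needed to separate NeuCode isotopologues follows from the definition R = m/Δm. This is only a floor: baseline separation of real peptide peaks demands several-fold more, consistent with the ~500,000 resolution cited above. A sketch under those assumptions:

```python
# Lower-bound estimate of the resolving power needed to distinguish two
# peaks separated by delta_mda millidaltons at a given m/z (R = m / delta_m).
def min_resolving_power(mz, delta_mda):
    """Minimum resolving power to separate peaks delta_mda mDa apart at mz."""
    return mz / (delta_mda / 1000.0)

for mz in (400, 800, 1200):
    print(mz, round(min_resolving_power(mz, 6.32)))
```

The bound grows linearly with m/z, which is one reason closely spaced isotopologues become harder to resolve (and more prone to coalescence) for larger, higher m/z peptide features.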
Advances in instrument design continue to produce mass spectrometers that are faster, more accurate, and more sensitive. Just as these improvements have fueled the application of stable isotope-based quantitation methods, they have also improved the depth and reproducibility of peptide measurements without the use of isotopic labels. Through careful experimental design, sample replication, and improved data analysis procedures, many label-free experiments have proven capable of generating results on par with labeling methods (102). As label-free experiments require less complicated sample preparation and translate well to both in vivo models (103) and clinical specimens (104), they are often chosen for studying complex biological systems. The recent report from Humphrey et al. (103) describes the type of large, parallel study that is now possible, investigating in vivo phosphorylation dynamics upon insulin stimulation. In a brute-force approach, label-free analysis was carried out across at least eight biological replicates at five different time points, where each sample was interrogated using 2–4 h of instrument time. Similarly, global proteomic profiling of human colon and rectal tumors by spectral counting has revealed new tumor subtypes and candidate driver genes (104).
Label-free approaches provide an appealing alternative to metabolic labeling approaches while still retaining the definition of the MS1 peptide ion signal as an area-under-curve measurement. However, in label-free methods, samples are compared across different instrument analyses without employing a stable isotope-based reference. Although comparisons across multiple analyses can introduce variability at the individual observation level, repeated measurements across biological replicates and data aggregation at the peptide or protein level can empower modeling approaches using established statistical methods. A variety of methods has been proposed to normalize quantitative data from area-based label-free experiments, including empirical normalization (105, 106), regression-based approaches (107), correlation-based approaches combined with regression (108), and statistical inference (60), among others (30, 109). In 2012, Clough et al. (110) reported a generalized statistical model for label-free analysis while characterizing hypoxia-induced proteomic changes in breast cancer cell lines. This approach was released to the community as the MSstats package for the R statistical programming language (111). We and others have applied similar statistical approaches to the study of cysteine oxidation (112), phosphorylation dynamics (59), inducible ubiquitination (113–116), and cell surface proteins (117). When sufficient data can be generated to overcome the inherent variability of individual peptide measurements, label-free analyses provide an attractive option for quantification while avoiding laborious or technically challenging metabolic labeling techniques.
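One widely used form of empirical normalization for area-based label-free data is to equalize the median log-intensity across runs, so that run-level loading differences do not masquerade as biology. The sketch below is a generic illustration of this idea, not the implementation used by MSstats or any specific package; the intensity values are invented.

```python
# Minimal sketch of median normalization for label-free intensity data.
import math
from statistics import median

def median_normalize(runs):
    """runs: dict mapping run name -> list of peptide intensities.
    Returns log2 intensities shifted so every run shares the same median."""
    logged = {r: [math.log2(v) for v in vals] for r, vals in runs.items()}
    grand = median(v for vals in logged.values() for v in vals)
    return {r: [v - median(vals) + grand for v in vals]
            for r, vals in logged.items()}

# Hypothetical example: run2 had twice as much material loaded as run1.
runs = {"run1": [100, 200, 400], "run2": [200, 400, 800]}
norm = median_normalize(runs)
same = all(abs(a - b) < 1e-9 for a, b in zip(norm["run1"], norm["run2"]))
print(same)  # -> True: the 2x loading offset is removed
```

After normalization, remaining differences between runs can be attributed to biology (plus residual noise), which is what downstream statistical models such as those in MSstats then partition across replicates.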
From a biologist's perspective, choosing between the available methods for quantitative proteomic analysis can be a daunting task, as one seeks to balance optimal experimental design against the practicalities of time, cost, and available technology. Throughout this process, the importance of an active dialogue between the experimental biologist, the analytical mass spectrometrist, and the computational proteomics specialist is paramount. An early, important consideration is whether the proposed experiments aim to stand alone in testing a biological hypothesis or instead represent a screening approach to generate lead candidates for orthogonal validation. Screening-type experiments provide more freedom in experimental design and have proven to be enormously valuable in revealing the key players in cellular signaling, but these results come at the expense of variability and incomplete information. Common screening approaches involve the profiling of post-translationally modified peptides, whereby differentially abundant phospho-, acetyl-, or ubiquitinated peptides can quickly reveal when and where PTMs are being added to or removed from proteins of interest. It is important to determine early on whether abundance changes measured at the aggregate protein level are sufficient or whether information at the peptide or site level is necessary for follow-up study. As compared with global proteome analyses, where many peptide observations are aggregated to estimate protein abundance, PTM site-level quantitation is often supported by a limited number of measurements for a single feature per sample. Moreover, in the absence of data showing the baseline protein levels, numerous mechanisms involving altered gene expression, protein synthesis, degradation, or alternative PTMs can confound apparent abundance changes in the PTM of interest. 
Our experience in studying ubiquitination suggests that label-free analyses are powerful at revealing protein level effects even in small sample sizes but that stable isotope-based approaches or large sample sizes are required for quantification of site-specific features.
In practice, decisions about which quantitative proteomics technology to use are often bounded by the nature of the model system being studied. Issues encountered with primary or non-proliferating cell culture models can limit the amount of protein available or the feasibility of metabolic labeling, forcing consideration of alternative approaches. Despite the potential of these technologies, it is not possible to visualize a dynamic protein signal before it has emerged or after it has been resolved. Careful experimental protocols are required to ensure synchronization of the biological response between individuals and the timely collection of samples to maintain transient proteoforms in vivo. Given the importance of the time dimension, a pre-emptive definition of the temporal dynamics of known markers or phenotypes within the model system can dramatically improve the outcome of MS proteomics investigations. Similar considerations also apply to the realm of peptide abundance and instrument sensitivity. Discovery experiments must be performed on a scale large enough to ensure that low level analytes can be detected after enrichment and/or fractionation procedures. Scaling-up can insidiously affect signaling dynamics within cells and tissues. It can also result in unforeseen variability in the effectiveness of sample preparation protocols; the additional time and handling steps required to harvest experimental protein samples en masse can have consequences on protein, PTM, and inhibitor stability in solution (118), in addition to downstream steps such as digestion, enrichment, and desalting.
Beyond sample-processing considerations, a constant challenge in experimental design is determining how much instrument time should be devoted to an analysis. As undersampling in proteomics remains a key limitation, extending the instrument time devoted to an experiment can produce increased data density and yield improvements in signal-to-noise, especially when coupled with fractionation. At the same time, practical considerations such as the cost of MS instrumentation require that resources be allocated thoughtfully. By collecting quantitative information for each “isotopically bar-coded” feature within individual scan events, multiplexing methods such as isobaric tagging maximize the amount of useful data generated per unit of time. However, it has been shown that the quality of quantitative data from isobarically tagged samples scales with ion-injection time and the purity of the ion population (89, 90). Although recent reports are encouraging (119), additional careful studies are required to fully benchmark these technologies against the more traditional approaches for baseline protein profiling. Additionally, reagent cost, larger input sample sizes, and the nature of enrichment protocols can make isobaric labeling less attractive for multiplexed PTM analysis. For example, immunoaffinity enrichment of K-GG peptides in ubiquitin-substrate profiling requires the free N terminus of the PTM remnant, which is blocked by isobaric labeling. The cost of the isotopically labeled amino acid reagent is a consideration in metabolic labeling approaches as well. It is important to consider these costs not only for pilot experiments but also for full-scale studies involving replicates and controls.
Irrespective of the method chosen, multiplexed quantitative proteomics provides a powerful tool that complements genetic approaches to allow for direct insight into the molecular mechanisms underlying biological processes. In recent years, a steady stream of studies illuminating specific areas of biology has marked the successful transition of these approaches from proof-of-concept applications to essential components of a molecular biologist's toolkit. Although the approaches will undoubtedly improve in sensitivity, precision, and ease of use, the current landscape of techniques allows one to gain rich knowledge about the molecular basis of biology and disease.
Author contributions: C. E. B. and D. S. K. wrote the paper.
The authors declare that they have no conflicts of interest with the contents of this article.
1 The abbreviations used are: PTM, post-translational modification; MS, mass spectrometry; SILAC, stable isotope labeling by amino acids in cell culture; iTRAQ, isobaric tags for relative and absolute quantitation; TMT, tandem mass tag; ICAT, isotope-coded affinity tag; TOF, time-of-flight.