Replicate mass spectrometry (MS) measurements and the use of multiple analytical methods can greatly expand the comprehensiveness of shotgun proteomic profiling of biological samples1-3. Such MS data contains inherent biases and variations which create computational and statistical challenges for quantitative comparative analysis4. We have developed and extensively tested a normalized label-free quantitative method termed the normalized spectral index, SIN, which combines three MS abundance features: peptide and spectral count with fragment-ion (ms/ms) intensity. SIN largely eliminated variances between replicate MS measurements, permitting quantitative reproducibility and highly significant quantification of thousands of proteins detected in replicate MS measurements of the same and distinct samples. It accurately predicts protein abundance more often than current methods tested. Comparative immunoblotting and densitometry further validate this method. SIN enables comparative quantification of complex datasets from multiple shotgun proteomics measurements, which is potentially useful for systems biology and biomarker discovery.
Quantitative proteomics is widely used for examining differences in global protein expression between cellular states and in disease biomarker and target discovery3, 5-8. These methods are based on one MS feature of abundance such as spectral or peptide count or chromatographic peak area or height values using labelled or label-free approaches (see supplementary notes online for details). How to compare and quantify differential expression remains an important challenge for this field9, 10. To date, MS fragment ion intensities appear only to be used for candidate-based quantification, such as the quantification of small molecules relative to a labelled version of the analyte of interest11. A similar approach is single or multiple reaction monitoring (SRM, MRM) where transitions from selected precursor to specific fragment ions are monitored and compared to a spiked standard12, 13. Fragment ion intensities are also used in iTRAQ quantification, where the intensity of the reporter fragment ion is directly related to the abundance of the precursor from which it's derived14. To date, fragment ion approaches have not been applied in a label-free manner or used in large-scale shotgun proteomics analysis. Here, we explore their utility as an abundance feature.
We previously discovered that multiple MS measurements of a sample are required for large-scale shotgun proteomic platforms to achieve statistically significant comprehensiveness in protein identifications1 (supplementary notes). This is critical for biomarker discovery where proteins differentially expressed between normal and disease samples can only be meaningfully compared if samples are analyzed systematically and equivalently to completeness. This requires 4-8 MS measurements of each distinct sample1, 2, 15. Unfortunately, replicate data contains inherent biases and variations so that MS signals are frequently corrupted by systematic or even apparently random changes (supplementary notes).
We set out to develop and test methods to quantify, normalize and compare complex label-free proteomic data, and in parallel to normalize the underlying abundance features so as to control for measurement biases and variations. We sought MS features of abundance that are recorded in all datasets and can be easily extracted, and thus can be universally mined. These include spectral count (SC, number of ms/ms spectra per peptide) and unique peptide number (PN). We also include fragment ion (ms/ms) intensities as a new feature easily extracted from typical MS data and, to our knowledge, not incorporated previously into unlabelled, normalized quantification.
The spectral index (SI) is the cumulative fragment ion intensity for each significantly identified peptide (including all its spectra) giving rise to a protein and is defined as:
Therefore, this equation inherently incorporates fragment ion intensity values with SC and PN for each protein.
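Read from the verbal definition above, SI is a double sum: over the peptides identifying a protein and, for each peptide, over the fragment-ion intensities of all its spectra. The following is a minimal sketch of that bookkeeping, not the authors' in-house script; the input layout, function names and intensity values are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical input layout: peptide sequence -> list of spectra,
# each spectrum being the list of matched b/y fragment-ion intensities.
Spectra = Dict[str, List[List[float]]]


def spectral_index(peptides: Spectra) -> float:
    """Sum matched fragment-ion intensities over all spectra of all peptides."""
    return sum(
        sum(spectrum)
        for spectra in peptides.values()
        for spectrum in spectra
    )


def spectral_count(peptides: Spectra) -> int:
    """Total number of ms/ms spectra assigned to the protein's peptides."""
    return sum(len(spectra) for spectra in peptides.values())


def peptide_number(peptides: Spectra) -> int:
    """Number of unique peptides identifying the protein."""
    return len(peptides)


# Made-up protein with two peptides and three spectra in total.
example_protein: Spectra = {
    "PEPTIDEA": [[100.0, 250.0], [50.0, 75.0, 25.0]],
    "PEPTIDEB": [[400.0, 100.0]],
}
```

In this layout a protein with more peptides (PN) or more spectra (SC) contributes more terms to the sum, which is the sense in which SI incorporates all three features.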
To test the reproducibility of the raw MS abundance features, we graphed the mean diamonds and confidence circles (see online methods) of multiple MS measurements of the same liver endothelial plasma membrane sample, with the null hypothesis that all replicates are equal. The mean PN, SC, and SI across datasets were not sufficiently reproducible and showed significant differences (Figure 1a-c), easily visualized by the non-overlapping mean diamonds and confirmed by ANOVA (online methods). Thus, normalization is required to enable meaningful quantitative comparison within and between samples.
We began with a simplistic approach to normalize the MS datasets by using the “housekeeping” protein actin. The SI of each protein was divided by the SI of actin in each MS measurement to yield SIact (eq. 2),
This normalization approach was applied to the 5,923 proteins identified in common across all the liver replicate MS measurements. A significant difference was still detected between the replicates (Fig. 1d). The results were similar when we replaced SI with SC or PN, or when we tested different standards or tissue samples (data not shown).
Next we utilized mean SI values to normalize the data. The Mean Protein Intensity (MPI) was calculated by dividing the total SI for all identified proteins by the total number of proteins identified, n. The SI of each protein was subsequently normalized by MPI:
Similarly, we incorporated the total SC to generate another normalization method, Mean Intensity (MI) (eq. 4). MI was calculated by dividing the total SI for the dataset by the total SC of the dataset. The SI of each protein was subsequently normalized by MI to yield SIMI for each protein:
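Written out from the verbal definitions of MPI and MI, the two mean-based normalizations would take roughly the following form. The notation is assumed rather than taken from the published equations: the index j runs over the n proteins identified in a dataset.

```latex
\begin{aligned}
\mathrm{MPI} &= \frac{1}{n}\sum_{j=1}^{n} SI_j, &
SI_{MPI} &= \frac{SI}{\mathrm{MPI}} \\[4pt]
\mathrm{MI} &= \frac{\sum_{j=1}^{n} SI_j}{\sum_{j=1}^{n} SC_j}, &
SI_{MI} &= \frac{SI}{\mathrm{MI}}
\end{aligned}
```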
However, the replicates were still significantly different (Fig. 1e, f). Equivalent normalization of the SC and PN datasets was even less effective (data not shown).
Next, we incorporated PN, SC and ms/ms intensities of each peptide into subsequent normalizations, because the SI values are dependent on these features. Each individual SI was normalized by either the total PN (PNt) for the protein, p (eq. 5), total SC, TSC (eq. 6), or global/total intensity, GI (eq. 7):
The replicate SIP and SITSC datasets were still significantly different (Fig. 1g, h). SIGI showed no significant difference between the datasets (Fig. 1i), indicating it succeeded in normalizing the replicate datasets.
Although the SIGI method provided a dramatic improvement, we aimed for further enhancement. Large proteins can contribute more peptides than smaller ones, thus their abundance may be overestimated16, 17. To correct for protein length (number of amino acids), we first normalized SI by protein length (SIL) (eq. 8, Supplementary Table 1), resulting in improvement over SI alone (Fig. 1j), but significant differences were still evident between the samples. As SIGI successfully normalized different samples, we incorporated protein length into this method, resulting in normalized SI (SIN):
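Following that description, SIN is each protein's SI divided by the global intensity of its dataset (the SIGI step) and then by the protein's length. The sketch below shows only that arithmetic, under the assumption that GI is the sum of SI over all proteins in the dataset; the function names, dictionary layout and values are illustrative and not from the study.

```python
from typing import Dict


def si_gi(si: Dict[str, float]) -> Dict[str, float]:
    """Divide each protein's SI by the summed SI of the whole dataset."""
    total = sum(si.values())
    return {protein: value / total for protein, value in si.items()}


def si_n(si: Dict[str, float], length: Dict[str, int]) -> Dict[str, float]:
    """Global-intensity normalization followed by division by protein length."""
    return {
        protein: value / length[protein]
        for protein, value in si_gi(si).items()
    }


# Made-up SI values for one replicate and protein lengths in amino acids.
replicate_a = {"P1": 1000.0, "P2": 3000.0, "P3": 4000.0}
lengths = {"P1": 100, "P2": 300, "P3": 200}

# A second replicate whose raw signal is uniformly 2.5 times higher.
replicate_b = {protein: 2.5 * value for protein, value in replicate_a.items()}

sin_a = si_n(replicate_a, lengths)
sin_b = si_n(replicate_b, lengths)
```

Here `sin_a` and `sin_b` agree protein by protein, because a uniform scale difference between measurements cancels in the division by the dataset total. Real replicates differ in more than a constant factor, so this illustrates the mechanism and is no substitute for the statistical comparison in Fig. 1.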
No significant difference could be detected between the SIN normalized datasets (Fig. 1k) and SIN was superior to SIGI.
We applied the SIN normalization similarly to the SC datasets, by substituting SI with SC to yield SCN.
SCN failed to adequately reduce the variation between the datasets (Fig. 1l), showing the superiority of SI over SC. As expected, PN was even worse (data not shown).
Next, we compared SIN to two published SC methods, NSAF18 and Rsc19. NSAF failed to adequately normalize our replicate liver datasets (Fig. 1m). Substituting SI for SC in the NSAF approach proved much better, but there was still a significant difference between the replicates (data not shown). Rsc proved better than NSAF in reducing the variation but was inferior to the SIN method (Fig. 1n). When SC was replaced by SI in the Rsc equation (RSI), SI yet again outperformed SC, demonstrating the substantial improvement that can be gained by using SI over SC, regardless of the normalization approach (Fig. 1n).
To validate SIN as an abundance feature, we performed spiking experiments with a mixture of 19 protein standards across a wide dynamic range (0.5-50,000 fmol), with either BSA (2.65 μg) or a complex plasma membrane fraction (40 μg) as background (Supplementary Methods). The SIN for each of the standard proteins was calculated and plotted as a function of protein load, giving R2 = 0.9239 (Fig. 2a, Supplementary Fig. 2a). The slope of the regression line is 1.223 (95% CI of 1.101 to 1.345), meaning the magnitude of the change in SIN for any given change in protein abundance can be calculated (Supplementary Data).
To determine how SIN compares to the most commonly used abundance features, namely SC and AUC, we analyzed the ability of SIN, SC and various AUC methods19, 20 to accurately predict the amount of each protein in “the standard protein mixture”21 (Supplementary Methods, Notes), a published dataset generated by ISB. For 13 of the 16 proteins in the standard mixture, no significant difference could be determined between the SIN predicted amount and the actual amount (Fig. 2 b, c, e). For the remaining 3 proteins, SIN came closest to predicting the actual protein amount 2 out of 3 times (Fig. 2d). Thus, SIN accurately predicted protein amount 81.25% of the time and was the best method at predicting protein amount 93.75% of the time.
The next best method was total AUC, which was accurate 50% of the time (56.25%, including the one time AUC outperformed SIN when none of the methods were significantly accurate). The other AUC methods19, 20 fared just as poorly as the “raw” AUC method in accurately determining the protein amount (18.75%, 43.75%, respectively). SC predicted the correct protein abundance only 37.5% of the time.
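The percentages quoted for the standard mixture are fractions of its 16 proteins. The sketch below reproduces them as a check on the arithmetic. The SIN counts (13 accurate, plus 2 of the remaining 3 where SIN came closest) are stated in the text; the counts for total AUC and SC are back-calculated from the reported percentages and should be read as assumptions.

```python
TOTAL_PROTEINS = 16  # proteins in the standard mixture


def percent(count: int, total: int = TOTAL_PROTEINS) -> float:
    """Express a protein count as a percentage of the mixture."""
    return 100.0 * count / total


# Counts stated in the text for SIN.
sin_accurate = percent(13)   # no significant difference from the actual amount
sin_best = percent(13 + 2)   # accurate, or closest among the remaining 3

# Counts inferred from the reported percentages (assumed, not stated).
inferred_counts = {
    "total AUC (accurate)": 8,
    "total AUC (accurate or best)": 9,
    "SC (accurate)": 6,
}
inferred_percent = {name: percent(n) for name, n in inferred_counts.items()}
```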
To determine whether SIN could control for variation in sample load, we compared two MS datasets acquired from the same sample but with different protein amounts loaded. Proper normalization should scale the individual 40 and 150 μg samples, facilitating a direct comparison based on relative abundance. Before normalization, the mean SI values of the 40 and 150 μg samples were clearly and significantly different (Fig. 3a). First, we corrected the SI values for the 40 μg sample by the dilution factor, SI40 × 150/40 (Fig. 3b). However, the samples were still significantly different (t-ratio 24.459). The Rsc method was more effective than the NSAF method, but significant variation was still apparent (Fig. 3d, e). SIN was the only method tested that eliminated the variation introduced by dissimilar or even unknown protein loads, and thereby facilitated comparisons of proteins between these samples (Fig. 3c). When the t-ratios obtained from testing SIN and Rsc are plotted at incremental peptide cut-off levels (significantly identified peptides), SIN consistently outperforms Rsc: each SIN t-ratio falls below the significance threshold of 2, meaning that no significant difference can be found between the SIN normalized datasets (Fig. 3f).
Next, we used SIN values to estimate protein amounts (online methods). Linear regression analysis of the ng amounts for each of the 2,660 common proteins from the 40 and 150 μg samples fit a straight line (R2 > 0.94) with a slope of 3.72, in excellent agreement with the 150/40 ratio of 3.75 (Fig. 3g). This test provides additional strong statistical validation for the applicability of the SIN method to thousands of proteins for large-scale quantification of protein expression.
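The slope comparison can be illustrated with an ordinary least-squares fit of the 150 μg amounts against the 40 μg amounts: if every protein scales with the load, the slope equals the load ratio. This is a sketch with made-up values, and it assumes the regression was run on untransformed ng amounts, which the text does not specify.

```python
from typing import Sequence, Tuple


def least_squares_line(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up estimated ng amounts for five proteins in the 40 μg measurement.
ng_40 = [2.0, 8.0, 20.0, 44.0, 120.0]

# Idealized 150 μg measurement: every protein scaled by the load ratio.
load_ratio = 150 / 40
ng_150 = [load_ratio * value for value in ng_40]

slope, intercept = least_squares_line(ng_40, ng_150)
```

With these idealized inputs the fitted slope recovers the 3.75 load ratio and the intercept is zero; the reported slope of 3.72 is the same comparison made on measured data.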
As a more stringent test, we determined whether the SIN method could facilitate unsupervised hierarchical clustering of biologically distinct datasets to identify correlated expression patterns. Using endothelial cell plasma membranes isolated from kidney and heart, each with five replicate MS measurements, we performed two-way unsupervised clustering (not imparting any prior knowledge onto the dataset) on the complete datasets using SIN values for all commonly identified proteins (across all datasets), with proteins represented by rows and tissues by columns (Fig. 3h). The clustering algorithm successfully clustered replicates according to tissue type, whereas distinct samples (heart vs. kidney) visually separated. This confirmed that SIN is quite successful in maintaining sameness of replicates while exposing the differences of distinct biological samples.
Next, we compared intensities obtained from a Coomassie-stained SDS-PAGE gel (Fig. 4a) to the SI abundance values generated by LC-MS/MS analysis. Each gel slice was treated as a distinct sample with an MS and densitometric measurement to determine relative abundance. The two profiles overlapped substantially for most of the gel (Fig. 4b). This strong correlation further verifies the utility of SI as a quantitative tool. Notably, MS also detected proteins in gel regions containing high molecular weight proteins with little Coomassie staining, consistent with this stain's well-known inability to stain some high molecular weight proteins. The sensitivity and utility of the SI method are readily apparent.
To determine whether we could detect quantitative differences for individual proteins expressed in two distinct samples, we applied the SIN method to datasets from total lung homogenates (H) versus lung endothelial plasma membrane (P) subfractions. Because the P fraction is physically derived from H, the protein composition of P is a subset of the proteins present in H. To determine which proteins are enriched in P compared to H, we used SIN and western blotting to generate P/H expression ratios for individual proteins. Statistical comparison of the P/H ratios for 64 proteins, ranging from low to high abundance, produced a Spearman's Rho correlation coefficient of 0.86, indicating an excellent positive correlation between the two methods (Fig. 4c). As the Bland-Altman plot22 showed a mean difference of 0.09, and the data points fall within 2 s.d. of the mean (95% CI for the difference between the two methods), there is very little evidence to indicate that the quantitative methods are significantly different (Fig. 4d).
In this study, we have developed and tested a number of methods to quantify and normalize complex proteomics data obtained from a variety of MS methodologies. We performed extensive statistical testing and validation of our methods, systematically proving their utility through a wide variety of tests and demonstrating their application to diverse MS datasets. With the SIN method, we have successfully compensated for experimental and random bias and noise, thus showing that protein abundances, as reflected by mean SIN from replicate samples, are essentially the same. Conversely, the analysis of biologically distinct samples, with noise and bias controlled by proper normalization, enables meaningful direct quantitative comparisons reflecting their true biological diversity.
We aimed to generate an abundance index with the convenience of SC but greater confidence at low peptide numbers, without the added complexity of peak area or AUC measurements. We tested the benefit of combining PN, SC and ms/ms ion intensities into one metric (SI), as opposed to using these features in isolation. This approach proved statistically robust, more so than SC or AUC (Fig. 1, Fig. 2 and Supplementary Fig. 5&6), and obviates the need for spiked samples.
Using the fragment ion intensity, specifically only those intensities that match the peptide of interest, as they are inherently more reflective of the precursor, may facilitate more accurate measurements because there is less chance of including the signal of co-eluting precursors or background noise (Supplementary Discussion online). Also, for SIN it is the peak heights of the ms/ms fragments that are summed, whereas for AUC the precursor ions are integrated, which provides more scope for error in the integration process. This is particularly important as most mass spectrometers operate in conjunction with liquid chromatography (LC) systems, thereby producing MS scans permeated by chemical noise. In addition, the chromatogram becomes noisier as the sample complexity increases. For example, even our typical 36 hr LC-LC-MS/MS runs for complex samples still produce overlapping chromatograms due to co-eluting peaks, making AUC quantification troublesome. Most groups use LC or LC-LC setups with much shorter elution times. This will continue to be an issue, even with the advent of high-resolution instruments, as improvements in MS resolution can only really be appreciated when the chromatographic resolution improves in tandem, which has yet to be fully realized. This is not an issue for the SIN calculation as the ms/ms spectrum is inherently less complex and no integration is performed.
SIN, above all other methods tested, could accurately determine the correct amount of each protein standard in a mixture (Fig. 2 b-d, e). Despite giving the AUC methods the best possible advantage (Supplementary Notes), SIN consistently outperformed them in determining protein abundance (Supplementary Fig. 5&6). SC performed as modestly as the AUC methods. SIN could accurately determine the relative abundance for thousands of proteins in complex samples, without the need for spiking of protein standards (Fig. 3g, Fig. 4). SIN also facilitated the identification of a subset of proteins enriched in P relative to H. We showed outstanding correlation between the SIN and western blot ratios (Fig. 4c&d), validating this enrichment.
As abundance features, SI, SC, PN, and AUC are not reproducible across replicate datasets (Fig. 1 a-c, Supplementary Fig. 4a). Methods of normalizing complex LC-MS/MS data are only just emerging19, 23-27, but no comparison is generally shown between the pre- and post-normalized data. Stringent validation has been rather sporadic and the results and efficacy have been variable24, 28-30. Therefore, we undertook a systematic and logical approach to normalize our datasets. Of the methods tested, only SIGI and SIN were successful in removing the variability between replicates (Fig. 1i). For SIN, we added a protein size parameter to the SIGI calculation, resulting in a clear benefit over SIGI alone (Fig. 1i, k). Even when we substituted SI for SC, PN or AUC in the normalization methods, SI consistently outperformed all other features, regardless of the sample, measurement or normalization approach applied (Fig. 1k, l, o & Supplementary Fig. 4c). Even the raw SI values show less variation between the replicates than the SIN “normalized” AUC values (Supplementary Fig. 4).
SIN does not over-normalize, but rather can reduce replicate variability to maintain sameness in datasets from a single sample while maintaining quantitative differences between distinct samples (Fig. 3h). SIN can also successfully normalize datasets with different loading amounts (Fig. 3c), thus, the dilution factor between samples can be accurately derived from the data simply by calculating the slope of the regression line (Fig. 3g). SIN can also control for the variation introduced by different MS methodological analysis of the same sample (Supplementary Fig. 3) to facilitate comparison and quantification across all datasets (Supplementary Fig. 7). This may have significant implications for the comparison of datasets acquired in different labs.
In summary, combining and normalizing several MS abundance features, including for the first time, fragment ion intensities, represents a novel approach to MS quantification with broad utility. When we compared our new methods to each other and to previously reported methods19, 20, the best method was SIN, which was developed through logical systematic application of enabling parameters which, when combined, produced improved normalization and quantification. This method now allows the quantitative comparison of biologically distinct datasets with high confidence and relative ease, and should therefore greatly facilitate the use of label-free quantitative proteomic approaches for differential protein expression analysis. This method is all the more valuable in the era of systems biology and biomarker discovery where distinct samples must be analyzed both quantitatively and comprehensively through the replicate MS measurements necessary to gain confidence in determining expression differences and novel biomarkers.
Proteins were pre-fractionated on SDS-PAGE gels prior to 2D-LC-MS/MS and Reverse Phase-MS/MS (RP-MS/MS). For RP-MS/MS using either an LCQ or LTQ mass spectrometer, digested peptides were extracted from each gel slice and lyophilized. For 2D-LC-MS/MS using an LCQ, peptides extracted from each gel slice were first pooled into 7 groups then lyophilized2. Data acquisition from both the LCQ and LTQ was carried out in data-dependent mode. Full MS scans were recorded on the eluting peptides over the 400-1400 m/z range, with one MS scan followed by three MS/MS scans of the most abundant ions. Dynamic exclusion was applied with a Repeat Count of 2, a Repeat Duration of 0.5 min, and an Exclusion Duration of 3 min. For 2D-LC-MS/MS, a dynamic exclusion window of 10 min was applied.
The acquired MS/MS spectra were converted into mass lists using the Extract_msn program from Xcalibur and searched against a protein database containing human, rat and mouse sequences using the Sequest program in Bioworks™ 3.1 for Linux (Thermo Fisher Scientific, Inc., Waltham, MA, USA). The searches were performed allowing for tryptic peptides only with peptide mass tolerance of 1.5 Da for LTQ data, 2.0 Da for LCQ data. Accepted peptide identification was based on a minimum ΔCn score of 0.1; minimum cross correlation score of 1.8(z=1), 2.5(z=2), 3.5(z=3). The false positive identification rates were ≤ 1% (see supplementary methods). The positive protein identification results were extracted from Sequest.out files, filtered and grouped with DTASelect software using the above criteria. Proteins were identified based on 2 unique significantly identified peptides.
Fragment ion intensity (intensity of the ions in the ms/ms spectrum that are assigned to a given peptide), peptide number (number of unique peptides identifying a protein) and spectral counts (number of ms/ms spectra assigned to a particular peptide) were extracted from the DTASelect output files using a script written in-house (see supplementary methods). For the purpose of this manuscript, fragment ion intensity is defined as the total intensity of all detected b and y fragment ions (ms/ms spectra) for a specific peptide. The fragment ion intensity is summed for each peptide that passes the identification threshold and gives rise to a significantly identified protein (see above). These summed fragment ion intensities, from all ms/ms spectra and peptides relating to a given protein, are then combined, and the result is referred to as the spectral index (SI) for that protein.
NSAF is described by Zybailov et al18 as
where Spc is the spectral count for protein k and L is the length of protein k.
The Rsc is described by Old et al19 as:
where, for each protein, RSC is the log2 ratio of abundance between Samples 1 and 2; n1 and n2 are spectral counts for the protein in Samples 1 and 2, respectively; t1 and t2 are total numbers of spectra over all proteins in the two samples; and ƒ is a correction factor set to 0.5 by Beissbarth et al33 and varied in the study by Old et al. We used the 1.25 correction factor as per Old et al19.
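For reference, the two published comparison methods can be sketched as below, using the forms in which they are commonly written: NSAF for protein k is (Spc/L) for that protein divided by the sum of Spc/L over all proteins, and Rsc is log2[(n2 + f)/(n1 + f)] + log2[(t1 − n1 + f)/(t2 − n2 + f)]. The function names and the example counts are illustrative assumptions; the default f = 1.25 follows the text above.

```python
import math
from typing import Dict


def nsaf(spc: Dict[str, int], length: Dict[str, int]) -> Dict[str, float]:
    """NSAF_k = (Spc/L)_k divided by the sum of Spc/L over all proteins."""
    saf = {k: spc[k] / length[k] for k in spc}
    total = sum(saf.values())
    return {k: value / total for k, value in saf.items()}


def rsc(n1: int, n2: int, t1: int, t2: int, f: float = 1.25) -> float:
    """Log2 abundance ratio (Sample 2 over Sample 1) with correction factor f."""
    return (
        math.log2((n2 + f) / (n1 + f))
        + math.log2((t1 - n1 + f) / (t2 - n2 + f))
    )


# Made-up spectral counts and protein lengths for two proteins.
example_spc = {"A": 10, "B": 20}
example_len = {"A": 100, "B": 400}
example_nsaf = nsaf(example_spc, example_len)

# Equal counts and equal totals give a log2 ratio of zero.
rsc_equal = rsc(n1=10, n2=10, t1=1000, t2=1000)
# A protein seen only in Sample 2 gives a positive, finite log2 ratio.
rsc_only_in_2 = rsc(n1=0, n2=10, t1=1000, t2=1000)
```

The correction factor f keeps the ratio finite when a protein has zero spectra in one sample, which is why `rsc_only_in_2` can be computed at all.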
SIN values were converted to estimated nanogram amounts by including the initial sample load in the final calculation, using the following equation:
where j = number of all proteins identified with ≥2 unique peptides, and the subscript i refers to the ith protein of j total proteins, and Q is the amount of sample (μg) used in a given measurement.
JMP IN 5.1 (SAS Institute) was used for all statistical analysis. T-tests and ANOVAs are common statistical tests used for determining differences between sample means but require data to be normally distributed to achieve analytical rigor. Our raw SC, PN and SI datasets were not normally distributed (Supplementary Fig. 1), as measured by the skewness and kurtosis of the frequency distribution. To maintain statistical rigor and to avoid inflated variance, we performed a log10 transformation of our datasets, which produced reasonable normality as determined from the histogram and Q-Q plots (Supplementary Fig. 1). Thus, for comparative statistical analysis, we similarly transformed all the datasets after performing the normalizations described below. It should be noted that equivalent results to those described below were obtained with non-parametric analyses (data not shown).
To visualize normalized datasets, we graphed the mean (center line) and 95% confidence intervals (CI), indicated as diamonds on the graphs, of normalized spectral indexes. If the CIs shown by the mean diamonds do not overlap, the groups are significantly different. The reverse is not necessarily true, and significance is determined from the summary statistics associated with the analysis (see below). The confidence circles are another way of visualising the diamonds and aid in determining CI overlap.
To determine whether there was any evidence that the replicate values were significantly different before and after application of the normalization methods, we applied a t-statistic (2 replicates) or one-way ANOVA (>2 replicates) to look for differences in normalized mean abundance features. For the statistical analysis, we used only the proteins that were identified in common across all replicate datasets for a particular comparison. Our null hypothesis was that both (2 replicates) or all (>2 replicates) samples were equal. For the t-statistic (2 replicates), the normalized values were deemed significantly different if the t-statistic produced a large t-ratio (as determined from the t-tables) and a small P-value (p<0.05). We used an absolute t-ratio of 2 as the threshold because it approximates the 0.05 significance level: an absolute t-ratio below 2 indicates no significant difference. For analysis of differences in mean intensities between multiple replicate (>2) samples, one-way analysis of variance (ANOVA) was performed, with a significance level of p<0.05. If the null hypothesis is true, we expect the F-ratio to be ~1. (Informally, the smaller the F statistic [equivalently, the larger the p-value], the closer the agreement across the replicates.) If there is no statistically significant difference between the replicates (as indicated by F-ratio ~1), we conclude that the normalization method succeeded in controlling for the variation between the replicate datasets.
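The F-ratio referred to above is the between-replicate mean square divided by the within-replicate mean square. The sketch below computes it on log10-transformed values using only the standard library. It is an illustration of the statistic, not the JMP analysis used in the study; the replicate values are made up, and each replicate's proteins are treated as one group of observations, which is an assumption about how the comparison was set up.

```python
import math
from typing import List, Sequence


def one_way_f_ratio(groups: Sequence[Sequence[float]]) -> float:
    """F-ratio of a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def log10_all(values: Sequence[float]) -> List[float]:
    """Log10-transform abundance values before the parametric test."""
    return [math.log10(v) for v in values]


# Made-up normalized abundances for the same four proteins in three replicates.
replicates = [
    [1.0e-4, 2.0e-3, 5.0e-2, 3.0e-3],
    [1.1e-4, 1.9e-3, 5.2e-2, 2.9e-3],
    [0.9e-4, 2.1e-3, 4.9e-2, 3.1e-3],
]
f_ratio = one_way_f_ratio([log10_all(r) for r in replicates])
```

Turning the F-ratio into a p-value requires the F distribution, which the standard library does not provide; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.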
Cluster analysis was performed on a dataset from 5 replicate MS measurements each of endothelial cell plasma membranes isolated from kidney and heart samples, using JMP 5.1 and Ward's hierarchical method34. Ward's method is a hierarchical method designed to minimize the variance within clusters (within-group dispersion). The SIN values for each protein were normalized across each row (all 10 samples) using the following standard approach: (SIN − (mean SIN)row) / (standard deviation)row.
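The row normalization described above is a standard score per protein: subtract the row mean and divide by the row standard deviation. A minimal sketch follows. The values are made up, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev
from typing import List, Sequence


def standardize_row(values: Sequence[float]) -> List[float]:
    """Subtract the row mean and divide by the row standard deviation."""
    row_mean = mean(values)
    row_sd = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - row_mean) / row_sd for v in values]


# Made-up SIN values for one protein across 10 measurements
# (5 kidney replicates followed by 5 heart replicates).
row = [2.0, 2.2, 1.8, 2.1, 1.9, 6.0, 6.3, 5.8, 6.1, 5.9]
z = standardize_row(row)
```

After this step every protein row has mean 0 and unit standard deviation, so highly abundant proteins do not dominate the distance calculation. The standardized matrix could then be passed to any Ward implementation, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`, in place of the JMP routine used here.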
This work was supported by NIH grants (to J.E.S): RO1HL074063, R33CA118602, and P01CA104898.
N.M.G. designed, developed and analyzed the methods, provided some of the mass spectrometry data, performed the spiking experiments and analysis, and wrote the manuscript; J.Y. initiated the project and designed, tested and implemented the methods; F.L. developed the scripts for data extraction; P.O. performed western blots and densitometry; S.S. performed western blots; Y.L. provided key mass spectrometry data; J.A.K. provided direction for statistical analysis; J.E.S. supervised the project, designed specific tests, and helped to write the manuscript. All authors have read and agreed to all the content in this manuscript.
See supplementary information for additional methods and details.