Using global liquid chromatography-mass spectrometry (LC-MS)–based proteomics analyses, we identified 24 serum proteins that differed significantly between those with type 1 diabetes (T1D) and healthy controls. Functionally, these proteins represent innate immune responses, the activation cascade of complement, inflammatory responses, and blood coagulation. Targeted verification analyses were performed on 52 surrogate peptides representing these proteins, with serum samples from an antibody standardization program cohort of 100 healthy control and 50 type 1 diabetic subjects. Sixteen peptides were verified as having very good discriminating power, with areas under the receiver operating characteristic curve ≥0.8. Further validation with blinded serum samples from an independent cohort (10 healthy control and 10 type 1 diabetic subjects) demonstrated that peptides from platelet basic protein and C1 inhibitor achieved both 100% sensitivity and 100% specificity for classification of samples. The disease specificity of these proteins was assessed using sera from 50 age-matched type 2 diabetic individuals, and a subset of proteins, C1 inhibitor in particular, were exceptionally good discriminators between these two forms of diabetes. The panel of biomarkers distinguishing those with T1D from healthy controls and those with type 2 diabetes suggests that dysregulated innate immune responses may be associated with the development of this disorder.
Type 1 diabetes mellitus (T1D) is widely considered to result from an autoimmune destruction of the insulin-producing pancreatic β cells (Tisch and McDevitt, 1996; Mathis et al., 2001; Knip and Siljander, 2008). Although the presence of several human leukocyte antigen genotypes indicates the importance of genetic predisposition to T1D (Horn et al., 1988; Sheehy et al., 1989; Hagopian et al., 2011; Vehik and Dabelea, 2011), and increasing evidence points to environmental triggers and regulators (Knip et al., 2005; Hober and Sauter, 2010; Norris, 2010; Stene et al., 2010; Foxman and Iwasaki, 2011), the exact etiology of this disease remains unknown.
It has been estimated that only 20% of β cell mass remains at the clinical presentation of T1D (Knip and Siljander, 2008), which is typically preceded by an asymptomatic period of highly variable duration that can last for a few months or for decades (Knip, 2002). The appearance of one or more autoantibodies against islet cell antigens is among the first detectable signs of emerging β cell autoimmunity (Knip et al., 2005). These autoantigens include glutamic acid decarboxylase (GAD), protein tyrosine phosphatase (IA-2), insulin, and, most recently, the zinc transporter Slc30A8 protein (Wenzlau et al., 2007). Multiple autoantibody positivities, and their persistence, are unequivocally related to the risk of progression to overt T1D, as noted in both family studies and surveys of general population cohorts (Mueller et al., 2002; Bingley et al., 2003; Barker et al., 2004; Siljander et al., 2007; Knip and Siljander, 2008). Although performance of autoantibody assays has improved considerably over the years, owing in large part to efforts by the Diabetes Antibody Standardization Program (DASP) and The Environmental Determinants of Diabetes in the Young consortium to standardize these assays (Bonifacio et al., 2010; Schlosser et al., 2010; Törn et al., 2008), not all islet autoantibody-positive subjects progress to T1D (Bingley et al., 1997; Barker et al., 2004; Siljander et al., 2007). In addition, the pathogenic role (if any) for islet autoantibodies in T1D remains elusive (Howson et al., 2011).
Therefore, we explored the potential of proteomics technologies for identifying novel biomarkers that could provide additional insight into the pathogenesis of T1D and whose measurement could be more accurate and precise for disease prediction and/or diagnosis than the currently available autoantibody measurements. We used liquid chromatography-mass spectrometry (LC-MS)–based, bottom-up proteomics measurements to discover blood serum peptides/proteins that varied significantly between type 1 diabetic and control subjects. These candidate peptide biomarkers were further verified using targeted, multiplexed multiple reaction monitoring (MRM) LC-MS assays (Anderson and Hunter, 2006; Kuzyk et al., 2009; Schiess et al., 2009) in a DASP sample cohort consisting of 100 healthy controls and 50 patient subjects. Using this approach, we identified a set of peptide biomarkers with above average ability to distinguish T1D from healthy controls, and these peptides were further validated in an independent 20-sample set blinded to the investigators. In addition, using serum samples from 50 age-matched type 2 diabetes (T2D) individuals, these proteins were assessed for their specificity to hyperglycemia, the common physiological outcome shared by T1D and T2D, with a panel of peptides identified to be specific only to T1D.
For discovery of candidate protein markers of T1D, we prepared 10 pooled sera from healthy control individuals and 10 from individuals with T1D using samples of a DASP cohort; each pool comprised samples from 5 subjects. To achieve broad proteomic coverage and to construct an accurate mass and time (AMT) tag reference database of identified peptides, intensive sample fractionation was performed at both the protein (to deplete the major serum proteins) and the peptide levels (to reduce the complexity of proteolytic digests before LC-MS analysis), in combination with high-throughput LC-MS/MS analyses. Subsequent label-free quantitative proteomic measurements on tryptic digests of each pooled serum sample were performed using the LC-MS–based AMT tag approach (Zimmer et al., 2006; Metz et al., 2008). For these analyses, the samples were fractionated only at the protein level. LC-MS datasets were then analyzed using an established pipeline of software tools developed in house for AMT tag-based, bottom-up proteomics data (Kiebel et al., 2006). In the end, we identified 24 proteins (Fig. 1) that demonstrated significant changes (P < 0.05, Student’s t test) between type 1 diabetic and healthy control samples, including 4 proteins (AZGP1, CLU, SERPINA6, and LUM) that showed statistically significant differences between T1D and healthy controls in a previous study (Metz et al., 2008). Functional annotation of these proteins showed that most are extracellular proteins secreted from the liver, and have important roles in the innate immune response, complement activation cascade, inflammatory response, and blood coagulation. Collectively, these results implicate systemic dysregulations of pathogen clearance activities and imbalances in blood coagulation and humoral immune response in T1D.
To significantly improve the accuracy, sensitivity, and specificity of peptide measurements, and to evaluate the utility of these candidate proteins as T1D markers in large cohorts, it is critical to select proteolytic peptides that can be used as surrogates of these proteins. To this end, we used an iterative screening approach (Kuzyk et al., 2009) to select proteolytic peptides that have high detectability in tryptic digests of human serum and low interference from sample matrices, which resulted in 52 peptides as surrogates for these 24 proteins in a multiplexed LC-MRM-MS assay (Table S1) on whole serum without depletion of major serum proteins. Quantification of these 52 peptides was assisted by spiking their custom-synthesized, stable isotope–labeled standard (SIS) peptide analogues into tryptic digests of each individual serum sample. These SIS peptides co-elute, ionize, and fragment identically with their endogenous counterparts (Anderson and Hunter, 2006; Kuzyk et al., 2009), and because the spiked amounts were individually adjusted to be close to the levels of their endogenous analogues (Kuzyk et al., 2009), abundances of the endogenous peptides and their corresponding proteins could be accurately measured based on the peak area ratios between endogenous and SIS peptides.
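The ratio-based quantification described above can be illustrated with a minimal sketch. It assumes a single summed peak area per peptide and a known spiked SIS amount; the function names, the fmol unit, and the example numbers are hypothetical and are not taken from the study.

```python
def peak_area_ratio(endogenous_area: float, sis_area: float) -> float:
    """Relative abundance of a peptide: endogenous peak area / SIS peak area."""
    if sis_area <= 0:
        raise ValueError("SIS peak area must be positive")
    return endogenous_area / sis_area


def endogenous_amount(endogenous_area: float, sis_area: float,
                      sis_spiked_fmol: float) -> float:
    """Estimate the endogenous peptide amount (same unit as the spike).

    Assumes the endogenous and SIS peptides respond identically in the
    instrument, so the area ratio equals the amount ratio.
    """
    return peak_area_ratio(endogenous_area, sis_area) * sis_spiked_fmol


# Hypothetical example: endogenous peak is twice the SIS peak, 50 fmol spiked.
amount = endogenous_amount(2.0e5, 1.0e5, 50.0)
```

Spiking the SIS peptide close to the endogenous level, as the text describes, keeps this ratio near 1, where both peaks are measured with comparable precision.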
To minimize systematic errors during quantification of peptides, we randomized the orders of both the sample proteolytic digestion and LC-MRM-MS analysis. The measured abundances (the peak area ratios between endogenous peptides and their SIS analogues) of these 52 peptides in each of the DASP cohort samples were statistically evaluated to determine their power in differentiating T1D from healthy controls. It should be noted that our analysis was based on all 150 individuals without removing any outliers, as no further diagnostic or follow-up information was available on the anonymous DASP samples beyond the original sample designation, collection, and demographic information, despite the fact that there were samples clearly showing poor correlation within each sample group based on the partial least squares (PLS) and correlation analyses (Fig. 2).
The levels of 33 peptides were significantly different in type 1 diabetic subjects compared with healthy controls on the basis of the Mann-Whitney U test (P < 0.001), with 14 peptides down-regulated by ≥1.5-fold and 6 peptides up-regulated by ≥1.5-fold in type 1 diabetic subjects (Table 1). Receiver operating characteristic (ROC) curve analysis was used to evaluate the performance of each peptide assay in discriminating type 1 diabetic from healthy control individuals. The areas under the curve (AUCs) with 95% confidence intervals showed that 16 of these 33 peptides had AUCs ≥0.8 (Table 1). Four proteins were notable: complement C3, gelsolin (GSN), N-acetylmuramoyl-l-alanine amidase (PGLYRP2), and transthyretin (TTR); these proteins showed significant down-regulation among all of their 10 constitutive peptides. Importantly, the relative levels of down-regulation among peptides from the same proteins agree well with each other. Two proteins, platelet basic protein (PPBP) and plasma protease C1 inhibitor (SERPING1), showed significant up-regulation among the six surrogate peptides monitored, although the relative levels of up-regulation between the two peptides from PPBP did not agree well (see Discussion), with peptide NIQSLEVIGK having a dramatic up-regulation of 8.5-fold (P = 6.62E-18, U test) in the type 1 diabetic group. This peptide also had the highest AUC (0.93) in differentiating disease from control (Fig. 3). In addition, the cut-off values of relative peptide abundance corresponding to sensitivity at 90% specificity (AS90) were obtained from the ROC sensitivity and specificity analyses (Bingley et al., 2003) for each peptide, and this value was used as a threshold to classify the blind samples into control and type 1 diabetic individuals.
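As a rough illustration of the ROC analysis above, the sketch below computes an AUC from pairwise comparisons (equivalent to the Mann-Whitney U statistic divided by the number of case-control pairs) and derives a cut-off at 90% specificity. It assumes an up-regulated marker, for which higher values indicate disease; for a down-regulated peptide the values could be negated first. The exact percentile convention used by the authors is not stated, so the rule here is an assumption, and all function names are hypothetical.

```python
import math


def auc(case_values, control_values):
    """Probability that a random case exceeds a random control (ties = 0.5)."""
    pairs = len(case_values) * len(control_values)
    if pairs == 0:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in case_values:
        for y in control_values:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / pairs


def cutoff_at_specificity(control_values, specificity=0.90):
    """Smallest control value with at least `specificity` of controls at or below it.

    Samples strictly above the returned cut-off are called positive.
    """
    ordered = sorted(control_values)
    k = max(math.ceil(specificity * len(ordered)), 1)
    return ordered[k - 1]


def sensitivity_at(case_values, cutoff):
    """Fraction of cases called positive (value strictly above the cut-off)."""
    return sum(v > cutoff for v in case_values) / len(case_values)
```

With such a cut-off fixed on the verification cohort, the sensitivity measured there corresponds to the AS90 value, and the same threshold can then be applied unchanged to new samples.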
An independent DASP cohort, composed of sera from 20 individuals and blinded to the investigators, was measured using the aforementioned multiplexed LC-MRM-MS peptide assay to evaluate the utility of these 52 peptide assays in diagnosing T1D. After sample unblinding, the sensitivity for each peptide assay was calculated as the percentage of type 1 diabetic sera reported as positive using the cut-off value at AS90, and the specificity was calculated as the percentage of control sera reported as negative using the same threshold. Using this approach, we found 7 peptides from 4 (PGLYRP2, PPBP, SERPING1, and TTR) of the aforementioned proteins showed both sensitivity and specificity >80% (Table 1). Importantly, two peptides, NIQSLEVIGK and LLDSLPSDTR, achieved both 100% sensitivity and 100% specificity (Fig. 4). As shown in Fig. 4, if the cut-off values were slightly adjusted, then two more peptides (FQPTLLTLPR and TNLESILSYPK) would also have achieved 100% sensitivity and 100% specificity. For the peptides that were up-regulated in the type 1 diabetic subjects from this blinded cohort, NIQSLEVIGK showed a dramatic increase of 30.4-fold (P = 1.08E-5, U test), whereas 3 peptides in C1 inhibitor protein (SERPING1) all had a fold change of greater than sevenfold (P = 1.08E-5, U test).
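The sensitivity and specificity calculation described above can be sketched as follows, assuming a cut-off already fixed on the verification cohort and a flag for the direction of regulation. The function name, parameter names, and example values are hypothetical.

```python
def sensitivity_specificity(case_values, control_values, cutoff,
                            higher_is_positive=True):
    """Return (sensitivity, specificity) for a fixed cut-off.

    A sample is called positive when its value lies beyond the cut-off in
    the direction associated with disease.
    """
    def is_positive(value):
        return value > cutoff if higher_is_positive else value < cutoff

    true_positives = sum(is_positive(v) for v in case_values)
    true_negatives = sum(not is_positive(v) for v in control_values)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


# Hypothetical blinded set: four diabetic and four control peak-area ratios.
sens, spec = sensitivity_specificity([5, 8, 9, 12], [1, 2, 3, 6], cutoff=4)
```

Because the cut-off is not re-tuned on the blinded samples, the resulting figures estimate how the assay would perform on unseen sera.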
Hyperglycemia is the common clinical outcome among all types of diabetes mellitus. To establish the specificity of these peptide markers to T1D (i.e., instead of diabetes-associated hyperglycemia in general), abundances of these peptide markers in serum samples of 50 age-matched (Table 2) T2D individuals were measured using our established LC-MRM-MS–based assays and compared with those of the T1D and healthy control individuals in the DASP blind set. Except for the constituent peptides of proteins GPX3, GSN, HGFAC, LUM, SERPINA6, and TTR, where there is no significant difference (P > 0.01, U test) between T1D and T2D, our results clearly showed that 21 peptides have an increased ability (P < 0.001, U test; absolute fold change ≥ 1.5) to distinguish between these two types of diabetes (Table 1). These peptides are from proteins (C3, C4A, C6, CFP, KLKB1, KNG1, PGLYRP2, SERPIND1, and SERPING1) that are involved in complement activation, blood coagulation, and the innate immune and inflammation responses. Notably, peptides from C1 inhibitor had significant up-regulation in T1D, but were down-regulated in T2D compared with healthy controls (Fig. 5). In addition, 22 peptides were also identified as having increased ability (as previously defined) to distinguish between T2D and healthy controls (Table 1), and these peptides could be used as markers for T2D. Importantly, the levels of the best performing peptides do not appear to correlate with the levels of HbA1c, which is the current best marker for average hyperglycemia (Fig. 6).
The inference of parent protein abundance based on constituent peptide abundance is a key general challenge in bottom-up proteomics (Nesvizhskii and Aebersold, 2005), especially with respect to human blood samples, where very complex alternate splicing, in vivo proteolytic processing, and posttranslational modifications occur on protein precursors. We used BLAST to ensure that each of the 52 peptides selected in our study was unique to only one gene name. The calculations of protein concentrations based on SIS peptides in general agree with the literature survey (Hortin et al., 2008) or with similar MRM types of protein concentration measurement (Kuzyk et al., 2009). Our results also showed that peptides belonging to the same protein had similar abundances, except for the platelet basic protein, as previously noted.
Based on the level of fibrinogen measured by a sandwiched micro-ELISA assay (Gonzalez et al., 2011), a majority of the samples in the DASP verification set are sera, whereas a few are plasma (Table S2). In addition, there were eight T1D samples collected using plasmapheresis procedures, and our ELISA assay showed them to be plasma. These plasmapheresis samples correlate poorly (Fig. 2) with the rest of the diabetic subjects, possibly because of the low total protein concentrations for these samples (55.5 ± 18.2 µg/µl) compared with that of the rest of the group (85.5 ± 20.0 µg/µl). Except for these eight samples, the sample type (serum or plasma) does not affect the performance of our peptide assays. For the outliers that we identified from the control or diabetic subjects, as revealed by PLS and correlation plot analyses (Fig. 2), the abundances of the target peptides in these samples do not correlate with race, age (Fig. 7), or gender, or with protein concentrations as measured by bicinchoninic acid (BCA) assay (Table S2).
Currently, levels of GAD, IA-2, and insulin autoantibodies are the best markers for prediction and diagnosis of T1D (Törn et al., 2008; Bingley et al., 2010; Schlosser et al., 2010); however, it has been reported that people with high levels of these autoantibodies do not always progress to a diabetic state (Bingley et al., 1997; Barker et al., 2004; Siljander et al., 2007). In general, for the samples in our study, diagnosis of diabetes correlates to the number of autoantibodies deemed positive, but there are some control subjects with high positivities of at least one of the three autoantibodies, whereas the autoantibody levels of two type 1 diabetic subjects were called negative by most of the DASP laboratories (Table S2). Notably, our peptide assay results can be independent of the positivities of the autoantibody assays. For the two diabetic subjects having very negative autoantibody levels, our results showed that their levels of peptide NIQSLEVIGK are either above or slightly below the mean of this peptide in the diabetic group. Conversely, some of the control subjects with high positivities of GAD autoantibody assay had very low levels of this peptide. This indicates that the peptide biomarkers identified in our study are not absolutely dependent on the widely accepted autoantibody markers.
The specificity evaluation using samples from individuals with T2D demonstrated that most of the peptide markers differentiating T1D patients from healthy controls in the DASP validation cohort appear to be specific only to T1D. Furthermore, although patient levels of HbA1c are not available from the DASP, our analysis of the relationship between HbA1c levels in the T2D samples and the best performing peptide markers did not show a correlation (Fig. 6). However, the potential association between glycemia and these protein markers still needs to be determined, particularly in a prospective cohort during the period of gradually increasing glycemia that precedes the diagnosis of T1D.
In its mature form, PPBP precursor can be proteolytically cleaved into 10 polypeptide chains with different functions (UniProt accession no. P02775; Fig. 8). The two peptides that were selected to represent PPBP have sharp differences in the level of up-regulation in type 1 diabetic subjects, with EESLDSDLYAELR (aa 50–62) only having a modest 1.6-fold increase in both the DASP verification and blind set, whereas NIQSLEVIGK (aa 76–85) showed a dramatic 8.5-fold (23-fold if outlier samples had been excluded from the statistical analysis) up-regulation in the DASP verification set and 30-fold up-regulation in the blind set (Table 1). Because EESLDSDLYAELR exists exclusively in two forms of CTAPIII, TC-2 and β-TG, whereas NIQSLEVIGK exists in all four of these proteins, in addition to TC-1 and the five forms of NAP-2 (Fig. 8), we reason that TC-1/NAP-2s are the major sources of this latter peptide. TC-1 is an antibacterial protein released from activated platelet α-granules as part of the innate immune response (Krijgsveld et al., 2000), whereas the NAP-2s are activators of neutrophils and can be generated from proteolytic cleavage of both CTAPIII and PPBP by tissue proteases (Walz and Baggiolini, 1990). Our results also showed that these two peptides were significantly up-regulated in sera of T2D subjects when compared with healthy controls, but their levels in T2D subjects were not dramatically different than those found in type 1 diabetic individuals (Table 1). Procoagulant and proinflammatory conditions deriving from increased platelet adhesiveness have been reported in T1D (Targher et al., 2011). In addition, genomic analysis of autoimmune thyroid disease and latent autoimmune diabetes of adults identified the PPBP gene CXCL7 to be overly expressed in monocytes (van der Heul-Nieuwenhuijsen et al., 2010). 
The general overexpression and differential expression of these two peptides in diabetes warrants further study to delineate the original isoform of these peptides and to investigate their role in the pathogenesis of T1D.
We identified plasma protease C1 inhibitor (SERPING1) as a sensitive marker for diagnosing T1D, as its level was significantly up-regulated in respect to healthy controls, and down-regulated in T2D. C1 inhibitor is known to regulate the activation of C1 complex and inhibit chymotrypsin and kallikrein, and it may play a crucial role in blood coagulation, fibrinolysis, and suppression of inflammation (Davis et al., 2008; Stoermer and Morrison, 2011). Alternatively, C1 inhibitor can be used by bacteria against complement-mediated lysis through binding to bacterial cell membranes (Lathem et al., 2004). The mechanism of up-regulation of C1 inhibitor and its role in the pathogenesis of T1D remains to be investigated. In addition to C1 inhibitor, other protease inhibitors, such as heparin cofactor (SERPIND1) and kininogen-1 (KNG1) were down-regulated consistently in T1D, but up-regulated in T2D.
Several proteases (BTD, C1R, C2, CNDP1, F2, KLKB1, and PGLYRP2) showed differential regulation in T1D versus T2D. As an example, human peptidoglycan recognition protein 2 (PGLYRP2), an innate immunity protein, is an N-acetylmuramoyl-l-alanine amidase that hydrolyzes bacterial peptidoglycan, which is constitutively produced in the liver and secreted into the blood, and also induced by bacteria in epithelial cells (Dziarski and Gupta, 2006). It was suggested that this amidase eliminates proinflammatory peptidoglycan and thus prevents overactivation of the immune system leading to excessive inflammation (Hoijer et al., 1997). In contrast, glutathione peroxidase 3 (GPX3) showed similar levels of down-regulation in both types of diabetes. With its role in protection of cells and enzymes from oxidative damage, its down-regulation reflects the body’s reduced capability in modulating oxidative stress that commonly occurs in diabetes/hyperglycemia. Other proteins that were consistently down-regulated in both type 1 and 2 diabetes include gelsolin (GSN) and transthyretin (TTR). Interestingly, transthyretin has been reported to have decreased levels in chronic pancreatitis (Lasztity et al., 2002) and is involved in the development of β cell failure/destruction in T1D (Refai et al., 2005).
The proteins we identified are known to play a role in the innate immune response, complement activation cascade, inflammatory response, and blood coagulation. Given this identification, their potential role in these physiological processes, as well as their contribution to the development of T1D, warrant further mechanistic investigation. Such investigations are also important, as none of these proteins have direct overlap with genetic susceptibility loci identified from the genome-wide association studies in T1D (Todd et al., 2007). Associations between altered innate immune responses and T1D have been shown by others in functional genomics studies (Wang et al., 2008), specifically of up-regulated proinflammatory factors using PBMCs and sera of type 1 diabetic subjects. Although the list of proteins that we identified did not exactly match the gene list in Wang et al. (2008), and our serum samples are not suitable for mRNA measurements, we note that some proteins identified in our study, such as PPBP (CXCL7) and SERPING1, are from the same gene families as those reported in previous studies (i.e., CXCL1, CXCL3, CXCL5, SERPINB2, SERPINB8). It should also be noted that gene regulation and protein abundance have a complicated relationship. In this respect, Vogel and Marcotte (2012) reported that regulatory processes after production of mRNAs (i.e., posttranscriptional, translational, and protein degradation regulation) play substantial roles in controlling steady-state protein abundances. Therefore, transcript abundances are not proxies for the concentration and activities of proteins.
With respect to the potential site of their origin (i.e., production), most of the proteins identified in our study are known to be secreted from the liver. Considering the (apparently) limited inflammation in the pancreatic insulitis of T1D, we reason that the abundance of these proteins likely reflects more of a secondary or systemic immune response to the pathology of T1D, although we cannot rule out that some proteins are a result of insulitis. If our belief is correct, one might presume the onset of disease to be the time of maximal production in T1D, as it has been suggested that the peak of insulitis occurs at or near the onset of disease (In’t Veld, 2011). To answer this question, and to address the important issue regarding the predictive value of these serum proteins for T1D prediction, in the future the abundances of these proteins in serum will be monitored using samples from the natural history studies of T1D (i.e., both before and long after disease onset).
To our knowledge, this is the first report that systemic, proteome-level dysregulation of the innate immune response is a characteristic of T1D, which sheds new light on the pathogenesis of this disease and may point to new strategies in diagnosis, intervention, and prevention. Importantly, some of these surrogate peptide markers are independent of the commonly used autoantibody assays for diagnosis of T1D, and are specific to only individuals with this disease, and not diabetes-induced hyperglycemia in general. Collectively, our results demonstrate the power of LC-MS–based proteomics technologies in the discovery and validation of peptide markers of T1D from human serum and plasma. Although the proteins we have identified have little overlap with those discovered by Zhi et al. (2011), the LC-MRM-MS peptide assays that we have developed provide an efficient approach to validate our findings in other large-scale, well-characterized population cohorts, as well as in future efforts to evaluate the significance of these peptides in predicting T1D progression and to unravel the roles of these proteins in the pathogenesis of T1D using natural history repository samples.
All chemicals and peptide-desalting solid phase extraction cartridges (Supelco Discovery DSC-18) were purchased from Sigma-Aldrich, and the Micro-BCA protein assay kit was obtained from Thermo Fisher Scientific. Sequencing-grade trypsin was purchased from Promega. All solvents used were LC-grade or higher. SIS peptides with uniformly [13C]- and [15N]-labeled arginine or lysine residues on C termini and carbamidomethyl modification of cysteine residues were custom synthesized by Thermo Fisher Scientific at the purity level of AQUA Basic (Purity >95%). The SIS peptides were received lyophilized and used as is without further purification. The amount of each SIS peptide was determined by the manufacturer before lyophilization.
Human serum and plasma samples for discovery, verification, and validation analyses were provided by the DASP program, in accordance with the Human Subjects policies and regulations of the United States Centers for Disease Control and Prevention. The DASP samples were anonymous and without donor identities. The samples for verification corresponded to 100 healthy control individuals and 50 patients diagnosed with T1D, with mixed ethnicities and genders in each group (Table S2). An additional 20 blind samples that were independent of the verification cohort were also provided by the DASP to validate the initial findings. Patient samples were collected worldwide and were taken from donors within 14 d of their starting insulin treatment. Healthy control samples were from individuals who self-reported no diabetes in themselves or their families. The blind samples were not decoded to the investigators until data analysis was complete and results were reported to the DASP (Table S3). For evaluation of the specificity of these markers to diabetes-induced hyperglycemia, serum samples from 50 age-similar, islet autoantibody−negative individuals with clinically confirmed T2D were obtained from the University of Colorado Denver (Table S4), and were previously collected under the approval of the University of Colorado’s Institutional Review Board (IRB). Similarly, all work reported here was approved by the IRB of the Pacific Northwest National Laboratory. All samples were received frozen on dry ice.
The AMT tag approach (Zimmer et al., 2006; Metz et al., 2008) was used in quantitative proteomic analyses to discover peptide/protein markers of T1D. In brief, aliquots of each control or patient serum/plasma sample (n = 50 each) were pooled, and the pooled samples were then subjected to immunoaffinity subtraction using a SuperMix LC2 immunodepletion system (Sigma-Aldrich) coupled with an Agilent 1100 series HPLC, as described previously (Qian et al., 2008). The flow-through fractions (low abundance proteins) were collected, pooled, and then concentrated in Amicon Ultra-15 concentrators (Millipore) with MWCO of 3 kD, followed by a buffer exchange to 50 mM NH4HCO3 in the same unit, according to the manufacturer’s instructions. As reported previously, sample proteins were next sequentially denatured, reduced, alkylated, and digested with trypsin; the peptide mixtures were then cleaned with C18 SPE cartridges and fractionated using strong cation exchange chromatography (Metz et al., 2008). A total of 30 peptide fractions were collected and analyzed in duplicate using a custom-built 4-column nanocapillary LC system coupled online to a linear ion-trap mass spectrometer (LTQ; Thermo Fisher Scientific). Peptides were separated on capillary columns (75 µm × 65 cm) packed in-house with 3-µm Jupiter C18 particles (Phenomenex; Qian et al., 2008). The LTQ was operated in data-dependent MS/MS mode, during which a full MS scan was followed by 10 MS/MS scan events.
The SEQUEST search algorithm was used to match the MS/MS fragmentation spectra with sequences from the IPI human protein database (Version 3.39); static carbamidomethylation of cysteine and dynamic oxidation of methionine were used for the database search. Database-matched results were filtered using criteria based on the cross correlation score (Xcorr), Δ correlation (ΔCn) values, trypsin cleavage rules, and charge states to limit false positive identifications to ~1% at the peptide level using the decoy database approach (Metz et al., 2008). Peptides passing these filter criteria were added to the AMT tag database. The final plasma AMT tag database contained 18,157 human plasma peptides available for matching to subsequent high-resolution LC-MS datasets (see following paragraph), which include data from a previous proteomics study of a DASP sample subset (Metz et al., 2008) and several other studies. The peptide elution times from each LC-MS/MS analysis were normalized to a range of 0 to 1 using a predictive normalized elution time (NET) model (Petritis et al., 2006). Both calculated monoisotopic masses and observed NETs of identified peptides were included in the AMT tag database.
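The decoy-based false positive estimate mentioned above can be sketched in simplified form. The study filtered on several criteria at once (Xcorr, ΔCn, cleavage rules, and charge state); the example below assumes a single score threshold and one common estimator (decoy hits divided by target hits), which may differ from the formula the authors used. All names and numbers are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate among target hits passing a threshold.

    Assumes decoy matches above the threshold occur about as often as
    false target matches above the same threshold.
    """
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    if targets == 0:
        return 0.0
    return decoys / targets


def loosest_threshold(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed target score whose estimated FDR stays within max_fdr."""
    passing = [t for t in sorted(set(target_scores))
               if decoy_fdr(target_scores, decoy_scores, t) <= max_fdr]
    return passing[0] if passing else None
```

In practice the threshold search would run over each combination of charge state and cleavage class, since score distributions differ between them.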
Aliquots of each individual control and patient serum/plasma sample were pooled to form 10 pooled control and 10 pooled patient samples, with samples from 5 individuals comprising each pool. Each pooled sample was subjected to immunoaffinity subtraction of abundant proteins using the SuperMix immunodepletion system, as described in the previous section, with the exception that both the flow-through (low abundance proteins) and bound (high abundance proteins) fractions were collected separately and subjected to tryptic digestion and clean-up, as described above. Peptides from each pooled sample were then analyzed in duplicate using the same nanocapillary LC system described above, which was coupled to a 9.4 Tesla Fourier transform ion cyclotron resonance-mass spectrometry (FTICR-MS; Bruker Daltonics) set to only collect high-resolution MS data.
LC-FTICR-MS datasets were processed using the PRISM Data Analysis system (Kiebel et al., 2006), a series of software tools (e.g., Decon2LS, VIPER; freely available at http://ncrr.pnl.gov/software/) developed in-house. Decon2LS functions to deisotope the raw MS data, providing the monoisotopic mass, charge state, and intensity of the major peaks in each MS spectrum. The data were then examined in a 2D fashion to identify “features” using VIPER; each feature has a median monoisotopic mass, central NET, and abundance estimate computed by adding up the intensities of the MS peaks that comprise the entire LC-FTICR-MS feature. To facilitate identification and quantification across multiple datasets, the detected features in each dataset (referring to data from a single LC-FTICR-MS analysis) were aligned against the peptides within the AMT tag database using the LCMSWARP algorithm. This is accomplished by comparing the measured monoisotopic masses and NETs of the detected features to the calculated ones of each of the peptides in the AMT tag database within search tolerances of ±3 ppm and ±0.02 NET for monoisotopic mass and elution time, respectively. This peak-matching process gave an initial list of peptide identifications for each individual dataset; in addition, all peptides were required to be observed in at least 50% of LC-FTICR-MS datasets in each disease state.
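The peak-matching step above can be illustrated with a minimal sketch that applies the stated ±3 ppm and ±0.02 NET tolerances. It assumes the features have already been aligned (the LCMSWARP step is not reproduced here), and the `AmtTag` type, function names, and placeholder peptide labels are hypothetical rather than part of the PRISM tools.

```python
from typing import List, NamedTuple


class AmtTag(NamedTuple):
    peptide: str   # peptide label (placeholder names in this sketch)
    mass: float    # calculated monoisotopic mass, Da
    net: float     # normalized elution time, 0 to 1


def ppm_error(observed_mass: float, calculated_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - calculated_mass) / calculated_mass * 1e6


def match_feature(feature_mass: float, feature_net: float,
                  amt_tags: List[AmtTag],
                  mass_tol_ppm: float = 3.0,
                  net_tol: float = 0.02) -> List[AmtTag]:
    """Return every AMT tag within both the mass and NET tolerances."""
    return [tag for tag in amt_tags
            if abs(ppm_error(feature_mass, tag.mass)) <= mass_tol_ppm
            and abs(feature_net - tag.net) <= net_tol]


def seen_in_enough_datasets(n_observed: int, n_datasets: int,
                            min_fraction: float = 0.5) -> bool:
    """Apply the 'observed in at least 50% of datasets' requirement."""
    return n_observed / n_datasets >= min_fraction
```

Requiring agreement in both dimensions is what lets a high-resolution MS measurement stand in for an MS/MS identification: two peptides with nearly identical masses rarely also share an elution time.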
DAnTE (Polpitiya et al., 2008) was then used for quantitative and statistical analysis of the identified peptide abundances. In brief, the matrix of peptide abundances from all LC-FTICR-MS analyses was log2 transformed and then normalized globally using a central tendency algorithm (Callister et al., 2006) to correct for variations in the data caused by differences in the amount of sample loaded onto the LC column and in peptide ionization efficiency. For each peptide, a Student’s t test was performed between the samples in the control and patient groups, and only peptides with p-values <0.05 were considered significantly changed. To facilitate visualization of quantitative changes in peptide abundances across all 20 samples, the data were Z-score transformed before loading into the open-source tool TIGR Multiexperiment Viewer (Saeed et al., 2006). Functional annotation of these proteins was performed online using DAVID (Huang et al., 2009).
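The transformation steps can be sketched as follows. This is an illustration, not the DAnTE code. It assumes that the central tendency normalization is a per-sample median centering and that the Z-score is computed per peptide across samples; neither detail is stated above. Function names are hypothetical, and the t test itself is omitted.

```python
import math
import statistics


def log2_transform(matrix):
    """Rows are peptides, columns are samples; None marks a missing value."""
    return [
        [math.log2(v) if v is not None and v > 0 else None for v in row]
        for row in matrix
    ]


def median_center(matrix):
    """ASSUMPTION: central tendency normalization as per-sample median centering."""
    n_cols = len(matrix[0])
    medians = []
    for j in range(n_cols):
        column = [row[j] for row in matrix if row[j] is not None]
        medians.append(statistics.median(column))
    return [
        [v - medians[j] if v is not None else None for j, v in enumerate(row)]
        for row in matrix
    ]


def zscore_rows(matrix):
    """ASSUMPTION: Z-score each peptide (row) across samples before plotting."""
    scaled = []
    for row in matrix:
        values = [v for v in row if v is not None]
        mean = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        scaled.append([
            None if v is None else (0.0 if sd == 0 else (v - mean) / sd)
            for v in row
        ])
    return scaled
```

Centering each column on its median removes a constant per-sample offset on the log scale, which is the kind of effect that differences in sample loading would produce.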
The processing and analysis of the global quantitative proteomics data resulted in the identification of 24 candidate protein biomarkers of T1D, the peptides of which were screened to remove those with the following characteristics: (a) nontryptic; (b) greater than 20 aa long; (c) shorter than 7 aa; and (d) containing methionine (Kuzyk et al., 2009). In addition, an attempt was made to remove peptides that contained cysteine or other known posttranslational modification sites. The best-scoring tandem mass spectra for each peptide identified in the global proteomics data were manually reviewed using Mass Analyzer (Zhang, 2004) and the peptide sequence fragmentation modeling tool in Molecular Weight Calculator to select the precursor and its six most intense fragment ions (i.e., MRM transitions). A tryptic digest (0.4 µg/µl) of a pooled serum sample from healthy subjects was used to screen for the detectability and specificity (i.e., whether other peptides sharing the same MRM transitions co-elute with the target peptides) of these transitions in a complex matrix using our LC-MRM-MS platform. The collision energies (CE) used to fragment each precursor were calculated from the following equations: for 2+ precursors, CE = 0.034 m/z + 3.314 and for 3+ precursors, CE = 0.044 m/z + 3.314, where m/z is the mass/charge value of the precursor (MacLean et al., 2010a). Only those peptides having relatively high signal-to-noise ratios (S/N) and few matrix interferences were retained for the next round of screening. After two additional rounds of screening, including the removal of peptides with poor LC-MRM-MS performance, all remaining peptides showed good detectability and specificity. The stable isotope-labeled versions of these peptides were then synthesized and used for establishing the final LC-MRM-MS assays and for spiking into each sample for verification of candidate biomarkers.
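The sequence-based screen and the collision energy equations can be written as a short sketch. The function names are hypothetical. The check for "tryptic" is a simplifying assumption (a C-terminal K or R), because the full definition depends on the flanking protein sequence, which is not available from the peptide alone.

```python
def passes_sequence_screen(peptide: str) -> bool:
    """Keep peptides that are tryptic, 7 to 20 residues long, and free of methionine."""
    # ASSUMPTION: a C-terminal K or R is used as a proxy for "tryptic".
    tryptic = peptide.endswith(("K", "R"))
    length_ok = 7 <= len(peptide) <= 20
    no_methionine = "M" not in peptide
    return tryptic and length_ok and no_methionine


def collision_energy(precursor_mz: float, charge: int) -> float:
    """CE from the linear equations given in the text for 2+ and 3+ precursors."""
    slopes = {2: 0.034, 3: 0.044}
    if charge not in slopes:
        raise ValueError("equations are given only for 2+ and 3+ precursors")
    return slopes[charge] * precursor_mz + 3.314
```

For example, a 2+ precursor at m/z 500 gives CE = 0.034 × 500 + 3.314 = 20.314, and a 3+ precursor at the same m/z gives 25.314.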
Because the SIS and endogenous peptides co-elute on the LC column and have the same ionization efficiency and fragmentation behavior under collision-induced dissociation, we used the SIS peptides to optimize the detection parameters of the endogenous peptides. In brief, the synthetic SIS peptides were individually dissolved in 0.1% formic acid/CH3CN (50:50, vol/vol) and then infused into a triple quadrupole mass spectrometer (TSQ; Thermo Fisher Scientific) to optimize their collision energies and to verify the MRM transitions selected during the screening process. The SIS peptides were then individually spiked into serum tryptic digests, and through isotope dilution experiments using LC-MRM-MS, their retention times and best single MRM transitions were determined for use in the final assay. The retention time information from all peptides was used to set up a segmented MRM-MS method, such that all peptide targets could be monitored in the final multiplexed LC-MRM-MS assay (Table S1). The linear dynamic ranges of the SIS peptides were also determined in these measurements.
Because of the large variations in endogenous levels of each peptide, their different ionization efficiencies, and their different linear dynamic ranges in quantification, the concentrations of spiked SIS peptides were individually adjusted to match the levels of the endogenous peptides to provide more accurate quantification (Kuzyk et al., 2009). Therefore, serum samples from 10 healthy subjects were pooled and tryptically digested to achieve a final peptide concentration of 0.5 µg/µl, which was used to titrate each SIS peptide to determine a concentration that would result in a SIS peptide peak area within a factor of 10 of that of the endogenous peptide. SIS peptides with optimized concentrations were then mixed in equal volumes to create a concentration-balanced mixture. Before the final processing of all cohort samples, the volume of SIS peptide mixture for spiking and the volume of serum/plasma sample to use for tryptic digestion were further optimized so that the endogenous peptides could be accurately quantified.
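The factor-of-10 criterion used in the titration can be expressed as a small sketch. The function names, the data layout, and the example numbers are hypothetical; only the acceptance rule is taken from the text.

```python
def within_factor(sis_area: float, endogenous_area: float, factor: float = 10.0) -> bool:
    """True when the SIS peak area is within `factor`-fold of the endogenous area."""
    ratio = sis_area / endogenous_area
    return 1.0 / factor <= ratio <= factor


def acceptable_concentrations(titration, endogenous_area, factor=10.0):
    """titration: list of (sis_concentration, sis_peak_area) pairs from one peptide."""
    return [
        concentration
        for concentration, area in titration
        if within_factor(area, endogenous_area, factor)
    ]


# Hypothetical titration series for one SIS peptide (arbitrary units).
example_titration = [(0.1, 1e3), (1.0, 1e4), (10.0, 1e5), (100.0, 1e6)]
print(acceptable_concentrations(example_titration, endogenous_area=2e4))
```

In this made-up series, only the two middle concentrations give SIS peak areas within 10-fold of the endogenous peak area.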
Samples were randomized before digestion to minimize the bias from sample handling. Next, 5 µl of each whole serum sample (~400 µg of proteins measured by BCA assay) was diluted with 45 µl of 8 M urea. Denatured samples were reduced with 10 mM of dithiothreitol at 37°C for 1 h, and then the reduced samples were alkylated with 40 mM iodoacetamide at room temperature for 1 h in the dark. Samples were then diluted with 50 mM NH4HCO3 (pH 8.1) to reduce the urea concentration to 0.8 M, followed by addition of 1 M CaCl2 solution to a final concentration of 1 mM and trypsin in the ratio of 1:50 enzyme/substrate (wt/wt). Digestion was performed overnight at 37°C.
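The dilution arithmetic in this protocol can be checked with a short calculation. It is a sketch that assumes the small volumes of dithiothreitol, iodoacetamide, CaCl2, and trypsin stock solutions are negligible, which the text does not state. Function names are hypothetical.

```python
def concentration_after_mixing(stock_molar, stock_volume_ul, total_volume_ul):
    """C1 * V1 = C2 * V2, solved for C2."""
    return stock_molar * stock_volume_ul / total_volume_ul


def total_volume_for_target(current_molar, current_volume_ul, target_molar):
    """C1 * V1 = C2 * V2, solved for V2."""
    return current_molar * current_volume_ul / target_molar


serum_ul = 5.0
urea_ul = 45.0
denatured_ul = serum_ul + urea_ul

# ASSUMPTION: reagent additions other than urea and buffer are ignored.
urea_molar = concentration_after_mixing(8.0, urea_ul, denatured_ul)   # about 7.2 M
final_ul = total_volume_for_target(urea_molar, denatured_ul, 0.8)     # about 450 µl
buffer_ul = final_ul - denatured_ul                                   # about 400 µl

protein_ug = 400.0
trypsin_ug = protein_ug / 50.0                                        # 1:50 (wt/wt)
```

Under these assumptions, roughly 400 µl of 50 mM NH4HCO3 brings the urea to 0.8 M, and about 8 µg of trypsin is needed for ~400 µg of protein.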
Samples were acidified by adding 20% formic acid to a final concentration of 1% formic acid to stop the digestion. Next, 15 µl of a concentration-balanced mixture of SIS peptides (in 0.1% formic acid) was added to each serum digest. Samples were then desalted on C18 SPE cartridges (50 mg) and eluted peptides were dried in vacuo and reconstituted with 0.1% formic acid. The final concentrations of the peptide mixtures were adjusted to 0.5 µg/µl before they were subjected to LC-MRM-MS analysis.
A nanoACQUITY LC system (Waters) equipped with a BEH130 C18 capillary UPLC column (100 µm × 100 mm, 1.7-µm particle size) was used for separation of tryptic digests. 1 µl (corresponding to 0.01 µl of original blood plasma/serum) of each sample was injected onto the column before the start of gradient LC separation. The LC flow rate was set at 0.40 µl/min with the following mobile phases: A, 0.1% formic acid (FA) in water; B, 0.1% FA in acetonitrile. The following gradient was used: 0 min, 0.5% B; 0.5 min, 10% B; 4 min, 15% B; 25 min, 25% B; 36 min, 38.5% B; 37 to 41 min, 95% B; 42 min, 10% B; 43–60 min, 0.5% B. The effluent from the LC column was ionized using a spray voltage of 2,400 V and peptides were detected using a triple quadrupole mass spectrometer (TSQ Vantage; Thermo Fisher Scientific). Other acquisition parameters were as follows: collision gas pressure of 1.5 mTorr; scan width of 0.002 m/z; scan time of 0.015 s and peak width of 0.7 for both Q1 and Q3. Each sample was analyzed in duplicate.
The acquired datasets were imported into Skyline (MacLean et al., 2010b). LC-MS peak area integration was manually reviewed, and the peak area ratios between endogenous and SIS peptides, as well as other peptide identification information, were exported in tabular, comma-separated value (csv) format using a customized Skyline format. Further statistical analysis was performed on the csv-formatted data matrix. We used DAnTE to perform Pearson correlation, principal component (PCA), and partial least squares (PLS) analyses to identify outlier samples. The peak area ratio distribution was also tested for normality using a Shapiro-Wilk test as implemented in DAnTE (Polpitiya et al., 2008). Because abundances (peak area ratios) for the majority of peptides were not normally distributed, the nonparametric Mann-Whitney U test was used to determine significance. This test and the fold change calculations between control and patient groups were both performed in DAnTE. SigmaPlot (version 11.0) was used for ROC curve analysis and for drawing box-whisker plots and bar charts. ROC curve analysis was used to evaluate the performance of each peptide assay in discriminating disease from control. The AUC with 95% confidence interval was calculated assuming a nonparametric distribution. An AUC of 1.00 would indicate that the peptide achieved 100% accuracy in identifying disease, and an AUC of 0.50 would indicate that the assignment of disease/control is entirely random. To facilitate the identification of samples in the blind group, the cut-off value of peak area ratio corresponding to sensitivity at 90% specificity (AS90) was obtained from the sensitivity and specificity report of ROC curve analysis for each peptide, and it was used as a threshold to classify the blind samples into control and disease.
The sensitivity of each peptide assay was then calculated as the percentage of patient sera in the blind samples reported as positive using the individual cut-off values at AS90, and the specificity of each peptide assay was calculated as the percentage of control sera reported as negative using the same threshold. The endogenous concentration of each peptide/protein was calculated from the amount of SIS peptide spiked and the average endogenous/SIS peak area ratio.
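The ROC-based evaluation and the isotope dilution calculation can be illustrated with a minimal sketch. It is not the SigmaPlot or DAnTE procedure. It assumes the peptide is higher in patients than in controls (the comparisons flip for peptides that are lower in disease), computes the AUC as the Mann-Whitney probability, and omits confidence intervals. All function names and the label strings are hypothetical.

```python
import statistics


def roc_auc(controls, patients):
    """Probability that a patient value exceeds a control value (ties count 0.5)."""
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


def cutoff_at_specificity(controls, target=0.90):
    """Smallest control value t for which calling 'ratio > t' positive
    leaves at least `target` of the controls negative."""
    ordered = sorted(controls)
    for t in ordered:
        negatives = sum(1 for c in ordered if c <= t)
        if negatives / len(ordered) >= target:
            return t
    return ordered[-1]


def sensitivity_specificity(ratios, labels, cutoff):
    """labels use 'patient' or 'control'; a ratio above the cutoff is positive."""
    pairs = list(zip(ratios, labels))
    true_pos = sum(1 for r, lab in pairs if lab == "patient" and r > cutoff)
    true_neg = sum(1 for r, lab in pairs if lab == "control" and r <= cutoff)
    return true_pos / labels.count("patient"), true_neg / labels.count("control")


def endogenous_amount(sis_amount, replicate_ratios):
    """Spiked SIS amount times the mean endogenous/SIS peak area ratio.
    The result carries the same unit as sis_amount."""
    return sis_amount * statistics.mean(replicate_ratios)
```

With hypothetical control ratios 0.1 to 1.0 in steps of 0.1 and patient ratios 0.5, 0.95, 1.5, 2.0, and 2.5, the cut-off at 90% specificity is 0.9, and four of the five patient values exceed it.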
Table S1 shows the detailed parameters for the 52 peptide LC-MRM-MS assays. Tables S2–S4 show the clinical data for the DASP verification, blind set, and T2D cohorts, respectively. Online supplemental material is available at http://www.jem.org/cgi/content/full/jem.20111843/DC1.
This research was supported by National Institutes of Health grant DK070146, and Laboratory Directed Research and Development and Technology Maturation Grants of Pacific Northwest National Laboratory (PNNL). Portions of this research were supported by the National Center for Research Resources (P41RR018522), the National Institute of General Medical Sciences (P41GM103493), and National Institute of Diabetes and Digestive and Kidney Diseases grants DK32083 and P30 DK57516. Work was performed at the Environmental Molecular Sciences Laboratory, a national scientific user facility located at PNNL and sponsored by the U. S. Department of Energy (DOE) Office of Biological and Environmental Research. PNNL is operated by Battelle for the DOE under contract no. DE-AC06-76RLO-1830. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Battelle Memorial Institute/PNNL has filed a patent application based on this work. The authors have no additional financial interests.
Q. Zhang designed the study; performed experiments, data analyses, and interpretation; and wrote the manuscript. T.L. Fillmore, A.A. Schepmoes, T.R.W. Clauss, and M.A. Gritsenko performed experiments. P.W. Mueller and M. Rewers provided samples. M. Rewers, M.A. Atkinson, and R.D. Smith contributed to discussion and edited the manuscript. T.O. Metz contributed to study design and data analysis and edited the manuscript. All authors have read and approved the final manuscript. Q.Z. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.