200 proof molecular biology grade ethanol, LC-MS grade formic acid, and iodoacetamide were purchased from Sigma-Aldrich (St. Louis, MO). Sodium dodecyl sulfate (SDS) and Tris were purchased from Bio-Rad (Hercules, CA). Dithiothreitol (DTT) was obtained from GE Healthcare (Piscataway, NJ). HPLC grade acetonitrile was purchased from Thomas Scientific (Swedesboro, NJ). Sequencing grade modified trypsin was purchased from Promega (Madison, WI).
Serum was collected from nine patients with an ectopic pregnancy and nine matched controls with normal intrauterine pregnancies. Specimens were matched based on gestational age (range of 4 weeks, 2 days to 11 weeks, 3 days), hCG level (3821–52430 mIU/ml) and diagnosis (EP or IUP). Blood was collected by venipuncture into BD Vacutainer red/grey serum separator tubes (BD, Franklin Lakes, NJ), allowed to clot at RT, and centrifuged. Serum was then aliquoted, frozen, and stored at −80 °C.
Samples were depleted of 20 abundant serum proteins using a ProteoPrep20 Immunodepletion Column (Sigma-Aldrich). Typically, 100 μL of serum was filtered through a 0.22 μm microcentrifuge filter and injected onto the column. The flow-through fractions containing unbound proteins were collected, pooled, and precipitated with nine volumes of 200 proof ethanol, pre-chilled to −20 °C. Ethanol supernatants were carefully removed and protein pellets were frozen and stored at −20 °C until further use. Fractions containing affinity-bound abundant proteins were collected and pooled, neutralized with 1M NaOH, and frozen for possible future analysis.
SDS-PAGE/In-Gel Trypsin Digestion
Prior to 1-D SDS-PAGE, frozen protein pellets from ethanol precipitation of depleted serum were thawed briefly and re-suspended in 50 mM Tris-Cl, 1% SDS, pH 8.5. Samples were reduced with 20 mM DTT for 1 h at 37 °C and alkylated with 60 mM IAM in 50 mM Tris-Cl, pH 8.5 for 1 h at 37 °C. Alkylation was quenched with 50 mM DTT for 15 min at 37 °C. Following in-solution reduction and alkylation, samples were prepared for PAGE by addition of SDS sample buffer. For each sample, aliquots representing 10 μL of original serum per lane were loaded into 10-well 12% NuPAGE mini-gels (Invitrogen, Carlsbad, CA) and separated using MES running buffer until the tracking dye had migrated 2 cm. Gels were stained with Colloidal Blue (Invitrogen), and each lane was subsequently sliced into 21 uniform 1 mm slices using a custom razor-blade array. Corresponding slices from three lanes for each depleted serum sample were combined in single wells of a 96-well pierced plate (Biomachines, Inc., Carrboro, NC). Gel slices were digested overnight using 0.02 μg/mL modified trypsin. Following digestion, aliquots of corresponding fractions from three patients in each group were pooled to produce three EP and three IUP serum fraction pools. These pools and the remainder of individual sample digests were frozen and stored at −20 °C for future discovery and validation analyses, respectively.
For initial discovery of candidate biomarkers, pooled tryptic digests were analyzed in duplicate using an LTQ-Orbitrap XL mass spectrometer (Thermo Scientific, Waltham, MA) interfaced with a Nano-ACQUITY UPLC system (Waters, Milford, MA) with the column heater maintained at 40 °C. For each tryptic digest, 6 μL was injected onto a UPLC Symmetry trap column (180 μm i.d. × 2 cm packed with 5 μm C18 resin; Waters), and tryptic peptides were separated by RP-HPLC on a BEH C18 nanocapillary analytical column (75 μm i.d. × 25 cm, 1.7 μm particle size; Waters). Solvent A was Milli-Q (Millipore, Billerica, MA) water containing 0.1% formic acid, and Solvent B was acetonitrile (ACN) containing 0.1% formic acid. Peptides were eluted at 200 nL/min using an ACN gradient consisting of 5–28% B over 42 min, 28–50% B over 25.5 min, 50–80% B over 5 min, and 80% B for 4.5 min before returning to 5% B over 0.5 min. The column was re-equilibrated using 5% B at 400 nL/min for 20 min before injecting the next sample. The mass spectrometer was set to scan m/z from 400 to 2000. The full MS scan was collected at 60,000 resolution in the Orbitrap in profile mode, followed by data-dependent MS/MS scans on the three most abundant ions exceeding a minimum threshold of 1000, collected in the linear trap. Monoisotopic precursor selection was enabled, and charge-state screening was enabled to reject z = 1 ions. Ions subjected to MS/MS were excluded from repeated analysis for 60 s. The order of sample analysis was randomized to prevent temporal experimental bias. Mass spectrometer, HPLC, and autoinjector performance were rigorously monitored to maintain mass accuracies within 2 ppm, retention times within a ±1.0 min window, and injection volumes within ±10% to facilitate label-free pattern comparisons.
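The gradient program described above can be written down as a small breakpoint table, which makes it easy to check the solvent composition at any time point and the total cycle length. The following is a minimal sketch, not instrument software; the names `GRADIENT`, `percent_b`, and `total_cycle_min` are illustrative, and linear interpolation between breakpoints is assumed.

```python
# Gradient breakpoints as (time in min, %B), transcribed from the text:
# 5-28% B over 42 min, 28-50% over 25.5 min, 50-80% over 5 min,
# hold at 80% for 4.5 min, return to 5% over 0.5 min.
GRADIENT = [
    (0.0, 5.0),
    (42.0, 28.0),
    (67.5, 50.0),
    (72.5, 80.0),
    (77.0, 80.0),
    (77.5, 5.0),
]
REEQUILIBRATION_MIN = 20.0  # re-equilibration at 5% B before the next injection


def percent_b(t_min: float) -> float:
    """Return %B at t_min, assuming linear ramps between breakpoints."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # after the program ends, the column sits at 5% B


def total_cycle_min() -> float:
    """Gradient program plus re-equilibration, in minutes."""
    return GRADIENT[-1][0] + REEQUILIBRATION_MIN
```

Under these assumptions the gradient program lasts 77.5 min and a full injection cycle, including re-equilibration, lasts 97.5 min.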
Label-Free Quantitation Using the Rosetta Elucidator System
LC-MS and LC-MS/MS data were analyzed using the Rosetta Elucidator system. A total of 252 raw MS spectra files were imported into the system (6 depleted serum pools × 21 fractions × duplicates). LC-MS data were acquired from 0–98 min, but based on elution profiles of peptides and density of ion signals, data for the label-free comparison were trimmed to 20–75 min and the m/z range was trimmed to 400–1800. Retention time (RT) alignment, feature identification (discrete ion signals), and feature extraction across the entire chromatographic time window were performed by the Elucidator software, essentially as described by others.29,30
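The file count and the trimming window follow directly from the experimental design and can be checked with a short sketch. The code below is only an illustration of the bookkeeping, assuming each run is identified by group, pool, gel fraction, and replicate; the function names are hypothetical and are not part of the Elucidator software.

```python
from itertools import product

GROUPS = ("EP", "IUP")
POOLS_PER_GROUP = 3      # three pools of three patients per group
FRACTIONS = 21           # gel slices per lane
REPLICATES = 2           # duplicate LC-MS/MS injections
RT_WINDOW_MIN = (20.0, 75.0)
MZ_WINDOW = (400.0, 1800.0)


def enumerate_runs():
    """List every (group, pool, fraction, replicate) combination."""
    return list(
        product(
            GROUPS,
            range(1, POOLS_PER_GROUP + 1),
            range(1, FRACTIONS + 1),
            range(1, REPLICATES + 1),
        )
    )


def in_trimmed_window(rt_min: float, mz: float) -> bool:
    """True if an ion signal falls inside the trimmed RT and m/z ranges."""
    return (
        RT_WINDOW_MIN[0] <= rt_min <= RT_WINDOW_MIN[1]
        and MZ_WINDOW[0] <= mz <= MZ_WINDOW[1]
    )
```

Enumerating the design this way gives 2 × 3 × 21 × 2 = 252 runs, matching the number of imported raw files.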
DTAs were created with BioWorks v. 3.3.1 (Thermo Scientific) using high-quality features with z > 1 and z < 5, and having peak scores greater than 0.7 and 0.8 for RT and m/z, respectively. Peak scores, as defined in the Rosetta Elucidator System User Guide,31 are correlation coefficients that compare the shape of a feature in the time and m/z dimensions to the shape of an ideal peak, with an ideal peak having a score of 1.
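The feature selection criteria above amount to a simple predicate on charge state and the two peak scores. A minimal sketch follows; the `Feature` class and its field names are hypothetical and do not reflect the Elucidator data model.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    charge: int            # assigned charge state z
    rt_peak_score: float   # peak-shape correlation in the time dimension
    mz_peak_score: float   # peak-shape correlation in the m/z dimension


def is_high_quality(
    feature: Feature,
    min_rt_score: float = 0.7,
    min_mz_score: float = 0.8,
) -> bool:
    """Apply the stated thresholds: 1 < z < 5, RT score > 0.7, m/z score > 0.8."""
    return (
        1 < feature.charge < 5
        and feature.rt_peak_score > min_rt_score
        and feature.mz_peak_score > min_mz_score
    )
```

For example, a doubly charged feature with scores of 0.9 (RT) and 0.85 (m/z) would be retained, whereas a singly charged feature with the same scores would not.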
DTAs were searched using the SEQUEST algorithm (v. 28, rev. 13, University of Washington, Seattle, WA) with a full tryptic constraint against a human UniRef100 protein sequence database (10/23/2007, 84,662 entries) to which commonly observed “contaminants” were added (trypsin, keratins, etc.). A decoy database was produced by reversing the protein sequence of each database entry, and the entire reversed database was placed in front of the forward database. Peptide and protein information was assigned to features using the Protein and Peptide Tellers, which are Rosetta Biosoftware's re-implementations of the open-source ProteinProphet™ and PeptideProphet® programs,32,33 respectively. Specifically, as described in the Rosetta Elucidator System User Guide, Peptide Teller validates peptides assigned to MS/MS spectra by search engines by computing probabilities that search results are correct in the dataset based on search scores and peptide properties. Protein Teller computes probabilities that proteins were present in a sample based on the combined probabilities of their corresponding peptides. Importantly, it deals with two issues critical for protein inference. First, correct peptides often correspond to multi-hit proteins, whereas incorrect peptides most often correspond to single-hit proteins. This non-random grouping of peptides with their corresponding proteins can lead to an amplification of the false positive error rate at the protein level. Protein Teller counteracts this effect by penalizing peptides corresponding to single-hit proteins by an appropriate amount learned from each data set. Second, a substantial number of identified peptides are common to multiple database entries. This is especially true for human and other higher eukaryotic species, whose databases usually contain alternative splice forms, large homologous protein families, and partial sequences. Protein Teller apportions common peptides among all corresponding proteins to derive the simplest list of proteins that can explain the observed peptides.31
Data were filtered using Protein Teller scores of correct identification probability > 0.95 and Peptide Teller scores > 0.8.
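The decoy database construction and the final probability filter can be sketched in a few lines. This is an illustration only, not the SEQUEST or Elucidator implementation; the `REV_` accession prefix and the function names are assumptions introduced here.

```python
def build_target_decoy(entries, decoy_prefix="REV_"):
    """Reverse every protein sequence and place the decoys ahead of the targets.

    entries: list of (accession, sequence) tuples.
    decoy_prefix is a hypothetical label used to mark reversed entries.
    """
    decoys = [(decoy_prefix + accession, sequence[::-1]) for accession, sequence in entries]
    return decoys + list(entries)


def passes_filters(protein_probability: float, peptide_probability: float) -> bool:
    """Keep identifications with Protein Teller > 0.95 and Peptide Teller > 0.8."""
    return protein_probability > 0.95 and peptide_probability > 0.8
```

With a two-entry input, the combined database holds four entries, the first two of which are the reversed sequences.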
Identification of Differentially Expressed Proteins of Interest
The experiment was defined in the Elucidator System as having two treatment groups (EP, IUP). Each treatment group included three pools of three individual serum samples and two technical replicates per group. Several strategies and tools within the Elucidator System were used to analyze the data, including differences at the annotated peptide level, the protein level, and peptide trend plots. Specifically, the 2-D visual script shown in Supplemental Figure S1 utilized peptide annotation to sum feature intensities across gel slice fractions within each sample, and peptides significantly different between groups were defined using a two-way Analysis of Variance (ANOVA) with p < 0.001. Peptides were grouped into consensus proteins using Protein Teller, and protein-level ratios were determined using those peptides that were significantly different between groups, as defined by ANOVA.
A subsequent independent manual analysis was conducted by exporting the peptide report results, which included values for technical replicates, into Microsoft Excel (Microsoft Corporation, Redmond, WA). Peptides were grouped into proteins based on protein description and pair-wise ratios between average intensities of IUP and EP were calculated for each peptide as well as the summed intensity for the protein. In addition, a further statistical test was developed independently to identify those peptides with the greatest discrimination power between groups, as summarized below.
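The spreadsheet calculation described above, summing each peptide's intensity across gel fractions and then comparing group averages, can be sketched as follows. The record layout, the function names, and the choice of IUP as numerator are assumptions made for illustration; the original analysis was carried out in Excel.

```python
from collections import defaultdict
from statistics import mean


def sum_across_fractions(records):
    """Sum intensities over gel fractions for each (group, sample, peptide).

    records: iterable of (group, sample, peptide, fraction, intensity).
    """
    totals = defaultdict(float)
    for group, sample, peptide, _fraction, intensity in records:
        totals[(group, sample, peptide)] += intensity
    return dict(totals)


def group_ratio(totals, peptide, numerator="IUP", denominator="EP"):
    """Ratio of mean summed intensity between the two groups for one peptide."""
    num = [v for (g, _s, p), v in totals.items() if g == numerator and p == peptide]
    den = [v for (g, _s, p), v in totals.items() if g == denominator and p == peptide]
    return mean(num) / mean(den)
```

A protein-level ratio could be obtained the same way after summing the per-sample totals of all peptides assigned to that protein.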
Identification of the Most Significant Peptide Differences
We assumed that peptide logarithmic expression levels in each sample were normally distributed and introduced two statistical measurements, sum-of-Z-score (sumZscores) and probability-of-misclassification (Pm), to objectively quantitate the separation between the two distributions. Given two normal distributions with means and variances (μ₁, σ₁²) and (μ₂, σ₂²), respectively, sumZscores computes the distance between the two means in terms of Z-scores, taking into account the widths of the distributions. Explicitly, we have the following expression for sumZscores,
On the other hand, the probability-of-misclassification (Pm) of a peptide represents the minimal theoretical error that would occur if we were to classify samples from a balanced mixture of two normal distributions into the EP or IUP group by thresholding on the logarithmic expression level of that peptide. In practice, the optimal threshold value can be found by solving a quadratic equation for the point(s) where the two normal distributions yield equal density and then selecting the one with the lower classification error. The value for Pm is then computed as the corresponding minimal theoretical error. A detailed derivation of Pm is described in the Supporting Information.
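The Pm calculation outlined above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the quadratic is obtained here by equating the logarithms of the two normal densities, and taking the better of the two label orientations at each threshold is a design choice made for this example.

```python
import math
from statistics import NormalDist


def threshold_error(d1, d2, t):
    """Error of the rule 'below t -> group 1, else group 2' on a 50/50 mixture."""
    return 0.5 * (1.0 - d1.cdf(t)) + 0.5 * d2.cdf(t)


def equal_density_points(mu1, sigma1, mu2, sigma2):
    """Solve a*x^2 + b*x + c = 0 for the points where the two densities are equal."""
    a = 1.0 / (2.0 * sigma1**2) - 1.0 / (2.0 * sigma2**2)
    b = mu2 / sigma2**2 - mu1 / sigma1**2
    c = (
        mu1**2 / (2.0 * sigma1**2)
        - mu2**2 / (2.0 * sigma2**2)
        + math.log(sigma1 / sigma2)
    )
    if abs(a) < 1e-12:          # equal variances: the equation is linear
        if abs(b) < 1e-12:      # identical distributions: no usable threshold
            return []
        return [-c / b]
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]


def probability_of_misclassification(mu1, sigma1, mu2, sigma2):
    """Smallest single-threshold error among the equal-density points."""
    d1, d2 = NormalDist(mu1, sigma1), NormalDist(mu2, sigma2)
    best = 0.5  # chance level for a balanced mixture
    for t in equal_density_points(mu1, sigma1, mu2, sigma2):
        err = threshold_error(d1, d2, t)
        best = min(best, err, 1.0 - err)  # allow either label orientation
    return best
```

For two distributions with equal standard deviation 1 and means 0 and 2, the single equal-density point is the midpoint, 1, and the resulting error equals the upper-tail probability of a standard normal beyond 1.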
Targeted LC-MS/MS Analysis
Targeted LC-MS/MS analyses for proteins of interest were performed on an LTQ-Orbitrap XL mass spectrometer coupled to a Nano-ACQUITY UPLC system. Targeted analysis was used to verify the initial peptide and protein identifications of putative biomarkers of interest, distinguish between related protein isoforms where needed, and increase the number of identified peptides where needed for subsequent quantitative assay development. Columns, solvents, and gradient used were as described above for LC-MS/MS. A list of m/z values representing the targeted peptides was generated and placed into the parent mass list of the MS method. The mass spectrometer was set to scan m/z from 360 to 2000 at 60,000 resolution in the Orbitrap, followed by data-dependent ion trap MS/MS scans of up to the three most abundant ions from the parent mass list that exceeded a minimum threshold of 500. Targeted ions were monitored throughout the entire run with an m/z tolerance of ±10 ppm. Dynamic exclusion was enabled with a repeat count of 2, repeat duration of 10 s, and exclusion duration of 10 s. Monoisotopic precursor selection was not enabled, and charge-state screening was set to reject singly charged ions and ions with unknown charge state.
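The ±10 ppm tolerance used for the parent mass list translates into an absolute m/z window that scales with the target mass. A minimal sketch of that arithmetic is given below; the function names are illustrative and do not correspond to any instrument control API.

```python
def ppm_window(target_mz: float, tol_ppm: float = 10.0):
    """Return the (low, high) m/z bounds for a target at the given ppm tolerance."""
    delta = target_mz * tol_ppm / 1e6
    return target_mz - delta, target_mz + delta


def matches_parent_list(observed_mz: float, parent_mz_list, tol_ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within tol_ppm of any listed target."""
    return any(
        abs(observed_mz - target) / target * 1e6 <= tol_ppm
        for target in parent_mz_list
    )
```

At m/z 1000, for instance, ±10 ppm corresponds to a window of ±0.01 m/z units.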
Label-Free Multiple Reaction Monitoring (MRM)
MRM experiments were performed on a 4000 Q TRAP hybrid triple quadrupole/linear ion trap mass spectrometer (Applied Biosystems, Foster City, CA) interfaced with a NanoACQUITY UPLC system. Chromatography was performed with Solvent A (Milli-Q water with 0.1% formic acid) and Solvent B (acetonitrile with 0.1% formic acid). Typically, 5 μl of an appropriate tryptic digest was injected in duplicate on PicoFrit columns (75-μm i.d., 15-μm tip opening; New Objective, Woburn, MA) packed in house with 25 cm of Magic C18 3-μm reversed-phase resin (Michrom Bioresources, Auburn, CA). Peptides were eluted at 300 nL/min using an acetonitrile gradient consisting of 5–35% B over 15 min, 35–70% B over 5 min, 70% B for 5 min before returning to 5% B in 0.5 min. To minimize sample carryover, a blank was run between each sample. Data were acquired with a spray voltage of 2,800 V, curtain gas of 20 p.s.i., nebulizer gas of 10 p.s.i., and an interface heater temperature of 150 °C. At least three MRM transitions per peptide, and three peptides per protein were monitored and acquired at unit resolution in both Q1 and Q3 quadrupoles to maximize specificity. Scheduled MRM also was used to reduce the number of concurrent transitions and maximize the dwell time for each transition. The MRM detection window was set at 4 min, and target scan time was set at 1 s. The final MRM method included 60 optimized transitions for five target proteins. Data analysis was performed using MultiQuant version 1.1 software (AB/MDS Sciex, Foster City, CA). The most abundant transition for each peptide was used for quantification unless interference from the matrix was observed. In these cases, another transition free of interference was chosen for quantification.
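Two aspects of the MRM method lend themselves to a short sketch: how scheduling limits the number of concurrent transitions (and therefore lengthens dwell time), and how a quantifier transition is chosen. The code below is a simplified illustration under stated assumptions: the 4 min detection window is taken as centered on the expected retention time, dwell time is approximated as target scan time divided by the number of concurrent transitions (ignoring inter-transition pause time), and the fallback quantifier is taken to be the most abundant interference-free transition. None of the names come from the instrument or MultiQuant software.

```python
def concurrent_transitions(expected_rts, t_min: float, window_min: float = 4.0) -> int:
    """Count transitions whose detection window contains time t_min."""
    half = window_min / 2.0
    return sum(1 for rt in expected_rts if rt - half <= t_min <= rt + half)


def dwell_time_ms(n_concurrent: int, target_scan_time_s: float = 1.0) -> float:
    """Approximate per-transition dwell time; assumes no pause between transitions."""
    if n_concurrent <= 0:
        raise ValueError("n_concurrent must be positive")
    return target_scan_time_s * 1000.0 / n_concurrent


def pick_quantifier(transitions):
    """Choose the most abundant transition that shows no matrix interference.

    transitions: list of (name, peak_area, has_interference) tuples.
    Returns the transition name, or None if every transition is flagged.
    """
    clean = [t for t in transitions if not t[2]]
    if not clean:
        return None
    return max(clean, key=lambda t: t[1])[0]
```

With expected retention times of 10, 11, and 20 min, two transitions are active at 10.5 min, giving an approximate dwell time of 500 ms each under these assumptions.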