Proteomics discovery of novel cancer serum biomarkers is hindered by the great complexity of serum, patient-to-patient variability, and triggering by the tumor of an acute-phase inflammatory reaction. This host response alters many serum protein levels in cancer patients, but these changes have low specificity as they can be triggered by diverse causes. We addressed these hurdles by utilizing a xenograft mouse model coupled with an in-depth 4-D protein profiling method to identify human proteins in the mouse serum. This strategy ensures identified putative biomarkers are shed by the tumor, and detection of low-abundance proteins shed by the tumor is enhanced because the mouse blood volume is more than a thousand times smaller than that of a human. Using TOV-112D ovarian tumors, more than 200 human proteins were identified in the mouse serum, including novel candidate biomarkers and proteins previously reported to be elevated in either ovarian tumors or the blood of ovarian cancer patients. Subsequent quantitation of selected putative biomarkers in human sera using label-free multiple reaction monitoring (MRM) mass spectrometry (MS) showed that chloride intracellular channel 1, the mature form of cathepsin D, and peroxiredoxin 6 were elevated significantly in sera from ovarian carcinoma patients.
Ovarian cancer is the fifth-leading cause of cancer-related death in women in the United States, and is the most lethal of all gynecological malignancies.1 In 2010, an estimated 21,880 women were diagnosed with ovarian cancer, and 13,850 deaths occurred in the United States alone.1 The most common and deadly form of ovarian cancer is epithelial ovarian cancer, which further can be divided into four major histopathological groups: serous, endometrioid, mucinous, and clear cell tumors.2,3 The high mortality rate of ovarian cancer is due largely to the lack of effective screening strategies for early detection. When ovarian cancer is diagnosed at an early stage (stages I or II), treatment is highly effective, with a five-year survival rate of up to 90%, whereas the five-year survival rate for patients with advanced disease (stages III and IV) is reduced to 30% or less.4, 5 Unfortunately, most ovarian cancers are not diagnosed until after the cancer has spread, primarily because earlier-stage diseases are asymptomatic and the ovaries are buried deep within the body.
Current screening methods for ovarian cancer typically use a combination of pelvic examination, transvaginal ultrasonography, and serum CA125, but these methods are not effective in detecting early-stage ovarian cancer.6–8 In addition, CA125 is recognized as a poor protein biomarker for early detection due to its high false positive rate and poor sensitivity and specificity.9, 10 Other promising biomarkers have been reported,11, 12 but a recently completed study comparing many of these protein biomarkers showed that none of them performed better than CA125 as a biomarker for ovarian cancer.13 A few groups also have used panels of biomarkers and obtained better sensitivity and specificity than CA125 alone when used in diagnostic samples.14–17 However, a recent study found that available biomarker panels did not outperform CA125 when used in prediagnostic samples.18 Therefore, better biomarkers that could diagnose early-stage ovarian cancer with high sensitivity and specificity are needed. Furthermore, it is unlikely that any single protein will have adequate specificity and sensitivity for early diagnosis of most solid-tumor cancers. Instead, multiple novel biomarkers must be identified and analyzed in combination to identify biomarker panels that can outperform the use of CA125 alone.
Proteomics technology offers a conceptually attractive platform for cancer biomarker discovery.19 Human blood, in the form of plasma or serum, is one of the most valuable specimens for protein biomarker discovery because it is routinely collected, collection is minimally invasive, and it contains thousands of proteins, including those secreted or shed into the blood by tumors.20 However, systematic discovery of serological biomarkers directly from human serum using proteomics has proven extremely challenging because blood protein concentrations span more than 10 orders of magnitude. In addition, the most tumor-specific proteins are very likely to be shed primarily by the tumor and to be present at very low abundance in blood, as exemplified by well-known cancer biomarkers such as PSA and CEA, which are present in serum in the low ng/mL to pg/mL range.20, 21 Most cancers and other diseases also elicit a wide range of host response mechanisms, producing many acute-phase or inflammation-related proteins. It is unlikely that most such relatively general host responses will have sufficient specificity and sensitivity for cancer detection in at-risk populations, although selected inflammation-related biomarkers could contribute to panels of biomarkers that include proteins specifically shed by the tumor. Regardless, it is clear that these common, acute-phase-related changes in serum proteins hamper discovery of tumor-specific proteins when directly profiling sera in human populations. Finally, individual protein levels in blood are highly variable in the human population due to extensive genetic, physiological, and environmental variations, requiring analysis of many patient and control samples before statistically significant, disease-related differences can be identified.
The dynamic range and complexity of the blood proteome can partially be addressed by major protein depletion and multidimensional sample prefractionation. We and others have shown that multidimensional sample prefractionation prior to mass spectrometry analysis greatly enhances proteome coverage and allows detection of low-abundance proteins, at least down to the low ng/mL range.22–27 To overcome the genetic, physiological, and environmental variability associated with analyzing human samples, many less complex experimental models, including cancer cell lines in culture,28, 29 cancer tissue specimens,30, 31 ascites fluid,32, 33 secretomes,34, 35 and mouse models,36–38 have been used in ovarian cancer biomarker discovery. Each model has its benefits, but most strategies, except for the use of mouse models, are not able to determine if the discovered biomarkers are actually shed into blood. In ovarian cancer, the use of both genetically engineered and xenograft mouse models to facilitate serum biomarker discovery has been described.36–38 Even though subject-to-subject heterogeneity is considerably reduced with the use of genetically engineered mouse models, these models still produce many host-response protein changes that can be difficult to distinguish from more tumor-specific protein changes.37
The use of xenograft mouse models has several advantages over other models. First, proteins shed by human tumors into mouse blood can be unambiguously distinguished from less-specific host responses by exploiting species differences in peptide sequences identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Second, the blood volume of a mouse is approximately 5,000 times less than that of an adult human. Therefore, proteins shed by similar-sized small tumors in a mouse and an adult human are likely to be at least 1,000 times more concentrated in the blood of the xenograft mouse than in the blood of the human. Third, the minimal biological heterogeneity of the xenograft mouse model means that only a small number of samples need to be profiled in order to make inferences about putative biomarkers.
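The dilution argument can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that a tumor sheds the same amount of a protein into either circulation and ignores clearance, tissue distribution, and differences in shedding rate. The volume ratio is the approximate figure quoted above; the human blood volume and the shed amount are illustrative assumptions, not measurements from this study.

```python
def serum_concentration(shed_ng: float, blood_volume_ml: float) -> float:
    """Concentration (ng/mL) if `shed_ng` of protein is evenly diluted in the blood."""
    return shed_ng / blood_volume_ml


# Illustrative assumptions (not values measured in this study)
HUMAN_BLOOD_ML = 5000.0   # assumed adult human blood volume
VOLUME_RATIO = 5000.0     # approximate human/mouse ratio quoted in the text
MOUSE_BLOOD_ML = HUMAN_BLOOD_ML / VOLUME_RATIO

SHED_NG = 100.0           # hypothetical amount of protein shed by the tumor

mouse_conc = serum_concentration(SHED_NG, MOUSE_BLOOD_ML)
human_conc = serum_concentration(SHED_NG, HUMAN_BLOOD_ML)
fold_enrichment = HUMAN_BLOOD_ML / MOUSE_BLOOD_ML

print(f"mouse: {mouse_conc:g} ng/mL, human: {human_conc:g} ng/mL, "
      f"fold enrichment: {fold_enrichment:g}")
```

Under these simplifying assumptions the enrichment equals the blood volume ratio, which is why the more conservative estimate of "at least 1,000 times" leaves room for the factors the sketch ignores.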
While the use of xenograft mouse models potentially can improve detection of novel cancer biomarkers, mouse serum is still a very complex proteome and requires multidimensional sample prefractionation for sufficient depth of analysis. For example, in a prior xenograft mouse study using two-dimensional gel electrophoresis and without any sample prefractionation, only acute-phase proteins were identified successfully.39 In the case of ovarian cancer, two different studies using a xenograft model with human SKOV-3 serous ovarian cancer cells have been described. In one study, mouse sera were trypsin-digested and analyzed directly by LC-MS/MS, resulting in identification of 13 human proteins.38 The other study focused on the low-molecular weight serum proteome/peptidome of the xenograft model and reported the identification of five human proteins.36 While both prior xenograft ovarian cancer studies successfully identified a few candidate biomarkers (14-3-3 zeta and S100A6), we reasoned that the combination of a xenograft mouse model with more extensive fractionation of serum proteins could identify much larger numbers of novel human candidate biomarkers. The most difficult-to-detect proteins are expected to be lower abundance and, therefore, may be more tumor specific.
In this study, we established a xenograft mouse model using the ovarian endometrioid TOV-112D cell line and analyzed the serum proteome using a 4-D protein profiling strategy. We demonstrated that it is possible to detect many tumor-derived human proteins, including low-abundance human proteins that are present at < 100 ng/mL in normal human serum. In a proof-of-concept validation analysis, we quantified the levels of three high-priority candidate biomarkers in serum from ovarian cancer patients, as well as normal controls and patients with benign disease, using label-free MRM-MS.
The human epithelial ovarian cancer cell line TOV-112D was obtained from the American Type Culture Collection (ATCC, Manassas, VA). The TOV-112D cells were maintained in a 37° C incubator with a 5% CO2−95% air atmosphere in a 1:1 mixture of MCDB 105 media and Medium 199 (Sigma-Aldrich, St. Louis, MO) supplemented with 15% fetal calf serum, as described previously.40
Approximately 3 million TOV-112D cells in 100 µL PBS were injected subcutaneously into the flanks of nine severe combined immunodeficient (SCID) female mice. Tumor volumes were monitored by caliper measurements. When tumors were approximately 1 cm in length, blood was collected from mice by cardiac puncture under anesthesia, animals were immediately euthanized, and the tumor at the injection site and other internal organs were collected. This study protocol was approved by the Wistar Institute’s Institutional Animal Care and Use Committee (IACUC).
The collected blood was allowed to clot at room temperature and was then centrifuged for 15 min at 4° C to collect the serum. Individual aliquots of serum from each mouse were then snap-frozen and stored at −80° C. Serum subsequently was thawed briefly and pooled based on assessment of tumor burden and extent of tumor necrosis. The total protein concentrations of pooled serum samples were measured using a BCA Protein Assay (Pierce Chemical, Rockford, IL), after which the pooled serum was re-aliquoted, snap-frozen, and stored at −80° C until future use. Tumor necrosis was assessed by microscopic inspection of hematoxylin and eosin (H&E) stained paraffin-embedded sections (5 µm), and other organs were macroscopically and microscopically examined for evidence of tumor metastasis.
The pooled mouse serum was depleted using a 4.6 × 100 mm MARS Mouse-3 HPLC column (Agilent Technologies, Wilmington, DE). A total of 225 µL pooled serum was diluted five-fold with the manufacturer’s equilibration buffer, filtered through a 0.22 µm microcentrifuge filter, and briefly stored on ice. This sample subsequently was applied to the antibody column in five serial injections of 200–250 µL per depletion. The flow-through fractions containing unbound proteins were collected and pooled. The immunodepletion equilibration buffer was removed by buffer exchange into 10 mM sodium phosphate, pH 7.0, and the sample was concentrated to the initial serum volume using a 5K molecular weight cutoff (MWCO) spin concentrator. Bound proteins were eluted with the manufacturer’s elution buffer, neutralized with 1 M NaOH, pooled, aliquoted, and stored at −20° C for possible future analysis. Protein concentrations from the unbound and bound fractions were estimated using standard and reducing-reagent-compatible BCA assays, respectively.
Depleted and concentrated mouse serum (2.6 mg) was reduced with 20 mM DTT for 30 min and alkylated with 50 mM N,N-dimethylacrylamide (DMA) for 30 min at room temperature in 550 µL of buffer (final volume) containing 8 M urea, 20 mM Tris-HCl, pH 8.5. Alkylation was quenched with 1% DTT, and serum was diluted to 670 µL (final volume) in a sample buffer consisting of 8 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, 1% pH 3–7 ZOOM focusing buffer, and 1% pH 7–12 ZOOM focusing buffer. Serum was fractionated by MicroSol-IEF, as previously described,23, 41 using a ZOOM-IEF fractionator (Invitrogen, Carlsbad, CA), into five small-volume (550–650 µL) pools where the separation chambers were defined by IPG gel membranes having pH values of 3.0, 4.6, 5.4, 6.2, and 7.0, respectively. After focusing, samples were removed and each chamber was rinsed with a small volume of sample buffer, which was combined with the solution removed from that chamber. IPG gel membranes were extracted twice for 1 h each with 100 µL of 1% SDS, 20 mM Tris, 1% 2-mercaptoethanol. To maximize protein loads for SDS-PAGE, MicroSol-IEF fractions were precipitated overnight with nine volumes of 200 proof ethanol, prechilled to −20° C. Ethanol supernatants were carefully removed, protein pellets were re-suspended in 50% ethanol and centrifuged, and the resulting pellets were frozen and stored at −20° C until further use. Membrane extracts were concentrated to approximately 50 µL with 5K MWCO spin concentrators.
Frozen protein pellets from ethanol precipitation of MicroSol IEF fractions were thawed briefly and resuspended in SDS gel sample buffer. For fractions 2–4, aliquots derived from 15 µL of original serum per lane were loaded into 10-well 12% NuPAGE mini-gels (Invitrogen) and separated using MES running buffer until the tracking dye had migrated 4 cm. For fractions 1 and 5 and for membrane extractions, the equivalent of 37 µL and 80 µL of original serum, respectively, was loaded and separated for 1 cm. Gels were stained with Colloidal Blue (Invitrogen), and each lane subsequently was sliced into either 40 (fractions 2–4) or 10 (fractions 1, 5, and membranes) uniform 1 mm slices using a disposable gel cutter (The Gel Company, San Francisco, CA). For fractions 2–4, two adjacent slices in a single lane were combined in a digestion well. Slices from duplicate lanes of fractions 1, 5, and membrane extractions were combined and all samples were digested overnight using 0.02 µg/µL modified trypsin (Promega, Madison, WI). A total of 140 digests were performed from the five IEF fractions and six membrane extracts.
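The total of 140 digests follows from the slicing scheme described above. The short tally below is only a bookkeeping sketch; the dictionary layout and names are illustrative and not part of the original workflow.

```python
# (number of samples in the group, digests produced per sample)
digest_plan = {
    # 4 cm lanes -> 40 x 1-mm slices, two adjacent slices combined per digest
    "IEF fractions 2-4": (3, 40 // 2),
    # 1 cm lanes -> 10 x 1-mm slices, matching slices from duplicate lanes combined
    "IEF fractions 1 and 5": (2, 10),
    # six membrane extracts, handled like fractions 1 and 5
    "membrane extracts": (6, 10),
}

digests_per_group = {name: n * per_sample for name, (n, per_sample) in digest_plan.items()}
total_digests = sum(digests_per_group.values())

for name, count in digests_per_group.items():
    print(f"{name}: {count}")
print(f"total: {total_digests}")
```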
Tryptic digests were analyzed on an LTQ-FT hybrid mass spectrometer (Thermo Electron, San Jose, CA) coupled with a NanoLC pump (Eksigent Technologies, Livermore, CA) and autosampler. Tryptic peptides were separated as described previously22 by RP-HPLC on a PicoFrit column (75 µm i.d., 15 µm tip opening; New Objective, Woburn, MA), packed with 8 cm of Hypersil C18 1.9-µm resin (Thermo Electron). Eluted peptides were analyzed by the mass spectrometer set to repetitively scan m/z from 400 to 1600 followed by data-dependent MS/MS scans on the six most abundant ions with dynamic exclusion enabled.
Peptides from each LC-MS/MS run were interpreted from MS/MS spectra using SEQUEST in Bioworks 3.2 (Thermo Electron). DTA files were created and searched against a combined mouse and human database generated from Uniprot (5/16/06), National Center for Biotechnology Information non-redundant (2/05/06), and International Protein Index (version 3.17) databases. This composite database also contained the reversed sequences of each entry appended to the beginning of the forward database. The database was indexed with the following parameters: monoisotopic mass range of 750 to 3500, length of 4 to 100, partial tryptic cleavages with a maximum of two internal missed cleavage sites, static modification of Cys by dimethylacrylamide (+99.0684 Da) and dynamic modification of Met to methionine sulfoxide (+15.9949 Da). The DTA files were searched with a 2.5 Da peptide mass tolerance. Other search parameters were identical to those used for database indexing.
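The composite target-decoy database described above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function name, the `REV_` accession prefix, and the toy sequences are all hypothetical.

```python
def build_target_decoy(entries: dict[str, str], decoy_prefix: str = "REV_") -> dict[str, str]:
    """Return reversed (decoy) entries followed by the original forward entries.

    `entries` maps an accession to its protein sequence. Dictionaries preserve
    insertion order, so the decoys come first, mirroring a database in which the
    reversed sequences precede the forward sequences.
    """
    combined = {decoy_prefix + accession: sequence[::-1]
                for accession, sequence in entries.items()}
    combined.update(entries)
    return combined


# Toy forward database (hypothetical accessions and sequences)
forward = {"HUMAN_A": "MKTAYIAK", "MOUSE_B": "MSTNPKPQR"}
composite = build_target_decoy(forward)

for accession, sequence in composite.items():
    print(accession, sequence)
```

Because every forward entry has exactly one reversed counterpart, a random spectrum match is about as likely to land on a decoy as on an incorrect forward entry, which is what makes the reversed hits usable for estimating the false positive rate.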
Outputs from all SEQUEST searches were combined, filtered using in-house scripts, and grouped into non-redundant proteins using DTASelect version 1.9.42 An in-house script was used to correct precursor m/z values that had been wrongly assigned to 13C isotope peaks. Peptides were filtered using mass accuracy ≤ 8 ppm, Sf ≥ 0.4 and requiring full tryptic specificity for all identified peptides. This filtering scheme resulted in 1.4% peptide false positives calculated as the number of unique reversed sequence hits/number of unique forward sequence hits. Keratin identifications were removed from the datasets as probable contaminants. Additional Java scripts were developed to compress the non-redundant protein identifications reported by DTASelect into the smallest sets of unique proteins. Peptide counts were derived after collapsing different forms (charge states and modifications) of the same peptide into a single hit. Further reduction was applied by allowing a peptide to be assigned only once to the protein that had the highest sequence coverage.
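The filtering criteria and the false positive estimate described above can be expressed compactly. The following is a sketch only: the function names are hypothetical, the example counts are invented to illustrate the ratio, and the in-house scripts themselves are not available for comparison.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_filter(ppm: float, sf: float, fully_tryptic: bool,
                  max_ppm: float = 8.0, min_sf: float = 0.4) -> bool:
    """Apply the three criteria from the text: mass accuracy, Sf score, tryptic specificity."""
    return abs(ppm) <= max_ppm and sf >= min_sf and fully_tryptic


def peptide_false_positive_rate(unique_reverse_hits: int, unique_forward_hits: int) -> float:
    """Unique reversed-sequence hits divided by unique forward-sequence hits."""
    return unique_reverse_hits / unique_forward_hits


# Hypothetical counts chosen only to illustrate the calculation
rate = peptide_false_positive_rate(unique_reverse_hits=14, unique_forward_hits=1000)
print(f"estimated peptide false positive rate: {rate:.1%}")
```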
Custom software also was developed to separate mouse and human proteins based on their sequence identifiers. Putative human peptides were then searched using BLAST against a mouse-only database from Uniprot (11/2007) to remove any putative human sequences that exactly matched mouse sequences.
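The species check can be approximated as follows. This sketch substitutes plain substring matching for BLAST and treats Ile and Leu as indistinguishable because they are isobaric; the function names and toy sequences are hypothetical and are not taken from the authors' software.

```python
def normalize_isobaric(sequence: str) -> str:
    """Collapse Ile onto Leu, since the two residues have identical mass."""
    return sequence.replace("I", "L")


def human_specific_peptides(peptides: list[str], mouse_proteins: list[str]) -> list[str]:
    """Keep peptides that do not occur exactly (up to I/L) in any mouse protein."""
    mouse_normalized = [normalize_isobaric(protein) for protein in mouse_proteins]
    specific = []
    for peptide in peptides:
        query = normalize_isobaric(peptide)
        if not any(query in protein for protein in mouse_normalized):
            specific.append(peptide)
    return specific


# Toy data (hypothetical sequences)
mouse_db = ["MKTLLVAGRSSDEK"]
putative_human = ["TILVAGR",   # differs from mouse only by I/L -> not human-specific
                  "TLMVAGR"]   # differs by a real substitution -> human-specific
print(human_specific_peptides(putative_human, mouse_db))
```

A linear scan is adequate for a toy example; for database-scale comparisons an indexed search tool such as BLAST, as used in the study, is the practical choice.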
Sera from patients with benign ovarian tumors (n=9), and from late-stage ovarian cancer patients (stages III, n=15; or IV, n=3) were collected at the University of Turin, Turin, Italy, at the time of diagnosis. Control serum samples (n=6) were collected from healthy, post-menopausal female donors at the Wistar Institute, Philadelphia, PA. All specimens were processed in compliance with institutional review board and Health Insurance Portability and Accountability Act (HIPAA) requirements.
Control and patient serum samples were analyzed either individually or as pools, as follows. Samples were depleted of 20 abundant serum proteins using a ProteoPrep20 Immunodepletion Column (Sigma-Aldrich). Typically, 30–50 µL of serum was depleted, concentrated, and prepared for SDS-PAGE, as previously described.43 SDS-PAGE conditions for human serum were identical to those described above for the analysis of mouse serum, with the exception that the equivalent of 10 µL of original serum was loaded into three adjacent lanes and separated for 4 cm. Gels were sliced and digested, as previously described.43
MRM experiments were performed on a 4000 QTRAP hybrid triple quadrupole/linear ion trap mass spectrometer (AB Sciex, Foster City, CA) interfaced with a NanoACQUITY UPLC system (Waters, Milford, MA). Eight µL of tryptic digests were injected using the partial loop injection mode onto a UPLC Symmetry trap column (180 µm i.d. × 2 cm packed with 5 µm C18 resin; Waters) and then separated by RP-HPLC on a PicoFrit column (75-µm i.d., 15-µm tip opening) packed in-house with 25 cm of Magic C18 3-µm reversed-phase resin (Michrom Bioresources, Auburn, CA). Chromatography was performed with Solvent A (Milli-Q [Millipore, Billerica, MA] water with 0.1% formic acid) and Solvent B (acetonitrile with 0.1% formic acid). Peptides were eluted at 200 nL/min for 3–28% B over 42 min, 28–50% B over 26 min, 50–80% B over 5 min, 80% B for 4.5 min before returning to 3% B over 0.5 min. To minimize sample carryover, a fast blank gradient was run between each sample. An identical reference sample was run at the beginning of each set of samples and was used to normalize variation in MRM signals caused by changes in performance of the HPLC, reverse-phase column or mass spectrometer.
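The elution program can be written as a piecewise-linear gradient table, which makes the total run time and the solvent composition at any time point easy to check. The breakpoints below are derived directly from the segment durations given above; the function and variable names are illustrative.

```python
# (time in min, % Solvent B) breakpoints derived from the segment durations in the text
GRADIENT = [
    (0.0, 3.0),
    (42.0, 28.0),   # 3-28% B over 42 min
    (68.0, 50.0),   # 28-50% B over 26 min
    (73.0, 80.0),   # 50-80% B over 5 min
    (77.5, 80.0),   # hold at 80% B for 4.5 min
    (78.0, 3.0),    # return to 3% B over 0.5 min
]

TOTAL_RUN_MIN = GRADIENT[-1][0]


def percent_b(t_min: float) -> float:
    """Linearly interpolate % Solvent B at time `t_min` (minutes)."""
    if not GRADIENT[0][0] <= t_min <= TOTAL_RUN_MIN:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the full range")


print(f"total gradient time: {TOTAL_RUN_MIN} min")
print(f"%B at 55 min: {percent_b(55.0)}")
```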
MRM data were acquired with a spray voltage of 2,500 V, curtain gas of 20 p.s.i., nebulizer gas of 10 p.s.i., interface heater temperature of 150 °C, and a pause time of 5 ms. Multiple MRM transitions were monitored using unit resolution in both Q1 and Q3 quadrupoles to maximize specificity. Each MRM transition had a minimum dwell time of 15 ms. Data analyses were performed using MultiQuant version 1.1 software (AB Sciex). The most abundant transition for each peptide was used for quantitation unless interference from the matrix was observed. In these cases, another transition free of interference was chosen for quantitation.
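The rule for choosing the quantitation transition can be sketched as a simple selection over the monitored transitions. The data structure, the `interfered` flag (assumed here to be set by manual inspection of the chromatograms), and the example values are hypothetical; this is not how MultiQuant represents transitions internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transition:
    q1_mz: float       # precursor m/z selected in Q1
    q3_mz: float       # fragment m/z selected in Q3
    peak_area: float   # integrated peak area
    interfered: bool   # True if matrix interference was observed (assumed manual flag)


def pick_quant_transition(transitions: list[Transition]) -> Optional[Transition]:
    """Return the most abundant interference-free transition, or None if none is clean."""
    clean = [t for t in transitions if not t.interfered]
    if not clean:
        return None
    return max(clean, key=lambda t: t.peak_area)


# Hypothetical transitions for one peptide
monitored = [
    Transition(523.3, 704.4, peak_area=9.1e5, interfered=True),   # most abundant, but interfered
    Transition(523.3, 817.5, peak_area=6.4e5, interfered=False),
    Transition(523.3, 591.3, peak_area=2.2e5, interfered=False),
]
chosen = pick_quant_transition(monitored)
print(chosen)
```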
Serum levels of candidate biomarkers were compared across patient groups using an unpaired, two-tailed Student’s t-test. Welch’s correction was applied to the t-test when the variances of the two groups were significantly different. Differences were considered statistically significant when the P-value of the test was less than 0.05. Calculations, scatter plots, and receiver operating characteristic (ROC) curves were generated using GraphPad Prism 5 (GraphPad, San Diego, CA). Optimal cut-points were obtained by identifying a threshold for each biomarker that resulted in maximum sensitivity and specificity when used to classify serum as tumor or control. Both sensitivity and specificity for each decision rule defined by biomarker-specific optimal cut-point were computed, as well as the positive and negative predictive values. The odds ratio between the group classification and the result from each decision rule, estimated by logistic regression, was used as a measure of their association. Multivariate models were fit using logistic regression analysis.
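Two of the statistical steps above can be sketched with the Python standard library. The text does not state how sensitivity and specificity were jointly maximized, so the sketch assumes Youden's J statistic (sensitivity + specificity − 1) as the criterion; it also assumes that higher marker levels indicate tumor. The function names and all numeric values are hypothetical and unrelated to the study data, which were analyzed in GraphPad Prism.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def best_cut_point(tumor: list[float], control: list[float]) -> tuple[float, float, float, float]:
    """Return (cut, J, sensitivity, specificity) maximizing Youden's J.

    A sample is called 'tumor' when its level is >= cut. Ties keep the lowest cut.
    """
    best = None
    for cut in sorted(set(tumor) | set(control)):
        sensitivity = sum(x >= cut for x in tumor) / len(tumor)
        specificity = sum(x < cut for x in control) / len(control)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cut, j, sensitivity, specificity)
    return best


# Hypothetical marker levels (arbitrary units)
tumor_levels = [5, 6, 7, 8]
control_levels = [1, 2, 3, 6]

print(welch_t(tumor_levels, control_levels))
print(best_cut_point(tumor_levels, control_levels))
```

Converting the t statistic to a P-value requires the t distribution, which the standard library does not provide; a statistics package would be needed for that step.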
The general experimental workflow we used for discovery and verification of candidate ovarian cancer protein biomarkers is shown in Figure 1. For discovery of candidate human biomarkers, serum proteins obtained from SCID mice harboring human ovarian cancer tumors were subjected to a 4-D separation consisting of three sequential, and substantially orthogonal, protein separations, i.e., major protein depletion, solution IEF, and 1-D SDS-PAGE, followed by online, reversed-phase LC peptide separation prior to MS/MS analysis. We previously developed this 4-D protein profiling method for comprehensive analysis of human serum and plasma proteomes, which resulted in the most comprehensive coverage of a serum sample in the pilot phase of the Human Proteome Organization Plasma Proteome Project (HUPO PPP).22, 44 That study also demonstrated that 14 of the 20 proteins known at that time to be in the 1–100 ng/mL range in normal human serum could be detected. While this method has the capacity to detect many low-abundance proteins, the extensive fractionations and large number of associated LC-MS/MS runs require several weeks to analyze a single proteome, making it impractical to analyze large numbers of samples. However, depth of proteome analysis is much more important than throughput for discovery of candidate cancer biomarkers in the xenograft mouse cancer model, which makes this 4-D method ideally suited for this application.
Verification and initial laboratory-scale validation of candidate biomarkers in human serum represent an important second hurdle in biomarker identification, because appropriate sandwich ELISA or other immunoassays rarely are available for novel candidate biomarkers. We therefore developed an independent, multiplexed, targeted mass spectrometry verification strategy utilizing label-free MRM analysis for this purpose (Figure 1B). Previous MRM studies have shown that multidimensional separations are essential to quantitate plasma proteins in the low ng/mL range unless targeted enrichment is used.45, 46 We also recently demonstrated that this label-free GeLC-MRM workflow is highly reproducible and is capable of quantitating proteins in serum down to approximately 200 pg/mL.47 The throughput of this method is much lower than sandwich ELISA assays or their equivalent. Hence, MRM methods as used herein are not directly applicable as a clinical test. However, such MRM assays are a useful tool for initial screening of candidate biomarkers in modest-sized patient cohorts for biomarkers where validated immunoassays have not yet been established. Proteins that can distinguish between patients and controls will be carried forward for further testing, which is likely to ultimately require the costly, time-consuming task of developing sandwich ELISA assays or related immunoassays. We have recently applied this strategy to verify ADAM12, a disintegrin and metalloprotease domain-containing protein, as a novel biomarker for the diagnosis of ectopic pregnancy.48, 49 Another important advantage of the GeLC-MRM method is that SDS gel fractionation can resolve different molecular weight-forms of targeted proteins and consequently permit separate quantitation of each form of the targeted proteins. Similarly, it can distinguish between closely related protein isoforms with high confidence by targeting isoform-specific peptide sequences. 
Such specificity is often defined only for a few of the best-characterized immunoassays. Therefore, the label-free GeLC-MRM workflow enables rapid, sensitive, and economical initial screening of large numbers of candidate biomarkers prior to setting up stable-isotope dilution MRM assays or immunoassays for the most promising candidate biomarkers.
A potential concern with our discovery and initial validation workflows is the use of immunoaffinity protein depletion to facilitate in-depth proteome analysis. Major proteins such as albumin can act as carrier proteins that, when removed, could lead to loss of interacting non-targeted proteins.50 In fact, using Cibacron blue dye affinity chromatography, the ‘molecular sponge’ property of albumin has been exploited to identify peptide biomarker candidates for ovarian cancer.51 However, the degree of non-targeted protein losses appeared to depend on the method used to remove major proteins. Less specific methods, such as Cibacron blue dye, result in greater losses compared to the more specific immunodepletion methods commonly used currently.52 Recent studies using the MARS and IgY-14 columns showed that immunodepletion of major proteins was highly reproducible, and fewer than 40 non-targeted proteins were detected in the bound fraction.53, 54 In addition, these non-targeted bound proteins were mostly captured at a low level in a reproducible manner. Major protein immunodepletion also resulted in a 25% increase in identified proteins, including some low-abundance (<10 ng/mL) plasma proteins, and enriched nontargeted plasma proteins by ~ 4-fold compared to undepleted plasma.54 While the identified low-abundance proteins only accounted for ~6% of total identified proteins, it is unclear whether alternative untargeted proteomic strategies are able to greatly outperform major protein immunodepletion in detecting low-abundance proteins. Therefore, despite its limitations, major protein immunodepletion is an effective method for reducing plasma or serum proteome complexity to enhance protein detection.
To identify candidate serum biomarkers using the xenograft mouse model, TOV-112D ovarian endometrioid tumor cells were injected subcutaneously. This cell line was originally derived from a patient with a malignant ovarian tumor that had never been exposed to radiation or chemotherapy.55 This cell line was chosen because it has a fast growth rate and has been demonstrated to form tumors readily in immune-compromised mice. More importantly, the in vitro growth characteristics of the cell line mimic the aggressive clinical behavior of the cancer.55 In addition, previous proteomics biomarker discovery studies used human SKOV-3 serous ovarian cancer cells.36, 38 It is important to note that ovarian cancer is a heterogeneous disease with four major subtypes (serous, clear cell, endometrioid, and mucinous) that develop differently, have distinct underlying molecular events during oncogenesis, respond differently to chemotherapy, and have differences in gene expression.2, 3 Therefore, we expect proteomics biomarker discovery using different ovarian cancer cell subtypes will lead to identification of sets of biomarkers where some proteins are common to all or multiple cancer subtypes, while other proteins may be specific to a single subtype.
The mouse serum was first subjected to depletion of three major proteins from a total of 225 µL (10.2 mg) serum using a MARS Mouse-3 HPLC column. Following depletion, 3.1 mg of total unbound proteins were recovered. SDS-PAGE analysis of the unbound and bound fractions showed good depletion of albumin (69 kDa) and transferrin (77 kDa), as expected (Figure 2A). The third protein expected to be depleted by this column, IgG, was not apparent in this experiment because SCID mice have very low levels of immunoglobulins. Following major protein depletion, the unbound proteins were separated into five pI fractions using MicroSol-IEF, and proteins with pIs identical to the pH of the MicroSol-IEF separator membranes were extracted (Figure 2B). Although the total amount of protein trapped in the MicroSol-IEF membrane partitions was very low, these fractions were included in the proteome analysis to increase its comprehensiveness. The third fractionation step utilized 1-D SDS-PAGE, and to enhance detection of low-abundance proteins, MicroSol IEF fractions and membrane extracts were concentrated and the largest possible protein loads that avoided excessive band distortion were applied onto the gels. Furthermore, because trypsin digestion of large gel volumes containing low protein amounts can be inefficient with disproportionally high adsorptive losses, the length of the electrophoretic separation was adjusted based upon sample complexity and the number of fractions desired. Hence, the most complex fractions (F2 to F4) were separated for 4 cm and cut into 20 × 2-mm slices, while remaining, less complex fractions were separated for 1 cm and cut into 10 × 1-mm slices, with corresponding slices from duplicate lanes combined for trypsin digestion (Figure 2C). This yielded a total of 140 samples for trypsin digestion and LC-MS/MS analysis.
From a total of 140 LC-MS/MS runs performed on an LTQ-FT mass spectrometer, 1.2 million MS/MS spectra were acquired and searched against a mouse and human composite database. After stringent data filtering and removal of redundant entries and common contaminants, a total of 1,198 unique proteins were identified from 6,014 unique peptides (Figure 3). The peptide false discovery rate estimated from the number of hits against the reversed entries in the composite database was 1.4%. As expected, the majority of identified proteins (753) were mouse proteins, as they contained peptide sequences unique to the mouse database. Based on the database search results, 222 proteins were initially identified as human proteins because they contained at least one apparent human-specific peptide. To confirm the species assignment, all peptides for these putative human proteins were searched against a mouse database using BLAST, and the results were used to remove proteins where all peptides were identical to mouse sequences or contained only isobaric differences (Ile/Leu). After the BLAST analysis, a total of 573 unique human-specific peptides remained and they defined 208 high-confidence human proteins identified by at least one human-specific peptide. The apparent molecular weights of these human proteins ranged from 10 to 435 kDa.
It is possible that some apparently human proteins were misidentified as human due to single nucleotide polymorphisms (SNPs) or deamidation of an asparagine to an aspartic acid. Therefore, such assignments were flagged as tentatively human. However, 87% of identified human-specific peptides differed from the homologous mouse sequences by more than a single nucleotide, indicating that this was a relatively minor concern. Two examples of MS/MS spectral assignments for human-specific peptides and the corresponding mouse sequences are shown in Figure 4. In both examples, the identified human-specific peptide sequences differ from the homologous mouse sequences by more than one residue, unambiguously indicating that the identified proteins had to be secreted by the human ovarian tumors into the mouse blood. In addition to the mouse or human-specific proteins, 237 proteins were identified by peptides common to both mouse and human sequences, and are therefore species indistinguishable at this stage.
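The species breakdown reported above can be cross-checked against the overall protein count with a short tally. The numbers are taken from the text; the variable names are illustrative.

```python
# Protein counts reported in the text, by species assignment
protein_counts = {
    "mouse": 753,                     # contain mouse-specific peptides
    "human": 208,                     # confirmed human-specific after BLAST
    "species-indistinguishable": 237, # only peptides shared by both species
}

total_proteins = sum(protein_counts.values())
initial_human_candidates = 222
removed_after_blast = initial_human_candidates - protein_counts["human"]
human_fraction = protein_counts["human"] / total_proteins

print(f"total unique proteins: {total_proteins}")
print(f"putative human proteins removed after BLAST: {removed_after_blast}")
print(f"human share of all identified proteins: {human_fraction:.1%}")
```

The three categories sum to the 1,198 unique proteins reported for the dataset, so the 14 putative human proteins removed by the BLAST check are not counted as a separate group in that total.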
These results demonstrate the feasibility of identifying large numbers of human-specific proteins from xenograft mouse models of solid tumor cancers when an in-depth analysis of the serum proteome is performed using a 4-D protein profiling strategy. Of course, as in any proteomics study, human-specific proteins identified by only a single peptide will have a somewhat higher probability of being a false positive assignment despite the low overall peptide false discovery rate used here. Despite the somewhat higher uncertainty, these single-peptide proteins are not disregarded at this stage of analysis, because the most specific cancer biomarkers are likely to be very low-abundance proteins that are only shed by the tumor. Proteins identified by single peptides are likely to be among the lowest abundance proteins in these datasets because there is a rough correlation between protein abundance and the number of peptides identified. Even if all single-peptide proteins were disregarded, 106 human proteins would remain, each identified by at least two peptides.
Previous serum proteomics studies of ovarian cancer xenograft mouse models only identified a few human-specific proteins, presumably due at least in part to lower levels of sample fractionation used in those studies.36, 38 In the study that analyzed the low-molecular weight serum proteome using LC-MS/MS analysis, only 400 peptides were identified, and these peptides corresponded to a total combination of 300 human and mouse proteins. By using MS/MS spectral counts, five human-specific proteins were identified at a statistically significant higher level in the cancer versus control xenograft model.36 In contrast, a similar low-molecular weight serum proteome analysis of a lung tumor xenograft mouse model that incorporated an additional fractionation by strong cation exchange chromatography prior to LC-MS/MS analysis was able to identify more than a thousand proteins, although no effort was made to distinguish between human and mouse proteins in that study.56 Another ovarian cancer study directly analyzed the xenograft serum proteome by LC-MS/MS and identified 13 human proteins. However, most of the human species assignments were made by comparing the results with those obtained from a separately analyzed SKOV-3 cell line secretome.38 In fact, the candidate biomarker reported in that study (14-3-3 zeta) was identified by a single peptide that is indistinguishable from the corresponding mouse sequence. Proteomics profiling of xenograft mouse models of prostate, breast, and oral squamous cell carcinomas also has been reported.57–59 The number of human-specific proteins reported using a xenograft mouse model could be influenced by the specific cell line used, the number of cells injected, the site of cell injection (subcutaneously vs. orthotopically), total human tumor burden at blood collection, and other tumor properties such as extent of necrosis. 
However, the most critical factors that affect depth of analysis are likely to be the total human tumor burden at the time of blood collection and the extent of plasma fractionation prior to LC-MS/MS analysis. For example, the previous xenograft mouse studies utilized three or fewer dimensions of sample fractionation and resulted in identification of fewer than 20 human-specific proteins. In contrast to these earlier studies, the current study demonstrates that a more in-depth protein profiling strategy, such as the 4-D method, is crucial for successful identification of substantial numbers of human-specific proteins in xenograft mouse serum.
A group of interesting human proteins identified in this study is summarized in Table 1, and details of the peptide identifications are reported in Supplemental Table 1. Interestingly, a substantial number of common, relatively abundant serum proteins, such as ITIH4, APOA1, TTR, and TF, are shed by the ovarian tumors. Some of these common plasma proteins were observed in human ovarian tumor specimens in prior reports, but it was not clear whether these proteins infiltrated the tissue from the blood or if the tumor produced these proteins. Some of these abundant proteins also have been associated with host-response or acute-phase reaction to biological insults and are primarily synthesized by the liver. However, the identification of multiple, human-specific peptides for these proteins unambiguously demonstrates that they were produced and shed into the blood by the ovarian tumors. Despite their unambiguous tumor origin, these proteins are unlikely to be useful for diagnosing or monitoring ovarian cancer in humans, because the contribution to blood levels from small tumors is likely to be swamped by shedding from other tissues and by normal variation in these proteins' levels in the general population. Interestingly, three of these proteins, APOA1, TTR and a fragment of ITIH4 used in a multimarker panel, were reported to have higher diagnostic accuracy than CA125 for detection of early-stage ovarian cancer.14 However, a subsequent study found that the use of these proteins in biomarker panels did not outperform CA125 when used in prediagnostic samples.18 Nevertheless, the identification of these proteins that previously have been shown to be produced by ovarian cancer cells or tissues further demonstrates that the xenograft mouse model system is a realistic system for serum biomarker discovery.
Some of the human proteins identified were previously reported to be possible serum biomarkers for ovarian cancer. For example, CTSD level has been proposed as a prognostic factor in a variety of cancers, including ovarian cancer.60, 61 Recently, it also was shown that quantitation of circulating autoantibody against CTSD can differentiate benign ovarian conditions from ovarian carcinoma.62 Similarly, CLU was shown to be present at a higher level in serum of late-stage ovarian cancer patients versus normal controls.35 A number of other proteins such as LDHB, CFL1, CLIC1, and AKR1B1 also were identified previously in ovarian cancer tissues or in conditioned media for ovarian cancer cell lines,35, 63 but they were not previously known to be shed into the blood.
Most of the human proteins identified have at least 70% sequence identity with their mouse counterpart, and some share ≥ 90% identity with the mouse protein. An example of a highly homologous protein is CLIC1, which shares 98% sequence identity between the two species. Despite this very high sequence homology, a human-specific peptide, LAALNPESNTAGLDIFAK, was identified that indicated the protein was shed by the human ovarian tumor (Figure 5). This demonstrates that very highly homologous human and mouse proteins can be distinguished using the xenograft mouse model and the 4-D serum protein profiling method.
Xenograft mouse models of ovarian cancer are widely used in therapeutic and tumor biology studies.64, 65 Most xenografts are created by subcutaneous, intraperitoneal, or orthotopic intrabursal implantation of tumor cells/tissues. Some studies have shown that the site of implantation has no effect on the histopathological characteristics of the tumor66 or the ability to support early follicular growth of ovarian tissue.67 While secretion of some tumor proteins is likely to be affected by the implantation site, the similarities of histopathological characteristics suggest that many proteins shed by the tumor will be independent of tumor microenvironment. In support of the biological relevance of our xenograft mouse model, more than 50% of the human proteins identified in the current study were also identified in secretomes of primary human ovarian cancer tumor tissue in short-term organ culture (data not shown).
The production by the ovarian tumors and shedding into the mouse blood of proteins previously associated with ovarian cancer argue that this model effectively mimics normal protein shedding of ovarian cancer in humans. In addition, the xenograft mouse model allows detection of low-abundance serum proteins such as CTSD, which has a reported serum concentration of about 16 ng/mL.68, 69 Another important benefit of the use of the xenograft mouse model over cell lines, tumor tissues, and secretomes from cells or tissues is the ability to show that the identified proteins are actually shed into blood. Since the human proteins identified in this study were both unambiguously produced by the ovarian tumor and shed into the blood, they are likely to be shed into the bloodstream of ovarian cancer patients. Hence, they are candidate biomarkers worth testing in human patients. As noted above, the most cancer-specific biomarkers probably will be those proteins shed primarily by the tumor and not other organs such as the liver. Therefore, candidate biomarkers from the xenograft mouse model that either have not been reported previously in normal human serum or are known to be low abundance are considered high-priority candidates for screening in ovarian cancer patient sera.
The potential utility of several high-priority candidate biomarkers for screening ovarian cancer patients was evaluated in a small patient cohort, as outlined in Figure 1 and described in Tang et al.47 In this proof-of-concept experiment, a subset of seven human proteins within the 20 to 55 kDa range was selected (CTSD, CLIC1, AKR1B1, HMX1, TRPM1, CUTA, and SERPINB12) for initial evaluation (Table 1). In addition, a higher-abundance serum protein known to be associated with ovarian cancer (CLU) was included as a positive control.
Although these human proteins were detected in the xenograft mouse serum, the ability to detect them in human serum using GeLC-MRM was unknown. Therefore, multiple peptides from each of these proteins initially were targeted by LC-MS/MS on an LTQ-Orbitrap XL mass spectrometer using a pool of abundant protein-depleted serum from nine late-stage ovarian cancer patients (Table 1). The targeted MS/MS analysis was able to identify multiple peptides for CLU, CTSD, and CLIC1. However, only a single peptide was identified for AKR1B1 and HMX1, and three proteins (TRPM1, CUTA, and SERPINB12) could not be identified. The inability to detect TRPM1, CUTA, and SERPINB12 might reflect either very low concentrations in human serum, as suggested by the identification of only a single peptide in the xenograft mouse system, or false positive identifications of these single-peptide proteins in the discovery experiment. Another possibility is that these proteins could be relatively specific to endometrioid ovarian tumors, since the majority of the patient samples tested so far are from serous ovarian cancer patients (Supplemental Table 2). This possibility could be further tested by obtaining larger numbers of endometrioid subtype samples in future validation tests. Our minimum criteria for high-confidence GeLC-MRM quantitation required detection and quantitation of at least two peptides per protein and, therefore, AKR1B1 and HMX1 were not included in the MRM assays.
GeLC-MRM quantitation was performed initially on separate pools of nine serum samples from patients diagnosed with benign, and late- (stages III and IV) stage ovarian cancer. To assess the robustness of the MRM methods for the selected proteins and to obtain a preliminary indication of the predictive value of CLU, CTSD, and CLIC1, gel slices identified as containing the proteins in the xenograft mouse analysis, as well as adjacent fractions, were analyzed, and relative peptide amounts were summed across gel slices (Figure 6). As expected, typically all peptides from a given protein displayed similar trends across the three pooled serum samples. In the case of CLU, one peptide was disproportionately low in the benign samples. Examination of the raw data showed splitting of the peptide peak, apparently due to variations in spray in the triple quadrupole nanosource. Hence, this peptide was not used for protein quantitation in this dataset. It should also be noted that some serum proteins such as CTSD and CLU undergo proteolytic processing to yield mature forms of the protein. Although both the full-length and mature forms are detectable in serum if all fractions of the gel are analyzed, it is more efficient if analysis focuses on a discrete region of the gel to maximize throughput. In this study, we focused on the 20–55 kDa region, which included the mature forms of CTSD and CLU but not the full-length (unprocessed) forms of these proteins, which will be quantified in a future study that evaluates the higher molecular weight region of the gel.
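The summation of relative peptide amounts across gel slices described above can be sketched as below. The record layout, the function name `sum_across_slices`, and the peak-area values are hypothetical; the sketch only illustrates that each peptide's signal is accumulated over all slices in which it was measured before samples are compared.

```python
from collections import defaultdict


def sum_across_slices(records):
    """Sum MRM peak areas per (sample, peptide) over all gel slices.

    records: iterable of (sample, peptide, gel_slice, peak_area) tuples.
    Returns a dict mapping (sample, peptide) to the summed peak area.
    """
    totals = defaultdict(float)
    for sample, peptide, _gel_slice, peak_area in records:
        totals[(sample, peptide)] += peak_area
    return dict(totals)


# Hypothetical peak areas for one peptide spread over adjacent slices.
records = [
    ("benign", "peptide_1", 12, 100.0),
    ("benign", "peptide_1", 13, 50.0),
    ("late_stage", "peptide_1", 12, 300.0),
    ("late_stage", "peptide_1", 13, 200.0),
]
for key, total in sum_across_slices(records).items():
    print(key, total)
```

Summing over adjacent slices guards against a protein migrating slightly differently between lanes, which would otherwise look like a change in abundance.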
The results obtained from a preliminary analysis of the pooled samples showed that CLIC1 and the mature form of CTSD (henceforth referred to as CTSD-30 kDa) exhibited the greatest difference between benign and late-stage ovarian cancer and therefore warranted further analysis. The levels of these proteins were measured in individual control serum samples (six normal and nine benign), and late-stage cancer samples (15 stage III and 3 stage IV). We did not continue to evaluate CLU because it is a known high-abundance plasma protein with reported concentrations ranging from 58 to 150 µg/mL,35, 70 and its level is also elevated by many acute-phase stimuli such as inflammation, heat shock, and injury.71, 72 In addition to CLIC1 and CTSD-30 kDa, PRDX6 was included in subsequent analyses. PRDX6 is a 25 kDa bifunctional 1-Cys peroxiredoxin that has been hypothesized to promote cancer growth and invasiveness, with increased expression observed in some malignancies.73–75 Although human PRDX6 was not conclusively identified in this study, the mouse PRDX6 was identified with four peptides that are indistinguishable from human. In addition, PRDX6 was identified in a TOV-112D secretome study.35 Taken together, these results suggest PRDX6, which is in the 25 kDa region being assayed, could be a potential biomarker for ovarian cancer.
Label-free MRM quantitation of individual serum samples showed a significant difference (P<0.05, Student’s t-test) between the control (normal and benign) and cancer groups for CLIC1 and CTSD-30 kDa (Figure 7, left panels). The normal and benign samples also were compared separately to the cancer group, since the normal and benign samples were collected at two different sites and benign conditions often are more difficult to distinguish from cancer than healthy controls (Figure 7, right panels). PRDX6 showed a significant difference between normal controls and cancer but not between benign disease and cancer, whereas there were significant differences between cancer and either non-cancer group for the other two biomarkers. To further evaluate the potential diagnostic efficacy for each of the three proteins, receiver operating characteristic (ROC) curve analyses were performed on the control and cancer groups (Figure 8). In agreement with the t-test, both CLIC1 and CTSD-30 kDa showed a larger area under the ROC curve (AUC) compared to PRDX6. The sensitivity and specificity, as well as the positive and negative predictive values for each biomarker at the optimal cut-point, are presented in Supplemental Table 3. A binary decision rule for CTSD-30 kDa, CLIC1, and PRDX6 was created using their optimal cut-points. Each binary variable was a significant predictor of tumor samples (p=0.001, 0.005, and 0.011, respectively). In the multivariate analysis, only CTSD-30 kDa and CLIC1 remained in the final model (p=0.009 and 0.052, respectively). The AUC for the predicted probability of a tumor sample from the multivariate model, including these two biomarkers, was 0.893.
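To make the ROC analysis above concrete, the following sketch computes an AUC and a cut-point for a single marker. The text does not define how the optimal cut-point was chosen; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made here, as are the function names and the marker values, which are not data from this study.

```python
def auc(controls, cases):
    """AUC as the probability that a case value exceeds a control value.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def best_cutpoint(controls, cases):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A sample is called positive when its value is >= cutoff. If several
    cutoffs tie on J, the lowest one is returned.
    """
    best = None
    for cut in sorted(set(controls) | set(cases)):
        sens = sum(c >= cut for c in cases) / len(cases)
        spec = sum(n < cut for n in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1:]


# Hypothetical normalized marker levels.
controls = [1.0, 1.2, 0.8, 1.5]
cases = [1.4, 2.0, 2.5, 1.1]
print(auc(controls, cases))
print(best_cutpoint(controls, cases))
```

With small cohorts, several cut-points often give the same J, so the reported sensitivity and specificity depend on the tie-breaking rule; this is one reason such estimates need confirmation in larger sample sets.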
We previously demonstrated the reproducibility of the label-free GeLC-MRM workflow in quantitating biomarkers for ectopic pregnancy.47, 48 To evaluate the reproducibility of the entire label-free GeLC-MRM workflow for the ovarian cancer biomarkers, we prepared two separate serum pools, one of normal (n=6) and one of ovarian cancer (n=9) samples, and subjected the two pooled samples to major protein depletion and GeLC-MRM quantitation, as shown in Figure 1B. MRM quantitation of CTSD-30 kDa, CLIC1, and PRDX6 in these new pools was then compared to the averaged quantitation values of the individual samples that made up each pool. As shown in Figure 9, MRM quantitation of the normal and cancer samples is very similar for the pooled samples and the average of the individual samples, demonstrating the reproducibility of the entire label-free GeLC-MRM workflow. In this analysis, the level of CTSD did not appear to be significantly different between the two groups due to the inclusion of more low responders in the cancer group. This highlights the potential risks of using pooled serum samples to attempt to gauge the predictive value of a biomarker, and the importance of analyzing individual samples in biomarker validation. These results indicate that CTSD-30 kDa and CLIC1 are promising biomarkers of ovarian cancer, although the performance of these two markers for early detection remains to be determined; larger cohorts, as well as early-stage cancer samples and pre-diagnosis specimens, need to be tested in future studies. While PRDX6 is a less promising biomarker based upon the current data, further studies using a larger cohort may show efficacy of this biomarker, particularly as part of a multi-protein panel.
To our knowledge, CTSD, CLIC1, and PRDX6 levels in serum of ovarian cancer patients have not been reported previously, although a recent report did show that quantitation of circulating autoantibody against CTSD can differentiate between benign and other stages of ovarian carcinoma including stage I,62 which supports our argument that CTSD is a biomarker for ovarian cancer. Interestingly, CLIC1 recently was discovered as a novel plasma marker for nasopharyngeal carcinoma,76 although its role in ovarian cancer is still unclear.
New proteomic technologies or strategies are constantly being refined or developed to improve identification of novel cancer biomarkers. One interesting approach is the use of nanoparticle capture technology to enrich for the low-molecular weight peptidome that has been suggested to contain a rich source of cancer-specific diagnostic information.77, 78 Another strategy involves the use of combinatorial peptide libraries to enrich for low-abundance proteins in ovarian cancer ascites.79 Here, we demonstrate that analysis of serum from an ovarian cancer xenograft mouse model using a 4-D protein profiling method is capable of deep mining of the mouse plasma with resulting identification of more than two hundred human proteins that are unambiguously shed by human tumors into the blood. While the xenograft mouse model is not expected to fully recapitulate the in vivo human tumor microenvironment in patients, many of the proteins identified in this study have been previously associated with ovarian cancer, although many others were not previously known to be produced by the tumor and shed into the blood. Verification and small-scale validation of selected high-priority candidate biomarkers show that a substantial portion of the tested candidate biomarkers correlate with ovarian cancer in patient serum specimens.
This work was supported by National Institutes of Health grants CA131582 and CA120393 to D.W.S., as well as an institutional grant to the Wistar Institute (NCI Cancer Core Grant CA010815). We gratefully acknowledge Dr. Dionyssios Katsaros, University of Turin, Turin, Italy, for providing benign and ovarian cancer sera. We gratefully acknowledge the Wistar Institute Proteomics Core, the Histotechnology Core, and the Animal Facility for their assistance with the project.
Supporting Information Available
Figure showing MS/MS spectra of all single-peptide proteins listed in Table 1 showing details of Sequest peptide identification of selected human proteins shed by ovarian TOV-112D tumors into SCID mouse; Table showing normalized MRM peak area values for CLIC1, CTSD, and PRDX6 peptides and their averages in all samples analyzed; Table showing the sensitivity, specificity, and positive and negative predictive values for tumor vs. normal and benign. This material is available free of charge via the Internet at http://pubs.acs.org.