Overview of Discovery, Prioritization and Verification of EOC Biomarkers Using a Xenograft Serous EOC Mouse Model
The strategies used to improve ovarian cancer biomarker discovery using the xenograft mouse model system, select high priority candidate biomarkers, and improve the efficiency of MRM assay development and biomarker verification are outlined in . In the discovery phase, OVCAR-3, an established human serous cell line, was grown in SCID mice. Xenograft mouse plasma pooled from four mice with the largest tumors was subjected to extensive fractionation (180 fractions) using a 4D plasma proteome separation method developed in our laboratory, which consists of immunoaffinity depletion of major serum proteins, MicroSol IEF, 1D SDS-PAGE, and LC-MS/MS. Representative analytical and preparative 1D SDS gels showing MicroSol IEF fractions prior to LC-MS/MS analysis can be found in Figure S1B and C, respectively. Human proteins identified in the plasma by at least two peptides, at least one of which was uniquely human, were prioritized and verified as illustrated and described in further detail below. Although this biomarker candidate discovery study used a cell line representative of a late stage tumor, our working hypothesis is that the best cancer biomarkers will be shed by the tumor into the blood and will correlate with tumor size. These biomarkers will ideally be detectable in serum or plasma at higher levels than in control subjects, even when the tumors are small, and the levels of these biomarkers will increase as the tumor grows. By utilizing the xenograft mouse model and identifying human proteins, we are assured that the candidate biomarkers are derived from the tumor and shed into the blood, at least in this model system.
Scheme for ovarian cancer biomarker discovery and efficient verification using a xenograft mouse model.
Analyses of the Xenograft Mouse Plasma Proteome and Corresponding Tumor Supernatants
Plasma from four mice containing OVCAR-3 tumors that were at least 1 cm³ was pooled and analyzed using the 4D method described above. Fractionation using MicroSol IEF and 1D SDS gels yielded 180 fractions and subsequent analysis of these fractions by LC-MS/MS produced more than 1.1 million spectra, which were searched against a combined human and mouse database. A total of 3647 non-redundant human and mouse proteins were initially identified by 22,890 peptides at a peptide FDR of 5.7% for all proteins, and a 0.5% FDR for protein identifications with two or more peptides. After species classification, 268 human proteins were identified by two or more peptides and an additional 550 by a single peptide (). Because the FDR was considerably lower for proteins with ≥2 peptides in both the plasma and tumor supernatant samples, we only considered proteins having two or more peptide identifications for downstream analyses.
Proteins identified in the xenograft plasma and tumor supernatant.
Due to the difficulty of detecting low abundance human proteins in the mouse plasma, tumor supernatants from the same batch of SCID mice with OVCAR-3 tumors were analyzed to attempt to achieve more extensive sequence coverage of human proteins detected in the mouse plasma. Concentrated tumor supernatants were separated on 1D SDS gels, each lane was sliced into 60 uniform fractions (Figure S1A), and each sample was digested with trypsin followed by LC-MS/MS analysis, resulting in 487,076 MS/MS spectra, which were searched against a combined mouse and human database. A total of 6066 unique proteins were identified from 46,111 peptides at a peptide FDR of 0.9%. Eliminating single peptide proteins resulted in 4619 unique protein entries, with a peptide FDR of 0.03%. This list of high-confidence proteins (≥2 peptides) identified from the combined human and mouse dataset is provided in Table S1. This complete dataset was divided into “human” and “mouse” based upon the presence of at least one peptide unique to that species, while “indistinguishable” proteins contained only peptides common to both species. A total of 2843 human proteins were identified by two or more peptides, and an additional 727 human proteins were identified by single peptides ().
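The species assignment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the study: the function names, the peptide sequences, and the "ambiguous" label for a protein carrying peptides unique to both species are all hypothetical, since the text does not describe how that last case was handled.

```python
from typing import Iterable, Set


def classify_species(peptides: Iterable[str],
                     human_unique: Set[str],
                     mouse_unique: Set[str]) -> str:
    """Assign a protein to a species class from its identified peptides."""
    peptide_set = set(peptides)
    has_human = bool(peptide_set & human_unique)
    has_mouse = bool(peptide_set & mouse_unique)
    if has_human and has_mouse:
        # Assumption: this case is not described in the text, so it is flagged.
        return "ambiguous"
    if has_human:
        return "human"
    if has_mouse:
        return "mouse"
    # Only peptides shared by both species were observed.
    return "indistinguishable"


def is_high_confidence(peptides: Iterable[str], min_peptides: int = 2) -> bool:
    """Keep only proteins identified by two or more distinct peptides."""
    return len(set(peptides)) >= min_peptides


# Hypothetical peptide sequences, for illustration only.
HUMAN_UNIQUE = {"HUMANONLYK"}
MOUSE_UNIQUE = {"MOUSEONLYR"}
protein_peptides = ["SHAREDPEPTIDEK", "HUMANONLYK"]
print(classify_species(protein_peptides, HUMAN_UNIQUE, MOUSE_UNIQUE),
      is_high_confidence(protein_peptides))
```

A protein with one uniquely human peptide and one shared peptide is classified as human and passes the two-peptide filter, which is exactly the kind of assignment that the tumor supernatant data were later used to confirm.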
Interestingly, the tumor supernatant dataset provided a much greater depth of analysis both in terms of total proteins identified and sequence coverage of most proteins, despite the less extensive fractionation used. Also, the proportion of total identified proteins that could be assigned as human was far higher in the tumor supernatant. In part, this was expected because the plasma analysis was dominated by detection of high and medium abundance mouse plasma proteins. However, it also indicates that the contribution of mouse cells in the tumor, including fibroblasts and vascular cells, was relatively minor compared to shedding of proteins by the human tumor cells.
The Tumor Supernatant Increases Sequence Coverage of Human Proteins Detected in the Xenograft Mouse Plasma
Over 2800 human proteins from the tumor were identified in the supernatant and this dataset is a possible source of additional plasma biomarkers for ovarian cancer. However, unless the proteins were also detected in the mouse plasma, there is no assurance that proteins observed in the supernatant would be shed into and be detectable in the blood. Furthermore, the plasma analysis had already identified nearly 300 human proteins, which exceeds the number of proteins that could be feasibly tested in human serum. Hence, in this study, the larger tumor supernatant dataset was only used to confirm the “human” assignment of proteins identified in the mouse plasma, although this large dataset almost certainly contains additional potential plasma biomarkers that could be explored in future studies. Confirmation of apparent human proteins in the mouse plasma is important because many proteins were assigned as human based upon the detection of one human peptide and one or more peptides with sequences common to both species. A few of these apparent human proteins may be false positives while others could represent unreported mouse polymorphisms, mouse sequences not reported in the database, etc. Also, some identified proteins represent a protein family but the peptides identified in the plasma do not unambiguously define a unique isoform. Therefore, for most of the human proteins identified in the plasma, the tumor supernatant dataset improved the confidence of species assignment and in some cases more clearly defined family member(s) present in the xenograft plasma samples by confirming the original peptide and protein identifications and, in most cases, providing more extensive peptide coverage.
Examples of using the tumor supernatant data to expand the utility of data from the xenograft plasma are shown in . PSMA1 () was identified by a total of 10 peptides in the mouse plasma, but only one of these was a uniquely human peptide. This protein could have been de-prioritized because of its high homology to a mouse counterpart and the possibility that the single uniquely human peptide in the plasma might have been a false positive identification or an unknown mouse sequence variant. However, the tumor supernatant analysis identified an additional five peptides that were uniquely human, thus increasing the confidence of the species assignment for the original plasma identification. shows PSME2, a protein identified by two uniquely human peptides in the plasma dataset; its species assignment as human is therefore well supported. Nevertheless, the tumor supernatant analysis identified six additional human peptides, thereby providing more proteotypic peptides for setting up MRM assays.
Sequence coverage for selected human proteins from the xenograft plasma and tumor supernatants.
The tumor supernatant and plasma datasets were compared to a study by Pitteri et al. that identified candidate biomarkers by comparing a genetically engineered mouse model and secretomes of ovarian cancer cells. That study validated eight proteins found to be at higher abundance levels in ovarian cancer patients’ plasma, and they also described identification of an additional nine proteins previously identified as ovarian cancer plasma biomarkers. Of the 17 candidate markers described by Pitteri et al., we identified eight proteins (CTSB, FASN, IGFBP2, LCN2, MIF, THBS1, WFDC2, and NRCAM) in our tumor supernatant analysis, and three proteins (FASN, IGFBP2, and LCN1) in the high-confidence xenograft plasma dataset.
We also compared the results from the current study using OVCAR-3 cells, a serous EOC cell line, to an earlier xenograft mouse study using an endometrioid EOC cell line (TOV-112D), in which we identified three new biomarkers of ovarian cancer that could distinguish cancer patients from normal individuals. These three biomarkers, CLIC1, CTSD, and PRDX6, were all identified in the current study.
Overall, these comparisons show that different biomarker discovery strategies result in detection of overlapping, but non-identical sets of biomarkers. These data also demonstrate that analysis of the tumor supernatant in parallel with xenograft mouse plasma is useful for confirming candidate biomarkers detected in xenograft mouse plasma.
Prioritization and Selection of Candidate Biomarkers for Verification Using Patient Sera
Efficient methods for selecting the best candidate biomarkers and economically verifying them in serum or plasma of EOC patients are needed because some, but not all, proteins shed by EOC tumors into blood are expected to be good biomarkers of the disease. Furthermore, some proteins detected in the xenograft mouse model may not be detectable in human blood using current methods, either because the concentration in human blood is below the detection limits of available assays or because in some cases the shedding may be unique to the mouse model. As noted above, when we evaluated a panel of candidate biomarkers from the TOV-112D xenograft mouse tumors, the overall success in setting up MRM assays and demonstrating elevated levels of the targeted biomarker in EOC patient sera was only about 20%. Hence, an important challenge is to develop appropriate methods for more efficient triaging of candidate biomarkers and evaluating them in serum of EOC patients.
The xenograft plasma proteome was prioritized starting with the 268 human proteins identified by two or more peptides (). This dataset was further refined by removing a few trypsin and keratin contaminants that were missed at the initial contaminant-removal step due to ambiguous protein descriptions or isoform differences. In addition, proteins known to be in normal human plasma at medium- to high-abundance levels (>100 ng/mL) and hemoglobins were removed. Such proteins were not considered to be viable candidate biomarkers because the contribution of shedding from a small tumor is unlikely to be discernible above the normal variation of that protein in the general population. For example, if a protein is normally in the plasma of unaffected individuals in the 1–5 µg/mL range and the protein is also shed by a typical ovarian tumor, which contributes another 50 ng/mL of that protein into the plasma, the contribution from the tumor is not detectable above normal variation.
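The arithmetic behind this example can be made explicit with a short Python sketch. The function name is hypothetical and the values are simply those quoted in the example above; the only step added here is the unit conversion from µg/mL to ng/mL.

```python
def tumor_fraction_of_baseline(tumor_ng_per_ml: float,
                               baseline_ug_per_ml: float) -> float:
    """Return the tumor-derived amount as a fraction of the normal plasma level."""
    baseline_ng_per_ml = baseline_ug_per_ml * 1000.0  # 1 µg/mL = 1000 ng/mL
    return tumor_ng_per_ml / baseline_ng_per_ml


# Values taken from the example in the text: 50 ng/mL shed by the tumor,
# against a normal range of 1-5 µg/mL.
for baseline in (1.0, 5.0):
    fraction = tumor_fraction_of_baseline(50.0, baseline)
    print(f"baseline {baseline} µg/mL: tumor adds {fraction:.1%}")
```

Under these numbers the tumor adds only about 1% to 5% to the normal level, which illustrates why such an increase would be expected to disappear within the normal person-to-person variation of a protein whose baseline itself spans a five-fold range.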
Verification of Candidate Biomarkers in the 15–50 kDa Region Using the Tumor Supernatant and Label-free Discovery Proteomics Analysis of Patient Pools
Candidate biomarkers in the 15–50 kDa region of the gel were selected for further prioritization and verification because this was the region of the gel that contained the largest density of human proteins in the xenograft plasma analysis. By focusing on a discrete region of the gel, we could increase the subsequent throughput of MRM assays by minimizing the number of fractions that need to be analyzed to quantitate the targeted group of candidate biomarkers. Candidates in this region that were identified with at least the same number of peptides in the tumor supernatant were considered further ( and ). These candidate biomarkers were then compared to data from an in-depth label-free quantitative comparison of pools of patient sera using a 4 h gradient for the LC-MS/MS runs. One serum pool from patients with benign tumors (pool B, n = 9) was compared to three serum pools from patients with advanced ovarian cancer (pool C1: stage 3, n = 9; pool C2: stage 3, n = 9; pool C3: stage 4, n = 5). Descriptions of the patients and the sample pooling strategy are provided in Table S2. Acquisition of full MS and data-dependent MS/MS scans was identical to that described for the xenograft proteome analyses, with the exception that ions subjected to MS/MS were excluded from repeated analysis for 180 s. Xenograft plasma candidate biomarkers that could be detected in these human serum pools were quantitatively compared across pools using peptide ion signal intensities from the Rosetta Elucidator System's peptide report results. Peptides were grouped into consensus proteins by protein description and peptide intensities were summed for each protein. Candidates were selected for further validation if they showed increases in all three cancer pools compared with the benign serum pool and if the average intensity of the three cancer pools was at least 1.7 times that of the benign serum pool (). Candidates whose protein intensities did not increase in cancer were not considered to be good biomarkers (). For example, ARG1 and AZGP1 failed because they showed decreases in cancer relative to benign disease, a trend that does not correlate with cancer burden, and DSC1 and SBSN were not further considered because the benign and cancer pools exhibited similar levels of these proteins. shows a number of proteasome subunits that exhibited increases in ovarian cancer. The proteasome complex is responsible for degradation of proteins crucial to cell cycle regulation and apoptosis and has been recognized as a potential target for cancer therapy.
Specific proteasome subunits, including PSMB2 and PSMB4, have been identified as upregulated in gene expression profiles of ovarian carcinomas. Interestingly, circulating intact proteasomes have recently been reported to correlate with EOC, but the assay used in that study did not distinguish specific isoforms or quantify subunits that may not have been in intact proteasomes. shows representative additional promising candidates. One candidate, AGRN, is a 215 kDa protein previously identified as being upregulated in ovarian cancer tissue samples compared with normal and non-ovarian tissue samples, but it has not previously been reported to be a serum biomarker for EOC. In this study, it was identified by SDS-PAGE in the tumor supernatant as both the intact 215 kDa protein and a 43 kDa fragment from the C-terminal region of the protein. In contrast, only the 43 kDa fragment, which is presumably produced by proteolysis either in the tumor or in the blood, was detected in the xenograft plasma. The peptides quantitated in belong to the fragment and correlate with ovarian cancer in this experiment. Additionally, six proteins (ANXA1, FABP5, PSMB3, PSMB6, PSMB8, and PSMB9) were deprioritized because they were either closely related to other selected biomarkers or, based on biology, were considered unlikely to be specific to ovarian cancer. Finally, three biomarkers previously reported by others were detected in either the xenograft mouse plasma or tumor supernatant or both () and were within the targeted 15–50 kDa region of the gel. These known biomarkers, which included HE4 (WFDC2), one of the two FDA-approved ovarian cancer biomarkers, were included in our prioritization and verification analyses as known biomarker references. As expected, these three proteins exhibited increased levels in the cancer pools compared with the benign pool (). CA125 was not identified in the xenograft plasma, presumably due to its extensive glycosylation and low concentration as well as the high complexity of plasma; however, it was identified in the tumor supernatant by its alternative protein name 'Mucin-16' (Table S2).
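The label-free selection criteria described in this section (peptide intensities summed per protein, an increase in every cancer pool relative to the benign pool, and a mean cancer intensity of at least 1.7 times the benign intensity) can be sketched as follows. This is a minimal illustration, not the Rosetta Elucidator workflow itself: the function names, the input layout, the intensity values, and the decision to reject proteins with no benign signal are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

PeptideRow = Tuple[str, Dict[str, float]]  # (protein description, {pool: intensity})


def sum_protein_intensities(rows: Iterable[PeptideRow]) -> Dict[str, Dict[str, float]]:
    """Group peptides by protein description and sum intensities per pool."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for protein, pool_intensities in rows:
        for pool, intensity in pool_intensities.items():
            totals[protein][pool] += intensity
    return {protein: dict(pools) for protein, pools in totals.items()}


def passes_selection(pools: Dict[str, float],
                     benign: str = "B",
                     cancer: Sequence[str] = ("C1", "C2", "C3"),
                     min_ratio: float = 1.7) -> bool:
    """Apply the two criteria: up in every cancer pool, and mean ratio >= min_ratio."""
    benign_level = pools.get(benign, 0.0)
    if benign_level <= 0.0:
        # Assumption: a ratio cannot be formed without a benign signal.
        return False
    cancer_levels = [pools.get(name, 0.0) for name in cancer]
    increased_everywhere = all(level > benign_level for level in cancer_levels)
    mean_ratio = (sum(cancer_levels) / len(cancer_levels)) / benign_level
    return increased_everywhere and mean_ratio >= min_ratio


# Hypothetical intensities for two peptides of one protein, for illustration only.
rows = [
    ("protein X", {"B": 60.0, "C1": 100.0, "C2": 120.0, "C3": 110.0}),
    ("protein X", {"B": 40.0, "C1": 50.0, "C2": 80.0, "C3": 80.0}),
]
totals = sum_protein_intensities(rows)
print(totals["protein X"], passes_selection(totals["protein X"]))
```

Requiring an increase in all three cancer pools, rather than only a high average, guards against a candidate being carried forward on the strength of a single outlying pool.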
Candidate biomarkers for validation in patient serum pools.
Quantitative comparisons of candidate biomarkers using label-free discovery mode LC-MS/MS analysis of patient serum pools.
Potential Correlation of Biomarkers with Gene Expression
To evaluate whether gene expression in ovarian tumor tissues could be a useful indicator of whether a protein is a promising serum biomarker, we queried our candidate biomarkers from against published microarray hybridization data using BioGPS, a centralized gene portal of combined gene annotation resources. Specifically, gene expression levels for normal ovarian tissue (n = 4) and papillary serous ovarian carcinoma primary tumor samples (n
were extracted for each of the candidate markers listed in . shows examples of gene expression patterns for some of our novel and two known ovarian cancer biomarkers. These gene expression levels can be compared with the observed levels of these same proteins in the benign and EOC patient serum pools (). Some proteins show similar trends; that is, elevated levels in both the serum and tumor tissue levels for EOC, including AGRN, TPI1, and HE4. However, other proteins do not exhibit much similarity between tissue expression and serum levels. For proteins such as YWHAH and PSME2, gene expression levels overlap extensively between normal and cancer tissue, but the serum levels of these proteins show similar patterns to those for AGRN and TPI1. Also, PSMA1 exhibits similar expression levels between normal and EOC tumor samples but much higher levels in the serum of EOC patients compared with benign tumor controls. Interestingly, at the gene expression level, each of the four illustrated proteasome subunits exhibits differing expression patterns at the cancer tissue level but all four subunits show elevated serum levels in all cancer patient pools compared with the benign sera.
Gene expression of candidate biomarkers in ovarian tissues.
Overall, these comparisons suggest that gene expression levels are not reliable indicators of blood levels of a given protein, and use of gene expression levels to predict blood biomarkers is likely to be of limited value. This is not surprising because: 1) gene expression levels do not always correlate with protein abundance within cells; 2) shedding of proteins into the extracellular space and, more specifically, into the vascular system, does not necessarily depend upon the tissue levels of that protein; and 3) changes in proteolytic processing, PTM levels, or other processing of proteins that might affect their blood concentration may differ between normal and cancer states.
MRM Assays and Quantitation of Normal, Benign, Early Stage EOC and Late Stage EOC Serum Pools
We subsequently attempted to set up MRM assays for the 11 novel and three known biomarkers shown in as described in Methods. MRM assays achieve high selectivity by monitoring the combination of the specific mass/charge of a parent ion and a unique fragment ion produced after collision to quantify the targeted peptide in a complex mixture. MRM assays targeting at least two peptides per protein were successfully established for all targeted proteins. The methods were integrated into a single multiplexed MRM assay that was subsequently used to quantitate the levels of these proteins in four serum pools: a normal serum pool (pool N, n = 9), a benign ovarian tumor pool (pool B, n = 10), an early-stage ovarian cancer pool (pool E: stage 1 and 2, n = 18), and a late-stage cancer pool (pool L: stage 3, n = 29). The cancer pools included serum from patients with different EOC histotypes, although the majority of tumors were the serous subtype, as is typically the case in groups of EOC patients. Details of the patients and samples used to prepare these pools are summarized in Table S2. The peptides and transitions used in the integrated multiplex MRM assay, as well as the resulting relative quantitative data for the four pools, are shown in Table S3. Resulting relative protein quantities for the four pools are summarized in for the 11 novel candidates. The levels in the same serum pools of the three previously reported biomarkers are shown for reference in .
Verification of promising candidate biomarkers using a label-free MRM assay.
These results confirm that quantitative MRM assays were established for all 11 targeted proteins and that all targeted proteins showed elevated levels in the initial analysis of sera from advanced EOC. This 100% success rate using these two criteria is dramatically better than the 20% success rate achieved for setting up MRM assays for TOV-112D-derived candidate biomarkers showing elevated levels in initial screens of sera from advanced EOC. This more efficient selection of candidate biomarkers was achieved because the current strategy utilized additional criteria prior to attempting to set up MRM assays. The key advantages of the current approach include analysis of the tumor supernatant to extend sequence coverage for putative human plasma proteins in the xenograft mouse plasma and comparison of the remaining high priority candidate biomarkers to an in-depth discovery mode quantitative comparison of serum pools from benign and advanced EOC patients. This latter analysis identified those proteins detectable in patient serum as well as those proteins exhibiting elevated levels in EOC patient sera. Interestingly, approximately two-thirds of the high priority candidates from the xenograft mouse plasma that were both expected to be in the 15–50 kDa region and verified in the tumor supernatant were detected in the patient pools, and half of these met the criteria used above for elevated levels in EOC serum. This 20% success rate is very similar to that obtained with the TOV-112D candidate biomarkers. The major difference in the current study is that time and expense were not invested in attempting to set up MRM assays for the 75–80% of biomarkers that would ultimately fail to be detected in patient plasma or that would not show elevated levels in advanced EOC patient serum.
Although substantial mass spectrometer and analysis time was invested in conducting the in-depth discovery mode quantitative comparison of serum pools from benign and advanced EOC patients, these analyses do not need to be repeated as the same dataset can be used to screen future candidate biomarkers. Additionally, this study includes pools of mixed histotypes that approximate the mixtures of cancer subtypes typically seen clinically because the numbers of available samples and the assay throughput were too low to distinguish potential subtype specific biomarkers. One goal of future studies using higher throughput assays such as sandwich ELISA will be to carefully evaluate potential relationships between EOC subtypes and these biomarker candidates.
In the current study, improved strategies for both discovering and triaging novel blood biomarkers for EOC have been developed. The utility of analyzing xenograft mouse plasma and the corresponding tumor supernatant in parallel was demonstrated. The presence of human proteins in the plasma demonstrated that these proteins were produced by the tumor and shed into the blood, but many of these assignments were based upon only a few peptides. Analysis of the tumor supernatant produced far more extensive sequence coverage for human proteins and confirmed many of the proteins identified in the plasma as human. In most cases, the increased sequence coverage provided additional peptide candidates for potential MRM assays. A second key step in the prioritization and verification strategy was to compare candidate plasma biomarkers to an in-depth, label-free comparison of benign disease and advanced cancer patient serum pools to prescreen candidate biomarkers prior to setting up MRM assays. By extending reverse-phase gradients for the discovery mode analysis of these samples to four hours, the detection sensitivity is similar to that of MRM assays using shorter gradients. That is, if a protein cannot be detected in this dataset, it will probably not be feasible to set up an MRM assay and, therefore, effort is not wasted in assay development. Furthermore, by comparing candidate biomarker levels in the benign and advanced cancer patient pools, only those proteins showing elevated levels in the cancer sera advance to MRM assay development. This new approach reduces the effort invested in setting up MRM assays by about four-fold relative to the biomarkers detected in advanced EOC patient sera at elevated levels.
Based upon initial screening of large pools of normal, benign, early, and advanced ovarian cancer sera, all of the biomarkers selected for MRM assay development in the current study should move forward and be further evaluated using serum or plasma from individual patients and controls. Although the fold changes observed in the pooled samples for most of these candidate biomarkers are not as large as that of HE4, the ranges of values for these biomarkers in individual EOC and control sera need to be determined in order to compare their diagnostic capacities to those of HE4 and CA125. While it is unlikely that most individual biomarkers will prove to be superior to HE4 or CA125, it is more likely that combinations with each other or with CA125 and HE4 could outperform the use of CA125 or HE4 alone. Finally, the strategies developed in this study demonstrate that in-depth analysis of xenograft mouse plasma with efficient pilot verification using multiplexed assays can efficiently identify multiple promising candidate EOC biomarkers. This approach can be readily applied to further in-depth analysis of the OVCAR-3 cell line, as well as other EOC cell lines, to identify additional EOC plasma biomarkers.