We have begun an early phase of biomarker discovery in three clinically important types of breast cancer using a panel of human cell lines: HER2 positive, HER2 negative and hormone receptor positive, and triple negative (HER2−, ER−, PR−). We identified and characterized the most abundant secreted, sloughed, or leaked proteins released into serum free media from these breast cancer cell lines using a combination of protein fractionation methods before LC-MS/MS analysis. A total of 249 proteins were detected in the proximal fluid of 7 breast cancer cell lines. The expression of a selected group of high abundance and/or breast cancer specific potential biomarkers, including thrombospondin 1, galectin-3 binding protein, cathepsin D, vimentin, zinc-α2-glycoprotein, CD44, and EGFR, in the breast cancer cell lines and in their culture media was further validated by Western blot analysis. Interestingly, mass spectrometry identified a cathepsin D single-nucleotide polymorphism (SNP), an alanine to valine replacement, in the MCF-7 breast cancer cell line. Comparison of the media proteome of each cell line displayed unique and consistent biosignatures regardless of the individual group classifications, demonstrating the potential for stratification of breast cancer. Based on the cell line media proteome, predictive Tree software was able to categorize each cell line as HER2 positive, HER2 negative and hormone receptor positive, or triple negative based on only two proteins, muscle fructose 1,6-bisphosphate aldolase and keratin 19. In addition, the predictive Tree software clearly identified the MCF-7 cell line overexpressing the HER2 receptor with the SNP cathepsin D biomarker.
Early detection is one of the most effective means to decrease cancer mortalities1. Protein biomarker discovery by mass spectrometry research represents a promising new approach to improve cancer detection and enable earlier treatment2. Currently mammographic screening is the gold standard in early detection of breast cancer3, 4. However, mammography frequently fails to detect tumors in women with increased density in breast tissue and those with lobular cancer5–7. In addition, routine mammography screening is unaffordable to many women even in the USA and the vast majority of the developing world8. The recent age-related controversy on the inability of mammography to detect tumors in younger women as well as a significant level of false positives in different age groups9, 10 further underlies the importance in developing alternative techniques. Our goal is to develop a more affordable and easily obtainable screening tool for the early detection and characterization of breast cancer. Plasma/serum is the most suitable clinical specimen for biomarker research because it is attainable by non-invasive means, extraction is feasible, and it is likely to contain tumor markers11. However, both patient populations and breast cancer are heterogeneous in nature12 which can complicate the discovery phase of the assay development. Furthermore, there are two major technical hurdles in identifying disease-related protein biomarkers in serum. First, protein concentrations vary by 10–11 orders of magnitude in serum with the useful biomarkers in the lower end of this spectrum. Second, the 20 most abundant serum proteins that make up more than 99% of total protein mass can obscure the finding of low abundance proteins13, 14. Both the wide concentration range and the interference of the high abundance proteins can mask detection of less abundant serum proteins. 
One way to circumvent these hurdles is to begin by analyzing breast cancer cell lines and their released proteins15 for comparison with matching biopsied tumor samples.
The analysis of proximal fluid from a homogeneous cancer source provides a pool of leaked, secreted and sloughed proteins that may be similar to the proteins found in the interstitial fluid of tumor tissue. Most of the clinically useful tumor markers such as prostate-specific antigen (PSA), cancer antigen 125 (CA125), carcinoembryonic antigen (CEA) and alpha-fetoprotein (AFP) are membrane proteins16, 17. These proteins may be released into the interstitial fluids and thus enter the patients' blood circulation18–20. The additional advantages of studying the in vitro proximal fluid are reduced levels of human serum proteins and higher concentrations of tumor-related proteins, which allow their identification by mass spectrometry.
Mass spectrometry currently is capable of identifying the proximal fluid proteome at a dynamic range of 1–4 orders of magnitude21, 22. Applying this approach to cancer proximal fluid biomarker discovery and characterization improves the likelihood of detecting new biomarkers that are present in serum at a dilution of 10-11 orders of magnitude13. When candidate biomarkers are identified and validated not only from studying the cancer cell lines and their proximal fluid, but also cancer tissue, then an effective, practical, and highly sensitive validation assay can be developed with corresponding antibodies to the new biomarkers. In addition, the identified peptides transition states from candidate biomarkers may be used for selective ion monitoring (SIM) during mass spectrometry analysis to screen small volumes of human serum samples while ignoring the high abundant serum proteins23.
Breast cancer is heterogeneous with alternate splicing leading to multiple protein expression, function and activity from a single gene. Currently, breast cancers are grouped into 3 clinical types, HER2 positive, estrogen receptor (ER) and/or progesterone (PR) positive/HER2 negative and triple negative based on the presence or absence of these three biomarkers24. The clinical classification of breast cancers determines the type of adjuvant therapy and predicts clinical outcomes of women with different types of breast cancer25. However, despite targeted treatment of these three breast cancer markers, the successful outcomes are not uniform with many recurrences after the initial treatment25–27. In addition, these three breast cancer markers are typically expressed in lower quantities making them difficult to identify in serum/plasma for screening use. Carcinoma antigen 27.29 (CA27.29, MUC1) and CEA are the only two circulation-borne and breast cancer related biomarkers used clinically, however lack of sensitivity and specificity in blood assays result in no early detection of breast cancer and although elevated levels in blood reflect recurrent/metastatic disease, normal levels may not indicate the lack of disease presence18–20, 28–30. Recently, new candidate tumor biomarkers such as keratin 18, keratin 8, EGFR, CD44, as well as others have been reported20, 31, 32. Taken together, further proteomic characterization is needed to refine subtypes of breast cancer, as well as improve early detection, and systemic treatment of breast cancer.
In this study we have begun an effort to identify the breast cancer proteome by LC-MS/MS in the culture media (proximal fluid) of seven human tumor cell lines representing the 3 major types of breast cancer defined clinically. The quantitative presence of several high abundance breast cancer related proteins, was validated in both the breast cancer whole cell lysates and the proximal fluid by antibody assays. Most of the validated proteins were selected based on their overlap with our previous reports of enriched N-linked glycoproteins present in cancer cell membrane fractions by the hydrazide method32 and were also found in hydrophobic fractions33. Interestingly, we also identified a single-nucleotide polymorphism (SNP) in cathepsin D by LC-MS/MS unique to MCF-7 and MCF-7HER2 cell lines. Further development of a sensitive blood assay and mass spectrometry based selective ion monitoring methods will be necessary to determine the significance of these candidate biomarkers in breast cancer patients.
A panel of breast cancer cell lines including HER2 negative and estrogen receptor (ER) and progesterone receptor (PR) positive T47D and MCF-7, HER2 transfected MCF-7 (MCF-7HER2), HER2 positive SKBR-3 and MDA-MB-453, and triple negative breast cancer (TNBC) MDA-MB-468 and MDA-MB-231 was maintained in DMEM culture medium supplemented with L-glutamine and sodium pyruvate, 10% FBS, and 1% penicillin and streptomycin. Cells were grown to 70% confluency in fifteen 10 cm petri dishes. Media was removed and cells were washed three times with serum free media, then incubated with serum free media for 12 hours. Serum free media was collected and pooled in groups of 5 petri dishes, giving 3 separate sets for each cell line. Cells were harvested by centrifugation (1,000 × g, 2 min, 4 °C) and combined in the same manner as media. Both were stored at −80 °C until needed.
Serum free media (proximal fluid) was centrifuged to remove any cellular debris and a 1 ml aliquot of culture media was removed for Western dot blot validation. The remainder of the proximal fluid was concentrated via centricon (3000 MW cut off, Vivaspin 20, Sartorius Biolab Products), speed vacuum dried, solubilized in 40 mM Tris-HCl pH 8.3, 6 M guanidine HCl, 5 mM DTT, and centrifuged (15,000 × g, 2 min, RT); supernatants were diluted to <1 M guanidine HCl with 40 mM Tris-HCl pH 8.3. The sample was sequentially treated with iodoacetamide and trypsin (overnight, 37 °C) according to the manufacturer's protocol (Promega). The pH of the samples was adjusted to pH 3 with TCA before centrifugation (15,000 × g, 2 min, RT), and the supernatants were passed through an activated, washed C18 spin column (The Nest Group, Inc.) according to the manufacturer's protocol.
The frozen cell pellets were homogenized in ice cold 50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol (DTT), 1 mM PMSF, 1% NP40, and protease inhibitor cocktail (Roche Cat. No. 04693132001) using an ultrasonic cell disrupter (Fisher Scientific, Sonic Dismembrator Model 100, at setting 4 for 2 × 10 seconds at 30-second intervals) on ice. Samples were centrifuged at 15,000 × g for 10 min at 4 °C to remove large debris. Protein concentration was determined by Bradford assay (Bio-Rad), and proteins were solubilized with Laemmli buffer and subjected to Western blot analysis.
Equal amounts of protein were loaded onto 4–12% SDS-PAGE gels, transferred to nitrocellulose and blotted with antibodies against HER2, MUC1, ERα, thrombospondin 1, galectin-3 binding protein, cathepsin D, vimentin, zinc-α2-glycoprotein, CD44, or EGFR (1:2000 dilution; Santa Cruz Biotechnology, Inc., Santa Cruz, CA). Dot blots were performed on the proximal fluid from an equal, quadruple count of breast cancer cells.
Dried samples were treated as described in Whelan et al.32: they were dissolved in Buffer A (H2O/acetonitrile/formic acid, 98.9/1/0.1), separated by nanospray LC (Eksigent Technologies, Inc., Dublin, CA), and analyzed by online tandem LTQ Orbitrap mass spectrometry (Thermo Fisher). Aliquots (10 μl) were injected onto a reverse phase column (New Objective C18, 15 cm, 75 μm diameter, 5 μm particle size, equilibrated in Buffer A) and eluted (300 nL/min) with an increasing concentration of Buffer B (acetonitrile/water/formic acid, 98.9/1/0.1; time in min/% Buffer B: 0/5, 10/10, 112/40, 130/60, 135/90, 140/90). Eluted peptides were analyzed by MS and data-dependent MS/MS acquisition (collision-induced dissociation, CID), selecting the 7 most abundant precursor ions for MS/MS with a dynamic exclusion duration of 15.0 seconds.
The mass spectra were searched against an IPI 3.73 human trypsin indexed database (two trypsin missed cleavages), in both the forward and backward directions (decoy database; 179,304 proteins), with variable modifications of carboxyamidomethylation, deamidation and methionine oxidation using the Proteome Discoverer software 1.3 (Thermo Fisher) based on the SEQUEST algorithm and Mascot (Matrix Science, UK). Quantitative data analysis was performed using the Scaffold 3.43 (Proteome Software, Inc.) program. The Proteome Discoverer search results were uploaded into Scaffold and filtered with a 99% minimum protein ID probability (calculated probability of correct protein identification), a minimum of 2 unique peptides per protein, and a stringent minimum peptide ID probability of 95% (all data and parameters are included in supplemental data). Scaffold verifies peptide identifications derived from MS/MS sequencing results using the X! Tandem34 and ProteinProphet35 computer algorithms. Scaffold normalizes MS/MS data between samples with similar total protein amounts by averaging the total spectral counts across all the samples and then multiplying the spectral counts in each sample by that average divided by the individual sample's sum. The proximal fluid from each breast cancer cell line is represented by three replicates, each containing a combination of five separate experiments.
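The normalization step described above can be illustrated with a minimal sketch. It assumes that the averaged quantity is the per-sample total spectral count; the function name, sample names, and counts are hypothetical and are not taken from Scaffold or from the study data.

```python
def normalize_spectral_counts(samples):
    """Scale each sample so that its total spectral count equals the mean total.

    samples: dict mapping sample name -> dict of protein -> raw spectral count.
    This mirrors the description in the text; it is not Scaffold's own code.
    """
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    if any(total == 0 for total in totals.values()):
        raise ValueError("every sample needs at least one spectral count")
    mean_total = sum(totals.values()) / len(totals)

    normalized = {}
    for name, counts in samples.items():
        factor = mean_total / totals[name]  # average divided by this sample's sum
        normalized[name] = {protein: c * factor for protein, c in counts.items()}
    return normalized


# Hypothetical raw counts for two replicates of one cell line.
raw = {
    "rep1": {"protein_a": 30, "protein_b": 70},    # total 100
    "rep2": {"protein_a": 80, "protein_b": 120},   # total 200
}
norm = normalize_spectral_counts(raw)
```

After scaling, both replicates sum to the same total (150 in this toy case), so counts for a given protein can be compared across samples.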
One-at-a-time mean intensity comparisons across the 4 cancer cell groups (triple negative breast cancer, HER2+ and hormone negative, hormone positive and HER2 negative, and HER2 transfected MCF7) were carried out using one-way analysis of variance (ANOVA) methods. The overall F statistic and corresponding p value under this method are reported. A separate comparison was done for each of the top 229 proteins.
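As a rough illustration of the per-protein comparison, the sketch below computes the one-way ANOVA F statistic for a single protein from its spectral counts in each group. The function name and the example counts are hypothetical; the p value additionally requires the F distribution (available in standard statistics packages) and is omitted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical spectral counts for one protein in three groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With 21 observations in 4 groups, as in this study, the degrees of freedom would be 3 and 17.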
Visualization of the high dimensional intensity scattergram was carried out by computing the first two principal components. The principal components were computed from the subset of all proteins that had statistically significant mean intensity differences for at least one group compared to the others.
A multivariate classification tree analysis (CART, classification and regression tree) was also carried out in order to identify a subset of the top 229 proteins that best classified the observations into the four groups. The tree uses the binary recursive partitioning algorithm.
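One step of binary recursive partitioning can be sketched as follows. The text does not state which splitting criterion was used, so Gini impurity is assumed here as a common CART choice; the function names and the toy counts are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one protein that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_impurity).
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical spectral counts of one protein in two groups of samples.
threshold, impurity = best_split(
    [2, 4, 6, 30, 35, 40],
    ["A", "A", "A", "B", "B", "B"],
)
```

A full tree repeats this search over every protein at each node and then recurses into the two resulting subsets until the groups are separated.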
The aim of this study was to identify the most abundant proteins in the proximal fluid of growing breast cancer cells, which represent the proteins sloughed off, secreted, leaked or cleaved by proteases from the cells. Breast cancer cell lines were selected instead of cancer tissues as the initial step for new biomarker discovery because they are more homogeneous, can be grown to any quantity necessary, and are easily manipulated experimentally. They are also readily available to other investigators to reproduce these findings and provide proximal fluid to define new cancer biomarkers. We analyzed the proximal fluids of 7 different breast cancer cell lines in triplicate from the three major categories of breast cancer, HER2 positive (SKBR-3, MDA-MB-453), HER2 negative and hormone receptor positive (T47D, MCF-7), and triple negative (MDA-MB-468, MDA-MB-231), together with MCF-7 transfected with HER2 (MCF-7HER2). We used the MCF-7HER2 cell line to assess whether it resembles its native HER2 negative parental cancer cell line or the group of HER2 positive breast cancer cell lines. The proximal fluid of each cell line was concentrated, digested and subjected to LC-MS/MS. All mass spectrometry data was searched using Bioworks and Mascot against the IPI human 3.73 database with a built in decoy database, then uploaded into Scaffold to quantitatively analyze the proximal fluid proteins of all 7 breast cancer cell lines (Table 1, top 50 proteins). While the data was being uploaded into Scaffold, it was verified by X! Tandem. A total of 249 proteins were identified in the serum free media from the 7 breast cancer cell lines with a stringent Scaffold filter setting of a 99% minimum protein ID probability and a minimum of 2 unique peptides per protein (Supplemental Table 1, all 249 proteins). In addition, the average and standard deviation of the triplicate experiments were calculated for each identified protein (Supplemental Table 1).
The standard deviations of the triplicate samples for each cell line demonstrate significant reproducibility. A significant portion of the proximal fluid proteins are localized in the extracellular region (76 proteins) and plasma membrane (54 proteins) and have diverse biological functions and processes (Supplemental Figures 2–4). More importantly, several proteins were found to vary considerably in the serum free media of the 7 breast cancer cell lines: thrombospondin 1, galectin-3 binding protein, cathepsin D, vimentin, zinc-α2-glycoprotein (ZAG), CD44, EGFR, keratin 18 and enolase, as shown in Figure 1. Seven of these proteins were selected for further validation in the whole cell lysates of each cell line and their respective proximal fluid. These proteins were selected for validation based on their high abundance and differential expression in the different breast cancer groups.
Cell lysates of the 7 breast cancer cell lines were subjected to Western blot analysis in triplicate (1 replicate = average of 5 combined experiments). In most cases the expression and levels of each protein in the cells as detected by Western blot (Figure 2) were consistent with the findings of mass spectrometry analysis. Western blot analysis of HER2, ERα and MUC1 was incorporated as a quality control of the assay (Figure 2). MCF-7 and MCF-7HER2 cell lines both contained ERα and MUC1, consistent with the literature, as well as thrombospondin 1 and low levels of galectin-3 binding protein. Although MCF-7 and MCF-7HER2 had high levels of cathepsin D by mass spectrometry analysis, a monoclonal antibody to cathepsin D (antigen 1–75) failed to detect this protein in the Western blot of the same cell lysates. This discrepancy between the two assays may have been due to a posttranslational modification or mutated amino acid in the first 75 amino acids that interferes with the monoclonal antibody binding site. Both HER2 positive cell lines had high levels of HER2 and thrombospondin 1. However, SKBR-3 had high levels of galectin-3 binding protein and cathepsin D, while the other HER2 positive cell line, MDA-MB-453, had high levels of zinc-α2-glycoprotein (ZAG) and low levels of cathepsin D. Signature cancer proteins, EGFR and CD44, were both found in a triple negative cell line, MDA-MB-468, as were ZAG, galectin-3 binding protein and cathepsin D. The other triple negative cell line, MDA-MB-231, had higher levels of cathepsin D and galectin-3 binding protein and significant levels of vimentin, but reduced CD44 and no detectable levels of ZAG, EGFR, and MUC1. Our study suggests that even within the same subtype of breast cancer, significant differences exist between breast cancer cell lines.
Proteins identified by mass spectrometry in serum free media were then confirmed by dot blot analysis (Figure 3). Dot blot analysis allowed us to analyze multiple samples in triplicate and compare relative levels of protein in the serum free media using sensitive and specific antibodies. In almost every case, the level of each protein found by mass spectrometry coincided with the results of dot blot analysis of the serum free media; the exceptions were the expression of thrombospondin 1 from MDA-MB-231 and the absence of cathepsin D from MCF-7 and MCF-7HER2 cells. Some non-specific background was observed in a few dot blots; development of IgY antibodies in chickens may give cleaner results. We were able to detect these proteins at a 1 to 500 dilution of the sample (data not shown).
The lack of cathepsin D detection in MCF-7 and MCF-7HER2 proximal fluid was consistent with the Western blot data of their whole cell lysates. Interestingly, the mass spectrometry coverage of MCF-7HER2 cathepsin D (MS/MS coverage of 24–54 amino acids) did not completely span the antigenic region (first 75 amino acids) recognized by the monoclonal antibody (Figure 4A). While in SKBR-3 cells the mass spectrometry coverage only included amino acids 45–82 of the cathepsin D, strong immunoreactivity was seen in both Western blot analysis of the whole cell lysate and proximal fluid (Figure 4A). Therefore, the Universal Protein knowledgebase (UniProtKB) was used to search for possible posttranslational modifications and/or amino acid mutations on cathepsin D that would block the antibody interaction. Since there are 4 potential single-nucleotide polymorphisms (SNPs) that could occur in cathepsin D (58A →V; 229F→I; 282G→R; 383W→C)36, 37, a database was manually constructed containing all potential cathepsin D SNPs and mass spectrometry data of all seven cell lines was searched. Consistent with the Western blot findings of whole cell lysate and the proximal fluid analysis we detected the wild type amino acid sequence 55-YSQAVPAVTEGPIPEVLK-72 in cell lines MDA-MB-231 and SKBR-3 (Figure 4B). However, only the MCF-7 and MCF-7HER2 proximal fluid cathepsin D was found to have a SNP at amino acid 58A→V (Figure 4C). The identification of a SNP in cathepsin D may allow for the development of a new breast cancer cell line specific biomarker antibody.
Clinical grouping of breast cancer currently is based on three major protein biomarkers, HER2, estrogen receptor, and progesterone receptor. However, many other proteins defining unique features of a cancer cell may also prove important in breast cancer stratification and targeted treatment. Mass spectrometry identified 249 proteins from the seven breast cancer cell lines in triplicate quantitative spectral Scaffold analysis, thus resulting in 21 observations for each protein. These seven cell lines were classified into four groups: HER2 positive, hormone receptor positive and HER2 negative, triple hormone receptor negative, and MCF-7 transfected with HER2. A blind statistical analysis (without protein names) was conducted by the UCLA statistical biomathematical consulting clinic (SBCC). Univariate comparison of mean values was used for each protein across groups via one-way analysis of variance methods (Supplemental Table 2). The mean comparisons of all 229 proteins were ranked from most statistically significant to least statistically significant. Using all four breast cancer groups including MCF7 HER2, 145 of the 229 proteins have a p value less than 0.05 in at least one of the 6 comparisons (Supplemental Table 2). Multivariate analysis of all seven cell lines was computed for all 229 proteins. A plot of the first versus the second principal component is the projection (shadow) of the 229 dimensional data onto a two dimensional plot (Figure 5). Reproducibility of the principal components can be seen in the mass spectrometry data for each sample. In addition, there are differences between each individual cell line regardless of breast cancer group, demonstrating the potential for breast cancer stratification.
Predictive software was used to determine if a cell line may be grouped into the HER2 positive, hormone receptor positive and HER2 negative, hormone receptor positive and HER2 over-expressed, or triple hormone receptor negative cohorts by analyzing a set of unique biosignature proteins. The Tree test required a maximum of only 3 proteins, cathepsin D, fructose 1,6-bisphosphate aldolase and keratin 19, to differentiate between each cell group (Figure 6). Tree results clearly identified cathepsin D, at a spectral count of greater than 59.5, as an indicator of MCF-7 cells over-expressing HER2. Secondly, fructose 1,6-bisphosphate aldolase at greater than 18 spectral counts classified the cell line as HER2 positive, while less than 18 spectral counts required a second protein, keratin 19, to further distinguish the remaining 2 groups. Spectral counts less than 18 for keratin 19 indicated triple negative cell lines. Spectral counts greater than 18 indicated hormone positive and HER2 negative cell lines. Interestingly, keratin 8 is also found in higher quantities in hormone positive and HER2 negative cell lines and may be another useful biomarker in conjunction with keratin 19. The keratin 19 data correlated with our other studies in primary breast cancer tissue by both mass spectrometry analysis and Western blot validation38, 39. Western blot analysis of fructose 1,6-bisphosphate aldolase demonstrated that it was present in 4 of 5 HER2+ and 2 of 5 ER+PR+ tissue samples, while it was found at low levels in TNBC tissue (Figure 7). Western blot analysis of keratin 8 was also conducted in five patient samples each of hormone receptor positive and HER2 negative (3 of 5 positive), HER2 positive and hormone receptor negative (1 of 5 positive) and triple negative (0 of 5 positive) breast cancer tissue (Figure 7).
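The decision rules reported for the Tree test can be written out as a short sketch. The thresholds (59.5 and 18 spectral counts) come from the text; the function name, the group labels as strings, the handling of counts exactly equal to a threshold, and the example counts are assumptions made for illustration only.

```python
def classify_cell_line(cathepsin_d, aldolase, keratin_19):
    """Apply the three-protein rules from the text to normalized spectral counts.

    Counts exactly equal to a threshold are not addressed in the text; here
    they fall through to the lower branch (an assumption).
    """
    if cathepsin_d > 59.5:
        return "MCF-7 over-expressing HER2"
    if aldolase > 18:
        return "HER2 positive"
    if keratin_19 > 18:
        return "hormone receptor positive and HER2 negative"
    return "triple negative"


# Hypothetical spectral counts, not measurements from the study.
examples = {
    "sample_1": classify_cell_line(cathepsin_d=75, aldolase=10, keratin_19=40),
    "sample_2": classify_cell_line(cathepsin_d=20, aldolase=25, keratin_19=5),
    "sample_3": classify_cell_line(cathepsin_d=20, aldolase=10, keratin_19=30),
    "sample_4": classify_cell_line(cathepsin_d=20, aldolase=10, keratin_19=2),
}
```

The order of the checks matters: cathepsin D is evaluated first, so a sample with a high cathepsin D count is assigned to the MCF-7HER2 group regardless of its aldolase or keratin 19 counts.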
We further tested the potential usefulness of keratin 19 as a biomarker in the serum of 10 hormone positive, 10 HER2 positive and triple negative breast cancer patients by ELISA (Supplemental Figure 5). Although the data was not considered experimentally significant, the trend of increased levels of keratin 19 in circulation was consistent with the data reflected in mass spectrometry analysis and affinity blots of the breast cancer media and whole cell extracts. Interestingly, in our recent published work (Jianbo et al.39) galectin-3 binding protein was found in higher concentrations in triple negative breast cancer tissue than in HER2+ tissue. A number of other proteins may also be used in combination for identification of TNBC, such as annexin A1, annexin A5, CD44, EGFR, and vimentin. These observations have not been validated extensively and are provisional, but they are encouraging for the application of mass spectrometry data to differentiating breast cancer groups. We plan to test a panel of biomarkers selected from this manuscript and our other breast cancer studies32, 33, 38, 39 in 250 cases by tissue microarray. In addition, a list of the top 100 biomarker candidates will also be screened in the serum of 20 hormone receptor positive, 20 HER2 positive, 20 TNBC and 20 control patients.
Early detection remains a key to improved breast cancer survival rates. Despite decades of research, little progress has been made in the development of an effective new blood assay for the early detection of any cancer. The study of blood serum for biomarker discovery has been hindered by the enormous number of serum proteins and the large volume of circulating blood in cancer patients, which make direct identification of new breast cancer related proteins in blood an impossible task. In addition, 20 highly abundant proteins make up 99% of serum protein and overwhelm most methods of fractionation before low abundance biomarkers of disease, including human breast cancer, can be detected.
Therefore, to improve the chance of developing a novel blood assay for breast cancer screening, biomarker discovery must be conducted on the source of breast cancer and nearby interstitial fluid. To circumvent the variability caused by heterogeneous breast cancer tissue, we focused our study on breast cancer cell lines and their proximal fluids. Using an LTQ Orbitrap mass spectrometer in conjunction with Scaffold, we were able to identify proteins with a dynamic range of 3–4 orders of magnitude and quantitatively compare the protein signatures of each of the seven breast cancer cell lines. Although identifying low abundance proteins is needed to further characterize breast cancer, it is unlikely they would be at high enough concentrations to be first detected by mass spectrometry in the serum of a patient. Therefore, in this study we focused on the secreted, shed or leaked protein biosignatures of breast cancer cells as potential biomarkers for the future development of a blood assay.
A total of 249 proteins were identified with 99% confidence from a panel of seven breast cancer cell lines representing four clinically different types of breast cancer. Selective validation of biomarkers differentially expressed in these breast cancer cells and their proximal fluid samples was performed. Seven highly expressed candidates, EGFR, vimentin, thrombospondin 1, CD44, ZAG, galectin-3 binding protein and cathepsin D, were selected for their clearly distinct biosignatures in each cancer cell line. Western blot validation of these proteins from whole cell lysates suggested that their biosignatures in the cell lines closely matched the expression levels measured by mass spectrometry. Furthermore, we were able to validate most of these proteins in the proximal fluid of each cell line by dot blot analysis, confirming the quantitative spectral count results found by LC-MS/MS analysis. However, there were a few exceptions, including the level of ZAG protein in the proximal fluid of MDA-MB-453, which was lower than that found by mass spectrometry, possibly due to digestion by proteases in the proximal fluid or differential posttranslational modifications interfering with the antibody-antigen binding.
In addition, cathepsin D was not detected by Western blots in the whole cell lysates or proximal fluid of either MCF-7 or MCF-7HER2, but found by mass spectrometry. By creating a database of all potential cathepsin D SNPs, the mutation 58A→V was identified from the mass spectrometry data. This unique mutation found only in MCF-7 cells may explain why the cathepsin D was not detected by the monoclonal antibody in the Western blot or dot blot analysis while it was clearly identified by mass spectrometry. SNPs naturally occur by a single nucleotide mutation in DNA resulting in the translation of an amino acid that is different from the wild type.
Whether the cathepsin D SNP found in MCF-7 has any biological significance is unknown, but SNPs play a pathogenic role in a number of diseases including Alzheimer's disease, Crohn's disease, autism, psoriasis, Parkinson's disease, schizophrenia and cancer40. Intriguingly, the 58Ala to 58Val polymorphism may affect the intracellular trafficking and maturation of this pro-enzyme in cancer41 and the level of beta-amyloid and tau, increasing the risk of Alzheimer's disease36. The alteration in proenzyme routing in several breast cancer cell lines leads to its hypersecretion and also makes cathepsin D an excellent candidate for a blood assay. Moreover, procathepsin D (pCD) is secreted from cancer cells and acts as a mitogen on the cancer cells, stromal cells, and endothelial cells, stimulating their pro-invasive and pro-metastatic properties42. Others have also shown that over-expression of cathepsin D in human breast cancers is associated with a higher risk of relapse and metastasis41, 42. Interestingly, the Tree test designated cathepsin D as the number one biomarker in MCF-7 HER2 cell lines due to its high expression.
Another discordant finding was that the proximal fluid of MDA-MB-231 had significantly higher quantities of thrombospondin than seen by mass spectrometry analysis of whole cell lysates. This may be due to a combination of factors including posttranslational modifications, SNPs, and/or incomplete reduction and alkylation of the many disulfide bonds. Thrombospondin is also heavily glycosylated, and this could also interfere with mass spectrometry identification. Since high quantities of thrombospondin are also found in blood, mutations such as the SNP found in cathepsin D, or posttranslational modifications unique to breast cancer cells, would need to be identified before thrombospondin could become a candidate biomarker. Monoclonal antibodies can be raised against SNPs or posttranslational modifications such as phosphorylation or glycosylation sites on normal versus disease specific biomarkers43, 44, giving rise to a highly selective tool for the detection of disease modified proteins.
All the proteins validated in this study except vimentin may be enriched by their N-linked glycosylation sites using the hydrazide method, as we described in our previous study32. The hydrazide method or a lectin column may be used to enrich for disease specific glycosylated protein biomarkers. Knowing the glycosylation patterns of these candidate biomarkers will also allow for the development of antibodies that recognize the absence or presence of disease specific glycosylation. In addition, the recent discovery of O-GlcNAc modified vimentin by Slawson et al. allows for the development of site-specific antibodies45. Though the mass spectrometer reproducibly and consistently detected the expression levels of most protein biosignatures validated by antibody-based platforms in whole cell lysate and the proximal fluid, the value of this instrument is mainly in discovery, while affinity assays are more important for later implementation of sensitive diagnostic tests.
As a positive control, we also validated the presence of HER2 in the breast cancer whole cell lysates and proximal fluid, despite it not being detected by the LTQ mass spectrometer. By Western blot, the HER2 protein was found to be expressed at high levels in MCF-7HER2, SKBR3, and MDA-MB-453, but at a significantly lower level in the MCF-7 and T47D breast cancer cell lines. HER2 could also be clearly identified by an antibody-based assay in the proximal fluids of the MCF-7HER2 and SKBR3 cell lines. In addition to the specificity and sensitivity of antibody-based detection systems, these antibodies may also be attached covalently to resin to enrich low-abundance biomarkers from larger volumes of blood by immunoprecipitation. Theoretically, an immunoprecipitation of 10 mL of serum from a breast cancer patient with an average circulating volume of 3500–4500 mL would allow the detection of biomarkers with a dilution of 1 to 350–450 of the analyte, well within the working dilution range of 1 to 500 that we observed in the Western dot blot analysis of proximal fluids. Still, the key to the success of a viable blood assay is the identification of the true biosignatures of cancer in the discovery phase using mass spectrometry, followed by the validation of biomarkers by affinity assays.
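The dilution estimate above can be checked with simple arithmetic. The following is a minimal sketch, assuming the analyte is evenly distributed across the circulating volume; the function and constant names are illustrative and not part of the study.

```python
# Illustrative check of the dilution estimate; names are hypothetical.

WORKING_DILUTION_LIMIT = 500.0  # the 1-to-500 dilution observed in the dot blot analysis


def dilution_factor(sample_ml: float, circulating_ml: float) -> float:
    """Return X for a '1 to X' dilution when sample_ml is drawn from circulating_ml."""
    if sample_ml <= 0 or circulating_ml <= 0:
        raise ValueError("volumes must be positive")
    return circulating_ml / sample_ml


def within_working_range(sample_ml: float, circulating_ml: float,
                         limit: float = WORKING_DILUTION_LIMIT) -> bool:
    """True if the resulting dilution does not exceed the working limit."""
    return dilution_factor(sample_ml, circulating_ml) <= limit


if __name__ == "__main__":
    # 10 mL of serum against the 3500-4500 mL range quoted in the text.
    for volume_ml in (3500.0, 4500.0):
        factor = dilution_factor(10.0, volume_ml)
        ok = within_working_range(10.0, volume_ml)
        print(f"{volume_ml:.0f} mL: 1 to {factor:.0f}, within working range: {ok}")
```

Under this assumption, both ends of the quoted volume range give dilutions below the 1 to 500 limit, whereas a smaller sample (for example 5 mL) would not.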
Although 249 proteins were detected by mass spectrometry analysis of the proximal fluids of seven cell lines, we selectively validated and analyzed only several candidate biomarkers. It is necessary to look at a wider scope of each individual cancer's biosignature to properly identify every candidate. Although a larger cohort needs to be analyzed, the principal component plot of each breast cancer cell line demonstrates the reproducibility of the mass spectrometry analysis of the proximal fluid proteome and the unique biosignature of each cell line. Each potential biomarker protein selected for Western blot validation was found to be statistically significant in the univariate analysis across all breast cancer groups, except zinc-α2-glycoprotein. Therefore, any number of these proteins could lead to the development of multi-biomarker affinity assays for grouping breast cancer cells. Interestingly, the Tree test required only 3 biosignature proteins to successfully group the breast cancer cell lines as HER2 positive; hormone receptor positive and HER2 negative; hormone receptor positive over-expressing HER2; and triple negative.
The first qualifying protein used in the Tree test was the SNP-containing cathepsin D, which is an excellent candidate for creating an antibody that specifically recognizes the amino acid sequence containing the SNP mutation. The second qualifying protein is muscle fructose 1,6-bisphosphate aldolase, a key protein in glycolysis. Fructose 1,6-bisphosphate aldolase as well as other metabolic proteins have been implicated as potential biomarkers in a number of diseases such as pancreatic ductal adenocarcinoma46, melanoma47 and schizophrenia48. Metabolic upregulation and high glucose consumption, known as the Warburg effect, are common in cancer cells and allow their aggressive growth. A number of other key glycolysis enzymes including glucose-6-phosphate isomerase, phosphoglycerate kinase, enolase 1, and pyruvate kinase 3 were differentially secreted among the seven breast cancer cell lines (Supplemental Tables 1 and 3). When the proximal fluid protein profiles were compared to the expression levels in the nucleocytoplasmic fractions of the same cell lines, there was a significant degree of variability between each cell line. In addition, the number one protein found in the nucleocytoplasmic fractions, 60 kDa heat shock protein (mitochondrial), was found in significantly lower quantities in the proximal fluid fractions, suggesting that the proximal fluid profiles were unique to each breast cancer cell line and that minimal cell death occurred (Supplemental Table 3). Any combination of these enzymes may help in the stratification of human breast tumors and may be key anti-cancer drug target points. Pyruvate kinase 3 (also known as pyruvate kinase 2) is abnormally expressed in breast cancer tissue, while in normal breast cells pyruvate kinase 1 is expressed49, making it a potential drug target. Any one of these metabolic enzymes may contain single-nucleotide polymorphisms that affect its function.
The third qualifying protein from the Tree test was keratin 19, a well-studied biomarker for breast cancer50–52. Interestingly, keratin 8 was also found in high quantities in cell lines similar to those with high keratin 19, which may make it a candidate biomarker as well. Analysis of 10 hormone receptor positive, 10 HER2 positive and 10 triple negative serum samples for the presence of keratin 19 by ELISA was consistent with the mass spectrometry and affinity blot data from the breast cancer media and cell lines. Further analysis and validation of these biosignatures will be important in stratification, diagnosis, systemic treatment, response, and/or metastasis of human cancer.
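To make the three-protein grouping concrete, the sketch below hand-codes a small decision tree that tests the qualifying proteins in the order discussed above. It is not the output of the Tree software: the direction of each comparison, the cut-off values, the assignment of groups to leaves, and all names are hypothetical placeholders chosen only for illustration. The study reports only that these three proteins sufficed to separate the four groups.

```python
from dataclasses import dataclass

# Hypothetical cut-offs in arbitrary relative-abundance units (NOT from the study).
ALDOLASE_CUTOFF = 1.0
KERATIN_19_CUTOFF = 1.0


@dataclass(frozen=True)
class MediaProfile:
    """Proximal-fluid measurements for one cell line (illustrative fields)."""
    cathepsin_d_snp_detected: bool  # SNP-containing cathepsin D observed
    aldolase: float                 # muscle fructose 1,6-bisphosphate aldolase
    keratin_19: float


def classify(profile: MediaProfile,
             aldolase_cutoff: float = ALDOLASE_CUTOFF,
             keratin_19_cutoff: float = KERATIN_19_CUTOFF) -> str:
    """Walk a three-split tree; every split direction here is an assumption."""
    # Split 1: the SNP-containing cathepsin D flags the HER2 over-expressing
    # MCF-7 line, as reported for the Tree test.
    if profile.cathepsin_d_snp_detected:
        return "hormone receptor positive over-expressing HER2"
    # Split 2 (assumed direction): high aldolase -> HER2 positive.
    if profile.aldolase >= aldolase_cutoff:
        return "HER2 positive"
    # Split 3 (assumed direction): high keratin 19 -> hormone receptor positive.
    if profile.keratin_19 >= keratin_19_cutoff:
        return "HER2 negative, hormone receptor positive"
    return "triple negative"


if __name__ == "__main__":
    example = MediaProfile(cathepsin_d_snp_detected=False, aldolase=0.4, keratin_19=2.5)
    print(classify(example))
```

In practice the splits and cut-offs would be learned from the measured media proteome rather than fixed by hand; the point of the sketch is only that three sequential tests are enough to yield four groups.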
Since breast cancer is highly heterogeneous, it is necessary to look at a larger pool of candidate biomarkers beyond the HER2, PR, and ER proteins in defining breast cancer in order to successfully deliver personalized cancer treatment. Combining mass spectrometry in the discovery phase, to study primary cancer cells and proximal fluid, with sensitive affinity assays in validation will give us a better chance of developing simple and non-invasive multi-biomarker blood assays for characterizing breast cancer. In addition, our study introduces a new opportunity to develop antibodies recognizing cell-specific SNPs or posttranslational protein modifications, which may also become a tool in biomarker validation.
We would like to thank Jefferey Gornbein of the Statistical Biomathematical Consulting Clinic (SBCC) in the UCLA Biomathematics Department for his assistance in the statistical analysis of our data. This work was supported in part by the California Breast Cancer Research Program (6JB-0013), the Department of Defense (DAMD17-01-1-0179), the National Institutes of Health (1RO1CA93736), the Gonda Foundation, the EIF-Women Cancer Research Fund and Friends of the Breast Program at UCLA.