Cancer cell membrane proteins are released into the plasma/serum by exterior protein cleavage, membrane sloughing, cellular secretion or cell lysis, and represent promising candidates for interrogation. Because many known disease biomarkers are both glycoproteins and membrane bound, we chose the hydrazide method to specifically target, enrich, and identify glycosylated proteins from breast cancer cell membrane fractions using the LTQ Orbitrap mass spectrometer. Our initial goal was to select membrane proteins from breast cancer cell lines and then to use the hydrazide method to identify the N-linked proteome as a prelude to evaluation of plasma/serum proteins from cancer patients. A combination of steps facilitated identification of the glycopeptides and also defined the glycosylation sites. In MCF-7, MDA-MB-453 and MDA-MB-468 cell membrane fractions, use of the hydrazide method facilitated an initial enrichment and site mapping of 27 N-linked glycosylation sites in 25 different proteins. However, only three N-linked glycosylated proteins, galectin-3 binding protein, lysosome associated membrane glycoprotein 1, and oxygen regulated protein, were identified in all three breast cancer cell lines. In addition, MCF-7 cells shared three other proteins with MDA-MB-453. Interestingly, the hydrazide method isolated a number of other N-linked glycoproteins also known to be involved in breast cancer, including epidermal growth factor receptor (EGFR), CD44, and the breast cancer 1, early onset isoform 1 (BRCA1) biomarker. Analyzing the N-glycoproteins from membranes of breast cancer cell lines highlights the usefulness of the procedure for generating a practical set of potential biomarkers.
Breast tumors are the most common malignancy in women and the second leading cause of cancer death in American females. Until cancer prevention becomes a reality, early detection and better treatment are the two most effective means to reduce cancer mortality1. Although mammography screening is successful for detecting many breast tumors, it often fails to identify malignancies in young women, as well as in women with dense breasts, implants or lobular tumors2. In addition, mammography is unaffordable to many women not only in the USA, but also in the overwhelming majority of the developing world 3. Protein cancer biomarkers provide a promising approach that may advance the field in disease detection and treatment1. To achieve this clinical goal, discovery of more specific, sensitive and reliable biomarkers is essential 4. A patient's blood plasma/serum is considered to be the most desirable clinical specimen for biomarker research because it is attainable with minimal invasiveness, extraction is simple and affordable, and it is likely to contain tumor markers, albeit in very low abundance. It is well documented that not only do tumors leak or secrete proteins into the circulation, but also the surrounding stroma releases proteases and other mediators of cancer growth 5 6. The current panel of known tumor markers includes carcinoma antigen 27.29 [CA 27.29] for breast cancer 7, prostate-specific antigen [PSA] for prostate cancer 8, cancer antigen 125 [CA125] for ovarian cancer 9, carcinoembryonic antigen [CEA] for a wide spectrum of carcinomas 7, 10, and alpha-fetoprotein [AFP] for liver 11 and testicular cancers 10. These proteins are all examples of useful circulation-borne cancer biomarkers that probably arise from the primary tumor or the surrounding stroma 7, 8.
However, the discovery of new cancer biomarkers in serum is hindered by two major technical hurdles: the low abundance of disease related proteins in sera and the interference of the 20 or so highly abundant normal serum proteins that make up 99% of the circulating proteome12–14. Serum protein concentrations cover more than 10 orders of magnitude. This enormous range exceeds the dynamic range of any current analytical method or instrument and further complicates the discovery of new cancer biomarkers. Therefore, direct protein profiling of tumor cells may be the most effective way to identify novel cancer signatures, before screening human sera.
However, extracts secreted by or extracted from breast cancer cell lines, proximal fluid, and breast tumor biopsies have highly complex proteomes that must be extensively and reproducibly fractionated in order to identify candidate cancer biomarkers with clinical relevance. A number of fractionation strategies are currently used to reduce sample complexity. These include chromatography, often with the combination of ion exchange and reverse phase modalities such as the multidimensional protein identification technology (MUDPIT)15, fractionations based on molecular weight cut-off filters14, fractionations based on hydrophobicity such as that developed using reverse phase separations16, the capture of cysteine-containing peptides with biotinylated thiol reagents17, the capture of phosphorylated peptides using immobilized metal affinity chromatography18 or dendrimer conjugation chemistry 19, and glycopeptide capture 20. Since most clinically useful tumor markers such as PSA21, CA12522, 23, CEA24, 25, or AFP26, are both N-glycosylated and O-glycosylated we take advantage of the hydrazide method’s utility to isolate, enrich, and identify the N-glycoproteome. In addition, important proteins such as fibroblast growth factor (FGF)27, epidermal growth factor receptor (EGFR)28, insulin receptor (IR)29, and insulin-like growth factor 1 (IGF-1)30 that are all N-glycosylated may also be enriched, identified, and site-mapped by the hydrazide method. In this study we investigated cell membrane glycoproteins of several breast cancer cell lines with the goal of detecting new potential cancer biomarkers.
For the past 30 years the hydrazide method has been used to isolate and enrich glycoproteins 31. More recently Aebersold’s group described its use for isolating and site mapping N-linked glycoproteins from serum32. However, many of the most abundant serum proteins are also N-linked glycosylated molecules, which hinders the discovery of low abundance glycoproteins that are released from tissue into the serum. The hydrazide method would therefore be more effective in the search for potential biomarkers if applied to the source of disease. Loo’s group recently reported that the hydrazide method was effective in characterizing N-linked glycoproteins in salivary fluids 33. The hydrazide method is initiated by first oxidizing the sugar residues in the glycosylated peptides with periodate before reacting them with a hydrazide resin34. Covalent attachment of glycopeptides to the resin allows the samples to be thoroughly washed free of non-covalently-attached peptides and other contaminants before enzymatic release of the N-linked peptides with PNGase F, which leaves a signature of deamidated asparagine residues (asparagine converted to aspartic acid) at the former glycosylation sites. These samples may then be processed by data-dependent LC-MS/MS so that the glycopeptide is identified and the glycosylation site mapped.
Since excised human tumors are both limited in quantity, heterogeneous for cell type and variable for cancer sub-type, we chose to evaluate the utility of the proposed method with human breast cancer cell lines that are homogenous for cell-type, available in reasonable quantities and represent different subtypes of breast cancer. The names and characteristics of the cell lines are: MCF-7, hormone receptor positive and HER2 negative; MDA-MB-453, hormone receptor negative and HER2 positive; and MDA-MB-468, hormone receptor (estrogen receptor (ER) and progesterone receptor (PR)) and HER2 negative (triple negative). We selected N-linked glycosylated proteins in our concentrated membrane extracts of breast cancer cell lines using the hydrazide system since many of the membrane proteins released into the plasma/serum are frequently glycosylated. The identified N-glycoproteins in cancer cells then can guide further detection of candidate biomarkers in serum/plasma from breast cancer patients and lead to the development of clinically useful diagnostic assays.
MCF-7, MDA-MB-453 and MDA-MB-468 human breast cancer cell lines were maintained in DMEM culture medium supplemented with L-glutamine and sodium pyruvate, 10% FBS, and 1% penicillin and streptomycin. Cells were grown to 70% confluency in 10 cm petri dishes. The medium was removed and the cells were washed with PBS, harvested by centrifugation (1,000 × g, 2 min, 4 °C) and the pellets stored at −80 °C until needed.
The frozen cell pellets were homogenized in ice cold 50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol (DTT), 1 mM PMSF, and protease cocktail (Roche Cat. No. 04693132001) using an ultrasonic cell disrupter (Fisher Scientific, Sonic Dismembrator model 100, at setting 4 for 2 × 10 seconds at 30-second intervals) on ice. Samples were centrifuged at 1,500 × g for 5 min at 4 °C to remove large debris, and the supernatant was re-centrifuged (100,000 × g, 1 hr, 4 °C). The pellet was solubilized in 40 mM Tris-HCl pH 8.3, 6 M guanidine HCl, 0.2% Rapigest™ (Waters), 5 mM DTT, centrifuged (15,000 × g, 2 min, RT), and the supernatant diluted to <1 M guanidine HCl with 40 mM Tris-HCl pH 8.3. The sample was sequentially treated with DTT, iodoacetamide, and trypsin (overnight, 37 °C) according to the manufacturer's protocol (Promega). The pH was adjusted to pH 3 with TCA, the samples were centrifuged (15,000 × g, 2 min, RT) and the supernatant was passed through an activated, washed C18 spin column (The Nest Group, Inc.) according to the manufacturer's protocol. The eluate was vacuum dried and a fraction (2%) was reserved for LC-MS/MS without further treatment.
The remainder of the dried eluates in microcentrifuge tubes was dissolved in coupling buffer (100 mM sodium acetate, 150 mM NaCl, pH 5.5), centrifuged (15,000 × g, 2 min, RT), and the supernatant was treated with sodium periodate (15 mM final concentration, 30 min, RT) with end over end rotation in the dark. The oxidation reaction was quenched with sodium sulfite (20 mM final concentration, 15 min, RT) and the sample was added to the hydrazide resin (200 µl wet resin, Biorad, previously washed with coupling buffer following the manufacturer's instructions). The coupling reaction was performed overnight at 37 °C with end over end rotation. The resin was washed (1 mL) sequentially twice each with H2O, 1.5 M NaCl, methanol, acetonitrile, and finally with 50 mM NH4HCO3 pH 8.0. The slurry was then treated with PNGase F (New England BioLabs Inc.), centrifuged (15,000 × g, 2 min, RT), the supernatant removed and the pellet washed with water/acetonitrile (50/50, 1 mL) and re-centrifuged. The pooled samples were vacuum dried.
Dried samples were re-dissolved in Buffer A (H2O/acetonitrile/formic acid, 98.9/1/0.1, typically 100 µl), separated by nanospray LC (Eksigent Technologies, Inc., Dublin, CA), and analyzed by online tandem mass spectrometry (LTQ Orbitrap, Thermo Fisher). Aliquots were injected (10 µl) onto a reverse phase column (New Objective C18, 15 cm, 75 µm diameter, 5 µm particle size, equilibrated in Buffer A) and eluted (300 nL/min) with an increasing concentration of Buffer B (acetonitrile/water/formic acid, 98.9/1/0.1; min/%B: 0/5, 10/10, 112/40, 130/60, 135/90, 140/90). The effluent from the column was passed directly into an integrated nanospray emitter tip connected to the LTQ Orbitrap mass spectrometer. Eluted peptides were analyzed by MS and data-dependent MS/MS acquisition (collision-induced dissociation, CID), previously optimized for samples, selecting the 7 most abundant precursor ions for MS/MS with a dynamic exclusion duration of 15.0 seconds.
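The gradient breakpoints listed above can be read as a piecewise schedule of Buffer B over time. The following is a minimal sketch, assuming linear ramps between the stated time points (the text does not say how the pump interpolates); the function name `percent_b` is illustrative only.

```python
from bisect import bisect_right

# (time in min, % Buffer B) breakpoints as given in the text
GRADIENT = [(0, 5), (10, 10), (112, 40), (130, 60), (135, 90), (140, 90)]


def percent_b(t_min, gradient=GRADIENT):
    """Return % Buffer B at time t_min.

    Assumption: the composition changes linearly between breakpoints
    and is held constant before the first and after the last one.
    """
    times = [t for t, _ in gradient]
    if t_min <= times[0]:
        return float(gradient[0][1])
    if t_min >= times[-1]:
        return float(gradient[-1][1])
    # Index of the first breakpoint strictly later than t_min
    i = bisect_right(times, t_min)
    (t0, b0), (t1, b1) = gradient[i - 1], gradient[i]
    return b0 + (t_min - t0) / (t1 - t0) * (b1 - b0)
```

Under this assumption, `percent_b(61)` falls halfway along the long 10 to 112 min ramp, where most peptides would be expected to elute.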
The mass spectra were searched against a human trypsin indexed database (allowing up to two missed trypsin cleavages), with variable modifications of carboxyamidomethylation, methionine oxidation, and deamidation of asparagine residues, using the Bioworks software (Thermo Fisher) based on the SEQUEST algorithm. Positive protein identification was accepted for peptides with an Xcorr ≥ 2.0 for doubly charged ions and ≥ 2.5 for triply charged ions.
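The charge-dependent acceptance rule can be illustrated with a short filter. This is only a sketch: the `PSM` record and its field names are hypothetical and do not reflect the Bioworks export format, and rejecting charge states other than 2+ and 3+ is an assumption, since the text states thresholds for those two charge states only.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not a Bioworks format)."""
    peptide: str
    charge: int
    xcorr: float


# Minimum Xcorr per precursor charge state, as stated in the text
XCORR_MIN = {2: 2.0, 3: 2.5}


def passes_xcorr(psm):
    """Accept a match only if its Xcorr meets the threshold for its charge.

    Assumption: charge states without a stated threshold are rejected.
    """
    threshold = XCORR_MIN.get(psm.charge)
    return threshold is not None and psm.xcorr >= threshold


def filter_psms(psms):
    """Keep only the matches that satisfy the charge-dependent cutoff."""
    return [p for p in psms if passes_xcorr(p)]
```

For example, a doubly charged match with Xcorr 2.0 is retained, while a triply charged match with Xcorr 2.4 is discarded.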
Quantitative data analysis was performed using the Scaffold (Proteome Software, Inc.) software program. The Bioworks search results were uploaded into the Scaffold software program and a filter was set with a 99% minimum protein ID probability (calculated probability of correct protein identification), a minimum of 2 unique peptides per protein, and a minimum peptide ID probability of 99.9%. Note that this filter setting affects not only which proteins are shown but also the report values shown for the Number of Peptides, Number of Identified Spectra, and Number of Unique Spectra. Scaffold verifies peptide identifications derived from MS/MS sequencing results using the X! Tandem35 and ProteinProphet36 computer algorithms. Scaffold validates these peptide identifications using PeptideProphet37 and derives corresponding protein probabilities using ProteinProphet36. Scaffold normalizes MS/MS data between samples with similar total protein amounts, allowing a comparison of protein abundance between samples. Each of the breast cancer cell line samples is a combination of three replicate experiments. Normalization entails averaging the total spectral counts across all the samples and then multiplying the spectral counts in each sample by that average divided by the individual sample’s sum. Since we are analyzing relatively equivalent amounts of protein in each MS/MS experiment, combining three replicate experiments per breast cancer cell line sample, and looking only at the most abundantly expressed proteins in each sample, the abundance differences between the breast cancer cell lines are meaningful.
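The normalization described above amounts to rescaling every sample so that its total spectral count equals the mean total across samples. A minimal sketch of that arithmetic follows; it is not Scaffold's own code, and the function name and the nested-dictionary input layout are assumptions made for illustration.

```python
def normalize_spectral_counts(samples):
    """Scale each sample so its total equals the mean total across samples.

    samples: dict mapping sample name -> dict of protein -> spectral count.
    Returns a new dict with the same shape holding normalized counts.
    """
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    if any(total == 0 for total in totals.values()):
        raise ValueError("cannot normalize a sample with zero spectral counts")
    mean_total = sum(totals.values()) / len(totals)
    return {
        name: {
            protein: count * mean_total / totals[name]
            for protein, count in counts.items()
        }
        for name, counts in samples.items()
    }
```

With two hypothetical samples whose totals are 40 and 80, the mean total is 60, so counts in the first sample are multiplied by 1.5 and counts in the second by 0.75; a protein with 10 and 20 raw counts, respectively, ends up at 15 in both.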
The aim of this study was to isolate, identify and site-map N-linked glycosylated membrane proteins as a more direct way to define new tumor related proteins, since most of the cancer biomarkers used in clinical practice today are glycosylated cell membrane proteins. Isolation of N-linked glycoproteins provides an opportunity to study a unique subset of proteins in the cell membrane of breast cancers. Hydrazide beads were used to bind periodate oxidized sugar residues of glycosylated peptides after trypsin cleavage of cancer cell membrane proteins. A total of 27 N-glycosylated sites in 25 glycoproteins (haptoglobin and intercellular adhesion molecule 1 had two N-glycosylated sites) with the consensus site NXS/T (where X may be any amino acid except proline) emerged from the analysis of the three cell lines (Table 1), with MCF-7 yielding 15 total proteins, MDA-MB-453 yielding 13 total proteins, and MDA-MB-468 yielding 6 total proteins for a combined total of 25 different N-glycosylated proteins. In addition, 12 other N-glycosylated proteins were site mapped without the conventional NXS/T motif (Supplemental Table 1). In Table 1 only three N-glycosylated NXS/T consensus site proteins were shared between all three different cell types. These are galectin-3 binding protein, oxygen regulated protein precursor, and lysosome-associated membrane glycoprotein 1 (LAMP1). The MCF-7 and MDA-MB-453 cell lines were found to share three proteins, cathepsin D, pigment epithelium-derived factor, and clusterin isoform 2. Interestingly, among the 25 site mapped NXS/T motif N-linked glycoproteins, this analysis revealed the breast cancer stem cell marker CD44 (Figure 1A). An example of the Scaffold display of MS/MS data indicating the N-linked glycosylation site mapped to the galectin-3 binding protein peptide is shown in Figure 1B.
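The NXS/T consensus rule can be checked mechanically on any peptide sequence. The sketch below is an illustration rather than the site-mapping procedure used in this study (which relied on the database search and Scaffold); the function name is hypothetical, and the two test peptides are the EGFR and BRCA1 sequences quoted elsewhere in this article.

```python
import re

# N followed by any residue except proline, then S or T.
# The lookahead keeps matches zero-width past the N, so overlapping
# sequons such as "NNSS" are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(peptide):
    """Return 1-based positions of asparagines in an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(peptide.upper())]
```

For the EGFR peptide DSLSINATNIK this reports position 6 (the asparagine in NAT) and skips the second asparagine, which is followed by IK; for EENQGKNESNIK it reports position 7 (NES) only.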
Surprisingly, we also found a well-known breast cancer biomarker: breast cancer 1, early onset (BRCA1), a suppressor of uncontrolled tumor cell proliferation (Supplemental Figure 1). While interesting, some caution should be attached to this identification because the peptide identified had two additional deamidated asparagines outside the NXS/T motif, possibly due to novel N-glycosylations, deamidation during the oxidative process of the hydrazide method, or SNPs. The MS/MS spectrum also revealed a surprisingly intense high m/z b-ion ladder, unusual for tryptic peptides. Confirmation of the identification would require high-resolution accurate m/z assignments on the fragment ions, which is not possible at this time. In addition, spontaneous deamidation of asparagine and glutamine is commonly found in mass spectrometry samples, although the parent ion for the BRCA1 N-linked glycopeptide had a mass error of less than 2 ppm. A number of other proteins, variously implicated in breast cancer, were also identified, including galectin-3 binding protein, cathepsin D, and keratin 18 (Supplemental Table 1 and Supplemental Figure 2). Although the non-NXS/T motif N-glycopeptides were purified by covalent attachment of the N-glycan to the hydrazide resin and subjected to stringent washing conditions, we could not rule out contamination. However, most of these non-NXS/T motif N-glycoproteins had high Xcorr values and ion matching ratios, and were subjected to manual inspection (Supplemental Table 1 and Supplemental Figure 2).
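Two small calculations underlie the caution expressed here: the mass shift introduced when an asparagine is converted to aspartic acid (deamidation), and the parts-per-million error of a measured precursor. The sketch below shows both; the function names are illustrative, the shift of 0.984016 Da is the standard monoisotopic difference for this conversion (+O, −N, −H), and any m/z values passed in are the caller's own, not data from this study.

```python
# Monoisotopic mass gained when Asn becomes Asp (+O, -N, -H), in Da
DEAMIDATION_SHIFT_DA = 0.984016


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def deamidated_mz(unmodified_mz, charge, n_sites=1):
    """Expected m/z after n_sites Asn -> Asp conversions at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return unmodified_mz + n_sites * DEAMIDATION_SHIFT_DA / charge
```

Because one conversion shifts a doubly charged precursor by only about 0.49 m/z, close to the spacing of its isotope peaks, accurate precursor mass measurement helps distinguish a deamidated peptide from the 13C isotope peak of the unmodified form.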
Since N-linked proteins were isolated from crude membrane fractions, we compared the overlap of N-linked glycoproteins with the membrane sources to determine the selectivity of the hydrazide enrichment method. First we analyzed the starting material before hydrazide treatment using total membrane proteins from three separate isolation experiments for each of the three breast cancer cell lines. We found that MCF-7, MDA-MB-453, and MDA-MB-468 had over 291 proteins in common (Figure 2). In addition, by comparing the total peptides in common and total spectra shared we observed that our starting material had relatively equivalent amounts of protein present. An example of the base peak MS chromatogram of MDA-MB-468 membrane fractions and MS/MS of the signature EGFR found in MDA-MB-468 breast cancer cell lines is shown in Figure 3A and 3B. The hydrazide method clearly enriched for an EGFR N-linked glycopeptide that had not been detectable in the crude membrane fractions (Figure 3C). A representative 2D plot (ion map) for each crude membrane fraction from each breast cancer cell line is shown in Supplemental Figure 3.
When the 15 site mapped N-linked glycosylated proteins in MCF-7 membrane fractions were compared to the total protein of MCF-7 membrane fractions by a Venn diagram (Figure 4A), only three proteins, cathepsin D, solute carrier family 3 member 2, and thrombospondin 1, were present in both preparations. Only one of the three proteins, solute carrier family 3 member 2 isoform (101 spectral counts), had high abundance spectral counts in the crude membrane fractions, while cathepsin D (13 spectral counts) and thrombospondin 1 (0 spectral counts) had low or no spectral counts, respectively. Two of the non-NXS/T motif N-glycoproteins found also had highly abundant spectral counts, keratin 18 (150 spectral counts) and H2B histone family member C (163 spectral counts). These data indicate that the fractionation of N-linked glycoproteins from crude membrane fractions using the hydrazide method both significantly enriched for low abundance glycoproteins and identified glycoproteins among the highly abundant membrane proteins. The same comparison of 13 site mapped N-linked glycosylated proteins in MDA-MB-453 relative to total membrane proteins is shown in Figure 4B. Only two proteins, cathepsin D (7 spectral counts) and heterogeneous nuclear ribonucleoprotein L (22 spectral counts), were found in both preparations. The two non-NXS/T glycoproteins, H2B histone family member C (192 spectral counts) and keratin 18 (29 spectral counts), were also found in the MDA-MB-453 cell line. The 6 N-linked glycoproteins site mapped in MDA-MB-468 membranes had only two shared proteins, CD44 (33 spectral counts) and EGFR (22 spectral counts), also seen in the total cell membrane fraction. Since the majority of the site mapped N-linked glycoproteins were not detected in the starting material of the crude membrane fractions, it appears that the hydrazide method did in fact significantly enrich the glycoproteome in all three breast cancer cell lines.
After we identified a set of potential glycoprotein biomarkers from membrane fractions we also analyzed the quantitative expression of proteins between the three human breast cancer cell line membranes. Using the Scaffold program we quantitatively compared the spectral counts of three combined experiments for each of the three breast cancer cell lines (Supplemental Table 2), with over 400 proteins quantified. To simplify comparisons among the large number of proteins from the 3 breast cancer cell lines, a subset of the data was analyzed in charts highlighting quantitatively distinct proteins between the three cell lines (Figure 5A and 5B). A number of proteins that are unique or elevated in each cell line are readily appreciated. ErbB-binding protein, solute carrier protein 3, keratin 8, keratin 18, and heat shock protein 27 (HSP27) are increased in hormone receptor positive and HER2 negative MCF-7 cells (black bar). MDA-MB-453 (grey bar), a HER2 positive cell line, demonstrates an elevation of enolase 1, RAB1B Ras oncogene, stomatin, keratin 18, filamin A, and valosin. The triple negative breast cancer cell line MDA-MB-468 (white bar) has a distinctive set of elevated proteins including CD44, EGFR, filamin A, and valosin. A number of proteins were present in high quantity in all three breast cancer cell lines, such as stress-induced-phosphoprotein-1 (STIP-1), keratin 19, lamin A/C, chaperonin, and glyceraldehyde-3-phosphate dehydrogenase (GAPDH), to name a few. The expression of distinct protein profiles in the three types of breast cancer cell line membrane fractions may be useful in the further stratification of the major types of breast cancer. In addition, the high expression of these proteins may enhance detection after release into sera or plasma from cancer patients and could represent potential cancer biomarkers.
Since N-linked glycoproteins clearly were enriched from crude cell membrane fractions using the hydrazide method, it was of interest to assess the location (Figure 6A) and cellular function (Figure 6B) of these defined proteins. N-linked proteins were classified into a pie chart of differential cell compartments and function using an in-house developed protein sorting software program (UCLA Bioinformatics Department). As expected, this analysis revealed a heavy representation of membrane-associated proteins (Figure 6A) with diverse functions (Figure 6B).
The current study explored the hydrazide method as an effective way to preselect N-linked glycoproteins from cell membranes before applying the resolving power of coupled microbore chromatography-mass spectrometry to identify the proteins. The results indicate this approach was successful in enriching, identifying and site-mapping 27 N-linked glycosylation sites in 25 glycoproteins from crude membrane samples. The method allowed us to dig deeper into the glycosylated membrane proteome than is possible with conventional cellular fractionation. This can be clearly seen in all three human breast cancer cell lines examined. For example, in MCF-7 cell membrane isolates only 3 N-linked glycoproteins overlapped with the 357 total membrane proteins detected by LC-MS/MS alone, while 12 N-linked glycoproteins were uncovered only after N-linked glycoprotein hydrazide enrichment. Similarly, only 3 glycoproteins were detectable by LC-MS/MS in MDA-MB-453 among the 388 total membrane proteins, but hydrazide enrichment revealed an additional 10 N-linked glycoproteins. Finally, in MDA-MB-468 cells only 2 N-linked glycoproteins were seen by LC-MS/MS among the 338 total membrane proteins identified. Both of these N-linked glycoproteins are signature proteins of the MDA-MB-468 cell line. Clearly there was significant enrichment of the N-linked glycoprotein proteome using the hydrazide system. A recently published modification of the original hydrazide method32 significantly increased the selection and identification of cell surface glycoproteins by chemically tagging glycoproteins on intact living cells38. The application of this method would significantly reduce the contamination by proteins from intracellular membranes found in crude membrane extracts.
N-glycosylation typically occurs at the NXS/T motif on glycoproteins, where X is any amino acid except proline. Interestingly, we also detected a number of non-NXS/T motif glyco-sites with the hydrazide method. We cannot rule out that these are contaminating peptides that underwent deamidation during the hydrazide process, nor can we dismiss them, given the covalent isolation of the glyco-residues and the stringent washing. Therefore we included them in this manuscript (Supplemental Table 1). We also included representative MS/MS data (Supplemental Figure 2). Some of these glyco-sites have serine or threonine located immediately next to the asparagine (NS/T) or three residues downstream of it (NXXS/T).
The identified N-linked glycoprotein differences and similarities between the three cell lines are especially interesting, since out of 291 shared membrane proteins there were only 3 glycoproteins in common. In addition, these three proteins were not detectable in the crude membrane fraction, providing further support for the utility of selective glycoprotein enrichment by the hydrazide method. MCF-7 and MDA-MB-453 cells shared three NXS/T motif glycoproteins. Cathepsin D is reported to be secreted in more aggressive breast cancer cell lines39 and clusterin, when elevated, promotes tumor cell survival by having anti-apoptotic effects on chemotherapy treated breast cancer40. On the other hand, pigment epithelium-derived factor (PEDF) expression was found to be dramatically decreased in breast cancer 41. PEDF is one of the most potent inhibitors of angiogenesis and is a candidate tumor suppressor in a variety of cancers 42. Further study of increased expression of PEDF in hormone and receptor positive breast cancer cell lines compared to triple negative cell lines may correlate with the prognosis of this cancer. We also detected another angiogenesis inhibitor protein, thrombospondin-1, in MCF-7 cells. Interestingly, the expression of thrombospondin-1 is inversely correlated with vascularization in breast cancer43. Further study in HER2 positive and HER2 negative cell lines will be necessary to determine the significance of the presence of these glycoproteins in breast cancer cell lines. Three other interesting MCF-7 glycoproteins that did not have the traditional NXS/T motif were keratin and hypothetical protein LOC9768 isoform 1, KIAA0101 or p15(PAF). Keratin KRT8/18 expression is known to differentiate distinct subtypes of grade 3 invasive ductal carcinoma of the breast 44. We also saw an abundant amount of keratin 8 in the MCF-7 breast cancer cell line membrane fraction, an apparent candidate biomarker for that particular breast cancer cell type.
Hypothetical protein LOC9768 isoform 1, KIAA0101 or p15(PAF), is a 15-kDa protein containing a conserved proliferating cell nuclear antigen (PCNA)-binding motif found to be involved in the regulation of DNA repair, cell cycle progression, and cell proliferation45, 46. KIAA0101 is over-expressed in tumors of the breast, uterine cervix, brain, kidney, esophagus, lungs and colon45, 46.
A single non-NXS/T motif glycoprotein, tankyrase 1, was shared by the triple negative MDA-MB-468 breast cancer cells and the HER2 negative MCF-7 breast cancer cells. Tankyrase 1 is known to be involved in the positive regulation of telomeres, which play a critical role in cellular senescence, apoptosis and tumorigenesis47, 48. When its expression is increased, tankyrase 1 interacts with and releases a negative regulator of telomerase, telomere repeat binding factor 1 (TRF1), inducing telomere elongation47. Tankyrase 1 is also elevated in tumor tissues and its expression is higher in estrogen and progesterone negative tumors49. The discovery of an N-linked glycosylation site may be important in the regulation and function of tankyrase 1 as well as being a useful post-translational modification (PTM) for biomarker detection.
After hydrazide enrichment, LC-MS/MS identified breast cancer 1, early onset isoform 1 (BRCA1) as an N-linked glycoprotein in MCF-7 cells. Point mutations in BRCA1 and BRCA2 have been well documented in 30% of hereditary breast cancers50. Previously (based on the “Keil rule”) it was believed that trypsin cleaves at lysine or arginine but not before proline; however, Rodriguez et al. reported that many peptides are actually cleaved by trypsin before proline residues51. In addition, one of the N-linked sites on the BRCA1 glycopeptide is a predicted N-linked site (https://db.systemsbiology.net) based on the N-linked glycosylation consensus sequence NXS/T at amino acid site 913 (EENQGKN*ESNIK). The enrichment and identification of BRCA1 as an N-linked glycoprotein also provides support for the utility of the hydrazide method in discovering additional glycoproteins as potential breast cancer biomarkers.
N-linked glycosylation plays a key role in the expression levels and receptor activity of several receptor tyrosine kinases (RTK) such as EGFR and HER252. N-linked glycosylation of the EGFR positively affects the conformation necessary for receptor-receptor self-dimerization. Four hypothetical N-linked glycosylation sites located in domain III of EGFR have been proposed as potential key contributors to EGFR self-dimerization, which is necessary for ligand activation of the receptor53. We have identified and site-mapped an N-linked glycosylation site in the domain III region of EGFR in MDA-MB-468 breast cancer cells. The EGFR peptide, 347DSLSINATNIK357, contains an N-linked glycosylation consensus site (NXS/T) that was also detected in purified EGFR from the A431 epidermal cancer cell line54. In addition, the hydrazide method identified an N-linked glycosylation at N57 of CD44, which matched one of five N-linked glycosylation sites necessary for CD44 conformational interaction with hyaluronic acid leading to CD44 activation55. Both O-glycosylation and N-glycosylation of CD44 play a critical role in its function55. Interestingly, the presence of CD44 cell surface receptors and the absence of CD24 is a phenotype found in breast cancer stem cells and basal-like breast tumors, otherwise known as triple negative cancers (without PR, without ER and without HER2 receptor), such as the MDA-MB-468 breast cancer cell line56–58. The CD44+/CD24- phenotype correlates with metastasis and invasiveness59. Identification of post-translational modifications (PTMs) that play key roles in receptor function and activity may be important for identifying drug development targets as well as for the development of PTM specific biomarkers.
The high sensitivity of LC-MS/MS allowed detection of progesterone receptor membrane component 1 (PGRMC1) in both of the estrogen receptor (ER) negative breast cancer cell lines tested and not in the ER positive MCF-7 cell line (Supplemental Table 2). PGRMC1 is found to be highly expressed in ER negative cell lines compared to ER positive cell lines60. This agreement between our mass spectrometry data with previously published reports confirms the sensitivity and reliability of the LTQ Orbitrap as a biomarker discovery tool.
A primary goal of this study was to enrich for N-linked glycoproteins in crude cancer cell membrane fractions as a prelude to the discovery of potential biomarkers similar to the currently known biomarkers HER2, MUC1, ER, and PR. Since we were able to isolate 25 N-linked glycoproteins, many of which play major roles in cancer, we also analyzed the proteins differentially expressed in the membrane fraction that may also be secreted or sloughed off into the interstitial fluid of tumors in vivo. Significant overlaps and differences between the membrane proteins were seen in each tumor cell line, but quantitatively each cell line appeared to have a unique proteome signature. Comparison of the normalized spectral counts of membrane proteins in MCF-7, MDA-MB-453, and MDA-MB-468 cells using the Scaffold software program revealed signature differences between the three breast cancer cell lines. MCF-7 had high quantities of keratin 8 and 18, solute carrier protein 3, HSP 27 and ErbB-binding protein, while MDA-MB-453 had high quantities of enolase 1, nucleolin, RAB1B Ras oncogene, stomatin, filamin A, valosin and cytokeratin 7. MDA-MB-468 cells had high quantities of EGFR, CD44, filamin A, progesterone receptor component 1, and valosin. All three cancer cell lines also share high expression of STIP1, keratin 19, chaperonin, and HSP-alpha. Any one of these proteins could represent a new candidate cancer biomarker for a blood assay. Further studies of breast cancer cell lines, proximal fluid, tissue samples, and patient matched serum samples using affinity-based assays will be necessary to validate the reproducibility and functionality of each of these candidate biomarkers.
This preliminary study of the N-linked glycoprotein proteome in breast cancer cell lines has led to the identification of 27 glycosylation sites in 25 proteins. An important extension of this work will be to document the correlation between protein quantity and breast cancer type and severity. Already an overlap between the glycoproteins found in cancer membrane fractions and the secreted or released proteins has been detected in the breast cell line culture media (data not shown). Another important extension of this work will be to compare both the conventional O-linked glycoproteins of the cell membrane and the monosaccharide O-linked N-acetylglucosamine (O-GlcNAc)61 of the nucleocytoplasmic fraction between the different breast cancer cell lines, tissue and the interstitial fluid around the tumor sites. The discovery of unique candidate biomarkers will help with the stratification of breast tumor types. Future development of antibodies that specifically recognize changes in glycosylation PTMs of candidate biomarkers would further the stratification of breast cancer. In addition, discovery of highly abundant proteins secreted into the proximal fluid of breast cancer cell lines or interstitial fluid of tumor tissue may lead to a plausible affinity-based assay capable of identifying these trace level biomarkers in the large volume of circulating blood proteins of breast cancer patients.
This work was supported in part by the California Breast Cancer Research Program (6JB-0013), the Department of Defense (DAMD17-01-1-0179), the National Institutes of Health (1RO1CA93736), the Gonda Foundation, the EIF-Women Cancer Research Fund and Friends of the Breast Program at UCLA.