Estrogen receptor (ER) positive tumors represent the majority of breast malignancies, and are effectively treated with hormonal therapies, such as tamoxifen. However, in recurrent disease, resistance to tamoxifen therapy is common and a major cause of death. In recent years, in‐depth proteome analyses have enabled identification of clinically useful biomarkers, particularly when heterogeneity in complex tumor tissue was reduced using laser capture microdissection (LCM). In the current study, we performed high resolution proteomic analysis on two cohorts of ER positive breast tumors derived from patients who manifested either good or poor outcome to tamoxifen treatment upon recurrence. A total of 112 fresh frozen tumors were collected from multiple medical centers and divided into two sets: an in‐house training set and a multi‐center test set. Epithelial tumor cells were enriched with LCM and analyzed by nano‐LC Orbitrap mass spectrometry (MS), which yielded >3000 and >4000 quantified proteins in the training and test sets, respectively. Raw data are available via ProteomeXchange with identifiers PXD000484 and PXD000485. Statistical analysis showed differential abundance of 99 proteins, of which a subset of 4 proteins was selected through a multivariate step‐down procedure to develop a predictor for tamoxifen treatment outcome. The 4‐protein signature significantly predicted poor outcome patients in the test set, independent of predictive histopathological characteristics (hazard ratio [HR] = 2.17; 95% confidence interval [CI] = 1.15 to 4.17; multivariate Cox regression p value = 0.017). Immunohistochemical (IHC) staining of PDCD4, one of the signature proteins, on an independent set of formalin‐fixed paraffin‐embedded tumor tissues provided an independent technical validation (HR = 0.72; 95% CI = 0.57 to 0.92; multivariate Cox regression p value = 0.009).
We hereby report the first validated protein predictor for tamoxifen treatment outcome in recurrent ER‐positive breast cancer. IHC further showed that PDCD4 is an independent marker.
ER positive tumors constitute the majority of all breast malignancies. Tamoxifen therapy has been shown to significantly improve survival and cure of patients with primary ER positive breast tumors, but upon recurrence about half of the patients show intrinsic resistance, while those initially responding will ultimately develop acquired resistance (Cardoso et al., 2012; Milani, 2014). The need for biomarkers capable of determining mechanisms of resistance has led to the development of several predictive signatures, though none has been introduced in the clinic so far (Beelen et al., 2012). With the recent advancements in MS techniques, in‐depth quantification of the human proteome has become possible and the ability to measure protein abundance over a broad dynamic range has established proteomics as a robust tool for biomarker discovery (Drabovich et al., 2014; Kim et al., 2014; Wilhelm et al., 2014). The proteomic analysis of tissue specimens is, however, hindered by their heterogeneity, which alters protein abundance dynamic range. Furthermore, the presence of stromal and infiltrating cells adds another layer of complexity by hampering accurate protein quantitation of target epithelial tumor cells (Kondo, 2014). To address this issue, LCM offers a robust cell sub‐population enrichment technique, allowing accurate downstream analysis of morphologically heterogeneous specimens (Emmert‐Buck et al., 1996; Vogel et al., 2007). Genomic and proteomic analyses of LCM derived material showed the feasibility of this technique in molecular profiling studies and pointed out its efficacy in studying disease associated signaling pathways when compared to whole tissue analyses (Cheng and Zhang, 2013; Sereni et al., 2015; Xu, 2010; F. Yang et al., 2006). LCM yields sub‐microgram protein amounts because only a limited number of cells can be dissected from each sample.
In the light of this, coupling LCM enrichment to chemical labeling methods would require extensive sample preparation and workflow optimization, which would be unsuitable in the analysis of large sample sets. Label‐free quantification (LFQ) software algorithms have proven to be accurate tools in the quantitation of proteins, allowing high yield identification and reliable quantitation of measured peptides even from minute amounts of analyzed specimen (Cox and Mann, 2008; Megger et al., 2013). We have optimized a tissue proteomic pipeline for biomarker discovery coupling LCM cell enrichment to high resolution LC‐MS and LFQ, capable of quantifying more than 3000 proteins from only 4000 dissected epithelial cells (Braakman et al., 2012; Liu et al., 2012). Using this workflow, we recently developed and validated a prognostic protein signature for triple negative breast cancer (Liu et al., 2014). Although our workflow has proven to be a robust methodology for the discovery of cancer biomarkers, application of shotgun proteomics in clinical diagnostics remains problematic due to the extensive and time consuming sample preparation required. From this perspective, IHC or selected reaction monitoring/multiple reaction monitoring (SRM/MRM) MS may be more suitable biomarker verification techniques that do not require extensive method optimization or sample preparation (Whiteaker et al., 2011). Although antibody specificity and lack of accurate quantitation remain important issues, IHC remains a major technique in clinical diagnostics and requires significantly less optimization time than ELISA or even SRM/MRM MS.
In this study we describe the development of a predictive protein signature for tamoxifen resistance in ER positive breast cancer by coupling LCM tumor cell enrichment and high resolution LC‐MS in the analysis of independent training and test patient cohorts. We also provide further validation by IHC analysis of signature proteins on an independent panel of paraffin‐embedded tissues captured in a tissue micro‐array (TMA).
From an initial selection of 200 tissues collected from patients who received tamoxifen as first line therapy we excluded tissues with a low percentage of tumor cells (i.e. <40%; n = 88; Figure 1). A total of 112 ER positive fresh frozen primary breast cancer tissue samples were then included in our sets: 56 from Erasmus MC University Medical Center (EMC), Rotterdam (years of surgery: 1981–1994), 41 from the National Cancer Institute – Antoni van Leeuwenhoek hospital (NKI‐AVL), Amsterdam (1980–1996), and 15 from Radboud University Medical Center (RadboudUMC), Nijmegen (1991–1996; Table 1). EMC derived samples constituted the training set, while NKI‐AVL and RadboudUMC provided an independent external test set. ER positivity in tumor cytosols was assessed by quantitative biochemical assays (EMC), reverse‐transcriptase quantitative polymerase chain reaction (RadboudUMC), or IHC (NKI‐AVL). All patients underwent surgery of their primary tumor (conservative or non‐conservative), developed recurrent disease, and were treated with tamoxifen as first line therapy. Due to lack of response data for a subset of specimens, treatment outcome was defined based on time to progression (TTP): disease progression ≤6 months and >6 months after start of first line tamoxifen administration were defined as poor and good outcome, respectively. The training set comprised 24 and 32 patients who showed good and poor outcome upon tamoxifen treatment, respectively. The test set included tumors of 41 good and 15 poor outcome patients. The NKI‐AVL cohort did not contain stage IV tumors, while such specimens were found in the EMC and RadboudUMC sets. In addition, 2 tumor tissues for which clinical follow up information was not available were used as LCM and whole tissue lysate (WTL) controls. For biological replicates, both tumor tissues were subjected to 4 rounds of LCM. From one of these tissues, a WTL was also prepared and digested in triplicate.
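The outcome definition above is a simple threshold rule on TTP. As a minimal sketch (the function name and the example TTP values are hypothetical and not taken from the study data), it could be expressed as:

```python
def classify_outcome(ttp_months: float, cutoff_months: float = 6.0) -> str:
    """Label a patient by time to progression (TTP) after start of tamoxifen.

    Progression at or before the cutoff counts as 'poor' outcome,
    progression after the cutoff as 'good' outcome.
    """
    if ttp_months < 0:
        raise ValueError("time to progression cannot be negative")
    return "poor" if ttp_months <= cutoff_months else "good"


# Hypothetical TTP values in months, for illustration only.
example_ttp = [2.5, 6.0, 6.1, 14.0]
labels = [classify_outcome(t) for t in example_ttp]
print(labels)
```

Note that a patient progressing at exactly 6 months falls in the poor outcome group, following the "≤6 months" wording of the definition.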
Data analysis flow‐chart and development of predictor for tamoxifen treatment outcome. Patients were divided into two independent cohorts and separately measured by LC‐MS. Proteomic data from training and test sets were analyzed separately ...
Patient and tumor characteristics.
In addition, a total of 447 formalin‐fixed and paraffin‐embedded tissues collected from EMC and regional hospitals were comprised in a tissue micro‐array. For further analyses, we included only ER positive tumors and patients who did not receive hormonal adjuvant therapy. Patients with a revised histology that showed no tumor, or patients with a progression within 3 weeks were excluded as well, leading to a total of 408 ER positive tissues from patients treated with tamoxifen as first‐line therapy for recurrent disease (Supplemental Table 1). Response data were collected according to the standard International Union Against Cancer criteria (Hayward and Carbone, 1977). In this set, 11 (2.7%) and 51 (12.5%) patients respectively showed complete (CR) and partial remission (PR). Two hundred and five (50.3%) patients showed no change (NC) of disease, of whom 170 (41.7%) showed NC >6 months (defined as stable disease, SD) while 35 (8.6%) showed NC ≤6 months after start of therapy. Progressive disease (PD) was observed in 141 (34.6%) patients. Clinical benefit was defined as CR + PR + SD patients (n = 232; 57%), while objective response was defined as CR + PR only (n = 62; 15%). This retrospective study used coded primary tumor tissues, in accordance with the Code of Conduct of the Federation of Medical Scientific Societies in the Netherlands (http://www.federa.org/codes‐conduct). Reporting Recommendations for Tumor Marker Prognostic Studies were followed where possible (Altman et al., 2012).
All tissue samples were cut into 8 μm cryo‐sections, and collected on UV‐sterilized polyethylene naphthalate (PEN) coated glass slides (Carl Zeiss Microsystems GmbH, Göttingen, Germany) for downstream LCM. In addition, 5 μm sections were collected on regular glass slides and stained with hematoxylin and eosin dyes for histological evaluation. Sections on PEN slides were dehydrated with 95% ethanol and immediately stored at −80 °C, until further processing. Prior to LCM, PEN slides were thawed at room temperature and subsequently stained with hematoxylin as follows: distilled water, hematoxylin, distilled water, 50% ethanol, 70% ethanol, 95% ethanol, 100% ethanol, 100% ethanol. During dehydration steps Halt Protease Inhibitor Cocktail (Thermo Fisher Scientific Inc, Rockford, IL, USA) at a 1:100 v/v concentration was added in order to prevent proteolytic degradation of proteins. An area of approximately 500,000 μm² (~4000 tumor cells) was collected from each tissue using a PALM MicroBeam device and gathered in an opaque adhesive cap (Carl Zeiss Microsystems GmbH, Göttingen, Germany). A volume of 20 μl of 0.1% w/v Rapigest surfactant (Waters Corporation, Milford, MA, USA) in 50 mM ammonium bicarbonate solution was used to transfer the collected LCM samples into LoBind™ Eppendorf tubes (Eppendorf AG, Hamburg, Germany). Tissue containing buffer was immediately frozen after collection and stored at −80 °C.
LCM collected material was disrupted in a horn sonifier bath using an Ultrasonic Disruptor Sonifier II (Branson Ultrasonics, Danbury, CT, USA) at 70% amplitude. Proteins were denatured at 95 °C, reduced with 100 mM DTT for 30 min at room temperature, and alkylated in the dark with 300 mM iodoacetamide for 30 min at room temperature. Samples were then digested for 4 h at 37 °C after addition of MS grade trypsin at a 1:4 enzyme‐protein ratio (i.e. 100 ng/μl). Samples were acidified with TFA, and spun down at 14,000 RPM. Supernatants were collected and transferred to HPLC vials (Sigma–Aldrich Corporation, St. Louis, MO, USA).
Mass spectrometry measurements were performed with a nano liquid chromatography system (Ultimate 3000, Dionex, Amsterdam, The Netherlands) coupled online to a linear Ion Trap – Orbitrap XL™ mass spectrometer (Thermo Electron, Bremen, Germany). Samples were first loaded on a trap column (PepMap C18, 300 μm ID × 5 mm length, 5 μm particle size, 100 Å pore size; Dionex), then washed and desalted in 0.1% TFA acidified water. Trap column and analytical column (PepMap C18, 75 μm ID × 50 cm, 3 μm particle size and 100 Å pore size; Dionex) were then coupled and peptides were eluted in a 3 h binary gradient (flow: 300 nl/min; mobile phase A: 2% acetonitrile and 0.1% formic acid in H2O; mobile phase B: 80% acetonitrile and 0.08% formic acid). The gradient was run as follows: 0%–25% mobile phase B for 2 h, followed by an increase to 50% mobile phase B in 1 h. For ESI, metal‐coated nano ESI emitters (New Objective, Woburn, MA) were used and a spray voltage of 1.6 kV was applied. A high‐resolution scan was acquired from 400 to 1800 Th and was used for MS detection. Automatic gain control was set at 10⁶ ions and lock mass was set at 445.120025 u (protonated (Si(CH3)2O)6). The 5 most intense peaks in full scan were selected and fragmented by collision induced dissociation (CID) applying 35% normalized collision energy and detected in the ion trap. Ions falling out of the ±5 ppm window or for which precursor intensity fell below 1.5 signal‐to‐noise ratio during 10 scans were excluded.
A total of 112 samples were analyzed by LTQ‐Orbitrap XL™ MS, together with 4 biological LCM replicates of each control sample, one of which was measured together with a triplicate of its matching WTL. MS spectra of the training and test cohorts were generated and analyzed separately with a time interval of two years. Orbitrap .RAW files derived from MS analyses were imported and analyzed in MaxQuant (version 22.214.171.124) (Cox and Mann, 2008), using the Andromeda peptide search engine (Cox et al., 2011). Analysis of spectra was performed using the following options: acetylation of the N‐terminus and oxidation of methionine were selected as variable modifications, and multiplicity was set to 1. The FASTA file used for protein search was the UniProt‐SwissProt human canonical database (version 2012‐09, human canonical proteome; 20,243 identifiers). Minimal peptide length was set to 7 amino acids, and the match between runs and LFQ options were selected with default settings. Other options were kept as default (e.g. fixed peptide modifications: carbamidomethylation; false discovery rate = 0.01). For further data analysis, the “ProteinGroups.txt” file was imported into Microsoft Excel and protein identifiers were filtered based on PEP score (cutoff <0.05). Contaminants and reversed sequences were excluded. LFQ intensities for each sample were selected and each value was Log10 transformed. Protein intensities from training and test sets were then normalized using the ComBat algorithm (Johnson et al., 2007) in R, allowing 10 minimum observations for whole dataset analysis. A second protein list was generated allowing 30% missing data points in the training set and none in the test set for predictor development. LCM and WTL control samples were not included in the ComBat normalization procedure due to the lower number of identified and quantified proteins. Coefficients of variation of Log10 transformed MS data were calculated according to the following formula (Bland and Altman, 1996):
Pearson correlation coefficients between measurements of LCM and WTL replicates were calculated in Perseus (Max Planck Institute for Biochemistry, Muenchen, Germany). The MS proteomic data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (Vizcaíno et al., 2013) with dataset identifiers PXD000484 and PXD000485.
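As an illustration of how a coefficient of variation can be obtained from Log10 transformed replicate intensities, the sketch below assumes the antilog approach that Bland and Altman describe for log-transformed measurements, i.e. CV = 10^SD − 1, where SD is the standard deviation on the Log10 scale. This form is an assumption made for the example and is not confirmed as the exact expression used in the study; the function name and input values are hypothetical.

```python
import math
import statistics


def cv_from_log10(log10_values):
    """Coefficient of variation (as a fraction) from log10-scale replicates.

    Assumed form: CV = 10**SD - 1, with SD the sample standard deviation
    of the log10-transformed values (antilog approach).
    """
    if len(log10_values) < 2:
        raise ValueError("need at least two replicate measurements")
    sd_log10 = statistics.stdev(log10_values)
    return 10 ** sd_log10 - 1


# Hypothetical log10 LFQ intensities of one protein across four replicates.
replicates = [7.00, 7.05, 6.95, 7.02]
cv_percent = 100 * cv_from_log10(replicates)
print(f"CV = {cv_percent:.2f}%")
```

Under this assumption the CV is computed per protein across replicates, and a median with interquartile range can then be taken over all proteins.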
TMA was prepared using an ATA 27 (Beecher Instruments, Sun Prairie, WI, USA). 408 paraffin‐embedded primary, ER positive breast cancer tissues derived from patients treated with first line tamoxifen upon recurrence were used to prepare the TMA. Tissue cores of 0.6 mm were taken from each tissue paraffin block and transferred in triplicate into a TMA recipient block. For each tumor tissue sample, three different areas of the tumor were taken as biological replicates. TMA slides were digitalized and analyzed using Slidepath software (Leica Microsystems, Solms, Germany).
Paraffin‐embedded tissues on glass slides were de‐paraffinized at 60 °C, and remnants of paraffin were removed by sequential washings in xylene (3 × 5 min). Re‐hydration was performed by washings through decreasing concentrations of ethanol followed by distilled water as follows: 100% ethanol (1 × 5 min, 2 × 2 min), 70% ethanol (1 × 2 min), 50% ethanol (1 × 2 min), distilled water (1 × 2 min). Slides were then incubated at 95 °C for 40 min in DAKO (Agilent Technologies Inc, Santa Clara, CA, USA) antigen retrieval solution (pH 6) diluted 1:10 in MilliQ water, cooled down to room temperature and washed with PBS buffer 3 times for 5 min. Blocking solution consisting of 5% BSA in PBS was added to the slides and incubated for 30 min. Primary antibodies were diluted in DAKO Antibody Diluent, added to each slide and incubated for 1 h at room temperature. Slides were then washed with PBS, and DAKO Envision® secondary antibody (Goat anti‐Mouse‐HRP and Goat anti‐Rabbit‐HRP, 100 μl per slide) solution was added to each slide and incubated for 45 min at room temperature. A washing cycle with PBS was performed for 5 min and a 1:15 solution of DAB+ chromogen in antibody diluent was added, followed by incubation in the dark for 10 min. Slides were then washed in tap water for 5 min, stained with hematoxylin/eosin for 1 min each and dehydrated again through sequential washings in 50%–70%–100% ethanol and xylene of 5 min each. Cover glasses were mounted with Pertex and slides were left to dry. TMA slides were stained for Programmed Cell Death 4 (PDCD4) protein (1:200), OCIAD1 (1:800), G3BP2 (1:50), and CGN (1:25).
Anti‐PDCD4 mouse monoclonal (id: LS‐B2949; clone K4C1) and anti‐OCIAD1 rabbit polyclonal (id: LS‐B5046) antibodies were purchased from Lifespan Technologies (Lifespan technologies Inc, Seattle, WA, USA), anti‐G3BP2 rabbit polyclonal (id: NBP1‐82976) antibody was purchased from Novus Biologicals (Novus Biologicals LLC, 8100 Littleton, CO, USA), and anti‐CGN rabbit polyclonal (HPA027657) antibody was purchased from Sigma.
Data from scored tissues were filtered for missing data and adjuvant endocrine therapy, leading to a final list of 294 tissue samples. PDCD4 antibody stained tissues were separately scored for nuclear and cytoplasmic staining intensity (categories: negative, weak, moderate, strong) and percentage of stained tumor cells (categories: 0%, 1–10%, 11–20%, 21–30%, 31–40%, 41–50%, 51–60%, 71–80%, 81–90%, 91–100%). CGN and OCIAD1 stained tissues were scored based on intensity parameters only, while G3BP2 scoring included quantity levels as well. The TMA was scored by two independent researchers, and the average, consolidated scores of triplicate cores were used for statistical analysis. Because PDCD4 cytoplasmic and nuclear stainings were co‐expressed in the evaluated TMA cores, these were merged in order to assess total protein levels. PDCD4 nuclear and cytoplasmic scores were numerically transformed and merged into a histo‐score (Supplemental Table 2) according to formula:
A histo‐score cutoff (i.e. 30) reflective of weak vs strong protein expression was used to stratify patient groups: low and high PDCD4 protein expressing tumors displayed a histo‐score below (<) or at or above (≥) the cutoff, respectively (Supplemental Table 3). PDCD4 cytoplasmic quantity ranged only from 80% to 90%, so it was not included in the histo‐score calculation formula.
Differences in clinical parameters between training and test sets were evaluated by Mann–Whitney U and Pearson χ2 tests (two sided tests). Commonly expressed proteins between the two LCM sets and proteins quantified in WTL sample replicates were annotated through DAVID (Huang and Lempicki, 2008; Huang et al., 2009) for organelle distribution using Swissprot keyword database. Average abundance levels of these proteins in all 112 measured samples were used to generate a waterfall plot of protein abundance distribution.
Protein list used for predictor development was tested for protein differential abundance between patient groups through Student's t‐test (two sided, unequal variances assumed). Hierarchical clustering was performed on all quantified and differentially expressed proteins (t test p value < 0.05), respectively (complete linkage; distance metric: correlation‐uncentered). Significant proteins in the training set were submitted along with their fold changes and t test p values to Ingenuity Pathway Analysis (IPA) network analysis with the following settings: Data sources: all; Confidence: high (predicted) and experimentally observed; species: human. Network was plotted using path designer (Ingenuity Systems, Redwood City, CA, USA).
In order to rule out possible indiscriminative identifiers, the protein predictor was developed selecting the 38 most significant proteins (univariate p value < 0.01) in the training set and a Cox regression multivariate analysis was performed with a step‐down procedure, which involved iteratively removing the least significant proteins (multivariate p value ≥ 0.01) until all remaining proteins in the model showed a multivariate p value < 0.01. Each protein score (t value) was then multiplied by its abundance, and values were then summed for all proteins to obtain a patient score, which was then coupled to outcome data. Each patient score was plotted in a receiver operating characteristic (ROC) curve. Youden index (max of J = Sensitivity + Specificity−1) was set as cutoff in the training set and used to categorize patients in the test set. Log‐rank tests on the survival curves of predicted groups were performed to assess significance of prediction. Association of predictor proteins to TTP was assessed through Cox regression, correcting for patient and tumor characteristics. IHC stainings were used to test for association with TTP, clinical benefit and objective response in combination with clinical parameters by Cox and logistic regression analyses, respectively. Co‐variables that were found not significant in univariate regression analyses were excluded from multivariate models. Cox regression and logistic regression analyses, hazard ratios, odds ratios and confidence intervals were calculated in Stata (version 13.1; Stata Corp, College Station, TX, USA).
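The scoring and cutoff steps described above (a weighted sum of protein abundances per patient, followed by a Youden index cutoff) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy scores, and the convention that a higher score predicts poor outcome are all assumptions made for the example.

```python
def patient_score(abundances, weights):
    """Sum over proteins of (protein weight, e.g. its t value) x abundance."""
    return sum(weights[protein] * abundances[protein] for protein in weights)


def youden_cutoff(scores, is_poor):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    Assumed convention: a score >= cutoff predicts poor outcome.
    """
    n_poor = sum(is_poor)
    n_good = len(is_poor) - n_poor
    best_cutoff, best_j = None, -1.0
    for candidate in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, is_poor) if y and s >= candidate)
        true_neg = sum(1 for s, y in zip(scores, is_poor) if not y and s < candidate)
        j = true_pos / n_poor + true_neg / n_good - 1
        if j > best_j:
            best_cutoff, best_j = candidate, j
    return best_cutoff, best_j


# Toy training data (hypothetical): scores and observed poor-outcome flags.
scores = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6]
is_poor = [False, False, False, True, True, False, True, True]
cutoff, j = youden_cutoff(scores, is_poor)
print(cutoff, j)
```

The cutoff found on the training scores would then be applied unchanged to test set scores, so that the test set plays no role in choosing the threshold.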
One hundred and twelve ER positive primary breast tumor tissues, of which 56 comprised the training set and another 56 the test set, were processed according to our tissue proteomics workflow (Braakman et al., 2012; Liu et al., 2012) and analyzed through high resolution MS.
Analysis of tumor and patient characteristics between the training and test sets showed that age and menopausal status at start of tamoxifen therapy, lymph node status, and tumor size were not significantly different. The test set contained a higher proportion of poorly differentiated tumors (Pearson's χ2 = 21.19, p value < 0.001) compared to the training set. Furthermore, patients in the test cohort had a median disease free interval (DFI) of 51.4 months (range: 0–195 months), which was significantly longer (Mann–Whitney U = −3.814, p value < 0.001) than for patients in the training set (median: 16.4 months, range: 0–90.8 months). This can be attributed to the lack of stage IV tumors in the NKI‐AVL cohort, which possibly contributed to the difference in DFI and grade between training and test set.
LCM discovery and test samples were analyzed along with 8 LCM replicates from 2 separate control tissues, and 3 technical replicates of a control WTL. A total of 2215 proteins were quantified in LCM control samples, and 1320 proteins in the WTL sample replicates, with only 852 proteins quantified in both LCM and WTL controls. Pearson correlation coefficients ranged from 0.92 to 0.97 between LCM samples and from 0.96 to 0.97 between WTL measurements (Supplemental Figure 1A). Hierarchical clustering of LCM and WTL controls showed grouping according to sample origin without misclassifications (Supplemental Figure 1B). Median coefficients of variation of biological and technical replicates were 16.05% (interquartile range, IQR: 10.77–24.56) and 20.35% (IQR: 11.55–34.28), respectively. Reproducibility of MS measurements was considered acceptable given the low number of control sample replicate measurements.
A total of 3227 proteins were identified in the training set, of which 3109 were quantified by LFQ. In the test set, 4278 proteins were identified and 4061 proteins were quantified. LFQ intensity values of 2741 proteins commonly expressed between the training and test set were normalized for batch differences and filtered for missing data to generate two protein lists: a 1960 protein list (10 minimum observations; Supplemental Table 4) for general proteome analysis and an 845 protein list for predictor development (30% missing data in training set and 0% missing data in test set; Supplemental Table 5). From the analysis of 1960 expressed proteins, a wide distribution of protein abundances was observed over 3 orders of magnitude (Figure 2A). Interferon signaling related (e.g. IFI16, IFIT5) and chaperone associated proteins (e.g. DNAJC7, BAG1) displayed low overall abundance (Figure 2B), while luminal epithelial specific (e.g. KRT18), metabolism related (e.g. PKM, ATP5A1), and heat‐shock (e.g. HSPD1, HSPB1) proteins were found to be highly abundant (Figure 2C). In the training set, CV was 14.10% (IQR: 10.22–18.78), whereas it was 13.86% (IQR: 10.33–18.57) in the test set.
Protein abundance levels in 112 ER positive breast cancer samples. The waterfall plot shows mean protein abundance distribution of 1960 commonly expressed proteins. The mean abundance of each quantified protein was calculated and plotted. The 30 least ...
DAVID based annotation for subcellular compartment showed that in the 112 breast cancer tissues the majority of expressed proteins belonged to the nuclear (25.76%) and cytoplasmic (56.38%) compartments while the endoplasmic reticulum (9.54%), Golgi apparatus (6.43%), mitochondria (12.65%), plasma membrane (7.50%), and the extracellular matrix (1.84%) comprised a lower amount of proteins. The smallest group consisted of plasma proteins (0.46%; Figure 3A). The distribution of intensity levels of the 1320 proteins quantified in the WTL control sample showed a similar dynamic range but with increased variation, probably due to exclusion from the normalization procedure (Figure 3B). Annotation for cellular compartments showed a similar distribution of the 1320 identified proteins into subcellular compartments compared to the 112 tissue set but with a notable enrichment of extracellular matrix (e.g. COL1A1) and plasma proteins (e.g. APOA1), which represented 7.19% and 6.89% of all quantified proteins in this set, respectively. The minor contribution of extracellular matrix and plasma proteins in the LCM samples suggests that LCM indeed resulted in highly enriched epithelial tumor cell fractions.
Protein compartmentalization and abundance correlation analysis. Panel shows quantified protein abundance range per subcellular compartment in the LCM enriched 112 ER positive tumors (A) and in WTL control replicates (B). Number of proteins per compartment ...
Distribution of intensities of organelle specific proteins showed comparable average levels of expression, therefore showing that all cell compartments were quantified. In the LCM annotated set several proteins showed multiple organelle localization. The nuclear and cytoplasmic compartments showed the highest degree of overlap with 249 (12.70%) proteins, mostly represented by proteasome subunits (e.g. PSME3) and proteins involved in RNA binding (e.g. RBM3, HNRNPA1). A small number of multi‐compartmentalized proteins was constituted by vesicular transport components between the endoplasmic reticulum and the Golgi apparatus (n = 39; 1.99%) such as SEC23A. A subset of proteins was also found co‐localized in both Golgi and cytoplasm (e.g. SEC24D). The remaining compartments showed expression of locally specific proteins (median overlap: 0.59% of total) such as oxidative chain proteins in the mitochondrion (e.g. UQCRC1) or DNA replication and repair involved proteins in the nucleus (e.g. FEN1). These data indicate the capability of LC‐MS coupled to LCM enrichment to assess protein abundances throughout all cellular compartments from minute amounts of epithelial tumor tissues.
Due to the fact that the 1960 proteins did not clearly discriminate patient groups (Supplemental Figure 2), a more stringent filter for missing values was therefore applied and candidate proteins were selected based on their differential abundance between patient groups. On the panel of filtered 845 quantified proteins, a Student's t test was performed to identify 99 proteins that were differentially abundant between good and poor outcome patients in the training set (p value < 0.05). Of these, 50 proteins were found upregulated in the poor outcome group and 49 displayed higher expression in the good outcome group (Supplemental Table 6). In order to define molecular interaction networks between significant molecules, network analysis in IPA was performed. The network that displayed the most hits comprised proteins involved in cell growth and proliferation and cell death and survival, such as CDC37 (upregulated in poor outcome) and PDCD4 (upregulated in good outcome; Supplemental Figure 3). Several molecules included in the network that were found upregulated in the poor outcome patient group were involved in integrin‐linked kinase signaling (e.g. ITGB1, CFL1), a key pathway in cell migration and proliferation, protein translation (e.g. EIF4G1), and DNA mismatch repair (e.g. MSH2). The proteins found upregulated in the good outcome group and comprised in this network were involved in cell cycle (e.g. KRT18) and cell growth (e.g. NOP58). Although not present among the significant proteins, Akt and MAPK pathways constituted the focal point of the network, suggesting their activation based on their interactors expression levels. IPA analysis showed that differentially expressed proteins were involved in cell growth and proliferation and suggested that actors involved in such pathways may have a key role in tamoxifen resistance.
Based on the 99 differentially abundant proteins, hierarchical clustering separated the two patient groups (Figure 4A): 20 out of 28 predicted good outcome patients were correctly classified as “Good”, while 24 out of 28 predicted poor outcome patients correctly grouped in the “Poor” cluster. After more stringent filtering (p value < 0.01) 38 proteins remained, which were included in a multivariate Cox regression model. Using a step‐down approach, we identified a 4‐protein signature that best predicted outcome to tamoxifen treatment. The signature comprised the following proteins: programmed cell death 4 (PDCD4; t test p value < 0.001), Cingulin (CGN; t test p value = 0.006), ovarian carcinoma immuno‐reactive antigen domain containing protein 1 (OCIAD1; t test p value < 0.001) and Ras‐GTPase activating protein‐binding protein 2 (G3BP2; t test p value < 0.001; Table 2 and Table 3). Based on LFQ intensity levels, OCIAD1, CGN and PDCD4 showed a relatively high abundance in good outcome patients, while G3BP2 was more abundant in the poor outcome group (Figure 4B). Next, patient scores of the 4‐protein predictor were plotted in a ROC curve to select a cut‐off with the highest sensitivity and specificity at predicting poor outcome (J = 0.740, area under the curve = 0.93, sensitivity = 90.6%, specificity = 83.3%; Figure 5A). The 4‐protein predictor was then validated in the test cohort through Cox regression and Kaplan–Meier analyses. In Cox regression analysis for TTP, the 4‐protein signature was significantly associated with outcome of tamoxifen therapy in both univariate (HR = 2.44; 95% CI = 1.30 to 4.54; p value = 0.006) and multivariate analyses, the latter corrected for traditional predictive factors (HR = 2.17; 95% CI = 1.15 to 4.17; p value = 0.017; Table 4).
In Kaplan–Meier analysis, patients with predicted poor outcome had significantly shorter TTP compared to those with a predicted good outcome (HR = 2.32; 95% CI = 1.29 to 4.17; Log‐rank p value = 0.004; Figure 5B). In the test set, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) in predicting poor outcome patients were 86.7%, 41.5%, 35.1% and 89.5%, respectively.
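The four performance measures follow directly from a 2 × 2 table of predicted versus observed outcome. The sketch below shows the arithmetic. The counts are an assumption: they are not reported in this section and were inferred as one set of integers that reproduces the stated percentages for a test set of 56 patients (112 tumors in total minus the 56 training samples). The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    'Positive' means a poor-outcome prediction.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Inferred (not reported) counts: 15 poor and 41 good outcome patients.
metrics = diagnostic_metrics(tp=13, fn=2, fp=24, tn=17)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
print(percentages)
```

Under these assumed counts the low PPV and high NPV reflect the small proportion of poor outcome patients in the test set: most poor outcome patients are detected, but many good outcome patients are also flagged.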
Hierarchical clustering and differential protein abundance of 4‐protein predictor. Samples in the training set (n = 56) were hierarchically clustered based on 99 differentially abundant proteins (t test p value < 0.05). ...
ROC curve of the training set and Kaplan–Meier curves for TTP as a function of predicted outcome in patients in the test set. Patient outcome scores from the training set were calculated based on abundance levels of the 4 predictor proteins and ...
LFQ based identification of 4 proteins in discovery and validation sets.
Information on the 4 proteins constituting the predictor for tamoxifen therapy outcome.
Univariate and multivariate Cox regression analysis for time to progression.
While our tissue proteomics pipeline proved to be successful in identifying and validating the 4‐protein predictor, this technology is not yet readily available in a clinical setting. Therefore, we assessed protein expression of PDCD4, G3BP2, CGN, and OCIAD1 through IHC, a technology that is routinely used in diagnostic laboratories, in an independent set of formalin‐fixed paraffin‐embedded breast cancer tissues incorporated in a TMA. Normal breast epithelium (i.e. acini and ducts) and leukocytes displayed expression of all markers except for CGN, which stained the myoepithelial cell layer only. Blood vessels displayed expression of all 4 proteins, while staining in the stromal compartment was overall low to negative. Examples of comparative IHC analysis of normal breast tissue, blood vessels, leukocytes and breast carcinoma cells are displayed in Supplemental Figure 4A–B. Strong PDCD4 staining (histo‐score ≥ 30) was significantly associated with longer TTP in univariate (HR = 0.75; 95% CI = 0.59 to 0.96; p value = 0.020) and multivariate Cox regression analysis (HR = 0.72; 95% CI = 0.57 to 0.92; p value = 0.009) corrected for traditional predictive factors (Table 5). PDCD4 stained tissues showing either low or high protein expression and the Kaplan–Meier curve for TTP as a function of the PDCD4 histo‐score are shown in Figure 6A and Figure 6B, respectively. In logistic regression analyses for clinical benefit or objective response, PDCD4 levels (histo‐score ≥ 30 vs. < 30) were not significantly associated with the type of response (data not shown). OCIAD1, CGN and G3BP2 stainings showed strong intensities and high quantities of stained tumor cells in the vast majority of specimens. The limited dynamic range in staining intensities proved insufficient to find a significant association of CGN, OCIAD1 and G3BP2 levels with TTP, clinical benefit or objective response (data not shown).
PDCD4 immunohistochemical staining of tissue micro‐array. Tissue cores showed two different staining patterns that have been evaluated by histo‐score (i.e. Histo‐score < 30 and ≥ 30), representing ...
Univariate and multivariate Cox regression analysis for time to progression.
About half of the recurrent ER positive breast cancer patients treated with tamoxifen show intrinsic resistance to the drug. Despite many studies describing several mechanisms associated with tamoxifen resistance and a large number of markers associated with patient hormonal treatment outcome, there is no molecular predictor available in the clinic (Chung and Baxter, 2012; Droog et al., 2013). Furthermore, the search for biomarkers in clinical specimens is often hindered by tissue heterogeneity, which complicates accurate measurement of tumor protein abundance. In the light of this, tissue enrichment technologies offer an invaluable tool to quantify the proteome of specific cell subpopulations. Although resistance involves not only a plethora of molecular mechanisms but also different cell types, such as stromal cells (den Boon et al., 2015; Jung et al., 2015), analysis of whole tissue specimens would suffer from “signal dilution” derived from differential protein expression in heterogeneous tissues. Furthermore, analysis of microdissected stroma is hindered by the presence of high‐abundance proteins (e.g. collagen family) and often needs additional protein separation methods. For these reasons, we have focused only on the epithelial tumor markers involved in tamoxifen resistance. Having successfully coupled LCM tissue enrichment with high resolution MS in a biomarker discovery pipeline (Braakman et al., 2012, 2012, 2014), we have here developed a 4‐protein signature predicting outcome to tamoxifen treatment and validated it in an independent set of ER positive recurrent breast cancers.
Despite the low amount of material derived from tissue enrichment compared to whole tissue specimens, a higher number of proteins was identified and quantified in our LCM samples (training and test sets, and controls) compared to the WTL control, suggesting interference from highly abundant proteins (e.g. collagen family) in the latter. Furthermore, global proteomic analysis of our combined training and test sets showed that contamination by plasma and stromal proteins was minimized in the LCM derived material, while proteins expressed in subcellular compartments were enriched. This allowed us to take a unique snapshot of protein abundance in breast cancer epithelial tissue and to derive markers specifically involved in tumor cell treatment resistance pathways. From a subset of commonly expressed proteins in our 112 ER‐positive breast cancer tissues we developed and validated a protein signature comprising PDCD4, CGN, OCIAD1 and G3BP2, which was capable of predicting tamoxifen treatment outcome in the test set with 86.7% sensitivity, 41.5% specificity, 35.1% PPV and 89.5% NPV, independently of traditional predictive parameters.
The selection of a large cohort of hormonal‐treatment naive patients allowed us to assess tumor protein abundance directly related to first line tamoxifen treatment without any expression changes derived from previous therapies. Furthermore, the availability of an in‐house training and a multi‐center test set enabled us to test the robustness of our predictor in a heterogeneous set of samples, reflective of differences in pathological evaluation and standard of care among medical centers. While our in‐house training set showed almost equal distribution of patient groups, the multi‐center cohort comprised a high number of good outcome patients, which could be explained by different grading systems used in local hospitals. To transfer our findings more easily to a clinical setting, we also performed IHC staining on an independent cohort of ER‐positive breast cancer tissues, which confirmed PDCD4 to be an independent predictive marker of tamoxifen sensitivity. Nevertheless, the MS based 4‐protein signature was a stronger predictor than the single marker PDCD4, emphasizing the potential of proteomic technologies in the dissection of tumor molecular pathways. Still, introduction of high resolution MS in routine clinical diagnostics remains problematic due to extensive and laborious sample preparation and relatively high costs. On the other hand, targeted MS methods offer an accurate tool to detect and quantitate target analytes (i.e. peptides or metabolites) from biological specimens at relatively lower cost and with shorter sample processing and measurement times (Grebe and Singh, 2011; Yassine et al., 2013), and would therefore constitute a more suitable technique for clinical introduction.
Pathway analysis of differentially expressed proteins showed that cell growth and proliferation pathways are key components in tamoxifen therapy response and resistance. Akt and MAPK, although not present among differentially expressed proteins, constituted the center of the molecular interaction network, suggesting that cell cycle progression through estrogen‐independent mechanisms can overcome tamoxifen treatment. Activation of Akt signaling has been linked to tamoxifen resistance in previous studies (Clark et al., 2002; Klinge, 2015; Nass and Kalinski, 2015), but other molecular mechanisms may be involved. In the light of this, the 4‐protein signature is not only capable of discriminating patients that manifested good and poor outcome to tamoxifen treatment, but may also pinpoint other molecular mechanisms of resistance. PDCD4 is an inhibitor of protein translation, which functions both in the nucleus and the cytoplasm (Lankat‐Buttgereit and Göke, 2009). This protein has already been described as a tumor suppressor capable of inhibiting protein synthesis and gene expression by preventing the interaction of eukaryotic initiation factor (eIF) 4A1 and eIF4G, and by binding to target gene transcripts (e.g. MAP4K) in the nucleus, respectively (Biyanee et al., 2014; H. Yang et al., 2006). The nuclear localization of PDCD4 is attributed to Akt phosphorylation in a PI3K‐dependent manner (Palamarchuk et al., 2005). PDCD4 levels have also been negatively correlated with increased expression of miR‐21 in MCF‐7 cells after tamoxifen treatment (Klinge et al., 2010; Manavalan et al., 2011). CGN is involved in tight junction formation and it has been described as a potential epithelial differentiation marker in human neoplasias (Citi et al., 1991; Paschoud et al., 2007).
Together with Paracingulin, CGN controls the expression of GATA‐4, contributing to down‐regulation of RhoA in cells, a key regulator of cell cycle progression that exerts its function through cytoskeletal re‐organization (Guillemot et al., 2013). OCIAD1 expression has been suggested as a thyroid cancer biomarker and has been correlated with distant metastasis formation, since it was found overexpressed in metastatic ovarian cancer by MS analysis (Sengupta et al., 2008; Yang et al., 2012). Recent studies have demonstrated that OCIAD1 directly interacts with STAT3 and aids in its activation, though whether this leads to activation of the tumor suppressor pathway or the oncogenic one still remains unclear (Lee et al., 2012; Musteanu et al., 2010; Sinha et al., 2013). G3BP2 has been shown to be involved in stress granule formation along with its relative G3BP1, as well as in mRNA binding and gene expression regulation. G3BP1 protein has been shown to have a distinct role in breast cancer cell proliferation by stabilizing mRNA molecules, but its homologue G3BP2 was not associated with any of these characteristics, leaving the function of this protein ambiguous (Kociok et al., 1999; Matsuki et al., 2013; Winslow et al., 2013). With the exception of OCIAD1, no studies observed a correlation between levels of PDCD4, G3BP2, or CGN and patient survival or therapy response in clinical cancers; nonetheless these markers may play a role in the type of response to tamoxifen in breast cancer. The anti‐proliferative effects of PDCD4 and CGN may act synergistically with the anti‐estrogenic action of tamoxifen, which results in the block of cell proliferation. Due to its relatively high expression in good outcome patients, OCIAD1 may activate the tumor suppressor role of STAT3 in ER positive breast cancer patients, further inhibiting proliferation.
On the other hand, expression of G3BP2 could actually counteract tamoxifen action by stabilizing mRNAs of estrogen‐responsive elements as well as the ones of ER unrelated growth factors.
We hereby demonstrate that LCM coupled to high resolution LC‐MS not only enables the proteomic analysis of pure cell subpopulations, but also provides a powerful tool for biomarker discovery studies. This allowed us to delve into the breast cancer proteome and to generate and validate a signature predictive of tamoxifen therapy outcome in recurrent ER‐positive breast cancer. In addition, a technical validation through IHC verified that PDCD4 is an independent marker associated with good outcome patients, although it is difficult to distinguish small changes in protein expression by IHC. Although shotgun LC‐MS coupled to LCM based cell enrichment has been shown to be a robust tool for biomarker discovery, time‐consuming sample preparation and relatively high costs may hinder its introduction into a clinical setting. In the light of this, targeted LC‐MS methods such as multiple reaction monitoring would be suited to fill this gap, since accurate quantification of target analytes can be performed at lower costs, with reasonable optimization times and in a multiplexed fashion.
This study was supported by the Dutch Cancer Society (KWF), EMCR2009‐4319 and the CTMM‐Breast Care project 030‐104‐06.
The authors declare no conflict of interest regarding this work.
The authors wish to thank all collaborating local hospitals for providing formalin‐fixed paraffin‐embedded tissues, and Renée Foekens and Anita Trapman‐Jansen for assembling the TMA. Marion Meijer van Gelder is thanked for clinical data management. Proteomics data deposition to the ProteomeXchange consortium was supported by the PRIDE team, EBI, with dataset identifiers PXD000484 and PXD000485.
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.molonc.2015.07.004.
De Marchi Tommaso, Liu Ning Qing, Stingl Cristoph, Timmermans Mieke A., Smid Marcel, Look Maxime P., Tjoa Mila, Braakman Rene B.H., Opdam Mark, Linn Sabine C., Sweep Fred C.G.J., Span Paul N., Kliffen Mike, Luider Theo M., Foekens John A., Martens John W.M., Umar Arzu (2016), 4‐protein signature predicting tamoxifen treatment outcome in recurrent breast cancer, Molecular Oncology, 10, doi: 10.1016/j.molonc.2015.07.004.