Tumor samples and patients
Breast tumor blocks were from a cohort of patients who had consented to the use of any deidentified tissues for research purposes under the auspices of an institutional review board–approved protocol (Vanderbilt IRB number 030747). Selection was based on the following factors: (i) the patient received NAC as treatment and had residual disease at surgery; (ii) sufficient FFPE tissue was available for nucleic acid purification; and (iii) presurgical clinical parameters (grade, mitotic index and ER, PR and HER2 status) were available. Samples were included only if they were deemed by an expert pathologist to contain ≥20% tumor cells in the entire section or if they contained a region with ≥20% tumor cells that could be macrodissected. In total, 49 samples met these criteria. Fifteen of 49 patients had pretreatment tumor biopsies available. This methodology was repeated for a cohort of 89 TNBC FFPE specimens resected after NAC. Patients in this second cohort were identified retrospectively at the Instituto Nacional de Enfermedades Neoplásicas, Lima, Perú, and their specimens were collected under an institutionally approved protocol (INEN 10-018).
To perform macrodissection, 3–5 serial 10-μm sections of tumor were adhered to uncharged slides using nuclease-free water. One additional 5-μm adjacent section was stained for H&E. The tumor hotspot region was outlined by an expert breast histopathologist. Each slide from the block was then overlaid on the H&E-stained slide and oriented according to the features of the section. The area surrounding the tumor-dense target region was scraped away using a sterile razor blade; the remaining tumor region was scraped into a 1.7-ml tube using a fresh blade. This process was repeated for all of the sections for each macrodissected sample.
Nucleic acid purification
Nucleic acid purification was performed using the Agencourt FormaPure Kit (Beckman Coulter, Beverly, MA) according to the manufacturer’s instructions. Total nucleic acid was extracted from the samples. DNase treatment was performed on 1 μg total nucleic acid before nCounter analysis.
NanoString nCounter analysis
RNA samples were provided to NanoString (Seattle, WA) for analysis. Samples were assayed on a Bioanalyzer (Agilent, Santa Clara, CA) to determine the concentration of intact RNA (Supplementary Fig. 2). Code sets were synthesized targeting 355 genes and 14 controls (369-plex code set). The raw transcript counts were normalized by dividing by the geometric mean of the seven preselected normalization housekeeper genes (WAS), which cover a range of levels of constitutive expression.
Intrinsic subtyping was performed on the NanoString data by hierarchical clustering. Four clusters were identified. On the basis of the gene expression of known basal (KIT), luminal (ESR1) and HER2-enriched (ERBB2) markers, these clusters were annotated accordingly. For external microarray data sets, PAM50 molecular subtyping was performed using the genefu package in R on the scaled log2-normalized microarray data13,41.
Immunohistochemistry was performed for Ki-67 (m7240; Dako, Denmark), pERK1/2 (Cell Signaling, 9101) and DUSP4 (Santa Cruz, sc-10797). FFPE tumor sections were scanned at 100× magnification, and the area containing the highest number of positive cells in each case was selected. Positive and negative tumor cells were manually counted at 400×; the percentage of positive cells was calculated from at least 700 viable cells.
Antigen retrieval for Ki-67 was performed using HpH Buffer (pH 8.0) in a decloaking chamber (Biocare Medical, Concord, MA). The antibody to Ki-67 (m7240; Dako, Denmark) was used at a 1:75 dilution overnight. Visualization was performed using the 4plus Detection System (Biocare) and 3,3′-diaminobenzidine (DAB) (Dako) as the chromogen. IHC was performed for DUSP4 (Santa Cruz, sc-10797) according to the following parameters: antigen retrieval using citrate buffer, pH 6.0 (decloaking chamber); dilution of 1:100; overnight incubation at 4 °C; and the Envision Visualization System from Dako. IHC was performed for pERK1/2 (Cell Signaling, 9101) according to the following parameters: antigen retrieval using citrate buffer, pH 6.0 (decloaking chamber); dilution of 1:80; overnight incubation at 4 °C; and the Envision Visualization System from Dako using DAB (Dako).
DUSP4- and pERK1/2-stained tumor regions were scored independently for cytoplasmic and nuclear staining by an expert histopathologist. For each compartment, the percentage of cells staining at each intensity level (1+ to 3+ intensity, as estimated by the pathologist) was multiplied by that intensity level. An H score was then calculated by summing the individual intensity level scores. A composite H score was calculated by summing the H scores for the nuclear and cytoplasmic regions.
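The H score arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code; the function names and the example percentages are hypothetical.

```python
def h_score(percent_by_intensity):
    """Sum of (intensity level x percentage of cells at that level) for 1+ to 3+."""
    total = 0.0
    for intensity, percent in percent_by_intensity.items():
        if intensity not in (1, 2, 3):
            raise ValueError("intensity levels are expected to be 1, 2 or 3")
        total += intensity * percent
    return total


def composite_h_score(nuclear, cytoplasmic):
    """Composite score: nuclear H score plus cytoplasmic H score."""
    return h_score(nuclear) + h_score(cytoplasmic)


# Hypothetical tumor region: percentage of cells at each staining intensity.
nuclear = {1: 20, 2: 30, 3: 10}      # 20 + 60 + 30 = 110
cytoplasmic = {1: 40, 2: 10, 3: 0}   # 40 + 20 + 0 = 60
composite = composite_h_score(nuclear, cytoplasmic)
```

Because each compartment can contribute at most 300, a composite score computed this way ranges from 0 to 600.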
Hierarchical clustering, linear regression, Kaplan-Meier analyses, ANOVA and Student’s t tests were performed in R (http://cran.r-project.org/) or GraphPad Prism (GraphPad Software, La Jolla, CA). Receiver operating characteristic curves were generated using JMP 7 (SAS Institute, Cary, NC). Bonferroni-corrected post-hoc t tests were used to make selected comparisons in multigroup analyses after a significant result was obtained using ANOVA.
All cell lines were obtained from American Type Culture Collection (Rockland, MD). MDA-231, MCF-7, MDA-436 and T47D cells were cultured in DMEM with 10% FBS (Gibco). SUM159PT cells were cultured in DMEM with 5% FBS and 0.5 μg/ml hydrocortisone. BT-549 cells were cultured in RPMI with 10% FBS. MCF-10A cells were grown in DMEM and F12 at a 1:1 ratio with 5% horse serum, 10 μg/ml insulin, 100 ng/ml cholera toxin, 0.5 μg/ml hydrocortisone and 20 ng/ml epidermal growth factor.
Docetaxel was obtained from the Vanderbilt University Hospital Outpatient Pharmacy and diluted in DMSO to a stock concentration of 1.25 mM. Selumetinib was provided by AstraZeneca and reconstituted in DMSO at a stock concentration of 10 μM. Selumetinib was tested in cells at a final concentration of 1 μM.
MDA-231 cells (1 × 10⁶) were injected into the left inguinal mammary fatpads of female BALB/c athymic mice (Harlan Laboratories) in 100 μl 1:1 DMEM and growth-factor–reduced Matrigel (BD Biosciences). All tumors were palpable within 10 d. Tumor volume in mm³ was measured three times weekly using the formula: volume = width² × length/2. When tumors were ≥100 mm³, mice were randomized to receive saline (control) or docetaxel (15 mg per kg body weight weekly intraperitoneally), each with or without selumetinib (25 mg per kg body weight emulsified in 50 μl 0.1% methylcellulose and 0.1% Tween-80, twice daily orally; n = 9 per group) or vehicle gavage. Mice were treated through day 24, at which time they were killed for tumor collection 1 h after the last administration of selumetinib. In some cases, tumors were collected after 3 d of therapy to assess the inhibition of drug targets in situ.
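As a small worked example of the volume formula and the randomization threshold given above (function names and caliper values are hypothetical):

```python
def tumor_volume_mm3(width_mm, length_mm):
    """Tumor volume estimated as width squared times length, divided by 2."""
    return width_mm ** 2 * length_mm / 2


def ready_for_randomization(width_mm, length_mm, threshold_mm3=100.0):
    """True once the estimated volume reaches the 100 mm^3 threshold."""
    return tumor_volume_mm3(width_mm, length_mm) >= threshold_mm3


# Hypothetical caliper measurements in millimeters.
small = tumor_volume_mm3(4, 6)   # 16 * 6 / 2 = 48
large = tumor_volume_mm3(5, 8)   # 25 * 8 / 2 = 100
```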
Immunoblotting was carried out as described42. Antibodies to the following were used for immunoblotting: pERK1/2 (phosphorylated at Thr202 or Tyr204; 9101; 1:5,000), cleaved caspase 3 (9664; 1:500), PARP (9542; 1:1,000), calnexin (2433; 1:5,000), pELK-1 (9181; 1:1,000), ELK-1 (9182; 1:1,000), pcJun (2361; 1:1,000), cJun (9165; 1:1,000) (all from Cell Signaling), pETS-1 (phosphorylated at Thr38, Invitrogen, 44-1104G; 1:500), ETS-1 (Santa Cruz, sc-350; 1:1,000), DUSP4 (Abcam, ab7259; 1:500) and actin (Sigma, A2066; 1:10,000).
Gene set and metagene selection
The STROMAL_META metagene group comprised a set of 50 genes shown to be correlated with the seed stromal gene DCN (decorin) fitted on an external dataset of patients with breast cancer10. High expression of this metagene group predicts intrinsic resistance to chemotherapy, as reported elsewhere10. The CHEMO gene set is a set of 31 genes; high expression of five of these genes is associated with pCR to NAC, and high expression of the other 26 genes in the set is associated with lack of pCR after NAC. These genes were identified on a training set of 82 breast tumors for which clinical response to NAC was known, and they were then validated on a set of 51 additional breast tumors9. The WNT/METS gene set is a 13-gene component of a larger signature that was identified in an orthotopic human breast cancer xenograft that was metastatic to lung11. These 13 genes were identified by gene ontology to be linked to the Wnt pathway and were associated with reduced time to metastasis, poor prognosis and reduced overall survival in patients with breast cancer11.
The CLUSTER gene set was identified by unsupervised techniques to select genes potentially correlated with Ki-67 score after therapy and, thus, with poor patient outcome. To accomplish this, 102 tumor samples that were previously published as part of the EORTC 10994 study10 were queried using hierarchical clustering to identify subclasses within the study population that did not undergo a pCR after NAC. Expression of ER protein and its associated expression signature can easily mask other gene expression patterns of biological relevance43. Therefore, to ensure that an ER-driven signature was not directly selected for interrogation, we further reduced the class discovery sample set to only the 37 ER− tumors that did not have a pCR in response to NAC. These 37 tumors were subjected to hierarchical clustering (using genes with a coefficient of variation (CV) >100% and an average signal intensity of at least 100) to identify two transcriptionally distinct classes of ER− tumors. The 354 probe sets that differentiated these two classes (P < 0.01, t test; called here the CLUSTER signature) mapped to 244 unique Entrez identification numbers.
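The probe filter applied before clustering (CV >100% and an average signal intensity of at least 100) can be illustrated with a short sketch. It assumes signal intensities on a linear scale and uses the sample standard deviation; neither detail is stated in the text, and the probe names and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values) * 100.0


def select_variable_probes(expression, min_cv=100.0, min_mean=100.0):
    """Keep probes with mean intensity >= min_mean and CV > min_cv."""
    selected = []
    for probe, values in expression.items():
        if mean(values) >= min_mean and coefficient_of_variation(values) > min_cv:
            selected.append(probe)
    return selected


# Hypothetical intensities for three probes across four tumors.
expression = {
    "probe_a": [10, 20, 1000, 5],      # high mean, highly variable: kept
    "probe_b": [500, 520, 480, 510],   # high mean, nearly constant: dropped
    "probe_c": [1, 2, 90, 3],          # variable but mean below 100: dropped
}
kept = select_variable_probes(expression)
```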
Selection of normalization genes
To identify candidate normalization genes a priori, we used previously published laser-capture microdissection data from 50 breast tumors (GSE5847)44. Genes were selected based on a low coefficient of variation among all samples (<10%) and no significant differences in expression between the tumor epithelium and the surrounding stroma. Genes with varying average expressions across all tissue samples were selected to generate a diverse series of normalization genes.
NanoString nCounter analysis
All samples were assayed in duplicate using a 5-μl aliquot for a total of 98 assays. Hybridizations were carried out at 65 °C for 18 h after mixing 5 μl of sample with 20 μl NanoString nCounter Reporter Probe and 5 μl Capture Probe. To account for slight differences in assay efficiency (hybridization, purification, binding and so on), the data were normalized to the sum of six positive control RNA spikes. The concentrations of the control RNA spikes ranged from 0.125 to 128 fM. All but one assay (of 98 total) passed quality control metrics for control spike linearity (R² > 0.95) and sensitivity (control spike detection at 0.5 fM).
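A rough sketch of the positive-control step follows. Two details are assumptions rather than statements from the text: the six spikes are taken to form a fourfold dilution series between 0.125 and 128 fM, and each assay is scaled so that its spike sum matches the average spike sum across assays. Linearity is computed on log2-transformed values. All counts are hypothetical.

```python
import math

# Assumed fourfold dilution series; only the endpoints and 0.5 fM appear in the text.
SPIKE_FM = [128, 32, 8, 2, 0.5, 0.125]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def spike_linearity(spike_counts):
    """R^2 between log2 spike concentration and log2 spike counts."""
    log_conc = [math.log2(c) for c in SPIKE_FM]
    log_counts = [math.log2(c) for c in spike_counts]
    return r_squared(log_conc, log_counts)


def positive_control_factors(spike_counts_by_assay):
    """Per-assay scaling factor: mean spike sum across assays / this assay's spike sum."""
    sums = {assay: sum(counts) for assay, counts in spike_counts_by_assay.items()}
    reference = sum(sums.values()) / len(sums)
    return {assay: reference / total for assay, total in sums.items()}


# Hypothetical spike counts; assay_2 recovered half as much signal as assay_1.
spikes = {
    "assay_1": [12800, 3200, 800, 200, 50, 12],
    "assay_2": [6400, 1600, 400, 100, 25, 6],
}
factors = positive_control_factors(spikes)
passes_linearity = {a: spike_linearity(c) > 0.95 for a, c in spikes.items()}
```

Multiplying every count in an assay by its factor would then put the assays on a common scale before housekeeper normalization.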
Expression data normalization, technical reproducibility and robustness
The correlation of expression across the 348 assayed genes was calculated for each pair of technical replicates. The interreplicate correlation was extremely high for 48 of the 49 samples (replicate r range of 0.998869–0.999991; Supplementary Fig. 2f). One sample was not assayed in duplicate because of a technical problem with a replicate that did not produce a signal.
Raw nCounter data from seven preselected normalization genes were first plotted as a correlation matrix to test their similarity of expression before using them to normalize the remaining data. Plotting the expression data in this manner permits the visual identification of individual genes that vary markedly in their pattern of expression relative to other genes across all of the samples. In this case, all seven selected genes had generally high positive correlation to one another, suggesting that they were all appropriate for normalization and that their expression varied according to the amount of input RNA (Supplementary Fig. 2g). However, one of the normalization genes, NPAS1, was less well correlated with several of the other normalization genes. To protect against the potential contribution of this gene as a normalization outlier, we used a geometric mean as opposed to an arithmetic mean: the geometric mean of the seven transcript counts was calculated to serve as a normalization factor for the remaining data.
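The effect of choosing a geometric rather than an arithmetic mean can be seen in a minimal sketch. The gene labels HK1 to HK7 and all counts are hypothetical placeholders, and the function names are illustrative only.

```python
import math


def geometric_mean(values):
    """Geometric mean computed on the log scale (all values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_sample(raw_counts, housekeeping_genes):
    """Divide each non-housekeeping count by the geometric mean of the housekeepers."""
    factor = geometric_mean([raw_counts[g] for g in housekeeping_genes])
    return {g: c / factor for g, c in raw_counts.items() if g not in housekeeping_genes}


# Hypothetical sample: six well-behaved housekeepers and one outlier (HK7).
housekeepers = ["HK1", "HK2", "HK3", "HK4", "HK5", "HK6", "HK7"]
raw = {g: 100 for g in housekeepers[:6]}
raw["HK7"] = 12800
raw.update({"ESR1": 400, "ERBB2": 1000})

geo = geometric_mean([raw[g] for g in housekeepers])   # 100 * 128 ** (1 / 7)
arith = sum(raw[g] for g in housekeepers) / 7          # pulled up by HK7
normalized = normalize_sample(raw, housekeepers)
```

In this toy case the single outlier doubles the geometric mean but inflates the arithmetic mean roughly nineteenfold, which is the sensitivity the normalization was designed to avoid.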
Gene set scoring
The CHEMO, WNT/METS and STROMA_META gene sets were scored by first separately summing the expression of the upregulated genes and of the downregulated genes in each gene set. The gene set score was then calculated as: gene set score = (upregulated gene component score) − (downregulated gene component score).
The Ras-ERK activation signature comprised 57 upregulated probe sets, as reported previously24. The 57 probe sets were extracted from the target RMA-normalized log2-transformed Affymetrix U133plus2 dataset. The resulting signal intensities were summed for each sample to generate the Ras-ERK pathway score.
The CLUSTER gene set was scored by summing the normalized log2 transcript counts for genes upregulated in the cluster A and cluster B components. The CLUSTER score was then generated as: CLUSTER score = (cluster A component score) − (cluster B component score).
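The difference-of-sums scoring used for these gene sets can be sketched in a few lines. The gene names and log2 values below are hypothetical placeholders, not members of the actual signatures.

```python
def gene_set_score(sample, up_genes, down_genes):
    """Sum of expression over the 'up' component minus the sum over the 'down' component."""
    up_component = sum(sample[g] for g in up_genes)
    down_component = sum(sample[g] for g in down_genes)
    return up_component - down_component


# Hypothetical normalized log2 transcript counts for one tumor.
sample = {"UP1": 8.0, "UP2": 6.5, "DOWN1": 4.0, "DOWN2": 3.5}
score = gene_set_score(sample, up_genes=["UP1", "UP2"], down_genes=["DOWN1", "DOWN2"])
```

The same function covers the CLUSTER score if the cluster A genes are passed as the first component and the cluster B genes as the second.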
Recurrence score quantification
The recurrence score (RS) was reproduced based on the methods reported by Paik et al.8. The log2 housekeeper gene–normalized data were used to approximate the ΔCT values used in the Oncotype DX algorithm. The recurrence score was calculated in R as follows (using the normalized transcript counts for each gene where indicated):
- GRB7 group score = (0.9 × GRB7) + (0.1 × ERBB2)
- ER group score = ((0.8 × ESR1) + (1.2 × PGR) + (1 × BCL2) + (1 × SCUBE2))/4
- Proliferation group score = ((1 × survivin) + (1 × MKI67) + (1 × CCNB1))/3
- Invasion group score = ((1 × CTSL2) + (1 × MMP11))/2
Two genes from the proliferation group were not available; thus, we divided the gene sum by a factor of 3 instead of by a factor of 5, as is reported in the original algorithm8.
The unscaled recurrence score (RSU) was then calculated as:
The RSU for the 49 patients was then scaled from 0 to 100 using the rescale function of the genefu package in R41. A scaled RS (RSS) <18 was considered ‘low’; an RSS of 18–31 was considered ‘intermediate’; and RSS >31 was considered ‘high’.
Unlike in the Oncotype DX algorithm, no threshold cutoffs were applied in this approximation of the RS.
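The calculation can be sketched as follows. The group scores follow the definitions listed above. The expression for RSU is not given in this text, so the weights used here are an assumption: they are the ones published for the Oncotype DX algorithm by Paik et al., including the CD68, GSTM1 and BAG1 terms, and whether all of these were used in this analysis is not stated. A simple min-max rescaling stands in for the genefu rescale function.

```python
def group_scores(x):
    """Group scores as defined in the text; x maps gene name to normalized log2 value."""
    grb7 = 0.9 * x["GRB7"] + 0.1 * x["ERBB2"]
    er = (0.8 * x["ESR1"] + 1.2 * x["PGR"] + x["BCL2"] + x["SCUBE2"]) / 4
    proliferation = (x["survivin"] + x["MKI67"] + x["CCNB1"]) / 3  # 3 of 5 genes available
    invasion = (x["CTSL2"] + x["MMP11"]) / 2
    return grb7, er, proliferation, invasion


def unscaled_rs(x):
    """RSU with ASSUMED weights from the published Oncotype DX algorithm; no thresholds."""
    grb7, er, proliferation, invasion = group_scores(x)
    return (0.47 * grb7 - 0.34 * er + 1.04 * proliferation + 0.10 * invasion
            + 0.05 * x["CD68"] - 0.08 * x["GSTM1"] - 0.07 * x["BAG1"])


def rescale_0_100(values):
    """Min-max rescaling to 0-100 (a stand-in for genefu's rescale, not a port of it)."""
    lo, hi = min(values), max(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def risk_category(rss):
    """Categories from the text: <18 low, 18-31 intermediate, >31 high."""
    if rss < 18:
        return "low"
    if rss <= 31:
        return "intermediate"
    return "high"


# Hypothetical tumor in which every gene has a normalized value of 1.0.
genes = ["GRB7", "ERBB2", "ESR1", "PGR", "BCL2", "SCUBE2", "survivin", "MKI67",
         "CCNB1", "CTSL2", "MMP11", "CD68", "GSTM1", "BAG1"]
flat_tumor = {g: 1.0 for g in genes}
rsu_flat = unscaled_rs(flat_tumor)
scaled = rescale_0_100([2.0, 3.0, 6.0])
categories = [risk_category(s) for s in scaled]
```

Because the scaling depends on the minimum and maximum within the cohort, the resulting categories are relative to the 49 tumors analyzed rather than directly comparable to clinical Oncotype DX scores.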
Microarray data sets
Raw microarray data from 230 unique breast tumors from the MAQC-II project were downloaded from Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under accession GSE20194 (ref. 21). Microarray data were generated from pretreatment fine-needle aspirates of the primary tumor and from the surgical specimen after NAC. The patients were treated with 6 months of NAC. Raw microarray data from the study published by Wang et al.25 and clinical follow-up data were extracted from GEO under accession GSE2034.
Raw microarray data for the ICBP-50 were extracted from ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) under accession E-TABM-157 (ref. 20). All data were log2 transformed and RMA normalized in R before analysis. Normalized probe-level methylation data from 189 annotated breast tumors were downloaded from GEO under accession GSE22210 (ref. 22). DUSP4 promoter methylation levels were plotted according to molecular subtype as reported by the authors of ref. 22. Only tumors classified as either HER2-enriched, basal-like or luminal (A or B) were included in the analysis (n = 138). Normalized probe-level methylation data from 28 annotated breast tumors were downloaded from GEO under accession GSE22135 (ref. 23). DUSP4 promoter methylation levels were plotted according to molecular subtype as reported by the authors of ref. 23. Only tumors classified as either HER2-enriched, basal-like or luminal (A or B) were included in the analysis.
Microarray data from 32 tumors sampled before and after four cycles of NAC with epirubicin and cyclophosphamide every 3 weeks followed by four cycles of docetaxel were downloaded from GEO under accession GSE21974 (ref. 45). All patients with gene expression data at both time points had residual tumor in the breast after therapy. Patients were classified as responders or non-responders according to substantial tumor volume response by ultrasonography (as defined by the investigators). Additionally, microarray data from 15 patients treated with four cycles of NAC (docetaxel and capecitabine) were downloaded from GEO under accession GSE18728. Only those patients with pretreatment and surgical specimens were included. Patients were categorized as responders or non-responders based on change in tumor size by clinical exam and pathologic response as assessed and reported by the investigators27.
Reverse-phase protein array analysis
Breast cancer cell lines in the ICBP-50 panel were maintained in culture as described20. RPPA of lysates from 42 cell lines in the ICBP-50 panel was performed as described46–49. In brief, cell lysates were normalized to 1 μg/μl concentration as assessed by bicinchoninic acid assay and boiled with 1% SDS. Supernatants were manually diluted fivefold with lysis buffer. An Aushon BioSystems 2470 arrayer (Burlington, MA) created 1,056-sample arrays on nitrocellulose-coated FAST slides (Schleicher and Schuell BioScience, Inc.). Slides were probed with validated primary antibodies to pERK1/2 (Cell Signaling, 4377; 1:1,000) and total ERK2 (Santa Cruz, SC-154; 1:250), and the signal was amplified using a DakoCytomation-catalyzed system. Secondary antibodies were used as a starting point for amplification. Slides were scanned, analyzed and quantified using Microvigene software (VigeneTech Inc., Carlisle, MA) to generate spot signal intensities, which were processed by the R package SuperCurve (version 1.01, http://bioinformatics.mdanderson.org/OOMPA). A fitted curve (‘supercurve’) was plotted, with the signal intensities on the y axis and the relative log2 concentration of each protein on the x axis, using the nonparametric, monotone increasing B-spline model47. Protein concentrations were derived from the supercurve for each lysate by curve fitting and were normalized by median polish. Protein measurements were corrected for loading as previously described46,49,50.
The transduction procedure and the documentation of GFP-expressing (AdGFP) and the constitutively active MEK1 (AdMEK1ca) adenoviruses were conducted as previously reported42. The AdMEK1ca construct was kindly provided by E.P. Black (University of Kentucky, Lexington, KY). Adenovirus expressing DUSP4 (AdDUSP4) was purchased from Vector Biolabs (Philadelphia, PA).
Lentiviral transduction of wild-type and siRNA-resistant DUSP4
The destination vector pLX301 was purchased from Addgene51. The Gateway entry vector for the wild-type DUSP4 open reading frame (pENTR221) was purchased from Open Biosystems (100066579). Six synonymous point mutations were introduced in the target coding sequence of siDUSP4 construct 3 to render the resulting transcript resistant to the siRNA (services provided by GENEWIZ, South Plainfield, NJ). The resulting sequences are shown below (siRNA target in square brackets).
- wild type: GACTGCCCAAACCACTTT[GAAGGACACTATCAGTACA]AGTGCATCCCAGTGGAAGATAAC;
- mutated: GACTGCCCAAACCACTTT[GAGGGTCATTACCAATATA]AGTGCATCCCAGTGGAAGATAAC
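The design of the siRNA-resistant construct can be checked with a short sketch: the two sequences should differ at six positions, all within the siDUSP4 construct 3 target, while encoding the same peptide. The check assumes that the reading frame begins at the first base of the fragment shown, which is not stated in the text; the codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons, assuming the frame starts at the first base."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


WILD_TYPE = "GACTGCCCAAACCACTTTGAAGGACACTATCAGTACAAGTGCATCCCAGTGGAAGATAAC"
MUTATED = "GACTGCCCAAACCACTTTGAGGGTCATTACCAATATAAGTGCATCCCAGTGGAAGATAAC"
SIRNA_3 = "GAAGGACACUAUCAGUACA"  # siDUSP4 construct 3, sense strand

target = SIRNA_3.replace("U", "T")
start = WILD_TYPE.find(target)  # position of the siRNA target in the wild type
mismatches = [i for i, (a, b) in enumerate(zip(WILD_TYPE, MUTATED)) if a != b]
all_in_target = all(start <= i < start + len(target) for i in mismatches)
same_peptide = translate(WILD_TYPE) == translate(MUTATED)
```

Under the assumed frame, each of the six substitutions falls at the third position of a codon.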
Gateway recombination was performed using the Invitrogen Clonase II LR Kit, resulting in the wild-type pLX301-DUSP4 and pLX301-DUSP4 mutant constructs. Lentivirus was packaged in 293T cells by cotransfection with the plasmids psPAX2 and pMD2G. MDA-231 cells were transduced with viral supernatant and selected with puromycin for over 1 week before plating for experiments.
SiRNA knockdown of DUSP4
Cells were reverse transfected by plating 2 × 10⁵ cells in 5 ml of growth medium in 60-mm dishes containing precomplexed Lipofectamine RNAiMAX (Invitrogen) and siRNA in Opti-MEM (Gibco) medium. After 24 h, cells were transferred to 6-well or 96-well plates for drug treatment. SiRNA duplexes were obtained from QIAGEN (siCONTROL: 5′-GGAAGCAGACTCACTCTTATA-3′) or Dharmacon (ON-TARGETplus, siDUSP4 construct 1: 5′-GUACAUCGAUGCCGUGAAG-3′; siDUSP4 construct 2: 5′-CAUCACGGCUCUGUUGAAU-3′; and siDUSP4 construct 3: 5′-GAAGGACACUAUCAGUACA-3′). All siRNAs were used at a final concentration of 20 nM.
MDA-231 and MDA-436 cells grown in 4-well chamber slides were fixed in 10% neutral buffered formalin, washed twice with PBS and then blocked for 30 min in 3% cold fish gelatin (Sigma-Aldrich) in PBST (0.1% Tween-20 in PBS, pH 7.4). Slides were incubated overnight at 4 °C with a rabbit antibody to DUSP4 (Abcam, ab7259, 1:200) and a goat antibody to cleaved caspase 3 (Santa Cruz Biotechnologies, sc-22171, 1:100), washed five times with PBS (2 min per wash) and incubated with fluorochrome-conjugated donkey antibody to rabbit or goat IgG (Santa Cruz Biotechnologies, 1:100 in PBS) for 2 h at 4 °C, washed five times with PBS (2 min per wash) and mounted using Vectashield with DAPI (Vector Laboratories, Burlingame, CA). Images were captured using ProgRes software and a Jenoptik ProgRes digital camera mounted on a Motic AE31 microscope. To quantify activated caspase 3, four or five random images (FITC for cleaved caspase 3 and DAPI) were taken from each well. The images were converted to a binary signal and quantified by densitometry using ImageJ. Each caspase 3 measurement was normalized to its paired DAPI measurement to control for cell number.
Cell viability and apoptosis assays
Sulforhodamine B assays were performed as previously described42. Apoptosis assays (Caspase-Glo) were performed in 96-well black-walled plates according to the manufacturer’s protocol (Promega, Madison, WI). SRB and Caspase-Glo assay data were normalized by subtracting the signal in a blank well and then dividing by the signal from untreated cells where indicated (percentage of control viability).
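The normalization of plate-reader signals can be written out as a one-line calculation. This sketch assumes that the blank is subtracted from the untreated wells as well as from the treated wells, which the text does not state explicitly; the function name and readings are hypothetical.

```python
def percent_of_control(signal, blank, untreated):
    """Blank-subtracted signal as a percentage of the blank-subtracted untreated signal."""
    return (signal - blank) / (untreated - blank) * 100.0


# Hypothetical absorbance readings from one plate.
viability = percent_of_control(signal=0.65, blank=0.05, untreated=1.25)  # about 50%
```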