Advantages of RNA-Seq over array-based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA isolated from breast tumor, using the hybridization-based NanoString nCounter (226 gene panel) and whole transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole transcriptome protocols, respectively (p < 2 × 10^-16). Specifically for lincRNAs, we observed a very high Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. The limited sensitivity of fusion transcript detection was not overcome by an increase in sequencing depth of up to 3-fold (from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests that discovery-based whole transcriptome studies from FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
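The matched-pair comparison described above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, the inputs are assumed to be per-gene expression vectors in the same gene order, and the abstract does not state whether values were transformed (for example, log-scaled) before correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(pairs):
    """Average the Pearson correlation over (fresh_frozen, ffpe) vector pairs.

    Each element of `pairs` is assumed to hold expression values for the
    same genes, in the same order, from one matched sample pair.
    """
    return sum(pearson(ff, ffpe) for ff, ffpe in pairs) / len(pairs)
```

Calling `mean_pairwise_correlation` on a list of nine (fresh-frozen, FFPE) vector pairs would give one summary value of the kind reported above, under the stated assumptions.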
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular processes pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
Background and objective
Gemcitabine is widely used to treat non-small cell lung cancer (NSCLC). To assess the pharmacogenomic effects of the entire gemcitabine metabolic pathway, we genotyped SNPs within the 17 pathway genes using DNA samples from NSCLC patients treated with gemcitabine and determined the effect of these genetic variants on overall survival (OS) after gemcitabine treatment.
Eight of the 17 pathway genes were resequenced with DNA samples from Coriell lymphoblastoid cell lines (LCLs) using Sanger sequencing for all exons, exon-intron junctions and 5′-, 3′-UTRs. A total of 107 tag SNPs were selected based on the resequencing data for the 8 genes and on HapMap data for the remaining 9 genes, followed by successful genotyping of 394 NSCLC patient DNA samples. Association of SNPs/haplotypes with OS was performed using the Cox regression model, followed by functional studies performed with LCLs and NSCLC cell lines.
Five SNPs in 4 genes (CDA, NT5C2, RRM1, and SLC29A1) and 9 haplotypes in 4 genes (RRM1, RRM2, SLC28A3, and SLC29A1) showed associations with OS in these NSCLC patients (P < 0.05). Genotype imputation using the LCLs was performed for a region of 200 kb surrounding those SNPs, followed by association studies with gemcitabine cytotoxicity. Functional studies demonstrated that downregulation of SLC29A1, NT5C2, and RRM1 in NSCLC cell lines altered cell susceptibility to gemcitabine.
These studies help identify biomarkers to predict gemcitabine response in NSCLC, a step toward the individualized chemotherapy of lung cancer.
gemcitabine; pharmacogenomics; metabolic pathway; lymphoblastoid cell lines; non-small cell lung cancer (NSCLC)
Accumulating evidence demonstrates that PKCι is an oncogene and prognostic marker that is frequently targeted for genetic alteration in many major forms of human cancer. Functional data demonstrate that PKCι is required for the transformed phenotype of NSCLC, pancreatic, ovarian, prostate, colon and brain cancer cells. Future studies will be required to determine whether PKCι is also an oncogene in the many other cancer types that also overexpress PKCι. Studies of PKCι using genetically defined models of tumorigenesis have revealed a critical role for PKCι in multiple stages of tumorigenesis, including tumor initiation, progression and metastasis. Recent studies in a genetic model of lung adenocarcinoma suggest a role for PKCι in transformation of lung cancer stem cells. These studies have important implications for the therapeutic use of aurothiomalate (ATM), a highly selective PKCι signaling inhibitor currently undergoing clinical evaluation. Significant progress has been made in determining the molecular mechanisms by which PKCι drives the transformed phenotype, particularly the central role played by the oncogenic PKCι-Par6 complex in transformed growth and invasion, and of several PKCι-dependent survival pathways in chemo-resistance. Future studies will be required to determine the composition and dynamics of the PKCι-Par6 complex, and the mechanisms by which oncogenic signaling through this complex is regulated. Likewise, a better understanding of the critical downstream effectors of PKCι in various human tumor types holds promise for identifying novel prognostic and surrogate markers of oncogenic PKCι activity that may be clinically useful in ongoing clinical trials of ATM.
tumorigenesis; gene amplification; signal transduction; invasion; metastasis; aurothiomalate
Acetaminophen is the leading cause of acute hepatic failure in many developed nations. Acetaminophen hepatotoxicity is mediated by the reactive metabolite N-acetyl-p-benzoquinonimine (NAPQI). We performed a “discovery” genome-wide association study using a cell line–based model system to study the possible contribution of genomics to NAPQI-induced cytotoxicity. A total of 176 lymphoblastoid cell lines from healthy subjects were treated with increasing concentrations of NAPQI. Half-maximal inhibitory concentration (IC50) values were determined and were tested for association with “glutathione pathway” gene single nucleotide polymorphisms (SNPs) and genome-wide basal messenger RNA expression, as well as with 1.3 million genome-wide SNPs. A group of SNPs in linkage disequilibrium on chromosome 3 was highly associated with NAPQI toxicity. The p value for rs2880961, the SNP with the lowest p value, was 1.88 × 10^-7. This group of SNPs mapped to a “gene desert,” but chromatin immunoprecipitation assays demonstrated binding of several transcription factor proteins, including heat shock factor 1 (HSF1) and HSF2, at or near rs2880961. These chromosome 3 SNPs were not significantly associated with variation in basal expression for any of the genome-wide genes represented on the Affymetrix U133 Plus 2.0 GeneChip. We have used a cell line–based model system to identify a SNP signal associated with NAPQI cytotoxicity. If these observations are validated in future clinical studies, this SNP signal might represent a potential biomarker for risk of acetaminophen hepatotoxicity. The mechanisms responsible for this association remain unclear.
acetaminophen; N-acetyl-p-benzoquinonimine; NAPQI; cytotoxicity; single nucleotide polymorphisms; SNPs; expression array; mRNA; Human Variation Panel; GWAS
Gemcitabine is a cytidine analogue used in the treatment of various solid tumors. Little is known about how gemcitabine and its metabolites are transported out of cells. We set out to study the efflux of gemcitabine and the possible consequences of that process in cancer cells. We observed the efflux of gemcitabine and its deaminated metabolite, 2’,2’-difluorodeoxyuridine (dFdU), using high performance liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) after gemcitabine treatment. Non-selective ABCC-transport inhibition with probenecid significantly increased intracellular dFdU concentrations, with a similar trend observed with verapamil, a non-selective ABCB1 and ABCG2 transport inhibitor. Neither probenecid nor verapamil altered intracellular gemcitabine levels after the inhibition of deamination with tetrahydrouridine, suggesting that efflux of dFdU, but not gemcitabine, was mediated by ABC transporters. MTS assays showed that probenecid increased sensitivity to gemcitabine. While dFdU displayed little cytotoxicity, intracellular dFdU accumulation inhibited cytidine deaminase, resulting in increased gemcitabine levels and enhanced cytotoxicity. Knockdown of ABCC3, ABCC5 or ABCC10 individually did not significantly increase gemcitabine sensitivity, suggesting the involvement of multiple transporters. In summary, ABCC-mediated efflux may contribute to gemcitabine resistance through increased dFdU efflux that allows for the continuation of gemcitabine deamination. Reversing efflux-mediated gemcitabine resistance may require broad-based efflux inhibition.
Gemcitabine; Cytotoxicity; Drug Efflux; Transport
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants, were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27 gene mutant KRAS-specific sub network was extracted based on gene–gene connections from the integrated network, and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeted toward these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways.
Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm
Contact: Hossain.Asif@mayo.edu; Kocher.JeanPierre@mayo.edu
Supplementary information: Supplementary data are provided at Bioinformatics online.
SnowShoes-FTD, developed for fusion transcript detection in paired-end mRNA-Seq data, employs multiple steps of false positive filtering to nominate fusion transcripts with near 100% confidence. Unique features include: (i) identification of multiple fusion isoforms from two gene partners; (ii) prediction of genomic rearrangements; (iii) identification of exon fusion boundaries; (iv) generation of a 5′–3′ fusion spanning sequence for PCR validation; and (v) prediction of the protein sequences, including frame shift and amino acid insertions. We applied SnowShoes-FTD to identify 50 fusion candidates in 22 breast cancer and 9 non-transformed cell lines. Five additional fusion candidates with two isoforms were confirmed. In all, 30 of 55 fusion candidates had in-frame protein products. No fusion transcripts were detected in non-transformed cells. Consideration of the possible functions of a subset of predicted fusion proteins suggests several potentially important functions in transformation, including a possible new mechanism for overexpression of ERBB2 in a HER-positive cell line. The source code of SnowShoes-FTD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization, and the other formatted to run on a single LINUX node. Executables in PERL are available for download from our web site: http://mayoresearch.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
Akt is a central regulator of cell growth. Its activity can be negatively regulated by the phosphatase PHLPP that specifically dephosphorylates the hydrophobic motif of Akt (Ser473 in Akt1). However, how PHLPP is targeted to Akt is not clear. Here we show that FKBP51 (FK506-binding protein 51) acts as a scaffolding protein for Akt and PHLPP and promotes dephosphorylation of Akt. Furthermore, FKBP51 is downregulated in pancreatic cancer tissue samples and several cancer cell lines. Decreased FKBP51 expression in cancer cells results in hyperphosphorylation of Akt and decreased cell death following genotoxic stress. Overall, our findings identify FKBP51 as a negative regulator of the Akt pathway, with potentially important implications for cancer etiology and response to chemotherapy.
FKBP51; Chemotherapy; Akt; PHLPP; Cancer
S-Adenosylhomocysteine hydrolase (AHCY) is the only mammalian enzyme known to catalyze the hydrolysis of S-adenosylhomocysteine (AdoHcy). We have used a genotype-to-phenotype strategy to study this important enzyme by resequencing AHCY in 240 DNA samples from four ethnic groups. Thirty-nine polymorphisms were identified – 28 of which were novel. Functional genomic studies for wild type (WT) AHCY and the three variant allozymes identified showed that two variant allozymes had slight, but significant decreases in enzyme activity, but with no significant differences in levels of immunoreactive protein. Luciferase reporter gene assays for common 5′-flanking region (5′-FR) haplotypes revealed that one haplotype with a frequency of ∼2% in Caucasian-American subjects displayed a decreased ability to drive transcription. The variant nucleotide at 5′-FR SNP (-34) in that haplotype altered the DNA-protein binding pattern during electrophoresis mobility shift assay (EMSA). Finally, an AHCY genotype-phenotype association study for expression in lymphoblastoid cells identified four SNPs that were associated with decreased expression. For the IVS6 (intervening sequence 6, i.e., intron 6) G56>C SNP among those four, EMSA showed that a C>G nucleotide change resulted in an additional shifted band. These results represent a step toward understanding the functional consequences of common genetic variation in AHCY for the regulation of neurotransmitter, drug and macromolecule methylation.
S-Adenosylhomocysteine hydrolase; AHCY; functional genomics; gene sequence variation; single nucleotide polymorphisms; SNPs; gene resequencing
5’-Nucleotidases play a critical role in nucleotide pool balance and in the metabolism of nucleoside analogs such as gemcitabine and cytosine arabinoside (AraC). We previously performed an expression array association study with gemcitabine and AraC cytotoxicity using 197 human lymphoblastoid cell lines. One gene that was significantly associated with gemcitabine cytotoxicity was a nucleotidase family member, NT5C3. Very little is known with regard to the pharmacogenomics of this family of enzymes.
We set out to identify common genetic variation in NT5C3 by resequencing the gene and to determine the effect of that variation on NT5C3 protein function and potential effect on response to cytidine analogues. We identified 61 NT5C3 polymorphisms, 48 of which were novel, by resequencing 240 ethnically defined DNA samples. Functional studies were performed with one nonsynonymous (G847C, Asp283His) and 4 synonymous cSNPs (T9C, C276T, T306C, and G759A), as well as three combined variants (T276/His283, T276/C306, T276/C9).
The His283 and T276/His283 constructs showed decreased levels of enzyme activity and protein. Substrate kinetic analysis showed no significant differences in Km values between WT and His283 when CMP, AraCMP and GemMP were used as substrates. An association study between SNPs and NT5C3 expression in the 240 cell lines from which DNA was extracted to resequence NT5C3 identified 4 SNPs that were significantly associated with NT5C3 expression. EMSAs showed that two of those SNPs, I4(-114) and I6(9), altered DNA-protein binding patterns. These findings suggest that genetic variation in NT5C3 might affect protein function and potentially influence drug response.
Gemcitabine; cytosine arabinoside; cytosolic 5’-nucleotidase III; pharmacogenomics; single nucleotide polymorphisms (SNPs); functional genomics
The human genome displays extensive copy-number variation (CNV). Recent discoveries have shown that large segments of DNA, ranging in size from hundreds to thousands of nucleotides, are either deleted or duplicated. This CNV may encompass genes, leading to a change in phenotype, including drug response phenotypes. Gemcitabine and 1-β-D-arabinofuranosylcytosine (AraC) are cytidine analogues used to treat a variety of cancers. Previous studies have shown that genetic variation may influence response to these drugs. In the present study, we set out to test the hypothesis that variation in copy number might contribute to variation in cytidine analogue response phenotypes.
We used a cell-based model system consisting of 197 ethnically-defined lymphoblastoid cell lines for which genome-wide SNP data were obtained using Illumina 550 and 650 K SNP arrays to study cytidine analogue cytotoxicity. A total of 775 CNVs with allele frequencies > 1% were identified in 102 regions across the genome. Of these 102 loci, 87 overlapped with previously identified regions of CNV. Association of CNVs with gemcitabine and AraC IC50 values identified 11 regions with permutation p-values < 0.05. Multiplex ligation-dependent probe amplification assays were performed to verify the 11 CNV regions that were associated with this phenotype, with false positive and false negative rates for the in-silico findings of 1.3% and 0.04%, respectively. We also had basal mRNA expression array data for these same 197 cell lines, which allowed us to quantify mRNA expression for 41 probesets in or near the CNV regions identified. We found that 7 of those 41 genes were highly expressed in our lymphoblastoid cell lines, and one of the seven genes (SMYD3) that was significant in the CNV association study was selected for further functional experiments. Those studies showed that knockdown of SMYD3 in pancreatic cancer cell lines increased gemcitabine and AraC resistance in cytotoxicity assays, consistent with the results of the association analysis.
These results suggest that CNVs may play a role in variation in cytidine analogue effect. Therefore, association studies of CNVs with drug response phenotypes in cell-based model systems, when paired with functional characterization, might help to identify CNV that contributes to variation in drug response.
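A permutation p-value of the kind mentioned above can be illustrated with a short sketch. The abstract does not specify the test statistic or the number of permutations, so the choices here (absolute Pearson correlation between copy number and IC50, 999 shuffles, a fixed seed, and the function names) are assumptions made purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(copy_number, ic50, n_perm=999, seed=0):
    """Two-sided permutation p-value for a copy-number/IC50 association.

    The phenotype labels are shuffled to break any real association, and
    the observed |r| is compared with the shuffled distribution. The +1
    terms count the observed data as one permutation, so p is never 0.
    """
    rng = random.Random(seed)
    observed = abs(pearson(copy_number, ic50))
    shuffled = list(ic50)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson(copy_number, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In a genome-wide setting this would be repeated per CNV region, and the number of permutations bounds the smallest attainable p-value (here 1/1000).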
The proteasome is a multisubunit cellular organelle that functions as a nonlysosomal threonine protease. Proteasomes play a critical role in the degradation of proteins, regulating a variety of cellular processes, and they are also the target for antineoplastic proteasome inhibitors. Genetic variation in proteasome subunits could influence both proteasome function and response to drug therapy.
We resequenced genes encoding the three active proteasome β subunits using 240 DNA samples from four ethnic groups and the β5 subunit gene in 79 DNA samples from multiple myeloma patients who had been treated with the proteasome inhibitor bortezomib. Resequencing was followed by functional studies of polymorphisms identified in the coding region and 3′-flanking region (3′-FR) of PSMB5, the gene encoding the target for clinically useful proteasome inhibitors.
Resequencing of 240 DNA samples identified a series of novel ethnic-specific polymorphisms that are not represented in public databases. The PSMB5 3′-FR1042 G allele significantly increased transcription during reporter gene studies, observations confirmed by genotype-phenotype correlations between single nucleotide polymorphisms (SNP) in PSMB5 and mRNA expression in the 240 lymphoblastoid cell lines from which the resequenced DNA was obtained. Studies with patient DNA samples identified additional novel PSMB5 polymorphisms, including a SNP and an insertion in the 3′-FR. Reporter-gene studies indicated that these two novel polymorphisms might decrease transcription.
These results show that nonsynonymous coding SNPs in the PSMB5 gene did not show significant effects on proteasome activity, but SNPs did influence transcription. Future studies might focus on regulatory region polymorphisms.
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. 
These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways.
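The enrichment comparison above (188 of 803 genes versus 1347 of 6462) can be checked with a hypergeometric upper-tail calculation, which is equivalent to a one-sided Fisher's exact test on one particular 2 × 2 table. The abstract does not spell out how the contingency table was built, so treating the 803 genes as a draw from the 6462-gene background is an assumption of this sketch, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when drawing n_draw items without replacement from a
    population of n_total items containing n_success 'successes'."""
    denom = comb(n_total, n_draw)
    upper = min(n_draw, n_success)
    numer = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draw - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic, converted to float only at the end.
    return float(Fraction(numer, denom))


# Counts taken from the text; the table layout is an assumption.
disruptive_genes = 803
disruptive_cancer_genes = 188
background_genes = 6462
background_cancer_genes = 1347

observed_pct = 100 * disruptive_cancer_genes / disruptive_genes
background_pct = 100 * background_cancer_genes / background_genes
p_value = hypergeom_upper_tail(
    disruptive_cancer_genes,
    disruptive_genes,
    background_cancer_genes,
    background_genes,
)
print(f"{observed_pct:.1f}% vs {background_pct:.1f}%, one-sided p = {p_value:.3g}")
```

The two percentages reproduce the 23.4% and 20.8% quoted in the text; the p-value depends on the table assumed here and need not match the published figure exactly.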
Sequencing with PolyA selection is the most common approach for library preparation. For limited amounts of RNA or degraded RNA, alternative protocols such as NuGEN have been developed. However, it is not yet clear how the different library preparations affect downstream analyses across the broad applications of RNA sequencing.
Methods and Materials
Eight human mammary epithelial cell (HMEC) lines with high quality RNA were sequenced by Illumina’s mRNA-Seq PolyA selection and NuGEN ENCORE library preparation. The following analyses and comparisons were conducted: 1) the numbers of genes captured by each protocol; 2) the impact of protocols on differentially expressed gene detection between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) non-coding RNAs, particularly lincRNA detection; and 5) intragenic gene expression.
Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected fewer genes (12–20% fewer), with a significant portion of reads mapped to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17–20% of the differentially expressed genes between biological replicates were detected by both protocols. Significantly higher numbers of SNVs (5–6 times) were detected in the NuGEN samples, which were largely from intragenic and intergenic regions. The NuGEN protocol captured fewer exons (25% fewer) and had higher base level coverage variance. While 6.3% of reads were mapped to intragenic regions in the PolyA samples, the percentages were much higher (20–25%) for the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but targeted small and “novel” lincRNAs.
Different library preparations can have significant impacts on downstream analysis and interpretation of RNA-seq data. The NuGEN protocol provides an alternative for limited or degraded RNA, but it has limitations for some RNA-seq applications.
We used deep sequencing technology to profile the transcriptome, gene copy number, and CpG island methylation status simultaneously in eight commonly used breast cell lines to develop a model for how these genomic features are integrated in estrogen receptor positive (ER+) and negative breast cancer. Total mRNA sequencing and profiling of gene copy number and genomic CpG island methylation were carried out using the Illumina Genome Analyzer. Sequences were mapped to the human genome to obtain digitized gene expression data, DNA copy number in reference to the non-tumor cell line (MCF10A), and methylation status of 21,570 CpG islands to identify differentially expressed genes that were correlated with methylation or copy number changes. These were evaluated in a dataset from 129 primary breast tumors. Gene expression in cell lines was dominated by ER-associated genes. ER+ and ER− cell lines formed two distinct, stable clusters, and 1,873 genes were differentially expressed in the two groups. Part of chromosome 8 was deleted in all ER− cells and part of chromosome 17 amplified in all ER+ cells. These loci encoded 30 genes that were overexpressed in ER+ cells; 9 of these genes were overexpressed in ER+ tumors. We identified 149 differentially expressed genes that exhibited differential methylation of one or more CpG islands within 5 kb of the 5′ end of the gene and for which mRNA abundance was inversely correlated with CpG island methylation status. In primary tumors we identified 84 genes that appear to be robust components of the methylation signature that we identified in ER+ cell lines. Our analyses reveal a global pattern of differential CpG island methylation that contributes to the transcriptome landscape of ER+ and ER− breast cancer cells and tumors. The role of gene amplification/deletion appears to be more modest, although several potentially significant genes appear to be regulated by copy number aberrations.