In this paper, we propose a new method remMap — REgularized Multivariate regression for identifying MAster Predictors — for fitting multivariate response regression models under the high-dimension-low-sample-size setting. remMap is motivated by investigating the regulatory relationships among different biological molecules based on multiple types of high dimensional genomic data. Particularly, we are interested in studying the influence of DNA copy number alterations on RNA transcript levels. For this purpose, we model the dependence of the RNA expression levels on DNA copy numbers through multivariate linear regressions and utilize proper regularization to deal with the high dimensionality as well as to incorporate desired network structures. Criteria for selecting the tuning parameters are also discussed. The performance of the proposed method is illustrated through extensive simulation studies. Finally, remMap is applied to a breast cancer study, in which genome wide RNA transcript levels and DNA copy numbers were measured for 172 tumor samples. We identify a trans-hub region in cytoband 17q12–q21, whose amplification influences the RNA expression levels of more than 30 unlinked genes. These findings may lead to a better understanding of breast cancer pathology.
sparse regression; MAP(MAster Predictor) penalty; DNA copy number alteration; RNA transcript level; v-fold cross validation
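The abstract above describes regularized multivariate regression without stating the criterion. As a rough sketch, the code below evaluates one plausible penalized objective: a squared-error loss plus an elementwise L1 term (overall sparsity) and a row-wise L2 term (so that a few predictors, the "master predictors", affect many responses). The exact form of the penalty, the function name `remmap_style_objective`, and the parameters `lam1` and `lam2` are assumptions made for illustration and are not taken from the text.

```python
import math


def remmap_style_objective(Y, X, B, lam1, lam2):
    """Evaluate a hypothetical remMap-style penalized criterion.

    Y: n x Q responses (e.g. RNA levels), X: n x P predictors (e.g. DNA
    copy numbers), B: P x Q coefficients, all as lists of lists.
    The penalty form is an assumption, not quoted from the abstract.
    """
    n, P, Q = len(Y), len(B), len(B[0])
    # Residual sum of squares over all samples and responses.
    rss = 0.0
    for i in range(n):
        for q in range(Q):
            fitted = sum(X[i][p] * B[p][q] for p in range(P))
            rss += (Y[i][q] - fitted) ** 2
    # Elementwise L1 norm: encourages individual coefficients to be zero.
    l1 = sum(abs(b) for row in B for b in row)
    # Row-wise L2 norms: encourage whole predictors (rows) to drop out.
    l2 = sum(math.sqrt(sum(b * b for b in row)) for row in B)
    return 0.5 * rss + lam1 * l1 + lam2 * l2


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    Y = [[1.0, 2.0], [0.0, 0.0]]
    B = [[1.0, 2.0], [0.0, 0.0]]  # only the first predictor is active
    print(remmap_style_objective(Y, X, B, lam1=1.0, lam2=1.0))
```

In a criterion of this shape the row-wise term is what favors hub predictors: a row of `B` that is entirely zero contributes nothing, while a row with many nonzero entries pays the L2 cost only once.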
Pancreatic cancer is a deadly disease with a five-year survival of less than 5%. A better understanding of the underlying biology may suggest novel therapeutic targets. Recent surveys of the pancreatic cancer genome have uncovered numerous new alterations; yet systematic functional characterization of candidate cancer genes has lagged behind. To address this challenge, here we have devised a highly-parallel RNA interference-based functional screen to evaluate many genomically-nominated candidate pancreatic cancer genes simultaneously.
For 185 candidate pancreatic cancer genes, selected from recurrently altered genomic loci, we performed a pooled shRNA library screen of cell growth/viability across 10 different cell lines. Knockdown-associated effects on cell growth were assessed by enrichment or depletion of shRNA hairpins, by hybridization to barcode microarrays. A novel analytical approach (COrrelated Phenotypes for On-Target Effects; COPOTE) was used to discern probable on-target knockdown, based on identifying different shRNAs targeting the same gene and displaying concordant phenotypes across cell lines. Knockdown data were integrated with genomic architecture and gene-expression profiles, and selected findings validated using individual shRNAs and/or independent siRNAs. The pooled shRNA library design delivered reproducible data. In all, COPOTE analysis identified 52 probable on-target gene-knockdowns. Knockdown of known oncogenes (KRAS, MYC, SMURF1 and CCNE1) and a tumor suppressor (CDKN2A) showed the expected contrasting effects on cell growth. In addition, the screen corroborated purported roles of PLEKHG2 and MED29 as 19q13 amplicon drivers. Most notably, the analysis also revealed novel possible oncogenic functions of nucleoporin NUP153 (ostensibly by modulating TGFβ signaling) and Kruppel-like transcription factor KLF5 in pancreatic cancer.
By integrating physical and functional genomic data, we were able to simultaneously evaluate many candidate pancreatic cancer genes. Our findings uncover new facets of pancreatic cancer biology, with possible therapeutic implications. More broadly, our study provides a general strategy for the efficient characterization of candidate genes emerging from cancer genome studies.
Pancreatic cancer; Functional genomics; RNAi screen; shRNA screen; NUP153; KLF5
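The idea behind COPOTE, as described above, is that two different shRNAs targeting the same gene should show concordant phenotypes across cell lines if the effect is on-target. The sketch below illustrates that idea with a pairwise Pearson correlation; the actual COPOTE statistic is not given in the text, so the correlation measure, the `min_r` threshold, the data layout, and the gene and hairpin names are all hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def concordant_genes(profiles, min_r=0.8):
    """Return genes with at least one highly correlated shRNA pair.

    profiles: {gene: {shrna_id: [growth score per cell line]}}
    Result:   {gene: (best_r, shrna_id_a, shrna_id_b)}
    """
    hits = {}
    for gene, hairpins in profiles.items():
        best = None
        for (id_a, a), (id_b, b) in combinations(sorted(hairpins.items()), 2):
            r = pearson(a, b)
            if best is None or r > best[0]:
                best = (r, id_a, id_b)
        if best is not None and best[0] >= min_r:
            hits[gene] = best
    return hits


if __name__ == "__main__":
    # Made-up scores for two placeholder genes across four cell lines.
    profiles = {
        "GENE_A": {"sh1": [1.0, 2.0, 3.0, 4.0], "sh2": [2.0, 4.0, 6.0, 8.5]},
        "GENE_B": {"sh1": [1.0, 2.0, 3.0, 4.0], "sh2": [4.0, 3.0, 2.0, 1.0]},
    }
    print(sorted(concordant_genes(profiles)))
```

A gene with a single hairpin yields no pairs and is never flagged, which matches the requirement that concordance be seen between different shRNAs.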
Clear cell carcinoma (CCC) is a histologically distinct carcinoma subtype that arises in several organ systems and is marked by cytoplasmic clearing, attributed to abundant intracellular glycogen. Previously, transcription factor hepatocyte nuclear factor 1-beta (HNF1B) was identified as a biomarker of ovarian CCC. Here, we set out to explore more broadly the relation between HNF1B and carcinomas with clear cell histology. HNF1B expression, evaluated by immunohistochemistry, was significantly associated with clear cell histology across diverse gynecologic and renal carcinomas (P<0.001), as was hypomethylation of the HNF1B promoter (P<0.001). From microarray analysis, an empirically-derived HNF1B signature was significantly enriched for computationally-predicted targets (with HNF1 binding sites) (P<0.03), as well as genes associated with glycogen metabolism, including glucose-6-phosphatase, and strikingly the blood clotting cascade, including fibrinogen, prothrombin and factor XIII. Enrichment of the clotting cascade was also evident in microarray data from ovarian CCC versus other histotypes (P<0.01), and HNF1B-associated prothrombin expression was verified by immunohistochemistry (P = 0.015). Finally, among gynecologic carcinomas with cytoplasmic clearing, HNF1B immunostaining was linked to a 3.0-fold increased risk of clinically-significant venous thrombosis (P = 0.043), and with a 2.3-fold increased risk (P = 0.011) in a combined gynecologic and renal carcinoma cohort. Our results define HNF1B as a broad marker of clear cell phenotype, and support a mechanistic link to glycogen accumulation and thrombosis, possibly reflecting (for gynecologic CCC) derivation from secretory endometrium. Our findings also implicate a novel mechanism of tumor-associated thrombosis (a major cause of cancer mortality), based on the direct production of clotting factors by cancer cells.
Breast cancer is a heterogeneous disease, appreciable by molecular markers, gene expression profiles, and most recently, patterns of genomic alteration. In particular, genomic profiling has revealed three distinct patterns of DNA copy number alteration: a “simple” type with few gains or losses of whole chromosome arms, an “amplifier” type with focal high-level DNA amplifications, and a “complex” type marked by numerous low-amplitude changes and copy-number transitions. The three patterns are associated with distinct gene-expression subtypes, and preferentially target different loci in the genome (implicating distinct cancer genes). Moreover, the different patterns of alteration imply distinct underlying mechanisms of genomic instability. The amplifier pattern may arise from transient telomere dysfunction, although new data suggest ongoing “amplifier” instability. The complex pattern shows similarity to breast cancers with germline BRCA1 mutation, which also exhibit “basal-like” expression profiles and complex-pattern genomes, implicating a possible defect in BRCA1-associated repair of DNA double-strand breaks. As such, targeting presumptive DNA repair defects represents a promising area of clinical investigation. Future studies should clarify the pathogenesis of breast cancers with amplifier and complex pattern genomes, and will likely identify new therapeutic opportunities.
Breast cancer; genomic instability; DNA amplification; DNA repair defect; basal-like; BRCA1
SWI/SNF is a multi-subunit chromatin remodeling complex that uses the energy of ATP hydrolysis to reposition nucleosomes, thereby modulating gene expression. Accumulating evidence suggests that SWI/SNF functions as a tumor suppressor in some cancers. However, the spectrum of SWI/SNF mutations across human cancers has not been systematically investigated. Here, we mined whole-exome sequencing data from 24 published studies representing 669 cases from 18 neoplastic diagnoses. SWI/SNF mutations were widespread across diverse human cancers, with an excess of deleterious mutations, and an overall frequency approaching TP53 mutation. Mutations occurred most commonly in the SMARCA4 enzymatic subunit, and in subunits thought to confer functional specificity (ARID1A, ARID1B, PBRM1, and ARID2). SWI/SNF mutations were not mutually-exclusive of other mutated cancer genes, including TP53 and EZH2 (both previously linked to SWI/SNF). Our findings implicate SWI/SNF as an important but under-recognized tumor suppressor in diverse human cancers, and provide a key resource to guide future investigations.
Array-based comparative genomic hybridization (aCGH) enables the measurement of DNA copy number across thousands of locations in a genome. The main goals of analyzing aCGH data are to identify the regions of copy number variation (CNV) and to quantify the amount of CNV. Although there are many methods for analyzing single-sample aCGH data, the analysis of multi-sample aCGH data is a relatively new area of research. Further, many of the current approaches for analyzing multi-sample aCGH data do not appropriately utilize the additional information present in the multiple samples. We propose a procedure called the Fused Lasso Latent Feature Model (FLLat) that provides a statistical framework for modeling multi-sample aCGH data and identifying regions of CNV. The procedure involves modeling each sample of aCGH data as a weighted sum of a fixed number of features. Regions of CNV are then identified through an application of the fused lasso penalty to each feature. Some simulation analyses show that FLLat outperforms single-sample methods when the simulated samples share common information. We also propose a method for estimating the false discovery rate. An analysis of an aCGH data set obtained from human breast tumors, focusing on chromosomes 8 and 17, shows that FLLat and Significance Testing of Aberrant Copy number (an alternative, existing approach) identify similar regions of CNV that are consistent with previous findings. However, through the estimated features and their corresponding weights, FLLat is further able to discern specific relationships between the samples, for example, identifying 3 distinct groups of samples based on their patterns of CNV for chromosome 17.
Cancer; DNA copy number; False discovery rate; Mutation
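The two ingredients of FLLat named above, modeling each sample as a weighted sum of features and applying a fused lasso penalty to each feature, can be sketched as follows. This is only an illustration of those two pieces, not the fitting procedure; the function names, the tuning parameters `lam1` and `lam2`, and the toy feature values are assumptions.

```python
def reconstruct(features, weights):
    """Model one aCGH sample as a weighted sum of latent features.

    features: list of J features, each a list of L values (one per probe).
    weights:  list of J weights for this sample.
    """
    n_probes = len(features[0])
    return [
        sum(w * f[pos] for f, w in zip(features, weights))
        for pos in range(n_probes)
    ]


def fused_lasso_penalty(feature, lam1, lam2):
    """Fused lasso penalty for one feature.

    The first term shrinks values toward zero (no copy number change);
    the second penalizes jumps between adjacent probes, which favors
    piecewise-constant features.
    """
    sparsity = sum(abs(v) for v in feature)
    smoothness = sum(abs(b - a) for a, b in zip(feature, feature[1:]))
    return lam1 * sparsity + lam2 * smoothness


if __name__ == "__main__":
    # Two toy features: a gain over probes 2-3 and a loss over probes 1-2.
    features = [[0, 0, 1, 1, 0], [0, -1, -1, 0, 0]]
    print(reconstruct(features, weights=[2.0, 0.5]))
    print(fused_lasso_penalty(features[0], lam1=1.0, lam2=1.0))
```

Because the features are shared and only the weights differ by sample, samples with similar weight vectors can be grouped, which is how patterns such as the distinct chromosome 17 groups mentioned above could be read off.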
DNA amplifications, leading to the overexpression of oncogenes, are a cardinal feature of lung cancer and directly contribute to its pathogenesis. To uncover novel alterations of this kind, we performed an array-based comparative genomic hybridization survey of 128 non-small cell lung cancer cell lines and tumors. Prominent among our findings, we identified recurrent high-level amplification at cytoband 22q11.21 in 3% of lung cancer specimens, with another 11% of specimens exhibiting low-level gain spanning that locus. The 22q11.21 amplicon core contained eight named genes, only four of which were overexpressed (by transcript profiling) when amplified. Among these, CRKL encodes an adapter protein functioning in signal transduction, best known as a substrate of the BCR-ABL kinase in chronic myelogenous leukemia. RNA interference-mediated knockdown of CRKL in lung cancer cell lines with (but not without) amplification led to significantly decreased cell proliferation, cell-cycle progression, cell survival, and cell motility and invasion. In addition, overexpression of CRKL in immortalized human bronchial epithelial cells led to EGF-independent cell growth. Our findings indicate that amplification and resultant overexpression of CRKL contribute to diverse oncogenic phenotypes in lung cancer, with implications for targeted therapy, and highlight a role of adapter proteins as primary genetic drivers of tumorigenesis.
CRKL; lung cancer; DNA amplification; genomic profiling; adapter protein
Steroid receptor coactivator-3 (SRC-3) is a histone acetyltransferase and nuclear hormone receptor (NHR) coactivator, located on 20q12, which is amplified in several epithelial cancers and well studied in breast cancer. However, its possible role in lung cancer pathogenesis is unknown. We found SRC-3 over-expressed in 27% of NSCLC patients (N=311) by immunohistochemistry, which correlated with poor disease-free (p=0.0015) and overall (p=0.0008) survival. Twenty-seven percent of NSCLCs exhibited SRC-3 gene amplification, and we found lung cancer cell lines expressed higher levels of SRC-3 than immortalized human bronchial epithelial cells (HBECs), which in turn expressed higher levels of SRC-3 than cultured primary HBECs. siRNA-mediated down-regulation of SRC-3 in high-expressing (but not low-expressing) lung cancer cells significantly inhibited tumor cell growth and induced apoptosis. Finally, we found that SRC-3 expression is inversely correlated with gefitinib sensitivity and that SRC-3 knockdown results in EGFR-TKI-resistant lung cancers becoming more sensitive to gefitinib. Together these data suggest that SRC-3 may be an important oncogene and therapeutic target for lung cancer.
The most common preclinical models of pancreatic adenocarcinoma utilize human cells or tissues that are xenografted into immunodeficient hosts. Several immunocompetent, genetically engineered mouse models of pancreatic cancer exist; however, tumor latency and disease progression in these models are highly variable. We sought to develop an immunocompetent, orthotopic mouse model of pancreatic cancer with rapid and predictable growth kinetics.
Cell lines with epithelial morphology were derived from liver metastases obtained from KrasG12D/+;LSL-Trp53R172H/+;Pdx-1-Cre mice. Tumor cells were implanted in the pancreas of immunocompetent, histocompatible B6/129 mice, and the mice were monitored for disease progression. Relevant tissues were harvested for histological, genomic and immunophenotypic analysis.
All mice developed pancreatic tumors by 2 weeks. Invasive disease and liver metastases were noted by 6-8 weeks. Histological examination of tumors demonstrated cytokeratin-19-positive adenocarcinoma with regions of desmoplasia. Genomic analysis revealed broad chromosomal changes along with focal gains and losses. Pancreatic tumors were infiltrated with dendritic cells, myeloid-derived suppressor cells, macrophages and T lymphocytes. Survival was decreased in RAG-/- mice, which are deficient in T cells, suggesting that an adaptive immune response alters the course of disease in wild-type mice.
We have developed a rapid, predictable orthotopic model of pancreatic adenocarcinoma in immunocompetent mice that mimics human pancreatic cancer with regard to genetic mutations, histological appearance and pattern of disease progression. This model highlights both the complexity and relevance of the immune response to invasive pancreatic cancer and may be useful for the preclinical evaluation of new therapeutic agents.
Pancreatic cancer; Metastasis/metastasis genes/metastasis models; Animal models of cancer
Epithelial-mesenchymal transition (EMT), a switch of polarized epithelial cells to a migratory, fibroblastoid phenotype, is considered a key process driving tumor cell invasiveness and metastasis. Using breast cancer cell lines as a model system, we sought to discover gene-expression signatures of EMT with clinical and mechanistic relevance. A supervised comparison of epithelial and mesenchymal breast cancer lines defined a 200-gene EMT signature that was prognostic across multiple breast cancer cohorts. Immunostaining of LYN, a top-ranked EMT signature gene and Src-family tyrosine kinase, was associated with significantly shorter overall survival (P=0.02), and correlated with the basal-like (“triple-negative”) phenotype. In mesenchymal breast cancer lines, RNAi-mediated knockdown of LYN inhibited cell migration and invasion, but not proliferation. Dasatinib, a dual-specificity tyrosine kinase inhibitor, also blocked invasion (but not proliferation) at nanomolar concentrations that inhibit LYN kinase activity, suggesting that LYN is a likely target and invasion a relevant endpoint for dasatinib therapy. Our findings define a prognostically-relevant EMT signature in breast cancer, and identify LYN as a mediator of invasion and possible new therapeutic target (and theranostic marker for dasatinib response), with particular relevance to clinically-aggressive basal-like breast cancer.
Breast cancer; epithelial-mesenchymal transition; transcriptional profiling; LYN; dasatinib
Summary: DNA copy number alterations (CNA) frequently underlie gene expression changes by increasing or decreasing gene dosage. However, only a subset of genes with altered dosage exhibit concordant changes in gene expression. This subset is likely to be enriched for oncogenes and tumor suppressor genes, and can be identified by integrating these two layers of genome-scale data. We introduce DNA/RNA-Integrator (DR-Integrator), a statistical software tool to perform integrative analyses on paired DNA copy number and gene expression data. DR-Integrator identifies genes with significant correlations between DNA copy number and gene expression, and implements a supervised analysis that captures genes with significant alterations in both DNA copy number and gene expression between two sample classes.
Availability: DR-Integrator is freely available for non-commercial use from the Pollack Lab at http://pollacklab.stanford.edu/ and can be downloaded as a plug-in application to Microsoft Excel and as a package for the R statistical computing environment. The R package is available under the name ‘DRI’ at http://cran.r-project.org/. An example analysis using DR-Integrator is included as supplemental material.
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
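The supervised analysis described in the Summary above looks for genes altered in both DNA copy number and gene expression between two sample classes. A minimal sketch of that idea follows, using a Welch t-statistic on each data layer and keeping genes where both statistics are large and share a sign. This is not DR-Integrator's implementation: the choice of statistic, the `min_abs_t` cutoff, the function names, and the example data are assumptions, and no multiple-testing correction is attempted.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic (0.0 if both groups are constant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def concordant_two_class(dna, rna, labels, min_abs_t=2.0):
    """List genes altered in the same direction in both data layers.

    dna, rna: {gene: [value per sample]} with samples in the same order.
    labels:   list of 0/1 class labels, one per sample.
    """
    hits = []
    for gene in sorted(dna):
        if gene not in rna:
            continue
        t_vals = []
        for layer in (dna[gene], rna[gene]):
            class1 = [v for v, c in zip(layer, labels) if c == 1]
            class0 = [v for v, c in zip(layer, labels) if c == 0]
            t_vals.append(welch_t(class1, class0))
        t_dna, t_rna = t_vals
        same_direction = t_dna * t_rna > 0
        if same_direction and min(abs(t_dna), abs(t_rna)) >= min_abs_t:
            hits.append(gene)
    return hits


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    dna = {
        "GENE_X": [1.0, 1.2, 0.9, 0.0, 0.1, -0.1],
        "GENE_Y": [1.0, 1.2, 0.9, 0.0, 0.1, -0.1],
    }
    rna = {
        "GENE_X": [2.0, 2.5, 1.8, 0.2, 0.1, 0.0],   # tracks copy number
        "GENE_Y": [0.1, -0.2, 0.3, 0.2, -0.1, 0.0],  # does not
    }
    print(concordant_two_class(dna, rna, labels))
```

Requiring agreement in sign reflects the gene-dosage reasoning in the Summary: a gained gene is expected to show increased expression, and a lost gene decreased expression.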
Breast cancer exhibits clinical and molecular heterogeneity, where expression-profiling studies have identified five major molecular subtypes. The basal-like subtype, expressing basal epithelial markers and negative for estrogen receptor (ER), progesterone receptor (PR) and HER2, is associated with higher overall levels of DNA copy number alteration (CNA), specific CNAs (like gain on chromosome 10p), and poor prognosis. Discovering the molecular genetic basis of tumor subtypes may provide new opportunities for therapy. To identify the driver oncogene on 10p associated with basal-like tumors, we analyzed genomic profiles of 172 breast carcinomas. The smallest shared region of gain spanned just seven genes at 10p13, including calcium/calmodulin-dependent protein kinase ID (CAMK1D), functioning in intracellular signaling but not previously linked to cancer. By microarray, CAMK1D was overexpressed when amplified, and by immunohistochemistry exhibited elevated expression in invasive carcinomas compared to carcinoma in situ. Engineered overexpression of CAMK1D in non-tumorigenic breast epithelial cells led to increased cell proliferation, and molecular and phenotypic alterations indicative of epithelial-mesenchymal transition (EMT), including loss of cell-cell adhesions and increased cell migration and invasion. Our findings identify CAMK1D as a novel amplified oncogene linked to EMT in breast cancer, and as a potential therapeutic target with particular relevance to clinically unfavorable basal-like tumors.
Breast cancer; genomic profiling; DNA amplification; oncogene; epithelial-mesenchymal transition; EMT
Prostate cancer is the most frequently diagnosed cancer among men in the United States. In contrast, cancer of the seminal vesicle is exceedingly rare, even though the prostate and seminal vesicle share similar histology, secretory function, androgen dependency, blood supply, and (in part) embryonic origin. We hypothesized that gene-expression differences between prostate and seminal vesicle might inform mechanisms underlying the higher incidence of prostate cancer.
Whole-genome DNA microarrays were used to profile gene expression of 11 normal prostate and 7 seminal vesicle specimens (including 6 matched pairs) obtained from radical prostatectomy. Supervised analysis was used to identify genes differentially expressed between normal prostate and seminal vesicle, and this list was then cross-referenced to genes differentially expressed between normal and cancerous prostate. Expression patterns of selected genes were confirmed by immunohistochemistry using a tissue microarray.
We identified 32 genes that displayed a highly statistically-significant expression pattern with highest levels in seminal vesicle, lower levels in normal prostate, and lowest levels in prostate cancer. Among these genes was the known candidate prostate tumor suppressor GSTP1 (involved in xenobiotic detoxification). The expression pattern of GSTP1 and four other genes, ABCG2 (xenobiotic transport), CRABP2 (retinoic acid signaling), GATA3 (lineage-specific transcription) and SLPI (immune response), was confirmed by immunohistochemistry.
Our findings identify candidate prostate cancer genes whose reduced expression in prostate (compared to seminal vesicle) may be permissive to prostate cancer initiation. Such genes and their pathways may inform mechanisms of prostate carcinogenesis, and suggest new opportunities for prostate cancer prevention.
prostate cancer; seminal vesicle; expression profiling; microarray
A correction to A DNA microarray survey of gene expression in normal human tissues by R Shyamsundar, YH Kim, JP Higgins, K Montgomery, M Jorden, A Sethuraman, M van de Rijn, D Botstein, PO Brown and JR Pollack. Genome Biology 2005, 6:R22
Pancreatic cancer, the fourth leading cause of cancer death in the United States, is frequently associated with the amplification and deletion of specific oncogenes and tumor-suppressor genes (TSGs), respectively. To identify such novel alterations and to discover the underlying genes, we performed comparative genomic hybridization on a set of 22 human pancreatic cancer cell lines, using cDNA microarrays measuring ∼26,000 human genes (thereby providing an average mapping resolution of <60 kb). To define the subset of amplified and deleted genes with correspondingly altered expression, we also profiled mRNA levels in parallel using the same cDNA microarray platform. In total, we identified 14 high-level amplifications (38–4934 kb in size) and 15 homozygous deletions (46–725 kb). We discovered novel localized amplicons, suggesting previously unrecognized candidate oncogenes at 6p21, 7q21 (SMURF1, TRRAP), 11q22 (BIRC2, BIRC3), 12p12, 14q24 (TGFB3), 17q12, and 19q13. Likewise, we identified novel polymerase chain reaction-validated homozygous deletions indicating new candidate TSGs at 6q25, 8p23, 8p22 (TUSC3), 9q33 (TNC, TNFSF15), 10q22, 10q24 (CHUK), 11p15 (DKK3), 16q23, 18q23, 21q22 (PRDM15, ANKRD3), and Xp11. Our findings suggest candidate genes and pathways, which may contribute to the development or progression of pancreatic cancer.
Pancreatic cancer; array CGH; comparative genomic hybridization; expression profiling; DNA amplification
Array-based comparative genomic hybridization, RNA expression profiling, and proteomic analyses are new molecular technologies used to study breast cancer. Invasive breast cancers were originally evaluated because they provided ample quantities of DNA, RNA, and protein. The application of these technologies to pre-invasive breast lesions is discussed, including methods that facilitate their implementation. Data indicate that atypical ductal hyperplasia and ductal carcinoma in situ are precursor lesions molecularly similar to adjacent invasive breast cancer. It is expected that molecular technologies will identify breast tissue at risk for the development of unfavorable subtypes of invasive breast cancer and reveal strategies for targeted chemoprevention or eradication.
array comparative genomic hybridization; breast cancer; ductal carcinoma in situ; expression profiling; microarrays
Wnt signaling is implicated in many developmental decisions, including stem cell control, as well as in cancer. Relatively few target genes of the Wnt pathway are known.
We have identified target genes of Wnt signaling using microarray technology and human embryonic carcinoma cells stimulated with active Wnt protein. The ~50 genes upregulated early after Wnt addition include the previously known Wnt targets Cyclin D1, MYC, ID2 and βTRCP. The newly identified targets, which include MSX1, MSX2, Nucleophosmin, Follistatin, TLE/Groucho, Ubc4/5E2, CBP/P300, Frizzled and REST/NRSF, have important implications for understanding the roles of Wnts in development and cancer. The protein synthesis inhibitor cycloheximide blocks induction by Wnt, consistent with a requirement for newly synthesized β-catenin protein prior to target gene activation. The promoters of nearly all the target genes we identified have putative TCF binding sites, and we show that the TCF binding site is required for induction of Follistatin. Several of the target genes have a cooperative response to a combination of Wnt and BMP.
Wnt signaling activates genes that promote stem cell fate and inhibit cellular differentiation and regulates a remarkable number of genes involved in its own signaling system.
Somatic cell mutants can be informative in the analysis of a wide variety of cellular processes. The use of map-based positional cloning strategies in somatic cell hybrids to analyze genes responsible for recessive mutant phenotypes is often tedious, however, and remains a major obstacle in somatic cell genetics. To fulfill the need for more efficient gene mapping in somatic cell mutants, we have developed a new DNA microarray comparative genomic hybridization (array-CGH) method that can rapidly and efficiently map the physical location of genes complementing somatic cell mutants to a small candidate genomic region. Here we report experiments that establish the validity and efficacy of the methodology.
CHO cells deficient for hypoxanthine:guanine phosphoribosyl transferase (HPRT) were fused with irradiated normal human fibroblasts and subjected to HAT selection. Cy5-labeled genomic DNA from the surviving hybrids containing the HPRT gene was mixed with Cy3-labeled genomic DNA from normal CHO cells and hybridized to a microarray containing 40,185 cDNAs, representing 29,399 genes (UniGene clusters). The DNA spots with the highest Cy5:Cy3 fluorescence ratios corresponded to a group of genes mapping within a 1 Mb interval centered near position 142.7 Mb on the X chromosome, the genomic location of HPRT.
The results indicate that our physical mapping method based on radiation hybrids and array-CGH should significantly enhance the speed and efficiency of positional cloning in somatic cell genetics.
Gene fusions, like BCR/ABL1 in chronic myelogenous leukemia, have long been recognized in hematologic and mesenchymal malignancies. The recent finding of gene fusions in prostate and lung cancers has motivated the search for pathogenic gene fusions in other malignancies. Here, we developed a “breakpoint analysis” pipeline to discover candidate gene fusions by tell-tale transcript level or genomic DNA copy number transitions occurring within genes. Mining data from 974 diverse cancer samples, we identified 198 candidate fusions involving annotated cancer genes. From these, we validated and further characterized novel gene fusions involving ROS1 tyrosine kinase in angiosarcoma (CEP85L/ROS1), SLC1A2 glutamate transporter in colon cancer (APIP/SLC1A2), RAF1 kinase in pancreatic cancer (ATG7/RAF1) and anaplastic astrocytoma (BCL6/RAF1), EWSR1 in melanoma (EWSR1/CREM), CDK6 kinase in T-cell acute lymphoblastic leukemia (FAM133B/CDK6), and CLTC in breast cancer (CLTC/VMP1). Notably, while these fusions involved known cancer genes, all occurred with novel fusion partners and in previously unreported cancer types. Moreover, several constituted druggable targets (including kinases), with therapeutic implications for their respective malignancies. Lastly, breakpoint analysis identified new cell line models for known rearrangements, including EGFRvIII and FIP1L1/PDGFRA. Taken together, we provide a robust approach for gene fusion discovery, and our results highlight a more widespread role of fusion genes in cancer pathogenesis.
Gene fusions represent an important class of cancer genes, created by rearrangements of the genome that bring together two different genes. Because they are unique to cancer cells, gene fusions are ideal diagnostic markers and therapeutic targets. While gene fusions were once thought restricted mainly to blood cancers, recent discoveries suggest they are more widespread. Here, we have developed an approach for mining DNA microarray data to detect the tell-tale signatures of gene fusions, as “breakpoints” occurring within the encoding DNA or expressed transcripts. We apply this approach to a large collection of nearly 1,000 human cancer specimens. From this analysis, we discover and verify twelve new gene fusions occurring in diverse cancer types. We verify that some of these rearrangements recur in other samples of the same cancer type (supporting a causal role) and that the cancers show dependency on the fusion for cancer cell growth. Notably, some of these fusions (e.g. CEP85L/ROS1 in angiosarcoma) represent the first for that cancer type and thus provide important new biological insight. Some are also good drug targets (including rearrangements of ROS1, RAF1, and CDK6 kinases), with clear implications for therapy.
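The "breakpoint analysis" described above looks for a transition in transcript level or DNA copy number occurring within a gene. One simple way to picture this is to scan the ordered measurements along a gene for the split that gives the largest difference in mean between the two sides, as sketched below. The pipeline's actual statistic is not described in the text; the mean-difference criterion, the `min_side` parameter, and the function name are hypothetical.

```python
def best_breakpoint(values, min_side=2):
    """Find the within-gene split with the largest mean difference.

    values:   measurements (e.g. log ratios) ordered 5' to 3' along a gene.
    min_side: minimum number of measurements required on each side.
    Returns (index, delta), where values[:index] is the 5' segment and
    delta = mean(3' segment) - mean(5' segment); (None, 0.0) if the gene
    has too few measurements or is completely flat.
    """
    best_idx, best_delta = None, 0.0
    for k in range(min_side, len(values) - min_side + 1):
        left, right = values[:k], values[k:]
        delta = sum(right) / len(right) - sum(left) / len(left)
        if abs(delta) > abs(best_delta):
            best_idx, best_delta = k, delta
    return best_idx, best_delta


if __name__ == "__main__":
    # Toy profile: low signal over the 5' probes, high over the 3' probes,
    # as might be seen when only the 3' portion is retained in a fusion.
    print(best_breakpoint([0.0, 0.1, -0.1, 1.0, 1.1, 0.9]))
```

A real pipeline would also need a significance threshold and a way to handle noise; this sketch only locates the most likely transition point.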
Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs, as a new, relatively unstudied class of transcripts, provide a rich opportunity to identify both functional drivers and cancer-type-specific biomarkers. However, despite the potential importance of long non-coding RNAs to the cancer field, no comprehensive survey of long non-coding RNA expression across various cancers has been reported.
We performed a sequencing-based transcriptional survey of both known long non-coding RNAs and novel intergenic transcripts across a panel of 64 archival tumor samples comprising 17 diagnostic subtypes of adenocarcinomas, squamous cell carcinomas and sarcomas. We identified hundreds of transcripts from among the known 1,065 long non-coding RNAs surveyed that showed variability in transcript levels between the tumor types and are therefore potential biomarker candidates. We discovered 1,071 novel intergenic transcribed regions and demonstrate that these show similar patterns of variability between tumor types. We found that many of these differentially expressed cancer transcripts are also expressed in normal tissues. One such novel transcript specifically expressed in breast tissue was further evaluated using RNA in situ hybridization on a panel of breast tumors. It was shown to correlate with low tumor grade and estrogen receptor expression, thereby representing a potentially important new breast cancer biomarker.
This study provides the first large survey of long non-coding RNA expression within a panel of solid cancers and also identifies a number of novel transcribed regions differentially expressed across distinct cancer types that represent candidate biomarkers for future research.
3SEQ; FFPE; human cancer; intergenic transcripts; lncRNAs; novel transcripts; solid tumors; transcriptional profiling
Pancreatic cancer is a deadly disease, and new therapeutic targets are urgently needed. We previously identified DNA amplification at 7q21-q22 in pancreatic cancer cell lines. Now, by high-resolution genomic profiling of human pancreatic cancer cell lines and human tumors (engrafted in immunodeficient mice to enrich the cancer epithelial fraction), we define a 325 Kb minimal amplicon spanning SMURF1, an E3 ubiquitin ligase and known negative regulator of transforming growth factor β (TGFβ) growth inhibitory signaling. SMURF1 amplification was confirmed in primary human pancreatic cancers by fluorescence in situ hybridization (FISH), where 4 of 95 cases (4.2%) exhibited amplification. By RNA interference (RNAi), knockdown of SMURF1 in a human pancreatic cancer line with focal amplification (AsPC-1) did not alter cell growth, but led to reduced cell invasion and anchorage-independent growth. Interestingly, this effect was not mediated through altered TGFβ signaling, assayed by transcriptional reporter. Finally, overexpression of SMURF1 (but not a catalytic mutant) led to loss of contact inhibition in NIH-3T3 mouse embryo fibroblast cells. Together, these findings identify SMURF1 as an amplified oncogene driving multiple tumorigenic phenotypes in pancreatic cancer, and provide a new druggable target for molecularly directed therapy.
Prostate cancer exhibits tremendous variability in clinical behavior, ranging from indolent to lethal disease. Better prognostic markers are needed to stratify patients for appropriately aggressive therapy. By expression profiling, we can identify a proliferation signature variably expressed in prostate cancers. Here, we asked whether one or more tissue biomarkers might capture that information, and provide prognostic utility. We assayed three proliferation signature genes: MKI67 (Ki-67; also a classic proliferation biomarker), TOP2A (DNA topoisomerase II, alpha), and E2F1 (E2F transcription factor 1). Immunohistochemical staining was evaluable on 139 radical prostatectomy cases (in tissue microarray format), with a median clinical follow-up of eight years. Each of the three proliferation markers was by itself prognostic. Notably, combining the three markers together as a “proliferation index” (0 or 1 vs. 2 or 3 positive markers) provided superior prognostic performance (hazard ratio = 2.6 (95% CI: 1.4–4.9); P = 0.001). In a multivariate analysis that included preoperative serum prostate specific antigen (PSA) levels, Gleason grade and pathologic tumor stage, the composite proliferation index remained a significant predictor (P = 0.005). Analysis of receiver-operating characteristic (ROC) curves confirmed the improved prognostication afforded by incorporating the proliferation index (compared to the clinicopathologic data alone). Our findings highlight the potential value of a multi-gene signature-based diagnostic, and define a tri-marker proliferation index with possible utility for improved prognostication and treatment stratification in prostate cancer.
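As a minimal sketch of how the tri-marker proliferation index described above could be computed, the following Python function counts the positive markers per case and splits cases into the two groups the abstract compares (0 or 1 vs. 2 or 3 positive markers). The function name, the "low"/"high" labels, and the boolean input format are assumptions made for illustration; the abstract does not specify how staining was scored as positive.

```python
from typing import Mapping

# The three proliferation signature genes named in the abstract.
MARKERS = ("MKI67", "TOP2A", "E2F1")


def proliferation_index(positive: Mapping[str, bool]) -> str:
    """Group a case by its number of positive markers.

    `positive` maps each marker name to True if staining was scored
    positive. The "low"/"high" labels are hypothetical names for the
    0-or-1 and 2-or-3 groups.
    """
    n_positive = sum(bool(positive[marker]) for marker in MARKERS)
    return "high" if n_positive >= 2 else "low"


# Hypothetical case: only Ki-67 scored positive, so it falls in the 0-or-1 group.
case = {"MKI67": True, "TOP2A": False, "E2F1": False}
print(proliferation_index(case))
```

Dichotomizing at two or more positive markers mirrors the grouping reported in the abstract; any other cut point would be a different index.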
We sought to identify clinically significant genes that predict survival and the risk of colorectal liver metastasis (CLM); the liver is the most common site of metastasis from colorectal cancer (CRC).
Patients and Methods
We profiled gene expression in 31 specimens from primary CRC and 32 unmatched specimens of CLM, and performed Significance Analysis of Microarrays (SAM) to identify genes differentially expressed between these two groups. To characterize the clinical relevance of two highly-ranked differentially-expressed genes, we analyzed the expression of secreted phosphoprotein 1 (SPP1 or osteopontin) and lymphoid enhancer factor-1 (LEF1) by immunohistochemistry using a tissue microarray (TMA) representing an independent set of 154 patients with primary CRC.
Supervised analysis using SAM identified 963 genes with significantly higher expression in CLM compared to primary CRC, with a false discovery rate of <0.5%. TMA analysis showed SPP1 and LEF1 protein overexpression in 60% and 44% of CRC cases, respectively. Subsequent occurrence of CLM was significantly correlated with the overexpression of LEF1 (chi-square p = 0.042), but not SPP1 (p = 0.14). Kaplan-Meier analysis revealed significantly worse survival in patients with overexpression of LEF1 (p < 0.01), but not SPP1 (p = 0.11). Both univariate and multivariate analyses identified stage (p < 0.0001) and LEF1 overexpression (p < 0.05) as important prognostic markers, but not tumor grade or SPP1.
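For readers unfamiliar with the Kaplan-Meier analysis mentioned above, the following Python sketch shows the standard product-limit estimator. It is a generic illustration, not the authors' analysis code; the function name is hypothetical and the follow-up times in the usage example are invented toy values, not study data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each patient
    events -- True if the patient died at that time, False if censored
    Returns (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Invented toy data: four patients, the second one censored.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Comparing two such curves (for example, cases with and without LEF1 overexpression) would additionally require a test such as the log-rank test, which this sketch does not implement.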
Among genes differentially expressed between CLM and primary CRC, we demonstrate overexpression of LEF1 in primary CRC to be a prognostic factor for poor survival and increased risk for liver metastasis.
Breast cancer cell lines have been used widely to investigate breast cancer pathobiology and new therapies. Breast cancer is a molecularly heterogeneous disease, and it is important to understand how well cell lines model that diversity and which lines model it best. In particular, microarray studies have identified molecular subtypes (luminal A, luminal B, ERBB2-associated, basal-like and normal-like) with characteristic gene-expression patterns and underlying DNA copy number alterations (CNAs). Here, we studied a collection of breast cancer cell lines to catalog molecular profiles and to assess their relation to breast cancer subtypes.
Whole-genome DNA microarrays were used to profile gene expression and CNAs in a collection of 52 widely-used breast cancer cell lines, and comparisons were made to existing profiles of primary breast tumors. Hierarchical clustering was used to identify gene-expression subtypes, and Gene Set Enrichment Analysis (GSEA) to discover biological features of those subtypes. Genomic and transcriptional profiles were integrated to discover, within high-amplitude CNAs, candidate cancer genes with coordinately altered gene copy number and expression.
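One simple way to picture the integration of genomic and transcriptional profiles is sketched below in Python: keep genes that are highly amplified in at least one sample and whose expression tracks copy number across samples. This is an assumed illustration, not the procedure used in the study; the function names, the amplification threshold, the correlation cutoff, and the toy values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def candidate_genes(copy_number: Dict[str, List[float]],
                    expression: Dict[str, List[float]],
                    amp_threshold: float = 1.0,   # hypothetical log2-ratio cutoff
                    min_r: float = 0.6            # hypothetical correlation cutoff
                    ) -> List[Tuple[str, float]]:
    """Genes amplified in some sample whose expression follows copy number."""
    hits = []
    for gene, cn in copy_number.items():
        if gene not in expression or max(cn) < amp_threshold:
            continue  # no matched expression data, or never highly amplified
        r = pearson(cn, expression[gene])
        if r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: -hit[1])


# Invented toy profiles for four samples (log2 ratios), not study data.
cn = {"GENE_A": [0.0, 0.1, 2.0, 2.2], "GENE_B": [0.0, 0.1, 0.2, 0.1],
      "GENE_C": [0.0, 2.0, 0.1, 2.1]}
ex = {"GENE_A": [1.0, 1.1, 3.0, 3.2], "GENE_B": [1.0, 2.0, 3.0, 4.0],
      "GENE_C": [3.0, 1.0, 3.1, 0.9]}
print([gene for gene, _ in candidate_genes(cn, ex)])
```

In this toy example only GENE_A passes both filters: GENE_B is never amplified, and GENE_C is amplified but its expression falls rather than rises with copy number.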
Transcriptional profiling of breast cancer cell lines identified one luminal and two basal-like (A and B) subtypes. Luminal lines displayed an estrogen receptor (ER) signature and resembled luminal-A/B tumors, basal-A lines were associated with ETS-pathway and BRCA1 signatures and resembled basal-like tumors, and basal-B lines displayed mesenchymal and stem/progenitor-cell characteristics. Compared to tumors, cell lines exhibited similar patterns of CNA, but an overall higher complexity of CNA (genetically simple luminal-A tumors were not represented), and only partial conservation of subtype-specific CNAs. We identified 80 high-level DNA amplifications and 13 multi-copy deletions, and the resident genes with concomitantly altered gene-expression, highlighting known and novel candidate breast cancer genes.
Overall, breast cancer cell lines were genetically more complex than tumors, but retained expression patterns with relevance to the luminal-basal subtype distinction. The compendium of molecular profiles defines cell lines suitable for investigations of subtype-specific pathobiology, cancer stem cell biology, biomarkers and therapies, and provides a resource for discovery of new breast cancer genes.