The androgen receptor (AR) plays a central role in establishing an oncogenic cascade that drives prostate cancer progression. Some prostate cancers escape androgen dependence and are often associated with an aggressive phenotype. The oestrogen receptor alpha (ERα) is expressed in prostate cancers, independent of AR status. However, the role of ERα remains elusive. Using a combination of chromatin immunoprecipitation (ChIP) and RNA-sequencing data, we identified an ERα-specific non-coding transcriptome signature. Among putatively ERα-regulated intergenic long non-coding RNAs (lncRNAs), we identified nuclear enriched abundant transcript 1 (NEAT1) as the most significantly overexpressed lncRNA in prostate cancer. Analysis of two large clinical cohorts also revealed that NEAT1 expression is associated with prostate cancer progression. Prostate cancer cells expressing high levels of NEAT1 were recalcitrant to androgen or AR antagonists. Finally, we provide evidence that NEAT1 drives oncogenic growth by altering the epigenetic landscape of target gene promoters to favour transcription.
While prostate cancer predominantly exhibits androgen dependence, oestrogen receptor (ER) signalling is also involved. Here, Chakravarty et al. show that ERα regulates the expression of the NEAT1 long non-coding RNA, which in turn promotes tumorigenesis by maintaining an oncogenic programme/cascade.
Glomus tumors (GT) have been classified among tumors of perivascular smooth muscle differentiation, together with myopericytoma, myofibroma/tosis, and angioleiomyoma, based on their morphologic overlap. However, no molecular studies have been carried out to date to investigate their genetic phenotype and to confirm their shared pathogenesis. RNA sequencing was performed in three index cases (GT1, malignant GT; GT2, benign GT and M1, multifocal myopericytoma), followed by FusionSeq data analysis, a modular computational tool developed to discover gene fusions from paired-end RNA-seq data. A gene fusion involving MIR143 in band 5q32 was identified in both GTs with either NOTCH2 in 1p13 in GT1 or NOTCH1 in 9q34 in GT2, but none in M1. After being validated by FISH and RT-PCR, these abnormalities were screened on 33 GTs, 6 myopericytomas, 9 myofibroma/toses, 18 angioleiomyomas and in a control group of 5 sino-nasal hemangiopericytomas. Overall NOTCH2 gene rearrangements were identified in 52% of GT, including all malignant cases and one NF1-related GT. No additional cases showed NOTCH1 rearrangement. As NOTCH3 shares similar functions with NOTCH2 in regulating vascular smooth muscle development, the study group was also investigated for abnormalities in this gene by FISH. Indeed, NOTCH3 rearrangements were identified in 9% of GTs, all present in benign soft tissue GT, one case being fused to MIR143. Only 1/18 angioleiomyomas showed NOTCH2 gene rearrangement, while all the myopericytomas and myofibroma/toses were negative. In summary we describe novel NOTCH1-3 rearrangements in benign and malignant, visceral and soft tissue GTs.
NOTCH2; NOTCH3; NOTCH1; miR143; glomus tumor
Defining the chronology of molecular alterations may identify milestones in carcinogenesis. To unravel the temporal evolution of aberrations from clinical tumors, we developed CLONET, which upon estimation of tumor admixture and ploidy infers the clonal hierarchy of genomic aberrations. Comparative analysis across 100 sequenced genomes from prostate, melanoma, and lung cancers established diverse evolutionary hierarchies, demonstrating the early disruption of tumor-specific pathways. The analyses highlight the diversity of clonal evolution within and across tumor types that might be informative for risk stratification and patient selection for targeted therapies. CLONET addresses heterogeneous clinical samples seen in the setting of precision medicine.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0439-6) contains supplementary material, which is available to authorized users.
Conventional epithelioid hemangioendotheliomas (EHE) have a distinctive morphologic appearance and are characterized by a recurrent t(1;3) translocation, resulting in a WWTR1-CAMTA1 fusion gene. We have recently encountered a fusion-negative subset characterized by a somewhat different morphology, including focally well-formed vasoformative features, which was further investigated for recurrent genetic abnormalities. Based on a case showing strong TFE3 immunoreactivity, FISH analysis for TFE3 gene rearrangement was applied to the index case as well as to 9 additional cases, selected through negative WWTR1-CAMTA1 screening. A control group, including 18 epithelioid hemangiomas, 9 pseudomyogenic HE and 3 epithelioid angiosarcomas, was also tested. TFE3 gene rearrangement was identified in 10 patients, with equal gender distribution and a mean age of 30 years. The lesions were located in somatic soft tissue in 6 cases, lung in 3, and bone in 1. One case with available frozen tissue was tested by RNA sequencing and FusionSeq data analysis to detect novel fusions. A YAP1-TFE3 fusion was thus detected, which was further validated by FISH and RT-PCR. YAP1 gene rearrangements were then confirmed in 7 of the remaining 9 TFE3-rearranged EHEs by FISH. No TFE3 structural abnormalities were detected in any of the controls. The TFE3-rearranged EHEs showed similar morphologic features, with at least focally well-formed vascular channels in addition to a variably solid architecture. All tumors expressed endothelial markers, as well as strong nuclear TFE3. In summary, we report a novel subset of EHE occurring in young adults, showing a distinct phenotype and YAP1-TFE3 fusions.
TFE3; YAP1; epithelioid hemangioendothelioma; WWTR1
PHF1 gene rearrangements have been recently described in around 50% of ossifying fibromyxoid tumors (OFMT) including benign and malignant cases, with a small subset showing EP400-PHF1 fusions. In the remaining cases no alternative gene fusions have been identified. PHF1-negative OFMTs, especially if lacking S100 protein staining or peripheral ossification, are difficult to diagnose and distinguish from other soft tissue mimics. In seeking more comprehensive molecular characterization, we investigated a large cohort of 39 OFMT of various anatomic sites, immunoprofiles and grades of malignancy. Tumors were screened for PHF1 and EP400 rearrangements by FISH. RNA sequencing was performed in two index cases (OFMT1, OFMT3), negative for EP400-PHF1 fusions, followed by FusionSeq data analysis, a modular computational tool developed to discover gene fusions from paired-end RNA-seq data. Two novel fusions were identified: ZC3H7B-BCOR in OFMT1 and MEAF6-PHF1 in OFMT3. After being validated by FISH and RT-PCR, these abnormalities were screened on the remaining cases. With these additional gene fusions, 33/39 (85%) of OFMTs demonstrated recurrent gene rearrangements, which can be used as molecular markers in challenging cases. The most common abnormality is PHF1 gene rearrangement (80%), being present in benign, atypical and malignant lesions, with fusion to EP400 in 44% of cases. ZC3H7B-BCOR and MEAF6-PHF1 fusions occurred predominantly in S100 protein-negative and malignant OFMT. As similar gene fusions were reported in endometrial stromal sarcomas, we screened for potential gene abnormalities in JAZF1 and EPC1 by FISH and found two additional cases with EPC1-PHF1 fusions.
Ossifying fibromyxoid tumor; PHF1; EP400; BCOR; MEAF6
Spindle cell rhabdomyosarcoma (RMS) is a rare form of RMS whose clinical characteristics and behavior differ between children and adults. Its genetic hallmark remains unknown, and it remains debatable whether there is a pathogenetic relationship between the spindle cell and the so-called sclerosing RMS. We studied two pediatric and one adult spindle cell RMS by next generation RNA sequencing and used FusionSeq for data analysis to detect novel fusions. An SRF-NCOA2 gene fusion was detected in a spindle cell RMS from the posterior neck in a 7 month-old child. The fusion matched the tumor karyotype and was further confirmed by fluorescence in situ hybridization (FISH) and by RT-PCR, which showed fusion of SRF exon 6 to NCOA2 exon 12. An additional 14 spindle cell (from 8 children and 6 adults) and 4 sclerosing (from 2 children and 2 adults) RMS were tested by FISH for the presence of abnormalities in NCOA2 and SRF, as well as in PAX3 and NCOA1, identifying NCOA2 rearrangements in two additional spindle cell RMS from a 3 month-old and a 4 week-old child, both arising in the chest wall. In the latter tumor, TEAD1 was identified by rapid amplification of cDNA ends (RACE) to be the NCOA2 gene fusion partner. None of the adult tumors were positive for NCOA2 rearrangement. Despite similar histomorphology in adults and young children, these results suggest that spindle cell RMS is a heterogeneous disease genetically as well as clinically. Our findings also support a relationship between NCOA2-rearranged spindle cell RMS occurring in young childhood and the so-called congenital RMS, which often displays rearrangements at the 8q13 locus (NCOA2).
rhabdomyosarcoma; spindle cell; NCOA2; SRF; TEAD1; translocation; infantile
The analysis of exonic DNA from prostate cancers has identified recurrently mutated genes, but the spectrum of genome-wide alterations has not been profiled extensively in this disease. We sequenced the genomes of 57 prostate tumors and matched normal tissues to characterize somatic alterations and to study how they accumulate during oncogenesis and progression. By modeling the genesis of genomic rearrangements, we identified abundant DNA translocations and deletions that arise in a highly interdependent manner. This phenomenon, which we term “chromoplexy”, frequently accounts for the dysregulation of prostate cancer genes and appears to disrupt multiple cancer genes coordinately. Our modeling suggests that chromoplexy may induce considerable genomic derangement over relatively few events in prostate cancer and other neoplasms, supporting a model of punctuated cancer evolution. By characterizing the clonal hierarchy of genomic lesions in prostate tumors, we charted a path of oncogenic events along which chromoplexy may drive prostate carcinogenesis.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
Recurrent mutations in the Speckle-Type POZ Protein (SPOP) gene occur in up to 15% of prostate cancers. However, the frequency and features of cancers with these mutations across different populations are unknown.
To investigate SPOP mutations across diverse cohorts and validate a series of assays employing high-resolution melting (HRM) analysis and Sanger sequencing for mutational analysis of formalin-fixed paraffin-embedded material.
Design, Setting, and Participants
720 prostate cancer samples from six international cohorts spanning Caucasian, African American, and Asian patients, including both prostate-specific antigen-screened and unscreened populations, were screened for their SPOP mutation status. Status of SPOP was correlated to molecular features (ERG rearrangement, PTEN deletion, and CHD1 deletion) as well as clinical and pathologic features.
Results and Limitations
The overall frequency of SPOP mutations was 8.1% (4.6% to 14.4%). SPOP mutation was inversely associated with ERG rearrangement (P < .01), and SPOP mutant (SPOPmut) cancers had higher rates of CHD1 deletions (P < .01). There were no significant differences in biochemical recurrence in SPOPmut cancers. Limitations of this study include missing mutational data due to sample quality and lack of power to identify a difference in clinical outcomes.
SPOP is mutated in 4.6% to 14.4% of patients with prostate cancer across different ethnic and demographic backgrounds. There was no significant association of SPOP mutations with ethnicity or with clinical or pathologic parameters. Mutual exclusivity of SPOP mutation with ERG rearrangement, as well as a high association with CHD1 deletion, reinforces SPOP mutation as defining a distinct molecular subclass of prostate cancer.
TFE3 translocation renal cell carcinoma (tRCC) is defined by chromosomal translocations involving the TFE3 transcription factor at chromosome Xp11.2. Genetically proven TFE3 tRCCs have a broad histologic spectrum with features that overlap with other renal tumor subtypes. In this study, we aimed to characterize RCC with TFE3 protein expression. Using next-generation whole transcriptome sequencing (RNA-Seq) as a discovery tool, we analyzed fusion transcripts, gene expression profile, and somatic mutations in frozen tissue of one TFE3 tRCC. By applying a computational analysis developed to call chimeric RNA molecules from paired-end RNA-Seq data, we confirmed the known TFE3 translocation. Its fusion partner SFPQ has already been described as a fusion partner in tRCCs. In addition, an RNA read-through chimera between TMED6 and COG8 as well as MET and KDR (VEGFR2) point mutations were identified. An EGFR mutation, but no chromosomal rearrangements, was identified in a control group of five clear cell RCCs (ccRCCs). The TFE3 tRCC could be clearly distinguished from the ccRCCs by RNA-Seq gene expression measurements using a previously reported tRCC gene signature. In validation experiments using reverse transcription-PCR, TMED6-COG8 chimera expression was significantly higher in nine TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in 24 ccRCCs (P < .001) and 22 papillary RCCs (P < .05–.07). Immunohistochemical analysis of selected genes from the tRCC gene signature showed significantly higher eukaryotic translation elongation factor 1 alpha 2 (EEF1A2) and Contactin 3 (CNTN3) expression in 16 TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in over 200 ccRCCs (P < .0001, both).
Summary: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment.
Availability and Implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
Supplementary data are available at Bioinformatics online.
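The transcript-level point made in the VAT summary above (one variant can affect alternatively spliced transcripts differently) can be illustrated with a minimal sketch. This is not VAT's code or interface: the `Transcript` class, the `annotate` function, the coordinates, and the three effect labels are all hypothetical, and intervals are assumed to be half-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model: a name plus half-open exon intervals."""
    name: str
    exons: tuple  # ((start, end), ...) with start inclusive, end exclusive


def annotate(pos, transcripts):
    """Classify one variant position separately against every transcript."""
    effects = {}
    for t in transcripts:
        tx_start = min(start for start, _ in t.exons)
        tx_end = max(end for _, end in t.exons)
        if not (tx_start <= pos < tx_end):
            effects[t.name] = "outside"
        elif any(start <= pos < end for start, end in t.exons):
            effects[t.name] = "exonic"
        else:
            effects[t.name] = "intronic"
    return effects


# Two invented isoforms of one gene; tx2 skips the middle exon.
isoforms = [
    Transcript("tx1", ((100, 200), (300, 400), (500, 600))),
    Transcript("tx2", ((100, 200), (500, 600))),
]

# The same position is exonic in tx1 but intronic in tx2.
result = annotate(350, isoforms)
print(result)
```

A gene-level intersection would report this position simply as hitting the gene, whereas the per-transcript view keeps the two outcomes separate.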
Although prostate cancer (PCa) is the second leading cause of cancer death among men worldwide, not all men diagnosed with PCa will die from the disease. A critical challenge, therefore, is to distinguish indolent PCa from more advanced forms to guide appropriate treatment decisions. We used Enhanced Reduced Representation Bisulfite Sequencing, a genome-wide high-coverage single-base resolution DNA methylation method to profile seven localized PCa samples, seven matched benign prostate tissues, and six aggressive castration-resistant prostate cancer (CRPC) samples. We integrated these data with RNA-seq and whole-genome DNA-seq data to comprehensively characterize the PCa methylome, detect changes associated with disease progression, and identify novel candidate prognostic biomarkers. Our analyses revealed the correlation of cytosine guanine dinucleotide island (CGI)-specific hypermethylation with disease severity and association of certain breakpoints (deletion, tandem duplications, and interchromosomal translocations) with DNA methylation. Furthermore, integrative analysis of methylation and single-nucleotide polymorphisms (SNPs) uncovered widespread allele-specific methylation (ASM) for the first time in PCa. We found that most DNA methylation changes occurred in the context of ASM, suggesting that variations in tumor epigenetic landscape of individuals are partly mediated by genetic differences, which may affect PCa disease progression. We further selected a panel of 13 CGIs demonstrating increased DNA methylation with disease progression and validated this panel in an independent cohort of 20 benign prostate tissues, 16 PCa, and 8 aggressive CRPCs. These results warrant clinical evaluation in larger cohorts to help distinguish indolent PCa from advanced disease.
Neuroendocrine prostate cancer (NEPC) is an aggressive subtype of prostate cancer that most commonly evolves from preexisting prostate adenocarcinoma (PCA). Using Next Generation RNA-sequencing and oligonucleotide arrays, we profiled 7 NEPC, 30 PCA, and 5 benign prostate tissue (BEN), and validated findings on tumors from a large cohort of patients (37 NEPC, 169 PCA, 22 BEN) using IHC and FISH. We discovered significant overexpression and gene amplification of AURKA and MYCN in 40% of NEPC and 5% of PCA, respectively, and evidence that they cooperate to induce a neuroendocrine phenotype in prostate cells. There was dramatic and enhanced sensitivity of NEPC (and MYCN overexpressing PCA) to Aurora kinase inhibitor therapy both in vitro and in vivo, with complete suppression of neuroendocrine marker expression following treatment. We propose that alterations in Aurora kinase A and N-myc are involved in the development of NEPC, and future clinical trials will help determine the efficacy of Aurora kinase inhibitor therapy.
neuroendocrine prostate cancer; aurora kinase A; n-myc; drug targets
Advances in sequencing technology have led to a sharp decrease in the cost of 'data generation'. But is this sufficient to ensure cost-effective and efficient 'knowledge generation'?
Bioinformatics; costs of sequencing; data analysis; experimental design; next-generation sequencing; sample collection
The microbial conversion of solid cellulosic biomass to liquid biofuels may provide a renewable energy source for transportation fuels. Endophytes represent a promising group of organisms, as they are a mostly untapped reservoir of metabolic diversity. They are often able to degrade cellulose, and they can produce an extraordinary diversity of metabolites. The filamentous fungal endophyte Ascocoryne sarcoides was shown to produce potential-biofuel metabolites when grown on a cellulose-based medium; however, the genetic pathways needed for this production are unknown and the lack of genetic tools makes traditional reverse genetics difficult. We present the genomic characterization of A. sarcoides and use transcriptomic and metabolomic data to describe the genes involved in cellulose degradation and to provide hypotheses for the biofuel production pathways. In total, almost 80 biosynthetic clusters were identified, including several previously found only in plants. Additionally, many transcriptionally active regions outside of genes showed condition-specific expression, offering more evidence for the role of long non-coding RNA in gene regulation. This is one of the highest quality fungal genomes and, to our knowledge, the only thoroughly annotated and transcriptionally profiled fungal endophyte genome currently available. The analyses and datasets contribute to the study of cellulose degradation and biofuel production and provide the genomic foundation for the study of a model endophyte system.
A renewable source of energy is a pressing global need. The biological conversion of lignocellulose to biofuels by microorganisms presents a promising avenue, but few organisms have been studied thoroughly enough to develop the genetic tools necessary for rigorous experimentation. The filamentous-fungal endophyte A. sarcoides produces metabolites when grown on a cellulose-based medium that include eight-carbon volatile organic compounds, which are potential biofuel targets. Here we use broadly applicable methods including genomics, transcriptomics, and metabolomics to explore the biofuel production of A. sarcoides. These data were used to assemble the genome into 16 scaffolds, to thoroughly annotate the cellulose-degradation machinery, and to make predictions for the production pathway for the eight-carbon volatiles. Extremely high expression of the gene swollenin when grown on cellulose highlights the importance of accessory proteins in addition to the enzymes that catalyze the breakdown of the polymers. Correlation of the production of the eight-carbon biofuel-like metabolites with the expression of lipoxygenase pathway genes suggests the catabolism of linoleic acid as the mechanism of eight-carbon compound production. This is the first fungal genome to be sequenced in the family Helotiaceae, and A. sarcoides was isolated as an endophyte, making this work also potentially useful in fungal systematics and the study of plant–fungus relationships.
With the recent advances in high-throughput RNA sequencing (RNA-Seq), biologists are able to measure transcription with unprecedented precision. One problem that can now be tackled is that of isoform quantification: here one tries to reconstruct the abundances of isoforms of a gene. We have developed a statistical solution for this problem, based on analyzing a set of RNA-Seq reads, and a practical implementation, available from archive.gersteinlab.org/proj/rnaseq/IQSeq, in a tool we call IQSeq (Isoform Quantification in next-generation Sequencing). Here, we present theoretical results which IQSeq is based on, and then use both simulated and real datasets to illustrate various applications of the tool. In order to measure the accuracy of an isoform-quantification result, one would try to estimate the average variance of the estimated isoform abundances for each gene (based on resampling the RNA-seq reads), and IQSeq has a particularly fast algorithm (based on the Fisher Information Matrix) for calculating this, achieving a speedup of times compared to brute-force resampling. IQSeq also calculates an information theoretic measure of overall transcriptome complexity to describe isoform abundance for a whole experiment. IQSeq has many features that are particularly useful in RNA-Seq experimental design, allowing one to optimally model the integration of different sequencing technologies in a cost-effective way. In particular, the IQSeq formalism integrates the analysis of different sample (i.e. read) sets generated from different technologies within the same statistical framework. It also supports a generalized statistical partial-sample-generation function to model the sequencing process. This allows one to have a modular, “plugin-able” read-generation function to support the particularities of the many evolving sequencing technologies.
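To make the isoform-quantification problem concrete, the following is a minimal sketch of a generic expectation-maximization estimate of relative isoform abundances. It is not IQSeq's algorithm or code: it assumes all isoforms have the same effective length, that each read is summarized only by the set of isoforms it is compatible with, and it does not attempt the Fisher-information variance calculation. The function name and the read counts are invented for illustration.

```python
def em_isoform_abundance(read_classes, n_isoforms, n_iter=200):
    """Estimate relative isoform abundances by a simple EM iteration.

    read_classes: list of (compatible_isoform_indices, read_count) pairs.
    Assumes equal effective isoform lengths (a simplification).
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    total = sum(count for _, count in read_classes)
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        # E-step: split each read class among its compatible isoforms
        # in proportion to the current abundance estimates.
        for compatible, count in read_classes:
            weight = sum(theta[i] for i in compatible)
            for i in compatible:
                expected[i] += count * theta[i] / weight
        # M-step: renormalize expected read counts into abundances.
        theta = [e / total for e in expected]
    return theta


# Invented toy data for a two-isoform gene: 60 reads fall in shared exons,
# 30 reads are unique to isoform 0 and 10 are unique to isoform 1.
read_classes = [((0, 1), 60), ((0,), 30), ((1,), 10)]
theta = em_isoform_abundance(read_classes, n_isoforms=2)
print(theta)
```

Under these assumptions the shared reads carry no information about the ratio, so the estimate is driven by the uniquely assigned reads (30 versus 10).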
Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can “slice” and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches—for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. 
However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums.
Prostate cancer is the second most common cause of male cancer deaths in the United States. Here we present the complete sequence of seven primary prostate cancers and their paired normal counterparts. Several tumors contained complex chains of balanced rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumors lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumors contained rearrangements that disrupted CADM2, and four harbored events disrupting either PTEN (unbalanced events), a prostate tumor suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies to engage prostate tumorigenic mechanisms.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
The majority of prostate cancers harbor gene fusions of the 5′-untranslated region of the androgen-regulated transmembrane protease, serine 2 (TMPRSS2) promoter with erythroblast transformation specific (ETS) transcription factor family members. The common fusion between TMPRSS2 and v-ets erythroblastosis virus E26 oncogene homolog [avian] (ERG), TMPRSS2–ERG, is associated with a more aggressive clinical phenotype, implying the existence of a distinct subclass of prostate cancer defined by this fusion.
We used cDNA-mediated annealing, selection, ligation, and extension to determine the expression profiles of 6144 transcriptionally informative genes in archived biopsy samples from 455 prostate cancer patients in the Swedish Watchful Waiting cohort (1987–1999) and the US-based Physicians Health Study cohort (1983–2003). A gene expression signature for prostate cancers with the TMPRSS2-ERG fusion was determined using partitioning and classification models and used in computational functional analysis. Cell proliferation and TMPRSS2-ERG expression in androgen receptor–negative (NCI-H660) and –positive (VCaP-ERβ) prostate cancer cells after treatment with vehicle or estrogenic compounds were assessed by viability assays and quantitative polymerase chain reaction, respectively. All statistical tests were two-sided.
We identified an 87-gene expression signature that distinguishes TMPRSS2-ERG fusion prostate cancer as a discrete molecular entity (area under the curve = 0.80, 95% confidence interval [CI] = 0.792 to 0.81; P<.001). Computational analysis suggested that this fusion signature was associated with estrogen receptor (ER) signaling. Viability of NCI-H660 cells decreased after treatment with estrogen (viability normalized to day 0, estrogen vs vehicle at day 8, mean = 2.04 vs 3.40, difference = 1.36, 95% CI = 1.12 to 1.62) or ERβ agonist (ERβ agonist vs vehicle at day 8, mean = 1.86 vs 3.40, difference = 1.54, 95% CI = 1.39 to 1.69) but increased after ERα agonist treatment (ERα agonist vs vehicle at day 8, mean = 4.36 vs 3.40, difference = 0.96, 95% CI = 0.68 to 1.23). Similarly, expression of TMPRSS2-ERG decreased after ERβ agonist treatment (fold change over internal control, ERβ agonist vs vehicle at 24 hours, NCI H660, mean = 0.57-fold vs 1.0-fold, difference = 0.43, 95% CI = 0.29-fold to 0.57-fold) and increased after ERα agonist treatment (ERα agonist vs vehicle at 24 hours, mean = 5.63-fold vs 1.0-fold, difference = 4.63-fold, 95% CI = 4.34-fold to 4.92-fold).
TMPRSS2-ERG fusion prostate cancer is a distinct molecular subclass. TMPRSS2-ERG expression is regulated by a novel ER-dependent mechanism.
Summary: The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses.
Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
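Calculating gene expression values needs only the genomic coordinates of each alignment, not the read sequence, which is why a coordinate-only summary can still support this kind of analysis. The following is a minimal sketch under stated assumptions: the `Alignment` and `Gene` records are hypothetical stand-ins rather than the actual MRF layout, and RPKM (reads per kilobase per million mapped reads) is used as one common expression measure, not necessarily the one RSEQtools reports.

```python
from collections import namedtuple

# Hypothetical coordinate-only records; no read sequence is stored.
Alignment = namedtuple("Alignment", "chrom start end")
Gene = namedtuple("Gene", "name chrom start end")


def overlaps(aln, gene):
    """True if the alignment shares at least one base with the gene (1-based, inclusive)."""
    return aln.chrom == gene.chrom and aln.start <= gene.end and aln.end >= gene.start


def rpkm(alignments, gene):
    """Reads per kilobase of gene length per million mapped reads."""
    n_reads = sum(1 for a in alignments if overlaps(a, gene))
    length_kb = (gene.end - gene.start + 1) / 1000
    millions_mapped = len(alignments) / 1e6
    return n_reads / (length_kb * millions_mapped)


# Illustrative values only.
alignments = [
    Alignment("chr1", 1100, 1135),
    Alignment("chr1", 1990, 2025),
    Alignment("chr1", 5000, 5035),
    Alignment("chr2", 1100, 1135),
]
gene = Gene("geneA", "chr1", 1001, 2000)
value = rpkm(alignments, gene)
```

Because the calculation starts from stored coordinates, it can be rerun without access to the original reads or to the aligner that produced them.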
We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing. FusionSeq includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set, including eight cancers with and without known rearrangements.
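The core idea of finding fusion candidates from paired-end data can be sketched as counting read pairs whose two ends map to different genes and discarding candidates with too little support. This is a simplified illustration, not FusionSeq's implementation: the function `fusion_candidates`, the `min_support` threshold, and the input pairs are assumptions, and the real tool applies further filters for misalignment and random pairing along with additional ranking statistics.

```python
from collections import Counter


def fusion_candidates(read_pairs, min_support=2):
    """Count read pairs whose ends map to two different genes.

    read_pairs: iterable of (gene_of_end1, gene_of_end2); None marks an
    end that maps to no annotated gene. Returns (gene_pair, support)
    tuples sorted by decreasing support.
    """
    support = Counter()
    for gene_a, gene_b in read_pairs:
        if gene_a is None or gene_b is None or gene_a == gene_b:
            continue  # unannotated end, or both ends in the same gene
        support[tuple(sorted((gene_a, gene_b)))] += 1
    kept = [(pair, n) for pair, n in support.items() if n >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical paired-end assignments (illustrative only).
pairs = [
    ("TMPRSS2", "ERG"),
    ("ERG", "TMPRSS2"),
    ("TMPRSS2", "ERG"),
    ("TMPRSS2", "TMPRSS2"),
    ("GENE_X", "GENE_Y"),
    ("ERG", None),
]
candidates = fusion_candidates(pairs)
```

A support threshold alone would not remove systematic artifacts such as pairs between highly similar paralogous genes, which is where dedicated misalignment filters become necessary.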
Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs.
Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center.
Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.
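Exploring sensitivity as a function of sequencing depth can be approximated by subsampling reads and asking what fraction of a reference gene set is still detected at each depth. The sketch below is one way to set this up and is not the procedure used in the comparison: the function name, the `min_reads` detection rule, and the toy read list are all assumptions.

```python
import random


def sensitivity_by_depth(read_genes, reference_genes, depths, min_reads=1, seed=0):
    """Fraction of reference genes detected at each subsampled depth.

    read_genes: one gene label per mapped read.
    Reads are shuffled once and each depth uses a prefix of that order,
    so the subsamples are nested and sensitivity cannot decrease.
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)
    results = {}
    for depth in depths:
        counts = {}
        for gene in shuffled[:depth]:
            counts[gene] = counts.get(gene, 0) + 1
        detected = sum(1 for g in reference_genes if counts.get(g, 0) >= min_reads)
        results[depth] = detected / len(reference_genes)
    return results


# Toy data: one highly, one moderately, and one lowly expressed gene.
reads = ["g1"] * 50 + ["g2"] * 10 + ["g3"] * 2
curve = sensitivity_by_depth(reads, {"g1", "g2", "g3"}, depths=[5, 20, 62])
```

The depth at which such a curve reaches the sensitivity of the array replicates gives a read-count estimate comparable in spirit to the figure of about 4 million reads reported above.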
Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered 3 genomic events associated with ERG-rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in non-rearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pinpoint breakpoints leading to fusion events. This study provides further support for defining a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements.
ETS genes; prostate cancer; gain; loss
Current prostate cancer prognostic models are based on pre-treatment prostate-specific antigen (PSA) levels, biopsy Gleason score, and clinical staging but in practice are inadequate to accurately predict disease progression. Hence, we sought to develop a molecular panel for prostate cancer progression, reasoning that molecular profiles might further improve current clinical models.
We analyzed a Swedish Watchful Waiting cohort with up to 30 years of clinical follow-up using a novel method for gene expression profiling. This cDNA-mediated annealing, selection, ligation, and extension (DASL) method enabled the use of formalin-fixed paraffin-embedded transurethral resection of prostate (TURP) samples taken at the time of the initial diagnosis. We determined the expression profiles of 6100 genes for 281 men divided into two extreme groups: men who died of prostate cancer and men who survived more than 10 years without metastases (lethals and indolents, respectively). Several statistical and machine learning models using clinical and molecular features were evaluated for their ability to distinguish lethal from indolent cases.
Surprisingly, none of the predictive models using molecular profiles significantly improved over models using clinical variables only. Additional computational analysis confirmed that molecular heterogeneity within both the lethal and indolent classes is widespread in prostate cancer as compared to other types of tumors.
Determination of the molecularly dominant tumor nodule may be limited by sampling at the time of initial diagnosis; the dominant nodule may not be present at the time of initial diagnosis or may arise as the disease progresses, making the development of molecular biomarkers for prostate cancer progression challenging.