Despite growing appreciation of the importance of long non-coding RNA (lncRNA) in normal physiology and disease, our knowledge of cancer-related lncRNA remains limited. By repurposing microarray probes, we constructed the expression profile of 10,207 lncRNA genes in approximately 1,300 tumors across four different cancer types. Through integrative analysis of the lncRNA expression profiles with clinical outcome and somatic copy number alteration (SCNA), we identified lncRNA that are associated with cancer subtypes and clinical prognosis, and predicted those that are potential drivers of cancer progression. We validated our predictions by experimentally confirming prostate cancer cell growth dependence on two novel lncRNA. Our analysis provided a resource of clinically relevant lncRNA for the development of lncRNA biomarkers and the identification of lncRNA therapeutic targets. It also demonstrated the power of integrating publicly available genomic datasets and clinical information for discovering disease-associated lncRNA.
Diversified histone modifications (HMs) are essential epigenetic features. They play important roles in fundamental biological processes including transcription, DNA repair and DNA replication. Chromatin regulators (CRs), which are indispensable in epigenetics, can mediate HMs to adjust chromatin structures and functions. With the development of ChIP-Seq technology, there is an opportunity to study CR and HM profiles at the whole-genome scale. However, no specific resource for the integration of CR ChIP-Seq data or CR-HM ChIP-Seq linkage pairs is currently available. Therefore, we constructed the CR Cistrome database, available online at http://compbio.tongji.edu.cn/cr and http://cistrome.org/cr/, to further elucidate CR functions and CR-HM linkages. Within this database, we collected all publicly available ChIP-Seq data on CRs in human and mouse and categorized the data into four cohorts: the reader, writer, eraser and remodeler cohorts, together with curated introductions and ChIP-Seq data analysis results. For the HM readers, writers and erasers, we provided further ChIP-Seq analysis data for the targeted HMs and schematized the relationships between them. We believe CR Cistrome is a valuable resource for the epigenetics community.
Chromatin regulators play an important role in the development of human diseases. In this study, we focused on Plant Homeo Domain Finger protein 8 (PHF8), a chromatin regulator that has recently attracted particular attention. PHF8 is a histone lysine demethylase ubiquitously expressed in nuclei. Mutations of PHF8 are associated with X-linked mental retardation. It usually functions as a transcriptional co-activator by associating with H3K4me3 and RNA polymerase II. By analyzing several published PHF8 chromatin immunoprecipitation-sequencing (ChIP-Seq) datasets, we found that PHF8 may associate with another regulator, REST/NRSF, predominantly at promoter regions. Our analysis suggested that PHF8 not only activates but may also repress gene expression.
The profiling of small RNAs by high-throughput sequencing (smRNA-Seq) has revealed the complexity of the RNA world. Here, we describe a computational scheme for dissecting the plant smRNAome by integrating smRNA-Seq datasets in Arabidopsis thaliana. Our analytical approach first defines ab initio the genomic loci that produce smRNAs as basic units, then utilizes principal component analysis (PCA) to predict novel miRNAs. Secondary structure prediction of the candidates’ putative precursors revealed a group of long hairpin double-stranded RNAs (lh-dsRNAs) formed by inverted duplications of decayed coding genes. These gene remnants produce miRNA-like small RNAs that are predominantly 21 and 22 nt long, dependent on DCL1 but independent of RDR2 and DCL2/3/4, and associated with AGO1. Additionally, we found two classes of transcription start site-associated (TSSa) RNAs, located in sense (+) and antisense (−) orientation approximately 100–200 bp downstream of TSSs, that are differentially incorporated into AGO1 and AGO4, respectively.
High-throughput sequencing; small RNAs; Principal component analysis; TSS-associated RNAs
Tissue-specific gene expression requires modulation of nucleosomes, allowing transcription factors to occupy cis elements that are accessible only in selected tissues. Master transcription factors control cell-specific genes and define cellular identities, but it is unclear if they possess special abilities to regulate cell-specific chromatin and if such abilities might underlie lineage determination and maintenance. One prevailing view is that several transcription factors enable chromatin access in combination. The homeodomain protein CDX2 specifies the embryonic intestinal epithelium, through unknown mechanisms, and partners with transcription factors such as HNF4A in the adult intestine. We examined enhancer chromatin and gene expression following Cdx2 or Hnf4a excision in mouse intestines. HNF4A loss did not affect CDX2 binding or chromatin, whereas CDX2 depletion modified chromatin significantly at CDX2-bound enhancers, disrupted HNF4A occupancy, and abrogated expression of neighboring genes. Thus, CDX2 maintains transcription-permissive chromatin, illustrating a powerful and dominant effect on enhancer configuration in an adult tissue. Similar, hierarchical control of cell-specific chromatin states is probably a general property of master transcription factors.
Summary: Chromatin immunoprecipitation and DNase I hypersensitivity assays with high-throughput sequencing have greatly accelerated the understanding of transcriptional and epigenetic regulation, although data reuse by the community of experimental biologists has been challenging. We created a data portal, CistromeFinder, that can help query, evaluate and visualize publicly available chromatin immunoprecipitation and DNase I hypersensitivity assays with high-throughput sequencing data in human and mouse. The database currently contains 6378 samples over 4391 datasets, 313 factors and 102 cell lines or cell populations. Each dataset has gone through a consistent analysis and quality control pipeline; therefore, users can evaluate the overall quality of each dataset before examining binding sites near their genes of interest. CistromeFinder is integrated with the UCSC genome browser for visualization, Primer3Plus for ChIP-qPCR primer design and CistromeMap for submitting newly available datasets. It also allows users to leave comments to facilitate data evaluation and update.
Nuclear receptors (NRs) comprise a superfamily of ligand-activated transcription factors that play important roles in both physiology and diseases including cancer. Chromatin immunoprecipitation followed by array hybridization (ChIP-chip) or massively parallel sequencing (ChIP-seq) has been used to map, at an unprecedented rate, the in vivo genome-wide binding (cistrome) of NRs in both normal and cancer cells. We developed a curated database of 88 NR cistrome datasets and other associated high-throughput datasets, including 121 collaborating factor cistromes, 94 epigenomes and 319 transcriptomes. Through integrative analysis of the curated NR ChIP-chip/seq datasets, we discovered novel factor-specific noncanonical motifs that may have important regulatory roles. We also revealed a common feature of NR pioneering factors: they recognize relatively short and AT-rich motifs. Most NRs bind predominantly to introns and distal intergenic regions, and binding sites closer to transcription start sites (TSSs) were found to be neither stronger nor more evolutionarily conserved. Interestingly, while most NRs appear to be predominantly transcriptional activators, our analysis suggests that the binding of ESR1, RARA and RARG has both activating and repressive effects. Through meta-analysis of different omic data of the same cancer cell line model from multiple studies, we generated consensus cistrome and expression profiles. We further made probabilistic predictions of the NR target genes by integrating cistrome and transcriptome data, and validated the predictions using expression data from tumor samples. The final database, with comprehensive cistrome, epigenome and transcriptome datasets and downstream analysis results, constitutes a valuable resource for the nuclear receptor and cancer community.
We performed a systematic evaluation of how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments. Using Drosophila S2 cells, we generated ChIP-seq datasets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected and had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP library complexity at high coverage. The removal of reads originating at the same base reduced false positives while having little effect on detection sensitivity. Even at a depth of ~1 read/bp coverage of the mappable genome, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle datasets with deep coverage.
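The duplicate filter mentioned above, removing reads that originate at the same base, can be sketched roughly as follows. This is a minimal illustration and not the study's pipeline: the `Read` record, its field names, and the `max_per_position` parameter are hypothetical names introduced here, and reads are assumed to be already aligned with a known 5′ start coordinate.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record (not a real library type)."""
    chrom: str
    start: int   # 5' end coordinate of the read on the reference
    strand: str  # "+" or "-"


def remove_same_start_reads(reads: Iterable[Read],
                            max_per_position: int = 1) -> List[Read]:
    """Keep at most `max_per_position` reads per (chrom, start, strand)."""
    seen = Counter()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if seen[key] < max_per_position:
            kept.append(read)
        seen[key] += 1
    return kept


# Made-up example reads, for illustration only.
reads = [
    Read("chr2L", 100, "+"),
    Read("chr2L", 100, "+"),  # same start and strand: treated as a duplicate
    Read("chr2L", 100, "-"),  # same start, opposite strand: kept
    Read("chr2L", 150, "+"),
]
filtered = remove_same_start_reads(reads)
```

Keying on strand as well as position is a design choice of this sketch; a stricter or looser definition of "same base" would change which reads survive.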
Androgen receptor (AR) is a ligand-dependent transcription factor that plays a key role in prostate cancer. Little is known about the nature of AR cis-regulatory sites in the human genome. We have mapped the AR binding regions on two chromosomes in human prostate cancer cells by combining chromatin immunoprecipitation (ChIP) with tiled oligonucleotide microarrays. We find that the majority of AR binding regions contain noncanonical AR-responsive elements (AREs). Importantly, we identify a noncanonical ARE as a cis-regulatory target of AR action in TMPRSS2, a gene fused to ETS transcription factors in the majority of prostate cancers. In addition, through the presence of enriched DNA-binding motifs, we find other transcription factors including GATA2 and Oct1 that cooperate in mediating the androgen response. These collaborating factors, together with AR, form a regulatory hierarchy that governs androgen-dependent gene expression and prostate cancer growth and offer potential new opportunities for therapeutic intervention.
The transcription factor SOX2 is an essential regulator of pluripotent stem cells and promotes development and maintenance of squamous epithelia. We previously reported that SOX2 is an oncogene and subject to highly recurrent genomic amplification in squamous cell carcinomas (SCCs). Here, we have further characterized the function of SOX2 in SCC. Using ChIP-seq analysis, we compared SOX2-regulated gene profiles in multiple SCC cell lines to ES cell profiles and determined that SOX2 binds to distinct genomic loci in SCCs. In SCCs, SOX2 preferentially interacts with the transcription factor p63, as opposed to the transcription factor OCT4, which is the preferred SOX2 binding partner in ES cells. SOX2 and p63 exhibited overlapping genomic occupancy at a large number of loci in SCCs; however, coordinate binding of SOX2 and p63 was absent in ES cells. We further demonstrated that SOX2 and p63 jointly regulate gene expression, including the oncogene ETV4, which was essential for SOX2-amplified SCC cell survival. Together, these findings demonstrate that the action of SOX2 in SCC differs substantially from its role in pluripotency. The identification of the SCC-associated interaction between SOX2 and p63 will enable deeper characterization of the downstream targets of this interaction in SCC and normal squamous epithelial physiology.
Endocrine therapies for breast cancer that target the estrogen receptor (ER) are ineffective in the 25-30% of cases that are ER negative (ER−). Androgen receptor (AR) is expressed in 60-70% of breast tumors, independent of ER status. How androgens and AR regulate breast cancer growth remains largely unknown. We find that AR is enriched in ER− breast tumors that over-express HER2. Through analysis of the AR cistrome and androgen-regulated gene expression in ER−/HER2+ breast cancers, we find that AR mediates ligand-dependent activation of Wnt and HER2 signaling pathways through direct transcriptional induction of WNT7B and HER3. Specific targeting of AR, Wnt or HER2 signaling impairs androgen-stimulated tumor cell growth, suggesting potential therapeutic approaches for ER−/HER2+ breast cancers.
The Ten-Eleven Translocation (Tet) family of dioxygenases offers a new mechanism for dynamic regulation of DNA methylation and has been implicated in cell lineage differentiation and oncogenesis. Yet the functional roles of these enzymes and their mechanisms of action in gene regulation and embryonic development are largely unknown. Here, we report that Xenopus Tet3 plays an essential role in early eye and neural development by directly regulating a set of key developmental genes. Tet3 is an active 5mC hydroxylase regulating the 5mC/5hmC status at target gene promoters. Biochemical and structural studies further reveal a novel DNA binding mode of the Tet3 CXXC domain that is critical for specific Tet3 targeting. Finally, we show that the enzymatic activity and CXXC domain are crucial for Tet3’s biological function. Together, these findings define Tet3 as a novel transcription factor and reveal a molecular mechanism by which the 5mC hydroxylase and DNA binding activities of Tet3 cooperate to control target gene expression and embryonic development.
MicroRNAs (miRNAs) are a class of 20–23 nucleotide small RNAs that regulate gene expression post-transcriptionally in animals and plants. Annotation of miRNAs by the miRNA database (miRBase) has largely relied on computational approaches. As a result, many miRBase entries lack experimental validation, and discrepancies between miRBase annotation and actual miRNA sequences are often observed. In this study, we integrated the small RNA sequencing (smRNA-seq) datasets in Caenorhabditis elegans and Drosophila melanogaster and devised an analytical pipeline coupled with detailed manual inspection to curate miRNA annotation systematically in miRBase. Our analysis reveals that 19 (17.0%) and 51 (31.3%) of the miRNA entries with detectable smRNA-seq reads have mature sequence discrepancies in C. elegans and D. melanogaster, respectively. These discrepancies frequently occur either for conserved miRNA families whose mature sequences were predicted according to their homologous counterparts in other species or for miRNAs whose precursor miRNA (pre-miRNA) hairpins produce an abundance of multiple miRNA isoforms or variants. Our analysis shows that while Drosophila pre-miRNAs, on average, produce less than 60% accurate mature miRNA reads in addition to their 5′ and 3′ variant isoforms, the precision of miRNA processing in C. elegans is much higher, at over 90%. Based on the revised miRNA sequences, we analyzed expression patterns of the more conserved (MC) and less conserved (LC) miRNAs and found that, whereas MC miRNAs are often co-expressed at multiple developmental stages, LC miRNAs tend to be expressed specifically at fewer stages.
microRNA; deep sequencing; database curation
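One way to make the notion of miRNA processing precision in the preceding abstract concrete is sketched below. The definition used here is an assumption for illustration, not necessarily the one used in the study: precision is taken as the fraction of reads mapped to a hairpin arm whose start and end coordinates both match the annotated mature miRNA exactly. The function name, the coordinate convention, and the read counts are all hypothetical.

```python
from typing import Dict, Tuple

Coords = Tuple[int, int]  # (start, end) of a read on the pre-miRNA hairpin


def processing_precision(read_counts: Dict[Coords, int],
                         mature: Coords) -> float:
    """Fraction of reads whose (start, end) equals the annotated mature miRNA.

    Assumed definition: reads with a shifted 5' or 3' end count as
    variant isoforms and lower the precision.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    return read_counts.get(mature, 0) / total


# Made-up counts for one hairpin arm, for illustration only.
counts = {
    (10, 31): 90,  # exact match to the annotated mature sequence
    (11, 31): 6,   # 5' variant isoform
    (10, 32): 4,   # 3' variant isoform
}
precision = processing_precision(counts, mature=(10, 31))
```

With these invented counts, 90 of 100 reads match the annotation exactly, so the sketch reports a precision of 0.9.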
If trait-associated variants alter regulatory regions, then they should fall within chromatin marks in relevant cell types. However, it is unclear which of the many marks are most useful in defining cell types associated with disease and fine mapping variants. We hypothesized that informative marks are phenotypically cell type specific; that is, SNPs associated with the same trait likely overlap marks in the same cell type. We examined 15 chromatin marks and found that those highlighting active gene regulation were phenotypically cell type specific. Trimethylation of histone H3 at lysine 4 (H3K4me3) was the most phenotypically cell type specific (P < 1 × 10⁻⁶), driven by colocalization of variants and marks rather than gene proximity (P < 0.001). H3K4me3 peaks overlapped with 37 SNPs for plasma low-density lipoprotein concentration in the liver (P < 7 × 10⁻⁵), 31 SNPs for rheumatoid arthritis within CD4+ regulatory T cells (P = 1 × 10⁻⁴), 67 SNPs for type 2 diabetes in pancreatic islet cells (P = 0.003) and the liver (P = 0.003), and 14 SNPs for neuropsychiatric disease in neuronal tissues (P = 0.007). We show how cell type–specific H3K4me3 peaks can inform the fine mapping of associated SNPs to identify causal variation.
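The basic operation behind the overlap counts reported above, checking whether a SNP falls inside a chromatin-mark peak, can be sketched as follows. This is a simplified illustration under stated assumptions: peaks are half-open intervals that do not overlap one another on a chromosome, all function names and coordinates are hypothetical, and the significance testing that produced the reported P values is not reproduced here.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]               # half-open [start, end)
PeakIndex = Dict[str, Tuple[List[int], List[int]]]


def build_index(peaks: Dict[str, List[Interval]]) -> PeakIndex:
    """Sort peaks per chromosome; assumes peaks on a chromosome do not overlap."""
    index: PeakIndex = {}
    for chrom, intervals in peaks.items():
        ordered = sorted(intervals)
        index[chrom] = ([s for s, _ in ordered], [e for _, e in ordered])
    return index


def snp_in_peak(index: PeakIndex, chrom: str, pos: int) -> bool:
    """True if position `pos` on `chrom` lies inside some peak."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1    # last peak starting at or before pos
    return i >= 0 and pos < ends[i]


def count_overlaps(index: PeakIndex, snps: Iterable[Tuple[str, int]]) -> int:
    """Number of SNPs that fall inside any peak."""
    return sum(snp_in_peak(index, chrom, pos) for chrom, pos in snps)


# Made-up peaks and SNP positions, for illustration only.
peaks = {"chr1": [(500, 650), (100, 200)], "chr2": [(50, 80)]}
snps = [("chr1", 150), ("chr1", 200), ("chr1", 649), ("chr2", 10), ("chr3", 5)]
n_overlapping = count_overlaps(build_index(peaks), snps)
```

In this toy input, only the SNPs at chr1:150 and chr1:649 fall inside a peak; chr1:200 sits on the excluded end of a half-open interval.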
The epithelial-to-mesenchymal transition is an important mechanism in cancer metastasis. Although transcription factors including SNAIL, SLUG, and TWIST1 regulate the epithelial-to-mesenchymal transition, other unknown transcription factors could also be involved. Identification of the full complement of transcription factors is essential for a more complete understanding of gene regulation in this process. Chromatin immunoprecipitation-sequencing (ChIP-Seq) technologies have been used to detect genome-wide binding of transcription factors; here, we developed a systematic approach to integrate existing ChIP-Seq and transcriptome data. We scanned multiple transcription factors to investigate their functional impact on the epithelial-to-mesenchymal transition in the human A549 lung adenocarcinoma cell line.
Among the transcription factors tested, impact scores identified the forkhead box protein A1 (FOXA1) as the most significant transcription factor in the epithelial-to-mesenchymal transition. FOXA1 physically associates with the promoters of its predicted target genes. Several critical epithelial-to-mesenchymal transition effectors involved in cellular adhesion and cellular communication were identified in the regulatory network of FOXA1, including FOXA2, FGA, FGB, FGG, and FGL1. The implication of FOXA1 in the epithelial-to-mesenchymal transition via its regulatory network indicates that FOXA1 may play an important role in the initiation of lung cancer metastasis.
We identified FOXA1 as a potentially important transcription factor and negative regulator in the initial stages of lung cancer metastasis. FOXA1 may modulate the epithelial-to-mesenchymal transition via its transcriptional regulatory network. Further, this study demonstrates how ChIP-Seq and expression data could be integrated to delineate the impact of transcription factors on a specific biological process.
The epithelial-to-mesenchymal transition; Lung cancer; ChIP-Seq; FOXA1
The recent availability of high-density human genome tiling arrays enables biologists to conduct ChIP–chip experiments to locate the in vivo-binding sites of transcription factors in the human genome and explore the regulatory mechanisms. Once genomic regions enriched by transcription factor ChIP–chip are located, genome-scale downstream analyses are crucial but difficult for biologists without strong bioinformatics support. We designed and implemented the first web server to streamline the ChIP–chip downstream analyses. Given genome-scale ChIP regions, the cis-regulatory element annotation system (CEAS) retrieves repeat-masked genomic sequences, calculates GC content, plots evolutionary conservation, maps nearby genes and identifies enriched transcription factor-binding motifs. Biologists can utilize CEAS to retrieve useful information for ChIP–chip validation, assemble important knowledge to include in their publication and generate novel hypotheses (e.g. transcription factor cooperative partner) for further study. CEAS helps the adoption of ChIP–chip in mammalian systems and provides insights towards a more comprehensive understanding of transcriptional regulatory mechanisms. The URL of the server is .
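Of the CEAS steps listed above, the GC content calculation is the simplest to illustrate. The sketch below is an assumption about how such a calculation could be done on repeat-masked sequence, not the server's actual code: it counts soft-masked (lowercase) bases and excludes hard-masked `N` positions from the denominator. The function name is hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases; 0.0 if there are none.

    Lowercase (soft-masked) bases are counted; N and other ambiguity
    codes are ignored. Both choices are assumptions of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


# Made-up sequence with a hard-masked stretch and a soft-masked stretch.
example = gc_content("ACGTNNNNacgt")
```

Whether masked bases should contribute depends on the downstream question; excluding `N` avoids diluting the GC fraction in heavily masked regions.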
Summary: Transcription and chromatin regulators, as well as histone modifications, play essential roles in gene expression regulation. We have created CistromeMap as a web server to provide a comprehensive knowledgebase of all of the publicly available ChIP-Seq and DNase-Seq data in mouse and human. We have also manually curated metadata to ensure annotation consistency, and developed a user-friendly display matrix for quick navigation and retrieval of data for specific factors, cells and papers. Finally, we provide users with summary statistics of ChIP-Seq and DNase-Seq studies.
Availability: Freely available on the web at http://cistrome.dfci.harvard.edu/pc/
Epigenetic regulators represent a promising new class of therapeutic targets for cancer. Enhancer of zeste homolog 2 (EZH2), a subunit of Polycomb repressive complex 2 (PRC2), silences gene expression via its histone methyltransferase activity. Here we report that the oncogenic function of EZH2 in castration-resistant prostate cancer (CRPC) is independent of its role as a transcriptional repressor. Instead, it involves the ability of EZH2 to act as a co-activator for critical transcription factors including the androgen receptor (AR). This functional switch is dependent on phosphorylation of EZH2, and requires an intact methyltransferase domain. Hence, targeting the non-PRC2 function of EZH2 may have significant therapeutic efficacy for treating metastatic, hormone-refractory prostate cancer.
Genome-wide ChIP-chip assays of protein–DNA interactions yield large volumes of data requiring effective statistical analysis to obtain reliable results. Successful analysis methods need to be tailored to platform-specific characteristics such as probe density, genome coverage, and the nature of the controls. We describe the use of the software packages MAT and MA2C for the analysis of ChIP-chip data from one-color Affymetrix and two-color NimbleGen or Agilent tiling microarrays, respectively.
ChIP-chip; probe modeling; normalization; peak detection
Fusion of the androgen receptor-regulated (AR-regulated) TMPRSS2 gene with ERG in prostate cancer (PCa) causes androgen-stimulated overexpression of ERG, an ETS transcription factor, but critical downstream effectors of ERG that mediate PCa development remain to be established. Expression of the SOX9 transcription factor correlated with TMPRSS2:ERG fusion in 3 independent PCa cohorts, and ERG-dependent expression of SOX9 was confirmed by RNAi in the fusion-positive VCaP cell line. SOX9 has been shown to mediate ductal morphogenesis in fetal prostate and maintain stem/progenitor cell pools in multiple adult tissues, and has also been linked to PCa and other cancers. SOX9 overexpression resulted in neoplasia in murine prostate and stimulated tumor invasion, similarly to ERG. Moreover, SOX9 depletion in VCaP cells markedly impaired invasion and growth in vitro and in vivo, establishing SOX9 as a critical downstream effector of ERG. Finally, we found that ERG regulated SOX9 indirectly by opening a cryptic AR-regulated enhancer in the SOX9 gene. Together, these results demonstrate that ERG redirects AR to a set of genes including SOX9 that are not normally androgen stimulated, and identify SOX9 as a critical downstream effector of ERG in TMPRSS2:ERG fusion–positive PCa.
As we come to the end of 2011, Genome Biology has asked some members of our Editorial Board for their views on the state of play in genomics. What was their favorite paper of 2011? What are the challenges in their particular research area? Who has had the biggest influence on their careers? What advice would they give to young researchers embarking on a career in research?
Androgen receptor (AR) is reactivated in castration-resistant prostate cancer (CRPC) through mechanisms including marked increases in AR gene expression. We identify an enhancer in the AR second intron contributing to increased AR expression at low androgen levels in CRPC. Moreover, at increased androgen levels, the AR binds this site and represses AR gene expression through recruitment of lysine-specific demethylase 1 (LSD1) and H3K4me1,2 demethylation. AR similarly represses expression of multiple genes mediating androgen synthesis, DNA synthesis and proliferation, while stimulating genes mediating lipid and protein biosynthesis. Androgen levels in CRPC appear adequate to stimulate AR activity on enhancer elements, but not suppressor elements, resulting in increased expression of AR and AR-repressed genes that contribute to cellular proliferation.
prostate cancer; androgen receptor; androgen deprivation therapy; H3K4 methylation; LSD1