Despite growing appreciation of the importance of long non-coding RNA (lncRNA) in normal physiology and disease, our knowledge of cancer-related lncRNA remains limited. By repurposing microarray probes, we constructed the expression profile of 10,207 lncRNA genes in approximately 1,300 tumors over four different cancer types. Through integrative analysis of the lncRNA expression profiles with clinical outcome and somatic copy number alteration (SCNA), we identified lncRNA that are associated with cancer subtypes and clinical prognosis, and predicted those that are potential drivers of cancer progression. We validated our predictions by experimentally confirming prostate cancer cell growth dependence on two novel lncRNA. Our analysis provided a resource of clinically relevant lncRNA for development of lncRNA biomarkers and identification of lncRNA therapeutic targets. It also demonstrated the power of integrating publicly available genomic datasets and clinical information for discovering disease-associated lncRNA.
Diversified histone modifications (HMs) are essential epigenetic features. They play important roles in fundamental biological processes including transcription, DNA repair and DNA replication. Chromatin regulators (CRs), which are indispensable in epigenetics, can mediate HMs to adjust chromatin structures and functions. With the development of ChIP-Seq technology, there is an opportunity to study CR and HM profiles at the whole-genome scale. However, no specific resource for the integration of CR ChIP-Seq data or CR-HM ChIP-Seq linkage pairs is currently available. Therefore, we constructed the CR Cistrome database, available online at http://compbio.tongji.edu.cn/cr and http://cistrome.org/cr/, to further elucidate CR functions and CR-HM linkages. Within this database, we collected all publicly available ChIP-Seq data on CRs in human and mouse and categorized the data into four cohorts: the reader, writer, eraser and remodeler cohorts, together with curated introductions and ChIP-Seq data analysis results. For the HM readers, writers and erasers, we provided further ChIP-Seq analysis data for the targeted HMs and schematized the relationships between them. We believe CR Cistrome is a valuable resource for the epigenetics community.
The profiling of small RNAs by high throughput sequencing (smRNA-Seq) has revealed the complexity of the RNA world. Here, we describe a computational scheme for dissecting the plant smRNAome by integrating smRNA-Seq datasets in Arabidopsis thaliana. Our analytical approach first defines ab initio the genomic loci that produce smRNAs as basic units, then utilizes principal component analysis (PCA) to predict novel miRNAs. Secondary structure prediction of the candidates’ putative precursors revealed a group of long hairpin double-stranded RNAs (lh-dsRNAs) formed by inverted duplications of decayed coding genes. These gene remnants produce miRNA-like small RNAs which are predominantly 21- and 22-nt long, dependent on DCL1 but independent of RDR2 and DCL2/3/4, and associated with AGO1. Additionally, we found two classes of transcription start site associated- (TSSa-) RNAs, located in the sense (+) and antisense (−) orientations approximately 100-200 bp downstream of TSSs, which are differentially incorporated into AGO1 and AGO4, respectively.
High-throughput sequencing; small RNAs; Principal component analysis; TSS-associated RNAs
Tissue-specific gene expression requires modulation of nucleosomes, allowing transcription factors to occupy cis elements that are accessible only in selected tissues. Master transcription factors control cell-specific genes and define cellular identities, but it is unclear if they possess special abilities to regulate cell-specific chromatin and if such abilities might underlie lineage determination and maintenance. One prevailing view is that several transcription factors enable chromatin access in combination. The homeodomain protein CDX2 specifies the embryonic intestinal epithelium, through unknown mechanisms, and partners with transcription factors such as HNF4A in the adult intestine. We examined enhancer chromatin and gene expression following Cdx2 or Hnf4a excision in mouse intestines. HNF4A loss did not affect CDX2 binding or chromatin, whereas CDX2 depletion modified chromatin significantly at CDX2-bound enhancers, disrupted HNF4A occupancy, and abrogated expression of neighboring genes. Thus, CDX2 maintains transcription-permissive chromatin, illustrating a powerful and dominant effect on enhancer configuration in an adult tissue. Similar, hierarchical control of cell-specific chromatin states is probably a general property of master transcription factors.
Nuclear receptors (NRs) comprise a superfamily of ligand-activated transcription factors that play important roles in both physiology and diseases including cancer. The technologies of Chromatin ImmunoPrecipitation followed by array hybridization (ChIP-chip) or massively parallel sequencing (ChIP-seq) have been used to map, at an unprecedented rate, the in vivo genome-wide binding (cistrome) of NRs in both normal and cancer cells. We developed a curated database of 88 NR cistrome datasets and other associated high-throughput datasets, including 121 collaborating factor cistromes, 94 epigenomes and 319 transcriptomes. Through integrative analysis of the curated NR ChIP-chip/seq datasets, we discovered novel factor-specific noncanonical motifs that may have important regulatory roles. We also revealed a common feature of NR pioneering factors to recognize relatively short and AT-rich motifs. Most NRs bind predominantly to introns and distal intergenic regions, and binding sites closer to transcription start sites (TSSs) were found to be neither stronger nor more evolutionarily conserved. Interestingly, while most NRs appear to be predominantly transcriptional activators, our analysis suggests that the binding of ESR1, RARA and RARG has both activating and repressive effects. Through meta-analysis of different omic data of the same cancer cell line model from multiple studies, we generated consensus cistrome and expression profiles. We further made probabilistic predictions of the NR target genes by integrating cistrome and transcriptome data, and validated the predictions using expression data from tumor samples. The final database, with comprehensive cistrome, epigenome, transcriptome datasets, and downstream analysis results, constitutes a valuable resource for the nuclear receptor and cancer community.
We performed a systematic evaluation of how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) experiments. Using Drosophila S2 cells, we generated ChIP-seq datasets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected and had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP library complexity at high coverage. The removal of reads originating at the same base reduced false positives while having little effect on detection sensitivity. Even at a depth of ~1 read/bp coverage of mappable genome, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle datasets with deep coverage.
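The duplicate-removal step mentioned in the abstract above (discarding reads that originate at the same base) can be illustrated with a minimal Python sketch. It assumes reads are represented as (chromosome, 5' start, strand) tuples; the function name `collapse_duplicate_reads` and the `max_per_position` parameter are hypothetical and are not taken from the study.

```python
from collections import Counter


def collapse_duplicate_reads(reads, max_per_position=1):
    """Keep at most `max_per_position` reads per (chrom, start, strand).

    `reads` is an iterable of (chrom, start, strand) tuples, where `start`
    is the 5' coordinate of the read. Input order is preserved.
    This is an illustrative filter, not the procedure used in the study.
    """
    seen = Counter()
    kept = []
    for read in reads:
        if seen[read] < max_per_position:
            kept.append(read)
        seen[read] += 1
    return kept


# Toy input: two reads start at the same base on the same strand.
reads = [
    ("chr2L", 100, "+"),
    ("chr2L", 100, "+"),
    ("chr2L", 100, "-"),
    ("chr2L", 250, "+"),
]
unique_reads = collapse_duplicate_reads(reads)
```

Treating the strand as part of the key keeps reads that share a coordinate but map to opposite strands, since those cannot be copies of the same fragment end.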
Endocrine therapies for breast cancer that target the estrogen receptor (ER) are ineffective in the 25-30% of cases that are ER negative (ER−). Androgen receptor (AR) is expressed in 60-70% of breast tumors, independent of ER status. How androgens and AR regulate breast cancer growth remains largely unknown. We find that AR is enriched in ER− breast tumors that over-express HER2. Through analysis of the AR cistrome and androgen-regulated gene expression in ER−/HER2+ breast cancers, we find that AR mediates ligand-dependent activation of Wnt and HER2 signaling pathways through direct transcriptional induction of WNT7B and HER3. Specific targeting of AR, Wnt or HER2 signaling impairs androgen-stimulated tumor cell growth, suggesting potential therapeutic approaches for ER−/HER2+ breast cancers.
Canonical Wnt signaling supports the pluripotency of embryonic stem cells (ESCs) but also promotes differentiation of early mammalian cell lineages. To explain these paradoxical observations, we explored the gene regulatory networks at play. Canonical Wnt signaling is intertwined with the pluripotency network comprising Nanog, Oct4, and Sox2 in mouse ESCs. In defined media supporting the derivation and propagation of ESCs, Tcf3 and β-catenin interact with Oct4; Tcf3 binds to the Sox motif within Oct-Sox composite motifs that are also bound by Oct4-Sox2 complexes. Further, canonical Wnt signaling up-regulates the activity of the Pou5f1 distal enhancer via the Sox motif in ESCs. When viewed in the context of published studies on Tcf3 and β-catenin mutants, our findings suggest that Tcf3 counters pluripotency by competing with Sox2 at these sites, and that Tcf3 inhibition is blocked by β-catenin entry into this complex. Wnt pathway stimulation also triggers β-catenin association at regulatory elements with classic Lef/Tcf motifs associated with differentiation programs. The failure to activate these targets in the presence of a MEK/ERK inhibitor essential for ESC culture suggests that MEK/ERK signaling and canonical Wnt signaling combine to promote ESC differentiation.
Mouse embryonic stem cells; Wnt; β-catenin; 2i; pluripotency; differentiation
MicroRNAs (miRNAs) are a class of 20–23 nucleotide small RNAs that regulate gene expression post-transcriptionally in animals and plants. Annotation of miRNAs by the miRNA database (miRBase) has largely relied on computational approaches. As a result, many miRBase entries lack experimental validation, and discrepancies between miRBase annotation and actual miRNA sequences are often observed. In this study, we integrated the small RNA sequencing (smRNA-seq) datasets in Caenorhabditis elegans and Drosophila melanogaster and devised an analytical pipeline coupled with detailed manual inspection to curate miRNA annotation systematically in miRBase. Our analysis reveals that 19 (17.0%) and 51 (31.3%) miRNA entries with detectable smRNA-seq reads have mature sequence discrepancies in C. elegans and D. melanogaster, respectively. These discrepancies frequently occur either for conserved miRNA families whose mature sequences were predicted according to their homologous counterparts in other species or for miRNAs whose precursor miRNA (pre-miRNA) hairpins produce an abundance of multiple miRNA isoforms or variants. Our analysis shows that while Drosophila pre-miRNAs, on average, produce less than 60% accurate mature miRNA reads in addition to their 5′ and 3′ variant isoforms, the precision of miRNA processing in C. elegans is much higher, at over 90%. Based on the revised miRNA sequences, we analyzed expression patterns of the more conserved (MC) and less conserved (LC) miRNAs and found that, whereas MC miRNAs are often co-expressed at multiple developmental stages, LC miRNAs tend to be expressed specifically at fewer stages.
microRNA; deep sequencing; database curation
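The processing precision quoted in the abstract above (the share of reads that match the accurate mature miRNA rather than a 5′ or 3′ variant) can be made concrete with a small Python sketch. The definition used here, exact-end reads divided by all reads from the same arm, is an assumption for illustration; the study's precise definition may differ, and the function name and toy counts are hypothetical.

```python
def processing_precision(read_counts, mature_start, mature_end):
    """Fraction of reads whose ends match the annotated mature miRNA exactly.

    `read_counts` maps (start, end) coordinates on the pre-miRNA hairpin
    to the number of reads observed with those ends. Reads with any other
    start or end are treated as 5' or 3' variant isoforms.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    exact = read_counts.get((mature_start, mature_end), 0)
    return exact / total


# Toy counts: 90 exact reads, 6 reads with a shifted 5' end, 4 with a longer 3' end.
toy_counts = {(10, 31): 90, (11, 31): 6, (10, 32): 4}
precision = processing_precision(toy_counts, mature_start=10, mature_end=31)
```

With these toy counts the precision is 90 / 100 = 0.9.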
Cancer cells induce a set of adaptive response pathways to survive in the face of stressors due to inadequate vascularization [1]. One such adaptive pathway is the unfolded protein response (UPR) or endoplasmic reticulum (ER) stress response, mediated in part by the ER-localized transmembrane sensor IRE1 [2] and its substrate XBP1 [3]. Previous studies report UPR activation in various human tumors [4-6], but XBP1's role in cancer progression in mammary epithelial cells is largely unknown. Triple negative breast cancer (TNBC), a form of breast cancer in which tumor cells do not express the genes for estrogen receptor, progesterone receptor, and Her2/neu, is a highly aggressive malignancy with limited treatment options [7, 8]. Here, we report that XBP1 is activated in TNBC and plays a pivotal role in the tumorigenicity and progression of this human breast cancer subtype. In breast cancer cell line models, depletion of XBP1 inhibited tumor growth and tumor relapse and reduced the CD44high/CD24low population. Hypoxia-inducible factor (HIF)1α is known to be hyperactivated in TNBCs [9, 10]. Genome-wide mapping of the XBP1 transcriptional regulatory network revealed that XBP1 drives TNBC tumorigenicity by assembling a transcriptional complex with HIF1α that regulates the expression of HIF1α targets via the recruitment of RNA polymerase II. Analysis of independent cohorts of patients with TNBC revealed a specific XBP1 gene expression signature that was highly correlated with HIF1α and hypoxia-driven signatures and that strongly associated with poor prognosis. Our findings reveal a key function for the XBP1 branch of the UPR in TNBC and imply that targeting this pathway may offer alternative treatment strategies for this aggressive subtype of breast cancer.
Human neurons are functional over an entire lifetime, yet the mechanisms that preserve function and protect against neurodegeneration during aging are unknown. Here we show that induction of the repressor element 1-silencing transcription/neuron-restrictive silencer factor (REST/NRSF) is a universal feature of normal aging in human cortical and hippocampal neurons. REST is lost, however, in mild cognitive impairment (MCI) and Alzheimer’s disease (AD). Chromatin immunoprecipitation with deep sequencing (ChIP-seq) and expression analysis show that REST represses genes that promote cell death and AD pathology, and induces the expression of stress response genes. Moreover, REST potently protects neurons from oxidative stress and amyloid β-protein (Aβ) toxicity, and conditional deletion of REST in the mouse brain leads to age-related neurodegeneration. A functional ortholog of REST, C. elegans SPR-4, also protects against oxidative stress and Aβ toxicity. During normal aging, REST is induced in part by cell non-autonomous Wnt signaling. However, in AD, frontotemporal dementia and dementia with Lewy bodies, REST is lost from the nucleus and appears in autophagosomes together with pathologic misfolded proteins. Finally, REST levels during aging are closely correlated with cognitive preservation and longevity. Thus, the activation state of REST may distinguish neuroprotection from neurodegeneration in the aging brain.
H3K4me2/3, H3K9ac, and H3K27ac investigated by ChIP-Seq showed enrichment in genic regions and transcription start sites, and were associated with active transcription in rice. They were used to discover unannotated genes and, together with DNase-Seq data, to predict transcription factor binding sites.
While previous studies have shown that histone modifications could influence plant growth and development by regulating gene transcription, knowledge about the relationships between these modifications and gene expression is still limited. This study used chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) to investigate the genome-wide distribution of four histone modifications: di- and trimethylation of H3K4 (H3K4me2 and H3K4me3) and acetylation of H3K9 and H3K27 (H3K9ac and H3K27ac) in Oryza sativa L. japonica. By analyzing published DNase-Seq data, this study explored DNase-Hypersensitive (DH) sites along the rice genome. The histone marks appeared mainly in genic regions and were enriched around the transcription start sites (TSSs) of genes. This analysis demonstrated that the four histone modifications and the DH sites were all associated with active transcription. Furthermore, the four histone modifications were highly concurrent with transcript regions, a promising feature that was used to predict missing genes in the rice gene annotation. The predictions were further validated by experimentally confirming the transcription of two predicted missing genes. Moreover, a sequence motif analysis of the DH sites identified many putative transcription factor binding sites.
bioinformatics; chromatin structure and remodeling; epigenetics; gene regulation; genomics; rice.
The combination of ChIP-seq and transcriptome analysis is a compelling approach to unravel the regulation of gene expression. Several recently published methods combine transcription factor (TF) binding and gene expression for target prediction, but few of them provide an efficient software package for the community. Binding and expression target analysis (BETA) is a software package that integrates ChIP-seq of TFs or chromatin regulators with differential gene expression data to infer direct target genes. BETA has three functions: (i) to predict whether the factor has activating or repressive function; (ii) to infer the factor’s target genes; and (iii) to identify the motif of the factor and its collaborators, which might modulate the factor’s activating or repressive function. Here we describe the implementation and features of BETA to demonstrate its application to several data sets. BETA requires ~1 GB of RAM, and the procedure takes 20 min to complete. BETA is available open source at http://cistrome.org/BETA/.
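To make the idea of integrating binding and differential expression concrete, here is a minimal Python sketch. It is not BETA's code or command-line interface. The distance-decay weight exp(-(0.5 + 4d/d0)) with d0 = 100 kb, the rank-product combination, and all function and gene names are assumptions chosen for illustration; none of them is stated in the abstract above.

```python
import math


def regulatory_potential(tss, peak_centers, max_distance=100_000):
    """Sum distance-weighted contributions of peaks near a gene's TSS.

    Assumed weight per peak: exp(-(0.5 + 4 * d / max_distance)), where d is
    the peak-to-TSS distance in bp. Peaks farther than `max_distance` are ignored.
    """
    score = 0.0
    for center in peak_centers:
        distance = abs(center - tss)
        if distance <= max_distance:
            score += math.exp(-(0.5 + 4.0 * distance / max_distance))
    return score


def rank_product(binding_scores, expression_scores):
    """Order genes by the product of their fractional ranks (best first).

    Both arguments map gene -> score, with higher scores meaning stronger
    evidence. Only genes present in both mappings are returned.
    """
    def fractional_ranks(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {gene: (i + 1) / len(ordered) for i, gene in enumerate(ordered)}

    b = fractional_ranks(binding_scores)
    e = fractional_ranks(expression_scores)
    shared = set(b) & set(e)
    return sorted(shared, key=lambda gene: (b[gene] * e[gene], gene))


# Hypothetical genes: TSS coordinate and nearby peak centers on one chromosome.
binding = {
    "geneA": regulatory_potential(1_000, [1_000, 51_000]),
    "geneB": regulatory_potential(500_000, [700_000]),
    "geneC": regulatory_potential(900_000, [910_000]),
}
# Hypothetical differential-expression evidence (higher = more significant).
expression = {"geneA": 3.2, "geneB": 2.9, "geneC": 0.1}
ranked = rank_product(binding, expression)
```

A gene ranks highly only when it has both nearby binding and a strong expression change, which is the intuition behind calling it a candidate direct target.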
The recent availability of high-density human genome tiling arrays enables biologists to conduct ChIP–chip experiments to locate the in vivo-binding sites of transcription factors in the human genome and explore the regulatory mechanisms. Once genomic regions enriched by transcription factor ChIP–chip are located, genome-scale downstream analyses are crucial but difficult for biologists without strong bioinformatics support. We designed and implemented the first web server to streamline the ChIP–chip downstream analyses. Given genome-scale ChIP regions, the cis-regulatory element annotation system (CEAS) retrieves repeat-masked genomic sequences, calculates GC content, plots evolutionary conservation, maps nearby genes and identifies enriched transcription factor-binding motifs. Biologists can utilize CEAS to retrieve useful information for ChIP–chip validation, assemble important knowledge to include in their publication and generate novel hypotheses (e.g. transcription factor cooperative partner) for further study. CEAS helps the adoption of ChIP–chip in mammalian systems and provides insights towards a more comprehensive understanding of transcriptional regulatory mechanisms. The URL of the server is .
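One of the downstream steps listed in the abstract above, calculating GC content for each ChIP region, can be sketched in a few lines of Python. This is an illustration only, not the CEAS implementation; the function name, the region names, and the choice to ignore masked or ambiguous bases are assumptions.

```python
def gc_content(sequence):
    """Fraction of G and C among the called bases (A, C, G, T) of a sequence.

    Case is ignored, so soft-masked (lowercase) bases are counted, while
    hard-masked or ambiguous characters such as N are excluded.
    Returns 0.0 when the sequence has no called bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else 0.0


# Hypothetical ChIP regions; the run of N marks a hard-masked repeat.
regions = {"region_1": "GGCCNNNNAATT", "region_2": "ATATATGC"}
gc_by_region = {name: gc_content(seq) for name, seq in regions.items()}
```

Here `region_1` has 4 G/C among 8 called bases (0.5) and `region_2` has 2 among 8 (0.25).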
Early full-term pregnancy is one of the most effective natural protections against breast cancer. To investigate this effect, we have characterized the global gene expression and epigenetic profiles of multiple cell types from normal breast tissue of nulliparous and parous women, and carriers of BRCA1 or BRCA2 mutations. We found significant differences in CD44+ progenitor cells, where the levels of many stem cell-related genes and pathways, including the cell cycle regulator p27, are lower in parous women without BRCA1/BRCA2 mutations. We also noted a significant reduction in the frequency of CD44+p27+ cells in parous women, and showed using explant cultures that parity-related signaling pathways play a role in regulating the number of p27+ cells and their proliferation. Our results suggest that pathways controlling p27+ mammary epithelial cells and the numbers of these cells relate to breast cancer risk, and can be explored for cancer risk assessment and prevention.
DNase-seq is a powerful technique for identifying cis-regulatory elements across the genome. We studied the key experimental parameters to optimize the performance of DNase-seq. We found that sequencing short 50-100 bp fragments that accumulate in long inter-nucleosome linker regions is more efficient for identifying transcription factor binding sites than using longer fragments. We also assessed the potential of DNase-seq to predict transcription factor occupancy through the generation of nucleotide-resolution transcription factor footprints. In modeling the sequence-specific DNaseI cutting bias, we found a surprisingly strong effect that varied over more than two orders of magnitude. This confounds DNaseI footprint analysis to the extent that the nucleotide-resolution cleavage patterns at most transcription factor binding sites are derived from intrinsic DNaseI cleavage bias rather than from specific protein-DNA interactions. In contrast, quantitative comparison of DNaseI hypersensitivity between states can predict transcription factor occupancy associated with particular biological perturbations.
DNaseI hypersensitivity; DNase-seq; DNaseI footprint; Chromatin dynamics; CTCF; Androgen receptor; Estrogen receptor; Transcription factor binding; Nucleosome
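The sequence-specific cutting bias discussed in the abstract above can be pictured as a ratio: how often a given k-mer spans an observed cut site, relative to how often that k-mer occurs in the sequence at all. The Python sketch below estimates such ratios on a toy sequence. The choice of k, the placement of the k-mer around the cut, and the function name are assumptions for illustration and do not reproduce the model used in the study.

```python
from collections import Counter


def kmer_cut_bias(genome, cut_positions, k=6):
    """Observed-over-background frequency ratio for k-mers spanning cut sites.

    `genome` is a DNA string and `cut_positions` lists 0-based indices of the
    base immediately 3' of each cut. The k-mer assigned to a cut starts
    k // 2 bases upstream of that index. Only k-mers seen at cuts are returned.
    """
    half = k // 2
    background = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    observed = Counter()
    for pos in cut_positions:
        start = pos - half
        if 0 <= start <= len(genome) - k:
            observed[genome[start:start + k]] += 1
    total_observed = sum(observed.values())
    total_background = sum(background.values())
    if total_observed == 0:
        return {}
    return {
        kmer: (observed[kmer] / total_observed) / (background[kmer] / total_background)
        for kmer in observed
    }


# Toy example with dinucleotides: three cuts fall within CG, one within TT.
toy_genome = "ACGTACGTTTTT"
bias = kmer_cut_bias(toy_genome, cut_positions=[2, 6, 6, 9], k=2)
```

A ratio well above 1 marks a sequence context that is cut more often than its abundance predicts. Footprint analyses would need to account for such contexts before attributing a cleavage pattern to protein binding.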
Transcription factor activity and turnover are functionally linked, but the global patterns by which DNA-bound regulators are eliminated remain poorly understood. We established an assay to define the chromosomal location of DNA-associated proteins that are slated for degradation by the ubiquitin-proteasome system. The genome-wide map described here ties proteolysis in mammalian cells to active enhancers and to promoters of specific gene families. Nuclear-encoded mitochondrial genes in particular correlate with protein elimination, which positively affects their transcription. We show that the nuclear receptor corepressor NCoR1 is a key target of proteolysis and physically interacts with the transcription factor CREB. Proteasome inhibition stabilizes NCoR1 in a site-specific manner and restrains mitochondrial activity by repressing CREB-sensitive genes. In conclusion, this functional map of nuclear proteolysis links chromatin architecture with local protein stability and identifies proteolytic derepression as highly dynamic in regulating the transcription of genes involved in energy metabolism.
Chromatin regulators play an important role in the development of human diseases. In this study, we focused on Plant Homeo Domain Finger protein 8 (PHF8), a chromatin regulator that has recently attracted particular attention. PHF8 is a histone lysine demethylase ubiquitously expressed in nuclei. Mutations of PHF8 are associated with X-linked mental retardation. It usually functions as a transcriptional co-activator by associating with H3K4me3 and RNA polymerase II. By analyzing several published PHF8 chromatin immunoprecipitation-sequencing (ChIP-Seq) datasets, we found that PHF8 may associate with another regulator, REST/NRSF, predominantly at promoter regions. Our analysis suggested that PHF8 not only activates but may also repress gene expression.
Summary: Chromatin immunoprecipitation and DNase I hypersensitivity assays with high-throughput sequencing have greatly accelerated the understanding of transcriptional and epigenetic regulation, although data reuse for the community of experimental biologists has been challenging. We created a data portal, CistromeFinder, that can help query, evaluate and visualize publicly available chromatin immunoprecipitation and DNase I hypersensitivity assays with high-throughput sequencing data in human and mouse. The database currently contains 6378 samples over 4391 datasets, 313 factors and 102 cell lines or cell populations. Each dataset has gone through a consistent analysis and quality control pipeline; therefore, users can evaluate the overall quality of each dataset before examining binding sites near their genes of interest. CistromeFinder is integrated with UCSC genome browser for visualization, Primer3Plus for ChIP-qPCR primer design and CistromeMap for submitting newly available datasets. It also allows users to leave comments to facilitate data evaluation and update.
Androgen receptor (AR) is a ligand-dependent transcription factor that plays a key role in prostate cancer. Little is known about the nature of AR cis-regulatory sites in the human genome. We have mapped the AR binding regions on two chromosomes in human prostate cancer cells by combining chromatin immunoprecipitation (ChIP) with tiled oligonucleotide microarrays. We find that the majority of AR binding regions contain noncanonical AR-responsive elements (AREs). Importantly, we identify a noncanonical ARE as a cis-regulatory target of AR action in TMPRSS2, a gene fused to ETS transcription factors in the majority of prostate cancers. In addition, through the presence of enriched DNA-binding motifs, we find other transcription factors including GATA2 and Oct1 that cooperate in mediating the androgen response. These collaborating factors, together with AR, form a regulatory hierarchy that governs androgen-dependent gene expression and prostate cancer growth and offer potential new opportunities for therapeutic intervention.
The transcription factor SOX2 is an essential regulator of pluripotent stem cells and promotes development and maintenance of squamous epithelia. We previously reported that SOX2 is an oncogene and subject to highly recurrent genomic amplification in squamous cell carcinomas (SCCs). Here, we have further characterized the function of SOX2 in SCC. Using ChIP-seq analysis, we compared SOX2-regulated gene profiles in multiple SCC cell lines to ES cell profiles and determined that SOX2 binds to distinct genomic loci in SCCs. In SCCs, SOX2 preferentially interacts with the transcription factor p63, as opposed to the transcription factor OCT4, which is the preferred SOX2 binding partner in ES cells. SOX2 and p63 exhibited overlapping genomic occupancy at a large number of loci in SCCs; however, coordinate binding of SOX2 and p63 was absent in ES cells. We further demonstrated that SOX2 and p63 jointly regulate gene expression, including the oncogene ETV4, which was essential for SOX2-amplified SCC cell survival. Together, these findings demonstrate that the action of SOX2 in SCC differs substantially from its role in pluripotency. The identification of the SCC-associated interaction between SOX2 and p63 will enable deeper characterization of the downstream targets of this interaction in SCC and normal squamous epithelial physiology.
The Ten-Eleven Translocation (Tet) family of dioxygenases offers a new mechanism for dynamic regulation of DNA methylation and has been implicated in cell lineage differentiation and oncogenesis. Yet their functional roles and mechanisms of action in gene regulation and embryonic development are largely unknown. Here, we report that Xenopus Tet3 plays an essential role in early eye and neural development by directly regulating a set of key developmental genes. Tet3 is an active 5mC hydroxylase regulating the 5mC/5hmC status at target gene promoters. Biochemical and structural studies further reveal a novel DNA binding mode of the Tet3 CXXC domain that is critical for specific Tet3 targeting. Finally, we show that the enzymatic activity and CXXC domain are crucial for Tet3’s biological function. Together, these findings define Tet3 as a novel transcription factor and reveal a molecular mechanism by which the 5mC hydroxylase and DNA binding activities of Tet3 cooperate to control target gene expression and embryonic development.
If trait-associated variants alter regulatory regions, then they should fall within chromatin marks in relevant cell types. However, it is unclear which of the many marks are most useful in defining cell types associated with disease and fine mapping variants. We hypothesized that informative marks are phenotypically cell type specific; that is, SNPs associated with the same trait likely overlap marks in the same cell type. We examined 15 chromatin marks and found that those highlighting active gene regulation were phenotypically cell type specific. Trimethylation of histone H3 at lysine 4 (H3K4me3) was the most phenotypically cell type specific (P < 1 × 10^−6), driven by colocalization of variants and marks rather than gene proximity (P < 0.001). H3K4me3 peaks overlapped with 37 SNPs for plasma low-density lipoprotein concentration in the liver (P < 7 × 10^−5), 31 SNPs for rheumatoid arthritis within CD4+ regulatory T cells (P = 1 × 10^−4), 67 SNPs for type 2 diabetes in pancreatic islet cells (P = 0.003) and the liver (P = 0.003), and 14 SNPs for neuropsychiatric disease in neuronal tissues (P = 0.007). We show how cell type–specific H3K4me3 peaks can inform the fine mapping of associated SNPs to identify causal variation.
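The basic quantity behind the overlap counts in the abstract above is the number of trait-associated SNPs that fall inside a mark's peaks in each cell type. The Python sketch below counts such overlaps for one chromosome using a binary search over sorted peak starts. It illustrates the counting step only and does not reproduce the study's significance testing; the function name, cell-type labels, and coordinates are hypothetical.

```python
from bisect import bisect_right


def count_snps_in_peaks(snp_positions, peaks):
    """Count SNPs that fall inside any peak on a single chromosome.

    `peaks` is a list of non-overlapping (start, end) intervals, half-open
    so that `end` itself is outside the peak. `snp_positions` is an
    iterable of integer coordinates in the same coordinate system.
    """
    ordered = sorted(peaks)
    starts = [start for start, _ in ordered]
    hits = 0
    for pos in snp_positions:
        # Index of the last peak whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ordered[i][1]:
            hits += 1
    return hits


# Hypothetical H3K4me3 peaks per cell type and SNPs for one trait.
peaks_by_cell_type = {
    "liver": [(100, 200), (500, 650)],
    "t_cell": [(300, 350)],
}
trait_snps = [150, 199, 200, 510, 900]
overlap_counts = {
    cell_type: count_snps_in_peaks(trait_snps, peaks)
    for cell_type, peaks in peaks_by_cell_type.items()
}
```

In this toy example the liver peaks contain three of the five SNPs and the T-cell peaks contain none, the kind of contrast that would point to liver as the more relevant cell type for the trait.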