The JAK2 tyrosine kinase is a critical mediator of cytokine-induced signaling. It plays a role in the nucleus, where it regulates transcription by phosphorylating histone H3 at tyrosine 41 (H3Y41ph). We used chromatin immunoprecipitation coupled to massively parallel DNA sequencing (ChIP-seq) to define the genome-wide pattern of H3Y41ph in human erythroid leukemia cells. Our results indicate that H3Y41ph is located at three distinct sites: (1) at a subset of active promoters, where it overlaps with H3K4me3, (2) at distal cis-regulatory elements, where it coincides with the binding of STAT5, and (3) throughout the transcribed regions of active, tissue-specific hematopoietic genes. Together, these data extend our understanding of this conserved and essential signaling pathway and provide insight into the mechanisms by which extracellular stimuli may lead to the coordinated regulation of transcription.
► Histone H3Y41 phosphorylation is associated with actively transcribed genes ► H3Y41ph correlates with H3K4me3 at the TSS of a subset of active genes ► H3Y41ph and STAT5 binding are coincident at some JAK2/STAT5 target genes ► H3Y41ph blankets the entire transcribed region of active tissue-specific genes
JAK2 tyrosine kinase, a critical mediator of cytokine-induced signaling, plays a role in the nucleus, where it regulates transcription by phosphorylating histone H3 at tyrosine 41 (H3Y41ph). Using ChIP-seq, Göttgens, Kouzarides, and colleagues now show that H3Y41ph marks specific sets of genes stimulated by this signaling pathway and that it blankets lineage-specific hematopoietic genes. Notably, at certain genes and enhancers, H3Y41ph coincides with STAT5 binding. These data provide insight into the mechanisms by which extracellular stimuli may lead to the coordinated regulation of transcription.
The pregnane X receptor (PXR) is a key regulator of xenobiotic metabolism and disposition in liver. However, little is known about the PXR DNA-binding signatures in vivo, or how PXR regulates novel direct targets on a genome-wide scale. Therefore, we generated a roadmap of hepatic PXR binding sites across the entire mouse genome [chromatin immunoprecipitation (ChIP)-Seq]. The most frequent PXR DNA-binding motif is the AGTTCA-like direct repeat with a 4 bp spacer [direct repeat (DR)-4]. Surprisingly, there are also high motif occurrences with spacers of a periodicity of 5 bp, forming a novel DR-(5n + 4) pattern for PXR binding. PXR binding overlaps with the epigenetic mark for gene activation (histone-H3K4-di-methylation), but not with epigenetic marks for gene suppression (DNA methylation or histone-H3K27-tri-methylation) (ChIP-on-chip). After administering a PXR agonist, changes in mRNA of most PXR-direct target genes correlate with increased PXR binding. Specifically, increased PXR binding triggers the trans-activation of critical drug-metabolizing enzymes and transporters. The mRNA induction of these genes is absent in PXR-null mice. The current work provides the first in vivo evidence of PXR DNA-binding signatures in the mouse genome, paving the path for predicting and further understanding the multifaceted roles of PXR in liver.
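The DR-4 and DR-(5n + 4) spacing described above can be illustrated with a small scan. This is a minimal sketch under stated assumptions, not the authors' motif analysis: it uses an exact match to the AGTTCA half-site (the abstract says "AGTTCA-like", so a real analysis would more likely use a position weight matrix), it scans the forward strand only, and the function name `find_direct_repeats` is hypothetical.

```python
import re

# Consensus half-site named in the abstract; exact matching is a simplification.
HALF_SITE = "AGTTCA"


def find_direct_repeats(seq, max_n=2):
    """Return (start, spacer_length) for half-site pairs separated by 5n + 4 bp.

    n = 0 gives the classic DR-4; n = 1 and n = 2 give DR-9 and DR-14.
    Only the forward strand is scanned.
    """
    seq = seq.upper()
    allowed = sorted(5 * n + 4 for n in range(max_n + 1))
    # A lookahead is used so that overlapping occurrences are also reported.
    starts = [m.start() for m in re.finditer("(?=" + HALF_SITE + ")", seq)]
    start_set = set(starts)
    hits = []
    for s in starts:
        for spacer in allowed:
            # The second half-site must begin right after the spacer.
            if s + len(HALF_SITE) + spacer in start_set:
                hits.append((s, spacer))
    return hits
```

Calling `find_direct_repeats("AGTTCANNNNAGTTCA")` reports one DR-4 at position 0; raising `max_n` widens the search to longer spacers in the same 5 bp series.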
Motivation: Antibody-based chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a relatively new method for studying the binding patterns of specific proteins over the entire genome. ChIP-seq allows scientists to obtain more comprehensive results in a shorter time. Here, we present a non-linear normalization algorithm and a mixture modeling method for comparing ChIP-seq data from multiple samples and characterizing genes based on their RNA polymerase II (Pol II) binding patterns.
Results: We apply a two-step non-linear normalization method based on a locally weighted regression (LOESS) approach to compare ChIP-seq data across multiple samples, and we model the difference using an Exponential-NormalK mixture model. The fitted model is used to identify genes associated with differential binding sites based on the local false discovery rate (fdr). These genes are then standardized and hierarchically clustered to characterize their Pol II binding patterns. As a case study, we apply the analysis procedure to compare a normal breast cancer cell line (MCF7) with a tamoxifen-resistant cell line (OHT). We find enriched regions that are associated with cancer (P < 0.0001). Our findings also imply that there may be a dysregulation of cell cycle and gene expression control pathways in the tamoxifen-resistant cells. These results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples.
Availability: Data are available at http://www.bmi.osu.edu/~khuang/Data/ChIP/RNAPII/
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Mammalian genomes encode numerous cis-natural antisense transcripts (cis-NATs). The extent to which these cis-NATs are actively regulated and ultimately functionally relevant, as opposed to transcriptional noise, remains a matter of debate. To address this issue, we analyzed the chromatin environment and RNA Pol II binding properties of human cis-NAT promoters genome-wide. Cap analysis of gene expression data were used to identify thousands of cis-NAT promoters, and profiles of nine histone modifications and RNA Pol II binding for these promoters in ENCODE cell types were analyzed using chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Active cis-NAT promoters are enriched with activating histone modifications and occupied by RNA Pol II, whereas weak cis-NAT promoters are depleted for both activating modifications and RNA Pol II. The enrichment levels of activating histone modifications and RNA Pol II binding show peaks centered around cis-NAT transcriptional start sites, and the levels of activating histone modifications at cis-NAT promoters are positively correlated with cis-NAT expression levels. Cis-NAT promoters also show highly tissue-specific patterns of expression. These results suggest that human cis-NATs are actively transcribed by RNA Pol II and that their expression is epigenetically regulated, both prerequisites for a functional potential for many of these non-coding RNAs.
Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster.
Both technologies produce highly reproducible profiles within each platform. ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more peaks and narrower peaks. The sets of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from differences in both experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high-quality input DNA data for normalization in ChIP-seq analysis.
Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
Identification of diffuse signals from chromatin immunoprecipitation and high-throughput massively parallel sequencing (ChIP-Seq) poses significant computational challenges, and there are few methods currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone H3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER, which was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 out of a total of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched with biological functions related to metabolic processes under resting conditions but with developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and expected secondary structures by prediction. Last, they are enriched with motifs of transcription factors such as PU.1 and AP-1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, GCLS based on RNA polymerase II and H3K4Me3 ChIP-Seq can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in the study.
High-throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation.
Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2–3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR validated NRSF sites and the presence of NRSF binding motifs for setting thresholds.
The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/.
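The binomial p-value/q-value idea mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the USeq implementation: it assumes that, under the null hypothesis, each read in a window comes from the ChIP library with probability equal to the ChIP share of all mapped reads, and it converts p-values to q-values with the standard Benjamini-Hochberg procedure. All function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_p_value(chip, control, chip_total, control_total):
    """One-sided p-value that a window holds more ChIP reads than expected.

    Assumption: under the null, each read in the window comes from the ChIP
    library with probability chip_total / (chip_total + control_total).
    """
    p_null = chip_total / (chip_total + control_total)
    return binomial_upper_tail(chip, chip + control, p_null)


def benjamini_hochberg(p_values):
    """Convert p-values to q-values with the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A window with 5 ChIP and 5 control reads from equally deep libraries gets a p-value of about 0.62, that is, no evidence of enrichment. The exact sum is fine for small windows; very deep data would call for a numerically safer tail computation.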
An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.
Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites.
Electronic Supplementary Material
The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.
Transcription factor binding sites; DNase-seq; ChIP-seq; FAIRE-seq; Next-generation sequencing; Motif
Transcription is a sophisticated multi-step process in which RNA polymerase II (Pol II) transcribes a DNA template into RNA in concert with a broad array of transcription initiation, elongation, capping, termination, and histone modifying factors. Recent global analyses of Pol II distribution have indicated that many genes are regulated during the elongation phase, shedding light on a previously underappreciated mechanism for controlling gene expression. Understanding how various factors regulate transcription elongation in living cells has been greatly aided by chromatin immunoprecipitation (ChIP) studies, which can provide spatial and temporal resolution of protein-DNA binding events. The coupling of ChIP with DNA microarray and high-throughput sequencing technologies (ChIP-chip and ChIP-seq) has significantly increased the scope of ChIP studies and genome-wide maps of Pol II or elongation factor binding sites can now be readily produced. However, while ChIP-chip/ChIP-seq data allow for high-resolution localization of protein-DNA binding sites, they are not sufficient to dissect protein function. Here we describe techniques for coupling ChIP-chip/ChIP-seq with genetic, chemical, and experimental manipulation to obtain mechanistic insight from genome-wide protein-DNA binding studies. We have employed these techniques to discern immature promoter-proximal Pol II from productively elongating Pol II, and infer a critical role for the transition between initiation and full elongation competence in regulating development and gene induction in response to environmental signals.
transcription elongation; gene expression; ChIP-chip; ChIP-seq
Chromatin immunoprecipitation (ChIP) coupled with genome tiling array hybridization (ChIP-chip) and ChIP followed by massively parallel sequencing (ChIP-seq) are high throughput approaches to profile genome-wide protein-DNA interactions. Both technologies are increasingly used to study transcription factor binding sites and chromatin modifications. CisGenome is an integrated software system for analyzing ChIP-chip and ChIP-seq data. This unit describes basic functions of CisGenome and how to use them to find genomic regions with protein-DNA interactions, visualize binding signals, associate binding regions with nearby genes, search for novel transcription factor binding motifs, and map existing DNA sequence motifs to user-supplied genomic regions to define their exact locations.
transcription factor; chromatin immunoprecipitation; tiling array; next generation sequencing; motif; gene regulation
Regulatory elements play an important role in the variability of individual responses to drug treatment. This has been established through studies on three classes of elements that regulate RNA and protein abundance: promoters, enhancers and microRNAs. Each of these elements, and genetic variants within them, are being characterized at an exponential pace by next-generation sequencing (NGS) technologies. In this review, we outline examples of how each class of element affects drug response via regulation of drug targets, transporters and enzymes. We also discuss the impact of NGS technologies such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq), and the ramifications of new techniques such as high-throughput chromosome capture (Hi-C), chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and massively parallel reporter assays (MPRA). NGS approaches are generating data faster than they can be analyzed, and new methods will be required to prioritize laboratory results before they are ready for the clinic. However, there is no doubt that these approaches will bring about a systems-level understanding of the interplay between genetic variants and drug response. An understanding of the importance of regulatory variants in pharmacogenomics will facilitate the identification of responders versus non-responders, the prevention of adverse effects and the optimization of therapies for individual patients.
ChIP-Seq; enhancers; miRNA; next-generation sequencing; pharmacogenomics; promoters; RNA-Seq
The use of high-throughput sequencing in combination with chromatin immunoprecipitation (ChIP-seq) has enabled the study of genome-wide protein binding at high resolution. While the amount of data generated from such experiments is steadily increasing, the methods available for their analysis remain limited. Although several algorithms for the analysis of ChIP-seq data have been published, they focus almost exclusively on transcription factor studies and are usually not well suited for the analysis of other types of experiments.
Here we present ChIPseqR, an algorithm for the analysis of nucleosome positioning and histone modification ChIP-seq experiments. The performance of this novel method is studied on short read sequencing data of Arabidopsis thaliana mononucleosomes as well as on simulated data.
ChIPseqR is shown to improve sensitivity and spatial resolution over existing methods while maintaining high specificity. Further analysis of predicted nucleosomes reveals characteristic patterns in nucleosome sequences and placement.
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massively parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Our new algorithm can successfully predict promoters from the genome-wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
Chromatin immunoprecipitation combined with genome tile path microarrays or deep sequencing can be used to study genome-wide epigenetic profiles and the transcription factor binding repertoire. Although well studied in a variety of cell lines, these genome-wide profiles have so far been little explored in vertebrate embryos.
Here we report on two genome tile path ChIP-chip designs for interrogating the Xenopus tropicalis genome. In particular, a whole-genome microarray design was used to identify active promoters by close proximity to histone H3 lysine 4 trimethylation. A second microarray design features these experimentally derived promoter regions in addition to currently annotated 5′ ends of genes. These regions truly represent promoters as shown by binding of TBP, a key transcription initiation factor.
A whole-genome and a promoter tile path microarray design were developed. Both designs can be used to study epigenetic phenomena and transcription factor binding in developing Xenopus embryos.
ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly being used for identification of protein-DNA interactions in vivo in the genome. However, maximizing the effectiveness of data analysis of such sequences requires the development of new algorithms that are able to accurately predict DNA-protein binding sites.
Here, we present SIPeS (Site Identification from Paired-end Sequencing), a novel algorithm for precise identification of binding sites from short reads generated by paired-end Solexa ChIP-Seq technology. Using ChIP-Seq data from the Arabidopsis basic helix-loop-helix transcription factor ABORTED MICROSPORES (AMS), which is expressed within the anther during pollen development, we show that SIPeS has better resolution for binding site identification than two existing ChIP-Seq peak detection algorithms, Cisgenome and MACS.
When compared to Cisgenome and MACS, SIPeS shows better resolution for binding site discovery. Moreover, SIPeS is designed to calculate the mappable genome length accurately with the fragment length based on the paired-end reads. Dynamic baselines are also employed to discriminate closely adjacent binding sites, which is of particular value when working with high-density genomes.
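The advantage of paired-end data is that each fragment's true extent is known, so coverage can be piled up without estimating a fragment length. The sketch below illustrates only that pile-up step and is not the SIPeS algorithm: the dynamic-baseline logic is not reproduced, coordinates are assumed to be 0-based with an exclusive end, and the function name `fragment_pileup` is hypothetical.

```python
def fragment_pileup(fragments, region_length):
    """Per-base coverage from paired-end fragments.

    fragments: iterable of (start, end) with 0-based start and exclusive end,
    taken from the outer coordinates of each read pair (an assumption here).
    """
    diff = [0] * (region_length + 1)
    for start, end in fragments:
        # Clamp to the region so out-of-range fragments cannot break indexing.
        s = min(max(start, 0), region_length)
        e = min(max(end, 0), region_length)
        diff[s] += 1
        diff[e] -= 1
    coverage, depth = [], 0
    # A running sum over the difference array yields the depth at each base.
    for d in diff[:region_length]:
        depth += d
        coverage.append(depth)
    return coverage
```

The position of the maximum, for example `coverage.index(max(coverage))`, gives a crude summit estimate for a single isolated site.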
We present a mixture model-based analysis for identifying differences in the distribution of RNA polymerase II (Pol II) in transcribed regions, measured using ChIP-seq (chromatin immunoprecipitation followed by massively parallel sequencing). The statistical model assumes that the number of Pol II-targeted sequences contained within each genomic region follows a Poisson distribution. A Poisson mixture model was then developed to distinguish Pol II binding changes in transcribed regions using an empirical approach, and an expectation-maximization (EM) algorithm was developed for estimation and inference. In order to achieve a global maximum in the M-step, a particle swarm optimization (PSO) was implemented. We applied this model to Pol II binding data generated from hormone-dependent MCF7 breast cancer cells and antiestrogen-resistant MCF7 breast cancer cells before and after treatment with 17β-estradiol (E2). We determined that in the hormone-dependent cells, ~9.9% (2527) of genes showed significant changes in Pol II binding after E2 treatment. However, only ~0.7% (172) of genes displayed significant Pol II binding changes in E2-treated antiestrogen-resistant cells. These results show that a Poisson mixture model can be used to analyze ChIP-seq data.
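The EM fitting of a Poisson mixture can be illustrated with a deliberately simplified example. The sketch below is not the authors' model: it fits a plain two-component Poisson mixture to raw counts, and it uses the closed-form M-step that is available in that simple case rather than the particle swarm optimization described in the abstract. The function names, starting rates and iteration count are hypothetical choices.

```python
import math


def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of observing k events at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts, lam_init=(1.0, 10.0), n_iter=200):
    """Fit a two-component Poisson mixture to read counts with EM.

    Returns (weights, rates, responsibilities), where responsibilities[i] is
    the posterior probability that counts[i] came from the second component.
    """
    n = len(counts)
    lam = list(lam_init)
    w = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior membership of each count in component 2,
        # computed in log space to avoid underflow.
        for i, k in enumerate(counts):
            a = math.log(w[0]) + poisson_log_pmf(k, lam[0])
            b = math.log(w[1]) + poisson_log_pmf(k, lam[1])
            top = max(a, b)
            ea, eb = math.exp(a - top), math.exp(b - top)
            resp[i] = eb / (ea + eb)
        # M-step: closed-form updates (weighted proportions and means).
        r1 = sum(resp)
        r0 = n - r1
        if r0 < 1e-9 or r1 < 1e-9:
            break  # one component has emptied out; stop rather than divide by zero
        w = [r0 / n, r1 / n]
        lam[0] = max(sum((1 - r) * k for r, k in zip(resp, counts)) / r0, 1e-9)
        lam[1] = max(sum(r * k for r, k in zip(resp, counts)) / r1, 1e-9)
    return w, lam, resp
```

On counts that fall into a low group and a high group, the two fitted rates settle near the group means and the responsibilities separate the groups. Like any EM fit, the result depends on the starting values, which is the kind of local-optimum problem a global optimizer is meant to address.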
Recent genome-wide chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) analyses, performed in various eukaryotic organisms, have analysed RNA Polymerase II (Pol II) pausing around the transcription start sites of genes. In this study we have further investigated genome-wide binding of Pol II downstream of the 3′ end of the annotated genes (EAGs) by ChIP-seq in human cells. At almost all expressed genes we observed Pol II occupancy downstream of the EAGs, suggesting that Pol II pausing 3′ from the transcription units is a rather common phenomenon. Downstream of EAGs Pol II transcripts can also be detected by global run-on and sequencing, suggesting the presence of functionally active Pol II. Based on Pol II occupancy downstream of EAGs we could distinguish distinct clusters of Pol II pause patterns. On core histone genes, coding for non-polyadenylated transcripts, Pol II occupancy drops quickly after the EAG. In contrast, on genes whose transcripts undergo polyA tail addition [poly(A)+], Pol II occupancy downstream of the EAGs can be detected up to 4–6 kb. Inhibition of polyadenylation significantly increased Pol II occupancy downstream of EAGs at poly(A)+ genes, but not at the EAGs of core histone genes. The differential genome-wide Pol II occupancy profiles 3′ of the EAGs have also been confirmed in mouse embryonic stem (mES) cells, indicating that Pol II pauses genome-wide downstream of the EAGs in mammalian cells. Moreover, in mES cells the sharp drop of Pol II signal at the EAG of core histone genes seems to be independent of the phosphorylation status of the C-terminal domain of the large subunit of Pol II. Thus, our study uncovers a potential link between different mRNA 3′ end processing mechanisms and consequent Pol II transcription termination processes.
Motivation: Chromatin states are the key to gene regulation and cell identity. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) is increasingly being used to map epigenetic states across genomes of diverse species. Chromatin modification profiles are frequently noisy and diffuse, spanning regions ranging from several nucleosomes to large domains of multiple genes. Much of the early work on the identification of ChIP-enriched regions for ChIP-Seq data has focused on identifying localized regions, such as transcription factor binding sites. Bioinformatic tools to identify diffuse domains of ChIP-enriched regions have been lacking.
Results: Based on the biological observation that histone modifications tend to cluster to form domains, we present a method that identifies spatial clusters of signals unlikely to appear by chance. This method pools together enrichment information from neighboring nucleosomes to increase sensitivity and specificity. By using genomic-scale analysis, as well as the examination of loci with validated epigenetic states, we demonstrate that this method outperforms existing methods in the identification of ChIP-enriched signals for histone modification profiles. We demonstrate the application of this unbiased method in important issues in ChIP-Seq data analysis, such as data normalization for quantitative comparison of levels of epigenetic modifications across cell types and growth conditions.
Supplementary information: Supplementary data are available at Bioinformatics online.
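The idea of pooling neighboring enriched windows into domains, described in the Results above, can be sketched in a few lines. This is an illustration only, not the published method: it applies a fixed count threshold and simply sums reads per island, whereas the abstract describes scoring clusters by how unlikely they are to appear by chance. The function name and parameters are hypothetical.

```python
def find_islands(window_counts, min_count=2, max_gap=1):
    """Group enriched windows into islands, bridging short gaps.

    window_counts: read counts for consecutive fixed-width windows.
    A window is eligible if its count is at least min_count. Eligible windows
    separated by at most max_gap ineligible windows join the same island.
    Returns a list of (first_window, last_window, total_reads).
    """
    islands = []
    start = last = None
    for i, count in enumerate(window_counts):
        if count < min_count:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 <= max_gap:
            last = i  # gap is short enough: extend the current island
        else:
            islands.append((start, last, sum(window_counts[start:last + 1])))
            start = last = i
    if start is not None:
        islands.append((start, last, sum(window_counts[start:last + 1])))
    return islands
```

With `max_gap=0` only strictly adjacent windows are merged; allowing a gap is what lets a diffuse domain survive a single poorly covered window.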
The molecular mechanisms of genome reprogramming during transcriptional responses to stress are associated with specific chromatin modifications. Available data, however, describe histone modifications only at individual plant genes induced by stress. We have no knowledge of chromatin modifications taking place at genes whose transcription has been down-regulated or on the genome-wide chromatin modification patterns that occur during the plant's response to dehydration stress.
Using chromatin immunoprecipitation and deep sequencing (ChIP-Seq) we established the whole-genome distribution patterns of histone H3 lysine 4 mono-, di-, and tri-methylation (H3K4me1, H3K4me2, and H3K4me3, respectively) in Arabidopsis thaliana during watered and dehydration stress conditions. In contrast to the relatively even distribution of H3 throughout the genome, the H3K4me1, H3K4me2, and H3K4me3 marks are predominantly located on genes. About 90% of annotated genes carry one or more of the H3K4 methylation marks. The H3K4me1 and H3K4me2 marks are more widely distributed (80% and 84%, respectively) than the H3K4me3 marks (62%), but the H3K4me2 and H3K4me1 levels changed only modestly during dehydration stress. By contrast, the H3K4me3 abundance changed robustly when transcript levels from responding genes increased or decreased. In contrast to the prominent H3K4me3 peaks present at the 5'-ends of most transcribed genes, genes inducible by dehydration and ABA displayed atypically broad H3K4me3 distribution profiles that were present before and after the stress.
A higher number (90%) of annotated Arabidopsis genes carry one or more types of H3K4me marks than previously reported. During the response to dehydration stress the changes in H3K4me1, H3K4me2, and H3K4me3 patterns show different dynamics and specific patterns at up-regulated, down-regulated, and unaffected genes. The different behavior of each methylation mark during the response process illustrates that they have distinct roles in the transcriptional response of implicated genes. The broad H3K4me3 distribution profiles on nucleosomes of stress-induced genes uncovered a specific chromatin pattern associated with many of the genes involved in the dehydration stress response.
Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases.
Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less so with acetylation of lysine 27 on histone H3 or p300 binding.
By integrating genome-wide data on transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
transcription factor; ChIP-Seq; histone modification; chromatin
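The peak-clustering idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that peaks are half-open genomic intervals, that peaks from different factors are merged into one cluster when they overlap or lie within a chosen gap, and that a cluster is kept only if it contains a minimum number of distinct factors. The names `Peak`, `cluster_peaks`, `max_gap`, and `min_factors`, as well as the factor labels in the usage example, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    """One ChIP-seq peak as a half-open interval [start, end)."""
    chrom: str
    start: int
    end: int
    factor: str  # label of the transcription factor data set (illustrative)


def cluster_peaks(peaks, max_gap=0, min_factors=2):
    """Group peaks into clusters by proximity (assumed rule, not from the paper).

    Peaks on the same chromosome are merged when the next peak starts no more
    than `max_gap` bases after the current cluster ends. Clusters with fewer
    than `min_factors` distinct factors are discarded.
    Returns a list of (chrom, start, end, sorted_factor_names) tuples.
    """
    by_chrom = defaultdict(list)
    for peak in peaks:
        by_chrom[peak.chrom].append(peak)

    clusters = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        for peak in sorted(by_chrom[chrom], key=lambda p: (p.start, p.end)):
            if current and peak.start - current_end <= max_gap:
                # Close enough to the running cluster: extend it.
                current.append(peak)
                current_end = max(current_end, peak.end)
            else:
                # Too far away: close the running cluster and start a new one.
                if current:
                    clusters.append(current)
                current, current_end = [peak], peak.end
        if current:
            clusters.append(current)

    result = []
    for members in clusters:
        factors = sorted({p.factor for p in members})
        if len(factors) >= min_factors:
            start = members[0].start
            end = max(p.end for p in members)
            result.append((members[0].chrom, start, end, factors))
    return result


if __name__ == "__main__":
    example = [
        Peak("chr1", 100, 200, "GATA1"),
        Peak("chr1", 150, 260, "TAL1"),
        Peak("chr1", 1000, 1100, "GATA1"),
        Peak("chr2", 50, 90, "CTCF"),
        Peak("chr2", 80, 120, "CTCF"),
    ]
    for cluster in cluster_peaks(example):
        print(cluster)
```

In this sketch, requiring at least two distinct factors per cluster is a design choice that favors regions bound by several factors; setting `min_factors=1` keeps every merged region instead.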
Oestrogen receptor-α (ER) is the defining and driving transcription factor in the majority of breast cancers and its target genes dictate cell growth and endocrine response, yet genomic understanding of ER function has been restricted to model systems [1-3]. We now map genome-wide ER binding events, by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), in primary breast cancers from patients with different clinical outcomes and in distant ER-positive (ER+) metastases. We find that drug-resistant cancers still have ER-chromatin occupancy, but that ER binding is a dynamic process, with the acquisition of unique ER binding regions in tumours from patients who are likely to relapse. The acquired, poor-outcome ER regulatory regions observed in primary tumours reveal gene signatures that predict clinical outcome in ER+ disease exclusively. We find that the differential ER binding programme observed in tumours from patients with poor outcome is not due to the selection of a rare subpopulation of cells, but is due to the FoxA1-mediated reprogramming of ER binding on a rapid time scale. The parallel redistribution of ER and FoxA1 cis-regulatory elements in drug-resistant cellular contexts is supported by histological co-expression of ER and FoxA1 in metastatic samples. By establishing transcription factor mapping in primary tumour material, we show that there is plasticity in ER binding capacity, with distinct combinations of cis-regulatory elements linked with the different clinical outcomes.
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has recently been used to identify the modification patterns for the methylation and acetylation of many different histone tails in genes and enhancers.
We have extended the analysis of histone modifications to gene deserts, pericentromeres and subtelomeres. Using data from human CD4+ T cells, we have found that each of these non-genic regions has a particular profile of histone modifications that distinguishes it from the other non-coding regions. Different methylation states of H4K20, H3K9 and H3K27 were found to be enriched in each region relative to the other regions. These findings indicate that non-genic regions of the genome are variable with respect to histone modification patterns, rather than being monolithic. We furthermore used consensus sequences for unassembled centromeres and telomeres to identify the significant histone modifications in these regions. Finally, we compared the modification patterns in non-genic regions to those at silent genes and genes with higher levels of expression. For all tested methylations with the exception of H3K27me3, the enrichment level of each modification state for silent genes is between that of non-genic regions and expressed genes. For H3K27me3, the highest levels are found in silent genes.
In addition to the difference in histone modification patterns between euchromatin and heterochromatin, illustrated by the enrichment of H3K9me2/3 in non-genic regions and of H3K9me1 at active genes, the chromatin modifications within the non-genic (heterochromatin-like) regions (e.g. subtelomeres, pericentromeres and gene deserts) also differ markedly from one another.
The pattern of histone H4 acetylation in different genomic regions has been investigated by immunoprecipitating oligonucleosomes from a human lymphoblastoid cell line with antibodies to H4 acetylated at lysines 5, 8, 12 or 16. DNA from antibody-bound or unbound chromatin was assayed by slot blotting. Pol I and pol II transcribed genes located in euchromatin were shown to have levels of H4 acetylation at lysines 5, 8 and 12 equivalent to those in input chromatin, but to be slightly enriched in H4 acetylated at lysine 16. In no case did the acetylation level correlate with actual or potential transcriptional activity. All acetylated histone H4 isoforms were depleted in non-coding, simple repeat DNA in heterochromatin, though the extent of depletion varied with the type of heterochromatin and with the isoform. Two single copy genes that map within or adjacent to blocks of paracentric heterochromatin are depleted in H4 acetylated at lysines 5, 8 and 12, but not 16. Consensus sequences of repetitive elements of the Alu family (SINES, enriched in R bands) were associated with H4 that was more highly acetylated at all four lysines than input chromatin, while H4 associated with Kpn I elements (LINES, enriched in G bands) was significantly underacetylated.
Epigenetic mechanisms set apart the active and inactive regions in the genome of multicellular organisms to produce distinct cell fates during embryogenesis. Here we report genome-wide epigenetic and transcriptome maps of gastrula-stage Xenopus tropicalis embryos using massively parallel sequencing of cDNA (RNA-seq) and DNA obtained by chromatin immunoprecipitation (ChIP-seq) of histone H3 K4 and K27 trimethylation and RNA Polymerase II (RNAPII). These maps identify promoters and transcribed regions. Strikingly, genomic regions featuring opposing histone modifications are mostly transcribed, reflecting spatially regulated expression rather than bivalency, as determined by expression profile analyses, sequential ChIP, and ChIP-seq on dissected embryos. Spatial differences in H3K27me3 deposition are predictive of localized gene expression. Moreover, the appearance of H3K4me3 coincides with zygotic gene activation, whereas H3K27me3 is predominantly deposited upon subsequent spatial restriction or repression of transcriptional regulators. These results reveal a hierarchy in the spatial control of zygotic gene activation.