Related Articles
Motivation: Antibody-based Chromatin Immunoprecipitation assay followed by high-throughput sequencing technology (ChIP-seq) is a relatively new method to study the binding patterns of specific protein molecules over the entire genome. ChIP-seq technology allows scientists to obtain more comprehensive results in a shorter time. Here, we present a non-linear normalization algorithm and a mixture modeling method for comparing ChIP-seq data from multiple samples and characterizing genes based on their RNA polymerase II (Pol II) binding patterns.
Results: We apply a two-step non-linear normalization method based on the locally weighted regression (LOESS) approach to compare ChIP-seq data across multiple samples and model the difference using an Exponential-NormalK mixture model. The fitted model is used to identify genes associated with differential binding sites based on local false discovery rate (fdr). These genes are then standardized and hierarchically clustered to characterize their Pol II binding patterns. As a case study, we apply the analysis procedure to compare the normal breast cancer cell line (MCF7) with the tamoxifen-resistant cell line (OHT). We find enriched regions that are associated with cancer (P < 0.0001). Our findings also imply that there may be a dysregulation of cell cycle and gene expression control pathways in the tamoxifen-resistant cells. These results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples.
Availability: Data are available at http://www.bmi.osu.edu/~khuang/Data/ChIP/RNAPII/
Contact: taslim.2@osu.edu; khuang@bmi.osu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp384
PMCID: PMC2800347
PMID: 19561022
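The LOESS-based normalization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single local-linear pass with a tricube kernel on log-ratio versus mean-log-intensity values; it is not the authors' two-step procedure, and the function names (`loess_fit`, `loess_normalize`), the `frac` span, and the pseudocount are hypothetical choices made for illustration only.

```python
import math


def tricube(u):
    """Tricube kernel weight for a scaled distance u (zero beyond u = 1)."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_fit(x, y, x0, frac=0.5):
    """Local linear fit at x0 using roughly the nearest frac of the points."""
    k = max(2, int(math.ceil(frac * len(x))))
    dists = sorted(abs(xi - x0) for xi in x)
    # Bandwidth: distance to the k-th neighbour, nudged up so it is never zero.
    h = dists[k - 1] * (1.0 + 1e-9) + 1e-12
    w = [tricube(abs(xi - x0) / h) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return ybar  # all weighted points share one x value: fall back to the mean
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    return ybar + (sxy / sxx) * (x0 - xbar)


def loess_normalize(sample_a, sample_b, frac=0.5, pseudo=1.0):
    """Return log2 ratios of two samples with the LOESS trend removed.

    sample_a and sample_b are per-region read counts (hypothetical inputs);
    pseudo is an assumed pseudocount that keeps the logarithms finite.
    """
    m = [math.log2((a + pseudo) / (b + pseudo)) for a, b in zip(sample_a, sample_b)]
    avg = [0.5 * math.log2((a + pseudo) * (b + pseudo)) for a, b in zip(sample_a, sample_b)]
    return [mi - loess_fit(avg, m, ai, frac) for mi, ai in zip(m, avg)]
```

After normalization, the residual log ratios are centred on the fitted trend, so a systematic intensity-dependent bias between the two samples no longer dominates the comparison. A mixture model such as the one named in the abstract would then be fitted to these residuals.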
The pregnane X receptor (PXR) is a key regulator of xenobiotic metabolism and disposition in liver. However, little is known about the PXR DNA-binding signatures in vivo, or how PXR regulates novel direct targets on a genome-wide scale. Therefore, we generated a roadmap of hepatic PXR binding in the entire mouse genome [chromatin immunoprecipitation (ChIP)-Seq]. The most frequent PXR DNA-binding motif is the AGTTCA-like direct repeat with a 4 bp spacer [direct repeat (DR)-4]. Surprisingly, there are also high motif occurrences with spacers of a periodicity of 5 bp, forming a novel DR-(5n + 4) pattern for PXR binding. PXR binding overlaps with the epigenetic mark for gene activation (histone-H3K4-di-methylation), but not with epigenetic marks for gene suppression (DNA methylation or histone-H3K27-tri-methylation) (ChIP-on-chip). After administering a PXR agonist, changes in mRNA of most PXR-direct target genes correlate with increased PXR binding. Specifically, increased PXR binding triggers the trans-activation of critical drug-metabolizing enzymes and transporters. The mRNA induction of these genes is absent in PXR-null mice. The current work provides the first in vivo evidence of PXR DNA-binding signatures in the mouse genome, paving the path for predicting and further understanding the multifaceted roles of PXR in liver.
doi:10.1093/nar/gkq654
PMCID: PMC3001051
PMID: 20693526
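The DR-(5n + 4) spacing described in the PXR abstract above can be made concrete with a small scan. This is a sketch under simplifying assumptions: it matches the exact half-site AGTTCA on one strand only, whereas the abstract speaks of "AGTTCA-like" repeats, and the function name `find_direct_repeats` and the `max_n` cutoff are hypothetical.

```python
import re

# Exact half-site; an assumption, since real "AGTTCA-like" sites tolerate mismatches.
HALF_SITE = "AGTTCA"


def find_direct_repeats(seq, max_n=2):
    """Return (start, spacer_length) pairs for half-site repeats with spacer 5n + 4.

    With max_n=2 the accepted spacers are 4, 9 and 14 bp (n = 0, 1, 2).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping half-site occurrences are all reported.
    starts = [m.start() for m in re.finditer("(?=%s)" % HALF_SITE, seq)]
    allowed = {5 * n + 4 for n in range(max_n + 1)}
    hits = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            spacer = second - (first + len(HALF_SITE))
            if spacer in allowed:
                hits.append((first, spacer))
    return hits
```

For instance, `find_direct_repeats("AGTTCATTTTAGTTCA")` reports one DR-4 element starting at position 0, while the same half-sites separated by 5 bp are not reported.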
Mammalian genomes encode numerous cis-natural antisense transcripts (cis-NATs). The extent to which these cis-NATs are actively regulated and ultimately functionally relevant, as opposed to transcriptional noise, remains a matter of debate. To address this issue, we analyzed the chromatin environment and RNA Pol II binding properties of human cis-NAT promoters genome-wide. Cap analysis of gene expression data were used to identify thousands of cis-NAT promoters, and profiles of nine histone modifications and RNA Pol II binding for these promoters in ENCODE cell types were analyzed using chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Active cis-NAT promoters are enriched with activating histone modifications and occupied by RNA Pol II, whereas weak cis-NAT promoters are depleted for both activating modifications and RNA Pol II. The enrichment levels of activating histone modifications and RNA Pol II binding show peaks centered around cis-NAT transcriptional start sites, and the levels of activating histone modifications at cis-NAT promoters are positively correlated with cis-NAT expression levels. Cis-NAT promoters also show highly tissue-specific patterns of expression. These results suggest that human cis-NATs are actively transcribed by the RNA Pol II and that their expression is epigenetically regulated, prerequisites for a functional potential for many of these non-coding RNAs.
doi:10.1093/nar/gkr1010
PMCID: PMC3287164
PMID: 22371288
Background
Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster.
Results
Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis.
Conclusions
Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
doi:10.1186/1471-2164-12-134
PMCID: PMC3053263
PMID: 21356108
Identification of diffuse signals from the chromatin immunoprecipitation and high-throughput massively parallel sequencing (ChIP-Seq) technology poses significant computational challenges, and there are few methods currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER that was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 out of a total of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharides (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched with biological functions related to metabolic processes under resting conditions but with developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and expected secondary structures by prediction. Last, they are enriched with motifs of transcription factors such as PU.1 and AP.1, previously shown to be important lineage determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, the GCLS method based on RNA polymerase II and H3K4Me3 ChIP-Seq can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in the study.
doi:10.1371/journal.pone.0024051
PMCID: PMC3184070
PMID: 21980340
Background
High throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation.
Results
Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2–3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR validated NRSF sites and the presence of NRSF binding motifs for setting thresholds.
Conclusion
The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from .
doi:10.1186/1471-2105-9-523
PMCID: PMC2628906
PMID: 19061503
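The binomial p-value/q-value idea in the abstract above can be sketched as follows. The sketch assumes that, under the null, the ChIP reads in a window follow a binomial distribution whose success probability is the ChIP share of the combined library size, and it uses the Benjamini-Hochberg procedure for q-values. Both choices are assumptions made for illustration and may differ from the published package; all function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def window_p_value(chip, control, chip_total, control_total):
    """Upper-tail binomial p-value for ChIP enrichment over control in one window.

    chip and control are read counts in the window; chip_total and
    control_total are the library sizes used to set the null proportion.
    """
    p0 = chip_total / (chip_total + control_total)
    return binomial_upper_tail(chip, chip + control, p0)


def benjamini_hochberg(p_values):
    """Convert p-values to Benjamini-Hochberg q-values (original order kept)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values
```

Because the control reads enter the test directly, a window that is high in both ChIP and input (for example a repetitive region) receives a large p-value, which is the false-positive reduction the abstract attributes to using control input data.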
An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.
doi:10.1093/nar/gkq1322
PMCID: PMC3113561
PMID: 21310711
Transcription is a sophisticated multi-step process in which RNA polymerase II (Pol II) transcribes a DNA template into RNA in concert with a broad array of transcription initiation, elongation, capping, termination, and histone modifying factors. Recent global analyses of Pol II distribution have indicated that many genes are regulated during the elongation phase, shedding light on a previously underappreciated mechanism for controlling gene expression. Understanding how various factors regulate transcription elongation in living cells has been greatly aided by chromatin immunoprecipitation (ChIP) studies, which can provide spatial and temporal resolution of protein-DNA binding events. The coupling of ChIP with DNA microarray and high-throughput sequencing technologies (ChIP-chip and ChIP-seq) has significantly increased the scope of ChIP studies and genome-wide maps of Pol II or elongation factor binding sites can now be readily produced. However, while ChIP-chip/ChIP-seq data allow for high-resolution localization of protein-DNA binding sites, they are not sufficient to dissect protein function. Here we describe techniques for coupling ChIP-chip/ChIP-seq with genetic, chemical, and experimental manipulation to obtain mechanistic insight from genome-wide protein-DNA binding studies. We have employed these techniques to discern immature promoter-proximal Pol II from productively elongating Pol II, and infer a critical role for the transition between initiation and full elongation competence in regulating development and gene induction in response to environmental signals.
doi:10.1016/j.ymeth.2009.02.024
PMCID: PMC3431615
PMID: 19275938
transcription elongation; gene expression; ChIP-chip; ChIP-seq
Chromatin immunoprecipitation (ChIP) coupled with genome tiling array hybridization (ChIP-chip) and ChIP followed by massively parallel sequencing (ChIP-seq) are high throughput approaches to profile genome-wide protein-DNA interactions. Both technologies are increasingly used to study transcription factor binding sites and chromatin modifications. CisGenome is an integrated software system for analyzing ChIP-chip and ChIP-seq data. This unit describes basic functions of CisGenome and how to use them to find genomic regions with protein-DNA interactions, visualize binding signals, associate binding regions with nearby genes, search for novel transcription factor binding motifs, and map existing DNA sequence motifs to user-supplied genomic regions to define their exact locations.
doi:10.1002/0471250953.bi0213s33
PMCID: PMC3072298
PMID: 21400695
transcription factor; chromatin immunoprecipitation; tiling array; next generation sequencing; motif; gene regulation
Background
The use of high-throughput sequencing in combination with chromatin immunoprecipitation (ChIP-seq) has enabled the study of genome-wide protein binding at high resolution. While the amount of data generated from such experiments is steadily increasing, the methods available for their analysis remain limited. Although several algorithms for the analysis of ChIP-seq data have been published, they focus almost exclusively on transcription factor studies and are usually not well suited for the analysis of other types of experiments.
Results
Here we present ChIPseqR, an algorithm for the analysis of nucleosome positioning and histone modification ChIP-seq experiments. The performance of this novel method is studied on short read sequencing data of Arabidopsis thaliana mononucleosomes as well as on simulated data.
Conclusions
ChIPseqR is shown to improve sensitivity and spatial resolution over existing methods while maintaining high specificity. Further analysis of predicted nucleosomes reveals characteristic patterns in nucleosome sequences and placement.
doi:10.1186/1471-2105-12-39
PMCID: PMC3045301
PMID: 21281468
Background
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
Methods
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
Results
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Conclusion
Our new algorithm can successfully predict the promoters from the genome wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
doi:10.1186/1471-2105-11-S1-S65
PMCID: PMC3009539
PMID: 20122241
Background
ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly being used for identification of protein-DNA interactions in vivo in the genome. However, maximizing the effectiveness of data analysis of such sequences requires the development of new algorithms that are able to accurately predict DNA-protein binding sites.
Results
Here, we present SIPeS (Site Identification from Paired-end Sequencing), a novel algorithm for precise identification of binding sites from short reads generated by paired-end Solexa ChIP-Seq technology. In this paper we used ChIP-Seq data from the Arabidopsis basic helix-loop-helix transcription factor ABORTED MICROSPORES (AMS), which is expressed within the anther during pollen development. The results show that SIPeS has better resolution for binding site identification compared to two existing ChIP-Seq peak detection algorithms, Cisgenome and MACS.
Conclusions
When compared to Cisgenome and MACS, SIPeS shows better resolution for binding site discovery. Moreover, SIPeS is designed to calculate the mappable genome length accurately with the fragment length based on the paired-end reads. Dynamic baselines are also employed to discriminate closely adjacent binding sites, which is of particular value when working with high-density genomes.
doi:10.1186/1471-2105-11-81
PMCID: PMC2831849
PMID: 20144209
We present a mixture model-based analysis for identifying differences in the distribution of RNA polymerase II (Pol II) in transcribed regions, measured using ChIP-seq (chromatin immunoprecipitation followed by massively parallel sequencing technology). The statistical model assumes that the number of Pol II-targeted sequences contained within each genomic region follows a Poisson distribution. A Poisson mixture model was then developed to distinguish Pol II binding changes in transcribed regions using an empirical approach and an expectation-maximization (EM) algorithm developed for estimation and inference. In order to achieve a global maximum in the M-step, a particle swarm optimization (PSO) was implemented. We applied this model to Pol II binding data generated from hormone-dependent MCF7 breast cancer cells and antiestrogen-resistant MCF7 breast cancer cells before and after treatment with 17β-estradiol (E2). We determined that in the hormone-dependent cells, ~9.9% (2527) of genes showed significant changes in Pol II binding after E2 treatment. However, only ~0.7% (172) of genes displayed significant Pol II binding changes in E2-treated antiestrogen-resistant cells. These results show that a Poisson mixture model can be used to analyze ChIP-seq data.
doi:10.1186/1471-2164-9-S2-S23
PMCID: PMC2559888
PMID: 18831789
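The EM fitting of a Poisson mixture named in the abstract above can be sketched in a few lines. This is a generic mixture of Poisson components with the standard closed-form M-step; it deliberately does not use the particle swarm optimization the authors describe, and the function names, starting values, and iteration count are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts, init_lambdas, n_iter=200):
    """Fit a Poisson mixture by EM; returns (weights, lambdas, responsibilities).

    counts are per-region read counts; init_lambdas gives one starting rate
    per component (assumed to be chosen by the user).
    """
    lambdas = list(init_lambdas)
    n_comp = len(lambdas)
    weights = [1.0 / n_comp] * n_comp
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count.
        resp = []
        for c in counts:
            logs = [math.log(w) + poisson_logpmf(c, lam) for w, lam in zip(weights, lambdas)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: closed-form updates (weighted proportions and weighted means).
        for j in range(n_comp):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / len(counts), 1e-12)
            lambdas[j] = max(sum(r[j] * c for r, c in zip(resp, counts)) / nj, 1e-9)
    return weights, lambdas, resp
```

The returned responsibilities give, for each region, the posterior probability of belonging to each component, which is the quantity one would threshold to call a region changed or unchanged.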
Recent genome-wide chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) analyses performed in various eukaryotic organisms analysed RNA Polymerase II (Pol II) pausing around the transcription start sites of genes. In this study we have further investigated genome-wide binding of Pol II downstream of the 3′ end of the annotated genes (EAGs) by ChIP-seq in human cells. At almost all expressed genes we observed Pol II occupancy downstream of the EAGs, suggesting that Pol II pausing 3′ from the transcription units is a rather common phenomenon. Downstream of EAGs Pol II transcripts can also be detected by global run-on and sequencing, suggesting the presence of functionally active Pol II. Based on Pol II occupancy downstream of EAGs we could distinguish distinct clusters of Pol II pause patterns. On core histone genes, coding for non-polyadenylated transcripts, Pol II occupancy drops quickly after the EAG. In contrast, on genes whose transcripts undergo polyA tail addition [poly(A)+], Pol II occupancy downstream of the EAGs can be detected up to 4–6 kb. Inhibition of polyadenylation significantly increased Pol II occupancy downstream of EAGs at poly(A)+ genes, but not at the EAGs of core histone genes. The differential genome-wide Pol II occupancy profiles 3′ of the EAGs have also been confirmed in mouse embryonic stem (mES) cells, indicating that Pol II pauses genome-wide downstream of the EAGs in mammalian cells. Moreover, in mES cells the sharp drop of Pol II signal at the EAG of core histone genes seems to be independent of the phosphorylation status of the C-terminal domain of the large subunit of Pol II. Thus, our study uncovers a potential link between different mRNA 3′ end processing mechanisms and consequent Pol II transcription termination processes.
doi:10.1371/journal.pone.0038769
PMCID: PMC3372504
PMID: 22701709
Background
Chromatin immunoprecipitation combined with genome tile path microarrays or deep sequencing can be used to study genome-wide epigenetic profiles and the transcription factor binding repertoire. Although well studied in a variety of cell lines, these genome-wide profiles have so far been little explored in vertebrate embryos.
Principal Findings
Here we report on two genome tile path ChIP-chip designs for interrogating the Xenopus tropicalis genome. In particular, a whole-genome microarray design was used to identify active promoters by close proximity to histone H3 lysine 4 trimethylation. A second microarray design features these experimentally derived promoter regions in addition to currently annotated 5′ ends of genes. These regions truly represent promoters as shown by binding of TBP, a key transcription initiation factor.
Conclusions
A whole-genome and a promoter tile path microarray design were developed. Both designs can be used to study epigenetic phenomena and transcription factor binding in developing Xenopus embryos.
doi:10.1371/journal.pone.0008820
PMCID: PMC2809088
PMID: 20098671
Motivation: Chromatin states are the key to gene regulation and cell identity. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) is increasingly being used to map epigenetic states across genomes of diverse species. Chromatin modification profiles are frequently noisy and diffuse, spanning regions ranging from several nucleosomes to large domains of multiple genes. Much of the early work on the identification of ChIP-enriched regions for ChIP-Seq data has focused on identifying localized regions, such as transcription factor binding sites. Bioinformatic tools to identify diffuse domains of ChIP-enriched regions have been lacking.
Results: Based on the biological observation that histone modifications tend to cluster to form domains, we present a method that identifies spatial clusters of signals unlikely to appear by chance. This method pools together enrichment information from neighboring nucleosomes to increase sensitivity and specificity. By using genomic-scale analysis, as well as the examination of loci with validated epigenetic states, we demonstrate that this method outperforms existing methods in the identification of ChIP-enriched signals for histone modification profiles. We demonstrate the application of this unbiased method in important issues in ChIP-Seq data analysis, such as data normalization for quantitative comparison of levels of epigenetic modifications across cell types and growth conditions.
Availability: http://home.gwu.edu/~wpeng/Software.htm
Contact: wpeng@gwu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp340
PMCID: PMC2732366
PMID: 19505939
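The idea of pooling enrichment from neighbouring windows into spatial clusters, as described in the abstract above, can be illustrated with a simplified sketch. It assumes fixed-size windows with precomputed read counts, a hard count threshold for calling a window enriched, and a tolerance for short gaps. The threshold, gap size, and function name are hypothetical, and the significance scoring of the published method is not reproduced here.

```python
def find_islands(window_counts, min_count=3, max_gap=1):
    """Group enriched windows into islands, bridging up to max_gap weak windows.

    Returns a list of (first_index, last_index, total_reads) tuples, where
    total_reads sums every window in the island, including bridged gaps.
    """
    islands = []
    start = last = None
    for i, count in enumerate(window_counts):
        if count < min_count:
            continue  # weak window: it can only be absorbed as a gap
        if start is not None and i - last - 1 <= max_gap:
            last = i  # close enough to the current island: extend it
        else:
            if start is not None:
                islands.append((start, last, sum(window_counts[start:last + 1])))
            start = last = i
    if start is not None:
        islands.append((start, last, sum(window_counts[start:last + 1])))
    return islands
```

Allowing gaps is what lets a diffuse domain survive a few low-coverage windows instead of being split into many small fragments, which is the situation the abstract describes for histone modification profiles.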
van Dijk, Karin | Ding, Yong | Malkaram, Sridhar | Riethoven, Jean-Jack M | Liu, Rong | Yang, Jingyi | Laczko, Peter | Chen, Han | Xia, Yuannan | Ladunga, Istvan | Avramova, Zoya | Fromm, Michael
Background
The molecular mechanisms of genome reprogramming during transcriptional responses to stress are associated with specific chromatin modifications. Available data, however, describe histone modifications only at individual plant genes induced by stress. We have no knowledge of chromatin modifications taking place at genes whose transcription has been down-regulated or on the genome-wide chromatin modification patterns that occur during the plant's response to dehydration stress.
Results
Using chromatin immunoprecipitation and deep sequencing (ChIP-Seq) we established the whole-genome distribution patterns of histone H3 lysine 4 mono-, di-, and tri-methylation (H3K4me1, H3K4me2, and H3K4me3, respectively) in Arabidopsis thaliana during watered and dehydration stress conditions. In contrast to the relatively even distribution of H3 throughout the genome, the H3K4me1, H3K4me2, and H3K4me3 marks are predominantly located on genes. About 90% of annotated genes carry one or more of the H3K4 methylation marks. The H3K4me1 and H3K4me2 marks are more widely distributed (80% and 84%, respectively) than the H3K4me3 marks (62%), but the H3K4me2 and H3K4me1 levels changed only modestly during dehydration stress. By contrast, the H3K4me3 abundance changed robustly when transcript levels from responding genes increased or decreased. In contrast to the prominent H3K4me3 peaks present at the 5'-ends of most transcribed genes, genes inducible by dehydration and ABA displayed atypically broader H3K4me3 distribution profiles that were present before and after the stress.
Conclusions
A higher number (90%) of annotated Arabidopsis genes carry one or more types of H3K4me marks than previously reported. During the response to dehydration stress the changes in H3K4me1, H3K4me2, and H3K4me3 patterns show different dynamics and specific patterns at up-regulated, down-regulated, and unaffected genes. The different behavior of each methylation mark during the response process illustrates that they have distinct roles in the transcriptional response of implicated genes. The broad H3K4me3 distribution profiles on nucleosomes of stress-induced genes uncovered a specific chromatin pattern associated with many of the genes involved in the dehydration stress response.
doi:10.1186/1471-2229-10-238
PMCID: PMC3095321
PMID: 21050490
Background
Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases.
Results
Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups either directly involved in transcription, including miRNAs and long noncoding RNAs, or facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less with acetylation of lysine 27 on histone 3 or p300 binding.
Conclusion
By integrating genomewide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
doi:10.1186/1741-7007-9-80
PMCID: PMC3239327
PMID: 22115494
transcription factor; ChIP-Seq; histone modification; chromatin
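Clustering transcription factor peaks from many data sets into candidate regulatory elements, as described in the abstract above, can be sketched as a sort-and-merge over intervals. The sketch assumes peaks are given as (chromosome, start, end, factor) tuples and that peaks overlapping or lying within `max_gap` bp of each other belong to one cluster. The merging rule, the `max_gap` parameter, and the function name are assumptions made for illustration, not the authors' procedure.

```python
def cluster_peaks(peaks, max_gap=0):
    """Merge peaks from several factors into clusters of nearby intervals.

    peaks is an iterable of (chrom, start, end, factor) tuples. Returns a list
    of (chrom, start, end, factors) tuples, where factors is the sorted list
    of distinct factor names found in the cluster.
    """
    clusters = []
    for chrom, start, end, factor in sorted(peaks):
        current = clusters[-1] if clusters else None
        if current and current[0] == chrom and start <= current[2] + max_gap:
            # Overlaps (or nearly touches) the open cluster: extend it.
            current[2] = max(current[2], end)
            current[3].add(factor)
        else:
            clusters.append([chrom, start, end, {factor}])
    return [(c, s, e, sorted(f)) for c, s, e, f in clusters]
```

The number of distinct factors per cluster is then available for downstream comparison with histone marks or open chromatin data, in the spirit of the integration the abstract describes.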
Background
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has recently been used to identify the modification patterns for the methylation and acetylation of many different histone tails in genes and enhancers.
Results
We have extended the analysis of histone modifications to gene deserts, pericentromeres and subtelomeres. Using data from human CD4+ T cells, we have found that each of these non-genic regions has a particular profile of histone modifications that distinguish it from the other non-coding regions. Different methylation states of H4K20, H3K9 and H3K27 were found to be enriched in each region relative to the other regions. These findings indicate that non-genic regions of the genome are variable with respect to histone modification patterns, rather than being monolithic. We furthermore used consensus sequences for unassembled centromeres and telomeres to identify the significant histone modifications in these regions. Finally, we compared the modification patterns in non-genic regions to those at silent genes and genes with higher levels of expression. For all tested methylations with the exception of H3K27me3, the enrichment level of each modification state for silent genes is between that of non-genic regions and expressed genes. For H3K27me3, the highest levels are found in silent genes.
Conclusion
In addition to the histone modification pattern difference between euchromatin and heterochromatin regions, as illustrated by the enrichment of H3K9me2/3 in non-genic regions while H3K9me1 is enriched at active genes, the chromatin modifications within non-genic (heterochromatin-like) regions (e.g. subtelomeres, pericentromeres and gene deserts) are also quite different.
doi:10.1186/1471-2164-10-143
PMCID: PMC2667539
PMID: 19335899
Ross-Innes, Caryn S. | Stark, Rory | Teschendorff, Andrew E. | Holmes, Kelly A. | Ali, H. Raza | Dunning, Mark J. | Brown, Gordon D. | Gojis, Ondrej | Ellis, Ian O. | Green, Andrew R. | Ali, Simak | Chin, Suet-Feung | Palmieri, Carlo | Caldas, Carlos | Carroll, Jason S.
Nature
2012;481(7381):389-393.
Summary
Oestrogen receptor-α (ER) is the defining and driving transcription factor in the majority of breast cancers and its target genes dictate cell growth and endocrine response, yet genomic understanding of ER function has been restricted to model systems [1-3]. We now map genome-wide ER binding events, by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), in primary breast cancers from patients with different clinical outcomes and in distant ER positive (ER+) metastases. We find that drug resistant cancers still have ER-chromatin occupancy, but that ER binding is a dynamic process, with the acquisition of unique ER binding regions in tumours from patients that are likely to relapse. The acquired, poor outcome ER regulatory regions observed in primary tumours reveal gene signatures that predict clinical outcome in ER+ disease exclusively. We find that the differential ER binding programme observed in tumours from patients with poor outcome is not due to the selection of a rare subpopulation of cells, but is due to the FoxA1-mediated reprogramming of ER binding on a rapid time scale. The parallel redistribution of ER and FoxA1 cis-regulatory elements in drug resistant cellular contexts is supported by histological co-expression of ER and FoxA1 in metastatic samples. By establishing transcription factor mapping in primary tumour material, we show that there is plasticity in ER binding capacity, with distinct combinations of cis-regulatory elements linked with the different clinical outcomes.
doi:10.1038/nature10730
PMCID: PMC3272464
PMID: 22217937
SUMMARY
Epigenetic mechanisms set apart the active and inactive regions in the genome of multicellular organisms to produce distinct cell fates during embryogenesis. Here we report on the epigenetic and transcriptome genome-wide maps of gastrula-stage Xenopus tropicalis embryos using massive parallel sequencing of cDNA (RNA-seq) and DNA obtained by chromatin immunoprecipitation (ChIP-seq) of histone H3 K4 and K27 trimethylation and RNA Polymerase II (RNAPII). These maps identify promoters and transcribed regions. Strikingly, genomic regions featuring opposing histone modifications are mostly transcribed, reflecting spatially regulated expression rather than bivalency as determined by expression profile analyses, sequential ChIP, and ChIP-seq on dissected embryos. Spatial differences in H3K27me3 deposition are predictive of localized gene expression. Moreover, the appearance of H3K4me3 coincides with zygotic gene activation, whereas H3K27me3 is predominantly deposited upon subsequent spatial restriction or repression of transcriptional regulators. These results reveal a hierarchy in the spatial control of zygotic gene activation.
doi:10.1016/j.devcel.2009.08.005
PMCID: PMC2746918
PMID: 19758566
The pattern of histone H4 acetylation in different genomic regions has been investigated by immunoprecipitating oligonucleosomes from a human lymphoblastoid cell line with antibodies to H4 acetylated at lysines 5, 8, 12 or 16. DNA from antibody-bound or unbound chromatin was assayed by slot blotting. Pol I and pol II transcribed genes located in euchromatin were shown to have levels of H4 acetylation at lysines 5, 8 and 12 equivalent to those in input chromatin, but to be slightly enriched in H4 acetylated at lysine 16. In no case did the acetylation level correlate with actual or potential transcriptional activity. All acetylated histone H4 isoforms were depleted in non-coding, simple repeat DNA in heterochromatin, though the extent of depletion varied with the type of heterochromatin and with the isoform. Two single copy genes that map within or adjacent to blocks of paracentric heterochromatin are depleted in H4 acetylated at lysines 5, 8 and 12, but not 16. Consensus sequences of repetitive elements of the Alu family (SINES, enriched in R bands) were associated with H4 that was more highly acetylated at all four lysines than input chromatin, while H4 associated with Kpn I elements (LINES, enriched in G bands) was significantly underacetylated.
PMCID: PMC147356
PMID: 9461459
Ramsey, Stephen A. | Knijnenburg, Theo A. | Kennedy, Kathleen A. | Zak, Daniel E. | Gilchrist, Mark | Gold, Elizabeth S. | Johnson, Carrie D. | Lampano, Aaron E. | Litvak, Vladimir | Navarro, Garnet | Stolyar, Tetyana | Aderem, Alan | Shmulevich, Ilya
Motivation: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation.
Results: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by ∼50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01.
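The scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the thresholding rule, the weights, or the definition of a "region of high HAc", so the function names, the rule that a sub-threshold score contributes zero, and the flank-window mean used to define high HAc are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def thresholded(score: float, threshold: float) -> float:
    """Assumed rule: a score below its threshold contributes nothing."""
    return score if score >= threshold else 0.0


def combined_score(
    scores: Dict[str, float],
    weights: Dict[str, float],
    thresholds: Dict[str, float],
) -> float:
    """Weighted sum of thresholded evidence scores for one candidate site.

    The keys (e.g. "motif", "hac") are hypothetical evidence names.
    """
    return sum(
        weights[name] * thresholded(scores[name], thresholds[name])
        for name in weights
    )


def local_minima(
    signal: Sequence[float], high_cutoff: float, flank: int
) -> List[int]:
    """Indices of strict local minima that sit inside a high-signal region.

    A position qualifies when it is lower than both neighbours and the
    mean signal in the window [i - flank, i + flank] is at least
    high_cutoff (an assumed stand-in for "region of high HAc").
    """
    hits = []
    for i in range(1, len(signal) - 1):
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]:
            lo = max(0, i - flank)
            hi = min(len(signal), i + flank + 1)
            window = signal[lo:hi]
            if sum(window) / len(window) >= high_cutoff:
                hits.append(i)
    return hits


# Toy usage with made-up numbers (not data from the study).
site = combined_score(
    scores={"motif": 0.9, "hac": 0.2},
    weights={"motif": 2.0, "hac": 1.0},
    thresholds={"motif": 0.5, "hac": 0.3},
)
dips = local_minima([1, 8, 9, 4, 9, 8, 1, 0, 1], high_cutoff=5.0, flank=2)
```

In this toy run the dip at index 3 is kept because its surrounding window has a high mean signal, while the dip at index 7 is discarded because it lies in a low-signal stretch.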
Availability: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac.
Contact: aderem@systemsbiology.org; ishmulevich@systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq405
PMCID: PMC2922897
PMID: 20663846
Behaviors observed at the cellular level, such as development and acquisition of effector functions by immune cells, result from transcriptional changes. The biochemical mediators of transcription are sequence-specific transcription factors (TFs), chromatin-modifying enzymes, and chromatin, the complex of DNA and histone proteins. Covalent modification of DNA and histones, also termed epigenetic modification, influences the accessibility of target sequences for transcription factors on chromatin and the expression of linked genes required for immune functions. Genome-wide techniques such as ChIP-Seq have described the entire “cistrome” of transcription factors involved in specific developmental steps of B and T cells and have started to define specific immune responses in terms of the binding profiles of critical effectors and epigenetic modification patterns. Current data suggest that both promoters and enhancers are prepared for action at different stages of activation by epigenetic modification through distinct transcription factors in different cells.
doi:10.1016/j.immuni.2011.06.002
PMCID: PMC3137373
PMID: 21703538
Background
Histone post-translational modifications are critical for gene expression and cell viability. A broad spectrum of histone lysine residues have been identified in yeast that are targeted by a variety of modifying enzymes. However, the regulation and interaction of these enzymes remain relatively uncharacterized. Previously we demonstrated that deletion of either the histone acetyltransferase (HAT) GCN5 or the histone deacetylase (HDAC) HDA1 exacerbated the temperature sensitive (ts) mutant phenotype of the Anaphase Promoting Complex (APC) apc5CA allele. Here, the apc5CA mutant background is used to study a previously uncharacterized functional antagonistic genetic interaction between Gcn5 and Hda1 that is not detected in APC5 cells.
Results
Using Northerns, Westerns, reverse transcriptase PCR (rtPCR), chromatin immunoprecipitation (ChIP), and mutant phenotype suppression analysis, we observed that Hda1 and Gcn5 appear to compete for recruitment to promoters. We observed that the presence of Hda1 can partially occlude the binding of Gcn5 to the same promoter. Occlusion of Gcn5 recruitment to these promoters involved Hda1 and Tup1. Using sequential ChIP we show that Hda1 and Tup1 likely form complexes at these promoters, and that complex formation can be increased by deleting GCN5.
Conclusions
Our data suggest that large Gcn5- and Hda1-containing complexes may compete for space on promoters that utilize the Ssn6/Tup1 repressor complex. We predict that in apc5CA cells the accumulation of an APC target may compensate for the loss of both GCN5 and HDA1.
doi:10.1186/1747-1028-6-13
PMCID: PMC3141613
PMID: 21651791