PMCC

Related Articles

1.  Global Mapping of Cell Type–Specific Open Chromatin by FAIRE-seq Reveals the Regulatory Role of the NFI Family in Adipocyte Differentiation 
PLoS Genetics  2011;7(10):e1002311.
Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear factor I (NFI) transcription factors. Indeed, ChIP assays showed that NFI factors occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near the PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without a differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA-mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program.
Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation.
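The promoter versus non-promoter split and the overlap percentages above rest on simple interval logic. The sketch below illustrates it under stated assumptions: peaks, TSSs, and TF binding sites lie on a single chromosome as half-open (start, end) intervals, and a peak is called "promoter" if its midpoint lies within a hypothetical 1 kb of a TSS. That cutoff is an illustrative assumption, not the study's definition.

```python
from bisect import bisect_left

# Hypothetical promoter cutoff: a peak whose midpoint lies within this many bp
# of a TSS is called "promoter" (an assumption, not the study's definition).
PROMOTER_WINDOW = 1000

def nearest_tss_distance(pos, sorted_tss):
    """Distance from `pos` to the closest TSS in a sorted, non-empty list."""
    if not sorted_tss:
        raise ValueError("at least one TSS is required")
    i = bisect_left(sorted_tss, pos)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - pos)  # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - sorted_tss[i - 1])  # nearest TSS before pos
    return min(candidates)

def classify_peaks(peaks, tss_positions):
    """Split (start, end) peaks into promoter and non-promoter (distal) lists."""
    tss = sorted(tss_positions)
    promoter, distal = [], []
    for start, end in peaks:
        midpoint = (start + end) // 2
        if nearest_tss_distance(midpoint, tss) <= PROMOTER_WINDOW:
            promoter.append((start, end))
        else:
            distal.append((start, end))
    return promoter, distal

def overlap_fraction(peaks, sites):
    """Fraction of peaks overlapping at least one (start, end) binding site."""
    if not peaks:
        return 0.0
    hits = sum(
        any(s_start < p_end and p_start < s_end for s_start, s_end in sites)
        for p_start, p_end in peaks
    )
    return hits / len(peaks)

peaks = [(900, 1100), (5000, 5200), (20000, 20300)]
promoter, distal = classify_peaks(peaks, tss_positions=[1000, 30000])
print(promoter, distal)
print(overlap_fraction(distal, sites=[(5100, 5150)]))  # 0.5
```

The overlap check is quadratic and meant only to show the logic; a genome-scale analysis would use sorted intervals or an interval tree.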
Author Summary
The human body consists of a few hundred types of specialized cells. Spatial and temporal transcriptional regulation of genes is essential for the manifestation of cellular phenotypes. Identification of regulatory regions in the genome is central to understanding the mechanism of cell type–specific gene regulation. Recently developed high-throughput sequencing technology and computational analyses allow genome-wide investigation of chromatin structure. Using the FAIRE-seq technique, we identified the open chromatin regions of the genome, which harbor regulatory elements, in adipocytes. Open chromatin regions distal to transcription start sites differ significantly among cell types. Multiple cell type–specific open chromatin regions exist near genes regulated during adipocyte differentiation. Computational motif analysis of adipocyte-specific open chromatin regions revealed enrichment of a binding motif for the NFI transcription factor family. These factors bind to the regulatory elements near the adipogenic PPARγ, C/EBPα, and aP2 genes and regulate their expression. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without a differentiation stimulus, and knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation. Our study demonstrates the utility of FAIRE-seq in providing a global view of regulatory elements and in identifying transcriptional regulators of cellular functions.
doi:10.1371/journal.pgen.1002311
PMCID: PMC3197683  PMID: 22028663
2.  ChromaSig: A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome 
PLoS Computational Biology  2008;4(10):e1000201.
Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.
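ChromaSig itself is a probabilistic, unsupervised method. As a much simpler stand-in (plain k-means, not ChromaSig's algorithm), the sketch below groups loci by their vectors of chromatin-mark signal, which conveys the idea of recovering recurring chromatin signatures without annotation. It assumes each locus has already been summarized as a fixed-length vector of normalized signal, one value per mark; the mark names in the example are hypothetical.

```python
import math

def kmeans(vectors, k, iterations=100):
    """Plain k-means with deterministic initialization (first k vectors).

    Returns (labels, centroids). A stand-in for illustration, not ChromaSig.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each locus joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member loci.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Columns: hypothetical normalized H3K4me3, H3K4me1, H3K27ac signal per locus.
loci = [
    [0.90, 0.10, 0.80],
    [0.10, 0.90, 0.70],
    [0.85, 0.15, 0.75],
    [0.15, 0.80, 0.60],
]
labels, centroids = kmeans(loci, k=2)
print(labels)  # [0, 1, 0, 1]
```

The promoter-like loci (high H3K4me3) share one cluster and the enhancer-like loci (high H3K4me1) share the other. Unlike ChromaSig, k-means needs k fixed in advance, and here each locus is reduced to a single value per mark.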
Author Summary
The DNA in eukaryotes is packaged by histones. Histones can be marked by a variety of posttranslational modifications, and it has been hypothesized that distinct combinations of histone modifications mark distinct functional regions of the genome. The study of histone modifications has been aided by the development of high-throughput techniques to map a wide assortment of histone modifications on a global scale. However, because much of our current understanding of the human genome is concentrated on promoters, most studies have examined histone modifications only at these well-defined sites, ignoring the vast majority of the genome. To aid in the discovery of functional elements outside of these well-annotated loci, we develop an unbiased method that searches for commonly occurring histone modification patterns on a global scale without using any annotation information. This method recovers known patterns associated with transcriptional enhancers and promoters. Supporting the histone code hypothesis, we discover that the different functional activities of enhancers are closely associated with the presence of different histone modification patterns. We also discover several novel patterns that likely contain other potential regulatory elements. As the availability of large-scale histone modification data increases, methods such as the one presented here, which concisely describe commonly occurring chromatin signatures and thereby abstract away irrelevant or redundant data, will become increasingly critical.
doi:10.1371/journal.pcbi.1000201
PMCID: PMC2556089  PMID: 18927605
3.  Temporal ChIP-on-Chip of RNA-Polymerase-II to detect novel gene activation events during photoreceptor maturation 
Molecular Vision  2010;16:252-271.
Purpose
During retinal development, post-mitotic neural progenitor cells must activate thousands of genes to complete synaptogenesis and terminal maturation. While many of these genes are known, others remain beyond the sensitivity of expression microarray analysis. Some of these elusive gene activation events can be detected by mapping changes in RNA polymerase-II (Pol-II) association around transcription start sites.
Methods
High-resolution (35 bp) chromatin immunoprecipitation (ChIP)-on-chip was used to map changes in Pol-II binding surrounding 26,000 gene transcription start sites during photoreceptor maturation of the mouse neural retina, comparing postnatal age 25 (P25) to P2. Coverage was 10–12 kb per transcription start site, including 2.5 kb downstream. Pol-II-active regions were mapped to the mouse genomic DNA sequence by using computational methods (the Tiling Analysis Software [TAS] program), and the ratio of maximum Pol-II binding (P25/P2) was calculated for each gene. A validation set of 36 genes (3%), representing the full range of Pol-II signal ratios (P25/P2), was examined with quantitative ChIP assays for transcriptionally active Pol-II. Gene expression assays were also performed for 19 genes of the validation set, again on independent samples. FLT-3 Interacting Zinc-finger-1 (FIZ1), a zinc-finger protein that associates with active promoter complexes of photoreceptor-specific genes, provided an additional ChIP marker to highlight genes activated in the mature neural retina. To demonstrate the use of ChIP-on-chip predictions to find novel gene activation events, four additional genes were selected for quantitative RT–PCR (qRT–PCR) analysis; these four genes have human homologs located in unidentified retinal disease regions: Solute carrier family 25 member 33 (Slc25a33), Lysophosphatidylcholine acyltransferase 1 (Lpcat1), Coiled-coil domain-containing 126 (Ccdc126), and ADP-ribosylation factor-like 4D (Arl4d).
Results
ChIP-on-chip Pol-II peak signal ratios >1.8 predicted increased amounts of transcribing Pol-II and increased expression with an estimated 97% accuracy, based on analysis of the validation gene set. Using this threshold ratio, 1,101 genes were predicted to experience increased binding of Pol-II in their promoter regions during terminal maturation of the neural retina. Over 800 of these gene activations were additional to those previously reported by microarray analysis. Slc25a33, Lpcat1, Ccdc126, and Arl4d increased expression significantly (p<0.001) during photoreceptor maturation. Expression of all four genes was diminished in adult retinas lacking rod photoreceptors (Rd1 mice) compared to normal retinas (90% loss for Ccdc126 and Arl4d). For rhodopsin (Rho), a marker of photoreceptor maturation, two regions of maximum Pol-II signal corresponded to the upstream rhodopsin enhancer region and the rhodopsin proximal promoter region.
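The prediction rule in these Results reduces to a per-gene ratio and a threshold. The sketch below applies the reported cutoff (maximum-signal ratio P25/P2 > 1.8) to per-probe Pol-II signals; the data layout, the small pseudocount guarding against division by zero, and the gene names are assumptions for illustration, and signals are assumed non-negative.

```python
RATIO_THRESHOLD = 1.8  # cutoff reported in the Results

def max_signal_ratio(signal_p25, signal_p2, pseudocount=1e-6):
    """P25/P2 ratio of the maximum Pol-II signal across a gene's TSS window.

    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    return (max(signal_p25) + pseudocount) / (max(signal_p2) + pseudocount)

def predict_activated(genes, threshold=RATIO_THRESHOLD):
    """Sorted names of genes whose maximum-signal ratio exceeds `threshold`.

    `genes` maps a gene name to a (signal_p25, signal_p2) pair of probe signals.
    """
    return sorted(
        name
        for name, (p25, p2) in genes.items()
        if max_signal_ratio(p25, p2) > threshold
    )

# Hypothetical probe-level signals for two genes.
genes = {
    "geneA": ([1.0, 4.0, 2.0], [1.0, 2.0, 1.5]),  # ratio about 2.0
    "geneB": ([3.0, 3.5], [2.5, 3.0]),            # ratio about 1.17
}
print(predict_activated(genes))  # ['geneA']
```

Using the maximum rather than the mean signal per window follows the "ratio of maximum Pol-II binding" described in the Methods.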
Conclusions
High-resolution maps of Pol-II binding around transcription start sites, which can predict activation increases for a specific gene of interest, were generated for the postnatal mouse retina. Novel gene activation predictions are enriched for biologic functions relevant to vision, neural function, and chromatin regulation. Use of the data set to detect novel activation increases was demonstrated by expression analysis for several genes that have human homologs located within unidentified retinal disease regions: Slc25a33, Lpcat1, Ccdc126, and Arl4d. Analysis of photoreceptor-deficient retinas indicated that all four genes are expressed in photoreceptors. Genome-wide maps of Pol-II binding were developed for visual access in the University of California, Santa Cruz (UCSC) Genome Browser and its eye-centric version EyeBrowse (National Eye Institute, NEI). Single-promoter resolution of Pol-II distribution patterns suggests that the Rho enhancer region and the Rho proximal promoter region become closely associated with the activated gene’s promoter complex.
PMCID: PMC2822553  PMID: 20161818
4.  LANA Binds to Multiple Active Viral and Cellular Promoters and Associates with the H3K4 Methyltransferase hSET1 Complex 
PLoS Pathogens  2014;10(7):e1004240.
Kaposi's sarcoma-associated herpesvirus (KSHV) is a γ-herpesvirus associated with KS and two lymphoproliferative diseases. Recent studies characterized epigenetic modification of KSHV episomes during latency and determined that latency-associated genes are associated with H3K4me3, while most lytic genes are associated with the silencing mark H3K27me3. Since the latency-associated nuclear antigen (LANA) (i) is expressed very early after de novo infection, (ii) interacts with transcriptional regulators and chromatin remodelers, and (iii) regulates the LANA and RTA promoters, we hypothesized that LANA may contribute to the establishment of latency through epigenetic control. We performed a detailed ChIP-seq analysis in cells of lymphoid and endothelial origin and compared H3K4me3, H3K27me3, Pol II, and LANA occupancy. On viral episomes, LANA binding was detected at numerous lytic and latent promoters, which were transactivated by LANA in reporter assays. LANA binding was highly enriched at H3K4me3 peaks, and this co-occupancy was also detected on many host gene promoters. Bioinformatic analysis of enriched LANA binding sites in combination with biochemical binding studies revealed three distinct binding patterns. First, a small subset of LANA binding sites showed sequence homology to the characterized LBS1/2 sequence in the viral terminal repeat. Second, a large number of sites contained a novel LANA binding motif, (TCCAT)3, which was confirmed by gel shift analysis. Third, some viral and cellular promoters did not contain LANA binding sites and are likely enriched through protein-protein interaction. LANA was associated with H3K4me3 marks, and in PEL cells 86% of all LANA-bound promoters were transcriptionally active, leading to the hypothesis that LANA interacts with the machinery that methylates H3K4. 
Co-immunoprecipitation demonstrated LANA association with endogenous hSET1 complexes in both lymphoid and endothelial cells suggesting that LANA may contribute to the epigenetic profile of KSHV episomes.
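Reading (TCCAT)3 as three tandem copies of TCCAT, a minimal exact-match scan on both strands looks like the sketch below. It is only an illustration: the study's motif came from bioinformatic analysis confirmed by gel shift, and real binding sites may tolerate mismatches that this exact search ignores.

```python
import re

MOTIF_UNIT = "TCCAT"
COPIES = 3

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_tandem_motif(seq, unit=MOTIF_UNIT, copies=COPIES):
    """Return sorted (start, strand) hits for exact tandem repeats of `unit`.

    Positions are 0-based on the forward strand; overlapping hits are reported.
    """
    seq = seq.upper()
    forward = unit * copies
    reverse = reverse_complement(forward)  # ATGGAATGGAATGGA for (TCCAT)3
    hits = []
    for pattern, strand in ((forward, "+"), (reverse, "-")):
        # A zero-width lookahead lets overlapping matches be found.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)

sequence = "GG" + "TCCAT" * 3 + "AA" + "ATGGAATGGAATGGA" + "C"
print(find_tandem_motif(sequence))  # [(2, '+'), (19, '-')]
```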
Author Summary
KSHV is a DNA tumor virus that is associated with Kaposi's sarcoma and some lymphoproliferative diseases. During latent infection, the viral genome persists as circular extrachromosomal DNA in the nucleus and expresses a very limited number of viral proteins, including LANA, a multi-functional protein. KSHV viral episomes, like host genomic DNA, are subject to chromatin formation and histone modifications, which contribute to tightly controlled gene expression during latency. We determined where LANA binds on the KSHV and human genomes, and mapped activating and repressing histone marks and RNA polymerase II binding. We found that LANA bound near transcription start sites, and that binding correlated with the active transcription mark H3K4me3 but not with the silencing mark H3K27me3. Binding sites for transcription factors including ZNF143, CTCF, and Stat1 are enriched at regions where LANA is bound. We identified some LANA binding sites near human gene promoters that resembled KSHV sequences known to bind LANA. We also found a novel motif that occurs frequently in the human genome and that binds LANA directly despite being different from known LANA-binding sequences. Furthermore, we demonstrate that LANA associates with the H3K4 methyltransferase hSET1, which creates activating histone marks.
doi:10.1371/journal.ppat.1004240
PMCID: PMC4102568  PMID: 25033463
5.  Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data 
BMC Bioinformatics  2015;16(1):74.
Background
Decoding the temporal control of gene expression patterns is key to understanding the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level, across multiple data dimensions. Therefore, there is a pressing need to develop a systems approach that integrates the data from these individual studies and infers the dynamic regulatory networks in an unbiased fashion.
Results
We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known position weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoters or enhancers of downstream genes.
The LR scores of experimentally verified ESC and heart enhancers were significantly higher than those of random regions (p < 10⁻¹⁰⁰), suggesting that a high LR score is a reliable indicator of functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately 9,400 bp upstream of the transcriptional start site of Nkx2-5 (around position −9400), which overlapped with a previously reported enhancer region (−9435 to −8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.
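The LR score described in these Results is the output of a logistic regression. The sketch below fits one by batch gradient descent on the log-loss using only the standard library; the two features per site (for example a PWM match score and a histone-mark signal) and the toy labels are hypothetical, and the authors' actual features and training procedure are not reproduced here.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def lr_score(x, weights, bias):
    """Predicted probability that a site with feature vector `x` is bound."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Toy training data: hypothetical [PWM match score, histone signal] per site.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print(lr_score([0.85, 0.85], w, b) > 0.5, lr_score([0.15, 0.15], w, b) < 0.5)
```

A site resembling the bound examples scores above 0.5 and one resembling the unbound examples scores below it; in practice one would add regularization and evaluate on held-out regions, as the comparison against random regions above does.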
Conclusion
We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0460-0) contains supplementary material, which is available to authorized users.
doi:10.1186/s12859-015-0460-0
PMCID: PMC4359553  PMID: 25887857
Cardiac differentiation; Network inference; Logistic regression; Time-varying dynamic Bayesian model; Data integration; Gene regulatory network
6.  RNA Polymerase II Binding Patterns Reveal Genomic Regions Involved in MicroRNA Gene Regulation 
PLoS ONE  2010;5(11):e13798.
MicroRNAs are small non-coding RNAs involved in post-transcriptional regulation of gene expression. Due to the poor annotation of primary microRNA (pri-microRNA) transcripts, the precise location of promoter regions driving expression of many microRNA genes is enigmatic. This deficiency hinders our understanding of microRNA-mediated regulatory networks. In this study, we develop a computational approach to identify the promoter region and transcription start site (TSS) of actively transcribed pri-microRNAs using genome-wide RNA Polymerase II (RPol II) binding patterns derived from ChIP-seq data. Based upon the assumption that the distribution of RPol II binding patterns around the TSS is similar for microRNA and protein-coding genes, we designed a statistical model to mimic RPol II binding patterns around the TSS of highly expressed, well-annotated promoter regions of protein-coding genes. We used this model to systematically scan the regions upstream of all intergenic microRNAs for RPol II binding patterns similar to those of TSSs from protein-coding genes. We validated our findings by examining the conservation, CpG content, and activating histone marks in the identified promoter regions. We applied our model to assess changes in microRNA transcription in steroid hormone-treated breast cancer cells. The results demonstrate that many microRNA genes have lost hormone-dependent regulation in tamoxifen-resistant breast cancer cells. MicroRNA promoter identification based upon RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription, and therefore allows comparison of transcription activities between different conditions, such as normal and disease states.
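The scanning idea can be sketched as template matching: average the RPol II coverage around well-annotated protein-coding TSSs into a template, then slide it along the region upstream of a microRNA and keep the best-correlated position. Pearson correlation is a stand-in here for the study's statistical model, and the binned coverage values, template width, and TSS offset are hypothetical.

```python
import math
import statistics

def average_profile(profiles):
    """Bin-wise mean of equal-length RPol II coverage profiles aligned on the TSS."""
    return [statistics.fmean(column) for column in zip(*profiles)]

def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom

def best_tss_candidate(coverage, template, tss_offset):
    """Slide `template` along `coverage`; return (predicted TSS bin, correlation).

    `tss_offset` is the bin index of the TSS within the template.
    """
    width = len(template)
    if len(coverage) < width:
        raise ValueError("coverage is shorter than the template")
    best_start, best_score = 0, float("-inf")
    for start in range(len(coverage) - width + 1):
        score = pearson(coverage[start:start + width], template)
        if score > best_score:
            best_start, best_score = start, score
    return best_start + tss_offset, best_score

# Hypothetical binned coverage around two annotated TSSs (TSS at bin 2).
template = average_profile([[0, 1, 5, 2, 0], [0, 2, 6, 1, 0]])
upstream = [0, 0, 0, 0, 1, 6, 2, 0, 0, 0]
print(best_tss_candidate(upstream, template, tss_offset=2))  # predicted TSS at bin 5
```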
doi:10.1371/journal.pone.0013798
PMCID: PMC2970572  PMID: 21072189
7.  Promoter nucleosome dynamics regulated by signalling through the CTD code 
eLife  2015;4:e09008.
The phosphorylation of the RNA polymerase II C-terminal domain (CTD) plays a key role in delineating transcribed regions within chromatin by recruiting histone methylases and deacetylases. Using genome-wide nucleosome mapping, we show that CTD S2 phosphorylation controls nucleosome dynamics in the promoter of a subset of 324 genes, including the regulators of cell differentiation ste11 and metabolic adaptation inv1. Mechanistic studies on these genes indicate that during gene activation a local increase of phospho-S2 CTD near the promoter impairs the phospho-S5 CTD-dependent recruitment of Set1 and the subsequent recruitment of specific HDACs, which leads to nucleosome depletion and efficient transcription. The early increase of phospho-S2 results from the phosphorylation of the CTD S2 kinase Lsk1 by MAP kinase in response to cellular signalling. The artificial tethering of the Lsk1 kinase at the ste11 promoter is sufficient to activate transcription. Therefore, signalling through the CTD code regulates promoter nucleosome dynamics.
DOI: http://dx.doi.org/10.7554/eLife.09008.001
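One way to quantify the promoter nucleosome changes described above from MNase-seq data is to count fragment midpoints (approximate nucleosome dyads) in a window around each TSS and compare mutant to wild type. The sketch below assumes a hypothetical −300 to +100 bp window, a single chromosome, and normalization to fragments per million; the study's actual pipeline is not described here.

```python
import math

# Hypothetical promoter window relative to the TSS (bp; negative = upstream).
WINDOW = (-300, 100)

def promoter_occupancy(fragments, tss, strand, total_fragments, window=WINDOW):
    """Fragments per million whose midpoint lies in the promoter window.

    `fragments` are (start, end) MNase-seq fragments on the TSS's chromosome.
    Offsets are flipped for minus-strand genes so negative always means upstream.
    """
    lo, hi = window
    count = 0
    for start, end in fragments:
        midpoint = (start + end) // 2  # approximate nucleosome dyad
        offset = midpoint - tss if strand == "+" else tss - midpoint
        if lo <= offset < hi:
            count += 1
    return count * 1e6 / total_fragments

def log2_change(mutant, wild_type, pseudocount=1.0):
    """log2(mutant / wild type) occupancy; positive means more promoter nucleosomes."""
    return math.log2((mutant + pseudocount) / (wild_type + pseudocount))

# Hypothetical fragments around a plus-strand TSS at position 1000.
wild_type = promoter_occupancy([(600, 747)], 1000, "+", 1_000_000)
mutant = promoter_occupancy([(800, 947), (950, 1097)], 1000, "+", 1_000_000)
print(wild_type, mutant, log2_change(mutant, wild_type) > 0)  # 0.0 2.0 True
```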
eLife digest
The process of activating genes—known as gene expression—involves a number of steps. During the first step, the gene's DNA is copied or ‘transcribed’ to produce a molecule of messenger RNA. However, most of the DNA in a cell is wrapped around proteins called histones to make structures known as nucleosomes, and the DNA has to be unpacked to allow the enzymes that make messenger RNA to access it.
Cells regulate how the DNA is packed by attaching chemical groups to the histone proteins. Adding acetyl groups to histones usually causes the nucleosomes to unwrap and creates loosely packed DNA that helps with gene expression. On the other hand, the addition of certain methyl groups can have the opposite effect.
RNA polymerase II is the enzyme that carries out transcription of messenger RNAs in all eukaryotic cells—that is, the cells of organisms like plants, animals, and fungi. Like all enzymes, RNA polymerase II is made of smaller building blocks called amino acids. One end of the RNA polymerase II enzyme, called the C-terminal domain (or CTD), contains a unique sequence of amino acids that serves as a scaffold to recruit other proteins involved in transcription and histone modifications. Different amino acids in this region of RNA polymerase II can be modified by the addition of phosphate groups. The pattern of these modifications is often thought of as a code and can influence which other proteins get recruited.
It remains poorly understood how RNA polymerase II regulates nucleosomes to allow transcription to occur. Materne, Anandhakumar et al. have now addressed this issue by engineering mutant yeast cells in which phosphate groups cannot be added to specific amino acids in the RNA polymerase II enzyme. Most genes were expressed as normal in these yeast cells, but a few hundred genes were not expressed.
Materne, Anandhakumar et al. then used a technique called MNase-Seq to map the position of nucleosomes across the genome and found that there were more nucleosomes near the start of these down-regulated genes. Further experiments showed that the addition of phosphate groups onto RNA polymerase II is required to deplete the nucleosomes at the start of a gene called ste11, which allows transcription to occur.
Materne, Anandhakumar et al. also found that artificially tethering the enzyme that adds phosphate groups to the C-terminal domain to the start of the ste11 gene was sufficient to oust nucleosomes and activate transcription by RNA polymerase II.
Future work will address whether this newly discovered mechanism is implicated in the activation of specific patterns of gene expression during the development of more complex organisms.
DOI: http://dx.doi.org/10.7554/eLife.09008.002
doi:10.7554/eLife.09008
PMCID: PMC4502402  PMID: 26098123
RNA polymerase; chromatin; Set1; MAP kinase; HDAC; S. pombe
8.  Discovery of Transcription Factors and Regulatory Regions Driving In Vivo Tumor Development by ATAC-seq and FAIRE-seq Open Chromatin Profiling 
PLoS Genetics  2015;11(2):e1004994.
Genomic enhancers regulate spatio-temporal gene expression by recruiting specific combinations of transcription factors (TFs). When TFs are bound to active regulatory regions, they displace canonical nucleosomes, making these regions biochemically detectable as nucleosome-depleted regions or accessible/open chromatin. Here we ask whether open chromatin profiling can be used to identify the entire repertoire of active promoters and enhancers underlying tissue-specific gene expression during normal development and oncogenesis in vivo. To this end, we first compare two different approaches to detect open chromatin in vivo using the Drosophila eye primordium as a model system: FAIRE-seq, based on physical separation of open versus closed chromatin; and ATAC-seq, based on preferential integration of a transposon into open chromatin. We find that both methods reproducibly capture the tissue-specific chromatin activity of regulatory regions, including promoters, enhancers, and insulators. Using both techniques, we screened for regulatory regions that become ectopically active during Ras-dependent oncogenesis, and identified 3778 regions that become (over-)activated during tumor development. Next, we applied motif discovery to search for candidate transcription factors that could bind these regions and identified AP-1 and Stat92E as key regulators. We validated the importance of Stat92E in the development of the tumors by introducing a loss-of-function Stat92E mutant, which was sufficient to rescue the tumor phenotype. Additionally, we tested whether the predicted Stat92E-responsive regulatory regions are genuine, using ectopic induction of JAK/STAT signaling in developing eye discs, and observed that similar chromatin changes indeed occurred. Finally, we determine that these are functionally significant regulatory changes, as nearby target genes are up- or down-regulated. 
In conclusion, we show that FAIRE-seq and ATAC-seq based open chromatin profiling, combined with motif discovery, is a straightforward approach to identify functional genomic regulatory regions, master regulators, and gene regulatory networks controlling complex in vivo processes.
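The motif-discovery step asks whether a factor's binding site is over-represented in the tumor-activated regions relative to background. The sketch below tests a single known motif rather than discovering motifs de novo: it counts regions containing the canonical AP-1 site TGA(C/G)TCA and computes a one-sided hypergeometric p-value with `math.comb` (Python 3.8+). Exact matching and the hypergeometric test are illustrative assumptions, and the sequences are toy examples.

```python
import math
import re

# Canonical AP-1 (TRE) site; TGA[CG]TCA equals its own reverse complement,
# so a forward-strand search covers both strands.
AP1_SITE = re.compile("TGA[CG]TCA")

def count_with_motif(sequences, motif=AP1_SITE):
    """Number of sequences containing at least one motif match."""
    return sum(1 for s in sequences if motif.search(s.upper()))

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (one-sided enrichment p-value)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    return sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total

def motif_enrichment(target, background, motif=AP1_SITE):
    """p-value that motif-containing regions are over-represented in `target`."""
    k = count_with_motif(target, motif)
    successes = k + count_with_motif(background, motif)
    population = len(target) + len(background)
    return hypergeom_sf(k, population, successes, len(target))

target = ["AATGACTCAAA", "CCTGAGTCAGG", "ATATATAT"]  # tumor-activated regions
background = ["GGGGCCCC", "ATATGCGC", "TTTTAAAA", "CGCGCGCG", "TGACTCAT"]
p = motif_enrichment(target, background)
print(round(p, 4))  # 0.2857 (= 16/56)
```

With realistic numbers of regions, a small p-value would flag the motif as a candidate regulator, which is the role motif discovery plays in the study.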
Author Summary
Gene expression is regulated by proteins called transcription factors, which bind to specific areas of DNA known as regulatory regions. Whereas most DNA in our genome is bound by other proteins (histones) and packaged into units called nucleosomes, the regulatory regions responsible for tissue-specific gene expression are nucleosome-depleted and bound by transcription factors when active. We use two techniques to identify these open chromatin regions in a normal tissue and in a RasV12-induced cancer tissue. We discovered a remarkable change in the accessible regulatory landscape between these two tissues, with several thousand regions becoming more accessible in the cancer tissue. We identified two transcription factors known to be involved in cancer (AP-1 and Stat92E) controlling these newly accessible regulatory regions. Finally, we introduced a mutation that made Stat92E non-functional in the cancer tissue, which decreased the severity of the tumor. Our study shows that open chromatin profiling can be used to identify the regulators of complex in vivo processes, and we shed new light on Ras-dependent cancer development.
doi:10.1371/journal.pgen.1004994
PMCID: PMC4334524  PMID: 25679813
9.  Genome-wide mapping of histone H3 lysine 4 trimethylation in Eucalyptus grandis developing xylem 
BMC Plant Biology  2015;15:117.
Background
Histone modifications play an integral role in plant development, but have been poorly studied in woody plants. Investigating chromatin organization in wood-forming tissue and its role in regulating gene expression allows us to understand the mechanisms underlying cellular differentiation during xylogenesis (wood formation) and identify novel functional regions in plant genomes. However, woody tissue poses unique challenges for using high-throughput chromatin immunoprecipitation (ChIP) techniques for studying genome-wide histone modifications in vivo. We investigated the role of the modified histone H3K4me3 (trimethylated lysine 4 of histone H3) in gene expression during the early stages of wood formation using ChIP-seq in Eucalyptus grandis, a woody biomass model.
Results
Plant chromatin fixation and isolation protocols were optimized for developing xylem tissue collected from field-grown E. grandis trees. A “nano-ChIP-seq” procedure was employed for ChIP DNA amplification. Over 9 million H3K4me3 ChIP-seq and 18 million control paired-end reads were mapped to the E. grandis reference genome for peak calling using Model-based Analysis of ChIP-Seq (MACS). The 12,177 significant H3K4me3 peaks identified covered ~1.5% of the genome and overlapped some 9,623 protein-coding genes and 38 noncoding RNAs. H3K4me3 library coverage, which peaked ~600–700 bp downstream of the transcription start site, was highly correlated with gene expression levels measured with RNA-seq. Overall, H3K4me3-enriched genes tended to be less tissue-specific than unenriched genes and were overrepresented for general cellular metabolism and development gene ontology terms. However, relative expression of H3K4me3-enriched genes in developing secondary xylem was higher than that of unenriched genes, and highly expressed secondary cell wall-related genes were enriched for H3K4me3, as validated using ChIP-qPCR.
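The observation that coverage peaks ~600–700 bp downstream of the TSS comes from a TSS-aligned aggregate (metagene) profile. A minimal strand-aware version is sketched below; the 100 bp bins, the ±2 kb range, and the read positions are hypothetical, and each read is reduced to a single position.

```python
from collections import Counter

BIN_SIZE = 100  # hypothetical bin width (bp)
FLANK = 2000    # hypothetical profile range around the TSS (bp)

def metagene_profile(reads_by_gene, tss_by_gene, strand_by_gene,
                     bin_size=BIN_SIZE, flank=FLANK):
    """Aggregate read counts in TSS-relative bins across genes.

    Offsets are oriented so positive values are downstream of the TSS on
    both strands. Returns a Counter mapping bin start (bp) -> read count.
    """
    profile = Counter()
    for gene, positions in reads_by_gene.items():
        tss, strand = tss_by_gene[gene], strand_by_gene[gene]
        for pos in positions:
            offset = pos - tss if strand == "+" else tss - pos
            if -flank <= offset < flank:
                # Floor division keeps bins aligned for negative offsets too.
                profile[(offset // bin_size) * bin_size] += 1
    return profile

def peak_bin(profile):
    """Start offset (bp) of the bin with the highest aggregate count."""
    return max(profile, key=profile.get)

profile = metagene_profile(
    reads_by_gene={"g1": [1650, 1680, 1620, 900], "g2": [4350, 4310]},
    tss_by_gene={"g1": 1000, "g2": 5000},
    strand_by_gene={"g1": "+", "g2": "-"},
)
print(peak_bin(profile))  # 600, i.e. the 600-699 bp downstream bin
```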
Conclusions
In this first genome-wide analysis of a modified histone in a woody tissue, we optimized a ChIP-seq procedure suitable for field-collected samples. In developing E. grandis xylem, H3K4me3 enrichment is an indicator of active transcription, consistent with its known role in sustaining pre-initiation complex formation in yeast. The H3K4me3 ChIP-seq data from this study paves the way to understanding the chromatin landscape and epigenomic architecture of xylogenesis in plants, and complements RNA-seq evidence of gene expression for the future improvement of the E. grandis genome annotation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-015-0499-0) contains supplementary material, which is available to authorized users.
doi:10.1186/s12870-015-0499-0
PMCID: PMC4425858  PMID: 25957781
ChIP-seq; H3K4me3; Histone; Secondary cell wall; Xylogenesis; Eucalyptus
10.  Zelda Binding in the Early Drosophila melanogaster Embryo Marks Regions Subsequently Activated at the Maternal-to-Zygotic Transition 
PLoS Genetics  2011;7(10):e1002266.
The earliest stages of development in most metazoans are driven by maternally deposited proteins and mRNAs, with widespread transcriptional activation of the zygotic genome occurring hours after fertilization, at a period known as the maternal-to-zygotic transition (MZT). In Drosophila, the MZT is preceded by the transcription of a small number of genes that initiate sex determination, patterning, and other early developmental processes; and the zinc-finger protein Zelda (ZLD) plays a key role in their transcriptional activation. To better understand the mechanisms of ZLD activation and the range of its targets, we used chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to map regions bound by ZLD before (mitotic cycle 8), during (mitotic cycle 13), and after (late mitotic cycle 14) the MZT. Although only a handful of genes are transcribed prior to mitotic cycle 10, we identified thousands of regions bound by ZLD in cycle 8 embryos, most of which remain bound through mitotic cycle 14. As expected, early ZLD-bound regions include the promoters and enhancers of genes transcribed at this early stage. However, we also observed ZLD bound at cycle 8 to the promoters of roughly a thousand genes whose first transcription does not occur until the MZT and to virtually all of the thousands of known and presumed enhancers bound at cycle 14 by transcription factors that regulate patterned gene activation during the MZT. The association between early ZLD binding and MZT activity is so strong that ZLD binding alone can be used to identify active promoters and regulatory sequences with high specificity and selectivity. This strong early association of ZLD with regions not active until the MZT suggests that ZLD is not only required for the earliest wave of transcription but also plays a major role in activating the genome at the MZT.
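The claim that ZLD binding alone can identify active regions amounts to treating "bound at cycle 8" as a classifier for "active at the MZT". The sketch below scores such a classifier with sensitivity, specificity, and precision over a shared universe of regions; the region identifiers are hypothetical, and the abstract does not specify the exact metrics the authors computed.

```python
def classification_metrics(universe, predicted, actual):
    """Score `predicted` (e.g. ZLD-bound) against `actual` (e.g. active at the MZT).

    All arguments are sets of region identifiers drawn from `universe`.
    """
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(universe - predicted - actual)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

regions = set(range(10))   # hypothetical region IDs
zld_bound = {0, 1, 2, 3}   # bound at cycle 8
mzt_active = {0, 1, 2, 4}  # active at the MZT
print(classification_metrics(regions, zld_bound, mzt_active))
# {'sensitivity': 0.75, 'specificity': 0.8333333333333334, 'precision': 0.75}
```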
Author Summary
The newly fertilized eggs of most animal species begin development with a series of rapid cell divisions. During this time of rapid DNA replication, there is little or no transcription of the embryo's genome, with the synthesis of new proteins being directed by a store of maternally deposited mRNAs. Several hours after fertilization, at a period known as the maternal-to-zygotic transition (MZT), transcription of the embryo's genome begins in earnest, but little is known about how this process is initiated. In this paper we investigate the role of a protein known as Zelda (or ZLD) at the MZT in the laboratory model insect Drosophila melanogaster. ZLD had been previously shown to control the activation of a small number of genes expressed prior to the MZT. Here, using an experimental technique (ChIP-Seq) that allowed us to visualize where on the genome a protein is bound, we show that, approximately an hour prior to the MZT, ZLD is bound to most of the genomic regions active at the MZT. This suggests that ZLD may act as a kind of “on switch” for the zygotic genome, poising regions where it binds for activation at the MZT, and this raises the possibility that similar master regulators of the MZT exist in other species.
doi:10.1371/journal.pgen.1002266
PMCID: PMC3197655  PMID: 22028662
11.  Transcription Initiation Patterns Indicate Divergent Strategies for Gene Regulation at the Chromatin Level 
PLoS Genetics  2011;7(1):e1001274.
The application of deep sequencing to map 5′ capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: “focused” promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and “dispersed” promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5′ capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.
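One simple way to operationalize the focused/dispersed distinction is to measure how wide a window holds the central bulk of the 5′-capped tags in a TSS cluster. The sketch below assumes tag positions are already grouped per promoter. The 10% trimming and the 10-bp cutoff are hypothetical choices for illustration, not the thresholds used in the study.

```python
def core_width(positions, trim_percent=10):
    """Width in bp of the span holding the central tags of a TSS cluster.

    positions: genomic coordinates of 5'-capped tag starts for one promoter.
    trim_percent: percentage of tags discarded from each end (hypothetical).
    """
    if not positions:
        raise ValueError("a promoter needs at least one tag")
    if not 0 <= trim_percent < 50:
        raise ValueError("trim_percent must be in [0, 50)")
    tags = sorted(positions)
    n = len(tags)
    k = n * trim_percent // 100  # integer arithmetic avoids float rounding
    return tags[n - 1 - k] - tags[k] + 1


def classify_promoter(positions, max_focused_width=10, trim_percent=10):
    """Label a promoter 'focused' if its core width is small, else 'dispersed'."""
    width = core_width(positions, trim_percent)
    return "focused" if width <= max_focused_width else "dispersed"


sharp = [1000] * 8 + [1001, 1002]      # tags piled on one or two bases
broad = list(range(1000, 1200, 10))    # tags spread over ~200 bp
print(classify_promoter(sharp), classify_promoter(broad))  # focused dispersed
```

A trimmed width is less sensitive to a few stray tags than the full min-to-max span, which is why the sketch uses it instead.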
Author Summary
How are genes transcribed at the right levels and under the right conditions? Transcription regulation in eukaryotes has long been proposed to work by a division of labor: ubiquitous DNA sequence features in the core promoter region, close to the transcription start site (TSS) of genes, were thought to generically encode information to recruit RNA polymerase to initiate transcription, while specific sequence features, often distal from the genes, were thought to boost expression under the right conditions. Supporting the generic function of core promoters, genome-wide chromatin maps showed a stereotypical arrangement of well-spaced nucleosomes providing access to the TSS. High-throughput sequencing has generated genome-wide TSS maps at high resolution, which show that promoters exhibit different initiation patterns, ranging from focused start sites to dispersed regions. Linking these patterns to chromatin maps, we now find distinct core promoter classes, those in which the TSS location is defined broadly on the chromatin level and those in which the TSS is defined by precisely positioned sequence features. Notably, these architectures are conserved deeply across eukaryotes and are used for different functional classes of genes. Our work adds to the increasing understanding that core promoters contribute significantly to the complexity of eukaryotic gene expression.
doi:10.1371/journal.pgen.1001274
PMCID: PMC3020932  PMID: 21249180
12.  Butyrate Induced IGF2 Activation Correlated with Distinct Chromatin Signatures Due to Histone Modification 
Histone modification has emerged as a very important mechanism regulating the transcriptional status of the genome. Insulin-like growth factor 2 (IGF2) is a peptide hormone controlling various cellular processes, including proliferation and apoptosis. The H19 gene is closely linked to the IGF2 gene, and IGF2 and H19 are reciprocally regulated imprinted genes. The epigenetic signature of the H19 promoter (hypermethylation) on the paternal allele plays a vital role in allowing expression of the paternal allele of IGF2 [46]. Our previous studies demonstrated that butyrate regulates the expression of IGF2 as well as of genes encoding IGF binding proteins. To better understand histone modification and its potential to control IGF2/H19 gene expression, we investigated the modification status of key histones associated with the expression of the IGF2/H19 genes in bovine cells using RNA-seq in combination with ChIP-seq. A high-resolution map of the major chromatin modifications induced by butyrate at the IGF2/H19 locus was constructed to illustrate how the chromatin modification landscape may contribute to activation of the IGF2 gene. High-definition epigenomic landscape mapping revealed that IGF2 and H19 have distinct chromatin modification patterns at their coding and promoter regions, such as TSSs and TTSs. Moreover, the correlation between the differentially methylated regions (DMRs) of the IGF2/H19 locus and histone modification (acetylation and methylation) indicated that epigenetic signatures of DNA methylation, histone methylation and histone acetylation were differentially distributed on the expressed IGF2 and silenced H19 genes. Our evidence also suggests that butyrate-induced regional changes in histone acetylation status in the upstream regulatory domain of H19 may be related to the reduced expression of H19 and strong activation of IGF2.
Our results provided insights into the mechanism of butyrate-induced loss of imprinting (LOI) of IGF2 and regulation of gene expression by histone modification.
doi:10.4137/GRSB.S11243
PMCID: PMC3623616  PMID: 23645985
bovine; butyrate; ChIP-seq; chromatin; histone modification; IGF2
13.  Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data 
BMC Bioinformatics  2010;11(Suppl 1):S65.
Background
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
Methods
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied on seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
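The text describes labelled 500-bp promoter and non-promoter sequences derived from Pol-II ChIP-seq enrichment. The following is a minimal sketch of how such a labelled set might be assembled. It assumes summit-centred windows and a rule that calls a window "promoter" when an annotated TSS falls inside it. Both choices are hypothetical, and the study's exact benchmark construction may differ.

```python
from bisect import bisect_left

WINDOW = 500  # bp, the sequence length stated in the abstract


def summit_window(summit, width=WINDOW):
    """Half-open [start, end) window of `width` bp centred on a peak summit."""
    start = summit - width // 2
    return start, start + width


def label_windows(summits, tss_by_chrom, width=WINDOW):
    """Return (chrom, start, end, label) tuples.

    summits: iterable of (chrom, position) Pol-II peak summits.
    tss_by_chrom: dict chrom -> sorted list of annotated TSS positions.
    A window is labelled 'promoter' if any TSS falls inside it (hypothetical rule).
    """
    labelled = []
    for chrom, summit in summits:
        start, end = summit_window(summit, width)
        tss = tss_by_chrom.get(chrom, [])
        i = bisect_left(tss, start)  # first TSS at or after the window start
        is_promoter = i < len(tss) and tss[i] < end
        label = "promoter" if is_promoter else "non-promoter"
        labelled.append((chrom, start, end, label))
    return labelled


tss_by_chrom = {"chr1": [10_000, 50_000]}   # invented annotation
summits = [("chr1", 10_100), ("chr1", 30_000), ("chr2", 10_000)]
for row in label_windows(summits, tss_by_chrom):
    print(row)
```

Feature extraction (chromatin marks, sequence features) and the ensemble classifier itself would then operate on these labelled windows.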
Results
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Conclusion
Our new algorithm can successfully predict the promoters from the genome wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
doi:10.1186/1471-2105-11-S1-S65
PMCID: PMC3009539  PMID: 20122241
14.  The Groucho Co-repressor Is Primarily Recruited to Local Target Sites in Active Chromatin to Attenuate Transcription 
PLoS Genetics  2014;10(8):e1004595.
Gene expression is regulated by the complex interaction between transcriptional activators and repressors, which function in part by recruiting histone-modifying enzymes to control accessibility of DNA to RNA polymerase. The evolutionarily conserved family of Groucho/Transducin-Like Enhancer of split (Gro/TLE) proteins act as co-repressors for numerous transcription factors. Gro/TLE proteins act in several key pathways during development (including Notch and Wnt signaling), and are implicated in the pathogenesis of several human cancers. Gro/TLE proteins form oligomers and it has been proposed that their ability to exert long-range repression on target genes involves oligomerization over broad regions of chromatin. However, analysis of an endogenous gro mutation in Drosophila revealed that oligomerization of Gro is not always obligatory for repression in vivo. We have used chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) to profile Gro recruitment in two Drosophila cell lines. We find that Gro predominantly binds at discrete peaks (<1 kilobase). We also demonstrate that blocking Gro oligomerization does not reduce peak width as would be expected if Gro oligomerization induced spreading along the chromatin from the site of recruitment. Gro recruitment is enriched in “active” chromatin containing developmentally regulated genes. However, Gro binding is associated with local regions containing hypoacetylated histones H3 and H4, which is indicative of chromatin that is not fully open for efficient transcription. We also find that peaks of Gro binding frequently overlap the transcription start sites of expressed genes that exhibit strong RNA polymerase pausing and that depletion of Gro leads to release of polymerase pausing and increased transcription at a bona fide target gene. 
Our results demonstrate that Gro is recruited to local sites by transcription factors to attenuate rather than silence gene expression by promoting histone deacetylation and polymerase pausing.
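The peak-width argument can be made concrete. If oligomerization drove spreading along chromatin, blocking it should shift the width distribution downwards. The minimal sketch below assumes peaks are given as half-open (start, end) intervals for each condition. The 1-kb cutoff mirrors the abstract, while the condition names and coordinates are invented.

```python
from statistics import median


def width_summary(peaks, cutoff=1000):
    """Median peak width and fraction of peaks narrower than `cutoff` bp.

    peaks: list of (start, end) half-open intervals from a peak caller.
    """
    widths = [end - start for start, end in peaks]
    if not widths:
        raise ValueError("no peaks")
    narrow = sum(w < cutoff for w in widths)
    return median(widths), narrow / len(widths)


# Invented coordinates for two hypothetical conditions.
wild_type = [(0, 400), (1000, 1600), (5000, 7000), (9000, 9300)]
oligomer_blocked = [(0, 380), (1000, 1650), (5000, 6900), (9000, 9310)]

for name, peaks in [("wild type", wild_type), ("oligomer-blocked", oligomer_blocked)]:
    med, frac = width_summary(peaks)
    print(f"{name}: median {med} bp, {frac:.0%} under 1 kb")
```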
Author Summary
Repression by transcription factors plays a central role in gene regulation. The Groucho/Transducin-Like Enhancer of split (Gro/TLE) family of co-repressors interacts with many different transcription factors and has many essential roles during animal development. Groucho/TLE proteins form oligomers that are necessary for target gene repression in some contexts. We have profiled the genome-wide recruitment of the founding member of this family, Groucho (from Drosophila) to gain insight into how and where it binds with respect to target genes and to identify factors associated with its binding. We find that Groucho binds in discrete peaks, frequently at transcription start sites, and that blocking Groucho from forming oligomers does not significantly change the pattern of Groucho recruitment. Although Groucho acts as a repressor, Groucho binding is enriched in chromatin that is permissive for transcription, and we find that it acts to attenuate rather than completely silence target gene expression. Thus, Groucho does not act as an “on/off” switch on target gene expression, but rather as a “mute” button.
doi:10.1371/journal.pgen.1004595
PMCID: PMC4148212  PMID: 25165826
15.  The NSL Complex Regulates Housekeeping Genes in Drosophila 
PLoS Genetics  2012;8(6):e1002736.
MOF is the major histone H4 lysine 16-specific (H4K16) acetyltransferase in mammals and Drosophila. In flies, it is involved in the regulation of X-chromosomal and autosomal genes as part of the MSL and the NSL complexes, respectively. While the function of the MSL complex as a dosage compensation regulator is fairly well understood, the role of the NSL complex in gene regulation is still poorly characterized. Here we report a comprehensive ChIP–seq analysis of four NSL complex members (NSL1, NSL3, MBD-R2, and MCRS2) throughout the Drosophila melanogaster genome. Strikingly, the majority (85.5%) of NSL-bound genes are constitutively expressed across different cell types. We find that an increased abundance of the histone modifications H4K16ac, H3K4me2, H3K4me3, and H3K9ac in gene promoter regions is characteristic of NSL-targeted genes. Furthermore, we show that these genes have a well-defined nucleosome free region and broad transcription initiation patterns. Finally, by performing ChIP–seq analyses of RNA polymerase II (Pol II) in NSL1- and NSL3-depleted cells, we demonstrate that both NSL proteins are required for efficient recruitment of Pol II to NSL target gene promoters. The observed Pol II reduction coincides with compromised binding of TBP and TFIIB to target promoters, indicating that the NSL complex is required for optimal recruitment of the pre-initiation complex on target genes. Moreover, genes that undergo the most dramatic loss of Pol II upon NSL knockdowns tend to be enriched in DNA Replication–related Element (DRE). Taken together, our findings show that the MOF-containing NSL complex acts as a major regulator of housekeeping genes in flies by modulating initiation of Pol II transcription.
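Testing whether the genes with the largest Pol II loss are enriched for the DRE amounts to comparing motif frequencies between gene groups. The minimal sketch below assumes promoter sequences have already been extracted. It matches the commonly cited DRE consensus 5′-TATCGATA-3′ as an exact string, which is a simplification, since real analyses usually score position weight matrices. The group names and sequences are invented.

```python
DRE = "TATCGATA"  # DNA replication-related element consensus

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (upper-cased first)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_dre(seq, motif=DRE):
    """True if the exact motif occurs on either strand of `seq`."""
    s = seq.upper()
    return motif in s or reverse_complement(motif) in s


def dre_fraction(promoters):
    """Fraction of promoter sequences that contain a DRE."""
    if not promoters:
        return 0.0
    return sum(has_dre(p) for p in promoters) / len(promoters)


# Invented promoter snippets for two hypothetical gene groups.
strong_pol2_loss = ["GGTATCGATACC", "atatcgataa", "GGGGCCCC"]
other_nsl_targets = ["ACGTACGT", "TTTTAAAA"]
print(dre_fraction(strong_pol2_loss), dre_fraction(other_nsl_targets))
```

Because TATCGATA is its own reverse complement, the second strand check never changes the result for this motif. It is kept so the function also works for non-palindromic motifs.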
Author Summary
Housekeeping genes are required to support basic cellular functions and are therefore expressed constitutively in all tissues. Although the homeostasis of housekeeping gene expression is vital for cell survival, most research on transcription initiation has focused on TATA-box-containing promoters of inducible and developmental genes, while regulatory mechanisms at the TATA-less promoters of housekeeping genes have remained poorly understood. Using genome-wide chromatin binding profiles, we find that the NSL complex, a histone acetyltransferase-containing complex, is bound to the majority of constitutively active gene promoters. We show that NSL-bound genes display specific sets of DNA motifs, well-defined nucleosome free regions, and broad transcription initiation patterns. In addition, we show that the NSL complex regulates the recruitment of the basal transcription machinery to target promoters; more specifically, we can pinpoint its role to the early steps of Pol II recruitment. Interestingly, we also see that NSL-bound genes are most susceptible to Pol II loss after depletion of NSLs when they contain the DNA Replication–related Element (DRE). Taken together, we provide a genome-wide analysis of a chromatin-modifying complex that is globally involved in the regulation of housekeeping gene expression.
doi:10.1371/journal.pgen.1002736
PMCID: PMC3375229  PMID: 22723752
16.  Identification of regulatory regions of bidirectional genes in cervical cancer 
BMC Medical Genomics  2013;6(Suppl 1):S5.
Background
Bidirectional promoters are promoter sequences shared by a divergent gene pair (two genes lying close to each other on opposite strands) and can regulate the genes in both directions. In the human genome, >10% of protein-coding genes are arranged head-to-head on opposite strands, with transcription start sites separated by <1,000 base pairs. Many transcription factor binding sites that occur in bidirectional promoters influence the expression of both opposing genes. Recently, RNA polymerase II (RPol II) ChIP-seq data have been used to identify the promoters of coding genes and non-coding RNAs. However, bidirectional promoters have not yet been identified from RPol II ChIP-seq data.
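The >10% figure rests on a simple geometric definition of a divergent pair. The minimal sketch below implements that definition and assumes each gene record has a single annotated TSS, although real annotations often list alternative TSSs. The function name and gene records are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def divergent_pairs(genes, max_gap=1000):
    """Find head-to-head gene pairs whose TSSs lie less than `max_gap` bp apart.

    genes: iterable of (name, chrom, strand, tss) with strand '+' or '-'.
    A pair is a '-' strand gene whose TSS is at or below (upstream of) a
    '+' strand gene's TSS, so the two genes are transcribed away from each other.
    Returns (minus_gene, plus_gene, distance) tuples.
    """
    by_chrom = defaultdict(lambda: {"+": [], "-": []})
    for name, chrom, strand, tss in genes:
        by_chrom[chrom][strand].append((tss, name))

    pairs = []
    for chrom, strands in by_chrom.items():
        plus = sorted(strands["+"])
        plus_tss = [tss for tss, _ in plus]
        for m_tss, m_name in sorted(strands["-"]):
            i = bisect_left(plus_tss, m_tss)  # first '+' TSS not upstream of m_tss
            while i < len(plus) and plus_tss[i] - m_tss < max_gap:
                pairs.append((m_name, plus[i][1], plus_tss[i] - m_tss))
                i += 1
    return pairs


genes = [
    ("A", "chr1", "-", 5000), ("B", "chr1", "+", 5400), ("C", "chr1", "+", 9000),
    ("D", "chr1", "-", 20000), ("E", "chr1", "+", 19500),  # D/E overlap, not divergent
    ("F", "chr2", "-", 100), ("G", "chr2", "+", 900),
]
print(divergent_pairs(genes))
```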
Results
In some bidirectional promoter regions, RPol II binding forms a bi-peak shape, which indicates that 2 promoters are located in the bidirectional region. We have developed a computational approach to identify the regulatory regions of all divergent gene pairs using genome-wide RPol II binding patterns derived from ChIP-seq data, based on the assumption that the RPol II binding pattern around a bidirectional promoter is the sum of RPol II binding at 2 promoters. In HeLa S3 cells, 249 promoter pairs and 1094 single promoters were identified; of the single promoters, 76 cover only positive-strand genes, 86 cover only negative-strand genes, and 932 cover 2 genes. Gene expression levels and STAT1 binding sites were then examined for the different promoter categories.
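The study models the RPol II signal over a bidirectional region as the sum of binding at two promoters. The sketch below uses a much simpler stand-in, shown only to illustrate what a "bi-peak" shape means. It smooths binned coverage and asks whether two tall maxima are separated by a clear dip. This is a swapped-in heuristic rather than the authors' model, and the thresholds and coverage values are hypothetical.

```python
def moving_average(values, k=3):
    """Centred moving average; windows are truncated at the profile ends."""
    half = k // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def local_maxima(profile, min_height):
    """Indices of interior local maxima at least `min_height` high."""
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] >= min_height
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]


def is_bipeak(coverage, min_height=5.0, dip_ratio=0.5, k=3):
    """Two tall maxima separated by a valley below dip_ratio x the lower peak."""
    smooth = moving_average(coverage, k)
    peaks = local_maxima(smooth, min_height)
    if len(peaks) < 2:
        return False
    tallest = sorted(peaks, key=lambda i: smooth[i], reverse=True)[:2]
    left, right = sorted(tallest)
    valley = min(smooth[left:right + 1])
    return valley <= dip_ratio * min(smooth[left], smooth[right])


two_promoters = [0, 2, 8, 10, 8, 2, 1, 2, 8, 10, 8, 2, 0]
one_promoter = [0, 1, 3, 6, 10, 12, 10, 6, 3, 1, 0]
print(is_bipeak(two_promoters), is_bipeak(one_promoter))  # True False
```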
Conclusions
Identifying the regulatory regions of bidirectional promoters from RPol II binding patterns provides important temporal and spatial measurements of transcription initiation. The gene expression and transcription factor binding site analyses suggest that the promoters in bidirectional regions may regulate the closest gene, and that STAT1 is involved at the primary promoter.
doi:10.1186/1755-8794-6-S1-S5
PMCID: PMC3552671  PMID: 23369456
17.  A chromatin code for alternative splicing involving a putative association between CTCF and HP1α proteins 
BMC Biology  2015;13:31.
Background
Alternative splicing is primarily controlled by the activity of splicing factors and by the elongation of the RNA polymerase II (RNAPII). Recent experiments have suggested a new complex network of splicing regulation involving chromatin, transcription and multiple protein factors. In particular, the CCCTC-binding factor (CTCF), the Argonaute protein AGO1, and members of the heterochromatin protein 1 (HP1) family have been implicated in the regulation of splicing associated with chromatin and the elongation of RNAPII. These results raise the question of whether these proteins may associate at the chromatin level to modulate alternative splicing.
Results
Using chromatin immunoprecipitation sequencing (ChIP-Seq) data for CTCF, AGO1, HP1α, H3K27me3, H3K9me2, H3K36me3, RNAPII, total H3 and 5metC and alternative splicing arrays from two cell lines, we have analyzed the combinatorial code of their binding to chromatin in relation to the alternative splicing patterns between two cell lines, MCF7 and MCF10. Using Machine Learning techniques, we identified the changes in chromatin signals that are most significantly associated with splicing regulation between these two cell lines. Moreover, we have built a map of the chromatin signals on the pre-mRNA, that is, a chromatin-based RNA-map, which can explain 606 (68.55%) of the regulated events between MCF7 and MCF10. This chromatin code involves the presence of HP1α, CTCF, AGO1, RNAPII and histone marks around regulated exons and can differentiate patterns of skipping and inclusion. Additionally, we found a significant association of HP1α and CTCF activities around the regulated exons and a putative DNA binding site for HP1α.
Conclusions
Our results show that a considerable number of alternative splicing events could have a chromatin-dependent regulation involving the association of HP1α and CTCF near regulated exons. Additionally, we find further evidence for the involvement of HP1α and AGO1 in chromatin-related splicing regulation.
doi:10.1186/s12915-015-0141-5
PMCID: PMC4446157  PMID: 25934638
Chromatin; Splicing; Histones; Splicing code
18.  Discovery of active enhancers through bidirectional expression of short transcripts 
Genome Biology  2011;12(11):R113.
Background
Long-range regulatory elements, such as enhancers, exert substantial control over tissue-specific gene expression patterns. Genome-wide discovery of functional enhancers in different cell types is important for our understanding of genome function as well as human disease etiology.
Results
In this study, we developed an in silico approach to model the previously reported phenomenon of transcriptional pausing, accompanied by divergent transcription, at active promoters. We then used this model for large-scale prediction of non-promoter-associated bidirectional expression of short transcripts. Our predictions were significantly enriched for DNase hypersensitive sites, histone H3 lysine 27 acetylation (H3K27ac), and other chromatin marks associated with active rather than poised or repressed enhancers. We also detected modest bidirectional expression at binding sites of the CCCTC-binding factor (CTCF) genome-wide, particularly those that overlap H3K27ac.
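The study learns its bidirectional-expression signature from promoter-proximal pausing. The sketch below is a deliberately simpler, hypothetical count-based rule that captures only the core idea of divergent transcription: minus-strand reads just upstream of a candidate site and plus-strand reads just downstream of it. The window size, read thresholds and balance ratio are invented parameters.

```python
def strand_counts(reads, site, window=300):
    """Count divergent read evidence around a candidate site.

    reads: iterable of (position, strand) for the 5' ends of short transcripts.
    Returns (minus-strand reads in [site - window, site),
             plus-strand reads in [site, site + window)).
    """
    upstream_minus = 0
    downstream_plus = 0
    for pos, strand in reads:
        if strand == "-" and site - window <= pos < site:
            upstream_minus += 1
        elif strand == "+" and site <= pos < site + window:
            downstream_plus += 1
    return upstream_minus, downstream_plus


def is_bidirectional(reads, site, window=300, min_reads=3, min_balance=0.25):
    """Call a site bidirectional if both arms have reads and are roughly balanced."""
    up, down = strand_counts(reads, site, window)
    if min(up, down) < min_reads:
        return False
    return min(up, down) / max(up, down) >= min_balance


# Invented reads around a candidate site at position 1000.
reads = [(950, "-")] * 4 + [(1050, "+")] * 5 + [(1500, "+")] * 10
one_way = [(1050, "+")] * 10
print(is_bidirectional(reads, 1000), is_bidirectional(one_way, 1000))
```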
Conclusions
Our findings indicate that the signature of bidirectional expression of short transcripts, learned from promoter-proximal transcriptional pausing, can be used to predict active long-range regulatory elements genome-wide, likely due in part to specific association of RNA polymerase with enhancer regions.
doi:10.1186/gb-2011-12-11-r113
PMCID: PMC3334599  PMID: 22082242
19.  Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements 
BMC Biology  2011;9:80.
Background
Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases.
Results
Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription through long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less so with acetylation of lysine 27 on histone H3 or p300 binding.
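Clustering peaks from many transcription factor data sets can be as simple as merging overlapping intervals and recording which factors contribute to each merged region. The minimal sketch below assumes each peak is (chrom, start, end, factor) with half-open coordinates. The merge rule (overlapping or touching intervals join a cluster) and the example peaks are hypothetical and need not match the study's clustering.

```python
def cluster_peaks(peaks, max_gap=0):
    """Merge peaks into clusters and record the factors in each.

    peaks: iterable of (chrom, start, end, factor), half-open coordinates.
    A peak joins the current cluster if it starts no more than `max_gap` bp
    after the cluster's end (so touching intervals are merged when max_gap=0).
    Returns (chrom, start, end, frozenset_of_factors) tuples.
    """
    clusters = []
    for chrom, start, end, factor in sorted(peaks):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and start <= last[2] + max_gap:
            last[2] = max(last[2], end)
            last[3].add(factor)
        else:
            clusters.append([chrom, start, end, {factor}])
    return [(c, s, e, frozenset(f)) for c, s, e, f in clusters]


peaks = [
    ("chr1", 100, 200, "CTCF"), ("chr1", 150, 260, "MAX"),
    ("chr1", 255, 300, "JUND"), ("chr1", 1000, 1100, "CTCF"),
    ("chr2", 100, 200, "MAX"),
]
for cluster in cluster_peaks(peaks):
    print(cluster)
```

The number of distinct factors per cluster can then be compared with histone marks and open-chromatin data at the same coordinates.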
Conclusion
By integrating genome-wide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
doi:10.1186/1741-7007-9-80
PMCID: PMC3239327  PMID: 22115494
transcription factor; ChIP-Seq; histone modification; chromatin
20.  Defining the budding yeast chromatin-associated interactome 
We report here the first large-scale affinity purification and mass spectrometry (AP-MS) study of chromatin-associated proteins, in which over 100 different baits involved in chromatin biology were studied by modified chromatin immunopurification (mChIP)-MS. In particular, focus was placed on poorly studied chromatin-binding proteins, such as transcription factors, which have been underrepresented in previous AP-MS studies. mChIP-MS analysis of transcription factors identified dense networks of chromatin-associated proteins composed of specific transcriptional co-activators, information not accessible through classical AP-MS methods. Finally, we demonstrate that novel protein–protein interactions identified in this study by mChIP have functional implications, exemplified by detailed studies of both the ubiquitination of the proline isomerase Cpr1 and the histone chaperones involved in regulating the HTA1-HTB1 promoter. Our work demonstrates the value of targeted interactome studies, in which affinity purification methods are adapted to the needs of specific baits, as is the case for chromatin-binding proteins.
The maintenance of cellular fitness requires living organisms to integrate multiple signals into coordinated outputs. Central to this process is the regulation of the expression of the genetic information encoded in DNA. As a result, there are numerous constraints imposed on gene expression. The access to DNA is restricted by the formation of nucleosomes, in which DNA is wrapped around histone octamers to form chromatin wherein the volume of DNA is considerably reduced. As such, nucleosome positioning is critical and must be defined precisely, particularly during transcription (Workman, 2006). Furthermore, nucleosomes can be actively assembled/disassembled by histone chaperones and can be made to ‘slide' along DNA by the actions of chromatin remodelers. Moreover, the histone proteins are heavily regulated at the expression level and by extensive post-translational modifications (PTMs) (Campos and Reinberg, 2009). Histone PTMs have also been shown to help recruit numerous chromatin-associated factors in accordance with the histone code (Strahl and Allis, 2000). Although our understanding of chromatin and its roles has improved, we still have limited knowledge of the chromatin-associated protein complexes and their interactions.
The characterization of biological systems, and of specific subdomains within them such as chromatin, remains a difficult task. An efficient approach to gaining insight into the function of a protein is to define its interactome. The underlying principle of protein interaction mapping is that proteins found to interact must share common processes and localization, i.e., guilt by association. Large-scale mapping of protein interactions makes it possible to annotate proteins of unknown function, implicate proteins of known function in different processes, and derive new hypotheses. This is possible because most proteins do not act in isolation but rather as part of complexes, and thus have interaction partners that can now be detected with the right tools. AP-MS has emerged as a powerful tool for characterizing protein–protein interactions and biological systems in general (Gingras et al, 2007; Gstaiger and Aebersold, 2009).
Recently, we reported the development of a novel affinity purification approach termed mChIP, which was designed to improve the characterization of the interactomes of DNA-binding proteins (Lambert et al, 2009). The mChIP method consists of a single affinity purification step, whereby chromatin-associated proteins are isolated from mildly sonicated and gently clarified cellular extracts using magnetic beads coated with antibodies (Lambert et al, 2009; Figure 1A). As such, the mChIP approach keeps chromatin fragments in solution, enabling their specific purification, something not previously possible with classical AP-MS methods (Lambert et al, 2009).
In this study, we report the use of mChIP followed by MS to characterize more than 100 proteins and their associated protein networks (Figure 1B). We initially focused on DNA-associated proteins that had been poorly characterized in past AP-MS studies, such as transcription factors. In addition, many histone modifiers, such as lysine acetyltransferases (KATs) and lysine methyltransferases, critical components of chromatin function and regulation, were also studied by mChIP. This resulted in a raw non-redundant mChIP-MS data set containing ∼9000 protein–protein interactions between ∼900 proteins. Following a two-step curation process designed to remove common contaminants and proteins not specifically associated with the baits under study, a high-confidence mChIP-MS data set was produced containing 2966 protein–protein interactions between 724 proteins (Figure 1B). Importantly, our curation strategy retained the majority of the protein–protein interactions identified in previous AP-MS studies, while removing the bulk of the interactions unrelated to chromatin biology. Further analysis of the mChIP-MS data set revealed that for most baits tested, mChIP-MS identified more interaction partners than classical TAP-MS.
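The two-step curation is described only at a high level. The sketch below shows one plausible two-step filter, assuming spectral counts per bait–prey pair. The first step is a frequency filter against preys recovered by most baits, and the second is a per-bait minimum-evidence filter. The thresholds, function name and toy counts are hypothetical and are not the authors' published criteria.

```python
from collections import Counter


def curate(purifications, max_frequency=0.5, min_spectra=2):
    """Two-step filter for AP-MS bait-prey data (hypothetical rules).

    purifications: dict bait -> dict prey -> spectral count.
    Step 1: drop preys seen in more than `max_frequency` of all purifications
            (frequent fliers such as highly abundant proteins).
    Step 2: within each bait, drop self-hits and preys below `min_spectra`.
    Returns (curated dict, set of removed frequent preys).
    """
    n = len(purifications)
    seen_in = Counter(prey for preys in purifications.values() for prey in preys)
    frequent = {prey for prey, count in seen_in.items() if count / n > max_frequency}
    curated = {
        bait: {
            prey: spectra
            for prey, spectra in preys.items()
            if prey not in frequent and prey != bait and spectra >= min_spectra
        }
        for bait, preys in purifications.items()
    }
    return curated, frequent


# Invented spectral counts; 'abundant1' stands in for a common contaminant.
toy = {
    "Msn2": {"Msn2": 50, "abundant1": 30, "preyA": 5},
    "Msn4": {"abundant1": 25, "preyB": 4, "preyC": 1},
    "Cpr1": {"abundant1": 40, "Bre1": 3, "Lge1": 2},
}
curated, removed = curate(toy)
print(removed, curated)
```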
Visualization of the mChIP-MS data set was achieved by generating heat maps from two-dimensional hierarchical clustering of the bait–prey interactions. This revealed numerous clusters within our data set supporting functional relationships. For instance, mChIP analysis showed that the highly homologous heat-shock-inducible transcription factors Msn2 and Msn4 clustered with different transcriptional co-activators. Importantly, our analysis also revealed key differences in the co-activators associated with Msn2 and Msn4 relevant to their function. Another example, which we explore in greater detail, is the Cpr1 proline isomerase, a known member of the Set3 complex (Pijnappel et al, 2001). mChIP-MS analysis of Cpr1 revealed an extended network of associated proteins, including the E3 ubiquitin ligase Bre1 and its binding partner Lge1 (Figure 5A). This association raised the possibility that Bre1/Lge1 acts directly on Cpr1 to ubiquitinate it. In targeted experiments, we observed that Cpr1 is in fact ubiquitinated in a process involving Bre1/Lge1 (Figure 5E), confirming their functional relationship. As such, mChIP is capable of uncovering novel protein–protein interactions with physiological impacts.
In this study, we report how the use of an AP-MS method designed for a given class of proteins (chromatin-associated proteins) can help uncover numerous novel protein–protein interactions. Furthermore, our work detected dense chromatin-associated protein networks co-purifying with multiple transcription factors and other DNA-binding proteins. The fact that thousands of novel protein–protein interactions can be detected even in the best-characterized model organism, Saccharomyces cerevisiae, supports our view that targeted interactome studies are worthwhile and desirable. As such, the budding yeast interactome should still be considered incomplete and warrants further study.
We previously reported a novel affinity purification (AP) method termed modified chromatin immunopurification (mChIP), which permits selective enrichment of DNA-bound proteins along with their associated protein networks. In this study, we report a large-scale study of the protein networks of 102 chromatin-related proteins from budding yeast that were analyzed by mChIP coupled to mass spectrometry. This effort resulted in the detection of 2966 high-confidence protein associations with 724 distinct preys. mChIP significantly improved interaction coverage compared with classical AP methodology for ∼75% of the baits tested. Furthermore, mChIP successfully identified novel binding partners for many lower-abundance transcription factors for which conventional AP methodologies had previously failed. mChIP was also used to perform targeted studies, particularly of Asf1 and its associated proteins, to gain an understanding of the physical interplay between Asf1 and two other histone chaperones, Rtt106 and the HIR complex.
doi:10.1038/msb.2010.104
PMCID: PMC3018163  PMID: 21179020
affinity purification; chromatin-associated protein networks; mass spectrometry; nucleosome assembly factor Asf1; protein–DNA interaction
21.  GAGA Factor Maintains Nucleosome-Free Regions and Has a Role in RNA Polymerase II Recruitment to Promoters 
PLoS Genetics  2015;11(3):e1005108.
Previous studies have shown that GAGA Factor (GAF) is enriched on promoters with paused RNA Polymerase II (Pol II), but its genome-wide function and mechanism of action remain largely uncharacterized. We assayed the levels of transcriptionally engaged polymerase using global run-on sequencing (GRO-seq) in control and GAF-RNAi Drosophila S2 cells and found that promoter-proximal polymerase was significantly reduced on a large subset of paused promoters where GAF occupancy was reduced by knockdown. These promoters show a dramatic increase in nucleosome occupancy upon GAF depletion. These results, in conjunction with previous studies showing that GAF directly interacts with nucleosome remodelers, strongly support a model in which GAF directs nucleosome displacement at the promoter and thereby allows Pol II to enter the promoter and pause sites. This action of GAF on nucleosomes is at least partially independent of paused Pol II, because intergenic GAF binding sites with little or no Pol II also show GAF-dependent nucleosome displacement. In addition, the insulator factor BEAF, the BEAF-interacting protein Chriz, and the transcription factor M1BP are strikingly enriched on those GAF-associated genes where pausing is unaffected by knockdown, suggesting that insulators or the alternative promoter-associated factor M1BP protect a subset of GAF-bound paused genes from the effects of GAF knockdown. Thus, GAF binding at promoters can lead to the local displacement of nucleosomes, but this activity can be restricted or compensated for when insulator protein or M1BP complexes also reside at GAF-bound promoters.
Author Summary
Transcriptional regulation is critical for proper gene expression in response to environmental changes and developmental programs. Eukaryotes have evolved multiple mechanisms by which transcription factors regulate transcription. One mechanism is the reorganization of chromatin to allow Pol II recruitment. Another is the release of promoter-proximal paused Pol II, in which Pol II that has halted 20–60 bases downstream of the transcription start site (TSS) is allowed to enter productive elongation through the gene body. The Drosophila transcription factor GAF binds to genes that undergo pausing and interacts with nucleosome remodelers and the pausing factor NELF. Thus, GAF can regulate multiple steps necessary for transcription, but its mechanistic role genome-wide is not fully understood. We depleted GAF from cells and examined the genome-wide changes in Pol II and nucleosome distributions across genes. We found that GAF depletion reduces polymerase density at genes where GAF binds just upstream of the TSS, and causes nucleosomes to move into the promoter region. Our results show that GAF is important for maintaining promoter accessibility, allowing Pol II to be recruited to promoters and to enter the pause sites downstream of the TSS. Thus, GAF is critical for providing the chromatin environment necessary for proper control of gene expression.
doi:10.1371/journal.pgen.1005108
PMCID: PMC4376892  PMID: 25815464
22.  Integration of Hi-C and ChIP-seq data reveals distinct types of chromatin linkages 
Nucleic Acids Research  2012;40(16):7690-7704.
We have analyzed publicly available K562 Hi-C data, which enable unbiased, genome-wide capture of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA factor but also affects the expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
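The abstract scores Hi-C interactions against a power-law decay background. As a minimal sketch of that idea only, assuming the expected contact count at genomic distance d follows E(d) = c·d^(-alpha), the block below fits c and alpha by least squares in log-log space and then scores one locus pair by the Poisson upper-tail probability of its observed count. This is not the authors' Mixture Poisson Regression Model; the function names and the toy counts are hypothetical.

```python
import math


def fit_power_law(distances, counts):
    """Fit log(count) = log(c) - alpha * log(distance) by ordinary least squares.

    Returns (c, alpha). Points with non-positive distance or count are skipped
    because their logarithm is undefined.
    """
    xs, ys = [], []
    for d, n in zip(distances, counts):
        if d > 0 and n > 0:
            xs.append(math.log(d))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need at least two usable (distance, count) points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean), computed as 1 - P(X <= observed - 1).

    Suitable for modest means only: exp(-mean) underflows to 0 for mean above ~700.
    """
    cdf = 0.0
    term = math.exp(-mean)  # P(X = 0)
    for i in range(observed):
        cdf += term              # add P(X = i)
        term *= mean / (i + 1)   # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Toy background (hypothetical): counts decaying exactly as 1000 * d^-1.
dists = [1, 2, 4, 8, 16]
background = [1000 / d for d in dists]
c, alpha = fit_power_law(dists, background)

# Score a hypothetical locus pair at distance 10 with 150 observed contacts.
expected = c * 10 ** -alpha  # about 100 contacts expected from the background
p_value = poisson_upper_tail(150, expected)
print(f"c={c:.1f}, alpha={alpha:.2f}, expected={expected:.1f}, p={p_value:.2e}")
```

A pair whose observed count far exceeds the distance-dependent expectation receives a small tail probability, which is the sense in which a power-law background separates specific interactions from the general decay of contact frequency with distance.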
doi:10.1093/nar/gks501
PMCID: PMC3439894  PMID: 22675074
23.  A Global Clustering Algorithm to Identify Long Intergenic Non-Coding RNA - with Applications in Mouse Macrophages 
PLoS ONE  2011;6(9):e24051.
Identification of diffuse signals from chromatin immunoprecipitation followed by high-throughput massively parallel sequencing (ChIP-Seq) poses significant computational challenges, and few methods are currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to SICER, a local clustering method also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to the lincRNAs are enriched for biological functions related to metabolic processes under resting conditions but for developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and predicted secondary structures consistent with expectations. Last, they are enriched for motifs of transcription factors such as PU.1 and AP-1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap distal enhancer markers. In summary, GCLS, our global clustering method applied to RNA polymerase II and H3K4Me3 ChIP-Seq data, can effectively detect putative lincRNAs that exhibit the expected characteristics, as demonstrated here in macrophages.
doi:10.1371/journal.pone.0024051
PMCID: PMC3184070  PMID: 21980340
24.  Identification and characterization of putative methylation targets in the MAOA locus using bioinformatic approaches 
Monoamine oxidase A (MAO A) is an enzyme that catalyzes the oxidation of neurotransmitter amines. A functional polymorphism in the human MAOA gene (the high- and low-MAOA variants) has been associated with distinct behavioral phenotypes. To investigate directly the biological mechanism whereby this polymorphism influences brain function, we recently measured the activity of the MAO A enzyme in healthy volunteers. When we found no relationship between an individual's brain MAO A level and MAOA genotype, we postulated that additional regulatory mechanisms control MAOA expression. Given that DNA methylation is linked to the regulation of gene expression, we hypothesized that epigenetic mechanisms contribute to MAOA expression. Our underlying assumption was that differences in an individual's genotype play a key role in the epigenetic potential of the MAOA locus and, consequently, determine the individual's level of MAO A activity in the brain. As a first step towards experimental validation of this hypothesis, we performed a comprehensive bioinformatic analysis to interrogate genomic features and attributes of the MAOA locus that might modulate its epigenetic sensitivity. The major findings of our analysis are the following: (1) the extended MAOA regulatory region contains two CpG islands (CGIs), one of which overlaps the canonical MAOA promoter while the other is located further upstream; both CGIs exhibit sensitivity to differential methylation. (2) The effect of the uVNTR (upstream variable-number tandem repeat) on MAOA transcriptional activity might be epigenetic in nature: this polymorphic region resides within the MAOA CGI and itself contains CpGs, so the number of repeat units effectively changes the number of methylatable cytosines in the MAOA promoter.
An array of in silico analyses (nucleosome positioning, the physical properties of the local DNA, the clustering of transcription-factor binding sites), together with experimental data on histone modifications and Pol II binding sites and data from the RefSeq mRNA library, suggests that the MAOA gene might have an alternative promoter. Based on our findings, we propose a regulatory mechanism for human MAOA in which MAOA expression in vivo proceeds through tissue-specific transcripts initiated from alternative promoters (both CGI-associated), with transcriptional activation of a particular promoter under epigenetic control.
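Finding (2) rests on simple arithmetic: if each repeat unit of the uVNTR carries CpGs, every additional copy adds methylatable cytosines to the promoter. The sketch below illustrates only that counting argument. The flanking sequences and the repeat unit are hypothetical placeholders chosen to contain a known number of CpGs; they are not the real MAOA promoter or uVNTR sequence, and `count_cpg` and `build_allele` are illustrative helpers.

```python
def count_cpg(seq):
    """Count CpG dinucleotides (a C immediately followed by a G) on one strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G")


def build_allele(left_flank, repeat_unit, copies, right_flank):
    """Assemble a toy promoter fragment carrying `copies` tandem repeat units."""
    return left_flank + repeat_unit * copies + right_flank


# Hypothetical placeholder sequences, NOT the real MAOA promoter or uVNTR unit.
LEFT = "ATCGTTA"        # 1 CpG
UNIT = "ACCGGCACCGGT"   # 2 CpGs per repeat unit; junctions create no extra CpGs
RIGHT = "GGCGAT"        # 1 CpG

cpg_by_copies = {n: count_cpg(build_allele(LEFT, UNIT, n, RIGHT)) for n in (3, 4, 5)}
print(cpg_by_copies)  # {3: 8, 4: 10, 5: 12}
```

Counting one strand suffices because CG is its own reverse complement, so each CpG site is present on both strands. In this toy example each extra repeat copy adds two CpG sites.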
PMCID: PMC3169210  PMID: 20421737
human MAOA gene; epigenetic regulation; DNA methylation; epigenetic potential; computational analysis
25.  High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions 
PLoS Computational Biology  2010;6(9):e1000916.
Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large numbers of sites and produce an unreliable list of target genes. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high-resolution data on in vitro TF binding specificities. PBM data have been analyzed either by estimating PSSMs or via rank statistics on probe intensities, so that individual sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned a list of thousands of scored sequence patterns. Meanwhile, high-resolution in vivo TF occupancy data from ChIP-seq experiments are also increasingly available. We have developed a flexible discriminative framework for learning TF binding preferences from high-resolution in vitro and in vivo data. We first trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. We used a novel k-mer based string kernel called the di-mismatch kernel to represent probe sequence similarities. The SVR models are more compact than E-scores, more expressive than PSSMs, and can be readily used to scan genomic regions to predict in vivo occupancy. Using a large data set of yeast and mouse TFs, we found that our SVR models can better predict probe intensity than the E-score method or PBM-derived PSSMs. Moreover, by using SVRs to score yeast, mouse, and human genomic regions, we were better able to predict genomic occupancy as measured by ChIP-chip and ChIP-seq experiments. 
Finally, we found that by training kernel-based models directly on ChIP-seq data, we greatly improved in vivo occupancy prediction, and by comparing a TF's in vitro and in vivo models, we could identify cofactors and disambiguate direct and indirect binding.
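The abstract builds SVR models on a k-mer based string kernel, the di-mismatch kernel. That kernel is not reproduced here. As a hedged illustration of what a k-mer string kernel computes, the sketch below swaps in the simpler k-mer spectrum kernel, which scores two sequences by the inner product of their overlapping k-mer count vectors, and builds a normalized Gram matrix from it. The function names and probe sequences are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (uppercased)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """k-mer spectrum kernel: inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[kmer] for kmer, n in cx.items())


def normalized_kernel(x, y, k=3):
    """Cosine-normalized spectrum kernel; equals 1.0 for identical k-mer profiles."""
    kxx, kyy = spectrum_kernel(x, x, k), spectrum_kernel(y, y, k)
    if kxx == 0 or kyy == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return spectrum_kernel(x, y, k) / math.sqrt(kxx * kyy)


def gram_matrix(seqs, k=3):
    """Pairwise normalized kernel values for a list of sequences."""
    return [[normalized_kernel(a, b, k) for b in seqs] for a in seqs]


# Hypothetical probe sequences for illustration.
probes = ["ACGTACGT", "ACGTTT", "GGGCCC"]
K = gram_matrix(probes)
print(spectrum_kernel(probes[0], probes[1]))  # 4: shared 3-mers ACG (2x1) and CGT (2x1)
```

A matrix like `K` can be handed to an SVR implementation that accepts precomputed kernels (for example, scikit-learn's `SVR(kernel="precomputed")`). The di-mismatch kernel differs from this sketch in how it scores k-mer pairs that are similar but not identical.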
Author Summary
Transcription factors (TFs) are proteins that bind sites in non-coding DNA and regulate the expression of target genes. Being able to predict the genome-wide binding locations of TFs is an important step in deciphering gene regulatory networks. Historically, there were very limited experimental data on the DNA-binding preferences of most TFs. Computational biologists used known sites to estimate simple binding site motifs, called position-specific scoring matrices, and scanned the genome for additional potential binding locations, but this approach often led to many false positive predictions. Here we introduce a machine learning approach that leverages new high-resolution data on the binding preferences of TFs, namely protein binding microarray (PBM) experiments, which measure the in vitro binding affinities of TFs for an array of double-stranded DNA probes, and chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq), which measures in vivo genome-wide binding of TFs in a given cell type. We show that by training statistical models on high-resolution PBM and ChIP-seq data, we can more accurately represent the subtle DNA binding preferences of TFs and predict their genome-wide binding locations. These results will enable advances in the computational analysis of transcriptional regulation in mammalian genomes.
doi:10.1371/journal.pcbi.1000916
PMCID: PMC2936517  PMID: 20838582
