Kaposi's sarcoma-associated herpesvirus (KSHV) is a γ-herpesvirus associated with Kaposi's sarcoma (KS) and two lymphoproliferative diseases. Recent studies characterized epigenetic modification of KSHV episomes during latency and determined that latency-associated genes are associated with H3K4me3 while most lytic genes are associated with the silencing mark H3K27me3. Since the latency-associated nuclear antigen (LANA) (i) is expressed very early after de novo infection, (ii) interacts with transcriptional regulators and chromatin remodelers, and (iii) regulates the LANA and RTA promoters, we hypothesized that LANA may contribute to the establishment of latency through epigenetic control. We performed a detailed ChIP-seq analysis in cells of lymphoid and endothelial origin and compared H3K4me3, H3K27me3, polII, and LANA occupancy. On viral episomes LANA binding was detected at numerous lytic and latent promoters, which LANA transactivated in reporter assays. LANA binding was highly enriched at H3K4me3 peaks and this co-occupancy was also detected on many host gene promoters. Bioinformatic analysis of enriched LANA binding sites in combination with biochemical binding studies revealed three distinct binding patterns. First, a small subset of LANA binding sites showed sequence homology to the characterized LBS1/2 sequence in the viral terminal repeat. Second, a large number of sites contained a novel LANA binding motif (TCCAT)3 which was confirmed by gel shift analysis. Third, some viral and cellular promoters did not contain LANA binding sites and are likely enriched through protein/protein interaction. LANA was associated with H3K4me3 marks and in PEL cells 86% of all LANA bound promoters were transcriptionally active, leading to the hypothesis that LANA interacts with the machinery that methylates H3K4.
Co-immunoprecipitation demonstrated LANA association with endogenous hSET1 complexes in both lymphoid and endothelial cells suggesting that LANA may contribute to the epigenetic profile of KSHV episomes.
KSHV is a DNA tumor virus which is associated with Kaposi's sarcoma and some lymphoproliferative diseases. During latent infection, the viral genome persists as circular extrachromosomal DNA in the nucleus and expresses a very limited number of viral proteins, including LANA, a multi-functional protein. KSHV viral episomes, like host genomic DNA, are subject to chromatin formation and histone modifications which contribute to tightly controlled gene expression during latency. We determined where LANA binds on the KSHV and human genomes, and mapped activating and repressing histone marks and RNA polymerase II binding. We found that LANA bound near transcription start sites, and binding correlated with the active transcription mark H3K4me3, but not with the silencing mark H3K27me3. Binding sites for transcription factors including ZNF143, CTCF, and STAT1 are enriched at regions where LANA is bound. We identified some LANA binding sites near human gene promoters that resembled KSHV sequences known to bind LANA. We also found a novel motif that occurs frequently in the human genome and that binds LANA directly despite being different from known LANA-binding sequences. Furthermore, we demonstrate that LANA associates with the H3K4 methyltransferase hSET1, which creates activating histone marks.
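As an illustration of how candidate sites carrying the reported (TCCAT)3 motif could be located in a sequence, the following minimal sketch scans both strands for three tandem copies of TCCAT. The function names and the toy sequence are hypothetical and are not taken from the study; exact string matching is an assumption made for simplicity, whereas a real analysis would more likely tolerate mismatches or use a position weight matrix.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_tandem_motif(seq, unit="TCCAT", repeats=3):
    """Find exact matches of `unit` repeated `repeats` times on either strand.

    Returns a sorted list of (start, strand) tuples, where start is the
    0-based position on the forward strand. Overlapping hits are reported.
    """
    forward = unit * repeats
    reverse = reverse_complement(forward)
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", forward), ("-", reverse)):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            # Advance by one base so overlapping matches are not missed.
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence (hypothetical): one forward-strand and one reverse-strand copy.
toy = "GG" + "TCCATTCCATTCCAT" + "AA" + "ATGGAATGGAATGGA" + "C"
motif_hits = find_tandem_motif(toy)
```

Each hit gives the forward-strand coordinate where the motif begins, so the list could be intersected with ChIP-seq peak intervals to ask which peaks contain the motif.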
Alternative splicing is primarily controlled by the activity of splicing factors and by the elongation of the RNA polymerase II (RNAPII). Recent experiments have suggested a new complex network of splicing regulation involving chromatin, transcription and multiple protein factors. In particular, the CCCTC-binding factor (CTCF), the Argonaute protein AGO1, and members of the heterochromatin protein 1 (HP1) family have been implicated in the regulation of splicing associated with chromatin and the elongation of RNAPII. These results raise the question of whether these proteins may associate at the chromatin level to modulate alternative splicing.
Using chromatin immunoprecipitation sequencing (ChIP-Seq) data for CTCF, AGO1, HP1α, H3K27me3, H3K9me2, H3K36me3, RNAPII, total H3 and 5metC, together with alternative splicing arrays, we have analyzed the combinatorial code of their binding to chromatin in relation to the alternative splicing patterns between two cell lines, MCF7 and MCF10. Using machine learning techniques, we identified the changes in chromatin signals that are most significantly associated with splicing regulation between these two cell lines. Moreover, we have built a map of the chromatin signals on the pre-mRNA, that is, a chromatin-based RNA-map, which can explain 606 (68.55%) of the regulated events between MCF7 and MCF10. This chromatin code involves the presence of HP1α, CTCF, AGO1, RNAPII and histone marks around regulated exons and can differentiate patterns of skipping and inclusion. Additionally, we found a significant association of HP1α and CTCF activities around the regulated exons and a putative DNA binding site for HP1α.
Our results show that a considerable number of alternative splicing events could have a chromatin-dependent regulation involving the association of HP1α and CTCF near regulated exons. Additionally, we find further evidence for the involvement of HP1α and AGO1 in chromatin-related splicing regulation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0141-5) contains supplementary material, which is available to authorized users.
Chromatin; Splicing; Histones; Splicing code
Long-range regulatory elements, such as enhancers, exert substantial control over tissue-specific gene expression patterns. Genome-wide discovery of functional enhancers in different cell types is important for our understanding of genome function as well as human disease etiology.
In this study, we developed an in silico approach to model the previously reported phenomenon of transcriptional pausing, accompanied by divergent transcription, at active promoters. We then used this model for large-scale prediction of non-promoter-associated bidirectional expression of short transcripts. Our predictions were significantly enriched for DNase hypersensitive sites, histone H3 lysine 27 acetylation (H3K27ac), and other chromatin marks associated with active rather than poised or repressed enhancers. We also detected modest bidirectional expression at binding sites of the CCCTC-binding factor (CTCF) genome-wide, particularly those that overlap H3K27ac.
Our findings indicate that the signature of bidirectional expression of short transcripts, learned from promoter-proximal transcriptional pausing, can be used to predict active long-range regulatory elements genome-wide, likely due in part to specific association of RNA polymerase with enhancer regions.
Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion.
We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but also identifies the TF binding sites associated with promoters or enhancers of downstream genes.
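The first step, a logistic regression that turns per-position features into a binding probability, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the two features (a position weight matrix match score and a histone-mark signal), the toy training data, and the plain gradient-descent fit are all hypothetical choices made to keep the example self-contained.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear term
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - learning_rate * g / n_samples for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / n_samples
    return w, b


def lr_score(x, w, b):
    """Predicted probability of binding for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per position: [PWM match score, histone-mark signal],
# both scaled to 0..1. Label 1 = bound in TF ChIP-seq, 0 = not bound.
X_train = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y_train = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X_train, y_train)
```

Calling `lr_score([0.85, 0.8], weights, bias)` then returns a probability for a new position. In practice a tested library implementation with regularization would be preferable to this hand-written fit.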
The LR scores of experimentally verified ESC and heart enhancers were significantly higher than those of random regions (p < 10⁻¹⁰⁰), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately −9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (−9435 to −8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.
We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0460-0) contains supplementary material, which is available to authorized users.
Cardiac differentiation; Network inference; Logistic regression; Time-varying dynamic Bayesian model; Data integration; Gene regulatory network
CCCTC binding factor (CTCF) is a highly conserved zinc finger protein, which is involved in chromatin organization, local histone modifications, and RNA polymerase II-mediated gene transcription. CTCF may act by binding tightly to DNA and recruiting other proteins to mediate its various functions in the nucleus. To further explore the role of this essential factor, we used a mass spectrometry-based approach to screen for novel CTCF-interacting partners.
Using biotinylated CTCF as bait, we identified upstream binding factor (UBF) and multiple other components of the RNA polymerase I complex as potential CTCF-interacting partners. Interestingly, CTCFL, the testis-specific paralog of CTCF, also binds UBF. The interaction between CTCF(L) and UBF is direct, and requires the zinc finger domain of CTCF(L) and the high mobility group (HMG)-box 1 and dimerization domain of UBF. Because UBF is involved in RNA polymerase I-mediated ribosomal (r)RNA transcription, we analyzed CTCF binding to the rDNA repeat. We found that CTCF bound to a site upstream of the rDNA spacer promoter and preferred non-methylated over methylated rDNA. DNA binding by CTCF in turn stimulated binding of UBF. Absence of CTCF in cultured cells resulted in decreased association of UBF with rDNA and in nucleolar fusion. Furthermore, lack of CTCF led to reduced binding of RNA polymerase I and variant histone H2A.Z near the rDNA spacer promoter, a loss of specific histone modifications, and diminished transcription of non-coding RNA from the spacer promoter.
UBF is the first common interaction partner of CTCF and CTCFL, suggesting a role for these proteins in chromatin organization of the rDNA repeats. We propose that CTCF affects RNA polymerase I-mediated events globally by controlling nucleolar number, and locally by regulating chromatin at the rDNA spacer promoter, similar to RNA polymerase II promoters. CTCF may load UBF onto rDNA, thereby forming part of a network that maintains rDNA genes poised for transcription.
MicroRNAs are small non-coding RNAs involved in post-transcriptional regulation of gene expression. Due to the poor annotation of primary microRNA (pri-microRNA) transcripts, the precise location of promoter regions driving expression of many microRNA genes is enigmatic. This deficiency hinders our understanding of microRNA-mediated regulatory networks. In this study, we develop a computational approach to identify the promoter region and transcription start site (TSS) of pri-microRNAs actively transcribed using genome-wide RNA Polymerase II (RPol II) binding patterns derived from ChIP-seq data. Based upon the assumption that the distribution of RPol II binding patterns around the TSS of microRNA and protein coding genes are similar, we designed a statistical model to mimic RPol II binding patterns around the TSS of highly expressed, well-annotated promoter regions of protein coding genes. We used this model to systematically scan the regions upstream of all intergenic microRNAs for RPol II binding patterns similar to those of TSS from protein coding genes. We validated our findings by examining the conservation, CpG content, and activating histone marks in the identified promoter regions. We applied our model to assess changes in microRNA transcription in steroid hormone-treated breast cancer cells. The results demonstrate many microRNA genes have lost hormone-dependent regulation in tamoxifen-resistant breast cancer cells. MicroRNA promoter identification based upon RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription, and therefore allows comparison of transcription activities between different conditions, such as normal and disease states.
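The scanning idea described above can be illustrated with a simple stand-in: build an average RPol II profile from windows centered on well-annotated protein-coding TSSs, then slide that template along the region upstream of a microRNA and keep the best-matching window. The abstract does not specify the statistical model, so the use of Pearson correlation here, along with all function names and toy numbers, is an assumption made purely for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def build_template(profiles):
    """Average binned read-count profiles taken around known TSSs."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def scan_for_tss(signal, template):
    """Slide the template along `signal`; return (best_start, best_correlation)."""
    width = len(template)
    best_start, best_r = None, -2.0
    for start in range(len(signal) - width + 1):
        r = pearson(signal[start:start + width], template)
        if r > best_r:
            best_start, best_r = start, r
    return best_start, best_r


# Hypothetical binned RPol II counts around two annotated protein-coding TSSs.
known_profiles = [[0, 1, 5, 9, 4, 1, 0], [0, 2, 6, 8, 3, 1, 0]]
template = build_template(known_profiles)

# Hypothetical binned signal upstream of an intergenic microRNA.
upstream_signal = [1, 1, 1, 1, 0, 3, 11, 17, 7, 2, 0, 1, 1, 1]
best_start, best_r = scan_for_tss(upstream_signal, template)
```

Because correlation ignores overall scale, a window with the same shape as the template but twice the read depth still scores highly, which is convenient when expression levels differ between genes.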
Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear factor I (NFI) transcription factors. Indeed, ChIP assay showed that NFI occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA–mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program.
Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation.
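Overlap percentages of the kind reported for FAIRE peaks and transcription factor binding sites come down to an interval-intersection count. The sketch below shows one way this could be computed; the half-open coordinate convention, the function name, and the toy intervals are assumptions for illustration and do not reproduce the study's pipeline.

```python
def fraction_overlapping(peaks, sites):
    """Fraction of `peaks` that overlap at least one interval in `sites`.

    Both arguments are lists of (chrom, start, end) tuples using half-open
    coordinates, so intervals that merely touch do not count as overlapping.
    """
    sites_by_chrom = {}
    for chrom, start, end in sites:
        sites_by_chrom.setdefault(chrom, []).append((start, end))

    overlapping = 0
    for chrom, start, end in peaks:
        candidates = sites_by_chrom.get(chrom, [])
        if any(start < s_end and s_start < end for s_start, s_end in candidates):
            overlapping += 1
    return overlapping / len(peaks) if peaks else 0.0


# Hypothetical open-chromatin peaks and TF binding sites.
faire_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
               ("chr2", 100, 200), ("chr2", 900, 1000)]
tf_sites = [("chr1", 150, 160), ("chr2", 200, 300), ("chr1", 590, 700)]
overlap_fraction = fraction_overlapping(faire_peaks, tf_sites)
```

This linear scan is fine for small inputs; for genome-scale peak sets, sorting the intervals or using an interval tree would avoid the quadratic comparison.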
Humans consist of a few hundred types of specialized-function cells. Spatial and temporal transcriptional regulation of genes is essential for manifestation of cellular phenotypes. Identification of regulatory regions in the genome is central to understanding the mechanism of cell type–specific gene regulation. Recently developed high-throughput sequencing technology and computational analyses allow genome-wide investigation of the genome's chromatin structure. Using the FAIRE-seq technique, we identified the genome's open chromatin regions, which harbor regulatory elements in adipocytes. Open chromatin regions distal to genes' transcription start sites significantly differ among cell types. Multiple cell type–specific open chromatin regions exist near genes regulated during adipocyte differentiation. Computational motif analysis of adipocyte-specific open chromatin regions revealed enrichment of a binding motif for the NFI transcription factor family. These factors bind to the regulatory elements near adipogenic PPARγ, C/EBPα, and aP2 genes and regulate their expression. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus and knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation. Our study demonstrates the utility of FAIRE-seq in providing a global view of regulatory elements and in identifying transcriptional regulators of cellular functions.
Genomic enhancers regulate spatio-temporal gene expression by recruiting specific combinations of transcription factors (TFs). When TFs are bound to active regulatory regions, they displace canonical nucleosomes, making these regions biochemically detectable as nucleosome-depleted regions or accessible/open chromatin. Here we ask whether open chromatin profiling can be used to identify the entire repertoire of active promoters and enhancers underlying tissue-specific gene expression during normal development and oncogenesis in vivo. To this end, we first compare two different approaches to detect open chromatin in vivo using the Drosophila eye primordium as a model system: FAIRE-seq, based on physical separation of open versus closed chromatin; and ATAC-seq, based on preferential integration of a transposon into open chromatin. We find that both methods reproducibly capture the tissue-specific chromatin activity of regulatory regions, including promoters, enhancers, and insulators. Using both techniques, we screened for regulatory regions that become ectopically active during Ras-dependent oncogenesis, and identified 3778 regions that become (over-)activated during tumor development. Next, we applied motif discovery to search for candidate transcription factors that could bind these regions and identified AP-1 and Stat92E as key regulators. We validated the importance of Stat92E in the development of the tumors by introducing a loss-of-function Stat92E mutant, which was sufficient to rescue the tumor phenotype. Additionally, we tested whether the predicted Stat92E-responsive regulatory regions are genuine, using ectopic induction of JAK/STAT signaling in developing eye discs, and observed that similar chromatin changes indeed occurred. Finally, we determine that these are functionally significant regulatory changes, as nearby target genes are up- or down-regulated.
In conclusion, we show that FAIRE-seq and ATAC-seq based open chromatin profiling, combined with motif discovery, is a straightforward approach to identify functional genomic regulatory regions, master regulators, and gene regulatory networks controlling complex in vivo processes.
The functional expression of all genes is regulated by proteins, namely transcription factors that bind to specific areas of DNA known as regulatory regions. Whereas most DNA in our genome is normally bound by other proteins (histones) and packaged into units called nucleosomes, a specific subset of tissue-specific regulatory regions is responsible for tissue-specific gene expression; these active regions are nucleosome-depleted and bound by transcription factors. We use two techniques to identify these open chromatin regions, in a normal tissue and a RasV12 induced cancer tissue. We discovered a remarkable change in the accessible regulatory landscape between these two tissues, with several thousand regions becoming more accessible in the cancer tissue. We identified two transcription factors known to be involved in cancer (AP-1 and Stat92E) controlling these newly accessible regulatory regions. Finally, we introduced a mutation resulting in Stat92E becoming non-functional in the cancer tissue, which decreased the severity of the tumor. Our study shows that open chromatin profiling can be used to identify complex in vivo processes, and we shed new light on Ras dependent cancer development.
The application of deep sequencing to map 5′ capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: “focused” promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and “dispersed” promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5′ capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.
How are genes transcribed at the right levels and under the right conditions? Transcription regulation in eukaryotes has long been proposed to work by a division of labor: ubiquitous DNA sequence features in the core promoter region, close to the transcription start site (TSS) of genes, were thought to generically encode information to recruit RNA polymerase to initiate transcription, while specific sequence features, often distal from the genes, were thought to boost expression under the right conditions. Supporting the generic function of core promoters, genome-wide chromatin maps showed a stereotypical arrangement of well-spaced nucleosomes providing access to the TSS. High-throughput sequencing has generated genome-wide TSS maps at high resolution, which show that promoters exhibit different initiation patterns, ranging from focused start sites to dispersed regions. Linking these patterns to chromatin maps, we now find distinct core promoter classes, those in which the TSS location is defined broadly on the chromatin level and those in which the TSS is defined by precisely positioned sequence features. Notably, these architectures are conserved deeply across eukaryotes and are used for different functional classes of genes. Our work adds to the increasing understanding that core promoters contribute significantly to the complexity of eukaryotic gene expression.
MicroRNAs (miRNAs) are small non-coding RNAs that regulate expression of various target genes. miRNAs are expressed in a tissue-specific manner and play important roles in cell proliferation, apoptosis, and differentiation. Epigenetic alterations such as DNA methylation and histone modification are essential for chromatin remodeling and regulation of gene expression, including that of miRNAs. The CCCTC-binding factor, CTCF, is known to bind insulators and exhibits enhancer-blocking and barrier functions; more recently, it has also been shown to contribute to the three-dimensional organization of the genome. CTCF can also serve as a barrier against the spread of DNA methylation and histone repressive marks over promoter regions of tumor suppressor genes. Recent studies have shown that CTCF is also involved in the regulation of miRNAs such as miR-125b1, miR-375, and the miR-290 cluster in cancer cells and stem cells. miR-125b1 is a candidate tumor suppressor and is silenced in breast cancer cells. On the other hand, miR-375 may have oncogenic function and is overexpressed in breast cancer cells. CTCF is involved in the regulation of both miR-125b1 and miR-375, indicating that there are various patterns of CTCF-associated epigenetic regulation of miRNAs. CTCF may also play a key role in the pluripotency of cells through the regulation of the miR-290 cluster. These observations suggest that CTCF-mediated regulation of miRNAs could be a novel approach for cancer therapy and regenerative medicine.
microRNA; CTCF; cancer cell; embryonic stem cell; miR-125b1; miR-375; miR-290 cluster
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massively parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Our new algorithm can successfully predict the promoters from the genome wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
Host cell differentiation-dependent regulation of human papillomavirus (HPV) gene expression is required for productive infection. The host cell CCCTC-binding factor (CTCF) functions in genome-wide chromatin organization and gene regulation. We have identified a conserved CTCF binding site in the E2 open reading frame of high-risk HPV types. Using organotypic raft cultures of primary human keratinocytes containing high-risk HPV18 genomes, we show that CTCF recruitment to this conserved site regulates viral gene expression in differentiating epithelia. Mutation of the CTCF binding site increases the expression of the viral oncoproteins E6 and E7 and promotes host cell proliferation. Loss of CTCF binding results in a reduction of a specific alternatively spliced transcript expressed from the early gene region concomitant with an increase in the abundance of unspliced early transcripts. We conclude that high-risk HPV types have evolved to recruit CTCF to the early gene region to control the balance and complexity of splicing events that regulate viral oncoprotein expression.
IMPORTANCE The establishment and maintenance of HPV infection in undifferentiated basal cells of the squamous epithelia requires the activation of a subset of viral genes, termed early genes. The differentiation of infected cells initiates the expression of the late viral transcripts, allowing completion of the virus life cycle. This tightly controlled balance of differentiation-dependent viral gene expression allows the virus to stimulate cellular proliferation to support viral genome replication with minimal activation of the host immune response, promoting virus productivity. Alternative splicing of viral mRNAs further increases the complexity of viral gene expression. In this study, we show that the essential host cell protein CTCF, which functions in genome-wide chromatin organization and gene regulation, is recruited to the HPV genome and plays an essential role in the regulation of early viral gene expression and transcript processing. These data highlight a novel virus-host interaction important for HPV pathogenicity.
Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.
The DNA in eukaryotes is packaged by histones. Interestingly, histones can be marked by a variety of posttranslational modifications, and it has been hypothesized that distinct combinations of histone modifications mark distinct functional regions of the genome. The study of histone modifications has been aided by the development of high-throughput techniques to map a wide assortment of histone modifications on a global scale. However, because much of our current understanding of the human genome is concentrated on promoters, most studies have only examined histone modifications at these well-defined sites, ignoring the vast majority of the genome. To aid in the discovery of functional elements outside of these well-annotated loci, we develop an unbiased method that searches for commonly occurring histone modification patterns on a global scale without using any annotation information. This method recovers known patterns associated with transcriptional enhancers and promoters. Supporting the histone code hypothesis, we discover that the different functional activities of enhancers are closely associated with the presence of different histone modification patterns. We also discover several novel patterns that likely contain other potential regulatory elements. As the availability of large-scale histone modification data increases, the ability of methods such as the one presented here to concisely describe commonly occurring chromatin signatures, thereby abstracting away irrelevant or redundant data, will become increasingly critical.
The transcriptional insulator CCCTC binding factor (CTCF) was shown previously to be critical for human MHC class II gene expression. Whether the mechanisms used by CTCF in humans were similar to those of the mouse and whether the three-dimensional chromatin architecture created was specific to B cells were not defined. Genome-wide CTCF occupancy was defined for murine B cells and LPS-derived plasmablasts by ChIP-seq. Fifteen CTCF sites within the murine MHC-II locus were associated with high CTCF binding in B cells. Only one third of these sites displayed significant CTCF occupancy in plasmablasts. CTCF was required for maximal MHC-II gene expression in mouse B cells. In B cells, a subset of the CTCF regions interacted with each other, creating a three-dimensional architecture for the locus. Additional interactions occurred between MHC-II promoters and the CTCF sites. In contrast, a novel configuration occurred in plasma cells, which do not express MHC-II genes. Ectopic CIITA expression in plasma cells to induce MHC-II expression resulted in high levels of MHC-II proteins, but did not alter the plasma cell architecture completely. These data suggest that reorganizing the three-dimensional chromatin architecture is an epigenetic mechanism that accompanies the silencing of MHC class II genes as part of the cell fate commitment of plasma cells.
Kaposi's sarcoma-associated herpesvirus (KSHV) is a human herpesvirus that causes Kaposi's sarcoma and is associated with the development of lymphoproliferative diseases. KSHV reactivation from latency and virion production is dependent on efficient transcription of over eighty lytic cycle genes and viral DNA replication. CTCF and cohesin, cellular proteins that cooperatively regulate gene expression and mediate long-range DNA interactions, have been shown to bind at specific sites in herpesvirus genomes. CTCF and cohesin regulate KSHV gene expression during latency and may also control lytic reactivation, although their role in lytic gene expression remains incompletely characterized. Here, we analyze the dynamic changes in CTCF and cohesin binding that occur during the process of KSHV viral reactivation and virion production by high resolution chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and show that both proteins dissociate from viral genomes in kinetically and spatially distinct patterns. By utilizing siRNAs to specifically deplete CTCF and Rad21, a cohesin component, we demonstrate that both proteins are potent restriction factors for KSHV replication, with cohesin knockdown leading to hundred-fold increases in viral yield. High-throughput RNA sequencing was used to characterize the transcriptional effects of CTCF and cohesin depletion, and demonstrated that both proteins have complex and global effects on KSHV lytic transcription. Specifically, both proteins act as positive factors for viral transcription initially but subsequently inhibit KSHV lytic transcription, such that their net effect is to limit KSHV RNA accumulation. Cohesin is a more potent inhibitor of KSHV transcription than CTCF but both proteins are also required for efficient transcription of a subset of KSHV genes. 
These data reveal novel effects of CTCF and cohesin on transcription from a relatively small genome that resemble their effects on the cellular genome by acting as gene-specific activators of some promoters, but differ in acting as global negative regulators of transcription.
Kaposi's sarcoma-associated herpesvirus (KSHV) is a human virus that causes Kaposi's sarcoma and lymphoma. KSHV establishes a lifelong infection in B lymphocytes, and persists in a latent form as circular DNA molecules. Reactivation and replication yield infectious virions, allowing transmission and maintenance of latent infection. The cellular mechanisms controlling reactivation remain incompletely characterized. Host proteins that regulate RNA transcription play an important role in controlling viral reactivation. In this study, we used high-throughput techniques to analyze the binding of two cellular proteins, CTCF and Rad21, to the KSHV genome as the virus reactivated to produce infectious virions. We found that these proteins dissociate from the latent genome when reactivation occurs. We also found that depleting cells of these proteins increases virus production as much as a hundredfold. Depleting the cell of CTCF or Rad21 caused complex changes in the synthesis of RNAs by KSHV, with the amounts of most KSHV RNAs increasing greatly. We also showed that Rad21 and CTCF are needed for the virus to synthesize RNAs efficiently. Our study provides new insights into how the cell uses CTCF and Rad21 to limit KSHV's ability to synthesize RNA and reactivate from latency to produce infectious virus.
Bidirectional promoters are promoter sequences shared between divergent gene pairs (genes proximal to each other on opposite strands), and can regulate the genes in both directions. In the human genome, > 10% of protein-coding genes are arranged head-to-head on opposite strands, with transcription start sites that are separated by < 1,000 base pairs. Many transcription factor binding sites that influence the expression of the 2 opposite genes occur in bidirectional promoters. Recently, RNA polymerase II (RPol II) ChIP-seq data have been used to identify the promoters of coding genes and non-coding RNAs. However, bidirectional promoters have not yet been identified using RPol II ChIP-seq data.
In some bidirectional promoter regions, the RPol II signal forms a bi-peak shape, which indicates that 2 promoters are located in the bidirectional region. We have developed a computational approach to identify the regulatory regions of all divergent gene pairs using genome-wide RPol II binding patterns derived from ChIP-seq data, based upon the assumption that the distribution of RPol II binding around a bidirectional promoter is the accumulation of RPol II binding at 2 promoters. In HeLa S3 cells, 249 promoter pairs and 1094 single promoters were identified, of which 76 promoters cover only positive genes, 86 promoters cover only negative genes, and 932 promoters cover 2 genes. Gene expression levels and STAT1 binding sites for the different promoter categories were then examined.
Identifying the regulatory regions of bidirectional promoters from RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription. Gene expression and transcription factor binding site analyses suggest that the promoters in bidirectional regions may regulate the closest gene, and that STAT1 is involved in the primary promoter.
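The definition of a divergent gene pair used above (genes head-to-head on opposite strands, with transcription start sites separated by < 1,000 base pairs) can be illustrated with a minimal sketch. The `Gene` record, the function name `divergent_pairs`, and the toy coordinates are assumptions introduced here for illustration; this is not the authors' implementation and it does not model the RPol II bi-peak signal.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    # Hypothetical minimal gene record; field names are illustrative only.
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def divergent_pairs(genes: List[Gene], max_gap: int = 1000) -> List[Tuple[str, str]]:
    """Return (minus_gene, plus_gene) names for head-to-head pairs
    whose TSSs are less than max_gap base pairs apart."""
    minus = [g for g in genes if g.strand == "-"]
    plus = [g for g in genes if g.strand == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            # Head-to-head: the minus-strand TSS sits at or below the
            # plus-strand TSS, so the two genes are transcribed away
            # from each other.
            if m.chrom == p.chrom and 0 <= p.tss - m.tss < max_gap:
                pairs.append((m.name, p.name))
    return pairs


# Toy annotation with invented coordinates.
genes = [
    Gene("A", "chr1", "-", 10_000),
    Gene("B", "chr1", "+", 10_400),
    Gene("C", "chr1", "+", 50_000),
    Gene("D", "chr2", "-", 10_100),
]
print(divergent_pairs(genes))
```

In this toy input only genes A and B qualify: they lie on the same chromosome, on opposite strands, with TSSs 400 bp apart. The quadratic scan is kept for readability; a sorted sweep would be preferable for a full genome annotation.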
MOF is the major histone H4 lysine 16-specific (H4K16) acetyltransferase in mammals and Drosophila. In flies, it is involved in the regulation of X-chromosomal and autosomal genes as part of the MSL and the NSL complexes, respectively. While the function of the MSL complex as a dosage compensation regulator is fairly well understood, the role of the NSL complex in gene regulation is still poorly characterized. Here we report a comprehensive ChIP–seq analysis of four NSL complex members (NSL1, NSL3, MBD-R2, and MCRS2) throughout the Drosophila melanogaster genome. Strikingly, the majority (85.5%) of NSL-bound genes are constitutively expressed across different cell types. We find that an increased abundance of the histone modifications H4K16ac, H3K4me2, H3K4me3, and H3K9ac in gene promoter regions is characteristic of NSL-targeted genes. Furthermore, we show that these genes have a well-defined nucleosome free region and broad transcription initiation patterns. Finally, by performing ChIP–seq analyses of RNA polymerase II (Pol II) in NSL1- and NSL3-depleted cells, we demonstrate that both NSL proteins are required for efficient recruitment of Pol II to NSL target gene promoters. The observed Pol II reduction coincides with compromised binding of TBP and TFIIB to target promoters, indicating that the NSL complex is required for optimal recruitment of the pre-initiation complex on target genes. Moreover, genes that undergo the most dramatic loss of Pol II upon NSL knockdowns tend to be enriched in DNA Replication–related Element (DRE). Taken together, our findings show that the MOF-containing NSL complex acts as a major regulator of housekeeping genes in flies by modulating initiation of Pol II transcription.
Housekeeping genes are required to support basic cellular functions and are therefore expressed constitutively in all tissues. Although the homeostasis of housekeeping gene expression is vital for cell survival, most research on transcription initiation has been focused on TATA-box-containing promoters of inducible and developmental genes, while regulatory mechanisms at the TATA-less promoters of housekeeping genes have remained poorly understood. Using genome-wide chromatin binding profiles, we find that the NSL complex, a histone acetyltransferase-containing complex, is bound to the majority of constitutively active gene promoters. We show that NSL-bound genes display specific sets of DNA motifs, well-defined nucleosome free regions, and broad transcription initiation patterns. In addition, we show that the NSL complex regulates the recruitment of the basal transcription machinery to target promoters; more specifically, we can pinpoint its role to the early steps of Pol II recruitment. Interestingly, we also see that NSL-bound genes are most susceptible to Pol II loss after depletion of NSLs when they contain the DNA Replication–related Element (DRE). Taken together, we provide a genome-wide analysis of a chromatin-modifying complex that is globally involved in the regulation of housekeeping gene expression.
Gene expression is regulated by the complex interaction between transcriptional activators and repressors, which function in part by recruiting histone-modifying enzymes to control accessibility of DNA to RNA polymerase. The evolutionarily conserved family of Groucho/Transducin-Like Enhancer of split (Gro/TLE) proteins act as co-repressors for numerous transcription factors. Gro/TLE proteins act in several key pathways during development (including Notch and Wnt signaling), and are implicated in the pathogenesis of several human cancers. Gro/TLE proteins form oligomers and it has been proposed that their ability to exert long-range repression on target genes involves oligomerization over broad regions of chromatin. However, analysis of an endogenous gro mutation in Drosophila revealed that oligomerization of Gro is not always obligatory for repression in vivo. We have used chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) to profile Gro recruitment in two Drosophila cell lines. We find that Gro predominantly binds at discrete peaks (<1 kilobase). We also demonstrate that blocking Gro oligomerization does not reduce peak width as would be expected if Gro oligomerization induced spreading along the chromatin from the site of recruitment. Gro recruitment is enriched in “active” chromatin containing developmentally regulated genes. However, Gro binding is associated with local regions containing hypoacetylated histones H3 and H4, which is indicative of chromatin that is not fully open for efficient transcription. We also find that peaks of Gro binding frequently overlap the transcription start sites of expressed genes that exhibit strong RNA polymerase pausing and that depletion of Gro leads to release of polymerase pausing and increased transcription at a bona fide target gene. 
Our results demonstrate that Gro is recruited to local sites by transcription factors to attenuate rather than silence gene expression by promoting histone deacetylation and polymerase pausing.
Repression by transcription factors plays a central role in gene regulation. The Groucho/Transducin-Like Enhancer of split (Gro/TLE) family of co-repressors interacts with many different transcription factors and has many essential roles during animal development. Groucho/TLE proteins form oligomers that are necessary for target gene repression in some contexts. We have profiled the genome-wide recruitment of the founding member of this family, Groucho (from Drosophila) to gain insight into how and where it binds with respect to target genes and to identify factors associated with its binding. We find that Groucho binds in discrete peaks, frequently at transcription start sites, and that blocking Groucho from forming oligomers does not significantly change the pattern of Groucho recruitment. Although Groucho acts as a repressor, Groucho binding is enriched in chromatin that is permissive for transcription, and we find that it acts to attenuate rather than completely silence target gene expression. Thus, Groucho does not act as an “on/off” switch on target gene expression, but rather as a “mute” button.
Histone modifications play an integral role in plant development, but have been poorly studied in woody plants. Investigating chromatin organization in wood-forming tissue and its role in regulating gene expression allows us to understand the mechanisms underlying cellular differentiation during xylogenesis (wood formation) and identify novel functional regions in plant genomes. However, woody tissue poses unique challenges for using high-throughput chromatin immunoprecipitation (ChIP) techniques for studying genome-wide histone modifications in vivo. We investigated the role of the modified histone H3K4me3 (trimethylated lysine 4 of histone H3) in gene expression during the early stages of wood formation using ChIP-seq in Eucalyptus grandis, a woody biomass model.
Plant chromatin fixation and isolation protocols were optimized for developing xylem tissue collected from field-grown E. grandis trees. A “nano-ChIP-seq” procedure was employed for ChIP DNA amplification. Over 9 million H3K4me3 ChIP-seq and 18 million control paired-end reads were mapped to the E. grandis reference genome for peak-calling using Model-based Analysis of ChIP-Seq. The 12,177 significant H3K4me3 peaks identified covered ~1.5% of the genome and overlapped some 9,623 protein-coding genes and 38 noncoding RNAs. H3K4me3 library coverage, peaking ~600 - 700 bp downstream of the transcription start site, was highly correlated with gene expression levels measured with RNA-seq. Overall, H3K4me3-enriched genes tended to be less tissue-specific than unenriched genes and were overrepresented for general cellular metabolism and development gene ontology terms. Relative expression of H3K4me3-enriched genes in developing secondary xylem was higher than unenriched genes, however, and highly expressed secondary cell wall-related genes were enriched for H3K4me3 as validated using ChIP-qPCR.
In this first genome-wide analysis of a modified histone in a woody tissue, we optimized a ChIP-seq procedure suitable for field-collected samples. In developing E. grandis xylem, H3K4me3 enrichment is an indicator of active transcription, consistent with its known role in sustaining pre-initiation complex formation in yeast. The H3K4me3 ChIP-seq data from this study pave the way to understanding the chromatin landscape and epigenomic architecture of xylogenesis in plants, and complement RNA-seq evidence of gene expression for the future improvement of the E. grandis genome annotation.
ChIP-seq; H3K4me3; Histone; Secondary cell wall; Xylogenesis; Eucalyptus
Identification of diffuse signals from the chromatin immunoprecipitation and high-throughput massively parallel sequencing (ChIP-Seq) technology poses significant computational challenges, and there are few methods currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER that was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 out of a total of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched with biological functions related to metabolic processes under resting conditions but with developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and expected secondary structures by prediction. Last, they are enriched with motifs of transcription factors such as PU.1 and AP-1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, the GCLS method based on RNA polymerase II and H3K4Me3 ChIP-Seq can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in the study.
During retinal development, post-mitotic neural progenitor cells must activate thousands of genes to complete synaptogenesis and terminal maturation. While many of these genes are known, others remain beyond the sensitivity of expression microarray analysis. Some of these elusive gene activation events can be detected by mapping changes in RNA polymerase-II (Pol-II) association around transcription start sites.
High-resolution (35 bp) chromatin immunoprecipitation (ChIP)-on-chip was used to map changes in Pol-II binding surrounding 26,000 gene transcription start sites during photoreceptor maturation of the mouse neural retina, comparing postnatal age 25 (P25) to P2. Coverage was 10–12 kb per transcription start site, including 2.5 kb downstream. Pol-II-active regions were mapped to the mouse genomic DNA sequence by using computational methods (Tiling Analysis Software-TAS program), and the ratio of maximum Pol-II binding (P25/P2) was calculated for each gene. A validation set of 36 genes (3%), representing a full range of Pol-II signal ratios (P25/P2), were examined with quantitative ChIP assays for transcriptionally active Pol-II. Gene expression assays were also performed for 19 genes of the validation set, again on independent samples. FLT-3 Interacting Zinc-finger-1 (FIZ1), a zinc-finger protein that associates with active promoter complexes of photoreceptor-specific genes, provided an additional ChIP marker to highlight genes activated in the mature neural retina. To demonstrate the use of ChIP-on-chip predictions to find novel gene activation events, four additional genes were selected for quantitative PCR analysis (qRT–PCR analysis); these four genes have human homologs located in unidentified retinal disease regions: Solute carrier family 25 member 33 (Slc25a33), Lysophosphatidylcholine acyltransferase 1 (Lpcat1), Coiled-coil domain-containing 126 (Ccdc126), and ADP-ribosylation factor-like 4D (Arl4d).
ChIP-on-chip Pol-II peak signal ratios >1.8 predicted increased amounts of transcribing Pol-II and increased expression with an estimated 97% accuracy, based on analysis of the validation gene set. Using this threshold ratio, 1,101 genes were predicted to experience increased binding of Pol-II in their promoter regions during terminal maturation of the neural retina. Over 800 of these gene activations were additional to those previously reported by microarray analysis. Slc25a33, Lpcat1, Ccdc126, and Arl4d increased expression significantly (p<0.001) during photoreceptor maturation. Expression of all four genes was diminished in adult retinas lacking rod photoreceptors (Rd1 mice) compared to normal retinas (90% loss for Ccdc126 and Arl4d). For rhodopsin (Rho), a marker of photoreceptor maturation, two regions of maximum Pol-II signal corresponded to the upstream rhodopsin enhancer region and the rhodopsin proximal promoter region.
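The threshold rule described above (a P25/P2 ratio of maximum Pol-II signal greater than 1.8 predicts activation) can be sketched as follows. The function name `predict_activated`, the dictionary layout, and the gene labels and signal values are assumptions made for illustration only; they are not taken from the study's data or software.

```python
def predict_activated(p2_signal, p25_signal, threshold=1.8):
    """Return the sorted names of genes whose P25/P2 ratio of maximum
    Pol-II signal exceeds the threshold.

    p2_signal and p25_signal map gene name -> maximum Pol-II signal
    near the transcription start site at each age.
    """
    activated = []
    for gene, p2 in p2_signal.items():
        p25 = p25_signal.get(gene)
        # Skip genes missing at P25 or with a non-positive P2 signal,
        # for which the ratio is undefined.
        if p25 is None or p2 <= 0:
            continue
        if p25 / p2 > threshold:
            activated.append(gene)
    return sorted(activated)


# Invented signal values for three placeholder genes.
p2 = {"geneA": 2.0, "geneB": 4.0, "geneC": 0.0}
p25 = {"geneA": 5.0, "geneB": 4.4, "geneC": 3.0}
print(predict_activated(p2, p25))
```

Here geneA has a ratio of 2.5 and is called activated, geneB has a ratio of 1.1 and is not, and geneC is skipped because its P2 signal is zero. How zero or missing signals should be handled is a design choice that the abstract does not specify.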
High-resolution maps of Pol-II binding around transcription start sites were generated for the postnatal mouse retina, which can predict activation increases for a specific gene of interest. Novel gene activation predictions are enriched for biologic functions relevant to vision, neural function, and chromatin regulation. Use of the data set to detect novel activation increases was demonstrated by expression analysis for several genes that have human homologs located within unidentified retinal disease regions: Slc25a33, Lpcat1, Ccdc126, and Arl4d. Analysis of photoreceptor-deficient retinas indicated that all four genes are expressed in photoreceptors. Genome-wide maps of Pol-II binding were developed for visual access in the University of California, Santa Cruz (UCSC) Genome Browser and its eye-centric version EyeBrowse (National Eye Institute-NEI). Single promoter resolution of Pol-II distribution patterns suggests that the Rho enhancer region and the Rho proximal promoter region become closely associated with the activated gene’s promoter complex.
Herpesvirus persistence requires a dynamic balance between latent and lytic cycle gene expression, but how this balance is maintained remains enigmatic. We have previously shown that the Kaposi's Sarcoma-Associated Herpesvirus (KSHV) major latency transcripts encoding LANA, vCyclin, vFLIP, v-miRNAs, and Kaposin are regulated, in part, by a chromatin organizing element that binds CTCF and cohesins. Using viral genome-wide chromatin conformation capture (3C) methods, we now show that KSHV latency control region is physically linked to the promoter regulatory region for ORF50, which encodes the KSHV immediate early protein RTA. Other linkages were also observed, including an interaction between the 5′ and 3′ end of the latency transcription cluster. Mutation of the CTCF-cohesin binding site reduced or eliminated the chromatin conformation linkages, and deregulated viral transcription and genome copy number control. siRNA depletion of CTCF or cohesin subunits also disrupted chromosomal linkages and deregulated viral latent and lytic gene transcription. Furthermore, the linkage between the latent and lytic control region was subject to cell cycle fluctuation and disrupted during lytic cycle reactivation, suggesting that these interactions are dynamic and regulatory. Our findings indicate that KSHV genomes are organized into chromatin loops mediated by CTCF and cohesin interactions, and that these inter-chromosomal linkages coordinate latent and lytic gene control.
Multiple mechanisms have been implicated in the control of herpesvirus latent and lytic gene regulation, but few mechanisms account for coordinate regulation of these two life cycles. Here, we show that the transcription control elements for KSHV latent and lytic genes are in close physical proximity. Mutations in the CTCF binding sites of the KSHV latency control region caused a loss of cohesin binding, and derepression of latent transcripts. Loss of CTCF binding also caused a loss of KSHV DNA copy number, and a failure to express lytic genes, including the immediate early gene Rta. Chromatin conformation capture (3C) methods indicated that the CTCF binding sites in the latency control region are linked to the promoter region of Rta. Additional chromatin linkages were detected between the 5′ and 3′ ends of the major latency transcripts, suggesting that chromatin loops organize both latent and lytic gene clusters. The interaction between latent and lytic control regions was subject to cell cycle regulation, consistent with earlier studies implicating cell cycle control of cohesin binding and viral transcription patterns. KSHV chromosome conformation was also disrupted by lytic cycle reactivation. We propose that CTCF-cohesin form dynamic linkages between viral regulatory domains to both insulate and coordinate latent and lytic gene expression.
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post-Deep Brain Stimulation (DBS) treatment and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD-regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases.
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases.
Long non-coding RNAs (lncRNAs) comprise a novel, fascinating class of RNAs with largely unknown biological functions. Parkinson's disease (PD) is the most frequent motor disorder, and deep brain stimulation (DBS) treatment alleviates the symptoms, but early disease biomarkers are still unknown and new genetic interference targets are urgently needed. Using RNA-sequencing technology and a novel computational workflow for in-depth exploration of whole-transcriptome RNA-seq datasets, we detected and analyzed lncRNAs in sequenced libraries from PD patients' leukocytes pre- and post-treatment and the brain, adding this full profile resource of over 7,000 lncRNAs to the few human tissue-derived lncRNA datasets that are currently available. Our study includes sample-specific database construction, detecting disease-derived changes in known and novel lncRNAs, exons and junctions and predicting corresponding changes in polyadenylation choices, protein domains and miRNA binding sites. We report widespread transcript structure variations at the splice junction and exon levels, including novel exons and junctions and alteration of lncRNAs followed by experimental validation in PD leukocytes and two PD brain regions compared with controls. Our results suggest lncRNA involvement in neurodegenerative diseases, and specifically PD. This comprehensive workflow will be of use to the increasing number of laboratories producing RNA-Seq data in a wide range of biomedical studies.
The earliest stages of development in most metazoans are driven by maternally deposited proteins and mRNAs, with widespread transcriptional activation of the zygotic genome occurring hours after fertilization, at a period known as the maternal-to-zygotic transition (MZT). In Drosophila, the MZT is preceded by the transcription of a small number of genes that initiate sex determination, patterning, and other early developmental processes; and the zinc-finger protein Zelda (ZLD) plays a key role in their transcriptional activation. To better understand the mechanisms of ZLD activation and the range of its targets, we used chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to map regions bound by ZLD before (mitotic cycle 8), during (mitotic cycle 13), and after (late mitotic cycle 14) the MZT. Although only a handful of genes are transcribed prior to mitotic cycle 10, we identified thousands of regions bound by ZLD in cycle 8 embryos, most of which remain bound through mitotic cycle 14. As expected, early ZLD-bound regions include the promoters and enhancers of genes transcribed at this early stage. However, we also observed ZLD bound at cycle 8 to the promoters of roughly a thousand genes whose first transcription does not occur until the MZT and to virtually all of the thousands of known and presumed enhancers bound at cycle 14 by transcription factors that regulate patterned gene activation during the MZT. The association between early ZLD binding and MZT activity is so strong that ZLD binding alone can be used to identify active promoters and regulatory sequences with high specificity and selectivity. This strong early association of ZLD with regions not active until the MZT suggests that ZLD is not only required for the earliest wave of transcription but also plays a major role in activating the genome at the MZT.
The newly fertilized eggs of most animal species begin development with a series of rapid cell divisions. During this time of rapid DNA replication, there is little or no transcription of the embryo's genome, with the synthesis of new proteins being directed by a store of maternally deposited mRNAs. Several hours after fertilization, at a period known as the maternal-to-zygotic transition (MZT), transcription of the embryo's genome begins in earnest, but little is known about how this process is initiated. In this paper we investigate the role of a protein known as Zelda (or ZLD) at the MZT in the laboratory model insect Drosophila melanogaster. ZLD had been previously shown to control the activation of a small number of genes expressed prior to the MZT. Here, using an experimental technique (ChIP-Seq) that allowed us to visualize where on the genome a protein is bound, we show that, approximately an hour prior to the MZT, ZLD is bound to most of the genomic regions active at the MZT. This suggests that ZLD may act as a kind of an “on switch” for the zygotic genome, poising regions where it binds for activation at the MZT, and this raises the possibility that similar master regulators of the MZT exist in other species.
We have analyzed publicly available K562 Hi-C data, which enable genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA but also affects expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
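The idea of a power-law decay background for Hi-C contacts can be illustrated with a minimal sketch: the expected contact count is modeled as c · d^(−α) in the genomic distance d, and the two parameters are estimated by ordinary least squares in log-log space. This covers only the background term, under the assumption of a single global power law; it does not implement the Mixture Poisson Regression Model, and the function names and toy numbers are hypothetical.

```python
import math


def fit_power_law(distances, counts):
    """Fit counts ~ c * distance**(-alpha) by least squares on
    log(count) versus log(distance). Returns (c, alpha)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(n) for n in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A decaying power law has a negative log-log slope, so alpha = -slope.
    return math.exp(intercept), -slope


def expected_contacts(distance, c, alpha):
    """Background contact count expected at a given genomic distance."""
    return c * distance ** (-alpha)


# Toy data generated from an exact power law with alpha = 1 (invented values).
distances = [1e4, 1e5, 1e6]
counts = [1000.0, 100.0, 10.0]
c, alpha = fit_power_law(distances, counts)
print(c, alpha, expected_contacts(5e4, c, alpha))
```

Observed contact counts that greatly exceed `expected_contacts` at the same distance would be candidates for specific interactions; in practice a statistical model of the counts is needed to decide which excesses are significant.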