A major limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome, even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow a more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. Our LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula by considering two influential factors: the distance of NUTs to peaks identified using uniquely matched tags (UMTs) and the enrichment score for those peaks. As a result, each NUT is assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.
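The abstract above does not state the ES formula, so the following is only a minimal sketch of the assignment idea, assuming a hypothetical score that rises with peak enrichment and falls with distance. The names `Peak`, `empirical_score`, `assign_nut`, and the `decay` weight are illustrative assumptions, not part of LONUT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    summit: int        # summit position in bp, from the UMT-only peak call
    enrichment: float  # enrichment score of that peak


def empirical_score(distance: int, enrichment: float, decay: float = 1e-3) -> float:
    """Hypothetical score: stronger peaks and shorter distances score higher."""
    return enrichment / (1.0 + decay * distance)


def assign_nut(candidates, peaks):
    """Return the candidate (chrom, pos) with the best score, or None if
    no candidate lies on a chromosome that carries a peak."""
    best_loc, best_score = None, 0.0
    for chrom, pos in candidates:
        for peak in peaks:
            if peak.chrom != chrom:
                continue
            score = empirical_score(abs(pos - peak.summit), peak.enrichment)
            if score > best_score:
                best_loc, best_score = (chrom, pos), score
    return best_loc


# One non-unique tag that maps equally well to three genomic locations.
umt_peaks = [Peak("chr1", 10_000, 8.0), Peak("chr2", 50_000, 3.0)]
nut_candidates = [("chr1", 10_150), ("chr2", 50_010), ("chr3", 700)]
chosen = assign_nut(nut_candidates, umt_peaks)
```

In this toy case the tag is placed on chr1: the chr1 peak is far more enriched, which outweighs the shorter distance to the chr2 peak. A real implementation would fit the score to data, as the regression step in the abstract implies.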
DNA methylation and repressive histone modifications cooperate to silence promoters. One mechanism by which regions of methylated DNA could acquire repressive histone modifications is via methyl DNA-binding transcription factors. The zinc finger protein ZBTB33 (also known as Kaiso) has been shown in vitro to bind preferentially to methylated DNA and to interact with the SMRT/NCoR histone deacetylase complexes. We have performed bioinformatic analyses of Kaiso ChIP-seq and DNA methylation datasets to test a model whereby binding of Kaiso to methylated CpGs leads to loss of acetylated histones at target promoters.
Our results suggest that, contrary to expectations, Kaiso does not bind to methylated DNA in vivo but instead binds to highly active promoters that are marked with high levels of acetylated histones. In addition, our studies suggest that DNA methylation and nucleosome occupancy patterns restrict access of Kaiso to potential binding sites and influence cell type-specific binding.
We propose a new model for the genome-wide binding and function of Kaiso whereby Kaiso binds to unmethylated regulatory regions and contributes to the active state of target promoters.
DNA methylation; Zinc finger proteins; Histone modifications; Transcription factor binding; Epigenetics; Transcriptional regulation
Chromatin Histone Modification; Chromatin Immunoprecipitation (ChIP); Chromatin Regulation; Gene Transcription; Genomics; Transcription Factors
The TCF7L2 transcription factor is linked to a variety of human diseases, including type 2 diabetes and cancer. One mechanism by which TCF7L2 could influence expression of genes involved in diverse diseases is by binding to distinct regulatory regions in different tissues. To test this hypothesis, we performed ChIP-seq for TCF7L2 in six human cell lines.
We identified 116,000 non-redundant TCF7L2 binding sites, with only 1,864 sites common to the six cell lines. Using ChIP-seq, we showed that many genomic regions that are marked by both H3K4me1 and H3K27Ac are also bound by TCF7L2, suggesting that TCF7L2 plays a critical role in enhancer activity. Bioinformatic analysis of the cell type-specific TCF7L2 binding sites revealed enrichment for multiple transcription factors, including HNF4alpha and FOXA2 motifs in HepG2 cells and the GATA3 motif in MCF7 cells. ChIP-seq analysis revealed that TCF7L2 co-localizes with HNF4alpha and FOXA2 in HepG2 cells and with GATA3 in MCF7 cells. Interestingly, in MCF7 cells the TCF7L2 motif is enriched in most TCF7L2 sites but is not enriched in the sites bound by both GATA3 and TCF7L2. This analysis suggested that GATA3 might tether TCF7L2 to the genome at these sites. To test this hypothesis, we depleted GATA3 in MCF7 cells and showed that TCF7L2 binding was lost at a subset of sites. RNA-seq analysis suggested that TCF7L2 represses transcription when tethered to the genome via GATA3.
Our studies demonstrate a novel relationship between GATA3 and TCF7L2, and reveal important insights into TCF7L2-mediated gene regulation.
Developmental and homeostatic remodeling of cellular organelles is mediated by a complex process termed autophagy. The cohort of proteins that constitute the autophagy machinery functions in a multistep biochemical pathway. Though components of the autophagy machinery are broadly expressed, autophagy can occur in specialized cellular contexts, and mechanisms underlying cell-type-specific autophagy are poorly understood. We demonstrate that the master regulator of hematopoiesis, GATA-1, directly activates transcription of genes encoding the essential autophagy component microtubule-associated protein 1 light chain 3B (LC3B) and its homologs (MAP1LC3A, GABARAP, GABARAPL1, and GATE-16). In addition, GATA-1 directly activates genes involved in the biogenesis/function of lysosomes, which mediate autophagic protein turnover. We demonstrate that GATA-1 utilizes the forkhead protein FoxO3 to activate select autophagy genes. GATA-1-dependent LC3B induction is tightly coupled to accumulation of the active form of LC3B and autophagosomes, which mediate mitochondrial clearance as a critical step in erythropoiesis. These results illustrate a novel mechanism by which a master regulator of development establishes a genetic network to instigate cell-type-specific autophagy.
We have analyzed publicly available K562 Hi-C data, which enable genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA but also affects expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
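To illustrate what a power-law decay background means for Hi-C contact counts, here is a deliberately simplified sketch. It uses a single Poisson test rather than the mixture regression model named above, and the `scale` and `alpha` values are assumed for illustration, not fitted parameters from the study.

```python
import math


def expected_contacts(distance_bp: int, scale: float = 1e6, alpha: float = 1.0) -> float:
    """Background expectation that decays as a power law of genomic distance."""
    return scale * distance_bp ** (-alpha)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed from the exact CDF."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


# Two loci 500 kb apart with 10 observed read pairs linking them.
background = expected_contacts(500_000)  # about 2 read pairs expected
p_value = poisson_sf(10, background)     # chance of seeing 10 or more
```

A small p-value marks a locus pair whose contact frequency exceeds what distance alone would predict, which is the sense in which a set of interactions can be called specific.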
The self-renewal capacity ascribed to hESCs is paralleled in cancer cell proliferation, suggesting that a common network of genes may facilitate the promotion of these traits. However, the molecular mechanisms that are involved in regulating the silencing of these genes as stem cells differentiate into quiescent cellular lineages remain poorly understood. Here, we show that a differentiated cell-specific miRNA, miR-122, exemplifies this regulatory attribute by suppressing the translation of a gene, Pkm2, which is commonly enriched in hESCs and liver cancer cells (HCCs), and facilitates self-renewal and proliferation. Through a series of gene expression analyses, we show that miR-122 expression is highly elevated in quiescent human primary hepatocytes (hPHs) but lost or attenuated in hESCs and HCCs, while an opposing expression pattern is observed for Pkm2. Depleting hESCs and HCCs of Pkm2, or overexpressing miR-122, leads to a common deficiency in self-renewal and proliferation. Likewise, during the differentiation process of hESCs into hepatocytes, a reciprocal expression pattern is observed between miR-122 and Pkm2. An examination of the genomic region upstream of miR-122 uncovered hyper-methylation in hESCs and HCCs, while the same region is de-methylated and occupied by a transcription initiating protein, RNA polymerase II (RNAPII), in hPHs. These findings indicate that one possible mechanism by which hESC self-renewal is modulated in quiescent hepatic derivatives of hESCs is through the regulatory activity of a differentiated cell-specific miR-122, and that a failure to properly turn “on” this miRNA is observed in uncontrollably proliferating HCCs.
We have identified human MBT domain-containing protein L3MBTL2 as an integral component of a protein complex that we termed Polycomb Repressive Complex 1 (PRC1)-like 4 (PRC1L4) given the co-presence of PcG proteins RING1, RING2 and PCGF6/MBLR. PRC1L4 also contained E2F6 and CBX3/HP1γ, known to function in transcriptional repression. PRC1L4-mediated repression necessitated L3MBTL2, which compacted chromatin in a histone modification-independent manner. Genome-wide location analyses identified several hundred genes simultaneously bound by L3MBTL2 and E2F6, preferentially around transcriptional start sites that exhibited little overlap with those targeted by other E2Fs or by L3MBTL1, another MBT-domain containing protein that interacts with RB1. L3MBTL2-specific RNAi resulted in increased expression of target genes that exhibited a significant reduction in H2A lysine 119 monoubiquitination. These findings highlight a PcG/MBT collaboration that attains repressive chromatin without entailing histone lysine methylation marks.
TRIM28 (KAP1) is upregulated in many cancers and has been implicated in both transcriptional activation and repression. Using chromatin immunoprecipitation and sequencing, we show that KAP1 binding sites fall into several categories, specifically, the 3′ coding exons of zinc finger (ZNF) genes and promoter regions of ZNFs and other genes. The currently accepted model is that KAP1 is recruited to the genome via interaction of its N-terminal RBCC domain with KRAB ZNFs (KRAB domain containing ZNFs). To determine whether the interaction of KAP1 with KRAB ZNFs is the mechanism by which KAP1 is recruited to genomic binding sites, we analyzed stable cell lines that express tagged wild-type and mutant KAP1. Surprisingly, deletion of the RBCC domain abolished KAP1 binding to the 3′ exons of ZNF genes but KAP1 binding to promoter regions was unaffected. Using KAP1 knockdown cells, we showed that the genes most responsive to KAP1 were not ZNF genes but instead were either indirect targets or had KAP1 bound 10 to 100 kb from the transcription start site. Therefore, our studies suggest that KAP1 plays a role distinct from transcriptional regulation at the majority of its strongest binding sites.
In mammalian cells, multiple cellular processes, including gene silencing, cell growth and differentiation, pluripotency, neoplastic transformation, apoptosis, DNA repair, and maintenance of genomic integrity, converge on the evolutionarily conserved protein KAP1, which is thought to regulate the dynamic organization of chromatin structure via its ability to influence epigenetic patterns and chromatin compaction. In this minireview, we discuss how KAP1 might execute such pleiotropic effects, focusing on genomic targeting mechanisms, protein-protein interactions, specific post-translational modifications of both KAP1 and associated histones, and transcriptome analyses of cells deficient in KAP1.
Chromatin Histone Modification; Chromatin Immunoprecipitation (ChIP); Epigenetics; Transcription Factors; Transcriptional Repressor; Zinc Finger; ChIP-seq; KAP1; TIF1B; TRIM28
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
DNA methylation; Sequencing; Bisulfite
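The concordance figures for binary methylation calls reported in the abstract above can be pictured with a small sketch. The 0.5 threshold, the region keys, and the function names are assumptions made for illustration; the abstract does not say how the calls were thresholded.

```python
def binary_call(level: float, threshold: float = 0.5) -> bool:
    """Turn a methylation level in [0, 1] into a methylated/unmethylated call."""
    return level >= threshold


def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of regions assessed by both methods that receive the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two methods share no regions")
    agree = sum(calls_a[region] == calls_b[region] for region in shared)
    return agree / len(shared)


# Toy methylation levels per region from two hypothetical methods.
levels_a = {"chr1:100": 0.9, "chr1:200": 0.2, "chr1:300": 0.8, "chr2:50": 0.7}
levels_b = {"chr1:100": 0.8, "chr1:200": 0.6, "chr1:300": 0.95, "chr3:10": 0.1}
calls_a = {region: binary_call(v) for region, v in levels_a.items()}
calls_b = {region: binary_call(v) for region, v in levels_b.items()}
agreement = concordance(calls_a, calls_b)  # 2 of the 3 shared regions agree
```

Only regions covered by both methods enter the denominator, which is why concordance can be high even when the methods differ greatly in coverage.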
The H3K9me3 histone modification is often found at promoter regions, where it functions to repress transcription. However, we have previously shown that 3′ exons of zinc finger genes (ZNFs) are marked by high levels of H3K9me3. We have now further investigated this unusual location for H3K9me3 in ZNF genes. Neither bioinformatic nor experimental approaches support the hypothesis that the 3′ exons of ZNFs are promoters. We further characterized the histone modifications at the 3′ ZNF exons and found that these regions also contain H3K36me3, a mark of transcriptional elongation. A genome-wide analysis of ChIP-seq data revealed that ZNFs constitute the majority of genes that have high levels of both H3K9me3 and H3K36me3. These results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed and one allele repressed. To test the hypothesis that the contradictory modifications are due to imprinting, we used a SNP analysis of RNA-seq data to demonstrate that both alleles of certain ZNF genes having H3K9me3 and H3K36me3 are transcribed. We next analyzed isolated ZNF 3′ exons using stably integrated episomes. We found that although the H3K36me3 mark was lost when the 3′ ZNF exon was removed from its natural genomic location, the isolated ZNF 3′ exons retained the H3K9me3 mark. Thus, the H3K9me3 mark at ZNF 3′ exons does not impede transcription and it is regulated independently of the H3K36me3 mark. Finally, we demonstrate a strong relationship between the number of tandemly repeated domains in the 3′ exons and the H3K9me3 mark. We suggest that the H3K9me3 at ZNF 3′ exons may function to protect the genome from inappropriate recombination rather than to regulate transcription.
Previous studies of E2F family members have suggested that protein-protein interactions may be the mechanism by which E2F proteins are recruited to specific genomic regions. We have addressed this hypothesis on a genome-wide scale using ChIP-seq analysis of MCF7 cell lines that express tagged wild type and mutant E2F1 proteins. First, we performed ChIP-seq for tagged WT E2F1. Then, we analyzed E2F1 proteins that lacked the N-terminal SP1 and cyclin A binding domains, the C-terminal transactivation and pocket protein binding domains, and the internal marked box domain. Surprisingly, we found that the ChIP-seq patterns of the mutant proteins were identical to that of WT E2F1. However, mutation of the DNA binding domain abrogated all E2F1 binding to the genome. These results suggested that the interaction between the E2F1 DNA binding domain and a consensus motif may be the primary determinant of E2F1 recruitment. To address this possibility, we analyzed the in vivo binding sites for the in vitro-derived consensus E2F1 motif (TTTSSCGC) and also performed de novo motif analysis. We found that only 12% of the ChIP-seq peaks contained the TTTSSCGC motif. De novo motif analysis indicated that most of the in vivo sites lacked the 5′ half of the in vitro-derived consensus, having instead the in vivo consensus of CGCGC. In summary, our findings do not provide support for the model that protein-protein interactions are involved in recruiting E2F1 to the genome, but rather suggest that recognition of a motif found at most human promoters is the critical determinant.
Chromatin Immunoprecipitation (ChIP); DNA Binding Protein; E2F Transcription Factor; Transcription Promoter; Transcription Regulation
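The motif comparison in the E2F1 abstract above (TTTSSCGC versus CGCGC) amounts to asking what fraction of peak sequences contain a degenerate motif on either strand. The following is a minimal sketch of that check using IUPAC codes, where S stands for C or G; the sequences are invented toy data and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed here, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "[CG]", "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif such as TTTSSCGC into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def contains_motif(seq: str, pattern: re.Pattern) -> bool:
    """True if the motif occurs on the forward or the reverse strand."""
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def fraction_with_motif(peak_seqs, motif: str) -> float:
    pattern = motif_regex(motif)
    return sum(contains_motif(s, pattern) for s in peak_seqs) / len(peak_seqs)


peak_seqs = ["AATTTCGCGCAA", "GGCGCGAAATT", "ACGTACGTACGT", "TTGCGCGCTT"]
full_site = fraction_with_motif(peak_seqs, "TTTSSCGC")  # 0.5
core_site = fraction_with_motif(peak_seqs, "CGCGC")     # 0.75
```

The shorter CGCGC core matches more sequences than the full in vitro consensus, which mirrors the qualitative pattern the abstract describes. Real analyses would also compare against background sequences, since short GC-rich motifs are common at promoters.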
Only a small percentage of human transcription factors (e.g. those associated with a specific differentiation program) are expressed in a given cell type. Thus, cell fate is mainly determined by cell type-specific silencing of transcription factors that drive different cellular lineages. Several histone modifications have been associated with gene silencing, including H3K27me3 and H3K9me3. We have previously shown that genes for the two largest classes of mammalian transcription factors are marked by distinct histone modifications; homeobox genes are marked by H3K27me3 and zinc finger genes are marked by H3K9me3. Several histone methyltransferases (e.g. G9a and SETDB1) may be involved in mediating the H3K9me3 silencing mark. We have used ChIP-chip and ChIP-seq to demonstrate that SETDB1, but not G9a, is associated with regions of the genome enriched for H3K9me3. One current model is that SETDB1 is recruited to specific genomic locations via interaction with the corepressor TRIM28 (KAP1), which is in turn recruited to the genome via interaction with zinc finger transcription factors that contain a Kruppel-associated box (KRAB) domain. However, specific KRAB-ZNFs that recruit TRIM28 (KAP1) and SETDB1 to the genome have not been identified. We now show that ZNF274 (a KRAB-ZNF that contains 5 C2H2 zinc finger domains) can interact with KAP1 both in vivo and in vitro and, using ChIP-seq, we show that ZNF274 binding sites co-localize with SETDB1, KAP1, and H3K9me3 at the 3′ ends of zinc finger genes. Knockdown of ZNF274 with siRNAs reduced the levels of KAP1 and SETDB1 recruitment to the binding sites. These studies provide the first identification of a KRAB domain-containing ZNF that is involved in recruitment of KAP1 and SETDB1 to specific regions of the human genome.
The orphan nuclear receptor TR4 (human testicular receptor 4 or NR2C2) plays a pivotal role in a variety of biological and metabolic processes. With no known ligand and few known target genes, the mode of TR4 function was unclear.
We report the first genome-wide identification and characterization of TR4 in vivo binding. Using chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), we identified TR4 binding sites in 4 different human cell types and found that the majority of target genes were shared among different cells. TR4 target genes are involved in fundamental biological processes such as RNA metabolism and protein translation. In addition, we found that a subset of TR4 target genes exerts cell-type specific functions. Analysis of the TR4 binding sites revealed that less than 30% of the peaks from any of the cell types contained the DR1 motif previously derived from in vitro studies, suggesting that TR4 may be recruited to the genome via interaction with other proteins. A bioinformatics analysis of the TR4 binding sites predicted a cis regulatory module involving TR4 and ETS transcription factors. To test this prediction, we performed ChIP-seq for the ETS factor ELK4 and found that 30% of TR4 binding sites were also bound by ELK4. Motif analysis of the sites bound by both factors revealed a lack of the DR1 element, suggesting that TR4 binding at a subset of sites is facilitated through the ETS transcription factor ELK4. Further studies will be required to investigate the functional interdependence of these two factors.
Our data suggest that TR4 plays a pivotal role in fundamental biological processes across different cell types. In addition, the identification of cell type specific TR4 binding sites enables future studies of the pathways underlying TR4 action and its possible role in metabolic diseases.
GATA factors interact with simple DNA motifs (WGATAR) to regulate critical processes, including hematopoiesis, but very few WGATAR motifs are occupied in genomes. Given the rudimentary knowledge of mechanisms underlying this restriction, and how GATA factors establish genetic networks, we used ChIP-seq to define GATA-1 and GATA-2 occupancy genome-wide in erythroid cells. Coupled with genetic complementation analysis and transcriptional profiling, these studies revealed a rich collection of targets containing a characteristic binding motif of greater complexity than WGATAR. GATA factors occupied loci encoding multiple components of the Scl/TAL1 complex, a master regulator of hematopoiesis and leukemogenic target. Mechanistic analyses provided evidence for cross-regulatory and autoregulatory interactions among components of this complex, including GATA-2 induction of the hematopoietic corepressor ETO-2 and an ETO-2 negative autoregulatory loop. These results establish fundamental principles underlying GATA factor mechanisms in chromatin and illustrate a complex network of considerable importance for the control of hematopoiesis.
A crucial question in the field of gene regulation is whether the location at which a transcription factor binds influences its effectiveness or the mechanism by which it regulates transcription. Comprehensive transcription factor binding maps are needed to address these issues, and genome-wide mapping is now possible thanks to the technological advances of ChIP-chip and ChIP-Seq. This review discusses how recent genomic profiling of transcription factors gives insight into how binding specificity is achieved and what features of chromatin influence the ability of transcription factors to interact with the genome, and also suggests future experiments to further our understanding of the causes and consequences of transcription factor-genome interactions.
Myc proteins have long been modeled to operate strictly as classical gene-specific transcription factors; however, we find that N-Myc has a robust role in the human genome in regulating global cellular euchromatin, including that of intergenic regions. Strikingly, 90–95% of the total genomic euchromatic marks histone H3 acetylated at lysine 9 and methylated at lysine 4 is N-Myc dependent. However, Myc regulation of transcription, even of genes it directly binds and at which it is required for maintenance of active chromatin, is generally weak. Thus, Myc has a much more potent ability to regulate large domains of euchromatin than to influence transcription of individual genes. Overall, Myc regulation of chromatin in the human genome encompasses both specific genes and expansive genomic domains that invoke functions independent of a classical transcription factor. These findings support a new dual model for Myc chromatin function with important implications for the role of Myc in cancer and stem cell biology, including that of induced pluripotent stem (iPS) cells.
Myc; global chromatin; intergenic; neuroblastoma; stem cell; iPS
Next-generation sequencing is revolutionizing the identification of transcription factor binding sites throughout the human genome. However, the bioinformatics analysis of large datasets collected using chromatin immunoprecipitation and high-throughput sequencing is often a roadblock that impedes researchers in their attempts to gain biological insights from their experiments. We have developed integrated peak-calling and analysis software (Sole-Search) which is available through a user-friendly interface and (i) converts raw data into a format for visualization on a genome browser, (ii) outputs ranked peak locations using a statistically based method that overcomes the significant problem of false positives, (iii) identifies the gene nearest to each peak, (iv) classifies the location of each peak relative to gene structure, (v) provides information such as the number of binding sites per chromosome and per gene and (vi) allows the user to determine overlap between two different experiments. In addition, the program performs an analysis of amplified and deleted regions of the input genome. This software is web-based and automated, allowing easy and immediate access to all investigators. We demonstrate the utility of our software by collecting, analyzing and comparing ChIP-seq data for six different human transcription factors/cell line combinations.
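Step (iii) above, identifying the gene nearest to each peak, can be sketched with a binary search over sorted transcription start sites. This is an illustrative sketch only, not the Sole-Search implementation; `nearest_gene`, the gene names, and the coordinates are hypothetical, and strand is ignored for brevity.

```python
import bisect


def nearest_gene(peak_pos: int, tss_sorted):
    """Return (gene, offset) for the transcription start site closest to a peak.

    tss_sorted is a list of (position, gene_name) tuples for one chromosome,
    sorted by position. The offset is peak_pos minus the TSS position.
    """
    positions = [pos for pos, _ in tss_sorted]
    i = bisect.bisect_left(positions, peak_pos)
    # Only the TSS just left and just right of the peak can be the nearest.
    neighbours = tss_sorted[max(i - 1, 0): i + 1]
    pos, gene = min(neighbours, key=lambda entry: abs(entry[0] - peak_pos))
    return gene, peak_pos - pos


tss_chr1 = [(1_000, "GENE_A"), (5_000, "GENE_B"), (9_000, "GENE_C")]
annotated = [nearest_gene(p, tss_chr1) for p in (100, 4_200, 9_500)]
```

The signed offset is the starting point for step (iv), since classifying a peak relative to gene structure additionally requires the gene's strand and exon coordinates.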
Summary: W-ChIPMotifs is a web application tool that provides a user-friendly interface for de novo motif discovery. The web tool is based on our previous ChIPMotifs program, a de novo motif finding tool developed for ChIP-based high-throughput data that incorporates various ab initio motif discovery tools, such as MEME, MaMF and Weeder, and optimizes the significance of the detected motifs by using a bootstrap resampling statistic method and a Fisher test. Use of a randomized statistical model like bootstrap resampling can significantly increase the accuracy of the detected motifs. In our web tool, we have modified the program in two aspects: (i) we have refined the P-value with a Bonferroni correction; (ii) we have incorporated the STAMP tool to infer phylogenetic information and to determine whether the detected motifs are novel or known using the TRANSFAC and JASPAR databases. A comprehensive result file is mailed to users.
Availability: http://motif.bmi.ohio-state.edu/ChIPMotifs. Data used in the article may be downloaded from http://motif.bmi.ohio-state.edu/ChIPMotifs/examples.shtml.
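The Bonferroni refinement mentioned in the summary above multiplies each raw P-value by the number of tests and caps the result at 1. A minimal sketch follows; the function name and the example P-values are illustrative and are not taken from W-ChIPMotifs.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: p * m, capped at 1, for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Raw P-values for four candidate motifs tested in the same run.
raw = [0.001, 0.02, 0.04, 0.5]
adjusted = bonferroni(raw)
significant = [p_adj < 0.05 for p_adj in adjusted]
```

With four candidates, only the first motif stays below 0.05 after correction, which shows how the adjustment guards against calling motifs significant merely because many were tested.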
There is widespread interest in efficient characterization of differences between tumor and normal samples. Here, we demonstrate an effective methodology for genome-scale characterization of tumors. Using matched normal and tumor samples from liver cancer patients, as well as non-cancer-related normal liver tissue, we first determined changes in gene expression as monitored on RNA expression arrays. We identified several hundred mRNAs that were consistently changed in the tumor samples. To characterize the mechanisms responsible for creation of the tumor-specific transcriptome, we performed ChIP-chip experiments to assay binding of RNA polymerase II, H3me3K27, and H3me3K9 and DNA methylation in 25,000 promoter regions. These experiments identified changes in active and silenced regions of the genome in the tumor cells. Finally, we used a “virtual comparative genomic hybridization” (vCGH) method to identify copy number alterations in the tumor samples. Through comparison of RNA Polymerase II binding, chromatin structure, DNA methylation, and copy number changes, we suggest that the major contributor to creation of the liver tumor transcriptome was changes in gene copy number.
liver tumors; RNA profiling; ChIP-chip; comparative genomic hybridization