Kaposi's sarcoma-associated herpesvirus (KSHV) is a γ-herpesvirus associated with KS and two lymphoproliferative diseases. Recent studies characterized epigenetic modification of KSHV episomes during latency and determined that latency-associated genes are associated with H3K4me3 while most lytic genes are associated with the silencing mark H3K27me3. Since the latency-associated nuclear antigen (LANA) (i) is expressed very early after de novo infection, (ii) interacts with transcriptional regulators and chromatin remodelers, and (iii) regulates the LANA and RTA promoters, we hypothesized that LANA may contribute to the establishment of latency through epigenetic control. We performed a detailed ChIP-seq analysis in cells of lymphoid and endothelial origin and compared H3K4me3, H3K27me3, polII, and LANA occupancy. On viral episomes LANA binding was detected at numerous lytic and latent promoters, which were transactivated by LANA using reporter assays. LANA binding was highly enriched at H3K4me3 peaks and this co-occupancy was also detected on many host gene promoters. Bioinformatic analysis of enriched LANA binding sites in combination with biochemical binding studies revealed three distinct binding patterns. A small subset of LANA binding sites showed sequence homology to the characterized LBS1/2 sequence in the viral terminal repeat. A large number of sites contained a novel LANA binding motif (TCCAT)3 which was confirmed by gel shift analysis. Third, some viral and cellular promoters did not contain LANA binding sites and are likely enriched through protein/protein interaction. LANA was associated with H3K4me3 marks and in PEL cells 86% of all LANA bound promoters were transcriptionally active, leading to the hypothesis that LANA interacts with the machinery that methylates H3K4. 
Co-immunoprecipitation demonstrated LANA association with endogenous hSET1 complexes in both lymphoid and endothelial cells suggesting that LANA may contribute to the epigenetic profile of KSHV episomes.
KSHV is a DNA tumor virus which is associated with Kaposi's sarcoma and some lymphoproliferative diseases. During latent infection, the viral genome persists as circular extrachromosomal DNA in the nucleus and expresses a very limited number of viral proteins, including LANA, a multi-functional protein. KSHV episomes, like host genomic DNA, are subject to chromatin formation and histone modifications which contribute to tightly controlled gene expression during latency. We determined where LANA binds on the KSHV and human genomes, and mapped activating and repressing histone marks and RNA polymerase II binding. We found that LANA bound near transcription start sites, and that binding correlated with the active transcription mark H3K4me3, but not with the silencing mark H3K27me3. Binding sites for transcription factors including ZNF143, CTCF, and STAT1 are enriched at regions where LANA is bound. We identified some LANA binding sites near human gene promoters that resembled KSHV sequences known to bind LANA. We also found a novel motif that occurs frequently in the human genome and that binds LANA directly despite being different from known LANA-binding sequences. Furthermore, we demonstrate that LANA associates with the H3K4 methyltransferase hSET1, which creates activating histone marks.
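As an illustration of how occurrences of the (TCCAT)3 motif described above could be located in a sequence, the following minimal sketch scans both strands for three tandem copies of TCCAT. The function names and the toy sequence are hypothetical and are not taken from the study; exact string matching is assumed here, whereas a real analysis would more likely use a position weight matrix.

```python
import re

# The motif reported in the text: three tandem copies of TCCAT.
MOTIF = "TCCAT" * 3

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = MOTIF):
    """Return sorted (start, strand) hits; starts are 0-based.

    Overlapping occurrences are reported. A "-" hit means the motif
    is present on the reverse strand at that forward-strand position.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead lets overlapping matches be found.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Toy sequence (hypothetical): one forward and one reverse-strand copy.
example = "GGG" + MOTIF + "AAAA" + reverse_complement(MOTIF) + "CC"
print(find_motif(example))  # [(3, '+'), (22, '-')]
```

Exact matching keeps the sketch short; allowing mismatches would require a scoring scheme that the text does not specify.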
Long-range regulatory elements, such as enhancers, exert substantial control over tissue-specific gene expression patterns. Genome-wide discovery of functional enhancers in different cell types is important for our understanding of genome function as well as human disease etiology.
In this study, we developed an in silico approach to model the previously reported phenomenon of transcriptional pausing, accompanied by divergent transcription, at active promoters. We then used this model for large-scale prediction of non-promoter-associated bidirectional expression of short transcripts. Our predictions were significantly enriched for DNase hypersensitive sites, histone H3 lysine 27 acetylation (H3K27ac), and other chromatin marks associated with active rather than poised or repressed enhancers. We also detected modest bidirectional expression at binding sites of the CCCTC-factor (CTCF) genome-wide, particularly those that overlap H3K27ac.
Our findings indicate that the signature of bidirectional expression of short transcripts, learned from promoter-proximal transcriptional pausing, can be used to predict active long-range regulatory elements genome-wide, likely due in part to specific association of RNA polymerase with enhancer regions.
CCCTC binding factor (CTCF) is a highly conserved zinc finger protein, which is involved in chromatin organization, local histone modifications, and RNA polymerase II-mediated gene transcription. CTCF may act by binding tightly to DNA and recruiting other proteins to mediate its various functions in the nucleus. To further explore the role of this essential factor, we used a mass spectrometry-based approach to screen for novel CTCF-interacting partners.
Using biotinylated CTCF as bait, we identified upstream binding factor (UBF) and multiple other components of the RNA polymerase I complex as potential CTCF-interacting partners. Interestingly, CTCFL, the testis-specific paralog of CTCF, also binds UBF. The interaction between CTCF(L) and UBF is direct, and requires the zinc finger domain of CTCF(L) and the high mobility group (HMG)-box 1 and dimerization domain of UBF. Because UBF is involved in RNA polymerase I-mediated ribosomal (r)RNA transcription, we analyzed CTCF binding to the rDNA repeat. We found that CTCF bound to a site upstream of the rDNA spacer promoter and preferred non-methylated over methylated rDNA. DNA binding by CTCF in turn stimulated binding of UBF. Absence of CTCF in cultured cells resulted in decreased association of UBF with rDNA and in nucleolar fusion. Furthermore, lack of CTCF led to reduced binding of RNA polymerase I and variant histone H2A.Z near the rDNA spacer promoter, a loss of specific histone modifications, and diminished transcription of non-coding RNA from the spacer promoter.
UBF is the first common interaction partner of CTCF and CTCFL, suggesting a role for these proteins in chromatin organization of the rDNA repeats. We propose that CTCF affects RNA polymerase I-mediated events globally by controlling nucleolar number, and locally by regulating chromatin at the rDNA spacer promoter, similar to RNA polymerase II promoters. CTCF may load UBF onto rDNA, thereby forming part of a network that maintains rDNA genes poised for transcription.
Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear factor I (NFI) transcription factors. Indeed, ChIP assays showed that NFI factors occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near the PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA–mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program.
Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation.
Humans consist of a few hundred types of specialized-function cells. Spatial and temporal transcriptional regulation of genes is essential for manifestation of cellular phenotypes. Identification of regulatory regions in the genome is central to understanding the mechanism of cell type–specific gene regulation. Recently developed high-throughput sequencing technology and computational analyses allow genome-wide investigation of the genome's chromatin structure. Using the FAIRE-seq technique, we identified the genome's open chromatin regions, which harbor regulatory elements in adipocytes. Open chromatin regions distal to genes' transcription start sites significantly differ among cell types. Multiple cell type–specific open chromatin regions exist near genes regulated during adipocyte differentiation. Computational motif analysis of adipocyte-specific open chromatin regions revealed enrichment of a binding motif for the NFI transcription factor family. These factors bind to the regulatory elements near the adipogenic PPARγ, C/EBPα, and aP2 genes and regulate their expression. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus, whereas knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation. Our study demonstrates the utility of FAIRE-seq in providing a global view of regulatory elements and in identifying transcriptional regulators of cellular functions.
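Overlap statistics such as the fraction of adipocyte-specific FAIRE peaks that coincide with PPARγ or C/EBPα binding sites can be computed from two interval lists. The sketch below is a hypothetical illustration, not the pipeline used in the study: it assumes half-open (chrom, start, end) coordinates, counts a peak as overlapping if it shares at least one base with any site, and uses invented toy intervals.

```python
def fraction_overlapping(peaks, sites):
    """Fraction of peaks that overlap at least one site.

    Both inputs are lists of (chrom, start, end) tuples with
    half-open coordinates, so (100, 200) and (200, 300) do not overlap.
    """
    if not peaks:
        return 0.0
    # Group sites by chromosome so each peak is compared only locally.
    by_chrom = {}
    for chrom, start, end in sites:
        by_chrom.setdefault(chrom, []).append((start, end))
    n_hit = 0
    for chrom, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            n_hit += 1
    return n_hit / len(peaks)


# Toy intervals (hypothetical), chosen so that two of four peaks overlap.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
             ("chr2", 100, 200), ("chr2", 900, 950)]
toy_sites = [("chr1", 150, 160), ("chr2", 200, 300), ("chr2", 940, 1000)]
print(fraction_overlapping(toy_peaks, toy_sites))  # 0.5
```

The linear scan per peak is adequate for a sketch; genome-scale data would normally call for sorted intervals or an interval tree.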
MicroRNAs are small non-coding RNAs involved in post-transcriptional regulation of gene expression. Due to the poor annotation of primary microRNA (pri-microRNA) transcripts, the precise location of promoter regions driving expression of many microRNA genes is enigmatic. This deficiency hinders our understanding of microRNA-mediated regulatory networks. In this study, we develop a computational approach to identify the promoter region and transcription start site (TSS) of actively transcribed pri-microRNAs using genome-wide RNA Polymerase II (RPol II) binding patterns derived from ChIP-seq data. Based upon the assumption that the distributions of RPol II binding around the TSS of microRNA and protein coding genes are similar, we designed a statistical model to mimic RPol II binding patterns around the TSS of highly expressed, well-annotated promoter regions of protein coding genes. We used this model to systematically scan the regions upstream of all intergenic microRNAs for RPol II binding patterns similar to those of TSS from protein coding genes. We validated our findings by examining the conservation, CpG content, and activating histone marks in the identified promoter regions. We applied our model to assess changes in microRNA transcription in steroid hormone-treated breast cancer cells. The results demonstrate that many microRNA genes have lost hormone-dependent regulation in tamoxifen-resistant breast cancer cells. MicroRNA promoter identification based upon RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription, and therefore allows comparison of transcription activities between different conditions, such as normal and disease states.
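The scanning step described above, in which a model of RPol II binding learned at protein-coding TSSs is slid along the region upstream of a microRNA, can be sketched as a template match. The abstract does not specify the statistical model, so Pearson correlation is used here purely as a stand-in; the function names and the toy signal are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_template_match(signal, template):
    """Slide the template along the signal.

    Returns (offset, correlation) for the window that correlates best
    with the template, or (None, -1.0) if the signal is too short.
    """
    width = len(template)
    best = (None, -1.0)
    for i in range(len(signal) - width + 1):
        r = pearson(signal[i:i + width], template)
        if r > best[1]:
            best = (i, r)
    return best


# Hypothetical template: average RPol II profile around protein-coding TSSs.
template = [0, 1, 4, 9, 4, 1, 0]
# Hypothetical upstream signal containing a scaled copy of that shape.
signal = [0] * 10 + [0, 2, 8, 18, 8, 2, 0] + [0] * 10
offset, r = best_template_match(signal, template)
print(offset, round(r, 3))
```

Correlation is insensitive to overall signal height, which is one reason it is a convenient stand-in when expression levels differ between genes.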
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Our new algorithm can successfully predict the promoters from the genome wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
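To make the bagging idea mentioned above concrete, the following sketch trains an ensemble of one-feature threshold rules (decision stumps) on bootstrap resamples and classifies by majority vote. It is a simplified stand-in, not the published classifier: the study used 39 features and full Bagging and Random Forest learners, whereas the two features, the toy values, and all function names below are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single-feature threshold rule with the fewest training errors.

    Returns (feature_index, threshold, label_if_le, label_if_gt).
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): (H3K4me3 signal, CpG fraction) per 500 bp window.
rows = [(0.90, 0.70), (0.80, 0.60), (0.85, 0.65), (0.95, 0.75),
        (0.10, 0.30), (0.20, 0.20), (0.15, 0.25), (0.05, 0.35)]
labels = ["promoter"] * 4 + ["non-promoter"] * 4
models = fit_bagging(rows, labels)
print(predict(models, (0.99, 0.90)), predict(models, (0.0, 0.10)))
```

Averaging many weak rules trained on resampled data reduces the variance of any single rule, which is the property that makes bagging attractive for noisy enrichment profiles.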
The application of deep sequencing to map 5′ capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: “focused” promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and “dispersed” promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5′ capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.
How are genes transcribed at the right levels and under the right conditions? Transcription regulation in eukaryotes has long been proposed to work by a division of labor: ubiquitous DNA sequence features in the core promoter region, close to the transcription start site (TSS) of genes, were thought to generically encode information to recruit RNA polymerase to initiate transcription, while specific sequence features, often distal from the genes, were thought to boost expression under the right conditions. Supporting the generic function of core promoters, genome-wide chromatin maps showed a stereotypical arrangement of well-spaced nucleosomes providing access to the TSS. High-throughput sequencing has generated genome-wide TSS maps at high resolution, which show that promoters exhibit different initiation patterns, ranging from focused start sites to dispersed regions. Linking these patterns to chromatin maps, we now find distinct core promoter classes, those in which the TSS location is defined broadly on the chromatin level and those in which the TSS is defined by precisely positioned sequence features. Notably, these architectures are conserved deeply across eukaryotes and are used for different functional classes of genes. Our work adds to the increasing understanding that core promoters contribute significantly to the complexity of eukaryotic gene expression.
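The distinction between focused and dispersed promoters rests on how widely 5′ capped tags are spread around a promoter. One simple way to quantify that spread is the width spanning the central bulk of the tags, sketched below. The quantile bounds and the 10-position cutoff are assumptions chosen for illustration and are not the criteria used in the study; the tag counts are invented.

```python
def interquantile_width(counts, lo=0.1, hi=0.9):
    """Number of positions spanning the lo..hi quantiles of the tag counts.

    counts maps genomic position -> number of 5' capped tags.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tags at this promoter")
    cumulative = 0
    lo_pos = hi_pos = None
    for pos in sorted(counts):
        cumulative += counts[pos]
        if lo_pos is None and cumulative >= lo * total:
            lo_pos = pos
        if cumulative >= hi * total:
            hi_pos = pos
            break
    return hi_pos - lo_pos + 1


def classify_promoter(counts, max_focused_width=10):
    """Label a promoter 'focused' or 'dispersed' (the cutoff is an assumption)."""
    if interquantile_width(counts) <= max_focused_width:
        return "focused"
    return "dispersed"


# Toy tag counts (hypothetical).
focused_tags = {100: 2, 101: 90, 102: 8}                # one dominant TSS
dispersed_tags = {pos: 5 for pos in range(100, 200, 5)}  # tags spread evenly
print(classify_promoter(focused_tags), classify_promoter(dispersed_tags))
```

Using quantiles rather than the full range of tagged positions makes the measure robust to a few stray tags far from the main initiation site.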
MicroRNAs (miRNAs) are small non-coding RNAs that regulate expression of various target genes. miRNAs are expressed in a tissue-specific manner and play important roles in cell proliferation, apoptosis, and differentiation. Epigenetic alterations such as DNA methylation and histone modification are essential for chromatin remodeling and regulation of gene expression, including that of miRNAs. The CCCTC-binding factor, CTCF, is known to bind insulators and to exhibit enhancer-blocking and barrier functions and, more recently, has been shown to contribute to the three-dimensional organization of the genome. CTCF can also serve as a barrier against the spread of DNA methylation and repressive histone marks over promoter regions of tumor suppressor genes. Recent studies have shown that CTCF is also involved in the regulation of miRNAs such as miR-125b1, miR-375, and the miR-290 cluster in cancer cells and stem cells. miR-125b1 is a candidate tumor suppressor and is silenced in breast cancer cells. On the other hand, miR-375 may have oncogenic function and is overexpressed in breast cancer cells. CTCF is involved in the regulation of both miR-125b1 and miR-375, indicating that there are various patterns of CTCF-associated epigenetic regulation of miRNAs. CTCF may also play a key role in the pluripotency of cells through the regulation of the miR-290 cluster. These observations suggest that CTCF-mediated regulation of miRNAs could be a novel approach for cancer therapy and regenerative medicine.
microRNA; CTCF; cancer cell; embryonic stem cell; miR-125b1; miR-375; miR-290 cluster
Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.
The DNA in eukaryotes is packaged by histones. Interestingly, histones can be marked by a variety of posttranslational modifications, and it has been hypothesized that distinct combinations of histone modifications mark distinct functional regions of the genome. The study of histone modifications has been aided by the development of high-throughput techniques to map a wide assortment of histone modifications on a global scale. However, because much of our current understanding of the human genome is concentrated on promoters, most studies have only examined histone modifications at these well-defined sites, ignoring the vast majority of the genome. To aid in the discovery of functional elements outside of these well-annotated loci, we develop an unbiased method that searches for commonly occurring histone modification patterns on a global scale without using any annotation information. This method recovers known patterns associated with transcriptional enhancers and promoters. Supporting the histone code hypothesis, we discover that the different functional activities of enhancers are closely associated with the presence of different histone modification patterns. We also discover several novel patterns that likely contain other potential regulatory elements. As the availability of large-scale histone modification data increases, the ability of methods such as the one presented here to concisely describe commonly occurring chromatin signatures, thereby abstracting away irrelevant or redundant data, will become increasingly critical.
Kaposi's sarcoma-associated herpesvirus (KSHV) is a human herpesvirus that causes Kaposi's sarcoma and is associated with the development of lymphoproliferative diseases. KSHV reactivation from latency and virion production is dependent on efficient transcription of over eighty lytic cycle genes and viral DNA replication. CTCF and cohesin, cellular proteins that cooperatively regulate gene expression and mediate long-range DNA interactions, have been shown to bind at specific sites in herpesvirus genomes. CTCF and cohesin regulate KSHV gene expression during latency and may also control lytic reactivation, although their role in lytic gene expression remains incompletely characterized. Here, we analyze the dynamic changes in CTCF and cohesin binding that occur during the process of KSHV viral reactivation and virion production by high resolution chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and show that both proteins dissociate from viral genomes in kinetically and spatially distinct patterns. By utilizing siRNAs to specifically deplete CTCF and Rad21, a cohesin component, we demonstrate that both proteins are potent restriction factors for KSHV replication, with cohesin knockdown leading to hundred-fold increases in viral yield. High-throughput RNA sequencing was used to characterize the transcriptional effects of CTCF and cohesin depletion, and demonstrated that both proteins have complex and global effects on KSHV lytic transcription. Specifically, both proteins act as positive factors for viral transcription initially but subsequently inhibit KSHV lytic transcription, such that their net effect is to limit KSHV RNA accumulation. Cohesin is a more potent inhibitor of KSHV transcription than CTCF but both proteins are also required for efficient transcription of a subset of KSHV genes. 
These data reveal novel effects of CTCF and cohesin on transcription from a relatively small genome that resemble their effects on the cellular genome by acting as gene-specific activators of some promoters, but differ in acting as global negative regulators of transcription.
Kaposi's sarcoma-associated herpesvirus (KSHV) is a human virus that causes Kaposi's sarcoma and lymphoma. KSHV establishes a lifelong infection in B lymphocytes, and persists in a latent form as circular DNA molecules. Reactivation and replication yield infectious virions, allowing transmission and maintenance of latent infection. The cellular mechanisms controlling reactivation remain incompletely characterized. Host proteins that regulate RNA transcription play an important role in controlling viral reactivation. In this study, we used high-throughput techniques to analyze the binding of two cellular proteins, CTCF and Rad21, to the KSHV genome as the virus reactivated to produce infectious virions. We found that these proteins dissociate from the latent genome when reactivation occurs. We also found that depleting cells of these proteins increases virus production as much as a hundredfold. Depleting the cell of CTCF or Rad21 caused complex changes in the synthesis of RNAs by KSHV, with the amounts of most KSHV RNAs increasing greatly. We also showed that Rad21 and CTCF are needed for the virus to synthesize RNAs efficiently. Our study provides new insights into how the cell uses CTCF and Rad21 to limit KSHV's ability to synthesize RNA and reactivate from latency to produce infectious virus.
Bidirectional promoters are promoter sequences shared between divergent gene pairs (genes proximal to each other on opposite strands), and can regulate the genes in both directions. In the human genome, > 10% of protein-coding genes are arranged head-to-head on opposite strands, with transcription start sites that are separated by < 1,000 base pairs. Many transcription factor binding sites that influence the expression of the two opposing genes occur in bidirectional promoters. Recently, RNA polymerase II (RPol II) ChIP-seq data have been used to identify the promoters of coding genes and non-coding RNAs. However, bidirectional promoters have not yet been identified from RPol II ChIP-seq data.
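The head-to-head arrangement described above can be expressed as a simple test on annotated transcription start sites: a minus-strand TSS lying upstream of a plus-strand TSS on the same chromosome, less than 1,000 bp away. The sketch below applies that test to a toy annotation; the gene names, coordinates, and function name are hypothetical.

```python
def divergent_pairs(genes, max_distance=1000):
    """Find head-to-head (divergent) gene pairs.

    genes is a list of (name, chrom, tss, strand). A pair qualifies when
    the minus-strand TSS has the smaller coordinate, so the two genes are
    transcribed away from each other, and the TSSs are closer than
    max_distance. Returns sorted (minus_gene, plus_gene, distance) tuples.
    """
    minus = [g for g in genes if g[3] == "-"]
    plus = [g for g in genes if g[3] == "+"]
    pairs = []
    for m_name, m_chrom, m_tss, _ in minus:
        for p_name, p_chrom, p_tss, _ in plus:
            distance = p_tss - m_tss
            if m_chrom == p_chrom and 0 < distance < max_distance:
                pairs.append((m_name, p_name, distance))
    return sorted(pairs)


# Toy annotation (hypothetical).
toy_genes = [
    ("A", "chr1", 1000, "-"),  # divergent with B (400 bp apart)
    ("B", "chr1", 1400, "+"),
    ("C", "chr1", 5000, "+"),  # C and D face each other: convergent, not divergent
    ("D", "chr1", 5300, "-"),
    ("E", "chr2", 1200, "+"),  # different chromosome
]
print(divergent_pairs(toy_genes))  # [('A', 'B', 400)]
```

The all-against-all comparison is quadratic and is used only for brevity; sorting TSSs by coordinate would let neighbouring genes be compared in a single pass.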
In some bidirectional promoter regions, RPol II binding forms a bi-peak shape, which indicates that two promoters are located in the bidirectional region. We have developed a computational approach to identify the regulatory regions of all divergent gene pairs using genome-wide RPol II binding patterns derived from ChIP-seq data, based upon the assumption that the RPol II binding pattern around a bidirectional promoter is the accumulation of RPol II binding at two promoters. In HeLa S3 cells, 249 promoter pairs and 1,094 single promoters were identified; of the single promoters, 76 cover only positive-strand genes, 86 cover only negative-strand genes, and 932 cover both genes. Gene expression levels and STAT1 binding sites were then examined for the different promoter categories.
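The assumption stated above, that the RPol II profile at a bidirectional promoter is the sum of the profiles of two promoters, can be illustrated by adding two copies of a single-promoter template at different offsets and counting the peaks that remain distinguishable. This is a hypothetical sketch, not the authors' model: the template values, the moving-average smoothing, and the 50% height cutoff are all assumptions.

```python
def combine(template, offset_a, offset_b, length):
    """Sum two copies of a single-promoter template placed at two offsets."""
    profile = [0.0] * length
    for offset in (offset_a, offset_b):
        for j, value in enumerate(template):
            profile[offset + j] += value
    return profile


def smooth(values, window=3):
    """Centered moving average; windows are truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def local_maxima(values, min_fraction=0.5):
    """Indices of local maxima at least min_fraction of the global maximum."""
    top = max(values)
    peaks = []
    for i in range(1, len(values) - 1):
        rising = values[i] > values[i - 1]
        not_falling_behind = values[i] >= values[i + 1]
        if rising and not_falling_behind and values[i] >= min_fraction * top:
            peaks.append(i)
    return peaks


# Hypothetical single-promoter RPol II template.
template = [1, 3, 6, 3, 1]
# Two well-separated promoters give a bi-peak profile.
separated = smooth(combine(template, 2, 12, 20))
# Two nearly coincident promoters merge into a single peak.
merged = smooth(combine(template, 2, 3, 20))
print(local_maxima(separated), local_maxima(merged))  # [4, 14] [4]
```

The example also shows why a bi-peak shape is seen only in some bidirectional regions: when the two start sites are close relative to the width of the binding profile, the summed signal has a single maximum.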
Identifying the regulatory regions of bidirectional promoters from RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription. Gene expression and transcription factor binding site analyses suggest that the promoters in bidirectional regions may regulate the closest gene, and that STAT1 is involved in the primary promoter.
MOF is the major histone H4 lysine 16-specific (H4K16) acetyltransferase in mammals and Drosophila. In flies, it is involved in the regulation of X-chromosomal and autosomal genes as part of the MSL and the NSL complexes, respectively. While the function of the MSL complex as a dosage compensation regulator is fairly well understood, the role of the NSL complex in gene regulation is still poorly characterized. Here we report a comprehensive ChIP–seq analysis of four NSL complex members (NSL1, NSL3, MBD-R2, and MCRS2) throughout the Drosophila melanogaster genome. Strikingly, the majority (85.5%) of NSL-bound genes are constitutively expressed across different cell types. We find that an increased abundance of the histone modifications H4K16ac, H3K4me2, H3K4me3, and H3K9ac in gene promoter regions is characteristic of NSL-targeted genes. Furthermore, we show that these genes have a well-defined nucleosome free region and broad transcription initiation patterns. Finally, by performing ChIP–seq analyses of RNA polymerase II (Pol II) in NSL1- and NSL3-depleted cells, we demonstrate that both NSL proteins are required for efficient recruitment of Pol II to NSL target gene promoters. The observed Pol II reduction coincides with compromised binding of TBP and TFIIB to target promoters, indicating that the NSL complex is required for optimal recruitment of the pre-initiation complex on target genes. Moreover, genes that undergo the most dramatic loss of Pol II upon NSL knockdowns tend to be enriched in DNA Replication–related Element (DRE). Taken together, our findings show that the MOF-containing NSL complex acts as a major regulator of housekeeping genes in flies by modulating initiation of Pol II transcription.
Housekeeping genes are required to support basic cellular functions and are therefore expressed constitutively in all tissues. Although the homeostasis of housekeeping gene expression is vital for cell survival, most research on transcription initiation has focused on TATA-box-containing promoters of inducible and developmental genes, while regulatory mechanisms at the TATA-less promoters of housekeeping genes have remained poorly understood. Using genome-wide chromatin binding profiles, we find that the NSL complex, a histone acetyltransferase-containing complex, is bound to the majority of constitutively active gene promoters. We show that NSL-bound genes display specific sets of DNA motifs, well-defined nucleosome free regions, and broad transcription initiation patterns. In addition, we show that the NSL complex regulates the recruitment of the basal transcription machinery to target promoters; more specifically, we can pinpoint its role to the early steps of Pol II recruitment. Interestingly, we also see that NSL-bound genes are most susceptible to Pol II loss after depletion of NSLs when they contain the DNA Replication–related Element (DRE). Taken together, we provide a genome-wide analysis of a chromatin-modifying complex that is globally involved in the regulation of housekeeping gene expression.
Gene expression is regulated by the complex interaction between transcriptional activators and repressors, which function in part by recruiting histone-modifying enzymes to control accessibility of DNA to RNA polymerase. The evolutionarily conserved family of Groucho/Transducin-Like Enhancer of split (Gro/TLE) proteins act as co-repressors for numerous transcription factors. Gro/TLE proteins act in several key pathways during development (including Notch and Wnt signaling), and are implicated in the pathogenesis of several human cancers. Gro/TLE proteins form oligomers and it has been proposed that their ability to exert long-range repression on target genes involves oligomerization over broad regions of chromatin. However, analysis of an endogenous gro mutation in Drosophila revealed that oligomerization of Gro is not always obligatory for repression in vivo. We have used chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) to profile Gro recruitment in two Drosophila cell lines. We find that Gro predominantly binds at discrete peaks (<1 kilobase). We also demonstrate that blocking Gro oligomerization does not reduce peak width as would be expected if Gro oligomerization induced spreading along the chromatin from the site of recruitment. Gro recruitment is enriched in “active” chromatin containing developmentally regulated genes. However, Gro binding is associated with local regions containing hypoacetylated histones H3 and H4, which is indicative of chromatin that is not fully open for efficient transcription. We also find that peaks of Gro binding frequently overlap the transcription start sites of expressed genes that exhibit strong RNA polymerase pausing and that depletion of Gro leads to release of polymerase pausing and increased transcription at a bona fide target gene. 
Our results demonstrate that Gro is recruited to local sites by transcription factors to attenuate rather than silence gene expression by promoting histone deacetylation and polymerase pausing.
Repression by transcription factors plays a central role in gene regulation. The Groucho/Transducin-Like Enhancer of split (Gro/TLE) family of co-repressors interacts with many different transcription factors and has many essential roles during animal development. Groucho/TLE proteins form oligomers that are necessary for target gene repression in some contexts. We have profiled the genome-wide recruitment of the founding member of this family, Groucho (from Drosophila) to gain insight into how and where it binds with respect to target genes and to identify factors associated with its binding. We find that Groucho binds in discrete peaks, frequently at transcription start sites, and that blocking Groucho from forming oligomers does not significantly change the pattern of Groucho recruitment. Although Groucho acts as a repressor, Groucho binding is enriched in chromatin that is permissive for transcription, and we find that it acts to attenuate rather than completely silence target gene expression. Thus, Groucho does not act as an “on/off” switch on target gene expression, but rather as a “mute” button.
Identification of diffuse signals from chromatin immunoprecipitation followed by high-throughput massively parallel sequencing (ChIP-Seq) poses significant computational challenges, and there are few methods currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER, which was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 out of a total of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched with biological functions related to metabolic processes under resting conditions but with developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and expected secondary structures by prediction. Last, they are enriched with motifs of transcription factors such as PU.1 and AP-1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, the global clustering approach (GCLS), applied to RNA polymerase II and H3K4Me3 ChIP-Seq data, can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in the study.
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domain and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post-Deep Brain Stimulation (DBS) treatment and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD-regulated regions of FUS and HNRNP A/B included prion-like domains. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from the brains of PD patients and unaffected controls revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases.
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases.
Long non-coding RNAs (lncRNAs) comprise a novel, fascinating class of RNAs with largely unknown biological functions. Parkinson's disease (PD) is the most frequent motor disorder, and deep brain stimulation (DBS) treatment alleviates the symptoms, but early disease biomarkers are still unknown and new targets for genetic interference are urgently needed. Using RNA-sequencing technology and a novel computational workflow for in-depth exploration of whole-transcriptome RNA-seq datasets, we detected and analyzed lncRNAs in sequenced libraries from PD patients' leukocytes pre- and post-treatment and from the brain, adding this full profile resource of over 7,000 lncRNAs to the few human tissue-derived lncRNA datasets that are currently available. Our study includes sample-specific database construction, detecting disease-derived changes in known and novel lncRNAs, exons and junctions and predicting corresponding changes in polyadenylation choices, protein domains and miRNA binding sites. We report widespread transcript structure variations at the splice junction and exon levels, including novel exons and junctions and alteration of lncRNAs, followed by experimental validation in PD leukocytes and two PD brain regions compared with controls. Our results suggest lncRNA involvement in neurodegenerative diseases, and specifically PD. This comprehensive workflow will be of use to the increasing number of laboratories producing RNA-Seq data in a wide range of biomedical studies.
We have analyzed publicly available K562 Hi-C data, which enable genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA but also affects expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
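The idea of scoring Hi-C contacts against a power-law decay background can be illustrated with a minimal sketch. This is not the Mixture Poisson Regression Model used in the study; it is a simplified stand-in that assumes a single power law, expected = c · distance^(−alpha), fitted by least squares in log-log space, followed by a plain Poisson upper-tail test. All function names and the example numbers are hypothetical.

```python
import math


def fit_power_law(distances, counts):
    """Fit counts ~ c * distance**(-alpha) by least squares on log-log values.

    Assumes every distance and count is strictly positive.
    Returns the tuple (c, alpha).
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def expected_contacts(distance, c, alpha):
    """Background contact count expected at a given genomic distance."""
    return c * distance ** (-alpha)


def poisson_upper_tail(observed, mean):
    """Return P(X >= observed) for X ~ Poisson(mean)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= mean / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical average contact counts at three genomic distances (bp).
c, alpha = fit_power_law([1e4, 1e5, 1e6], [100.0, 10.0, 1.0])
background = expected_contacts(5e5, c, alpha)
p_value = poisson_upper_tail(12, background)
```

A locus pair whose observed count yields a small `p_value` is enriched over the distance-matched background; a real analysis would also need multiple-testing correction and the bias terms that a regression model can absorb.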
The earliest stages of development in most metazoans are driven by maternally deposited proteins and mRNAs, with widespread transcriptional activation of the zygotic genome occurring hours after fertilization, at a period known as the maternal-to-zygotic transition (MZT). In Drosophila, the MZT is preceded by the transcription of a small number of genes that initiate sex determination, patterning, and other early developmental processes; and the zinc-finger protein Zelda (ZLD) plays a key role in their transcriptional activation. To better understand the mechanisms of ZLD activation and the range of its targets, we used chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to map regions bound by ZLD before (mitotic cycle 8), during (mitotic cycle 13), and after (late mitotic cycle 14) the MZT. Although only a handful of genes are transcribed prior to mitotic cycle 10, we identified thousands of regions bound by ZLD in cycle 8 embryos, most of which remain bound through mitotic cycle 14. As expected, early ZLD-bound regions include the promoters and enhancers of genes transcribed at this early stage. However, we also observed ZLD bound at cycle 8 to the promoters of roughly a thousand genes whose first transcription does not occur until the MZT and to virtually all of the thousands of known and presumed enhancers bound at cycle 14 by transcription factors that regulate patterned gene activation during the MZT. The association between early ZLD binding and MZT activity is so strong that ZLD binding alone can be used to identify active promoters and regulatory sequences with high specificity and selectivity. This strong early association of ZLD with regions not active until the MZT suggests that ZLD is not only required for the earliest wave of transcription but also plays a major role in activating the genome at the MZT.
The newly fertilized eggs of most animal species begin development with a series of rapid cell divisions. During this time of rapid DNA replication, there is little or no transcription of the embryo's genome, with the synthesis of new proteins being directed by a store of maternally deposited mRNAs. Several hours after fertilization, at a period known as the maternal-to-zygotic transition (MZT), transcription of the embryo's genome begins in earnest, but little is known about how this process is initiated. In this paper we investigate the role of a protein known as Zelda (or ZLD) at the MZT in the laboratory model insect Drosophila melanogaster. ZLD had been previously shown to control the activation of a small number of genes expressed prior to the MZT. Here, using an experimental technique (ChIP-Seq) that allowed us to visualize where on the genome a protein is bound, we show that, approximately an hour prior to the MZT, ZLD is bound to most of the genomic regions active at the MZT. This suggests that ZLD may act as a kind of an “on switch” for the zygotic genome, poising regions where it binds for activation at the MZT, and this raises the possibility that similar master regulators of the MZT exist in other species.
During retinal development, post-mitotic neural progenitor cells must activate thousands of genes to complete synaptogenesis and terminal maturation. While many of these genes are known, others remain beyond the sensitivity of expression microarray analysis. Some of these elusive gene activation events can be detected by mapping changes in RNA polymerase-II (Pol-II) association around transcription start sites.
High-resolution (35 bp) chromatin immunoprecipitation (ChIP)-on-chip was used to map changes in Pol-II binding surrounding 26,000 gene transcription start sites during photoreceptor maturation of the mouse neural retina, comparing postnatal age 25 (P25) to P2. Coverage was 10–12 kb per transcription start site, including 2.5 kb downstream. Pol-II-active regions were mapped to the mouse genomic DNA sequence by using computational methods (Tiling Analysis Software-TAS program), and the ratio of maximum Pol-II binding (P25/P2) was calculated for each gene. A validation set of 36 genes (3%), representing a full range of Pol-II signal ratios (P25/P2), were examined with quantitative ChIP assays for transcriptionally active Pol-II. Gene expression assays were also performed for 19 genes of the validation set, again on independent samples. FLT-3 Interacting Zinc-finger-1 (FIZ1), a zinc-finger protein that associates with active promoter complexes of photoreceptor-specific genes, provided an additional ChIP marker to highlight genes activated in the mature neural retina. To demonstrate the use of ChIP-on-chip predictions to find novel gene activation events, four additional genes were selected for quantitative PCR analysis (qRT–PCR analysis); these four genes have human homologs located in unidentified retinal disease regions: Solute carrier family 25 member 33 (Slc25a33), Lysophosphatidylcholine acyltransferase 1 (Lpcat1), Coiled-coil domain-containing 126 (Ccdc126), and ADP-ribosylation factor-like 4D (Arl4d).
ChIP-on-chip Pol-II peak signal ratios >1.8 predicted increased amounts of transcribing Pol-II and increased expression with an estimated 97% accuracy, based on analysis of the validation gene set. Using this threshold ratio, 1,101 genes were predicted to experience increased binding of Pol-II in their promoter regions during terminal maturation of the neural retina. Over 800 of these gene activations were additional to those previously reported by microarray analysis. Slc25a33, Lpcat1, Ccdc126, and Arl4d increased expression significantly (p<0.001) during photoreceptor maturation. Expression of all four genes was diminished in adult retinas lacking rod photoreceptors (Rd1 mice) compared to normal retinas (90% loss for Ccdc126 and Arl4d). For rhodopsin (Rho), a marker of photoreceptor maturation, two regions of maximum Pol-II signal corresponded to the upstream rhodopsin enhancer region and the rhodopsin proximal promoter region.
High-resolution maps of Pol-II binding around transcription start sites were generated for the postnatal mouse retina; these maps can predict activation increases for a specific gene of interest. Novel gene activation predictions are enriched for biologic functions relevant to vision, neural function, and chromatin regulation. Use of the data set to detect novel activation increases was demonstrated by expression analysis for several genes that have human homologs located within unidentified retinal disease regions: Slc25a33, Lpcat1, Ccdc126, and Arl4d. Analysis of photoreceptor-deficient retinas indicated that all four genes are expressed in photoreceptors. Genome-wide maps of Pol-II binding were developed for visual access in the University of California, Santa Cruz (UCSC) Genome Browser and its eye-centric version EyeBrowse (National Eye Institute-NEI). Single promoter resolution of Pol-II distribution patterns suggests that the Rho enhancer region and the Rho proximal promoter region become closely associated with the activated gene’s promoter complex.
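The prediction rule described in this abstract, a P25/P2 ratio of maximum Pol-II signal above 1.8, can be written as a short sketch. Only the 1.8 threshold comes from the text; the function names, the input layout, and the gene labels and signal values below are hypothetical placeholders, and peak calling itself (done with TAS in the study) is assumed to have already happened.

```python
RATIO_THRESHOLD = 1.8  # P25/P2 cut-off quoted in the abstract


def pol2_ratio(p25_max, p2_max):
    """Ratio of maximum Pol-II signal at P25 to that at P2."""
    if p2_max <= 0:
        raise ValueError("P2 signal must be positive to form a ratio")
    return p25_max / p2_max


def predict_activated(signals, threshold=RATIO_THRESHOLD):
    """Return the sorted gene names whose P25/P2 ratio exceeds the threshold.

    `signals` maps a gene name to a (p25_max, p2_max) tuple.
    """
    return sorted(
        gene
        for gene, (p25_max, p2_max) in signals.items()
        if pol2_ratio(p25_max, p2_max) > threshold
    )


# Hypothetical maximum Pol-II signals, for illustration only.
example_signals = {
    "GeneA": (9.0, 3.0),  # ratio 3.0, predicted activated
    "GeneB": (2.0, 2.0),  # ratio 1.0, unchanged
    "GeneC": (3.0, 2.0),  # ratio 1.5, below the cut-off
}
activated = predict_activated(example_signals)
```

The comparison is strict, matching the ">1.8" wording, so a gene sitting exactly on the threshold is not called.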
Herpesvirus persistence requires a dynamic balance between latent and lytic cycle gene expression, but how this balance is maintained remains enigmatic. We have previously shown that the Kaposi's Sarcoma-Associated Herpesvirus (KSHV) major latency transcripts encoding LANA, vCyclin, vFLIP, v-miRNAs, and Kaposin are regulated, in part, by a chromatin organizing element that binds CTCF and cohesins. Using viral genome-wide chromatin conformation capture (3C) methods, we now show that KSHV latency control region is physically linked to the promoter regulatory region for ORF50, which encodes the KSHV immediate early protein RTA. Other linkages were also observed, including an interaction between the 5′ and 3′ end of the latency transcription cluster. Mutation of the CTCF-cohesin binding site reduced or eliminated the chromatin conformation linkages, and deregulated viral transcription and genome copy number control. siRNA depletion of CTCF or cohesin subunits also disrupted chromosomal linkages and deregulated viral latent and lytic gene transcription. Furthermore, the linkage between the latent and lytic control region was subject to cell cycle fluctuation and disrupted during lytic cycle reactivation, suggesting that these interactions are dynamic and regulatory. Our findings indicate that KSHV genomes are organized into chromatin loops mediated by CTCF and cohesin interactions, and that these inter-chromosomal linkages coordinate latent and lytic gene control.
Multiple mechanisms have been implicated in the control of herpesvirus latent and lytic gene regulation, but few mechanisms account for coordinate regulation of these two life cycles. Here, we show that the transcription control elements for KSHV latent and lytic genes are in close physical proximity. Mutations in the CTCF binding sites of the KSHV latency control region caused a loss of cohesin binding, and derepression of latent transcripts. Loss of CTCF binding also caused a loss of KSHV DNA copy number, and a failure to express lytic genes, including the immediate early gene Rta. Chromatin conformation capture (3C) methods indicated that the CTCF binding sites in the latency control region are linked to the promoter region of Rta. Additional chromatin linkages were detected between the 5′ and 3′ ends of the major latency transcripts, suggesting that chromatin loops organize both latent and lytic gene clusters. The interaction between latent and lytic control regions was subject to cell cycle regulation, consistent with earlier studies implicating cell cycle control of cohesin binding and viral transcription patterns. KSHV chromosome conformation was also disrupted by lytic cycle reactivation. We propose that CTCF-cohesin form dynamic linkages between viral regulatory domains to both insulate and coordinate latent and lytic gene expression.
Comparative ChIP-seq data reveal adaptive evolution of insulator protein CTCF binding in multiple Drosophila species.
Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping the binding evolution. Here we examine the genome-wide binding of CTCF in four species of Drosophila separated by between ∼2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference for CTCF is highly conserved, we find that CTCF binding itself is highly evolutionarily dynamic and has adaptively evolved. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly at the rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, we detected signatures of selection in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature for positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. Our data indicate that binding of Drosophila CTCF protein has evolved under natural selection, and CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes.
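A linear relationship between binding divergence and evolutionary distance, as reported above, amounts to estimating a slope in percent per Myr. The sketch below assumes a line forced through the origin (zero divergence at zero distance), which is an illustrative simplification and not necessarily the model used in the study. The data points are synthetic, generated from the quoted 2.22% per Myr rate, and the function names are hypothetical.

```python
def slope_through_origin(times_myr, divergence_pct):
    """Least-squares slope of divergence versus time with a zero intercept."""
    numerator = sum(t * d for t, d in zip(times_myr, divergence_pct))
    denominator = sum(t * t for t in times_myr)
    return numerator / denominator


def projected_divergence(rate_pct_per_myr, time_myr):
    """Divergence expected after time_myr under a constant linear rate."""
    return rate_pct_per_myr * time_myr


# Synthetic points lying exactly on a 2.22 %/Myr line (not measured data).
times = [2.5, 10.0, 25.0]
divergences = [2.22 * t for t in times]
rate = slope_through_origin(times, divergences)
```

With real pairwise comparisons the points would scatter around the line, and an intercept term or a saturating model might fit better at larger distances.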
A large proportion of the diversity of living organisms results from differential regulation of gene transcription. Transcriptional regulation is thought to differ between species because of evolutionary changes in the physical interactions between regulatory DNA elements and DNA-binding proteins; these can generate variation in the spatial and temporal patterns of gene expression. The mechanisms by which these protein–DNA interactions evolve are therefore an important question in evolutionary biology. Does adaptive evolution play a role, or is the process dominated by neutral genetic drift? Insulator proteins are a special group of DNA-binding proteins: instead of directly serving to activate or repress genes, they can function to coordinate the interactions between other regulatory elements (such as enhancers and promoters). Additionally, insulator proteins can limit the spreading of chromatin condensation and help to demarcate the boundaries of regulatory domains in the genome. In spite of their critical role in genome regulation, little is known about the evolution of interactions between insulator proteins and DNA. Here, we use ChIP-seq to examine the distribution of binding sites for CTCF, a highly conserved insulator protein, in four closely related Drosophila species. We find that genome-wide binding profiles of CTCF are highly dynamic across evolutionary time, with frequent births of new CTCF-DNA interactions, and we demonstrate that this evolutionary process is driven by natural selection. By comparing these with RNA-seq data, we find that gain or loss of CTCF binding impacts the expression levels of nearby genes and correlates with structural evolution of the genome. Together these results suggest a potential mechanism of regulatory re-wiring through adaptive evolution of CTCF binding.
Chromatin-organizing factors such as CTCF and cohesins have been implicated in the control of complex viral regulatory programs. We investigated the role of CTCF and cohesins in the control of the switch from latency to the lytic cycle for Kaposi's sarcoma-associated herpesvirus (KSHV). We found that cohesin subunits but not CTCF are required for the repression of KSHV immediate early gene transcription. Depletion of the cohesin subunits Rad21, SMC1, and SMC3 resulted in lytic cycle gene transcription and viral DNA replication. In contrast, depletion of CTCF failed to induce lytic transcription or DNA replication. Chromatin immunoprecipitation with high-throughput sequencing (ChIP-Seq) revealed that cohesins and CTCF bound to several sites within the immediate early control region for ORF50 and to more distal 5′ sites that also regulate the divergently transcribed ORF45-ORF46-ORF47 gene cluster. Rad21 depletion led to a robust increase in ORF45, ORF46, ORF47, and ORF50 transcripts, with similar kinetics to that observed with chemical induction by sodium butyrate. During latency, the chromatin between the ORF45 and ORF50 transcription start sites was enriched in histone H3K4me3, with elevated H3K9ac at the ORF45 promoter and elevated H3K27me3 at the ORF50 promoter. A paused form of RNA polymerase II (Pol II) was loosely associated with the ORF45 promoter region during latency but was converted to an active elongating form upon reactivation induced by Rad21 depletion. Butyrate treatment caused a rapid dissociation of cohesins and loss of CTCF binding at the immediate early gene locus, suggesting that cohesins may be a direct target of butyrate-mediated lytic induction. Our findings implicate cohesins as a major repressor of KSHV lytic gene activation and show that they function coordinately with CTCF to regulate the switch between latent and lytic gene activity.
Genomic imprinting is the epigenetic marking of genes that results in parent-of-origin monoallelic expression. Most imprinted domains are associated with differentially DNA methylated regions (DMRs) that originate in the gametes, and are maintained in somatic tissues after fertilization. This allelic methylation profile is associated with a plethora of histone tail modifications that orchestrates higher order chromatin interactions. The mouse chromosome 15 imprinted cluster contains multiple brain-specific maternally expressed transcripts including Ago2, Chrac1, Trappc9 and Kcnk9 and a paternally expressed gene, Peg13. The promoter of Peg13 is methylated on the maternal allele and is the sole DMR within the locus. To determine the extent of imprinting within the human orthologous region on chromosome 8q24, a region associated with autosomal recessive intellectual disability, Birk-Barel mental retardation and dysmorphism syndrome, we have undertaken a systematic analysis of allelic expression and DNA methylation of genes mapping within an approximately 2 Mb region around TRAPPC9.
Utilizing allele-specific RT-PCR, bisulphite sequencing, chromatin immunoprecipitation and chromosome conformation capture (3C), we show the reciprocal expression of the novel, paternally expressed PEG13 non-coding RNA and the maternally expressed KCNK9 gene in brain, and the biallelic expression of flanking transcripts in a range of tissues. We identify a tandem-repeat region overlapping the PEG13 transcript that is methylated on the maternal allele, binds CTCF-cohesin in chromatin immunoprecipitation experiments and possesses enhancer-blocker activity. Using 3C, we identify mutually exclusive chromatin loops of approximately 58 and 500 kb in adult frontal cortex between a novel brain-specific enhancer, marked by H3K4me1 and H3K27ac, and the KCNK9 and PEG13 promoters, which we propose regulate brain-specific expression.
We have characterised the molecular mechanism responsible for reciprocal allelic expression of the PEG13 and KCNK9 transcripts. Therefore, our observations may have important implications for identifying the cause of intellectual disabilities associated with the 8q24 locus.
Imprinting; DNA methylation; Chromatin looping
The transcription of ribosomal RNA (rRNA) is critical to life. Despite its importance, ribosomal DNA (rDNA) is not included in current genome assemblies and, consequently, genomic analyses to date have excluded rDNA. Here, we show that short sequence reads can be aligned to a genome assembly containing a single rDNA repeat. Integrated analysis of ChIP-seq, DNase-seq, MNase-seq and RNA-seq data reveals several novel findings. First, the coding region of active rDNA is contained within nucleosome-depleted open chromatin that is highly transcriptionally active. Second, histone modifications are located not only at the rDNA promoter but also at novel sites within the intergenic spacer. Third, the distributions of active modifications are more similar within and between different cell types than repressive modifications. Fourth, UBF, a positive regulator of rRNA transcription, binds to sites throughout the genome. Lastly, the insulator binding protein CTCF associates with the spacer promoter of rDNA, suggesting that transcriptional insulation plays a role in regulating the transcription of rRNA. Taken together, these analyses confirm and expand the results of previous ChIP studies of rDNA and provide novel avenues for exploration of chromatin-mediated regulation of rDNA.
Monoamine oxidase A (MAO A) is an enzyme that catalyzes the oxidation of neurotransmitter amines. A functional polymorphism in the human MAOA gene (high- and low-MAOA) has been associated with distinct behavioral phenotypes. To investigate directly the biological mechanism whereby this polymorphism influences brain function, we recently measured the activity of the MAO A enzyme in healthy volunteers. When we found no relationship between an individual's brain MAO A level and the MAOA genotype, we postulated that there are additional regulatory mechanisms that control MAOA expression. Given that DNA methylation is linked to the regulation of gene expression, we hypothesized that epigenetic mechanisms factor into MAOA expression. Our underlying assumption was that the differences in an individual's genotype play a key role in the epigenetic potential of the MAOA locus and, consequently, determine the individual's level of MAO A activity in the brain. As a first step towards experimental validation of the hypothesis, we performed a comprehensive bioinformatic analysis aiming to interrogate genomic features and attributes of the MAOA locus that might modulate its epigenetic sensitivity. The major findings of our analysis are the following: (1) The extended MAOA regulatory region contains two CpG islands (CGIs), one of which overlaps with the canonical MAOA promoter while the other is located further upstream; both CGIs exhibit sensitivity to differential methylation. (2) The uVNTR's effect on MAOA transcriptional activity might be epigenetic in nature: this polymorphic region resides within the MAOA CGI and itself contains CpGs; thus, the number of repeats effectively changes the number of methylatable cytosines in the MAOA promoter.
An array of in silico analyses (nucleosome positioning, the physical properties of the local DNA, the clustering of transcription-factor binding sites), combined with experimental data on histone modifications and Pol II sites and data from the RefSeq mRNA library, suggests that the MAOA gene might have an alternative promoter. Based on our findings, we propose a regulatory mechanism for human MAOA according to which MAOA expression in vivo is executed by the generation of tissue-specific transcripts initiated from the alternative promoters (both CGI-associated), where transcriptional activation of a particular promoter is under epigenetic control.
human MAOA gene; epigenetic regulation; DNA methylation; epigenetic potential; computational analysis
While tumor suppressor genes frequently undergo epigenetic silencing in cancer, how the instructions directing this transcriptional repression are transmitted in cancer cells remains largely unclear. Expression of cyclin-dependent kinase inhibitor 1C (CDKN1C), an imprinted gene on chromosomal band 11p15.5, is reduced or lost in the majority of breast cancers. Here, we report that CDKN1C is suppressed by estrogen through epigenetic mechanisms involving the chromatin-interacting noncoding RNA KCNQ1OT1 and CCCTC-binding factor (CTCF). Activation of estrogen signaling reduced CDKN1C expression 3-fold (P < 0.001) and established repressive histone modifications at the 5′ regulatory region of the locus. These events were concomitant with induction of KCNQ1OT1 expression as well as increased recruitment of CTCF to both the distal KCNQ1OT1 promoter-associated imprinting control region (ICR) and the CDKN1C locus. Transient depletion of CTCF by small interfering RNA increased CDKN1C expression and significantly reduced the estrogen-mediated repression of CDKN1C. Further studies in breast cancer cell lines indicated that the epigenetic silencing of CDKN1C occurs in part as the result of genetic loss of the inactive methylated 11p15.5 ICR allele (R² = 0.612, P < 0.001). We also found a novel cis-encoded antisense transcript, CDKN1C-AS, which is induced by estrogen signaling following pharmacologic inhibition of DNA methyltransferase and histone deacetylase activity. Forced expression of CDKN1C-AS was capable of repressing endogenous CDKN1C in vivo. Our findings suggest that in addition to promoter hypermethylation, epigenetic repression of tumor suppressor genes by CTCF and noncoding RNA transcripts could be more common and important than previously understood.