Related Articles

1.  Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo 
A well-defined set of transcriptional regulatory modules was created and analyzed in the Drosophila embryo. Fractional occupancy-based models were developed to explain the interaction of short-range transcriptional repressors with endogenous activators by using quantitative data from these modules. Our fractional occupancy-based modeling uncovered specific quantitative features of short-range repressors: a complex nonlinear quenching relationship, similar quenching efficiencies for different activators, and modest levels of cooperativity. The extension of the study to endogenous enhancers highlighted several features of enhancer architecture design in Drosophila embryos.
Transcriptional regulatory information, represented by patterns of protein-binding sites on DNA, comprises an important portion of genetic coding. Despite the abundance of genomic sequences now available, identifying and characterizing this information remain a major challenge. Minor changes in protein-binding sites can have profound effects on gene expression, and such changes have been shown to underlie important aspects of disease and evolution. Thus, an important aim in contemporary systems biology is to develop a global understanding of the transcriptional regulatory code, allowing prediction of gene output based on DNA sequence information. Recent studies have focused on endogenous transcriptional regulatory sequences (Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008); however, distinct enhancers differ in many features, including transcription factor activity, spacing, and cooperativity, making it difficult to learn the effects of individual features and generalize them to other cis-regulatory elements. We have pursued a bottom up approach to understand the mechanistic processing of regulatory elements by the transcriptional machinery, using a well-defined and characterized set of repressors and activators in Drosophila blastoderm embryos. The study focuses on the Giant, Krüppel, Knirps, and Snail proteins, which have been characterized as short-range repressors, able to act locally to interfere with activator function (quenching) (Gray et al, 1994; Arnosti et al, 1996a). Such repressors have central functions in development.
The aim of our study was to enable ab initio predictions of enhancer function, given defined quantities of regulatory proteins and the sequence of the enhancer (Figure 1). We have generated a large quantitative data set using fluorescent confocal laser scanning microscopy to determine the inputs (Giant, Krüppel, and Knirps protein levels) and outputs (lacZ mRNA levels) of the regulatory elements introduced into Drosophila by transgenesis. We analyzed the effect of altering specific features of a set of related gene modules, designed to uncover critical aspects of repression, including quenching distance, cooperativity, and overall factor potency.
We generated specific descriptions for each regulatory element using fractional occupancy-based modeling and identified quantitative values for parameters affecting transcriptional regulation in vivo; these parameters were used to build and test the model. Through this process, we uncovered previously unknown features that allow correct predictions of regulation by short-range repressors, including a non-monotonic distance function for quenching, which implicates possible phasing effects, a modest contribution for repressor–repressor cooperativity, and similarity in repression of disparate activators.
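The general idea behind fractional occupancy modeling can be illustrated with a minimal sketch. It is not the authors' model: the function names, the two-site partition function, and the multiplicative quenching term are all assumptions chosen for illustration, and the parameter values are arbitrary.

```python
def fractional_occupancy(conc, k_assoc):
    """Probability that a single binding site is occupied: K*c / (1 + K*c)."""
    x = k_assoc * conc
    return x / (1.0 + x)


def two_site_occupancy(conc, k_assoc, omega=1.0):
    """Probability that at least one of two identical sites is occupied.

    The partition function is Z = 1 + 2x + omega*x^2 with x = K*c.
    omega > 1 is a (hypothetical) cooperativity factor for the doubly
    bound state; omega = 1 means the two sites bind independently.
    """
    x = k_assoc * conc
    bound = 2.0 * x + omega * x * x
    return bound / (1.0 + bound)


def expression(activator_occ, repressor_occ, quench_eff):
    """Assumed output rule: activator occupancy scaled down by quenching.

    quench_eff (0..1) stands in for a distance-dependent quenching
    efficiency; it is an illustrative parameter, not a fitted value.
    """
    return activator_occ * (1.0 - quench_eff * repressor_occ)


if __name__ == "__main__":
    rep = two_site_occupancy(conc=1.0, k_assoc=1.0, omega=1.0)
    act = fractional_occupancy(conc=4.0, k_assoc=1.0)
    print(rep, act, expression(act, rep, quench_eff=0.5))
```

In a sketch like this, a non-monotonic distance dependence would enter through `quench_eff`, which could be given a separate value for each activator-repressor spacing.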
By applying these parameters to a model of the endogenous rhomboid enhancer, we uncovered novel insights into the architecture of this enhancer (Figure 8). Our study provides essential quantitative elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms. Extension of these predictive models should facilitate the development of more sophisticated computational algorithms for the identification and functional characterization of novel regulatory elements. The development of such quantitative modeling tools will change our understanding of the genome from essentially a parts list to a dynamically regulated system, and will greatly facilitate studies in disease, population genetics, and evolutionary biology.
Systems biology seeks a genomic-level interpretation of transcriptional regulatory information represented by patterns of protein-binding sites. Obtaining this information without direct experimentation is challenging; minor alterations in binding sites can have profound effects on gene expression, and underlie important aspects of disease and evolution. Quantitative modeling offers an alternative path to develop a global understanding of the transcriptional regulatory code. Recent studies have focused on endogenous regulatory sequences; however, distinct enhancers differ in many features, making it difficult to generalize to other cis-regulatory elements. We applied a systematic approach to simpler elements and present here the first quantitative analysis of short-range transcriptional repressors, which have central functions in metazoan development. Our fractional occupancy-based modeling uncovered unexpected features of these proteins' activity that allow accurate predictions of regulation by the Giant, Knirps, Krüppel, and Snail repressors, including modeling of an endogenous enhancer. This study provides essential elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms.
PMCID: PMC2824527  PMID: 20087339
Drosophila; enhancer; modeling; repression; transcription
2.  Dissecting the retinoid-induced differentiation of F9 embryonal stem cells by integrative genomics 
We reveal how the RXRα−RARγ heterodimer upon activation by ATRA sets up a sequence of temporally controlled events that generate different subsets of primary and secondarily induced gene networks. We established RARγ and RXRα chromatin immunoprecipitation (ChIP) analyses coupled with massive parallel sequencing (ChIP-seq) together with the corresponding microarray transcriptomics at five time points during differentiation using pan-RAR and RAR isotype-selective ligands. Gene-regulatory decisions were inferred in silico from the dynamic changes of the transcriptomics patterns that correlated with the expression of RXRα−RARγ and other annotated transcription factors (TFs). Our analysis provides a temporal view of retinoic acid (RA) signalling during F9 cell differentiation, reveals RA receptor (RAR) heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
Nuclear receptors are ligand-inducible transcription factors, which upon induction by their cognate ligand induce complex temporally controlled physiological programs. Retinoic acid (RA) and its receptors are key regulators of multiple physiological processes, including embryogenesis, organogenesis, immune functions, reproduction and organ homeostasis. While insight into (some of) the physiological functions of the various RA receptor (RAR) and retinoid X receptor (RXR) subtypes has been obtained by exploiting mouse genetics (for a review, see Mark et al, 2006) we are far from an understanding of the molecular circuitries and gene networks that are at the basis of these physiological events.
RAs act by interacting with a complex receptor system that comprises heterodimers formed by one of the three RXR (RXRα, β and γ) and RAR (RARα, β and γ) isotypes. While insight into the role of heterodimerization on response element preference and contribution of RAR and RXR to transcription activation of model genes has been obtained (for review, see Gronemeyer et al, 2004), very little is known about the role and dynamics of target gene interaction of the various RXR–RAR heterodimers at a global scale in the context of a biological program.
More fundamentally, in order to develop a systems biology of nuclear receptors we need to establish approaches that reveal how the initial event, the information embedded in the chemical structure of a small molecular weight compound, is propagated through binding to cognate receptor(s), recruitment of co-regulatory factors, epigenetic modulators and additional complexes/machineries to establish temporally controlled gene programs. In this respect, a recent study has revealed the impact of epigenetic modulator crosstalk in the setting up of subprograms for oestrogen receptor signalling (Ceschin et al, 2011).
In the present study, we have used mouse F9 EC cells, a homogeneous cell system which is known to differentiate upon RA exposure and require RARγ for this response (Taneja et al, 1996), in order to integrate at a genome-wide scale (i) the dynamics of RXRα and RARγ binding by chromatin immunoprecipitation (ChIP) analyses coupled with massive parallel sequencing (ChIP-seq), (ii) the correlated temporal regulation of gene programs by global transcriptomics analyses, including (iii) the response to isotype-selective RAR ligands (Box 1). Our study revealed an unexpected highly dynamic association of the RXRα–RARγ with target chromatin and an unexpected dynamics of the heterodimer composition itself, which is indicative of partner swapping.
Inspired by early works on the dynamics of Drosophila puffing patterns during ecdysone-induced metamorphosis (Ashburner et al, 1974) our working hypothesis was that diversification of gene programming is achieved by the sequential activation of separable gene cohorts that constitute the various facets of differentiation, such as altered proliferation, cell physiology, signalling and finally terminal apoptogenic differentiation. To identify these temporally activated subroutines within the overall program, we inferred gene-regulatory decisions in silico from dynamically altered global gene expression patterns that occurred due to the action of RXRα−RARγ and other annotated TFs (Ernst et al, 2007). This dynamic regulatory map was used to reconstruct RXRα–RARγ signalling networks by integration of functional co-citation. Altogether we present a genome-wide view of the temporal gene-regulatory events and the corresponding gene programs elicited by the RXRα–RARγ during F9 cell differentiation. Our study deciphers some of the mechanisms by which the chemical information encoded in RA is diversified to regulate different cohorts of genes.
Retinoic acid (RA) triggers physiological processes by activating heterodimeric transcription factors (TFs) comprising retinoic acid receptor (RARα, β, γ) and retinoid X receptor (RXRα, β, γ). How a single signal induces highly complex temporally controlled networks that ultimately orchestrate physiological processes is unclear. Using an RA-inducible differentiation model, we defined the temporal changes in the genome-wide binding patterns of RARγ and RXRα and correlated them with transcription regulation. Unexpectedly, both receptors displayed a highly dynamic binding, with different RXRα heterodimers targeting identical loci. Comparison of RARγ and RXRα co-binding at RA-regulated genes identified putative RXRα–RARγ target genes that were validated with subtype-selective agonists. Gene-regulatory decisions during differentiation were inferred from TF-target gene information and temporal gene expression. This analysis revealed six distinct co-expression paths of which RXRα–RARγ is associated with transcription activation, while Sox2 and Egr1 were predicted to regulate repression. Finally, RXRα–RARγ regulatory networks were reconstructed through integration of functional co-citations. Our analysis provides a dynamic view of RA signalling during cell differentiation, reveals RAR heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
This study provides a dynamic view of retinoic acid signalling during cell differentiation, reveals RAR/RXR heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
PMCID: PMC3261707  PMID: 21988834
ChIP-seq; retinoic acid-induced differentiation; RXR–RAR heterodimers; temporal control of gene networks; transcriptomics
3.  Integrative Modeling of eQTLs and Cis-Regulatory Elements Suggests Mechanisms Underlying Cell Type Specificity of eQTLs 
PLoS Genetics  2013;9(8):e1003649.
Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
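The CRE-SNP overlap features mentioned above can be pictured with a small sketch. It assumes a toy representation that is not taken from the paper: each CRE class is a list of half-open `(start, end)` intervals on one chromosome, and a feature is 1 when the eQTL SNP position falls inside any interval of that class. The function name and example coordinates are hypothetical.

```python
def overlap_features(snp_pos, cre_intervals):
    """Return a {cre_class: 0 or 1} feature dict for one SNP position.

    cre_intervals maps a CRE class name to a list of half-open
    (start, end) intervals; all coordinates here are illustrative.
    """
    features = {}
    for cre_class, intervals in cre_intervals.items():
        hit = any(start <= snp_pos < end for start, end in intervals)
        features[cre_class] = int(hit)
    return features


# Hypothetical CRE annotations for a single cell type.
example_cres = {
    "enhancer": [(100, 200), (500, 650)],
    "promoter": [(900, 1000)],
    "open_chromatin": [(150, 180)],
}

if __name__ == "__main__":
    print(overlap_features(160, example_cres))
```

Feature vectors of this kind, computed per cell type, are the sort of input a classifier such as a random forest could be trained on. The training step is omitted here.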
Author Summary
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general methods of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
PMCID: PMC3731231  PMID: 23935528
4.  Environment-responsive transcription factors bind subtelomeric elements and regulate gene silencing 
Chromosome position analysis of ChIP-chip data revealed that several carbon source and stress-responsive yeast transcription factors conditionally bind subtelomeric X elements. Integration of several microarray gene expression data sets showed that, in this context, the factors conditionally control the boundaries and strength of subtelomeric silencing. Regulation of silencing by a fatty acid-responsive factor was found to be dependent on Sir2p and independent of Hda1p. These findings provide a critical link for establishing the mechanisms by which telomere biology is coordinated with other cellular processes including responses to environmental stimuli, aging and adaptation.
It is well established that environmental conditions modulate gene expression through local binding of a variety of conditionally active transcription factors, each responsive to specific environmental cues. However, another prevalent mechanism of gene regulation in eukaryotic cells is the long-range control of groups of genes by chromatin modifications or other position-dependent mechanisms. One such phenomenon, gene silencing, is an important and evolutionarily conserved mode of regulation that controls expression of subtelomeric genes. These genes are enriched for stress response and metabolic genes and their regulation is controlled by the spreading of silencing molecules from chromosome ends (telomeres) into subtelomeric regions. Levels of subtelomeric silencing have been linked to cellular lifespan, and study of the regulation of silencing is fundamental to our understanding of human aging. The spread of silencing in subtelomeric regions is discontinuous, and is controlled by various genomic elements that can either relay and enhance silencing from telomeres (proto-silencing) or create boundaries that protect some genomic regions from silencing. In yeast, every subtelomeric region contains an X element that proto-silences centromere-proximal genes, and also insulates telomere-proximal genes from silencing.
In this paper, we identify a regulatory mechanism to control X element-mediated proto-silencing and insulating activities in response to environmental cues. The mechanism was identified using chromosome position analysis of microarray-based chromatin immunoprecipitation (ChIP-chip) data for environment-responsive TFs and genome-wide gene expression data under the same conditions. The mechanism involves the conditional association of environment-responsive transcription factors with X elements. The binding at X elements results in regulation of proto-silencing of centromere-proximal genes, or insulation of telomere-proximal genes (depending on the factor) in response to environmental stimuli related to stress response and metabolism. One example is shown below (Figure 4B). The transcription factor Oaf1p conditionally binds X elements in the presence of fatty acids and enhances proto-silencing specifically under this condition. Oaf1p and several other factors implicated here are known to control adjacent genes at intrachromosomal positions, suggesting their dual functionality in both gene-specific transcriptional regulation and long-range position-dependent mechanisms. Investigation of this mechanism during the response to fatty acid exposure showed that conditional proto-silencing activity is dependent on Sir2p, a molecule known to be involved in subtelomeric silencing related to aging. This study reveals a path cells can use to coordinate subtelomeric silencing related to aging with cellular environment, and with the activities of other cellular processes.
Subtelomeric chromatin is subject to evolutionarily conserved complex epigenetic regulation and is implicated in numerous aspects of cellular function including formation of heterochromatin, regulation of stress response pathways and control of lifespan. Subtelomeric DNA is characterized by the presence of specific repeated segments that serve to propagate silencing or to protect chromosomal regions from spreading epigenetic control. In this study, analysis of genome-wide chromatin immunoprecipitation and expression data suggests that several yeast transcription factors regulate subtelomeric silencing in response to various environmental stimuli through conditional association with proto-silencing regions called X elements. In this context, Oaf1p, Rox1p, Gzf1p and Phd1p control the propagation of silencing toward centromeres in response to stimuli affecting stress responses and metabolism, whereas others, including Adr1p, Yap5p and Msn4p, appear to influence boundaries of silencing, regulating telomere-proximal genes in Y′ elements. The factors implicated here are known to control adjacent genes at intrachromosomal positions, suggesting their dual functionality. This study reveals a path for the coordination of subtelomeric silencing with cellular environment, and with activities of other cellular processes.
PMCID: PMC3049408  PMID: 21206489
chromatin; proto-silencer; Sir2; subtelomeric silencing; X element
5.  Cellular reprogramming by the conjoint action of ERα, FOXA1, and GATA3 to a ligand-inducible growth state 
Estrogen receptor α (ERα), FOXA1, and GATA3 form a functional enhanceosome in MCF-7 breast carcinoma cells that is significantly associated with active transcriptional features such as enhanced p300 co-activator and RNA Pol II recruitment as well as chromatin opening. The enhanceosome exerts significant impact and optimal transcriptional control in the regulation of E2-responsive genes. The presence of FOXA1 and GATA3 is indispensable in restoring the ERα growth-response machinery in the ERα-negative cells and recapitulating the appropriate expression cassette.
Estrogen receptor α (ERα) is a ligand-inducible hormone nuclear receptor that has important physiological and pathological roles in reproduction, cancer, and cardiovascular biology. The regulation of ERα involves its binding to DNA recognition sequences known as estrogen-response elements (EREs) and its recruitment of a variety of co-activators, corepressors, and chromatin remodeling enzymes to initiate transcription. In our previous (Lin et al, 2007) and recent (Joseph et al, 2010) studies, we have identified high confidence ERα binding sites in MCF-7 human mammary carcinoma cells. With known motif scanning and de novo motif detection, we identified that FOXA1 and GATA3 motifs were commonly enriched around ERα binding sites. Moreover, numerous microarray studies have documented the co-expression of ERα, FOXA1, and GATA3 in primary breast tumors (Badve et al, 2007; Wilson and Giguere, 2008). This evidence suggests that these three transcription factors (TFs) may cluster on DNA binding sites and contribute to the breast cancer phenotype. However, there is little understanding as to the nature of their coordinated interaction at the genome level or the biological consequences of their detailed interaction.
We mapped the genome-wide binding profiles of ERα, FOXA1, and GATA3 using the massive parallel chromatin immunoprecipitation-sequencing (ChIP-seq) approach. We observed that ERα, FOXA1, and GATA3 colocalized in a coordinated manner where ∼30% of all ERα binding sites were overlapped with FOXA1 and GATA3 bindings upon estrogen (E2) stimulation. Moreover, we found that the ERα+FOXA1+GATA3 conjoint sites were associated with highest p300 co-activator recruitment, RNA Pol II occupancy, and chromatin opening. Such results indicate that these three TFs form a functional enhanceosome and cooperatively modulate the transcriptional networks previously ascribed to ERα alone. And such enhanceosome binding sites appear to regulate the genes driving core ERα function.
To further validate that ERα+FOXA1+GATA3 co-binding represents an optimal configuration for E2-mediated transcriptional activation, we have performed luciferase reporter assays on GREB1 locus that actively engages ERα enhanceosome sites in gene regulation (Figure 5C). The presence of ERα induced the GREB1 luciferase activity to ∼246% (as compared with the control construct). The individual presence of FOXA1 and GATA3 or combination of both only produced subtle changes to the GREB1 luciferase activity. The combination of ERα+FOXA1 and ERα+GATA3 has increased the luciferase activity to ∼330%. Interestingly, the assemblage of ERα+FOXA1+GATA3 provided the optimal ER responsiveness to 370%. This suggests that ERα provides the fundamental gene regulatory module but that FOXA1 and GATA3 incrementally improve ERα-regulated transcriptional induction.
It is known that ERα is a ligand-activated TF that mediates the proliferative effects of E2 in breast cancer cells. However, Garcia et al (1992) showed inhibited growth in MDA-MB-231 cells with forced expression of ERα upon E2 treatment. The rationale for these different outcomes has remained elusive. We posited that higher order regulatory mechanisms of ERα function, such as the formation and composition of enhanceosomes, may explain the establishment of transcriptional regulatory cassettes favoring either growth enhancement or growth repression.
To test this hypothesis, we stably transfected the MDA-MB-231 cells with individual ERα, FOXA1, GATA3, or in combinations (Figure 6A). We observed inhibited growth in cells with enforced expression of ERα or FOXA1. There was unaltered growth in cells with expression of GATA3. Co-expression of ERα+FOXA1 or ERα+GATA3 exhibited inhibition of cell proliferation as compared with control cells. However, the co-expression of ERα together with FOXA1 and GATA3 resulted in marked induction of cell proliferation under E2 stimulation. We have recapitulated this cellular reprogramming in another ERα-negative breast cancer cell line, BT-549 and observed similar E2-responsive growth induction in the ERα+FOXA1+GATA3-expressing BT-549 cells. This suggests that only with the full activation of conjoint binding sites by the three TFs will the proliferative phenotype associated with ligand induced ERα be manifest.
To assess the nature of this transcriptional reprogramming, we asked whether the reprogrammed MDA-MB-231 cells display any similarity to the expression profile of the ERα-positive breast cancer cell line, MCF-7 (Figure 6C). We combined the E2-regulated genes from these differently transfected MDA-MB-231 cells, and compared their expression in these MDA-MB-231-transfected cells and MCF-7 cells. Strikingly, we found that the expression profiles of ERα+FOXA1+GATA3-expressing MDA-MB-231 cells display a good correlation (R=0.42) with the E2-induced expression profile of MCF-7. We did not observe such correlation between the expression profiles of MDA-MB-231 transfected with ERα only (R=−0.21). Furthermore, we observed marginally induced expression of luminal marker genes and reduced expression of basal genes in the ERα+FOXA1+GATA3-expressing MDA-MB-231 cells as compared with the vector control cells. This suggests that the enhanceosome component is competent to partially reprogramme the basal cells to resemble the luminal cells.
Taken together, we have uncovered the genomics impact as well as the functional importance of an enhanceosome comprising ERα, FOXA1, and GATA3 in the estrogen responsiveness of ERα-positive breast cancer cells. This enhanceosome exerts significant combinatorial control of the transcriptional network regulating growth and proliferation of ERα-positive breast cancer cells. Most importantly, we show that the transfection of the enhanceosome component was necessary to reprogramme the ERα-negative cells to restore the estrogen-responsive growth and to transcriptionally induce a basal to luminal transition.
Despite the role of the estrogen receptor α (ERα) pathway as a key growth driver for breast cells, the phenotypic consequence of exogenous introduction of ERα into ERα-negative cells paradoxically has been growth inhibition. We mapped the binding profiles of ERα and its interacting transcription factors (TFs), FOXA1 and GATA3 in MCF-7 breast carcinoma cells, and observed that these three TFs form a functional enhanceosome that regulates the genes driving core ERα function and cooperatively modulate the transcriptional networks previously ascribed to ERα alone. We demonstrate that these enhanceosome occupied sites are associated with optimal enhancer characteristics with highest p300 co-activator recruitment, RNA Pol II occupancy, and chromatin opening. Most importantly, we show that the transfection of all three TFs was necessary to reprogramme the ERα-negative MDA-MB-231 and BT-549 cells to restore the estrogen-responsive growth resembling estrogen-treated ERα-positive MCF-7 cells. Cumulatively, these results suggest that all the enhanceosome components comprising ERα, FOXA1, and GATA3 are necessary for the full repertoire of cancer-associated effects of the ERα.
PMCID: PMC3202798  PMID: 21878914
enhanceosome; estrogen receptor α; FOXA1; GATA3; synthetic phenotypes
6.  Divergence of nucleosome positioning between two closely related yeast species: genetic basis and functional consequences 
Inter-species hybrids can be used to dissect the relative contribution of cis and trans effects to the evolution of nucleosome positioning. Most (∼70%) differences in nucleosome positioning between two closely related yeast species are due to cis effects. Cis effects are primarily due to divergence of AT-rich nucleosome-disfavoring sequences, but are not associated with divergence of nucleosome-favoring sequences. Differences in nucleosome positioning propagate to multiple adjacent nucleosomes, supporting the statistical positioning hypothesis. Divergence of nucleosome positioning is excluded from regulatory elements and is not correlated with gene expression divergence, suggesting a neutral mode of evolution.
Phenotypic diversity is often due to changes in gene regulation, and recent studies have characterized extensive differences between the gene expression programs of closely related species (Khaitovich et al, 2006; Tirosh et al, 2009). However, very little is known about the mechanisms that drive this divergence. Here, we analyze the evolution of nucleosome positioning, by comparing the patterns of nucleosomes between two yeast species, as well as generating the allele-specific nucleosome profile in their hybrid. We ask two main questions: (1) what is the genetic basis of inter-species differences in nucleosome positioning? and (2) what is the regulatory function of these differences?
Generally speaking, we can classify the genetic basis of the divergence in nucleosome positioning into two mechanisms. First, mutations in the local DNA sequence may influence the ability to bind nucleosomes at this region; we refer to these as cis effects. Second, mutations may affect the activity of various proteins that alter nucleosome positioning either actively (e.g. chromatin-remodeling enzymes) or by simply competing with nucleosomes for binding to the same DNA sequence (e.g. transcription factors); we refer to these as trans effects.
To classify the observed inter-species differences into cis versus trans effects, we measured allele-specific nucleosome positions within the inter-specific hybrid of the two species (Wittkopp et al, 2004; Tirosh et al, 2009). The hybrid contains the alleles of both species; hence, cis effects, which involve mutations that discriminate between the two alleles, will be maintained in the hybrid so that nucleosome positioning will be different between the alleles coming from the different species. Trans effects, in contrast, will not discriminate between the two hybrid alleles from the different species, as these two alleles reside together at the same trans environment (hybrid nucleus) and are thus regulated by the same set of proteins—the combination of proteins from the two species. Using this approach, we found that ∼70% of the inter-species differences in nucleosome positioning are due to cis effects, whereas the rest is due to trans effects.
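The logic of the hybrid-based classification can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function name, the single numeric "position" per nucleosome, and the fixed threshold are assumptions, and mixed cis-plus-trans cases are ignored.

```python
def classify_difference(parent_a, parent_b, hybrid_a, hybrid_b, threshold=0.5):
    """Label one nucleosome as 'conserved', 'cis', or 'trans'.

    parent_a / parent_b: measured value in each parental species.
    hybrid_a / hybrid_b: allele-specific value for each species' allele
    within the hybrid. The threshold is an arbitrary illustrative cutoff.
    """
    parental_diff = parent_a - parent_b
    hybrid_diff = hybrid_a - hybrid_b
    if abs(parental_diff) < threshold:
        # The two species do not differ, so there is nothing to classify.
        return "conserved"
    if abs(hybrid_diff) >= threshold:
        # The difference persists between alleles sharing one nucleus.
        return "cis"
    # The difference vanishes in the shared trans environment.
    return "trans"


if __name__ == "__main__":
    print(classify_difference(2.0, 0.5, 1.9, 0.6))  # difference kept in hybrid
    print(classify_difference(2.0, 0.5, 1.2, 1.1))  # difference lost in hybrid
```

Applying a rule like this to every nucleosome that differs between the two species, then counting the labels, would give the relative proportions of cis and trans effects.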
The local DNA sequence is indeed known to affect nucleosome positions, and many features of DNA sequences were proposed to influence nucleosome binding, either by rejecting nucleosomes, or by being favorable for nucleosome binding (Segal et al, 2006; Lee et al, 2007; Kaplan et al, 2009). We find, however, that nucleosome positions diverged primarily through changes in AT-rich sequences, which exclude nucleosomes, whereas mutations in sequences that correlate with high-nucleosome occupancy do not influence inter-species divergence.
Nucleosomes restrict the access of proteins to the DNA and may thus affect DNA-related processes such as transcription, recombination or replication. Indeed, promoters and regulatory sequences are often depleted of nucleosomes, and highly transcribed genes are associated with low occupancy of nucleosomes at their promoters (Lee et al, 2007). Several earlier studies also suggested that evolutionary divergence of gene expression is driven by changes in chromatin structure (Lee et al, 2006; Choi and Kim, 2008; Tirosh et al, 2008; Field et al, 2009). However, we find that nucleosome positions (or occupancy) at regulatory elements are largely conserved, and furthermore, that the inter-species differences in nucleosome positions do not correlate with gene expression differences. These results suggest that nucleosome positioning is not a central mechanism for evolutionary changes in gene regulation and that most of the observed changes may be due to neutral drift.
Does the apparent low influence of nucleosome positioning on gene expression divergence imply that nucleosome positions do not have a function in gene regulation? To address this, we examined two additional modes of gene regulation: transcriptional response to changes in growth conditions (glucose versus glycerol media), and the expression differences between different cell types (haploid versus diploid cells). Consistent with earlier studies, we found that the response to growth conditions is significantly, albeit weakly, associated with changes in nucleosome positioning. Interestingly, we also found a strikingly strong association between gene expression and nucleosomal changes in the two cell types. Taken together, these results suggest that nucleosome positioning is used preferentially for biological processes in which genes are turned on and off (e.g. different cell types), but less so during divergence of closely related species in which gradual changes accumulate over time.
Gene regulation differs greatly between related species, constituting a major source of phenotypic diversity. Recent studies characterized extensive differences in the gene expression programs of closely related species. In contrast, virtually nothing is known about the evolution of chromatin structure and how it influences the divergence of gene expression. Here, we compare the genome-wide nucleosome positioning of two closely related yeast species and, by profiling their inter-specific hybrid, trace the genetic basis of the observed differences into mutations affecting the local DNA sequences (cis effects) or the upstream regulators (trans effects). The majority (∼70%) of inter-species differences is due to cis effects, leaving a significant contribution (30%) for trans factors. We show that cis effects are well explained by mutations in nucleosome-disfavoring AT-rich sequences, but are not associated with divergence of nucleosome-favoring sequences. Differences in nucleosome positioning propagate to multiple adjacent nucleosomes, supporting the statistical positioning hypothesis, and we provide evidence that nucleosome-free regions, but not the +1 nucleosome, serve as stable border elements. Surprisingly, although we find that differential nucleosome positioning among cell types is strongly correlated with differential expression, this does not seem to be the case for evolutionary changes: divergence of nucleosome positioning is excluded from regulatory elements and is not correlated with gene expression divergence, suggesting a primarily neutral mode of evolution. Our results provide evolutionary insights into the genetic determinants and regulatory function of nucleosome positioning.
PMCID: PMC2890324  PMID: 20461072
evolution; gene regulation; nucleosome positioning
7.  Expression-Guided In Silico Evaluation of Candidate Cis Regulatory Codes for Drosophila Muscle Founder Cells 
PLoS Computational Biology  2006;2(5):e53.
While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed “CodeFinder”) that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.
Although genome sequences and much gene expression data are readily available, the determination of sets of transcription factors regulating particular gene expression patterns remains a problem of fundamental importance. Tissue-specific gene expression in developing animals is regulated through the combinatorial interactions of transcription factors with DNA regulatory elements termed cis regulatory modules (CRMs). Although genetic and biochemical experiments allow the identification of transcription factors and CRMs, those experiments are laborious and time-consuming. Philippakis et al. introduce a new approach (termed “CodeFinder”) for quantifying the enrichment for particular combinations of transcription factor binding site motifs within predicted CRMs associated with a given gene set of interest, identified from gene expression data. The authors' analyses allowed them to discover a specific combination of transcription factor binding site motifs that constitute a core cis regulatory code for expression of a particular subset of genes in muscle founder cells, an embryonic cell population in the developing fruit fly (Drosophila melanogaster) mesoderm, and also led them to the discovery and subsequent experimental validation of novel, tissue-specific CRMs. Importantly, the CodeFinder approach is generally applicable, and thus could be used to support, refute, or refine a known or hypothesized cis regulatory code for any biological system or genome of interest.
PMCID: PMC1464814  PMID: 16733548
8.  Combinatorial Modeling of Chromatin Features Quantitatively Predicts DNA Replication Timing in Drosophila 
PLoS Computational Biology  2014;10(1):e1003419.
In metazoans, each cell type follows a characteristic, spatio-temporally regulated DNA replication program. Histone modifications (HMs) and chromatin binding proteins (CBPs) are fundamental for a faithful progression and completion of this process. However, no individual HM is strictly indispensable for origin function, suggesting that HMs may act combinatorially in analogy to the histone code hypothesis for transcriptional regulation. In contrast to gene expression, however, the relationship between combinations of chromatin features and DNA replication timing has not yet been demonstrated. Here, by exploiting a comprehensive data collection consisting of 95 CBPs and HMs, we investigated their combinatorial potential for the prediction of DNA replication timing in Drosophila using quantitative statistical models. We found that while combinations of CBPs exhibit moderate predictive power for replication timing, pairwise interactions between HMs lead to accurate predictions genome-wide that can be locally further improved by CBPs. Independent feature importance and model analyses led us to derive a simplified, biologically interpretable model of the relationship between chromatin landscape and replication timing reaching 80% of the full model accuracy using six model terms. Finally, we show that pairwise combinations of HMs are able to predict differential DNA replication timing across different cell types. All in all, our work provides support for the existence of combinatorial HM patterns for DNA replication and reveals cell-type independent key elements thereof, whose experimental investigation might contribute to elucidating the regulatory mode of this fundamental cellular process.
Author Summary
Before a cell divides, its genome must be faithfully duplicated to ensure that the daughter cell receives an exact copy of the parental genetic material. However, this process requires disruption of chromatin, the combination of DNA and histone proteins, whose structure and function have to be readily restored afterwards. This is achieved through a nuclear process known as DNA replication, which represents the basis for biological inheritance. In eukaryotes, genome replication starts from distinct genomic locations termed replication origins. Origins fire in a temporally regulated, cell-type dependent manner and timing of DNA replication is therefore the result of this concerted origin activation. However, replication timing is not encoded in the genome and its regulatory mode remains to a large degree unresolved. Here, we systematically study the relationship between chromatin, represented by histone modifications and chromatin binding proteins, and DNA replication timing. We report combinatorial histone modification patterns exhibiting regulatory potential for this process and we characterize those elements that might contribute to further elucidate the regulatory mode of this fundamental cellular process.
PMCID: PMC3900380  PMID: 24465194
9.  Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements 
BMC Biology  2011;9:80.
Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases.
Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups, either directly involved in transcription, including miRNAs and long noncoding RNAs, or facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less with acetylation of lysine 27 on histone 3 or p300 binding.
By integrating genomewide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
PMCID: PMC3239327  PMID: 22115494
transcription factor; ChIP-Seq; histone modification; chromatin
10.  BEAF Regulates Cell-Cycle Genes through the Controlled Deposition of H3K9 Methylation Marks into Its Conserved Dual-Core Binding Sites 
PLoS Biology  2008;6(12):e327.
Chromatin insulators/boundary elements share the ability to insulate a transgene from its chromosomal context by blocking promiscuous enhancer–promoter interactions and heterochromatin spreading. Several insulating factors target different DNA consensus sequences, defining distinct subfamilies of insulators. Whether each of these families and factors might possess unique cellular functions is of particular interest. Here, we combined chromatin immunoprecipitations and computational approaches to break down the binding signature of the Drosophila boundary element–associated factor (BEAF) subfamily. We identify a dual-core BEAF binding signature at 1,720 sites genome-wide, defined by five to six BEAF binding motifs bracketing 200 bp AT-rich nuclease-resistant spacers. Dual-cores are tightly linked to hundreds of genes highly enriched in cell-cycle and chromosome organization/segregation annotations. siRNA depletion of BEAF from cells leads to cell-cycle and chromosome segregation defects. Quantitative RT-PCR analyses in BEAF-depleted cells show that BEAF controls the expression of dual core–associated genes, including key cell-cycle and chromosome segregation regulators. beaf mutants that impair its insulating function by preventing proper interactions of BEAF complexes with the dual-cores produce similar effects in embryos. Chromatin immunoprecipitations show that BEAF regulates transcriptional activity by restricting the deposition of methylated histone H3K9 marks in dual-cores. Our results reveal a novel role for BEAF chromatin dual-cores in regulating a distinct set of genes involved in chromosome organization/segregation and the cell cycle.
Author Summary
The genome of eukaryotes is packaged in chromatin, which consists of DNA, histones, and accessory proteins. This leads to a general repression of genes, particularly for those exposed to mostly condensed, heterochromatin regions. DNA sequences called chromatin insulators/boundary elements are able to insulate a gene from its chromosomal context by blocking promiscuous heterochromatin spreading. No common feature has been identified among the insulators/boundary elements known so far. Rather, distinct subfamilies of insulators harbor different DNA consensus sequences targeted by different DNA-binding factors, which confer their insulating activity. Determining whether distinct subfamilies possess distinct cellular functions is important for understanding genome regulation. Here, using Drosophila, we have combined computational and experimental approaches to address the function of the boundary element-associated factor (BEAF) subfamily of insulators. We identify hundreds of BEAF dual-cores that are defined by a particular arrangement of DNA sequence motifs bracketing nucleosome binding sequences, and that mark the genomic BEAF binding sites. BEAF dual-cores are close to hundreds of genes that regulate chromosome organization/segregation and the cell cycle. Since BEAF acts by restricting the deposition of repressing epigenetic histone marks, which affects the accessibility of chromatin, its depletion affects the expression of cell-cycle genes. Our data reveal a new role for BEAF in regulating the cell cycle through its binding to highly conserved chromatin dual-cores.
Chromatin Dual-Cores define new potent nucleosome-associated cis-regulatory elements that regulate the accessibility of promoters of genes controlling chromosome organization/segregation and the cell cycle.
PMCID: PMC2605929  PMID: 19108610
11.  Global Mapping of Cell Type–Specific Open Chromatin by FAIRE-seq Reveals the Regulatory Role of the NFI Family in Adipocyte Differentiation 
PLoS Genetics  2011;7(10):e1002311.
Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear factor I (NFI) transcription factors. Indeed, ChIP assay showed that NFI occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA–mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program.
Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation.
Author Summary
Humans consist of a few hundred types of specialized-function cells. Spatial and temporal transcriptional regulation of genes is essential for manifestation of cellular phenotypes. Identification of regulatory regions in the genome is central to understanding the mechanism of cell type–specific gene regulation. Recently developed high-throughput sequencing technology and computational analyses allow genome-wide investigation of the genome's chromatin structure. Using the FAIRE-seq technique, we identified the genome's open chromatin regions, which harbor regulatory elements in adipocytes. Open chromatin regions distal to genes' transcription start sites significantly differ among cell types. Multiple cell type–specific open chromatin regions exist near genes regulated during adipocyte differentiation. Computational motif analysis of adipocyte-specific open chromatin regions revealed enrichment of a binding motif for the NFI transcription factor family. These factors bind to the regulatory elements near adipogenic PPARγ, C/EBPα, and aP2 genes and regulate their expression. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus and knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation. Our study demonstrates the utility of FAIRE-seq in providing a global view of regulatory elements and in identifying transcriptional regulators of cellular functions.
PMCID: PMC3197683  PMID: 22028663
12.  Extensive Evolutionary Changes in Regulatory Element Activity during Human Origins Are Associated with Altered Gene Expression and Positive Selection 
PLoS Genetics  2012;8(6):e1002789.
Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized as a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species.
Author Summary
The human genome shares a remarkable amount of genomic sequence with our closest living primate relatives. Researchers have long sought to understand what regions of the genome are responsible for unique species-specific traits. Previous studies have shown that many genes are differentially expressed between species, but the regulatory elements contributing to these differences are largely unknown. Here we report a genome-wide comparison of active gene regulatory elements in human, chimpanzee, and macaque, and we identify hundreds of regulatory elements that have been gained or lost in the human or chimpanzee genomes since their evolutionary divergence. These elements contain evidence of natural selection and correlate with species-specific changes in gene expression. Polymorphic DNA bases in transcription factor motifs that we found in these regulatory elements may be responsible for the varied biological functions across species. This study directly links phenotypic and transcriptional differences between species with changes in chromatin structure.
PMCID: PMC3386175  PMID: 22761590
13.  Combinatorial Binding Leads to Diverse Regulatory Responses: Lmd Is a Tissue-Specific Modulator of Mef2 Activity 
PLoS Genetics  2010;6(7):e1001014.
Understanding how complex patterns of temporal and spatial expression are regulated is central to deciphering genetic programs that drive development. Gene expression is initiated through the action of transcription factors and their cofactors converging on enhancer elements leading to a defined activity. Specific constellations of combinatorial occupancy are therefore often conceptualized as rigid binding codes that give rise to a common output of spatio-temporal expression. Here, we assessed this assumption using the regulatory input of two essential transcription factors within the Drosophila myogenic network. Mutations in either Myocyte enhancing factor 2 (Mef2) or the zinc-finger transcription factor lame duck (lmd) lead to very similar defects in myoblast fusion, yet the underlying molecular mechanism for this shared phenotype is not understood. Using a combination of ChIP-on-chip analysis and expression profiling of loss-of-function mutants, we obtained a global view of the regulatory input of both factors during development. The majority of Lmd-bound enhancers are co-bound by Mef2, representing a subset of Mef2's transcriptional input during these stages of development. Systematic analyses of the regulatory contribution of both factors demonstrate diverse regulatory roles, despite their co-occupancy of shared enhancer elements. These results indicate that Lmd is a tissue-specific modulator of Mef2 activity, acting as both a transcriptional activator and repressor, which has important implications for myogenesis. More generally, this study demonstrates considerable flexibility in the regulatory output of two factors, leading to additive, cooperative, and repressive modes of co-regulation.
Author Summary
While genetic studies are essential to reveal the phenotypic relationships between genes, it is often very difficult to disentangle the molecular mechanism of two genes that phenocopy each other. In this study, we used global scale and single gene analysis to investigate the relationship between two transcription factors whose mutant embryos have a similar defect in myogenesis. In Drosophila, Mef2 mutant embryos display a block in myoblast fusion, which is very similar to what is observed in mutant embryos for lmd, a zinc-finger transcription factor. To understand the underlying nature of these defects we used ChIP-on-chip analysis to obtain a global view of their co-regulated enhancers, and we used expression profiling of mutant embryos to reveal their downstream transcriptional response. The results indicate that Lmd acts as a tissue specific modulator of Mef2 activity. Using in vivo and in vitro reporter assays, we show that co-binding to the same enhancer element can lead to diverse regulatory responses. The presence of Lmd has an additive, cooperative, or repressive effect on Mef2 activity, demonstrating that it acts as a molecular switch for gene expression during muscle differentiation. More broadly, our results highlight the difficulty in translating information on combinatorial binding data into a functional regulatory response.
PMCID: PMC2895655  PMID: 20617173
14.  Identification of DNA regions and a set of transcriptional regulatory factors involved in transcriptional regulation of several human liver-enriched transcription factor genes 
Nucleic Acids Research  2008;37(3):778-792.
Mammalian tissue- and/or time-specific transcription is primarily regulated in a combinatorial fashion through interactions between a specific set of transcriptional regulatory factors (TRFs) and their cognate cis-regulatory elements located in the regulatory regions. In exploring the DNA regions and TRFs involved in combinatorial transcriptional regulation, we noted that individual knockdown of a set of human liver-enriched TRFs such as HNF1A, HNF3A, HNF3B, HNF3G and HNF4A resulted in perturbation of the expression of several single TRF genes, such as the HNF1A, HNF3G and CEBPA genes. We thus searched for potential binding sites for these five TRFs in the highly conserved genomic regions around these three TRF genes and found several putative combinatorial regulatory regions. Chromatin immunoprecipitation analysis revealed that almost all of the putative regulatory DNA regions were bound by the TRFs as well as two coactivators (CBP and p300). The strong transcription-enhancing activity of the putative combinatorial regulatory region located downstream of the CEBPA gene was confirmed. EMSA demonstrated specific binding of these HNFs to the target DNA region. Finally, co-transfection reporter assays with various combinations of expression vectors for these HNF genes demonstrated the transcriptional activation of the CEBPA gene in a combinatorial manner by these TRFs.
PMCID: PMC2647325  PMID: 19074951
15.  Modeling an Evolutionary Conserved Circadian Cis-Element 
PLoS Computational Biology  2008;4(2):e38.
Circadian oscillator networks rely on a transcriptional activator called CLOCK/CYCLE (CLK/CYC) in insects and CLOCK/BMAL1 or NPAS2/BMAL1 in mammals. Identifying the targets of this heterodimeric basic-helix-loop-helix (bHLH) transcription factor poses challenges and it has been difficult to decipher its specific sequence affinity beyond a canonical E-box motif, except perhaps for some flanking bases contributing weakly to the binding energy. Thus, no good computational model presently exists for predicting CLK/CYC, CLOCK/BMAL1, or NPAS2/BMAL1 targets. Here, we use a comparative genomics approach and first study the conservation properties of the best-known circadian enhancer: a 69-bp element upstream of the Drosophila melanogaster period gene. This fragment shows a signal involving the presence of two closely spaced E-box–like motifs, a configuration that we can also detect in the other four prominent CLK/CYC target genes in flies: timeless, vrille, Pdp1, and cwo. This allows for the training of a probabilistic sequence model that we test using functional genomics datasets. We find that the predicted sequences are overrepresented in promoters of genes induced in a recent study by a glucocorticoid receptor-CLK fusion protein. We then scanned the mouse genome with the fly model and found that many known CLOCK/BMAL1 targets harbor sequences matching our consensus. Moreover, the phase of predicted cyclers in liver agreed with known CLOCK/BMAL1 regulation. Taken together, we built a predictive model for CLK/CYC or CLOCK/BMAL1-bound cis-enhancers through the integration of comparative and functional genomics data. Finally, a deeper phylogenetic analysis reveals that the link between the CLOCK/BMAL1 complex and the circadian cis-element dates back to before insects and vertebrates diverged.
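As a rough illustration of the "two closely spaced E-box–like motifs" signal, the sketch below scans a sequence for pairs of canonical E-boxes (CACGTG) separated by a short gap. It is a deliberately simplified stand-in, not the probabilistic sequence model trained in the study: it matches only the exact canonical hexamer, and the function name and the `max_gap` spacing limit are hypothetical choices.

```python
import re

EBOX = "CACGTG"  # canonical E-box hexamer (palindromic, so one strand suffices)


def find_ebox_pairs(seq, max_gap=10):
    """Return (start1, start2) for E-box pairs at most max_gap bp apart.

    max_gap is a hypothetical spacing limit chosen for illustration only.
    """
    seq = seq.upper()
    # Zero-width lookahead so every occurrence is reported by its start index.
    starts = [m.start() for m in re.finditer(f"(?={EBOX})", seq)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            gap = second - (first + len(EBOX))  # bases between the two motifs
            if 0 <= gap <= max_gap:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    print(find_ebox_pairs("aaCACGTGttttCACGTGaa"))
```

A model that tolerates E-box–like variants and scores flanking bases would replace the exact-match step with a position weight matrix or similar scoring scheme.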
Author Summary
Life on earth is subject to daily light/dark and temperature cycles that reflect the earth's rotation about its own axis. Under such conditions, organisms ranging from bacteria to humans have evolved molecularly geared circadian clocks that resonate with the environmental cycles. These clocks serve as internal timing devices to coordinate physiological and behavioral processes as diverse as detoxification, activity and rest cycles, or blood pressure. In insects and vertebrates, the clock circuitry uses interlocked negative feedback loops which are implemented by transcription factors, among which the heterodimeric activators CLOCK and CYCLE play a key role. The specific DNA elements recognized by this factor are known to involve E-box motifs, but the low information content of this sequence makes it a poor predictor of the targets of CLOCK/CYCLE on a genome-wide scale. Here, we use comparative genomics to build a more specific model for a CLOCK-controlled cis-element that extends the canonical E-boxes to a more complex dimeric element. We use functional data from Drosophila and mouse circadian experiments to test the validity and assess the performance of the model. Finally, we provide a phylogenetic analysis of the cis-elements across insects and vertebrates that emphasizes the ancient link between CLOCK/CYCLE and the modeled enhancer. These results indicate that comparative genomics provides powerful means to decipher the complexity of the circadian cis-regulatory code.
PMCID: PMC2242825  PMID: 18282089
16.  Genome-Wide Profiling of p63 DNA–Binding Sites Identifies an Element that Regulates Gene Expression during Limb Development in the 7q21 SHFM1 Locus 
PLoS Genetics  2010;6(8):e1001065.
Heterozygous mutations in p63 are associated with split hand/foot malformations (SHFM), orofacial clefting, and ectodermal abnormalities. Elucidation of the p63 gene network that includes target genes and regulatory elements may reveal new genes for other malformation disorders. We performed genome-wide DNA–binding profiling by chromatin immunoprecipitation (ChIP), followed by deep sequencing (ChIP–seq) in primary human keratinocytes, and identified potential target genes and regulatory elements controlled by p63. We show that p63 binds to an enhancer element in the SHFM1 locus on chromosome 7q and that this element controls expression of DLX6 and possibly DLX5, both of which are important for limb development. A unique micro-deletion including this enhancer element, but not the DLX5/DLX6 genes, was identified in a patient with SHFM. Our study strongly indicates disruption of a non-coding cis-regulatory element located more than 250 kb from the DLX5/DLX6 genes as a novel disease mechanism in SHFM1. These data provide a proof-of-concept that the catalogue of p63 binding sites identified in this study may be of relevance to the studies of SHFM and other congenital malformations that resemble the p63-associated phenotypes.
Author Summary
Mammalian embryonic development requires precise control of gene expression in the right place at the right time. One level of control of gene expression is through cis-regulatory elements controlled by transcription factors. Deregulation of gene expression by mutations in such cis-regulatory elements has been described in developmental disorders. Heterozygous mutations in the transcription factor p63 are found in patients with limb malformations, cleft lip/palate, and defects in skin and other epidermal appendages, through disruption of normal ectodermal development during embryogenesis. We reasoned that the identification of target genes and cis-regulatory elements controlled by p63 would provide candidate genes for defects arising from abnormally regulated ectodermal development. To test our hypothesis, we carried out a genome-wide binding site analysis and identified a large number of target genes and regulatory elements regulated by p63. We further showed that one of these regulatory elements controls expression of DLX6 and possibly DLX5 in the apical ectodermal ridge in the developing limbs. Loss of this element through a micro-deletion was associated with split hand foot malformation (SHFM1). The list of p63 binding sites provides a resource for the identification of mutations that cause ectodermal dysplasias and malformations in humans.
PMCID: PMC2924305  PMID: 20808887
17.  Conservation and implications of eukaryote transcriptional regulatory regions across multiple species 
BMC Genomics  2008;9:623.
Increasing evidence shows that whole genomes of eukaryotes are almost entirely transcribed into both protein coding genes and an enormous number of non-protein-coding RNAs (ncRNAs). Therefore, revealing the underlying regulatory mechanisms of transcripts becomes imperative. However, for a complete understanding of transcriptional regulatory mechanisms, we need to identify the regions in which they are found. We will call these transcriptional regulation regions, or TRRs, which can be considered functional regions containing a cluster of regulatory elements that cooperatively recruit transcriptional factors for binding and then regulating the expression of transcripts.
We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as microRNAs. In addition, the ENCODE regions were examined by HSL, and TRRs were found to be pervasive in the genome.
Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) core TRRs identified by HSL are proper candidates for the further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity taking place in the abundant ncRNAs might account for the ubiquitous presence of TRRs across the genome. In addition, we found that the TRRs of protein coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.
PMCID: PMC2640395  PMID: 19099599
18.  Discovering Cooperative Relationships of Chromatin Modifications in Human T Cells Based on a Proposed Closeness Measure 
PLoS ONE  2010;5(12):e14219.
Eukaryotic transcription is accompanied by combinatorial chromatin modifications that serve as functional epigenetic markers. The composition of chromatin modifications specifies histone codes that regulate the associated gene. Discovering novel chromatin regulatory relationships is of general interest.
Methodology/Principal Findings
On the premise that interactions among chromatin modifications influence CpG methylation, we present a closeness measure to characterize the regulatory interactions of epigenomic features. The closeness measure is applied to genome-wide CpG methylation and histone modification datasets in human CD4+ T cells to select a subset of potential features. To uncover epigenomic and genomic patterns, CpG loci are clustered into nine modules associated with distinct chromatin and genomic signatures based on terms of biological function. We then performed Bayesian network inference to uncover inherent regulatory relationships from the feature-selected closeness measure profile and from each of the nine module-specific profiles. The global and module-specific networks exhibit topological proximity and modularity. We found that the regulatory patterns of chromatin modifications differ significantly across modules and that distinct patterns are related to specific transcriptional levels and biological function. DNA methylation and genomic features are found to have little regulatory function. The regulatory relationships were partly validated by literature reviews. We also used partial correlation analysis in other cells to verify novel regulatory relationships.
The interactions among chromatin modifications and genomic elements characterized by a closeness measure help elucidate cooperative patterns of chromatin modification in transcriptional regulation and help decipher complex histone codes.
PMCID: PMC2997069  PMID: 21151929
19.  Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development 
PLoS Computational Biology  2010;6(4):e1000761.
A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns—represented by graphs of k-mers, or “graph-mers”—that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated to these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.
Author Summary
A major challenge in functional genomics is to decipher the gene regulatory networks operating in multi-cellular organisms, such as the nematode C. elegans. The expression level of a gene is controlled, to a great extent, by regulatory proteins called transcription factors that bind short motifs in the gene's promoter (regulatory region in the non-coding DNA). In a temporal regulatory process, for example in development, the “regulatory logic” of DNA motifs in the promoter largely determines the gene's expression trajectory, as the gene responds over time to changing concentrations of the transcription factors that control it. This study addresses the problem of learning DNA motifs that predict temporal expression profiles, using genomewide expression data from developmental time series in C. elegans. We developed a novel algorithm based on techniques from multivariate regression that sets up a correspondence between sequence patterns and expression trajectories. Sequence motifs are represented as graphs of sequence-similar k-length subsequences called “graph-mers”. By applying the method to germline development in C. elegans, we found both known and novel DNA motifs associated with oocyte and sperm genes.
PMCID: PMC2861633  PMID: 20454681
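The summary above represents motifs as graphs of sequence-similar k-length subsequences ("graph-mers"). The sketch below illustrates only that data structure, not the graph-regularized PLS regression. Linking k-mers that differ by at most one mismatch is an assumption chosen for illustration, and the names `kmer_graph` and `max_mismatch` are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all k-length subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def kmer_graph(seqs, k, max_mismatch=1):
    """Adjacency map linking k-mers that differ by at most max_mismatch
    positions (an assumed similarity rule, not necessarily the paper's)."""
    nodes = set()
    for s in seqs:
        nodes |= kmers(s.upper(), k)
    graph = {n: set() for n in nodes}
    for a, b in combinations(sorted(nodes), 2):
        if hamming(a, b) <= max_mismatch:
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    g = kmer_graph(["ACGTAC", "ACGAAC"], k=4)
    for node in sorted(g):
        print(node, sorted(g[node]))
```

Connected groups of nodes in such a graph play the role of a degenerate motif. The all-pairs comparison is quadratic in the number of distinct k-mers, so it suits small promoter sets rather than whole genomes.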
20.  Transcriptome profiling of human hepatocytes treated with Aroclor 1254 reveals transcription factor regulatory networks and clusters of regulated genes 
BMC Genomics  2006;7:217.
Aroclor 1254 is a well-known hepatotoxin and consists of a complex mixture of polychlorinated biphenyls (PCBs), some of which have the ability to activate the aryl hydrocarbon receptor (AhR) and other transcription factors (TFs). Altered transcription factor expression enables activation of promoters of many genes, thereby inducing a regulatory gene network. In the past, computational approaches were not applied to understand the combinatorial interplay of TFs acting in concert after treatment of human hepatocyte cultures with Aroclor 1254. We were particularly interested in interrogating promoters for transcription factor binding sites of regulated genes.
Here, we present a framework for studying a gene regulatory network and the large-scale regulation of transcription on the level of chromatin structure. For that purpose, we employed cDNA and oligonucleotide microarrays to investigate transcript signatures in human hepatocyte cultures treated with Aroclor 1254 and found 910 genes to be regulated, 52 of which code for TFs and 47 of which are involved in cell cycle and apoptosis. We identified regulatory elements proximal to AhR binding sites, and this included recognition sites for the transcription factors ETS, SP1, CREB, EGR, NF-kB, NKXH, and ZBP. Notably, ECAT and TBP binding sites were identified for Aroclor 1254-induced and E2F, MAZ, HOX, and WHZ for Aroclor 1254-repressed genes. We further examined the chromosomal distribution of regulated genes and observed a significantly high number of gene pairs within a distance of 200 kb. Genes regulated by Aroclor 1254 are located much closer to each other than genes distributed randomly across the genome. Thirty-seven regulated gene pairs are even direct neighbors. Within these directly neighboring gene pairs, not all genes were bona fide targets for AhR (primary effect). Upon further analysis, many were targets for other transcription factors whose expression was regulated by Aroclor 1254 (secondary effect).
We observed coordinate events in transcript regulation upon treatment of human hepatocytes with Aroclor 1254 and identified a regulatory gene network of different TFs acting in concert. We determined molecular rules for transcriptional regulation to explain, in part, the pleiotropic effect seen in animals and humans upon exposure to Aroclor 1254.
PMCID: PMC1590027  PMID: 16934159
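The abstract above reports more regulated gene pairs within 200 kb than expected for randomly distributed genes. One way such a comparison could be set up is a permutation test, sketched below. The abstract does not state which test the authors used, so this is an assumed approach; the function names, the `(chromosome, position)` input format, and the choice to count only consecutive selected genes on a chromosome are all illustrative.

```python
import random


def count_close_pairs(genes, max_dist=200_000):
    """Count consecutive selected genes on the same chromosome that lie
    within max_dist bp of each other. genes: iterable of (chrom, position)."""
    by_chrom = {}
    for chrom, pos in genes:
        by_chrom.setdefault(chrom, []).append(pos)
    total = 0
    for positions in by_chrom.values():
        positions.sort()
        total += sum(1 for a, b in zip(positions, positions[1:]) if b - a <= max_dist)
    return total


def permutation_p_value(regulated, background, n_perm=1000, max_dist=200_000, seed=0):
    """Compare the observed close-pair count with counts from random gene sets
    of the same size drawn from background (a list of (chrom, position))."""
    rng = random.Random(seed)
    observed = count_close_pairs(regulated, max_dist)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, len(regulated))
        if count_close_pairs(sample, max_dist) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    background = [("chr1", i * 1_000_000) for i in range(200)]
    regulated = [("chr1", 0), ("chr1", 150_000), ("chr1", 5_000_000)]
    print(permutation_p_value(regulated, background + regulated[1:2]))
```

Sampling from the set of genes actually measured on the array, rather than from all annotated genes, would keep the null distribution comparable to the experiment.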
21.  Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion 
PLoS Computational Biology  2008;4(1):e13.
The regulation of DNA accessibility through nucleosome positioning is important for transcription control. Computational models have been developed to predict genome-wide nucleosome positions from DNA sequences, but these models consider only nucleosome sequences, which may have limited their power. We developed a statistical multi-resolution approach to identify a sequence signature, called the N-score, that distinguishes nucleosome binding DNA from non-nucleosome DNA. This new approach has significantly improved the prediction accuracy. The sequence information is highly predictive for local nucleosome enrichment or depletion, whereas predictions of the exact positions are only modestly more accurate than a null model, suggesting the importance of other regulatory factors in fine-tuning the nucleosome positions. The N-score in promoter regions is negatively correlated with gene expression levels. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in human.
Author Summary
A eukaryotic genome is packaged into chromatin. The chromatin not only makes it possible to fit the relatively long genome into a tiny nucleus, but also plays an important regulatory role. The nucleosome is the fundamental repeating unit of chromatin. High-resolution tiling array experiments have shown that many nucleosomes are well-positioned in vivo, consistent with an important regulatory role. However, the mechanisms that determine nucleosome positioning are still poorly understood. We have developed a novel computational method for predicting nucleosome positions using only the genomic sequence information. The method detects periodic sequence signatures that discriminate nucleosome sequences from linker sequences. We show that this approach has significantly improved predictive power compared to previous studies. Interestingly, the most predictable regions tend to be located where stringent regulations are needed, i.e., the neighborhood of a transcription start site. This model predicts that nucleosome occupancy is not strongly controlled by short DNA sequence motifs but rather progressively controlled by regular organization of short elements into periodic patterns. We also provide evidence that sequence specificity for nucleosome binding is conserved from yeast to human.
PMCID: PMC2211532  PMID: 18225943
22.  Subtle Changes in Motif Positioning Cause Tissue-Specific Effects on Robustness of an Enhancer's Activity 
PLoS Genetics  2014;10(1):e1004060.
Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs to CRM activity during development using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood.
Author Summary
Transcription is initiated through the binding of transcription factors (TFs) to specific motifs that are dispersed throughout the genome. Genomics methods have helped to discern which motifs for a TF are occupied and which are not, yet it is poorly understood why certain combinations of bound motifs, and not others, drive specific patterns of expression. Here, we take a bottom-up approach to address this question: We constructed simple, synthetic elements containing motifs for only one or two TFs in different orientations and integrated them into the Drosophila genome. By assessing when and where these elements drive expression, we could model specific rules governing tissue-specific enhancer activity. Despite the general importance of TF combinatorial interactions during development, elements with a single TF's motif were often sufficient to drive complex expression. By combining motifs for two factors, we observed non-additive expression in the heart. While the enhancer's activity could tolerate changes in motif spacing and orientation in many tissues, the robustness of heart expression was very sensitive to subtle sequence changes. These results highlight an important property of enhancers—as their readout is context-specific, so too are the effects of mutations within them, including small insertions that may alter a gene's expression in one tissue, but not in another.
PMCID: PMC3879207  PMID: 24391522
23.  Target Genes of the MADS Transcription Factor SEPALLATA3: Integration of Developmental and Hormonal Pathways in the Arabidopsis Flower 
PLoS Biology  2009;7(4):e1000090.
The molecular mechanisms by which floral homeotic genes act as major developmental switches to specify the identity of floral organs are still largely unknown. Floral homeotic genes encode transcription factors of the MADS-box family, which are supposed to assemble in a combinatorial fashion into organ-specific multimeric protein complexes. Major mediators of protein interactions are MADS-domain proteins of the SEPALLATA subfamily, which play a crucial role in the development of all types of floral organs. In order to characterize the roles of the SEPALLATA3 transcription factor complexes at the molecular level, we analyzed genome-wide the direct targets of SEPALLATA3. We used chromatin immunoprecipitation followed by ultrahigh-throughput sequencing or hybridization to whole-genome tiling arrays to obtain genome-wide DNA-binding patterns of SEPALLATA3. The results demonstrate that SEPALLATA3 binds to thousands of sites in the genome. Most potential target sites that were strongly bound in wild-type inflorescences are also bound in the floral homeotic agamous mutant, which displays only the perianth organs, sepals, and petals. Characterization of the target genes shows that SEPALLATA3 integrates and modulates different growth-related and hormonal pathways in a combinatorial fashion with other MADS-box proteins and possibly with non-MADS transcription factors. In particular, the results suggest multiple links between SEPALLATA3 and auxin signaling pathways. Our gene expression analyses link the genomic binding site data with the phenotype of plants expressing a dominant repressor version of SEPALLATA3, suggesting that it modulates auxin response to facilitate floral organ outgrowth and morphogenesis. Furthermore, the binding of the SEPALLATA3 protein to cis-regulatory elements of other MADS-box genes and expression analyses reveal that this protein is a key component in the regulatory transcriptional network underlying the formation of floral organs.
Author Summary
Most regulatory genes encode transcription factors, which modulate gene expression by binding to regulatory sequences of their target genes. In plants in particular, which genes are directly controlled by these transcription factors, and the molecular mechanisms of target gene recognition in vivo, are still largely unexplored. One of the best-understood developmental processes in plants is flower development. In different combinations, transcription factors of the MADS-box family control the identities of the different types of floral organs: sepals, petals, stamens, and carpels. Here, we present the first genome-wide analysis of binding sites of a MADS-box transcription factor in plants. We show that the MADS-domain protein SEPALLATA3 (SEP3) binds to the regulatory regions of thousands of potential target genes, many of which are also transcription factors. We provide insight into mechanisms of DNA recognition by SEP3, and suggest roles for other transcription factor families in SEP3 target gene regulation. In addition to effects on genes involved in floral organ identity, our data suggest that SEP3 binds to, and modulates, the transcription of target genes involved in hormonal signaling pathways.
The key floral regulator SEPALLATA3 binds to the promoters of a large number of potential direct target genes to integrate different growth-related and hormonal pathways in flower development.
PMCID: PMC2671559  PMID: 19385720
24.  Decoding a Signature-Based Model of Transcription Cofactor Recruitment Dictated by Cardinal Cis-Regulatory Elements in Proximal Promoter Regions 
PLoS Genetics  2013;9(11):e1003906.
Genome-wide maps of DNase I hypersensitive sites (DHSs) reveal that most human promoters contain perpetually active cis-regulatory elements between −150 bp and +50 bp (−150/+50 bp) relative to the transcription start site (TSS). Transcription factors (TFs) recruit cofactors (chromatin remodelers, histone/protein-modifying enzymes, and scaffold proteins) to these elements in order to organize the local chromatin structure and coordinate the balance of post-translational modifications nearby, contributing to the overall regulation of transcription. However, the rules of TF-mediated cofactor recruitment to the −150/+50 bp promoter regions remain poorly understood. Here, we provide evidence for a general model in which a series of cis-regulatory elements (here termed ‘cardinal’ motifs) prefer acting individually, rather than in fixed combinations, within the −150/+50 bp regions to recruit TFs that dictate cofactor signatures distinctive of specific promoter subsets. Subsequently, human promoters can be subclassified based on the presence of cardinal elements and their associated cofactor signatures. In this study, furthermore, we have focused on promoters containing the nuclear respiratory factor 1 (NRF1) motif as the cardinal cis-regulatory element and have identified the pervasive association of NRF1 with the cofactor lysine-specific demethylase 1 (LSD1/KDM1A). This signature might be distinctive of promoters regulating nuclear-encoded mitochondrial and other particular genes in at least some cells. Together, we propose that decoding a signature-based, expanded model of control at proximal promoter regions should lead to a better understanding of coordinated regulation of gene transcription.
Author Summary
Human cells exploit different mechanisms to coordinate the expression of both protein-coding and non-coding RNAs. Elucidating these mechanisms is essential to understanding normal physiology and disease. In our attempt to identify new regulatory layers acting particularly at proximal promoters, we have computationally analyzed the genomic sequences located from −150 bp to +50 bp relative to the transcriptional start site (TSS), which are often at the center of ‘open’ chromatin regions in human promoters. We have confirmed the presence of a series of cis-regulatory elements (here referred to as ‘cardinal’ motifs) that show a strong preference for these short regions. Interestingly, these elements tend to act independently rather than in fixed combinations. Therefore, we propose that they confer unique regulatory features to the human promoter subsets that contain each of these particular elements. In agreement with this model, we have identified a large repertoire of preferential partnerships between transcription factors recognizing cardinal motifs and their associated proteins (cofactors), thus decoding a signature-based model that distinguishes distinctive regulatory types of promoters based on cardinal motifs. These signatures may underlie a new layer of transcriptional regulation to orchestrate coordinated gene expression in human promoters.
PMCID: PMC3820735  PMID: 24244184
25.  Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes 
Nucleic Acids Research  2004;32(9):2889-2900.
Sequence information and high-throughput methods to measure gene expression levels open the door to explore transcriptional regulation using computational tools. Combinatorial regulation and sparseness of regulatory elements throughout the genome allow organisms to control the spatial and temporal patterns of gene expression. Here we study the organization of cis-regulatory elements in sets of co-regulated genes. We build an algorithm to search for combinations of transcription factor binding sites that are enriched in a set of potentially co-regulated genes with respect to the whole genome. No knowledge is assumed about involvement of specific sets of transcription factors. Instead, the search is exhaustively conducted over combinations of up to four binding sites obtained from databases or motif search algorithms. We evaluate the performance on random sets of genes as a negative control and on three biologically validated sets of co-regulated genes in yeasts, flies and humans. We show that we can detect DNA regions that play a role in the control of transcription. These results shed light on the structure of transcription regulatory regions in eukaryotes and can be directly applied to clusters of co-expressed genes obtained in gene expression studies. Supplementary information is available at
PMCID: PMC419615  PMID: 15155858
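The abstract above describes an exhaustive search over combinations of up to four binding sites for enrichment in a co-regulated gene set. The sketch below shows one simplified way to score such combinations with a hypergeometric tail probability at the gene level. The paper's actual scoring and its handling of site clustering within DNA regions are not given in the abstract, so the statistic, the input format (`gene -> set of motif names`), and the function names are assumptions.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the combination."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enriched_combinations(gene_motifs, target_genes, max_size=4):
    """Score every motif combination of size 1..max_size.
    gene_motifs: dict gene -> set of motif names (whole-genome background).
    target_genes: set of co-regulated genes, all present in gene_motifs.
    Returns (p_value, combination, hits_in_targets, hits_in_genome), sorted."""
    all_motifs = sorted(set().union(*gene_motifs.values()))
    N, n = len(gene_motifs), len(target_genes)
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(all_motifs, size):
            carriers = {g for g, m in gene_motifs.items() if set(combo) <= m}
            k = len(carriers & target_genes)
            if k == 0:
                continue  # combination absent from the target set
            results.append((hypergeom_sf(k, N, len(carriers), n), combo, k, len(carriers)))
    return sorted(results)


if __name__ == "__main__":
    genome = {"g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"A"},
              "g4": {"C"}, "g5": {"C"}, "g6": set()}
    for row in enriched_combinations(genome, {"g1", "g2"}):
        print(row)
```

Because many combinations are tested, the raw p-values would need multiple-testing correction or comparison with random gene sets, as the abstract's negative control suggests.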
