Related Articles

1.  Cooperation between NRF-2 and YY-1 transcription factors is essential for triggering the expression of the PREPL-C2ORF34 bidirectional gene pair 
BMC Molecular Biology  2009;10:67.
Many mammalian genes are organized as bidirectional (head-to-head) gene pairs with the two genes separated only by less than 1 kb. The transcriptional regulation of these bidirectional gene pairs remains largely unclear, but a few studies have suggested that the two closely adjacent genes in divergent orientation can be co-regulated by a single transcription factor binding to a specific regulatory fragment. Here we report an evolutionarily conserved bidirectional gene pair, known as the PREPL-C2ORF34 gene pair, whose transcription relies on the synergic cooperation of two transcription factors binding to an intergenic bidirectional minimal promoter.
While PREPL is present primarily in brain and heart, C2ORF34 is ubiquitously and abundantly expressed in almost all tissues. Genomic analyses revealed that these two non-homologous genes are adjacent in a head-to-head configuration on human chromosome 2p21 and separated by only 405 bp. Within this short intergenic region, a 243-bp GC-rich segment was demonstrated to function as a bidirectional minimal promoter to initiate the transcription of both flanking genes. Two key transcription factors, NRF-2 and YY-1, were further identified to coordinately participate in driving the expression of both genes in an additive manner. The functional cooperation between these two transcription factors, along with their genomic binding sites and some cis-acting repressive elements, is essential for the transcriptional activation and tissue distribution of the PREPL-C2ORF34 bidirectional gene pair.
This study provides new insights into the complex transcriptional mechanism of a mammalian head-to-head gene pair which requires cooperative binding of multiple transcription factors to a bidirectional minimal promoter of the shared intergenic region.
PMCID: PMC2713978  PMID: 19575798
2.  Searching for bidirectional promoters in Arabidopsis thaliana 
BMC Bioinformatics  2009;10(Suppl 1):S29.
A "bidirectional gene pair" is defined as two adjacent genes located on opposite strands of DNA with transcription start sites (TSSs) not more than 1000 base pairs apart; the intergenic region between the two TSSs is commonly designated a putative "bidirectional promoter". Individual examples of bidirectional gene pairs have been reported for years, and a few genome-wide analyses have been carried out in human and other mammalian genomes. However, no genome-wide analysis of bidirectional genes has been done for plants. Furthermore, the exact mechanism of this gene organization is still poorly understood.
We conducted a comprehensive analysis of bidirectional gene pairs across the whole Arabidopsis thaliana genome and identified 2471 bidirectional gene pairs. The analysis shows that bidirectional genes are often coexpressed and tend to be involved in the same biological function. Furthermore, bidirectional gene pairs associated with similar functions seem to have stronger expression correlation. We then focused on regulatory analysis of the intergenic regions between bidirectional genes. Using a hierarchical stochastic language model (HSL) that we developed, we identified intergenic regions enriched in regulatory elements that are essential for the initiation of transcription. Finally, we picked 27 functionally associated bidirectional gene pairs whose intergenic regions are enriched in regulatory elements and hypothesized that they are regulated by bidirectional promoters; some of these pairs have the same orthologs in ancient organisms. More than half of these bidirectional gene pairs are further supported by sharing similar functional categories with the handful of experimentally verified bidirectional genes.
We conclude that bidirectional gene pairs are also prevalent in plant genomes. Promoter analysis of the intergenic regions between bidirectional genes could be a new way to study bidirectional gene structure and may provide an important clue for further analysis. Such a method could be applied to other genomes.
PMCID: PMC2648788  PMID: 19208129
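The definition used in this abstract (adjacent genes on opposite strands whose TSSs lie no more than 1000 bp apart) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Gene` record, the function name `find_bidirectional_pairs`, and the choice to treat genes as "adjacent" when their TSSs are neighbours after sorting are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; real annotations carry more fields.
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def find_bidirectional_pairs(genes, max_distance=1000):
    """Return (minus_gene, plus_gene, distance) for head-to-head neighbours.

    Assumption: two genes are 'adjacent' if their TSSs are consecutive
    after sorting by coordinate on the same chromosome.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        ordered = sorted(chrom_genes, key=lambda g: g.tss)
        for left, right in zip(ordered, ordered[1:]):
            # Head-to-head: the left gene is transcribed leftwards ("-")
            # and the right gene rightwards ("+"), away from each other.
            if left.strand == "-" and right.strand == "+":
                distance = right.tss - left.tss
                if distance <= max_distance:
                    pairs.append((left.name, right.name, distance))
    return pairs


# Made-up coordinates for illustration only.
example_genes = [
    Gene("geneA", "chr1", "-", 1000),
    Gene("geneB", "chr1", "+", 1405),
    Gene("geneC", "chr1", "+", 5000),
    Gene("geneD", "chr1", "-", 5200),
]
print(find_bidirectional_pairs(example_genes))
```

In this toy input only geneA and geneB qualify; geneC and geneD are close together but transcribed towards each other, so they are not a bidirectional pair under the definition above.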
3.  Divergence of nucleosome positioning between two closely related yeast species: genetic basis and functional consequences 
Inter-species hybrids can be used to dissect the relative contribution of cis and trans effects to the evolution of nucleosome positioning. Most (∼70%) differences in nucleosome positioning between two closely related yeast species are due to cis effects. Cis effects are primarily due to divergence of AT-rich nucleosome-disfavoring sequences, but are not associated with divergence of nucleosome-favoring sequences. Differences in nucleosome positioning propagate to multiple adjacent nucleosomes, supporting the statistical positioning hypothesis. Divergence of nucleosome positioning is excluded from regulatory elements and is not correlated with gene expression divergence, suggesting a neutral mode of evolution.
Phenotypic diversity is often due to changes in gene regulation, and recent studies have characterized extensive differences between the gene expression programs of closely related species (Khaitovich et al, 2006; Tirosh et al, 2009). However, very little is known about the mechanisms that drive this divergence. Here, we analyze the evolution of nucleosome positioning, by comparing the patterns of nucleosomes between two yeast species, as well as generating the allele-specific nucleosome profile in their hybrid. We ask two main questions: (1) what is the genetic basis of inter-species differences in nucleosome positioning? and (2) what is the regulatory function of these differences?
Generally speaking, we can classify the genetic basis of the divergence in nucleosome positioning into two mechanisms. First, mutations in the local DNA sequence may influence the ability to bind nucleosomes at this region; we refer to these as cis effects. Second, mutations may affect the activity of various proteins that alter nucleosome positioning either actively (e.g. chromatin-remodeling enzymes) or by simply competing with nucleosomes for binding to the same DNA sequence (e.g. transcription factors); we refer to these as trans effects.
To classify the observed inter-species differences into cis versus trans effects, we measured allele-specific nucleosome positions within the inter-specific hybrid of the two species (Wittkopp et al, 2004; Tirosh et al, 2009). The hybrid contains the alleles of both species; hence, cis effects, which involve mutations that discriminate between the two alleles, will be maintained in the hybrid so that nucleosome positioning will be different between the alleles coming from the different species. Trans effects, in contrast, will not discriminate between the two hybrid alleles from the different species, as these two alleles reside together at the same trans environment (hybrid nucleus) and are thus regulated by the same set of proteins—the combination of proteins from the two species. Using this approach, we found that ∼70% of the inter-species differences in nucleosome positioning are due to cis effects, whereas the rest is due to trans effects.
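The hybrid-based reasoning in the preceding paragraph can be sketched as a simple decision rule: a difference seen between the two species that persists between the two alleles in the hybrid is attributed to cis effects, whereas a difference that disappears in the shared hybrid nucleus is attributed to trans effects. The function name, the use of a signed positional shift in base pairs, and the 10 bp threshold below are illustrative assumptions; the abstract does not state the statistical criteria the authors actually applied.

```python
def classify_divergence(parental_shift, hybrid_shift, threshold=10.0):
    """Classify one nucleosome's inter-species difference.

    parental_shift: signed position difference (bp) between the two species.
    hybrid_shift:   signed position difference (bp) between the two alleles
                    measured in the hybrid.
    threshold:      hypothetical cutoff below which a shift counts as noise.
    """
    if abs(parental_shift) < threshold:
        return "conserved"  # no inter-species difference to explain
    same_direction = parental_shift * hybrid_shift > 0
    if abs(hybrid_shift) >= threshold and same_direction:
        return "cis"        # difference is maintained between hybrid alleles
    return "trans"          # difference vanishes in the shared environment


print(classify_divergence(30.0, 28.0))  # difference kept in the hybrid
print(classify_divergence(30.0, 2.0))   # difference lost in the hybrid
```

A real analysis would also need to handle mixed cases, in which part of the difference persists in the hybrid, and measurement noise; this rule only captures the two clear-cut outcomes described above.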
The local DNA sequence is indeed known to affect nucleosome positions, and many features of DNA sequences were proposed to influence nucleosome binding, either by rejecting nucleosomes, or by being favorable for nucleosome binding (Segal et al, 2006; Lee et al, 2007; Kaplan et al, 2009). We find, however, that nucleosome positions diverged primarily through changes in AT-rich sequences, which exclude nucleosomes, whereas mutations in sequences that correlate with high-nucleosome occupancy do not influence inter-species divergence.
Nucleosomes restrict the access of proteins to the DNA and may thus affect DNA-related processes such as transcription, recombination or replication. Indeed, promoters and regulatory sequences are often depleted of nucleosomes, and highly transcribed genes are associated with low occupancy of nucleosomes at their promoters (Lee et al, 2007). Several earlier studies also suggested that evolutionary divergence of gene expression is driven by changes in chromatin structure (Lee et al, 2006; Choi and Kim, 2008; Tirosh et al, 2008; Field et al, 2009). However, we find that nucleosome positions (or occupancy) at regulatory elements are largely conserved, and furthermore, that the inter-species differences in nucleosome positions do not correlate with gene expression differences. These results suggest that nucleosome positioning is not a central mechanism for evolutionary changes in gene regulation and that most of the observed changes may be due to neutral drift.
Does the apparently low influence of nucleosome positioning on gene expression divergence imply that nucleosome positions do not have a function in gene regulation? To address this, we examined two additional modes of gene regulation: transcriptional response to changes in growth conditions (glucose versus glycerol media), and the expression differences between different cell types (haploid versus diploid cells). Consistent with earlier studies, we found that the response to growth conditions is significantly, albeit weakly, associated with changes in nucleosome positioning. Interestingly, we also found a strikingly strong association between gene expression and nucleosomal changes in the two cell types. Taken together, these results suggest that nucleosome positioning is used preferentially for biological processes in which genes are turned on and off (e.g. different cell type), but less so during divergence of closely related species in which gradual changes accumulate over time.
Gene regulation differs greatly between related species, constituting a major source of phenotypic diversity. Recent studies characterized extensive differences in the gene expression programs of closely related species. In contrast, virtually nothing is known about the evolution of chromatin structure and how it influences the divergence of gene expression. Here, we compare the genome-wide nucleosome positioning of two closely related yeast species and, by profiling their inter-specific hybrid, trace the genetic basis of the observed differences into mutations affecting the local DNA sequences (cis effects) or the upstream regulators (trans effects). The majority (∼70%) of inter-species differences is due to cis effects, leaving a significant contribution (30%) for trans factors. We show that cis effects are well explained by mutations in nucleosome-disfavoring AT-rich sequences, but are not associated with divergence of nucleosome-favoring sequences. Differences in nucleosome positioning propagate to multiple adjacent nucleosomes, supporting the statistical positioning hypothesis, and we provide evidence that nucleosome-free regions, but not the +1 nucleosome, serve as stable border elements. Surprisingly, although we find that differential nucleosome positioning among cell types is strongly correlated with differential expression, this does not seem to be the case for evolutionary changes: divergence of nucleosome positioning is excluded from regulatory elements and is not correlated with gene expression divergence, suggesting a primarily neutral mode of evolution. Our results provide evolutionary insights to the genetic determinants and regulatory function of nucleosome positioning.
PMCID: PMC2890324  PMID: 20461072
evolution; gene regulation; nucleosome positioning
4.  Two non-homologous brain diseases-related genes, SERPINI1 and PDCD10, are tightly linked by an asymmetric bidirectional promoter in an evolutionarily conserved manner 
Despite the fact that mammalian genomes are far more spacious than prokaryotic genomes, recent nucleotide sequencing data have revealed that many mammalian genes are arranged in a head-to-head orientation and separated by a small intergenic sequence. Extensive studies on some of these neighboring genes, in particular homologous gene pairs, have shown that these genes are often co-expressed in a symmetric manner and regulated by a shared promoter region. Here we report the identification of two non-homologous brain disease-related genes, with one coding for a serine protease inhibitor (SERPINI1) and the other for a programmed cell death-related gene (PDCD10), being tightly linked together by an asymmetric bidirectional promoter in an evolutionarily conserved fashion. This asymmetric bidirectional promoter, in cooperation with some cis-acting elements, is responsible for the co-regulation of the gene expression pattern as well as the tissue specificity of SERPINI1 and PDCD10.
While SERPINI1 is predominantly expressed in normal brain and down-regulated in brain tumors, PDCD10 is ubiquitously expressed in all normal tissues but its gene transcription becomes aberrant in different types of cancers. By measuring the luciferase activity in various cell lysates, their 851-bp intergenic sequence was shown to be capable of driving the reporter gene expression in either direction. A 175-bp fragment from nt 1 to 175 in the vicinity of PDCD10 was further determined to function as a minimal bidirectional promoter. A critical regulatory fragment, from nt 176-473 outside the minimal promoter in the intergenic region, was identified to contain a strong repressive element for SERPINI1 and an enhancer for PDCD10. These cis-acting elements may exist to help coordinate the expression and regulation of the two flanking genes.
Among the non-homologous genes that have been described as closely adjacent in mammalian genomes, the intergenic region of the head-to-head PDCD10-SERPINI1 gene pair provides an interesting and informative example of a complex regulatory system that governs the expression of both genes not only through an asymmetric bidirectional promoter, but also through fine-tuned regulation by some cis-acting elements.
PMCID: PMC1796892  PMID: 17212813
5.  The Role of Nucleosome Positioning in the Evolution of Gene Regulation 
PLoS Biology  2010;8(7):e1000414.
A comparative genomics study maps nucleosomes across the entire genomes of 12 fungal species, identifying multiple distinct mechanisms linking changes in chromatin architecture to evolution of gene regulation.
Chromatin organization plays a major role in gene regulation and can affect the function and evolution of new transcriptional programs. However, it can be difficult to decipher the basis of changes in chromatin organization and their functional effect on gene expression. Here, we present a large-scale comparative genomic analysis of the relationship between chromatin organization and gene expression, by measuring mRNA abundance and nucleosome positions genome-wide in 12 Hemiascomycota yeast species. We found substantial conservation of global and functional chromatin organization in all species, including prominent nucleosome-free regions (NFRs) at gene promoters, and distinct chromatin architecture in growth and stress genes. Chromatin organization has also substantially diverged in both global quantitative features, such as spacing between adjacent nucleosomes, and in functional groups of genes. Expression levels, intrinsic anti-nucleosomal sequences, and trans-acting chromatin modifiers all play important, complementary, and evolvable roles in determining NFRs. We identify five mechanisms that couple chromatin organization to evolution of gene regulation and have contributed to the evolution of respiro-fermentation and other key systems, including (1) compensatory evolution of alternative modifiers associated with conserved chromatin organization, (2) a gradual transition from constitutive to trans-regulated NFRs, (3) a loss of intrinsic anti-nucleosomal sequences accompanying changes in chromatin organization and gene expression, (4) re-positioning of motifs from NFRs to nucleosome-occluded regions, and (5) the expanded use of NFRs by paralogous activator-repressor pairs. Our study sheds light on the molecular basis of chromatin organization, and on the role of chromatin organization in the evolution of gene regulation.
Author Summary
Divergence in gene regulation plays a major role in organismal evolution. Evidence suggests that changes in the packaging of eukaryotic genomes into chromatin can underlie the evolution of divergent gene expression patterns. Here, we explore the role of chromatin structure in regulatory evolution by whole-genome measurements of nucleosome positions and mRNA levels in 12 yeast species spanning ∼250 million years of evolution. We find several distinct ways in which changes in chromatin structure are associated with changes in gene expression. These include changes in promoter accessibility, changes in promoter chromatin architecture, and changes in the accessibility of specific transcription factor binding sites. In many cases, changes in chromatin architecture are coupled to physiological diversity, including the evolution of a respiration- or fermentation-based lifestyle, mating behavior, salt tolerance, and broad aspects of genomic structure. Together, our data will provide a rich resource for future investigations into the interplay between chromatin structure, gene regulation, and evolution.
PMCID: PMC2897762  PMID: 20625544
6.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs, many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences. Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation to downregulate the levels of proteins that drive and coordinate sexual differentiation. Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter’; it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter’, maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail’ stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter’, this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
7.  Comprehensive Annotation of Bidirectional Promoters Identifies Co-Regulation among Breast and Ovarian Cancer Genes 
PLoS Computational Biology  2007;3(4):e72.
A “bidirectional gene pair” comprises two adjacent genes whose transcription start sites are neighboring and directed away from each other. The intervening regulatory region is called a “bidirectional promoter.” These promoters are often associated with genes that function in DNA repair, with the potential to participate in the development of cancer. No connection between these gene pairs and cancer has been previously investigated. Using the database of spliced-expressed sequence tags (ESTs), we identified the most complete collection of human transcripts under the control of bidirectional promoters. A rigorous screen of the spliced EST data identified new bidirectional promoters, many of which functioned as alternative promoters or regulated novel transcripts. Additionally, we show a highly significant enrichment of bidirectional promoters in genes implicated in somatic cancer, including a substantial number of genes implicated in breast and ovarian cancers. The repeated use of this promoter structure in the human genome suggests it could regulate co-expression patterns among groups of genes. Using microarray expression data from 79 human tissues, we verify regulatory networks among genes controlled by bidirectional promoters. Subsets of these promoters contain similar combinations of transcription factor binding sites, including evolutionarily conserved ETS factor binding sites in ERBB2, FANCD2, and BRCA2. Interpreting the regulation of genes involved in co-expression networks, especially those involved in cancer, will be an important step toward defining molecular events that may contribute to disease.
Author Summary
Promoters are regulatory regions that control transcription of genes. A special class of promoters, known as bidirectional promoters, regulates expression of two genes instead of one. These promoters are situated between two adjacent genes whose transcription start sites are physically within 1,000 bp and oriented in opposite directions. Bidirectional promoters are found repeatedly in the genome, suggesting an important biological significance for this regulatory configuration. We developed an algorithm to map bidirectional promoters using data from a comprehensive list of transcribed sequences known as expressed sequence tags, or ESTs. This approach improved the number of previously characterized bidirectional promoters by 300%. Included in the new data are bidirectional promoters that regulate expression of genes implicated in somatic cancers. For instance, ten well-recognized genes implicated in breast and ovarian cancers were identified as having bidirectional promoters. Three of the genes are further related by having duplicate copies of the same binding site for a transcription factor within their bidirectional promoters. These binding sites are conserved among species, providing greater evidence that they are functionally important. This example, in which similar regulatory structures are used to control genes involved in cancer, illustrates how data can be mined from the comprehensive set of bidirectional promoters. Within this manuscript, we show statistical evidence that many cancer genes are regulated by bidirectional promoters. These promoters will be a valuable dataset for studying the role of gene regulation in tumor development.
PMCID: PMC1853124  PMID: 17447839
8.  Cross-species mapping of bidirectional promoters enables prediction of unannotated 5' UTRs and identification of species-specific transcripts 
BMC Genomics  2009;10:189.
Bidirectional promoters are shared regulatory regions that influence the expression of two oppositely oriented genes. This type of regulatory architecture is found more frequently than expected by chance in the human genome, yet many specifics underlying the regulatory design are unknown. Given that the function of most orthologous genes is similar across species, we hypothesized that the architecture and regulation of bidirectional promoters might also be similar across species, representing a core regulatory structure and enabling annotation of these regions in additional mammalian genomes.
By mapping the intergenic distances of genes in human, chimpanzee, bovine, murine, and rat, we show an enrichment for pairs of genes equal to or less than 1,000 bp between their adjacent 5' ends ("head-to-head") compared to pairs of genes that fall in the same orientation ("head-to-tail") or whose 3' ends are side-by-side ("tail-to-tail"). A representative set of 1,369 human bidirectional promoters was mapped to orthologous sequences in other mammals. We confirmed predictions for 5' UTRs in nine of ten manual picks in bovine based on comparison to the orthologous human promoter set and in six of seven predictions in human based on comparison to the bovine dataset. The two predictions that did not have orthology as bidirectional promoters in the other species resulted from unique events that initiated transcription in the opposite direction in only those species. We found evidence supporting the independent emergence of bidirectional promoters from the family of five RecQ helicase genes, which gained their bidirectional promoters and partner genes independently rather than through a duplication process. Furthermore, by expanding our comparisons from pairwise to multispecies analyses we developed a map representing a core set of bidirectional promoters in mammals.
We show that the orthologous positions of bidirectional promoters provide a reliable guide to directly annotate over one thousand regulatory regions in sequences of mammalian genomes, while also serving as a useful tool to predict 5' UTR positions and identify genes that are novel to a single species.
PMCID: PMC2688522  PMID: 19393065
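The orientation classes used in entry 8 (head-to-head, head-to-tail, tail-to-tail) and its 1,000 bp cutoff can be illustrated with a small sketch. This is not the authors' pipeline: the `Gene` record, the coordinate convention (1-based, `start` < `end`) and the choice to exclude overlapping genes are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based with start < end."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene) -> str:
    """Classify two neighbouring genes, where `left` has the smaller coordinates."""
    if left.strand == "-" and right.strand == "+":
        return "head-to-head"  # 5' ends are adjacent (divergent)
    if left.strand == "+" and right.strand == "-":
        return "tail-to-tail"  # 3' ends are adjacent (convergent)
    return "head-to-tail"      # both genes on the same strand


def is_candidate_bidirectional(left: Gene, right: Gene, max_distance: int = 1000) -> bool:
    """True for head-to-head pairs whose 5' ends are at most `max_distance` bp apart.

    Assumption: overlapping genes (distance <= 0) are not counted.
    """
    if classify_pair(left, right) != "head-to-head":
        return False
    distance = right.start - left.end  # 5' end of `left` is its `end` on the minus strand
    return 0 < distance <= max_distance


gene_a = Gene("geneA", 1000, 5000, "-")
gene_b = Gene("geneB", 5400, 9000, "+")
gene_c = Gene("geneC", 12000, 15000, "-")
print(classify_pair(gene_a, gene_b), is_candidate_bidirectional(gene_a, gene_b))
print(classify_pair(gene_b, gene_c), is_candidate_bidirectional(gene_b, gene_c))
```

The cutoff is a parameter so that other distance thresholds can be tried without changing the orientation logic.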
9.  Adjacent Gene Pairing Plays a Role in the Coordinated Expression of Ribosome Biogenesis Genes MPP10 and YJR003C in Saccharomyces cerevisiae 
Eukaryotic Cell  2011;10(1):43-53.
The rRNA and ribosome biogenesis (RRB) regulon from Saccharomyces cerevisiae contains some 200 genes, the expression of which is tightly regulated under changing cellular conditions. RRB gene promoters are enriched for the RRPE and PAC consensus motifs, and a significant fraction of RRB genes are found as adjacent gene pairs. A genetic analysis of the MPP10 promoter revealed that both the RRPE and PAC motifs are important for coordinated expression of MPP10 following heat shock, osmotic stress, and glucose replenishment. The association of the RRPE binding factor Stb3 with the MPP10 promoter was found to increase after glucose replenishment and to decrease following heat shock. Similarly, bulk histone H3 clearing and histone H4K12 acetylation levels at the MPP10 promoter were found to increase or decrease following glucose replenishment or heat shock, respectively. Interestingly, substitutions in the PAC and RRPE sequences at the MPP10 promoter were also found to impact the regulated expression of the adjacent RRB gene YJR003C, whose promoter lies in the opposite orientation and some 3.8 kb away. Furthermore, the regulated expression of YJR003C could be disrupted by inserting a reporter cassette that increased its distance from MPP10. Given that a high incidence of gene pairing was also found within the ribosomal protein (RP) and RRB regulons across different yeast species, our results indicate that immediately adjacent positioning of genes can be functionally significant for their coregulated expression.
PMCID: PMC3019797  PMID: 21115740
10.  The essential genome of a bacterium 
This study reports the essential Caulobacter genome at 8 bp resolution determined by saturated transposon mutagenesis and high-throughput sequencing. This strategy is applicable to full genome essentiality studies in a broad class of bacterial species.
The essential Caulobacter genome was determined at 8 bp resolution using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing. Essential protein-coding sequences comprise 90% of the essential genome; the remaining 10% comprising essential non-coding RNA sequences, gene regulatory elements and essential genome replication features. Of the 3876 annotated open reading frames (ORFs), 480 (12.4%) were essential ORFs, 3240 (83.6%) were non-essential ORFs and 156 (4.0%) were ORFs that severely impacted fitness when mutated. The essential elements are preferentially positioned near the origin and terminus of the Caulobacter chromosome. This high-resolution strategy is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
The regulatory events that control polar differentiation and cell-cycle progression in the bacterium Caulobacter crescentus are highly integrated, and they have to occur in the proper order (McAdams and Shapiro, 2011). Components of the core regulatory circuit are largely known. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of this bacterial cell. We have identified all the essential coding and non-coding elements of the Caulobacter chromosome using a hyper-saturated transposon mutagenesis strategy that is scalable and can be readily extended to obtain rapid and accurate identification of the essential genome elements of any sequenced bacterial species at a resolution of a few base pairs.
We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible outward pointing Pxyl promoter (Christen et al, 2010). We showed that this transposon construct inserts into the genome randomly where it can activate or disrupt transcription at the site of integration, depending on the insertion orientation. DNA from hundreds of thousands of transposon insertion sites reading outward into flanking genomic regions was parallel PCR amplified and sequenced by Illumina paired-end sequencing to locate the insertion site in each mutant strain (Figure 1). A single sequencing run on DNA from a mutagenized cell population yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions and the insertion site could be mapped with single nucleotide resolution. This yielded the location and orientation of 428 735 independent transposon insertions in the 4-Mbp Caulobacter genome.
Within non-coding sequences of the Caulobacter genome, we detected 130 non-disruptable DNA segments between 90 and 393 bp long in addition to all essential promoter elements. Among 27 previously identified and validated sRNAs (Landt et al, 2008), three were contained within non-disruptable DNA segments and another three were partially disruptable, that is, insertions caused a notable growth defect. Two additional small RNAs found to be essential are the transfer-messenger RNA (tmRNA) and the ribozyme RNAseP (Landt et al, 2008). In addition to the 8 non-disruptable sRNAs, 29 out of the 130 intergenic essential non-coding sequences contained non-redundant tRNA genes; duplicated tRNA genes were non-essential. We also identified two non-disruptable DNA segments within the chromosomal origin of replication. Thus, we resolved essential non-coding RNAs, tRNAs and essential replication elements within the origin region of the chromosome. An additional 90 non-disruptable small genome elements of currently unknown function were identified. Eighteen of these are conserved in at least one closely related species. Only 2 could encode a protein of over 50 amino acids.
For each of the 3876 annotated open reading frames (ORFs), we analyzed the distribution, orientation, and genetic context of transposon insertions. There are 480 essential ORFs and 3240 non-essential ORFs. In addition, there were 156 ORFs that severely impacted fitness when mutated. The 8-bp resolution allowed a dissection of the essential and non-essential regions of the coding sequences. Sixty ORFs had transposon insertions within a significant portion of their 3′ region but lacked insertions in the essential 5′ coding region, allowing the identification of non-essential protein segments. For example, transposon insertions in the essential cell-cycle regulatory gene divL, a tyrosine kinase, showed that the last 204 C-terminal amino acids did not impact viability, confirming previous reports that the C-terminal ATPase domain of DivL is dispensable for viability (Reisinger et al, 2007; Iniesta et al, 2010). In addition, we found that 30 out of 480 (6.3%) of the essential ORFs appear to be shorter than the annotated ORF, suggesting that these are probably mis-annotated.
Among the 480 ORFs essential for growth on rich media, there were 10 essential transcriptional regulatory proteins, including 5 previously identified cell-cycle regulators (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010) and 5 uncharacterized predicted transcription factors. In addition, two RNA polymerase sigma factors RpoH and RpoD, as well as the anti-sigma factor ChrR, which mitigates rpoE-dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were also found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor are the core essential transcriptional regulators for growth on rich media. To further characterize the core components of the Caulobacter cell-cycle control network, we identified all essential regulatory sequences and operon transcripts. Altogether, the 480 essential protein-coding and 37 essential RNA-coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression. Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007).
The essential genome features are non-uniformly distributed on the Caulobacter genome and enriched near the origin and the terminus regions. In contrast, the chromosomal positions of the published E. coli essential coding sequences (Rocha, 2004) are preferentially located at either side of the origin (Figure 4A). This indicates that there are selective pressures on chromosomal positioning of some essential elements (Figure 4A).
The strategy described in this report could be readily extended to quickly determine the essential genome for a large class of bacterial species.
Caulobacter crescentus is a model organism for the integrated circuitry that runs a bacterial cell cycle. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of a bacterial cell. Using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing, we determined the essential Caulobacter genome at 8 bp resolution, including 1012 essential genome features: 480 ORFs, 402 regulatory sequences and 130 non-coding elements, including 90 intergenic segments of unknown function. The essential transcriptional circuitry for growth on rich media includes 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor. We identified all essential promoter elements for the cell cycle-regulated genes. The essential elements are preferentially positioned near the origin and terminus of the chromosome. The high-resolution strategy used here is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
PMCID: PMC3202797  PMID: 21878915
functional genomics; next-generation sequencing; systems biology; transposon mutagenesis
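The counts quoted in entry 10 can be cross-checked with a few lines of arithmetic. The sketch below only recomputes figures stated in the abstract; the genome length is rounded to the "4-Mbp" given there, which is an assumption of this example, so the mean insertion spacing is approximate.

```python
# Counts taken from the abstract of entry 10.
TOTAL_ORFS = 3876
orf_classes = {"essential": 480, "non_essential": 3240, "fitness_impaired": 156}
essential_features = {"ORFs": 480, "regulatory sequences": 402, "non-coding elements": 130}


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


orf_percentages = {name: percent(count, TOTAL_ORFS) for name, count in orf_classes.items()}
total_essential_features = sum(essential_features.values())

# Assumption: genome length rounded to the "4-Mbp" quoted in the abstract.
GENOME_BP = 4_000_000
INSERTIONS = 428_735
mean_spacing_bp = GENOME_BP / INSERTIONS  # average distance between mapped insertions

print(orf_percentages)
print(total_essential_features)
print(round(mean_spacing_bp, 1))
```

The three ORF classes add up to the 3876 annotated ORFs, and the three feature classes add up to the 1012 essential genome features reported.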
11.  The adjacent positioning of co-regulated gene pairs is widely conserved across eukaryotes 
BMC Genomics  2012;13:546.
Coordinated cell growth and development requires that cells regulate the expression of large sets of genes in an appropriate manner, and one of the most complex and metabolically demanding pathways that cells must manage is that of ribosome biogenesis. Ribosome biosynthesis depends upon the activity of hundreds of gene products, and it is subject to extensive regulation in response to changing cellular conditions. We previously described an unusual property of the genes that are involved in ribosome biogenesis in yeast; a significant fraction of the genes exist on the chromosomes as immediately adjacent gene pairs. The incidence of gene pairing can be as high as 24% in some species, and the gene pairs are found in all of the possible tandem, divergent, and convergent orientations.
We investigated co-regulated gene sets in S. cerevisiae beyond those related to ribosome biogenesis, and found that a number of these regulons, including those involved in DNA metabolism, heat shock, and the response to cellular stressors were also significantly enriched for adjacent gene pairs. We found that as a whole, adjacent gene pairs were more tightly co-regulated than unpaired genes, and that the specific gene pairing relationships that were most widely conserved across divergent fungal lineages were correlated with those genes that exhibited the highest levels of transcription. Finally, we investigated the gene positions of ribosome related genes across a widely divergent set of eukaryotes, and found a significant level of adjacent gene pairing well beyond yeast species.
While it has long been understood that there are connections between genomic organization and transcriptional regulation, this study reveals that the strategy of organizing genes from related, co-regulated pathways into pairs of immediately adjacent genes is widespread, evolutionarily conserved, and functionally significant.
PMCID: PMC3500266  PMID: 23051624
12.  Complex Loci in Human and Mouse Genomes 
PLoS Genetics  2006;2(4):e47.
Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis–antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis–antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis–antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis–antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis–antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis–antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.
In the traditional view, most genes occupy their own distinct territory in mammalian genomes. However, it has become apparent that many genes are in fact located in complex regions (complex loci) where they share territory with other genes by utilizing opposite strands of DNA. Such genes either share regions expressed as mRNA (i.e., form cis–antisense pairs) or start from a genome region (called a bidirectional promoter) at which transcription can initiate in both directions along the DNA. In this paper, researchers present one of the most comprehensive censuses of complex loci to date and investigate their general properties and human–mouse differences to discover the rules of this type of gene organization and its effect on gene regulation. They found about 25% of known human and mouse genes to be in cis–antisense pairs, and estimate the total fraction to be over 40%. At bidirectional promoters, they demonstrated the existence of mirror DNA sequence composition related to the promoters' ability to initiate transcription in two directions. The researchers found over 2,000 “chains”—complex arrangements where three or more genes are coupled by cis–antisense pairing and/or bidirectional promoters; among them are many genes whose products control the expression of other genes.
PMCID: PMC1449890  PMID: 16683030
13.  Whole-Genome Cartography of Estrogen Receptor α Binding Sites 
PLoS Genetics  2007;3(6):e87.
Using a chromatin immunoprecipitation-paired end diTag cloning and sequencing strategy, we mapped estrogen receptor α (ERα) binding sites in MCF-7 breast cancer cells. We identified 1,234 high confidence binding clusters of which 94% are projected to be bona fide ERα binding regions. Only 5% of the mapped estrogen receptor binding sites are located within 5 kb upstream of the transcriptional start sites of adjacent genes, regions containing the proximal promoters, whereas the vast majority of the sites are mapped to intronic or distal locations (>5 kb from 5′ and 3′ ends of adjacent transcript), suggesting transcriptional regulatory mechanisms over significant physical distances. Of all the identified sites, 71% harbored putative full estrogen response elements (EREs), 25% bore ERE half sites, and only 4% had no recognizable ERE sequences. Genes in the vicinity of ERα binding sites were enriched for regulation by estradiol in MCF-7 cells, and their expression profiles in patient samples segregate ERα-positive from ERα-negative breast tumors. The expression dynamics of the genes adjacent to ERα binding sites suggest a direct induction of gene expression through binding to ERE-like sequences, whereas transcriptional repression by ERα appears to be through indirect mechanisms. Our analysis also indicates a number of candidate transcription factor binding sites adjacent to occupied EREs at frequencies much greater than by chance, including the previously reported FOXA1 sites, and demonstrates the potential involvement of one such putative adjacent factor, Sp1, in the global regulation of ERα target genes. Unexpectedly, we found that only 22%–24% of the bona fide human ERα binding sites were overlapping conserved regions in whole genome vertebrate alignments, which suggests limited conservation of functional binding sites. Taken together, this genome-scale analysis suggests complex but definable rules governing ERα binding and gene regulation.
Author Summary
Estrogen receptors (ERs) play key roles in facilitating the transcriptional effects of hormone functions in target tissues. To obtain a genome-wide view of ERα binding sites, we applied chromatin immunoprecipitation coupled with a cloning and sequencing strategy using chromatin immunoprecipitation pair end-tagging technology to map ERα binding sites in MCF-7 human breast cancer cells. We identified 1,234 high quality ERα binding sites in the human genome and demonstrated that the binding sites are frequently adjacent to genes significantly associated with breast cancer disease status and outcome. The mapping results also revealed that ERα can influence gene expression across distances of up to 100 kilobases or more, that genes that are induced or repressed utilize sites in different regions relative to the transcript (suggesting different mechanisms of action), and that ERα binding sites are only modestly conserved in evolution. Using computational approaches, we identified potential interactions with other transcription factor binding sites adjacent to the ERα binding elements. Taken together, these findings suggest complex but definable rules governing ERα binding and gene regulation and provide a valuable dataset for mapping the precise control nodes for one of the most important nuclear hormone receptors in breast cancer biology.
PMCID: PMC1885282  PMID: 17542648
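Entry 13 separates binding sites within 5 kb upstream of a transcription start site from intronic and distal sites. A strand-aware version of that distinction might look like the sketch below. The function name, the three labels and the single "intragenic" class (which does not distinguish introns from exons) are simplifications assumed here, not the study's actual annotation procedure.

```python
def classify_site(site: int, tss: int, tes: int, strand: str, window: int = 5000) -> str:
    """Label a binding-site position relative to one gene.

    `tss` and `tes` are the transcript start and end coordinates; on the minus
    strand the start has the larger coordinate. All names are illustrative.
    """
    sign = 1 if strand == "+" else -1
    offset = (site - tss) * sign   # negative values lie upstream of the TSS
    gene_len = (tes - tss) * sign  # positive for a well-formed transcript
    if -window <= offset < 0:
        return "proximal_upstream"  # within `window` bp upstream of the TSS
    if 0 <= offset <= gene_len:
        return "intragenic"         # inside the transcribed region
    return "distal"                 # anywhere else


print(classify_site(9_000, 10_000, 20_000, "+"))   # 1 kb upstream on the plus strand
print(classify_site(15_000, 10_000, 20_000, "+"))  # inside the transcript
print(classify_site(21_000, 20_000, 10_000, "-"))  # 1 kb upstream on the minus strand
```

Multiplying by `sign` lets one comparison handle both strands, so "upstream" always means a negative offset.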
14.  The ets-Related Transcription Factor GABP Directs Bidirectional Transcription 
PLoS Genetics  2007;3(11):e208.
Approximately 10% of genes in the human genome are distributed such that their transcription start sites are located less than 1 kb apart on opposite strands. These divergent gene pairs have a single intergenic segment of DNA, which in some cases appears to share regulatory elements, but it is unclear whether these regions represent functional bidirectional promoters or two overlapping promoters. A recent study showed that divergent promoters are enriched for consensus binding sequences of a small group of transcription factors, including the ubiquitous ets-family transcription factor GA-binding protein (GABP). Here we show that GABP binds to more than 80% of divergent promoters in at least one cell type. Furthermore, we demonstrate that GABP binding is correlated and associated with bidirectional transcriptional activity in a luciferase transfection assay. In addition, we find that the addition of a strict consensus GABP site into a set of promoters that normally function in only one direction significantly increases activity in the opposite direction in 67% of cases. Our findings demonstrate that GABP regulates the majority of divergent promoters and suggest that bidirectional transcriptional activity is mediated through GABP binding and transactivation at both divergent and nondivergent promoters.
Author Summary
Surveys of the locations of genes in the human genome have revealed that a surprising number of genes, greater than 10%, have transcription start sites within 1 kb of one another on opposite strands. These divergent gene pairs, sometimes referred to as bidirectional genes, are common in organisms such as bacteria and yeast, but it is unknown why such an arrangement exists in large, mammalian genomes. Recently, it has become apparent that the promoters of these divergent genes are regulated by a subset of transcription factors, and we have focused on one of these, GA-binding protein (GABP). We find that it regulates a large number of human genes, including the majority of divergent genes, and that its binding is associated with, correlated with, and sufficient for bidirectional transcriptional activity. Although clearly GABP is a major regulator of divergent genes, which carry out a variety of roles critical for the function and survival of the cell, these data also propose novel roles for GABP as a transcription factor. For example, the ability of GABP to promote bidirectional transcription may prove to be biologically relevant in generating many of the transcripts that have been observed outside of protein coding genes.
PMCID: PMC2077898  PMID: 18020712
15.  Evolution and Selection in Yeast Promoters: Analyzing the Combined Effect of Diverse Transcription Factor Binding Sites 
In comparative genomics, evolutionarily related species are analyzed jointly in order to identify conserved and diverged sequences and to infer their function. While such studies have enabled the detection of conserved sequences in large genomes, the evolutionary dynamics of regulatory regions as a whole remain poorly understood. Here we present a probabilistic model for the evolution of promoter regions in yeast, combining the effects of regulatory interactions of many different transcription factors. The model expresses explicitly the selection forces acting on transcription factor binding sites in the context of a dynamic evolutionary process. We develop algorithms to compute likelihood and to learn de novo collections of transcription factor binding motifs and their selection parameters from alignments. Using the new techniques, we examine the evolutionary dynamics in Saccharomyces species promoters. Analyses of an evolutionary model constructed using all known transcription factor binding motifs and of a model learned from the data automatically reveal relatively weak selection on most binding sites. Moreover, according to our estimates, strong binding sites are constraining only a fraction of the yeast promoter sequence that is under selection. Our study demonstrates how complex evolutionary dynamics in noncoding regions emerges from formalization of the evolutionary consequences of known regulatory mechanisms.
Author Summary
Cells use sophisticated regulation to transform static genomic information into flexible function. We are still far from understanding how such regulation evolves. Short DNA sequences that physically bind transcription factors in promoter areas near target genes play an important role in gene regulation and are directly subject to mutation and selection. In this work, we develop a methodology for studying the evolution of promoter sequences under the effect of multiple regulatory interactions. We present a model that describes the evolutionary process at each genomic locus, taking into account a random flux of mutations that occur in it and the effects of gain or loss of transcription factor binding sites. Our model accounts for dependencies (or epistasis) between adjacent loci that contribute to the same regulatory interactions: mutation in one such locus immediately changes the effect of mutations in the other. Using our model, we characterize the evolution of promoters in yeast, showing that many regulatory interactions that were discovered experimentally or computationally are evolutionarily unstable. The dynamic nature of transcriptional interactions may be explained if the regulatory phenotype is achieved through multiple interactions at different levels of specificity, and if only relatively few of these interactions are essential on their own.
PMCID: PMC2186363  PMID: 18193940
16.  The determinants of gene order conservation in yeasts 
Genome Biology  2007;8(11):R233.
Current intergene distance is shown to be consistently the strongest predictor of synteny conservation as expected under a simple null model, and other variables are of lesser importance.
Why do some groups of physically linked genes stay linked over long evolutionary periods? Although several factors are associated with the formation of gene clusters in eukaryotic genomes, the particular contribution of each feature to clustering maintenance remains unclear.
We quantify the strength of the proposed factors in a yeast lineage. First we identify the magnitude of each variable to determine linkage conservation by using several comparator species at different distances to Saccharomyces cerevisiae. For adjacent gene pairs, in line with null simulations, intergenic distance acts as the strongest covariate. Which of the other covariates appear important depends on the comparator, although high co-expression is related to synteny conservation commonly, especially in the more distant comparisons, these being expected to reveal strong but relatively rare selection. We also analyze those pairs that are immediate neighbors through all the lineages considered. Current intergene distance is again the best predictor, followed by the local density of essential genes and co-regulation, with co-expression and recombination rate being the weakest predictors. The genome duplication seen in yeast leaves some mark on linkage conservation, as adjacent pairs resolved as single copy in all post-whole genome duplication species are more often found as adjacent in pre-duplication species.
Current intergene distance is consistently the strongest predictor of synteny conservation as expected under a simple null model. Other variables are of lesser importance and their relevance depends both on the species comparison in question and the fate of the duplicates following genome duplication.
PMCID: PMC2258174  PMID: 17983469
17.  Multiple tandemly repeated binding sites for cellular nuclear factor 1 that surround the major immediate-early promoters of simian and human cytomegalovirus. 
Journal of Virology  1987;61(5):1559-1570.
We show that the large DNA genomes of human and simian cytomegaloviruses (HCMV and SCMV, respectively) each contain multiple binding sites for purified cellular nuclear factor 1 (NF1) protein. Examination of the major immediate-early (IE) gene region in the HindIII H fragment of SCMV (Colburn) by filter binding assays showed that it competed 45-fold better than the single adenovirus type 2 binding site for NF1 protein and that it contained at least two distinct binding loci. Direct DNase I footprinting analyses of the 5' upstream locus detected at least 20 adjacent NF1-binding sites located between positions -600 and -1300 relative to the IE94 mRNA start site. DNA sequence analysis of the region revealed a conserved consensus NF1 recognition element (T)TGG(C/A)N5GCCAA embedded within each of 23 highly diverged 30-base-pair tandem repeats, together with a second downstream cluster of five consensus NF1-binding sites between positions +470 and +570 in the large first intron. Two separate NF1-binding loci were also found in the equivalent IE68 gene of HCMV(Towne) DNA, but in this case the DNA sequence and competition filter binding experiments indicated a maximum of only four to five consensus binding sites encompassing the promoter-enhancer region. In transient expression assays, neither the isolated upstream IE94 tandem repeats nor a synthetic single-copy consensus NF1-binding site acted as transcriptional cis activators or enhancers when placed adjacent to the simian virus 40 minimal early region promoter. We conclude that the large and complex 5' upstream promoter-regulatory region for the SCMV IE94 gene comprises two distinct domains. 
The previously described four sets of 13- to 18-base-pair interspersed repeat elements between -55 and -580 provide most of the high basal transcriptional strength, whereas the arrangement of further upstream tandemly repeated NF1-binding sites may contribute significantly to the expanded biological host range for expression of SCMV IE94 compared with HCMV IE68.
PMCID: PMC254136  PMID: 3033283
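The consensus NF1 recognition element quoted in entry 17, (T)TGG(C/A)N5GCCAA, translates directly into a regular expression. The sketch below scans only the strand it is given and reports whether the optional leading T is present. The function name and the test sequence are made up for illustration, and this is not the footprinting or sequence analysis used in the paper.

```python
import re

# Core of the consensus: TGG, then C or A, then any five bases, then GCCAA.
# The lookahead lets overlapping sites be reported as separate hits.
NF1_CORE = re.compile(r"(?=(TGG[CA][ACGT]{5}GCCAA))")


def find_nf1_sites(seq: str) -> list[tuple[int, str, bool]]:
    """Return (0-based start, matched core, has_leading_T) for each consensus hit."""
    seq = seq.upper()
    hits = []
    for match in NF1_CORE.finditer(seq):
        start = match.start()
        has_leading_t = start > 0 and seq[start - 1] == "T"  # the optional (T)
        hits.append((start, match.group(1), has_leading_t))
    return hits


# Made-up sequence containing two consensus sites.
example = "ACGT" + "TTGGCACGTAGCCAA" + "GGG" + "TGGAACGTAGCCAA"
for hit in find_nf1_sites(example):
    print(hit)
```

To cover both strands, the same function could be run on the reverse complement of the input.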
18.  Histone modification pattern evolution after yeast gene duplication 
Gene duplication and subsequent functional divergence, especially expression divergence, have been widely considered the main sources of evolutionary innovation. Many studies have shown that genetic regulatory networks evolve rapidly shortly after gene duplication, leading to accelerated expression divergence and diversification. However, little is known about whether epigenetic factors have mediated the evolution of expression regulation since gene duplication. In this study, we conducted detailed analyses of yeast histone modification (HM), the major type of epigenetic mark in this organism, as well as other available functional genomics data, to address this issue.
Duplicate genes, on average, share more common HM-code patterns than random singleton pairs in their promoters and open reading frames (ORFs). Although HM-code divergence between duplicates in both promoter and ORF regions increases with their sequence divergence, the HM-code in the ORF region evolves more slowly than that in the promoter region, probably owing to the functional constraints imposed on protein sequences. After excluding the confounding effect of sequence divergence (or evolutionary time), we found evidence supporting the notion that in yeast the HM-code may co-evolve with cis- and trans-regulatory factors. Moreover, we observed that deletion of some yeast HM-related enzymes increases the expression divergence between duplicate genes, yet the effect is smaller than that of transcription factor (TF) deletion or environmental stress.
Our analyses demonstrate that after gene duplication, the histone modification profiles of yeast duplicates diverge with evolutionary time, as genetic regulatory elements do. Moreover, we found evidence of co-evolution between genetic and epigenetic elements since gene duplication, which together contribute to the expression divergence between duplicate genes.
PMCID: PMC3495647  PMID: 22776110
Histone modification; Histone modification code divergence; Gene duplication; Expression divergence; Epigenetic divergence; cis-regulation; trans-regulation
19.  Sepsid even-skipped Enhancers Are Functionally Conserved in Drosophila Despite Lack of Sequence Conservation 
PLoS Genetics  2008;4(6):e1000106.
The gene expression pattern specified by an animal regulatory sequence is generally viewed as arising from the particular arrangement of transcription factor binding sites it contains. However, we demonstrate here that regulatory sequences whose binding sites have been almost completely rearranged can still produce identical outputs. We sequenced the even-skipped locus from six species of scavenger flies (Sepsidae) that are highly diverged from the model species Drosophila melanogaster, but share its basic patterns of developmental gene expression. Although there is little sequence similarity between the sepsid eve enhancers and their well-characterized D. melanogaster counterparts, the sepsid and Drosophila enhancers drive nearly identical expression patterns in transgenic D. melanogaster embryos. We conclude that the molecular machinery that connects regulatory sequences to the transcription apparatus is more flexible than previously appreciated. In exploring this diverse collection of sequences to identify the shared features that account for their similar functions, we found a small number of short (20–30 bp) sequences nearly perfectly conserved among the species. These highly conserved sequences are strongly enriched for pairs of overlapping or adjacent binding sites. Together, these observations suggest that the local arrangement of binding sites relative to each other is more important than their overall arrangement into larger units of cis-regulatory function.
Author Summary
The transformation of a fertilized egg into a complex, multicellular organism is a carefully choreographed process in which thousands of genes are turned on and off in specific spatial and temporal patterns that confer distinct physical properties and behaviors on emerging cells and tissues. To understand how an organism's genome specifies its form and function, it is therefore necessary to understand how patterns of gene expression are encoded in DNA. Decades of analysis of the fruit fly Drosophila melanogaster have identified numerous regulatory sequences, but have not fully illuminated how they work. Here we harness the record of natural selection to probe the function of these sequences. We identified regulatory sequences from scavenger fly species that diverged from Drosophila over 100 million years ago. While these regulatory sequences are almost completely different from their Drosophila counterparts, they drive identical expression patterns in Drosophila embryos, demonstrating extreme flexibility in the molecular machines that interpret regulatory DNA. Yet, the identical outputs produced by these sequences mean they must have something in common, and we describe one shared feature of regulatory sequence organization and function that has emerged from these comparisons. Our approach can be generalized to any regulatory system and species, and we believe that a growing collection of regulatory sequences with dissimilar sequences but similar outputs will reveal the molecular logic of gene regulation.
PMCID: PMC2430619  PMID: 18584029
20.  Support for multiple classes of local expression clusters in Drosophila melanogaster, but no evidence for gene order conservation 
Genome Biology  2011;12(3):R23.
Gene order in eukaryotic genomes is not random, with genes with similar expression profiles tending to cluster. In yeasts, the model taxon for gene order analysis, such syntenic clusters of non-homologous genes tend to be conserved over evolutionary time. Whether similar clusters show gene order conservation in other lineages is, however, undecided. Here, we examine this issue in Drosophila melanogaster using high-resolution chromosome rearrangement data.
We show that D. melanogaster has at least three classes of expression clusters: first, as observed in mammals, large clusters of functionally unrelated housekeeping genes; second, small clusters of functionally related highly co-expressed genes; and finally, as previously defined by Spellman and Rubin, larger domains of co-expressed but functionally unrelated genes. The latter are, however, not independent of the small co-expression clusters and likely reflect a methodological artifact. While the small co-expression and housekeeping/essential gene clusters resemble those observed in yeast, in contrast to yeast, we see no evidence that any of the three cluster types are preserved as synteny blocks. If anything, adjacent co-expressed genes are more likely to become rearranged than expected. Again in contrast to yeast, in D. melanogaster, gene pairs with short intergene distance or in divergent orientations tend to have higher rearrangement rates. These findings are consistent with co-expression being partly due to shared chromatin environment.
We conclude that, while similar in terms of cluster types, gene order evolution has strikingly different patterns in yeasts and in D. melanogaster, although recombination is associated with gene order rearrangement in both.
PMCID: PMC3129673  PMID: 21414197
21.  Widespread promoter-mediated coordination of transcription and mRNA degradation 
Genome Biology  2012;13(12):R114.
Previous work showed that mRNA degradation is coordinated with transcription in yeast, and in several genes the control of mRNA degradation was linked to promoter elements through two different mechanisms. Here we show at the genomic scale that the coordination of transcription and mRNA degradation is promoter-dependent in yeast and is also observed in humans.
We first demonstrate that swapping upstream cis-regulatory sequences between two yeast species affects both transcription and mRNA degradation and suggest that while some cis-regulatory elements control either transcription or degradation, multiple other elements enhance both processes. Second, we show that adjacent yeast genes that share a promoter (through divergent orientation) have increased similarity in their patterns of mRNA degradation, providing independent evidence for the promoter-mediated coupling of transcription to mRNA degradation. Finally, analysis of the differences in mRNA degradation rates between mammalian cell types or mammalian species suggests a similar coordination between transcription and mRNA degradation in humans.
Our results extend previous studies and suggest a pervasive promoter-mediated coordination between transcription and mRNA degradation in yeast. The diverse genes and regulatory elements associated with this coordination suggest that it is generated by a global mechanism of gene regulation and modulated by gene-specific mechanisms. The observation of a similar coupling in mammals raises the possibility that coupling of transcription and mRNA degradation may reflect an evolutionarily conserved phenomenon in gene regulation.
PMCID: PMC4056365  PMID: 23237624
22.  Co-regulated expression of HAND2 and DEIN by a bidirectional promoter with asymmetrical activity in neuroblastoma 
BMC Molecular Biology  2009;10:28.
HAND2, a key regulator for the development of the sympathetic nervous system, is located on chromosome 4q33 in a head-to-head orientation with DEIN, a recently identified novel gene with stage specific expression in primary neuroblastoma (NB). Both genes are expressed in primary NB as well as most NB cell lines and are separated by a genomic sequence of 228 bp. The similar expression profile of both genes suggests a common transcriptional regulation mediated by a bidirectional promoter.
Northern blot analysis of DEIN and HAND2 in 20 primary NBs indicated concurrent expression levels of the two genes, which was confirmed by microarray analysis of 236 primary NBs (Pearson's correlation coefficient r = 0.65). While DEIN expression in the latter cohort was associated with stage 4S (p = 0.02), HAND2 expression was not associated with tumor stage. In contrast, both HAND2 and DEIN transcript levels were highly associated with age at diagnosis <12 months (p = 0.001). The intergenic region shows substantial homology in different species (89%, 72% and 53% identity between human and mouse, chicken and zebrafish, respectively) and contains many highly conserved putative transcription factor binding sites. Using luciferase reporter gene constructs, asymmetrical bidirectional promoter activity was found in four NB cell lines: in the DEIN orientation, an average 3.4-fold increase in activity was observed compared with the promoterless vector, whereas an average 15.4-fold activation was detected in the HAND2 orientation. Two highly conserved putative regulatory elements, one of which was previously shown to enhance HAND2 expression in branchial arches, displayed weak repressor activity for both genes.
HAND2 and DEIN represent a gene pair that is tightly linked by a bidirectional promoter in an evolutionarily highly conserved manner. Expression of both genes in NB is co-regulated by asymmetrical activity of this promoter and modulated by the activity of two cis-regulatory elements acting as weak repressors. The concurrent quantitative and tissue-specific expression of HAND2 and DEIN suggests a functional link between the two genes.
PMCID: PMC2670301  PMID: 19348682
23.  Conservation of histone H2A/H2B intergene regions: a role for the H2B specific element in divergent transcription. 
Nucleic Acids Research  1988;16(17):8571-8586.
The organization and function of potential regulatory elements associated with the promoters of chicken H2A and H2B gene pairs have been examined. The intergene regions of six dispersed and divergently-transcribed H2A/H2B gene pairs contain several extremely well conserved and spaced blocks of sequence homology. Adjacent coding regions are on average 342 base-pairs apart. Respective TATA boxes are separated by 180 base-pairs and within this confined region there are four CCAAT boxes and a previously identified 13 base-pair H2B-specific element (H2B-box) which has homology to the octamer motif present in a number of gene promoter/enhancer elements. Transcription of H2A and H2B genes from wild-type and mutant constructs was measured in transient assays by transfection into HeLa cells, and in permanently transformed clonal cell lines. In vitro separation of the two genes at a unique intergenic site significantly decreased transcription of each gene. This suggested that the H2A/H2B gene pairs contained overlapping promoters. Deletion or point mutagenesis of the H2B-specific element decreased the levels of the H2B and H2A transcripts, indicating that this sequence is a common regulatory element of both genes in the divergent-pair configuration.
PMCID: PMC338577  PMID: 3267232
24.  Patterns of sequence conservation in presynaptic neural genes 
Genome Biology  2006;7(11):R105.
Comparative sequence analysis and annotation of genomic regions surrounding 150 presynaptic genes identified over 26,000 elements highly conserved in eight vertebrate species; these results are made available in the SynapseDB database.
The neuronal synapse is a fundamental functional unit in the central nervous system of animals. Because synaptic function is evolutionarily conserved, we reasoned that functional sequences of genes and related genomic elements known to play important roles in neurotransmitter release would also be conserved.
Evolutionary rate analysis revealed that presynaptic proteins evolve slowly, although some members of large gene families exhibit accelerated evolutionary rates relative to other family members. Comparative sequence analysis of 46 megabases spanning 150 presynaptic genes identified more than 26,000 elements that are highly conserved in eight vertebrate species, as well as a small subset of sequences (6%) that are shared among unrelated presynaptic genes. Analysis of large gene families revealed that upstream and intronic regions of closely related family members are extremely divergent. We also identified 504 exceptionally long conserved elements (≥360 base pairs, ≥80% pair-wise identity between human and other mammals) in intergenic and intronic regions of presynaptic genes. Many of these elements form a highly stable stem-loop RNA structure and consequently are candidates for novel regulatory elements, whereas some conserved noncoding elements are shown to correlate with specific gene expression profiles. The SynapseDB online database integrates these findings and other functional genomic resources for synaptic genes.
Highly conserved elements in nonprotein coding regions of 150 presynaptic genes represent sequences that may be involved in the transcriptional or post-transcriptional regulation of these genes. Furthermore, comparative sequence analysis will facilitate selection of genes and noncoding sequences for future functional studies and analysis of variation studies in neurodevelopmental and psychiatric disorders.
PMCID: PMC1794582  PMID: 17096848
25.  Sorting out inherent features of head-to-head gene pairs by evolutionary conservation 
BMC Bioinformatics  2010;11(Suppl 11):S16.
A ‘head-to-head’ (h2h) gene pair is defined as a genomic locus in which two adjacent genes are divergently transcribed from opposite strands of DNA. In our previous work, this gene organization was found to be ancient and conserved, which subjects functionally related genes to transcriptional co-regulation. However, some of the biological features of h2h pairs still need further clarification.
In this work, we sorted human h2h pairs into four successively inclusive sets of gradually increasing conservation, and examined whether previously reported features were retained or sharpened in the more conserved sets, in order to identify the inherent features of the h2h gene organization. We examined TSS distance, expression correlation within h2h pairs and among h2h genes, transcription factor association, and functional similarity of h2h genes. Our conservation-based analyses found that the bidirectional promoters of h2h gene pairs are most likely shorter than 100 bp; that h2h gene pairs generally show significant positive, but not negative, expression correlation; that remarkably high positive expression correlations exist among h2h genes, in addition to those between h2h pairs observed in our previous study; and that h2h paired genes tend to share transcription factors. In addition, the expression correlation of h2h pairs is positively related to TF sharing and functional coordination, but not to TSS distance.
Our findings resolve uncertainties about the TSS distance, expression correlation and functional coordination of h2h genes, and provide insights into the molecular mechanisms and functional consequences of transcriptional regulation based on this special gene organization.
PMCID: PMC3024869  PMID: 21172051
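The head-to-head definition given in item 25 (two adjacent genes divergently transcribed from opposite strands) can be illustrated as a simple filter over gene annotations. The following is only a minimal sketch and not the method used in any of the listed studies: the `Gene` record, the `head_to_head_pairs` function, and the example coordinates are hypothetical, adjacency is approximated by sorting on TSS position, and the 1000 bp default cutoff is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record used only for this illustration."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def head_to_head_pairs(genes, max_tss_distance=1000):
    """Return (minus_gene, plus_gene, tss_distance) for divergent neighbours.

    A pair qualifies when, among TSS-sorted genes on one chromosome, a
    minus-strand gene is immediately followed by a plus-strand gene, so the
    two are transcribed away from each other. The distance cutoff is an
    assumed parameter, not a value taken from the abstracts.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        ordered = sorted(chrom_genes, key=lambda g: g.tss)
        for left, right in zip(ordered, ordered[1:]):
            distance = right.tss - left.tss
            divergent = left.strand == "-" and right.strand == "+"
            if divergent and distance <= max_tss_distance:
                pairs.append((left.name, right.name, distance))
    return pairs


# Made-up coordinates for illustration only.
example_genes = [
    Gene("geneA", "chr1", "-", 1000),
    Gene("geneB", "chr1", "+", 1405),   # divergent from geneA, 405 bp apart
    Gene("geneC", "chr1", "-", 1900),   # convergent with geneB, not head-to-head
    Gene("geneD", "chr2", "-", 200),
    Gene("geneE", "chr2", "+", 5200),   # divergent from geneD but too far apart
]
example_pairs = head_to_head_pairs(example_genes)
```

Sorting by TSS is a simplification: overlapping or nested genes would need full start and end coordinates to decide which genes are truly adjacent.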
