Related Articles

1.  Defining the budding yeast chromatin-associated interactome 
We report here the first large-scale affinity purification and mass spectrometry (AP-MS) study of chromatin-associated proteins, in which over 100 different baits involved in chromatin biology were studied by modified chromatin immunopurification (mChIP)-MS. In particular, focus was placed on poorly studied chromatin binding proteins, such as transcription factors, which have been underrepresented in previous AP-MS studies. mChIP-MS analysis of transcription factors identified dense networks of proteins associated with chromatin that were composed of specific transcriptional co-activators, information not accessible through the use of classical AP-MS methods. Finally, we demonstrate that novel protein–protein interactions identified in this study by mChIP have functional implications, exemplified by the detailed study of both the ubiquitination of the proline isomerase Cpr1 and of histone chaperones involved in the regulation of the HTA1-HTB1 promoter. Our work demonstrates the value of targeted interactome studies, in which affinity purification methods are adapted to the needs of specific baits, as is the case for chromatin binding proteins.
The maintenance of cellular fitness requires living organisms to integrate multiple signals into coordinated outputs. Central to this process is the regulation of the expression of the genetic information encoded into DNA. As a result, there are numerous constraints imposed on gene expression. The access to DNA is restricted by the formation of nucleosomes, in which DNA is wrapped around histone octamers to form chromatin wherein the volume of DNA is considerably reduced. As such, nucleosome positioning is critical and must be defined precisely, particularly during transcription (Workman, 2006). Furthermore, nucleosomes can be actively assembled/disassembled by histone chaperones and can be made to ‘slide' along DNA by the actions of chromatin remodelers. Moreover, the histone proteins are heavily regulated at the expression level and by extensive post-translational modifications (PTMs) (Campos and Reinberg, 2009). Histone PTMs have also been shown to help recruit numerous chromatin-associated factors in accordance with the histone code (Strahl and Allis, 2000). Although our understanding of chromatin and its roles has improved, we still have limited knowledge of the chromatin-associated protein complexes and their interactions.
The characterization of biological systems and of specific subdomains within them, such as chromatin, remains a difficult task. An efficient approach to gain insight into the function of a protein is to define its interactome. The underlying principle of protein interaction mapping is that proteins found to interact must be involved in common processes and localization, i.e., guilt by association. The large-scale mapping of protein interactions makes it possible to annotate proteins of unknown function, implicate proteins of known function in different processes and derive new hypotheses. This is possible because most proteins do not act in isolation but rather as part of complexes, and thus possess interaction partners that can now be detected with the right tools. AP-MS has emerged as a powerful tool for characterizing protein–protein interactions and biological systems in general (Gingras et al, 2007; Gstaiger and Aebersold, 2009).
Recently, we reported the development of a novel affinity purification approach termed mChIP, which was designed to improve the characterization of the interactome of DNA binding proteins (Lambert et al, 2009). The mChIP method consists of a single affinity purification step, whereby chromatin-associated proteins are isolated from mildly sonicated and gently clarified cellular extracts using magnetic beads coated with antibodies (Lambert et al, 2009; Figure 1A). As such, the mChIP approach maintains chromatin fragments in solution, enabling their specific purification, something not previously possible in classical AP-MS methods (Lambert et al, 2009).
In this study, we report the utilization of mChIP followed by MS for the characterization of more than 100 proteins and their associated protein networks (Figure 1B). We initially focused on DNA-associated proteins that had been poorly characterized in past AP-MS studies, such as transcription factors. In addition, many histone modifiers, such as lysine acetyl transferases (KAT) and lysine methyl transferases, critical components of chromatin function and regulation, were also studied by mChIP. This resulted in raw non-redundant mChIP-MS data containing ∼9000 protein–protein interactions between ∼900 proteins. Following a two-step curation process designed to remove common contaminants and proteins not specifically associated with the baits under study, a high confidence mChIP-MS data set was produced containing 2966 protein–protein interactions between 724 proteins (Figure 1B). It is important to note that our curation strategy was capable of maintaining the majority of the protein–protein interactions identified in previous AP-MS studies, while removing the bulk of protein–protein interactions not related to chromatin biology. Further analysis of the mChIP-MS data set revealed that for most baits tested, mChIP-MS resulted in the identification of more interaction partners than classical TAP-MS.
Visualization of the mChIP-MS data set was achieved by generating heat maps from two-dimensional hierarchical clustering of the bait–prey interactions. This revealed numerous clusters within our data set supporting functional relationships. For instance, mChIP analysis of the highly homologous heat-shock-inducible transcription factors Msn2 and Msn4 clustered with different transcriptional co-activators. Importantly, our analysis also revealed key differences in the co-activators associated with Msn2 and Msn4 relevant to their function. Another example that we explore in greater detail is the Cpr1 proline isomerase, a known member of the Set3 complex (Pijnappel et al, 2001). mChIP-MS analysis of Cpr1 revealed an extended network of associated proteins, including the E3 ubiquitin ligase Bre1 and its association partner Lge1 (Figure 5A). This association raised the possibility of a direct action of Bre1/Lge1 on Cpr1 to ubiquitinate it. In targeted experiments, we observed that Cpr1 is in fact ubiquitinated in a process involving Bre1/Lge1 (Figure 5E), confirming their functional relationship. As such, mChIP is capable of uncovering novel protein–protein interactions with physiological impacts.
In this study, we report how the use of an AP-MS method designed for a given class of protein (chromatin-associated proteins) can help uncover numerous novel protein–protein interactions. Furthermore, our work detected dense chromatin-associated protein networks being co-purified with multiple transcription factors and other DNA binding proteins. The fact that even in the best-characterized model organism Saccharomyces cerevisiae, thousands of novel protein–protein interactions can be detected supports our view that targeted interactome studies are worthwhile and desirable. As such, the budding yeast interactome can still be considered incomplete and warrants further study.
We previously reported a novel affinity purification (AP) method termed modified chromatin immunopurification (mChIP), which permits selective enrichment of DNA-bound proteins along with their associated protein network. In this study, we report a large-scale study of the protein network of 102 chromatin-related proteins from budding yeast that were analyzed by mChIP coupled to mass spectrometry. This effort resulted in the detection of 2966 high confidence protein associations with 724 distinct preys. mChIP resulted in significantly improved interaction coverage as compared with classical AP methodology for ∼75% of the baits tested. Furthermore, mChIP successfully identified novel binding partners for many lower abundance transcription factors for which conventional AP methodologies had previously failed. mChIP was also used to perform targeted studies, particularly of Asf1 and its associated proteins, to gain an understanding of the physical interplay between Asf1 and two other histone chaperones, Rtt106 and the HIR complex.
doi:10.1038/msb.2010.104
PMCID: PMC3018163  PMID: 21179020
affinity purification; chromatin-associated protein networks; mass spectrometry; nucleosome assembly factor Asf1; protein–DNA interaction
2.  Uncoupling Transcription from Covalent Histone Modification 
PLoS Genetics  2014;10(4):e1004202.
It is widely accepted that transcriptional regulation of eukaryotic genes is intimately coupled to covalent modifications of the underlying chromatin template, and in certain cases the functional consequences of these modifications have been characterized. Here we present evidence that gene activation in the silent heterochromatin of the yeast Saccharomyces cerevisiae can occur in the context of little, if any, covalent histone modification. Using a SIR-regulated heat shock-inducible transgene, hsp82-2001, and a natural drug-inducible subtelomeric gene, YFR057w, as models we demonstrate that substantial transcriptional induction (>200-fold) can occur in the context of restricted histone loss and negligible levels of H3K4 trimethylation, H3K36 trimethylation and H3K79 dimethylation, modifications commonly linked to transcription initiation and elongation. Heterochromatic gene activation can also occur with minimal H3 and H4 lysine acetylation and without replacement of H2A with the transcription-linked variant H2A.Z. Importantly, absence of histone modification does not stem from reduced transcriptional output, since hsp82-ΔTATA, a euchromatic promoter mutant lacking a TATA box and with threefold lower induced transcription than heterochromatic hsp82-2001, is strongly hyperacetylated in response to heat shock. Consistent with negligible H3K79 dimethylation, dot1Δ cells lacking H3K79 methylase activity show unimpeded occupancy of RNA polymerase II within activated heterochromatic promoter and coding regions. Our results indicate that large increases in transcription can be observed in the virtual absence of histone modifications often thought necessary for gene activation.
Author Summary
The proper regulation of gene expression is of fundamental importance in the maintenance of normal growth and development. Misregulation of genes can lead to such outcomes as cancer, diabetes and neurodegenerative disease. A key step in gene regulation occurs during the transcription of the chromosomal DNA into messenger RNA by the enzyme, RNA polymerase II. Histones are small, positively charged proteins that package genomic DNA into arrays of bead-like particles termed nucleosomes, the principal components of chromatin. Increasing evidence suggests that nucleosomal histones play an active role in regulating transcription, and that this is derived in part from reversible chemical (“covalent”) modifications that take place on their amino acids. These histone modifications create novel surfaces on nucleosomes that can serve as docking sites for other proteins that control a gene's expression state. In this study we present evidence that contrary to the general case, covalent modifications typically associated with transcription are minimally used by genes embedded in a specialized, condensed chromatin structure termed heterochromatin in the model organism baker's yeast. Our observations are significant, for they suggest that gene transcription can occur in a living cell in the virtual absence of covalent modification of the chromatin template.
doi:10.1371/journal.pgen.1004202
PMCID: PMC3983032  PMID: 24722509
3.  Hominoid-Specific De Novo Protein-Coding Genes Originating from Long Non-Coding RNAs 
PLoS Genetics  2012;8(9):e1002942.
Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
Author Summary
Ever since the pre-genomic era, people believed that “mother gene”-based mechanisms such as gene duplication were the major means of creating new genes. Recently, we and others reported several “motherless” protein-coding genes in human, challenging the conventional idea in that some protein-coding genes might have emerged de novo from ancestral non-coding DNAs. However, how these interesting proteins originated is a question that remained unaddressed. The ancestral non-coding DNA must become transcribed and gain a translatable open reading frame before becoming a protein-coding gene, but either order of these two steps is possible. Here, we performed a comparative transcriptome study in human, chimpanzee, and rhesus macaque to address these fundamental questions. We found that most of the hominoid-specific de novo protein-coding genes encoded long non-coding RNAs in rhesus macaque or chimpanzee, with similar transcript structure and correlated tissue expression profile, but the protein-coding genes often had higher transcriptional abundance. According to the rule of parsimony, we conclude that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes that are then further optimized at the transcriptional level, a pattern insensitive to the parameters used in the identification and analysis of de novo genes.
doi:10.1371/journal.pgen.1002942
PMCID: PMC3441637  PMID: 23028352
4.  Epigenetic Upregulation of lncRNAs at 13q14.3 in Leukemia Is Linked to the In Cis Downregulation of a Gene Cluster That Targets NF-kB 
PLoS Genetics  2013;9(4):e1003373.
Non-coding RNAs are much more common than previously thought. However, for the vast majority of non-coding RNAs, the cellular function remains enigmatic. The two long non-coding RNA (lncRNA) genes DLEU1 and DLEU2 map to a critical region at chromosomal band 13q14.3 that is recurrently deleted in solid tumors and hematopoietic malignancies like chronic lymphocytic leukemia (CLL). While no point mutations have been found in the protein coding candidate genes at 13q14.3, they are deregulated in malignant cells, suggesting an epigenetic tumor suppressor mechanism. We therefore characterized the epigenetic makeup of 13q14.3 in CLL cells and found histone modifications by chromatin-immunoprecipitation (ChIP) that are associated with activated transcription and significant DNA-demethylation at the transcriptional start sites of DLEU1 and DLEU2 using 5 different semi-quantitative and quantitative methods (aPRIMES, BioCOBRA, MCIp, MassARRAY, and bisulfite sequencing). These epigenetic aberrations were correlated with transcriptional deregulation of the neighboring candidate tumor suppressor genes, suggesting a coregulation in cis of this gene cluster. We found that the 13q14.3 genes in addition to their previously known functions regulate NF-kB activity, which we could show after overexpression, siRNA–mediated knockdown, and dominant-negative mutant genes by using Western blots with previously undescribed antibodies, by a customized ELISA as well as by reporter assays. In addition, we performed an unbiased screen of 810 human miRNAs and identified the miR-15/16 family of genes at 13q14.3 as the strongest inducers of NF-kB activity. In summary, the tumor suppressor mechanism at 13q14.3 is a cluster of genes controlled by two lncRNA genes that are regulated by DNA-methylation and histone modifications and whose members all regulate NF-kB. 
Therefore, the tumor suppressor mechanism in 13q14.3 underlines the role both of epigenetic aberrations and of lncRNA genes in human tumorigenesis and is an example of colocalization of a functionally related gene cluster.
Author Summary
Recent results suggest that genome regions not coding for proteins are read and transcribed into RNA. While the function for the majority of the resulting non-coding RNA molecules remains unclear, some of them are termed according to their length (typically 200–2,000 nucleotides) as long non-coding RNA (lncRNA) genes that play a role in regulating the activity of target genes. In most instances, this deregulation involves changes of so-called “epigenetic” marks associated with the DNA that are passed on to the cellular progeny without changes in the DNA sequence. Here we describe an example where two lncRNA genes (DLEU1 and DLEU2) are epigenetically deregulated together with a cluster of neighboring protein-coding tumor suppressor genes in almost all patients suffering from chronic lymphocytic leukemia. Such a common regulation suggests that the affected genes are involved in the same cellular pathway. In line with this notion, the 13q14.3 genes modulate the NF-kB signalling pathway, either inducing or repressing its activity. An activation of NF-kB has previously been shown to promote survival of the leukemic cells, underlining the importance of the 13q14.3 tumor suppressor locus for the pathomechanism of the disease.
doi:10.1371/journal.pgen.1003373
PMCID: PMC3616974  PMID: 23593011
5.  The essential genome of a bacterium 
This study reports the essential Caulobacter genome at 8 bp resolution determined by saturated transposon mutagenesis and high-throughput sequencing. This strategy is applicable to full genome essentiality studies in a broad class of bacterial species.
The essential Caulobacter genome was determined at 8 bp resolution using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing. Essential protein-coding sequences comprise 90% of the essential genome; the remaining 10% comprising essential non-coding RNA sequences, gene regulatory elements and essential genome replication features. Of the 3876 annotated open reading frames (ORFs), 480 (12.4%) were essential ORFs, 3240 (83.6%) were non-essential ORFs and 156 (4.0%) were ORFs that severely impacted fitness when mutated. The essential elements are preferentially positioned near the origin and terminus of the Caulobacter chromosome. This high-resolution strategy is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
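The ORF percentages quoted above follow directly from the reported counts. The short Python check below recomputes them; only the counts come from the summary, and the variable names are illustrative.

```python
# ORF counts as reported in the summary above.
TOTAL_ORFS = 3876
orf_counts = {
    "essential": 480,
    "non-essential": 3240,
    "severe fitness impact": 156,
}

# The three classes should partition the annotated ORFs.
assert sum(orf_counts.values()) == TOTAL_ORFS

# Percentage of all annotated ORFs in each class, to one decimal place.
orf_percent = {name: f"{100 * n / TOTAL_ORFS:.1f}" for name, n in orf_counts.items()}

for name, pct in orf_percent.items():
    print(f"{name}: {orf_counts[name]} ORFs ({pct}%)")
```

The three classes sum to the 3876 annotated ORFs, and the recomputed shares match the 12.4%, 83.6% and 4.0% given in the text.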
The regulatory events that control polar differentiation and cell-cycle progression in the bacterium Caulobacter crescentus are highly integrated, and they have to occur in the proper order (McAdams and Shapiro, 2011). Components of the core regulatory circuit are largely known. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of this bacterial cell. We have identified all the essential coding and non-coding elements of the Caulobacter chromosome using a hyper-saturated transposon mutagenesis strategy that is scalable and can be readily extended to obtain rapid and accurate identification of the essential genome elements of any sequenced bacterial species at a resolution of a few base pairs.
We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible outward pointing Pxyl promoter (Christen et al, 2010). We showed that this transposon construct inserts into the genome randomly where it can activate or disrupt transcription at the site of integration, depending on the insertion orientation. DNA from hundreds of thousands of transposon insertion sites reading outward into flanking genomic regions was PCR amplified in parallel and sequenced by Illumina paired-end sequencing to locate the insertion site in each mutant strain (Figure 1). A single sequencing run on DNA from a mutagenized cell population yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions and the insertion site could be mapped with single nucleotide resolution. This yielded the location and orientation of 428 735 independent transposon insertions in the 4-Mbp Caulobacter genome.
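As a rough back-of-the-envelope check, the number of mapped insertions can be related to the quoted resolution. The sketch below assumes a genome length of exactly 4,000,000 bp (the text gives only the rounded 4-Mbp figure) and treats insertions as evenly spread, which real data are not.

```python
# Figures taken from the paragraph above. The genome length is the rounded
# "4-Mbp" value, so the result is only approximate.
GENOME_LENGTH_BP = 4_000_000   # assumption: exact length is not given in the text
UNIQUE_INSERTIONS = 428_735

# Average distance between neighbouring insertions if they were evenly spaced.
mean_spacing_bp = GENOME_LENGTH_BP / UNIQUE_INSERTIONS

print(f"about one insertion every {mean_spacing_bp:.1f} bp")
```

Under these assumptions the average spacing comes out at roughly 9.3 bp, the same order as the 8 bp resolution reported for the study.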
Within non-coding sequences of the Caulobacter genome, we detected 130 non-disruptable DNA segments between 90 and 393 bp long in addition to all essential promoter elements. Among 27 previously identified and validated sRNAs (Landt et al, 2008), three were contained within non-disruptable DNA segments and another three were partially disruptable, that is, insertions caused a notable growth defect. Two additional small RNAs found to be essential are the transfer-messenger RNA (tmRNA) and the ribozyme RNAseP (Landt et al, 2008). In addition to the 8 non-disruptable sRNAs, 29 out of the 130 intergenic essential non-coding sequences contained non-redundant tRNA genes; duplicated tRNA genes were non-essential. We also identified two non-disruptable DNA segments within the chromosomal origin of replication. Thus, we resolved essential non-coding RNAs, tRNAs and essential replication elements within the origin region of the chromosome. An additional 90 non-disruptable small genome elements of currently unknown function were identified. Eighteen of these are conserved in at least one closely related species. Only 2 could encode a protein of over 50 amino acids.
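The idea of a non-disruptable segment can be illustrated as a search for long insertion-free gaps between mapped transposon positions. The sketch below is a simplification under stated assumptions: the function name, the toy coordinates and the 90 bp threshold (borrowed from the lower bound of the segment lengths quoted above) are all hypothetical, and the criteria actually used in the study are not described in this summary.

```python
from typing import List, Tuple

def insertion_free_gaps(positions: List[int], min_gap: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) for each run of bases without insertions.

    `positions` are 1-based insertion coordinates on a linear sequence;
    the gap spanning the ends of a circular chromosome is ignored here.
    """
    gaps = []
    ordered = sorted(set(positions))
    for left, right in zip(ordered, ordered[1:]):
        length = right - left - 1          # bases strictly between two insertions
        if length >= min_gap:
            gaps.append((left + 1, right - 1, length))
    return gaps

# Hypothetical insertion coordinates, not data from the study.
toy_positions = [10, 18, 25, 140, 150, 400]
toy_gaps = insertion_free_gaps(toy_positions, min_gap=90)
print(toy_gaps)
```

With these toy coordinates, two gaps pass the threshold: bases 26 to 139 (114 bp) and bases 151 to 399 (249 bp). A real analysis would also need to account for insertion orientation and for regions where insertions are tolerated but reduce fitness.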
For each of the 3876 annotated open reading frames (ORFs), we analyzed the distribution, orientation, and genetic context of transposon insertions. There are 480 essential ORFs and 3240 non-essential ORFs. In addition, there were 156 ORFs that severely impacted fitness when mutated. The 8-bp resolution allowed a dissection of the essential and non-essential regions of the coding sequences. Sixty ORFs had transposon insertions within a significant portion of their 3′ region but lacked insertions in the essential 5′ coding region, allowing the identification of non-essential protein segments. For example, transposon insertions in the essential cell-cycle regulatory gene divL, a tyrosine kinase, showed that the last 204 C-terminal amino acids did not impact viability, confirming previous reports that the C-terminal ATPase domain of DivL is dispensable for viability (Reisinger et al, 2007; Iniesta et al, 2010). In addition, we found that 30 out of 480 (6.3%) of the essential ORFs appear to be shorter than the annotated ORF, suggesting that these are probably mis-annotated.
Among the 480 ORFs essential for growth on rich media, there were 10 essential transcriptional regulatory proteins, including 5 previously identified cell-cycle regulators (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010) and 5 uncharacterized predicted transcription factors. In addition, two RNA polymerase sigma factors RpoH and RpoD, as well as the anti-sigma factor ChrR, which mitigates rpoE-dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were also found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor are the core essential transcriptional regulators for growth on rich media. To further characterize the core components of the Caulobacter cell-cycle control network, we identified all essential regulatory sequences and operon transcripts. Altogether, the 480 essential protein-coding and 37 essential RNA-coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression. Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007).
The essential genome features are non-uniformly distributed on the Caulobacter genome and enriched near the origin and the terminus regions. In contrast, the chromosomal positions of the published E. coli essential coding sequences (Rocha, 2004) are preferentially located at either side of the origin (Figure 4A). This indicates that there are selective pressures on chromosomal positioning of some essential elements (Figure 4A).
The strategy described in this report could be readily extended to quickly determine the essential genome for a large class of bacterial species.
Caulobacter crescentus is a model organism for the integrated circuitry that runs a bacterial cell cycle. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of a bacterial cell. Using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing, we determined the essential Caulobacter genome at 8 bp resolution, including 1012 essential genome features: 480 ORFs, 402 regulatory sequences and 130 non-coding elements, including 90 intergenic segments of unknown function. The essential transcriptional circuitry for growth on rich media includes 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor. We identified all essential promoter elements for the cell cycle-regulated genes. The essential elements are preferentially positioned near the origin and terminus of the chromosome. The high-resolution strategy used here is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
doi:10.1038/msb.2011.58
PMCID: PMC3202797  PMID: 21878915
functional genomics; next-generation sequencing; systems biology; transposon mutagenesis
6.  Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs 
PLoS Genetics  2013;9(4):e1003470.
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non–TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
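A figure such as the share of lncRNA sequence derived from TEs comes down to measuring how many exonic bases overlap annotated TE intervals. The following Python sketch shows one way such a fraction could be computed. The coordinates are hypothetical, and the function names and the half-open interval convention are assumptions for illustration rather than the authors' pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates

def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def te_fraction(exons: List[Interval], tes: List[Interval]) -> float:
    """Fraction of exonic bases that fall inside at least one TE interval."""
    exons, tes = merge(exons), merge(tes)
    exon_bases = sum(end - start for start, end in exons)
    overlap = 0
    i = j = 0
    while i < len(exons) and j < len(tes):
        lo = max(exons[i][0], tes[j][0])
        hi = min(exons[i][1], tes[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval finishes first.
        if exons[i][1] < tes[j][1]:
            i += 1
        else:
            j += 1
    return overlap / exon_bases if exon_bases else 0.0

# Hypothetical transcript: two 100 bp exons and two TE annotations.
toy_exons = [(100, 200), (300, 400)]
toy_tes = [(150, 320), (390, 500)]
toy_fraction = te_fraction(toy_exons, toy_tes)
print(toy_fraction)
```

In this toy case 80 of the 200 exonic bases overlap a TE, giving a fraction of 0.4. Merging the intervals first keeps bases covered by several overlapping TE annotations from being counted twice.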
Author Summary
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known about the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements, through their capacity to move and spread in genomes in a lineage-specific fashion, as well as their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
doi:10.1371/journal.pgen.1003470
PMCID: PMC3636048  PMID: 23637635
7.  Distinguishing Protein-Coding from Non-Coding RNAs through Support Vector Machines 
PLoS Genetics  2006;2(4):e29.
RIKEN's FANTOM project has revealed many previously unknown coding sequences, as well as an unexpected degree of variation in transcripts resulting from alternative promoter usage and splicing. Ever more transcripts that do not code for proteins have been identified by transcriptome studies, in general. Increasing evidence points to the important cellular roles of such non-coding RNAs (ncRNAs). The distinction of protein-coding RNA transcripts from ncRNA transcripts is therefore an important problem in understanding the transcriptome and carrying out its annotation. Very few in silico methods have specifically addressed this problem. Here, we introduce CONC (for “coding or non-coding”), a novel method based on support vector machines that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, predicted secondary structure content, predicted percentage of exposed residues, compositional entropy, number of homologs from database searches, and alignment entropy. Nucleotide frequencies are also incorporated into the method. Confirmed coding cDNAs for eukaryotic proteins from the Swiss-Prot database constituted the set of true positives, ncRNAs from RNAdb and NONCODE the true negatives. Ten-fold cross-validation suggested that CONC distinguished coding RNAs from ncRNAs at about 97% specificity and 98% sensitivity. Applied to 102,801 mouse cDNAs from the FANTOM3 dataset, our method reliably identified over 14,000 ncRNAs and estimated the total number of ncRNAs to be about 28,000.
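The sensitivity and specificity figures quoted for CONC are standard confusion-matrix quantities, with coding transcripts as the positive class and ncRNAs as the negative class, as in the abstract. The sketch below shows how they are computed from true and predicted labels. The labels are a made-up toy example, and the function name is illustrative rather than part of CONC.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Labels: 1 = coding (positive class), 0 = non-coding (negative class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # coding called coding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # coding called non-coding
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ncRNA called non-coding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ncRNA called coding
    sensitivity = tp / (tp + fn)   # share of coding transcripts recovered
    specificity = tn / (tn + fp)   # share of ncRNAs correctly rejected
    return sensitivity, specificity

# Toy labels for ten transcripts; not data from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
print(f"sensitivity={toy_sens:.2f}, specificity={toy_spec:.2f}")
```

In a ten-fold cross-validation such as the one described, these quantities would be computed on each held-out fold and then summarized across folds.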
Synopsis
There are two types of RNA: messenger RNAs (mRNAs), which are translated into proteins, and non-coding RNAs (ncRNAs), which function as RNA molecules. Besides textbook examples such as tRNAs and rRNAs, non-coding RNAs have been found to carry out very diverse functions, from mRNA splicing and RNA modification to translational regulation. It has been estimated that non-coding RNAs make up the vast majority of transcription output of higher eukaryotes. Discriminating mRNA from ncRNA has become an important biological and computational problem. The authors describe a computational method based on a machine learning algorithm known as a support vector machine (SVM) that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, secondary structure content, and protein alignment information. The method is applied to the dataset from the FANTOM3 large-scale mouse cDNA sequencing project; it identifies over 14,000 ncRNAs in mouse and estimates the total number of ncRNAs in the FANTOM3 data to be about 28,000.
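The evaluation described above (ten-fold cross-validation, reported as sensitivity and specificity) can be illustrated with a minimal sketch. This is not the CONC implementation: the function names `sensitivity_specificity` and `k_fold_indices`, the label convention (1 = coding, 0 = non-coding), and the round-robin fold assignment are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); labels: 1 = coding, 0 = non-coding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sensitivity: fraction of true coding transcripts called coding.
    # Specificity: fraction of true non-coding transcripts called non-coding.
    return tp / (tp + fn), tn / (tn + fp)


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k (train, test) index pairs, round-robin."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every index that is not in the held-out fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits
```

In a cross-validation run, a classifier would be trained on each `train` index list and scored on the matching `test` list, with the two rates averaged over the ten folds.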
doi:10.1371/journal.pgen.0020029
PMCID: PMC1449884  PMID: 16683024
8.  Novel Long Non-Coding RNAs Are Regulated by Angiotensin II in Vascular Smooth Muscle Cells 
Circulation research  2013;113(3):266-278.
Rationale
Misregulation of angiotensin II (Ang II) actions can lead to atherosclerosis and hypertension. Evaluating transcriptomic responses to Ang II in vascular smooth muscle cells (VSMCs) is important to understand the gene networks regulated by Ang II which might uncover previously unidentified mechanisms and new therapeutic targets.
Objective
To identify all transcripts, including novel protein-coding and long non-coding RNAs, differentially expressed in response to Ang II in rat VSMCs using transcriptome and epigenome profiling.
Methods and Results
De novo assembly of transcripts from RNA-seq revealed novel protein-coding and long non-coding RNAs (lncRNAs). The majority of the genomic loci of these novel transcripts are enriched for histone H3 lysine-4-trimethylation and histone H3 lysine-36-trimethylation, two chromatin modifications found at actively transcribed regions, providing further evidence that these are bona fide transcripts. Analysis of transcript abundance identified all protein-coding and lncRNAs regulated by Ang II. We further discovered that one Ang II-regulated lncRNA functions as the host transcript for miR-221 and miR-222, two miRNAs implicated in cell proliferation. Additionally, siRNA-mediated knockdown of Lnc-Ang362 reduced proliferation of VSMCs.
Conclusions
These data provide novel insights into the epigenomic and transcriptomic effects of Ang II in VSMCs. They provide the first identification of Ang II-regulated lncRNAs, which suggests functional roles for these lncRNAs in mediating cellular responses to Ang II. Furthermore, we identify one Ang II-regulated lncRNA that is responsible for the production of two miRNAs implicated in VSMC proliferation. These newly identified non-coding transcripts could be exploited as novel therapeutic targets for Ang II-associated cardiovascular diseases.
doi:10.1161/CIRCRESAHA.112.300849
PMCID: PMC3763837  PMID: 23697773
Angiotensin II; genome; transcriptome; VSMCs; lncRNA; genomics; smooth muscle; gene expression/regulation; signaling; atherosclerosis
9.  Uncoupling Antisense-Mediated Silencing and DNA Methylation in the Imprinted Gnas Cluster 
PLoS Genetics  2011;7(3):e1001347.
There is increasing evidence that non-coding macroRNAs are major elements for silencing imprinted genes, but their mechanism of action is poorly understood. Within the imprinted Gnas cluster on mouse chromosome 2, Nespas is a paternally expressed macroRNA that arises from an imprinting control region and runs antisense to Nesp, a paternally repressed protein coding transcript. Here we report a knock-in mouse allele that behaves as a Nespas hypomorph. The hypomorph mediates down-regulation of Nesp in cis through chromatin modification at the Nesp promoter but in the absence of somatic DNA methylation. Notably there is reduced demethylation of H3K4me3, sufficient for down-regulation of Nesp, but insufficient for DNA methylation; in addition, there is depletion of the H3K36me3 mark permissive for DNA methylation. We propose an order of events for the regulation of a somatic imprint on the wild-type allele whereby Nespas modulates demethylation of H3K4me3 resulting in repression of Nesp followed by DNA methylation. This study demonstrates that a non-coding antisense transcript or its transcription is associated with silencing an overlapping protein-coding gene by a mechanism independent of DNA methylation. These results have broad implications for understanding the hierarchy of events in epigenetic silencing by macroRNAs.
Author Summary
Genomic imprinting is a process resulting in expression of genes according to parental origin. Some imprinted genes are expressed when paternally derived and others when maternally derived. Thus imprinted genes are monoallelically expressed and one copy has to be silenced. There is evidence that some long non-coding RNAs, acting in cis, have a role in silencing. We investigated the role of Nespas, a gene for a non-coding RNA that is only expressed from the paternally derived chromosome in the Gnas cluster and runs antisense to its sense counterpart, Nesp. Expression of Nespas is associated with silencing of Nesp and a repressive methylation mark on the Nesp DNA. We generated a Nespas mutant with reduced levels of activity and showed that it down-regulated its sense counterpart Nesp, in the absence of a DNA methylation mark, but in the presence of an altered chromatin mark. We conclude that Nespas can repress Nesp by a mechanism independent of DNA methylation, by modulating a chromatin mark.
doi:10.1371/journal.pgen.1001347
PMCID: PMC3063750  PMID: 21455290
10.  Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells 
PLoS Computational Biology  2014;10(6):e1003611.
Mechanisms that generate transcript diversity are of fundamental importance in eukaryotes. Although a large fraction of human protein-coding genes and lincRNAs produce more than one mRNA isoform each, the regulation of this phenomenon is still incompletely understood. Much progress has been made in deciphering the role of sequence-specific features as well as DNA- and RNA-binding proteins in alternative splicing. Recently, however, several experimental studies of individual genes have revealed a direct involvement of epigenetic factors in alternative splicing and transcription initiation. While histone modifications are generally correlated with overall gene expression levels, it remains unclear how histone modification enrichment affects relative isoform abundance. Therefore, we sought to investigate the associations between histone modifications and transcript diversity levels measured by the rates of transcription start-site switching and alternative splicing on a genome-wide scale across protein-coding genes and lincRNAs. We found that the relationship between enrichment levels of epigenetic marks and transcription start-site switching is similar for protein-coding genes and lincRNAs. Furthermore, we found associations between splicing rates and enrichment levels of H2az, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me3, H3K27ac, H3K27me3, H3K36me3, H3K79me2, and H4K20me, marks traditionally associated with enhancers, transcription initiation, transcriptional repression, and others. These patterns were observed in both normal and cancer cell lines. Additionally, we developed a novel computational method that identified 840 epigenetically regulated candidate genes and predicted transcription start-site switching and alternative exon splicing with up to 92% accuracy based on epigenetic patterning alone. Our results suggest that the epigenetic regulation of transcript isoform diversity may be a relatively common genome-wide phenomenon representing an avenue of deregulation in tumor development.
Author Summary
Traditionally, the regulation of gene expression was thought to be largely based on DNA and RNA sequence motifs. However, this dogma has recently been challenged as other factors, such as epigenetic patterning of the genome, have become better understood. Sparse but convincing experimental evidence suggests that the epigenetic background, in the form of histone modifications, acts as an additional layer of regulation determining how transcripts are processed. Here we developed a computational approach to investigate the genome-wide prevalence and the level of association between the enrichment of epigenetic marks and transcript diversity generated via alternative transcription start sites and splicing. We found that the role of epigenetic patterning in alternative transcription start-site switching is likely to be the same for all genes whereas the role of epigenetic patterns in splicing is likely gene-specific. Furthermore, we show that epigenetic data alone can be used to predict the inclusion pattern of an exon. These findings have significant implications for a better understanding of the regulation of transcript diversity in humans as well as the modifications arising during tumor development.
doi:10.1371/journal.pcbi.1003611
PMCID: PMC4046914  PMID: 24901363
11.  The DNA Methylome of Human Peripheral Blood Mononuclear Cells 
PLoS Biology  2010;8(11):e1000533.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Author Summary
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
doi:10.1371/journal.pbio.1000533
PMCID: PMC2976721  PMID: 21085693
12.  Independent Chromatin Binding of ARGONAUTE4 and SPT5L/KTF1 Mediates Transcriptional Gene Silencing 
PLoS Genetics  2011;7(6):e1002120.
Eukaryotic genomes contain significant amounts of transposons and repetitive DNA elements, which, if transcribed, can be detrimental to the organism. Expression of these elements is suppressed by establishment of repressive chromatin modifications. In Arabidopsis thaliana, they are silenced by the siRNA–mediated transcriptional gene silencing pathway where long non-coding RNAs (lncRNAs) produced by RNA Polymerase V (Pol V) guide ARGONAUTE4 (AGO4) to chromatin and attract enzymes that establish repressive chromatin modifications. It is unknown how chromatin modifying enzymes are recruited to chromatin. We show through chromatin immunoprecipitation (ChIP) that SPT5L/KTF1, a silencing factor and a homolog of SPT5 elongation factors, binds chromatin at loci subject to transcriptional silencing. Chromatin binding of SPT5L/KTF1 occurs downstream of RNA Polymerase V, but independently from the presence of 24-nt siRNA. We also show that SPT5L/KTF1 and AGO4 are recruited to chromatin in parallel and independently of each other. As shown using methylation-sensitive restriction enzymes, binding of both AGO4 and SPT5L/KTF1 is required for DNA methylation and repressive histone modifications of several loci. We propose that the coordinate binding of SPT5L and AGO4 creates a platform for direct or indirect recruitment of chromatin modifying enzymes.
Author Summary
Transposons and other repetitive elements occupy vast areas of eukaryotic genomes. They pose a threat to genome integrity but at the same time regulate expression of many genes and have been proposed to be a major factor contributing to genome evolution. One of the processes responsible for controlling activity of transposons and other repetitive elements is transcriptional gene silencing. This process uses small interfering RNA and long non-coding RNA to recruit enzymes that establish repressive chromatin modifications. Several proteins have been identified as necessary for siRNA–mediated transcriptional silencing in Arabidopsis thaliana; however, for many of them, their position in the silencing pathway is unknown. One of those proteins is SPT5L/KTF1, a homolog of an elongation factor associated with RNA Polymerase II. Here we establish the position of SPT5L in the silencing pathway and propose the molecular mechanism of its function. This gives further knowledge of the mechanism of transcriptional gene silencing and is important to understand how transposons are controlled.
doi:10.1371/journal.pgen.1002120
PMCID: PMC3111484  PMID: 21738482
13.  Female-biased expression of long non-coding RNAs in domains that escape X-inactivation in mouse 
BMC Genomics  2010;11:614.
Background
Sexual dimorphism in brain gene expression has been recognized in several animal species. However, the relevant regulatory mechanisms remain poorly understood. To investigate whether sex-biased gene expression in mammalian brain is globally regulated or locally regulated in diverse brain structures, and to study the genomic organisation of brain-expressed sex-biased genes, we performed a large scale gene expression analysis of distinct brain regions in adult male and female mice.
Results
This study revealed spatial specificity in sex-biased transcription in the mouse brain, and identified 173 sex-biased genes in the striatum, 19 in the neocortex, 12 in the hippocampus, and 31 in the eye. Genes located on sex chromosomes were consistently over-represented in all brain regions. Analysis of a subset of genes with sex-bias in more than one tissue revealed Y-encoded male-biased transcripts and X-encoded female-biased transcripts known to escape X-inactivation. In addition, we identified novel coding and non-coding X-linked genes with female-biased expression in multiple tissues. Interestingly, the chromosomal positions of all of the female-biased non-coding genes are in close proximity to protein-coding genes that escape X-inactivation. This defines X-chromosome domains each of which contains a coding and a non-coding female-biased gene. Lack of repressive chromatin marks in non-coding transcribed loci supports the possibility that they escape X-inactivation. Moreover, RNA-DNA combined FISH experiments confirmed the biallelic expression of one such novel domain.
Conclusion
This study demonstrated that the number of genes with sex-biased expression varies between individual brain regions in mouse. The sex-biased genes identified are localized on many chromosomes. At the same time, sexually dimorphic gene expression that is common to several parts of the brain is mostly restricted to the sex chromosomes. Moreover, the study uncovered multiple female-biased non-coding genes that are non-randomly co-localized on the X-chromosome with protein-coding genes that escape X-inactivation. This raises the possibility that expression of long non-coding RNAs may play a role in modulating gene expression in domains that escape X-inactivation in mouse.
doi:10.1186/1471-2164-11-614
PMCID: PMC3091755  PMID: 21047393
14.  Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo 
A well-defined set of transcriptional regulatory modules was created and analyzed in the Drosophila embryo. Fractional occupancy-based models were developed to explain the interaction of short-range transcriptional repressors with endogenous activators by using quantitative data from these modules. Our fractional occupancy-based modeling uncovered specific quantitative features of short-range repressors: a complex nonlinear quenching relationship, similar quenching efficiencies for different activators, and modest levels of cooperativity. The extension of the study to endogenous enhancers highlighted several features of enhancer architecture design in Drosophila embryos.
Transcriptional regulatory information, represented by patterns of protein-binding sites on DNA, comprises an important portion of genetic coding. Despite the abundance of genomic sequences now available, identifying and characterizing this information remain a major challenge. Minor changes in protein-binding sites can have profound effects on gene expression, and such changes have been shown to underlie important aspects of disease and evolution. Thus, an important aim in contemporary systems biology is to develop a global understanding of the transcriptional regulatory code, allowing prediction of gene output based on DNA sequence information. Recent studies have focused on endogenous transcriptional regulatory sequences (Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008); however, distinct enhancers differ in many features, including transcription factor activity, spacing, and cooperativity, making it difficult to learn the effects of individual features and generalize them to other cis-regulatory elements. We have pursued a bottom up approach to understand the mechanistic processing of regulatory elements by the transcriptional machinery, using a well-defined and characterized set of repressors and activators in Drosophila blastoderm embryos. The study focuses on the Giant, Krüppel, Knirps, and Snail proteins, which have been characterized as short-range repressors, able to act locally to interfere with activator function (quenching) (Gray et al, 1994; Arnosti et al, 1996a). Such repressors have central functions in development.
The aim of our study was to enable ab initio predictions of enhancer function, given defined quantities of regulatory proteins and the sequence of the enhancer (Figure 1). We have generated a large quantitative data set using fluorescent confocal laser scanning microscopy to determine the inputs (Giant, Krüppel, and Knirps protein levels) and outputs (lacZ mRNA levels) of the regulatory elements introduced into Drosophila by transgenesis. We analyzed the effect of altering specific features of a set of related gene modules, designed to uncover critical aspects of repression, including quenching distance, cooperativity, and overall factor potency.
We generated specific descriptions for each regulatory element using fractional occupancy-based modeling and identified quantitative values for parameters affecting transcriptional regulation in vivo, and these parameters were used to build and test the model. Through this process, we uncovered previously unknown features that allow correct predictions of regulation by short-range repressors, including a non-monotonic distance function for quenching, which implicates possible phasing effects, a modest contribution for repressor–repressor cooperativity, and similarity in repression of disparate activators.
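The idea of fractional occupancy can be made concrete with a minimal sketch. The abstract does not give the authors' equations, so the code below assumes the common single-site binding form, occupancy = Kc / (1 + Kc), and a simple multiplicative quenching term. The function names, the parameters `k_assoc` and `quenching_efficiency`, and the use of a constant efficiency (where the study reports a distance-dependent one) are all illustrative assumptions.

```python
def fractional_occupancy(concentration: float, k_assoc: float) -> float:
    """Assumed single-site binding: fraction of time a site is bound.

    occupancy = K * c / (1 + K * c), with K an association constant.
    """
    if concentration < 0 or k_assoc < 0:
        raise ValueError("concentration and k_assoc must be non-negative")
    x = k_assoc * concentration
    return x / (1.0 + x)


def quenched_expression(
    activator_conc: float,
    repressor_conc: float,
    k_activator: float,
    k_repressor: float,
    quenching_efficiency: float,
) -> float:
    """Toy output: activator occupancy reduced by a bound short-range repressor.

    quenching_efficiency (0..1) is a hypothetical constant here; a fuller
    model would make it a function of activator-repressor distance.
    """
    if not 0.0 <= quenching_efficiency <= 1.0:
        raise ValueError("quenching_efficiency must lie in [0, 1]")
    act = fractional_occupancy(activator_conc, k_activator)
    rep = fractional_occupancy(repressor_conc, k_repressor)
    return act * (1.0 - quenching_efficiency * rep)
```

With no repressor present the output reduces to the activator occupancy alone, and it falls as repressor concentration or quenching efficiency increases, which is the qualitative behavior a quenching model is meant to capture.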
By applying these parameters to a model of the endogenous rhomboid enhancer, we uncovered novel insights into the architecture of this enhancer (Figure 8). Our study provides essential quantitative elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms. Extension of these predictive models should facilitate the development of more sophisticated computational algorithms for the identification and functional characterization of novel regulatory elements. The development of such quantitative modeling tools will change our understanding of the genome from essentially a parts list to a dynamically regulated system, and will greatly facilitate studies in disease, population genetics, and evolutionary biology.
Systems biology seeks a genomic-level interpretation of transcriptional regulatory information represented by patterns of protein-binding sites. Obtaining this information without direct experimentation is challenging; minor alterations in binding sites can have profound effects on gene expression, and underlie important aspects of disease and evolution. Quantitative modeling offers an alternative path to develop a global understanding of the transcriptional regulatory code. Recent studies have focused on endogenous regulatory sequences; however, distinct enhancers differ in many features, making it difficult to generalize to other cis-regulatory elements. We applied a systematic approach to simpler elements and present here the first quantitative analysis of short-range transcriptional repressors, which have central functions in metazoan development. Our fractional occupancy-based modeling uncovered unexpected features of these proteins' activity that allow accurate predictions of regulation by the Giant, Knirps, Krüppel, and Snail repressors, including modeling of an endogenous enhancer. This study provides essential elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms.
doi:10.1038/msb.2009.97
PMCID: PMC2824527  PMID: 20087339
Drosophila; enhancer; modeling; repression; transcription
15.  Niche adaptation by expansion and reprogramming of general transcription factors 
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments.
Evolution of archaeal lineages correlates with duplication events in the TFB family. Each TFB is required for adaptation to multiple environments. The relative fitness contributions of TFBs change with environmental context. Changes in the regulation of duplicated TFBs can generate new adaptation capabilities.
The evolutionary success of an organism depends on its ability to continually adapt to changes in the patterns of constant, periodic, and transient challenges within its environment. This process of ‘niche adaptation’ requires reprogramming of the organism's environmental response networks by reorganizing interactions among diverse parts including environmental sensors, signal transducers, and transcriptional and post-transcriptional regulators. Gene duplications have been discovered to be one of the principal strategies in this process, especially for reprogramming of gene regulatory networks (GRNs). Whereas eukaryotes require dozens of factors for recruitment of RNA polymerase, archaea require just two general transcription factors (GTFs) that are orthologous to eukaryotic TFIIB (TFB in archaea) and TATA-binding protein (TBP) (Bell et al, 1998). Both of these GTFs have expanded extensively in nearly 50% of all archaea whose genomes have been fully sequenced. The phylogenetic analysis presented in this study reveals lineage-specific expansions of TFBs, suggesting that they might encode functionally specialized gene regulatory programs for the unique environments to which these organisms have adapted. This hypothesis is particularly appealing when we consider that the greatest expansion is observed within the group of halophilic archaea whose habitats are associated with routine and dynamic changes in a number of environmental factors including light, temperature, oxygen, salinity, and ionic composition (Rodriguez-Valera, 1993; Litchfield, 1998).
We have previously demonstrated that variations in the expanded set of TFBs (a through e) in Halobacterium salinarum NRC-1 manifest at the level of physical interactions within and across the two families, their DNA-binding specificity, their differential regulation in varying environments, and, ultimately, on the large-scale segregation of transcription of all genes into overlapping yet distinct sets of functionally related groups (Facciotti et al, 2007). We have extended findings from this earlier study with a systematic survey of the fitness consequences of perturbing the TFB network of H. salinarum NRC-1 across 17 environments. Notably, each TFB conferred fitness in two or more environmental conditions tested, and the relative fitness contributions (see Table I) of the five TFBs varied significantly by environment. From an evolutionary perspective, the relationships among these fitness landscapes reveal that two classes of TFBs (c/g- and f-type) appear to have played an important role in the evolution of halophilic archaea by overseeing regulation of core physiological capabilities in these organisms. TFBs of the other clades (b/d and a/e) seem to have emerged much more recently through gene duplications or horizontal gene transfers (HGTs) and are being utilized for adaptation to specialized environmental conditions.
We also investigated higher-order functional interactions and relationships among the duplicated TFBs by performing competition experiments and by mapping genetic interactions in different environments. This demonstrated that depending on environmental context, the TFBs have strikingly different functional hierarchies and genetic interactions with one another. This is remarkable as it makes each TFB essential albeit at different times in a dynamically changing environment.
In order to understand the process by which such gene family expansions shape architecture and functioning of a GRN, we performed integrated analysis of phylogeny, physical interactions, regulation, and fitness landscapes of the seven TFBs in H. salinarum NRC-1. This revealed that evolution of both their protein-coding sequence and their promoter has been instrumental in the encoding of environment-specific regulatory programs. Importantly, the convergent and divergent evolution of regulation and binding properties of TFBs suggested that, aside from HGT and random mutations, a third plausible (and perhaps most interesting) mechanism for acquiring a novel TFB variant is through gene conversion. To test this hypothesis, we synthesized a novel TFBx by transferring TFBa/e clade-specific residues to a TFBd backbone, transformed this variant under the control of either the TFBd or the TFBe promoter (PtfbD or PtfbE) into three different host genetic backgrounds (Δura3 (parent), ΔtfbD, and ΔtfbE), and analyzed fitness and gene expression patterns during growth at 25 and 37°C. This showed that gene conversion events spanning the coding sequence and the promoter, environmental context, and genetic background of the host are all extremely influential in the functional integration of a TFB into the GRN. Importantly, this analysis suggested that altering the regulation of an existing set of expanded TFBs might be an efficient mechanism to reprogram the GRN to rapidly generate novel niche adaptation capability. We have confirmed this experimentally by increasing fitness merely by moving tfbE to PtfbD control, and by generating a completely novel phenotype (biofilm-like appearance) by overexpression of tfbE.
Altogether this study clearly demonstrates that archaea can rapidly generate novel niche adaptation programs by simply altering regulation of duplicated TFBs. This is significant because expansions in the TFB family are widespread in archaea, a class of organisms that not only represent 20% of biomass on earth but are also known to have colonized some of the most extreme environments (DeLong and Pace, 2001). This strategy for niche adaptation is further expanded through interactions of the multiple TFBs with members of other expanded TF families such as TBPs (Facciotti et al, 2007) and sequence-specific regulators (e.g. Lrp family (Peeters and Charlier, 2010)). This is analogous to combinatorial solutions for other complex biological problems such as recognition of pathogens by Toll-like receptors (Roach et al, 2005), generation of antibody diversity by V(D)J recombination (Early et al, 1980), and recognition and processing of odors (Malnic et al, 1999).
Numerous lineage-specific expansions of the transcription factor B (TFB) family in archaea suggest an important role for expanded TFBs in encoding environment-specific gene regulatory programs. Given the characteristics of hypersaline lakes, the unusually large numbers of TFBs in halophilic archaea further suggest that they might be especially important in rapid adaptation to the challenges of a dynamically changing environment. Motivated by these observations, we have investigated the implications of TFB expansions by correlating sequence variations, regulation, and physical interactions of all seven TFBs in Halobacterium salinarum NRC-1 to their fitness landscapes, functional hierarchies, and genetic interactions across 2488 experiments covering combinatorial variations in salt, pH, temperature, and Cu stress. This systems analysis has revealed an elegant scheme in which completely novel fitness landscapes are generated by gene conversion events that introduce subtle changes to the regulation or physical interactions of duplicated TFBs. Based on these insights, we have introduced a synthetically redesigned TFB and altered the regulation of existing TFBs to illustrate how archaea can rapidly generate novel phenotypes by simply reprogramming their TFB regulatory network.
doi:10.1038/msb.2011.87
PMCID: PMC3261711  PMID: 22108796
evolution by gene family expansion; fitness; niche adaptation; reprogramming of gene regulatory network; transcription factor B
16.  Global and Local Architecture of the Mammalian microRNA–Transcription Factor Regulatory Network 
PLoS Computational Biology  2007;3(7):e131.
microRNAs (miRs) are small RNAs that regulate gene expression at the posttranscriptional level. It is anticipated that, in combination with transcription factors (TFs), they span a regulatory network that controls thousands of mammalian genes. Here we set out to uncover local and global architectural features of the mammalian miR regulatory network. Using evolutionarily conserved potential binding sites of miRs in human targets, and conserved binding sites of TFs in promoters, we uncovered two regulation networks. The first depicts combinatorial interactions between pairs of miRs with many shared targets. The network reveals several levels of hierarchy, whereby a few miRs interact with many other lowly connected miR partners. We revealed hundreds of “target hub” genes, each potentially subject to massive regulation by dozens of miRs. Interestingly, many of these target hub genes are transcription regulators and they are often related to various developmental processes. The second network consists of miR–TF pairs that coregulate large sets of common targets. We discovered that the network consists of several recurring motifs. Most notably, in a significant fraction of the miR–TF coregulators the TF appears to regulate the miR, or to be regulated by the miR, forming a diversity of feed-forward loops. Together these findings provide new insights on the architecture of the combined transcriptional–post transcriptional regulatory network.
Author Summary
It is becoming increasingly appreciated that a new type of gene which does not code for proteins, the regulatory RNAs, constitutes a considerable portion of mammalian genomes, and these genes serve as key players in the regulatory network of living cells. Among these regulatory RNAs are the microRNAs (miRs), small RNAs that mediate posttranscriptional gene silencing through inhibition of protein production or degradation of mRNAs. So far little is known about the extent of regulation by miRs, and their potential cooperation with other regulatory layers in the network. We investigated the potential crosstalk between the miR-mediated posttranscription layer, and the transcriptional regulation layer, whose dominant players, the transcription factors (TFs), regulate the production of protein-coding mRNAs. We found that the extent of miR regulation varies extensively among different genes, some of which, especially those who serve as regulators themselves, are subject to enhanced miR silencing. Further, we identified thousands of genes that are potentially subjected to coordinated regulation by multiple miRs and by specific combinations of TFs and miRs. The regulatory network, comprising transcriptional and posttranscriptional regulation, manifests several recurring architectures, one of which consists of a TF and a miR that together regulate a large set of common genes, and that also appear to regulate one another. Altogether this work provides new insights into the logic and evolution of a new regulatory layer of the mammalian genome, and its effect on other regulatory networks in the cell.
doi:10.1371/journal.pcbi.0030131
PMCID: PMC1914371  PMID: 17630826
17.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs, many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences. Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation to downregulate the levels of proteins that drive and coordinate sexual differentiation. Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter’; it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter', maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail' stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter', this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
doi:10.1038/msb.2011.90
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
18.  Impact of nuclear organization and dynamics on epigenetic regulation in the central nervous system: implications for neurological disease states 
Annals of the New York Academy of Sciences  2010;1204(Suppl):E20-E37.
Epigenetic mechanisms that are highly responsive to interoceptive and environmental stimuli mediate the proper execution of complex genomic programs such as cell type-specific gene transcription and post-transcriptional RNA processing and are increasingly thought to be important for modulating the development, homeostasis, and plasticity of the central nervous system (CNS). These epigenetic processes include DNA methylation, histone modifications, and chromatin remodeling, all of which play roles in neural cellular diversity, connectivity, and plasticity. Further, large-scale transcriptomic analyses have revealed that the eukaryotic genome is pervasively transcribed, forming interleaved protein-coding RNAs and regulatory non-protein-coding RNAs (ncRNAs), which act through a broad array of molecular mechanisms. Most of these ncRNAs are transcribed in a cell type- and developmental stage-specific manner in the CNS. A broad array of post-transcriptional processes, such as RNA editing and transport, can modulate the functions of both protein-coding RNAs and ncRNAs. Additional studies implicate nuclear organization and dynamics in mediating epigenetic regulation. The compartmentalization of DNA sequences and other molecular machinery into functional nuclear domains, such as transcription factories, Cajal bodies, promyelocytic leukemia nuclear bodies, nuclear speckles, and paraspeckles, some of which are found prominently in neural cells, is associated with regulation of transcriptional activity and post-transcriptional RNA processing. These observations suggest that genomic architecture and RNA biology in the CNS are much more complex and nuanced than previously appreciated. Increasing evidence now suggests that most, if not all, human CNS diseases are associated with either primary or secondary perturbations in one or more aspects of the epigenome. 
In this review, we provide an update of our emerging understanding of genomic architecture, RNA biology, and nuclear organization and highlight the interconnected roles that deregulation of these factors may play in diverse CNS disorders.
doi:10.1111/j.1749-6632.2010.05718.x
PMCID: PMC2946117  PMID: 20840166
epigenetics; non-coding RNAs; genomic architecture; nuclear organization; RNA editing; RNA trafficking; post-transcriptional processing; epigenetic memory; laminopathies; cohesinopathies; spinal muscular atrophy; nuclear ataxias
19.  Extragenic Accumulation of RNA Polymerase II Enhances Transcription by RNA Polymerase III 
PLoS Genetics  2007;3(11):e212.
Recent genomic data indicate that RNA polymerase II (Pol II) function extends beyond conventional transcription of primarily protein-coding genes. Among the five snRNAs required for pre-mRNA splicing, only the U6 snRNA is synthesized by RNA polymerase III (Pol III). Here we address the question of how Pol II coordinates the expression of spliceosome components, including U6. We used chromatin immunoprecipitation (ChIP) and high-resolution mapping by PCR to localize both Pol II and Pol III to snRNA gene regions. We report the surprising finding that Pol II is highly concentrated ∼300 bp upstream of all five active human U6 genes in vivo. The U6 snRNA, an essential component of the spliceosome, is synthesized by Pol III, whereas all other spliceosomal snRNAs are Pol II transcripts. Accordingly, U6 transcripts were terminated in a Pol III-specific manner, and Pol III localized to the transcribed gene regions. However, synthesis of both U6 and U2 snRNAs was α-amanitin-sensitive, indicating a requirement for Pol II activity in the expression of both snRNAs. Moreover, both Pol II and histone tail acetylation marks were lost from U6 promoters upon α-amanitin treatment. The results indicate that Pol II is concentrated at specific genomic regions from which it can regulate Pol III activity by a general mechanism. Consequently, Pol II coordinates expression of all RNA and protein components of the spliceosome.
Author Summary
During transcription, RNA polymerases synthesize an RNA copy of a given gene. Human genes are transcribed by either RNA polymerase I, II, or III. Here, we focus on transcription of the U6 gene that encodes a small nuclear RNA (snRNA), a non-coding RNA with unique activities in gene expression. The U6 snRNA is transcribed by RNA polymerase III (Pol III); here we report the surprising finding that RNA polymerase II (Pol II) is important for efficient expression of the U6 snRNA. Interestingly, high concentrations of Pol II have been recently observed on genomic regions that are considered outside of transcribed genes. We localized Pol II to a region upstream of the U6 snRNA gene promoters in living cells. Inhibition of Pol II activity decreased U6 snRNA synthesis and was accompanied by a decrease in Pol II accumulation as well as transcription-activating histone modifications, while Pol III remained bound at U6 genes. Thus, Pol II may promote U6 snRNA transcription by facilitating open chromatin formation. Our results provide insight into the extragenic function of Pol II, which can coordinate the expression of all components of the RNA splicing machinery, including U6 snRNA.
doi:10.1371/journal.pgen.0030212
PMCID: PMC2082468  PMID: 18039033
20.  Comparative transcriptomics of pathogenic and non-pathogenic Listeria species 
Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome as well as conserved, widespread regulatory structures called ‘Excludons', which mediate regulation through long non-coding antisense RNAs.
Comparative transcriptome sequencing of two closely related bacterial strains reveals a hidden layer of divergence in the non-coding genome. Pathogen-specific non-coding RNAs, which might contribute to virulence, are revealed. The Listeria genome contains a class of unusually long antisense RNAs (lasRNAs) which span divergent genes and repress expression of the genes located opposite to them while activating the other. The genetic organization of these lasRNAs and operons was named an excludon. The exhaustive transcriptome information from this publication is provided as an open resource with a web-accessible transcriptome browser.
Listeria monocytogenes is a human, food-borne pathogen. Genomic comparisons between L. monocytogenes and Listeria innocua, a closely related non-pathogenic species, were pivotal in the identification of protein-coding genes essential for virulence. However, no comprehensive comparison has focused on the non-coding genome. We used strand-specific cDNA sequencing to produce genome-wide transcription start site maps for both organisms, and developed a publicly available integrative browser to visualize and analyze both transcriptomes in different growth conditions and genetic backgrounds. Our data revealed conservation across most transcripts, but significant divergence between the species in a subset of non-coding RNAs. In L. monocytogenes, we identified 113 small RNAs (33 novel) and 70 antisense RNAs (53 novel), significantly increasing the repertoire of ncRNAs in this species. Remarkably, we identified a class of long antisense transcripts (lasRNAs) that overlap one gene while also serving as the 5′ UTR of the adjacent divergent gene. Experimental evidence suggests that lasRNA transcription inhibits expression of one operon while activating the expression of another. Such a lasRNA/operon structure, which we named ‘excludon’, might represent a novel form of regulation in bacteria.
doi:10.1038/msb.2012.11
PMCID: PMC3377988  PMID: 22617957
comparative genomics; Listeria monocytogenes; RNA-seq; transcriptome; TSS map
21.  Genomic and Transcriptional Co-Localization of Protein-Coding and Long Non-Coding RNA Pairs in the Developing Brain 
PLoS Genetics  2009;5(8):e1000617.
Besides protein-coding mRNAs, eukaryotic transcriptomes include many long non-protein-coding RNAs (ncRNAs) of unknown function that are transcribed away from protein-coding loci. Here, we have identified 659 intergenic long ncRNAs whose genomic sequences individually exhibit evolutionary constraint, a hallmark of functionality. Of this set, those expressed in the brain are more frequently conserved and are significantly enriched with predicted RNA secondary structures. Furthermore, brain-expressed long ncRNAs are preferentially located adjacent to protein-coding genes that are (1) also expressed in the brain and (2) involved in transcriptional regulation or in nervous system development. This led us to the hypothesis that spatiotemporal co-expression of ncRNAs and nearby protein-coding genes represents a general phenomenon, a prediction that was confirmed subsequently by in situ hybridisation in developing and adult mouse brain. We provide the full set of constrained long ncRNAs as an important experimental resource and present, for the first time, substantive and predictive criteria for prioritising long ncRNA and mRNA transcript pairs when investigating their biological functions and contributions to development and disease.
Author Summary
Virtually all of the eukaryotic genome is transcribed, yet far from all transcripts encode protein. Very little is known about the functions of most non-coding transcripts or, indeed, whether they convey functions at all. Among all such transcripts, we have chosen to consider long non-coding RNAs (ncRNAs) that are transcribed outside of known protein-coding gene loci. Our approach has focused on mouse long ncRNAs whose genomic sequences are conserved in humans, and also on ncRNAs that are expressed in the brain. This conservation might reflect the functionality of the underlying DNA, rather than the ncRNA, sequence. However, this cannot fully explain the concentration of predicted RNA structures in these ncRNAs. These long ncRNAs also tend to be transcribed in the genomic neighbourhood of protein-coding genes whose functions relate to transcription or to nervous system development. These observations are consistent with the positive transcriptional regulation in cis of these genes with nearby transcription of ncRNAs. This model implies co-expression of protein-coding and noncoding transcripts, a hypothesis that we validated experimentally. These findings are particularly important because they provide a rationale for prioritising specific ncRNAs when experimentally investigating regulation of protein-coding gene expression.
doi:10.1371/journal.pgen.1000617
PMCID: PMC2722021  PMID: 19696892
22.  Dynamic Expression of Long Non-Coding RNAs (lncRNAs) in Adult Zebrafish 
PLoS ONE  2013;8(12):e83616.
Long non-coding RNAs (lncRNAs) represent an assorted class of transcripts having little or no protein coding capacity and have recently gained importance for their function as regulators of gene expression. Molecular studies on lncRNAs have uncovered multifaceted interactions with protein coding genes. It has been suggested that lncRNAs are an additional layer of regulatory switches involved in gene regulation during development and disease. LncRNAs expressed in specific tissues or cell types during adult stages can have potential roles in form, function, maintenance and repair of tissues and organs. We used RNA sequencing followed by computational analysis to identify tissue restricted lncRNA transcript signatures from five different tissues of adult zebrafish. The present study reports 442 predicted lncRNA transcripts from adult zebrafish tissues, of which 419 were novel lncRNA transcripts. Of these, 77 lncRNAs show predominant tissue restricted expression across the five major tissues investigated. Adult zebrafish brain expressed the largest number of tissue restricted lncRNA transcripts followed by cardiovascular tissue. We also validated the tissue restricted expression of a subset of lncRNAs using independent methods. Our data constitute a useful genomic resource towards understanding the expression of lncRNAs in various tissues in adult zebrafish. Our study is thus a starting point and opens a way towards discovering new molecular interactions of gene expression within the specific adult tissues in the context of maintenance of organ form and function.
doi:10.1371/journal.pone.0083616
PMCID: PMC3877055  PMID: 24391796
23.  Decoding a Signature-Based Model of Transcription Cofactor Recruitment Dictated by Cardinal Cis-Regulatory Elements in Proximal Promoter Regions 
PLoS Genetics  2013;9(11):e1003906.
Genome-wide maps of DNase I hypersensitive sites (DHSs) reveal that most human promoters contain perpetually active cis-regulatory elements between −150 bp and +50 bp (−150/+50 bp) relative to the transcription start site (TSS). Transcription factors (TFs) recruit cofactors (chromatin remodelers, histone/protein-modifying enzymes, and scaffold proteins) to these elements in order to organize the local chromatin structure and coordinate the balance of post-translational modifications nearby, contributing to the overall regulation of transcription. However, the rules of TF-mediated cofactor recruitment to the −150/+50 bp promoter regions remain poorly understood. Here, we provide evidence for a general model in which a series of cis-regulatory elements (here termed ‘cardinal’ motifs) prefer acting individually, rather than in fixed combinations, within the −150/+50 bp regions to recruit TFs that dictate cofactor signatures distinctive of specific promoter subsets. Subsequently, human promoters can be subclassified based on the presence of cardinal elements and their associated cofactor signatures. In this study, furthermore, we have focused on promoters containing the nuclear respiratory factor 1 (NRF1) motif as the cardinal cis-regulatory element and have identified the pervasive association of NRF1 with the cofactor lysine-specific demethylase 1 (LSD1/KDM1A). This signature might be distinctive of promoters regulating nuclear-encoded mitochondrial and other particular genes in at least some cells. Together, we propose that decoding a signature-based, expanded model of control at proximal promoter regions should lead to a better understanding of coordinated regulation of gene transcription.
Author Summary
Human cells exploit different mechanisms to coordinate the expression of both protein-coding and non-coding RNAs. Elucidating these mechanisms is essential to understanding normal physiology and disease. In our attempt to identify new regulatory layers acting particularly at proximal promoters, we have computationally analyzed the genomic sequences located from −150 bp to +50 bp relative to the transcriptional start site (TSS), which are often at the center of ‘open’ chromatin regions in human promoters. We have confirmed the presence of a series of cis-regulatory elements (here referred to as ‘cardinal’ motifs) that show a strong preference for these short regions. Interestingly, these elements tend to act independently rather than in fixed combinations. Therefore, we propose that they confer unique regulatory features to the human promoter subsets that contain each of these particular elements. In agreement with this model, we have identified a large repertoire of preferential partnerships between transcription factors recognizing cardinal motifs and their associated proteins (cofactors), thus decoding a signature-based model that distinguishes distinctive regulatory types of promoters based on cardinal motifs. These signatures may underlie a new layer of transcriptional regulation to orchestrate coordinated gene expression in human promoters.
doi:10.1371/journal.pgen.1003906
PMCID: PMC3820735  PMID: 24244184
24.  Epigenetic Control of the Genome—Lessons from Genomic Imprinting 
Genes  2014;5(3):635-655.
Epigenetic mechanisms modulate genome function by writing, reading and erasing chromatin structural features. These have an impact on gene expression, contributing to the establishment, maintenance and dynamic changes in cellular properties in normal and abnormal situations. Great effort has recently been undertaken to catalogue the genome-wide patterns of epigenetic marks—creating reference epigenomes—which will deepen our understanding of their contributions to genome regulation and function with the promise of revealing further insights into disease etiology. The foundation for these global studies is the smaller scale experimentally-derived observations and questions that have arisen through the study of epigenetic mechanisms in model systems. One such system is genomic imprinting, a process causing the mono-allelic expression of genes in a parental-origin specific manner controlled by a hierarchy of epigenetic events that have taught us much about the dynamic interplay between key regulators of epigenetic control. Here, we summarize some of the most noteworthy lessons that studies on imprinting have revealed about epigenetic control on a wider scale. Specifically, we will consider what these studies have revealed about: the variety of relationships between DNA methylation and transcriptional control; the regulation of important protein-DNA interactions by DNA methylation; the interplay between DNA methylation and histone modifications; and the regulation and functions of long non-coding RNAs.
doi:10.3390/genes5030635
PMCID: PMC4198922  PMID: 25257202
Epigenetics; imprinting; gene expression; gene regulation; CTCF; long non-coding RNA; histone modifications; DNA methylation
25.  Two Distinct Repressive Mechanisms for Histone 3 Lysine 4 Methylation through Promoting 3′-End Antisense Transcription 
PLoS Genetics  2012;8(9):e1002952.
Histone H3 di- and trimethylation on lysine 4 are major chromatin marks that correlate with active transcription. The influence of these modifications on transcription itself is, however, poorly understood. We have investigated the roles of H3K4 methylation in Saccharomyces cerevisiae by determining genome-wide expression-profiles of mutants in the Set1 complex, COMPASS, that lays down these marks. Loss of H3K4 trimethylation has virtually no effect on steady-state or dynamically-changing mRNA levels. Combined loss of H3K4 tri- and dimethylation results in steady-state mRNA upregulation and delays in the repression kinetics of specific groups of genes. COMPASS-repressed genes have distinct H3K4 methylation patterns, with enrichment of H3K4me3 at the 3′-end, indicating that repression is coupled to 3′-end antisense transcription. Further analyses reveal that repression is mediated by H3K4me3-dependent 3′-end antisense transcription in two ways. For a small group of genes including PHO84, repression is mediated by a previously reported trans-effect that requires the antisense transcript itself. For the majority of COMPASS-repressed genes, however, it is the process of 3′-end antisense transcription itself that is the important factor for repression. Strand-specific qPCR analyses of various mutants indicate that this more prevalent mechanism of COMPASS-mediated repression requires H3K4me3-dependent 3′-end antisense transcription to lay down H3K4me2, which seems to serve as the actual repressive mark. Removal of the 3′-end antisense promoter also results in derepression of sense transcription and renders sense transcription insensitive to the additional loss of SET1. The derepression observed in COMPASS mutants is mimicked by reduction of global histone H3 and H4 levels, suggesting that the H3K4me2 repressive effect is linked to establishment of a repressive chromatin structure. These results indicate that in S. cerevisiae, the non-redundant role of H3K4 methylation by Set1 is repression, achieved through promotion of 3′-end antisense transcription to achieve specific rather than global effects through two distinct mechanisms.
Author Summary
In eukaryotes, DNA is packaged together with histones into nucleosomes. This packaging has a repressive role on gene expression. The N-termini of histones are subject to multiple modifications that affect DNA-dependent processes. The histone modification that has been predominantly linked with active transcription in all eukaryotes is histone H3 lysine 4 (H3K4) methylation. Here we investigate the functional effects of each H3K4 methylation state on transcription. Removal of the mark that is most characteristic for transcription, H3K4 trimethylation, has no effect on coding gene expression, in steady-state or dynamically changing conditions. Combined loss of H3K4 tri- and di-methylation does have an effect and leads to loss of repression of specific genes, the opposite of what is expected for global marks of active genes. The affected genes have antisense transcription. We show that there are two separate mechanisms through which H3K4 methylation represses transcription of protein-coding genes, one through antisense transcripts and one through the process of antisense transcription. In summary, we show how a general mark of active transcription can have specific repressive effects that are themselves also linked to repression through nucleosomes.
doi:10.1371/journal.pgen.1002952
PMCID: PMC3447963  PMID: 23028359
