Related Articles

1.  RNA-DNA differences are rarer in proto-oncogenes than in tumor suppressor genes 
Scientific Reports  2012;2:245.
It has long been assumed that DNA sequences and corresponding RNA transcripts are almost identical; a recent discovery, however, revealed widespread RNA-DNA differences (RDDs), which represent a largely unexplored aspect of human genome variation. It has been speculated that RDDs can affect disease susceptibility and manifestations; however, almost nothing is known about how RDDs are related to disease. Here, we show that RDDs are rarer in proto-oncogenes than in tumor suppressor genes; the number of RDDs in coding exons, but not in the 3′UTR or 5′UTR, is significantly lower in the former than in the latter, and this trend is especially pronounced for non-synonymous RDDs, i.e., those that cause amino acid changes. A potential mechanism is that, unlike proto-oncogenes, tumor suppressor genes must have both alleles affected to cause a tumor, which ‘buffers’ these genes and lets them tolerate more RDDs.
PMCID: PMC3270091  PMID: 22355757
2.  Detection Theory in Identification of RNA-DNA Sequence Differences Using RNA-Sequencing 
PLoS ONE  2014;9(11):e112040.
Advances in sequencing technology have allowed for detailed analyses of the transcriptome at single-nucleotide resolution, facilitating the study of RNA editing or sequence differences between RNA and DNA genome-wide. In humans, two types of post-transcriptional RNA editing processes are known to occur: A-to-I deamination by ADAR and C-to-U deamination by APOBEC1. In addition to these sequence differences, researchers have reported the existence of all 12 types of RNA-DNA sequence differences (RDDs); however, the validity of these claims is debated, as many studies claim that technical artifacts account for the majority of these non-canonical sequence differences. In this study, we used a detection theory approach to evaluate the performance of RNA-Sequencing (RNA-Seq) and associated aligners in accurately identifying RNA-DNA sequence differences. By generating simulated RNA-Seq datasets containing RDDs, we assessed the effect of alignment artifacts and sequencing error on the sensitivity and false discovery rate of RDD detection. Overall, we found that even in the presence of sequencing errors, false negative and false discovery rates of RDD detection can be contained below 10% with relatively lenient thresholds. We also assessed the ability of various filters to target false positive RDDs and found them to be effective in discriminating between true and false positives. Lastly, we used the optimal thresholds we identified from our simulated analyses to identify RDDs in a human lymphoblastoid cell line. We found approximately 6,000 RDDs, the majority of which are A-to-G edits and likely to be mediated by ADAR. Moreover, we found the majority of non A-to-G RDDs to be associated with poorer alignments and conclude from these results that the evidence for widespread non-canonical RDDs in humans is weak. Overall, we found RNA-Seq to be a powerful technique for surveying RDDs genome-wide when coupled with the appropriate thresholds and filters.
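The evaluation described in this abstract rests on comparing the RDD sites planted in a simulated dataset with the sites a caller reports. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `detection_metrics`, the site encoding, and the toy site lists are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical site encoding: (chromosome, position, "REF>ALT" change).
Site = Tuple[str, int, str]


def detection_metrics(true_sites: Set[Site], called_sites: Set[Site]) -> Dict[str, float]:
    """Compare simulated (true) RDD sites with called sites.

    Returns sensitivity, false negative rate, and false discovery rate.
    Rates are reported as 0.0 when their denominator is zero.
    """
    true_positives = len(true_sites & called_sites)
    false_negatives = len(true_sites - called_sites)
    false_positives = len(called_sites - true_sites)

    n_true = true_positives + false_negatives
    n_called = true_positives + false_positives

    sensitivity = true_positives / n_true if n_true else 0.0
    false_negative_rate = false_negatives / n_true if n_true else 0.0
    false_discovery_rate = false_positives / n_called if n_called else 0.0

    return {
        "sensitivity": sensitivity,
        "false_negative_rate": false_negative_rate,
        "false_discovery_rate": false_discovery_rate,
    }


# Toy data (made up): four planted RDDs, three recovered, one spurious call.
simulated_truth = {
    ("chr1", 100, "A>G"),
    ("chr1", 250, "A>G"),
    ("chr2", 75, "C>T"),
    ("chr3", 900, "A>G"),
}
aligner_calls = {
    ("chr1", 100, "A>G"),
    ("chr1", 250, "A>G"),
    ("chr2", 75, "C>T"),
    ("chr5", 42, "G>T"),
}

metrics = detection_metrics(simulated_truth, aligner_calls)
print(metrics)
```

In a setting like the one the abstract describes, such rates would presumably be recomputed for each candidate threshold or filter, and the thresholds that keep both the false negative and false discovery rates acceptably low would be retained.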
PMCID: PMC4232354  PMID: 25396741
3.  In-vitro and in-vivo phenotype of type Asia 1 foot-and-mouth disease viruses utilizing two non-RGD receptor recognition sites 
BMC Microbiology  2011;11:154.
Foot-and-mouth disease virus (FMDV) uses a highly conserved Arg-Gly-Asp (RGD) triplet for attachment to host cells and this motif is believed to be essential for virus viability. Previous sequence analyses of the 1D-encoding region of an FMDV field isolate (Asia1/JS/CHA/05) and its two derivatives indicated that two viruses, which contained an Arg-Asp-Asp (RDD) or an Arg-Ser-Asp (RSD) triplet instead of the RGD integrin recognition motif, were generated serendipitously upon short-term evolution of field isolate in different biological environments. To examine the influence of single amino acid substitutions in the receptor binding site of the RDD-containing FMD viral genome on virus viability and the ability of non-RGD FMDVs to cause disease in susceptible animals, we constructed an RDD-containing FMDV full-length cDNA clone and derived mutant molecules with RGD or RSD receptor recognition motifs. Following transfection of BSR cells with the full-length genome plasmids, the genetically engineered viruses were examined for their infectious potential in cell culture and susceptible animals.
Amino acid sequence analysis of the 1D-coding region of different derivatives derived from the Asia1/JS/CHA/05 field isolate revealed that the RDD mutants became dominant or achieved population equilibrium with coexistence of the RGD and RSD subpopulations at an early phase of type Asia1 FMDV quasispecies evolution. Furthermore, the RDD and RSD sequences remained genetically stable for at least 20 passages. Using reverse genetics, the RDD-, RSD-, and RGD-containing FMD viruses were rescued from full-length cDNA clones, and single amino acid substitution in RDD-containing FMD viral genome did not affect virus viability. The genetically engineered viruses replicated stably in BHK-21 cells and had similar growth properties to the parental virus. The RDD parental virus and two non-RGD recombinant viruses were virulent to pigs and bovines that developed typical clinical disease and viremia.
FMDV quasispecies evolving in a different biological environment gained the capability of selecting different receptor recognition sites. The RDD-containing FMD viral genome can accommodate substitutions in the receptor binding site without additional changes in the capsid. The viruses expressing non-RGD receptor binding sites can replicate stably in vitro and produce typical FMD clinical disease in susceptible animals.
PMCID: PMC3224205  PMID: 21711567
4.  Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control 
Combining translating ribosome affinity purification with RNA-seq for cell-specific profiling of translating RNAs in developing flowers.
Cell type comparisons of cell type-specific hormone responses, promoter motifs, coexpressed cognate binding factor candidates, and splicing isoforms.
Widespread post-transcriptional regulation at both the intron splicing and translational stages.
A new class of noncoding RNAs associated with polysomes.
What constitutes a differentiated cell type? How much do cell types differ in their transcription of genes? The development and functions of tissues rely on constant interactions among distinct and nonequivalent cell types. Answering these questions will require quantitative information on transcriptomes, proteomes, protein–protein interactions, protein–nucleic acid interactions, and metabolomes at cellular resolution. The systems approaches emerging in biology promise to explain properties of biological systems based on genome-wide measurements of expression, interaction, regulation, and metabolism. To facilitate a systems approach, it is essential first to capture such components in a global manner, ideally at cellular resolution.
Recently, microarray analysis of transcriptomes has been extended to a cellular level of resolution by using laser microdissection or fluorescence-activated sorting (for review, see Nelson et al, 2008). These methods have been limited by stresses associated with cellular separation and isolation procedures, and biases associated with mandatory RNA amplification steps. A newly developed method, translating ribosome affinity purification (TRAP; Zanetti et al, 2005; Heiman et al, 2008; Mustroph et al, 2009), circumvents these problems by epitope-tagging a ribosomal protein in specific cellular domains to selectively purify polysomes. We combined TRAP with deep sequencing, which we term TRAP-seq, to provide cell-level spatiotemporal maps for Arabidopsis early floral development at single-base resolution.
Flower development in Arabidopsis has been studied extensively and is one of the best understood aspects of plant development (for review, see Krizek and Fletcher, 2005). Genetic analysis of homeotic mutants established the ABC model, in which three classes of regulatory genes, A, B and C, work in a combinatorial manner to confer organ identities of four whorls (Coen and Meyerowitz, 1991). Each class of regulatory gene is expressed in a specific and evolutionarily conserved domain, and the action of the class A, B and C genes is necessary for specification of organ identity (Figure 1A).
Using TRAP-seq, we purified cell-specific translating mRNA populations, which we and others call the translatome, from the A, B and C domains of early developing flowers, in which floral patterning and the specification of floral organs is established. To achieve temporal specificity, we used a floral induction system to facilitate collection of early stage flowers (Wellmer et al, 2006). The combination of TRAP-seq with domain-specific promoters and this floral induction system enabled fine spatiotemporal isolation of translating mRNA in specific cellular domains, and at specific developmental stages.
Multiple lines of evidence confirmed the specificity of this approach, including detecting the expression in expected domains but not in other domains for well-studied flower marker genes and known physiological functions (Figures 1B–D and 2A–C). Furthermore, we provide numerous examples from flower development in which a spatiotemporal map of rigorously comparable cell-specific translatomes makes possible new views of the properties of cell domains not evident in data obtained from whole organs or tissues, including patterns of transcription and cis-regulation, new physiological differences among cell domains and between flower stages, putative hormone-active centers, and splicing events specific for flower domains (Figure 2A–D). Such findings may provide new targets for reverse genetics studies and may aid in the formulation and validation of interaction and pathway networks.
Besides cellular heterogeneity, the transcriptome is regulated at several steps through the life of mRNA molecules, which are not directly accessible through traditional transcriptome profiling of total mRNA abundance. By comparing the translatome and transcriptome, we integratively profiled two key posttranscriptional control points, intron splicing and translation state. From our translatome-wide profiling, we (i) confirmed that both posttranscriptional regulation control points were used by a large portion of the transcriptome; (ii) identified a number of cis-acting features within the coding or noncoding sequences that correlate with splicing or translation state; and (iii) revealed correlation between each regulation mechanism and gene function. Our transcriptome-wide surveys have highlighted target genes whose transcripts are probably under extensive posttranscriptional regulation during flower development.
Finally, we reported the finding of a large number of polysome-associated ncRNAs. About one-third of all annotated ncRNA in the Arabidopsis genome were observed co-purified with polysomes. Coding capacity analysis confirmed that most of them are real ncRNA without conserved ORFs. The group of polysome-associated ncRNA reported in this study is a potential new addition to the expanding riboregulator catalog; they could have roles in translational regulation during early flower development.
Determining both the expression levels of mRNA and the regulation of its translation is important in understanding specialized cell functions. In this study, we describe both the expression profiles of cells within spatiotemporal domains of the Arabidopsis thaliana flower and the post-transcriptional regulation of these mRNAs, at nucleotide resolution. We express a tagged ribosomal protein under the promoters of three master regulators of flower development. By precipitating tagged polysomes, we isolated cell type-specific mRNAs that are probably translating, and quantified those mRNAs through deep sequencing. Cell type comparisons identified known cell-specific transcripts and uncovered many new ones, from which we inferred cell type-specific hormone responses, promoter motifs and coexpressed cognate binding factor candidates, and splicing isoforms. By comparing translating mRNAs with steady-state overall transcripts, we found evidence for widespread post-transcriptional regulation at both the intron splicing and translational stages. Sequence analyses identified structural features associated with each step. Finally, we identified a new class of noncoding RNAs associated with polysomes. Findings from our profiling lead to new hypotheses in the understanding of flower development.
PMCID: PMC2990639  PMID: 20924354
Arabidopsis; flower; intron; transcriptome; translation
5.  Molecular Evidence of RNA Editing in Bombyx Chemosensory Protein Family 
PLoS ONE  2014;9(2):e86932.
Chemosensory proteins (CSPs) are small scavenger proteins that are mainly known as transporters of pheromone/odor molecules at the periphery of sensory neurons in the insect antennae and in the producing cells from the moth female pheromone gland.
Sequencing cDNAs of RNA encoding CSPs in the antennae, legs, head, pheromone gland and wings from five individual adult females of the silkworm moth Bombyx mori showed that they differed from genomic sequences by subtle nucleotide replacements (RDDs). Both intronless and intron-containing CSP genes expressed RDDs, although at different rates. Most interestingly, the degree of RDDs in CSP genes was found to be tissue-specific: the proportion of CSP-RDDs was significantly higher in the pheromone gland. In addition, Western blot analysis of proteins in different tissues showed the existence of multiple CSP protein variant chains, particularly in the pheromone gland. Peptide sequencing demonstrated the occurrence of a pleiad of protein variants for most BmorCSPs from the pheromone gland. Our findings show that RNA editing is an important feature in the expression of CSPs and that a wide variety of RDDs drastically expands, and thus alters, the repertoire of CSP proteins in a tissue-specific manner.
PMCID: PMC3923736  PMID: 24551045
6.  Very Few RNA and DNA Sequence Differences in the Human Transcriptome 
PLoS ONE  2011;6(10):e25842.
RNA editing is an important cellular process by which the nucleotides in a mature RNA transcript are altered to cause them to differ from the corresponding DNA sequence. While this process yields essential transcripts in humans and other organisms, it is believed to occur at a relatively small number of loci. The rarity of RNA editing has been challenged by a recent comparison of human RNA and DNA sequence data from 27 individuals, which revealed that over 10,000 human exonic sites appear to exhibit RNA-DNA differences (RDDs). Many of these differences could not have been caused by either of the two previously known human RNA editing mechanisms—ADAR-mediated A→G substitutions or APOBEC1-mediated C→U switches—suggesting that a previously unknown mechanism of RNA editing may be active in humans. Here, we reanalyze these data and demonstrate that genomic sequences exist in these same individuals or in the human genome that match the majority of RDDs. Our results suggest that the majority of these RDD events were observed due to accurate transcription of sequences paralogous to the apparently edited gene but differing at the edited site. In light of our results it seems prudent to conclude that if indeed an unknown mechanism is causing RDD events in humans, such events occur at a much lower frequency than originally proposed.
PMCID: PMC3192132  PMID: 22022455
7.  Integrative Deep Sequencing of the Mouse Lung Transcriptome Reveals Differential Expression of Diverse Classes of Small RNAs in Response to Respiratory Virus Infection 
mBio  2011;2(6):e00198-11.
We previously reported widespread differential expression of long non-protein-coding RNAs (ncRNAs) in response to virus infection. Here, we expanded the study through small RNA transcriptome sequencing analysis of the host response to both severe acute respiratory syndrome coronavirus (SARS-CoV) and influenza virus infections across four founder mouse strains of the Collaborative Cross, a recombinant inbred mouse resource for mapping complex traits. We observed differential expression of over 200 small RNAs of diverse classes during infection. A majority of identified microRNAs (miRNAs) showed divergent changes in expression across mouse strains with respect to SARS-CoV and influenza virus infections and responded differently to a highly pathogenic reconstructed 1918 virus compared to a minimally pathogenic seasonal influenza virus isolate. Novel insights into miRNA expression changes, including the association with pathogenic outcomes and large differences between in vivo and in vitro experimental systems, were further elucidated by a survey of selected miRNAs across diverse virus infections. The small RNAs identified also included many non-miRNA small RNAs, such as small nucleolar RNAs (snoRNAs), in addition to nonannotated small RNAs. An integrative sequencing analysis of both small RNAs and long transcripts from the same samples showed that the results revealing differential expression of miRNAs during infection were largely due to transcriptional regulation and that the predicted miRNA-mRNA network could modulate global host responses to virus infection in a combinatorial fashion. These findings represent the first integrated sequencing analysis of the response of host small RNAs to virus infection and show that small RNAs are an integrated component of complex networks involved in regulating the host response to infection.
Most studies examining the host transcriptional response to infection focus only on protein-coding genes. However, mammalian genomes transcribe many short and long non-protein-coding RNAs (ncRNAs). With the advent of deep-sequencing technologies, systematic transcriptome analysis of the host response, including analysis of ncRNAs of different sizes, is now possible. Using this approach, we recently discovered widespread differential expression of host long (>200 nucleotide [nt]) ncRNAs in response to virus infection. Here, the samples described in the previous report were again used, but we sequenced another fraction of the transcriptome to study very short (about 20 to 30 nt) ncRNAs. We demonstrated that virus infection also altered expression of many short ncRNAs of diverse classes. Putting the results of the two studies together, we show that small RNAs may also play an important role in regulating the host response to virus infection.
PMCID: PMC3221602  PMID: 22086488
8.  Mutations in a Novel, Cryptic Exon of the Luteinizing Hormone/Chorionic Gonadotropin Receptor Gene Cause Male Pseudohermaphroditism 
PLoS Medicine  2008;5(4):e88.
Male pseudohermaphroditism, or Leydig cell hypoplasia (LCH), is an autosomal recessive disorder in individuals with a 46,XY karyotype, characterized by a predominantly female phenotype, a blind-ending vagina, absence of breast development, primary amenorrhea, and the presence of testicular structures. It is caused by mutations in the luteinizing hormone/chorionic gonadotropin receptor gene (LHCGR), which impair either LH/CG binding or signal transduction. However, molecular analysis has revealed that the LHCGR is apparently normal in about 50% of patients with the full clinical phenotype of LCH. We therefore searched the LHCGR for novel genomic elements causative for LCH.
Methods and Findings
In the present study we have identified a novel, primate-specific bona fide exon (exon 6A) within the LHCGR gene. It displays composite characteristics of an internal/terminal exon and possesses stop codons triggering nonsense-mediated mRNA decay (NMD) in LHCGR. Transcripts including exon 6A are physiologically highly expressed in human testes and granulosa cells, and result in an intracellular, truncated LHCGR protein of 209 amino acids. We sequenced exon 6A in 16 patients with unexplained LCH and detected mutations in three patients. Functional studies revealed a dramatic increase in the expression of the mutated internal exon 6A transcripts, indicating aberrant NMD. These altered ratios of LHCGR transcripts result in the generation of predominantly nonfunctional LHCGR isoforms, thereby preventing proper expression and functioning.
The identification and characterization of this novel exon not only identifies a new regulatory element within the genomic organization of LHCGR, but also points toward a complex network of receptor regulation, including events at the transcriptional level. These findings add to the molecular diagnostic tools for LCH and extend our understanding of the endocrine regulation of sexual differentiation.
Joerg Gromoll and colleagues describe the identification and characterization of a novel exon that appears to be a new regulatory element within the luteinizing hormone/chorionic gonadotropin receptor gene of three individuals with Leydig cell hypoplasia.
Editors' Summary
A person's sex is determined by their complement of X and Y (sex) chromosomes. Someone who has two X chromosomes is genetically female and usually has ovaries and female external sex organs. Someone who has an X and a Y chromosome is genetically male and has testes and male external sex organs. Sometimes, though, the development of the reproductive organs proceeds abnormally, resulting in a person with an “intersex” condition whose chromosomes, gonads (ovaries or testes), and external sex organs do not correspond. Leydig cell hypoplasia (LCH; also called male pseudohermaphroditism or a disorder of sex development) is an XY female intersex condition. People with this inherited condition develop testes but also have a vagina (which is not connected to a womb), and they do not develop breasts or have periods. This mixture of sexual characteristics arises because the Leydig cells in the testes are underdeveloped. Leydig cells normally secrete testosterone, the hormone that promotes the development and maintenance of male sex characteristics. Before birth, chorionic gonadotropin (CG; a hormone made by the placenta) stimulates Leydig cell development and testosterone production; after birth, luteinizing hormone (LH), which is made by the pituitary gland, stimulates testosterone production. Both hormones bind to the LH/CG receptor, a protein on the surface of Leydig cells. In LCH, this receptor either does not bind CG and LH or fails to tell the Leydig cells to make testosterone.
Why Was This Study Done?
The gene that encodes the LH/CG receptor is called LHCGR. Several mutations (genetic changes) that inactivate the LH/CG receptor have been identified in people with LCH. However, the LHCGR gene is apparently normal in 50% of people with this intersex condition. In this study, the researchers examine the LHCGR gene in detail to try to find the underlying genetic defect in these individuals.
What Did the Researchers Do and Find?
The researchers used several molecular biology techniques to identify a new exon—exon 6A—within the human LHCGR gene. (Exons are DNA sequences that contain the information for making proteins; introns are DNA sequences that interrupt the coding sequence of a gene. Both introns and exons are transcribed into messenger RNA [mRNA] and the exons are then “spliced” together to make the mature mRNA, which is translated into protein.) The researchers identify several differently spliced LHCGR mRNA transcripts that contain exon 6A—a terminal exon 6A mRNA that contains exons 1–6 and exon 6A, and two internal exon 6A mRNAs that also contain exons 7–11. The researchers report that human testes express high levels of the terminal exon 6A transcript, which is translated into a short version of LHCGR protein that remains within the cell (full-length LHCGR moves to the cell surface). By contrast, testes contain low levels of the internal exon 6A mRNAs. This is because exon 6A contains two premature stop codons (DNA sequences that mark the end of a protein), which trigger “nonsense-mediated decay” (NMD), a cellular surveillance mechanism that regulates protein synthesis by degrading mRNAs that contain internal stop codons. When the researchers screened 16 people with LCH but without known mutations in the LHCGR gene, three had mutations in exon 6A. Laboratory experiments show that these mutations greatly increased the amounts of the internal exon 6A transcripts present in cells and interfered with the cells' normal response to chorionic gonadotropin.
What Do These Findings Mean?
These findings identify a new, functional exon in the LHCGR gene and show that mutations in this exon cause some cases of LCH. This is the first time that a human disease has been associated with mutations in an exon that is a target for NMD. In addition, these findings provide important insights into how the LHCGR is regulated. The researchers speculate that a complex network that involves the exon 6A-containing transcripts and NMD normally tightly regulates the production of functional LHCGR already at the transcriptional level. When mutations are present in exon 6A, they suggest, NMD is the predominant pathway for all the exon 6A-containing transcripts, thereby drastically decreasing the amount of functional LHCGR.
Additional Information.
Please access these Web sites via the online version of this summary at
The MedlinePlus Encyclopedia has a page on intersex conditions (in English and Spanish)
Wikipedia has pages on intersexuality and on the LH/CG receptor (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Intersex Society of North America provides information and support for the parents of children with intersex conditions
The Androgen Insensitivity Syndrome Support Group also provides some general information about intersex conditions, including information about LCH and other XY female conditions (in several languages)
Sequence-Structure-Function-Analysis (SSFA), run by a group of researchers in Germany (Leibniz-Institut für Molekulare Pharmakologie; Humboldt-Universität zu Berlin), is a database dealing with the sequence, structure, and function of glycoprotein hormone receptors
Glycoprotein-hormone Receptors Information System (GRIS), from Université Libre de Bruxelles and Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire, is a database giving structural information on the LHCGR
PMCID: PMC2323302  PMID: 18433292
9.  DNA demethylases target promoter transposable elements to positively regulate stress responsive genes in Arabidopsis 
Genome Biology  2014;15(9):458.
DNA demethylases regulate DNA methylation levels in eukaryotes. Arabidopsis encodes four DNA demethylases, DEMETER (DME), REPRESSOR OF SILENCING 1 (ROS1), DEMETER-LIKE 2 (DML2), and DML3. While DME is involved in maternal specific gene expression during seed development, the biological function of the remaining DNA demethylases remains unclear.
We show that ROS1, DML2, and DML3 play a role in fungal disease resistance in Arabidopsis. A triple DNA demethylase mutant, rdd (ros1 dml2 dml3), shows increased susceptibility to the fungal pathogen Fusarium oxysporum. We identify 348 genes differentially expressed in rdd relative to wild type, and a significant proportion of these genes are downregulated in rdd and have functions in stress response, suggesting that DNA demethylases maintain or positively regulate the expression of stress response genes required for F. oxysporum resistance. The rdd-downregulated stress response genes are enriched for short transposable element sequences in their promoters. Many of these transposable elements and their surrounding sequences show localized DNA methylation changes in rdd, and a general reduction in CHH methylation, suggesting that RNA-directed DNA methylation (RdDM), responsible for CHH methylation, may participate in DNA demethylase-mediated regulation of stress response genes. Many of the rdd-downregulated stress response genes are downregulated in the RdDM mutants nrpd1 and nrpe1, and the RdDM mutants nrpe1 and ago4 show enhanced susceptibility to F. oxysporum infection.
Our results suggest that a primary function of DNA demethylases in plants is to regulate the expression of stress response genes by targeting promoter transposable element sequences.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0458-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4189188  PMID: 25228471
10.  Comparative transcriptomics of pathogenic and non-pathogenic Listeria species 
Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome as well as conserved, widespread regulatory structures called ‘Excludons', which mediate regulation through long non-coding antisense RNAs.
Comparative transcriptome sequencing of two closely related bacterial strains reveals a hidden layer of divergence in the non-coding genome.
Pathogen-specific non-coding RNAs, which might contribute to virulence, are revealed.
The Listeria genome contains a class of unusually long antisense RNAs (lasRNAs) that span divergent genes and repress expression of the genes located opposite to them while activating the other. The genetic organization of these lasRNAs and operons was named an excludon.
The exhaustive transcriptome information from this publication is provided as an open resource with a web-accessible transcriptome browser.
Listeria monocytogenes is a human, food-borne pathogen. Genomic comparisons between L. monocytogenes and Listeria innocua, a closely related non-pathogenic species, were pivotal in the identification of protein-coding genes essential for virulence. However, no comprehensive comparison has focused on the non-coding genome. We used strand-specific cDNA sequencing to produce genome-wide transcription start site maps for both organisms, and developed a publicly available integrative browser to visualize and analyze both transcriptomes in different growth conditions and genetic backgrounds. Our data revealed conservation across most transcripts, but significant divergence between the species in a subset of non-coding RNAs. In L. monocytogenes, we identified 113 small RNAs (33 novel) and 70 antisense RNAs (53 novel), significantly increasing the repertoire of ncRNAs in this species. Remarkably, we identified a class of long antisense transcripts (lasRNAs) that overlap one gene while also serving as the 5′ UTR of the adjacent divergent gene. Experimental evidence suggests that lasRNA transcription inhibits expression of one operon while activating the expression of another. Such a lasRNA/operon structure, which we named an ‘excludon’, might represent a novel form of regulation in bacteria.
PMCID: PMC3377988  PMID: 22617957
comparative genomics; Listeria monocytogenes; RNA-seq; transcriptome; TSS map
11.  Evolutionarily conserved long intergenic non-coding RNAs in the eye 
Human Molecular Genetics  2013;22(15):2992-3002.
The discovery that the mammalian transcriptome encodes thousands of long intergenic non-coding (linc) RNA transcripts, together with recent evidence that lincRNAs can regulate protein-coding genes, has added a new level of complexity to cellular transcriptional/translational regulation. Indeed several reports now link mutations in lincRNAs to heritable human disorders. Here, we identified a subset of lincRNAs in terminally differentiated adult human retinal neurons based on their sequence conservation across species. RNA sequencing of eye tissue from several mammalian species with varied rod/cone photoreceptor content identified 18 lincRNAs that were highly conserved across these species. Sixteen of the 18 were conserved in human retinal tissue with 14 of these also conserved in the macular region. A subset of lincRNAs exhibited restricted tissue expression profiles in mice, with preferential expression in the retina. Mouse models with different populations of retinal cells as well as in situ hybridization provided evidence that these lincRNAs localized to specific retinal compartments, most notably to the photoreceptor neuronal layer. Computational genomic loci and promoter region analyses provided a basis for regulated expression of these conserved lincRNAs in retinal post-mitotic neurons. This combined approach identified several lincRNAs that could be critical for retinal and visual maintenance in adults.
PMCID: PMC3699063  PMID: 23562822
12.  Niche adaptation by expansion and reprogramming of general transcription factors 
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments.
Evolution of archaeal lineages correlates with duplication events in the TFB family. Each TFB is required for adaptation to multiple environments. The relative fitness contributions of TFBs change with environmental context. Changes in the regulation of duplicated TFBs can generate new adaptation capabilities.
The evolutionary success of an organism depends on its ability to continually adapt to changes in the patterns of constant, periodic, and transient challenges within its environment. This process of ‘niche adaptation’ requires reprogramming of the organism's environmental response networks by reorganizing interactions among diverse parts including environmental sensors, signal transducers, and transcriptional and post-transcriptional regulators. Gene duplications have been discovered to be one of the principal strategies in this process, especially for reprogramming of gene regulatory networks (GRNs). Whereas eukaryotes require dozens of factors for recruitment of RNA polymerase, archaea require just two general transcription factors (GTFs) that are orthologous to eukaryotic TFIIB (TFB in archaea) and TATA-binding protein (TBP) (Bell et al, 1998). Both of these GTFs have expanded extensively in nearly 50% of all archaea whose genomes have been fully sequenced. The phylogenetic analysis presented in this study reveals lineage-specific expansions of TFBs, suggesting that they might encode functionally specialized gene regulatory programs for the unique environments to which these organisms have adapted. This hypothesis is particularly appealing when we consider that the greatest expansion is observed within the group of halophilic archaea whose habitats are associated with routine and dynamic changes in a number of environmental factors including light, temperature, oxygen, salinity, and ionic composition (Rodriguez-Valera, 1993; Litchfield, 1998).
We have previously demonstrated that variations in the expanded set of TFBs (a through e) in Halobacterium salinarum NRC-1 manifest at the level of physical interactions within and across the two families, their DNA-binding specificity, their differential regulation in varying environments, and, ultimately, on the large-scale segregation of transcription of all genes into overlapping yet distinct sets of functionally related groups (Facciotti et al, 2007). We have extended findings from this earlier study with a systematic survey of the fitness consequences of perturbing the TFB network of H. salinarum NRC-1 across 17 environments. Notably, each TFB conferred fitness in two or more environmental conditions tested, and the relative fitness contributions (see Table I) of the five TFBs varied significantly by environment. From an evolutionary perspective, the relationships among these fitness landscapes reveal that two classes of TFBs (c/g- and f-type) appear to have played an important role in the evolution of halophilic archaea by overseeing regulation of core physiological capabilities in these organisms. TFBs of the other clades (b/d and a/e) seem to have emerged much more recently through gene duplications or horizontal gene transfers (HGTs) and are being utilized for adaptation to specialized environmental conditions.
We also investigated higher-order functional interactions and relationships among the duplicated TFBs by performing competition experiments and by mapping genetic interactions in different environments. This demonstrated that depending on environmental context, the TFBs have strikingly different functional hierarchies and genetic interactions with one another. This is remarkable as it makes each TFB essential albeit at different times in a dynamically changing environment.
In order to understand the process by which such gene family expansions shape architecture and functioning of a GRN, we performed integrated analysis of phylogeny, physical interactions, regulation, and fitness landscapes of the seven TFBs in H. salinarum NRC-1. This revealed that evolution of both their protein-coding sequence and their promoter has been instrumental in the encoding of environment-specific regulatory programs. Importantly, the convergent and divergent evolution of regulation and binding properties of TFBs suggested that, aside from HGT and random mutations, a third plausible (and perhaps most interesting) mechanism for acquiring a novel TFB variant is through gene conversion. To test this hypothesis, we synthesized a novel TFBx by transferring TFBa/e clade-specific residues to a TFBd backbone, transformed this variant under the control of either the TFBd or the TFBe promoter (PtfbD or PtfbE) into three different host genetic backgrounds (Δura3 (parent), ΔtfbD, and ΔtfbE), and analyzed fitness and gene expression patterns during growth at 25 and 37°C. This showed that gene conversion events spanning the coding sequence and the promoter, environmental context, and genetic background of the host are all extremely influential in the functional integration of a TFB into the GRN. Importantly, this analysis suggested that altering the regulation of an existing set of expanded TFBs might be an efficient mechanism to reprogram the GRN to rapidly generate novel niche adaptation capability. We have confirmed this experimentally by increasing fitness merely by moving tfbE to PtfbD control, and by generating a completely novel phenotype (biofilm-like appearance) by overexpression of tfbE.
Altogether, this study clearly demonstrates that archaea can rapidly generate novel niche adaptation programs by simply altering regulation of duplicated TFBs. This is significant because expansions in the TFB family are widespread in archaea, a class of organisms that not only represent 20% of biomass on earth but are also known to have colonized some of the most extreme environments (DeLong and Pace, 2001). This strategy for niche adaptation is further expanded through interactions of the multiple TFBs with members of other expanded TF families such as TBPs (Facciotti et al, 2007) and sequence-specific regulators (e.g. Lrp family (Peeters and Charlier, 2010)). This is analogous to combinatorial solutions for other complex biological problems such as recognition of pathogens by Toll-like receptors (Roach et al, 2005), generation of antibody diversity by V(D)J recombination (Early et al, 1980), and recognition and processing of odors (Malnic et al, 1999).
Numerous lineage-specific expansions of the transcription factor B (TFB) family in archaea suggest an important role for expanded TFBs in encoding environment-specific gene regulatory programs. Given the characteristics of hypersaline lakes, the unusually large numbers of TFBs in halophilic archaea further suggest that they might be especially important in rapid adaptation to the challenges of a dynamically changing environment. Motivated by these observations, we have investigated the implications of TFB expansions by correlating sequence variations, regulation, and physical interactions of all seven TFBs in Halobacterium salinarum NRC-1 to their fitness landscapes, functional hierarchies, and genetic interactions across 2488 experiments covering combinatorial variations in salt, pH, temperature, and Cu stress. This systems analysis has revealed an elegant scheme in which completely novel fitness landscapes are generated by gene conversion events that introduce subtle changes to the regulation or physical interactions of duplicated TFBs. Based on these insights, we have introduced a synthetically redesigned TFB and altered the regulation of existing TFBs to illustrate how archaea can rapidly generate novel phenotypes by simply reprogramming their TFB regulatory network.
PMCID: PMC3261711  PMID: 22108796
evolution by gene family expansion; fitness; niche adaptation; reprogramming of gene regulatory network; transcription factor B
13.  High-Resolution Transcriptome Maps Reveal Strain-Specific Regulatory Features of Multiple Campylobacter jejuni Isolates 
PLoS Genetics  2013;9(5):e1003495.
Campylobacter jejuni is currently the leading cause of bacterial gastroenteritis in humans. Comparison of multiple Campylobacter strains revealed a high genetic and phenotypic diversity. However, little is known about differences in transcriptome organization, gene expression, and small RNA (sRNA) repertoires. Here we present the first comparative primary transcriptome analysis based on the differential RNA–seq (dRNA–seq) of four C. jejuni isolates. Our approach includes a novel, generic method for the automated annotation of transcriptional start sites (TSS), which allowed us to provide genome-wide promoter maps in the analyzed strains. These global TSS maps are refined through the integration of a SuperGenome approach that allows for a comparative TSS annotation by mapping RNA–seq data of multiple strains into a common coordinate system derived from a whole-genome alignment. Considering the steadily increasing amount of RNA–seq studies, our automated TSS annotation will not only facilitate transcriptome annotation for a wider range of pro- and eukaryotes but can also be adapted for the analysis among different growth or stress conditions. Our comparative dRNA–seq analysis revealed conservation of most TSS, but also single-nucleotide-polymorphisms (SNP) in promoter regions, which lead to strain-specific transcriptional output. Furthermore, we identified strain-specific sRNA repertoires that could contribute to differential gene regulation among strains. In addition, we identified a novel minimal CRISPR-system in Campylobacter of the type-II CRISPR subtype, which relies on the host factor RNase III and a trans-encoded sRNA for maturation of crRNAs. This minimal system of Campylobacter, which seems active in only some strains, employs a unique maturation pathway, since the crRNAs are transcribed from individual promoters in the upstream repeats and thereby minimize the requirements for the maturation machinery. 
Overall, our study provides new insights into strain-specific transcriptome organization and sRNAs, and reveals genes that could modulate phenotypic variation among strains despite high conservation at the DNA level.
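The idea of a common coordinate system derived from a whole-genome alignment can be illustrated with a minimal sketch. This is not the authors' SuperGenome implementation: the function names, the toy aligned sequences and the TSS positions below are all hypothetical, and the sketch assumes a simple gapped alignment in which each strain position maps to exactly one alignment column.

```python
def build_coordinate_map(aligned_seq, gap="-"):
    """Map 1-based genome positions of one strain to 1-based alignment columns."""
    mapping = {}
    pos = 0
    for col, base in enumerate(aligned_seq, start=1):
        if base != gap:          # gaps consume a column but no genome position
            pos += 1
            mapping[pos] = col
    return mapping


def tss_by_column(alignment, tss_by_strain):
    """Project each strain's TSS into alignment columns and group the strains."""
    columns = {}
    for strain, aligned_seq in alignment.items():
        to_column = build_coordinate_map(aligned_seq)
        for tss in tss_by_strain.get(strain, []):
            columns.setdefault(to_column[tss], set()).add(strain)
    return columns


# Toy alignment and TSS positions (hypothetical, not real C. jejuni data).
alignment = {
    "strainA": "ATG-CATTA",
    "strainB": "ATGGCA-TA",
}
tss_by_strain = {"strainA": [4, 7], "strainB": [5, 8]}

result = tss_by_column(alignment, tss_by_strain)
for column in sorted(result):
    print(column, sorted(result[column]))
```

A TSS found in the same column for several strains would count as conserved in this toy setting, while a column seen in only one strain would be a candidate strain-specific TSS.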
Author Summary
Many species have evolved into diverse strains with phenotypic and genotypic variations that facilitate adaptation to different ecological niches and, in the case of pathogens, to different hosts. Whereas comparison of genome sequences reveals differences and similarities among strains, the consequences of genomic variations can be tracked by studying the functional output from the genome. RNA sequencing has been revolutionizing transcriptome analyses of both pro- and eukaryotes. However, the bioinformatics-based analysis is still lagging behind, and transcriptome features are often manually annotated, which is laborious and time-consuming. This is even more compounded for the analyses of multiple strains. Here we compared the primary transcriptomes of four isolates of Campylobacter jejuni, the leading cause of bacterial gastroenteritis in humans, and provide genome-wide transcriptional start site (TSS) maps using a novel automated annotation method. Our comparative RNA–seq showed that most TSS are conserved in multiple strains, but we also observed SNP–dependent promoter usage. Furthermore, we identified a novel minimal RNA–based CRISPR immune system as well as strain-specific small RNA repertoires. Our automated, comparative TSS annotation will facilitate and improve transcriptome annotation for a wider range of organisms and provides insights into the contribution of transcriptome differences to phenotypic variation among closely related species.
PMCID: PMC3656092  PMID: 23696746
14.  The phosphoproteome of toll-like receptor-activated macrophages 
First global and quantitative analysis of phosphorylation cascades induced by toll-like receptor (TLR) stimulation in macrophages identifies nearly 7000 phosphorylation sites and shows extensive and dynamic up-regulation and down-regulation after lipopolysaccharide (LPS). In addition to the canonical TLR-associated pathways, mining of the phosphorylation data suggests an involvement of ATM/ATR kinases in signalling and shows that the cytoskeleton is a hotspot of TLR-induced phosphorylation. Intersecting transcription factor phosphorylation with bioinformatic promoter analysis of genes induced by LPS identified several candidate transcriptional regulators that were previously not implicated in TLR-induced transcriptional control.
Toll-like receptors (TLR) are a family of pattern recognition receptors that enable innate immune cells to sense infectious danger. Recognition of microbial structures, like lipopolysaccharide (LPS) of Gram-negative bacteria by TLR4, causes within hours substantial re-programming of macrophage gene expression, including up-regulation of chemokines driving inflammation, anti-microbial effector molecules and cytokines directing adaptive immune responses. TLR signalling is initiated by the adapter protein Myd88 and leads to the activation of kinase cascades that result in activation of the MAPK and NFkB pathways. Phosphorylation has an essential role in these early steps of TLR signalling, and in addition regulates critical transcription factors (TFs). Although TLR signalling has been extensively studied, a comprehensive analysis of phosphorylation events in TLR-activated macrophages is lacking. It is therefore unknown whether the canonical MAPK and NFkB pathways comprise the main phosphorylation events and which other molecular functions and processes are regulated by phosphorylation after stimulation with LPS.
Recent progress in mass spectrometry-based proteomics has opened the possibility to quantitatively investigate global changes in protein abundance and post-translational modifications. Stable isotope labelling with amino acids in cell culture (SILAC) allows highly accurate quantification, and has proved especially useful for direct comparison of phosphopeptide abundance in time-course or treatment analyses.
Here, we adapted SILAC to primary mouse macrophages, and performed a global, quantitative and kinetic analysis of the macrophage phosphoproteome after LPS stimulation. Bioinformatic analyses were used to identify kinases, pathways and biological processes enriched in the LPS-regulated phosphoproteome. To connect TF phosphorylation with transcription, we generated a parallel dataset of nascent RNA and used in silico promoter analysis to identify transcriptional regulators with binding site enrichment among the LPS-regulated gene set.
After establishing SILAC conditions for efficient labelling of primary bone marrow-derived macrophages in two independent experiments, 1850 phosphoproteins with a total of 6956 phosphorylation sites were reproducibly identified. Phosphoproteins were detected from all cellular compartments, with a clear enrichment for nuclear and cytoskeleton-associated proteins. LPS caused major regulation of a large fraction of phosphopeptides, with 24% of all sites up-regulated and 9% down-regulated after stimulation (Figure 3A and B). These changes were highly dynamic, as the majority of the regulated phosphopeptides were up-regulated or down-regulated transiently or in a delayed manner (Figure 3C). Overall, the extent of changes in the phosphoproteome was comparable to the transcriptional re-programming, underscoring the importance of phosphorylation cascades in TLR signalling. Our parallel transcriptome data also showed that widespread phosphorylation precedes massive transcriptional changes.
To obtain footprints of kinase activation in response to TLR ligation, we searched phosphopeptide sequences for known linear sequence motifs of 33 kinases and identified kinase motifs enriched among LPS-regulated phosphorylation sites (compared to non-regulated phosphorylation sites) (Table I). Motif ERK/MAPK was highly enriched, in accordance with the essential role of the MAPK module in TLR signalling. Other kinases with motif enrichment have also recently been linked to TLR signalling (e.g. PKD; AKT and its targets GSK3 and mTOR). However, the DNA damage-activated kinases ATM/ATR and the cell cycle-associated kinases AURORA and CHK1/2 have not been associated with the macrophage response to TLR activation yet. These findings shed new light on older data on the effect of TLR on macrophage proliferation in response to macrophage colony stimulating factor. Of interest, in follow-up experiments using pharmacological inhibitors of the kinases with motif enrichment, we observed that inhibition of ATM kinase activity caused increased LPS-induced expression of several cytokines and chemokines, suggesting that this pathway regulates inflammatory responses.
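One common way to test whether a kinase motif is over-represented among regulated sites relative to non-regulated sites is a one-sided hypergeometric (Fisher-type) test. The text above does not state which statistic was actually used, so the following is only a sketch under that assumption; the function name and all counts are hypothetical.

```python
from math import comb


def motif_enrichment_p(k, n_regulated, n_motif, n_total):
    """One-sided hypergeometric p-value P(X >= k).

    k           : regulated sites that match the motif
    n_regulated : all regulated sites
    n_motif     : all sites (regulated or not) that match the motif
    n_total     : all phosphorylation sites considered
    """
    upper = min(n_motif, n_regulated)
    total = comb(n_total, n_regulated)
    tail = sum(
        comb(n_motif, i) * comb(n_total - n_motif, n_regulated - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 sites, 100 with the motif, 200 regulated,
# 40 of the regulated sites carry the motif (20 would be expected by chance).
p_value = motif_enrichment_p(k=40, n_regulated=200, n_motif=100, n_total=1000)
print(f"p = {p_value:.3g}")
```

With 33 kinase motifs tested, a real analysis would also need a multiple-testing correction, which this sketch leaves out.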
In further bioinformatic analyses, the Gene Ontology and signalling pathway annotations of phosphoproteins were used to identify signalling pathways and cellular processes targeted by TLR4-controlled phosphorylation (Table II). Among the expected hits, based on the known TLR pathways, were TLR signalling, MAPK and AKT as well as mTOR signalling. Of interest, the annotation terms ‘Rho GTPase cycle' and ‘cytoskeleton' were significantly enriched among LPS-regulated phosphoproteins, indicating a more prominent role for cytoskeletal proteins in the transduction of TLR signals or in the biological response to it.
We were especially interested in the phosphorylation of TFs and its regulation by LPS (Figure 6A). We hypothesised that functionally important TFs should have an increased frequency of binding sites in the promoters of LPS-regulated genes (Figure 6B). To identify transcriptionally regulated genes with high sensitivity, we isolated nascent RNA after metabolic labelling (Figure 6C–E). In silico promoter scanning using Genomatix software for binding sites for all 50 TF families with phosphorylated members was used to test for enrichment in transcriptionally induced genes (Figure 6F). At the early time point, binding site enrichment for the canonical TLR-associated TF NFkB was detected, and in addition we found that several other TF families with an established role in the transcription of individual LPS-target genes showed binding site enrichment (CEBP, MEF2, NFAT and HEAT). In addition, enrichment for OCT and HOXC binding sites at the early time point and SORY matrices later after stimulation indicated an involvement of the phosphorylated members of the respective TF families in the execution of TLR-induced transcriptional responses. An initial test of the function for a few of these candidate transcriptional regulators was performed using siRNA knockdown in primary macrophages. These experiments suggested that knockdown of the SORY binding phosphoprotein Capicua homolog (Cic) and to a lesser extent of the CREB family member Atf7 selectively attenuates LPS-induced expression of Il1a and Il1b.
In summary, this study provides a novel and global perspective on innate immune activation by TLR signalling (Figure 5). We quantitatively detected a large number of previously unknown site-specific phosphorylation events, which are now publicly available through the Phosida database. By combining different data mining approaches, we consistently identified canonical and newly implicated TLR-activated signalling modules. In particular, the PI3K/AKT and the related mTOR pathway were highlighted; furthermore, DNA damage–response associated ATM/ATR kinases and the cytoskeleton emerged as unexpected hotspots for phosphorylation. Finally, weaving together corresponding phosphoproteome and nascent transcriptome datasets through the loom of in silico promoter analysis, we identified TFs with a likely role in mediating TLR-induced gene expression programmes.
Recognition of microbial danger signals by toll-like receptors (TLR) causes re-programming of macrophages. To investigate kinase cascades triggered by the TLR4 ligand lipopolysaccharide (LPS) on systems level, we performed a global, quantitative and kinetic analysis of the phosphoproteome of primary macrophages using stable isotope labelling with amino acids in cell culture, phosphopeptide enrichment and high-resolution mass spectrometry. In parallel, nascent RNA was profiled to link transcription factor (TF) phosphorylation to TLR4-induced transcriptional activation. We reproducibly identified 1850 phosphoproteins with 6956 phosphorylation sites, two thirds of which were not reported earlier. LPS caused major dynamic changes in the phosphoproteome (24% up-regulation and 9% down-regulation). Functional bioinformatic analyses confirmed canonical players of the TLR pathway and highlighted other signalling modules (e.g. mTOR, ATM/ATR kinases) and the cytoskeleton as hotspots of LPS-regulated phosphorylation. Finally, weaving together phosphoproteome and nascent transcriptome data by in silico promoter analysis, we implicated several phosphorylated TFs in primary LPS-controlled gene expression.
PMCID: PMC2913394  PMID: 20531401
macrophage; nascent RNA; phosphoproteome; SILAC; toll-like receptors
15.  Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art 
BMC Bioinformatics  2012;13:89.
RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition ‘code’ that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.
We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues.
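The performance measures named above can be computed from a residue-level confusion matrix, as in the sketch below. The abstract does not spell out its definitions, so this uses the standard formulas; note that some interface-prediction papers use "Specificity" to mean TP/(TP+FP), which is reported here separately as precision. The function name and the counts are hypothetical.

```python
from math import sqrt


def interface_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, precision and MCC for binary predictions.

    tp/fn: interface residues predicted as interface / non-interface
    tn/fp: non-interface residues predicted as non-interface / interface
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts for one classifier on one test set.
metrics = interface_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because interface residues are usually a minority class, MCC is often preferred over plain accuracy: it stays near zero for a predictor that labels every residue as non-interface.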
Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
PMCID: PMC3490755  PMID: 22574904
16.  The Transcriptome of the Human Pathogen Trypanosoma brucei at Single-Nucleotide Resolution 
PLoS Pathogens  2010;6(9):e1001090.
The genome of Trypanosoma brucei, the causative agent of African trypanosomiasis, was published five years ago, yet identification of all genes and their transcripts remains to be accomplished. Annotation is challenged by the organization of genes transcribed by RNA polymerase II (Pol II) into long unidirectional gene clusters with no knowledge of how transcription is initiated. Here we report a single-nucleotide resolution genomic map of the T. brucei transcriptome, adding 1,114 new transcripts, including 103 non-coding RNAs, confirming and correcting many of the annotated features and revealing an extensive heterogeneity of 5′ and 3′ ends. Some of the new transcripts encode polypeptides that are either conserved in T. cruzi and Leishmania major or were previously detected in mass spectrometry analyses. High-throughput RNA sequencing (RNA-Seq) was sensitive enough to detect transcripts at putative Pol II transcription initiation sites. Our results, as well as recent data from the literature, indicate that transcription initiation is not solely restricted to regions at the beginning of gene clusters, but may occur at internal sites. We also provide evidence that transcription at all putative initiation sites in T. brucei is bidirectional, a recently recognized fundamental property of eukaryotic promoters. Our results have implications for gene expression patterns in other important human pathogens with similar genome organization (Trypanosoma cruzi, Leishmania sp.) and revealed heterogeneity in pre-mRNA processing that could potentially contribute to the survival and success of the parasite population in the insect vector and the mammalian host.
Author Summary
Identifying genes essential for survival in the host is fundamental to unraveling the biology of human pathogens and understanding mechanisms of pathogenesis. The protozoan parasite Trypanosoma brucei causes devastating diseases in humans and animals in sub-Saharan Africa, and the publication in 2005 of the genome sequence provided the first glance at the coding potential of this organism. Although at present there is a catalogue of predicted protein coding genes, the challenge remains to identify all authentic genes, including their boundaries. We used next generation RNA sequencing (RNA-Seq) to map transcribed regions and RNA polymerase II transcription initiation sites on a genome-wide scale. This approach allowed us to improve and correct the current annotation, to reveal a widespread heterogeneity of RNA processing sites (trans-splicing and polyadenylation) and to estimate that most genes are expressed at levels corresponding to 1 to 10 mRNAs per cell. Our data indicate that different transcript forms representing the same gene are present stochastically within the mRNA population. This unanticipated scenario may contribute to determining gene expression landscapes to adapt to different environments in the parasite life cycle.
PMCID: PMC2936537  PMID: 20838601
17.  Spliced Leader Trapping Reveals Widespread Alternative Splicing Patterns in the Highly Dynamic Transcriptome of Trypanosoma brucei 
PLoS Pathogens  2010;6(8):e1001037.
Trans-splicing of leader sequences onto the 5′ ends of mRNAs is a widespread phenomenon in protozoa, nematodes and some chordates. Using parallel sequencing, we have developed a method to simultaneously map 5′ splice sites and analyze the corresponding gene expression profile, which we term spliced leader trapping (SLT). The method can be applied to any organism with a sequenced genome and trans-splicing of a conserved leader sequence. We analyzed the expression profiles and splicing patterns of bloodstream and insect forms of the parasite Trypanosoma brucei. We detected the 5′ splice sites of 85% of the annotated protein-coding genes and, contrary to previous reports, found up to 40% of transcripts to be differentially expressed. Furthermore, we discovered more than 2500 alternative splicing events, many of which appear to be stage-regulated. Based on our findings we hypothesize that alternatively spliced transcripts present a new means of regulating gene expression and could potentially contribute to protein diversity in the parasite. The entire dataset can be accessed online at TriTrypDB or through:
Author Summary
Some organisms like the human and animal parasite Trypanosoma brucei add a leader sequence to their mRNAs through a reaction called trans-splicing. Until now the splice sites for most mRNAs were unknown in T. brucei. Using high throughput sequencing we have developed a method to identify the splice sites and at the same time measure the abundance of the corresponding mRNAs. Analyzing three different life cycle stages of the parasite we identified the vast majority of splice sites in the organism and, to our great surprise, uncovered more than 2500 alternative splicing events, many of which appeared to be specific for one of the life cycle stages. Alternative splicing is a result of the addition of the leader sequence to different positions on the mRNA, leading to mixed mRNA populations that can encode for proteins with varying properties. One of the most obvious changes caused by alternative splicing is the gain or loss of targeting signals, leading to differential localization of the corresponding proteins. Based on our findings we hypothesize that alternative splicing is a major mechanism to regulate gene expression in T. brucei and could contribute to protein diversity in the parasite.
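The principle of reading off splice sites and transcript abundance from the same reads can be sketched as follows. This is a toy illustration, not the published SLT pipeline: the leader tag, the genome string and the reads are all hypothetical, and exact string matching stands in for a real read aligner.

```python
from collections import Counter

SL_TAG = "CTATATTG"  # hypothetical placeholder for the 3' end of the spliced leader


def trap_splice_sites(reads, genome, sl_tag=SL_TAG):
    """Count reads per splice site (0-based position of the first genomic base)."""
    counts = Counter()
    for read in reads:
        if not read.startswith(sl_tag):
            continue                      # read does not carry the leader
        body = read[len(sl_tag):]         # genomic part downstream of the leader
        if not body:
            continue
        pos = genome.find(body)
        # Keep only reads whose genomic part matches exactly one location.
        if pos != -1 and genome.find(body, pos + 1) == -1:
            counts[pos] += 1
    return counts


genome = "GGCATCAGATGACCTTGAGCAGTTACG"   # toy sequence
reads = (
    [SL_TAG + genome[6:16]] * 3          # three reads spliced at position 6
    + [SL_TAG + genome[12:22]]           # one read spliced at position 12
    + [genome[0:10]]                     # read without leader, ignored
)

sites = trap_splice_sites(reads, genome)
for position, n_reads in sorted(sites.items()):
    print(position, n_reads)
```

In this toy setting the positions give the splice sites and the read counts act as a rough abundance measure; two positions upstream of the same gene would correspond to an alternative splicing event.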
PMCID: PMC2916883  PMID: 20700444
18.  Genetic architecture of retinal and macular degenerative diseases: the promise and challenges of next-generation sequencing 
Genome Medicine  2013;5(10):84.
Inherited retinal degenerative diseases (RDDs) display wide variation in their mode of inheritance, underlying genetic defects, age of onset, and phenotypic severity. Molecular mechanisms have not been delineated for many retinal diseases, and treatment options are limited. In most instances, genotype-phenotype correlations have not been elucidated because of extensive clinical and genetic heterogeneity. Next-generation sequencing (NGS) methods, including exome, genome, transcriptome and epigenome sequencing, provide novel avenues towards achieving comprehensive understanding of the genetic architecture of RDDs. Whole-exome sequencing (WES) has already revealed several new RDD genes, whereas RNA-Seq and ChIP-Seq analyses are expected to uncover novel aspects of gene regulation and biological networks that are involved in retinal development, aging and disease. In this review, we focus on the genetic characterization of retinal and macular degeneration using NGS technology and discuss the basic framework for further investigations. We also examine the challenges of NGS application in clinical diagnosis and management.
PMCID: PMC4066589  PMID: 24112618
19.  Recessive Antimorphic Alleles Overcome Functionally Redundant Loci to Reveal TSO1 Function in Arabidopsis Flowers and Meristems 
PLoS Genetics  2011;7(11):e1002352.
Arabidopsis TSO1 encodes a protein with conserved CXC domains known to bind DNA and is homologous to animal proteins that function in chromatin complexes. tso1 mutants fall into two classes due to their distinct phenotypes. Class I, represented by two different missense mutations in the CXC domain, leads to failure in floral organ development, sterility, and fasciated inflorescence meristems. Class II, represented by a nonsense mutation and a T-DNA insertion line, develops wild-type–like flowers and inflorescences but shows severely reduced fertility. The phenotypic variability of tso1 alleles presents challenges in determining the true function of TSO1. In this study, we use artificial microRNA, double mutant analysis, and bimolecular fluorescence complementation assay to investigate the molecular basis underlying these two distinct classes of phenotypes. We show that the class I mutants could be converted into class II by artificial microRNA knockdown of the tso1 mutant transcript, suggesting that class I alleles produce antimorphic mutant proteins that interfere with functionally redundant loci. We identified one such redundant factor coded by the closely related TSO1 homolog SOL2. We show that the class I phenotype can be mimicked by knocking out both TSO1 and its homolog SOL2 in double mutants. Such antimorphic alleles targeting redundant factors are likely prevalent in Arabidopsis and may be common in organisms with many sets of paralogous genes, such as humans. Our data challenge the conventional view that recessive alleles are always hypomorphic or null and that antimorphic alleles are always dominant. This study shows that recessive alleles can also be antimorphic and can produce a phenotype more severe than null by interfering with the function of related loci. This finding adds a new paradigm to classical genetic concepts, with important implications for future genetic studies in basic research as well as in agriculture and medicine.
Author Summary
Many of our current genetic concepts and terms came from early pioneering work in Drosophila melanogaster, which has a relatively simple genome with reduced gene sets. One noted example is the term antimorph or dominant-negative, which describes mutant proteins that antagonize the corresponding wild-type proteins in a dominant fashion. In the process of characterizing Arabidopsis thaliana tso1 mutants, we discovered a novel genetic phenomenon, “recessive antimorphism,” where certain recessive and missense mutations interfere with functionally redundant genes in the genome to reveal a broader range of phenotypes than the corresponding loss-of-function or null alleles. Our work indicates a rarely noted strength of Arabidopsis as a genetic model for studying species with complex genome architecture, including humans, which possess significant chromosome segmental or genome duplications and increased gene copy numbers. It adds a new paradigm to classical genetic concepts with important implications for modern genetics in both medicine and agriculture.
PMCID: PMC3207858  PMID: 22072982
20.  Mutations in SLC29A3, Encoding an Equilibrative Nucleoside Transporter ENT3, Cause a Familial Histiocytosis Syndrome (Faisalabad Histiocytosis) and Familial Rosai-Dorfman Disease 
PLoS Genetics  2010;6(2):e1000833.
The histiocytoses are a heterogeneous group of disorders characterised by an excessive number of histiocytes. In most cases the pathophysiology is unclear and treatment is nonspecific. Faisalabad histiocytosis (FHC) (MIM 602782) has been classed as an autosomal recessively inherited form of histiocytosis with similarities to Rosai-Dorfman disease (RDD) (also known as sinus histiocytosis with massive lymphadenopathy (SHML)). To elucidate the molecular basis of FHC, we performed autozygosity mapping studies in a large consanguineous family and identified a novel locus at chromosome 10q22.1. Mutation analysis of candidate genes within the target interval identified biallelic germline mutations in SLC29A3 in the FHC kindred and in two families reported to have familial RDD. Analysis of SLC29A3 expression during mouse embryogenesis revealed widespread expression by e14.5 with prominent expression in the central nervous system, eye, inner ear, and epithelial tissues including the gastrointestinal tract. SLC29A3 encodes an intracellular equilibrative nucleoside transporter (hENT3) with affinity for adenosine. Recently germline mutations in SLC29A3 were also described in two rare autosomal recessive disorders with overlapping phenotypes: (a) H syndrome (MIM 612391) that is characterised by cutaneous hyperpigmentation and hypertrichosis, hepatomegaly, heart anomalies, hearing loss, and hypogonadism; and (b) PHID (pigmented hypertrichosis with insulin-dependent diabetes mellitus) syndrome. Our findings suggest that a variety of clinical diagnoses (H and PHID syndromes, FHC, and familial RDD) can be included in a new diagnostic category of SLC29A3 spectrum disorder.
Author Summary
The histiocytoses are a group of systemic disorders usually confined to childhood and are caused by an excessive number of histiocytes which phagocytose other cells and process antigens. Although nearly a century has passed since histiocytic disorders were recognised, their pathophysiology remains largely unclear, and treatment is nonspecific. The identification of SLC29A3 mutations as the molecular basis for a familial form of syndromic histiocytosis (FHC/RDD) confirms a direct link between Faisalabad histiocytosis and Rosai-Dorfman disease and links these disorders to other SLC29A3-associated phenotypes.
PMCID: PMC2816679  PMID: 20140240
21.  Activation of synovial fibroblasts in rheumatoid arthritis: lack of expression of the tumour suppressor PTEN at sites of invasive growth and destruction 
Arthritis Research  1999;2(1):59-64.
In the present study, we searched for mutant PTEN transcripts in aggressive rheumatoid arthritis synovial fibroblasts (RA-SF) and studied the expression of PTEN in RA. By automated sequencing, no evidence for the presence of mutant PTEN transcripts was found. However, in situ hybridization on RA synovium revealed a distinct expression pattern of PTEN, with negligible staining in the lining layer but abundant expression in the sublining. Normal synovial tissue exhibited homogeneous staining for PTEN. In cultured RA-SF, only 40% expressed PTEN. Co-implantation of RA-SF and normal human cartilage into severe combined immunodeficiency (SCID) mice showed only limited expression of PTEN, with no staining in those cells aggressively invading the cartilage. Although PTEN is not genetically altered in RA, these findings suggest that a lack of PTEN expression may constitute a characteristic feature of activated RA-SF in the lining, and may thereby contribute to the invasive behaviour of RA-SF by maintaining their aggressive phenotype at sites of cartilage destruction.
PTEN is a novel tumour suppressor which exhibits tyrosine phosphatase activity as well as homology to the cytoskeletal proteins tensin and auxilin. Mutations of PTEN have been described in several human cancers and associated with their invasiveness and metastatic properties. Although not malignant, rheumatoid arthritis synovial fibroblasts (RA-SF) exhibit certain tumour-like features such as attachment to cartilage and invasive growth. In the present study, we analyzed whether mutant transcripts of PTEN were present in RA-SF. In addition, we used in situ hybridization to study the expression of PTEN messenger (m)RNA in tissue samples of RA and normal individuals as well as in cultured RA-SF and in the severe combined immunodeficiency (SCID) mouse model of RA.
Synovial tissue specimens were obtained from seven patients with RA and from two nonarthritic individuals. Total RNA was isolated from synovial fibroblasts and after first strand complementary (c)DNA synthesis, polymerase chain reaction (PCR) was performed to amplify a 1063 base pair PTEN fragment that encompassed the coding sequence of PTEN including the phosphatase domain and all mutation sites described so far. The PCR products were subcloned in Escherichia coli, and up to four clones were picked from each plate for automated sequencing. For in situ hybridization, digoxigenin-labelled PTEN-specific RNA probes were generated by in vitro transcription. For control in situ hybridization, a matrix metalloproteinase (MMP)-2-specific probe was prepared. To investigate the expression of PTEN in the absence of human macrophage or lymphocyte derived factors, we implanted RA-SF from three patients together with normal human cartilage under the renal capsule of SCID mice. After 60 days, mice were sacrificed, the implants removed and embedded into paraffin.
PCR revealed the presence of the expected 1063 base pair PTEN fragment in all (9/9) cell cultures (Fig. 1). No additional bands that could account for mutant PTEN variants were detected. Sequence analysis revealed 100% homology of all RA-derived PTEN fragments to those from normal SF as well as to the published GenBank sequence (accession number U93051). However, in situ hybridization demonstrated considerable differences in the expression of PTEN mRNA within the lining and the sublining layers of RA synovial membranes. As shown in Figure 2a, no staining was observed within the lining layer which has been demonstrated to mediate degradation of cartilage and bone in RA. In contrast, abundant expression of PTEN mRNA was found in the sublining of all RA synovial tissues (Figs 2a and b). Normal synovial specimens showed homogeneous staining for PTEN within the thin synovial membrane (Fig. 2c). In situ hybridization using the sense probe gave no specific staining (Fig. 2d). We also performed in situ hybridization on four of the seven cultured RA-SF and followed one cell line from the first to the sixth passage. Interestingly, only 40% of cultured RA-SF expressed PTEN mRNA (Fig. 3a), and the proportion of PTEN expressing cells did not change throughout the passages. In contrast, control experiments using a specific RNA probe for MMP-2 revealed mRNA expression by nearly all cultured cells (Fig. 3b). As seen before, implantation of RA-SF into the SCID mice showed considerable cartilage degradation. Interestingly, only negligible PTEN expression was found in those RA-SF aggressively invading the cartilage (Fig. 3c). In situ hybridization for MMP-2 showed abundant staining in these cells (Fig. 3d).
Although this study found no evidence for mutations of PTEN in RA synovium, the observation that PTEN expression is lacking in the lining layer of RA synovium as well as in more than half of cultured RA-SF is of interest. It suggests that loss of PTEN function may not exclusively be caused by genetic alterations, yet at the same time links the low expression of PTEN to a phenotype of cells that have been shown to invade cartilage aggressively.
It has been proposed that the tyrosine phosphatase activity of PTEN is responsible for its tumour suppressor activity by counteracting the actions of protein tyrosine kinases. As some studies have demonstrated an upregulation of tyrosine kinase activity in RA synovial cells, it might be speculated that the lack of PTEN expression in aggressive RA-SF contributes to the imbalance of tyrosine kinases and phosphatases in this disease. However, the extensive amino-terminal homology of the predicted protein to the cytoskeletal proteins tensin and auxilin suggests a complex regulatory function involving cellular adhesion molecules and phosphatase-mediated signalling. The tyrosine phosphatase TEP1 has been shown to be identical to the protein encoded by PTEN, and gene transcription of TEP1 has been demonstrated to be downregulated by transforming growth factor (TGF)-β. Therefore, it could be hypothesized that TGF-β might be responsible for the downregulation of PTEN. However, the expression of TGF-β is not restricted to the lining but found throughout the synovial tissue in RA. Moreover, in our study the percentage of PTEN expressing RA-SF remained stable for six passages in culture, whereas molecules that are cytokine-regulated in vivo frequently change their expression levels when cultured over several passages. Also, cultured RA-SF that were implanted into SCID mice and deeply invaded the cartilage did not show significant expression of PTEN after 60 days. The drop in the percentage of PTEN expressing cells from the original cell cultures to the SCID mouse implants is of interest as this observation goes along with data from previous studies that have shown the prominent expression of activation-related molecules in the SCID mice implants that in vivo are found predominantly in the lining layer. Therefore, our data point to endogenous mechanisms rather than to the influence of exogenous human cytokines or factors in the downregulation of PTEN. 
Low expression of PTEN may belong to the features that distinguish between the activated phenotype of RA-SF and the sublining, proliferating but nondestructive cells.
PMCID: PMC17804  PMID: 11219390
rheumatoid arthritis; synovial membrane; fibroblasts; PTEN tumour suppressor; severe combined immunodeficiency (SCID) mouse model; cartilage destruction; in situ hybridization
22.  Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs 
PLoS Genetics  2013;9(4):e1003470.
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non–TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
Author Summary
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known about the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements, through their capacity to move and spread in genomes in a lineage-specific fashion, as well as their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
PMCID: PMC3636048  PMID: 23637635
23.  A Novel Tumor-Promoting Function Residing in the 5′ Non-coding Region of vascular endothelial growth factor mRNA 
PLoS Medicine  2008;5(5):e94.
Vascular endothelial growth factor-A (VEGF) is one of the key regulators of tumor development, hence it is considered to be an important therapeutic target for cancer treatment. However, clinical trials have suggested that anti-VEGF monotherapy was less effective than standard chemotherapy. On the basis of the evidence, we hypothesized that vegf mRNA may have unrecognized function(s) in cancer cells.
Methods and Findings
Knockdown of VEGF with vegf-targeting small-interfering (si) RNAs increased susceptibility of the human colon cancer cell line (HCT116) to apoptosis caused by 5-fluorouracil, etoposide, or doxorubicin. Recombinant human VEGF165 did not completely inhibit this apoptosis. Conversely, overexpression of VEGF165 increased resistance to anti-cancer drug-induced apoptosis, while an anti-VEGF165-neutralizing antibody did not completely block the resistance. We prepared plasmids encoding full-length vegf mRNA with mutation of signal sequence, vegf mRNAs lacking untranslated regions (UTRs), or mutated 5′UTRs. Using these plasmids, we revealed that the 5′UTR of vegf mRNA possessed anti-apoptotic activity. The 5′UTR-mediated activity was not affected by a protein synthesis inhibitor, cycloheximide. We established HCT116 clones stably expressing either the vegf 5′UTR or the mutated 5′UTR. The clones expressing the 5′UTR, but not the mutated one, showed increased anchorage-independent growth in vitro and formed progressive tumors when implanted in athymic nude mice. Microarray and quantitative real-time PCR analyses indicated that the vegf 5′UTR-expressing tumors had up-regulated anti-apoptotic genes, multidrug-resistant genes, and growth-promoting genes, while pro-apoptotic genes were down-regulated. Notably, expression of signal transducers and activators of transcription 1 (STAT1) was markedly repressed in the 5′UTR-expressing tumors, resulting in down-regulation of a STAT1-responsive cluster of genes (43 genes). As a result, the tumors did not respond to interferon (IFN)α therapy at all. We showed that stable silencing of endogenous vegf mRNA in HCT116 cells enhanced both STAT1 expression and IFNα responses.
These findings suggest that cancer cells have a survival system that is regulated by vegf mRNA and imply that both vegf mRNA and its protein may synergistically promote the malignancy of tumor cells. Therefore, combination of anti-vegf transcript strategies, such as siRNA-based gene silencing, with anti-VEGF antibody treatment may improve anti-cancer therapies that target VEGF.
Shigetada Teshima-Kondo and colleagues find that cancer cells have a survival system that is regulated by vegf mRNA and that vegf mRNA and its protein may synergistically promote the malignancy of tumor cells.
Editors' Summary
Normally, throughout life, cell division (which produces new cells) and cell death are carefully balanced to keep the body in good working order. But sometimes cells acquire changes (mutations) in their genetic material that allow them to divide uncontrollably to form cancers—disorganized masses of cells. When a cancer is small, it uses the body's existing blood supply to get the oxygen and nutrients it needs for its growth and survival. But, when it gets bigger, it has to develop its own blood supply. This process is called angiogenesis. It involves the release by the cancer cells of proteins called growth factors that bind to other proteins (receptors) on the surface of endothelial cells (the cells lining blood vessels). The receptors then send signals into the endothelial cells that tell them to make new blood vessels. One important angiogenic growth factor is “vascular endothelial growth factor” (VEGF). Tumors that make large amounts of VEGF tend to be more abnormal and more aggressive than those that make less VEGF. In addition, high levels of VEGF in the blood are often associated with poor responses to chemotherapy, drug regimens designed to kill cancer cells.
Why Was This Study Done?
Because VEGF is a key regulator of tumor development, several anti-VEGF therapies—drugs that target VEGF and its receptors—have been developed. These therapies strongly suppress the growth of tumor cells in the laboratory and in animals but, when used alone, are no better at increasing the survival times of patients with cancer than standard chemotherapy. Scientists are now looking for an explanation for this disappointing result. Like all proteins, cells make VEGF by “transcribing” its DNA blueprint into an mRNA copy (vegf mRNA), the coding region of which is “translated” into the VEGF protein. Other, “noncoding” regions of vegf mRNA control when and where VEGF is made. Scientists have recently discovered that the noncoding regions of some mRNAs suppress tumor development. In this study, therefore, the researchers investigate whether vegf mRNA has an unrecognized function in tumor cells that could explain the disappointing clinical results of anti-VEGF therapeutics.
What Did the Researchers Do and Find?
The researchers first used a technique called small interfering (si) RNA knockdown to stop VEGF expression in human colon cancer cells growing in dishes. siRNAs are short RNAs that bind to and destroy specific mRNAs in cells, thereby preventing the translation of those mRNAs into proteins. The treatment of human colon cancer cells with vegf-targeting siRNAs made the cells more sensitive to chemotherapy-induced apoptosis (a type of cell death). This sensitivity was only partly reversed by adding VEGF to the cells. By contrast, cancer cells engineered to make more vegf mRNA had increased resistance to chemotherapy-induced apoptosis. Treatment of these cells with an antibody that inhibited VEGF function did not completely block this resistance. Together, these results suggest that both vegf mRNA and VEGF protein have anti-apoptotic effects. The researchers show that the anti-apoptotic activity of vegf mRNA requires a noncoding part of the mRNA called the 5′ UTR, and that whereas human colon cancer cells expressing this 5′ UTR form tumors in mice, cells expressing a mutated 5′ UTR do not. Finally, they report that the expression of several pro-apoptotic genes and of an anti-tumor pathway known as the interferon/STAT1 tumor suppression pathway is down-regulated in tumors that express the vegf 5′ UTR.
What Do These Findings Mean?
These findings suggest that some cancer cells have a survival system that is regulated by vegf mRNA and are the first to show that a 5′UTR of mRNA can promote tumor growth. They indicate that VEGF and its mRNA work together to promote the development of tumors and to increase their resistance to chemotherapy drugs. They suggest that combining therapies that prevent the production of vegf mRNA (for example, siRNA-based gene silencing) with therapies that block the function of VEGF might improve survival times for patients whose tumors overexpress VEGF.
Additional Information.
Please access these Web sites via the online version of this summary at
This study is discussed further in a PLoS Medicine Perspective by Hughes and Jones
The US National Cancer Institute provides information about all aspects of cancer, including information on angiogenesis, and on bevacizumab, an anti-VEGF therapeutic (in English and Spanish)
CancerQuest, from Emory University, provides information on all aspects of cancer, including angiogenesis (in several languages)
Cancer Research UK also provides basic information about what causes cancers and how they develop, grow, and spread, including information about angiogenesis
Wikipedia has pages on VEGF and on siRNA (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC2386836  PMID: 18494554
24.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs, many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences. Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation, to downregulate the levels of proteins that drive and coordinate sexual differentiation. Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter’; it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter', maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail’ stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter’, this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
25.  Systematic Analysis of the Role of RNA-Binding Proteins in the Regulation of RNA Stability 
PLoS Genetics  2014;10(11):e1004684.
mRNA half-lives are transcript-specific and vary over a range of more than 100-fold in eukaryotic cells. mRNA stabilities can be regulated by sequence-specific RNA-binding proteins (RBPs), which bind to regulatory sequence elements and modulate the interaction of the mRNA with the cellular RNA degradation machinery. However, it is unclear if this kind of regulation is sufficient to explain the large range of mRNA stabilities. To address this question, we examined the transcriptome of 74 Schizosaccharomyces pombe strains carrying deletions in non-essential genes encoding predicted RBPs (86% of all such genes). We identified 25 strains that displayed changes in the levels of between 4 and 104 mRNAs. The putative targets of these RBPs formed biologically coherent groups, defining regulons involved in cell separation, ribosome biogenesis, meiotic progression, stress responses and mitochondrial function. Moreover, mRNAs in these groups were enriched in specific sequence motifs in their coding sequences and untranslated regions, suggesting that they are coregulated at the posttranscriptional level. We performed genome-wide RNA stability measurements for several RBP mutants, and confirmed that the altered mRNA levels were caused by changes in their stabilities. Although RBPs regulate the decay rates of multiple regulons, only 16% of all S. pombe mRNAs were affected in any of the 74 deletion strains. This suggests that other players or mechanisms are required to generate the observed range of RNA half-lives of a eukaryotic transcriptome.
Author Summary
Messenger RNAs (mRNAs) are the molecules that relay the information from genes (DNA) to proteins. Cells contain different amounts of each mRNA type depending on their function and their situation. The quantity of each mRNA depends on the balance between its production (transcription) and its degradation (mRNA decay). Recent studies have shown that the rate at which each mRNA is degraded is specific for every gene, but little is known about how this is regulated. In this work, we look at the role of a class of proteins that bind to RNA molecules (RNA-binding proteins, or RBPs) in the regulation of RNA decay. By systematically examining cells in which a single RBP has been inactivated we identify those that are important for RNA degradation. We found RBPs that make mRNAs more stable (that is, they are degraded more slowly) and others that make them unstable. These RBPs control the RNAs of genes with common features, suggesting that they provide a way of coordinating the function of groups of genes. However, for many genes we did not find RBPs that control their stability, indicating that other players are important to regulate RNA degradation.
PMCID: PMC4222612  PMID: 25375137
