It has long been assumed that DNA sequences and their corresponding RNA transcripts are almost identical; a recent discovery, however, revealed widespread RNA-DNA differences (RDDs), which represent a largely unexplored aspect of human genome variation. It has been speculated that RDDs can affect disease susceptibility and manifestations; however, almost nothing is known about how RDDs are related to disease. Here, we show that RDDs are rarer in proto-oncogenes than in tumor suppressor genes: the number of RDDs in coding exons, but not in 3′UTRs and 5′UTRs, is significantly lower in the former than in the latter, and this trend is especially pronounced for non-synonymous RDDs, i.e., those that cause amino acid changes. A potential mechanism is that, unlike proto-oncogenes, tumor suppressor genes must have both alleles affected to cause a tumor, and this requirement ‘buffers’ them so that they can tolerate more RDDs.
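The distinction between synonymous and non-synonymous RDDs can be made concrete with a small sketch. It translates the affected codon before and after the change using the standard genetic code and labels the site accordingly. It assumes an in-frame, sense-strand coding sequence and a 0-based position. The function name `classify_coding_rdd` is hypothetical, and this is not the authors' pipeline.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_rdd(dna_cds: str, pos: int, rna_base: str) -> str:
    """Classify an RNA-DNA difference at 0-based position `pos` of an in-frame CDS.

    The RNA base may be given as U; it is mapped to T so codons can be looked up
    in the DNA-alphabet table above.
    """
    dna_cds = dna_cds.upper()
    rna_base = rna_base.upper().replace("U", "T")
    if dna_cds[pos] == rna_base:
        return "no difference"
    start = pos - pos % 3  # first base of the codon containing the site
    if start + 3 > len(dna_cds):
        raise ValueError("site falls in an incomplete trailing codon")
    dna_codon = dna_cds[start:start + 3]
    offset = pos - start
    rna_codon = dna_codon[:offset] + rna_base + dna_codon[offset + 1:]
    if CODON_TABLE[dna_codon] == CODON_TABLE[rna_codon]:
        return "synonymous"
    return "non-synonymous"


print(classify_coding_rdd("ATGGCT", 5, "C"))  # GCT -> GCC, both alanine
print(classify_coding_rdd("ATGGCT", 3, "A"))  # GCT -> ACT, alanine -> threonine
```

Stop-gain changes (a codon becoming `*`) fall under "non-synonymous" here. A fuller classifier might report them separately.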
For many years, the central dogma of molecular biology has held that RNA functions mainly as an informational intermediate between a DNA sequence and its encoded protein. But one of the great surprises of modern biology was the discovery that protein-coding genes represent less than 2% of the total genome sequence, and subsequently that at least 90% of the human genome is actively transcribed. Thus, the human transcriptome was found to be more complex than a collection of protein-coding genes and their splice variants. Although these transcripts were initially argued to be spurious transcriptional noise or accumulated evolutionary debris arising from the early assembly of genes and/or the insertion of mobile genetic elements, recent evidence suggests that non-coding RNAs (ncRNAs) may play major biological roles in cellular development, physiology and pathologies. NcRNAs can be grouped into two major classes based on transcript size: small ncRNAs and long ncRNAs. Each of these classes can be further subdivided, and novel subclasses are still being discovered and characterized. Although small ncRNAs called microRNAs have been the most frequently studied in recent years, with more than ten thousand hits in the PubMed database, evidence has recently begun to accumulate describing the molecular mechanisms by which a wide range of novel RNA species function, providing insight into their roles in cellular biology and in human disease. In this review, we summarize newly discovered classes of ncRNAs and highlight their functions in cancer biology and their potential use as biomarkers or therapeutic targets.
Non-coding RNAs; microRNAs; siRNAs; piRNAs; lncRNAs; Cancer
RNA editing is an important cellular process by which the nucleotides in a mature RNA transcript are altered so that they differ from the corresponding DNA sequence. While this process yields essential transcripts in humans and other organisms, it is believed to occur at a relatively small number of loci. The rarity of RNA editing has been challenged by a recent comparison of human RNA and DNA sequence data from 27 individuals, which revealed that over 10,000 human exonic sites appear to exhibit RNA-DNA differences (RDDs). Many of these differences could not have been caused by either of the two previously known human RNA editing mechanisms (ADAR-mediated A→G substitutions or APOBEC1-mediated C→U switches), suggesting that a previously unknown mechanism of RNA editing may be active in humans. Here, we reanalyze these data and demonstrate that, for the majority of RDDs, matching genomic sequences exist in these same individuals or in the human genome. Our results suggest that the majority of these RDD events were observed because of accurate transcription of sequences paralogous to the apparently edited gene but differing at the edited site. In light of our results, it seems prudent to conclude that if an unknown mechanism is indeed causing RDD events in humans, such events occur at a much lower frequency than originally proposed.
It has recently been shown that RNA 3′ end formation plays a more widespread role in controlling gene expression than previously thought. In order to examine the impact of regulated 3′ end formation genome-wide we applied direct RNA sequencing to A. thaliana. Here we show the authentic transcriptome in unprecedented detail and how 3′ end formation impacts genome organization. We reveal extreme heterogeneity in RNA 3′ ends, discover previously unrecognized non-coding RNAs and propose widespread re-annotation of the genome. We explain the origin of most poly(A)+ antisense RNAs and identify cis-elements that control 3′ end formation in different registers. These findings are essential to understand what the genome actually encodes, how it is organized and the impact of regulated 3′ end formation on these processes.
Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome as well as conserved, widespread regulatory structures called ‘Excludons’, which mediate regulation through long non-coding antisense RNAs.
Comparative transcriptome sequencing of two closely related bacterial strains reveals a hidden layer of divergence in the non-coding genome. Pathogen-specific non-coding RNAs, which might contribute to virulence, are revealed. The Listeria genome contains a class of unusually long antisense RNAs (lasRNAs) that span divergent genes, repressing expression of the genes located opposite to them while activating expression of the adjacent divergent genes. The genetic organization of these lasRNAs and operons was named an excludon. The exhaustive transcriptome information from this publication is provided as an open resource with a web-accessible transcriptome browser.
Listeria monocytogenes is a human, food-borne pathogen. Genomic comparisons between L. monocytogenes and Listeria innocua, a closely related non-pathogenic species, were pivotal in the identification of protein-coding genes essential for virulence. However, no comprehensive comparison has focused on the non-coding genome. We used strand-specific cDNA sequencing to produce genome-wide transcription start site maps for both organisms, and developed a publicly available integrative browser to visualize and analyze both transcriptomes in different growth conditions and genetic backgrounds. Our data revealed conservation across most transcripts, but significant divergence between the species in a subset of non-coding RNAs. In L. monocytogenes, we identified 113 small RNAs (33 novel) and 70 antisense RNAs (53 novel), significantly increasing the repertoire of ncRNAs in this species. Remarkably, we identified a class of long antisense transcripts (lasRNAs) that overlap one gene while also serving as the 5′ UTR of the adjacent divergent gene. Experimental evidence suggests that lasRNA transcription inhibits expression of one operon while activating expression of another. Such a lasRNA/operon structure, which we named an ‘excludon’, might represent a novel form of regulation in bacteria.
comparative genomics; Listeria monocytogenes; RNA-seq; transcriptome; TSS map
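Genome-wide TSS maps like those described above are built from the positions of read 5′ ends. As a rough illustration only, the sketch below counts strand-aware 5′ ends of mapped reads and reports positions whose count reaches a minimum threshold. The read tuple format and the `min_reads` cutoff are hypothetical. Published TSS pipelines typically add enrichment-based or statistical filters that are not shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical read format: (leftmost, rightmost, strand), 0-based inclusive coordinates.
Read = Tuple[int, int, str]


def five_prime_end(read: Read) -> Tuple[int, str]:
    """The 5' end is the leftmost base on '+' and the rightmost base on '-'."""
    left, right, strand = read
    return (left if strand == "+" else right, strand)


def candidate_tss(reads: Iterable[Read], min_reads: int = 3) -> List[Tuple[int, str, int]]:
    """Return (position, strand, count) for 5'-end positions supported by >= min_reads reads."""
    counts = Counter(five_prime_end(r) for r in reads)
    return sorted((pos, strand, n) for (pos, strand), n in counts.items() if n >= min_reads)


reads = [(100, 150, "+")] * 3 + [(101, 140, "+")] + [(200, 260, "-"), (210, 260, "-"), (220, 260, "-")]
print(candidate_tss(reads))
```

Minus-strand reads that start at different leftmost positions still share one 5′ end here (position 260). This is why the strand-aware choice of end matters.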
In recent years, the introduction of massively parallel sequencing platforms for Next Generation Sequencing (NGS) protocols, able to sequence hundreds of thousands of DNA fragments simultaneously, has dramatically changed the landscape of genetic studies. RNA-Seq for transcriptome studies, ChIP-Seq for DNA-protein interactions, and CNV-Seq for large-scale genomic copy number variations are only some of the intriguing new applications supported by these innovative platforms. Among them, RNA-Seq is perhaps the most complex NGS application. Expression levels of specific genes, differential splicing, and allele-specific expression of transcripts can be accurately determined by RNA-Seq experiments to address many biological questions. None of these attributes is readily achievable with the previously widespread hybridization-based or tag sequence-based approaches. However, the unprecedented level of sensitivity and the large amount of data produced by NGS platforms bring new challenges and issues along with clear advantages. While this technology makes many new biological observations and discoveries possible, it also requires considerable effort in developing new bioinformatics tools to deal with these massive data files. This paper surveys the RNA-Seq methodology, focusing particularly on the challenges that this application presents from both a biological and a bioinformatics point of view.
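Expression-level estimates from RNA-Seq usually start by normalizing read counts for transcript length and sequencing depth. The sketch below computes two widely used normalizations, RPKM (reads per kilobase of transcript per million mapped reads) and TPM (transcripts per million). It is an illustration only, not a method taken from the surveyed paper, and the example counts and transcript names are made up.

```python
from typing import Dict


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def tpm(read_counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale so values sum to 1e6."""
    rates = {t: read_counts[t] / (lengths_bp[t] / 1000) for t in read_counts}
    total = sum(rates.values())
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Made-up counts for two transcripts of different lengths.
counts = {"tx_short": 500, "tx_long": 1000}
lengths = {"tx_short": 1000, "tx_long": 4000}
print(rpkm(counts["tx_short"], lengths["tx_short"], total_mapped_reads=1_000_000))
print(tpm(counts, lengths))
```

The longer transcript receives twice as many reads but ends up with half the TPM, because its reads are spread over four times the length. TPM values always sum to one million, which makes relative abundances easier to compare across samples than RPKM.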
Neurons modulate gene expression with subcellular precision through excitation-coupled local protein synthesis, a process that is regulated in part by microRNAs (miRNAs), a class of small non-coding RNAs. The biosynthesis of miRNAs is reviewed, with special emphasis on miRNA families, the subcellular localization of specific miRNAs in neurons, and their potential roles in the response to drugs of abuse. For over a decade, DNA microarrays have dominated genome-wide gene expression studies, revealing widespread effects of drug exposure on neuronal gene expression. We review a number of recent studies that explore the emerging role of miRNAs in the biochemical and behavioral responses to cocaine. More powerful next-generation sequencing technology offers certain advantages and is supplanting microarrays for the analysis of complex transcriptomes. Next-generation sequencing is unparalleled in its ability to identify and quantify low-abundance transcripts without prior sequence knowledge, facilitating the accurate detection and quantification of miRNAs expressed in total tissue and miRNAs localized to postsynaptic densities (PSDs). We previously identified cocaine-responsive miRNAs, synaptically enriched and depleted miRNA families, and confirmed cocaine-induced changes in protein expression for several bioinformatically predicted target genes. The miR-8 family was found to be highly enriched and cocaine-regulated at the PSD, where its members may modulate expression of cell adhesion molecules. An integrative approach that combines mRNA, miRNA, and protein expression profiling with focused single-gene studies and innovative behavioral paradigms should facilitate the development of more effective therapeutic approaches to treat addiction.
cocaine; RNA-Seq; postsynaptic density; cell adhesion; miR-8; microRNAs; synaptic plasticity
In the mammalian cortex, neurons and glia form a patterned structure across six layers whose complex cytoarchitectonic arrangement is likely to contribute to cognition. We sequenced transcriptomes from layers 1-6b of different areas (primary and secondary) of the adult (postnatal day 56) mouse somatosensory cortex to understand the transcriptional levels and functional repertoires of coding and noncoding loci for cells constituting these layers. A total of 5,835 protein-coding genes and 66 noncoding RNA loci are differentially expressed (“patterned”) across the layers, as determined by a machine-learning (naive Bayes) approach. Layers 2-6b are each associated with specific functional and disease annotations that provide insights into their biological roles. This new resource (http://genserv.anat.ox.ac.uk/layers) greatly extends currently available resources, such as the Allen Mouse Brain Atlas and microarray data sets, by providing quantitative expression levels, by being genome-wide, by including novel loci, and by identifying candidate alternatively spliced transcripts that are differentially expressed across layers.
► Online atlas of genome-wide transcription across neocortical layers ► Significant, replicated associations between disease genes and specific layers ► Widespread isoform switching across layers ► LincRNAs conserved, coexpressed across layers with neighboring protein-coding genes
Foot-and-mouth disease virus (FMDV) uses a highly conserved Arg-Gly-Asp (RGD) triplet for attachment to host cells, and this motif is believed to be essential for virus viability. Previous sequence analyses of the 1D-encoding region of an FMDV field isolate (Asia1/JS/CHA/05) and its two derivatives indicated that two viruses, which contained an Arg-Asp-Asp (RDD) or an Arg-Ser-Asp (RSD) triplet instead of the RGD integrin recognition motif, were generated serendipitously during short-term evolution of the field isolate in different biological environments. To examine the influence of single amino acid substitutions in the receptor binding site of the RDD-containing FMD viral genome on virus viability, and the ability of non-RGD FMDVs to cause disease in susceptible animals, we constructed an RDD-containing FMDV full-length cDNA clone and derived mutant molecules with RGD or RSD receptor recognition motifs. Following transfection of BSR cells with the full-length genome plasmids, the genetically engineered viruses were examined for their infectious potential in cell culture and in susceptible animals.
Amino acid sequence analysis of the 1D-coding region of different derivatives of the Asia1/JS/CHA/05 field isolate revealed that, at an early phase of type Asia1 FMDV quasispecies evolution, the RDD mutants either became dominant or reached a population equilibrium with coexisting RGD and RSD subpopulations. Furthermore, the RDD and RSD sequences remained genetically stable for at least 20 passages. Using reverse genetics, the RDD-, RSD-, and RGD-containing FMD viruses were rescued from full-length cDNA clones, and single amino acid substitutions in the RDD-containing FMD viral genome did not affect virus viability. The genetically engineered viruses replicated stably in BHK-21 cells and had growth properties similar to those of the parental virus. The RDD parental virus and the two non-RGD recombinant viruses were virulent in pigs and cattle, which developed typical clinical disease and viremia.
FMDV quasispecies evolving in different biological environments gained the capability of selecting different receptor recognition sites. The RDD-containing FMD viral genome can accommodate substitutions in the receptor binding site without additional changes in the capsid. Viruses expressing non-RGD receptor binding sites can replicate stably in vitro and produce typical FMD clinical disease in susceptible animals.
Splicing is a cellular mechanism that dictates eukaryotic gene expression by removing the noncoding introns and ligating the coding exons to form a messenger RNA molecule. Alternative splicing (AS) adds a major level of complexity to this mechanism and thus to the regulation of gene expression. This widespread cellular phenomenon generates multiple messenger RNA isoforms from a single gene by using alternative splice sites and promoting different exon–intron inclusions and exclusions. AS greatly increases the coding potential of eukaryotic genomes and hence contributes to the diversity of eukaryotic proteomes. Mutations that disrupt either constitutive splicing or AS cause several diseases, including myotonic dystrophy and cystic fibrosis. Aberrant splicing is also well established in cancer. Identification of rare novel mutations associated with splice-site recognition, and with splicing regulation in general, could provide further insight into the genetic mechanisms of rare diseases. Here, the disease relevance of aberrant splicing is reviewed, and a new methodological approach is described: starting from the disease phenotype, employing exome sequencing, and identifying rare mutations affecting splicing regulation. Exome sequencing has emerged as a reliable method for finding sequence variations associated with various disease states. To date, genetic studies using exome sequencing to find disease-causing mutations have focused on the discovery of nonsynonymous single nucleotide polymorphisms that alter amino acids or introduce early stop codons, or on the use of exome sequencing as a means to genotype known single nucleotide polymorphisms. The involvement of splicing mutations in inherited diseases has received little attention, so such mutations likely occur more frequently than currently estimated.
Studies of exome sequencing followed by molecular and bioinformatic analyses have great potential to reveal the high impact of splicing mutations underlying human disease.
Chlamydia trachomatis is an obligate intracellular pathogenic bacterium that has been refractory to genetic manipulation. Although the genomes of several strains have been sequenced, very little information is available on the gene structure of these bacteria. We used deep sequencing to define the transcriptomes of purified elementary bodies (EB) and reticulate bodies (RB) of C. trachomatis L2b. Using an RNA-seq approach, we mapped 363 transcriptional start sites (TSS) of annotated genes. Semi-quantitative analysis of mapped cDNA reads revealed differences in the RNA levels of 84 genes between EB and RB. We identified, and in part confirmed, 42 genome-derived and 1 plasmid-derived novel non-coding RNAs. The genome-encoded non-coding RNA ctrR0332 was one of the most abundant RNAs and was differentially expressed between EB and RB, implying an important role in the developmental cycle of C. trachomatis. This detailed TSS map, at unprecedented resolution, complements the genome sequence and will help to understand the organization, control and function of genes of this important pathogen.
The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacity. In recent years, advances in sequencing technologies have allowed researchers to more fully appreciate the complexity of whole transcriptomes, showing that the vast majority of the genome is transcribed, producing a diverse population of non-protein-coding RNAs (ncRNAs). Thus, the biological significance of ncRNAs has been largely underestimated. Amongst these multiple classes of ncRNAs, the long non-coding RNAs (lncRNAs) are apparently the most numerous and functionally diverse. A small but growing number of lncRNAs have been experimentally studied, and a view is emerging that these are key regulators of epigenetic gene regulation in mammalian cells. LncRNAs have already been implicated in human diseases such as cancer and neurodegeneration, highlighting the importance of this emergent field. In this article, we review the catalogs of annotated lncRNAs and the latest advances in our understanding of lncRNAs.
non-coding RNAs; regulation; long non-coding RNA; epigenetics
Human papillomaviruses (HPV) cause diseases ranging from benign warts to invasive tumours. A subset of these viruses, termed “high risk”, infects the cervix, where persistent infection can lead to cervical cancer. Although many HPV genomes have been sequenced, knowledge of virus gene expression and its regulation is still incomplete. This is due in part to the lack, until recently, of suitable systems for virus propagation in the laboratory. HPV gene expression is polycistronic, initiating from multiple promoters. Gene regulation occurs at the transcriptional level but particularly at post-transcriptional levels, including RNA processing, nuclear export, mRNA stability and translation. A close association between the virus replication cycle and epithelial differentiation adds a further layer of complexity. Understanding HPV mRNA expression and its regulation in the different diseases associated with infection may lead to the development of novel diagnostic approaches and will reveal key viral and cellular targets for the development of novel antiviral therapies.
Human papillomavirus; gene expression; transcription; RNA processing; translation; diagnostics; antiviral therapy
Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Mitochondrial genomes are a valuable source of data for analysing phylogenetic relationships. Besides sequence information, mitochondrial gene order may add phylogenetically useful information, too. Sipuncula are unsegmented marine worms, traditionally placed in their own phylum. Recent molecular and morphological findings suggest a close affinity to the segmented Annelida.
The first complete mitochondrial genome of a member of Sipuncula, Sipunculus nudus, is presented. All 37 genes characteristic of metazoan mtDNA were detected, and all are encoded on the same strand. The mitochondrial gene order (protein-coding and ribosomal RNA genes) resembles that of annelids but shows several derived features so far found only in Sipuncula. Sequence-based phylogenetic analysis of mitochondrial protein-coding genes yields significant bootstrap support for Annelida sensu lato, combining Annelida together with Sipuncula, Echiura, Pogonophora and Myzostomida.
The mitochondrial sequence data support a close relationship of Annelida and Sipuncula. Also the most parsimonious explanation of changes in gene order favours a derivation from the annelid gene order. These results complement findings from recent phylogenetic analyses of nuclear encoded genes as well as a report of a segmental neural patterning in Sipuncula.
The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of ‘transcription noise’. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high throughput manner.
We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that either are syntenically conserved between human and mouse or originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% of cases, the flanking regions have no homology at all, and in the remaining cases the degree of conservation is significantly lower); (iv) existence mostly as single-exon forms (8/11); and (v) presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that contribute to the pool of lncRNAs, among which genes implicated in carcinogenesis are significantly over-represented.
Our comparative mammalian genomics approach coupled with evolutionary analysis identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that amongst the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.
Gene expression in mitochondria of kinetoplastid protozoa requires RNA editing, a post-transcriptional process that involves insertion or deletion of uridine residues at specific sites within mitochondrial pre-mRNAs. Sequence specificity of the RNA editing process is mediated by oligo-uridylated small non-coding RNAs, designated guide RNAs (gRNAs). In this study, we analyzed the small ncRNA transcriptome from kinetoplast mitochondria of Leishmania tarentolae by generating specialized cDNA libraries encoding size-selected RNA species. Through this screen, a significant number of novel oligo-uridylated RNA species, which we have termed oU-RNAs, were identified. Most novel oU-RNAs are present as stable RNA species in mitochondria, as assessed by northern blot analysis, and show expression levels and sizes similar to those previously reported for canonical gRNAs. Several oU-RNAs are transcribed from both strands of the maxicircle and minicircle components of the mitochondrial genome, from regions where no transcription had previously been reported. Two stable oU-RNAs exhibit an anchor sequence in antisense orientation to known gRNAs and thus might regulate editing of the respective pre-mRNAs. A number of oU-RNAs map in antisense orientation to non-edited protein-coding genes, suggesting that they might function by a different mechanism. In addition, our screen shows that all kinetoplast-derived RNAs are prone to some degree of uridylation.
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species.
This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision.
The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained with a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
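Two ideas in this abstract can be made concrete with the standard genetic code: the ambiguity of backtranslation, and the conventional "most used codon" criterion that EasyBack is compared against. The sketch below counts how many DNA sequences encode a given protein. It also backtranslates by always choosing the most frequent synonymous codon from a codon-usage table. It implements only that baseline criterion, not EasyBack's Hidden Markov Model. The usage values in the example are hypothetical.

```python
from math import prod
from typing import Dict, List

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: Dict[str, List[str]] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO_ACIDS[16 * i + 4 * j + k], []).append(a + b + c)


def count_backtranslations(protein: str) -> int:
    """Number of distinct coding sequences for `protein` (degeneracy of the code)."""
    return prod(len(SYNONYMS[aa]) for aa in protein.upper())


def backtranslate_most_used(protein: str, codon_usage: Dict[str, float]) -> str:
    """Baseline criterion: pick the most frequent synonymous codon for each residue.

    Codons absent from `codon_usage` count as 0; ties keep the first codon in TCAG order.
    """
    return "".join(
        max(SYNONYMS[aa], key=lambda codon: codon_usage.get(codon, 0.0))
        for aa in protein.upper()
    )


print(count_backtranslations("MLK"))  # 1 * 6 * 2 = 12 possible sequences
hypothetical_usage = {"CTG": 40.0, "TTA": 7.0, "AAA": 24.0, "AAG": 32.0}  # illustrative values only
print(backtranslate_most_used("MLK", hypothetical_usage))
```

The count grows multiplicatively with protein length. This is why a selection criterion, whether codon usage or a sequence-similarity model, is unavoidable.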
Antisense RNAs that originate from the complementary strand of protein-coding genes are involved in the regulation of gene expression in all domains of life. In bacteria, some of these antisense RNAs are transcriptional noise, while others play a vital role in adapting the cell to changing environmental conditions. Deep sequencing analysis of the transcriptome of Salmonella enterica serovar Typhi revealed a partial RNA sequence encoded in cis to the dnaA gene. Northern blot and RACE analysis confirmed the transcription of this antisense RNA, which was expressed mostly in the stationary phase of bacterial growth and also under iron limitation and osmotic stress. Pulse expression analysis showed that overexpression of the antisense RNA resulted in a significant increase in dnaA mRNA levels, which would ultimately enhance its translation. Our findings reveal that the dnaA antisense RNA is not merely a by-product of the cell's transcription machinery but plays a vital role in the stability of dnaA mRNA.
The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling of microRNA expression and detection of novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process leading to the generation of ncRNAs of diverse functions from a single primary transcript. To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus, the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation sequencing data. Its major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products, and annotation of known transcripts based on multiple sources of information. To demonstrate the potential of APART, we analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. Using the APART pipeline, we detected, and confirmed by independent experimental methods, multiple novel stable RNA molecules that are differentially processed, in a stress-dependent manner, from well-known ncRNAs such as rRNAs, tRNAs or snoRNAs.
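The abstract says that APART handles non-unique reads efficiently, but it does not say how. One common strategy, shown here purely as an illustrative assumption and not as APART's actual method, splits each multi-mapping read evenly across all of its reported loci. The per-locus totals then still sum to the number of mapped reads. The feature names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def fractional_counts(alignments: Dict[str, List[str]]) -> Dict[str, float]:
    """Split each read evenly over every locus it aligns to.

    `alignments` maps a read ID to the loci (feature names) it aligns to; unmapped
    reads (empty lists) are skipped, so the totals sum to the number of mapped reads.
    """
    counts: Dict[str, float] = defaultdict(float)
    for loci in alignments.values():
        if not loci:
            continue
        share = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += share
    return dict(counts)


# Toy alignments with hypothetical feature names.
alignments = {"read1": ["tRNA_x"], "read2": ["tRNA_x", "snoRNA_y"], "read3": []}
print(fractional_counts(alignments))
```

Even splitting is the simplest choice. A refinement is to weight each share by the unique-read coverage of the competing loci.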
It was recently shown that a new class of small nuclear RNAs is encoded in introns of protein-coding genes and that these RNAs originate by processing of the pre-mRNA in which they are contained. Little is known about the mechanism and the factors involved in this new type of processing. The L1 ribosomal protein gene of Xenopus laevis is well suited for studying this phenomenon: several of its introns encode two small nucleolar RNAs (snoRNAs; U16 and U18). In this paper, we analyzed the in vitro processing of these snoRNAs and showed that both are released from the pre-mRNA by a common mechanism: endonucleolytic cleavages convert the pre-mRNA into a precursor snoRNA with 5' and 3' trailer sequences, and subsequent trimming converts the pre-snoRNAs into mature molecules. Oocyte and HeLa nuclear extracts process X. laevis and human substrates in a similar manner, indicating that the processing of this class of snoRNAs relies on a common and evolutionarily conserved mechanism. In addition, we found that the cleavage activity is strongly enhanced in the presence of Mn2+ ions.
Non-coding RNA (ncRNA) transcripts are RNA molecules that do not code for proteins but function through other mechanisms. The vast majority of RNA produced in a cell is non-coding ribosomal RNA, produced from relatively few loci; however, more recent complementary DNA (cDNA) cloning, tag sequencing, and genome tiling array studies suggest that ncRNAs also account for the majority of RNA species produced by a cell. ncRNA-based regulation has been referred to as a ‘hidden layer’ of signals or ‘dark matter’ that controls gene expression in cellular processes by poorly described mechanisms. These terms arose because, until recently, ncRNAs were ignored by expression profiling and cDNA annotation projects, and because their modes of action are diverse (e.g., influencing chromatin structure and epigenetics, translational silencing, transcriptional silencing). Here, we highlight recent functional genomics strategies for identifying ncRNA transcription and assigning it function.
non-coding RNA; Sequencing; transcription; annotation
Serotyping data for pneumococci causing invasive and noninvasive disease in 2008–2009 and 2010–2011 from >43 US centers were compared with data from the preconjugate vaccine (1999–2000) and postconjugate vaccine (2004–2005) periods. Prevalence of 7-valent pneumococcal conjugate vaccine serotypes decreased from 64% of invasive and 50% of noninvasive isolates in 1999–2000 to 3.8% and 4.2%, respectively, in 2010–2011. Increases in serotype 19A stopped after the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13) in 2010. Prevalences of other predominant serotypes included in or related to PCV13 (3, 6C, 7F) also remained similar between 2008–2009 and 2010–2011. The only major serotype that increased from 2008–2009 to 2010–2011 was nonvaccine serotype 35B. These data show that introduction of the 7-valent vaccine has dramatically decreased the prevalence of its serotypes and that the addition of serotypes in PCV13 could provide coverage of 39% of the isolates that continue to cause disease.
Streptococcus pneumoniae; bacteria; streptococci; serotype; vaccine; conjugate vaccines; pathogenesis; United States
The transmission of information from DNA to RNA is a critical process. We compared RNA sequences from human B cells of 27 individuals with the corresponding DNA sequences from the same individuals and uncovered more than 10,000 exonic sites where the RNA sequences do not match those of the DNA. All 12 possible categories of discordance were observed. These differences were nonrandom, as many sites were found in multiple individuals and in different cell types, including primary skin cells and brain tissues. Using mass spectrometry, we detected peptides that are translated from the discordant RNA sequences and thus do not correspond exactly to the DNA sequences. These widespread RNA-DNA differences in the human transcriptome represent a yet unexplored aspect of genome variation.
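The figure of 12 possible categories follows from the four nucleotides. Each DNA base can be read in RNA as any of the three other bases, giving 4 × 3 = 12 directed substitution types (A>G, C>T, and so on). The sketch below enumerates these categories and tallies them for pre-aligned DNA/RNA base pairs. The `(DNA base, RNA base)` input format is a hypothetical simplification that ignores read-level quality and alignment details.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, Tuple

NUCLEOTIDES = "ACGT"  # RNA U is written as T here
# Ordered (DNA base, RNA base) pairs with different bases: 4 * 3 = 12 categories.
RDD_CATEGORIES = [f"{dna}>{rna}" for dna, rna in permutations(NUCLEOTIDES, 2)]


def tally_rdds(sites: Iterable[Tuple[str, str]]) -> Counter:
    """Count RDD categories from (DNA base, RNA base) pairs at aligned sites."""
    counts: Counter = Counter()
    for dna, rna in sites:
        dna, rna = dna.upper(), rna.upper().replace("U", "T")
        if dna != rna:
            counts[f"{dna}>{rna}"] += 1
    return counts


print(len(RDD_CATEGORIES))
print(tally_rdds([("A", "G"), ("A", "G"), ("C", "U"), ("G", "G")]))
```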