RNAdb is a comprehensive database of mammalian non-protein-coding RNAs (ncRNAs). There is increasing recognition that ncRNAs play important regulatory roles in multicellular organisms, and there is an expanding rate of discovery of novel ncRNAs as well as an increasing allocation of function. In this update to RNAdb, we provide nucleotide sequences and annotations for tens of thousands of non-housekeeping ncRNAs, including a wide range of mammalian microRNAs, small nucleolar RNAs and larger mRNA-like ncRNAs. Some of these have documented functions and/or expression patterns, but the majority remain of unclear significance, and include PIWI-interacting RNAs, ncRNAs identified from the latest rounds of large-scale cDNA sequencing projects, putative antisense transcripts, as well as ncRNAs predicted on the basis of structural features and alignments. Improvements to the database comprise not only new and updated ncRNA datasets, but also provision of microarray-based expression data and closer interface with more specialized ncRNA resources such as miRBase and snoRNA-LBME-db. To access RNAdb, visit .
Non-coding RNA (ncRNA) genes do not encode proteins but produce functional RNA molecules that play crucial roles in many key biological processes. Recent genome-wide transcriptional profiling studies using tiling arrays in organisms such as human and Arabidopsis have revealed a great number of transcripts, a large portion of which have little or no capability to encode proteins. This unexpected finding suggests that the currently known repertoire of ncRNAs may only represent a small fraction of ncRNAs of the organisms. Thus, efficient and effective prediction of ncRNAs has become an important task in bioinformatics in recent years. Among the available computational methods, the comparative genomic approach seems to be the most powerful to detect ncRNAs. The recent completion of the sequencing of several major plant genomes has made the approach possible for plants.
We have developed a pipeline to predict novel ncRNAs in the Arabidopsis (Arabidopsis thaliana) genome. It starts by comparing the expressed intergenic regions of Arabidopsis as provided in two whole-genome high-density oligo-probe arrays from the literature with the intergenic nucleotide sequences of all completely sequenced plant genomes including rice (Oryza sativa), poplar (Populus trichocarpa), grape (Vitis vinifera), and papaya (Carica papaya). By using multiple sequence alignment, a popular ncRNA prediction program (RNAz), wet-bench experimental validation, protein-coding potential analysis, and stringent screening against various ncRNA databases, the pipeline resulted in 16 families of novel ncRNAs (with a total of 21 ncRNAs).
In this paper, we undertake a genome-wide search for novel ncRNAs in the genome of Arabidopsis by a comparative genomics approach. The identified novel ncRNAs are evolutionarily conserved between Arabidopsis and other recently sequenced plants, and may conduct interesting novel biological functions.
Recent scientific advances have demonstrated the existence of extensive RNA-based regulatory networks involved in orchestrating nearly every cellular process in health and various disease states. This previously hidden layer of functional RNAs is derived largely from non–protein-coding DNA sequences that constitute more than 98% of the genome in humans. These non–protein-coding RNAs (ncRNAs) include subclasses that are well known, such as transfer RNAs and ribosomal RNAs, as well as those that have more recently been characterized, such as microRNAs, small nucleolar RNAs, and long ncRNAs. In this review, we examine the role of these novel ncRNAs in the nervous system and highlight emerging evidence that implicates RNA-based networks in the molecular pathogenesis of stroke. We also describe RNA editing, a related epigenetic mechanism that is partly responsible for generating the exquisite degrees of environmental responsiveness and molecular diversity that characterize ncRNAs. In addition, we discuss the development of future therapeutic strategies for locus-specific and genome-wide regulation of genes and functional gene networks through the modulation of RNA transcription, posttranscriptional RNA processing (eg, RNA modifications, quality control, intracellular trafficking, and local and long-distance intercellular transport), and RNA translation. These novel approaches for neural cell- and tissue-selective reprogramming of epigenetic regulatory mechanisms are likely to promote more effective neuroprotective and neural regenerative responses for safeguarding and even restoring central nervous system function.
Recent analysis of the mouse transcriptional data has revealed the existence of ~34,000 messenger-like non-coding RNAs (ml-ncRNAs). Whereas the functional properties of these ml-ncRNAs are beginning to be unravelled, no functional information is available for the large majority of these transcripts.
A few ml-ncRNAs have been shown to have genomic loci that overlap with microRNA loci, leading us to suspect that a fraction of ml-ncRNAs may encode microRNAs. We therefore developed an algorithm (PriMir) for specifically detecting potential microRNA-encoding transcripts in the entire set of 34,030 mouse full-length ml-ncRNAs. In combination with mouse-rat sequence conservation, this algorithm detected 97 strong miRNA-encoding candidates (80 of them novel), and for 52 of these we obtained experimental evidence for the existence of their corresponding mature microRNA by microarray and stem-loop RT-PCR. Sequence analysis of the microRNA-encoding RNAs revealed an internal motif, whose presence correlates strongly (R² = 0.9, P-value = 2.2 × 10⁻¹⁶) with the occurrence of stem-loops with characteristics of known pre-miRNAs, indicating the presence of a larger number of microRNA-encoding RNAs (from 300 up to 800) in the ml-ncRNA population.
Our work highlights a unique group of ml-ncRNAs and offers clues to their functions.
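The abstract above does not spell out PriMir's criteria, so the following is only a toy illustration of one ingredient of stem-loop detection: counting the uninterrupted base pairs that close a hairpin from the two ends of a candidate window inward. The function name `closing_stem_length`, the `min_loop` parameter, and the example sequence are all assumptions made for illustration; real pre-miRNA predictors use thermodynamic folding and tolerate bulges and mismatches.

```python
# Toy hairpin check (NOT the PriMir algorithm; names and thresholds are hypothetical).
# Watson-Crick pairs plus the G-U wobble pair, for RNA sequences.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def closing_stem_length(seq, min_loop=3):
    """Count consecutive base pairs formed by pairing the 5' end with the 3' end,
    moving inward, while leaving at least `min_loop` unpaired bases in the loop."""
    i, j = 0, len(seq) - 1
    stem = 0
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
        stem += 1
        i += 1
        j -= 1
    return stem

# Made-up example: a 6-nt arm, a 4-nt loop, and the reverse complement of the arm.
candidate = "GCGAUC" + "UUCG" + "GAUCGC"
stem = closing_stem_length(candidate)
```

A window would then be kept as a candidate only if its stem length passes some cutoff; any such cutoff here would be an arbitrary choice, not a value from the study.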
Non-coding RNAs (ncRNAs) are a class of transcribed RNA molecules without protein-coding potential. For a long time, they were regarded as transcriptional noise or as a byproduct of genetic information flow from DNA to protein. However, in recent years, a number of studies have shown that ncRNAs are pervasively transcribed, and most of them show evidence of evolutionary conservation, although they are less conserved than protein-coding genes. More importantly, many ncRNAs have been confirmed as playing crucial regulatory roles in diverse biological processes and tumorigenesis. Here we summarize the functional significance of this class of “dark matter” in terms of its genomic organization, evolutionary conservation, and broad functional classes.
ncRNA; transcription; genetic; long ncRNA; evolution; molecular; gene regulation
Non-protein-coding RNAs (ncRNAs) fulfill a wide range of cellular functions from protein synthesis to regulation of gene expression. Identification of novel regulatory ncRNAs by experimental approaches commonly includes the generation of specialized cDNA libraries encoding small ncRNA species. However, such identification is severely hampered by the presence of constitutively expressed and highly abundant ‘house-keeping’ ncRNAs, such as ribosomal RNAs, small nuclear RNAs or transfer RNAs. We have developed a novel experimental strategy, designated as subtractive hybridization of ncRNA transcripts (SHORT) to specifically select and amplify novel regulatory ncRNAs, which are only expressed at certain stages or under specific growth conditions of cells. The method is based on the selective subtractive hybridization technique, formerly applied to the detection of differentially expressed mRNAs. As a model system, we applied SHORT to Epstein–Barr virus (EBV) infected human B cells. Thereby, we identified 21 novel as well as previously reported ncRNA species to be up-regulated during virus infection. Our method will serve as a powerful tool to identify novel functional ncRNAs acting as genetic switches in the regulation of fundamental cellular processes such as development, tissue differentiation or disease.
It is apparent that non-coding transcripts are a common feature of higher organisms and encode uncharacterized layers of genetic regulation and information. We used public bovine EST data from many developmental stages and tissues, and developed a pipeline for the genome-wide identification and annotation of non-coding RNAs (ncRNAs). We predicted 23,060 bovine ncRNAs, 99% of which are unannotated in known ncRNA databases. Intergenic transcripts accounted for the majority (57%) of the predicted ncRNAs, and the occurrences of ncRNAs and genes were only moderately correlated (r = 0.55, p-value < 2.2e-16). Many of these intergenic non-coding RNAs mapped close to the 3′ or 5′ end of thousands of genes, and many of these were transcribed from the opposite strand with respect to the closest gene, particularly regulatory-related genes. Conservation analyses showed that these ncRNAs were evolutionarily conserved, and many intergenic ncRNAs proximate to genes contained sequence-specific motifs. Correlation analysis of expression between these intergenic ncRNAs and protein-coding genes using RNA-seq data from a variety of tissues showed significant correlations with many transcripts. These results support the hypothesis that ncRNAs are common, transcribed in a regulated fashion and have regulatory functions.
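To make the reported correlation between ncRNA and gene occurrence concrete, here is a minimal sketch of a Pearson correlation coefficient computed over counts per genomic window. The abstract does not say how the genome was binned or which correlation estimator was used, so the window counts below are made-up numbers and `pearson_r` is an illustrative helper, not the authors' code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical counts per genomic window (illustrative only, not study data).
genes_per_window = [12, 5, 20, 8, 15, 3]
ncrnas_per_window = [9, 7, 14, 4, 10, 6]
r = pearson_r(genes_per_window, ncrnas_per_window)
```

A value of r near 1 would mean ncRNAs simply track gene density; a moderate value, as reported above, leaves room for ncRNA-rich regions that are not gene-rich.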
Genome-wide studies have revealed that human and other mammalian genomes are pervasively transcribed and produce thousands of regulatory non-protein-coding RNAs (ncRNAs), including miRNAs, siRNAs, piRNAs and long non-coding RNAs (lncRNAs). Emerging evidence suggests that these ncRNAs also play a pivotal role in genome integrity and stability via the regulation of the DNA damage response (DDR). In this review, we discuss recent findings on the interplay of ncRNAs with the canonical DDR signaling pathway, with a particular emphasis on miRNAs and lncRNAs. While the expression of ncRNAs is regulated in the DDR, the DDR is also subject to regulation by those DNA damage-responsive ncRNAs. In addition, the roles of the Dicer- and Drosha-dependent small RNAs produced in the vicinity of double-strand break sites are also described.
DNA damage response; ncRNAs; miRNAs; lncRNAs; crosstalk
The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacities. In recent years, advances in sequencing technologies have allowed researchers to more fully appreciate the complexity of whole transcriptomes, showing that the vast majority of the genome is transcribed, producing a diverse population of non-protein-coding RNAs (ncRNAs). Thus, the biological significance of ncRNAs has been largely underestimated. Amongst these multiple classes of ncRNAs, the long non-coding RNAs (lncRNAs) are apparently the most numerous and functionally diverse. A small but growing number of lncRNAs have been experimentally studied, and a view is emerging that these are key regulators of epigenetic gene regulation in mammalian cells. LncRNAs have already been implicated in human diseases such as cancer and neurodegeneration, highlighting the importance of this emergent field. In this article, we review the catalogs of annotated lncRNAs and the latest advances in our understanding of lncRNAs.
non-coding RNAs; regulation; long non-coding RNA; epigenetics
Enterococcus faecalis is a commensal bacterium and a major opportunistic human pathogen. In this study, we combined in silico predictions with a novel 5′RACE-derivative method coined ‘5′tagRACE’ to perform the first search for non-coding RNAs (ncRNAs) encoded on the E. faecalis chromosome. We used the 5′tagRACE to simultaneously probe and characterize primary transcripts, and demonstrate here the simplicity, reliability and sensitivity of the method. The 5′tagRACE is complementary to tiling arrays or RNA-sequencing methods, is directly applicable to deep RNA sequencing, and should significantly improve functional studies of bacterial RNA landscapes. From 45 selected loci of the E. faecalis chromosome, we discovered and mapped 29 novel ncRNAs, 10 putative novel mRNAs and 16 antisense transcriptional organizations. We describe in more detail the oxygen-dependent expression of one ncRNA located in an E. faecalis pathogenicity island and the existence of an ncRNA that is antisense to the ncRNA modulator of the RNA polymerase, SsrS, and provide evidence for the functional interplay between two distinct toxin–antitoxin modules.
With the availability of complete genome sequences for a growing number of organisms, high-throughput methods for gene annotation and analysis of genome dynamics are needed. The application of whole-genome tiling microarrays for studies of global gene expression is providing a more unbiased view of the transcriptional activity within genomes. For example, this approach has led to the identification and isolation of many novel non-protein-coding RNAs (ncRNAs), which have been suggested to comprise a major component of the transcriptome that have novel functions involved in epigenetic regulation of the genome. Additionally, tiling arrays have been recently applied to the study of histone modifications and methylation of cytosine bases (DNA methylation). Surprisingly, recent studies combining the analysis of gene expression (transcriptome) and DNA methylation (methylome) using whole-genome tiling arrays revealed that DNA methylation regulates the expression levels of many ncRNAs. Further capture and integration of additional types of genome-wide data sets will help to illuminate additional hidden features of the dynamic genomic landscape that are regulated by both genetic and epigenetic pathways in plants.
Previously, the majority of the human genome was thought to be ‘junk’ DNA with no functional purpose. Over the past decade, the field of RNA research has rapidly expanded, with a concomitant increase in the number of non-protein coding RNA (ncRNA) genes identified in this ‘junk’. Many of the encoded ncRNAs have already been shown to be essential for a variety of vital functions, and this wealth of annotated human ncRNAs requires standardised naming in order to aid effective communication. The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 30,000 approved gene symbols currently listed in the HGNC database (http://www.genenames.org/search), the majority represent protein-coding genes; however, they also include pseudogenes, phenotypic loci and some genomic features. In recent years the list has also increased to include almost 3,000 named human ncRNA genes. HGNC is actively engaging with the RNA research community in order to provide unique symbols and names for each sequence that encodes an ncRNA. Most of the classical small ncRNA genes have now been provided with a unique nomenclature, and work on naming the long (>200 nucleotides) non-coding RNAs (lncRNAs) is ongoing.
ncRNA; RNA; nomenclature; non-protein coding
Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary sequence and secondary structure. We evaluate our approach using a set of predominantly Firmicutes sequences. Our results showed that, although primary sequence-based homology search was inaccurate for diverged ncRNA sequences, our clustering method allowed us to infer motifs that recovered nearly all members of most known ncRNA families. Hence, our method shows promise for discovering new families of ncRNA.
ncRNAs; noncoding RNA; RNA discovery; hierarchical clustering; motif discovery
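As a rough illustration of clustering sequences by inferred homology, the sketch below groups sequences by single-linkage clustering on a k-mer Jaccard distance. This is a deliberately simplified stand-in: it uses primary sequence only, whereas the method described above also uses secondary structure, and the distance measure, the `max_dist` cutoff, and the example sequences are all assumptions.

```python
def kmers(seq, k=3):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b, k=3):
    """1 - |shared k-mers| / |all k-mers| for two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)

def single_linkage_clusters(seqs, max_dist=0.5, k=3):
    """Cut a single-linkage hierarchy at `max_dist`: two sequences share a cluster
    if a chain of pairwise distances <= max_dist connects them (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if jaccard_distance(seqs[i], seqs[j], k) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Made-up sequences forming two obvious families.
seqs = ["ACGUACGUAC", "ACGUACGUAA", "GGGCCCGGGC", "GGGCCCGGGA"]
clusters = single_linkage_clusters(seqs)
```

Each resulting cluster could then be aligned to derive a family motif; a k-mer distance alone would be expected to miss diverged family members, which is the weakness of sequence-only homology that the abstract points out.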
Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.
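The idea of using a coverage distribution can be sketched as follows: compute per-base read depth from mapped read intervals, then report contiguous blocks whose depth reaches a threshold. This is not NorahDesk's implementation, which the abstract does not detail; the interval convention (0-based, half-open), the `min_depth` threshold, and the reads are assumptions chosen for illustration.

```python
def coverage_profile(reads, length):
    """Per-base read depth for 0-based, half-open (start, end) read intervals,
    computed with a difference array in O(len(reads) + length)."""
    diff = [0] * (length + 1)
    for start, end in reads:
        start = max(0, start)
        end = min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth

def covered_blocks(depth, min_depth=2):
    """Maximal runs of positions with depth >= min_depth, as half-open (start, end)."""
    blocks, start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            blocks.append((start, pos))
            start = None
    if start is not None:
        blocks.append((start, len(depth)))
    return blocks

# Made-up reads on a 20-nt reference.
reads = [(2, 8), (3, 9), (4, 8), (15, 20)]
depth = coverage_profile(reads, 20)
blocks = covered_blocks(depth, min_depth=2)
```

Blocks found this way would still need a second filter, such as the secondary-structure assessment mentioned above, before being assigned to an ncRNA class.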
Non-coding RNAs (ncRNAs) are an emerging focus for both computational analysis and experimental research, resulting in a growing number of novel, non-protein coding transcripts with often unknown functions. Whole genome screens in higher eukaryotes, for example, provided evidence for a surprisingly large number of ncRNAs. To supplement these searches, we performed a computational analysis of seven yeast species and searched for new ncRNAs and RNA motifs.
A comparative analysis of the genomes of seven yeast species yielded roughly 2800 genomic loci that showed the hallmarks of evolutionarily conserved RNA secondary structures. A total of 74% of these regions overlapped with annotated non-coding or coding genes in yeast. Coding sequences that carry predicted structured RNA elements belong to a limited number of groups with common functions, suggesting that these RNA elements are involved in post-transcriptional regulation and/or cellular localization. About 700 conserved RNA structures were found outside annotated coding sequences and known ncRNA genes. Many of these predicted elements overlapped with UTR regions of particular classes of protein-coding genes. In addition, a number of RNA elements overlapped with previously characterized antisense transcripts. Transcription of about 120 predicted elements located in promoter regions and other, previously un-annotated, intergenic regions was supported by tiling array experiments, ESTs, or SAGE data.
Our computational predictions strongly suggest that yeasts harbor a substantial pool of several hundred novel ncRNAs. In addition, we describe a large number of RNA structures in coding sequences and also within antisense transcripts that were previously characterized using tiling arrays.
The assumption that RNA can be readily classified into either protein-coding or non-protein–coding categories has pervaded biology for close to 50 years. Until recently, discrimination between these two categories was relatively straightforward: most transcripts were clearly identifiable as protein-coding messenger RNAs (mRNAs), and readily distinguished from the small number of well-characterized non-protein–coding RNAs (ncRNAs), such as transfer, ribosomal, and spliceosomal RNAs. Recent genome-wide studies have revealed the existence of thousands of noncoding transcripts, whose function and significance are unclear. The discovery of this hidden transcriptome and the implicit challenge it presents to our understanding of the expression and regulation of genetic information has made the need to distinguish between mRNAs and ncRNAs both more pressing and more complicated. In this Review, we consider the diverse strategies employed to discriminate between protein-coding and noncoding transcripts and the fundamental difficulties that are inherent in what may superficially appear to be a simple problem. Misannotations can also run in both directions: some ncRNAs may actually encode peptides, and some of those currently thought to do so may not. Moreover, recent studies have shown that some RNAs can function both as mRNAs and intrinsically as functional ncRNAs, which may be a relatively widespread phenomenon. We conclude that it is difficult to annotate an RNA unequivocally as protein-coding or noncoding, with overlapping protein-coding and noncoding transcripts further confounding this distinction. In addition, the finding that some transcripts can function both intrinsically at the RNA level and to encode proteins suggests a false dichotomy between mRNAs and ncRNAs. Therefore, the functionality of any transcript at the RNA level should not be discounted.
Polyadenylated, mRNA-like transcripts with no coding potential are abundant in eukaryotes, but the functions of these long non-coding RNAs (ncRNAs) are enigmatic. In meiosis, Rec12 (Spo11) catalyzes the formation of dsDNA breaks (DSBs) that initiate homologous recombination. Most meiotic recombination is positioned at hotspots, but knowledge of the mechanisms is nebulous. In the fission yeast genome DSBs are located within 194 prominent peaks separated on average by 65-kbp intervals of DNA that are largely free of DSBs.
We compared the genome-wide distribution of DSB peaks to that of polyadenylated ncRNA molecules of the prl class. DSB peaks map to ncRNA loci that may be situated within ORFs, near the boundaries of ORFs and intergenic regions, or most often within intergenic regions. Unconditional statistical tests revealed that this colocalization is non-random and robust (P ≤ 5.5 × 10⁻⁸). Furthermore, we tested and rejected the hypothesis that the ncRNA loci and DSB peaks localize preferentially, but independently, to a third entity on the chromosomes.
Meiotic DSB hotspots are directed to loci that express polyadenylated ncRNAs. This reveals an unexpected, possibly unitary mechanism for what directs meiotic recombination to hotspots. It also reveals a likely biological function for enigmatic ncRNAs. We propose specific mechanisms by which ncRNA molecules, or some aspect of RNA metabolism associated with ncRNA loci, help to position recombination protein complexes at DSB hotspots within chromosomes.
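One way to ask whether such a colocalization is non-random is a Monte Carlo test: count how many peaks fall inside ncRNA loci, then compare that count with counts obtained after placing the same number of peaks uniformly at random. This is a substitute technique shown only for illustration, not the unconditional tests used in the study, and all coordinates, the genome length, and the simulation count below are invented.

```python
import random

def count_hits(peaks, loci):
    """Number of peak positions that fall inside any half-open (start, end) locus."""
    return sum(any(s <= p < e for s, e in loci) for p in peaks)

def colocalization_p(peaks, loci, genome_len, n_sims=2000, seed=0):
    """One-sided Monte Carlo p-value for observing at least as many hits
    when peak positions are drawn uniformly from [0, genome_len)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_hits(peaks, loci)
    at_least = 0
    for _ in range(n_sims):
        simulated = [rng.randrange(genome_len) for _ in peaks]
        if count_hits(simulated, loci) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_sims + 1)

# Hypothetical coordinates on a 100-kb toy chromosome.
loci = [(1000, 2000), (30000, 31000), (70000, 71500)]
peaks = [1500, 30500, 70200, 90000]
observed, p_value = colocalization_p(peaks, loci, genome_len=100_000)
```

A uniform null ignores features such as chromatin state or ORF density that might attract both peaks and loci independently, which is why the abstract separately tests and rejects localization to a third entity.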
Up to 450,000 non-coding RNAs (ncRNAs) have been predicted to be transcribed from the human genome. However, it remains to be elucidated which of these transcripts represent functional ncRNAs. Since all functional ncRNAs in Eukarya form ribonucleoprotein particles (RNPs), we generated specialized cDNA libraries from size-fractionated RNPs and validated the presence of selected ncRNAs within RNPs by glycerol gradient centrifugation. As a proof of concept, we applied the RNP method to human HeLa cells and total mouse brain, and subjected cDNA libraries generated from the two model systems to deep sequencing. Bioinformatic analysis of cDNA sequences revealed several hundred ncRNP candidates. These ncRNA candidates were mainly located in intergenic as well as intronic regions of the genome, with a significant overrepresentation of intron-derived ncRNA sequences. Additionally, a number of ncRNAs mapped to repetitive sequences. Thus, our RNP approach provides an efficient way to identify new functional small ncRNA candidates involved in RNP formation.
Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. In recent years, there have been increasing reports of polyadenylated ncRNAs and mRNA-like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource, which also contains expression information distributed over 97 non-normalized cDNA libraries.
We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline utilises sequence similarity to ncRNA databases (BLAST), structure similarity to Rfam (RaveNnA) as well as multiple alignments to predict conserved novel putative RNA structures (RNAz). EST2ncRNA was fed with 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline we identified known RNA structures in 137 contigs and single reads (conreads), and predicted high-confidence RNA structures in non-protein-coding regions of an additional 1,262 conreads. Of these, structures in 270 conreads overlap with existing predictions in human. To sum up, the PigEST resource comprises trans-acting elements (ncRNAs) in 715 contigs and 340 singletons as well as cis-acting elements (inside UTRs) in 311 contigs and 51 singletons, of which 18 conreads contain predictions of both trans- and cis-acting elements. The predicted RNAz candidates were compared with the PigEST expression information, and we identified 114 contigs with an RNAz prediction and expression in at least ten of the non-normalised cDNA libraries. We conclude that the contigs with RNAz and known predictions are in general expressed at a much lower level than protein-coding transcripts. We also observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest amount of ncRNA candidates, about two percent. These observations are related to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of porcine coding transcripts (of 18,600 identified) as well as less than one-third of ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding and 10% of the remaining matches are unique between the PigEST data and the cow genome.
Based on the pig-cow alignments, we searched for similarities to 16 other organisms using alignments available from UCSC, which resulted, for instance, in 87% coverage by the human genome.
Besides recovering several of the already annotated functional RNA structures, we predicted a large number of high-confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of predicted ncRNA candidates, together with the observations of a higher relative amount in cDNA libraries from developmental stages, are in agreement with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs.
Much of the genome is transcribed into long non-coding RNAs (ncRNAs). Previous data suggested that bithoraxoid (bxd) ncRNAs of the Drosophila bithorax complex prevent silencing of Ultrabithorax (Ubx), and recruit activating proteins of the trithorax group to their maintenance elements. We found that, surprisingly, Ubx and several bxd ncRNAs are expressed in non-overlapping patterns in both embryos and imaginal discs, suggesting that transcription of these ncRNAs is associated with repression, not activation, of Ubx. Our data rule out siRNA or miRNA-based mechanisms for repression by bxd ncRNAs. Rather, ncRNA transcription itself, acting in cis, represses Ubx. The Trithorax complex TAC1 binds the Ubx coding region in nuclei expressing Ubx, and the bxd region in nuclei not expressing Ubx. We propose that TAC1 promotes the mosaic pattern of Ubx expression by facilitating transcriptional elongation of bxd ncRNAs, which represses Ubx transcription.
Expression profiling of eukaryotic genomes has revealed widespread transcription outside the confines of protein-coding genes, leading to production of antisense and non-coding RNAs (ncRNAs). Studies in Schizosaccharomyces pombe and multicellular organisms suggest that transcription and ncRNAs provide a framework for the assembly of heterochromatin, which has been linked to various chromosomal processes. In addition to gene regulation, heterochromatin is critical for centromere function, cell fate determination as well as transcriptional and posttranscriptional silencing of repetitive DNA elements. Recently, heterochromatin factors have been shown to suppress antisense RNAs at euchromatic loci. These findings define conserved pathways that likely have major impact on the epigenetic regulation of eukaryotic genomes.
The availability of sequencing technology has enabled understanding of transcriptomes through genome-wide approaches including RNA-sequencing. Contrary to the previous assumption that large tracts of eukaryotic genomes are not transcriptionally active, recent evidence from transcriptome sequencing approaches has revealed pervasive transcription in many genomes of higher eukaryotes. Many of these loci encode transcripts that have no obvious protein-coding potential and are designated as non-coding RNA (ncRNA). Non-coding RNAs are classified empirically as small and long non-coding RNAs based on the size of the functional RNAs. Each of these classes is further classified into functional subclasses. Although microRNAs (miRNAs), one of the major subclasses of ncRNAs, have been extensively studied for their roles in regulation of gene expression and involvement in a large number of patho-physiological processes, the functions of a large proportion of long non-coding RNAs (lncRNAs) remain elusive. We hypothesized that some lncRNAs could potentially be processed to small RNA and thus could have a dual regulatory output.
Integration of large-scale independent experimental datasets in the public domain revealed that certain well-studied lncRNAs harbor small RNA clusters. Expression analysis of the small RNA clusters in different tissue and cell types reveals that they are differentially regulated, suggesting a regulated biogenesis mechanism.
Our analysis suggests the existence of a potentially novel pathway for lncRNA processing into small RNAs. Expression analysis further suggests that this pathway is regulated. We argue that this evidence supports our hypothesis, though limitations of the datasets and analysis cannot completely rule out alternate possibilities. Further in-depth experimental verification of the observation could potentially reveal a novel pathway for biogenesis.
This article was reviewed by Dr Rory Johnson (nominated by Fyodor Kondrashov), Dr Raya Khanin (nominated by Dr Yuriy Gusev) and Prof Neil Smalheiser. For full reviews, please go to the Reviewer’s comment section.
Imprinted macro non-protein-coding (nc) RNAs are cis-repressor transcripts that silence multiple genes in at least three imprinted gene clusters in the mouse genome. Similar macro or long ncRNAs are abundant in the mammalian genome. Here we present the full coding and non-coding transcriptome of two mouse tissues: differentiated ES cells and fetal head using an optimized RNA-Seq strategy. The data produced is highly reproducible in different sequencing locations and is able to detect the full length of imprinted macro ncRNAs such as Airn and Kcnq1ot1, whose length ranges between 80–118 kb. Transcripts show a more uniform read coverage when RNA is fragmented with RNA hydrolysis compared with cDNA fragmentation by shearing. Irrespective of the fragmentation method, all coding and non-coding transcripts longer than 8 kb show a gradual loss of sequencing tags towards the 3′ end. Comparisons to published RNA-Seq datasets show that the strategy presented here is more efficient in detecting known functional imprinted macro ncRNAs and also indicate that standardization of RNA preparation protocols would increase the comparability of the transcriptome between different RNA-Seq datasets.
In mammals, thousands of long non-protein-coding RNAs (ncRNAs) (>200 nt) have recently been described. However, the biological significance and function of the vast majority of these transcripts remain unclear. We have constructed a public repository, the Noncoding RNA Expression Database (NRED), which provides gene expression information for thousands of long ncRNAs in human and mouse. The database contains both microarray and in situ hybridization data, much of which is described here for the first time. NRED also supplies a rich tapestry of ancillary information for featured ncRNAs, including evolutionary conservation, secondary structure evidence, genomic context links and antisense relationships. The database is available at http://jsm-research.imb.uq.edu.au/NRED, and the web interface enables both advanced searches and data downloads. Taken together, NRED should significantly advance the study and understanding of long ncRNAs, and provides a timely and valuable resource to the scientific community.