Related Articles

1.  Clusters of Internally Primed Transcripts Reveal Novel Long Noncoding RNAs 
PLoS Genetics  2006;2(4):e37.
Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and which had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air.
The human genome has been sequenced, and, intriguingly, less than 2% specifies the information for the basic protein building blocks of our bodies. So, what does the other 98% do? It now appears that the mammalian genome also specifies the instructions for many previously undiscovered “non protein-coding RNA” (ncRNA) genes. However, what these ncRNAs do is largely unknown. In recent years, strategies have been designed that have successfully identified hundreds of short ncRNAs—termed microRNAs—many of which have since been shown to act as genetic regulators. Also known to be functionally important are a handful of ncRNAs orders of magnitude larger in size than microRNAs. The availability of complete genome and comprehensive transcript sequences allows for the systematic discovery of more large ncRNAs. The authors developed a computational strategy to screen the mouse genome and identify large ncRNAs. They detected existing large ncRNAs, thus validating their approach, but, more importantly, discovered more than 60 other candidates, some of which were subsequently confirmed experimentally. This work opens the door to a virtually unexplored world of large ncRNAs and beckons future experimental work to define the cellular functions of these molecules.
PMCID: PMC1449886  PMID: 16683026
2.  Detection of RNA structures in porcine EST data and related mammals 
BMC Genomics  2007;8:316.
Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. Within recent years, there have been increasing reports of observed polyadenylated ncRNAs and mRNA like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource which also contains expression information distributed on 97 non-normalized cDNA libraries.
We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline utilises sequence similarity to ncRNA databases (blast), structure similarity to Rfam (RaveNnA) as well as multiple alignments to predict conserved novel putative RNA structures (RNAz). EST2ncRNA was fed with 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline we identified known RNA structures in 137 contigs and single reads (conreads), and predicted high confidence RNA structures in non-protein coding regions of an additional 1,262 conreads. Of these, structures in 270 conreads overlap with existing predictions in human. To sum up, the PigEST resource comprises trans-acting elements (ncRNAs) in 715 contigs and 340 singletons as well as cis-acting elements (inside UTRs) in 311 contigs and 51 singletons, of which 18 conreads contain both predictions of trans- and cis-acting elements. The predicted RNAz candidates were compared with the PigEST expression information and we identify 114 contigs with an RNAz prediction and expression in at least ten of the non-normalised cDNA libraries. We conclude that the contigs with RNAz and known predictions are in general expressed at a much lower level than protein coding transcripts. In addition, we also observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest amount of ncRNA candidates, about two percent. These observations are related to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of porcine coding transcripts (of 18,600 identified) as well as less than one-third of ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding and 10% of the remaining matches are unique between the PigEST data and cow genome.
Based on the pig-cow alignments, we searched for similarities to 16 other organisms using the alignments available from UCSC, which resulted, for instance, in 87% coverage by the human genome.
Besides recovering several of the already annotated functional RNA structures, we predicted a large number of high confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of predicted ncRNA candidates, together with the observations of a higher relative amount in cDNA libraries from developmental stages, are in agreement with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs.
PMCID: PMC2072958  PMID: 17845718
3.  Distinguishing Protein-Coding from Non-Coding RNAs through Support Vector Machines 
PLoS Genetics  2006;2(4):e29.
RIKEN's FANTOM project has revealed many previously unknown coding sequences, as well as an unexpected degree of variation in transcripts resulting from alternative promoter usage and splicing. Ever more transcripts that do not code for proteins have been identified by transcriptome studies, in general. Increasing evidence points to the important cellular roles of such non-coding RNAs (ncRNAs). The distinction of protein-coding RNA transcripts from ncRNA transcripts is therefore an important problem in understanding the transcriptome and carrying out its annotation. Very few in silico methods have specifically addressed this problem. Here, we introduce CONC (for “coding or non-coding”), a novel method based on support vector machines that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, predicted secondary structure content, predicted percentage of exposed residues, compositional entropy, number of homologs from database searches, and alignment entropy. Nucleotide frequencies are also incorporated into the method. Confirmed coding cDNAs for eukaryotic proteins from the Swiss-Prot database constituted the set of true positives, ncRNAs from RNAdb and NONCODE the true negatives. Ten-fold cross-validation suggested that CONC distinguished coding RNAs from ncRNAs at about 97% specificity and 98% sensitivity. Applied to 102,801 mouse cDNAs from the FANTOM3 dataset, our method reliably identified over 14,000 ncRNAs and estimated the total number of ncRNAs to be about 28,000.
There are two types of RNA: messenger RNAs (mRNAs), which are translated into proteins, and non-coding RNAs (ncRNAs), which function as RNA molecules. Besides textbook examples such as tRNAs and rRNAs, non-coding RNAs have been found to carry out very diverse functions, from mRNA splicing and RNA modification to translational regulation. It has been estimated that non-coding RNAs make up the vast majority of transcription output of higher eukaryotes. Discriminating mRNA from ncRNA has become an important biological and computational problem. The authors describe a computational method based on a machine learning algorithm known as a support vector machine (SVM) that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, secondary structure content, and protein alignment information. The method is applied to the dataset from the FANTOM3 large-scale mouse cDNA sequencing project; it identifies over 14,000 ncRNAs in mouse and estimates the total number of ncRNAs in the FANTOM3 data to be about 28,000.
PMCID: PMC1449884  PMID: 16683024
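A minimal sketch of two of the peptide features named in the abstract above (amino acid composition and compositional entropy), plus the sensitivity and specificity measures it reports. This is not the CONC implementation: the function names are hypothetical, Shannon entropy in bits is an assumed definition of "compositional entropy", and no SVM is trained here.

```python
import math
from collections import Counter


def amino_acid_composition(peptide):
    """Return the fraction of each residue in the peptide (empty dict if empty)."""
    if not peptide:
        return {}
    counts = Counter(peptide)
    total = len(peptide)
    return {residue: n / total for residue, n in counts.items()}


def compositional_entropy(peptide):
    """Shannon entropy, in bits, of the residue composition (assumed definition)."""
    fractions = amino_acid_composition(peptide).values()
    return -sum(p * math.log2(p) for p in fractions)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

A feature vector built from values like these (together with the other features the abstract lists) would be what a classifier such as an SVM consumes; a low-entropy peptide is one dominated by a few residue types.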
4.  Computational prediction of novel non-coding RNAs in Arabidopsis thaliana 
BMC Bioinformatics  2009;10(Suppl 1):S36.
Non-coding RNA (ncRNA) genes do not encode proteins but produce functional RNA molecules that play crucial roles in many key biological processes. Recent genome-wide transcriptional profiling studies using tiling arrays in organisms such as human and Arabidopsis have revealed a great number of transcripts, a large portion of which have little or no capability to encode proteins. This unexpected finding suggests that the currently known repertoire of ncRNAs may only represent a small fraction of ncRNAs of the organisms. Thus, efficient and effective prediction of ncRNAs has become an important task in bioinformatics in recent years. Among the available computational methods, the comparative genomic approach seems to be the most powerful to detect ncRNAs. The recent completion of the sequencing of several major plant genomes has made the approach possible for plants.
We have developed a pipeline to predict novel ncRNAs in the Arabidopsis (Arabidopsis thaliana) genome. It starts by comparing the expressed intergenic regions of Arabidopsis as provided in two whole-genome high-density oligo-probe arrays from the literature with the intergenic nucleotide sequences of all completely sequenced plant genomes including rice (Oryza sativa), poplar (Populus trichocarpa), grape (Vitis vinifera), and papaya (Carica papaya). By using multiple sequence alignment, a popular ncRNA prediction program (RNAz), wet-bench experimental validation, protein-coding potential analysis, and stringent screening against various ncRNA databases, the pipeline resulted in 16 families of novel ncRNAs (with a total of 21 ncRNAs).
In this paper, we undertake a genome-wide search for novel ncRNAs in the genome of Arabidopsis by a comparative genomics approach. The identified novel ncRNAs are evolutionarily conserved between Arabidopsis and other recently sequenced plants, and may conduct interesting novel biological functions.
PMCID: PMC2648795  PMID: 19208137
5.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs, many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences. Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation, to downregulate the levels of proteins that drive and coordinate sexual differentiation. Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter’; it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
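The selection rule in the preceding paragraph (antisense more abundant than sense in at least one sample) can be sketched as follows. This is an illustrative sketch only: the data layout, the function name and the locus identifiers are hypothetical, and the authors' actual read-count normalisation and statistics are not reproduced.

```python
def putative_art_loci(counts):
    """Return the loci where antisense exceeds sense in at least one sample.

    `counts` maps a locus name to a list of (sense, antisense) read counts,
    one pair per sample. The layout is an assumption made for this sketch.
    """
    return sorted(
        locus
        for locus, samples in counts.items()
        if any(antisense > sense for sense, antisense in samples)
    )


# Hypothetical read counts for two loci across two samples.
example_counts = {
    "locusA": [(100, 5), (20, 40)],   # antisense > sense in the second sample
    "locusB": [(50, 10), (60, 60)],   # antisense never exceeds sense
}
flagged = putative_art_loci(example_counts)
```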
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter’, maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail’ stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter’, this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
6.  Non-Coding RNA Prediction and Verification in Saccharomyces cerevisiae 
PLoS Genetics  2009;5(1):e1000321.
Non-coding RNA (ncRNA) play an important and varied role in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but the ability to do so has remained elusive due to a lack of apparent genomic features. In this work, thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window sizes and step sizes on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates, common to both S. cerevisiae and S. bayanus, were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript is well supported but the data are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements.
Author Summary
Recent advances in DNA sequence technology have made it possible to sequence entire genomes. Once a genome is sequenced, it becomes necessary to identify the set of genes and other functional elements within the genome. This is particularly challenging as much of the genomic sequence does not appear to perform any function and is loosely referred to as “junk.” Identifying functional elements among the “junk” is difficult. Experimental methods have been developed for this purpose but they are time-consuming, expensive, and often provide an incomplete picture. Thus, it is important to develop the ability to identify these functional elements using computational methods. Protein-coding genes are relatively easy to identify computationally, but other categories of functional elements present a significantly greater challenge. In this work, we used a computational approach to identify genes that do not encode for a protein but rather function as an RNA molecule. We then used experimental methods to verify our predictions and thereby validate the computational method.
PMCID: PMC2603021  PMID: 19119416
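A minimal sketch of the window-based Z-score idea described in the abstract above: scan the sequence in windows of a chosen size and step, then compare each window's folding energy with the energies of shuffled versions of the same window. The function names are hypothetical, `energy_fn` is a placeholder for a real folding-energy calculation that is not supplied here, and simple mononucleotide shuffling is an assumption rather than the paper's stated procedure.

```python
import random
import statistics


def sliding_windows(sequence, size, step):
    """Yield (start, subsequence) for every full-length window."""
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]


def shuffled_energies(sequence, energy_fn, n_shuffles=100, seed=0):
    """Energies of mononucleotide-shuffled copies of `sequence` (assumed null model)."""
    rng = random.Random(seed)
    energies = []
    for _ in range(n_shuffles):
        bases = list(sequence)
        rng.shuffle(bases)
        energies.append(energy_fn("".join(bases)))
    return energies


def z_score(native_energy, background_energies):
    """(native - mean) / population standard deviation; 0.0 if there is no spread."""
    mean = statistics.mean(background_energies)
    spread = statistics.pstdev(background_energies)
    if spread == 0:
        return 0.0
    return (native_energy - mean) / spread
```

A strongly negative Z-score means the native window is more stable than its shuffled counterparts. Smaller steps give finer positional resolution at the cost of more energy evaluations, which is the window-size and step-size trade-off the abstract mentions exploring.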
7.  A Genome-Wide Survey of Highly Expressed Non-Coding RNAs and Biological Validation of Selected Candidates in Agrobacterium tumefaciens 
PLoS ONE  2013;8(8):e70720.
Agrobacterium tumefaciens is a plant pathogen that has the natural ability of delivering and integrating a piece of its own DNA into the plant genome. Although bacterial non-coding RNAs (ncRNAs) have been shown to regulate various biological processes including virulence, we have limited knowledge of how Agrobacterium ncRNAs regulate this unique inter-Kingdom gene transfer. Using whole transcriptome sequencing and an ncRNA search algorithm developed for this work, we identified 475 highly expressed candidate ncRNAs from A. tumefaciens C58, including 101 trans-encoded small RNAs (sRNAs), 354 antisense RNAs (asRNAs), 20 5′ untranslated region (UTR) leaders including an RNA thermosensor and 6 riboswitches. Moreover, transcription start site (TSS) mapping analysis revealed that about 51% of the mapped mRNAs have 5′ UTRs longer than 60 nt, suggesting that numerous cis-acting regulatory elements might be encoded in the A. tumefaciens genome. Eighteen asRNAs were found on the complementary strands of virA, virB, virC, virD, and virE operons. Fifteen ncRNAs were induced and 7 were suppressed by the Agrobacterium virulence (vir) gene inducer acetosyringone (AS), a phenolic compound secreted by the plants. Interestingly, fourteen of the AS-induced ncRNAs have putative vir box sequences in the upstream regions. We experimentally validated expression of 36 ncRNAs using Northern blot and Rapid Amplification of cDNA Ends analyses. We show functional relevance of two 5′ UTR elements: an RNA thermosensor (C1_109596F) that may regulate translation of the major cold shock protein cspA, and a thi-box riboswitch (C1_2541934R) that may transcriptionally regulate a thiamine biosynthesis operon, thiCOGG. Further studies on ncRNA functions in this bacterium may provide insights and strategies that can be used to better manage pathogenic bacteria for plants and to improve Agrobacterium-mediated plant transformation.
PMCID: PMC3738593  PMID: 23950988
8.  Identification and characterisation of non-coding small RNAs in the pathogenic filamentous fungus Trichophyton rubrum 
BMC Genomics  2013;14:931.
Accumulating evidence demonstrates that non-coding RNAs (ncRNAs) are indispensable components of many organisms and play important roles in cellular events, regulation, and development.
Here, we analysed the small non-coding RNA (ncRNA) transcriptome of Trichophyton rubrum by constructing and sequencing a cDNA library from conidia and mycelia. We identified 352 ncRNAs and their corresponding genomic loci. These ncRNA candidates included 198 entirely novel ncRNAs and 154 known ncRNAs classified as snRNAs, snoRNAs and other known ncRNAs. Further bioinformatic analysis detected 96 snoRNAs, including 56 snoRNAs that had been annotated in other organisms and 40 novel snoRNAs. All snoRNAs belonged to two major classes—C/D box snoRNAs and H/ACA snoRNAs—and their potential target sites in rRNAs and snRNAs were predicted. To analyse the evolutionary conservation of the ncRNAs in T. rubrum, we aligned all 352 ncRNAs to the genomes of six dermatophytes and to the NCBI non-redundant nucleotide database (NT). The results showed that most of the identified snRNAs were conserved in dermatophytes. Of the 352 ncRNAs, 102 also had genomic loci in other dermatophytes, and 27 were dermatophyte-specific.
Our systematic analysis may provide important clues to the function and evolution of ncRNAs in T. rubrum. These results also provide important information to complement the current annotation of the T. rubrum genome, which primarily comprises protein-coding genes.
PMCID: PMC3890542  PMID: 24377353
9.  Activating RNAs associate with Mediator to enhance chromatin architecture and transcription 
Nature  2013;494(7438):497-501.
Recent advances in genomic research have revealed the existence of a large number of transcripts devoid of protein-coding potential in multiple organisms 1-8. While the functional role for long non-coding RNAs (lncRNAs) has been best defined in epigenetic phenomena such as X inactivation and imprinting, different classes of lncRNAs may have varied biological functions 8-13. We and others have identified a class of lncRNAs, termed ncRNA-activating (ncRNA-a), that function to activate their neighboring genes using a cis-mediated mechanism 5,14-16. To define the precise mode by which such enhancer-like RNAs function, we depleted factors with known roles in transcriptional activation and assessed their role in RNA-dependent activation. Here we report that depletion of the components of the co-activator complex, Mediator, specifically and potently diminished the ncRNA-induced activation of transcription in such a heterologous reporter assay. In vivo, Mediator is recruited to ncRNA-a target genes, and regulates their expression. We show that ncRNA-as interact with Mediator to regulate its chromatin localization and kinase activity toward histone H3 serine 10. Mediator complex harboring disease causing MED12 mutations 17,18 displays diminished ability to associate with activating ncRNAs. Chromosome conformation capture (3C) confirmed the presence of DNA looping between the ncRNA-a loci and their targets. Importantly, depletion of Mediator subunits or ncRNA-as reduced the chromatin looping between the two loci. Our results identify the human Mediator complex as the transducer of activating ncRNAs and highlight the importance of Mediator and activating ncRNAs association in human disease.
PMCID: PMC4109059  PMID: 23417068
10.  Microarray analysis of ncRNA expression patterns in Caenorhabditis elegans after RNAi against snoRNA associated proteins 
BMC Genomics  2008;9:278.
Short non-coding RNAs (ncRNAs) perform their cellular functions in ribonucleoprotein (RNP) complexes, which are also essential for maintaining the stability of the ncRNAs. Depletion of individual protein components of non-coding ribonucleoprotein (ncRNP) particles by RNA interference (RNAi) may therefore affect expression levels of the corresponding ncRNA, and depletion of candidate associated proteins may constitute an alternative strategy when investigating ncRNA-protein interactions and ncRNA functions. Therefore, we carried out a pilot study in which the effects of RNAi against protein components of small nucleolar RNPs (snoRNPs) in Caenorhabditis elegans were observed on an ncRNA microarray.
RNAi against individual C. elegans protein components of snoRNPs produced strongly reduced mRNA levels and distinct phenotypes for all targeted proteins. For each type of snoRNP, individual depletion of at least three of the four protein components produced significant (P ≤ 1.2 × 10⁻⁵) reductions in the expression levels of the corresponding small nucleolar RNAs (snoRNAs), whereas the expression levels of other ncRNAs were largely unaffected. The effects of depletion of individual proteins were in accordance with snoRNP structure analyses obtained in other species for all but two of the eight targeted proteins. Variations in snoRNA size, sequence and secondary structure characteristics were not systematically reflected in the affinity for individual protein components of snoRNPs. The data supported the classification of nearly all annotated snoRNAs and suggested the presence of several novel snoRNAs among unclassified short ncRNA transcripts. A number of transcripts containing canonical Sm binding element sequences (Sm Y RNAs) also showed reduced expression after depletion of protein components of C/D box snoRNPs, whereas the expression of some stem-bulge RNAs (sbRNAs) was increased after depletion of the same proteins.
The study confirms observations made for other organisms, where reduced ncRNA levels after depletion of protein components of ncRNPs were noted, and shows that such reductions in expression levels occur across entire sets of ncRNA. Thereby, the study also demonstrates the feasibility of combining RNAi against candidate proteins with ncRNA microarray analysis to investigate ncRNA-protein interactions and hence ncRNA cellular functions.
PMCID: PMC2442092  PMID: 18547420
11.  The Emerging Role of Epigenetics in Stroke 
Archives of neurology  2010;67(12):1435-1441.
Recent scientific advances have demonstrated the existence of extensive RNA-based regulatory networks involved in orchestrating nearly every cellular process in health and various disease states. This previously hidden layer of functional RNAs is derived largely from non–protein-coding DNA sequences that constitute more than 98% of the genome in humans. These non–protein-coding RNAs (ncRNAs) include subclasses that are well known, such as transfer RNAs and ribosomal RNAs, as well as those that have more recently been characterized, such as microRNAs, small nucleolar RNAs, and long ncRNAs. In this review, we examine the role of these novel ncRNAs in the nervous system and highlight emerging evidence that implicates RNA-based networks in the molecular pathogenesis of stroke. We also describe RNA editing, a related epigenetic mechanism that is partly responsible for generating the exquisite degrees of environmental responsiveness and molecular diversity that characterize ncRNAs. In addition, we discuss the development of future therapeutic strategies for locus-specific and genome-wide regulation of genes and functional gene networks through the modulation of RNA transcription, posttranscriptional RNA processing (eg, RNA modifications, quality control, intracellular trafficking, and local and long-distance intercellular transport), and RNA translation. These novel approaches for neural cell- and tissue-selective reprogramming of epigenetic regulatory mechanisms are likely to promote more effective neuroprotective and neural regenerative responses for safeguarding and even restoring central nervous system function.
PMCID: PMC3667617  PMID: 21149808
12.  Identification of Intermediate-Size Non-Coding RNAs Involved in the UV-Induced DNA Damage Response in C. elegans 
PLoS ONE  2012;7(11):e48066.
A network of DNA damage response (DDR) mechanisms functions coordinately to maintain genome integrity and prevent disease. The Nucleotide Excision Repair (NER) pathway is known to function in the response to UV-induced DNA damage. Although a number of coding genes and miRNAs have been identified and reported to participate in the UV-induced DNA damage response (UV-DDR), the precise role of non-coding RNAs (ncRNAs) in UV-DDR remains largely unknown.
Methodology/Principal Findings
We used high-throughput RNA-sequencing (RNA-Seq) to discover intermediate-size (70–500 nt) ncRNAs (is-ncRNAs) in C. elegans, using L4 larvae of the wild-type (N2), UV-irradiated (N2/UV100) and NER-deficient mutant (xpa-1) strains; 450 novel non-coding transcripts were initially identified. A customized microarray assay was then applied to examine the expression profiles of both novel transcripts and known is-ncRNAs, and 57 UV-DDR-related is-ncRNA candidates showed expression variations at different levels between UV-irradiated and non-irradiated strains. The top-ranked is-ncRNA candidates with expression differences were further validated by qRT-PCR analysis; of these, 8 novel is-ncRNAs were significantly up-regulated after UV irradiation. Knockdown of two novel is-ncRNAs, ncRNA317 and ncRNA415, by RNA interference resulted in higher UV sensitivity and significantly decreased expression of NER-related genes in C. elegans.
The discovery of these two novel is-ncRNAs indicates functional roles for is-ncRNAs in the regulation of the UV-DDR network and aids our understanding of the significance of ncRNA involvement in the UV-induced DNA damage response.
PMCID: PMC3492359  PMID: 23144846
13.  Genomic and Transcriptional Co-Localization of Protein-Coding and Long Non-Coding RNA Pairs in the Developing Brain 
PLoS Genetics  2009;5(8):e1000617.
Besides protein-coding mRNAs, eukaryotic transcriptomes include many long non-protein-coding RNAs (ncRNAs) of unknown function that are transcribed away from protein-coding loci. Here, we have identified 659 intergenic long ncRNAs whose genomic sequences individually exhibit evolutionary constraint, a hallmark of functionality. Of this set, those expressed in the brain are more frequently conserved and are significantly enriched with predicted RNA secondary structures. Furthermore, brain-expressed long ncRNAs are preferentially located adjacent to protein-coding genes that are (1) also expressed in the brain and (2) involved in transcriptional regulation or in nervous system development. This led us to the hypothesis that spatiotemporal co-expression of ncRNAs and nearby protein-coding genes represents a general phenomenon, a prediction that was confirmed subsequently by in situ hybridisation in developing and adult mouse brain. We provide the full set of constrained long ncRNAs as an important experimental resource and present, for the first time, substantive and predictive criteria for prioritising long ncRNA and mRNA transcript pairs when investigating their biological functions and contributions to development and disease.
Author Summary
Virtually all of the eukaryotic genome is transcribed, yet far from all transcripts encode protein. Very little is known about the functions of most non-coding transcripts or, indeed, whether they convey functions at all. Among all such transcripts, we have chosen to consider long non-coding RNAs (ncRNAs) that are transcribed outside of known protein-coding gene loci. Our approach has focused on mouse long ncRNAs whose genomic sequences are conserved in humans, and also on ncRNAs that are expressed in the brain. This conservation might reflect the functionality of the underlying DNA, rather than the ncRNA, sequence. However, this cannot fully explain the concentration of predicted RNA structures in these ncRNAs. These long ncRNAs also tend to be transcribed in the genomic neighbourhood of protein-coding genes whose functions relate to transcription or to nervous system development. These observations are consistent with the positive transcriptional regulation in cis of these genes with nearby transcription of ncRNAs. This model implies co-expression of protein-coding and noncoding transcripts, a hypothesis that we validated experimentally. These findings are particularly important because they provide a rationale for prioritising specific ncRNAs when experimentally investigating regulation of protein-coding gene expression.
PMCID: PMC2722021  PMID: 19696892
14.  Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities 
PLoS Computational Biology  2008;4(11):e1000176.
The assumption that RNA can be readily classified into either protein-coding or non-protein–coding categories has pervaded biology for close to 50 years. Until recently, discrimination between these two categories was relatively straightforward: most transcripts were clearly identifiable as protein-coding messenger RNAs (mRNAs), and readily distinguished from the small number of well-characterized non-protein–coding RNAs (ncRNAs), such as transfer, ribosomal, and spliceosomal RNAs. Recent genome-wide studies have revealed the existence of thousands of noncoding transcripts, whose function and significance are unclear. The discovery of this hidden transcriptome and the implicit challenge it presents to our understanding of the expression and regulation of genetic information has made the need to distinguish between mRNAs and ncRNAs both more pressing and more complicated. In this Review, we consider the diverse strategies employed to discriminate between protein-coding and noncoding transcripts and the fundamental difficulties that are inherent in what may superficially appear to be a simple problem. Misannotations can also run in both directions: some ncRNAs may actually encode peptides, and some of those currently thought to do so may not. Moreover, recent studies have shown that some RNAs can function both as mRNAs and intrinsically as functional ncRNAs, which may be a relatively widespread phenomenon. We conclude that it is difficult to annotate an RNA unequivocally as protein-coding or noncoding, with overlapping protein-coding and noncoding transcripts further confounding this distinction. In addition, the finding that some transcripts can function both intrinsically at the RNA level and to encode proteins suggests a false dichotomy between mRNAs and ncRNAs. Therefore, the functionality of any transcript at the RNA level should not be discounted.
PMCID: PMC2518207  PMID: 19043537
15.  Regulation of Non-Coding RNA Networks in the Nervous System—What’s the REST of the Story? 
Neuroscience letters  2009;466(2):73-80.
Recent advances are now providing novel insights into the mechanisms that underlie how cellular complexity, diversity, and connectivity are encoded within the genome. The repressor element-1 silencing transcription factor / neuron-restrictive silencing factor (REST/NRSF) and non-coding RNAs (ncRNAs) are emerging as key regulators that seem to orchestrate almost every aspect of nervous system development, homeostasis, and plasticity. REST and its primary cofactor, CoREST, dynamically recruit highly malleable macromolecular complexes to widely distributed genomic regulatory sequences, including the repressor element 1 / neuron restrictive silencer element (RE1/NRSE). Through epigenetic mechanisms, such as site-specific targeting and higher-order chromatin remodeling, REST and CoREST can mediate cell type- and developmental stage-specific gene repression, gene activation, and long-term gene silencing for protein-coding genes and for several classes of ncRNAs (e.g. microRNAs [miRNAs] and long ncRNAs). In turn, these ncRNAs have similarly been implicated in the regulation of chromatin architecture and dynamics, transcription, post-transcriptional processing, and RNA editing and trafficking. In addition, REST and CoREST expression and function are tightly regulated by context-specific transcriptional and post-transcriptional mechanisms including bidirectional feedback loops with various ncRNAs. Not surprisingly, deregulation of REST and ncRNAs are both implicated in the molecular pathophysiology underlying diverse disorders that range from brain cancer and stroke to neurodevelopmental and neurodegenerative diseases. This review summarizes emerging aspects of the complex mechanistic relationships between these intricately interlaced control systems for neural gene expression and function.
PMCID: PMC2767456  PMID: 19679163
repressor element-1 silencing transcription factor/neuron-restrictive silencer factor (REST/NRSF); CoREST; neural stem cell; oligodendrocyte; glia; neuron; epigenetic; non-coding RNA (ncRNA); microRNA (miRNA)
16.  RNAdb 2.0—an expanded database of mammalian non-coding RNAs 
Nucleic Acids Research  2006;35(Database issue):D178-D182.
RNAdb is a comprehensive database of mammalian non-protein-coding RNAs (ncRNAs). There is increasing recognition that ncRNAs play important regulatory roles in multicellular organisms, and there is an expanding rate of discovery of novel ncRNAs as well as an increasing allocation of function. In this update to RNAdb, we provide nucleotide sequences and annotations for tens of thousands of non-housekeeping ncRNAs, including a wide range of mammalian microRNAs, small nucleolar RNAs and larger mRNA-like ncRNAs. Some of these have documented functions and/or expression patterns, but the majority remain of unclear significance, and include PIWI-interacting RNAs, ncRNAs identified from the latest rounds of large-scale cDNA sequencing projects, putative antisense transcripts, as well as ncRNAs predicted on the basis of structural features and alignments. Improvements to the database comprise not only new and updated ncRNA datasets, but also provision of microarray-based expression data and closer interface with more specialized ncRNA resources such as miRBase and snoRNA-LBME-db. To access RNAdb, visit .
PMCID: PMC1751534  PMID: 17145715
17.  Unique Signatures of Long Noncoding RNA Expression in Response to Virus Infection and Altered Innate Immune Signaling 
mBio  2010;1(5):e00206-10.
Studies of the host response to virus infection typically focus on protein-coding genes. However, non-protein-coding RNAs (ncRNAs) are transcribed in mammalian cells, and the roles of many of these ncRNAs remain enigmas. Using next-generation sequencing, we performed a whole-transcriptome analysis of the host response to severe acute respiratory syndrome coronavirus (SARS-CoV) infection across four founder mouse strains of the Collaborative Cross. We observed differential expression of approximately 500 annotated, long ncRNAs and 1,000 nonannotated genomic regions during infection. Moreover, studies of a subset of these ncRNAs and genomic regions showed the following. (i) Most were similarly regulated in response to influenza virus infection. (ii) They had distinctive kinetic expression profiles in type I interferon receptor and STAT1 knockout mice during SARS-CoV infection, including unique signatures of ncRNA expression associated with lethal infection. (iii) Over 40% were similarly regulated in vitro in response to both influenza virus infection and interferon treatment. These findings represent the first discovery of the widespread differential expression of long ncRNAs in response to virus infection and suggest that ncRNAs are involved in regulating the host response, including innate immunity. At the same time, virus infection models provide a unique platform for studying the biology and regulation of ncRNAs.
Most studies examining the host transcriptional response to infection focus only on protein-coding genes. However, there is growing evidence that thousands of non-protein-coding RNAs (ncRNAs) are transcribed from mammalian genomes. While most attention to the involvement of ncRNAs in virus-host interactions has been on small ncRNAs such as microRNAs, it is becoming apparent that many long ncRNAs (>200 nucleotides [nt]) are also biologically important. These long ncRNAs have been found to have widespread functionality, including chromatin modification and transcriptional regulation and serving as the precursors of small RNAs. With the advent of next-generation sequencing technologies, whole-transcriptome analysis of the host response, including long ncRNAs, is now possible. Using this approach, we demonstrated that virus infection alters the expression of numerous long ncRNAs, suggesting that these RNAs may be a new class of regulatory molecules that play a role in determining the outcome of infection.
PMCID: PMC2962437  PMID: 20978541
18.  An RNA-Seq Strategy to Detect the Complete Coding and Non-Coding Transcriptome Including Full-Length Imprinted Macro ncRNAs 
PLoS ONE  2011;6(11):e27288.
Imprinted macro non-protein-coding (nc) RNAs are cis-repressor transcripts that silence multiple genes in at least three imprinted gene clusters in the mouse genome. Similar macro or long ncRNAs are abundant in the mammalian genome. Here we present the full coding and non-coding transcriptome of two mouse tissues: differentiated ES cells and fetal head using an optimized RNA-Seq strategy. The data produced is highly reproducible in different sequencing locations and is able to detect the full length of imprinted macro ncRNAs such as Airn and Kcnq1ot1, whose length ranges between 80–118 kb. Transcripts show a more uniform read coverage when RNA is fragmented with RNA hydrolysis compared with cDNA fragmentation by shearing. Irrespective of the fragmentation method, all coding and non-coding transcripts longer than 8 kb show a gradual loss of sequencing tags towards the 3′ end. Comparisons to published RNA-Seq datasets show that the strategy presented here is more efficient in detecting known functional imprinted macro ncRNAs and also indicate that standardization of RNA preparation protocols would increase the comparability of the transcriptome between different RNA-Seq datasets.
PMCID: PMC3213133  PMID: 22102886
19.  The majority of total nuclear-encoded non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA 
BMC Biology  2010;8:149.
Discovery that the transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations and that most RNAs produced do not appear to encode proteins has transformed our understanding of genome complexity and suggests new paradigms of genome regulation. However, the fraction of all cellular RNA whose function we do not understand and the fraction of the genome that is utilized to produce that RNA remain controversial. This is not simply a bookkeeping issue because the degree to which this un-annotated transcription is present has important implications with respect to its biologic function and to the general architecture of genome regulation. For example, efforts to elucidate how non-coding RNAs (ncRNAs) regulate genome function will be compromised if that class of RNAs is dismissed as simply 'transcriptional noise'.
We show that the relative mass of RNA whose function and/or structure we do not understand (the so-called 'dark matter' RNAs), as a proportion of all non-ribosomal, non-mitochondrial human RNA (mt-RNA), can be greater than that of protein-encoding transcripts. This observation is obscured in studies that focus only on polyA-selected RNA, a method that enriches for protein-coding RNAs and at the same time discards the vast majority of RNA prior to analysis. We further show the presence of a large number of very long, abundantly transcribed regions (hundreds of kb) in intergenic space and further show that expression of these regions is associated with neoplastic transformation. These overlap some regions found previously in normal human embryonic tissues and raise an interesting hypothesis as to the function of these ncRNAs in both early development and neoplastic transformation.
We conclude that 'dark matter' RNA can constitute the majority of non-ribosomal, non-mitochondrial-RNA and a significant fraction arises from numerous very long, intergenic transcribed regions that could be involved in neoplastic transformation.
PMCID: PMC3022773  PMID: 21176148
20.  Mapping the genome landscape using tiling array technology 
Current opinion in plant biology  2007;10(5):534-542.
With the availability of complete genome sequences for a growing number of organisms, high-throughput methods for gene annotation and analysis of genome dynamics are needed. The application of whole-genome tiling microarrays for studies of global gene expression is providing a more unbiased view of the transcriptional activity within genomes. For example, this approach has led to the identification and isolation of many novel non-protein-coding RNAs (ncRNAs), which have been suggested to comprise a major component of the transcriptome that have novel functions involved in epigenetic regulation of the genome. Additionally, tiling arrays have been recently applied to the study of histone modifications and methylation of cytosine bases (DNA methylation). Surprisingly, recent studies combining the analysis of gene expression (transcriptome) and DNA methylation (methylome) using whole-genome tiling arrays revealed that DNA methylation regulates the expression levels of many ncRNAs. Further capture and integration of additional types of genome-wide data sets will help to illuminate additional hidden features of the dynamic genomic landscape that are regulated by both genetic and epigenetic pathways in plants.
PMCID: PMC2665186  PMID: 17703988
21.  Systematic classification of non-coding RNAs by epigenomic similarity 
BMC Bioinformatics  2013;14(Suppl 14):S2.
Even though only 1.5% of the human genome is translated into proteins, recent reports indicate that most of it is transcribed into non-coding RNAs (ncRNAs), which are becoming the subject of increased scientific interest. We hypothesized that examining how different classes of ncRNAs co-localized with annotated epigenomic elements could help understand the functions, regulatory mechanisms, and relationships among ncRNA families.
We examined 15 different ncRNA classes for statistically significant genomic co-localizations with cell type-specific chromatin segmentation states, transcription factor binding sites (TFBSs), and histone modification marks using GenomeRunner. P-values were obtained using a Chi-square test and corrected for multiple testing using the Benjamini-Hochberg procedure. We clustered and visualized the ncRNA classes by the strength of their statistical enrichments and depletions.
We found piwi-interacting RNAs (piRNAs) to be depleted in regions containing activating histone modification marks, such as H3K4 mono-, di- and trimethylation, H3K27 acetylation, as well as certain TFBSs. piRNAs were further depleted in active promoters, weak transcription, and transcription elongation regions, and enriched in repressed and heterochromatic regions. Conversely, transfer RNAs (tRNAs) were depleted in heterochromatin regions and strongly enriched in regions containing activating H3K4 di- and trimethylation marks, H2az histone variant, and a variety of TFBSs. Interestingly, regions containing CTCF insulator protein binding sites were associated with tRNAs. tRNAs were also enriched in the active, weak and poised promoters and, surprisingly, in regions with repetitive/copy number variations.
Searching for statistically significant associations between ncRNA classes and epigenomic elements permits detection of potential functional and/or regulatory relationships among ncRNA classes, and suggests cell type-specific biological roles of ncRNAs.
PMCID: PMC3851203  PMID: 24267974
ncRNA; non-coding RNA; epigenetics; genome; ENCODE; GenomeRunner
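The statistical step described in entry 21 (a Chi-square test followed by Benjamini-Hochberg correction) can be illustrated with a minimal sketch. This is not GenomeRunner's implementation: the 2x2 contingency layout, the function names `chi_square_2x2` and `benjamini_hochberg`, and the example counts are all assumptions made for illustration, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test on a 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = ncRNA class vs. background regions,
    columns = overlapping vs. not overlapping an epigenomic element.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for three hypothetical class/element pairs.
    tables = [(30, 70, 10, 90), (12, 88, 10, 90), (50, 50, 20, 80)]
    raw = [chi_square_2x2(*t)[1] for t in tables]
    for table, p, q in zip(tables, raw, benjamini_hochberg(raw)):
        print(table, f"p={p:.3g}", f"q={q:.3g}")
```

Adjusted values are then compared with a chosen false discovery rate threshold (for example 0.05) rather than the raw p-values, which matters when many class/element pairs are tested at once.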
22.  Genome-wide detection of predicted non-coding RNAs in Rhizobium etli expressed during free-living and host-associated growth using a high-resolution tiling array 
BMC Genomics  2010;11:53.
Non-coding RNAs (ncRNAs) play a crucial role in the intricate regulation of bacterial gene expression, allowing bacteria to quickly adapt to changing environments. In the past few years, a growing number of regulatory RNA elements have been predicted by computational methods, mostly in well-studied γ-proteobacteria but lately in several α-proteobacteria as well. Here, we have compared an extensive compilation of these non-coding RNA predictions to intergenic expression data of a whole-genome high-resolution tiling array in the soil-dwelling α-proteobacterium Rhizobium etli.
Expression of 89 candidate ncRNAs was detected, both on the chromosome and on the six megaplasmids encompassing the R. etli genome. Of these, 11 correspond to functionally well characterized ncRNAs, 12 were previously identified in other α-proteobacteria but are as yet uncharacterized and 66 were computationally predicted earlier but had not been experimentally identified and were therefore classified as novel ncRNAs. The latter comprise 17 putative sRNAs and 49 putative cis-regulatory ncRNAs. A selection of these candidate ncRNAs was validated by RT-qPCR, Northern blotting and 5' RACE, confirming the existence of 4 ncRNAs. Interestingly, individual transcript levels of numerous ncRNAs varied during free-living growth and during interaction with the eukaryotic host plant, pointing to possible ncRNA-dependent regulation of these specialized processes.
Our data support the practical value of previous ncRNA prediction algorithms and significantly expand the list of candidate ncRNAs encoded in the intergenic regions of R. etli and, by extension, of α-proteobacteria. Moreover, we show high-resolution tiling arrays to be suitable tools for studying intergenic ncRNA transcription profiles across the genome. The differential expression levels of some of these ncRNAs may indicate a role in adaptation to changing environmental conditions.
PMCID: PMC2881028  PMID: 20089193
23.  Novel classes of non-coding RNAs and cancer 
For many years, the central dogma of molecular biology has been that RNA functions mainly as an informational intermediate between a DNA sequence and its encoded protein. But one of the great surprises of modern biology was the discovery that protein-coding genes represent less than 2% of the total genome sequence, and subsequently the fact that at least 90% of the human genome is actively transcribed. Thus, the human transcriptome was found to be more complex than a collection of protein-coding genes and their splice variants. Although initially argued to be spurious transcriptional noise or accumulated evolutionary debris arising from the early assembly of genes and/or the insertion of mobile genetic elements, recent evidence suggests that the non-coding RNAs (ncRNAs) may play major biological roles in cellular development, physiology and pathologies. NcRNAs can be grouped into two major classes based on transcript size: small ncRNAs and long ncRNAs. Each of these classes can be further divided, and novel subclasses are still being discovered and characterized. Although in recent years the small ncRNAs called microRNAs have been studied most frequently, with more than ten thousand hits in the PubMed database, evidence has recently begun to accumulate describing the molecular mechanisms by which a wide range of novel RNA species function, providing insight into their functional roles in cellular biology and in human disease. In this review, we summarize newly discovered classes of ncRNAs, and highlight their functioning in cancer biology and potential usage as biomarkers or therapeutic targets.
PMCID: PMC3434024  PMID: 22613733
Non-coding RNAs; microRNAs; siRNAs; piRNAs; lncRNAs; Cancer
24.  Revealing stable processing products from ribosome-associated small RNAs by deep-sequencing data analysis 
Nucleic Acids Research  2012;40(9):4013-4024.
The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling of microRNA expression and detection of novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process leading to the generation of ncRNAs of diverse functions from a single primary transcript. To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. The major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products and annotation of known transcripts based on multiple sources of information. To demonstrate the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. By employing the APART pipeline, we were able to detect and confirm by independent experimental methods multiple novel stable RNA molecules differentially processed from well-known ncRNAs, like rRNAs, tRNAs or snoRNAs, in a stress-dependent manner.
PMCID: PMC3351166  PMID: 22266655
25.  Bovine ncRNAs Are Abundant, Primarily Intergenic, Conserved and Associated with Regulatory Genes 
PLoS ONE  2012;7(8):e42638.
It is apparent that non-coding transcripts are a common feature of higher organisms and encode uncharacterized layers of genetic regulation and information. We used public bovine EST data from many developmental stages and tissues, and developed a pipeline for the genome wide identification and annotation of non-coding RNAs (ncRNAs). We have predicted 23,060 bovine ncRNAs, 99% of which are un-annotated, based on known ncRNA databases. Intergenic transcripts accounted for the majority (57%) of the predicted ncRNAs and the occurrence of ncRNAs and genes were only moderately correlated (r = 0.55, p-value<2.2e-16). Many of these intergenic non-coding RNAs mapped close to the 3′ or 5′ end of thousands of genes and many of these were transcribed from the opposite strand with respect to the closest gene, particularly regulatory-related genes. Conservation analyses showed that these ncRNAs were evolutionarily conserved, and many intergenic ncRNAs proximate to genes contained sequence-specific motifs. Correlation analysis of expression between these intergenic ncRNAs and protein-coding genes using RNA-seq data from a variety of tissues showed significant correlations with many transcripts. These results support the hypothesis that ncRNAs are common, transcribed in a regulated fashion and have regulatory functions.
PMCID: PMC3412814  PMID: 22880061
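The correlation reported in entry 25 (r = 0.55 between the occurrence of ncRNAs and genes) is the kind of figure a Pearson coefficient yields. The sketch below assumes, purely for illustration, that occurrences are counted per genomic window; the abstract does not state how the counts were binned or which coefficient was used, and the function name `pearson_r` and the example counts are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up counts per genomic window (not data from the study).
    ncrna_counts = [3, 7, 2, 9, 4, 6]
    gene_counts = [5, 6, 4, 12, 3, 8]
    print(f"r = {pearson_r(ncrna_counts, gene_counts):.2f}")
```

A value near 0.55 would indicate that windows rich in genes tend to hold more ncRNAs, but with considerable scatter, which is consistent with the abstract's description of a moderate correlation.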
