Processing of pre-miRNAs by Dicer is regulated by its dsRNA-binding protein partner, and leads to the generation of alternative miRNA forms with distinct target sets.
Transcript degradation is a widespread and important mechanism for regulating protein abundance. Two major regulators of transcript degradation are RNA Binding Proteins (RBPs) and microRNAs (miRNAs). We computationally explored whether RBPs and miRNAs cooperate to promote transcript decay. We defined five RBP motifs based on the evolutionary conservation of their recognition sites in 3′UTRs as the binding motifs for Pumilio (PUM), U1A, Fox-1, Nova, and UAUUUAU. Recognition sites for some of these RBPs tended to localize at the end of long 3′UTRs. A specific group of miRNA recognition sites were enriched within 50 nts from the RBP recognition sites for PUM and UAUUUAU. The presence of both a PUM recognition site and a recognition site for preferentially co-occurring miRNAs was associated with faster decay of the associated transcripts. For PUM and its co-occurring miRNAs, binding of the RBP to its recognition sites was predicted to release nearby miRNA recognition sites from RNA secondary structures. The mammalian miRNAs that preferentially co-occur with PUM binding sites have recognition seeds that are reverse complements to the PUM recognition motif. Their binding sites have the potential to form hairpin secondary structures with proximal PUM binding sites that would normally limit RISC accessibility, but would be more accessible to miRNAs in response to the binding of PUM. In sum, our computational analyses suggest that a specific set of RBPs and miRNAs work together to affect transcript decay, with the rescue of miRNA recognition sites via RBP binding as one possible mechanism of cooperativity.
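The proximity criterion described above (miRNA recognition sites within 50 nts of an RBP recognition site) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the PUM pattern `UGUA[ACGU]AUA` is an assumed consensus, the miRNA site `GGGAAAC` is a made-up placeholder, and distance is measured between site start positions, which is a simplifying assumption.

```python
import re

# Assumed PUM consensus (UGUANAUA); the abstract does not spell out the motif.
PUM_PATTERN = "UGUA[ACGU]AUA"


def find_sites(utr, pattern):
    """Return 0-based start positions of all matches, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?=({pattern}))", utr)]


def cooccurring_pairs(utr, rbp_pattern, mirna_site, window=50):
    """Return (rbp_start, mirna_start) pairs whose starts lie within `window` nt."""
    rbp_sites = find_sites(utr, rbp_pattern)
    mirna_sites = find_sites(utr, re.escape(mirna_site))
    return [(r, m) for r in rbp_sites for m in mirna_sites if abs(r - m) <= window]


if __name__ == "__main__":
    # Toy 3'UTR: one PUM site, one nearby placeholder miRNA site, one distant one.
    utr = "A" * 10 + "UGUACAUA" + "C" * 20 + "GGGAAAC" + "A" * 100 + "GGGAAAC"
    print(cooccurring_pairs(utr, PUM_PATTERN, "GGGAAAC"))
```

In a real analysis the pair counts would be compared against a background expectation (for example shuffled or dinucleotide-matched sequences) before calling any enrichment.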
Transcript degradation represents an important mechanism of regulation used in diverse biological processes, including during development to eliminate maternally inherited transcripts, in adult tissues to define cell lineages, and as part of signaling pathways to down-regulate unneeded transcripts. RNA binding proteins (RBPs) and microRNAs are two major classes of molecules utilized to degrade transcripts. Using computational methods, we analyzed the genomewide cooperativity between microRNA and RBP recognition sites. We observed cooperativity between Pumilio (PUM) and specific microRNAs that impacts transcript decay. Our analysis suggests that approximately seven mammalian microRNAs preferentially co-localize with PUM binding sites, and these microRNAs have recognition motifs that are reverse complements to the PUM recognition motif. Their binding sites are more likely to form RNA hairpin structures with proximal PUM recognition sites that would limit microRNA efficiency, but would be more accessible to microRNAs in response to the binding of PUM. These results indicate that rescuing microRNA recognition sites from hairpin structures may be an important role for PUM.
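The reverse-complement relationship mentioned above is what would allow a miRNA recognition site and a nearby PUM site to pair into a hairpin. A minimal sketch of that check is given below; the example PUM site `UGUAAAUA`, the minimum stem length of 6 nt, and the restriction to Watson-Crick pairs (no G-U wobble, no folding energy) are all assumptions made for illustration.

```python
# RNA complement table (Watson-Crick pairs only).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def can_pair(site_a, site_b, min_len=6):
    """True if a stretch of `min_len` nt in site_a can base-pair perfectly,
    in antiparallel orientation, with a stretch of site_b."""
    rc_b = reverse_complement(site_b)
    return any(
        site_a[i:i + min_len] in rc_b
        for i in range(len(site_a) - min_len + 1)
    )


if __name__ == "__main__":
    pum_site = "UGUAAAUA"  # hypothetical instance of a PUM recognition site
    print(reverse_complement(pum_site))
    print(can_pair(pum_site, "UAUUUACA"))
```

A proper accessibility estimate would use an RNA folding program on the sequence context; this check only flags site pairs that could in principle form a stem.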
Kaposi's sarcoma herpesvirus (KSHV) encodes a cluster of twelve micro (mi)RNAs, which are abundantly expressed during both latent and lytic infection. Previous studies reported that KSHV is able to inhibit apoptosis during latent infection; we thus tested the involvement of viral miRNAs in this process. We found that both HEK293 epithelial cells and DG75 cells stably expressing KSHV miRNAs were protected from apoptosis. Potential cellular targets that were significantly down-regulated upon KSHV miRNA expression were identified by microarray profiling. Among them, we validated caspase 3 (Casp3), a critical factor for the control of apoptosis, by luciferase reporter assays, quantitative PCR and western blotting. Using site-directed mutagenesis, we found that three KSHV miRNAs, miR-K12-1, 3 and 4-3p, were responsible for the targeting of Casp3. Specific inhibition of these miRNAs in KSHV-infected cells resulted in increased expression levels of endogenous Casp3 and enhanced apoptosis. Altogether, our results suggest that KSHV miRNAs directly participate in the previously reported inhibition of apoptosis by the virus, and are thus likely to play a role in KSHV-induced oncogenesis.
MiRNAs are small, non-coding RNAs that regulate gene expression post-transcriptionally via binding to complementary sites in target mRNAs. This evolutionarily conserved regulatory system is present in most eukaryotes, and it has recently been shown that certain viruses have evolved to express their own miRNAs. Due to their non-immunogenic nature, viral miRNAs represent an efficient tool for the virus to control its environment. Here we show that KSHV miRNAs are involved in the control of apoptosis both when expressed in stable cell lines and in the context of viral infection. Using a microarray-based approach, we identified putative cellular targets, among which the effector caspase 3 is targeted by three of the viral miRNAs. Finally, we showed that blocking these miRNAs in infected cells resulted both in increased Casp3 levels and a higher apoptosis rate. These findings indicate that miRNAs of viral origin are key players in cell death inhibition by KSHV.
Eukaryotic cells express a large variety of ribonucleic acid-(RNA)-binding proteins (RBPs) with diverse affinity and specificity towards target RNAs that play a crucial role in almost every aspect of RNA metabolism. In addition, specific domains in RBPs impart catalytic activity or mediate protein–protein interactions, making RBPs versatile regulators of gene expression. In this review, we elaborate on recent experimental and computational approaches that have increased our understanding of RNA–protein interactions and their role in cellular function. We review aspects of gene expression that are modulated post-transcriptionally by RBPs, namely the stability of polymerase II-derived mRNA transcripts and their rate of translation into proteins. We further highlight the extensive regulatory networks of RBPs that implement a combinatorial control of gene expression. Taking cues from the recent development in the field, we argue that understanding spatio-temporal RNA–protein association on a transcriptome level will provide invaluable and unexpected insights into the regulatory codes that define growth, differentiation and disease.
RNA-binding proteins; RNA-binding domains; RBP–RNA interaction; RBP regulatory networks; RBP target identification
The loss of HBII-52 and related C/D box small nucleolar RNA (snoRNA) expression units has been implicated as a cause for the Prader–Willi syndrome (PWS). We recently found that the C/D box snoRNA HBII-52 changes the alternative splicing of the serotonin receptor 2C pre-mRNA, which is different from the traditional C/D box snoRNA function in non-mRNA methylation. Using bioinformatic predictions and experimental verification, we identified five pre-mRNAs (DPM2, TAF1, RALGPS1, PBRM1 and CRHR1) containing alternative exons that are regulated by MBII-52, the mouse homolog of HBII-52. Analysis of a single member of the MBII-52 cluster of snoRNAs by RNase protection and northern blot analysis shows that the MBII-52 expressing unit generates shorter RNAs that originate from the full-length MBII-52 snoRNA through additional processing steps. These novel RNAs associate with hnRNPs and not with proteins associated with canonical C/D box snoRNAs. Our data indicate that not a traditional C/D box snoRNA MBII-52, but a processed version lacking the snoRNA stem is the predominant MBII-52 RNA missing in PWS. This processed snoRNA functions in alternative splice-site selection. Its substitution could be a therapeutic principle for PWS.
MicroRNAs (miRNAs) are small endogenous RNAs, which typically imperfectly base-pair with 3′UTRs and mediate translational repression and mRNA degradation. Dicer, an RNase III generating small RNAs in the miRNA and RNAi pathways, is essential for meiotic maturation of mouse oocytes. We found that 3′UTRs of transcripts up-regulated in Dicer1−/− oocytes are not enriched in miRNA binding sites, implying a weak impact of miRNAs on the maternal transcriptome. Therefore, we tested the ability of endogenous miRNAs to mediate RNAi-like cleavage or translational repression of reporter mRNAs. In contrast to somatic cells, endogenous miRNAs in fully-grown GV oocytes poorly repressed translation of mRNA reporters whereas their RNAi-like activity was much less affected. In addition, reporter mRNA carrying let-7-binding sites failed to localize to P-body-like structures in oocytes. Our data suggest that normal miRNA function is down-regulated during oocyte development and this idea is further supported by normal meiotic maturation of oocytes lacking Dgcr8, which is required for the miRNA but not the RNAi pathway [Suh et al.]. We propose that suppression of miRNA function during oocyte growth is an early event in reprogramming gene expression during the transition of a differentiated oocyte into pluripotent blastomeres of the embryo.
miRNA; endo-siRNA; P-body; maternal mRNA; oocyte; mRNA stability; mRNA degradation; translational arrest
The piRNA pathway operates in animal germ lines to ensure genome integrity through retrotransposon silencing. The Piwi protein-associated small RNAs (piRNAs) guide Piwi proteins to retrotransposon transcripts, which are degraded and thereby post-transcriptionally silenced through a ping-pong amplification process. Cleavage of the retrotransposon transcript defines at the same time the 5' end of a secondary piRNA that will in turn guide a Piwi protein to a primary piRNA precursor, thereby amplifying primary piRNAs. Although several studies provided evidence that this mechanism is conserved among metazoa, how the process is initiated and what enzymatic activities are responsible for generating the primary and secondary piRNAs are not entirely clear.
Here we analyzed small RNAs from three mammalian species, seeking to gain further insight into the mechanisms responsible for the piRNA amplification loop. We found that in all these species piRNA-directed targeting is accompanied by the generation of short sequences that have a very precisely defined length, 19 nucleotides, and a specific spatial relationship with the guide piRNAs.
This suggests that the processing of the 5' product of piRNA-guided cleavage occurs while the piRNA target is engaged by the Piwi protein. Although they are not stabilized through methylation of their 3' ends, the 19-mers are abundant not only in testes lysates but also in immunoprecipitates of Miwi and Mili proteins. They will enable more accurate identification of piRNA loci in deep sequencing data sets.
The stability, localization and translation rate of mRNAs are regulated by a multitude of RNA-binding proteins (RBPs) that find their targets directly or with the help of guide RNAs. Among the experimental methods for mapping RBP binding sites, cross-linking and immunoprecipitation (CLIP) coupled with deep sequencing provides transcriptome-wide coverage as well as high resolution. However, partly due to their vast volume, the data that were so far generated in CLIP experiments have not been put in a form that enables fast and interactive exploration of binding sites. To address this need, we have developed the CLIPZ database and analysis environment. Binding site data for RBPs such as Argonaute 1-4, Insulin-like growth factor II mRNA-binding protein 1-3, TNRC6 proteins A-C, Pumilio 2, Quaking and Polypyrimidine tract binding protein can be visualized at the level of the genome and of individual transcripts. Individual users can upload their own sequence data sets while being able to limit the access to these data to specific users, and analyses of the public and private data sets can be performed interactively. CLIPZ, available at http://www.clipz.unibas.ch, aims to provide an open access repository of information for post-transcriptional regulatory elements.
The conserved pre-mRNA splicing factor SF1 is implicated in 3′ splice site recognition by binding directly to the intron branch site. However, because SF1 is not essential for constitutive splicing, its role in pre-mRNA processing has remained mysterious. Here, we used crosslinking and immunoprecipitation (CLIP) to analyze short RNAs directly bound by human SF1 in vivo. SF1 bound mainly pre-mRNAs, with 77% of target sites in introns. Binding to target RNAs in vitro was dependent on the newly defined SF1 binding motif ACUNAC, strongly resembling human branch sites. Surprisingly, the majority of SF1 binding sites did not map to the expected position near 3′ splice sites. Instead, target sites were distributed throughout introns, and a smaller but significant fraction occurred in exons within coding and untranslated regions. These data suggest a more complex role for SF1 in splicing regulation. Indeed, SF1 silencing affected alternative splicing of endogenous transcripts, establishing a previously unexpected role for SF1 and branch site-like sequences in splice site selection.
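Locating occurrences of the SF1 binding motif ACUNAC in a transcript sequence can be sketched as below. This is only an illustrative scan; the function name and the choice to accept DNA-alphabet input (T converted to U) are assumptions, and a motif match alone does not imply binding in vivo.

```python
import re

# ACUNAC, where N is any nucleotide; the lookahead also reports overlapping hits.
SF1_MOTIF = re.compile(r"(?=(ACU[ACGU]AC))")


def find_sf1_motifs(sequence):
    """Return (start, matched_sequence) for every ACUNAC occurrence (0-based)."""
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in SF1_MOTIF.finditer(rna)]


if __name__ == "__main__":
    print(find_sf1_motifs("GGACUAACGGACTGACuu"))
```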
In addition to acting as an RNA quality control pathway, nonsense-mediated mRNA decay (NMD) plays roles in regulating normal gene expression. In particular, the extent to which alternative splicing is coupled to NMD and the roles of NMD in regulating uORF containing transcripts have been a matter of debate.
In order to achieve a greater understanding of NMD regulated gene expression we used 2D-DiGE proteomics technology to examine the changes in protein expression induced in HeLa cells by UPF1 knockdown. QPCR based validation of the corresponding mRNAs, in response to both UPF1 knockdown and cycloheximide treatment, identified 17 bona fide NMD targets. Most of these were associated with bioinformatically predicted NMD activating features, predominantly upstream open reading frames (uORFs). Strikingly, however, the majority of transcripts up-regulated by UPF1 knockdown were either insensitive to, or even down-regulated by, cycloheximide treatment. Furthermore, the mRNA abundance of several down-regulated proteins failed to change upon UPF1 knockdown, indicating that UPF1's role in regulating mRNA and protein abundance is more complex than previously appreciated. Among the bona fide NMD targets, we identified a highly conserved AS-NMD event within the 3' UTR of the HNRNPA2B1 gene. Overexpression of GFP tagged hnRNP A2 resulted in a decrease in endogenous hnRNP A2 and B1 mRNA with a concurrent increase in the NMD sensitive isoforms.
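Upstream open reading frames, the most common predicted NMD-activating feature mentioned above, can be located with a simple scan of the 5′UTR. The sketch below is an assumption-laden illustration, not the prediction method used in the study: it only reports uORFs whose AUG and in-frame stop codon both lie inside the supplied 5′UTR, and ignores uORFs that overlap the main coding sequence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(five_prime_utr):
    """Return (start, end) pairs for uORFs fully contained in the 5'UTR.

    `start` is the 0-based position of the A in AUG; `end` is exclusive and
    includes the stop codon.
    """
    rna = five_prime_utr.upper().replace("T", "U")
    uorfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon from the AUG until the first in-frame stop.
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


if __name__ == "__main__":
    print(find_uorfs("GGAUGCCCUAAGG"))
```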
Despite the large number of changes in protein expression upon UPF1 knockdown, a relatively small fraction of them can be directly attributed to the action of NMD on the corresponding mRNA. From amongst these we have identified a conserved AS-NMD event within HNRNPA2B1 that appears to mediate autoregulation of HNRNPA2B1 expression levels.
RNA transcripts are subjected to post-transcriptional gene regulation by interacting with hundreds of RNA-binding proteins (RBPs) and microRNA-containing ribonucleoprotein complexes (miRNPs) that are often expressed in a cell-type dependent manner. To understand how the interplay of these RNA-binding factors affects the regulation of individual transcripts, high resolution maps of in vivo protein-RNA interactions are necessary1.
A combination of genetic, biochemical and computational approaches is typically applied to identify RNA-RBP or RNA-RNP interactions. Microarray profiling of RNAs associated with immunopurified RBPs (RIP-Chip)2 defines targets at a transcriptome level, but its application is limited to the characterization of kinetically stable interactions and only in rare cases3,4 allows identification of the RBP recognition element (RRE) within the long target RNA. More direct RBP target site information is obtained by combining in vivo UV crosslinking5,6 with immunoprecipitation7-9 followed by the isolation of crosslinked RNA segments and cDNA sequencing (CLIP)10. CLIP was used to identify targets of a number of RBPs11-17. However, CLIP is limited by the low efficiency of UV 254 nm RNA-protein crosslinking, and the location of the crosslink is not readily identifiable within the sequenced crosslinked fragments, making it difficult to separate UV-crosslinked target RNA segments from background non-crosslinked RNA fragments also present in the sample.
We developed a powerful cell-based crosslinking approach to determine at high resolution and transcriptome-wide the binding sites of cellular RBPs and miRNPs that we term PAR-CliP (Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation) (see Fig. 1A for an outline of the method). The method relies on the incorporation of photoreactive ribonucleoside analogs, such as 4-thiouridine (4-SU) and 6-thioguanosine (6-SG), into nascent RNA transcripts by living cells. Irradiation of the cells by UV light of 365 nm induces efficient crosslinking of photoreactive nucleoside-labeled cellular RNAs to interacting RBPs. Immunoprecipitation of the RBP of interest is followed by isolation of the crosslinked and coimmunoprecipitated RNA. The isolated RNA is converted into a cDNA library and deep sequenced using Solexa technology. One characteristic feature of cDNA libraries prepared by PAR-CliP is that the precise position of crosslinking can be identified by mutations residing in the sequenced cDNA. When using 4-SU, crosslinked sequences show a thymidine to cytidine transition, whereas using 6-SG results in guanosine to adenosine mutations. The presence of the mutations in crosslinked sequences makes it possible to separate them from the background of sequences derived from abundant cellular RNAs.
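The idea of locating crosslink positions from thymidine to cytidine transitions can be sketched as a simple tally over aligned reads. This is a toy illustration under strong assumptions (reads are already aligned without gaps to the same strand of a reference, given as hypothetical `(start, sequence)` pairs), not the analysis pipeline of the published method.

```python
from collections import Counter


def t_to_c_positions(reference, aligned_reads):
    """Tally T-to-C mismatches per reference position.

    `aligned_reads` is an iterable of (start, read_sequence) pairs, ungapped and
    on the same strand as `reference` (DNA alphabet). Returns a dict mapping each
    reference T position with at least one T-to-C read to
    (conversion_count, coverage).
    """
    conversions = Counter()
    coverage = Counter()
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference):
                break
            if reference[pos] == "T":
                coverage[pos] += 1
                if base == "C":
                    conversions[pos] += 1
    return {
        pos: (conversions[pos], coverage[pos])
        for pos in sorted(coverage)
        if conversions[pos] > 0
    }


if __name__ == "__main__":
    reference = "ACGTTGCATGA"
    reads = [(2, "GTCGCA"), (3, "TCGCAT"), (0, "ACGTTG")]
    print(t_to_c_positions(reference, reads))
```

Positions with a high conversion fraction across many reads are candidate crosslink sites, whereas isolated mismatches are more likely sequencing errors or polymorphisms.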
Application of the method to a number of diverse RNA binding proteins was reported in Hafner et al.18
RNA transcripts are subject to post-transcriptional gene regulation involving hundreds of RNA-binding proteins (RBPs) and microRNA-containing ribonucleoprotein complexes (miRNPs) expressed in a cell-type dependent fashion. We developed a cell-based crosslinking approach to determine at high resolution and transcriptome-wide the binding sites of cellular RBPs and miRNPs. The crosslinked sites are revealed by thymidine to cytidine transitions in the cDNAs prepared from immunopurified RNPs of 4-thiouridine-treated cells. We determined the binding sites and regulatory consequences for several intensely studied RBPs and miRNPs, including PUM2, QKI, IGF2BP1-3, AGO/EIF2C1-4 and TNRC6A-C. Our study revealed that these factors bind thousands of sites containing defined sequence motifs and have distinct preferences for exonic versus intronic or coding versus untranslated transcript regions. The precise mapping of binding sites across the transcriptome will be critical to the interpretation of the rapidly emerging data on genetic variation between individuals and how these variations contribute to complex genetic diseases.
MicroRNAs (miRNAs) are small regulatory RNAs with many biological functions and disease associations. We showed that in situ hybridization (ISH) using conventional formaldehyde fixation results in significant miRNA loss from mouse tissue sections, which can be prevented by fixation with 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC) that irreversibly immobilizes the miRNA at its 5' phosphate. We determined optimal hybridization parameters for 130 locked nucleic acid (LNA) probes by recording nucleic acid melting temperature during ISH.
immunohistochemistry; heart; liver; brain; dendrites; methanal; paraformaldehyde; water-soluble carbodiimide; phosphoramidate; crosslink; non-coding RNA; small RNA; locked nucleic acid probe
Small nucleolar RNAs (snoRNAs) are localized within the nucleolus, a sub-nuclear compartment, in which they guide ribosomal or spliceosomal RNA modifications, respectively. Up until now, snoRNAs have only been identified in eukaryal and archaeal genomes, but are notably absent in bacteria. By screening B lymphocytes for expression of non-coding RNAs (ncRNAs) induced by the Epstein-Barr virus (EBV), we here report, for the first time, the identification of a snoRNA gene within a viral genome, designated as v-snoRNA1. This genetic element displays all hallmark sequence motifs of a canonical C/D box snoRNA, namely C/C′- as well as D/D′-boxes. The nucleolar localization of v-snoRNA1 was verified by in situ hybridisation of EBV-infected cells. We also confirmed binding of the three canonical snoRNA proteins, fibrillarin, Nop56 and Nop58, to v-snoRNA1. The C-box motif of v-snoRNA1 was shown to be crucial for the stability of the viral snoRNA; its selective deletion in the viral genome led to a complete down-regulation of v-snoRNA1 expression levels within EBV-infected B cells. We further provide evidence that v-snoRNA1 might serve as a miRNA-like precursor, which is processed into 24 nt sized RNA species, designated as v-snoRNA124pp. A potential target site of v-snoRNA124pp was identified within the 3′-UTR of BALF5 mRNA which encodes the viral DNA polymerase. V-snoRNA1 was found to be expressed in all investigated EBV-positive cell lines, including lymphoblastoid cell lines (LCL). Interestingly, induction of the lytic cycle markedly up-regulated expression levels of v-snoRNA1 up to 30-fold. By a computational approach, we identified a v-snoRNA1 homolog in the rhesus lymphocryptovirus genome. This evolutionary conservation suggests an important role of v-snoRNA1 during γ-herpesvirus infection.
Epstein-Barr virus (EBV) infects about 90% of people worldwide and is associated with different types of cancer. So far, only two large virus-encoded non-coding RNAs (EBER1 and EBER2) and 25 microRNAs (miRNAs) have been identified in the EBV genome. In this study, we report identification of the first member of another abundant non-coding RNA class, a small nucleolar RNA (snoRNA), designated as v-snoRNA1. We show that v-snoRNA1 is located in the nucleolus and interacts with the same proteins as reported for canonical eukaryal snoRNAs. Its biological function is consistent with its high conservation in a distantly related simian herpesvirus genome. Interestingly, v-snoRNA1 might serve as a miRNA-like precursor, which is processed into a 24 nt sized RNA species, designated as v-snoRNA124pp. The viral DNA polymerase BALF5 was identified as a potential target for v-snoRNA124pp. Taken together, these experiments strengthen the crucial function of v-snoRNA1 in EBV infection.
MicroRNAs (miRNAs) are short RNAs that act as guides for the degradation and translational repression of protein-coding mRNAs. A large body of work showed that miRNAs are involved in the regulation of a broad range of biological functions, from development to cardiac and immune system function, to metabolism, to cancer. For most of the over 500 miRNAs that are encoded in the human genome the functions still remain to be uncovered. Identifying miRNAs whose expression changes between cell types or between normal and pathological conditions is an important step towards characterizing their function as is the prediction of mRNAs that could be targeted by these miRNAs. To provide the community the possibility of exploring interactively miRNA expression patterns and the candidate targets of miRNAs in an integrated environment, we developed the MirZ web server, which is accessible at www.mirz.unibas.ch. The server provides experimental and computational biologists with statistical analysis and data mining tools operating on up-to-date databases of sequencing-based miRNA expression profiles and of predicted miRNA target sites in species ranging from Caenorhabditis elegans to Homo sapiens.
Genome-wide identification of mRNAs regulated by RNA-binding proteins is crucial to uncover post-transcriptional gene regulatory systems. The conserved PUF family RNA-binding proteins repress gene expression post-transcriptionally by binding to sequence elements in 3′-UTRs of mRNAs. Despite their well-studied implications for development and neurogenesis in metazoa, the mammalian PUF family members are only poorly characterized and mRNA targets are largely unknown. We have systematically identified the mRNAs associated with the two human PUF proteins, PUM1 and PUM2, by the recovery of endogenously formed ribonucleoprotein complexes and the analysis of associated RNAs with DNA microarrays. A largely overlapping set comprising hundreds of mRNAs was reproducibly associated with the paralogous PUM proteins, many of them encoding functionally related proteins. A characteristic PUF-binding motif was highly enriched among PUM bound messages and validated with RNA pull-down experiments. Moreover, PUF motifs as well as surrounding sequences exhibit higher conservation in PUM bound messages as opposed to transcripts that were not found to be associated, suggesting that PUM function may be modulated by other factors that bind conserved elements. Strikingly, we found that PUF motifs are enriched around predicted miRNA binding sites and that high-confidence miRNA binding sites are significantly enriched in the 3′-UTRs of experimentally determined PUM1 and PUM2 targets, strongly suggesting an interaction of human PUM proteins with the miRNA regulatory system. Our work suggests extensive connections between the RBP and miRNA post-transcriptional regulatory systems and provides a framework for deciphering the molecular mechanism by which PUF proteins regulate their target mRNAs.
High-throughput sequencing studies revealed that the majority of human and mouse multi-exon genes have multiple splice forms. High-density oligonucleotide array-based measurements have further established that many exons are expressed in a tissue-specific manner. The mechanisms underlying the tissue-dependent expression of most alternative exons remain, however, to be understood. In this study, we focus on one possible mechanism, namely the coupling of (tissue specific) transcription regulation with alternative splicing. We analyzed the FANTOM3 and H-Invitational datasets of full-length mouse and human cDNAs, respectively, and found that in transcription units with multiple start sites, the inclusion of at least 15% and possibly up to 30% of the ‘cassette’ exons correlates with the use of specific transcription start sites (TSS). The vast majority of TSS-associated exons are conserved between human and mouse, yet the conservation is weaker when compared with TSS-independent exons. Additionally, the currently available data only support a weak correlation between the probabilities of TSS association of orthologous exons. Our analysis thus suggests frequent coupling of transcriptional and splicing programs, and provides a large dataset of exons on which the molecular basis of this coupling can be further studied.
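One way to ask whether inclusion of a cassette exon correlates with the use of a particular transcription start site is a test on a 2x2 table of cDNA counts (TSS A versus TSS B, exon included versus skipped). The sketch below uses a two-sided Fisher exact test written with the standard library; the abstract does not state which statistic the study used, so the choice of test and the table layout are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Rows: cDNAs starting at TSS A / TSS B; columns: exon included / skipped.
    print(fisher_exact_two_sided(10, 0, 0, 10))
```

With many exons tested across a dataset, the resulting p-values would need a multiple-testing correction before any exon is called TSS-associated.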
alternative splicing; transcription initiation
MicroRNAs (miRNAs) are increasingly being recognized as major regulators of gene expression in many organisms, including viruses. Among viruses, members of the family Herpesviridae account for the majority of the currently known virus-encoded miRNAs. The highly oncogenic Marek's disease virus type 1 (MDV-1), an avian herpesvirus, has recently been shown to encode eight miRNAs clustered in the MEQ and LAT regions of the viral genome. The genus Mardivirus, to which MDV-1 belongs, also includes the nononcogenic but antigenically related MDV-2. As MDV-1 and MDV-2 are evolutionarily very close, we sought to determine if MDV-2 also encodes miRNAs. For this, we cloned, sequenced, and analyzed a library of small RNAs from the lymphoblastoid cell line MSB-1, previously shown to be coinfected with both MDV-1 and MDV-2. Among the 5,099 small RNA sequences determined from the library, we identified 17 novel MDV-2-specific miRNAs. Out of these, 16 were clustered in a 4.2-kb long repeat region that encodes R-LORF2 to R-LORF5. The single miRNA outside the cluster was located in the short repeat region, within the C-terminal region of the ICP4 homolog. The expression of these miRNAs in MSB-1 cells and infected chicken embryo fibroblasts was further confirmed by Northern blotting analysis. The identification of miRNA clusters within the repeat regions of MDV-2 demonstrates conservation of the relative genomic positions of miRNA clusters in MDV-1 and MDV-2, despite the lack of sequence homology among the miRNAs of the two viruses. The identification of these novel miRNAs adds to the growing list of virus-encoded miRNAs.
MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially.
We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3' UTRs, miRNA target sites occur preferentially near the start and near the end of the 3' UTR.
To characterize miRNA function beyond the predicted lists of targets we further present a method to infer significant associations between the sets of targets predicted for individual miRNAs and specific biochemical pathways, in particular those of the KEGG pathway database. We show that this approach retrieves several known functional miRNA-mRNA associations, and predicts novel functions for known miRNAs in cell growth and in development.
We have presented a Bayesian target prediction algorithm without any tunable parameters, that can be applied to sequences from any clade of species. The algorithm automatically infers the phylogenetic distribution of functional sites for each miRNA, and assigns a posterior probability to each putative target site. The results presented here indicate that our general method achieves very good performance in predicting miRNA target sites, providing at the same time insights into the evolution of target sites for individual miRNAs. Moreover, by combining our predictions with pathway analysis, we propose functions of specific miRNAs in nervous system development, inter-cellular communication and cell growth. The complete target site predictions as well as the miRNA/pathway associations are accessible on the ElMMo web server.
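Assigning a posterior probability to a putative target site follows Bayes' rule: the prior probability that a site is functional is weighed against how well the observed conservation pattern is explained by a "functional" model versus a "background" model. The sketch below shows only this final step with hypothetical numeric inputs; in the method described above, these quantities are inferred from the data rather than supplied by hand.

```python
def site_posterior(prior, lik_functional, lik_background):
    """Posterior probability that a site is functional.

    prior          -- prior probability that a site of this miRNA is functional
    lik_functional -- probability of the observed conservation pattern if the
                      site is under selection
    lik_background -- probability of the same pattern under neutral evolution
    """
    numerator = prior * lik_functional
    denominator = numerator + (1.0 - prior) * lik_background
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the calculation.
    print(site_posterior(prior=0.1, lik_functional=0.8, lik_background=0.2))
```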
RNA interference and the microRNA (miRNA) pathway can induce sequence-specific mRNA degradation and/or translational repression. The human genome encodes hundreds of miRNAs that can post-transcriptionally repress thousands of genes. Using reporter constructs, we observed that degradation of mRNAs bearing sites imperfectly complementary to the endogenous let-7 miRNA is considerably stronger in human HEK293 than HeLa cells. The degradation did not result from the Ago2-mediated endonucleolytic cleavage but it was Dicer- and Ago2-dependent. We used this feature of HEK293 to address the size of a pool of transcripts regulated by RNA silencing in a single cell type. We generated HEK293 cell lines depleted of Dicer or individual Ago proteins. The cell lines were used for microarray analyses to obtain a comprehensive picture of RNA silencing. The 3′-untranslated region sequences of a few hundred transcripts that were commonly up-regulated upon Ago2 and Dicer knock-downs showed a significant enrichment of putative miRNA-binding sites. The up-regulation upon Ago2 and Dicer knock-downs was moderate and we found no evidence, at the mRNA level, for activation of silenced genes. Taken together, our data suggest that, independent of the effect on translation, miRNAs affect levels of a few hundred mRNAs in HEK293 cells.
Recent large-scale cDNA sequencing efforts show that elaborate patterns of splice variation are responsible for much of the proteome diversity in higher eukaryotes. To obtain an accurate account of the repertoire of splice variants, and to gain insight into the mechanisms of alternative splicing, it is essential that cDNAs are very accurately mapped to their respective genomes. Currently available algorithms for cDNA-to-genome alignment do not reach the necessary level of accuracy because they use ad hoc scoring models that cannot correctly trade off the likelihoods of various sequencing errors against the probabilities of different gene structures. Here we develop a Bayesian probabilistic approach to cDNA-to-genome alignment. Gene structures are assigned prior probabilities based on the lengths of their introns and exons, and based on the sequences at their splice boundaries. A likelihood model for sequencing errors takes into account the rates at which misincorporation, as well as insertions and deletions of different lengths, occurs during sequencing. The parameters of both the prior and likelihood model can be automatically estimated from a set of cDNAs, thus enabling our method to adapt itself to different organisms and experimental procedures. We implemented our method in a fast cDNA-to-genome alignment program, SPA, and applied it to the FANTOM3 dataset of over 100,000 full-length mouse cDNAs and a dataset of over 20,000 full-length human cDNAs. Comparison with the results of four other mapping programs shows that SPA produces alignments of significantly higher quality. In particular, the quality of the SPA alignments near splice boundaries and SPA's mapping of the 5′ and 3′ ends of the cDNAs are highly improved, allowing for more accurate identification of transcript starts and ends, and accurate identification of subtle splice variations. 
Finally, our splice boundary analysis on the human dataset suggests the existence of a novel non-canonical splice site that we also find in the mouse dataset. The SPA software package is available at http://www.biozentrum.unibas.ch/personal/nimwegen/cgi-bin/spa.cgi.
A prerequisite for the identification and analysis of splice variation in the transcriptomes of higher eukaryotes is the very accurate mapping of cDNAs to their genomes. However, current algorithms use ad hoc scoring schemes that cannot correctly trade off the likelihoods of different sequencing errors against the likelihoods of different gene structures.
In this paper the authors develop a Bayesian probabilistic approach to cDNA-to-genome mapping that combines explicit models for the prior probabilities of different gene structures with the likelihoods of different sequencing errors. The parameters of these probabilistic models can be estimated automatically from the input such that the mapping procedure is automatically adapted to the organism and sequencing technology of the data under study.
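The trade-off between gene-structure priors and sequencing-error likelihoods can be illustrated with a minimal sketch. It is not the SPA implementation: the function names, the error rates, and the splice-site prior below are hypothetical values chosen for illustration, and the prior considers only splice-boundary dinucleotides, whereas the approach described above also uses intron and exon lengths and estimates its parameters from the data.

```python
import math

# Hypothetical parameters for illustration only; the method described in the
# text estimates such quantities automatically from a set of cDNAs.
MISMATCH_RATE = 0.01            # assumed per-base misincorporation probability
INDEL_RATE = 0.001              # assumed per-base insertion/deletion probability
CANONICAL_SPLICE_PRIOR = 0.999  # assumed prior probability of a GT..AG intron


def log_prior(introns):
    """Log prior of a gene structure from its intron boundary dinucleotides.

    introns: list of (donor, acceptor) dinucleotide pairs, e.g. ("GT", "AG").
    """
    total = 0.0
    for donor, acceptor in introns:
        canonical = (donor, acceptor) == ("GT", "AG")
        p = CANONICAL_SPLICE_PRIOR if canonical else 1.0 - CANONICAL_SPLICE_PRIOR
        total += math.log(p)
    return total


def log_likelihood(matches, mismatches, indels):
    """Log likelihood of the observed cDNA under a simple per-base error model."""
    p_match = 1.0 - MISMATCH_RATE - INDEL_RATE
    return (matches * math.log(p_match)
            + mismatches * math.log(MISMATCH_RATE)
            + indels * math.log(INDEL_RATE))


def log_posterior_score(candidate):
    """Unnormalized log posterior: log prior plus log likelihood."""
    return (log_prior(candidate["introns"])
            + log_likelihood(candidate["matches"],
                             candidate["mismatches"],
                             candidate["indels"]))


def best_alignment(candidates):
    """Return the candidate alignment with the highest unnormalized posterior."""
    return max(candidates, key=log_posterior_score)


# Two competing explanations of the same 1000-base cDNA segment.
canonical_one_error = {"introns": [("GT", "AG")],
                       "matches": 999, "mismatches": 1, "indels": 0}
noncanonical_perfect = {"introns": [("GC", "TG")],
                        "matches": 1000, "mismatches": 0, "indels": 0}
chosen = best_alignment([canonical_one_error, noncanonical_perfect])
```

With these assumed values, a single sequencing mismatch is judged more probable than a non-canonical splice boundary, so the canonical candidate is chosen; with two mismatches the balance tips the other way. An ad hoc scoring scheme has no principled way to set that exchange rate.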
The authors implement their approach in a fast mapping algorithm called SPA and apply it to a dataset of human full-length cDNAs and the FANTOM3 dataset of mouse full-length cDNAs. Comparisons with four other mapping algorithms show that SPA produces mappings that are significantly more accurate, with the largest improvements in the mappings of the 5′ and 3′ ends of the cDNAs, and the mappings around splice boundaries. The authors also identify a novel set of putative splice sites in the human dataset.
Some of the most common splice variations are small exon length variations caused by the use of alternative donor or acceptor splice sites that are in very close proximity on the pre-mRNA. Among these, three-nucleotide variations at so-called NAGNAG tandem acceptor sites have recently attracted considerable attention, and it has been suggested that these variations are regulated and serve to fine-tune protein forms by the addition or removal of a single amino acid. In this paper we first show that in-frame exon length variations are generally overrepresented and that this overrepresentation can be quantitatively explained by the effect of nonsense-mediated decay. Our analysis allows us to estimate that about 50% of frame-shifted coding transcripts are targeted by nonsense-mediated decay. Second, we show that a simple physical model that assumes that the splicing machinery stochastically binds to nearby splice sites in proportion to the affinities of the sites correctly predicts the relative abundances of different small length variations at both boundaries. Finally, using the same simple physical model, we show that for NAGNAG sites, the difference in affinities of the neighboring sites for the splicing machinery accurately predicts whether splicing will occur only at the first site, splicing will occur only at the second site, or three-nucleotide splice variants are likely to occur. Our analysis thus suggests that small exon length variations are the result of stochastic binding of the spliceosome at neighboring splice sites. Small exon length variations occur when there are nearby alternative splice sites that have similar affinity for the splicing machinery.
It has recently become clear that splice variation affects most mammalian genes. It is, however, less clear to what extent these splice variations are functional and regulated by the cell as opposed to simply a result of noise in the splicing process.
One of the most frequently observed forms of splice variation is a small variation in exon length, in which the boundary of an exon is shifted by a small amount between different transcripts. In this work the authors study the statistics of these splice variations in detail, and the results suggest that these variations are mostly the result of noise in the splicing process. In particular, they propose a simple physical model in which the last step of splicing involves the sequence-specific binding of the splicing machinery to the splice site. In this model, small length variations can occur when there are nearby splice sites with comparable affinity for the splicing machinery. The authors show that this model not only accurately predicts the relative abundances of different splice variations but also predicts which splice sites are likely to undergo small exon length variations.
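The stochastic binding model can be sketched in a few lines, assuming that the probability of using a splice site is simply its affinity divided by the summed affinities of the competing nearby sites. The function names and the 5% cutoff for calling a variant observable are hypothetical choices for illustration, not values taken from the work summarized above.

```python
def site_usage(affinities):
    """Probability of splicing at each nearby site, proportional to its affinity."""
    total = sum(affinities)
    if total <= 0:
        raise ValueError("affinities must sum to a positive value")
    return [a / total for a in affinities]


def classify_tandem_acceptor(aff_first, aff_second, min_minor_fraction=0.05):
    """Predict the splicing outcome at a NAGNAG-like tandem acceptor.

    min_minor_fraction is a hypothetical cutoff below which the weaker
    site is treated as effectively unused.
    """
    p_first, p_second = site_usage([aff_first, aff_second])
    if p_second < min_minor_fraction:
        return "first site only"
    if p_first < min_minor_fraction:
        return "second site only"
    return "both sites"


# Comparable affinities: both variants are expected.
similar = classify_tandem_acceptor(2.0, 1.0)
# Very different affinities: only the stronger site is used.
skewed = classify_tandem_acceptor(100.0, 1.0)
```

In this sketch, three-nucleotide variants arise only when the two neighboring sites have comparable affinities, which is the qualitative behavior the model attributes to noise in splice-site choice rather than to regulation.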
MicroRNAs (miRNAs) are endogenous 21- to 23-nucleotide RNA molecules that regulate protein-coding gene expression in plants and animals via the RNA interference pathway. Hundreds of them have been identified in the last five years, and recent work indicates that their total number is larger still. Therefore, miRNA gene discovery remains an important aspect of understanding this new and still largely unknown regulation mechanism. Bioinformatics approaches have proved to be very useful toward this goal by guiding the experimental investigations.
In this work we describe our computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs. We focus on genomic regions around already known miRNAs, in order to exploit the property that miRNAs are occasionally found in clusters. Starting with the known human, mouse and rat miRNAs, we analyze 20 kb of flanking genomic regions for the presence of putative precursor miRNAs (pre-miRNAs). Each genome is analyzed separately, allowing us to study the species-specific identity and genome organization of miRNA loci. We only use cross-species comparisons to make conservative estimates of the number of novel miRNAs. Our ab initio method predicts between fifty and one hundred novel pre-miRNAs for each of the considered species. Around 30% of these already have experimental support in a large set of cloned mammalian small RNAs. The validation rate among predicted cases that are conserved in at least one other species is higher, about 60%, and many of them have not been detected by prediction methods that used cross-species comparisons. A large fraction of the experimentally confirmed predictions correspond to an imprinted locus residing on chromosome 14 in human, 12 in mouse and 6 in rat. Our computational tool can be accessed on the World Wide Web.
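The first step of such a cluster-based search, defining the genomic windows to scan around known miRNAs, might look like the following sketch. It assumes 0-based half-open coordinates and takes 20 kb on each side of every known locus; the abstract does not say whether the 20 kb is per side or in total, and the function name and input layout are hypothetical. Overlapping windows are merged so that clustered miRNAs yield a single search region.

```python
from collections import defaultdict


def flanking_windows(loci, flank=20_000, chrom_lengths=None):
    """Build merged search windows around known miRNA loci.

    loci: iterable of (chrom, start, end) tuples, 0-based half-open.
    flank: bases added on each side (assumed here to be 20 kb per side).
    chrom_lengths: optional dict used to clip windows at chromosome ends.
    Returns a dict mapping chrom to a sorted list of (start, end) windows.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in loci:
        lo = max(0, start - flank)
        hi = end + flank
        if chrom_lengths and chrom in chrom_lengths:
            hi = min(hi, chrom_lengths[chrom])
        by_chrom[chrom].append((lo, hi))

    merged = {}
    for chrom, windows in by_chrom.items():
        windows.sort()
        out = [windows[0]]
        for lo, hi in windows[1:]:
            last_lo, last_hi = out[-1]
            if lo <= last_hi:
                # Overlapping or adjacent: extend the previous window.
                out[-1] = (last_lo, max(last_hi, hi))
            else:
                out.append((lo, hi))
        merged[chrom] = out
    return merged


# Made-up coordinates: two nearby loci on one chromosome, one locus on another.
known = [("chr14", 100_000, 100_080),
         ("chr14", 110_000, 110_075),
         ("chr12", 5_000, 5_070)]
windows = flanking_windows(known)
```

Each resulting window would then be passed to a precursor-prediction step, which is outside the scope of this sketch.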
Our results show that the assumption that many miRNAs occur in clusters is fruitful for the discovery of novel miRNAs. Additionally, we show that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes over the course of evolution.