MicroRNAs (miRNAs), one type of small RNA (sRNA) in plants, play an essential role in gene regulation. Several miRNA databases have been established; however, newly generated datasets continue to accumulate and need to be collected, organized and analyzed. To this end, we have constructed a plant miRNA knowledge base (PmiRKB) that provides four major functional modules. In the ‘SNP’ module, single nucleotide polymorphism (SNP) data of seven Arabidopsis (Arabidopsis thaliana) accessions and 21 rice (Oryza sativa) subspecies were collected to inspect the SNPs within pre-miRNAs (precursor microRNAs) and miRNA–target RNA duplexes. Depending on their locations, SNPs can affect the secondary structures of pre-miRNAs, or interactions between miRNAs and their targets. A second module, ‘Pri-miR’, can be used to investigate the tissue-specific, transcriptional contexts of pre- and pri-miRNAs (primary microRNAs), based on massively parallel signature sequencing data. The third module, ‘MiR–Tar’, was designed to validate thousands of miRNA–target pairs by using parallel analysis of RNA end (PARE) data. Similarly, the fourth module, ‘Self-reg’, uses PARE data to investigate the metabolism of miRNA precursors, including precursor processing and miRNA- or miRNA*-mediated self-regulation effects on their host precursors. PmiRKB can be freely accessed at http://bis.zju.edu.cn/pmirkb/.
The biological significance of miRNAs was widely recognized at the beginning of this century. Since then, miRNAs have been extensively studied in both plants and animals (1,2). The miRNAs, ~21 nt in length, recognize their targets through complementary recognition sites, normally resulting in transcript cleavage in plants or translational repression in animals (1,2). The functions of miRNA targets vary widely, although a sizable portion are transcription factors (2,3), suggesting that miRNAs are involved in extraordinarily complex regulatory networks. To date, many miRNA families, either highly conserved or species-specific, have been cloned and functionally characterized (2,3). With the advent of next-generation sequencing, significant advances have been achieved in miRNA research, and valuable sequencing data have been generated (4,5). Thus, standardized databases are required for data deposition, organization, parsing and analysis, while also allowing for user queries. Currently, there are several established miRNA databases, such as miRBase (6) and the plant microRNA database (PMRD) (7). These two databases catalog various organisms and numerous plant species, respectively. Although extremely comprehensive, they provide only general information on miRNAs, such as sequences, secondary structures, experimental evidence and references. Hence, such databases support only basic queries of specific miRNAs. A need therefore exists for databases that provide more specific tools and in-depth information on miRNAs.
Here, we focus on the miRNAs of two model plants, the eudicot Arabidopsis and the monocot rice. PmiRKB (http://bis.zju.edu.cn/pmirkb/) contains four major functional modules: ‘SNP’, ‘Pri-miR’, ‘MiR–Tar’ and ‘Self-reg’ (Figure 1). The ‘SNP’ module contains SNPs within the pre-miRNAs among different Arabidopsis or rice subspecies based on currently released SNP data. To investigate the potential effects of SNPs on the secondary structures of pre-miRNAs, the stem–loop structures of the pre-miRNAs were predicted. The ‘SNP’ module also includes SNPs present in the mature miRNAs and their target sites, which may affect miRNA–target interactions. The second module, ‘Pri-miR’, provides data on the transcriptional contexts of pre- and pri-miRNAs, including their tissue specificity. To our knowledge, this is the first large-scale attempt to elucidate the transcriptional ranges of pri-miRNAs in planta. The third module, ‘MiR–Tar’, identifies cleavage signals present within predicted miRNA–target recognition sites based on PARE data derived from plant degradomes. As a result, this module can serve as a reference for in vivo miRNA–target pair validation. Lastly, the ‘Self-reg’ module provides data on miRNA precursor processing, and on miRNA- or miRNA*-mediated cleavage of host precursors, based on cleavage signals detected by PARE. Together, the PmiRKB online service, implemented with PostgreSQL, Apache, hypertext preprocessor (PHP) and scalable vector graphics (SVG), provides an unprecedented resource for plant miRNA research.
Sequence information for the miRNAs included in PmiRKB was retrieved from miRBase (http://www.mirbase.org/index.shtml; release 15) (6). Genomic information for Arabidopsis and rice was downloaded from The Arabidopsis Information Resource (TAIR release 9, ftp://ftp.arabidopsis.org/home/tair/Sequences/) (8), and from the rice genome annotation project established by The Institute for Genomic Research (currently named the J. Craig Venter Institute) (TIGR rice genome release 6.1, ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/) (9), respectively. The SNPs of seven Arabidopsis accessions (Col-0, Bur-0, Tsu-1, Cvi-0, Ler-1, Bay-0 and Sha; the genome of Col-0 served as the reference) reported by Weigel’s group (10) were retrieved from TAIR (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR9_genome_release/TAIR9_gff3/Variation_GFF/, release 9) (8). The SNPs between two rice subspecies, Nipponbare and 93–11 (the genome of Nipponbare served as the reference), were a gift from the Paterson group (11), while the SNPs belonging to 20 rice subspecies (Nipponbare, Tainung 67, Li-Jiang-Xin-Tuan-Hei-Gu, M 202, Azucena, Moroberekan, Cypress, Dom-Sufid, N 22, Dular, FR13 A, Aswina, Rayada, IR64-21, Shan-Huang Zhan-2, Pokkali, Swarna, Sadu-Cho, Minghui 63 and Zhenshan 97B; the genome of Nipponbare served as the reference) were obtained from the OryzaSNP Project (ftp://ftp.plantbiology.msu.edu/pub/data/Oryza_SNP/) (12,13). All of the SNPs and their genomic positions were manually checked against the current genomic information available from TAIR (release 9) (8) and TIGR (release 6.1) (9). Entries showing inconsistencies were removed.
Most of the MPSS and PARE data for Arabidopsis and rice were retrieved from the plant MPSS databases (http://mpss.udel.edu/at/mpss_index.php and http://mpss.udel.edu/at_pare/ for Arabidopsis, and http://mpss.udel.edu/rice/mpss_index.php and http://mpss.udel.edu/rice_pare/ for rice) (14–16). In addition, a portion of the Arabidopsis PARE data, and the short-read sequences derived from poly(A)-tailed transcripts and degradomes in rice, were collected from recent reports (17–19). More detailed descriptions of the features and uses of these high-throughput sequencing (HTS) data are available through the ‘Instructions’ link of PmiRKB.
All the data retrieved for this study are summarized in Table 1.
MiRNA targets were predicted in silico using the algorithm employed by miRU with default settings (20), and the messenger RNA (mRNA) sequences of Arabidopsis and rice were retrieved from TAIR (release 9) and TIGR (release 6.1), respectively. A random subset of the predictions was manually checked against the results generated by miRU to ensure the accuracy of our prediction program, which was written in C.
SNPs present in pre-miRNAs and mature miRNAs in Arabidopsis and rice were extracted from the whole-genome SNP datasets, and downstream targets were predicted for each miRNA using the algorithm employed by miRU (20). These results were integrated into PmiRKB, along with the SNPs present in predicted miRNA target sites and the SNPs with potential to alter the secondary structures of pre-miRNAs according to RNAfold prediction (21) (see Table 2 for summary). The resulting ‘SNP’ module can identify different secondary conformations of a specific pre-miRNA among various Arabidopsis or rice subspecies, which result from SNP-induced thermal stability alteration (Figure 2A). Moreover, SNPs present in the miRNA–target binding regions, which have the potential to directly disturb miRNA–target interactions, were taken into account (Figure 2A). Accordingly, the output from the ‘SNP’ module can direct the investigation of phenotypic or physiological divergences among different Arabidopsis or rice subspecies, which may be attributed to SNP-involved, miRNA-mediated regulatory processes.
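The first step described above, extracting the SNPs that fall inside pre-miRNAs or mature miRNAs from a whole-genome SNP set, amounts to a coordinate comparison. The following is a minimal sketch of that idea only, not the PmiRKB implementation. The function names, the 1-based inclusive coordinates and the example positions are assumptions made for illustration, and the structural comparison with RNAfold is not reproduced here.

```python
def classify_snp(snp_pos, pre_start, pre_end, mature_start, mature_end):
    """Classify one SNP relative to a pre-miRNA and its mature miRNA.

    All coordinates are assumed to be 1-based, inclusive and on the same
    chromosome (hypothetical convention for this sketch).
    """
    if mature_start <= snp_pos <= mature_end:
        return "mature"       # may affect miRNA-target pairing
    if pre_start <= snp_pos <= pre_end:
        return "precursor"    # may affect the stem-loop structure
    return "outside"


def snps_by_region(snp_positions, pre_start, pre_end, mature_start, mature_end):
    """Group SNP positions by the region of the precursor they fall in."""
    grouped = {"mature": [], "precursor": [], "outside": []}
    for pos in snp_positions:
        region = classify_snp(pos, pre_start, pre_end, mature_start, mature_end)
        grouped[region].append(pos)
    return grouped


# Hypothetical precursor at 100-200 with its mature miRNA at 110-130.
print(snps_by_region([95, 105, 120, 201], 100, 200, 110, 130))
```

SNPs reported as "precursor" or "mature" would then be the candidates for a secondary-structure comparison between the reference and variant sequences.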
HTS data, mostly MPSS data (22), were retrieved from public resources (see details in the ‘Data Collection’ section). These short reads were derived from sequencing of the poly(A)-tailed transcripts in various tissues (14). Although many poly(A)-tailed transcripts are mRNAs, a majority of pri-miRNAs, which are transcribed by RNA polymerase II, also possess poly(A) tails (2,23). Thus, these HTS datasets can be used to detect the transcriptional signals of pri-miRNAs in a tissue-specific manner. For this purpose, all the HTS signatures were mapped to the corresponding genomes, and only the signatures mapping to a unique genomic locus were retained for further investigation. The short reads located within ~5 kb on either side of a pre-miRNA were used to reveal the transcriptional context of the corresponding pri-miRNA. Moreover, the expression levels of the short reads in each library, prepared from distinct tissues, were normalized to RPM (reads per million) to allow cross-library comparisons. Thus, the transcriptional signals of a given pri-miRNA can be queried in a tissue-specific manner by using the ‘Pri-miR’ module (Figure 2D–F).
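The two filtering and scaling steps in this paragraph can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code behind ‘Pri-miR’: the record layout (`chrom`, `pos`, `hits`), the function names and the example values are hypothetical, and RPM is taken here to mean raw count × 1 000 000 divided by the total count of the library.

```python
def normalize_rpm(raw_counts):
    """Convert raw signature counts of one library to reads per million."""
    total = sum(raw_counts.values())
    if total == 0:
        return {sig: 0.0 for sig in raw_counts}
    return {sig: count * 1_000_000 / total for sig, count in raw_counts.items()}


def reads_near_precursor(reads, chrom, pre_start, pre_end, flank=5000):
    """Keep uniquely mapped reads within `flank` nt on either side of a pre-miRNA.

    Each read is a dict with hypothetical keys: 'chrom', 'pos' (genomic
    position) and 'hits' (number of genomic loci the signature maps to).
    """
    lo, hi = pre_start - flank, pre_end + flank
    return [
        r for r in reads
        if r["chrom"] == chrom and r["hits"] == 1 and lo <= r["pos"] <= hi
    ]


# Hypothetical library with two signatures and four reads in total.
print(normalize_rpm({"sigA": 1, "sigB": 3}))

reads = [
    {"chrom": "Chr1", "pos": 1000, "hits": 1},    # kept
    {"chrom": "Chr1", "pos": 20000, "hits": 1},   # too far from the precursor
    {"chrom": "Chr1", "pos": 6000, "hits": 2},    # maps to more than one locus
    {"chrom": "Chr2", "pos": 6000, "hits": 1},    # other chromosome
]
print(reads_near_precursor(reads, "Chr1", 5000, 5100))
```

Normalizing per library before comparing tissues is what makes the signal of one pri-miRNA comparable across libraries of different sequencing depth.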
In plants, most miRNAs cleave their downstream targets at highly complementary recognition sites (2). The resulting 3′ cleavage remnants are usually used for miRNA–target pair validation and slicing site mapping by modified 5′ rapid amplification of cDNA ends (RACE) (3). However, this classical method becomes time-consuming, tedious and costly when applied to large-scale validation of miRNA–target pairs. To overcome these limitations, modified 5′ RACE has been combined with HTS to develop high-throughput methods, such as PARE (15–18). In this study, a large-scale investigation of in vivo miRNA–target pairs was performed based on the public PARE datasets. For this purpose, the PARE short reads were mapped to the mRNAs. The signatures overlapping the predicted target sites were considered to be potential cleavage signals that support the specific miRNA–target regulatory relationships. In addition, the sample origin and the normalized expression level (in RPM) of each short read are provided for reference. Thus, the functional module, ‘MiR–Tar’, was established (Figure 2B).
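The overlap criterion described above can be illustrated with a short sketch. It assumes that reads and predicted target sites are given as inclusive intervals in transcript coordinates; the dictionary keys, the function names and the identifiers `miR-example`, `tx1` and `tx2` are hypothetical and do not come from PmiRKB.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def supporting_reads(pare_reads, target_sites):
    """Collect, for each predicted miRNA-target pair, the overlapping PARE reads."""
    support = {}
    for site in target_sites:
        hits = [
            read for read in pare_reads
            if read["transcript"] == site["transcript"]
            and overlaps(read["start"], read["end"], site["start"], site["end"])
        ]
        support[(site["mirna"], site["transcript"])] = hits
    return support


# Hypothetical PARE reads (transcript coordinates, RPM-normalized levels).
pare_reads = [
    {"transcript": "tx1", "start": 95, "end": 114, "rpm": 12.5},
    {"transcript": "tx1", "start": 300, "end": 319, "rpm": 3.0},
    {"transcript": "tx2", "start": 100, "end": 119, "rpm": 1.0},
]
# Hypothetical predicted target site of one miRNA on transcript tx1.
target_sites = [{"mirna": "miR-example", "transcript": "tx1", "start": 100, "end": 120}]

for pair, hits in supporting_reads(pare_reads, target_sites).items():
    print(pair, [(h["start"], h["end"], h["rpm"]) for h in hits])
```

Keeping the RPM value with each supporting read allows candidate pairs to be ranked by signal intensity before experimental validation.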
Theoretically, the 3′ remnants generated by Dicer-like 1 (DCL1) processing of pri-miRNAs can be included in plant degradome libraries and detected by PARE sequencing (17,24). Thus, the DCL1-mediated first-step cleavages on both strands of the stem region of a specific pri-miRNA, which result in two poly(A)-tailed remnants, can be validated by PARE data. In addition, a mature miRNA, and occasionally its miRNA*, can recognize its host precursor as a target based on the complementary sequence present in the stem region. This type of miRNA- or miRNA*-mediated self-regulation can also be detected in plant degradome PARE sequencing data (15,24). Therefore, the ‘Self-reg’ module was created in PmiRKB to provide a comprehensive view of miRNA precursor processing and of miRNA- or miRNA*-mediated self-cleavage of host precursors (Figure 2C). As in the ‘MiR–Tar’ module, the sample origin and the normalized expression level (in RPM) of each short read are provided for reference.
PmiRKB additionally includes the ‘MiR info’ module, which provides sequences of mature miRNAs and pre-miRNAs in FASTA format, clustered pre-miRNAs (within 50 kb), experimental evidence, references and external links to miRBase (6) and PMRD (7). Several useful web links for accessing miRNA-related data and additional analytical tools are provided under ‘Useful links’. Beginners may also appreciate the resources provided in ‘Instructions’ and ‘References’ (Figure 1).
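The text does not specify how pre-miRNAs are grouped into clusters beyond the 50 kb distance. One plausible reading, shown here purely as an assumption, is that pre-miRNAs on the same chromosome are chained together whenever consecutive start positions lie within 50 kb of each other. The function name, tuple layout and example loci below are hypothetical.

```python
def cluster_precursors(precursors, max_gap=50_000):
    """Group pre-miRNAs whose neighbouring start positions are within `max_gap` nt.

    `precursors` is a list of (chromosome, start, name) tuples. The chaining
    rule used here is an assumption; the source states only 'within 50kb'.
    """
    clusters = []
    for chrom, start, name in sorted(precursors):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start - last["last_start"] <= max_gap:
            last["members"].append(name)
            last["last_start"] = start
        else:
            clusters.append({"chrom": chrom, "last_start": start, "members": [name]})
    return [c["members"] for c in clusters]


# Hypothetical precursor loci.
loci = [
    ("Chr1", 1_000, "pre-a"),
    ("Chr1", 30_000, "pre-b"),
    ("Chr1", 200_000, "pre-c"),
    ("Chr2", 1_500, "pre-d"),
]
print(cluster_precursors(loci))
```

Under this chaining rule a cluster can span more than 50 kb in total; a stricter definition would instead cap the distance between the first and last member.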
Although huge quantities of plant SNP data are currently available, few large-scale analyses have examined their biological significance for specific types of genes, such as miRNAs. Based on the analytical results accessible from PmiRKB, SNPs with the potential to alter the secondary structures of pre-miRNAs can be identified. Furthermore, the SNPs within the predicted miRNA–target duplexes, which may interrupt miRNA–target regulation, are also included in the ‘SNP’ module of PmiRKB.
Technological advances in sequencing have greatly accelerated studies of plant transcriptomes (5). However, analyses of these newly generated HTS data remain incomplete. Here, HTS data derived from poly(A)-tailed transcripts enabled us to uncover the transcriptional contexts of pri-miRNAs. Considering the lack of information on pri-miRNAs in the current miRNA databases, the ‘Pri-miR’ module is a valuable resource for the study of plant miRNA transcription. Specifically, it can facilitate primer or probe design for pri-miRNA detection. In addition, the level of a specific pri-miRNA reflects not only the extent of miRNA transcription but also the stability of the pri-miRNA in vivo. Thus, ‘Pri-miR’ can also provide insights into the efficiency of pri-miRNA processing in a specific tissue.
Another valuable resource provided by PmiRKB is the transcriptome-wide identification of miRNA–target pairs in planta, based on the PARE data (15–17). For researchers who intend to validate miRNA–target pairs of interest, the ‘MiR–Tar’ module provides cleavage signal intensities. Thus, miRNA–target pairs can be prioritized for validation before modified 5′ RACE assays are performed. A sizable portion of 3′ cleavage remnants, especially those at trace levels, are difficult to detect by the traditional RACE method. They can, however, be captured in PARE data, owing to the sequencing depth of HTS technology. Together, the ‘MiR–Tar’ module provides a comprehensive set of miRNA–target pair candidates, including novel pairs with subtle regulatory relationships. Finally, the ‘Self-reg’ module provides novel insights into miRNA precursor processing and miRNA- or miRNA*-mediated self-regulation of miRNA genes.
As the interest in the miRNA world grows, more and more datasets are being generated. Correspondingly, there is an urgent need to organize, parse, and analyze the data produced. This can be partially accomplished through the development of both comprehensive and specialized databases. Here, the creation of a plant miRNA knowledge base, PmiRKB, has been described. The potential for its four functional modules to provide valuable insights into the transcription, processing and regulation of miRNA genes has been demonstrated.
PmiRKB has been updated based on the current release of miRBase (from release 14 to 15) (6). In the near future, more plant species will be included, and PmiRKB will be updated promptly following each new release of miRBase (6). In addition, as more HTS data become publicly available, the results provided by the four modules of PmiRKB can be further refined.
Funding for open access charge: National High Technology Research and Development Program of China (“863” Program) (2008AA10Z125); National Natural Sciences Foundation of China (30771326, 30971743, 31050110121); Program for New Century Excellent Talents in University of China (NCET-07-0740).
Conflict of interest statement. None declared.
The authors thank Dr Ramanjulu Sunkar for kindly providing free access to the valuable rice degradome sequencing data during the construction of PmiRKB. The authors thank Dr Christian Klukas for helpful discussions. The authors also thank Dr Michael Galperin and the two anonymous referees for their constructive and helpful suggestions.