MicroRNAs (miRNAs) are short RNA sequences expressed from longer transcripts encoded in animal, plant and virus genomes, and have recently been discovered in a single-celled eukaryote (1). miRNAs regulate the expression of target genes by binding to complementary sites in their transcripts to cause translational repression or transcript degradation (3). Translational repression is thought to be the primary mechanism for imperfect target duplexes in animals, with transcript degradation the dominant mechanism for the largely perfect matches found throughout plant target transcripts. miRNAs have been implicated in processes and pathways such as development, cell proliferation, apoptosis, metabolism and morphogenesis, and in diseases including cancer (4).
miRBase is the primary repository and database resource for miRNA data. The database has three main functions:
- miRBase::Registry provides a confidential service for the independent assignment of names to novel miRNA genes prior to their publication in peer-reviewed journals. Over 70 publications describing novel miRNA genes have made use of this service, and registration is a requirement of many journals.
- miRBase::Sequences provides miRNA sequence data, annotation, references and links to other resources for all published miRNAs. The database (release 10.0) contains over 5000 sequences from 58 species.
- miRBase::Targets provides an automated pipeline for the prediction of targets for all published animal miRNAs. The current release of the database (v5) predicts targets in over 500 000 transcripts for all miRNAs in 24 species. The target prediction pipeline and algorithms have been described elsewhere (6,7).
The miRNA nomenclature scheme has been presented and discussed previously (6). Novel miRNAs require cloning or expression evidence, and should be submitted only after a manuscript describing their identification has been accepted for publication. Assigned names should then be incorporated into the final version of the manuscript prior to publication. Obvious homologues of miRNAs validated in closely related species need not be experimentally verified and may be submitted at any time. Primary features of the nomenclature scheme are:
- The miRNA name contains a three- or four-letter species prefix and a numeric suffix (e.g. hsa-mir-212).
- A mature miRNA sequence may be predicted to be expressed from more than one hairpin precursor locus, denoted with further numeric suffixes (e.g. dme-mir-6-1 and dme-mir-6-2).
- Related hairpin loci expressing related mature miRNA sequences have lettered suffixes (e.g. mmu-mir-181a and mmu-mir-181b).
- Plant miRNA genes are given names of the form ath-MIR166a. Lettered suffixes describe distinct loci expressing all related mature miRNAs; numeric suffixes are not used.
- Viral miRNA names conventionally relate to the locus from which the miRNA derives (e.g. ebv-mir-BART1 from the Epstein-Barr virus BART locus).
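The naming rules listed above are regular enough to be read mechanically. The following is a minimal sketch, assuming that gene names follow exactly the forms shown in the examples (species-mir-ID for animals and viruses, species-MIRID for plants, an optional lettered suffix, and an optional numeric locus suffix). The regular expression, the `MirnaName` record and the `parse_mirna_name` function are hypothetical illustrations, not part of miRBase or any published miRBase tool, and they do not cover mature-sequence names or every historical exception. The parser only extracts surface fields; it infers nothing about orthology or family membership.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern covering only the example forms in the text,
# not an official miRBase grammar.
_NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3,4})-"    # three- or four-letter species prefix, e.g. "hsa"
    r"(?P<marker>mir-|MIR)"         # "mir-" (animal/viral) or "MIR" (plant) gene marker
    r"(?P<core>[A-Z]*\d+)"          # numeric ID, or viral locus-based ID such as "BART1"
    r"(?P<letter>[a-z]?)"           # lettered suffix for related loci, e.g. "a"
    r"(?:-(?P<locus>\d+))?$"        # numeric suffix for multiple precursor loci, e.g. "-1"
)


@dataclass(frozen=True)
class MirnaName:
    species: str             # species prefix, e.g. "mmu"
    plant_style: bool        # True for names of the form ath-MIR166a
    core: str                # numeric or locus-based identifier
    letter: Optional[str]    # lettered suffix, or None
    locus: Optional[int]     # numeric locus suffix, or None


def parse_mirna_name(name: str) -> MirnaName:
    """Split a miRNA gene name into its components (illustrative sketch)."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised miRNA gene name: {name!r}")
    plant_style = m.group("marker") == "MIR"
    locus = m.group("locus")
    # Plant names use lettered suffixes only; numeric locus suffixes are not used.
    if plant_style and locus is not None:
        raise ValueError(f"plant names do not use numeric suffixes: {name!r}")
    return MirnaName(
        species=m.group("species"),
        plant_style=plant_style,
        core=m.group("core"),
        letter=m.group("letter") or None,
        locus=int(locus) if locus is not None else None,
    )


if __name__ == "__main__":
    for example in ["hsa-mir-212", "dme-mir-6-1", "mmu-mir-181a",
                    "ath-MIR166a", "ebv-mir-BART1"]:
        print(parse_mirna_name(example))
```

Treating the plant numeric suffix as an error, rather than silently accepting it, mirrors the rule that plant loci are distinguished by letters alone.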
However, it is important to note that a short name cannot always encode complex information such as orthology and paralogy relationships. In some cases, the short name is a pragmatic choice, the most consistent of several conflicting representations of these sequence relationships. While the names provide a guide to family and function, they should therefore not be relied upon to convey any complex meaning. Instead, dedicated fields in the database provide information about gene and mature miRNA sequence families.
The published miRNA literature is vast. Readers are referred to a number of comprehensive reviews of miRNA structure, biogenesis and function (4). Here, we focus on specific issues and points of interest with respect to the provision of miRNA data in the miRBase database.