Related Articles

1.  Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences 
BMC Research Notes  2014;7:286.
MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons.
The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance.
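The relationship between the reported figures can be checked with a short sketch. The function below derives each metric from a confusion matrix; the counts in the usage lines are hypothetical, chosen only because they are consistent with the percentages quoted above, and are not the paper's actual test-set sizes.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on real miRNA hairpins
    specificity = tn / (tn + fp)          # recall on pseudo hairpins
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
    }


# Hypothetical counts (100 real, 200 pseudo hairpins), NOT taken from the paper.
metrics = confusion_metrics(tp=74, fp=6, tn=194, fn=26)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the F-measure follows directly from precision and sensitivity, which is a quick consistency check on any reported set of figures.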
The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred.
The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly probable miRNA hairpin candidates for cloning-based validation. The ensemble's predictions include pre-miRNA candidates that were validated against miRBase but were not recognized by some of the base classifiers.
PMCID: PMC4051165  PMID: 24884968
MicroRNA; Support vector machine; Random forests; MiRNA hairpin prediction; Neural network
2.  Identification of Schistosoma mansoni microRNAs 
BMC Genomics  2011;12:47.
MicroRNAs (miRNAs) constitute a class of single-stranded RNAs which play a crucial role in regulating development and controlling gene expression by targeting mRNAs and triggering either translation repression or messenger RNA (mRNA) degradation. miRNAs are widespread in eukaryotes and to date over 14,000 miRNAs have been identified by computational and experimental approaches. Several miRNAs are highly conserved across species. In Schistosoma, the full set of miRNAs and their expression patterns during development remain poorly understood. Here we report on the development and implementation of a homology-based detection strategy to search for miRNA genes in Schistosoma mansoni. In addition, we report results on the experimental detection of miRNAs by means of cDNA cloning and sequencing of size-fractionated RNA samples.
Homology search using the high-throughput pipeline was performed with all known miRNAs in miRBase. A total of 6,211 mature miRNAs were used as reference sequences and 110 unique S. mansoni sequences were returned by BLASTn analysis. The existing mature miRNAs that produced these hits are reported, as well as the locations of the homologous sequences in the S. mansoni genome. All BLAST hits aligned with at least 95% of the miRNA sequence, resulting in alignment lengths of 19-24 nt. Following several filtering steps, 15 potential miRNA candidates were identified using this approach. By sequencing small RNA cDNA libraries from adult worm pairs, we identified 211 novel miRNA candidates in the S. mansoni genome. Northern blot analysis was used to detect the expression of the 30 most frequently sequenced miRNAs and to compare the expression level of these miRNAs between the lung stage schistosomula and adult worm stages. Expression of 11 novel miRNAs was confirmed by northern blot analysis and some presented a stage-regulated expression pattern. Three miRNAs previously identified from S. japonicum were also present in S. mansoni.
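The 95% alignment-coverage criterion can be illustrated with a minimal sketch. The record layout (query name, query length, alignment length) and the example hits are assumptions made for illustration; the abstract does not describe the pipeline's actual data structures or its later filtering steps.

```python
def filter_by_coverage(hits, min_percent=95):
    """Keep hits whose alignment covers at least min_percent of the query miRNA.

    Each hit is a (query_name, query_length, alignment_length) tuple.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    return [
        hit for hit in hits
        if hit[2] * 100 >= min_percent * hit[1]
    ]


# Hypothetical hits, not data from the study.
example_hits = [
    ("miR-a", 22, 22),  # 100% of the query aligned: kept
    ("miR-b", 22, 20),  # about 91% aligned: dropped
    ("miR-c", 20, 19),  # exactly 95% aligned: kept
]
print(filter_by_coverage(example_hits))
```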
Evidence for the presence of miRNAs in S. mansoni is presented. The number of miRNAs detected by homology-based computational methods in S. mansoni is limited due to the lack of close relatives in the miRNA repository. In spite of this, the computational approach described here can likely be applied to the identification of pre-miRNA hairpins in other organisms. Construction and analysis of a small RNA library led to the experimental identification of 14 novel miRNAs from S. mansoni through a combination of molecular cloning, DNA sequencing and expression studies. Our results significantly expand the set of known miRNAs in multicellular parasites and provide a basis for understanding the structural and functional evolution of miRNAs in these metazoan parasites.
PMCID: PMC3034697  PMID: 21247453
3.  MicroRNA discovery by similarity search to a database of RNA-seq profiles 
Frontiers in Genetics  2013;4:133.
In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with the analysis of RNA-seq profiles, in which miRNAs typically leave the Drosha/Dicer fingerprint of 1–2 blocks of ~22 nt reads corresponding to the mature and star miRNA. In complement to the previous methods, we present a study where we systematically exploit these patterns of read profiles. We created two datasets comprising 2540 and 4795 read profiles obtained after preprocessing short RNA-seq data from miRBase and ENCODE, respectively. Out of 4795 ENCODE read profiles, 1361 are annotated as non-coding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), we align ncRNA read profiles from ENCODE against the miRBase read profiles (cleaned for “self-matches”) and are able to separate ENCODE miRNAs from the other ncRNAs by a Matthews Correlation Coefficient (MCC) of 0.8 and obtain an area under the curve of 0.93. Based on the dba score cut-off of 0.7 at which we observed the maximum MCC of 0.8, we predict 523 novel miRNA candidates. An additional RNA secondary structure analysis reveals that 42 of the candidates overlap with predicted conserved secondary structure. Further analysis reveals that the 523 miRNA candidates are located in genomic regions with MAF block (UCSC) fragmentation and poor sequence conservation, which in part might explain why they have been overlooked in previous efforts. We further analyzed known human and mouse miRNA read profiles and found two distinct classes: the first containing two blocks and the second containing >2 blocks of reads. The latter class also holds read profiles with a less well-defined arrangement of reads than the former class.
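Choosing a score cut-off by maximizing the MCC, as described above, can be sketched as follows. The scores and labels are toy values invented for illustration; they are not dba scores from the study, and the scan over observed score values is only one simple way to pick a threshold.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def best_cutoff(scores, labels):
    """Scan every observed score as a cut-off and return (cutoff, mcc) with the highest MCC.

    A profile is predicted to be a miRNA when its score is >= the cut-off.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        value = mcc(tp, fp, tn, fn)
        if value > best[1]:
            best = (cutoff, value)
    return best


# Toy alignment scores and labels (1 = annotated miRNA), purely illustrative.
toy_scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 1, 0]
print(best_cutoff(toy_scores, toy_labels))
```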
On comparison of miRNA read profiles from plants and animals, we observed kingdom specific read profiles that are distinct in terms of both length and distribution of reads within the read profiles to each other. All the data, as well as a server to search miRBase read profiles by uploading a BED file, is available at
PMCID: PMC3708161  PMID: 23874353
microRNA; miRNA read profiles; RNA-seq; alignment; deepBlockAlign; read profiles
4.  Discovery of Novel MicroRNAs in Rat Kidney Using Next Generation Sequencing and Microarray Validation 
PLoS ONE  2012;7(3):e34394.
MicroRNAs (miRNAs) are small non-coding RNAs that regulate a variety of biological processes. The latest version of the miRBase database (Release 18) includes 1,157 mouse and 680 rat mature miRNAs. Only one new rat mature miRNA was added to the rat miRNA database from version 16 to version 18 of miRBase, suggesting that many rat miRNAs remain to be discovered. Given the importance of rat as a model organism, discovery of the complete set of rat miRNAs is necessary for understanding rat miRNA regulation. In this study, next generation sequencing (NGS), microarray analysis and bioinformatics technologies were applied to discover novel miRNAs in rat kidneys. MiRanalyzer was utilized to analyze the sequences of the small RNAs generated from NGS analysis of rat kidney samples. Hundreds of novel miRNA candidates were examined according to the mappings of their reads to the rat genome, presence of sequences that can form a miRNA hairpin structure around the mapped locations, Dicer cleavage patterns, and the levels of their expression determined by both NGS and microarray analyses. Nine novel rat hairpin precursor miRNAs (pre-miRNA) were discovered with high confidence. Five of the novel pre-miRNAs are also reported in other species while four of them are rat specific. In summary, 9 novel pre-miRNAs (14 novel mature miRNAs) were identified via combination of NGS, microarray and bioinformatics high-throughput technologies.
PMCID: PMC3314633  PMID: 22470567
5.  MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features 
BMC Bioinformatics  2010;11(Suppl 11):S11.
MicroRNAs (simply miRNAs) are derived from larger hairpin RNA precursors and play essential regulatory roles in both animals and plants. A number of computational methods for miRNA gene finding have been proposed in the past decade, yet the problem is far from being tackled, especially when considering the imbalance issue of known miRNAs and unidentified miRNAs, and the pre-miRNAs with multi-loops or higher minimum free energy (MFE). This paper presents a new computational approach, miRenSVM, for finding miRNA genes. Aiming at better prediction performance, an ensemble support vector machine (SVM) classifier is established to deal with the imbalance issue, and multi-loop features are included for identifying those pre-miRNAs with multi-loops.
We collected a representative dataset, which contains 697 real miRNA precursors identified by experimental procedure and other computational methods, and 5428 pseudo ones from several datasets. Experiments showed that our miRenSVM achieved a 96.5% specificity and a 93.05% sensitivity on the dataset. Compared with the state-of-the-art approaches, miRenSVM obtained better prediction results. We also applied our method to predict 14 Homo sapiens pre-miRNAs and 13 Anopheles gambiae pre-miRNAs that first appeared in miRBase13.0; miRenSVM achieved a 100% prediction rate. Furthermore, performance evaluation was conducted over 27 additional species in miRBase13.0, and 92.84% (4863/5238) animal pre-miRNAs were correctly identified by miRenSVM.
MiRenSVM is an ensemble support vector machine (SVM) classification system for better detecting miRNA genes, especially those with multi-loop secondary structure.
PMCID: PMC3024864  PMID: 21172046
6.  Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 
BMC Bioinformatics  2005;6:310.
MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.
A set of novel features of local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs and pseudo pre-miRNAs. Support vector machine (SVM) is applied on these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without utilizing any comparative genomics information.
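One way to picture local structure-sequence features is the sketch below, which pairs each nucleotide with the paired/unpaired state of itself and its two neighbours. The abstract does not spell out the encoding, so the dot-bracket input, the 4 x 8 = 32 feature layout, and the frequency normalization are all assumptions made for illustration rather than the authors' exact feature definition.

```python
from collections import Counter
from itertools import product


def triplet_features(sequence, structure):
    """Return normalized frequencies of 32 nucleotide + local-structure triplets.

    sequence  : RNA string over A/C/G/U (T is converted to U)
    structure : dot-bracket string of the same length, e.g. "((...))"
    Both bracket types are treated as "paired" and written as "(".
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")
    paired = structure.replace(")", "(")

    # All 32 possible keys, e.g. "A(((", "G(.(", "U...".
    keys = [n + "".join(s) for n in "ACGU" for s in product("(.", repeat=3)]
    valid = set(keys)

    counts = Counter()
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]  # middle base + 3-position structure window
        if key in valid:
            counts[key] += 1

    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in keys}


# Toy hairpin, not a real pre-miRNA.
features = triplet_features("GCAUAGC", "((...))")
print({k: v for k, v in features.items() if v > 0})
```

A fixed-length vector like this could then be passed to any SVM implementation; that training step is omitted here.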
The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.
PMCID: PMC1360673  PMID: 16381612
7.  A framework for automated enrichment of functionally significant inverted repeats in whole genomes 
BMC Bioinformatics  2010;11(Suppl 6):S20.
RNA transcripts from genomic sequences showing dyad symmetry typically adopt hairpin-like, cloverleaf, or similar structures that act as recognition sites for proteins. Such structures often are the precursors of non-coding RNA (ncRNA) sequences like microRNA (miRNA) and small-interfering RNA (siRNA) that have recently garnered more functional significance than in the past. Genomic DNA contains hundreds of thousands of such inverted repeats (IRs) with varying degrees of symmetry. But by collecting statistically significant information from a known set of ncRNA, we can sort these IRs into those that are likely to be functional.
A novel method was developed to scan genomic DNA for partially symmetric inverted repeats and the resulting set was further refined to match miRNA precursors (pre-miRNA) with respect to their density of symmetry, statistical probability of the symmetry, length of stems in the predicted hairpin secondary structure, and the GC content of the stems. This method was applied on the Arabidopsis thaliana genome and validated against the set of 190 known Arabidopsis pre-miRNA in the miRBase database. A preliminary scan for IRs identified 186 of the known pre-miRNA but with 714700 pre-miRNA candidates. This large number of IRs was further refined to 483908 candidates with 183 pre-miRNA identified and further still to 165371 candidates with 171 pre-miRNA identified (i.e. with 90% of the known pre-miRNA retained).
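The idea of scanning for partially symmetric inverted repeats can be sketched as below. The fixed arm length, the gap range, the density threshold, and the simple match-fraction score are assumptions chosen for illustration; the abstract does not give the method's actual scoring or its statistical filters, and the software it describes is written in Perl rather than Python.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def symmetry_density(left_arm, right_arm):
    """Fraction of positions where the left arm can pair with the reversed right arm."""
    matches = sum(
        1 for a, b in zip(left_arm, reversed(right_arm))
        if COMPLEMENT.get(a) == b
    )
    return matches / len(left_arm)


def scan_inverted_repeats(dna, arm=8, min_gap=3, max_gap=6, min_density=0.75):
    """Return (start, gap, density) for every window whose arms are sufficiently symmetric."""
    dna = dna.upper()
    hits = []
    for start in range(len(dna)):
        for gap in range(min_gap, max_gap + 1):
            right_start = start + arm + gap
            right_end = right_start + arm
            if right_end > len(dna):
                break  # larger gaps would also run past the end
            density = symmetry_density(
                dna[start:start + arm], dna[right_start:right_end]
            )
            if density >= min_density:
                hits.append((start, gap, density))
    return hits


# Toy sequence: an 8-nt arm, a 4-nt loop, and the arm's reverse complement.
toy = "GGGATCCA" + "TTTT" + "TGGATCCC"
print(scan_inverted_repeats(toy))
```

Lowering `min_density` admits more imperfect repeats, which mirrors the trade-off described above between retaining known pre-miRNAs and limiting the number of candidates.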
A set of 165371 candidates for potentially functional miRNA is still too large to warrant wet lab analyses, such as northern blotting, on all of them. Hence additional filters are needed to further refine the number of candidates while still retaining most of the known miRNA. These include detection of promoters and terminators, homology analyses, location of candidate relative to coding regions, and better secondary structure prediction algorithms. The software developed is designed to easily accommodate such additional filters with minimal experience in Perl.
PMCID: PMC3026368  PMID: 20946604
8.  miROrtho: computational survey of microRNA genes 
Nucleic Acids Research  2008;37(Database issue):D111-D117.
MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ∼22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database ( presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.
PMCID: PMC2686488  PMID: 18927110
9.  miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes 
Nucleic Acids Research  2005;34(Database issue):D135-D139.
Recent work has demonstrated that microRNAs (miRNAs) are involved in critical biological processes by suppressing the translation of coding genes. This work develops an integrated database, miRNAMap, to store the known miRNA genes, the putative miRNA genes, the known miRNA targets and the putative miRNA targets. The known miRNA genes in four mammalian genomes (human, mouse, rat and dog) are obtained from miRBase, and experimentally validated miRNA targets are identified in a survey of the literature. Putative miRNA precursors were identified by RNAz, which is a non-coding RNA prediction tool based on comparative sequence analysis. The mature miRNA of the putative miRNA genes is accurately determined using a machine learning approach, mmiRNA. Then, miRanda was applied to predict the miRNA targets within the conserved regions in 3′-UTR of the genes in the four mammalian genomes. The miRNAMap also provides the expression profiles of the known miRNAs, cross-species comparisons, gene annotations and cross-links to other biological databases. Both textual and graphical web interfaces are provided to facilitate the retrieval of data from the miRNAMap. The database is freely available at .
PMCID: PMC1347497  PMID: 16381831
10.  Systematic Curation of miRBase Annotation Using Integrated Small RNA High-Throughput Sequencing Data for C. elegans and Drosophila 
MicroRNAs (miRNAs) are a class of 20–23 nucleotide small RNAs that regulate gene expression post-transcriptionally in animals and plants. Annotation of miRNAs by the miRNA database (miRBase) has largely relied on computational approaches. As a result, many miRBase entries lack experimental validation, and discrepancies between miRBase annotation and actual miRNA sequences are often observed. In this study, we integrated the small RNA sequencing (smRNA-seq) datasets in Caenorhabditis elegans and Drosophila melanogaster and devised an analytical pipeline coupled with detailed manual inspection to curate miRNA annotation systematically in miRBase. Our analysis reveals 19 (17.0%) and 51 (31.3%) miRNA entries with detectable smRNA-seq reads have mature sequence discrepancies in C. elegans and D. melanogaster, respectively. These discrepancies frequently occur either for conserved miRNA families whose mature sequences were predicted according to their homologous counterparts in other species or for miRNAs whose precursor miRNA (pre-miRNA) hairpins produce an abundance of multiple miRNA isoforms or variants. Our analysis shows that while Drosophila pre-miRNAs, on average, produce less than 60% accurate mature miRNA reads in addition to their 5′ and 3′ variant isoforms, the precision of miRNA processing in C. elegans is much higher, at over 90%. Based on the revised miRNA sequences, we analyzed expression patterns of the more conserved (MC) and less conserved (LC) miRNAs and found that, whereas MC miRNAs are often co-expressed at multiple developmental stages, LC miRNAs tend to be expressed specifically at fewer stages.
PMCID: PMC3268580  PMID: 22303321
microRNA; deep sequencing; database curation
11.  Pre-microRNA and Mature microRNA in Human Mitochondria 
PLoS ONE  2011;6(5):e20220.
Because of the central functions of the mitochondria in providing metabolic energy and initiating apoptosis on the one hand, and the role that microRNA (miRNA) play in gene expression on the other, we hypothesized that some miRNA could be present in the mitochondria for post-transcriptomic regulation by RNA interference. We intend to identify miRNA localized in the mitochondria isolated from human skeletal primary muscular cells.
Methodology/Principal Findings
To investigate the potential origin of mitochondrial miRNA, we searched in silico for microRNA candidates in the mtDNA. Twenty-five human pre-miRNA and 33 miRNA alignments (E-value<0.1) were found in the reference mitochondrial sequence and some of the best candidates were chosen for a co-localization test. In situ hybridization of pre-mir-302a, pre-let-7b and mir-365, using specific labelled locked nucleic acids and confocal microscopy, demonstrated that these miRNA were localized in mitochondria of human myoblasts. Total RNA was extracted from enriched mitochondria isolated by an immunomagnetic method from a culture of human myotubes. The detection of 742 human miRNA (miRBase) was monitored by RT-qPCR at three increasing mtRNA inputs. Forty-six miRNA were significantly expressed (2nd derivative method Cp>35) for the smallest RNA input concentration and 204 miRNA for the maximum RNA input concentration. In silico analysis predicted 80 putative miRNA target sites in the mitochondrial genome (E-value<0.05).
The present study experimentally demonstrated for the first time the presence of pre-miRNA and miRNA in the human mitochondria isolated from skeletal muscular cells. A set of miRNA were significantly detected in the mitochondrial fraction. The origin of these pre-miRNA and miRNA should be further investigated to determine if they are imported from the cytosol and/or if they are partially processed in the mitochondria.
PMCID: PMC3102686  PMID: 21637849
12.  The small RNA diversity from Medicago truncatula roots under biotic interactions evidences the environmental plasticity of the miRNAome 
Genome Biology  2014;15(9):457.
Legume roots show a remarkable plasticity to adapt their architecture to biotic and abiotic constraints, including symbiotic interactions. However, global analysis of miRNA regulation in roots is limited, and a global view of the evolution of miRNA-mediated diversification in different ecotypes is lacking.
In the model legume Medicago truncatula, we analyze the small RNA transcriptome of roots submitted to symbiotic and pathogenic interactions. Genome mapping and a computational pipeline identify 416 miRNA candidates, including known and novel variants of 78 miRNA families present in miRBase. Stringent criteria of pre-miRNA prediction yield 52 new mtr-miRNAs, including 27 miRtrons. Analyzing miRNA precursor polymorphisms in 26 M. truncatula ecotypes identifies higher sequence polymorphism in conserved rather than Medicago-specific miRNA precursors. An average of 19 targets, mainly involved in environmental responses and signalling, is predicted per novel miRNA. We identify miRNAs responsive to bacterial and fungal pathogens or symbionts as well as their related Nod and Myc-LCO symbiotic signals. Network analyses reveal modules of new and conserved co-expressed miRNAs that regulate distinct sets of targets, highlighting potential miRNA-regulated biological pathways relevant to pathogenic and symbiotic interactions.
We identify 52 novel genuine miRNAs and large plasticity of the root miRNAome in response to the environment, and also in response to purified Myc/Nod signaling molecules. The new miRNAs identified and their sequence variation across M. truncatula ecotypes may be crucial to understand the adaptation of root growth to the soil environment, notably in the agriculturally important legume crops.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0457-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4212123  PMID: 25248950
13.  Mammalian MicroRNA Prediction through a Support Vector Machine Model of Sequence and Structure 
PLoS ONE  2007;2(9):e946.
MicroRNAs (miRNAs) are endogenous small noncoding RNA gene products, on average 22 nt long, found in a wide variety of organisms. They play important regulatory roles by targeting mRNAs for degradation or translational repression. There are 377 known mouse miRNAs and 475 known human miRNAs in the May 2007 release of the miRBase database, the majority of which are conserved between the two species. A number of recent reports imply that it is likely that many mammalian miRNAs remain to be discovered. The possibility that there are more of them expressed at lower levels or in more specialized expression contexts calls for the exploitation of genome sequence information to accelerate their discovery.
Methodology/Principal Findings
In this article, we describe a computational method, mirCoS, that uses three support vector machine models sequentially to discover new miRNA candidates in mammalian genomes based on sequence, secondary structure, and conservation. mirCoS can efficiently detect the majority of known miRNAs and predicts an extensive set of hairpin structures based on human-mouse comparisons. In total, 3476 mouse candidates and 3441 human candidates were found. These hairpins are more similar to known miRNAs than to negative controls in several aspects not considered by the prediction algorithm. A significant fraction of predictions is supported by existing expression evidence.
Using a novel approach, mirCoS performs comparably to or better than existing miRNA prediction methods, and contributes a significant number of new candidate miRNAs for experimental verification.
PMCID: PMC1978525  PMID: 17895987
14.  Complexity of Murine Cardiomyocyte miRNA Biogenesis, Sequence Variant Expression and Function 
PLoS ONE  2012;7(2):e30933.
microRNAs (miRNAs) are critical to heart development and disease. Emerging research indicates that regulated precursor processing can give rise to an unexpected diversity of miRNA variants. We subjected small RNA from murine HL-1 cardiomyocyte cells to next generation sequencing to investigate the relevance of such diversity to cardiac biology. ∼40 million tags were mapped to known miRNA hairpin sequences as deposited in miRBase version 16, calling 403 generic miRNAs as appreciably expressed. Hairpin arm bias broadly agreed with miRBase annotation, although 44 miR* were unexpectedly abundant (>20% of tags); conversely, 33 -5p/-3p annotated hairpins were asymmetrically expressed. Overall, variability was infrequent at the 5′ start but common at the 3′ end of miRNAs (5.2% and 52.3% of tags, respectively). Nevertheless, 105 miRNAs showed marked 5′ isomiR expression (>20% of tags). Among these was miR-133a, a miRNA with important cardiac functions, and we demonstrated differential mRNA targeting by two of its prevalent 5′ isomiRs. Analyses of miRNA termini and base-pairing patterns around Drosha and Dicer cleavage regions confirmed the known bias towards uridine at the 5′ most position of miRNAs, as well as supporting the thermodynamic asymmetry rule for miRNA strand selection and a role for local structural distortions in fine tuning miRNA processing. We further recorded appreciable expression of 5 novel miR*, 38 extreme variants and 8 antisense miRNAs. Analysis of genome-mapped tags revealed 147 novel candidate miRNAs. In summary, we revealed pronounced sequence diversity among cardiomyocyte miRNAs, knowledge of which will underpin future research into the mechanisms involved in miRNA biogenesis and, importantly, cardiac function, disease and therapy.
PMCID: PMC3272019  PMID: 22319597
15.  miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM 
BMC Bioinformatics  2011;12:216.
MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expressions. After the identification of thousands of miRNAs, the challenge is now to explore their specific biological functions. To this end, it will be greatly helpful to construct a reasonable organization of these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately classifying newly found miRNAs to their corresponding families by supervised learning techniques. Concretely, we propose an effective method, miRFam, which uses only primary information of pre-miRNAs or mature miRNAs and a multiclass SVM, to automatically classify miRNA genes.
An existing miRNA family system prepared by miRBase was downloaded online. We first employed n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. their families are unknown). Compared with miRBase's sequence alignment and manual modification, our study shows that the application of machine learning techniques to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each of which holds no fewer than 5 members), the classification accuracy is around 98%. Even with the entire miRBase15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy surprisingly reaches 90%.
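The n-gram feature extraction step can be sketched in a few lines. The choice of overlapping n-grams, the A/C/G/U alphabet, and the frequency normalization are assumptions made for illustration; the abstract does not state which n values or weighting the method uses, and the multiclass SVM training step is omitted.

```python
from collections import Counter
from itertools import product


def ngram_features(sequence, n=3, alphabet="ACGU"):
    """Map a sequence to normalized frequencies of all overlapping n-grams over the alphabet."""
    seq = sequence.upper().replace("T", "U")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=n)]
    # Only n-grams made of alphabet symbols contribute to the total.
    total = sum(counts[g] for g in vocabulary)
    return {g: (counts[g] / total if total else 0.0) for g in vocabulary}


# Toy sequence, not a real precursor.
features = ngram_features("AUGAUG", n=2)
print({g: f for g, f in features.items() if f > 0})
```

Because the vocabulary is fixed, every sequence maps to a vector of the same length (4^n entries), which is what a standard SVM implementation expects.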
Based on experimental results, we argue that miRFam is suitable for application as an automated method of family classification, and it is an important supplementary tool to the existing alignment-based small non-coding RNA (sncRNA) classification methods, since it only requires primary sequence information.
The source code of miRFam, written in C++, is freely and publicly available at:
PMCID: PMC3120706  PMID: 21619662
16.  Automatically clustering large-scale miRNA sequences: methods and experiments 
BMC Genomics  2012;13(Suppl 8):S15.
Since the initial annotation of microRNAs (miRNAs) in 2001, many studies have sought to identify additional miRNAs experimentally or computationally in various species. MiRNAs act with the Argonaute family of proteins to regulate target messenger RNAs (mRNAs) post-transcriptionally. Currently, research mainly focuses on the function of individual miRNAs. Considering that members of the same miRNA family might participate in the same pathway or regulate the same target(s) and thus share similar biological functions, useful knowledge can be derived from a high-quality miRNA family architecture.
In this article, we developed an unsupervised clustering-based method, miRCluster, to automatically group miRNAs. In order to evaluate this method, several data sets were constructed from the online database miRBase. Results showed that miRCluster can efficiently arrange miRNAs (e.g., it identifies 354 families in miRBase16 with an accuracy of 92.08%, and can recognize 9 of all 10 newly-added families in miRBase 17). So far, ~30% of the mature miRNAs registered in miRBase are unclassified. With miRCluster, over 85% of unclassified miRNAs can be assigned to certain families, while ~44% of these miRNAs are distributed in ~300 novel families.
In short, miRCluster is an automatic and efficient miRNA family identification method, which does not require any prior knowledge. It can be helpful in real use, especially when exploring functions of novel miRNAs. All relevant materials could be freely accessed online (
PMCID: PMC3535721  PMID: 23282099
17.  Computational prediction of the localization of microRNAs within their pre-miRNA 
Nucleic Acids Research  2013;41(15):7200-7211.
MicroRNAs (miRNAs) are short RNA species derived from hairpin-forming miRNA precursors (pre-miRNA) and acting as key posttranscriptional regulators. Most computational tools labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. Sequence and structural features that determine the location of the miRNA, and the extent to which these properties vary from species to species, are poorly understood. We have developed miRdup, a computational predictor for the identification of the most likely miRNA location within a given pre-miRNA or the validation of a candidate miRNA. MiRdup is based on a random forest classifier trained with experimentally validated miRNAs from miRbase, with features that characterize the miRNA–miRNA* duplex. Because we observed that miRNAs have sequence and structural properties that differ between species, mostly in terms of duplex stability, we trained various clade-specific miRdup models and obtained increased accuracy. MiRdup self-trains on the most recent version of miRbase and is easy to use. Combined with existing pre-miRNA predictors, it will be valuable for both de novo mapping of miRNAs and filtering of large sets of candidate miRNAs obtained from transcriptome sequencing projects. MiRdup is open source under the GPLv3 and available at∼blanchem/mirdup/.
PMCID: PMC3753617  PMID: 23748953
18.  A Densely Interconnected Genome-Wide Network of MicroRNAs and Oncogenic Pathways Revealed Using Gene Expression Signatures 
PLoS Genetics  2011;7(12):e1002415.
MicroRNAs (miRNAs) are important components of cellular signaling pathways, acting either as pathway regulators or pathway targets. Currently, only a limited number of miRNAs have been functionally linked to specific signaling pathways. Here, we explored if gene expression signatures could be used to represent miRNA activities and integrated with genomic signatures of oncogenic pathway activity to identify connections between miRNAs and oncogenic pathways on a high-throughput, genome-wide scale. Mapping >300 gene expression signatures to >700 primary tumor profiles, we constructed a genome-wide miRNA–pathway network predicting the associations of 276 human miRNAs to 26 oncogenic pathways. The miRNA–pathway network confirmed a host of previously reported miRNA/pathway associations and uncovered several novel associations that were subsequently experimentally validated. Globally, the miRNA–pathway network demonstrates a small-world, but not scale-free, organization characterized by multiple distinct, tightly knit modules each exhibiting a high density of connections. However, unlike genetic or metabolic networks typified by only a few highly connected nodes (“hubs”), most nodes in the miRNA–pathway network are highly connected. Sequence-based computational analysis confirmed that highly-interconnected miRNAs are likely to be regulated by common pathways to target similar sets of downstream genes, suggesting a pervasive and high level of functional redundancy among coexpressed miRNAs. We conclude that gene expression signatures can be used as surrogates of miRNA activity. Our strategy facilitates the task of discovering novel miRNA–pathway connections, since gene expression data for multiple normal and disease conditions are abundantly available.
Author Summary
MicroRNAs (miRNAs) are naturally occurring small RNA molecules of ∼22 nucleotides that regulate gene expression. Recent studies have shown that miRNAs can behave as important components of cellular signaling pathways, as pathway regulators or pathway targets. Currently however, only a few miRNAs have been functionally linked to specific signaling pathways, raising the need for novel approaches to accelerate the identification of miRNA–pathway connections. Here, we show that gene expression signatures, previously used to reflect patterns of pathway activation, can also be used to represent miRNA activities. Using this approach, we constructed a genome-wide miRNA–pathway network predicting the associations of 276 human miRNAs to 26 oncogenic pathways. The miRNA–pathway network confirmed a host of previously reported miRNA/pathway associations and uncovered several novel associations that were subsequently experimentally validated. Besides being the first study to conceptually demonstrate that expression signatures can act as surrogates of miRNA activity, our study provides a large database of candidate pathway-modulating miRNAs, which researchers interested in a particular pathway (e.g. Ras, Myc) are likely to find useful. Moreover, because this approach solely employs gene expression, it is immediately applicable to the thousands of microarray data sets currently available in the public domain.
PMCID: PMC3240594  PMID: 22194702
19.  Computational identification of microRNA gene loci and precursor microRNA sequences in CHO cell lines 
Journal of Biotechnology  2012;158(3):151-155.
► We mapped all known mature CHO miRNAs to two CHO-K1 reference genomes.
► 212 unique genomic miRNA loci and the respective precursor miRNA sequences were identified.
► The genomic loci of 4 polycistronic miRNA clusters were confirmed by PCR.
► The identified sequences were analyzed for SNPs and conservation compared to mouse.
► Sequence data have been prepared for submission to the miRBase miRNA sequence repository.
MicroRNAs (miRNAs) have recently entered Chinese hamster ovary (CHO) cell culture technology, due to their severe impact on the regulation of cellular phenotypes. Applications of miRNAs that are envisioned range from biomarkers of favorable phenotypes to cell engineering targets. These applications, however, require a profound knowledge of miRNA sequences and their genomic organization, which exceeds the currently available information of ∼400 conserved mature CHO miRNA sequences. Based on these recently published sequences and two independent CHO-K1 genome assemblies, this publication describes the computational identification of CHO miRNA genomic loci. Using BLAST alignment, 415 previously reported CHO miRNAs were mapped to the reference genomes, and subsequently assigned to a distinct genomic miRNA locus. Sequences of the respective precursor-miRNAs were extracted from both reference genomes, folded in silico to verify correct structures and cross-compared. In the end, 212 genomic loci and pre-miRNA sequences representing 319 expressed mature miRNAs (approximately 50% of miRNAs represented matching pairs of 5′ and 3′ miRNAs) were submitted to the miRBase miRNA repository. As a proof-of-principle for the usability of the published genomic loci, four likely polycistronic miRNA clusters were chosen for PCR amplification using CHO-K1 and DHFR (-) genomic DNA. Overall, these data on the genomic context of miRNA expression in CHO will simplify the development of tools employing stable overexpression or deletion of miRNAs, allow the identification of miRNA promoters and improve detection methods such as microarrays.
PMCID: PMC3314935  PMID: 22306111
MicroRNA; microRNA stemloops; Chinese hamster ovary; Cell engineering
20.  Deep sequencing of small RNAs identifies canonical and non-canonical miRNA and endogenous siRNAs in mammalian somatic tissues 
Nucleic Acids Research  2013;41(5):3339-3351.
MicroRNAs (miRNAs) are small RNA molecules that regulate gene expression. They are characterized by specific maturation processes defined by canonical and non-canonical biogenic pathways. Analysis of ∼0.5 billion sequences from mouse data sets derived from different tissues, developmental stages and cell types, partly characterized by either ablation or mutation of the main proteins belonging to miRNA processor complexes, reveals 66 high-confidence new genomic loci coding for miRNAs that could be processed in a canonical or non-canonical manner. A proportion of the newly discovered miRNAs comprises mirtrons, for which we define a new sub-class. Notably, some of these newly discovered miRNAs are generated from untranslated and open reading frames of coding genes, and we experimentally validate these. We also show that many annotated miRNAs do not present miRNA-like features, as they are neither processed by known processing complexes nor loaded on AGO2; this indicates that the current miRNA miRBase database list should be refined and re-defined. Accordingly, a group of them map on ribosomal RNA molecules, whereas others cannot undergo genuine miRNA biogenesis. Notably, a group of annotated miRNAs are Dgcr8 independent and DICER dependent endogenous small interfering RNAs that derive from a unique hairpin formed from a short interspersed nuclear element.
PMCID: PMC3597668  PMID: 23325850
21.  Cross-Mapping Events in miRNAs Reveal Potential miRNA-Mimics and Evolutionary Implications 
PLoS ONE  2011;6(5):e20517.
MicroRNAs (miRNAs) have important roles in various biological processes. miRNA cross-mapping is a prevalent phenomenon where a miRNA sequence originating from one genomic region is mapped to another location. To have a better understanding of this phenomenon in the human genome, we performed a detailed analysis in this paper using public miRNA high-throughput sequencing data and all known human miRNAs. We observed widespread cross-mapping events between miRNA precursors (pre-miRNAs), other non-coding RNAs (ncRNAs) and the opposite strands of pre-miRNAs by analyzing the high-throughput sequencing data. Computational analysis on all known human miRNAs also confirmed that many of them could be involved in cross-mapping events. The processing or decay of both ncRNAs and pre-miRNA opposite strand transcripts may contribute to miRNA enrichment, although some might be miRNA-mimics due to miRNA mis-annotation. Compared with canonical miRNAs, miRNAs involved in cross-mapping events between pre-miRNAs and other ncRNAs normally had shorter lengths (17–19 nt), lower prediction scores and were classified as pseudo miRNA precursors. Notably, 4.9% of all human miRNAs could be accurately mapped to the opposite strands of pre-miRNAs, which showed that both strands of the same genomic region had the potential to produce mature miRNAs and simultaneously implied some potential miRNA precursors. We propose that the cross-mapping events are more complex than previously thought. Sequence similarity between other ncRNAs and pre-miRNAs and the specific stem-loop structures of pre-miRNAs may provide evolutionary implications.
PMCID: PMC3102724  PMID: 21637827
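The idea of a mature miRNA mapping to the opposite strand of a pre-miRNA can be illustrated with a minimal sketch. It assumes exact matching on RNA sequences; the function names and the toy sequences are hypothetical and are not taken from the article, whose analysis used high-throughput sequencing data and may have tolerated mismatches.

```python
def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_on_opposite_strand(mirna: str, pre_mirna: str) -> int:
    """Return the 0-based position of an exact match of `mirna` on the
    strand opposite to `pre_mirna`, or -1 if there is none.

    Assumption: exact matching only; this is an illustration, not the
    mapping procedure used in the study.
    """
    opposite = reverse_complement_rna(pre_mirna)
    return opposite.find(mirna.upper())


# Toy sequences, invented for illustration.
toy_precursor = "GGAUCCAAUGC"
toy_mirna = "CAUUGGA"
position = find_on_opposite_strand(toy_mirna, toy_precursor)
```

Here the toy miRNA is absent from the precursor strand itself but present on its reverse complement, which is the situation the abstract describes as mapping to the opposite strand.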
22.  Identification and Classification of Conserved RNA Secondary Structures in the Human Genome 
PLoS Computational Biology  2006;2(4):e33.
The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
Structurally functional RNA is a versatile component of the cell that comprises both independent molecules and regulatory elements of mRNA transcripts. The many recent discoveries of functional RNAs, most notably miRNAs, suggest that many more are yet to be found. Computational identification of functional RNAs has traditionally been hampered by the lack of strong sequence signals. However, structural conservation over long evolutionary times creates a characteristic substitution pattern, which can be exploited with the advent of comparative genomics. The authors have devised a method for identification of functional RNA structures based on phylogenetic analysis of multiple alignments. This method has been used to screen the regions of the human genome that are under strong selective constraints. The result is a set of 48,479 candidate RNA structures. For some classes of known functional RNAs, such as miRNAs and histone 3′UTR stem loops, this set includes nearly all deeply conserved members. The initial large candidate set has been partitioned by size, shape, and genomic location and ranked by score to produce specific lists of top candidates for miRNAs, selenocysteine insertion sites, RNA editing hairpins, and RNAs involved in transcript auto regulation.
PMCID: PMC1440920  PMID: 16628248
23.  Identification and characteristics of microRNAs from Bombyx mori 
BMC Genomics  2008;9:248.
MicroRNAs (miRNAs) are small RNA molecules that regulate gene expression by targeting messenger RNAs (mRNAs) and causing mRNA cleavage or translation blockage. Of the 355 Arthropod miRNAs that have been identified, only 21 are B. mori miRNAs that were predicted computationally; of these, only let-7 has been confirmed by Northern blotting.
Combining a computational method based on sequence homology searches with experimental identification based on microarray assays and Northern blotting, we identified 46 miRNAs, an additional 21 plausible miRNAs, and a novel small RNA in B. mori. The latter, bmo-miR-100-like, was identified using the known miRNA aga-miR-100 as a probe; bmo-miR-100-like was detected by microarray assay and Northern blotting, but its precursor sequences did not fold into a hairpin structure. Among these identified miRNAs, we found 12 pairs of miRNAs and miRNA*s. Northern blotting revealed that some B. mori miRNA genes were expressed only during specific stages, indicating that B. mori miRNA genes (e.g., bmo-miR-277) have developmentally regulated patterns of expression. We identified two miRNA gene clusters in the B. mori genome. bmo-miR-2b, which is found in the gene cluster bmo-miR-2a-1/bmo-miR-2a-1*/bmo-miR-2a-2/bmo-miR-2b/bmo-miR-13a*/bmo-miR-13b, encodes a newly identified member of the mir-2 family. Moreover, we found that methylation can increase the sensitivity of a DNA probe used to detect a miRNA by Northern blotting. Functional analysis revealed that 11 miRNAs may regulate 13 B. mori orthologs of the 25 known Drosophila miRNA-targeted genes according to the functional conservation. We predicted the binding sites on the 1671 3'UTR of B. mori genes; 547 targeted genes, including 986 target sites, were predicted. Of these target sites, 338 had perfect base pairing to the seed region of 43 miRNAs. From the predicted genes, 61 genes, each of them with multiple predicted target sites, should be considered excellent candidates for future functional studies. Biological classification of predicted miRNA targets showed that "binding", "catalytic activity" and "physiological process" were over-represented for the predicted genes.
Combining computational predictions with microarray assays, we identified 46 B. mori miRNAs, 13 of which were miRNA*s. We identified a novel small RNA and 21 plausible B. mori miRNAs that could not be located in the available B. mori genome, but which could be detected by microarray. Thirteen and 547 target genes were predicted according to the functional conservation and binding sites, respectively. Identification of miRNAs in B. mori, particularly those that are developmentally regulated, provides a foundation for subsequent functional studies.
PMCID: PMC2435238  PMID: 18507836
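The notion of a target site with perfect base pairing to the miRNA seed region can be sketched as follows. This is a minimal illustration, assuming the commonly used seed definition of miRNA nucleotides 2 to 8 and exact Watson-Crick pairing; the function names and the toy 3'UTR are hypothetical, and the article's own prediction pipeline is not described in enough detail here to reproduce.

```python
from typing import List


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def perfect_seed_sites(mirna: str, utr: str,
                       seed_start: int = 2, seed_end: int = 8) -> List[int]:
    """Return 0-based positions in `utr` that pair perfectly with the seed.

    Assumption: the seed is miRNA positions seed_start..seed_end (1-based,
    inclusive); G:U wobble pairs are not counted as matches.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)  # what the mRNA must contain
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# let-7a-5p sequence; the 3'UTR fragment is a toy example.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAAA"
sites = perfect_seed_sites(let7a, toy_utr)
```

Counting such sites per gene, across all miRNAs, would give the kind of tally the abstract reports, although real predictors typically add further filters such as conservation or binding energy.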
24.  Characterization of microRNAs in Mud Crab Scylla paramamosain under Vibrio parahaemolyticus Infection 
PLoS ONE  2013;8(8):e73392.
Infection by the bacterium Vibrio parahaemolyticus is common in mud crab farms. However, the mechanisms of the crab’s response to pathogenic V. parahaemolyticus infection are not fully understood. MicroRNAs (miRNAs) are a class of small noncoding RNAs that function as regulators of gene expression and play essential roles in various biological processes. To understand the underlying mechanisms of the molecular immune response of the crab to the pathogens, high-throughput Illumina/Solexa deep sequencing technology was used to investigate the expression profiles of miRNAs in S. paramamosain under V. parahaemolyticus infection.
Methodology/Principal Findings
Two mixed RNA pools of 7 tissues (intestine, heart, liver, gill, brain, muscle and blood) were obtained from V. parahaemolyticus infected crabs and the control groups, respectively. By aligning the sequencing data with known miRNAs, we characterized 421 miRNA families, and 133 conserved miRNA families in mud crab S. paramamosain were either identical or very similar to existing miRNAs in miRBase. Stem-loop qRT-PCRs were used to scan the expression levels of four randomly chosen differentially expressed miRNAs and tissue distribution. Eight novel potential miRNAs were confirmed by qRT-PCR analysis and the precursors of these novel miRNAs were verified by PCR amplification, cloning and sequencing in S. paramamosain. In total, 161 miRNAs (106 up-regulated and 55 down-regulated) were significantly differentially expressed during the challenge and the potential targets of these differentially expressed miRNAs were predicted. Furthermore, we demonstrated evolutionary conservation of mud crab miRNAs in the animal evolution process.
In this study, a large number of miRNAs were identified in S. paramamosain when challenged with V. parahaemolyticus, some of which were differentially expressed. The results show that miRNAs might play some important roles in regulating gene expression in mud crab under V. parahaemolyticus infection, providing a basis for further investigation of miRNA-modulating networks in innate immunity of mud crab.
PMCID: PMC3758354  PMID: 24023678
25.  Identification and characterization of new miRNAs cloned from normal mouse mammary gland 
BMC Genomics  2009;10:149.
MicroRNAs (miRNAs) are small non-coding RNAs that have been found to play important roles in silencing target genes and that are involved in the regulation of various normal cellular processes. Until now, their involvement in mammary gland biology has been suggested by a few studies, mainly focusing on pathological situations, that allowed the characterization of miRNAs as markers of breast cancer tumour classes. In the normal mammary gland, the expression of known miRNAs has been studied in human and mice, but the full repertoire of miRNAs expressed in this tissue is not yet available.
To extend the repertoire of miRNAs expressed in the mouse mammary gland, we constructed several libraries of small RNAs, allowing the cloning of 455 sequences. After bioinformatic analysis, 3 known miRNAs (present in miRBase) and 33 new miRNAs were identified. Expression of 24 out of the 33 has been confirmed by RT-PCR. Expression of none of them was found to be mammary specific, despite a tissue-restricted distribution of some of them. No correlation could be established between their expression pattern and evolutionary conservation. Six of them appear to be mouse specific. In several cases, multiple potential precursors of a miRNA were present in the genome, and we have developed a strategy to determine which of them was able to yield the mature miRNA.
The cloning approach has extended the repertoire of miRNAs in the mammary gland, an evolutionarily recent organ. This tissue is a good candidate to find tissue-specific miRNAs and to detect miRNAs specific to mammals. We provide evidence for 24 new miRNAs. Although none of them is mammary gland specific, a few of them are not ubiquitously expressed. For the first time, 6 mouse-specific miRNAs have been identified.
PMCID: PMC2683868  PMID: 19351399
