Related Articles

1.  Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences 
BMC Research Notes  2014;7:286.
MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons.
The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which represents a high performance index.
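The six reported figures all derive from a single confusion matrix, so they can be checked against one another. The sketch below is illustrative only: the function name `binary_metrics` and the counts (100 real and 200 pseudo hairpins) are assumptions chosen to be consistent with the percentages above, since the actual test-set sizes are not stated here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    sensitivity = tp / (tp + fn)   # fraction of real hairpins recovered
    specificity = tn / (tn + fp)   # fraction of pseudo hairpins rejected
    precision = tp / (tp + fp)     # fraction of positive calls that are real
    npv = tn / (tn + fn)           # fraction of negative calls that are pseudo
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
    }


# Hypothetical counts (not from the paper): 100 real and 200 pseudo hairpins.
metrics = binary_metrics(tp=74, fp=6, tn=194, fn=26)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the values round to the reported 89.3%, 82.2%, 74%, 97%, 92.5% and 88.2%, which is consistent with a test set containing roughly twice as many pseudo sequences as real ones.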
The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred.
The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of high-potential miRNA hairpin candidates for follow-up by cloning. Among the ensemble predictions are pre-miRNA candidates that have been validated using miRBase although they were not recognized by some of the base classifiers.
PMCID: PMC4051165  PMID: 24884968
MicroRNA; Support vector machine; Random forests; MiRNA hairpin prediction; Neural network
2.  Identification of Schistosoma mansoni microRNAs 
BMC Genomics  2011;12:47.
MicroRNAs (miRNAs) constitute a class of single-stranded RNAs which play a crucial role in regulating development and controlling gene expression by targeting mRNAs and triggering either translation repression or messenger RNA (mRNA) degradation. miRNAs are widespread in eukaryotes and to date over 14,000 miRNAs have been identified by computational and experimental approaches. Several miRNAs are highly conserved across species. In Schistosoma, the full set of miRNAs and their expression patterns during development remain poorly understood. Here we report on the development and implementation of a homology-based detection strategy to search for miRNA genes in Schistosoma mansoni. In addition, we report results on the experimental detection of miRNAs by means of cDNA cloning and sequencing of size-fractionated RNA samples.
Homology search using the high-throughput pipeline was performed with all known miRNAs in miRBase. A total of 6,211 mature miRNAs were used as reference sequences and 110 unique S. mansoni sequences were returned by BLASTn analysis. The existing mature miRNAs that produced these hits are reported, as well as the locations of the homologous sequences in the S. mansoni genome. All BLAST hits aligned with at least 95% of the miRNA sequence, resulting in alignment lengths of 19-24 nt. Following several filtering steps, 15 potential miRNA candidates were identified using this approach. By sequencing small RNA cDNA libraries from adult worm pairs, we identified 211 novel miRNA candidates in the S. mansoni genome. Northern blot analysis was used to detect the expression of the 30 most frequently sequenced miRNAs and to compare the expression level of these miRNAs between the lung stage schistosomula and adult worm stages. Expression of 11 novel miRNAs was confirmed by northern blot analysis and some presented a stage-regulated expression pattern. Three miRNAs previously identified from S. japonicum were also present in S. mansoni.
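The 95% alignment criterion can be read as a simple coverage filter over BLAST hits. The following is a minimal sketch under that reading; the record layout (`query`, `query_len`, `align_len`), the function name and the example hits are hypothetical, not taken from the authors' pipeline.

```python
def filter_by_coverage(hits, min_coverage=0.95):
    """Keep hits whose alignment spans at least min_coverage of the query miRNA."""
    kept = []
    for hit in hits:
        coverage = hit["align_len"] / hit["query_len"]
        if coverage >= min_coverage:
            kept.append(hit)
    return kept


# Hypothetical hits: a 22 nt query aligned over 22, 21 and 18 nt.
example_hits = [
    {"query": "query_1", "query_len": 22, "align_len": 22},  # 100% coverage
    {"query": "query_2", "query_len": 22, "align_len": 21},  # about 95.5%
    {"query": "query_3", "query_len": 22, "align_len": 18},  # about 81.8%
]

passing = filter_by_coverage(example_hits)
print([hit["query"] for hit in passing])
```

For a 22 nt mature miRNA, this threshold tolerates at most one unaligned nucleotide, which fits the reported alignment lengths of 19-24 nt.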
Evidence for the presence of miRNAs in S. mansoni is presented. The number of miRNAs detected by homology-based computational methods in S. mansoni is limited due to the lack of close relatives in the miRNA repository. In spite of this, the computational approach described here can likely be applied to the identification of pre-miRNA hairpins in other organisms. Construction and analysis of a small RNA library led to the experimental identification of 14 novel miRNAs from S. mansoni through a combination of molecular cloning, DNA sequencing and expression studies. Our results significantly expand the set of known miRNAs in multicellular parasites and provide a basis for understanding the structural and functional evolution of miRNAs in these metazoan parasites.
PMCID: PMC3034697  PMID: 21247453
3.  MicroRNA discovery by similarity search to a database of RNA-seq profiles 
Frontiers in Genetics  2013;4:133.
In silico search for microRNAs (miRNAs) has been driven by methods compiling structural features of the miRNA precursor hairpin, as well as to some degree combining this with the analysis of RNA-seq profiles, in which miRNAs typically leave the Drosha/Dicer fingerprint of 1–2 ~22 nt blocks of reads corresponding to the mature and star miRNA. In complement to the previous methods, we present a study where we systematically exploit these patterns of read profiles. We created two datasets comprising 2540 and 4795 read profiles obtained after preprocessing short RNA-seq data from miRBase and ENCODE, respectively. Out of 4795 ENCODE read profiles, 1361 are annotated as non-coding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), we align ncRNA read profiles from ENCODE against the miRBase read profiles (cleaned for “self-matches”) and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews Correlation Coefficient (MCC) of 0.8 and obtain an area under the curve of 0.93. Based on the dba score cut-off of 0.7, at which we observed the maximum MCC of 0.8, we predict 523 novel miRNA candidates. An additional RNA secondary structure analysis reveals that 42 of the candidates overlap with predicted conserved secondary structure. Further analysis reveals that the 523 miRNA candidates are located in genomic regions with MAF block (UCSC) fragmentation and poor sequence conservation, which in part might explain why they have been overlooked in previous efforts. We further analyzed known human and mouse miRNA read profiles and found two distinct classes: the first containing two blocks and the second containing >2 blocks of reads. The latter class also holds read profiles that have a less well defined arrangement of reads in comparison to the former class.
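Choosing a score cut-off by maximizing the MCC can be sketched as follows. This is an illustration of the general technique, not the authors' code: the function names, the toy scores and labels, and the candidate cut-offs are all assumptions.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def best_cutoff(scores, labels, cutoffs):
    """Return (cutoff, mcc) for the cutoff with the highest MCC.

    A profile is called a miRNA when its score is >= cutoff.
    """
    best = None
    for cutoff in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        value = mcc(tp, fp, tn, fn)
        if best is None or value > best[1]:
            best = (cutoff, value)
    return best


# Toy alignment scores with labels (1 = annotated miRNA, 0 = other ncRNA).
toy_scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
chosen = best_cutoff(toy_scores, toy_labels, cutoffs=[0.3, 0.5, 0.7, 0.85])
print(chosen)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it a reasonable criterion when miRNAs are a minority of the profiles, as with the 285 miRNAs among 1361 ncRNAs here.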
On comparison of miRNA read profiles from plants and animals, we observed kingdom specific read profiles that are distinct in terms of both length and distribution of reads within the read profiles to each other. All the data, as well as a server to search miRBase read profiles by uploading a BED file, is available at
PMCID: PMC3708161  PMID: 23874353
microRNA; miRNA read profiles; RNA-seq; alignment; deepBlockAlign; read profiles
4.  Discovery of Novel MicroRNAs in Rat Kidney Using Next Generation Sequencing and Microarray Validation 
PLoS ONE  2012;7(3):e34394.
MicroRNAs (miRNAs) are small non-coding RNAs that regulate a variety of biological processes. The latest version of the miRBase database (Release 18) includes 1,157 mouse and 680 rat mature miRNAs. Only one new rat mature miRNA was added to the rat miRNA database from version 16 to version 18 of miRBase, suggesting that many rat miRNAs remain to be discovered. Given the importance of rat as a model organism, discovery of the complete set of rat miRNAs is necessary for understanding rat miRNA regulation. In this study, next generation sequencing (NGS), microarray analysis and bioinformatics technologies were applied to discover novel miRNAs in rat kidneys. MiRanalyzer was utilized to analyze the sequences of the small RNAs generated from NGS analysis of rat kidney samples. Hundreds of novel miRNA candidates were examined according to the mappings of their reads to the rat genome, presence of sequences that can form a miRNA hairpin structure around the mapped locations, Dicer cleavage patterns, and the levels of their expression determined by both NGS and microarray analyses. Nine novel rat hairpin precursor miRNAs (pre-miRNA) were discovered with high confidence. Five of the novel pre-miRNAs are also reported in other species while four of them are rat specific. In summary, 9 novel pre-miRNAs (14 novel mature miRNAs) were identified via a combination of NGS, microarray and bioinformatics high-throughput technologies.
PMCID: PMC3314633  PMID: 22470567
5.  Computational and in vitro Investigation of miRNA-Gene Regulations in Retinoblastoma Pathogenesis: miRNA Mimics Strategy 
Retinoblastoma (RB), a primary pediatric intraocular tumor, arises from primitive retinal layers. Several novel molecular strategies are being developed for the clinical management of RB. miRNAs are known to regulate cancer-relevant biological processes. Here, the role of selected miRNAs, namely, miR-532-5p and miR-486-3p, has been analyzed for potential therapeutic targeting in RB.
A comprehensive bioinformatic analysis was performed to predict the posttranscriptional regulators (miRNAs) of the select panel of genes [Group 1: oncogenes (HMGA2, MYCN, SYK, FASN); Group 2: cancer stem cell markers (TACSTD, ABCG2, CD133, CD44, CD24) and Group 3: cell cycle regulatory proteins (p53, MDM2)] using Microcosm, DIANALAB, miRBase v 18, and REFSEQ database, and RNA hybrid. The expressions of five miRNAs, namely, miR-146b-5p, miR-532-5p, miR-142-5p, miR-328, and miR-486-3p, were analyzed by qRT–PCR on primary RB tumor samples (n = 30; including 17 invasive RB tumors and 13 noninvasive RB tumors). Detailed complementary alignment between 5’ seed sequence of differentially expressed miRNAs and the sequence of target genes was determined. Based on minimum energy level and piCTAR scores, the gene targets were selected. Functional roles of these miRNA clusters were studied by using mimics in cultured RB (Y79, Weri Rb-1) cells in vitro. The gene targets (SYK and FASN) of the studied miRNAs were confirmed by qRT-PCR and western blot analysis. Cell proliferation and apoptotic studies were performed.
Nearly 1948 miRNAs were identified in the in silico analysis. From this list, only 9 upregulated miRNAs (miR-146b-5p, miR-305, miR-663b, miR-299, miR-532-5p, miR-892b, miR-501, miR-142-5p, and miR-513b) and 10 downregulated miRNAs (miR-1254, miR-328, miR-133a, miR-1287, miR-1299, miR-375, miR-486-3p, miR-720, miR-98, and miR-122*) were found to be common with the RB serum miRNA profile. Downregulation of five miRNAs (miR-146b-5p, miR-532-5p, miR-142-5p, miR-328, and miR-486-3p) was confirmed experimentally. Predicted common oncogene targets (SYK and FASN) of miR-486-3p and miR-532-5p were evaluated for their mRNA and protein expression in these miRNA mimic-treated RB cells. Experimental overexpression of these miRNAs mediated apoptotic cell death without significantly altering the cell cycle in RB cells.
Key miRNAs in RB pathogenesis were identified by an in silico approach. Downregulation of miR-486-3p and miR-532-5p in primary retinoblastoma tissues implicates their role in tumorigenesis. Prognostic and therapeutic potential of these miRNAs was established by the miRNA mimic strategy.
PMCID: PMC4429751  PMID: 25983556
bio-informatics analysis; miRNA-mRNA; mimics; retinoblastoma
6.  MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features 
BMC Bioinformatics  2010;11(Suppl 11):S11.
MicroRNAs (simply miRNAs) are derived from larger hairpin RNA precursors and play essential regulatory roles in both animals and plants. A number of computational methods for miRNA gene finding have been proposed in the past decade, yet the problem is far from being solved, especially when considering the imbalance between known miRNAs and unidentified miRNAs, and the pre-miRNAs with multi-loops or higher minimum free energy (MFE). This paper presents a new computational approach, miRenSVM, for finding miRNA genes. Aiming at better prediction performance, an ensemble support vector machine (SVM) classifier is established to deal with the imbalance issue, and multi-loop features are included for identifying those pre-miRNAs with multi-loops.
We collected a representative dataset, which contains 697 real miRNA precursors identified by experimental procedure and other computational methods, and 5428 pseudo ones from several datasets. Experiments showed that our miRenSVM achieved a 96.5% specificity and a 93.05% sensitivity on the dataset. Compared with the state-of-the-art approaches, miRenSVM obtained better prediction results. We also applied our method to predict 14 Homo sapiens pre-miRNAs and 13 Anopheles gambiae pre-miRNAs that first appeared in miRBase13.0; miRenSVM achieved a 100% prediction rate. Furthermore, performance evaluation was conducted over 27 additional species in miRBase13.0, and 92.84% (4863/5238) animal pre-miRNAs were correctly identified by miRenSVM.
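One common way for an ensemble to handle this kind of imbalance (697 real versus 5428 pseudo precursors, roughly 1:8) is to split the majority class into subsets, train one classifier per balanced subset, and combine their votes. The sketch below shows only the partitioning and voting steps and is not necessarily the scheme miRenSVM uses; the function names and the choice of eight subsets are assumptions, and the classifier training itself is omitted.

```python
def balanced_subsets(positives, negatives, n_subsets):
    """Pair all positives with one interleaved slice of the negatives per subset."""
    chunks = [negatives[i::n_subsets] for i in range(n_subsets)]
    return [(positives, chunk) for chunk in chunks]


def majority_vote(votes):
    """Combine binary votes (1 = real pre-miRNA); ties count as negative."""
    return int(sum(votes) * 2 > len(votes))


# Placeholder IDs standing in for the 697 real and 5428 pseudo hairpins.
real_ids = list(range(697))
pseudo_ids = list(range(5428))
subsets = balanced_subsets(real_ids, pseudo_ids, n_subsets=8)
print([len(neg) for _, neg in subsets])
print(majority_vote([1, 1, 0]))
```

Each subset then has close to a 1:1 class ratio, so no single sub-classifier is dominated by the pseudo hairpins.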
MiRenSVM is an ensemble support vector machine (SVM) classification system for better detecting miRNA genes, especially those with multi-loop secondary structure.
PMCID: PMC3024864  PMID: 21172046
7.  Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 
BMC Bioinformatics  2005;6:310.
MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.
A set of novel features of local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs and pseudo pre-miRNAs. Support vector machine (SVM) is applied on these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and virus, without utilizing any comparative genomics information.
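Local structure-sequence features of this kind can be sketched as "triplet" counts: the paired or unpaired state of three adjacent positions combined with the middle nucleotide, giving 4 x 8 = 32 features. The code below is a simplified illustration under that assumption. It takes a dot-bracket structure as given (for example from an RNA folding program), counts over every interior position, and uses a hypothetical function name; the published method may differ in which positions it counts and how it normalizes.

```python
from itertools import product


def triplet_features(seq, struct):
    """Return normalized frequencies of the 32 structure-sequence triplets.

    seq    : RNA sequence over A, C, G, U
    struct : dot-bracket string of the same length
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    paired = struct.replace(")", "(")  # treat both bracket types as 'paired'
    keys = [nt + "".join(t) for nt in "ACGU" for t in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]  # middle nucleotide + local structure
        if key in counts:                   # skips non-ACGU symbols
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in keys}
    return {k: v / total for k, v in counts.items()}


# Toy hairpin: a 3 bp stem closing a 3 nt loop.
features = triplet_features("GGGAAACCC", "(((...)))")
print({k: round(v, 3) for k, v in features.items() if v > 0})
```

The resulting 32-dimensional vector can be passed to any standard classifier, which is how an SVM can separate real from pseudo hairpins without comparative genomics information.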
The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.
PMCID: PMC1360673  PMID: 16381612
8.  High-throughput mRNA and miRNA profiling of epithelial-mesenchymal transition in MDCK cells 
BMC Genomics  2015;16:944.
Epithelial-mesenchymal transition (EMT) is an important process in embryonic development, especially during gastrulation and organ formation. Furthermore EMT is widely observed in pathological conditions, e.g., fibrosis, tumor progression and metastasis. Madin-Darby Canine Kidney (MDCK) cells are widely used for studies of EMT and epithelial plasticity. MDCK cells show an epithelial phenotype, while oncogenic Ras-transformed MDCK (MDCK-Ras) cells undergo EMT and show a mesenchymal phenotype.
RNA-Seq and miRNA-Seq analyses were performed on MDCK and MDCK-Ras cells. Data were validated by qRT-PCR. Gene signature analyses were carried out to identify pathways and gene ontology terms. For selected miRNAs target prediction was performed.
With RNA-Seq, mRNAs of approximately half of the genes known for dog were detected. These were screened for differential regulation during Ras-induced EMT. We went further and performed gene signature analyses and found Gene Ontology (GO) terms and pathways important for epithelial polarity and implicated in EMT. Among the identified pathways, TGFβ1 emerged as a central signaling factor in many EMT related pathways and biological processes. With miRNA-Seq, approximately half of the known canine miRNAs were found expressed in MDCK and MDCK-Ras cells. Furthermore, among differentially expressed miRNAs, miRNAs that are known to be important regulators of EMT were detected and new candidates were predicted. New dog miRNAs were discovered after aligning our reads to those of other species in miRBase. Importantly, we could identify 25 completely novel miRNAs with a stable hairpin structure. Two of these novel miRNAs were differentially expressed. We validated the two novel miRNAs with the highest read counts by RT-qPCR. Target prediction of a particular novel miRNA highly expressed in mesenchymal MDCK-Ras cells revealed that it targets components of epithelial cell junctional complexes. Combining target prediction for the most upregulated miRNAs and validation of the targets in MDCK-Ras cells with pathway analysis allowed us to identify two novel pathways, namely JAK/STAT signaling and pancreatic cancer pathways. These pathways could not be detected solely by gene set enrichment analyses of RNA-Seq data.
With deep sequencing data of mRNAs and miRNAs of MDCK cells and of Ras-induced EMT in MDCK cells, differentially regulated mRNAs and miRNAs are identified. Many of the identified genes are within pathways known to be involved in EMT. Novel differentially upregulated genes in MDCK cells are interferon stimulated genes and genes involved in Slit and Netrin signaling. New pathways not yet linked to these processes were identified. A central pathway in Ras induced EMT is TGFβ signaling, which leads to differential regulation of many target genes, including miRNAs. With miRNA-Seq we identified miRNAs involved in either epithelial cell biology or EMT. Finally, we describe completely novel miRNAs and their target genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2036-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4647640  PMID: 26572553
MDCK; Epithelial-mesenchymal transition; Ras; Next generation sequencing; Transcriptome; miRNAome
9.  A framework for automated enrichment of functionally significant inverted repeats in whole genomes 
BMC Bioinformatics  2010;11(Suppl 6):S20.
RNA transcripts from genomic sequences showing dyad symmetry typically adopt hairpin-like, cloverleaf, or similar structures that act as recognition sites for proteins. Such structures often are the precursors of non-coding RNA (ncRNA) sequences like microRNA (miRNA) and small-interfering RNA (siRNA) that have recently garnered more functional significance than in the past. Genomic DNA contains hundreds of thousands of such inverted repeats (IRs) with varying degrees of symmetry. But by collecting statistically significant information from a known set of ncRNA, we can sort these IRs into those that are likely to be functional.
A novel method was developed to scan genomic DNA for partially symmetric inverted repeats and the resulting set was further refined to match miRNA precursors (pre-miRNA) with respect to their density of symmetry, statistical probability of the symmetry, length of stems in the predicted hairpin secondary structure, and the GC content of the stems. This method was applied on the Arabidopsis thaliana genome and validated against the set of 190 known Arabidopsis pre-miRNA in the miRBase database. A preliminary scan for IRs identified 186 of the known pre-miRNA but with 714700 pre-miRNA candidates. This large number of IRs was further refined to 483908 candidates with 183 pre-miRNA identified and further still to 165371 candidates with 171 pre-miRNA identified (i.e. with 90% of the known pre-miRNA retained).
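A scan for partially symmetric inverted repeats can be sketched as a search for a left arm whose reverse complement reappears, with a bounded number of mismatches, after a short loop. This is a deliberately naive illustration rather than the authors' method (their software is described as written in Perl, and their symmetry-density and probability filters are not reproduced here); the function names, arm length, loop range and mismatch limit are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string over A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna):
    """Fraction of G and C bases, e.g. for filtering candidate stems."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def find_inverted_repeats(dna, arm=6, min_loop=3, max_loop=8, max_mismatch=1):
    """Return (start, loop_length, mismatches) for each candidate repeat."""
    hits = []
    for start in range(len(dna) - 2 * arm - min_loop + 1):
        target = reverse_complement(dna[start:start + arm])
        for loop in range(min_loop, max_loop + 1):
            right_start = start + arm + loop
            right = dna[right_start:right_start + arm]
            if len(right) < arm:
                break  # right arm would run past the end of the sequence
            mismatches = sum(a != b for a, b in zip(target, right))
            if mismatches <= max_mismatch:
                hits.append((start, loop, mismatches))
    return hits


# Toy sequence: arm GGACTC, a 4 nt loop, then its reverse complement GAGTCC.
toy_dna = "AA" + "GGACTC" + "TTTT" + "GAGTCC" + "AA"
exact_hits = find_inverted_repeats(toy_dna, max_mismatch=0)
print(exact_hits)
print(round(gc_content("GGACTC"), 3))
```

Loosening `max_mismatch` or widening the loop range increases recall at the cost of many more candidates, which mirrors the trade-off reported above between retained known pre-miRNAs and total candidate count.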
A set of 165371 candidates for potentially functional miRNA is still too large to warrant wet lab analyses, such as northern blotting, on all of them. Hence additional filters are needed to further refine the number of candidates while still retaining most of the known miRNA. These include detection of promoters and terminators, homology analyses, location of candidates relative to coding regions, and better secondary structure prediction algorithms. The software developed is designed to easily accommodate such additional filters with minimal experience in Perl.
PMCID: PMC3026368  PMID: 20946604
10.  miROrtho: computational survey of microRNA genes 
Nucleic Acids Research  2008;37(Database issue):D111-D117.
MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ∼22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.
PMCID: PMC2686488  PMID: 18927110
11.  MicroRNA profiling of the whitefly Bemisia tabaci Middle East-Aisa Minor I following the acquisition of Tomato yellow leaf curl China virus 
Virology Journal  2016;13:20.
The begomoviruses are the largest and most economically important group of plant viruses exclusively vectored by whitefly (Bemisia tabaci) in a circulative, persistent manner. During this process, begomoviruses and whitefly vectors have developed close relationships and complex interactions. However, the molecular mechanisms underlying these interactions remain largely unknown, and the microRNA profiles for viruliferous and nonviruliferous whiteflies have not been studied.
Sequences of Argonaute 1 (Ago1) and Dicer 1 (Dcr1) genes were cloned from B. tabaci MEAM1 cDNAs. Subsequently, deep sequencing of small RNA libraries from uninfected and Tomato yellow leaf curl China virus (TYLCCNV)-infected whiteflies was performed. The conserved and novel miRNAs were identified using the release of miRBase Version 19.0 and the prediction software miRDeep2, respectively. The sequencing results of selected deregulated and novel miRNAs were further confirmed using quantitative reverse transcription-PCR. Moreover, the previously published B. tabaci MEAM1 transcriptome database and the miRNA target prediction algorithm miRanda 3.1 were utilized to predict potential targets for miRNAs. Gene Ontology (GO) analysis was also used to classify the potential enriched functional groups of their putative targets.
Ago1 and Dcr1 orthologs with conserved domains were identified from B. tabaci MEAM1. BLASTn searches and sequence analysis identified 112 and 136 conserved miRNAs from nonviruliferous and viruliferous whitefly libraries respectively, and a comparison of the conserved miRNAs of viruliferous and nonviruliferous whiteflies revealed 15 up- and 9 down-regulated conserved miRNAs. Seven novel miRNA candidates with secondary pre-miRNA hairpin structures were also identified. Potential targets of conserved and novel miRNAs were predicted using GO analysis; for the targets of up- and down-regulated miRNAs, eight and nine GO terms were significantly enriched, respectively.
We identified Ago1 and Dcr1 orthologs from whiteflies, which indicated that miRNA-mediated silencing is present in whiteflies. Our comparative analysis of miRNAs from TYLCCNV viruliferous and nonviruliferous whiteflies revealed the relevance of deregulated miRNAs for the post-transcriptional gene regulation in these whiteflies. The potential targets of all expressed miRNAs were also predicted. These results will help to acquire a better understanding of the molecular mechanism underlying the complex interactions between begomoviruses and whiteflies.
Electronic supplementary material
The online version of this article (doi:10.1186/s12985-016-0469-7) contains supplementary material, which is available to authorized users.
PMCID: PMC4736103  PMID: 26837429
Tomato yellow leaf curl China virus; Whitefly Bemisia tabaci; Gene silencing machinery; Differentially regulated miRNA profiling
12.  miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes 
Nucleic Acids Research  2005;34(Database issue):D135-D139.
Recent work has demonstrated that microRNAs (miRNAs) are involved in critical biological processes by suppressing the translation of coding genes. This work develops an integrated database, miRNAMap, to store the known miRNA genes, the putative miRNA genes, the known miRNA targets and the putative miRNA targets. The known miRNA genes in four mammalian genomes (human, mouse, rat and dog) are obtained from miRBase, and experimentally validated miRNA targets are identified in a survey of the literature. Putative miRNA precursors were identified by RNAz, which is a non-coding RNA prediction tool based on comparative sequence analysis. The mature miRNA of the putative miRNA genes is accurately determined using a machine learning approach, mmiRNA. Then, miRanda was applied to predict the miRNA targets within the conserved regions in 3′-UTR of the genes in the four mammalian genomes. The miRNAMap also provides the expression profiles of the known miRNAs, cross-species comparisons, gene annotations and cross-links to other biological databases. Both textual and graphical web interfaces are provided to facilitate the retrieval of data from the miRNAMap. The database is freely available at .
PMCID: PMC1347497  PMID: 16381831
13.  Systematic Curation of miRBase Annotation Using Integrated Small RNA High-Throughput Sequencing Data for C. elegans and Drosophila 
MicroRNAs (miRNAs) are a class of 20–23 nucleotide small RNAs that regulate gene expression post-transcriptionally in animals and plants. Annotation of miRNAs by the miRNA database (miRBase) has largely relied on computational approaches. As a result, many miRBase entries lack experimental validation, and discrepancies between miRBase annotation and actual miRNA sequences are often observed. In this study, we integrated the small RNA sequencing (smRNA-seq) datasets in Caenorhabditis elegans and Drosophila melanogaster and devised an analytical pipeline coupled with detailed manual inspection to curate miRNA annotation systematically in miRBase. Our analysis reveals that 19 (17.0%) and 51 (31.3%) miRNA entries with detectable smRNA-seq reads have mature sequence discrepancies in C. elegans and D. melanogaster, respectively. These discrepancies frequently occur either for conserved miRNA families whose mature sequences were predicted according to their homologous counterparts in other species or for miRNAs whose precursor miRNA (pre-miRNA) hairpins produce an abundance of multiple miRNA isoforms or variants. Our analysis shows that while Drosophila pre-miRNAs, on average, produce less than 60% accurate mature miRNA reads in addition to their 5′ and 3′ variant isoforms, the precision of miRNA processing in C. elegans is much higher, at over 90%. Based on the revised miRNA sequences, we analyzed expression patterns of the more conserved (MC) and less conserved (LC) miRNAs and found that, whereas MC miRNAs are often co-expressed at multiple developmental stages, LC miRNAs tend to be expressed specifically at fewer stages.
PMCID: PMC3268580  PMID: 22303321
microRNA; deep sequencing; database curation
14.  MiRduplexSVM: A High-Performing MiRNA-Duplex Prediction and Evaluation Methodology 
PLoS ONE  2015;10(5):e0126151.
We address the problem of predicting the position of a miRNA duplex on a microRNA hairpin via the development and application of a novel SVM-based methodology. Our method combines a unique problem representation and an unbiased optimization protocol to learn from miRBase 19.0 an accurate predictive model, termed MiRduplexSVM. This is the first model that provides precise information about all four ends of the miRNA duplex. We show that (a) our method outperforms four state-of-the-art tools, namely MaturePred, MiRPara, MatureBayes and MiRdup, as well as a Simple Geometric Locator, when applied on the same training datasets employed for each tool and evaluated on a common blind test set; (b) in all comparisons, MiRduplexSVM shows superior performance, achieving up to a 60% increase in prediction accuracy for mammalian hairpins, and can generalize very well on plant hairpins without any special optimization; (c) the tool has a number of important applications, such as the ability to accurately predict the miRNA or the miRNA*, given the opposite strand of a duplex. Its performance on this task is superior to the 2-nt overhang rule commonly used in computational studies and similar to that of a comparative genomic approach, without the need for prior knowledge or the complexity of performing multiple alignments. Finally, it is able to evaluate novel, potential miRNAs found either computationally or experimentally. In relation to recent confidence evaluation methods used in miRBase, MiRduplexSVM was successful in identifying high confidence potential miRNAs.
PMCID: PMC4427487  PMID: 25961860
15.  Mammalian MicroRNA Prediction through a Support Vector Machine Model of Sequence and Structure 
PLoS ONE  2007;2(9):e946.
MicroRNAs (miRNAs) are endogenous small noncoding RNA gene products, on average 22 nt long, found in a wide variety of organisms. They play important regulatory roles by targeting mRNAs for degradation or translational repression. There are 377 known mouse miRNAs and 475 known human miRNAs in the May 2007 release of the miRBase database, the majority of which are conserved between the two species. A number of recent reports imply that it is likely that many mammalian miRNAs remain to be discovered. The possibility that there are more of them expressed at lower levels or in more specialized expression contexts calls for the exploitation of genome sequence information to accelerate their discovery.
Methodology/Principal Findings
In this article, we describe a computational method, mirCoS, that uses three support vector machine models sequentially to discover new miRNA candidates in mammalian genomes based on sequence, secondary structure, and conservation. mirCoS can efficiently detect the majority of known miRNAs and predicts an extensive set of hairpin structures based on human-mouse comparisons. In total, 3476 mouse candidates and 3441 human candidates were found. These hairpins are more similar to known miRNAs than to negative controls in several aspects not considered by the prediction algorithm. A significant fraction of predictions is supported by existing expression evidence.
Using a novel approach, mirCoS performs comparably to or better than existing miRNA prediction methods, and contributes a significant number of new candidate miRNAs for experimental verification.
PMCID: PMC1978525  PMID: 17895987
16.  Complexity of Murine Cardiomyocyte miRNA Biogenesis, Sequence Variant Expression and Function 
PLoS ONE  2012;7(2):e30933.
microRNAs (miRNAs) are critical to heart development and disease. Emerging research indicates that regulated precursor processing can give rise to an unexpected diversity of miRNA variants. We subjected small RNA from murine HL-1 cardiomyocyte cells to next generation sequencing to investigate the relevance of such diversity to cardiac biology. ∼40 million tags were mapped to known miRNA hairpin sequences as deposited in miRBase version 16, calling 403 generic miRNAs as appreciably expressed. Hairpin arm bias broadly agreed with miRBase annotation, although 44 miR* were unexpectedly abundant (>20% of tags); conversely, 33 -5p/-3p annotated hairpins were asymmetrically expressed. Overall, variability was infrequent at the 5′ start but common at the 3′ end of miRNAs (5.2% and 52.3% of tags, respectively). Nevertheless, 105 miRNAs showed marked 5′ isomiR expression (>20% of tags). Among these was miR-133a, a miRNA with important cardiac functions, and we demonstrated differential mRNA targeting by two of its prevalent 5′ isomiRs. Analyses of miRNA termini and base-pairing patterns around Drosha and Dicer cleavage regions confirmed the known bias towards uridine at the 5′ most position of miRNAs, as well as supporting the thermodynamic asymmetry rule for miRNA strand selection and a role for local structural distortions in fine tuning miRNA processing. We further recorded appreciable expression of 5 novel miR*, 38 extreme variants and 8 antisense miRNAs. Analysis of genome-mapped tags revealed 147 novel candidate miRNAs. In summary, we revealed pronounced sequence diversity among cardiomyocyte miRNAs, knowledge of which will underpin future research into the mechanisms involved in miRNA biogenesis and, importantly, cardiac function, disease and therapy.
PMCID: PMC3272019  PMID: 22319597
17.  Pre-microRNA and Mature microRNA in Human Mitochondria 
PLoS ONE  2011;6(5):e20220.
Because of the central functions of the mitochondria in providing metabolic energy and initiating apoptosis on the one hand, and the role that microRNAs (miRNAs) play in gene expression on the other, we hypothesized that some miRNAs could be present in the mitochondria for post-transcriptional regulation by RNA interference. We aimed to identify miRNAs localized in mitochondria isolated from human primary skeletal muscle cells.
Methodology/Principal Findings
To investigate the potential origin of mitochondrial miRNA, we searched in silico for microRNA candidates in the mtDNA. Twenty-five human pre-miRNA and 33 miRNA alignments (E-value<0.1) were found in the reference mitochondrial sequence, and some of the best candidates were chosen for a co-localization test. In situ hybridization of pre-mir-302a, pre-let-7b and mir-365, using specific labelled locked nucleic acids and confocal microscopy, demonstrated that these miRNAs were localized in the mitochondria of human myoblasts. Total RNA was extracted from enriched mitochondria isolated by an immunomagnetic method from a culture of human myotubes. The detection of 742 human miRNAs (miRBase) was monitored by RT-qPCR at three increasing mtRNA inputs. Forty-six miRNAs were significantly expressed (2nd derivative method Cp>35) for the smallest RNA input concentration and 204 miRNAs for the maximum RNA input concentration. In silico analysis predicted 80 putative miRNA target sites in the mitochondrial genome (E-value<0.05).
The present study experimentally demonstrated for the first time the presence of pre-miRNA and miRNA in human mitochondria isolated from skeletal muscle cells. A set of miRNAs was significantly detected in the mitochondrial fraction. The origin of these pre-miRNAs and miRNAs should be further investigated to determine whether they are imported from the cytosol and/or partially processed in the mitochondria.
PMCID: PMC3102686  PMID: 21637849
18.  The small RNA diversity from Medicago truncatula roots under biotic interactions evidences the environmental plasticity of the miRNAome 
Genome Biology  2014;15(9):457.
Legume roots show a remarkable plasticity to adapt their architecture to biotic and abiotic constraints, including symbiotic interactions. However, global analysis of miRNA regulation in roots is limited, and a global view of the evolution of miRNA-mediated diversification in different ecotypes is lacking.
In the model legume Medicago truncatula, we analyze the small RNA transcriptome of roots submitted to symbiotic and pathogenic interactions. Genome mapping and a computational pipeline identify 416 miRNA candidates, including known and novel variants of 78 miRNA families present in miRBase. Stringent criteria of pre-miRNA prediction yield 52 new mtr-miRNAs, including 27 miRtrons. Analyzing miRNA precursor polymorphisms in 26 M. truncatula ecotypes identifies higher sequence polymorphism in conserved rather than Medicago-specific miRNA precursors. An average of 19 targets, mainly involved in environmental responses and signalling, is predicted per novel miRNA. We identify miRNAs responsive to bacterial and fungal pathogens or symbionts as well as their related Nod and Myc-LCO symbiotic signals. Network analyses reveal modules of new and conserved co-expressed miRNAs that regulate distinct sets of targets, highlighting potential miRNA-regulated biological pathways relevant to pathogenic and symbiotic interactions.
We identify 52 novel genuine miRNAs and large plasticity of the root miRNAome in response to the environment, and also in response to purified Myc/Nod signaling molecules. The new miRNAs identified and their sequence variation across M. truncatula ecotypes may be crucial to understand the adaptation of root growth to the soil environment, notably in the agriculturally important legume crops.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0457-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4212123  PMID: 25248950
19.  miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM 
BMC Bioinformatics  2011;12:216.
MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expression. After the identification of thousands of miRNAs, the challenge is now to explore their specific biological functions. To this end, it will be very helpful to construct a reasonable organization of these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately classifying newly found miRNAs into their corresponding families by supervised learning techniques. Concretely, we propose an effective method, miRFam, which uses only primary sequence information of pre-miRNAs or mature miRNAs and a multiclass SVM to automatically classify miRNA genes.
An existing miRNA family system prepared by miRBase was downloaded online. We first employed n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. those whose families are unknown). Compared with miRBase's sequence alignment and manual modification, our study shows that the application of machine learning techniques to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each of which holds no fewer than 5 members), the classification accuracy is around 98%. Even with the entire miRBase15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy reaches 90%.
Based on experimental results, we argue that miRFam is suitable for application as an automated method of family classification, and it is an important supplementary tool to the existing alignment-based small non-coding RNA (sncRNA) classification methods, since it only requires primary sequence information.
The source code of miRFam, written in C++, is freely and publicly available at:
PMCID: PMC3120706  PMID: 21619662
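The n-gram feature extraction step described in the miRFam abstract above can be sketched as follows. This is an illustrative reconstruction, not the authors' C++ implementation: the function name `ngram_features`, the choice of n = 3, the frequency normalization, and the handling of ambiguous bases are all assumptions. The resulting fixed-length vectors are the kind of input a multiclass SVM could be trained on.

```python
from itertools import product

ALPHABET = "ACGU"


def ngram_features(sequence, n=3):
    """Return normalized n-gram frequencies for an RNA sequence.

    The vector has len(ALPHABET) ** n entries in lexicographic order
    (AAA, AAC, ..., UUU for n=3). Windows containing characters outside
    ACGU are skipped; that policy is an assumption of this sketch.
    """
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input too
    vocab = ["".join(p) for p in product(ALPHABET, repeat=n)]
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:
            counts[gram] += 1
            total += 1
    return [counts[g] / total if total else 0.0 for g in vocab]


# Toy sequence, not a real miRBase entry.
features = ngram_features("UGAGGUAGUAGGUUGUAUAGUU", n=3)
print(len(features))
```

Because every sequence maps to a vector of the same length regardless of its own length, no alignment is needed before classification, which matches the abstract's point that only primary sequence information is required.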
20.  Computational prediction of the localization of microRNAs within their pre-miRNA 
Nucleic Acids Research  2013;41(15):7200-7211.
MicroRNAs (miRNAs) are short RNA species derived from hairpin-forming miRNA precursors (pre-miRNA) and acting as key posttranscriptional regulators. Most computational tools labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. Sequence and structural features that determine the location of the miRNA, and the extent to which these properties vary from species to species, are poorly understood. We have developed miRdup, a computational predictor for the identification of the most likely miRNA location within a given pre-miRNA or the validation of a candidate miRNA. MiRdup is based on a random forest classifier trained with experimentally validated miRNAs from miRbase, with features that characterize the miRNA–miRNA* duplex. Because we observed that miRNAs have sequence and structural properties that differ between species, mostly in terms of duplex stability, we trained various clade-specific miRdup models and obtained increased accuracy. MiRdup self-trains on the most recent version of miRbase and is easy to use. Combined with existing pre-miRNA predictors, it will be valuable for both de novo mapping of miRNAs and filtering of large sets of candidate miRNAs obtained from transcriptome sequencing projects. MiRdup is open source under the GPLv3 and available at∼blanchem/mirdup/.
PMCID: PMC3753617  PMID: 23748953
21.  miRNA Digger: a comprehensive pipeline for genome-wide novel miRNA mining 
Scientific Reports  2016;6:18901.
MicroRNAs (miRNAs) are important regulators of gene expression. Recent advances in high-throughput sequencing (HTS) techniques have greatly facilitated large-scale detection of miRNAs. However, thorough discovery of novel miRNAs from the available HTS data sets remains a major challenge. In this study, we observed that Dicer-mediated cleavage sites for the processing of miRNA precursors could be mapped by using degradome sequencing data in both animals and plants. Accordingly, a novel tool, miRNA Digger, was developed for systematic discovery of miRNA candidates through genome-wide screening of cleavage signals based on degradome sequencing data. To test its sensitivity and reliability, miRNA Digger was applied to discover miRNAs from four organs of Arabidopsis. The results revealed that a majority of the already known mature miRNAs, along with their miRNA*s, expressed in these four organs were successfully recovered. Notably, a total of 30 novel miRNA-miRNA* pairs that have not been registered in miRBase were discovered by miRNA Digger. After target prediction and degradome sequencing data-based validation, eleven miRNA–target interactions involving six of the novel miRNAs were identified. Taken together, miRNA Digger could be applied for sensitive detection of novel miRNAs and it could be freely downloaded from
PMCID: PMC4702050  PMID: 26732371
22.  Automatically clustering large-scale miRNA sequences: methods and experiments 
BMC Genomics  2012;13(Suppl 8):S15.
Since the initial annotation of microRNAs (miRNAs) in 2001, many studies have sought to identify additional miRNAs experimentally or computationally in various species. MiRNAs act with the Argonaute family of proteins to regulate target messenger RNAs (mRNAs) post-transcriptionally. Currently, research mainly focuses on the function of single miRNAs. Considering that members of the same miRNA family might participate in the same pathway or regulate the same target(s), and thus share similar biological functions, useful knowledge can be derived from a high-quality miRNA family architecture.
In this article, we developed an unsupervised clustering-based method, miRCluster, to automatically group miRNAs. In order to evaluate this method, several data sets were constructed from the online database miRBase. Results showed that miRCluster can efficiently arrange miRNAs (e.g. it identifies 354 families in miRBase16 with an accuracy of 92.08%, and recognizes 9 of the 10 newly added families in miRBase 17). So far, ~30% of the mature miRNAs registered in miRBase are unclassified. With miRCluster, over 85% of the unclassified miRNAs can be assigned to certain families, while ~44% of these miRNAs are distributed in ~300 novel families.
In short, miRCluster is an automatic and efficient miRNA family identification method, which does not require any prior knowledge. It can be helpful in real use, especially when exploring functions of novel miRNAs. All relevant materials could be freely accessed online (
PMCID: PMC3535721  PMID: 23282099
23.  Deep sequencing of small RNAs identifies canonical and non-canonical miRNA and endogenous siRNAs in mammalian somatic tissues 
Nucleic Acids Research  2013;41(5):3339-3351.
MicroRNAs (miRNAs) are small RNA molecules that regulate gene expression. They are characterized by specific maturation processes defined by canonical and non-canonical biogenic pathways. Analysis of ∼0.5 billion sequences from mouse data sets derived from different tissues, developmental stages and cell types, partly characterized by either ablation or mutation of the main proteins belonging to miRNA processor complexes, reveals 66 high-confidence new genomic loci coding for miRNAs that could be processed in a canonical or non-canonical manner. A proportion of the newly discovered miRNAs comprises mirtrons, for which we define a new sub-class. Notably, some of these newly discovered miRNAs are generated from untranslated and open reading frames of coding genes, and we experimentally validate these. We also show that many annotated miRNAs do not present miRNA-like features, as they are neither processed by known processing complexes nor loaded on AGO2; this indicates that the current miRNA miRBase database list should be refined and re-defined. Accordingly, a group of them map on ribosomal RNA molecules, whereas others cannot undergo genuine miRNA biogenesis. Notably, a group of annotated miRNAs are Dgcr8 independent and DICER dependent endogenous small interfering RNAs that derive from a unique hairpin formed from a short interspersed nuclear element.
PMCID: PMC3597668  PMID: 23325850
24.  Identification and Classification of Conserved RNA Secondary Structures in the Human Genome 
PLoS Computational Biology  2006;2(4):e33.
The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
Structurally functional RNA is a versatile component of the cell that comprises both independent molecules and regulatory elements of mRNA transcripts. The many recent discoveries of functional RNAs, most notably miRNAs, suggest that many more are yet to be found. Computational identification of functional RNAs has traditionally been hampered by the lack of strong sequence signals. However, structural conservation over long evolutionary times creates a characteristic substitution pattern, which can be exploited with the advent of comparative genomics. The authors have devised a method for identification of functional RNA structures based on phylogenetic analysis of multiple alignments. This method has been used to screen the regions of the human genome that are under strong selective constraints. The result is a set of 48,479 candidate RNA structures. For some classes of known functional RNAs, such as miRNAs and histone 3′UTR stem loops, this set includes nearly all deeply conserved members. The initial large candidate set has been partitioned by size, shape, and genomic location and ranked by score to produce specific lists of top candidates for miRNAs, selenocysteine insertion sites, RNA editing hairpins, and RNAs involved in transcript auto regulation.
PMCID: PMC1440920  PMID: 16628248
25.  A Densely Interconnected Genome-Wide Network of MicroRNAs and Oncogenic Pathways Revealed Using Gene Expression Signatures 
PLoS Genetics  2011;7(12):e1002415.
MicroRNAs (miRNAs) are important components of cellular signaling pathways, acting either as pathway regulators or pathway targets. Currently, only a limited number of miRNAs have been functionally linked to specific signaling pathways. Here, we explored if gene expression signatures could be used to represent miRNA activities and integrated with genomic signatures of oncogenic pathway activity to identify connections between miRNAs and oncogenic pathways on a high-throughput, genome-wide scale. Mapping >300 gene expression signatures to >700 primary tumor profiles, we constructed a genome-wide miRNA–pathway network predicting the associations of 276 human miRNAs to 26 oncogenic pathways. The miRNA–pathway network confirmed a host of previously reported miRNA/pathway associations and uncovered several novel associations that were subsequently experimentally validated. Globally, the miRNA–pathway network demonstrates a small-world, but not scale-free, organization characterized by multiple distinct, tightly knit modules each exhibiting a high density of connections. However, unlike genetic or metabolic networks typified by only a few highly connected nodes (“hubs”), most nodes in the miRNA–pathway network are highly connected. Sequence-based computational analysis confirmed that highly-interconnected miRNAs are likely to be regulated by common pathways to target similar sets of downstream genes, suggesting a pervasive and high level of functional redundancy among coexpressed miRNAs. We conclude that gene expression signatures can be used as surrogates of miRNA activity. Our strategy facilitates the task of discovering novel miRNA–pathway connections, since gene expression data for multiple normal and disease conditions are abundantly available.
Author Summary
MicroRNAs (miRNAs) are naturally occurring small RNA molecules of ∼22 nucleotides that regulate gene expression. Recent studies have shown that miRNAs can behave as important components of cellular signaling pathways, as pathway regulators or pathway targets. Currently however, only a few miRNAs have been functionally linked to specific signaling pathways, raising the need for novel approaches to accelerate the identification of miRNA–pathway connections. Here, we show that gene expression signatures, previously used to reflect patterns of pathway activation, can also be used to represent miRNA activities. Using this approach, we constructed a genome-wide miRNA–pathway network predicting the associations of 276 human miRNAs to 26 oncogenic pathways. The miRNA–pathway network confirmed a host of previously reported miRNA/pathway associations and uncovered several novel associations that were subsequently experimentally validated. Besides being the first study to conceptually demonstrate that expression signatures can act as surrogates of miRNA activity, our study provides a large database of candidate pathway-modulating miRNAs, which researchers interested in a particular pathway (e.g. Ras, Myc) are likely to find useful. Moreover, because this approach solely employs gene expression, it is immediately applicable to the thousands of microarray data sets currently available in the public domain.
PMCID: PMC3240594  PMID: 22194702
