Results 1-19 (19)
1.  Integrated analysis of microRNA and mRNA expression and association with HIF binding reveals the complexity of microRNA expression regulation under hypoxia 
Molecular Cancer  2014;13:28.
Background
In mammals, HIF is a master regulator of hypoxic gene expression through direct binding to DNA, while its genome-wide role in regulating microRNA expression, which is critical in the hypoxia response, has not been elucidated. Our aim is to investigate in depth the regulation of microRNA expression by hypoxia in the breast cancer cell line MCF-7, and to establish the relationship between microRNA expression and HIF binding sites, pri-miRNA transcription and the expression of microRNA processing genes.
Methods
MCF-7 cells were incubated at 1% oxygen for 16, 32 and 48 h. siRNA knockdown of HIF-1α and HIF-2α was performed as previously published. MicroRNA and mRNA expression were assessed using microRNA microarrays, small RNA sequencing, gene expression microarrays and real-time PCR. The Kraken pipeline was applied for microRNA-seq analysis along with Bioconductor packages. Microarray data were analysed using Limma (Bioconductor), ChIP-seq data were analysed using Gene Set Enrichment Analysis, and multiple testing correction was applied in all analyses.
Results
Analysis of the hypoxia time-course microRNA sequencing data identified 41 microRNAs significantly up-regulated and 28 down-regulated, including hsa-miR-4521, hsa-miR-145-3p and hsa-miR-222-5p, reported in conjunction with hypoxia for the first time. Integration of HIF-1α and HIF-2α ChIP-seq data with expression data showed an overall association between binding sites and microRNA up-regulation, with hsa-miR-210-3p and microRNAs of the miR-27a/23a/24-2 and miR-30b/30d clusters as predominant examples. Moreover, the expression of hsa-miR-27a-3p and hsa-miR-24-3p was positively associated with a hypoxia gene signature in breast cancer. Gene expression analysis showed incomplete coordination between pri-miRNA and microRNA expression, pointing towards additional levels of regulation. Several transcripts involved in microRNA processing were regulated by hypoxia, of which DICER (down-regulated) and AGO4 (up-regulated) were HIF dependent. DICER expression was inversely correlated with hypoxia in breast cancer.
Conclusions
Integrated analysis of microRNA, mRNA and ChIP-seq data in a model cell line supports the hypothesis that microRNA expression under hypoxia is regulated at the transcriptional and post-transcriptional levels, with the presence of HIF binding sites at microRNA genomic loci associated with up-regulation. The identification of hypoxia- and HIF-regulated microRNAs relevant to breast cancer is important for our understanding of disease development and for the design of therapeutic interventions.
doi:10.1186/1476-4598-13-28
PMCID: PMC3928101  PMID: 24517586
MicroRNA; Hypoxia; HIF; Transcription factor; Gene regulation
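The Methods state that multiple testing correction was applied in all analyses but do not name the procedure. As a hedged illustration, the sketch below implements the Benjamini-Hochberg false discovery rate adjustment, a common default in Limma-based workflows; that this was the correction actually used is an assumption, and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four microRNAs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Note that the second and third microRNAs receive the same adjusted value: the monotonicity step prevents a smaller raw p-value from ending up with a larger adjusted one.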
2.  Differential microRNA Profiles and Their Functional Implications in Different Immunogenetic Subsets of Chronic Lymphocytic Leukemia 
Molecular Medicine  2013;19(1):115-123.
Critical processes of B-cell physiology, including immune signaling through the B-cell receptor (BcR) and/or Toll-like receptors (TLRs), are targeted by microRNAs. With this in mind, and given the important role of BcR and TLR signaling and of microRNAs in chronic lymphocytic leukemia (CLL), we investigated whether microRNAs could be implicated in shaping the behavior of CLL clones with distinct BcR and TLR molecular and functional profiles. To this end, we examined 79 CLL cases for the expression of 33 microRNAs, selected on the following criteria: (a) deregulated in CLL versus normal B cells; (b) differentially expressed in CLL subgroups with distinct clinicobiological features; and (c) for microRNAs meeting both (a) and (b), having predicted targets in the immune signaling pathways. Significant upregulation of miR-150, miR-29c, miR-143 and miR-223 and downregulation of miR-15a was found in mutated versus unmutated CLL, with miR-15a showing the highest fold difference. Comparison of two major subsets with distinct stereotyped BcRs and signaling signatures, namely subset 1 [IGHV1/5/7-IGKV1(D)-39, unmutated, bad prognosis] versus subset 4 [IGHV4-34/IGKV2-30, mutated, good prognosis], revealed differences in the expression of miR-150, miR-29b, miR-29c and miR-101, all downregulated in subset 1. We were also able to link these distinct microRNA profiles with cellular phenotypes, importantly showing that, in subset 1, miR-101 downregulation is associated with overexpression of the enhancer of zeste homolog 2 (EZH2) protein, which has been associated with clinical aggressiveness in other B-cell lymphomas. In conclusion, specific miRNAs differentially expressed among CLL subgroups with distinct BcR and/or TLR signaling may modulate the biological and clinical behavior of the CLL clones.
doi:10.2119/molmed.2013.00005
PMCID: PMC3667214  PMID: 23615967
3.  DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs 
Nucleic Acids Research  2012;41(Database issue):D239-D245.
Recently, the attention of the research community has focused on long non-coding RNAs (lncRNAs) and their physiological/pathological implications. As the number of experiments increases at a rapid rate and transcriptional units are better annotated, databases indexing lncRNA properties and function are becoming essential tools in this process. The aim of DIANA-LncBase (www.microrna.gr/LncBase) is to support researchers’ efforts to unravel putative functional microRNA (miRNA)–lncRNA interactions. This study provides, for the first time, a comprehensive annotation of miRNA targets on lncRNAs. DIANA-LncBase hosts transcriptome-wide experimentally verified and computationally predicted miRNA recognition elements (MREs) on human and mouse lncRNAs. The analysis integrates most of the available lncRNA resources, relevant high-throughput HITS-CLIP and PAR-CLIP experimental data, and state-of-the-art in silico target predictions. The experimentally supported entries in DIANA-LncBase correspond to >5000 interactions, while the computationally predicted interactions exceed 10 million. DIANA-LncBase hosts detailed information for each miRNA–lncRNA pair, such as external links, graphic plots of the transcripts’ genomic locations, representations of the binding sites, lncRNA tissue expression, and MRE conservation and prediction scores.
doi:10.1093/nar/gks1246
PMCID: PMC3531175  PMID: 23193281
4.  miRNA Regulons Associated with Synaptic Function 
PLoS ONE  2012;7(10):e46189.
Differential RNA localization and local protein synthesis regulate synapse function and plasticity in neurons. MicroRNAs are a conserved class of regulatory RNAs that control mRNA stability and translation in tissues. They are abundant in the brain, but the extent to which they are involved in synaptic mRNA regulation is poorly understood. Herein, a computational analysis of the coding and 3′UTR regions of 242 presynaptic and 304 postsynaptic proteins revealed that 91% of them are predicted to be microRNA targets. Analysis of the longest 3′UTR isoform of synaptic transcripts showed that presynaptic mRNAs have significantly longer 3′UTRs than control and postsynaptic mRNAs. In contrast, the shortest 3′UTR isoform of postsynaptic mRNAs is significantly shorter than those of control and presynaptic mRNAs, suggesting that postsynaptic mRNAs may escape microRNA regulation under specific conditions. Examination of microRNA binding site density in synaptic 3′UTRs revealed that they are twice as dense in sites as the 3′UTRs of other protein-coding transcripts and that approximately 50% of synaptic transcripts are predicted to have sites for more than five different microRNAs. An interaction map exploring the association of microRNAs and their targets revealed that a small set of ten microRNAs is predicted to regulate 77% and 80% of presynaptic and postsynaptic transcripts, respectively. Intriguingly, many of these microRNAs have yet to be identified outside primates, potentially implicating them in the cognitive differences observed between higher primates and non-primate mammals. Importantly, the identified miRNAs have previously been associated with psychotic disorders that are characterized by neural circuitry dysfunction, such as schizophrenia. Finally, molecular dissection of their KEGG pathways showed enrichment for neuronal and synaptic processes.
Adding to current knowledge, this investigation revealed the extent of miRNA regulation at the synapse and predicted critical microRNAs that may aid future research on the control of neuronal plasticity and the etiology of psychiatric diseases.
doi:10.1371/journal.pone.0046189
PMCID: PMC3468272  PMID: 23071543
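The binding-site density comparison above can be illustrated with a minimal sketch that counts canonical 7mer seed matches (sites complementary to miRNA nucleotides 2-8) per kilobase of 3′UTR. This is an assumption for illustration: the study's own site definitions were likely more elaborate, and the miRNA and UTR sequences below are hypothetical.

```python
from typing import List


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, G<->C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def seed_site(mirna: str) -> str:
    """7mer mRNA site complementary to miRNA nucleotides 2-8 (1-based)."""
    return reverse_complement_rna(mirna[1:8])


def count_sites(utr: str, site: str) -> int:
    """Count occurrences of site in an uppercase RNA utr, allowing overlaps."""
    count = 0
    start = utr.find(site)
    while start != -1:
        count += 1
        start = utr.find(site, start + 1)
    return count


def site_density_per_kb(utr: str, mirnas: List[str]) -> float:
    """Seed-match sites for all given miRNAs per kilobase of 3'UTR."""
    total = sum(count_sites(utr, seed_site(m)) for m in mirnas)
    return total / (len(utr) / 1000.0)


# Hypothetical miRNA and 3'UTR fragment, for illustration only.
mirna = "UCAGGCAUACGAUCGAUCGAUA"
utr = "AAAA" + seed_site(mirna) + "CCCC" + seed_site(mirna)
print(seed_site(mirna), count_sites(utr, seed_site(mirna)))  # AUGCCUG 2
```

Normalizing by length matters here because the abstract also reports that presynaptic 3′UTRs are longer; raw site counts alone would conflate length with density.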
5.  Use of Mutagenesis, Genetic Mapping and Next Generation Transcriptomics to Investigate Insecticide Resistance Mechanisms 
PLoS ONE  2012;7(6):e40296.
Insecticide resistance is a worldwide problem with major impact on agriculture and human health. Understanding the underlying molecular mechanisms is crucial for managing the phenomenon; however, this information often comes late with respect to the implementation of efficient counter-measures, particularly in the case of metabolism-based resistance mechanisms. We employed a genome-wide insertional mutagenesis screen in Drosophila melanogaster, using a Minos-based construct, and retrieved a line (MiT[w−]3R2) resistant to the neonicotinoid insecticide Imidacloprid. Biochemical and bioassay data indicated that resistance was due to increased P450 detoxification. Deep sequencing transcriptomic analysis revealed substantial over- and under-representation of 357 transcripts in the resistant line, including statistically significant changes in mixed function oxidases, peptidases and cuticular proteins. Three P450 genes (Cyp4p2, Cyp6a2 and Cyp6g1) located on chromosome arm 2R are highly up-regulated in mutant flies compared to susceptible Drosophila. One of them (Cyp6g1) has already been described as a major factor in Imidacloprid resistance, which validated the approach. Elevated expression of Cyp4p2 had not previously been documented in Drosophila lines resistant to neonicotinoids. In silico analysis using the Drosophila reference genome failed to detect transcription factor binding sites or microRNAs associated with the over-expressed Cyp genes. The resistant line did not contain a Minos insertion in its chromosomes, suggesting a hit-and-run event, i.e. an insertion of the transposable element followed by an excision that caused the mutation. Genetic mapping placed the resistance locus on the right arm of the second chromosome, within a ∼1 Mb region where the highly up-regulated Cyp6g1 gene is located. The nature of the unknown mutation that causes resistance is discussed on the basis of these results.
doi:10.1371/journal.pone.0040296
PMCID: PMC3386967  PMID: 22768270
6.  DIANA miRPath v.2.0: investigating the combinatorial effect of microRNAs in pathways 
Nucleic Acids Research  2012;40(Web Server issue):W498-W504.
MicroRNAs (miRNAs) are key regulators of diverse biological processes, and their functional analysis has been deemed central in many research pipelines. The new version of the DIANA-miRPath web server was redesigned from the ground up. Users of DIANA-miRPath v2.0 (DIANA: DNA Intelligent Analysis) can now use miRNA targets predicted with high accuracy by DIANA-microT-CDS and/or experimentally verified targets from TarBase v6; combine results with merging and meta-analysis algorithms; perform hierarchical clustering of miRNAs and pathways based on their interaction levels; and generate sophisticated visualizations, such as dendrograms or miRNA versus pathway heat maps, from an intuitive and easy-to-use web interface. New modules enable the DIANA-miRPath server to provide information regarding pathogenic single nucleotide polymorphisms (SNPs) in miRNA target sites (SNPs module) or to annotate all the predicted and experimentally validated miRNA targets in a selected molecular pathway (Reverse Search module). DIANA-miRPath v2.0 is an efficient yet easy-to-use tool that can be incorporated successfully into miRNA-related analysis pipelines. It provides, for the first time, a series of highly specific tools for miRNA-targeted pathway analysis via a web interface and can be accessed at http://www.microrna.gr/miRPathv2.
doi:10.1093/nar/gks494
PMCID: PMC3394305  PMID: 22649059
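The abstract mentions combining per-miRNA pathway results with "merging and meta-analysis algorithms" without naming them. As one hedged example of such a combination, the sketch below applies Fisher's combined probability test to pathway enrichment p-values obtained for several miRNAs; this is a generic textbook method chosen for illustration, not necessarily what DIANA-miRPath implements, and the p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees
    of freedom under the joint null. For even degrees of freedom its upper
    tail has the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!, built incrementally
        series += term
    return min(1.0, math.exp(-half_stat) * series)


# Hypothetical enrichment p-values of one pathway for two miRNAs.
print(fisher_combined_pvalue([0.05, 0.05]))
```

Two individually modest p-values combine into a smaller joint p-value (about 0.017 here), which reflects the idea behind studying the combinatorial effect of several miRNAs on one pathway.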
8.  In vivo profiling of hypoxic gene expression in gliomas using the hypoxia marker EF5 and laser-capture microdissection 
Cancer research  2011;71(3):779-789.
Hypoxia is a key determinant of tumor aggressiveness, yet little is known regarding global hypoxic gene regulation in vivo. We have employed the hypoxia marker EF5 coupled with laser capture microdissection to isolate RNA from viable hypoxic and normoxic regions of 9L experimental gliomas. Through microarray analysis, we have identified several mRNAs (including the HIF targets Vegf, Glut-1 and Hsp27) with increased levels under hypoxia compared to normoxia both in vitro and in vivo. However, we also found striking differences between the global in vitro and in vivo hypoxic mRNA profiles. Intriguingly, the mRNA levels of a substantial number of immunomodulatory and DNA repair proteins, including CXCL9, CD3D and RAD51, were found to be downregulated in hypoxic areas in vivo, consistent with a pro-tumorigenic role of hypoxia in solid tumors. Immunohistochemical staining verified increased HSP27 and decreased RAD51 protein levels in hypoxic vs. normoxic tumor regions. Moreover, CD8+ T cells, which are recruited to tumors upon stimulation by CXCL9 and CXCL10, were largely excluded from viable hypoxic areas in vivo. This is the first study to analyze the influence of hypoxia on mRNA levels in vivo, and the approach can be readily adapted to obtain a comprehensive picture of hypoxic regulation of gene expression and its influence on biological functions in solid tumors.
doi:10.1158/0008-5472.CAN-10-3061
PMCID: PMC3071295  PMID: 21266355
mRNA; 9L rat glioma; Rad51; HSP27
9.  Accurate microRNA Target Prediction Using Detailed Binding Site Accessibility and Machine Learning on Proteomics Data 
Frontiers in Genetics  2012;2:103.
MicroRNAs (miRNAs) are a class of small regulatory genes that regulate gene expression by targeting messenger RNA. Although computational methods for miRNA target prediction are the prevailing means of analyzing their function, they still miss a large fraction of the targeted genes and additionally predict a large number of false positives. Here we introduce a novel algorithm called DIANA-microT-ANN, which combines multiple novel target site features through an artificial neural network (ANN) and is trained using recently published high-throughput data measuring the change in protein levels after miRNA overexpression, providing positive and negative targeting examples. The features characterizing each miRNA recognition element include binding structure, conservation level, and a specific profile of structural accessibility. The ANN is trained to integrate the features of each recognition element along the 3′ untranslated region into a targeting score, reproducing the relative repression fold change of the protein. Tested on two different sets, the algorithm outperforms other widely used algorithms and also predicts a significant number of unique and reliable targets not predicted by the other methods. For 542 human miRNAs, DIANA-microT-ANN predicts 120 000 targets not provided by TargetScan 5.0. The algorithm is freely available at http://microrna.gr/microT-ANN.
doi:10.3389/fgene.2011.00103
PMCID: PMC3265086  PMID: 22303397
microRNAs; target prediction; binding site structure
10.  TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support 
Nucleic Acids Research  2011;40(Database issue):D222-D229.
As the relevant literature and the number of experiments increase at a superlinear rate, databases that curate and collect experimentally verified microRNA (miRNA) targets have gradually emerged. These databases attempt to provide efficient access to this wealth of experimental data, which is scattered across thousands of manuscripts. The aim of TarBase 6.0 (http://www.microrna.gr/tarbase) is to address this challenge by providing a significant increase in available miRNA targets derived from all contemporary experimental techniques (gene specific and high-throughput), while incorporating a powerful set of tools in a user-friendly interface. TarBase 6.0 hosts detailed information for each miRNA–gene interaction, ranging from miRNA- and gene-related facts to information specific to their interaction, the experimental validation methodologies and their outcomes. All database entries are enriched with function-related data, as well as general information derived from external databases such as UniProt, Ensembl and RefSeq. DIANA microT miRNA target prediction scores and the relevant prediction details are available for each interaction. TarBase 6.0 hosts the largest collection of manually curated experimentally validated miRNA–gene interactions (more than 65 000 targets), presenting a 16.5–175-fold increase over other available manually curated databases.
doi:10.1093/nar/gkr1161
PMCID: PMC3245116  PMID: 22135297
11.  DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association 
Nucleic Acids Research  2011;39(Web Server issue):W145-W148.
microRNAs (miRNAs) are small endogenous RNA molecules that are implicated in many biological processes through post-transcriptional regulation of gene expression. The DIANA-microT Web server provides a user-friendly interface for comprehensive computational analysis of miRNA targets in human and mouse. The server has now been extended to support predictions for two widely studied species: Drosophila melanogaster and Caenorhabditis elegans. In the updated version, the Web server enables the association of miRNAs with diseases through bibliographic analysis and provides insights into the potential involvement of miRNAs in biological processes. The nomenclature used to describe mature miRNAs across different miRBase versions has been extensively analyzed, and the naming history of each miRNA has been extracted. This enables the identification of miRNA publications regardless of possible nomenclature changes. User interaction has been further refined, allowing users to save results that they wish to analyze further. A connection to the UCSC genome browser is now provided, enabling users to easily preview predicted binding sites in comparison to a wide array of genomic tracks, such as single nucleotide polymorphisms. The Web server is publicly accessible at www.microrna.gr/microT-v4.
doi:10.1093/nar/gkr294
PMCID: PMC3125744  PMID: 21551220
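The naming-history feature above amounts to resolving any current or former miRNA name to one identifier and then searching the literature under every name in its history. A minimal sketch of that lookup follows; the data layout and the miRNA names are hypothetical placeholders, not actual miRBase records or the server's internal design.

```python
from typing import Dict, List, Set


def build_alias_index(histories: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each current or former name to its current (canonical) name."""
    index: Dict[str, str] = {}
    for current, former_names in histories.items():
        for name in [current, *former_names]:
            previous = index.get(name)
            if previous is not None and previous != current:
                # A name reused for two different miRNAs cannot be resolved
                # by a simple lookup; flag it instead of guessing.
                raise ValueError(f"ambiguous name {name!r}")
            index[name] = current
    return index


def search_terms(name: str, histories: Dict[str, List[str]],
                 index: Dict[str, str]) -> Set[str]:
    """All names under which publications about this miRNA may appear."""
    current = index.get(name)
    if current is None:
        return {name}  # unknown name: search for it as given
    return {current, *histories[current]}


# Hypothetical naming histories (placeholders, not real miRBase records).
histories = {"miR-X-5p": ["miR-X", "miR-X-a"], "miR-Y-3p": ["miR-Y*"]}
index = build_alias_index(histories)
print(sorted(search_terms("miR-X", histories, index)))
```

Raising on reused names is a deliberate design choice in this sketch: silently picking one canonical name would attach publications to the wrong miRNA.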
12.  Redirection of Silencing Targets by Adenosine-to-Inosine Editing of miRNAs 
Science (New York, N.Y.)  2007;315(5815):1137-1140.
Primary transcripts of certain microRNA (miRNA) genes are subject to RNA editing that converts adenosine to inosine. However, the importance of miRNA editing remains largely undetermined. Here we report that tissue-specific adenosine-to-inosine editing of miR-376 cluster transcripts leads to predominant expression of edited miR-376 isoform RNAs. One highly edited site is positioned in the middle of the 5′-proximal half “seed” region critical for the hybridization of miRNAs to targets. We provide evidence that the edited miR-376 RNA specifically silences a different set of genes. Repression of phosphoribosyl pyrophosphate synthetase 1, a target of the edited miR-376 RNA and an enzyme involved in the uric-acid synthesis pathway, contributes to tight and tissue-specific regulation of uric-acid levels, revealing a previously unknown role for RNA editing in miRNA-mediated gene silencing.
doi:10.1126/science.1138050
PMCID: PMC2953418  PMID: 17322061
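Why a single edit in the seed redirects silencing can be shown with a short sketch: inosine base-pairs like guanosine (preferentially with C), so treating the edited base as G and recomputing the 7mer site complementary to nucleotides 2-8 yields a different target sequence. The miRNA sequence and edit position below are hypothetical, not the actual miR-376 sequence or its edited site.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, G<->C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def seed_site(mirna: str) -> str:
    """7mer mRNA site complementary to miRNA nucleotides 2-8 (1-based)."""
    return reverse_complement_rna(mirna[1:8])


def a_to_i_edit(mirna: str, position: int) -> str:
    """Edit the adenosine at a 1-based position to inosine, written as G
    because inosine base-pairs like guanosine."""
    idx = position - 1
    if mirna[idx] != "A":
        raise ValueError(f"position {position} is not an adenosine")
    return mirna[:idx] + "G" + mirna[idx + 1:]


# Hypothetical miRNA; position 7 lies inside the seed (nucleotides 2-8).
unedited = "UCAGGCAUACGAUCGAUCGAUA"
edited = a_to_i_edit(unedited, 7)
print(seed_site(unedited), seed_site(edited))  # AUGCCUG ACGCCUG
```

Because the two sites differ at one position, mRNAs carrying only the original site lose a perfect seed match while a different set of mRNAs gains one, which is the qualitative effect the abstract describes.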
13.  The DIANA-mirExTra Web Server: From Gene Expression Data to MicroRNA Function 
PLoS ONE  2010;5(2):e9171.
Background
High-throughput gene expression experiments are widely used to identify the role of genes involved in biological conditions of interest. MicroRNAs (miRNAs) are regulatory molecules that have been functionally associated with several developmental programs, and their deregulation with diverse diseases including cancer.
Methodology/Principal Findings
Although miRNA expression levels may not be routinely measured in high-throughput experiments, a possible involvement of miRNAs in the deregulation of gene expression can be computationally predicted and quantified through analysis of overrepresented motifs in the 3′ untranslated region (3′UTR) sequences of the deregulated genes. Here, we introduce a user-friendly web server, DIANA-mirExTra (www.microrna.gr/mirextra), that allows the comparison of frequencies of miRNA-associated motifs between sets of genes, which can lead to the identification of miRNAs responsible for the deregulation of large numbers of genes. To this end, we have investigated different approaches and measures, and have implemented them in practice on experimental data.
Conclusions/Significance
On several datasets from miRNA overexpression and repression experiments, our proposed approaches successfully identified the deregulated miRNA. Beyond the prediction of miRNAs responsible for the deregulation of transcripts, the web server provides extensive links to DIANA-mirPath, a functional analysis tool incorporating miRNA targets in biological pathways. Additionally, if information about miRNA expression changes is provided, the results can be filtered to display the analysis for miRNAs of interest only.
doi:10.1371/journal.pone.0009171
PMCID: PMC2820085  PMID: 20161787
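The motif comparison described above can be sketched as an overrepresentation test: among all genes, some carry a given miRNA's seed motif in their 3′UTR, and one asks whether the deregulated genes contain more motif carriers than expected by chance. The one-sided hypergeometric test below is an assumption chosen for illustration (the abstract says several approaches and measures were investigated without naming them), and the counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, total_genes: int, motif_genes: int,
                 deregulated: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: `deregulated` genes drawn without
    replacement from `total_genes`, of which `motif_genes` carry the motif."""
    lower = max(k, 0)
    upper = min(motif_genes, deregulated)
    hits = sum(
        comb(motif_genes, i) * comb(total_genes - motif_genes, deregulated - i)
        for i in range(lower, upper + 1)
    )
    return hits / comb(total_genes, deregulated)


# Hypothetical counts: 10 genes, 4 carry the seed motif, 3 are deregulated,
# and all 3 deregulated genes carry the motif.
print(hypergeom_sf(3, 10, 4, 3))
```

In practice one such p-value would be computed per miRNA motif, so a multiple testing correction would be needed before ranking miRNAs as candidate regulators.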
14.  miRGen 2.0: a database of microRNA genomic information and regulation 
Nucleic Acids Research  2009;38(Database issue):D137-D141.
MicroRNAs are small, non-protein-coding RNA molecules known to regulate the expression of genes by binding to the 3′UTRs of mRNAs. MicroRNAs are produced from longer transcripts, which can code for more than one mature miRNA. miRGen 2.0 is a database that aims to provide comprehensive information about the position of human and mouse microRNA coding transcripts and their regulation by transcription factors, including a unique compilation of both predicted and experimentally supported data. Expression profiles of microRNAs in several tissues and cell lines, single nucleotide polymorphism locations, microRNA target prediction on protein coding genes and mapping of miRNA targets of co-regulated miRNAs on biological pathways are also integrated into the database and user interface. The miRGen database will be continuously maintained and freely available at http://www.microrna.gr/mirgen/.
doi:10.1093/nar/gkp888
PMCID: PMC2808909  PMID: 19850714
15.  Accurate microRNA target prediction correlates with protein repression levels 
BMC Bioinformatics  2009;10:295.
Background
MicroRNAs are small endogenously expressed non-coding RNA molecules that regulate target gene expression through translation repression or messenger RNA degradation. MicroRNA regulation is performed through pairing of the microRNA to sites in the messenger RNA of protein coding genes. Since experimental identification of miRNA target genes poses difficulties, computational microRNA target prediction is one of the key means in deciphering the role of microRNAs in development and disease.
Results
DIANA-microT 3.0 is an algorithm for microRNA target prediction that is based on several parameters calculated individually for each microRNA and that combines conserved and non-conserved microRNA recognition elements into a final prediction score, which correlates with protein production fold change. Specifically, for each predicted interaction the program reports a signal-to-noise ratio and a precision score, which can be used as an indication of the false positive rate of the prediction.
Conclusion
Recently, several computational target prediction programs were benchmarked based on a set of microRNA target genes identified by the pSILAC method. In this assessment, DIANA-microT 3.0 was found to achieve the highest precision among the most widely used microRNA target prediction programs, reaching approximately 66%. The DIANA-microT 3.0 prediction results are available online in a user-friendly web server at
doi:10.1186/1471-2105-10-295
PMCID: PMC2752464  PMID: 19765283
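The precision figure in the benchmark above is the fraction of predicted targets that the proteomics data confirm as repressed. A minimal sketch of that calculation follows; the repression threshold on log2 protein fold change, the gene names and the values are hypothetical assumptions, not the benchmark's actual settings.

```python
import math
from typing import Dict, Iterable


def target_precision(predicted: Iterable[str], log2_fc: Dict[str, float],
                     threshold: float = -0.1) -> float:
    """Fraction of predicted targets whose measured protein log2 fold change
    is below `threshold` (counted as repressed). Predictions without a
    measurement are ignored, since they cannot be scored either way."""
    measured = [gene for gene in predicted if gene in log2_fc]
    if not measured:
        return math.nan
    repressed = sum(1 for gene in measured if log2_fc[gene] < threshold)
    return repressed / len(measured)


# Hypothetical genes and protein fold changes, for illustration only.
fold_changes = {"G1": -0.5, "G2": -0.2, "G3": 0.1}
print(target_precision(["G1", "G2", "G3", "G4"], fold_changes))
```

Restricting the denominator to measured genes is one reasonable convention; counting unmeasured predictions as false positives would give a lower, more conservative figure.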
16.  The database of experimentally supported targets: a functional update of TarBase 
Nucleic Acids Research  2008;37(Database issue):D155-D158.
TarBase5.0 is a database which houses a manually curated collection of experimentally supported microRNA (miRNA) targets in several animal species of central scientific interest, plants and viruses. MiRNAs are small non-coding RNA molecules that exhibit an inhibitory effect on gene expression, interfering with the stability and translational efficiency of the targeted mature messenger RNAs. Even though several computational programs exist to predict miRNA targets, there is a need for a comprehensive collection and description of miRNA targets with experimental support. Here we introduce a substantially extended version of this resource. The current version includes more than 1300 experimentally supported targets. Each target site is described by the miRNA that binds it, the gene in which it occurs, the nature of the experiments that were conducted to test it, the sufficiency of the site to induce translational repression and/or cleavage, and the paper from which all these data were extracted. Additionally, the database is functionally linked to several other relevant and useful databases such as Ensembl, Hugo, UCSC and SwissProt. The TarBase5.0 database can be queried or downloaded from http://microrna.gr/tarbase.
doi:10.1093/nar/gkn809
PMCID: PMC2686456  PMID: 18957447
17.  Frequency and fate of microRNA editing in human brain 
Nucleic Acids Research  2008;36(16):5270-5280.
Primary transcripts of certain microRNA (miRNA) genes (pri-miRNAs) are subject to RNA editing that converts adenosine to inosine (A→I RNA editing). However, the frequency of pri-miRNA editing and the fate of edited pri-miRNAs remain largely undetermined. Examination of already known pri-miRNA editing sites indicated that adenosine residues of the UAG triplet sequence might be edited more frequently. In the present study, therefore, we conducted a large-scale survey of human pri-miRNAs containing the UAG triplet sequence. By direct sequencing of RT–PCR products corresponding to pri-miRNAs, we examined 209 pri-miRNAs and identified 43 UAG and 43 non-UAG editing sites in 47 pri-miRNAs, which were highly edited in human brain. An in vitro miRNA processing assay using recombinant Drosha-DGCR8 and Dicer-TRBP (the human immunodeficiency virus transactivating response RNA-binding protein) complexes revealed that most pri-miRNA editing is likely to interfere with the miRNA processing steps. In addition, four new edited miRNAs with altered seed sequences were identified by targeted cloning and sequencing of the miRNAs that would be processed from edited pri-miRNAs. Our studies predict that ∼16% of human pri-miRNAs are subject to A→I editing and, thus, miRNA editing could have a large impact on miRNA-mediated gene silencing.
doi:10.1093/nar/gkn479
PMCID: PMC2532740  PMID: 18684997
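The survey above prioritized pri-miRNAs containing UAG triplets because the central adenosine of that triplet appeared to be edited more often. A minimal sketch of the candidate scan follows; it only lists positions worth checking by sequencing and does not predict editing. The input fragments are hypothetical.

```python
from typing import List


def uag_adenosine_positions(seq: str) -> List[int]:
    """1-based positions of the central A in every UAG triplet.

    DNA input (T) is converted to RNA (U) first. The result is a list of
    candidate sites to check by sequencing, not a prediction of editing.
    """
    rna = seq.upper().replace("T", "U")
    return [i + 2 for i in range(len(rna) - 2) if rna[i:i + 3] == "UAG"]


# Hypothetical pri-miRNA fragment.
print(uag_adenosine_positions("GUAGCUAGA"))  # [3, 7]
```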
18.  Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction 
PLoS Computational Biology  2007;3(3):e54.
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
Author Summary
We describe a new approach to statistical learning for sequence data that is broadly applicable to computational biology problems and that has experimentally demonstrated advantages over current hidden Markov model (HMM)-based methods for sequence analysis. The methods we describe in this paper, implemented in the CRAIG program, allow researchers to modularly specify and train sequence analysis models that combine a wide range of weakly informative features into globally optimal predictions. Our results for the gene prediction problem show significant improvements over existing ab initio gene predictors on a variety of tests, including the especially challenging ENCODE regions. Such improved predictions, particularly on initial and single exons, could benefit researchers who are seeking more accurate means of recognizing such important features as signal peptides and regulatory regions. More generally, we believe that our method, by combining the structure-describing capabilities of HMMs with the accuracy of margin-based classification methods, provides a general tool for statistical learning in biological sequences that will replace HMMs in any sequence modeling task for which there is annotated training data.
doi:10.1371/journal.pcbi.0030054
PMCID: PMC1828702  PMID: 17367206
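CRAIG is described as being trained with an online large-margin algorithm related to multiclass SVMs. A full semi-Markov conditional random field is beyond a short example, so the sketch below deliberately simplifies to a flat multiclass linear model and shows a single-constraint passive-aggressive (MIRA-style) update: if the correct label does not outscore the best wrong label by the required margin, the weights move by the smallest step that fixes this. This is an assumption-laden illustration of the update idea, not CRAIG's structured training procedure; the labels and features are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mira_update(weights: Dict[str, Vector], x: Vector, gold: str,
                loss: float = 1.0, c: float = 1.0) -> float:
    """One single-constraint passive-aggressive (MIRA-style) update.

    Requires the gold label to outscore the highest-scoring wrong label by
    at least `loss`; if it does not, shifts weights by the smallest step
    (capped at `c`) that meets the margin. Needs at least two labels.
    Returns the step size tau.
    """
    rival = max((label for label in weights if label != gold),
                key=lambda label: dot(weights[label], x))
    margin = dot(weights[gold], x) - dot(weights[rival], x)
    violation = loss - margin
    sq_norm = dot(x, x)
    if violation <= 0.0 or sq_norm == 0.0:
        return 0.0  # margin already satisfied: stay passive
    # Moving gold by +tau*x and rival by -tau*x raises the margin by
    # 2 * tau * |x|^2, so this tau closes the violation exactly (unless capped).
    tau = min(c, violation / (2.0 * sq_norm))
    weights[gold] = [w + tau * xi for w, xi in zip(weights[gold], x)]
    weights[rival] = [w - tau * xi for w, xi in zip(weights[rival], x)]
    return tau


# Toy labels and features (hypothetical), all weights starting at zero.
weights = {"exon": [0.0, 0.0], "intron": [0.0, 0.0], "intergenic": [0.0, 0.0]}
x = [1.0, 2.0]
tau = mira_update(weights, x, gold="exon")
print(tau, weights["exon"], weights["intron"])
```

In the structured setting the "rival" would be the best-scoring wrong gene parse found by decoding, and the loss would measure how far that parse is from the annotation rather than a fixed 1.0.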
19.  miRGen: a database for the study of animal microRNA genomic organization and function 
Nucleic Acids Research  2006;35(Database issue):D149-D155.
miRGen is an integrated database of (i) positional relationships between animal miRNAs and genomic annotation sets and (ii) animal miRNA targets according to combinations of widely used target prediction programs. A major goal of the database is the study of the relationship between miRNA genomic organization and miRNA function. This is made possible by three integrated and user-friendly interfaces. The Genomics interface allows the user to explore where whole-genome collections of miRNAs are located with respect to UCSC genome browser annotation sets such as Known Genes, RefSeq Genes, Genscan predicted genes, CpG islands and pseudogenes. These miRNAs are connected through the Targets interface to their experimentally supported target genes from TarBase, as well as computationally predicted target genes from optimized intersections and unions of several widely used mammalian target prediction programs. Finally, the Clusters interface provides predicted miRNA clusters at any given inter-miRNA distance and provides specific functional information on the targets of miRNAs within each cluster. All of these unique features of miRGen are designed to facilitate investigations into miRNA genomic organization, co-transcription and targeting. miRGen can be freely accessed at .
doi:10.1093/nar/gkl904
PMCID: PMC1669779  PMID: 17108354
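The Clusters interface described above groups miRNAs at any user-chosen inter-miRNA distance. A minimal sketch of such distance-based grouping follows, using single linkage along the genome: a locus joins the current cluster when it lies on the same chromosome and strand and starts within the chosen distance of the cluster's furthest end. Whether miRGen requires the same strand, or measures distance this way, is an assumption, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    name: str
    chrom: str
    strand: str
    start: int
    end: int


def cluster_mirnas(loci: List[Locus], max_distance: int) -> List[List[str]]:
    """Single-linkage clusters of miRNA loci by genomic distance.

    A locus joins the current cluster if it is on the same chromosome and
    strand and starts within `max_distance` bp of the furthest end seen so
    far in that cluster; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    current: List[Locus] = []
    current_end = 0
    for locus in sorted(loci, key=lambda loc: (loc.chrom, loc.strand, loc.start)):
        same_strand = bool(current) and (
            (locus.chrom, locus.strand) == (current[-1].chrom, current[-1].strand))
        if same_strand and locus.start - current_end <= max_distance:
            current.append(locus)
            current_end = max(current_end, locus.end)
        else:
            if current:
                clusters.append([loc.name for loc in current])
            current = [locus]
            current_end = locus.end
    if current:
        clusters.append([loc.name for loc in current])
    return clusters


# Hypothetical loci, for illustration only.
loci = [
    Locus("mir-a", "chr1", "+", 100, 180),
    Locus("mir-c", "chr1", "+", 10000, 10080),
    Locus("mir-b", "chr1", "+", 500, 580),
    Locus("mir-d", "chr2", "+", 520, 600),
]
print(cluster_mirnas(loci, max_distance=5000))
```

Raising `max_distance` merges clusters monotonically, so re-running with larger values shows how co-transcribed groups coalesce as the distance threshold grows.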
