Motivation: A-to-I RNA editing is an important mechanism that converts specific adenosines into inosines in RNA molecules. Its dysregulation has been associated with several human diseases, including cancer. Recent work has demonstrated a role for A-to-I editing in microRNA (miRNA)-mediated regulation of gene expression. In fact, edited forms of mature miRNAs can target sets of genes that differ from the targets of their unedited forms. The specific deamination of mRNAs can generate novel binding sites in addition to potentially altering existing ones.
Results: This work presents miR-EdiTar, a database of predicted A-to-I edited miRNA binding sites. The database contains predicted miRNA binding sites that could be affected by A-to-I editing and sites that could become miRNA binding sites as a result of A-to-I editing.
Availability: miR-EdiTar is freely available online at http://microrna.osumc.edu/mireditar.
Supplementary data are available at Bioinformatics online.
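The two kinds of sites described above (existing sites altered by editing, and sites created by editing) can be illustrated with a minimal sketch. It assumes that inosine is read as guanosine, that a site is a perfect Watson-Crick match to miRNA nucleotides 2-8, and that the miRNA and UTR are toy sequences; the function names are hypothetical and this is not the prediction pipeline used to build miR-EdiTar.

```python
# Toy illustration: how A-to-I editing of an mRNA can destroy or create a seed match.
# Assumption: inosine behaves like guanosine, so an edited A is written as G.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA 7-mer that pairs with miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(utr, mirna):
    """Return the start index of every perfect seed match in the UTR."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def edit(utr, position):
    """Apply A-to-I editing at one position (the edited base is read as G)."""
    if utr[position] != "A":
        raise ValueError("only adenosines can be edited")
    return utr[:position] + "G" + utr[position + 1:]


# Hypothetical sequences chosen only for the demonstration.
toy_mirna = "CAAAGUGCUUACAGUGCAGGUAG"
toy_utr = "CCGCACUUUCCACACUUUCC"

print(find_sites(toy_utr, toy_mirna))            # sites before editing
print(find_sites(edit(toy_utr, 4), toy_mirna))   # editing inside the existing site
print(find_sites(edit(toy_utr, 11), toy_mirna))  # editing that completes a new site
```

In this toy case, editing position 4 removes the only seed match, while editing position 11 adds a second one.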
Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP) systems, offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of the graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms and introduces new strategies that naturally lead to a faster parallel searching system, especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.
We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes in the same class. An extensive experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.
MicroRNAs (miRNAs) are small non-coding RNAs responsible for post-transcriptional regulation of gene expression through interaction with messenger RNAs (mRNAs). They are involved in important biological processes and are often dysregulated in a variety of diseases, including cancer and infections. Viruses also encode their own sets of miRNAs, which they use to control the expression of the host’s genes, their own genes, or both. In the past few years, evidence has accumulated for the presence of cellular miRNAs in extracellular human body fluids such as serum, plasma, saliva, and urine. They have been found either cofractionated with the Argonaute2 protein or in membrane-bound vesicles such as exosomes. Although little is known about the role of circulating miRNAs, it has been demonstrated that miRNAs secreted by virus-infected cells are transferred to and act in uninfected recipient cells. In this work we summarize the current knowledge on viral circulating miRNAs and provide a few examples of computational prediction of their function.
microRNA; viruses; exosomes; circulating microRNA; vesicles; body fluids
Motivation: The identification of drug–target interaction (DTI) represents a costly and time-consuming step in drug discovery and design. Computational methods capable of predicting reliable DTI play an important role in the field. Recently, recommendation methods relying on network-based inference (NBI) have been proposed. However, such approaches implement naive topology-based inference and do not take into account important features within the drug–target domain.
Results: In this article, we present a new NBI method, called domain tuned-hybrid (DT-Hybrid), which extends a well-established recommendation technique with domain-based knowledge, including drug and target similarity. DT-Hybrid has been extensively tested using the latest version of an experimentally validated DTI database obtained from DrugBank. Comparison with other recently proposed NBI methods clearly shows that DT-Hybrid is capable of predicting more reliable DTIs.
Availability: DT-Hybrid has been developed in R and it is available, along with all the results on the predictions, through an R package at the following URL: http://sites.google.com/site/ehybridalgo/.
Supplementary data are available at Bioinformatics online.
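The topology-only inference that DT-Hybrid builds on can be sketched as a two-step resource allocation on the bipartite drug-target graph. The code below is a minimal illustration of plain NBI under that assumption; it does not include the drug and target similarity terms that DT-Hybrid adds, and the function name and toy matrix are hypothetical.

```python
# Plain network-based inference (two-step resource spreading) on a bipartite graph.
# This is the naive topology-only baseline, NOT DT-Hybrid itself.

def nbi_scores(adjacency):
    """adjacency[d][t] is 1 if drug d is known to interact with target t.

    Returns scores[d][t]: the resource that reaches target t when the known
    targets of drug d each start with one unit of resource.
    """
    n_drugs = len(adjacency)
    n_targets = len(adjacency[0])
    drug_degree = [sum(row) for row in adjacency]
    target_degree = [sum(adjacency[d][t] for d in range(n_drugs))
                     for t in range(n_targets)]
    scores = []
    for d in range(n_drugs):
        # Step 1: each known target of d splits its unit equally among its drugs.
        on_drugs = [0.0] * n_drugs
        for t in range(n_targets):
            if adjacency[d][t]:
                for other in range(n_drugs):
                    if adjacency[other][t]:
                        on_drugs[other] += 1.0 / target_degree[t]
        # Step 2: each drug splits what it received equally among its targets.
        on_targets = [0.0] * n_targets
        for other in range(n_drugs):
            if on_drugs[other] > 0:
                for t in range(n_targets):
                    if adjacency[other][t]:
                        on_targets[t] += on_drugs[other] / drug_degree[other]
        scores.append(on_targets)
    return scores


# Toy data: drug 0 hits targets 0 and 1; drug 1 hits targets 1 and 2.
toy = [[1, 1, 0],
       [0, 1, 1]]
print(nbi_scores(toy)[0])
```

For drug 0, target 2 is not a known interaction, so its nonzero score is the kind of value that would be ranked as a candidate prediction.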
The BITS2012 meeting, held in Catania on May 2-4, 2012, brought together almost 100 Italian researchers working in the field of Bioinformatics, as well as students in the same or related disciplines. About 90 original research works were presented either as oral communications or as posters, representing a landscape of current Italian research in bioinformatics.
This preface provides a brief overview of the meeting and introduces the manuscripts that were accepted for publication in this supplement, after a strict and careful peer review by an international board of referees.
Graphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible.
We propose a new subgraph isomorphism algorithm which applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch, which was originally distributed in Modula2) on synthetic, molecular, and interaction network data. We show a significant reduction in the running time of our approach compared with these other excellent methods and show that our algorithm scales well as memory demands increase.
Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism, highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch, together with all the datasets used (http://ferrolab.dmi.unict.it/ri.html). In future work, our findings may be extended to approximate subgraph isomorphism algorithms.
Subgraph isomorphism algorithms; biochemical graph data; search strategies; algorithms comparisons and distributions
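To make the query concrete, the following is a minimal backtracking sketch that enumerates all matches of a pattern graph in a target graph. It is a generic textbook-style search, not the algorithm proposed above; the static ordering by decreasing pattern degree and the function name are assumptions made for illustration, and matches are non-induced (extra target edges are allowed).

```python
# Generic backtracking enumeration of subgraph matches (undirected, unlabeled).
# Graphs are dicts mapping each node to the set of its neighbors.

def subgraph_matches(pattern, target):
    """Return every injective mapping of pattern nodes to target nodes
    that preserves all pattern edges."""
    # Assumed heuristic: visit high-degree pattern nodes first so that
    # infeasible partial mappings are rejected early.
    order = sorted(pattern, key=lambda v: -len(pattern[v]))
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p = order[len(mapping)]
        used = set(mapping.values())
        for t in target:
            if t in used:
                continue
            if len(target[t]) < len(pattern[p]):
                continue  # the target node has too few neighbors
            # Every already-mapped neighbor of p must be adjacent to t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                extend(mapping)
                del mapping[p]

    extend({})
    return matches


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# A square a-b-c-d with the diagonal a-c, which contains two triangles.
square_with_diagonal = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
print(len(subgraph_matches(triangle, square_with_diagonal)))
```

Each of the two triangles in the target is found six times, once per automorphism of the pattern, so the count includes symmetric mappings.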
RNA editing is a type of post-transcriptional modification that takes place in eukaryotes. It alters the sequence of primary RNA transcripts by deleting, inserting or modifying residues. Several forms of RNA editing have been discovered, including A-to-I, C-to-U, U-to-C and G-to-A. In recent years, the application of global approaches to the study of A-to-I editing, including high-throughput sequencing, has led to important advances. However, in spite of enormous efforts, the real biological mechanism underlying this phenomenon remains unknown.
In this work, we present VIRGO (http://atlas.dmi.unict.it/virgo/), a web-based tool that maps A-to-G mismatches between genomic and EST sequences as candidate A-to-I editing sites. VIRGO is built on top of a knowledge base integrating gene information from UCSC, ESTs from NCBI, SNPs, DARNED, and Next Generation Sequencing (NGS) data. The tool is equipped with a user-friendly interface allowing users to analyze genomic sequences in order to identify candidate A-to-I editing sites.
VIRGO is a powerful tool allowing a systematic identification of putative A-to-I editing sites in genomic sequences. The integration of NGS data allows the computation of p-values and adjusted p-values to measure the confidence of the mapped editing sites. The whole knowledge base is available for download and will be continuously updated as new NGS data become available.
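The core idea of reading A-to-G mismatches as candidate editing sites can be sketched on a pair of already aligned sequences. The snippet below assumes a gapless, same-strand alignment of equal length and an optional set of known SNP positions to exclude; the function and parameter names are hypothetical and this is not VIRGO's actual implementation.

```python
# Candidate A-to-I sites as A (genome) versus G (EST) mismatches in an alignment.
# Assumptions: both strings are aligned column by column on the same strand.

def a_to_g_candidates(genomic, est, known_snps=frozenset()):
    """Return alignment positions where the genome has A and the EST has G,
    skipping positions listed as known SNPs."""
    if len(genomic) != len(est):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(genomic.upper(), est.upper())
    return [i for i, (g, e) in enumerate(pairs)
            if g == "A" and e == "G" and i not in known_snps]


toy_genomic = "ACGTAAGTAC"
toy_est = "ACGTGAGTGC"
print(a_to_g_candidates(toy_genomic, toy_est))
print(a_to_g_candidates(toy_genomic, toy_est, known_snps={8}))
```

Excluding known SNP positions reflects the fact that a genomic polymorphism can produce the same mismatch as an editing event.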
Triple negative breast cancer (TNBC) is a heterogeneous disease at the molecular, pathologic and clinical levels. To stratify TNBCs, we determined microRNA (miRNA) expression profiles, as well as expression profiles of a cancer-focused mRNA panel, in tumor, adjacent non-tumor (normal) and lymph node metastatic lesion (mets) tissues, from 173 women with TNBCs; we linked specific miRNA signatures to patient survival and used miRNA/mRNA anti-correlations to identify clinically and genetically different TNBC subclasses. We also assessed miRNA signatures as potential regulators of TNBC subclass-specific gene expression networks defined by expression of canonical signal pathways.
Tissue specific miRNAs and mRNAs were identified for normal vs tumor vs mets comparisons. miRNA signatures correlated with prognosis were identified and predicted anti-correlated targets within the mRNA profile were defined. Two miRNA signatures (miR-16, 155, 125b, 374a and miR-16, 125b, 374a, 374b, 421, 655, 497) predictive of overall survival (P = 0.05) and distant-disease free survival (P = 0.009), respectively, were identified for patients 50 yrs of age or younger. By multivariate analysis the risk signatures were independent predictors for overall survival and distant-disease free survival. mRNA expression profiling, using the cancer-focused mRNA panel, resulted in clustering of TNBCs into 4 molecular subclasses with different expression signatures anti-correlated with the prognostic miRNAs.
Our findings suggest that miRNAs play a key role in triple negative breast cancer through their ability to regulate fundamental pathways such as cellular growth and proliferation, cellular movement and migration, and extracellular matrix degradation. The results define miRNA expression signatures that characterize and contribute to the phenotypic diversity of TNBC and its metastasis.
MicroRNAs are small noncoding RNAs that play an important role in the regulation of various biological processes through their interaction with cellular messenger RNAs. They are frequently dysregulated in cancer and have shown great potential as tissue-based markers for cancer classification and prognostication. MicroRNAs are also present in extracellular human body fluids such as serum, plasma, saliva, and urine. Most circulating microRNAs present in human plasma and serum cofractionate with the Argonaute2 (Ago2) protein. However, circulating microRNAs have also been found in membrane-bound vesicles such as exosomes. Since microRNAs circulate in the bloodstream in a highly stable, extracellular form, they may be used as blood-based biomarkers for cancer and other diseases. A knowledge base of extracellular circulating miRNAs is a fundamental tool for biomedical research. In this work, we present miRandola, a comprehensive manually curated classification of extracellular circulating miRNAs. miRandola is connected to miRò, the miRNA knowledge base, allowing users to infer the potential biological functions of circulating miRNAs and their connections with phenotypes. The miRandola database contains 2132 entries, with 581 unique mature miRNAs and 21 types of samples. miRNAs are classified into four categories, based on their extracellular form: miRNA-Ago2 (173 entries), miRNA-exosome (856 entries), miRNA-HDL (20 entries) and miRNA-circulating (1083 entries). miRandola is available online at: http://atlas.dmi.unict.it/mirandola/index.html.
Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first present the reasons for the interest of Bioinformatics in this context and suggest some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with special attention to networks supporting scientific collaboration, and highlight some critical issues, such as identification of users and standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in Bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we try to outline some of the goals to be achieved in the short term for the exploitation of these technologies.
social networks; open source; collaborative research; collaborative development; collaborative learning
Chromosomal fragile sites are heritable specific loci especially prone to breakage. Some of them are associated with human genetic disorders, and several studies have demonstrated their importance in genome instability in cancer. MicroRNAs (miRNAs) are small non-coding RNAs responsible for post-transcriptional gene regulation, and their involvement in several diseases such as cancer has been widely demonstrated. The altered expression of miRNAs is sometimes due to chromosomal rearrangements and epigenetic events; it is therefore essential to study miRNAs in the context of their genomic locations, in order to find significant correlations between their aberrant expression and the phenotype.
Here we use statistical models to study the incidence of human miRNA genes on fragile sites and their association with cancer-specific translocation breakpoints, repetitive elements, and CpG islands. Our results show that, on average, fragile sites are denser in miRNAs and also in protein coding genes. However, the distribution of miRNAs and protein coding genes in fragile versus non-fragile sites depends on the chromosome. We also find a positive correlation between fragility and repeats, and between miRNAs and CpG islands.
Our results show that the relationship between site fragility and miRNA density is far more complex than previously thought. For example, we find that protein coding genes seem to follow patterns similar to those of miRNAs when their overall distribution is considered. However, once we allow for differences at the chromosome level in our statistical analysis, we find that the distribution of protein coding genes in fragile sites is very different from that of miRNAs. This is a novel result that we believe may help discover new potential correlations between the localization of miRNAs and their crucial role in biological processes and in the development of diseases.
MicroRNAs (miRNAs) are small RNA molecules that modulate gene expression through degradation of specific mRNAs and/or repression of their translation. miRNAs are involved in both physiological and pathological processes, such as apoptosis and cancer. Their presence has been demonstrated in several organisms as well as in viruses. Virus-encoded miRNAs can act as viral gene expression regulators, but they may also interfere with the expression of host genes. Viral miRNAs may control host cell proliferation by targeting cell-cycle and apoptosis regulators. Therefore, they could be involved in cancer pathogenesis. Computational prediction of miRNA/target pairs is a fundamental step in these studies. Here, we describe the use of miRiam, a novel program based on both thermodynamic features and empirical constraints, to predict viral miRNA/human target interactions. miRiam exploits target mRNA secondary structure accessibility and interaction rules inferred from validated miRNA/mRNA pairs. A set of genes involved in apoptosis and cell-cycle regulation was identified as the target set for our studies. This choice was supported by the knowledge that DNA tumor viruses interfere with the above processes in humans. miRNAs were selected from two cancer-related viruses, Epstein-Barr Virus (EBV) and Kaposi-Sarcoma-Associated Herpes Virus (KSHV). Results show that several transcripts possess potential binding sites for these miRNAs. This work has produced a set of plausible hypotheses on the involvement of v-miRNAs and human apoptosis genes in cancer development. Our results suggest that during viral infection, besides the protein-based host regulation mechanism, interference at the post-transcriptional level may exist. miRiam is freely available for download at http://ferrolab.dmi.unict.it/miriam.
miRNA; virus; cancer; apoptosis; cell cycle; EBV; KSHV
Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs.
In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task.
Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs.
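The filter-and-verify scheme described above can be illustrated with a small feature index built from label paths. The sketch below only tests whether every path feature of the query also occurs in a database graph; it deliberately omits the feature locality information that SING adds, and the function names, the path-length bound and the toy graphs are assumptions made for illustration.

```python
# Path-feature filtering for subgraph search (filter step only, no locality).
# A graph is a pair (labels, adj): node -> label and node -> list of neighbors.

def path_features(labels, adj, max_len=3):
    """Return the set of label sequences of all simple paths with up to
    max_len nodes."""
    features = set()

    def walk(path):
        features.add(tuple(labels[v] for v in path))
        if len(path) < max_len:
            for neighbor in adj[path[-1]]:
                if neighbor not in path:
                    walk(path + [neighbor])

    for start in adj:
        walk([start])
    return features


def filter_candidates(database, query, max_len=3):
    """Keep the names of graphs whose features include all query features.
    Survivors are only candidates and still need an exact isomorphism test."""
    wanted = path_features(query[0], query[1], max_len)
    return [name for name, (labels, adj) in database.items()
            if wanted <= path_features(labels, adj, max_len)]


# Toy labeled chains: g1 is C-O-N, g2 is C-C-O.
chain = {0: [1], 1: [0, 2], 2: [1]}
database = {
    "g1": ({0: "C", 1: "O", 2: "N"}, chain),
    "g2": ({0: "C", 1: "C", 2: "O"}, chain),
}
query_on = ({0: "O", 1: "N"}, {0: [1], 1: [0]})
print(filter_candidates(database, query_on))
```

Because the test is a necessary condition only, its value lies in discarding graphs cheaply before the expensive matching step.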
The General Transcription Apparatus (GTA) comprises more than one hundred proteins, including RNA Polymerases, GTFs, TAFs, Mediator, and cofactors such as heterodimeric NC2. This complexity contrasts with the simple mechanical role that these proteins are believed to perform and suggests a still uncharacterized participation in important biological functions, such as the control of cell proliferation.
To verify our hypothesis, we analyzed the involvement in Neuroblastoma (NB) pathogenesis of GTA genes localized at 1p, one of the critical regions in NB: through RT-PCR of fifty-eight NB biopsies, we demonstrated a statistically significant reduction of the mRNA for NC2β (localized at 1p22.1) in 74% of samples (p = 0.0039). Transcripts from TAF13 and TAF12 (mapping at 1p13.3 and 1p35.3, respectively) were also reduced, whereas we did not detect any quantitative alteration of the mRNAs from GTF2B and NC2α (localized at 1p22-p21 and 11q13.3, respectively). We confirmed these data by comparing tumour and constitutional DNA: most NB samples with diminished levels of NC2β mRNA also had genomic deletions at the corresponding locus.
Our data show that NC2β is specifically involved in NB pathogenesis and may be considered a new NB biomarker: accordingly, we suggest that NC2β, and possibly other GTA members, are physiologically involved in the control of cell proliferation. Finally, our studies unearth complex selective mechanisms within NB cells.
Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed.
This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It implements an effective data storage based also on low-support data mining.
GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique, which applies to any searching system, also allows a significant reduction in index space.
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species.
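The size of the ambiguity can be made concrete by counting how many coding sequences are compatible with a given protein. The sketch below uses the number of codons per amino acid in the standard genetic code; the function name and the toy peptides are hypothetical, and stop codons are ignored.

```python
# Number of distinct nucleotide sequences that encode a given protein,
# using codon multiplicities from the standard genetic code (stops excluded).
CODON_COUNT = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_backtranslations(protein):
    """Multiply the codon choices available at each residue."""
    total = 1
    for residue in protein.upper():
        total *= CODON_COUNT[residue]
    return total


print(count_backtranslations("MW"))     # only single-codon residues
print(count_backtranslations("MKL"))    # 1 * 2 * 6 choices
print(count_backtranslations("MKLSR"))  # grows multiplicatively with length
```

The count grows multiplicatively with protein length, which is why a selection criterion such as codon usage or sequence similarity is needed.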
This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision.
The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained on a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.