Comparing protein interaction networks can reveal interesting patterns of interactions for a specific function or process in distantly related species. In this paper we present GASOLINE, a Cytoscape app for multiple local alignments of PPI (protein-protein interaction) networks. The app is based on the homonymous greedy and stochastic algorithm. GASOLINE starts by identifying sets of similar nodes, called seeds of the alignment. Alignments are then extended in a greedy manner and finally refined. Both the identification of seeds and the extension of alignments are performed through an iterative Gibbs sampling strategy. The app computes and visualizes local alignments without requiring any post-processing operations. GO terms can be easily attached to the aligned proteins for further functional analysis of alignments. GASOLINE can perform the alignment task in a few minutes, even for a large number of input networks.
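To make the seed-identification step more concrete, here is a minimal Python sketch of Gibbs sampling for a single seed. It assumes a seed holds exactly one node per input network and that node similarity comes from a hypothetical `similarity(i, u, j, v)` callback (for instance, derived from sequence similarity). The function name `gibbs_seed`, the pseudo-count, and the fixed iteration budget are illustrative choices, not GASOLINE's actual scoring or stopping rule.

```python
import random

def gibbs_seed(similarity, sizes, iterations=200, rng=None):
    """Pick one node per network so that the chosen nodes are mutually similar.

    similarity(i, u, j, v) -> float >= 0 is a hypothetical scoring callback for
    node u of network i versus node v of network j.
    sizes[i] is the number of nodes in network i (nodes are 0..sizes[i]-1).
    """
    rng = rng or random.Random(0)
    k = len(sizes)
    seed = [rng.randrange(n) for n in sizes]  # random initial assignment
    for _ in range(iterations):
        i = rng.randrange(k)  # network whose chosen node is resampled
        weights = []
        for u in range(sizes[i]):
            # score candidate u against the nodes currently chosen in the other networks
            s = sum(similarity(i, u, j, seed[j]) for j in range(k) if j != i)
            weights.append(s + 1e-9)  # tiny pseudo-count keeps every candidate possible
        # sample the new node with probability proportional to its weight
        seed[i] = rng.choices(range(sizes[i]), weights=weights)[0]
    return seed
```

Because the resampling probability is proportional to the summed similarity, nodes that agree with the current choices in the other networks are favoured, while the randomness lets the sampler move away from a poor initial assignment.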
Multiple local structure comparison helps to identify common structural motifs or conserved binding sites in 3D structures of distantly related proteins. Since there is no single best way to compare structures and evaluate an alignment, a wide variety of techniques and similarity scoring schemes have been proposed. Existing algorithms usually compute the best superposition of two structures, or cast the comparison as an optimization problem in a simpler setting (e.g., considering contact maps or distance matrices). Here, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm based on iterative sampling for multiple local alignment of protein structures. Our method can efficiently find conserved motifs across a set of protein structures, and it requires only the distances between all pairs of residues in each structure. To show the accuracy and effectiveness of PROPOSAL, we tested it on a few families of protein structures. We also compared PROPOSAL with two state-of-the-art tools for pairwise local alignment on a dataset of manually annotated motifs. PROPOSAL is available as a Java 2D standalone application or a command line program at http://ferrolab.dmi.unict.it/proposal/proposal.html.
structure comparison; protein comparison; local alignment; protein families; motifs identification; binding sites identification
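PROPOSAL works only with intra-structure residue distances, so no superposition is needed. The Python sketch below illustrates why that suffices: a distance matrix does not change under rotation or translation, so two candidate motifs can be compared directly through their matrices. The helper names and the distance-RMSD score are assumptions for illustration, not PROPOSAL's actual scoring function.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def distance_rmsd(d1, d2):
    """Root-mean-square difference between two equally sized distance matrices.

    Only the upper triangle is used, because distance matrices are symmetric
    with a zero diagonal. The score is invariant to rigid motions of either motif.
    """
    n = len(d1)
    if n != len(d2) or n < 2:
        raise ValueError("matrices must have the same size (>= 2)")
    diffs = [(d1[i][j] - d2[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs))
```

For example, a three-residue motif and a translated copy of it give a distance RMSD of zero, even though their raw coordinates differ.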
Comparing protein interaction networks can reveal interesting patterns of interactions for a specific function or process in distantly related species. In this paper we present GASOLINE, a Cytoscape app for multiple local alignments of PPI (protein-protein interaction) networks. The app is based on the homonymous greedy and stochastic algorithm. To the authors' knowledge, it is the first Cytoscape app for computing and visualizing local alignments without requiring any post-processing operations. GO terms can be easily attached to the aligned proteins for further functional analysis of alignments. GASOLINE can perform the alignment task in a few minutes, even for a large number of input networks.
The analysis of the structure and dynamics of biological networks plays a central role in understanding the intrinsic complexity of biological systems. Biological networks have been considered a suitable formalism to extend evolutionary and comparative biology. In this paper we present GASOLINE, an algorithm for multiple local network alignment based on statistical iterative sampling combined with a greedy strategy. GASOLINE overcomes the limits of current approaches by producing biologically significant alignments within a feasible running time, even for very large input instances. The method has been extensively tested on a database of real and synthetic biological networks. A comprehensive comparison with state-of-the-art algorithms clearly shows that GASOLINE yields the best results in terms of both alignment reliability and running time on real biological networks, and comparable alignment quality on synthetic networks. GASOLINE has been developed in Java and is available, along with all the computed alignments, at the following URL: http://ferrolab.dmi.unict.it/gasoline/gasoline.html.
MicroRNAs (miRNAs) are small noncoding RNAs that play an important role in the regulation of various biological processes through their interaction with cellular mRNAs. A significant number of miRNAs have been found in extracellular human body fluids (e.g. plasma and serum), and some circulating miRNAs in the blood have been successfully identified as biomarkers for diseases including cardiovascular diseases and cancer. Released miRNAs do not necessarily reflect the abundance of miRNAs in the cell of origin. It has been claimed that the release of miRNAs from cells into blood and ductal fluids is selective and that the selection of released miRNAs may correlate with malignancy. Moreover, miRNAs play a significant role in pharmacogenomics by down-regulating genes that are important for drug function. In particular, drug use should be taken into consideration when analyzing plasma miRNA levels, as drug treatment may impair their use as biomarkers.
We enriched our manually curated extracellular/circulating microRNAs database, miRandola, by providing (i) a systematic comparison of expression profiles of cellular and extracellular miRNAs, (ii) a miRNA targets enrichment analysis procedure, and (iii) information on drugs and their effect on miRNA expression, extracted from PubMed abstracts by a natural language processing algorithm.
These additions help users improve their knowledge of the function, diagnostic potential, and drug effects of cellular and circulating miRNAs.
RNAi is a powerful tool for the regulation of gene expression. It is widely and successfully employed in functional studies and is now emerging as a promising therapeutic approach. Several RNAi-based clinical trials suggest encouraging results in the treatment of a variety of diseases, including cancer. Here we present miR-Synth, a computational resource for the design of synthetic microRNAs able to target multiple genes in multiple sites. The proposed strategy constitutes a valid alternative to the use of siRNA, allowing fewer molecules to be employed for the inhibition of multiple targets. This may represent a great advantage in designing therapies for diseases caused by crucial cellular pathways altered by multiple dysregulated genes. The system has been successfully validated on two of the most prominent genes associated with lung cancer, c-MET and Epidermal Growth Factor Receptor (EGFR). (See http://microrna.osumc.edu/mir-synth).
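The idea of one synthetic miRNA hitting several genes can be illustrated with a toy Python sketch, assuming targeting is reduced to perfect complementarity between a short seed and a site present in every target sequence. The helper names are hypothetical, and miR-Synth's actual design procedure is not reproduced here.

```python
def shared_seed_sites(targets, k=7):
    """Return the k-mers (DNA alphabet, 5'->3') present in every target sequence."""
    common = None
    for seq in targets:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        common = kmers if common is None else common & kmers
    return sorted(common or set())

def seed_for_site(site):
    """RNA seed (5'->3') that pairs perfectly, antiparallel, with a DNA target site."""
    comp = {"A": "U", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(site))
```

A candidate seed obtained this way would still need the checks a real design tool performs (site context, off-targets, the rest of the miRNA sequence); the sketch only shows why one molecule can, in principle, cover several genes.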
Research attention has increasingly focused on understanding the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them great potential as non-invasive biomarkers. However, non-coding RNAs have been discovered relatively recently, and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNA interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNA interaction data from a large number of well-established online repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web-based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all the visual and mining features available in Cytoscape.
microRNAs; lncRNAs; non-coding RNAs; networks; cytoscape; gene expression
Motivation: Over the past few years, experimental evidence has highlighted the role of microRNAs in human diseases. miRNAs are critical for the regulation of cellular processes, and, therefore, their aberration can be among the triggering causes of pathological phenomena. They are just one member of the large class of non-coding RNAs, which include transcribed ultra-conserved regions (T-UCRs), small nucleolar RNAs (snoRNAs), PIWI-interacting RNAs (piRNAs), large intergenic non-coding RNAs (lincRNAs), and the heterogeneous group of long non-coding RNAs (lncRNAs). Their known associations with diseases are few in number, and their reliability is questionable. In the literature, there is only one recent method, proposed by Yang et al. (2014), to predict lncRNA-disease associations. This technique, however, has limited prediction quality. All these elements entail the need to investigate new bioinformatics tools for the prediction of high-quality ncRNA-disease associations. Here, we propose a method called ncPred for the inference of novel ncRNA-disease associations based on a recommendation technique. We represent our knowledge through a tripartite network, whose nodes are ncRNAs, targets, or diseases. Interactions in such a network associate each ncRNA with a disease through its targets. Starting from this network, our algorithm computes weights between each ncRNA-disease pair using a multi-level resource transfer technique that, at each step, takes into account the resource transferred in the previous one.
Results: The results of our experimental analysis show that our approach is able to predict more biologically significant associations than those obtained by Yang et al. (2014), yielding an improvement in terms of the average area under the ROC curve (AUC). These results demonstrate the ability of our approach to predict biologically significant associations, which could lead to a better understanding of the molecular processes involved in complex diseases.
Availability: All the ncPred predictions, together with the datasets used for the analysis, are available at the following URL: http://alpha.dmi.unict.it/ncPred/
ncRNAs-diseases association predictions; lncRNAs functional characterization; network-based inference; tripartite networks; resource transfer algorithm
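The resource transfer on ncPred's tripartite ncRNA-target-disease network can be illustrated with a minimal Python sketch of a simplified two-step diffusion: each ncRNA splits one unit of resource equally among its targets, and each target forwards what it received, split equally among its diseases. The function name and dictionary layout are hypothetical, and ncPred's actual multi-level weighting is more elaborate than this.

```python
def resource_transfer(ncrna_targets, target_diseases):
    """Score ncRNA-disease pairs by two-step resource diffusion.

    ncrna_targets:   dict ncRNA -> set of targets
    target_diseases: dict target -> set of diseases
    Returns dict ncRNA -> {disease: score}.
    """
    scores = {}
    for rna, targets in ncrna_targets.items():
        if not targets:
            continue
        # step 1: a unit of resource on the ncRNA is split equally among its targets
        share = 1.0 / len(targets)
        disease_score = {}
        for t in targets:
            diseases = target_diseases.get(t, set())
            if not diseases:
                continue  # resource reaching a target without disease links is lost
            # step 2: the target forwards its share, split equally among its diseases
            for d in diseases:
                disease_score[d] = disease_score.get(d, 0.0) + share / len(diseases)
        scores[rna] = disease_score
    return scores
```

Diseases reached through many targets of an ncRNA accumulate more resource, so ranking each ncRNA's diseases by score yields candidate associations.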
Motivation: A-to-I RNA editing is an important mechanism that consists of the conversion of specific adenosines into inosines in RNA molecules. Its dysregulation has been associated with several human diseases including cancer. Recent work has demonstrated a role for A-to-I editing in microRNA (miRNA)-mediated gene expression regulation. In fact, edited forms of mature miRNAs can target sets of genes that differ from the targets of their unedited forms. The specific deamination of mRNAs can generate novel binding sites in addition to potentially altering existing ones.
Results: This work presents miR-EdiTar, a database of predicted A-to-I edited miRNA binding sites. The database contains predicted miRNA binding sites that could be affected by A-to-I editing and sites that could become miRNA binding sites as a result of A-to-I editing.
Availability: miR-EdiTar is freely available online at http://microrna.osumc.edu/mireditar.
Supplementary data are available at Bioinformatics online.
Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms and introduces new strategies that naturally lead to a faster parallel searching system, especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.
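The decomposition idea can be sketched in Python, assuming the subcomponents are simply the connected components of the target graph and that the per-component task is finding labeled paths that spell a query. The helper names are hypothetical, and GRAPES's actual decomposition and matching are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor

def connected_components(adj):
    """Split an undirected graph (dict node -> set of neighbours) into components."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def find_label_paths(adj, labels, comp, query):
    """All simple paths inside one component whose node labels spell `query`."""
    hits = []

    def extend(path):
        if len(path) == len(query):
            hits.append(tuple(path))
            return
        for v in adj[path[-1]]:
            if v not in path and labels[v] == query[len(path)]:
                extend(path + [v])

    for u in comp:
        if labels[u] == query[0]:
            extend([u])
    return hits

def parallel_search(adj, labels, query, workers=4):
    """Search every component independently and merge the sorted results."""
    comps = connected_components(adj)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: find_label_paths(adj, labels, c, query), comps)
        return sorted(h for hits in results for h in hits)
```

Components share no edges, so their searches are fully independent. In CPython, threads do not run pure-Python CPU-bound code in parallel because of the global interpreter lock; a `ProcessPoolExecutor` with a module-level worker function would be needed for real speed-ups.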
We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes in the same class. A wide experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.
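The intuition that expression intervals discriminate classes can be shown with a much-simplified Python sketch: record, per class, the observed [min, max] interval of each gene, then assign a new profile to the class whose intervals contain the most of its values. This interval-matching rule is an assumption for illustration and is not MIDClass's association-rule mining; the function names are hypothetical.

```python
def train_intervals(samples, labels):
    """Per class, record the [min, max] expression interval of every gene.

    samples: list of expression profiles (lists of floats, same gene order)
    labels:  class label of each sample
    """
    model = {}
    for profile, cls in zip(samples, labels):
        if cls not in model:
            model[cls] = [[x, x] for x in profile]
        else:
            for iv, x in zip(model[cls], profile):
                iv[0] = min(iv[0], x)
                iv[1] = max(iv[1], x)
    return model

def classify(model, profile):
    """Pick the class whose gene intervals contain the most expression values."""
    def hits(cls):
        return sum(lo <= x <= hi for (lo, hi), x in zip(model[cls], profile))
    # sorted() makes ties deterministic: the first class in sorted order wins
    return max(sorted(model), key=hits)
```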
MicroRNAs (miRNAs) are small non-coding RNAs responsible for post-transcriptional regulation of gene expression through interaction with messenger RNAs (mRNAs). They are involved in important biological processes and are often dysregulated in a variety of diseases, including cancer and infections. Viruses also encode their own sets of miRNAs, which they use to control the expression of the host’s genes, their own genes, or both. In the past few years, evidence of the presence of cellular miRNAs in extracellular human body fluids such as serum, plasma, saliva, and urine has accumulated. They have been found either cofractionating with the Argonaute2 protein or in membrane-bound vesicles such as exosomes. Although little is known about the role of circulating miRNAs, it has been demonstrated that miRNAs secreted by virus-infected cells are transferred to, and act in, uninfected recipient cells. In this work we summarize the current knowledge on viral circulating miRNAs and provide a few examples of computational prediction of their function.
microRNA; viruses; exosomes; circulating microRNA; vesicles; body fluids
Motivation: The identification of drug–target interaction (DTI) represents a costly and time-consuming step in drug discovery and design. Computational methods capable of predicting reliable DTI play an important role in the field. Recently, recommendation methods relying on network-based inference (NBI) have been proposed. However, such approaches implement naive topology-based inference and do not take into account important features within the drug–target domain.
Results: In this article, we present a new NBI method, called domain tuned-hybrid (DT-Hybrid), which extends a well-established recommendation technique with domain-based knowledge, including drug and target similarity. DT-Hybrid has been extensively tested using the latest version of an experimentally validated DTI database obtained from DrugBank. Comparison with other recently proposed NBI methods clearly shows that DT-Hybrid is capable of predicting more reliable DTIs.
Availability: DT-Hybrid has been developed in R and is available, along with all the prediction results, through an R package at the following URL: http://sites.google.com/site/ehybridalgo/.
Supplementary data are available at Bioinformatics online.
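The hybrid recommendation that DT-Hybrid extends spreads resource over the bipartite drug-target graph. Below is a minimal Python sketch of that diffusion, assuming the domain knowledge enters as a target-similarity factor `S[i][j]` that multiplies each transfer weight. The function name and matrix layout are illustrative; the reference implementation is the R package mentioned above.

```python
def hybrid_scores(A, S, lam=0.5):
    """Network-based inference scores for every (drug, target) pair.

    A[d][t] = 1 if drug d is known to hit target t, else 0.
    S[i][j] = similarity between targets i and j (all ones gives the plain hybrid method).
    lam     = mixing parameter: 1 is pure mass diffusion, 0 is pure heat spreading.
    """
    n_drugs, n_targets = len(A), len(A[0])
    k_t = [sum(A[d][t] for d in range(n_drugs)) for t in range(n_targets)]  # target degrees
    k_d = [sum(row) for row in A]  # drug degrees
    # W[i][j]: share of target j's resource that reaches target i through shared drugs
    W = [[0.0] * n_targets for _ in range(n_targets)]
    for i in range(n_targets):
        for j in range(n_targets):
            if k_t[i] == 0 or k_t[j] == 0:
                continue
            shared = sum(A[d][i] * A[d][j] / k_d[d] for d in range(n_drugs) if k_d[d])
            W[i][j] = S[i][j] * shared / (k_t[i] ** (1 - lam) * k_t[j] ** lam)
    # score of target i for drug d: resource received from the drug's known targets
    return [[sum(W[i][j] * A[d][j] for j in range(n_targets)) for i in range(n_targets)]
            for d in range(n_drugs)]
```

Unknown pairs with the highest scores are the predicted interactions; setting `S` to all ones recovers a purely topological prediction, which is the kind of naive inference the Motivation criticises.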
The BITS2012 meeting, held in Catania on May 2-4, 2012, brought together almost 100 Italian researchers working in the field of Bioinformatics, as well as students in the same or related disciplines. About 90 original research works were presented either as oral communications or as posters, offering an overview of current Italian research in bioinformatics.
This preface provides a brief overview of the meeting and introduces the manuscripts that were accepted for publication in this supplement, after a strict and careful peer review by an international board of referees.
Graphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete), and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early and as inexpensively as possible.
We propose a new subgraph isomorphism algorithm that applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch, which was originally distributed in Modula2) on synthetic graphs, molecules, and interaction networks. We show a significant reduction in the running time of our approach compared with these other excellent methods, and show that our algorithm scales well as memory demands increase.
Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch together with all the used datasets (http://ferrolab.dmi.unict.it/ri.html). In future work, our findings may be extended to approximate subgraph isomorphism algorithms.
Subgraph isomorphism algorithms; biochemical graph data; search strategies; algorithms comparisons and distributions
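The search-strategy idea above (fix a good order for the pattern vertices once, then backtrack with only cheap checks) can be sketched in Python as follows. The ordering heuristic (prefer the vertex with most edges to already ordered vertices, then higher degree) is an assumption in the spirit of the approach, not a transcription of the distributed C++ code. Matching here is non-induced: every pattern edge must exist in the target, while extra target edges are allowed.

```python
def order_pattern(p_adj):
    """Static search order: repeatedly pick the vertex most connected to those already chosen."""
    remaining = set(p_adj)
    order = []
    while remaining:
        best = max(sorted(remaining),
                   key=lambda v: (sum(u in p_adj[v] for u in order), len(p_adj[v])))
        order.append(best)
        remaining.remove(best)
    return order

def match_all(p_adj, t_adj):
    """Enumerate injective pattern->target maps that preserve every pattern edge."""
    order = order_pattern(p_adj)
    results, mapping, used = [], {}, set()

    def backtrack(depth):
        if depth == len(order):
            results.append(dict(mapping))
            return
        v = order[depth]
        for w in t_adj:
            if w in used or len(t_adj[w]) < len(p_adj[v]):
                continue  # cheap degree filter
            # every already-mapped neighbour of v must map to a neighbour of w
            if all(mapping[u] in t_adj[w] for u in p_adj[v] if u in mapping):
                mapping[v] = w
                used.add(w)
                backtrack(depth + 1)
                del mapping[v]
                used.discard(w)

    backtrack(0)
    return results
```

Because the order puts highly connected pattern vertices early, inconsistent partial mappings tend to fail after only a few steps. A tighter version would draw candidates only from the neighbours of an already mapped vertex instead of scanning all target vertices.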
RNA editing is a type of post-transcriptional modification that takes place in eukaryotes. It alters the sequence of primary RNA transcripts by deleting, inserting or modifying residues. Several forms of RNA editing have been discovered, including A-to-I, C-to-U, U-to-C and G-to-A. In recent years, the application of global approaches to the study of A-to-I editing, including high-throughput sequencing, has led to important advances. However, in spite of enormous efforts, the biological mechanism underlying this phenomenon remains unknown.
In this work, we present VIRGO (http://atlas.dmi.unict.it/virgo/), a web-based tool that maps A-to-G mismatches between genomic and EST sequences as candidate A-to-I editing sites. VIRGO is built on top of a knowledge base integrating gene information from UCSC, ESTs from NCBI, SNPs, DARNED, and Next Generation Sequencing (NGS) data. The tool is equipped with a user-friendly interface allowing users to analyze genomic sequences in order to identify candidate A-to-I editing sites.
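Inosine is read as guanosine during reverse transcription, so an editing event shows up as a genomic A aligned to a G in the EST. A minimal Python sketch of that mismatch scan, assuming the genome/EST alignment is already gapless and on the transcript's strand (the function name and `offset` parameter are hypothetical):

```python
def a_to_g_candidates(genomic, est, offset=0):
    """Positions where an aligned EST shows G over a genomic A.

    genomic, est: aligned sequences of equal length, same strand, no gaps.
    offset: genomic coordinate of the first aligned base.
    """
    if len(genomic) != len(est):
        raise ValueError("sequences must be aligned to the same length")
    return [offset + i
            for i, (g, e) in enumerate(zip(genomic.upper(), est.upper()))
            if g == "A" and e == "G"]
```

For transcripts on the minus strand, the same signature appears as T-to-C when read on the plus strand, so a real scan must take strand into account; known SNPs (which VIRGO also integrates) are another source of A/G mismatches that have to be filtered out.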
VIRGO is a powerful tool allowing a systematic identification of putative A-to-I editing sites in genomic sequences. The integration of NGS data allows the computation of p-values and adjusted p-values to measure the confidence of the mapped editing sites. The whole knowledge base is available for download and will be continuously updated as new NGS data become available.
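The abstract does not state which statistical test VIRGO applies. Purely as an illustrative assumption, one simple option is a one-sided binomial test of the number of G reads at a site against a hypothetical sequencing error rate, followed by Benjamini-Hochberg adjustment across all tested sites:

```python
from math import comb

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), e.g. k G reads out of n covering reads."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The running minimum keeps the adjusted values monotone in the original p-value order, which is what makes them usable as a false-discovery-rate threshold.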
MicroRNAs are small noncoding RNAs that play an important role in the regulation of various biological processes through their interaction with cellular messenger RNAs. They are frequently dysregulated in cancer and have shown great potential as tissue-based markers for cancer classification and prognostication. microRNAs are also present in extracellular human body fluids such as serum, plasma, saliva, and urine. Most circulating microRNAs present in human plasma and serum cofractionate with the Argonaute2 (Ago2) protein. However, circulating microRNAs have also been found in membrane-bound vesicles such as exosomes. Since microRNAs circulate in the bloodstream in a highly stable, extracellular form, they may be used as blood-based biomarkers for cancer and other diseases. A knowledge base of extracellular circulating miRNAs is a fundamental tool for biomedical research. In this work, we present miRandola, a comprehensive manually curated classification of extracellular circulating miRNAs. miRandola is connected to miRò, the miRNA knowledge base, allowing users to infer the potential biological functions of circulating miRNAs and their connections with phenotypes. The miRandola database contains 2132 entries, with 581 unique mature miRNAs and 21 types of samples. miRNAs are classified into four categories, based on their extracellular form: miRNA-Ago2 (173 entries), miRNA-exosome (856 entries), miRNA-HDL (20 entries) and miRNA-circulating (1083 entries). miRandola is available online at: http://atlas.dmi.unict.it/mirandola/index.html.
Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first explain why this context is of interest to Bioinformatics, also suggesting some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with special attention to networks supporting scientific collaboration, also highlighting some critical issues, such as identification of users and standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in Bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we outline some of the goals to be achieved in the short term for the exploitation of these technologies.
social networks; open source; collaborative research; collaborative development; collaborative learning
Chromosomal fragile sites are heritable specific loci especially prone to breakage. Some of them are associated with human genetic disorders, and several studies have demonstrated their importance in genome instability in cancer. MicroRNAs (miRNAs) are small non-coding RNAs responsible for post-transcriptional gene regulation, and their involvement in several diseases such as cancer has been widely demonstrated. The altered expression of miRNAs is sometimes due to chromosomal rearrangements and epigenetic events; thus it is essential to study miRNAs in the context of their genomic locations, in order to find significant correlations between their aberrant expression and the phenotype.
Here we use statistical models to study the incidence of human miRNA genes on fragile sites and their association with cancer-specific translocation breakpoints, repetitive elements, and CpG islands. Our results show that, on average, fragile sites are denser in miRNAs and also in protein coding genes. However, the distribution of miRNAs and protein coding genes in fragile versus non-fragile sites depends on the chromosome. We also find a positive correlation between fragility and repeats, and between miRNAs and CpG islands.
Our results show that the relationship between site fragility and miRNA density is far more complex than previously thought. For example, protein coding genes seem to follow patterns similar to those of miRNAs when their overall distribution is considered. However, once we allow for differences at the chromosome level in our statistical analysis, we find that the distribution of protein coding genes in fragile sites is very different from that of miRNAs. This is a novel result that we believe may help discover new potential correlations between the localization of miRNAs and their crucial role in biological processes and in the development of diseases.
MicroRNAs (miRNAs) are small RNA molecules that modulate gene expression through degradation of specific mRNAs and/or repression of their translation. miRNAs are involved in both physiological and pathological processes, such as apoptosis and cancer. Their presence has been demonstrated in several organisms as well as in viruses. Virus-encoded miRNAs can act as viral gene expression regulators, but they may also interfere with the expression of host genes. Viral miRNAs may control host cell proliferation by targeting cell-cycle and apoptosis regulators. Therefore, they could be involved in cancer pathogenesis. Computational prediction of miRNA/target pairs is a fundamental step in these studies. Here, we describe the use of miRiam, a novel program based on both thermodynamic features and empirical constraints, to predict interactions between viral miRNAs and human targets. miRiam exploits target mRNA secondary structure accessibility and interaction rules inferred from validated miRNA/mRNA pairs. A set of genes involved in apoptosis and cell-cycle regulation was selected as the target set for our studies. This choice was supported by the knowledge that DNA tumor viruses interfere with these processes in humans. miRNAs were selected from two cancer-related viruses, Epstein-Barr Virus (EBV) and Kaposi-Sarcoma-Associated Herpes Virus (KSHV). Results show that several transcripts possess potential binding sites for these miRNAs. This work has produced a set of plausible hypotheses about the involvement of viral miRNAs (v-miRNAs) and human apoptosis genes in cancer development. Our results suggest that during viral infection, besides the protein-based host regulation mechanism, interference at the post-transcriptional level may exist. miRiam is freely available for download at http://ferrolab.dmi.unict.it/miriam.
miRNA; virus; cancer; apoptosis; cell cycle; EBV; KSHV
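Among the interaction rules a miRNA target predictor such as miRiam can apply, the most basic is pairing between the miRNA seed (nucleotides 2 to 8) and the target mRNA. The Python sketch below covers only that step, assuming perfect Watson-Crick pairing with optional G:U wobble; miRiam's thermodynamic and accessibility features are not modelled, and the function name is hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def seed_sites(mirna, target, allow_wobble=False):
    """Start positions in `target` that pair with miRNA nucleotides 2-8.

    Both sequences are RNA strings written 5'->3'. The miRNA binds antiparallel,
    so the seed is compared against the reversed target window.
    """
    seed = mirna[1:8]  # nucleotides 2-8 (1-based)
    ok = (PAIRS | WOBBLE) if allow_wobble else PAIRS
    hits = []
    for start in range(len(target) - len(seed) + 1):
        window = target[start:start + len(seed)]
        if all((m, t) in ok for m, t in zip(seed, reversed(window))):
            hits.append(start)
    return hits
```

Candidate sites found this way would then be ranked by criteria such as duplex free energy and site accessibility, which is where the thermodynamic features mentioned above come in.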
Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs.
In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task.
Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs.
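Feature locality can be illustrated with a Python sketch that uses label paths as features and records, for each feature, the set of vertices where it starts. A graph survives filtering only if every query vertex keeps at least one compatible start vertex, and the surviving candidate sets can seed the subsequent isomorphism search. The index layout and function names are assumptions in the spirit of the method, not SING's actual data structures.

```python
def path_features(adj, labels, max_len=3):
    """Map each label path (up to max_len vertices) to the set of vertices where it starts."""
    index = {}

    def walk(start, path, visited):
        key = tuple(labels[v] for v in path)
        index.setdefault(key, set()).add(start)
        if len(path) == max_len:
            return
        for v in adj[path[-1]]:
            if v not in visited:
                walk(start, path + [v], visited | {v})

    for u in adj:
        walk(u, [u], {u})
    return index

def candidate_vertices(q_adj, q_labels, g_adj, g_labels, max_len=3):
    """For each query vertex, the graph vertices where all its features also start.

    Returns None when some query vertex has no candidate (the graph is filtered out).
    """
    q_index = path_features(q_adj, q_labels, max_len)
    g_index = path_features(g_adj, g_labels, max_len)
    cands = {u: set(g_adj) for u in q_adj}
    for feature, starts in q_index.items():
        allowed = g_index.get(feature, set())
        for u in starts:
            cands[u] &= allowed  # locality: the feature must start at a matching vertex
    return None if any(not c for c in cands.values()) else cands
```

A plain feature-count filter only asks whether a feature occurs somewhere in the graph; keeping start vertices additionally narrows where each query vertex can map.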
The General Transcription Apparatus (GTA) comprises more than one hundred proteins, including RNA Polymerases, GTFs, TAFs, Mediator, and cofactors such as heterodimeric NC2. This complexity contrasts with the simple mechanical role that these proteins are believed to perform and suggests a still uncharacterized participation in important biological functions, such as the control of cell proliferation.
To verify our hypothesis, we analyzed the involvement in Neuroblastoma (NB) pathogenesis of GTA genes localized at 1p, one of the critical regions in NB. Through RT-PCR of fifty-eight NB biopsies, we demonstrated a statistically significant reduction of the mRNA for NC2β (localized at 1p22.1) in 74% of samples (p = 0.0039). Transcripts from TAF13 and TAF12 (mapping at 1p13.3 and 1p35.3, respectively) were also reduced, whereas we did not detect any quantitative alteration of the mRNAs from GTF2B and NC2α (localized at 1p22-p21 and 11q13.3, respectively). We confirmed these data by comparing tumour and constitutional DNA: most NB samples with diminished levels of NC2β mRNA also had genomic deletions at the corresponding locus.
Our data show that NC2β is specifically involved in NB pathogenesis and may be considered a new NB biomarker. Accordingly, we suggest that NC2β, and possibly other GTA members, are physiologically involved in the control of cell proliferation. Finally, our studies unearth complex selective mechanisms within NB cells.
Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed.
This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It also implements an effective data storage scheme based in part on low-support data mining.
GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique which applies to any searching system also allows a significant index space reduction.
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species.
This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision.
The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
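As a rough Python sketch of HMM-based backtranslation, assume the hidden states are codons that each emit their own amino acid, and that transitions are add-one-smoothed codon-bigram frequencies learned from a hypothetical list of similar, in-frame cDNA sequences (in EasyBack these come from BLAST searches against NCBI). Viterbi decoding then returns the most probable codon sequence. This bigram architecture is an assumption for illustration; EasyBack's actual model may differ.

```python
import math
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}  # standard genetic code

def train(cdnas):
    """Codon-bigram counts from in-frame coding sequences ("^" marks the start)."""
    counts = {}
    for seq in cdnas:
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        for a, b in zip(["^"] + codons, codons):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def backtranslate(protein, counts):
    """Viterbi decoding: states are codons, each emitting its own amino acid."""
    def logp(a, b):
        # add-one smoothed transition probability a -> b over 64 possible codons
        total = sum(n for (x, _), n in counts.items() if x == a)
        return math.log((counts.get((a, b), 0) + 1) / (total + 64))

    best = {"^": (0.0, [])}  # state -> (log-probability, codon path)
    for aa in protein:
        states = [c for c, a in CODE.items() if a == aa]
        best = {c: max((s + logp(prev, c), path + [c]) for prev, (s, path) in best.items())
                for c in states}
    return "".join(max(best.values())[1])
```

The log-probability of the returned path is a natural per-prediction confidence value, which is one way a probabilistic model can support estimating prediction quality.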