
Results 1-6 (6)

1.  Inferring gene ontologies from pairwise similarity data 
Bioinformatics  2014;30(12):i34-i42.
Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene–gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms.
Methods addressing these requirements are just beginning to emerge—none has been evaluated for GO inference.
Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method’s ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall).
Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
PMCID: PMC4058954  PMID: 24932003
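The Methods of entry 1 describe finding maximal cliques in a network induced by progressive thresholding of a similarity matrix. The following is a minimal sketch of that general idea only, not the published CliXO algorithm: it builds a graph at each cutoff and enumerates maximal cliques with a basic Bron-Kerbosch search. The function names, the toy similarity values, and the choice of cutoffs are all illustrative assumptions.

```python
from itertools import combinations


def threshold_graph(sim, cutoff):
    """Adjacency sets for the graph with an edge wherever similarity >= cutoff."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def cliques_by_threshold(sim, cutoffs):
    """Map each cutoff (strictest first) to its maximal cliques of size >= 2."""
    result = {}
    for c in sorted(cutoffs, reverse=True):
        adj = threshold_graph(sim, c)
        result[c] = sorted(sorted(q) for q in maximal_cliques(adj) if len(q) >= 2)
    return result


# Toy 4-gene similarity matrix (made-up values, not from the paper).
SIM = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.85, 0.3],
    [0.8, 0.85, 1.0, 0.6],
    [0.2, 0.3, 0.6, 1.0],
]
LEVELS = cliques_by_threshold(SIM, [0.9, 0.8, 0.5])
for cutoff, groups in LEVELS.items():
    print(cutoff, groups)
```

In this toy example, gene 2 ends up in two overlapping cliques at the loosest cutoff, which is the kind of multiple membership the abstract refers to as pleiotropy. How cliques at successive thresholds are linked into a DAG of terms is not covered by this sketch.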
2.  PiNGO: a Cytoscape plugin to find candidate genes in biological networks 
Bioinformatics  2011;27(7):1030-1031.
Summary: PiNGO is a tool to screen biological networks for candidate genes, i.e. genes predicted to be involved in a biological process of interest. The user can narrow the search to genes with particular known functions or exclude genes belonging to particular functional classes. PiNGO provides support for a wide range of organisms and Gene Ontology classification schemes, and it can easily be customized for other organisms and functional classifications. PiNGO is implemented as a plugin for Cytoscape, a popular network visualization platform.
Availability: PiNGO is distributed as an open-source Java package under the GNU General Public License and can be downloaded via the Cytoscape plugin manager. A detailed user guide and tutorial are available on the PiNGO website.
PMCID: PMC3065683  PMID: 21278188
3.  Cytoscape 2.8: new features for data integration and network visualization 
Bioinformatics  2010;27(3):431-432.
Summary: Cytoscape is a popular bioinformatics package for biological network visualization and data integration. Version 2.8 introduces two powerful new features—Custom Node Graphics and Attribute Equations—which can be used jointly to greatly enhance Cytoscape's data integration and visualization capabilities. Custom Node Graphics allow an image to be projected onto a node, including images generated dynamically or at remote locations. Attribute Equations provide Cytoscape with spreadsheet-like functionality in which the value of an attribute is computed dynamically as a function of other attributes and network properties.
Availability and implementation: Cytoscape is a desktop Java application released under the GNU Lesser General Public License (LGPL). Binary install bundles and source code for Cytoscape 2.8 are available for download.
PMCID: PMC3031041  PMID: 21149340
4.  Evidence mining and novelty assessment of protein–protein interactions with the ConsensusPathDB plugin for Cytoscape 
Bioinformatics  2010;26(21):2796-2797.
Summary: Protein–protein interaction detection methods are applied on a daily basis by molecular biologists worldwide. After generating a set of potential interactions, biologists face the problem of highlighting the ones that are novel and collecting evidence with respect to literature and annotation. This task can be as tedious as searching for every predicted interaction in several interaction data repositories, or manually screening the scientific literature. To facilitate the task of evidence mining and novelty assessment of protein–protein interactions, we have developed a Cytoscape plugin that automatically mines publication references, database references, interaction detection method descriptions and pathway annotation for a user-supplied network of interactions. The basis for the annotation is ConsensusPathDB—a meta-database that integrates numerous protein–protein, signaling, metabolic and gene regulatory interaction repositories for currently three species: Homo sapiens, Saccharomyces cerevisiae and Mus musculus.
Availability: The ConsensusPathDB plugin for Cytoscape (version 2.7.0 or later) can be installed through Cytoscape's Plugin manager (category ‘Network and Attribute I/O’) on any major operating system (Windows, Mac OS, Unix/Linux) with Sun Java 1.5 or later installed. The plugin is freely available for download on the ConsensusPathDB web site.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2958747  PMID: 20847220
5.  Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes 
Bioinformatics  2010;26(18):i531-i539.
Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures—specifically dimension reduction (DR), coupled with clustering—provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space.
Methods: ‘Minimum Curvilinearity’ (MC) is a principle that—for small datasets—suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering.
Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin.
Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2935424  PMID: 20823318
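The Methods of entry 5 describe approximating curvilinear sample distances by pairwise distances over the minimum spanning tree (MST). A minimal sketch of that distance computation, assuming Euclidean distance between samples and using Prim's algorithm to build the tree, might look as follows. The function names and the toy points are illustrative assumptions, and the embedding (MCE) and clustering (MCAP) steps that use these distances are not shown.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    tree = {i: [] for i in range(n)}
    # best[v] = (cheapest known edge weight into the tree, tree endpoint)
    best = {i: (euclidean(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        v = min(best, key=lambda k: best[k][0])
        w, parent = best.pop(v)
        tree[parent].append((v, w))
        tree[v].append((parent, w))
        for u in best:
            d = euclidean(points[v], points[u])
            if d < best[u][0]:
                best[u] = (d, v)
    return tree


def mst_path_distances(points):
    """Pairwise distances measured along the unique MST path between samples."""
    n = len(points)
    tree = minimum_spanning_tree(points)
    dist = [[0.0] * n for _ in range(n)]
    for src in range(n):
        stack = [(src, -1, 0.0)]  # (node, node we came from, distance so far)
        while stack:
            node, parent, d = stack.pop()
            dist[src][node] = d
            for nxt, w in tree[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return dist


# Toy samples lying along a bent path (made-up coordinates).
POINTS = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 3.0)]
D = mst_path_distances(POINTS)
print(D[0][3], euclidean(POINTS[0], POINTS[3]))
```

For the first and last toy samples, the distance along the tree is longer than the straight-line distance, because the path follows the bend in the data. No tuning parameter is needed, which matches the property the abstract highlights.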
6.  Correcting for gene-specific dye bias in DNA microarrays using the method of maximum likelihood 
In two-color microarray experiments, well-known differences exist in the labeling and hybridization efficiency of Cy3 and Cy5 dyes. Previous reports have revealed that these differences can vary on a gene-by-gene basis, an effect termed gene-specific dye bias. If uncorrected, this bias can influence the determination of differentially expressed genes.
We show that the magnitude of the bias scales multiplicatively with signal intensity and is dependent on which nucleotide has been conjugated to the fluorescent dye. A method is proposed to account for gene-specific dye bias within a maximum-likelihood error modeling framework. Using two different labeling schemes, we show that correcting for gene-specific dye bias results in the superior identification of differentially expressed genes within this framework. Improvement is also possible in related ANOVA approaches.
PMCID: PMC2811084  PMID: 17623705