Search tips
Search criteria

Results 1-8 (8)

Clipboard (0)
Year of Publication
1.  RIDDLE: reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network 
Genome Biology  2012;13(12):R125.
The growing availability of large-scale functional networks has promoted the development of many successful techniques for predicting functions of genes. Here we extend these network-based principles and techniques to functionally characterize whole sets of genes. We present RIDDLE (Reflective Diffusion and Local Extension), which uses well developed guilt-by-association principles upon a human gene network to identify associations of gene sets. RIDDLE is particularly adept at characterizing sets with no annotations, a major challenge where most traditional set analyses fail. Notably, RIDDLE found microRNA-450a to be strongly implicated in ocular diseases and development. A web application is available at
PMCID: PMC4056375  PMID: 23268829
2.  Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy 
Genome Biology  2008;9(Suppl 1):S5.
The complete set of mouse genes, as with the set of human genes, is still largely uncharacterized, with many pieces of experimental evidence accumulating regarding the activities and expression of the genes, but the majority of genes as yet still of unknown function. Within the context of the MouseFunc competition, we developed and applied two distinct large-scale data mining approaches to infer the functions (Gene Ontology annotations) of mouse genes from experimental observations from available functional genomics, proteomics, comparative genomics, and phenotypic data. The two strategies — the first using classifiers to map features to annotations, the second propagating annotations from characterized genes to uncharacterized genes along edges in a network constructed from the features — offer alternative and possibly complementary approaches to providing functional annotations. Here, we re-implement and evaluate these approaches and their combination for their ability to predict the proper functional annotations of genes in the MouseFunc data set. We show that, when controlling for the same set of input features, the network approach generally outperformed a naïve Bayesian classifier approach, while their combination offers some improvement over either independently. We make our observations of predictive performance on the MouseFunc competition hold-out set, as well as on a ten-fold cross-validation of the MouseFunc data. Across all 1,339 annotated genes in the MouseFunc test set, the median predictive power was quite strong (median area under a receiver operating characteristic plot of 0.865 and average precision of 0.195), indicating that a mining-based strategy with existing data is a promising path towards discovering mammalian gene functions. As one product of this work, a high-confidence subset of the functional mouse gene network was produced — spanning >70% of mouse genes with >1.6 million associations — that is predictive of mouse (and therefore often human) gene function and functional associations. The network should be generally useful for mammalian gene functional analyses, such as for predicting interactions, inferring functional connections between genes and pathways, and prioritizing candidate genes. The network and all predictions are available on the worldwide web.
PMCID: PMC2447539  PMID: 18613949
3.  A critical assessment of Mus musculus gene function prediction using integrated genomic evidence 
Genome Biology  2008;9(Suppl 1):S2.
Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.
In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.
We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
PMCID: PMC2447536  PMID: 18613946
4.  Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes 
Genome Biology  2007;8(12):R258.
Loss-of-function phenotypes of yeast genes can be predicted from the loss-of-function phenotypes of their neighbours in functional gene networks. This could potentially be applied to the prediction of human disease genes.
We demonstrate that loss-of-function yeast phenotypes are predictable by guilt-by-association in functional gene networks. Testing 1,102 loss-of-function phenotypes from genome-wide assays of yeast reveals predictability of diverse phenotypes, spanning cellular morphology, growth, metabolism, and quantitative cell shape features. We apply the method to extend a genome-wide screen by predicting, then verifying, genes whose disruption elongates yeast cells, and to predict human disease genes. To facilitate network-guided screens, a web server is available .
PMCID: PMC2246260  PMID: 18053250
5.  How complete are current yeast and human protein-interaction networks? 
Genome Biology  2006;7(11):120.
How can protein-interaction networks can be made more complete?
We estimate the full yeast protein-protein interaction network to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively. Paradoxically, releasing raw, unfiltered assay data might help separate true from false interactions.
PMCID: PMC1794583  PMID: 17147767
6.  Systematic profiling of cellular phenotypes with spotted cell microarrays reveals mating-pheromone response genes 
Genome Biology  2006;7(1):R6.
Spotted cell microarrays were developed for measuring cellular phenotypes on a large scale and used to identify genes involved in the response of yeast to mating pheromone.
We have developed spotted cell microarrays for measuring cellular phenotypes on a large scale. Collections of cells are printed, stained for subcellular features, then imaged via automated, high-throughput microscopy, allowing systematic phenotypic characterization. We used this technology to identify genes involved in the response of yeast to mating pheromone. Besides morphology assays, cell microarrays should be valuable for high-throughput in situ hybridization and immunoassays, enabling new classes of genetic assays based on cell imaging.
PMCID: PMC1431703  PMID: 16507139
7.  Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome 
Genome Biology  2005;6(5):R40.
In order to consolidate the known human proteins interactions two tests were developed to measure the relative accuracy of the available interaction data. In addition, 6,580 interactions among 3,737 human proteins were recovered from Medline abstracts and combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing data sets.
Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.
We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.
These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
PMCID: PMC1175952  PMID: 15892868
8.  Assembling a jigsaw puzzle with 20,000 parts 
Genome Biology  2003;4(6):323.
A report on the Keystone Symposium 'Proteomics: Technologies and Applications', Keystone, USA, 25-30 March 2003.
A report on the Keystone Symposium 'Proteomics: Technologies and Applications', Keystone, USA, 25-30 March 2003.
PMCID: PMC193613  PMID: 12801408

Results 1-8 (8)