Results 1-4 (4)

1.  GO-Module: functional synthesis and improved interpretation of Gene Ontology patterns 
Bioinformatics  2011;27(10):1444-1446.
Summary: GO-Module is a web-accessible synthesis and visualization tool developed for end-user biologists to greatly simplify the interpretation of prioritized Gene Ontology (GO) terms. GO-Module radically reduces the complexity of raw GO results into compact biomodules in two distinct ways, by (i) constructing biomodules from significant GO terms based on hierarchical knowledge, and (ii) refining the GO terms in each biomodule to contain only true positive results. Altogether, the features (biomodules) of GO-Module outputs are better organized and on average four times smaller than the input GO terms list (P = 0.0005, n = 16).
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3087953  PMID: 21421553
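The summary above describes building biomodules from significant GO terms using hierarchical knowledge. The following is only a minimal sketch of that general idea, not GO-Module's actual algorithm: it assumes that significant terms linked by a direct parent-child relation belong in the same module. The function name `build_modules`, the term IDs, and the toy hierarchy are all made up for illustration.

```python
from collections import defaultdict

def build_modules(significant, parents):
    """Group significant terms that are connected by direct parent-child links.

    Assumption: this connected-component grouping is illustrative only and
    is not taken from the GO-Module paper.
    """
    sig = set(significant)
    adj = defaultdict(set)
    # Keep only hierarchy edges whose two endpoints are both significant.
    for child in sig:
        for parent in parents.get(child, ()):
            if parent in sig:
                adj[child].add(parent)
                adj[parent].add(child)
    modules, seen = [], set()
    for term in sorted(sig):
        if term in seen:
            continue
        # Depth-first walk to collect one connected group of terms.
        stack, component = [term], set()
        while stack:
            t = stack.pop()
            if t in component:
                continue
            component.add(t)
            stack.extend(adj[t] - component)
        seen |= component
        modules.append(sorted(component))
    return modules

# Toy hierarchy with hypothetical term IDs (child -> list of parents).
parents = {"T2": ["T1"], "T3": ["T2"], "T5": ["T4"], "T6": ["T9"]}
significant = ["T1", "T2", "T3", "T5", "T4", "T6"]
modules = build_modules(significant, parents)
print(modules)
```

In this toy input, six significant terms collapse into three groups, which mirrors the kind of size reduction the summary reports, though the real tool also refines the terms inside each module.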
2.  Visualizing Information across Multidimensional Post-Genomic Structured and Textual Databases 
Bioinformatics (Oxford, England)  2004;21(8):1659-1667.
Visualizing relations among biological information to facilitate understanding is crucial to biological research during the post-genomic era. Although different systems have been developed to view gene-phenotype relations for specific databases, very few have been designed specifically as a general flexible tool for visualizing multidimensional genotypic and phenotypic information together. Our goal is to develop a method for visualizing multidimensional genotypic and phenotypic information and a model that unifies different biological databases in order to present the integrated knowledge using a uniform interface.
We developed a novel, flexible and generalizable visualization tool, called PhenoGenesviewer (PGviewer), which in this paper was used to display gene-phenotype relations from a human-curated database (OMIM) and from an automatic method using a Natural Language Processing tool called BioMedLEE. Data obtained from multiple databases were first integrated into a uniform structure and then organized by PGviewer. PGviewer provides a flexible query interface that allows dynamic selection and ordering of any desired dimension in the databases. Based on users’ queries, results can be visualized using hierarchical expandable trees that present views specified by users according to their research interests. We believe that this method, which allows users to dynamically organize and visualize multiple dimensions, is a potentially powerful and promising tool that should substantially facilitate biological research.
PMCID: PMC2901923  PMID: 15598839
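The abstract above describes a query interface that lets users pick and order dimensions, with results shown as hierarchical expandable trees. As a rough sketch of that idea (not PGviewer's implementation), records from a uniform structure can be nested by whatever dimension order the user chooses. The function `build_tree`, the field names, and the record values are hypothetical placeholders.

```python
def build_tree(records, dimensions):
    """Nest records into a dict-of-dicts tree, one level per chosen dimension.

    Assumption: each record is a flat dict that has a value for every
    requested dimension; the field names used below are illustrative.
    """
    tree = {}
    for rec in records:
        node = tree
        for dim in dimensions:
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(rec[dim], {})
    return tree

# Placeholder records, not real gene-phenotype data.
records = [
    {"gene": "GENE_A", "phenotype": "PHENO_1", "source": "curated"},
    {"gene": "GENE_A", "phenotype": "PHENO_2", "source": "text-mined"},
    {"gene": "GENE_B", "phenotype": "PHENO_1", "source": "text-mined"},
]

by_gene = build_tree(records, ["gene", "phenotype"])
by_phenotype = build_tree(records, ["phenotype", "gene"])
print(by_gene)
print(by_phenotype)
```

Reordering the `dimensions` list is enough to pivot the same data from a gene-first view to a phenotype-first view, which is the kind of dynamic reorganization the abstract describes.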
3.  Information theory applied to the sparse gene ontology annotation network to predict novel gene function 
Bioinformatics (Oxford, England)  2007;23(13):i529-i538.
Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomics approaches.
We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. Using 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to those of other machine learning algorithms when applied to similarly densely annotated portions of the GO datasets. In addition, the method can generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for evaluation than the commonly used cross-validation. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003.
The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at
PMCID: PMC2882681  PMID: 17646340
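The abstract above does not spell out how ITSS computes similarity, so the following is not the authors' algorithm. It is a small sketch of one standard information-theoretic measure (Resnik similarity), in which a term's information content is the negative log of its annotation frequency and two terms are as similar as their most informative shared ancestor. The ontology, gene names, and annotations are toy data invented for illustration.

```python
import math

def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable via parents."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def information_content(annotations, parents):
    """IC(t) = -log(fraction of genes annotated to t or any descendant)."""
    counts = {}
    for terms in annotations.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t, parents)
        for t in covered:
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}

def resnik(a, b, parents, ic):
    """Similarity = IC of the most informative common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)

# Toy ontology (child -> parents) and toy gene annotations.
parents = {"B": ["ROOT"], "C": ["ROOT"], "D": ["B"], "E": ["B"]}
annotations = {"g1": ["D"], "g2": ["E"], "g3": ["C"], "g4": ["D"]}

ic = information_content(annotations, parents)
sim_de = resnik("D", "E", parents, ic)  # shared ancestor B
sim_dc = resnik("D", "C", parents, ic)  # only ROOT is shared
print(sim_de, sim_dc)
```

Terms that share only the root score zero because the root annotates every gene and so carries no information, while siblings under a rarer parent score higher.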
4.  Bio-Ontologies and Text: Bridging the Modeling Gap Between 
Bioinformatics (Oxford, England)  2006;22(19):2421-2429.
Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. To facilitate the computational reuse and integration of information buried in unstructured text, we propose a schema that represents a comprehensive set of biological entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides links from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination, and knowledge management of heterogeneous information. A comprehensive representation for otherwise heterogeneous datasets, such as the one proposed, is critical for advancing systems biology because it allows for acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text.
A novel representational schema, PGschema, was developed that enables translation of information in textual narratives to a well-defined data structure comprising genotypic and phenotypic concepts from established ontologies along with modifiers and relationships. Initial evaluation for coverage of a selected set of entities showed that 85% of the information could be represented. Moreover, PGschema can be realized automatically in an XML format by using natural language techniques to process the text.
PMCID: PMC2879055  PMID: 16870928
