Results 1-5 (5)
 

1.  Automatic categorization of diverse experimental information in the bioscience literature 
BMC Bioinformatics  2012;13:16.
Background
Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify, from all published literature, the papers that contain results for the specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest, and is therefore usually time-consuming. We developed an automatic method, based on the Support Vector Machine (SVM) machine learning method, for identifying papers containing these curation data types among a large pool of published scientific papers. This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in production use for automatic categorization of ten different experimental data types in the biocuration process at WormBase for the past two years, and it is being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community, thereby greatly reducing the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure that uses training papers of similar data types from different bodies of literature, such as C. elegans and D. melanogaster, to identify papers with any of these data types for a single database. This approach is significant because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.
Results
We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types: RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.
Conclusions
Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and can thus be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
doi:10.1186/1471-2105-13-16
PMCID: PMC3305665  PMID: 22280404
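The classification idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy "papers", the bag-of-words features, the hyperparameters, and the omission of an intercept term are all assumptions made for illustration. It trains a linear SVM by batch subgradient descent on the regularised hinge loss, and pools training examples from two hypothetical corpora (worm and fly) to mirror the cross-corpus procedure the abstract describes.

```python
import math
from collections import Counter

# Hypothetical toy "papers"; label +1 = contains RNAi data, -1 = does not.
WORM_PAPERS = [
    ("rnai knockdown caused embryonic lethality", 1),
    ("feeding rnai screen identified molting genes", 1),
    ("antibody staining showed nuclear protein localization", -1),
    ("crystal structure solved at high resolution", -1),
]
FLY_PAPERS = [
    ("rnai knockdown reduced wing growth", 1),
    ("genome assembly statistics reported", -1),
]


def build_vocab(texts):
    """Sorted list of every whitespace-separated token in the training texts."""
    return sorted({tok for text in texts for tok in text.lower().split()})


def vectorize(text, vocab):
    """L2-normalised bag-of-words vector; out-of-vocabulary words are ignored."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Batch subgradient descent on lam/2*|w|^2 + mean hinge loss (no intercept)."""
    n = len(X)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]          # gradient of the regulariser
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            if yi * score < 1.0:               # margin violated: hinge is active
                for j, xj in enumerate(xi):
                    grad[j] -= yi * xj / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def predict(text, w, vocab):
    """True if the paper is classified as containing the data type."""
    x = vectorize(text, vocab)
    return sum(wj * xj for wj, xj in zip(w, x)) > 0.0


# Pool training papers from both corpora, as in the cross-corpus procedure.
training = WORM_PAPERS + FLY_PAPERS
vocab = build_vocab(text for text, _ in training)
X = [vectorize(text, vocab) for text, _ in training]
y = [label for _, label in training]
w = train_linear_svm(X, y)

print(predict("rnai screen for lethality", w, vocab))
print(predict("antibody staining of protein structure", w, vocab))
```

A production system would presumably use full-text features and an established SVM library rather than this hand-rolled trainer; the sketch only shows the shape of the train-then-flag workflow.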
2.  Worm Phenotype Ontology: Integrating phenotype data within and beyond the C. elegans community 
BMC Bioinformatics  2011;12:32.
Background
Caenorhabditis elegans gene-based phenotype information dates back to the 1970s, beginning with Sydney Brenner and the characterization of behavioral and morphological mutant alleles via classical genetics in order to understand nervous system function. Since then, C. elegans has become an important genetic model system for the study of basic biological and biomedical principles, largely through the use of phenotype analysis. Because of the growth of C. elegans as a genetically tractable model organism and the development of large-scale analyses, there has been a significant increase in phenotype data that needs to be managed and made accessible to the research community. To do so, a standardized vocabulary is necessary to integrate phenotype data from diverse sources, permit integration with other data types and render the data in a computable form.
Results
We describe a hierarchically structured, controlled vocabulary of terms that can be used to standardize phenotype descriptions in C. elegans, namely the Worm Phenotype Ontology (WPO). The WPO currently comprises 1,880 phenotype terms, 74% of which have been used in the annotation of phenotypes associated with more than 18,000 C. elegans genes. The scope of the WPO is not limited to C. elegans biology; it is also designed to incorporate phenotypes observed in related nematode species. We have enriched the value of the WPO by integrating it with other ontologies, thereby increasing the accessibility of worm phenotypes to non-nematode biologists. We are actively developing the WPO to continue to fulfill the evolving needs of the scientific community and hope to engage researchers in this crucial endeavor.
Conclusions
We provide a phenotype ontology (WPO) that will help to facilitate data retrieval and cross-species comparisons within the nematode community. In the larger scientific community, the WPO will permit data integration and interoperability across the different Model Organism Databases (MODs) and other biological databases. This standardized phenotype ontology will therefore allow for more complex data queries and enhance bioinformatic analyses.
doi:10.1186/1471-2105-12-32
PMCID: PMC3039574  PMID: 21261995
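To make concrete why a hierarchically structured vocabulary supports richer queries, here is a minimal sketch. The term identifiers, term names, and gene names are hypothetical placeholders, not real WPO entries; the point is only that a gene annotated with a specific term can also be retrieved by querying any of that term's ancestors.

```python
# Hypothetical ontology fragment: each term maps to its parent terms.
PARENTS = {
    "EX:0001": [],            # phenotype (root)
    "EX:0002": ["EX:0001"],   # behavior phenotype
    "EX:0003": ["EX:0002"],   # locomotion phenotype
    "EX:0004": ["EX:0001"],   # morphology phenotype
}

# Hypothetical gene-to-term annotations, always made to the most specific term.
ANNOTATIONS = {
    "gene-a": {"EX:0003"},
    "gene-b": {"EX:0004"},
}


def ancestors(term):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_with(term):
    """Genes annotated with `term` itself or with any of its descendants."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term == t or term in ancestors(t) for t in terms)
    )


print(genes_with("EX:0002"))  # query by a general behavior term
print(genes_with("EX:0001"))  # query by the root term
```

Annotating to the most specific term and expanding through ancestors at query time is one common design; precomputing the ancestor closure is an alternative when the ontology is large.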
3.  WormBase: a comprehensive resource for nematode research 
Nucleic Acids Research  2009;38(Database issue):D463-D467.
WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.
doi:10.1093/nar/gkp952
PMCID: PMC2808986  PMID: 19910365
4.  WormBase 2007 
Nucleic Acids Research  2007;36(Database issue):D612-D617.
WormBase (www.wormbase.org) is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but also for the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of the genome sequences of other nematode species.
doi:10.1093/nar/gkm975
PMCID: PMC2238927  PMID: 17991679
5.  The tailless Ortholog nhr-67 Regulates Patterning of Gene Expression and Morphogenesis in the C. elegans Vulva 
PLoS Genetics  2007;3(4):e69.
Regulation of spatio-temporal gene expression in diverse cell and tissue types is a critical aspect of development. Progression through Caenorhabditis elegans vulval development leads to the generation of seven distinct vulval cell types (vulA, vulB1, vulB2, vulC, vulD, vulE, and vulF), each with its own unique gene expression profile. The mechanisms that establish the precise spatial patterning of these mature cell types are largely unknown. Dissection of the gene regulatory networks involved in vulval patterning and differentiation would help us understand how cells generate a spatially defined pattern of cell fates during organogenesis. We disrupted the activity of 508 transcription factors via RNAi and assayed the expression of ceh-2, a marker for vulB fate during the L4 stage. From this screen, we identified the tailless ortholog nhr-67 as a novel regulator of gene expression in multiple vulval cell types. We find that one way in which nhr-67 maintains cell identity is by restricting inappropriate cell fusion events in specific vulval cells, namely vulE and vulF. nhr-67 exhibits a dynamic expression pattern in the vulval cells and interacts with three other transcriptional regulators cog-1 (Nkx6.1/6.2), lin-11 (LIM), and egl-38 (Pax2/5/8) to generate the composite expression patterns of their downstream targets. We provide evidence that egl-38 regulates gene expression in vulB1, vulC, vulD, vulE, as well as vulF cells. We demonstrate that the pairwise interactions between these regulatory genes are complex and vary among the seven cell types. We also discovered a striking regulatory circuit that affects a subset of the vulval lineages: cog-1 and nhr-67 inhibit both one another and themselves. We postulate that the differential levels and combinatorial patterns of lin-11, cog-1, and nhr-67 expression are a part of a regulatory code for the mature vulval cell types.
Author Summary
During development, in which the single-celled egg generates a whole organism, cells become different from each other and form patterns of cell types. It is these spatially defined fate patterns that underlie the formation of complex organs. Regulatory molecules called transcription factors influence the fate patterns that cells adopt. Understanding the role of these transcription factors and their interactions with other genes could tell us how cells establish a certain pattern of cell fates. This study focuses on how the seven cell types of the Caenorhabditis elegans vulva arise. This organ is one of the most intensively studied, and while the signaling network that initiates vulval development and sets the gross pattern of cell differentiation is well understood, the network of transcription factors that specifies the final cell fates is not. Here, we identify nhr-67, a new transcription factor that regulates patterning of cell fates in this organ. Transcription factors do not necessarily act alone, and we explore how NHR-67 works with three other regulatory factors (each with human homologs) to specify the different properties of the vulval cells. We also demonstrate that the interconnections of these transcription factors differ among the seven cell types, which may partially account for how these cells acquire a certain pattern of cell fates.
doi:10.1371/journal.pgen.0030069
PMCID: PMC1857733  PMID: 17465684
