is a soil nematode whose small size (1 mm), fast generation time (3 days), ease of culturing, and the ability to maintain strains by either clonal or sexual reproduction have all contributed to its widespread use as a genetic model organism. Furthermore, C. elegans is transparent, exhibits an invariant cell lineage and has a relatively simple nervous system, facilitating studies of development and nervous system function. Finally, its small genome size and gene complement (100 Mbp, 19 473 genes) and complete genome sequence have extended the benefits of C. elegans to studies in genomics and proteomics (1).
WormBase is a collaborative project whose aim is to consolidate the considerable information on the biology of C. elegans. In particular, we seek to provide access to these data in a user-friendly format without compromising the power and flexibility of more advanced queries. As an advanced genetic model organism database, WormBase contains substantial information in the areas that helped establish the worm as a model organism. These data include: (i) the essentially complete genome sequence (3); (ii) the developmental lineage of the worm (4); (iii) the connectivity of the nervous system (6); (iv) mutant phenotypes, genetic markers and genetic map information; (v) gene expression described at the level of single cells; and (vi) bibliographic resources including paper abstracts and author contact information.
WormBase continues to emphasize curation of its central information infrastructure while expanding the biological scope of the resource. Current objectives include systematic curation of the C. elegans literature, integration of large-scale, community-submitted datasets and development of simplified methods of data access. Some of the new features of the WormBase resource are surveyed below.
Of the substantial additions and refinements to the user interface, the most apparent is the new Genome Browser (7). It features a highly configurable interface, preservation of user preferences, semantic zooming, an extensible plug-in-based architecture and the ability to display user annotations within the context of the C. elegans genome. Users can enter the Genome Browser through hypertext links from related report pages, or can search from the Genome Browser interface directly using a marker name or position, chromosomal coordinates or a description of biological function. When a search returns multiple items, a selection display shows where each item lies in the genome; selecting an individual item takes the user to the graphical representation of that region of the genome (Fig. ).
An expanded view of the Genome Browser, displaying a variety of features including genes, alignments with C. briggsae, ESTs and RNAi experiments.
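The search by chromosomal coordinates described above can be illustrated with a small query parser. This is a minimal sketch, assuming a query syntax of the form landmark:start..end (with a hyphen accepted as an alternative separator); that syntax, the Query type and the parse_query function are hypothetical and are not taken from the Genome Browser's own code. A string without coordinates is treated as a landmark name, such as a marker, to be looked up.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical query syntax: "landmark:start..end" or "landmark:start-end".
# Commas are allowed as thousands separators in the coordinates.
_RANGE = re.compile(r"^(?P<ref>[^:\s]+):(?P<start>\d[\d,]*)(?:\.\.|-)(?P<end>\d[\d,]*)$")


@dataclass
class Query:
    landmark: str                # chromosome or a named feature such as a marker
    start: Optional[int] = None  # start coordinate, if a range was given
    end: Optional[int] = None    # end coordinate, if a range was given


def parse_query(text: str) -> Query:
    """Split a search string into a landmark and an optional coordinate range."""
    text = text.strip()
    match = _RANGE.match(text)
    if match is None:
        # No coordinates: treat the whole string as a name to look up.
        return Query(landmark=text)
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        # Accept reversed coordinates by normalising them to start <= end.
        start, end = end, start
    return Query(match.group("ref"), start, end)


print(parse_query("III:1,000..25,000"))  # Query(landmark='III', start=1000, end=25000)
print(parse_query("unc-13"))             # Query(landmark='unc-13', start=None, end=None)
```

Returning a landmark-only query for unparseable input keeps the name-based search path (marker names, gene names) separate from the coordinate path.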
The Genome Browser was developed as a central component of the Generic Model Organism Database (GMOD) project, an effort to develop extensible and reusable components for model organism databases. More information on the Genome Browser and the software that drives it is available from the GMOD website (http://www.gmod.org/).
Caenorhabditis briggsae sequence and gene predictions
In addition to maintaining information on C. elegans, WormBase now contains the essentially complete genomic sequence of the related nematode C. briggsae. These data will be useful for identifying regions conserved between the two genomes, verifying and correcting gene models, identifying gene family expansions and contractions, and studying functional differences between specific proteins.
The C. briggsae genome is browsable and searchable both in the C. elegans Genome Browser and in a separate instance that displays C. briggsae-specific analyses. In the C. elegans Genome Browser, nucleotide-level alignments to C. briggsae calculated using the BLAT (8) and WABA (9) algorithms are displayed (Fig. ). The C. briggsae-specific Genome Browser displays gene predictions from a number of ab initio gene prediction algorithms, together with full WABA and BLATX alignments to C. elegans.
Raw analysis files for the C. briggsae genome can be retrieved from the WormBase FTP site (ftp.wormbase.org).
Single nucleotide polymorphisms (SNPs)
WormBase now includes 6386 SNPs (10). These SNPs have been of great utility to the research community, accelerating the pace of genetic mapping and increasing its resolution. To facilitate the identification of SNPs and genetic markers suitable for mapping experiments, two new searches have been developed. Given a genetic or physical interval, both searches return all genetic and physical markers contained within it. The basic search provides quick access to this information; the advanced search additionally lets users restrict the returned markers by ease of scoring, lethality to the organism and, for SNPs, whether they generate a restriction fragment length polymorphism. Freely available strains carrying the genetic markers are also listed, simplifying the acquisition of experimental reagents.
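The difference between the basic and the advanced interval search described above can be sketched as one filter function with optional restrictions. This is an illustrative sketch only: the Marker record, its fields and the markers_in_interval function are hypothetical, the marker data are invented, and only physical (base-pair) intervals are handled, whereas WormBase also accepts genetic intervals.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Marker:
    name: str
    chromosome: str
    position: int          # physical position in base pairs
    kind: str              # "SNP" or another marker type, e.g. "mutation"
    easy_to_score: bool
    lethal: bool
    is_rflp: bool = False  # only meaningful when kind == "SNP"


def markers_in_interval(
    markers: Iterable[Marker],
    chromosome: str,
    start: int,
    end: int,
    *,
    easy_only: bool = False,
    exclude_lethal: bool = False,
    rflp_snps_only: bool = False,
) -> List[Marker]:
    """Return markers inside [start, end] on one chromosome, sorted by position.

    With no keyword flags this behaves like a basic interval search; the
    flags mimic the extra restrictions of an advanced search.
    """
    hits = []
    for m in markers:
        if m.chromosome != chromosome or not start <= m.position <= end:
            continue
        if easy_only and not m.easy_to_score:
            continue
        if exclude_lethal and m.lethal:
            continue
        # The RFLP restriction applies to SNPs only; other markers pass through.
        if rflp_snps_only and m.kind == "SNP" and not m.is_rflp:
            continue
        hits.append(m)
    return sorted(hits, key=lambda m: m.position)


# Hypothetical markers, invented for illustration.
markers = [
    Marker("snpA", "II", 1_200_000, "SNP", easy_to_score=True, lethal=False, is_rflp=True),
    Marker("snpB", "II", 1_500_000, "SNP", easy_to_score=True, lethal=False),
    Marker("mutC", "II", 1_350_000, "mutation", easy_to_score=False, lethal=True),
    Marker("snpD", "IV", 1_300_000, "SNP", easy_to_score=True, lethal=False, is_rflp=True),
]
basic = markers_in_interval(markers, "II", 1_000_000, 2_000_000)
advanced = markers_in_interval(markers, "II", 1_000_000, 2_000_000,
                               easy_only=True, exclude_lethal=True, rflp_snps_only=True)
print([m.name for m in basic])     # ['snpA', 'mutC', 'snpB']
print([m.name for m in advanced])  # ['snpA']
```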
Results from systematic RNA interference (RNAi) screens, as well as from individual experiments, are now available for 7242 genes. These data are accessible in two ways: first, they can be searched by phenotype through a dedicated RNAi search page; second, they are presented on the descriptive pages for individual genes (Fig. ). Thus, both users searching for candidate genes whose disruption produces a specific phenotype and users studying the function of individual genes have direct access to the data. Experimental results include descriptive terms for the mutant phenotype, the span of the targeted gene segment and, in many cases, QuickTime movies showing the resulting phenotype.
A typical page for an RNAi experiment. QuickTime movies of many results help to illustrate developmental timing defects.
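The two routes into the RNAi data, by phenotype and by gene, amount to indexing the same results twice. The sketch below is a minimal illustration under that assumption; RNAiResult, RNAiIndex and its methods are hypothetical names, not WormBase's schema, and the experiments and gene names are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class RNAiResult:
    experiment_id: str
    gene: str
    phenotypes: Tuple[str, ...]   # descriptive phenotype terms
    target_span: Tuple[int, int]  # start and end of the targeted gene segment


class RNAiIndex:
    """Index RNAi results both by gene and by phenotype term."""

    def __init__(self, results: Iterable[RNAiResult]) -> None:
        self.by_gene: Dict[str, List[RNAiResult]] = defaultdict(list)
        self.by_phenotype: Dict[str, Set[str]] = defaultdict(set)
        for result in results:
            self.by_gene[result.gene].append(result)
            for term in result.phenotypes:
                # Normalise case so "Embryonic lethal" and "embryonic lethal" match.
                self.by_phenotype[term.lower()].add(result.gene)

    def genes_with_phenotype(self, term: str) -> List[str]:
        """Phenotype-centred access: candidate genes for a phenotype."""
        return sorted(self.by_phenotype.get(term.lower(), set()))

    def experiments_for_gene(self, gene: str) -> List[RNAiResult]:
        """Gene-centred access: all experiments targeting one gene."""
        return list(self.by_gene.get(gene, []))


# Hypothetical experiments, invented for illustration.
results = [
    RNAiResult("exp1", "geneA", ("embryonic lethal",), (100, 900)),
    RNAiResult("exp2", "geneB", ("slow growth", "Embryonic lethal"), (50, 700)),
    RNAiResult("exp3", "geneA", ("sterile",), (100, 900)),
]
index = RNAiIndex(results)
print(index.genes_with_phenotype("embryonic lethal"))                  # ['geneA', 'geneB']
print([r.experiment_id for r in index.experiments_for_gene("geneA")])  # ['exp1', 'exp3']
```

Using .get on the defaultdicts in the lookup methods avoids silently creating empty entries for unknown genes or phenotypes.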
Expression patterns and profiles
In addition to information on reporter gene constructs, WormBase now presents data from consolidated microarray experiments covering a variety of life stages and conditions (11). In the analysis of these experiments, genes are grouped according to their tissue or life-stage expression profiles; genes expressed in similar tissues coalesce into a mountain when displayed as a 3D topology. WormBase displays these microarray profiles in a 2D representation, allowing users to see to which mountain a particular gene of interest belongs and to identify genes likely to be coexpressed with it based on their distance from that mountain in the display.
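The two questions the 2D display answers, which mountain a gene belongs to and which genes lie close to that mountain, can be sketched with plain Euclidean distances. This is a simplified illustration, assuming each mountain can be represented by a single centre point in the 2D map; the function names, mountain names and gene coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_mountain(point: Point, mountains: Dict[str, Point]) -> str:
    """Return the name of the mountain whose centre is closest to point."""
    return min(mountains, key=lambda name: math.dist(point, mountains[name]))


def genes_near_mountain(
    coords: Dict[str, Point], mountains: Dict[str, Point], mountain: str, radius: float
) -> List[Tuple[str, float]]:
    """Genes within radius of a mountain's centre, closest first."""
    centre = mountains[mountain]
    hits = [(gene, math.dist(xy, centre)) for gene, xy in coords.items()]
    return sorted((h for h in hits if h[1] <= radius), key=lambda h: h[1])


# Hypothetical map coordinates, invented for illustration.
mountains = {"mount_A": (0.0, 0.0), "mount_B": (10.0, 0.0)}
coords = {"g1": (0.5, 0.0), "g2": (1.0, 1.0), "g3": (9.0, 0.0), "g4": (4.0, 4.0)}

home = nearest_mountain(coords["g1"], mountains)
print(home)                                                        # mount_A
print([g for g, _ in genes_near_mountain(coords, mountains, home, 2.0)])  # ['g1', 'g2']
```

Representing a mountain by one point is the main simplification here; a region-based definition would replace nearest_mountain with a point-in-region test.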
Although substantial information on the anatomy of the worm is available, it is presented at a level of abstraction that makes it difficult to relate to the organism as a whole. To make these data more accessible, WormBase has implemented neuron-specific search pages, which let users search for neurons of specific classes. For example, a user can search for all neurons derived from a specific lineage, or for all GABA-releasing neurons.
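The neuron searches described above combine independent criteria (class, lineage of origin, neurotransmitter). The sketch below shows one way to express them, assuming each neuron record stores its ancestor cells explicitly so that "derived from a lineage" becomes a membership test. The Neuron record, search_neurons and all neuron and cell names are placeholders, not real C. elegans data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Neuron:
    name: str
    neuron_class: str
    ancestors: Tuple[str, ...]  # lineage path: founder cell first, parent cell last
    transmitters: FrozenSet[str] = frozenset()


def search_neurons(
    neurons: Iterable[Neuron],
    *,
    neuron_class: Optional[str] = None,
    lineage: Optional[str] = None,
    transmitter: Optional[str] = None,
) -> List[str]:
    """Names of neurons matching every criterion that was supplied."""
    hits = []
    for n in neurons:
        if neuron_class is not None and n.neuron_class != neuron_class:
            continue
        # "Derived from a lineage" means the given cell is one of the ancestors.
        if lineage is not None and lineage not in n.ancestors:
            continue
        if transmitter is not None and transmitter not in n.transmitters:
            continue
        hits.append(n.name)
    return sorted(hits)


# Placeholder neurons and ancestor cells, not real C. elegans data.
neurons = [
    Neuron("N1", "motor", ("cellA", "cellAp"), frozenset({"GABA"})),
    Neuron("N2", "sensory", ("cellA", "cellAa"), frozenset({"acetylcholine"})),
    Neuron("N3", "motor", ("cellB", "cellBp"), frozenset({"GABA"})),
]
print(search_neurons(neurons, transmitter="GABA"))                   # ['N1', 'N3']
print(search_neurons(neurons, lineage="cellA"))                      # ['N1', 'N2']
print(search_neurons(neurons, lineage="cellA", transmitter="GABA"))  # ['N1']
```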
Gene verification and model correction
Of major interest to end users is the integrity and accuracy of gene predictions, and WormBase is using a variety of methods to address difficulties in gene prediction. First, data from the C. elegans ORFeome project have been integrated into the WormBase infrastructure. By systematically amplifying genes from a cDNA library, the ORFeome project sought to define all transcribed genes and correct their splicing patterns (12). These experiments, in the form of PCR primer pairs and amplified products, are displayed on the Genome Browser, allowing users to quickly determine whether a predicted gene was amplified by the technique. Second, gene models have been enhanced by the annotation and display of UTRs on the Genome Browser. Finally, gene models continue to be revised on the basis of user submissions and literature curation, and a new gene representation in the database schema is being implemented to track revisions in gene structure.
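Determining whether a predicted gene was amplified reduces to comparing two genomic intervals. The sketch below assumes that each amplicon has been mapped to a genomic span and counts a gene as fully amplified when that span covers the gene's predicted coding region; the Span type, the amplification_status function, the three status labels and the coordinates are all hypothetical, and the sketch ignores exon structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    chromosome: str
    start: int  # inclusive
    end: int    # inclusive


def amplification_status(gene: Span, amplicon: Span) -> str:
    """Classify how a mapped amplicon covers a predicted coding region."""
    if (gene.chromosome != amplicon.chromosome
            or amplicon.end < gene.start
            or amplicon.start > gene.end):
        return "not amplified"
    if amplicon.start <= gene.start and amplicon.end >= gene.end:
        return "full length"
    return "partial"


gene = Span("I", 1_000, 2_000)  # hypothetical predicted gene
for amp in (Span("I", 990, 2_010), Span("I", 1_500, 2_500), Span("II", 1_000, 2_000)):
    print(amp, amplification_status(gene, amp))
```

A "partial" result is the interesting case for model correction, since it may point to a mispredicted start or stop rather than a failed amplification.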
The Gene Ontology project is an effort to develop a controlled vocabulary to describe biological processes and molecular functions (http://www.geneontology.org/). WormBase staff work in close collaboration with the Gene Ontology project. Currently, 11 593 terms have been assigned, and a new Gene Ontology browser makes it easy to search for genes associated with a specific ontology term. For example, a search for MAP kinase displays all genes associated with MAP kinase function. Results are displayed in a hierarchical format, allowing users to explore the ontology structure by selecting terms above or below their gene of interest.
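Because the ontology is a hierarchy, a term search usually has to collect genes annotated to the term itself and to every more specific term beneath it, following the common convention that an annotation to a term also implies its ancestors. The breadth-first sketch below illustrates this, together with a lookup of the terms one level above; genes_for_term, parents_of, the child map and the gene names are hypothetical, and the three-term fragment omits intermediate terms present in the real ontology.

```python
from collections import deque
from typing import Dict, List, Set


def genes_for_term(
    term: str, children: Dict[str, List[str]], annotations: Dict[str, Set[str]]
) -> List[str]:
    """Genes annotated to term or to any of its descendant terms."""
    seen = {term}
    queue = deque([term])
    genes: Set[str] = set()
    while queue:
        current = queue.popleft()
        genes.update(annotations.get(current, ()))
        for child in children.get(current, ()):
            if child not in seen:  # a term can have several parents in the DAG
                seen.add(child)
                queue.append(child)
    return sorted(genes)


def parents_of(term: str, children: Dict[str, List[str]]) -> List[str]:
    """Terms one level above term, found by inverting the child map."""
    return sorted(parent for parent, kids in children.items() if term in kids)


# Simplified, hypothetical ontology fragment and annotations.
children = {
    "kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["MAP kinase activity"],
}
annotations = {"protein kinase activity": {"geneB"}, "MAP kinase activity": {"geneA"}}

print(genes_for_term("kinase activity", children, annotations))  # ['geneA', 'geneB']
print(parents_of("MAP kinase activity", children))               # ['protein kinase activity']
```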
Curation and annotation
Systematic literature curation continues to play an important role in expanding the biological scope of WormBase and in assessing the integrity of the data in the database. A first-pass curation step outlines the types of data contained in each paper. Second-pass curation now focuses on gene structure, function and expression information. Detailed gene function summaries are also being generated to place these data in an integrated context.