The GeneDB homepage links to the individual organism homepages. From these, researchers can retrieve data and construct searches in numerous ways, according to individual preferences and requirements. Clickable chromosome and contig maps, searchable text indices and browsable catalogues [GO assignments (1), descriptions, products, domains] provide fast and easy access. An additional query interface supports a wide range of queries on sequences and (curated) annotations stored in GUS, with the ability to combine searches with the Boolean operators AND and OR. For example, users can select all proteins within a specified length range that have a specified number of introns. Other query options include GO assignments, keywords, chromosome, protein domains and predicted protein sequence features. The queries in each session are tracked via a history page, allowing further refinement of searches and downloading of results as a nucleotide or amino acid FASTA file. Furthermore, a variety of sequence similarity search facilities are available through GeneDB. In addition to WU-BLAST, GeneDB supports omniBLAST, which permits searching across a set of selectable databases. An iterative BLAST (PSI-BLAST) search, suited to the identification of distant homologues, is expected to be available shortly. Peptide sequences can be searched either with user-specified motifs or with the peptide mass identification tool EMOWSE, part of the EMBOSS suite of open-source software tools (2). An alternative approach for accessing genes of interest is to use the official browser of the GO consortium, AmiGO. Several different methods are available for querying the data, both externally (http://www.godatabase.org/) and internally via GeneDB, all of which include direct links to the gene pages.
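The combination of search criteria with AND and OR, followed by a FASTA download, can be illustrated with a minimal Python sketch. This is not the GeneDB query interface or its GUS schema: the `Protein` record, the predicate helpers (`length_between`, `introns_equal`, `all_of`, `any_of`), the `to_fasta` function and the three example records are all hypothetical and serve only to show the logic.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Protein:
    """Hypothetical record; field names are illustrative, not GeneDB's."""
    gene_id: str
    intron_count: int
    sequence: str

    @property
    def length(self) -> int:
        # Protein length in amino acids.
        return len(self.sequence)


Predicate = Callable[[Protein], bool]


def length_between(lo: int, hi: int) -> Predicate:
    """Match proteins whose length lies in the inclusive range [lo, hi]."""
    return lambda p: lo <= p.length <= hi


def introns_equal(n: int) -> Predicate:
    """Match proteins whose gene has exactly n introns."""
    return lambda p: p.intron_count == n


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over any number of predicates."""
    return lambda p: all(pred(p) for pred in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR over any number of predicates."""
    return lambda p: any(pred(p) for pred in preds)


def run_query(proteins: Iterable[Protein], query: Predicate) -> List[Protein]:
    """Return the proteins that satisfy the query, in input order."""
    return [p for p in proteins if query(p)]


def to_fasta(proteins: Iterable[Protein], width: int = 60) -> str:
    """Format the result set as amino acid FASTA with fixed-width lines."""
    lines: List[str] = []
    for p in proteins:
        lines.append(f">{p.gene_id}")
        lines.extend(p.sequence[i:i + width]
                     for i in range(0, len(p.sequence), width))
    return "\n".join(lines) + ("\n" if lines else "")


# Invented example data (not real GeneDB entries).
PROTEINS = [
    Protein("geneA", 2, "MKT" * 40),   # length 120, 2 introns
    Protein("geneB", 0, "MA" * 150),   # length 300, 0 introns
    Protein("geneC", 2, "MLS" * 100),  # length 300, 2 introns
]

# "All proteins of length 100-200 with exactly 2 introns" (AND).
AND_QUERY = all_of(length_between(100, 200), introns_equal(2))
# "Length 100-200, or no introns" (OR).
OR_QUERY = any_of(length_between(100, 200), introns_equal(0))

if __name__ == "__main__":
    print(to_fasta(run_query(PROTEINS, AND_QUERY)), end="")
```

Because each criterion is an independent predicate, a history of earlier queries could in principle be refined by wrapping a stored predicate in a further `all_of` or `any_of` call.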
Feature pages, generated for coding sequences, display basic location information and a context map. The results of protein feature prediction algorithms [SignalP V2.0 (3), TMHMM v2.0 (4), GPI anchor predictions (http://129.194.185.165/dgpi/index_en.html)] and the manual annotation and curation processes are provided in both a graphical display and text format (Fig. ). This information is complemented by the results of similarity searches, including the display of predicted and experimentally characterized orthologues and paralogues. Additional sequence features, both at the DNA level (e.g. polymorphisms, introns, UTRs, splice donor and acceptor sequences) and protein level (e.g. peptide domains), can be viewed in the context of the annotated sequence via an Artemis applet (5) (Fig. ). The selected region can also be downloaded either in FASTA or annotated EMBL file format. Sequence data, either of the predicted coding sequence or the clustered ESTs, are accessible via a secondary page.
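As an illustration of downloading a selected region in FASTA format, the following minimal Python sketch extracts a sub-sequence from a contig by 1-based inclusive coordinates and optionally reverse-complements it. How GeneDB or the Artemis applet perform this step is not described here; the function names, the header layout and the coordinate convention are assumptions made for the example.

```python
# Translation table for complementing DNA; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_region(contig: str, start: int, end: int, strand: int = 1) -> str:
    """Extract contig[start..end] using 1-based inclusive coordinates.

    strand = 1 returns the forward sequence, strand = -1 the reverse
    complement. The coordinate convention is an assumption of this sketch.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if not 1 <= start <= end <= len(contig):
        raise ValueError("coordinates fall outside the contig")
    region = contig[start - 1:end]
    return reverse_complement(region) if strand == -1 else region


def region_to_fasta(name: str, contig: str, start: int, end: int,
                    strand: int = 1, width: int = 60) -> str:
    """Format the selected region as a single nucleotide FASTA record."""
    seq = extract_region(contig, start, end, strand)
    suffix = " (reverse strand)" if strand == -1 else ""
    header = f">{name}:{start}..{end}{suffix}"  # hypothetical header layout
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


if __name__ == "__main__":
    # Invented toy contig, not real sequence data.
    print(region_to_fasta("contig1", "AACCGGTTAACC", 3, 8), end="")
```

An annotated EMBL export would additionally need the feature table for the region, which is outside the scope of this sketch.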
Extensive cross-referencing supports retrieval of related information from external resources, allowing rapid transfer between databases. This includes reciprocal links to numerous databases housing nucleotide and protein sequences [e.g. EMBL (6), Swiss-Prot/TrEMBL (7)], pathways [KEGG (8)], protein families [e.g. SCOP (9), Pfam (10), InterPro (11)], ontologies [e.g. GO (1)], expression data [e.g. microarray (http://www.sanger.ac.uk/perl/SPGE/geexview)], strain information [FYSSION (http://pombe.biols.susx.ac.uk)] and phenotype data [e.g. the Trypanosoma brucei RNAi project (http://www.TrypanoFAN.org/)]. Links to databases housing the same genome at different sites [e.g. SGD (12), TGAD (http://www.tigr.org/tdb/e2k1/tba1/tba1.shtml)] are also provided. These links to external resources are validated and updated on a monthly basis by the GeneDB mining code. Annotators and curators are automatically alerted to inconsistencies in the datasets and changed GO identifiers.
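The kind of check that could flag changed GO identifiers can be sketched in Python as follows. The GeneDB mining code is not described in detail here, so this is only an illustration under stated assumptions: annotations are held as a mapping from gene to GO identifiers, the set of current identifiers and a table of replaced identifiers are supplied by the caller, and the function name `audit_go_annotations` and the gene names are hypothetical. The identifier `GO:9999991` is invented for the example.

```python
from typing import Dict, List, Set, Tuple

Alert = Tuple[str, str, str]  # (gene, GO identifier, message)


def audit_go_annotations(annotations: Dict[str, List[str]],
                         current_ids: Set[str],
                         replaced_by: Dict[str, str]) -> List[Alert]:
    """Report GO identifiers that are no longer current.

    annotations : gene -> list of GO identifiers stored for that gene
    current_ids : identifiers present in the current ontology release
    replaced_by : outdated identifier -> its current replacement
    """
    alerts: List[Alert] = []
    for gene in sorted(annotations):
        for go_id in annotations[gene]:
            if go_id in current_ids:
                continue  # identifier is still valid, nothing to report
            if go_id in replaced_by:
                alerts.append((gene, go_id,
                               f"replaced by {replaced_by[go_id]}"))
            else:
                alerts.append((gene, go_id, "unknown identifier"))
    return alerts


if __name__ == "__main__":
    # Invented example data; GO:9999991 is not a real identifier.
    example_annotations = {
        "geneA": ["GO:0008150"],
        "geneB": ["GO:9999991", "GO:0003674"],
    }
    example_current = {"GO:0008150", "GO:0003674", "GO:0005575"}
    example_replaced = {"GO:9999991": "GO:0005575"}
    for alert in audit_go_annotations(example_annotations,
                                      example_current, example_replaced):
        print(*alert, sep="\t")
```

A monthly run could pass the resulting alerts to annotators and curators; validating external URLs would need a separate network check that is not shown here.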