Since YPD and WormPD are both part of the same underlying database, they are naturally integrated. The interface between the two appears seamless to the user, and navigation between them is as straightforward as navigation within each database. Every yeast or worm gene or protein name that appears within a Protein Report, in either database, is hyperlinked to its own respective Protein Report. Since the annotations are written in a standard style with a minimum of organism-specific terminology, users can easily browse both databases simultaneously. For example, a user who has identified an uncharacterized worm protein in an experimental screen can, with a single click, access all that is known from the scientific literature about the most closely related yeast protein. Such connections can allow the researcher to quickly assess whether or not to pursue an experimental lead. Furthermore, the information provided may suggest the design of future experiments.
Sequence similarity is at the heart of many connections between YPD and WormPD. It is of great interest to know whether a given protein is specific to one organism, is highly conserved, or is somewhere in between these two extremes. To facilitate such comparisons, each Protein Report in YPD and WormPD contains a section, ‘Related Proteins’, that lists similar proteins from S.cerevisiae, Drosophila melanogaster, Rattus norvegicus, Mus musculus, and Homo sapiens, as determined by biweekly BLAST analysis (3) refined by the Smith–Waterman algorithm (4). A pop-up window leads to a list of the number of matches in each organism, and the name of each organism is hyperlinked to the complete BLAST report. The BLAST reports represent similarity both graphically and as sequence alignments. S.cerevisiae gene names appearing within the BLAST reports are linked back to their respective Protein Reports. Thus it is simple and straightforward to find all proteins in both organisms with common sequence elements and examine what is known about their functions.
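To illustrate the kind of refinement step mentioned above, the following is a minimal sketch of Smith–Waterman local alignment scoring. It is not the implementation used for YPD and WormPD: the function name and the scoring scheme (match +2, mismatch -1, linear gap penalty -1) are assumptions chosen for brevity, whereas protein comparisons in practice typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    The scoring parameters are illustrative assumptions, not values
    taken from the YPD/WormPD pipeline.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because cell scores are floored at zero, unrelated flanking sequence does not penalize a conserved core, which is why a local method suits the detection of shared sequence elements within otherwise divergent proteins.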
Having two comprehensively curated and interlinked model organism databases provides great power to comparative genomic analysis. With the complete genomic sequences of two eukaryotic organisms determined, it is now possible to compare both orthologous and divergent sets of proteins in order to identify proteins that are likely to provide core eukaryotic functions and those that may be specific to an individual species or family of organisms. Our analyses have suggested that ~41% of predicted yeast proteins and ~19% of predicted worm proteins are conserved (with an expectation value of 10⁻¹⁰ or lower) between the two organisms. YPD and WormPD provide an information-rich background in which to expand such connections by tracing conserved pathways, protein associations and complexes across species lines.
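A conservation percentage of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input layout (a mapping from protein identifier to its best expectation value against the other proteome) and the example identifiers are hypothetical, and proteins with no reported hit are assumed to be absent from the mapping.

```python
def percent_conserved(best_evalues, proteome_size, cutoff=1e-10):
    """Percentage of a proteome with a best hit at or below the E-value cutoff.

    best_evalues:  dict of protein id -> best E value against the other
                   proteome (hypothetical layout; proteins without any
                   hit are simply missing from the dict).
    proteome_size: total number of predicted proteins in the proteome.
    """
    if proteome_size <= 0:
        raise ValueError("proteome_size must be positive")
    conserved = sum(1 for e in best_evalues.values() if e <= cutoff)
    return 100.0 * conserved / proteome_size


# Toy example with made-up identifiers: two of four proteins pass the cutoff.
toy_hits = {"P1": 1e-50, "P2": 1e-10, "P3": 1e-5}
toy_percent = percent_conserved(toy_hits, proteome_size=4)
```

The denominator is the full proteome rather than the number of proteins with hits, so the two organisms can give different percentages for the same set of matching pairs.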
As an example, using the information presented in YPD and WormPD Protein Reports we have been able to identify likely components of the C.elegans mitochondrial translocation apparatus (see Supplementary Material). While not studied directly in C.elegans, the protein machinery involved in the import of proteins into the mitochondria, a process essential to life, has been well studied in yeast (5). Through our analyses, we have found that most, but not all, essential yeast proteins involved in this process have recognizably conserved counterparts in C.elegans. In all cases of apparent conservation the similarities are surprisingly weak, with expectation values for most >10⁻¹². This finding is consistent with our observation that mitochondrial proteins, as a group, are more highly diverged between yeast and worm than are proteins found in most other subcellular compartments (Fig. 1).
Figure 1. Conservation of proteins between S.cerevisiae and C.elegans, sorted by major subcellular localization. For each group, the percentage of all proteins of that group without a match in the worm proteome (E value of 10⁻⁶ or greater) is represented.
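The per-group quantity plotted in Figure 1 could be tabulated along the following lines. This is a hedged sketch rather than the procedure actually used: the function name, the record layout and the example localization labels are assumptions, and a protein with no reported hit (represented here as None) is treated as unmatched together with any protein whose best E value is 10⁻⁶ or greater.

```python
from collections import defaultdict


def percent_unmatched_by_group(records, cutoff=1e-6):
    """Percentage of proteins per localization without a worm match.

    records: iterable of (localization, best_evalue) pairs, one per yeast
             protein; best_evalue is None when no hit was reported
             (hypothetical layout).
    A protein counts as unmatched when its best E value is at or above
    the cutoff, following the threshold stated in the figure legend.
    """
    totals = defaultdict(int)
    unmatched = defaultdict(int)
    for localization, best_evalue in records:
        totals[localization] += 1
        if best_evalue is None or best_evalue >= cutoff:
            unmatched[localization] += 1
    return {group: 100.0 * unmatched[group] / totals[group] for group in totals}


# Toy records with made-up values, for illustration only.
toy_records = [
    ("mitochondrial", 1e-3),
    ("mitochondrial", None),
    ("mitochondrial", 1e-20),
    ("mitochondrial", 1e-6),
    ("nuclear", 1e-30),
    ("nuclear", 1e-2),
]
toy_result = percent_unmatched_by_group(toy_records)
```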