The availability of the nearly complete genome sequence for the laboratory mouse provides a powerful platform for predicting genes and other genome features and for exploring the biological significance of genome organization [1]. However, building a catalog of genome annotations is just the first step in 'post-genome' biology [2]. Deriving new insights into complex biological processes using complete genomes and related genome-scale data will require understanding how the individual biological units that comprise the genome (for example, genes and other genome features) relate to one another in pathways and networks [4]. Identifying components within networks can be achieved through genome-wide assays of an organism's proteome or transcriptome using high-throughput technologies such as microarrays; however, it is the association of experimental data with well-curated biological knowledge that provides meaningful context to the vast amount of information produced in such experiments. Ultimately, researchers seek to understand how perturbations of these networks, presumably studied through their dysregulated components, contribute to disease processes.
Biochemical interactions and transformations among organic molecules are arguably the foundation and core distinguishing feature of all organic life. Most of these transformations are understood as sequential interactions among molecules. Thus, biochemical pathways, rather than individual reactions and molecules, are often the most useful 'units' of investigation for biomedical experimentalists because they provide a conceptual reduction of biological system complexity. Biochemical pathways in mammalian systems have historically been characterized and defined with little or no genetic information, making the present-day task of connecting metabolism and genomics a challenging enterprise.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) was one of the first projects that addressed the integration of small molecule biochemical reaction networks with genes, and it includes graphical representations of these reactions [5]. KEGG pathways are based primarily on Enzyme Commission (EC) classifications of enzymes [7]. For individual species, the known (and predicted) EC enzymes are depicted relative to KEGG reference networks for visualization of the sequential small molecule transformations that exist for a given organism.
Another resource that seeks to integrate pathway and genomic data is Reactome [8]. Reactome is a manually curated database of human pathways, networks and processes, including metabolism, signaling pathways, cell-cell interactions, and infection response. Data in Reactome are cross-referenced to numerous widely used external genome informatics resources. The curated human pathway data in Reactome are used to infer orthologous pathways in over 20 other organisms that have complete, or nearly complete, genome sequences and comprehensive protein annotations. The non-human pathway data in Reactome are not manually curated in a systematic fashion.
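The idea of inferring pathways in another organism from curated human pathways can be illustrated with a minimal sketch. It is not Reactome's actual procedure: the pathway, the ortholog map, the function name `project_pathway`, and the `min_coverage` threshold are all assumptions introduced here for illustration.

```python
# Hypothetical toy data: one curated human pathway and a human-to-mouse ortholog map.
human_pathway = {
    "name": "glycolysis (toy)",
    "proteins": ["HK1", "GPI", "PFKM", "ALDOA"],
}
orthologs = {"HK1": "Hk1", "GPI": "Gpi1", "PFKM": "Pfkm"}  # ALDOA deliberately unmapped


def project_pathway(pathway, ortholog_map, min_coverage=0.75):
    """Project a curated pathway onto another species through orthologs.

    Returns the inferred pathway, or None when the fraction of members
    with an ortholog falls below min_coverage (an assumed threshold).
    """
    mapped = {p: ortholog_map[p] for p in pathway["proteins"] if p in ortholog_map}
    coverage = len(mapped) / len(pathway["proteins"])
    if coverage < min_coverage:
        return None
    return {
        "name": pathway["name"],
        "proteins": sorted(mapped.values()),
        "coverage": coverage,
        "evidence": "inferred from orthology",  # flag: not manually curated
    }


inferred = project_pathway(human_pathway, orthologs)
print(inferred)
```

Tagging the result as inferred rather than curated mirrors the distinction drawn above between manually curated human data and computationally derived non-human data.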
Another popular platform for integration of genetic and biochemical knowledge is Pathway Tools, a software environment for curation, analysis, and visualization of integrated genomic and pathway data [4]. The PathoLogic component of Pathway Tools predicts complete and partial metabolic pathways for an organism by comparing user-supplied genome annotations (for example, gene names, EC numbers) to a reference database (MetaCyc) of manually curated, experimentally defined metabolic pathways [11]. The output of PathoLogic analysis is an organism-specific pathway genome database (PGDB) [13] that contains predicted enzymatic reactions, compounds, enzymes, transporters, and pathways. Pathway Tools has been used to implement curated PGDBs for a number of model eukaryotic organisms, for example, budding yeast, Saccharomyces cerevisiae (Saccharomyces Genome Database [14]); green alga, Chlamydomonas reinhardtii; thale cress, Arabidopsis thaliana; rice, Oryza sativa; plants of the Solanaceae family (SolCyc [18]); human, Homo sapiens; and, very recently, the bovine, Bos taurus; as well as for hundreds of microorganisms [21]. All databases implemented using Pathway Tools share a common web-based user interface while also providing support for users of the software to display organism-specific details and information for genes and pathways.
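The comparison of genome annotations against a reference pathway database, as described for PathoLogic above, can be sketched in simplified form. The example below is not the PathoLogic algorithm, whose scoring is considerably more elaborate; it only matches EC numbers against toy reference pathways. The data, the function name `predict_pathways`, and the `min_fraction` cutoff are assumptions made for illustration.

```python
# Hypothetical reference pathways: pathway name -> EC numbers of its reactions.
reference_pathways = {
    "glycolysis (toy)": {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"},
    "urea cycle (toy)": {"6.3.4.16", "2.1.3.3", "6.3.4.5", "4.3.2.1", "3.5.3.1"},
}

# Hypothetical user-supplied genome annotation: gene symbol -> EC number.
annotations = {
    "Hk1": "2.7.1.1",
    "Gpi1": "5.3.1.9",
    "Pfkm": "2.7.1.11",
    "Aldoa": "4.1.2.13",
    "Arg1": "3.5.3.1",
}


def predict_pathways(annotations, reference, min_fraction=0.5):
    """Call a pathway present when enough of its EC numbers are annotated.

    min_fraction is an assumed cutoff; pathways below it are not reported.
    """
    annotated_ecs = set(annotations.values())
    predictions = {}
    for name, ecs in reference.items():
        found = ecs & annotated_ecs
        fraction = len(found) / len(ecs)
        if fraction >= min_fraction:
            predictions[name] = {
                "fraction": fraction,
                "missing": sorted(ecs - found),  # candidate pathway holes
            }
    return predictions


predicted = predict_pathways(annotations, reference_pathways)
print(predicted)
```

Reporting the unmatched EC numbers alongside each prediction gives a curator a starting list of reactions that still lack an assigned enzyme, which is where partial pathway predictions need manual review.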
Here, we describe the implementation and curation of the MouseCyc database [22] using the Pathway Tools platform. MouseCyc now joins the existing biochemical pathway resources for major biomedically relevant model organisms, providing ease of use through implementation of the Pathway Tools web interface, and integration with other Mouse Genome Informatics (MGI) resources [23]. MouseCyc contains information on central, intermediary, and small-molecule metabolism in the laboratory mouse and serves as a resource for analyzing the mouse genome using the functional framework of biochemical pathways. MouseCyc facilitates the use of the laboratory mouse as a model system for understanding human biology and disease processes in three ways. First, the database provides a means by which the available wealth of biological knowledge about mouse genes can be organized in the context of biochemical pathways. Second, the query and analysis tools for the database serve as a means for researchers to view and analyze genome-scale experiments by overlaying these data onto global views of the curated mouse metabolome. Finally, MouseCyc supports direct comparisons of metabolic processes and pathways between mouse and human, comparisons that may be critical to understanding both the power and the biological limitations of using mouse models of human disease.
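Overlaying genome-scale experimental data onto pathways, the second use described above, amounts to grouping per-gene measurements by pathway membership. The following minimal sketch assumes hypothetical pathway gene lists and log2 fold-change values and does not use the MouseCyc or Pathway Tools interfaces; the function name `overlay` is likewise an assumption.

```python
from statistics import mean

# Hypothetical pathway membership: pathway name -> mouse gene symbols.
pathway_genes = {
    "glycolysis (toy)": ["Hk1", "Gpi1", "Pfkm"],
    "urea cycle (toy)": ["Arg1", "Otc", "Ass1"],
}

# Hypothetical expression result: gene symbol -> log2 fold change.
log2_fold_change = {"Hk1": 1.5, "Gpi1": 0.5, "Pfkm": 1.0, "Arg1": -2.0, "Otc": -1.0}


def overlay(pathways, values):
    """Summarize per-gene measurements for each pathway.

    Genes without a measurement are counted but excluded from the mean.
    """
    summary = {}
    for name, genes in pathways.items():
        measured = [values[g] for g in genes if g in values]
        summary[name] = {
            "n_genes": len(genes),
            "n_measured": len(measured),
            "mean_log2fc": mean(measured) if measured else None,
        }
    return summary


summary = overlay(pathway_genes, log2_fold_change)
for name, stats in summary.items():
    print(name, stats)
```

Keeping the count of measured genes next to the mean makes it visible when a pathway-level value rests on only a few of its members.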