Curation efforts for MetaCyc have increased significantly in the past 2 years, resulting in a substantial increase in the number of database objects (see Table ). Because MetaCyc was initialized with all the metabolic pathways from EcoCyc, most of the central metabolic pathways ubiquitous to microorganisms were already well represented in MetaCyc. The curation strategy was therefore to add new pathways that would provide breadth. To diversify MetaCyc’s curation expertise, SRI began a collaboration with The Arabidopsis Information Resource (TAIR) at the Carnegie Institution of Washington (Carnegie); Carnegie now curates plant pathways, while SRI continues to curate microbial pathways.
The size of MetaCyc as a function of time from its first release in 1999 to its most recent release in 2003
Together, SRI and Carnegie developed a curation strategy for adding new pathways and editing existing pathways. New pathways are curated in the following order: first, central metabolic pathways universal to plants; second, secondary metabolic pathways and other pathways shared among fewer microorganisms or plants; and third, significant pathways restricted to only a few microorganisms or plants. Existing MetaCyc pathways that contain few or no comments, or that lack enzymes, are reviewed: comments, enzyme information and literature citations are added, redundant pathways are deleted and errors are corrected.
To ensure the consistency of curation procedures over time and among multiple curators, our curation procedures were refined and documented in our evolving Pathway Tools Curators’ Guide (see URL http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf). The Pathway Tools Curators’ Guide documents the type of information that should be collected and entered for each pathway, reaction, enzyme, gene and chemical compound, and describes stylistic conventions.
We have recently increased the number of MetaCyc releases from two to four times per year to distribute the new data content more quickly. Each release includes a list of the salient changes.
Pathways, enzymes, reactions and compounds
Literature curation is very time consuming. However, MetaCyc offers a unique paradigm for significantly increasing the rate at which its data content can be expanded: PGDBs that are created using MetaCyc and Pathway Tools software can be augmented with manual curation by outside groups, and the newly curated pathways and enzymes can then be imported into MetaCyc (see Fig. 2). As the database content in MetaCyc grows, MetaCyc will serve better both as a reference DB for metabolic pathway prediction and as a comprehensive resource on metabolic information. For example, the addition of plant pathways to MetaCyc significantly increases the capability to predict plant-specific pathways.
Figure 2. MetaCyc is the reference database of pathways and enzymes that is used in conjunction with SRI’s Pathway Tools software to predict metabolic pathways from an organism’s annotated genome, resulting in the creation of a PGDB, such as AraCyc.
Since spring 2002, 29 new microbial pathways and 36 new plant pathways have been added to MetaCyc. The 29 microbial pathways add breadth to the existing microbial pathways, while the 36 plant pathways nearly complete the central metabolic pathways universal to plants and include important plant secondary metabolic pathways. Many of these new pathways were curated directly in MetaCyc, while others were imported into MetaCyc from other PGDBs, such as EcoCyc; MtbRvCyc, a PGDB for Mycobacterium tuberculosis H37Rv curated by Stanford University; and AraCyc, a PGDB for Arabidopsis thaliana curated by TAIR (4). We are also collaborating with the Saccharomyces Genome Database (SGD), which has created a pathway database for Saccharomyces cerevisiae that is actively being curated; we plan to import their newly curated pathways and enzymes into MetaCyc. We also hope to import new pathways from other PGDBs; ~35 groups in academia and industry have licensed SRI’s Pathway Tools software and have created, or are in the process of creating, PGDBs.
In addition to curating new pathways, ~20 microbial pathways and 10 plant pathways that already existed in MetaCyc were edited extensively, which included adding comments, enzymes and literature citations. The addition of enzymes improves pathway prediction because enzymes from the annotated genome are matched to pathways in MetaCyc by enzyme name or EC number (2). Comments also aid pathway prediction and improve MetaCyc as a comprehensive resource by explaining the physiological role of pathways. Duplicate pathways are deleted to minimize redundancy.
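The matching step described above can be illustrated with a minimal sketch. The `Pathway` record, the `match_enzymes_to_pathways` function and the sample data are hypothetical and are not the Pathway Tools implementation; the sketch only assumes that each annotated gene product carries an enzyme name and, optionally, an EC number, and that a match on EC number is tried before a case-insensitive match on name.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """Hypothetical reference pathway with the enzymes known to catalyze its steps."""
    name: str
    ec_numbers: set = field(default_factory=set)
    enzyme_names: set = field(default_factory=set)


def match_enzymes_to_pathways(annotations, pathways):
    """Map each pathway name to the sorted gene ids whose annotation matches it.

    annotations: iterable of (gene_id, enzyme_name, ec_number_or_None).
    Pathways with no matching gene are left out of the result.
    """
    matches = {}
    for pathway in pathways:
        known_names = {name.lower() for name in pathway.enzyme_names}
        hits = set()
        for gene_id, enzyme_name, ec in annotations:
            if ec is not None and ec in pathway.ec_numbers:
                hits.add(gene_id)  # matched by EC number
            elif enzyme_name.strip().lower() in known_names:
                hits.add(gene_id)  # matched by enzyme name
        if hits:
            matches[pathway.name] = sorted(hits)
    return matches


# Illustrative data only; not taken from MetaCyc.
reference = [
    Pathway("glycolysis",
            {"2.7.1.11", "4.1.2.13"},
            {"6-phosphofructokinase", "fructose-bisphosphate aldolase"}),
    Pathway("TCA cycle", {"4.2.1.2"}, {"fumarate hydratase"}),
]
genome = [
    ("g1", "6-phosphofructokinase", "2.7.1.11"),
    ("g2", "Fructose-bisphosphate aldolase", None),
    ("g3", "hypothetical protein", None),
]
result = match_enzymes_to_pathways(genome, reference)
print(result)
```

In this toy setting, each enzyme added to a reference pathway gives the matcher one more EC number or name to hit, which is the sense in which enzyme curation improves prediction.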
We have added ~460 enzymes to MetaCyc since version 5.6, which was released in 2001 and described previously in Nucleic Acids Research. Enzyme-specific information described in MetaCyc includes cofactors, prosthetic groups, activators, inhibitors, substrate specificity, subunit composition, comments and literature citations.
MetaCyc was also updated to reflect additions and changes to the Enzyme Nomenclature (i.e. the EC system) by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) by incorporating version 30.0 of the EC system.
We have added ~600 chemical structures to MetaCyc since version 5.6. Chemical structures help users to visualize chemical transformations in pathways and permit automated consistency checking of MetaCyc reactions.
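The text does not say which consistency checks are run, so the following is only one plausible example: an element-balance test that becomes possible once every compound in a reaction has a known structure and therefore a molecular formula. The function names are hypothetical, and the parser assumes simple formulas without parentheses, charges or isotopes.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple molecular formula such as 'C4H6O5'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum atom counts over one side of a reaction.

    side: list of (formula, stoichiometric_coefficient) pairs.
    """
    total = Counter()
    for formula, coefficient in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(substrates, products):
    """True if both sides of the reaction contain the same atoms."""
    return side_counts(substrates) == side_counts(products)


# Hydration of fumarate to malate: C4H4O4 + H2O -> C4H6O5
balanced = is_balanced([("C4H4O4", 1), ("H2O", 1)], [("C4H6O5", 1)])
# The same reaction with the water omitted is not balanced.
unbalanced = is_balanced([("C4H4O4", 1)], [("C4H6O5", 1)])
print(balanced, unbalanced)
```

A reaction that fails such a test would be flagged for a curator to inspect; a compound lacking a structure cannot be checked this way at all.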
MetaCyc contains several taxonomies including the EC system (5), a pathway class hierarchy and a compound class hierarchy. We recently updated and improved the pathway class hierarchy, which can be accessed at URL http://biocyc.org:1555/META/class-subs?object=Pathways. We continue to expand and enhance MetaCyc’s taxonomies to facilitate browsing and to better represent the new information added to MetaCyc, such as new classes of pathways and compounds, and the new information related to higher eukaryotes, such as subcellular compartment localization.
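As a rough illustration of how a class hierarchy supports browsing, the sketch below stores a hierarchy as a mapping from each class to its direct subclasses and lists everything beneath a chosen class. The class names and the `descendants` helper are hypothetical; they are not the actual MetaCyc pathway classes or an API of the Pathway Tools software.

```python
def descendants(hierarchy, root):
    """Return every class below root, depth-first, in listing order."""
    found = []
    stack = list(reversed(hierarchy.get(root, [])))
    while stack:
        node = stack.pop()
        found.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(hierarchy.get(node, [])))
    return found


# Illustrative class names only.
hierarchy = {
    "Pathways": ["Biosynthesis", "Degradation"],
    "Biosynthesis": ["Amino acid biosynthesis",
                     "Secondary metabolite biosynthesis"],
    "Degradation": ["Amino acid degradation"],
}
subclasses = descendants(hierarchy, "Pathways")
print(subclasses)
```

Adding a new class of pathways then amounts to adding one entry under its parent, and every browse or query that starts higher in the hierarchy picks it up.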
Links to other databases
MetaCyc contains unidirectional and bidirectional links to many bioinformatics databases, enabling easy navigation to and from additional biological information. MetaCyc is linked to the protein sequence databases, Swiss-Prot (6) and Protein Information Resource (PIR) (7), to the protein structural database, Protein Data Bank (PDB) (8) and to TAIR (9). We will also establish links from S. cerevisiae pathways in MetaCyc to SGD (10