The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism.
MetaCyc (MetaCyc.org) is a highly curated, non-redundant reference database of small-molecule metabolism. It contains metabolic pathway and enzyme data that have been experimentally demonstrated in the scientific literature (1) (Figure 1). Because MetaCyc contains only experimentally determined pathways and enzymes, and due to its tight integration of data and references, MetaCyc is a uniquely valuable resource in fields including genome analysis, metabolism and metabolic engineering. The metabolic pathways and enzymes in MetaCyc are derived from organisms representing all domains of life (Tables 1 and 2). In the past, microbial and plant metabolism were emphasized, but current curation also focuses on vertebrate metabolism.
In conjunction with its role as a general reference on metabolism, MetaCyc is used as a reference database for the PathoLogic component of the Pathway Tools software (2) to computationally predict the metabolic network of any organism having a sequenced and annotated genome (3). In this automated process, a predicted metabolic network is created in the form of a Pathway/Genome Database (PGDB). BioCyc (BioCyc.org) is a collection of more than 500 organism-specific PGDBs that were generated in this way both at SRI and by other groups. The editing capability of Pathway Tools enables computationally predicted PGDBs to be improved and updated by manual curation. Interested scientists may adopt and curate existing PGDBs through the BioCyc Web site (biocyc.org/intro.shtml#adoption), or create new PGDBs using MetaCyc and Pathway Tools (biocyc.org/download.shtml). More than 80 groups have used Pathway Tools and MetaCyc to create PGDBs for their organisms of interest, including important model organisms such as Saccharomyces cerevisiae (4), Arabidopsis thaliana (5), Oryza sativa (6), Mus musculus (7), Bos taurus (8), Medicago truncatula (9), Dictyostelium discoideum (10), Leishmania major (11), Chlamydomonas reinhardtii (12), several Solanaceae species (13) and many pathogenic bacteria (14) (see http://biocyc.org/otherpgdbs.shtml for a more complete list).
A web server included in Pathway Tools enables the publishing of PGDBs through either the Internet or an internal network. The Navigator component of Pathway Tools allows the browsing and analysis of PGDBs either locally or over the Internet. A detailed description of Pathway Tools can be found at http://bioinformatics.ai.sri.com/ptools/ and in (15).
PGDBs generated by Pathway Tools and MetaCyc are an excellent platform for the integration of genome information with many other types of data regarding metabolism, regulation, and genetics. They provide powerful tools for analyzing omics datasets from experiments related to gene transcription, metabolomics, proteomics, ChIP-chip analysis, etc. (Figure 2). The PGDBs accelerate research in many fields including biochemistry, molecular biology, biotechnology, bioinformatics, metabolic engineering and systems biology (16–19). Both MetaCyc and organism-specific PGDBs can also be used as educational tools.
During the past 2 years, we again significantly expanded the data content of MetaCyc and BioCyc. We also added supporting enhancements to the Pathway Tools software. The expanded and enhanced databases and software are described in the following sections.
All pathways in MetaCyc are curated from the experimental literature. Since the last Nucleic Acids Research publication (2 years ago) (20), we added 507 new base pathways (pathways composed of reactions only, where no portion of the pathway is designated as a subpathway) and 129 superpathways (pathways composed of at least one base pathway plus additional reactions or pathways), and updated 104 existing pathways, for a total of 740 new and revised pathways. The total number of base pathways grew by 43%, from 977 (version 11.5) to 1399 (version 13.5); the net increase is smaller than 507 pathways because some existing pathways were deleted from the database during this period. Over the same versions, the total number of superpathways grew by about 120%, from 106 to 235.
Along with the increase in pathway number, the number of enzymes, reactions, chemical compounds, and citations in the database grew by 35%, 25%, 29% and 37%, respectively; and the number of referenced organisms increased by 75% (currently at 1795).
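The pathway counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers given in the text; the function and variable names are ours and purely illustrative. Note that the superpathway growth comes out slightly above the rounded figure of 120%, and that the "implied removals" value is simply the gap between pathways added and net growth, which the text attributes to deletions.

```python
def percent_growth(old: int, new: int) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


# Counts quoted in the text (version 11.5 -> version 13.5).
base_old, base_new = 977, 1399
super_old, super_new = 106, 235

base_growth = percent_growth(base_old, base_new)
super_growth = percent_growth(super_old, super_new)

# New base pathways added versus net growth; the difference is the
# number of base pathways implied to have left the database.
added_base = 507
net_base = base_new - base_old
implied_removals = added_base - net_base

# New base pathways + new superpathways + updated existing pathways.
total_new_and_revised = 507 + 129 + 104

print(f"base pathways grew by {base_growth:.1f}%")
print(f"superpathways grew by {super_growth:.1f}%")
print(f"net base gain {net_base}, implied removals {implied_removals}")
print(f"new and revised pathways: {total_new_and_revised}")
```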
The pathways in MetaCyc are classified by an ontology developed at SRI that is constantly updated to reflect curation needs (Table 3). The four top-level categories (or classes) of this ontology are biosynthesis; degradation/utilization/assimilation; generation of precursor metabolites and energy; and detoxification.
In version 13.5, the largest top-level class is Biosynthesis, with 902 base pathways. Its main subclasses are secondary metabolites biosynthesis (351); cofactors, prosthetic groups, and electron carriers biosynthesis (160); amino acids biosynthesis (105); and fatty acids and lipids biosynthesis (101).
The second-largest top-level class is degradation/utilization/assimilation, with 639 base pathways. Within this group, the largest subclasses are aromatic compounds degradation (152), amino acids degradation (113), inorganic nutrients metabolism (72), secondary metabolites degradation (58), and carbohydrates degradation (52).
The third-largest top-level class, generation of precursor metabolites and energy, contains 124 base pathways. Its largest subclasses are fermentation (34), respiration (25), chemoautotrophic energy metabolism (14) and methanogenesis (12).
The final top-level class, detoxification, is much smaller, with only 16 base pathways.
During the previous 2 years, the number of metazoan pathways in MetaCyc increased by 67%, from 104 to 174 pathways.
The list of pathways added to MetaCyc since the last NAR publication is too long to give here. For a complete report, please see the MetaCyc release notes history at http://metacyc.org/release-notes.shtml.
Following the introduction of support for electron transfer reactions into the database schema, we added a total of 11 electron transfer pathways to the database. This type of pathway uses a different display algorithm that conveys features such as the direction of the electron flow, the cell-compartment locations where the substrates are transformed, and the optional translocation of protons across membranes. For an example, see the pathway ‘succinate to cytochrome bd oxidase electron transfer’.
MetaCyc is regularly updated with data from the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB), which includes new and modified EC numbers. The last supplement incorporated is supplement 14.
Starting with version 12.0, the full NCBI Taxonomy database (21) is integrated into Pathway Tools, enabling specification of the taxa in which MetaCyc pathways occur using NCBI Taxonomy, and allowing taxonomic querying of MetaCyc pathways and enzymes.
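To illustrate the kind of taxonomic query this integration makes possible, the sketch below checks whether an organism falls inside the taxonomic range recorded for a pathway by walking up its lineage. It is a simplified illustration, not the Pathway Tools implementation: the child-to-parent map is a toy excerpt with most ranks omitted, taxa are keyed by name rather than by NCBI taxon identifier, and the pathway ranges are hypothetical.

```python
# Toy child -> parent map (assumption: heavily abbreviated lineages).
PARENT = {
    "Escherichia coli": "Enterobacteriaceae",
    "Enterobacteriaceae": "Gammaproteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Proteobacteria": "Bacteria",
    "Arabidopsis thaliana": "Viridiplantae",
    "Viridiplantae": "Eukaryota",
}


def lineage(taxon: str) -> list[str]:
    """Return the taxon followed by all of its ancestors in PARENT."""
    chain = [taxon]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def in_taxonomic_range(organism: str, taxonomic_range: set[str]) -> bool:
    """True if the organism or any of its ancestors is in the range."""
    return any(taxon in taxonomic_range for taxon in lineage(organism))


# Hypothetical pathway annotations: pathway name -> taxa in which it occurs.
PATHWAY_RANGE = {
    "pathway-A": {"Bacteria"},
    "pathway-B": {"Viridiplantae"},
}

ecoli_pathways = sorted(
    name for name, taxa in PATHWAY_RANGE.items()
    if in_taxonomic_range("Escherichia coli", taxa)
)
print(ecoli_pathways)
```

Storing the range at the level of a higher taxon and resolving membership through the lineage keeps the annotation compact: a pathway annotated to Bacteria matches every bacterial organism without listing each one.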
We continue to update the mapping between MetaCyc and Gene Ontology (GO) process and function terms (22). In May 2009, we submitted to GO updated mappings between MetaCyc pathways and reactions and GO biological process and molecular function terms. The updated file is available at http://www.geneontology.org/external2go/metacyc2go.
To improve the mapping between MetaCyc compounds and other compound databases, all MetaCyc compounds were incorporated into the PubChem database (23), and are linked to their PubChem entries.
MetaCyc was updated to better link objects to corresponding entries in KEGG. A total of 3269 MetaCyc reactions were mapped to KEGG reactions, and an additional 920 MetaCyc compounds now have links to KEGG compounds. MetaCyc pathways contain more than twice as many reactions (4950) as KEGG pathways do (2463), although KEGG contains more total reactions (9194 in version 50) than MetaCyc (8387). MetaCyc contains 1399 pathways compared to the 155 pathways in KEGG.
The BioCyc databases are organized into three tiers.
During the past 2 years, the number of BioCyc PGDBs increased from 371 (version 11.5) to 508 (version 13.1), of which two are in Tier 1 (EcoCyc and MetaCyc), 24 are in Tier 2, and the rest belong to Tier 3. Some Tier 2 PGDBs were provided by groups outside SRI [examples include MouseCyc (7), CattleCyc (8) and YeastCyc (4)]. Database authors are identified on the database summary page (Tools → Reports → Summary Statistics).
The extended family of Pathway Tools-based Pathway/Genome Databases includes both the SRI-created BioCyc collection and many PGDBs created outside SRI by other Pathway Tools users. This DB family exhibits a number of innovations in scientific database sharing. We believe that no one group can curate all the world’s genomes; therefore, we strongly emphasize the notion of widely distributing the workload of curating genome databases. All the Tier 3 PGDBs and some of the Tier 2 PGDBs are offered for adoption to interested parties under an open license agreement. Some groups adopt existing PGDBs within BioCyc, assuming responsibility for their ongoing curation. Other groups create their own PGDBs using Pathway Tools. We offer free technical support to facilitate this task for our academic users.
All Pathway Tools-based PGDBs share the same schema, thus facilitating comparative analyses and data exchange. An encouraged form of data exchange is the submission of experimentally determined metabolic pathways curated by curators of other PGDBs for inclusion in MetaCyc, broadening the pathways available for pathway prediction, and easing the bottleneck of data entry into MetaCyc. Conversely, Pathway Tools now includes the ability to propagate updates made to MetaCyc to other PGDBs derived from earlier versions of MetaCyc. For example, corrections made to MetaCyc chemical structures or reaction equations can be propagated to other PGDBs. Further, Pathway Tools can perform incremental pathway prediction, thus propagating newly curated pathways present in the latest version of MetaCyc to older organism-specific PGDBs.
Another means for facilitating data exchange is the PGDB registry, operated by SRI. Groups that curate PGDBs can register their databases in our PGDB registry (http://biocyc.org/registry.shtml) in a process that includes deposition of the PGDB in a downloadable format on the author’s FTP or HTTP site. With a few mouse clicks, any Pathway Tools user can download a PGDB listed in the Registry and install it into their working copy of Pathway Tools, making it available for comparative analysis, omics data analysis, etc. Thus, users can share PGDBs as easily as they exchange music files on the Internet.
Starting with version 13.0, all MetaCyc compounds have been adjusted to a consistent protonation state for a reference pH of 7.3, common in the cellular cytosol. This adjustment was performed using the Marvin computational chemistry software (ChemAxon Kft, Budapest, Hungary). In addition, all reactions that had a mass imbalance due only to hydrogen atoms were computationally balanced by adding or removing protons from the appropriate side of the reaction. These updated compounds and reactions eventually will be propagated into all BioCyc PGDBs, and to other PGDBs created using Pathway Tools, making it easier to apply flux-balance analysis techniques to these databases.
This change resulted in certain differences between some MetaCyc reactions and the comparable reactions in other databases, such as the ENZYME Database (24). However, we believe that the representation of reactions in MetaCyc is more consistent and, within the limits of the cytosolic pH of 7.3, more accurate.
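The hydrogen-only balancing step described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the procedure implemented in Pathway Tools: compounds are represented as plain element-count dictionaries with an integer charge, the function names are hypothetical, and we additionally require that the added protons also balance the charge, a check the text does not mention. The example uses ATP hydrolysis with protonation states commonly assumed near neutral pH (ATP 4-, ADP 3-, hydrogen phosphate 2-).

```python
from collections import Counter
from typing import Optional

# One reaction side: list of (coefficient, formula, charge), where
# formula maps element symbol -> atom count.
Side = list[tuple[int, dict[str, int], int]]


def side_totals(side: Side) -> tuple[Counter, int]:
    """Sum element counts and net charge over one side of a reaction."""
    atoms: Counter = Counter()
    charge = 0
    for coeff, formula, q in side:
        for element, count in formula.items():
            atoms[element] += coeff * count
        charge += coeff * q
    return atoms, charge


def proton_correction(left: Side, right: Side) -> Optional[int]:
    """Return n: add n H+ to the right side if n > 0, or -n H+ to the
    left side if n < 0. Return None when the imbalance involves any
    element other than hydrogen, or when protons would not also
    balance the charge (our added assumption)."""
    l_atoms, l_charge = side_totals(left)
    r_atoms, r_charge = side_totals(right)
    diff = Counter(l_atoms)
    diff.subtract(r_atoms)  # left minus right; keeps zero and negative counts
    if any(n != 0 for element, n in diff.items() if element != "H"):
        return None
    n = diff["H"]
    if l_charge - r_charge != n:
        return None
    return n


# ATP(4-) + H2O -> ADP(3-) + hydrogen phosphate(2-), written without H+.
atp = {"C": 10, "H": 12, "N": 5, "O": 13, "P": 3}
adp = {"C": 10, "H": 12, "N": 5, "O": 10, "P": 2}
water = {"H": 2, "O": 1}
phosphate = {"H": 1, "O": 4, "P": 1}

left = [(1, atp, -4), (1, water, 0)]
right = [(1, adp, -3), (1, phosphate, -2)]
correction = proton_correction(left, right)
print(correction)
```

A positive result means the right side is short of hydrogen and the reaction is completed by adding that many protons as products; reactions with any other elemental imbalance are left for manual review.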
The following paragraphs list a number of the most salient improvements to Pathway Tools during the past 2 years.
The BioCyc Web site has undergone a significant overhaul that includes a new toolbar, a new organism selector widget, and new search commands. The new object-specific search commands provide an intermediate level of search complexity that lies between the very easy-to-use Quick Search box and the sophisticated Advanced Query Page. Search pages customized for finding genes/proteins/RNAs, chemical compounds, and pathways are relatively easy to use, yet enable the user to define multi-criteria searches (e.g. find proteins that satisfy specified constraints on their pI, molecular weight, cellular location, small-molecule ligand and chromosomal location). As part of the site redesign, the Regulatory Overview tool that depicts the complete regulatory network stored within a PGDB is now available through the Web site (currently only the desktop version of Pathway Tools supports painting of omics data on the Regulatory Overview).
Users of the BioCyc Web site can now create accounts in which they can store Web site display preferences, specify a default organism for queries, and define and save organism lists for comparative genomics operations.
The desktop version of Pathway Tools enables users to graph omics data for selected genes or metabolites, and also provides over-representation analysis for determining whether certain ontology classes (including GO, MetaCyc Pathway Ontology, etc.) are over-represented in gene lists and metabolite lists. A new X–Y plot style of tracks for the Pathway Tools genome browser allows the user to visualize ChIP-chip datasets against the genome. ChIP-chip intensity measurements can be visually correlated with promoters, gene positions and operon boundaries.
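As an illustration of what an over-representation analysis computes, the sketch below evaluates a one-sided hypergeometric test, a common choice for this task. The text does not state which statistical test Pathway Tools uses, so this is an assumption, and the function name and toy numbers are ours.

```python
from math import comb


def hypergeom_upper_tail(population: int, in_class: int,
                         drawn: int, observed: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric.

    population: total number of genes considered
    in_class:   genes in the population annotated to the ontology class
    drawn:      size of the gene list being tested
    observed:   genes in the list annotated to the class
    """
    total = comb(population, drawn)
    upper = min(in_class, drawn)
    favourable = sum(
        comb(in_class, i) * comb(population - in_class, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Toy example: 10 genes, 4 of them in the class; a list of 3 genes
# contains 2 class members.
p_value = hypergeom_upper_tail(population=10, in_class=4, drawn=3, observed=2)
print(p_value)
```

A small p-value suggests that the class appears in the list more often than expected by chance; when many classes are tested at once, a multiple-testing correction would normally be applied as well.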
The appearance of pathway pages can now be customized in many respects (Pathway → Customize Diagram). Options include setting the font size, determining which elements are included in the drawing (such as enzyme names and gene names), and deciding whether chemical structures are displayed. The pathway diagrams can be downloaded as screen-resolution GIF images (for import into PowerPoint presentations), or as high-resolution PostScript or PDF files for import into documents.
Pathway Tools now runs on Apple computers.
The BioCyc.org and MetaCyc.org Web sites provide several informational resources, including an online BioCyc guided tour (25), a MetaCyc user guide (26) and many Webinar videos that combine narration with online demonstration of different topics (27). We routinely host workshops and tutorials (on site and at conferences) that provide training and in-depth discussion of our software for beginning and advanced users. To stay informed about recent changes and enhancements to our software, join the BioCyc mailing list at http://biocyc.org/subscribe.shtml. A list of our publications is available online (28).
The MetaCyc and BioCyc databases are freely and openly available to all. See http://biocyc.org/download.shtml for download information. New versions of the downloadable data files and of the BioCyc and MetaCyc Web sites are released four times per year.
National Institutes of Health, National Institute of General Medical Sciences (GM080746, GM077678, and GM75742); National Science Foundation, Division of Biological Infrastructure (grant number 0640769 to P.Z., A.K., K.D.). Funding for open access charge: National Institutes of Health, National Institute of General Medical Sciences.
Conflict of interest statement. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the preceding agencies.
We thank Dr Carol Bult and Dr Alexei Evsikov from the Jackson Laboratory for their contribution of pathways from the MouseCyc database. We thank Dr Lindsay Eltis and Dr Hao-Ping Chen from the University of British Columbia for their contribution of pathways from the Rhodococcus jostii RHA1 database. We also thank Dr Malabika Sarker from SRI International for her contribution of the Mycobacterium tuberculosis mycolate biosynthesis pathway.