As described above, organizing reactions into pathways is an important step in network reconstruction. However, in comparing the pathways in EMP and KEGG, we found that they are organized very differently. In EMP there are more than 300 metabolic pathways for the human metabolic network. Almost 100 of them are very small, containing three or fewer reactions. Moreover, the pathway maps show no links to other pathways. It is therefore very difficult to gain a whole picture of the human metabolic network from so many small pathways. In KEGG, all the reactions from different organisms are organized into about a hundred metabolic pathways. The problems with the KEGG pathways are the following: (1) they are not human specific, and in certain pathways only one or two isolated human reactions exist; (2) there is a high overlap between pathways. For example, many reactions are shared among the pathways of glutamate metabolism, urea cycle, and arginine and proline metabolism; grouping these into one larger pathway would show the functional relationships between their reactions much better; (3) the mass flow between a substrate and a product in the pathways is not as clear as in EMP or BioCyc (Karp et al, 2005). To address these problems, we decided to define a new set of pathways that (1) are human specific; (2) overlap less with one another; (3) are large enough to include functionally related small pathways; and (4) include links to other pathways, to give a better overview of the functional connectivity between pathways. Basically, the small pathways in EMP and KEGG were grouped on the basis of their functional relationships. Some original pathways were also split across different new pathways. Altogether, 2823 reactions are included in the network and reorganized into 66 pathways, each containing between 5 and 142 reactions (there are also more than 300 isolated reactions in the network). The retinol (vitamin A) pathway is shown as an example. Our pathways contain many more reactions than the corresponding KEGG pathways. Further pathway maps and the whole set of pathways in SBML format can be found in the Supplementary files
(the Supplementary data set and Supplementary figures; the network and the pathways in SBML format are also available at http://wwwtest.bioinformatics.ed.ac.uk/wiki/PublicCSB/EHMN). Users can open the SBML files directly in CellDesigner (Kitano et al, 2005) or other software to generate an automatic layout for the pathways. As described in previous studies, currency metabolites often cause trouble in the graph layout of metabolic pathways because they tend to link all the metabolites through short paths. Therefore, in the SBML files for the pathways we include only the main compounds in the ‘listOfReactants’ and ‘listOfProducts’ sections. This makes it possible to quickly generate clear pathway maps from the SBML files.
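The idea of keeping only the main compounds can be illustrated with a minimal sketch. It is not the procedure used to produce the published SBML files: the currency-metabolite set, the species identifiers and the helper names (`main_compounds`, `reaction_element`) are all assumptions made for illustration, and the output is a bare SBML-style fragment rather than a complete, valid SBML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical currency-metabolite list; the text does not enumerate one.
CURRENCY = {"ATP", "ADP", "NAD", "NADH", "NADP", "NADPH",
            "H2O", "H", "Pi", "CO2", "O2"}


def main_compounds(species, currency=CURRENCY):
    """Return the species that are not currency metabolites, keeping order."""
    return [s for s in species if s not in currency]


def reaction_element(rid, reactants, products, currency=CURRENCY):
    """Build an SBML-style <reaction> element listing only main compounds."""
    reaction = ET.Element("reaction", id=rid)
    for tag, species in (("listOfReactants", reactants),
                         ("listOfProducts", products)):
        parent = ET.SubElement(reaction, tag)
        for s in main_compounds(species, currency):
            ET.SubElement(parent, "speciesReference", species=s)
    return reaction


# Illustrative reaction with made-up identifiers:
# retinol + NAD -> retinal + NADH + H
elem = reaction_element("R_retinol_dehydrogenase",
                        ["retinol", "NAD"], ["retinal", "NADH", "H"])
print(ET.tostring(elem, encoding="unicode"))
```

With the currency metabolites dropped, the fragment links retinol only to retinal, so a layout tool does not draw edges from every reaction to shared cofactors such as NAD.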
An example pathway of the reconstructed human metabolic network: the retinol (vitamin A) pathway. Compared with the corresponding KEGG pathway (map00830), our pathway contains more reactions.
While reorganizing the pathways, we noticed that many reactions, especially those related to complex lipid metabolism, are missing from the genome-based network, whose reactions come mainly from the KEGG LIGAND database. For example, the reactions involved in the metabolism of omega-3 and omega-6 fatty acids (two essential nutrients for humans), mono-unsaturated fatty acids and epoxyeicosatrienoic acids (EETs) are almost completely missing from KEGG. Owing to the great structural variety of complex lipids, the total number of lipid metabolites is more than 8000 (Fahy et al, 2005) and most of them exist in the human metabolic network. Therefore, even though we have already added many lipid pathways from the literature, the network is still far from complete. A comprehensive database on lipids and their related enzymes in various organisms has been developed by the LIPID MAPS Consortium (Fahy et al, 2005; Cotter et al, 2006). Based on information in this database and other resources, more lipid-related pathways can be added in future versions of our database.
We further compared our network with another computationally reconstructed network, HumanCyc (version 10.6) (Romero et al, 2005). There are 996 reactions in that database, of which 766 are catalyzed by enzymes. This is only half of the number of reactions in our database. We extracted 976 EC numbers from HumanCyc and compared them with those in our database. We found that 151 EC numbers are in HumanCyc but not in our database. We then checked the reactions and proteins corresponding to these EC numbers, aiming to add new reactions to our database. Surprisingly, we found that 116 of the 151 new EC numbers have no coding gene, but were added by the pathway hole-filling algorithm used in the PathoLogic method for the computational reconstruction of metabolic networks in BioCyc (Karp et al, 2005). However, many of them are in pathways where many reactions lack any gene. For example, in the dTDP-L-rhamnose biosynthesis I pathway, only the reaction catalyzed by 220.127.116.11 is associated with a human gene. The other three reactions, catalyzed by 18.104.22.168, 22.214.171.124 and 126.96.36.199, were all added to complete the pathway. There is not even any literature on these reactions in humans. Therefore, we decided not to include these reactions in our network. For the other 35 EC numbers, we examined their corresponding genes and checked how these genes are annotated in other databases and the literature. We found that 24 EC numbers are unique to HumanCyc because of incorrect gene annotation in HumanCyc. For example, among the three genes annotated as encoding 188.8.131.52, gta actually functions as a galactosyltransferase activator, CDC2L2 is a galactosyltransferase-associated protein kinase, and ENSG00000165196 has already been removed from the latest ENSEMBL database (Hubbard et al, 2007). For the other 10 EC numbers, four have no reaction or correspond to a protein modification reaction, a type currently not included in our network. Therefore, we only need to add reactions for six EC numbers from HumanCyc. In fact, some of these reactions are already in our reconstruction, but with a different EC number. Altogether, nine reactions were added from HumanCyc. A complete list of the manually examined EC numbers unique to HumanCyc can be found in Supplementary Table 1. This comparative analysis between our network and HumanCyc indicates, on the one hand, the importance of integrating information from different databases for network reconstruction and, on the other, the importance of human curation for improving the quality of computationally reconstructed networks.
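The first, automatic part of such a comparison (finding EC numbers present in one database but not the other, then separating those with and without an associated gene) amounts to simple set operations. The sketch below is only an illustration under stated assumptions: the function names, the EC numbers and the gene labels are toy values invented here, not data from HumanCyc or from our network, and the manual annotation checks described above have no counterpart in the code.

```python
def ec_sort_key(ec):
    """Sort key for EC strings; '-' (an unclear field) sorts before numbers."""
    return tuple(-1 if part == "-" else int(part) for part in ec.split("."))


def unique_ec_numbers(source_ecs, reference_ecs):
    """EC numbers present in source_ecs but absent from reference_ecs."""
    return sorted(set(source_ecs) - set(reference_ecs), key=ec_sort_key)


def split_by_gene_support(ecs, ec_to_genes):
    """Partition EC numbers into those with and without an associated gene."""
    with_gene = [ec for ec in ecs if ec_to_genes.get(ec)]
    without_gene = [ec for ec in ecs if not ec_to_genes.get(ec)]
    return with_gene, without_gene


# Toy data, invented for illustration only.
other_db = {"1.1.1.1", "2.7.1.1", "3.1.3.9", "6.3.1.2"}
our_db = {"1.1.1.1", "2.7.1.1"}
genes = {"3.1.3.9": ["GENE_A"], "6.3.1.2": []}  # hypothetical gene labels

unique = unique_ec_numbers(other_db, our_db)
with_gene, without_gene = split_by_gene_support(unique, genes)

# Counts reported in the text: 151 unique EC numbers, 116 without a gene,
# which leaves the EC numbers that were examined manually.
to_examine = 151 - 116
print(unique, with_gene, without_gene, to_examine)
```

Only the EC numbers in `with_gene` would go on to manual inspection; those without gene support correspond to the hole-filled reactions that were left out.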
During the review process of this paper, another high-quality human metabolic network, reconstructed by Palsson's group (referred to as HMN-P below), was published (Duarte et al, 2007). We obtained their data from the BiGG database and compared it with our network (EHMN). At the gene level, EHMN contains 2322 genes from different databases, whereas HMN-P contains 1496 genes, mainly from EntrezGene (in fact, all genes have an EntrezGene ID). The two networks share 1069 genes. At the enzyme level, fewer than half of the genes in HMN-P are assigned EC numbers (fewer than 500 EC numbers in total, including unclear EC numbers). In EHMN, every gene has a clear or unclear EC number because we started the reconstruction from such genes. The total number of ECs is more than 800 (excluding unclear ECs). One may argue that in HMN-P ECs are not used to link genes with reactions. However, EC numbers are a widely used standard for representing metabolic reactions, and introducing them into the network can greatly simplify the comparative analysis of metabolic networks, because direct comparison of reaction equations is very difficult owing to compound synonyms. At the metabolite level, EHMN contains 2671 compounds, 1769 of which can be found in the KEGG database. HMN-P has 1469 compounds, about half of which are linked to KEGG. For the non-KEGG compounds, HMN-P gives only a single compound name, which makes it difficult to find a matching compound in other databases. As stated previously, we have developed a compound database with synonyms, structure information and IDs in different databases (in the Supplementary data set). At the reaction level, HMN-P contains more reactions than EHMN (3731 versus 2823). However, these include 1189 transport reactions and 457 exchange reactions, which are not considered in EHMN because subcellular location information is not yet included. Furthermore, there are 290 repeated reactions in HMN-P, that is, the same reaction occurring in different compartments. Therefore, the number of reactions comparable with EHMN is just 1795. Because of the intrinsic complexity of the human cell, it is very difficult to place the reactions into a small number of compartments. In fact, we have collected protein location information from different databases and have identified hundreds of cellular locations. We are working on developing a GO (Gene Ontology) (Ashburner et al, 2000)-based hierarchically compartmented human network for the next release.
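The reaction-level and gene-level figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the function and variable names are ours, and the network-specific gene counts and union size are derived here rather than stated in the text.

```python
def comparable_reactions(total, transport, exchange, compartment_repeats):
    """Reactions left after removing transport, exchange and repeated entries."""
    return total - transport - exchange - compartment_repeats


def overlap_summary(size_a, size_b, shared):
    """Sizes of the A-only, B-only and union sets given two sets' overlap."""
    return {
        "a_only": size_a - shared,
        "b_only": size_b - shared,
        "union": size_a + size_b - shared,
    }


# Counts as reported in the text for HMN-P reactions.
hmn_p_comparable = comparable_reactions(
    total=3731, transport=1189, exchange=457, compartment_repeats=290
)

# Gene counts as reported: EHMN 2322, HMN-P 1496, shared 1069.
genes = overlap_summary(size_a=2322, size_b=1496, shared=1069)

print(hmn_p_comparable, genes)
```

The first result reproduces the 1795 comparable reactions given in the text; the gene summary shows how many genes are specific to each reconstruction.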