Correlations of gene-to-gene co-expression and metabolite-to-metabolite co-accumulation calculated from large amounts of transcriptome and metabolome data are useful for uncovering unknown functions of genes, functional diversities of gene family members and regulatory mechanisms of metabolic pathway flows. Many databases and tools are available to interpret quantitative transcriptome and metabolome data, but only a few connect correlation data to biological knowledge in a way that helps to find its biological significance. We report here a new metabolic pathway database, KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4/), which is able to overlay gene-to-gene and/or metabolite-to-metabolite relationships as curves on a metabolic pathway map, or on a combination of up to four maps. This representation can help to discover, for example, novel functions of a transcription factor that regulates genes on a metabolic pathway. Pathway maps of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and maps generated from their gene classifications are available in the KaPPA-View4 KEGG version (http://kpv.kazusa.or.jp/kpv4-kegg/). At present, gene co-expression data from the databases ATTED-II, COXPRESdb, CoP and MiBASE for human, mouse, rat, Arabidopsis, rice, tomato and other plants are available.
A vast amount of DNA microarray data has been accumulated in public repositories, contributing to studies such as investigations of gene expression patterns and discoveries of marker transcripts. Furthermore, correlation coefficients as indicators of co-expression between genes, calculated from transcriptome data from a large variety of tissues and cell conditions, and analyses of gene co-expression networks whose member genes are connected with high correlation coefficients, have contributed to the discovery of novel functions of genes (1–3). For instance, the functions of key genes in the biosynthesis of glucosinolates (4), cell walls (5) and flavonoids (6) in plants were successfully uncovered by co-expression analyses. Recent advances in co-expression analysis include the introduction of the mutual rank (MR) as a better index of co-expression (7), and new strategies to extract networks whose member genes are densely and specifically co-expressed (8,9). Meanwhile, efforts to obtain comprehensive quantitative data of metabolites in organisms (metabolomes) have also been made by chromatography–mass spectrometry and nuclear magnetic resonance (10,11), and the number of public databases in which metabolome data are compiled has been increasing (12). Correlation analyses of metabolite-to-metabolite co-accumulations have also been made (13–15). Müller-Linow et al. (16) suggested that the distance between metabolites on a metabolic pathway was not related to the co-accumulation correlation. Camacho et al. (17) reported several factors that relate to how metabolites co-accumulate. Correlations of metabolite co-accumulations are thought to be useful for understanding the regulation of metabolic pathways when they are comparatively analyzed between different sets of data sources (18).
So far, numerous databases and tools have been developed to understand the biological significance of transcriptome and metabolome data. Tools for functional categorization of genes, referred to as gene ontology (GO), have been published (19). To promote a more intuitive understanding, databases and tools that project data onto metabolic pathway maps have been developed (20). As a source for pathway maps, data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) have been used in many tools, including KegArray provided by KEGG itself (http://www.genome.jp/kegg/download/kegtools.html) (21–23). We have also developed and upgraded a web-based viewer, KaPPA-View, which can represent transcriptome and metabolome data simultaneously on a single pathway map (24–26), and it has contributed to several plant studies (27–29). In contrast to the quantity of software available for analyzing transcriptome and metabolome quantitative data, only a few tools have so far been developed for studying gene-to-gene and metabolite-to-metabolite correlation data. In ATTED-II (30) and COXPRESdb (2), genes mapped onto KEGG pathways are highlighted on a gene network figure that is represented according to a user’s query gene. MapMan can represent the correlation value for each gene on a map against a query gene (31). All of these systems represent correlations only for a single query gene, and there are no databases or tools that can overview all the gene-to-gene relationships on the pathway maps. Furthermore, no tools for projecting metabolite-to-metabolite correlations onto pathway maps have been developed so far.
Here, we report a novel database system, KaPPA-View4, which enables the overlay of correlation data onto pathway maps. After reporting the first version of KaPPA-View as an Arabidopsis-specific ‘omics’ data viewer (24), we added functions for correlation representation in version 2, and for multiple map representation and for managing multiple organisms in version 3. To meet the recent demand for handling multiple, very large sets of correlation data, we reviewed the architecture of the database system, completely rebuilt the web application, especially the pathway display module, and released the newest version, KaPPA-View4, as an ‘omics’ database that can immediately display results. The ‘omics’ viewer functions were extended to accept data uploaded from outside systems. As far as we know, this version is a unique database system that can represent all gene-to-gene and metabolite-to-metabolite correlations on metabolic pathway maps.
We are currently providing two versions of the system. One is KaPPA-View4 Classic (http://kpv.kazusa.or.jp/kpv4/) whose pathway maps are based on the traditional KaPPA-View maps of Arabidopsis (24). Another is KaPPA-View4 KEGG (http://kpv.kazusa.or.jp/kpv4-kegg/), where pathway maps are acquired from KEGG. KaPPA-View4 Classic is primarily for plant scientists, implementing information for genome-sequenced plant species: Arabidopsis, rice and Lotus japonicus; and for plant species whose Affymetrix GeneChips and correlation data are available: tomato, soybean, barley, poplar, wheat, wine grape and maize. KaPPA-View4 KEGG is for general users, including animal, microorganism and also plant scientists. One of the major advantages of KaPPA-View4 KEGG is that we can share the latest results of KEGG’s continuous effort of curation of gene descriptions, categorizations and assignments on the maps. At present, we provide information for 15 species, namely human, mouse, rat, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis, rice, poplar, castor bean, sorghum, wine grape, maize, Physcomitrella patens subsp. patens, Escherichia coli and budding yeast. Details of the basic information of the default installed data are shown in Supplementary Data Section 1. Both versions are freely available.
In KaPPA-View4 Classic, the map data are based on approximately 150 leaves of Arabidopsis pathway maps generated for the initial version (24). Genes of the other plant species were assigned by sequence homology searches using the blastx or blastp program against the Arabidopsis amino acid sequences TAIR9_pep provided by The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/). The best hit Arabidopsis genes for each gene or target sequence of Affymetrix GeneChip probes of the other species were defined as those having the minimum e-value among candidates with an e-value threshold of 1 × 10⁻³⁰. The genes or probes were then associated with the enzymatic reactions to which the best hit Arabidopsis genes are assigned. For Lotus japonicus, two extra pathway maps for isoflavonoid biosynthesis (I and II) were added and the gene assignments on these maps were performed manually.
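The best-hit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used for KaPPA-View4: the function name `best_hits`, the three-column input (query ID, Arabidopsis gene ID, e-value) and the example IDs are assumptions made here, as are the choices to treat the threshold as inclusive and to keep the first candidate when e-values tie.

```python
from typing import Dict, Iterable, Tuple

# Threshold stated in the text; treating it as inclusive is an assumption.
E_VALUE_THRESHOLD = 1e-30


def best_hits(
    rows: Iterable[Tuple[str, str, float]],
    threshold: float = E_VALUE_THRESHOLD,
) -> Dict[str, str]:
    """Map each query (gene or probe) to the Arabidopsis gene with the
    smallest e-value, ignoring candidates above the threshold.

    `rows` holds (query_id, arabidopsis_gene_id, e_value) tuples, e.g.
    parsed from tabular BLAST output (hypothetical input layout).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in rows:
        if evalue > threshold:
            continue  # candidate does not pass the e-value threshold
        current = best.get(query)
        # On ties the first candidate seen is kept (an assumption).
        if current is None or evalue < current[1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


# Illustrative data only; the IDs and e-values are made up.
example_rows = [
    ("probe_A", "AT1G01010", 1e-50),
    ("probe_A", "AT1G01020", 1e-80),
    ("probe_B", "AT2G01010", 1e-10),  # fails the threshold, so unassigned
]
assignments = best_hits(example_rows)
```

A query with no candidate passing the threshold simply receives no assignment, so it would not appear on any map in this sketch.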
A pipeline was written in Java (www.java.com) to download files from the KEGG FTP (file transfer protocol) site and generate all the data files required for the KaPPA-View4 KEGG setup, including gene, compound and reaction information and pathway map data. The pathway maps were arranged as a tree structure for each organism according to the KEGG BRITE hierarchy, and bird’s eye maps (24) were generated for the nodes of the tree. The hierarchically categorized gene family data ‘Genes and Proteins’ in KEGG BRITE were added to the tree, where each family was represented as a map and its member genes were arrayed on the map (Figure 2, top-right map). During setup of the version on 14 June 2010, 290 and 379 sheets of metabolic pathway maps and gene family maps, respectively, were installed.
Table 1 summarizes the gene co-expression data we provide in the KaPPA-View4 system by default. The data files were downloaded from ATTED-II and COXPRESdb or provided by MiBASE (32) and CoP (33). For co-expression data expressed as Pearson’s correlation coefficients (PCC), strong positive and negative correlations were filtered at threshold values of 0.6 and −0.6. For the mutual rank (MR), only positive values ≤100 were selected. For CoP data, significant gene-to-gene relationships evaluated as ‘Co-expressed Genes’ (33) were extracted, and the co-expression values were expressed as cosine theta correlation coefficients.
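The filtering rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the (gene, gene, value) tuple layout are hypothetical, and whether a PCC of exactly 0.6 or −0.6 was retained is not stated in the text, so the inclusive comparison here is a guess.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str, float]  # (gene_a, gene_b, correlation value)


def keep_pcc(value: float, threshold: float = 0.6) -> bool:
    """Keep strong positive or strong negative Pearson correlations.
    Inclusive bounds are an assumption."""
    return value >= threshold or value <= -threshold


def keep_mr(value: float, max_rank: float = 100) -> bool:
    """Keep mutual rank values that are positive and at most 100."""
    return 0 < value <= max_rank


def filter_pairs(pairs: List[Pair], keep: Callable[[float], bool]) -> List[Pair]:
    """Return only the gene pairs whose value passes the given rule."""
    return [(a, b, value) for a, b, value in pairs if keep(value)]


# Illustrative values only; gene names are placeholders.
pcc_pairs = [("g1", "g2", 0.75), ("g1", "g3", 0.20), ("g2", "g3", -0.90)]
strong_pcc = filter_pairs(pcc_pairs, keep_pcc)

mr_pairs = [("g1", "g2", 3.5), ("g1", "g3", 250.0)]
close_mr = filter_pairs(mr_pairs, keep_mr)
```

Note the opposite direction of the two indices: a larger absolute PCC means stronger co-expression, whereas a smaller MR does.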
In KaPPA-View4 KEGG, the gene IDs originated from the entry IDs of KEGG. In the case of rice, KEGG entry IDs were derived from Entrez Gene IDs from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/), while the co-expression data from ATTED-II and CoP were written as gene IDs from The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/). Therefore, the gene IDs in the co-expression data were converted to NCBI Entrez Gene IDs according to the gene information file of KEGG for rice.
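The ID conversion described above amounts to a lookup applied to both genes of each co-expressed pair. The sketch below assumes a one-to-one mapping table has already been read from the KEGG gene information file; the function name, the tuple layout and the placeholder IDs are hypothetical, and the text does not say how unmapped or ambiguous IDs were handled, so dropping such pairs and reporting them is only one possible choice.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str, float]  # (gene_a, gene_b, correlation value)


def convert_pairs(
    pairs: List[Pair], rap_to_entrez: Dict[str, str]
) -> Tuple[List[Pair], Set[str]]:
    """Rewrite RAP-DB gene IDs as Entrez Gene IDs.

    Pairs in which either gene lacks a mapping are dropped (an assumption),
    and the unmapped IDs are returned so they can be inspected.
    """
    converted: List[Pair] = []
    unmapped: Set[str] = set()
    for gene_a, gene_b, value in pairs:
        entrez_a = rap_to_entrez.get(gene_a)
        entrez_b = rap_to_entrez.get(gene_b)
        if entrez_a is None:
            unmapped.add(gene_a)
        if entrez_b is None:
            unmapped.add(gene_b)
        if entrez_a is not None and entrez_b is not None:
            converted.append((entrez_a, entrez_b, value))
    return converted, unmapped


# Placeholder IDs, not real RAP-DB or Entrez Gene identifiers.
mapping = {"RAP_A": "1001", "RAP_B": "1002"}
pairs = [("RAP_A", "RAP_B", 0.8), ("RAP_A", "RAP_C", 0.7)]
converted, unmapped = convert_pairs(pairs, mapping)
```

Returning the unmapped IDs alongside the converted pairs makes it easy to see how much correlation data is lost in the conversion.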
The web application of the KaPPA-View4 system was developed with Java 1.6 (www.java.com), Tomcat 5.5 (tomcat.apache.org), MySQL 5.0 (dev.mysql.com) and Flex SDK 3 (opensource.adobe.com/wiki/display/flexsdk/). We chose the Adobe Flash Player (ver. 9.0 or higher) to display the map representations on Internet browsers. All the map data on KaPPA-View4 were prepared in the SVG (scalable vector graphics) format, and the map data are converted to Flash objects when displayed. Therefore, the SVG Viewer plug-in that had been required for previous versions is no longer needed for KaPPA-View4. This modification has remarkably improved the processing speeds of map representations, thus avoiding longer rendering time on the SVG Viewer plug-ins, especially for the maps where a lot of objects were drawn, e.g. maps in the Multiple Map mode (mentioned later) with a lot of genes and correlation curves. It also improved compatibility with Internet browsers and operating systems.
Both KaPPA-View4 Classic and KaPPA-View4 KEGG run on a Red Hat Enterprise Linux ES v.4 server equipped with a 3.2 GHz 64-bit Intel Xeon CPU and 2 GB RAM.
Gene-to-gene co-expression data are represented as curves on pathway maps (colored red by default). In the Map View function, users first select the species and the pathway maps to view, and then select the kind of data listed in the control panel placed below the pathway map (Figure 1a). All correlations that pass the filters set in the control panel are shown. As shown in Figure 1a, many of the genes in the Calvin Cycle are co-expressed in Arabidopsis. Pathway maps having dense correlations are distinguished on the bird’s eye map set to correlation mode (Figure 1b). When the species is switched to tomato and the correlation data are set to the tomato data, we see that the genes in the Calvin Cycle are densely connected too, implying that this is a general expression tendency across plant species (Figure 1c, see ‘Discussion’ section). Metabolite-to-metabolite co-accumulations can also be overlaid on pathway maps simultaneously with gene co-expression and ‘omics’ data (Figure 1d, curves in blue). Users can utilize not only the default correlation data but also user-created ones (as described later).
The KaPPA-View4 system enables up to four maps to be displayed in a single browser window (Multiple Map mode) with correlation curves represented, resulting in visualization of correlations across the maps (Figure 2). Users can choose the combination of the pathway maps for the Multiple Map mode. In KaPPA-View4 KEGG, gene family maps can be included too (Figure 2, top-right panel). This greatly facilitates analysis of non-metabolic pathway genes such as protein kinases and transcription factors that are not available in other KEGG-based programs developed so far. Furthermore, user-created SVG maps (Figure 2, bottom-right) and simple maps where genes for user-input gene IDs are arrayed (Figure 2, bottom-left) are included in the representation in the Multiple Map mode. Therefore, correlation representations in the Multiple Map mode help researchers primarily to understand relationships between genes and metabolic pathways, which could also be described as functional modules of biological systems. As exemplified in Supplementary Data Section 2, it is possible to find potential genes regulating certain metabolic pathways by representing correlation curves between the gene family maps and the pathway maps (see ‘Discussion’ section).
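The visual search for candidate regulators described above, looking for genes on a gene family map with many curves reaching a pathway map, corresponds to a simple count. The sketch below is only an illustration of that idea and is not part of KaPPA-View4: the function name, the input layout and the placeholder gene names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str, float]  # (gene_a, gene_b, correlation value)


def rank_candidates(
    family_genes: Iterable[str],
    pathway_genes: Iterable[str],
    pairs: Iterable[Pair],
) -> List[Tuple[str, int]]:
    """Rank genes on a gene family map by the number of distinct genes
    on a pathway map with which they are correlated."""
    family, pathway = set(family_genes), set(pathway_genes)
    partners: Dict[str, Set[str]] = defaultdict(set)
    for gene_a, gene_b, _value in pairs:
        # Correlation pairs are undirected, so check both orientations.
        for candidate, target in ((gene_a, gene_b), (gene_b, gene_a)):
            if candidate in family and target in pathway and candidate != target:
                partners[candidate].add(target)
    # Most connected first; ties are broken alphabetically.
    return sorted(
        ((gene, len(targets)) for gene, targets in partners.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Placeholder gene names for illustration.
family = ["TF1", "TF2"]
pathway = ["E1", "E2", "E3"]
pairs = [
    ("TF1", "E1", 0.80),
    ("E2", "TF1", 0.70),
    ("TF2", "E3", 0.90),
    ("TF1", "TF2", 0.95),  # within the family map, so not counted
]
ranking = rank_candidates(family, pathway, pairs)
```

A high count only flags a gene for closer inspection; as with the curves on the maps, co-expression alone does not establish a regulatory relationship.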
Users are allowed to upload their own correlation data, as well as their experimental data and map data, and utilize them for their own analyses. Therefore, users can analyze not only the correlation data calculated from the large variety of sample conditions, but also data obtained from sets of specific treatments to focus on treatment-dependent responses. This functional extension should especially help analyze metabolite correlations that are presently of limited public availability. Furthermore, users can freely create their own user accounts to save these data on the KaPPA-View4 server, which allows users to start analyses immediately after login.
As an extra mode for map representations, the KaPPA-View4 system has a ‘Universal Map’ mode where genes from multiple species are drawn simultaneously on a single map. This mode helps to recognize the differences in gene assignments and orthologous genes between the species selected. In this mode, transcriptome and metabolome data from different species can be compared on the maps (Figure 3). This should help study the diversity of paralogs and their functional differences between species by comparing omics data obtained from different species under similar physiological conditions.
The ability to represent transcriptome and metabolome data on the pathway maps was extended to an API (application programming interface), which is accessible from outside systems. Databases and application programs that store omics data can upload their data to KaPPA-View4 by the API and view it directly from their systems without logging in to KaPPA-View4 through a web user interface. As an example, the microarray data stored in MiBASE are immediately represented on KaPPA-View4 when users click the ‘view’ button in the data list of MiBASE (Supplementary Data Section 3). Since the session is kept alive as long as the browser is running, the uploaded data are stacked as temporary data on the KaPPA-View4 server. This allows users to set and compare other pairs of data.
KaPPA-View4 is a unique and novel database system that helps users study the biological significance of correlation data by overlaying it onto pathway maps. ATTED-II (30), COXPRESdb (2) and MapMan (31) can represent gene-to-pathway relationships, but the representations are restricted to a single query gene. One advantage of these systems is that users can get all the information for the query gene. In contrast, in KaPPA-View4, whose subject is pathway information, all the correlations are represented on the pathway maps, although the representations are restricted to the selected pathways. As far as we know, KaPPA-View4 is the only database that can overview all the correlations on pathway maps.
One potential application of representing data in the KaPPA-View4 style is hypothesis generation. We found that one of the genes classified as a helix-turn-helix type transcription factor in human is related to many of the genes on the map of Hypertrophic cardiomyopathy (HCM) (Supplementary Data Section 2). The gene, CSRP3 (cysteine and glycine-rich protein 3, cardiac LIM protein, NCBI Entrez Gene ID: 8048), is also highly co-expressed with a series of genes on the maps of Cardiac muscle contraction, Dilated cardiomyopathy (DCM) and Cell Communication/Tight junction (Supplementary Figure S2b). There is a recent report showing that CSRP3 is indeed involved in HCM and DCM (34). Although the function of the CSRP3 gene has already been reported and the gene might not act as a transcription factor in this case, this example shows that representing co-expression networks on pathway maps is a powerful approach to discovering novel gene functions.
Another advantage of the KaPPA-View4 style is shown in Figure 1 where a similarity in gene co-expression networks is observed in the Calvin Cycle between Arabidopsis and tomato. Differences in networks concerning the genes of ribulose bisphosphate carboxylases and triosephosphate isomerases were observed too. To depict the similarities and differences of the networks using currently existing databases and tools that depend on query genes, we would need to obtain correlation data for all the genes in the Calvin Cycle for each species, then re-categorize the data according to the gene assignments to the reactions, and compile them. As KaPPA-View4 is a pathway map oriented system, we can intuitively understand the nature of the relationships and discuss the significance of the similarities and differences observed. This is also an advantage when comparing the differences between metabolite-to-metabolite co-accumulation data sets, as comparative correlation analyses have been suggested to be useful for understanding the regulation of metabolic flows (18).
As described here, pathway map-oriented representation of correlation networks in KaPPA-View4 helps researchers to understand biological systems and to generate working hypotheses to advance their studies, and powerfully complements existing databases and tools such as ATTED-II, COXPRESdb and MapMan.
We have constructed a pipeline to prepare all the information files required for KaPPA-View4 KEGG. However, manual operations are still required for updating the files. As the KEGG data is updated once a week, we are now developing a program to automate this procedure so that the data on KaPPA-View4 KEGG are always kept up to date.
Both KaPPA-View4 Classic and KaPPA-View4 KEGG are freely accessible to the public. User accounts to save the user’s data can be created freely. The KaPPA-View4 system program is freely available for academic users upon request. All the information files for genes, compounds, enzyme reactions and pathway map hierarchies are available from the Download function in the web application. The pathway map data in SVG can be downloaded from the map representation window. We will support the updating and editing of default data according to users’ requirements.
Supplementary Data are available at NAR Online.
New Energy and Industrial Technology Development Organization (NEDO, Japan) as part of a project entitled the ‘Development of Fundamental Technologies for Controlling the Material Production Process of Plants’ (PMPj) [P02001]; NEDO as part of a project named the ‘New Energy Technology Development’ [P07015]; Japan Society for the Promotion of Science (JSPS), Grants-in-Aid for Scientific Research (B) [2038022 to K.Y.]; JSPS, Grants-in-Aid for Special Research on Priority Areas [19043015 and 21024010 to K.Y.]. Funding for open access charge: Kazusa DNA Research Institute Foundation.
Conflict of interest statement. None declared.
Most of the system programming was done by Axiohelix Co. Ltd. (www.axiohelix.com/en). The authors thank Dr Toshio Aoki of Nihon University for preparation of the pathway maps of isoflavonoid biosynthesis of Lotus japonicus. We are grateful to all the people who were engaged in the PMPj for preparation of the pathway maps, testing of the system and making useful comments.