GENIES accepts any gene or protein data set supplied as a tab-delimited text file, either as a profile matrix or as a kernel similarity matrix predefined by the user. For example, suppose that we are given three profile matrices: gene expression, subcellular localization and phylogenetic profiles. Gene expression profiles form a real-valued profile matrix in which the rows represent genes and the columns represent experimental conditions or time points. Subcellular localization profiles form a binary profile matrix in which the rows represent gene products and the columns represent subcellular compartments (e.g. Golgi, endoplasmic reticulum); the presence or absence of each gene product in each compartment is coded as 1 or 0, respectively. Phylogenetic profiles form a binary profile matrix in which the rows represent genes and the columns represent fully sequenced organisms; the presence or absence of an ortholog of each gene in each organism is coded as 1 or 0, respectively. KEGG gene IDs are accepted in the input data so that the genes can be mapped onto the KEGG PATHWAY maps, and some input examples are provided on the help page (http://www.genome.jp/tools/genies/help.html; 9 May 2012, date last accessed).
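As a minimal sketch of the tab-delimited profile matrix described above, the following Python snippet writes and re-reads a small binary phylogenetic profile. The gene IDs, organism labels, header row and function names are placeholders chosen for illustration only; the exact layout that GENIES expects is documented on its help page and may differ.

```python
import csv
import io

# Placeholder toy data (not real genes): rows are genes, columns are organisms,
# and 1/0 codes the presence/absence of an ortholog.
ORGANISMS = ["org_A", "org_B", "org_C"]
PROFILES = {
    "xxx:G0001": [1, 0, 1],
    "xxx:G0002": [0, 0, 1],
}


def write_profile_matrix(handle, columns, profiles):
    """Write a header row, then one tab-delimited row per gene."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + list(columns))
    for gene_id, values in profiles.items():
        writer.writerow([gene_id] + list(values))


def read_profile_matrix(handle):
    """Read the matrix back as (column names, {gene ID: list of floats})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    profiles = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return header[1:], profiles


# Round-trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_profile_matrix(buffer, ORGANISMS, PROFILES)
buffer.seek(0)
columns, loaded = read_profile_matrix(buffer)
```

A real-valued gene expression matrix could be written the same way, with experimental conditions as the column labels.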
The output of GENIES is a weighted graph with genes as nodes and prediction scores as edge weights, presented in four ways (Figure 2): Pathway list, Inferred list, Search, and Details & Download (an example can be seen at http://www.genome.jp/tools-bin/genies?mode=path&id=example; 9 May 2012, date last accessed). The first option, Pathway list, outputs the predicted interactions grouped by KEGG PATHWAY (20) map. When the user selects a pathway, the genes that are predicted to interact with other genes in that pathway are highlighted. The second option, Inferred list, provides the predicted interaction pairs categorized as training versus prediction (TP), prediction versus prediction (PP) and training versus training (TT), where ‘training’ and ‘prediction’ denote genes that are found and not found in KEGG PATHWAY, respectively. The third option, Search, enables the user to search for genes that are predicted to interact with the genes of interest. This option is useful for finding possible missing enzyme genes: the user can use the KEGG PATHWAY maps that contain the missing enzyme in the organism of interest. The last option, Details & Download, provides the list of predicted gene pairs as a downloadable tab-delimited text file, which can be viewed with visualization software such as Cytoscape (http://www.cytoscape.org/; 9 May 2012, date last accessed) (21).
Figure 2. Output example of GENIES. (a) Pathway list shows the predicted gene–gene interactions grouped based on the KEGG PATHWAY maps. (b) Inferred list classifies the gene–gene network into training–prediction (TP), prediction–prediction ...
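The TP/PP/TT categories used by the Inferred list can be sketched as follows. This is an illustration under stated assumptions, not the GENIES implementation: the three-column `gene_a<TAB>gene_b<TAB>score` layout, the gene names and both function names are hypothetical, and the set of training genes stands in for the genes found in KEGG PATHWAY.

```python
def categorize_pair(gene_a, gene_b, training_genes):
    """Return 'TT', 'TP' or 'PP' from how many of the two genes are training genes."""
    known = (gene_a in training_genes) + (gene_b in training_genes)
    return {2: "TT", 1: "TP", 0: "PP"}[known]


def group_predictions(lines, training_genes):
    """Group tab-delimited 'gene_a<TAB>gene_b<TAB>score' lines by category."""
    groups = {"TT": [], "TP": [], "PP": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene_a, gene_b, score = line.split("\t")
        category = categorize_pair(gene_a, gene_b, training_genes)
        groups[category].append((gene_a, gene_b, float(score)))
    return groups


# Hypothetical example: g1 and g2 are 'training' genes, g3 and g4 are not.
TRAINING = {"g1", "g2"}
EXAMPLE_LINES = ["g1\tg2\t0.91", "g1\tg3\t0.75", "g3\tg4\t0.60"]
groups = group_predictions(EXAMPLE_LINES, TRAINING)
```

The same grouping could be applied to a downloaded pair list before loading it into a network viewer, provided its columns match the assumed layout.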
The workflow of GENIES is illustrated in the accompanying workflow figure. Simple mode is provided for users who want to see results with the default settings. In Simple mode, profile matrices are converted into kernel similarity matrices with a linear kernel, all kernels are integrated with equal weights, and supervised learning by kernel matrix regression is performed using KEGG PATHWAY as the training network data. After obtaining the prediction result, the user can inspect the default settings and modify them to rerun the prediction with different parameters (indicated by the dotted arrow in the figure). In Advanced mode, users can choose between the direct and the supervised approaches (although we recommend the supervised approach for associating uncharacterized genes with known pathways). Advanced mode also offers a choice of kernel functions, network inference algorithms and training network data, as well as some algorithm parameters. By default, molecular network information in KEGG PATHWAY is used as the training network data, although users can supply their own network as an adjacency matrix of the genes.
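The first two steps of Simple mode, converting profiles with a linear kernel and integrating the kernels with equal weights, can be sketched in plain Python as below. The toy matrices and function names are illustrative assumptions, the text does not say whether GENIES normalizes kernels before combining them, and the subsequent kernel matrix regression step is not shown.

```python
def linear_kernel(profiles):
    """Return K where K[i][j] is the dot product of profile rows i and j."""
    return [
        [float(sum(a * b for a, b in zip(row_i, row_j))) for row_j in profiles]
        for row_i in profiles
    ]


def integrate_equal_weight(kernels):
    """Average several same-sized kernel matrices element-wise (equal weights)."""
    n = len(kernels[0])
    count = len(kernels)
    return [
        [sum(k[i][j] for k in kernels) / count for j in range(n)]
        for i in range(n)
    ]


# Toy profiles for the same three genes, in the same row order (made-up values).
EXPRESSION = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]   # real-valued
PHYLOGENETIC = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]     # binary

k_expr = linear_kernel(EXPRESSION)
k_phyl = linear_kernel(PHYLOGENETIC)
k_combined = integrate_equal_weight([k_expr, k_phyl])
```

Both kernels are gene-by-gene matrices, so data sets with different numbers of columns can be combined as long as their rows refer to the same genes in the same order.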