We first constructed a database that consolidates kinase–substrate interactions from multiple online sources. We integrated data describing kinase–substrate interactions from NetworKIN (Linding et al.), Phospho.ELM (Diella et al.), MINT (Chatr-aryamontri et al.), HPRD (Mishra et al.), PhosphoPoint (Yang et al.) and Swiss-Prot (Quintaje and Orchard, 2008), as well as phosphorylation interactions that we previously extracted manually from the literature (Ma'ayan et al.). The NetworKIN database contains 3847 unique kinase–substrate pairs made of 73 kinases (21 families) linked to 1452 substrates. HPRD contains 1794 kinase–substrate pairs made of 229 kinases linked to 864 substrates. Phospho.ELM has 1451 interactions between 225 kinases and 784 substrates. MINT has 269 interactions between 145 kinases and 184 substrates. In PhosphoPoint there are 436 kinases, 3076 substrates and 9251 kinase–substrate relations, of which only 1587 are unique to this dataset, while the rest overlap with the other databases. In Ma'ayan et al., there are 66 interactions between 19 kinases and 43 substrates. There is some overlap among these sources, such that the number of unique kinase–substrate relations in the combined dataset totals 6414 links between 352 kinases and 2014 substrates. We consolidated interactions from mouse and rat into human by converting all protein/gene IDs to human Entrez gene symbols. Each kinase–substrate data record is associated with a specific kinase, kinase family and kinase subfamily. To group kinases into families, we used the kinome tree from Manning et al., in which kinases are classified into 10 major classes and 119 families. To further increase the size of our background dataset, we included all direct protein–protein interactions involving kinases from HPRD (Mishra et al.) and MINT (Chatr-aryamontri et al.). With this expansion, the current dataset contains a total of 11 923 interactions between 445 kinases and 3995 substrates.
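The consolidation step described above can be illustrated with a minimal sketch. This is not the code used to build the KEA database; the function names, the toy records and the ID-to-symbol lookup table are hypothetical, and the sketch only assumes that each source yields (kinase, substrate) pairs whose IDs can be mapped to human gene symbols.

```python
from collections import defaultdict


def consolidate(sources, to_human_symbol):
    """Merge (kinase, substrate) pairs from several sources into unique pairs.

    sources: dict mapping a source name to an iterable of (kinase, substrate) IDs.
    to_human_symbol: dict mapping a source ID to a human gene symbol
                     (hypothetical lookup table, assumed for illustration).
    Returns a dict mapping each unique (kinase, substrate) pair to the set of
    source names that report it.
    """
    merged = defaultdict(set)
    for name, pairs in sources.items():
        for kinase, substrate in pairs:
            k = to_human_symbol.get(kinase)
            s = to_human_symbol.get(substrate)
            if k is None or s is None:
                continue  # skip records that cannot be mapped to human symbols
            merged[(k, s)].add(name)
    return dict(merged)


def summarize(merged):
    """Return (number of unique pairs, number of kinases, number of substrates)."""
    kinases = {k for k, _ in merged}
    substrates = {s for _, s in merged}
    return len(merged), len(kinases), len(substrates)


# Toy records, invented for illustration only.
toy_sources = {
    "source_a": [("Mapk1", "Elk1"), ("AKT1", "GSK3B")],
    "source_b": [("MAPK1", "ELK1"), ("AKT1", "FOXO3")],
}
toy_map = {
    "Mapk1": "MAPK1", "MAPK1": "MAPK1",
    "Elk1": "ELK1", "ELK1": "ELK1",
    "AKT1": "AKT1", "GSK3B": "GSK3B", "FOXO3": "FOXO3",
}

merged = consolidate(toy_sources, toy_map)
print(summarize(merged))  # (3, 2, 3)
```

Keeping the set of contributing sources for each pair makes it straightforward to report how many relations are unique to a single source and how many overlap.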
The analysis begins with an input list of gene symbols entered by the user for kinase enrichment analysis (KEA). Before performing the KEA, we remove all input entries that do not match a substrate in the consolidated background kinase–substrate dataset. This step is necessary for achieving a proportional comparison. The expected value for a randomly generated list of substrates can be found by determining the cardinality of the set of substrates that are targeted by a specific kinase (or family of kinases) and dividing that number by the total number of substrates in the background dataset. In order to detect statistically significant deviations from this expected value, we use the Fisher Exact Test (Fisher, 1922). The P-value can be used to distinguish specific kinases among the large number of kinases appearing in the output table.
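As an illustration of the enrichment calculation, the following sketch filters an input list against the background, computes the expected overlap, and evaluates a one-sided Fisher exact test as a hypergeometric tail probability. Whether KEA uses a one-sided or two-sided test is not stated here, so the one-sided form is an assumption, and the function name and toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(input_genes, kinase_targets, background):
    """One-sided Fisher exact test for over-representation of a kinase's targets.

    Returns (observed overlap, expected overlap, P-value).
    The one-sided form is an assumption made for this sketch.
    """
    background = set(background)
    # Drop input entries that are not substrates in the background dataset.
    filtered = set(input_genes) & background
    targets = set(kinase_targets) & background

    N = len(background)           # all substrates in the background
    K = len(targets)              # substrates targeted by this kinase
    n = len(filtered)             # size of the filtered input list
    k = len(filtered & targets)   # observed overlap

    expected = n * K / N if N else 0.0

    # P(X >= k) when n substrates are drawn at random from the background.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    p_value = tail / comb(N, n)
    return k, expected, p_value


# Toy example with invented identifiers.
background = [f"S{i}" for i in range(10)]       # 10 background substrates
kinase_targets = ["S0", "S1", "S2", "S3"]       # 4 targets of one kinase
user_list = ["S0", "S1", "S2", "S9", "NOT_IN_DB"]

k, expected, p = enrichment_p_value(user_list, kinase_targets, background)
print(k, expected, round(p, 4))  # 3 1.6 0.119
```

In the toy example, the entry that is absent from the background is removed first, leaving four input substrates, three of which are targets of the kinase, against an expected overlap of 1.6.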
To implement the web-based system, we use Java Server Pages (JSP) and a MySQL database running on a Tomcat server. All reported results can be exported to Excel via CSV files. Additionally, users can mouse over the number of targets for each kinase, kinase family or class to see the list of substrates, and can view a connectivity diagram that visualizes known protein–protein interactions among the substrates, using a database of protein–protein interactions we previously published (Berger et al.). The map is dynamic: users can move nodes around and click on nodes for more detail. The visualization of these connectivity diagrams was achieved using Adobe Flash CS4 with ActionScript. Such subgraphs can be used to link kinase-specific substrates to pathways and complexes.
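The connectivity diagram corresponds to the subgraph of a protein–protein interaction network induced by the substrates of a kinase. A minimal sketch of that extraction follows; it is independent of the Flash/ActionScript implementation, and the function name and toy interactions are hypothetical.

```python
def substrate_subgraph(substrates, ppi_edges):
    """Return the interactions whose two endpoints are both in the substrate list.

    Edges are treated as undirected; duplicates and self-loops are dropped.
    """
    nodes = set(substrates)
    edges = {
        tuple(sorted((a, b)))
        for a, b in ppi_edges
        if a in nodes and b in nodes and a != b
    }
    return sorted(edges)


# Toy interactions, invented for illustration only.
toy_substrates = ["ELK1", "FOS", "JUN"]
toy_ppi = [("FOS", "JUN"), ("JUN", "FOS"), ("ELK1", "SRF"), ("ELK1", "FOS")]

print(substrate_subgraph(toy_substrates, toy_ppi))
# [('ELK1', 'FOS'), ('FOS', 'JUN')]
```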
Fig. 1. Screenshot of the KEA user interface. Users can paste lists of Entrez gene symbols, representing human proteins, and select the level of analysis: kinase-class, kinase-family or kinase. The program then outputs a list of ranked kinase-classes, kinase-families ...
As prior knowledge is increasingly used to interpret high-throughput results (e.g. Balazsi et al.), we anticipate that KEA will be especially useful for the analysis of proteomics and phosphoproteomics data. KEA can be used for analyzing multivariate datasets collected over a time course to observe trends in kinase activity over time. Results that show changes in kinase enrichment under different conditions can be due to one of the following reasons: a change in kinase enzymatic activity, a change in kinase subcellular localization or a change in kinase concentration. Furthermore, KEA can help researchers understand how they can perturb cellular systems toward a desired phenotype by targeting a kinase or group of kinases by pharmacological or gene-silencing means. Kinase signaling is well established to be disturbed in many disease states, especially in cancer (Blume-Jensen and Hunter, 2001), and it is apparent that phenotypic integrity is controlled by the regulated activity of multiple kinases. Hence, mapping kinase activation patterns across different experimental conditions and time points, when measuring many genes/proteins at once in diseased/perturbed versus normal/control samples, may directly suggest combinations of kinase inhibitors that would shift the cellular state towards a desired phenotype.