Protein kinases constitute one of the largest protein families, accounting for approximately 2% of eukaryotic genomes. Kinases catalyze the transfer of phosphate groups to proteins, thereby influencing their activity, localization, stability, conformation and/or ability to interact with other proteins [1]. The yeast genome encodes 127 protein kinases, 20 of which are required for cellular viability [2]. At least 30% of the yeast proteome is estimated to be phosphorylated [4], yet only a small portion of these phosphorylation events has been associated with a cognate kinase [5]. In fact, in 2010 the PhosphoGRID database (v.1.0) reported over 5,000 phosphorylation sites on 1,500 yeast proteins, drawn from both high-throughput (HTP) and low-throughput (LTP) datasets; 90% of these sites had not been associated with either a function or a regulatory kinase [6]. Because many phosphorylation events are highly transient or occur only under specific physiological conditions, kinase-substrate interactions are difficult to capture. Furthermore, the redundancy and promiscuity of protein kinases (particularly in vitro) often complicate biochemical analysis.
Many targeted and HTP approaches have been used to link kinases and substrates in budding yeast, including: the use of analogue-sensitive kinase alleles for in vitro phosphorylation assays [7]; the interrogation of proteome chips with purified kinases to identify rosters of proteins phosphorylated in vitro
]; affinity purification to discover kinase-associated proteins [10]; and systematic genetic screens to identify genes that functionally interact with kinases [14]. Large-scale datasets differ in their ability to capture kinase-substrate relationships, and many different experimental approaches have been used to associate kinases with their targets. There is therefore a need both for accurate quality assessment of HTP datasets, through the assembly of reliable gold standards, and for systematic integration of information in the literature with HTP datasets.
Significant efforts have been made in this regard, including: PhosphoELM, a database of experimentally verified phosphorylation sites in all eukaryotic proteins [17]; PhosphoSite, a literature-curated database that compiles post-translational modifications with a focus on phosphorylation in all organisms [19]; NetworKIN [20], a database that integrates consensus substrate motifs of human kinases with in vivo phosphorylation sites, protein-protein interaction networks and kinase domain sequences in order to quantitatively predict cellular kinase-substrate relationships; and PhosphoGRID, which includes information from the literature on in vivo phosphorylation sites for all yeast proteins and assigns the appropriate kinase or phosphatase responsible for each phosphorylated residue [6]. All of these databases focus on consensus sites and phosphorylated residues. However, there is also considerable experimental information about kinase-substrate relationships at the protein level that is not easily represented in these databases. Conversely, databases such as BioGRID [21], which stores all protein and genetic interactions, do not represent the additional specific biochemical experiments that are performed to determine kinase-substrate relationships.
We sought to systematically amalgamate interaction information from many experimental approaches - genetic, biochemical and physical - with the specific goal of defining a bona fide interaction between kinases and substrates. We reasoned that a database designed to compile a reliable gold standard for kinase-substrate interactions would require: 1) a means of distinguishing upstream and downstream interactors of kinases, kinase activators and regulatory subunits or co-activators and complex components; 2) a measure of the directionality of genetic interactions involving kinases (for example, suppression, dosage lethality and dosage suppression); 3) a means of including a quantitative measure of the significance of a biochemical interaction; and 4) a method for producing a score that reflects the quality of the evidence in the literature supporting a kinase-substrate relationship.
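To make these four requirements concrete, the following is a minimal sketch of how a single piece of evidence might be recorded and how records might be combined into a ranked score per kinase-partner pair. It is an illustration only: the field names, the weight table and the additive scoring rule are assumptions introduced here and are not the scheme used by the database described in this paper. The kinase and partner names are placeholders, not real yeast genes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights chosen purely for illustration; they are NOT the
# published scoring scheme. Keys are (evidence category, throughput).
ILLUSTRATIVE_WEIGHTS = {
    ("biochemical", "LTP"): 3.0,
    ("biochemical", "HTP"): 1.5,
    ("physical", "LTP"): 2.0,
    ("physical", "HTP"): 1.0,
    ("genetic", "LTP"): 1.0,
    ("genetic", "HTP"): 0.5,
}


@dataclass(frozen=True)
class Evidence:
    """One literature-derived observation linking a kinase to a partner."""
    kinase: str
    partner: str
    category: str                      # "genetic", "physical" or "biochemical"
    throughput: str                    # "HTP" or "LTP"
    kinase_upstream: bool              # requirements 1 and 2: direction of the link
    strength: Optional[float] = None   # requirement 3: optional measure in [0, 1]


def score_pairs(records):
    """Requirement 4: sum weighted evidence for each (kinase, partner) pair.

    Records that do not place the kinase upstream of the partner (for
    example, activators or regulatory subunits) are skipped.
    """
    scores = defaultdict(float)
    for rec in records:
        if not rec.kinase_upstream:
            continue
        weight = ILLUSTRATIVE_WEIGHTS[(rec.category, rec.throughput)]
        factor = 1.0 if rec.strength is None else rec.strength
        scores[(rec.kinase, rec.partner)] += weight * factor
    return dict(scores)


def rank_pairs(records):
    """Return (pair, score) tuples sorted from best to least supported."""
    return sorted(score_pairs(records).items(), key=lambda kv: kv[1], reverse=True)


# Placeholder records; the names are invented for the example.
example_records = [
    Evidence("KinaseA", "TargetX", "biochemical", "LTP", True),
    Evidence("KinaseA", "TargetX", "genetic", "HTP", True),
    Evidence("KinaseA", "TargetY", "physical", "HTP", True, strength=0.5),
    Evidence("KinaseA", "ActivatorZ", "physical", "LTP", False),
]

if __name__ == "__main__":
    for pair, score in rank_pairs(example_records):
        print(pair, score)
```

In this sketch the direction flag acts as a filter and the weights as a crude proxy for evidence quality; a real scheme could just as well keep upstream interactors in a separate table or use a non-additive score.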
To address these issues, we developed Yeast KID, the first literature-curated database for kinases that integrates HTP and LTP genetic, physical, and biochemical experimental evidence with the goal of establishing known kinase-substrate relationships. KID not only enables the assembly of tailored gold standards of kinase-target pairs, but also provides a ranked score for assessing the quantity and quality of evidence supporting each pair. KID features a user-friendly interface that amalgamates all genetic, physical, and biochemical HTP data involving yeast kinases, providing easy access for integrative analysis and more complex bioinformatic approaches to study kinase pathways.