The regulation of protein function through reversible phosphorylation by protein kinases and phosphatases is a general mechanism controlling virtually every cellular activity. Eukaryotic protein kinases can be classified into distinct, well-characterized groups based on amino acid sequence similarity and function. We recently reported a highly sensitive and accurate hidden Markov model-based method for the automatic detection and classification of protein kinases into these specific groups. The Kinomer v. 1.0 database presented here contains annotated classifications for the protein kinase complements of 43 eukaryotic genomes. These span the taxonomic range and include fungi (16 species), plants (6), diatoms (1), amoebas (2), protists (1) and animals (17). The kinomes are stored in a relational database and are accessible through a web interface on the basis of species, kinase group or a combination of both. In addition, the Kinomer v. 1.0 HMM library is made available for users to perform classification on arbitrary sequences. The Kinomer v. 1.0 database is a continually updated resource where direct comparison of kinase sequences across kinase groups and across species can give insights into kinase function and evolution. Kinomer v. 1.0 is available at http://www.compbio.dundee.ac.uk/kinomer/.
The regulation of protein function through reversible phosphorylation by protein kinases and phosphatases is a widespread cellular mechanism thought to control virtually every cellular activity (1), and abnormal levels of phosphorylation are known to be responsible for severe diseases (2).
Hanks and Hunter were the first to report that sequence similarity of kinase catalytic domains reflects protein kinase function and/or mode of regulation (3,4). Observation of distinct clades where function segregated with sequence similarity allowed Hanks and Hunter to divide the protein kinase superfamily into specific ‘groups’. The currently accepted classification of the eukaryotic protein kinase superfamily considers eight ‘conventional’ protein kinase groups (ePKs) and four ‘atypical’ groups (aPKs) (5,6). Among the ePKs are the AGC group (including cyclic-nucleotide and calcium-phospholipid-dependent kinases, ribosomal S6-phosphorylating kinases, G protein-coupled kinases and all close relatives of these sets); the CAMKs (calmodulin-regulated kinases); the CK1 group (casein kinase 1, and close relatives); the CMGC group (including cyclin-dependent kinases, mitogen-activated protein kinases, glycogen synthase kinases and CDK-like kinases); the RGC group (receptor guanylate cyclase); the STEs (including many kinases functioning in MAP kinase cascades); the TKs (tyrosine kinases) and the TKLs (tyrosine kinase-like kinases). However, there is a significant proportion of kinases which, whilst exhibiting some degree of sequence similarity to the eight groups above, could not be classified easily into particular groups. These form a ninth group called ‘Other’.
The aPKs are a small set of protein kinases that do not share clear sequence similarity with ePKs, but have been shown experimentally to have protein kinase activity. The bona fide aPKs (6) are the alpha-kinase group (exemplified by myosin heavy chain kinase of Dictyostelium discoideum), PIKK (phosphatidyl inositol 3′ kinase-related kinases), RIO and PDHK (pyruvate dehydrogenase kinases).
The sequencing of complete genomes for many eukaryotic species has allowed the determination and comparison of their complete kinase complements (kinomes). These include the kinomes of Saccharomyces cerevisiae (7), Caenorhabditis elegans (8), Drosophila melanogaster (9), Mus musculus (10), Homo sapiens (5), Dictyostelium discoideum (11), Strongylocentrotus purpuratus (12), Tetrahymena thermophila (13), and the plants Arabidopsis thaliana and Oryza sativa (14). Several parasite kinomes have been determined, including the malaria parasite Plasmodium falciparum (15), its comparison with Plasmodium yoelii (16) and those of the three Trypanosomatid species Leishmania major, Trypanosoma brucei and Trypanosoma cruzi (17). The kinomes of H. sapiens, M. musculus, S. purpuratus, D. melanogaster, C. elegans, S. cerevisiae, D. discoideum and T. thermophila are available through Kinbase (http://www.kinase.com/kinbase/). In particular, the observation that many important protein kinases of parasitic protozoa are significantly dissimilar from their eukaryotic counterparts has raised the prospects for therapeutics based on the selective inhibition of parasitic protein kinases (18–20).
We have recently exploited the sequence similarity of protein kinases in developing a multi-level Hidden Markov Model (HMM) library that is capable of classifying protein kinases into their correct functional group (6). The protein kinase HMM library was shown to be three times more sensitive than BLAST for identifying kinase catalytic domains. It was also shown to be more sensitive than a general Pfam model of the kinase catalytic domain, with the added advantage that the HMM library is capable of discriminating among protein kinase groups. The validated HMM library was applied to improve the group-level classification of the S. cerevisiae ePKs from 66.96% to 90.43% by classifying many of the yeast kinases previously assigned to the ‘Other’ group. In this article, we describe the extension of this analysis to the complete classification at the kinase group level of 43 curated eukaryotic kinomes and a web-based resource through which these annotations can be examined. In addition, we provide an interface to the HMM library, allowing for the classification of arbitrary sequences.
The complete translated protein coding sequences were obtained for the fungi Aspergillus fumigatus (21), Aspergillus nidulans (22), Aspergillus niger (23), Aspergillus oryzae (24), Candida glabrata (25), Cryptococcus neoformans (26), Debaryomyces hansenii (25), Kluyveromyces lactis (25), Magnaporthe grisea (27), Neurospora crassa (28), Phanerochaete chrysosporium (29), Ustilago maydis (30) and Yarrowia lipolytica (25). Among the photosynthetic organisms we have included A. thaliana (31), the red alga Cyanidioschyzon merolae (32), the rice species Oryza sativa ssp. japonica (33), the green algae Ostreococcus lucimarinus (34) and Ostreococcus tauri (35), and the poplar tree Populus trichocarpa (36). The metazoan genomes include the yellow fever mosquito Aedes aegypti (37), the malaria mosquito vector Anopheles gambiae (38), the silkworm Bombyx mori (39), the common dog Canis familiaris (40), the early chordate Ciona intestinalis (41), the chicken Gallus gallus (42), the Rhesus macaque Macaca mulatta (43), the marsupial Monodelphis domestica (Opossum) (44), the fishes medaka Oryzias latipes (45), Takifugu rubripes (46) and Tetraodon nigroviridis (47), the laboratory rat Rattus norvegicus (48) and the chimpanzee Pan troglodytes (49). Finally, we included the amoeba Entamoeba histolytica (50), the diatom Thalassiosira pseudonana (51) and the pathogenic protist Trichomonas vaginalis (52). The manually annotated kinomes of Caenorhabditis elegans (8), Dictyostelium discoideum (11), Drosophila melanogaster, Homo sapiens (5) and M. musculus (10) were downloaded from Kinbase (http://www.kinase.com/kinbase/) on 28 September 2008. The kinomes of Encephalitozoon cuniculi, Saccharomyces cerevisiae and Schizosaccharomyces pombe had previously been manually annotated and analysed in detail (53).
The predicted peptide sequences for each of the genomes were searched individually against the Kinomer v. 1.0 multi-level HMM library (6) with the hmmpfam program of the HMMer package (54). Partial matches to the kinase catalytic domain were excluded through manual curation. Empirical cutoffs for association of kinase matches with each of the specific kinase groups were determined through analysis of the significance scores for the matches of the library HMMs to the well annotated kinases in Kinbase for the organisms H. sapiens, C. elegans, D. melanogaster and S. cerevisiae (6). For each group, the highest E-value observed among these annotated kinases was taken as the cutoff for confident assignment. These are AGC (2.7e−7), CAMK (3.2e−14), CK1 (3.2e−5), CMGC (1.2e−7), RGC (4.8e−5), STE (1.4e−6), TK (1.1e−9), TKL (1.7e−12), Alpha (8.5e−66), PDHK (2.7e−10), PIKK (8.4e−6) and RIO (2.3e−3). Protein kinase catalytic domains with E-values above the relevant group cutoff were automatically classified as belonging to the ‘Other’ group. Table 1 lists the protein kinase complements of the 43 eukaryotic genomes contained in Kinomer v. 1.0, split by kinase group. All kinase matches were stored in a relational database, linking the sequence to the library matches and the subsequent assignments to a functional group.
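The cutoff rule described above can be illustrated with a minimal Python sketch. The cutoff values are those listed in the text; the function name assign_group, the input format (a list of group/E-value pairs already parsed from the search output) and the choice to let the single best-scoring hit decide the candidate group are assumptions made for illustration, not a description of the Kinomer implementation.

```python
# Per-group E-value cutoffs as listed in the text above.
GROUP_CUTOFFS = {
    "AGC": 2.7e-7, "CAMK": 3.2e-14, "CK1": 3.2e-5, "CMGC": 1.2e-7,
    "RGC": 4.8e-5, "STE": 1.4e-6, "TK": 1.1e-9, "TKL": 1.7e-12,
    "Alpha": 8.5e-66, "PDHK": 2.7e-10, "PIKK": 8.4e-6, "RIO": 2.3e-3,
}


def assign_group(hits):
    """Assign one sequence to a kinase group.

    `hits` is a list of (group, evalue) pairs for a single sequence,
    assumed to have been parsed from the HMM search output beforehand.
    Assumption: the hit with the lowest E-value decides the candidate
    group; the sequence is placed in 'Other' if that E-value exceeds
    the cutoff for the group. Returns None when there are no hits.
    """
    hits = list(hits)
    if not hits:
        return None
    group, evalue = min(hits, key=lambda hit: hit[1])
    if evalue <= GROUP_CUTOFFS[group]:
        return group
    return "Other"
```

For example, a sequence whose best hit is an AGC model at an E-value of 1e-50 would be assigned to AGC, whereas a best AGC hit at 1e-3 lies above the AGC cutoff of 2.7e−7 and the sequence would fall into ‘Other’.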
The Kinomer v. 1.0 web server provides a comprehensive search interface for accessing the database. Sequences can be retrieved by kinase group, by species or by a combination of both. A summary table illustrates the quality of match of each sequence to the HMM library, as well as providing direct clickable links to the public databases (Figure 1). In addition, an option is available to allow data sets to be downloaded as FASTA format sequence files. The multiple sequence alignment analysis program Jalview (55) is integrated into the Kinomer v. 1.0 interface and allows visualization of the query results. Kinase sequences retrieved are grouped by type and aligned. Jalview allows colouring of the sequences by protein secondary structural properties or amino acid chemical character and on-the-fly calculation of Neighbour-Joining and average distance phylogenetic trees. The web-applet form of Jalview can launch the full Jalview application via the ‘File->View in Full Application’ option. This gives access to further tools for the generation of multiple sequence alignments by Muscle (56), MAFFT (57,58) or ClustalW (59) and secondary structure prediction by JNet (60,61).
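Retrieval by species, by kinase group or by both, followed by FASTA export, could be sketched as below. The schema is not described in this article, so the table name kinase, its columns, the helper names and the three demonstration rows (placeholder identifiers and placeholder sequences, not real database entries) are all hypothetical; the sketch uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kinase ("
        "seq_id TEXT PRIMARY KEY, species TEXT, "
        "kinase_group TEXT, sequence TEXT)"
    )
    # Placeholder rows for illustration only (not real entries).
    conn.executemany(
        "INSERT INTO kinase VALUES (?, ?, ?, ?)",
        [
            ("demo1", "Homo sapiens", "AGC", "MKVLA"),
            ("demo2", "Homo sapiens", "TK", "MGSNK"),
            ("demo3", "Saccharomyces cerevisiae", "AGC", "MSTEE"),
        ],
    )
    return conn


def fetch_kinases(conn, species=None, group=None):
    """Return rows filtered by species, kinase group, both or neither."""
    clauses, params = [], []
    if species is not None:
        clauses.append("species = ?")
        params.append(species)
    if group is not None:
        clauses.append("kinase_group = ?")
        params.append(group)
    sql = "SELECT seq_id, species, kinase_group, sequence FROM kinase"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY seq_id"
    return conn.execute(sql, params).fetchall()


def to_fasta(rows):
    """Format query rows as FASTA records."""
    return "".join(
        f">{seq_id} {species} {group}\n{seq}\n"
        for seq_id, species, group, seq in rows
    )
```

Calling fetch_kinases(conn, species="Homo sapiens", group="TK") on the demonstration database returns only the row that matches both criteria, and passing the result to to_fasta yields a downloadable FASTA string.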
In addition, a separate web interface allows users to classify arbitrary sequences with the HMM library. This web based tool allows a user to upload a sequence in any of the many sequence formats supported by EMBOSS (62), including the popular FASTA, GCG, PIR and SwissProt (62) formats. This sequence is subjected to basic quality assurance checks before the hmmpfam search job is queued for execution on a multi-node Linux cluster. The user is then provided with a job ID, and the interface is asynchronous, returning a status page to the user which is updated automatically. The user can bookmark the results page and return at a later time. In addition, an optional field allows the user to associate arbitrary comments with their job, a useful feature to allow otherwise similar jobs to be distinguished. There are no additional parameters that are user-selectable. This allows for a clean and straightforward interface form.
The results are displayed as a formatted HTML page (Figure 2) with the group classification clearly indicated. This shows to which protein kinase group Kinomer v. 1.0 has assigned the sequence. In addition, alternative assignments are given and a summary of all potential significant matches shown. Kinomer v. 1.0 will typically show matches to many kinase group HMMs spanning several kinase groups. All the top-scoring HMMs for one particular group will be the most significant matches, followed by closely related groups. The detailed alignment for each HMM match is linked further down the screen. As some users may wish for more details, the Kinomer v. 1.0 results page also provides a link to the raw HMMer output.
The 43 species considered here span a number of phylogenetic lineages and genome sizes, and display a range of adaptations to their environment. The genome-wide kinase group assignments are consistent with our previously published results (6) in that seven protein kinase groups (AGC, CAMK, CK1, CMGC, STE, PIKK and RIO) are present in all species surveyed (Table 1) and some kinases in these groups are likely to be essential. Kinases of the groups RGC, TK, TKL, Alpha and PDHK are late innovations in specific phyla or have been lost secondarily in specific lines of descent. The presence of a discrete number of putative TKs in photosynthetic organisms and the pathogen Entamoeba histolytica suggests that TKs are also likely to have had an ancient origin. This observation has recently been strengthened by the finding of animal-like signalling molecules in the green alga Chlamydomonas reinhardtii (63). These include scavenger receptor cysteine rich (SRCR) and C-type lectin domain (CTLD) proteins, both of which play key roles in the innate immune system of metazoa. The identification of SH2 domain proteins in photosynthetic organisms (63,64) suggests that phosphotyrosine-SH2 domain signalling also has an ancient origin and that important cell signalling and adhesion domains evolved before the divergence of the animal lineage.
The observation that many species outside the Opisthokont group lack important kinase groups, as is the case of TKs in Apicomplexa (Miranda-Saavedra, D. et al., manuscript submitted for publication), and have many lineage-specific groups of kinases, suggests that the group level is the most specific level for the automatic classification of kinomes based on models constructed from sequences outside the taxonomic clade under investigation. With the availability of a number of Deuterostome, Protostome and pre-bilaterian genome sequences, having all kinases belonging to a particular kinase group enables novel analyses to be performed. For example, it is now possible to trace the evolution of receptor tyrosine kinase families and that of their ligands. Since receptor tyrosine kinases are multi-domain proteins, the diverging rates of evolution of the various domains, and their incorporation in the receptor molecule in select phylogenetic lineages, are informative of distinct selection pressures and of functions newly acquired through the acquisition of new ligand-binding domains. This is the case with the Trk family of receptor tyrosine kinases, which comprises the receptors for the neurotrophins [nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), neurotrophin-3 (NT-3) and neurotrophin-4 (NT-4)]. The neurotrophin receptors are an ancient family whose function has been lost in multiple lineages and the roles of the receptors have been modified over time (65).
Kinomer v. 1.0 also includes the manually annotated kinomes of the model fungi S. cerevisiae and S. pombe, and that of the unicellular fungi-like parasite Encephalitozoon cuniculi (53). We have recently shown that the two model fungi share ~85% of their kinomes (53), a degree of similarity much higher than that previously reported. The kinomes of budding and fission yeasts are therefore a useful dataset for annotating the kinomes of other fungi, among which we have included species of importance in basic and medical research, and in biotechnology. The manually annotated kinomes of C. elegans, D. discoideum, D. melanogaster, H. sapiens and M. musculus, as provided in Kinbase (http://www.kinase.com/kinbase/), have also been included in the Kinomer v. 1.0 database. These will facilitate the manual annotation of other kinomes included in the database and which belong to the same taxonomic clade. The classification of a number of kinases in the kinomes of C. elegans, D. discoideum, D. melanogaster, H. sapiens and M. musculus could be improved as suggested by the Kinomer v. 1.0 HMM group scores. However, careful manual annotation of the kinomes of other species in the same taxonomic clades will be performed in the future to make a more informed decision about the re-classification of such kinases.
To our knowledge, Kinomer v. 1.0 is unique in being based on a high-accuracy validated kinase-group classification method (6). Other databases of protein kinases exist, but none of these offers the combination of breadth and accuracy of kinase classification that is present in Kinomer v. 1.0. These include KinMutBase (66), a database of clinically validated mutations in human kinases that lead to disease, and RTK.db (67), a database of receptor tyrosine kinases. The Protein Kinase Resource (68) collates data from several databases and includes a subset of protein kinase 3D structures to produce high-quality multiple structure-based alignments. Kinbase (http://www.kinase.com/kinbase/) contains manually curated kinomes classified according to the Hanks and Hunter classification of protein kinases (4). Although of high quality, Kinbase only contains kinomes for nine species. Finally, KinG (69) includes protein kinases identified in completed genomes that have been classified by a variety of metazoan kinome-based sequence search methods, but does not provide the confidence in kinase classification that is seen in Kinomer v. 1.0. Different eukaryotic lineages possess lineage-specific kinase groups and families that are just beginning to be characterized and which constitute as much as 50% of their kinomes (17). The applicability of the KinG approach to non-metazoan kinases needs further testing. A similar limitation is encountered by the PANTHER (70) database. Although not specific to protein kinases, PANTHER provides an extensive and detailed HMM library for kinase families and sub-families. These family and sub-family HMM libraries are trained on metazoan sequences, which precludes their use for confidently annotating non-metazoan sequences into kinase families and sub-families that may not exist in non-metazoan species. Kinomer v. 1.0 annotates to the group level only and in our view annotating to the family/sub-family level requires manual curation.
In summary, Kinomer v. 1.0 is an easy-to-use interface to a novel database of both manually and automatically annotated kinomes. The availability of 43 eukaryotic kinomes in a relational database allows the easy querying of protein kinases by species and/or protein kinase group. In addition, the Kinomer v. 1.0 website includes a web server interface to the previously validated HMM library for the classification of peptide sequences into protein kinase groups. In the future, Kinomer v. 1.0 will be enhanced with the addition of a number of manually annotated kinomes of fungal, metazoan and photosynthetic organisms (Miranda-Saavedra, D., et al., manuscript in preparation). These will include the kinomes of pathogenic fungi of the Rhizopus and Fusarium genera, and the kinomes of several unicellular and multicellular photosynthetic organisms including diatoms, red, brown and green algae, and vascular plants. Thus, Kinomer v. 1.0 is a useful and developing repository of expert and automatically annotated kinomes.
D.M.S. was a Wellcome Trust Prize Student at the University of Dundee. Funding for open access charge: Wellcome Trust.
Conflict of interest statement. None declared.
We thank Drs Tom Walsh and Jonathan Monk for assistance with computing.