We identified 133 CDK family members: 123 from animals, plants, yeasts, and four protists for which genome sequences have been completed, and 10 additional CDKs from incomplete genome sequences of organisms with known CTD sequences (Table ). Although all of the sequences are included in our supplemental phylogenetic analysis (additional file 1), only 101 of them are included in the major phylogenetic analysis (Fig. ); a large plant-specific amplification of CDK9-like kinases (the phylogenetic weight of these sequences disrupts the CDK9 sub-clade) and sequences from incomplete genomes are excluded (see Fig. and additional file 1 legends for further explanation). The nomenclature for kinases from Arabidopsis followed Joubès et al. (2000) and Vandepoele et al. (2002) [33, 34] (Table ). The catalytic core base, Gly-rich motif and T-loop, required for characterized CDK function, appear to be conserved across all defined and putative kinase sequences analyzed (additional file 2). The 50% majority rule consensus tree of 4,000 likelihood trees, sampled from the posterior probability distribution from Bayesian phylogenetic inference, is shown in Figure . This tree provides strong support for grouping a number of previously uncharacterized CDKs, from a variety of organisms, with defined CDKs from animals and yeast. Overall, however, very little support is found for relationships among different CDK orthologous groups.
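To illustrate what a 50% majority rule consensus summarizes, the following minimal Python sketch counts how often each bipartition (split) occurs in a sample of unrooted trees and keeps those present in more than half of them; for trees sampled from a Bayesian posterior, that frequency is the usual estimate of clade support. The taxon labels, the nested-tuple tree encoding, and the function names are hypothetical and chosen only for illustration; this is not the software or the data used in this study.

```python
from collections import Counter


def leaf_sets(tree, out):
    """Return the leaves under `tree`; append each internal node's leaf set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split is counted identically no matter
    how the nested tuples happen to be arranged.
    """
    found = []
    all_taxa = leaf_sets(tree, found)
    ref = min(all_taxa)
    result = set()
    for side in found:
        if ref in side:
            side = all_taxa - side
        # Skip trivial splits (a single leaf against all the others).
        if 1 < len(side) < len(all_taxa) - 1:
            result.add(side)
    return result


def majority_rule_splits(trees, threshold=0.5):
    """Map each split found in more than `threshold` of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree))
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}


# Hypothetical sample of four trees on five placeholder taxa.
t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "B"), ("C", "D"), "E")
t3 = (("A", "C"), "B", ("D", "E"))
trees = [t1, t1, t2, t3]

support = majority_rule_splits(trees)
for split, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(split), freq)
```

Splits that fall below the threshold are simply left unresolved in the consensus, which is why weakly supported relationships among orthologous groups appear as polytomies rather than as resolved branches.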
| Table 1. CDK-related kinases used in this study. |
In this unrooted tree the highly diversified cell-cycle kinases defined in humans, CDKs 1-6, fall into a large cluster with 69% Bayesian support. This grouping includes CDKs from all organisms examined in the study. Among these putative cell-cycle CDKs, some plant and protistan kinases can be assigned with reasonable confidence to specific CDK groups. For example, apparent orthologs of human CDK1 are found in other animals (Drosophila and Caenorhabditis), yeasts, both plants (Arabidopsis and Oryza), Encephalitozoon and Giardia (Fig. ). Likewise, putative orthologs of CDK5 were identified in all organisms examined, except for the two plants (Fig. ). A number of other sequences, such as TbCrk2 and 3 from Trypanosoma, cluster with cell-cycle kinases but not clearly with any specific CDK family. Significantly, and consistent with the results of Liu and Kipreos (2000) [32], CDK5 and PCTAIRE-like kinases from fungi and animals form a strongly supported group, indicating their close relationship (Fig. ).
In contrast to cell-cycle kinases, our phylogenetic results failed to identify a clear ortholog of any transcription-related CDK from two of the complete genomes examined, Trypanosoma brucei and Giardia lamblia. This absence extends to each of the strongly supported clades of presumed orthologs of human CDKs 7-11. A well-defined CDK7 family is recovered, including sequences from yeasts, the microsporidian, plants, and animals. These are the primary groups that make up the "CTD-clade," in which the RNAP II CTD is invariably conserved (Fig. ). CDK7 shows an interesting sister relationship to HsCCRK from human and apparent orthologs from Drosophila, Caenorhabditis and Arabidopsis. In Arabidopsis, four possible CDK7 orthologs were found, as reported previously by Shimotohno and colleagues (2003) [35]; however, AtCdkF (CAK1) is quite divergent from the core CDK7 family and related specifically to HsCCRK in our analyses. PfMRK from Plasmodium, suggested previously to be a CDK7 [36], does not fall within the well-defined CDK7 group, but clusters with another Plasmodium kinase. The a priori hypothesis that PfMRK belongs in the core CDK7 group is strongly rejected with our data set in a likelihood paired-sites test.
Likewise, GlCAKlike (gi: 292497120) has been proposed as a CDK7 from Giardia, based on nearest sequence similarity to Kin28 in a more limited comparison to CDK sequences from fission yeast [38]. In our expanded analyses of CDKs from 11 completed genomes, we find no evidence supporting an orthologous relationship to CDK7 for this, or any Giardia sequence. The a priori hypothesis that GlCAKlike belongs in the core CDK7 group also is strongly rejected in a likelihood paired-sites test.
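The text does not state which paired-sites variant was applied, so the sketch below assumes a Kishino-Hasegawa style test with a normal approximation, purely to show the logic: per-site log-likelihoods under the unconstrained tree are compared with those under a tree constrained to place the sequence inside the core CDK7 group, and the summed difference is scaled by its estimated standard error. The per-site values and the function name are hypothetical and are not results from this study.

```python
import math
from statistics import NormalDist


def paired_sites_test(site_lnl_best, site_lnl_constrained):
    """Kishino-Hasegawa style paired-sites comparison (normal approximation).

    Both arguments are per-site log-likelihoods for the same alignment
    columns, in the same order. Returns (delta, z, one_sided_p), where
    delta is the total log-likelihood advantage of the first tree.
    """
    if len(site_lnl_best) != len(site_lnl_constrained):
        raise ValueError("both trees must be scored on the same sites")
    n = len(site_lnl_best)
    if n < 2:
        raise ValueError("need at least two sites to estimate a variance")
    diffs = [a - b for a, b in zip(site_lnl_best, site_lnl_constrained)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference, estimated from the site-wise differences.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0:
        raise ValueError("site-wise differences are constant; test is undefined")
    z = delta / math.sqrt(var_delta)
    p_one_sided = 1.0 - NormalDist().cdf(z)
    return delta, z, p_one_sided


# Hypothetical per-site log-likelihoods for six alignment columns.
lnl_best = [-2.1, -3.0, -1.8, -2.5, -2.2, -3.1]
lnl_constrained = [-2.6, -3.4, -2.5, -2.9, -2.8, -3.3]

delta, z, p = paired_sites_test(lnl_best, lnl_constrained)
print(f"delta lnL = {delta:.2f}, z = {z:.2f}, one-sided p = {p:.2g}")
```

A small p-value indicates that the constrained topology explains the alignment significantly worse than the unconstrained one. Real alignments have hundreds of sites, and resampling-based variants of the test are often preferred when one of the trees was itself selected from the same data.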
A robust CDK8 family is recovered with strong support values in both distance bootstrap and Bayesian inference. Like CDK7, this family includes putative orthologs only from members of the "CTD-clade," specifically yeasts, animals and plants. Although the microsporidian Encephalitozoon is a member of the RNAP II "CTD-clade," TBlastN searches of the complete Encephalitozoon genome found six CDKs, none of which shows a phylogenetic affinity to CDK8.
A CDK9 grouping also is supported as monophyletic with representative CDKs from yeasts, Encephalitozoon, animals, plants and Plasmodium. This group is divided into two well-defined sub-clades. One of them consists of BUR1 from yeast along with CDK9 orthologs from animals; the other contains CTK1 from yeast, CDC2L5 and CrkRS from human, and apparent orthologs from Drosophila and Caenorhabditis, both plants, and Plasmodium. A putative CDK9 also is found in Encephalitozoon, but falls at the base of the larger CDK9 grouping and does not associate clearly with either subgroup (Fig. ). Plants also contain a large number of putative CDKs that show strong phylogenetic affinity to CDK9 (additional file 1). These kinases appear to represent a plant-specific amplification of CDK9, although their functions have not been determined experimentally.
Human CDK10 and CDK11 group with apparent orthologs from other animals, plants, fission yeast, and PfCRK1 from Plasmodium. Once again, no kinases from either Trypanosoma or Giardia show any phylogenetic affinity to this group.