Data from the GOS voyage provide a huge increase in available sequences for most prokaryotic gene families, enabling new studies in discovery, classification, and evolutionary and structural analysis of a wide array of gene families. Even for a eukaryotic family such as ePK kinases, GOS provides insights by greatly increasing understanding of related PKL families. GOS increases the number of known ELK sequences more than 3-fold, and has enabled both the discovery of novel families of kinases and a detailed analysis of conservation patterns and subfamilies within known families. We believe that the GOS data, coupled with the recent strong growth in whole-genome sequencing, provide the opportunity for similar insights into virtually every gene family with prokaryotic relatives.
PKL kinases are largely involved in regulatory functions, as opposed to the metabolic activities of other kinases with different folds [25]. The characteristics of this fold that lead to the explosion of diverse regulatory functions of eukaryotic ePKs have also been exploited for many different functions within prokaryotes. While these kinases reflect only ~0.25% of genes in both GOS and microbial genomes (ePKs represent ~2% of eukaryotic genes [42]), indicating a simpler prokaryotic lifestyle, they now outnumber the count of ~12,000 histidine kinases that we observe in GOS [22], suggesting that ELKs may be at least as important in bacterial cellular regulation as the “canonical” histidine kinases.
PKL kinases cross huge phylogenetic and functional spaces while still retaining a common fold and biochemical function of ATP-dependent phosphorylation. The presence of Rio and Bud32 genes in all eukaryotic and archaeal genomes suggests that at least this cluster dates back to the common ancestor of these domains of life. Similarly, the presence of UbiB in all eukaryotes and most bacterial groups, the close similarity of pknB/ePK families, and the widespread bacterial/eukaryotic distribution of FruK suggest their origins before the emergence of eukaryotes, or from an early horizontal transfer. Their ancient divergence leaves little or no trace of their shared structure within their protein sequence other than at functional motifs, which include a set of ten key residues that are highly conserved across all PKLs.
Despite the huge attention paid to ePKs, four key residues (P104, H158, H164, D220), three of which are highly conserved in ePKs, are still functionally obscure and worthy of greater attention, both in ELKs and ePKs. Conversely, it appears that nine of the ten key residues have been eliminated or transformed in individual families while maintaining fold and function, showing that almost anything is malleable in evolution given the right context. That context is frequently a set of additional changes in the family-specific motifs surrounding these key residues. In the case of K72, a substitution to arginine triggers a cascade of other core substitutions that serve to retain basic function, while a substitution to methionine involves a shift of the positive charge normally provided by K72 to another conserved residue, in both CAK-chloro and Wnk kinases. Other core changes are also seen independently in very distinct families, such as the G55-to-A change in UbiB and the chloro subfamily of CAK, or the E91-to-F change in both chloro and HSK2, suggesting that these kinases are sampling a limited space of functional replacements.
These families vary greatly in diversity. While the ePK family has expanded to scores of deeply conserved functions [42], other families, including Bud32, Rio, Bub1, and UbiB, usually have just one or a handful of members per genome, suggesting critical function but an inability to innovate. The largely prokaryotic CAK family is also functionally and structurally diverse, containing several known functions and many distinct subfamilies likely to have novel functions. The diversity of both CAK and KdoK sequences may be related to their involvement in antibiotic resistance and immune evasion, likely to be evolutionarily accelerated processes. Comparison of CAK to the related and more functionally constrained HSK2, FruK, and MTRK families may reveal adaptive changes such as the ePK-specific flexibility changes that may assist in its diversity of functions.
GOS data are rich in highly divergent viral sequences, and accordingly we find a number of new subfamilies of viral kinases, including two of the three subfamilies of HRK and a subfamily of CapK. In both cases we see loss of conserved N-terminal elements, suggesting that these kinases may have alternative functions or even act as inactive competitors to host kinases.
These patterns of sequence conservation and diversity raise many questions that can only be fully addressed by structural methods. For ChoK, combining structural and phylogenetic analysis revealed features that were not clear from the structure alone, and allowed us to reject other inferences from the crystal structure that were not conserved within this family, highlighting the value of combining these approaches. The relative ease of crystallization of PKL domains, the emergence of high-throughput structural genomics, and our understanding of the diversity of these families make them attractive targets for structure determination of selected members, and position this family as a model for analysis of deep structural and functional evolution.