The reversible phosphorylation of proteins catalyzed by protein kinases in eukaryotes points to an important role for eukaryotic protein kinases (ePKs) in the emergence of the nucleated cells of the third superkingdom of life. Choline kinases (ChKs) could also have been critical in the early evolution of eukaryotes because of their function in the biosynthesis of phosphatidylcholine, which is unique to eukaryotic membranes. However, the genomic origins of ePKs and ChKs are unclear. The high degeneracy of protein sequences and the broad expansion of ePK families have made this fundamental question difficult to answer. In this study, we identified two class-I aminoacyl-tRNA synthetases with high similarity to the consensus amino acid sequence of human protein-serine/threonine kinases. Comparisons of primary and tertiary structures support the hypothesis that ePKs and ChKs evolved from a common ancestor related to glutaminyl-tRNA synthetases, which may have been one of the key factors in the successful emergence of ancient eukaryotic cells from bacterial colonies.
Protein kinases play a pivotal role in communicating intracellular signals in eukaryotes. The family of eukaryotic protein kinases (ePKs)³ comprises at least 568 human members, which account for more than 2% of the protein-coding genes of the entire human genome (1). These kinases are highly conserved both in the primary amino acid sequences (2) and in the three-dimensional structures (3) of their catalytic domains. Because of the central regulatory roles and the high conservation of the ePKs, the ancestry of these enzymes has become an important question in the study of the evolution of eukaryotic organisms.
Most ePKs phosphorylate proteins on serine or threonine residues, whereas a smaller group of protein kinases catalyzes tyrosine phosphorylation. This branch of protein-tyrosine kinases (PTKs) arose from protein-serine/threonine kinases (STKs), an event believed to have been important in early metazoan evolution (4, 5). Among the STKs, a further heterogeneous group of diverse kinases is lumped together as the atypical protein kinases. Because they share little sequence identity or structural similarity with typical protein kinases, these atypical protein kinases are thought to have diverged early in evolution and to have distinct evolutionary histories (6, 7). Apart from the atypical protein kinases and the more recently derived PTKs, the remaining typical protein kinases constitute a major lineage in protein kinase evolution.
Eukaryotic life is believed to have evolved between 1.7 and 2.7 billion years ago, and no living representatives of the earliest eukaryotes survive today. Consequently, the actual origin of protein kinases is difficult to establish with a high degree of confidence. First, protein sequences are highly degenerate, which makes sequence similarities difficult to detect even at the superfamily level (8). Second, the ePKs comprise a very broadly expanded group of proteins. Branches of the kinase relatedness tree have been lost or expanded in different species, and insertions and deletions have accumulated within the catalytic domains. To overcome these problems, we developed novel strategies. These strategies use consensus sequences derived from precise amino acid sequence alignments as the initial queries in BLAST searches and compare the top hits from multiple species. Comparisons of protein primary and tertiary structures support our conclusions. Our findings offer new insights into the evolution of ePKs and choline kinases (ChKs) in ancient eukaryotes. The molecular paleontology approach undertaken in this study also provides a broadly applicable strategy for investigating the origins of large protein domain families.
Sequences of human protein kinases, glutaminyl-tRNA synthetases, and choline kinases were obtained from the UniProt database (http://www.uniprot.org). An initial alignment of each group was created using ClustalW (9), followed by manual adjustment of gaps and inserts. The frequency of each amino acid at each position of the alignment was calculated to generate a positional frequency matrix.
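The positional frequency matrix described above can be sketched in Python. This is a minimal illustration, not the pipeline used in the study. It assumes the alignment is already available as equal-length strings, with '-' marking gaps. Counts are divided by the total number of sequences, so gaps lower the frequencies; this choice of denominator is an assumption. The toy alignment is made up.

```python
from collections import Counter

# The 20 common amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def positional_frequency_matrix(alignment):
    """Return one dict per alignment column mapping each amino acid to its frequency.

    Assumes equal-length aligned strings with '-' for gaps. Frequencies are
    divided by the number of sequences, so a column's values sum to less than 1
    when it contains gaps.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    matrix = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        matrix.append({aa: counts[aa] / n for aa in AMINO_ACIDS})
    return matrix

# Made-up toy alignment of four short fragments.
toy_alignment = ["HRDLK", "HRDIK", "HKDLA", "-RDLK"]
pfm = positional_frequency_matrix(toy_alignment)
print(pfm[0]["H"])  # 0.75
```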
BLAST searches using the STK consensus sequence were performed with National Center for Biotechnology Information (NCBI) BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) or a local installation of BLAST+ (10). A filter excluded all protein kinase hits. A list of the top hits from each organism was generated separately, and the lists were then compared. Candidates common to at least two species with a BLAST score higher than 18.0 were selected for manual alignment to identify the ancestral gene.
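The per-species filtering step could look roughly like the sketch below. Several parts are assumptions. The input is BLAST+ tabular output (`-outfmt 6`, default 12 columns), and the "BLAST score" is taken to be the bit score in the last column. The kinase filter is a caller-supplied predicate. The 35-residue length cutoff and the top-100 limit come from the Results. The subject IDs are hypothetical.

```python
def parse_blast_tabular(lines):
    """Parse BLAST+ '-outfmt 6' rows into (subject_id, alignment_length, bit_score)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append((fields[1], int(fields[3]), float(fields[11])))
    return hits

def top_hits(hits, is_kinase, min_score=18.0, min_length=35, max_hits=100):
    """Drop protein kinase subjects, apply score/length cutoffs, keep the best max_hits."""
    kept = [h for h in hits
            if not is_kinase(h[0]) and h[2] > min_score and h[1] > min_length]
    kept.sort(key=lambda h: h[2], reverse=True)  # highest bit score first
    return kept[:max_hits]

# Made-up rows; subject IDs are hypothetical placeholders.
rows = [
    "STK_consensus\tsubjA_synthetase\t24.0\t120\t80\t5\t150\t265\t300\t420\t0.5\t22.6",
    "STK_consensus\tsubjB_protein_kinase\t40.0\t240\t140\t3\t1\t247\t40\t290\t1e-30\t120.0",
    "STK_consensus\tsubjC_short\t30.0\t20\t14\t0\t1\t20\t5\t24\t9.0\t19.0",
    "STK_consensus\tsubjD_choline_kinase\t20.0\t90\t70\t4\t60\t150\t200\t290\t1.2\t19.9",
]
hits = parse_blast_tabular(rows)
selected = top_hits(hits, is_kinase=lambda sid: "protein_kinase" in sid)
print([h[0] for h in selected])  # ['subjA_synthetase', 'subjD_choline_kinase']
```

In practice, the kinase predicate would be built from an annotated list of protein kinase accessions rather than a substring test.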
We utilized the Phyre server (11) for secondary structure prediction and alignment. The Research Collaboratory for Structural Bioinformatics (RCSB) PDB Protein Comparison Tool (12, 13) and other related applications from the RCSB Protein Data Bank (http://www.pdb.org) were applied to view and align the three-dimensional structures.
To search for the most ancient STK, we assessed the conservation of each human protein-serine/threonine kinase. We used the identities and similarities between that kinase and its closest homologs in 20 other species, as calculated on the PhosphoNET website. This list was then compared with the rankings of human STKs by their similarity to the glutamyl-tRNA synthetase (GluRS) consensus sequence and to the STK consensus sequence.
The amino acid sequences of all the typical human protein kinase catalytic domains were collected and precisely aligned based on primary and secondary structural data (supplemental Table S1). The alignment contained 12 catalytic subdomains, which include about 30 highly conserved amino acids. It also contained 10 gaps, which represent more variable regions responsible for the specificity of individual kinases. The initial alignment was based on the early work of Hanks et al. (2). It was further refined using secondary structure information that has since become available from x-ray crystal structures of more than 50 protein kinases. The STKs and the PTKs were separated into two groups. Despite the preponderance of residues conserved in both STKs and PTKs, major differences between the two groups often occurred near subdomains VI (HRD) and VIII (APE) (supplemental Tables S3 and S4).
To explore the origin of ePKs, we used the alignment of the 393 human STK catalytic domains to generate a consensus sequence. We calculated the frequency of each of the 20 common amino acids at each position. The average frequency of the most common amino acid at each position was 36%. At two-thirds of the positions this frequency exceeded 20%, indicating very high conservation among the catalytic domains of these protein kinases. An STK consensus sequence of 247 amino acids was defined by taking the amino acid with the highest frequency at each position.
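The consensus-building step and the two summary statistics (mean frequency of the most common residue, and the fraction of positions above 20%) can be sketched as follows. This is an illustrative sketch only. Gaps are never chosen as the consensus residue, and all-gap columns are skipped; both are assumptions, since the text does not say how gaps were handled. Ties go to the residue seen first in the column. The toy alignment is made up.

```python
from collections import Counter

def consensus_with_stats(alignment, threshold=0.20):
    """Build a majority-rule consensus and summarize column conservation.

    Returns (consensus, mean top-residue frequency, fraction of used columns
    whose top-residue frequency exceeds threshold). Frequencies are divided by
    the total number of sequences; gaps are not counted as candidate residues
    and columns containing only gaps are skipped.
    """
    n = len(alignment)
    consensus, top_freqs = [], []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            continue  # all-gap column
        residue, count = counts.most_common(1)[0]
        consensus.append(residue)
        top_freqs.append(count / n)
    if not top_freqs:
        raise ValueError("alignment has no informative columns")
    mean_top = sum(top_freqs) / len(top_freqs)
    frac_above = sum(f > threshold for f in top_freqs) / len(top_freqs)
    return "".join(consensus), mean_top, frac_above

toy = ["HRDLK", "HRDIK", "HKDLA", "-RDLK"]
cons, mean_top, frac = consensus_with_stats(toy)
print(cons, round(mean_top, 2), frac)  # HRDLK 0.8 1.0
```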
A protein kinase domain alignment with 56,691 sequences was also downloaded from the Pfam database (http://pfam.sanger.ac.uk/) (14, 15). This alignment included the catalytic domains of both protein-serine/threonine and protein-tyrosine kinases. These could not easily be separated, given the diversity of species represented and the similarity between these two subgroups of ePKs. The consensus sequence of protein kinase domains from all species was highly similar to our human STK consensus sequence, with 85% overall similarity (supplemental Table S5). The protein-tyrosine kinase group is proposed to have developed from protein-serine/threonine kinases relatively late in evolution, with the emergence of metazoans (4, 5). We therefore believe that our human STK consensus sequence is a closer representation of the earliest protein kinases.
To identify the proteins most closely related to protein kinases, we used our STK consensus sequence as the query in BLAST searches of six diverse species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. For each organism, the top 100 non-protein kinase subjects with a BLAST score higher than 18.0 and an alignment length longer than 35 amino acids were considered top hits and compared among species. To find the most likely ancestor of protein kinases, we calculated the average BLAST scores of proteins present among the top hits in more than one species. This identified candidates with consistent similarity to our consensus sequence. As listed in Table 1, 11 proteins were identified, with average scores ranging from 19.9 to 22.6.
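The cross-species comparison described above (keep proteins found among the top hits of more than one species, then rank them by average score) might be sketched like this. The sketch assumes that hits from different organisms have already been mapped to a shared label, such as a protein-family name, because orthologs have different accessions in each species. The candidate labels and scores are made up.

```python
from collections import defaultdict
from statistics import mean

def rank_shared_candidates(top_hits_by_species, min_species=2):
    """Average each candidate's score over the species whose top-hit lists contain it.

    top_hits_by_species maps a species name to {candidate_label: score}.
    Only candidates present in at least min_species species are kept, and the
    result is sorted by average score, highest first.
    """
    scores = defaultdict(list)
    for species_hits in top_hits_by_species.values():
        for label, score in species_hits.items():
            scores[label].append(score)
    shared = {label: mean(vals) for label, vals in scores.items()
              if len(vals) >= min_species}
    return sorted(shared.items(), key=lambda item: item[1], reverse=True)

# Made-up per-species top hits (labels are hypothetical).
toy = {
    "E. coli": {"candidate_A": 21.0, "candidate_P": 19.5},
    "S. cerevisiae": {"candidate_A": 23.0, "candidate_B": 20.0},
    "H. sapiens": {"candidate_B": 21.0, "candidate_Q": 25.0},
}
ranked = rank_shared_candidates(toy)
print(ranked)  # [('candidate_A', 22.0), ('candidate_B', 20.5)]
```

Single-species hits such as `candidate_Q` are dropped even when their scores are high. This reflects the emphasis on consistent similarity across organisms.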
Automatic BLAST searches might miss alignments between such distantly related genes because of insertions and deletions. We therefore manually checked the alignment of each listed candidate with the STK consensus sequence. Among the 11 proteins, only ChK and glutaminyl-tRNA synthetase (GlnRS) exhibited particular similarity in the kinase catalytic subdomains, which are highly conserved and critical for phosphotransferase activity. ChKs have already been reported to share high three-dimensional structural similarity with protein kinases (16). As a group, tRNA synthetases appeared among the top hits in all six species used for the BLAST searches. Glutaminyl-tRNA synthetase, as well as glutamyl- and alanyl-tRNA synthetase, belongs to the class-I aminoacyl-tRNA synthetase family. This was the group most similar to the STK consensus sequence. Like ePKs and ChKs, aminoacyl-tRNA synthetases use ATP to carry out their functions. These results pointed to a possible evolutionary relationship among tRNA synthetases, choline/ethanolamine kinases, and protein kinases.
Among the three tRNA synthetases from the class-I aminoacyl-tRNA synthetase family, glutaminyl-tRNA synthetase is believed to have evolved from GluRS (17). Most bacteria use an alternative two-step pathway to synthesize glutaminyl-tRNA without GlnRS. Phylogenetic analyses indicate that GlnRS arose from a duplication of an ancient GluRS after the split between the bacterial and the archaeal/eukaryal branches. It later acquired an N-terminal nonspecific RNA-binding domain (18). The occurrence of GlnRS in a few bacterial species is the result of horizontal transfer from eukaryotes before this domain-acquisition event (19).
GlnRS was the only tRNA synthetase candidate in the BLAST results with particular conservation in the protein kinase catalytic subdomains. A consensus sequence for this protein was created using sequences from various species and compared with the STK consensus sequence. In this alignment, the catalytic domain of GlnRS showed strong similarity to the kinase subdomains near the activation loop, including the LXXLH and DFG motifs. The GlnRS N-terminal domain aligned with the ATP-binding subdomains of ePKs. We also applied the same strategy to generate GluRS and alanyl-tRNA synthetase (AlaRS) consensus sequences. Both of these enzymes lack the N-terminal fragment that aligns with kinase subdomains I to V. Their catalytic domains also share much lower similarity with kinase subdomains VI to IX than does the catalytic domain of GlnRS. Together, these results reveal a closer relationship of ePKs to GlnRS than to the other aminoacyl-tRNA synthetases.
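Motifs such as LXXLH and DFG use "X" for any residue. The sketch below shows one way to locate such motifs in a single sequence by translating the notation into a regular expression. It is purely illustrative and is not how the alignment above was made. Gaps are removed before searching, so the reported positions refer to the ungapped sequence. The toy sequence is made up.

```python
import re

def motif_to_regex(motif):
    """Translate a motif written with X as 'any residue' (e.g. 'LXXLH') into a compiled regex."""
    return re.compile("".join("." if ch == "X" else re.escape(ch) for ch in motif.upper()))

def find_motif(sequence, motif):
    """Return 1-based start positions of (possibly overlapping) motif matches.

    Gap characters ('-') are stripped first, so positions are in ungapped coordinates.
    """
    pattern = motif_to_regex(motif)
    seq = sequence.replace("-", "").upper()
    starts, pos = [], 0
    while True:
        match = pattern.search(seq, pos)
        if match is None:
            return starts
        starts.append(match.start() + 1)
        pos = match.start() + 1  # step one residue to allow overlapping matches

toy_seq = "AKLIDLH-NDFGLA"  # made-up aligned fragment
print(find_motif(toy_seq, "LXXLH"), find_motif(toy_seq, "DFG"))  # [3] [9]
```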
To investigate the evolutionary relationships among GlnRS, ePKs, and ChKs, we included the ChK consensus sequence in the alignment (Table 2). The GlnRS and STK consensus sequences shared the highest identity, 24%, and 18 of the 30 conserved amino acids were identical. The GlnRS consensus sequence displayed 20% identity with ChK. The similarity of both pairs was ~34%. The identity between STK and ChK was 16%, comparable to the identity between two randomly chosen human protein kinases. These percentages strongly support the possibility that ePKs, ChKs, and GlnRS evolved from a common ancestor, which probably functioned as an aminoacyl-tRNA synthetase.
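Percent identity and percent similarity between two aligned consensus sequences can be computed as sketched below. The text does not state how similarity was scored, so several choices here are hypothetical. The residue groups are illustrative; positive scores in a substitution matrix such as BLOSUM62 are a common alternative. Positions where either sequence has a gap are excluded from the denominator. The toy sequences are made up.

```python
# Hypothetical groups of physicochemically similar residues (an assumption, not the paper's scheme).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]

def _group(residue):
    """Return the similarity group containing residue, or None if it is in no group."""
    for group in SIMILAR_GROUPS:
        if residue in group:
            return group
    return None

def identity_and_similarity(seq_a, seq_b):
    """Percent identity and similarity over gap-free columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned positions")
    identical = sum(a == b for a, b in pairs)
    # Similarity counts identical pairs plus pairs from the same residue group.
    similar = sum(a == b or (_group(a) is not None and _group(a) == _group(b))
                  for a, b in pairs)
    return 100 * identical / len(pairs), 100 * similar / len(pairs)

ident, sim = identity_and_similarity("HRDLKPEN", "HKDIKAE-")
print(round(ident, 1), round(sim, 1))  # 57.1 85.7
```

Because similarity includes identical pairs, it is always at least as high as identity, as in the ~24% versus ~34% figures reported above.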
To further characterize the evolutionary links among GlnRS, ePKs, and ChKs, we used three-dimensional structure comparison tools from the RCSB Protein Data Bank (http://www.pdb.org) to align the structures of these three groups of proteins (12). Available human structures of GlnRS and ChK were used for the comparison. For STKs, the candidates were chosen from those whose amino acid sequences were most similar to the consensus sequence. The STK structures with the highest similarity scores with ChK and GlnRS are shown in Figs. 1 and 2. The human choline kinase displayed high structural similarity to human PKA (p value = 0.024, calculated by the algorithm). The human GlnRS had slightly lower scores with both ePK and ChK (p value = 0.1–0.2, calculated by the algorithm).
We also used the Phyre server (11) to predict the secondary structure of the part of GlnRS that aligns with the ePK catalytic domain (supplemental Table S8). Some of the important β strands were missing in the predicted GlnRS secondary structure. Even so, the overall pattern was similar, especially near subdomains VI to VIII, which was also the most conserved region in GlnRS. In the regions with more dissimilar secondary structures, most of the key amino acid residues critical for maintaining the kinase catalytic core were conserved in GlnRS sequences. These included the residues corresponding to Glu-91 and His-158 in human cAMP-dependent protein kinase. This is consistent with the possibility that the ePK catalytic structure was generated through a series of point mutations starting from a duplicated GlnRS gene. In summary, our data indicate that ePKs and ChKs both emerged from an ancient aminoacyl-tRNA synthetase, which was also the ancestor of contemporary GlnRS.
We hypothesized that the most ancient protein kinases should be conserved across species and be more closely related to other protein kinases in primary structure. We determined the percentage amino acid identity and similarity scores for the full-length forms of 388 human protein-serine/threonine kinases in 22 other diverse species. Casein kinases 1 and 2, various cyclin-dependent protein kinases, glycogen synthase kinase 3, and the p38 and ERK MAP kinases were the most evolutionarily conserved of the human protein kinases (supplemental Table S9). However, the scores from the BLAST search using the STK consensus sequence were very close to one another, mainly because of the high similarity among all the human STKs. As a result, this search lacked resolution. Therefore, the catalytic domains of all the human protein kinases were also aligned with the GlnRS consensus sequence by BLAST. The top 20 hits with the GlnRS consensus sequence had scores ranging from 19.4 to 23.3. Seven of them, including AMPKs and some ribosomal S6 kinase (RSK) family members, also appeared among the top 100 hits in the BLAST search with the STK consensus sequence (Table 3). AMPKs are metabolic stress-sensing protein kinases that switch off biosynthetic pathways when AMP levels rise due to fuel limitation or hypoxia (20). Among these seven candidates, the AMPKs had the highest conservation scores. In addition, AMPKs are consistently found in kinomes from yeast to human (21), indicating that these kinases are the most closely related to the ancient protein kinases.
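The candidate-selection logic in this paragraph has three steps. Take the top 20 hits against the GlnRS consensus. Keep those that also rank in the top 100 against the STK consensus. Then order the survivors by a cross-species conservation score. The sketch below expresses these steps. It assumes the ranked lists and conservation scores (for example, PhosphoNET-style identity percentages) are already available. All kinase names and numbers are made-up placeholders.

```python
def shared_candidates(glnrs_ranked, stk_ranked, conservation, top_glnrs=20, top_stk=100):
    """Kinases in both top lists, ordered by conservation score (highest first).

    glnrs_ranked and stk_ranked are kinase names ordered best hit first;
    conservation maps a kinase name to a score. Kinases without a score sort last.
    """
    in_stk = set(stk_ranked[:top_stk])
    shared = [k for k in glnrs_ranked[:top_glnrs] if k in in_stk]
    return sorted(shared, key=lambda k: conservation.get(k, float("-inf")), reverse=True)

# Made-up example data.
glnrs_ranked = ["kinase_A", "kinase_B", "kinase_C", "kinase_D"]
stk_ranked = ["kinase_C", "kinase_X", "kinase_A"]
conservation = {"kinase_A": 71.0, "kinase_C": 88.5}
shared = shared_candidates(glnrs_ranked, stk_ranked, conservation)
print(shared)  # ['kinase_C', 'kinase_A']
```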
Similarly, we generated a human PTK catalytic domain consensus sequence from the alignment (supplemental Table S3). BLAST searches with this consensus sequence identified the EPH and Src families as the most ancient PTKs. In fact, these two families appeared to be the closest to the merging point of receptor and non-receptor PTKs in the evolutionary tree of the human kinome. Moreover, they were also the most broadly distributed PTK families, with 221 EPH receptors and 172 Src family members identified in 37 metazoans. Thus, we concluded that EPH- and Src-like kinases were the most ancient receptor and non-receptor PTKs, respectively.
A few eukaryotic protein kinase-like genes have been identified in archaea (22) and bacteria (23, 24). The widespread distribution of protein kinase genes has led to suggestions that these catalytic domains originated before the divergence of the three domains of life (6). However, these eukaryotic-like protein kinases lack some of the essential motifs of ePKs. Other studies have indicated that some of the eukaryotic-like protein kinases have distinct evolutionary histories, which might be even more ancient than those of ePKs (9, 25).
Signal transduction in prokaryotes is mainly conducted through two-component systems based on histidine kinases rather than protein-serine/threonine or protein-tyrosine kinases. These histidine kinases commonly possess a conserved C-terminal kinase core domain that features the phospho-accepting histidine as well as homology boxes (H-, N-, D-, F-, G-, and X-boxes). These boxes are not evident in typical eukaryotic protein kinases and bear no resemblance to the highly conserved kinase catalytic subdomains of ePKs (26).
With recent data generated from the sequencing of many whole genomes, it is believed that genes actively undergo horizontal transfers across species, which contribute significantly to the flows of genes in evolution (27, 28). Horizontal gene transfers most likely account for many of the eukaryotic-like protein kinases that have been identified in bacteria. These proteins, such as the PknB kinases (29, 30) and the aminoglycoside phosphotransferase APH(3′)-IIIa (31, 32), are usually limited to a few branches of the entire bacterial kingdom. Thus, ePKs are still likely to have a eukaryotic origin.
The human protein kinase complement is a well studied group of regulatory enzymes that has expanded broadly in the relatedness trees of all investigated eukaryotes. We therefore selected all of the human STK catalytic domains and precisely aligned them to generate a representative consensus sequence for ancient ePKs. Comparing BLAST results from several well studied organisms and aligning the extremely conserved key residues made it possible to detect distant relationships. The consistent results from primary sequence analysis and structural comparison provide high confidence in the evolutionary links among glutaminyl-tRNA synthetase, protein kinases, and choline/ethanolamine kinases.
This contention could be further supported in future studies by site-directed mutagenesis experiments, ideally starting with the deduced consensus sequence of GlnRS or possibly human glutaminyl aminoacyl-tRNA synthetase as this would be technically easier. Based on our comparisons of the consensus sequences of the ePKs and GlnRS shown in Table 2, there are at least 8 highly conserved amino acids found in the catalytic subdomains of ePKs that were missing in GlnRS. Replacement of these amino acid residues in GlnRS with those that are conserved in the ePKs in their catalytic subdomains and that are generally involved in ATP binding and catalysis would be a first step. Additional amino acid residue replacements may be needed for improving recognition of the protein substrate. Our Protein Kinase Substrate Prediction Algorithm Version 2.0 predicts substrate-determining residues that might also be altered to improve the prospects of successful conversion of a GlnRS into a protein kinase (33).
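Listing the candidate substitutions for such a mutagenesis experiment amounts to scanning the aligned consensus sequences. The aim is to find positions where the STK residue is highly conserved and GlnRS carries a different residue or a gap. The sketch below is hypothetical. It assumes the STK and GlnRS consensus sequences are already aligned to the same length, with a per-position frequency for each STK consensus residue. The 0.8 conservation cutoff and the toy data are made up.

```python
def candidate_substitutions(stk_consensus, glnrs_consensus, stk_top_freq, min_freq=0.8):
    """Positions where a highly conserved STK residue is missing from GlnRS.

    stk_top_freq[i] is the frequency of stk_consensus[i] in its STK alignment
    column. Returns (position, glnrs_residue, stk_residue) tuples with 1-based
    alignment positions; a GlnRS gap is reported as '-'.
    """
    if not (len(stk_consensus) == len(glnrs_consensus) == len(stk_top_freq)):
        raise ValueError("inputs must be aligned to the same length")
    changes = []
    for i, (stk_res, gln_res, freq) in enumerate(
            zip(stk_consensus, glnrs_consensus, stk_top_freq), start=1):
        if stk_res != "-" and freq >= min_freq and stk_res != gln_res:
            changes.append((i, gln_res, stk_res))
    return changes

# Made-up aligned fragments and STK conservation values.
changes = candidate_substitutions("GTGSFG", "GTASYG", [0.95, 0.4, 0.9, 0.3, 0.85, 0.9])
print(changes)  # [(3, 'A', 'G'), (5, 'Y', 'F')]
```

Each tuple reads as "replace the GlnRS residue at this aligned position with the STK residue". Mapping aligned positions back to residue numbers in a real GlnRS construct would need the original alignment coordinates.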
Our results indicate that ePKs and ChKs share a common ancestor, which is consistent with previous three-dimensional structural studies of these proteins. GlnRS exhibited higher sequence identity with ePKs and ChKs than these did with each other, as well as moderate structural similarity. It appears to be the contemporary gene most closely related to the common ancestor of ePKs and ChKs. GlnRS appears exclusively in eukarya and archaea. Nevertheless, the aminoacyl-tRNA synthetases are among the most ancient genes. They are believed to have undergone horizontal transfers early in evolution and to have given rise to many contemporary genes (33, 35). We therefore believe that ePKs and ChKs also have an early eukaryotic origin and that both played an important part in the early evolution of highly complex eukaryotic cells.
Here we propose that ePKs and ChKs arose from a common ancestor that is an ancient gene involved in the mRNA translation process as an aminoacyl-tRNA ligase. The emergence of ChKs offered additional phospholipid constituents for construction of more complex membrane structures that provide for intracellular compartmentalization as well as sources of intracellular mediators of signaling. At the same time, ePKs made the communication among different compartments more specific and efficient, which facilitated the specialization of various organelles. The emergence of protein kinases and choline/ethanolamine kinases may well have been critical to the development and success of eukaryotic organisms.
S. P. conceived the project. S. L. and S. P. designed and performed most of the analyses. J. S. carried out the Pfam domain alignments and kinase domain evolutionary conservation analyses. S. L. wrote the initial draft of the manuscript, and S. P. completed the final version. S. L. and S. P. prepared the figures and tables. All authors analyzed the results and approved the final version of the manuscript.
*This work was supported by Kinexus. S. P. is the president, chief scientific officer and majority shareholder of Kinexus Bioinformatics Corporation, which offers proteomics services and products for sale.
This article contains supplemental Tables S1–S9.
³ The abbreviations used are: