Plants and microorganisms use two-component signal transduction systems (TCSs) to mediate responses to environmental stimuli. TCSs mediate responses through phosphotransfer from a conserved histidine on a sensor kinase to a conserved aspartate on the receiver domain of a response regulator. Typically, signal termination occurs through dephosphorylation of the receiver domain, which can catalyze its own dephosphorylation. Despite strong structural conservation between receiver domains, reported autodephosphorylation rate constants (kdephos) span a millionfold range. Variable receiver domain active-site residues D + 2 and T + 2 (two amino acids C terminal to conserved phosphorylation site and Thr/Ser, respectively) influence kdephos values, but the extent and mechanism of influence are unclear. We used sequence analysis of a large database of naturally occurring receiver domains to design mutant receiver domains for experimental analysis of autodephosphorylation kinetics. When combined with previous analyses, kdephos values were obtained for CheY variants that contained D + 2/T + 2 pairs found in 54% of receiver domain sequences. Tested pairs of amino acids at D + 2/T + 2 generally had similar effects on kdephos in CheY, PhoBN, or Spo0F. Acid or amide residues at D + 2/T + 2 enhanced kdephos. CheY variants altered at D + 2/T + 2 exhibited rate constants for autophosphorylation with phosphoramidates and autodephosphorylation that were inversely correlated, suggesting that D + 2/T + 2 residues interact with aspects of the ground or transition states that differ between the two reactions. kdephos of CheY variants altered at D + 2/T + 2 correlated significantly with kdephos of wild-type receiver domains containing the same D + 2/T + 2 pair. Additionally, particular D + 2/T + 2 pairs were enriched in different response regulator subfamilies, suggesting functional significance.
IMPORTANCE One protein family, defined by a conserved domain, can include hundreds of thousands of known members. Characterizing conserved residues within a conserved domain can identify functions shared by all family members. However, a general strategy to assess features that differ between members of a family is lacking. Fully exploring the impact of just two variable positions within a conserved domain could require assessment of 400 (i.e., 20 × 20) variants. Instead, we created and analyzed a nonredundant database of receiver domain sequences. Five percent of D + 2/T + 2 pairs were sufficient to represent 50% of receiver domain sequences. Using protein sequence analysis to prioritize mutant choice made it experimentally feasible to extensively probe the influence of positions D + 2 and T + 2 on receiver domain autodephosphorylation kinetics.
Two-component systems (TCSs) are a prevalent means of signal transduction used by plants and microorganisms to mediate responses to stimuli (1). TCSs are present in more than 95% of sequenced bacterial genomes (2, 3), and one species can contain tens to more than a hundred TCSs. TCSs regulate a wide range of processes from cell development to virulence. Signal transduction by TCSs occurs through the transfer of phosphoryl groups between histidyl and aspartyl residues of different protein components (4). Canonically, the sensory component (the sensor kinase) is a phosphodonor to the response regulator (the response-mediating component) (4). The conserved domain in the response regulator, the receiver domain, functions as a molecular switch. The phosphorylation status of a conserved Asp on the receiver domain corresponds to turning the output response on and off. Typically, receiver domain phosphorylation initiates the output response, and dephosphorylation terminates the response. Dephosphorylation can occur with the assistance of another protein, such as a phosphatase, or by self-catalysis by the receiver domain, which is termed autodephosphorylation. Reported autodephosphorylation rate constants of receiver domains span almost 6 orders of magnitude (5) (see Fig. S1 in the supplemental material). The large variation in receiver domain autodephosphorylation kinetics is striking in light of the strong conservation of structure among receiver domains. There are now almost 300 structures of receiver domains in the RCSB Protein Data Bank (6, 7). Receiver domains have a conserved Rossmanoid fold structure, with a β-sheet made up of five β-strands, surrounded by five α-helices (8). Five conserved residues and a divalent metal ion are arranged in a conserved geometry and comprise the active site that catalyzes both the phosphorylation and dephosphorylation of the conserved Asp (8). 
Despite the conserved fold and active-site geometry, on average any two receiver domains will have only about 25% amino acid sequence identity (9), suggesting that there is a fair amount of variability from one receiver domain to the next. Variable residues located within or close to the receiver domain active site could potentially be positioned to exert influence on the catalysis of autodephosphorylation. Residues at D + 2 (located two amino acids C terminal to the site of phosphorylation on the β3-α3 loop) and T + 2 (located two amino acids C terminal to the conserved Thr/Ser on the β4-α4 loop) are located such that their side chains may potentially interact with conserved active-site residues, the phosphoryl group, and/or the attacking water nucleophile. Furthermore, mutual information analysis suggests a high degree of coevolution between the amino acids at positions D + 2 and T + 2, implying functional importance. In previous studies using a limited set of Escherichia coli CheY and Bacillus subtilis Spo0F mutants based on the wild-type sequences of fewer than 10 receiver domains, we found that the particular amino acids at positions D + 2 and T + 2 altered the autodephosphorylation rate constant by almost 2 orders of magnitude (10, 11). Due to the small data set, the full extent to which residues at D + 2 and T + 2 can influence receiver domain autodephosphorylation is unknown. Further, the mechanisms by which positions D + 2 and T + 2 influence autodephosphorylation remain unclear.
In this study, we extended investigation of the D + 2 and T + 2 residues and their influence on receiver domain autodephosphorylation by significantly expanding our data set to be much more representative of response regulators. A larger and more representative data set allows conclusions to be applied more broadly to receiver domains. Protein sequence analysis revealed that 20 (out of 400 possible) D + 2/T + 2 pairs account for 50% of receiver domain sequences found in a nonredundant database of naturally occurring response regulators; in essence, only 5% of possible amino acid pairs represent half of receiver domains. This circumstance made expansion of the data set experimentally feasible. We also expanded our experimental analysis to include another receiver domain, PhoBN, representative of a large family of response regulators. Combined with previous studies, the 44 D + 2/T + 2 pairs tested in CheY represent 54% of receiver domain sequences. Analysis of the expanded mutant collection showed that residues at D + 2 and T + 2 modulated autodephosphorylation rate constants (kdephos) over 2 orders of magnitude, indicating that additional factors are required to account for the much larger range of kdephos values reported for wild-type response regulators. Nevertheless, kdephos values of CheY mutants were correlated (R2 = 0.62) with kdephos values of wild-type response regulators bearing the same amino acids at D + 2/T + 2. Furthermore, for the 11 cases tested, a particular pair of amino acids at D + 2 and T + 2 generally had similar effects on autodephosphorylation in CheY and PhoBN or Spo0F. Negatively charged amino acids (and, to a lesser extent, amide residues) at D + 2 or T + 2 enhanced autodephosphorylation in all three receiver domains tested, but positively charged amino acids did not have a consistent effect on hydrolysis.
The previously measured rate constants for autophosphorylation with phosphoramidate or monophosphoimidazole by CheY variants that differ at D + 2 and T + 2 (12, 13) are inversely correlated with autodephosphorylation rate constants of the same variants, suggesting that the amino acids at D + 2 and T + 2 interact with aspects of the ground or transition states that differ between the two reactions.
Finally, analysis of receiver domains from different response regulator subfamilies revealed dramatically different distributions of D + 2/T + 2 pairs between receiver domain subclasses. Typically, a few pairs composed of chemically similar amino acids and exerting similar effects on the autodephosphorylation rate constant dominated the D + 2/T + 2 pairs in each response regulator subfamily. The biased distribution strengthens the notion that amino acids at positions D + 2 and T + 2 are important for response regulator function.
Plasmids encoding His6-tagged CheY variants were made using QuikChange (Agilent) with plasmid pKC1 (14) as the template. CheA was purified as described previously (11). Each CheY variant was purified as described previously (5), and removal of the His6 tag by thrombin cleavage left three additional residues (GSH) on the N terminus. The additional GSH does not affect CheY autodephosphorylation kinetics (5). Plasmids encoding His6-tagged PhoB1-127 variants used pET28a-phoBN (14), which encodes a thrombin-cleavable His6-tagged PhoB1-127, as a template. Thrombin cleavage leaves the N-terminal GSH. PhoB variants were purified as described for CheY. His6-tagged PhoR193-431 was expressed and purified as described previously (14). All protein variants were gel filtered using a Superdex 75 16/60 size exclusion column (GE Biosciences).
We apply a nomenclature for discussion of CheY and PhoBN mutants in which the first letter is the D + 2 residue and the second letter is the T + 2 residue. One D + 2/T + 2 pair in the top 20 was not assessed. EL (D + 2 is E; T + 2 is L), twelfth in abundance, was made in CheY, but attempts at protein purification were unsuccessful. Only six of pairs 21 to 34 were tested. The GY, KF, KH, RS, EY, HD, TY, and HS pairs (pairs 21 to 23, 28 to 31, and 33, respectively) were not made because they were not among the most prevalent pairs in our initial database analysis. Notably, analysis of Spo0F in reference 11 included three high-frequency pairs (EL, KH, and EY) missing from the CheY mutant set.
Autodephosphorylation rate constants for CheY variants were measured by following the loss of 32P as described previously (5). CheY variants (8 μM) were incubated with 0.5 μM purified [32P]CheA-P (15) in 100 mM Tris at pH 7.5 and 10 mM MgCl2. Because some substitutions resulted in diminished rates of phosphotransfer from [32P]CheA-P to CheY, incubation times were varied for different CheY mutants to ensure that at least 95% of the 32P was transferred by the first time point. At each time point, 6 μl of the reaction mixture was removed and mixed with an equal volume of 2× SDS sample buffer to quench the reaction. Components of samples taken at each time point were separated using SDS-PAGE. Loss of 32P from [32P]CheY-P was detected using phosphorimaging analysis of the dried gel. The signals were quantified using pixel volume analysis in which the background signal was manually subtracted. The amounts of [32P]CheY-P were plotted versus time and fit to one-phase exponential decays, yielding kdephos. Each time course was designed to follow loss of 32P over at least 4 or 5 half-lives (~3 to 6% 32P remaining on [32P]CheY-P). All measurements were repeated three times.
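The decay analysis described above can be illustrated with a minimal sketch. It assumes the signal follows A·exp(−k·t) and estimates k by least squares on the logarithm of the signal, which is a simplification swapped in for the one-phase exponential curve fit used in the study. The function names and the synthetic data are hypothetical and are not measurements from this work.

```python
import math

def fit_kdephos(times, signals):
    """Estimate k for signal = A * exp(-k * t) by least squares on ln(signal).

    Assumption: a log-linear fit stands in for a nonlinear exponential fit.
    """
    logs = [math.log(s) for s in signals]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -(sxy / sxx)  # the slope of ln(signal) versus time is -k

def fraction_remaining(half_lives):
    """Fraction of phosphorylated protein left after a number of half-lives."""
    return 0.5 ** half_lives

# Synthetic, noise-free time course (hypothetical values)
true_k = 0.04  # per second
times = [0, 10, 20, 40, 60, 90]
signals = [1000.0 * math.exp(-true_k * t) for t in times]
k_est = fit_kdephos(times, signals)
```

Under this model, 4 and 5 half-lives leave 6.25% and about 3.1% of the signal, which matches the ~3 to 6% range quoted above.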
To determine the autodephosphorylation rate constants of PhoBN variants, 4 μM His6-tagged PhoR was incubated with 0.3 mM [γ-32P]ATP in 3.5 mM MgCl2, 35 mM KCl, and 35 mM Tris (pH 8.0) for 30 min at room temperature to generate [32P]PhoR-P. The reaction mixture containing [32P]PhoR-P was pipetted onto a 0.22-μm polyvinylidene difluoride (PVDF) centrifugal filter column (Millipore) containing ~200 μl of a nickel-nitrilotriacetic acid (Ni-NTA)–agarose (Qiagen) slurry that was equilibrated in 35 mM Tris (pH 8.0), 3.5 mM MgCl2, and 35 mM KCl buffer. After centrifugation, the column was washed multiple times with equilibration buffer to remove ATP. PhoBN (90 μM in 60 μl of equilibration buffer) was added directly to the slurry, mixed, and then incubated at room temperature for 5 min on the column to allow for sufficient phosphotransfer from [32P]PhoR-P to PhoBN. The column was centrifuged to elute [32P]PhoBN-P. Time courses were completed and phosphorimaging was analyzed as described above for CheY.
To examine coevolution of residues within receiver domains, in October 2015 we performed mutual information analysis of the Pfam Response_reg RP15 database (16) using the MISTIC server (http://mistic.leloir.org.ar/index.php) (17).
To analyze the frequency of amino acids that naturally occur at various positions within receiver domains, we created a searchable database of nonredundant receiver domain amino acid sequences using a combination of publicly available web-based utilities and databases along with custom Perl (v5.16) scripts. The choice of D + 2/T + 2 combinations for experimental work was guided by preliminary analyses made several years ago (data not shown). The up-to-date analysis used to generate amino acid frequencies given in this report is described below.
The sequences of 250,546 proteins containing receiver domains, identified using the Agfam signaling domain library of the Microbial Signal Transduction (MiST2.2) database (18), were provided by the curators on 11 February 2015 (see Database S2 at http://www.unc.edu/~bourret/PageSupplementalDatabaseFiles.htm). To reduce bias and redundancy from multiple closely related genomes, one genome (the first encountered in the list) was chosen for each genus and proteins from all other genomes in the same genus were discarded. This process left 39,132 sequences. Random reordering of the original list of sequences using a Fisher-Yates shuffle resulted in different genome sequences being in the final database but did not significantly change reported results.
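The genus-level reduction and the reordering control can be sketched as follows. This is an illustrative Python outline rather than the custom Perl scripts used in the study; the record layout (genus, genome identifier, sequence identifier) and all names are assumptions made for the example.

```python
import random

def fisher_yates_shuffle(items, rng):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    shuffled = list(items)
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive at both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled

def keep_first_genome_per_genus(records):
    """Keep proteins from the first genome encountered for each genus.

    records: iterable of (genus, genome_id, sequence_id) tuples (hypothetical layout).
    """
    chosen = {}
    kept = []
    for genus, genome_id, sequence_id in records:
        if genus not in chosen:
            chosen[genus] = genome_id  # first genome seen for this genus
        if chosen[genus] == genome_id:
            kept.append((genus, genome_id, sequence_id))
    return kept

# Hypothetical toy records
records = [
    ("Escherichia", "g1", "s1"),
    ("Escherichia", "g1", "s2"),
    ("Escherichia", "g2", "s3"),
    ("Bacillus", "g3", "s4"),
    ("Bacillus", "g4", "s5"),
]
kept = keep_first_genome_per_genus(records)
reshuffled = keep_first_genome_per_genus(fisher_yates_shuffle(records, random.Random(0)))
```

Running the reduction on a shuffled list can select a different genome for a genus, but it always retains exactly one genome per genus.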
To facilitate sequence alignment, protein sequences were trimmed to retain only the receiver domain(s). First, a unique AseqID (19) was assigned to each protein sequence database entry. The AseqIDs were then used to query the Agfam annotation in SeqDepot (19) to retrieve the location of the first and last residues of each receiver domain. Amino acids outside the receiver domain(s) were discarded. Because some proteins contain more than one receiver domain, the number of database entries increased to 41,370. USEARCH v5.1.221 (20) was used to group the receiver domain sequences into clumps of 1,000 (a manageable size for alignment) based on sequence similarity. The sequences in each clump were then aligned using MUSCLE v3.8.31 (21). To further reduce the redundancy of the database, sequences within each clump were compared in a pairwise manner. Sequences with ≥90% identity to the amino acids (gap positions were not tested) of a query sequence were discarded. This step removed 2,989 sequences (7.2% of the total) and did not significantly affect the amino acid frequencies at positions D + 2 and T + 2 in the final database. Because the clumps were created based on sequence similarity and response regulators are on average only ~25% identical (9), the screen against high sequence identity was not applied between clumps.
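The pairwise redundancy screen can be sketched as below. The sketch assumes that identity is computed only over alignment columns in which the query has an amino acid (one reading of "gap positions were not tested") and that sequences are screened greedily against those already retained; both choices, along with the function names and toy sequences, are assumptions rather than details confirmed by the text.

```python
def identity_to_query(query, other):
    """Fraction of the query's non-gap columns at which `other` has the same residue."""
    compared = 0
    matches = 0
    for q, o in zip(query, other):
        if q == "-":
            continue  # gap positions in the query are not tested
        compared += 1
        if q == o:
            matches += 1
    return matches / compared if compared else 0.0

def filter_redundant(aligned, threshold=0.90):
    """Discard any sequence that is >= threshold identical to a retained sequence."""
    kept = []
    for seq in aligned:
        if all(identity_to_query(query, seq) < threshold for query in kept):
            kept.append(seq)
    return kept

# Hypothetical aligned toy sequences of equal length
aligned = ["MKILVVDD-E", "MKILVVDDAE", "MRVLIAEDNP"]
nonredundant = filter_redundant(aligned)
```

In this toy clump, the second sequence matches the first at every non-gap column and is discarded, whereas the third is retained.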
The positions of the five conserved active-site residues that are critical for receiver domain function (8) were then located. Each clump of aligned sequences was searched to identify the positions that had the highest number of matches to “DD” (two adjacent Asp and/or Glu residues), “D” (Asp only), “T” (Ser or Thr), and “K” (Lys only), with the constraint that the identified positions must occur in the listed order from N to C terminal. The 6,772 sequences that did not initially appear to contain all five conserved residues were cycled through the clumping, alignment, and conserved residue identification procedures again. On the second filtering attempt, 1,643 sequences met the criterion of containing all five conserved residues. Most of the sequences that passed upon rescreening came from the same few clumps as in the first attempt and appeared to have initially failed due to sequence misalignment. Sequences that passed the first and second attempts were pooled to yield a collection of 33,252 receiver domain sequences. The 5,129 sequences (13.4%) that failed to pass the second screen for the presence of all five conserved residues were presumed to be pseudo-receiver domains (22) and were not considered further.
Receiver domains share a three-dimensional structure of alternating β-strands and α-helices, with the conserved residues on the loops at the C-terminal ends of β-strands 1, 3, 4, and 5 (8). The nine loops connecting the five β-strands and five α-helices often differ in length between different receiver domains. Therefore, a header indicating the position of the DD, D, T, and K landmarks was added to each receiver domain sequence and gaps introduced during multiple-sequence alignment were removed. This left a primary receiver domain database (see Database S3 at http://www.unc.edu/~bourret/PageSupplementalDatabaseFiles.htm) in which the amino acid at a given position with respect to a landmark (e.g., D + 2, where + signifies in the C-terminal direction and − signifies in the N-terminal direction) can be identified for each entry in spite of differences in length. Sequences that match a specified search criterion (e.g., Met at position D + 2) can also be retrieved.
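A lookup relative to a landmark can be sketched as follows. The entry layout (a dictionary holding the ungapped sequence and zero-based landmark indices) is a hypothetical stand-in for the header format of the actual database, and the two toy sequences are invented for illustration.

```python
def residue_at(entry, landmark, offset):
    """Return the residue at landmark + offset, or None if out of range."""
    index = entry["landmarks"][landmark] + offset
    sequence = entry["seq"]
    return sequence[index] if 0 <= index < len(sequence) else None

def d2_t2_pair(entry):
    """Two-letter name: D + 2 residue followed by T + 2 residue."""
    return residue_at(entry, "D", 2) + residue_at(entry, "T", 2)

# Hypothetical entries with different loop lengths; indices are zero-based
entries = [
    {"id": "toy1", "seq": "MKDDLLLLDWNMGGTAEAAK",
     "landmarks": {"DD": 2, "D": 8, "T": 14, "K": 19}},
    {"id": "toy2", "seq": "MDDLLDMMLGSGRAK",
     "landmarks": {"DD": 1, "D": 5, "T": 10, "K": 14}},
]
pairs = [d2_t2_pair(entry) for entry in entries]
met_at_d2 = [entry["id"] for entry in entries if residue_at(entry, "D", 2) == "M"]
```

Because positions are stored relative to landmarks, the same query (for example, Met at D + 2) works on entries of different lengths without realignment.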
Finally, response regulators can be categorized based on their output domains (23). Therefore, the primary receiver domain database was subdivided into secondary databases representing the major classes of response regulators. The AseqID was used to query SeqDepot for the Pfam (16) domain annotation associated with each protein in the primary database. The presence of specific domains led to assignment to subfamilies as follows: PF00486 (Trans_reg_C) domain, OmpR subfamily; PF00196 (GerE) domain, NarL/FixJ subfamily; both PF00158 (Sigma54_activat) and PF02954 (HTH_8) domains, NtrC subfamily; PF04397 (LytTr) domain, LytR subfamily; and PF02518 (HATPase_c) domain, hybrid kinase subfamily. Proteins that had exactly one receiver domain in Agfam annotation and no domains other than PF00072 (Response_reg) in Pfam were assigned to the single receiver domain subfamily. A total of 188 proteins (~0.6%) were assigned to more than one subfamily; 6,697 proteins were not assigned. The sorting process resulted in seven secondary databases (the six subfamilies listed above plus the proteins without assignment) that can be individually searched (see Databases S4 to S10 at http://www.unc.edu/~bourret/PageSupplementalDatabaseFiles.htm).
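The assignment rules above can be expressed as a small sketch. The Pfam accessions and subfamily names are taken from the text; the function name, the rule table, and the representation of each protein as a set of accessions plus a receiver domain count are assumptions made for illustration.

```python
# Each rule: subfamily name and the Pfam accessions that must all be present
SUBFAMILY_RULES = [
    ("OmpR", {"PF00486"}),
    ("NarL/FixJ", {"PF00196"}),
    ("NtrC", {"PF00158", "PF02954"}),
    ("LytR", {"PF04397"}),
    ("hybrid kinase", {"PF02518"}),
]

def assign_subfamilies(pfam_domains, n_receiver_domains):
    """Return every subfamily whose rule is satisfied (possibly none or several)."""
    domains = set(pfam_domains)
    hits = [name for name, required in SUBFAMILY_RULES if required <= domains]
    # Single receiver domain: one receiver domain and nothing but PF00072 in Pfam
    if n_receiver_domains == 1 and domains <= {"PF00072"}:
        hits.append("single receiver domain")
    return hits
```

An empty result corresponds to an unassigned protein, and a result with more than one entry corresponds to a protein assigned to multiple subfamilies.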
If two variable positions within a protein participate in a functionally important interaction, then the amino acids at the two positions would be expected to covary during evolution, as a change at one position leads to selection of a compensating change at the other position (17). When the mutual information content of all 5,995 possible pairwise comparisons within receiver domains was determined, most (10/13) of the pairs with Z-scores over 100 involved interactions that appeared to be important for protein structure. Five were between the β-sheet in the core of the receiver domain and an adjacent α-helix, three were within the β-sheet, one was between the α1- and α2-helices, and one was between positions K + 1 (Pro in 82% of receiver domains) and K + 2 (data not shown). Positions D + 2 and T + 2, located on loops in the active site, stood out as having the highest mutual information content of the remaining top pairs (Z-score of 163, sixth highest overall). The other two high-scoring pairs involved positions D + 5 and D + 6 in the β3-α3 loop interacting with residues at the C-terminal end of α2. The high mutual information content of variable positions D + 2 and T + 2, combined with their location in the active site, suggests that the amino acids at D + 2 and T + 2 are potentially important for receiver domain function and good candidates for experimental investigation.
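The idea of covariation can be made concrete with a minimal sketch of raw mutual information between two alignment columns. This is not the MISTIC procedure, which applies further corrections and reports Z-scores; the function names and toy columns are hypothetical. The pair count of 5,995 is consistent with 110 aligned positions, which is an inference from the arithmetic and is not stated in the text.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Raw mutual information (in bits) between two equal-length columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi

def n_pairwise_comparisons(n_positions):
    """Number of unordered position pairs."""
    return n_positions * (n_positions - 1) // 2

# Toy columns: residues at two positions across four hypothetical sequences
covarying = mutual_information("MMNN", "KKEE")    # perfectly coupled
independent = mutual_information("MMNN", "KEKE")  # no coupling
```

Perfectly coupled toy columns give 1 bit, whereas uncoupled columns give 0 bits.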
Previous analysis of the influence of D + 2 and T + 2 residue pairs on receiver domain autodephosphorylation (11) was limited to pairs that mimicked fewer than 10 receiver domains. Protein sequence analysis was employed to guide experimental design for assessing the impact of D + 2/T + 2 residues in a manner that represented receiver domains more broadly and that was experimentally feasible. After accessing receiver domain sequences from the MiST2 database (18), a nonredundant database of 33,252 receiver domain sequences was created by reducing sequences to one genome per genus and then removing sequences with ≥90% sequence identity (see Database S3 at http://www.unc.edu/~bourret/PageSupplementalDatabaseFiles.htm). Out of 400 possible amino acid combinations at D + 2 and T + 2, 364 pairs were present (Fig. 1). Twenty D + 2/T + 2 pairs (only 5% of the possible combinations) accounted for 50% of natural receiver domain sequences. The pair frequencies are displayed in Fig. 1 such that the bubble area reflects the total number of sequences in our database that contain a specific D + 2/T + 2 pair (see Table S11 in the supplemental material). Some pairs, such as MK (the first letter is the D + 2 residue and the second letter is the T + 2 residue) and MR, were quite abundant, each accounting for more than 5% of sequences analyzed. CheY mutants were made with the goal of completing a set of the most common D + 2/T + 2 pairs. In the end, the collection included 19 of the top 20 and 25 of the 34 most frequent D + 2/T + 2 pairs (the wild-type CheY pair NE was pair 13). Numerous single mutants representing less abundant D + 2/T + 2 pairs were also made and tested en route to making desired D + 2/T + 2 pairs in CheY.
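The coverage calculation (how many of the most frequent pairs are needed to represent a given fraction of sequences) can be sketched as follows. The function name and the toy counts are hypothetical; the actual pair counts are given in Table S11.

```python
def pairs_needed_for_coverage(pair_counts, target=0.5):
    """Smallest number of top-ranked pairs whose counts reach the target fraction."""
    total = sum(pair_counts.values())
    covered = 0
    ranked = sorted(pair_counts.items(), key=lambda item: item[1], reverse=True)
    for rank, (_, count) in enumerate(ranked, start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(pair_counts)

# Hypothetical toy counts (not values from the database)
toy_counts = {"MK": 30, "MR": 25, "NE": 15, "QS": 10, "AA": 10, "GY": 10}
needed_half = pairs_needed_for_coverage(toy_counts, target=0.5)
needed_most = pairs_needed_for_coverage(toy_counts, target=0.8)

# Fraction of possible pairs represented by the 20 most frequent pairs
fraction_of_possible = 20 / (20 * 20)
```

Applied to the real counts, the same ranking would report that 20 pairs (5% of the 400 possible) reach 50% coverage, as stated above.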
Autodephosphorylation rate constants were measured for 26 CheY variants. With consolidated CheY data from references 10, 11, and 13, the expanded data set (Table 1) contained 12 D + 2 single mutants, 10 T + 2 single mutants, and 21 double mutants in CheY that, together (including the NE pair found in wild-type CheY), represent 54% of receiver domains. At T + 2, amino acids L, K, R, H, and Y were assessed in the study described in reference 11 and expanded to include Q, S, A, N, and F in this study. All T + 2 single substitutions removed the wild-type Glu in CheY and resulted in diminished autodephosphorylation rate constants (Table 1). Overall, single substitutions at T + 2 modulated rate constants over a 10-fold range, similar to what was reported in reference 11. Rate constants for T + 2 single mutants appeared to decrease as residue size and hydrophobicity increased; aromatics (H, Y, and F) made up the slowest T + 2 single mutants.
In contrast, single substitutions at D + 2 (changing the Asn) resulted in both enhanced and diminished rate constants. D, M, E, K, and L were assessed at D + 2 in the study described in reference 11, which was extended to include S, Q, V, A, R, G, and W in this study. Overall, single substitutions at D + 2 resulted in an ~30-fold range in rate constants. Almost all (11/12) D + 2 single mutants resulted in rate constants that were within 3-fold that of wild-type CheY. One D + 2 single mutant (Trp), which slowed the reaction 11-fold, was responsible for the added order of magnitude in the range of CheY D + 2 single mutants.
Double mutants DL, EH, MR, MK, and KY were analyzed in the study described in reference 11, RH and ML were analyzed in the study described in reference 10, and QS was analyzed in the study described in reference 13. This study expanded analysis of CheY double mutants to include ER, QN, SH, QH, GR, AA, QF, QY, VK, WH, RF, RY, and VY. An ~30-fold range in autodephosphorylation rate constants was observed for the consolidated CheY double mutants with simultaneous substitutions at D + 2 and T + 2. All of the CheY D + 2/T + 2 double mutants exhibited diminished autodephosphorylation rate constants compared to that of the wild type. Altogether, single and double substitutions at D + 2 and T + 2 in CheY resulted in a range in autodephosphorylation rate constants that spans 2 orders of magnitude.
The previously studied data set included 16 CheY variants (accounting for 23% of receiver domain sequences) and resulted in an ~90-fold range in autodephosphorylation rate constants (11). Given the small sample size, it was unknown whether the previously observed range of kdephos values represented an upper bound or a lower bound for the effect of D + 2 and T + 2 residues on autodephosphorylation. Increasing the data set to 44 CheY variants (54% of receiver domain sequences) expanded the range of CheY autodephosphorylation rate constants only ~30% to 120-fold. The expanded data set is more representative of naturally occurring D + 2/T + 2 pairs than the original and includes D + 2/T + 2 pairs found in wild-type response regulators exhibiting a range in autodephosphorylation rate constants of 4 orders of magnitude. Therefore, we can reasonably expect to attribute only about 2 orders of magnitude of the range in wild-type response regulator autodephosphorylation rate constants to the particular amino acids at positions D + 2 and T + 2 and conclude that additional factors must contribute substantially to variation in the rate of autodephosphorylation between receiver domains.
E. coli PhoBN, a receiver domain with a relatively low autodephosphorylation rate constant (14), was used to analyze substitutions that could potentially enhance autodephosphorylation. This expands a previous approach using the B. subtilis Spo0F response regulator (11). PhoB was chosen for multiple reasons. While CheY and Spo0F are single-domain response regulators, PhoB has an output domain and is representative of the large OmpR class of response regulators. Although the presence or absence of the output domain has little effect on PhoB autodephosphorylation (14), receiver and output domains within response regulators have coevolved (24, 25), and D + 2/T + 2 residues conceivably could have different effects in PhoBN than in a single-domain response regulator. Another contrast with CheY and Spo0F is that PhoBN forms dimers (26). To the best of our knowledge, PhoBN is the only response regulator for which autodephosphorylation rate constants have been measured in different multimeric states. The autodephosphorylation rate constants for monomeric PhoBN-P, the PhoBN·PhoBN-P heterodimer, and the PhoBN-P·PhoBN-P homodimer are indistinguishable (14), removing a potential complication from interpretation of data.
Single and double substitutions at D + 2 and T + 2 were made in PhoBN (changing Met and Arg at D + 2 and T + 2, respectively) that mimic CheY (Table 2). Mutants with Glu substitutions at T + 2 resulted in the largest increases compared to the wild-type PhoBN autodephosphorylation rate constant (PhoBN NE and PhoBN ME autodephosphorylated 18- and 45-fold faster, respectively, than wild-type PhoBN).
The PhoBN mimics of CheY were compared to single and double Ala mutants of PhoBN. Steric occlusion, i.e., hindrance of the in-line attack of nucleophilic water by large residues, was previously hypothesized as one mechanism by which D + 2 and T + 2 residues might influence receiver domain autodephosphorylation (10). Based on this hypothesis, removal of the large Arg and/or Met residues by substitution with Ala should result in enhanced autodephosphorylation. The autodephosphorylation rate constant of PhoBN AR was unchanged from that of wild-type PhoBN, whereas rate constants for PhoBN MA and PhoBN AA were both 2-fold faster than that of wild-type PhoBN. The relatively small effects of single and double Ala substitutions in PhoBN suggest that steric obstruction is not the key mechanism of influence by D + 2/T + 2 residues. Because introducing a Glu at T + 2 had a much larger effect than an Ala, it is clear that the key influence in the enhanced autodephosphorylation rate constant of PhoBN ME was from adding a beneficial feature of Glu, as opposed to removing a detrimental feature of Arg.
The magnitudes of effects between the same D + 2/T + 2 pairs in CheY and PhoBN were similar (Fig. 2). Generally, similar effects of D + 2/T + 2 pairs were also previously observed between CheY and Spo0F (11) (Fig. 2). The observation that the effects were similar regardless of backbone suggests that there may be some generality to the effect of a particular D + 2/T + 2 pair. Kinetic data were analyzed to probe for mechanistic insight that could be applied to all three receiver domain backbones. Data were analyzed to determine whether kinetic effects correlated with features of the amino acids (such as size, solvent accessibility, and hydrophobic surface area) present at D + 2 and T + 2. Though relationships were not observed for other features, a plot of autodephosphorylation rate constants against net charge at D + 2 and T + 2 revealed that more negatively charged active sites weakly correlated with higher autodephosphorylation rate constants (best fit R2 = 0.44) (Fig. 3). Deeper inspection provides additional evidence that negative charge had a role in modulating autodephosphorylation. There are 13 CheY variants with Glu at T + 2 (Table 1). Because the D + 2 residue was varied experimentally, for purposes of discussion, this class of variants is designated XE. Of the XE variants, 12 have one or more related double mutants in which the Glu is replaced with another amino acid, designated XZ. In all 31 of the CheY XE-versus-XZ comparisons, the autodephosphorylation rate constant is higher for the XE variant (Fig. 4A). Because Glu is the wild-type residue at T + 2 in CheY, the result could be explained as loss of wild-type function. However, in both PhoBN (3/3) and Spo0F (6/6) (Fig. 4A), in which Glu is not the wild-type residue at T + 2, all of the XE mutants tested resulted in a gain of function compared to the corresponding XZ mutants.
Acidic residues at D + 2 similarly correlated with higher autodephosphorylation rate constants. The CheY variants with an acidic residue at D + 2 are designated EX, where E represents either Asp or Glu. There are 34 comparisons between EX pairs in CheY and variants that have the D + 2 residue replaced, designated ZX (Fig. 4B). In 30/34 comparisons, the EX variant exhibited a higher kdephos than the ZX variant. Two of the four exceptions have a negatively charged Glu at the “X” (T + 2) position and the other two exceptions have the wild-type amide (Asn) at D + 2. In Spo0F, acidic substitutions at D + 2 also resulted in faster autodephosphorylation (11). There are two Spo0F EX variants that can be compared to ZX variants with the same T + 2 residue. In all four Spo0F comparisons, the EX variants autodephosphorylated faster than the corresponding ZX variants (Fig. 4B). There were no PhoBN mutants in this study that contain an acidic residue at D + 2.
If negatively charged amino acids at positions D + 2 or T + 2 enhance autodephosphorylation, then positively charged amino acids might be expected to diminish autodephosphorylation. This general trend was observed in Fig. 3. However, pairwise comparisons of CheY variants with or without positively charged residues, analogous to those shown in Fig. 4, did not yield similar outcomes. When looking among groupings of CheY variants that included some with a positively charged residue (Arg or Lys) at D + 2 or T + 2, we did not discern a consistent relationship between kinetic data and removal or addition of a positive charge. In 36 of 56 comparisons (data not shown), the CheY variant containing Arg or Lys exhibited a lower value for kdephos than corresponding variants without Arg or Lys. Further, because His has a pKa within a reasonable pH range for autodephosphorylation experiments, we were able to directly assess pH dependence of CheY variants with a His at T + 2. Changing the pH will alter the charge of a His between neutral and positive. Autodephosphorylation rate constants of the CheY QH, RH, and WH variants were not affected by pH (data not shown), suggesting that positive charge at T + 2 may not influence CheY autodephosphorylation. In contrast to CheY, tested PhoBN and Spo0F variants containing Arg or Lys always supported slower autodephosphorylation than corresponding proteins without the positively charged residues, but the available data set is much smaller (only four comparisons for PhoBN and five for Spo0F). Arg, His, and Lys contain large hydrophobic surface areas, so hydrophobic surface area or size, rather than positive charge, may be the means by which these residues influence autodephosphorylation.
In addition to autodephosphorylation, receiver domains self-catalyze phosphorylation with small-molecule phosphodonors, such as phosphoramidate (PAM), monophosphoimidazole (MPI), or acetyl phosphate (AcP). Analysis in CheY showed that the amino acids at D + 2 and T + 2 influence autophosphorylation with small-molecule phosphodonors (12). Of the CheY variants for which autodephosphorylation rate constants are available (Table 1), there were 20 CheY variants for which PAM and AcP autophosphorylation rate constants (kphos/KS) were both known and 7 for which MPI autophosphorylation rate constants were available (12, 13). For the CheY variants, there were significant correlations between kdephos and kphos/KS with PAM (R² = 0.51) or MPI (R² = 0.87) but not with AcP (R² = 0.18). For both PAM and MPI, plotting kdephos versus kphos/KS revealed inverse relationships between the rate constants for the two reactions (Fig. 5), as exemplified by the negative slopes of the best fit lines on log-log plots of the data (Fig. 5, insets).
The plots of kdephos versus kphos/KS showed informative correlations between the amino acids found at T + 2 and the rate constants for each reaction. Pairs containing residues with large hydrophobic surface areas typically had higher PAM/MPI autophosphorylation rate constants (12) and lower autodephosphorylation rate constants (Fig. 5). Pairs containing acid or amide residues at T + 2 were typically faster for autodephosphorylation and slower for PAM/MPI autophosphorylation, with acid residues having the greatest impact.
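As an illustration of how an inverse relationship appears on a log-log plot, the following Python sketch fits a least-squares line to log10-transformed rate constants and reports the slope and R². It is not the fitting procedure used for Fig. 5, which is not described here; the function name and the example values are assumptions chosen only to show the calculation.

```python
import math

def loglog_fit(x, y):
    """Least-squares line through (log10 x, log10 y).
    Returns (slope, r_squared). All inputs must be positive."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared

# Hypothetical values: kdephos falls tenfold for every tenfold rise in kphos/KS.
kphos_over_ks = [1.0, 10.0, 100.0, 1000.0]
kdephos = [10.0 / v for v in kphos_over_ks]
slope, r_squared = loglog_fit(kphos_over_ks, kdephos)
print(f"slope = {slope:.2f}, R^2 = {r_squared:.2f}")
```

A negative slope on the log-log scale corresponds to an inverse power-law relationship between the two rate constants, while R² indicates how much of the variance in log kdephos is accounted for by log kphos/KS.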
Because response regulator subfamilies with different output domains have different functions, we wanted to determine whether particular D + 2/T + 2 pairs were associated with specific response regulator subfamilies and, potentially, with specific response regulator functions. The primary nonredundant database of receiver domain sequences (see Database S3 at http://www.unc.edu/~bourret/PageSupplementalDatabaseFiles.htm) was divided into seven secondary databases (see Databases S4 to S10 at http://www.unc.edu/~bourret/PageSupplementalDatabaseFiles.htm) based on association with major classes of response regulators as described in Materials and Methods. Subfamilies represented the following fractions of the nonredundant database: hybrid kinases, 24%; OmpR, 20%; single receiver domains, 18%; NarL/FixJ, 12%; NtrC, 6%; and LytR, 4%. Sixteen percent of sequences were not assigned to a response regulator subfamily in our analysis. The smaller databases revealed that the distributions of D + 2/T + 2 pairs were strikingly different between response regulator subfamilies, as well as between response regulator subfamilies and response regulators taken as a whole (Table 3). The functional consequences of this circumstance are considered in Discussion.
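The kind of tally summarized in Table 3 can be sketched in Python as follows. The record layout (subfamily label plus the residues at D + 2 and T + 2) and the function name are assumptions for illustration; the actual database files and the procedure described in Materials and Methods may be organized differently, and the records shown are invented.

```python
from collections import Counter, defaultdict

def pair_fractions(records):
    """records: iterable of (subfamily, d2_residue, t2_residue).
    Returns (overall, by_subfamily), where each maps a two-letter
    D+2/T+2 pair to the fraction of sequences carrying that pair."""
    overall_counts = Counter()
    subfamily_counts = defaultdict(Counter)
    for subfamily, d2, t2 in records:
        pair = d2 + t2
        overall_counts[pair] += 1
        subfamily_counts[subfamily][pair] += 1

    def to_fractions(counts):
        total = sum(counts.values())
        return {pair: n / total for pair, n in counts.items()}

    overall = to_fractions(overall_counts)
    by_subfamily = {name: to_fractions(c) for name, c in subfamily_counts.items()}
    return overall, by_subfamily

# Invented records, for illustration only.
example_records = [
    ("OmpR", "M", "K"),
    ("OmpR", "M", "R"),
    ("OmpR", "M", "K"),
    ("single", "N", "E"),
]
overall, by_subfamily = pair_fractions(example_records)
print(overall)
print(by_subfamily["OmpR"])
```

Comparing a pair's fraction within one subfamily to its overall fraction is what reveals enrichment of that pair in the subfamily.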
The weak correlation between net charge of the amino acids at D + 2/T + 2 and the autodephosphorylation rate constant of CheY variants (Fig. 3) resolved upon closer inspection into a consistent correlation between negative charge and enhanced kdephos in three different receiver domains (Fig. 4) but no correlation between positive charge and diminished kdephos. This discrepancy suggests that charge is not the fundamental basis for the underlying mechanism by which amino acids at D + 2/T + 2 enhance autodephosphorylation. It may be relevant that negatively charged amino acids are hydrogen bond acceptors, whereas positively charged amino acids are hydrogen bond donors. This suggests that negatively charged side chains at D + 2/T + 2 might enhance the reaction by interacting with the attacking water molecule. Because amide residues can act as hydrogen bond acceptors, this hypothesis predicts that amides should also enhance autodephosphorylation. Our mutant collection contains few examples with an amide residue at T + 2, but we have measured kdephos for many CheY variants with an amide at D + 2 (Table 1) and can analyze the data as was done in Fig. 4 for acid residues. In 33 pairwise comparisons between CheY NX and ZX variants (matched at T + 2 and differing at D + 2), there were only three cases in which kdephos for the CheY ZX mutant was faster and the “Z” (D + 2) residue was not an acid or amide (data not shown). Similarly, there were only three cases among 25 matched pairs in which kdephos was greater for CheY ZX than for CheY QX without an acid or amide residue occupying D + 2 (data not shown). For PhoBN (Table 2), in two of three NX-versus-ZX comparisons, kdephos was larger for the amide-containing variant (data not shown). For Spo0F (11), in four of five comparisons, the NX variant exhibited faster autodephosphorylation than the ZX variant, and in the one exception an acid residue at D + 2 resulted in a greater kdephos than with an amide (data not shown).
Although the difference in kdephos is small in many pairwise comparisons, a preponderance of evidence indicates that amide residues at position D + 2 enhance autodephosphorylation in multiple receiver domains, albeit not to the same extent as acid residues.
The conclusions that (i) acid and amide residues at D + 2/T + 2 enhance receiver domain autodephosphorylation and (ii) the mechanism likely involves interaction with the attacking water molecule are supported by a previous study of CheY mimics of haloacid dehalogenase (HAD) phosphatases (27). The active sites of HAD phosphatases are similar to the active sites of receiver domains and catalyze similar chemistries (28). Although position D + 2 is variable in receiver domains and rarely (2%) occupied by an Asp, HAD phosphatases contain a conserved Asp at D + 2 that accelerates phosphorylation and dephosphorylation by acid/base catalysis. The amino acid at T + 2 in HAD phosphatases often helps position the Asp at D + 2. We previously determined kdephos values in five CheY variants containing D + 2/T + 2 pairs (DR, DK, DQ, DY, and DT) that are rare in receiver domains but common in HAD phosphatases (27). Although the CheY DX variants do not utilize acid/base catalysis, they support faster autodephosphorylation in all (12/12) possible comparisons with corresponding CheY ZX variants (matched at T + 2 and differing at D + 2) reported in Table 1. Furthermore, structural evidence was obtained for interactions between an Asp at D + 2 and an attacking water molecule (27).
Receiver domains self-catalyze both phosphorylation with small-molecule phosphodonors and dephosphorylation with water. Both reactions are substitution reactions at the phosphorus atom. The proposed transition states for autodephosphorylation and autophosphorylation share multiple features, including a partially formed Asp-P bond and a planar PO32− group coordinated by conserved active-site residues and a divalent cation (10, 12). If residues at D + 2 and T + 2 affected aspects of the two reactions that are similar, it would be reasonable to expect a direct correlation between the corresponding rate constants. However, direct correlation of kdephos for CheY D + 2/T + 2 variants was not observed with the rate constants for autophosphorylation with PAM (Fig. 5A), MPI (Fig. 5B), or AcP (data not shown). The lack of a direct correlation suggests that residues at D + 2 and T + 2 affected features of autophosphorylation and autodephosphorylation that are not similar. Obvious differences between the reactions include the ground states (CheY-P for autodephosphorylation versus CheY for autophosphorylation) and the regions of the transition states near the attacking water for autodephosphorylation or the phosphodonor for autophosphorylation. Notably, the residues at D + 2/T + 2 are within appropriate distance to affect the regions around the water or phosphodonor. A previous study concluded on different grounds that the amino acids at D + 2/T + 2 affect CheY autophosphorylation kinetics by interacting with the leaving group (12).
Instead of a direct correlation, kdephos of CheY variants differing at D + 2/T + 2 varied inversely with kphos/KS for PAM or MPI (Fig. 5) but not AcP. This distinction is consistent with a previous report that the kinetic determinants for autophosphorylation of CheY with phosphoramidates (PAM and MPI) or acyl phosphates (AcP) are different (12). In particular, autophosphorylation with PAM is more strongly influenced by the hydrophobic surface area of the amino acids at D + 2 and T + 2, whereas autophosphorylation with AcP is more strongly affected by the charge of the D + 2/T + 2 residues. The inverse relationship between kdephos and kphos/KS for PAM/MPI displayed in Fig. 5 may arise because the two reactions are primarily influenced by mutually exclusive properties of the amino acids at D + 2/T + 2. Specifically, hydrogen bond accepting ability appears to promote autodephosphorylation (presumably by interacting with water as discussed above), whereas a hydrophobic surface area appears to stimulate autophosphorylation through direct interaction with the imidazole ring or related portions of the phosphodonor (12, 13). Thus, D + 2/T + 2 variants of CheY that are adept at catalyzing both autophosphorylation with PAM/MPI and autodephosphorylation with water were not observed.
In many instances, autodephosphorylation rate constants for CheY, PhoBN, and Spo0F variants appeared to be consistent with the effects of substitutions at D + 2 and T + 2 anticipated from rate constants for wild-type response regulators containing the same D + 2/T + 2 pairs. For example, wild-type PrrA is at the slower end of the range of autodephosphorylation rate constants for wild-type response regulators and contains RY at D + 2/T + 2. In CheY, the RY variant is one of the slowest of the CheY variants (Table 1). There was a direct correlation (R² = 0.62) between kdephos for wild-type response regulators (range, 4 orders of magnitude) and kdephos for CheY variants with the same D + 2/T + 2 pair (range, 2 orders of magnitude) when plotted in log-log form (Fig. 6). The correlation between kdephos for wild-type response regulators and corresponding CheY variants generally holds even though the effects of changing amino acids at D + 2 and T + 2 in CheY are sufficient to account for <1% of the range in autodephosphorylation rate constants observed among wild-type response regulators. The correlation may indicate that other factors affecting kdephos often evolved to reinforce the direction set by the amino acids at D + 2 and T + 2.
The enrichment of particular pairs of amino acids at positions D + 2 and T + 2 in various response regulator subfamilies (Table 3) is substantial. For example, whereas the MK and MR D + 2/T + 2 pairs are found in 14% of receiver domain sequences overall, they are present in 49% of receiver domain sequences in the OmpR subfamily (Table 3). Furthermore, 72% of all sequences containing MK or MR pairs belong to the OmpR subfamily. OmpR subfamily receiver domains typically function to regulate transcription, a relatively slow process, and receiver domains containing MK or MR at D + 2/T + 2 have autodephosphorylation rate constants at the low end of the range of wild-type response regulators. Changing D + 2/T + 2 in CheY to MK or MR made CheY autodephosphorylation rate constants at least an order of magnitude lower than for the wild type (Table 1). Similarly, replacing the natural MR pair in PhoBN with the NE pair found in CheY increased autodephosphorylation by more than an order of magnitude.
A central result reported here is that negatively charged amino acids at D + 2 or T + 2 enhance autodephosphorylation (Fig. 4). There are no D + 2/T + 2 pairs with a negatively charged residue in the top 10 pairs of all receiver domains, and there are only three in the top 20 (see Table S11 in the supplemental material). However, the distribution among response regulator subfamilies of the three most common D + 2/T + 2 pairs containing a negative charge is informative (Table 3). Prevalent in chemotaxis, CheYs consist of a single receiver domain and typically contain an NE pair. While NE represents less than 2% of receiver domain sequences overall, the pair makes up 8% of single receiver domain sequences. A total of 78% of sequences overall that contained an NE pair are found in the single receiver domain database. While ER is found in 1.1% of receiver domain sequences, this D + 2/T + 2 pair represents 3.4% of receiver domains in hybrid kinases. A total of 75% of receiver domains that contain an ER pair are found in the hybrid kinase database. CheB-type response regulators contain a methylesterase domain (23) and are not part of the other response regulator subfamilies chosen for analysis in Table 3. CheB receiver domains typically contain the EL D + 2/T + 2 pair. The “nonassigned” database contained 78% of the EL-representative receiver domain sequences overall. It is striking that the NE, EL, and ER combinations of amino acids at D + 2 and T + 2 are strongly enriched in response regulators that require rapid autodephosphorylation for function. Signal transduction in chemotaxis occurs in a fraction of a second (29), and the receiver domain in hybrid kinases often functions as a phosphate sink to drain phosphoryl groups out of the system through autodephosphorylation (30–33).
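The enrichment figures quoted in the two preceding paragraphs can be cross-checked against one another with simple arithmetic. The Python sketch below uses only percentages stated in the text (the subfamily shares of the nonredundant database and the pair frequencies); the function names are hypothetical, and small differences from the reported values presumably reflect rounding of the published percentages.

```python
def share_in_subfamily(pair_frac_in_subfamily, subfamily_frac_of_db, pair_frac_overall):
    """Fraction of all sequences carrying a pair that fall in one subfamily."""
    return pair_frac_in_subfamily * subfamily_frac_of_db / pair_frac_overall

def fold_enrichment(pair_frac_in_subfamily, pair_frac_overall):
    """How many times more frequent a pair is in a subfamily than overall."""
    return pair_frac_in_subfamily / pair_frac_overall

# MK or MR: 49% of OmpR sequences, OmpR is 20% of the database, 14% overall.
mk_mr_share = share_in_subfamily(0.49, 0.20, 0.14)
# ER: 3.4% of hybrid kinase receiver domains, hybrid kinases are 24%, 1.1% overall.
er_share = share_in_subfamily(0.034, 0.24, 0.011)

print(f"MK/MR sequences in OmpR: {mk_mr_share:.0%}, "
      f"fold enrichment {fold_enrichment(0.49, 0.14):.1f}")
print(f"ER sequences in hybrid kinases: {er_share:.0%}, "
      f"fold enrichment {fold_enrichment(0.034, 0.011):.1f}")
```

The computed shares (about 70% for MK/MR in the OmpR subfamily and about 74% for ER in hybrid kinases) agree closely with the reported 72% and 75%.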
The enrichment of specific D + 2/T + 2 pairs with particular effects on autodephosphorylation kinetics in different response regulator subfamilies strengthens the correlation between which D + 2/T + 2 pairs are present in a receiver domain and the biological function of that particular receiver domain. This correlation is further reinforced by the observations that (i) D + 2/T + 2 pairs with chemically similar amino acids (e.g., a basic residue at D + 2 paired with a hydrophobic residue at T + 2) often cluster together in response regulator subfamilies (Table 3) and (ii) D + 2/T + 2 pairs with chemically similar amino acids have very similar (within 2-fold) effects on CheY autodephosphorylation kinetics (Table 1).
A central part of the research strategy described here is to identify the combinations of amino acids at variable positions D + 2 and T + 2 most commonly found in naturally occurring receiver domains and then use this information to prioritize subjects of experimental investigation. However, phylogenetic groups are not uniformly represented in sequence databases, which as a result are inevitably biased by what has been sequenced. It would be useful to assess possible sources of database bias and corresponding means to mitigate the impact of bias. When constructing our receiver domain databases, we attempted to minimize the effects of overrepresentation bias by including information from only one genome per genus and excluding sequences that were more than 90% identical to other sequences. This addressed problems resulting from closely related organisms that have been sequenced multiple times but not biases due to the absence of sequences from other organisms.
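The two redundancy filters described above (one genome per genus, and exclusion of sequences more than 90% identical to another sequence) can be sketched as a greedy pass over the data. This is a minimal illustration under stated assumptions, not the procedure used to build the databases: sequences are assumed to be pre-aligned to equal length, the first genome encountered for each genus is the one retained, and all names and records are hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def filter_redundant(records, max_identity=90.0):
    """records: list of (genus, genome_id, aligned_sequence).
    Keeps sequences from the first genome seen for each genus, then drops any
    sequence more than max_identity percent identical to one already kept."""
    chosen_genome = {}
    kept = []
    for genus, genome_id, seq in records:
        # setdefault records the first genome seen for this genus.
        if chosen_genome.setdefault(genus, genome_id) != genome_id:
            continue
        if any(percent_identity(seq, prior) > max_identity for prior in kept):
            continue
        kept.append(seq)
    return kept

# Invented toy records, for illustration only.
example_records = [
    ("Escherichia", "g1", "ACDEFGHIKL"),
    ("Escherichia", "g2", "MMMMMMMMMM"),  # second genome of the same genus
    ("Bacillus", "g3", "ACDEFGHIKM"),     # exactly 90% identical: retained
    ("Vibrio", "g4", "ACDEFGHIKL"),       # 100% identical to the first: dropped
    ("Bacillus", "g3", "WWWWWWWWWW"),
]
print(filter_redundant(example_records))
```

A greedy filter of this kind depends on input order, so which member of a cluster of near-identical sequences is retained is arbitrary. As noted in the text, filtering of this sort reduces overrepresentation but cannot compensate for organisms that are absent from the database.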
The observation that a large majority of occurrences of a given D + 2/T + 2 pair were often found in a single response regulator subfamily (Table 3) provides a way to assess the potential impact of underrepresentation bias. Individual species typically encode response regulators from many different subfamilies, and there is variation in the abundance of different response regulator subfamilies across phylogenetic groups (23). Thus, the abundance of particular pairs of amino acids at D + 2/T + 2 might more directly reflect the distribution of response regulator subfamilies across species than the phylogeny of sequenced genomes. To test this idea, we compared the abundance of D + 2/T + 2 pairs in sample species from various phylogenetic groups (see Table S12 in the supplemental material) with the abundance of response regulator subfamilies in the same organisms (see Table S13 in the supplemental material). For example, more than three-quarters of response regulators in the Actinobacteria sample belonged to the OmpR or NarL/FixJ subfamilies (see Table S13), and six of the seven most abundant D + 2/T + 2 pairs in Actinobacteria (see Table S12) were highly enriched in the OmpR and NarL/FixJ subfamilies (Table 3). Similarly, >70% of receiver domains in the Cyanobacteria sample belonged to the hybrid kinase or single-domain subfamily, and the seven most common D + 2/T + 2 pairs in the Cyanobacteria sample were characteristic of these two subfamilies. Additional examples are evident. Overall, the data in Tables S12 and S13 are consistent with the abundance of response regulator subfamilies being a primary determinant of the most frequent D + 2/T + 2 pairs found in a given species.
If the distribution of response regulator subfamilies is very different in organisms that have not been sequenced compared to those that have, then the most frequent D + 2/T + 2 pairs in our database (see Table S11 in the supplemental material) may not be the most abundant in nature. Nevertheless, the 20 overall most common D + 2/T + 2 pairs, which were the focus of our experimental investigation, account for two-thirds of the 70 pairs (10 most common D + 2/T + 2 pairs in each of seven receiver domain subfamilies) listed in Table 3. Representation of naturally occurring receiver domains could be further enhanced by experimental characterization of the effects of the D + 2/T + 2 pairs that are most abundant in each response regulator subfamily, rather than across all receiver domains.
With rapid advancements in genome sequencing, the known sizes of protein families have dramatically increased. While conserved domains identify the family to which a protein belongs, questions of functionality remain. Assessment of functionality within a protein family has historically focused on conserved residues within the conserved domain. Typically, protein sequences are aligned to determine the conserved residues, the conserved residues are changed to alanines or other amino acids, and the mutant proteins are assessed using functional assays. However, when considering the different functionalities within a family of proteins, it is often the variable features that distinguish one protein in a family from another. Candidates for functionally important variable positions can be identified by mutual information analysis, inspection of protein structures, or alanine-scanning mutagenesis. Protein sequence analysis dramatically enhanced the feasibility of studying variable features of a large protein family, i.e., response regulators. Not only were we able to assess receiver domain autodephosphorylation kinetics of a mutant set that reflects what is found in nature, but also we were able to gain more confidence in the mechanistic insights elucidated by an expanded data set. While previous work indicated steric occlusion as a key means of influence by D + 2/T + 2 pairs on autodephosphorylation, the larger data set reported here suggests with more confidence that interaction with the attacking water is a key means of influence.
Protein sequence analysis also revealed distinctions between response regulator subfamilies. The D + 2/T + 2 pairs enriched in the response regulator subfamily databases (Table 3) modulated autodephosphorylation in ways that are consistent with the biological functions associated with response regulators in those subfamilies. Grouping protein sequences into subcategories (such as the response regulator subfamilies used here) combined with the knowledge of influence from variable features (such as results from changing D + 2/T + 2 reported here) could provide further mechanistic insight into functional variation within protein families.
We acknowledge Igor Zhulin and Ogun Adebali for providing receiver domain sequences and for advice on protein sequence analysis. We acknowledge Ruth Silversmith, Rachel Creager-Allen, and Chrystal Starbird for helpful discussions regarding this work. Suggestions from anonymous peer reviewers strengthened the manuscript.
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JB.00853-15.