The specificity of two-component signaling pathways relies on the intrinsic ability of a histidine kinase to discriminate its cognate response regulator from all other possible substrates. Structural and previous mutagenesis studies have not, however, been able to elucidate the molecular basis of specificity in this protein-protein interaction. Here, we used an analysis of amino-acid covariation to guide the identification of specificity determinants in two-component signal transduction pathways and to produce a general method for rewiring the substrate selectivity of EnvZ, both in vitro and in vivo. Given the high degree of sequence similarity between two-component signaling proteins, we anticipate that a similar, rational rewiring of other histidine kinases will be possible using the design approach developed here. Two-component proteins have been engineered and used in synthetic circuits previously, but this work has been limited to fusions of heterologous ligand-binding domains to the entire cytoplasmic portion of EnvZ (Levskaya et al., 2005; Utsumi et al., 1989). The ability to now rewire the substrate selectivity of EnvZ, and other histidine kinases, should significantly enhance their use in synthetic signaling circuits.
Protein-protein interactions are crucial to the operation of nearly every cellular process, but our understanding of interaction specificity remains limited. In particular, there are only a few cases in which protein-protein interaction specificity is understood to a level that allows the rational reprogramming, or engineering, of interaction specificity. There have been some recent successes, such as the redesign of colicin-immunity protein interactions (Kortemme et al., 2004) and coiled-coil dimerization (Havranek and Harbury, 2003), but these studies have relied heavily on structural data. Our results demonstrate that the analysis of amino-acid covariation in large multiple sequence alignments can effectively guide the identification of specificity determinants in protein-protein interactions, even those involved in transient signaling events such as a kinase-substrate interaction. Importantly, the analysis of amino-acid covariation does not require structural data and, in fact, provides complementary information. For instance, the Spo0B:Spo0F structure clearly revealed the molecular surfaces and amino acids in direct contact during phosphotransfer, but the covariation analysis presented here was necessary to pinpoint the subset of contact residues that dictate specificity. The covariation analysis was particularly valuable because no high-resolution structure of a histidine kinase in complex with a response regulator has yet been solved; Spo0B is a histidine phosphotransferase and is considered a suitable proxy for histidine kinases, but it may not be suitable for structure-based redesign efforts. As sequence databases continue their rapid expansion, covariation analyses should become increasingly useful for mapping the specificity determinants of other protein-protein interactions. Analyses of coevolution should also significantly aid efforts to design or engineer protein specificity, either as a complement to other computational approaches or by focusing the region of interest for mutagenesis or directed evolution (Bloom et al., 2005; Kortemme and Baker, 2004). Our analysis of histidine kinase-response regulator covariation leveraged the fact that cognate pairs are often encoded in the same operon. The use of covariation analysis in eukaryotes will likely depend on protein-protein interaction data for paralogous protein families of interest, such as the data generated recently for human bZIP transcription factors and PDZ-peptide interactions (Newman and Keating, 2003; Stiffler et al., 2007).
Our covariation analyses identified two primary clusters of amino acids in histidine kinases that covary with amino acids in response regulators. These clusters are located above and below the active-site histidine (colored green and orange, respectively). Based on previous NMR studies (Tomomori et al., 1999), we focused on the amino acids below the histidine and demonstrated that these residues are crucial determinants of a histidine kinase’s substrate specificity. The small cluster of residues above the histidine thus does not appear to play a significant role in substrate specificity. These amino acids may have been identified spuriously, as amino-acid covariation can result from common ancestry rather than functional constraint. The elimination of sequences with greater than 90% identity in our analyses helps to minimize, but does not eliminate, the confounding effects of phylogeny. A number of approaches have recently been developed to further minimize the influence of common ancestry. Application of one such approach (Dunn et al., 2008) to the multiple sequence alignments used here did not substantially change the top-scoring amino acids, although, notably, two amino acids in the cluster above the active-site histidine (L230 and A231 in EnvZ) were eliminated (data not shown).
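To make the covariation analysis and the phylogeny correction more concrete, the following is a minimal Python sketch under stated assumptions, not the pipeline used in this work. It assumes each row is a kinase sequence concatenated with its operon-paired response regulator, both already aligned, with gaps treated as an ordinary symbol. It greedily drops rows above 90% identity to a row already kept, computes mutual information (MI) between every pair of columns, and subtracts the average product correction (APC) described by Dunn et al. (2008) before ranking kinase-regulator column pairs. The function names, the greedy filtering order, and the toy alignment `TOY_ROWS` are hypothetical illustrations.

```python
import math
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length rows agree (gaps count as a symbol)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_redundant(rows: list[str], max_id: float = 0.9) -> list[str]:
    """Greedily drop any row sharing more than max_id identity with a row already kept."""
    kept: list[str] = []
    for row in rows:
        if all(identity(row, k) <= max_id for k in kept):
            kept.append(row)
    return kept


def mutual_information(rows: list[str], i: int, j: int) -> float:
    """Mutual information (natural log) between alignment columns i and j."""
    n = len(rows)
    p_i = Counter(r[i] for r in rows)
    p_j = Counter(r[j] for r in rows)
    p_ij = Counter((r[i], r[j]) for r in rows)
    return sum(
        (c / n) * math.log((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


def ranked_interprotein_mip(rows: list[str], n_kinase: int) -> list[tuple[int, int, float]]:
    """Score kinase-column x regulator-column pairs by MI minus the average product correction."""
    length = len(rows[0])
    mi = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            mi[i][j] = mi[j][i] = mutual_information(rows, i, j)
    # Mean MI of each column with every other column, then the mean over all column pairs.
    col_mean = [sum(mi[i][j] for j in range(length) if j != i) / (length - 1) for i in range(length)]
    overall = sum(col_mean) / length
    scores = []
    for i in range(n_kinase):  # kinase columns come first in each concatenated row
        for j in range(n_kinase, length):  # regulator columns follow
            apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
            scores.append((i, j, mi[i][j] - apc))
    return sorted(scores, key=lambda s: s[2], reverse=True)


# Hypothetical toy alignment: 3 kinase columns + 3 regulator columns per row.
# Kinase column 1 and regulator column 4 covary perfectly; other columns are
# constant or vary independently of one another.
TOY_ROWS = ["MALGRS", "MDLGKT", "MEVGHS", "MKVGDT", "MAVGRT", "MDVGKS", "MELGHT", "MKLGDS"]
```

On the toy alignment, `ranked_interprotein_mip(filter_redundant(TOY_ROWS), n_kinase=3)` should rank the pair (1, 4) first. Real alignments would also need pseudocounts and a minimum number of sequences, which this sketch omits.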
Most of the amino acids showing strong covariation between histidine kinases and response regulators are located at or near the presumed molecular interface formed during phosphotransfer. There is not, however, a perfect correlation between the covarying and interfacial residues. It is only a subset of interfacial residues, mostly at the C-terminal end of alpha helix 1, that play the dominant role in determining substrate selectivity. Changing the specificity of EnvZ to match that of RstB was achieved by mutating only the specificity residues within this helix. Changing the specificity of EnvZ to match that of other kinases, however, also required swapping the loop between alpha helices 1 and 2. As noted earlier, residues within the loop are difficult to align and hence more refractory to covariation analysis. In this respect, structures of histidine kinases provided useful insight complementary to the covariation analysis. A similar combination of approaches will likely be helpful in reprogramming other protein-protein interactions.
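Mutating only the specificity residues, or swapping a whole loop, can be expressed as copying a donor kinase's residues into the recipient at selected alignment columns. The Python sketch below assumes the recipient (for example, EnvZ) and the donor are already aligned, with `-` marking gaps; the helper names `transplant` and `substitutions`, the toy sequences, and the column indices are hypothetical and do not correspond to the residues actually mutated in this work.

```python
from collections.abc import Iterable


def transplant(recipient: str, donor: str, columns: Iterable[int]) -> str:
    """Return the aligned recipient with residues at the given columns copied from the donor."""
    if len(recipient) != len(donor):
        raise ValueError("recipient and donor must be aligned to the same length")
    out = list(recipient)
    for c in columns:
        out[c] = donor[c]
    return "".join(out)


def substitutions(recipient: str, donor: str, columns: Iterable[int], first_residue: int = 1) -> list[str]:
    """List point substitutions such as 'T2S' implied by copying donor residues at the given columns.

    Positions are numbered along the ungapped recipient, starting at first_residue.
    Identical residues and columns with a gap in either sequence are skipped,
    because they are not simple point substitutions.
    """
    result = []
    for c in sorted(columns):
        r, d = recipient[c], donor[c]
        if r == d or "-" in (r, d):
            continue
        position = first_residue + sum(1 for x in recipient[:c] if x != "-")
        result.append(f"{r}{position}{d}")
    return result


# Hypothetical toy alignment: column 2 is a one-residue insertion in the donor loop.
RECIPIENT = "AT-LR"
DONOR = "ASGLK"
```

Copying two isolated columns yields two point substitutions, whereas copying a contiguous column range that spans the gapped position (a loop swap) changes the ungapped length of the recipient, which illustrates why such regions are harder to treat by point mutation alone.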
Finally, the identification of specificity determinants in two-component signaling proteins may also shed light on the system-level properties and evolution of paralogous signal transduction families. The exquisite substrate selectivity exhibited by histidine kinases suggests they have been selected during evolution both for recognition of their cognate substrates and against recognition of non-cognate substrates. Evidence for a similar balance of selective forces shaping the specificity of paralogous gene families has also recently been reported for PDZ-peptide and SH3-peptide interactions in eukaryotes (Stiffler et al., 2007; Zarrinpar et al., 2003). For two-component signal transduction systems, the identification of specificity-determining residues will facilitate study of the selective forces that influence the evolution of these signaling proteins. This understanding, in turn, may help to reveal how cells coordinate multiple, paralogous signaling pathways to maintain information flow while preventing unwanted cross-talk.