Conceived and designed the experiments: AC CLEZ MLD VR JNH AZ. Performed the experiments: AC CLEZ MLD VR AZ. Analyzed the data: AC CLEZ MLD VR JNH AZ. Contributed reagents/materials/analysis tools: JNH AZ. Wrote the paper: CLEZ AZ. Acquired funding: CLEZ JNH AZ.
Here we introduce a quantitative structure-driven computational domain-fusion method, which we used to predict the structures of proteins believed to be involved in regulation of the subtilin pathway in Bacillus subtilis, and used to predict a protein-protein complex formed by interaction between the proteins. Homology modeling of SpaK and SpaR yielded preliminary structural models based on a best template for SpaK comprising a dimer of a histidine kinase, and for SpaR a response regulator protein. Our LGA code was used to identify multi-domain proteins with structure homology to both modeled structures, yielding a set of domain-fusion templates then used to model a hypothetical SpaK/SpaR complex. The models were used to identify putative functional residues and residues at the protein-protein interface, and bioinformatics was used to compare functionally and structurally relevant residues in corresponding positions among proteins with structural homology to the templates. Models of the complex were evaluated in light of known properties of the functional residues within two-component systems involving His-Asp phosphorelays. Based on this analysis, a phosphotransferase complexed with a beryllofluoride was selected as the optimal template for modeling a SpaK/SpaR complex conformation. In vitro phosphorylation studies performed using wild type and site-directed SpaK mutant proteins validated the predictions derived from application of the structure-driven domain-fusion method: SpaK was phosphorylated in the presence of 32P-ATP and the phosphate moiety was subsequently transferred to SpaR, supporting the hypothesis that SpaK and SpaR function as sensor and response regulator, respectively, in a two-component signal transduction system, and furthermore suggesting that the structure-driven domain-fusion approach correctly predicted a physical interaction between SpaK and SpaR. 
Our domain-fusion algorithm leverages quantitative structure information and provides a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods.
Because proteins so frequently function in coordination with other proteins, identification and characterization of the interactions among proteins are essential for understanding how proteins work. Computational methods for identification of protein-protein interactions have been limited by the degree to which proteins are similar in sequence. However, methods that leverage structure information can overcome this limitation of sequence-based methods; the three-dimensional information provided by structure enables identification of related proteins even when their sequences are dissimilar. In this work we present a quantitative method for identification of protein interacting partners, and we demonstrate its use in modeling the structure of a hypothetical complex between two proteins that function in a bacterial signaling system. This quantitative approach comprises a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods, and provides a basis for high-throughput prediction of protein-protein interactions, which could be applied on a whole-genome scale.
Because proteins so frequently function in coordination with other proteins, identification and characterization of protein-protein complexes are essential aspects of protein sequence annotation and function determination. A variety of empirical and computational methods for identifying putative protein-protein interactions have been reported. Of particular note is the Rosetta Stone approach for identifying interacting partners based on the theory of gene fusion, whereby protein domains that are encoded separately in one species may be homologous to domains that are “fused” in the same open reading frame in another species. Whereas sequence-based domain fusion methods can be highly successful in identifying putative functional relationships among proteins, the reliance on sequence homology limits detection to protein sequences with adequate levels of sequence identity. Another approach to identifying putative protein-protein interactions is described by Lu and coworkers, whereby sequence-based searches against the PDB were performed in order to identify multi-domain structures having at least one domain with good sequence identity to each putative interacting protein. However, the sensitivity of this search method is also dependent on the levels of sequence identity between the proteins of interest and the sequences of the domains within the identified PDB domain-fusion template. Kundrotas and Alexov explored the use of structure-based comparisons in the identification of multi-domain templates for homology modeling of complex structures. In that work, it was determined that a structure-based protocol performed considerably better than did a sequence-based protocol in recovering known protein-protein interacting partners (86% recovery as opposed to 19%) in searches against a database of known complexes, indicating that the structure-based method was more sensitive in detecting remote homologs.
We describe the application of a quantitative structure-based comparison method to the identification of putative protein-protein interactions, and show that this approach increases sensitivity in detecting putative interactions at low (<20%) levels of sequence identity, based on the general principle that structure homology is more highly conserved in evolution than is sequence homology. Our approach, therefore, involves the generation of a structure model, based on adequate (typically >30%) sequence identity to a PDB domain, followed by structure-based homology searches against PDB to identify multi-domain structures with adequate structure similarity to the model of each putative interacting protein. Thus, we propose that our structure-driven domain-fusion method can be used to identify domain-fusion templates for modeling protein-protein interaction complexes, and that such searches may prove to be more sensitive than sequence-based searches alone.
To explore this approach, we selected as the subject of our study a protein-protein interaction that is representative of a common class of biological control systems, known as the two-component signal transduction system: the interaction of SpaK and SpaR from Bacillus subtilis, which regulate the biosynthesis of subtilin, an antimicrobial peptide lantibiotic that inhibits growth of a broad range of pathogenic Gram positive bacteria. In this study we introduce a structural bioinformatics methodology for identification of putative protein-protein complexes, and we apply it to characterize the interactions between SpaK and SpaR. We generate structure homology models of SpaK and SpaR, and then use these models to identify multi-domain protein structures that have good structure homology to the models. Using one of the so-identified domain-fusion templates, we generate a model representing a hypothetical physical interaction between SpaK and SpaR, which enables further analyses of residues involved in the protein-protein interaction. In this way we extend the well-known sequence-based domain-fusion method by leveraging structural data, and use it to generate hypotheses regarding the interactions between the two proteins. We further report the results of biochemical studies on wild type and mutant proteins that characterize the interactions between SpaK and SpaR, and we assess the resulting structural model of a putative SpaK/SpaR complex arising from our structure-driven domain-fusion approach. Furthermore, our biochemical analyses confirm that SpaK autophosphorylates and subsequently transfers a phosphoryl group to SpaR.
SpaK (gi: 6226707, Uniprot P33113) and SpaR (gi: 417799, Uniprot P33112) protein sequences were input to the AS2TS protein structure modeling system (http://as2ts.llnl.gov/), which generated initial homology models based on structures taken from the Protein Data Bank (PDB) (version released December 11, 2007). Structural templates having global sequence homology to each of SpaK and SpaR were further studied by examining domain-level homology.
As no suitable template for the N-terminal domain (218 residues) of SpaK was identified, this domain was not modeled. Based on match length (227 residues), e-value (4e-57), and sequence identity (28%), PDB entry 2c2a_A, a sensor histidine kinase from Thermotoga maritima, was identified as the primary template for modeling SpaK (Fig. 1). Additional templates identified by AS2TS are shown in Supplemental Results Table S1. Two domains of SpaK (SpaK_d1: residues 219–300 and SpaK_d2: 301–459) were modeled separately, pending determination of relative conformation to be provided by structure-driven domain-fusion analysis (see Results). Although identification of a structure template with acceptable global sequence homology enables initial model construction, there often remain sub-sequences in the protein of interest that do not correspond to any portion of the template due to insertions or deletions relative to that template. For this reason, and in order to construct as complete a model as possible to confirm the fitness of the modeled complex, the Local-Global Alignment (LGA) modeler gap-filling procedure (in-house software) was used to construct necessary loops, gaps or insertions by “grafting” in suitable regions from related structures in PDB.
Similarly, SpaR was modeled as two separate domains, comprising residues SpaR_d1: 1–117 and SpaR_d2: 118–220. The N-terminal domain was initially modeled based on the structural template 1mvo_A (crystal structure of the PhoP receiver domain from Bacillus subtilis), which showed the highest level of sequence identity (46%) to that domain (see Supplemental Results Table S2). In order to complete the model, the LGA gap-filling procedure was used to construct regions of missing coordinates. PDB entry 2gwr_A, a response regulator protein from Mycobacterium tuberculosis, was identified as the primary template for homology modeling of the C-terminal domain of SpaR (match length 216, e-value 9e-58, sequence identity 30%). This template was also used for the construction of the domain orientation (Fig. 2). Further refinement of the constructed SpaK and SpaR models was performed based on the structure comparison of modeled domains with other PDB templates that were structurally identified by a PDB-search procedure using LGA and the PDB release of July 8, 2008. In all created models, the side-chain positions of residues that were identical in the template were copied to the models, and the coordinates of missing side-chain atoms were predicted using SCWRL.
The LGA software (http://as2ts.llnl.gov/lga/) was used to perform structure homology searches against the PDB to identify all entries with detected (LGA_S>=35%) structural similarity to any of the four modeled domains (see above) within the homology models of SpaK and SpaR. We selected an LGA_S cutoff value of 35% based on our observation that the number and quality of hits increased rapidly at LGA_S<=33% (data not shown) and based on previous work that determined the minimal structure homology needed to assure quality of structure alignment. Those entries with homology to both respective domains of SpaK and SpaR were selected as putative domain-fusion templates for modeling a SpaK/SpaR complex (Table 1). Reported in Table 1 are the sequence identities between SpaK or SpaR and each corresponding domain-fusion template, whereby residue-residue correspondences were extracted from the structure alignments between the models and the domain-fusion templates. We do not report the PSI-BLAST calculated sequence identities, as these are unreliable when calculated from sequence alignments at low levels of sequence identity (i.e., below 10%).
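The template-selection step described above amounts to a threshold filter followed by a set intersection. The following is a minimal Python sketch under stated assumptions: the search results are assumed to be available as (PDB entry, LGA_S) pairs per modeled domain, and the function name, data layout, scores, and the placeholder entries "xxx1" and "xxx2" are hypothetical. None of it is taken from the LGA software or from Table 1.

```python
LGA_S_CUTOFF = 35.0  # percent; the cutoff stated in the text


def domain_fusion_templates(spak_hits, spar_hits, cutoff=LGA_S_CUTOFF):
    """Return PDB entries scoring >= cutoff against a SpaK domain AND a SpaR domain.

    spak_hits / spar_hits: dict mapping a domain name to a list of
    (pdb_entry, lga_s) tuples (an assumed, illustrative layout).
    """
    def passing(hits):
        # Collect every entry that meets the cutoff for at least one domain.
        return {entry
                for domain_hits in hits.values()
                for entry, score in domain_hits
                if score >= cutoff}

    # A domain-fusion template must match both proteins.
    return sorted(passing(spak_hits) & passing(spar_hits))


# Hypothetical scores for illustration only.
spak_hits = {"SpaK_d2": [("2ftk", 60.0), ("1th8", 40.0), ("xxx1", 20.0)]}
spar_hits = {"SpaR_d1": [("2ftk", 90.0), ("1th8", 38.0), ("xxx2", 50.0)]}
print(domain_fusion_templates(spak_hits, spar_hits))
```

Entries that match only one of the two proteins, or that fall below the cutoff, are dropped; raising the cutoff shrinks the candidate set accordingly.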
The spaK and spaR genes were isolated from Bacillus subtilis strain LH45, a subtilin-producing derivative of strain 168. Synthetic oligonucleotide primers were used to amplify spaR using methods described previously. Briefly, the commercial vector pQE31 (obtained from Qiagen, Valencia, CA) was digested with EcoRI and HindIII, and a fragment containing a truncated spaK gene encoding the C-terminal half of SpaK was cloned into the multipurpose cloning site of the pQE31 vector to construct the pQE31-spaK expression vector (Supplemental Fig. S1A). (Note that we succeeded in expressing only the C-terminal residues of SpaK, as the full-length gene did not yield an expression product.) The pQE31-spaR vector was similarly constructed (details are shown in Supplemental Fig. S1B). Vectors (MLD[pQE31-spaR] and MLD[pQE31-spaK]) were transformed into JM109. For expression of the histidine-tagged proteins, the expression plasmids MLD[pQE31-spaK] and MLD[pQE31-spaR] were transformed into M15[pREP4] competent cells (Qiagen), and expressed according to the manufacturer's protocol. Expressed His-tagged proteins were purified on Ni-NTA resin (Novagen), which was prepared as a slurry and packed into a 1.6 cm column, and eluted proteins were dialyzed against a storage buffer and stored in 50-µl aliquots at -80°C. A working stock was stored for several weeks at -20°C. Protein concentrations were determined by Bio-Rad protein assay using the manufacturer's protocol.
Mutant SpaK proteins were prepared by Ana-Gen Technologies (Palo Alto, CA) using the Stratagene QuikChange Mutagenesis Kit. Synthetic forward and corresponding reverse complement oligonucleotide primers were prepared for each of two mutations introduced into SpaK (altered nucleotides are indicated in bold type): at position H247 the histidine was changed to glutamine using forward primer 5′-GTGCTTTGGCACAAGAGATCAAGATTCCG-3′ and reverse primer: 5′-CGGAATCTTGATCTCTTGTGCCAAAGCAC-3′, and at position G392 the glycine was changed to alanine using forward primer 5′-GTAAAAGACACGGCAAATGGATTTTCGG-3′ and reverse primer 5′-CCGAAAATCCATTTGCCGTGTCTTTTAC-3′.
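As a quick consistency check on the primer pairs listed above, each reverse primer should be the exact reverse complement of its forward primer. A minimal Python sketch follows; the helper name `reverse_complement` is ours, and the sequences are copied from the text.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# (forward, reverse) primer pairs as given in the text, 5' to 3'.
PRIMERS = {
    "H247Q": ("GTGCTTTGGCACAAGAGATCAAGATTCCG",
              "CGGAATCTTGATCTCTTGTGCCAAAGCAC"),
    "G392A": ("GTAAAAGACACGGCAAATGGATTTTCGG",
              "CCGAAAATCCATTTGCCGTGTCTTTTAC"),
}

for name, (forward, reverse) in PRIMERS.items():
    # True when the reverse primer pairs perfectly with the forward primer.
    print(name, reverse_complement(forward) == reverse)
```

Both pairs pass this check, as expected for complementary mutagenic primers.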
Phosphorylation reactions were performed with each histidine-tagged SpaK wild type and mutant protein in the absence and presence of histidine-tagged SpaR. Upon addition of 32P-labeled ATP, reaction mixtures were incubated for 20 minutes at room temperature, after which the reactions were stopped by addition of 5× phosphorylation sample buffer, then electrophoresed on a 12.5% SDS polyacrylamide gel. The gel was stained with Coomassie blue, dried, and autoradiographed using Kodak X-OMAT AR film.
Phosphorimage analysis was performed to quantify incorporation and turnover of phosphate in assays involving phosphorylation of 6xHis-SpaK. Four samples of protein were incubated in the presence of 32P-labeled ATP, of which three were followed by cold chase treatment with unlabeled 4 mM, 10 mM, or 50 mM ATP, using reaction conditions described previously. Samples were run on a 12.5% SDS-PAGE gel and subjected to autoradiography (not shown) and phosphorimaging. Image intensities of the radiolabeled-phosphorylated SpaK gel bands were analyzed using the Molecular Dynamics Phosphorimager 400.
Thin-layer chromatography was performed using Polygram Cell 300 PEI cellulose plates as described previously. 6xHis-SpaK and 6xHis-SpaR were incubated individually (SpaK) or in combination with 32P-labeled ATP in the absence or presence of EDTA. One-µl aliquots from each reaction were spotted onto TLC plates, and chromatography was carried out in 0.75 M KH2PO4, pH 3.75, after which the plate was dried and autoradiographed.
The AS2TS protein structure modeling system yielded over 30 and over 140 PDB structures suitable as templates for modeling SpaK and SpaR, respectively, from which were selected sets of the closest templates with sequence identities ranging from 13% to 28% for SpaK and 24% to 46% for SpaR (see Supplemental Data Tables S1, S2). LGA-mediated structure homology searches against the PDB using constructed structural models of domains from SpaK (SpaK_d1, SpaK_d2) and SpaR (SpaR_d1, SpaR_d2) yielded 6 domain-fusion templates with structural homology (i.e., similarity based on structure alignment) ranging from LGA_S=37% to 95%, and root mean square deviation (RMSD) calculated on superimposed C-alpha atoms ranging from 1.11 to 2.96 Å (Table 1). Identification of domain-fusion templates suggested that SpaK and SpaR interact, forming an interface between domain 2 of SpaK and domain 1 of SpaR. Sequence identities of SpaK and SpaR to corresponding template sequences ranged from 4% to 25%, but in no instance was sequence identity greater than 7% simultaneously to both SpaK_d2 and SpaR_d1. Structural comparison of all identified domain-fusion template structures showed that they clustered into two distinct conformations, yielding the following groups: (1) 1f51_AE and 2ftk_AE (Spo0F/Spo0B from B. subtilis), and (2) 1th8_AB, 1thn_AB, 1tid_AB and 1til_AB (SpoIIAB/SpoIIAA from B. stearothermophilus). PDB entry 2ftk was determined to be the optimal domain-fusion template for modeling a SpaK/SpaR complex based on the highest structure similarity to the corresponding two modeled domains, SpaK_d2 and SpaR_d1, and based on the expected intermolecular distance between the putative functional residues H247 of SpaK and D51 of SpaR, which were predicted as active site residues (His and Asp) critical for exchanging a phosphoryl group.
In order to form a covalent bond with the phosphoryl group, the distances between atoms N of His and O of Asp were expected to be in the range of about 5 Angstroms. The models created based on templates 1f51 and 2ftk satisfied this requirement. 2ftk was also used to complete the homology model of SpaK (Fig. 1) by providing relative positioning of the central (SpaK_d1) and C-terminal (SpaK_d2) domains. The SpaK/SpaR complex was modeled as a trimer, comprising a SpaK homo-dimer and a SpaR monomer, based on the domain conformation between chains A and E from 2ftk (Fig. 3). The constructed model of a SpaK/SpaR complex agreed with structural analysis of the Spo0F and Spo0B interaction reported by Varughese and coworkers, who showed that the geometry of Spo0F binding to Spo0B favors an associative mechanism for phosphoryl transfer. In order to visualize the autophosphorylation of the histidine kinase, and the subsequent phosphoryl transfer to Spo0F, they generated in silico models representing these reaction steps, proposing Spo0B as a model for the autokinase domain of KinA (histidine kinase, consisting of an N-terminal sensor domain and a C-terminal autokinase domain). The level of sequence identity between KinA and SpaK is about 27%, and the KinA sensor domain comprises three PAS (Per-Arnt-Sim) domains that correspond to the N-terminal part of SpaK (1–218; not modeled). The autokinase domain corresponds to the modeled C-terminal part (219–459) of SpaK, and consists of a phosphotransferase subdomain and an ATP binding subdomain. In modeling SpaK we followed Varughese and coauthors' suggestion that the four-helix bundle of Spo0B is formed through the dimerization of two helical hairpins from two monomers, and that it is a prototype for the phosphotransferase domains of histidine kinases (see Fig. 1A).
This concept is supported by the high degree of structure similarity between the C-terminal domain of Spo0B and the ATP binding domains of histidine kinases, as well as by a report  of the crystal structure of the entire cytoplasmic portion of a histidine kinase (a PDB structure, 2c2a), which we used as a primary template for modeling individual domains of SpaK.
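The His-to-Asp distance criterion used above to evaluate candidate templates can be illustrated with a short Python sketch. The coordinates are hypothetical placeholders rather than atoms from the SpaK/SpaR model, and the 1 Angstrom tolerance around the approximately 5 Angstrom target is an assumption made only for illustration.

```python
import math

TARGET_DISTANCE = 5.0  # Angstroms; approximate His N to Asp O distance from the text
TOLERANCE = 1.0        # Angstroms; assumed slack, not a value from the paper


def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) coordinates, in Angstroms."""
    return math.dist(a, b)


def within_transfer_range(his_n, asp_o, target=TARGET_DISTANCE, tol=TOLERANCE):
    """True if the His N / Asp O separation is at most target + tol."""
    return atom_distance(his_n, asp_o) <= target + tol


# Hypothetical coordinates for illustration only.
his_n = (0.0, 0.0, 0.0)
asp_o = (3.0, 4.0, 0.0)
far_asp_o = (30.0, 40.0, 0.0)

print(atom_distance(his_n, asp_o), within_transfer_range(his_n, asp_o))
print(within_transfer_range(his_n, far_asp_o))
```

A template whose modeled complex places the two atoms far apart, like the second pair here, would fail the check and would not support a direct phosphoryl transfer geometry.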
Inspection of the constructed SpaK/SpaR complex (Fig. 3A) allowed us to identify specific residues putatively involved in the interaction between SpaK and SpaR or believed to mediate transfer of phosphate from SpaK to SpaR (Fig. 3B). Specifically, we identified the histidine residue at position H247 in SpaK that corresponds to the histidine H30 that is phosphorylated in Spo0B (PDB entry 2ftk_A) (Table 2A), and we identified 3 aspartate residues in close proximity in SpaR (D8, D9, and D51), which we presumed to be involved in transfer of a phosphoryl group bound to the H247 residue of SpaK, if SpaK and SpaR truly mediate a phosphorelay as postulated. These residues corresponded to their equivalents (D10, D11, and D54) in Spo0F (PDB entry 2ftk_E) (Table 2B). Three additional functional residues were identified, which corresponded to functional residues that are highly conserved among response regulator proteins: T78, Y97, and K100 in SpaR, corresponding to T82, H101, and K104, respectively, of Spo0F (Table 2B). Under global superposition, the distances between corresponding functional residues were below 0.8 Angstroms, and the local RMSD(3) values (root mean square deviation over the main-chain atoms (N, CA, C, O), averaged over three residues, namely the current residue and its immediate neighbors along the peptide chain, under local superposition) were below 0.5 Angstrom, indicating significant structure similarity in corresponding regions. The sites of phosphorylation, D51 of SpaR and H247 of SpaK, which correspond to D54 of Spo0F and H30 of Spo0B, are shown in Figure 3.
In most histidine kinases the extracellular sensing domains are variable in sequence, reflecting the wide range of environmental signals to which they respond. Conversely, the cytoplasmic portions typically have a conserved catalytic core comprising a set of characteristic sequence motifs known as the H, N, G1, F and G2 boxes, and can be dissected into several distinct functional units. Corresponding functional units P1 through P5 were evident upon examination of residues 219 through 459 of our modeled SpaK protein (Fig. 1B), which were determined to comprise an N-terminal dimerization and histidine phosphotransfer domain (DHp; SpaK_d1) and a C-terminal catalytic and ATP-binding domain (CA; SpaK_d2). P1 had a conserved histidine residue (H247) belonging to the autophosphorylation site known as the “H box”. Autophosphorylation was presumed to occur from ATP in the active site of P4 (the kinase domain) to H247 of P1, followed by transphosphorylation from H247 to an aspartate residue (D51) of SpaR. P2 functional units have a specific domain for recognizing the response regulator and assisting transfer of the phosphoryl group. P3 corresponds to the linking domain, through which two SpaK subunits may form a dimer. P4 resembles the ATP binding domain, which autophosphorylates the conserved histidine residue. In histidine kinases most of the residues around the ATP binding site of the P4 unit are conserved, especially those comprising the characteristic sequence motifs (identified in Fig. 1B). In addition, the histidine kinase P4 unit has a loop-like lid (ATP lid) between the F and G2 boxes (corresponding to the SpaK model, residues 409 to 417), which controls the closed-to-open conformational change of the binding pocket. It is postulated that P5 acts as a regulative domain to modulate the activity of autotransphosphorylation, responding to signals from the external environment.
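The windowed main-chain RMSD described above can be sketched in Python as follows. This is a simplified illustration, not the LGA implementation: it assumes the two structures are already superimposed (it omits the local superposition step), pools the N, CA, C, and O atoms of the three-residue window into a single RMSD, and uses a hypothetical data layout and hypothetical coordinates.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def local_rmsd3(model, template, i):
    """RMSD over main-chain atoms of residue i and its immediate neighbors.

    model / template: lists of residues in corresponding order; each residue
    is a list of four (x, y, z) tuples for N, CA, C, O (assumed layout).
    The window is clipped at the chain ends.
    """
    lo, hi = max(0, i - 1), min(len(model), i + 2)
    atoms_a = [atom for residue in model[lo:hi] for atom in residue]
    atoms_b = [atom for residue in template[lo:hi] for atom in residue]
    return rmsd(atoms_a, atoms_b)


# Hypothetical coordinates: the template is the model shifted by 0.3 along x.
model = [[(float(r), float(a), 0.0) for a in range(4)] for r in range(5)]
template = [[(x + 0.3, y, z) for (x, y, z) in residue] for residue in model]
print(round(local_rmsd3(model, template, 2), 3))
```

Because every atom in this toy example is displaced by the same 0.3 Angstrom, the windowed value equals the displacement; with a proper local superposition a rigid shift like this one would instead be removed.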
To examine sequence homology in structure context between SpaK and various histidine kinases in the 5 “box” regions, we used LGA to globally align the SpaK homology model with all other histidine kinases from PDB that have these structure motifs. Structures with corresponding “box” regions included 2ftk_A, 1tid_A, 1b3q_A, and 2ch4_A. In Table 3 are shown structure-based alignments, including residue-residue correspondences, between our SpaK model (based on 2c2a) and 2ftk_A in the H-box regions, and between SpaK and 2ch4_A in the N-, G1-, F-, and G2-box regions. Calculated structural alignments between our SpaK model and the PDB structures (including those not shown) indicated significant structure conservation within these defined sequence motifs. The residue-residue correspondences arising from the LGA structure alignments were consistent with respect to highly conserved residues identified by Stock and coworkers  and by Grebe and Stock  (see bold-type residue-residue correspondences in Table 3), even in the more variable F-box regions. Within group HPK-3c, a small group of histidine kinases into which Grebe and Stock  classified SpaK, most histidine kinases have an F at the position corresponding to T404 in SpaK, whereas SpaK T404 corresponds to a T in some proteins in group HPK 1a. Furthermore, SpaK F407-Y408 has identity to the corresponding F-box FY in most proteins in group HPK 1a. As group HPK 3c is closely related to group HPK 1a, it is not surprising that there is ambiguity with respect to residue-residue correspondences within the relatively variable F box among the proteins in these two groups. Based on this ambiguity, we examined the alpha-carbon structure alignment between the SpaK model and 2ch4_A to verify that the side chains of the corresponding SpaK Y408 and 2ch4_A F491 were well aligned (not shown), which further supported the residue-residue correspondence between these two residues. 
Protein CheA (2ch4) is classified in group HPK 9, and as such the sequence alignment also shows an F in the position corresponding to SpaK Y408.
To confirm whether SpaK undergoes auto-phosphorylation and subsequently transfers a phosphate moiety to SpaR, each protein was tested individually and in combination in the presence of radio-labeled ATP (Fig. 4). Combinations of 6xHis-SpaK and 6xHis-SpaR were created using 3 SpaK:SpaR molar ratios of 4:1, 4:3, and 1:2, shown in Fig. 4 A and B, lanes 3, 4, and 5, respectively. Only SpaK was phosphorylated in isolation (Fig. 4B lanes 1, 2), indicating that SpaK undergoes autophosphorylation. Phosphorylation of SpaR in the presence of SpaK (Fig. 4B lanes 3–5) indicated that phosphate is transferred from SpaK to SpaR. This transfer was incomplete at a SpaK:SpaR molar ratio of 4:1, but reached completion at molar ratios of 4:3 and 1:2, indicating that transfer of phosphate from SpaK to SpaR reaches saturation as SpaK approaches molar equivalence or reaches molar excess relative to SpaR. These results imply that SpaR acts as a receptor for the phosphate group that is transferred from SpaK.
Quantification of radio-labeled phosphate-bound 6xHis-SpaK was performed to determine whether SpaK might exhibit phosphatase activity (Fig. 4C). Phosphor image analysis was used to measure the incorporation of radio-labeled phosphate by 6xHis-SpaK (Fig. 4C, histogram 1). This quantity served as baseline (100%) for comparison of 6xHis-SpaK samples that had been incubated in radio-labeled Pi followed by cold-ATP chase treatments (Fig. 4C, histograms 2–4). Cold chase with lower concentrations of ATP (4 mM or 10 mM) reduced the level of radio-labeled SpaK to levels about one-third to one-quarter that of the control, whereas a high concentration (50 mM) of unlabeled ATP resulted in a decrease in the rate of phosphate turnover, thereby reducing the level of radio-labeled SpaK only to about 70% that of the control. The decrease in the turnover of radio-labeled Pi on SpaK at high ATP concentration is suggestive of enzymatic inhibition of dephosphorylation (or phosphatase activity) rather than simple hydrolysis.
Thin-layer chromatography was performed to further examine the possibility that either SpaK or SpaR may exhibit phosphatase activity (Fig. 4D). Protein consisting of 6xHis-SpaK alone (Fig. 4D, lane 2) or 6xHis-SpaK in combination with 6xHis-SpaR (lane 3) was phosphorylated in the presence of radio-labeled ATP. In both cases, inorganic phosphate (Pi) was detected, but slightly more Pi and considerably more radio-labeled protein were detected when both proteins were present (compare Pi and Protein in lanes 2 and 3). The ATP-only control (lane 1) produced no detectable radio-labeled Pi, indicating that simple hydrolysis of ATP was not occurring. Furthermore, when phosphorylation was performed in the presence of EDTA, some phosphorylated protein was observed, although no inorganic phosphate was detected (Fig. 4D lane 4). This result, taken together with Fig. 4C, which suggested the presence of enzymatic phosphatase activity, supports the claim that SpaK (and possibly also SpaR) may possess enzymatic phosphatase activity.
Based on amino acid sequence alignment with other histidine kinases, the highly conserved histidine at position H247 was presumed to be the site of possible auto-phosphorylation, and a glycine located at position G392 in the C-terminal end of SpaK was determined to correspond to the conserved DXG motif of the nucleotide binding domain in related histidine kinases (Fig. 1A, Fig. 1B: H box and G1 box). In the superfamily of phosphotransferases, the conserved residues that form a corresponding motif (DXG in actin, GTG in hexokinase/glycerol kinase, and GNG in acetate and propionate kinases) are observed to bind the α- and β-phosphate groups of the nucleotide. Because several histidine kinases are believed to exist as homo-dimers and it is believed that phosphorylation occurs in trans, in which one monomer binds ATP in the nucleotide-binding domain and then transfers the phosphoryl group to a histidine located in the other monomer, we postulated that mutations at either of these positions might reduce or abolish auto-phosphorylation of SpaK, but that complementation between mutants might occur, effectively restoring function. We used site-directed mutagenesis to construct two mutants (see Materials and Methods): one in which the histidine at position H247 was changed to a glutamine (H247Q), and the other in which the glycine at position G392 was changed to alanine (G392A). Locations of mutated residues are shown in Fig. 1A. Phosphorylation studies of mutants H247Q and G392A revealed that both mutations resulted in loss of phosphorylation when each mutant was tested individually (Fig. 5 A, B; lanes 4, 5) or when individually combined with SpaR (Fig. 5B; lanes 9, 10). However, when the mutant proteins were combined, a detectable amount (approximately 25% that of wild type) of auto-phosphorylation was observed (Fig. 5B, lane 6), suggesting that complementation between the mutants had occurred, and supporting the hypothesis that SpaK forms a homo-dimer.
Furthermore, when H247Q and G392A together were subjected to phosphorylation in the presence of wild type SpaR, the phosphoryl moiety was transferred to SpaR (Fig. 5B, lane 12).
In this work we demonstrated a quantitative approach for modeling protein-protein complexes using homology modeling followed by structure-based searches for multi-domain template proteins. In a search for templates upon which to base the model of a putative SpaK/SpaR complex, we used LGA, which applies two scoring schemes: GDT (global distance test) and LCS (longest continuous segment). Based on a previous study involving structure alignments between weakly homologous proteins, we applied a relatively stringent cutoff (LGA_S>=35%). Pettitt and coworkers concluded that in order to assure the quality of a structure alignment between two domains, the GDT_TS score (a component of LGA's GDT) must exceed 25. In the current study we had observed a rapid increase in the number of hits obtained using LGA_S=33% and below (not shown), and therefore we selected LGA_S=35% as a conservative cutoff to assure confidence in selecting templates.
Although our approach can be used to identify domain-fusion protein structures that imply a possible functional association between two proteins of interest, it does not in itself provide sufficient information for modeling a physical interaction between the proteins. Protein domains that have greater than 30–40% sequence homology to a “domain-fusion” template are likely to assume a similar orientation, but at sequence identity levels below this “interaction similarity twilight zone”, additional analysis is needed to make a reasonable prediction regarding the relative orientation of the interacting domains. In the current study, this additional analysis included identification and inspection of putative functional residues coupled with experimental analysis of mutant proteins. Thus, a protein-protein-complex model for a SpaK/SpaR interaction was initially built based on a structure-driven domain-fusion search method, followed by validation based on bioinformatic analysis and experimentation.
Our modeling effort supported the hypothesis that SpaK and SpaR may function as a histidine kinase sensor and a response regulator, respectively, in a two-component system. Based on homology modeling and domain-fusion analysis, residues corresponding to those believed to function in phosphorylation and subsequent transfer of a phosphate moiety from sensor to response regulator in other two-component systems were identified (Fig. 3, Tables 1, 2). Modeling of SpaK enabled structure comparisons with related sensor proteins (2ftk_A, 1tid_A, 1b3q_A, 2ch4_A), identification of sequences corresponding to the 5 highly conserved regions (“boxes”) that characterize class II two-component system proteins (Table 3), and mapping of these sequences to the homology model of SpaK (Fig. 1B). Functional residues and conserved sequence motifs of our modeled SpaK/SpaR complex matched well with those of known sensor/response-regulator systems. Structure-based residue-residue correspondences (Tables 2, 3) agreed with sequence alignments used previously to classify histidine kinases, in which SpaK was placed in group HPK 3c in an 11-group classification by Grebe and Stock, but was unclassified according to the 5-type classification of Kim and Forst.
Phosphorylation studies of SpaK and SpaR showed that SpaK auto-phosphorylates and subsequently trans-phosphorylates SpaR (Fig. 4), confirming the hypothesis based on structure-driven domain-fusion analysis that SpaK and SpaR are functionally related and physically interact, and that the quaternary structure of the complex could enable transfer of a phosphate moiety between the protein subunits. Phosphorylation and complementation analyses using SpaK mutants suggested that residues H247 and G392 are important for auto- and trans-phosphorylation and that SpaK likely forms a dimer in which ATP binding and hydrolysis functions are split between the protomers (Fig. 5). Whereas both SpaK mutants (H247Q and G392A) were deficient in auto-phosphorylation (Fig. 5, lanes 4, 5), this function was apparently restored when the mutants were combined (Fig. 5, lane 6), suggesting that complementation had occurred between the mutants. Complementation between H247Q and G392A also apparently restored trans-phosphorylation, as evidenced by phosphorylation of SpaR in the presence of both mutants (Fig. 5, lane 12). In an equimolar mixture of mutants H247Q and G392A, one would expect that approximately one-half of the resulting dimers would comprise a protomer of each mutant. Furthermore, phosphorylation would occur from the H247Q mutant to the G392A mutant, but not in the other direction, since G392A should not be able to bind ATP. Therefore the levels of auto-phosphorylation or trans-phosphorylation would not be expected to exceed one-half those of wild type SpaK. Also, although the H247Q/G392A mixed dimer may have had restored function, it would be expected to have functioned at less than the efficiency of a wild type SpaK dimer; because non-productive dimers (H247Q/H247Q and G392A/G392A) would also form, one would expect phosphorylation to proceed more slowly than in the wild type.
This is consistent with the observation that phosphorylation of or by H247Q combined with G392A (lanes 6, 12) occurred at levels considerably below those of wild type SpaK (lanes 3, 8).
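The expectation that about one-half of the dimers in an equimolar mixture are mixed follows from random pairing of protomers. The short sketch below works through that arithmetic under assumptions that were not tested here: pairing is random, and the two mutants dimerize with each other as readily as with themselves. The function name and the labels "A" and "B" (standing for the two mutants) are illustrative only.

```python
from fractions import Fraction


def dimer_fractions(p_a):
    """Fractions of AA, AB, and BB dimers assuming random protomer pairing.

    p_a: fraction of protomers that are mutant A; the remainder are mutant B.
    """
    p_a = Fraction(p_a)
    p_b = 1 - p_a
    return {"AA": p_a * p_a, "AB": 2 * p_a * p_b, "BB": p_b * p_b}


# Equimolar mixture of the two mutants.
equimolar = dimer_fractions(Fraction(1, 2))
print(equimolar["AB"])  # share of mixed (potentially complementing) dimers
```

Under these assumptions the mixed-dimer share is largest for an equimolar mixture, so any departure from equal amounts of the two mutants would lower the fraction of dimers capable of complementation.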
In modeling the interaction between SpaK and SpaR we identified 6 suitable domain-fusion templates (Table 1), which were structurally clustered into two groups (see Results), each having a distinct conformation. Both groups displayed the same interaction pose with respect to the domain-domain interaction. Although each of the identified domain-fusion templates would have yielded a SpaK/SpaR complex model consistent with the experimental data, the criteria for selecting 2ftk as the domain-fusion template were based on combined structural identities between domains of 2ftk and the SpaK and SpaR models, on the resulting distance between putative functional residues involved in phosphate transfer (Fig. 3), and on the presence of a helical bundle domain, which enabled construction of a complete model. Interestingly, the domain-domain conformation between the helical bundle and the ATPase domains of 2c2a, used for modeling SpaK, differed from that of the corresponding domains within 2ftk. This difference suggests the possibility that a conformational change might take place when SpaK interacts with SpaR. Furthermore, it should be noted that the phospho-transfer in Spo0B-Spo0F (2ftk) occurs in the opposite direction (Asp to His) from that demonstrated here in SpaK-SpaR (Figs. 4, 5). This is not surprising, and does not diminish the value of 2ftk as a template for modeling a SpaK/SpaR interaction, given the considerable mechanistic diversity observed among structurally conserved domains comprising sensor/response-regulator systems.
Although structure modeling and experiments involving phosphorylation studies strongly suggest functional and physical interactions between SpaK and SpaR, we cannot be entirely certain that our quaternary structure is correct with respect to domain composition, conformation, or orientation, as the methodology is dependent on existing structural data within PDB. It is possible that none of the domain-fusion templates detected by our approach is truly representative of the physical interaction between SpaK and SpaR, as homology modeling is, by definition, data driven. Due to the low sequence homologies between SpaK and SpaR and the identified domain-fusion templates, one could not conclude with any degree of certainty based solely on template identification that the interaction pose modeled here is likely to be correct. However, combining bioinformatics analysis of known functional motifs (sequence “boxes”) and putative interacting residues with experimental evidence of function allows us to assert the value of the homology model of a putative SpaK/SpaR protein-protein complex. Our approach detects existing putative domain-fusion templates, which may suggest testable hypotheses regarding quaternary structure and function; a structure-based approach for identification of “Rosetta Stone” proteins greatly enhances structure-function hypothesis generation by providing structural context for putative functional residues. Additional bioinformatics analyses of a putative protein-protein complex model, which may verify the correctness of the model, include, for example, alignments of modified sequence profiles, which use quantitative methods applied at the domain-domain interface to evaluate the likelihood of a stable interaction.
Although many two-component signal transduction systems have been identified by sequence homology, we wish to point out that a purely sequence-based approach would not have yielded the structural domain-fusion templates that were identified in this study. The strength of our approach is in its ability to identify putative domain-fusion templates based on structure homology searches in cases where sequence identities between the proteins of interest and the putative domain-fusion templates are low. Sequence identities of candidate domain-fusion templates to domains of SpaK and SpaR ranged from 4% to 25%, but in no instance was sequence identity greater than 7% simultaneously to both (Table 1). This point is emphasized by the lack of sufficient sequence-based evidence for linking these proteins using the standard domain-fusion approach: as of this writing, SpaK and SpaR are not linked in this way, for example, in Prolinks, nor did we find them linked by other sequence-based or empirical methods in DIP, BIND/BOND, MIPS, IntAct, MPIDB, or InterPreTS. Homology modeling of SpaK and SpaR using a standard methodology and subsequent structure-based searches using a quantitative structure comparison algorithm is what enabled a more sensitive, structure-based homology search against PDB. In conclusion, our method provides a basis upon which a high-throughput system for identification of putative protein-protein interactions could be built on a whole-genome scale.
Construction of vectors for expression of SpaK and SpaR proteins. A) Expression vectors pQE-31-spaK. B) pQE-31-spaR.
(0.38 MB TIF)
Candidate templates for homology modeling of SpaK monomer.
(0.06 MB PDF)
Candidate templates for homology modeling of SpaR.
(0.06 MB PDF)
The authors have declared that no competing interests exist.
Prepared by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. The bioinformatics work was supported by an LLNL-LLNS internally funded grant to CZ and AZ through the Laboratory Directed Research and Development program, and the experimental work was supported by grant R01-AI24454-12 to NH from the National Institute of Allergy and Infectious Diseases, NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.