We begin by considering the PhoB dimer as it interacts with DNA, for which no complete structure exists. In the crystal structure of the PhoB ED dimer bound to pho box DNA (PDB accession code: 1GXP [7], shown as magenta and white molecules in Figure ), the direct repeats of the DNA force the ED dimer to bind with a tandem symmetry. The known structure of the PhoB RD dimeric complex [8] (PDB accession code: 2JB9), however, follows a two-fold rotational symmetry. While it is possible to simply rotate one of the EDs relative to the RD to make a complex satisfying both structures, this procedure results in a tightly stretched linker, asymmetry between the two PhoBs, and an RD-ED interface that has to be fabricated from scratch. Alternatively, we examine the variety of response regulator structures that contain RD and ED together (PDB accession codes: 1KGS, 1P2F, 1YS6, 2GWR, 2OQR, 1A04, 1YIO). These structures contain the information of RD/ED MBGs and demonstrate that the two domains can interact with a variety of binding geometries.
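The difference between a two-fold rotational symmetry and a tandem symmetry can be made concrete through the rigid-body rotation that maps one subunit onto the other: a two-fold axis corresponds to a rotation of 180°, whereas a tandem (head-to-tail) arrangement does not. The following is a minimal sketch, not part of the modeling procedure itself; the function names, the tolerance, and the example matrices are illustrative assumptions.

```python
import math

def rotation_angle_deg(R):
    """Rotation angle (degrees) of a 3x3 rotation matrix given as nested lists.

    Uses the identity trace(R) = 1 + 2*cos(theta).
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against round-off before acos.
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(c))

def is_two_fold(R, tol_deg=10.0):
    """True if the subunit-to-subunit rotation is within tol_deg of 180 degrees.

    tol_deg is a hypothetical tolerance chosen for illustration only.
    """
    return abs(rotation_angle_deg(R) - 180.0) <= tol_deg

# Example matrices (illustrative, not taken from any PDB entry):
TWO_FOLD_Z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # 180 deg about z
PURE_TRANSLATION = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no rotation

print(is_two_fold(TWO_FOLD_Z))        # two-fold related pair
print(is_two_fold(PURE_TRANSLATION))  # tandem-like pair (translation only)
```

In practice the rotation matrix would come from superimposing one subunit onto the other; that step is omitted here.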
Combining the information of RD/ED MBGs with the structure of the ED/ED dimeric complex (1GXP), we explore the potential solutions for the PhoB dimeric complex. Of the RD/ED conformations, only that of DrrB [20] (1P2F, shown as the red and the blue molecules in Figure ), a PhoB/OmpR homolog, provides a satisfactory solution in which the two RDs are in contact but not overlapping. Combining the structural information of ED/ED (1GXP), RD/ED (1P2F), ED (1GXP) and RD (2JB9), we develop the model of the PhoB dimeric complex (shown as the white and magenta molecules bound to DNA in Figure ). This model structure has appealing features, including good stereochemistry (no clashes between domains, stable interface surface area), protein-like structure (contents of secondary structures, density, etc.) and several of the known MBGs.
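The criterion "in contact but not overlapping" can be expressed as a simple distance test between the atoms of the two placed domains. The sketch below is one possible way to phrase that test, assuming atom coordinates are available as (x, y, z) tuples in Å; the cutoff values and function names are illustrative assumptions and are not taken from the modeling procedure described here.

```python
import math

def min_distance(coords_a, coords_b):
    """Smallest interatomic distance (in Å) between two coordinate sets."""
    return min(math.dist(a, b) for a in coords_a for b in coords_b)

def classify_placement(coords_a, coords_b, clash_cutoff=2.5, contact_cutoff=5.0):
    """Label a pair of placed domains by their closest approach.

    clash_cutoff and contact_cutoff are hypothetical values chosen for
    illustration; a real check would use atom-type-dependent radii.
    """
    d = min_distance(coords_a, coords_b)
    if d < clash_cutoff:
        return "overlapping"
    if d <= contact_cutoff:
        return "in contact"
    return "separated"

# Toy coordinates (not from any structure), three atoms per domain.
domain_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
domain_2 = [(7.0, 0.0, 0.0), (8.5, 0.0, 0.0), (10.0, 0.0, 0.0)]

print(classify_placement(domain_1, domain_2))  # closest approach is 4.0 Å
```

A candidate RD/ED geometry would pass this kind of filter only if the two RDs fall in the "in contact" class.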
The PhoB dimer in the modeled complex, however, contains a previously unseen interface between the RDs, because its tandem head-to-tail orientation differs from the two-fold symmetry observed in the PhoB RD/RD dimer (2JB9). The next question is whether the new MBG between the two RDs in the model exists in other protein domains of a similar fold. To answer this question, we search for interfaces between domains that have the flavodoxin-like fold and that relate the two domains by a tandem symmetry. Interestingly, the CheY (a chemotaxis protein) molecules of the two CheY-P2 heterodimers in the crystal asymmetric unit [21] (PDB accession code: 1FFG) follow a tandem symmetry. This contact of the two CheYs (1FFG) in the crystal is very similar to that of the PhoB dimeric RDs as shown in Figure . While this particular CheY dimeric arrangement may not be functionally relevant for the CheY-CheA interaction, it does provide a potential MBG for the interaction of flavodoxin-like molecules.
We now turn our attention to the transcription initiation complex. We choose the transcription initiation complex with DNA and the Catabolite Activator Protein bound to it (PDB ID: 3IYD) as a template for our model. The DNA duplex can serve as a structural link and allow the assembly of all the components into one functional unit. All the proteins in the complex either contact the DNA directly (i.e., α-subunit, σ-subunit, PhoB) or are linked to it through contacts with other molecules (i.e., β-subunit, β'-subunit, ω-subunit). The DNA molecule that we select for this study is the E. coli K-12 PhoA promoter (400854 to 400950 bp) with both σ-subunit and PhoB binding sites (information derived from RegulonDB [22]). To enable comparison, the sequences of the two promoters (CAP and PhoA) are shown in Figure with the CAP promoter (as found in 3IYD) shown on the top and the PhoA promoter shown at the bottom. The protein binding sites on the two promoters are highlighted in boxes. The main difference between the two promoters is the relative binding locations for the two factors. The CAP binding sites are located upstream of the -35 site, while the PhoB binding sites overlap with the -35 site. A structural concern was whether the -35 site and the two PhoB binding sites can be occupied simultaneously. When these binding sites are occupied simultaneously, our model predicts a set of interactions between the RNAP and the two PhoB molecules.
In addition to the difference in the binding sites, changes in the DNA from 3IYD are required because the CAP dimer binds and bends the promoter DNA much more than the PhoB dimer does. Therefore, the promoter region of the DNA in the PhoB transcription initiation complex has to be remodeled from the template structure (3IYD). Using a "motif modeling approach" as described in our earlier work [23], the structure of the DNA upstream of this overlapping region (including the PhoB binding sites) can be modeled using the structure of DNA from the PhoB ED/DNA complex (from 1GXP). This promoter DNA is extended upstream with a piece of canonical DNA duplex to accommodate the binding of the α-subunit C-terminal domain (CTD). As a comparison, we have modeled the same piece of DNA upstream of this overlapping region using only a piece of canonical B-form DNA duplex. The template DNA (from 3IYD), the remodeled promoter DNA for the PhoB transcription initiation complex, and the upstream DNA in a canonical B-duplex conformation are shown in Figure in white, magenta, and cyan respectively.
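To illustrate what extending the promoter with "canonical" B-form duplex means geometrically, the sketch below places one marker point per base pair on an ideal right-handed helix. It is only a schematic: the rise (3.4 Å per base pair), twist (36° per base pair, i.e. 10 base pairs per turn) and marker radius (8.9 Å) are commonly quoted approximate values for idealized B-DNA and are assumptions here, not parameters reported in this work, and the function name is hypothetical.

```python
import math

def ideal_b_dna_trace(n_bp, rise=3.4, twist_deg=36.0, radius=8.9):
    """Return one (x, y, z) marker per base pair on an ideal right-handed helix.

    The helix axis is the z axis. rise (Å), twist_deg and radius (Å) are
    approximate textbook values for idealized B-DNA, used as assumptions.
    Only a single-strand marker trace is generated, not a full atomic model.
    """
    points = []
    for i in range(n_bp):
        phi = math.radians(i * twist_deg)
        points.append((radius * math.cos(phi), radius * math.sin(phi), i * rise))
    return points

# Eleven markers span exactly one helical turn with the default twist.
trace = ideal_b_dna_trace(11)
for x, y, z in trace:
    print(f"{x:8.3f} {y:8.3f} {z:8.3f}")
```

A full atomic extension would additionally require base-pair templates and a superposition onto the terminal base pairs of the existing duplex, which are beyond this sketch.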
After the structure of the promoter DNA duplex is remodeled, the corresponding proteins can be assembled back into the PhoB transcription initiation complex using the information of their MBGs with their target sites on the DNA (Additional file 1). With the remodeling of the promoter DNA, the positions and orientations of α-CTD and σ-CTD are different from those in the template structure. The connecting loops between the N-terminal domain (NTD) and CTD of the α- and σ-subunits also need to be changed accordingly [24]. The resultant structure (shown in Figure ) has the subunits interacting but not overlapping with each other, a necessary condition for complex structural modeling. According to the model, α-CTD, σ-CTD as well as a segment (residues 839 to 917) of the β-subunit are in direct contact with the two PhoB molecules in the complex. To improve the stereochemistry between the interacting subunits, the remodeled portions of the complex, including the promoter DNA, the PhoB dimer, the α-CTD, the σ-CTD and residues 839 to 917 of the β-subunit, were subjected to a refinement procedure using AMBER [25].
The energy-refined structure of this portion of the PhoB transcription initiation complex is shown in Figure and a coordinate file is available as supplementary material. The clearest self-consistency check from our model is that the overlapping binding sites covering the -35 region allow the simultaneous binding of the PhoB dimer and the σ-CTD without violating the volume exclusions for all the molecules involved in the binding. Both α-CTD and σ-CTD interact directly with one of the PhoB molecules (shown in red in Figure ) that binds to the site upstream of the -35 region. For a more detailed check on the validity of our model, we note that the residues at the interface between these molecules include R-586, Q-589, I-590, A-592, K-593 from the σ-CTD; D-258, V-264, A-267, N-268 from the α-CTD; and W-184, G-185, V-190, E-191, D-192 from PhoB (as highlighted in Figure ). This result is consistent with the four PhoB residues (W-184, G-185, V-190 and D-192) identified as being involved in polymerase binding by a mutation study [26]. The residues on the two PhoB molecules that interact directly with α-CTD, β-subunit and σ-CTD are annotated in Figure . Our results indicate that both the RD and ED domains of the two PhoB molecules in the dimer interact with the RNAP/σ-subunit of the transcription initiation complex. The Solvent Excluded Surface Areas for PhoB-a/α-subunit, PhoB-a/σ-subunit and PhoB-b/β-subunit are 2,867 Å², 1,098 Å² and 2,165 Å² respectively. These values fall within the range (639 Å² to 3,228 Å²) [27] observed for heterocomplexes in the PDB.
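The comparison of the three interface areas with the reference range is a simple interval check, written out below using only the numbers quoted in the text. The dictionary keys and variable names are illustrative.

```python
# Interface areas (Å^2) quoted in the text for the modeled complex.
INTERFACE_AREAS = {
    "PhoB-a/alpha-subunit": 2867.0,
    "PhoB-a/sigma-subunit": 1098.0,
    "PhoB-b/beta-subunit": 2165.0,
}

# Range (Å^2) quoted in the text for heterocomplexes from the PDB.
REFERENCE_MIN, REFERENCE_MAX = 639.0, 3228.0

def within_reference(area, lo=REFERENCE_MIN, hi=REFERENCE_MAX):
    """True if an interface area lies inside the quoted reference range."""
    return lo <= area <= hi

results = {name: within_reference(a) for name, a in INTERFACE_AREAS.items()}
for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'outside'} reference range")
```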
Off-the-shelf software exists for docking proteins or protein domains into complexes or full proteins (e.g., ZDOCK [28], AutoDock [29], RosettaDock [30]). These programs apply different sampling approaches and scoring functions with various degrees of success (e.g., see CAPRI [31] assessments). These docking procedures seem to work best if the interaction between the components is strong and/or there exists a global binding minimum. As a quick comparison, we downloaded one of these programs, ZDOCK, and generated 2,000 structures (MBGs) docking the two domains RD (2JB9, residues 3-123) and ED (1GXP, residues 127-229) to derive the PhoB structure. The two domains (RD and ED) of the PhoB molecule are separated by a loop of four peptide groups, which places a physical limit on the distance the loop can span. If the cut-off length for a four-residue loop is set to 14 Å (approximately corresponding to a completely extended conformation), only 43 (about 2%) of the 2,000 MBGs satisfy the connection criterion. Within the top 100 MBGs, structures 21 and 96 are the only two that allow the RD-ED connection. When PhoB-PhoB dimer structures are modeled from these two ED-RD MBGs and the structure of the ED-ED-DNA complex (1GXP), neither is stereochemically feasible because of domain overlap, including protein-protein and protein-DNA clashes. If all the MBGs of the two domains from the docking study are compared to the MBG from our model, the closest is structure 1,934, with an RMSD of 4.0 Å (based on Cα atoms only). Overall, the docking procedure is inefficient here (only ~2% of the docked structures satisfy the connectivity constraint). Selecting the relevant PhoB structure out of the large pool of potential MBGs from the docking study is also a non-trivial task.
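The two checks applied to the docked poses, the linker-length filter and the Cα RMSD to the model, can be sketched as follows. This is a minimal illustration under stated assumptions: the linker distance is taken between the Cα atoms of the last RD residue (123) and the first ED residue (127), the RMSD is computed without any refitting (both poses are assumed to already share a reference frame, for example with the RD superimposed), and all function names and toy coordinates are hypothetical.

```python
import math

LINKER_CUTOFF = 14.0  # Å, cut-off for the RD-ED loop quoted in the text

def linker_feasible(rd_c_term_ca, ed_n_term_ca, cutoff=LINKER_CUTOFF):
    """True if the gap between the two Cα atoms can be spanned by the loop."""
    return math.dist(rd_c_term_ca, ed_n_term_ca) <= cutoff

def filter_poses(poses, cutoff=LINKER_CUTOFF):
    """Keep the ids of poses that satisfy the connectivity criterion.

    poses: iterable of (pose_id, rd_c_term_ca, ed_n_term_ca) tuples,
    where the coordinates are (x, y, z) in Å.
    """
    return [pid for pid, rd_ca, ed_ca in poses if linker_feasible(rd_ca, ed_ca, cutoff)]

def ca_rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally ordered Cα coordinate lists, no refitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy poses (not ZDOCK output): only the first spans a gap of 14 Å or less.
toy_poses = [
    (1, (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
    (2, (0.0, 0.0, 0.0), (25.0, 0.0, 0.0)),
]
print(filter_poses(toy_poses))
print(ca_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]))
```

With real docking output, the Cα coordinates would be read from each pose file, and the survivors of `filter_poses` would then be screened for clashes before any RMSD comparison.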