A major obstacle to elucidating the molecular basis of transcriptional regulation is the lack of a detailed understanding of the interplay between non-specific and specific protein–DNA interactions. Based on molecular dynamics simulations of C2H2 zinc fingers (ZFs) and engrailed homeodomain transcription factors (TFs), we show that each of the studied DNA-binding domains has a set of highly constrained side chains in preset configurations, ready to form hydrogen bonds with the DNA backbone. Interestingly, domains that bury their recognition helix in the major groove have an electrostatic hot spot for Cl− ions located in the same binding cavity as the most buried DNA phosphate. The spot is characterized by three protein hydrogen bond donors, often including two basic side chains. When bound, Cl− ions, likely mimicking phosphates, steer the side chains that ultimately form specific contacts with bases into bound-like conformations. These findings are consistent with a multi-step DNA-binding mechanism in which a pre-organized set of TF side chains assists in the desolvation of phosphates into well-defined sites, prompting the reorganization of specificity-determining side chains into conformations suitable for recognizing their cognate sequence.
Understanding the interactions responsible for DNA recognition is critical to revealing the mechanisms of several cellular processes, including transcription, replication, modification and restriction. Although the target sequence of a DNA-binding protein is often surrounded by long stretches of non-specific genomic DNA, proteins find their binding sites very efficiently. It has long been accepted that proteins search for their binding sites by combining one-dimensional (sliding) and three-dimensional (hopping) search (1,2).
Initially, specific DNA recognition was thought to involve a limited number of hydrogen bonds between protein side chains and DNA bases (3). It has since become clear that, besides electrostatics, water molecules and solvation effects (4), shape complementarity between protein and DNA, sequence-dependent DNA deformability and the physiological environment can also play critical roles in DNA recognition (5–11). The growing number of specific protein–DNA complex structures determined in recent years has been instrumental in our understanding of how proteins recognize specific DNA sequences (12). Molecular dynamics (MD) simulations of protein–DNA complexes have provided insights into the dynamics of these interactions and the role of water at the complex interface (13–16). However, the molecular basis of the events leading to protein–DNA recognition and binding specificity is not yet fully understood. Part of the problem is that, when DNA is involved, interactions are dominated by charged and polar groups that depend strongly on the solvent and ionic environment (5,17–24). Positively charged counter ions associate with the negatively charged phosphate groups of nucleic acids, maintaining electroneutrality in solution. Theoretical studies of protein–DNA complexes that focus on the effects of counter ions have mainly used two approaches: counter ion condensation (CC) (25,26) and Poisson–Boltzmann (PB) (27) theories. The main difference between them lies in how they describe salt effects around the nucleic acid. CC theory considers two distinct counter ion populations: a condensed layer of uniform concentration around the DNA and a more distant, salt-dependent classical ion atmosphere. PB theory, on the other hand, describes the ionic environment as a continuum.
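As a rough illustration of the CC picture, the sketch below evaluates Manning's charge density parameter ξ = l_B/b, where l_B is the Bjerrum length and b the axial spacing between charges, and the fraction of phosphate charge neutralized by condensed monovalent counter ions, θ = 1 − 1/ξ (condensation only occurs for ξ > 1). The inputs are textbook assumptions rather than values from this study: water at 298.15 K with ε_r = 78.5, and b = 1.7 Å for B-DNA (two phosphates per 3.4 Å rise).

```python
import math

# Physical constants (SI units)
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K


def bjerrum_length(eps_r: float = 78.5, temperature: float = 298.15) -> float:
    """Distance (m) at which two unit charges interact with energy k_B*T."""
    return E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * eps_r * K_B * temperature)


def manning_fraction_neutralized(charge_spacing: float, valence: int = 1,
                                 eps_r: float = 78.5,
                                 temperature: float = 298.15) -> float:
    """Fraction of polyion charge neutralized by condensed counter ions,
    theta = 1 - 1/(valence * xi) with xi = l_B / b (Manning limiting law).
    Returns 0 below the condensation threshold valence * xi <= 1."""
    xi = bjerrum_length(eps_r, temperature) / charge_spacing
    if valence * xi <= 1.0:
        return 0.0
    return 1.0 - 1.0 / (valence * xi)


if __name__ == "__main__":
    b_dna = 1.7e-10  # assumed axial spacing of phosphate charges in B-DNA, m
    l_b = bjerrum_length()
    print(f"Bjerrum length: {l_b * 1e9:.3f} nm")
    print(f"xi = {l_b / b_dna:.2f}")
    print(f"theta = {manning_fraction_neutralized(b_dna):.2f}")
```

With these assumptions l_B ≈ 0.71 nm, ξ ≈ 4.2 and θ ≈ 0.76, i.e. roughly three-quarters of the DNA charge is neutralized by the condensed layer, independently of bulk salt in the limiting-law picture.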
In protein–protein interactions, the role of ions is described by the screening of long-range electrostatics, consistent with classical Debye–Hückel theory (28). Indeed, ionic strength has been shown to tune the association rate of some highly optimized receptor–ligand systems by as much as five orders of magnitude (29). Generally, increasing ionic strength decreases binding affinity. Strikingly, Jen-Jacobson and collaborators (20,23) and others (22) have shown that this is not necessarily true for protein–DNA interactions: in several instances, binding affinity decreases with decreasing ionic strength (18,19,21,24,30,31), even when the experimental conditions and individual thermodynamic parameters differ. One example of the role of salt in protein–DNA interactions is the binding of catabolite gene activator protein (CAP) to the lac promoter region, which also involves DNA bending. For this system, Fried and Stickle showed that, for physiologically relevant salt concentrations between 0.05 and 0.2 M, binding affinity increases 5-fold as the salt concentration is raised (22). Similar behavior has been observed for the Escherichia coli lac repressor–operator complex (18,19), EcoRI (20) and EcoRV (23) in the 0–0.1 M range. The phenomenological CC picture described above can fit the changes in binding affinity as a function of ionic strength, but it does not reveal a molecular or structural mechanism. Although grouping all of these studies under a single model risks oversimplification, the experimental observations still raise the question of whether ions play a common underlying role in the dynamics and mechanism of specific and non-specific binding at the molecular level.
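Debye–Hückel screening can be quantified by the Debye length, λ_D = sqrt(ε_r ε_0 k_B T / (2 N_A e² I)), with the ionic strength I expressed in mol/m³. The short sketch below, assuming water at 298.15 K with ε_r = 78.5 (illustrative values, not parameters from this study), shows that at a physiological ionic strength of ~150 mM electrostatic interactions are screened over roughly 0.8 nm.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length(ionic_strength_molar: float, eps_r: float = 78.5,
                 temperature: float = 298.15) -> float:
    """Debye screening length (m) for an ionic strength given in mol/L."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    return math.sqrt(eps_r * EPS0 * K_B * temperature
                     / (2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si))


if __name__ == "__main__":
    for conc in (0.01, 0.05, 0.15, 0.2):
        print(f"I = {conc:5.2f} M -> lambda_D = {debye_length(conc) * 1e9:.2f} nm")
```

Because λ_D scales as 1/√I, the salt range studied by Fried and Stickle (0.05 to 0.2 M) changes the screening length only by a factor of two.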
Motivated by the principle of pre-organization observed in protein–protein interactions (32,33), our goal here is to study its applicability to protein–DNA interactions in the presence of counter ions. Specifically, we start from the observation (33) that, in the absence of their binding partners, side chains that bury a large amount of solvent accessible surface area (SASA) in the acceptor protein sample bound-like conformations for a significant fraction of time, facilitating the rapid formation of a bound-like intermediate state as the first step towards the high-affinity complex. The structural properties of the C2H2 zinc finger (ZF) (34,35) and engrailed homeodomain (36) transcription factors (TFs) make these proteins ideal for this analysis. From a practical point of view, both their small size and the relatively small backbone rearrangement they undergo upon binding are crucial for detailed molecular analyses and simulations. Moreover, revealing the structural basis of recognition for these TFs is important in its own right, since they belong to two of the largest families of nucleic acid binding factors in eukaryotes.
Our hypothesis is that side chains important for recognition (i.e. side chains that contact DNA) have inherently bound-like dynamics prior to the encounter with DNA. We show that eight C2H2 ZF domains from three TFs and the engrailed homeodomain share this property, although some key solvated side chains are bound-like only in the ion-rich environment of DNA. We find that an extensive network of side chains that form hydrogen bonds with the DNA backbone is always constrained into bound-like configurations. Based on MD simulations of TFs without DNA, we show that as TFs are exposed to an increasing number of negatively charged Cl− counter ions, key side chains rearrange into more bound-like conformations. Analysis of the MD trajectories shows that this increasing bound-like ordering is due to ions that physically interact with the TFs at an electrostatic hot spot located at the same site where negatively charged DNA phosphates bind. In the systems we analyzed, this site is characterized by at least a pair of basic groups on either side of the ion and a third hydrogen bond donor. For ZFs, the latter corresponds to the conserved histidine at helix position +7. These findings suggest a mechanism in which TFs in the presence of the DNA backbone rapidly equilibrate into bound-like interactions that lock the phosphates into well-defined hot spots, resulting in a weakly stable non-specific complex, while side chains involved in hydrogen bond contacts with DNA bases are already well poised to trigger specific interactions in the presence of their corresponding partners. Given the relatively high concentration of counter ions near the DNA, we speculate that Cl− ions might also act as surrogates for the phosphate groups, not only by restraining critical side chains in bound-like conformations but also by providing a natural competing substrate for the positive ions that actively neutralize the DNA backbone (17,20,22) prior to binding. Namely, if upon association these counter ions (NaCl) come closer than their solubility threshold of ~6 Å, they might prefer to move to bulk water, contributing to the desolvation of the binding interface. In summary, MD simulations provide a more detailed picture of the side chains involved in direct physical interactions with DNA and of the role of electrostatics in protein–DNA binding (3,37,38).
C2H2 zinc fingers and homeodomains are two of the largest families of transcription factors in eukaryotes, and both have highly stable folds. More importantly, since their function is mostly to position themselves on the DNA without significantly affecting its structure, it is reasonable to expect that they share a similar binding mechanism. Arguably, DNA-binding enzymes might have other requirements.
The C2H2 family contains proteins with two or more ZF domains that work together in a modular fashion to specifically recognize their DNA target sequences. The classical C2H2 ZF domain adopts a ββα fold that typically interacts with three to four base pairs of DNA through key residues in the α-helix (Figure 1). The ZF fold is held together by a tetrahedrally coordinated zinc ion and a small hydrophobic core (39).
The EGR protein (39) (Protein Data Bank (PDB) code 1AAY) has three ZFs (Figure 1A). The α-helix of each finger fits into the major groove of DNA, forming specific contacts with DNA bases. Finger I (FI) binds to the GCG triplet near the 3′ end of the primary DNA strand, FII binds to the TGG triplet in the center, and FIII binds to the GCG triplet near the 5′ end of the primary strand (Figure 1B). The helical domains of FI and FIII have the same sequence and identical structures in the DNA-bound complex. Their critical residues are an Arg preceding the α-helix (Arg−1), an Asp at the second position of the α-helix (Asp+2), a Glu at the third position (Glu+3) and an Arg at the sixth position (Arg+6). FII also has an Arg immediately before the helix and an Asp at position +2, but it has a His at the third position (His+3) and a Thr at the sixth position (Thr+6). Arg residues in all three fingers form a pair of hydrogen bonds with guanines. EGR also makes several contacts with the DNA phosphate backbone (see below). In particular, the first zinc-coordinating His (His+7) forms a hydrogen bond with a phosphate on the primary strand. A conserved Arg on the second β-strand of each finger also contacts a phosphate on the primary strand.
TFIIIA has six fingers in the crystal structure (40) (PDB code 1TF6). Fingers I–III wrap around the major groove of DNA like those of EGR, whereas fingers IV–VI form an open structure in which only FV contacts DNA in the major groove. Fingers I, II and V make +3 and +6 contacts, as well as the conserved His+7–phosphate contact, similar to those of EGR. In addition, FIII makes a contact at position +10 (40).
The GLI protein has five fingers in the crystal structure (41) (PDB code 2GLI). FI makes no DNA contacts but forms extensive protein–protein contacts with FII (41). FII and FIII contact the DNA backbone. FIV and FV make extensive contacts with the DNA: FIV through positions +1, +2, +3 and +6 of the α-helix, and FV through positions −1, +2, +3, +5 and +6.
The engrailed homeodomain (36) (PDB code 3HDD) adopts a globular fold consisting of an extended N-terminal arm and three α-helices, and binds to the optimal sequence TAATTA. The N-terminal arm and the third α-helix make base contacts. The recognition helix binds in the major groove and contacts bases through residues Ile47, Gln50 and Asn51, while Arg5 from the N-terminal arm binds in the minor groove. In addition to these base contacts, Thr6, Tyr25, Arg31 and Arg53 contact the DNA phosphate backbone.
MD simulations were performed with GROMACS 3.3.1 (42) on individual fingers of the EGR TF Zif268, TFIIIA and GLI, and on the engrailed homeodomain. FI and FII of GLI were simulated together. In all simulations, consistent with neutral pH, basic Arg and Lys residues were positively charged and acidic Asp and Glu residues were negatively charged. His residues coordinating the zinc ion were neutral, with the hydrogen atom on Nδ, since the Nε atom of these histidines interacts with the zinc ion. Each finger or homeodomain was centered in a rhombic dodecahedron box with a minimum distance of 15 Å between the protein surface and the box edges. The system was solvated with simple point charge (SPC) water, giving ~4600 water molecules for the ZFs and ~6100 for the engrailed homeodomain. The systems were then energy-minimized by steepest descent with the GROMOS96 (43) force field. Between 0 and 20 ions were added by randomly replacing water molecules, with a minimum distance of 6 Å between the ions and the protein. The temperature was coupled to a 300 K bath using weak coupling with a time constant of 0.1 ps, and the pressure was coupled to 1 bar using the Parrinello–Rahman method (44). Initial velocities were generated randomly from a Maxwell distribution at 300 K.
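The ion-insertion step can be sketched as follows: water molecules whose oxygen lies at least 6 Å from every protein atom are eligible, and the requested number of them are replaced at random. This is a hypothetical NumPy re-implementation for illustration only (the function `place_ions` and its arguments are our own, not part of GROMACS), and it ignores periodic images and ion–ion spacing for simplicity.

```python
import numpy as np


def place_ions(water_oxygens, protein_atoms, n_ions, min_dist=6.0, seed=None):
    """Return indices of `n_ions` water molecules, chosen at random among
    those whose oxygen is at least `min_dist` (Å) from every protein atom.

    Hypothetical helper: periodic images are ignored in this sketch."""
    rng = np.random.default_rng(seed)
    water_oxygens = np.asarray(water_oxygens, dtype=float)
    protein_atoms = np.asarray(protein_atoms, dtype=float)
    # Pairwise water-protein distances, shape (n_water, n_protein_atoms)
    diff = water_oxygens[:, None, :] - protein_atoms[None, :, :]
    min_to_protein = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    candidates = np.flatnonzero(min_to_protein >= min_dist)
    if n_ions > candidates.size:
        raise ValueError("not enough waters far enough from the protein")
    # Sample without replacement so that no water is replaced twice
    return rng.choice(candidates, size=n_ions, replace=False)
```

The brute-force distance matrix is adequate for systems of the size described here (a few thousand waters and a single small domain); larger systems would call for a neighbor search.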
To handle systems with a net charge, non-bonded interactions were treated with a twin-range cut-off of 10 Å rather than with Ewald summation, which requires an overall neutral simulation box. Default GROMOS96 (43) parameters were used for the Zn2+ and Cl− ions and for all residues. The ion concentration of each system depends on the volume of the simulation box; for example, one Cl− ion corresponds to ~11.5 mM in the simulations of EGR FI, which use a 144.9 nm³ simulation box. We note that ionic solvation is difficult to treat in most molecular mechanics force fields, as well as in PB theory, where non-linearities must be taken into account. Nevertheless, here we are mainly interested in characterizing the loci of ion sites on the protein surface and the impact of these ions on side chain dynamics, rather than Debye screening or protein–protein interactions, which might depend on subtler details of ion sampling in the water box.
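The conversion between the number of ions and their effective concentration is simple arithmetic, c = n / (N_A V). The sketch below reproduces the ~11.5 mM per Cl− quoted for the 144.9 nm³ EGR FI box and estimates how many ions correspond to 150 mM in that box. The helper names are hypothetical; the rhombic dodecahedron volume, (√2/2)d³ for a box with periodic image distance d, is included for completeness since d itself is not reported in the text.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol


def rhombic_dodecahedron_volume(d_nm: float) -> float:
    """Volume (nm^3) of a rhombic dodecahedron box with image distance d (nm)."""
    return math.sqrt(2.0) / 2.0 * d_nm ** 3


def ions_to_millimolar(n_ions: int, volume_nm3: float) -> float:
    """Concentration (mM) of n_ions in a box of the given volume."""
    volume_litre = volume_nm3 * 1e-24  # 1 nm^3 = 1e-24 L
    return n_ions / (N_A * volume_litre) * 1e3


def millimolar_to_ions(conc_mm: float, volume_nm3: float) -> int:
    """Nearest whole number of ions giving roughly conc_mm in the box."""
    return round(conc_mm * 1e-3 * N_A * volume_nm3 * 1e-24)


if __name__ == "__main__":
    v_egr_fi = 144.9  # nm^3, EGR FI simulation box quoted in the text
    print(f"1 Cl- -> {ions_to_millimolar(1, v_egr_fi):.1f} mM")
    print(f"150 mM -> {millimolar_to_ions(150, v_egr_fi)} ions")
```

For the EGR FI box this gives about 13 ions at 150 mM, which falls within the 8–20 ion range used across the different box volumes.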
Since the zinc ion in ZF proteins plays a structural role (45), we harmonically constrained the zinc ion and the zinc-coordinating residues to maintain the tetrahedral coordination, using a force constant of 2.4 kcal/mol/Å²; the N and C atoms of the protein were constrained as well. These constraints are consistent with recent NMR solution structures of various C2H2 ZFs showing that the ZF fold is highly stable (46,47) and changes little from the unbound to the bound structure. Supplementary Figure S1 shows a superimposition of fingers from the different ZF proteins, including a solution NMR structure. We performed multiple 5- and 9-ns runs starting from different counter ion concentrations, with a 2 fs time step and periodic boundary conditions. Coordinates were saved every picosecond, and the last 4 or 8 ns of each trajectory were analyzed. Four or more independent runs were performed for each finger at each ion concentration, for at least 20 ns of aggregate simulation time. The total simulation time for all fingers exceeded one microsecond.
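GROMACS expects force constants in kJ mol⁻¹ nm⁻², so the 2.4 kcal/mol/Å² restraint has to be converted. The sketch below performs that conversion, evaluates a harmonic restraint energy assuming the GROMACS convention E = ½kΔx² (the text does not state which convention was used), and counts the frames analyzed at one frame per picosecond. All function names are hypothetical.

```python
KCAL_TO_KJ = 4.184        # 1 kcal = 4.184 kJ
PER_A2_TO_PER_NM2 = 100.0  # 1 Å^-2 = 100 nm^-2


def kcal_per_a2_to_kj_per_nm2(k: float) -> float:
    """Convert a harmonic force constant from kcal/mol/Å^2 to kJ/mol/nm^2."""
    return k * KCAL_TO_KJ * PER_A2_TO_PER_NM2


def harmonic_restraint_energy(x, x_ref, k):
    """E = 1/2 * k * |x - x_ref|^2 for one atom (assumed GROMACS convention).
    Units follow k and the coordinates (e.g. kJ/mol/nm^2 and nm)."""
    dx2 = sum((a - b) ** 2 for a, b in zip(x, x_ref))
    return 0.5 * k * dx2


def n_analysis_frames(analysed_ns: float, save_interval_ps: float = 1.0) -> int:
    """Number of saved frames in the analyzed part of a trajectory."""
    return int(round(analysed_ns * 1000.0 / save_interval_ps))


if __name__ == "__main__":
    k = kcal_per_a2_to_kj_per_nm2(2.4)
    print(f"k = {k:.2f} kJ/mol/nm^2")
    print(f"E(0.1 nm) = {harmonic_restraint_energy((0.1, 0, 0), (0, 0, 0), k):.2f} kJ/mol")
    print(n_analysis_frames(4), n_analysis_frames(8))
```

Under this convention, a 1 Å (0.1 nm) displacement costs about 5 kJ/mol, roughly 2 k_BT at 300 K, so restrained atoms can still fluctuate by a fraction of an ångström.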
Side chain dynamics were analyzed by extracting snapshots from each MD trajectory. The snapshots were superimposed on the bound crystal structure of each domain using the α-helix Cα atoms. The Cα atom of each residue was then translated to coincide with the Cα atom of the same residue in the crystal structure, and RMSDs with respect to the crystal structure were calculated over the side chain heavy atoms, starting from Cβ. Following the analysis in refs. (33,48), Arg, His, Tyr, Lys and Trp side chains are considered bound-like if their RMSD is <2 Å; Glu and Gln side chains if it is <1.5 Å; Asp, Asn and Leu side chains if it is <1.25 Å; and Thr side chains if it is <1 Å. We also verified that a side chain RMSD <2 Å correlates with the heavy-atom distances of the corresponding protein–DNA hydrogen bonds (Supplementary Figures S2B and S3B). Supplementary Figure S4 also provides an analysis of the top side chain clusters of solvated and buried side chains.
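The alignment and classification steps described above can be sketched with NumPy. This is an illustrative reconstruction, not the authors' analysis scripts: the helix superposition uses the Kabsch algorithm (an assumption, since the text does not name the fitting method), atom indices are passed in explicitly, and only the residue-specific cutoffs are taken from the text.

```python
import numpy as np

# Residue-specific bound-like RMSD cutoffs (Å) quoted in the text.
BOUND_LIKE_CUTOFF = {
    "ARG": 2.0, "HIS": 2.0, "TYR": 2.0, "LYS": 2.0, "TRP": 2.0,
    "GLU": 1.5, "GLN": 1.5,
    "ASP": 1.25, "ASN": 1.25, "LEU": 1.25,
    "THR": 1.0,
}


def kabsch_rotation(mobile, target):
    """Rotation R (3x3) minimizing sum |R p_i - q_i|^2 over centred points."""
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def superpose(frame_xyz, ref_xyz, helix_ca_idx):
    """Fit a whole frame onto the crystal structure using helix C-alpha atoms."""
    mob = frame_xyz[helix_ca_idx]
    tgt = ref_xyz[helix_ca_idx]
    rot = kabsch_rotation(mob, tgt)
    return (frame_xyz - mob.mean(axis=0)) @ rot.T + tgt.mean(axis=0)


def side_chain_rmsd(frame_xyz, ref_xyz, ca_idx, sc_idx):
    """RMSD of side chain heavy atoms (C-beta onwards) after translating the
    residue's C-alpha onto its crystal position."""
    shift = ref_xyz[ca_idx] - frame_xyz[ca_idx]
    diff = frame_xyz[sc_idx] + shift - ref_xyz[sc_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def is_bound_like(resname, rmsd):
    """Apply the residue-specific cutoff to one side chain RMSD value."""
    return rmsd < BOUND_LIKE_CUTOFF[resname.upper()]
```

Translating each Cα after the helix fit isolates side chain rotamer differences from small backbone shifts, which is why the cutoffs can be as tight as 1 Å for short side chains.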
Residues important for recognition typically bury large amounts of SASA upon binding (33). Table 1 lists the amino acids that contact DNA bases, including residues at helix positions −1, +3 and +6 in the EGR domains (see ‘Methods’ section for a description of the DNA-binding sites on the recognition helices). It also lists the SASA they bury upon binding and the percentage of time these side chains spend in a conformation close to their bound structure in two solvent conditions, one without and one with ions (the latter equivalent to a physiological ion concentration of 150 mM, i.e. 8–20 ions depending on the volume of the simulation box). As expected, the residue that buries the largest amount of SASA in each domain always appears among these specificity-determining contacts. In particular, these data readily identify Arg+6, Arg−1 and Arg−1 in FI, FII and FIII of EGR, respectively, as the main specificity-determining residues in this complex (in grey); these side chains are relatively free in the unbound state but bury the largest amount of SASA upon binding. MD simulations also show that, in the absence of DNA but in the presence of Cl− ions, these side chains spend a significant amount of time (~30% or more on a nanosecond time scale) in configurations similar to the one they acquire in the bound structure with DNA. Table 1 shows that 12 of the 17 side chains that are more than 80% buried in the complex become more bound-like under physiological conditions, three remain about the same, and the other two are more than 69% bound-like regardless of the number of ions in solution.
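The buried fraction used above is (SASA_free − SASA_complex)/SASA_free. The study used NACCESS, which implements the Lee–Richards algorithm; the sketch below swaps in the simpler Shrake–Rupley test-point method with the same 1.4 Å probe, purely to illustrate the quantity being computed. It is a brute-force, per-atom version suited to small systems, and the atomic radii must be supplied by the user.

```python
import numpy as np


def sphere_points(n=960):
    """Approximately uniform points on the unit sphere (golden-section spiral)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((np.cos(theta) * np.sin(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(phi)))


def shrake_rupley(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent accessible surface area (Å^2), Shrake-Rupley method."""
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe
    unit = sphere_points(n_points)
    sasa = np.empty(len(coords))
    for i, (center, r) in enumerate(zip(coords, expanded)):
        pts = center + r * unit
        # A test point is buried if it lies inside any other expanded sphere
        d2 = ((pts[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
        inside = d2 < expanded[None, :] ** 2
        inside[:, i] = False
        exposed = np.count_nonzero(~inside.any(axis=1))
        sasa[i] = 4.0 * np.pi * r ** 2 * exposed / n_points
    return sasa


def percent_buried(sasa_free, sasa_complex):
    """Percent of a residue's SASA that becomes buried upon complex formation."""
    return 100.0 * (sasa_free - sasa_complex) / sasa_free
```

In practice, a residue's SASA is the sum over its atoms, computed once for the isolated protein and once for the protein–DNA complex; those two sums are then passed to `percent_buried`.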
Figure 2 shows in more detail the RMSD of the main specificity-determining residue of EGR (49), Arg+6 in FI, with respect to its bound conformation, as a function of time over the last 4 ns of MD simulations with an increasing number of Cl− ions in the water box. Figure 2F shows the RMSD of Arg+6 with no ions in the simulation box. In this case, the side chain is in a bound-like conformation (i.e. <2 Å RMSD from the bound structure) 29% of the time (average over four independent simulations, 23 ± 6%). Adding counter ions increases this percentage to as much as 79% (Figure 2A). The histograms in the right insets show how the distribution of Arg+6 conformations shifts to low RMSDs with respect to the bound structure as the Cl− ion concentration increases. In addition, the bound-like conformations correlate with the heavy atom charge–charge distances for the corresponding protein–DNA contacts (see Supplementary Figures S2B and S3B).
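The statistics reported for Arg+6 (percent of frames bound-like in one run, and a mean ± spread over independent runs) can be computed from per-frame RMSD series as sketched below. The function names are hypothetical, and the use of the sample standard deviation (ddof=1) for the spread is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def percent_bound_like(rmsd_series, cutoff):
    """Percent of frames whose side chain RMSD is below the bound-like cutoff."""
    rmsd_series = np.asarray(rmsd_series, dtype=float)
    return 100.0 * np.count_nonzero(rmsd_series < cutoff) / rmsd_series.size


def summarize_runs(runs, cutoff):
    """Mean and sample standard deviation of percent bound-like over
    independent runs (one RMSD time series per run)."""
    values = np.array([percent_bound_like(r, cutoff) for r in runs])
    return values.mean(), values.std(ddof=1)


def rmsd_histogram(rmsd_series, bin_width=0.25, max_rmsd=6.0):
    """Fraction of frames per RMSD bin, in the spirit of the Figure 2 insets."""
    bins = np.arange(0.0, max_rmsd + bin_width, bin_width)
    counts, edges = np.histogram(rmsd_series, bins=bins)
    return counts / max(len(rmsd_series), 1), edges
```

For example, `summarize_runs(arg6_runs, 2.0)` would return the mean and spread over four runs, where `arg6_runs` is a hypothetical list of four RMSD arrays of 4000 frames each.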
The correlation between counter ions and bound-like behavior was also observed for other buried specificity-determining side chains, for instance Arg+6 in FIII (Figure 3) and His+3 in FII (Table 1) of EGR. A similar correlation was observed for GLI, TFIIIA and the engrailed homeodomain (Table 1), as well as for side chains involved in non-specific binding. In all cases, the percentage of bound-like side chains observed in the simulations appears to saturate beyond a certain number of ions, consistent with an ‘effective’ ionic concentration of ~150 mM.
Two relatively buried side chains do not appear to follow this trend. Arg62 (Pos. +6) in FII of TFIIIA is less bound-like than one would expect, although the cavity next to it (see Supplementary Figure S2) suggests that it might not be as buried as estimated by NACCESS (50) with a 1.4 Å probe radius. Moreover, large differences in this side chain between the 3.1 Å resolution crystal structure (40) (PDB 1TF6) and the NMR structure (51) (PDB 1TF3) make defining its bound-like conformation rather difficult. Based on our own energetic analysis (4), we find that if Arg62 is buried, the NMR conformation is more stable, since Arg62.Nε forms an intra-molecular hydrogen bond with the backbone of His59 (Supplementary Figure S2). On the other hand, if Arg62.Nε is not buried, it can form a protein–solvent hydrogen bond, and the crystal and NMR configurations have very similar energies. Note that burying a free NH group is energetically unfavorable. MD runs started from the NMR structure show that Arg62 is bound-like 20% of the time in the presence of ions, forming the aforementioned backbone hydrogen bond during this time. The other exception is Arg5 in the engrailed homeodomain (PDB 3HDD). This side chain belongs to the fully flexible N-terminal arm of the protein. Unlike a side chain in a structured domain, it can always undergo induced fit once the main recognition helix docks into the major groove, without interfering with the binding process.
Side chains that do not contact DNA and are solvated in the complex do not sample bound-like conformations. Supplementary Figure S4 shows the relation between bound-like RMSD and the probabilities of rotamers from the rotamer library (52) that belong to the same clusters, for three side chains of EGR FI. Arg114 precedes the hydrophobic core residue Phe and faces the solvent in the EGR complex (Supplementary Figure S4B). The MD simulations show that this side chain is bound-like only 1% of the simulation time, and its MD clusters do not correlate with rotamer probabilities. For Arg118 and Arg124, there is also little correlation between MD clusters and rotamer probabilities [see ref. (33) for further evidence of the disagreement between MD clusters and rotamer probabilities].
Rearranging a misfolded side chain at the core of the binding interface is much more difficult than rearranging one at the periphery. Strikingly, we find that the two side chains that bury the largest amount of SASA in EGR, Arg+6 in FI and Arg−1 in FIII, are more than 60% bound-like, while the partially exposed side chains capping the N- and C-termini (Arg−1 in FI and Arg+6 in FIII; see Figure 1) are between 20 and 50% bound-like. Hence, although the FI and FIII domains have identical crystal structures and helical binding sequences (i.e. RDER/GCG), subtle longer-range interactions stemming from differences in the sequences of their β-strands have led to very different dynamics: buried side chains are strongly native-like, while non-buried side chains are less so. These differences in side chain dynamics are perhaps a glimpse of the extent to which evolution has tuned the complementarity of these interactions.
The impact of ions on side chain dynamics is not accidental: it correlates with a weak binding site for Cl− ions located at the same position where the negatively charged phosphates bury the largest amount of SASA in the complex structures (see highlighted rows in Table 2). This observation is further supported by the distribution of distances between the crystal positions of these phosphates and their nearest Cl− ions during the MD simulations (Figure 4). For comparison, Figure 4A and D also show that Cl− ions have no significant residence time near the locus of the second most buried phosphate in the corresponding crystals.
Interestingly, in seven of the eight ZFs with an ion binding site, the site corresponds to the pocket supported by the conserved Nδ of His+7, which is also the most highly conserved ZF–DNA backbone contact (35). The sole exception is the pocket of the G26 phosphate of FI of TFIIIA, which forms a hydrogen bond with the OH group of Tyr24. Note that most ZFs have a highly conserved Phe at this position (24) (35). Consistent with the correlation described above, our simulations show that in this finger the Cl− ions shift their interaction site to Tyr24.
Overall, MD simulations of 15 different DNA-binding domains (including one homeodomain) detected a robust Cl− ion interaction site in nine of them and none in the remaining six. Crystal structures show that all nine domains bury their recognition helix deep in the major groove [as most DNA-binding proteins do (53)], whereas none of the other six buries its helix as deeply (see the crystal structures of FI, FIII and FV of GLI and FIII, FIV and FVI of TFIIIA). These findings suggest that an ‘electrostatic hot spot’ capable of trapping negatively charged ions is important for burying the recognition helices deep in the major groove. Conversely, domains that lack this electrostatic hot spot neither trap Cl− ions nor contact phosphate groups. The percentage of simulation time that Cl− ions spend in the phosphate binding site (i.e. within 3 Å of the phosphorus atom) is 18–79% (Table 2), compared with 0–7% for phosphates of domains that do not bury their helix in the major groove. Ion residence times anywhere on the surface of ZFs that do not contact DNA, i.e. FI of GLI and FIV and FVI of TFIIIA, were in the range of 0–5% of the full MD simulations. The 3 Å clustering radius around the phosphorus position is quite stringent, considering the four oxygens bound to the phosphorus atom. Note that in FIII of EGR the corresponding phosphate was modeled, since it is missing from the crystal structure.
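The residence criterion above (a Cl− ion within 3 Å of the crystal phosphorus position) and the distance distributions of Figure 4 can both be computed from ion trajectories, as sketched below. The function names are hypothetical, and the sketch assumes the ion coordinates have already been expressed in the frame of the superposed protein and imaged to the copy closest to it.

```python
import numpy as np


def nearest_ion_distance(ion_traj, site):
    """Per-frame distance (Å) from a fixed site to the nearest ion.

    ion_traj: array of shape (n_frames, n_ions, 3), already fitted to the
    protein and imaged; site: (3,) reference position, e.g. the phosphorus
    atom of the most buried phosphate in the crystal structure."""
    ion_traj = np.asarray(ion_traj, dtype=float)
    d = np.linalg.norm(ion_traj - np.asarray(site, dtype=float), axis=-1)
    return d.min(axis=1)


def residence_percent(ion_traj, site, cutoff=3.0):
    """Percent of frames with at least one ion within `cutoff` Å of the site."""
    d = nearest_ion_distance(ion_traj, site)
    return 100.0 * np.count_nonzero(d <= cutoff) / d.size
```

Histogramming the output of `nearest_ion_distance` for the most and second most buried phosphates gives the kind of comparison shown in Figure 4A and D.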
For ZFs, the Cl− ions interact in a pocket formed by the conserved His+7 (or Tyr24 in FI of TFIIIA) and two basic groups flanking this residue, one on the α-helix (often Pos. +6) and the other on the second β-strand, one residue before the conserved Phe of the hydrophobic core, usually an Arg or a Lys (see, e.g. Figure 4). For the engrailed homeodomain, the ions also interact with two basic groups, Arg53 and Arg31, as well as with the Leu26 backbone (Figure 4D). Two basic groups together with a third hydrogen bond donor thus appear to be a common feature of the electrostatic hot spot. Interestingly, although these residues form the electrostatic pockets, not all of them end up forming hydrogen bonds with the phosphates in the complex structure.
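The pocket composition described above (three hydrogen bond donors, two of them basic) can be turned into a simple geometric check on a simulation snapshot. The sketch below is a hypothetical heuristic, not the criterion used in the study: it counts distinct residues with a donor heavy atom within an assumed 3.5 Å of the ion, and the residue numbers in the test data are invented for illustration.

```python
import numpy as np


def hot_spot_summary(ion_xyz, donors, cutoff=3.5, basic=("ARG", "LYS")):
    """Count donor residues near an ion.

    donors: iterable of (resname, resid, atomname, (x, y, z)) for donor heavy
    atoms (e.g. Arg NH1/NH2/NE, Lys NZ, His ND1, backbone N).
    Returns (number of distinct donor residues, number of basic ones)
    within `cutoff` Å of the ion. The 3.5 Å cutoff is an assumption."""
    ion = np.asarray(ion_xyz, dtype=float)
    close = {(res, rid) for res, rid, _atom, xyz in donors
             if np.linalg.norm(np.asarray(xyz, dtype=float) - ion) <= cutoff}
    n_basic = sum(1 for res, _ in close if res in basic)
    return len(close), n_basic


def is_hot_spot_like(ion_xyz, donors, cutoff=3.5):
    """Heuristic: at least three donor residues, at least two of them basic."""
    n_res, n_basic = hot_spot_summary(ion_xyz, donors, cutoff)
    return n_res >= 3 and n_basic >= 2
```

Counting residues rather than atoms avoids double-counting the two guanidinium nitrogens of a single Arg.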
The key observation here is that the binding site of the Cl− ions on the surface of the TF domains corresponds to the same locus where a phosphate group buries the largest amount of SASA upon complexation (see Table 2 and Figure 4). Hence, the ‘re-ordering’ of Arg+6 and other side chains into bound-like conformations in the absence of DNA reflects, in part, phosphate-like electrostatic interactions mimicked by the Cl− ions (see also Supplementary Figure S3). Since clustering of Cl− positions over the full surface of the protein domains did not reveal other preferred interaction sites, we conclude that these sites have evolved to desolvate charged groups. We also checked the sensitivity of the hot spot to acetate (another molecule with charge −1), finding protein association propensities similar to those of Cl−. Three examples of the dynamics of the Cl− ion around the phosphate binding site are shown in Supplementary Figures S1–S3.
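The text does not specify how Cl− positions were clustered over the protein surface. One simple stand-in, assumed here for illustration, is to bin all ion positions (after fitting each frame to the protein) on a cubic grid and rank voxels by occupancy; a preferred site then shows up as a voxel holding a large fraction of all ion samples.

```python
from collections import Counter

import numpy as np


def ion_occupancy_peaks(ion_positions, spacing=1.0, top=3):
    """Bin ion positions (Å, protein-fitted frame) on a cubic grid and return
    the `top` most occupied voxel centres with their occupancy fractions.

    ion_positions: any array reshapeable to (n_samples, 3), e.g. a
    (n_frames, n_ions, 3) trajectory array."""
    pos = np.asarray(ion_positions, dtype=float).reshape(-1, 3)
    voxels = np.floor(pos / spacing).astype(int)
    counts = Counter(map(tuple, voxels))
    total = len(pos)
    return [((np.array(v) + 0.5) * spacing, n / total)
            for v, n in counts.most_common(top)]
```

A fixed grid can split one site across neighboring voxels; a coarser spacing, or merging adjacent voxels, reduces that effect at the cost of spatial resolution.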
About two-thirds of protein–DNA contacts are with the DNA backbone. Table 3 lists all backbone-contacting side chains, 30 of which (out of 37) show a significant amount of bound-like behavior. Interestingly, every DNA-binding domain has at least one side chain that forms a hydrogen bond with a DNA backbone phosphate in a highly constrained bound-like configuration. Several side chains are partially buried in the free state, which significantly constrains their conformations to bound-like states, e.g. His+7 in ZFs. Other side chains show a moderate increase in bound-like conformations in response to Cl− ions, e.g. Arg114 in FI of EGR increases its bound-like behavior from 30 to 60%, and Arg53 in the homeodomain goes from 10 to 83% bound-like in the presence of ions. Overall, however, the role of ions is less striking for side chains contacting the phosphate backbone than for those forming hydrogen bonds with bases. Finally, some side chains, such as Lys29 in FI of TFIIIA and Arg146 in FII of GLI, are more than 60% buried prior to binding and require only a small rotation to make their corresponding hydrogen bonds.
The Lys residues in the conserved linker sequence (TG-E/Q-KP) of many ZF proteins (35) also contact and stabilize the protein–DNA complex (51,54,55). These hinge regions play a critical role in capping the helical domains and become rigid upon DNA binding (55). Simulations of the consecutive fingers FI–FII of EGR (using the same simulation protocol) show that, in the canonical binding structure, the linker Lys is in a bound-like conformation 84% of the time (data not shown).
For the cases studied here, most of the highly buried residues tend to be bound-like prior to the encounter with DNA, suggesting that they fold into conformations conducive to smooth binding. The MD time scales of these bound-like dynamics are on the order of hundreds of picoseconds to nanoseconds, consistent with the lifetime of an encounter complex in protein interactions (33). Other interface residues not directly contacting bases, such as Glu+3 in both FI and FIII, also form their bound-like hydrogen bond with the backbone of Arg−1 prior to the encounter with DNA. Interestingly, in a previous article (4) we noted that an unconstrained Glu+3 side chain would clash with the DNA bases of many tri-nucleotides, becoming a serious obstacle to non-specific association.
Most side chains that form hydrogen bonds with the DNA phosphate backbone (30 of 37; Table 3) are between 20 and 100% bound-like, and 21 contacts are between 50 and 100% bound-like regardless of the number of ions in solution. This latter group of highly constrained side chains spans all DNA-binding domains, so that in the presence of a stretch of DNA, TFs can quickly form a non-specific complex with the DNA backbone. Such complexes are unlikely to require very precise complementarity, since phosphates are relatively easy targets compared with hydrogen bond partners on DNA bases. This efficient non-specific binding mechanism is consistent with association rate constants on the order of the diffusion limit, 10⁹ M−1 s−1 for EGR (24).
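For orientation, the Smoluchowski expression for diffusion-limited association, k = 4π(D_A + D_B)R·N_A, gives rates on the order of 10⁹ M⁻¹ s⁻¹ for protein-sized reactants. The inputs below (a summed diffusion coefficient of 10⁻¹⁰ m²/s and a 2 nm reaction radius) are illustrative assumptions, not measured values for EGR. The formula also ignores orientational requirements, which lower the rate, and electrostatic steering, which raises it.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol


def smoluchowski_rate(d_sum_m2_s: float, contact_radius_m: float) -> float:
    """Diffusion-limited bimolecular rate k = 4*pi*D*R*N_A in M^-1 s^-1
    (the factor 1000 converts m^3/mol to L/mol)."""
    return 4.0 * math.pi * d_sum_m2_s * contact_radius_m * N_A * 1000.0


def pseudo_first_order_time(k_m_s: float, partner_conc_molar: float) -> float:
    """Mean waiting time (s) for association when the partner is in excess."""
    return 1.0 / (k_m_s * partner_conc_molar)


if __name__ == "__main__":
    k = smoluchowski_rate(1e-10, 2e-9)  # assumed D_sum and R
    print(f"k ~ {k:.2e} M^-1 s^-1")
    print(f"tau at 1 nM partner ~ {pseudo_first_order_time(k, 1e-9):.2f} s")
```

With these assumptions, k ≈ 1.5 × 10⁹ M⁻¹ s⁻¹, comparable to the association rate constant cited for EGR.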
Side chains that bury the largest amount of SASA in the bound state form bonds with DNA bases. Even before encountering DNA, these groups are often found in rotamer conformations similar to those they acquire in the complex. In addition, several of these side chains are steered into these ‘native-like’ configurations by negatively charged ions bound at well-defined ‘electrostatic hot spots’. Interestingly, in a non-specific complex the DNA backbone phosphates are expected to occupy the same hot spots. Hence, we conclude that already in a non-specific complex, specificity-determining side chains are predisposed to be close to their bound-like rotamer conformations, such that when presented with the appropriate partner they rapidly form a higher-affinity complex.
Molecular simulations of three ZF proteins and the engrailed homeodomain show that domains that bind into the DNA major groove also have an electrostatic hot spot for a negatively charged Cl− ion, characterized by at least two basic groups on opposite sides of the phosphate and a relatively constrained third hydrogen bond donor. Crystal structures show that domains lacking this site do not bury the recognition helix deep in the major groove and have non-conventional binding modes (see TFIIIA and GLI). Since this site is also the locus of the phosphate that buries the largest amount of SASA upon binding in each domain, we suggest that without these hot spots proteins cannot overcome the large penalty of desolvating the phosphate, that is, of stripping it of solvent and of its own positive counter ions (17,20,22).
We emphasize that a favorable electrostatic site on DNA-binding proteins is not unexpected. However, the picture emerging from our analysis is that subtle interactions work cooperatively to trap DNA backbone phosphates, reflecting evolutionary constraints (see, e.g. Figure 3) that should presumably be taken into account in the design of novel TFs with both specific and non-specific binding capabilities.
One might argue that the widespread bound-like behavior of buried residues and other side chains in Tables 1 and 3 is due to the harmonic constraints of the N and C atoms to the bound structure. However, these constraints are consistent with structural evidence showing that the ZF fold does not depend strongly on sequence [see Supplementary Figure S1 and Figure 1 in ref. (47)] and does not change upon DNA binding (47). Moreover, some side chains that appear exposed in the co-crystals, mostly at the 3′ and 5′ ends of the complexes, and that are not highly buried upon complexation, do not show this bound-like behavior (e.g. R118 from FI of EGR and R154 from FV of TFIIIA). Finally, it is important to note that the traditional induced fit theory (56) would suggest that the conformations of most of these side chains should respond to intermolecular interactions. Our findings clearly show that this is not necessarily the case; in fact, suitable conformations are imprinted on the TF fold even in the absence of DNA.
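For reference, harmonic positional restraints of the kind discussed above typically take the form below, summed over the restrained backbone atoms. The force constant k and the exact atom selection are assumptions here, since this section does not give them, and some MD packages include an extra factor of 1/2.

```latex
U_{\mathrm{restr}} = \sum_{i \,\in\, \{\mathrm{N},\,\mathrm{C}\}} k \,\bigl\lVert \mathbf{r}_i - \mathbf{r}_i^{\mathrm{bound}} \bigr\rVert^{2}
```

Because the restraint acts only on backbone positions, side-chain dihedral angles are not restrained directly. This is consistent with the observation that exposed residues such as R118 (EGR) and R154 (TFIIIA) do not remain bound-like.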
The predisposition of key TF side chains toward suitable rotamer conformations suggests that functional TFs have evolved structural motifs designed for fast and efficient formation of a non-specific complex around the DNA backbone. Moreover, the presence of these phosphate backbone contacts in all fingers is consistent with the experimental observation that two or more ZF domains work together to recognize specific DNA sequences (57,58). Multiple non-specific binding domains may also allow partial dissociation and rapid reattachment of individual ZFs as they diffuse from phosphate to phosphate along the DNA. This simple mechanism, acting through non-specific, extended, desolvated states, could account for the 'sliding' of TFs along DNA (2,19,59) first suggested by Winter et al. (19), while bound-like specificity-determining side chains stand ready to stall the 1D diffusion process at the cognate sequence.
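The sliding picture can be illustrated with a deliberately minimal toy model: each phosphate is a lattice site, the TF takes unbiased ±1 steps between neighboring sites, and it stalls once it reaches the cognate site. The sketch assumes no hopping, a flat energy landscape and reflecting ends. `slide_until_cognate` is a hypothetical helper, not the authors' simulation.

```python
import random


def slide_until_cognate(start: int, cognate: int, n_sites: int,
                        rng: random.Random, max_steps: int = 1_000_000) -> int:
    """Unbiased 1D random walk over phosphate sites 0..n_sites-1.

    Returns the number of steps taken before the walker first reaches `cognate`,
    where it is assumed to stall (absorbing site). Steps off either end are
    clamped, i.e. the walker stays put (a simple reflecting boundary).
    """
    pos, steps = start, 0
    while pos != cognate:
        if steps >= max_steps:
            raise RuntimeError("cognate site not reached within max_steps")
        pos = min(max(pos + rng.choice((-1, 1)), 0), n_sites - 1)
        steps += 1
    return steps


# Average first-arrival time over many toy searches starting 10 sites away.
rng = random.Random(42)
trials = 200
mean_steps = sum(slide_until_cognate(0, 10, 50, rng) for _ in range(trials)) / trials
print(f"mean steps to slide 10 sites: {mean_steps:.0f}")
```

For an unbiased walk, the mean number of steps grows roughly as the square of the distance to the cognate site. This is one reason pure sliding is usually combined with 3D hopping in search models.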
Since many of the key side chains forming bonds with bases are buried deep in the binding interface, a natural question is whether the binding process would also benefit from these side chains being bound-like before the non-specific complex forms. This could happen if Cl− ions were available, prior to binding, to occupy the electrostatic hot spot that phosphates occupy in the crystals. Experiments seem to suggest as much, given that the binding affinity can be a factor of four higher in the presence of salt at ~150 mM ionic concentration (22,33). One might argue that any potential benefit of this mechanism would be offset by the extra barrier of desolvating the negative ions upon complexation. However, suppose the TF with its negative counter ion and the DNA with its positive (Na+) counter ion (17,20,22) approach each other to within, say, 6 Å. The effective local concentration of Na+ and Cl− then rises to ~6.15 M, i.e. the solubility limit of NaCl. Under these conditions, ions might prefer to leave the interface and move into bulk water, which offers an interesting way to reconcile the apparent ease with which ions are removed from the protein–DNA interface. It is also worth mentioning that an alternative model of protein–DNA association (60) has recently suggested that the role of counter ions on DNA is to bias 'hopping' versus 'sliding'. Inasmuch as removing counter ions from the DNA backbone is a natural barrier to binding, a mostly 'hopping' mechanism would still need to explain how these ions are efficiently removed after each hop. In summary, more experiments are certainly needed to fully resolve these questions and to explain the origin of the subtle increase in binding affinity as a function of ion concentration.
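The ~6.15 M figure is a local concentration, that is, a number of ions confined to a small volume. This section does not state the volume used for the estimate, so the sketch below only shows the unit conversion between molarity and volume per ion pair. The helper names are hypothetical. At 6.15 M, each Na+/Cl− pair has roughly 270 Å³ available, about a cube 6.5 Å on a side.

```python
AVOGADRO = 6.02214076e23   # mol^-1 (exact SI value)
A3_PER_LITRE = 1.0e27      # 1 L = 1e-3 m^3 = 1e27 cubic angstroms


def molar_from_volume(n_particles: float, volume_a3: float) -> float:
    """Molar concentration of n_particles confined to volume_a3 (in Å^3)."""
    return (n_particles / AVOGADRO) / (volume_a3 / A3_PER_LITRE)


def volume_per_particle(molar: float) -> float:
    """Average volume (Å^3) available to one particle (or ion pair) at a given molarity."""
    return A3_PER_LITRE / (AVOGADRO * molar)


v = volume_per_particle(6.15)          # NaCl solubility limit quoted in the text
print(f"{v:.0f} A^3 per ion pair")     # 270 A^3 per ion pair
print(f"cube edge ~ {v ** (1 / 3):.1f} A")
```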
In conclusion, the dynamics of protein side chains contacting DNA strongly suggest a two-stage mechanism. Association is first aided by a vast network of side chains that are preset to lock onto the DNA backbone. Subsequently, the ion-dependent dynamics of specificity-determining residues indicate that ion/charge–protein interactions shape side-chain dynamics in a way conducive to efficient DNA binding. Of the DNA-binding domains studied here, only those with an electrostatic hot spot bury their recognition helix tightly in the major groove. These domains cover the main B-DNA conformational forms, BI and BII (61), and include GG, CG, TG, CT, AA and CC dinucleotide steps (61), implying that our findings are not limited to particular DNA sequences. Collectively, these observations suggest a mechanism by which an electrostatic hot spot mediates the non-specific desolvation of the highly charged, polar DNA backbone, providing a cogent view of how TFs might recognize a small number of DNA sequences with high efficiency and specificity.
Supplementary Data are available at NAR Online.
National Science Foundation (MCB-0444291) and Pittsburgh Supercomputer Center (MCB049958P). Funding for open access charge: National Science Foundation (MCB-0444291).
Conflict of interest statement. None declared.