Mismatch repair (MMR) is an essential, evolutionarily conserved pathway that maintains genome stability by correcting base-pairing errors in DNA. Here we examine the sequence and structure of MutS MMR protein to decipher the amino acid framework underlying its two key activities—recognizing mismatches in DNA and using ATP to initiate repair. Statistical coupling analysis (SCA) identified a network (sector) of coevolved amino acids in the MutS protein family. The potential functional significance of this SCA sector was assessed by performing molecular dynamics (MD) simulations for alanine mutants of the top 5% of 160 residues in the distribution, and control nonsector residues. The effects on three independent metrics were monitored: (i) MutS domain conformational dynamics, (ii) hydrogen bonding between MutS and DNA/ATP, and (iii) relative ATP binding free energy. Each measure revealed that sector residues contribute more substantively to MutS structure–function than nonsector residues. Notably, sector mutations disrupted MutS contacts with DNA and/or ATP from a distance via contiguous pathways and correlated motions, supporting the idea that SCA can identify amino acid networks underlying allosteric communication. The combined SCA/MD approach yielded novel, experimentally testable hypotheses for unknown roles of many residues distributed across MutS, including some implicated in Lynch cancer syndrome.
DNA mismatch repair (MMR) initiates with MutS protein recognizing errors made by DNA polymerases, including base–base mismatches and insertion/deletion loops. MutS then activates MutL, which in turn nicks the error-containing strand, setting into motion strand excision and resynthesis by other DNA repair and replication proteins; in E. coli and related bacteria that employ methyl-directed MMR, MutL stimulates MutH to nick the strand.1−4 MMR is a highly conserved pathway that is essential for suppressing excessive spontaneous mutagenesis that leads to genome instability. Loss of MMR, e.g., due to defective MutS or MutL proteins, results in a high mutator phenotype that is associated with the hereditary Lynch syndrome as well as 10–30% of sporadic tumors in a variety of tissues.5,6
Our study focuses on MutS in order to investigate how this large, multidomain protein employs DNA/mismatch binding and ATPase activities to initiate DNA repair. Crystallographic studies have provided detailed information about the structure–function properties of MutS, especially for the two active sites. However, the functional significance of the vast majority of amino acids beyond the active sites remains unresolved, particularly with respect to residues involved in allosteric signaling between the sites, which are located ~70 Å apart. One bottleneck is the development of systematic hypotheses to direct mutational analysis of these ~1000-amino acid proteins. We approached this problem by applying statistical coupling analysis (SCA) on the evolutionarily conserved MutS protein family, which identified a network (sector) of 160 covarying residues distributed widely across the protein. We then tested the hypothesis that this coevolved network has functional significance by performing molecular dynamics (MD) simulations on point mutants of the top 5% sector residues and control nonsector residues, to measure their specific contributions to MutS structure–function. The results showed that sector residue mutations predominantly perturb MutS domain structure/dynamics and its interactions with DNA and/or ATP. Moreover, many of these disruptive effects are long-range, indicating that the sector enables allostery between the two active sites on MutS. Thus, the combined SCA/MD approach provided novel, experimentally testable hypotheses about previously unknown functions of MutS residues, including some whose mutations are associated with the Lynch syndrome.
Crystal structures of several MutS proteins bound to mismatched DNA have been solved, including T. aquaticus7,8 and E. coli MutS homodimers9−11 as well as human MSH2-MSH612 and MSH2-MSH3 heterodimers13 (Figure 1A). The interactions of MutS with DNA during mismatch search and recognition, and subsequent interactions with other proteins to initiate repair, are modulated by ATP binding/hydrolysis (Figure 1B).14−23 The structures show that the mismatch-binding and ATPase sites on MutS are separated by a distance of about 70–100 Å and intervening protein domains; nonetheless, biochemical studies have revealed that these two active sites are tightly coupled through the repair process, as outlined below (reviewed in ref (14)). When not bound to a mismatch, one subunit of the dimer hydrolyzes ATP rapidly and the other slowly (fast: S1/MSH6; slow: S2/MSH2), and MutS with ADP bound to at least one subunit (S2/MSH2) is predominant in steady state.15,19,24−27 ADP-bound MutS is capable of enclosing DNA and diffusing along the helical contour probing for mismatched bases.16,28−30 Mismatch binding results in ADP-ATP exchange and suppression of ATP hydrolysis, such that ATP-bound MutS becomes predominant.15,24 ATP-bound MutS interacts with and activates MutL, and different studies indicate that the complex can pause at the mismatch or slide away to initiate repair (Figure 1B).11,19,21,22,31,32 Thus, ATP binding/hydrolysis and mismatch binding/release are reciprocally linked through allosteric communication between the two active sites on MutS, enabling the protein to transit through different conformations in order to seek, recognize, and initiate repair of a mismatch in DNA.
The structural basis for allosteric communication is difficult to understand at the amino acid level, particularly in a large, multidomain protein such as MutS, although crystal structures and computational studies have offered many insights. The crystal structure of Thermus aquaticus (Taq) MutS,7,8 the subject of this study, shows a homodimer bound to a 23 basepair DNA containing a T-bulge and ADP-BeF3¯ bound in both ATPase sites (Figure 1; PDB ID: 1NNE);8 this structure likely reflects an ATP-bound MutS state prior to conformational changes that enable interaction with MutL and movement on DNA.11 Each MutS subunit (S1 and S2) is subdivided into five domains (I–V). The DNA(+T) binds in the lower channel of the “θ” shaped dimer and is kinked by ~60° at the T-bulge (all MutS-DNA structures reported thus far have captured this kinked DNA complex).3 The N-terminal mismatch-binding domain (domain I) contains a highly conserved Phe-X-Glu motif that serves as a “reading head” and makes base stacking and hydrogen bonding contacts with the T-bulge. Like MutS ATPase activity, mismatch binding is also asymmetric and the base-specific contacts occur only between one subunit and DNA (S1/MSH6).7,12 Together with domains I, the clamp domains (domain IV) from both subunits complete the DNA binding site, enclosing the duplex and making nonspecific contacts with the sugar–phosphate backbone.
The C-terminal domain (domain V) contains the highly conserved Walker A (P-loop) and Walker B (Mg2+ binding) motifs belonging to the ABC transporter (ATP binding cassette) superfamily that form ATPase active sites at dimer interfaces.33 The C-terminal domain also contains a conserved helix-turn-helix motif that is involved in MutS dimerization and ATPase activity.34 The remaining connector (domain II), lever (domain IIIa) and lever-clamp (domain IIIb) domains form the core of the protein between the mismatch binding and ATPase active sites; these domains are also part of the interface that binds MutL.11,35 The crystal structures of E. coli MutS homodimer and human MSH2-MSH6 heterodimer also show the same overall domain organization and interaction with mismatched DNA.9,12 Based on these structures, it has been proposed that the long α helix spanning the protein from domain IV to V could transmit allosteric signals between the mismatch binding and ATPase sites (Figure 1A).7 Elements in connector domain II, which is surrounded by domains III and V,12 the junction where domains II, III, and V intersect,7,12 and the flexible loops lining the upper channel near the ATPase sites12 have also been proposed to be involved in allostery.8,12
A comparison between the crystal structure of G:T mismatch- and ADP-bound human MSH2-MSH6 and molecular dynamics (MD) simulations of the G:T bound/free and nucleotide-free protein showed subtle reorientation of amino acid residues in the ATPase sites associated with mismatch binding, providing theoretical and experimental evidence of allosteric communication.36 Normal mode calculations of G:T bound/free MSH2-MSH6 and E. coli MutS revealed strong motional correlation between the lever (III) and ATPase (V) domains, and between the mismatch binding (I) and ATPase (V) domains, in low frequency modes. These findings suggested that allostery between the two active sites involves coupled motions across MutS domains.37 A subsequent all-atom MD study compared ATP-bound/free forms of Taq MutS to assess nucleotide binding-induced changes in protein structure and dynamics.38 Dynamical cross-correlation maps of the atomic fluctuations indicated inter-residue and inter-domain motional coupling across the protein, and principal component analysis (PCA) of the maps revealed clusters of residues with correlated motions between the active sites, most prominently in the fully liganded ATP- and T-bulge-bound MutS complex. In addition to basepair mismatches, MSH2-MSH6 also binds DNA damage lesions and triggers a cell death response. MD analysis of MSH2-MSH6 bound to G:T mismatch versus platinum cross-linked DNA revealed distinct changes in disordered loops in the MSH6 lever and MSH2 ATPase domains, which could reflect different allosteric responses to the two DNAs.39 All of these computational studies implicated protein dynamics in allostery and identified the key parts of MutS and the motions that might be involved. Nonetheless, the underlying amino acid-level architecture in the MutS protein family that enables long-range communication remains elusive.
A recent dynamic network analysis based on MD simulations of E. coli MutS in all ATP/ADP-bound/free forms (apo, ADP, ATP in each of the two sites) identified sets of physically contiguous residues between the DNA binding and ATPase sites, and postulated that these constitute pathways of communication that change with the nucleotide-liganded state of the protein.40 The amino acids making up these pathways (151 of 800) could constitute the underlying architecture for allosteric signaling, although the assumption that allostery occurs through chains of interacting residues in MutS, and the functional significance of the particular residues identified in that study, have not been explicitly tested. Below we describe a complementary bioinformatics- and molecular modeling-based approach to the problem, using statistical coupling analysis (SCA) to identify network(s) of coevolving residues from multiple sequence alignments of the MutS protein family, followed by MD analysis to empirically test their involvement in allostery.
The SCA method of identifying cooperative networks of amino acids from evolutionary covariation within a protein family has been applied to a few different proteins in recent years and has yielded new insights into the structural basis of their functions.41,42 These groups of coevolved residues are termed statistical or SCA sectors, and they appear to play a significant role in the structure and activity of the proteins analyzed thus far, including PDZ domains,42 serine protease (S1A),41 dihydrofolate reductase (DHFR),43 Hsp70 chaperone,44 cathepsin K,45 as well as PAS, SH2, and SH3 domains.41 SCA sectors typically comprise a small fraction (~20%) of the total number of residues in a protein, and present as a distributed, contiguous network that includes the active site(s) and spans distant locations on the protein.42 Intra- and inter-domain connections within this network suggest that the residues are involved in long-range allosteric communication, and support amino acid coevolution as a means for establishing a major function within a protein family. For example, sector residues identified in the PDZ domain family connect the ligand-binding site with an allosteric site at the opposite surface of the protein. A recent high-throughput experiment tested the predictive capability of SCA by mutating each of the 83 amino acids in the PSD95pdz3 domain to every other amino acid and testing the impact of these changes on ligand binding affinity.42 The results showed that sector residues were much more likely to be important for PDZ function compared with nonsector residues. 
Mutational analysis of some sector residues in Hsp70 also supports the possibility that SCA can identify amino acid networks that allosterically couple distant active/regulatory sites on a protein.44 In this study, we applied SCA to the MutS family of mismatch repair proteins to discover the presence of such a network shaped by coevolution, and then subjected several residues in the predicted network to mutational analysis by MD in order to independently assess their contributions to structure–function. The results provide a foundation for new ideas about the roles of evolutionarily correlated amino acids in MutS, especially in allosteric communication.
A total of 105 orthologous MutS gene sequences were obtained from a previous study on the evolution of archaeal and bacterial MutS, and eukaryotic MSH2, MSH6 genes.46 An additional 59 orthologous sequences were obtained from the Conserved Domain Database (family ID 235444), an NCBI-curated collection of domain models based on 3D structure.47 The MutS structure-based alignment from Warren et al.12 was used as a reference to guide alignment of all 164 collected sequences using Clustal Omega.48,49 Sequences with >90% identity were removed, resulting in 142 aligned sequences in the final set subjected to SCA. The LoCo program50 was used to postprocess the alignment and detect misaligned positions by local covariance values. The alignment was found to be significant at all positions; therefore, no modifications were required.
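The redundancy filtering step (removal of sequences with >90% identity) can be illustrated with a minimal Python sketch. This is not the procedure used in the study, whose exact identity definition and filtering order are not stated; the function names, the gap-skipping identity measure, and the greedy first-come ordering are all assumptions made for illustration, and the toy sequences are invented.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gaps ('-') are skipped; other identity definitions exist.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_redundant(seqs: List[str], max_identity: float = 90.0) -> List[str]:
    """Greedy filter: keep a sequence only if it is not more than
    max_identity percent identical to any sequence already kept."""
    kept: List[str] = []
    for s in seqs:
        if all(percent_identity(s, k) <= max_identity for k in kept):
            kept.append(s)
    return kept


# Invented toy alignment (not MutS sequences).
toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIKL", "MNPQRSTVWY", "ACDE-GHIAA"]
filtered = filter_redundant(toy)
print(len(toy), "->", len(filtered), "sequences")
```

Because the filter is greedy, the retained set depends on input order; a clustering-based approach would be a reasonable alternative.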
The SCA 5.0 method as implemented in MATLAB (The MathWorks, Inc., Natick, MA, USA) was used for analysis of evolutionary covariance in the MutS protein family. SCA is described in a series of articles by Ranganathan and co-workers,41−44,51−53 and the program was obtained from the Web site: http://systems.swmed.edu/rr_lab/sca.html. SCA is a multivariate analysis in which the calculated weighted positional correlation matrix of the MSA is decomposed into eigenvalues and eigenvectors by spectral decomposition. Co-evolved positions were obtained from the statistically significant eigenmode identified by comparing the original and 100 randomized alignment distributions. The first eigenvector of the weighted positional correlation was analyzed, and a ~20% cutoff yielded a sector of residues that were outside the expectation of randomized alignments (160 of 739 MutS positions).
Crystal structure based native contact maps have been used extensively as geometrical frameworks to understand the structural basis of protein folding and function.54−56 Native contacts formed by heavy atoms of MutS sector residues were calculated by the “shadow map” algorithm;55 shadow map identifies all heavy atom based direct contacts within 4–6.5 Å cutoff distance in the crystal structure (default 6.0 Å) and discards potential contacts occluded by intervening atoms. The resulting MutS contact map was visualized using the Visone program57 in which sector residues were projected as nodes that were connected if their interatomic distances were lower than the cutoff distance. Maps were created at the default 6.0 Å cutoff and at a slightly longer 6.8 Å cutoff. Contact distance distributions for the most distant residue pairs, A146 (Cα)–Y244 (Cε) and M250 (Cε)–A597 (carbonyl O), were calculated from a 15 ns wild type MutS MD trajectory.
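The idea of projecting residues as nodes joined when their heavy atoms fall within a cutoff, and then reading off contiguous clusters, can be sketched in Python as below. This is a simplified distance-cutoff graph only: it does not implement the occlusion test of the shadow map algorithm, and the residue labels, coordinates, and function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Atom = Tuple[float, float, float]


def min_distance(a: List[Atom], b: List[Atom]) -> float:
    """Smallest heavy-atom distance between two residues (in Å)."""
    return min(math.dist(p, q) for p in a for q in b)


def contact_graph(residues: Dict[str, List[Atom]], cutoff: float) -> Dict[str, Set[str]]:
    """Connect two residues if any heavy-atom pair is closer than cutoff.

    Assumption: plain distance cutoff, no removal of occluded contacts.
    """
    names = list(residues)
    graph: Dict[str, Set[str]] = {n: set() for n in names}
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            if min_distance(residues[m], residues[n]) < cutoff:
                graph[m].add(n)
                graph[n].add(m)
    return graph


def clusters(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components of the contact graph (depth-first search)."""
    seen: Set[str] = set()
    out: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        comp: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        out.append(comp)
    return out


# Hypothetical one-atom "residues" placed on a line, spaced to straddle the two cutoffs.
toy = {"R1": [(0.0, 0.0, 0.0)], "R2": [(6.5, 0.0, 0.0)], "R3": [(20.0, 0.0, 0.0)]}
for cutoff in (6.0, 6.8):
    print(cutoff, len(clusters(contact_graph(toy, cutoff))))
```

In this toy case the pair separated by 6.5 Å is disconnected at the 6.0 Å cutoff and merges into one cluster at 6.8 Å, which is the kind of cutoff sensitivity the two maps are meant to expose.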
The contribution of sector residues to MutS structure and function was investigated by performing all atom MD simulations for alanine mutants of the top 5% sector residues (21 residues that are 2σ beyond the mean in the distribution) as well as 10 least covarying residues from the bottom of the distribution as control (termed as nonsector residues). The structure of a MutS-(ATP)2–DNA(+T) complex modified from 1NNE in a previous MD study was used as the starting point for this analysis; changes to 1NNE in that study entailed modification of ADP-BeF3¯ to ATP and introduction of residues at noncrystallizing disordered positions.38 Alanine point mutations were introduced by deleting the side chains of selected residues from the PDB input file except the beta carbon. Standard H-building was implemented on all structures to add hydrogens to the modified crystal structure.
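The selection rule described above (positions lying 2σ beyond the mean of the distribution, plus the least covarying positions as controls) can be sketched as follows. The per-position weights, position labels, and function name are hypothetical; the sketch assumes each position carries a single covariation weight and that the mean and σ are taken over all supplied positions, which the text does not specify.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def select_mutation_targets(
    weights: Dict[str, float], n_controls: int = 10, n_sigma: float = 2.0
) -> Tuple[List[str], List[str]]:
    """Return (top positions beyond mean + n_sigma*sd, lowest-weight controls)."""
    values = list(weights.values())
    threshold = mean(values) + n_sigma * pstdev(values)
    # Strongest covarying positions first.
    top = sorted(
        (pos for pos, w in weights.items() if w > threshold),
        key=lambda pos: -weights[pos],
    )
    # Least covarying positions serve as controls.
    controls = sorted(weights, key=lambda pos: weights[pos])[:n_controls]
    return top, controls


# Invented weights: 19 low values and one clear outlier.
toy_weights = {f"pos{i}": 0.01 * i for i in range(19)}
toy_weights["pos19"] = 1.0
top, controls = select_mutation_targets(toy_weights, n_controls=3)
print(top, controls)
```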
The AMBER12.058 suite was used with the ff14SB force field,59−61 which implements ff99bsc062 for DNA and uses parameters for ions supplied in frcmod.ion08;63 polyphosphate parameters were used for ATP.64 Alanine point mutations were created using tleap in the AMBER suite. Each system was solvated in a 12 Å truncated octahedron box of TIP3P65 water molecules and electroneutrality was achieved with the addition of Na+ counterions. In addition, 150 mM NaCl was added to achieve a biologically relevant ionic strength. The system was treated under periodic boundary conditions. Long-range electrostatic interactions were treated with the particle mesh Ewald (PME) algorithm66−68 with a 10 Å Lennard-Jones cutoff. The Berendsen algorithm69 maintained the simulations at the target temperature of 300 K. SHAKE70 was applied to constrain bonds involving hydrogen atoms. The trajectory snapshots were saved every 2 ps. The equilibration phase of the simulation, initiated with 1000 steps of steepest descent energy minimization, was followed by 500 steps of conjugate gradient minimization to relax the solvent. This process was iterated four times with successively decreasing harmonic constraints: 100, 100, 10, 0 kcal/mol on solute and 20, 0, 0, 0 kcal/mol on the ions, respectively. Equilibration with harmonic constraints (25, 25, 15, 5 kcal/mol on solute and 20, 20, 10, 0 kcal/mol on the ions, respectively) facilitated heating of the system to 300 K over four 10 ps simulations. An additional 2 ns of dynamics without constraint was considered as equilibration and excluded from the analysis. Production of NPT simulation trajectories proceeded for 15 ns with an integration time of 2 fs. The wild type simulation was continued to 50 ns to check the stability of the trajectories.
Equilibration and production level simulations were run on CUDA-enabled NVIDIA Tesla K20 GPUs (2496 cores),71−73 which utilized the PMEMD version of sander in the Amber12.0 suite of programs.71,72 MD stability verification, dynamic cross-correlation maps, and analysis of metrics assessing protein structure–function were computed with the AMBER cpptraj74 and MM-GBSA75 modules on the resultant trajectories. Root-mean-square deviation (RMSD) and hydrogen bond donor–acceptor distances were monitored. Molecular visualization was carried out using VMD76 and PyMol.77
The RMSD of MutS domains was chosen as the first metric to assess the effects of alanine point mutations in 21 sector and 10 nonsector residues on protein structure and function. Average domain-wise RMSDs of both subunits in the wild type and mutant MD trajectories were calculated over five 3 ns windows of the total 15 ns simulation by global fitting of heavy backbone atoms onto the initial reference structure. A difference was considered significant if it varied at least 2σ from the mean of the wild type within the 3 ns window (Table S2). Variations were detected in the mismatch binding (I) and clamp (IV) domains, but not in the connector (II), lever (III), or ATPase domains (V); hence results are shown for I and IV (as well as V for comparison).
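The windowed significance test described above can be sketched in Python. This is an illustrative reading of the criterion, not the analysis script used in the study: it assumes per-frame domain RMSD values are already available as lists, splits them into equal windows, and flags a window when the mutant mean differs from the wild type mean by at least 2σ of the wild type values in that window. The function names and the toy RMSD values are invented.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def window_slices(n_frames: int, n_windows: int) -> List[Tuple[int, int]]:
    """Split frame indices into equal consecutive windows (remainder dropped)."""
    size = n_frames // n_windows
    return [(i * size, (i + 1) * size) for i in range(n_windows)]


def significant_windows(
    wt: List[float], mut: List[float], n_windows: int = 5, n_sigma: float = 2.0
) -> List[bool]:
    """Flag each window where |mean(mut) - mean(wt)| >= n_sigma * sd(wt)."""
    if len(wt) != len(mut):
        raise ValueError("trajectories must have the same number of frames")
    flags = []
    for lo, hi in window_slices(len(wt), n_windows):
        wt_mean = mean(wt[lo:hi])
        wt_sd = pstdev(wt[lo:hi])
        flags.append(abs(mean(mut[lo:hi]) - wt_mean) >= n_sigma * wt_sd)
    return flags


# Invented RMSD traces (Å): 20 frames standing in for five 3 ns windows.
wt_rmsd = [1.0, 1.2, 0.8, 1.0] * 5
stable_mutant = [1.1] * 20
destabilized_mutant = [1.1] * 8 + [2.0] * 12
print(significant_windows(wt_rmsd, stable_mutant))
print(significant_windows(wt_rmsd, destabilized_mutant))
```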
The second metric to assess the effects of sector and nonsector residue alanine mutations was hydrogen bonding to DNA and ATP. The MD ensemble of the MutS-(ATP)2–DNA(+T) complex38 reveals stable hydrogen bonds between R76 Nη1 and R76 Nη2 in the S1 subunit mismatch binding domain with cytosine 1545 and guanine 1546 backbone oxygens, respectively (note: R76 is present in the top 5% of sector residues and its mutation to alanine reduces the relative ATP binding free energy by 9.1 kcal/mol as described below in Metric 3); since the results are similar for both H-bonds, only the one with guanine is shown. K589 is a conserved residue in the Walker A motif required for MutS ATPase activity (K589M mutation increases KM by 18-fold33). Two H-bonds are formed between K589 Nζ and ATP Pβ and Pγ oxygens. The alanine mutation was considered disruptive if the H-bond was present in less than 20% of the MD trajectory snapshots.
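The 20% retention criterion can be expressed as a short Python sketch operating on a per-snapshot donor–acceptor distance series. The 3.5 Å distance cutoff used here to call a hydrogen bond "present" is an assumption (the geometric criterion is not given in the text, and no angle term is included), and the function names and distance values are invented for illustration.

```python
from typing import List


def hbond_occupancy(distances: List[float], max_distance: float = 3.5) -> float:
    """Fraction of snapshots in which the donor-acceptor distance is within
    max_distance (Å). Assumption: distance-only criterion, no angle check."""
    if not distances:
        raise ValueError("need at least one snapshot")
    present = sum(1 for d in distances if d <= max_distance)
    return present / len(distances)


def is_disrupted(
    distances: List[float], max_distance: float = 3.5, min_occupancy: float = 0.20
) -> bool:
    """An H-bond counts as disrupted if present in < 20% of snapshots."""
    return hbond_occupancy(distances, max_distance) < min_occupancy


# Invented distance traces (Å) for a stable and a broken contact.
wild_type = [2.8, 2.9, 3.0, 3.1, 2.9, 3.0, 2.8, 3.2, 2.9, 3.0]
mutant = [2.9, 4.4, 5.1, 6.0, 6.3, 7.2, 6.8, 7.0, 6.5, 6.9]
print(hbond_occupancy(wild_type), is_disrupted(wild_type))
print(hbond_occupancy(mutant), is_disrupted(mutant))
```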
The third metric used to assess the effects of alanine mutations was the relative ATP binding free energy calculated by the molecular mechanics generalized Born surface area (MM-GBSA) method.75 MM-GBSA combines molecular mechanics and continuum solvent-based generalized Born energies to calculate the relative ligand binding free energy.78,79 The calculation was performed on 100 snapshots averaged over each 15 ns MD simulation. The relative binding free energy (ΔΔGbinding) on ATP binding to both subunits of the MutS-DNA(+T) complex between wild type (ΔGbinding(wt)) and each of the 21 sector and 10 nonsector mutated residues (ΔGbinding(mut)) was determined by
The free energy associated with each term on the right side of above equation is estimated by standard MM-GBSA methods:
where Einternal (bond, angle and torsion), Gelectrostatic (electrostatic), and EVDW (van der Waals) interaction energies are molecular mechanical energies; Gpolar solvation is the polar solvation free energy calculated by the generalized Born implicit solvent model; and Gnonpolar solvation is calculated with a linear dependence on the solvent accessible surface area.75 Note that the entropy contribution is neglected in the relative binding free energy calculation. ΔΔGbinding was considered to be significant if the value was >2.5 kcal/mol.
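The two display equations referred to in this subsection are not reproduced in this text. A reconstruction consistent with the definitions given in the preceding sentences, following the usual MM-GBSA formulation, might read as below. The sign convention of the first equation (mutant minus wild type) and the complex/receptor/ligand decomposition in the second are assumptions; the generalized Born term is written as G_polar solvation.

```latex
\begin{align}
\Delta\Delta G_{\mathrm{binding}}
  &= \Delta G_{\mathrm{binding}}(\mathrm{mut}) - \Delta G_{\mathrm{binding}}(\mathrm{wt}) \\
\Delta G_{\mathrm{binding}}
  &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}} \\
G
  &= E_{\mathrm{internal}} + G_{\mathrm{electrostatic}} + E_{\mathrm{VDW}}
   + G_{\mathrm{polar\ solvation}} + G_{\mathrm{nonpolar\ solvation}}
\end{align}
```

Under this reading, each G on the right side of the second equation is evaluated with the third equation and averaged over the trajectory snapshots, with no entropy term included.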
In the mismatch binding site, all atoms of R76 and guanine 1546 were aligned between the wild type (green) and each mutant structure (magenta) in Pymol.77 K589 and ATP were similarly aligned in the ATPase site. The final MD snapshots of the wild type and mutant proteins are shown to illustrate changes in both sites; movies of the MD trajectories for wild type MutS and two mutants, R172A and I553A, are included in Supporting Information.
The promise of SCA is the discovery of patterns of correlation among amino acids arising from functional constraints on a protein through evolution. Unlike multiple sequence alignment-based approaches that provide information on evolutionarily conserved residues, SCA resolves residues that are coevolving, not just pairwise but as a network. The hypothesis currently being tested in the field is that this coevolved network can provide a basis for understanding the structure/dynamics and interactions underlying protein function. We applied SCA to the MutS family of proteins because little is known about how the architecture of this critical DNA mismatch repair protein enables it to find and initiate repair of base pairing errors in DNA with the help of ATP. The SCA matrix calculated for a curated multiple sequence alignment of 142 MutS homologues yielded a top eigenvector that was well separated from statistical noise compared to a randomized sequence alignment (Figure S1). This eigenvector revealed a protein sector comprising 160 residues of 739 total in the aligned sequences (~22%; Table S1). The sector residue positions did not correlate strongly with highly conserved residues (Figure S1), indicating that SCA captured information beyond conservation in the MutS protein family.80
Projection of the sector residues on the T. aquaticus MutS dimer structure shows that they are widely distributed over all five domains of the protein (Figure 2A), with a large majority (77.5%) located in the DNA binding (mismatch binding, I; clamp, IV) and ATPase (V) domains. Only about 30% of the residues are less than 8 Å from the mismatch binding and ATPase sites; thus, the SCA sector includes key active site residues as well as distant residues that may have indirect effects on DNA binding and ATP binding/hydrolysis. Some of these 160 residues have been previously identified or postulated as important for MutS structure–function, providing support for the significance of the SCA sector. These include (i) F39 and E41 of the F-X-E motif in domain I that stack and hydrogen bond with the mismatched base, respectively;7,81 (ii) E99, P100, G106 in the glutamate-rich VEPAEEAEG loop in domain I, which is part of a β hairpin that contacts DNA7 and moves away when MutS binds ATP, possibly facilitating sliding of the protein on DNA;38 (iii) 18 positions that are implicated in key hydrogen bonding and salt bridge interactions for carboplatin- and cisplatin-lesion recognition by human MSH2-MSH6;82 (iv) F567 in domain V that stabilizes the adenosine base in the ATPase active site;33 (v) L631, A632 and G633 in the SDDLAGGKST loop in domain V, containing the highly conserved N-2 motif (ST) that binds the ATP γ-phosphate and contacts lever domain III in the other subunit;7,8,38 (vi) A745, G746, R754 and L759 in the helix-turn-helix motif in domain V, which is involved in MutS dimerization and ATPase activity;7,34 (vii) A537, F724, and H726 (A562, F758, and H760 in E. coli MutS, respectively) that lie at the interface between MutS and MutL;11 (viii) 35 residues in the region proposed as a “transmitter” of signals between the DNA binding and ATPase sites, formed by the junction of domains II, III, and V and an α helix in domain IV (Table S1);7 (ix) 48 positions that overlap with 151 pathway residues identified by a recent dynamic network analysis of E. coli MutS;40 and finally, (x) 48 positions that overlap with mutations in human MSH2 and MSH6 subunits associated with Lynch cancer syndrome (Table S1; http://insight-group.org).5
The literature survey above shows that the SCA sector includes amino acids previously implicated in MutS function, providing initial support for the utility of the method. However, given that most of the sector residues have no previously assigned function, a systematic analysis was initiated to examine whether and how the sector might enable MutS actions in MMR. First, the spatial relationship between sector residues was determined by a shadow contact map, in which interactions between amino acids are defined by an interatomic distance cutoff between heavy atoms and those potentially occluded by intervening atoms are removed.55 Figure 2B shows all 160 sector residues colored by MutS domain, with lines connecting those in contact with each other at a default 6.0 Å cutoff. The residues group into three major structurally contiguous clusters, with cluster 1 linking domains I (red), II (green), IIIa (periwinkle) and IV (yellow); cluster 2 covering domain IV (yellow); and cluster 3 linking domains IIIa (periwinkle) and V (cyan). In the case of smaller proteins examined previously by SCA, such as PDZ and multidomain Hsp70, most of the sector residues form a single contiguous network spanning the protein.42,44,53 In the case of MutS, the sector manifests as discrete clusters of connected residues, which might reflect local architecture that supports distinct MutS activities, such as DNA/mismatch binding, ATPase, interaction with MutL, and/or allostery through mechanisms other than physically connected pathways between distant sites.83,84 Notably, a small increase in the cutoff distance to 6.8 Å results in a large network that includes 67% of the sector residues and connects all five MutS domains; cluster 2, which covers domain IV, remains distinct (Figure 2C).
Two residue pairs make critical contacts in this larger network (highlighted by gray rectangles in Figure 2C), A146 (Cα)–Y244 (Cε) and M250 (Cε)–A597 (carbonyl O), at average interatomic distances of 6.8 and 6.5 Å, respectively, as calculated from a wild type MutS MD trajectory (Figure S2); the minimum distances between these pairs are 5.7 and 5.0 Å, respectively, within the range for short-range interactions. This network does include contiguous pathways linking the DNA binding and ATPase sites, one of which is projected on a subunit of Taq MutS (Figure 2D).
As noted in the Introduction Section, the structural network defined by an SCA sector has potential functional significance, assuming that the amino acids have coevolved to retain key structure–function properties of the protein. A few studies have tested this hypothesis experimentally by mutating a subset of sector residues and assessing the impact on function, e.g., Hsp70.44 Such analysis is generally limited in scale, especially for large proteins like MutS, given the need to generate a vast number of mutants and monitor multiple interactions and activities for systematic testing. The difficulty is compounded by the need for combinatorial mutagenesis to investigate any coordination among residues in the network. We tackled the problem by combining SCA with all-atom molecular dynamics (MD) simulations of sector residues mutated to alanine. The goal was to monitor any changes in local or global dynamics associated with the mutations, and determine whether these changes could perturb MutS structure and related functions.85 We selected 21 positions constituting the top 5% of the sector residue distribution (Figure 2B, C, filled hexagons), and 10 least covarying positions as control “nonsector” residues for a proof-of-principle mutational study. Of the 21 sector residues, ten were from various MutS domains in cluster 1, two were from cluster 2, five from cluster 3, and the remaining four were distributed across individual or small groups of residues that did not map to these networks (Figure 2B, C). MD simulations were performed with the 31 total mutants to compare the effects of alanine mutations in sector versus nonsector residues.
The simulations were performed with a MutS-(ATP)2–DNA(+T) complex38 derived from the MutS-(ADP-BeF3¯)2-DNA(+T) crystal structure,8 which represents a critical MutS intermediate in the MMR pathway. Additional simulations were performed with ADP-bound and mixed ATP/ADP-bound intermediates for select mutants as well. Each mutant was subjected to 15 ns MD simulations, and the data were analyzed with respect to three key measures of MutS structure/dynamics and function, namely domain conformation and interactions with DNA and ATP. The first metric was domain-wise RMSD values calculated for both S1 (mismatch binding) and S2 subunits from the MD trajectories of wild type and mutant protein dimers (Table 1, Figures 3, S3, and S4). The second metric was the stability of hydrogen bonds in the active sites: between the mismatch binding domain I and DNA, and between the Walker A motif and ATP (Table 1, Figures 4, S5, and S6). R76, a top 5% sector residue in the mismatch binding subunit S1, makes two stable hydrogen bonds with DNA flanking the T-bulge (between R76 Nη1 and Nη2 hydrogens and cytosine 1545 and guanine 1546 backbone oxygens, respectively; only the bond with guanine is shown for clarity).38 These contacts may help distort the DNA to facilitate F-X-E interaction with the mismatched base (widening of the minor groove and kinking toward the major groove).7 The second set of hydrogen bonds is between K589, a highly conserved lysine in the Walker A motif/P-loop that is essential for ATPase activity, and ATP (between K589 Nζ and ATP Pβ and Pγ oxygens).7,33 The third metric was the relative ATP binding free energy calculated by the molecular mechanics combined with generalized Born surface area method (MM-GBSA; Table 1), which is used often to estimate ligand binding affinities based on MD simulations of ligand-macromolecule complexes,85,86 including recently to illustrate changes in E. coli MutS affinity for DNA when bound to ATP87 (MM-PBSA yielded similar results; data not shown).
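The relative ATP binding free energy described above can be illustrated with a minimal sketch. It assumes a single-trajectory end-point scheme in which the per-frame free energies of the complex, the receptor, and the ligand are already available from an MM-GBSA calculation; the text does not spell out the protocol, so the function names and all energy values below are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Average of G_complex - G_receptor - G_ligand over frames (kcal/mol)."""
    return mean(c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand))

def relative_binding_free_energy(mutant, wild_type):
    """ddG = dG_bind(mutant) - dG_bind(wild type); positive means weaker binding."""
    return binding_free_energy(*mutant) - binding_free_energy(*wild_type)

# Hypothetical per-frame energies (complex, receptor, ligand), NOT from the study.
wild_type = (
    [-1050.0, -1052.0, -1048.0],
    [-900.0, -901.0, -899.0],
    [-100.0, -101.0, -99.0],
)
mutant = (
    [-1046.0, -1047.0, -1045.0],
    [-900.0, -900.0, -900.0],
    [-100.0, -100.0, -100.0],
)

ddg = relative_binding_free_energy(mutant, wild_type)
print(f"ddG_binding = {ddg:.1f} kcal/mol")
```

In this toy case the wild type binds with an average of -50 kcal/mol and the mutant with -46 kcal/mol, so the relative value is +4 kcal/mol.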
Results for all three measures described above are presented in Table 1 for the 21 sector and 10 nonsector alanine mutants of MutS. Overall, 20 of the 21 sector mutations disrupt at least two metrics and 10 disrupt all three metrics, whereas only 2 of the 10 nonsector mutations disrupt two metrics (F78A, K161A) and none disrupt all three (Table 1). Two sector mutants (R172A, I553A) and one control nonsector mutant (Y167A) are highlighted in Figures 3 and 4, and the remainder are presented in Supporting Information (Figures S3–S6). With respect to the first metric, domain-wise RMSD across the MD trajectories of several mutants showed significant changes in the mismatch binding (I) and DNA binding clamp (IV) domains (defined as 2σ from the mean of the wild type within 3 ns windows; Table S2). No significant changes were observed in domains II, III and V; therefore, trajectories are shown only for domains I and IV, plus the stable ATPase domain (V) for comparison (Figure 3). Overall, 19 of 21 sector mutants and only 2 of 10 nonsector mutants showed domain destabilization (Table 1). For example, the clamp domains of both S1 and S2 subunits are destabilized in R172A (Figure 3B), and the mismatch binding domains of both S1 and S2 subunits are destabilized in I553A (Figure 3C). In contrast, all domains in both S1 and S2 subunits remain stable in the Y167A nonsector mutant (Figure 3D). MD simulations of both R172A and I553A were extended further to 50 ns and yielded the same results (Figure S3). Four independent 50 ns MD simulations were performed for the I553A mutant, and all of them yielded the same results as well (data not shown). With respect to the second metric, hydrogen bonding between MutS and either DNA or ATP is disrupted in 17 of 21 sector mutants and only 2 of 10 nonsector mutants (Table 1). 
For example, hydrogen bonding between R76 and the DNA backbone flanking the T-bulge is disrupted in both R172A and I553A mutants but not in Y167A (defined as less than 20% H-bond retention over the MD trajectory; Figure 4B–D, respectively). Hydrogen bonding between K589 and ATP is also disrupted in both R172A and I553A mutants but not in Y167A (Figure 4F–H, respectively); movies of the changes in hydrogen bonding over the MD trajectories in both active sites are included in Supporting Information. Finally, with respect to the third metric, the relative ATP binding free energy also changed significantly for 17 of 21 sector mutants and only 2 of 10 nonsector mutants (ΔΔGbinding > 2.5 kcal/mol); e.g., for both R172A and I553A mutants but not for Y167A (Table 1). Thus, an overwhelming majority of the sector residues tested in this proof-of-principle MD analysis contribute to structural properties of MutS that are critical for its function.
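The three thresholds quoted above (2σ from the wild type mean within 3 ns windows, less than 20% H-bond retention, and ΔΔGbinding > 2.5 kcal/mol) can be expressed as simple tests. The sketch below is one possible reading of those criteria, not the analysis code used in the study: it assumes per-frame RMSD values, compares the mutant window mean with the wild type window mean, uses the population standard deviation, and takes `frames_per_window` as however many frames correspond to 3 ns. All function names and toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def rmsd_destabilized(wt_rmsd, mut_rmsd, frames_per_window):
    """True if, in any window, the mutant mean RMSD lies more than
    2 sigma from the wild-type mean of the same window (assumed reading)."""
    n = min(len(wt_rmsd), len(mut_rmsd))
    for start in range(0, n, frames_per_window):
        wt_win = wt_rmsd[start:start + frames_per_window]
        mut_win = mut_rmsd[start:start + frames_per_window]
        if len(wt_win) < 2 or not mut_win:
            continue  # too few frames to estimate a spread
        if abs(mean(mut_win) - mean(wt_win)) > 2 * pstdev(wt_win):
            return True
    return False

def hbond_disrupted(present_per_frame, cutoff=0.20):
    """True if the bond is present in fewer than `cutoff` of the frames."""
    retention = sum(present_per_frame) / len(present_per_frame)
    return retention < cutoff

def atp_binding_changed(ddg_kcal_per_mol, cutoff=2.5):
    """True if the relative binding free energy exceeds the cutoff."""
    return ddg_kcal_per_mol > cutoff

# Hypothetical toy data, NOT taken from Table 1.
wt_rmsd = [1.0, 1.2, 0.8, 1.0, 1.1, 0.9]
mut_rmsd = [1.0, 1.1, 0.9, 2.5, 2.6, 2.4]
flags = (
    rmsd_destabilized(wt_rmsd, mut_rmsd, frames_per_window=3),
    hbond_disrupted([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    atp_binding_changed(4.0),
)
n_disrupted = sum(flags)
print(flags, n_disrupted)
```

Counting the flags per mutant gives the "number of metrics disrupted" used to compare sector and nonsector residues.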
As noted earlier, the MutS dimer has asymmetric ATPase activity; accordingly, it can adopt nine different ATP/ADP-bound/free forms during the ATPase reaction cycle. We chose to perform MD simulations first with the MutS-(ATP)2–DNA(+T) complex, since ATP-bound MutS is a key intermediate that is stabilized after mismatch recognition and undergoes conformational changes resulting in interaction with MutL to license nicking of the error-containing strand. Previous MD studies have shown that other nucleotide-bound forms of E. coli MutS maintain the same overall structure,37,40 which implies that the structural network revealed by SCA could contribute to function in these forms as well. We tested this hypothesis by performing MD simulations on some of the ADP-bound forms of MutS that occur prior to mismatch recognition: MutS-ADPS1-ATPS2–DNA(+T), MutS-ATPS1-ADPS2–DNA(+T), and MutS-(ADP)2–DNA(+T).14 Table S3 shows the effects of mutating sector residues R172 and I553 as well as nonsector residue Y167 to alanine on ADP-bound MutS. All ADP-bound forms of R172A and I553A show domain destabilization, but not Y167A (except ATPS1-ADPS2-bound MutS). Also, hydrogen bonding between R76 and the DNA backbone is disrupted in R172A and I553A but not in Y167A for all three ADP-bound MutS forms. The same is true for hydrogen bonding between K589 and the nucleotide (except that all mutants retain this contact in (ADP)2-bound MutS). Thus, the results are the same overall as for ATP-bound MutS, where sector mutants are substantively more disruptive than nonsector mutants.
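The count of nine nucleotide-bound forms follows if each of the two sites (S1 and S2) can independently hold ATP, hold ADP, or be empty. The short enumeration below makes this explicit; the independence assumption and the variable names are ours, introduced for illustration.

```python
from itertools import product

SITE_STATES = ("ATP", "ADP", "empty")

def dimer_nucleotide_states():
    """All (S1, S2) occupancy combinations for the two nucleotide sites."""
    return list(product(SITE_STATES, repeat=2))

states = dimer_nucleotide_states()
print(len(states))  # 9

# The four forms simulated in this work, written as (S1, S2).
simulated = [("ATP", "ATP"), ("ADP", "ATP"), ("ATP", "ADP"), ("ADP", "ADP")]
print(all(form in states for form in simulated))  # True
```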
Remarkably, 20 of the 21 sector residue mutants tested in this study exert their disruptive effects on MutS domain conformation or interactions with DNA and/or ATP from a distance. This property is indicative of their involvement in allosteric communication across the protein. For example, R172A in connector domain II perturbs DNA binding clamp domain IV conformation as well as interaction with ATP from a distance of ~25 Å, and I553A in ATPase domain V perturbs mismatch binding domain I conformation as well as interaction with mismatched DNA from a distance of ~60 Å (Table 1, Figures 3 and 4). As noted earlier, physical contiguity is a characteristic of SCA sector residue networks identified in proteins thus far, providing support for the view that allostery occurs through direct interactions between amino acids spanning distant locations.51,53,88,89 Such allosteric pathways might operate in MutS as well, since about 2/3 of the sector residues are part of a contiguous network that spans all five domains between the mismatch binding and ATPase sites.
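Physical contiguity of a residue set can be checked by treating residues as nodes, linking any two that are in contact, and extracting connected components. The sketch below is a simplified illustration: it represents each residue by a single point and uses an arbitrary distance cutoff, whereas a contact definition based on van der Waals radii would compare all atom pairs. The residue labels, coordinates, and cutoff are hypothetical.

```python
from collections import deque
from math import dist

def contact_components(coords, cutoff):
    """Group residues into connected components; two residues are linked
    when their representative points lie within `cutoff` (in angstroms)."""
    unseen = set(coords)
    components = []
    for name in coords:
        if name not in unseen:
            continue
        unseen.discard(name)
        component, queue = [name], deque([name])
        while queue:  # breadth-first search over contacts
            current = queue.popleft()
            near = [o for o in unseen if dist(coords[current], coords[o]) <= cutoff]
            for other in near:
                unseen.discard(other)
                component.append(other)
                queue.append(other)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

# Hypothetical residues and coordinates, NOT taken from the MutS structure.
coords = {
    "res1": (0.0, 0.0, 0.0),
    "res2": (3.0, 0.0, 0.0),
    "res3": (6.0, 0.0, 0.0),
    "res4": (30.0, 0.0, 0.0),
    "res5": (33.0, 0.0, 0.0),
}
components = contact_components(coords, cutoff=4.0)
print(components)
```

The largest component plays the role of the contiguous network, and residues in the smaller components correspond to sector positions that lie outside it.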
However, we note that sector residues that lie outside of the contiguous network in MutS also show effects at a distance (Figure 2C). For example, the Q468A mutation in domain IV perturbs domain I conformation as well as interaction with ATP from a distance of ~76 Å, E699A in domain V perturbs domain I conformation as well as interaction with DNA from a distance of ~70 Å, and L759A in domain V perturbs domain IV conformation as well as interaction with DNA from a distance of ~70 Å (Table 1, Figures S3 and S5). This finding implies that SCA can also reveal functionally relevant residues that are not in contact with other members of the network (this could be especially relevant for large multidomain proteins like MutS). It also supports the view that allostery need not rely on pathways of physically interacting residues and could emerge from multiple determinants of changes in the protein free energy landscape that stabilize one set of conformations over another.84,90 As noted earlier, MD studies of E. coli, T. aquaticus MutS, and human MSH2-MSH6 have revealed long-range correlated motions across the protein, implicating dynamics-based mechanisms of allostery.36−39 We therefore asked whether sector residues that are not part of contiguous pathways are involved in these conformational dynamics. Figure 5 shows difference maps calculated from the motional correlation matrices of Q468A, E699A, and L759A versus wild type MutS-(ATP)2–DNA(+T) complexes (Figure 2). All three mutants exhibit changes in correlated motions relative to the wild type protein. The changes most closely related to MutS activities, between the mismatch binding domain of S1 subunit and the ATPase domains of S1 and S2 subunits, are highlighted by black rectangles. 
The differences in motional coupling could reflect contributions of sector residues outside of a physically connected network to shifts in the MutS conformational ensemble that mediate allosteric communication between the DNA binding and ATPase sites.
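A difference map of motional correlations can be sketched as follows. The exact definition of the correlation matrix is not given in this section, so the code assumes the commonly used normalized cross-correlation of residue displacements, C_ij = <Δr_i·Δr_j> / sqrt(<Δr_i²><Δr_j²>), computed from one representative position per residue per frame, and then subtracts the wild type matrix from the mutant matrix. The toy trajectories are hypothetical.

```python
from math import sqrt

def correlation_matrix(trajectory):
    """Normalized displacement cross-correlations (assumed definition).
    `trajectory` is a list of frames; each frame is a list of (x, y, z)
    positions, one per residue. Every residue must move in at least one frame."""
    n_frames, n_res = len(trajectory), len(trajectory[0])
    mean_pos = [
        [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        for i in range(n_res)
    ]
    disp = [
        [[frame[i][k] - mean_pos[i][k] for k in range(3)] for i in range(n_res)]
        for frame in trajectory
    ]

    def cov(i, j):
        return sum(sum(a * b for a, b in zip(d[i], d[j])) for d in disp) / n_frames

    var = [cov(i, i) for i in range(n_res)]
    return [[cov(i, j) / sqrt(var[i] * var[j]) for j in range(n_res)]
            for i in range(n_res)]

def difference_map(mutant_corr, wild_type_corr):
    """Element-wise mutant minus wild-type correlation."""
    return [[m - w for m, w in zip(m_row, w_row)]
            for m_row, w_row in zip(mutant_corr, wild_type_corr)]

# Hypothetical two-residue, two-frame trajectories, NOT simulation data.
wt_traj = [[(-1.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (11.0, 0.0, 0.0)]]   # residues move together
mut_traj = [[(-1.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
            [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]]   # residues move oppositely

diff = difference_map(correlation_matrix(mut_traj), correlation_matrix(wt_traj))
print(diff)
```

In this toy case the off-diagonal element changes from +1 in the wild type to -1 in the mutant, so the difference map shows -2 there and 0 on the diagonal.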
To the best of our knowledge, MutS (~180 kDa dimer) is the largest protein to be examined by SCA, a statistical method that identifies evolutionarily conserved amino acid networks from sequence alignments of protein families. The analysis yielded a set of residues bearing the hallmarks of previously identified SCA sectors in smaller proteins, such as serine protease and PDZ domains.41,42 These residues are sparse (~20% of total) and spatially distributed, and the majority are part of a contiguous network defined by van der Waals contacts that spans all MutS domains and connects the two active sites. How to assess the potential evolutionary, structural, and/or functional significance of this coevolved network is an open question in the field. In this study we demonstrate that combining SCA with MD analysis of alanine mutants can provide a predictive model for the structure–function roles of individual amino acids and networks located beyond the active sites in MutS protein. This finding is especially significant in the case of large proteins where open-ended experimental mutagenesis is challenging, or for activities that are difficult to resolve experimentally, such as allosteric communication. The MD trajectory data enabled empirical testing of specific sector residue contributions to MutS structure–function, especially allostery, including conformational dynamics, hydrogen bonding with mismatched DNA and ATP ligands in the active sites, and free energy of ligand binding. Most notable among the findings were the disruptive impact of mutating distant sector residues on critical contacts between MutS and DNA/ATP, and the observation that residues both within and outside of contiguous pathways contribute to allostery. Thus, this study provides testable hypotheses for previously unknown functions of widely distributed amino acids in MutS, including their role in allosteric communication. 
We suggest that it can serve as a model for other large systems to explore the functional relevance of the evolutionarily conserved protein architecture revealed by SCA.
The authors would like to thank Ishita Mukerji and Sudipta Lahiri for insightful discussions and Henk Meij for technical support. This work was supported by NIH T32 GM008271 (molecular biophysics training grant), NSF grants CNS-0619508 and CNS-0959856 (Wesleyan University computational resources), and NIH grant R15 GM114743 (M.M.H.).
The authors declare no competing financial interest.