|Home | About | Journals | Submit | Contact Us | Français|
Protein function often depends on the exchange between conformational substates. Allosteric ligand binding or distal mutations can stabilize specific active site conformations and consequently alter protein function. In addition to comparing independently determined X-ray crystal structures, alternative conformations observed at low levels of electron density have the potential to provide mechanistic insights into conformational dynamics. Here, we report a new multi-conformer contact network algorithm (CONTACT) that identifies networks of conformationally heterogeneous residues directly from high-resolution X-ray crystallography data. Contact networks in Escherichia coli dihydrofolate reductase (ecDHFR) predict the long-range pattern of NMR chemical shift perturbations of an allosteric mutation. A comparison of contact networks in wild type and mutant ecDHFR suggests how mutations that alter optimized networks of coordinated motions can impair catalytic function. Thus, CONTACT-guided mutagenesis will allow the structure-dynamics-function relationship to be exploited in protein engineering and design.
Proteins fluctuate between alternative conformations to mediate their biological functions1,2. Perturbations that affect the relative populations of conformations caused by ligand binding3, mutation4, post-translational modification5, and temperature6 can affect biological mechanism. In addition to causing global or local unfolding7, perturbations can affect the structure and dynamics of groups of residues within the folded ensemble8. Proteins can evolve to exploit these perturbations for regulation: binding of ligands or the introduction of mutations at allosteric sites can stabilize transiently populated, but functional, alternative conformations at distant active sites9. However, these low population states are difficult to identify with most biophysical techniques10, making it challenging to characterize how structural fluctuations regulate protein activity by changing the populations of different conformers.
Traditionally, X-ray crystallography has provided a single static model of a “ground state”, which is assumed to represent the lowest energy conformation in the crystal lattice. More “dynamic” interpretations of crystallographic data have historically involved modeling an ensemble of independent conformations11–17. Recently developments in time-averaged refinement select a representative ensemble of 10s–100s of structures18. However, the relationship between individual ensemble members and the representation of conformational heterogeneity in the crystal remains the subject of debate15. In contrast, our recent work identifies conformational heterogeneity from high resolution X-ray diffraction data at levels of electron density (below 1σ) that are commonly ignored by manual model building efforts and are inadequately represented by harmonic atomic displacement parameters19,20 (Supplementary Fig. 1). Throughout this work, our analyses are based on such a single, sparse multi-conformer model built by qFit19.
Multi-conformer and ensemble analyses of X-ray data can be complementary to NMR8,21,22, simulations23, and co-evolutionary analyses24 to reveal how interactions between distant sites enable proteins to respond to perturbations. However, tools to uncover and interpret conformational diversity from X-ray crystallography data and to connect structural mechanisms for biomolecular dynamics remain underdeveloped. We developed an algorithm, COntact Networks Through Alternate Conformation Transitions (CONTACT), which automatically identifies conformationally heterogeneous residues that can connect functional sites, propagate chemical shift differences, and reveal the structural mechanisms of mutations that affect redistributions of the conformational ensemble.
To identify interacting residues that can respond collectively to perturbations, we used our robotics-inspired algorithm qFit19 to optimally fit up to four alternative conformations per residue into electron density features derived from experimental X-ray data that are consistent with anharmonic disorder (Fig. 1a). CONTACT analyzes the repulsive van der Waals interactions across all alternative conformations in the qFit multi-conformer model (Supplementary Fig 2.). The goal of this analysis is to define networks of conformationally coupled residues, in which movement between alternative conformations of one residue likely influences the conformations of all other residues in a network.
CONTACT first identifies ‘pathways’ of van der Waals overlaps. Each residue (e.g. residue i) in turn is moved to an alternative conformation, and overlaps of van der Waals radii with any neighboring residues (e.g. residue j) are identified (Fig. 1b). If neighboring residue j can be moved to an alternative conformation to reduce the steric overlap with residue i, the pathway is continued to neighboring residues of residue j (e.g. residue k, l, etc) until no new clashes are introduced. The relative frequency with which residues can and cannot reduce steric overlap is also recorded (Supplementary Fig. 3).
Pathways can include overlapping or nearly overlapping sets of residues using different combinations of alternative conformations. Thus, pathways that share common members indicate conformational coupling even if the residues are not directly linked in a single pathway. Any pathways that share residues are grouped into a single contact network (Fig. 1c). Thus, a pathway is a single sequence of residues that can be moved between alternative conformations such that van der Waals overlaps are reduced after each move. A contact network is a set of residues that are linked by common pathways.
To test whether CONTACT can automatically identify networks of residues from experimental X-ray data, we first examined the human proline isomerase Cyclophilin A (CYPA). Previous NMR experiments have demonstrated that CYPA undergoes conformational exchange both during its catalytic cycle and in the absence of substrate25. Room temperature X-ray electron density maps revealed extensive alternative side chain conformations for residues extending from the active site into the core, providing a structural rationale for the collective motion inferred by NMR4.
A large contact network of nine residues connects the hydrophobic core of the protein to the active site (Fig. 1d). This “red” contact network generally agrees with findings obtained from a visual analysis of room temperature X-ray data in providing a structural mechanism for collective conformational exchange detectable by NMR relaxation dispersion experiments25. Consistent with the idea that perturbations within the contact network will stabilize alternative conformations of other residues in the contact network, a mutation (S99T) of a core residue leads to chemical shift changes that spread across the red contact network and results in a kcat/KM that is ~0.3% of wild type (WT)4.
Notably, the red contact network consists of many independent pathways that connect contact network residues as reflected in the weights of the edges of the network graph (Fig. 1c). The large number of pathways suggests that the transition between the major and minor forms of the enzyme can occur by multiple structurally distinct mechanisms. The idea of multiple transition paths in CYPA is supported by recent NMR studies that incorporate evidence from the conformational dynamics of several mutants26. The contributions of these transitions are difficult to separate in the collective exchange fitting procedures of the NMR experiment26. Major and minor form end-states distinguished by alternative side chain conformations may dominate the conformational dynamics in the crystal4 and solution25; however, CONTACT identifies multiple plausible transition paths.
Dihydrofolate reductase (DHFR) is a model system for studying the relationship between conformational dynamics and catalytic activity27. The solution-state dynamics of the E.coli DHFR are dominated by the interconversion of the Met20 loop between closed and occluded conformations, which allow optimal substrate flux through the catalytic cycle28. We obtained X-ray diffraction data sets at cryogenic (100 K - 1.15 Å) and room temperature (~273 K - 1.35 Å) for the model Michaelis complex (E:NADP+:FOL) (Supplementary Table 1). Consistent with the trends observed in a larger dataset of 35 proteins (Figure 2, Supplementary Table 2, Supplementary Figs. 4–6, and Supplementary Note), the room temperature dataset exhibits both more (157 vs. 70) and longer (5.5 vs. 4.3) pathways than the cryogenic dataset.
The largest contact network from the room temperature dataset connects the functionally important FG loop to the NADP binding pocket and the adenosine binding domain (Fig. 3) This long-range connection is mediated by the cofactor NADP molecule, which uniquely connects the two subdomains (Fig. 3a and Supplementary Table 3). The electron density map is consistent with discrete disorder around the cofactor (Fig. 4a), providing further evidence that the bound NADP molecule is a dynamic hub. To test this model of conformational coupling of the FG loop to the adenosine-binding domain, we examined the chemical shift perturbations of a mutation in the FG loop (G121V). The NMR data for G121V is consistent with this hypothesis29. Changes in 15N and/or 1H chemical shifts between WT and the G121V mutant (E:NADP+:FOL) complexes propagate from the FG loop to the adenosine binding domain (Fig. 4b). However, no G121V X-ray structure has been published, leaving the structural origin of these long-range effects unclear.
Modeling a Val side chain at position 121 causes severe clashes with residues 13–15 and directly impinges upon the red ecDHFR contact network (Fig. 4c). The pattern of contacts in the network includes FG-loop residue 125 and extends to the adenosine-binding domain, resembling the chemical shift perturbations between WT and G121V ecDHFR. These results propose a physical model underlying the long-range chemical shift perturbations: the G121V mutation selectively stabilizes pre-existing conformations that propagate from the FG loop to the adenosine-binding domain. While the distance encompassed by the chemical shift changes is surprising, a single contact network mediates the direction and extent of long-range conformational coupling. In addition to these long-range effects, the “allosteric” mutation G121V destabilizes the catalytically competent closed conformation of the enzyme, stabilizes the occluded conformation, and reduces kcat approximately to ~0.6% of WT.
Several additional lines of evidence support this model. In molecular dynamics simulations, the dynamics of residue 121 are correlated with the dynamics of residues in the adenosine binding domain30. Sampling of locally unfolded states identified a similar pattern of energetic coupling7. NMR spin relaxation experiments in the fast (ps–ns) timescale found that the majority of residues with fast dynamics affected by G121V mutation are located in the adenosine binding domain29,31.
Based on our contact network analysis, we predicted that removing NADP would disrupt the conformational coupling of the FG loop to the adenosine-binding domain. To assess this idea, we compared 15N and 1H chemical shifts for the binary WT and G121V E:FOL complexes. In the absence of NADP, we observed no major chemical shift perturbations for residues in the adenosine-binding domain in the folate binary complexes (Fig. 4d). Thus, CONTACT provides testable hypotheses about how mutations shift conformational ensembles and induce long-range chemical shift perturbations observed in NMR experiments.
We used CONTACT to study the N23PP/S148A mutant of ecDHFR, which was designed to destabilize the occluded conformation of the Met20 loop. This mutant populates a nearly wild type structure as defined by a single cryogenic X-ray model32. Interestingly, and somewhat surprisingly, the N23PP/S148A mutant has a reduced rate of hydride transfer (khyd). In contrast to the WT enzyme, the N23PP/S148A mutant displays no evidence of conformational exchange for most active site residues on the millisecond timescale. However, faster (ps-ns) timescale backbone dynamics remained similar to WT32. While previous experiments indicated that the alteration of millisecond conformational dynamics in the mutant influenced the chemical step of catalysis, the underlying mechanism remained elusive. To further investigate the origins of the reduction in hydride transfer rate, we crystallized the N23PP/S148A mutant (E:NADP+:FOL) complex in the same crystal form as WT ecDHFR and collected a new room temperature data set to 1.38 Å resolution (Fig. 5, Supplementary Fig. 7, and Supplementary Tables 1 and 4).
Given the minimal structural effects of the mutation and the loss of observable millisecond conformational dynamics, we expected that CONTACT would reveal a large reduction in pathways and a decreased number of residues participating in active site contact networks. Surprisingly, we found a ~500% increase in the number of all-atom pathways in N23PP/S148A (806) compared to WT (157). In both WT and N23PP/S148A ecDHFR, a large contact network connects the FG loop to the adenosine-binding domain through the NADP cofactor. However, several residues of the Met20 loop were included in the mutant, but not wild type, contact network (Fig 5a). This result suggests that additional non-productive motions surrounding the active site of N23PP/S148A ecDHFR can influence the relative positions of the NADP and FOL during the reaction cycle. A second large contact network (blue) in the N23PP/S148A mutant reveals an extensive set of connections across the central beta strand of the protein (Fig. 5a,b). While in WT ecDHFR the connections across these beta strands were distributed over several contact networks (Fig. 4b,c), the entire beta sheet forms a single contact network in N23PP/S148A ecDHFR. These results suggest that the mutant enzyme exhibits increased active-site heterogeneity, despite the loss of detectable conformational averaging on the millisecond timescale.
To complement the results of CONTACT, we generated an isomorphous difference map between the WT and N23PP/S148A datasets. This map revealed that the most prominent difference features were located immediately adjacent to the site of mutation, corresponding to the change in amino acid identity (N23 to P) or proline insertion (Fig. 5c). However, several other features surrounding the active site suggest that the mutations had shifted the conformational distributions of neighboring residues. The presence of positive difference density without a corresponding negative difference density peak implies increased disorder in the mutant electron density map. A single model would fail to reveal any differences between the WT and mutant enzyme as the mean positions are not altered. Rather, one major effect of the mutation is to broaden the conformational distributions relative to WT, thus increasing frustrated, non-productive motions in the mutant enzyme. Indeed, difference density around the M20 loop residues Met20 and Trp22 revealed elevated conformational heterogeneity, and these residues participate in N23PP/S148A, but not WT, contact networks (Figs. 3 and and5a).5a). Additional difference density is observed across the beta sheet contacting the C terminus. These results provide a further indication that the mutations can alter dynamic properties in distant regions of the molecule and that the N23PP/S148A mutation increases conformational heterogeneity observable by X-ray crystallography.
The ability to infer networks of coordinated motion from X-ray data provides a gateway to understanding the structure-dynamics-function relationships in proteins. By analyzing experimental X-ray data, CONTACT complements existing methods for analyzing molecular dynamics trajectories33 Monte Carlo simulations34 or Rosetta-based analyses35. CONTACT automatically identifies coupled conformational heterogeneity, previously called “dynamic close packing”17, present in electron density distributions. These networks can connect allosteric and functional sites, explain the propagation of chemical shift differences, and reveal the structural mechanisms of mutations that have minimal effects on the ground state conformation.
Our analysis reveals that conformations accessible in the crystal are sensitive to perturbations such as temperature and mutations. Recent experiments have postulated that intramolecular pathways of signaling exist within a protein structure24. The physical basis of these pathways and how they are insulated from and supported by the surrounding protein structure remains unclear. While previous studies have largely focused on using the functional response to mutations to delineate the boundaries of intramolecular pathways, our results suggest that changes to contact networks as a function of data collection temperature may reveal new insights. For most proteins, we observed a decrease in the number and lengths of pathways at cryogenic temperature, implying that the broader conformational ensembles of residues at room temperature go beyond ‘filling the voids’ provided by expansion of the lattice6. Rather, networks of residues collectively sample alternative conformational substates. Some cryo-cooling procedures stabilize contact networks into specific conformational substates from those sampled at room temperature. It is important to note that exposure to X-ray radiation can introduce serious radiation damage artifacts that may complicate these analyses. Radiation damage is dramatically reduced at cryogenic temperatures. New advances in serial femto-second crystallography may mitigate some of these concerns, retaining the advantages of non-cryogenic collection with radiation damage protection that exceeds cryogenic techniques36. Thus, new biophysical tools may help circumvent the complex trade-off between collecting data below the glass transition37, which alters conformational heterogeneity, and protecting against radiation damage.
In CYPA, a large contact network consists of residues that collectively sample different conformations during the catalytic cycle. A mutation (S99T) outside of the active site that reduces the extent of the contact network, impairs the kinetics of interconversion, and reduces catalytic efficiency. In contrast, in ecDHFR, a mutant (N23PP/S148A) with a rigidified Met20 loop has the surprising effect of increasing non-productive, frustrated conformational heterogeneity in the active site. Frustration resulting from competing low-free-energy conformations that cannot be mutually satisfied are thought to play a key role in native state dynamics38. The loss of detectable millisecond conformational exchange in the active site is due to a decrease in backbone flexibility of the Met20 loop. However, local anharmonic side chain motions, which do not generate a large enough chemical shift difference or occur on a timescale inaccessible for detection by a relaxation dispersion experiment, have likely increased around the active site and may inhibit progress towards the transition state. Further NMR experiments, particularly for side chain methyl groups, may provide additional insights.
Superficially, the S99T mutation of CYPA and the N23PP/S148A mutation of ecDHFR are similar: both result in losses of catalytic activity and micro- to millisecond timescale conformational dynamics. However, the CONTACT results offer distinct interpretations of these mutations. In CYPA, an overpacked core decreases the kinetics of interconversion between conformational substates and catalytic rate. In contrast, in ecDHFR, the stabilization of Met20 loop dynamics have increased heterogeneity in the active site, which likely decreases the rate of hydride transfer. Enzyme engineering and design efforts could target residues in contact networks for simultaneous sequence optimization based on the principle that network residues exert a substantial influence on the conformations at other contact network sites. As protein design methods are optimized, the conformational dynamics necessary for catalytic cycles (for example, to occlude water or prevent product inhibition) will need to be defined and engineered. However, recently de-novo designed enzymes appear to suffer from packing imperfections that allow too much conformational heterogeneity, suggesting that contact networks can be targeted to improve packing39.
As multi-conformer and ensemble models are more widely adopted, several improvements to this analysis can reveal further links between conformational flexibility and mechanism. Minor improvements in Rfree should be carefully evaluated against model fit, stereo-chemistry and, importantly, a biologist’s ability to develop structure-based hypotheses of bio-molecular function. A particular challenge is to determine the extent of conformational coupling caused by steric mechanisms, as modeling of atomic interactions with a repulsive hard-sphere potential is an egregious simplification. As in ecDHFR, where the NADP ligand connects distant sites, it is likely that non-protein atoms play key roles in coupling distant sites; however, qFit does not currently automatically fit multi-conformer small molecules or waters into the electron density. Despite these simplifications, analysis of a multi-conformer model from a single X-ray experiment complements NMR22, long-timescale molecular dynamics40, Rosetta sampling35 and comparison of multiple independent crystal structures21. Integrated analyses combining these approaches will help to derive structural bases for conformational dynamics of proteins and to develop new hypotheses about how protein motions relate to function.
To model heterogeneous features in the electron density qFit computes an optimal fit of one to four conformations per residue together with their occupancies19. No explicit information about non-bonded contacts is included in the initial assignment of conformations or occupancy. These conformations are subsequently connected allowing for backbone heterogeneity into a multi-conformer model. The multi-conformer models can provide evidence of discrete heterogeneity of populations down to ~10%. qFit is available as a web server at: http://smb.slac.stanford.edu/qFitServer/qFit.jsp.
We used Phenix version 1.8–1069 to add hydrogens to qFit models and for subsequent conventional positional and ADP refinement steps 41. The qFit model is prepared for refinement with phenix.ready_set using the default parameters, which add riding hydrogens. The model is refined without manual intervention with phenix.refine using the flags:
“optimize_xyz_weight=true optimize_adp_weight=true main.strategy.individual_sites_real_space=false main.number_of_macro_cycles=5”
All other phenix.refine parameters are set to the default values, which includes occupancy refinement. For models with diffraction to greater than 1.55A resolution, we also include the flag:
“adp.individual.anisotropic=”not water or element H”
Since phenix.refine optimizes the hydrogen placement to the center of X-ray scattering rather than the likely nuclear position, hydrogens are stripped after refinement with phenix.pdbtools using the flags:
Hydrogens were then restored to their nuclear positions with phenix.ready_set using the default parameters.
The CONTACT algorithm (available at http://smb.slac.stanford.edu/CONTACT) calculates the most severe van der Waals clash between atoms from different residues that are separated by less than the sum of the van der Waals radii. This calculation takes into account all conformations of all pairs of residues with alternative conformations, excluding main chain hydrogen bonded atoms (identified by MMDB) and pairs of cysteine residues. In evaluating a multi-conformer model, we use the most severe 30% of these van der Waals overlaps to define the threshold value for clashes (Tstress). The 30th percentile of van der Waal’s overlap is an adjustable parameter in CONTACT. Functional biological insights may be obtained at other values of this parameter subject to considering resolution, data quality, and crystal environment.
Starting at each residue in succession, if we have obtained a pathway up to residue i, a residue j is added to the pathway if the following two conditions are satisfied:
A pathway then continues for the pairwise interactions between j and a residue k that satisfies these conditions. This process is continued until no more van der Waals overlaps are introduced. The CONTACT algorithm can calculate pathways using either side chain atoms only, or both main-chain and side chain atoms. The parameter Tstress can be varied (Supplementary Fig. 5). Although these alternative conformations need not be kinetically or thermodynamically mutually exclusive, the overlap of van der Waals radii indicates a likely coupling between the relative populations of conformations at each site.
Pathways identified by CONTACT are included in the network analysis of CYPA, WT ecDHFR and N23PP/S148A ecDHFR in the main text at the worst 30th percentile of van der Waal’s overlap, corresponding to 13%, 14%, and 14% overlaps respectively. We furthermore required that all clashes in a pathway were reduced to 10% overlap or less. We build a network graph where nodes are residues with edges indicating contacts identified by CONTACT. In analyzing the properties of these connected networks, we considered only subgraphs with more than 3 nodes and compute the edge weight as the number of distinct pathways between nodes.
We used qFit to rebuild 35 pairs of protein models with available high-resolution (2.0 Å or better) X-ray diffraction data collected at room and cryogenic temperatures (Supplementary Table 3). The majority were crystallized in nearly identical or very similar conditions. The datasets have Rmerge values indicating no unusual radiation damage6. When no Rfree set was deposited or could be extracted, we chose a test set using the standard parameters in phenix.refine. Structures were rebuilt using qFit and then refined as described above.
To examine the effect of severity of van der Waals overlap on pathway discovery the parameter Tstress was incremented in 5% steps starting at the worst 5th percentile of van der Waals overlaps. The search for additional pathways was terminated when its number exceeded 1,000,000. The effect of considering a higher percentile is that generally more van der Waals overlaps are identified, but that these are also less likely to be relieved below the threshold to continue the pathway. The number of all-atom pathways and their lengths diverge substantially between room and cryogenic temperatures, reaching a peak at the 30th–35th percentile overlap (Supplementary Fig. 5). For side chain only pathways the largest differences in the number of pathways between room and cryogenic temperature were observed at the 15th and 25th percentiles, but no significant difference was found for their lengths (Supplementary Fig. 2).
WT (E:NADP+:FOL) and N23PP/S148A (E:NADP+:FOL) were purified and concentrated to 30mg/mL32. Both WT and N23PP/S148A crystals were obtained by the hanging drop method by mixing protein 1:1 with well solution (100mM HEPES pH 7.5, 21% PEG8000, 200mM MgCl2). Data were collected at BL 8.3.1 of the Advanced Light Source. For room temperature data collection, the crystal is mounted in using loop and covered with RT tubing (MITEGEN) with 15uL of a 3:1 mixture of mother-liquor to water in the tip. The cryojet is set to 273 K and moved such that it lies just outside the capillary tube. The beam is heavily attenuated during test shots to establish a data collection strategy. During data collection, the attenuation is reduced by removing all but the Aluminum foil and the radiation dose is spread across the large (600uMx200uMx100uM) needle-shaped crystals by translation after each shot using a custom TCL script that interfaces with the standard BLUEICE interface. There are no unusual indications of radiation damage or problems with mosaic spread due to the translation: unit cell parameters and scale factors are stable during data processing, and the overall Rsym is low. The cryogenic dataset was collected after the addition of 10% PEG400 to the mother-liquor. Datasets were processed using XIA244 using the flag -3dii for XDS45 pipeline. Molecular replacement was performed using 1RX2 using Phaser46 through the Phenix GUI41.
H.v.d.B. is supported by the National Institute of General Medical Sciences Protein Structure Initiative (U54GM094586) at the Joint Center for Structural Genomics and a SLAC National Accelerator Laboratory LDRD grant SLAC-LDRD-0014-13-2; G.B. is supported as a Merck Fellow of the Damon Runyon Cancer Research Foundation (DRG-2136-12); K.Y. is supported by the General Wang Yaowu Stanford graduate fellowship; P.E.W. is supported by the US National Institutes of Health (GM75995); J.S.F. is supported by the US National Institutes of Health Early Independence Award (DP5OD009180). We acknowledge: T. Alber, T. Kortemme, D. Keedy, D. Kern, D. Tawfik, and R. Woldeyes for helpful discussions; N. Echols for advice on different treatments of hydrogen atoms in Phenix; J. Holton, J. Tanamachi and G. Meigs at Advanced Light Source Beamline 8.3.1 for support with X-ray data collection; B. Duggan and D. Boehr for chemical shift data on G121V ecDHFR complexes; H. Axelrod and C. Trame for data collection and refinement of RT RBM39.
AUTHOR CONTRIBUTIONSH.v.d.B., G.B., P.E.W. and J.S.F. designed and performed experiments, analyzed data and prepared the manuscript; G.B. and J.S.F. collected data; and H.v.d.B., K.Y. and J.S.F. developed analytical tools.
The authors declare no competing financial interests.
Protein Databank: 4KJL, 4KJK, 4KJJ