NMR structure determination is frequently hindered in its early stages by an insufficient amount of distance information for determining the correct fold of the protein. In response, we introduce a simple and general structure-based metric that can be used to incorporate NMR-based restraints on protein surface accessibility. This metric is inversely proportional to the sum of the inverse square distances to neighboring heavy atoms. We demonstrate the use of this restraint using a dataset from the water to protein magnetization transfer experiment on the protein Bax and the solvent paramagnetic relaxation enhancement experiment on the protein ubiquitin and the Qua1 homodimer. The solvent accessibility values calculated using the new empirical function are well correlated with the experimental data. By incorporating an associated energy term into Xplor-NIH, we show that structure calculation with a limited number of additional experimental restraints improves both the precision and accuracy of the resulting structures. This new empirical energy term will have general applicability to other types of solvent accessibility data.
Protein structure determination by high-resolution Nuclear Magnetic Resonance (NMR) has evolved greatly in recent years. The procedure has traditionally relied on NOE-based distance restraints and dihedral restraints based on scalar (J) coupling to determine a protein structure. These restraints contain only local structural information. Newer classes of structural restraints have emerged that provide longer-range or even distance-independent information. For instance, residual dipolar coupling [1; 2; 3; 4], residual chemical shift anisotropy [5; 6; 7; 8], ratios of NMR relaxation times [9; 10; 11; 12] and cross-correlated relaxation rates [13; 14] can define bond vector orientation in a fixed common frame, or can describe the relative angles between different bond vectors. In addition to these new classes of restraints, pseudo contact shifts [15; 16] and paramagnetic relaxation enhancement (PRE) [17; 18; 19] also offer long-range orientation and distance restraints relative to the paramagnetic center. Inclusion of all or some of these restraints in a single-domain structure calculation has been shown to improve the precision and, more importantly, the accuracy of the resulting structures. They also aid in determining the relative orientation of protein domains or subunits in a complex.
The NMR structure determination process generally begins by obtaining a low-resolution structure of the protein with the correct fold using a minimal NMR data set. This initial structure is then iteratively refined by adding more NMR restraints to the calculation. Arriving at the first correctly folded structure can be a bottleneck in the process due to the difficulty of direct interpretation of NMR data and the nature of the energy landscape defined by various experimental and knowledge-based terms. The above-mentioned newer restraints can help overcome this problem to a certain degree and improve the convergence of the initial fold-determining calculation. Another recurring problem for NMR structure determination is the lack of overall compactness of the structure. One proposal to address this was to use a radius or volume of gyration potential term during the structure calculation to keep the structure compact. An alternative is to use small angle X-ray scattering data, which can provide structural information in place of the radius of gyration. Ryabov et al. have also shown that incorporating rotational diffusion anisotropy information, which reports on the shape of the protein, can improve the quality of the refined structures [23; 24]. In addition, Nilges et al. have proposed carrying out the final structure refinement in the presence of water molecules to regularize the protein structure and improve its overall packing. However, easily obtained experiment-based restraints are still sorely needed for initial NMR structure determination.
NMR has also been a useful tool for studying protein-solvent interactions. For instance, hydrogen-deuterium exchange studies by NMR provide a measure of protection from the solvent that signifies stability at various sites in the protein. In solution, the PRE induced by a paramagnetic solvent probe has been shown to depend on the water accessibility of residues and provides proton-solvent distances [26; 27; 28; 29]. Confinement of a protein within a reverse micelle also provides residue-specific information on protein-solvent interactions. In addition, water-protein NOE or ROE measurements specify water lifetime and accessibility at various residues in a protein [31; 32; 33]. All of these experiments can easily identify which residues are solvent exposed, and thus on the surface of the protein, and which are buried in its core. Interestingly, this information has not been commonly used as structural restraints. In this work we propose a method to take advantage of this information using an empirical relationship between a simple measure of protein surface accessibility and the observed solvent-protein interaction. Specifically, we evaluated data sets from an experiment in which water magnetization was transferred to the protein, as well as data sets from a paramagnetic solvent PRE experiment. Both experiments are easy to conduct and the data are straightforward to analyze. We show that this approach improves the convergence and accuracy of structure calculations of the proteins Bax and ubiquitin as well as the Qua1 homodimer, using a reduced data set that simulates an early stage of structure determination and cases where large numbers of NOEs are difficult to obtain. In addition, this term should be generally applicable to various other measures of solvent-protein interactions.
The 15N-labeled protein Bax was expressed and purified as described previously, while the 15N, 2H-labeled ubiquitin was prepared as described by Lazar et al. The 15N-labeled protein Bid was expressed in Escherichia coli BL21 (DE3) cells transformed with the pET15b-Bid plasmid and purified using a Ni2+ affinity column followed by an ion-exchange (Q) column. Samples for NMR experiments contained 0.5 mM Bax and 0.8 mM Bid in 50 mM phosphate buffer at pH 6.7, with 2 mM DTT and 0.02% NaN3, while the ubiquitin NMR sample contained 0.5 mM 2H, 15N-labeled ubiquitin in 20 mM Na-acetate buffer at pH 5.2 and 0.02% NaN3.
The water to protein magnetization transfer data were acquired using the enhanced protein hydration observed by gradient spectroscopy (e-PHOGSY) pulse sequence with an NOE transfer step, without modification [36; 37]. The selective water inversion pulse was a 50 ms sinc-shaped pulse. The 1H, 15N-HSQC spectrum was acquired as a reference. The water accessibility value reported is the intensity ratio between the water to protein magnetization transfer spectrum and the reference HSQC spectrum. Both spectra were acquired with 1024 and 192 complex points in the 1H and 15N dimensions, respectively. The 1H carrier frequency was set at the water frequency, while the 15N carrier was at 115 ppm. The reference HSQC spectrum for the protein Bax was acquired with 32 scans for each t1 increment, while the water to protein magnetization transfer spectrum was collected with 64 scans. The transfer mixing time was 70 ms. Both experiments were acquired at 32 °C on a Bruker DMX 600 MHz spectrometer equipped with a cryogenic probe and Z–pulsed field gradient. For the protein Bid, the same experiments were carried out at 25 °C with 32 and 64 scans for the reference HSQC and water to protein magnetization transfer spectra, respectively. For the deuterated 15N-ubiquitin, 16 and 64 scans were acquired at 25 °C for the reference HSQC and water to protein magnetization transfer spectra, respectively. NMR data were processed using NMRPipe and analyzed with NMRView and PIPP.
The e-PHOGSY detection pulse sequence was applied to detect the interaction between water and the protein amide-proton [36; 37]. The uniformly 15N labeled apoptotic protein Bax was used to carry out the measurement. Meanwhile, a 2D 1H-15N HSQC spectrum was acquired and used as a reference. An overlay of the spectra of Bax from these two experiments is shown in Figure 1A. The variation in the strengths of the interaction between water and the protein amide protons could clearly be observed as modulation in their signal intensities compared to the HSQC spectrum. Residues with overlapping resonances were taken out of the analysis. A total of 156 out of 184 non-proline residues could be analyzed. In order to eliminate the discrepancy from the rapidly exchanging hydroxyl protons, residues (Ser, Thr and Tyr) possessing the hydroxyl group were omitted. The ratios in intensities of 138 residues from these two different spectra are plotted as a function of residue number as shown in Figure 1B. Qualitatively, this water-protein interaction data differentiates the exposed and buried residues. For instance the termini and long exposed loops of the protein show high intensity ratios (> 0.07), while buried residues show low ratios (<0.05). Some residues in the same helix can also show differential exposure to solvent (for instance, helices α2 & α3).
The test for an empirical function was carried out using the previously published solution structure of Bax (PDB: 1F16). Exposed residues that are close to the protein surface tend to have fewer heavy atoms within a certain distance range, while buried residues have more (Supplemental Figure 1). Following this observation, we chose this simple metric, the inverse of the sum of the inverse square distances to neighboring heavy atoms, as a stand-in for solvent accessibility (Å2):
$$S_{Acc} = \left[ \sum_{i,\; r_i < R_c} r_i^{-2} \right]^{-1} \qquad (1)$$
where ri is the distance (Å) between an amide proton and heavy atom i. The sum is over all heavy atoms within a cutoff distance Rc. Heavy atoms of the residue to which the amide proton belongs are excluded from this calculation to avoid residue type dependence. For instance, because a static structure is used in the calculation, a surface residue with a bulky but mobile side chain may artificially show significant solvent protection compared to a non-bulky residue. This effect, however, is not large. In the case of the protein Bax, excluding the intra-residue heavy atoms improves the correlation by 1% (0.80 to 0.81). The values calculated from Eq. (1) using the lowest energy Bax structure are plotted against the experimental intensity ratios, which correspond to the different strengths of water-protein interaction, and show a good linear correlation with a correlation coefficient R=0.80 (Figure 2). Using the 10 lowest energy NMR structures of Bax, the calculated values correlated with the experimental data with an average correlation coefficient R of 0.83 and a standard deviation of 0.02. Other choices for the power of the distance in Eq. (1) result in worse agreement.
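As an illustration of how Eq. (1) could be evaluated on a static structure, the following Python sketch sums the inverse square distances from one amide proton to the surrounding heavy atoms. It assumes the form stated in the abstract (the metric is the inverse of that sum), skips heavy atoms of the proton's own residue, and applies the cutoff Rc. The function name `solvent_accessibility`, the tuple-based atom representation, and the toy coordinates are hypothetical and are not part of Xplor-NIH.

```python
import math


def solvent_accessibility(amide_h, heavy_atoms, own_residue, r_cut=20.0):
    """Return 1 / sum(r_i^-2) in A^2 for one amide proton.

    amide_h     -- (x, y, z) coordinates of the amide proton, in A
    heavy_atoms -- iterable of (residue_id, (x, y, z)) for every heavy atom
    own_residue -- residue id of the amide proton; its atoms are skipped
    r_cut       -- cutoff distance Rc in A (20 A is the value quoted in the text)
    """
    total = 0.0
    for residue_id, xyz in heavy_atoms:
        if residue_id == own_residue:
            continue  # exclude intra-residue heavy atoms
        r = math.dist(amide_h, xyz)
        if 0.0 < r <= r_cut:
            total += 1.0 / (r * r)
    # With no neighbors inside the cutoff the metric is unbounded.
    return math.inf if total == 0.0 else 1.0 / total


# Toy coordinates (hypothetical): two neighbors at 2 A and 4 A,
# one atom of the proton's own residue, and one atom beyond the cutoff.
toy_atoms = [
    (1, (1.0, 0.0, 0.0)),   # own residue, skipped
    (2, (2.0, 0.0, 0.0)),
    (3, (0.0, 4.0, 0.0)),
    (4, (30.0, 0.0, 0.0)),  # outside Rc, skipped
]
s_acc = solvent_accessibility((0.0, 0.0, 0.0), toy_atoms, own_residue=1)
```

In this toy case the sum is 1/4 + 1/16 Å−2, so a more crowded environment (a larger sum) gives a smaller value, in line with buried residues having low accessibility.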
The linear regression fitting of the empirical function of the 10 lowest energy Bax structures (PDB ID: 1F16) to the water-protein interaction data on all the identified Bax residues resulted in the empirical conversion of intensity ratios (Iratio) to an effective surface area :
Fitting each of the 10 lowest energy Bax structures gave slope and offset values in the ranges of 1.83 Å2 to 2.30 Å2 and 0.04 Å2 to 0.07 Å2, respectively. The distance from each amide proton to all heavy atoms was calculated using a cutoff radius Rc= 20 Å. It is important to note that a cutoff exceeding 20 Å yielded only a very small improvement in agreement with the experimental values, because long distances (ri) contribute much less to the final values of the empirical term. For Bax, the correlation coefficient R improved from 0.805 to 0.809 when Rc=80 Å was used. The choice of the distance cutoff Rc is therefore a compromise between computational cost and structural improvement. Using Eq. (2), the experimental water-protein interaction intensity ratios could be converted into restraints and used for the structure calculations.
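The conversion of intensity ratios into effective surface areas rests on an ordinary least-squares line. A minimal sketch of such a fit is given below, assuming Eq. (2) has the linear form SAcc = slope × Iratio + offset implied by the slope and offset values quoted above. The function names and the synthetic, noise-free data are illustrative only and do not reproduce the Bax measurements.

```python
import math


def linear_fit(x, y):
    """Least-squares slope, offset and Pearson R for y = slope * x + offset."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, offset, r


def ratio_to_area(i_ratio, slope, offset):
    """Convert an experimental intensity ratio into a target area (A^2)."""
    return slope * i_ratio + offset


# Synthetic example: intensity ratios and areas generated from a known line.
i_ratios = [0.02, 0.04, 0.06, 0.08]
s_calc = [2.0 * v + 0.05 for v in i_ratios]
slope, offset, r = linear_fit(i_ratios, s_calc)
target = ratio_to_area(0.05, slope, offset)
```

Repeating the fit for each member of an ensemble and reporting the mean and standard deviation of R would give numbers comparable in kind to the ensemble statistics quoted for Bax.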
In order to test the effect of the new term on structure calculation, the experimental water-protein interaction data were used in a simulated annealing calculation to determine the structure of Bax. The solvent accessibility potential was implemented as a new energy term in Xplor-NIH (http://nmr.cit.nih.gov/xplor-nih/) , given by:
where the calculated solvent accessibility is given by Eq. (1). During the calculation, the force constant k was increased from 0.2 kcal/Å2 to varying final values to test for convergence. This was done in concert with cooling the system from an initial temperature of 3500 K to a final value of 25 K. All structure calculations were started from extended random coil structures. All original dihedral and hydrogen bond restraints, along with different sets of NOE distance restraints, were used throughout the structure calculations.
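The coupled cooling and force-constant ramp can be pictured with the short Python sketch below. The text states only the initial and final temperatures and the initial force constant; the linear temperature steps, the geometric (multiplicative) ramp of k, the number of steps, and the function name `annealing_schedule` are assumptions made for illustration and do not reproduce the actual Xplor-NIH protocol.

```python
def annealing_schedule(t_init, t_final, k_init, k_final, n_steps):
    """Return a list of (temperature, force_constant) pairs.

    Temperature falls linearly from t_init to t_final while the force
    constant is multiplied by a fixed factor at each step (assumed ramp shape).
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    factor = (k_final / k_init) ** (1.0 / (n_steps - 1))
    dt = (t_init - t_final) / (n_steps - 1)
    schedule = []
    k = k_init
    for step in range(n_steps):
        schedule.append((t_init - step * dt, k))
        k *= factor
    return schedule


# 3500 K -> 25 K in 25 K decrements; k ramps from 0.2 to a final value of 70,
# the value reported as optimal for Bax in the text (units as in the source).
schedule = annealing_schedule(3500.0, 25.0, 0.2, 70.0, n_steps=140)
```

A multiplicative ramp keeps the restraint weak while the chain is still collapsing at high temperature and lets it dominate only near the end; other ramp shapes would serve the same purpose.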
The solvent accessibility potential was developed to distinguish buried and exposed residues. Therefore, inclusion of the new potential is not expected to significantly improve the overall quality of the refined structures when a large number of other experimental NMR data are available. However, this new potential should assist in determining the overall fold of the protein during the initial stage of structure determination, when relatively few restraints are available. To test the convergence efficiency of this potential in determining the overall fold of the protein starting from a random coil conformation, a total of 500 non-intraresidue 1HC-1HC NOE restraints were randomly chosen from the 4689 total NOEs. Among the 500 randomly chosen NOEs, there are 197 long-range (greater than 5 residues apart), 135 medium-range (2 to 4 residues apart), 21 sequential and 147 intraresidual NOEs. Dihedral and hydrogen bond restraints were included, but the RDCs were omitted. Calculations were performed with and without the new solvent accessibility potential. A total of 50 structures were calculated, from which the 5 lowest energy structures were chosen as a representative ensemble. To check for accuracy, the resulting structures were compared to the lowest energy structure of Bax (PDB ID: 1F16). The optimal force constant for the new potential was 70 kcal/Å2, based on the structure precision (convergence), accuracy and the energy contribution from the new potential. As would be the case in a de novo structure calculation, we assumed the optimal slope and offset values were unknown. Therefore, a coarse grid search of the slope was carried out by varying its value from 1.4 Å2 to 2.4 Å2 while keeping the offset fixed. This was followed by a search for the offset by varying its value from 0.0 Å2 to 0.12 Å2 while keeping the slope fixed at its optimal value.
The slope and offset giving structures with minimum energies were 1.6 Å2 and 0.06 Å2, respectively (Figure 3A&B). To further optimize them, structures were calculated using these slope and offset values. These structures were then used to fit the experimental intensity ratios to give a new set of slope and offset values. This iterative operation was performed three times to arrive at the final optimized parameters (slope of 1.69 Å2 and offset of 0.09 Å2) with the lowest energy structures. The optimal slope and offset values found in this manner are close to those reported above using the original Bax structure. The convergence and accuracy statistics of the structures calculated with and without the new potential (EAcc) are summarized in Table 1. An obvious improvement in structure convergence was observed for the 5 lowest energy structures when the new potential was used. The correlation between calculated and measured water-protein interaction intensity ratios improved from a correlation coefficient R of 0.79 to 0.86 (Figure 3C&D). This improvement is accompanied by an increase in precision: the backbone RMSD of the 5 lowest energy structures (calculated for residues 15–189) improved from 3.12±0.97 Å to 1.59±0.21 Å when the new potential was used. The structure comparison is illustrated in Figure 3E&F. The backbone of the structures calculated using the new potential is 2.77±0.04 Å away from the published Bax structure (PDB ID: 1F16), while without the potential it is 3.93±1.40 Å away.
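The two-stage parameter search described above (a coarse one-dimensional scan of the slope, then of the offset, followed by iterative refitting) can be outlined as follows. This is only a schematic: `calc_energy` and `refit` are hypothetical callables standing in for a full structure calculation and for the regression of experimental data against the resulting structures, and the toy functions used in the example are invented so that the code runs on its own.

```python
def search_slope_offset(calc_energy, refit, slopes, offsets, start_offset, n_iter=3):
    """Coarse grid search followed by iterative refitting.

    calc_energy(slope, offset) -> lowest energy obtained with these parameters
    refit(slope, offset)       -> new (slope, offset) from fitting the data to
                                  structures calculated with the old values
    """
    # Stage 1: scan the slope while the offset is held fixed.
    best_slope = min(slopes, key=lambda s: calc_energy(s, start_offset))
    # Stage 2: scan the offset with the slope fixed at its best value.
    best_offset = min(offsets, key=lambda o: calc_energy(best_slope, o))
    # Stage 3: alternate structure calculation and regression n_iter times.
    slope, offset = best_slope, best_offset
    for _ in range(n_iter):
        slope, offset = refit(slope, offset)
    return (best_slope, best_offset), (slope, offset)


# Toy stand-ins (NOT real structure calculations): a quadratic energy surface
# and a refit step that moves halfway toward an arbitrary fixed point.
def toy_energy(slope, offset):
    return (slope - 1.6) ** 2 + (offset - 0.06) ** 2


def toy_refit(slope, offset):
    return 0.5 * (slope + 1.7), 0.5 * (offset + 0.08)


slopes = [1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
offsets = [0.0, 0.03, 0.06, 0.09, 0.12]
coarse, refined = search_slope_offset(
    toy_energy, toy_refit, slopes, offsets, start_offset=0.05
)
```

Scanning one parameter at a time keeps the number of expensive structure calculations proportional to the sum, not the product, of the grid sizes, at the cost of assuming the two parameters are only weakly coupled.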
To test for compatibility of the new energy term with the other terms, calculations were carried out with full restraints, including all distance restraints and residual dipolar couplings, as previously described. As in the calculation described above, a force constant of 70 kcal/Å2 was used, and the calculated structures showed some improvement in precision with the addition of EAcc (Table 1). For instance, the backbone RMSD for the 5 lowest energy structures was 1.27±0.16 Å and 1.16±0.17 Å for structures calculated without and with the solvent accessibility potential, respectively. The correlation between calculated and measured water-protein interaction intensity ratio values improved from a correlation coefficient R of 0.82 to 0.85. To check whether the new potential pulls the structure to a different energy minimum, the calculated structures were compared to the original lowest energy structure. The difference in accuracy between the two calculations is not significant: the backbone structures calculated with and without the new energy term are 1.92±0.23 Å and 1.95±0.25 Å away from the original structure, respectively. This result demonstrates that inclusion of the new potential term in the context of full experimental restraints does not produce structures that differ substantially from those calculated without it, but can still improve the precision of the structures.
To show that this method is generally applicable, we evaluated the new empirical solvent accessibility term against experimental data from other proteins. For instance, water to protein magnetization transfer data obtained from the protein Bid (PDB ID: 2BID) could be correlated to the surface accessibility values, yielding a linear correlation coefficient of 0.79 (Supplemental Figure 2A). The linear regression of the correlation plot for Bid resulted in a slope for Eq. (2) of 1.72 Å2 with an offset value of 0.03 Å2. In addition, we measured water to protein magnetization transfer data for 15N, 2H-labeled ubiquitin. The measured intensity ratios correlate linearly with the water accessibility calculated using the lowest energy ubiquitin structure (PDB ID: 1D3Z), with a linear correlation coefficient of 0.88 (Supplemental Figure 2B). The linear regression of the correlation plot for ubiquitin resulted in a relationship of .
In order to test the applicability of the new term to another type of water accessibility data, previously published solvent PRE data for ubiquitin were tested. Values calculated using Eq. (1) for the lowest energy NMR structure of ubiquitin (PDB ID: 1D3Z) correlated well with the experimental solvent PRE values (Figure 4A), with a correlation coefficient R of 0.86 (0.84 ± 0.03 for the 10 lowest energy ubiquitin structures). Furthermore, the experimental solvent PRE (ΓPRE) can be fit to a linear empirical function similar to Eq. (2). Using the 10 lowest energy ubiquitin structures with a cutoff radius Rc of 20 Å, the following linear relationship to the effective surface area was found:
Based on Eq. (4), the experimental solvent PREs (s−1·mM−1) (0 <ΓPRE <1) could be converted into restraints and used for the structure calculations.
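A sketch of how solvent PRE values might be screened and converted into restraint targets is shown below, assuming Eq. (4) is a linear relationship with a slope and an offset analogous to Eq. (2). The window 0 < ΓPRE < 1 is taken from the text; the function name, the residue numbers, and the slope and offset used in the example are placeholders rather than fitted coefficients.

```python
def pre_to_restraints(pre_by_residue, slope, offset, lower=0.0, upper=1.0):
    """Map residue -> target area (A^2) for PREs inside the open window.

    pre_by_residue -- dict of residue number -> solvent PRE (s^-1 mM^-1)
    slope, offset  -- coefficients of the assumed linear relationship
    lower, upper   -- PREs outside lower < value < upper are discarded
    """
    restraints = {}
    for residue, gamma in pre_by_residue.items():
        if lower < gamma < upper:
            restraints[residue] = slope * gamma + offset
    return restraints


# Placeholder data: residue 11 is too exposed (PRE > 1) and residue 12 has
# no measurable PRE, so neither yields a restraint.
example = {10: 0.2, 11: 1.5, 12: 0.0, 13: 0.8}
targets = pre_to_restraints(example, slope=0.4, offset=0.1)
```

Only residues 10 and 13 survive the screen in this example, each receiving a target area from the assumed linear map.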
By applying a calculation protocol similar to that used for Bax, we tested the convergence efficiency of the new solvent accessibility potential for ubiquitin. For calculations starting from a random coil conformation, a total of 202 NOE restraints were randomly chosen from the 2727 total NOEs. Dihedral and hydrogen bond restraints were included, but the other restraints were omitted. Calculations for ubiquitin were performed with and without the solvent PRE restraints. Using the same optimization protocol as in the calculation of Bax, the optimal force constant found was 300 kcal/Å2 and the optimal slope and offset were 0.396 Å2·s·mM and 0.096 Å2, respectively. The convergence and accuracy statistics of the calculated structures are summarized in Supplemental Table 1. An obvious improvement in structure convergence was seen for the 5 lowest energy structures when the new potential was used. The correlation between calculated and measured intensity ratio values improved from a correlation coefficient R of 0.79 without the new potential to 0.90 with it (Supplemental Figure 3). This improvement is accompanied by an increase in precision: the backbone RMSD of the 5 lowest energy structures (calculated for residues 2–71) improved from 1.76±0.31 Å to 1.24±0.20 Å when the new potential was used. The backbone of the structures calculated using the new potential is 2.01±0.14 Å away from the published ubiquitin structure (PDB ID: 1D3Z), compared to 2.38±0.40 Å without the potential. In comparison, calculations using the water-protein interaction data also improved the ubiquitin structures, to 1.10±0.08 Å in precision and 2.03±0.14 Å in accuracy (Supplemental Table 1).
The new solvent accessibility term is expected to be useful in determining structures of protein complexes. To demonstrate this, we tested a structure calculation for the Sam68 Qua1 homo-dimerization domain. We applied the new solvent accessibility potential using the previously published solvent PRE data of the Qua1 homodimer. Values calculated using Eq. (1) for the lowest energy NMR structure of the Qua1 homodimer (PDB ID: 2XA6) correlated well with the experimental solvent PRE values (Figure 4B), with a correlation coefficient R of 0.82 (0.84 ± 0.01 for the 10 lowest energy Qua1 homodimer structures), and the following linear relationship to the effective surface area was found:
Experimental solvent PREs (s−1·mM−1) (0<ΓPRE <1) were used as restraints in structure calculations. High PREs (ΓPRE > 1) of surface protons were intentionally excluded due to the contamination of chemical exchange with water protons .
The calculation protocol used was similar to the one used for Bax and ubiquitin. For calculations starting from a random coil conformation, a total of 150 out of 883 intra-molecular and 40 out of 125 inter-molecular NOE restraints were randomly chosen for each monomer subunit. Dihedral restraints were included, but the other restraints were omitted. The optimal force constant used was 300 kcal/Å2. Again, we assumed that the actual slope and offset described in Eq. (5) were not available. A grid search for the slope (between 0.1 Å2·s·mM and 0.4 Å2·s·mM) and offset (between 0.05 Å2 and 0.18 Å2) was conducted (Figure 5A&B). These values were further optimized by the iterative method to arrive at the final slope of 0.137 Å2·s·mM and offset of 0.151 Å2. The resulting 5 lowest energy structures calculated with the new solvent accessibility potential (EAcc) showed an improvement compared to structures calculated without it (Table 2). The correlation between experimental and calculated solvent PRE values improved from a correlation coefficient R of 0.77 to 0.94 when the new solvent accessibility potential was used (Figure 5C&D). In addition, the precision of the calculated structures, evaluated using the backbone atoms of the 5 lowest energy structures (residues 99–134 for both subunits A and B), improved from 2.59 ± 0.34 Å to 1.70 ± 0.29 Å when the new potential was applied (Figure 5E&F). For the individual subunits of the Qua1 homodimer, the precision improved from 2.34 ± 0.48 Å to 1.49 ± 0.34 Å for subunit A and from 1.99 ± 0.70 Å to 1.41 ± 0.40 Å for subunit B. The backbone of the structures calculated using the new potential is 3.01 ± 0.20 Å away from the published Qua1 homodimer structure (PDB ID: 2XA6), while without the potential it is 3.44 ± 0.37 Å away. The accuracy of each subunit of the Qua1 homodimer improved from 3.17 ± 0.45 Å to 2.77 ± 0.35 Å for subunit A and from 2.98 ± 0.67 Å to 2.83 ± 0.30 Å for subunit B.
We demonstrate the utility of a new empirical function to represent the degree of solvent exposure of an individual residue in a protein. This new term was designed based on the location of neighboring atoms, or “crowding”, around the residue, which is directly related to the solvent accessibility (Supplemental Figure 1). The ability to distinguish “inside” and “outside” residues should in principle assist in determining the global fold of the protein. We illustrated the feasibility of this approach using both experimentally acquired water to protein magnetization transfer data and solvent PRE data. These data correlated well with the values calculated from the known protein structures using the new empirical term (Figures 2 and 4). The addition of this new potential to a limited number of NMR-based restraints can significantly improve structure convergence to the correct conformation, as evidenced for the proteins Bax (Table 1) and ubiquitin (Supplemental Table 1). Furthermore, these types of solvent accessibility restraints were used in determining the structure of a protein complex, as illustrated with the Qua1 homodimer (Table 2). These restraints are very useful in this case since the interaction surfaces of the complex have different solvent accessibility profiles than the free proteins. During the calculation, the proper slope and offset in Eq. (2), (4), or (5) can be found by a coarse grid search and further optimized iteratively by fitting the experimental data to the calculated structures. From the calculations on the proteins Bax and ubiquitin and the Qua1 homodimer, we demonstrated that access to this type of solvent accessibility information could substantially expedite the initial stage of NMR structure determination.
The water to protein magnetization transfer experiment used to illustrate our method probes very complicated interactions between the protein and the water molecules. This measurement alone cannot differentiate the water to protein-proton NOE rate from the solvent exchange rate. Under the experimental conditions used here, the chemical exchange rates between bulk water and exchangeable NH protons are generally faster than the inefficient NOE rates between the two, due to the short residence time [33; 44]. For amide hydrogens protected by hydrogen bonds, most of the amide hydrogen-water NOE correlations seen in bulk solution are also likely dominated by exchange-relayed NOE interactions with labile side chain hydrogens, not by direct protein-water NOEs [30; 45; 46]. Nevertheless, it is important to point out that an increase in either the NOE or the solvent exchange rate still indicates that the residue is solvent exposed. In addition to the above issue, long-lived bound or buried water molecules, Cα protons that resonate close to the water frequency, and rapidly exchanging hydroxyl protons can result in measured intensity ratios that do not follow the relationship described here. Some of these contributions can be distinguished by carrying out a complementary ROE experiment, as previously suggested [36; 47]. Interestingly, the intensity ratios for Bax and Bid that we measured were not screened for any of these possible effects and still resulted in an acceptable outcome. We did, however, use a deuterated ubiquitin sample in our water to protein magnetization transfer experiment to try to overcome the problem caused by Hα resonances that are close to the frequency of water. Alternatively, any intensity ratios that do not follow the empirical relationship can be identified during the structure refinement process and treated accordingly, since they most likely will not agree with the distance restraints of the protein. This is similar to PRE refinement when there is a contribution from a minor, previously unknown conformation that results in a deviation from the predicted PRE values.
The results presented here provide some evidence that the SAcc metric can be of general use in quantitatively interpreting various observations that primarily depend on solvent accessibility. The inclusion of this type of structural information that differentiates surface exposed from buried residues can significantly improve convergence and accuracy in structure calculations.
We are grateful to Motoshi Suzuki for the help on experimental setup and Yi He for fermentation preparation of isotopically labeled Bax. We thank Tobias Madl and Michael Sattler for providing us with the solvent PRE data. This work was supported by the NIH Intramural Research Programs of CIT (C.D.S) and NHLBI (N.T.).