|Home | About | Journals | Submit | Contact Us | Français|
We describe a new, computationally effcient method for computing the molecular alignment tensor based on the molecular shape. The increase in speed is achieved by re-expressing the problem as one of numerical integration, rather than a simple uniform sampling (as in the PALES method), and by using a convex hull rather than a detailed representation of the surface of a molecule. This method is applicable to bicelles, PEG/hexanol, and other alignment media that can be modeled by steric restrictions introduced by a planar barrier. This method is used to further explore and compare various representations of protein shape by an equivalent ellipsoid. We also examine the accuracy of the alignment tensor and residual dipolar couplings (RDC) prediction using various ab initio methods. We separately quantify the inaccuracy in RDC prediction caused by the inaccuracy in the orientation and in the magnitude of the alignment tensor, concluding that orientation accuracy is much more important in accurate prediction of RDCs.
Knowledge of protein structure plays a critical role in our understanding of the molecular mechanisms underlying biological processes. One of the main methods for obtaining structural information at atomic-level resolution is the use of nuclear magnetic resonance (NMR) spectroscopy for determining structural constraints. The NMR-derived constraints, such as NOEs, hydrogen bonds, and torsion angles, are intrinsically local or short-range and could be insuffcient for accurate structure determination of biological macromolecules and their complexes due to the scarcity of long-distance structural information. Residual dipolar couplings (RDCs), resulting from partial alignment of solute molecules relative to the magnetic field, provide valuable structural information in terms of global, long-range orientational constraints . A commonly used method for aligning molecules in solution takes advantage of the anisotropy of molecular shape by imposing steric restrictions on the allowed orientations of the molecule (e.g. by means of bicelles , stretched gels [3, 4], or PEG/hexanol-based media ). Such steric alignment can often be modeled as caused by planar obstacles, and we will refer to this simplified model of molecular alignment as the barrier model. The alignment of a rigid molecule can be described by the so-called molecular alignment tensor. Accurate prediction of the molecular alignment tensor, and with it of the RDCs, is important for NMR-based structure determination and validation as well as applications to dynamic and disordered systems (see e.g., [6, 7, 8, 9, 10, 11]). The sensitivity to molecular shape has the potential for improving structure characterization, especially in multidomain systems and macromolecular complexes (e.g., ), by fully integrating RDC prediction into structure refinement protocols to directly drive structure optimization. Future progress in this direction critically depends on the effciency and accuracy of the alignment tensor prediction.
Several methods for computing the molecular alignment tensor ab initio, i.e. based solely on the three-dimensional shape of the molecule, have recently been proposed. In the method by Zweckstetter and Bax [13, 14], implemented in a program called PALES, the alignment tensor is computed by uniformly sampling all orientations of a molecule (see e.g., ) at various distances away from a planar barrier, and averaging over only those orientations in which the molecule’s surface does not collide with the barrier. The computational effciency of this method is limited due to the fact that it must compute collisions between an arbitrary shape and a plane for every sample in the four-dimensional problem space.
Simpler methods based on the barrier model, but representing the shape of the molecule by an equivalent ellipsoid, have also been proposed. In Fernandes et al. , the alignment tensor was computed by approximating the molecule as an axially-symmetric prolate ellipsoid and analytically solving the barrier model for the alignment tensor. In Almond et al.  and Azurmendi et al.  the barrier method is also used, but the formulae are derived empirically.
Here we describe a new, computationally effcient method for computing the molecular alignment tensor based on the barrier model. The increase in speed is achieved by re-expressing the problem as one of numerical integration, rather than one of simple uniform sampling. This formulation allowed us to simplify the problem by reducing its dimensionality from four to two. In addition to the reduction in computational complexity, numerical integration has the advantage of (i) allowing control over the size of numerical error, and (ii) allowing a more effcient sampling of the problem space . Computational geometry techniques are used to increase the computational speed further. We will refer to our method as PATI (Prediction of Alignment Tensor using Integration). PATI can also be used with an equivalent ellipsoid of the molecule instead of the full surface. We will refer to this simplification as PATI-E. This simplified method is used to explore and compare various representations of protein shape by a (fully anisotropic) equivalent ellipsoid: based on the gyration tensor , the actual molecular surface , or the minimum-volume ellipsoid.
Finally, we examine the accuracy of the proposed methods (PATI and PATI-E) and the existing ab initio methods for RDC prediction. This analysis separately quantifies the effect of inaccuracy in the predicted RDCs caused by the inaccuracy in the orientation or in the magnitude of the alignment tensor. The results obtained for several proteins show that (i) the predicted RDCs and their agreement with experimental data are very sensitive to errors in orientation of the alignment tensor, and (ii) all ab initio prediction methods tested here give a rather crude estimate of the RDCs.
For a rigid molecule, the molecular alignment tensor A with respect to the magnetic field B is described by a 3 × 3 symmetric traceless matrix , sometimes referred to as the Saupe matrix , with the following elements (i, j = 1, 2, 3):
where θi is the angle between molecular axis i and the magnetic field B, < … > is the average over all possible orientations of the molecule in solution, and δ is the Kronecker delta.
The RDC value DPQ for a specific bond PQ is related to the alignment tensor and the bond’s orientation relative to the molecule’s coordinate frame by the following equation:
where i is the angle between the PQ bond and the molecular axis i, SLS is the Lipari-Szabo generalized order parameter, μ0 is the permeability of free space, γP and γQ are the gyromagnetic ratios of the corresponding nuclei, is the reduced Planck’s constant, and rPQ is the length of the bond. PQ can represent bonds such as NH, CαHα, CαC′, and C′N.
Given the 3D structure of an arbitrary molecule, we will focus on the computation of its alignment tensor A, defined in Equation (1). We model the planar barrier causing steric alignment of the molecule as a set of two infinite planes with the surface normals in the z direction, positioned at a distance 2h from each other. The molecule is centered around some point m (e.g., its center of mass), which lies somewhere inside the convex hull of the molecule’s surface. The direction of the magnetic field is given by a unit vector B, where
Figure 1 shows a schematic representation of the planar barrier model. Note that due to the symmetry of the system, the possible orientations of the molecule positioned between 0 and h along the z-axis are mirror images (over the xy plane) of the possible orientations when the molecule is between h and 2h. Thus we can simplify the model by considering only the bottom plane and positioning the molecule’s center at a height from 0 to h above this plane.
The orientation of the molecule’s coordinate frame relative to the Cartesian coordinate system in Figure 1 can be defined by three Euler angles α β, and γ, which determine the rotation matrix R(α, β, γ). (See Appendix A.)
For a specific molecule, we define S to be a finite set of sample points from its molecular surface (e.g. van der Waals surface or Richards molecular surface), and the center m of the molecule to be some point inside the convex hull of this molecular surface. Referring to Figure 1, to characterize the vertical extent of the molecule under the rotation R(α, β, γ) around its center m, we define η(α, β, γ), to be the difference between the z-coordinate of the center of the molecule and the minimum z-coordinate value of all the rotated points in S:
Note that η(α, β, γ) sets the lower limit on the height of the center of the molecule at a given orientation.
We rewrite F, from Equation (1), in terms of the rotation matrix,
and average the F values at height a with the mirror cases of 2h − a into one equation
Due to the symmetry of the system, can be used instead of F to simplify our model to just one plane and a height from 0 to h.
For any rotation R(α, β, γ), the center of the molecule cannot be located at a height between 0 and η(α, β, γ). Therefore, the range of interest is from η(α, β, γ) to h. The alignment tensor A is then computed by summing weighted by the probability of the current height and orientation, for all allowed orientations and heights from η(α, β, γ) to h.
We assume equal a priori probabilities of all orientations at all heights, and write the analytical expression for Aij from Equation (1). To obtain a uniform distribution of the Euler angles we multiply our integrand by the Jacobian J = sin β/(8π2)  to obtain
where N is the normalization factor
and [α0, α1], [β0, β1], [γ0, γ1] are the ranges in which α,β, γ are defined.
We show in Appendix A, using the Euler z-y-z rotation, that the expression for the alignment tensor A, of an arbitrarily-shaped molecule can be simplified from a quadruple integral to a double integral. Specifically,
where η is independent of the angle γ and
Because A is a traceless symmetric tensor , only A11 and A22, A21, A31, and A32 need to be computed, while A33 = −(A11 + A22), A12 = A21, A13 = A31, and A23 = A32. One can multiply the alignment tensor by −0.8 to account for the incomplete bicelle alignment the order parameter for the liquid crystal), and to match the sign returned by PALES. The height h can be determined by the formula d/(2Vf), where d is the barrier thickness (≈ 40Å for DMPC/DHPC bicelles) and Vf ( 1) is the sample volume fraction occupied by the barriers (see [13, 14]).
Thus, all one needs to know in order to compute the alignment tensor is η(α, β), defined in Equation (4). Being an intrinsic geometric property of the molecule, η(α, β) can be computed separately, regardless of the barrier.
In the PALES approach [13, 14], A is estimated based on forming a mesh of the molecular surface and then rotating all the mesh triangles of this surface to check if any part of the mesh is below the barrier. Observe that the complexity of each rotation is proportional to the number of triangles in the mesh. It is possible to simplify the mesh using mesh simplification (see [23, 24]); however even this is overly complex. An infinite planar barrier is not sensitive to cavities on the surface of the molecule; therefore, a convex hull of the molecule is a sufficient representation of the molecule’s surface. Additional mesh simplifications could be performed on the convex hull to further reduce the number of points.
To compute η for an arbitrary molecule under a rotation R, we simply compute the convex hull of the atom positions of the molecule and consider the vertices of the convex hull as the set S in Equation (4). We add the van der Waals radius of the atom associated with the minimum z-value to Equation (4) to form η for the rotation R. Figure 2A shows the convex hull around the Cyanovirin-N molecule. The number of points used to represent the molecule drops dramatically, from 40708 in the molecular surface representation (see [25, 26]), to just 57 in the convex hull representation. For any rotation R, the relative error in η between the two representations is less than 5% and the absolute error is less than 0.5Å. Also the alignment tensors and the RDCs predicted by PATI (our method) and PALES are almost identical, as shown below.
A potential simplification for computing the alignment tensor is to represent a molecule by an equivalent ellipsoid. In this section we examine several methods of deriving an equivalent ellipsoidal representation of an arbitrary molecule.
An ellipsoid ε in 3 is defined as
where E is a 3 × 3 symmetric positive definite matrix that defines the shape of the ellipsoid and m 3 is its center.
The ellipsoid’s semi-principal axes can be derived by an eigendecomposition of E, such that
where the lengths of the semi-principal axes are , and V1, V2, V3 are their associated directions (eigenvectors). The matrix V also represents the rotation matrix of the ellipsoid from the orientation where it is aligned to the principal axes of the coordinate frame to the actual orientation.
One method for finding an equivalent ellipsoid of a molecule is to find the minimum volume ellipsoid (MVE) that encloses all of the atoms in the molecule. Several methods exist for computing MVE [27, 28]. When we use the MVE in PATI, we denote the method as PATI-E.
A second method for constructing an equivalent ellipsoid of a molecule is to find an ellipsoid with the same gyration tensor as the molecule. The length of the ith axis of the equivalent ellipsoid based on the gyration tensor is computed as , where λi is the ith eigenvalue of the gyration tensor. We will refer to the ellipsoid with an equivalent moment of inertia as the Gyration Ellipsoid (GE) (see Fernandes et al. ).
A third method for finding an equivalent ellipsoid is to fit the molecular surface. This can be achieved via the Principal Component Analysis (PCA) of the coordinates of the points representing the surface of the molecule (see Ryabov et al. ). The length of the ith axis of the equivalent ellipsoid based on this method is , where λi is the ith eigenvalue of the corresponding covariance matrix. We refer to this method as PCAE.
Figure 2 illustrates the three ellipsoid models of the molecular surface for the Cyanovirin-N molecule.
As shown in Appendix B, for an arbitrary ellipsoid under an Euler rotation R(α, β, γ), η(α, β, γ) can be expressed as
In this section we present a comprehensive comparison of several methods for computing RDCs ab initio. All the RDC data analyzed here are for the backbone NH bonds located in structurally well-defined regions of proteins, i.e. the α-helices and β-sheets. The RDC data were retrieved from the BMRB repository using the PDB code of the molecule. Only the RDC values measured using the neutral bicelle alignment medium (or, in the case of the B3 domain of protein G, the PEG/hexanol-based medium) are used. The 9 proteins and their codes in the Protein Data Bank are listed in Table 1.
We assess the quality of our results by computing the quality factor between the vector of experimental RDCs, Dexp, and our predicted RDCs for those same bonds, Dpred, as :
where ||·||2 is the l2-norm.
Note that the predicted magnitude of the RDC values depends on the experimental conditions (they determine the barrier height h) and selection of values for equation constants, e.g. CPQ. These factors affect all RDCs approximately uniformly, and hence can be repsesented by a scaling factor. Therefore, in order to make our analysis less sensitive to possible errors in experimental conditions and imperfect selection of values for constants, we also introduce the scaled quality factor to quantify the agreement between the experimental and predicted data with an unknown scaling factor. We define the scaled quality factor as
where the scalar ρ can be computed by linear least squares. (Note that both PATI and PALES can predict the magnitude of RDC values with reasonable accuracy. See Supplementary Material for the values of ρ.)
First, we present the Q values for the experimental alignment tensor. We define the experimental alignment tensor, Ã, as the alignment tensor that optimally fits the data, i.e. gives the lowest Q value between the experimental and back-calculated RDC data. This quality factor allows us to examine whether the experimental data are well approximated by the theoretical equation for RDCs. We derive Ã by solving a linear least-squares problem of the form
where is the normalized vector representing the orientation of the ith bond relative to the molecular coordinate frame, n is the number of bonds, and CNH is the value of CPQ in Equation (2) for a NH bond. The linear least-squares problem can be solved by standard methods; see, e.g., .
Note that Ã can be decomposed into the experimental rotation (eigenvectors) and experimental magnitudes (eigenvalues) Ã1, Ã2, Ã3, where
The Q values for the experimental alignment tensor derived using Equation (16) are presented in Column 3 (‘LS’) in Table 1. The corresponding Q values for the best-fit alignment tensor derived from PALES are presented in Column 4, labeled PALES-LS. Naturally, Qs = Q for both methods. It is worth emphasizing here that this quality factor measures the actual quality of the experimental data (i.e. how well they fit the theoretical equation for RDCs) and therefore provides the baseline Q value for subsequent evaluation of the prediction methods. Note also that the values in the parentheses, the relative error in the alignment tensor, confirm that Equation (16) gives the same experimental alignment tensor Ã as the alignment tensor derived using PALES’s best-fit algorithm.
Table 1 shows that the experimental RDC data are of high quality and consistent with the theoretical formulation of the RDC (Equations (1–2)). This is not surprising given that these RDCs were used as constraints in the calculation/refinement of the corresponding protein structures. The quality of the agreement is illustrated in Figure 3A for Cyanovirin-N. (See also Supplementary Material.)
The results of our ab initio calculations are presented in Table 2, for PATI, PALES, and for the ellipsoidal approximation methods using the MVE model. The MVE ellipsoid data are used in this table as this model provides on average a slightly more accurate estimation of the alignment tensor compared to the other two equivalent ellipsoid models considered in this study. (See Supplementary Material.) Surprisingly, the scaled quality factor Qs was rather high for all prediction methods, indicating a generally marginal agreement with experimental data, as illustrated in Figure 3D and the Supplementary Material. PATI and PALES calculations gave on average a slightly better agreement with the data compared to the other methods. It should be emphasized here that PATI gives almost identical results to PALES, as evident from Figure 4. In order to understand the reasons for the observed inaccuracy in our predictions, we now break down the contributions to the errors into those due to the principal values of the alignment tensor and those due to inaccuracy in its orientation.
In Table 3 we compare the Qs values for ‘synthetic’ alignment tensors that have the same orientation as the ab initio calculated tensors but the correct (experimental) principal values. We constructed these tensors by combining the rotation matrix V determined from our five models, GE, PCAE, MVE, PATI, and PALES, with the experimental magnitudes Ã1, Ã2, Ã3 of the alignment tensor derived from Ã. Such a comparison is expected to rank the methods based on the accuracy of prediction of the tensor’s orientation. Since the orientation of V for the equivalent-ellipsoid-based methods is derived directly from the orientation of the ellipsoid, this table also provides a direct comparison of the ellipsoid models. Note that there are six different combinations for V, since it is unknown a priori which Ãi is associated with which Vj. The smallest of the six Qs values is shown. Naturally, Qs = Q in this case.
As evident from Table 3, correcting the principal values of the alignment tensor while keeping its predicted orientation did not improve the agreement with experimental data. (See also Figure 3B.) There are large variations among the various models in the accuracy of the predicted orientation of the alignment tensor. Of the three ellipsoid models tested here, MVE gave on average a somewhat better orientation (as documented in the Supplementary Material), while PATI and PALES yielded generally similar results.
We then constructed ‘synthetic’ alignment tensors that have the correct orientation (i.e. the matrices derived from the experimental tensors Ã) but the same principal values (A1, A2, A3) as the ab initio calculated tensors. Table 4 displays the Qs values for five prediction methods, PATI, PALES, PATI-E, Almond, and PROLFIT. From this table, it is clear that using the correct orientation of the tensor dramatically improved the agreement with experimental data (cf. Table 2). This improvement is illustrated in Figure 3C for Cyanovirin-N and in the Supplementary Material for the other molecules.
Note that Column 3 (‘PATI-E’) and Column 5 (‘PROLFIT’) in Table 4 show that an additional degree of freedom provided by a fully anisotropic ellipsoid versus an axially-symmetric prolate ellipsoid approximation gives an improvement in the Qs.
Thus, the analysis presented above demonstrates that accurate prediction of the orientation of the alignment tensor is critical for the agreement with experimental data. Accurate prediction of the principal components of the tensor is important, too. However, when experimental RDCs are available, one can make an educated guess, based on the observed histogram/distribution of the data, about the magnitude of the tensor components (e.g., as described in ) and scale the predicted alignment tensor appropriately, whereas there is no obvious way to predict the orientation of the tensor.
We have reformulated the planar barrier model as a numerical integration problem and implemented it in a program called PATI. Our method has accuracy similar to PALES but is computationally more effcient and allows for finer control over numerical error. In addition, the convex hull provides a simpler representation of the surface, thus further increasing the computational effciency of the proposed method. This could allow PATI-based RDC prediction to be incorporated into the existing structure determination/refinement protocols. Because the molecular alignment tensor (and hence the RDC) is sensitive to the overall size and shape of the molecule, this would provide additional structural constraints that could potentially improve the accuracy of structure determination by NMR.
We compared several methods (old and new) for the computation of an equivalent ellipsoid of a molecule. We examined the accuracy of these equivalent ellipsoid models in predicting the alignment tensor and showed that the minimal volume ellipsoid gives on average a slightly better prediction of the alignment tensor orientation.
Finally, we compared all these methods against an extensive set of experimental RDC data. The analysis of the discrepancy between the experimental and predicted values emphasized the importance of the accurate prediction of the orientation of the alignment tensor. Possible sources of inaccuracy in ab initio alignment tensor prediction are the dynamic nature (structural flexibility) of protein molecules, not accounted for in the current prediction models, as well as the fact that the simple steric barrier model might not fully allow the correct alignment of all the molecules.
Supported by NIH grant GM065334 to D.F. The PATI program is available from the authors upon request.
In this section we will simplify the expression for A for an arbitrary molecule from the quadruple integral of Equation (7) to a double integral.
Define the Euler z-y-z rotation matrix as R(α, β γ) = Rz(γ)Ry(β)Rz(α), where
Multiplying the three matrices yields the full expression
We now write the equations for A11, A22, A33, A21, A31, A32, and N, recalling that the Jacobian is J = sin β/(8π2):
We observe that γ does not contribute to the vertical size of the molecule, and redefine η(α, β, γ) as η(α, β). Integrating by γ and z first gives us
We perform a change of variable, u = cosβ, obtaining
Integrating the terms that do not involve η gives us Equation (9).
In this section we derive the analytical expression for η for an arbitrary ellipsoid, following the notation of Section 2.4. Due to the symmetry of the ellipsoid we consider only one octant in our analysis, expressing all points p on the ellipsoid in that octant as
for x [0, a] and .
A rotation of the ellipsoid by R(α, β, γ) transforms the coordinates of these points into
We observe that
where x*(α, β, γ) and y*(α, β, γ) minimize z′.
To find the minimum/maximum value of our rotated ellipsoid, we solve z′(x, y) = 0:
Therefore, x*, y* minimizes z′, and the optimal value of z is
where γ is arbitrary for the Euler rotation defined in Equation (A1).
The axis-angle representation of a rotation parameterizes the rotation by two values: a unit vector indicating the orientation of the axis, u, and an angle, θ, describing the magnitude of the rotation about that axis. The direction of rotation around the axis u is determined by the right-hand rule. One advantage of the axis-angle representation over Euler angles representation is that one can easily quantify the magnitude of the rotation by the size of the rotation angle θ.
Given a rotation matrix R, θ is computed as
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.