Advances in NMR instrumentation and pulse sequence design have resulted in easier acquisition of Residual Dipolar Coupling (RDC) data. However, computational and theoretical analysis of this type of data has continued to challenge the international community of investigators because of its complexity and rich information content. Contemporary use of RDC data has required a priori assignment, which significantly increases the overall cost of structural analysis. This article introduces a novel algorithm that utilizes unassigned RDC data acquired from multiple alignment media (nD-RDC, n≥3) for simultaneous extraction of the relative order tensor matrices and reconstruction of the interacting vectors in space.
Estimation of the relative order tensors and reconstruction of the interacting vectors can be invaluable in a number of endeavors. An example application has been presented where the reconstructed vectors have been used to quantify the fitness of a template protein structure to the unknown protein structure. This work has other important direct applications such as verification of the novelty of an unknown protein and validation of the accuracy of an available protein structure model in drug design. More importantly, the presented work has the potential to bridge the gap between experimental methods and computational methods of structure determination.
Recent advances in the instrumentation of Nuclear Magnetic Resonance (NMR) spectrometers, together with advances in pulse sequence design, have significantly improved the ease with which Residual Dipolar Coupling (RDC) data can be acquired. Over the past decade, RDC data have been used to study the structure and dynamics of macromolecules including RNA/DNA [1; 2], carbohydrates [3–5] and proteins [6–16]. More recently, RDC data have been used successfully in simultaneous structural elucidation or characterization of internal dynamics in both aqueous [11; 17–19] and membrane [20–25] proteins.
Central to the study and analysis of RDC data lies the accurate estimation of alignment tensors, which provide the required information for characterization of structure or study of internal motion. Presently, the main method of determining order tensor estimates from RDC data relies on the costly and time-consuming requirement of resonance assignment and the existence of a high-resolution structure [26; 27]. Some research has been conducted in obtaining order tensor estimates from unassigned RDC data collected in a single medium by comparison to the background RDC distribution obtained for an infinite number of uniformly distributed vectors (powder pattern) [28; 29]. These methods work reasonably well for certain large proteins. In general, however, the estimates of the principal order parameters obtained this way are not sufficiently accurate. Furthermore, it is mathematically impossible to determine any orientational information using these methods. Recent work [30; 31] has combined methods of estimating the principal order parameters of the order tensors from unassigned RDC data with a known structure to approximate the orientational components of the order tensor as well. This method has the advantage of not requiring a high-resolution structure; a representative of the structure’s protein fold family or the structure of a closely related homologue will often suffice. However, the order tensor estimates obtained in this way may not generally be trustworthy since the method still fundamentally assumes adequate sampling of the RDC space.
Here, we present a method that utilizes unassigned RDC data collected from 3 or more alignment media in order to provide highly accurate relative order tensors (as defined in section 2.2) for each of the alignment media. This method is notable for avoiding the requirements of assignment or a priori knowledge of the structure while still being able to determine the relative orientation and the strength of alignment (principal order parameters) of the order tensors for each of the alignment media. An additional consequence of our algorithm is the reconstruction of the interacting vectors within the principal alignment frame of the anchor alignment medium (defined in section 2.2) to within 2 solutions when data from 3 or more aligning media are available.
The current version of our presented method provides the coordinates of the vectors in space without their assignment information. Despite the missing assignment, the reconstructed vectors can be of great utility. Here we also present an application of the reconstructed vectors in identifying the most homologous structure from a list of structures. These algorithms were implemented in the free statistical programming language and computing environment R (http://www.r-project.org/) and are available upon request from the corresponding author.
Residual dipolar coupling (RDC) data arise from the spin interaction between two nuclear magnetic moments in the presence of the external magnetic field (B0) of the NMR instrument. RDC data have provided many exciting avenues of exploration in recent years. In this article, we do not present the practical aspects of data acquisition and focus only on the theoretical formulation of the phenomenon that is relevant to our discussion. Interested readers are referred to existing review articles [12; 32–36] for additional information.
The RDC between two spin 1/2 nuclei i and j can be formulated as shown in equation 1, assuming a constant inter-nuclear distance.
In this equation, γi, γj are the gyromagnetic ratios of the two interacting nuclei, h is Planck’s constant, r is the distance between the two nuclei, and θij is the angle between the internuclear vector and the external magnetic field B0. The angled brackets (< · >) in equation 1 denote the time average dependence of the RDC observable. Manipulation of this equation can lead to a more commonly listed formulation of this interaction as shown in equation 2.
Where Dmax is the collection of all the constants from equation 1, v is the internuclear vector with the Cartesian coordinates (x, y, z), and S denotes the Saupe order tensor matrix, or order tensor matrix (OTM) for short. Any valid order tensor matrix, S, described in Cartesian space must be of dimensions 3×3 and must be symmetric and traceless [26; 27; 33; 38]. As shown previously [26; 27], a valid order tensor can then be decomposed by eigendecomposition into its diagonal form as shown in equation 5.
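As an illustration of the quadratic form described above, the following minimal Python sketch evaluates an RDC from a unit vector and an order tensor and checks the symmetric and traceless properties. It assumes that equation 2 has the form D = Dmax v S vᵀ. The function names, the example tensor and the default d_max = 1 (which returns the RDC in units of Dmax) are hypothetical choices made for illustration and are not taken from this work.

```python
def rdc(v, S, d_max=1.0):
    """Return d_max * v S v^T for a unit vector v and a 3x3 order tensor S.

    d_max = 1.0 is a placeholder; a real value depends on the nuclei involved.
    """
    return d_max * sum(v[a] * S[a][b] * v[b] for a in range(3) for b in range(3))


def is_valid_order_tensor(S, tol=1e-12):
    """Check that S is symmetric and traceless within a tolerance."""
    symmetric = all(abs(S[a][b] - S[b][a]) <= tol
                    for a in range(3) for b in range(3))
    traceless = abs(S[0][0] + S[1][1] + S[2][2]) <= tol
    return symmetric and traceless


# Hypothetical order tensor (symmetric, trace = 3e-4 - 5e-4 + 2e-4 = 0).
S_example = [[ 3e-4,  1e-4, -2e-4],
             [ 1e-4, -5e-4,  4e-4],
             [-2e-4,  4e-4,  2e-4]]

# Hypothetical unit vector (0.6^2 + 0.8^2 = 1).
v_example = (0.0, 0.6, 0.8)

d_example = rdc(v_example, S_example)
```

Because only five elements of a symmetric, traceless 3×3 matrix are independent, a function such as `is_valid_order_tensor` can serve as a cheap sanity check before any tensor is used in later steps.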
S’ in this equation is a diagonal and traceless matrix, and R represents an Euler rotation matrix [27; 39]. The diagonal elements of S’ in this formalism provide information regarding the strength of alignment and are referred to as the principal order parameters (POP). The rotation matrix R can be used to obtain orientational information relating the principal alignment frame [26; 27; 39] to the arbitrarily selected molecular frame. The rotation matrix R can be any valid Euler rotation matrix since the orientation of the molecular frame (for instance the orientation of a molecule in a PDB file) with respect to the principal alignment frame is arbitrary. For simplicity, R can then be decomposed into three distinct rotations about the axes z, y and z as shown in equation 6.
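The decomposition described above can be run in the forward direction to build an order tensor from its principal order parameters and three Euler angles. The sketch below assumes the conventions S = R S’ Rᵀ and R = Rz(α) Ry(β) Rz(γ); the exact conventions of equations 5 and 6 are not reproduced here, so both choices, together with all function names and numerical values, are assumptions made for illustration.

```python
import math


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    """Rotation matrix about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Assumed convention: R = Rz(alpha) Ry(beta) Rz(gamma)."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))


def order_tensor(sxx, syy, alpha, beta, gamma):
    """Build S = R S' R^T from two principal order parameters and Euler angles.

    The third principal order parameter follows from the traceless property.
    """
    diagonal = [[sxx, 0.0, 0.0], [0.0, syy, 0.0], [0.0, 0.0, -(sxx + syy)]]
    R = euler_zyz(alpha, beta, gamma)
    return matmul(R, matmul(diagonal, transpose(R)))


# Hypothetical principal order parameters and angles (radians).
S_rot = order_tensor(3e-4, 5e-4, 0.3, 1.1, -0.7)
```

Any rotation leaves the tensor symmetric and traceless, which is why two principal order parameters plus three angles account for the five independent elements of S.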
This formulation of the RDC interaction can be conceptualized as re-describing the Cartesian coordinates of the interacting vector in the principal alignment frame (PAF) and then using the simpler equation 8 to describe the RDCs. Under this formulation, xo, yo and zo denote the Cartesian coordinates of the interacting vector described in the PAF.
Equation 8 is often written in polar coordinates as shown in equation 9 since it simplifies the representation of each vector to two variables instead of the three variables used in the Cartesian coordinates. This reduction in the number of variables is a consequence of the normality constraint of the interacting vectors.
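The equivalence of the Cartesian and polar descriptions in the principal alignment frame can be checked numerically. This sketch assumes that equation 8 has the form D = Dmax (Sxx x² + Syy y² + Szz z²) and that θ is measured from the z axis of the PAF with ϕ as the azimuth; the polar convention of equation 9 may differ, and all names below are hypothetical.

```python
import math


def polar_to_cartesian(theta, phi):
    """Unit vector for polar angle theta (from z) and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def rdc_paf_cartesian(v, sxx, syy, szz, d_max=1.0):
    """RDC in the principal alignment frame from Cartesian coordinates."""
    x, y, z = v
    return d_max * (sxx * x * x + syy * y * y + szz * z * z)


def rdc_paf_polar(theta, phi, sxx, syy, szz, d_max=1.0):
    """Same quantity written with the two polar variables only."""
    sin2 = math.sin(theta) ** 2
    return d_max * (sin2 * (sxx * math.cos(phi) ** 2 + syy * math.sin(phi) ** 2)
                    + szz * math.cos(theta) ** 2)


# Hypothetical principal order parameters (traceless) and angles in radians.
sxx, syy, szz = 3e-4, 5e-4, -8e-4
theta, phi = 0.7, 1.9
d_cart = rdc_paf_cartesian(polar_to_cartesian(theta, phi), sxx, syy, szz)
d_polar = rdc_paf_polar(theta, phi, sxx, syy, szz)
```

The polar form needs only two variables per vector because the unit-length constraint is built into the parameterization.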
When RDC data are available from multiple alignment media, each medium’s order tensor is decomposed to result in distinct rotation matrices denoted by Rj, where j indicates the designation of the alignment medium as shown in equation 10. Each of these rotation matrices provides an absolute relationship between each alignment frame and the arbitrarily chosen molecular frame. Alternatively, the orientational component of the alignment for each of the alignment frames can be described in relation to an anchor alignment frame as shown in equation 11. Under this formalism, the rotation matrix RA describes the orientation of the alignment tensor of the anchor medium with respect to the molecular frame and RAj describes the relative orientation of the jth alignment medium relative to the anchor alignment frame. We therefore define relative order tensor as described in equation 11.
Careful selection of the molecular frame can easily eliminate RA from the formalism shown in equation 11 above. If the molecular frame is selected to coincide with the alignment frame of the anchor medium, RA will be equivalent to an identity matrix and therefore can be eliminated from the entire equation.
Because each order tensor is traceless and symmetric, the set of absolute order tensors for n alignment media can be described by 5n independent variables (e.g. 15 variables for three alignment media). Representation of the RDC data acquired from multiple alignment media in terms of relative order tensors will in total require the same number of independent variables (5n). However, one can partition these 5n variables into two sets of 3 and 5n-3 variables, where 3 independent variables are required to describe the orientational relationship between the MF and the PAF of the anchor medium (PAFA), and 5n-3 additional variables are required to describe the n relative order tensors. As mentioned before, careful selection of the molecular frame will result in the elimination of the 3 variables required to describe the relationship between MF and PAFA. For example, selection of the MF to be coincident with the principal alignment frame of the first medium removes three degrees of freedom, and in the relative order tensor formalism, 12 variables are required to describe three alignment media.
When RDC data are collected from multiple alignment media, chemical shift data can be used to correlate the data. Correlated RDC data in this context are defined as the RDC observables in each medium that originate from a given vector, without any knowledge of its assignment along the primary sequence of the protein. In this section we present a conceptual discussion of the reconstruction of interacting vectors using their correlated RDC data.
Estimation of order tensors from a set of vectors paired with RDC values through the use of Singular Value Decomposition (SVD) has been discussed extensively in the literature [26; 27; 39]. However, there has been little discussion of obtaining the orientation of the interacting vectors from a given set of order tensors and correlated RDC data from multiple alignment media. Figure 1(a) illustrates the position of all possible vectors that produce the same value of RDC for an example alignment tensor. In general, there are infinitely many vectors (the red band) that correspond to the same RDC value. This infinite degeneracy can be reduced to a four-fold degeneracy (in general) by using RDC data from a second alignment medium as shown in Figure 1(b). Extension of the same logic will reduce the final degeneracy to two-fold by incorporating RDC data from the third alignment medium. Figure 1(c) provides an illustration of this conclusion. These two solutions are the exact negation of one another and cannot be disambiguated by the inclusion of RDC data from any number of additional alignment media. It is important to note that there are exceptions to what has been shown here; these exceptions are discussed further in section 2.3.
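The first of the two directions mentioned above, estimating an order tensor from vectors paired with RDC values, can be sketched as a linear least-squares problem in the five independent tensor elements. The cited approach uses SVD; for brevity this sketch instead solves the normal equations by Gaussian elimination, which is a simpler substitute and is less robust when the vectors sample orientations poorly. All function names, the example tensor and the example vectors are hypothetical.

```python
def rdc(v, S, d_max=1.0):
    """Forward model assumed here: d_max * v S v^T."""
    return d_max * sum(v[a] * S[a][b] * v[b] for a in range(3) for b in range(3))


def design_row(v):
    """Coefficients of (sxx, syy, sxy, sxz, syz) after substituting szz = -sxx - syy."""
    x, y, z = v
    return [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]


def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular system: vectors sample too few orientations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_order_tensor(vectors, rdcs, d_max=1.0):
    """Least-squares order tensor from unit vectors and their RDC values."""
    A = [design_row(v) for v in vectors]
    d = [value / d_max for value in rdcs]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(5)] for i in range(5)]
    Atd = [sum(row[i] * di for row, di in zip(A, d)) for i in range(5)]
    sxx, syy, sxy, sxz, syz = solve_linear(AtA, Atd)
    return [[sxx, sxy, sxz], [sxy, syy, syz], [sxz, syz, -(sxx + syy)]]


def unit(v):
    norm = sum(c * c for c in v) ** 0.5
    return tuple(c / norm for c in v)


# Hypothetical noise-free test case: recover a known tensor from seven vectors.
S_true = [[3e-4, 1e-4, -2e-4], [1e-4, -5e-4, 4e-4], [-2e-4, 4e-4, 2e-4]]
vectors = [unit(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
                             (1, 0, 1), (0, 1, 1), (1, 1, 1)]]
rdcs = [rdc(v, S_true) for v in vectors]
S_fit = fit_order_tensor(vectors, rdcs)
```

At least five vectors with sufficiently different orientations are needed for the five unknowns; with noisy data, more vectors and an SVD-based solver would be the safer choice.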
As discussed in section 2.2, 5n-3 variables are required to describe n relative order tensors. Furthermore, representing k individual normalized vectors in polar coordinates requires 2k independent variables. Therefore, simultaneous study of n relative order tensors and k vectors will require 5n-3+2k independent parameters while yielding nk RDC data points. Since each RDC data point corresponds to an equation in the variables describing the k vectors and n relative order tensors, it can be argued that in theory, simultaneous estimation of relative order tensors and reconstruction of vectors is possible so long as nk ≥ 5n+2k-3. For example, when n=3, k ≥ 12 should suffice for determining relative order tensors and orientation of internuclear vectors simultaneously. In particular, equation 12 can be formulated, which provides a complete description of the observed RDC values from n alignment media for vector i as a function of its polar coordinates (θi, ϕi) and n relative order tensors. Relative order tensors and the orientation of vectors can be obtained simultaneously by solving this system of equations.
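The counting argument above is easy to tabulate. The following sketch simply evaluates the inequality nk ≥ 5n+2k-3 stated in the text; the function names are hypothetical, and the result should be read as the counting threshold only, not as a guarantee that a particular data set is solvable.

```python
def unknowns(n_media, k_vectors):
    """Independent parameters: 5n - 3 for the relative tensors plus 2k for the vectors."""
    return 5 * n_media - 3 + 2 * k_vectors


def equations(n_media, k_vectors):
    """One RDC equation per vector per alignment medium."""
    return n_media * k_vectors


def min_vectors(n_media):
    """Smallest k satisfying n*k >= 5n - 3 + 2k."""
    if n_media < 3:
        # For n <= 2, (n - 2) * k can never reach 5n - 3, whatever k is.
        raise ValueError("the counting condition needs at least 3 alignment media")
    k = 1
    while equations(n_media, k) < unknowns(n_media, k):
        k += 1
    return k


# Threshold for three, four and five alignment media.
thresholds = {n: min_vectors(n) for n in (3, 4, 5)}
```

With three media this reproduces the k ≥ 12 example given in the text, and the threshold falls as more media are added because each additional medium contributes k equations but only 5 unknowns.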
This problem can be visualized by noting that this system of RDC equations (equation 12) defines a mapping from the points on the surface of the unit sphere to the point (r1, r2, …, rn) that characterizes the RDC values corresponding to that vector in each alignment medium. The collection of these n-dimensional points, originating from an infinite number of randomly distributed vectors, defines a curved surface, denoted as the nD-RDC surface, which is purely a function of the n relative order tensors. The shape of this surface for a sample set of three order tensors is shown in Figure 2. There are several important points to note here. Firstly, since each vector always produces the same RDC values as its negation [27; 40], each point on the nD-RDC surface corresponds to two vectors. The notable exception to this is the places where the surface intersects itself. The points that lie along any intersection correspond to 4 vectors. In the presence of noisy data, if a point lies in the space near an intersection where it is close (within the experimental error tolerance) to two different parts of the 3D-RDC surface, it will be impossible to know which of the two parts of the surface it originated from, and instead of 2 vector solutions, this may give rise to 4 vector solutions. In practice, however, this is not a common occurrence.
It is important to note that the shape and orientation of the nD-RDC surface (Figure 2) is invariant to changes in molecular frame. A change in the relative orientation of the molecular frame to the anchor frame corresponds to a rotation of the infinite set of vectors distributed along a unit sphere. Because a rotation of a sphere results in an identical sphere, the nD-RDC surface remains unaffected. The problem of estimating order tensors can then be conceptualized as performing a best fit of nD-RDC surfaces to the data, and the problem of solving for vectors can be conceptualized as taking the inverse of the mapping from vectors to nD-RDC points. In practice, however, surface fitting is very time consuming and computationally inefficient. Due to the inefficiency of surface fitting methods, we are proposing an alternative approach to obtain a solution for equation 12. The ability to estimate vectors from order tensors coupled with the ability to estimate order tensors from vectors suggests the possibility of an iterative approach to estimating relative order tensors and reconstructing vectors in space.
While the nD-RDC problem is solvable, there may exist a finite number of degenerate solutions for some instances of nD-RDC data instead of a single unique solution. Here we will provide an informal presentation of this phenomenon.
Firstly, for a given set of relative order tensors and a vector v, both v and −v produce the same RDC value [27; 40]. That is, a vector and its exact negation always produce the same RDC value in every possible alignment medium. Secondly, negation of sxy and sxz for all of the relative order tensors and simultaneous negation of the x-component of each vector will produce the same exact RDC values. This, of course, also applies to negating other off-diagonal elements of the relative order tensors and their corresponding Cartesian coordinate as shown in Table 1.
At first, this degeneracy may appear to increase the solution space to 8-fold. However, when reconstructing vectors in space, relative orientation of all vectors with respect to each other is of critical importance. Negation of the off-diagonal elements of the relative order tensors will result in the negation of the corresponding coordinate for all vectors in space. Equation 13 can be employed to study the effect of the relative orientation of vectors as the result of this sign toggling.
The relative orientation of the vectors with respect to each other is conserved, since negating the x, y or z component of all the vectors cancels out when the relative orientation of the vectors is calculated. Note that some combinations of these negations may lead to inversion of space chirality, but will preserve the relative orientation of vectors in space.
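The sign degeneracy described above can be verified numerically. The sketch below assumes the quadratic form D = Dmax v S vᵀ (with Dmax set to 1) and uses the dot product as a stand-in for the relative-orientation measure of equation 13, which is not reproduced here. The tensor, the vectors and the function names are hypothetical.

```python
def rdc(v, S):
    """v S v^T in units of Dmax."""
    return sum(v[a] * S[a][b] * v[b] for a in range(3) for b in range(3))


def flip_tensor_axis(S, axis):
    """Negate the two independent off-diagonal elements of S that involve `axis`."""
    return [[-S[a][b] if (a == axis) != (b == axis) else S[a][b]
             for b in range(3)] for a in range(3)]


def flip_vector_axis(v, axis):
    """Negate one Cartesian component of v."""
    return tuple(-c if i == axis else c for i, c in enumerate(v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical tensor and two hypothetical unit vectors.
S = [[3e-4, 1e-4, -2e-4], [1e-4, -5e-4, 4e-4], [-2e-4, 4e-4, 2e-4]]
v1 = (0.6, 0.0, 0.8)
v2 = (0.48, 0.6, 0.64)

# For each axis: RDC change and dot-product change after the paired negation.
changes = []
for axis in range(3):
    S_flipped = flip_tensor_axis(S, axis)
    w1, w2 = flip_vector_axis(v1, axis), flip_vector_axis(v2, axis)
    changes.append((abs(rdc(w1, S_flipped) - rdc(v1, S)),
                    abs(dot(w1, w2) - dot(v1, v2))))
```

A single-axis negation is a mirror reflection, which is why the RDC values and the angles between vectors survive while the handedness of the reconstructed vector set may not.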
During the testing and evaluation of our methods, we have utilized simulated RDC data from three different proteins: 1A4Y (446 residues), 110M (153 residues) and 1SF0 (69 residues). These three proteins have been selected on the basis of their sizes to represent large, medium and small proteins respectively. Theoretical RDC data have been computed for these proteins with ±1 Hz error added from a uniform distribution to simulate experimental noise using the order tensors described in Table 2 and Table 3. Table 2 describes the order tensors in terms of principal order parameters and Euler angles. Table 3 lists the five essential elements that are necessary for complete reconstruction of the same order tensors as in Table 2. Although both of these tables describe the same order tensors, the latter representation reduces some ambiguities arising from the Euler angle representation. Furthermore, it is important to note that while equivalence of two order tensors can be established when their individual elements are equal, the converse is not true. Two order tensors may be composed of varying individual order parameters but produce RDC data in agreement to within the experimental error. It is therefore advisable to perform the comparison of two order tensors by observing their corresponding SF plots as well as comparison of the individual principal order parameters as demonstrated in section 4.1. The utility of simulated RDC data is invaluable to the proper study of a computational method since the ground truth is known ahead of time.
In addition to the simulated data, we have also used experimentally collected RDC data for the protein 1P7E from the BMRB database. Five sets of RDC data were available for this protein and all five were used as an illustration of the flexibility of our approach in accommodating experimental data from more than 3 alignment media.
Our proposed method operates in two major steps as shown in Figure 3. First, using a given set of relative order tensors, the orientations of the corresponding vectors are reconstructed in space. Second, a set of relative order tensors is obtained by using SVD and the reconstructed set of vectors from the first step. These two steps are iterated until the fitting score converges. The definition of the objective score utilized in this algorithm is shown in equation 14, where K indicates the total number of vectors and N indicates the total number of alignment media. The two RDC terms in this equation denote, respectively, the experimental and computed RDC values for the kth vector obtained from alignment medium n.
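A fitting score over K vectors and N alignment media can be sketched as follows. The exact form of equation 14 is not reproduced here; this example assumes a root-mean-square deviation between experimental and computed RDC values, which is a common choice but may differ from the score actually used. The function name and the sample numbers are hypothetical.

```python
import math


def fitting_score(experimental, computed):
    """Root-mean-square deviation over all vectors and media (assumed form).

    Both arguments are lists indexed as values[k][n], with k the vector index
    and n the alignment-medium index.
    """
    n_vectors = len(experimental)
    n_media = len(experimental[0])
    total = sum((experimental[k][n] - computed[k][n]) ** 2
                for k in range(n_vectors) for n in range(n_media))
    return math.sqrt(total / (n_vectors * n_media))


# Hypothetical RDC values in Hz: two vectors observed in two media.
experimental = [[1.0, 2.0], [3.0, 4.0]]
computed = [[1.0, 2.0], [3.0, 6.0]]
score = fitting_score(experimental, computed)
```

With a score expressed in Hz, convergence can be judged directly against the experimental error of the RDC measurements.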
Optimization of the declared objective function (equation 14) is relatively trivial when the proposed initial relative order tensors are in close proximity to the optimal solution. Since there is currently no existing method of estimating the orientational components of the anisotropy from unassigned RDC data, the presented method is forced to start from randomly selected orientational components of the relative alignment tensors. The initial principal order parameters are roughly estimated based on the observed minimum and maximum RDC values within each alignment medium. The search for the optimal principal order parameters is confined to a generous range provided by the user to assist with a more rapid convergence. A combination of grid search and simulated annealing has been implemented as the core optimization engine of our approach in order to increase the likelihood of convergence to the optimal solution from a distant starting point. At higher annealing temperatures, simulated annealing helps the search escape local minima; at lower annealing temperatures, it enables fine refinement of the best solutions. Generally, convergence of the algorithm can be determined by observing that the value of the objective function (equation 14) has become low enough to fall within the experimental error of the RDC data set. It is recommended that the process of heating and cooling be repeated several times to ensure the discovery of a near-global optimum. Convergence of each search instance is normally achieved within 50 steps, which takes less than a minute of execution time on a typical desktop computer. Based on empirical observation, it is recommended that the optimization from a random starting point be repeated 10–20 times in order to provide adequate robustness to noisy data and convergence toward a deeper minimum.
Overall execution of the presented algorithm is therefore on the order of one to two hours on a typical single-CPU desktop computer.
The first step in our approach consists of reconstructing the set of vectors in space from an initial value of the relative order tensors S. There are two possible approaches to reconstructing a set of vectors from order tensor estimates and RDC data: either a closed-form solution, where the orientation of a vector can be computed from a given nD-RDC data point, or a search of all possible vectors on the unit sphere. The former approach is generally preferable since it yields a solution with theoretically infinite precision in a fixed computation time. However, our attempt at manipulation of the system of equations shown in equation 12 with the symbolic math program Maple (http://maplesoft.com/) did not yield a closed-form solution. Therefore, our method relies on the latter approach by creating a finite number of isotropically distributed vectors as listed in equation 15 below. Figure 4 provides an illustration of isotropically generated vectors with n=15.
In this equation, 0 ≤ i ≤ n, 0 ≤ j ≤ 2n sin(θij), and (θ, ϕ)ij denote the polar coordinates of the internuclear vector. The density of the vectors can be adjusted by the parameter n, and the total number of internuclear vectors N can be approximated by equation 16. Using this discrete search mechanism, any internuclear vector can be captured by this isotropic vector set to within a small angular error that decreases as n increases. During our experiments, isotropically generated vectors with n=50 have been adequate.
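A discrete search of this kind can be sketched in a few lines. The grid below places n+1 latitude rings at θ = iπ/n and roughly 2n sin θ evenly spaced points on each ring, which follows the index ranges quoted above but is not guaranteed to match equation 15 exactly. The search returns the grid vector whose back-calculated RDCs best match the observed values in all media. The tensors, the function names and the use of Dmax = 1 are assumptions made for illustration.

```python
import math


def isotropic_vectors(n):
    """Approximately uniform unit vectors: n + 1 latitude rings, ~2n*sin(theta) points each."""
    vectors = []
    for i in range(n + 1):
        theta = math.pi * i / n
        points = max(1, round(2 * n * math.sin(theta)))  # a single point at each pole
        for j in range(points):
            phi = 2.0 * math.pi * j / points
            vectors.append((math.sin(theta) * math.cos(phi),
                            math.sin(theta) * math.sin(phi),
                            math.cos(theta)))
    return vectors


def rdc(v, S):
    """v S v^T in units of Dmax."""
    return sum(v[a] * S[a][b] * v[b] for a in range(3) for b in range(3))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct_vector(observed, tensors, grid):
    """Grid vector minimizing the squared RDC misfit summed over all media."""
    def misfit(v):
        return sum((rdc(v, S) - d) ** 2 for S, d in zip(tensors, observed))
    return min(grid, key=misfit)


# Three hypothetical relative order tensors; the first is diagonal (anchor medium).
tensors = [
    [[3e-4, 0.0, 0.0], [0.0, 5e-4, 0.0], [0.0, 0.0, -8e-4]],
    [[-4e-4, 2e-4, 1e-4], [2e-4, 6e-4, -3e-4], [1e-4, -3e-4, -2e-4]],
    [[5e-4, -1e-4, 3e-4], [-1e-4, -2e-4, 2e-4], [3e-4, 2e-4, -3e-4]],
]

grid = isotropic_vectors(20)
true_v = grid[len(grid) // 3]        # chosen on the grid so the search can recover it exactly
observed = [rdc(true_v, S) for S in tensors]
found = reconstruct_vector(observed, tensors, grid)
```

The search can only return one member of the two-fold degenerate pair, so a recovered vector should be compared with the true one up to sign. For a vector that does not lie on the grid, the angular error is limited by the grid spacing, which is the reason for increasing n.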
To demonstrate the utility of constructed vectors in space without assignment, we present its application in evaluation of the fitness of a proposed structure based on unassigned RDC data. The presented method differs from that of the previously reported work [30; 31]. The new approach proceeds by first reconstructing the vectors in space followed by evaluation of fitness of any proposed structure based on matching of the reconstructed vectors to the vectors from the proposed structure. The proposed protein structure may come from different sources such as computational modeling tools, homologous proteins from PSI-BLAST, or X-ray structure for validation.
The general flowchart of our matching algorithm is shown in Figure 5. Our matching algorithm consists of a search over all Rzyz rotations for the one that yields the best matching between the reoriented set of vectors from the proposed structure and the set of vectors estimated from the RDC data. Matching between two sets of vectors takes place by first back-calculating the theoretical RDC values for the proposed structure, followed by identifying the best match to the experimental set of data by using a bipartite matching algorithm. A bipartite matching algorithm seeks a least-cost matching between two sets of data. The bipartite matching algorithm possesses the advantage of producing an optimal match between two complete data sets with O(n³) execution time, where n is the number of data points in each set. Note that this is a significant improvement over the O(n!) execution time that is required for exploring all possible matching permutations.
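A least-cost bipartite matching between experimental and back-calculated RDC sets can be sketched with the Hungarian algorithm, which runs in O(n³) time. The text does not state which bipartite matching implementation was used, so the algorithm below, the squared-difference cost and all names and numbers are assumptions made for illustration.

```python
def hungarian(cost):
    """Minimum-cost assignment of each row to a distinct column (rows <= columns).

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-based)
    v = [0.0] * (m + 1)          # column potentials (1-based)
    p = [0] * (m + 1)            # p[j]: row currently matched to column j, 0 if free
    way = [0] * (m + 1)          # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_rdc_sets(experimental, back_calculated):
    """Match RDC tuples (one value per medium) by summed squared difference."""
    cost = [[sum((e - b) ** 2 for e, b in zip(exp, calc))
             for calc in back_calculated] for exp in experimental]
    assignment = hungarian(cost)
    total = sum(cost[i][j] for i, j in enumerate(assignment))
    return assignment, total


# Hypothetical RDC values in Hz: three vectors, two alignment media.
experimental = [[1.0, -2.0], [3.0, 0.5], [-4.0, 2.0]]
back_calculated = [[3.1, 0.4], [-3.9, 2.2], [0.9, -2.1]]
assignment, total_cost = match_rdc_sets(experimental, back_calculated)
```

In a template search, the total cost returned for each trial rotation would serve as the quantity to minimize over orientations.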
In theory, a grid search would be sufficient to find the best orientation. In practice, however, simulated annealing is required to facilitate convergence to a near-optimal solution. The score obtained at the best orientation is the fitting score of the template protein structure to the nD-RDC data.
It is inevitable that the estimated relative order tensors will contain some error due to reconstruction of vectors that may be in slight violation of geometrical constraints such as dihedral or bond angle constraints. When presented with vectors that have been derived from a valid structure, individual vectors are confined by geometrical constraints defined by the protein structure. These differences between the reconstructed set of vectors and the proposed set of vectors can be mitigated by utilizing singular value decomposition to fine-tune the estimated value of S. After the bipartite match, singular value decomposition can be applied to the system to obtain a more refined relative order tensor. Only small adjustments within a given error tolerance are allowed based on the initial S that was obtained from the previous step. Experiments show that this fine-tuning can effectively improve the precision of relative order tensor estimates while virtually eliminating any sign degeneracy of the estimated relative order tensors.
Theoretically generated RDC data as described in section 3.1 have been subjected to estimation of the relative order tensors. The five critical components of the resulting estimated order tensors are listed in Table 4. The results shown in these tables correspond to test proteins 1A4Y, 110M and 1SF0 respectively and should closely resemble that of the known relative order tensors listed in Table 3 in order to indicate a successful estimation. Although the listed results display a clear resemblance to the original order tensors, they are not exact. It is therefore important to properly quantify the similarity between these results. However, the comparison of any two given order tensors is not as trivial as it may appear. It is important to note that simple comparison of individual elements of two order tensors may be misleading. Individual elements of two order tensors may exhibit large differences (as much as 100%) and yet produce nearly indistinguishable sets of RDCs. The complexity arises for a number of reasons. Mainly, the composition of an order tensor exhibits a linear relationship with respect to the principal order parameters and a quadratic relationship with respect to the orientation of the alignment frame (refer to equation 5). A more meaningful comparison of any two order tensors should consist of two separate steps: a comparison of the principal order parameters and a comparison of orientational components of the anisotropy. Here, it is adequate to numerically compare the principal order parameters of two given order tensors and visually compare the orientational components in the form of a SF-plot as described before [26; 27]. The principal order parameters are listed in Table 5 for all three proteins. The SF-plots illustrating a visual comparison of the orientational components of the relative order tensors are shown for only two proteins (1SF0 and 1A4Y) in Figure 6. The SF-plot of the protein 110M has been neglected for brevity. 
Figure 6 illustrates all acceptable order tensors that will generate RDC data within 1Hz of the simulated RDC data. The value of ±1Hz corresponds to the noise level that was used during the generation of simulated RDC data. In addition to this cluster of valid order tensors, the estimated order tensor obtained from 3D-RDC analysis has been superimposed. Without careful examination of these SF-plots, the location of the estimated relative order tensors is difficult to observe simply because they are embedded within the cluster of solutions. These results indicate that our proposed nD-RDC method is capable of producing a valid order tensor from RDC data corrupted with ±1 Hz of error. Figure 7 provides a graphical representation of the actual backbone N-H vectors of the protein 1SF0 (in blue) and the reconstructed positions (in white). In this image, the inversion relative of each possible reconstructed vector has been removed manually. The reconstructed vectors exhibit an average accuracy of ~5° with respect to the original one. This protein has been intentionally selected because it is the smallest protein and the scarcity of RDC data will in general lead to a less precise estimation of the order tensor. Reconstruction of vectors in space for this protein will therefore serve as an example of a more challenging case.
The assigned experimental RDC data from the protein 1P7E were obtained from BMRB. 1P7E is the structure of a 56 residue, immunoglobulin G binding protein, which had been previously obtained through refinement of an initial X-ray structure using RDC data. The backbone N-H RDC data in addition to the assignment of data and atomic coordinates were used to obtain the best order tensor solution using the program REDCAT. The resulting best order tensors are listed in Table 6 for each of the 5 alignment media. The last column in this table displays the total number of RDC data that were available for each alignment medium. Because of the missing data, only the vectors with data present in all alignment media were used (a total of 40 vectors). Here the alignment medium M1 has been selected as the anchor medium and the structure of the protein has been rotated so that the MF coincides with the PAF of M1. The best order tensors obtained are used to gauge the success of our proposed nD-RDC analysis method in estimating the relative order tensors from each of the five alignment media. The results of the nD-RDC analysis are listed in Table 7. As before, only the five critical components of each order tensor are listed in this table.
Note that two elements, sxy and sxz, of the estimated order tensors exhibit sign differences due to the degeneracy property discussed in section 2.4. Aside from the sign degeneracy, the back-calculated order tensor matrices have been well estimated. A graphical comparison between the expected and estimated elements of the relative order tensors is illustrated in Figure 8 after a manual correction of the sign degeneracy. Each point in this figure corresponds to one of the 23 non-zero elements of the five relative order tensors. The diagonal line in this figure represents the ideal case of perfect estimation of the unknown parameters. Based on the contents of this figure, the proposed method has been very effective in estimating the five relative order tensors. The overall effect of the few points that deviate slightly from the ideal line is minimal, as demonstrated in section 4.1. When decomposed, the effect of these slightly deviated elements of the relative order tensors falls within the allowed error for both the principal order parameters and the orientational components of the anisotropy.
The presented method of nD-RDC analysis is capable of simultaneous reconstruction of the interacting vectors and estimation of the relative order tensor matrices from RDC data alone. Figure 9 provides a visual comparison between the actual orientations of the backbone N-H vectors of 1P7E and the reconstructed vectors. In this plot, each internuclear vector originates from the center of a unit sphere and terminates with a dot on its surface. The back-calculated (blue) and the expected (white) internuclear vectors are linked by a line to illustrate the magnitude of the orientational error. Based on the results shown in Figure 9, some back-calculated internuclear vectors are more accurate than others. Of the 40 internuclear vectors, 36 are back-calculated with an error of less than 4° and 32 with an error of less than 2°. These results are in perfect agreement with our theoretical understanding of the RDC interaction. This varying degree of success is simply rooted in the varying sensitivity of RDC to the orientation of the interacting vector within the alignment frame.
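The orientational error summarized above can be computed as the angle between each expected vector and its reconstructed counterpart. The following is a minimal sketch with hypothetical function names and made-up vectors (not the 1P7E data). The optional `ignore_inversion` flag treats v and -v as equivalent, which is an assumption that may or may not match how the figure was prepared.

```python
import math

def angle_deg(a, b, ignore_inversion=False):
    """Angle in degrees between vectors a and b (they need not be normalized)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    cosine = dot / (norm_a * norm_b)
    if ignore_inversion:
        cosine = abs(cosine)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))

def count_within(errors, threshold_deg):
    """Number of vectors whose error is strictly below the threshold."""
    return sum(1 for e in errors if e < threshold_deg)

def rad(deg):
    return math.radians(deg)

# Made-up example: three expected vectors and estimates off by 1, 3 and 10 degrees.
expected = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
estimated = [
    (math.cos(rad(1.0)), math.sin(rad(1.0)), 0.0),
    (0.0, math.cos(rad(3.0)), math.sin(rad(3.0))),
    (math.sin(rad(10.0)), 0.0, math.cos(rad(10.0))),
]

errors = [angle_deg(e, r) for e, r in zip(expected, estimated)]
within_2 = count_within(errors, 2.0)
within_4 = count_within(errors, 4.0)
```

Applied to a set of reconstructed N-H vectors, the same two counts would give summary statistics of the kind quoted above for the 2° and 4° thresholds.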
The information regarding the reconstructed vectors in space can be of great benefit in a number of applications. Here we present results that demonstrate the utility of our proposed method of simultaneous reconstruction of vectors and estimation of relative order tensors despite the two-fold degeneracy in the vector solution space. Inclusion of information such as a template structure will automatically resolve degeneracy ambiguities in vectors while providing the rotational relationship between the molecular frame and the principal alignment frame of the anchor medium.
Table 8 lists the results for assessment of six structures as potential structural templates. Here we utilize the experimentally collected RDC data for 1P7E to evaluate the efficacy of a vector matching approach in identifying the appropriate template. The template protein structures are collected from CATH homologous superfamily 22.214.171.124 (Immunoglobulin Binding Protein) with sizes ranging from 56 to 70 residues. Application of the algorithm discussed in section 3.2 to each template protein domain generated the results listed in Table 8. The first column in this table provides the PDB identification code. The subsequent columns provide the score of our matching algorithm, the structural similarity to 1P7E measured over the backbone Cα atoms, and the size of each structure, respectively. Based on these results, not only has the correct structure been identified, but there is also a reasonable degree of correlation between the nD-RDC score and the structural similarity.
The analysis of unassigned RDC data presented here provides a mechanism for accurate estimation of the principal order parameters and relative orientational information regarding alignment of the subject protein in several media. This is the first method that forgoes both the need for assignment of data and the need for a preexisting structure. Accurate estimation of the principal order parameters can be invaluable in detecting internal motion between two domains of a molecular complex. A-priori knowledge of the POPs can be very beneficial in advancing the currently existing strategies for structure determination from RDC data [11; 13; 17; 39; 45]. Because of its minimal data requirement (3 RDC data per vector from at least 12 vectors), our proposed method may provide new avenues of structure determination in challenging cases such as membrane proteins.
Accurate estimation of the orientational components of anisotropy, in addition to the POPs, extends the utility of our proposed work in novel directions. This report has demonstrated the successful use of the knowledge of the relative order tensors in reconstructing the interacting vectors. As an example application, successful identification of the most homologous protein structure has been demonstrated.
The apparent close correlation between the score of the proposed method and the backbone rmsd of structures suggests many exciting applications of this new approach. It is easy to envision a novel protein target selection mechanism for use by the community of PSI and structural genomics initiatives. A more effective means of target selection will assist in rapid completion of the most diverse and inclusive set of protein structures. In addition, the proposed method may also be deployed as a means to bridge the gap between the experimental and computational approaches to protein structure determination. Often, protein modeling programs produce a list of most likely structures. These structures may exhibit as much as 11 Å structural diversity measured over the backbone Cα atoms. Methods for validating and/or selecting the correct structure from a list of proposed structures by using an affordable set of experimental data (unassigned RDC data) may be of great benefit. A reliable and theoretically sound method can help in validating a computationally modeled structure with a small amount of inexpensively acquired experimental data. The structure of an unknown protein can be predicted by computational methods or determined by experimental methods. The computational methods are considered cheap and fast, but the quality of the prediction still depends on a number of factors. Therefore, blind acceptance of computational modeling results is still not a common practice. On the other hand, experimental methods can determine protein structure at high resolution, but at high cost and with long data acquisition and analysis times. Among experimental data collection procedures, RDC data are relatively easy to collect, while NOE data are much more costly.
Although a small amount of unassigned RDC data does not provide enough information for construction of a high-resolution protein structure, it can play an important role in filtration of impossible structures and evaluation of the fitness of a proposed protein structure, which could be selected from either computational methods or by identifying homologous proteins.
This work has been funded by NSF grant number MCB-0644195 and grant number 1R01GM081793 from the National Institutes of Health to Dr. Homayoun Valafar.