|Home | About | Journals | Submit | Contact Us | Français|
The number of molecules with solved three-dimensional structure but unknown function is increasing rapidly. Particularly problematic are novel folds with little detectable similarity to molecules of known function. Experimental assays can determine the functions of such molecules, but are time-consuming and expensive. Computational approaches can identify potential functional sites; however, these approaches generally rely on single static structures and do not use information about dynamics. In fact, structural dynamics can enhance function prediction: we coupled molecular dynamics simulations with structure-based function prediction algorithms that identify Ca2+ binding sites. When applied to 11 challenging proteins, both methods showed substantial improvement in performance, revealing 22 more sites in one case and 12 more in the other, with a modest increase in apparent false positives. Thus, we show that treating molecules as dynamic entities improves the performance of structure-based function prediction methods.
Understanding the function of molecules is the first step towards learning how to manipulate them. Experimental techniques for determining molecular function tend to be expensive and time consuming. Consequently, computational methods can be critical for establishing molecular function. Some techniques use similarity in the sequence of molecules: when protein sequences are at least 40% similar, they usually perform similar functions and so annotations can be transferred with some confidence. However, when sequence similarity falls below 40%, the reliability of these methods decreases (Wilson et al., 2000).
Some molecular sites of interest, such as binding pockets and enzyme active sites, may not be recognized with sequence patterns alone, as they may be comprised of loop segments that come together in three-dimensions, but are distant in the primary sequence. In these cases, similarity may exist on the structural level even when there is no detectable similarity in the corresponding sequences. In order to address this challenge, many structure-based function recognition methods use shared 3D structural environments or “3D motifs” to recognize molecular functions (Fetrow et al., 1998; Halperin et al., 2008; Wallace et al., 1997). Some methods also combine information from sequence and structure (Friedberg, 2004; Watson et al., 2005).
Since the initiation of the structural genomics (SG) efforts, the number of solved structures in the Protein Data Bank (PDB) (Berman et al., 2000) with unknown function is increasing (Chandonia and Brenner, 2006; Levitt, 2007; Terwilliger, 2004). SG efforts specifically target proteins with little sequence similarity to functionally annotated folds. Therefore, structure-based function annotation methods able to recognize distant functional relationships would be very useful (Halperin et al., 2008; Nayal and Di Cera, 1994).
The structures from the X-ray crystallographic studies in the PDB typically provide static snapshots of molecules that are averages of the structures in the crystal lattice. Crystal packing induced by crystallization conditions and experimental modifications required to facilitate experimental studies may create slight variations of the structure that do not represent the biologically functional conformations. This may frustrate attempts to annotate function based on detailed analysis of the structures.
While performing their functions, molecules undergo dynamic structural changes, which can range from subtle side-chain rearrangements to large-scale domain movements (Henzler-Wildman and Kern, 2007). Physics-based molecular dynamics (MD) simulations allow structural investigation of such motions on the femtosecond (fs), nanosecond (ns), and occasionally microsecond scales (Karplus and Kuriyan, 2005; Karplus and McCammon, 2002). These motions may reveal conformations relevant to molecular functions. At the very least, sampling near the crystal structure may help in identifying structural motifs that are not obvious in a single static structure.
The idea of using information about dynamics to improve functional analysis is not new. Alternative X-ray structures, NMR ensembles and MD-generated trajectories all provide information about dynamics useful for structure-based drug design (Damm and Carlson, 2007; Huang and Zou, 2007; Meagher and Carlson, 2004; Sivanesan et al., 2005; Wong et al., 2005). Eyrisch and co-workers (Eyrisch and Helms, 2007) used an algorithm to locate surface pockets in structures generated by the MD simulations, where those were not apparent in the original crystal structures. Among those transient surface pockets were the actual active sites of the molecules. Frembgen-Kesner and co-workers (Frembgen-Kesner and Elcock, 2006) also demonstrated that coupling a docking method with MD simulations uncovers cryptic drug binding sites.
In this manuscript we describe our work in improving the performance of structure-based function recognition methods by coupling them to molecular dynamics (see Figure 1). The MD simulations demonstrated formation of favorable, albeit transient, conformations that suggest functions not apparent in the initial PDB structures. Specifically, for eleven different molecules we used FEATURE (Wei and Altman, 2003) and a valence-based method (Nayal and Di Cera, 1994) to analyze the presence of Ca2+ binding sites in 22 structural ensembles generated by the MD simulations using GROMACS.
Calcium plays a crucial role within many signaling pathways in the cell. Recognizing the ability of molecules to bind calcium may permit manipulations with direct impact on many aspects of cell life. Ca2+ binding sites tend to occur within flexible loops of the molecules and be oxygen rich, with several glutamate and aspartate residues lining the pocket. Often solvent molecules are also involved, since the binding occurs at the surface of the molecule. The straight forward coordination of calcium binding sites makes it surprising that the leading function-recognition methods fail to identify them even within structures with bound calcium. We show that allowing function recognition methods to examine the dynamic nature of the molecules dramatically improves their performance.
The eleven molecules studied here have at least one documented calcium-binding site (see Table 1 and Discussion). For each of the molecules at least two structures exist in the PDB: a HOLO form, with Ca2+ bound within the structure, and an APO one, without a bound Ca2+. Existing methods for structure-based function recognition often do not identify the APO sites, and even can miss the HOLO sites. As such, these structures make an excellent set of test cases. In addition to revealing the concealed Ca2+ binding sites in the HOLO structures, can simulation identify the binding sites in the APO structures?
In five of the eleven pairs of structures the two partners do not have identical sequences, because one contains at least one mutation. In four pairs these mutations are far from the Ca2+ binding sites. In one HOLO – APO pair, namely 1B9A-1B8C, a mutation affecting residue 101 (GLU to ASP) appears in the coordinating loop of the 1B8C CA109 site. This mutation in the APO modifies the affinity of Ca2+ binding at the CA109 site and eliminates Ca2+ binding at the CA110 site (Cates et al., 1999). As noted below, we are able to recognize the loss of this site.
Of the 22 structures selected for this work, five structures required slight alterations, as described in the Supplemental Experimental Procedures section 1. All these were relatively modest, consisting of easily filled missing atoms or small gaps of missing residues or modified residues that could be easily replaced with the naturally occurring ones.
After the initial equilibration period, the energy analysis (data not shown) for each simulated system showed no significant change in potential energy over time and the RMS deviation to the starting structure generally plateaued below 6 , suggesting that all systems were stable. In addition, the secondary structure content for each system simulated also showed no significant change over the course of the simulations (see Table 2). The trajectories were sampled every 2.5 ps and those structures were evaluated by both FEATURE and the valence method (see Experimental Procedures).
FEATURE examines local 3D structural environments looking at ~80 features in six concentric spherical shells 1.25 apart (Halperin et al., 2008). Based on a model that is built by comparing true sites and non-sites for a given property, FEATURE then assigns a score to local environments, which describes how likely they are to represent one of interest. Coupling FEATURE to the structural dynamics of molecules considerably improved its performance, revealing 22 more Ca2+ binding sites in addition to the 15 (9 in the HOLO and 6 in the APO initial PDB structures) found readily by FEATURE alone. In addition to the true Ca2+ binding sites, FEATURE found 7 false positive sites in the HOLO and 9 in the APO structural ensembles. A number of these are nonetheless interesting, as described below. The overall distributions of scores for static structures and for structures sampled during the molecular dynamics were not significantly different (see Figure 2). As expected, points that scored poorly for calcium binding were generally within the hydrophobic protein cores or at the structural periphery, where there were not enough atoms for the FEATURE algorithm to consider.
In order to test the generality of our approach to improve structure-based function prediction, we tested a second method which identifies Ca2+ binding sites in 3D environments. This method (Nayal and Di Cera, 1994) relies on measuring the negative charge around a potential site. Performance of the valence method also showed marked improvement with MD: it identified 15 Ca2+ binding sites in the HOLO structural ensembles, while only 3 in the initial HOLO structures.
Since FEATURE generally performed better than the valence method, the remainder of this section is focused on some results with FEATURE, which are presented below for selected HOLO – APO pairs. Results for the remaining pairs are presented in the Supplemental Results and Table 1 summarizes all results.
The results of FEATURE scanning for the 3DNI – 1DNK HOLO – APO pair were particularly interesting. This molecule contains only two Ca2+ binding sites, based on the simple scheme of relating our results to the Ca2+ ions bound within the HOLO structure (see Experimental Procedures). However, literature investigation revealed that this molecule contains one more characterized Ca2+ binding site at the active site (Suck et al., 1984). While only revealing a single site in the HOLO and APO initial PDB structures, FEATURE correctly identified all three Ca2+ binding sites in the HOLO structural ensemble (see Figure 4A). In the APO structural ensemble the Ca2+ binding site at the active site of the molecule and CA282 site were revealed.
In the case of the HOLO structure 1I40, our results showed a dependence on the initial structure with which MD simulation was started. In the 1I40 starting structure, CA306 is an active site cation, which usually binds in the presence of the protein substrate. However, since the substrate is absent, this Ca2+ ion is not in its native position and is instead located 5.3 away from the CA302 binding site, which influences the clustering analysis to lump the two sites into one (see Supplemental Experimental Procedures section 2). The high scoring putative Ca2+ binding site centers obtained with global grid FEATURE scanning spanned two different environments coordinated by different sets of amino acids, having in common only ASP157 (see Supplemental Movies). Since these environments correspond well to the two Ca2+ binding sites, CA302 and CA306, we conclude that our method locates both of those sites. The coordination environment of CA304 in this molecule consists of six water molecules and one oxygen atom from the protein itself. Because water molecules are not reliably present in crystal structures, FEATURE does not consider them in its models, and thus would be unlikely to recognize calcium sites coordinated mostly by water for either HOLO or APO structures. FEATURE does not identify any of the sites in the APO initial PDB structures or the APO structural ensemble.
While both of the expected Ca2+ binding sites present in the 1K96 HOLO structure were readily identified in the initial PDB structure and the structural ensemble, no Ca2+ binding sites were identified in its counterpart APO 1K8U system. However, the site noted as CA91 in the HOLO structure achieved scores close to the model threshold in the APO dynamic ensemble. In order to investigate whether a longer simulation might uncover this site, we continued the 1ns simulation for an additional 9 ns. The longer trajectory contained 4001 structures sampled every 2.5 ps, of which several attained a FEATURE score of more than 50, compatible with Ca2+ binding at the CA91 site, especially around the 8th nanosecond.
On the sequence and structural levels lysozyme and α-lactalbumin exhibit very high similarity, and have been postulated to be evolutionarily related: emerging from a duplication of a single ancestral gene and then diverging in their functions over time (Qasba and Kumar, 1997). As such, it is rather encouraging that FEATURE successfully identified the Ca2+ binding site present in these molecules in the HOLO and APO structural ensembles. Interesting, also, is the fact, that an identical false positive site was identified in all four structural ensembles. In the human lysozyme residues THR52, SER61, ASP67, THR70, and others nearby persistently directed available oxygen atoms towards the false positive site, and could also participate in the binding coordination of a divalent cation. The location of this false positive site was far from the locations of zinc and manganese ions binding sites. The Ca2+ binding model employed by FEATURE for this work correctly did not detect these binding sites.
The purpose of these experiments was to test the hypothesis that short molecular dynamics simulations can improve the performance of function recognition methods. Figure 3 depicts a trace of local grid FEATURE scanning around the CA302 binding site in the 1I40 HOLO structural ensemble. This particular site is easily recognized by FEATURE as a Ca2+ binding site without the help of MD simulations, evidenced by the highest score of ~56 attained by the initial PDB structure at this site. However, these results illustrate clearly that even HOLO crystal structures do not necessarily contain the more favorable local conformations for Ca2+ binding, while MD simulations have the potential to reveal structures which do: over the course of the simulation many structures were generated which scored much higher than the initial structure. Nearly all the sites conformed to this trend. In addition, sites exhibited dynamic behavior: the side chains within the sites changed between coordinated and uncoordinated states. On average, in order for FEATURE to identify the sites in the structural ensembles which were not obvious in the starting structures the side chains coordinating calcium binding moved by 3 (data not shown).
This work demonstrates the value of examining molecular motions in the course of predicting function. In general, FEATURE performed better on the initial HOLO structures than the valence method, and the MD simulations improved the results of both methods for the HOLO structural ensembles. FEATURE outperformed the valence method in predicting Ca2+ binding sites for both the APO starting structures and the MD generated structural ensembles, which also showed marked improvement in FEATURE results. As expected, for both methods, Ca2+ sites in the APO structures were more challenging to recognize than the corresponding sites in the HOLO structures. The presence of the Ca2+ ion in the binding sites of the HOLO structures may influence the topology of the sites, with side-chains adopting favorable conformations more often. In contrast, the APO side-chain conformations are not similarly constrained, and hence may form favorable Ca2+ binding states more transiently.
Both function prediction methods identified sites that were not documented to bind Ca2+ ions, which we label as false positive. Most of these sites differed from the true Ca2+ binding sites in two ways. Firstly, the highest score achieved in the TP clusters tended to be much higher than the highest score observed in the FP clusters. Also, FP clusters tended to consist of one or two putative site centers, while the TP clusters were mostly composed of multiple putative site centers (see Table 3 and Figure 5). Based on the Positive Predictive Value analysis, we required that at least three putative site centers should be present at the site for it to be counted as a TP or a FP result. Only 16 FP results identified by FEATURE survived this filter, out of possible 35 for FEATURE and 29 for the valence method, while only 2 and 10 TP results were lost for FEATURE and valence method, respectively. All the results reported in this work have survived this filter. Examination of the trade-off between the number of obtained TP and FP sites is also possible through varying the score cut-offs for FEATURE and the valence method. Such future investigations may be worthwhile.
The accuracy of the force fields in treating Ca2+ interactions with the protein and solvent may be a limiting factor in our results. A recent study demonstrated that GROMOS force field underestimated by two-fold the attractive interactions between the Ca2+ ions and protein side-chains involved in binding coordination (Project et al., 2007). This same study implicated CHARMM and OPLS-AA force fields in underestimating by two-fold and overestimating by four-fold those same interactions. Improvements in force field parameters could further improve the efficiency of Ca2+ binding recognition when coupling structure-based function prediction methods to MD simulations. With better parameterization, it may be appealing to introduce Ca2+ atoms in the high scoring APO sites in the MD protocol in order to see if these sites continue to exhibit the high scoring conformations in the presence of the cation. In our experiments, several APO Ca2+ binding sites narrowly missed the threshold score of 50: 1DNK CA281, 1MJW CA302 and CA306, and 1K8U CA91 (1 ns simulation). It is possible that if Ca2+ were included in the APO systems in the MD simulations, the ions could enhance further the higher scoring conformations.
Our results, especially for the APO structural ensembles, conform to the preexisting equilibrium hypothesis of ligand-binding, where the molecule exists in an equilibrium of several different conformations. The binding of a ligand to the appropriate conformation shifts the equilibrium towards this particular conformation (Keskin, 2007; Tobi and Bahar, 2005; Tsai et al., 1999; Xu et al., 2008). Therefore, even without the presence of a calcium ion the APO structures may exhibit transient conformations favorable for calcium binding. In fact, with the addition of Ca2+ atoms to the APO sites in the MD simulations protocol we would not expect qualitative differences in the results, other than perhaps an increase in the frequency and persistence of high-scoring conformations.
Currently, MD simulations remain a rather expensive way of generating structural diversity. The protocol used in this work relied on the simplicity of the function of interest. The 1 ns simulations generally proved to be long enough, generating sufficient number of conformations to elicit positive identification of the existing Ca2+ binding sites (it is important to remember that our goal was simply to generate representative conformations—it was not required that the conformations were sampled with a Boltzmann probability). Several improvements can be made to this protocol, including allowing longer times for system equilibration and trying different force fields. However, such steps would be impossible to predict if the work concerned a structure with an unknown function.
Our experiments are not sufficient to provide statistically significant estimates of the improved performance of structure recognition. However, for many structures, we observed (as described above and in the Supplemental Results) conformational dynamics that contributed to the formation of calcium binding pockets. With improvements in molecular dynamics algorithms and implementations on special purpose hardware, it may become possible to run the dynamics for longer time period and to apply the algorithm described here on a large scale to many proteins in the PDB (Friedrichs et al., 2009). This paper reveals the potential value of such an effort at uncovering otherwise obscure binding sites.
The dynamic ensembles generated by the MD simulations improved the performance of two different structure-based function prediction methods. Both function recognition algorithms achieved better performance in identifying Ca2+ binding sites in structural ensembles generated by the MD simulations in 1 ns, than in the initial structures obtained from the PDB. We expect that our methodology can be easily amended to accommodate large systems or those that require significant domain motions for function determination by either generating longer simulations or employing other methods that explore conformational space more rapidly, such as normal modes or replica exchange. Furthermore, while this work employed FEATURE and the valence method, which are simple and efficient to apply in a large scale study to hundreds and thousands of structures, it should be apparent from our results that any other structure annotation algorithm can be accommodated by the methodology outlined in this work. As such, the scheme for recognizing function based on structures summarized in Figure 1 should be applicable to the novel unannotated structures generated by the Structural Genomics Consortia.
The following structures were taken from the PDB for this work: carp parvalbumin β, 1B9A (Cates et al., 1999) and 1B8C (Cates et al., 1999), human grancalcin, 1K94 (Jia et al., 2001) and 1F4Q (Han et al., 2000), bovine DNase I, 3DNI (Oefner and Suck, 1986) and 1DNK (Weston et al., 1992), Escherechia coli inorganic phosphatase, 1I40 (Samygina et al., 2001) and 1MJW (Avaeva et al., 1998), human S100A6, 1K96 (Otterbein et al., 2002) and 1K8U (Otterbein et al., 2002), Naja naja naja phospholipase A2, 1PSH (Fremont et al., 1993) and 1A3D (Segelke et al., 1998), Canavalia ensiformis concanavalin A, 1NLS (Deacon et al., 1997) and 1DQ0 (Bouckaert et al., 2000), bovine α-lactalbumin, 1F6S and 1F6R (Chrysina et al., 2000), human lysozyme, 3LHM and 2LHM (Inaka et al., 1991), Clostridium perfringens alpha-toxin, 1QMD and 1QM6 (Naylor et al., 1999), and Rhodobacter capsulatus porin, 2POR (Weiss and Schulz, 1992) and 3POR (Weiss and Schulz, 1993).
The 24 Ca2+ binging sites which formed the HOLO set were defined as the sites in which Ca2+ ion was present in the original static structures of 1B9A, 1K94, 3DNI, 1I40, 1K96, 1PSH, 1NLS, 1F6S, 3LHM, 1QMD, and 2POR with five exceptions. Only CA302, CA304, and CA304 sites were used for the 1I40 case. The active site of 3DNI and the fourth Ca2+ binding site of 2POR are characterized Ca2+ binding sites (Suck et al., 1984; Weiss and Schulz, 1992), and although no Ca2+ is present at these sites in the original PDB structures, we included them in our dataset. Similarly, the zinc-binding site of 1QMD was included as an expected site (see Supplemental Results). As such, there were 21 occupied and 3 unoccupied Ca2+ binding sites in the HOLO dataset.
By definition, the APO structures do not contain Ca2+ ions. However, since these APO structures directly correspond to the HOLO structures, we expected to observe the Ca2+ binding sites in analogous positions. With the exception of the CA110 binding site of 1B8C, discussed above, 23 out of the 24 sites forming the HOLO dataset are also present in the APO structures, which consisted of 1B8C, 1F4Q, 1DNK, 1MJW, 1K8U, 1A3D, 1DQ0, 1F6R, 2LHM, 1QM6, and 3POR.
Supplemental Experimental Procedures section 1 describes minor modifications made to some of the structures in order to ensure system completeness during the molecular dynamics simulations.
GROMACS is a software suit that allows generation and analysis of molecular dynamics (MD) simulations (Lindahl et al., 2001). Using GROMACS version 3.3.1, we created 1 ns simulations of the eleven pairs of structures in order to generate structural diversity for each protein set. For each of the structures, the system consisted of only one chain and appropriate ions, a 1 nanometer (nm) layer of solvent: simple point charge (SPC) (Berendsen et al., 1981) water molecules, and sodium or chloride ions for neutralization (see Table 2).
Each system, treated with a periodic boundary condition, underwent a 200-step energy minimization using the steepest descent algorithm followed by a 10 picosecond (ps) harmonic position restrained molecular dynamics simulation at 300 K, in which all the protein atoms remained motionless. Electrostatic and Van der Waals interactions and neighborlist cut-offs were set at 1 nm. Separate external temperature baths (Berendsen et al., 1984) of 300 K and coupling constant tauT = 0.1 ps were used for the protein and non-protein components of each system. The 1 ns production run simulation was carried out at constant pressure of 1 bar, kept as such by coupling weakly to pressure baths (Berendsen et al., 1984) with tauP = 0.5 ps, and constant temperature, as above. An integration time step of 2 femtoseconds (fs) was used with the LINCS (Hess et al., 1997) algorithm, constraining all covalent bonds to their equilibrium lengths. The force field used was GROMOS96 (van Gunsteren et al., 1996), 43a1. Generation of these trajectories took between 12 to 36 hours on a single 2.8 GHz processor with 2 G RAM. For each system, a structural ensemble was created by extracting structures generated by the simulation every 2.5 ps.
This work employed FEATURE version 1.9 (Wei and Altman, 2003) and the Ca2+ binding site model built by Wei et al. (Wei and Altman, 1998) for analysis of static starting structures and structural ensembles generated by the MD simulations. A score of 50 or above at a given point in the structural space identified a putative Ca2+ binding site center.
FEATURE scanning was implemented in two ways: global grid and local grid. A global grid of points 1 apart was created to encompass each starting structure and those generated by simulations. At each point, FEATURE algorithm calculated a score using the Ca2+ binding site model. For each structure examined, coordinates of the points which scored at least 50 were retained for clustering analysis described below. When looking at a novel structure, the global grid scanning would be applied as an initial analysis. Putative sites identified by the global grid scanning could then be followed up with the local grid scanning analysis, which allows examination of the behavior of the site of interest in finer detail. In our case, we were only interested in the true Ca2+ binding sites. Therefore, a local 10 × 10 × 10 3 grid of points 1 apart was created to encompass the Ca2+ binding sites of each starting structure and those generated by simulations. For each HOLO – APO pair, the local grids were centered on an equivalent atom, which was identified to be the closest to the Ca2+ atom in the crystallized HOLO structure. FEATURE algorithm calculated a score at each point using the Ca2+ binding site model. For each structure examined, only the information about the highest-scoring point was retained for visualizing behavior of the sites over the course of the simulations (see Figure 3).
Figure 2 illustrates the distributions of all the scores at the global grid points within 7.5 of at least one atom in the structures, which is the radius of the FEATURE Ca2+ binding model, for the starting PDB structures and the respective structural ensembles generated by the MD simulations. We compared the two distributions using the chi-square goodness of fit test: the distribution from the starting structures was considered as theoretical values and the distribution from the structural ensembles was considered as the observed values. This analysis required counts for the calculations, so we normalized the structural ensembles distribution to the number of structures examined in each structural ensemble by reducing the number of counts for each bin by a factor of 401. The two distributions are statistically the same (calculations not shown).
As a negative control, we scanned with a local grid areas that were not near the Ca2+ binding sites in 1B9A, 1K94, and 1K96 HOLO structural ensembles. In each case, an atom was selected to be the grid center by two criteria: the atom density surrounding this atom in a sphere of radius 7.5 and its distance to the surface were comparable to those of the central atoms of the local grids at the true Ca2+ binding sites. The local environments surrounding these points never attained conformations suitable for Ca2+ binding; the highest FEATURE scores never exceeded 30, and had a mean value of −10 for all frames in the simulations tested (data not shown). These results demonstrate that the score threshold of 50 established using static structures is a reasonably stringent cut-off to be applied to FEATURE scanning of structural ensembles generated by the MD simulations.
We used the valence method (Nayal and Di Cera, 1994) with default parameters to scan the starting structures and the ensembles generated by the MD simulations. Valence is the number of electrons shared by an ion during bond-formation, and is generally estimated as the overall charge of the ion (thus, Ca2+ has valence of 2). This method further assumes that atoms exert partial valence at a distance, which decreases as the distance from the ion increases. Thus, in order to locate putative ion binding sites, this method sums partial valencies provided by nearby atoms. In our tests, a point around which partial valencies summed to 1.4 or above was used for predicting Ca2+ binding. A step size of 1 was chosen to allow comparison between valence method and FEATURE results.
Both function-recognition methods applied in this study may identify multiple putative Ca2+ binding site centers in a local region that often represent the same site. Additionally, combining and analyzing global scanning results of FEATURE and the valence method posed a challenge, since the gold standard Ca2+ binding sites (defined by the HOLO structures) are difficult to define once the molecules start moving during the simulations. In general, it is challenging to compare sites in different structures, because they are separated by time and conformational motion. Accordingly, we devised a clustering algorithm that considers the nearest 50 atoms to a point, such as a known Ca2+ site, and uses a paired Wilcoxon rank sum test to assess the similarity of two points in two potentially very different structural conformations. The nearest 50 atoms were chosen because they typically filled a sphere of the volume used in scanning by FEATURE (radius = 7.5 Å).The putative site centers identified by FEATURE or the valence method were clustered in order to define predicted Ca2+ binding sites that were substantially the same across the molecular dynamics trajectories. All results reported and discussed in this manuscript are based on this clustering analysis (see details in Supplemental Experimental Procedures section 2) and a post-clustering filter (see Discussion).
We used Visual Molecular Dynamics (VMD) (Humphrey et al., 1996) in order to visualize structures and trajectories generated by the MD simulations, as well as the results of FEATURE and the valence method scanning. Images of structures were generated using Tachyon Ray Tracer (Frishman and Argos, 1995; Stone, 1998) from within the VMD.
The Supplemental Materials include two movies, additional results and methods.
This animation shows a close-up of the CA302 and CA306 binding sites, which are readily identified by FEATURE in the 1I40 structural ensemble generated by the MD simulations. The yellow spheres represent the putative CA302 binding site centers, while the purple spheres represent the putative CA306 binding site centers. The binding site centers here refer to the centers of the clusters generated by the first clustering step for each frame of the simulation. With the exception of one residue, namely ASP157, the two sites are defined by distinctive coordinating residues. The backbone is shown as ribbon. The coordinating residues are shown as CPK, with oxygens further represented as VDW spheres. Colors represent the following atoms: cyan – carbon; blue – nitrogen; red – oxygen. The carboxylic oxygens on ASP157 are shown somewhat bigger in the metallic red color. At the end, the movie holds for a few seconds, displaying the last structure evaluated from the MD simulations and all the putative Ca2+ binding site centers that were identified in the structural ensemble generated by the MD simulations.
This animation is identical to the previous animation, except that the putative Ca2+ binding site centers that appear either as yellow or as purple spheres leave behind a shadow when they disappear. The shadow builds up over the course of the animation, allowing visualization of the relative placement of the two sites.
This work is supported by NIH LM-05652 and the SIMBIOS National Center for Simulation of Biological Structures, GM-072970. DSG is also supported by the Stanford Genome Training Grant NIH 5 T32 HG00044. We thank Inbal Halperin, members of the Stanford Helix group and the SIMBIOS National Center for useful discussions.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.