For well-structured, rigid proteins, the prediction of rotational tumbling time (τc) using atomic coordinates is reasonably accurate, but is inaccurate for proteins with long unstructured sequences. Under physiological conditions, many proteins contain long disordered segments that play important regulatory roles in fundamental biological events including signal transduction and molecular recognition. Here we describe an ensemble approach to the boundary element method that accurately predicts τc for such proteins by introducing two layers of molecular surfaces whose correlated velocities decay exponentially with distance. Reliable prediction of τc will help to detect intra- and inter-molecular interactions and conformational switches between more-ordered and less-ordered states of the disordered segments. The method has been extensively validated using 12 reference proteins with 14 to 103 disordered residues at the N- and/or C-terminus, and has been successfully employed to explain a set of published results on a system that incorporates a conformational switch.
Accurate prediction of rotational and translational diffusion coefficients of proteins is important for the interpretation of a wide variety of biophysical data. For ideal spherical particles of radius r in solution of viscosity η, the translational (DT) and rotational (DR=1/τc) diffusion coefficients are described by the Stokes-Einstein (SE) relations: DT = kBT/(6πηr) and DR = kBT/(8πηr³). For globular proteins in aqueous solution, the SE relation underestimates τc by a nearly constant factor of two;1 an empirical correction can be applied such that a robust and accurate estimation of τc is made using the molecular weight and the consensus partial specific volume for globular proteins (0.73 cm³ g⁻¹). For globular proteins with arbitrary shape, τc can be predicted from atomic coordinates by two distinct but related methods. In the bead method2-4, a molecule is modeled as a group of beads which covers the molecular surface or fills the volume occupied by the molecule. By contrast, the boundary element method uses a group of triangular patches for modeling the surface5-7 (Figure 1). Despite this difference, both methods calculate various hydrodynamic properties of interest using a similar formalism, which describes the hydrodynamic interaction between beads or surface patches. Although both methods predict the behavior of rigid proteins with reasonable accuracy4,8, they are inaccurate for flexible proteins that have long unstructured sequences; these can be best described by an ensemble approach. Many proteins contain long disordered segments under physiological conditions. Such proteins play important regulatory roles in many fundamental biological events including signal transduction and molecular recognition9,10.
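As a rough numerical illustration of the SE relations for an ideal sphere, the following sketch evaluates DT and DR for a given radius. The conversion τc = 1/(6·DR) is the usual convention for isotropic tumbling and is an assumption here, as are the function names and the example radius, viscosity, and temperature.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def stokes_einstein(radius_m, viscosity_pa_s, temperature_k):
    """Return (D_T in m^2/s, D_R in 1/s) for an ideal sphere."""
    kt = K_B * temperature_k
    d_t = kt / (6.0 * math.pi * viscosity_pa_s * radius_m)
    d_r = kt / (8.0 * math.pi * viscosity_pa_s * radius_m ** 3)
    return d_t, d_r


def tau_c_sphere(radius_m, viscosity_pa_s, temperature_k):
    """Rotational correlation time of a sphere.

    Assumption: tau_c = 1 / (6 * D_R), the common isotropic-tumbling
    convention; the article's eq. 9 is not reproduced in this text.
    """
    _, d_r = stokes_einstein(radius_m, viscosity_pa_s, temperature_k)
    return 1.0 / (6.0 * d_r)


# Illustrative values (assumed): a 2 nm sphere in water near 20 degrees C.
d_t, d_r = stokes_einstein(2.0e-9, 1.002e-3, 293.15)
tau = tau_c_sphere(2.0e-9, 1.002e-3, 293.15)
print(f"D_T = {d_t:.3e} m^2/s, D_R = {d_r:.3e} 1/s, tau_c = {tau * 1e9:.2f} ns")
```

With this convention τc equals ηV/(kBT) for a sphere of volume V, which is why a density-based estimate from molecular weight and partial specific volume is possible.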
Reliable prediction of τc for proteins containing disordered segments will help to interpret τc data from NMR relaxation, fluorescence depolarization, electric birefringence, and dielectric relaxation measurements and to detect intra- and inter-molecular interactions and conformational regulatory switches between ordered and disordered states of the disordered segments. Here we describe a new method that accurately predicts τc for proteins with disordered segments by assuming two layers of molecular surfaces whose correlated velocities decay exponentially with distance. This method was extensively validated with 12 reference proteins that have 14 to 103 disordered residues at the N- and/or C-terminus and has been successfully employed to explain published results on calmodulin and the protein kinase Syk.
The boundary element method for calculating hydrodynamic properties was originally proposed by Youngren and Acrivos5 and recently reviewed by Aragon7. The essence of this method is briefly restated here. Under a stick (no-slip) boundary condition, which is relevant to biomolecules in aqueous solution, the velocity field of flow, v(y) at position y in the fluid, is described by spatial integration of the Oseen hydrodynamic interaction tensor (H) from a particle surface element (dSx) which experiences surface stress force (f(x)) at position x in the fluid (eq. 1). This hydrodynamic interaction is inversely proportional to the distance of interaction (|x−y|) (eq. 2). In eq. 2, I is the 3×3 identity matrix and η is the viscosity of the solution.
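Equations 1 and 2 are not reproduced in this text. Using the symbols defined above, the standard Oseen formalism would give the forms below; this is a reconstruction from the prose and may differ in notation from the typeset originals.

```latex
% Reconstruction of eqs. 1 and 2 from the surrounding definitions (assumed notation)
\begin{align}
  \mathbf{v}(\mathbf{y}) &= \int_{S} \mathbf{H}(\mathbf{x},\mathbf{y}) \cdot \mathbf{f}(\mathbf{x}) \, dS_{x} \tag{1} \\
  \mathbf{H}(\mathbf{x},\mathbf{y}) &= \frac{1}{8\pi\eta\,|\mathbf{x}-\mathbf{y}|}
    \left[ \mathbf{I} + \frac{(\mathbf{x}-\mathbf{y})(\mathbf{x}-\mathbf{y})^{\mathsf{T}}}{|\mathbf{x}-\mathbf{y}|^{2}} \right] \tag{2}
\end{align}
```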
For N finite surface elements, the purely shape-dependent terms, H and dS can be rewritten as a 3N×3N matrix G in three-dimensional Cartesian coordinates (eqs. 3 and 4) and the previously unknown 3N×1 surface stress force matrix, f, can be calculated if the inverted matrix of G and a 3N×1 velocity field matrix, v (eq. 5) are given.
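The discretization can be sketched as follows, assuming one collocation point per surface element at its centroid. The treatment of the singular diagonal blocks (each element replaced by a flat disk of equal area, for which the Oseen integral is a/(8η)·(3I − nnᵀ)) is a simplification chosen for this illustration and is not taken from the article; the function names and the toy octahedral geometry are likewise hypothetical.

```python
import numpy as np


def oseen_tensor(x, y, eta):
    """Oseen tensor H(x, y) for two distinct points in a fluid of viscosity eta."""
    r = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = np.linalg.norm(r)
    return (np.eye(3) + np.outer(r, r) / d ** 2) / (8.0 * np.pi * eta * d)


def assemble_G(centers, areas, normals, eta):
    """Build the 3N x 3N matrix G so that v = G @ f (f = surface stress)."""
    n = len(centers)
    G = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(n):
            if i == j:
                # Assumed self-term: Oseen tensor integrated over a flat disk
                # with the same area as the element, evaluated at its center.
                a = np.sqrt(areas[j] / np.pi)
                nn = np.outer(normals[j], normals[j])
                block = a / (8.0 * eta) * (3.0 * np.eye(3) - nn)
            else:
                # One-point quadrature: tensor at the centroids times the area.
                block = oseen_tensor(centers[j], centers[i], eta) * areas[j]
            G[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
    return G


# Toy geometry (assumed): six small elements on the vertices of an octahedron.
centers = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
normals = centers.copy()          # outward unit normals
areas = np.full(6, 0.1)
G = assemble_G(centers, areas, normals, eta=1.0)

v = np.tile([1.0, 0.0, 0.0], 6)   # every element moves with unit x-velocity
f = np.linalg.solve(G, v)         # surface stresses that produce v
```

Solving the linear system directly, rather than forming the explicit inverse of G, is usually preferable numerically; the explicit inverse is only needed if it is reused for many velocity fields.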
In the case of a rigid particle in which all the surface elements translate and rotate as a single entity, the velocity field matrix can be determined from any given translational (vp) and angular (ωp) velocities of the whole body. The force and torque exerted by translation and rotation of the particle in the fluid are vector sums of the individual forces and torques of the surface elements. They are also directly related to velocities vp and ωp through friction tensors (3×3 matrices), K (eqs. 6 and 7). The rotational (Drr) and translational (Dtt) diffusion tensors (3×3 matrices) are then obtained from these friction tensors at a given temperature (T) (eq. 8). In the 6×6 diffusion matrix on the left-hand side of eq. 8, the blocks Dtt, Drr, and Dtr correspond to the translational effect alone, the rotational effect alone, and the coupling between them arising from the screw-like properties of particles, respectively11. The Drt matrix is assumed equal to the transpose of the Dtr matrix. The same notation is used for the 6×6 matrix of friction tensors on the right-hand side of eq. 8.
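The rigid-body step can be sketched as below, assuming G has already been assembled and that the diffusion matrix follows from the generalized Einstein relation D = kBT·K⁻¹. Equations 6 to 8 are not reproduced in this text, so the block layout [[tt, tr], [rt, rr]] and all function names are assumptions of this illustration.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K


def cross_matrix(x):
    """Matrix [x]x such that [x]x @ w equals the cross product x × w."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])


def rigid_motion_matrix(centers):
    """3N x 6 matrix E mapping (v_p, omega_p) to element velocities."""
    n = len(centers)
    E = np.zeros((3 * n, 6))
    for k, x in enumerate(centers):
        E[3 * k:3 * k + 3, :3] = np.eye(3)
        E[3 * k:3 * k + 3, 3:] = -cross_matrix(x)  # omega × x = -[x]x @ omega
    return E


def friction_and_diffusion(G, centers, areas, temperature_k):
    """Return the 6 x 6 friction matrix K and diffusion matrix D = k_B T K^-1."""
    E = rigid_motion_matrix(centers)
    stress = np.linalg.solve(G, E)                 # stresses for 6 unit motions
    weights = np.repeat(np.asarray(areas, dtype=float), 3)[:, None]
    K = E.T @ (weights * stress)                   # total force and torque
    D = K_B * temperature_k * np.linalg.inv(K)
    return K, D


# Toy check (assumed): six unit-area elements, G set to the identity matrix.
centers = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
K, D = friction_and_diffusion(np.eye(18), centers, np.ones(6), 293.15)
D_rr = D[3:, 3:]                                   # rotational block
```

Multiplying by Eᵀ sums the element forces (first three rows) and the torques x × f (last three rows) in a single step, which is why the same matrix E appears on both sides of K.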
The eigenvectors of Dtt and Drr represent the principal axes of the translational and rotational diffusion tensors, respectively. The rotational correlation time (τc) is derived from the average (Diso) of the three eigenvalues (λ1, λ2, λ3) of Drr (eq. 9).
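A minimal sketch of this last step, assuming the conventional isotropic relation τc = 1/(6·Diso) for eq. 9, which is not reproduced in this text. The function name and the example tensor are illustrative only.

```python
import numpy as np


def tau_c_from_drr(d_rr):
    """Return (tau_c, eigenvalues) for a 3 x 3 rotational diffusion tensor.

    Assumption: tau_c = 1 / (6 * D_iso), with D_iso the mean eigenvalue.
    """
    d_rr = np.asarray(d_rr, dtype=float)
    # Symmetrize first so that small numerical asymmetries are ignored.
    eigenvalues = np.linalg.eigvalsh(0.5 * (d_rr + d_rr.T))
    d_iso = eigenvalues.mean()
    return 1.0 / (6.0 * d_iso), eigenvalues


# Example tensor in 1/s (assumed values for a mildly anisotropic particle).
tau_c, lam = tau_c_from_drr(np.diag([1.0e7, 2.0e7, 3.0e7]))
print(f"tau_c = {tau_c * 1e9:.2f} ns")
```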
Ensemble conformers were generated using a previously described method12 with the following modifications. Since deposited PDB coordinates do not generally include the disordered flexible parts, the coordinates of any flexible parts were modeled by using the molecular modeling toolkit (MMTK)13 while retaining the coordinates of the structured part. Conformational heterogeneity of disordered segments was achieved by rotating backbone and side chain dihedral angles of the modeled template structure. From the junction between the rigid and flexible parts toward the N- or C-terminus, the molecular coordinates were rotated according to a pair of backbone dihedral angles (ϕ–ψ) and side chain dihedral angles (χ1–χ4) which were randomly selected from an amino acid specific dihedral angle library. The ϕ–ψ angle library was built from 500 low homology and high resolution x-ray structures14 (resolutions < 1.8 Å) excluding all residues in α-helices, β-sheets and turns determined by DSSP15. The χ angle library was adopted from a published rotamer library14. Residues immediately preceding proline were treated as an additional amino acid type, due to the restricted local conformation16. Overlap of van der Waals radii was evaluated in each step and the random selection of dihedral angles was repeated for each residue up to 10 times until there was no van der Waals clash between flexible and rigid parts. If clashes remained, the random selection of dihedral angles was restarted from the previous residue. For calmodulin and the protein kinase Syk, which have two domains connected by a disordered region, simultaneous random selections of all flexible dihedral angles and the evaluation of overall van der Waals clash were repeated until the selected conformation caused no van der Waals clash.
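The retry-and-backtrack logic described above can be sketched as a small control loop. The sampler and the clash test are passed in as callbacks because the actual dihedral library and van der Waals check are not given here; `toy_sampler`, `toy_clash`, `TOY_LIBRARY`, and the `max_steps` safeguard are hypothetical stand-ins for illustration.

```python
import random


def build_disordered_chain(n_residues, sample_angles, has_clash,
                           max_tries=10, max_steps=100_000, seed=None):
    """Grow a chain residue by residue, retrying and backtracking on clashes."""
    rng = random.Random(seed)
    chain = []
    steps = 0
    while len(chain) < n_residues:
        placed = False
        for _ in range(max_tries):          # up to 10 random draws per residue
            steps += 1
            if steps > max_steps:
                raise RuntimeError("no clash-free chain found within max_steps")
            candidate = sample_angles(len(chain), rng)
            if not has_clash(chain, candidate):
                chain.append(candidate)
                placed = True
                break
        if not placed and chain:
            chain.pop()                     # restart from the previous residue
    return chain


# Hypothetical stand-ins for the residue-specific library and the clash test.
TOY_LIBRARY = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)]


def toy_sampler(index, rng):
    return rng.choice(TOY_LIBRARY)


def toy_clash(chain, candidate):
    return candidate[0] > 0.0               # pretend positive phi always clashes


conformer = build_disordered_chain(14, toy_sampler, toy_clash, seed=1)
```

In a real implementation the clash callback would rebuild Cartesian coordinates from the dihedral angles and compare van der Waals radii against the rigid part; only the control flow is shown here.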
Triangulated molecular surfaces were generated by the MSMS program17 using 1.1 Å inflated van der Waals radii to account for the uniform hydration thickness of the protein molecules in aqueous solution8. In excess of 10,000 triangulated surfaces were calculated (as defined in .vert and .face output files of the MSMS program) and converted to 700-1200 surface elements by the COARSEN utility included in the GNU triangulated surface (GTS) library. The boundary of the rigid segment was determined by noting the steady decrease of [1H]-15N NOE and 15N R2 relaxation rates from the plateau values of the rigid segment. The experimental rotational correlation times (τcexp), determined from 15N R2/R1 ratios, were taken from the literature12,18-27.
All calculations were performed using an in-house C program BE2 which reads the two layers of molecular surfaces, evaluates their correlated velocity factors with given γ and ε parameters (eq. 11), and feeds them into the standard boundary element method calculations to get rotational and translational diffusion tensors for flexible proteins. Numerical routines were adopted from the GNU scientific library (GSL). The implementation of the standard boundary element method in the program was extensively tested with fully rigid proteins (Table S1). The predicted tumbling times are in excellent agreement with the experimental data (Figure S1). For the calculation of flexible proteins, the number of instantaneous or flexible surface elements was 700 to 1200 and that of rigid surface elements was about 10,000, since computational cost mainly depends on the number of instantaneous boundary elements. To minimize possible artifacts due to boundary element modeling and coarsening procedures, extrapolation to an infinite number of surface elements was performed using multiple calculations with varying numbers of surface elements. There was no significant difference between the extrapolation from six calculations using 700, 800, 900, 1000, 1100, and 1200 surface elements and a single calculation using 800 surface elements (Table 2; Table S2 and Figure S2, method a-ext). The results from a single calculation using 800 surface elements are therefore presented in Table 1. In order to find optimal γ and ε parameters for a general prediction, absolute deviations of predicted tumbling times (τcpred) from experimental data (τcexp) were averaged over 1,000 ensemble structures (N=1,000) and 12 reference proteins (M=12) for each combination of γ and ε parameters (eqs. 10 and 11). A γ and ε set with the minimum mean deviation was chosen as the optimal set of γ and ε parameters.
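The parameter search described above amounts to a grid search over γ and ε. A minimal sketch follows, assuming the predictions are stored in an array indexed as (γ, ε, protein, conformer); the array layout, the function names, and the synthetic data are assumptions made for this illustration and do not reproduce Table 2.

```python
import numpy as np


def mean_abs_deviation(tau_pred, tau_exp):
    """Average |tau_pred - tau_exp| over M proteins and N conformers.

    tau_pred has shape (n_gamma, n_epsilon, M, N); tau_exp has shape (M,).
    """
    deviation = np.abs(tau_pred - tau_exp[None, None, :, None])
    return deviation.mean(axis=(2, 3))


def best_parameters(gammas, epsilons, tau_pred, tau_exp):
    """Return (gamma, epsilon, deviation) at the minimum mean deviation."""
    mad = mean_abs_deviation(tau_pred, tau_exp)
    i, j = np.unravel_index(np.argmin(mad), mad.shape)
    return gammas[i], epsilons[j], mad[i, j]


# Synthetic data (assumed): predictions are exact at gamma = 4, epsilon = 20.
gammas = np.array([0.0, 4.0, 8.0])
epsilons = np.array([16.0, 20.0, 24.0])
tau_exp = np.array([10.0, 15.0])                      # M = 2 proteins, in ns
bias = (np.abs(gammas[:, None] - 4.0) + np.abs(epsilons[None, :] - 20.0)) / 4.0
tau_pred = (tau_exp[None, None, :, None] + bias[:, :, None, None]
            + np.zeros((1, 1, 1, 5)))                 # N = 5 conformers

gamma_opt, epsilon_opt, deviation = best_parameters(gammas, epsilons,
                                                    tau_pred, tau_exp)
```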
In the classical boundary element method for rigid particles, all the surface elements translate and rotate as a single entity and the velocity field matrix (v) can be determined immediately if the translational (vp) and angular (ωp) velocities of the whole body are given (eqs. 1-5). However, in the case of flexible molecules, surface patches related to the unfolded chain do not translate and rotate synchronously with other rigid patches. Consequently, the velocity field matrix is not directly related to the velocities vp and ωp of the whole molecule, which makes it impossible in practice to derive definite values for DR (=1/τc) and DT.
To resolve this problem, a sub-boundary (“rigid surface”) is introduced inside the real instantaneous molecular boundary (Figure 2), such that the rigid surface encloses the minimal set of atoms common to all of the heterogeneous ensemble structures. It is then hypothesized that the velocity of an instantaneous surface patch (v(yk)) is correlated with that of the closest rigid surface patch (v(y'k)) by some factor which decays with the distance (δ) between the two surface patches (eq. 11 and Figure 2, distance a). That is, disordered parts far from the rigid surface would be less likely to translate and rotate synchronously with the rigid part (Figure 1). When used as δ in eq. 11, the distance to the closest rigid surface element (Figure 2, distance a in Å) provides a more accurate prediction of τc than alternatives such as a spatial distance (Figure 2, distance b in Å) or residue number distance (Figure 2, distance c in residues) to the beginning of the flexible part (Table S2 and Figure S2). For the calculations of the distances b and c, each surface element is correlated with a specific residue based on the atomic coordinates in the relevant member of the disordered ensemble. The conformational space accessible to the disordered part, especially close to the rigid surface, will be constrained not only by the degrees of freedom of dihedral angles or chain length but also by the shape of the molecular surface around the beginning of the flexible part. The distance to the closest surface element of the rigid portion of the protein is thus more appropriate for describing the velocity correlation.
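Equation 11 is not reproduced in this text, so the exact functional form is not known here. The sketch below assumes one form that is consistent with the description: full correlation within a margin γ of the rigid surface and exponential decay with characteristic length ε beyond it. The formula and the function names should be read as hypothetical placeholders for the authors' expression.

```python
import numpy as np


def nearest_rigid_distance(flex_centers, rigid_centers):
    """Distance (and index) from each flexible element to its closest rigid element."""
    diff = flex_centers[:, None, :] - rigid_centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    idx = dist.argmin(axis=1)
    return dist[np.arange(len(flex_centers)), idx], idx


def correlation_factor(delta, gamma, epsilon):
    """Assumed form: exp(-max(delta - gamma, 0) / epsilon)."""
    return np.exp(-np.maximum(delta - gamma, 0.0) / epsilon)


# Toy coordinates in angstroms (assumed).
rigid = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
flexible = np.array([[0.0, 0.0, 3.0], [10.0, 0.0, 26.0]])

delta, closest = nearest_rigid_distance(flexible, rigid)
factor = correlation_factor(delta, gamma=6.0, epsilon=20.0)
# Each flexible element would then move with factor * (velocity of its
# closest rigid element) when the velocity field matrix v is built.
```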
Since our major interest is in the tumbling time of the rigid part in the presence of disordered segments, the ensemble of disordered segments that affect the tumbling of the rigid part can be treated as a mean viscous medium surrounding the rigid part, by analogy with the effective medium theory in polymer dynamics28. In the absence of the disordered segments, the velocity field decreases in inverse proportion to the viscosity of the solution and the distance of interaction (|x−y|), as described by the Oseen equation (eq. 2). On the other hand, in the presence of the disordered segments, the additional effective viscosity caused by the dragging force of the disordered ensemble disturbs the velocity field, resulting in an effect similar to the screening of the hydrodynamic interaction in a concentrated polymer solution. The velocity field decreases rapidly and the hydrodynamic interaction between two points separated by a distance that is larger than a certain characteristic length termed the hydrodynamic screening length28 becomes negligible.
In our description of the perturbation of the velocity field by the disordered ensemble (eq. 11), ε is a modulator of the decay rate for velocity correlation and γ is a generous distance margin that in effect expands the given surface boundary and therefore compensates for the ambiguity in the definition of the rigid surface. More importantly, γ also accounts for the differences in the nature of the hydration between the rigid and disordered parts.
The classical boundary element method provides a unified description of specific volume, translational and rotational diffusion coefficients with a single universal hydration parameter for globular proteins7,8. Numerical calculations using beads and boundary element modeling assume a solvent excluded molecular surface surrounded by a uniform thickness of hydration. For both methods, an essentially identical consensus thickness of the hydration layer (≈1.2 Å in the bead modeling study29 and 1.1 ± 0.1 Å in the boundary element modeling study8) yields hydrodynamic properties in excellent agreement with experiment. Our implementation of the boundary element method7 by using 700 to 1200 triangulated surface elements derived from atomic coordinates of crystallographic or NMR structures and a single uniform hydration layer thickness of 1.1 Å also gives excellent agreement with experimental data from 35 fully folded globular proteins whose experimental τc range from 2 ns to 125 ns (Table S1 and Figure S1).
On the other hand, the hydration thickness for an unfolded disordered protein chain is not as well understood in detail as for globular proteins. Since the molecular surface of the disordered protein chain is largely exposed to solvent, interactions with surrounding water molecules should be different for an unfolded chain and for a folded globular protein domain. Recent NMR relaxation studies have revealed that the hydration of an unfolded protein is significantly higher than that of a globular protein of similar size30,31. The activation energy for the dynamics of the most strongly bound part of the hydration layer is about 50% larger for an unstructured protein than for a globular protein30. Therefore, the effective hydration layer for the unfolded chain will be thicker than for the folded part, and the general assumption of a uniform hydration layer should be corrected for the unfolded chain. In our method, the thicker hydration layer for flexible boundary elements may be considered in the calculation of velocity field correlation by the γ parameter, which effectively reduces the distance between rigid and flexible surface elements.
Applying our assumption of two surfaces and velocity correlation to the boundary element method, tumbling times were calculated with arrays of γ and ε values for 12 reference proteins which have 14 to 103 disordered residues at the N- and/or C-terminus, each represented by 1,000 ensemble structures. The average deviation from the experimental tumbling time (|τcpred−τcexp|) for 1,000 ensemble structures was represented by a color code for each protein (Figure 3). Blue regions display good agreement between prediction (τcpred) and experiment (τcexp). A wide range of γ and ε parameters results in good agreement for proteins with relatively short disordered segments. In contrast, proteins with long disordered segments require a more restricted range of γ and ε parameters for the same degree of agreement. This is consistent with the expected behavior of proteins with short disordered segments, which should be more similar to rigid proteins than those with long disordered segments: any combination of γ and ε will produce a good result in the limit of a fully rigid protein. Surprisingly, there exists a consensus range of γ and ε parameters [γ:4-8 Å, ε:20-24 Å] that gives good predictions for all proteins tested. The consensus range was obtained by averaging the absolute deviation (|τcpred−τcexp|) at each combination of γ and ε parameters over the 1,000 ensemble structures and 12 reference proteins (Table 2). Although γ and ε are not completely independent, a non-zero value of γ is necessary for minimum mean deviation from the experimental data. Since the main purpose of the parameter γ is to correct the hydration thickness of disordered and rigid surface boundaries in the calculation of the velocity correlation, γ should be interpreted only within a reasonable range. The τcpred obtained by using the consensus set of γ and ε parameters and an ensemble average of 1,000 structures gives |τcpred−τcexp| ≈ 0.6 ns (Table 2), which is sufficiently accurate for most purposes.
That a consensus is obtained in this case indicates that the relation of the correlated velocities between the disordered chain and the rigid surfaces of any given member of ensemble is essentially the same for all proteins tested. More importantly, the average stochastic influence of the disordered segments on the overall rotational tumbling of the rigid part can be described in a systematic way, regardless of the amino acid sequence or length of the disordered segments.
In contrast to our method, application of the empirical SE relation1 to proteins with disordered terminal segments always underestimates τc (Figure 4). This is because of the general preference of disordered segments for extended conformations; proteins containing disordered segments occupy a larger effective volume and therefore have lower density than rigid globular proteins of equivalent molecular weight. Since the SE relation is density-based, it is biased in this case. By contrast, application of the conventional boundary element method, where each ensemble structure is assumed to be rigid, always overestimates the τc because the effective volume of the molecule is smaller than expected due to the flexibility of the disordered chain. In addition, the resulting distribution of tumbling is substantially broader (blue curves in Figure 5), and the maximum of the distribution occurs at significantly greater τc values than predicted by our method or measured experimentally.
The conformation of the disordered segment in our method is represented by a set of ensemble structures generated by applying backbone (ϕ–ψ) and side chain dihedral angles (χ1–χ4) randomly selected from an amino acid specific coil library, rejecting conformations that exhibit steric clashes. Sets of ensemble structures generated from this coil library have provided good representations of the unfolded state of proteins in the interpretation of residual dipolar coupling (RDC) and small angle X-ray scattering (SAXS) data12. In that work, ensemble structures were generated by adding consecutive peptide planes and tetrahedral junctions in the inverse direction to the protein sequence, starting from the terminus of the folded domain and progressing to the terminus of the flexible domain. Apart from this minor difference in implementation of the coil library, the sampled conformational space or conformational preference for disordered segments should be essentially the same in the published version12 as obtained by our ensemble generation method.
The mean value for conformation-dependent tumbling times can be obtained by an ensemble average according to a probability distribution of the conformations. In the disordered state of proteins, the free energy is dominantly affected by chain elasticity, excluded volume or steric exclusion and solvent interaction32. Since all ensemble structures generated in our method are supposed to have the same bond lengths, bond angles and excluded volume, the internal energy and thus probability for each ensemble structure can be assumed equal, and the conversion from one conformation to another might be expected to be faster than tumbling because of the zero or low energy barrier between conformations. In principle, the tumbling times in different conformations are averaged by two mechanisms, depending on the rate of interconversion between substates. If the interconversion is faster than the time scale of tumbling, all molecules would experience a similar mean tumbling time, even while averaging was occurring at an individual molecule level. On the other hand, if the conversion is slower than the time scale of tumbling, molecules would experience different tumbling times; in this case, the ensemble average would be applicable. When interconversion is rapid, which is the most likely case for a disordered ensemble, the averaging is performed over the time for an individual molecule, but it will be the same as the averaging performed over ensemble conformations if the duration of measurement or observation is long enough that a system accesses all the substates with equal probability. The various methods for measuring the rotational tumbling time, such as NMR relaxation, electric birefringence, fluorescence depolarization, or dielectric relaxation, differ in their sensitivity to motions on different time scales. 
For example, NMR relaxation measurements (R1 and R2) are relatively insensitive to motions slower than the tumbling time, as indicated by long time scale molecular dynamics33. However, the bias in the comparison between the ensemble average and the NMR measurement should be negligible since the chance of interconversion on the time scale of tumbling would be very small. Furthermore, the calculated tumbling times for ensemble conformations are sufficiently homogeneous that the possible difficulties arising from the ensemble average under the influence of an ambiguous interconversion rate may be ignored (Figures 5 and 6). The distribution of tumbling times calculated for each ensemble structure is narrow, with its maximum occurrence found at the ensemble averaged mean tumbling time (Figure 5). This is consistent with the tumbling of the rigid portion of the protein being dominated by the flexible surface elements near the rigid boundary, which, unlike boundary elements located further away from the interface, do not change their structure much between individual conformers in the ensemble of the flexible portion. In accordance with the narrow distribution of calculated tumbling times, a relatively small number of ensemble structures (~1,000) is sufficient to achieve practical convergence (Figure 6). Global parameters of a disordered ensemble, such as tumbling time, size, and shape, appear to require a smaller number of ensemble structures (1,000-2,000) for convergence than do local parameters such as bond vector orientations (measured by residual dipolar couplings), which require up to 50,000 ensemble structures for convergence12.
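One simple way to monitor the kind of convergence discussed here is to track the running ensemble mean of τc as conformers are added. The sketch below is an illustration only: the convergence criterion, the tolerance, and the sample values are assumptions and are not the procedure behind Figure 6.

```python
import numpy as np


def running_mean(values):
    """Ensemble mean after 1, 2, ..., n conformers."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)


def converged_at(values, tol):
    """Smallest ensemble size after which the running mean stays within
    tol of the final mean (assumed convergence criterion)."""
    rm = running_mean(values)
    outside = np.nonzero(np.abs(rm - rm[-1]) > tol)[0]
    return 1 if outside.size == 0 else int(outside[-1]) + 2


# Toy tumbling times in ns for six conformers (assumed values).
tau_values = [10.0, 12.0, 11.0, 11.0, 11.0, 11.0]
means = running_mean(tau_values)
n_needed = converged_at(tau_values, tol=0.5)
```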
The validation described above was carried out for proteins with N- and/or C-terminal disordered segments. However, the relation of velocity fields between the rigid and the instantaneous surfaces (eq. 11) should also be valid for multi-domain proteins connected by a disordered linker because no assumption was made on the conformation of the disordered chains to describe the relation: the domains can be regarded as special conformations formed by the disordered segments. In an ensemble for such a protein, each domain would retain its fold but overall conformation would fluctuate due to the heterogeneous conformations of the linker. For each ensemble structure, the τc of a specific domain can be calculated by defining the rigid surface to enclose the domain of interest and the instantaneous surface to enclose the whole molecule. Once the rigid and instantaneous surfaces are defined, the subsequent boundary element calculations are identical to those for the proteins with terminally disordered segments.
Two domains connected by a disordered linker affect the tumbling of each other. However, this coupling of tumbling becomes negligible as the length of the disordered linker increases beyond a certain length. The velocity field calculations adopting the residue number distance to the beginning of the flexible part as δ (Figure 2, distance c; eq. 11) show that two velocity fields separated by about 20 and 45 disordered residues would be correlated to about 36% (≈1/e) and 10%, respectively (Table S2 and Figure S2). In the disordered segment, each residue is at a different distance from the beginning of the flexible part and its contribution to the tumbling of the rigid part is additive, but diminishes as the distance increases. For example, τc of the rigid globular portion of the Syrian hamster prion protein is 7.4 ns (calculated) and the addition of 44 disordered residues increases the τc by 6.2 ns (measured), but addition of a further 59 disordered residues only increases the τc by another 2.4 ns (measured) (Table 1). Although the residue number distance (Figure 2, distance c) is not the best predictor for τc, two domains connected by a disordered interdomain linker of about 45 or more residues may be regarded as decoupled within the approximate limit of projected error (1.1 ns for the method using the distance c).
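The two quoted correlation percentages can be checked for mutual consistency. The arithmetic below assumes a pure exponential in the residue-number distance c with no distance margin; the actual parameters used for distance c are in Table S2 and are not reproduced here. The second line simply accumulates the prion protein values quoted above.

```latex
% Consistency check under an assumed form exp(-c / \varepsilon_c), no margin
\begin{align}
  e^{-20/\varepsilon_c} = e^{-1} \;&\Rightarrow\; \varepsilon_c = 20 \text{ residues},
  \qquad e^{-45/20} = e^{-2.25} \approx 0.105 \approx 10\% \\
  \tau_c:\quad 7.4 \text{ ns} \;&\xrightarrow{\;+44 \text{ residues}\;}\; 7.4 + 6.2 = 13.6 \text{ ns}
  \;\xrightarrow{\;+59 \text{ residues}\;}\; 13.6 + 2.4 = 16.0 \text{ ns}
\end{align}
```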
To validate the application of the method to a protein containing rigid domains separated by a flexible region, we performed calculations for calmodulin (CaM). CaM plays a critical role in Ca2+-dependent signaling pathways and has two globular Ca2+ binding domains connected by a flexible linker36,37. The flexibility of the linker is essential for CaM to rearrange the Ca2+ binding domains and bind tightly to numerous target proteins. NMR relaxation data for the Ca2+-ligated Drosophila CaM showed that the tumbling times for the N- and C-domains are 7.1 and 6.3 ns at 35°C, respectively; at 20°C the corresponding correlation times would be 10.4 and 9.2 ns for the N- and C-domains, respectively37. On the basis of the crystal structure of the Ca2+-ligated Drosophila CaM (PDB id: 4CLN)38, the calculated tumbling times for the full-length protein (residues 1-148; dumbbell-like overall conformation), and the isolated N-domain (residues 4-76) and C-domain (residues 82-147) are 14.1, 4.9, and 4.3 ns at 20°C, respectively. The measured tumbling times are significantly shorter than the tumbling time expected for the dumbbell-like conformation, but are much longer than expected for the isolated domains, suggesting that the tumbling of the N- and C-domains is partly decoupled due to the flexibility of the interdomain linker. However, it remains unclear whether the measured tumbling times are explained by complete or partial disorder of the linker. We therefore applied our method to estimate the τc of the CaM, assuming that the linker is completely disordered. An ensemble for the full-length CaM was generated with residues 1-3, 77-81, and 148 being completely disordered while retaining the N- and C-domain folds as in the crystal structure (number of ensemble structures 1,000). The boundary element calculations were performed separately for the N- and C-domains using the consensus parameter set [γ:4-8 Å, ε:20-24 Å]. 
For instance, the rigid surface was defined to enclose the N-domain when the τc is calculated for the N-domain. The tumbling times for the N- and C-domains calculated using our approach (10.1 and 9.5 ns at 20°C, respectively) are in excellent agreement with the experimental data, suggesting that the interdomain linker is completely disordered in the Ca2+-ligated form of CaM.
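The conversion of the measured correlation times from 35°C to 20°C can be reproduced by assuming that τc scales with η/T, as it does in the SE relation. The water viscosities used below are approximate handbook values and are an assumption of this sketch, as is the function name; the article does not state which values were used.

```python
def scale_tau_c(tau_c, eta_from, temp_from_k, eta_to, temp_to_k):
    """Rescale tau_c between conditions, assuming tau_c is proportional to eta / T."""
    return tau_c * (eta_to / temp_to_k) / (eta_from / temp_from_k)


# Approximate viscosities of water in Pa*s (assumed handbook values).
ETA_20C = 1.0016e-3
ETA_35C = 0.7191e-3

# Measured values at 35 degrees C from the text: 7.1 ns (N-domain), 6.3 ns (C-domain).
tau_n_20 = scale_tau_c(7.1, ETA_35C, 308.15, ETA_20C, 293.15)
tau_c_20 = scale_tau_c(6.3, ETA_35C, 308.15, ETA_20C, 293.15)
print(f"N-domain: {tau_n_20:.1f} ns, C-domain: {tau_c_20:.1f} ns at 20 C")
```

Under these assumptions the scaled values agree with the 10.4 and 9.2 ns quoted in the text to within rounding.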
As an example of the utility of our prediction method, we describe its application to obtain insights into the structural basis for an order-to-disorder regulatory switch in the protein tyrosine kinase Syk34. The Syk kinase plays an important role in signaling through the antigen B cell receptor, binding the receptor through two tandem SH2 domains connected by a folded 45 amino acid interdomain A. Upon phosphorylation of Tyr-130 in the interdomain A, Syk dissociates from the receptor. On the basis of changes in the rotational tumbling times of the two SH2 domains in wild-type Syk (τc=19.2 ns, derived from NMR relaxation measurements) and in a Tyr130Glu mutant that mimics the phosphorylated form (τc=12.1 ns), Zhang et al.34 suggested decoupling of the two SH2 domains in the phosphorylated Syk by a mechanism involving disordering of interdomain A. Predicted values of τc for the isolated SH2 domains (7.5 and 7.6 ns for the N- and C-terminal SH2 domains, respectively, estimated using a beads approximation) provided qualitative support for the mechanistic model34. However, since the predicted and measured τc values differ so significantly, it is unclear whether the decreased rotational tumbling time in the Glu130 mutant results from complete or partial disordering of Syk. We therefore applied our method to estimate rotational correlation times in this system. Following the same procedures used for the 12 reference proteins, three protein models were constructed based on the crystal structure of Syk35 (PDB id: 1A81): (i) full-length Syk containing both SH2 domains and the folded interdomain A (residues 2-262; rigid residues 11-260; number of ensemble structures 1,000); (ii) the N-terminal SH2 domain plus interdomain A (residues 2-159; rigid residues 11-114; number of ensemble structures 1,000); and (iii) interdomain A plus the C-terminal SH2 domain (residues 115-262; rigid residues 160-260; number of ensemble structures 1,000).
In the full-length Syk (model (i)), interdomain A (residues 115-159) was assumed to be ordered and globular, while in models (ii) and (iii) the interdomain A was assumed to be completely disordered, leading to an assumption that the two SH2 domains are decoupled. The boundary element calculations were performed using the consensus parameter set [γ:4-8 Å, ε:20-24 Å]. The resulting τc for the full-length Syk, and the N- and C-terminal SH2 domains with disordered interdomain A are 20.4 ns, 12.6 ns, and 11.3 ns, respectively. Similar boundary element calculations were performed with a full-length Syk ensemble in which the interdomain A is completely disordered while both SH2 domain conformations are retained (number of ensemble structures 1,000), so as to verify whether the 45 disordered residues of the interdomain A are enough to decouple the two domains. The resulting τc for the N- and C-terminal SH2 domains are 13.3 ns and 13.0 ns, respectively, which are close to the values based on the decoupling assumption made in the models (ii) and (iii), implying that the two SH2 domains are almost, if not completely, decoupled. These values are in excellent agreement with the experimental data within the projected error of prediction (≈ 0.6 ns), which strongly supports the suggestion that the interdomain A behaves like a disordered chain and that the two SH2 domains are decoupled in the Tyr-130 phosphorylated state34.
An ensemble approach to the boundary element method has been applied to the problem of molecular tumbling of proteins with long disordered segments. In the presence of disordered segments, the tumbling of the folded, rigid portion becomes slower than it would be for this part alone. The dragging effect can be described in a systematic way regardless of the sequence or length of the disordered segments, using the assumption of two surface boundaries and the specific relation between them. In general, the presence of disordered residues farther than a defined cutoff distance from the rigid part has a negligible effect on the tumbling of the rigid domain, although the disordered regions do contribute to the heterogeneity of the ensemble conformations. These observations explain the significant discrepancies between experimental data and values calculated based on the assumption that each ensemble conformation is rigid. Extension of the classical boundary element method or bead modeling into the problem of a disordered ensemble is largely limited by poor understanding of the motional coupling between the rigid and disordered segments and the hydrodynamic nature of the disordered chain. Nevertheless, the tumbling times predicted by our approach are in excellent agreement with experiment. This prediction method is therefore sufficiently accurate that it can be utilized to identify important biological events such as inter- and/or intra-molecular interactions between unfolded and folded regions of a protein and to probe the mechanistic basis of order-disorder switches.
We thank Dr. Phineus Markwick and Prof. Harold A. Scheraga for insightful discussions and suggestions. This work was supported by grants CA96865 and AG21601 from the National Institutes of Health and by the Skaggs Institute for Chemical Biology. SHB was supported by a fellowship from the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2006-352-C00036).
Supporting Information Available
Supporting information is available for testing of our implementation of classical boundary element method with fully rigid proteins (Table S1 and Figure S1) and for comparison between alternative calculations of velocity correlations (Table S2 and Figure S2). This information is available free of charge via the Internet at http://pubs.acs.org/.