|Home | About | Journals | Submit | Contact Us | Français|
T-Analyst is a user-friendly computer program for analyzing trajectories from molecular modeling. Instead of using Cartesian coordinates for protein conformational analysis, T-Analyst is based on internal bond-angle-torsion coordinates in which internal torsion angle movements, such as side-chain rotations, can be easily detected. The program computes entropy and automatically detects and corrects angle periodicity to produce accurate rotameric states of dihedrals. It also clusters multiple conformations and detects dihedral rotations that contribute hinge-like motions. Correlated motions between selected dihedrals can also be observed from the correlation map. T-Analyst focuses on showing changes in protein flexibility between different states and selecting representative protein conformations for molecular docking studies. The program is provided with instructions and full source code in Perl.
The conformational dynamics of proteins play important roles in their functions and regulating ligand binding. A fundamental appreciation of how proteins work requires study of conformations and dynamics, as well as changes between states of protein motions, such as folded/unfolded and ligand-bound/-free states. For example, protein allosteric effects may be related by either or both conformational and dynamical changes [1–3]. Molecular dynamics (MD) simulations provide powerful tools for the exploration of the conformational energy landscape accessible to protein molecules because multiple conformations are difficult to probe experimentally [4–6]. Moreover, recent computer-aided drug discovery studies have focused on protein flexibility in molecular docking processes [7–9]. Since most docking software prefers proteins to be rigid to avoid intensive computational effort, a promising strategy is to dock ligands into protein conformation ensembles obtained from MD simulations [10, 11]. Several programs provide general tools or special modules for analysis of MD results and clustering conformations, but most are based on Cartesian coordinates [12–17]. T-Analyst uses internal bond-angle-torsion (BAT) coordinates, which are efficient in capturing side-chain rotamers and most low-frequency motions [18–20]. Our program provides useful tool to analyze MD trajectories. For example, users can easily view proper rotameric states of dihedrals from the output files instead of plotting and correcting them manually. This program allows for efficient analysis of MD simulations to study protein flexibility and extract structural information for virtual screening.
T-Analyst reads NAMD, Amber or CHARMM trajectory files and Amber topology file. The CHARMM-type topology files can be converted to Amber topology files easily with the freely available CHAMBER program . The program implements amber2accent to transform Cartesian coordinates to BAT coordinates . To eliminate repeats, only heavy-atom side-chain torsion angles and ω, and ψ angles in backbones are considered. Users can choose angles and residues for analysis, and a dihedral distribution and its rotations during a simulation are output as .agr-format files, which can be viewed with Xmgr/Grace. Moreover, results generated by T-Analyst can be visualized by freely available packages such as VMD, Xmgr/Grace or R [12, 23, 24]. Users can output a series of files, such as distributions of all or selected torsion angles in a protein, or sorted and/or unsorted standard deviations and entropy. T-Analyst also groups different conformations based on rotameric states of residues of interest and outputs coordinates of grouped conformations into different trajectory files. The program also computes pairwise cross-correlation coefficients for all pairwise dihedral angles that users selected. By examining the output correlation map, dihedrals that correlate with each other can be observed.
To correctly reflect rotameric states and capture the angle periodicity, T-analyst detects the angle population in margins, such as ±180° and 360°/0°, depending on how users output the angles. If discontinuity in a dihedral distribution is detected (see Fig. 1a, c), the output angle range will be corrected; thus, energy wells can be illustrated and defined properly (Fig. 1b, d). For example, Fig. 1c illustrates a rotameric state of side-chain Ile 153 of ligand-bound tryptophan synthase (TRPS), whereby one energy well is split into two wells near −180° and +180°. T-Analyst automatically detects the peak discontinuity at each edge of −180° and +180° and determines the range of each discontinued peak. Later, the whole population will be divided into left set (−180° to 0°) and right set (0° to +180°) and be counted in each set. T-Analyst angle correction is then applied by moving the discontinued peak on the set with smaller population to the set with larger population. For example, in Fig. 1c, the set from −180° to 0° has the smaller population, so the peak near the −180° margin was shifted by +360° and merged with the other margin at +180°.
T-Analyst calculates configurational entropy, Sconf, for each torsional degree of freedom by the Gibbs entropy formula: TSconf(i) = −RT Σ Pi ln(Pi), where Pi is the probability distribution of angle i, R is the gas constant and T is the absolute temperature. T-Analyst calls Xmgr/Grace to generate histogram for each degree of freedom. The bin size for each Pi is 0.5° for ω angle, 1° for and ψ angles and 5° for side-chain dihedrals. The value of TSconf(i) has unit of kcal/mol, which allows for direct comparison with energy calculations. Summing TSconf(i) provides a quick approximation of the entropic contribution of a system, although coupling between torsions is ignored .
T-Analyst also computes the quasi-harmonic approximation (QH) from BAT coordinates. The covariance matrix C can be computed, with the probability distribution functions approximated by a multidimensional Gaussian distribution function . The configurational entropy from QH is computed by TSQH = 1/2 nRT + 1/2 RT ln[(2π)n det(C)], where n is the number of torsions. Although QH assumes that the probability distribution function is Gaussian, which is accurate for torsions that have only one rotameric state, TSQH provides an upper bound limit for the configurational entropy [25, 27]. Moreover, the off-diagonal elements of the covariance matrix indicate the degree of significance of the coupling between the given torsions. Entropy computed from only the diagonal elements of the covariance matrix, TSQH_diag, is also computed. If TSQH equals TSQH_diag, then there is no coupling among these torsions.
The extent to which pairs of dihedrals are correlated with one another can be assessed by examining the magnitude of their cross-correlation coefficients. T-Analyst computes a correlation matrix of dihedrals and calls levelplot function in R to plot a correlation map. Users can select dihedral angles, e.g. backbone and ψ angles of selected residues, to plot a correlation matrix. Typical characteristics of a correlation map include a line of strong cross-correlation along the diagonal (where matrix element i = j), and off-diagonal cross-correlations. The high diagonal values are set to 1.00. Off-diagonal correlations can be either positive or negative, and non-zero values may indicate potentially interesting correlations between two close proximity or non-contiguous regions of a protein system.
Our program clusters protein conformations on the basis of user-selected rotameric states of residues. Although RMSD-based clustering methods are mostly applied to group conformations with significant differences, small fluctuations are challenging to detect with classical RMSD-based clustering methods. This module is particularly useful for choosing representative conformations based on side-chain rotations of key residues. Users can input specific torsions with rotameric states of interest and the range of each torsion to run T-Cluster, the second part of T-Analyst. T-Analyst will provide all the combinations of groups for further analysis. The program is sensitive to dihedral rotations and can efficiently group user picked backbone or side-chain dihedrals into separate trajectory files. A report file is also generated to record information about each group.
Molecular dynamics simulations on ligand-free and ligand-bound TRPS were performed using the NAMD package . Standard simulation procedures (e.g. () were followed using the Amber 10 package and ff03 Amber force field and general Amber force field [6, 29]. Initial coordinates of TRPS were taken from PDB code 2J9X and 3CEP [30, 31]. Briefly, after preparation of the system by sequential steps of energy minimization and equilibration, the 30 ns production runs were carried out at 298 K and 1 atm. The systems were solvated by a 12 Å TIP3P water box. Snapshots of the atomic coordinates were recorded every 1 ps. As T-Analyst does not require too many frames to run the analysis, snapshots were saved every 20 ps for T-Analyst and 1,500 frames were used. Molecular dynamics simulations on HIV-1 protease were initiated from crystallographic coordinates with a semi-open flap conformation (PDB code 1HHP) . Amber ff99 force field was used for the protein. Aqueous solvation was modeled implicitly by using the Generalized Born approach  and temperature was maintained at 298 K by using Langevin dynamics. Standard simulation procedures (e.g. ) were followed with the Amber 9 package. Since the free protease predominantly populates the semi-open conformation, we took a 1.5 ns MD simulation and saved it as one 1,500 frame trajectory which had one flap open state for our analysis.
Analysis of small backbone fluctuations, as well as conformational changes, involves investigating loop or side-chain motions. T-Analyst adopts torsion angle analysis, which allows for accurate expression of bond rotations. For example, Fig. 1a shows a rotameric state of side-chain Ile 153 of ligand-bound TRPS. Standard deviations of this torsion before and after angle correction are 136.4° and 49.4°, respectively. Large differences in standard deviations usually indicate changes in rotameric states. Of note, proper angle correction is necessary for computing accurate rotamers and their standard deviations.
With careful superposition, flexible regions in proteins can be revealed by computing root mean square deviation and fluctuation (RMSD and RMSF, respectively) in Cartesian coordinates . Sometimes one or a few residues may serve as “hinges,” which mainly control movements of a flexible region. While the RMSD/RMSF method may not be sensitive enough to detect hinges, T-Analyst can illustrate torsion angles in hinges, which are associated with protein motions. User can compare multiple torsion distributions with protein motion to detect the hinge region. Figure 2a shows MD simulations of HIV-1 protease with open flaps, and the distance between flap tips is in Fig. 2c. Highly flexible regions such as flaps and elbows are shown in the RMSF plot (Fig. 2b) but determining whether any residues behave as hinges is not straightforward . Data from T-Analyst imply that the flap β-hairpins move relatively rigidly, but some residues of the elbow, such as Gly 40, act as hinges (Fig. 2d).
One common method to express protein flexibility is by showing their rotameric states computed from corrected torsion angle distributions. When comparing the rotameric states for proteins between different states, e.g. ligand-bound/-free states, folded/unfolded states, users can have valuable information regarding the protein conformational or flexibility changes between different states. Different ligand mechanisms, such as induced fit or conformational selection (population shift) can be studied [1, 36] .
Here we use TRPS, a protein with strong allosteric regulation of substrate binding, as an example. In both ligand-free and ligand-bound TRPS, the major peak of Glu 49 remains for both states to keep its favorite rotameric state (Fig. 3a, e), which is often assumed . However, instead of dropping the minor peak, the peak changed while the substrate binds. We found that the change may be due to different hydrogen-bonds formations. In the ligand-bound state, Glu 49 form stable hydrogen-bonds with ligand 3-indole-d-glycerol-3′-phosphate (IGP), Tyr 173 and Val 23, and a residue near Glu 49, Tyr 175, also forms a hydrogen-bond with IGP. These interactions stabilize the local environment for the ligand which is necessary for catalysis. In the ligand-free state, since there is no ligand to interact with the protein in the binding site, more free space is allowed for side-chains to move around, though less stable hydrogen-bonds between side-chains can still formed. Without the ligand, the hydrogen-bond between Glu 49 and Tyr 173 is absent and Tyr 173 flips away from the binding site. Further detailed analysis shows that the distance between Tyr173:OH and Glu49:CD changes from 3.6 ± 0.16 Å in ligand-bound state (red dashed lines in Fig. 4a) to 17.5 ± 0.63 Å in ligand-free state (red dashed lines in Fig. 4b). As a result, instead of completely diminishing one rotameric state of Glu 49, the interaction induces a new rotameric state to alter the local environment to accommodate the ligand. Similarly, the major peak of residue Ile 153 in Fig. 3b, f remains in the same position in both states, but it is more populated in the ligand-bound state. Upon IGP binding, the methyl group of Ile 153 interacts with the non-polar region of the ligand and stays in one peak. Therefore, the population of the rotameric states is shifted.
Docking potential ligands to target protein is one of the key steps in drug design and discovery process. During ligand–protein docking process, the position of side-chains in binding site can affect ligand–protein interaction significantly and directly, especially when using rigid protein conformations. Here we use the program to group conformations and then select representative conformations for performing docking studies to the α-subunit of TRPS. TRPS is a potential antibiotic target, and two torsion angles, Glu 49 and Ile 153, are known to directly involve in ligand–protein binding (see Fig. 3). Figure 3a, b show the distribution of side-chain torsion angles in both residues in ligand-free state which is output by T-Analyst after angle correction. Two populations are shown in each of the torsion angle distributions. T-Analyst can group the trajectories into four groups which are characterized by—group 1 (Glu 49-a, Ile 153-a), group 2 (Glu 49-a, Ile 153-b), group 3 (Glu 49-b, Ile 153-a), group 4 (Glu 49-b, Ile 153-b). Figure 3c, g show the RMSD distribution of Glu 49 and Ile 153 with simulation time. Figure 3d, h are the group distributions for the two torsion angles. Notably, although different protein conformations are usually clustered based on computed RMSD, the value is less sensitive to small scale conformational changes, such as side-chain rotations in a protein’s active site. As illustrated in Fig. 3c, g, there is no clue from RMSD that may be used for grouping. In contrast, T-Analyst also provides information regarding jumping between groups during a course of MD simulation, see Fig. 3d, h.
Protein allosteric effects or post-translational modifications such as phosporylation do not always involve substantial conformational changes. Recent experiments confirmed that in some cases, visual inspection of the active/inactive states may not reveal differences in the shape of the ligand binding site, but changes in protein dynamics . Therefore, the magnitude of configuration entropy computed from dihedral degrees of freedom provides a direct way to examine protein flexibility.
The total entropy of a molecule in solution can be separated into two parts: solvent entropy associated with water motion and configurational entropy (Sconf) associated with molecular motion. The latter can be used to represent protein flexibility. Therefore, quantifying the configurational entropy, especially changes, could help explain important biochemical processes such as protein folding and ligand–protein binding. Figure 5 shows Sconf calculated from selected backbone and side-chain dihedrals in the binding site of TRPS. T-Analyst suggested that the flexibility of a loop region changes significantly when the ligand binds into the pocket.
The calculations can provide quantitative information regarding entropy gain or loss. For example, in Fig. 5b, the side-chain entropies of Tyr 175 and Thr 183 drop notably, around 0.7 and 0.6 kcal/mol, respectively. In the ligand-bound state, Tyr 175 forms a stable hydrogen-bond with ligand IGP which stabilized the phenol ring, and Thr 183 also forms hydrogen-bonds with Asp 60 and Gly 61. In ligand-free state, both side-chains move more freely as there is no stable hydrogen-bond formed.
Although in most cases, one can observe entropy decrease upon ligand binding, there may be some exceptions which are of particular interest and worth further analysis. The entropy changes computed by T-Analyst provide guidance for users to pick up regions of a protein to do detail dynamic analysis. For example, T-Analyst showed an entropy gain in the second side-chain of Asp 60 in the presence of ligand, though not largely. Based on the information, we carried out further investigation near Asp 60, and found that two oxygen atoms in carboxyl group can form hydrogen-bonds with Tyr 102 and Thr 183 in both ligand-bound and ligand-free states. Interestingly, in the ligand-bound state, the two oxygen atoms of Asp 60 can flip very often but retain two stable hydrogen-bonds with Tyr 102 and Thr 183 alternatively. The presence of IGP provides a more hydrophilic environment around the carboxyl group of Asp 60; thus, the oxygen atoms can flip more freely. As a result, the local entropy increases without losing hydrogen-bonding and reducing electrostatic attraction. In contrast, when IGP is absent, the local environment around Asp 60 is mainly hydrophobic. Therefore, Asp 60 forms hydrogen-bond with Thr 183, the other hydrophilic residue, and the residues become less flexible.
T-Analyst also performs cross-correlation analysis of a trajectory. The resulting cross-correlation map allows the identification of the correlated and anti-correlated motions involved in an entire protein or user selected dihedrals. Similar to calculating standard deviations, dihedral angles used to generate a correlation plot also need to run angle correction, or discontinuities in margins (±180° or 360°/0°) can cause errors when computing their cross-correlation coefficients.
Here we use HIV-1 protease as a test case, which continues to be one of the primary targets of anti-AIDS drug discovery. Understanding the dynamics of free HIV-1 protease has profound implications for designing new therapeutic agents, such as allosteric inhibitors. Figure 6a shows the structure anatomy of the protein. Yellow boxes in the correlation map, computed by the backbone and ψ angles with T-Analyst (see Fig. 6b), show the highly correlated regions within the HIV protease monomer. For example, boxes I, IV, VIII and X indicate flexible and concerted motions within fulcrum, flap, cantilever and loop regions. Other boxes highlighted in Fig. 6b indicate the correlations between flap elbow, flap tip, cantilever, fulcrum and loop regions. It has been suggested that motions of the flap tip and elbow are highly correlated, which are illustrated in our plot as well . Interestingly, correlations between the flap elbow and cantilever are identified (Box VI), and the motions of flap tip and cantilever are also correlated (Box VII), even though not quite strongly. We compared our plot to another correlation map generated by the use of the backbone Cα and N atoms in the Cartesian coordinates with the Bio3D package (Fig. 6c) . Both plots suggested similar correlations. However, because of the nature of Cartesian coordinates, patterns of correlations are harder to distinguish in Fig. 6c.
MD simulations provide invaluable conformational and dynamical landmarks useful for designing new experiments and for theoretical studies. The current analysis method we describe, T-Analyst, can help exam protein motion, identify structural and dynamic features, reveal changes of flexibility in different states, and group conformations based on dihedral rotamers. Analyzing the growing MD data may be the most time-consuming step in simulation studies, and our program facilitates this work. The program can be freely downloaded from http://research.chem.ucr.edu/groups/chang/tools.htm.
We thank Dr. Thomas Steinbrecher for helping with amber2accent.pl. This work was supported by start-up funds and the Regents Faculty Fellowship from the University of California, Riverside, and the National Science Foundation (MCB-0919586).
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.