|Home | About | Journals | Submit | Contact Us | Français|
Drug-resistant mutations (DRMs) in HIV-1 protease are a major challenge to antiretroviral therapy. Protease-substrate interactions that are determined to be critical for native selectivity could serve as robust targets for drug design that are immune to DRMs. In order to identify the structural mechanisms of selectivity, we developed a peptide docking algorithm to predict the atomic structure of protease-substrate complexes and applied it to a large and diverse set of cleavable and non-cleavable peptides. Cleavable peptides showed significantly lower energies of interaction than non-cleavable peptides with six protease active-site residues playing the most significant role in discrimination. Surprisingly, all six residues correspond to sequence positions associated with drug resistance mutations, demonstrating that the very residues that are responsible for native substrate specificity in HIV-1 protease are altered during its evolution to drug resistance, suggesting that drug resistance and substrate selectivity may share common mechanisms.
Human immunodeficiency virus type 1 (HIV-1) protease is responsible for the processing of Gag and Pol polyproteins, making it critical for viral assembly and maturation and an important target for antiretroviral therapies. The ten Federal Drug Administration-approved protease inhibitors are substrate mimics that resulted from structure-based drug design efforts of the pharmaceutical industry. The efficacy of these drugs is significantly hampered by the emergence of drug resistant mutations (DRMs) in HIV-1 protease that are thought to preferentially alter drug binding over substrate binding to the protease. Understanding the structural mechanisms of molecular recognition between HIV-1 protease and its substrates will be critical to the development of a second generation of protease inhibitors in the treatment of HIV-1 infection.
HIV-1 protease is an aspartyl protease dimer that processes the Gag and Pol polyproteins at 10 cleavage sites (hereafter referred to as endogenous substrates) by recognizing an 8-residue stretch surrounding the cleavage site. By convention, these substrate residues are denoted P4-P3-P2-P1-P1'-P2'-P3'-P4', where the scissile bond lies between the P1 and P1' residues. Substrate specificity of HIV-1 protease has been studied extensively; large databases of cleavable and non-cleavable peptides have been assembled (Chattopadhyay et al., 1992; Hellen et al., 1989; Oswald and von der Helm, 1991; Riviere et al., 1991; Tomasselli et al., 1993; Tomaszek et al., 1992; Tozser et al., 1991) and the sequences for the ten endogenous substrates have been identified (Debouck et al., 1987). Sequence-based methods including artificial neural networks and support vector machines have been developed to discriminate between cleavable and non-cleavable peptides (Kim et al., 2008; Kontijevskis et al., 2007b; You et al., 2005). From these sequence-based methods in conjunction with experimental studies (Bagossi et al., 2005; Eizert et al., 2008; Ridky et al., 1996), a complex picture of HIV-1 protease selectivity has emerged. With the exception of two relatively common sequence motifs at P1-P1’, aromatic-proline (Aro-P) and hydrophobic-hydrophobic (H-H)(Beck et al., 2002), there are few salient features in the sequences of cleavable peptides and a high degree of interdependence of various peptide residues (Kontijevskis et al., 2007b). In an impressive study by Kontijevskis et al. (Kontijevskis et al., 2007a), a statistical model developed from a large database of cleavable and non-cleavable peptides for nine different retroviral proteases identified a number of physico-chemical relationships between peptide and protease residues that accurately define and predict cleavability. Ultimately, purely sequence-based methods, can, at best, implicate, but not explicitly model, the underlying structural and energetic mechanisms of substrate selectivity that are essential for drug design.
The structural details of protease-substrate interactions have been characterized through crystallization of HIV-1 protease in complex with various substrates (Prabu-Jeyabalan et al., 2000, 2002; Tie et al., 2005). Prabu-Jeyabalan et al. crystallized six of the ten endogenous substrates in complex with a de-activated HIV-1 protease and proposed the substrate envelope hypothesis to explain HIV-1 protease selectivity (Prabu-Jeyabalan et al., 2000, 2002). They observed that all six substrate peptides conformed to a common volume within the protease active site despite significant diversity in their sequences and theorized that substrate selectivity is determined primarily by whether a given peptide sequence is able adopt a low-energy conformation that fits within this volume, or substrate envelope. This hypothesis was evaluated in the context of HIV-1 protease inhibitors and it was found that the inhibitors also conform to the substrate envelope. More interestingly, the areas of the active site where the inhibitor protruded from the envelope, and consequently formed non-substrate-like interactions with the protease, were adjacent to DRM residue positions (Chellappan et al., 2007a; King et al., 2004). Subsequent design of small molecules that fit exclusively within the substrate envelope led to tight binding inhibitors that showed low to moderate tolerance of drug resistant mutations (Altman et al., 2008a; Chellappan et al., 2007b; Surleraux et al., 2005).
Despite the prevalence of sequence-based methods modeling substrate discrimination, and the apparent success of the substrate envelope hypothesis in inhibitor design, there is a dearth of structure-based methods for modeling HIV-1 protease selectivity. Kurt et al. used a coarse-grained sequence threading approach with an empirical potential function to successfully discriminate binders from non-binders in a small set of 16 peptides and identified peptide internal conformational energy as an important discriminating factor (Kurt et al., 2003). Ozer et al. used a similar coarse-grained approach to test binding of a very large set of random sequences and demonstrated that some sequence motifs in endogenous substrates are near-optimal for binding (Ozer et al., 2006). In both these cases, the lack of atomic resolution in both the structural model and potential function limit the conclusions that can be drawn about the structural mechanisms of selectivity. Wang & Kollman used molecular dynamics methods to study the differences between substrate and inhibitor binding (Wang and Kollman, 2001). In peptide design, Altmen et al. successfully designed tighter-binding single and double mutants from the substrate peptide RT-RH using a atomic-resolution computational design algorithm but did not address the issue of selectivity (Altman et al., 2008b). Finally, none of these previous studies, bioinformatic or structure-based, have systematically explored the role of protease active-site residues in selectivity, which is vital given that some of these residues are frequently mutated in drug resistance viral strains.
The present study focuses on developing an atomic-resolution structural model of protease specificity through computational peptide docking and identifying the underlying mechanisms of substrate specificity by calculating the free energy contributions of each protease and peptide residue to the binding of cleavable and non-cleavable peptides. Active-site residue interactions that are determined to be essential for native substrate selectivity could serve as robust targets for drug design because of their central role in protease function. Finally, an atomic-resolution structural model will enable us to explicitly test the substrate envelope hypothesis in the context of substrate selectivity. Given the promising results of drug design methods implicitly based on this hypothesis, any additional insight into the substrate envelope hypothesis may yield new avenues for HIV drug research.
Accurate structure prediction of the protease-peptide complex from a given peptide sequence is critical to subsequent energy calculations that model specificity. A comparison of the six previously crystallized substrates shows that there are small, but significant, differences in their peptide backbone conformations (Prabu-Jeyabalan et al., 2002). In order to accommodate these backbone conformation changes, we developed a novel flexible peptide docking algorithm within RosettaDock (Chaudhury and Gray, 2008; Gray et al., 2003). and compared it to fixed backbone/side-chain packing methods traditionally used to study peptide/protein design and specificity (Altman et al., 2008b; Humphris and Kortemme, 2007; Sammond et al., 2007). We validated these methods by comparing the accuracy of the structure predictions on the six endogenous substrates for each crystal structures have been determined (Prabu-Jeyabalan et al., 2002).
Table 1 compares the accuracy of the peptide docking algorithm and the fixed-backbone/side-chain packing method in recovering the crystal structure of the protease-substrate complex as a measure of the fraction of native residue-residue contacts recovered (fnat) and rms of interface residues (Irms). The binding energy of the protease-peptide complex (ΔEbinding), as determined by RosettaDock, is also listed (Wang et al., 2007). The peptide docking predicted all six HIV-1 protease-substrate crystal structures to within 1.1 Å rmsd with at least 68% of native contacts recovered. The fixed-backbone method performed marginally, but consistently, less accurately than peptide docking in fnat and Irms. More importantly, the peptide docking structures showed significantly more favorable binding energies than fixed-backbone structures due to a greater number of hydrogen bonds and Van der Waals contacts. In a number of cases (CA-p2, p2-NC, p1–p6), key interactions noted by Prabu-Jeyablan et al. (Prabu-Jeyabalan et al., 2002), such as the side-chain hydrogen bonds between P2' and the protease D30 or an intra-molecular hydrogen bond between P2 and P1', are absent in the fixed-backbone structures but are recovered in peptide docking. These results demonstrate that small backbone movements are necessary to recover key protease-substrate interactions.
The structural models generated from peptide docking are compared to their respective crystal structures in Fig. 2, for the representative cases of RT-RH, MA-CA, and RH-IN. Overall, the peptide conformation of residues P3-P3' is accurately recovered for both the backbone and side-chains. These residues are the most conformationally invariant among known protease-substrate structures (Prabu-Jeyabalan et al., 2002), most likely because they are deeply buried within the substrate-binding site and thus the most highly restricted. P4 and P4' are less accurately recovered, which can be attributed to more conformational freedom in these positions, which has been observed experimentally. In a number of crystal structures, either the P4 or P4' residues is partially disordered or has high thermal factors (Prabu-Jeyabalan et al., 2002). The only readily observable feature in protease-substrate crystal structures that is absent in the predicted structures is the alternate conformation of the P3–P4 backbone in the substrates RH-IN (Fig. 2C) and p1–p6 (Prabu-Jeyabalan et al., 2002).
Following the accurate prediction of the protease-substrate complexes for the six known crystallized structures, the test set was expanded to 69 known cleavable peptides and 43 non-cleavable peptides, including 4 known endogenous substrates (NC-p1, TF-PR, PR-RT, AutoP) for which no crystal structures have been determined. We used the flexible peptide docking algorithm to generate structural models for each of the 112 peptides for subsequent energy calculations.
For each structural model, the individual residue energies and total energy of the protease-peptide complex was calculated. Fig. 3A presents a histogram of the total energies for all 112 substrates and shows a separation between the energies of cleavable peptides and non-cleavable peptides, with some overlap. Over 40% of cleavable peptides have lower energy than all but ~3% of non-cleavable peptides, while over 35% of non-cleavable peptides have higher energy than all but ~5% of cleavable peptide. A Welch’s t test confirms that there is a significant difference in the distribution of the total energy between cleavable and non-cleavable peptides (p < 10−6) suggesting that structural prediction with the peptide docking algorithm followed by energy calculations using the RosettaDock energy function accurately captures, to some degree, HIV-1 protease selectivity. A similar trend is seen with just the peptide energies alone (Fig. 3B), at a lower p value (p < 10−4), indicating that protease residue energies are improving overall discrimination.
In order to further validate that our model was capturing the energetics of substrate specificity in HIV-1 protease, we sorted the protease-peptide complexes by energy, determined the relative probabilities of finding certain amino-acid types at each peptide position from P3-P3' between high-energy and low-energy peptides, and compared the results with previous experimental and bioinformatics studies. Amino acids were categorized into the following overlapping sets: small, hydrophobic, aromatic, β-branched, charged, and polar. Table 2 lists the log of the relative probabilities (Plow_energy/Phigh_energy) for each subsite, a minimum threshold of ±0.25 was set, outside of which a trend was deemed significant.
Overall there is large agreement with the amino acid type trends from our model with those previously determined for HIV-1 protease. At P1 and P1', the single most favorable feature is aromatic residues, followed by hydrophobic residues. These residue types are typically the most prominent feature in bioinformatics analyses of cleavable peptide sequences (Kontijevskis et al., 2007b; You et al., 2005) and experimental mutations of cleavage sites (Eizert et al., 2008), and they constitute 17 of 20 P1/P1' positions among endogenous substrates (Prabu-Jeyabalan et al., 2002). P2, and to a lesser degree P2’, shows a strong preference for β-branched residues, which has been demonstrated both clinically (Ho et al., 2008) and experimentally (Altman et al., 2008b; Dauber et al., 2002; Prabu-Jeyabalan et al., 2004) and reflected in the endogenous substrates where 8 of 20 P2/P2' positions have β-branched residues. P3 is notable for favoring charged and polar amino acids in our model, and corresponds with the fact that it is the most charge/polar friendly peptide residue among the endogenous sequences. P3 is also notable among endogenous substrates as the only non P1/P1' position to contain an aromatic residue, which agrees with its favorable score for aromatic residues in the model.
Among the features disfavored at the respective substrate positions, the most prominent is that of β-branched groups at P1 and P1', charged groups at any position between P2-P1', and aromatic residues at P2 and P2'. These trends agree well with rules for identifying non-cleavable peptides identified by Kontijevskis et al. (Kontijevskis et al., 2007b). The most significant discrepancy between the residue-type trends observed in the model and experimental data is the disfavoring of polar residues at P2 in the model, where experimentally, the polar Asn is favored in this position (Bagossi et al., 2005). Overall, the agreement of the purely structure-based model of specificity with the trends observed in both endogenous substrates and in bioinformatics analyses of cleavable and non-cleavable peptides suggests that the model accurately captures substrate specificity.
Since the total energy, taken as the sum of the individual residue energies for the peptide and active-site protease residues, successfully discriminates cleavable peptides and also recovers key sequence features for cleavability, we next examined whether any individual residues in the complex were predominately responsible for the observed selectivity. Towards that end, we applied a Welch’s t test on each residue energy and classified residues as ‘specificity-determining residues’ if the mean difference in energies between cleavable and non-cleavable peptides was at least 0.2 kT and the p-value was at least p < 0.05.
Table 3 lists the specificity-determining residues in the protease and peptide and their respective p-values. Among the 34 active-site protease active site residues, five were determined to be significant at p < 0.01 (D30', I47', L76, V82, I84'), and an additional sixth was found to be significant at p < 0.05 (G48'). Among the eight peptide residues, three were found to be significant at p < 0.01 (P1-P2'), and an additional fifth was found to be significant at p < 0.05 (P3'). Taking the sum of energies of the specificity-determining residues and applying a Welch’s t test shows a difference between cleavable and non-cleavable peptides at p < 10−11 while the sum of energies from all other residues shows a difference at p < 0.01, indicating that the statistical approach was successful in isolating the residues primarily responsible for substrate selectivity in this model.
Fig. 4 illustrates the spatial position of the specificity-determining residues in the protease active-site. The protease specificity-determining residues largely cluster around the P2-P2' subsites, indicating the importance of P2-P2' in substrate specificity. V82 and I84' are found in both the P1'/P1 and P2/P2' subsites, respectively, while L76 is in the P2 subsite and I47' is in the P2' subsite. The high significance of peptide residues P1-P2' in substrate discrimination confirms this observation within the model. G48' is the only protease residue not found in the P2-P2' subsites, and it shows a relatively weaker significance in the P3' subsite, along with P3'. Overall, the relative significance of peptide subsites (P1/P1' > P2' > P3') agrees with previous bioinformatics (Kontijevskis et al., 2007b) and experimental studies (Bagossi et al., 2005; Eizert et al., 2008).
The goal of this study was to identify specificity determinants that were critical to native substrate selectivity to serve as potential targets for drug design, but, surprisingly, all six of the protease specificity-determining residues were in sequence positions that are clinically associated with drug resistance mutations (Rhee et al., 2003). Fig. 3C and 3D show histograms for the energy taken as the sum of active-site residues associated with DRMs (L23, D30, L32, G48, I47, I50, L76, V82, I84 for both chains in the protease) and compares it with the sum of non-DRM protease residues for cleavable and non-cleavable peptides. The DRM-associated residues are much stronger discriminators of cleavable peptides (p < 10−3) than non-DRM residues (p < 0.01). These findings demonstrate that the very residues most responsible for substrate selectivity are the ones that are mutated in drug-resistance, suggesting that HIV-1 alters its protease substrate specificity when evolving drug resistance.
The structural mechanism of substrate discrimination for these specificity-determining residues was primarily through steric interactions within the protease-substrate complex. Fig. 5 shows a histogram of the individual residue energies accompanied by the structures of representative cleavable and non-cleavable peptides for four of the six specificity-determining residues. In the case of I47' (Fig. 5A), the primarily energetic component of discrimination was EVdW and EDun (both at p < 0.01) due to steric interactions with the P2' side chain. Likewise, in L76 (not shown), the most significant discrimination was also in the EVdW and EDun components (p < 0.05 and p < 0.01, respectively) due to interactions with the side chains from P2 and P4. In I84' (Fig. 5B), discrimination is due primarily to EDun (p < 0.05), as steric interactions from the side-chain atoms of P1 and P2' force the I84' side chain to adopt unfavorable conformations. In many cases these steric clashes were from the Cγ methyl group in β-branched side chains at P1. A more complex mechanism emerges for V82 (Fig. 5C), where EDun is the most responsible for discrimination (p < 0.01). Here, side-chain atoms of P1' and P2 force the I84 into side-chain conformations that lead to steric clashes with the V82 side chain, forcing it to adopt unfavorable conformations. These latter two examples demonstrate a network of steric interactions between P1-P2' and P1'-P2 subsites, that spans more than 10 Å, from the Cα of P2/P2' to the Cβ of V82/V82’.
In two specificity-determining residues, D30' and G48', hydrogen bonding and solvation energy played a significant role in discrimination. In G48', the sum of the hydrogen bonding energy and solvation energy (EHbond+Esolv) was highly significant (p < 0.001). In cleavable peptides, the backbone oxygen of G48' forms a hydrogen bond with the backbone amide of P3'. By contrast, non-cleavable peptides are often characterized by the presence of an unsatisfied backbone hydrogen bond that is not exposed to solvent at this position. In D30' (Fig. 5D), EVdW, EDun, Esolv, and EHbond all play a significant role in discrimination (p < 0.05). Discrimination at this position is a result of two factors: steric interactions between D30 with P2' and P4' side chains and hydrogen bonding of the D30' carboxyl group with either the solvent, the protease (through R8 and N88’), or the peptide (through P2' or P4').
Among the substrate residues, residue energies for P1, P1', P2', P3' were found to be significantly increased for non-cleavable peptides over cleavable peptide. For P1, P1' and P2' this increase was primarily the result of increased EVdW and EDun, as non-cleavable peptides had residues that exhibited steric clashes and/or adopted unfavorable side-chain conformations. For P1, there was an additional effect of EHbond+Esolv energy terms (p < 0.001), that favored residues with low solvation energy, such as hydrophobic residues, as well as uncharged residues that can form side-chain or backbone hydrogen bonds with carbonyl oxygen of G27. In our data set, Asn was the only P1 residue that could reliably form this hydrogen bond, perhaps explaining why it is the only polar residue at P1 among the endogenous substrates (Prabu-Jeyabalan et al., 2002). At P3', there is an effect of EHbond (p < 0.05) that is due to backbone-backbone hydrogen bond formation with G48'. Finally, at P2', among the 75 peptides with polar residues at this position, there was an additional effect of EHbond+Esolv energy terms (p < 0.01), due to hydrogen bond formation with either the side-chain amide of K45’ (in the case of Glu) or side-chain carboxyl and backbone amide of D30' (in the case of Gln). Hydrogen bonding between D30' and the P2' side chain has been observed in a number of protease-substrate crystal structures (Tie et al., 2005) and demonstrated to be important for specificity (Lin et al., 2003).
The substrate envelope hypothesis states that the protease selects cleavable substrates on the basis of their ability to fit within a given consensus volume (Prabu-Jeyabalan et al., 2002). Although this hypothesis was evaluated in the context of protease inhibitors (Chellappan et al., 2007a), it has never before been evaluated in the context of substrate peptides. In evaluating this hypothesis, we look at two factors: first, whether it is necessary for a cleavable peptide to fit within the substrate envelope and second, whether being able to fit inside this volume is sufficient to determine cleavability.
We defined a substrate envelope consisting of the 70% consensus volume (King et al., 2004) from the structural models for the ten endogenous substrates, as the volume of all heavy atoms within the Van der Waals radius of another heavy atom for at least 7 other substrate peptides. Fig. 6 illustrates the degree to which cleavable and non-cleavable peptides protruded from the substrate envelope at subsites P2-P2'. Overall, more heavy atoms (Natoms) protrude from the substrate envelope for non-cleavable than cleavable peptides (p < 10−5), but there were a substantial number of non-cleavable peptides that showed minimal protrusion. Approximately 90% of cleavable peptides had at most two protruding heavy atoms between P2-P2', compared to 40% of non-cleavable peptides. These results suggest that while staying within the substrate envelope is necessary for the cleavability of a peptide, it is not sufficient to determine cleavability.
Since the substrate envelop hypothesis is based on an active-site volume that is governed by steric and conformational energies, we tested whether the degree of protrusion from the substrate envelop of a given peptide was related to the energy of the protease-peptide complex. Fig. 6B shows a correlation (R2 = 0.37) between the total energy of the complex and the number of protruding heavy atoms, demonstrating that the substrate selectivity described by the substrate envelope hypothesis is largely related to the energies modeled in our study.
The spatial positions of the protease specificity-determining residues are immediately adjacent to regions of the substrate envelope with a large number of protrusions in non-cleavable peptides (Fig. 6A), suggesting these residues play an important role in maintaining the appropriate shape of the substrate envelope for native substrate recognition. This spatial relationship bears a striking resemblance to the spatial position of DRM-associated residues in the substrate envelope with regards to inhibitor interactions (Chellappan et al., 2007a; King et al., 2004). In both cases, these residues are immediately adjacent to regions of substrate peptides and drug molecules that fall outside of the envelope. Additionally, every residue identified in the aforementioned studies as active-site DRM residues are identified in the present study as a specificity-determining residue.
Finally, we attempted to capture the promiscuity of HIV-1 protease by defining a ‘binding envelope’ which is based on the 70% consensus volume of all cleavable peptide structure in the data set and compare it to the substrate envelope. Fig. 7 illustrates the binding envelope along with the substrate envelope and the inhibitor volume, defined as the Van der Waals volume of all heavy atoms in inhibitors Nelfinavir, Saquinavir, Indinavir, Ritonavir, Amprenavir, Lopinavir, and Atazanavir. A comparison of these three volumes reveals three significant characteristics. First, the binding envelope is substantially larger than the substrate envelope, indicating significantly greater structural diversity among all binders than among the ten endogenous substrates. Second, as previous studies have noted (Chellappan et al., 2007a; King et al., 2004), there are substantial portions of the inhibitor volume that fall outside of the substrate envelope, especially around the regions of DRM residues I50, V82, I84, D30. Finally, the regions of the inhibitor volume that fall outside of the substrate envelope are largely contained within the binding envelope. This is especially clear in the aforementioned DRMs.
Although binding energy models and the substrate envelope hypothesis are largely accurate in defining the cleavability of a given peptide sequence, they are, to some degree, insufficient, because the activity of HIV-1 protease to a particular substrate is a function of both the catalytic rate (kcat) and dissociation constant (Km). While differences in the energies of protease-substrate interactions may affect both kcat and Km, kcat is additionally affected by other features that are critical to enzymatic catalysis, such as active-site bond geometry, that may be essential to defining substrate specificity.
In the reaction mechanism of an aspartyl protease, one catalytic aspartate (D25) acts as a general base, deprotonating a water molecule that subsequently hydrates the carbonyl group about the scissile peptide bond to form a tetrahedral intermediate, and the second aspartate acts (D25') as a general acid, facilitating peptide-bond cleavage. We measured the geometry of peptide carbonyl group with respect to the carboxyl groups of the catalytic aspartates as the angle between the C-O bond vector of P1 and Oδ2-Oδ2 vector between D25 and D25' (θactive-site). This angle reflects the orientation of the bond geometry of the carbonyl carbon of P1 with respect to the catalytic aspartates that catalyze its hydration. In all ten endogenous substrates and approximately 90% of cleavable peptides, θactive-site ranged from 120°–150°, compared to approximately 50% of non-cleavable peptides (Fig. S1).
We observed a significant sub-population of non-cleavable peptides (~30%) that had a θactive-site between 60° to 100°. Fig. 8B illustrates the active-site geometry for two representative cases with θactive-site at 90° and 140°. The carbonyl oxygen at θactive-site of 90° occludes the proposed position of the catalytic water molecule (Silva et al., 1996), while the same oxygen at θactive-site of 140° provides ample space for the water as well as the appropriate geometry for hydration. Furthermore, among the two most prominent sequence motifs at the P1-P1' positions for cleavable peptides (Beck et al., 2002), 100% of peptides with the Aro-P motif (11/11) and ~90% of peptides with the H-H motif (46/52) were within the normal range of 120°–150°, compared to ~60% (31/49) for all other peptides. These results suggest that specificity, especially at P1 and P1', may be governed, in part, by the ability of a peptide to adopt the appropriate catalytic geometry within active site, that is not reflected by binding energetics alone.
In this study we developed a novel peptide docking algorithm that was able to accurately predict the HIV-1 protease-substrate complex structure at an atomic resolution and recover most of the important interactions for six previously determined crystal structures. We then predicted the structures for a large, diverse set of cleavable and non-cleavable peptides, calculated an approximate free energy of the resulting complex, and showed significantly more favorable energies among cleavable peptides than non-cleavable peptides. We sorted the peptides based purely on energy and recovered most major residue-type trends at P3-P3' identified in previous studies, validating our method as an accurate model of substrate specificity. Finally, through statistical analysis of the individual residue energies in each complex, we identified several specificity-determining residues that are predominately responsible for discriminating cleavable peptides.
We found that substrate specificity is largely determined by residues that cluster around the P2-P3' subsites. Steric interactions between side chains at P2-P2' in non-cleavable peptides led to significantly higher energies in the protease residues L76, V82, D30', I47' and I84'. Hydrogen bonding and solvation energies played a major role as well, particularly with D30' and G48. These results are in broad agreement with bioinformatics-based identification of specificity-determining residues over a data set of nine retroviral proteases and corresponding cleavage data carried out by Kontijevskis et al. (Kontijevskis et al., 2007a). Using multivariate analysis using physiochemical descriptors based on the protease sequences, they also found D30, V82, and I84 to play the major role in determining cleavage specificity. However, a number of residues identified in their study were not found to be specificity-determining in the present study (R8, T26, V32, I64, P81, N83, L90), and vice versa (I47, I48, L76). The reason for this discrepancy is unclear; additional specificity effects of protease residues observed in the present study may arise indirectly through the significant intra-peptide cross-dependencies observed in Kontijevskis study, or their use of a set of nine retroviral proteases may bias certain protease residues towards being specificity-determining for reasons of sequence conservation and cleavage data availability. Additionally, since Kontijevskis et al. analyzed experimental data directly, they may have captured effects such as protease dynamics or chemical activity that are beyond the scope of our static structural methods. Given that our structural approach is completely orthogonal to their bioinformatics approach, the agreement between the two studies is encouraging, while the discrepancy is of interest for further research.
In our model, specificity was a result of both local and non-local structural mechanisms. Local mechanisms conferred specificity at a given subsite through the energetics of the peptide and protease residues within that subsite, such as the disfavoring of β-branched amino acids at P1 through clashes with I84 or the burial of a charged group at P1' in the active site. Non-local mechanisms conferred selectivity through energetics of peptide and protease residues at distal subsites or multiple subsites. Examples of non-local mechanisms include P2 side-chains affecting the energies P1' subsite through steric interactions with I84 and V82 or various combinations of peptide side-chains leading to unfavorable peptide backbone conformations that disrupt conserved peptide-protease hydrogen bonds. That no single sequence motifs appeared to dominate low energy or high energy binders in our study underscores the degree to which these indirect effects play a major role in determining cleavability, reflecting the observed interdependence between peptide sequence positions on cleavable substrates in bioinformatics studies (Kontijevskis et al., 2007b; Ozer et al., 2006). Finally, we found that effects not captured by the protease-peptide energies, such as the active-site bond geometries, may play a major role in specificity as well, potentially explaining the prevalence of Aro-P and H-H motifs at P1-P1' among cleavable peptides.
Through this effort, we hoped to identify several specificity-determining protease residues that would be critical to native selectivity and thus serve as robust targets for drug design. This goal was premised on the prevailing theory that drug resistance in HIV-1 protease is the result of mutations that alter inhibitor binding without significantly affecting substrate recognition (King et al., 2004). The unexpected result of the study is that those residues that are determined to be critical for substrate recognition are also residues that clinical studies have revealed to be positions of high-frequency drug resistance mutations, which is demonstrated most clearly by the substantial difference in energies between cleavable and non-cleavable peptides when using only active-site DRM residue energies in our model. These findings indicate that active-site drug resistant mutations do alter substrate recognition and more importantly, imply that drug resistance and substrate recognition may share common mechanisms.
The importance of DRM residues in substrate selectivity suggests that any mutations to these residues may alter the specificity of the protease, perhaps even towards its endogenous substrates. In fact, experimental evidence has long supported that native substrate binding can be altered in drug resistant variants of HIV-1 protease. Both clinical and in vitro studies have revealed that many viral variants that display early-stage drug resistance show significantly decreased viral infectivity, as a result of decreased or defective protease activity (Mammano et al., 2000; Martinez-Picado et al., 2000; Zennou et al., 1998). Lin et al. (Lin et al., 1995) showed at least a several-fold increase in the Km of HIV-1 protease binding to peptide substrate Ca-P2 in several drug resistant mutations at positions V82 and D30. Additionally, a number of studies have observed polymorphisms in the several Gag polyprotein cleavage sites, including in response to decreased activity of drug resistant protease variants (Doyon et al., 1996; Ho et al., 2008; Zhang et al., 1997). The most frequently observed of these cleavage site mutants is the heavily-studied valine to alanine substitution in P2 of NC-p1 found in patients with the V82A protease mutation, for which a crystal structure of the complex has been determined (Prabu-Jeyabalan et al., 2004). Dauber et al. found altered specificity in a peptide-library cleavage assay for several DRM proteases, most prominently in the selectivity of β-branched residues at the P2 position in V82 mutants. Finally, a series of studies on the closely related feline immunodeficiency virus (FIV) protease found that several residues that correspond to DRM positions 30, 46, 47, 48, 50, 82, and 84 of HIV-1 protease are responsible for the differences in specificity between FIV protease and HIV-1 protease (Beck et al., 2001; Lin et al., 2000; Lin et al., 2003).
We evaluated the substrate envelope hypothesis in the context of substrate selectivity and found that while it is necessary for a peptide to have a minimal number of protrusions to be cleavable, it is not sufficient in defining cleavability, as a significant number of non-cleavable peptides also had either low number or zero protrusions. A residue-type analysis carried out on the 30 least-protruding and 30 most-protruding substrates (data not shown) revealed markedly different trends from the aforementioned analysis using low and high energy peptides that are not reflected in bioinformatics studies, most prominently, and unsurprisingly, that ‘small’ residues play a dominant role in minimizing protrusions (and consequently defining cleavability, according to the hypothesis). These findings underscore the limitations of the substrate envelope hypothesis as a model for substrate specificity in HIV-1 protease.
However, we found the theory underlying the substrate envelope hypothesis, the use of substrate consensus volumes to define binding behavior, useful in describing the specificity and promiscuity of HIV-1 protease. We generated a substrate envelope, consisting of the consensus volume of the 10 endogenous substrates, and a binding envelope, consisting of the consensus volume of all cleavable peptides, and found that the binding envelope was significantly larger than the substrate envelope, indicating substantially more structural diversity among the set of all cleavable peptides than in the 10 endogenous substrates. Furthermore, we observed that regions of the active-site with the greatest disparity between the binding and substrate envelopes, indicating areas in the active-site where there is the greatest diversity among all cleavable peptides compared to endogenous substrates, (i.e. greatest ‘promiscuity’) are often immediately adjacent to DRM residues. These results suggest that the HIV-1 protease active site is ‘primed’ to develop DRMs where this structural promiscuity is highest (subject to other constrains, such as protease stability). An examination of protease inhibitor volumes shows that heavy atoms in inhibitors often protrude from the envelope into these areas of high promiscuity, adjacent to the DRM residues that disrupt their binding. Whether this relationship between DRM and binding-site structural promiscuity is a general feature of drug resistance requires further study.
From a methodological perspective, this study represents a first attempt, to our knowledge, to systematically identify enzyme specificity determinants through a structure-based method. Through this approach we demonstrate that the specificity-determining residues in HIV-1 protease largely overlap with active-site residues associated with drug resistance and expand on the substrate envelope hypothesis to account for both the specificity and promiscuity of HIV-1 protease in the hope that a more complete understanding its molecular recognition can yield valuable insight for HIV drug research. Finally, we intentionally designed the method to be generally applicable to any enzyme that interacts with peptide substrates for which a high-resolution structure is known, and a dataset of reactive and non-reactive substrate peptides has been identified. We are currently exploring its uses in other systems.
We have uploaded the set of 112 structure predictions in Protein Data Bank format along with a database containing a listing of every peptide, its sequence, cleavability, and residue energies from resulting structure prediction on to our website for public use (http://graylab.jhu.edu). We believe this study represents only a fraction of the analyses that can be performed on this data set and it is our hope that other groups will be able use this information in conjunction their own molecular modeling or statistical techniques to glean additional information about substrate specificity in HIV-1 protease.
We relied on several previous publications for peptide sequences that have been demonstrated to be cleaved by HIV-1 protease (Hellen et al., 1989; Oswald and von der Helm, 1991; Riviere et al., 1991; Tomasselli et al., 1993; Tomaszek et al., 1992; Tozser et al., 1991). The ten endogenous substrate sequences were added to the set of cleavable sequences (Prabu-Jeyabalan et al., 2002). The majority of non-cleavable sequences were collected using a sliding 8-residue window around cleavage sites of HIV-1 reverse transcriptase, as determined by Tomasselli et al. (Tomasselli et al., 1993), from i-8 residues preceding to i+8 residues flanking the cleavage site i, in increments of 2 residues. Non-cleavable sequences also included five peptides predicted by Chou et al. (Chou, 1996) to have the lowest affinity for the protease. Finally, any peptide sequence that was identical in over four consecutive positions to any other peptide in the set was removed to ensure diversity. The final data set contained 69 cleavable peptides, including 10 endogenous substrates, and 43 non-cleavable peptide sequences.
The goal of peptide docking was to predict the structure of the protease-substrate complex at an atomic resolution for subsequent energy calculations.(Hao et al., 2008; Prasad et al., 2007; Sood and Baker, 2006) We developed our own flexible peptide docking algorithm to restrict the conformational search to a relatively local search because the end-goal is specificity determination, not structure prediction. The protease structure used for docking was obtained from the crystal structure of HIV-1 protease bound to the inhibitor Saquinavir (Krohn et al., 1991). The substrate peptide starting conformation and rigid-body position was defined from the crystal structure of HIV-1 protease bound to the endogenous substrate p2-NC (Prabu-Jeyabalan et al., 2002), after its bond lengths and angles were adjusted to ideal values (Engh and Huber, 1991). This starting substrate conformation ensured that the scissile peptide bond is in the correct conformation and orientation with respect to the catalytic active site residues for enzymatic catalysis and is largely constant across all crystallized peptides (Prabu-Jeyabalan et al., 2002). A given substrate peptide sequence was then threaded through this ‘generic’ starting structure to create the starting structure for that particular peptide sequence for docking.
Peptide docking was carried out using a novel algorithm written within the RosettaDock protein docking software (Chaudhury and Gray, 2008; Gray et al., 2003; Wang et al., 2007). The protease-peptide complex is represented entirely in atomic resolution, and the peptide backbone along with the peptide and protease side-chains are perturbed in a Monte Carlo plus minimization-based algorithm with simulated annealing. A flow-chart representing the algorithm is illustrated in Fig. 1. The temperature was decreased geometrically from a starting value of kT = 3.0 to a final value of 0.8. The starting structure underwent 96 cycles of docking to generate a single structural model, or decoy, and a total of 500 decoys were generated for each substrate peptide sequence.
Within each cycle, the peptide undergoes and ψ perturbations using a series of ‘small’ moves, which randomly perturb or ψ of residue i, and ‘shear’ moves, which randomly perturb of residue i and ψ of residue i-1 (reviewed in (Rohl et al., 2004)). The magnitude of the perturbation was chosen from a randomized Gaussian distribution that was scaled linearly with temperature from 20° at the beginning of the docking run to 5.3° at the end. Following a set of small and shear moves, the peptide backbone torsion angles, rigid-body position, and peptide and protease side-chain torsion angles were optimized using conjugate gradient-based minimization using the all-atom energy function. Every cycle the peptide and protease side-chain conformations were optimized using the ‘rotamer trials’ method as described in (Wang et al., 2005) and every eighth cycle, they were further optimized through a combinatorial repacking algorithm, as described in Kuhlman et al. (Kuhlman and Baker, 2000), using an expanded rotamer library (Dunbrack and Cohen, 1997; Wang et al., 2005). The energy function used throughout the simulation is the standard docking energy function used in RosettaDock (Gray et al., 2003), consisting primarily of Van der Waals, hydrogen bonding, solvation, statistical pair-wise, and side chain internal energy terms. The lowest-energy decoy among the 500 generated for each peptide sequence was selected as the final structural model for that peptide. Each decoy took approximately 210 seconds to generate on a single 3.0 GHz Intel Xeon processor; the entire data set for 111 sequences was generated in approximately 3300 CPU hours.
We compared this flexible peptide docking algorithm with a fixed-backbone algorithm where the same starting structure was used, the peptide backbone and rigid body position was held fixed, and the side-chain conformations were optimized using the same methods as described above.
The cleavability of a given substrate peptide is represented as the enzyme specificity constant towards that substrate, defined as kcat/Km, where kcat is the rate of product formation and Km is dissociation constant. Differences in free energy of enzyme-substrate interactions can be related to the specificity constant by ΔG = -RT ln(kcat/Km) according to transition state theory. In this study, we approximate the free energy of interaction between the enzyme and the substrate as the sum of the individual residue energies (Ei) of each active-site and peptide residue i, for a given structure (struct): ΔG = ΣEi(struct). For each structure, the individual residue energies are calculated as the sum of five components of the standard RosettaDock energy function: Van der Waals energy (Ei VdW), side-chain conformational energy (Ei Dun), solvation energy (Ei solv), hydrogen bonding energy (Ei Hbond), and a residue reference energy (Ei ref): Ei(struct) =Ei VdW(struct) + Ei Dun(struct) + Ei solv(struct) + Ei Hbond(struct) + Ei ref(struct). The Van der Waals energy was represented using a modified Lennard-Jones 6–12 potential as described in (Gray et al., 2003), the side-chain conformational energy was represented by statistical potential reflecting the Dunbrack rotamer probabilities (Dunbrack and Cohen, 1997; Kuhlman and Baker, 2000), the solvation energy was represented by the Lazaridis-Karplus Gaussian solvent-exclusion model (Lazaridis and Karplus, 1999), the hydrogen bonding energy was represented by a statistical orientation-dependent potential (Kortemme and Baker, 2002), and the residue reference energy was represented by a fitted energy function that describes the unfolded reference state, as described in (Kuhlman and Baker, 2000). All energies are listed in arbitrary units of kT. The total energy was taken as the sum of the individual residue energies of active-site residues (residues 8, 23, 25, 27, 28, 29, 30, 32, 45, 47, 48, 49, 50, 76, 81, 82, and 84) on both chains in the protease and all eight peptide residues. This “free energy” approximation implicitly includes the entropy of the solvent but neglects peptide and protease conformational entropy.
We used a Welch’s t test on both the total free energy and on the individual residue energies between cleavable and non-cleavable peptide sequences to identify significant energetic differences. A threshold of 0.2 kT was set for the difference of mean values, below which the distribution of energies between cleavable and non-cleavable peptides was considered indistinguishable. The t test was conducted assuming unequal sample sizes, unequal variances, and an alternative hypothesis that cleavable peptides have a lower energy than non-cleavable peptides. We conducted all statistical analyses using the R statistical software package (Team, 2004).
A residue was considered a ‘specificity-determining residue’ if it showed a significant decrease in energy between cleavable and non-cleavable peptides with at least p < 0.05. We also conducted a jack-knife statistical test to determine the average number of specificity-determining residues identified by chance by randomizing the cleavability of a peptide sequence in our data set and carrying out the statistical protocol 20 times and found that, on average, 0.15 residues were found to be significant at p < 0.01 and 0.55 residues were found to be significant at p < 0.05.
We validated the structure prediction accuracy of the peptide docking algorithm by comparing the final structural models of six endogenous substrates (MA-CA, CA-p2, p2-NC, p1–p6, RT-RH, RH-IN) with their respective crystal structures (Prabu-Jeyabalan et al., 2002). Structural accuracy was determined using two measurements, fnat, or the fraction of native residue-residue contacts (defined as two residues in which a heavy atom from one residue is within 5 of a heavy atom from the other) between the protease and peptide that is recovered in the structure prediction, and Irms, the root mean squared distance (rmsd) of all Cα atoms within 5Å of the protease-peptide interface following an optimal superposition along those atoms (Mendez et al., 2005).
We validated our modeling of substrate specificity in two ways. First we tested if there was a significant difference in free energy between cleavable and non-cleavable peptides complexed with the protease using a Welch’s t test, as described above. Second, we classified each peptide residue according to six types: small (G, C, N, S, T, D, A, V, P), hydrophobic (A, V, P, M, F, L, Y, I, W, C), charged (D, E, H, K, R), polar (T, C, S, N, D, Q, E, K, R, H, Y, W), aromatic (F, Y, W), and β-branched (V, I, T). We then sorted the peptides based on free energy of the peptide-protease complex and calculated relative probability of finding each residue type at each subsite between the 30 lowest-energy peptides vs. 30 highest-energy peptides. We compared these relative residue type probabilities to those observed in the literature for cleavable peptides and the endogenous substrates.
(A) Histogram of the active-site geometry, measured as the angle between the C-O bond vector of P1 and the O δ2-O δ2 vector between D25 and D25' (θactive-site) for cleavable (blue) and non-cleavable (red) peptides. (B) Representative cases of a peptide with θactive-site = 89° (left) for the non-cleavable peptide IAEI↓QKQG and a peptide with θactive-site = 141° (right) for the cleavable peptide ARVL↓AEAM. The carbonyl oxygen is shown in spheres, the peptide backbone, D25 and D25' are shown in sticks. The C-O vector of P1 and O δ2-O δ2 vector between D25 and D25' are shown as arrows.
We acknowledge Tiara T. Byrd for her help in the development of the peptide docking algorithm and in data collection, and Dr. Ernesto Freire for helpful discussions on interpreting our data and results. This work was supported by National Institute of Health grant R01 GM078221.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.