Importance of V3 loop variability and charges for viral infection
HIV is characterized by its ability to mutate frequently, as evidenced by the large number of distinct isolates and by its sequence diversity. A variability "hotspot" is the V3 loop, which is implicated in a number of important functions, including coreceptor usage during cell entry. Despite its hypervariable nature, V3 retains a basic function: to interact with the coreceptors CCR5 and CXCR4 and to modulate the virus's preferential usage of them, a crucial step in the process of infection and indeed for the survival of the virus [57,58]. With this in mind, in the present investigation we attempted to address the contrasting demands placed on V3: frequent mutation, which is necessary to evade host immune responses, and retention of the required interaction with coreceptors on the host cell. To this end, we explored the combined electrostatic potentials of the amino acids in the V3 loop and their distribution in all HIV-1 subtypes for which the tropism and V3 amino acid sequence are known, in order to identify canonical rules that might exist.
We have performed electrostatic potential calculations of the gp120 V3 loops using the Poisson-Boltzmann method [59], and clustering analysis [60] of the spatial distributions of electrostatic potentials for several HIV-1 subtypes. The clustering analysis classifies the subtypes by their similarities/dissimilarities in a common property, the electrostatic potential. Electrostatic interaction is expected because the V3 loop typically has an excess of positive charge, whereas the putative interacting N-terminal domain of the coreceptor CCR5, and to a lesser extent CXCR4, has an excess of negative charge. We have performed similar clustering analyses for the spatial distributions of charges and for sequence similarities of HIV-1 subtypes. Charge is in fact the property that many researchers have investigated to shed light on the V3 loop-CCR5/CXCR4 interaction. For example, a recent study has proposed that positively charged amino acids at positions 11, 24 and 25 are involved in coreceptor selection and binding (the "11/24/25" rule [12]). Our analysis includes the sequence specificities and charges of V3 loops from various subtypes, but also incorporates the more detailed information hidden within the spatial distributions of electrostatic potentials. It is the electrostatic potential that is responsible for the recognition of two proteins carrying excess opposite net charges. Recognition, which in our protein-protein interaction model refers to the formation of a weak and nonspecific encounter complex, is followed by binding, the formation of the specific final complex [27-30,61-69]. Although the electrostatic potential originates from unit and partial charges located on the protein surface and in its interior, the protein net charge does not capture the effect of charge distribution on protein-protein interactions. It is the spatial distributions of electrostatic potentials of two proteins that mediate long-range electrostatic interactions and protein-protein recognition. The spatial distributions of charges of the two proteins also participate in mediating short-range charge-charge (salt bridging or weak Coulombic effects) and charge-dipole or dipole-dipole (hydrogen bonding) interactions, and thus in the formation of the final protein complex. The underlying hypothesis rests on the following transitive argument: if electrostatic potentials and charges mediate protein-protein association, and if association mediates viral entry, then correlations with virulence can be deduced by studying specific properties of electrostatic potentials and charges, such as type (positive/negative), strength, and spatial distribution. Such correlations indicate where to look for causal relationships and may help in predicting viral attributes.
Clustering of electrostatic potentials, charges, and sequences
Figure shows the dendrogram that clusters the calculated spatial distributions of V3 loop electrostatic potentials. These calculations were performed at 0 mM ionic strength, which depicts the largest magnitude of Coulombic interactions within each structure, since these interactions are unscreened by solvent ions. The calculations used homology model structures derived from the crystallographic structure of gp120 with PDB code 2QAD and the HIV-1 subtype consensus sequences available in 2009 at the HIV Databases of the Los Alamos National Laboratory (Table ). Clustering was performed by pairwise comparison of the electrostatic potentials of all subtypes listed in Table , as described in Methods. V3 loop subtypes with similar spatial distributions of electrostatic potential cluster together. The V3 loops studied have positive net charge, with the exception of group O, which has a net charge of -1 (Figure ). The predominant net charge is +3, appearing in 9 subtypes (A, AE, AG, B, C, D35, G, F, K) and in the sequences of the two crystal structures, 2QAD and 2B4C, which belong to subtype B (Figure ). Of the remaining subtypes, group N has a net charge of +1, and AB, D, H, J, and CPX have a net charge of +2 (Figure ). Although subtypes with the same net charge cluster together, finer subclusters discriminate between subtypes according to the spatial distribution of electrostatic potentials. For example, among the +2 subtypes, AB and J cluster together; H and CPX cluster together (they are identical); and D clusters on its own. Overall, the +2 subtypes form the following cluster (with subclusters in brackets/parentheses): {[(J, AB), (H, CPX)], D} (Figure ). Similarly, the +3 subtypes form the following cluster: {[(((G, AG), (K, B)), (2QAD, 2B4C), C), A], [(F, AE), D35]} (Figure ). Together, the +2/+3 subtypes form a supercluster.
The +1 group N clusters on its own and forms a larger supercluster with the +2/+3 subtypes, whereas the -1 group O clusters entirely on its own (Figure ).
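The pairwise-comparison-then-cluster procedure can be sketched in a few lines. This is a minimal illustration, not the implementation used in this study: it assumes a commonly used normalized electrostatic similarity distance (ESD), the mean over grid points of |φa - φb| / max(|φa|, |φb|), combined with average-linkage (UPGMA) clustering; the exact ESD definition and linkage criterion are the ones given in Methods. The function names and the four-point toy potentials are hypothetical; real Poisson-Boltzmann potentials would be 3D grids flattened in the same point order.

```python
from itertools import combinations

def esd(phi_a, phi_b):
    """Electrostatic similarity distance between two potentials sampled on the
    same grid (flattened lists). Assumed form: mean over grid points of
    |a - b| / max(|a|, |b|), giving 0 for identical potentials and up to 2 for
    opposite ones. Points where both potentials are zero contribute 0."""
    if len(phi_a) != len(phi_b) or not phi_a:
        raise ValueError("potentials must share the same non-empty grid")
    total = 0.0
    for a, b in zip(phi_a, phi_b):
        denom = max(abs(a), abs(b))
        if denom > 0.0:
            total += abs(a - b) / denom
    return total / len(phi_a)

def esd_matrix(potentials):
    """Symmetric pairwise ESD table: {name: {name: distance}}."""
    d = {name: {name: 0.0} for name in potentials}
    for a, b in combinations(potentials, 2):
        d[a][b] = d[b][a] = esd(potentials[a], potentials[b])
    return d

def average_linkage(labels, d):
    """Agglomerative clustering with average linkage (UPGMA).
    Returns a nested-tuple tree and the merge heights in merge order."""
    clusters = [(name, (name,)) for name in labels]  # (subtree, member names)
    heights = []
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            # Average distance over all cross-cluster member pairs.
            avg = sum(d[a][b] for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        avg, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((ti, tj), mi + mj))
        heights.append(avg)
    return clusters[0][0], heights

# Toy potentials on a 4-point grid (hypothetical values, not data from this study).
potentials = {
    "P1": [1.0, 2.0, 1.5, 0.5],
    "P2": [1.1, 2.1, 1.4, 0.6],
    "P3": [-1.0, -0.5, 0.2, -0.8],
    "P4": [-0.9, -0.6, 0.1, -0.7],
}
tree, heights = average_linkage(list(potentials), esd_matrix(potentials))
print(tree)  # (('P1', 'P2'), ('P3', 'P4'))
print(heights)
```

With this normalization the ESD is unchanged when both potentials are scaled by the same factor, so it emphasizes the shape and sign of the spatial distributions rather than their overall magnitude.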
In the dendrogram generated with the more realistic electrostatic potential calculations at 150 mM ionic strength (corresponding to the physiological ionic strength of serum), we observe similar overall clustering with local variations (Figure ). For example, the +3 subtypes form the following cluster (with subclusters in brackets/parentheses): {[(F, AE), (D35, A)], [((G, AG), (K, B)), (2QAD, 2B4C)], C}. The +2 subtypes form individual clusters (D), (H, CPX), and (J, AB) within the +2/+3 supercluster. The +1 group N clusters on its own and forms a larger supercluster with the +2/+3 subtypes, whereas the -1 group O clusters entirely on its own (Figure ). At this ionic strength, Coulombic interactions within the V3 loops are screened by solvent ions, which makes differences in the spatial distributions of electrostatic potentials less obvious on visual inspection (e.g., compare the isopotential contours of Figure to Figure ). Nevertheless, we observe persistent electrostatic clustering patterns for the various subtypes, despite differences in their V3 loop sequences.
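The difference between the 0 and 150 mM calculations can be put in rough numbers with the Debye screening length. The sketch below is a simplified Debye-Hückel estimate rather than the Poisson-Boltzmann calculation used here; it assumes a 1:1 salt in water at 25 °C, for which the screening length is approximately 3.04/√I Å (I in mol/L). The function names are illustrative.

```python
import math

def debye_length_angstrom(ionic_strength_molar):
    """Debye screening length in Å for a 1:1 salt in water at 25 °C, using the
    textbook approximation 3.04 / sqrt(I) Å with I in mol/L.
    Returns math.inf at I = 0 (no ionic screening)."""
    if ionic_strength_molar < 0:
        raise ValueError("ionic strength must be non-negative")
    if ionic_strength_molar == 0:
        return math.inf
    return 3.04 / math.sqrt(ionic_strength_molar)

def screening_factor(distance_angstrom, ionic_strength_molar):
    """Fraction of the unscreened Coulomb potential left at a given distance,
    exp(-r / lambda_D), in the Debye-Hückel approximation."""
    return math.exp(-distance_angstrom / debye_length_angstrom(ionic_strength_molar))

for conc_mM in (0, 150):
    ionic_strength = conc_mM / 1000.0  # mM -> mol/L
    print(conc_mM, "mM:", debye_length_angstrom(ionic_strength), "Å;",
          "factor at 10 Å =", round(screening_factor(10.0, ionic_strength), 2))
```

At 150 mM this gives a screening length of roughly 7.8 Å, so a Coulombic contribution retains only about 28% of its unscreened strength 10 Å away, whereas at 0 mM it is not attenuated by ions at all.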
The clustering of the spatial distributions of charges for each subtype is shown in Figure . Some clusters within this dendrogram also appear in Figures and (e.g., H and CPX). However, the subtypes are mostly mixed within the +1/+2/+3 supercluster. In general, charge distributions do not capture subtle differences between the subtypes, because charges are localized in the structure and are independent of each other. In contrast, the electrostatic potentials generated by these charges have additional features. First, electrostatic potentials account for dielectric and ionic screening; because of the latter, we observe differences in the magnitudes and shapes of electrostatic potentials in Figures and . Second, electrostatic potentials account for spatial enhancement (the additive effect of potentials with the same sign) and spatial cancellation (the subtractive effect of potentials with opposite signs).
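The two features of electrostatic potentials listed above can be illustrated with a toy superposition of screened Coulomb potentials. This is a simplified sketch under stated assumptions (point charges, a uniform dielectric of 80, Debye-Hückel screening, arbitrary units) and not the Poisson-Boltzmann model behind the figures; the configurations and the function name are hypothetical. Both toy configurations place charges at the same two sites, yet their potentials at a common probe point differ sharply: same-sign contributions add, while opposite-sign contributions cancel. A debye_length of about 7.85 Å approximates a 150 mM 1:1 salt at 25 °C.

```python
import math

def screened_potential(point, charges, debye_length, dielectric=80.0):
    """Potential at `point` from point charges [(q, (x, y, z)), ...], summed as
    screened Coulomb terms q * exp(-r / debye_length) / (dielectric * r).
    Arbitrary units; debye_length = math.inf switches screening off."""
    total = 0.0
    for q, position in charges:
        r = math.dist(point, position)
        if r == 0.0:
            raise ValueError("evaluation point coincides with a charge")
        total += q * math.exp(-r / debye_length) / (dielectric * r)
    return total

# Two toy configurations with charges at the same two sites (hypothetical).
same_sign = [(+1, (-3.0, 0.0, 0.0)), (+1, (3.0, 0.0, 0.0))]
opposite_sign = [(+1, (-3.0, 0.0, 0.0)), (-1, (3.0, 0.0, 0.0))]
probe = (0.0, 4.0, 0.0)  # 5 Å from each charge

for label, lam in (("no screening", math.inf), ("~150 mM", 7.85)):
    enhanced = screened_potential(probe, same_sign, lam)       # contributions add
    cancelled = screened_potential(probe, opposite_sign, lam)  # contributions cancel
    print(label, enhanced, cancelled)
```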
Figure shows the clustering of the sequences of the gp120 V3 loops from the subtypes used to generate the data of Figures and . In general, this dendrogram does not reflect the charge or electrostatic potential differences among the V3 loops. Obvious examples are the clusters (K, B, CPX, H) and (D35, D), which mix sequences with +2 and +3 net charges. These observations suggest that electrostatic clustering is more detailed than sequence clustering, containing more refined charge-related information.
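Why sequence distance need not track charge can be shown with a simple mismatch fraction (p-distance) on aligned sequences. The sequence-clustering method used for the figure is described in Methods; the p-distance, the His-neutral charge convention, and the ten-residue toy sequences below are assumptions chosen for illustration only.

```python
def identity_distance(seq_a, seq_b):
    """Fraction of mismatched positions between two aligned, equal-length
    sequences (p-distance); gap characters are compared like residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)

def net_charge(seq):
    """Net charge counting K/R as +1 and D/E as -1 (His assumed neutral)."""
    return sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)

# Hypothetical toy sequences.
ref = "NNTRKSIGPG"      # net charge +2
one_sub = "NNTRTSIGPG"  # K -> T at one position: net charge +1
many = "NNTKRAVGPG"     # four substitutions: net charge still +2

for name, seq in (("one_sub", one_sub), ("many", many)):
    print(name, identity_distance(ref, seq), net_charge(seq))
```

Here the single K-to-T variant is the closest sequence to the reference (distance 0.1) but changes the net charge, whereas the four-substitution variant is farther away (distance 0.4) yet keeps the same +2 net charge, mirroring the mixing of +2 and +3 subtypes in the sequence dendrogram.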
Clustering and epidemiological data
Figures and also present correlations between the observed clusters and available epidemiological data on global prevalence and geographic distribution (year 2004 [21]), and coreceptor selectivity (see below). Subtype C is responsible for almost 50% of the infected population [21]. In the 0 mM data, subtype C forms a cluster with subtypes A, G, AG, K, and B, which together account for ~85% of the infected population (Figure ). In the 150 mM data, subtype C forms a cluster with subtypes G, AG, K, and B, which together account for ~73% of the infected population (subtype A, corresponding to ~12.3% of the infected population, moved to a neighboring cluster; Figure ). Geographic distributions [21] are also given in Figures and .
Clustering and structural variability
For many years the intact structure of the V3 loop in gp120 was elusive, presumably because of its dynamic character. This was resolved in the crystal structures 2QAD and 2B4C, which contain multi-protein complexes that stabilize gp120 and the V3 loop. (In both crystal structures, the V3 loop is stabilized by contacts with the antibody components of the multi-protein complex.) The dynamic character of the V3 loop can be inferred from the observation that its conformation differs significantly between the two crystal structures, 2QAD and 2B4C (Figure ), even though their sequences differ only by two conservative mutations (Q/N and F/L, Table ). To assess the degree to which V3 loop dynamics affect its electrostatic properties, at least for the two extreme conformations of the crystal structures, we performed similar clustering analyses of electrostatic potentials and charges using the 2B4C structure (Additional Files 1, 2 and 3). Electrostatic potential clustering at 0 mM ionic strength (Additional File 1) is similar to the corresponding data of the 2QAD structure (Figure ). However, there are differences in the 150 mM data (Additional File 2 and Figure ); i.e., the +2 subtypes are scrambled within the +3 subtype clusters. The difference between the 150 mM clustering data from the two crystal structures originates from their conformational variability, which results in different charge distributions and different enhancements or cancellations of positive/negative electrostatic potential distributions. Such differences are not observed in the 0 mM data because, without ionic screening, the dominant electrostatic potential (here positive, with the exception of group O) is distributed more uniformly. As in the case of 2QAD, clustering of the spatial distributions of charges in 2B4C does not capture the fine clustering of electrostatic potential similarities/dissimilarities (compare Additional Files 1 and 2). Also, as in the case of 2QAD, electrostatic clustering in 2B4C is more detailed, containing refined charge-related information not present in sequence clustering (compare Additional Files 1, 2 and 3, and Figure ).
Influence of homology modeling-derived local flexibility in calculating electrostatic similarity
Our goal in the studies described above was to produce and analyze consensus electrostatic potential templates for the V3 loop structures that capture the average electrostatic characteristics of each consensus sequence. The consensus sequences were constructed from several thousand patient sequences, using the highest-occurrence amino acid at each V3 loop position. Amino acid changes that revert a consensus sequence to one of the many individual sequences used to construct it would affect the V3 loop structure in the vicinity of the change(s), as well as the corresponding electrostatic potential distributions. In addition to sequence variability, the structural flexibility of the V3 loop implies that electrostatic potential distributions fluctuate dynamically around an average distribution within each subtype.
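The consensus construction described above amounts to a per-column majority vote over aligned sequences. This sketch assumes the input is already aligned to a common length; gap handling and tie-breaking in the database consensus are not specified here, and the first-occurrence tie rule used below (the ordering that collections.Counter.most_common guarantees for equal counts) is an assumption. The function name and sequences are hypothetical.

```python
from collections import Counter

def majority_consensus(aligned_seqs):
    """Consensus sequence taking the most frequent residue at each aligned
    position. Ties go to the residue encountered first in that column."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_seqs):
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)

print(majority_consensus(["NNTRK", "NNTRT", "NSTRK", "NNTGK"]))
print(majority_consensus(["KTA", "KSG", "RTG"]))
```

The second example yields a consensus that matches none of its input sequences, which is why reverting a consensus to any individual patient sequence generally requires one or more amino acid changes.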
As mentioned above, given the great structural flexibility of the V3 loop, our strategy was to perform our analysis twice, using the two crystallographic structures of the V3 loop to represent two extremes of the possible conformations and thereby account for a conformational transition. Additionally, the analysis based on each crystallographic template was performed at two ionic strengths, corresponding to counterion concentrations of 0 and 150 mM, resulting in a total of 4 electrostatic similarity analyses (Figures and , and Additional Files 1 and 2). Calculations at 0 mM ionic strength produce electrostatic potentials that are more dispersed and smoother, and less affected by the underlying structure, whereas the 150 mM potentials, in addition to representing physiological conditions, depend more on the underlying structural details.
As a test of the effects of local flexibility on the reliability of our electrostatic potential similarity analysis, we produced 5 homology models for each of the two V3 loop sequences of the crystallographic structures. We generated these models with Modeller by back-predicting structures from the crystallographic templates 2B4C and 2QAD. Compared with their crystallographic template, the 5 homology models show only slight variation, arising mainly from different side chain rotamers. We performed electrostatic potential calculations for each set of models at both 0 and 150 mM ionic strength, and computed the electrostatic similarity between the potential of each of the 5 homology models and that of the corresponding template structure. The means and standard deviations of the calculated electrostatic similarities for the models of each template structure at both ionic strengths are shown in Table . The electrostatic potentials calculated for the homology models at 0 mM ionic strength are quite similar to those of the template structures, with a mean ESD of ~0.1 for both templates (Table ). In the dendrogram of Figure , calculated at 0 mM ionic strength, an ESD value of 0.1 is lower than the branch heights of most clusters, suggesting that such variation is unlikely to significantly affect the overall clustering. At 150 mM, the mean ESDs are somewhat higher, ~0.4, as anticipated given the less smooth and more detailed electrostatic potentials at this ionic strength compared with 0 mM. However, analysis of the 150 mM dendrogram in Figure indicates that these variations are also unlikely to have a dramatic effect on clustering, since, once again, the 0.4 value is near the ESD of most pairings.
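The reasoning in the preceding paragraph, comparing the spread of model-versus-template ESDs with the heights at which dendrogram branches merge, can be sketched as follows. The ESD values and merge heights are hypothetical placeholders, not the values reported in the table or figures; the use of the sample standard deviation (statistics.stdev) is an assumption about how the reported standard deviations were computed; and the fraction-of-merges check is an illustrative heuristic, not a test used in this study.

```python
import statistics

def summarize_model_esds(esds):
    """Mean and sample standard deviation of model-vs-template ESD values."""
    return statistics.mean(esds), statistics.stdev(esds)

def fraction_of_merges_above(noise_level, merge_heights):
    """Fraction of dendrogram merge heights greater than a noise level; a high
    fraction suggests noise of that size is unlikely to reorder most clusters."""
    if not merge_heights:
        raise ValueError("need at least one merge height")
    return sum(h > noise_level for h in merge_heights) / len(merge_heights)

# Hypothetical ESDs of five homology models against their template at one ionic strength.
model_esds = [0.09, 0.11, 0.10, 0.12, 0.08]
mean_esd, sd_esd = summarize_model_esds(model_esds)

# Hypothetical merge heights read off a dendrogram.
merge_heights = [0.05, 0.2, 0.3, 0.5, 0.8]
print(round(mean_esd, 3), round(sd_esd, 3), fraction_of_merges_above(mean_esd, merge_heights))
```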
These tests show that the homology modeling procedure does not exactly reproduce the parent potential, but the observed variations are acceptable given the local flexibility of the small V3 loop peptides. A previous study of the effect of homology modeling on electrostatic similarity calculations concluded that the variation of electrostatic potentials among homology models, and their deviation from the potentials of experimental structures, are comparable to the electrostatic potential variations within NMR ensembles or molecular dynamics trajectories [39]. In our case, the consensus electrostatic potentials resulting from homology modeling on two structural templates and at two ionic strengths provide electrostatic fingerprints that account for sequence variability and structural flexibility. These fingerprints can be used to understand the binding properties of each subtype and to predict the classification of new sequences.
Table 2. Comparisons of ESDs of multiple V3 loop homology models.
Sequence, glycosylation, and charge rules for coreceptor selectivity
Because the 2009 data contain no X4-tropic consensus sequences (the only exception being the non-consensus sequence of crystal structure 2B4C; Figure ), we resorted to sequence, glycosylation, and charge rules to present a predictive scheme for coreceptor selectivity. Coreceptor selection by HIV-1 is known to be influenced by the charge of the V3 loop, the amino acid types at specific positions, and the presence of glycosylation sites. Differences in coreceptor selection among HIV-1 subtypes have been shown experimentally [12,20,70,71] and predicted computationally [72-76], although the effectiveness of these predictions is not conclusive. Based on previous studies and a reconsideration of the role of net charge, we used the criteria for coreceptor selection shown in Figure . If the glycosylation motif N₆X₇T₈|S₈X₉ (where X ≠ Pro and N is the glycosylation site) is absent from the V3 loop sequence, the virus will show preference toward CXCR4 as coreceptor; experimental studies have demonstrated that loss of glycosylation sites in the V3 loop is associated with selection of CXCR4 [70,71]. If the N₆X₇T₈|S₈X₉ motif is present, coreceptor selection will be influenced by the amino acids at positions 11, 24, and 25 (the "11/24/25" rule): if none of these amino acids is positively charged, the virus will show preference toward CCR5 [12]. We propose that if the N₆X₇T₈|S₈X₉ glycosylation motif is present and any of the amino acids at positions 11, 24 and 25 is positively charged, coreceptor preference will be governed by the net charge of the V3 loop sequence. If the net charge of the V3 loop is > 5, the virus will show preference toward CXCR4; experimental studies have suggested that a high V3 charge is associated with loss of the glycosylation site and utilization of CXCR4 [71]. If the net charge of the V3 loop is ≤ 5, the virus will show preference for CCR5. Coreceptor selection will also be affected by the presence and number of acidic chemical groups, such as sialic acids, in the glycans. Glycans can carry up to four sialic acids, each adding one negative charge to the loop [77]. Thus, the presence of glycans may reduce the net charge of sequences with an amino acid net charge of > 5 to ≤ 5. This means that a sequence classified as X4-tropic based on amino acid net charge can be reclassified as R5-tropic when the net charge is computed from both amino acids and glycans. Because the number of sialic acids is not known, sequences in this category are classified as X4-, R5- or dual-tropic (Figure ). It should be noted that at lower V3 loop net charges (+3, +4), no effect of altering N-glycosylation was seen [71]. In our interpretation, if glycosylation takes place, it lowers the positive net charge further, so the sequence remains within the R5-tropic definition of the scheme of Figure . We have tested the flow chart of Figure against experimental data for a series of R5- and X4-tropic sequences [70,71] and found consistency between the predicted and experimentally derived tropisms. All consensus sequences studied here, and the sequence of the 2QAD crystal structure, are R5-tropic according to the scheme of Figure , perhaps because CCR5 is the virus's preferred coreceptor during the asymptomatic phase of infection, before the switch to CXCR4, and because an insufficient number of X4-tropic sequences is available for consensus. However, individual patients infected with X4-tropic viruses in the aforementioned data of Refs. [70,71] have V3 loop sequences that are classified as X4-tropic by the scheme of Figure . It is likely that as CCR5 receptors are depleted, the virus evolves under mutational pressure toward a more positively charged V3 loop, for more efficient recognition of cells with CXCR4 receptors. This may be because the N-terminal domain of CXCR4 has a smaller negative net charge (and electrostatic potential) than that of CCR5, thus requiring a larger positive net charge (and electrostatic potential) in the V3 loop for interaction.
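The rule cascade described above can be written as a short decision function. This is a hypothetical encoding for illustration only; the flowchart in the Figure remains the authoritative version. It assumes V3 positions numbered 1-35 from the first cysteine, counts K/R as +1 and D/E as -1 with His treated as neutral, and interprets the sialic-acid step as a single glycan carrying at most four sialic acids: sequences whose amino acid net charge exceeds 5 but could drop to ≤ 5 after subtracting four charges are reported as X4/R5/dual, while higher charges remain X4. The example uses a widely cited subtype B consensus V3 sequence purely as an illustration; it may not match the 2009 consensus used in this study exactly.

```python
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")

def net_charge(v3):
    """Amino acid net charge with K/R = +1 and D/E = -1 (His assumed neutral)."""
    return sum(aa in POSITIVE for aa in v3) - sum(aa in NEGATIVE for aa in v3)

def has_glycosylation_motif(v3):
    """True if V3 positions 6-9 (1-based) match N-X-(T|S)-X with X != Pro."""
    n6, x7, ts8, x9 = v3[5:9]
    return n6 == "N" and x7 != "P" and ts8 in ("T", "S") and x9 != "P"

def predict_tropism(v3, max_sialic_acids=4):
    """Hypothetical encoding of the sequence/glycosylation/charge cascade."""
    if len(v3) < 25:
        raise ValueError("expected a V3 sequence numbered from the first Cys (~35 aa)")
    if not has_glycosylation_motif(v3):
        return "X4"  # motif absent: CXCR4 preference
    if not any(v3[pos - 1] in POSITIVE for pos in (11, 24, 25)):
        return "R5"  # 11/24/25 rule: no positive residue at these positions
    charge = net_charge(v3)
    if charge <= 5:
        return "R5"
    # Assumption: one glycan with up to `max_sialic_acids` negative charges.
    if charge - max_sialic_acids <= 5:
        return "X4/R5/dual"
    return "X4"

consensus_b = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # widely cited subtype B consensus V3
print(net_charge(consensus_b), predict_tropism(consensus_b))
```

With His treated as neutral, this sequence has a net charge of +3, in line with the +3 reported above for subtype B, and since none of positions 11 (S), 24 (G) and 25 (E) is positively charged, the cascade stops at the 11/24/25 step with an R5 prediction.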