HIV-1 enters immune cells by binding its viral envelope first to a host-cell CD4 receptor and then to a secondary co-receptor, usually CCR5 (R5) or CXCR4 (X4); some HIV-1 variants can use both co-receptors (R5X4). Although a small set of amino-acid properties, such as charge and sequence length, applied to HIV-1 envelope V3 loop sequence data can be used to predict co-receptor usage, we sought to expand the fundamental understanding of the physicochemical basis of tropism by analyzing many, perhaps less obvious, amino-acid properties over a diverse array of HIV sequences. We examined 74 amino-acid physicochemical scales over 1,559 V3 loop sequences with biologically tested tropisms downloaded from the Los Alamos HIV sequence database. Linear regressions were then calculated for each feature relative to three tropism transitions (R5→X4; R5→R5X4; R5X4→X4). Independent correlations were rank ordered to determine informative features. A structural analysis of the V3 loop was performed to better interpret these findings relative to HIV tropism states. Similar structural changes are required for R5 and R5X4 to transition to X4, thus suggesting that R5 and R5X4 types are more similar to each other than either phenotype is to X4. Overall, the analysis suggests a continuum of viral tropism that is only partially related to charge; in fact, the analysis suggests that charge modification may be primarily attributed to decreased R5 usage, and further structural changes, particularly those associated with β-sheet structure, are likely required for full X4 usage.
Human immunodeficiency virus type 1 (HIV-1) primarily infects human CD4+ T-cells and macrophages. In order for HIV-1 to enter either of these cells, the viral envelope (gp120) must first bind a CD4 receptor on the cell surface and then a secondary co-receptor, usually CCR5 (R5) or CXCR4 (X4).1 Although there are two primary cellular co-receptors, three viral phenotypes are typically described in the literature: R5, X4, and less-efficient intermediate forms of HIV-1 that retain some ability to bind either co-receptor, called dual-tropic HIV-1 (R5X4). The transition in an individual from R5 to X4 is assumed to be unidirectional.2–4
Both R5 and X4 co-receptors are present on T-cells and macrophages1,5; however, receptor density may vary among different immune cells.6 Transmitted viruses, as well as viral populations during early infection, preferentially bind R5.5 Viruses that bind X4 may arise, usually during late infection, and are associated with increased diversity and a high evolutionary rate.7 Because blocking the interaction of gp120 with host co-receptors is a strategy used in anti-HIV therapy, a better understanding of tropism states may assist in the development of therapies to block viral transmission or as a treatment for those infected.
Gp120 contains five hypervariable domains (V1–V5) that possess a fluctuating glycan shield and variable epitopes near binding domains.8,9 Of the variable domains, the V3 loop has the strongest known link with HIV-1 tropism, and, therefore, V3 is frequently the focus of studies that aim at defining tropism states. Computational approaches have been developed to classify viruses as R5 or X4 tropic based on a small number of genotypic features of the envelope V3 domain, including charge, number of glycosylation motifs and domain length,10–14 or the “11/25 rule” that associates positively charged amino acids at two positions (11 and 25) in the V3 loop.15 These methods are highly useful in classifying HIV-1 subtypes B and C as either R5 or X4 tropic so that the appropriate therapy can be prescribed. However, it remains unclear whether R5 viruses always pass through an R5X4 stage before the development of X4 viruses or whether other less obvious structural modifications between tropism transitions exist that could benefit anti-HIV drug design.
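To illustrate how compact such a genotypic rule is, the 11/25 rule can be sketched in a few lines of Python. This is a minimal sketch that assumes the common formulation in which a lysine or arginine at either position predicts X4 usage; the function name and both example sequences are hypothetical and are not taken from the study data.

```python
# Minimal sketch of the "11/25 rule" (assumed formulation: a basic residue
# at ungapped V3 position 11 OR 25 predicts X4; otherwise R5).
BASIC_RESIDUES = {"K", "R"}  # lysine and arginine; histidine excluded by assumption


def rule_11_25(v3_sequence):
    """Classify an ungapped V3 amino-acid sequence as 'X4' or 'R5'."""
    if len(v3_sequence) < 25:
        raise ValueError("sequence is too short to contain position 25")
    position_11 = v3_sequence[10]  # 1-based position 11
    position_25 = v3_sequence[24]  # 1-based position 25
    if position_11 in BASIC_RESIDUES or position_25 in BASIC_RESIDUES:
        return "X4"
    return "R5"


# Illustrative sequences only: a 35-residue V3-like sequence and a
# hypothetical variant carrying an S-to-R change at position 11.
example_r5_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
example_x4_like = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"

print(rule_11_25(example_r5_like))
print(rule_11_25(example_x4_like))
```

A two-state rule of this kind has no category for dual-tropic sequences, which is one reason the intermediate R5X4 state is hard to place with it.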
Our goal in this study was to more completely characterize the molecular and structural basis of tropism and its evolution by exploring a diverse database of sequences (those biologically characterized at the HIV database at Los Alamos) and a larger space of possible features, including 74 amino-acid physicochemical scales.
Publicly available V3 loop sequences (relative to HXB2 positions 7,110–7,217) for HIV-1 subtype B were downloaded from the HIV sequence database at the Los Alamos National Laboratory (www.hiv.lanl.gov/content/index) and translated into amino-acid sequences. The search criteria for each tropism were limited to “only CCR5,” “only CXCR4,” or “only R5X4.” The tropism fields at the Los Alamos HIV database are annotated based on only biological experiments and are not presumed using inferred genetic sequences. Identical sequences were removed from each category. The amino-acid sequences were aligned using the ClustalW algorithm that was implemented within the MEGA5 sequence analysis package16 and then manually edited to correct for any obvious alignment errors. Diversity in each data set was calculated using the Poisson substitution model and 1,000 bootstrap replicates in MEGA5.
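The duplicate removal and diversity calculation described above can be approximated outside MEGA5 as follows. This is a sketch under stated assumptions: it uses the Poisson correction d = -ln(1 - p), where p is the proportion of differing sites, treats gaps by pairwise deletion (the gap treatment used in the study is not stated), and omits the bootstrap step. All function names are illustrative.

```python
import math
from itertools import combinations


def unique_sequences(sequences):
    """Drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(sequences))


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no comparable sites between the two sequences")
    return sum(a != b for a, b in compared) / len(compared)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino-acid distance: d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


def mean_pairwise_distance(aligned_sequences):
    """Average Poisson distance over all pairs of aligned sequences."""
    distances = [poisson_distance(a, b) for a, b in combinations(aligned_sequences, 2)]
    return sum(distances) / len(distances)


# Toy aligned sequences (hypothetical, not study data).
toy = unique_sequences(["ACDE", "ACDE", "ACDF", "AC-F"])
print(len(toy), round(mean_pairwise_distance(toy), 4))
```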
Seventy-four amino-acid scales, or “features,” were identified from the available literature and resources such as ProtScale and ProtParam (www.expasy.org)17 (Table 1). These features were grouped into six major classes: (1) amino acid size, shape, or structure (n=24); (2) polarity (n=6); (3) composition (n=5); (4) hydrophobicity (n=26); (5) local features, such as charges and glycosylation motifs occurring at specific regions of the aligned data (n=4); and (6) other miscellaneous features such as those associated with HPLC and pKa (n=8).
All features were calculated for the V3 loop taken as a whole, and for specific regions of the V3 loop at the amino- (N) and carboxy- (C) ends (termed “regional V3-loop features”). So-called “N” features corresponded to alignment positions 9 through 14 in the N-terminus of the V3 loop. So-called “C(1)” features corresponded to alignment positions 22 through 28 in the C-terminus. “C(2)” features corresponded to alignment positions 31 through 37 in the C-terminus. These two C-terminus labels identify regions of the V3 loop alignment as strand and helix, and they were added simply to explore which features in specific regions had more or less influence on tropism transitions. This led to the calculation of 281 total features over all regions.
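A regional feature of this kind can be sketched as a scale value aggregated over a window of alignment columns. The sketch below makes several assumptions that are not stated in the text: the aggregate is the mean over non-gap residues, the whole-loop window spans the 40-column alignment, and the Kyte-Doolittle hydropathy scale serves only as a familiar example (it is not confirmed to be one of the 74 scales used).

```python
# Kyte-Doolittle hydropathy values, used here only as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# 1-based, inclusive alignment windows taken from the text;
# the "whole" window assumes a 40-column alignment.
REGIONS = {"whole": (1, 40), "N": (9, 14), "C(1)": (22, 28), "C(2)": (31, 37)}


def regional_feature(aligned_sequence, scale, start, end):
    """Mean scale value over alignment columns start..end, skipping gaps."""
    window = aligned_sequence[start - 1:end]
    values = [scale[residue] for residue in window if residue in scale]
    if not values:
        return None  # window contains only gaps or unknown residues
    return sum(values) / len(values)


def all_regional_features(aligned_sequence, scale):
    """Compute the same scale over every named region."""
    return {
        name: regional_feature(aligned_sequence, scale, start, end)
        for name, (start, end) in REGIONS.items()
    }


# Toy 40-column sequence (hypothetical) with an isoleucine-rich N window.
toy_aligned = "A" * 8 + "I" * 6 + "G" * 26
print(all_regional_features(toy_aligned, KYTE_DOOLITTLE))
```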
Linear regressions were then calculated for each feature relative to each of three tropism transitions (R5→X4; R5→R5X4; R5X4→X4). Independent correlations for each tropism transition were rank ordered, with features R2 ≥ 0.1 for at least one of the three tropism decisions considered potentially informative toward an understanding of the differences between these tropism types. The features with the highest correlation were used to build an understanding of the transitions among R5, R5X4, and X4 HIV V3 domains and viewed with structural models as defined next.
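The per-feature screening step can be sketched as follows. The encoding of a tropism transition as a 0/1 response (0 for the starting phenotype, 1 for the ending phenotype) is an assumption, since the text does not specify it; for a single predictor, the R2 of a simple linear regression equals the squared Pearson correlation, which is what the code computes. Feature names and values are made up for illustration.

```python
def r_squared(feature_values, labels):
    """R^2 of a simple linear regression of labels on one feature."""
    n = len(feature_values)
    if n != len(labels) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(feature_values) / n
    mean_y = sum(labels) / n
    sxx = sum((x - mean_x) ** 2 for x in feature_values)
    syy = sum((y - mean_y) ** 2 for y in labels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature_values, labels))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature or constant response explains nothing
    return (sxy * sxy) / (sxx * syy)


def rank_features(features, labels, cutoff=0.1):
    """Return (name, R^2) pairs with R^2 >= cutoff, highest first."""
    scored = [(name, r_squared(values, labels)) for name, values in features.items()]
    kept = [item for item in scored if item[1] >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical data for one transition: 0 = starting phenotype, 1 = ending phenotype.
labels = [0, 0, 1, 1]
features = {
    "net_charge": [3, 4, 6, 7],
    "length": [35, 35, 35, 35],
    "noise": [1, 2, 1, 2],
}
print(rank_features(features, labels))
```

Running the same ranking once per transition (R5→X4, R5→R5X4, R5X4→X4) and keeping any feature that passes the cutoff in at least one of them mirrors the selection described above.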
Three sequences were chosen at random from each of the three tropism phenotypes for further computational modeling and structural analysis. V3 loop structures were generated for each of these nine sequences using the I-Tasser server.18–20 I-Tasser generates structures using an iterative approach of (1) retrieving template proteins with similar folds from the Protein Data Bank (PDB) (www.rcsb.org), (2) reassembly of matching templates into full-length models and threading of unaligned regions built by ab initio modeling, and (3) fragment assembly simulation guided by PDB templates and TM-align. Along with a three-dimensional structure, the server also provides information regarding secondary structure and solvent accessibility.
One of the limitations of protein threading is that it relies on existing published structures in the PDB; therefore, protein threading was followed by an energy minimization of each model. Molecular dynamics are commonly used to generate improved protein structures, which is accomplished by allowing protein atoms to adjust under defined conditions (i.e., temperature and pressure) using Newtonian equations. These simulations were carried out using the Nanoscale Molecular Dynamics program (NAMD).21
The models generated by I-Tasser were solvated in a water box of 10 Å in each direction. The system was neutralized with the addition of NaCl. Simulations were carried out using periodic boundary conditions at 310 K to reduce surface interactions of water molecules and to create a more accurate in vivo environment. Particle Mesh Ewald (PME) electrostatics were employed using a grid spacing of 1.0 Å to reproduce the charge distribution of the system. Langevin dynamics were used to control kinetic energies. A time step of two femtoseconds (fs) required the use of rigid bonds. Structures were minimized for 1,000 time steps and allowed to equilibrate for 100,000 time steps. Three replicate NAMD runs were performed for each structural model. Molecular simulations were viewed in the program Visual Molecular Dynamics (VMD).22
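The simulated time implied by these settings follows from simple arithmetic, sketched here in Python. Only the time step, the equilibration step count, and the nine models with three replicates each come from the text; the constant and function names are illustrative.

```python
TIME_STEP_FS = 2                 # femtoseconds per step (from the text)
EQUILIBRATION_STEPS = 100_000    # equilibration steps per run (from the text)
MODELS = 9                       # three sequences from each of three phenotypes
REPLICATES_PER_MODEL = 3         # replicate NAMD runs per model


def equilibration_time_fs(steps=EQUILIBRATION_STEPS, time_step_fs=TIME_STEP_FS):
    """Simulated equilibration time of a single run, in femtoseconds."""
    return steps * time_step_fs


per_run_fs = equilibration_time_fs()
per_run_ps = per_run_fs / 1_000          # 1 ps = 1,000 fs
total_runs = MODELS * REPLICATES_PER_MODEL
total_ns = per_run_fs * total_runs / 1_000_000  # 1 ns = 1,000,000 fs

print(f"{per_run_ps} ps per run, {total_runs} runs, {total_ns} ns in total")
```

Each run therefore covers 200 ps of equilibration, which is short by molecular dynamics standards and is best read as structural relaxation rather than extended conformational sampling.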
The Los Alamos HIV sequence database queries for biologically phenotyped V3 loop HIV resulted in 3,452 R5 sequences, 545 R5X4 sequences, and 197 X4 sequences. After removal of all identical sequences, the final sequence population contained 1,223 R5 sequences, 241 R5X4 sequences, and 95 X4 sequences for a total of 1,559 unique sequences. Some nonidentical multiple clones from the same patient were apparent and included in our analysis to capture all positional amino-acid information for each position. Although all sequences were biologically tested for tropism, different experimental methods in different laboratories could have marginally impacted tropism determination. The significant overabundance of R5 sequences in the available public databases suggests that R5 sequences greatly predominate over X4 variants in nature. The sequence alignment spanned 40 amino acids, including gapped positions, and allowed for the inclusion of all sequences, even those that were unusually long or those with varied sequences that were derived from the same individual. Because of this, charged positions 11 and 25, previously associated with tropism,15 corresponded to positions 12 and 30 in our alignment, respectively. A representative alignment of three sequences chosen at random from each known tropism is provided in Figure 1A. Figure 1B provides additional structural models for the V3 loop showing positional charge variation and slight structural differences commonly observed when threading sequences using major PDB models for different tropisms.
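The shift between ungapped V3 positions and alignment columns can be made concrete with a small helper. This is a sketch only: the gapped sequence below is a hypothetical construction whose gap placement was chosen so that residues 11 and 25 fall in columns 12 and 30 of a 40-column alignment; it is not a row from the study alignment, and the function name is illustrative.

```python
def alignment_column(aligned_sequence, residue_number, gap="-"):
    """Return the 1-based alignment column holding the n-th non-gap residue."""
    if residue_number < 1:
        raise ValueError("residue_number must be 1 or greater")
    seen = 0
    for column, character in enumerate(aligned_sequence, start=1):
        if character != gap:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("sequence has fewer residues than residue_number")


# Hypothetical 40-column row: one gap before residue 11 and four more
# gaps before residue 25 (gap positions are invented for illustration).
toy_row = "CTRPNNNT" + "-" + "RKSIHIGPG" + "----" + "RAFYTTGEIIGDIRQAHC"

print(len(toy_row))
print(alignment_column(toy_row, 11), alignment_column(toy_row, 25))
```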
In Figure 2, we show an almost stepwise progression in the diversity of sequences for each of these tropisms as described by their internal pairwise distances. The mean pairwise amino-acid distance for R5 sequences was 18.2%, which roughly doubled for both R5X4 at 32.9% and X4 at 38.8%. These diversity measurements are not meant to guide tropism prediction; instead, they are merely meant to illustrate the stepwise increase in diversity from R5 to R5X4 to X4 sequence populations. Although the inclusion of some nonidentical sequences derived from the same individual in the study could bias the R5 sequence population toward reduced diversity, this effect would have been minimal when considering that the overall R5 sequence population was roughly 13 times larger than the X4 sequence population (1,223 vs. 95 unique sequences). Furthermore, it is generally known that X4 variants maintain more diverse amino-acid substitutions.
A different set of features was associated with each tropism transition and suggested a progression or continuum of viral tropism that is only partially related to charge. In Table 2, we provide a list of all features examined and their correlation to tropism relationships. A correlation of R2 ≥ 0.1 was chosen arbitrarily as a lower cutoff for useful tropism understanding. Only 84 features had an R2 ≥ 0.1 for at least one of the three tropism comparisons. The remaining 197 features were considered to have little utility for tropism insight and were eliminated from further consideration.
Although amino-acid charges are commonly used features to discriminate R5 from X4 sequences in public computational algorithms that predict viral tropism, in our transitional analysis, a set of 12 features identified R5→R5X4 at R2 ≥0.1, with the top four features associated with charge. However, 12 other amino-acid characteristics surpassed charge as a feature in distinguishing the R5X4→X4 transition and 25 noncharge-associated features (shape, size, or structure) with an R2 > 0.2 were determined to be important in the R5→X4 transition. With this in mind, it may be that computational algorithms used to determine tropism are biased toward assigning actual intermediate R5X4 V3 loop sequences as X4 variants, or that additional features added to existing algorithms could improve their accuracy. Also, the analysis suggests that key features associated with true X4 viruses have not been adequately teased out of the structures.
In the N-terminus of the V3 loop, three features were identified for the R5→R5X4 transition with an R2 ≥ 0.1: 2D Membership Class N (structural), Grantham (polarity), and Beta Chou and Fasman (structural). However, in the transition from R5X4 to X4 and from R5 to X4, 24 structural features associated within the C(1) region were calculated to have an R2 ≥ 0.1. Many of these features were associated with beta-sheet structure. Also noteworthy in the R5X4→X4 and R5→X4 transition were the 2 pKa-associated features (one associated with the amine and one with the carboxylate) with higher R2, which could mean that the transition to X4 from either R5 or R5X4 is zwitterion like, meaning that the overall effect can be a neutral molecule with an increase in both positive and negative electrical charges.
No structural features from the C(2) region were considered important for any tropism transition. These results suggest that features that distinguish R5 from X4 viruses are found either throughout the V3 loop or within the N-terminus, whereas the transition from R5X4 to X4 virus results from changes in the C(1) domain of the V3 loop. Four polarity features (Charge Scale C(1), Charge Polarity C(1), Grantham C(1), and Polarity C(1)) were also central in the transition from R5X4 to X4.
Interestingly, glycosylation was not identified for either the R5→R5X4 or the R5X4→X4 transition, but it had a higher R2 when transitioning from R5 to X4, suggesting that this feature may be a final adjustment in the transition from R5 to X4 co-receptor usage. Alternatively, it may be that an X4 glycosylation motif arises unsystematically due to the required shape/size adjustment for the final X4 usage; this could explain why many X4 variants do not, in fact, possess an additional glycosylation motif. Also interesting is that sequence length was identified as only a minimally useful indicator during the transition of R5X4 to X4 with an R2 ≥ 0.167. This observation is likely due to the fact that although X4 tropism may allow for increased V3 loop length, the majority of X4 viruses we analyzed were no longer than R5 or R5X4 HIV in the V3 loop region.
Overall, most molecular features that are important for the R5X4→X4 transition were not found to be important for the R5→R5X4 transition. Further, transitional features between R5→R5X4 tended to also perform well for the transition R5→X4. These interesting results suggest that there are fundamental biological differences during the transitions between tropisms. For example, while examining the top 10 features for each tropism decision, a pattern emerges (Fig. 3). During the R5→R5X4 transition, changes occur that affect charge at specific positions in the V3 loop. These changes also affect overall structural characteristics such as volume, hydrophobicity, and refractivity. During the conversion of R5X4→X4, changes occur that affect β-sheet development, surface area, flexibility, and pKa of the V3 loop. These features are quite important to the development of X4 viruses. Note that with the exception of only two features (charge at position 12 and average Flex C(1)) the top 10 features for the R5X4→X4 are the same as the top 10 features for distinguishing R5→X4. This also indicates that what we currently refer to as R5X4 is probably functionally closer to R5 than X4 in its sequence and structure.
To summarize, these statistical results suggest a series of amino-acid alterations during the transition of viral tropism from R5 to X4, including: (1) charge modifications and structural N changes reduce the ability of the virus to utilize R5 receptors; (2) structural changes, especially in the C(1) region, allow for the adaptation of V3 loop to X4 receptor binding; and (3) increased glycosylation may be a final feature in structural adjustments, allowing the final transition from R5X4 to X4. These findings could be tested on controlled data sets, such as those used to develop co-receptor algorithms, to confirm their utility.
Energy minimization and equilibration of V3 loop structures revealed a more organized secondary structure in R5 than in R5X4 or X4 structures. The threading templates most commonly used by the I-Tasser server were PDB models 1ce4A,23 4ncoA,24 and 3tygA,25 all of which have an R5-like tropism, but they still produced reasonable structures for further examination and resolution using molecular dynamics simulations. The generation of structures using this server produces a confidence score (C-score) based on the significance of the threading. The resulting top scoring models based on three confidence measurements (C-score, TM-score, and RMSD) are described in Table 3. All models produced were within the range of models of reasonable topology, with the R5 models exhibiting the lowest C-scores, which is not surprising considering that the native models in the PDB had an R5-like tropism.
Another result from protein modeling is secondary structure prediction at each position in the model and predicted solvent accessibility at each position (Figs. 4 and 5). The N-terminus of the structure and the “tip” of the loop are highly conserved in terms of secondary structure for all tropisms. It is in the α-helix domain, at the C-terminus, that a less ordered structure is particularly apparent (Fig. 4). Minimal differences in solvent accessibility are noted after modeling (Fig. 5) between R5 and R5X4 viruses; however, X4 shows some potentially interesting differences from the other two tropism classes. In particular, the charged amino acid at position 12 and the amino-acid positions that are observed as insertions appear exposed on the structure's surface.
Molecular dynamics simulations allow atoms and molecules to interact under a controlled set of circumstances (i.e., temperature and pressure) that mimic natural molecular forces and lead to an improved three-dimensional protein structure. These simulations allowed for structural relaxation and adjustments for each V3 loop model after the protein threading stage. Certain trends were observed after this process of modeling (Fig. 6). For example, R5 models almost always retained a more closed “C-shaped” loop structure with a well-defined α-helix at the C-terminus of the loop. In the case of the MASTR HIV isolate, secondary structure was not visible after threading, but it appeared in all structures after allowing the model to adjust over time and under controlled temperature and pressure. Loss of the α-helix was observed in two structures from the 30Rog isolate after energy minimization. In the X1H4 isolate, no recovery of the α-helix domain was observed after energy minimization.
More frequently, a less-organized structure was observed in R5X4 and X4 variants in comparison to the R5 isolates. These experiments also highlighted amino acids that exhibited more freedom of movement during the simulation, where red amino acids exhibited more freedom, blue amino acids demonstrated less movement, and green amino acids were intermediate (Fig. 6). The alpha helix in most R5 isolates appears blue to green.
The V3 loop of the HIV-1 envelope was identified as a key determinant of tropism decades ago,26 and three discrete tropism states (R5, R5X4, and X4) are commonly discussed in the literature. In this study, our goal was not to discriminate between tropism phenotypes; rather, it was to identify amino-acid features beyond those more obvious features used in computational algorithms that could potentially be harnessed for therapeutic design or improved co-receptor evaluation.
We identified that a variety of amino-acid features that have not previously been taken into account reside in the HIV-1 V3 loop and appear necessary for HIV-1 to transition between tropism states. Our analysis supports the hypothesis that a continuum of tropisms exists from R5 to X4 through a spectrum of dual-tropic intermediates.14 Of course, it is also possible that in many cases a virus population can slip between tropism states, only to return to an R5 population after rounds of selection. Our analysis also reinforces the notion that R5X4 and X4 HIV contain a less-ordered structure than R5 structures, particularly in the capacity to maintain an organized secondary structure and a compact configuration.
Early studies have described the change from R5 to X4 HIV-1 as a “switch,”27,28 with each tropism having a preference for tissue macrophages or T-cells (the primary cellular targets of HIV-1). R5 HIV are more commonly identified within infected patients and also the usual transmitted phenotype.29,30 X4 HIV were first suggested as contributing to rapid AIDS progression, because they appeared later in disease31; however, it is now well understood that AIDS progression is not always associated with X4 viruses, as most late-stage HIV-infected patients contain predominantly R5 HIV-1 or progress only to R5X4.30
These transitions among tropisms can be attributed partly to gp120, which is the target for neutralizing antibodies; it is this interplay of gp120 with the immune system that drives a substantial amount of selection against the least-fit variants during successive rounds of viral replication and results in the genetic variability of gp120.7 However, during complete immune failure, this balance breaks down and lesser-fit isolates (X4) can emerge. In fact, a study recently showed that X4 viruses were closely linked to a low nadir CD4 T-cell count of <100 cells/mm³.32 Therefore, the X4 phenotype is not the driver of disease progression; rather, it is the interaction between gp120 and a weakened immune system occurring during late-stage AIDS progression that allows for reduced viral selection and the evolution of the less structured and more openly configured X4 phenotype.
Dual-tropic HIV-1 can utilize either co-receptor; however, later studies showed that these viruses preferentially use the X4 receptor in lymphocytes but use both co-receptors equally in macrophages.33 This is relevant considering that circulating plasma T-cell levels can vary greatly over the course of infection, whereas HIV-1 is readily amplified from a variety of anatomical tissues when plasma T-cell levels are nonexistent,34–36 thus implying that a persistently infected tissue-based reservoir still exists during complete immunodeficiency and could contribute to the tropism states observed. Furthermore, under cART, HIV tissue reservoirs are of increasing interest, as viral suppression from plasma fails to fully eradicate infection from an individual37 and, in fact, lymphoid tissues have recently been implicated as a sanctuary for productive virus under cART.38
The unidirectional transition of R5→R5X4→X4 observed is a reflection of similar ordinal selective pressures generated in different individuals who generate similar viral phenotypes until such time as the cellular environment (and resulting selective pressure) changes in perhaps less predictable ways. Thus, the specific timing of each of these events is relative to the condition of each patient, and the speed at which HIV progresses from R5 to R5X4 to X4 is a reflection of these differences. To summarize, selection against X4, a dynamic immune system, and the persistence of HIV in anatomical reservoirs where varied HIV tropisms could replicate efficiently in different immune microenvironments (i.e., diseased tissue) could all contribute to a continuum of tropism states during the span of HIV infection. Further studies of tropism modulation in anatomical reservoirs are underway.
Maraviroc is a CCR5 antagonist currently approved by the U.S. Food and Drug Administration for the treatment of patients infected with R5-tropic HIV.39 Although this therapy in combination with other antiretrovirals has had some success, it still does not greatly alter disease course in HIV-infected patients who routinely follow clinical antiretroviral HIV treatment protocols.40,41 The partial success of Maraviroc may be due to the ability of HIV to readily infect tissue-based HIV reservoirs (i.e., tissue macrophages),37 where immune cells with a more varied receptor concentration may thrive, thus supplying the blood with a low-level consistent source of new virus. For example, in one study, Maraviroc did not affect biomarkers of monocyte/macrophage activation.42 It may be that future therapeutic approaches could harness the increasing knowledge of V3 loop variability to better block viral entry for cells that are both circulating and sequestered in anatomical sites.
Importantly, viral tropism is likely related to changes outside of the V3 loop domain, either somewhere else within the HIV envelope or even in co-evolution of CD4 with HIV gp120, which has been suggested in several reports.43–45 The present study has treated V3 loop features associated with tropism as being independent; however, it is likely that their nonlinear interaction plays a key role in tropism. For the purpose of this investigation, we chose to assume feature independence to make the correlation between sequence and structure easier to understand. Future work will examine their true interdependencies. Future work will also assess whether the amino-acid features identified as related to tropism shifts in this work also correlate to amino-acid changes found in more controlled assays where HIV is harvested from indicator cells expressing exogenous CD4 and either R5 or X4.
The authors would like to thank the Los Alamos National Laboratory HIV Sequence Database for the HIV sequence information used in this study. This work was supported by grants from the U.S. National Institutes of Health under grant R01 MH100984 to M.S.M. and grants P50 GM103297-01 and R01 NS063897-01A2 to M.S. The funders had no role in the study design, data collection and interpretation, or the decision to submit the work for publication.
No competing financial interests exist.