The role X4 viruses play in HIV pathogenesis has been elusive on several levels. Their presence clearly increases the risk of disease progression, but not every individual who progresses appears to acquire this phenotype (26). However, genotypic analysis has suggested that their representation had been underestimated (50) in the earlier biological studies; in particular, X4 viruses may arise but not persist. The results of the present study indicated that not all progressors develop X4 viruses as predicted by PSSM (8 of 11 did) but did confirm their transience in some individuals (4 of 11). There are strong genetic determinants for CXCR4 usage that evidently are not transmitted but evolve independently in a large fraction of untreated individuals. The repeatability of this phenomenon is striking in itself, as it represents repeated independent evolution to the same endpoint and suggests that the coreceptor transition might follow a common evolutionary pathway in many cases. However, no single set of mutations appears to lead to coreceptor switching in every genetic background.
We explored whether it would be more fruitful to suppose that certain mutations increase the probability that a given virus is X4, with other influences such as the larger viral genetic background also impinging on coreceptor usage. The performance of our PSSM-based method can be interpreted in this light. Empirically, the more X4-associated mutations that appear in a V3 loop, the higher the score and the more likely it is that the V3 is correctly predicted to be associated with an X4 phenotype. The association of intermediate scores with an intermediate (or at least qualitatively distinct) phenotype, dual coreceptor usage, makes a model of independent accumulation of mutations leading to CXCR4 usage biologically plausible. Such a model also predicts that although certain mutations (e.g., mutations at sites 11 and/or 25) may have a disproportionate influence on coreceptor usage, such mutations are not necessary for coreceptor switching (provided V3 has accumulated enough other mutations with smaller effects). Consistent with this, the data include a number of high-scoring CXCR4-using viruses that did not harbor 11/25 changes (Fig. ). Sites 24 and 27, previously implicated as contributors to the X4 phenotype, affect the PSSM score by 0.8 and 1.1 log-odds units (50th and 80th percentiles over all V3 sites) on average, while sites 11 and 25 contribute 1.5 and 1.2 log-odds units (95th and 90th percentiles) on average. Finally, the propensity to use CXCR4 may not lead to the actual ability to use it in particular circumstances; conversely, certain genetic backgrounds may support CXCR4 usage in spite of an unlikely V3. The presence of variability in scores within each usage class supports this risk-based model of X4 genetics.
The mutational differences reflected in PSSM score variability may account for some of the discrepancy between the results of phenotypic and genotypic studies of the frequency of X4 development among progressing individuals. It is unlikely, however, that this accounts for the entire difference. Genotypic studies have often used basic mutations as predictors of CXCR4 usage, but our study suggests that in the absence of other X4 mutations, these mutations do not always confer a high propensity to use CXCR4 (Fig. ). Subject 6 virus had the lowest mean scores, but this individual was nevertheless the fastest progressor in the MACS set. The results of our reanalysis indicate that he was unlikely to have harbored X4 variants, yet a previous study reported finding one X4 sequence on the basis of the presence of a basic residue at position 24 (50). This is consistent with the idea that X4 virus is not always required for progression to disease (11). Two of seven V3 sequences sampled at 5.8 years after seroconversion for subject 11 had basic residues at site 25 and were predicted by PSSM to represent X4; however, these viruses did not give rise to a persistent high-scoring lineage (supplementary Fig. S10). Subject 11 was a slow progressor (and is now being treated with highly active antiretroviral therapy) (Fig. ). Of two individuals not evaluated in a previous study (50), subject 4 was a moderate progressor and did develop viruses predicted to be X4 and subject 10 was a nonprogressor and did not develop predicted X4 viruses.
The observation that the presence of biologically phenotyped SI virus is strongly associated with CD4+ T-cell decline and disease progression (33) suggests that the ability to detect X4 viruses early (or to predict their evolution) has clinical prognostic value. As we have shown for eight individuals developing predicted X4 virus, PSSM scores tend to rise gradually, in contrast to the relatively abrupt appearance of X4 virus in the blood as detected by biological phenotyping (28); it is therefore unlikely that X4 virus arises and outgrows R5 rapidly via single mutations. Monitoring the average score of subject virus (via sequencing or less costly methods) can provide advance warning of X4 outgrowth before virus is actually able to use CXCR4, which can in turn inform prognosis or treatment decisions. In support of this, a recently presented prospective study of 1,107 HIV-positive individuals starting suppressive antiretroviral therapy showed that the presence of SI viruses at baseline (as predicted from consensus V3 sequences by PSSM and 11/25 methods) was an independent predictor of rapid CD4+ T-cell decline and mortality on therapy (Harrigan et al., 2nd Int. AIDS Soc. Conf. HIV Pathogenesis and Treatment). Evolutionary analysis of serially sampled viral sequences might allow us to identify the order in which mutations occur and highlight mutations that typically occur early in the R5-X4 transition. A preliminary analysis of the mutational pathways inferred for the individuals we studied suggests that basic mutations at sites 11 and 25 (as well as basic changes appearing at site 32) consistently occur early in the evolution of high-scoring lineages. The detection of these mutations in infected individuals may indicate a high X4 risk going forward, even when X4 virus is not yet present; larger-scale longitudinal sequencing studies are required to answer this question definitively.
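The idea of monitoring the average score over time can be sketched as follows. This is only an illustration under stated assumptions: the per-sequence scores, sampling times, and warning threshold are hypothetical values chosen for the example, and the function names are not from the study.

```python
from statistics import mean


def mean_score_trajectory(samples):
    """Return (time, mean score) pairs in chronological order.

    `samples` maps a sampling time (e.g., years after seroconversion)
    to the list of PSSM scores of the sequences sampled at that time.
    """
    return [(t, mean(scores)) for t, scores in sorted(samples.items())]


def first_warning(trajectory, threshold):
    """Earliest time at which the mean score reaches a chosen threshold.

    Returns None if the threshold is never reached. The threshold is a
    hypothetical parameter, not a validated clinical cutoff.
    """
    for t, m in trajectory:
        if m >= threshold:
            return t
    return None


# Hypothetical scores for one individual at three sampling times.
samples = {
    1.0: [-10.0, -9.0],
    3.0: [-8.0, -6.0],
    5.8: [-4.0, -2.0],
}
trajectory = mean_score_trajectory(samples)
warning_time = first_warning(trajectory, threshold=-5.0)
```

Because the mean rises gradually in this example, a threshold set below the level associated with CXCR4 usage would flag the trend before the highest-scoring sequences appear.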
While X4 virus generally develops once and relatively gradually in an individual, as previously suggested by van ’t Wout et al. (62), our phylogenetic analyses suggest that ultimately, it can be lost in two different ways: either by being supplanted by a preexisting population of R5 virus or by evolutionary reversion to the R5 phenotype. In the former case, R5 virus lineages from earlier in infection persist throughout infection, while in the latter, early R5 lineages are extinguished. This suggests that at least two qualitatively different types of R5 virus (or host responses to R5 virus) can occur in vivo. This idea parallels early observations (1) and a recent study (49) suggesting that the growth characteristics of late NSI virus differ from those of the NSI virus that tends to initiate infection. A persistent R5 population may have evolved more efficient binding to the CCR5 receptor, as has been shown to occur with in vitro-passaged virus under pressure from a small-molecule CCR5 inhibitor (60), making it better able to exploit diminishing resources. Reverted X4 lineages, on the other hand, might retain the ability to use CXCR4 despite relatively low PSSM scores. The selective forces that lead to either of these or other outcomes will vary with host-specific factors but may also involve differences in the viral genetic background. By providing reconstructed amino acid sequences for the ancestral V3 sequences (i.e., sequences at the internal nodes of the tree) that can be expressed and used as reagents in in vitro experiments (4), phylogenetic analysis allows hypotheses such as these to be tested.
The PSSM score is a simple yet reliable method for predicting viral phenotypes on the basis of the amino acid sequence of the V3 loop of env. Such determinations are made on the basis of an additive model of CXCR4-usage propensity that ignores length variation and possible synergistic effects that certain residues at multiple sites in cis can have on phenotypes. Nevertheless, the method is robust with respect to these shortcomings (Table ) and as a predictor of CXCR4 usage performs in a manner comparable to that of the neural network method (42) (sensitivity, 75%; specificity, 94%), which does incorporate synergistic effects. This suggests that amino acid residues at particular sites in V3 contribute (mostly independently) to coreceptor usage regardless of their particular combination in the haplotype. The PSSM method also has the advantage of being simpler in concept and more transparent in its assumptions than other methods (apart from the charge-based method) that have been previously employed (see Jensen and van ’t Wout [24] for a review of current methods). The PSSM score is a bioinformatic tool, complementing biological phenotype determination, that can express the X4 potential of a given V3 loop sequence in a graded way and for which intermediate values appear to correspond well with the evolution of viruses within individuals. As such, it may be useful as a basis of sequence-based clinical assays of within-host X4 outgrowth and could allow longitudinal study of X4 evolution and disease in large numbers of individuals without requiring extensive cloning of primary viral isolates.
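For readers comparing predictors by sensitivity and specificity, the following sketch shows how the two quantities are computed from paired phenotype and prediction labels. The labels below are hypothetical and serve only to illustrate the arithmetic; they are not data from this or any other study.

```python
def sensitivity_specificity(observed_x4, predicted_x4):
    """Compute (sensitivity, specificity) for a binary X4 predictor.

    observed_x4:  booleans, True if the virus was phenotyped as CXCR4-using
    predicted_x4: booleans, True if the sequence was predicted to be X4
    """
    tp = fn = tn = fp = 0
    for obs, pred in zip(observed_x4, predicted_x4):
        if obs and pred:
            tp += 1          # X4 virus correctly predicted as X4
        elif obs and not pred:
            fn += 1          # X4 virus missed by the predictor
        elif not obs and pred:
            fp += 1          # R5 virus wrongly predicted as X4
        else:
            tn += 1          # R5 virus correctly predicted as R5
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical labels: four X4 viruses followed by four R5 viruses.
observed = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(observed, predicted)
```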