HIV-1 can evolve HLA-specific escape variants in response to HLA-mediated cellular immunity. HLA alleles that are common in the host population may increase the frequency of such escape variants at the population level. When loss of viral fitness is caused by immune escape variation, these variants may revert upon infection of a new host who does not have the corresponding HLA allele. Furthermore, additional escape variants may appear in response to the nonconcordant HLA alleles. Because individuals with rare HLA alleles are less likely to be infected by a partner with concordant HLA alleles, viral populations infecting hosts with rare HLA alleles may undergo a greater amount of evolution than those infecting hosts with common alleles due to the loss of preexisting escape variants followed by new immune escape. This hypothesis was evaluated using maximum likelihood phylogenetic trees of each gene from 272 full-length HIV-1 sequences. Recent viral evolution, as measured by the external branch length, was found to be inversely associated with HLA frequency in nef (p<0.02), env (p<0.03), and pol (p ≤ 0.05), suggesting that rare HLA alleles provide a disproportionate force driving viral evolution compared to common alleles, likely due to the loss of preexisting escape variants during early stages postinfection.
HLA-mediated cytotoxic T cell (CTL) responses are thought to be critical for effective control of HIV-1. HLA class I molecules can present peptide antigens on the surface of infected cells, targeting them for removal by the cytotoxic T lymphocytes. However, HIV-1 can escape the HLA-mediated CTL response through the evolution of variants within the targeted epitopes, abolishing recognition by the CTL. Escape from a CTL response can be balanced by detrimental impacts on viral replication. For example, individuals with HLA B*57 or B*5801 will select for the evolution of the escape variant T242N in Gag.1,2 However, this variant has a decreased replication capacity, identified in vitro, that is partially maintained even after the selection of compensatory mutations.3,4 This may explain the reduced viral loads observed in infected individuals with HLA B*57 or B*5801 compared to others.2,4–8 T242N will revert when transmitted to an individual without HLA B*57 or B*5801.1,2 Similarly, other CTL escape variants have been identified that cause lower viral fitness than the nonescaped variant through a variety of mechanisms.3,9–12
HLA alleles that are common in a given population may also cause the increased frequency of corresponding CTL escape variants. In particular, mutations with slower reversion times might be most likely to have a greater frequency in a given population. Slow reversion times could be caused by the lack of a strong detrimental impact on viral replication capacity, the presence of compensatory mutations that partially stabilize the escape variant, or amino acid changes that require more than one nucleotide change. Thus, in a transmission chain including several individuals with common HLA alleles, escape mutations that are slow to revert may be maintained at high frequency. Because a large proportion of HIV transmission is thought to occur during the acute stage of infection13 when there is little adaptive immune response in the host, preexisting escape mutations can conceivably be transmitted to an individual without the corresponding HLA alleles, but not have sufficient time to revert before being transmitted again to another individual. In this way, CTL escape variants could be maintained in a transmission chain involving some individuals without the corresponding HLA alleles.
Individuals with rare HLA alleles are more likely to be infected by viral variants with preexisting escape mutations to discordant HLA alleles, compared to individuals with common HLA alleles. After infection, viral variants with optimal fitness will dominate the viral population within the host, resulting in the loss of some preexisting escape mutations. In addition, the virus population is likely to evolve additional escape mutations in response to cellular immunity mediated by the rare HLA allele or alleles. Thus, we hypothesized that a greater amount of viral evolution will be necessary for the virus to achieve an adapted state in an individual with rare HLA alleles compared to those with common alleles, possibly providing additional time for the immune response to be effective during the early stages of infection.
A previous study compared the CTL responses to peptides targeted by HLA B*1503 from two populations, one in a subtype B-infected population where the HLA allele was rare and the other in a subtype C-infected population where the HLA allele was common.14 Subtype-specific sequence differences in the consensus peptide epitopes were found to be associated with the lack of recognition of several subdominant epitopes, suggesting that these differences were escape variants that had reached fixation in the population where the HLA allele was common.14 In the same study, HLA B*1503 was associated with lower viral load among the subtype B cohort where it was rare and no such association was found among the subtype C cohort where it was common, suggesting a protective effect for those from the population where the allele was rare. Another study found lower viral loads among individuals with rare HLA supertypes, suggesting that HIV had partially adapted to the common alleles, providing a selective advantage to those with rare HLA alleles.15 Furthermore, in Thailand, common HLA alleles were found to be associated with higher viral loads.16 Although these previous studies all suggest that rare HLA alleles may confer a selective advantage, no one has yet evaluated the underlying mechanism at the population level, that is, whether rare HLA alleles drive additional viral evolution. This study investigated the association of viral evolution and HLA frequency within a South African population using near full-length HIV-1 genome sequences.
The study participants were from KwaZulu Natal Province, South Africa and have been described separately.17 Previously reported full-length viral sequences18,19 from 272 members of the cohort were analyzed as part of a study approved by the University of Washington and University of KwaZulu Natal Internal Review Boards for which all participants gave informed consent. Four nonsubtype C and six hypervariable sequences were eliminated from further analysis. HLA class I genotypes were available on a larger subset of the South African participants (n = 1119).17 CD4 counts were available for 164 participants with both HIV-1 sequence and HLA genotype information.17
Sequences were aligned using ClustalW20 and MacClade version 4.08.21 Maximum likelihood trees were generated for gag, nef, env, pol, vpu, vpr, tat, rev, and vif gene sequences using PhyML.22 Tree parameters are given in Table 1 and the tree files are available at http://mullinslab.microbiol.washington.edu//publications/rousseau_2008_2/. Terminal branch lengths were tabulated from the Newick tree file using NewickTermBranch (available at http://indra.mullins.microbiol.washington.edu) and the sums of branch lengths to the MRCA were calculated using DistParser (available at http://indra.mullins.microbiol.washington.edu).
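To illustrate what tabulating terminal branch lengths from a Newick file involves, the following is a minimal Python sketch. It is not the NewickTermBranch tool itself, whose implementation is not described here. The function name `terminal_branch_lengths` and the toy tree are hypothetical, and the sketch assumes unquoted taxon names with a branch length on every tip.

```python
import re

# A tip (leaf) entry in Newick text follows "(" or ",", whereas an internal
# node's label and branch length follow ")". Matching only the first case
# therefore picks out terminal branches.
# Assumption: taxon names are unquoted and contain no Newick punctuation.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")


def terminal_branch_lengths(newick):
    """Return {taxon_name: terminal_branch_length} for a Newick string."""
    return {name: float(length) for name, length in _LEAF.findall(newick)}


if __name__ == "__main__":
    # Hypothetical four-taxon tree; lengths are in substitutions/site.
    toy_tree = "((A:0.05,B:0.09):0.02,(C:0.07,D:0.06)0.9:0.01);"
    for taxon, length in terminal_branch_lengths(toy_tree).items():
        print(taxon, length)
```

Internal branches such as `:0.02` and the support-labelled `0.9:0.01` are skipped because they follow a closing parenthesis.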
HLA (Class I; A, B, C) genotypes were determined for the 1119 infected individuals from South Africa.17 When only two-digit genotypes were resolved (n = 405), the four-digit genotypes were probabilistically inferred using HLA Completion (http://microsoft.com/science), a novel tool that predicts the probabilities of the missing HLA information based on haplotype frequencies in similar populations.23
Linkage disequilibrium at the HLA locus was identified using HLA Linkage Disequilibrium (http://www.hiv.lanl.gov/content/immunology/hla/hla_linkage.html) and two-digit HLA allele designations. Adjustment for multiple comparisons was made using a p-value significance cutoff equal to 0.05/(total number of tests).
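The multiple-comparison adjustment described above amounts to dividing 0.05 by the number of tests. A minimal Python sketch follows. The function names, allele-pair labels, and p-values are hypothetical placeholders, not values from the study.

```python
def bonferroni_cutoff(n_tests, alpha=0.05):
    """Per-test significance cutoff: alpha / (total number of tests)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_pairs(p_values, alpha=0.05):
    """Keep only the allele pairs whose p-value falls below the cutoff."""
    cutoff = bonferroni_cutoff(len(p_values), alpha)
    return {pair: p for pair, p in p_values.items() if p < cutoff}


if __name__ == "__main__":
    # Hypothetical linkage-disequilibrium p-values for four allele pairs.
    toy_p_values = {
        ("A-x", "B-x"): 0.0001,
        ("A-x", "C-x"): 0.03,
        ("A-y", "B-y"): 0.012,
        ("A-y", "C-y"): 0.4,
    }
    print(bonferroni_cutoff(len(toy_p_values)))
    print(significant_pairs(toy_p_values))
```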
Ten datasets were imputed using the HLA allele probabilities. Because each study participant had two HLA alleles (with corresponding population frequencies) at each locus yet only a single measure of viral evolution, generalized estimating equations were used to evaluate the relationship between viral evolution (either terminal branch lengths or distance to MRCA) and HLA frequency, adjusting for year, cohort, CD4 counts, HLA homozygosity, and the remaining five HLA allele frequencies. The generalized estimating equations involved the assumption of Gaussian distribution of branch length, an independent correlation structure, and the Huber/White/sandwich estimator of variance. Linear regression was used to evaluate the association of HLA super-type frequency and viral evolution, adjusting for year, cohort, CD4 counts, HLA homozygosity, and the remaining five HLA allele frequencies. Linear regression was also used to evaluate the association of the presence of each individual HLA-A allele and terminal branch length adjusting for year, cohort, CD4 counts, HLA homozygosity, and the remaining five HLA allele frequencies. No correction for multiple comparisons was made in the analyses including branch lengths.
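With a Gaussian outcome and an independent working correlation, the generalized estimating equation point estimates coincide with ordinary least squares, and the Huber/White/sandwich estimator supplies standard errors that allow for the two allele rows sharing one participant's branch length. The Python sketch below shows this for a single predictor. It is a simplified illustration, not the authors' analysis: it omits the covariate adjustments and the combination of results across the ten imputed datasets, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def fit_gee_independence(rows):
    """Fit y = b0 + b1*x with a cluster-robust (sandwich) variance.

    rows: iterable of (cluster_id, x, y); rows sharing a cluster_id are the
    two allele frequencies (x) of one participant, whose branch length (y)
    is repeated on both rows. Returns (b0, b1, robust_se_b1).
    """
    rows = list(rows)
    n = len(rows)
    sx = sum(x for _, x, _ in rows)
    sy = sum(y for _, _, y in rows)
    sxx = sum(x * x for _, x, _ in rows)
    sxy = sum(x * y for _, x, y in rows)

    # "Bread": inverse of X'X for the design matrix with columns [1, x].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("predictor has no variation")
    a, b, d = sxx / det, -sx / det, n / det  # inverse is [[a, b], [b, d]]

    # Least-squares coefficients (identical to independence-GEE estimates).
    b0 = a * sy + b * sxy
    b1 = b * sy + d * sxy

    # "Meat": outer products of the score vector summed within each cluster.
    score = defaultdict(lambda: [0.0, 0.0])
    for cid, x, y in rows:
        resid = y - (b0 + b1 * x)
        score[cid][0] += resid
        score[cid][1] += resid * x
    m00 = sum(s0 * s0 for s0, _ in score.values())
    m01 = sum(s0 * s1 for s0, s1 in score.values())
    m11 = sum(s1 * s1 for _, s1 in score.values())

    # Variance of b1 is the lower-right entry of bread @ meat @ bread.
    var_b1 = b * (b * m00 + d * m01) + d * (b * m01 + d * m11)
    return b0, b1, max(var_b1, 0.0) ** 0.5


if __name__ == "__main__":
    # Hypothetical data: (participant, allele frequency, terminal branch length).
    toy_rows = [
        (1, 0.01, 0.09), (1, 0.02, 0.09),
        (2, 0.05, 0.07), (2, 0.10, 0.07),
        (3, 0.12, 0.05), (3, 0.03, 0.05),
        (4, 0.08, 0.06), (4, 0.01, 0.06),
    ]
    print(fit_gee_independence(toy_rows))
```

A negative slope in such a fit corresponds to the inverse association between allele frequency and branch length that the study tests for.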
The length of the terminal branch in a maximum likelihood phylogeny indicates the amount of evolution that has taken place since the most recent ancestor, including the evolution that took place within the present host. The appearance of escape variants and reversions will increase the length of the terminal branches in a phylogenetic tree. In this study, the average terminal branch length ranged from 0.05 to 0.09 corrected substitutions/site (Table 2). The average estimated HLA-A genotype frequency was 0.01 (0.00001–0.12), HLA-B 0.01 (0.00001–0.12), and HLA-C 0.03 (0.00001–0.15). The HLA-A allele frequency was inversely associated with the length of the terminal branch from the maximum likelihood phylogenetic trees of env, nef, and pol (Table 3). Interestingly, all of the significant associations involved HLA-A. However, it is important to note that because HLA loci are clustered on the same chromosome, the alleles at these loci are often coinherited. Thus, the recovered associations may reflect linkage rather than the effect of individual HLA genes.
To determine if single alleles are driving these associations, the individual HLA-A alleles were evaluated for their relationship with external branch lengths in env, nef, and pol (Table 4). The impact of each allele on external branch length was evaluated in relation to allele frequency (Figure 1). Again, an inverse relationship was observed between HLA allele frequency and external branch length. Furthermore, two HLA-A alleles were found to be in linkage disequilibrium with HLA-B and HLA-C alleles (Table 4).
The distance to the most recent common ancestor of the entire population, likely encompassing evolution throughout the duration of the subtype C epidemic, was not predicted to be influenced by the HLA from the most recently infected individual. And as expected, no significant association was found with HLA frequency and distance to the MRCA for any of the trees generated from each HIV gene.
HLA alleles can be grouped into supertypes24,25 based on the recognition of common motifs. HLA supertype frequency was not found to be associated with terminal branch length for any HIV gene, suggesting that this grouping, although increasing the power of the analysis by increasing sample size for each association evaluated, may have introduced misclassification bias by grouping HLA alleles with different impacts on HIV evolution, driving the association toward the null hypothesis.
Under the hypothesis of heterozygote advantage,26 individuals with heterozygous HLA alleles would select for a greater amount of viral evolution than homozygous individuals. Indeed, homozygosity at HLA-B was associated with shorter branch lengths in gag (two-sided t-test p<0.0006), and homozygosity at HLA-A with shorter branch lengths in tat (two-sided t-test p<0.012).
In general, CD4 counts decrease as the disease progresses. Thus, it would be expected that branch lengths would be inversely associated with CD4 counts, because the virus continues to evolve throughout the course of infection. Indeed, we observed that CD4 counts were inversely associated with vif (p<0.008) and rev (p<0.031) external branch lengths. Similarly, CD4 disease stage categories were associated with vif and rev external branch lengths (Table 5). However, viral load demonstrated no association with viral evolution in any HIV gene.
In our study, HLA allele frequency was found to be inversely associated with HIV-1 terminal branch lengths for env, nef, and pol, suggesting that rare HLA alleles may be associated with additional viral evolution in these HIV-1 genes. This supports previous reports suggesting that HIV escape variants can be transmitted1,27–31 and may become common in the interhost viral population.14 Furthermore, some escape variants can impart a replication cost to the virus3,9–12 and these variants may be lost during early stages of infection.1,32–34 In fact, the proportion of such events has been approximated in previous studies and ranges from 8 to 42%.32,34 Together these investigations suggest that the frequency of the HLA allele in the host population impacts the evolution of HIV by influencing the amount of escape and reversion during early stages of infection. Genetic drift also affects viral evolution but, by definition, it is not impacted by the immune response. The passage of time is associated with genetic drift and for this reason we adjusted for time in our multivariate analysis, reducing the potential for confounding due to this variable. Furthermore, the duration of infection will also impact the external branch lengths, due to genetic bottlenecking that takes place at the time of transmission. For this reason, we adjusted for CD4 count as a measure of disease progression. Neither genetic drift nor duration of infection was likely to be associated with HLA allele frequency, reducing the likelihood that these variables confounded our results.
One of the most daunting challenges in the design of an effective vaccine against HIV-1 is the enormous amount of genetic diversity observed among sequences identified worldwide. Developing a vaccine that targets elements common to all HIV-1 infections is a challenging task. One approach has been to identify sequences present in early infection with the hope that the factors selecting for the transmitted variants will result in common sequences or sequence patterns that can be incorporated into effective immunogens. An encouraging idea is that viruses undergo a procession toward an ancestral state of higher fitness prior to the evolution of escape mutations.35 A vaccine targeting this ancestral state, theoretically common to all infections, could prevent the virus from achieving high replication capacity very early after transmission, possibly resulting in eradication of the virus or reduced disease progression. Our study suggests that individuals with rare HLA alleles will have a greater number of evolutionary events, possibly due to increased reversions toward this ancestral state. Thus, sequences from individuals with rare HLA alleles may optimally assist in the identification of this ancestral state.
Viruses infecting individuals with rare HLA alleles would likely have both reduced replication capacity due to preexisting escape mutations as well as susceptibility to the immune response because the preexisting escape mutations would not protect against the immune response mediated by the rare HLA allele. This could lead to reduced viral load among those with rare HLA alleles. Supporting this hypothesis, two recent studies have identified an indirect association of viral load with the presence of transmitted escape variants selected for by discordant HLA earlier in the transmission chain.8,36 While we were able to detect an association between HLA frequency and viral evolution, no significant association was found in our analysis when HLA frequency was evaluated in relation to viral load. We may not have observed an association in this study because the difference in viral load among those with and without the escape variants may be most striking during the initial stages of infection. Because our study was cross-sectional, the majority of the study participants were likely to be in the asymptomatic stage of the infection. In addition, different HLA alleles have been shown to be associated both with increased and decreased disease progression,37 suggesting that the HLA-mediated immune response may have a complex set of effects, preventing the observation of an overall association of HLA-driven evolution with disease outcome. These and other unidentified misclassifications could explain why the clinical impact of a putative advantage of rare HLA alleles, possibly detectable in prospective analyses, may have been obscured in this cross-sectional study.
Although we evaluated the full HIV proteome, including all nine genes, no correction for multiple comparisons was performed because these genes were not independent, complicating the identification of an appropriate test. Given a p-value cutoff of 0.05 and the conservative Bonferroni method of correction for multiple comparisons, one would expect ≤1 association due to chance. Because our results identified three of nine associations, we conclude that our analysis provides significant evidence that population HLA class I allelic frequency has a substantial impact on viral evolution. Furthermore, this analysis evaluated a linear relationship between HLA frequency and HIV evolution as a logical first-order approximation and not necessarily the best approximation of the true relationship between these variables.
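The chance-expectation argument above can be made concrete with a short calculation. The Python sketch below treats the nine gene-level tests as independent, which the authors note is not strictly true, so the binomial tail probability is only a rough guide. The function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests passing the cutoff by chance alone."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha=0.05):
    """P(at least k of n_tests are significant by chance).

    Assumption: the tests are independent, so the count of chance
    associations follows a Binomial(n_tests, alpha) distribution.
    """
    below = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Nine HIV-1 genes tested at a 0.05 cutoff; three associations observed.
    print(expected_false_positives(9))
    print(prob_at_least(3, 9))
```

Under the independence assumption, fewer than one chance association is expected among nine tests, and observing three or more by chance has a probability below 0.01.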
The three genes identified in this study are likely to reflect those with the strongest associations, allowing their identification in this analysis. It is not surprising to find Env evolution associated with HLA frequency, because this highly variable protein may allow substantial epitope escape in response to HLA-mediated immunity without substantial loss of viral fitness. Because Nef may be most targeted by the CTL response38 and may have the most amino acid variation driven by HLA selection pressure,19 it is also not surprising to find that Nef is likely to demonstrate differential evolution in response to varying CTL-based selection pressures. Finally, Pol, neither highly variable nor the most targeted protein, is an unanticipated result of this analysis. Pol is highly conserved and may not tolerate variation without loss of function, allowing variation only in the presence of strong CTL selection. In this case, a large proportion of the most recent evolution in Pol should be due to evolution toward a more fit form.
It remains unclear why we found associations with HLA-A only. However, this does not rule out the possibility that CTL responses mediated by HLA-B and HLA-C are involved. Because the HLA genes are located in the same genomic region, they cosegregate, particularly HLA-A and HLA-B due to their proximity. In fact, two HLA alleles found to be associated with phylogenetic branch length also had significant linkage disequilibrium with HLA-B and HLA-C alleles (Table 4). Thus the associations with HLA-A may simply reflect linked associations with the other HLA genes.
There are several limitations to this study that may have prevented the identification of associations with some HIV-1 proteins. As mentioned earlier, this was a cross-sectional study and the precise time of infection was unknown. Longitudinal analysis including extensive viral sequencing during the acute stage of infection may be required to thoroughly evaluate viral evolution in response to HLA-mediated immunity. Because this ideal dataset is currently unavailable, our phylogenetic analysis was the best method to approximate those sequence changes. In addition, the external branch lengths we observed likely included evolutionary events that occurred in prior hosts, diluting our ability to identify an association between divergence and HLA frequency. Furthermore, HIV-1 evolves due to other factors, including other adaptive immune responses and protein function, possibly reducing associations evaluated with the CTL response. Finally, the sample size, although large for the extensive and high quality sequence data generated, may be too small to distinguish the impact of the CTL response on viral evolution in every HIV-1 gene when multiple factors are involved. These potential limitations are likely to bias our results toward the null hypothesis, making the identification of significant associations even less likely. Thus, our findings may be limited to only the strongest associations.
We have shown differential evolution of HIV-1 based on HLA frequency, suggesting additional HIV evolutionary events among individuals with rare HLA alleles. These findings support the idea that viral evolution undergoes a regressive path, reverting to states of greater fitness, prior to forward evolution driven by escape from the CTL response and other factors.35 These findings also suggest that CTL vaccine designs based on consensus sequences may include multiple escape variants to common HLA alleles, potentially reducing their effectiveness. Finally, focusing on infected individuals with rare HLA alleles may allow the identification of reversion variants that can be used as candidates in HIV-1 vaccine designs.
The authors would like to thank all of the participants who contributed time, information, and blood samples for the study. Funding for this research was provided by University of Washington Center For AIDS Research (AI27757), including a New Investigator Award (AI047734) to C.M.R., by Puget Sound Partners in Global Health (to C.M.R.), and by Microsoft Research (to J.L., C.K., and D.E.H.). This work was previously presented in part at the 14th Conference on Retroviruses and Opportunistic Infections, 2007.
No competing financial interests exist.