CTL responses to HIV-1 infection can drive the evolution of escape mutations, reducing the effectiveness of the immune response and its control of disease progression. However, these escape mutations can be counterbalanced by viral fitness constraints. A vaccine that targets both the fittest viral variants and their escape mutations might be effective enough to drive viral evolution toward less-fit states and slow disease progression (50), potentially improving the quality and duration of life for those infected while lowering the rate of transmission. Thus, our study and other recent work have sought to identify features of the viral proteome that might be especially important targets for vaccine-elicited immune responses.
Our approach to identifying critical immunologic features of the viral proteome was to identify associations between the HLA alleles that restrict the CTL response and the amino acid variants found in the viruses infecting these individuals, as well as their plasma viral loads, which we used as a surrogate for disease status. Elements of this approach derive from earlier studies that linked HLA alleles with specific amino acids (39), refined these associations using phylogenetic correction (8), and associated specific HLA alleles with specific viral proteins or with viral load (21). Our study extends prior efforts to define HLA-amino acid associations using these tools (12) to the analysis of whole viral proteomes and of HIV-1 subtype C, the most common subtype worldwide.
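To make the basic unit of this analysis concrete, the sketch below tests one HLA allele against one residue with a 2 × 2 Fisher's exact test and labels the residue by the direction of the association. It is a deliberate simplification: it omits the phylogenetic correction, linkage disequilibrium, and viral-load components of our analysis. The labelling rule is an assumption made for illustration. Under that rule, a residue enriched among allele carriers is called "resistant" and a depleted one "susceptible". The function names and counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals a, given fixed table margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: allele carriers / non-carriers; columns: residue present / absent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    return sum(
        p
        for p in (hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )


def classify_residue(a, b, c, d, alpha=0.05):
    """Hypothetical labelling rule: 'resistant' if the residue is enriched among
    allele carriers, 'susceptible' if depleted, None if not significant.
    No phylogenetic correction is applied here."""
    p = fisher_two_sided(a, b, c, d)
    if p >= alpha:
        return None, p
    # a/(a+b) > c/(c+d) is equivalent to a*d > b*c (avoids division).
    return ("resistant" if a * d > b * c else "susceptible"), p


# Hypothetical counts: 20 carriers and 40 non-carriers of one allele.
print(classify_residue(18, 2, 10, 30))  # residue enriched in carriers
print(classify_residue(2, 18, 30, 10))  # residue depleted in carriers
```

A plain contingency test like this treats every sequence as independent. Sequences that share ancestry violate that assumption, which is why phylogenetically corrected methods are preferred.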
Specific amino acid changes associated with HLA alleles were identified and placed into the context of known HLA epitopes and epitope motifs, taking HLA linkage disequilibrium into account. We identified amino acid residues that were predicted to be resistant (n = 244) or susceptible (n = 314) to the HLA class I-mediated immune response. We predicted that susceptible variants were likely to reflect reversions driven by selective pressure for increased viral fitness. We also compared plasma viral loads among individuals carrying the corresponding HLA allele according to whether the susceptible amino acid was present in their viral sequence. At some sites, changes away from the susceptible residue were significantly associated with lower viral loads. These sites include the well-studied epitope TSTLQEQIGW (TW10 [Gag240]), previously shown to impact viral fitness (11). In total, we identified seven susceptible residues, escape from which was associated with lower viral loads, suggesting a fitness cost of immune escape. Several factors were unknown in this analysis and likely biased the results toward false negatives. These include the sequence of the transmitted strain (or strains), the timing of the CTL response, the presence of compensatory mutations, and coinfections that affect viral load (such as malaria). Furthermore, some associations may have gone undetected because of limited statistical power. Thus, these seven residues probably represent those with the most robust associations. These results support the idea that a vaccine inducing CTL responses to such epitopes, alone or in combination, may be effective in reducing viral loads.
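The viral-load comparison can be illustrated with a rank-based test. The sketch below uses a two-sided Mann-Whitney U test with a normal approximation. That choice is an assumption made for illustration, not necessarily the test used in this study, and the log10 viral loads are hypothetical. The sketch also ignores the confounders listed above, such as the transmitted strain and coinfections.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p), where U counts the (x, y) pairs in which the x value is
    larger (ties count 1/2). The variance is not corrected for ties, so this
    is only a rough sketch for small, tie-free samples.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical log10 viral loads among carriers of one HLA allele.
escaped = [4.1, 3.8, 4.0, 3.5, 3.9, 4.2]      # susceptible residue absent
susceptible = [4.9, 5.1, 4.6, 5.3, 4.8, 5.0]  # susceptible residue present
u, p = mann_whitney_u(escaped, susceptible)
print(f"U = {u}, p = {p:.4f}")
```

A rank-based test is a natural fit because viral loads are skewed even on a log scale. With real data, a library implementation that handles ties and small-sample exact p-values would be preferable.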
Although the constituents of an effective vaccine immunogen remain elusive, the results of this study suggest that some viral proteins are more important than others in eliciting suppressive, if not protective, immune responses. In particular, our results suggest a ranked hierarchy of proteins based on the fitness costs associated with immune escape. Vpr, Gag, and Rev were at the top of the list, suggesting that these proteins are best able to elicit immune responses that decrease viral fitness. In contrast, immune responses to Vpu, Env, and Tat (at the other end of the list) were largely ineffective. The high abundance of Gag relative to other HIV-1 proteins (25) makes it a logical vaccine candidate. Thus, this study, together with several previous reports, converges on Gag, or elements of Gag, as an important component of a vaccine immunogen. In a prior study, Gag was the only protein for which CTL targeting was associated with lower viral loads (36). Gag p24 has also been reported to be the preferred target of HLA alleles associated with protection from disease progression (9). Other studies have shown that the proportion and magnitude of the CTL response to Gag were associated with viral load (13) and that CTL escape mutations in Gag were associated with reduced viral fitness (2). Gag also contains a large number of conserved peptide elements (59) that are less likely to vary in response to immune pressure. Thus, the structural and functional conservation of Gag probably means that variation in most of its CTL epitopes is poorly tolerated, making it a promising gene candidate for a CTL-based vaccine. Vpr and Rev were also among the proteins with the highest fitness costs of immune escape, which emphasizes the importance of considering these auxiliary proteins as vaccine components.
This study also supports the observation that host HLA alleles have differential impacts on the virus. HLA-B alleles, followed by HLA-C alleles, were most commonly associated with amino acid changes and had a greater proportion of significant associations than HLA-A alleles. Our work confirms several previous studies that identified the importance of HLA-B in driving HIV-1 evolution and affecting disease (12), and it also underscores the importance of HLA-C in HIV-1 disease. Recently, a genome-wide scan found a polymorphism upstream of the HLA-C gene to be associated with the viral set point (19). HIV-1 downregulates HLA-A and HLA-B expression, but not HLA-C expression (1). In addition, HLA-C-restricted CTLs can have antiviral activity similar to that of HLA-A- and HLA-B-restricted CTLs (1). For both reasons, it may be especially worthwhile to consider HLA-C-restricted epitopes when selecting peptides for a CTL-based vaccine.
HLA population frequencies are also critical to consider in vaccine design. In a previous study, HLA-B*1503 was associated with lower viral loads in a clade B-infected population, in which B*1503 was rare, but not in a clade C-infected population, in which B*1503 was common (24). Frahm et al. (24) concluded that fixation of escape mutations in subdominant epitopes caused the lack of response in the subtype C cohort. We did identify an association of HLA-B*1503 with low viral load in this subtype C cohort. Indeed, the resistant (escape) amino acid identified in our study was present in the consensus C peptides used by Frahm et al. This would explain the poor response and is consistent with their hypothesis. The finding underscores that any viral sequence used to detect CTL responses, including consensus sequences, may encode escape variants and thus preclude detection of cognate epitopes. More advanced approaches, such as toggle design (23) and related methods (26), are needed to reliably assess the total breadth of immune responses. We also found that identical epitopes had different susceptible and resistant variants in the subtype B and subtype C data sets, implying that escape mechanisms differ by subtype. HLA frequencies in each population may influence the fixation of viral variants, and the viral subtype might also influence immune escape mechanisms. Different sets of epitopes may therefore need to be considered for each subtype or each distinct population.
This study employed three distinct methods for identifying HLA-amino acid associations, each incorporating phylogenetic correction. Each method contributed novel associations. Known epitopes served as a proxy for a gold standard, since not all epitopes have been identified, and against them all three methods had similar predictive capacities, yet their predictions did not overlap completely. This was partly because each method captures certain types of associations better than others. For example, MLL identified more details about the associations at each site, whereas MLF identified more sites. Although the parsimony method identified fewer sites, those sites had greater validation support. Another explanation for the lack of overlap among the three methods is that each identifies only a small proportion of the true associations, supporting the idea that the union of the three methods provides the most comprehensive set of associations.
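The comparison of method outputs described above amounts to set bookkeeping. The sketch below computes the union and the shared core of three association sets and the fraction of each set that falls in a known epitope. The method labels MLL, MLF, and parsimony come from the text. The function name, the (allele, position) encoding, and all data are hypothetical, and the methods themselves are not reproduced.

```python
def summarize_methods(method_hits, validated):
    """Compare per-method association sets against a validation set.

    method_hits: dict of method name -> set of (allele, position) associations.
    validated: set of (allele, position) pairs lying in a known epitope
    restricted by that allele (an incomplete proxy for a gold standard).
    Returns (union, shared, support), where support maps each method, and the
    union, to the fraction of its associations with validation support.
    """
    sets = list(method_hits.values())
    union = set().union(*sets)
    shared = set.intersection(*sets)
    support = {name: len(hits & validated) / len(hits)
               for name, hits in method_hits.items() if hits}
    support["union"] = len(union & validated) / len(union) if union else 0.0
    return union, shared, support


# Toy, hypothetical data: allele labels and positions are placeholders.
validated = {("allele_1", 10), ("allele_2", 25)}
hits = {
    "MLL": {("allele_1", 10), ("allele_1", 14), ("allele_3", 40)},
    "MLF": {("allele_1", 10), ("allele_2", 25), ("allele_3", 40), ("allele_4", 7)},
    "parsimony": {("allele_1", 10), ("allele_2", 25)},
}
union, shared, support = summarize_methods(hits, validated)
print(len(union), sorted(shared), support)
```

Because the validation set is incomplete, a low support fraction does not by itself mark an association as a false positive. The union can have lower support than the most conservative method while still capturing more of the true associations.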
To determine which associations were true positives, we used a validation set of all known epitopes. However, this validation data set, although it represents the best available information, was not an ideal standard for comparison, for several reasons. We expect that more than 50% of CTL epitopes have not been experimentally identified (41). Uncertainty in epitope motifs and nonoptimized epitopes can also lead to incorrect support of an association. Known epitopes are also biased toward subtype B (49). Finally, compensatory mutations, which can involve several amino acids or intermediate steps (16), were not considered in the validation data set. Thus, associations without validation support could represent novel epitopes, compensatory mutations, or false positives. For this reason, we did not exclude any of our significant associations based on lack of validation support.
Because we did not know the exact sequence of the transmitted strain in each infected individual in our study, we relied on phylogenetic analyses to infer the evolutionary history of the viral populations. This inference may have introduced error that was likely to bias the results toward the null hypothesis, producing false-negative associations. The length of the terminal branch (averaging 0.047 mutations/site in our study) reflects the amount of evolution since the most recent ancestor. It may also reflect the time elapsed in the transmission chain from the inferred ancestor to the terminal sequence. This time was estimated to average 10 years, assuming a molecular clock, a rate of evolution of 2.5 × 10⁻⁵ mutations/site/generation, and a generation time of 2 days (51). This estimate is likely inflated by recombination. Undetected transmission events may have occurred during this time, potentially increasing the number of false-positive associations detected by MLF and parsimony. However, the MLL method took branch lengths into account and performed neither better nor worse than MLF or parsimony. This suggests that the time elapsed since the most recent ancestor of each sequence did not adversely affect our ability to identify evolving sites within known epitopes.
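The 10-year figure follows from dividing the average terminal branch length by the per-generation rate of evolution and converting generations to years. The minimal sketch below reproduces that arithmetic. It assumes a strict molecular clock, a branch length measured in mutations per site, a 365.25-day year, and the parameter values quoted above. The function name is hypothetical.

```python
def years_since_ancestor(branch_length, rate_per_generation, generation_time_days):
    """Strict-clock estimate of elapsed time along a branch.

    branch_length: mutations per site; rate_per_generation: mutations per site
    per generation; generation_time_days: length of one viral generation.
    """
    generations = branch_length / rate_per_generation  # about 1,880 here
    return generations * generation_time_days / 365.25  # days -> years


years = years_since_ancestor(0.047, 2.5e-5, 2)
print(f"{years:.1f} years")  # prints "10.3 years"
```

Elapsed time scales linearly with branch length. Any process that lengthens the inferred terminal branches, such as recombination, therefore inflates the estimate proportionally.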
In summary, we identified HIV-1 amino acids that evolve in response to HLA-mediated cellular immunity in a South African population infected primarily with subtype C. These include seven susceptible residues, escape from which was associated with lower viral loads. Moreover, we ranked proteins by the fitness cost of immune escape and, based on this ranking, recommend including Gag, Vpr, and Rev as vaccine components. Our analysis also showed the importance of HLA-B and HLA-C in driving HIV-1 evolution. The information from this study can guide follow-up analyses that characterize CTL epitopes and viral fitness for consideration in a CTL-based vaccine. Although this study does not prove that an effective CTL-based vaccine is achievable, it provides encouragement for future research in this area.