The high diversity of HLA binding preferences has been driven by the sequence diversity of short segments of relevant pathogenic proteins presented by HLA molecules to the immune system. To identify possible commonalities in HLA binding preferences, we quantify these using a novel measure termed “targeting efficiency,” which captures the correlation between HLA-peptide binding affinities and the conservation of the targeted proteomic regions. Analysis of targeting efficiencies for 95 HLA class I alleles over thousands of human proteins and 52 human viruses indicates that HLA molecules preferentially target conserved regions in these proteomes, although the arboviral Flaviviridae are a notable exception where nonconserved regions are preferentially targeted by most alleles. HLA-A alleles and several HLA-B alleles that have maintained close sequence identity with chimpanzee homologues target conserved human proteins and DNA viruses such as Herpesviridae and Adenoviridae most efficiently, while all HLA-B alleles studied efficiently target RNA viruses. These patterns of host and pathogen specialization are both consistent with coevolutionary selection and functionally relevant in specific cases; for example, preferential HLA targeting of conserved proteomic regions is associated with improved outcomes in HIV infection and with protection against dengue hemorrhagic fever. Efficiency analysis provides a novel perspective on the coevolutionary relationship between HLA class I molecular diversity, self-derived peptides that shape T-cell immunity through ontogeny, and the broad range of viruses that subsequently engage with the adaptive immune response.
Human leukocyte antigen (HLA) molecules and viruses are thought to be locked in an evolutionary arms race, where viruses adapt to evade HLA-restricted immune responses and HLA alleles evolve to optimize the fitness of human populations in the face of a wide range of pathogen species as well as the genetic variation within each pathogenic species. HLA diversity has been driven and maintained by heterozygote advantage (25), which is most evident in geographical regions with greater pathogen diversity (51), and by frequency-dependent selection, in which low-frequency allelic variants gain advantage in an environment of shifting pathogen selection (58). In turn, the selective pressures of HLA-restricted immune responses on pathogens are evident in a range of immune evasion strategies employed by viruses and encoded in their genomes, such as the ability of large DNA viruses (e.g., herpesviruses) to “hide” by inhibiting antigen presentation (61) and mimicking host peptides (39, 60) or the ability of RNA viruses to “run” through rapid evolution of genetic diversity (22, 35, 40, 43, 52, 53).
We along with others have explored the rapid viral adaptation to HLA-restricted immune responses using sequence analyses and have detected statistically significant associations between host HLA alleles and specific amino acid polymorphisms of human immunodeficiency virus (HIV) and hepatitis C virus (HCV) (4, 8, 28, 29, 43, 62). These findings have informed and directed experimentation which has, for example, confirmed that some of these HLA allele-specific viral polymorphisms are due to abrogation of HLA binding or peptide processing (17, 28, 29, 34, 62). In contrast, there is a paucity of direct evidence linking HLA evolution to the selective pressure of pathogens as the reproductive advantage for humans operates on a long timescale (5). Limited direct evidence from a set of 34 oncoproteins and HIV Nef suggests that HLA alleles might preferentially target evolutionarily conserved peptides (12, 23). As functionally important sites on proteins tend to be evolutionarily conserved (12, 26, 64), immune surveillance of conserved ligands focuses immune resources to genomic areas in humans and pathogens where mutations might alter function (26, 57) or incur a fitness cost (28, 29, 66).
The recent availability of large curated databases of genetic sequences has aided in the investigation of evolutionary relationships between human and pathogen genetic diversity. These databases enable studies of evolutionary conservation using sequence variation (2). In addition, the experimental determination of tens of thousands of HLA binding affinity measurements (48) has allowed robust estimation of binding affinities for a wide range of HLA-peptide combinations (38). These data allow direct investigation into the relationship between HLA binding and target sequence conservation, as well as into the differences in these patterns across viral species and different HLA alleles.
Here, we examined the relationships between HLA class I molecules and a large selection of pathogen-derived and self-derived HLA peptide ligands, comparing the likelihood that a given HLA molecule binds a given peptide with the relative conservation of that peptide sequence (Fig. 1A). We term the tendency of a given HLA molecule to bind to conserved regions of a protein its “targeting efficiency.” Using this approach, we explored the variability of HLA alleles in their ability to target conserved regions of human and viral proteins. Finally, we explored the functional relevance of HLA targeting efficiency and found that targeting efficiency is associated with improved disease outcome for HIV infection and dengue virus and accounts in part for interindividual variation in HIV viral load in predictive models.
These findings suggest that the relationships between HLA binding preference and evolutionary conservation of target sequences provide a central basis around which balancing selection of both host and pathogen genetic diversity may be better understood, as first proposed by Hughes and others (23, 64). Our interspecies approach is complementary to previous intraspecies studies of HLA-allele specific viral polymorphisms (27, 60, 63), which have more statistical power in the variable than the conserved elements of pathogen genomes, and provides a novel tool with which HLA pathogen coevolution can be examined.
Conservation scores for an analyzed protein were computed using the ConSeq server (http://conseq.tau.ac.il/), which estimates the conservation score C(i) for each protein site i using a phylogenetic tree built from a set of homologous sequences. The tree is used to infer the evolutionary rates (log probabilities of substitution) for each site along the given protein (2) (Fig. 1A). Conservation scores were computed only for proteins which had at least five homologous proteins in the UniRef100 database (release 11.0). For computational efficiency, no more than 100 homologues were used. Homologous proteins were aligned using the MUSCLE program, version 3.6. The site conservation scores are computed in the context of the entire protein as ConSeq uses aligned homologues to compute the phylogeny in which the targeted protein resides. This analysis also incorporated adjustment for variables that may affect P values, including various numbers of protein homologues in conservation score computations, as well as various protein length distributions. Comparisons of the effects of using inter- versus intraspecies homologous proteins for estimating the evolutionary rates are provided for HIV in Table S6 in the supplemental material.
The binding scores are based on experimental measurements characterizing individual HLA-peptide interactions, as catalogued in the Immune Epitope Database (IEDB) (48, 50), as well as known HLA-peptide binding configurations (32). Binding energies of HLA-peptide complexes were systematically estimated using the adaptive double-threading (ADT) approach (32), a structure-based method for estimating the binding energy of a major histocompatibility complex (MHC)-peptide complex. The method estimates the 50% inhibitory concentrations (IC50s, a measure of binding affinity) after threading both target peptides and HLA proteins (in particular, the known contact residues shown in red on the HLA structure in Fig. 1A) onto solved HLA-peptide complex structures. Here, we have focused on 9-mer peptide targets as the vast majority of known HLA class I epitopes are of this length.
The model parameters were fit to log IC50s obtained from the IEDB for ~34,000 experiments covering 35 HLA-A and HLA-B alleles. We excluded all HIV epitopes from these training data in order to avoid a possible bias in the analyses of HIV viral load data. The ADT model can provide estimates for HLA molecules other than the limited number on which it was trained by threading an arbitrary HLA sequence onto the structure of a similar HLA protein and using the estimated model parameters, which generalize across the entire HLA allelic family. We analyzed 95 HLA-A and HLA-B alleles from a Caucasian population in Australia (43), a cohort which provided a total coverage of 95% of HLA-A alleles and 90% of HLA-B alleles in Europe. We note that empirical binding data were available for only 35 of these 95 alleles. Because the IEDB lacks sufficient experimental measurements for HLA-C alleles, we did not consider these alleles in the current study. The binding score at a given position along the protein is the sum of binding energies for nine overlapping peptides, which measures the probability that the site will be visible to immune surveillance. The binding energy model provides an energy estimate, Ea(e) = Ea(e1,e2,…,e9), where a denotes the index of an HLA allele and e = (e1,e2,…,e9) is the 9-mer peptide. The model is fit to the logarithm of the IC50 measurements for different allele-peptide combinations. Therefore, the probability of peptide presentation is proportional to exp(−Ea(e)), and high energy indicates low presentation probability and vice versa. In order to estimate the log probability of presentation of a single site in a protein, presentation probabilities of all peptides straddling that site need to be considered. As an estimate that is robust to prediction errors, we define the binding score B(i) for the i-th amino acid in the sequence s = (s1,s2,…,sN) of an arbitrary N-long protein whose segments may be presented by an HLA molecule as the sum of the binding energies of the nine overlapping 9-mers that straddle site i.
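The per-site binding score described above, a sum of predicted binding energies over the overlapping 9-mers that straddle a site, can be sketched as a sliding-window computation. This is a minimal illustration and not the authors' implementation: the function name `binding_scores`, the `energy` callback (standing in for a per-allele predictor such as ADT), and the handling of sites near the protein termini (which are covered by fewer than nine 9-mers) are all assumptions. The sketch returns the raw energy sum; whether the published score negates it so that higher values mean stronger binding cannot be recovered from the text here.

```python
from typing import Callable, List


def binding_scores(sequence: str,
                   energy: Callable[[str], float],
                   k: int = 9) -> List[float]:
    """Sum predicted binding energies over every k-mer containing each site.

    `energy` is a hypothetical stand-in for an allele-specific predictor that
    maps a k-mer peptide to an estimated binding energy.
    """
    n = len(sequence)
    if n < k:
        raise ValueError("sequence is shorter than the peptide length")

    # Energy of the k-mer starting at each valid 0-based offset.
    window_energy = [energy(sequence[j:j + k]) for j in range(n - k + 1)]

    scores = []
    for i in range(n):
        first = max(0, i - k + 1)   # earliest k-mer that still covers site i
        last = min(i, n - k)        # latest k-mer that starts at or before i
        # Assumption: terminal sites simply sum over the fewer k-mers available.
        scores.append(sum(window_energy[first:last + 1]))
    return scores
```

With a toy predictor that returns 1.0 for every peptide, interior sites of a 20-residue sequence receive a score of 9.0 (nine straddling 9-mers) and the first and last residues receive 1.0, which makes the edge behaviour easy to inspect before plugging in a real predictor.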
The predicted binding energies are highly correlated with the true experimentally measured binding energies (Spearman correlation of >0.75), and for some alleles the accuracy of prediction for our method and for other prediction methods (see, for example, references 3, 7, 16, 31, 44, 45, and 49) is believed to be within the accuracy of the IC50 measurement error. A recent analysis of various prediction methods can be found in Nielsen et al. (46).
The allele efficiency score r is defined as the Spearman correlation coefficient of the binding score and the conservation score for a given protein. A positive score indicates preferential targeting of conserved regions, and a negative score indicates preferential targeting of variable regions.
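As a concrete illustration of this definition, the sketch below computes a Spearman rank correlation between two per-site score vectors using only the standard library. The function names are hypothetical, ties are handled by average ranks (a common convention that the text does not specify), and both inputs are assumed to be oriented so that larger values mean stronger predicted binding and higher conservation, respectively.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def targeting_efficiency(binding: Sequence[float],
                         conservation: Sequence[float]) -> float:
    """Spearman correlation of per-site binding and conservation scores."""
    if len(binding) != len(conservation) or len(binding) < 2:
        raise ValueError("need two equal-length vectors with at least 2 sites")
    rb, rc = average_ranks(binding), average_ranks(conservation)
    mb, mc = sum(rb) / len(rb), sum(rc) / len(rc)
    cov = sum((x - mb) * (y - mc) for x, y in zip(rb, rc))
    var_b = sum((x - mb) ** 2 for x in rb)
    var_c = sum((y - mc) ** 2 for y in rc)
    if var_b == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a constant vector")
    # Pearson correlation of the ranks equals the Spearman coefficient.
    return cov / sqrt(var_b * var_c)
```

Under these assumptions, a protein whose binding and conservation profiles rise and fall together yields a score near +1, and one in which strong binding coincides with variable sites yields a score near −1.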
Statistical comparisons were corrected for the potential lack of independence of measurements resulting from the hierarchical structure of the groups, such as correlations generated by similarities within viral families and within HLA supertype classes. The sampling procedure for these analyses, along with a detailed analysis of potential sources of bias in our analysis and a detailed discussion of the statistical power of the methods used, is available in the supplemental material.
We obtained binding data for training the HLA binding predictor from the IEDB resource, consisting of experimentally measured binding affinities (IC50s) for ~34,000 HLA-peptide pairs, spanning 35 HLA-A and HLA-B alleles. For the viral targeting analysis, we collected the sequences of the most commonly studied human and plant viruses from the NCBI virus database. A full list is provided in Table S2 in the supplemental material. NCBI GenBank accession number information for individual viral proteins is available in Table S8 in the supplemental material. We based our analysis of the human proteins with disease-associated mutations on a version of the Online Mendelian Inheritance in Man (OMIM) database (McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, and NCBI, National Library of Medicine, Bethesda, MD, as of February 2007 [http://www.ncbi.nlm.nih.gov/omim]). A control group of randomly sampled proteins was obtained from the UniRef50 database, release 11.0 (http://www.pir.uniprot.org/search/textSearch_NR5.shtml), which contains representatives of natural protein clusters with >50% interclass similarity. The random sample did not include any human proteins. BLAST queries were performed against the UniRef100 database.
A total of 200 Western Australian HIV Cohort participants provided samples with consent for HLA typing. HLA class I genotyping was resolved to the four-digit level based on the sequence of exon 2 and 3 (exon 2/3) and using standard sequence-based typing (SBT). Allele and heterozygote ambiguities between alleles not identical in exon 2/3 were resolved using alternative primers.
In order to test the hypothesis that there is a correlation between conservation and binding patterns in proteins known to interact with HLA molecules, we examined the relationship between the efficiency scores of 95 HLA-A and HLA-B alleles (which account for more than 90% of all HLA alleles in European Caucasians) for target proteins spanning 4,761 human proteins derived from the OMIM database and 52 human viruses (see Table S2A in the supplemental material). We also examined these same correlations in 70 plant viruses (see Table S2B), as well as a control sample of 3,800 random proteins obtained from the UniRef50 database expecting that we would observe significant, but weaker, efficiency scores because of the sharing of conservation patterns between the proteins that typically do not interact with the immune system and the human proteins and human viruses that do. These control groups were also used to ascertain the significance of targeting efficiencies and variations in targeting efficiencies on human proteins and human viruses.
In order to verify that our conclusions were not dependent on the use of a specific predictor, we also used the NetMHCpan online prediction method (3) to replicate our results on Gag efficiency scores and HIV viral load correlations. Negative correlations with viral load were also obtained using this method (data not shown). In addition, we also conducted an analysis of HIV efficiency scores based on experimentally determined T-cell epitope maps and compared this analysis to the one obtained using predicted epitope maps. We found more evidence for HLA targeting of conserved regions when we used the measured epitope maps than when using the predicted ones, indicating that our analysis may underestimate the extent of this phenomenon (see supplemental material and Table S7).
We introduce a novel score termed “targeting efficiency” to quantify the relationship between HLA binding and conservation of target peptides. The HLA allele targeting efficiency score is defined to be the Spearman rank correlation coefficient between binding scores and conservation scores for amino acids along a given protein. Positive scores denote a preference for binding conserved regions, while negative scores indicate a preference for binding variable regions. The process of calculating a targeting efficiency score for the example pair of the herpesvirus-1 capsid triplex subunit 1 protein and the HLA-A*2402 molecule is illustrated in Fig. 1B and C. In addition to calculating targeting efficiency scores for individual proteins, the approach also allowed an efficiency score to be calculated for an entire proteome by concatenating all protein sequences of the given pathogen for which data were available.
In order to examine whether HLA molecules preferentially target conserved targets on human proteins, we analyzed a set of 4,761 proteins spanning the entire OMIM database of human disease-associated proteins (http://www.ncbi.nlm.nih.gov/omim/). This analysis showed that both HLA-A and HLA-B molecules tend to preferentially bind to conserved areas of human proteins (Fig. 2), as first proposed by Hughes and Hughes in 1995 in an analysis of 34 oncoproteins (23). These results are also in agreement with those reported by Yeager et al. (64). The average targeting efficiencies were significantly higher for both HLA-A (P < 10^−76) and HLA-B alleles (P < 10^−8) for OMIM proteins than for a random sample of 3,800 proteins (UniRef50 database) (Fig. 3). Similar targeting preferences were obtained from a smaller comparison using 300 randomly selected non-disease-related human proteins and 300 randomly selected UniRef proteins (see Table S3 in the supplemental material), verifying that this property is not specific to disease-associated human proteins. Importantly, the reported differences in efficiency distributions are not a consequence of the differences in the raw scores of the two parameters used to compute them as neither the binding scores (P = 0.96) nor the conservation scores (P = 0.99) were distributed differently in human and UniRef random protein sets. Additionally, significantly higher binding scores were observed at disease-associated mutation sites (P < 10^−10 over 2,825,558 sites), providing further evidence of HLA coevolution with its targets.
We were also interested in exploring the relationship between efficiency scores and the assignment of HLA alleles to loci and to supertype groups (described in reference 29; see also Table S1 in the supplemental material). Therefore, we examined the efficiency scores of HLA-A versus HLA-B alleles for both the human and random protein sets described above. We found a marked preference for HLA-A to target conserved regions of human proteins (HLA-A > HLA-B, P < 10^−300) compared with the random UniRef50 sample (HLA-B > HLA-A, P = 0.00007). However, further analysis indicated that while the distribution of efficiency scores appears more uniformly positive for the HLA-A allele families (Fig. 4), a limited number of HLA-B alleles also preferentially bind conserved human protein sequences. For example, the B58 supertype (e.g., HLA-B57 and -B58 alleles) and a subgroup of HLA-B7 supertype alleles (HLA-B55 and -B56) have higher efficiency scores than most other B alleles for human proteins (Fig. 4). Interestingly, all of these alleles form a distinct HLA cluster closely associated with chimpanzee Patr-B alleles (see Fig. S2 in the supplemental material). Conversely, HLA-B alleles from the B44, B27, and B7 supertype families (other than B55 and B56) have low or negative average allele efficiencies for human proteins (Fig. 4). These results indicate that grouping HLA alleles by their abilities to target conserved regions of human proteins does not strictly follow their supertype classification, but it does reflect their phylogenetic history as alleles that are more likely to target conserved areas of human proteins cluster with chimpanzee Patr-B alleles (41).
To determine if binding preferences for viruses follow a trend similar to the one observed for human proteins, we computed the HLA efficiency scores for 52 human viruses and 70 plant viruses (listed in Tables S2A and S2B in the supplemental material). Figure 5 (see also Fig. S3) shows the distribution of efficiency scores as heat maps in which the viruses (x axis) are grouped by their families, and the HLA allelic variants (y axis) are organized into HLA supertype families characterized by their HLA peptide binding preferences (56). Figure 1D also presents the distribution of efficiency scores for HLA-A and HLA-B alleles for a selection of human viruses. Importantly, neither the binding scores (P = 0.87) nor the conservation scores (P = 0.59) were distributed differently in the two viral groups (human- and plant-infecting), consistent with previous reports (30). However, differences emerge when allele targeting efficiency is considered, suggesting that the HLA system has been optimized through coevolution with viruses to recognize functionally important protein regions that are relevant to pathogen threats. Most interestingly, these groupings revealed patterns of efficiency variation over different HLA alleles and different viral families (Fig. 5), indicating a possible functional importance of the existence of long-lasting allelic lineages for the HLA-A but not HLA-B allele locus (24, 41) and possible evidence of specialization of the HLA loci. To assess this further, we investigated the distribution of efficiency scores for HLA-A and HLA-B loci according to viral genome composition and viral species, as described in the supplemental material (22, 28, 29, 62).
In our analysis of DNA viruses, we found that both HLA-A and HLA-B loci demonstrated a preference for targeting evolutionarily conserved regions of human DNA viruses relative to plant DNA virus species (P < 0.01) (Fig. 6). Moreover, there was also a striking general preference for HLA-A alleles to target conserved regions of human DNA viruses compared with HLA-B alleles (P < 10^−13). This was most notable for the Herpesviridae and Adenoviridae (Table 1), with significant differences between HLA-A and HLA-B loci for 8 of 10 viral proteomes assessed. Since these viruses have emerged via distinct lineages through vertebrate evolution (13, 42), these observations are not readily explained by sequence similarity between these viral families. Yet the results are consistent with coevolutionary relationships between herpesviruses and adenoviruses and their human and ancestral primate hosts (13). Moreover, we found a linear correlation between HLA allele efficiencies for the analyzed human proteins and these DNA viruses (as shown for cytomegalovirus in Fig. 7A), as well as strong correlations among overall efficiency scores for the proteomes of DNA viruses (Fig. 7C). These observations are consistent with previous evidence that these viruses exploit host peptide mimicry as a means of host immune evasion (39, 60). We suggest that herpesviruses, which establish persistent but nonprogressive infection in the vast majority of human hosts, may evade the immune system (18) by exploiting the “holes” in the T-cell repertoire that are created by negative thymic selection. The similarity of herpesvirus and adenovirus proteome sequences to self-peptides would also be anticipated to induce more specific, less cross-reactive T-cell responses (10, 18), which may contribute to the progressive inflation of herpesvirus-specific T cells with a restricted T-cell receptor (TCR) repertoire during human ageing (33).
In contrast to results obtained for human protein and DNA viral protein targeting by HLA class I, we found that HLA-B alleles had higher efficiency scores for RNA viruses (Fig. 6B) (P < 10^−16), with preferential targeting of evolutionarily conserved viral proteins by HLA-B noted for 23/34 (68%) of RNA viruses assessed (Table 1). Further scrutiny revealed a spectrum of targeting efficiency profiles across the range of HLA-B-virus pairs. At one end of the spectrum we found a general trend toward positive HLA-B efficiency scores for the human-adapted Paramyxoviridae (including respiratory viruses such as respiratory syncytial virus, parainfluenza virus, and metapneumovirus, as well as measles and mumps viruses) and the Picornaviridae (predominantly rhinovirus and enterovirus species). These RNA viruses exhibit a diverse range of “high-efficiency” HLA-B specificities (Fig. 5), consistent with the existence of a host-pathogen evolutionary relationship that is relatively specific for the HLA-B locus. Preferential and efficient targeting of these highly infectious, but nonpersistent, RNA viruses by the highly polymorphic HLA-B locus could also provide a potential mechanism for the observed rapid evolution of these viral species, characterized by the emergence of transient viral mutations (which may provide an HLA context-specific selection advantage) that are then purged by purifying selection (22, 52).
At the other end of this spectrum, the arboviral Flaviviridae are a dramatic counter-example to the general observation that HLA molecules preferentially target evolutionarily conserved proteomic regions. As shown in Fig. 5, most HLA-A and HLA-B molecules preferentially targeted nonconserved protein sequences from arboviral flaviviruses.
Targeting efficiency scores for two different protein groups are often correlated across different HLA alleles. For example, we found a strong negative correlation between allele efficiency scores for flaviviruses and the efficiency scores for human proteins or double-stranded DNA (dsDNA) viruses (Fig. 7A and C). The finding that HLA-A and HLA-B alleles that target conserved regions of human (self) proteins tend to target nonconserved regions of the dengue virus suggests that these viruses seek to reduce similarity to self-peptides to the extent that they are ignored by the immune system (18), thereby exploiting positive T-cell selection to their advantage. This evolutionary strategy also focuses HLA binding on relatively nonconserved viral proteome regions that are likely to have functional and genomic plasticity. We also find correlations between unrelated viruses (Fig. 7C), suggesting that these pathogens share common strategies to evade HLA-restricted immune surveillance. In contrast, negative correlations between unrelated viruses may indicate that these pathogens exploit the weaknesses in the immune surveillance created by overadaptation of the immune system to one virus or the other.
We found that these correlations cannot be explained by the sequence (dis)similarities among viruses: when the same correlation factors are computed for simulated HLA binding preferences, they are not as strong as they are for the 95 HLA alleles analyzed here, which cover true binding properties for over 90% of the European population. This again suggests a prominent role for HLA targeting efficiency in shaping HLA and pathogen coevolution (Fig. 7B; see also the supplemental material).
The findings presented thus far point to coadaptive relationships between HLA allelic diversity and human viruses but do not address the functional relevance of these observations. It is apparent from these analyses that relationships between virus species and host HLA diversity are highly specific, indicating specialized roles for HLA loci and for HLA allelic variants within these loci. We thus considered whether HLA targeting efficiencies for a given pathogen (with subsequent possible consequences for HLA-restricted immune responses) are associated with altered infectious disease outcomes. We focused on the specific examples of HIV disease progression, HIV viral load, and the incidence of dengue hemorrhagic fever (DHF).
Dengue hemorrhagic fever is a severe clinical manifestation of a secondary dengue flavivirus infection. Previous studies have suggested that the pathogenesis of this syndrome involves cross-reactive T-cell responses (15), which may be enriched in the context of low-affinity interactions between HLA class I and conserved viral peptides (as described for dengue viruses in Fig. 5; see also Fig. S7 in the supplemental material) (10, 18). In support of this model, we found that HLA alleles known to be associated with susceptibility to dengue hemorrhagic fever (11) appeared to be less likely to target evolutionarily conserved proteomic regions than alleles that confer resistance to this disease (P = 0.05) (Fig. 8).
We next analyzed the effect of HLA targeting efficiency for specific HIV-1 proteins on HIV-1 viral load in a study population of 191 HIV clade B-infected, treatment-naive individuals from the Western Australian HIV cohort for whom viral loads (plasma HIV RNA level) and full HLA typing were available (43). We computed HLA targeting efficiency for each individual by averaging binding scores over each patient's specific HLA-A and HLA-B repertoire, thus approximating the aggregate ability of patient-specific HLA alleles to differentiate between conserved and variable targets.
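The patient-level step described here, averaging per-site binding scores over the alleles a patient carries, can be sketched as follows. The function name, the dictionary layout, and the toy allele labels are hypothetical, and the treatment of homozygous loci (listing the allele twice so that it is weighted twice) is an assumption that the text does not settle. The resulting profile would then be correlated with the conservation scores of the same protein to give that patient's efficiency.

```python
from typing import Dict, List, Sequence


def patient_binding_profile(alleles: Sequence[str],
                            allele_scores: Dict[str, List[float]]) -> List[float]:
    """Average per-site binding scores across a patient's HLA-A/-B alleles.

    `allele_scores` maps an allele name to its per-site binding scores for one
    protein; every allele must be scored over the same sites.
    """
    if not alleles:
        raise ValueError("patient has no typed alleles")
    profiles = [allele_scores[name] for name in alleles]
    length = len(profiles[0])
    if any(len(profile) != length for profile in profiles):
        raise ValueError("all alleles must be scored over the same sites")
    # Site-by-site mean; a repeated allele name contributes once per copy.
    return [sum(site_values) / len(profiles) for site_values in zip(*profiles)]
```

Averaging the binding scores before correlating (rather than averaging per-allele efficiencies) follows the wording of the paragraph above; the two choices generally give different numbers, so the order of operations is worth stating explicitly in any reimplementation.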
While the above analysis of viral targeting involved full proteome targeting efficiencies (Fig. 5), the underlying analysis of residue-specific binding and conservation scores calculated for individual viral proteins and for specific HLA alleles allows for a more focused exploration of the landscape of HLA-pathogen interactions based on targeting of particular individual proteins (see, for example, Fig. S4 to S6 in the supplemental material). Analysis of protein targeting efficiencies has the potential for broader use in the investigation of viral immunity relevant to natural infection as well as vaccine design and evaluation, particularly when indicators of functional immunity are available, as is the case in HIV-1 infection.
While overall HLA allele efficiencies toward HIV proteins correlated negatively (but not significantly) with log viral load (P = 0.24), certain individual protein targeting efficiencies showed significant correlations with viral load. For example, HLA-B locus efficiency in targeting Gag protein alone is more strongly correlated with viral load (r = −0.19; P = 0.009), consistent with experimental evidence that HLA-B-restricted cytotoxic T lymphocyte (CTL) responses to Gag epitopes play a significant role in determining the natural history of HIV infection (6, 35, 36, 54). We then analyzed the distribution of HLA allele efficiency scores for HIV-1 Gag according to their known associations with HIV progression (9, 19) and found that protective HLA alleles tend to rank more highly in targeting efficiency of conserved Gag proteomic regions (Fig. 9).
We also investigated the combined effect of efficiency scores for individual proteins and HLA loci on log viral load using multivariate regression. This analysis used the efficiency scores of all nine HIV proteins, taking into account proteasomal cleavage, and was performed on multiple test/train splits of the data (see supplemental material). Here, we found that HIV efficiencies alone account for 7.0% of the log viral load variance (P = 3.67 × 10^−4; correlation, 0.27), gender alone explains 7.0%, and the patient's ethnic group alone shows no significant explanatory power. The HLA efficiencies and gender combined explain 11.2% of log viral load variance. The explanatory power of efficiency scores was not attributable to a single HLA allele or HLA supertype effect nor to confounding demographic effects such as gender and race (described in the supplemental material).
HLA binding is not the sole determinant of potential immune targets. The processing of intracellular antigens relies on relatively monomorphic and evolutionarily conserved proteins to optimize peptide cleavage prior to HLA binding (47), thus providing an additional mechanism for ligand selection. We therefore examined the potential influence of proteasomal cleavage on targeting efficiency by using the NetChop algorithm (46, 55). We analyzed targeting efficiency across human proteins and viral species using a restricted data set of peptides with a high probability of appropriate C terminus cleavage. We found that proteasomal cleavage restriction has indeed coevolved with HLA binding, and cleavage is also directed toward conserved targets. As shown in Fig. S7 in the supplemental material, the distribution of HLA targeting efficiencies remained similar to those identified in Fig. 5, indicating that relationships between HLA binding preference and evolutionary conservation are preserved among peptide targets that are selected via the antigen-processing complex.
In this study, we have found that HLA class I molecules preferentially sample conserved regions of human proteins and many viral families, as initially hypothesized by Hughes and Hughes (23). We uncovered a striking exception in the arboviral Flaviviridae species, where HLA molecules preferentially target nonconserved regions. This methodology provides a capacity to map the landscape of host-virus interactions from a novel perspective and also allows for closer examination of these effects at the viral protein level (see Fig. S4 to S6 in the supplemental material), providing a platform for comparative analyses of the complex coevolutionary relationships that exist between viruses and their human hosts.
These findings also provide evidence for the evolution of HLA class I locus and allelic specialization, suggesting a partial division of labor between the coinherited HLA-A and HLA-B loci. While molecules encoded in both loci participate in surveillance of various proteins, the HLA-A locus and certain HLA-B alleles appear to have a particularly important role in surveillance of evolutionarily conserved regions of the human proteome (14). This finding is specific to human (rather than randomly selected) proteins and is even more evident at sites of disease-associated mutation, suggesting optimization of ligand selection through human (and ancestral vertebrate) evolution.
Further evidence of partial HLA specialization can also be found through analyses of HLA-viral interactions as HLA alleles that target conserved elements from the human protein repertoire also target conserved regions of human-adapted DNA viruses. In this respect, our findings are supported by other studies (18, 39, 60) indicating that these ancient DNA viruses exploit holes in the repertoire of reactive T cells created through thymic selection, thereby evading effective immune surveillance by maintaining similarity to self-peptides. We extend these observations to show that the extent to which individual HLA alleles are adapted to bind conserved human protein elements is highly correlated with their targeting efficiencies toward DNA viruses. We also find that HLA-B alleles tend to more efficiently target conserved regions of RNA viruses. These results are in keeping with those of Prugnolle et al., who noted that relationships between pathogen diversity and balancing selection are particularly evident at the HLA-B locus (51). They are also supported by the findings of McAdam et al. (41) and by Hughes et al. (24), who found that a large number of HLA-B alleles are products of small-scale recombination events and that the HLA-B locus evolves much more rapidly than the HLA-A locus, suggesting that these two loci have been subject to different types of natural selection over long periods of time in response to different pathogenic threats. Our results are also in line with evidence of more effective HLA context-specific purifying selection followed by reversion in RNA viruses than in DNA viruses, as reported by Hughes et al. (22).
It is important to emphasize that these preferences exhibited by HLA alleles are not evident when either HLA binding energies or evolutionary conservation of target peptides is considered in isolation but only when these factors are considered together. This is in keeping with the findings of Istrail et al. (30), who conducted genome-wide analyses of binding preferences of HLA supertypes and found no meaningful differences in the tendency of HLA alleles to bind human proteins over proteins from other organisms.
Viewed from the perspective of viral evolution, these data suggest that viral species choose distinct adaptive pathways under HLA-restricted immune selection (1, 40). This is most dramatically illustrated for the arboviral Flaviviridae species, in which variable rather than conserved proteomic regions are the preferred targets for HLA binding. Evolution toward the “extinction” of predicted HLA targets in the dengue virus genome has been noted previously (21). In this context, it is interesting that dengue virus infection actively promotes (rather than downregulates) TAP (transporter associated with antigen processing)-dependent antigen processing and HLA class I cell surface expression during flavivirus infection (20), indicating that the flaviviruses employ immune evasion strategies that are the opposite of those of many DNA viral species. This particular adaptive strategy may be influenced by the fact that arboviral flaviviruses must maintain the ability to infect arthropod vectors as well as vertebrate hosts (including nonhuman primates) without significant genomic adaptation (37, 59).
The HLA targeting efficiency scores may also prove a useful tool for predicting patient response to infections, as illustrated by the examples of disease outcomes in dengue virus and HIV-1 infections. These scores provide an example of a novel, numeric, and real-valued representation of an HLA molecule, which can be utilized to quantify similarities and differences between HLA molecules based on a target preference function. Such a projection allows identification of common targeting characteristics among patients with different HLA types, thus potentially increasing statistical power in the analysis of patient cohorts. This representation is similar in concept to HLA supertypes (56), which were previously the only method for classifying HLA alleles while attempting to retain biologically meaningful differences. Further studies will be required to investigate these attributes, but it is notable that relationships between HLA targeting efficiency and HLA supertype classifications are by no means uniform, as evidenced in the HIV viral load analysis as well as in the data shown in Fig. 4 and 5 and in Fig. S4 in the supplemental material.
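One way to picture such a real-valued representation is to treat each allele as a vector of efficiency scores (one entry per virus or protein) and to compare alleles by the distance between their vectors. The sketch below is illustrative only: the choice of Euclidean distance, the function name `profile_distance`, and the dictionary layout are assumptions, not part of the published method.

```python
import numpy as np

def profile_distance(profiles):
    """Pairwise Euclidean distances between alleles' efficiency profiles.

    profiles: dict mapping an allele label to a sequence of efficiency
    scores, all in the same (hypothetical) order of viruses or proteins.
    Returns the sorted labels and a symmetric distance matrix.
    """
    names = sorted(profiles)
    M = np.array([profiles[n] for n in names], dtype=float)
    # Broadcast to all pairs of rows, then reduce over the score axis.
    diff = M[:, None, :] - M[None, :, :]
    return names, np.sqrt((diff ** 2).sum(axis=-1))
```

Alleles with small mutual distances would share targeting characteristics under this representation, regardless of whether they belong to the same supertype.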
However, multiple factors contribute to disease expression in the context of viral infection, and HLA class I binding is only one of many necessary but not sufficient, genetically determined factors involved in antigen processing and the subsequent generation of pathogen-specific immune responses. To investigate the potential influence of one of these factors, we examined the effect of proteasomal cleavage on HLA targeting efficiency. We found that proteasomal cleavage restriction was also directed toward conserved targets (see Fig. S7) but that the tendency of HLA alleles to target conserved regions remained as strong even when only the peptides which were likely cleavage targets were considered. This suggests that both HLA-peptide binding and proteasomal cleavage have been co-optimized to target conserved regions.
Previous studies of HLA allele-specific viral polymorphisms (27-29, 60, 63) have shown that adaptive interactions between individual human hosts and autologous viral populations are unique and highly dynamic, involving the evolution of HLA-specific CTL escape mutations that are known to influence the natural history of viral infection (43). We therefore offer that the methods described here, designed to investigate the broad patterns of host-pathogen coevolution across multiple viruses, complement other approaches that examine one virus at a time, such as studies that reveal host-virus adaptation by assessing HLA-associated viral polymorphisms (27, 60, 63) or phylogeny (28, 29).
Our analyses using a diverse array of HLA alleles and viral proteomes suggest that, in general, HLA-A preferentially targets DNA viruses and that HLA-B preferentially targets RNA viruses, while both HLA-A and -B alleles tend to bind to nonconserved regions in arboviral flaviviruses. It must be emphasized that these broad observations identify trends and will not generalize to all viruses or to the individual proteins or epitopes within those viruses.
Although viruses typically encode thousands of amino acids, most of the responding CD4+ and CD8+ T cells recognize a tiny fraction of the potential antigenic determinants (65). This serves to maximize the efficiency of clonal recruitment and activation for a highly specific and avid antiviral response. More than 90% of CD8 T-cell immunodominance is thought to be explained by HLA-peptide binding affinity, as only ~1% of peptides form a complex with HLA class I molecules with sufficient stability to be presented in adequate numbers to activate naïve CD8+ T cells (65). While within-host immunodominance may underpin efficient primary and secondary responses to some acute viral infections, the extreme dominance of a few or even single clonotypes associated with some persistent viruses and vaccine-induced responses is problematic if those immunodominant responses are not protective. The study of targeting efficiency in such infections may help clarify, in part, virus-specific immunodominance patterns and the strategies different groups of viruses have taken to counteract these. This has significant implications for vaccine immunogen design as the efficacy of an immunodominant vaccine-induced response is likely to be improved if directed against determinants that have high targeting efficiency and are functionally important to the virus rather than determinants that reproduce the counter-evolutionary strategies of the virus.
In conclusion, this study has taken advantage of recent advances in large-scale genome sequencing, HLA binding measurements, and curation, along with the availability of computationally intensive analysis techniques, to address the hypothesis that HLA class I-restricted peptide sampling is preferentially targeted to evolutionarily conserved, functionally important regions of human and viral proteomes. The data support this view and also provide support for balancing selection of HLA class I allelic diversity (particularly at the HLA-B locus) anchored on this property in response to the challenges provided by diverse human viruses. The approach provides a novel perspective on the ongoing coevolutionary relationships between HLA class I polymorphism, adaptive T-cell immunity, and the self-peptides and viruses that engage with these systems.
We thank Matthew Care for providing the OMIM data set and Sarel Fleishman, Chen Yanover, Noah Zaitlen, Felipe Veloso, David Holmes, and David Heckerman for useful discussions. We also thank Jacob John for information regarding plant viruses and Manuel Reyes Gomez for retraining the adaptive double-threading method on the data that excludes HIV epitopes, as well as Itay Mayrose for help with using the ConSeq server. We thank Rachel Tompa and Renee Ireton for editing the manuscript. We thank the anonymous reviewers for useful comments.
Published ahead of print on 17 November 2010.
†Supplemental material for this article may be found at http://jvi.asm.org/.
‡The authors have paid a fee to allow immediate free access to this article.