In the present study we have attempted to classify almost 1000 HLA-A and -B class I alleles into supertypes. This is nearly a 10-fold increase in the number of alleles compared to our original classification done about a decade ago [12]. Besides providing supertype assignments for considerably more alleles, the present report has attempted to make more transparent how the original "phenomenological" classifications were done. About 80% of the 945 alleles examined were classified into one of the nine supertypes identified previously. Analysis of B and F pocket specificity patterns did not suggest the existence of any novel supertypes.
HLA supertypes do not necessarily demarcate groups of alleles with completely non-overlapping repertoires. A binding repertoire overlapping multiple supertypes has been demonstrated previously, for example, in the cases of A*2902 [22] and A*3001 (see Hamdahl et al., IEDB submission 1000945 [94]). In the present study we have identified 17 other alleles that would appear to have specificities that bridge either the A01 and A03 supertypes, or the A01 and A24 supertypes. At the same time, individual peptides can be readily identified that bear a particular supermotif, but that do not bind individual HLA allele members of the supertype, or that bind alleles of other supertypes, even supertypes associated with a different locus. Typically, in the first case, these phenomena result from differences in motif compatibility, perhaps at secondary positions. The second case likely reflects overlap(s) between the supertypes in terms of specificity, although in rare cases binding can be accomplished when no main anchor motif compatibility is apparent.
These observations are exemplified by a large-scale analysis of the capacity of a non-redundant set of 252 known EBV- and HIV-derived epitopes to bind a panel of 30 different HLA class I A and B molecules (Sidney, Frahm, Brander and Sette, unpublished observations). It was found that about 21% of the peptides bearing a specific supermotif bound a given allele in the corresponding supertype with an affinity of 100 nM or better. By contrast, in only 1% of the cases considered did an allele bind a peptide that did not have the corresponding supermotif. At the same time, it was noted that 62% (155/252) of the peptides in the set have motifs associated with 2 or more supertypes. The pattern of binding also followed this general promiscuity. It is also significant that when the same library of peptides was examined for recognition in HIV/EBV patients, ~95% of the epitopes were recognized in individuals not expressing the allele by which the epitope was originally reported to be restricted, and the promiscuity more often than not involved an allele outside of the supertype associated with the originally described restricting allele [97]. Thus, it is apparent that the lines of demarcation between supertypes can be fuzzy from the perspective of both the allelic specificity and the peptide motif.
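The idea of a single peptide carrying motifs for 2 or more supertypes can be illustrated with a minimal sketch. The residue sets below are simplified placeholders chosen for illustration only, not the supermotif definitions used in this study; the table `SUPERMOTIFS`, the function names, and the example peptides are all hypothetical.

```python
# Illustrative only: each entry maps a supertype to (allowed P2 residues,
# allowed C-terminal residues). These sets are simplified placeholders and
# are NOT the supermotif definitions used in the study.
SUPERMOTIFS = {
    "A02": ("LIVMATQ", "LIVMAT"),
    "A03": ("AILMVST", "RK"),
    "A24": ("YFWM", "FILMWY"),
    "B07": ("P", "AFILMVWY"),
    "B44": ("ED", "AFILMVWY"),
}


def matching_supertypes(peptide):
    """Return the supertypes whose assumed P2/C-terminal motif the peptide carries."""
    p2, c_term = peptide[1], peptide[-1]
    return sorted(
        name
        for name, (p2_set, c_set) in SUPERMOTIFS.items()
        if p2 in p2_set and c_term in c_set
    )


def fraction_promiscuous(peptides):
    """Fraction of peptides that carry motifs for 2 or more supertypes."""
    n_multi = sum(1 for p in peptides if len(matching_supertypes(p)) >= 2)
    return n_multi / len(peptides)


# Hypothetical peptides, not taken from the epitope set discussed in the text.
print(matching_supertypes("AMAAAAAAL"))  # carries two of the placeholder motifs
print(matching_supertypes("APAAAAAAL"))  # carries one
print(fraction_promiscuous(["AMAAAAAAL", "APAAAAAAL"]))
```

Under these placeholder sets a peptide with M at P2 and L at the C-terminus satisfies both the A02 and A24 entries, which is the kind of overlap counted in the 155/252 (about 62%) figure reported above.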
Restriction outside or across supertypes can also originate from overlaps in supermotifs (e.g., A02 and B62), or from alleles such as B*0801 that do not utilize the typical P2/C-terminus anchor spacing. B*0801 utilizes P3 and P5, not P2, and as such may be compatible with several supertypes and alleles. Thus, an A02- or A24-supertype epitope cross-reacting with B*0801 is not an example of a motif "failure", but merely reflects the fact that the specific peptide has both motifs.
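Because the anchors sit at different positions, one peptide can satisfy a P2/C-terminus motif and a P3/P5 motif simultaneously. The sketch below shows this with motifs expressed as position-to-residue maps. Only the anchor positions for the B*0801-like entry (P3 and P5) come from the text; the residue sets, the motif names, and the example peptides are hypothetical placeholders.

```python
# Motifs as {position: allowed residues}. Positions are 1-based; -1 denotes
# the C-terminus. Residue sets are placeholders, not measured specificities.
MOTIFS = {
    "A02-like": {2: "LIVMATQ", -1: "LIVMAT"},  # typical P2/C-terminus spacing
    "B*0801-like": {3: "KR", 5: "KR"},         # P3/P5 anchors, as stated in the text
}


def residue_at(peptide, position):
    """Residue at a 1-based position, or counted from the end if negative."""
    return peptide[position - 1] if position > 0 else peptide[position]


def carries(peptide, motif):
    """True if every anchor position of the motif holds an allowed residue."""
    return all(residue_at(peptide, pos) in allowed for pos, allowed in motif.items())


def carried_motifs(peptide):
    """Names of all motifs in MOTIFS that the peptide satisfies."""
    return sorted(name for name, motif in MOTIFS.items() if carries(peptide, motif))


# Hypothetical 9-mers used purely to exercise the functions.
print(carried_motifs("ALKAKAAAL"))  # satisfies both placeholder motifs
print(carried_motifs("ALAAAAAAL"))  # satisfies only the P2/C-terminus motif
```

Keeping the anchor positions in the data rather than in the matching logic means an allele with atypical spacing needs no special-case code.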
It is important to emphasize that supertypes are based on MHC binding, and that MHC binding alone is not a sufficient criterion for T cell recognition. Indeed, hundreds of examples of peptides that bind with remarkably high affinity, but that are not recognized by T cells, have been reported in the literature. We note that even in the best affinity ranges (i.e., IC50 <10 nM), rarely more than 10% of the peptides can be expected to be recognized [98]. Similarly, binding affinity does not necessarily correlate with frequency of recognition [99]. It is true that the trend is towards the most frequently recognized peptides being also the highest affinity binders [38], but that is not always the case, and there are clearly cases where the dominant epitope has an IC50 of ~100 nM, while several other non-recognized peptides have affinities in the <10 nM range.
It must also be emphasized that membership of an epitope in a supertype is not sufficient to guarantee its recognition by T cells in the context of different MHC alleles. Peptide binding to MHC is an absolute requirement for an epitope to be recognized by T cells. At the same time, many other factors, including protein expression and processing, as well as T cell repertoire and the specific MHC context, come into play in determining whether a peptide will be an epitope or not, or whether an epitope will be promiscuously recognized within a specific supertype. For example, Goulder and co-workers, studying B7-supertype epitopes, found that differential selection pressure exerted on HIV by CTL targeting identical epitopes, but restricted by distinct HLA alleles from the same supertype, can result in significant functional differences [101]. Macdonald et al., looking at two B44 subtypes described as members of the HLA B44-supertype, reported that a naturally selected dimorphism between the two molecules alters class I structure, peptide repertoire, and T cell recognition [102
The intent of the current study was to derive an updated classification of HLA class I MHC alleles on the basis of primary anchor specificity. For the vast majority of HLA class I molecules whose binding specificity has been described by crystal structure, pool sequencing or peptide binding studies, the main anchor interactions of the peptide almost invariably involve the MHC B and F pockets, while other pockets likely dictate secondary interactions. This pattern also appears to be true for most macaque and chimpanzee class I alleles studied to date.
There are exceptions, however, and indeed we have not assigned B*08 alleles to a specific supertype in recognition of the fact that these alleles appear to utilize pockets other than the B pocket as a primary anchor contact. For HLA class I molecules in general, the B and F pockets are the most likely main anchor contacts, while other pockets likely dictate secondary interactions. High levels of cross-reactivity have been experimentally demonstrated in the case of 6 supertypes for alleles that vary at secondary pockets [15
By contrast, in the murine system it is well recognized that other pockets are often the important primary peptide contacts [103]. Thus, to utilize the classification approach described here in the context of other species, additional pockets may need to be considered. It is also likely that supertypes can be further parsed or sub-classified on the basis of secondary interactions.
In the case of HLA-B alleles, F pocket specificity is difficult to correlate with a specific sequence, as a diverse pattern of residues appears to be associated with similar binding specificity. Independent of the residues in the F pocket, most HLA-B alleles seem to bind hydrophobic residues. Thus, assignment of B alleles was primarily driven by the specificity exhibited by the B pocket. On the other hand, it is also possible that greater resolution in the F pocket could be achieved as more data become available to discriminate different preference patterns. For example, in the B7 supertype it is apparent that some alleles, like B*3501, prefer large hydrophobic residues, such as Y. Conversely, other B7 supertype alleles, such as B*5401, seem to prefer small hydrophobic residues, such as A, at the C-terminus. While we have noted these subtle differences in preference [20], in practice we have not found that they significantly impact cross-reactivity between the alleles. This perhaps suggests that the C-terminal anchor in some contexts is less important, and that shared secondary preferences can have a stronger influence on degenerate binding capacity than in other cases. At the same time, it may be necessary to consider additional key residues in the analysis of the F pocket. This is exemplified in the cases of A*2603 and A*0301, which have the same key F pocket residues but are associated with very different specificities.
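The assignment logic described above, in which HLA-A alleles are keyed on both pockets while HLA-B alleles are keyed primarily on the B pocket, can be sketched as a pair of lookup tables. This is a simplified illustration under stated assumptions, not the procedure used in the study: the specificity category labels, the table contents, and the function name `assign_supertype` are hypothetical.

```python
# Hypothetical specificity categories and rules, for illustration only.
# HLA-A: supertype keyed on (B pocket specificity, F pocket specificity).
A_LOCUS_RULES = {
    ("small_aliphatic", "aliphatic"): "A02",
    ("small_aliphatic", "basic"): "A03",
    ("aromatic_aliphatic", "aromatic_aliphatic"): "A24",
}

# HLA-B: supertype keyed on B pocket specificity alone, reflecting the
# statement that assignment of B alleles was driven primarily by the B pocket.
B_LOCUS_RULES = {
    "proline": "B07",
    "basic": "B27",
    "acidic": "B44",
}


def assign_supertype(locus, b_pocket, f_pocket):
    """Look up a supertype; alleles matching no rule stay unclassified."""
    if locus == "B":
        return B_LOCUS_RULES.get(b_pocket, "Unclassified")
    return A_LOCUS_RULES.get((b_pocket, f_pocket), "Unclassified")


# Two B-locus inputs that differ only in F pocket preference get the same call.
print(assign_supertype("B", "proline", "large_hydrophobic"))
print(assign_supertype("B", "proline", "small_hydrophobic"))
print(assign_supertype("A", "small_aliphatic", "basic"))
```

In this sketch, differences in F pocket preference such as those noted for B*3501 and B*5401 do not change a B-locus assignment; finer resolution would require adding F pocket categories to the B-locus table.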
The vast majority of HLA-A and -B alleles fall into one of the 9 supertypes we have described. There are likely reasons for this [110], which include evolutionary relationships, but also constraints and limitations inherent in the epitope processing infrastructure. For example, no allele has been identified to date that binds peptides with D, E, Q or P at the C-terminus, which is congruent with the preferences of both proteasomal cleavage and TAP transport [111], an observation that has been applied in the rational design of an in vitro test reagent tool (PeptGen) offered by the Los Alamos HIV Sequence Database [112
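The C-terminal constraint noted above lends itself to a simple filter when enumerating candidate peptides from a protein sequence. The following is a minimal sketch under that single assumption; it is not a description of how PeptGen works, and the function name, the default peptide length, and the example sequence are hypothetical.

```python
# C-terminal residues for which, per the text, no binding allele has been
# identified to date.
DISFAVORED_C_TERMINI = frozenset("DEQP")


def candidate_peptides(protein, length=9):
    """Yield each window of the given length whose last residue is not D, E, Q or P."""
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        if peptide[-1] not in DISFAVORED_C_TERMINI:
            yield peptide


# Hypothetical 11-residue sequence: three 9-mer windows, ending in Q, R and D.
print(list(candidate_peptides("MKTAYIAKQRD")))  # → ['KTAYIAKQR']
```

Only the window ending in R survives, since the other two end in Q and D.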
Supertype classification should not be taken to necessarily imply an evolutionary relationship. In some cases this is largely true, as for example in the case of the A2-supertype, where most alleles are associated with the A2 serological antigen. In other cases the relationship is more complicated, such as the gene conversion relationship between the A2, A3 and A68 antigens. This latter example is something of an exception that proves the rule. Supertype associations are based on shared binding specificity, which may result from both common ancestry and convergent evolution [110]. Thus, while alleles within a supertype may have a close evolutionary relationship, that is not a given. Also, alleles (supertypes) sharing specificity at one anchor position may be associated with very disparate specificities at the other.
Other groups have also utilized various methodologies to define supertypes. In general, our classification is in agreement with those derived by other approaches, as compiled by Hertz [32], Lund [23] and Tong [31]. This is not surprising given the good agreement observed between our initial dataset and other classifications, and that the methodology utilized here is not different from the one utilized to derive the original assignments. If there are variations, they usually represent the splitting of a supertype, or reassignment of individual alleles. As in any classification problem of this kind, there is no absolute truth in supertype assignments. The practical application of supertype classification schemes to identify degenerately binding peptides will ultimately show what classification scheme has the most practical value.