MHC/HLA class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Understanding peptide recognition properties of the hundreds of MHC class II alleles is essential to appreciate determinants of antigenicity and ultimately to predict epitopes. While there are several methods for epitope prediction, each differing in their success rates, there are no reports so far in the literature to systematically characterize the binding sites at the structural level and infer recognition profiles from them.
Here we report a new approach to compare the binding sites of MHC class II molecules using their three dimensional structures. We use a specifically tuned version of our recent algorithm, PocketMatch. We show that our methodology is useful for classification of MHC class II molecules based on similarities or differences among their binding sites. A new module has been used to define binding sites in MHC molecules. Comparison of binding sites of 103 MHC molecules, both at the whole groove and individual sub-pocket levels has been carried out, and their clustering patterns analyzed. While clusters largely agree with serotypic classification, deviations from it and several new insights are obtained from our study. We also present how differences in sub-pockets of molecules associated with a pair of autoimmune diseases, narcolepsy and rheumatoid arthritis, were captured by PocketMatch13.
The systematic framework for understanding structural variations in MHC class II molecules enables large scale comparison of binding grooves and sub-pockets, which is likely to have direct implications towards predicting epitopes and understanding peptide binding preferences.
Major histocompatibility complex (MHC) class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Antigenic peptide binding by these molecules is a pre-requisite for triggering immune responses. The diversity in antigen recognition is achieved through hundreds of class II alleles labelled by their serotypes, each differing from the others in terms of the residues at the binding site and their precise three dimensional arrangement.
The nature of the binding site of an MHC class II molecule (Figure 1) has an important bearing on the immune system of an individual [1,2]. MHC class II molecules provide important clues in understanding autoimmune diseases (e.g. [3-5]) and susceptibility to pathogens. In the context of tuberculosis, it has been reported that different MHC alleles bind peptides from Mycobacterium tuberculosis with different specificities, influencing an individual's susceptibility to infection [6-8].
A thorough knowledge of the structure of the binding site is useful in designing or identifying peptide antigens for rational vaccine design. In addition, knowledge of similar or dissimilar sites aids in understanding peptide specificities. While a general appreciation of the differences between a pair of structures can be obtained through interactive molecular graphics software tools, a thorough characterization of the differences and their mapping to individual residues in the corresponding structures, and more importantly a quantitative perspective of the extent of similarities, necessarily requires a systematic method for their analysis.
We have recently reported a new algorithm, PocketMatch, based on alignment of sorted distance elements binned into point-type-pair bins. An important step that precedes pocket comparison is the definition of the binding site itself. In the previous study, all residues that had any atom within a 4 Å zone around any atom of the ligand were taken to constitute the site. This approach, though common, is rather simplistic, and more detailed methods need to be explored to obtain more accurate site definitions. Here we incorporate a new module for defining binding sites and apply it to a large scale comparison of binding sites in the MHC class II molecules.
The modified algorithm is referred to as PocketMatch13 hereafter. Further, we show that our algorithm is useful for classification of MHC class II molecules based on binding site analysis. The algorithm captures the overall shape, detailed geometry and the chemistry at the binding sites. This analysis also aids in understanding peptide preferences by different alleles which may become the first step in the optimal design of allele specific antigens.
We report a new approach for a large scale comparison of binding sites in protein structures and apply it for comparing and classifying a set of 103 MHC class II molecules. The method, which utilizes structural features of the whole site as well as of the sub-pockets, also serves as a high resolution framework to systematically understand similarities and differences among alleles. We have used this to identify automatically intra- and inter-allelic variations in the binding grooves of molecules in the data set, and to explore the structural basis for correlations with disease.
To investigate similarities across MHC molecules of different types, one MHC molecule was selected from each of the 65 Protein Data Bank (PDB) entries in the dataset, and all-against-all comparisons were carried out on this set of 65 molecules (Table 1). Binding site similarity scores (PM13Scores) were computed for all pairs of molecules, both at the level of the whole groove and at the level of the sub-pockets. Cladograms were generated to show similarities and differences in PM13Scores across the dataset, both at the level of the whole groove and at the level of the five sub-pockets (Figure 2 and Figures S1-S4 in Additional file 1). In addition to considering the whole binding groove, it is important to know how the similarities of the sub-pockets (P1, P4, P6, P7, P9) vary, as these are the ones that determine peptide specificity.
Some MHC molecules of the same type are in different branches of the cladogram calculated for the whole groove; however, clustering at the sub-pocket level was more in line with the different MHC molecule types, particularly for the P4 sub-pocket. This suggests that the P4 sub-pocket is structurally conserved within an allele but differs across alleles. The importance of the P4 sub-pocket has been noted in many studies (e.g. [1,2,10]).
Some different MHC molecules are grouped together in the same branch in some of the sub-pocket trees. In these cases, the PM13Scores highlight similarities that would otherwise be difficult to spot in a large dataset. These can be followed up by looking for independent observations about these similarities that have been reported in the literature. The matching alleles, corresponding PDB codes and PM13Scores for pairs of sub-pockets are listed in Table 2, where the significance of the grouping of different alleles is discussed and supporting references are presented.
To analyze the distribution of similarity scores for each of the five sub-pockets relative to the others, a histogram is plotted for various bins of PM13Scores (Figure 3). Each bin corresponds to a range of PM13Scores. For example, bin-5 corresponds to a PM13Score range of [0.5 to 0.6), bin-7 to the range [0.7 to 0.8), and so on. The histogram gives an indication of the overall distribution of scores for each sub-pocket viewed in the context of the others. It shows that P1 and P9 score highly at bin 6, corresponding to a PM13Score range of [0.6 to 0.7). This could reflect either over-representation in the data or true conservation of these two sub-pockets.
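The binning convention described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the treatment of a score of exactly 1.0 (placed in the top bin) and the example scores are assumptions, not values or code from the study.

```python
from collections import Counter

def score_bin(score, width=0.1):
    """Return the bin index k such that k*width <= score < (k+1)*width."""
    n_bins = round(1 / width)
    # The small epsilon guards against floating-point round-off at bin edges.
    # Assumption: a score of exactly 1.0 is placed in the top bin.
    return min(int(score / width + 1e-9), n_bins - 1)

def histogram(scores_by_pocket, width=0.1):
    """Count scores per bin, separately for each sub-pocket."""
    return {pocket: Counter(score_bin(s, width) for s in scores)
            for pocket, scores in scores_by_pocket.items()}

# Hypothetical scores, for illustration only.
example = {"P1": [0.62, 0.68, 0.55, 0.91], "P4": [0.31, 0.45, 0.72]}
hist = histogram(example)
print(hist["P1"][6])  # number of P1 scores falling in bin-6, i.e. [0.6, 0.7)
```

Keeping one counter per sub-pocket makes it straightforward to plot the five distributions side by side, as in a grouped histogram.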
This analysis has implications for understanding subtle differences that otherwise go undetected, and it aids in understanding the antigen recognition preferences of different alleles and the range of antigens recognized by a given allele.
Some MHC molecules are present more than once in the PDB entries in the dataset (Table 1). In these cases, PocketMatch13 can be used to highlight differences in the peptide binding sites in different structures for the same allele.
The sites are first compared by considering the whole binding grooves. In many cases, as expected, PM13Scores are high, indicating strong similarities in the binding sites of a given allele. However, there are cases where PM13Scores are low for different structures of the same molecule; for example, different structures of DR1 and DR5 give similarity scores as low as 0.44 (Table S1 in Additional file 1). These differences can be explored by examining the individual sub-pockets within the binding grooves (see Methods). While many pairs of corresponding sub-pockets score highly, indicating similarity in the structures of the sub-pockets, in some cases the scores are significantly lower. This can be due to differences in MHC side chain conformations giving rise to different sets of intra-site distances, or to differences in which MHC atoms are determined to be accessible to a probe sphere and are thus included in sub-pocket calculations. Sub-pockets highlighted by PocketMatch13 as dissimilar can then be examined in detail to identify the reason for the low PM13Scores. Some examples of sub-pockets with low PM13Scores are illustrated in Figure 4.
A pair of DR1 structures [PDB:1AQD, PDB:1DLH] exhibited low scores for their P1 sub-pockets. Upon careful examination, we noticed that the P1 sub-pocket in 1DLH was wider and deeper, with many more MHC atoms being included in the PocketMatch13 definition of the P1 sub-pocket. Considering the set of DRA*0101-DRB1*1501 structures, the largest difference is between the P7 pockets of [PDB:1BX2] and [PDB:2WBJ] (Figure 4A). The peptide residue at the P7 position is oriented very differently in these two structures -- in [PDB:1BX2], an isoleucine is oriented away from the groove, whereas in [PDB:2WBJ] a leucine is oriented "across" the top of the groove. Since the P7 peptide residue in [PDB:2WBJ] obstructs the P7 sub-pocket more than the P7 peptide residue in [PDB:1BX2], this affects the set of MHC atoms that are selected for the sub-pocket comparison calculation, and thus reduces the PM13Score (0.06).
The two independent molecules in the crystal structure of DQ8 [PDB:1S9V] differ from each other at the P9 sub-pocket (Figure 4B); this difference at the P9 position has also been noted previously. This analysis indicates that PocketMatch13 is sufficiently sensitive to capture subtle differences that exist among molecules belonging to the same allele.
Several MHC class II alleles are known to be either positively or negatively associated with certain diseases, and this motivates studies to identify the reasons for disease susceptibility in terms of three-dimensional molecular structure. For example, Jones et al. review the structures of alleles that are known to be positively or negatively associated with various diseases, including narcolepsy and rheumatoid arthritis (RA). We have used PocketMatch13 to examine the binding grooves of alleles discussed by Jones et al. in connection with narcolepsy and RA, using experimentally determined structures from the PDB where these are available, and model structures when they are not (see Methods). In the case of narcolepsy, the pockets of the binding groove in the experimentally determined structure of HLA-DQ6.2 (positively associated with the disease) [PDB:1UVQ] were compared to those in a model structure of HLA-DQ6.1 (negatively associated with the disease). These molecules differ at only a few positions in the β chain. PocketMatch13 identified the P4 sub-pocket, corresponding to the Thr6 residue of the peptide, as the most dissimilar between these two structures (Table 3). In the neighbourhood of peptide residue Thr6, corresponding to P4, the residues Ala13β and Tyr26β in HLA-DQ6.2 are changed to Gly13β and Leu26β in HLA-DQ6.1 (Figure 5A); this difference is captured by the PocketMatch13 algorithm.
In the case of RA, alleles HLA-DR4.1, HLA-DR4.4 and HLA-DR1 are positively associated with the disease, while HLA-DR4.2 is neutral or negative. The α chains of these four MHC molecules are the same (DRA*0101), and sequence comparison of the β chains with ClustalW gives the following sequence identities: DR4.1:DR4.2 = 95%, DR4.1:DR4.4 = 97%, DR4.1:DR1 = 88%, DR4.2:DR4.4 = 96%, DR4.2:DR1 = 85%, DR4.4:DR1 = 88%. Given that whole-sequence similarities are not sensitive enough to capture differences at the binding site level, we use PocketMatch13 to compare the binding grooves and sub-pockets of the experimentally determined structures of HLA-DR4.1 [PDB:1J8H] and HLA-DR1 [PDB:1DLH], and model structures of HLA-DR4.2 and HLA-DR4.4.
PocketMatch13 gives low scores for the P4 sub-pocket (Table 4A). It has been shown by Hammer and co-workers that the difference in residues 70 and 71 in the β chain of the DR4.1 and DR4.2 MHCs accounts for the difference in binding specificity of the peptides. The low P4 scores are in line with that study. The superposition of these two alleles is shown in Figure 5B. The P4 peptide residue has Gln70β and Lys71β present in HLA-DR4.1 within 3.0 Å of the residue, whereas an Asp at the position 70β and only Glu71β are present in the case of the model built for HLA-DR4.2.
All-against-all PM13Scores are presented in Table 4B, C. The scores indicate low similarity of [PDB:1DLH] to the others in the P7 region of the binding site. Work by Rosloniec and co-workers found that mutation of the residue at the P7 position to an alanine affected T cell stimulation more with DR4 than with DR1. The involvement of the P7 sub-pocket in peptide recognition specificity has also been discussed in the literature. In carrying out these case studies, model structures have been a useful supplement to the set of experimentally determined MHC class II molecules. We envisage future studies that make use of larger sets of model structures where the binding grooves have been modelled consistently using the same protocol.
A strategy for automatically comparing MHC class II binding grooves and sub-pockets based on their chemical nature and geometry is presented. Comparisons are facilitated by a pre-processing step in which MHC-peptide complexes are extracted from PDB files, and chains and structurally equivalent residue positions are relabelled consistently. Pocket similarity scores calculated by PocketMatch13 can be used as the basis for clustering pockets based on their structural and chemical characteristics.
The framework we report can be used to carry out large scale comparison of binding grooves and sub-pockets, both to highlight differences in the binding grooves of MHC molecules of the same kind, and to identify similarities in the binding grooves of different MHC alleles. Investigations of MHC alleles associated with narcolepsy and rheumatoid arthritis demonstrate that binding grooves of alleles that are positively associated with an autoimmune disease can be compared with those that are known to be negatively associated with the disease. The structural variations among binding pockets identified by PocketMatch13 corroborate known disease associations. Future applications of this systematic framework for understanding structural variations in MHC class II molecules could have direct implications towards predicting epitopes and understanding peptide binding preferences.
103 MHC class II molecules from 65 Protein Data Bank entries are used in this study (Table 1), and the sequences of the α1 and β1 domains from these structures were matched with allele sequences from the IMGT/HLA database to confirm which allele is present in the PDB entry. In this study, the focus is on MHC class II binding domains. In some cases, different alleles share identical sequences for the binding region, e.g. human alpha chains DRA*0101 and DRA*0102 have binding domains with identical sequences, so both of these alleles are listed alongside structures with this alpha chain sequence in Table 1. Similarly, many alleles have binding domains with sequences that are identical to those in [PDB:1S9V], and these are listed in Table 1.
To facilitate automatic comparison of MHC class II structures, uniform chain identifiers and residue numbers were used for all MHC-peptide complexes extracted from the PDB files. New files were written where each file contains the core parts of an α1 domain, a β1 domain and a peptide, with chains relabelled to match the chain identifiers A, B and C in [PDB:1DLH], and residues renumbered to match the numbering of residues at structurally equivalent positions in [PDB:1DLH]. Positions 5-78 of the α1 domain and positions 5-91 of the β1 domain were retained. A rigid body transformation was applied to superpose the MHC binding domain complexes onto chains A and B of [PDB:1DLH], so that all complexes are in the same frame of reference. This transformation is not necessary for the automatic comparisons that follow, but it is convenient for comparing structures using molecular graphics to review results from the automatic comparisons. Peptide residues corresponding to the 13 peptide residues in [PDB:1DLH] were identified by structural comparison, and peptide residues beyond the 13-residue peptide present in [PDB:1DLH] were removed automatically.
To enable the comparison of binding grooves of MHC class II molecules known to be positively or negatively associated with narcolepsy or RA, models of HLA-DQ6.1 (alleles HLA-DQA1*0102 and HLA-DQB1*0601), HLA-DRB4.2 (alleles HLA-DRA1*0101 and HLA-DRB1*0402) and HLA-DRB4.4 (alleles HLA-DRA1*0101 and HLA-DRB1*0404) were built interactively using the Swiss-PdbViewer. [PDB:1UVQ] was used as the template structure for the model of HLA-DQ6.1 and [PDB:1J8H] was used as the template for HLA-DRB4.2 and HLA-DRB4.4.
Binding sites are represented in a frame invariant manner by distances between pairs of points, partitioned into bins, and pairs of sites are compared based on alignment of sorted sequences of distances. The sorted arrays are then aligned and scored to finally obtain comparison scores.
Molecules can be clustered based on their comparison scores.
In this study, the points used are the centres of those atoms lining the binding site. These are determined by considering accessibility to a probe sphere with radius 1.4 Å. Those MHC atoms whose accessibility is reduced by the presence of the peptide are determined to be part of the peptide binding site. Similarly, the MHC atoms that comprise individual pockets are identified as the set of atoms whose accessibility is reduced by the presence of the peptide residue at position P1, P4, P6, P7 or P9. The ProtOr radii from Table 2 of Tsai et al. are used for protein atomic groups in accessibility calculations.
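The selection rule described above can be sketched as follows, assuming that per-atom accessible surface areas have already been computed twice with some external tool (once for the MHC alone and once with the peptide, or a single peptide residue, present). The function name, the atom keys, the tolerance and the example values are all hypothetical and serve only to illustrate the rule.

```python
def binding_site_atoms(sasa_free, sasa_bound, tol=1e-6):
    """Return MHC atoms whose accessibility is reduced by the peptide.

    sasa_free  : {atom_id: accessible area of the MHC atom without the peptide}
    sasa_bound : {atom_id: accessible area of the same atom with the peptide
                  (or one peptide residue, for a sub-pocket) present}
    """
    return sorted(
        atom for atom, free in sasa_free.items()
        # Atoms missing from sasa_bound are treated as unchanged (assumption).
        if free - sasa_bound.get(atom, free) > tol
    )

# Hypothetical accessibility values in square angstroms; keys are
# (chain, residue number, atom name).
free = {("B", 70, "NE2"): 22.4, ("B", 71, "NZ"): 35.1, ("A", 9, "CA"): 4.0}
bound = {("B", 70, "NE2"): 3.2, ("B", 71, "NZ"): 35.1, ("A", 9, "CA"): 0.0}
site = binding_site_atoms(free, bound)
print(site)
```

Running the same function with accessibilities computed in the presence of only the P1, P4, P6, P7 or P9 peptide residue would give the corresponding sub-pocket atom sets.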
The corresponding pockets between a pair of MHC binding sites are compared on a large scale in an all-against-all comparison scheme. The shape signature of each pocket, capturing the chemical nature and geometric distribution of atoms, is derived based on the distance lists concept used in PocketMatch.
Site comparison proceeds as follows:
• Surface atomic groups are classified into 13 types based on heavy-atom types, the number of covalently attached hydrogen atoms and the number of all covalently attached atoms, as proposed by Tsai et al.: C3H0, C3H1, C4H1, C4H2, C4H3, N3H0, N3H1, N3H2, N4H3, O1H0, O2H1, S2H0, S2H1.
• Distances between all pairs of atoms are computed and binned into 13 * (13 - 1)/2 + 13 = 91 lists, one for each pair of atomic types (C3H0-C3H0, C3H0-C3H1, etc.).
• Each list or bin of distances is then sorted in non-decreasing order. The sorted distance elements, binned into lists according to the chemical nature of the atoms, constitute the shape descriptor of the binding pocket.
• To compare a pair of sites, each of the 91 lists in one site is taken together with the corresponding list from the other site, and the cumulative number of similar distance elements is determined.
• A pair of distances from two lists is marked as a match if the distances differ by at most a threshold of 0.5 Å.
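The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PocketMatch13 implementation: the function names are hypothetical, each atom is assumed to be given as a (type, coordinates) pair, and the greedy two-pointer pass over the sorted lists is a simple stand-in for the alignment procedure, whose details are not given here.

```python
import math
from itertools import combinations, combinations_with_replacement

ATOM_TYPES = ["C3H0", "C3H1", "C4H1", "C4H2", "C4H3", "N3H0", "N3H1",
              "N3H2", "N4H3", "O1H0", "O2H1", "S2H0", "S2H1"]
# 13 same-type pairs + 78 mixed pairs = 91 possible lists.
N_PAIR_TYPES = len(list(combinations_with_replacement(ATOM_TYPES, 2)))

def distance_lists(atoms):
    """atoms: list of (atom_type, (x, y, z)).
    Returns {unordered type pair: sorted list of inter-atomic distances}."""
    lists = {}
    for (t1, p1), (t2, p2) in combinations(atoms, 2):
        key = tuple(sorted((t1, t2)))
        lists.setdefault(key, []).append(math.dist(p1, p2))
    for distances in lists.values():
        distances.sort()
    return lists

def count_matches(a, b, threshold=0.5):
    """Greedy two-pointer pass over two sorted lists (assumed alignment scheme)."""
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= threshold:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches

def matching_distances(site1, site2, threshold=0.5):
    """Cumulative number of matching distances over all shared type-pair lists."""
    l1, l2 = distance_lists(site1), distance_lists(site2)
    return sum(count_matches(l1[k], l2[k], threshold) for k in l1.keys() & l2.keys())

# Toy sites (hypothetical coordinates): a translated copy matches fully.
site = [("C3H0", (0, 0, 0)), ("N3H1", (3, 0, 0)), ("O1H0", (0, 4, 0))]
shifted = [(t, (x + 10, y - 2, z + 5)) for t, (x, y, z) in site]
stretched = [("C3H0", (0, 0, 0)), ("N3H1", (3, 0, 0)), ("O1H0", (0, 6, 0))]
print(matching_distances(site, shifted), matching_distances(site, stretched))
```

Because only intra-site distances are used, the descriptor is unchanged by rotation or translation of the site, which is why the translated copy matches on every distance.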
We call the tuned version of PocketMatch for the MHC class II binding site comparison, by considering solvent accessible atoms and 13 atomic group types, PocketMatch13.
The PM13Score is a ratio. Its numerator is simply the number of matching intra-site distances. The denominator can be the number of intra-site distances in either the smaller site or the larger site -- these give rise to two PM13Score values, referred to as PMSMax and PMSMin, respectively. Unless stated otherwise, PM13Score refers to the PMSMin value.
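As a small worked illustration of the two score variants, the sketch below computes both from a match count and the sizes of the two distance sets. The function name, the handling of empty sites and the numbers in the example are hypothetical.

```python
def pm_scores(n_matches, n_distances_site1, n_distances_site2):
    """Return (PMSMin, PMSMax) for a pair of sites.

    PMSMin divides by the distance count of the larger site,
    PMSMax by the distance count of the smaller site.
    """
    smaller = min(n_distances_site1, n_distances_site2)
    larger = max(n_distances_site1, n_distances_site2)
    if smaller == 0:
        # Assumption: a site with no distances is given a score of zero.
        return 0.0, 0.0
    return n_matches / larger, n_matches / smaller

# Hypothetical example: 30 matching distances between a site with
# 45 distances (10 atoms) and a site with 66 distances (12 atoms).
pms_min, pms_max = pm_scores(30, 45, 66)
print(round(pms_min, 3), round(pms_max, 3))
```

Using the larger site in the denominator is the more conservative choice: a small site that fits entirely inside a much larger one scores highly on PMSMax but not on PMSMin.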
PM13Score values decrease as the similarity between a pair of binding grooves decreases (Figure 6). The rate at which the scores decrease is affected by the threshold chosen for site comparison, since this affects the number of matching distance elements between a pair of distance-sequences. To illustrate the effect of perturbing the conformation of a binding groove, the coordinates of atoms in the binding groove of [PDB:1JWS] (A, B, C chains) were perturbed randomly, and an ensemble of 1000 structures was generated with root mean square deviation (RMSD) values up to 5 Å with respect to the original [PDB:1JWS] structure. We have used a similar strategy for sensitivity analysis for the original PocketMatch algorithm and found that a threshold of 0.5 Å was adequate to distinguish between similar and dissimilar sites. Figure 6A shows the PM13Scores obtained by comparing the original [PDB:1JWS] structure with each of the perturbed structures in the ensemble. Rather than perturbing the atomic coordinates randomly, an alternative method for generating an ensemble of perturbed conformations would be to use conformations from a molecular dynamics trajectory. To investigate the effect of altering the chemical nature of the binding groove while retaining its original geometry, the atomic group labels of some of the atomic groups in the binding groove of [PDB:1JWS] (A, B, C chains) were re-assigned randomly, and PocketMatch13 was used to compare the modified binding groove with the original one (Figure 6B). Figures 6A and 6B demonstrate that PM13Scores capture differences due to both the geometry and the chemical nature of the binding groove.
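The two perturbation experiments can be sketched as below. The way the random perturbations were drawn is not specified above, so the Gaussian displacement, the fixed random seed, the function names and the toy coordinates are all assumptions made for illustration. RMSD is computed without superposition, since the perturbed and original structures share a frame of reference.

```python
import math
import random

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def perturb(coords, sigma, rng):
    """Displace every coordinate by Gaussian noise (assumed noise model)."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in coords]

def relabel(types, fraction, alphabet, rng):
    """Randomly re-assign the type label of a fraction of the atomic groups,
    leaving the geometry untouched."""
    types = list(types)
    for i in rng.sample(range(len(types)), round(fraction * len(types))):
        types[i] = rng.choice(alphabet)
    return types

rng = random.Random(0)  # fixed seed so the sketch is reproducible
coords = [(float(i), 0.0, 0.0) for i in range(20)]  # toy coordinates
ensemble = [perturb(coords, sigma=0.05 * k, rng=rng) for k in range(1, 11)]
deviations = [rmsd(coords, member) for member in ensemble]
labels = relabel(["C3H0"] * 20, 0.25, ["C3H0", "N3H1", "O1H0"], rng)
```

Each ensemble member would then be compared with the original site, and the resulting scores plotted against the RMSD values.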
Given a set of binding sites (whole groove or sub-pockets), one way of visualizing the relationships among these is to generate a cladogram based on distances between pairs of sites. The distance between a pair of sites is defined here to be 1 - PM13Score between the two sites. The cladogram generation program is based on the neighbour joining method available in Phylip-3.67, which generates trees in Newick format; these can be visualized and labelled using MEGA. When generating cladograms, data were input to the program in descending order of PM13Scores.
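Converting similarity scores to the distances used for tree building can be sketched as follows. The scores in the example are invented for illustration, the function names are hypothetical, and the text layout (a count line followed by rows with names padded to 10 characters) is an assumption about the square distance matrix format read by PHYLIP that should be checked against the PHYLIP documentation.

```python
def distance_matrix(names, scores):
    """Build a symmetric matrix of 1 - PM13Score distances.

    scores: {(name_a, name_b): PM13Score} with each unordered pair given once.
    """
    def dist(a, b):
        if a == b:
            return 0.0
        s = scores[(a, b)] if (a, b) in scores else scores[(b, a)]
        return 1.0 - s
    return [[dist(a, b) for b in names] for a in names]

def to_square_matrix_text(names, matrix):
    """Format the matrix as text (assumed PHYLIP-style square layout)."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10}" + " ".join(f"{x:.4f}" for x in row))
    return "\n".join(lines)

# Hypothetical scores, for illustration only.
names = ["1DLH", "1AQD", "1BX2"]
scores = {("1DLH", "1AQD"): 0.80, ("1DLH", "1BX2"): 0.55, ("1AQD", "1BX2"): 0.60}
matrix = distance_matrix(names, scores)
text = to_square_matrix_text(names, matrix)
print(text)
```

The resulting text could then be passed to a neighbour joining program, and the output tree viewed in a tree visualization tool.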
The authors declare that they have no competing interests.
KY participated in implementation of the atom type version of PocketMatch, setting up of computational framework for large scale site comparisons and helped to draft the manuscript. TU participated in preparing the data set. GJLK participated in the design and coordination of the study and helped to draft the manuscript. NC participated in reviewing results, manuscript and scientific discussions.
A zip compressed archive with supplementary Figures S1-4 and Table S1.
We are grateful for support from the Kristina Stenborg Foundation. We acknowledge support from the Department of Biotechnology (DBT), Govt. of India. We also acknowledge useful comments received on preliminary results presented at ISMB/ECCB 2009 with support from a travel fellowship to KY from BioSapiens.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 1, 2010: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S1.