Binding of peptides to Major Histocompatibility Complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC class I system (HLA-I) is extremely polymorphic. The number of registered HLA-I molecules has now surpassed 1500. Characterizing the specificity of each separately would be a major undertaking.
Here, we have drawn on a large database of known peptide-HLA-I interactions to develop a bioinformatics method, which takes both peptide and HLA sequence information into account, and generates quantitative predictions of the affinity of any peptide-HLA-I interaction. Prospective experimental validation of peptides predicted to bind to previously untested HLA-I molecules, cross-validation, and retrospective prediction of known HIV immune epitopes and endogenous presented peptides, all successfully validate this method. We further demonstrate that the method can be applied to perform a clustering analysis of MHC specificities and suggest using this clustering to select particularly informative novel MHC molecules for future biochemical and functional analysis.
Encompassing all HLA molecules, this high-throughput computational method lends itself to epitope searches that are not only genome- and pathogen-wide, but also HLA-wide. Thus, it offers a truly global analysis of immune responses supporting rational development of vaccines and immunotherapy. It also promises to provide new basic insights into HLA structure-function relationships. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan.
MULTIPRED2 is a computational system for facile prediction of peptide binding to multiple alleles belonging to human leukocyte antigen (HLA) class I and class II DR molecules. It enables prediction of peptide binding to products of individual HLA alleles, combination of alleles, or HLA supertypes. NetMHCpan and NetMHCIIpan are used as prediction engines. The 13 HLA Class I supertypes are A1, A2, A3, A24, B7, B8, B27, B44, B58, B62, C1, and C4. The 13 HLA Class II DR supertypes are DR1, DR3, DR4, DR6, DR7, DR8, DR9, DR11, DR12, DR13, DR14, DR15, and DR16. In total, MULTIPRED2 enables prediction of peptide binding to 1077 variants representing 26 HLA supertypes. MULTIPRED2 has visualization modules for mapping promiscuous T-cell epitopes as well as those regions of high target concentration – referred to as T-cell epitope hotspots. Novel graphic representations are employed to display the predicted binding peptides and immunological hotspots in an intuitive manner and also to provide a global view of results as heat maps. Another function of MULTIPRED2, which has direct relevance to vaccine design, is the calculation of population coverage. Currently it calculates population coverage in five major groups in North America. MULTIPRED2 is an important tool to complement wet-lab experimental methods for identification of T-cell epitopes. It is available at http://cvc.dfci.harvard.edu/multipred2/.
T-cell epitope hotspots; HLA; HLA supertype; Human Leukocyte Antigen; promiscuous binding peptide; vaccine design
Binding of peptides to major histocompatibility complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC genomic region (called HLA) is extremely polymorphic comprising several thousand alleles, each encoding a distinct MHC molecule. The potentially unique specificity of the majority of HLA alleles that have been identified to date remains uncharacterized. Likewise, only a limited number of chimpanzee and rhesus macaque MHC class I molecules have been characterized experimentally. Here, we present NetMHCpan-2.0, a method that generates quantitative predictions of the affinity of any peptide–MHC class I interaction. NetMHCpan-2.0 has been trained on the hitherto largest set of quantitative MHC binding data available, covering HLA-A and HLA-B, as well as chimpanzee, rhesus macaque, gorilla, and mouse MHC class I molecules. We show that the NetMHCpan-2.0 method can accurately predict binding to uncharacterized HLA molecules, including HLA-C and HLA-G. Moreover, NetMHCpan-2.0 is demonstrated to accurately predict peptide binding to chimpanzee and macaque MHC class I molecules. The power of NetMHCpan-2.0 to guide immunologists in interpreting cellular immune responses in large out-bred populations is demonstrated. Further, we used NetMHCpan-2.0 to predict potential binding peptides for the pig MHC class I molecule SLA-1*0401. Ninety-three percent of the predicted peptides were demonstrated to bind stronger than 500 nM. The high performance of NetMHCpan-2.0 for non-human primates documents the method's ability to provide broad allelic coverage also beyond human MHC molecules. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan.
MHC class I; Binding specificity; Non-human primates; Artificial neural networks; CTL epitopes
Motivation: MHC:peptide binding plays a central role in activating immune surveillance. Computational approaches to determine T-cell epitopes restricted to any given major histocompatibility complex (MHC) molecule are of special practical value in the development of, for instance, vaccines with broad population coverage against emerging pathogens. Methods have recently been published that are able to predict peptide binding to any human MHC class I molecule. In contrast to conventional allele-specific methods, these methods do allow for extrapolation to uncharacterized MHC molecules. These pan-specific human leukocyte antigen (HLA) predictors have not previously been compared using independent evaluation sets.
Results: A diverse set of quantitative peptide binding affinity measurements was collected from the Immune Epitope Database (IEDB), together with a large set of HLA class I ligands from the SYFPEITHI database. Based on these datasets, three different pan-specific HLA web-accessible predictors, NetMHCpan, adaptive double threading (ADT) and kernel-based inter-allele peptide binding prediction system (KISS), were evaluated. The performance of the pan-specific predictors was also compared with a well-performing allele-specific MHC class I predictor, NetMHC, as well as a consensus approach integrating the predictions from the NetMHC and NetMHCpan methods.
Conclusions: The benchmark demonstrated that pan-specific methods do provide accurate predictions also for previously uncharacterized MHC molecules. The NetMHCpan method trained to predict actual binding affinities was consistently top ranking both on quantitative (affinity) and binary (ligand) data. However, the KISS method trained to predict binary data was one of the best performing methods when benchmarked on binary data. Finally, a consensus method integrating predictions from the two best performing methods was shown to improve the prediction accuracy.
Supplementary information: Supplementary data are available at Bioinformatics online.
Reliable predictions of immunogenic peptides are essential in rational vaccine design and can minimize the experimental effort needed to identify epitopes. In this work, we describe a pan-specific major histocompatibility complex (MHC) class I epitope predictor, NetCTLpan. The method integrates predictions of proteasomal cleavage, transporter associated with antigen processing (TAP) transport efficiency, and MHC class I binding affinity into a MHC class I pathway likelihood score and is an improved and extended version of NetCTL. The NetCTLpan method performs predictions for all MHC class I molecules with known protein sequence and allows predictions for 8-, 9-, 10-, and 11-mer peptides. In order to meet the need for a low false positive rate, the method is optimized to achieve high specificity. The method was trained and validated on large datasets of experimentally identified MHC class I ligands and cytotoxic T lymphocyte (CTL) epitopes. It has been reported that MHC molecules are differentially dependent on TAP transport and proteasomal cleavage. Here, we did not find any consistent signs of such MHC dependencies, and the NetCTLpan method is implemented with fixed weights for proteasomal cleavage and TAP transport for all MHC molecules. The predictive performance of the NetCTLpan method was shown to outperform other state-of-the-art CTL epitope prediction methods. Our results further confirm the importance of using full-type human leukocyte antigen restriction information when identifying MHC class I epitopes. Using the NetCTLpan method, the experimental effort to identify 90% of new epitopes can be reduced by 15% and 40%, respectively, when compared to the NetMHCpan and NetCTL methods. The method and benchmark datasets are available at http://www.cbs.dtu.dk/services/NetCTLpan/.
Electronic supplementary material
The online version of this article (doi:10.1007/s00251-010-0441-4) contains supplementary material, which is available to authorized users.
MHC class I pathway; HLA; Pan-specific prediction; CTL epitope; MHC polymorphism
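The integration step described in the NetCTLpan abstract above (MHC class I binding combined with proteasomal cleavage and TAP transport using fixed weights for all MHC molecules) can be illustrated with a minimal sketch. It assumes a simple linear combination; the names `PathwayScores`, `pathway_score` and `rank_peptides`, and the weight values, are hypothetical placeholders and are not taken from the published method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PathwayScores:
    """Per-peptide predictions for the three pathway steps (higher = more favorable)."""
    mhc_binding: float  # predicted MHC class I binding score
    cleavage: float     # predicted C-terminal proteasomal cleavage score
    tap: float          # predicted TAP transport efficiency score


# Placeholder weights (assumption): the abstract states only that the weights
# are fixed across MHC molecules, not what their values are.
W_CLEAVAGE = 0.2
W_TAP = 0.05


def pathway_score(s: PathwayScores,
                  w_cleavage: float = W_CLEAVAGE,
                  w_tap: float = W_TAP) -> float:
    """Combine the three predictions into one score with allele-independent weights."""
    return s.mhc_binding + w_cleavage * s.cleavage + w_tap * s.tap


def rank_peptides(scored: Dict[str, PathwayScores]) -> List[str]:
    """Return peptide identifiers sorted from highest to lowest combined score."""
    return sorted(scored, key=lambda pep: pathway_score(scored[pep]), reverse=True)
```

Because the weights do not depend on the MHC molecule, the same `pathway_score` call can be reused for every allele; only the `mhc_binding` input changes.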
Predictive models of peptide-Major Histocompatibility Complex (MHC) binding affinity are important components of modern computational immunovaccinology. Here, we describe the development and deployment of a reliable peptide-binding prediction method for a previously poorly-characterized human MHC class I allele, HLA-Cw*0102.
Using an in-house, flow cytometry-based MHC stabilization assay we generated novel peptide binding data, from which we derived a precise two-dimensional quantitative structure-activity relationship (2D-QSAR) binding model. This allowed us to explore the peptide specificity of HLA-Cw*0102 molecule in detail. We used this model to design peptides optimized for HLA-Cw*0102-binding. Experimental analysis showed these peptides to have high binding affinities for the HLA-Cw*0102 molecule. As a functional validation of our approach, we also predicted HLA-Cw*0102-binding peptides within the HIV-1 genome, identifying a set of potent binding peptides. The most affine of these binding peptides was subsequently determined to be an epitope recognized in a subset of HLA-Cw*0102-positive individuals chronically infected with HIV-1.
A functionally-validated in silico-in vitro approach to the reliable and efficient prediction of peptide binding to a previously uncharacterized human MHC allele HLA-Cw*0102 was developed. This technique is generally applicable to all T cell epitope identification problems in immunology and vaccinology.
In all vertebrate animals, CD8+ cytotoxic T lymphocytes (CTLs) are controlled by major histocompatibility complex class I (MHC-I) molecules. These are highly polymorphic peptide receptors selecting and presenting endogenously derived epitopes to circulating CTLs. The polymorphism of the MHC effectively individualizes the immune response of each member of the species. We have recently developed efficient methods to generate recombinant human MHC-I (also known as human leukocyte antigen class I, HLA-I) molecules, accompanying peptide-binding assays and predictors, and HLA tetramers for specific CTL staining and manipulation. This has enabled a complete mapping of all HLA-I specificities (“the Human MHC Project”). Here, we demonstrate that these approaches can be applied to other species. We systematically transferred domains of the frequently expressed swine MHC-I molecule, SLA-1*0401, onto a HLA-I molecule (HLA-A*11:01), thereby generating recombinant human/swine chimeric MHC-I molecules as well as the intact SLA-1*0401 molecule. Biochemical peptide-binding assays and positional scanning combinatorial peptide libraries were used to analyze the peptide-binding motifs of these molecules. A pan-specific predictor of peptide–MHC-I binding, NetMHCpan, which was originally developed to cover the binding specificities of all known HLA-I molecules, was successfully used to predict the specificities of the SLA-1*0401 molecule as well as the porcine/human chimeric MHC-I molecules. These data indicate that it is possible to extend the biochemical and bioinformatics tools of the Human MHC Project to other vertebrate species.
Recombinant MHC; Peptide specificity; Binding predictions
Prediction of peptide binding to major histocompatibility complex (MHC) molecules is a basis for anticipating T-cell epitopes, as well as epitope discovery-driven vaccine development. In the human, MHC molecules are known as human leukocyte antigens (HLAs) and are extremely polymorphic. HLA polymorphism is the basis of differential peptide binding, until now limiting the practical use of current epitope-prediction tools for vaccine development. Here, we describe a web server, PEPVAC (Promiscuous EPitope-based VACcine), optimized for the formulation of multi-epitope vaccines with broad population coverage. This optimization is accomplished through the prediction of peptides that bind to several HLA molecules with similar peptide-binding specificity (supertypes). Specifically, we offer the possibility of identifying promiscuous peptide binders to five distinct HLA class I supertypes (A2, A3, B7, A24 and B15). We estimated the phenotypic population frequency of these supertypes to be 95%, regardless of ethnicity. Targeting these supertypes for promiscuous peptide-binding predictions results in a limited number of potential epitopes without compromising the population coverage required for practical vaccine design considerations. PEPVAC can also identify conserved MHC ligands, as well as those with a C-terminus resulting from proteasomal cleavage. The combination of these features with the prediction of promiscuous HLA class I ligands further limits the number of potential epitopes. The PEPVAC server is hosted by the Dana-Farber Cancer Institute at the site .
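The idea of a promiscuous binder within a supertype can be sketched as a simple filter over per-allele predictions. This is an illustration only, not PEPVAC's implementation: the function name `promiscuous_binders`, the 500 nM cutoff and the `min_alleles` parameter are assumptions chosen for the example.

```python
from typing import Dict, List


def promiscuous_binders(predicted_ic50: Dict[str, Dict[str, float]],
                        threshold_nm: float = 500.0,
                        min_alleles: int = 2) -> Dict[str, List[str]]:
    """Select peptides predicted to bind several alleles of one supertype.

    predicted_ic50 maps peptide -> {allele: predicted IC50 in nM}.
    A peptide is kept if its predicted IC50 is below threshold_nm for at
    least min_alleles alleles; the value is the sorted list of those alleles.
    """
    hits: Dict[str, List[str]] = {}
    for peptide, by_allele in predicted_ic50.items():
        bound = sorted(a for a, ic50 in by_allele.items() if ic50 < threshold_nm)
        if len(bound) >= min_alleles:
            hits[peptide] = bound
    return hits


# Made-up predictions for three alleles, purely for illustration.
example = {
    "PEPTIDE_1": {"HLA-A*02:01": 35.0, "HLA-A*02:06": 120.0, "HLA-A*68:02": 4000.0},
    "PEPTIDE_2": {"HLA-A*02:01": 900.0, "HLA-A*02:06": 15000.0, "HLA-A*68:02": 60.0},
}
selected = promiscuous_binders(example)
```

Raising `min_alleles` trades the number of candidate epitopes against the breadth of alleles each candidate is predicted to cover.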
Accurate identification of peptides binding to specific Major Histocompatibility Complex Class II (MHC-II) molecules is of great importance for elucidating the underlying mechanism of immune recognition, as well as for developing effective epitope-based vaccines and promising immunotherapies for many severe diseases. Due to extreme polymorphism of MHC-II alleles and the high cost of biochemical experiments, the development of computational methods for accurate prediction of binding peptides of MHC-II molecules, particularly for the ones with few or no experimental data, has become a topic of increasing interest. TEPITOPE is a well-used computational approach because of its good interpretability and relatively high performance. However, TEPITOPE can be applied to only 51 out of over 700 known HLA DR molecules.
We have developed a new method, called TEPITOPEpan, by extrapolating from the binding specificities of HLA DR molecules characterized by TEPITOPE to those uncharacterized. First, each HLA-DR binding pocket is represented by amino acid residues that have close contact with the corresponding peptide binding core residues. Then the pocket similarity between two HLA-DR molecules is calculated as the sequence similarity of the residues. Finally, for an uncharacterized HLA-DR molecule, the binding specificity of each pocket is computed as a weighted average in pocket binding specificities over HLA-DR molecules characterized by TEPITOPE.
The performance of TEPITOPEpan has been extensively evaluated using various data sets from different viewpoints: predicting MHC binding peptides, identifying HLA ligands and T-cell epitopes and recognizing binding cores. Among the four state-of-the-art competing pan-specific methods, for predicting binding specificities of unknown HLA-DR molecules, TEPITOPEpan was roughly the second best method next to NETMHCIIpan-2.0. Additionally, TEPITOPEpan achieved the best performance in recognizing binding cores. We further analyzed the motifs detected by TEPITOPEpan, examining the corresponding literature of immunology. Its online server and PSSMs therein are available at http://www.biokdd.fudan.edu.cn/Service/TEPITOPEpan/.
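The extrapolation step described for TEPITOPEpan (pocket similarity from pocket residues, then a similarity-weighted average of the characterized pocket profiles) can be sketched as follows. The abstract does not specify the similarity measure, so fraction identity is used here as a stand-in; the function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # amino acid -> pocket binding score


def pocket_similarity(residues_a: str, residues_b: str) -> float:
    """Fraction of identical residues at aligned pocket positions.

    Stand-in (assumption) for the sequence similarity used by the method.
    """
    if len(residues_a) != len(residues_b) or not residues_a:
        raise ValueError("pocket residue strings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(residues_a, residues_b))
    return matches / len(residues_a)


def extrapolate_pocket_profile(query_residues: str,
                               characterized: List[Tuple[str, Profile]]) -> Profile:
    """Similarity-weighted average of pocket profiles from characterized molecules."""
    weights = [pocket_similarity(query_residues, res) for res, _ in characterized]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query pocket shares no residues with any characterized pocket")
    amino_acids = characterized[0][1].keys()
    return {
        aa: sum(w * profile[aa] for w, (_, profile) in zip(weights, characterized)) / total
        for aa in amino_acids
    }


# Toy example: two characterized pockets scoring only two amino acids.
known = [
    ("EVHF", {"L": 1.0, "K": 0.0}),  # identical to the query pocket -> weight 1.0
    ("EVWY", {"L": 0.0, "K": 1.0}),  # half identical                -> weight 0.5
]
profile = extrapolate_pocket_profile("EVHF", known)
```

Applying this per pocket yields one extrapolated profile per pocket, which together form a scoring matrix for the uncharacterized molecule.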
NetMHC-3.0 is trained on a large number of quantitative peptide data using both affinity data from the Immune Epitope Database and Analysis Resource (IEDB) and elution data from SYFPEITHI. The method generates high-accuracy predictions of major histocompatibility complex (MHC):peptide binding. The predictions are based on artificial neural networks trained on data from 55 MHC alleles (43 human and 12 non-human), and position-specific scoring matrices (PSSMs) for an additional 67 HLA alleles. As only the MHC class I prediction server is available, predictions are possible for peptides of length 8–11 for all 122 alleles. Artificial neural network predictions are given as actual IC50 values, whereas PSSM predictions are given as log-odds likelihood scores. The output is optionally available as a download for easy post-processing. The training method underlying the server is the best available, and has been used to predict possible MHC-binding peptides in a series of pathogen viral proteomes including SARS, Influenza and HIV, resulting in an average of 75–80% confirmed MHC binders. Here, the performance is further validated and benchmarked using a large set of newly published affinity data, non-redundant to the training set. The server is free to use and available at: http://www.cbs.dtu.dk/services/NetMHC.
Experimental screening of large sets of peptides with respect to their MHC binding capabilities is still very demanding due to the large number of possible peptide sequences and the extensive polymorphism of the MHC proteins. Therefore, there is significant interest in the development of computational methods for predicting the binding capability of peptides to MHC molecules, as a first step towards selecting peptides for actual screening.
We have examined the performance of four diverse MHC Class I prediction methods on comparatively large HLA-A and HLA-B allele peptide binding datasets extracted from the Immune Epitope Database and Analysis Resource (IEDB). The chosen methods span a representative cross-section of available methodology for MHC binding predictions. Until the development of IEDB, such an analysis was not possible, as the available peptide sequence datasets were small and spread out over many separate efforts. We tested three datasets which differ in the IC50 cutoff criteria used to select the binders and non-binders. The best performance was achieved when predictions were performed on the dataset consisting only of strong binders (IC50 less than 10 nM) and clear non-binders (IC50 greater than 10,000 nM). In addition, robust predictions were achieved only for alleles that were represented by a sufficiently large (greater than 200), balanced set of binders and non-binders.
All four methods show good to excellent performance on the comprehensive datasets, with the artificial neural networks based method outperforming the other methods. However, all methods show pronounced difficulties in correctly categorizing intermediate binders.
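The cutoff-based dataset construction described above (strong binders below 10 nM, clear non-binders above 10,000 nM, intermediates excluded) can be written down in a few lines. The function names are hypothetical, and the sketch covers only the labeling step, not any of the four prediction methods.

```python
from typing import Iterable, List, Optional, Tuple


def label_by_ic50(ic50_nm: float,
                  binder_max_nm: float = 10.0,
                  nonbinder_min_nm: float = 10000.0) -> Optional[int]:
    """Return 1 for a strong binder, 0 for a clear non-binder, None for intermediates."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_nm < binder_max_nm:
        return 1
    if ic50_nm > nonbinder_min_nm:
        return 0
    return None  # intermediate affinity: excluded from this dataset


def build_dataset(measurements: Iterable[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Keep only peptides that fall clearly on one side of the cutoffs."""
    dataset = []
    for peptide, ic50_nm in measurements:
        label = label_by_ic50(ic50_nm)
        if label is not None:
            dataset.append((peptide, label))
    return dataset
```

Changing `binder_max_nm` and `nonbinder_min_nm` reproduces the idea of testing several datasets that differ only in their cutoff criteria.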
Hepatitis B virus splice-generated protein (HBSP), encoded by a spliced hepatitis B virus RNA, was recently identified in liver biopsy specimens from patients with chronic active hepatitis B. We investigated the possible generation of immunogenic peptides by the processing of this protein in vivo. We identified a panel of potential epitopes in HBSP by using predictive computational algorithms for peptide binding to HLA molecules. We used transgenic mice devoid of murine major histocompatibility complex (MHC) class I molecules and positive for human MHC class I molecules to characterize immune responses specific for HBSP. Two HLA-A2-restricted peptides and one immunodominant HLA-B7-restricted epitope were identified following the immunization of mice with DNA vectors encoding HBSP. Most importantly, a set of overlapping peptides covering the HBSP sequence induced significant HBSP-specific T-cell responses in peripheral blood mononuclear cells from patients with chronic hepatitis B. The response was multispecific, as several epitopes were recognized by CD8+ and CD4+ human T cells. This study provides the first evidence that this protein generated in vivo from an alternative reading frame of the hepatitis B virus genome activates T-cell responses in hepatitis B virus-infected patients. Given that hepatitis B is an immune response-mediated disease, the detection of T-cell responses directed against HBSP in patients with chronic hepatitis B suggests a potential role for this protein in liver disease progression.
In vertebrates, the major histocompatibility complex (MHC) presents peptides to the immune system. In humans, MHCs are called human leukocyte antigens (HLAs), and some of the loci encoding them are the most polymorphic in the human genome. Different MHC molecules present different subsets of peptides, and knowledge of their binding specificities is important for understanding the differences in the immune response between individuals. Knowledge of motifs may be used to identify epitopes, understand the MHC restriction of epitopes and compare the specificities of different MHC molecules. Several groups have developed prediction methods designed to provide broad allelic coverage of the MHC polymorphism [9-11]. In contrast to conventional allele-specific methods, these methods take both the peptide and the peptide:MHC interaction environment into account, thus allowing extrapolation to accurately predict the binding specificity of uncharacterized MHC molecules. The utility of these algorithms, which predict which peptides MHC molecules bind, is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have therefore developed a web-server, MHC motif viewer, that allows the display of the likely binding motif for all human class I proteins of the loci HLA-A, B, C, and E and for MHC class I molecules from chimpanzee (Pan troglodytes), rhesus monkey (Macaca mulatta) and mouse (Mus musculus). Furthermore, it covers all HLA-DR protein sequences. A special viewing feature, “MHC fight”, allows for display of the specificity of two different MHC molecules side by side. We show how the web-server can be used to discover and display surprising similarities as well as differences between MHC molecules within and between different species. The MHC motif viewer is available at http://www.cbs.dtu.dk/researchgroups/immunology/HLA/Home.html
MHC; HLA; Motifs; Comparison; Viewer; Class I; Class II
T-cells are key players in regulating a specific immune response. Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules. MHC-peptide complexes are potential tools for diagnosis and treatment of pathogens and cancer, as well as for the development of peptide vaccines. Only one in 100 to 200 potential binders actually binds to a certain MHC molecule; therefore, a good prediction method for MHC class I binding peptides can reduce the number of candidate binders that need to be synthesized and tested.
Here, we present a novel approach, SVMHC, based on support vector machines to predict the binding of peptides to MHC class I molecules. This method seems to perform slightly better than two profile-based methods, SYFPEITHI and HLA_BIND. The implementation of SVMHC is quite simple and does not involve any manual steps; therefore, as more data become available, it is trivial to provide predictions for more MHC types. SVMHC currently contains predictions for 26 MHC class I types from the MHCPEP database or, alternatively, 6 MHC class I types from the higher-quality SYFPEITHI database. The prediction models for these MHC types are implemented in a public web service available at http://www.sbc.su.se/svmhc/.
Prediction of MHC class I binding peptides using Support Vector Machines shows high performance and is easy to apply to a large number of MHC class I types. As more peptide data are put into MHC databases, SVMHC can easily be updated to give predictions for additional MHC class I types. We suggest that the number of binding peptides needed for SVM training is at least 20 sequences.
MHC class I; Peptide prediction; Machine Learning; Support Vector Machines
Accurate T-cell epitope prediction is a principal objective of computational vaccinology. As a service to the immunology and vaccinology communities at large, we have implemented, as a server on the World Wide Web, a partial least squares-based multivariate statistical approach to the quantitative prediction of peptide binding to major histocompatibility complexes (MHC), the key checkpoint on the antigen presentation pathway within adaptive cellular immunity. MHCPred implements robust statistical models for both Class I alleles (HLA-A*0101, HLA-A*0201, HLA-A*0202, HLA-A*0203, HLA-A*0206, HLA-A*0301, HLA-A*1101, HLA-A*3301, HLA-A*6801, HLA-A*6802 and HLA-B*3501) and Class II alleles (HLA-DRB*0101, HLA-DRB*0401 and HLA-DRB*0701). MHCPred is available from the URL: http://www.jenner.ac.uk/MHCPred.
Major Histocompatibility class II (MHC-II) molecules sample peptides from the extracellular space, allowing the immune system to detect the presence of foreign microbes from this compartment. Prediction of MHC class II ligands is complicated by the open binding cleft of the MHC class II molecule, allowing binding of peptides extending out of the binding groove. Furthermore, only a few HLA-DR alleles have been characterized with a sufficient number of peptides (100–200 peptides per allele) to derive an accurate description of their binding motif. Little work has been performed characterizing structural properties of MHC class II ligands. Here, we perform one such large-scale analysis. A large set of SYFPEITHI MHC class II ligands covering more than 20 different HLA-DR molecules was analyzed in terms of their secondary structure and surface exposure characteristics in the context of the native structure of the corresponding source protein. We demonstrated that MHC class II ligands are significantly more exposed and have significantly more coil content than other peptides in the same protein with similar predicted binding affinity. We next exploited this observation to derive an improved prediction method for MHC class II ligands by integrating prediction of MHC-peptide binding with prediction of surface exposure and protein secondary structure. This combined prediction method was shown to significantly outperform the state-of-the-art MHC class II peptide binding prediction method when used to identify MHC class II ligands. We also tried to integrate N- and O-glycosylation in our prediction methods, but this additional information was found not to improve prediction performance. In summary, these findings strongly suggest that local structural properties influence antigen processing and/or the accessibility of peptides to the MHC class II molecule.
Antigenic peptides recognized by virus-specific cytotoxic T lymphocytes (CTLs) are presented by major histocompatibility complex (MHC; or human leukocyte antigen [HLA] in humans) molecules, and the peptide selection and presentation strategy of the host has been studied to guide our understanding of cellular immunity and vaccine development. Here, a severe acute respiratory syndrome coronavirus (SARS-CoV) nucleocapsid (N) protein-derived CTL epitope, N1 (QFKDNVILL), restricted by HLA-A*2402 was identified by a series of in vitro studies, including a computer-assisted algorithm for prediction, stabilization of the peptide by co-refolding with HLA-A*2402 heavy chain and β2-microglobulin (β2m), and T2-A24 cell binding. Subsequently, the antigenicity of the peptide was confirmed by enzyme-linked immunospot (ELISPOT), proliferation assays, and HLA-peptide complex tetramer staining using peripheral blood mononuclear cells (PBMCs) from donors who had recovered from SARS. Furthermore, the crystal structure of HLA-A*2402 complexed with peptide N1 was determined, and the featured peptide was characterized by two unexpected intrachain hydrogen bonds which cause the central residues to bulge out of the binding groove. This may contribute to the T-cell receptor (TCR) interaction, showing a host immunodominant peptide presentation strategy. Meanwhile, a rapid and efficient strategy is presented for the determination of naturally presented CTL epitopes in the context of given HLA alleles of interest from long immunogenic overlapping peptides.
The 2009 pandemic influenza was milder than expected. Based on the apparent lack of pre-existing cross-protective antibodies to the A (H1N1)pdm09 strain, it was hypothesized that pre-existing CD4+ T cellular immunity provided the crucial immunity that led to an attenuation of disease severity. We carried out a pilot-scale study by conducting in silico and in vitro T cellular assays in a healthy population to evaluate the pre-existing immunity to the A (H1N1)pdm09 strain.
Large-scale epitope prediction analysis was done by examining the (H1N1) HA proteins available from NCBI. NetMHCIIpan, an epitope prediction tool, was used to identify the putative and shared CD4+ T cell epitopes between seasonal H1N1 and A (H1N1)pdm09 strains. To assess the immunogenicity of these putative epitopes, human IFN-γ-ELISPOT assays were conducted using the peripheral blood mononuclear cells from fourteen healthy human donors. All donors were screened for the HLA-DRB1 alleles.
Epitope-specific CD4+ T cellular memory responses (IFN-γ) to highly conserved HA epitopes were generated in the majority of the donors (93%). A higher magnitude of CD4+ T cell responses was observed in the older adults. The study identified two HA2 immunodominant CD4+ T cell epitopes, of which one was found to be novel.
The current study provides compelling evidence of HA epitope-specific CD4+ T cellular memory towards the A (H1N1)pdm09 strain. These well-characterized epitopes could recruit alternative immunological pathways to overcome the challenge of annual seasonal flu vaccine escape.
Influenza A/H1N1 viruses; Hemagglutinin; CD4+ T cell epitope; Immunodominant epitope; Novel conserved HA epitope
Antigen presenting cells (APCs) sample the extracellular space and present peptides from it to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design, and developing methods for accurate prediction of peptide:MHC interactions plays a central role in epitope discovery. The MHC class II binding groove is open at both ends, making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles.
The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data sets obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele, for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors.
The SMM-align method was shown to outperform other state-of-the-art MHC class II prediction methods. The method predicts quantitative peptide:MHC binding affinity values, making it ideally suited for rational epitope discovery. The method has been trained and evaluated on what is, to our knowledge, the largest benchmark data set publicly available, and covers the nine HLA-DR supertypes suggested as well as three mouse H2-IA alleles. Both the peptide benchmark data set and the SMM-align prediction method (NetMHCII) are made publicly available.
T cells recognize a complex between a specific major histocompatibility complex (MHC) molecule and a particular pathogen-derived epitope. A given epitope will elicit a response only in individuals that express an MHC molecule capable of binding that particular epitope. MHC molecules are extremely polymorphic and over a thousand different human MHC (HLA) alleles are known. A disproportionate amount of MHC polymorphism occurs in positions constituting the peptide-binding region, and as a result, MHC molecules exhibit a widely varying binding specificity. In the design of peptide-based vaccines and diagnostics, the issue of population coverage in relation to MHC polymorphism is further complicated by the fact that different HLA types are expressed at dramatically different frequencies in different ethnicities. Thus, without careful consideration, a vaccine or diagnostic with ethnically biased population coverage could result.
To address this issue, an algorithm was developed to calculate, on the basis of HLA genotypic frequencies, the fraction of individuals expected to respond to a given epitope set, diagnostic or vaccine. The population coverage estimates are based on MHC binding and/or T cell restriction data, although the tool can be utilized in a more general fashion. The algorithm was implemented as a web-application available at .
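The kind of calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the published algorithm: it starts from allele frequencies rather than genotypic frequencies, assumes Hardy-Weinberg proportions within each locus and independence between loci, and uses invented frequencies and hypothetical function names.

```python
from typing import Dict, Set


def locus_coverage(allele_freqs: Dict[str, float], covered: Set[str]) -> float:
    """Fraction of diploid individuals carrying at least one covered allele.

    Assumes Hardy-Weinberg proportions: an individual is missed only if
    both chromosomes carry a non-covered allele.
    """
    p = min(sum(f for allele, f in allele_freqs.items() if allele in covered), 1.0)
    return 1.0 - (1.0 - p) ** 2


def population_coverage(
    freqs_by_locus: Dict[str, Dict[str, float]], covered: Set[str]
) -> float:
    """Combine per-locus coverage, assuming loci are independent."""
    uncovered = 1.0
    for allele_freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(allele_freqs, covered)
    return 1.0 - uncovered


# Invented allele frequencies for one hypothetical population.
freqs = {
    "HLA-A": {"A*0201": 0.25, "A*0101": 0.15},
    "HLA-B": {"B*0702": 0.10},
}
# Alleles restricting at least one epitope in a hypothetical epitope set.
restricting = {"A*0201", "B*0702"}

coverage = population_coverage(freqs, restricting)
print(f"{coverage:.4f}")
```

Running the same function on frequency tables for several ethnic groups gives the per-group coverage values whose spread a vaccine designer would want to keep small.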
We have developed a web-based tool to predict population coverage of T-cell epitope-based diagnostics and vaccines based on MHC binding and/or T cell restriction data. Accordingly, epitope-based vaccines or diagnostics can be designed to maximize population coverage, while minimizing complexity (that is, the number of different epitopes included in the diagnostic or vaccine), and also minimizing the variability of coverage obtained or projected in different ethnic groups.
Recombinant HLA-A2, HLA-B8, or HLA-B53 heavy chain produced in Escherichia coli was combined with recombinant β2-microglobulin (β2m) and a pool of randomly synthesised nonamer peptides. This mixture was allowed to refold to form stable major histocompatibility complex (MHC) class I complexes, which were then purified by gel filtration chromatography. The peptides bound to the MHC class I molecules were subsequently eluted and sequenced as a pool. Peptide binding motifs for these three MHC class I molecules were derived and compared with previously described motifs derived from analysis of naturally processed peptides eluted from the surface of cells. This comparison indicated that the peptides bound by the recombinant MHC class I molecules showed a similar motif to naturally processed and presented peptides, with the exception of the peptide COOH terminus. Whereas the motifs derived from naturally processed peptides eluted from HLA-A2 and HLA-B8 indicated a strong preference for hydrophobic amino acids at the COOH terminus, this preference was not observed in our studies. We propose that this difference reflects the effects of processing or transport on the peptide repertoire available for binding to MHC class I molecules in vivo.
MHC class II proteins bind oligopeptide fragments derived from proteolysis of pathogen antigens, presenting them at the cell surface for recognition by CD4+ T cells. Human MHC class II alleles are grouped into three loci: HLA-DP, HLA-DQ and HLA-DR. In contrast to HLA-DR and HLA-DQ, HLA-DP proteins have not been studied extensively, as they have been viewed as less important in immune responses than DRs and DQs. However, it is now known that HLA-DP alleles are associated with many autoimmune diseases. Quite recently, the X-ray structure of the HLA-DP2 molecule (DPA*0103, DPB1*0201) in complex with a self-peptide derived from the HLA-DR α-chain has been determined. In the present study, we applied a validated molecular docking protocol to a library of 247 modelled peptide-DP2 complexes, seeking to assess the contribution made by each of the 20 naturally occurring amino acids at each of the nine binding core peptide positions and the four flanking residues (two on each side).
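The library size of 247 is consistent with replacing each of the 13 positions of a template peptide by each of the 19 other amino acids (13 × 19 = 247). The Python sketch below builds such a single-substitution library. This construction is an assumption on our part, and the poly-alanine template is a placeholder, not the self-peptide used in the study.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitution_library(template: str) -> List[str]:
    """Return every peptide differing from the template at exactly one position."""
    library = []
    for pos, original in enumerate(template):
        for aa in AMINO_ACIDS:
            if aa != original:
                library.append(template[:pos] + aa + template[pos + 1:])
    return library


# Placeholder 13-mer: two flanking residues, nine core residues, two flanking residues.
template = "A" * 13
library = single_substitution_library(template)
print(len(library))  # 13 positions x 19 substitutions
```

Each library member would then be modelled in the binding groove and docked, giving one energy value per amino acid and position.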
The free binding energies (FBEs) derived from the docking experiments were normalized on a position-dependent (npp) and on an overall basis (nap), and two docking score-based quantitative matrices (DS-QMs) were derived: QMnpp and QMnap. They reveal the amino acid preferences at each of the 13 positions considered in the study. Apart from the leading role of anchor positions p1 and p6, the binding to HLA-DP2 depends on the preferences at p2. No effect of the flanking residues was found on the peptide binding predictions to DP2, although all four of them show strong preferences for particular amino acids. The predictive ability of the DS-QMs was tested using a set of 457 known binders to HLA-DP2, originating from 24 proteins. The sensitivities of the predictions at five different thresholds (5%, 10%, 15%, 20% and 25%) were calculated and compared to the predictions made by the NetMHCII and IEDB servers. Analysis of the DS-QMs indicated an improvement in performance. Additionally, DS-QMs identified the binding cores of several known DP2 binders.
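How a docking-derived energy table could be turned into position-normalized and overall-normalized quantitative matrices is sketched below in Python. The min-max rescaling, the convention that lower energy is more favourable, the toy energies, and all function names are assumptions for illustration; the abstract does not specify the exact normalization formula.

```python
from typing import Dict, List

Matrix = List[Dict[str, float]]  # one {amino acid: value} column per peptide position


def normalize_per_position(fbe: Matrix) -> Matrix:
    """Rescale each position separately to [0, 1]; 1 marks the lowest energy."""
    out = []
    for column in fbe:
        lo, hi = min(column.values()), max(column.values())
        span = (hi - lo) or 1.0  # avoid division by zero for flat columns
        out.append({aa: (hi - e) / span for aa, e in column.items()})
    return out


def normalize_overall(fbe: Matrix) -> Matrix:
    """Rescale all positions together to [0, 1] using the global extremes."""
    values = [e for column in fbe for e in column.values()]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [{aa: (hi - e) / span for aa, e in column.items()} for column in fbe]


def score(peptide: str, qm: Matrix) -> float:
    """Additive score: sum of the matrix entries for each residue in place."""
    return sum(column[aa] for column, aa in zip(qm, peptide))


# Toy two-position energy table (invented values, arbitrary units).
toy_fbe: Matrix = [
    {"A": -10.0, "G": -6.0},
    {"A": -4.0, "G": -2.0},
]
qm_npp = normalize_per_position(toy_fbe)
qm_nap = normalize_overall(toy_fbe)
print(score("GA", qm_npp), score("GA", qm_nap))
```

Per-position normalization gives every position the same weight, whereas overall normalization lets positions with a wider energy range, such as anchors, dominate the score.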
The molecular docking protocol, as applied to a combinatorial library of peptides, models the peptide-HLA-DP2 protein interaction effectively, generating reliable predictions in a quantitative assessment. The method is structure-based and does not require extensive experimental sequence-based data. Thus, it is universal and can be applied to model any peptide-protein interaction.
The Major Histocompatibility Complex (MHC) plays an important role in the human immune system. The MHC is involved in the antigen presentation system, assisting T cells to identify foreign or pathogenic proteins. However, an MHC molecule binding a self-peptide may incorrectly trigger an immune response and cause an autoimmune disease, such as multiple sclerosis. Understanding the molecular mechanism of this process will greatly assist in determining the aetiology of various diseases and in the design of effective drugs. In the present study, we have used the Fresno semi-empirical scoring function and modified the approach to the prediction of peptide-MHC binding by using open-source and public domain software. We apply the method to HLA class II alleles DR15, DR1, and DR4, and the HLA class I allele HLA A2. Our analysis shows that using a large set of binding data and multiple crystal structures improves the predictive capability of the method. The performance of the method is also shown to be correlated with the structural similarity of the crystal structures used. We have exposed some of the obstacles faced by structure-based prediction methods and proposed possible solutions to those obstacles. It is envisaged that these obstacles need to be addressed before the performance of structure-based methods can be on par with that of sequence-based methods.
The highly polymorphic major histocompatibility complex class Ia (MHC-Ia) molecules present a broad array of peptides to the clonotypically diverse αβ T-cell receptors. In contrast, MHC-Ib molecules exhibit limited polymorphism and bind a more restricted peptide repertoire, in keeping with their major role in innate immunity. Nevertheless, some MHC-Ib molecules do play a role in adaptive immunity. While human leukocyte antigen E (HLA-E), an MHC-Ib molecule, binds a very restricted repertoire of peptides, the peptide binding preferences of HLA-G, another class Ib molecule, are less stringent, although the basis by which HLA-G can bind various peptides is unclear. To investigate how HLA-G can accommodate different peptides, we compared the structures of HLA-G bound to three naturally abundant self-peptides (RIIPRHLQL, KGPPAALTL and KLPQAFYIL) and their thermal stabilities. The conformation of HLA-G bound to KGPPAALTL was very similar to that of HLA-G bound to RIIPRHLQL. However, the structure of HLA-G bound to KLPQAFYIL not only differed in the conformation of the bound peptide but also showed a small shift in the α2 helix of HLA-G. Furthermore, the relative stability of HLA-G was observed to be dependent on the nature of the bound peptide. These peptide-dependent effects on the substructure of the monomorphic HLA-G are likely to affect its recognition by receptors of both the innate and adaptive immune systems.
human leukocyte antigen G, HLA-G; structural immunology; innate immunity; antigen presentation; adaptive immunity
The three-dimensional structure of a SARS coronavirus-derived peptide, VQQESSFVM, bound to the human major histocompatibility complex (MHC) class I antigen HLA-B*1501 is presented.
The human leukocyte antigen (HLA) class I system comprises a highly polymorphic set of molecules that specifically bind and present peptides to cytotoxic T cells. HLA-B*1501 is a prototypical member of the HLA-B62 supertype and only two peptide–HLA-B*1501 structures have been determined. Here, the crystal structure of HLA-B*1501 in complex with a SARS coronavirus-derived nonapeptide (VQQESSFVM) has been determined at high resolution (1.87 Å). The peptide is deeply anchored in the B and F pockets, but with the Glu4 residue pointing away from the floor in the peptide-binding groove, making it available for interactions with a potential T-cell receptor.
human leukocyte antigen class I; SARS coronavirus-derived peptides; HLA-B*1501