The identification of peptides binding to major histocompatibility complexes (MHC) is a critical step in the understanding of T cell immune responses. The human MHC genomic region (HLA) is extremely polymorphic comprising several thousand alleles, many encoding a distinct molecule. The potentially unique specificities remain experimentally uncharacterized for the vast majority of HLA molecules. Likewise, for nonhuman species, only a minor fraction of the known MHC molecules have been characterized. Here, we describe a tool, MHCcluster, to functionally cluster MHC molecules based on their predicted binding specificity. The method has a flexible web interface that allows the user to include any MHC of interest in the analysis. The output consists of a static heat map and graphical tree-based visualizations of the functional relationship between MHC variants and a dynamic TreeViewer interface where both the functional relationship and the individual binding specificities of MHC molecules are visualized. We demonstrate that conventional sequence-based clustering will fail to identify the functional relationship between molecules, when applied to MHC system, and only through the use of the predicted binding specificity can a correct clustering be found. Clustering of prevalent HLA-A and HLA-B alleles using MHCcluster confirms the presence of 12 major specificity groups (supertypes) some however with highly divergent specificities. Importantly, some HLA molecules are shown not to fit any supertype classification. Also, we use MHCcluster to show that chimpanzee MHC class I molecules have a reduced functional diversity compared to that of HLA class I molecules. MHCcluster is available at www.cbs.dtu.dk/services/MHCcluster-2.0.
MHC; HLA; Binding motif; Functional clustering; MHC specificity; Supertypes
The interaction between antibodies and antigens is one of the most important immune system mechanisms for clearing infectious organisms from the host. Antibodies bind to antigens at sites referred to as B-cell epitopes. Identification of the exact location of B-cell epitopes is essential in several biomedical applications such as; rational vaccine design, development of disease diagnostics and immunotherapeutics. However, experimental mapping of epitopes is resource intensive making in silico methods an appealing complementary approach. To date, the reported performance of methods for in silico mapping of B-cell epitopes has been moderate. Several issues regarding the evaluation data sets may however have led to the performance values being underestimated: Rarely, all potential epitopes have been mapped on an antigen, and antibodies are generally raised against the antigen in a given biological context not against the antigen monomer. Improper dealing with these aspects leads to many artificial false positive predictions and hence to incorrect low performance values. To demonstrate the impact of proper benchmark definitions, we here present an updated version of the DiscoTope method incorporating a novel spatial neighborhood definition and half-sphere exposure as surface measure. Compared to other state-of-the-art prediction methods, Discotope-2.0 displayed improved performance both in cross-validation and in independent evaluations. Using DiscoTope-2.0, we assessed the impact on performance when using proper benchmark definitions. For 13 proteins in the training data set where sufficient biological information was available to make a proper benchmark redefinition, the average AUC performance was improved from 0.791 to 0.824. Similarly, the average AUC performance on an independent evaluation data set improved from 0.712 to 0.727. Our results thus demonstrate that given proper benchmark definitions, B-cell epitope prediction methods achieve highly significant predictive performances suggesting these tools to be a powerful asset in rational epitope discovery. The updated version of DiscoTope is available at www.cbs.dtu.dk/services/DiscoTope-2.0.
The human immune system has an incredible ability to fight pathogens (bacterial, fungal and viral infections). One of the most important immune system events involved in clearing infectious organisms is the interaction between the antibodies and antigens (molecules such as proteins from the pathogenic organism). Antibodies bind to antigens at sites known as B-cell epitopes. Hence, identification of areas on the surface antigens capable of binding to antibodies (also known as B-cell epitopes) may aid the development of various immune related applications (e.g. vaccines and immunotherapeutic). However, experimental identification of B-cell epitopes is a resource intensive task, thereby making computer-aided methods an appealing complementary approach. Previously reported performances of methods for B cell epitope predictive have been moderate. Here, we present an updated version of the B-cell epitope prediction method; DiscoTope, that on the basis of a protein structure and epitope propensity scores predicts residues likely to be involved in B-cell epitopes. We demonstrate that the low performances to some extent can be explained by poorly defined benchmarks, and that inclusion of additional biological information greatly enhances the predictive performance. This suggests that, given proper benchmark definitions, state-of-the-art B cell epitope prediction methods perform significantly better than generally assumed.
In this paper, we describe the methodologies behind three different aspects of the NetMHC family for prediction of MHC class I binding, mainly to HLAs. We we have updated the prediction servers servers, NetMHC-3.2, NetMHCpan-2.2, and a new consensus method, NetMHCcons, which, in their previous versions, have been evaluated to be among the very best performing MHC:peptide binding predictors available. Here we describe the background for these methods, and the rationale behind the different optimisation steps implemented in the methods. We go through the practical use of the methods, which are publicly available in the form of relatively fast and simple web interfaces. Furthermore, we will review results optained in actual epitope discovery projects where previous implementations of the described methods have been used in the initial selection of potential epitopes. Selected potential epitopes were all evaluated experimentally using ex vivo assays.
Prediction methods as well as experimental methods for T-cell epitope discovery have developed significantly in recent years. High-throughput experimental methods have made it possible to perform full-length protein scans for epitopes restricted to a limited number of MHC alleles. The high costs and limitations regarding the number of proteins and MHC alleles that are feasibly handled by such experimental methods have made in silico prediction models of high interest. MHC binding prediction methods are today of a very high quality and can predict MHC binding peptides with high accuracy. This is possible for a large range of MHC alleles and relevant length of binding peptides. The predictions can easily be performed for complete proteomes of any size. Prediction methods are still, however, dependent on good experimental methods for validation, and should merely be used as a guide for rational epitope discovery. We expect prediction methods as well as experimental validation methods to continue to develop and that we will soon see clinical trials of products whose development has been guided by prediction methods.
CTL; epitope; HLA; MHC; prediction; T cell; vaccine
CD4+ T cells orchestrate immunity against viral infections, but their importance in HIV infection remains controversial. Nevertheless, comprehensive studies have associated increase in breadth and functional characteristics of HIV-specific CD4+ T cells with decreased viral load. A major challenge for the identification of HIV-specific CD4+ T cells targeting broadly reactive epitopes in populations with diverse ethnic background stems from the vast genomic variation of HIV and the diversity of the host cellular immune system. Here, we describe a novel epitope selection strategy, PopCover, that aims to resolve this challenge, and identify a set of potential HLA class II-restricted HIV epitopes that in concert will provide optimal viral and host coverage. Using this selection strategy, we identified 64 putative epitopes (peptides) located in the Gag, Nef, Env, Pol and Tat protein regions of HIV. In total, 73% of the predicted peptides were found to induce HIV-specific CD4+ T cell responses. The Gag and Nef peptides induced most responses. The vast majority of the peptides (93%) had predicted restriction to the patient’s HLA alleles. Interestingly, the viral load in viremic patients was inversely correlated to the number of targeted Gag peptides. In addition, the predicted Gag peptides were found to induce broader polyfunctional CD4+ T cell responses compared to the commonly used Gag-p55 peptide pool. These results demonstrate the power of the PopCover method for the identification of broadly recognized HLA class II-restricted epitopes. All together, selection strategies, such as PopCover, might with success be used for the evaluation of antigen-specific CD4+ T cell responses and design of future vaccines.
The immune epitope database analysis resource (IEDB-AR: http://tools.iedb.org) is a collection of tools for prediction and analysis of molecular targets of T- and B-cell immune responses (i.e. epitopes). Since its last publication in the NAR webserver issue in 2008, a new generation of peptide:MHC binding and T-cell epitope predictive tools have been added. As validated by different labs and in the first international competition for predicting peptide:MHC-I binding, their predictive performances have improved considerably. In addition, a new B-cell epitope prediction tool was added, and the homology mapping tool was updated to enable mapping of discontinuous epitopes onto 3D structures. Furthermore, to serve a wider range of users, the number of ways in which IEDB-AR can be accessed has been expanded. Specifically, the predictive tools can be programmatically accessed using a web interface and can also be downloaded as software packages.
Epitopes from all available full-length sequences of yellow fever virus (YFV) and dengue fever virus (DENV) restricted by Human Leukocyte Antigen class I (HLA-I) alleles covering 12 HLA-I supertypes were predicted using the NetCTL algorithm. A subset of 179 predicted YFV and 158 predicted DENV epitopes were selected using the EpiSelect algorithm to allow for optimal coverage of viral strains. The selected predicted epitopes were synthesized and approximately 75% were found to bind the predicted restricting HLA molecule with an affinity, KD, stronger than 500 nM. The immunogenicity of 25 HLA-A*02:01, 28 HLA-A*24:02 and 28 HLA-B*07:02 binding peptides was tested in three HLA-transgenic mice models and led to the identification of 17 HLA-A*02:01, 4 HLA-A*2402 and 4 HLA-B*07:02 immunogenic peptides. The immunogenic peptides bound HLA significantly stronger than the non-immunogenic peptides. All except one of the immunogenic peptides had KD below 100 nM and the peptides with KD below 5 nM were more likely to be immunogenic. In addition, all the immunogenic peptides that were identified as having a high functional avidity had KD below 20 nM. A*02:01 transgenic mice were also inoculated twice with the 17DD YFV vaccine strain. Three of the YFV A*02:01 restricted peptides activated T-cells from the infected mice in vitro. All three peptides that elicited responses had an HLA binding affinity of 2 nM or less. The results indicate the importance of the strength of HLA binding in shaping the immune response.
β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17 – 0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively.
The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
Binding of peptides to Major Histocompatibility class II (MHC-II) molecules play a central role in governing responses of the adaptive immune system. MHC-II molecules sample peptides from the extracellular space allowing the immune system to detect the presence of foreign microbes from this compartment. Predicting which peptides bind to an MHC-II molecule is therefore of pivotal importance for understanding the immune response and its effect on host-pathogen interactions. The experimental cost associated with characterizing the binding motif of an MHC-II molecule is significant and large efforts have therefore been placed in developing accurate computer methods capable of predicting this binding event. Prediction of peptide binding to MHC-II is complicated by the open binding cleft of the MHC-II molecule, allowing binding of peptides extending out of the binding groove. Moreover, the genes encoding the MHC molecules are immensely diverse leading to a large set of different MHC molecules each potentially binding a unique set of peptides. Characterizing each MHC-II molecule using peptide-screening binding assays is hence not a viable option.
Here, we present an MHC-II binding prediction algorithm aiming at dealing with these challenges. The method is a pan-specific version of the earlier published allele-specific NN-align algorithm and does not require any pre-alignment of the input data. This allows the method to benefit also from information from alleles covered by limited binding data. The method is evaluated on a large and diverse set of benchmark data, and is shown to significantly out-perform state-of-the-art MHC-II prediction methods. In particular, the method is found to boost the performance for alleles characterized by limited binding data where conventional allele-specific methods tend to achieve poor prediction accuracy.
The method thus shows great potential for efficient boosting the accuracy of MHC-II binding prediction, as accurate predictions can be obtained for novel alleles at highly reduced experimental costs. Pan-specific binding predictions can be obtained for all alleles with know protein sequence and the method can benefit by including data in the training from alleles even where only few binders are known. The method and benchmark data are available at http://www.cbs.dtu.dk/services/NetMHCIIpan-2.0
Sequence based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreover have so-called pan-specific methods been developed, which allow for prediction of peptide binding to MHC alleles characterized by limited or no peptide binding data. Most of the developed methods are publicly available, and have proven to be very useful as a shortcut in epitope discovery. Here, we will go through some of the history of sequence-based predictions of helper as well as cytotoxic T cell epitopes. We will focus on some of the most accurate methods and their basic background.
CPHmodels-3.0 is a web server predicting protein 3D structure by use of single template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. A query sequence is first attempted modeled using the fast CPHmodels-2.0 profile–profile scoring function suitable for close homology modeling. The new computational costly remote homology-modeling algorithm is only engaged provided that no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128), 74% were predicted as high reliability models (87 out of 117). These achieved an average RMSD of 4.6 Å when superimposed to the 3D structure. The remaining 26% low reliably models (30 out of 117) could superimpose to the true 3D structure with an average RMSD of 9.3 Å. These performance values place the CPHmodels-3.0 method in the group of high performing 3D prediction tools. Beside its accuracy, one of the important features of the method is its speed. For most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.
Reliable predictions of immunogenic peptides are essential in rational vaccine design and can minimize the experimental effort needed to identify epitopes. In this work, we describe a pan-specific major histocompatibility complex (MHC) class I epitope predictor, NetCTLpan. The method integrates predictions of proteasomal cleavage, transporter associated with antigen processing (TAP) transport efficiency, and MHC class I binding affinity into a MHC class I pathway likelihood score and is an improved and extended version of NetCTL. The NetCTLpan method performs predictions for all MHC class I molecules with known protein sequence and allows predictions for 8-, 9-, 10-, and 11-mer peptides. In order to meet the need for a low false positive rate, the method is optimized to achieve high specificity. The method was trained and validated on large datasets of experimentally identified MHC class I ligands and cytotoxic T lymphocyte (CTL) epitopes. It has been reported that MHC molecules are differentially dependent on TAP transport and proteasomal cleavage. Here, we did not find any consistent signs of such MHC dependencies, and the NetCTLpan method is implemented with fixed weights for proteasomal cleavage and TAP transport for all MHC molecules. The predictive performance of the NetCTLpan method was shown to outperform other state-of-the-art CTL epitope prediction methods. Our results further confirm the importance of using full-type human leukocyte antigen restriction information when identifying MHC class I epitopes. Using the NetCTLpan method, the experimental effort to identify 90% of new epitopes can be reduced by 15% and 40%, respectively, when compared to the NetMHCpan and NetCTL methods. The method and benchmark datasets are available at http://www.cbs.dtu.dk/services/NetCTLpan/.
Electronic supplementary material
The online version of this article (doi:10.1007/s00251-010-0441-4) contains supplementary material, which is available to authorized users.
MHC class I pathway; HLA; Pan-specific prediction; CTL epitope; MHC polymorphism
Motivation: MHC:peptide binding plays a central role in activating the immune surveillance. Computational approaches to determine T-cell epitopes restricted to any given major histocompatibility complex (MHC) molecule are of special practical value in the development of for instance vaccines with broad population coverage against emerging pathogens. Methods have recently been published that are able to predict peptide binding to any human MHC class I molecule. In contrast to conventional allele-specific methods, these methods do allow for extrapolation to uncharacterized MHC molecules. These pan-specific human lymphocyte antigen (HLA) predictors have not previously been compared using independent evaluation sets.
Result: A diverse set of quantitative peptide binding affinity measurements was collected from Immune Epitope database (IEDB), together with a large set of HLA class I ligands from the SYFPEITHI database. Based on these datasets, three different pan-specific HLA web-accessible predictors NetMHCpan, adaptive double threading (ADT) and kernel-based inter-allele peptide binding prediction system (KISS) were evaluated. The performance of the pan-specific predictors was also compared with a well performing allele-specific MHC class I predictor, NetMHC, as well as a consensus approach integrating the predictions from the NetMHC and NetMHCpan methods.
Conclusions: The benchmark demonstrated that pan-specific methods do provide accurate predictions also for previously uncharacterized MHC molecules. The NetMHCpan method trained to predict actual binding affinities was consistently top ranking both on quantitative (affinity) and binary (ligand) data. However, the KISS method trained to predict binary data was one of the best performing methods when benchmarked on binary data. Finally, a consensus method integrating predictions from the two best performing methods was shown to improve the prediction accuracy.
Supplementary information: Supplementary data are available at Bioinformatics online.
Estimation of the reliability of specific real value predictions is nontrivial and the efficacy of this is often questionable. It is important to know if you can trust a given prediction and therefore the best methods associate a prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability for each prediction, in the form of a Z-score.
An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures where reliabilities are obtained by post-processing the output.
The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of the currently best public available method, Real-SPINE. Both methods associate a reliability score with the individual predictions. However, our implementation of reliability scores in the form of a Z-score is shown to be the more informative measure for discriminating good predictions from bad ones in the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the upper 20% of predictions sorted according to reliability. For this subset, values of 0.79 and 0.74 are obtained using our and the compared method, respectively. This tendency is true for any selected subset.
CD4 positive T helper cells control many aspects of specific immunity. These cells are specific for peptides derived from protein antigens and presented by molecules of the extremely polymorphic major histocompatibility complex (MHC) class II system. The identification of peptides that bind to MHC class II molecules is therefore of pivotal importance for rational discovery of immune epitopes. HLA-DR is a prominent example of a human MHC class II. Here, we present a method, NetMHCIIpan, that allows for pan-specific predictions of peptide binding to any HLA-DR molecule of known sequence. The method is derived from a large compilation of quantitative HLA-DR binding events covering 14 of the more than 500 known HLA-DR alleles. Taking both peptide and HLA sequence information into account, the method can generalize and predict peptide binding also for HLA-DR molecules where experimental data is absent. Validation of the method includes identification of endogenously derived HLA class II ligands, cross-validation, leave-one-molecule-out, and binding motif identification for hitherto uncharacterized HLA-DR molecules. The validation shows that the method can successfully predict binding for HLA-DR molecules—even in the absence of specific data for the particular molecule in question. Moreover, when compared to TEPITOPE, currently the only other publicly available prediction method aiming at providing broad HLA-DR allelic coverage, NetMHCIIpan performs equivalently for alleles included in the training of TEPITOPE while outperforming TEPITOPE on novel alleles. We propose that the method can be used to identify those hitherto uncharacterized alleles, which should be addressed experimentally in future updates of the method to cover the polymorphism of HLA-DR most efficiently. We thus conclude that the presented method meets the challenge of keeping up with the MHC polymorphism discovery rate and that it can be used to sample the MHC “space,” enabling a highly efficient iterative process for improving MHC class II binding predictions.
CD4 positive T helper cells provide essential help for stimulation of both cellular and humoral immune reactions. T helper cells recognize peptides presented by molecules of the major histocompatibility complex (MHC) class II system. HLA-DR is a prominent example of a human MHC class II locus. The HLA molecules are extremely polymorphic, and more than 500 different HLA-DR protein sequences are known today. Each HLA-DR molecule potentially binds a unique set of antigenic peptides, and experimental characterization of the binding specificity for each molecule would be an immense and highly costly task. Only a very limited set of MHC molecules has been characterized experimentally. We have demonstrated earlier that it is possible to derive accurate predictions for MHC class I proteins by interpolating information from neighboring molecules. It is not straightforward to take a similar approach to derive pan-specific HLA-DR class II predictions because the HLA class II molecules can bind peptides of very different lengths. Here, we nonetheless show that this is indeed possible. We develop an HLA-DR pan-specific method that allows for prediction of binding to any HLA-DR molecule of known sequence—even in the absence of specific data for the particular molecule in question.
We present a new release of the immune epitope database analysis resource (IEDB-AR, http://tools.immuneepitope.org), a repository of web-based tools for the prediction and analysis of immune epitopes. New functionalities have been added to most of the previously implemented tools, and a total of eight new tools were added, including two B-cell epitope prediction tools, four T-cell epitope prediction tools and two analysis tools.
NetMHC-3.0 is trained on a large number of quantitative peptide data using both affinity data from the Immune Epitope Database and Analysis Resource (IEDB) and elution data from SYFPEITHI. The method generates high-accuracy predictions of major histocompatibility complex (MHC): peptide binding. The predictions are based on artificial neural networks trained on data from 55 MHC alleles (43 Human and 12 non-human), and position-specific scoring matrices (PSSMs) for additional 67 HLA alleles. As only the MHC class I prediction server is available, predictions are possible for peptides of length 8–11 for all 122 alleles. artificial neural network predictions are given as actual IC50 values whereas PSSM predictions are given as a log-odds likelihood scores. The output is optionally available as download for easy post-processing. The training method underlying the server is the best available, and has been used to predict possible MHC-binding peptides in a series of pathogen viral proteomes including SARS, Influenza and HIV, resulting in an average of 75–80% confirmed MHC binders. Here, the performance is further validated and benchmarked using a large set of newly published affinity data, non-redundant to the training set. The server is free of use and available at: http://www.cbs.dtu.dk/services/NetMHC.
Reliable predictions of Cytotoxic T lymphocyte (CTL) epitopes are essential for rational vaccine design. Most importantly, they can minimize the experimental effort needed to identify epitopes. NetCTL is a web-based tool designed for predicting human CTL epitopes in any given protein. It does so by integrating predictions of proteasomal cleavage, TAP transport efficiency, and MHC class I affinity. At least four other methods have been developed recently that likewise attempt to predict CTL epitopes: EpiJen, MAPPP, MHC-pathway, and WAPP. In order to compare the performance of prediction methods, objective benchmarks and standardized performance measures are needed. Here, we develop such large-scale benchmark and corresponding performance measures and report the performance of an updated version 1.2 of NetCTL in comparison with the four other methods.
We define a number of performance measures that can handle the different types of output data from the five methods. We use two evaluation datasets consisting of known HIV CTL epitopes and their source proteins. The source proteins are split into all possible 9 mers and except for annotated epitopes; all other 9 mers are considered non-epitopes. In the RANK measure, we compare two methods at a time and count how often each of the methods rank the epitope highest. In another measure, we find the specificity of the methods at three predefined sensitivity values. Lastly, for each method, we calculate the percentage of known epitopes that rank within the 5% peptides with the highest predicted score.
NetCTL-1.2 is demonstrated to have a higher predictive performance than EpiJen, MAPPP, MHC-pathway, and WAPP on all performance measures. The higher performance of NetCTL-1.2 as compared to EpiJen and MHC-pathway is, however, not statistically significant on all measures. In the large-scale benchmark calculation consisting of 216 known HIV epitopes covering all 12 recognized HLA supertypes, the NetCTL-1.2 method was shown to have a sensitivity among the 5% top-scoring peptides above 0.72. On this dataset, the best of the other methods achieved a sensitivity of 0.64. The NetCTL-1.2 method is available at .
All used datasets are available at .
Over the past decade a number of bioinformatics tools have been developed that use genomic sequences as input to predict to which parts of a microbe the immune system will react, the so-called epitopes. Many predicted epitopes have later been verified experimentally, demonstrating the usefulness of such predictions. At the same time, simulation models have been developed that describe the dynamics of different immune cell populations and their interactions with microbes. These models have been used to explain experimental findings where timing is of importance, such as the time between administration of a vaccine and infection with the microbe that the vaccine is intended to protect against. In this paper, we outline a framework for integration of these two approaches. As an example, we develop a model in which HIV dynamics are correlated with genomics data. For the first time, the fitness of wild type and mutated virus are assessed by means of a sequence-dependent scoring matrix, derived from a BLOSUM matrix, that links protein sequences to growth rates of the virus in the mathematical model. A combined bioinformatics and systems biology approach can lead to a better understanding of immune system-related diseases where both timing and genomic information are of importance.
mathematical modelling; HIV; bioinformatics; simulation; immunology; immune system
Binding of peptides to Major Histocompatibility Complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC class I system (HLA-I) is extremely polymorphic. The number of registered HLA-I molecules has now surpassed 1500. Characterizing the specificity of each separately would be a major undertaking.
Here, we have drawn on a large database of known peptide-HLA-I interactions to develop a bioinformatics method, which takes both peptide and HLA sequence information into account, and generates quantitative predictions of the affinity of any peptide-HLA-I interaction. Prospective experimental validation of peptides predicted to bind to previously untested HLA-I molecules, cross-validation, and retrospective prediction of known HIV immune epitopes and endogenous presented peptides, all successfully validate this method. We further demonstrate that the method can be applied to perform a clustering analysis of MHC specificities and suggest using this clustering to select particularly informative novel MHC molecules for future biochemical and functional analysis.
Encompassing all HLA molecules, this high-throughput computational method lends itself to epitope searches that are not only genome- and pathogen-wide, but also HLA-wide. Thus, it offers a truly global analysis of immune responses supporting rational development of vaccines and immunotherapy. It also promises to provide new basic insights into HLA structure-function relationships. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan.
Antigen presenting cells (APCs) sample the extra cellular space and present peptides from here to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design and developing methods for accurate prediction of the peptide:MHC interactions play a central role in epitope discovery. The MHC class II binding groove is open at both ends making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles.
The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data set obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors.
The SMM-align method was shown to outperform other state of the art MHC class II prediction methods. The method predicts quantitative peptide:MHC binding affinity values, making it ideally suited for rational epitope discovery. The method has been trained and evaluated on the, to our knowledge, largest benchmark data set publicly available and covers the nine HLA-DR supertypes suggested as well as three mouse H2-IA allele. Both the peptide benchmark data set, and SMM-align prediction method (NetMHCII) are made publicly available.
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptide likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.
In higher organisms, major histocompatibility complex (MHC) class I molecules are present on nearly all cell surfaces, where they present peptides to T lymphocytes of the immune system. The peptides are derived from proteins expressed inside the cell, and thereby allow the immune system to “peek inside” cells to detect infections or cancerous cells. Different MHC molecules exist, each with a distinct peptide binding specificity. Many algorithms have been developed that can predict which peptides bind to a given MHC molecule. These algorithms are used by immunologists to, for example, scan the proteome of a given virus for peptides likely to be presented on infected cells. In this paper, the authors provide a large-scale experimental dataset of quantitative MHC–peptide binding data. Using this dataset, they compare how well different approaches are able to identify binding peptides. This comparison identifies an artificial neural network as the most successful approach to peptide binding prediction currently available. This comparison serves as a benchmark for future tool development, allowing bioinformaticians to document advances in tool development as well as guiding immunologists to choose good prediction algorithm.