Tools provided on the IEDB Analysis Resource (IEDB-AR) website can be grouped into those that make predictions and those that carry out analyses. Predictive tools can be further grouped by the immune recognition context they target: (i) peptide:MHC binding, (ii) antigen processing and (iii) B-cell receptor/antibody binding. Since the last publication on the analysis resource, major updates and additions have been made to tools in all categories. The following sections describe these updates in detail.
An overview of immune-epitope related bioinformatics tools provided by the IEDB-AR
All T-cell epitope prediction tools take protein sequences in single-letter amino acid code as input. Once submitted, the protein sequences are broken into peptides of the lengths specified by the user. Predictions are made against the set of MHC molecules chosen by the user. The output is a list of peptides and their predicted scores, indicating their likelihood of binding or of being epitopes.
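The peptide-generation step described above can be sketched as a sliding window over the input sequence. This is a minimal illustration, assuming the 20 standard single-letter residue codes and 1-based start positions in the output; the function name `split_into_peptides` is hypothetical and not taken from the IEDB-AR code.

```python
# Hypothetical sketch: enumerate all overlapping peptides of the requested lengths.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def split_into_peptides(sequence, lengths):
    """Return (start, length, peptide) tuples; start is 1-based (assumed convention)."""
    seq = "".join(sequence.split()).upper()  # drop whitespace/newlines, normalize case
    unknown = set(seq) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    peptides = []
    for k in lengths:
        # a sequence of length L yields L - k + 1 overlapping peptides of length k
        for start in range(len(seq) - k + 1):
            peptides.append((start + 1, k, seq[start:start + k]))
    return peptides


if __name__ == "__main__":
    for row in split_into_peptides("SLYNTVATLY", [9, 10]):
        print(row)
```

Each peptide produced this way would then be scored against every MHC molecule the user selected.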
MHC class I peptide binding predictions
For peptide:MHC-I binding prediction, a number of improvements and additions have been made since the last web-server issue article on the IEDB-AR in 2008. Specifically, larger datasets have been used to re-train all existing prediction methods, adding multiple new MHC molecules in the process. Currently, binding affinity data and corresponding predictors are available for 56 human, 19 non-human primate and six mouse molecules, an increase of 42% in the number of molecules covered.
Furthermore, three powerful new prediction methods have been added. The first, SMMPMBEC, is an improvement of the SMM scoring matrix-based predictor (3) that implements a Bayesian approach favoring scoring matrices consistent with known amino acid similarities. This is particularly helpful when estimating the contribution of specific residues to peptide binding for molecules characterized by limited binding data. The second newly added method, NetMHCpan (4), is a neural-network-based predictor similar to NetMHC (5), with the crucial difference that NetMHCpan trains a single network ensemble on all MHC molecules simultaneously, using both the peptide and the contact residues from the MHC sequence as input. This allows the method to extrapolate and estimate binding for any MHC molecule, including those not in the training set (6), by leveraging known sequence:binding-affinity relationships and extending them to MHC molecules with no binding data.
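To make the pan-specific idea concrete, the sketch below builds one input vector per (peptide, MHC molecule) pair by concatenating an encoding of the peptide with an encoding of the MHC contact residues, so that measurements from all molecules share one feature space and can train a single model. It assumes a plain one-hot encoding and uses made-up contact-residue strings and allele names; NetMHCpan's actual residue encoding and contact-residue definitions differ and are described in its own publications.

```python
# Illustrative sketch of pan-specific input construction (NOT NetMHCpan's encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Flattened one-hot encoding: 20 values per residue."""
    vec = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1
        vec.extend(row)
    return vec


def pan_input(peptide, mhc_contacts):
    """Peptide features followed by MHC contact-residue features."""
    return one_hot(peptide) + one_hot(mhc_contacts)


# Hypothetical contact-residue strings, for illustration only.
MHC_CONTACTS = {"allele_A": "YFAMY", "allele_B": "YHTKY"}


def build_training_set(measurements):
    """measurements: (peptide, allele, affinity) triples pooled across alleles."""
    X = [pan_input(pep, MHC_CONTACTS[allele]) for pep, allele, _ in measurements]
    y = [affinity for _, _, affinity in measurements]
    return X, y
```

Because each molecule is described by its residues rather than by its name, a molecule absent from the training data can still be encoded once its contact residues are known, which is what lets a pan-specific model extrapolate.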
Consensus, the third newly added method, has not been previously described. It was motivated by observations in the machine-learning community that combining the outputs of different predictors can yield higher predictive performance than any individual predictor (7). In the current implementation, predictions from the individual predictors are first transformed into percentile scores, which allows comparison across predictors on a uniform scale. For a given peptide and predictor, the percentile score is the percentage of random peptides, sampled from naturally occurring proteins, that score better than the peptide. The final consensus score for the peptide is the median of its percentile scores across the different predictors. A notable large-scale application of an early version of the consensus method has been described in Moutaftsi et al.
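The percentile-and-median procedure described above can be sketched as follows. This is a minimal illustration, assuming that lower raw scores mean stronger binding for every predictor (as with predicted IC50 values) and that a background set of random-peptide scores per predictor is already available; the function names and predictor keys are placeholders, not IEDB-AR code.

```python
from bisect import bisect_left
from statistics import median


def percentile_rank(score, background_scores):
    """Percentage of background (random-peptide) scores strictly better than `score`.

    Assumes lower score = stronger predicted binding (e.g. IC50 in nM).
    """
    ranked = sorted(background_scores)
    n_better = bisect_left(ranked, score)  # number of background scores < score
    return 100.0 * n_better / len(ranked)


def consensus_percentile(peptide_scores, backgrounds):
    """Median percentile rank across predictors; lower = stronger consensus binder.

    peptide_scores: predictor name -> raw score of one peptide
    backgrounds:    predictor name -> raw scores of random natural peptides
    """
    ranks = [percentile_rank(score, backgrounds[name])
             for name, score in peptide_scores.items()]
    return median(ranks)
```

Converting to percentiles first is what makes the median meaningful: raw IC50 predictions, matrix scores and library scores live on different scales, but their percentile ranks do not.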
Benchmarks for the different tools have been reported in the respective publications; for class I, for example, NetMHCpan-2.0 achieved an average AUC of 0.881 on HLA-A and -B molecules (4). Of special note, an MHC class I prediction competition was recently held for the first time (9). On blind peptide:MHC binding datasets generated by an independent group, the consensus method hosted at the IEDB-AR consistently ranked high (i.e. within the top 5 of 20 entries). The blind datasets covered three molecules, each with 9- and 10-mers. The combined dataset consisted of ~1200 measurements, 20% of which were binders. On this dataset, the IEDB consensus prediction achieved an average AUC of 0.96, notably higher than our own performance estimates, and was surpassed only by the NetMHCcons (10), NetMHC and NetMHCpan methods.
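The AUC values quoted above can be read as the probability that a randomly chosen binder is scored higher than a randomly chosen non-binder. A minimal sketch of that pairwise (Mann-Whitney) computation, assuming higher scores indicate predicted binders and counting ties as one half; it is illustrative and not the evaluation code used in the competition.

```python
def auc(scores, labels):
    """Pairwise AUC. labels are truthy for binders; higher score = predicted binder."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For IC50-style predictions, where lower means stronger binding, the scores would be negated before calling `auc`. The quadratic pairwise loop is adequate for a dataset of ~1200 measurements.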
To date, the top-performing methods have been entirely sequence based, despite the potential advantages of structure-based methods. It is expected, however, that as structural modeling techniques improve, predictive methods based on 3D structures with comparable accuracy will emerge (11).
MHC class II peptide-binding predictions
As with the class I tools, all class II tools have been re-trained using newly available binding data. Importantly, a large set of data has become available for prevalent HLA-DP and -DQ molecules, for which very few data were previously available. This was not due to a lack of importance of these molecules but rather to the significantly greater experimental effort required to characterize them compared with HLA-DR molecules. With these new data, the prediction methods cover a large fraction of human MHC class II molecules for the first time (13). The molecules covered were selected for experimental characterization based on their high frequency in the worldwide human population.
HLA-DP, -DQ and -DR molecules chosen for their high frequency in the human population. Allele frequency data are provided by dbMHC.
Two new methods for class II binding prediction, NN-align (8) and NetMHCIIpan (9), were added to the resource website. Both are artificial neural network-based methods trained with a procedure that concurrently optimizes the peptide alignment and the network weights, as described in (14–16). NN-align is molecule-specific (i.e. a separate network is trained for each MHC class II molecule), whereas NetMHCIIpan is HLA-DR pan-specific (i.e. a single network is trained covering all HLA-DR molecules). Both methods encode the peptide flanking regions (PFR), the PFR length and the peptide length to boost predictive performance. As with NetMHCpan, the pan-specificity of NetMHCIIpan is achieved by including both the peptide and the contact residues from the MHC sequences in the network training. Both new methods have demonstrated superior predictive performance compared with the earlier methods on the resource website, with average AUC values of 0.821 (NN-align) and 0.846 (NetMHCIIpan) when benchmarked against a large set of HLA-DR molecules (15).
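The description above implies that each class II peptide is considered through candidate binding cores, each with its own flanking regions and length features. The sketch below only enumerates such candidates; the 9-residue core, the 3-residue flank window and the field names are assumptions for illustration, and the network that scores candidates and learns the alignment is not shown.

```python
from dataclasses import dataclass

CORE_LEN = 9    # assumed binding-core length
PFR_WINDOW = 3  # assumed number of flanking residues encoded on each side


@dataclass
class CoreCandidate:
    offset: int         # 0-based start of the core within the peptide
    core: str
    left_pfr: str       # up to PFR_WINDOW residues immediately N-terminal of the core
    right_pfr: str      # up to PFR_WINDOW residues immediately C-terminal of the core
    left_pfr_len: int   # full length of the N-terminal flank
    right_pfr_len: int  # full length of the C-terminal flank
    peptide_len: int


def core_candidates(peptide):
    """Enumerate every possible binding core together with its flank features."""
    candidates = []
    for i in range(len(peptide) - CORE_LEN + 1):
        left, right = peptide[:i], peptide[i + CORE_LEN:]
        candidates.append(CoreCandidate(
            offset=i,
            core=peptide[i:i + CORE_LEN],
            left_pfr=left[-PFR_WINDOW:],
            right_pfr=right[:PFR_WINDOW],
            left_pfr_len=len(left),
            right_pfr_len=len(right),
            peptide_len=len(peptide),
        ))
    return candidates


if __name__ == "__main__":
    for c in core_candidates("PKYVKQNTLKLAT"):
        print(c.offset, c.left_pfr, c.core, c.right_pfr)
```

In broad terms, concurrent alignment and weight training lets the current network pick the highest-scoring candidate as the binding core of each training peptide, then updates the weights on that choice.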
MHC class I antigen processing predictions
In addition to binding to MHC molecules, a peptide has to pass through further steps in the MHC class I pathway in order to be recognized by the immune system (17). These include proteasomal cleavage (18) and TAP transport (20), which have been used in combination with MHC binding predictions to identify T-cell epitopes (21). A new integrative approach of this kind, NetCTLpan (22), has recently been developed; it is distinguished by its use of NetMHCpan for the peptide:MHC binding step. It has been demonstrated that where a low false-positive rate is desired (high-specificity predictions), including proteasomal cleavage and TAP transport efficiency improves predictive performance (22).
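Integrative predictors of this kind combine the three pathway steps into one score per peptide. A minimal sketch, assuming a weighted sum in which the MHC binding score is augmented by weighted cleavage and TAP terms; the weights below are hypothetical placeholders, not NetCTLpan's published defaults, and all scores are assumed to be oriented so that higher means more likely to be presented.

```python
from dataclasses import dataclass


@dataclass
class PeptideScores:
    peptide: str
    mhc: float       # predicted MHC binding (higher = stronger, assumed normalized)
    cleavage: float  # proteasomal cleavage score
    tap: float       # TAP transport efficiency score


# Hypothetical placeholder weights; the real NetCTLpan defaults may differ.
W_CLEAVAGE = 0.2
W_TAP = 0.05


def combined_score(s, w_cleavage=W_CLEAVAGE, w_tap=W_TAP):
    """Weighted sum of the three processing/presentation steps."""
    return s.mhc + w_cleavage * s.cleavage + w_tap * s.tap


def rank_epitope_candidates(candidates, **weights):
    """Sort candidates from most to least likely epitope by combined score."""
    return sorted(candidates, key=lambda s: combined_score(s, **weights), reverse=True)
```

Setting both weights to zero reduces the ranking to MHC binding alone, which is a convenient baseline for checking whether the processing terms actually help at high specificity.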
User interface updates for MHC class I/II peptide binding tools
The web interfaces to the MHC class I and II binding prediction tools have been updated based on user feedback. For example, selecting the MHC molecules for prediction from a drop-down list had become cumbersome, especially after the addition of the NetMHCpan tools. To address this, a checkbox, selected by default, now limits the MHC molecules offered for selection to those that occur in at least 1% of the human population. The vast majority of users focus on such molecules, and the shorter list is much easier to navigate, while the full set of alleles remains available by unchecking the box. This feature is currently available only for the MHC class I tools.
In addition, users can select different combinations of MHC molecules and peptide lengths, so that several groups of predictions can be run in a single submission rather than repeatedly selecting a prediction method and retrieving results. With the addition of NetMHCpan, users can also upload any MHC molecule of interest, allowing predictions for MHC molecules beyond those provided by the IEDB-AR. Once a table of predictions is generated, it can be ‘expanded’ to show in greater detail how the individual components contribute to the final scores (Figure 1).
Figure 1. Screenshot of the peptide:MHC-I binding predictive tool results page generated using the ‘IEDB recommended’ option. The first highlighted area at the top indicates a checkbox with which the user can expand the table to display method-specific ...
Finally, users requested that the IEDB team spell out which prediction method is recommended for a given task. Therefore, a default choice named ‘IEDB Recommended’ is provided. Based on the availability of predictors and previously observed predictive performance, this selection tries to use the best possible method for a given MHC molecule. Currently, for peptide:MHC-I binding prediction, ‘IEDB Recommended’ uses the consensus method consisting of NetMHC, SMM and CombLib if a trained predictor is available for the molecule; otherwise, NetMHCpan is used. This choice was motivated by the expected predictive performance of the methods, in decreasing order: Consensus > … > CombLib. For peptide:MHC-II binding prediction, ‘IEDB Recommended’ again uses the consensus approach, combining NN-align, SMM-align and CombLib. The expected predictive performance of the MHC-II binding methods, in decreasing order, is Consensus > … > CombLib. Of note, we fully expect the IEDB recommendation to change as we perform larger benchmarks of newly developed methods on blind datasets to assess prediction quality accurately. For example, recent evaluations suggested that NetMHCpan is actually superior to all allele-specific predictors unless a very large amount of data is available for the particular allele (10). If this result is confirmed on new binding datasets, the IEDB recommendation will change.
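The MHC-I selection rule for ‘IEDB Recommended’ can be expressed as a short decision function. A sketch under stated assumptions: the per-method sets of trained alleles are hypothetical inputs, and restricting the consensus to whichever of NetMHC, SMM and CombLib have a trained predictor for the allele is our assumption; the text only states that the consensus is used when a trained predictor is available.

```python
CONSENSUS_COMPONENTS = ("NetMHC", "SMM", "CombLib")


def recommended_mhci_method(allele, trained_alleles):
    """Return (method, components) for one allele.

    trained_alleles: method name -> set of alleles with a trained predictor
    (hypothetical input structure, for illustration only).
    """
    components = [m for m in CONSENSUS_COMPONENTS
                  if allele in trained_alleles.get(m, set())]
    if components:
        return "Consensus", components
    # Pan-specific fallback: NetMHCpan can score any MHC molecule.
    return "NetMHCpan", []
```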
B-cell epitope prediction
A new addition to the IEDB-AR since its last publication is the ElliPro tool (23). ElliPro predicts linear and discontinuous epitopes for a protein structure or sequence provided by the user; for sequences, the structure is modeled using MODELLER (24). ElliPro is based on geometrical properties of the protein structure and does not require training. Tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody–protein complexes (25), ElliPro achieved an AUC of 0.73 when the most significant prediction was considered for each protein. Because the best prediction ranked within the top three for >70% of proteins and never ranked below fifth, ElliPro can be considered a useful research tool for identifying antibody epitopes in protein antigens. Details on the comparison of ElliPro with other structure-based epitope prediction methods can be found in (23). It would also be interesting to compare the method with sequence-based approaches (26).
Epitope analysis: Homology mapping tool
The homology mapping tool enables analysis of an epitope’s location in the 3D structure of its source protein. For a given epitope, linear or discontinuous, taken from the IEDB or submitted by a user, the tool searches the Protein Data Bank (PDB) (27) for known 3D protein structures homologous to the epitope source sequence. The output page (Figure 2B) shows the mapping of the epitope onto the source sequence, the PDB hits that contain the epitope regions, and the secondary structure and solvent accessibility of each residue, presented as either a pairwise or a multiple sequence alignment of the selected PDB hits obtained using ClustalW2 (28). Residues in the alignment are colored by relative solvent accessibility; the coloring can be adjusted using user-specified cutoffs on the relative solvent accessibility of all atoms or of side-chain atoms only. PDB structures with mapped epitopes can be visualized using EpitopeViewer (30). The 3D viewer feature uses Java Web Start technology, which does not appear to be smoothly integrated into the Google Chrome browser; to use this feature, we recommend Firefox.
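The mapping and coloring steps of the homology mapping tool can be illustrated with a small sketch: epitope positions in the source sequence are carried through a gapped pairwise alignment onto the PDB sequence, and each mapped residue is labeled exposed or buried using a relative solvent accessibility (RSA) cutoff. The 25% default cutoff, the 0-based indexing and the function names are assumptions for illustration; the tool itself works from ClustalW2 alignments and lets the user choose the cutoff.

```python
def map_positions(aligned_source, aligned_pdb, gap="-"):
    """Map 0-based source positions to 0-based PDB positions via a gapped alignment.

    Source residues aligned to a gap in the PDB sequence are left out.
    """
    mapping = {}
    i = j = 0  # current residue index in the ungapped source / PDB sequences
    for a, b in zip(aligned_source, aligned_pdb):
        if a != gap and b != gap:
            mapping[i] = j
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return mapping


def epitope_exposure(epitope_positions, aligned_source, aligned_pdb, pdb_rsa, cutoff=25.0):
    """Label each epitope position 'exposed', 'buried' or 'unmapped'.

    pdb_rsa: relative solvent accessibility (%) per PDB residue (assumed input).
    """
    mapping = map_positions(aligned_source, aligned_pdb)
    labels = {}
    for pos in epitope_positions:
        if pos not in mapping:
            labels[pos] = "unmapped"
        elif pdb_rsa[mapping[pos]] >= cutoff:
            labels[pos] = "exposed"
        else:
            labels[pos] = "buried"
    return labels
```

Changing `cutoff`, or supplying side-chain-only RSA values instead of all-atom values, corresponds to the user-adjustable coloring described above.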
Figure 2. Screenshots of the homology mapping tool. (A) The input page. (B) The output page: a pairwise sequence alignment of the source protein and one of the PDB hits. Epitope residues are shown in orange. Solvent exposed residues (with a relative solvent accessibility ...
In addition to maintaining the homology mapping tool, the IEDB-AR is open to incorporating tools developed by external groups. One type of tool of particular interest would address the problem of choosing an optimal set of epitopes for vaccine design under various constraints (31).
IEDB application programming interface (IEDB-API) for peptide:MHC binding predictive tools
A frequent user request has been to enable integration of IEDB-AR tools into external applications. For example, the Los Alamos HIV Immunology Database (33) wanted to use the IEDB-AR MHC binding predictions in its Epitope Location Finder tool (ELF: www.hiv.lanl.gov/content/sequence/ELF/epitope_analyzer.html). To address this general need, an application programming interface (IEDB-API) for the MHC binding prediction tools has been implemented.
The IEDB-API is implemented as a simple RESTful interface: a user sends a prediction request as an HTTP POST request to the IEDB-AR server, which returns a page with the prediction results. Such programmatic calls can be integrated into external applications; they allow users to run prediction jobs in batch mode and ensure that the most up-to-date predictions continue to be used. Finally, these calls do not require any local installation. Details on the use of the IEDB-API, as well as examples, are provided at the IEDB-AR website (http://tools.iedb.org/main/html/iedb_api.html).
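A programmatic call of this kind can be sketched with Python's standard library. The endpoint URL and the form fields (`method`, `sequence_text`, `allele`, `length`) are assumptions modeled on the IEDB-API documentation and should be checked against the documentation page before use; the sketch separates building the POST request from sending it, so the request can be inspected without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint for MHC class I predictions; verify against the IEDB-API page.
MHCI_API_URL = "http://tools.iedb.org/tools_api/mhci/"


def build_mhci_request(sequence, allele, length, method="recommended"):
    """Build (but do not send) a form-encoded HTTP POST request."""
    form = {
        "method": method,           # assumed field name
        "sequence_text": sequence,  # assumed field name
        "allele": allele,           # assumed field name
        "length": str(length),      # assumed field name
    }
    return Request(MHCI_API_URL, data=urlencode(form).encode("ascii"), method="POST")


def run_prediction(request, timeout=60):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    req = build_mhci_request("SLYNTVATL", "HLA-A*02:01", 9)
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Batch mode then amounts to looping `build_mhci_request` and `run_prediction` over a list of sequences or alleles, and because each call hits the server, results always reflect the currently deployed predictors.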