Vaccines are the most effective means of fighting infectious diseases [1]. They are emerging as promising therapies for cancer [2], allergy [3], and autoimmunity [4]. The goal of vaccination is to induce immunity against pathogens and cancer cells by stimulating antigen-specific cytotoxic T lymphocytes (CTLs) or B cells. CTLs recognize peptide antigens presented by major histocompatibility complex class I (MHC-I) molecules on infected cells or cancer cells and kill them. B cells produce antibodies that specifically recognize pathogen- or cancer-related molecules. Both these processes are initiated and regulated by T-helper (Th) cells that recognize antigenic peptides presented by MHC class II (MHC-II) molecules. MHC-II molecules present antigenic peptides internalized by professional antigen-presenting cells, such as macrophages, dendritic cells, or B lymphocytes. A vaccine must contain at least two antigenic epitopes: one to induce specific B-cell or CTL responses and another to induce specific Th cells that regulate (initiate, enhance, or suppress) immune responses [5]. Peptides presented by MHC-I molecules are mainly of intracellular origin, whereas those presented by MHC-II molecules originate mainly from extracellular proteins. A distinct characteristic of MHC molecules of either class is a groove that binds peptides in a highly promiscuous manner.
The peptide-binding groove of an MHC molecule consists of a β-sheet and two α-helices. A peptide binds through a network of hydrogen bonds between its backbone and the binding groove, and through interactions between the peptide side chains and pockets inside the binding groove [6]. Most MHC-I binding peptides are 8–11 amino acids long [8]. MHC-II molecules bind nested sets of peptides, most of which are 14–18 amino acids long [9], but some can extend beyond 30 amino acids. MHC-I molecules accommodate the whole length of the binding peptide inside their grooves, which are closed at both ends [6]. The binding grooves of MHC-II molecules have open ends; they accommodate the 9-mer binding core of the peptide, while the peptide termini protrude outside the groove [7].
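The consequence of an open groove can be illustrated by listing every 9-mer register that a longer peptide could, in principle, place in the groove. The following is a minimal sketch under stated assumptions: the function name `candidate_cores` and the 15-mer example peptide are hypothetical and chosen only for illustration; they are not taken from any prediction server.

```python
CORE_LENGTH = 9  # length of the MHC-II binding core described in the text


def candidate_cores(peptide, core_length=CORE_LENGTH):
    """Return (offset, core, n_flank, c_flank) for every possible register.

    A peptide of length n has n - core_length + 1 candidate registers;
    a peptide shorter than the core has none.
    """
    registers = []
    for offset in range(len(peptide) - core_length + 1):
        core = peptide[offset:offset + core_length]
        n_flank = peptide[:offset]                 # residues protruding at the N-terminus
        c_flank = peptide[offset + core_length:]   # residues protruding at the C-terminus
        registers.append((offset, core, n_flank, c_flank))
    return registers


# Hypothetical 15-mer used only for illustration.
example_peptide = "PKYVKQNTLKLATGM"
for offset, core, n_flank, c_flank in candidate_cores(example_peptide):
    # Flanking residues are shown in lower case, the candidate core in upper case.
    print(offset, n_flank.lower() + core + c_flank.lower())
```

A 9-mer has exactly one register, whereas a 15-mer has seven, so the location of the core inside a longer peptide is itself an unknown.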
The ability of the immune system to respond to a particular antigen differs between individuals because they carry different sets of MHC genes. Human MHC molecules are known as human leukocyte antigens (HLA). Each human individual expresses up to six HLA-I molecules and up to a dozen HLA-II molecules. HLA genes show extensive polymorphism. As of August 2008, more than 3000 HLA alleles had been identified and sequenced, including 2215 HLA-I and 986 HLA-II sequences [10]. The diversity of HLA molecules increases the probability that any foreign antigen will contain HLA-binding peptides suitable as vaccine targets. The amino acids within the binding groove determine the specificity of peptide binding to a given HLA molecule. Across multiple HLA molecules, the polymorphic residues that form the binding groove determine the repertoire of peptides that bind to each particular HLA molecule. Tens of thousands of allele-specific and promiscuous MHC binders and T-cell epitopes have been identified in humans and mice, while smaller numbers have been identified in other model animals, such as monkeys and rats [11].
Identification of HLA-binding peptides is, therefore, important both for understanding the basic molecular function of the immune system and for vaccine development. However, systematic T-cell epitope mapping is costly and time-consuming because it involves synthesis and testing of overlapping peptides spanning the full length of target antigens. For short antigens, such as the tumor antigen survivin (BIRC5), which is 142 amino acids long, full overlapping studies of both HLA-I and -II binders were performed for several HLA molecules [13]. However, systematic studies are prohibitively expensive for long antigens, such as the autoantigen thyroglobulin (2768 amino acids long), where computational predictions were used to preselect suitable targets, followed by experimental validation [15]. This problem is particularly pronounced in studies of whole pathogen proteomes, even those of small viruses, such as influenza [17] or dengue [18].
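The cost argument above can be made concrete by counting the overlapping peptides needed to span an antigen. The sketch below is illustrative only: the peptide length of 15 and the step of 5 residues (an overlap of 10) are assumed values, not the design of the cited studies, and `overlapping_windows` is a hypothetical helper name.

```python
def overlapping_windows(antigen_length, peptide_length=15, step=5):
    """Return 0-based start positions of overlapping peptides covering an antigen.

    peptide_length and step are assumed, illustrative settings.
    """
    if antigen_length <= peptide_length:
        return [0]  # a single peptide covers the whole antigen
    last_start = antigen_length - peptide_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra peptide so the C-terminus is covered
    return starts


# Antigen lengths are those quoted in the text.
for name, length in [("survivin", 142), ("thyroglobulin", 2768)]:
    n_peptides = len(overlapping_windows(length))
    print(f"{name}: {length} aa -> {n_peptides} peptides per HLA molecule")
```

Under these assumed settings the sketch yields 27 peptides for survivin and 552 for thyroglobulin, and each peptide would have to be synthesized and tested against every HLA molecule of interest. With a step of one residue the counts rise to 128 and 2754.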
Computational prediction of peptide binding to MHC molecules has been a topic of vigorous research and development activity [19]. Computational methods for prediction of HLA-I binding have reached a high level of sophistication and accuracy and represent significant research resources [23]. Computational predictions of HLA-II binding were useful in the study of infectious disease [24], cancer [26], and autoimmunity [15]. However, recent reports have indicated that computational predictions of HLA-II binding are of much lower accuracy than their HLA-I counterparts [28], and even that these predictions may cause more confusion than conclusion [30]. The methods used for assessment of predictors of HLA-II binding have suffered from inadequately defined test sets and testing strategies. Several critical issues need to be addressed to rectify these failings.
• Only a small fraction of peptides in a given pathogen or tumor-specific proteome are able to bind to a specific MHC molecule [31]. Tens of thousands of protein variants have been characterized in viruses [17]. Several hundred tumor-related antigens and their variants have been reported [32]. The extensive variability of target antigens significantly increases the number of testable targets, making each individual binding peptide a representative of a large family of related peptides [34].
• The comparison studies performed to date have been based on assessing predictive performance using pre-defined sets of peptides, rather than well-defined standardized full-overlapping studies of complete antigens. This introduces biases and the reported performances are likely to be overestimates.
• HLA-II peptide binding is mediated through a 9-mer binding core, but longer peptides are used for experimental measurement of binding. Thus we predict one element (the 9-mer binding core) and experimentally test another (15-mer or longer peptides). This makes reduction of the false positive rate an important issue in prediction of HLA-II binding, and it requires sophisticated statistical and machine learning approaches (see [28]).
• Both ends of the peptide-binding groove in HLA-II molecules are open, allowing the peptides to vary in length (typically 14–18 amino acids), and flanking residues are known to selectively affect binding [9]. This effect is not considered in most HLA-II prediction methods.
• Some longer peptides bind MHC-II through multiple overlapping 9-mer registers [34], adding further complexity to the selection of actual binding cores. The simpler question of locating a single 9-mer binding core is extended to identifying multiple binding cores and their locations within the same peptide.
• Experimental measurements of HLA-II binding show variation depending on the conditions of the experiment, even for the control peptides.
• Sufficient quantities of HLA-II binding data are available only for some HLA-DR molecules while, notwithstanding notable exceptions [35], HLA-DQ and -DP molecules have been understudied.
• Presentation of HLA-II binding peptides depends on antigen processing steps including editing by HLA-DM and other accessory molecules. DM editing affects the density and preference for particular peptide species [36]. These effects have not yet been included in the prediction approaches.
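The mismatch between predicting 9-mer cores and testing longer peptides, and the possibility of several registers within one peptide, can be sketched as follows. This is a toy illustration under stated assumptions: the additive scoring scheme, the preference values in `toy_preferences`, the threshold, and the example peptide are all hypothetical and do not represent any real HLA-DR motif or published predictor. Flanking residues and HLA-DM editing are ignored, as the issues listed above note is the case for most methods.

```python
def score_core(core, preferences):
    """Toy additive score: sum of per-position residue preferences (1-based positions)."""
    return sum(preferences.get((pos, aa), 0.0) for pos, aa in enumerate(core, start=1))


def score_peptide(peptide, preferences, threshold, core_length=9):
    """Score every 9-mer register of a peptide.

    Returns the best-scoring register and all registers at or above the threshold,
    each as (offset, core, score).
    """
    scored = []
    for offset in range(len(peptide) - core_length + 1):
        core = peptide[offset:offset + core_length]
        scored.append((offset, core, score_core(core, preferences)))
    if not scored:
        raise ValueError("peptide is shorter than the binding core")
    best = max(scored, key=lambda register: register[2])
    above_threshold = [register for register in scored if register[2] >= threshold]
    return best, above_threshold


# Hypothetical preference values, invented for this example only.
toy_preferences = {(1, "Y"): 2.0, (1, "F"): 2.0, (4, "Q"): 1.0, (6, "T"): 1.0, (9, "L"): 1.0}

best, candidates = score_peptide("PKYVKQNTLKLATGM", toy_preferences, threshold=1.0)
print("best register:", best)
print("registers above threshold:", candidates)
```

Taking the maximum over registers is one simple way to turn core-level scores into a peptide-level prediction. It also shows how a lenient threshold lets more than one register of the same peptide pass, which is one route by which false positives and ambiguous cores arise.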
HLA-II binding predictions are thus more complex than HLA-I predictions [23]. Various prediction algorithms have been developed to facilitate the identification of HLA-II binding peptides within protein antigens. They have made computational pre-screening of antigens for HLA-II epitopes a standard approach in epitope-mapping studies; more than twenty prediction servers are now available for this purpose. Three recent studies each compared the performance of six prediction methods [28]. The overall conclusions of these studies were similar, indicating a relatively low prediction accuracy of HLA-II binding predictors. Large quantities of HLA-DR binding peptides with precise measurements have recently become available [28], yet contemporary methods have shown little, if any, improvement when compared to the older TEPITOPE method.
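Comparing predictors requires a threshold-independent measure of how well scores separate binders from non-binders. This passage does not state which measure the cited comparisons used; the area under the ROC curve is shown below only as one common choice. The function name `roc_auc` and the example scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve computed by pairwise comparison.

    Equals the probability that a randomly chosen binder receives a higher
    score than a randomly chosen non-binder; ties count as one half.
    """
    binders = [s for s, is_binder in zip(scores, labels) if is_binder]
    non_binders = [s for s, is_binder in zip(scores, labels) if not is_binder]
    if not binders or not non_binders:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for b in binders:
        for n in non_binders:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binders) * len(non_binders))


# Invented predictor scores and experimental labels (1 = binder, 0 = non-binder).
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0, 0]
print(round(roc_auc(example_scores, example_labels), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the measure allows servers with different score scales to be compared on the same test set.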
This study extends the assessment of predictive power to include a much larger number of servers that predict HLA-II binding. It was limited to seven common HLA-DR molecules for which a sufficient amount and quality of peptide binding data are available. We compiled and established standardized test data sets that are more representative of the experimental reality, and defined a uniform scaling scheme that allows data from different studies to be used together. Finally, we assessed the practical applicability of HLA-II binding predictions to the identification of HLA-II T-cell epitopes. Our study identified several key issues that need to be addressed for the development of improved prediction systems for HLA-II binding.