Reliable CTL epitope predictions can minimize the experimental effort needed to identify new CTL epitopes for use in, for example, vaccine design or diagnostics. Tong et al. [22] comment on the reports of algorithms that integrate MHC class I predictions with TAP and proteasomal cleavage specificities: "These techniques are still in their infancy and need to be further developed and thoroughly tested". Here, we make a first attempt to test the performance of five of these methods on two evaluation sets of experimentally verified HIV CTL epitopes. Designing an objective benchmark turned out to be a highly non-trivial task, mainly because each prediction method generates epitope predictions in its own format and may apply different mechanisms that filter the number of prediction scores made available to the user. Our final performance measures consist firstly of a RANK measure that allows for an objective comparison of accuracy between the different prediction methods. Secondly, for comparing prediction specificity, we define three levels of prediction sensitivity, so that specificities can be compared at equal sensitivity. Finally, we compare the sensitivity among the 5% top-scoring peptides as obtained by each method.
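As an illustration of the specificity and top-fraction measures, the following is a minimal Python sketch and not the benchmark code used in this study. It assumes that a method returns one score per candidate peptide, that higher scores indicate more likely epitopes, and that tied scores are resolved by input order. The function names and the toy data are hypothetical.

```python
def _check_labels(scores, is_epitope):
    """Validate the inputs shared by both measures."""
    if len(scores) != len(is_epitope):
        raise ValueError("scores and is_epitope must have the same length")
    n_pos = sum(bool(x) for x in is_epitope)
    n_neg = len(is_epitope) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one epitope and one non-epitope")
    return n_pos, n_neg


def sensitivity_in_top_fraction(scores, is_epitope, fraction=0.05):
    """Fraction of known epitopes found among the top-scoring peptides."""
    n_pos, _ = _check_labels(scores, is_epitope)
    # Number of peptides counted as predicted positives (at least one).
    n_top = max(1, round(fraction * len(scores)))
    # Peptide indices ordered by decreasing score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    found = sum(1 for i in order[:n_top] if is_epitope[i])
    return found / n_pos


def specificity_at_sensitivity(scores, is_epitope, target_sensitivity):
    """Specificity at the first cut-off that reaches the target sensitivity."""
    n_pos, n_neg = _check_labels(scores, is_epitope)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = false_pos = 0
    # Walk down the ranked list until enough epitopes have been recovered.
    for i in order:
        if is_epitope[i]:
            true_pos += 1
        else:
            false_pos += 1
        if true_pos / n_pos >= target_sensitivity:
            break
    return 1.0 - false_pos / n_neg


# Toy data: ten peptides, two of which are verified epitopes.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [True, False, True, False, False, False, False, False, False, False]

print(sensitivity_in_top_fraction(toy_scores, toy_labels, fraction=0.2))
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.5))
```

Fixing the sensitivity before reading off the specificity means that methods with differently scaled scores are compared at the same operating point, which is the motivation given above for defining common sensitivity levels.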
Using the defined performance measures, we performed a large-scale benchmark calculation comparing the predictive performance of a series of publicly available methods for CTL epitope prediction. The benchmark included the EpiJen, MAPPP, WAPP, and MHC-pathway methods, and an updated version of the NetCTL method. The updated version of NetCTL, version 1.2, can make predictions for the A26 and B39 HLA supertypes, thus completing the list of 12 recognized supertypes, and was shown to have a higher predictive performance than the old version 1.0. We find that NetCTL-1.2 has a higher predictive performance than EpiJen, MAPPP, MHC-pathway, and WAPP on all measures. When comparing NetCTL-1.2 with MAPPP and WAPP, the higher performance of NetCTL-1.2 is statistically significant on all measures. When comparing NetCTL-1.2 with EpiJen, the higher performance of NetCTL-1.2 is statistically significant for all measures except when comparing the specificities at the sensitivity values of 0.3 and 0.5 on the HIVEpiJen dataset. When comparing NetCTL-1.2 with MHC-pathway, the higher performance of NetCTL-1.2 is statistically significant for all measures, except when comparing the specificities at the sensitivity values of 0.3 and 0.5 on either evaluation dataset. It is not surprising that MHC-pathway reaches almost as high a predictive performance as NetCTL-1.2 on some of the performance measures. These two methods have several features in common. Firstly, the MHC binding prediction methods included in MHC-pathway and NetCTL have recently been shown in a large-scale benchmark to have comparable performance [18]. Secondly, they use identical methods for predicting TAP transport efficiency, namely the matrix method developed by Peters et al. [23]. Thirdly, they integrate the predicted values obtained from the separate proteasomal cleavage, TAP transport efficiency, and MHC class I affinity predictors into one combined score. One difference between them is that the proteasomal cleavage predictor used for MHC-pathway is trained on in vitro data, while NetCTL-1.2's proteasomal cleavage predictor, NetChop-3.0, is trained on natural MHC class I ligands.
NetCTL-1.2, MAPPP, and MHC-pathway integrate the predicted values into one overall score, while EpiJen and WAPP use a number of successive filters that reduce the number of possible epitopes step by step. Doytchinova et al. [16] have stated that the "combined score as used by SMM (MHC-pathway) and NetCTL, obscures the final result, because a low (or even negative) TAP and/or proteasomal score could be compensated for by a high MHC score." We would here like to offer our interpretation of how the combined score can be understood in a biologically meaningful manner. First of all, we see the predictive values as probabilities. Secondly, one has to keep in mind that there is not just one copy of a given protein in the cell. This means that if, for example, a certain peptide has a low predicted cleavage score and will only be generated in 1 out of 100 cleavage events, the peptide can still survive all the way to the cell surface and become a CTL epitope, provided that the TAP transport efficiency and MHC class I affinity are sufficiently high.
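The contrast between the two integration strategies, and the probabilistic reading offered above, can be made concrete with a small Python sketch. It is illustrative only. The weights, thresholds, protein copy number, and example scores are hypothetical and are not the values used by NetCTL-1.2, MHC-pathway, EpiJen, or WAPP. All three scores are assumed to be scaled so that higher is better, and multiplying them as independent per-step probabilities is a simplifying assumption made for this sketch.

```python
def combined_score(cleavage, tap, mhc, w_cleavage=0.1, w_tap=0.05):
    """Weighted sum of the three stage scores (weights are hypothetical)."""
    return mhc + w_cleavage * cleavage + w_tap * tap


def passes_filters(cleavage, tap, mhc,
                   min_cleavage=0.5, min_tap=0.5, min_mhc=0.5):
    """Successive filters: a peptide is discarded as soon as one stage fails."""
    return cleavage >= min_cleavage and tap >= min_tap and mhc >= min_mhc


def expected_presented_copies(n_protein_copies, p_cleavage, p_tap, p_mhc):
    """Expected number of presented peptides, reading each score as an
    independent per-step probability (an assumption of this sketch)."""
    return n_protein_copies * p_cleavage * p_tap * p_mhc


# Hypothetical peptide: generated in 1 out of 100 cleavage events,
# but transported by TAP and bound by MHC class I efficiently.
peptide = {"cleavage": 0.01, "tap": 0.9, "mhc": 0.9}

print(passes_filters(**peptide))   # False: rejected by the cleavage filter
print(combined_score(**peptide))   # dominated by the high MHC score
# With a hypothetical 10,000 protein copies, roughly 81 peptides are expected.
print(expected_presented_copies(10_000, 0.01, 0.9, 0.9))
```

Under these assumptions the filter approach discards the peptide outright, whereas the combined score retains it, consistent with the argument that a rarely generated peptide can still be presented when many protein copies are processed.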
Throughout the analysis on the HIV dataset, we have compared NetCTL-1.2 to each of the other test methods separately. This was done in order to include epitopes restricted to as many supertypes as possible. Had we chosen to include only epitopes restricted to supertypes that all methods had in common, we could only have included the A1, A2, and A3 supertypes. The shortcoming of this approach is that the other test methods cannot be compared directly with each other. For such comparisons, we refer to the calculations done on the HIVEpiJen dataset, which only contains epitopes restricted to the A1, A2, and A3 supertypes.
Lastly, we would like to note that the NetCTL method predicts CTL epitopes that are presented via a pathway that utilizes TAP for peptide entry into the ER. Additional pathways also exist, as reviewed in [24]. Their contribution to the total presentation of MHC class I ligands is, however, thought to be minor [25