Tumor exome and RNA sequencing data provide a systematic and unbiased view of cancer-specific expression, over-expression, and mutations of genes, which can be mined for personalized cancer vaccines and other immunotherapies. Of key interest are tumor-specific mutations, because T cells recognizing neoepitopes have the potential to be highly tumoricidal. Here, we review recent developments and technical advances in identifying MHC class I and class II-restricted tumor antigens, especially neoantigen-derived MHC ligands, including in silico predictions, immune-peptidome analysis by mass spectrometry, and MHC ligand validation by biochemical methods or on T cells.
T cells recognize tumor antigens (TA) as short peptides presented on MHC class I and II molecules (MHC I and II). TA can induce cancer-specific T cell immunity and comprise (1) tumor-specific TA containing mutations that are specific for a given tumor;1-3 (2) shared TA, which are expressed on different tumors and can be divided into over-expressed and differentiation TA; and (3) cancer testis TA expressed on testis and cancer cells (http://cancerimmunity.org/peptide/). Comprehensive identification of cancer MHC ligands, especially immunogenic ones, is a challenging task due to the high polymorphism of MHC molecules, the vast diversity of TA, and the difficulty of modeling antigen processing and presentation and of predicting immunogenicity.
Many tumors are immunogenic, i.e. they induce adaptive immunity, and are infiltrated by T cells recognizing TA that can be used for adoptive T cell therapy (ACT).5-7 In the tumor environment T cells tend to be suppressed, which is a major obstacle in cancer immunotherapy and can be reversed by checkpoint blockade, e.g. blocking of PD-1 and/or CTLA-4.8 Among tumor-infiltrating T cells, some recognize neoepitopes. Remarkably, these cells expand upon checkpoint blockade and can be highly tumoricidal.2,8-11 Neoepitope-specific T cells usually have low frequencies due to cancer immunoediting.2,12-14 However, they are attractive for ACT, because they are exquisitely tumor-specific and exhibit superior tumor control, presumably because they are not subject to central tolerance and hence may express high-affinity TCR.11,12,15,16 Even though tumor cells usually express no MHC II molecules, many tumor-infiltrating lymphocytes (TIL) are CD4+ T cells specific for neoepitopes and important players in immunotherapy.7,10,17
Immunogenic tumors are under pressure to evade immune destruction and some escape mechanisms involve aberrant TA presentation and processing, e.g., altered proteasomal degradation, RNA splicing, and/or post-translational modifications.18-22 Most cancer cells express low levels of MHC I and no MHC II molecules. However, they are prone to cell death and as a result can be taken up by antigen presenting cells (APC), namely DC and macrophages, which (cross)-present exogenous TA on their MHC molecules and induce TA-specific CD8+ and CD4+ T cells.2,10,23 Moreover, MHC restriction adds complexity to antigen presentation, e.g. through the polymorphism of MHC molecules. In humans, all cells express up to six HLA class I molecules and professional APC express eight or more HLA II molecules, which in view of the >3,000 HLA alleles affords a vast diversity.21,22
Whole exome and transcriptome next-generation sequencing (NGS) data provide detailed information on TA mutations, expression levels, aberrant mRNA splicing, and MHC types and can be mined for precise targeting of cancer immunotherapies, thus greatly increasing their efficacy.2,4,6,11 This calls for new high-throughput technologies that allow comprehensive identification of cancer-specific MHC ligands and T cell epitopes, especially neoepitopes, and their MHC restrictions. Here, we review established technologies, present emerging new ones, and discuss future perspectives.
Detailed analysis of genomic and transcriptomic changes in tumors provides unprecedented opportunities to map differences between cancer and normal cells that are perceptible to the immune system. The recently observed correlation between the efficacy of checkpoint blockade and the mutational load of the patient's tumors triggered a broad interest in predicting and validating T cell neoepitopes.1,2,7-10,17,24-26 As a first step, studies typically relied on computational approaches predicting mutation-containing MHC ligands.
To identify genomic alterations resulting in amino acid changes, cancer exome sequencing is performed. For each patient, a tumor sample and a matched healthy tissue are sequenced and somatic variants are identified by comparing the differences between the two samples (Fig. 1A). If coverage is sufficient, the majority of single nucleotide variants can be identified.27 However, variants present in only a small fraction of cancer cells or in highly polymorphic regions may be missed and efforts to improve mutation-calling pipelines are still ongoing.28,29 In addition, RNA sequencing allows for the identification of gene products selectively expressed by cancer cells like TA, cancer-specific splice variants, or gene fusions (Fig. 1B). A major challenge in analyzing exome and RNA sequencing data consists in identifying cancer-specific changes relevant for T cell tumor immunity.
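The tumor-versus-normal comparison described above can be sketched as a simple set difference. This is a minimal illustration, not a mutation-calling pipeline: the `Variant` record, the `somatic_variants` function, the variant allele fraction (VAF) cutoff, and the toy calls are all assumptions introduced here. The third toy call shows how a variant present in only a small fraction of cancer cells can fall below the cutoff and be missed.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def somatic_variants(
    tumor_calls: Dict[Variant, float],
    normal_calls: Iterable[Variant],
    min_tumor_vaf: float = 0.05,
) -> List[Variant]:
    """Keep variants seen in the tumor but not in the matched normal sample.

    tumor_calls maps each variant to its allele fraction (VAF) in the tumor.
    The default VAF cutoff is an illustrative assumption.
    """
    germline = set(normal_calls)
    return sorted(
        v for v, vaf in tumor_calls.items()
        if v not in germline and vaf >= min_tumor_vaf
    )


# Hypothetical toy calls, not real patient data.
tumor = {
    Variant("chr1", 1000, "G", "A"): 0.48,  # also in normal: germline
    Variant("chr2", 2000, "C", "T"): 0.31,  # tumor only: somatic
    Variant("chr3", 3000, "A", "G"): 0.01,  # subclonal, below the cutoff
}
normal = [Variant("chr1", 1000, "G", "A")]
print(somatic_variants(tumor, normal))
```

Lowering `min_tumor_vaf` recovers the subclonal variant at the price of admitting more sequencing noise, which is the trade-off mutation-calling pipelines have to manage.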
To elicit T cell immunity, genetic variants must result in modified peptides that can be presented on MHC molecules. Different computational methods have been developed to predict peptide binding affinity to MHC molecules. MHC I molecules usually present 8–11 residue peptides to cytotoxic CD8+ T-lymphocytes (CTL) and the peptide's N- and C-termini are involved in MHC binding.21 Therefore, MHC ligands of equal length can be naturally aligned and predictive models of binding affinity trained by identifying specific sequence motifs across known ligands for given MHC I alleles.30 More modern approaches involve machine learning techniques, e.g., neural networks like NetMHC,31,32 hidden Markov models or support vector machines that consider more intricate sequence patterns among peptide ligands33-35 and peptides of different lengths.36 Different in silico MHC ligand prediction algorithms can be combined to focus on peptides predicted by multiple strategies.2,9,37
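Before any binding prediction, the candidate peptides have to be enumerated. The sketch below lists every 8–11 residue window of a protein that contains a single substituted residue. The function name `mutant_peptides`, its arguments, and the toy sequence are hypothetical and only illustrate the bookkeeping.

```python
from typing import Iterable, List


def mutant_peptides(
    protein: str,
    mut_index: int,
    mut_aa: str,
    lengths: Iterable[int] = range(8, 12),
) -> List[str]:
    """Return all windows of the given lengths that span a point mutation.

    mut_index is the 0-based position of the substituted residue.
    """
    mutated = protein[:mut_index] + mut_aa + protein[mut_index + 1:]
    peptides = []
    for k in lengths:
        first = max(0, mut_index - k + 1)         # leftmost window covering the site
        last = min(mut_index, len(mutated) - k)   # rightmost window covering the site
        for start in range(first, last + 1):
            peptides.append(mutated[start:start + k])
    return peptides


# Toy 21-residue sequence with a substitution in the middle (index 10).
toy_protein = "ACDEFGHIKLMNPQRSTVWYA"
candidates = mutant_peptides(toy_protein, 10, "R")
print(len(candidates), "candidate peptides")
```

For a mutation far from either end of the protein, each length k contributes k windows, so a single substitution yields 8 + 9 + 10 + 11 = 38 candidates per HLA allele to be scored.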
These algorithms are trained on large datasets of peptide ligands such as those collected in the IEDB database.37 Therefore, they work best for HLA alleles with many known ligands. As the number of known HLA alleles expands, algorithms such as NetMHCpan have been developed to predict peptides binding to alleles for which no ligands are known.38 These methods rely on HLA allele sequence homology and correlations between the HLA allele sequences and amino acid preferences in their ligands. Approaches integrating pMHC complex structures have been proposed.39 However, including such information incurs longer computation times and provides only modest improvement of prediction accuracy.40
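The simplest form of motif-based training mentioned above is a position-specific scoring matrix built from aligned ligands of equal length. The following is a minimal sketch of that idea only; it is not how NetMHC or NetMHCpan work internally. The training peptides are hypothetical toy sequences, and the uniform background frequency and pseudocount are simplifying assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pssm(ligands: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log-odds matrix from equal-length ligands (uniform background)."""
    length = len(ligands[0])
    if any(len(p) != length for p in ligands):
        raise ValueError("all training ligands must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(ligands) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(p[i] for p in ligands)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: List[Dict[str, float]], peptide: str) -> float:
    """Sum the per-position log-odds; higher means closer to the motif."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Hypothetical 9-mers sharing L at position 2 and V or L at position 9.
toy_ligands = ["SLYNTVATL", "GLLGFVFTV", "NLVPMVATV", "YLEPGPVTV"]
pssm = train_pssm(toy_ligands)
print(score(pssm, "KLAAAAAAV"), score(pssm, "KDAAAAAAK"))
```

With only a handful of training peptides the pseudocount dominates most positions, which mirrors the point made above: such models work best for alleles with many known ligands.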
Most algorithms predict the affinity of peptide binding to MHC molecules, which may not correlate well with their immunogenicity, i.e., ability to elicit T cell immunity, which has been reported to correlate better with pMHC complex stability.41 Moreover, cancer cells may not present predicted peptides, e.g. due to aberrant TA processing and/or presentation.21 Different approaches have been employed to improve predictions, like filtering out low-expression genes, incorporating cleavage site and peptide transport predictions,42,43 or considering structural differences between mutant and wild-type peptides (Fig. 1).25 However, it is challenging to adequately combine these parameters, which individually have only a poor predictive value for immunogenicity. Future developments may benefit from combining machine learning approaches with high-throughput validation and larger data sets of cancer-specific MHC ligands and T cell epitopes.
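As a rough illustration of the filtering step, the sketch below keeps candidates whose source gene is expressed and whose predicted affinity passes a cutoff, then ranks them by affinity. The `Candidate` fields, the 500 nM and 1 TPM cutoffs, and the example values are assumptions chosen for illustration; they are not taken from the studies cited here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    peptide: str
    ic50_nm: float   # predicted binding affinity (lower = stronger)
    tpm: float       # expression of the source gene from RNA sequencing


def prioritize(
    candidates: List[Candidate],
    max_ic50_nm: float = 500.0,  # illustrative cutoff (assumption)
    min_tpm: float = 1.0,        # illustrative cutoff (assumption)
) -> List[Candidate]:
    """Drop weak predicted binders and unexpressed genes, then rank by affinity."""
    kept = [
        c for c in candidates
        if c.ic50_nm <= max_ic50_nm and c.tpm >= min_tpm
    ]
    return sorted(kept, key=lambda c: c.ic50_nm)


# Hypothetical predictions.
toy = [
    Candidate("KLAAAAAAV", ic50_nm=35.0, tpm=12.0),
    Candidate("GLLGFVFTV", ic50_nm=20.0, tpm=0.1),    # strong binder, not expressed
    Candidate("SDYNTVATK", ic50_nm=4200.0, tpm=50.0), # expressed, weak binder
    Candidate("NLVPMVATV", ic50_nm=310.0, tpm=3.5),
]
for c in prioritize(toy):
    print(c.peptide, c.ic50_nm, c.tpm)
```

Hard cutoffs like these are the simplest way to combine parameters. Since each parameter alone predicts immunogenicity poorly, generous thresholds followed by experimental validation are the safer design.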
The recognition that a substantial fraction of TIL are tumor-specific CD4+ T cells and that such T cells play important roles in tumor control7,10,17 has motivated development of computational approaches for MHC II ligand predictions. This is challenging, because peptide binding to MHC II as compared to MHC I molecules is more promiscuous in terms of peptide length, binding sequence motifs, and binding registers; therefore, peptides need to be aligned first, which is often difficult.44 In addition, except for HLA-DR molecules, both the α and β chains are polymorphic, which dramatically increases the diversity of possible peptide binding specificity. To address these issues, different strategies have been proposed, like refining the alignment algorithms33 or predicting peptide binding cores.45 Several online servers are available for MHC II ligand predictions.33,46,47 In general, in silico prediction of ligands for MHC II is less accurate than for MHC I molecules.45
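The register problem can be made concrete with a short sketch: an MHC II ligand is scanned for the nine-residue core that best fits a motif, and every offset is a possible binding register. The scoring function below is a deliberately crude stand-in that rewards hydrophobic residues at core positions 1, 4, 6, and 9; it is an assumption for illustration, not a published predictor. The names `toy_core_score` and `best_core` are hypothetical.

```python
from typing import Callable, Tuple

HYDROPHOBIC = set("FILMVWY")
ANCHOR_POSITIONS = (0, 3, 5, 8)  # core positions 1, 4, 6, 9 (0-based)


def toy_core_score(core: str) -> float:
    """Count hydrophobic residues at the assumed anchor positions."""
    return float(sum(core[i] in HYDROPHOBIC for i in ANCHOR_POSITIONS))


def best_core(
    peptide: str,
    score_core: Callable[[str], float] = toy_core_score,
    core_len: int = 9,
) -> Tuple[int, str, float]:
    """Return (offset, core, score) of the best-scoring binding register."""
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the binding core")
    offsets = range(len(peptide) - core_len + 1)
    # max() keeps the first offset when several registers tie.
    offset = max(offsets, key=lambda i: score_core(peptide[i:i + core_len]))
    core = peptide[offset:offset + core_len]
    return offset, core, score_core(core)


# Hypothetical 15-mer: seven possible registers.
print(best_core("GSDFKKLKVKKIGSD"))
```

A 15-mer has seven candidate registers, and the flanking residues outside the core differ for each, which is why aligning MHC II ligands is harder than aligning equal-length MHC I ligands.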
The number of non-self MHC ligands predicted by in silico methods is typically vastly larger than the number for which T cell reactivity can be detected in cancer patients.2-4,26 Depending on the methods used for detection, some immunogenic ligands may escape detection; however, most of the predicted ligands are either not immunogenic or are not generated and presented.1,17-21,24,48 In addition, inaccuracies of in silico predictions and the lack of good correlations between pMHC complex stability, peptide binding affinity, and immunogenicity necessitate the use of generous cut-offs and call for high-throughput in vitro validation.41,48
To reduce the number of in silico predicted MHC ligands, their MHC-binding affinities and complex stabilities are measured. Biochemical methods used to measure MHC peptide binding include: (1) A peptide-rebinding assay, referred to as iTopia, in which immobilized pMHC complexes containing an irrelevant peptide are acid stripped and, upon re-incubation with test peptides and β2m, newly formed pMHC complexes are quantified by means of a conformational anti-HLA class I mAb, e.g., W6/32 (Fig. 2A).49 (2) A peptide-rescuing assay in which pMHC complexes containing a photo-cleavable peptide are UV irradiated in the presence of test peptides, which depending on their MHC-binding strength can rescue the “empty” MHC molecule.50,51 This method allows the generation of large numbers of different pMHC monomers and thus facilitates combinatorial multimer staining.52,53 (3) A refolding assay in which denatured heavy and light chains are refolded in the presence of test peptides and pMHC complex formation is measured by means of the conformation-dependent anti-pan HLA class I antibody, W6/32, using a proximity-based immunoassay (Fig. 2A).41,54,55 Data generated by this assay were used for training of prediction algorithms.56,57 It has been reported that pMHC complex stability, as measured by a scintillation proximity assay, correlates better with immunogenicity than binding affinity does.41,58
Peptide binding to MHC I molecules has also been assessed on cells by MHC stabilization assays on cells deficient in the transporter associated with antigen processing (TAP), e.g. T2 or RMA-S. However, such assays have limited throughput and limited accuracy because peptide loading, competition, and MHC stabilization overlap.59
As for MHC class I molecules, peptide binding to MHC II molecules can be assessed by peptide-driven refolding of α and β chains containing preformed intra-chain disulfide bonds.60 Alternatively, using insect cell expression systems, soluble “empty” MHC II molecules can be produced and peptide binding assessed by means of biotin or fluorochrome labeled peptides.61,62 MHC II-peptide complex stability can be measured via the disappearance of the tagged peptide over time. Moreover, because MHC II molecules have an open-ended peptide-binding site, they can bind to immobilized peptides, allowing the use of peptide microarrays for MHC class II peptide screening by incubating these with “empty” class II molecules and detection with anti-MHC II mAb (Fig. 2B).63 Peptide microarrays are comparatively inexpensive and, with up to two million peptides per chip, are attractive for high-throughput screening.64
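If the loss of tagged peptide is assumed to follow first-order kinetics, complex stability can be summarized as a half-life, t½ = ln 2 / k, where k is the dissociation rate constant. The sketch below estimates k by least-squares regression of log signal against time. The single-exponential model, the function name, and the synthetic readings are assumptions made for illustration; real assays may need background subtraction or a multi-phase fit.

```python
import math
from typing import Sequence


def dissociation_half_life(times_h: Sequence[float], signals: Sequence[float]) -> float:
    """Estimate the half-life (hours) of a pMHC complex from a decay curve.

    Assumes signal(t) = signal(0) * exp(-k * t) and fits ln(signal) vs. time.
    """
    if len(times_h) != len(signals) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(s) for s in signals]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying signal
    if slope >= 0:
        raise ValueError("signal does not decay over time")
    return math.log(2) / -slope


# Synthetic readings generated with a 4 h half-life (not measured data).
times = [0.0, 1.0, 2.0, 4.0, 8.0]
readings = [100.0 * 0.5 ** (t / 4.0) for t in times]
print(round(dissociation_half_life(times, readings), 3))
```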
In silico MHC ligand prediction combined with biochemical validation is prone to identifying peptides that are not expressed or not presented on cancer cells, and to missing peptides containing post-translational modifications (e.g., phosphorylation) or sequences altered by proteasomal reverse splicing.19-21 These caveats are circumvented by immunopeptidomics, a technology in which MHC molecules are isolated from cells and their peptide cargo is isolated and analyzed by liquid chromatography and mass spectrometry (LC-MS).65
The best-established strategy for isolating HLA peptides is immunoaffinity purification of HLA-peptide complexes and recovery of their peptides (Fig. 3A). Usually, pan anti-HLA I or II antibodies are used and peptides associated with all HLA molecules isolated and analyzed. Typically 1–5 × 10⁸ cells from cell lines or one gram of tumor tissue are needed for in-depth immunopeptidome analysis, resulting in identification of thousands of peptides.66-70 Remarkably detailed immunopeptidomes were determined on soluble HLA-peptide complexes isolated from plasma.71 This methodology highly enriches HLA peptides, the vast majority (~95%) of which matches the binding motifs of the corresponding HLA molecules.66,72
The remarkable sensitivity and accuracy of LC-MS analysis allows detailed analysis of immune-peptidomes. Three major MS data acquisition methods are available for HLA peptides: (1) targeted acquisition;9,73-75 (2) data-dependent acquisition, referred to as the discovery or shotgun approach;66,67,72,74,76 and (3) data-independent acquisition, namely SWATH-MS-based acquisition.65 In general, these methods allow identification of T cell (neo-) MHC ligands, including those containing post-translational modifications.72,77
For identification of HLA-eluted peptides, MS/MS spectra are matched to theoretical spectra of peptide sequences in databases using search engines like Mascot78 or MaxQuant.79 Generation of customized databases from genomic and transcriptomic information70 allowed identification of private peptides that are not present in reference protein sequence databases. Recently, a targeted approach permitted identification and validation of two predicted neoepitopes from sarcoma cell lines.10 Targeted MS is a highly reproducible and sensitive method, but is limited to hundreds of peptides. MS analysis of synthetic peptides guides selection of optimal transitions that can then be monitored in eluted peptide samples.74 Conversely, using the discovery approach, seven MHC ligands were identified from a mouse cancer cell line, one of which was immunogenic and upon vaccination shown to control tumor growth.25 It is expected that identification of mutation-containing MHC ligands by the discovery approach from cancer tissues will become more efficient upon improvement of purification of HLA peptides, LC-MS technology, and bioinformatics algorithms.
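To show what a theoretical spectrum consists of, the sketch below computes singly charged b and y fragment ion m/z values for an unmodified peptide from monoisotopic residue masses and counts how many observed peaks fall within a tolerance. It is a simplified illustration, not the scoring used by Mascot or MaxQuant; the function names, the 0.02 m/z tolerance, and the toy peak list are assumptions.

```python
from typing import List, Sequence, Tuple

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return singly charged b-ion and y-ion m/z lists (b1.., y1..)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # N-terminal fragments
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


def matched_peaks(observed_mz: Sequence[float], peptide: str, tol: float = 0.02) -> int:
    """Count observed peaks within tol of any theoretical b or y ion."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed_mz)


# Toy peak list for the tripeptide GAS (hypothetical values).
print(fragment_ions("GAS"))
print(matched_peaks([58.03, 106.05, 500.0], "GAS"))
```

A single amino acid substitution shifts every fragment ion that contains the mutated residue, which is why a mutant peptide can only be matched if its sequence is present in the search database.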
The probability of identifying mutation-containing HLA ligands increases with the depth of the ligandome and the mutational content of the sample. Even in an in-depth analysis resulting in accurate identification of thousands of HLA peptides, only a few neoepitopes are expected. Stringent control over false discovery rate (FDR) and peptide spectrum match (PSM) scoring ensures accurate and reliable results; typically one to five percent FDR thresholds are applied to peptidomic datasets using the target-decoy approach.80,81 Inclusion of exhaustive lists of known mutations from repositories like the TCGA or COSMIC will increase the level of false positives; therefore, personalized reference databases should be used instead. To quantify mutation-containing HLA ligands, the sample can be spiked with synthetic peptides labeled with stable heavy isotopes (Fig. 3B).74 It is also recommended to resequence PCR amplicons of the mutated loci to confirm the mutation(s).
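The target-decoy approach can be sketched in a few lines: PSMs are ranked by score, and at each cutoff the FDR is estimated as the number of decoy hits divided by the number of target hits above it. This is one common and simplified estimator, written here under the assumption of equally sized target and decoy databases; the function name and the toy scores are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def score_threshold_at_fdr(
    psms: Iterable[Tuple[float, bool]],
    max_fdr: float = 0.01,
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms holds (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies the requested FDR.
    """
    targets = decoys = 0
    threshold = None
    for psm_score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least psm_score.
        if targets and decoys / targets <= max_fdr:
            threshold = psm_score
    return threshold


# Toy data: ten target PSMs (scores 10..1) and two decoy PSMs.
toy_psms = [(float(s), False) for s in range(10, 0, -1)] + [(2.5, True), (0.5, True)]
print(score_threshold_at_fdr(toy_psms, max_fdr=0.05))
print(score_threshold_at_fdr(toy_psms, max_fdr=0.10))
```

Enlarging the search database with mutations the patient does not carry adds candidate sequences that can only produce false matches, so the same FDR threshold then admits fewer true peptides.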
In-depth interrogation of HLA II peptidomes by immune-peptidomics is straightforward. However, since HLA II molecules are typically expressed on immune cells infiltrating the tumor and only rarely on cancer cells, MS analysis may not be sensitive enough to detect peptides presented on APC in the tumor microenvironment. Nevertheless, class II peptidomes can be obtained from patients' tissues or PBMC. From the repertoire of thousands of detected MHC II peptides, specific binding motifs can be identified, as was shown for class I.66,76,80 As current tools for MHC II peptide binding predictions are relatively inaccurate, predictions based on these motifs may improve identification of patient-specific, mutation-containing MHC II ligands (Fig. 3C).
Presently there exists no single method that allows comprehensive, accurate, and high-throughput TA T cell epitope identification. Correlations between the immunogenicity of a peptide, its MHC-binding affinity, and pMHC complex stability tend to be poor.82-85 Each method has its advantages and shortcomings and, depending on needs and means, different strategies for TA T cell epitope discovery are indicated. Bioinformatics sequencing data analysis, identification of mutated and overexpressed genes, and prediction of MHC ligands constitute a crucial initial part. For their validation three main strategies are used: (1) Biochemical MHC ligand validation is tedious and costly in high-throughput mode and is prone to identifying peptides that are not presented by cancer cells or are not immunogenic. On the other hand, such data are valuable for training of prediction algorithms.19-22,52,85 (2) Immune-peptidome analysis has the major advantage of identifying MHC ligands presented by cancer cells, including peptides containing post-translational modifications or altered sequences.18-21 However, this analysis is prone to miss MHC ligands and provides no information on their immunogenicity. (3) T cell (neo)-epitope discovery uses the patient's T cells in functional assays1,10,67,86-88 or combinatorial tetramer staining.18,53,89,90 These assays are reliable and provide information on the patient's cancer-specific T cells, but they have limited throughput capabilities, e.g. because of the paucity of TILs, and especially of neoepitope-specific T cells.9,13 For personalized cancer vaccines based on TA peptides or corresponding RNA, in silico prediction and validation on the patient's T cells is an established but cumbersome strategy.1,7,8,17,86-88,91 Alternatively, combination of in silico MHC ligand prediction and MS immune-peptidome analysis allowed identification of some cancer-presented MHC ligands, but may miss poorly expressed ones.25,67,92,93
There is an urgent need to improve the accuracy of in silico TA MHC ligand predictions, especially for MHC II and for MHC I molecules for which little or no data exist. Moreover, poor correlations between binding affinity, pMHC complex stability, and immunogenicity motivate the development of machine-learning programs that integrate multiple parameters. The rapidly growing volume of data on validated TA presented on MHC molecules should support such efforts. Any progress in in silico MHC ligand prediction will save time and costs for TA epitope discovery. At the same time, high-content peptide validation assays need to be established that allow filtering out falsely predicted peptides and training of prediction algorithms. In view of recent progress in peptide microarray technology, it may become possible to rapidly screen very high numbers of peptides binding to MHC II and perhaps also MHC I molecules.63,64,94 Alternatively, MS immune-peptidome analysis may be combined with UV irradiation-induced exchange of conditional MHC ligands with libraries of predicted peptides.49,50,70 A powerful strategy of epitope discovery and T cell analysis consists in using peptide exchange combined with combinatorial multimer flow cytometry analysis of the patient's TIL or PBMC.5,52,53,87 For more accurate selection of MHC ligands, this analysis can be combined with MS immune-peptidome analysis.5,50-53,71,87 By using mass cytometry, the resolution of this technique can be further increased.21 A promising perspective is pMHC multimer arrays allowing detection of hundreds of T cell specificities in one sample.94 In view of the rapid progress in microfluidic multiplexing technologies, the number of T cells that can be enumerated and analyzed is likely to increase dramatically in the near future.95-97
No potential conflicts of interest were disclosed.