Human leukocyte antigen (HLA) class I molecules bind peptides derived from the intracellular degradation of endogenous proteins and present them to cytotoxic T lymphocytes, allowing the immune system to detect transformed or virally infected cells. It is known that HLA class I–associated peptides may harbor posttranslational modifications. In particular, phosphorylated ligands have raised much interest as potential targets for cancer immunotherapy. By combining affinity purification with high-resolution mass spectrometry, we identified more than 2000 unique ligands bound to HLA-B40. Sequence analysis revealed two major anchor motifs: aspartic or glutamic acid at peptide position 2 (P2) and methionine, phenylalanine, or aliphatic residues at the C terminus. The use of immobilized metal ion and TiO2 affinity chromatography allowed the characterization of 85 phosphorylated ligands. We further confirmed every sequence belonging to this subset by comparing its experimental MS2 spectrum with that obtained upon fragmentation of the corresponding synthetic peptide. Remarkably, three phospholigands lacked a canonical anchor residue at P2, containing phosphoserine instead. Binding assays showed that these peptides bound to HLA-B40 with high affinity. Together, our data demonstrate that the peptidome of a given HLA allotype can be broadened by the presentation of peptides with posttranslational modifications at major anchor positions. We suggest that ligands with phosphorylated residues at P2 might be optimal targets for T-cell-based cancer immunotherapy.
Major histocompatibility complex (MHC) class I molecules are cell surface glycoproteins that are expressed on almost every nucleated cell in vertebrates. They result from the noncovalent interaction of a polymorphic heavy chain, a constant light chain (β-2-microglobulin (β2m)), and a peptide ligand (1). The extracellular region of the heavy chain encompasses three domains, α1, α2, and α3, with α1 and α2 forming a groove that accommodates a peptide ligand of, typically, 8 to 11 amino acid residues. The binding of the ligand to the groove is governed by the interaction of the side chains of certain peptide residues, called anchor positions, with several pockets of the heavy chain named A to F (1, 2). The size and chemical nature of these pockets impose restrictions on the peptide repertoire that can be associated with a particular class I antigen. It is reckoned that the ligandome of a given class I allotype may comprise up to 10,000 different peptides (3), although recent reports suggest that this number may be underestimated (4).
Peptides displayed by MHC class I molecules derive from the intracellular degradation of endogenous proteins in the nucleus and cytosol and reach the lumen of the endoplasmic reticulum by means of the transporter associated with antigen processing. Inside the endoplasmic reticulum, peptides bind to the heavy chain and β2m in a multistep process involving several chaperones. Finally, if the bound peptide confers enough stability to the complex, the MHC class I molecule migrates via the Golgi network to the cell surface (5). MHC class I molecules facilitate immunological surveillance by presenting peptide ligands to CD8+ T lymphocytes. When tumor-specific peptides or peptides derived from intracellular pathogens are detected by the T cells, they exert their cytotoxic effects over the antigen-presenting cell, promoting tumor suppression or eradication of the infection.
The MHC, known as the human leukocyte antigen (HLA) system in humans, is the most polymorphic region in the entire genome (6). In particular, the IMGT/HLA database (7) currently contains about 7000 allele sequences that encode more than 5000 different human class I antigens. Most of these polymorphisms are located within the α1 and α2 domains of the heavy chain and modulate the peptide binding preferences of each allotype (8). It is thought that the great diversity of HLA class I allotypes, and of their associated ligandomes, is an adaptation to guarantee immunity against intracellular pathogens (6). In this regard, the existence of a large number of different class I molecules capable of presenting diverse peptidomes hampers immune evasion by means of viral genetic mutation.
It has been known for a long time that HLA class I molecules display posttranslationally modified peptides at the cell surface (9). Among other modifications, N-terminal acetylation (10), phosphorylation (11–13), methylation (14), and glycosylation (15) have been described in MHC class I–bound peptidomes. In this context, phosphorylated ligands have raised much interest owing to their potential as targets in T-cell-based cancer immunotherapy (12, 13), given that aberrant phosphorylation is a hallmark of malignant transformation (16, 17) and phosphorylated epitopes can be specifically recognized by CTLs (11). Therefore, the characterization of the phosphopeptidome associated with MHC class I molecules and the identification of tumor-derived phosphopeptides are necessary in order for such immunotherapeutic approaches to be implemented.
Nevertheless, the identification of HLA class I–bound phosphopeptides is difficult because of several analytical limitations. Phosphopeptides constitute only a small fraction of the peptide repertoire of a given HLA allotype. Additionally, MS analysis is hindered by the low ionization efficiency of phosphorylated species relative to their nonphosphorylated counterparts (18), which makes them more difficult to detect. Moreover, the fragmentation of phosphorylated peptides by collision-induced dissociation usually results in minimally informative MS2 spectra (19). As a consequence of the lability of the phosphate group, which is readily dissociated during fragmentation, a prominent signal corresponding to the neutral loss of phosphoric acid is often observed contrasting with poor b- and y-type ion signals. This phenomenon is especially exacerbated in the case of phosphoserine (19), which is involved in about 90% of the phosphorylation events in the human proteome (18). Thus, the unambiguous identification of phosphopeptides is usually a challenging task.
To overcome these limitations, a wide variety of proteomic approaches have been developed. Several such strategies rely on the enrichment of the phosphorylated species prior to LC-MS analysis, typically by means of IMAC or TiO2 affinity chromatography (20). These techniques have also been successfully applied to the characterization of the phosphopeptidomes of several class I molecules (11–13).
In this study, we employed IMAC and TiO2 chromatography in combination with LC-MS to enrich and identify phosphorylated peptides associated with HLA-B*40:02. This strategy allowed the identification of a large number of endogenous ligands and the fine mapping of the B*40:02 binding motif. It was also effective for the characterization of the phosphopeptidome displayed by this allotype. Among the identified phospholigands, we found some sequences lacking the canonical binding motif at peptide position 2 (P2) and carrying a phosphoserine residue instead. Thus, the presentation of ligands harboring posttranslationally modified residues at major anchor positions contributes to the increased diversity of HLA class I peptidomes. We suggest that these sorts of epitopes might be valuable in T-cell-based cancer immunotherapy.
HMy2.C1R (C1R) is a human lymphoid cell line with low expression of its endogenous HLA class I molecules. C1R cells show reduced levels of HLA-B*35:03 and normal expression of HLA-C*04:01 (21). A full-length cDNA clone of B*40:02 was obtained from the LCL line 143.2 (22) and cloned into the RSV5neo vector. C1R-B*40:02 transfectants were generated via electroporation of 10^7 C1R cells at 250 mV and 960 μF. To select stable transfectants, we grew the cells in the presence of 1 mg/ml geneticin (Invitrogen), and surface expression of HLA-B*40:02 was confirmed by flow cytometry. Cells were cultured in DMEM supplemented with 7.5% FCS (both from Sigma). The mAb W6/32 (IgG2a specific for a monomorphic HLA class I determinant) has been described elsewhere (23).
B*40:02-bound peptides were isolated as described elsewhere (24), with minor variations. About 10^10 C1R-B*40:02 transfectant cells were lysed in 1% Igepal CA-630 (Sigma-Aldrich), 20 mM Tris, 150 mM NaCl (pH 7.5) in the presence of a mixture of protease inhibitors (Complete-Midi, Roche). No phosphatase inhibitors were included. The lysate was subjected to differential centrifugation for 10 min at 2000 × g, 30 min at 10,000 × g, and 1 h at 100,000 × g. After ultracentrifugation, the soluble fraction was subjected to affinity chromatography using the W6/32 mAb. HLA-B40-bound peptides were eluted with 0.1% aqueous TFA at room temperature and filtered through Centricon 3 devices (Amicon, Beverly, MA). Peptides were concentrated in a SpeedVac, desalted with an OMIX C18 tip (Varian, Palo Alto, CA), and dried completely.
Phosphopeptide enrichment entailed the combination of two previously described approaches, namely, IMAC with Fe3+ as ligand (25) and TiO2 chromatography (26). The peptide pool, reconstituted in 200 μl of loading solution (50% acetonitrile, 0.3% TFA, pH 1.5), was incubated at room temperature with 15 μl of Phos-Select iron affinity gel (Sigma) that was subsequently packed in a homemade tip column. This column was connected to a second one packed with Oligo R3 resin (Applied Biosystems, Foster City, CA). The flow-through was recovered and both columns were washed extensively, first with loading solution, then with transfer solution (1% phosphoric acid), and finally with washing solution (2% acetonitrile, 0.1% TFA). Elution was carried out using two solutions sequentially: (i) 50% acetonitrile, 0.1% TFA; and (ii) 30% acetonitrile, 0.06 mM NH4OH. Both eluates were mixed, dried to completeness, and redissolved in 0.1% formic acid (this sample is referred to as the IMAC fraction throughout the article).
The flow-through of the first chromatography was dried and reconstituted in 1 M glycolic acid, 80% acetonitrile, 5% TFA and applied to a microcolumn packed with Titansphere TiO2 resin (GL Sciences, Tokyo, Japan). The unbound fraction was recovered, acidified with 10 μl of 0.1% formic acid, dried, and reconstituted in 0.1% TFA before desalting with a C18 ZipTip (Eppendorf, Hamburg, Germany). Afterward, the sample was dried and reconstituted in 0.1% formic acid (this sample is referred to as the flow-through fraction throughout the article). The TiO2 column was washed with 80% acetonitrile, 5% TFA and the peptides were eluted with two solutions: (i) 0.3 M NH4OH and (ii) 0.3 M NH4OH, 30% acetonitrile. Both eluates were mixed and the medium was acidified by the addition of formic acid. Then, the samples were desalted using a C18 ZipTip (Eppendorf), dried in a SpeedVac, and redissolved in 0.1% formic acid (this sample is referred to as the TiO2 fraction throughout the article).
Two technical replicates of each fraction (flow-through, IMAC, and TiO2) of the B*40:02-bound peptide pool were analyzed in a nano-LC Ultra HPLC (Eksigent, Framingham, MA) coupled online with a 5600 triple TOF mass spectrometer (AB Sciex, Framingham, MA) through a nanospray III source (AB Sciex). Chromatography was performed using a C18 chromXP trapping column (350 μm × 0.5 mm, 3-μm particle diameter, 120-Å pore size; Eksigent) and a C18 chromXP column (75 μm × 150 mm, 3-μm particle diameter, 120-Å pore size; Eksigent). Solvent A was 0.1% formic acid in water, and solvent B was 0.1% formic acid in acetonitrile. The loading pump was operated at isocratic conditions with buffer A at a flow rate of 2 μl/min for 10 min. The nanopump worked at 300 nl/min under gradient elution conditions as follows: 2% B for 1 min, a linear increase to 30% B in 109 min, a linear increase to 40% B in 10 min, a linear increase to 90% B in 5 min, and 90% B for 5 min. HPLC was controlled with the Eksigent Control software (version 3.12, Eksigent).
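As a reading aid, the nanopump gradient described above can be written down as a small table of segments and interpolated. The following is a minimal Python sketch; the names `GRADIENT`, `total_minutes`, and `percent_b` are illustrative and are not part of the Eksigent control software.

```python
# Gradient segments as (duration_min, start_percent_B, end_percent_B),
# transcribed from the nanopump program described in the text.
GRADIENT = [
    (1, 2.0, 2.0),     # 2% B for 1 min
    (109, 2.0, 30.0),  # linear increase to 30% B in 109 min
    (10, 30.0, 40.0),  # linear increase to 40% B in 10 min
    (5, 40.0, 90.0),   # linear increase to 90% B in 5 min
    (5, 90.0, 90.0),   # 90% B for 5 min
]


def total_minutes(gradient=GRADIENT):
    """Total programmed gradient time in minutes."""
    return sum(duration for duration, _, _ in gradient)


def percent_b(t_min, gradient=GRADIENT):
    """Percent solvent B at time t_min, by linear interpolation per segment.

    After the last segment the final composition is held (an assumption;
    the text does not describe re-equilibration).
    """
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in gradient:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    return gradient[-1][2]


if __name__ == "__main__":
    print(total_minutes())   # programmed run length in minutes
    print(percent_b(55.5))   # composition halfway through the main ramp
```

Summing the segment durations gives a 130-min program, of which the 2 to 30% B ramp occupies 109 min.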
The nanospray source was equipped with a fused silica PicoTip emitter (10 μm × 12 cm, New Objective, Woburn, MA). The ion source was operated in positive ionization mode at 150 °C with a potential difference of 2800 V. Each acquisition cycle consisted of a survey scan of 250 ms between 350 and 1250 m/z units and a maximum of 50 MS2 spectra scanning between 100 and 1500 m/z units. Ions showing the highest intensities in the MS spectrum were selected for fragmentation. Singly charged ions were excluded to avoid the fragmentation of non-peptide contaminants. A dynamic exclusion window of 20 s was applied to each fragmented ion. The total cycle time was 2.8 s. The mass spectrometer was controlled with Analyst TF software (version 1.5, AB Sciex).
Synthetic peptides were dissolved in 0.5% formic acid and 20% acetonitrile at an estimated concentration of 250 fmol/μl. Peptides were directly infused at a flow rate of 3 μl/min into the 5600 triple TOF mass spectrometer. For retention time comparison of natural and synthetic phosphopeptides, a pool of 85 synthetic phosphopeptides, 100 fmol each, was prepared in 0.1% formic acid and analyzed via LC-MS/MS using the same chromatographic conditions described above. The extracted ion chromatogram of each phosphopeptide was used to determine its retention time.
Raw data were processed with PeakView software (version 1.1, AB Sciex) to generate an MGF file that was used as input for a Mascot (version 2.4) MS/MS ion search. A concatenated target-decoy protein database containing 40,478 sequences was generated by combining the UniProt Homo sapiens complete proteome set (downloaded on May 23, 2011) with its corresponding reversed database generated with the DBToolKit software (version 4.1.4). Search parameters were set as follows: no enzyme; peptide tolerance, 15 ppm; MS/MS tolerance, 25 mDa; and electrospray ionization quadrupole TOF as instrument. Variable modifications included phosphorylation of serine, threonine, and tyrosine; oxidation of methionine; and pyroglutamic acid formation from N-terminal glutamine. Peptide sequences that matched the HLA-C*04:01 binding motif, Phe or Tyr at P2 and Asp at P3 (27), were discarded. Estimation of the false discovery rate (FDR) was carried out by decoy hit counting as previously described (28, 29), and only those matches with an FDR < 5% at the peptide level were considered. All the information related to the MS analysis, MS/MS ion search, and peptide identification, including raw data, MS metadata, MGF and mzIdentML files, and the corresponding MIAPE MS and MSI documents, were deposited in ProteomeXchange (PRIDE accession number 31118, ProteomeXchange accession number PXD000450). This process was aided by the MIAPE extractor tool (version 3.7.0).
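The decoy hit counting step can be illustrated with a short Python sketch. The exact estimator used in references 28 and 29 is not reproduced here, so the code assumes one common convention for concatenated target-decoy searches, FDR = decoy hits / target hits at or above a score threshold. The function names and the toy score list are hypothetical.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    `matches` is a list of (score, is_decoy) pairs. The decoys/targets
    estimator is an assumption, not necessarily the one the authors used.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def lowest_passing_threshold(matches, max_fdr=0.05):
    """Return the lowest observed score whose FDR estimate is <= max_fdr.

    All observed scores are tried because the estimate is not guaranteed
    to be monotonic in the threshold. Returns None if nothing passes.
    """
    passing = [score for score in {s for s, _ in matches}
               if fdr_at_threshold(matches, score) <= max_fdr]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Made-up scores, for illustration only: (score, is_decoy).
    toy = [(90, False), (80, False), (70, False), (60, False),
           (50, False), (45, True), (40, False), (35, True)]
    print(lowest_passing_threshold(toy, max_fdr=0.05))
```

In practice the input would be the peptide-level matches from the concatenated search, and the returned threshold plays the role of the Mascot score cutoff reported with the results.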
For the identification of phosphorylated ligands, every match with a Mascot score greater than 25 was considered. Then, MS2 spectra were manually inspected for signals that could correspond to the neutral loss of the phosphate group (−98, −49, and −32.7 for singly, doubly, and triply charged ions, respectively). Putative phosphopeptides were validated by means of retention time comparison and fragmentation of the corresponding synthetic peptide (supplemental data S2).
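The neutral-loss offsets quoted above follow from dividing the mass of phosphoric acid by the precursor charge. A minimal Python sketch of that arithmetic is given below; the monoisotopic mass of H3PO4 (about 97.9769 Da) is a standard value, while the helper names and the 0.025 Da matching tolerance (borrowed from the MS/MS search tolerance) are assumptions made for illustration.

```python
H3PO4_MONOISOTOPIC = 97.9769  # Da, monoisotopic mass of phosphoric acid


def neutral_loss_offset(charge):
    """m/z shift produced by losing H3PO4 from an ion of the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return H3PO4_MONOISOTOPIC / charge


def neutral_loss_mz(precursor_mz, charge):
    """Expected m/z of the precursor after neutral loss of H3PO4."""
    return precursor_mz - neutral_loss_offset(charge)


def has_neutral_loss_peak(peaks_mz, precursor_mz, charge, tol=0.025):
    """True if any peak lies within tol (Da, assumed) of the expected m/z."""
    expected = neutral_loss_mz(precursor_mz, charge)
    return any(abs(mz - expected) <= tol for mz in peaks_mz)


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(neutral_loss_offset(z), 1))
```

A check of this kind can only flag candidate spectra; in the workflow described here the final call still rested on manual inspection and on comparison with the synthetic peptide.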
To assess preferences in residue usage, we grouped the identified peptides according to their length. The frequency of each residue at each peptide position (fobs) was compared with the frequency of the same amino acid in the database (fexp) under the null hypothesis that fobs ≤ fexp. Preliminary p values for each residue and position were calculated assuming a binomial distribution with p = fexp. Definitive P values were obtained by subjecting preliminary p values to multiple testing correction as follows:
where k is the number of residues of each peptide in the set tested (i.e. k = 9 in the nonamer set). P values less than 0.05 were considered statistically significant.
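The residue-usage test can be sketched in Python as follows. The upper-tail binomial probability matches the null hypothesis fobs ≤ fexp stated above. The correction step, however, is an assumption: the formula is not reproduced in this text, so the sketch uses a Bonferroni-style correction over 20 × k tests (20 amino acids at each of k positions) purely for illustration. The value of fexp for Glu used in the example is back-calculated from the reported fobs/fexp ratio and is therefore approximate.

```python
from math import exp, lgamma, log


def log_binom_pmf(x, n, p):
    """Natural log of the Binomial(n, p) probability mass at x (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log(1 - p))


def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p): the preliminary p value."""
    total = sum(exp(log_binom_pmf(i, n, p)) for i in range(x, n + 1))
    return min(1.0, total)


def corrected_p(p_value, k, n_residues=20):
    """Bonferroni-style correction over n_residues * k tests.

    ASSUMPTION: the authors' actual correction formula is not shown in the
    text; this stand-in only illustrates the role of k.
    """
    return min(1.0, p_value * n_residues * k)


if __name__ == "__main__":
    # Octamer set: 228 of 337 peptides carry Glu at P2 (from the Results).
    # f_exp ~ 0.072 is back-calculated from f_obs / f_exp = 9.4 (approximate).
    p_prelim = binom_upper_tail(228, 337, 0.072)
    print(corrected_p(p_prelim, k=8) < 0.05)
```

Working in log space avoids overflow in the binomial coefficients for sets as large as the nonamers (n = 1131).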
The stepwise solid-phase peptide synthesis was performed on an automated Multipep peptide synthesizer (Intavis, Koeln, Germany) using standard Fmoc (N-(9-fluorenyl)methoxycarbonyl) chemistry. Peptides were purified via reversed-phase chromatography either in a Smartline HPLC instrument (Knauer, Berlin, Germany) equipped with a 218TP52 C18 column (Vydac, Deerfield, IL) or using an Oligo R3 (Applied Biosystems) microcolumn. Peptides intended for binding assays were quantified by means of amino acid analysis in a Biochrom 30 amino acid analyzer (Biochrom, Cambridge, UK). The peptide GEFGGCGSV was labeled after synthesis with 5-iodoacetamidofluoresceine (Thermo) following the manufacturer's instructions. Afterward, it was purified by means of reversed-phase HPLC and quantified based on absorbance at 491 nm as described elsewhere (30).
Binding assays were performed essentially as described elsewhere (30, 31), with minor variations. In brief, C1R-B*40:02 transfectants were washed twice with PBS and acid stripped via incubation in 0.263 M citric acid, 0.123 M Na2HPO4, 1% BSA, pH 3, for 2 min at 4 °C. After two washes with ice-cold DMEM, the cells were incubated in DMEM supplemented with 5% FCS, 2 μg/ml of human β2m (Calbiochem), 400 nM fluorescent reference peptide GEFGGXGSV (where X represents fluoresceine-labeled cysteine), and different concentrations of the test peptides ranging from 50 μM to 23 nM in 3-fold dilutions. After overnight incubation at 4 °C, fluorescence was measured in an Epics XL-MCL flow cytometer (Beckman Coulter). Inhibition of reference peptide binding was plotted versus the test peptide concentration, and the IC50 (the concentration of test peptide that gives 50% inhibition) was estimated after the experimental results had been fitted to a sigmoid curve, as previously described (30).
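The dilution scheme and the IC50 estimate can be illustrated with a small Python sketch. Eight 3-fold steps starting at 50 μM end at about 23 nM, which matches the range given above. The curve model (a two-parameter logistic with a Hill slope) and the coarse grid-search fit are assumptions chosen to keep the example dependency-free; the fitting procedure of reference 30 may differ, and the data in the usage example are synthetic.

```python
def dilution_series(top_um=50.0, fold=3.0, n_points=8):
    """Concentrations in uM, starting at top_um and dividing by fold each step."""
    return [top_um / fold ** i for i in range(n_points)]


def inhibition(conc_um, ic50_um, hill):
    """Percent inhibition under a two-parameter logistic (assumed model)."""
    return 100.0 / (1.0 + (ic50_um / conc_um) ** hill)


def fit_ic50(concs_um, inhibitions):
    """Least-squares grid search over log10(IC50) and the Hill slope.

    Returns (ic50_um, hill). The grid spans IC50 from 0.01 to 100 uM in
    0.01 log10 steps and Hill slopes from 0.5 to 3.0 in steps of 0.1.
    """
    best = None
    for i in range(401):
        ic50 = 10 ** (-2.0 + 0.01 * i)
        for j in range(26):
            hill = 0.5 + 0.1 * j
            sse = sum((inhibition(c, ic50, hill) - y) ** 2
                      for c, y in zip(concs_um, inhibitions))
            if best is None or sse < best[0]:
                best = (sse, ic50, hill)
    return best[1], best[2]


if __name__ == "__main__":
    concs = dilution_series()
    print(round(concs[-1] * 1000, 1), "nM at the lowest point")
    # Synthetic inhibition values generated from the model itself.
    synthetic = [inhibition(c, 1.8, 1.0) for c in concs]
    print(fit_ic50(concs, synthetic))
```

With real cytometry data the inhibition values would first be derived from the fluorescence of wells with and without competitor; that normalization is not shown here.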
HLA-B*40:02 was affinity purified and its constitutive peptidome was acid extracted and isolated via centrifugal filtration. The phosphorylated species in the peptide pool were enriched by means of IMAC and TiO2 chromatography. This strategy yielded three fractions (namely, IMAC eluate, TiO2 eluate, and flow-through) that were then analyzed via LC-MS.
Peptide matches were filtered at an FDR < 5% at the peptide level (FDR < 2.4% at the peptide spectrum match level, Mascot score > 45.73). A total of 2246 unique ligands were identified, including 337 (15%) octamers, 1131 (50%) nonamers, 445 (20%) decamers, 245 (11%) undecamers, 74 (3%) dodecamers, and 14 (1%) tridecamers (Fig. 1A and supplemental data S1). The size distribution of this set of ligands showed a Gaussian-like pattern with a mean (± S.D.) molecular weight of 1073.2 (± 130.0) Da (Fig. 1B).
Not surprisingly, the vast majority of the ligands (1935, 86%) were identified in the flow-through fraction, including 1581 sequences (70%) that were not detected in the IMAC or TiO2 samples. The numbers of ligands in the IMAC and TiO2 eluates were 399 (18%) and 371 (17%), respectively, including 79 (4%) and 203 (9%) peptides that were found in these fractions exclusively (Fig. 1C).
To determine the B*40:02 binding motif, peptides were grouped according to their length. Only octamers to undecamers were considered because the number of sequences was large enough for proper statistical analysis only in these sets. We assumed that in peptide positions not subjected to structural constraints, residue usage should mirror the frequencies of each amino acid in the proteome. Conversely, the overrepresentation of one or more particular residues would reflect the binding preferences of HLA-B40. The frequency of each residue at each peptide position (fobs) was compared with the frequency of that particular amino acid in the database (fexp). A residue was considered to be favored at a given position if the difference between fobs and fexp was statistically significant (p < 0.05) after multiple testing correction.
The major restriction for binding to B*40:02 involved P2. B*40:02 ligands showed an almost absolute preference for acidic residues at this position (Fig. 2). Although both Asp and Glu could be accommodated, the frequency of Asp2 was appreciably lower and declined as peptide length increased. Among octamers, 228 (68%, fobs/fexp = 9.4) and 86 (26%, fobs/fexp = 5.3) peptides showed Glu2 and Asp2 motifs, respectively. In the case of nonamers, Glu2 was present in 1010 ligands (89%, fobs/fexp = 12.3) and Asp2 was found only in 67 (6%, fobs/fexp = 1.2). Notably, no peptides containing Asp2 were observed among decamers and undecamers, where the frequency of Glu2 reached 97% (fobs/fexp = 13.4) and 96% (fobs/fexp = 13.2), respectively.
The second most relevant anchor position was the peptide C terminus (PΩ), which showed a slightly more relaxed specificity than P2 (Fig. 3). In this case, a strong selection of Leu was observed encompassing 210 octamers (62%, fobs/fexp = 6.3), 696 nonamers (62%, fobs/fexp = 6.3), 297 decamers (67%, fobs/fexp = 6.8), and 178 undecamers (73%, fobs/fexp = 7.4). Other aliphatic residues (Val, Ile), Met, and Phe were also significantly overrepresented. Finally, Ala could also be found at PΩ, albeit at a lower frequency than expected (fobs/fexp < 1).
Besides these two main anchors, restrictions in residue usage were observed in most peptide positions, although much more subtle than those affecting P2 and PΩ (Fig. 4). Only the central regions of decamers (P5 and P6) and undecamers (P5, P6, and P7) showed no preference for any particular amino acid (Fig. 4).
In order to gain insight into the B*40:02 phosphopeptidome, an enrichment strategy was deployed. Two tandem affinity steps (namely, IMAC with Fe3+ and TiO2 affinity chromatography) were used to capture the putative phosphorylated species present in the B*40:02 peptide pool. Afterward, LC-MS analysis of the bound material and database searching were performed.
To bypass the limitations related to the identification of phosphopeptides via MS, every peptide match with a score greater than 25 in the Mascot search was considered. Then, MS2 spectra were manually inspected for signals that could derive from the neutral loss of phosphoric acid (see “Experimental Procedures”). Finally, putative phosphorylated sequences were further confirmed through comparison of the retention times and the MS2 spectra of the endogenous and the synthetic peptides (Fig. 5 and supplemental data S2).
A total of 85 unique phosphopeptides were sequenced using this approach (Table I and supplemental data S2). Of them, 69 (81%) were identified only in the IMAC eluate and 16 (19%) were found in both the IMAC and the TiO2 samples. No single peptide belonged exclusively to the TiO2 fraction. In our dataset, phosphorylation occurred exclusively at serine (77 sequences, 91%) and threonine residues (8 sequences, 9%) and peptides containing phosphotyrosine could not be identified. Notably, in 60 ligands (71%) phosphorylation involved SP or TP sites (i.e. phosphorylation occurred before a proline residue). Finally, 48 out of the 85 phosphorylated positions described (56%) had been previously annotated in either the HPRD or the UniProt database (Table I). Thus, the remaining 44% were novel phosphosites described here for the first time.
Interestingly, 3 out of the 85 phosphorylated ligands identified in this study lacked Asp or Glu at P2 and contained phosphoserine (pSer) instead: S[pS]YGNIRAV, G[pS]FSRFYSL, and R[pS]FPTLPTL (peptides 57, 65, and 81 in Table I and supplemental data S2). It is worth noting that pSer and Glu share structural similarities. Both residues hold side chains of comparable length with a net negative charge. As a consequence, we reasoned that the interaction of pSer with the B pocket could confer enough stability to the complex to allow B*40:02 to display ligands with phosphorylated residues at P2.
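The bracket notation used for these ligands (for example, S[pS]YGNIRAV) is easy to process programmatically. The following Python sketch parses such a string into its unmodified sequence plus the 1-based phosphosite positions, and classifies the residue at P2. The function names and the three category labels are illustrative choices, not part of the original analysis.

```python
import re

# One token is either a phosphoresidue such as "[pS]" or a plain residue.
_TOKEN = re.compile(r"\[p([STY])\]|([A-Z])")
_VALID = re.compile(r"(?:\[p[STY]\]|[A-Z])+")


def parse_phosphopeptide(notation):
    """Return (plain_sequence, phosphosite_positions) with 1-based positions."""
    if not _VALID.fullmatch(notation):
        raise ValueError(f"unrecognized peptide notation: {notation!r}")
    residues, sites = [], []
    for position, match in enumerate(_TOKEN.finditer(notation), start=1):
        if match.group(1):              # phosphorylated S, T, or Y
            residues.append(match.group(1))
            sites.append(position)
        else:
            residues.append(match.group(2))
    return "".join(residues), sites


def p2_anchor_type(notation):
    """Classify P2 as 'phospho', 'acidic' (Asp/Glu), or 'other'."""
    sequence, sites = parse_phosphopeptide(notation)
    if len(sequence) < 2:
        raise ValueError("peptide too short to have a P2 residue")
    if 2 in sites:
        return "phospho"
    return "acidic" if sequence[1] in "DE" else "other"


if __name__ == "__main__":
    for peptide in ("S[pS]YGNIRAV", "G[pS]FSRFYSL", "R[pS]FPTLPTL",
                    "SEYGNIRAV", "GRIDKPILK"):
        print(peptide, parse_phosphopeptide(peptide), p2_anchor_type(peptide))
```

Applied to a full ligand list, a classifier of this kind would single out the phosphoserine-at-P2 sequences discussed above from those with a canonical Asp or Glu anchor.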
To test this hypothesis, the natural ligands S[pS]YGNIRAV and G[pS]FSRFYSL and the related mutant peptides SEYGNIRAV and GEFSRFYSL were tested for binding to B*40:02. In this assay, C1R-B*40:02 cells were acid stripped to dissociate surface HLA class I complexes. Then, a reference peptide that bound specifically to HLA-B40 (Fig. 6A) was added to the cells together with human β2m and different concentrations of the test peptides. The amount of fluorescent peptide bound to B*40:02 was determined via flow cytometry, and the binding affinity of the test peptides was inferred from the concentration-dependent inhibition of the binding of the reference peptide.
Both phosphopeptides bound to B*40:02 with high affinity. The IC50 value, defined as the concentration of test peptide that yielded 50% inhibition, was 1.8 and 0.3 μM for S[pS]YGNIRAV and G[pS]FSRFYSL, respectively (Fig. 6). Likewise, the mutant peptides SEYGNIRAV and GEFSRFYSL showed affinity similar to or slightly higher than that of their phosphorylated counterparts (IC50 = 1.3 and 0.2 μM, respectively). This further confirmed that S[pS]YGNIRAV and G[pS]FSRFYSL are true B*40:02 ligands and indicated that the substitution of Glu by pSer has little or no effect on binding affinity. The peptide GRIDKPILK, a known HLA-B*27:05 ligand, was included as a negative control, as it lacks proper motifs at P2 and PΩ to fit the B*40:02 groove. As seen in Fig. 6, this ligand failed to inhibit the binding of the reference peptide, demonstrating that the rest of the inhibition curves actually reflect specific binding of the test peptides.
Modern mass spectrometers in combination with database searching strategies allow high-throughput identification of peptides and proteins on a routine basis. The same techniques devised to identify proteins have been applied to the characterization of class I–bound peptide repertoires (4, 32–34). The identification of HLA class I–bound peptidomes, however, is usually a more complex task because of the relatively low amount of sample typically available. We estimate that after affinity purification, only about 2 to 4 μg of peptides are obtained from 10^10 cells transfected with the allotype of interest (data not shown). Despite this drawback, the high-throughput identification of HLA ligands is now feasible, and some authors have even proposed the staging of a Human Immunopeptidome Project, analogous to the Human Proteome Project (35), to systematically characterize the ligandomes of HLA antigens (36).
In this work, we focused on the characterization of the peptidome and phosphopeptidome presented by HLA-B*40:02, a member of the B44 supertype (8). B*40:02 has been reported to predispose to adult T-cell leukemia, a non-Hodgkin's lymphoma caused by human T-lymphotropic virus type 1 (37). Apparently, this association is explained by the limited capability of HLA-B*40:02 to present epitopes derived from human T-lymphotropic virus type 1 and to trigger a strong CTL response. If tumor-specific epitopes were described, it is conceivable that B40+ adult T-cell leukemia patients could benefit from T-cell-based immunotherapy.
The HLA-B40-associated peptide pool was affinity purified and its constitutive peptide repertoire was acid extracted. Afterward, phosphorylated ligands were enriched sequentially by means of IMAC and TiO2 affinity chromatography, yielding three different fractions, IMAC, TiO2, and flow-through, that were then analyzed via LC-MS. An MS/MS ion search allowed the identification of more than 2000 B*40:02 ligands. Most of them were identified in the flow-through, as expected, but both the IMAC and the TiO2 fractions contributed significantly to the number of detected ligands, providing more than 300 additional sequences. The size distribution of this peptide pool was that expected for a class I ligandome, with nonamers being by far the most abundant species and accounting for 50% of the peptide repertoire. Additionally, a relatively high frequency of octamers (15%) was detected. Although not common, this feature is shared by other HLA class I antigens such as B37 or B18 (38).
For fine mapping of the B40 binding motif, peptides were grouped according to their lengths. This grouping is required because class I ligands are anchored through their N and C termini (1) to the heavy chain. Consequently, short peptides are bound in an extended conformation, whereas the central region of longer ligands protrudes from the groove (39–42). For this reason, the proper alignment of peptides of different lengths is not straightforward, especially regarding peptide positions P4 to PΩ-2.
The major constraint for binding to B40 was found at P2, where the overwhelming majority of ligands contained acidic residues. Indeed, only 4.7% of the identified peptides showed alternative amino acids at this position, which is consistent with the estimated FDR of the whole set (<5%). This indicates that the presence of Glu2 or Asp2 is mandatory for binding to B*40:02 and suggests that the sequences without this motif are probably random matches. It is also possible that contaminating peptides bound to HLA-B*35:03 were present in our dataset because, in contrast to C*04:01, B35 ligands were not filtered out during data analysis. However, only 8 out of the 2246 reported sequences (0.36%) matched the HLA-B*35:03 binding motif (43). This is consistent with the very low expression level of HLA-B35 in C1R cells caused by a point mutation in its translation initiation codon (21).
Although the three-dimensional structure of HLA-B*40:02 has not yet been elucidated, other members of the B44 supertype with an identical B pocket, such as B41, show a similar restriction at P2 (39). Analysis of the crystal structures of HLA-B*41:03 and B*41:04 reveals the presence of hydrogen bonds between the residue at P2 with Tyr99 and Glu63, van der Waals interactions with Tyr7, and potential salt bridges with His9 and Lys45 (39). As a general rule, position 45 is critical for the specificity of the B pocket. In this regard, allotypes with Lys45 such as B41 (39) or B44 (44) bind peptides with acidic residues at P2, whereas other molecules with Glu45, such as B27, show an almost absolute preference for Arg at this position (33, 45, 46).
Intriguingly, a size-dependent modulation of residue usage at P2 was found. Whereas B40 bound octamers with Glu and Asp at this position (68% and 26% of the sequences, respectively), among nonamers, ligands with Asp2 accounted for only 6% of the peptide set. Furthermore, no decamers or undecamers containing Asp2 were identified. At present, we have no structural explanation for this finding, and the determination of the three-dimensional structure of B40 will probably be required to shed light on this issue.
As in other HLA class I antigens, the second most influential position for binding to B*40:02 was found at the peptide C terminus. Residues with hydrophobic side chains at PΩ were found in most cases (Fig. 3 and supplemental data S1). By far, Leu was the preferred C-terminal residue and was present in about 65% of the identified ligands. Phe, Val, Ile, and Met were statistically overrepresented, at least in the nonamer set. Finally, Ala was also found in a number of ligands, although its frequency, lower than expected by chance, suggests that it is a suboptimal anchor motif. The molecular basis for this preference can be inferred from the crystal structure of B41, which shares with B40 most of the residues that shape the F pocket, including the key amino acids Leu95, Tyr116, Tyr123, and Trp147. In B*41:03 and B*41:04, the side chain of the residue at PΩ is deeply buried in a hydrophobic F pocket in contact with the abovementioned residues (39). Regarding the main anchor positions, a similar binding motif has been described for the closely related allotype HLA-B*40:01 (38). However, some differences in residue usage exist. B*40:01 is more restrictive both at P2, where only Glu is found, and at PΩ, where Phe is particularly disfavored. The structural basis for this discrepancy is not clear, as both allotypes share the same B pocket and a nearly identical F pocket. Perhaps the limited number of B*40:01 ligands identified to date (56 in the abovementioned study (38)) or indirect effects involving secondary anchor residues could explain this divergence.
Finally, most peptide positions showed some bias in residue usage, with the exception of P5 and P6 in decamers and P5, P6, and P7 in undecamers (Fig. 4). This lack of selection is probably a consequence of the bulged conformation that long peptides adopt to fit the binding groove (39–42). As a result, the central region of the peptide establishes no contact with the heavy chain, and thus there are no structural constraints that drive the selection of particular motifs.
One of the main goals of this work was the characterization of the phosphopeptidome displayed by HLA-B40. The identification of phosphorylated species associated with HLA class I molecules has gained considerable attention since they were proposed as putative targets for cancer immunotherapy (12, 13). Nevertheless, the identification of phosphopeptides via mass spectrometry is challenging because of several analytical limitations, such as their low stoichiometry, their inefficient ionization, or the poorly informative MS2 spectra obtained upon their fragmentation by collision-induced dissociation (18, 19). To circumvent these pitfalls, two strategies were adopted: (i) perform phosphopeptide enrichment prior to LC-MS analysis, and (ii) validate every single identification by comparing both the retention times and the MS2 spectra of the natural and the synthetic peptides.
Enrichment of phosphopeptides is mandatory for the mapping of phosphorylation events in classical bottom-up workflows (18). In the same way, identification of HLA class I–associated phospholigands has benefited from the implementation of these approaches (11–13). We combined IMAC and TiO2 affinity chromatography before LC-MS analysis, resulting in the identification of 85 B40-bound phosphopeptides. To our knowledge, this is the largest set of MHC class I phosphorylated ligands reported to date. All of them were found in the IMAC fraction, and 16 (19%) were also observed in the TiO2 sample. No identification belonged exclusively to the TiO2 set, indicating that TiO2 affinity chromatography after IMAC did not add to the number of phosphopeptides identified. However, as stated above, about 200 nonphosphorylated endogenous ligands were sequenced from this fraction exclusively, meaning that fractionating the B40-bound peptide pool using TiO2 columns had a positive effect on the sensitivity of the LC-MS analysis.
To overcome the low quality of phosphopeptide MS2 spectra, all the phosphorylated sequences with Mascot scores greater than 25 were manually analyzed, and those showing signals compatible with a neutral phosphate loss were compared with the fragmentation spectra of the equivalent synthetic peptide to remove false positives. In addition, the retention times of the natural and the synthetic phosphopeptides were found to correlate closely. This approach ensured that the set of phosphorylated sequences presented in this study was highly curated.
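As an illustration of what a signal "compatible with a neutral phosphate loss" means in practice, the sketch below computes the expected m/z of a precursor that has lost H3PO4 and checks a peak list for it. It is a simplified stand-in for the manual inspection described above, not the procedure actually used; the function names, the 0.02 m/z tolerance, the 5% relative intensity cutoff, and the example peak list are all assumptions.

```python
H3PO4_MONO = 97.97690  # monoisotopic mass of H3PO4 in Da


def neutral_loss_mz(precursor_mz, charge):
    """Expected m/z of the precursor after losing one H3PO4 molecule."""
    return precursor_mz - H3PO4_MONO / charge


def has_neutral_loss(peaks, precursor_mz, charge, tol=0.02, min_rel_intensity=0.05):
    """Return True if any peak matches the neutral-loss m/z.

    peaks: list of (mz, intensity) tuples from one MS2 spectrum.
    tol and min_rel_intensity are illustrative defaults, not validated values.
    """
    if not peaks:
        return False
    target = neutral_loss_mz(precursor_mz, charge)
    base_peak = max(intensity for _, intensity in peaks)
    return any(
        abs(mz - target) <= tol and intensity >= min_rel_intensity * base_peak
        for mz, intensity in peaks
    )


# Hypothetical doubly charged precursor and peak list, for illustration only.
spectrum = [(300.100, 200.0), (551.012, 1000.0)]
found = has_neutral_loss(spectrum, precursor_mz=600.0, charge=2)
```

A match of this kind only flags a spectrum as a candidate; as stated in the text, confirmation still rests on comparison with the synthetic peptide.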
In the 85 sequences reported, phosphorylation occurred at serine (91%) and threonine (9%) but not at tyrosine residues. This distribution parallels the relative frequencies of these posttranslational modifications in the proteome, namely, 90%, 10%, and <0.05% for phosphoserine, phosphothreonine, and phosphotyrosine, respectively (18). In a relatively high number of cases, phosphorylation occurred before a proline residue, probably reflecting the substrate specificity of proline-directed serine/threonine kinases, such as the MAP or the cyclin-dependent protein kinase families, which recognize and phosphorylate SP or TP sites (47). About half of the phosphorylation events described in this study had been previously reported, though not in the context of HLA class I peptide repertoires, supporting the accuracy of the identifications. Thirty-seven sequences revealed novel phosphorylation sites described here for the first time. This shows that HLA peptidomics may also contribute to the characterization and annotation of posttranslational modifications in proteins.
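The classification of phosphosites by modified residue and by proline-directed context can be sketched as below. The code assumes the bracket notation used in this article for phosphorylated residues (for example, S[pS]YGNIRAV); the function names are hypothetical, and the third example sequence is invented for illustration.

```python
import re
from collections import Counter

PHOSPHOSITE = re.compile(r"\[p([STY])\]")   # captures the modified residue
PRO_DIRECTED = re.compile(r"\[p[ST]\]P")    # pSer or pThr followed by Pro


def phosphosite_residues(sequences):
    """Count phosphorylated residues (S, T, Y) across annotated sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(PHOSPHOSITE.findall(seq))
    return counts


def is_proline_directed(sequence):
    """True if any pSer/pThr in the sequence is immediately followed by Pro."""
    return PRO_DIRECTED.search(sequence) is not None


# The first two ligands are quoted in this article; the third is hypothetical.
ligands = ["S[pS]YGNIRAV", "G[pS]FSRFYSL", "AE[pS]PKLML"]
residue_counts = phosphosite_residues(ligands)
pro_directed = [seq for seq in ligands if is_proline_directed(seq)]
```

Note that this check only sees residues within the ligand itself; a phosphosite at the peptide C terminus would require the source protein sequence to determine whether a proline follows.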
The main finding of this work was the identification of three ligands that lacked the canonical B40 binding motif at P2. These sequences were not false positives, as the MS2 spectra of the corresponding synthetic peptides were identical to the experimental ones. One of them derives from residues 721 to 729 of the regulatory-associated protein of mTOR (Raptor), a component of the mammalian target of rapamycin complex 1, which regulates cell growth and autophagy in response to starvation (48, 49). Phosphorylation of Raptor at Ser722 has been previously described (50, 51). The other two sequences correspond to novel phosphorylation sites of cytochrome b-c1 complex subunit 2, a member of respiratory chain complex III, and runt-related transcription factor 3.
The three sequences harbored a pSer residue at the main anchor position instead of Asp or Glu. Furthermore, S[pS]YGNIRAV and G[pS]FSRFYSL were shown to bind to B40 with high affinity, confirming that they were bona fide ligands. Given the structural similarities between pSer and Glu in terms of size and charge distribution, we hypothesized that pSer and acidic residues would interact with the B pocket in a similar way, leading to the formation of stable complexes. Indeed, these two ligands behaved similarly, in terms of binding affinities, to the related mutant sequences SEYGNIRAV and GEFSRFYSL.
The identification of HLA class I ligands harboring phosphorylated residues at their major anchor position might be relevant in the design of immunotherapeutic approaches for cancer treatment. Given that abnormal phosphorylation is frequently observed in transformed cells, tumor-specific phosphopeptides are obvious candidates for T-cell-based therapies. However, besides a target epitope, a specific CTL response is required in order to eradicate transformed cells. When phosphorylation occurs at nonanchor residues, both the phosphorylated and the nonphosphorylated species will likely be displayed. If this is the case, a T-cell might recognize both species due to cross-reactivity. Although specific CTLs can be raised against phosphopeptides (11), some cytotoxic activity against their nonphosphorylated counterparts may be present (52). Thus, even if a CTL recognizing a phosphorylated ligand could escape negative selection in the thymus, cross-reactivity with the nonphosphorylated epitope would hamper its use as a therapeutic agent. In contrast, peptides phosphorylated at P2 can bypass this limitation, as the posttranslational modification of the residue is essential for binding to the class I molecule. In this scenario, cross-reactivity with the unmodified epitope is not possible, guaranteeing the specificity of the CTL response. Therefore, an important goal of future work will be the identification of HLA class I ligands phosphorylated at P2 in tumor samples.
We especially thank José Antonio López de Castro (Centro de Biología Molecular Severo Ochoa, Madrid, Spain) for his dedicated help in the generation of the C1R-B*40:02 transfectant cell line. We also thank Iñaki Álvarez (Institut de Biotecnologia i Biomedicina, Barcelona, Spain), Alberto Paradela, and Severine Gharbi (both from the CNB, Madrid, Spain) for critical comments on the manuscript; Salvador Martínez-Bartolomé (CNB, Madrid, Spain) for his expert support during data submission to ProteomeXchange; and the flow cytometry facility of the CNB for their technical assistance. The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD000450 and the PRIDE accession number 31118. The Proteomics Unit of the Centro Nacional de Biotecnología is a member of the Spanish National Institute for Proteomics (ProteoRed-ISCIII).
Author contributions: M.M. and J.P.A. designed research; M.M., A.A., M.L., and M.R. performed research; M.M., A.A., and A.R. analyzed data; M.M. wrote the paper.
* M.M. and A.A. were funded by the JAE-Doc 2009 and JAE-Pre 2011 CSIC programs, respectively.
This article contains supplemental material.
1 The abbreviations used are: