Experimental results are presented for 180 in silico designed octapeptide sequences and their stabilizing effects on the major histocompatibility class I molecule H-2Kb. Peptide sequence design was accomplished by a combination of an ant colony optimization algorithm with artificial neural network classifiers. Experimental tests yielded nine H-2Kb stabilizing and 171 nonstabilizing peptides. Twenty-eight of the nonstabilizing octapeptides contain canonical motif residues known to be favorable for MHC I stabilization. To characterize the area covered by stabilizing and nonstabilizing octapeptides in sequence space, we visualized the distribution of 100,603 octapeptides using a self-organizing map. The experimental results present evidence that the canonical sequence motifs of the SYFPEITHI database on their own are insufficient for predicting MHC I protein stabilization.
Cell surface presentation of peptides by major histocompatibility complex I (MHC I) is a prerequisite for the initiation of an adaptive immune response, and knowledge of MHC-binding peptides is required for the development of vaccines and immunomonitoring protocols for cell-mediated immunity. MHC I molecules are integral membrane proteins that bind peptides with a length of eight to thirteen amino acids for presentation to CD8+ T lymphocytes [2, 3]. Peptide binding to MHC I stabilizes the MHC-peptide structure at the cell surface of antigen presenting cells. Binding of an octapeptide to an MHC I molecule is defined by the recognition of the peptide by the MHC molecule and its binding affinity. In consequence, the binding of the octapeptide leads to stabilization of the MHC-peptide complex on the cell surface. Complex stability is critically influenced by the amino acid sequence of the bound peptide, for which Rammensee and coworkers suggested allele-specific canonical sequence motifs. For the octapeptides presented by the mouse MHC I H-2Kb this sequence motif (the canonical or SYFPEITHI motif) is defined as X-X-(Y)-X-[Y/F]-X-X-[L, M, I, V]. Positions three, five, and eight are also referred to as “anchor positions”.
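The motif definition above can be turned into a simple anchor check. The following is a minimal sketch, assuming that a position counts as matching when it carries one of the residues listed for it in the motif; the function name `anchor_matches` and the dictionary layout are illustrative and not taken from the SYFPEITHI database.

```python
# Canonical H-2Kb octapeptide motif as quoted in the text:
# X-X-(Y)-X-[Y/F]-X-X-[L, M, I, V]; anchors at positions 3, 5, and 8 (1-based).
ANCHORS = {3: "Y", 5: "YF", 8: "LMIV"}


def anchor_matches(peptide: str) -> int:
    """Count how many of the three anchor positions carry a motif residue."""
    if len(peptide) != 8:
        raise ValueError("expected an octapeptide")
    return sum(peptide[pos - 1] in allowed for pos, allowed in ANCHORS.items())


# SIINFEKL is the positive control named in the text; the other three
# sequences are designed peptides quoted in the results.
for seq in ("SIINFEKL", "WKFIFDPV", "FHHAHRTV", "WRYNYDPL"):
    print(seq, anchor_matches(seq))
```

Counting matches in this way only measures conformity to the motif; it says nothing about whether a peptide actually stabilizes H-2Kb.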
For characterization of the H-2Kb stabilizing and nonstabilizing sequence space we designed a diverse set of octapeptides. To explore extensions and alternatives to the known canonical motif, the set of designed octapeptides included sequences containing the full, partial, or no canonical motif. To generate new octapeptides that stabilize H-2Kb we applied an Ant Colony Optimization (ACO) algorithm in combination with neural network classifiers. Artificial neural networks (ANNs) were trained using a set of 423 octapeptides with known H-2Kb stabilizing effect as determined in cellular stabilization assays. The resulting machine learning classifiers served as the fitness function for the ACO algorithm. Navigation through the sequence space containing 20^8 (approximately 2.6 × 10^10) possible octapeptides was realized by the ACO meta-heuristic, which is derived from social insect behavior [7, 10]. ACO is a probabilistic technique that is not prone to converging on a single dominant solution but, due to its “swarm intelligence” based on numerous autonomous agents, is open to broad and distributed optimization. New peptide sequences were generated with the ACO algorithm and presented to the trained ANNs for fitness evaluation. During this optimization process the peptide sequences were iteratively adapted according to the ANN fitness score. Finally, the designed octapeptides were synthesized and their stabilization effect was tested experimentally.
We present evidence that rational peptide design utilizing ACO is feasible and leads to novel bioactive peptides with minimal experimental effort. Here, the focus is on the de novo design of peptides with a specific MHC I stabilization effect. While some of the designed peptides conform to the known canonical motif for H-2Kb stabilizing peptides, we also show that the degree to which the peptide sequence matches the motif alone is insufficient for prediction of MHC I stabilization. We designed peptides with the complete canonical sequence motif but lacking detectable stabilizing effect. For visualization of the transition between stabilizing and nonstabilizing octapeptides we present a projection of peptide sequence space on a self-organizing map (SOM). This form of representation facilitates the identification of clusters of stabilizing peptides based on their physicochemical properties.
Training data were compiled from public databases (AntiJen, EPIMHC, IEDB, MHCBN) and literature sources [16, 17]. The complete dataset contained 423 octapeptides with 242 positive (stabilizing) and 181 negative (nonstabilizing) examples. The annotation of octapeptides as stabilizing or nonstabilizing for the mouse MHC I protein H-2Kb was based on published experimental data. Peptides with EC50 values below 10 μM were regarded as H-2Kb stabilizing, those with greater EC50 values as nonstabilizing.
Each residue of an octapeptide was encoded by five different sets of molecular descriptors (see supplementary material (Suppl. 1) available online at doi:10.1155/2010/396847). The combination of amino acid descriptors served as input for the ANNs. The dimension of the input vector results from encoding each of the eight residues of the octapeptide with each descriptor.
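The encoding step can be illustrated with a short sketch. The descriptor sets actually used are listed in Suppl. 1 and are not reproduced here; the Kyte-Doolittle hydropathy scale and the aromatic flag below are stand-ins chosen only for illustration, and the function name `encode` is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example descriptor.
# Assumption: this scale is not necessarily among the descriptors of Suppl. 1.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical second descriptor: 1.0 for F, W, Y and 0.0 otherwise.
AROMATIC = {aa: (1.0 if aa in "FWY" else 0.0) for aa in HYDROPATHY}


def encode(peptide, descriptors):
    """Concatenate the descriptor values of every residue into one input vector."""
    vector = []
    for residue in peptide:
        for table in descriptors:
            vector.append(table[residue])
    return vector


x = encode("SIINFEKL", [HYDROPATHY, AROMATIC])
print(len(x))  # 8 residues x 2 descriptor values per residue
```

With eight residues and d descriptor values per residue, the input vector has 8 × d components, which is the dimension referred to in the text.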
The ACO algorithm was implemented using the Java programming language V.1.6 (Sun Microsystems, Inc., Santa Clara, CA, USA). Our ACO algorithm is defined by three consecutive steps: sequence design, path evaluation, and pheromone update, as previously described by Jäger et al. Together the three steps represent a single iteration of the algorithm (one generation of ants). Peptide design by ACO was terminated when the pheromone concentration had been constant for 10,000 iterations. Ants are computational agents with individual memory coded via “pheromone concentrations”. While moving through the search space each ant generates a path corresponding to a new octapeptide. All ants of one generation move independently of each other on individual paths. The resulting paths were evaluated by a fitness function implemented as ANNs. Communication between subsequent generations of ants is achieved through the modification of pheromone concentrations (“stigmergy” [19, 20]). The pheromone matrix represents the collective memory of an ant colony. Only the path with the highest fitness obtained a pheromone update. The advantage of the ACO algorithm is that agents need no information about the complete problem to propose a solution, in our case the complete sequence space of 20^8 possible octapeptides.
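The three steps can be sketched in a few lines. This is a simplified illustration in Python rather than the Java implementation used in the study: the fitness function is a toy stand-in for the ANN jury, the evaporation rate, colony size, and fixed iteration count are assumptions, and the stopping rule based on constant pheromone concentrations is replaced by a fixed number of generations.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 8
print(len(AMINO_ACIDS) ** LENGTH)  # size of the octapeptide sequence space, 20^8


def toy_fitness(peptide):
    """Stand-in for the ANN jury: fraction of canonical anchor residues present."""
    anchors = {2: "Y", 4: "YF", 7: "LMIV"}  # 0-based indices of positions 3, 5, 8
    return sum(peptide[i] in allowed for i, allowed in anchors.items()) / 3


def design(fitness, n_ants=20, n_iterations=200, evaporation=0.05, seed=0):
    rng = random.Random(seed)
    # Pheromone matrix: one row per sequence position, one column per residue.
    pheromone = [[1.0] * len(AMINO_ACIDS) for _ in range(LENGTH)]
    best_peptide, best_score = None, -1.0
    for _ in range(n_iterations):
        # Step 1, sequence design: each ant samples one residue per position
        # with probability proportional to the pheromone concentration.
        paths = [
            "".join(rng.choices(AMINO_ACIDS, weights=row)[0] for row in pheromone)
            for _ in range(n_ants)
        ]
        # Step 2, path evaluation by the fitness function.
        score, peptide = max((fitness(p), p) for p in paths)
        # Step 3, pheromone update: evaporate everywhere (assumed), then
        # reinforce only the best path of this generation.
        for pos, row in enumerate(pheromone):
            for k in range(len(row)):
                row[k] *= 1.0 - evaporation
            row[AMINO_ACIDS.index(peptide[pos])] += score
        if score > best_score:
            best_peptide, best_score = peptide, score
    return best_peptide, best_score


best, score = design(toy_fitness)
print(best, score)
```

Each ant needs only the pheromone row of the position it is currently filling, which reflects the point made above that no agent requires knowledge of the complete sequence space.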
Fully connected feedforward networks with a single hidden layer and one output neuron (all neurons with sigmoidal activation) were implemented using Matlab (R2007a, The MathWorks, Inc.; Neural Network Toolbox version 5.0.2). The outputs of five ANNs were combined as input for a jury network. The output of the jury served as the fitness value (or “score”) for the ACO algorithm and takes values in the open interval ]0,1[. Details on the network architecture were described previously.
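The jury arrangement can be illustrated by a forward pass in plain Python. This is a sketch under stated assumptions, not the Matlab implementation: the weights are random and untrained, the hidden layer sizes are arbitrary, and the helper names (`make_network`, `forward`, `jury_score`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_network(n_inputs, n_hidden, rng):
    """Random (untrained) weights; the last weight of each neuron is its bias."""
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, inputs):
    """Single hidden layer, one output neuron, sigmoidal activation throughout."""
    hidden, output = network
    h = [sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1]) for ws in hidden]
    return sigmoid(sum(w * a for w, a in zip(output[:-1], h)) + output[-1])


def jury_score(members, jury, encodings):
    """Feed each member ANN its own encoding, then pass the outputs to the jury."""
    member_outputs = [forward(net, x) for net, x in zip(members, encodings)]
    return forward(jury, member_outputs)


rng = random.Random(0)
# Five hypothetical encodings of one octapeptide, one per descriptor set.
encodings = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
members = [make_network(8, 3, rng) for _ in range(5)]
jury = make_network(5, 3, rng)
print(jury_score(members, jury, encodings))
```

Because the output neuron is sigmoidal, the score always lies strictly between 0 and 1, which matches the interval stated above.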
The stabilization assay was performed as described by Brock et al. using TAP-deficient RMA-S cells (mutagenized Rauscher virus-induced T lymphoma cells of mouse origin). The cells were cultured in DMEM (Gibco-BRL, Karlsruhe, Germany) with 10% FCS (Sigma-Aldrich, Steinheim, Germany) at 37°C with 8% CO2. For accumulation of peptide-free MHC I proteins at the cell surface, the cells were cultured for 16 hours at 26°C. The cells were incubated with the peptides in 10 serial dilutions of 100 to 5.6 × 10^−4 μg/mL at room temperature for 1 hour, followed by 1-hour incubation at 37°C for denaturation of peptide-free MHC I proteins. The stabilized MHC I proteins were visualized and quantified by flow cytometry using the H-2Kb specific monoclonal antibody B8.24.3, purified in the laboratory from hybridoma culture supernatant by protein G affinity chromatography (Pierce, Darmstadt, Germany), and an R-Phycoerythrin-conjugated anti-mouse antibody (Dianova GmbH, Hamburg, Germany) as secondary reagent. The stabilizing effect of the peptides was determined as mean fluorescence intensity (MFI). The EC50 value is the peptide concentration that is required for half-maximal stabilization of the MHC I molecules at the cell surface (half-maximal MFI). All peptides were custom-synthesized by EMC microcollections GmbH (Tübingen, Germany).
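One way to read an EC50 from such a dilution series is to interpolate between the two concentrations that bracket the half-maximal MFI. The sketch below assumes background-subtracted MFI values and log-linear interpolation; the text does not state how the EC50 values were actually fitted, and the function name and example numbers are hypothetical.

```python
import math


def ec50_from_titration(concentrations, mfi):
    """Estimate the concentration at half-maximal MFI by log-linear interpolation.

    Assumes background-subtracted MFI values. Returns None when there is no
    signal or the curve never crosses the half-maximum.
    """
    pairs = sorted(zip(concentrations, mfi))
    top = max(m for _, m in pairs)
    if top <= 0:
        return None
    half = top / 2
    for (c_lo, m_lo), (c_hi, m_hi) in zip(pairs, pairs[1:]):
        if m_lo < half <= m_hi:
            frac = (half - m_lo) / (m_hi - m_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical titration: concentrations in ug/mL, MFI in arbitrary units.
conc = [0.01, 0.1, 1.0, 10.0, 100.0]
signal = [0.0, 10.0, 50.0, 90.0, 100.0]
print(ec50_from_titration(conc, signal))
```

The result is in the concentration unit of the input; converting μg/mL to μM additionally requires the molecular weight of each peptide, which is not handled here.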
The S-score was calculated using the public web server at URL: http://www.syfpeithi.de/ (version July 2009). The S-score  indicates how well a peptide sequence matches the canonical motif.
For calculation of the Immune Epitope Database- (IEDB-)ANN score the public web server at URL: http://tools.immuneepitope.org/analyze/html/mhc_binding.html (version 2009-09-01) was used. IEDB offers several prediction tools for peptide binding to MHC I molecules (artificial neural networks (ANNs), average relative binding (ARB), stabilized matrix method (SMM), SMM with a peptide: MHC binding energy covariance matrix (SMMPMBEC), scoring matrices derived from combinatorial peptide libraries (comblib_sidney2008), consensus). The IEDB-ANN method was chosen because it has been determined to be qualitatively best performing. The IEDB-ANN scores are predicted IC50 values.
For visualization of the peptide distribution in a high-dimensional descriptor space we used planar SOMs  as implemented in the molmap software package [28, 29]. The trained SOM performs a nonlinear mapping from the original descriptor space onto a two-dimensional map. Each data point is assigned to one of a defined number of receptive fields (neurons) of the SOM. SOM training was performed as described previously .
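The mapping performed by a planar SOM can be sketched in plain Python. This is a generic textbook variant and not the molmap implementation: the learning-rate and neighborhood schedules, the random initialization, and the function names are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=20, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector (receptive field center) per neuron on a rows x cols grid.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            progress = step / n_steps
            rate = 0.5 * (1.0 - progress)  # assumed linear decay
            radius = max(rows, cols) / 2 * (1.0 - progress) + 0.5
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += rate * influence * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical two-dimensional descriptor vectors.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data, rows=3, cols=3)
for x in data:
    print(x, "->", best_matching_unit(som, x))
```

After training, each data point is assigned to the neuron returned by `best_matching_unit`, which corresponds to the receptive field assignment described above.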
We report the design and examination of 180 octapeptides (Table 1; Suppl. 2) in a cellular MHC I stabilization assay. The ability of an octapeptide to stabilize MHC I was specified as an EC50 value, which is defined as the peptide concentration required for half-maximal stabilization of the MHC I proteins at the cell surface by the test peptide. Nine of the in silico designed octapeptides exhibited a stabilizing effect (Table 1, Seq. 1–9), and 171 were nonstabilizing (Table 1, Seq. 10–50; Suppl. 2, Seq. 51–180). Six of the nine stabilizing octapeptides had EC50 values below 10 μM, that is, strong MHC I stabilization (Table 1, Seq. 4–9). Three octapeptides (Table 1, Seq. 1–3) can be regarded as medium stabilizers (EC50 ≥ 20 μM), two of which completely matched the canonical motif (Table 1, Seq. 1 and 2) with EC50 values of 24 μM and 20 μM. Six of the nine octapeptides that correspond to the canonical motif in only two of the three anchor positions yielded EC50 values below 10 μM. Peptide 3 had an EC50 of 25 μM. Peptide 8 (WKFIFDPV), which conforms to the SYFPEITHI motif in two positions (F at position five and V at position eight), was the most potent peptide with an EC50 of 0.4 μM. Peptide 9 (FHHAHRTV) obeys the canonical motif in just one anchor position but was still among the best stabilizers with an EC50 value of 9 μM.
The SYFPEITHI score (S-score) is used as a computed index for prediction of stabilizing abilities of peptides for specific MHC molecules. A high value indicates strong stabilizing effects. The S-score for the positive control in our experiments (SIINFEKL from ovalbumin) is 25. The S-score of a known nonstabilizing octapeptide (LSPFPFDL, an endogenous MHC I H2-Ld epitope) is 13. The computed S-scores for the nine stabilizing octapeptides were between 8 and 27 (mean = 20 ± 6), reflecting their stabilizing effect (outlier: octapeptide 9 with an S-score of 8). Peptides 4, 5, and 7, while exhibiting EC50 values similar to sequence 9 (EC50 = 9 μM), have S-scores more than twice as high (S-score = 22, S-score = 20, S-score = 17 (Table 1, Seq. 4, 5, 7)). A possible explanation for the deviation between the SYFPEITHI score and the actual binding behavior could be the anchor position assignment. Sequences 4, 5, and 7 completely fulfill the canonical motif while sequence 9 fulfills it in only one position. Thus the degree of correspondence to the canonical motif is well represented by the S-score but does not necessarily reflect the actual binding behavior. This suggests that alternative sequence motifs might confer strong stabilization effects or that the binding motif concept needs to be extended.
We then compared our experimental results to predictions of the Immune Epitope Database (IEDB) . The database offers several prediction methods of which, according to Peters et al. , IEDB-ANN  is the best performing. For the nine binding peptides found by us, the Pearson correlation  between the IC50 values predicted by IEDB-ANN for mouse H2-Kb and our measured EC50 values is −0.34, which indicates moderate negative correlation. Using the activity cutoff of IC50 < 500nM for “medium activity” , the IEDB-ANN method correctly predicts four of nine sequences as binding peptides (Table 1, Seq. 1–3, 6).
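For reference, the Pearson correlation coefficient used in this comparison can be computed as follows. The example values are hypothetical placeholders and not the data of Table 1; the function name `pearson` is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted IC50 values (nM) and measured EC50 values (uM).
predicted_ic50 = [120.0, 300.0, 450.0, 2000.0, 5000.0]
measured_ec50 = [24.0, 20.0, 25.0, 9.0, 0.4]
print(pearson(predicted_ic50, measured_ec50))
```

With only nine data points, as in the comparison above, a coefficient of this kind is sensitive to individual outliers and should be interpreted with caution.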
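The remaining 171 octapeptides showed no detectable stabilizing effect at a maximal experimental peptide concentration of 100 μg/mL and were therefore defined as nonstabilizing (Table 1, Seq. 10–50; cf. Suppl. 2, Seq. 51–180). The nonstabilizing octapeptides can be grouped into four categories according to the degree of fulfillment of the canonical SYFPEITHI motif: (i) canonical residues in all three anchor positions (12 sequences), (ii) canonical residues in two anchor positions (16 sequences), (iii) a canonical residue in one anchor position (23 sequences), and (iv) no canonical residue in any anchor position (120 sequences).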
For octapeptides of category (i) high S-scores were computed in the range between 22 and 28 (mean = 25 ± 2) suggesting a stabilizing ability of the octapeptides. In comparison to the S-scores of the nine stabilizing octapeptides, category (i) sequences had higher S-scores thus erroneously predicting an even stronger MHC I stabilizing effect. Category (ii) peptides obtained a mean S-score of 19 ± 2 still indicating possible MHC I stabilization. Notably, none of these octapeptides had a stabilizing effect in our experiments. The S-scores of category (iii) peptides (mean = 11 ± 2) are in agreement with the lack of a stabilizing effect. For category (iv) peptides the computed S-scores (mean = 1 ± 1) perfectly agreed with the experimental results obtained for these 120 sequences.
The IEDB-ANN method  predicts four sequences as “binding”, which were determined as “nonbinding” in our experiments (Table 1, category (ii), Seq. 10, 12, 13, 19). The remaining 37 negative sequences are correctly predicted as “nonbinding” (Table 1, categories (ii)–(iv)). Compared to the S-score index, the IEDB-ANN method is better suited for identifying nonbinding sequences that contain only a partial canonical motif (categories (ii) and (iii)). Despite these differences, both software tools (S-score and IEDB-ANN method) can be recommended for identification of negative (inactive) sequences lacking the canonical motif (categories (iii)–(iv)). Based on these limited data, quantitative IC50 predictions of binding/nonbinding peptides by this software seem to be of limited accuracy but qualitative prediction is acceptable.
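The counts reported for the 50 sequences of Table 1 can be summarized with standard classification measures. The sketch below takes the counts from the text (four of nine stabilizing peptides predicted as binding, four of 41 nonstabilizing peptides predicted as binding, 37 predicted as nonbinding); the choice of measures is ours and is not part of the original analysis.

```python
import math

# Counts for IEDB-ANN on the Table 1 sequences, as stated in the text.
tp = 4   # stabilizing peptides predicted as binding
fn = 5   # stabilizing peptides predicted as nonbinding
fp = 4   # nonstabilizing peptides predicted as binding
tn = 37  # nonstabilizing peptides predicted as nonbinding

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + fp + tn)
# Matthews correlation coefficient, a balanced measure for unequal class sizes.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"sensitivity = {sensitivity:.2f}")
print(f"specificity = {specificity:.2f}")
print(f"accuracy    = {accuracy:.2f}")
print(f"MCC         = {mcc:.2f}")
```

The high specificity combined with a lower sensitivity is consistent with the observation that the method is better at recognizing nonbinding sequences than at recovering the stabilizing ones.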
The experimental results for the 180 designed octapeptides allowed us to reassess the canonical motif. We found 28 inactive octapeptides that conform to the motif in all three (category (i)) or two residue positions (category (ii)). This corroborates the results of Zhong et al., who reported one nonstabilizing octapeptide with the canonical motif, and Hiss et al., who reported four nonstabilizing octapeptides corresponding to the motif. Our data suggest that the canonical motif alone is insufficient for predicting MHC I stabilization. Octapeptides stabilize the MHC I molecules by binding into the peptide binding groove, which is framed by two alpha helices on top of an eight-stranded beta sheet (Figure 1(a)). Amino acids at sequence positions three and five, preferably tyrosine or phenylalanine, can form aromatic interactions with MHC I residues facing the binding groove (Figure 1(b)), which could explain why canonical occupancy often corresponds to stabilizing peptides. In addition, the aliphatic residue at position eight interacts with aliphatic amino acids in a deep pocket of the MHC I peptide binding groove (Figure 1(b)). Octapeptides that conform completely to the canonical motif but show no stabilizing effect indicate that other amino acids besides the three anchor residues are important for the stabilizing effect. Amino acids at nonmotif positions could interfere with the favorable effects of the three anchor residues and lead to a nonstabilizing peptide.
To visualize the distributions of stabilizing and nonstabilizing octapeptides, we trained a SOM [28, 29] to obtain a two-dimensional map of the peptide distribution. The SOM represents the peptides based on their physicochemical properties coded by a multidimensional vector. Regions adjacent to a given peptide on the SOM contain peptides with similar physicochemical properties. The SOM was trained with a total of 100,603 octapeptides. We randomly generated 100,000 octamer sequences according to the amino acid frequency found in known mouse proteins to mimic murine sequence space. For the remaining 603 octapeptides the H-2Kb binding affinities were known from experimental data (training data set: 423 octapeptides with 242 stabilizing and 181 nonstabilizing peptides; our own experimental results: 180 octapeptides with 9 stabilizing and 171 nonstabilizing peptides). The 251 octapeptides with H-2Kb binding affinity are highlighted on the trained SOM in Figure 2(a). It is noteworthy that 241 of the 251 stabilizing peptides form a “stabilizing cluster” on the map (neurons 9–11/0, 9-10/1, 9-10/2), which indicates that these peptides are more similar to each other than to the randomly generated octapeptides. The highest density of stabilizing peptides is located in neuron (9/1), which contains 180 sequences. The outlier neuron (1/15) contains octapeptide 9 (FHHAHRTV), a stabilizing octapeptide with a canonical residue in only one anchor position.
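Generating background octamers according to an amino acid frequency table can be sketched as follows. The uniform table is a placeholder: the frequencies observed in mouse proteins are not given in the text and would have to be supplied, and the function name `random_octamers` is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_octamers(n, frequencies, seed=0):
    """Draw n octamers with residues sampled independently from `frequencies`."""
    rng = random.Random(seed)
    weights = [frequencies[aa] for aa in AMINO_ACIDS]
    return [
        "".join(rng.choices(AMINO_ACIDS, weights=weights, k=8))
        for _ in range(n)
    ]


# Placeholder frequency table (uniform); replace with proteome-derived values.
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
background = random_octamers(1000, uniform)
print(background[:3])

# Composition of the SOM training set as stated in the text.
print(100_000 + 423 + 180)
```

Sampling each position independently reproduces the overall residue composition but ignores any correlation between neighboring residues in real proteins.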
The distribution of stabilizing octapeptides fulfilling the canonical motif in all three anchor positions (80 sequences) is presented in Figure 2(b). All 80 octapeptides are located in the “stabilizing cluster”. Of the 152 sequences complying with the canonical motif in only two anchor positions, 145 are located in this “stabilizing cluster” (Figure 2(c)). The remaining seven of the 152 sequences are located in neurons framing the “stabilizing cluster”. The canonical motif is thus overrepresented in the “stabilizing cluster”. Although the known active octapeptide sequences constitute an island on the SOM, implying similar physicochemical properties, our experimental results suggest that the canonical motif represents only a fraction, albeit possibly the dominant one, of the MHC I stabilizing sequences (Table 1, Seq. 10–21).
Figures 3(a)–3(c) show the distribution of neurons containing only stabilizing (green), only nonstabilizing (red), or both stabilizing and nonstabilizing octapeptides (blue). The locations of all 251 stabilizing octapeptides are shown; Figure 3(a) additionally includes category (i) nonstabilizing octapeptides, Figure 3(b) category (ii), and Figure 3(c) category (iii) peptides. The majority (54%) of the nonstabilizing octapeptides of category (i) (24 sequences) are located in neurons surrounding the “stabilizing cluster”, implying similarity in terms of the peptide representation by physicochemical properties (Figure 3(a)). Notably, the three neurons (9/1), (10/2), and (11/0) also contain nonstabilizing peptides (blue-colored neurons). Four motif-conform nonstabilizing octapeptides (WRYNYDPL, FRYEYRSL, HRYVYRNI, YRYKYDRL) are located in neuron (9/1), which contains the highest number of stabilizing octapeptides (180 sequences). The remaining (46%) nonstabilizing octapeptides of category (i) populate the lower left quadrant of the SOM. As illustrated in Figure 3(b), this area becomes more densely occupied when the 93 octapeptides of category (ii) are included: 75% of these peptides are located in this area of the SOM. Only two sequences of category (ii) can be found in the “stabilizing cluster”: FRYVWRTL and TTEWYTKI (neurons (9/0) and (9/1), Figure 3(b)). Category (iii) octapeptides appear scattered in sequence space (Figure 3(c)). Only one nonstabilizing octapeptide of this category can be found in the “stabilizing cluster” (neuron (10/0), Figure 3(c)).
In summary, we have identified nine stabilizing octapeptides, two of which conform to the canonical motif in all three anchor positions, six fulfill two anchor requirements, and one sequence complies with the canonical motif in just one position. The majority of the designed and tested octapeptides (171 sequences) had no MHC I stabilizing effect. Twelve of the nonstabilizing octapeptides completely conform to the canonical motif, and sixteen fulfill the motif at two anchor positions. Twenty-three octapeptides comply with the canonical motif at one residue position and exhibit no stabilizing effect. The remaining 120 octapeptides share no residue position with this motif. Since the experimental results reported here were not included in the SOM training, the resulting map provides a physicochemically defined distribution of stabilizing and nonstabilizing octapeptides in sequence space. Apparently, the stabilizing octapeptides constitute an island in octapeptide sequence space. Still, nonstabilizing octapeptides are colocated in this area. These nonstabilizing samples fulfill three or two residue positions of the canonical motif. The SOM clusters stabilizing peptides in a section of sequence space with similar physicochemical properties. Sequence 9, which is not clustered together with the other stabilizing peptides, could be a hint towards additional stabilizing clusters. Furthermore, neurons adjacent to the “stabilizing cluster” also contain MHC I stabilizing sequences, which indicates that the epitope motif concept needs to be extended in order to cover and predict alternative stabilizing peptides.
Our study confirms and extends the epitope motif concept for MHC-binding peptides proposed by Rammensee and coworkers. We found octapeptides that lack key anchor residues but still exhibit a pronounced MHC I stabilization ability comparable to that of peptides that fully conform to the canonical sequence motif. We also present a number of motif-conform but nonstabilizing peptides. These two findings demonstrate that the canonical sequence motif alone is not a sufficient criterion for identifying MHC I stabilizing peptides.
Supplementary 1: a descriptor list with specifications of the descriptors used for encoding each residue of an octapeptide.
Supplementary 2: experimental results of 180 tested octapeptides including the experimental EC50 value, the computed SYFPEITHI score, and the IEDB-ANN score of each octapeptide.
The authors are grateful to Norbert Dichter for technical assistance. This study was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften and the Hermann-Willkomm-Stiftung, Frankfurt, Germany. J. M. Wisniewska and N. Jäger contributed equally to this study. G. Schneider and J. A. Hiss share senior authorship.