Identifying protein surface regions preferentially recognizable by antibodies (antigenic epitopes) is at the heart of new immuno-diagnostic reagent discovery and vaccine design, and computational methods for antigenic epitope prediction provide crucial means to serve this purpose. Many linear B-cell epitope prediction methods have been developed towards this goal, such as BepiPred, ABCPred, AAP, BCPred, BayesB, BEOracle/BROracle, and BEST. However, effective immunological research demands more robust performance than the current algorithms can provide. In this work, we develop a new method to predict linear antigenic epitopes, SVMTriP, in which a Support Vector Machine (SVM) combines tri-peptide similarity and propensity scores. Applied to non-redundant B-cell linear epitopes extracted from IEDB, SVMTriP achieves a sensitivity of 80.1% and a precision of 55.2% in five-fold cross-validation, with an AUC value of 0.702. The combination of similarity and propensity of tri-peptide subsequences can improve the prediction performance for linear B-cell epitopes. Moreover, SVMTriP is capable of recognizing viral peptides from a human protein sequence background. A web server based on our method has been constructed for public use. The server and all datasets used in the current study are available at http://sysbio.unl.edu/SVMTriP.
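For readers less familiar with the two reported metrics, the following minimal sketch shows how sensitivity and precision are conventionally computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not taken from the SVMTriP study.

```python
from typing import Tuple


def sensitivity_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, precision) from confusion-matrix counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one actual and one predicted positive")
    sensitivity = tp / (tp + fn)  # fraction of true epitopes that were recovered
    precision = tp / (tp + fp)    # fraction of predicted epitopes that are true
    return sensitivity, precision


# Hypothetical counts chosen only for illustration.
sens, prec = sensitivity_precision(tp=80, fp=65, fn=20)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

A predictor with high sensitivity but moderate precision, as in this toy example, recovers most true epitopes at the cost of a larger share of false positives.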
Accurate identification of immunogenic regions in a given antigen chain is a difficult and actively pursued problem. Although accurate predictors for T-cell epitopes are already in place, the prediction of the B-cell epitopes requires further research. We overview the available approaches for the prediction of B-cell epitopes and propose a novel and accurate sequence-based solution. Our BEST (B-cell Epitope prediction using Support vector machine Tool) method predicts epitopes from antigen sequences, in contrast to some methods that predict only from short sequence fragments, using a new architecture based on averaging selected scores generated from sliding 20-mers by a Support Vector Machine (SVM). The SVM predictor utilizes a comprehensive and custom designed set of inputs generated by combining information derived from the chain, sequence conservation, similarity to known (training) epitopes, and predicted secondary structure and relative solvent accessibility. Empirical evaluation on benchmark datasets demonstrates that BEST outperforms several modern sequence-based B-cell epitope predictors including ABCPred, the method by Chen et al. (2007), BCPred, COBEpro, BayesB, and CBTOPE, when considering the predictions from antigen chains and from the chain fragments. Our method obtains a cross-validated area under the receiver operating characteristic curve (AUC) for the fragment-based prediction of 0.81 and 0.85, depending on the dataset. The AUCs of BEST on the benchmark sets of full antigen chains equal 0.57 and 0.6, which are significantly and slightly better, respectively, than those of the next best method we tested. We also present case studies to contrast the propensity profiles generated by BEST and several other methods.
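The idea of deriving per-residue scores from sliding 20-mers can be sketched as follows. This is a simplified illustration only: it averages the scores of all windows covering a residue, whereas BEST averages selected scores, and the scoring function `toy_score` is a hypothetical stand-in for the trained SVM.

```python
from typing import Callable, List


def residue_scores(sequence: str,
                   score_window: Callable[[str], float],
                   width: int = 20) -> List[float]:
    """Average, for each residue, the scores of every window that covers it."""
    n = len(sequence)
    if n < width:
        raise ValueError("sequence is shorter than the window width")
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - width + 1):
        score = score_window(sequence[start:start + width])
        for i in range(start, start + width):
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def toy_score(window: str) -> float:
    """Hypothetical scorer: fraction of lysines in the window (not an SVM)."""
    return window.count("K") / len(window)


print(residue_scores("AAAKAAA", toy_score, width=3))
```

Residues near the chain termini are covered by fewer windows, so their averages rest on less evidence than those in the middle of the chain.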
Functional T-cell epitope discovery is a key process for the development of novel immunotherapies, particularly for cancer immunology. In silico epitope prediction is a common strategy to try to achieve this objective. However, this approach suffers from a significant rate of false-negative results and epitope ranking lists that often are not validated by practical experience. A high-throughput platform for the identification and prioritization of potential T-cell epitopes is the iTopia™ Epitope Discovery System™, which allows measuring binding and stability of selected peptides to MHC Class I molecules. So far, the value of iTopia combined with in silico epitope prediction has not been investigated systematically. In this study, we have developed a novel in silico selection strategy based on three criteria: (1) predicted binding to one out of five common MHC Class I alleles; (2) uniqueness to the antigen of interest; and (3) increased likelihood of natural processing. We predicted in silico and characterized by iTopia 225 candidate T-cell epitopes and fixed-anchor analogs from three human tumor-associated antigens: CEA, HER2 and TERT. HLA-A2-restricted fragments were further screened for their ability to induce cell-mediated responses in HLA-A2 transgenic mice. The iTopia binding assay was only marginally informative while the stability assay proved to be a valuable experimental screening method complementary to in silico prediction. Thirteen novel T-cell epitopes and analogs were characterized and additional potential epitopes identified, providing the basis for novel anticancer immunotherapies. In conclusion, we show that combination of in silico prediction and an iTopia-based assay may be an accurate and efficient method for MHC Class I epitope discovery among tumor-associated antigens.
cancer vaccine; CEA; epitope prediction; HER2/neu; TERT
Bioinformatics tools have the potential to accelerate research into the design of vaccines and diagnostic tests by exploiting genome sequences. The aim of this study was to assess whether in silico analysis could be combined with in vitro screening methods to rapidly identify peptides that are immunogenic during Mycobacterium bovis infection of cattle. In the first instance the M. bovis-derived protein ESAT-6 was used as a model antigen to describe peptides containing T-cell epitopes that were frequently recognized across mammalian species, including natural hosts for tuberculosis (humans and cattle) and small-animal models of tuberculosis (mice and guinea pigs). Having demonstrated that some peptides could be recognized by T cells from a number of M. bovis-infected hosts, we tested whether a virtual-matrix-based human prediction program (ProPred) could identify peptides that were recognized by T cells from M. bovis-infected cattle. In this study, 73% of the experimentally defined peptides from 10 M. bovis antigens that were recognized by bovine T cells contained motifs predicted by ProPred. Finally, in validating this observation, we showed that three of five peptides from the mycobacterial antigen Rv3019c that were predicted to contain HLA-DR-restricted epitopes were recognized by T cells from M. bovis-infected cattle. The results obtained in this study support the approach of using bioinformatics to increase the efficiency of epitope screening and selection.
Accurate prediction of antigenic epitopes is important for immunologic research and medical applications, but it is still an open problem in bioinformatics. The case for discontinuous epitopes is even worse - currently there are only a few discontinuous epitope prediction servers available, though discontinuous peptides constitute the majority of all B-cell antigenic epitopes. The small number of structures for antigen-antibody complexes limits the development of reliable discontinuous epitope prediction methods and an unbiased benchmark to evaluate developed methods.
In this work, we present two novel server applications for discontinuous epitope prediction: EPSVR and EPMeta, where EPMeta is a meta server. EPSVR, EPMeta, and datasets are available at http://sysbio.unl.edu/services.
The server application for discontinuous epitope prediction, EPSVR, uses a Support Vector Regression (SVR) method to integrate six scoring terms. Furthermore, we combined EPSVR with five existing epitope prediction servers to construct EPMeta. All methods were benchmarked on our curated independent test set, in which all antigens had no complex structures with the antibody, and their epitopes were identified by various biochemical experiments. The area under the receiver operating characteristic curve (AUC) of EPSVR was 0.597, higher than that of any other existing single server, and EPMeta had a better performance than any single server, with an AUC of 0.638, significantly higher than PEPITO and DiscoTope (p-value < 0.05).
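Since the servers above are compared by AUC, a minimal sketch of how that quantity can be computed may be useful. It uses the standard pairwise (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen positive residue scores higher than a randomly chosen negative one. The function name and the toy data are illustrative assumptions, not the benchmark used in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = epitope residue) and predicted scores.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

An AUC of 0.5 corresponds to random ranking, which puts values such as 0.597 and 0.638 in context.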
Epitope identification assists in developing molecules for clinical applications and is useful in defining molecular features of allergens for understanding structure/function relationships. The present study aimed to identify the B cell epitopes of alcohol dehydrogenase (ADH) allergen from Curvularia lunata using in silico methods and immunoassay.
B cell epitopes of ADH were predicted by sequence and structure based methods and protein-protein interaction tools while T cell epitopes by inhibitory concentration and binding score methods. The epitopes were superimposed on a three dimensional model of ADH generated by homology modeling and analyzed for antigenic characteristics. Peptides corresponding to predicted epitopes were synthesized and immunoreactivity assessed by ELISA using individual and pooled patients' sera.
The homology model showed GroES like catalytic domain joined to Rossmann superfamily domain by an alpha helix. Stereochemical quality was confirmed by Procheck which showed 90% residues in most favorable region of Ramachandran plot while Errat gave a quality score of 92.733%. Six B cell (P1–P6) and four T cell (P7–P10) epitopes were predicted by a combination of methods. Peptide P2 (epitope P2) showed E(X)2GGP(X)3KKI conserved pattern among allergens of pathogenesis related family. It was predicted as high affinity binder based on electronegativity and low hydrophobicity. The computational methods employed were validated using Bet v 1 and Der p 2 allergens where 67% and 60% of the epitope residues were predicted correctly. Among B cell epitopes, Peptide P2 showed maximum IgE binding with individual and pooled patients' sera (mean OD 0.604±0.059 and 0.506±0.0035, respectively) followed by P1, P4 and P3 epitopes. All T cell epitopes showed lower IgE binding.
Four B cell epitopes of C. lunata ADH were identified. Peptide P2 can serve as a potential candidate for diagnosis of allergic diseases.
Accurate prediction of B-cell epitopes has remained a challenging task in computational immunology despite several decades of research. Only 10% of the known B-cell epitopes are estimated to be continuous, yet they are often the targets of predictors because a solved tertiary structure is not required and they are integral to the development of peptide vaccines and engineering therapeutic proteins. In this article, we present COBEpro, a novel two-step system for predicting continuous B-cell epitopes. COBEpro is capable of assigning epitopic propensity scores to both standalone peptide fragments and residues within an antigen sequence. COBEpro first uses a support vector machine to make predictions on short peptide fragments within the query antigen sequence and then calculates an epitopic propensity score for each residue based on the fragment predictions. Secondary structure and solvent accessibility information (either predicted or exact) can be incorporated to improve performance. COBEpro achieved a cross-validated area under the curve (AUC) of the receiver operating characteristic up to 0.829 on the fragment epitopic propensity scoring task and an AUC up to 0.628 on the residue epitopic propensity scoring task. COBEpro is incorporated into the SCRATCH prediction suite at http://scratch.proteomics.ics.uci.edu.
B-cell; continuous; epitope; prediction; SVM
Although the genome sequences of both human and N. gonorrhoeae are in hand, a vaccine for gonorrhea is not yet available. Owing to the availability of several host
and pathogen genomes and numerous tools for in silico prediction of effective B-cell and T-cell epitopes, the recent trend in vaccine design has
shifted to peptide- or epitope-based vaccines that are more specific, safe, and easy to produce. In order to design and develop such a peptide vaccine
against the pathogen, we adopted a novel computational approach based on sequence, structure, QSAR, and simulation methods along with fold-level
analysis to predict potential antigenic B-cell epitope derived T-cell epitopes from four vaccine targets of N. gonorrhoeae previously identified by us [Barh
and Kumar (2009) In Silico Biology 9, 1-7]. Four epitopes, one from each protein, have been designed in such a way that each epitope is highly likely to
bind the maximum number of HLA molecules (comprising both MHC-I and MHC-II) and interacts with the most frequent HLA alleles (A*0201, A*0204,
B*2705, DRB1*0101, and DRB1*0401) in the human population. Our selected epitopes therefore have high potential to induce both B-cell and T-cell
mediated immune responses. Of course, these selected epitopes require further experimental validation.
gonorrhea; vaccine designing; epitope mapping; antigenicity; HLA alleles; immune response
Hydroxylation is an important post-translational modification and closely related to various diseases. Besides the biotechnology experiments, in silico prediction methods are alternative ways to identify the potential hydroxylation sites.
In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites – hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and structural disorder of amino acids in the sliding window with a length of 13 amino acids. The prediction models were then built using the incremental feature selection method. As a result, the prediction accuracies are 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline dataset and hydroxylysine dataset, respectively. Feature analysis suggested that the physicochemical and biochemical properties and the evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acids adjacent to the hydroxylation site tend to exert more influence than other sites on hydroxylation determination.
These findings may provide useful insights for exploiting the mechanisms of hydroxylation.
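The sliding-window representation mentioned above can be illustrated with a short sketch that extracts a 13-residue window around every proline or lysine in a sequence. Centering the window on the candidate site and padding chain ends with the placeholder residue "X" are assumptions made for this illustration; the abstract does not specify either detail.

```python
from typing import List, Tuple


def site_windows(sequence: str,
                 targets: str = "PK",
                 width: int = 13,
                 pad: str = "X") -> List[Tuple[int, str]]:
    """Return (0-based position, window) for each candidate residue.

    Each window has `width` residues with the candidate in the center;
    positions beyond the chain ends are filled with `pad` (an assumption).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the site can be centered")
    half = width // 2
    padded = pad * half + sequence + pad * half
    # Residue i of `sequence` sits at index i + half of `padded`,
    # so padded[i:i + width] is centered on it.
    return [(i, padded[i:i + width])
            for i, aa in enumerate(sequence) if aa in targets]


for pos, window in site_windows("GPAGKDGEAGAQGPPGPAG"):
    print(pos, window)
```

Each window would then be encoded with the per-residue features (AAindex, PSSM, disorder) before feature selection.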
The current challenge in synthetic vaccine design is the development of a methodology to identify and test short antigen
peptides as potential T-cell epitopes. Recently, we described a HLA-peptide binding model (using structural properties)
capable of predicting peptides binding to any HLA allele. Consequently, we have developed a web server named T-EPITOPE
DESIGNER to facilitate HLA-peptide binding prediction. The prediction server is based on a model that defines peptide
binding pockets using information gleaned from X-ray crystal structures of HLA-peptide complexes, followed by the estimation
of peptide binding to binding pockets. Thus, the prediction server enables the calculation of peptide binding to HLA alleles.
This model is superior to many existing methods because of its potential application to any given HLA allele whose sequence
is clearly defined. The web server finds potential application in T cell epitope vaccine design.
HLA; peptide; binding; prediction; immunity; T-cell; epitope; design; vaccine
We describe PredUs, an interactive web server for the prediction of protein–protein interfaces. Potential interfacial residues for a query protein are identified by ‘mapping’ contacts from known interfaces of the query protein’s structural neighbors to surface residues of the query. We calculate a score for each residue to be interfacial with a support vector machine. Results can be visualized in a molecular viewer and a number of interactive features allow users to tailor a prediction to a particular hypothesis. The PredUs server is available at: http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PredUs.
Echinococcosis, also known as hydatid disease, is a type of zoonotic parasitic disease caused by the Echinococcus larvae infection. The disease is severely harmful to both humans and animals. Research and development of an epitope vaccine is crucial. To determine the dominant epitopes of the Eg95 antigen, the tertiary structure and the T- and B-combined epitope of the Eg95 protein for Echinococcus granulosus were predicted and analyzed in the present study. The tertiary structure of the Eg95 protein was predicted using the 3DLigandsite server and RasMol software. The T- and B-combined epitope of the Eg95 antigen was analyzed using the DNAStar (V5.0), IEDB, SYFPEITHI and BIMAS. Tertiary structure prediction results showed that there were potential epitopes in Eg95 antigen. Bioinformatics analysis revealed the T- and B-combined epitopes of Eg95 antigen. Four and six T- and B-combined epitopes induced immune responses in humans and mice. Additionally, four T- and B-combined epitopes induced immune responses in both humans and mice. The tertiary structure and T- and B-combined epitopes of the Eg95 protein were also determined. The results obtained in the present study may be beneficial in the investigation of Eg95 antigenicity and the development of dominant epitope vaccines.
Eg95; tertiary structure; T- and B-combined epitope
The binding between antigenic peptides (epitopes) and the MHC molecule is a key step in the cellular immune response. Accurate in silico prediction of epitope-MHC binding affinity can greatly expedite epitope screening by reducing costs and experimental effort.
Recently, we demonstrated the appealing performance of SVRMHC, an SVR-based quantitative modeling method for peptide-MHC interactions, when applied to three mouse class I MHC molecules. Subsequently, we have greatly extended the construction of SVRMHC models and have established such models for more than 40 class I and class II MHC molecules. Here we present the SVRMHC web server for predicting peptide-MHC binding affinities using these models. Benchmarked percentile scores are provided for all predictions. The larger number of SVRMHC models available allowed for an updated evaluation of the performance of the SVRMHC method compared to other well-known linear modeling methods.
SVRMHC is an accurate and easy-to-use prediction server for epitope-MHC binding with significant coverage of MHC molecules. We believe it will prove to be a valuable resource for T cell epitope researchers.
Protein antigens and their specific epitopes are formulation targets for epitope-based vaccines. A number of prediction servers are available for identification of peptides that bind major histocompatibility complex class I (MHC-I) molecules. The lack of standardized methodology and large number of human MHC-I molecules make the selection of appropriate prediction servers difficult. This study reports a comparative evaluation of thirty prediction servers for seven human MHC-I molecules.
Of 147 individual predictors 39 have shown excellent, 47 good, 33 marginal, and 28 poor ability to classify binders from non-binders. The classifiers for HLA-A*0201, A*0301, A*1101, B*0702, B*0801, and B*1501 have excellent, and for A*2402 moderate classification accuracy. Sixteen prediction servers predict peptide binding affinity to MHC-I molecules with high accuracy; correlation coefficients ranging from r = 0.55 (B*0801) to r = 0.87 (A*0201).
Non-linear predictors outperform matrix-based predictors. Most predictors can be improved by non-linear transformations of their raw prediction scores. The best predictors of peptide binding are also best in prediction of T-cell epitopes. We propose a new standard for MHC-I binding prediction – a common scale for normalization of prediction scores, applicable to both experimental and predicted data. The results of this study provide assistance to researchers in selection of most adequate prediction tools and selection criteria that suit the needs of their projects.
The ArchPRED server implements a novel fragment-search based method for predicting loop conformations. The inputs to the server are the atomic coordinates of the query protein and the position of the loop. The algorithm selects candidate loop fragments from a regularly updated loop library (Search Space) by matching the length and the types of bracing secondary structures of the query and by satisfying the geometrical restraints imposed by the stem residues. Subsequently, candidate loops are inserted in the query protein framework where their side chains are rebuilt and their fit is assessed by the root mean square deviation (r.m.s.d.) of stem regions and by the number of rigid body clashes with the environment. In the final step remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and fit of predicted and observed [ϕ/ψ] main chain dihedral angle propensities. The final loop conformation is built in the protein structure and annealed in the environment using conjugate gradient minimization. The prediction method was benchmarked on artificially prepared search datasets where all trivial sequence similarities on the SCOP superfamily level were removed. Under these conditions it was possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% with at least 0.22, 1.38 and 2.47 Å r.m.s.d. accuracy, respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier developed database search method in a ∼5:1 ratio.
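The r.m.s.d. used to assess loop fit can be sketched as below. This is a generic illustration rather than ArchPRED's implementation: it assumes the two coordinate sets are already superimposed and list equivalent atoms in the same order, and the function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two equally sized atom lists.

    No superposition is performed; the inputs are assumed to be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical stem atoms (in Å) for a candidate loop and the query framework.
candidate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
framework = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(candidate, framework))
```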
Class I HLA's polymorphism has hampered CTL epitope mapping with laborious experiments. Objectives are 1) to evaluate the novel in silico model in predicting previously reported epitopes in comparison with an existing program, and 2) to apply the model to predict optimal epitopes with HLA using experimental results.
Materials and Methods
We have developed a novel in silico epitope prediction method, based on HLA crystal structure and a peptide docking simulation model, calculating the peptide-HLA binding affinity at four amino acid residues in each terminal. It was applied to predict 52 HIV best–defined CTL epitopes from 15-mer overlapping peptides, and its predictive ability was compared with the HLA binding motif-based program of HLArestrictor. It was then used to predict HIV-1 Gag optimal epitopes from previous ELISpot results.
43/52 (82.7%) epitopes were detected by the novel model, whereas 37 (71.2%) by HLArestrictor. We also found a significant reduction in epitope detection rates for longer epitopes in HLArestrictor (p = 0.027), but not in the novel model. Improved epitope prediction was also found by introducing both models, especially in specificity (p<0.001). Eight peptides were predicted as novel, immunodominant epitopes in both models.
This novel model can predict optimal CTL epitopes, including some that were not detected by an existing program. This model is potentially useful not only for narrowing down optimal epitopes, but also for prediction with rare HLA alleles for which less information is available. By combining models based on different principles, epitope prediction will become more precise.
► All the structural B-cell epitopes we examined are discontinuous. ► Only 18% of structural epitopes are spanned by a peptide fragment of 40 residues. ► Centralized and random distributions were considered for key functional residues. ► Fragments with only 7 residues will successfully span most key functional residues.
Although it is widely acknowledged that most B-cell epitopes are discontinuous, the degree of discontinuity is poorly understood. For example, given an antigen with a single epitope that has been chopped into peptides of a specific length, what is the likelihood that one of the peptides will span all the residues belonging to that epitope? Or, alternatively, what is the largest proportion of the epitope's residues that any peptide is likely to contain? These and similar questions are of direct relevance both to computational methods that aim to predict the location of epitopes from sequence (linear B-cell epitope prediction methods) and window-based experimental methods that aim to locate epitopes by assessing the strength of antibody binding to synthetic peptides on a chip.
In this paper we present an analysis of the degree of B-cell epitope discontinuity, both in terms of the structural epitopes defined by a set of antigen–antibody complexes in the Protein Data Bank, and with respect to the distribution of key residues that form functional epitopes. We show that, taking a strict definition of discontinuity, all the epitopes in our data set are discontinuous. More significantly, we provide explicit guidance about the choice of peptide length when using window-based B-cell epitope prediction and mapping techniques based on a detailed analysis of the likely effectiveness of different lengths.
ASA, accessible surface area; PDB, Protein Data Bank; Antigen; Antibody; Epitope; Structural epitope; Functional epitope
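The second question posed in the abstract above (the largest proportion of an epitope's residues that any peptide contains) can be made concrete with a short sketch. It assumes that every contiguous window of the given length is a candidate peptide and that residue positions are 0-based; the function name and the example epitope are hypothetical, not data from the study.

```python
from typing import Iterable, Tuple


def best_window_coverage(epitope_positions: Iterable[int],
                         antigen_length: int,
                         window: int) -> Tuple[int, float]:
    """Return (start, fraction) for the window covering most epitope residues.

    `start` is the 0-based start of the first best window; `fraction` is the
    share of epitope residues that fall inside it.
    """
    positions = sorted(set(epitope_positions))
    if not positions:
        raise ValueError("epitope must contain at least one residue")
    best_start, best_count = 0, 0
    for start in range(max(1, antigen_length - window + 1)):
        count = sum(1 for p in positions if start <= p < start + window)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count / len(positions)


# Hypothetical discontinuous epitope on a 30-residue antigen.
epitope = [2, 3, 4, 20, 21]
print(best_window_coverage(epitope, antigen_length=30, window=7))
print(best_window_coverage(epitope, antigen_length=30, window=20))
```

In this toy case a 7-residue peptide captures only three of the five epitope residues, while a 20-residue peptide spans all of them, which is the kind of trade-off the peptide-length guidance addresses.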
Antibody Z13e1 is a relatively broadly neutralizing anti-HIV-1 antibody that recognizes the membrane proximal external region (MPER) of the HIV-1 envelope (Env) glycoprotein gp41. Based on the crystal structure of an MPER epitope peptide in complex with Z13e1 Fab, we identified an unrelated protein, IL-22, with a surface-exposed region that is structurally homologous in its backbone to the gp41 Z13e1 epitope. By grafting the gp41 Z13e1 epitope sequence onto the structurally homologous region in IL-22, we engineered a novel protein (Z13-IL22-2) that contains the MPER epitope sequence for use as a potential immunogen and as a reagent for detection of Z13e1-like antibodies. The Z13-IL22-2 protein binds Fab Z13e1 with a Kd of 73 nM. The crystal structure of Z13-IL22-2 in complex with Fab Z13e1 shows that the epitope region is faithfully replicated in the Fab-bound scaffold protein; however, isothermal calorimetry studies indicate that Fab binding to Z13-IL22-2 is not a lock-and-key event, leaving open the question of whether conformational changes upon binding occur in the Fab, in Z13-IL22-2, or in both.
HIV-1; antibody; membrane proximal external region; neutralizing antibody; x-ray crystallography
Prediction of antigenic epitopes on protein surfaces is important for vaccine design. Most existing epitope prediction methods focus on protein sequences to predict continuous epitopes linear in sequence. Only a few structure-based epitope prediction algorithms are available and they have not yet shown satisfying performance.
We present a new antigen Epitope Prediction method, which uses ConsEnsus Scoring (EPCES) from six different scoring functions - residue epitope propensity, conservation score, side-chain energy score, contact number, surface planarity score, and secondary structure composition. Applied to unbound antigen structures from an independent test set, EPCES was able to predict antigenic epitopes with 47.8% sensitivity, 69.5% specificity and an AUC value of 0.632. The performance of the method is statistically similar to other published methods. The AUC value of EPCES is higher than the best results of existing algorithms by about 0.034.
Our work shows consensus scoring of multiple features has a better performance than any single term. The successful prediction is also due to the new score of residue epitope propensity based on atomic solvent accessibility.
The aim of the present study was to predict the secondary structure and the T- and B-cell epitopes of the Echinococcus multilocularis Emy162 antigen, in order to reveal the dominant epitopes of the antigen. The secondary structure of the protein was analyzed using the Garnier-Robson method, and the improved self-optimized prediction method (SOPMA) server. The T- and B-cell epitopes of Emy162 were predicted using Immune Epitope Database (IEDB), Syfpeithi, Bcepred and ABCpred online software. The characteristics of hydrophilicity, flexibility, antigenic propensity and exposed surface area were predicted. The tertiary structure of the Emy162 protein was predicted by the 3DLigandSite server. The results demonstrated that random coils and β sheets accounted for 34.64 and 21.57% of the secondary structure of the Emy162 protein, respectively. This was indicative of the presence of potential dominant antigenic epitopes in Emy162. Following bioinformatic analysis, numerous distinct antigenic epitopes of Emy162 were identified. The high-scoring T-cell epitopes were located at positions 16–29, 36–39, 97–103, 119–125 and 128–135, whilst the likely B-cell epitopes were located at positions 8–10, 19–25, 44–50, 74–81, 87–93, 104–109 and 128–136. In conclusion, five T-cell and seven B-cell dominant epitopes of the Emy162 antigen were revealed by the bioinformatic methods, which may be of use in the development of a dominant epitope vaccine.
Emy162; secondary structure; T cell epitopes; B cell epitopes; bioinformatics
Saint Louis encephalitis virus, a member of the Flaviviridae subgroup, is a Culex mosquito-borne pathogen. Despite severe epidemic outbreaks on several occasions, not much progress has been made with regard to an epitope-based vaccine designed for Saint Louis encephalitis virus. The envelope proteins were collected from a protein database and analyzed with an in silico tool to identify the most immunogenic protein. The protein was then verified through several parameters to predict the T-cell and B-cell epitopes. Both T-cell and B-cell immunity were assessed to determine that the protein can induce humoral as well as cell-mediated immunity. The peptide sequence from 330–336 amino acids and the sequence REYCYEATL from position 57 were found to be the most probable B-cell and T-cell epitopes, respectively. Furthermore, because this is an RNA virus, it was important to establish that the epitope is conserved; this was also done with in silico tools, showing 63.51% conservancy. The epitope was further tested for binding against the HLA molecule by computational docking techniques to verify the binding cleft epitope interaction. However, this is a preliminary study of designing an epitope-based peptide vaccine against Saint Louis encephalitis virus; the results await validation by in vitro and in vivo experiments.
epitope; computational tools; humoral; cell-mediated immunity; conservancy
The application of peptide based diagnostics and therapeutics mimicking part of protein antigen is experiencing renewed interest. So far selection and design rationale for such peptides is usually driven by T-cell epitope prediction, available experimental and modelled 3D structure, B-cell epitope predictions such as hydrophilicity plots or experience. If no structure is available the rational selection of peptides for the production of functionally altering or neutralizing antibodies is practically impossible. Specifically if many alternative antigens are available the reduction of required synthesized peptides until one successful candidate is found is of central technical interest. We have investigated the integration of B-cell epitope prediction with the variability of antigen and the conservation of patterns for post-translational modification (PTM) prediction to improve over state of the art in the field. In particular the application of machine-learning methods shows promising results.
We find that protein regions leading to the production of functionally altering antibodies are often characterized by a distinct increase in the cumulative sum of three presented parameters. Furthermore the concept to maximize antigenicity, minimize variability and minimize the likelihood of post-translational modification for the identification of relevant sites leads to biologically interesting observations. Primarily, for about 50% of antigens the approach works well with individual area under the ROC curve (AROC) values of at least 0.65. On the other hand a significant portion reveals equivalently low AROC values of ≤ 0.35 indicating an overall non-Gaussian distribution. While about a third of 57 antigens are seemingly intangible by our approach our results suggest the existence of at least two distinct classes of bioinformatically detectable epitopes which should be predicted separately. As a side effect of our study we present a hand curated dataset for the validation of protectivity classification. Based on this dataset machine-learning methods further improve predictive power to a class separation in an equilibrated dataset of up to 83%.
We present a computational method to automatically select and rank peptides for the stimulation of potentially protective or otherwise functionally altering antibodies. It can be shown that integration of variability, post-translational modification pattern conservation and B-cell antigenicity improve rational selection over random guessing. Probably more important, we find that for about 50% of antigen the approach works substantially better than for the overall dataset of 57 proteins. Essentially as a side effect our method optimizes for presumably best applicable peptides as they tend to be likely unmodified and as invariable as possible which is answering needs in diagnosis and treatment of pathogen infection. In addition we show the potential for further improvement by the application of machine-learning methods, in particular Random Forests.
In this work, we develop a fully automated method for the quality assessment prediction of protein structural models generated by structure prediction approaches such as fold recognition servers, or ab initio methods. The approach is based on fragment comparisons and a consensus Cα contact potential derived from the set of models to be assessed and was tested on CASP7 server models. The average Pearson linear correlation coefficient between predicted quality and model GDT-score per target is 0.83 for the 98 targets which is better than those of other quality assessment methods that participated in CASP7. Our method also outperforms the other methods by about 3% as assessed by the total GDT-score of the selected top models.
model quality assessment prediction; TASSER; SP3
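The Pearson linear correlation coefficient used above to compare predicted quality with GDT-score can be computed as in the following minimal sketch. The function name and the example values are illustrative assumptions and do not come from the CASP7 evaluation.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted quality scores and GDT-scores for five models.
predicted = [0.30, 0.45, 0.50, 0.62, 0.80]
gdt = [0.28, 0.40, 0.55, 0.60, 0.75]
print(round(pearson(predicted, gdt), 3))
```

In a per-target evaluation, this coefficient would be computed separately for the models of each target and then averaged across targets.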
The I-TASSER server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional atomic models from multiple threading alignments and iterative structural assembly simulations. The function of the protein is then inferred by structurally matching the 3D models with other known proteins. The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of the accuracy of the predictions is provided based on the confidence score of the modeling. This protocol provides new insights and guidelines for designing on-line server systems for state-of-the-art protein structure and function prediction. The server is available at http://zhang.bioinformatics.ku.edu/I-TASSER.
I-TASSER; protein structure prediction; protein function prediction
The CEP server provides a web interface to the conformational epitope prediction algorithm developed in-house. The algorithm, apart from predicting conformational epitopes, also predicts antigenic determinants and sequential epitopes. The epitopes are predicted using 3D structure data of protein antigens, which can be visualized graphically. The algorithm employs a structure-based bioinformatics approach and uses the solvent accessibility of amino acids in an explicit manner. Accuracy of the algorithm was found to be 75% when evaluated using X-ray crystal structures of Ag–Ab complexes available in the PDB. This is the first and the only method available for the prediction of conformational epitopes, which is an attempt to map probable antibody-binding sites of protein antigens.