To study the stoichiometry of monoclonal antibody (MAb) neutralization of T-cell line-adapted human immunodeficiency virus type 1 (HIV-1) in antibody excess and under equilibrium conditions, we exploited the ability of HIV-1 to generate mixed oligomers when different env genes are coexpressed. By coexpressing Env glycoproteins that either can or cannot bind a neutralizing MAb in an env transcomplementation assay, we generated virions in which the proportion of MAb binding sites could be regulated. As the proportion of MAb binding sites in Env chimeric virus increased, MAb neutralization gradually increased. Virus neutralization by virion aggregation was minimal, as MAb binding to HIV-1 Env did not interfere with AMLV Env-mediated infection of CD4− HEK293 cells by HIV-1(AMLV/HIV-1) pseudotypes. MAb neutralization of chimeric virions could be described as a third-order function of the proportion of Env antigen refractory to MAb binding. This is consistent with the Env oligomer constituting the minimal functional unit and with neutralization occurring incrementally as each Env oligomer binds MAb. Alternatively, the data could be fit to a sigmoid function; thus, these data could not exclude the existence of a threshold for neutralization. However, results from MAb neutralization of chimeric virus containing wild-type Env and Env defective in CD4 binding were readily explained by a model of incremental MAb neutralization. In summary, the data indicate that MAb neutralization of T-cell line-adapted HIV-1 is incremental rather than all-or-none and that each MAb binding an Env oligomer reduces the likelihood of infection.
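The third-order relationship can be illustrated with a minimal sketch. It rests on assumptions that are ours, not statements from the abstract: the functional Env oligomer is a trimer, subunits mix randomly, MAb in excess occupies every subunit able to bind it, and an oligomer stays functional only if none of its subunits is bound. The fraction of functional oligomers is then f³, where f is the fraction of Env refractory to MAb binding, and under the incremental model relative infectivity scales with that fraction.

```python
def functional_oligomer_fraction(f_refractory: float, n_subunits: int = 3) -> float:
    """Fraction of Env oligomers with no MAb-bound subunit.

    Assumes random subunit mixing and saturating MAb, so every subunit that
    can bind MAb is bound; n_subunits=3 reflects an assumed trimeric Env.
    """
    if not 0.0 <= f_refractory <= 1.0:
        raise ValueError("f_refractory must lie in [0, 1]")
    return f_refractory ** n_subunits


def relative_infectivity(f_refractory: float, n_subunits: int = 3) -> float:
    """Incremental model: infectivity scales with the number of functional oligomers."""
    return functional_oligomer_fraction(f_refractory, n_subunits)


for f in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"refractory fraction {f:.2f} -> relative infectivity {relative_infectivity(f):.3f}")
```

Under these assumptions, virions in which half of the Env is refractory would retain one eighth of their infectivity.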
Identifying which mutation(s) within a given genotype are responsible for an observable phenotype is important in many areas of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype–phenotype correlation. In contrast to similar methods, SigniSite does not require any predefinition of subgroups or binary classification. The input is a set of protein sequences, each with an associated real number quantifying a given phenotype. SigniSite then identifies which amino acid residues are significantly associated with the phenotype. As output, SigniSite displays a sequence logo depicting the strength of the phenotype association of each residue, and a heat map identifying ‘hot’ or ‘cold’ regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity-determining positions (SDPs), on two data sets: human immunodeficiency virus protease-inhibitor genotype–phenotype data with corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a set of protein families with experimentally annotated SDPs. On both data sets, SigniSite outperformed SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.
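One way to score residue-level association without predefined subgroups is sketched below. Phenotype values are converted to ranks, and for each residue at each alignment position the mean rank of the sequences carrying it is compared with the mean expected by chance, giving a Z-score. This is an illustrative rank-based test; the function names are hypothetical, and the exact statistic and multiple-testing correction used by SigniSite may differ.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def residue_z_scores(sequences, phenotypes):
    """Z-score per (position, residue) of the mean phenotype rank of its carriers.

    Positive scores mean the residue is associated with high phenotype values.
    Assumes aligned, equal-length sequences and ignores the tie correction.
    """
    n = len(sequences)
    ranks = average_ranks(phenotypes)
    expected = (n + 1) / 2        # mean of ranks 1..n
    variance = (n * n - 1) / 12   # population variance of ranks 1..n
    carriers = defaultdict(list)
    for seq, r in zip(sequences, ranks):
        for pos, aa in enumerate(seq):
            carriers[(pos, aa)].append(r)
    scores = {}
    for key, rs in carriers.items():
        m = len(rs)
        if m == n:
            continue  # fully conserved residue: nothing to contrast
        # Standard error of a mean of m ranks drawn without replacement.
        se = sqrt(variance / m * (n - m) / (n - 1))
        scores[key] = (sum(rs) / m - expected) / se
    return scores


seqs = ["AK", "AK", "AR", "AR"]
resistance = [1.0, 2.0, 10.0, 20.0]  # hypothetical phenotype values
print(residue_z_scores(seqs, resistance))
```

In this toy input, R at position 1 co-occurs with the two highest phenotype values and receives a positive score, K a symmetric negative one; in practice the Z-scores would still need correction for the many positions tested.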
The interaction between antibodies and antigens is one of the most important immune system mechanisms for clearing infectious organisms from the host. Antibodies bind to antigens at sites referred to as B-cell epitopes. Identifying the exact location of B-cell epitopes is essential in several biomedical applications, such as rational vaccine design and the development of disease diagnostics and immunotherapeutics. However, experimental mapping of epitopes is resource intensive, making in silico methods an appealing complementary approach. To date, the reported performance of methods for in silico mapping of B-cell epitopes has been moderate. Several issues regarding the evaluation data sets may, however, have led to these performance values being underestimated: all potential epitopes have rarely been mapped on an antigen, and antibodies are generally raised against the antigen in a given biological context, not against the antigen monomer. Failing to account for these aspects produces many artificial false-positive predictions and hence incorrectly low performance values. To demonstrate the impact of proper benchmark definitions, we here present an updated version of the DiscoTope method incorporating a novel spatial neighborhood definition and half-sphere exposure as a surface measure. Compared to other state-of-the-art prediction methods, DiscoTope-2.0 displayed improved performance both in cross-validation and in independent evaluations. Using DiscoTope-2.0, we assessed the impact on performance of using proper benchmark definitions. For 13 proteins in the training data set for which sufficient biological information was available to make a proper benchmark redefinition, the average AUC improved from 0.791 to 0.824. Similarly, the average AUC on an independent evaluation data set improved from 0.712 to 0.727.
Our results thus demonstrate that, given proper benchmark definitions, B-cell epitope prediction methods achieve highly significant predictive performance, suggesting that these tools are a powerful asset in rational epitope discovery. The updated version of DiscoTope is available at www.cbs.dtu.dk/services/DiscoTope-2.0.
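The AUC values quoted above can be read as the probability that a randomly chosen epitope residue scores higher than a randomly chosen non-epitope residue. The sketch below computes AUC that way and, on hypothetical per-residue scores, shows how relabeling a residue that belongs to a second, separately mapped epitope (previously counted as a false positive) raises the AUC of the very same predictions. It illustrates the benchmark argument only and is not DiscoTope's evaluation code.

```python
def auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue prediction scores on one antigen.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
# Benchmark A: only the residues of one mapped epitope count as positives.
labels_single = [1, 0, 1, 0, 0, 0]
# Benchmark B: residue 1 is also part of a second, separately mapped epitope.
labels_merged = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels_single), auc(scores, labels_merged))
```

The predictions are identical in both cases; only the benchmark definition changes.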
The human immune system has an incredible ability to fight pathogens (bacteria, fungi and viruses). One of the most important immune system events involved in clearing infectious organisms is the interaction between antibodies and antigens (molecules, such as proteins, from the pathogenic organism). Antibodies bind to antigens at sites known as B-cell epitopes. Hence, identifying the areas on antigen surfaces capable of binding antibodies may aid the development of various immune-related applications (e.g. vaccines and immunotherapeutics). However, experimental identification of B-cell epitopes is a resource-intensive task, which makes computer-aided methods an appealing complementary approach. The previously reported performance of B-cell epitope prediction methods has been moderate. Here, we present an updated version of the B-cell epitope prediction method DiscoTope, which, on the basis of a protein structure and epitope propensity scores, predicts residues likely to be involved in B-cell epitopes. We demonstrate that the low performance can to some extent be explained by poorly defined benchmarks, and that the inclusion of additional biological information greatly enhances predictive performance. This suggests that, given proper benchmark definitions, state-of-the-art B-cell epitope prediction methods perform significantly better than generally assumed.
In all vertebrate animals, CD8+ cytotoxic T lymphocytes (CTLs) are controlled by major histocompatibility complex class I (MHC-I) molecules. These are highly polymorphic peptide receptors that select and present endogenously derived epitopes to circulating CTLs. The polymorphism of the MHC effectively individualizes the immune response of each member of the species. We have recently developed efficient methods to generate recombinant human MHC-I (also known as human leukocyte antigen class I, HLA-I) molecules, accompanying peptide-binding assays and predictors, and HLA tetramers for specific CTL staining and manipulation. This has enabled a complete mapping of all HLA-I specificities (“the Human MHC Project”). Here, we demonstrate that these approaches can be applied to other species. We systematically transferred domains of the frequently expressed swine MHC-I molecule, SLA-1*0401, onto an HLA-I molecule (HLA-A*11:01), thereby generating recombinant human/swine chimeric MHC-I molecules as well as the intact SLA-1*0401 molecule. Biochemical peptide-binding assays and positional scanning combinatorial peptide libraries were used to analyze the peptide-binding motifs of these molecules. A pan-specific predictor of peptide–MHC-I binding, NetMHCpan, which was originally developed to cover the binding specificities of all known HLA-I molecules, was successfully used to predict the specificities of the SLA-1*0401 molecule as well as the porcine/human chimeric MHC-I molecules. These data indicate that it is possible to extend the biochemical and bioinformatics tools of the Human MHC Project to other vertebrate species.
Recombinant MHC; Peptide specificity; Binding predictions
MULTIPRED2 is a computational system for facile prediction of peptide binding to multiple alleles belonging to human leukocyte antigen (HLA) class I and class II DR molecules. It enables prediction of peptide binding to products of individual HLA alleles, combination of alleles, or HLA supertypes. NetMHCpan and NetMHCIIpan are used as prediction engines. The 13 HLA Class I supertypes are A1, A2, A3, A24, B7, B8, B27, B44, B58, B62, C1, and C4. The 13 HLA Class II DR supertypes are DR1, DR3, DR4, DR6, DR7, DR8, DR9, DR11, DR12, DR13, DR14, DR15, and DR16. In total, MULTIPRED2 enables prediction of peptide binding to 1077 variants representing 26 HLA supertypes. MULTIPRED2 has visualization modules for mapping promiscuous T-cell epitopes as well as those regions of high target concentration – referred to as T-cell epitope hotspots. Novel graphic representations are employed to display the predicted binding peptides and immunological hotspots in an intuitive manner and also to provide a global view of results as heat maps. Another function of MULTIPRED2, which has direct relevance to vaccine design, is the calculation of population coverage. Currently it calculates population coverage in five major groups in North America. MULTIPRED2 is an important tool to complement wet-lab experimental methods for identification of T-cell epitopes. It is available at http://cvc.dfci.harvard.edu/multipred2/.
T-cell epitope hotspots; HLA; HLA supertype; Human Leukocyte Antigen; promiscuous binding peptide; vaccine design
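Population coverage is commonly estimated from allele frequencies under a Hardy–Weinberg assumption: if the alleles restricting a set of epitopes have a summed gene frequency p at one locus, the fraction of individuals carrying at least one of them is 1 − (1 − p)². The sketch below applies that formula per locus and combines loci assuming independent segregation. This is a widely used approximation rather than MULTIPRED2's documented procedure, and the frequencies in the example are hypothetical; per-group coverage would use allele frequencies from each population group.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying >= 1 of the given alleles at one locus.

    allele_freqs are gene (allele) frequencies; Hardy-Weinberg equilibrium assumed.
    """
    p = sum(allele_freqs)
    if not 0.0 <= p <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** 2


def population_coverage(freqs_by_locus):
    """Coverage across loci, assuming the loci segregate independently."""
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical frequencies for alleles restricting a candidate epitope set.
print(population_coverage({"HLA-A": [0.25], "HLA-B": [0.1]}))
```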
In this paper, we describe the methodologies behind three different aspects of the NetMHC family for prediction of MHC class I binding, mainly to HLA molecules. We have updated the prediction servers NetMHC-3.2 and NetMHCpan-2.2, which in their previous versions were evaluated to be among the very best performing MHC:peptide binding predictors available, and present a new consensus method, NetMHCcons. Here we describe the background for these methods and the rationale behind the different optimisation steps implemented in them. We go through the practical use of the methods, which are publicly available as relatively fast and simple web interfaces. Furthermore, we review results obtained in actual epitope discovery projects in which previous implementations of the described methods were used for the initial selection of potential epitopes. The selected potential epitopes were all evaluated experimentally using ex vivo assays.
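Methods in the NetMHC family are usually described as training on binding affinities rescaled to the interval [0, 1] with the transform 1 − log(IC50)/log(50000), and predicted binders are conventionally labeled with 50 nM (strong) and 500 nM (weak) cutoffs. The sketch below assumes those conventions; the helper names are hypothetical and the code does not reproduce any server's output format.

```python
import math

MAX_AFFINITY_NM = 50000.0  # assumed ceiling of the log transform


def affinity_to_score(ic50_nm):
    """Map an IC50 in nM to [0, 1]; stronger binders get higher scores."""
    clipped = min(max(ic50_nm, 1.0), MAX_AFFINITY_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_AFFINITY_NM)


def score_to_affinity(score):
    """Inverse transform from a [0, 1] score back to nM."""
    return MAX_AFFINITY_NM ** (1.0 - score)


def classify(ic50_nm, strong_nm=50.0, weak_nm=500.0):
    """Label a predicted affinity using conventional 50/500 nM cutoffs."""
    if ic50_nm <= strong_nm:
        return "strong binder"
    if ic50_nm <= weak_nm:
        return "weak binder"
    return "non-binder"


for nm in (10.0, 500.0, 20000.0):
    print(nm, round(affinity_to_score(nm), 3), classify(nm))
```

Working on a logarithmic scale keeps the many weak or non-binding measurements from dominating a squared-error training objective.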
ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a publicly available compilation of multiple chemical–protein annotation resources integrated with information on diseases and clinical outcomes. The database has been updated to >1.15 million compounds with 5.32 million bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interaction data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated, allowing proteins associated with clinical outcomes to be suggested. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included to facilitate and guide dynamic searches for specific queries.
Prediction methods as well as experimental methods for T-cell epitope discovery have developed significantly in recent years. High-throughput experimental methods have made it possible to perform full-length protein scans for epitopes restricted to a limited number of MHC alleles. The high costs of such experimental methods, and the limits on the number of proteins and MHC alleles they can feasibly handle, have made in silico prediction models of high interest. Today's MHC binding prediction methods are of very high quality and can predict MHC binding peptides with high accuracy for a large range of MHC alleles and relevant peptide lengths. The predictions can easily be performed for complete proteomes of any size. Prediction methods are, however, still dependent on good experimental methods for validation, and should be used merely as a guide for rational epitope discovery. We expect prediction methods as well as experimental validation methods to continue to develop, and that we will soon see clinical trials of products whose development has been guided by prediction methods.
CTL; epitope; HLA; MHC; prediction; T cell; vaccine
The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease-causing microbes, and in the near future these technologies will be suitable for routine use in national, regional, and global public health laboratories. With additional improvements in instrumentation, these next- or third-generation sequencers are likely to replace conventional culture-based and molecular typing methods to provide point-of-care clinical diagnosis and other essential information for quicker and better treatment of patients. Provided there is free sharing of information by all clinical and public health laboratories, these genomic tools could spawn a global system of linked databases of pathogen genomes that would ensure more efficient detection, prevention, and control of endemic, emerging, and other infectious disease outbreaks worldwide.
genome-based informatics; disease monitoring; information sharing; point-of-care clinical diagnosis; genomic tools; emerging diseases; infectious diseases; outbreaks; bacteria; viruses; parasites; pathogens
Several studies have shown that cancers actively regulate alternative splicing. Altered splicing mechanisms in cancer lead to cancer-specific transcripts, distinct from the pool of transcripts occurring only in healthy tissue. At the same time, altered presentation of HLA class I epitopes is frequently observed in various types of cancer. Down-regulation of genes related to HLA class I antigen processing has been observed in several cancer types, leading to fewer HLA class I antigens on the cell surface. Here, we use a peptidome-wide analysis of predicted alternative splice forms, based on a publicly available database, to show that peptides over-represented in cancer splice variants comprise significantly fewer predicted HLA class I epitopes than peptides from normal transcripts. For the three most common HLA class I supertype representatives, peptides over-represented in cancer transcripts were consistently found to contain fewer predicted epitopes than those from normal tissue. We observed a significant difference in amino acid composition between protein sequences associated with normal versus cancer tissue, as transcripts found in cancer are enriched in hydrophilic amino acids. This variation contributes to the significantly lower likelihood that cancer-specific peptides are predicted epitopes, compared with peptides found in normal tissue.
Identification of antimicrobial resistance genes is important for understanding the underlying mechanisms and the epidemiology of antimicrobial resistance. As the costs of whole-genome sequencing (WGS) continue to decline, WGS is becoming increasingly available in routine diagnostic laboratories and is anticipated to replace traditional methods for resistance gene identification. Thus, the current challenge is to extract the relevant information from the large amount of generated data.
We developed a web-based method, ResFinder, that uses BLAST to identify acquired antimicrobial resistance genes in whole-genome data. As input, the method can use both pre-assembled complete or partial genomes and short sequence reads from four different sequencing platforms. The method was evaluated on 1862 GenBank files containing 1411 different resistance genes, as well as on 23 de novo-sequenced isolates.
When tested on the 1862 GenBank files, the method identified the resistance genes with 100% identity to the genes in ResFinder. When the method was further tested on 23 isolates of five different bacterial species with available phenotypes, the in silico predictions agreed with the phenotypic testing. Furthermore, ResFinder was evaluated on WGS chromosomes and plasmids of 30 isolates. Seven of these isolates were annotated as having antimicrobial resistance, and in all cases the annotations were compatible with the ResFinder results.
A web server providing a convenient way of identifying acquired antimicrobial resistance genes in completely sequenced isolates was created. ResFinder can be accessed at www.genomicepidemiology.org. ResFinder will continuously be updated as new resistance genes are identified.
antibiotic resistance; genotype; ResFinder; resistance gene identification
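The abstract does not state ResFinder's matching thresholds, so the following is only a hypothetical sketch of how BLAST hits against a resistance-gene database might be filtered by percent identity and by coverage of the reference gene, keeping the best hit per gene. The Hit structure, the default thresholds and the example hits are assumptions for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str          # reference resistance gene name
    identity: float    # percent identity over the alignment
    aligned_len: int   # alignment length in nucleotides
    gene_len: int      # full length of the reference gene


def filter_hits(hits, min_identity=90.0, min_coverage=0.6):
    """Keep hits passing identity and coverage thresholds; best hit per gene."""
    best = {}
    for h in hits:
        coverage = h.aligned_len / h.gene_len
        if h.identity < min_identity or coverage < min_coverage:
            continue
        current = best.get(h.gene)
        # Prefer higher identity, then longer alignment.
        if current is None or (h.identity, h.aligned_len) > (current.identity, current.aligned_len):
            best[h.gene] = h
    return sorted(best.values())


hits = [
    Hit("blaTEM-1B", 100.0, 861, 861),
    Hit("blaTEM-1B", 99.0, 861, 861),
    Hit("tet(A)", 85.0, 1200, 1200),   # fails the identity threshold
    Hit("sul1", 100.0, 300, 840),      # fails the coverage threshold
]
print(filter_hits(hits))
```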
CD4+ T cells orchestrate immunity against viral infections, but their importance in HIV infection remains controversial. Nevertheless, comprehensive studies have associated an increase in the breadth and functional characteristics of HIV-specific CD4+ T cells with decreased viral load. A major challenge for identifying HIV-specific CD4+ T cells targeting broadly reactive epitopes in populations with diverse ethnic backgrounds stems from the vast genomic variation of HIV and the diversity of the host cellular immune system. Here, we describe a novel epitope selection strategy, PopCover, that aims to resolve this challenge, and identify a set of potential HLA class II-restricted HIV epitopes that in concert will provide optimal viral and host coverage. Using this selection strategy, we identified 64 putative epitopes (peptides) located in the Gag, Nef, Env, Pol and Tat protein regions of HIV. In total, 73% of the predicted peptides were found to induce HIV-specific CD4+ T cell responses. The Gag and Nef peptides induced the most responses. The vast majority of the peptides (93%) had predicted restriction to the patient’s HLA alleles. Interestingly, the viral load in viremic patients was inversely correlated with the number of targeted Gag peptides. In addition, the predicted Gag peptides were found to induce broader polyfunctional CD4+ T cell responses than the commonly used Gag-p55 peptide pool. These results demonstrate the power of the PopCover method for the identification of broadly recognized HLA class II-restricted epitopes. Altogether, selection strategies such as PopCover might be used successfully for the evaluation of antigen-specific CD4+ T cell responses and the design of future vaccines.
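Selecting peptides that jointly cover many viral strains and host HLA alleles is a coverage-maximization problem. The sketch below uses a plain greedy set-cover heuristic over (strain, allele) pairs, each weighted for example by allele frequency. It is a simplified stand-in written for illustration, not the published PopCover scoring function, and the peptide names and weights are hypothetical.

```python
def greedy_cover(candidates, weights, k):
    """Pick up to k peptides, each time taking the one adding the most new weight.

    candidates: peptide -> set of (strain, allele) pairs it is predicted to cover
    weights:    (strain, allele) -> weight, e.g. allele frequency per strain
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    for _ in range(k):
        best, best_gain = None, 0.0
        for pep, targets in remaining.items():
            gain = sum(weights[t] for t in targets - covered)
            if gain > best_gain:  # ties keep the earlier candidate
                best, best_gain = pep, gain
        if best is None:  # no candidate adds any coverage
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


candidates = {
    "pepA": {("s1", "DR1"), ("s2", "DR1")},
    "pepB": {("s1", "DR4")},
    "pepC": {("s1", "DR1"), ("s1", "DR4")},
}
weights = {("s1", "DR1"): 0.3, ("s2", "DR1"): 0.3, ("s1", "DR4"): 0.2}
print(greedy_cover(candidates, weights, k=3))
```

Once pepA and pepB are chosen, pepC adds nothing new, so the selection stops early; redundant peptides are skipped in favor of ones that extend coverage.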
The immune epitope database analysis resource (IEDB-AR: http://tools.iedb.org) is a collection of tools for prediction and analysis of molecular targets of T- and B-cell immune responses (i.e. epitopes). Since its last publication in the NAR webserver issue in 2008, a new generation of peptide:MHC binding and T-cell epitope predictive tools has been added. As validated by different labs and in the first international competition for predicting peptide:MHC-I binding, their predictive performance has improved considerably. In addition, a new B-cell epitope prediction tool was added, and the homology mapping tool was updated to enable mapping of discontinuous epitopes onto 3D structures. Furthermore, to serve a wider range of users, the number of ways in which IEDB-AR can be accessed has been expanded. Specifically, the predictive tools can be programmatically accessed using a web interface and can also be downloaded as software packages.
Binding of peptides to major histocompatibility complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC genomic region (called HLA) is extremely polymorphic, comprising several thousand alleles, each encoding a distinct MHC molecule. The potentially unique specificity of the majority of HLA alleles identified to date remains uncharacterized. Likewise, only a limited number of chimpanzee and rhesus macaque MHC class I molecules have been characterized experimentally. Here, we present NetMHCpan-2.0, a method that generates quantitative predictions of the affinity of any peptide–MHC class I interaction. NetMHCpan-2.0 has been trained on the hitherto largest set of quantitative MHC binding data available, covering HLA-A and HLA-B, as well as chimpanzee, rhesus macaque, gorilla, and mouse MHC class I molecules. We show that the NetMHCpan-2.0 method can accurately predict binding to uncharacterized HLA molecules, including HLA-C and HLA-G. Moreover, NetMHCpan-2.0 is demonstrated to accurately predict peptide binding to chimpanzee and macaque MHC class I molecules. The power of NetMHCpan-2.0 to guide immunologists in interpreting cellular immune responses in large outbred populations is demonstrated. Further, we used NetMHCpan-2.0 to predict potential binding peptides for the pig MHC class I molecule SLA-1*0401. Ninety-three percent of the predicted peptides were demonstrated to bind with an affinity stronger than 500 nM. The high performance of NetMHCpan-2.0 for non-human primates documents the method's ability to provide broad allelic coverage also beyond human MHC molecules. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan.
MHC class I; Binding specificity; Non-human primates; Artificial neural networks; CTL epitopes
Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, WGS is becoming increasingly available to scientists and routine diagnostic laboratories. Currently, its cost is below that of traditional MLST. The new challenge will be to extract the relevant information from the large amount of data so as to allow comparison over time and between laboratories. Ideally, this information should also allow comparison with historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
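The final step described above, deriving the sequence type from the combination of alleles, is a lookup in the scheme's profile table. The sketch below shows that lookup together with a hypothetical allele-ranking rule (highest percent identity, then longest alignment); the actual BLAST-based ranking used by the service is not specified here. The three-locus scheme and its profiles are invented for illustration, whereas real schemes typically use seven housekeeping loci.

```python
def best_allele(hits):
    """Pick an allele from BLAST-like hits given as (allele, percent_identity, aligned_len).

    Hypothetical ranking: highest identity first, then longest alignment.
    """
    return max(hits, key=lambda h: (h[1], h[2]))[0]


def sequence_type(alleles, profiles):
    """Return the ST whose allele profile matches exactly, or None if novel.

    alleles:  locus -> allele number called for the isolate
    profiles: ST -> (locus -> allele number), i.e. the scheme's profile table
    """
    for st, profile in profiles.items():
        if all(alleles.get(locus) == number for locus, number in profile.items()):
            return st
    return None


# Hypothetical three-locus scheme.
profiles = {
    1: {"adk": 1, "fumC": 1, "gyrB": 2},
    2: {"adk": 1, "fumC": 3, "gyrB": 2},
}
calls = {
    "adk": best_allele([(1, 100.0, 536), (4, 99.2, 536)]),
    "fumC": best_allele([(3, 100.0, 469), (1, 98.9, 469)]),
    "gyrB": best_allele([(2, 100.0, 460)]),
}
print(calls, sequence_type(calls, profiles))
```

An isolate whose allele combination matches no profile returns None, which would correspond to a new or unassigned sequence type.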
Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale, thereby enabling entirely new “omics”-based approaches to the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenge researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before they can be understood in a biological context. Thus, there is an unmet need for tools that allow non-bioinformaticians to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem for which quantitative peptide data are available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign that allows non-expert end users to submit their data (optionally adjusting method parameters) and in return receive a trained method (including a visual representation of the identified motif) that can subsequently be used as a prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets, including peptide microarray-derived sets containing more than 100,000 data points.
NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.
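The alignment half of "simultaneously aligning peptide sequences and identifying motifs" amounts to locating, for each peptide, the window (core) that best fits the current motif model. The sketch below shows that step with a position-specific scoring matrix as a simple stand-in for NNAlign's neural-network scorer; the PSSM weights are hypothetical, and the re-training of the model on the selected cores, which NNAlign performs, is not shown.

```python
def best_core(peptide, pssm):
    """Return (offset, core, score) of the highest-scoring motif-length window.

    pssm: list of dicts, one per motif position, mapping amino acid -> weight.
    This PSSM is a stand-in for NNAlign's neural-network scorer.
    """
    width = len(pssm)
    if len(peptide) < width:
        raise ValueError("peptide shorter than motif")
    best = None
    for offset in range(len(peptide) - width + 1):
        core = peptide[offset:offset + width]
        score = sum(pos.get(aa, 0.0) for pos, aa in zip(pssm, core))
        if best is None or score > best[2]:
            best = (offset, core, score)
    return best


# Hypothetical 3-position motif favoring F-K-L.
pssm = [{"F": 1.0}, {"K": 0.5}, {"L": 1.0}]
print(best_core("AAFKLAA", pssm))
```

Alternating this core search with re-estimation of the scoring model from the chosen cores is what lets motif and alignment be learned together.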
Epitopes from all available full-length sequences of yellow fever virus (YFV) and dengue fever virus (DENV) restricted by Human Leukocyte Antigen class I (HLA-I) alleles covering 12 HLA-I supertypes were predicted using the NetCTL algorithm. A subset of 179 predicted YFV and 158 predicted DENV epitopes was selected using the EpiSelect algorithm to allow for optimal coverage of viral strains. The selected predicted epitopes were synthesized, and approximately 75% were found to bind the predicted restricting HLA molecule with an affinity (KD) stronger than 500 nM. The immunogenicity of 25 HLA-A*02:01, 28 HLA-A*24:02 and 28 HLA-B*07:02 binding peptides was tested in three HLA-transgenic mouse models and led to the identification of 17 HLA-A*02:01, 4 HLA-A*24:02 and 4 HLA-B*07:02 immunogenic peptides. The immunogenic peptides bound HLA significantly more strongly than the non-immunogenic peptides. All except one of the immunogenic peptides had a KD below 100 nM, and peptides with a KD below 5 nM were more likely to be immunogenic. In addition, all the immunogenic peptides identified as having a high functional avidity had a KD below 20 nM. A*02:01 transgenic mice were also inoculated twice with the 17DD YFV vaccine strain. Three of the YFV A*02:01-restricted peptides activated T cells from the infected mice in vitro. All three peptides that elicited responses had an HLA binding affinity of 2 nM or less. The results indicate the importance of the strength of HLA binding in shaping the immune response.
MHC class II binding predictions are widely used to identify epitope candidates in infectious agents, allergens, cancer and autoantigens. The vast majority of prediction algorithms for human MHC class II to date have targeted HLA molecules encoded in the DR locus. This reflects a significant gap in knowledge, as HLA DP and DQ molecules are presumably equally important and have been studied less only because they are more difficult to handle experimentally.
In this study, we aimed to narrow this gap by providing a large-scale dataset of over 17,000 HLA-peptide binding affinities for a set of 11 HLA DP and DQ alleles. We also expanded our dataset for HLA DR alleles, resulting in a total of 40,000 MHC class II binding affinities covering 26 allelic variants. Using this dataset, we generated prediction tools based on several machine learning algorithms and evaluated their performance.
We found that 1) prediction methodologies developed for HLA DR molecules perform equally well for DP or DQ molecules. 2) Prediction performances were significantly increased compared to previous reports due to the larger amounts of training data available. 3) The presence of homologous peptides between training and testing datasets should be avoided to give real-world estimates of prediction performance metrics, but the relative ranking of different predictors is largely unaffected by the presence of homologous peptides, and predictors intended for end-user applications should include all training data for maximum performance. 4) The recently developed NN-align prediction method significantly outperformed all other algorithms, including a naïve consensus based on all prediction methods. A new consensus method dropping the comparatively weak ARB prediction method could outperform the NN-align method, but further research into how best to combine MHC class II binding predictions is required.
Binding of peptides to Major Histocompatibility Complex class II (MHC-II) molecules plays a central role in governing responses of the adaptive immune system. MHC-II molecules sample peptides from the extracellular space, allowing the immune system to detect the presence of foreign microbes in this compartment. Predicting which peptides bind to an MHC-II molecule is therefore of pivotal importance for understanding the immune response and its effect on host-pathogen interactions. The experimental cost associated with characterizing the binding motif of an MHC-II molecule is significant, and large efforts have therefore been placed in developing accurate computer methods capable of predicting this binding event. Prediction of peptide binding to MHC-II is complicated by the open binding cleft of the MHC-II molecule, which allows binding of peptides extending out of the binding groove. Moreover, the genes encoding the MHC molecules are immensely diverse, leading to a large set of different MHC molecules, each potentially binding a unique set of peptides. Characterizing each MHC-II molecule using peptide-screening binding assays is hence not a viable option.
Here, we present an MHC-II binding prediction algorithm that addresses these challenges. The method is a pan-specific version of the previously published allele-specific NN-align algorithm and does not require any pre-alignment of the input data. This allows the method to benefit also from information from alleles covered by limited binding data. The method is evaluated on a large and diverse set of benchmark data and is shown to significantly outperform state-of-the-art MHC-II prediction methods. In particular, the method is found to boost performance for alleles characterized by limited binding data, where conventional allele-specific methods tend to achieve poor prediction accuracy.
The method thus shows great potential for efficiently boosting the accuracy of MHC-II binding prediction, as accurate predictions can be obtained for novel alleles at highly reduced experimental costs. Pan-specific binding predictions can be obtained for all alleles with known protein sequence, and the method can benefit from including training data from alleles for which only a few binders are known. The method and benchmark data are available at http://www.cbs.dtu.dk/services/NetMHCIIpan-2.0
Sequence-based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreover, so-called pan-specific methods have been developed, which allow prediction of peptide binding to MHC alleles characterized by limited or no peptide binding data. Most of the developed methods are publicly available and have proven very useful as a shortcut in epitope discovery. Here, we go through some of the history of sequence-based predictions of helper as well as cytotoxic T cell epitopes, focusing on some of the most accurate methods and their basic background.
Although the majority of bacteria are innocuous or even beneficial for their host, others are highly infectious pathogens that can cause widespread and deadly diseases. When investigating the relationships between bacteria and other living organisms, it is therefore essential to be able to separate pathogenic organisms from non-pathogenic ones. Using traditional experimental methods for this purpose can be very costly and time-consuming, and the results can be uncertain, since animal models are not always good predictors of pathogenicity in humans. Bioinformatics-based methods are therefore strongly needed to mine the fast-growing number of genome sequences and to rapidly and reliably assess the pathogenicity of novel bacteria.
We describe a new in silico method for the prediction of bacterial pathogenicity, based on the identification in microbial genomes of features that appear to correlate with virulence. The method does not rely on identifying genes known to be involved in pathogenicity (for instance, virulence factors); rather, it inherently builds families of proteins that, irrespective of their function, are consistently present in only one of the two kinds of organisms, pathogens or non-pathogens. Whether a new bacterium carries proteins contained in these families determines its prediction as pathogenic or non-pathogenic. Applied to a set of known genomes, the method correctly classified the virulence potential of 86% of the organisms tested. Additional validation on an independent test set correctly classified 22 of 24 bacteria.
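The family-based classification idea described above can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to the set of protein-family identifiers it carries (the clustering step that builds the families is not shown), treats a family as discriminative if it occurs in at least a hypothetical fraction `min_frac` of one class and in no genome of the other class, and labels a new genome by a simple count of hits. The function names, the threshold and the voting rule are illustrative assumptions, not the published method.

```python
from typing import List, Set, Tuple

Genome = Set[str]  # hypothetical: a genome represented by its protein-family IDs


def discriminative_families(pathogens: List[Genome], non_pathogens: List[Genome],
                            min_frac: float = 0.5) -> Tuple[Set[str], Set[str]]:
    """Return families found in >= min_frac of one class and absent from the other."""
    def frac(family: str, genomes: List[Genome]) -> float:
        # Fraction of genomes in a class that carry the family.
        return sum(family in g for g in genomes) / len(genomes)

    all_families = set().union(*pathogens, *non_pathogens)
    path_fams = {f for f in all_families
                 if frac(f, pathogens) >= min_frac and frac(f, non_pathogens) == 0}
    nonpath_fams = {f for f in all_families
                    if frac(f, non_pathogens) >= min_frac and frac(f, pathogens) == 0}
    return path_fams, nonpath_fams


def predict(genome: Genome, path_fams: Set[str], nonpath_fams: Set[str]) -> str:
    """Label a genome by which set of discriminative families it hits more often."""
    hits_p = len(genome & path_fams)
    hits_n = len(genome & nonpath_fams)
    if hits_p == hits_n:
        return "undecided"
    return "pathogen" if hits_p > hits_n else "non-pathogen"
```

For example, with toy pathogen genomes `{"A", "B", "X"}` and `{"A", "C", "X"}` and non-pathogen genomes `{"B", "D", "Y"}` and `{"C", "D", "Y"}`, families A and X are pathogen-specific, D and Y are non-pathogen-specific, and B and C are shared and therefore ignored.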
The proposed approach was shown to overcome the species bias imposed by evolutionary relatedness and to perform better than predictors based solely on taxonomy or sequence similarity. A set of protein families that differentiate pathogenic from non-pathogenic strains was identified, including families of as yet uncharacterized proteins that are proposed to be involved in bacterial pathogenicity.
Protection against pregnancy-associated malaria (PAM) is associated with high levels of anti-VAR2CSA antibodies. This protection is conferred by the parity-dependent acquisition of anti-VAR2CSA antibodies. Distinct parity-associated molecular signatures have been identified in VAR2CSA domains. Taken together, these two observations point to the importance of identifying VAR2CSA sequence variation that facilitates parasite evasion or subversion of the host immune response. Highly conserved domains of VAR2CSA, such as DBL5ε, are likely to contain conserved epitopes and therefore constitute attractive targets for vaccine development.
VAR2CSA DBL5ε-domain sequences obtained from cDNA of 40 placental isolates were analysed by a combination of experimental and in silico methods. Competition ELISA assays on two DBL5ε variants, using plasma samples from women from two different areas and specific mouse hyperimmune plasma, indicated that DBL5ε possesses conserved and cross-reactive B-cell epitopes. Peptide ELISA identified conserved areas that are recognised by naturally acquired antibodies. Specific antibodies against these peptides labelled the native proteins on the surface of placental parasites. Despite high DBL5ε sequence homology among parasite isolates, sequence analyses identified motifs in DBL5ε that discriminate parasites according to donor parity. Moreover, recombinant proteins of two VAR2CSA DBL5ε variants displayed distinct patterns of recognition by plasma from malaria-exposed women and distinct proteoglycan-binding abilities.
This study provides insights into conserved and exposed B-cell epitopes in DBL5ε that might be a focus for cross-reactivity. It also highlights sequence variation in VAR2CSA as a critical challenge for vaccine development. VAR2CSA conformation seems to be essential to its function. Therefore, identifying sites of sequence variation in distinct locations within VAR2CSA that affect antigenicity and/or binding properties is critical to the effort of developing an efficient VAR2CSA-based vaccine. Motifs associated with parasite segregation according to parity constitute one such site.
West Nile virus (WNV) is a growing threat to public health, and a greater understanding of the immune response raised against WNV is important for the development of prophylactic and therapeutic strategies.
In a reverse-immunology approach, we used bioinformatics methods to predict WNV-specific CD8+ T-cell epitopes and selected a set of peptides providing maximal coverage of 20 fully sequenced WNV strains. We then tested these putative epitopes for cellular reactivity in a cohort of WNV-infected patients. We identified 26 new CD8+ T-cell epitopes, which we propose are restricted by 11 different HLA class I alleles. Aiming for optimal coverage of human populations, we suggest that 11 of these new WNV epitopes would be sufficient to cover 48% to 93% of ethnic populations in various regions of the world.
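Population-coverage estimates of this kind are commonly derived from HLA allele frequencies. The minimal Python sketch below assumes Hardy-Weinberg equilibrium and independence between HLA loci, and computes the fraction of individuals carrying at least one allele that restricts a selected epitope. The function name and the allele frequencies in the example are hypothetical, and this is not necessarily the calculation used in the study.

```python
from typing import Dict, Set


def population_coverage(allele_freqs: Dict[str, Dict[str, float]],
                        restricting_alleles: Set[str]) -> float:
    """Estimate the fraction of a population carrying >= 1 restricting HLA allele.

    allele_freqs maps each locus (e.g. "HLA-A") to {allele: gene frequency}.
    Assumes Hardy-Weinberg equilibrium and independent loci (simplifying assumptions).
    """
    p_uncovered = 1.0
    for freqs in allele_freqs.values():
        # Summed gene frequency of the restricting alleles at this locus.
        f = sum(freq for allele, freq in freqs.items() if allele in restricting_alleles)
        # An individual is uncovered at this locus only if both chromosomes
        # carry a non-restricting allele.
        p_uncovered *= (1.0 - f) ** 2
    return 1.0 - p_uncovered


# Hypothetical allele frequencies, for illustration only.
freqs = {
    "HLA-A": {"A*02:01": 0.3, "A*01:01": 0.1},
    "HLA-B": {"B*07:02": 0.2, "B*08:01": 0.1},
}
coverage = population_coverage(freqs, {"A*02:01", "B*07:02"})
```

In this toy example, the uncovered fraction is (0.7)^2 x (0.8)^2 = 0.3136, so the two restricting alleles together cover about 69% of the hypothetical population.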
The 26 identified CD8+ T-cell epitopes contribute to our knowledge of the immune response against WNV infection and greatly extend the list of known WNV CD8+ T-cell epitopes. A polytope incorporating these and other epitopes could serve as the basis for a WNV vaccine.
Pregnant women acquire protective antibodies that cross-react with geographically diverse placental Plasmodium falciparum isolates, suggesting that surface molecules expressed on infected erythrocytes by pregnancy-associated malaria (PAM) parasites have conserved epitopes and that a PAM vaccine may be feasible. VAR2CSA is the main candidate for a pregnancy malaria vaccine, but vaccine development may be complicated by its sequence polymorphism.
The dynamics of P. falciparum genotypes during pregnancy in 32 women were determined in relation to VAR2CSA polymorphism and immunity. The polymorphism of the msp2 gene and five microsatellites was analysed in consecutive parasite isolates, and the DBL5ε + Interdomain 5 (Id5) region of the var2csa gene from the corresponding samples was cloned and sequenced to measure variation.
In primigravidae, the multiplicity of infection in the placenta was associated with the occurrence of low-birth-weight babies. Some parasite genotypes were able to persist for several weeks and still be present in the placenta at delivery, particularly when the host anti-VAR2CSA antibody level was low. Comparison of diversity among genotyping markers confirmed that some PAM parasites may harbour more than one var2csa gene copy in their genome.
Host immunity to VAR2CSA influences parasite dynamics during pregnancy, suggesting that the acquisition of protective immunity requires prior exposure to a limited number of parasite variants. The presence of highly conserved residues in surface-exposed areas of the immunodominant VAR2CSA DBL5ε domain suggests its potential to induce antibodies with broad reactivity.