Different types of Mycobacterium epitopes described within the IEDB
Mycobacterial epitopes are described first in terms of their recognition by the humoral and cellular immune response. From a dataset of 1377 unique mycobacterial epitopes (described in the methods section), 1114 are recognized by T cell responses, and 357 by B cell responses. The sum of the epitopes in the B and T cell categories exceeds the total number of unique epitopes because some epitopes are recognized by both cell types. The overall epitope distribution likely reflects the emphasis placed on the T cell response in investigating mycobacterial infections, and/or experimental bias.
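The size of the overlap between the two categories follows from inclusion-exclusion. The following is a minimal sketch of that arithmetic using only the counts quoted above; the function name is illustrative, and the calculation assumes that every unique epitope falls into at least one of the two categories.

```python
def recognized_by_both(total_unique: int, t_cell: int, b_cell: int) -> int:
    """Inclusion-exclusion: |T and B| = |T| + |B| - |T or B|.

    Assumes every unique epitope is counted in at least one category.
    """
    overlap = t_cell + b_cell - total_unique
    if overlap < 0:
        raise ValueError("category counts are smaller than the unique total")
    return overlap


# Counts quoted in the text: 1377 unique epitopes, 1114 T cell, 357 B cell.
both = recognized_by_both(total_unique=1377, t_cell=1114, b_cell=357)
print(both)  # 94
```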
All but 34 epitopes are peptidic in nature. Of the non-peptidic epitopes, 20 are recognized by B cells and 14 by T cells. Of the non-peptidic B cell epitopes, 15 are carbohydrates, 3 are fatty acids, one is a glycolipid, and one is a small organic molecule. Of the non-peptidic T cell epitopes, 7 are small organic molecules, 4 are glycolipids, 2 are carbohydrates, and one is a lipopeptide. These T cell epitopes are restricted by class Ib/non-classical MHC molecules. The role of non-peptidic, post-translationally modified, and capsular antigens in Mtb infectivity and pathogenesis, as well as their potential diagnostic use, is well appreciated [24]. Thus, the small number of such epitopes identifies a knowledge gap in current epitope data, and highlights the need for the generation of additional data defining non-peptidic and post-translationally modified structures recognized by host immunity against Mtb.
For this analysis, the assignment of effector cells as CD4+ versus CD8+ T cells was inferred from the assay type classification (described in the methods section), which enabled a larger proportion of epitopes with no reported restriction to be functionally/phenotypically categorized (Figure 1). This categorization showed that the vast majority of epitopes have been defined for CD4+ T cells, and have been described to a much lesser extent for CD8+ T cells. Only a small number of effectors were categorized as non-classical (e.g. gamma-delta T cells). The dominance of CD4+ epitopes likely reflects the focus of the research community on identifying epitopes and antigens restricted by the MHC class II pathway. A significant proportion of the T cell epitopes, mostly described in references utilizing IFNγ and IL-2 assays, remained unclassified due to lack of sufficient information in the original publication regarding MHC restriction and T cell phenotype. By contrast, the classification of B cell epitopes as either linear/continuous or discontinuous was clear-cut. No discontinuous epitopes for Mycobacteria have been found. This observation likely reflects the technological difficulties inherent in the definition of discontinuous epitopes, but also highlights a crucial gap in our knowledge of what is recognized during Mtb-specific immune responses.
Figure 1 The nature of antibody/B cell & T cell Mycobacterial epitopes. The proportion of mycobacterial epitopes according to their type is shown. Where the type of T cell epitope could not be defined directly from the primary information source it was ...
Topology and function of the protein sources from which the Mtb epitopes are derived
If we focus specifically on TB, we find that the M. tuberculosis genome comprises approximately 4000 protein-coding ORFs, for which 924 epitopes from the literature are defined in the IEDB. Most strikingly, these epitopes are derived from only 270 ORFs, corresponding to a mere 7% of the ORFs in the genome. In total, as few as 30 ORFs account for 65% of the defined epitopes. Thus, epitope definition studies in Mtb specifically, and mycobacterial species in general, are apparently far from complete. We further analyzed the epitope distribution in the context of the function and topology of the protein of origin.
In order to explore the distribution of epitopes between proteins according to function and topology categories (described in the methods section), Epitope Density Index (EDI) values were determined. Two EDI values were calculated for each category. The EDI 1 values represent the number of epitopes in each category divided by the number of proteins (ORFs) with defined epitopes in the category. The EDI 2 values represent the number of epitopes in each category per 100 amino acid residues from the same proteins, allowing protein size to be taken into consideration (Figure 2). In general, epitopes within the IEDB were identified from all functional and topological categories that have defined coding sequences. A small number of epitopes could not be classified, as their source antigen sequences were either not known or not homologous to sequences from M. tuberculosis, M. bovis, M. leprae, or M. avium.
Figure 2 The epitope density of Mycobacterial proteins. Epitope Density Index (EDI) values for each protein function (a) and topology (b) category are shown. EDI 1 values (black) represent the number of epitopes in each category divided by the number of proteins ...
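The two density indices defined above can be sketched as a short calculation. This is an illustration only, under the assumption that each protein record carries a category label, a length in residues, and a count of mapped epitopes; the `Protein` class, its field names, and the example values are hypothetical and are not taken from the IEDB.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    category: str    # functional or topological category (hypothetical labels)
    length_aa: int   # protein length in amino acid residues
    n_epitopes: int  # number of epitopes mapped to this protein


def epitope_density(proteins: list[Protein]) -> dict[str, tuple[float, float]]:
    """Return {category: (EDI 1, EDI 2)} using only proteins with defined epitopes."""
    totals: dict[str, list[int]] = {}
    for p in proteins:
        if p.n_epitopes == 0:
            continue  # both indices are defined over proteins that have epitopes
        t = totals.setdefault(p.category, [0, 0, 0])
        t[0] += p.n_epitopes  # epitopes in the category
        t[1] += 1             # proteins with epitopes in the category
        t[2] += p.length_aa   # residues in those proteins
    return {
        cat: (epi / n_prot, 100.0 * epi / residues)
        for cat, (epi, n_prot, residues) in totals.items()
    }


# Toy data, invented for illustration.
example = [
    Protein("virulence", 300, 6),
    Protein("virulence", 100, 2),
    Protein("regulatory", 400, 1),
    Protein("regulatory", 250, 0),  # ignored: no defined epitopes
]
print(epitope_density(example))
# {'virulence': (4.0, 2.0), 'regulatory': (1.0, 0.25)}
```

Dividing by residues rather than by protein count is what lets EDI 2 separate categories with many epitopes per protein from categories whose proteins are simply long.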
In terms of protein function, the highest EDI 1 values (8.84 to 5.90) were observed for proteins directly associated with pathogenicity (i.e. virulence, detoxification, adaptation) and lipid metabolism, followed by protein classes that are probably exposed on the pathogen surface (i.e. PE/PPE, cell wall and cell processes) (Figure 2). Fewer epitopes were derived from the organism's internally regulated proteins (i.e. information pathways, intermediary metabolism and respiration, and regulatory proteins), which had EDI 1 values ranging from 3.90 to 0.54. In general, the EDI values calculated as a function of protein length (EDI 2) correlate with EDI 1 values, except for PE/PPE proteins, which were found to be larger on average. These findings may represent an artifact of the early work directed towards the secreted proteins of Mycobacteria [26], while they are also consistent with current knowledge of Mtb pathogenesis: an arsenal of secreted virulence factors and massive lipid metabolism work in concert to establish infection [29].
Evidence of mycobacterial proteins changing subcellular location [31] may have an impact on the overall topological classification of epitopes, and indicates the need to take experimental parameters into greater consideration. With this caveat in mind, the highest EDI 1 values for topology categories are seen for the extracellular (11.49) and cytoplasmic (8.09) proteins (Figure 2). Their locations corroborate the findings for the protein function categories. The EDI 1 value for proteins with an undetermined topology was lower (5.18), while the lowest values were seen for the cell wall (3.50) and cytoplasmic membrane protein (3.06) categories. The correlation between EDI 1 and EDI 2 values differed only for the cytoplasmic protein category, whose proteins have a smaller average size than those with an undetermined topology. Finally, the ratio of antibody/B cell to T cell epitopes was equivalent in each protein function and topology category (data not shown).
It is important to note that the above analysis must be viewed with both biological phenomena and experimental bias in mind. It is possible that the protein classes described above are highly immunogenic and could represent good candidates for vaccine development. Alternatively, these protein classes may have received greater attention from the research community, in terms of epitope discovery, because of their high level of expression or ease of isolation. Increased epitope identification in proteins belonging to categories with low epitope density may allow discrimination between these two alternatives, and may also lead to discovery of novel antibody/B and T cell reactivities.
Mycobacterial strain and species distribution of described epitopes
The focus of the present analysis is Mtb, because of its significance as a human pathogen. However, immune epitope data from all other mycobacterium species for which epitopic information is available were also included. These data are important for appreciating the relative balance of knowledge between the various mycobacterium species and their relevance to vaccine and diagnostic target selection. A total of 1644 epitopes have been defined within 11 different mycobacterium species, representing 363 antibody/B cell epitopes and 1281 T cell epitopes. As previously stated, the total of unique mycobacterial epitopes is 1377; therefore, some unique epitopes are defined in more than one mycobacterium species or strain, as well as being recognized by both B and T cell types [Additional file 1 (sheet 1)]. As shown in Figure 3, Mtb epitopes represent 56% of the total, while M. leprae (ML) and M. bovis (Mb) represent 21 and 20%, respectively. Only 11 epitopes have been defined in M. avium (<1%), and 23 epitopes are defined in other known mycobacteria species. A further 9 epitopes have been identified in undetermined mycobacterium species.
Figure 3 The distribution of epitopes between Mycobacterial species and strains. The number of epitopes for each Mycobacteria species considered is presented. The proportion of epitopes with strain information (dark grey), and without (light grey) is shown for ...
Evidence that the genetic variability between Mtb strains confers significant phenotypic differences in virulence and immunogenicity [32] underscores the need for strain-specific epitope analyses. Of the 924 epitopes defined in Mtb, 54% were reported with no defined strain information (Figure 3). Of the 417 Mtb strain-specific epitopes, 88% correspond to H37Rv, while the remainder correspond to strains 103, Erdman, H37Ra and CDC1551 [Additional file 1 (sheet 1)]. For other Mtb strains of potential interest, such as the laboratory strain H37Ra, only 5 epitopes are known. Additionally, no epitopes are described in other Mtb strains commonly associated with TB, such as the highly virulent Beijing strains, or multi-drug resistant (MDR) and/or extensively drug resistant (XDR) strains. A similar picture emerges in the case of the epitopes defined in M. leprae, with the majority of the epitopes derived from unidentified strains, reflecting the challenges in M. leprae strain typing [33]. This general lack of strain information for epitopes was found to be due, in large part, to the characterization of epitope recognition following natural mycobacterial infection in human populations. Indeed, 64% of mycobacterium-specific epitopes within the IEDB were defined in humans. In this context, the identity of the specific strain to which each subject was exposed is often not known and/or not identified diagnostically. Moreover, this omission is compounded by the lack of published sequence information for some of the identified strains considered herein. In the absence of strain characterization information, it is suggested that the geographic location of origin and the ethnicity of study subjects be included in future publications.
Conservancy and uniqueness of mycobacterial epitopes
Establishing the degree of sequence conservancy for a given epitope among and between mycobacterium species and strains may help distinguish epitopes applicable to diagnostics from those suited to vaccine design and evaluation. In the diagnostic setting, epitopes conserved within a species and not found in any other mycobacterium are of interest [35]. For example, reactivity to certain epitopes may distinguish between individuals infected with Mtb and those previously vaccinated with Bacillus Calmette-Guerin (BCG).
A total of 989 mycobacterial epitope sequences can be found in the protein-coding ORFs of 3 Mtb strains (described in the methods section), accounting for 72% of all mycobacterial epitopes [Additional file 1 (sheet 2)]. Of those, 95% are also conserved among the Mtb strains. Eleven epitope sequences are exclusive to Mtb (they do not share >85% sequence identity with any other species) and are also conserved among the three Mtb strains [Additional file 1 (sheet 3)]. These epitopes have been identified as potential diagnostic candidates [38]. No epitope sequence exclusive to any M. bovis strain was reported. Of the 427 epitope sequences found in M. leprae TN, 89 are exclusive to this strain. This higher number of unique sequences reflects the greater evolutionary distance between M. leprae and the other mycobacterium species considered [40]. Finally, of the 216 epitope sequences that were reported in M. avium, 206 were conserved within the two identified M. avium strains, though none are exclusive to these strains.
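The exclusivity test described above (no more than 85% sequence identity with any other species) can be illustrated with a simple ungapped comparison. This is a sketch under stated assumptions, not the procedure used for the analysis, which is described in the methods section: it assumes identity is measured over ungapped windows of the epitope's length, and the function names and toy sequences are hypothetical.

```python
def best_identity(epitope: str, protein: str) -> float:
    """Highest fraction of matching residues over all ungapped windows."""
    n = len(epitope)
    if n == 0 or len(protein) < n:
        return 0.0
    best = 0.0
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / n)
    return best


def is_exclusive(
    epitope: str,
    own_proteome: list[str],
    other_proteomes: list[list[str]],
    threshold: float = 0.85,
) -> bool:
    """True if the epitope occurs exactly in its own proteome and no protein
    from any other proteome exceeds the identity threshold."""
    present = any(epitope in protein for protein in own_proteome)
    shared = any(
        best_identity(epitope, protein) > threshold
        for proteome in other_proteomes
        for protein in proteome
    )
    return present and not shared


# Toy sequences, invented for illustration.
own = ["XXACDEFGHIKLXX"]
close_relative = [["MMACDEFGHIKAMM"]]    # 9 of 10 residues match
distant_relative = [["MMACDEFGWWWWMM"]]  # 6 of 10 residues match
print(is_exclusive("ACDEFGHIKL", own, close_relative))    # False
print(is_exclusive("ACDEFGHIKL", own, distant_relative))  # True
```

A gapped alignment would be more permissive than this window scan, so the counts it produces should be read as an upper bound on exclusivity.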
For the purpose of distinguishing between infection with M. tuberculosis, infection with or sensitization to nontuberculous mycobacteria, and immune responses due to vaccination with BCG, it is essential to know which sequences are not shared with another species, regardless of exclusivity. We found that, of a total of 940 epitope sequences conserved between the three Mtb strains, 216 are not present in BCG Pasteur 1173P2 (Table ). These epitopes might be of diagnostic use in distinguishing between TB infection and vaccination. As expected, a substantially greater number of Mtb conserved epitopes are not shared with M. leprae and M. avium. All of the 955 immune epitope sequences present in the virulent AF2122/97 strain of M. bovis are also found in Mtb, while only 18 of the 940 conserved Mtb epitopes were found to be absent from M. bovis.
The numbers of Mycobacteria B and T cell epitope sequences that are present in one species but absent in another
Of the 955 sequences present in M. bovis strain AF2122/97, 202 epitopes are not found in M. bovis BCG Pasteur 1173P2, reflecting known deletions in the BCG genome [41]. These deleted epitopes may be of interest in studying the difference in immune response between the BCG vaccine and virulent M. bovis. The majority of the sequences present in M. bovis are not found in either M. leprae or M. avium. A similar pattern is observed in the case of BCG. Of the 427 epitope sequences found in M. leprae, 242 are absent in Mtb, M. bovis, and M. bovis BCG, while 271 are not found in M. avium. These sequences may discriminate between infections with M. leprae and those with other mycobacterial species. Finally, of the 206 epitope sequences conserved in M. avium, 21 are not present in Mtb, 23 in M. bovis and BCG, and 53 in M. leprae.
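The "present in one species but absent in another" counts reported in this section amount to pairwise set differences. A minimal sketch, assuming each genome has already been reduced to the set of epitope sequences found in it; the function name and the toy identifiers are illustrative only.

```python
def absent_counts(present: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """For each ordered pair (a, b), count epitopes present in a but absent from b."""
    return {
        (a, b): len(present[a] - present[b])
        for a in present
        for b in present
        if a != b
    }


# Toy presence sets, invented for illustration.
toy = {
    "Mtb": {"e1", "e2", "e3"},
    "BCG": {"e1"},
    "M. leprae": {"e3", "e4"},
}
counts = absent_counts(toy)
print(counts[("Mtb", "BCG")])        # 2
print(counts[("BCG", "Mtb")])        # 0
print(counts[("M. leprae", "Mtb")])  # 1
```

The table is not symmetric: the number of Mtb sequences missing from BCG says nothing about the number of BCG sequences missing from Mtb, which is why both directions are computed.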
Host range of defined epitopes
In addition to humans, a variety of different host species have been utilized to define mycobacterial epitopes. These other hosts include mice, cattle, rabbits, rats, macaques, chickens, and guinea pigs. As pathogens, mycobacteria are well-established in both human and veterinary medicine. These bacterial infections are of particular significance to the cattle, swine, and fowl industries. M. bovis infection in cattle continues to have a substantial financial impact. There is no information available for epitopes recognized in other naturally-infected hosts, such as badgers for M. bovis or nine-banded armadillos for M. leprae.
Accordingly, we have investigated the mycobacterial immune epitope information for the relevant animal models and host species. A total of 1548 host-epitope interactions, defined as an epitope experimentally determined to be recognized within a particular host species, are distributed between different host species and immune cell types. Therefore, some epitopes within the dataset of 1377 unique epitopes are recognized by more than one host species or immune cell type. The distribution of all mycobacterial epitopes amongst the different host species is shown in Table . Not surprisingly, the majority of epitopes (63.7%) are defined in human hosts. Murine epitopes account for an additional 24.1%, of which approximately a third have been defined in HLA transgenic mice, and are therefore restricted by human MHC molecules. The epitopes described in bovine species account for another 7.3% of the total. The number of epitopes identified in other animal models is surprisingly small. For example, only 15 are identified in the rat, and 1 in the guinea pig. In neither case are B cell epitopes represented as they are in the other listed species. Surprisingly, only 9 epitopes have been defined in non-human primates, accounting for less than 1% of the total. Thus, identification of additional epitopes in non-murine animal models, specifically non-human primates, might be of future interest.
Recognition of Mycobacteria epitopes from different host species
In the majority of cases, the number of B cell epitopes identified in each host organism is considerably lower than that for T cells, with the exception of epitopes identified in chickens and rabbits. In these species, T cell epitopes have yet to be described. This observation is likely a reflection of the preferential use of these species for antibody production and the study of humoral immunity to TB.
Epitope recognition in different disease states
Using the definitions established by the CDC (as described in the methods section), a total of 1460 human disease state classifications are known for the 985 epitopes reported for naturally infected human hosts. As expected, the recognition of some epitopes was reported for more than one disease state, which explains the discrepancy between these totals. We found that the greatest proportion of human recognitions of immune epitopes (Table ) are within the Clinically Active TB category (29%), followed by Vaccinated (18%), then Prior TB (11%). Lower numbers are recognized in the Exposed but not Diseased [household contacts, PPD-/+, TST-/+] (9%) and TB test positive [TST, PPD, etc.] (7%) groups. M. leprae infections account for most of the Exposed to Other Mycobacterium category (23%). Only a small number of epitopes could not be classified under the current disease state scheme (31, not shown), including epitopes recognized in unspecified/unknown or other diseases where mycobacterial involvement is undetermined.
Summary of epitope recognition in different TB disease states
The IEDB also captures negative immunological data, when it is reported in the literature and found to meet our inclusion criteria. This feature of the database permits a more in-depth analysis of immune epitopes differentially associated with certain disease states. For example, a given investigator might be interested in searching the database for epitopes recognized by individuals with Clinically Active TB, but not by vaccine recipients or individuals Exposed to other Mycobacteria. When we queried the database for this specified type of epitope, 74 different epitopes were found. Similarly, we queried for epitopes recognized by individuals in the Exposed/resolved category, but not by those with Clinically Active TB. In this case a total of 27 different epitopes were found. In general, however, we noted that for most epitopes, information is sorely lacking regarding recognition in multiple disease states, thus limiting the usefulness of the positive recognition data, especially in diagnostic and prognostic settings. Overall, the analysis underlines the need for controlled systematic studies to correlate immune recognition of particular epitopes and antigens with disease states and outcomes.
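A differential query of the kind described above can be sketched as a filter over positive and negative assay records. This is not the IEDB query interface; the record layout (epitope identifier, disease state, positive flag), the function name, and the example data are assumptions made for illustration. The `require_negative` flag separates "no positive report in the excluded state" from the stricter "explicit negative report in the excluded state".

```python
from collections import defaultdict


def differential_epitopes(records, required, excluded, require_negative=False):
    """Epitopes positive in `required` and not positive in any `excluded` state.

    records: iterable of (epitope_id, disease_state, is_positive) tuples.
    With require_negative=True, an explicit negative result must also exist
    for every excluded state.
    """
    excluded = set(excluded)
    positive = defaultdict(set)
    negative = defaultdict(set)
    for epitope_id, state, is_positive in records:
        (positive if is_positive else negative)[epitope_id].add(state)

    hits = []
    for epitope_id, pos_states in positive.items():
        if required not in pos_states:
            continue
        if pos_states & excluded:
            continue  # recognized in a state that should be excluded
        if require_negative and not excluded <= negative[epitope_id]:
            continue  # absence of data is not treated as a negative result
        hits.append(epitope_id)
    return sorted(hits)


# Toy records, invented for illustration.
records = [
    ("E1", "Clinically Active TB", True),
    ("E1", "Vaccinated", False),
    ("E2", "Clinically Active TB", True),
    ("E2", "Vaccinated", True),
    ("E3", "Clinically Active TB", True),
]
print(differential_epitopes(records, "Clinically Active TB", ["Vaccinated"]))
# ['E1', 'E3']
print(differential_epitopes(records, "Clinically Active TB", ["Vaccinated"],
                            require_negative=True))
# ['E1']
```

The gap between the two results reflects the limitation noted in the text: for most epitopes, recognition has simply not been tested in more than one disease state.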
Animal models and protective epitopes
A total of 618 mycobacterial epitopes have been identified in non-human hosts. These epitopes have been further categorized into TB or non-TB animal models [42]. In total, 492 epitopes have been identified in TB models, while 126 have been identified in non-TB models. As mentioned above, in many instances, these epitopes were recognized in HLA transgenic mice that express human MHC molecules.
Analogous to the human disease outcome scenario, it is useful to be able to identify which of the reported epitopes confer protection in experimental models of the disease. Additional file 1 (sheet 4) lists protective epitopes that have been reported in the literature for animal models of Mtb infection. It should be noted that in our classification of protective epitopes, we utilize a rather stringent definition. Namely, we only consider protective those epitopes that are utilized as isolated molecular structures to immunize and confer protection, and do not consider epitopes merely associated with protection in contexts where responses directed against multiple epitope specificities are present. Moreover, we define protection from disease in these animals as a reduction of clinical signs or a reduction in bacterial load.
Utilizing these stringent criteria, only 10 T cell epitopes are shown to be protective in mice and rats. Four of the protective epitopes are derived from the antigen 85 protein, 2 from the 6 kDa early secretory antigenic target (ESAT-6) and 2 from the 60 kDa chaperonin 2 protein. Although a monoclonal antibody specific to an Mtb polysaccharide conferring partial protection on mice has been identified [43], no protective B cell epitopes have been reported in the literature that fulfill the curation requirements of the IEDB. These data indicate that a more comprehensive analysis of the protective nature of TB epitopes is needed.
MHC binding data of Mycobacterial T cell epitopes
The analysis of MHC binding data can be valuable as a means of preliminary epitope identification, and to confirm restriction assignments. In all, MHC binding information is known for 436 of the 1114 T cell epitopes, of which 148 are bound by more than one MHC molecule. As shown in Table , the majority of the data relate to MHC class II (63%). Class I binding data account for most of the remainder (36%), with non-classical MHC restricted epitopes (presented by CD1 and H-2-M3 molecules) contributing approximately 1% of the total.
The restriction of T cell epitopes by MHC molecules
There are 13 different HLA class I antigens that restrict a total of 118 different epitopes [Additional file 1 (sheet 5)]. The numbers of epitopes defined for HLA-A and HLA-B antigens are approximately equal, while no mycobacterial epitopes were reported to be restricted by HLA-C antigens. Epitopes bound by HLA-A2, B35, and B53 are the most numerous (46, 28, and 10 restrictions, respectively). Furthermore, 65% of all HLA class I restrictions are determined at the level of allele specificity, with the major alleles being A*0201 and B*3501. There are 500 known epitopes bound by 25 different HLA class II molecules [Additional file 1 (sheet 5)]. The vast majority of epitopes are bound by DR antigens (90%), while DP and DQ antigens bind fewer than 10 epitopes each. The greatest numbers of epitopes are bound by DRB1*0101, *0301, *0401, *1501, and DRB5*0101. Finally, MHC binding information can also identify degenerate MHC ligands, of potential importance for vaccine and diagnostic applications because of their broad population coverage. Here, 38 mycobacterial MHC ligands with promiscuous binding properties have been identified [Additional file 1].
Next, we wanted to assess whether Mtb epitopes defined in human hosts would allow for balanced coverage of the different HLA class II and class I alleles, especially those most frequently expressed in ethnicities inhabiting areas in which TB is endemic. To this end, the 10 most frequent HLA class I and class II alleles found in each region of high TB incidence were compiled, based on the WHO report on TB incidence [44], and from the dbMHC database [45] of allele frequencies [Additional file 1].
Given the lack of breadth in HLA allele specific Mtb epitope information discussed above, it is perhaps not surprising that few of these alleles are associated with epitopes of known restriction. Of the HLA class I alleles frequent in endemic areas, 7 are associated with described epitopes. Similarly, of the 33 class II alleles most frequent in regions with a high incidence of TB, Mtb epitopes are known to be restricted by only 7, which together account for just 51 epitopes. There are no epitopes identified for HLA-DP, and only 5 HLA-DQ alleles have been shown to present Mtb epitopes to T cells. Thirty Mtb epitopes are restricted by DRB1*0101, which represents more than half of the allele specific data for HLA-DR. In conclusion, coverage of alleles frequently expressed in endemic areas is sparse, and could be improved by additional research.
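The coverage comparison above reduces to intersecting the list of frequent alleles with the alleles that have at least one restricted epitope. A minimal sketch, assuming the restriction data are available as a mapping from allele name to a set of epitope identifiers; the function name, the allele selection, and the epitope identifiers below are invented for illustration and do not reproduce the compiled WHO/dbMHC lists.

```python
def allele_coverage(
    frequent_alleles: set[str],
    restrictions: dict[str, set[str]],
) -> tuple[list[str], int]:
    """Return the frequent alleles with known epitopes and the number of
    distinct epitopes they restrict."""
    covered = {
        allele: restrictions[allele]
        for allele in frequent_alleles
        if restrictions.get(allele)  # skips missing alleles and empty sets
    }
    epitopes = set().union(*covered.values())
    return sorted(covered), len(epitopes)


# Toy inputs, invented for illustration.
frequent = {"DRB1*0101", "DRB1*1501", "DRB1*0701"}
known = {
    "DRB1*0101": {"e1", "e2"},
    "DRB1*1501": {"e2"},
    "DRB1*0301": {"e9"},  # has epitopes but is not in the frequent list
}
print(allele_coverage(frequent, known))
# (['DRB1*0101', 'DRB1*1501'], 2)
```

Counting distinct epitopes rather than summing per-allele totals avoids double counting promiscuous ligands that bind more than one allele.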
HLA antigen associations with the outcome of human TB infection have been reported [46]. We have identified 15 HLA class I and 23 class II antigens and alleles [Additional file 1 (sheet 8)] thought to be associated with susceptibility to, or protection from, TB within the published literature. As with the alleles frequently found in endemic areas, little information exists for the epitopes presented by alleles associated with disease resistance or susceptibility. Interesting results have been obtained in this area in the case of other pathogens such as HIV, for which epitopes restricted by beneficial class I HLA alleles have significantly stronger selective pressure exerted on them [51]. For Plasmodium, evidence that HLA class I and II polymorphisms are associated with progression of malaria has also been proposed [52]. In light of this information, additional studies in TB might be of interest.
Finally, in terms of non-human hosts, MHC binding data are available for only 5 species: mouse, rat, macaque, chimpanzee, and cattle (Table ). Murine MHC restriction accounts for roughly 90% of the data. With regard to specific MHC molecules, only 5 murine MHC molecules have 10 or more known epitopes: H-2-Db, H-2-IAg7, H-2-IAb, H-2-IEg7, and H-2-IAd [Additional file 1 (sheet 9)]. In conclusion, a significant amount of MHC restricted epitope data is available for non-human hosts, though the majority corresponds to murine MHC class II molecules.
Definition of reference set of Mtb epitopes
The IEDB is designed to be as comprehensive and inclusive as possible. No attempt is made to privilege particular datasets based on the origin, experimental technique, or perceived quality or value of the data. This design allows the user to query the database and select particular datasets matching desired characteristics. This approach is rigorous and associated with the least loss of information. However, it is also associated with an important potential drawback, namely that the user needs a fair level of familiarity with the intricacies of particular diseases or experimental systems. For this reason, standardized epitope reference datasets could be of significant use in the assessment of host-pathogen interactions, the development and testing of diagnostic assays, and vaccine evaluations.
To guard against subjectivity and arbitrary inclusion or exclusion of data, we propose that the definition of reference datasets be governed by objective criteria agreed upon by the relevant research community. This strategy has the advantage of allowing automatic updates by inclusion of new epitopes that match the objective criteria. We have initially generated epitope reference datasets according to the following criteria: 1) ex vivo detection (defined as not needing in vitro restimulation for T cell recognition, and by definition, positive binding in all antibody assays) and 2) use of standardized assays (defined as epitopes associated with a list of assays recognized by community experts as "gold standards"; we tentatively included ICS, ELISPOT, and proliferation for T cell responses, and ELISA and Antigen Competition of Ab Binding for antibody responses). The number of epitopes that fulfill these two criteria (ex vivo and standardized assays) is 735, which represents approximately 53% of the total defined epitopes (Table ). These comprise 344 B cell epitopes and 484 T cell epitopes, of which 93 are recognized by both immune cell types.
Number of epitopes that satisfy each of the defined criteria for epitope datasets
Next, we reasoned that researchers might be interested in different subsets of the epitopes defined by ex vivo detection and standardized assays, for example, epitopes with known MHC restriction and/or defined molecular structures. If we apply these criteria, we obtain a reference set of 322 unique epitopes, of which 182 are B cell epitopes and 186 are T cell epitopes (46 epitopes are recognized by both immune cell types) (Table , [Additional file 1 (sheet 10)]). It is also of potential interest to define which epitopes have been reported in at least two different publications or IEDB submissions. This limitation defines a reference set of 177 unique epitopes, of which 67 are B cell epitopes and 166 are T cell epitopes (56 epitopes are recognized by both immune cell types) (Table , [Additional file 1 (sheet 11)]). Clinical researchers might also be interested in seeing how the epitopes defined by ex vivo analysis and standardized assays are distributed between the different TB disease states, as well as in the more specific sub-category reference datasets. These results demonstrate that multiple customized epitope datasets can be generated to best suit the varied needs of the scientific community. We have also placed a detailed summary of these data on the IEDB website [53] with an accompanying forum discussion thread [54] to obtain research community feedback as to the value of these reference sets, and suggestions on how to improve them.
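The criteria-based reference sets described in this section can be thought of as composable filters over curated records. The sketch below is illustrative only: the `EpitopeRecord` fields, the assay labels, and the `reference_set` function are hypothetical and do not reflect the IEDB schema, and the "known restriction and/or defined structure" criterion is simplified to a single flag.

```python
from dataclasses import dataclass

# Assay labels assumed for illustration, following the tentative list in the text.
STANDARD_ASSAYS = {"ICS", "ELISPOT", "proliferation", "ELISA", "antigen competition"}


@dataclass(frozen=True)
class EpitopeRecord:
    epitope_id: str
    assay: str
    ex_vivo: bool
    restriction_known: bool  # simplified stand-in for the sub-set criterion
    n_references: int        # publications or submissions reporting the epitope


def reference_set(records, require_restriction=False, min_references=1):
    """Unique epitope ids passing the base criteria plus optional sub-criteria."""
    selected = set()
    for r in records:
        if not r.ex_vivo or r.assay not in STANDARD_ASSAYS:
            continue  # base criteria: ex vivo detection and a standardized assay
        if require_restriction and not r.restriction_known:
            continue
        if r.n_references < min_references:
            continue
        selected.add(r.epitope_id)
    return selected


# Toy records, invented for illustration.
toy_records = [
    EpitopeRecord("E1", "ELISPOT", True, True, 3),
    EpitopeRecord("E2", "ELISA", True, False, 1),
    EpitopeRecord("E3", "ELISPOT", False, True, 2),  # fails ex vivo criterion
    EpitopeRecord("E4", "51Cr release", True, True, 2),  # assay not in the list
]
print(sorted(reference_set(toy_records)))                            # ['E1', 'E2']
print(sorted(reference_set(toy_records, require_restriction=True)))  # ['E1']
print(sorted(reference_set(toy_records, min_references=2)))          # ['E1']
```

Because each criterion is an objective predicate, rerunning the same filter after new records are curated updates the reference set automatically, which is the advantage argued for above.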