The hypothesis that mimicry between a self and a microbial peptide antigen is strictly related to autoimmune pathology remains a debated concept in autoimmunity research. Clear evidence for a causal link between molecular mimicry and autoimmunity is still lacking. In recent studies we have demonstrated that viruses and bacteria share amino acid sequences with the human proteome to such a high extent that the molecular mimicry hypothesis becomes questionable as a causal factor in autoimmunity. Expanding upon our analysis, here we detail the bacterial peptide overlap with the human proteome at the penta-, hexa-, hepta- and octapeptide levels by exact peptide matching analysis and demonstrate that there does not exist a single human protein that does not harbor a bacterial pentapeptide or hexapeptide motif. This finding suggests that molecular mimicry between a self and a microbial peptide antigen cannot be assumed as a basis for autoimmune pathologies. Moreover, the data are discussed in relation to the microbial immune escape phenomenon and the possible vaccine-related autoimmune effects.
The sustained increase in the incidence of autoimmune diseases in the population1–4 and the continuously expanding list of autoimmune pathologies and autoantigens5 necessitate investigations into the role of molecular mimicry in the triggering of autoimmunity.6 Molecular mimicry, i.e., the sharing of a linear amino acid sequence or a conformational fit between a microbe and a host self determinant, has been and still is the predominant field of investigation in autoimmunity research.7–19 Recently, we reported that viral proteins overlap extensively with the human proteome,20,21 with only a limited number of viral pentamers not found in the human proteome. In contrast to the dominant tendency to causally associate viral infections and autoimmune diseases, these findings support the view that molecular mimicry is over-emphasized as a critical mechanism in autoimmune disease pathogenesis. We reasoned that, if there were a causal link between viral infections and autoimmune reactions, then the documented extent of viral peptide overlap with the human proteome would imply that the entire human population should suffer from autoimmunity. In addition, the analysis of a number of bacterial proteomes for amino acid sequence similarity to the human proteome demonstrated the sharing of hundreds of nonamer sequences between bacterial and human proteomes.22 Again, these data appear important for defining the current molecular mimicry model and, more generally, for understanding basic mechanisms in pathology and directing research towards new directions. Here, as a further step in our studies on autoimmunity mechanisms, we detail the bacterial versus human peptide overlap at the 5-, 6-, 7- and 8-mer levels, and demonstrate that no human proteins are exempt from the presence of bacterial motifs.
Forty bacterial proteomes, 20 pathogenic and 20 non-pathogenic, were analyzed for peptide sharing with the human proteome at the penta-, hexa-, hepta- and octapeptide level to examine bacterial-versus-human similarity. Peptide similarity analysis of bacterial proteomes versus the human proteome was conducted as already described in detail20–22 and produced the data illustrated in Table 1.
Table 1, last line, shows that combining all bacterial proteomes into one protein set and then computing the overlap of this set with the human proteome gives 15,260,383 perfect pentapeptide matches distributed throughout 36,014 human proteins; 9,133,718 perfect hexapeptide matches distributed throughout 36,014 human proteins; 1,643,139 perfect heptapeptide matches distributed throughout 35,906 human proteins; and 200,708 perfect octapeptide matches distributed throughout 31,170 human proteins. That is, the bacterial-versus-human overlap at the penta- and hexapeptide levels spans the entire human proteome: there does not exist one human protein that does not host a bacterial hexapeptide. Only 104 of the 36,014 human proteins (i.e., about 0.3% of the human proteome) are exempt from bacterial heptapeptide motifs. The human proteins that do not harbor bacterial motifs at the heptapeptide level are listed in Box 1.
Actually, the heptapeptide sharing between bacteria and human proteome is extensive and massive as schematized in the circle graph of Figure 1, illustrating the percentage distribution of bacterial heptapeptides throughout the human proteome.
The bacterial motif distribution through the human proteome decreases at the octamer level, but is still impressive: only 4,844 human proteins (just 13.44% of the human proteome) are exempt from bacterial 8-mers.
In addition, Table 1 shows that the microbe's pathogenicity does not affect the level of bacterial overlap with the human proteins (see the percentage of unique bacterial n-mers that occur in the human proteome, columns 4, 8, 12 and 16 in Table 1). As a further confirmation, log-log plotting the bacterial 5-mer occurrences in the human proteome as a function of the bacterial proteome length produces the graph illustrated in Figure 2. It can be seen that the bacterial versus human overlap is independent of the microbe's pathogenicity and, as expected, depends almost exclusively on the size of the bacterial proteome.
Table 1 suggests that molecular mimicry between a self and a microbial peptide antigen cannot be assumed as a single or exclusive basis for autoimmune pathologies. It should also be underlined that the data reported above cover only 40 bacterial proteomes. Therefore, taking into account that the human organism hosts hundreds of bacterial organisms amounting to trillions of bacterial cells,25,26 this study greatly understates the level of overlap between bacterial and human proteomes. But even considering a single bacterial protein, we are presented with a marked bacterial-versus-human peptide commonality. In this regard, scientifically relevant models are offered by Klebsiella pneumoniae and Proteus mirabilis. In the past decades, there has been an intensive scientific debate because of a consecutive sequence of six amino acids, Gln-Thr-Asp-Arg-Glu-Asp (QTDRED), shared between HLA B27.1 and the nitrogenase reductase enzyme of K. pneumoniae.27 This sequence commonality was invoked as a possible structural basis for cross-reactivity and as a cause of ankylosing spondylitis. Following a profusion of inconclusive papers on the issue,27–33 attention subsequently shifted to the molecular mimicry between human motifs (EQRRAA and LRREI) and P. mirabilis peptide sequences (ESRRAL and IRRET) as a possible aetiological basis for autoimmune rheumatoid arthritis.34
Peptide overlap analysis shows that K. pneumoniae and P. mirabilis proteomes have a peptide platform in common with the human proteome. As examples, Table 2 shows the heptapeptide sharing between the human proteome and K. pneumoniae ATP-binding protein and P. mirabilis ATP-dependent RNA helicase protein. The K. pneumoniae ATP-binding protein (UniProt accession number: B5XMS5_KLEP3, aa 1–233) is formed by 227 heptamers, 36 of which are present in the human proteome. Analogously, P. mirabilis RNA helicase protein (UniProtKB: B4ESW0_PROMH, aa 1–465, formed by a total of 459 heptamers) has 190 perfect heptapeptide sequences in common with the human proteome (Table 2). In conclusion, data from Table 2 further demonstrate that molecular mimicry cannot be considered as a single or exclusive causal factor in the genesis of autoimmune phenomena.
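The heptamer totals quoted above follow from the usual sliding-window count: a sequence of L residues scanned one residue at a time contains L - n + 1 windows of length n. The short sketch below reproduces those totals and the resulting shared fractions; the function name `window_count` is an illustrative assumption and is not taken from the original analysis programs.

```python
def window_count(length, n):
    """Number of overlapping n-mers in a sequence of `length` residues,
    assuming the window advances by one residue at a time."""
    return max(length - n + 1, 0)


# Figures quoted in the text: (protein length, heptamers shared with human)
proteins = {
    "K. pneumoniae ATP-binding protein": (233, 36),
    "P. mirabilis ATP-dependent RNA helicase": (465, 190),
}

for name, (length, shared) in proteins.items():
    total = window_count(length, 7)
    percent = 100 * shared / total
    print(f"{name}: {shared}/{total} heptamers shared ({percent:.1f}%)")
```

Under these assumptions the counts agree with the 227 and 459 heptamers stated in the text, so roughly 16% and 41% of the heptamers of the two proteins, respectively, also occur in the human proteome.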
Using a set of forty bacterial proteomes, this study shows that there does not exist a human protein exempt from a bacterial peptide overlap at the penta- and hexapeptide level and that only 104 proteins out of 36,014 do not host a microbial heptapeptide overlap. This finding is remarkable from a biological point of view and further supports a non-stochastic nature of the peptide overlap between microbial and human proteomes. Our considerations are the following. Taking heptamer motifs as an example, we calculate that there are 1,280,000,000 possible heptapeptides (20 amino acids at each of 7 positions) that are theoretically available and might be used to build a human proteome exempt from bacterial heptapeptides. On the other hand, we know that the human proteome is formed by 36,103 proteins and 15,697,964 occurrences of 10,431,975 unique 7-mers21 and, in the present study, we find that only 104 human proteins out of 36,014 do not host a microbial heptapeptide overlap. That is, in the face of the enormously high number of potential heptapeptides (1,280,000,000), the human proteome not only presents a high degree of repetitiveness in its 7-mer composition, but also utilizes heptapeptides common to bacteria, so that almost no human protein is exempt from bacterial heptapeptide motifs. This peptide commonality has no mathematical justification. There is no shortage of possible heptapeptides; rather, an enormous number of heptapeptides are potentially available. Thus, we are forced to conclude that the redundancy present in the protein world is not stochastic (i.e., is not pure random chance), but reflects strong peptide usage bias.21,35,36
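The arithmetic behind this argument can be checked in a few lines. The sketch below assumes the standard 20-letter amino acid alphabet and uses only the counts quoted in the text; the function name `heptapeptide_space` and the derived ratios are illustrative additions, not values reported by the study.

```python
def heptapeptide_space(alphabet_size=20, n=7):
    """Number of distinct n-mers over an alphabet of `alphabet_size` letters."""
    return alphabet_size ** n


# Counts quoted in the text for the human proteome (7-mers)
occurrences_7mer = 15_697_964   # total heptamer occurrences
unique_7mer = 10_431_975        # distinct heptamers

possible = heptapeptide_space()                # 20**7
mean_reuse = occurrences_7mer / unique_7mer    # average occurrences per distinct 7-mer
fraction_used = unique_7mer / possible         # share of the theoretical space in use

print(f"possible heptapeptides: {possible:,}")
print(f"mean occurrences per unique human 7-mer: {mean_reuse:.2f}")
print(f"fraction of heptapeptide space used: {fraction_used:.2%}")
```

On these figures the human proteome draws on less than 1% of the theoretical heptapeptide space, which is the sense in which the text says there is no shortage of possible heptapeptides.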
CATRl; COAS3; CT187; CU094; FA27L; KR124; KR192; KR211; KR410; KR412; KR413; KR414; KRA42; KRA44; KRA47; KRA81; LCE2B; MTIF; MTIG; MTIM; MT2; MT4; RL41; SPHAR; SPR2A; SPR2B; SPR2D; SPR2E; SPR2F; TRGll; Q2XP30; Q9MY73; Q05CR9; Q05CT7; Q0QVY9; Q6EHZ1; Q6PK85; Q7Z4Q0; Q86YX3; Q9HCX8; Q9P1F9; Q5FC06; Q86XP7; Q8IVI0; Q9NY32; Q9U153; Q6JTU6; Q6ZVA9; Q86TX6; Q81WU1; Q8NI73; Q96EQ2; Q9BXV1; Q9H325; Q5HYP9; Q6GZ88; Q7Z5A1; Q9H3A8; Q9P145; Q9P1I0; Q4VFV5; Q8IVH9; Q9NZ11; Q9P1F8; Q13254; Q6AWA8; Q7Z425; Q96S45; A0A4R1; A0MA52; Q07603; Q8WYR5; Q96Q13; Q9BZU2; Q9NYD4; Q9P1E0; Q9P1E9; Q9UI79; Q147W9; Q6JV79; Q6JV82; Q9HAZ7; A2RUG3; Q96IP2; Q31629; Q495H9; Q5JT78; Q5JVP1; Q5T7W9; Q5TAP0; Q68K28; Q6ZQP6; Q71M31; Q7LCP5; Q7Z4E0; Q86SX0; Q86YX6; Q8NG36; Q8WV73; MORN4; Q96IR5; Q96JR7; Q9BZU0; Q9UI80
Box 1. Human proteins that do not harbor bacterial heptapeptide motifs, reported as accession numbers.
In addition, in light of the intensive research dedicated to understanding the function or effect of a single bacterial match in a human protein in the search for pathological correlates,27–34 the present data are striking and seem to overturn our conceptualization of the relationship(s) between microbes and Homo sapiens. Actually, the data reported in this study are logical when analyzed in the light of the phylogenetic background linking bacteria and eukaryotes. Cells are of only two kinds: bacteria (or prokaryotes) and eukaryotes, which evolved from bacteria, possibly as recently as 800–850 million years ago. As described in detail by Cavalier-Smith,37 eukaryogenesis involved radical changes in almost every metabolic and structural aspect of the bacterial cell, with a reorganization of the membrane and cytoskeleton apparatus and new chromosomal relationships to originate the eukaryotic nucleus and mitotic cycle. In this cell re-organization, new eukaryotic proteins evolved from old bacterial ones. Therefore, we can conclude that the data from Tables 1 and 2 find a proper explanation in the evolutionary history of eukaryotes.
When analyzed from a pathological-clinical point of view, this report is of crucial importance in the study of autoimmune diseases for three reasons. As already discussed above, the data are of special relevance as regards the molecular mimicry hypothesis. Indeed, the molecular mimicry hypothesis suggests that, when bacterial/viral agents share epitopes with a host's protein, an immune response against the infectious agent may result in the formation of cross-reacting antibodies that bind the shared epitopes on the normal cell and result in the auto-destruction of the cell.6 However, the extensive sequence similarity between bacteria and human proteins documented in this study suggests that molecular mimicry between a self and a microbial peptide antigen is inadequate to explain autoimmune pathologies.
Second, this study might contribute to explaining the microbial immune escape phenomenon. The scientific and clinical literature has long debated, and continues to debate, the escape of microbes from immune control. A number of hypotheses and possible mechanisms have been proposed,38–40 albeit with scarce results. Here, the quantitative analysis of n-peptide overlap between bacterial and human proteomes reported in Tables 1 and 2 offers a logical and rational explanation to the vexata quaestio of microbial escape from immune surveillance. Indeed, the present data and our past studies20,21 document that microbes are “a portion” of our human self and, consequently, presumably are subject to the same tolerance mechanisms that characterize human antigens and tissues. As a matter of fact, most chronic diseases, including pertussis,41 tuberculosis,42 leishmaniasis,43 periodontitis,44 and gastritis,45 to cite only a few, occur because an appropriate immune response required for pathogen clearance is not established. This causes a long-term pathogen colonization favored and progressively auto-sustained by pathogen-encoded molecules that enable the suppression of the host immune response. The progressively increasing bacterial burden then causes a vicious cycle of bacterial proliferation and host tissue inflammation that translates into tissue damage, impaired function and eventual disease.
The third and most crucial consequence of this study is related to current anti-infectious vaccine preparations. Possibly as a consequence of immunotolerance mechanisms towards repeatedly shared peptide sequences, in general active vaccines produce a weak immune response; also, autoimmune cross-reactions are extremely rare events.46–50 Under normal, non-stimulated conditions, the immune system fails to make immune responses to the infectious antigens present in the vaccines unless adjuvants are added.17,48,49 As a rule, the current active vaccine formulations contain adjuvants to enhance immunogenicity.50–52 The adjuvants serve to activate the immune system against microbial antigens that by themselves do not evoke immune responses, but rather are immunotolerated. However, as demonstrated in this and other studies of ours,20–22 microbial antigens contain a high number of motifs shared with human proteins. Therefore, using viral or bacterial antigens in adjuvanted active vaccines will possibly trigger the immune system to react against the shared motifs (i.e., not only against the microbial antigen(s), but also against human self-molecules) with the concrete risk of developing adverse events and autoimmune pathologies in the human population.53–56
The human proteome was downloaded from UniProtKB (www.ebi.ac.uk),23 and duplicated sequences and fragments were filtered out. After filtering, we were left with a human proteome consisting of 36,014 unique proteins, for a total of 15,806,702 amino acids. Bacterial proteomes were downloaded from Integr8,23 and each bacterial proteome was filtered in the same manner as the human proteome. The bacteria were chosen based on the following criteria: (1) known to be non-pathogenic or pathogenic; (2) phylogenetically different; (3) have proteomes established to a significant degree of completeness. In addition, the bacterial proteomes were chosen to span a range of proteome sizes, with the smallest bacterial proteome being 450,406 and 306,369 amino acids (for non-pathogenic and pathogenic bacteria, respectively), the largest being 2,582,740 and 1,439,163 amino acids (for non-pathogenic and pathogenic bacteria, respectively).
Sequence similarity analysis of each bacterial proteome to the human proteome was carried out using bacterial n-mers (with n from 5 to 8) sequentially overlapped by 4, 5, 6 and 7 residues, respectively. The scans were performed by custom programs written in C, which utilized suffix trees for efficiency.24 The bacterial proteomes were manipulated and analyzed as follows. Each bacterial proteome was decomposed in silico into a set of penta-, hexa-, hepta- or octamers (including all duplicates). A library of unique penta-, hexa-, hepta- or octamers for each microbial proteome was then created by removing duplicates. Next, for each n-mer in the library, the entire human proteome was searched for instances of the same n-mer. Any such occurrence was termed an overlap or match. Cursory analyses (e.g., identification of unique overlapping n-mers, counts of unique overlapping n-mers, counts of duplications) were performed using LINUX/UNIX shell scripts and standard LINUX/UNIX utilities.
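The procedure described above can be illustrated with a minimal Python sketch. It is not the original implementation: the study used custom C programs with suffix trees, whereas this sketch substitutes a plain hash-set lookup, which gives the same exact-match results on small inputs. The function names and the toy sequences are hypothetical and serve only to show the steps (decompose, deduplicate, scan the human proteome, count matches and proteins hit).

```python
def kmers(sequence, n):
    """Yield every n-mer of `sequence`, advancing one residue at a time
    (consecutive n-mers therefore overlap by n - 1 residues)."""
    for i in range(len(sequence) - n + 1):
        yield sequence[i:i + n]


def build_library(proteome, n):
    """Set of unique n-mers over all proteins of a proteome.
    `proteome` maps a protein name to its amino acid sequence."""
    return {k for seq in proteome.values() for k in kmers(seq, n)}


def overlap(bacterial, human, n):
    """Return (match occurrences, unique shared n-mers, human proteins hit)."""
    library = build_library(bacterial, n)
    occurrences = 0
    shared = set()
    proteins_hit = set()
    for name, seq in human.items():
        for k in kmers(seq, n):
            if k in library:          # exact (perfect) match only
                occurrences += 1
                shared.add(k)
                proteins_hit.add(name)
    return occurrences, shared, proteins_hit


# Toy, made-up sequences for illustration only
bacterial = {"b1": "MKTAYIAKQR", "b2": "AYIAKAYIAK"}
human = {"h1": "GGAYIAKQRLL", "h2": "PPPPPPPP", "h3": "TAYIAKTAYIAK"}

occurrences, shared, proteins_hit = overlap(bacterial, human, 5)
print(occurrences, sorted(shared), sorted(proteins_hit))
```

A set-based lookup keeps the sketch short; for full proteomes with millions of residues, an indexed structure such as the suffix trees mentioned in the text would be the more memory-conscious choice.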
Previously published online: www.landesbioscience.com/journals/selfnonself/article/13315
B.T., G.L. and A.S. performed the computational analysis. M.B. and A.K. performed the mathematical analysis and supervised the computational analysis. D.K. proposed the original idea, interpreted the data, developed the research project and wrote the manuscript. All authors discussed the results and revised and commented on the manuscript with a particular contribution from A.K.