Forty bacterial proteomes—20 pathogens and 20 non-pathogens—were examined for amino acid sequence similarity to the human proteome. All bacterial proteomes, independent of their pathogenicity, share hundreds of nonamer sequences with the human proteome. This overlap is very widespread, with one third of human proteins sharing at least one nonapeptide with one of these bacteria. On the whole, the bacteria-versus-human nonamer overlap is numerically defined by 47,610 total perfect matches disseminated through 10,701 human proteins. These findings open new perspectives on the immune relationship between bacteria and host, and might help our understanding of fundamental phenomena such as self-nonself discrimination and tolerance versus auto-reactivity.
The completeness of the current protein databases represents a scientific turning point for comparatively analyzing and evaluating commonalities and differences among well-defined available proteomes. Our labs are taking advantage of this unique chance to investigate the molecular determinants possibly involved in human susceptibility to infectious agents.1–3 We recently analyzed a set of viral proteomes for sequence similarity to the human proteome, and reported a massive and widespread peptide overlap between viral and human proteins.4,5 Here we analyze a set of 20 pathogenic and 20 non-pathogenic bacterial proteomes, and report that all of the bacterial proteomes studied exhibit an unexpectedly high level of peptide sharing with the human proteome, irrespective of the microbe's pathogenicity.
Quantitative analysis of nonapeptide overlap between bacterial proteomes and the human proteome is reported in Table 1. The table shows that all 40 bacterial proteomes under analysis exhibit substantial, widespread nonamer overlap with the human proteome.
The overlap between the 40 bacterial proteomes and the human proteome consists of a total of 47,610 perfect matches disseminated through 10,701 human proteins. In other words, about 50,000 perfect matches, each 9 amino acids long, are shared between the 40 bacterial proteomes described in Table 1 and about one third of the human proteome. The bacterial versus human overlap is independent of the microbe's pathogenicity. We find that, as expected, the extent of the bacterial overlap depends almost exclusively on the size of the bacterial proteome. Indeed, the size of the bacterial proteome (in terms of number of unique nonamers) is positively correlated (r ≥ 0.891) with the three other variables: the number of unique overlaps in the human proteome; the total number of overlaps in the human proteome, including repeats; and the number of human proteins involved in the overlap. All of these correlations are statistically significant (p < 0.01).
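As an illustration of the statistic quoted above, the following minimal Python sketch computes a Pearson correlation coefficient and a least-squares line for two paired lists, such as library size against number of overlaps. It is not the code used in this study; the function names `pearson_r` and `least_squares_line` are hypothetical, and no values from Table 1 are reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, `xs` would hold the number of unique nonamers for each bacterial proteome and `ys` one of the three overlap variables, giving one coefficient per variable.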
These data have important implications for the link between microbial infections, molecular mimicry, and autoimmunity. Molecular mimicry is based on the principle that infectious agents initiate and sustain an autoimmune reaction by generating autoreactive B and/or T lymphocytes that simultaneously recognize cross-reactive determinants from both the original infectious agent and the host. This sharing of amino acid sequences on proteins from self- and nonself-sources (i.e., host and virus/bacterium) is the fundamental essence of the molecular mimicry concept.6,7 We note that molecular mimicry may involve both linear and conformational antigenic determinants. Since the data reported in this paper represent possible linear, but not conformational epitopes, the numbers given actually understate the level of epitopic overlap between bacterial and human proteomes. Consequently, although our data suggest an impressive potential for cross-reactivity between bacterial and human proteins, this potential must surely be even greater than our numbers indicate.
A considerable number of classical and recent reports have suggested molecular mimicry as a pathogenic mechanism in a wide range of diseases. These include acute rheumatic fever, reactive arthritis after enteric infection or associated with Reiter's syndrome, myasthenia gravis, rheumatoid arthritis, insulin-dependent diabetes, ankylosing spondylitis, Guillain-Barré syndrome, autoimmune hepatitis and primary biliary cirrhosis, neurological diseases such as multiple sclerosis and other demyelinating pathologies, and even the atherosclerotic plaque.8–14
In contrast, the results presented here are consistent with a number of other reports in which the elusive character of the molecular mimicry hypothesis has been underlined.15–27 Our past4,5 and present data tend to exclude a causal mechanistic role for molecular mimicry in the genesis of autoimmunity. According to the molecular mimicry hypothesis, the widespread overlap between viral and bacterial proteomes and the human proteome (see Table 1 and ref. 5) would predict that autoimmune diseases should have a much higher incidence than actually observed, both in the total number of individuals affected and the number of autoimmune pathologies per individual. Thus, it is difficult to reconcile the enormous number of viral and bacterial peptides disseminated throughout the human proteins with a fundamental role for molecular mimicry in the etiology of certain autoimmune conditions.
Instead, we believe that the high number of bacterial sequences that are also found in the human proteome, but are not clinically relevant in terms of inducing autoimmune diseases, offers a mechanistic basis for an additional microbial immune evasion strategy. Through evolution and adaptation, microbes have developed strategies that allow them to evade the immune system of their host. Such tactics promote infectious persistence and chronicity; among others, these include the altered peptide ligands of the circumsporozoite protein in malaria;28 macrophage apoptosis in microbial infections by Shigella;29 antigenic variations in Trypanosoma cruzi;30 and the consumption/degradation of complement components in microbial organisms like Porphyromonas gingivalis and Trichomonas vaginalis.31 The high level of peptide sharing between microbial and human proteomes might represent a camouflage mechanism that protects microbes from the immune attack of the host, possibly acting through the regulatory T cells that provide critical control of unwanted autoimmune responses. In a wider context, the high level of exact peptide sharing between microbial and human proteomes suggests that post-translational modifications (e.g., glycosylation, cysteinylation, citrullination) should be reconsidered as a factor that may contribute to the creation or disruption of microbial epitopes.32
Finally, from an evolutionary point of view, the massive and repeated distribution of bacterial amino acid sequences throughout the human proteome seems to indicate that bacterial and human proteins are composed of common peptide backbone units and suggests the existence of a common structural platform in the composition of proteomes, be they microbial or human.1,33
The human proteome was downloaded from Integr8 (www.ebi.ac.uk/integr8),34 and contained 38,009 proteins at the time that it was downloaded. To reduce sequence redundancy, all possible pairs of proteins in this proteome were examined. For a given pair, if the sequences were identical then one sequence was arbitrarily chosen for deletion; if one sequence was a fragment of the other sequence, then the fragment was deleted. After filtering, we were left with a human proteome consisting of 36,014 unique proteins, for a total of 15,806,702 amino acids.
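The redundancy filter described above can be sketched in Python as follows. This is a simplified illustration rather than the program used in this study: the function name `filter_redundant` and the identifier-to-sequence dictionary are assumptions, and the pairwise substring test is quadratic in the number of proteins, so it is suited to small examples only.

```python
def filter_redundant(proteins):
    """Drop exact duplicates and sequences that are fragments of another sequence.

    `proteins` maps a (hypothetical) identifier to an amino acid sequence.
    Returns a new dict containing only the retained entries.
    """
    # Visit longest sequences first, so a sequence can only be a duplicate
    # or a fragment of something that has already been kept.
    ordered = sorted(proteins.items(), key=lambda item: len(item[1]), reverse=True)
    kept = {}
    for ident, seq in ordered:
        # An identical sequence or a fragment is a substring of a kept sequence.
        if any(seq in longer for longer in kept.values()):
            continue
        kept[ident] = seq
    return kept
```

Sorting by length means that when two sequences are identical, the one listed first is retained, which matches the arbitrary choice described in the text.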
Like the human proteome, all bacterial proteomes were downloaded from Integr8.34 The set of pathogenic bacteria was acquired by searching EBI's list of bacteria (www.ebi.ac.uk/genomes/bacteria.html) for those that cause disease in humans. The set of non-pathogenic bacteria was acquired by arbitrarily choosing bacteria listed on the Integrated Microbial Genomes (IMG) website (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi)35 that contain the annotation “Disease: none.” Although the IMG site contains downloadable proteomes for each organism, these proteomes were downloaded from Integr8 instead of the IMG site in order to maintain consistency with the pathogenic bacteria. Each bacterial proteome was filtered in the same manner as the human proteome. The 40 filtered bacterial proteomes consisted of 128,248 unique proteins for a total of 39,651,163 amino acids.
Sequence similarity analysis of each of the 40 bacterial proteomes to the human proteome was carried out using bacterial nonamers sequentially overlapped by eight residues. The scans were performed by custom programs written in C, which utilized suffix trees for efficiency.36 The bacterial proteomes were manipulated and analyzed as follows. Each bacterial proteome was decomposed in silico to a set of nonamers (including all duplicates). A library of unique nonamers for each microbial proteome was then created by removing duplicates. Next, for each nonamer in the library, the entire human proteome was searched for instances of the same nonamer. Any such occurrence was termed an overlap or match. Cursory analyses (e.g., identification of unique overlapping nonamers, counts of unique overlapping nonamers, counts of duplications) were performed using shell scripts and standard LINUX/UNIX utilities. Linear least-squares regression was performed to determine whether any linear relationships exist between the size of a given bacterial proteome and its level of overlap to the human proteome.
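The decomposition and matching steps above can be illustrated with a short Python sketch. Note the substitution: the original scans used custom C programs built on suffix trees, whereas this sketch uses a hash-based index of human nonamers, which yields the same exact matches on small inputs but is not the authors' implementation. All function names and the dictionary layout are hypothetical.

```python
from collections import defaultdict

K = 9  # nonamer length used in the text

def kmers(seq, k=K):
    """All windows of length k, each shifted by one residue (overlap of k - 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def index_proteome(proteins, k=K):
    """Map each k-mer to the list of (protein id, position) where it occurs."""
    index = defaultdict(list)
    for ident, seq in proteins.items():
        for pos, pep in enumerate(kmers(seq, k)):
            index[pep].append((ident, pos))
    return index

def overlap_stats(bacterial, human_index, k=K):
    """Count perfect k-mer matches between one bacterial proteome and the human index."""
    # Library of unique bacterial k-mers (duplicates removed by the set).
    library = {pep for seq in bacterial.values() for pep in kmers(seq, k)}
    # Bacterial k-mers that occur at least once in the human proteome.
    shared = [pep for pep in library if pep in human_index]
    # Every human occurrence of a shared k-mer, repeats included.
    hits = [hit for pep in shared for hit in human_index[pep]]
    return {
        "library_size": len(library),
        "unique_overlaps": len(shared),
        "total_overlaps": len(hits),
        "human_proteins": len({ident for ident, _ in hits}),
    }
```

The four returned counts correspond to the variables compared in the regression: library size, unique overlaps, total overlaps including repeats, and the number of human proteins involved.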
B.T. performed the computational analysis. A.K. provided bioinformatics expertise and supervised the computational analysis. G.L. developed initial analyses of bacterial proteomes, validated them with the PIR perfect match program, and analyzed output data. D.K. proposed the original idea, supervised the work, interpreted the data and wrote the paper. All four authors revised the paper, with a major contribution by B.T.
Funding for this work was provided by the Ministry of University and Research of Italy (MIUR) and the Natural Sciences and Engineering Research Council of Canada (NSERC).
Previously published online: www.landesbioscience.com/journals/selfnonself/article/9588