Self Nonself. 2010 Jan-Mar; 1(1): 71–74.
PMCID: PMC3091599

Bacterial peptides are intensively present throughout the human proteome


Forty bacterial proteomes (20 pathogens and 20 non-pathogens) were examined for amino acid sequence similarity to the human proteome. All bacterial proteomes, independent of their pathogenicity, share hundreds of nonamer sequences with the human proteome. This overlap is very widespread: about one third of human proteins share at least one nonapeptide with at least one of these bacteria. Overall, the bacteria-versus-human nonamer overlap amounts to 47,610 perfect matches distributed across 10,701 human proteins. These findings open new perspectives on the immune relationship between bacteria and host, and might help our understanding of fundamental phenomena such as self-nonself discrimination and tolerance versus auto-reactivity.

Keywords: bacterial proteomes, human proteome, similarity screening, peptide sharing, self-nonself discrimination, tolerance versus auto-reactivity


The completeness of current protein databases represents a scientific turning point for comparatively analyzing and evaluating the commonalities and differences among well-defined proteomes. Our labs are taking advantage of this opportunity to investigate the molecular determinants possibly involved in human susceptibility to infectious agents.1–3 We recently analyzed a set of viral proteomes for sequence similarity to the human proteome, and reported massive and widespread peptide overlap between viral and human proteins.4,5 Here we analyze a set of 20 pathogenic and 20 non-pathogenic bacterial proteomes, and report that all of them exhibit an unexpectedly high level of peptide sharing with the human proteome, irrespective of the microbe's pathogenicity.

Results and Discussion

Quantitative analysis of nonapeptide overlap between bacterial proteomes and the human proteome is reported in Table 1. The table shows that all 40 bacterial proteomes under analysis exhibit substantial, widespread nonamer overlap with the human proteome.

Table 1
Overlap between bacterial proteomes and the human proteome at the nonamer level

The overlap between the 40 bacterial proteomes and the human proteome consists of a total of 47,610 perfect matches distributed across 10,701 human proteins. In other words, nearly 50,000 exact nine-residue matches link the 40 bacterial proteomes described in Table 1 to about one third of the human proteome (10,701 of 36,014 proteins). The bacterial versus human overlap is independent of the microbe's pathogenicity. As expected, the extent of the bacterial overlap depends almost exclusively on the size of the bacterial proteome. Indeed, the size of the bacterial proteome (in terms of number of unique nonamers) is positively correlated (r ≥ 0.891) with each of three other variables: the number of unique overlaps in the human proteome; the total number of overlaps in the human proteome, including repeats; and the number of human proteins involved in the overlap. All of these correlations are statistically significant (p < 0.01).
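The significance claim can be checked with standard formulas. The Python sketch below computes Pearson's r and the ordinary least-squares line for two equal-length series, plus the usual t statistic for testing r against zero. The paper does not state the test it used, so this assumes that each correlation was computed across the n = 40 proteomes and tested with a two-sided t test on n − 2 degrees of freedom. The function names are illustrative, and the per-proteome values from Table 1 are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Smallest reported coefficient, assuming it was computed over the 40 proteomes.
print(round(t_statistic(0.891, 40), 1))  # 12.1
```

A t of about 12 on 38 degrees of freedom lies far beyond the two-sided 1% critical value (roughly 2.7). Under these assumptions, r ≥ 0.891 is therefore consistent with the reported p < 0.01.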

These data have important implications for the link between microbial infections, molecular mimicry, and autoimmunity. Molecular mimicry is based on the principle that infectious agents initiate and sustain an autoimmune reaction by generating autoreactive B and/or T lymphocytes that recognize cross-reactive determinants shared by the infectious agent and the host. This sharing of amino acid sequences between proteins from self and nonself sources (i.e., host and virus/bacterium) is the essence of the molecular mimicry concept.6,7 We note that molecular mimicry may involve both linear and conformational antigenic determinants. Since the data reported in this paper represent possible linear, but not conformational, epitopes, the numbers given understate the level of epitopic overlap between bacterial and human proteomes. Consequently, although our data already suggest an impressive potential for cross-reactivity between bacterial and human proteins, the actual potential is likely even greater than our numbers indicate.

A considerable number of classical and recent reports have suggested molecular mimicry as a pathogenic mechanism in a wide range of diseases. These include acute rheumatic fever, reactive arthritis following enteric infection or associated with Reiter's syndrome, myasthenia gravis, rheumatoid arthritis, insulin-dependent diabetes, ankylosing spondylitis, Guillain-Barré syndrome, autoimmune hepatitis and primary biliary cirrhosis, neurological diseases such as multiple sclerosis and other demyelinating pathologies, and even atherosclerotic plaque formation.8–14

In contrast, the results presented here are consistent with a number of other reports that have underlined the elusive character of the molecular mimicry hypothesis.15–27 Our past4,5 and present data tend to exclude a causal mechanistic role for molecular mimicry in the genesis of autoimmunity. If molecular mimicry were a major cause of autoimmunity, the widespread overlap between viral and bacterial proteomes and the human proteome (see Table 1 and ref. 5) would predict a much higher incidence of autoimmune disease than is actually observed, both in the total number of individuals affected and in the number of autoimmune pathologies per individual. Thus, it is difficult to reconcile the enormous number of viral and bacterial peptides disseminated throughout human proteins with a fundamental role for molecular mimicry in the etiology of certain autoimmune conditions.

Instead, we believe that the large number of bacterial sequences that are also found in the human proteome, yet are not clinically relevant in terms of inducing autoimmune diseases, offers a mechanistic basis for an additional microbial immune evasion strategy. Through evolution and adaptation, microbes have developed strategies that allow them to evade the host immune system. Such tactics promote infectious persistence and chronicity; examples include altered peptide ligands of the circumsporozoite protein in malaria;28 macrophage apoptosis in Shigella infections;29 antigenic variation in Trypanosoma cruzi;30 and the consumption/degradation of complement components by microbial organisms such as Porphyromonas gingivalis and Trichomonas vaginalis.31 The high level of peptide sharing between microbial and human proteomes might represent a camouflage mechanism that protects microbes from immune attack by the host, possibly acting through the regulatory T cells that provide critical control of unwanted autoimmune responses. In a wider context, the high level of exact peptide sharing between microbial and human proteomes suggests that post-translational modifications (e.g., glycosylation, cysteinylation, citrullination) should be reconsidered as a factor that may contribute to the creation or disruption of microbial epitopes.32

Finally, from an evolutionary point of view, the massive and repeated distribution of bacterial amino acid sequences throughout the human proteome seems to indicate that bacterial and human proteins are composed of common peptide backbone units and suggests the existence of a common structural platform in the composition of proteomes, be they microbial or human.1,33

Materials and Methods

The human proteome was downloaded from Integr8,34 and contained 38,009 proteins at the time of download. To reduce sequence redundancy, all possible pairs of proteins in this proteome were examined. For a given pair, if the sequences were identical, one of them was arbitrarily chosen for deletion; if one sequence was a fragment of the other, the fragment was deleted. After filtering, we were left with a human proteome of 36,014 unique proteins, totaling 15,806,702 amino acids.
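The pairwise filtering rule can be sketched in Python as follows. This is a brute-force illustration, not the authors' program. It assumes that a "fragment" means a contiguous substring of a longer sequence, and the function name remove_redundant and the toy IDs are hypothetical. Processing the longest sequences first means every candidate parent is already retained by the time its fragments and duplicates are tested.

```python
def remove_redundant(proteins):
    """Drop exact duplicates and sequences that are contiguous fragments of others.

    proteins: dict mapping a protein ID to its amino acid sequence.
    Returns a new dict containing only the retained proteins.
    """
    # Longest first; sorted() stays stable with reverse=True, so among identical
    # sequences the one listed first is kept (arbitrary but reproducible).
    ordered = sorted(proteins.items(), key=lambda item: len(item[1]), reverse=True)
    kept = {}
    for pid, seq in ordered:
        # Both an identical sequence and a fragment are substrings of a kept sequence.
        if any(seq in other for other in kept.values()):
            continue
        kept[pid] = seq
    return kept


toy = {"P1": "MKTAYIAKQR", "P2": "TAYIAK", "P3": "MKTAYIAKQR", "P4": "GGSW"}
print(sorted(remove_redundant(toy)))  # ['P1', 'P4']
```

The nested scan is quadratic in the number of proteins, which matches the "all possible pairs" description. For proteome-scale inputs, an indexed approach would be considerably faster.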

As with the human proteome, all bacterial proteomes were downloaded from Integr8.34 The set of pathogenic bacteria was acquired by searching EBI's list of bacteria for those that cause disease in humans. The set of non-pathogenic bacteria was acquired by arbitrarily choosing bacteria listed on the Integrated Microbial Genomes (IMG) website35 that carry the annotation "Disease: none." Although the IMG site provides downloadable proteomes for each organism, these proteomes were downloaded from Integr8 rather than IMG in order to maintain consistency with the pathogenic bacteria. Each bacterial proteome was filtered in the same manner as the human proteome. The 40 filtered bacterial proteomes consisted of 128,248 unique proteins, totaling 39,651,163 amino acids.
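Each filtered protein is subsequently decomposed into overlapping nonamers, so the totals above also fix the size of the search. A protein of length L yields L − 8 nonamers when L ≥ 8 (none otherwise). A proteome of N proteins with A residues in total therefore contains at least A − 8N nonamer windows, duplicates included, with equality when no protein is shorter than eight residues. Applied to the reported figures, this back-of-the-envelope count gives:

```latex
W = \sum_{i=1}^{N} \max(0,\, L_i - 8) \;\ge\; \sum_{i=1}^{N} (L_i - 8) = A - 8N

\begin{aligned}
W_{\text{bacteria}} &\ge 39{,}651{,}163 - 8 \times 128{,}248 = 38{,}625{,}179,\\
W_{\text{human}}    &\ge 15{,}806{,}702 - 8 \times 36{,}014 = 15{,}518{,}590.
\end{aligned}
```

These are window counts before duplicate removal. The per-proteome libraries of unique nonamers are smaller.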

Sequence similarity of each of the 40 bacterial proteomes to the human proteome was analyzed using bacterial nonamers overlapping by eight residues (i.e., a nine-residue window advanced one residue at a time). The scans were performed by custom programs written in C, which used suffix trees for efficiency.36 The bacterial proteomes were processed and analyzed as follows. Each bacterial proteome was decomposed in silico into a set of nonamers (including all duplicates). A library of unique nonamers for each microbial proteome was then created by removing duplicates. Next, for each nonamer in the library, the entire human proteome was searched for instances of the same nonamer. Any such occurrence was termed an overlap or match. Simple ancillary analyses (e.g., identification of unique overlapping nonamers, counts of unique overlapping nonamers, counts of duplications) were performed using shell scripts and standard Linux/UNIX utilities. Linear least-squares regression was performed to determine whether any linear relationships exist between the size of a given bacterial proteome and its level of overlap with the human proteome.
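The decomposition and matching steps can be illustrated with the short Python sketch below. It is not the authors' C implementation: instead of suffix trees, it uses a hash index (a dict) of every human nonamer, which answers the same exact-match question. The sketch also assumes one reading of the three overlap measures that the Results correlate with proteome size:

* unique overlaps are the distinct bacterial nonamers found in the human proteome;
* total overlaps count every human occurrence of those nonamers;
* involved proteins are the distinct human proteins containing at least one of them.

All names and toy sequences are hypothetical.

```python
from collections import defaultdict

K = 9  # peptide length used in the study (nonamers)


def kmers(seq, k=K):
    """Overlapping k-mers of seq; consecutive windows share k - 1 residues."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_human(human):
    """Map each human nonamer to a list holding one protein ID per occurrence."""
    index = defaultdict(list)
    for pid, seq in human.items():
        for pep in kmers(seq):
            index[pep].append(pid)
    return index


def overlap_stats(bacterial, human_index):
    """Overlap measures between one bacterial proteome and an indexed human proteome."""
    # Library of unique bacterial nonamers (duplicates removed).
    library = {pep for seq in bacterial.values() for pep in kmers(seq)}
    # Membership tests on a defaultdict do not create new keys.
    shared = {pep for pep in library if pep in human_index}
    return {
        "unique_bacterial_nonamers": len(library),
        "unique_overlaps": len(shared),
        "total_overlaps": sum(len(human_index[pep]) for pep in shared),
        "human_proteins": len({pid for pep in shared for pid in human_index[pep]}),
    }


human = {"H1": "A" * 10, "H2": "MKTAYIAKQRQISFVK"}
bacterium = {"B1": "GMKTAYIAKQRG", "B2": "A" * 9}
print(overlap_stats(bacterium, index_human(human)))
# {'unique_bacterial_nonamers': 5, 'unique_overlaps': 3, 'total_overlaps': 4, 'human_proteins': 2}
```

In the toy data, the poly-A nonamer occurs twice in H1, so the total overlap count (4) exceeds the unique overlap count (3). Under the assumptions above, running overlap_stats once per bacterial proteome against a single human index yields the per-proteome values that would feed the correlation analysis.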


B.T. performed the computational analysis. A.K. provided bioinformatics expertise and supervised the computational analysis. G.L. developed initial analyses of bacterial proteomes, validated them with the PIR perfect-match program, and analyzed the output data. D.K. proposed the original idea, supervised the work, interpreted the data and wrote the paper. All four authors revised the paper, with a major contribution by B.T.

Funding for this work was provided by the Ministry of University and Research of Italy (MIUR) and the Natural Sciences and Engineering Research Council of Canada (NSERC).

References
1. Kanduc D, Tessitore L, Lucchese G, Kusalik A, Farber E, Marincola FM. Sequence uniqueness and sequence variability as modulating factors of human anti-HCV humoral immune response. Cancer Immunol Immunother. 2008;57:1215–1223.
2. Capone G, De Marinis A, Simone S, Kusalik A, Kanduc D. Mapping the human proteome for nonredundant peptide islands. Amino Acids. 2008;35:209–216.
3. Kanduc D. "Self-nonself" peptides in the design of vaccines. Curr Pharm Des. 2009;28:3283–3289.
4. Kusalik A, Bickis M, Lewis C, Li Y, Lucchese G, Marincola FM, et al. Widespread and ample peptide overlapping between HCV and Homo sapiens proteomes. Peptides. 2008;28:1260–1267.
5. Kanduc D, Stufano A, Lucchese G, Kusalik A. Massive peptide sharing between viral and human proteomes. Peptides. 2008;29:1755–1766.
6. Oldstone MB. Molecular mimicry and immune-mediated diseases. FASEB J. 1998;12:1255–1265.
7. Oldstone MB. A suspenseful game of 'hide and seek' between virus and host. Nat Immunol. 2007;8:325–327.
8. Oomes PG, Jacobs BC, Hazenberg MP, Bänffer JR, van der Meché FG. Anti-GM1 IgG antibodies and Campylobacter bacteria in Guillain-Barre syndrome: evidence of molecular mimicry. Ann Neurol. 1995;38:170–175.
9. Cunningham MW. Autoimmunity and molecular mimicry in the pathogenesis of post-streptococcal heart disease. Front Biosci. 2003;8:533–543.
10. Lamb DJ, El-Sankary W, Ferns GA. Molecular mimicry in atherosclerosis: a role for heat shock proteins in immunisation. Atherosclerosis. 2003;167:177–185.
11. Ebringer A, Ahmadi K, Fielder M, Rashid T, Tiwana H, Wilson C, et al. Molecular mimicry: the geographical distribution of immune responses to Klebsiella in ankylosing spondylitis and its relevance to therapy. Clin Rheumatol. 1996;15:57–61.
12. Karopoulos C, Rowley MJ, Handley CJ, Strugnell RA. Antibody reactivity to mycobacterial 65 kDa heat shock protein: relevance to autoimmunity. J Autoimmun. 1995;8:235–248.
13. O'Donohue J, McFarlane B, Bomford A, Yates M, Williams R. Antibodies to atypical mycobacteria in primary biliary cirrhosis. J Hepatol. 1994;21:887–889.
14. Li de la Sierra I, Pernot L, Prangé T, Saludjian P, Schiltz M, et al. Molecular structure of the lipoamide dehydrogenase domain of a surface antigen from Neisseria meningitidis. J Mol Biol. 1997;269:129–141.
15. Markesich DC, Sawai ET, Butel JS, Graham DY. Investigations on etiology of Crohn's disease. Humoral immune response to stress (heat shock) proteins. Dig Dis Sci. 1991;36:454–460.
16. Richter W, Mertens T, Schoel B, Muir P, Ritzkowsky A, Scherbaum WA, et al. Sequence homology of the diabetes-associated autoantigen glutamate decarboxylase with coxsackie B4-2C protein and heat shock protein 60 mediates no molecular mimicry of autoantibodies. J Exp Med. 1994;180:721–726.
17. Horwitz MS, Bradley LM, Harbertson J, Krahl T, Lee J, Sarvetnick N. Diabetes induced by Coxsackie virus: initiation by bystander damage and not molecular mimicry. Nat Med. 1998;4:781–785.
18. Zhao R, Loftus DJ, Appella E, Collins EJ. Structural evidence of T cell xeno-reactivity in the absence of molecular mimicry. J Exp Med. 1999;189:359–370.
19. Tanaka A, Prindiville TP, Gish R, Solnick JV, Coppel RL, et al. Are infectious agents involved in primary biliary cirrhosis? A PCR approach. J Hepatol. 1999;31:664–671.
20. Verjans GM, Remeijer L, Mooy CM, Osterhaus AD. Herpes simplex virus-specific T cells infiltrate the cornea of patients with herpetic stromal keratitis: no evidence for autoreactive T cells. Invest Ophthalmol Vis Sci. 2000;41:2607–2612.
21. Kirby B, Al-Jiffri O, Cooper RJ, Corbitt G, Klapper PE, Griffiths CE. Investigation of cytomegalovirus and human herpes viruses 6 and 7 as possible causative antigens in psoriasis. Acta Derm Venereol. 2000;80:404–406.
22. Benoist C, Mathis D. Autoimmunity provoked by infection: how good is the case for T cell epitope mimicry? Nat Immunol. 2001;2:797–801.
23. Schloot NC, Willemen SJ, Duinkerken G, Drijfhout JW, de Vries RR, Roep BO. Molecular mimicry in type 1 diabetes mellitus revisited: T-cell clones to GAD65 peptides with sequence homology to Coxsackie or proinsulin peptides do not crossreact with homologous counterpart. Hum Immunol. 2001;62:299–309.
24. Faller G, Keller KM, Claeys D, Buderus S, Kühlwein D, Reiche N, et al. Prevalence and specificity of antigastric autoantibodies in adolescents infected with Helicobacter pylori. J Pediatr. 2002;140:68–74.
25. Van Bilsen JH, Wagenaar-Hilbers JP, Boot EP, van Eden W, Wauben MH. Searching for the cartilage-associated mimicry epitope in adjuvant arthritis. Autoimmunity. 2002;35:201–210.
26. Wang CX, Teufel A, Cheruti U, Grötzinger J, Galle PR, Lohse AW, et al. Characterization of human gene encoding SLA/LP autoantigen and its conserved homologs in mouse, fish, fly and worm. World J Gastroenterol. 2006;12:902–907.
27. Fourneau JM, Bach JM, van Endert PM, Bach JF. The elusive case for a role of mimicry in autoimmune diseases. Mol Immunol. 2004;40:1095–1102.
28. Plebanski M, Lee EA, Hannan CM, Flanagan KL, Gilbert SC, Gravenor MB, et al. Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming. Nat Med. 1999;5:565–571.
29. Hilbi H, Zychlinsky A, Sansonetti PJ. Macrophage apoptosis in microbial infections. Parasitology. 1997;115:79–87.
30. Schechter M, Nogueira N. Variations induced by different methodologies in Trypanosoma cruzi surface antigen profiles. Mol Biochem Parasitol. 1988;29:37–45.
31. Mosser DM, Brittingham A. Leishmania, macrophages and complement: a tale of subversion and exploitation. Parasitology. 1997;115:9–23.
32. Doyle HA, Mamula MJ. Post-translational protein modifications in antigen recognition and autoimmunity. Trends Immunol. 2001;22:443–449.
33. Kusalik A, Trost B, Bickis M, Fasano C, Capone G, Kanduc D. Codon number shapes peptide redundancy in the universal proteome composition. Peptides. 2009; in press.
34. Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res. 2005;33:297–302.
35. Markowitz VM, Szeto E, Palaniappan K, Grechkin Y, Chu K, Chen IM. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res. 2008;36:528–533.
36. Gusfield D. Algorithms on strings, trees and sequences: Computer science and computational biology. Cambridge University Press; 1997.

Articles from Self Nonself are provided here courtesy of Taylor & Francis