A comparative analysis of the predicted protein sequences encoded in the complete genomes of Borrelia burgdorferi and Treponema pallidum provides a number of insights into evolutionary trends and adaptive strategies of the two spirochetes. A measure of orthologous relationships between gene sets, termed the orthology coefficient (OC), was developed. The overall OC value for the gene sets of the two spirochetes is about 0.43, which means that less than one-half of the genes show readily detectable orthologous relationships. This emphasizes significant divergence between the two spirochetes, apparently driven by different biological niches. Different functional categories of proteins as well as different protein families show a broad distribution of OC values, from near 1 (a perfect, one-to-one correspondence) to near 0. The proteins involved in core biological functions, such as genome replication and expression, typically show high OC values. In contrast, marked variability is seen among proteins that are involved in specific processes, such as nutrient transport, metabolism, gene-specific transcription regulation, signal transduction, and host response. Differences in the gene complements encoded in the two spirochete genomes suggest active adaptive evolution for their distinct niches. Comparative analysis of the spirochete genomes produced evidence of gene exchanges with other bacteria, archaea, and eukaryotic hosts that seem to have occurred at different points in the evolution of the spirochetes. Examples are presented of the use of sequence profile analysis to predict proteins that are likely to play a role in pathogenesis, including secreted proteins that contain specific protein-protein interaction domains, such as von Willebrand A, YWTD, TPR, and PR1, some of which hitherto have been reported only in eukaryotes. 
We tentatively reconstruct the likely evolutionary process that has led to the divergence of the two spirochete lineages; this reconstruction seems to point to an ancestral state resembling the symbiotic spirochetes found in insect guts.
Comparative analysis of the protein products encoded in complete genomes provides a powerful tool for evaluating evolutionary trends and adaptive strategies in pathogenic microbes. Identification of metabolic and signaling pathways that are unique to pathogens offers novel targets for therapeutic intervention. Also, such unique determinants may prove useful in developing better diagnostic and prognostic indicators of infectious diseases in humans. Conclusions from such an analysis enhance understanding of the host-parasite interactions that enable pathogens to carve out unique ecological niches in nature.
Borrelia burgdorferi is the causative agent of Lyme disease, whereas the related spirochete Treponema pallidum causes syphilis. Both organisms are fastidious in their growth requirements, have small genomes, and manifest clinically as distinct, chronic, disseminated diseases. The clinical similarities (68) include progression of disease in stages following local inoculation either through a tick in Lyme disease or through sexual contact in syphilis. Involvement of the central nervous system is seen in both infections, with sequelae ranging from neurologic deficits to neuropsychiatric abnormalities (19). Chronic, recurrent joint involvement is classically associated with Lyme disease (64). The protean clinical manifestations of these pathogens and their propensity to cause chronic infection in the human host contribute to the ongoing diagnostic and therapeutic challenges offered by these spirochetes (22, 42).
The recent determinations of the complete genome sequences of B. burgdorferi (30) and T. pallidum (31) have made a comprehensive computer-based comparison of the two spirochete genomes a feasible exercise. Recent comparative genomic studies include comparisons between two closely related mycoplasma species (36) and between bacteria that belong to related genera but encode vastly different numbers of proteins, namely, Escherichia coli and Haemophilus influenzae (72). The complete genomes of T. pallidum and B. burgdorferi give us the first chance to compare two moderately related pathogenic bacteria with approximately equivalent coding capacities. In order to systematically compare the protein sets encoded in the two spirochete genomes, we devised a measure for evaluating evolutionary relationships between families and functional classes of proteins: the orthology coefficient. With regard to the two spirochetes, it seemed likely that differences observed across families and functional groups of proteins could be indicative of the distinct evolutionary pressures of the host environment.
Completely automated computational analysis of genome sequences is often plagued by errors arising from compositional bias in protein sequences, difficulties in automatic assessment of the domain organization of proteins, and the lack of in-depth predictive power (16). Therefore, in comparative analysis of the two spirochete genomes we used a semiautomated strategy that is based on recently developed, powerful tools for sequence analysis, particularly position-specific iterated BLAST (PSI-BLAST) (2, 3), and also involves case-by-case examination of protein families and individual protein sequences. Optimization of the functional annotation of proteins accompanied by reconstruction of metabolic and signaling pathways in the two spirochetes helped uncover a number of unique features that likely play a role in host response-related functions. In this report, we compare and contrast the gene complements of the two spirochetes and provide examples of shared and unique genes and functional systems.
The complete protein sets of B. burgdorferi (30) and T. pallidum (31) were obtained from the genome division of the Entrez retrieval system at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome. All database searches were carried out against the nonredundant (NR) sequence database maintained at the National Center for Biotechnology Information (National Institutes of Health, Bethesda, Md.).
Large-scale sequence analysis was handled using the SEALS program package (76). The first step of the analysis involved a search of the NR database of protein sequences at the National Center for Biotechnology Information using the gapped BLAST program (3), with the complete protein sets of B. burgdorferi and T. pallidum as queries. Compositionally biased regions in query sequences that tend to produce spurious hits in database searches were masked using the SEG program (81). Two different sets of parameters for SEG were used, namely, mild masking (window length, 12; trigger complexity, 2.2; extension complexity, 2.5) for routine searches and stringent masking (window length, 45; trigger complexity, 3.4; extension complexity, 3.75) for delineating particular globular domains (81). In-depth database searches were performed using the PSI-BLAST program (3) whenever specific annotation required detection of subtle relationships. Briefly, PSI-BLAST uses multiple alignments of sequences that show expectation values, or e-values, lower than a specified cutoff (in this case 0.01) in the first-pass gapped BLAST search to construct a position-specific scoring matrix (PSSM). The PSSM is then used as the input for further iterations of the database searches, which results in increased detection of subtle sequence similarities (2, 8). Statistical evaluation of the PSI-BLAST results is based on the empirical version of Karlin-Altschul statistics, modified for gapped alignments (3). The e-value is the number of sequences expected to show at least the observed level of similarity to the query by chance alone in a database of a size equivalent to NR. The e-value reported on the first instance that PSI-BLAST detects the given sequence above the cutoff is a reliable indicator of statistical significance. Under the condition that properly filtered globular domains are used as queries, an e-value of less than 0.01 is generally suggestive of homology (8).
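To make the meaning of the e-value cutoff concrete, the following minimal sketch applies the standard Karlin-Altschul relation between a normalized bit score and the expected number of chance hits, E = m × n × 2^(−S′). It is only an illustration: the function names, the query length, and the database size are assumptions made here, and the sketch ignores the effective-length corrections and empirically estimated gapped parameters that BLAST itself applies.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits with at least this bit score.

    Uses E = m * n * 2**(-S'), where S' is the normalized bit score,
    m the query length and n the total database length in residues.
    Edge-effect (effective length) corrections are deliberately omitted.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def passes_cutoff(e_value, cutoff=0.01):
    """True if the e-value is below the inclusion cutoff (0.01 in the text)."""
    return e_value < cutoff


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    query_len = 300
    db_len = 10 ** 8
    for score in (40, 50):
        e = expect_value(score, query_len, db_len)
        print(score, e, passes_cutoff(e))
```

Under these assumed sizes, a hit scoring 40 bits falls short of the 0.01 cutoff whereas a hit scoring 50 bits passes it, which illustrates how strongly the e-value depends on the score for a fixed search space.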
Structural features of proteins were analyzed using the FAMASK program of the SEALS package (76), which incorporates several prediction methods. Signal peptides were predicted using the SIGNALP program (49), transmembrane helices were predicted using the PHD topology program (57) or the TopPred II program (20), and coiled-coil regions were predicted using the COILS2 program with a window length of 21 (43).
The phyletic distribution of homologous proteins detected by gapped BLAST searches was evaluated using the TAX_COLLECTOR program (76) of SEALS. The hits with an e-value below 0.001 between each protein encoded in the two spirochete genomes and the NR databases were analyzed, and the best hits in the second spirochete as well as those in other bacteria, archaea, and eukarya, with their respective e-values, were tallied. This provided the starting point for identifying orthologous genes in the two genomes as well as a preliminary indication of the phyletic distribution of homologs for most of the proteins encoded in each genome. An attempt was made to delineate true orthologous relationships on the basis of symmetric best hits between the proteins from the two spirochetes as well as between the spirochete proteins and those from other bacteria, archaea, and eukaryotes. In problematic cases, phylogenetic tree analysis using multiple sequence alignments of conserved domains was carried out in order to evaluate the relationships. Phylogenetic trees were constructed using the PAUP3 or CLUSTALW programs (35) with the neighbor-joining and least squares (Fitch-Margoliash) methods, accompanied by bootstrap analysis (27).
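The symmetric best-hit criterion described above can be sketched as follows. This is a simplified illustration rather than the SEALS/TAX_COLLECTOR implementation: the input format (tuples of query, subject, and e-value), the function names, and the toy identifiers are all assumptions made for the example; only the 0.001 e-value threshold is taken from the text.

```python
def best_hits(hits, cutoff=1e-3):
    """Map each query to its lowest-e-value subject below the cutoff.

    `hits` is an iterable of (query, subject, e_value) tuples.
    """
    best = {}
    for query, subject, e_value in hits:
        if e_value >= cutoff:
            continue  # not significant enough to be considered
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}


def symmetric_best_hits(a_vs_b, b_vs_a, cutoff=1e-3):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, cutoff)
    reverse = best_hits(b_vs_a, cutoff)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy identifiers and e-values, invented purely for illustration.
    a_vs_b = [("bbA", "tpX", 1e-50), ("bbA", "tpY", 1e-10),
              ("bbB", "tpY", 1e-5), ("bbC", "tpY", 1e-30)]
    b_vs_a = [("tpX", "bbA", 1e-48), ("tpY", "bbC", 1e-29),
              ("tpY", "bbB", 1e-4)]
    print(symmetric_best_hits(a_vs_b, b_vs_a))
```

In this toy data set, bbB finds tpY as its best hit, but tpY prefers bbC, so bbB is left without a candidate ortholog; such asymmetric cases are the ones that would call for the phylogenetic tree analysis mentioned above.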
A complementary approach was employed for case-by-case analysis of those spirochete proteins that failed to show statistically significant matches when used as queries for PSI-BLAST. These sequences were screened for the presence of known conserved domains using the previously developed libraries of PSSMs (18, 53) as well as newly developed PSSMs. Domain-family PSSMs were created by running PSI-BLAST against the NR database with the appropriate query and an e-value cutoff of 0.01 and saving the profile using the “−C” option. The resulting PSSM was rerun against the protein sequence sets of T. pallidum and B. burgdorferi using the −R option of PSI-BLAST, and hits with significant scores were detected. To test the robustness of this procedure, we prepared databases using randomized protein sequences from each of the spirochete proteomes and conducted similar searches with each of the PSSMs on them. No hits with significant e-values (<0.01) were detected in these searches, supporting the validity of the domains detected in the spirochete proteomes with the PSSMs. Additional database searches were performed with family-specific Hidden Markov models that were constructed and run using the HMMER2 program package (66). Multiple alignments of protein families were constructed using a combination of the −m4 option of PSI-BLAST and the CLUSTALW program (35).
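The randomized control databases described above could be prepared along the following lines. The text does not specify how the sequences were randomized, so this sketch assumes simple per-sequence residue shuffling, which preserves the length and amino acid composition of every protein; the function names and the toy sequences are likewise assumptions.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues in `seq`."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def randomized_proteome(proteome, seed=0):
    """Shuffle every sequence in a {name: sequence} mapping.

    A fixed seed makes the control database reproducible.
    """
    rng = random.Random(seed)
    return {
        name + "_shuffled": shuffle_sequence(seq, rng)
        for name, seq in proteome.items()
    }


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = {"protA": "MKKLLVAGSTTE", "protB": "MSDNQWERTYIP"}
    for name, seq in randomized_proteome(toy).items():
        print(">" + name)
        print(seq)
```

Because shuffling retains composition, hits to such a database reflect compositional bias or chance rather than conserved domain structure, which is what makes the absence of significant hits informative.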
The proteins encoded by the genomes can be broadly categorized on the basis of either (predicted) cellular function (functional class) or shared conserved domains (protein families). Orthologs are genes (proteins) in different genomes that are related by vertical descent; i.e., they can be traced back to a common ancestor. Such orthologous proteins typically have similar, if not identical, biological functions in the respective organisms (71). Paralogs, in contrast, are genes related by duplication within a lineage (genome); paralogs are expected to possess generally similar biochemical functions but distinct biological roles. Given these differences in functional implications, careful delineation of orthologous and paralogous relationships between genes is of primary importance for predicting functions of proteins with an appropriate degree of precision. We adopted a quantitative measure of orthology that we termed the orthology coefficient (OC). This index can be applied to different protein categories, namely, superfamilies related by descent as opposed to functional classes of proteins, such as those involved in transcription or replication. In the simplest case of a one-to-one correspondence between genes in two genomes, OC = 2No/(N1 + N2) where No is the number of orthologs and N1 and N2 are the numbers of members of the given protein family or functional category in the two compared genomes (Fig. 1). If the gene complements in the two genomes are completely conserved, then the OC is 1.00; by contrast, in the hypothetical case where there are no orthologs, the OC is 0. If there is one or more duplications in one or both of the species that occurred after their divergence, then OC = (No1 + No2)/(N1 + N2) where No1 and No2 are the numbers of members in orthologous clusters from the two respective genomes (Fig. 1).
Criteria used to identify orthologs included symmetry of retrieval in reciprocal BLAST searches, conservation of the domain organization of proteins, and, in problematic cases, phylogenetic tree analysis (71, 72).
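The two forms of the OC given above can be combined into a single calculation, since the one-to-one case is the general formula with No1 = No2 = No. The sketch below is a minimal illustration under that reading; the function name, argument names, and the example counts are assumptions and do not correspond to values reported for any particular family or functional class.

```python
def orthology_coefficient(n1, n2, orthologs_1, orthologs_2=None):
    """Compute OC = (No1 + No2) / (N1 + N2).

    n1, n2       -- numbers of family (or class) members in the two genomes
    orthologs_1  -- members from genome 1 that fall in orthologous clusters
    orthologs_2  -- same for genome 2; defaults to orthologs_1, which gives
                    the one-to-one form OC = 2*No / (N1 + N2)
    """
    if orthologs_2 is None:
        orthologs_2 = orthologs_1
    if n1 + n2 == 0:
        raise ValueError("family has no members in either genome")
    if not (0 <= orthologs_1 <= n1 and 0 <= orthologs_2 <= n2):
        raise ValueError("ortholog counts must lie between 0 and family size")
    return (orthologs_1 + orthologs_2) / (n1 + n2)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(orthology_coefficient(10, 10, 10))    # fully conserved complement
    print(orthology_coefficient(12, 8, 6))      # one-to-one orthologs only
    print(orthology_coefficient(10, 12, 4, 6))  # lineage-specific duplications
```

Treating the one-to-one case as a special case of the general formula keeps the two definitions consistent: a completely conserved complement gives 1.0 and a complement with no orthologs gives 0.0 under either form.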
Functional classes of proteins include core functions, such as replication and cell division, repair, transcription, and translation, that are shared by all bacteria and to a large extent also by archaea and eukaryotes, and more specialized functions, such as metabolism, metabolite transport, host and environment response, and signal transduction. Figure 2 shows the representation of the functional classes in the two genomes and the level of orthology in each class measured using OC. In order to assess the effects of specific adaptations of each of the spirochetes to their distinct niches, investigation of classes with low OC values appears to be critical. Below, we discuss the distribution of functional classes in the two spirochetes along the gradient of OC values. A selection of prominent differences in the functional systems of the two spirochetes is shown in Table 1, and examples of apparent spirochete synapomorphies are listed in Table 2.
Not surprisingly, the core functional classes, namely, DNA replication and translation, had OCs of 0.94 and 0.98, respectively, which is indicative of largely vertical inheritance (Fig. 2). The limited differences observed in these conserved functional classes could be attributed to interesting cases of horizontal gene transfer, gene loss, or duplication with divergence.
Differences in the gene complement involved in DNA replication include the lack of topoisomerase IV (parC, parE) in T. pallidum, which is of interest given the conserved role of these proteins in chromosome partitioning in other bacteria (77). Conversely, the proofreading 3′-5′ exonuclease, the ε subunit of DNA polymerase III (e.g., the E. coli DnaQ protein), whose role in DNA replication seems to be conserved in all other bacteria (29), is missing in B. burgdorferi. The absence of this enzyme would result in error-prone replication, which seems to be compatible with the generation of new variants through mutation in the genes encoding variable antigens of Borrelia (58, 84). An alternative is nonorthologous displacement (E. V. Koonin, A. R. Mushegian, and P. Bork, Letter, Trends Genet. 12:334–336, 1996), that is, an unrelated or distantly related and so far unidentified exonuclease being recruited for the proofreading role. Interestingly, T. pallidum encodes a second copy of the single-stranded DNA-binding protein (TP0310), which might have a distinct function in replication or recombination.
In addition to these striking differences, we detected an interesting case of apparent displacement of an existing gene by its ortholog from a distant species, probably via horizontal gene transfer. Sequence searches showed that topoisomerase I (Topo I) from B. burgdorferi shares an insert within the TOPRIM domain (10) and a C-terminal repeat domain with actinomycetes and cyanobacteria. These features are lacking in other bacterial lineages and appear to be derived, shared characters (synapomorphies) within this protein superfamily. Phylogenetic analysis using multiple alignment of the conserved region of bacterial Topo I strongly supported grouping of B. burgdorferi with actinomycetes and cyanobacteria (Fig. 3 [bootstrap value, >90%]). This phylogenetic affinity of Topo I is in stark contrast to the typical clustering of the other spirochete proteins, particularly those from the core functional classes, and suggests horizontal gene transfer into the B. burgdorferi lineage followed by loss of the ancestral form. This phenomenon, which may be termed xenologous gene displacement, that is, displacement by an orthologous gene from a phylogenetically distant source (J. P. Gogarten, Letter and comment, J. Mol. Evol. 39:541–543, 1994), is seen also in several other instances in the two spirochetes (see below).
In spite of the high level of conservation in the translation apparatus, we did detect a few interesting examples of evolutionary divergence between the two species. One notable case was the prolyl-tRNA synthetase from B. burgdorferi, where phylogenetic analysis strongly suggested that the enzyme in B. burgdorferi has been acquired from a eukaryotic source by the xenologous displacement mechanism (78). The other aminoacyl-tRNA synthetases are strongly conserved between the spirochetes, but phylogenetic analysis provided evidence for likely ancient horizontal transfers into the ancestor of spirochetes accompanied by displacement of the original gene. One well-studied case in this category is class I lysyl-tRNA synthetase, which apparently was acquired from the archaea (37, 78); similar evidence of archaeal origin was obtained for the two subunits of phenylalanyl-tRNA synthetase (78). By contrast, the synthetases for arginine, glutamate, methionine, isoleucine, and serine showed evidence of likely horizontal transfer from eukaryotic sources (78).
The gene complements involved in repair, recombination, and transcription show significant differences between the two spirochetes, which is reflected in OC values of less than 0.8 (Fig. 2). A number of important repair genes are different in the two genomes. Thus, B. burgdorferi encodes the helicase-nuclease complex RecBCD, whereas T. pallidum lacks these genes but has instead a complement of recFGNR genes. Another repair enzyme unique to T. pallidum is the ERCC3-like eukaryotic-type DNA repair helicase that is also present in mycobacteria (54). It is likely that this gene was horizontally transferred from eukaryotes into a certain bacterial lineage and subsequently disseminated amidst the bacteria. The triplication of the C-terminal BRCT domain of DNA ligase in T. pallidum is a unique feature that is in contrast to the similarly located single copy of this domain present in all other bacteria (7, 15). The RecQ helicase gene is found in T. pallidum but not in B. burgdorferi. Conversely, B. burgdorferi but not T. pallidum retains the ancient duplication of the mutS gene, which is also seen in other bacteria. Interestingly, B. burgdorferi lacks RuvC, the endonuclease subunit of the Holliday junction resolvase complex, which suggests either that the Ruv complex is nonfunctional or that a distinct nuclease complements this function. Both spirochetes possess a novel mismatched-base DNA glycosylase (TP0229/BB0013), a base excision repair enzyme, orthologs of which are conserved in a number of bacteria and in all archaea (60), but the B. burgdorferi version is highly divergent from all other bacterial orthologs. The major disparities in the repertoires of repair proteins suggest a difference in the selective forces that act on this system in the two spirochetes. The fact that B. burgdorferi undergoes antigenic variation (58, 84) that may require both active recombination and error-prone repair provides a possible explanation for the retention of only certain of its repair pathways.
The intermediate OC of the transcriptional apparatus seems to reflect a bimodal effect of selective evolutionary forces. The core transcriptional machinery, which includes RNA polymerase subunits and components of the elongation and termination complexes, is highly conserved, whereas the repertoires of specific transcriptional regulators are largely different (Fig. 2). Within the shared heritage, there are some unique features that are likely to have been derived in the common ancestor of these spirochetes and may be considered signatures of this clade. In particular, the spirochete NusA protein (a part of the transcription termination complex) contains a C-terminal Zn ribbon domain in place of the helix-hairpin-helix (HhH) domain that is found in most other bacteria (data not shown). Similarly, the transcription elongation factor GreA (40) in both spirochetes possesses a long N-terminal extension which is shared only with chlamydiae (69). An orthologous pair of spirochete proteins (TP0511 and BB0355) are homologous to the transcription factor CarD from Myxococcus (48) and share a domain with the transcription repair-coupling helicase (TRCF) (Fig. 4A). Since the corresponding region in TRCF interacts with the RNA polymerase holoenzyme (63), these proteins are likely to be novel transcription factors, characteristic of the spirochete clade, that modulate transcription by interacting directly with the RNA polymerase.
Major differences between the two spirochetes were revealed even among the sigma factors. While they share σ70 (RpoD) and σ54 (RpoN), B. burgdorferi also encodes an RpoS ortholog, whereas T. pallidum possesses σ24 (RpoE), σ28, and σ43 (SigA). The three sigma factors present in T. pallidum but not in B. burgdorferi are likely to be involved in regulating multiple gene sets specific to the former. Of particular interest in this context is σ24, whose orthologs in other bacteria regulate gene batteries associated with stress response and pathogenesis (1). Consistent with the detection of multiple sigma factors, T. pallidum encodes elements of the system of sigma factor regulators similar to that seen in Bacillus, Chlamydia, Synechocystis, and the actinomycetes. These regulators include two phosphatases of the PP2C family and an anti-sigma factor antagonist (83), which suggests that T. pallidum senses a distinct environmental signal regulating the anti-sigma factor antagonist. However, T. pallidum does not encode an ortholog of the small serine kinase of the histidine kinase class (RsbV protein), which is an essential component of the anti-sigma factor regulatory system in other bacteria (83). If this system is to be functional, it must include a kinase other than conventional histidine or serine/threonine kinases, as none of them, with the exception of CheA (which is not known to participate in this pathway), are detectable in T. pallidum.
Each of the spirochetes encodes several predicted specific transcriptional regulators of the helix-turn-helix (HTH) class of DNA-binding proteins. In addition, B. burgdorferi encodes a member of the MetJ/Arc family of β-sheet-containing transcription factors (BBD22) on its linear plasmid. With the exception of a single, novel family of HTH proteins, none of these likely transcriptional regulators are orthologous in the two spirochetes, which is consistent with the diversification of the possible target metabolic pathway operons (see below). The shared HTH proteins are an unusual group of two orthologous protein pairs (TP0711/BB0265 and TP0408/BB0512) that contain a C-terminal HTH domain similar to the PAIRED domains of resolvases (Fig. 4B). In addition, they have a hydrophobic N-terminal region resembling a signal peptide that may be responsible for an unexpected membrane association and processing associated with their function. Orthologs of these proteins with the same domain arrangement are detectable in Helicobacter pylori, Bacillus subtilis, and Thermotoga maritima. As an example of the diversified transcriptional regulation for similar functions, T. pallidum encodes a winged-HTH protein, TroR (34), whereas B. burgdorferi encodes the unrelated Fur protein (75), both of which are predicted to negatively regulate metal uptake. Two HTH-containing regulators of sugar metabolism, XylR-1 and XylR-2, that combine the HTH domain with the Hsp70-like sugar-kinase domain are present only in B. burgdorferi.
The signal transduction systems in T. pallidum and B. burgdorferi show low OC values (Fig. 2), with only a few conserved features that seem to be remnants of the signaling apparatus of the common ancestor of the spirochetes. This shared core includes components of chemotactic signaling based on the histidine kinase CheA, which contains a C-terminal CheW domain (Fig. 5), and associated downstream signaling components, such as methyl-accepting and methyltransferase chemotaxis proteins. Even within this system, however, B. burgdorferi possesses specific methylesterase, methyltransferase, and methyl-accepting proteins with no counterparts in T. pallidum (Fig. 5). Furthermore, B. burgdorferi has a larger representation of the two-component relay systems, with two additional histidine kinases other than the two CheA paralogs and multiple additional receiver domain proteins. These histidine kinases are likely to function as sensors of different stimuli, since one of them contains a PAS domain that has been implicated in redox sensing and the other one contains a large, uncharacterized extracellular (periplasmic) domain (Fig. 5). Cross-talk between these kinases and other signaling systems is suggested by the fusion of the receiver domain of the two-component system with other signaling domains, such as methylesterase and diguanylate cyclase/phosphodiesterase. Of additional interest is the presence, in both spirochetes, of CheX proteins (TP0365/BB0671) (32). These proteins are homologs of the FliM protein of E. coli, which appears to provide a link between chemotactic signaling and the flagellar motor (46).
The phosphoenolpyruvate:sugar phosphotransferase system (PTS) (59) is fully represented in B. burgdorferi (Fig. 5). It includes multiple EII enzymes with different sugar specificities along with other PTS proteins, such as HPr and phosphoenolpyruvate-protein-phosphotransferase. By contrast, T. pallidum encodes an unusual motley of PTS proteins, which includes the phosphocarrier protein HPr and its kinase, similar to the one involved in catabolite repression in gram-positive bacteria (56), and a single protein containing an EIIA domain but not the other PTS components (Fig. 5). It is possible that these remnants of the PTS in T. pallidum are involved in a novel signal transduction mechanism.
T. pallidum possesses a well-developed cyclic AMP (cAMP) signaling system that is lacking in B. burgdorferi (Fig. 5). The presence, in T. pallidum, of four cAMP-binding domain-containing proteins and a classical “eukaryote-type” adenylyl cyclase similar to those present in some other bacteria, such as cyanobacteria, myxobacteria, and actinomycetes, indicates a prominent role for this system in signal transduction. The domain architectures of these proteins suggest functional diversity: these include duplication of the cAMP-binding domain (TP0089) and fusions to an HTH DNA-binding domain (TP0262) and to an alpha-helical domain distantly related to the tetratricopeptide repeats (TPR) (TP0261) (Fig. 5). Furthermore, diguanylate cyclase/phosphodiesterase from T. pallidum contains a GAF domain that in some instances binds cyclic nucleotides (12). This protein might participate in cross-talk between cyclic nucleotide signaling and cyclic diguanylate signaling. The adenylyl cyclase of T. pallidum (TP0485) contains a HAMP (for histidine kinases, adenylyl cyclases, methyl-accepting chemotaxis receptors, and phosphatases) domain, a recently described module that is associated with the cytoplasmic face of several bacterial signaling proteins (11). This domain is predicted to transmit extracellular stimuli sensed by these receptor proteins to the respective intracellular signaling domains. All bacteria that contain this eukaryotic-type cAMP-dependent signaling system lack a counterpart to the eukaryotic cyclic nucleotide phosphodiesterase and, in addition, lack the proteobacterial-type phosphodiesterase Icc (44). However, T. pallidum encodes a predicted membrane protein with six transmembrane helices that contains a metal-dependent phosphodiesterase domain of the HD superfamily, which is distantly related to eukaryotic phosphodiesterases (9). This protein is a candidate for the cAMP phosphodiesterase function. B. burgdorferi lacks classical adenylyl cyclases and cAMP-binding proteins but encodes an unrelated adenylyl cyclase (BB0723) that belongs to a recently described class of adenylyl cyclases found in thermophilic archaea and Aeromonas hydrophila (65). The function of this protein in signal transduction remains unclear in the absence of any detectable downstream effector molecules that would recognize cAMP. It cannot be ruled out that a new class of cAMP-binding domains shared by archaea, eukaryotes, and some bacteria, including B. burgdorferi, remains to be discovered.
Most bacteria encode the SpoT protein, which is a guanosine-3′,5′-bispyrophosphate (ppGpp) 3′-pyrophosphohydrolase (62). This key enzyme is responsible for cellular (p)ppGpp degradation, which reverses ppGpp accumulation during the stringent response to amino acid deprivation. While B. burgdorferi possesses a SpoT ortholog (BB0198) (Fig. 5), with its characteristic catalytic HD domain (9) and regulatory TGS (78) and ACT (8) domains, T. pallidum surprisingly lacks this protein. The parasitic lifestyle of T. pallidum may differ from that of Borrelia, resulting in the loss of this stringent response signaling system. Of the published genomes of bacteria, only the two chlamydial species (38, 69) and T. pallidum lack this enzyme, whereas Rickettsia prowazekii encodes a degenerate and probably inactive version of the enzyme (5). This may reflect the superfluous nature of stringent response regulation in the nutrient-rich environments of these pathogens.
Consistent with the picture emerging from other pathogenic bacteria, the genomes of the two spirochetes are noticeably reduced in terms of genes coding for metabolic enzymes, which suggests a general degeneration of metabolic systems that most likely were represented in their common ancestor. However, even in this degenerate state, there are striking differences in the predicted metabolic pathways between T. pallidum and B. burgdorferi, suggesting different adaptation strategies. The common metabolic heritage is largely restricted to the trunk of the glycolytic Embden-Meyerhof pathway, the phosphorylation steps of nucleotide interconversion, and a system for membrane potential generation, namely, the multisubunit V-type ATPase. The latter is the principal energy-generating ATPase in archaea; in bacteria, the V-ATPase subunits are encoded in a single operon that appears to be evolutionarily mobile and shows a scattered distribution, being found, in addition to the spirochetes, in Chlamydia, Enterococcus, Deinococcus, and Thermus (55).
As already discussed for other, more conserved pathways, even within the shared core of metabolism there are instances of likely horizontal transfer and associated xenologous and nonorthologous (E. V. Koonin et al., Letter, Trends Genet. 12:334–336, 1996) gene displacement. Apparent lateral gene transfer accompanied by displacement is illustrated by fructose-1,6-bisphosphate aldolase and the glyceraldehyde 3-phosphate dehydrogenase that belong to the central glycolytic pathway. Phylogenetic analysis of the aldolases from the two spirochetes showed that, while both of them are class II aldolases, they belong to two entirely distinct, well-supported groups, which suggests independent acquisition via lateral gene transfer from other bacteria (Fig. 3). In phylogenetic trees, glyceraldehyde 3-phosphate dehydrogenase from T. pallidum groups strongly with the orthologs of this protein from kinetoplastid eukaryotes (Fig. 3). This points to horizontal gene transfer from this eukaryotic lineage to T. pallidum after the divergence of the two spirochetes, followed by displacement of the original spirochete version. An even more interesting case is T. pallidum uridine kinase, which not only shows the unexpected grouping with the ortholog from Thermotoga in phylogenetic trees but also possesses a unique domain architecture, with two domains, including the TGS domain, apparently acquired from threonyl-tRNA synthetases (78). In the threonyl-tRNA synthetases, these domains play a role in RNA binding (61), which suggests that these unique uridine kinases could regulate their own expression by binding to mRNA. Another notable case of likely horizontal transfer followed by gene displacement involves a pair of closely linked genes in an operon, namely glucose 6-phosphate dehydrogenase (G6-PDH) and the adjacent gene that encodes the DevB protein. In T. pallidum, both of these proteins are closely related to their counterparts from Haemophilus and Actinobacillus rather than to those from B. burgdorferi. Phylogenetic analysis (Fig. 3) provides strong bootstrap support for this lineage and suggests that the entire operon comprising G6-PDH and DevB has been exchanged between T. pallidum and the proteobacterial lineage.
Even among the genes encoding components of metabolic pathways that have been vertically inherited by the two spirochetes, there are footprints of likely ancient horizontal transfer events. Horizontal transfer from eukaryotes to spirochetes is attested by the presence of a pyrophosphate-dependent phosphofructokinase (BB0020, TP0542) that otherwise is characteristic of plants and several protists, such as Entamoeba and Giardia. This enzyme is present in the spirochetes along with the original bacterial ATP-dependent phosphofructokinase (BB0727, TP0108) in the glycolytic pathway, suggesting functional partitioning of the two versions.
Each of the spirochetes encodes two cytidylate kinases. One of these appears to have been acquired by the common ancestor of the spirochetes from the archaea. This is strongly suggested by their highly significant relationship to the archaeal proteins in phylogenetic tree analysis (Fig. 3) and the modified P-loop ATPase motif (data not shown) that is uniquely shared by these proteins with their archaeal counterparts.
Both spirochetes encode an aminopeptidase M containing an N-terminal insert sequence that is specifically shared with eukaryotes; phylogenetic analysis indicates that it might have been horizontally transferred into the ancestral spirochete from a eukaryote (Fig. 3).
Some of the drastic differences in the gene complements for metabolic processes seem to reveal adaptive features unique to each spirochete. The ability to utilize glycerol and chitin as energy sources through the glycolytic pathway, which is unique to B. burgdorferi, is consistent with the existence of an arthropod-specific stage in the life cycle of this spirochete. This is supported by the presence of genes for glycerol uptake and the FAD-dependent glycerol-3-phosphate dehydrogenase (BB0243), the latter apparently being an unusual case of horizontal transfer from a eukaryotic source (Fig. 3). Utilization of N-acetylglucosamine polymers or chitin (available in the arthropod host) as a substrate for cell wall biosynthesis (glucosamine deacetylase, NagA; BB0151) and/or as an energy source (glucosamine deaminase, NagB; BB0152) could represent an adaptation of B. burgdorferi for survival in its specific niche, in this case the arthropod host. The presence of an additional pathway for conversion of dihydroxyacetone phosphate into lactate via a methylglyoxal intermediate also could be an adaptation for growth under certain environmental conditions and carbohydrate sources (28, 74).
Another interesting metabolic feature of B. burgdorferi inferred from sequence analysis is the presence of a pathway for the biosynthesis of isopentenyl pyrophosphate (IPP). All of the enzymes necessary for the formation of IPP from 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA), along with a novel glycolate oxidase-like, TIM barrel fold oxidoreductase (BB0684) that could participate in the same process, are encoded in a single long operon. With the exception of this oxidoreductase, the proteins of this pathway show eukaryotic or archaeal affinities in phylogenetic analysis (data not shown), with no closely related proteins in bacteria. Taken together with the fact that Pseudomonas mevalonii is the only other bacterium to date that encodes an orthologous HMG-CoA reductase (14), these observations suggest that this pathway has been acquired by B. burgdorferi via horizontal transfer from either an archaeal or a eukaryotic source. The enzymes encoded in this unusual operon are likely to participate in the synthesis of still uncharacterized membrane/lipoprotein components in B. burgdorferi.
The repertoires of transporters encoded by the two spirochetes are significantly different and, with an OC of 0.38, are among the most divergent functional classes revealed in our analysis; this suggests distinct substrate requirements. In this class, it is notable that, unlike B. burgdorferi, T. pallidum appears to lack a transport mechanism for proline or glycine-betaine. Given the proposed osmoprotective role of proline and glycine-betaine in bacterial stress response (4, 23), the presence of a full complement of genes for proline biosynthesis in T. pallidum likely serves as an adaptive response to osmotic stress.
Finally, the surface lipoproteins and other, uncharacterized membrane proteins of the two spirochetes show very low OC values (Fig. 2), which is compatible with the different modes of parasite-host interaction.
As an alternative approach to assessing the evolutionary forces acting on the spirochetes, we analyzed all of the largest families of paralogous proteins in the two genomes and calculated the OCs for each of them (Fig. 6). As in other bacteria and archaea (39, 80), the largest protein superfamily in the spirochetes comprises the P-loop-containing NTPases, with 86 members in B. burgdorferi and 69 in T. pallidum. Different families within this large superfamily show substantial variability in the degree of conservation. A number of ATPases, such as those involved in DNA replication, V-type ATPases, and GTPases, show a largely vertical inheritance pattern. By contrast, the ABC transporter ATPases show considerable diversity in the two spirochetes, which is consistent with the differences noted in nutrient transport and substrate specificity (see above). In B. burgdorferi, there is a specific expansion of the MinD family of ATPases (12 members), which are involved in chromosome partitioning; this is consistent with the presence of several cosegregating plasmids that comprise parts of the B. burgdorferi genome (30).
Compared to the P-loop ATPase superfamily, all other protein families are much smaller and show widely varying degrees of orthology (Fig. 6). Some of these families, such as the pseudouridine synthetases, the PDZ domain proteins, and lysostaphin-like proteases, show OC values close to 1, which suggests functional coherence between the two organisms. The two classes of aminoacyl-tRNA synthetases and SAM-dependent methyltransferases have slightly lower OC values, which is due to the likely horizontal transfers discussed above or to relatively minor lineage-specific gene losses. Other families, such as HD superfamily hydrolases (9), metallo-β-lactamase fold hydrolases (6), calcineurin-like phosphatases (7), and peptidyl-prolyl cis-trans isomerases, show much lower OC values, which indicates major effects of lineage-specific gene loss, horizontal transfer, and rapid diversification. These families, unlike those associated with translation and replication, appear to participate in diverse roles that are under constant selection due to variations in the niches occupied by the two spirochetes. Certain protein families, such as the P-methylase/lipoate synthetase family, are seen only in T. pallidum. This raises the possibility of distinct metabolic options that are absent in B. burgdorferi.
We identified several previously undetected protein families in the spirochetes by using family-specific PSSMs (see Materials and Methods). These include some secreted (or possibly periplasmic) and membrane proteins that could be involved in pathogen-host interactions. The most interesting of these are the von Willebrand A factor (vWA) domain-containing proteins (52), which, until recently, had been detected only in eukaryotes. The vWA domain is a Mg2+-binding protein module with an α/β fold that participates in adhesion and protein-protein interactions in a wide range of eukaryotic proteins, such as integrins (26). Using profile searches, we identified three vWA domain-containing proteins in each of the spirochetes (Fig. 4C). However, the vWA proteins from B. burgdorferi are not highly similar to those from T. pallidum, suggesting that they either have been independently acquired or have diverged rapidly due to selective forces acting on extracellular proteins. These secreted or membrane-associated vWA domain proteins are reminiscent of similar domains present in the extracellular adhesion molecule (TRAP) of a eukaryotic pathogen, the malarial genus Plasmodium (73). By analogy to TRAP, it is likely that the spirochete vWA proteins are involved in adhesion of these bacteria to the extracellular matrix or to cells of the host connective tissues. These proteins could be potential targets for specific antispirochete therapies. The similarity between the spirochete vWA domains and that of the LFA-1 integrin is of interest given the recent report identifying LFA-1 as an autoantigen in treatment-resistant Lyme arthritis (33).
Another interesting protein of B. burgdorferi is BB0689, a secreted or periplasmic protein containing a PR1 domain (70). This domain hitherto has been detected only in eukaryotes, namely in plant pathogenesis-related proteins (PR-1) and in proteins expressed in the animal immune system such as GliPR (70). One of the human proteins in this family has been suggested to be a novel trypsin inhibitor (82), whereas other members, such as helothermine, found in the toxins of Helodermid lizards and certain snakes, act as inhibitors of potassium and calcium channels (17, 50). Another protein of this family, TPX-1, is implicated in the interaction between spermatogenic and Sertoli cells (45). All of these observations indicate that PR-1 domain-containing proteins participate in diverse extracellular protein-protein interactions, suggesting a role for BB0689 in adhesion to host extracellular proteins and, accordingly, in pathogenesis.
Conversely, T. pallidum encodes an unusual, large, secreted (periplasmic) protein (TP0544) that contains an OB-fold domain (47). This protein might interact with host cells and act as a virulence factor like the OB-fold-containing toxins from a wide range of bacteria (25).
Both spirochetes encode proteins (BB0236 and TP0421) containing a modified version of the YWTD domain (67) that is seen in intracellular animal proteins, some of which also contain the RING finger domain (L. Aravind, unpublished data). This relationship to a class of animal proteins suggests lateral transfer from an animal host into the ancestor of the two spirochetes. YWTD domains assume a β-propeller structure similar, for example, to that in low-density lipoprotein receptors and in proteins from other parasites (e.g., Trypanosoma cruzi) that mediate extracellular adhesion (51). The presence of a predicted signal peptide suggests that, like the vWA and PR-1 domain-containing proteins, the spirochete YWTD domain proteins are secreted and interact with host cell surface molecules (52).
Another interesting family that is conserved between the two spirochetes, and some of whose members appear to be secreted, is that of the tetratricopeptide repeat (TPR) domain-containing proteins (24). These domains form an alpha-helical superstructure and mediate protein-protein interactions. The use of TPR motifs in extracellular interactions might be a novel strategy of host interaction employed by the spirochetes and is reminiscent of a family of secreted Sel-1 repeat proteins in Helicobacter pylori (52; L. Aravind, unpublished data).
To add to this repertoire of previously undetected proteins that could be important for the pathogen's interaction with host cell surfaces, T. pallidum encodes two proteins with multiple ankyrin repeat motifs (TP0502, TP0835) (Fig. 5). At least TP0835, which contains 20 ankyrin repeats, has a canonical signal peptide, indicating that, unlike the eukaryotic ankyrin repeat proteins, this Treponema protein is secreted.
The use of the OC concept, along with a case-by-case analysis of individual protein sequences, has provided considerable information that allows us to address several evolutionary issues: (i) the amount of shared heritage present in the spirochete genomes, (ii) the likely gene repertoire and lifestyle of the common ancestor, (iii) the role of horizontal transfer, and (iv) the strategies and adaptations used by these organisms in their interaction with the host, which are important for pathogenesis.
Likely orthologs comprise about 43% of the total number of proteins encoded by the two spirochetes (39.5% [495 of 1,256] of the B. burgdorferi proteins and 47% [486 of 1,031] of the T. pallidum proteins). This fairly low fraction of orthologs indicates that lineage-specific gene loss, horizontal gene transfer, and rapid divergence account for more than one-half of the gene repertoires of the two spirochetes. The relatively lower fraction of orthologs in Borrelia is to a large extent due to the fact that multiple plasmids harbored by this spirochete encode a number of highly variable proteins without counterparts in Treponema (30). Genome comparisons between two more closely related bacteria, namely H. influenzae and E. coli, have indicated that lineage-specific gene loss was a major force in the evolution of the parasitic organism, in this case H. influenzae (72). Given that both spirochetes are obligate parasites, it is likely that the two lineages have lost genes largely independently, resulting in major differences in the complement of genes eventually retained. However, the number of genes lost by the spirochetes is lower than that seen in some other obligate parasites, such as the mycoplasmas, chlamydiae, and rickettsiae (5, 69, 79).
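The percentages quoted above can be rechecked with simple arithmetic. The following is a minimal sketch that uses only the ortholog and protein counts given in the text; the function names are illustrative, and pooling the two genomes is an assumption about how the overall figure of about 43% is obtained, not a restatement of the formal OC definition given in Materials and Methods.

```python
# Counts quoted in the text: likely orthologs and total predicted proteins.
ORTHOLOGS = {"B. burgdorferi": 495, "T. pallidum": 486}
PROTEINS = {"B. burgdorferi": 1256, "T. pallidum": 1031}


def ortholog_fraction(n_orthologs: int, n_proteins: int) -> float:
    """Fraction of one genome's proteins that have a likely ortholog."""
    return n_orthologs / n_proteins


def pooled_fraction(orthologs: dict, proteins: dict) -> float:
    """Orthologs summed over both genomes, divided by all proteins.

    Assumption: this pooling is one plausible reading of "about 43% of the
    total number of proteins encoded by the two spirochetes".
    """
    return sum(orthologs.values()) / sum(proteins.values())


if __name__ == "__main__":
    for species in ORTHOLOGS:
        frac = ortholog_fraction(ORTHOLOGS[species], PROTEINS[species])
        print(f"{species}: {100 * frac:.1f}%")
    print(f"Pooled: {100 * pooled_fraction(ORTHOLOGS, PROTEINS):.1f}%")
```

With these counts, the per-genome values come out at 39.4% and 47.1%, and the pooled value at 42.9%. The first is marginally below the 39.5% quoted in the text, which may simply reflect rounding.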
The OC values for different functional classes of proteins (Fig. 2) illustrate the spectrum of genes that have retained identical or similar functions and were probably present in the common ancestor of the spirochetes. As expected, certain core functions, such as translation, RNA modification, and replication, are hardly affected by gene loss. By contrast, the DNA repair systems and the transcription systems show major differences, probably due to a combination of lineage-specific loss and horizontal gene transfer driven by different selective forces acting on the two organisms. The core of the glycolytic pathway and associated metabolic steps show a predominantly vertical inheritance. These systems with high OC values are very similar to their counterparts in other bacteria, which suggests a typical bacterial core physiology of the common ancestor. The presence of V-type ATPase operons in both spirochetes indicates that, like Thermus and Enterococcus (55), the common ancestor of the spirochetes used H+/Na+ exchange for ion gradient generation. The signal transduction systems in general show low OC values, but the elements involved in chemotactic response are conserved. Together with the conservation of the flagellar apparatus (Fig. 2), this suggests an actively motile chemoresponsive ancestral spirochete that used a mode of locomotion similar to that of the two extant species.
Even within the conserved systems, there are some striking cases of xenologous and nonorthologous gene displacement that are suggestive of horizontal transfer from distantly related bacteria and archaea. Some of these appear to have occurred in the common ancestor of the spirochetes, whereas others are lineage specific. Examples of apparent ancient gene transfers include the acquisition of certain aminoacyl-tRNA synthetases and cytidylate kinase from archaea. Likely xenologous displacements in one of the spirochete lineages were seen in a number of cases, such as fructose-1,6-bisphosphate aldolase, glyceraldehyde 3-phosphate dehydrogenase, glucose 6-phosphate dehydrogenase subunits, 6-phosphogluconate dehydrogenase, and topoisomerase I (Fig. 3). Other examples of likely gene exchange with distant bacteria include the small, primase-like TOPRIM domain protein (10), which is otherwise seen only in low-GC gram-positive bacteria and in archaea, and the unusual HTH proteins shared with Bacillus, Thermotoga, and Helicobacter. These observations suggest that, unlike the extant forms that lead a relatively isolated existence due to their specialized life cycles, the ancestors of B. burgdorferi and T. pallidum, both before and after their divergence from each other, existed in close proximity with other prokaryotes, which favored gene exchange.
There is considerable evidence of likely acquisition of genes from eukaryotic, and possibly animal, sources by the common ancestor of the spirochetes; on most occasions, such gene acquisitions apparently have been accompanied by displacement of the endogenous bacterial versions. Several of the aminoacyl-tRNA synthetases exemplify this phenomenon (78). Other striking cases include the YWTD domain proteins and the secreted vWA domain proteins, which until now appeared to be specific to animals, as well as predicted secreted proteins containing TPR repeats. Thus, it appears that the common ancestor of Treponema and Borrelia was already in contact with animal hosts. However, the evidence of gene exchange with other bacteria, archaea, and perhaps eukaryotic protists (see above) suggests that they were not specialized obligate parasites isolated from the rest of the microbial community. The lifestyle predicted for the common ancestor of B. burgdorferi and T. pallidum is consistent with that of the spirochetes in the microbial community in invertebrate guts (21, 41). These spirochetes are not only in contact with an animal host but also share the niche with other bacteria, methanogenic archaea, and eukaryotic protists. The alternative hypothesis, that the common ancestor of Treponema and Borrelia was a free-living spirochete and that the bacteria of these two lineages have independently established symbiosis with the eukaryotic host, cannot be ruled out but seems to be less strongly supported by the evidence. Indeed, many of the likely cases of horizontal gene transfer from eukaryotes are shared by the two spirochetes, which strongly suggests that their ancestor was already in contact with a eukaryotic host.
Subsequent to the divergence from their common ancestor, each lineage independently acquired the ability to infect vertebrate hosts. This is indicated by the presence of very different variant antigens in the two genera, a feature that might have enabled these organisms to evade the acquired immunity of the vertebrate hosts. The possibility that Borrelia has moved from arthropod parasitism to vertebrate parasitism is suggested by the presence of specific carbohydrate metabolism genes that provide for the utilization of arthropod chitin by this bacterium. The presence of commensal treponemes in vertebrates (21) suggests that the ancestors of T. pallidum were well adapted for life in a vertebrate host before the final step in the evolution of pathogenicity, which might have been accompanied by the acquisition of specific variant antigens to evade the host immune system. T. pallidum lacks many of the carbohydrate metabolism genes and a functional PTS system; these systems might have been lost subsequent to its displacement from the carbohydrate-rich niches occupied by the ancestral treponeme, such as the vertebrate oral cavity.
By means of local sequence similarity searches, profile searches, and analysis of individual domains and protein families, we have conducted a detailed comparative analysis of the genomes of the spirochetes B. burgdorferi and T. pallidum. The level of conservation between functional classes and paralogous families of proteins was measured using the orthology coefficient. Using this measure, it was possible to characterize, in functional terms, the nature of the divergence between the two spirochetes and the common and distinct aspects of their physiological strategies. Protein profile searches resulted in the identification of hitherto undetected components of the signal transduction machinery and novel proteins containing domains previously seen only in eukaryotes. Secreted proteins containing these domains might mediate interactions between the spirochetes and host cells or the extracellular matrix. It appears possible to tentatively reconstruct the evolutionary steps leading from a common ancestor that might have been an invertebrate gut symbiont to the divergence of the two genera of spirochetes and adaptation to their specific niches.
The list of T. pallidum and B. burgdorferi orthologs classified by functional categories and by protein families is available by anonymous ftp at ftp://ncbi.nlm.nih.gov/pub/Koonin/Spirochetes.