Orthology coefficient: functional classes and protein families.
The proteins encoded by the genomes can be broadly categorized on the basis of either (predicted) cellular function (functional class) or shared conserved domains (protein families). Orthologs are genes (proteins) in different genomes that are related by vertical descent; i.e., they can be traced back to a common ancestor. Such orthologous proteins typically have similar, if not identical, biological functions in the respective organisms (71). Paralogs, in contrast, are genes related by duplication within a lineage (genome); paralogs are expected to possess generally similar biochemical functions but distinct biological roles. Given these differences in functional implications, careful delineation of orthologous and paralogous relationships between genes is of primary importance for predicting functions of proteins with an appropriate degree of precision. We adopted a quantitative measure of orthology that we termed the orthology coefficient (OC). This index can be applied to different protein categories, namely, superfamilies related by descent as opposed to functional classes of proteins, such as those involved in transcription or replication. In the simplest case of a one-to-one correspondence between genes in two genomes, OC = 2No/(N1 + N2), where No is the number of orthologs and N1 and N2 are the numbers of members of the given protein family or functional category in the two compared genomes (Fig. ). If the gene complements in the two genomes are completely conserved, then the OC is 1.00; by contrast, in the hypothetical case where there are no orthologs, the OC is 0. If there is one or more duplications in one or both of the species that occurred after their divergence, then OC = (No1 + No2)/(N1 + N2), where No1 and No2 are the numbers of members in orthologous clusters from the two respective genomes (Fig. ). Criteria used to identify orthologs included symmetry of retrieval in reciprocal BLAST searches, conservation of the domain organization of proteins, and, in problematic cases, phylogenetic tree analysis (71).
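The OC definition can be illustrated with a minimal Python sketch. It assumes that the duplication case sums the members of orthologous clusters from both genomes, OC = (No1 + No2)/(N1 + N2), which reduces to 2No/(N1 + N2) when No1 = No2 = No. The function names and the example counts are hypothetical and are not taken from the analysis itself.

```python
def orthology_coefficient(n_orth_1, n_orth_2, n_total_1, n_total_2):
    """Return OC = (No1 + No2) / (N1 + N2) for one protein category.

    n_orth_1, n_orth_2: members of orthologous clusters in genomes 1 and 2.
    n_total_1, n_total_2: all members of the category in genomes 1 and 2.
    The general form is an assumption; see the accompanying text.
    """
    counts = (n_orth_1, n_orth_2, n_total_1, n_total_2)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    if n_orth_1 > n_total_1 or n_orth_2 > n_total_2:
        raise ValueError("orthologs cannot outnumber category members")
    total = n_total_1 + n_total_2
    if total == 0:
        raise ValueError("category is empty in both genomes")
    return (n_orth_1 + n_orth_2) / total


def orthology_coefficient_one_to_one(n_orthologs, n_total_1, n_total_2):
    """One-to-one case: OC = 2No / (N1 + N2)."""
    return orthology_coefficient(n_orthologs, n_orthologs, n_total_1, n_total_2)


# Hypothetical counts, chosen only to illustrate the limiting cases.
print(orthology_coefficient_one_to_one(10, 10, 10))  # 1.0 (fully conserved)
print(orthology_coefficient_one_to_one(0, 5, 7))     # 0.0 (no orthologs)
print(orthology_coefficient(3, 5, 4, 6))             # 0.8 (with duplications)
```

Treating the one-to-one case as a special case of the general form keeps the two definitions consistent by construction.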
FIG. 1 OC as a quantitative measure of orthology between different types of protein categories. (A) In the case of a one-to-one correspondence between genes in two genomes, OC = 2No/(N1 + N2), where No is the number of orthologs and N1 and N2 are the ...
Conservation and diversity of functional classes of proteins between the two spirochetes.
Functional classes of proteins include core functions, such as replication and cell division, repair, transcription, and translation, that are shared by all bacteria and to a large extent also by archaea and eukaryotes, and more specialized functions, such as metabolism, metabolite transport, host and environment response, and signal transduction. Figure shows the representation of the functional classes in the two genomes and the level of orthology in each class measured using OC. In order to assess the effects of specific adaptations of each of the spirochetes to their distinct niches, investigation of classes with low OC values appears to be critical. Below, we discuss the distribution of functional classes in the two spirochetes along the gradient of OC values. A selection of prominent differences in the functional systems of the two spirochetes is shown in Table , and examples of apparent spirochete synapomorphies are listed in Table .
FIG. 2 OCs for functional classes of proteins. The annotated predicted open reading frames from the proteomes of B. burgdorferi and T. pallidum were grouped by the predicted functional class of the protein. Orthologous proteins within each functional class were ...
TABLE 1 Selected prominent differences in the functional systems of the two spirochetes
TABLE 2 Likely spirochete synapomorphies
Highly conserved functional classes with OCs close to 1.
Not surprisingly, the core functional classes, namely, DNA replication and translation, had OCs of 0.94 and 0.98, respectively, which is indicative of largely vertical inheritance (Fig. ). The limited differences observed in these conserved functional classes could be attributed to interesting cases of horizontal gene transfer, gene loss, or duplication with divergence.
Differences in the gene complement involved in DNA replication include the lack of topoisomerase IV (parC) in T. pallidum, which is of interest given the conserved role of these proteins in chromosome partitioning in other bacteria (77). Conversely, the proofreading 3′-5′ exonuclease (the ε subunit of DNA polymerase III; e.g., the E. coli DnaQ protein), whose role in DNA replication seems to be conserved in all other bacteria (29), is missing in B. burgdorferi. The absence of this enzyme would result in error-prone replication, which seems to be compatible with the generation of new variants through mutation in the genes encoding variable antigens of Borrelia. An alternative is nonorthologous displacement (E. V. Koonin, A. R. Mushegian, and P. Bork, Letter, Trends Genet. 12:334–336, 1996), that is, an unrelated or distantly related and so far unidentified exonuclease being recruited for the proofreading role. Interestingly, T. pallidum encodes a second copy of the single-stranded DNA-binding protein (TP0310), which might have a distinct function in replication or recombination.
In addition to these striking differences, we detected an interesting case of apparent displacement of an existing gene by its ortholog from a distant species, probably via horizontal gene transfer. Sequence searches showed that topoisomerase I (Topo I) from B. burgdorferi shares an insert within the TOPRIM domain (10) and a C-terminal repeat domain with actinomycetes and cyanobacteria. These features are lacking in other bacterial lineages and appear to be derived, shared characters (synapomorphies) within this protein superfamily. Phylogenetic analysis using multiple alignment of the conserved region of bacterial Topo I strongly supported grouping of B. burgdorferi with actinomycetes and cyanobacteria (Fig. [bootstrap value, >90%]). This phylogenetic affinity of Topo I is in stark contrast to the typical clustering of the other spirochete proteins, particularly those from the core functional classes, and suggests horizontal gene transfer into the B. burgdorferi lineage followed by loss of the ancestral form. This phenomenon, which may be termed xenologous gene displacement, that is, displacement by an orthologous gene from a phylogenetically distant source (J. P. Gogarten, Letter and comment, J. Mol. Evol. 39:541–543, 1994), is seen also in several other instances in the two spirochetes (see below).
FIG. 3 Phylogenetic trees for representative examples of xenologous gene displacement and horizontal gene transfer in the spirochetes. The trees were constructed using alignments generated with CLUSTALW, followed by manual adjustments based on PSI-BLAST outputs. ...
In spite of the high level of conservation in the translation apparatus, we did detect a few interesting examples of evolutionary divergence between the two species. One notable case was the prolyl-tRNA synthetase of B. burgdorferi, for which phylogenetic analysis strongly suggested acquisition from a eukaryotic source by the xenologous displacement mechanism (78). The other aminoacyl-tRNA synthetases are strongly conserved between the spirochetes, but phylogenetic analysis provided evidence for likely ancient horizontal transfers into the ancestor of spirochetes accompanied by displacement of the original gene. One well-studied case in this category is class I lysyl-tRNA synthetase, which apparently was acquired from the archaea (37); similar evidence of archaeal origin was obtained for the two subunits of phenylalanyl-tRNA synthetase (78). By contrast, the synthetases for arginine, glutamate, methionine, isoleucine, and serine showed evidence of likely horizontal transfer from eukaryotic sources (78).
Functional classes with intermediate OCs (0.6 to 0.8).
The gene complements involved in repair, recombination, and transcription show significant differences between the two spirochetes, which is reflected in OC values of less than 0.8 (Fig. ). A number of important repair genes are different in the two genomes. Thus, B. burgdorferi encodes the helicase-nuclease complex RecBCD, whereas T. pallidum lacks these genes but has instead a complement of recFGNR genes. Another repair enzyme unique to T. pallidum is the ERCC3-like eukaryotic-type DNA repair helicase that is also present in mycobacteria (54). It is likely that this gene was horizontally transferred from eukaryotes into a certain bacterial lineage and subsequently disseminated among the bacteria. The triplication of the C-terminal BRCT domain of DNA ligase in T. pallidum is a unique feature that contrasts with the similarly located single copy of this domain present in all other bacteria (7). The RecQ helicase gene is found in T. pallidum but not in B. burgdorferi. Conversely, B. burgdorferi but not T. pallidum retains the ancient duplication of the mutS gene, which is also seen in other bacteria. Interestingly, B. burgdorferi lacks RuvC, the endonuclease subunit of the Holliday junction resolvase complex, which suggests either that the Ruv complex is nonfunctional or that a distinct nuclease complements this function. Both spirochetes possess a novel mismatched-base DNA glycosylase (TP0229/BB0013), a base excision repair enzyme, orthologs of which are conserved in a number of bacteria and in all archaea (60), but the B. burgdorferi version is highly divergent from all other bacterial orthologs. The major disparities in the repertoires of repair proteins suggest a difference in the selective forces that act on this system in the two spirochetes. The fact that B. burgdorferi undergoes antigenic variation (58) that may require both active recombination and error-prone repair provides a possible explanation for the retention of only certain of its repair pathways.
The intermediate OC of the transcriptional apparatus seems to reflect a bimodal effect of selective evolutionary forces. The core transcriptional machinery, which includes RNA polymerase subunits and components of the elongation and termination complexes, is highly conserved, whereas the repertoires of specific transcriptional regulators are largely different (Fig. ). Within the shared heritage, there are some unique features that are likely to have been derived in the common ancestor of these spirochetes and may be considered signatures of this clade. In particular, the spirochete NusA protein (a part of the transcription termination complex) contains a C-terminal Zn ribbon domain in place of the helix-hairpin-helix (HhH) domain that is found in most other bacteria (data not shown). Similarly, the transcription elongation factor GreA (40) in both spirochetes possesses a long N-terminal extension which is shared only with chlamydiae (69). An orthologous pair of spirochete proteins (TP0511 and BB0355) are homologous to the transcription factor CarD from Myxococcus and share a domain with the transcription repair-coupling helicase (TRCF) (Fig. A). Since the corresponding region in TRCF interacts with the RNA polymerase holoenzyme (63), these proteins are likely to be novel transcription factors, characteristic of the spirochete clade, that modulate transcription by interacting directly with the RNA polymerase.
FIG. 4 Multiple alignments of previously uncharacterized protein domains discovered in the course of comparative genome analysis of the spirochetes. (A) Resolvase-type HTH domains identified in large proteins containing an N-terminal signal peptide. (B) RNA ...
Major differences between the two spirochetes were revealed even among the sigma factors. While they share σ70 (RpoD) and σ54 (RpoN), B. burgdorferi also encodes an RpoS ortholog, whereas T. pallidum encodes three others, including σ24 and σ43 (SigA). The three sigma factors present in T. pallidum but not in B. burgdorferi are likely to be involved in regulating multiple gene sets specific to the former. Of particular interest in this context is σ24, whose orthologs in other bacteria regulate gene batteries associated with stress response and pathogenesis (1). Consistent with the detection of multiple sigma factors, T. pallidum encodes elements of the system of sigma factor regulators similar to that seen in Bacillus and the actinomycetes. These regulators include two phosphatases of the PP2C family and an anti-sigma factor antagonist (83), which suggests that T. pallidum senses a distinct environmental signal regulating the anti-sigma factor antagonist. However, T. pallidum does not encode an ortholog of the small serine kinase of the histidine kinase class (RsbW protein), which is an essential component of the anti-sigma factor regulatory system in other bacteria (83). If this system is to be functional, it must include a kinase other than conventional histidine or serine/threonine kinases, as none of them, with the exception of CheA (which is not known to participate in this pathway), are detectable in T. pallidum.
Each of the spirochetes encodes several predicted specific transcriptional regulators of the helix-turn-helix (HTH) class of DNA-binding proteins. In addition, B. burgdorferi encodes a member of the MetJ/Arc family of β-sheet-containing transcription factors (BBD22) on its linear plasmid. With the exception of a single, novel family of HTH proteins, none of these likely transcriptional regulators are orthologous in the two spirochetes, which is consistent with the diversification of the possible target metabolic pathway operons (see below). The shared HTH proteins are an unusual group of two orthologous protein pairs (TP0711/BB0265 and TP0408/BB0512) that contain a C-terminal HTH domain similar to the PAIRED domains of resolvases (Fig. B). In addition, they have a hydrophobic N-terminal region resembling a signal peptide that may be responsible for an unexpected membrane association and processing associated with their function. Orthologs of these proteins with the same domain arrangement are detectable in H. pylori, Bacillus subtilis, and Thermotoga maritima. As an example of the diversified transcriptional regulation for similar functions, T. pallidum encodes a winged-HTH protein, TroR (34), whereas B. burgdorferi encodes the unrelated Fur protein (75), both of which are predicted to negatively regulate metal uptake. Two HTH-containing regulators of sugar metabolism, XylR-1 and XylR-2, that combine the HTH domain with the Hsp70-like sugar-kinase domain are present only in B. burgdorferi.
Functional classes with low OCs (<0.6): signaling.
The signal transduction systems in T. pallidum and B. burgdorferi show low OC values (Fig. ), with only a few conserved features that seem to be remnants of the signaling apparatus of the common ancestor of the spirochetes. This shared core includes components of chemotactic signaling based on the histidine kinase CheA, which contains a C-terminal CheW domain (Fig. ), and associated downstream signaling components, such as methyl-accepting and methyltransferase chemotaxis proteins. Even within this system, however, B. burgdorferi possesses specific methylesterase, methyltransferase, and methyl-accepting proteins with no counterparts in T. pallidum (Fig. ). Furthermore, B. burgdorferi has a larger representation of the two-component relay systems, with two histidine kinases in addition to the two CheA paralogs and multiple additional receiver domain proteins. These histidine kinases are likely to function as sensors of different stimuli, since one of them contains a PAS domain that has been implicated in redox sensing and the other one contains a large, uncharacterized extracellular (periplasmic) domain (Fig. ). Cross-talk between these kinases and other signaling systems is suggested by the fusion of the receiver domain of the two-component system with other signaling domains, such as methylesterase and diguanylate cyclase/phosphodiesterase. Of additional interest is the presence, in both spirochetes, of CheX proteins (TP0365/BB0671) (32). These proteins are homologs of the FliM protein of E. coli, which appears to provide a link between chemotactic signaling and the flagellar motor (46).
FIG. 5 Differences and conservation in the domain architectures of proteins involved in signaling in B. burgdorferi and T. pallidum. Proteins unique to either organism are shown in the outer panels; the middle panel represents proteins conserved in the two organisms. ...
The phosphoenolpyruvate:sugar phosphotransferase system (PTS) (59) is fully represented in B. burgdorferi (Fig. ). It includes multiple EII enzymes with different sugar specificities along with other PTS proteins, such as HPr and phosphoenolpyruvate-protein phosphotransferase. By contrast, T. pallidum encodes an unusual motley of PTS proteins, which includes the phosphocarrier protein HPr and its kinase, similar to the one involved in catabolite repression in gram-positive bacteria (56), and a single protein containing an EIIA domain, but not the other PTS components (Fig. ). It is possible that these remnants of the PTS in T. pallidum are involved in a novel signal transduction mechanism.
T. pallidum possesses a well-developed cyclic AMP (cAMP) signaling system that is lacking in B. burgdorferi (Fig. ). The presence, in T. pallidum, of four cAMP-binding domain-containing proteins and a classical “eukaryote-type” adenylyl cyclase similar to those present in some other bacteria, such as cyanobacteria, myxobacteria, and actinomycetes, indicates a prominent role for this system in signal transduction. The domain architectures of these proteins suggest functional diversity; these include duplication of the cAMP-binding domain (TP0089) and fusions to an HTH DNA-binding domain (TP0262) and to an alpha-helical domain distantly related to the tetratricopeptide repeats (TPR) (TP0261) (Fig. ). Furthermore, diguanylate cyclase/phosphodiesterase from T. pallidum contains a GAF domain that in some instances binds cyclic nucleotides (12). This protein might participate in cross-talk between cyclic nucleotide signaling and cyclic diguanylate signaling. The adenylyl cyclase of T. pallidum (TP0485) contains a HAMP (for histidine kinases, adenylyl cyclases, methyl-accepting chemotaxis receptors, and phosphatases) domain, a recently described module that is associated with the cytoplasmic face of several bacterial signaling proteins (11). This domain is predicted to transmit extracellular stimuli sensed by these receptor proteins to the respective intracellular signaling domains. All bacteria that contain this eukaryotic-type cAMP-dependent signaling system lack a counterpart to the eukaryotic cyclic nucleotide phosphodiesterase and, in addition, lack the proteobacterial-type phosphodiesterase Icc (44). However, T. pallidum encodes a predicted membrane protein with six transmembrane helices that contains a metal-dependent phosphodiesterase domain of the HD superfamily, which is distantly related to eukaryotic phosphodiesterases (9). This protein is a candidate for the cAMP phosphodiesterase function. B. burgdorferi lacks classical adenylyl cyclases and cAMP-binding proteins but encodes an unrelated adenylyl cyclase (BB0723) that belongs to a recently described class of adenylyl cyclases found in thermophilic archaea and Aeromonas hydrophila. The function of this protein in signal transduction remains unclear in the absence of any detectable downstream effector molecules that would recognize cAMP. It cannot be ruled out that a new class of cAMP-binding domains shared by archaea, eukaryotes, and some bacteria, including B. burgdorferi, remains to be discovered.
Most bacteria encode the SpoT protein, which is a guanosine-3′,5′-bispyrophosphate (ppGpp) 3′-pyrophosphohydrolase (62). This key enzyme is responsible for cellular (p)ppGpp degradation, which reverses ppGpp accumulation during the stringent response to amino acid deprivation. While B. burgdorferi possesses a SpoT ortholog (BB0198) (Fig. ), with its characteristic catalytic HD domain (9) and regulatory TGS (78) and ACT (8) domains, T. pallidum surprisingly lacks this protein. The parasitic lifestyle of T. pallidum may differ from that of Borrelia, resulting in the loss of this stringent response signaling system. Of the published genomes of bacteria, only the two chlamydial species (38) and T. pallidum lack this enzyme, whereas Rickettsia prowazekii encodes a degenerate and probably inactive version of the enzyme (5). This may reflect the superfluous nature of stringent response regulation in the nutrient-rich environments of these pathogens.
Functional classes with low OCs (<0.6): metabolic pathways and transport and other membrane components.
Consistent with the picture emerging from other pathogenic bacteria, the genomes of the two spirochetes are noticeably reduced in terms of genes coding for metabolic enzymes, which suggests a general degeneration of metabolic systems that most likely were represented in their common ancestor. However, even in this degenerate state, there are striking differences in the predicted metabolic pathways between T. pallidum and B. burgdorferi, suggesting different adaptation strategies. The common metabolic heritage is largely restricted to the trunk of the glycolytic Embden-Meyerhof pathway, the phosphorylation steps of nucleotide interconversion, and a system for membrane potential generation, namely, the multisubunit V-type ATPase. The latter is the principal energy-generating ATPase in archaea; in bacteria, the V-ATPase subunits are encoded in a single operon that appears to be evolutionarily mobile and shows a scattered distribution, being found, apart from the spirochetes, in lineages that include Chlamydia and Thermus.
As already discussed for other, more conserved pathways, even within the shared core of metabolism there are instances of likely horizontal transfer and associated xenologous and nonorthologous (E. V. Koonin et al., Letter, Trends Genet. 12:334–336, 1996) gene displacement. Apparent lateral gene transfer accompanied by displacement is illustrated by fructose-1,6-bisphosphate aldolase and the glyceraldehyde 3-phosphate dehydrogenase that belong to the central glycolytic pathway. Phylogenetic analysis of the aldolases from the two spirochetes showed that, while both of them are class II aldolases, they belong to two entirely distinct, well-supported groups, which suggests independent acquisition via lateral gene transfer from other bacteria (Fig. ). In phylogenetic trees, glyceraldehyde 3-phosphate dehydrogenase from T. pallidum groups strongly with the orthologs of this protein from kinetoplastid eukaryotes (Fig. ). This points to horizontal gene transfer from this eukaryotic lineage to T. pallidum after the divergence of the two spirochetes, followed by displacement of the original spirochete version. An even more interesting case is T. pallidum uridine kinase, which not only shows the unexpected grouping with the ortholog from Thermotoga in phylogenetic trees but also possesses a unique domain architecture, with two domains, including the TGS domain, apparently acquired from threonyl-tRNA synthetases (78). In the threonyl-tRNA synthetases, these domains play a role in RNA binding (61), which suggests that these unique uridine kinases could regulate their own expression by binding to mRNA. Another notable case of likely horizontal transfer followed by gene displacement involves a pair of closely linked genes in an operon, namely, glucose 6-phosphate dehydrogenase (G6-PDH) and the adjacent gene that encodes the DevB protein. In T. pallidum, both of these proteins are closely related to their counterparts from Haemophilus rather than to those from B. burgdorferi. Phylogenetic analysis (Fig. ) provides strong bootstrap support for this lineage and suggests that the entire operon comprising G6-PDH and DevB has been exchanged between T. pallidum and the proteobacterial lineage.
Even among the genes encoding components of metabolic pathways that have been vertically inherited by the two spirochetes, there are footprints of likely ancient horizontal transfer events. Horizontal transfer from eukaryotes to spirochetes is attested by the presence of a pyrophosphate-dependent phosphofructokinase (BB0020, TP0542) that otherwise is characteristic of plants and several protists, such as Entamoeba and Giardia. This enzyme is present in the spirochetes along with the original bacterial ATP-dependent phosphofructokinase (BB0727, TP0108) in the glycolytic pathway, suggesting functional partitioning of the two versions.
Each of the spirochetes encodes two cytidylate kinases. One of these appears to have been acquired by the common ancestor of the spirochetes from the archaea. This is strongly suggested by their highly significant relationship to the archaeal proteins in phylogenetic tree analysis (Fig. ) and the modified P-loop ATPase motif (data not shown) that is uniquely shared by these proteins with their archaeal counterparts.
Both spirochetes encode an aminopeptidase M containing an N-terminal insert sequence that is specifically shared with eukaryotes; phylogenetic analysis indicates that it might have been horizontally transferred into the ancestral spirochete from a eukaryote (Fig. ).
Some of the drastic differences in the gene complements for metabolic processes seem to reveal adaptive features unique to each spirochete. The ability to utilize glycerol and chitin as energy sources through the glycolytic pathway, which is unique to B. burgdorferi, is consistent with the existence of an arthropod-specific stage in the life cycle of this spirochete. This is supported by the presence of genes for glycerol uptake and the FAD-dependent glycerol-3-phosphate dehydrogenase (BB0243), the latter apparently being an unusual case of horizontal transfer from a eukaryotic source (Fig. ). Utilization of N-acetylglucosamine polymers or chitin (available in the arthropod host) as a substrate for cell wall biosynthesis (glucosamine deacetylase, NagA; BB0151) and/or as an energy source (glucosamine deaminase, NagB; BB0152) could represent an adaptation of B. burgdorferi for survival in its specific niche, in this case the arthropod host. The presence of an additional pathway for conversion of dihydroxyacetone phosphate into lactate via a methylglyoxal intermediate also could be an adaptation for growth under certain environmental conditions and carbohydrate sources (28).
Another interesting metabolic feature of B. burgdorferi inferred from sequence analysis is the presence of a pathway for the biosynthesis of isopentenyl pyrophosphate (IPP). All of the enzymes necessary for the formation of IPP from 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA), along with a novel glycolate oxidase-like, TIM barrel fold oxidoreductase (BB0684) that could participate in the same process, are encoded in a single long operon. With the exception of this oxidoreductase, the proteins of this pathway show eukaryotic or archaeal affinities in phylogenetic analysis (data not shown), with no closely related proteins in bacteria. Taken together with the fact that Pseudomonas mevalonii is the only other bacterium to date that encodes an orthologous HMG-CoA reductase (14), these observations suggest that this pathway has been acquired by B. burgdorferi via horizontal transfer from either an archaeal or a eukaryotic source. The enzymes encoded in this unusual operon are likely to participate in the synthesis of still uncharacterized membrane/lipoprotein components in B. burgdorferi.
The repertoires of transporters encoded by the two spirochetes are significantly different and, with an OC of 0.38, are among the most divergent functional classes revealed in our analysis; this suggests distinct substrate requirements. In this class, it is notable that, unlike B. burgdorferi, T. pallidum appears to lack a transport mechanism for proline or glycine-betaine. Given the proposed osmoprotective role of proline and glycine-betaine in bacterial stress response (4), the presence of a full complement of genes for proline biosynthesis in T. pallidum likely serves as an adaptive response to osmotic stress.
Finally, the surface lipoproteins and other, uncharacterized membrane proteins of the two spirochetes show very low OC values (Fig. ), which is compatible with the different modes of parasite-host interaction.
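The grouping of functional classes along the OC gradient can be summarized in a short Python sketch. The cutoffs follow the section headings (low, below 0.6; intermediate, 0.6 to 0.8); treating every value of 0.8 or above as "high" is an assumption made here for illustration, as is the function name. The three OC values are those quoted in the text for DNA replication, translation, and transporters.

```python
def oc_band(oc, low_cutoff=0.6, high_cutoff=0.8):
    """Assign an OC value to a conservation band.

    The 'high' band (>= high_cutoff) is an illustrative assumption;
    the text only describes it as OCs close to 1.
    """
    if not 0.0 <= oc <= 1.0:
        raise ValueError("OC must lie between 0 and 1")
    if oc < low_cutoff:
        return "low"
    if oc < high_cutoff:
        return "intermediate"
    return "high"


# OC values quoted in the text for three functional classes.
reported = {"DNA replication": 0.94, "translation": 0.98, "transporters": 0.38}
bands = {name: oc_band(value) for name, value in reported.items()}
print(bands)
# {'DNA replication': 'high', 'translation': 'high', 'transporters': 'low'}
```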
Individual protein families in the two spirochetes. (i) Conserved and variable families.
As an alternative approach to assessing the evolutionary forces acting on the spirochetes, we analyzed all of the largest families of paralogous proteins in the two genomes and calculated the OCs for each of them (Fig. ). As in other bacteria and archaea (39), the largest protein superfamily in the spirochetes includes P-loop-containing NTPases, with 86 members in B. burgdorferi and 69 in T. pallidum. Different families within this large superfamily show substantial variability in the degree of conservation. A number of ATPases, such as those involved in DNA replication, V-type ATPases, and GTPases, show a largely vertical inheritance pattern. By contrast, the ABC transporter ATPases show considerable diversity in the two spirochetes, which is consistent with the differences noted in nutrient transport and substrate specificity (see above). In B. burgdorferi, there is a specific expansion of the MinD family of ATPases (12 members), which are involved in chromosome partitioning; this is consistent with the presence of several cosegregating plasmids that comprise parts of the B. burgdorferi genome.
FIG. 6 OCs for protein families and superfamilies identified in the two spirochetes. Protein and domain (super)families within the two spirochete proteomes were identified using transitive BLASTP searches and family-specific PSSMs, as described in Materials ...
Compared to the P-loop ATPase superfamily, all other protein families are much smaller and show widely varying degrees of orthology (Fig. ). Some of these families, such as the pseudouridine synthases, the PDZ domain proteins, and lysostaphin-like proteases, show OC values close to 1, which suggests functional coherence between the two organisms. The two classes of aminoacyl-tRNA synthetases and SAM-dependent methyltransferases have slightly lower OC values, which is due to the likely horizontal transfers discussed above or to relatively minor lineage-specific gene losses. Other families, such as HD superfamily hydrolases (9), metallo-β-lactamase fold hydrolases (6), calcineurin-like phosphatases (7), and peptidyl-prolyl cis-trans isomerases, show much lower OC values, which indicates major effects of lineage-specific gene loss, horizontal transfer, and rapid diversification. These families, unlike those associated with translation and replication, appear to participate in diverse roles that are under constant selection due to variations in the niches occupied by the two spirochetes. Certain protein families, such as the P-methylase/lipoate synthetase family, are seen only in T. pallidum. This raises the possibility of distinct metabolic options that are absent in B. burgdorferi.
(ii) Newly identified protein families.
We identified several previously undetected protein families in the spirochetes by using family-specific PSSMs (see Materials and Methods). These include some secreted (or possibly periplasmic) and membrane proteins that could be involved in pathogen-host interactions. The most interesting of these are the von Willebrand factor A (vWA) domain-containing proteins (52), which, until recently, had been detected only in eukaryotes. The vWA domain is a Mg2+-binding protein module with an α/β fold that participates in adhesion and protein-protein interactions in a wide range of eukaryotic proteins, such as integrins (26). Using profile searches, we identified three vWA domain-containing proteins in each of the spirochetes (Fig. C). However, the vWA proteins from B. burgdorferi are not highly similar to those from T. pallidum, suggesting that they either have been independently acquired or have diverged rapidly due to selective forces acting on extracellular proteins. These secreted or membrane-associated vWA domain proteins are reminiscent of similar domains present in the extracellular adhesion molecule (TRAP) of a eukaryotic pathogen, the malarial genus Plasmodium. By analogy to TRAP, it is likely that the spirochete vWA proteins are involved in adhesion of these bacteria to the extracellular matrix or to cells of the host connective tissues. These proteins could be potential targets for specific antispirochete therapies. The similarity between the spirochete vWA domains and that of the LFA-1 integrin is of interest given the recent report identifying LFA-1 as an autoantigen in treatment-resistant Lyme arthritis (33).
Another interesting protein of B. burgdorferi is BB0689, a secreted or periplasmic protein containing a PR-1 domain (70). This domain hitherto has been detected only in eukaryotes, namely in plant pathogenesis-related proteins (PR-1) and in proteins expressed in the animal immune system such as GliPR (70). One of the human proteins in this family has been suggested to be a novel trypsin inhibitor (82), whereas other members, such as helothermine, found in the toxins of Helodermid lizards and certain snakes, act as inhibitors of potassium and calcium channels (17). Another protein of this family, TPX-1, is implicated in the interaction between spermatogenic and Sertoli cells (45). All of these observations indicate that PR-1 domain-containing proteins participate in diverse extracellular protein-protein interactions, suggesting a role for BB0689 in adhesion to host extracellular proteins and, accordingly, in pathogenesis.
Conversely, T. pallidum encodes an unusual, large, secreted (periplasmic) protein (TP0544) that contains an OB-fold domain (47). This protein might interact with host cells and act as a virulence factor like the OB-fold-containing toxins from a wide range of bacteria (25).
Both spirochetes encode proteins (BB0236 and TP0421) containing a modified version of the YWTD domain (67) that is seen in intracellular animal proteins, some of which also contain the RING finger domain (L. Aravind, unpublished data). This relationship to a class of animal proteins suggests lateral transfer from an animal host into the ancestor of the two spirochetes. YWTD domains assume a β-propeller structure similar, for example, to that in low-density lipoprotein receptors and in proteins from other parasites (e.g., Trypanosoma cruzi) that mediate extracellular adhesion (51). The presence of a predicted signal peptide suggests that, like the vWA and PR-1 domain-containing proteins, the spirochete YWTD domain proteins are secreted and interact with host cell surface molecules (52).
Another interesting family that is conserved between the two spirochetes, and some of whose members appear to be secreted, comprises the tetratricopeptide repeat (TPR) domain-containing proteins (24). These domains form an alpha-helical superstructure and mediate protein-protein interactions. The use of TPR motifs in extracellular interactions might be a novel strategy of host interaction employed by the spirochetes and is reminiscent of a family of secreted Sel-1 repeat proteins in Helicobacter pylori (L. Aravind, unpublished data).
To add to this repertoire of previously undetected proteins that could be important for the pathogen's interaction with host cell surfaces, T. pallidum encodes two proteins with multiple ankyrin repeat motifs (TP0502, TP0835) (Fig. ). At least TP0835, which contains 20 ankyrin repeats, has a canonical signal peptide, indicating that, unlike the eukaryotic ankyrin repeat proteins, this Treponema protein is secreted.
Evolutionary implications of comparative analysis of the two spirochete proteomes.
The use of the OC concept, along with a case-by-case analysis of individual protein sequences, has provided considerable information that allows us to address several evolutionary issues: (i) the amount of shared heritage present in the spirochete genomes, (ii) the likely gene repertoire and lifestyle of the common ancestor, (iii) the role of horizontal transfer, and (iv) the strategies and adaptations used by these organisms in their interaction with the host, which are important for pathogenesis.
Likely orthologs comprise about 43% of the total number of proteins encoded by the two spirochetes (39.5% [495 of 1,256] of the B. burgdorferi proteins and 47% [486 of 1,031] of the T. pallidum proteins). This fairly low fraction of orthologs indicates that lineage-specific gene loss, horizontal gene transfer, and rapid divergence account for more than one-half of the gene repertoires of the two spirochetes. The relatively lower fraction of orthologs in Borrelia is to a large extent due to the fact that multiple plasmids harbored by this spirochete encode a number of highly variable proteins without counterparts in Treponema. Genome comparisons between two more closely related bacteria, namely H. influenzae and E. coli, have indicated that lineage-specific gene loss was a major force in the evolution of the parasitic organism, in this case H. influenzae. Given that both spirochetes are obligate parasites, it is likely that the two lineages have lost genes largely independently, resulting in major differences in the complement of genes eventually retained. However, the number of genes lost by the spirochetes is lower than that seen in some other obligate parasites, such as the mycoplasmas, chlamydiae, and rickettsiae (5).
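The ortholog fractions quoted in this paragraph can be recomputed directly from the bracketed counts. The sketch below is only an arithmetic check. The counts are taken from the text, while the dictionary layout and function name are illustrative assumptions.

```python
# Counts quoted in the text: (likely orthologs, total encoded proteins).
ORTHOLOG_COUNTS = {
    "B. burgdorferi": (495, 1256),
    "T. pallidum": (486, 1031),
}


def ortholog_fractions(counts):
    """Return per-genome ortholog fractions and the pooled fraction."""
    per_genome = {
        name: orthologs / total for name, (orthologs, total) in counts.items()
    }
    pooled_orthologs = sum(orthologs for orthologs, _ in counts.values())
    pooled_total = sum(total for _, total in counts.values())
    return per_genome, pooled_orthologs / pooled_total


per_genome, pooled = ortholog_fractions(ORTHOLOG_COUNTS)
for name, fraction in per_genome.items():
    print(f"{name}: {fraction:.1%}")
print(f"pooled: {pooled:.1%}")
```

With these inputs the pooled fraction rounds to 43% and the T. pallidum fraction to 47%, in line with the text; the B. burgdorferi fraction evaluates to about 39.4%, marginally below the quoted 39.5%.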
The OC values for different functional classes of proteins (Fig. ) illustrate the spectrum of genes that have retained identical or similar functions and were probably present in the common ancestor of the spirochetes. As expected, certain core functions, such as translation, RNA modification, and replication, are hardly affected by gene loss. By contrast, the DNA repair systems and the transcription systems show major differences, probably due to a combination of lineage-specific loss and horizontal gene transfer driven by different selective forces acting on the two organisms. The core of the glycolytic pathway and associated metabolic steps show a predominantly vertical inheritance. These systems with high OC values are very similar to their counterparts in other bacteria, which suggests a typical bacterial core physiology of the common ancestor. The presence of V-type ATPase operons in both spirochetes indicates that, like Thermus, the common ancestor of the spirochetes used H+ exchange for ion gradient generation. The signal transduction systems in general show low OC values, but the elements involved in chemotactic response are conserved. Together with the conservation of the flagellar apparatus (Fig. ), this suggests an actively motile chemoresponsive ancestral spirochete that used a mode of locomotion similar to that of the two extant species.
Even within the conserved systems, there are some striking cases of xenologous and nonorthologous gene displacement that are suggestive of horizontal transfer from distantly related bacteria and archaea. Some of these appear to have occurred in the common ancestor of the spirochetes, whereas others are lineage specific. Examples of apparent ancient gene transfers include the acquisition of certain aminoacyl-tRNA synthetases and cytidylate kinase from archaea. Likely xenologous displacements in one of the spirochete lineages were seen in a number of cases, such as fructose-1,6-bisphosphate aldolase, glyceraldehyde 3-phosphate dehydrogenase, glucose 6-phosphate dehydrogenase subunits, 6-phosphogluconate dehydrogenase, and topoisomerase I (Fig. ). Other examples of likely gene exchange with distant bacteria include the small, primase-like TOPRIM domain protein (10), which is otherwise seen only in low-GC gram-positive bacteria and in archaea, and the unusual HTH proteins shared with Bacillus and Helicobacter. These observations suggest that, unlike the extant forms that lead a relatively isolated existence due to their specialized life cycles, the ancestors of B. burgdorferi and T. pallidum, both before and after their divergence from each other, existed in close proximity with other prokaryotes, which favored gene exchange.
There is considerable evidence of likely acquisition of genes from eukaryotic, and possibly animal, sources by the common ancestor of the spirochetes; on most occasions, such gene acquisitions apparently have been accompanied by displacement of the endogenous bacterial versions. Several of the aminoacyl-tRNA synthetases exemplify this phenomenon (78). Other striking cases include the YWTD domain proteins and the secreted vWA domain proteins, which until now appeared to be specific to animals, as well as predicted secreted proteins containing TPR repeats. Thus, it appears that the common ancestor of Treponema and Borrelia was already in contact with animal hosts. However, the evidence of gene exchange with other bacteria, archaea, and perhaps eukaryotic protists (see above) suggests that the ancestral spirochetes were not specialized obligate parasites isolated from the rest of the microbial community. The lifestyle predicted for the common ancestor of B. burgdorferi and T. pallidum is consistent with that of the spirochetes in the microbial community in invertebrate guts (21). These spirochetes are not only in contact with an animal host but also share the niche with other bacteria, methanogenic archaea, and eukaryotic protists. The alternative hypothesis, that the common ancestor of Treponema and Borrelia was a free-living spirochete and that the bacteria of these two lineages have independently established symbiosis with the eukaryotic host, cannot be ruled out but seems to be less strongly supported by the evidence. Indeed, many of the likely cases of horizontal gene transfer from eukaryotes are shared by the two spirochetes, which strongly suggests that their ancestor was already in contact with a eukaryotic host.
Subsequent to the divergence from their common ancestor, each lineage independently acquired the ability to infect vertebrate hosts. This is indicated by the presence of very different variant antigens in the two genera, a feature that might have enabled these organisms to evade the acquired immunity of the vertebrate hosts. The possibility that Borrelia has moved from arthropod parasitism to vertebrate parasitism is suggested by the presence of specific carbohydrate metabolism genes that provide for the utilization of arthropod chitin by this bacterium. The presence of commensal treponemes in vertebrates (21) suggests that the ancestors of T. pallidum were well adapted for life in a vertebrate host before the final step in the evolution of pathogenicity, which might have been accompanied by the acquisition of specific variant antigens to evade the host immune system. T. pallidum lacks many of the carbohydrate metabolism genes and a functional PTS; these systems might have been lost subsequent to its displacement from the carbohydrate-rich niches occupied by the ancestral treponeme, such as the vertebrate oral cavity.