We report analyses of 202 fully sequenced genomes for homologues of known protein constituents of the bacterial phosphoenolpyruvate-dependent phosphotransferase system (PTS). These included 174 bacterial, 19 archaeal, and 9 eukaryotic genomes. Homologues of PTS proteins were not identified in archaea or eukaryotes, showing that the horizontal transfer of genes encoding PTS proteins has not occurred between the three domains of life. Of the 174 bacterial genomes (136 bacterial species) analyzed, 30 diverse species have no PTS homologues, and 29 species have cytoplasmic PTS phosphoryl transfer protein homologues but lack recognizable PTS permeases. These soluble homologues presumably function in regulation. The remaining 77 species possess all PTS proteins required for the transport and phosphorylation of at least one sugar via the PTS. Up to 3.2% of the genes in a bacterium encode PTS proteins. These homologues were analyzed for family association, range of protein types, domain organization, and organismal distribution. Different strains of a single bacterial species often possess strikingly different complements of PTS proteins. Types of PTS protein domain fusions were analyzed, showing that certain types of domain fusions are common, while others are rare or prohibited. Select PTS proteins were analyzed from different phylogenetic standpoints, showing that PTS protein phylogeny often differs from organismal phylogeny. The results document the frequent gain and loss of PTS protein-encoding genes and suggest that the lateral transfer of these genes within the bacterial domain has played an important role in bacterial evolution. Our studies provide insight into the development of complex multicomponent enzyme systems and lead to predictions regarding the types of protein-protein interactions that promote efficient PTS-mediated phosphoryl transfer.
Four decades ago, Kundig et al. reported the discovery of a novel sugar-phosphorylating system in Escherichia coli (38). The unique features of this phosphotransferase system (PTS) included the use of phosphoenolpyruvate (PEP) as the phosphoryl donor for sugar phosphorylation and the presence of three essential catalytic entities, termed enzyme I, enzyme II, and HPr (heat-stable, histidine-phosphorylatable protein). The discovery of this system provided an explanation for pleiotropic carbohydrate-negative mutants of E. coli described as early as 1949 (20).
In 1964, the three recognized activities of the PTS were presumed to correspond merely to three proteins. We now know that dozens of PTS proteins are present in the E. coli cell and that thousands of PTS protein homologues occur in other bacteria. Numerous genes encoding these proteins have been fully sequenced, and their phylogenetic relationships have been described (29, 81; Nguyen et al., unpublished data).
The bacterial PTS catalyzes the concomitant transport and phosphorylation of its sugar substrates (62, 96). It is a complex system that consists of general cytoplasmic energy-coupling proteins, enzyme I (EI) and HPr, which lack sugar specificity, and membranous enzyme II complexes, each specific for one or a few sugars. The enzyme II complexes usually consist of three proteins or protein domains, namely, IIA, IIB, and IIC. However, the enzyme II complexes of one of the PTS families, the mannose family, have one additional membrane-spanning protein or domain, called IID (72). Phosphoryl relay proceeds sequentially from PEP to EI, HPr, IIA, IIB, and finally, the incoming sugar, which is transported across the membrane via the integral membrane IIC porter (Table 1 and Fig. 1A).
When the PTS was discovered, a single function was recognized, namely, sugar phosphorylation. Forty-one years later, we find that this system plays roles in many surprising aspects of bacterial physiology. Established primary functions of the system include sugar reception, transport, and phosphorylation, whereas secondary functions include a variety of ramifications for metabolic and transcriptional regulation (Table 2) (37, 39, 84, 86, 87, 99, 100; for reviews, see references 62, 76, 79, and 84).
Genetic evidence has indicated that in various bacteria, processes regulated by the PTS include (i) transport of non-PTS carbon sources (35), (ii) the net production of carbon and energy storage sources, such as poly-β-hydroxybutyrate (64, 90) and glycogen (91), (iii) the switch between fermentative and respiratory metabolism (36), (iv) flagellar motility (54), and (v) the control of σ54-dependent transcription of carbon and nitrogen metabolic genes (12, 46, 63, 70, 74). Some evidence implicates the PTS in the regulation of cell division (44, 63). Still other regulatory functions of the PTS and their physiological consequences are listed in Table 2.
In Escherichia coli, paralogues of EI, HPr, and the fructose IIA protein (IIAFru), designated nitrogen enzyme I (EINtr), nitrogen HPr (NPr), and nitrogen IIA protein (IIANtr), respectively, constitute a phosphoryl transfer chain (Fig. 1A) that has been shown to exhibit little enzymatic cross-reactivity with the classical sugar-transporting phosphoryl transfer chain consisting of EI, HPr, and various sugar-specific enzyme II complexes (65). This nitrogen-related phosphoryl transfer chain presumably functions only in regulation (46, 47, 63). EINtr homologues have been shown to cluster phylogenetically together, distantly from all other enzyme I homologues (29).
Phylogenetic data have shown that NPr in E. coli is a distant paralogue of HPr (29, 63). Sequence characteristics that distinguish NPr from HPr as well as EINtr from EI have been described previously (65). EINtr consists of two domains, an N-terminal domain of 127 amino acids homologous to the N-terminal “sensory” domain of the NifA protein of Azotobacter vinelandii (2) and a C-terminal domain of 578 amino acids homologous to all currently sequenced enzyme I proteins. EINtr may serve a sensory function linking carbon and nitrogen metabolism (69). A mutation of the orthologous EINtr-encoding ptsP gene of A. vinelandii resulted in impaired metabolism of poly-β-hydroxybutyrate as well as diminished respiratory protection of nitrogenase under carbon-limiting conditions (90). In addition to EINtr and NPr, E. coli encodes within its genome three additional EI paralogues and four additional HPr paralogues. The functions of most of these proteins are still unknown.
The biochemical detection of novel PTS proteins in bacteria as diverse as Ancalomicrobium adetum (85), Spirochaeta aurantia (83), Acholeplasma laidlawii (28), Listeria monocytogenes (50), and several antibiotic-producing species of Streptomyces (6, 104) suggests the involvement of PTS proteins in cellular processes distinct from those currently recognized. It is worth noting that other families of transport systems, such as the family of ATP-binding cassette (ABC)-type permeases (27) and the major facilitator superfamily (55), apparently do not participate in metabolic and transcriptional regulation to the extent observed for the PTS.
Recently, a nontransporting enzyme II complex was characterized in E. coli that phosphorylates dihydroxyacetone (DHA) at the expense of PEP, using three soluble DHA-specific proteins in addition to EI and HPr (25). The three components of the DHA enzyme II complex are designated DhaK, DhaL, and DhaM. The DhaM protein of E. coli is a tridomain protein consisting of an N-terminal IIADha domain that is distantly related to IIAMan, a central HPr domain, and a C-terminally truncated EI domain. All three domains have been shown to be phosphorylated, using PEP and the classical enzyme I and HPr proteins as phosphoryl donors. The various currently recognized phosphoryl relay chains of the PTS are depicted in Fig. 1A.
A bifunctional HPr kinase/phosphorylase (HprK) that catalyzes the phosphorylation of HPr at Ser-46 at the expense of ATP (18, 53, 66) as well as the dephosphorylation of P-Ser-HPr (49) was discovered in Streptococcus pyogenes (19). It plays an important regulatory role in at least three different cellular processes: (i) sugar uptake via the PTS, (ii) catabolite control protein A (CcpA)-mediated carbon catabolite repression, and (iii) inducer control via expulsion and exclusion mechanisms (for reviews, see references 7, 78, and 79). HprK homologues from gram-positive as well as gram-negative bacteria have been proposed to carry out different functions (7, 29, 97). Detailed phylogenetic analyses of HprK homologues from several bacteria have recently been presented (97).
The currently recognized structural and functional complexity of the PTS is impressive (Tables 1 and 2), but the available evidence suggests that its functional ramifications are only now beginning to be realized (7, 23). Only half of the PTS proteins recognized in E. coli have been functionally characterized (101), and as revealed by the present genomic analyses, almost all PTS proteins in other organisms are uncharacterized.
Based on the phylogeny of the IIC proteins, seven PTS permease families are currently recognized, namely, the (i) glucose (including glucoside) (Glc), (ii) fructose (including mannitol) (Fru), (iii) lactose (including N,N-diacetylchitobiose) (Lac), (iv) galactitol (Gat), (v) glucitol (Gut), (vi) mannose (Man), and (vii) L-ascorbate (Asc) families. Various sugar substrates transported by the few functionally characterized members of each of these families are presented in Table 3.
Proposed evolutionary origins of the different PTS families have been discussed (81). Briefly, the substrate-recognizing protein constituents of the PTS (enzymes IIC) are derived from at least four independent sources. Some of the non-PTS precursor constituents have been identified, and the evolutionary pathways taken have been proposed (81). Analyses suggest that two of these independently evolving systems (Gat and Dha) are still in transition, as they have not yet acquired the full-fledged characteristics of PTS enzyme II complexes. The mosaic nature of PTS enzyme II complexes has also been documented (81).
The last few years have witnessed an explosion in the amount of sequence data available for analysis. Genome sequencers sometimes deposit new sequences into databases without rigorous scrutiny. Incorrect or imprecise annotations of genes and gene products not only obscure important genomic information but also lead to the proliferation of erroneous annotations of other genomes. Frequently, sequencing errors conceal important open reading frames (ORFs), and distant phylogenetic relationships are not noticed. Here we report our computational analyses of several completely sequenced genomes.
A total of 202 genomes were screened for the presence of homologues of all currently known constituents of the bacterial phosphotransferase system. These proteins include the general energy-coupling proteins EI and HPr, the IIA, IIB, IIC, and IID constituents of the enzyme II complexes, the DHA PTS proteins (DhaM, DhaL, and DhaK), and HprK. We used criteria established by Bächler (3) to detect homologues of the E. coli dihydroxyacetone PTS proteins (25). GenBank sequence identification (GI) numbers are provided for the proteins discussed in the text.
All completely sequenced genomes were obtained from NCBI (ftp://ftp.ncbi.nih.gov/genomes/). A standalone version (release 2.0.10) of BLAST (1) was obtained from ftp://ftp.ncbi.nih.gov/blast/ and was employed to identify PTS homologues, using 57 functionally characterized PTS proteins obtained from the transporter classification database (TCDB; http://www.tcdb.org) as query sequences against each of the completely sequenced genomes. Over 3,000 protein homologues were retrieved and analyzed. Previous annotations were ignored, and all proteins were reexamined. Protein domains were identified and assigned using the NCBI CD-Search tool (42). Such rigorous scrutiny often revealed errors in previous annotations.
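The screening step described above can be illustrated with a minimal sketch. The text does not state the BLAST output format or the score cutoffs that were used, so the 12-column tabular layout, the function name `candidate_homologues`, and the `max_evalue` threshold below are all assumptions made for illustration only.

```python
import csv
import io

# Standard 12-column order of BLAST tabular output (assumed format).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_homologues(tabular_text, max_evalue=1e-4):
    """Return {subject_id: best_hit} for hits at or below max_evalue.

    max_evalue is an illustrative assumption, not a cutoff from the study.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        subject = hit["sseqid"]
        # Keep only the most significant query hit for each genome protein.
        if subject not in best or evalue < float(best[subject]["evalue"]):
            best[subject] = hit
    return best
```

Each retained protein would still need the manual reexamination and domain assignment described in the text; a filter of this kind only narrows the list of candidates.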
Enzyme II complexes that include IIA, IIB, and IIC (as well as IIDMan in the case of the mannose family) were considered to constitute complete PTS permeases. Enzyme IIC proteins that lack a cognate IIA and/or IIB domain/protein were considered incomplete systems. It should be noted that in the Glc family, and only the Glc family, several PTS porters have the IIB and IIC domains fused but lack their own IIA domain. Instead, these systems use the IIAGlc protein of another system (101). Such Glc-type systems were considered complete.
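The completeness criteria above amount to a simple decision rule, sketched below. The function name, the domain labels, and the `iia_glc_available` flag are hypothetical; in particular, requiring that a IIAGlc protein from another system be present in the same genome is our reading of the Glc-family exception, not a stated criterion.

```python
def is_complete_permease(family, domains, iia_glc_available=False):
    """Apply the completeness rule described in the text.

    family  : "Glc", "Fru", "Lac", "Gat", "Gut", "Man", or "Asc"
    domains : set of domain labels found for the system,
              e.g. {"IIA", "IIB", "IIC"}
    iia_glc_available : True if a IIAGlc protein of another system is
              encoded in the same genome (assumed requirement; only
              consulted for the Glc family)
    """
    found = set(domains)
    required = {"IIA", "IIB", "IIC"}
    if family == "Man":
        # The mannose family has the additional IID constituent.
        required = required | {"IID"}
    if required <= found:
        return True
    # Glc-family exception: IIB-IIC porters that lack their own IIA
    # domain may use the IIAGlc protein of another system.
    if family == "Glc" and {"IIB", "IIC"} <= found:
        return iia_glc_available
    return False
```

For example, `is_complete_permease("Man", {"IIA", "IIB", "IIC"})` returns `False` because the IID constituent is missing, whereas a Glc-family IIBC porter is accepted when `iia_glc_available=True`.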
Table 4 presents an overview of our PTS genome analyses. Except for phosphoenolpyruvate synthases and pyruvate:phosphate dikinases, which are distantly related to EIs of the PTS (101) and are present in both eukaryotes and archaea, and except for a single HPr kinase homologue found in an archaeon, Methanopyrus kandleri (97), no homologues of PTS proteins were identified in any eukaryote or archaeon. Within the bacterial domain, 22% of the species analyzed (30 organisms) encode no recognizable PTS protein homologues. Twenty-one percent (29 organisms) encode cytoplasmic PTS phosphoryl transfer proteins but lack complete membrane-integrated PTS enzyme II complexes. Finally, 57% of these bacterial species (77 organisms) have at least one complete PTS enzyme II complex as well as the requisite PTS energy-coupling proteins.
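The percentages quoted above follow directly from the species counts. The short check below recomputes them; the category labels are ours, and the counts are those given in the text.

```python
# Bacterial species counts reported in the text (136 species in total).
counts = {
    "no PTS homologues": 30,
    "soluble phosphoryl transfer proteins only": 29,
    "at least one complete PTS permease": 77,
}

total = sum(counts.values())

# Percentage of species in each category, rounded to the nearest integer.
percentages = {label: round(100 * n / total) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{total} = {pct}%")
```

The three categories account for all 136 bacterial species analyzed.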
The complete PTS permeases identified in distinct bacterial species were tabulated according to the family to which they belong, as classified in the TCDB (10, 77). The greatest representation was observed for the Glc family (30%), with the Fru family (25%) coming in second place (Fig. 2). The order of occurrence of the members of the seven families was Glc (30%) > Fru (25%) > Man (15%) > Lac (14%) > Asc (9%) > Gat (4%) > Gut (3%). The Glc, Fru, Man, and Lac systems often occur in multiple copies, while the Asc, Gat, and Gut systems are usually found as a single copy per organism (Fig. 2). Furthermore, about equal proportions of organisms possess the Fru and Glc systems (86% and 83%, respectively), followed by the Asc (47%), Lac (44%), Man (42%), Gut (18%), and Gat (16%) systems.
The occurrence of these permease types was analyzed according to organismal type and PTS permease family (Table 5). Actinobacteria had about equal numbers of Glc- and Fru-type systems, with the Asc and Man families showing less but substantial representation. Only one or two members had representatives of the remaining three families. Among the Firmicutes, the order was Glc > Fru > Lac > Man > Asc > Gat > Gut. All three classes of Firmicutes (bacilli, clostridia, and Mollicutes) had more glucose-type systems than any other type, but in the order Bacillales, Lac systems were more prevalent than Fru systems, which were much more common than the Man-type systems. In contrast, the order Lactobacillales, as well as the clostridial class, had far more Man systems than either Lac- or Fru-type systems, which were present in about equal numbers. This observation complements and expands the suggestion of Zuniga et al. (114) that the mannose-type systems may have played a role in the establishment of symbiotic relationships between bacteria and a wide spectrum of eukaryotes. Finally, the Mollicutes had only Glc-, Fru-, and Asc-type systems, at decreasing frequencies in that order. Proteobacteria exhibited a profile quite different from those of the Firmicutes and Actinobacteria. They were found to possess far more Fru-type systems than any other type. Glc-type systems were of secondary importance, followed by Man, Asc, and Lac systems, in that order. However, the prevalence of Fru-type systems in Proteobacteria proved to be due solely to their high occurrence in the γ-proteobacterial subdivision, for which many fully sequenced genomes are available. The alpha and beta subdivisions had a preponderance of Glc-type systems, while the one δ-proteobacterium with a complete PTS had only a Man-type system.
Figure 3A shows the numbers of PTS protein-encoding genes plotted versus genome size. There is no good correlation, as some of the organisms with the smallest genomes have a good representation of PTS constituents, while some organisms with large genomes have none at all. Instead, it appears that the PTS protein content in a bacterium correlates best with the mode of carbohydrate metabolism utilized by that organism. Bacteria that rely on anaerobic sugar metabolism via glycolysis for energy production generally have the most PTS permeases, as discussed previously (56, 57). Thus, almost all bacteria that possess the PTS either are capable of anaerobic sugar utilization via glycolysis or have close bacterial relatives that are capable of such metabolism, while many of the bacteria that lack the PTS either are strict aerobes or lack a complete glycolytic pathway.
We tabulated the aerobic versus anaerobic metabolic capabilities of the 136 bacterial species analyzed in this report. The results showed that only 20% of the bacteria that lack genes encoding PTS proteins and 28% of the bacteria that lack PTS transport systems are capable of anaerobic growth (Fig. 3B). However, of the bacteria that possess PTS transport capabilities, 37% encoding 3 to 9 PTS proteins, 86% encoding 10 to 30 PTS proteins, and 100% encoding 31 to 93 PTS proteins are capable of anaerobic growth (Fig. 3B). This can be explained by the fact that only anaerobic glycolysis yields two molecules of PEP per molecule of hexose metabolized. One of these PEP molecules must be used for uptake of the next sugar via the PTS, while the other is required for biosynthetic purposes. The availability of excess energetic PEP molecules, the phosphoryl bonds of which have higher free energies than those of ATP, presumably provided the basis for the establishment and selective expansion of the PTS.
All organisms whose genomes lack genes encoding identifiable PTS protein homologues are listed in Table 6. These include all archaea and eukaryotes examined, as noted above, as well as 30 bacterial species. Bacteria that totally lack PTS homologues include five actinobacteria of the genera Mycobacterium (4) and Tropheryma (1), all six cyanobacteria examined, one Mollicute species (onion yellows phytoplasma OY-M), several proteobacteria of the alpha (five), gamma (two), delta (two), and epsilon (four) subdivisions, and five evolutionarily divergent bacteria of the genera Aquifex, Bacteroides, Porphyromonas, Thermotoga, and Thermus. In the case of Bacteroides thetaiotaomicron, a single homologue of a galactitol IIC protein (GI 29349515) was found. This homologue cannot function via a PTS-dependent mechanism since this organism lacks all recognizable PTS phosphoryl transfer proteins. We have noted that the encoding gene is in a monocistronic operon with a good promoter, suggesting that this IIC homologue may function as a secondary carrier, as discussed previously (33, 81). Therefore, B. thetaiotaomicron, like Bacteroides fragilis, probably lacks PTS proteins altogether. Except for cyanobacteria, the ε-Proteobacteria, and several primitive bacteria in the “unclassified bacteria” category (Table 6), PTS homologues are found in all of the major bacterial kingdoms for which sufficient sequence data are available. Methylococcus capsulatus encodes two homologues of the Dha proteins (see below). However, it lacks all other PTS protein homologues, including EI and HPr. It is therefore unlikely that this organism has a functional phosphoryl transfer chain.
Twenty-nine bacteria were found to have PTS phosphoryl transfer proteins but to lack the complete complement of enzymes necessary for sugar transport (Table 7). As noted above, Bacteroides thetaiotaomicron possesses only a IICGat homologue and may use this protein as a secondary carrier. A second organism, Ureaplasma parvum, possesses only IIBGlc (GI 13357736), HPr (GI 13358152), and HprK (GI 13357632). In this organism, the HPr and HprK proteins may function in catabolite repression by a PTS-mediated sugar transport-independent mechanism (48), while IIBGlc may be a relic of a complete enzyme IIGlc, the other components of which were lost during genome minimalization (51).
Of the remaining 27 organisms presented in Table 7, all are gram-negative bacteria. They include proteobacteria, chlamydiae, spirochetes, and a green photosynthetic bacterium. All of these organisms encode within their genomes at least one enzyme I homologue (I or INtr) and at least one HPr homologue (HPr or NPr) (65). Three of these bacteria (Pirellula sp., Geobacter sulfurreducens, and Bradyrhizobium japonicum) have two EIs, and four of them (Parachlamydia sp., Bartonella henselae, Bartonella quintana, and Bradyrhizobium japonicum) have two HPrs. Furthermore, all but two groups of these bacteria, the chlamydial group (except for Parachlamydia) and Pirellula sp., have HprK. Finally, these organisms may possess zero to three IIA proteins (IIANtr and/or IIAFru and/or IIAMan). Only two of these bacteria (Coxiella burnetii and Legionella pneumophila) lack an identifiable IIA protein. In Agrobacterium tumefaciens, the EINtr protein (GI 17937868) is encoded on the linear chromosome, while the rest of the PTS proteins are encoded on the circular chromosome.
Of the IIA homologues, Xylella species have only a single IIAMan protein while Leptospira species have only a single IIANtr protein. All others have either two IIANtr proteins, two IIAFru proteins, one each of the IIANtr and IIAFru proteins, one each of the IIANtr and IIAMan proteins, one IIANtr, one IIAFru, and one IIAMan protein, or two IIANtr and one IIAMan protein (Table 7). The species with one IIAFru, one IIANtr, and one IIAMan homologue are the two Bartonella species, while Rhodopseudomonas palustris has two IIANtr and one IIAMan protein. The fact that most organisms possess at least two homologous or nonhomologous IIA proteins suggests that the PTS phosphoryl transfer chain functions in a regulatory capacity that depends on specific functional interactions between these proteins. These putative interactions could, for example, be antagonistic or synergistic.
In summary, bacteria that possess PTS phosphoryl transfer proteins but lack PTS permeases always have at least one enzyme I (I or INtr) and one HPr (HPr or NPr). They usually have two IIA proteins and one HprK. Only for one such organism, Treponema denticola, have the catalytic activities of the phosphoryl transfer proteins been demonstrated (23). Treponema pallidum lacks an active enzyme I homologue and instead possesses a ptsI pseudogene (23). We suggest that in most cases, these PTS phosphoryl transfer proteins act together, comprising a single unified phosphoryl transfer chain (63, 65) that acts biochemically as shown in Fig. 1B. Among these bacteria, two distinct, independently functioning phosphoryl transfer chains (65) appear to be present only in Bradyrhizobium japonicum. These two chains may catalyze phosphoryl transfer as shown in Fig. 1C. It is also interesting that among these bacteria, B. japonicum is the only one that possesses a complete putative DHA PTS (see below). We suggest that pathway 1 functions to phosphorylate DHA while pathway 2 functions to coordinate carbon and nitrogen metabolism (46, 47, 63, 65).
Complete PTS permeases (enzyme II complexes) consist of IIA, IIB, and IIC (as well as IID in the case of the Man family) domains that may occur as separate peptides or may be fused in various combinations into a smaller number of polypeptide chains (see below). Table 8 lists the organisms that have only one or two types of complete PTS permease in addition to the PTS energy-coupling proteins. Of the 77 bacterial species that were found to encode at least one complete PTS transport system, 31 were found to encode just one or two types of PTS permease. Eighteen bacterial species possess just a single complete type of PTS permease. All of these organisms have both enzyme I and HPr, and many of the gram-negative Proteobacteria also have a partial or complete nitrogen regulatory phosphoryl transfer chain including INtr, NPr, and IIANtr. As in E. coli (63, 65), these organisms may possess two independently functioning phosphoryl transfer chains, one for sugar transport and one for regulation (45-47). Nine species have just one or two fructose-type PTS permeases, while six possess only one or two glucose-type systems; two have just a single mannose system, and one has only one glucitol-type system. Except for Caulobacter crescentus, which has two Glc-type systems, and Mesorhizobium loti, which has a Gut-type system, all other α-Proteobacteria examined either possess just the regulatory PTS proteins (Table 7) or lack PTS homologues altogether (Table 6). “Candidatus Blochmannia floridanus,” an Enterobacteriaceae member, is one of the organisms that has just a single Man-type PTS. It is possible that it once possessed Fru- and/or Glc-type systems since many of the Enterobacteriaceae analyzed encode several Fru-, Glc-, and Man-type PTS permeases (Table 9). None of the organisms analyzed possesses only a single lactose-, galactitol-, or L-ascorbate-type system.
However, orphan IIA and/or IIB and/or IIC homologues of these systems were often found. These may be residues of genome minimalization (51) where the other constituents of complete enzyme II complexes were lost.
Thirteen bacterial species possess just two types of complete PTS permeases. In all cases, one of the systems is either a glucose- or fructose-type system, except in Haemophilus ducreyi, where the two systems identified are Man- and Asc-type systems. However, Haemophilus influenzae has a single Fru-type system. Further examination of other bacteria in the same phylogenetic group revealed that these organisms possess Fru- and Glc-type systems in addition to Man- and Asc-type permeases (Table 9). It is therefore possible that H. ducreyi may have lost the Fru- and Glc-type systems during evolution. Eleven species have both glucose and fructose systems, usually just one of each. However, Mesoplasma florum has seven glucose-like systems and one fructose-like system. Borrelia garinii has two fructose-like systems and one glucose-like system, while Chromobacterium violaceum has two of each. Corynebacterium diphtheriae has a glucose-type system and an L-ascorbate-type system, while Haemophilus ducreyi has a mannose-type system and an ascorbate-type system, as noted above (Table 8).
None of the organisms that have just two types of PTS permeases have lactose-, galactitol-, or glucitol-type systems. In fact, even orphan constituents of these systems are lacking. However, several additional PTS orphan proteins (IIA, IIB, or IIC), particularly mannose IIA homologues, are found in some of these organisms. It is likely that some of these possess specific regulatory functions, but no such function is currently recognized. Others probably represent either relics of complete PTS permeases that have been partially lost or orphan genes acquired through horizontal transfer. Interestingly, in Deinococcus radiodurans, all of the PTS homologues were found on the plasmid MP1.
Table 9 lists the 77 bacterial species that encode all of the proteins required for PTS-dependent sugar transport. Of these, 46 were found to encode more than two types of PTS permease. In the table, these bacteria are grouped according to organismal phylogenetic division, and sometimes by subdivision. These will be discussed according to their taxonomic groups in the order in which they are presented in Table 9. It should be noted that the occurrence of multiple paralogues of a specific family or subfamily of PTS permeases implies differing substrate specificities, affinities, or regulatory properties. In very few cases have these been studied in detail.
Among the high-G+C gram-positive Actinobacteria, the Mycobacteria and Tropheryma species have no recognizable PTS protein homologues (Table 6). Bifidobacterium longum and Nocardia farcinica have a single glucose-type system and a single fructose-type system, respectively. Also, these two bacteria encode the smallest number of PTS genes (three genes) among all of the bacteria that possess a complete transport PTS. Of the three Corynebacterium species studied, C. efficiens has both Glc and Fru systems, C. diphtheriae has Glc and Asc systems, and C. glutamicum has Glc, Fru, and Asc systems. The corynebacteria represent an example where several species within a single genus have different complements of PTS proteins.
Streptomyces avermitilis has Glc and Fru systems, and surprisingly, of the 10 Actinobacteria examined, this is the only one to have HprK (GI 29826878). The HprK homologue from this bacterium clusters separately but is closest to the homologues from α-Proteobacteria (97). This may represent a case of horizontal gene transfer. Streptomyces coelicolor has an Asc system in addition to Glc and Fru systems, but it lacks HprK. Of the Actinobacteria, Symbiobacterium thermophilum and Propionibacterium acnes have the most PTS permeases, with 7 and 10 complete systems, respectively. Both have Glc, Fru, and Lac systems, and S. thermophilum has three Man systems while P. acnes has just one each of the Gut-, Gat-, and Asc-type systems but no Man system.
S. thermophilum is a thermophilic bacterium that depends on microbial commensalism. Recently, a role for the Man-type PTS transporter in the establishment of symbiotic relationships has been proposed (114). The presence of three complete Man-type systems in this organism may contribute to its commensal growth. S. thermophilum shows remarkable similarity to bacilli and clostridia (low-G+C gram-positive bacteria) and is suggested to have shared a close common ancestor with the Bacillus/Clostridium group (105). It is interesting that the PTS family representation in this bacterium also resembles the PTS protein distribution in some of the bacilli and clostridia.
P. acnes is found ubiquitously on human skin and resides within sebaceous follicles. It has been implicated in various diseases as an opportunistic pathogen (9). Its genome encodes a great capacity to cope with changing oxygen tension, explaining its ubiquitous presence on human skin. The occurrence of a wider variety of PTS permeases may enable efficient uptake of a broad range of substrates and may allow the organism to opportunistically adapt to a pathogenic lifestyle.
All of these high-G+C gram-positive bacteria possess various orphan proteins or “partial” PTS permeases (data not shown). The IIBCGlc and IICGlc components may be functional, using the IIA/B proteins present in one of the complete Glc systems (101). Precedent for this has been observed repeatedly, as several Glc-type systems lack their own IIA protein and use the general IIA protein of the authentic glucose system (101). However, the same phenomenon has not been documented for the other PTS permease families.
Firmicutes (orders Bacillales and Lactobacillales and class Clostridia in Table 9) generally have the energy-coupling PTS proteins enzyme I and HPr as well as HprK, and with just two exceptions, they all have Glc, Fru, and Lac systems. The two exceptions are Clostridium tetani, which has only a Glc system, and Clostridium perfringens, which has Glc and Fru systems but not a Lac system. The three clostridial species analyzed show remarkable variation in their numbers of PTS genes, and these numbers do not correlate well with the differences in their genome sizes. C. tetani, with a genome size of 2.87 Mb, has 4 PTS genes, Clostridium acetobutylicum, with a genome size of 4.13 Mb, has 30 PTS genes, and C. perfringens, with an intermediate genome size of 3.09 Mb, has 36 PTS genes. The distributions of various PTS permeases in these three species are remarkably different as well (Table 9).
The occurrence of 12 complete PTSs in C. acetobutylicum correlates with the saccharolytic lifestyle of this soil bacterium. C. perfringens is commonly found in animal and human gastrointestinal tracts as a member of the normal microflora. Interestingly, the presence of multiple Man-type PTS permeases in this organism correlates with the abundance of Man-type permeases in some of the other members of the normal intestinal microflora, such as lactobacilli, Enterococcus faecalis, and several members of the Enterobacteriaceae. One of the two Man systems in C. acetobutylicum is encoded on the plasmid pSOL1.
Most of the other low-G+C gram-positive bacteria have additional PTS permeases, and some, such as three Listeria species and Lactobacillus plantarum, have all seven PTS enzyme II complex types. The bacteria with the most pts genes and PTS permeases are L. monocytogenes, with 91 genes and 30 complete PTS permeases plus several partial systems, and Enterococcus faecalis, with 93 genes and 25 complete permeases plus 14 partial systems, all of which have the IIC constituent (data not shown). Several of these may be functional, using components from the complete systems (101). The maximal percentage of genetic material devoted to the PTS is observed for L. monocytogenes, with 3.2% of all its genes encoding PTS proteins.
Of the small-genome Mollicutes organisms, all but two possess complete Glc- and Fru-type PTS permeases as well as an HprK homologue (with the exception of Mycoplasma hyopneumoniae), and several have an Asc system as well. The two exceptions (discussed above) are listed in Tables 6 and 7. A few reports have discussed some of the PTS proteins from the different Mycoplasma species (26, 30, 48, 59). The regulation of HprK in the Mollicutes may be different from that in other Firmicutes (26). The distribution and numbers of PTS permeases in the Mollicutes also differ from those in other Firmicutes. The various species within the Mycoplasma genus also show differences. Most surprisingly, Mycoplasma pulmonis and Mycoplasma pneumoniae have three PTS permeases each, Mycoplasma mycoides has four, M. hyopneumoniae has five, and M. florum and Mycoplasma penetrans have eight and nine, respectively. In some of these bacteria with genome sizes of 0.79 to 1.36 Mbp, a major fraction of the genetic material (over 2%) is devoted to the PTS. Their obligate parasitic lifestyle and dependence on glycolysis as the ATP-generating pathway may explain the retention of PTS genes, even under conditions that result in extensive genome reduction (21).
Table 9 presents the PTS protein complements in a variety of Proteobacteria according to their subdivision classification. Within the alpha subdivision, most either lack PTS proteins altogether (Table 6) or only possess phosphoryl transfer proteins (Table 7) that presumably function in regulation. Only two have PTS permeases. Caulobacter crescentus has two Glc systems, while Mesorhizobium loti has one Gut system. As discussed below, Bradyrhizobium japonicum has a complete dihydroxyacetone enzyme II complex and therefore can phosphorylate this triose via the PTS.
In the beta subdivision, all organisms possess regulatory PTS proteins (Tables 7 and 9), and a few also have PTS permeases that belong exclusively to the Glc and Fru families. The maximal number of probable complete PTS permeases is four, for Chromobacterium violaceum. C. violaceum lives in tropical and subtropical regions in soil and water but can occasionally be pathogenic in immunocompromised individuals, causing diarrhea. No other β-proteobacterium has more than two complete PTS permeases. While Ralstonia solanacearum has a Glc- and a Fru-type system, Burkholderia species have only a single Glc-type system. The Glc system in R. solanacearum was found encoded in the plasmid pGMI-1000MP. Burkholderia pseudomallei is an opportunistic pathogen in humans that is usually found in terrestrial environments, while Burkholderia mallei is a host-adapted pathogen in animals and is not found outside the host.
In the gamma subdivision, only the endosymbiont of the tsetse fly, Wigglesworthia glossinidia, and the obligate methanotroph Methylococcus capsulatus lack PTS homologues altogether (Table 6). Buchnera species, which are endosymbionts of aphids, have both a single Glc system and a single Fru system. “Candidatus Blochmannia floridanus,” an endosymbiont of the carpenter ant, has only a mannose system. All other enterobacteria have multiple PTS permeases. It is therefore clear that these endosymbionts have retained the minimal number of PTS permeases to allow utilization of a very restricted number of sugars provided by the host organism (11).
Several E. coli, Shigella, and Salmonella strains (see below) have all seven types of PTS permease systems. The different strains of E. coli probably have between 17 and 26 functional PTS permeases. These variations, observed for various E. coli/Shigella strains, suggest that the gain and loss of genetic material encoding PTS proteins have occurred frequently and repeatedly during the evolution of single species (108). Scrutiny of Table 9 will reveal that this is a common observation for many bacterial species and genera.
Other γ-Proteobacteria listed in Table 9 have far fewer PTS permeases. The pseudomonads, for example, have just one or two. The three species of Pseudomonas studied, all with large genome sizes of over 6 Mb, vary from nonpathogenic (P. putida) to plant pathogenic (P. syringae) to opportunistically human pathogenic (P. aeruginosa). In P. aeruginosa, one PTS permease is fructose specific while the other is N-acetylglucosamine specific (68; I. T. Paulsen, personal communication). The Pasteurellales have 1 to 8 PTS permeases, vibrios have 7 to 12 PTS permeases, and the two Xanthomonas species studied, which are found exclusively associated with their plant hosts and are not found free in the soil, have just one fructose-type system each (17, 80, 110). Among the Pasteurellales, “Mannheimia succiniciproducens” (proposed name), which is found in bovine rumen, has the largest number of PTS permeases. The rumen is the first section of the stomach in ruminants, where feed is collected for initial digestion. The presence of several PTS transporters in ruminant bacteria perhaps represents an adaptation to a nutrient-rich, oxygen-free ecological niche. Pasteurella multocida is found in the mucous lining of animal intestines, in the genitalia, and in respiratory tissues. It has five PTS permeases, each belonging to a different PTS family. Each of the five PTS permeases is perhaps expressed differentially depending on the host organ. H. ducreyi and H. influenzae are an obligate pathogen and an obligate parasite, respectively, in humans. The smaller numbers of PTS permeases in these bacteria may have resulted from their evolutionary adaptation to the homeostatic environment of the host.
Vibrios are abundant in marine and freshwater environments. The three species analyzed are pathogenic to humans. Vibrio vulnificus, which is an opportunistic pathogen causing a variety of diseases in immunocompromised patients, has Man-type PTS permeases that are lacking in the other two species. Interestingly, it also has the largest number of PTS permeases among the three Vibrio species analyzed. Vibrio parahaemolyticus, which causes gastroenteritis, and Vibrio cholerae, the causative agent of cholera, have similar complements of PTS proteins. The Asc systems, a few of the Fru PTS permeases, one of the two Man systems in V. vulnificus, and a few other orphan PTS proteins are encoded in the smaller chromosome, chromosome II, in all three species. All other PTS permeases are encoded in chromosome I.
Among the spirochetes, Leptospira and Treponema species lack PTS permeases (as discussed earlier), but Borrelia species have several. Leptospira interrogans is an obligate aerobe that is adapted for mammalian reservoir hosts. T. denticola is an obligate anaerobe that is commonly found in the oral cavity in humans, while T. pallidum is an anaerobic human pathogen that causes syphilis. Borrelia garinii has one Glc and two Fru systems, while Borrelia burgdorferi has these systems plus a Lac system. Borrelia species are microaerophilic organisms that live in the gastrointestinal tract of ticks and are capable of infecting multiple hosts via tick bites. B. burgdorferi carries dozens of plasmids, while B. garinii harbors only two. Interestingly, the Lac system as well as a IICBGlc protein (GI 11497021) in B. burgdorferi is encoded in the plasmid cp26.
Another spirochete species, Spirochaeta aurantia, whose genome has not yet been sequenced, has only a Fru-type system with specificity for mannitol (82). In this unusual system, the synthesis of all PTS enzymes (enzyme I, HPr, and the mannitol enzyme II) as well as that of mannitol-1-P dehydrogenase is induced 200-fold by the inclusion of mannitol in the growth medium (83).
Many bacteria possess nontransporting DHA-specific enzyme II complexes. Only for E. coli has such a system been characterized (22, 25, 94). The system consists of three proteins, named DhaK (IICDha; dihydroxyacetone binding), DhaL (IIBDha; ADP binding), and DhaM (IIADha; phosphoryl donor for ADP phosphorylation in DhaL) (4, 22, 81). The DhaM protein of E. coli is a three-domain protein consisting of an N-terminal IIADha, a central HPr-like domain, and a C-terminal truncated enzyme I-like domain (25). DhaL subunits exhibit sequence characteristics that distinguish them from the homologous ATP-dependent DHA kinases (3, 4, 93). The DhaK subunit covalently binds the substrate dihydroxyacetone to a histidyl residue in the protein (22). These sequence characteristics allowed us to distinguish ATP- from PEP-dependent systems and therefore to tentatively conclude that all PTS DHA enzyme II complexes consist of split DhaK/DhaL systems, while all ATP-dependent kinases have these two functional domains fused in a single polypeptide chain (DhaKL).
Table 10 presents all bacteria for which we found homologues of the subunits of the DHA PTS enzyme II complex. No such homologues were found in cyanobacteria, chlamydia, or the epsilon division of the Proteobacteria. Twenty-seven organisms appear to have a complete DHA PTS, and of these, 15 (including two Mycoplasma species) are low-G+C gram-positive bacteria. Other organisms with complete DHA PTS enzyme II complexes include (i) α-, γ-, and δ-Proteobacteria (six organisms), including Bradyrhizobium japonicum (α), Mannheimia succiniciproducens (γ), Desulfovibrio vulgaris (δ), E. coli (γ), Shigella flexneri (γ), and Mesorhizobium loti (α); (ii) high-G+C gram-positive bacteria (five organisms), including Streptomyces avermitilis, S. coelicolor, Corynebacterium diphtheriae, Leifsonia xyli, and Propionibacterium acnes; and (iii) other divergent organisms, including Fusobacterium nucleatum and Deinococcus radiodurans. Unlike all other DhaL proteins examined, that in D. radiodurans is split into two polypeptide chains. These chains could only be identified using nucleotide BLAST searches because these protein fragments had not been assigned protein accession numbers. All four putative dha genes in D. radiodurans (including the genes for DhaM and DhaK and the two split genes that together comprise DhaL) were adjacent on a plasmid, the MPI plasmid (41, 109). Splitting of the putative DhaL gene could have resulted from a sequencing error. Furthermore, since the functionality of its gene product has not been demonstrated, the significance of this observation has yet to be determined.
Category A organisms possess one and only one DHA PTS. These bacteria include two γ-Proteobacteria, Fusobacterium nucleatum, one high-G+C gram-positive bacterium, and eight low-G+C gram-positive bacteria (Table 10). DhaM in these systems consists of only a IIADha domain.
Category B organisms have just one DHA PTS, but their DhaM proteins have the IIADha domain fused to other PTS protein domains. These fused proteins include IIADha-HPr fusions, found in C. diphtheriae (GI 38234870) and L. xyli (GI 50954678), a single full-length IIADha-HPr-EI fusion (GI 46579394), present in D. vulgaris, and tridomain proteins with a truncated enzyme I, IIADha-HPr-EIΔ, characteristic of the E. coli DhaM protein, found in γ-Proteobacteria (Table 10). Only category B systems have the IIADha domain fused to other PTS protein domains.
Listeria species have two complete sets of DHA PTS (category C). Category D, which includes bacteria that possess a complete DHA PTS plus an extra DhaK, includes only low-G+C gram-positive bacteria. Possibly, the extra DhaK protein allows phosphorylation of an alternative substrate. It is always encoded by a gene distantly related to that encoding DhaK carried within the dha operon, which also encodes the remaining proteins of the complete DHA system.
Category E systems, with complete PEP- and ATP-dependent DHA phosphorylating systems, are from two high-G+C gram-positive bacteria. Category F, with just one α-proteobacterial representative, has complete PEP- and ATP-dependent systems but also has two extra DhaKs (GI 13476064 and GI 13488187) and one extra DhaL (GI 13476063). One of the DhaK homologues (GI 13488187) of M. loti is encoded on plasmid pMLa.
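The category definitions for organisms with a complete DHA PTS (categories A to F) amount to a small set of rules on gene content. The following minimal sketch restates them as code. The record type `DhaComplement`, its field names, and the order in which the rules are tested are assumptions made for illustration only; the assignments in Table 10 were made by inspection of the genomes.

```python
from dataclasses import dataclass


@dataclass
class DhaComplement:
    """Hypothetical summary of the Dha proteins found in one genome."""
    complete_pep_systems: int      # complete DhaK/DhaL/DhaM sets
    dham_fused: bool = False       # IIADha fused to HPr and/or EI domains
    extra_dhak: int = 0            # DhaK copies beyond the complete system(s)
    extra_dhal: int = 0            # DhaL copies beyond the complete system(s)
    has_atp_kinase: bool = False   # fused, ATP-dependent DhaKL present


def category(c):
    """Return the category (A-F) described in the text, or None."""
    if c.complete_pep_systems < 1:
        return None  # no complete DHA PTS; not covered by these categories
    if c.has_atp_kinase:
        # Complete PEP- and ATP-dependent systems, with or without extras.
        return "F" if (c.extra_dhak or c.extra_dhal) else "E"
    if c.complete_pep_systems >= 2:
        return "C"
    if c.extra_dhak:
        return "D"
    return "B" if c.dham_fused else "A"


# Examples drawn from the descriptions in the text.
e_coli = DhaComplement(1, dham_fused=True)   # tridomain DhaM
listeria = DhaComplement(2)                  # two complete sets
m_loti = DhaComplement(1, extra_dhak=2, extra_dhal=1, has_atp_kinase=True)

print(category(e_coli), category(listeria), category(m_loti))
```

The three examples fall into categories B, C, and F, respectively, in agreement with the descriptions above.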
Several bacteria lack a complete DHA PTS but nevertheless encode within their genomes at least one PTS Dha homologue, as revealed by the data in Table 11. Several different combinations of these proteins can be found. For example, two organisms each have only DhaM (category A) or DhaL (category B). No organism has just DhaK. Just three organisms have only ATP-dependent DhaKL (category C). The DhaKL protein (GI 17937250) of A. tumefaciens is encoded on the linear chromosome. Several bacteria (category D) have both DhaK and DhaL, and one of these organisms (Brucella melitensis) has a second copy of DhaL (GI 17986682) as well as a IIAMan homologue (GI 17988315) that more closely resembles E. coli IIAMan than IIADha. Category E includes four organisms encoding DhaK, DhaL, and DhaKL, and three of these organisms have a IIAMan homologue. All four genes encoding the Dha proteins in Sinorhizobium meliloti were found on the plasmid pSymB. In both B. mallei and B. pseudomallei, the DhaK and DhaL proteins are encoded on chromosome 2, while the DhaKL protein is encoded on chromosome 1 along with the other PTS homologues found in these organisms. Interestingly, Bacillus cereus, which lacks the orphan IIAMan homologue, has a short DhaL fragment (GI 52144352). Finally, organisms in category F, all Bacillus species, have DhaKL as well as DhaK. It seems likely that many of these orphan PTS Dha proteins have resulted from genome minimalization, but the possibility that some of them serve a specific function, perhaps to allow phosphorylation of a novel substrate or to provide a regulatory function, should be kept in mind.
Of the organisms listed in Table 10, one organism, Bradyrhizobium japonicum, has an unusual gene arrangement. The complete DHA PTS enzyme II complex as well as enzyme I (GI 27378686) and HPr (GI 27378685) is encoded by adjacent genes that are in an operon, dhaLMHIXK. The B. japonicum dhaM gene encodes only the IIADha domain (GI 27378684). dhaH and dhaI code for HPr and enzyme I homologues, respectively, while dhaX is a hypothetical gene. It is also unusual that dhaK and dhaL are not adjacent genes. It is interesting that the gene order, dhaMHI, is the same as the domain order in the fused tridomain DhaM of E. coli and that B. japonicum possesses no other PTS enzyme II complexes. In all other organisms listed in category A of Table 10, the ptsH and ptsI genes map separately from the dha operon.
Of the organisms listed in Table 11, Bartonella quintana has DhaM (GI 49473737) and several PTS energy-coupling proteins, but no complete enzyme II complex (Table 7); Brucella melitensis has DhaK (GI 17986679), DhaL (GI 17986680), and a IIAMan-like homologue (GI 17988315), as well as several PTS energy-coupling proteins, but no IIADha and no PTS permeases; and Methylococcus capsulatus has DhaK (GI 53802782) and DhaL (GI 53802783), but no other PTS protein of any kind. It seems likely that none of these organisms can phosphorylate DHA with either PEP or ATP, but conceivably M. capsulatus might use ATP with DhaK and DhaL in a split DhaKL system. We tentatively suggest that some of these organisms are “in transition,” losing or gaining complete systems, either by genome reduction or by horizontal transfer, respectively (51, 108).
The Dha proteins were analyzed by generating phylogenetic trees. This was done in several ways. First, a tree was generated just for the fused DhaKL proteins. The results showed that these proteins did not cluster according to organismal group, suggesting either that horizontal transfer of these genes has occurred or that rates of evolutionary sequence divergence have been radically different for the different organisms (11) (Fig. 4). Three separate trees were generated, for the DhaK (Fig. 5A), DhaL (Fig. 5B), and DhaM (Fig. 5C) proteins, for organisms that possess homologues of these proteins (Tables 10 and 11). The analyses clearly suggest that these three proteins have evolved in parallel without shuffling of constituents between systems. However, the 16S rRNA phylogeny results for these organisms (data not shown) suggested that horizontal transfer of complete systems has occurred repeatedly and that different rates of sequence divergence in the different organisms (11) are unlikely to explain the results. The lone DhaM protein (GI 15901038) in Streptococcus pneumoniae (Table 11) clusters with the other streptococcal DhaMs (Fig. 5C), but those from Clostridium perfringens (GI 18311611) and Bartonella quintana (GI 49473737) cluster very loosely together and distantly from all other DhaMs. These two proteins resemble IIAMan more closely than the other DhaM proteins. While C. perfringens has a complete mannose enzyme II complex in addition to the orphan IIAMan-like protein, B. quintana lacks all PTS permeases, including the mannose enzyme II complex.
A few organisms have “extra” DhaLs. These include S. meliloti (GI 16264045) (Table 11, category E), Brucella melitensis (GI 17986682) (Table 11, category D), and both M. gallisepticum (GI 31544231) and P. syringae (GI 28870011) (Table 11, category B). Of these organisms, all have DhaK and two DhaLs, except M. gallisepticum and P. syringae, which have only an orphan DhaL. The extra DhaLs in S. meliloti and B. melitensis cluster together (Fig. 5B, cluster 6), separate from their paralogues (cluster 5). The M. gallisepticum protein is by itself, and it is distantly related to all other homologues (Fig. 5B). Extra DhaKs are present in many low-G+C gram-positive bacteria (Table 10, category D, and Table 11, category F) and one α-proteobacterium, M. loti (Table 10, category F). The extra M. loti protein (Mlo3) (GI 13488187) is by itself (between clusters 5 and 6) (Fig. 5A), while all extra DhaKs from low-G+C gram-positive bacteria cluster together (cluster 1c). We conclude that these extra DhaL and DhaK proteins did not arise by recent gene duplication events. Instead, they either arose early to fulfill a specific but unknown function or became “orphans” as a result of the genetic loss of their partner proteins.
In categories D and E in Table 11, the DhaL and DhaK proteins are present, but no DhaM homologue could be found. All of the DhaL-DhaK pairs were derived from proteobacteria. Most of these proteins cluster together (cluster 5) in both the DhaL (Fig. 5B) and DhaK (Fig. 5A) trees. They do not cluster with the ATP-dependent DhaKL kinases (clusters A to D). These observations led us to suggest that these DhaL-DhaK protein pairs have coevolved from a common ancestor and serve a unified but unknown function. As noted above, it is possible that they use ATP or another phosphoryl donor to phosphorylate DHA. Alternatively, they could act on another substrate. It is interesting that the Mlo2 DhaL protein (GI 13476063) clusters with Sme2 (GI 16264045) and Bme2 (GI 17986682) (Fig. 5B). The corresponding Mlo2 DhaK protein (GI 13476064) (Fig. 5A) is by itself because there is no corresponding orthologue in S. meliloti and B. melitensis.
Table 12 presents the types of PTS domain fusions as well as single-domain proteins that were identified following analysis of the bacterial genomes. These were tabulated according to fusion protein type and family of enzyme II complexes. It is apparent that the largest number of domain combinations was found for the Fru family (13), with the Glc (10) and Man (6) families coming in second and third places, respectively. Only in the Gat family were domain fusions completely lacking (33). In each of the Lac and Gut families, only a single fusion type was found, i.e., IICB in the Lac family and IIBC in the Gut family.
Both bi- and tridomain proteins, including EI and/or HPr, were found. An EI-IIA fusion (Fru family), but no IIA-EI fusion, was found. Surprisingly, no fused protein containing just EI and HPr was identified, although both HPr-IIA and IIA-HPr fusions were found. The former was recognized only in the Fru family, but the latter occurred in both the Fru and Dha families. These observations came as a surprise because tridomain proteins (HPr-EI-IIA [Fru family] and IIA-HPr-EI [Glc, Fru, and Dha families]) were found. In these proteins, HPr is always N-terminal to EI. In no case was HPr C-terminally linked to EI, nor was the IIA domain ever centrally located.
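The two regularities noted for fusions involving the energy-coupling proteins (HPr always N-terminal to EI, and IIA never central) can be checked mechanically against the five fusion types listed in this paragraph. The sketch below is illustrative only; the list `fusions` and the two helper functions are hypothetical names introduced here, and the domain orders are taken directly from the text, written from the N terminus to the C terminus.

```python
# Fusion types reported in the text, N terminus first.
fusions = [
    ["EI", "IIA"],
    ["HPr", "IIA"],
    ["IIA", "HPr"],
    ["HPr", "EI", "IIA"],
    ["IIA", "HPr", "EI"],
]


def hpr_before_ei(order):
    """True unless both domains are present and EI precedes HPr."""
    if "HPr" in order and "EI" in order:
        return order.index("HPr") < order.index("EI")
    return True


def iia_central(order):
    """True if IIA is the middle domain of a tridomain protein."""
    return len(order) == 3 and order[1] == "IIA"


for order in fusions:
    print("-".join(order),
          "| HPr before EI:", hpr_before_ei(order),
          "| IIA central:", iia_central(order))
```

Both conditions hold for every fusion type in the list, which is simply a restatement of the observations above rather than an independent test of them.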
Unfused IIA, IIB, and IIC domains were identified in all PTS families except the Gut family, where an unfused IIB domain could not be found. This is an unusual case because only in the Gut family is the transmembrane IIC protein split into two separate polypeptide chains, i.e., IIC1, with four or five putative transmembrane segments, and IIC2, with three putative transmembrane segments. In all Gut systems, B is fused N-terminally to C1 (encoded by gutE), while C2 (encoded by gutA) is present as a distinct unfused polypeptide chain. IIAGut (encoded by gutB) is never fused to any other protein domain. Homology between the glucitol IIC proteins and other PTS IIC domains has not been demonstrated (13, 71, 111).
As noted above, only the Man family includes a IID domain. While the Man enzyme II complex usually has all four of its protein constituents unfused, IIA can be fused N-terminally to IIB, and IIC can be fused N-terminally to IID. No other fusion combinations in the Man family were detected. Thus, although A-B fusions occur, B-A fusions do not, and although C-D fusions occur, D-C fusions could not be found. Moreover, A and B are never fused to C or D (Table 12).
While A-B fusions were found only in the Man and Asc families, B-A fusions were found only in the Glc family. In contrast, both B-C and C-B fusions were identified in both the Glc and Fru families, and the latter fusion type was also found in the Lac and Asc families.
Tridomain PTS permease proteins with IIA, IIB, and IIC fused in single polypeptide chains were restricted to the Glc and Fru families, and four of the six possibilities were found. Thus, ABC, BCA, and CBA were identified in both families, and ACB was found in the Fru family, but BAC and CAB were not found. It is surprising that A can be fused either N-terminally or C-terminally to C in a tridomain protein. Interestingly, this only occurs when B is directly fused to C, but not to A. These two domains (A and C) were never found linked in a bidomain protein. While stereospecific constraints may explain some of these observations, others seem inexplicable (see “Conclusions”).
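The statement that four of the six possible tridomain orders were found can be reproduced by enumerating the permutations of A, B, and C and removing the observed ones. This is a minimal sketch; the dictionary `observed` simply transcribes the orders named in this paragraph, and the variable names are illustrative.

```python
from itertools import permutations

# Tridomain orders reported for each family (N terminus first).
observed = {
    "Glc": {"ABC", "BCA", "CBA"},
    "Fru": {"ABC", "BCA", "CBA", "ACB"},
}

all_orders = {"".join(p) for p in permutations("ABC")}
seen = set().union(*observed.values())
missing = sorted(all_orders - seen)

# Orders in which the IIA domain occupies the central position.
a_central = sorted(o for o in all_orders if o[1] == "A")

print("possible:", len(all_orders))
print("found:", sorted(seen))
print("not found:", missing)
print("A central:", a_central)
```

The two orders that were not found, BAC and CAB, are exactly the two in which the IIA domain would be central, flanked by IIB and IIC.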
Some of the fusions were found only in certain taxonomic groups. For example, the EI-IIA, HPr-IIA, IIA-HPr, HPr-EI-IIA, and IIA-HPr-EI proteins in the Fru family as well as IIA-HPr-EI fusions in the Glc and Dha families were found only in Proteobacteria. The two IIA-HPr fusions found in the Dha family were found in two Actinobacteria, Corynebacterium diphtheriae (GI 38234870) and Leifsonia xyli (GI 50954678). A single EI-IIAFru protein (GI 15804544) was found in only one of the four strains of E. coli analyzed, strain O157:H7 EDL933. This fusion could have resulted from a sequencing error that split the HPr-EI-IIAFru protein found in the other three strains of E. coli. The HPr-EI-IIAFru fusion was found in several γ-Proteobacteria and one member of the β-Proteobacteria, Chromobacterium violaceum (GI 34497766). Three HPr-IIANtr (Fru family) fusions were found in the three Vibrio species analyzed. Although IIAFru-HPr fusions were found only in γ-Proteobacteria, IIAFru-HPr-EI fusions were found in α-, β-, and γ-Proteobacteria. IIAGlc-HPr-EI fusions were found mostly in β-Proteobacteria, but also in Caulobacter crescentus (α-Proteobacteria; GI 16124703 and 16124792) and P. aeruginosa (γ-Proteobacteria; GI 15598955).
The IIBAGlc fusion was found only in Staphylococcus aureus (GI 49482502). Since this fusion was found in all five strains of S. aureus analyzed, it is unlikely to be a sequencing artifact. IIABCGlc fusions were found (as two copies) only in the two Lactobacillus species analyzed. IIABAsc fusions were found only in the Actinobacteria. Two IICDMan fusions were found in two thermophilic bacteria, Symbiobacterium thermophilum (GI 51892416) and Thermoanaerobacter tengcongensis (GI 20806720). The IICBLac fusions were found only in a few Firmicutes, specifically in Staphylococcus species, Streptococcus species, and Clostridium acetobutylicum (GI 15895217). While the IIBCAFru fusions were found only in the Firmicutes, the IIACBFru fusion was found only in the three Vibrio species (γ-Proteobacteria). All other fusions were found distributed in diverse taxonomic groups, although interestingly, the IIBCFru and IICBAFru fusions predominated in Proteobacteria while the IIBCAGlc and IIABCFru fusions were found mostly in Firmicutes and Actinobacteria. IICBAGlc fusions also predominated in the Firmicutes.
Transcriptional activators and antiterminators of the PTS regulatory domain (PRD) family are known to incorporate PTS IIA and IIB domains into their structures (24, 29). Excluding these known cases where PTS and non-PTS domains are fused, we have tabulated all other examples revealed by our genome analyses. Domains were identified by searching the conserved domain database (42). The types of fusions identified are listed in Table 13. The proteins are grouped according to the positions of the extra domains (indicated with X's in Table 13).
As reported previously, the FruB proteins of E. coli and related organisms have a IIA-X-FPr domain structure (67). In Haemophilus and Mannheimia, the same domain order is found, but an extra C-terminal FPr (fructose-inducible HPr) is present (69), probably due to a recent intragenic duplication event (category A in Table 13). In Chlorobium and Fusobacterium (category B), IIAFru-X and IIANtr-X proteins of similar sizes are found. These are homologous to each other throughout their lengths, and the IIA domains are homologous to the IIA domain of the FruB protein of E. coli. In F. nucleatum, the extra domain is homologous to CBS domains found in membrane proteins that are probably involved in signal transduction (COG34481).
Other PTS proteins with C-terminal X domains are from organisms within the Mycoplasma group (category B). The C-terminal X domains in these proteins share homology with proteins that interact with DNA, such as the MAD (mitotic arrest deficient or mitotic checkpoint) proteins (107), the Smc (structural maintenance of chromosomes) proteins (34), the SbcC protein (an ATPase involved in DNA repair) (15), the HEC1 (highly expressed in cancer) protein, which may play a role in chromosomal segregation (43), and the TOPEUc (DNA topoisomerase 1 [Eukaryota]) protein (92). All of these homologous domains are linked to IICGlc domains in various PTS permeases (Table 13). It is possible that some of these X domains function in DNA binding or in protein-protein interactions. The PTS proteins to which these X domains in category B are fused include IIABCGlc proteins as well as IIBCGlc, IIBCFru, and IICAsc proteins.
X domains are fused N-terminally to IIANtr domains in bacteria of the chlamydial kingdom, in the δ-proteobacterium Geobacter sulfurreducens, and in the spirochete Treponema denticola (category C in Table 13). At least some of these domains are homologous to helix-turn-helix DNA binding domains in transcriptional regulators of the MerR family (8). It seems likely that these proteins function in DNA binding, possibly to regulate gene expression in response to PTS IIANtr domain phosphorylation. In other fusions, the X domains, of variable size and sequence, are fused to IIAGat, IIBFru, and IIBCFru proteins.
In category D, we found X domains linked N-terminally to the integral membrane IICGlc or IICGut domain. In the Salmonella IICGlc protein, the X domain is 165 residues long, but in the IIC2Gut homologue from M. loti, it is only 50 residues long.
In category E (miscellaneous), we found X domains linked both N- and C-terminally to a partially homologous IICFru protein. This could be a sequence-divergent IIBCA (Table 12). Finally, there is an XΔIIBAsc (X = 80 residues) and a ΔIIBGlc-X (X = 40 residues) protein where the IIB domains are N-terminally truncated. Because of the truncations, it is unlikely that these proteins function as classical PTS phosphoryl transfer proteins.
Vibrio species and Wolinella succinogenes have IICLac homologues fused to C-terminal DUF2 domains (category F). The DUF2 domain, now characterized as the EAL domain, occurs ubiquitously in bacteria and has been shown to be a characteristic feature of cyclic diguanylate-specific phosphodiesterases (89, 95). Moreover, putative Na+/H+ antiporters of the monovalent cation-proton antiporter 2 family (TC no. 2.A.37) are linked to IIAFru domains in Pirellula and Borrelia species (category G). C-terminal IIAGlc-like domains in β-galactoside permeases of the GPH family (TC no. 2.A.2) are found in lactobacilli, where they serve a regulatory function (60, 61). In Pasteurella multocida, a triosephosphate isomerase, a glycolytic enzyme involved in the reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate, contains a C-terminally fused IIBGlc domain. Finally, in Thermoanaerobacter tengcongensis, a transcriptional regulator with an ATPase domain is fused to a C-terminal IIAFru domain. The domain organization in this protein most closely resembles the domain structure of the LevR protein of Bacillus subtilis, except that the IIBGat domain cannot be recognized, possibly due to extensive sequence divergence, and the C-terminal PRD domain is replaced with a IIAFru-like domain. This structure is reminiscent of an HPr-transcription factor fusion protein identified previously in Clostridium acetobutylicum, where an HPr-like domain is fused N-terminally to a similar (in sequence) transcription factor (73).
The results summarized in this section and in Table 13 lead to the suggestion that PTS proteins and their phosphorylation regulate macromolecular interactions such as protein-protein and protein-DNA binding. The identification of these fusion proteins leads to predictions with respect to several novel regulatory functions of the PTS, including the regulation of ion transport, transcription, enzymatic activities, and several types of macromolecular interactions. It is likely that these postulated regulatory functions will prove to augment the known list of PTS functions presented in Table 2. It should be noted that in some cases where non-PTS domains of unknown function are apparently fused to PTS domains, errors in sequencing and ORF prediction could account for artifactual fusions.
In several cases, the genomes of multiple strains for a single species have been fully sequenced (Table (Table14).14). For example, four different strains each of Bacillus anthracis, Chlamydophila pneumoniae, and Escherichia coli and five different strains each of Staphylococcus aureus and Streptococcus pyogenes have been sequenced. The availability of these genome sequences provides us with the unique capability of comparing the complements of PTS proteins in closely related strains. Seven of the 18 species listed in Table Table1414 had no differences in PTS protein composition between the different strains for which complete genome data were available. In addition, two species, B. anthracis and S. enterica serovar Typhi, each exhibits only a single difference, a split gene which could be due to a sequencing error (Table (Table14).14). The nine remaining species listed in Table Table1414 are believed to exhibit true strain differences.
Of the four E. coli strains, the three pathogenic strains all possess extra Asc, Gat, and Man systems, as well as an extra HPr homologue, that are lacking in E. coli K-12. Additionally, one strain (CFT073) has an extra Lac-type system. The genome sizes of these four E. coli strains differ substantially (108). The two strains of Shigella flexneri, which are as close to E. coli as the different E. coli strains are to each other, differ from one another only in possessing a large genomic inversion that carries genes encoding complete Fru, Glc, and Man systems. Consequently, there is no PTS compositional difference between these two strains. They most closely resemble the E. coli K-12 strain but differ from this strain in that the S. flexneri strains possess an extra Man system. The two strains of Vibrio vulnificus differ with respect to their complete Fru and Man systems, and they exhibit inversions and insertions with respect to the arrangement of some of their pts genes relative to each other. Only one of the three Yersinia pestis strains possesses a split Asc system, with the genes encoding the IIA and IIB proteins together but separated from the IIC protein-encoding gene. In the other two Y. pestis strains, these genes are found together in a single putative operon.
Two Bacillus species, B. cereus and B. licheniformis, include strains that differ with respect to the presence or absence of a single PTS permease system, an ascorbate (Asc)-type system for B. cereus and a fructose (Fru)-type system for B. licheniformis. Of the two strains of Listeria monocytogenes that have had their genomes sequenced, one (EGD-e) has two extra Glc systems, three extra Fru systems, and one extra Asc system compared to the other (4bF2365).
The three species of Streptococcus listed in Table 14 also show strain differences. The two Streptococcus agalactiae strains differ with respect to the presence or absence of a complete Lac system. The five S. pyogenes strains differ from each other in that two have an extra complete Dha system as well as an extra orphan DhaK protein that is lacking in the other three strains. The two sequenced S. pneumoniae strains differ in that one possesses a complete Fru system and a IIANtr protein that are lacking in the other strain. Additionally, the R6 strain of S. pneumoniae has a fused BglG-IIAFru protein (GI 15902323) which is split into two separate proteins, BglG (GI 15900239) and IIAFru (GI 15900240), in strain TIGR4. Whether this difference is due to a sequencing error cannot be determined with certainty using bioinformatic approaches alone.
In each case where strain differences were observed for a single species, particularly where genes encoding PTS proteins were either present or lacking, one can conclude that pts genes have been gained and lost relatively frequently during recent evolutionary history. This observation suggests that these genes have been transferred horizontally with a high frequency. This conclusion is substantiated by phylogenetic relationships which argue against a strict vertical descent for several PTS permeases (data not shown). It is interesting that many of the strain differences we observed involve the gain or loss of complete enzyme II complexes, suggesting that the acquisition or loss of these genes has functional significance and was therefore subject to selection.
In this study, we examined 202 genomes for homologues of all known PTS proteins. Homologues were found only in bacteria, in agreement with our suggestion that the PTS evolved late, after the three domains of life separated (81). Within the bacterial domain, 22% of the species examined had no PTS protein homologues, 21% had only soluble putative regulatory proteins, and 57% had all constituents required for sugar phosphorylation, including at least one complete PTS permease. Organisms lacking PTS homologues included several of the most primitive bacteria with sequenced genomes (Aquifex, Thermus, and Thermotoga). Thermotoga maritima is a saccharolytic bacterium that metabolizes several simple and complex carbohydrates, including glucose, sucrose, maltose, starch, cellulose, and xylose (31, 32). It possesses nine putative sugar-specific ABC transport systems for the uptake of these carbon sources (52). This fact again argues in favor of the late evolution of the PTS. However, all cyanobacteria examined, representative α-, δ-, ε-, and γ-proteobacteria, several actinobacteria, and one mollicute also lacked PTS protein homologues (Table 6). Except for the cyanobacteria, many of these pts gene-lacking genomes probably resulted from genome minimalization. Twenty-nine species of bacteria were found to possess only soluble PTS protein homologues, presumed to function in regulation (7, 23). These organisms included the chlamydiae, several spirochetes, a green bacterium, Chlorobium tepidum, and several proteobacteria, particularly in the alpha and beta subcategories (Table 7). Nevertheless, bacteria with complete complements of PTS energy-coupling proteins plus permeases could be found in almost all of these bacterial kingdoms, except the chlamydial kingdom and the primitive bacterial kingdoms (Table 9). It would therefore appear on the basis of these observations that PTS protein-encoding genes have been gained and lost with a high frequency.
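The percentages quoted above follow directly from the species counts given in this report (30, 29, and 77 of 136 bacterial species). The short Python sketch below simply reproduces that arithmetic; the dictionary keys and variable names are illustrative labels of ours, not terms from the analysis pipeline.

```python
# Species counts reported in this study for the 136 bacterial species analyzed.
# The category labels are illustrative names, not identifiers from the original analysis.
species_counts = {
    "no PTS homologues": 30,
    "soluble PTS proteins only": 29,
    "complete PTS with at least one permease": 77,
}

total_species = sum(species_counts.values())

# Convert each count to a whole-number percentage of all species.
species_percent = {
    category: round(100 * count / total_species)
    for category, count in species_counts.items()
}

for category, percent in species_percent.items():
    print(f"{category}: {species_counts[category]}/{total_species} = {percent}%")
```

Rounding to whole percentages is what yields the 22%, 21%, and 57% figures cited in the text.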
This last conclusion was substantiated by analyses of various strains of a single species and of various species in a single genus where the complement of PTS permeases varied drastically (Table 14). Thus, for example, different E. coli strains possess between 17 and 26 PTS permeases, and the different strains differ with respect to the presence of members of 4 PTS permease families (Asc, Gat, Man, and Lac; Table 14). Similarly, major differences were observed between different strains of Vibrio vulnificus, Listeria monocytogenes, and several Streptococcus species (Table 14). In contrast, several sequenced strains of other species (including Chlamydophila pneumoniae, Staphylococcus aureus, Buchnera aphidicola, Xylella fastidiosa, and Neisseria meningitidis) exhibited no differences in PTS protein content. We concluded that the gain and loss of PTS permeases has occurred repeatedly, but in a species-specific fashion. This has apparently resulted from genome minimalism (51) as well as the horizontal transfer of genetic information encoding PTS permeases and energy-coupling proteins (108). The detection of genes encoding PTS protein homologues on mobile genetic elements (14, 88, 103, 112; see above) substantiates this last conclusion. Recently, the evolution of the mannose PTS transporters has been discussed, and extensive horizontal transfer of the genetic material encoding these systems has been documented (114).
Within a single coherent genus, different species similarly show differences in PTS permease content. Thus, corynebacterial species may either possess or lack a fructose-type PTS and an ascorbate-type PTS permease. More surprising, Mycoplasma species with drastically reduced genome sizes may possess between two and nine complete PTS permeases. Differences were also noted among the Streptomyces and Clostridium species (Table 9).
In this study, we divided the PTS permeases into seven families (Glc, Fru, Lac, Gut, Gat, Man, and Asc) (Table 9). The occurrence of the members of these families is summarized in Fig. 2. Of the 77 bacterial species analyzed that encode PTS permeases within their genomes, the glucose (Glc) family was most highly represented. The order of prevalence of the seven families was as follows: Glc (30%) > Fru (25%) > Man (15%) > Lac (14%) > Asc (9%) > Gat (4%) > Gut (3%). However, the different taxonomic groups show various proportions of each of the PTS permease families. While the most abundant family in the Firmicutes was the Glc family, followed by Fru, Lac, and Man, the most prevalent family in the Proteobacteria was the Fru family, followed by Glc, Man, and Asc. When it is considered that the Glc, Fru, and Lac families actually belong to a single superfamily, while the Gat and Asc families combine to form a second superfamily (13, 33, 81), it can be concluded that the Glc/Fru/Lac superfamily includes 69% of all PTS permeases. The Man family includes 15%, the Asc/Gat superfamily includes 13%, and the Gut family includes 3%. This observation is in agreement with our suggestion that the fructose PTS was the first primordial system to have evolved (75, 81). These arguments are strengthened by the finding reported here that more bacteria with a single type of PTS permease have a Fru-type system than any other type.
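The superfamily totals given above are sums of the per-family percentages. As a minimal sketch of that bookkeeping, the Python snippet below groups the seven family shares by superfamily; the mapping encodes only the groupings stated in the text, and the function and variable names are our own illustrative choices.

```python
from collections import defaultdict

# Percentage of all PTS permeases belonging to each family, as reported in the text.
family_share = {"Glc": 30, "Fru": 25, "Man": 15, "Lac": 14, "Asc": 9, "Gat": 4, "Gut": 3}

# Superfamily groupings as stated in the text; Man and Gut stand alone.
superfamily_of = {
    "Glc": "Glc/Fru/Lac",
    "Fru": "Glc/Fru/Lac",
    "Lac": "Glc/Fru/Lac",
    "Asc": "Asc/Gat",
    "Gat": "Asc/Gat",
    "Man": "Man",
    "Gut": "Gut",
}


def superfamily_shares(shares, grouping):
    """Sum the per-family percentages within each superfamily."""
    totals = defaultdict(int)
    for family, percent in shares.items():
        totals[grouping[family]] += percent
    return dict(totals)


superfamily_share = superfamily_shares(family_share, superfamily_of)
for name, percent in sorted(superfamily_share.items(), key=lambda item: -item[1]):
    print(f"{name}: {percent}%")
```

The grouped totals come to 69% (Glc/Fru/Lac), 15% (Man), 13% (Asc/Gat), and 3% (Gut), matching the values cited.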
The physiological functions of HprK homologues have been identified only for the low-G+C gram-positive Firmicutes, although many such homologues have been identified in gram-negative bacterial kingdoms (29, 97). We have postulated that all such enzymes serve regulatory functions, but only in a few instances are clues available as to what those functions may be. In the case of α-Proteobacteria, truncated HprKs are found in operons with genes encoding the gluconeogenic enzyme PEP carboxykinase, a sensor kinase/response regulator pair, and other PTS proteins such as HPr and IIA homologues (29). We have proposed that the PTS proteins function in a phosphoryl transfer cascade (65) that regulates the expression of the pck gene encoding PEP carboxykinase, possibly via the sensor kinase/response regulator pair (7, 29). No experimental data have bearing on this point. The multiple distantly related HprKs found in other α-Proteobacteria, such as Rhodospirillum rubrum, are also of unknown function (97).
We have concluded that the DHA PTS enzyme II complex is a recently evolved system derived from an ATP-dependent DHA kinase (81). The fact that the DhaM components vary dramatically in their domain compositions argues in favor of this conclusion. Nevertheless, these systems occur in a wide range of bacterial kingdoms (Table 10). This may have resulted from horizontal transfer of the genes encoding these systems, as suggested by our phylogenetic analyses (see below).
Phylogenetic analyses revealed a lack of orthology between PTS Dha proteins from a variety of bacteria. Thus, although an excellent phylogenetic correlation was observed between the DhaK, DhaL, and DhaM trees (Fig. 5), a very poor correlation was observed between these trees and the 16S rRNA trees for the corresponding organisms (data not shown). The implication is that while little or no shuffling of the three constituents of the Dha systems has occurred throughout their evolutionary divergence, they have been transferred laterally together as a unit. While orphan PTS proteins are frequently encoded within bacterial genomes, they may not always be functional. They may be the result of residual inactive genetic information resulting from genome minimalism (51).
The instability of the DHA PTS is further indicated by the variation in the structures of the DhaM components. Some DhaM proteins contain only the IIADha domain, but three homologous fusion proteins were also identified. These included (i) a IIADha-HPr fusion, (ii) a IIADha-HPr-EI fusion, and (iii) a IIADha-HPr-EIΔ fusion with the C-terminal region of enzyme I missing. In E. coli, which possesses a DhaM protein with a type 3 fusion structure, the classical enzyme I and HPr are required for DhaM phosphorylation. These two energy-coupling proteins may not be required for phosphorylation of the type 2 fusion proteins. Furthermore, in Bradyrhizobium japonicum, the dha operon encodes DhaM, HPr, and enzyme I with the gene order dhaMHI, the same order as that observed for the type 2 and type 3 fusion proteins mentioned above. The type 2 fusion proteins could have resulted from the elimination of chain termination codons and/or the introduction of intragenic microdeletions. It thus seems that the scenario found in B. japonicum could represent a transitional state towards production of the tridomain fusion proteins. We may be visualizing a “snapshot” of the evolutionary process still in progress (81).
We have identified many types of previously unidentified PTS protein fusions present in the bacterial genomes analyzed. EI-IIA fusions with IIA C-terminally linked to EI, but no IIA-EI fusions with IIA N-terminally linked to EI, were identified. Conversely, among tridomain proteins containing EI, HPr, and IIA, both HPr-I-IIA and IIA-HPr-I fusions were identified, but no fusions had HPr directly linked to the C terminus of EI. These observations cannot be attributed to stereospecific requirements for the HPr-IIA interaction, since both HPr-IIA and IIA-HPr fusion types were found. The explanation may be related to the fact that the HPr binding domain in EI is the N-terminal domain (40, 58). Thus, HPr must be in the proximity of this domain, and consequently, N-terminal but not C-terminal fusions of HPr to EI may be stereospecifically allowed. The fact that IIA can be linked to EI C-terminally but not N-terminally may be similarly explained. Thus, steric hindrance and competition between HPr and IIA may prevent the covalent association of IIA with the N-terminal domain of EI. In other cases where certain domain fusions are favored over others, preferred, but not absolutely required, associative properties of the fused domains may provide an explanation.
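The fusion pattern described above can be restated as two adjacency constraints on domain order, read from N terminus to C terminus: IIA was not found immediately preceding EI, and HPr was not found immediately following EI. The Python sketch below encodes only that observation as a hypothetical check; the function names and the hyphen-separated notation (with enzyme I written as EI) are assumptions made for illustration, and the constraints summarize what was seen in these genomes rather than an established rule.

```python
# Adjacent domain pairs (N-terminal domain, C-terminal domain) that were NOT
# observed among the fusions described in the text. This is an illustrative
# summary of the reported pattern, not a proven structural rule.
UNOBSERVED_ADJACENT_PAIRS = {("IIA", "EI"), ("EI", "HPr")}


def parse_fusion(name):
    """Split a hyphen-separated fusion name (N to C terminus) into domain names."""
    return name.split("-")


def breaks_observed_pattern(name):
    """Return True if any directly linked domain pair is one that was not observed."""
    domains = parse_fusion(name)
    return any(pair in UNOBSERVED_ADJACENT_PAIRS for pair in zip(domains, domains[1:]))


# Fusion types reported as present in the genomes analyzed.
observed = ["EI-IIA", "HPr-EI-IIA", "IIA-HPr-EI", "HPr-IIA", "IIA-HPr"]
# Fusion types reported as absent.
not_observed = ["IIA-EI", "EI-HPr"]

for fusion in observed + not_observed:
    print(fusion, "breaks pattern" if breaks_observed_pattern(fusion) else "consistent")
```

Note that IIA-HPr-EI is consistent with the pattern because IIA is not directly linked to EI in that arrangement; only directly adjacent pairs are tested.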
We have found PTS protein domains fused to a variety of novel non-PTS proteins and protein domains. The fusion of PTS protein domains to (or within) transcriptional regulators (24, 29, 98, 106) and non-PTS transport proteins (60, 61) had been known previously. Our genome analyses revealed many additional fusions of this general type (Table 13). These included Na+/H+ antiporter homologues with C-terminally fused IIAFru-like domains, triose-P isomerase homologues with C-terminally fused IIBGlc-like domains, and PspF-type putative transcriptional regulatory proteins of the NtrC family, which most closely resemble LevR of Bacillus subtilis (5, 16), fused C-terminally to IIAFru-like domains. However, many other types of domains were found in association with PTS protein domains. These included domains homologous to the CBS, MAD, Sbc, HEC1, DUF2, and helix-turn-helix domains. Several of these domains are known to be involved in signal transduction, and more generally, in macromolecular (protein-protein and protein-nucleic acid) interactions. Elucidation of the generalized functions of the non-PTS associative domains will be of great value in determining the specific functions of the fused proteins tabulated in Table 13. Such efforts should keep molecular biologists entertained for decades to come.
This work was supported by NIH grants GM55434 and GM64368 from the National Institute of General Medical Sciences.
We thank Mary Beth Hiller for her assistance in the preparation of the manuscript.