In this study, we examined 202 genomes for homologues of all known PTS proteins. Homologues were found only in bacteria, in agreement with our suggestion that the PTS evolved late, after the three domains of life separated (81). Within the bacterial domain, 22% of the species examined had no PTS protein homologues, 21% had only soluble putative regulatory proteins, and 57% had all constituents required for sugar phosphorylation, including at least one complete PTS permease. Organisms lacking PTS homologues included several of the most primitive bacteria with sequenced genomes (Aquifex, Thermus, and Thermotoga).
Thermotoga maritima is a saccharolytic bacterium that metabolizes several simple and complex carbohydrates, including glucose, sucrose, maltose, starch, cellulose, and xylose (31, 32). It possesses nine putative sugar-specific ABC transport systems for the uptake of these carbon sources (52). This fact again argues in favor of the late evolution of the PTS. However, all cyanobacteria examined, representative α-, δ-,

-, and γ-proteobacteria, several actinobacteria, and one mollicute also lacked PTS protein homologues (Table ). Except for the cyanobacteria, many of these pts gene-lacking genomes probably resulted from genome minimalization. Twenty-nine species of bacteria were found to possess only soluble PTS protein homologues, presumed to function in regulation (7, 23). These organisms included the chlamydiae, several spirochetes, a green bacterium, Chlorobium tepidum, and several proteobacteria, particularly in the alpha and beta subcategories (Table ). Nevertheless, bacteria with complete complements of PTS energy-coupling proteins plus permeases could be found in almost all of these bacterial kingdoms, except the chlamydial kingdom and the primitive bacterial kingdoms (Table ). It would therefore appear on the basis of these observations that PTS protein-encoding genes have been gained and lost with a high frequency.
This last conclusion was substantiated by analyses of various strains of a single species and of various species in a single genus, in which the complement of PTS permeases varied drastically (Table ). Thus, for example, different E. coli strains possess between 17 and 26 PTS permeases, and the strains differ with respect to the presence of members of 4 PTS permease families (Asc, Gat, Man, and Lac; Table ). Similarly, major differences were observed between different strains of Vibrio vulnificus, Listeria monocytogenes, and several Streptococcus species (Table ). In contrast, several sequenced strains of other species (including Chlamydophila pneumoniae, Staphylococcus aureus, Buchnera aphidicola, Xylella fastidiosa, and Neisseria meningitidis) exhibited no differences in PTS protein content. We concluded that the gain and loss of PTS permeases have occurred repeatedly, but in a species-specific fashion. This has apparently resulted from genome minimalism (51) as well as the horizontal transfer of genetic information encoding PTS permeases and energy-coupling proteins (108). The detection of genes encoding PTS protein homologues on mobile genetic elements (14, 88, 103, 112; see above) substantiates this last conclusion. Recently, the evolution of the mannose PTS transporters has been discussed, and extensive horizontal transfer of the genetic material encoding these systems has been documented (114).
Within a single coherent genus, different species similarly show differences in PTS permease content. Thus, corynebacterial species may either possess or lack a fructose-type PTS and an ascorbate-type PTS permease. More surprising, Mycoplasma species with drastically reduced genome sizes may possess between two and nine complete PTS permeases. Differences were also noted among the Streptomyces and Clostridium species (Table ).
In this study, we divided the PTS permeases into seven families (Glc, Fru, Lac, Gut, Gat, Man, and Asc) (Table ). The occurrence of the members of these families is summarized in Fig. . Of the 77 bacterial species analyzed that encode PTS permeases within their genomes, the glucose (Glc) family was most highly represented. The order of prevalence of the seven families was as follows: Glc (30%) > Fru (25%) > Man (15%) > Lac (14%) > Asc (9%) > Gat (4%) > Gut (3%). However, the different taxonomic groups show various proportions of each of the PTS permease families. While the most abundant family in the Firmicutes was the Glc family, followed by Fru, Lac, and Man, the most prevalent family in the Proteobacteria was the Fru family, followed by Glc, Man, and Asc. When it is considered that the Glc, Fru, and Lac families actually belong to a single superfamily, while the Gat and Asc families combine to form a second superfamily (13, 33, 81), it can be concluded that the Glc/Fru/Lac superfamily includes 69% of all PTS permeases. The Man family includes 15%, the Asc/Gat superfamily includes 13%, and the Gut family includes 3%. This observation is in agreement with our suggestion that the fructose PTS was the first primordial system to have evolved (75, 81). These arguments are strengthened by the finding reported here that more bacteria with a single type of PTS permease have a Fru-type system than any other type.
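The superfamily percentages quoted above follow directly from the seven family percentages. A minimal Python sketch of that arithmetic is given here for readers who wish to check it. The dictionary and function names are illustrative assumptions, not part of the original analysis, and the input values are the rounded percentages stated in the text.

```python
# Family shares (%) as quoted in the text (rounded values).
family_share = {
    "Glc": 30, "Fru": 25, "Man": 15, "Lac": 14,
    "Asc": 9, "Gat": 4, "Gut": 3,
}

# Superfamily grouping as described in the text.
# Man and Gut are treated as standing alone (hypothetical layout for illustration).
superfamily = {
    "Glc/Fru/Lac": ["Glc", "Fru", "Lac"],
    "Man": ["Man"],
    "Asc/Gat": ["Asc", "Gat"],
    "Gut": ["Gut"],
}


def superfamily_shares(shares, groups):
    """Sum the family percentages within each superfamily."""
    return {name: sum(shares[f] for f in members)
            for name, members in groups.items()}


totals = superfamily_shares(family_share, superfamily)

# Print from most to least prevalent.
for name, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct}%")
```

With the quoted inputs, the grouped totals are 69% (Glc/Fru/Lac), 15% (Man), 13% (Asc/Gat), and 3% (Gut), which together account for 100% of the permeases.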
The physiological functions of HprK homologues have been identified only for the low-G+C gram-positive Firmicutes, although many such homologues have been identified in gram-negative bacterial kingdoms (29, 97). We have postulated that all such enzymes serve regulatory functions, but only in a few instances are clues available as to what those functions may be. In the case of α-Proteobacteria, truncated HprKs are found in operons with genes encoding the gluconeogenic enzyme PEP carboxykinase, a sensor kinase/response regulator pair, and other PTS proteins such as HPr and IIA homologues (29). We have proposed that the PTS proteins function in a phosphoryl transfer cascade (65) that regulates the expression of the pck gene encoding PEP carboxykinase, possibly via the sensor kinase/response regulator pair (7, 29). No experimental data yet bear on this point. The multiple distantly related HprKs found in other α-Proteobacteria, such as Rhodospirillum rubrum, are also of unknown function (97).
We have concluded that the DHA PTS enzyme II complex is a recently evolved system derived from an ATP-dependent DHA kinase (81). The fact that the DhaM components vary dramatically in their domain compositions argues in favor of this conclusion. Nevertheless, these systems occur in a wide range of bacterial kingdoms (Table ). This may have resulted from horizontal transfer of the genes encoding these systems, as suggested by our phylogenetic analyses (see below).
Phylogenetic analyses revealed a lack of orthology between PTS Dha proteins from a variety of bacteria. Thus, although an excellent phylogenetic correlation was observed between the DhaK, DhaL, and DhaM trees (Fig. ), a very poor correlation was observed between these trees and the 16S rRNA trees for the corresponding organisms (data not shown). The implication is that while little or no shuffling of the three constituents of the Dha systems has occurred throughout their evolutionary divergence, they have been transferred laterally together as a unit. While orphan PTS proteins are frequently encoded within bacterial genomes, they may not always be functional. They may represent residual inactive genetic information left behind by genome minimalism (51).
The instability of the DHA PTS is further indicated by the variation in the structures of the DhaM components. Some DhaM proteins contain only the IIA(Dha) domain, but three homologous fusion proteins were also identified. These included (i) a IIA(Dha)-HPr fusion, (ii) a IIA(Dha)-HPr-EI fusion, and (iii) a IIA(Dha)-HPr-EIΔ fusion with the C-terminal region of enzyme I missing. In E. coli, which possesses a DhaM protein with a type 3 fusion structure, the classical enzyme I and HPr are required for DhaM phosphorylation. These two energy-coupling proteins may not be required for phosphorylation of the type 2 fusion proteins. Furthermore, in Bradyrhizobium japonicum, the dha operon encodes DhaM, HPr, and enzyme I with the gene order dhaMHI, the same order as that observed for the type 2 and type 3 fusion proteins mentioned above. The type 2 fusion proteins could have resulted from the elimination of chain termination codons and/or the introduction of intragenic microdeletions. It thus seems that the scenario found in B. japonicum could represent a transitional state towards production of the tridomain fusion proteins. We may be visualizing a “snapshot” of the evolutionary process still in progress (81).
We have identified many types of previously unidentified PTS protein fusions in the bacterial genomes analyzed. EI-IIA fusions with IIA C-terminally linked to EI, but no IIA-EI fusions with IIA N-terminally linked to EI, were identified. Conversely, among tridomain proteins containing EI, HPr, and IIA, both HPr-I-IIA and IIA-HPr-I fusions were identified, but no fusions had HPr directly linked to the C terminus of EI. These observations cannot be attributed to stereospecific requirements for the HPr-IIA interaction, since both HPr-IIA and IIA-HPr fusion types were found. The explanation may be related to the fact that the HPr binding domain in EI is the N-terminal domain (40, 58). Thus, HPr must be in the proximity of this domain, and consequently, N-terminal but not C-terminal fusions of HPr to EI may be stereospecifically allowed. The fact that IIA can be linked to EI C-terminally but not N-terminally may be similarly explained. Thus, steric hindrance and competition between HPr and IIA may prevent the covalent association of IIA with the N-terminal domain of EI. In other cases where certain domain fusions are favored over others, preferred, but not absolutely required, associative properties of the fused domains may provide an explanation.
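The fusion-order observations above can be summarized, purely for illustration, as two adjacency constraints: IIA was not seen immediately N-terminal to EI, and HPr was not seen immediately C-terminal to EI. The Python sketch below encodes that reading. Treating the observations as strict adjacency rules is an assumption made here for clarity, the function and constant names are hypothetical, and "EI" is used where the fusion names in the text abbreviate enzyme I as "I".

```python
# Adjacent domain pairs (N-terminal, C-terminal) that were NOT observed
# in the genomes analyzed, according to the text. Encoding them as hard
# rules is an illustrative assumption, not a claim about mechanism.
UNOBSERVED_ADJACENT = {
    ("IIA", "EI"),  # IIA N-terminally linked to EI
    ("EI", "HPr"),  # HPr directly linked to the C terminus of EI
}


def is_consistent(order):
    """Return True if a domain order such as 'HPr-EI-IIA' (N to C terminus)
    contains none of the unobserved adjacent pairs."""
    domains = order.split("-")
    return all(pair not in UNOBSERVED_ADJACENT
               for pair in zip(domains, domains[1:]))


# Domain orders mentioned in the text, written N terminus to C terminus.
for order in ["EI-IIA", "IIA-EI", "HPr-EI-IIA", "IIA-HPr-EI",
              "EI-HPr", "HPr-IIA", "IIA-HPr"]:
    status = "consistent" if is_consistent(order) else "not observed"
    print(f"{order}: {status}")
```

Under these two rules, every fusion type reported as present (EI-IIA, HPr-EI-IIA, IIA-HPr-EI, HPr-IIA, IIA-HPr) passes, while the two arrangements reported as absent (IIA-EI and EI-HPr) are rejected.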
We have found PTS protein domains fused to a variety of novel non-PTS proteins and protein domains. The fusion of PTS protein domains to (or within) transcriptional regulators (24, 29, 98, 106) and non-PTS transport proteins (60, 61) had been known previously. Our genome analyses revealed many additional fusions of this general type (Table ). These included Na+/H+ antiporter homologues with C-terminally fused IIA(Fru)-like domains, triose-P isomerase homologues with C-terminally fused IIB(Glc)-like domains, and PspF-type putative transcriptional regulatory proteins of the NtrC family, which most closely resemble LevR of Bacillus subtilis (5, 16), fused C-terminally to IIA(Fru)-like domains. However, many other types of domains were found in association with PTS protein domains. These included domains homologous to the CBS, MAD, Sbc, HEC1, DUF2, and helix-turn-helix domains. Several of these domains are known to be involved in signal transduction, and more generally, in macromolecular (protein-protein and protein-nucleic acid) interactions. Elucidation of the generalized functions of the non-PTS associative domains will be of great value in determining the specific functions of the fused proteins tabulated in Table . Such efforts should keep molecular biologists entertained for decades to come.