The major facilitator superfamily (MFS) is one of the two largest families of membrane transporters found on Earth. It is present ubiquitously in bacteria, archaea, and eukarya and includes members that can function by solute uniport, solute/cation symport, solute/cation antiport and/or solute/solute antiport with inwardly and/or outwardly directed polarity. All MFS protein sequences in the public databases as of January 1997 were identified on the basis of sequence similarity and shown to be homologous. Phylogenetic analyses revealed the occurrence of 17 distinct families within the MFS, each of which generally transports a single class of compounds. Compounds transported by MFS permeases include simple sugars, oligosaccharides, inositols, drugs, amino acids, nucleosides, organophosphate esters, Krebs cycle metabolites, and a large variety of organic and inorganic anions and cations. Protein members of some MFS families are found exclusively in bacteria or in eukaryotes, but others are found in bacteria, archaea, and eukaryotes. All permeases of the MFS possess either 12 or 14 putative or established transmembrane α-helical spanners, and evidence is presented substantiating the proposal that an internal tandem gene duplication event gave rise to a primordial MFS protein prior to divergence of the family members. All 17 families are shown to exhibit the common feature of a well-conserved motif present between transmembrane spanners 2 and 3. The analyses reported serve to characterize one of the largest and most diverse families of transport proteins found in living organisms.
“If you do not expect to, you will not discover the unexpected.”
Transport systems allow the uptake of essential nutrients and ions, excretion of end products of metabolism and deleterious substances, and communication between cells and the environment (53). They also provide essential constituents of energy-generating and energy-consuming systems (54). Primary active transporters drive solute accumulation or extrusion by using ATP hydrolysis, photon absorption, electron flow, substrate decarboxylation, or methyl transfer (17). If charged molecules are unidirectionally pumped as a consequence of the consumption of a primary cellular energy source, electrochemical potentials result (54). The consequential chemiosmotic energy generated can then be used to drive the active transport of additional solutes via secondary carriers which merely facilitate the transport of one or more molecular species across the membrane (48, 49).
Recent genome-sequencing data and a wealth of biochemical and molecular genetic investigations have revealed the occurrence of dozens of families of primary and secondary transporters (63). Two such families have been found to occur ubiquitously in all classifications of living organisms. These are the ATP-binding cassette (ABC) superfamily (15, 21, 37, 44) and the major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family (7, 28, 30, 35, 51). While ABC family permeases are in general multicomponent primary active transporters, capable of transporting both small molecules and macromolecules in response to ATP hydrolysis (59), the MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients. Although well over 100 families of transporters have now been recognized and classified (73), the ABC superfamily and MFS account for nearly half of the solute transporters encoded within the genomes of microorganisms (63). They are also prevalent in higher organisms. The importance of these two families of transport systems to living organisms can therefore not be overestimated.
The MFS was originally believed to function primarily in the uptake of sugars (36, 46). Subsequent studies revealed that drug efflux systems and transporters of Krebs cycle metabolites belong to this family (30, 62). The family was then expanded to include organophosphate:phosphate exchangers and oligosaccharide:H+ symport permeases (51). Reizer et al. (67) noted that a mammalian phosphate:Na+ symporter is a distant member of this family; Paulsen et al. (60) subdivided the MFS drug efflux pumps into two phylogenetically distinct families with differing topologies; and Goffeau et al. (26) identified a novel MFS family that consists exclusively of functionally uncharacterized proteins from Saccharomyces cerevisiae revealed by genome sequencing. Recently, Williams and Shaw (86) noted that a family of bacterial aromatic acid permeases belongs to the MFS. These observations raised the possibility that the MFS is far more widespread in nature and far more diverse in function than had been thought previously.
Although isolated reports have allowed recognition of an increasing degree of diversity within the MFS, there has been no recent systematic attempt to identify the sequenced proteins that make up the MFS and to classify these proteins into phylogenetic families. We have therefore undertaken this task in the hopes of allowing (i) recognition of the significance of this family to cell physiology; (ii) extrapolation of biochemical, molecular genetic, and biophysical information obtained from the study of a few such systems to all members of the family; (iii) unification of mechanistic models, to the greatest extent possible, so as to be applicable to a maximal number of transporters; (iv) introduction of a rational system of MFS protein classification; and (v) comprehension of the pathways taken in the development of structural and functional diversity resulting from the evolutionary process used.
In this report, we present analyses that allow us to generalize some previous observations regarding the MFS and to note additional characteristics of this immense superfamily. Thus, based exclusively on degrees of sequence similarity, we have constructed phylogenetic trees which allow us to divide all the recognized members of the MFS into 17 families. The members of each family all proved to be more closely related in sequence to each other than they were to any of the other MFS proteins. This fact presumably reflects the evolutionary histories of these proteins (71, 72), and, remarkably, we find that phylogenetic family correlates with function. Thus, each of the families recognizes and transports a distinct class of structurally related compounds. These observations have allowed us to derive a rational classification system for the MFS based on both phylogeny and function. This classification system has proven applicable to virtually all permeases found in nature (73).
In 1990, Rubin et al. (70) presented evidence that strongly argued in favor of an earlier suggestion (see reference 36) that MFS permeases arose by a tandem intragenic duplication event. In this report, we provide additional statistical evidence in favor of this possibility. This event generated the 12-transmembrane-spanner (TMS) protein topology from a primordial 6-TMS unit. Surprisingly, all currently recognized MFS permeases retain the two six-TMS units within a single polypeptide chain, although in 3 of the 17 MFS families, an additional two TMSs are found (60). Moreover, the well-conserved MFS-specific motif between TMS2 and TMS3 and the related but less well conserved motif between TMS8 and TMS9 (36) prove to be a characteristic of virtually all of the more than 300 MFS proteins identified. The functional significance of this repeated motif has been examined by Jessen-Marshall et al. (39) and by Yamaguchi et al. (87–89).
Many additional observations allowed the identification of highly specific characteristics of individual MFS families as well as general characteristics of the MFS as a whole. We hope that the computational analyses reported will provide a guide for molecular biologists, biochemists, and biophysicists interested in structural, functional, and evolutionary aspects of MFS permeases.
The FASTA (64) and BLAST (2) programs were used to screen the peptide and translated nucleotide databases. The statistical significance of sequence similarities between putative members of the various families of the MFS was established by using the RDF2 (64) and GAP (16) programs with at least 200 random shuffles. Binary comparison scores are expressed in standard deviations (SD) (14). A value of 9 SD for a protein segment larger than 60 residues is deemed sufficient to establish homology (18, 71). This criterion was used to establish homology between MFS families (see Table 2).
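The shuffle-based comparison score can be illustrated with a minimal sketch. It is not the RDF2 or GAP implementation: the alignment scorer is a plain Smith-Waterman with assumed match, mismatch, and gap values (those programs use substitution matrices and their own gap penalties), and the function names are hypothetical. The score of the real pair is compared with the scores obtained after shuffling one sequence, and the difference is expressed in SD of the shuffled distribution.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (assumed toy scoring values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def comparison_score_sd(a, b, shuffles=200, seed=0):
    """Score of (a, b) in SD above the mean score of a versus shuffled copies of b."""
    rng = random.Random(seed)
    real = local_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        shuffled_scores.append(local_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores show no variation")
    return (real - mean) / sd


def meets_homology_criterion(a, b, threshold_sd=9, min_length=60):
    """Apply the 9 SD criterion to segments larger than 60 residues."""
    if min(len(a), len(b)) <= min_length:
        return False
    return comparison_score_sd(a, b) >= threshold_sd
```

Shuffling preserves amino acid composition, so a high score in SD reflects residue order rather than a shared compositional bias such as the hydrophobicity common to membrane proteins.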
Multiple-sequence alignments were constructed with the PREALIGN and TREE programs of Feng and Doolittle (22) and the PILEUP program (16). Phylogenetic analyses were routinely performed with the TREE program (22) but were checked with other programs. The different programs generally gave very similar, and often identical, branching orders, and the branch lengths were also strikingly similar. Branch length is approximately proportional to the degree of sequence divergence, which, to a first approximation, is assumed to be proportional to the phylogenetic distance (but see the section Conclusions and Perspectives, below). It is important to emphasize that branch lengths and even branch positions represent approximations to the evolutionary process, allowing facile visualization of the relationships between sequences within families. They reflect relative degrees of sequence divergence and can be considered to represent the evolutionary process only to a first approximation (71, 72).
Average hydropathy, average amphipathicity and average similarity analyses were conducted for all protein families analyzed. They were based on the complete multiple-sequence alignments generated with the TREE program. Only representative, well-conserved portions of these multiple-sequence alignments are presented. The hydropathy analyses were conducted with the assumptions and algorithm described by Kyte and Doolittle (45) with a sliding window of 20 residues. Similarly, a sliding window of 20 residues was used to generate the average similarity and average amphipathicity plots (45a). These latter analyses are not presented but are described in the text (see Table 2).
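As an illustration of the sliding-window hydropathy calculation, the following sketch uses the Kyte and Doolittle hydropathy scale with a window of 20 residues. It handles a single ungapped sequence only; averaging the profiles across the aligned members of a family, as done in this study, is left out, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=20):
    """Mean hydropathy of every full window along an ungapped sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical segment: a hydrophobic stretch flanked by polar residues.
    segment = "KRDE" + "LIVAFLLGIVALAGFLIVAL" + "RKDE"
    for position, value in enumerate(hydropathy_profile(segment), start=1):
        print(position, round(value, 2))
```

Peaks in such a profile mark candidate transmembrane spanners; a sequence shorter than the window yields an empty profile.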
Charge bias analysis of membrane protein topology was performed with the program TOP PRED (83). Signature sequences were defined by the method of Bairoch et al. (6). The programs MEME and MAST (5) were used to help identify conserved motifs within the protein families of the MFS. Most of the methods used in this study have been applied to numerous transport proteins and have been evaluated (see references 71 and 72 for recent reviews).
Table 1 lists and summarizes the properties of transport protein families found within the current MFS. We have classified current members of the MFS into 17 (possibly 18) distinct families. This number of MFS families represents more than a threefold expansion over that published previously (51). The table provides the family number; the name of the family; the abbreviation of the family to be used in this study; the number of currently recognized sequenced members in each family; the range of organisms in which members of the family are found; the size range of the proteins (in numbers of amino acyl residues) for fully sequenced members; the number of putative TMSs in each protein (believed to be uniform for members of a given family); the energy-coupling mechanisms, if any, used by members of the family; the polarities of transport catalyzed by family members; the substrates known to be transported by various members of the family; and a representative and well-characterized member of the family.
The largest family (family 1) is the sugar porter (SP) family, with 133 identified members. These proteins are derived from all of the major groups of living organisms: bacteria, archaea, eukaryotic protists, fungi (mostly yeasts), animals, and plants. These proteins have 12 established or putative TMSs. They can function by uniport, solute:solute antiport, and/or solute:cation symport, depending on the system and/or conditions. Uniporters exhibit no polarity but can usually catalyze both uniport and antiport depending on whether a substrate is present on the trans side of the membrane. The polarity of solute:solute antiporters is indicated in Table 1 by “both.” Symporters function with inwardly directed polarity in the presence of a membrane potential (negative inside), but many of these proteins have also been shown to catalyze antiport when a substrate is present on the trans side of the membrane. Substrates transported by SP family members include hexoses, pentoses, disaccharides, quinate, inositols, and organic cations. Most but not all members of the SP family thus catalyze sugar transport.
Family 1 permeases exhibit a size range of 404 to 818 residues. The smaller permeases possess very short hydrophilic N and C termini and short loops connecting the 12 TMSs. As is true of many MFS families, the bacterial sugar porters are usually smaller than the eukaryotic proteins. The larger sizes of the eukaryotic proteins are due to large hydrophilic N and/or C termini or, less frequently, to increased sizes of specific inter-TMS loops. The hydrophilic regions of the eukaryotic proteins may play roles in regulation or in cytoskeletal attachment, and they are frequently subject to phosphorylation by ATP-dependent protein kinases. A representative well-characterized example of the SP family is the arabinose:H+ symport permease (AraE) of Escherichia coli (47).
Families 2 and 3 consist of drug efflux systems which possess 14 and 12 TMSs, respectively (74). Since these permeases uniformly catalyze drug:H+ antiport, they are referred to as the DHA14 and DHA12 families, respectively. A total of 30 and 46 sequenced members are currently recognized in these two families. Because these permeases have recently been the subject of an extensive review which presented multiple alignments and phylogenetic trees (60), they will not be described or analyzed here. Members of both families are found in bacteria and eukaryotes, and DHA12 family members have also been identified in archaea.
Families 4, 5, and 6, the organophosphate:inorganic phosphate antiporters (OPA), the oligosaccharide:H+ symporters (OHS), and the metabolite:H+ symporters (MHS), respectively, were recognized to be families within the MFS in 1993 (30, 51). Since these permeases are restricted to bacteria, it is not surprising that they are all relatively small (400 to 500 residues). All three of these families have become substantially larger and more diverse in function since 1993, due to the sequencing and functional identification of new members.
All the remaining families listed in Table 1 (families 7 to 18) were not recognized in 1993 and are therefore new MFS families. Family 7 (the fucose-galactose-glucose:H+ symporters [FGHS]) is a small family with four distantly related members. As with most members of the SP family, these proteins are specific for sugars. They all probably function by proton symport. They are relatively small (404 to 438 residues), as expected since they are derived exclusively from bacteria.
The nitrate-nitrite porter (NNP) family (family 8) has members in bacteria, yeasts, and plants. Not surprisingly, these proteins exhibit a larger size range (395 to 547 residues) than was observed for FGHS family members. These proteins catalyze either nitrate uptake or nitrite efflux. The energy-coupling mechanisms are not well defined.
Family 9, the phosphate:H+ symporter (PHS) family, has sequenced representatives only in yeast and plants. The 11 proteins of the PHS family are fairly uniform in size, but they are substantially larger than most bacterial MFS proteins (518 to 587 residues). The characterized members are uniform in function.
Family 10, the nucleoside:H+ symporter (NHS) family, has only two bacterial members, and they are of the same size (418 residues each). They are both from E. coli and differ in specificity.
Family 11, the oxalate/formate antiporter (OFA) family, is a small but diverse family. Only five members have been sequenced, but these proteins are found in the bacterial, archaeal, and eukaryotic kingdoms. Surprisingly, they are of fairly uniform size (373 to 470 residues). The very small size of one of these proteins (see below) raises the possibility that its sequence is incomplete.
Family 12, the sialate:H+ symporter (SHS) family, like the NHS family, is very small (with only three members), and, again like the NHS family, the members are all derived from gram-negative bacteria. Their sizes are consistent with those generally observed for bacterial MFS proteins (407 to 496 residues). These proteins differ from most MFS proteins in possessing 14 putative TMSs.
Family 13, with 13 members derived exclusively from yeasts and animals, is the monocarboxylate porter (MCP) family. These permeases transport pyruvate, lactate, and/or mevalonate with inwardly-directed polarity. They all presumably function by proton symport. Their reported sizes range from 450 to 808 residues.
Family 14, the anion:cation symporter (ACS) family, is a relatively large family with 40 sequenced members. The proteins are derived from bacteria, yeasts, and animals, and they exhibit an intermediate range of sizes (411 to 596 residues). They accumulate their substrates in symport with either Na+ or H+, depending on the system. They may transport either inorganic anions (e.g., phosphate) or organic anions (e.g., glucarate, hexuronate, tartrate, allantoate, or 4-hydroxyphenylacetate). Of the functionally characterized porters, the inorganic anion porters of the ACS family cotransport Na+ while the organic anion porters cotransport H+.
Family 15, the aromatic acid:H+ symporter (AAHS) family, consists of seven sequenced proteins, all from bacteria. As expected, these porters show fairly uniform sizes (418 to 460 residues), all on the low end of the scale. They transport a variety of aromatic acids as well as cis,cis-muconate, as indicated in Table 1. Interestingly, one member of the family has been implicated in chemotaxis, allowing the bacteria to swim up concentration gradients of its substrates (34). This is the only documented case where an MFS protein apparently serves as a chemoreceptor. One of the AAHS proteins (BenK Aca) transports benzoate (11). Two additional (putative) benzoate:H+ symporters (BenE) have been sequenced. They are both derived from gram-negative bacteria. One is the functionally characterized BenE protein of Acinetobacter calcoaceticus, and the other is a closely related protein from E. coli (55). These two proteins both contain a single region that exhibits limited sequence similarity to family 15 porters, as might be expected on the basis of the specificity of the A. calcoaceticus protein. However, they are very divergent in sequence from the latter proteins and cannot be shown to be homologous to any member of the MFS. They are therefore included in a separate family designated the benzoate:H+ symporter (BenE; TC #2.46) family (72a).
Six members of a novel family, family 16, the unknown major facilitator (UMF) family, have recently been identified (26). Although it has been proposed that these carriers are drug efflux pumps, no member of this family has been functionally characterized, and consequently the designation UMF has tentatively been assigned to this family. All six currently recognized members of the family are from Saccharomyces cerevisiae, and no close homologs are found in other organisms. These proteins exhibit the less common putative 14-TMS topology observed for only two other MFS families. The proteins of the UMF family exhibit almost no size variation (range, 606 to 637 residues).
Family 17, the cyanate permease (CP) family, includes only three proteins, all from bacteria. They are small proteins (393 to 402 residues with 12 TMSs). The substrate of one of these proteins (CynX of E. coli) is believed to be cyanate (NCO−). The other two members, from E. coli and Bacillus subtilis, are strikingly divergent in sequence but not in size, as noted above.
The proton-dependent oligopeptide transporter (POT) family has been described previously (62, 78). We have observed sequence similarities of these proteins to members of the SP and DHA14 families (see below). Although this similarity is insufficient to establish homology, the similarities in sequence, mechanism, and topology between proteins of the POT family and those of several MFS families strongly suggest that the POT family is a distant constituent of the MFS.
Proteins within any one family of the MFS exhibit fairly extensive sequence similarities, as revealed by the portions of the multiple alignments shown in Fig. 3 to 17. Intrafamily comparison scores are always in excess of 15 SD, thus easily establishing that the members of any one family are homologous. However, sequence similarity for any two proteins derived from different MFS families is much less extensive. We therefore conducted interfamily binary comparisons to establish homology for all MFS families (18, 71). Homology for families 1 to 6 has been established previously (51). The results of the present comparisons are presented in Table 2 and Fig. 1.
As noted above, families 1 to 6 and family 16 have already been shown to be constituent families of the MFS (26, 51, 60). The data presented in Table 2 establish that families 7 to 17 (described above) are all constituents of the MFS. In the case of FucP of family 7 (FGHS), 21% identity (8 SD) was observed with Gtr5 of family 1 in a region of 125 residues that exhibits no gaps in the binary alignment. Gtr5 is an established member of family 1 of the MFS (see Table 3). Of the comparison scores recorded in Table 2, this is the only score below 10 SD, and most of the other alignments span all or most of the two proteins compared. Family 7 is therefore the only family included in Table 2 that is not fully established as an MFS constituent. Other considerations provide additional support for the conclusion that the FGHS family is in fact a member of the MFS (see below).
NasA of family 8 exhibits a comparison score of 13 SD with 21% identity to YidT of family 14 for a 150-residue segment exhibiting no gaps, thus linking these two families, and GudT of family 14 exhibits 22% identity and 12 SD to Bmrl of family 3 for the full lengths of the two proteins (eight gaps in the complete binary alignment). Bmrl is an established member of the MFS (60). Thus, families 8 and 14 are members of the MFS as determined by these comparisons. Similarly, the OFA (family 11) member YhjX exhibits 18% identity and 11 SD with five gaps for the full binary alignment with respect to Ykwl of family 13. Family 13 member Motl exhibits 10 SD (17% identity; 5 gaps for the full-length binary alignment) with respect to NanT of family 12, and NanT exhibits 10 SD (25% identity with 11 gaps in the complete binary alignment) with respect to CitA of family 6, a protein shown previously to be a member of the MFS (51). Thus, on the basis of these comparisons and the superfamily principle (18, 71), families 11, 13, and 12 are within the MFS. Using similar logic, the results summarized in Table 2 establish that all 17 families under consideration (with the improbable exception of family 7) are homologous.
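The chain of reasoning behind the superfamily principle (if A is homologous to B and B is homologous to C, then A is homologous to C) can be sketched as a simple propagation over the pairwise scores quoted in the text. The sketch covers only the links spelled out above; the remaining entries of Table 2 are not reproduced, and the names used are hypothetical.

```python
THRESHOLD_SD = 9  # criterion for establishing homology, in SD

# (family, family, comparison score in SD) for the pairs quoted in the text.
LINKS = [
    (7, 1, 8),     # FucP vs. Gtr5
    (8, 14, 13),   # NasA vs. YidT
    (14, 3, 12),   # GudT vs. Bmrl
    (11, 13, 11),  # YhjX vs. Ykwl
    (13, 12, 10),  # Motl vs. NanT
    (12, 6, 10),   # NanT vs. CitA
]

# Families already shown to be constituents of the MFS.
ESTABLISHED = {1, 2, 3, 4, 5, 6, 16}


def extend_superfamily(established, links, threshold=THRESHOLD_SD):
    """Add every family linked, directly or through a chain, to an established one."""
    members = set(established)
    changed = True
    while changed:
        changed = False
        for family_a, family_b, score in links:
            if score < threshold:
                continue  # link too weak to establish homology
            if (family_a in members) != (family_b in members):
                members.update((family_a, family_b))
                changed = True
    return members


if __name__ == "__main__":
    print(sorted(extend_superfamily(ESTABLISHED, LINKS)))
```

With these inputs, families 8, 11, 12, 13, and 14 join the established set through one or more links, whereas family 7, whose only quoted score is 8 SD, does not.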
Short regions of the binary alignments upon which the comparison scores recorded in Table 2 were based are shown in Fig. 1. These alignments exhibit between 17 and 25% identity with greater than 50% similarity in each case. Most of the regions shown are derived from the N-terminal halves of these proteins. These regions are generally the best-conserved portions of the MFS proteins, as pointed out previously for families 1 to 7 (51, 71).
A phylogenetic tree for the MFS, which includes representative proteins from most of the families, is shown in Fig. 2. Several features are worthy of note. First, most of the families branch from points near the center of the tree. Second, the DHA14 and DHA12 families (families 2 and 3, respectively) branch off from each other after the initial divergence from the center of the tree, suggesting that they are more closely related to each other than to other MFS families. This is in agreement with their similar specificities. Third, the UMF family (family 16) does not branch from a point near the branch for the two DHA families (families 2 and 3), and thus there is no phylogenetic evidence for the suggestion that they transport drugs, even though the proteins of the UMF and DHA14 families both have 14 putative TMSs (26). Fourth, the MCP and OFA families (families 13 and 11, respectively) branch from each other at a point that is somewhat distant from the center of the tree. A late branching point, suggestive of late divergence, is consistent with the fact that both families transport carboxylates. Fifth, the MHS and SHS families (families 6 and 12, respectively) branch from each other relatively far from the center of the tree, suggesting that they are close familial relatives, having diverged from each other late in the evolutionary process. Most, and perhaps all, of the members of these two families transport anionic compounds. The PHS and SP families (families 9 and 1, respectively) also stem from the primary branch from which the MHS and SHS families stem. However, these families branch off close to the center of the tree. Consequently, close phylogenetic relationships for these families are not suggested. Finally, the OPA, NHS, and OHS families (families 4, 10, and 5, respectively) are found branching from the same trunk. All of the proteins of the NHS and OHS families, and some of the members of the OPA family, transport glycosides.
The SP family was described many years ago, and its description has been repeatedly updated (7, 7a, 28, 30, 35, 36, 46, 58). The present SP family consists of 133 sequenced members derived from bacteria, archaea, and eukarya. The family includes members that are very diverse in sequence and function. As revealed by the information in Table 3, these proteins function under normal physiological conditions either by uniport or by H+ symport. However, many of these and other MFS permeases can catalyze solute:solute antiport when substrates are present on both sides of the membrane (42). The symporters all function in energized cells with inwardly directed polarity. Substrates of SP family members include galactose, arabinose, xylose, and glucose in bacteria; galactose, quinate, myoinositol, lactose, maltose, and α-glucosides in yeasts and fungi; hexoses in trypanosomes and plants; and sugars as well as organic cations and neurotransmitters in animals. Most but by no means all members of the SP family are therefore specific for sugars.
Figure 3A presents a portion of the multiple-sequence alignment of 20 representative members of the SP family. The proteins included are from bacteria, yeasts, fungi, trypanosomes, plants, and animals. The 47-residue segment shown exhibits no gaps in the multiple alignment, and there are two fully conserved residues. Almost half of the residue positions within this segment exhibit a predominant residue that appears in the consensus sequence. This fact reflects a high degree of conservation. At least 50% of the proteins included in the alignment exhibit the same residue at each of these positions, by definition. While the fully conserved glycine (G) is likely to be of structural importance, the fully conserved glutamate (E) may play a catalytic role. Several of the well-conserved residues (e.g., R’s at alignment positions 2 and 29 and G’s at alignment positions 6 and 10) are conserved in all but one or two of the proteins depicted. The high degree of conservation of these residues clearly suggests that they play important structural or functional roles.
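The way a consensus sequence and the fully conserved positions follow from a multiple alignment can be sketched as below, using the rule that a residue appears in the consensus when at least 50% of the aligned proteins share it at that position. The toy alignment is invented for illustration and is not taken from Fig. 3A; the function name and the use of "." for positions without a predominant residue are assumptions.

```python
from collections import Counter


def consensus(alignment, min_fraction=0.5, gap="-"):
    """Return (consensus string, 1-based positions that are fully conserved)."""
    n_sequences = len(alignment)
    symbols = []
    fully_conserved = []
    for position, column in enumerate(zip(*alignment), start=1):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            symbols.append(".")  # column contains only gaps
            continue
        # On an exact tie, the residue encountered first in the column is used.
        residue, count = counts.most_common(1)[0]
        symbols.append(residue if count / n_sequences >= min_fraction else ".")
        if count == n_sequences:
            fully_conserved.append(position)
    return "".join(symbols), fully_conserved


if __name__ == "__main__":
    # Hypothetical five-column alignment of four sequences.
    toy_alignment = ["GRLAE", "GRIGE", "GKLSE", "GRVTE"]
    print(consensus(toy_alignment))
```

In the toy alignment, the first and last columns are fully conserved, the second and third have a predominant residue, and the fourth has none.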
The phylogenetic tree for the SP family members whose sequences are represented in Fig. 3A is shown in Fig. 3B. On the left-hand side of the tree are all of the sugar porters as well as the quinate:H+ symporter (Qa-Y Ncr). Uniporters and proton symporters are often closely related (e.g., XylE Eco and Glf Zmo). More divergent members of the sugar porter cluster are Tht2A Tbr, Ma6T Sce, and Lac12 Kla of protists and yeasts. It is noteworthy that the bacterial, plant, and animal proteins cluster loosely together but most of the yeast and fungal proteins cluster separately.
Two yeast proteins and one protozoan protein branch from points near the base of the tree. However, the most distant members (right-hand side of the tree) include the synaptic vesicle transporter Sv2 Rno and the organic cation transporter Oct-1 Rno, both from the rat. The great phylogenetic distances observed between these proteins and the sugar porters correlate roughly with the divergent substrate specificity of these transporters relative to other members of the SP family.
The DHA14 drug efflux family has been described recently (60). Thirty members of the family were identified in that study. All functionally characterized members of the DHA14 family have been found to catalyze drug efflux. Of the 30 members, 7 are multidrug resistance pumps from gram-negative and gram-positive bacteria as well as yeasts, 12 are putative drug-specific pumps from gram-positive bacteria, and 11 are hypothetical or uncharacterized proteins from gram-negative bacteria, yeasts, and fungi. A multiple alignment and a dendrogram for the family were presented in that study (60). The multidrug resistance pumps, drug-specific permeases, and uncharacterized proteins did not group together on the DHA14 family dendrogram but instead proved to be scattered in an apparently random fashion relative to each other. This fact suggests that drug-specific and multidrug efflux pumps arose repeatedly by narrowing and broadening of their specificities and that the functionally uncharacterized members of the family are also probably involved in drug efflux (74). Because of the extensive treatment of this family by Paulsen et al. (60), these proteins will not be considered further here.
The DHA12 drug efflux family, also described by Paulsen et al. (60), consists of 46 proteins. Of these, 9 have been shown to be multidrug resistance pumps, 15 are probably drug-specific efflux pumps, and 22 are hypothetical or uncharacterized proteins. Like the DHA14 family, functionally characterized members of the DHA12 family exhibit specificities only for drugs, although the range of drugs transported is remarkable (60). Interestingly, the range of organisms in which DHA12 family members are found is wider than that for the DHA14 family. Thus, the DHA12 MDR pumps are found in animals as well as in yeasts and a variety of gram-negative and gram-positive bacteria. The proven and putative drug-specific efflux pumps are also found in a wide range of gram-negative and gram-positive bacteria, yeasts, and animals. Uncharacterized members of this family include an even wider range of organisms, including humans and archaea (reference 60 and unpublished results).
The dendrogram for the DHA12 family (60) resembles that for the DHA14 family in that multidrug resistance and drug-specific pumps are interspersed. Thus, in both families, phylogeny does not appear to provide an indication of drug specificity.
Table 4 lists the members of the OPA family of the MFS. The seven functionally characterized members are derived from gram-negative and gram-positive bacteria and function in the transport of sugar phosphates, glycerol phosphate, or phosphoglycerates and phosphoenolpyruvate. The UhpC proteins are believed to function in the regulation of hexose phosphate transporter synthesis. UhpC presumably serves as a receptor for glucose-6-phosphate, the inducer, in controlling transcription of the uhpT operon (38). It is a rare example of an MFS member which does not serve a primary transport function. However, it is not known whether or not it has the capacity to transport its ligand, glucose-6-phosphate.
The predominant mechanism of transport catalyzed by permeases of the OPA family under normal physiological conditions appears to be antiport of an organophosphate ester for inorganic phosphate (49, 50). These permeases may also be capable of catalyzing substrate:H+ symport (19). The best-characterized members of the family are UhpT and GlpT, both of E. coli, for which detailed topological models have been presented (29, 90, 91). The OPA family includes several proteins from the worm Caenorhabditis elegans. Thus, this family includes members derived from eukaryotes as well as prokaryotes.
A well-conserved segment of the complete multiple alignment for the OPA family is presented in Fig. 4A. The 52-residue multiple-sequence alignment reveals only three single-residue gaps and shows four fully conserved residues (Q, R, W, and G). Of the 52 alignment positions, 19 are conserved in a majority of the proteins and hence appear in the consensus sequence. Structural (G, P), hydrophobic (F, V), semipolar (W), and strongly polar (N, Q, E, R and H) residues occur in the consensus sequence, with the last group being overrepresented.
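The two counts quoted for such alignment segments (fully conserved positions, and positions conserved in a majority of the proteins) can be illustrated with a minimal Python sketch. The function name `column_summary`, the use of "." for non-consensus positions, and the toy alignment are assumptions made for illustration only; they are not the OPA family data, and the program actually used to produce the figures is not stated here.

```python
from collections import Counter


def column_summary(alignment, gap="-"):
    """Return (number of fully conserved columns, majority consensus string).

    A column is fully conserved when every sequence carries the same residue.
    A residue enters the consensus when it occurs in more than half of the
    sequences; other columns are written as "." (an illustrative convention).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(alignment)
    fully_conserved = 0
    consensus = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            consensus.append(".")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        if count == n:
            fully_conserved += 1
        consensus.append(residue if count > n / 2 else ".")
    return fully_conserved, "".join(consensus)


# Invented five-column alignment of four sequences (not real data).
toy = ["GQRWA", "GQKWS", "GNRW-", "AQRWT"]
print(column_summary(toy))
```

In this toy case only the W column is fully conserved, while G, Q, and R each occur in three of the four sequences and therefore appear in the consensus.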
The phylogenetic tree for the OPA family shows all bacterial proteins clustering together, as do the uncharacterized C. elegans proteins. Surprisingly, the UhpT transport proteins are as distant from the UhpC receptor proteins as these proteins are from the phosphoglycerate transporter (PgtP) or the glycerol phosphate transporters (GlpT). It is therefore of interest that UhpC is apparently specific for glucose-6-phosphate and 2-deoxyglucose-6-phosphate whereas UhpT recognizes a wide spectrum of sugar phosphates (3, 38). The large separation observed for the bacterial and animal proteins suggests that a primordial gene encoding one of the latter proteins was transferred to eukaryotes by vertical transmission from their prokaryotic progenitors and that gene duplication events in the developing eukaryote gave rise to the three paralogs found in C. elegans. Similarly, the configuration of the tree suggests that the gene duplication and divergence events that gave rise to the functionally dissimilar members of the bacterium-specific subfamilies occurred after the divergence of eukaryotes from prokaryotes.
The current OHS family consists of six proteins, three of which are β-galactoside permeases from closely related bacteria (Table 5). The lactose permease of E. coli is not only the best-characterized member of this family but also probably the most extensively studied permease in the MFS (41, 82). An experimentally verified 12-TMS topological model for LacY has been published (10), and extensive data provide evidence for the nature of the substrate binding sites within the transmembrane region of the permease (8, 12, 23, 27, 43, 57). A detailed mechanistic model that incorporates information obtained using many different experimental approaches has recently been proposed (40).
The other members of the OHS are specific for (i) the trisaccharide, raffinose; (ii) the α,β-nonreducing glucoside-fructoside, sucrose; and (iii) the α-galactoside, melibiose. All of these putative proton symporters are from gram-negative bacteria. The sequence of the melibiose permease of Enterobacter cloacae was deposited in the database after the completion of our phylogenetic analysis of the OHS family and is therefore not represented in Fig. 5. However, this protein proved to resemble RafB Eco (75% identity) more closely than it resembles one of the lactose permeases (50% identity) or the sucrose permease (<40% identity).
Figure 5A presents an alignment of a well-conserved 33-residue portion of the OHS family proteins. Over one-third of the residues shown in this gap-free alignment are fully conserved, and a large majority of the residues appear in the consensus sequence.
The phylogenetic tree (Fig. 5B) reveals that the raffinose permease clusters tightly with the lactose permeases but that the sucrose permease is much more distant. Raffinose is a trisaccharide which incorporates the structural elements of sucrose and melibiose in a single molecule. The degree to which these different permeases overlap in specificity is not known, but the broad specificity of the lactose permease of E. coli is noteworthy (56, 82).
The MHS family includes 16 currently sequenced members of widely differing specificities (Table 6). Those of known transport function recognize (i) citrate, (ii) α-ketoglutarate, (iii) proline and betaine, (iv) 4-methyl-O-phthalate, and (v) dicarboxylates. The α-ketoglutarate:H+ symport permease of E. coli (KgtP) is probably the best-characterized member of this family (75). An experimentally documented 12-TMS topological model has been proposed for this permease (76).
Metabolites transported by members of the MHS family have little in common, except that they all possess at least one carboxyl group. Several protein members of the MHS family are specific for Krebs cycle intermediates. All are from bacteria, and all characterized members of the MHS family function by proton symport.
A 51-residue segment of the multiple-sequence alignment of the MHS family, including all functionally characterized members, is shown in Fig. 6A. There are no gaps in the aligned sequences, and 11 residues are fully conserved. The majority of the fully conserved residues are probably of structural significance. These residues include four G’s, two P’s and an A. Two fully conserved residues (M and I) are hydrophobic, and two (D and R) are hydrophilic charged residues. Over half of the positions appear in the consensus sequence, illustrating the high degree of conservation observed for the members of this family.
The phylogenetic tree for the MHS family reveals clustering according to substrate specificity. Thus, the α-ketoglutarate permease of E. coli and the dicarboxylate permease of Pseudomonas putida cluster together, the osmoprotectant (proline/betaine) permeases of E. coli and Erwinia chrysanthemi cluster tightly together and are undoubtedly orthologs, and all three citrate permeases cluster tightly together. These three proteins are also undoubtedly orthologs. The MopB protein of Burkholderia cepacia, specific for 4-methyl-O-phthalate, is on a branch by itself. Orf3 Shy clusters loosely with dicarboxylate permeases and therefore may exhibit specificity for such a compound. The phylogenetic tree does not provide clues to the functions of the remaining two unidentified proteins. However, these two proteins (YhjE Eco and HI0418 Hin) appear to be located at a phylogenetic distance from each other consistent with their being orthologs. Several MHS homologs of unknown function listed in Table 6 are not represented in Fig. 6 because the sequences were deposited in the databases after completion of the phylogenetic studies reported.
The first sequenced member of the FGHS family to be characterized was the FucP fucose permease of E. coli (31), and a 12-TMS topological model for this permease has been presented (32). Subsequently, a galactose/glucose permease of Brucella abortus (20) and a glucose/mannose permease of Bacillus subtilis (61) were characterized and shown to be members of the FGHS family (Table 7). The Bacillus protein, like the E. coli FucP protein, is believed to be a sugar:proton symporter (61). Only four proteins are currently in the FGHS family.
In spite of the small size of the FGHS family, its members, all of which are derived from bacteria, exhibit a surprising degree of sequence divergence. Figure 7A shows the best-conserved portion of the complete multiple-sequence alignment of these sequences. Only 4 positions in the gap-free 32-position alignment shown are fully conserved, and a minority of the residue positions appear in the consensus sequence. The phylogenetic tree reveals that the Haemophilus influenzae protein clusters closely with FucP of E. coli, that the galactose/glucose permease of B. abortus is more distant, and that the hexose permease from B. subtilis is most divergent. These relative distances correlate with substrate specificity to the extent known, since fucose is of the galacto configuration.
Thirteen proteins make up the current NNP family (Table 8). These proteins are derived from a variety of gram-negative and gram-positive bacteria as well as various eukaryotes including yeasts (Ynt1 Hpo), fungi (CrnA Eni), algae (Nar3 Cre), and higher plants (Bch1 Hvu and Bch2 Hvu). Irrespective of the organism, the nitrate permeases of the NNP family take up their substrate while the nitrite permeases apparently extrude theirs. Well-characterized members of the family are the NarK nitrite extrusion system involved in anaerobic nitrate-dependent respiration in E. coli (68) and the CrnA nitrate uptake permease of Aspergillus nidulans (81).
A portion of the NNP family protein multiple alignment is shown in Fig. 8A. Although this region is the best-conserved region in the complete multiple-sequence alignment, only three residues, all G’s, are fully conserved. There are few gaps, and several residues are largely conserved (e.g., three additional G’s, an F, and an N are conserved in all but one, two, or three of the proteins, respectively).
The phylogenetic tree for the NNP family (Fig. 8B) reveals that all of the eukaryotic proteins cluster together (on the left) as do the prokaryotic proteins (on the right). Further, within the eukaryotic cluster, the fungal proteins comprise one cluster while the plant proteins comprise another. Within the prokaryotic cluster, the two (putative) nitrite extrusion systems of E. coli (NarK and NarU) cluster tightly together. These paralogs probably arose by a recent gene duplication event. All other prokaryotic full-length proteins represented are from gram-positive bacteria. The NasA and NarK proteins of B. subtilis cluster loosely together, even though they are believed to catalyze nitrate uptake and nitrite efflux, respectively. The shorter phylogenetic distance of the M. tuberculosis protein (CY04C12) from NarK Bsu suggests that these two proteins may have the same or similar functions. Further, the other M. tuberculosis protein, CY3G12, resembles the two E. coli permeases, NarK and NarU (Fig. 8B). By contrast, the MG294 protein of M. genitalium is distant from all other members of the NNP family. Proteins from the mycoplasmas often exhibit greater distances from other gram-positive bacterial homologs than do other homologs from the latter bacteria (unpublished observations). The tree thus does not provide a clue to the function of this protein.
The current PHS family is unusual in that it includes members from yeasts, fungi and plants but none from bacteria, animals, and other eukaryotes (Table 9). As a family within the MFS, it is presumed to be of ancient origin, and therefore one would expect members of the family to be found in bacteria (see Conclusions and Perspectives, below). Two well-characterized members of the PHS family are the Pho84 inorganic phosphate transporter of S. cerevisiae (9) and the GvPT phosphate transporter of Glomus versiforme (33).
The 11 members of the PHS family are fairly uniform in size (518 to 587 residues) and exhibit a striking degree of sequence similarity (Fig. 9A). Only eight proteins are included in Fig. 9 because of the near identity of the remaining three sequences to at least one protein that was included. Thus, in the 39-residue segment of the multiple-sequence alignment presented for the eight divergent proteins, 11 positions exhibit full conservation. Four of these residues are glycines, one is a proline, and one is alanine, all of which probably play structural roles. The remaining fully conserved residues (N, E, R, S, and K) are polar and therefore may function in substrate or proton binding or in catalysis of transport.
The phylogenetic tree for eight members of the PHS family is shown in Fig. 9B. Three of the four plant proteins cluster tightly together, and the short branch lengths separating them indicate that these proteins differ only slightly in sequence. This fact is also revealed by the partial multiple-sequence alignment shown in Fig. 9A. The yeast and fungal proteins cluster loosely together with the fourth plant protein. The occurrence of distant homologs in both the plant and fungal kingdoms suggests that both kingdoms possess isoforms that diverged from each other before plants diverged from fungi.
The NHS family currently has only two sequenced members, and both proteins are from E. coli (Table 10). One is a general nucleoside:proton symporter, NupG, and the other is a xanthosine permease. NupG has been examined structurally and is the better characterized of the two proteins (85).
As shown in Fig. 10, these two proteins are very similar, exhibiting close to 50% identity in the region shown. They evidently arose by a gene duplication event that occurred relatively recently in evolutionary time.
Five sequenced proteins comprise the OFA family, but only one of these proteins has been functionally characterized (Table 11). This protein, the oxalate:formate antiporter from Oxalobacter formigenes, provides the basis for naming the OFA family (1, 4). The protein has been purified, reconstituted in an artificial membrane system, and studied structurally (24, 69). As is apparent from Table 11, members of the OFA family are widely distributed in nature, being present in the bacterial, archaeal, and eukaryotic kingdoms.
One would expect that a family that derives its members from all three kingdoms of life would exhibit little sequence similarity. However, as shown in Fig. 11A, 6 residues in the 40-residue segment shown are fully conserved and a majority of residues are present in the consensus sequence. The phylogenetic tree reveals that all five members of the OFA family are nearly equally distantly related to each other. This may be due in part to the fact that most of the organisms from which these proteins are derived are distantly related. However, one cannot draw conclusions about whether these proteins are orthologs of the same function. The answer to this dilemma will require direct experimentation.
Only three currently sequenced proteins comprise the SHS family, and only one of these, the NanT sialic acid permease of E. coli, is functionally characterized (Table 12) (52). E. coli possesses two SHS family paralogs, and Haemophilus influenzae possesses one homolog. Sequence comparisons (Fig. 12) and phylogenetic analyses (data not shown) reveal that the two E. coli paralogs are more closely related to each other than either of these proteins is related to the protein from Haemophilus influenzae, even though H. influenzae is closely related to E. coli. The function of this last protein can therefore not be surmised.
Thirteen proteins comprise the MCP family, and all are from eukaryotes (Table 13). Most of these proteins are derived from various animal sources including three from C. elegans. However, S. cerevisiae possesses four paralogs. Only mammalian members of the MCP family have been functionally characterized. These permeases appear to be energized by proton symport (80). Monocarboxylates transported by these permeases include lactate, pyruvate, and mevalonate (25). Topological studies leading to a 12-TMS model have been reported (65).
The portion of the complete multiple-sequence alignment shown in Fig. 13A reveals that within the 53-residue segment presented, only 4 residues are fully conserved, and all are probably of structural significance (two G’s, one F, and one A). In fact, except for the KRR motif and two serines, all of the residues that appear in the consensus sequence are structural or hydrophobic.
The phylogenetic tree for the MCP family is shown in Fig. 13B. All of the functionally characterized mammalian monocarboxylate transporters cluster tightly together, and even the outlying proteins from higher animals (RemP Gga and XpcT Hsa) are loosely associated with this cluster. The three C. elegans paralogs comprise a second diverse cluster. The four S. cerevisiae paralogs branch distantly from the animal proteins, and the yeast paralogs comprise two distinct clusters. Because of the extensive sequence divergence of the functionally uncharacterized proteins, we anticipate that they will prove to exhibit different transport functions.
The ACS family is a large family with 40 currently sequenced members (Table 14). One of the members of this family was previously recognized to be a member of the MFS (67). This protein is the rabbit inorganic phosphate:Na+ cotransporter (84). Several mammalian proteins of this specificity have now been characterized. All of the recognized substrates of the ACS family permeases are either organic or inorganic anions. Among the organic anions transported are glucarate, hexuronates, phthalate, allantoate, and probably tartrate (13, 66).
Proteins of the ACS family are widely distributed in nature. They are found in both gram-negative and gram-positive bacteria and in both the animal and fungal eukaryotic kingdoms. Oddly, no plant member has been sequenced, and none of the sequenced ACS proteins is from an archaeon.
Several organisms possess multiple ACS family paralogs. Thus, B. subtilis and Rattus norvegicus each have at least 2, E. coli has 5, S. cerevisiae has 7, and C. elegans has at least 15. Considering that the C. elegans genome was only about half sequenced when these analyses were conducted, one can anticipate that this one organism will prove to have nearly 30 paralogs within this one family of the MFS!
A portion of the complete multiple-sequence alignment for the proteins of the ACS family is shown in Fig. 14A. No residue is fully conserved at any one position. However, the aligned sequences are essentially gap free except for an incompletely sequenced region of one protein from C. elegans and another C. elegans protein which exhibits a single-residue insertion not found in the other proteins. Particularly worthy of note are the following residues found in the consensus sequence. The G is conserved in all but one protein; the W is conserved in all of the top 31 proteins, and the ER motif is conserved in many of the proteins. It is clear that the proteins have been correctly aligned in spite of very significant sequence divergence.
The phylogenetic tree for the ACS family is shown in Fig. 14B. Distinct clustering of the many proteins represented is apparent. Most striking is the fact that all bacterial proteins comprise one cluster, all the yeast proteins comprise a second, and the animal proteins comprise two additional diverse clusters. Proteins of known and similar specificities cluster tightly together (e.g., the mammalian inorganic phosphate:Na+ cotransporters or the glucarate transporters of gram-negative and gram-positive bacteria). Proteins specific for different but structurally related substrates (e.g., phthalate and 4-hydroxyphenylacetate, or glucarate and hexuronate) are found within the same cluster but distantly, while those of very dissimilar substrate specificities do not cluster at all. This tendency of permeases of similar specificities to “flock” together has been noted before (71, 72) and provides a basis for assigning tentative functions to several of the uncharacterized members of the ACS family.
Table 15 presents the seven members of the AAHS family. These proteins are derived exclusively from gram-negative bacteria, and they all transport aromatic acids. They exhibit a relatively narrow range of sizes (418 to 460 residues).
The part of the complete multiple-sequence alignment reproduced in Fig. 15A for the seven members of the AAHS family reveals full conservation in 6 of the 49 positions shown, while 26 positions appear in the consensus sequence. Four of the fully conserved residues are glycines, of likely structural significance. The remaining two fully conserved residues are S and D. The fully conserved D and the adjacent R/K residues are part of a motif seen in part in the consensus sequence as follows: LADRFGRKR. This region appears to comprise a hydrophilic loop separating two very hydrophobic TMSs (TMS2 and TMS3 [see Table 19]).
The phylogenetic tree of the seven proteins of the AAHS family (Fig. 15B) reveals that the two PcaK proteins of the same specificity cluster together, but the other proteins do not cluster. Based on this tree, we doubt that the E. coli protein is a 3-hydroxyphenylpropionate transporter like HppK.
Two additional proteins, the BenE benzoate:H+ symporter of Acinetobacter calcoaceticus and an E. coli protein, exhibit 36% identity and 61% similarity to each other. These two proteins show limited sequence similarity to members of the AAHS family, but the degree of similarity is insufficient to establish homology. The BenE family is therefore considered a distinct family (TC 2.46) (72b).
While screening the complete S. cerevisiae genome, Goffeau et al. (26) recently identified a new family within the MFS. The S. cerevisiae genome was found to encode a total of 28 proteins that either clustered with the drug resistance families 2 and 3 of the MFS or were members of a new cluster (cluster III in reference 26). Six S. cerevisiae proteins comprise the new family, and because none of the encoded proteins has been functionally characterized, we have designated this cluster the unknown major facilitator (UMF) family (Table 16). Like family 2 drug resistance permeases (DHA14; Table 1), the members of this new family possess 14 putative TMSs. These novel yeast proteins are fairly uniform in size (606 to 637 residues). Surprisingly, no sequenced close homologs are found in any other organism.
A partial multiple-sequence alignment and the phylogenetic tree for the UMF family are shown in Fig. 16A and B, respectively. Figure 16A reveals the high degrees of sequence similarity observed for these six proteins. Two of the UMF paralogs (Ykr106w and Ycl070c) are extremely close in sequence and must have arisen from a very recent gene duplication event. Two other UMF paralogs (Yhl047c and Yhl040c) cluster loosely together and presumably resulted from an earlier gene duplication event. The remaining two paralogs (Yel065w and Yol158c) are the most divergent members of the family.
Three currently sequenced proteins are found within the cyanate permease family, two from E. coli and one from B. subtilis (Table 17). One of these proteins, CynX, is encoded within the cyanate degradative operon cynTSX of E. coli (79). CynX is 393 amino acyl residues long and possesses 11 or 12 putative TMSs (Fig. 17). The other members of the family are an E. coli protein of 393 amino acids, demonstrably homologous to the tartrate permease of Agrobacterium vitis (Table 14), and a B. subtilis protein of 402 amino acids (Table 17).
The POT family has been described by two different laboratories. In 1994, Paulsen and Skurray (62) recognized eight sequenced or partially sequenced members of this family and named it the proton- (or proton motive force [pmf])-dependent oligopeptide transporter (POT) family. Subsequently, Steiner et al. (78) renamed the family the peptide transport (PTR) family. Neither group noticed similarities of the members of this permease family to members of the MFS.
The current POT family includes 34 proteins derived from bacteria, yeast, fungi, plants and animals (74a). In addition to transporting peptides of two to four residues, some have been shown to transport individual amino acids. One plant protein transports nitrate and chlorate, and another transports nitrite.
The POT family (TC 2.17) was found to exhibit sequence similarity to selected members of MFS families 1 (SP) and 2 (DHA14) (maximal optimized comparison score of 8 SD for an aligned segment of 70 amino acyl residues). This value is sufficient to strongly suggest that the POT family is a member of the MFS. However, the shortness of the segments compared and the fact that only a few currently sequenced MFS proteins exhibit significant similarity to POT family members lead us to retain the POT family as a distinct family, at least at present. A signature sequence for the POT family is provided in Table 18.
Using the portions of the multiple-sequence alignments shown in Fig. 3 to 17, we attempted to derive valid signature sequences for all of the 17 constituent families of the MFS. We were successful except in the case of the very diverse family 14 (ACS) and the large family 1 (SP), for which a complete multiple-sequence alignment that includes all 133 members is not available. A potential signature sequence for family 1 (SP) which encompasses the proteins included in the alignment shown in Fig. 3A was, however, derived. This and other MFS family-specific signature sequences are presented in Table 18. They should be useful in identifying and characterizing new members of these MFS families as they become sequenced.
In 1990, Henderson and Maiden noted a sequence motif common to several of the MFS permeases then available for study (36). This five-residue motif (RXGRR) occurred between TMS2 and TMS3. The motif was suggested to form a β-turn linking the adjacent transmembrane helices, and the cationic residues were suggested to interact with negative charges in membrane lipid head groups. Subsequent studies revealed that this motif could be identified even in MFS members that exhibit extensive sequence divergence (30). A similar but less well conserved motif was found in the second half of the protein, between TMS8 and TMS9.
Jessen-Marshall et al. (39) examined the functional significance of an expanded form of this motif [GX3(D/E)(R/K)XG[X](R/K)(R/K)], an 11-residue motif, by using the lactose permease of E. coli as a representative model system. Comparable studies on the same motif in the Tn10-encoded Tet carrier of E. coli have also been reported (87–89). The work on this metal-tetracycline:H+ antiporter of E. coli suggested that the negative charge at position 5, at least one of the basic residues, and both glycines at positions 1 and 8 are important for function. Jessen-Marshall et al. determined the phenotypes of 28 mutants with site-specific mutations in LacY in which this motif was altered. Their studies led to the conclusions that small side chain volume at specific positions and high β-turn propensity are probably of structural importance within this motif. Additionally, the acidic residue (D) at position 5 in the above-cited motif proved to be important for transport activity, in agreement with the results of Yamaguchi et al. (87–89). Although replacement of the glycine at position 8, or of any one of the basic residues within the motif, frequently had little or no pronounced effect on transport, any two such replacements resulted in reduced transport rates. Taken together, the results suggested that the motif is important for both structural and functional aspects of the lactose permease. It was suggested that the motif may be important in promoting global conformational changes of the permease that accompany transport.
Since this motif appears to be one of the most strongly conserved features of previously characterized MFS proteins, we were interested to determine if it could be identified between TMS2 and TMS3 in all or most of the MFS families characterized here. The complete multiple-sequence alignments of the proteins which comprise the 17 MFS families were therefore examined to attempt an identification of comparable motifs between TMS2 and TMS3. In these analyses, the single residue which was present in the largest number of the proteins which comprise the family was recorded, even when that residue was not present in a majority of the proteins. When no residue predominated, an X was recorded at that position. The alignment used for the SP family (family 1) included the representative members examined in Fig. 3, but for all the other families, all (or almost all) of the protein members were studied. The results are recorded in Table 19. Thirteen positions were included in the analysis, and in all cases the residues recorded represent those that occur sequentially (without gaps or insertions). In one case, the NHS family (family 10), just one of the two members of the family was used to derive the motif since gaps occurred in this region of the other family member.
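The recording rule described above (the most frequent residue at each position, or X when none predominates) can be written out as a short Python sketch. The name `plurality_motif` and the toy segments are hypothetical, and treating "no residue predominated" as a tie for the highest count is an assumption; the text does not define the criterion more precisely.

```python
from collections import Counter


def plurality_motif(segments):
    """Record the most frequent residue in each column of gap-free segments.

    A residue need not be present in a majority of the sequences. When two
    or more residues tie for the highest count (an assumed reading of
    "no residue predominated"), "X" is recorded instead.
    """
    if len({len(seg) for seg in segments}) != 1:
        raise ValueError("segments must all have the same length")
    motif = []
    for column in zip(*segments):
        ranked = Counter(column).most_common()
        top_residue, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        motif.append("X" if tied else top_residue)
    return "".join(motif)


# Invented five-residue segments from four hypothetical family members.
toy_family = ["GRLAD", "GKLSD", "GPIGN", "ARLAE"]
print(plurality_motif(toy_family))
```

Under this rule a family represented by a single sequence simply returns that sequence, which corresponds to the treatment of the NHS family described above.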
The consensus motif for this 13-residue sequence is G-[RKPATY]-L-[GAS]-[DN]-[RK]-[FY]-G-R-[RK]-[RKP]-[LIVGST]-[LIM]. Within this motif, four positions are specified by a single amino acid (G-1, L-3, G-8, and R-9), four positions are specified by a pair of closely related residues (DN-5, RK-6, FY-7, and RK-10), three positions are specified by three possible residues (GAS-4, RKP-11, and LIM-13), and two positions are more degenerate (RKPATY-2 and LIVGST-12). At the bottom of Table 19, the number of families that conform to the consensus signature sequence at each of the 13 positions is indicated. The G at position 8 is in 15 of the 17 families, DN at position 5 and LIM at position 13 are found in 14 of the families, RK at positions 6 and 10 are found in 13 of the families, and R at position 9 is found in 9 of the families. The seven other positions coincidentally show 11 families that conform to the expected residue or set of residues. For single-residue conservation, the order is as follows: G-8 (15) > D-5 (12) > G-1 (11) and L-3 (11) > R-9 (9) and L-13 (9) > R-6 (7), F-7 (7) and R-10 (7), where the numbers in parentheses indicate the number of families. Thus, the degree of conservation is striking.
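Because the 13-residue consensus is written as a series of residue sets, it translates directly into a regular expression. The Python sketch below is offered only as an illustration: `find_motif` is a hypothetical helper and the test sequence is invented. A strict pattern of this kind requires every one of the 13 positions to conform at once, which is more demanding than the position-by-position family counts given above, so real MFS sequences would often need a more tolerant search.

```python
import re

# G-[RKPATY]-L-[GAS]-[DN]-[RK]-[FY]-G-R-[RK]-[RKP]-[LIVGST]-[LIM]
MFS_MOTIF = re.compile(r"G[RKPATY]L[GAS][DN][RK][FY]GR[RK][RKP][LIVGST][LIM]")


def find_motif(sequence):
    """Return (1-based start position, matched segment) for each strict match."""
    return [(m.start() + 1, m.group()) for m in MFS_MOTIF.finditer(sequence)]


# Invented sequence for illustration only (not a real protein).
toy_sequence = "MSTAVLGRLADRFGRKRLLMAAV"
print(find_motif(toy_sequence))
```

The invented segment embeds LADRFGRKR, the partial motif noted for the AAHS family consensus, at motif positions 3 to 11.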
Henderson and coworkers have noted the presence of shared sequence motifs in the first and second halves of MFS sugar porters (family 1) (30, 36). Rubin et al. (70) presented convincing statistical evidence that tetracycline efflux proteins of the MFS (family 3) exhibit sufficient sequence similarity in their first and second halves to establish homology on statistical grounds. Using the MEME and MAST programs (5), we have found further evidence that proteins within several of the MFS families consist of duplicated six-TMS units. The evidence will be presented here for one of these families, the PHS family (family 9).
The output from the MEME and MAST programs for the two halves of eight members of the PHS family is presented in Table 20. The protein abbreviations recorded in Table 9, as well as the residues that comprise the two halves of these proteins, are given. The programs identified six motifs (motifs A, A′, B, C, C′, and D). Motifs A, B, C, and D were found in the first halves of these proteins, while motifs A′, B, C′, and D were identified in the second halves of these proteins. The spacing between motifs A and B was the same as that between motifs A′ and B, that between motifs B and C was the same as that between motifs B and C′, and that between motifs C and D was the same as that between motifs C′ and D for the first and second halves of these proteins, respectively (Table 20). Motifs A and A′ as well as motifs C and C′ were found to exhibit minimal sequence similarity, even though they occurred in corresponding parts of the two halves of these proteins. However, motifs B and D proved to be essentially the same in both halves of these proteins (Table 20). Consensus motif B is DKIGRKKIQ, while consensus motif D is IPETKRYTLEE. The occurrence of these motifs has been well documented in previous publications concerned with sugar porters (family 1) (30). Table 20 presents the alignments of these two motifs in the eight proteins examined for both the first and second halves of these proteins. The degree of sequence conservation is striking.
When regions of the two halves of these proteins were compared with the GAP program (16), comparison scores of 8 to 10 SD were often observed. For example, PT1 Stu of family 9 (PHS; Table 9) exhibited a comparison score of 10 SD with 26.4% identity when residues 68 to 121 were compared with residues 344 to 397. Sometimes heterologous protein comparisons also gave high comparison scores. Thus, when residues 29 to 94 of LacY Eco of family 5 (OHS; Table 5) were compared with residues 281 to 346 of ProP Eco of family 6 (MHS; Table 6), a comparison score of 13 SD (34.8% identity) was obtained. These values clearly demonstrate a significant degree of sequence similarity that is strongly suggestive of homology. In agreement with Rubin et al. (70), we therefore conclude that proteins of the MFS arose by an internal gene duplication event which probably occurred prior to divergence of the MFS families.
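The idea behind a comparison score expressed in standard deviations can be illustrated with a minimal sketch. It assumes that the score of the real segment pair is compared with the scores obtained after randomly shuffling one segment, which preserves composition but destroys residue order. The scoring used here (ungapped identity counting) is a deliberate simplification and is not the gapped, matrix-based scoring of the GAP program; the function names, the number of shuffles, and the test segment are all hypothetical.

```python
import random
import statistics


def identity_score(a: str, b: str) -> int:
    """Count identical residues at matching positions (ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(x == y for x, y in zip(a, b))


def comparison_score_sd(a: str, b: str, shuffles: int = 500, seed: int = 0) -> float:
    """Return (real score - mean shuffled score) / SD of the shuffled scores.

    Assumption: segment b is shuffled `shuffles` times to build the
    background distribution; a fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    real = identity_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        shuffled_scores.append(identity_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - mean) / sd


if __name__ == "__main__":
    # Hypothetical segment built from the two consensus motifs quoted in the text.
    segment = "DKIGRKKIQAIPETKRYTLEE"
    print(round(comparison_score_sd(segment, segment), 1))
```

A score of many SD means that the observed similarity is far above what the amino acid composition alone would produce, which is the sense in which values of 8 to 13 SD are taken here as evidence for homology.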
We have recently formed a “transport commission” (TC) with the primary goal of classifying all transmembrane transport systems. These systems are classified according to four criteria as follows: W, transport mode and energy-coupling mechanism; X, family or superfamily; Y, subfamily or family, respectively; Z, substrate specificity. Every transporter family or superfamily thus has a two-digit TC number (W.X). The TC number for the MFS is 2.1 (Table 21). Each of the 17 families within the MFS has been assigned a three-digit TC number (W.X.Y). Thus, the SP family has the TC number 2.1.1; the 14-TMS drug:H+ antiporter (DHA14) family has the TC number 2.1.2, and the CP family has the TC number 2.1.17 (Table 21). Because we were not able to satisfactorily establish homology of POT family permeases to established MFS proteins, the POT family is not included within the MFS and has its own TC number (TC 2.17).
Each permease of dissimilar function within each MFS family is given its own four-digit TC number (W.X.Y.Z), but more than one ortholog of the same function in different organisms, more than one paralog of the same function in a single organism, and functionally uncharacterized permeases are not included if they are within the same MFS family. Thus, all orthologous galactose:H+ symporters in the SP family have the TC number 2.1.1.1, while all arabinose:H+ symporters in the SP family have the TC number 2.1.1.2 (Table 21). The information provided in the TC tables includes the TC number for a particular permease, the name of that permease, its biological sources, and an example of such a permease. The accession number of the permease chosen to exemplify a particular entry is provided, allowing easy access to its sequence. A BLAST search (2) should allow the identification of other orthologs. A more complete description of the TC classification system, its rationale, and its benefits is provided elsewhere (73).
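The hierarchical W.X.Y.Z scheme lends itself to a simple data structure. The following is a minimal sketch, not part of the TC system itself: the class and function names are hypothetical, and only the TC numbers quoted in the text for the MFS (2.1), the SP family (2.1.1), the CP family (2.1.17), and the POT family (2.17) are used as examples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TCNumber:
    mode: int                        # W: transport mode and energy coupling
    family: int                      # X: family or superfamily
    subfamily: Optional[int] = None  # Y: subfamily or family
    substrate: Optional[int] = None  # Z: substrate specificity

    def components(self) -> Tuple[int, ...]:
        """Return only the levels that are actually specified."""
        levels = (self.mode, self.family, self.subfamily, self.substrate)
        return tuple(c for c in levels if c is not None)


def parse_tc(text: str) -> TCNumber:
    """Parse a dotted TC number with two to four numeric components."""
    parts = text.strip().split(".")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"expected 2 to 4 components, got {text!r}")
    return TCNumber(*(int(p) for p in parts))


def belongs_to(member: TCNumber, group: TCNumber) -> bool:
    """True if `member` lies within `group`, compared component by component."""
    g = group.components()
    return member.components()[: len(g)] == g


if __name__ == "__main__":
    mfs = parse_tc("2.1")
    print(belongs_to(parse_tc("2.1.17"), mfs))  # True: CP family is within the MFS
    print(belongs_to(parse_tc("2.17"), mfs))    # False: POT family is separate
```

Comparing components rather than strings matters here: a naive string-prefix test would wrongly place TC 2.17 (POT) inside TC 2.1 (MFS).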
In this computational analysis, we have attempted to identify all recognizable sequenced members of the MFS that have been deposited in the current databases. We have used sequence similarity as a basis for inclusion of proteins in the superfamily, with a cutoff point of 8 to 9 SD for the comparison score of a test sequence with that of an established member of the family (18, 71). Two families, the FGHS family (family 7) and the POT family (putative family 18), gave optimized comparison scores of 8 SD for segments of these permeases that are in excess of 100 and 60 residues, respectively (Table 2). These families are likely to be constituents of the MFS, but their assignments to the superfamily are more tenuous than those of the other families. This is particularly true of the POT family, where the segment exhibiting sequence similarity to an established MFS member is short. All the other families included members that gave comparison scores with an established member of the superfamily of 10 SD or more. Furthermore, these scores were generated with large segments of the compared proteins, and usually the entirety of the sequences was compared. We are therefore confident that all the families listed in Table 1 (with the possible exceptions of the FGHS and POT families) are members of the MFS.
Sixteen to eighteen MFS families were identified and generally shown to be restricted to a specific type of substrate. Thus, functionally characterized members of families 1, 5, and 7 are almost without exception specific for sugars; characterized members of families 2 and 3 are without exception specific for drugs and other deleterious substances; and families 4, 6, 8, 9, 11 to 14, and 17 are specific for various classes of anionic compounds. Furthermore, the only nucleoside permeases in the MFS are found in family 10, and most of the aromatic acid permeases are found in family 15. These observations clearly show that substrate specificity is a well-conserved trait and that phylogenetic classification provides a limited but reliable guide to function.
Similar considerations can be applied to pump polarity. Thus, while members of families 1, 5, 7, and 8 can apparently function quite readily by one or more modes (e.g., uniport, symport with inwardly directed polarity, and/or antiport), families 2, 3, 4, and 11 apparently function with a high propensity for an antiport mechanism, and families 6, 9, 10, and 12 to 15 probably function with a high propensity for a cation symport mechanism. Clearly, these mechanistic differences must reflect structural and catalytic residue differences, regardless of whether they reflect qualitative or quantitative differences. The molecular bases for these differences should be subject to biochemical, biophysical, and molecular genetic analyses.
The phylogenetic tree in which representative members of each MFS family were included (Fig. 2) revealed that the major families diverged from each other long ago, possibly more than 2 billion years ago. Moreover, all currently recognized, topologically studied members of the MFS possess either 12 or 14 putative TMSs. We predict that the 14-TMS topology arose from the 12-TMS topology more than once during the evolution of the MFS. These considerations suggest that the MFS is one of the oldest protein families on Earth and that MFS proteins were present more than 3 billion years ago in their present form, probably with 12 TMSs. The age of the MFS, as well as currently unrecognized architectural features of these proteins, can account in part for its functional and sequence diversity (73).
During our analyses of transport protein families, we have noted that some families have diversified extensively while other families have not. Thus, while proteins of the MFS (TC 2.1) and the ABC superfamily (TC 3.1) have evolved to transport almost any substrate of biological interest, other ancient families have apparently not done so. For example, the ammonium transporter (Amt) family (TC 2.49), which includes the ammonium carrier of Corynebacterium glutamicum (77), consists of homologous proteins found ubiquitously in bacteria, archaea, and eukarya, but all functionally characterized members of the family transport NH4+ (73). Many other such examples exist. Why one ancient family has been capable of evolving functional diversity while another has not is not presently understood.
As noted above, the MFS transports a wide array of substrates with either inwardly or outwardly directed polarity or without polarity (Table 1). These substrates include sugars, drugs, an array of metabolites, amino acids, nucleosides, vitamins, and both inorganic and organic anions and cations. Such compounds are also transported via ABC-type permeases, but the latter family of permeases can also transport macromolecules such as proteins, complex carbohydrates, and intact phospholipids. Of the several hundred MFS carriers so far identified, none has yet been found to be capable of transporting a macromolecule. Some do transport oligosaccharides and peptides, but the sizes of these polymers appear to be limited to about 4 amino acyl or sugar residues. The basis for this fundamental functional difference between ABC and MFS permeases is not yet understood, but it presumably reflects the dimensions and flexible associations of the channel-forming α-helices. Perhaps ABC-type channels can “breathe” and expand or even dissociate into constituent helices whereas MFS channels cannot. Perhaps the channels of MFS permeases are bounded by the helices of a single domain whereas ABC permease channels are formed from dissociable domains. We suggest that very basic architectural features of these two classes of permeases differ and provide an explanation for their distinctive functional characteristics.
As summarized in Table 1, several MFS families are restricted to certain types of organisms. For example, families 6 (NHS) and 12 (SHS) are restricted to gram-negative bacteria while family 16 is restricted to yeasts. It is interesting to ask whether this restriction is a true reflection of the distribution of these proteins in nature or whether it merely reflects the need for more extensive sequence data. Examination of the data summarized in Table 1 reveals that within limits, the largest families are represented in the largest numbers of phyla while the smallest families are restricted to the smallest numbers of phyla. Thus, families 1, 3, 14, and 18 are represented in four or more of the indicated phyla; except for family 2, these are the largest MFS families. Similarly, families 5, 7, 9, 10, 12, 15, 16, and 17 are found in only one or two phyla; except for family 11, these are the smallest families. We therefore anticipate that most of the MFS families will eventually prove to be ubiquitous. This suggestion is consistent with our proposal that most of the individual MFS families diverged from each other more than 2 billion years ago, before eukarya and archaea diverged from bacteria. It should be noted, however, that we have tacitly assumed that branch length is approximately proportional to phylogenetic distance. Branch length is, in fact, a true reflection of sequence divergence and therefore a reflection of both the time and rate of divergence. If evolutionary pressure for sequence divergence is variable over evolutionary time, a “phylogenetic” tree can be misleading. While we believe that the MFS tree is in general reflective of evolutionary time (i.e., that the rate of evolutionary divergence was fairly constant for the different families), exceptions may occur. Thus, family 16 (UMF) includes six closely related proteins, all from S. cerevisiae, and no member of this family has yet been found in another organism. Since none of these proteins has been functionally characterized, it is possible that they have evolved to serve a yeast-specific function and that the six paralogous members of the family arose relatively recently due to gene duplication events that occurred in yeasts. These proteins may serve a transport function that is specific to yeasts, a function perhaps involving mating, budding, or differentiation.
A final question arises, i.e., whether additional MFS families are likely to be found. Families with greater sequence divergence than those currently recognized (e.g., those with members that give less than 8 SD in comparison scores with members of known MFS families) and families not currently represented in the databases (e.g., those that will eventually prove to be the smallest MFS families) will undoubtedly be discovered as additional sequence data become available. However, when we screened the sequence motif presented at the top of Table 19 against the SwissProt database with 0, 1, 2, 3, or 4 mismatches, the only transport proteins that were retrieved were recognized members of the MFS already listed in Tables 3 to 17. This observation leads us to suggest that if additional MFS proteins are in the current databases, they have diverged in sequence to such an extent that they are unrecognizable on the basis of primary structure alone. Three-dimensional structural data on numerous secondary carriers, including both MFS and non-MFS permeases, may be required to resolve this question.
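A motif screen that tolerates a fixed number of mismatches can be sketched as follows. This is an illustration only: the MFS-specific motif of Table 19 is not reproduced in this section, so consensus motif B of the PHS family (DKIGRKKIQ) is used as a stand-in, every motif position is treated as a single fixed residue, and the three "database" sequences are invented toy examples rather than SwissProt entries. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def count_mismatches(window: str, motif: str) -> int:
    """Number of positions at which an equal-length window differs from the motif."""
    return sum(a != b for a, b in zip(window, motif))


def find_motif(sequence: str, motif: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every window within the mismatch limit."""
    hits = []
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        mismatches = count_mismatches(sequence[start:start + width], motif)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def screen_database(
    database: Dict[str, str], motif: str, max_mismatches: int
) -> Dict[str, List[Tuple[int, int]]]:
    """Return only those entries that contain at least one acceptable window."""
    results = {}
    for name, sequence in database.items():
        hits = find_motif(sequence, motif, max_mismatches)
        if hits:
            results[name] = hits
    return results


# Stand-in motif (consensus motif B quoted in the text) and invented sequences.
MOTIF = "DKIGRKKIQ"
TOY_DATABASE = {
    "toy_exact": "MSAADKIGRKKIQLLV",
    "toy_one_off": "MSAADKIGRRKIQLLV",
    "toy_unrelated": "MSTNPLLAVGFWWCCH",
}

if __name__ == "__main__":
    for allowed in range(5):  # 0, 1, 2, 3, or 4 mismatches, as in the text
        retrieved = sorted(screen_database(TOY_DATABASE, MOTIF, allowed))
        print(allowed, retrieved)
```

Raising the mismatch limit widens the net; the observation reported above is that even at four mismatches no transport proteins outside the recognized MFS families were retrieved.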
We thank Tim Bailey, Charles Elkan, Jayna Ditty, André Goffeau, Caroline Harwood, Gary Kuan, Ellen Neidle, Jonathan Reizer, and Marek Sliwinski for valuable discussions. We also thank Mary Beth Hiller, Lyn Alkan, and Milda Simonaitis for assistance in the preparation of the manuscript.
Work in our laboratory was supported by USPHS grants 2RO1 AI14176 from the National Institute of Allergy and Infectious Diseases and 2RO1 GM55434 from the National Institute of General Medical Science (to M.H.S.). I.T.P. was supported by a C. J. Martin Fellowship from the National Health and Medical Council of Australia.
H. Heuel, S. Turgut, K. Schmid, and J. W. Lengeler (J. Bacteriol. 179:6014–6019, 1997) have recently reported the sequences of the D-arabinitol:H+ and ribitol:H+ symport permeases of Klebsiella pneumoniae (DalT and RbtT, respectively). These two proteins are 86% identical and are 425 and 427 aminoacyl residues long, respectively, both with 12 putative TMSs. We have conducted phylogenetic analyses of these two polyol permeases and have found that they, together with an uncharacterized protein encoded within the Bacillus subtilis genome, comprise a novel MFS family which we have termed the polyol permease (PP) family (family 18) (T.-T. Tseng and M. H. Saier, Jr., unpublished observations). The proteins of the PP family exhibit an approximation to the MFS-specific sequence motif between TMSs 2 and 3 (Table 19) of GVVAEIIGPRKTM, thus showing poor correspondence to the N-terminal half of this MFS-specific motif but excellent correspondence to the C-terminal half. Binary comparison of DalT with KgtP Eco (Table 6) gave a comparison score of 10.5 standard deviations for a segment of 107 residues (21% identity, 49% similarity, 0 gaps). This value is sufficient to establish that the three proteins of the PP family are members of the MFS (T.-T. Tseng and M. H. Saier, Jr., unpublished results). The three proteins of the PP family also exhibit recognizable sequence similarity to members of several other MFS permease families. The following two signature sequences proved to be specific to the PP family: Y(A/G)(L/I/V)RGX(A/G)YPLFXYSF(L/I/V)V and GEX2TLWXALXFX3GG(L/I/V)2AL (X is any residue). By hybrid protein construction, Heuel et al. demonstrated that the substrate specificities and kinetic properties for transport of DalT and RbtT are determined by the amino-terminal halves of the proteins. This result contrasts with those reported for the lactose permease of E. coli (LacY; TC 2.1.5.1) in which substrate specificity appears to be determined primarily by residues in the carboxy-terminal half of the protein.
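For readers who wish to search sequences with the two PP family signature sequences, the notation can be translated mechanically into a regular expression. The sketch below rests on an assumed reading of the notation: alternatives in parentheses such as (L/I/V) denote one residue from the set, X denotes any residue, and a number following a unit (X2, X3, or (L/I/V)2) denotes that many consecutive occurrences. The function name and the test sequence are hypothetical.

```python
import re

# One unit is either a parenthesized set of alternatives or a single letter.
_UNIT = re.compile(r"\(([A-Z](?:/[A-Z])*)\)|([A-Z])")
_COUNT = re.compile(r"\d+")


def signature_to_regex(signature: str) -> str:
    """Translate a signature such as 'GEX2TLW' into a regular expression."""
    pieces = []
    pos = 0
    while pos < len(signature):
        unit = _UNIT.match(signature, pos)
        if unit is None:
            raise ValueError(f"unexpected character at {pos}: {signature[pos]!r}")
        pos = unit.end()
        if unit.group(1):                      # (A/G) becomes [AG]
            piece = "[" + unit.group(1).replace("/", "") + "]"
        elif unit.group(2) == "X":             # X is any residue
            piece = "."
        else:                                  # a fixed residue
            piece = unit.group(2)
        count = _COUNT.match(signature, pos)   # optional repeat count
        if count:
            piece += "{" + count.group() + "}"
            pos = count.end()
        pieces.append(piece)
    return "".join(pieces)


PP_SIGNATURES = [
    "Y(A/G)(L/I/V)RGX(A/G)YPLFXYSF(L/I/V)V",
    "GEX2TLWXALXFX3GG(L/I/V)2AL",
]

if __name__ == "__main__":
    for signature in PP_SIGNATURES:
        print(signature_to_regex(signature))
    # Invented sequence containing a match to the first signature.
    pattern = re.compile(signature_to_regex(PP_SIGNATURES[0]))
    print(bool(pattern.search("MYALRGQGYPLFKYSFIVK")))
```

The resulting patterns can be passed to `re.search` against any protein sequence written in one-letter code; a match indicates only that the signature is present, not that the protein has been shown to belong to the PP family.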
In recent work, M. J. Whipp, H. Camakaris, and A. J. Pittard have cloned and analyzed the shiA gene, which encodes the shikimate transport system of E. coli K-12 (SwissProt accession no. P76350; 438 amino acids) (Gene, in press). This permease proved to be a member of the metabolite:H+ symporter (MHS) family (family 6) of the MFS.