The TqsA (YdgG) protein of Escherichia coli has been shown to export the autoinducer-2 (AI-2) molecule, a furanosyl borate diester that bears little resemblance to previously characterized biological molecules. TqsA belongs to a large superfamily, the AI-2 exporter (AI-2E) superfamily, of putative transporters with no other functionally characterized members. These proteins derive exclusively from bacteria. Many different bacterial kingdoms contain them, although several kingdoms do not. These proteins exhibit a uniform topology with 8 putative transmembrane segments (TMSs) which we show probably arose from a 4-TMS precursor in a process that involved at least one and possibly two intragenic duplication event(s). The first halves of these proteins are more diverse in sequence than the second halves, suggesting that the first halves may serve substrate-specific functions while the second halves serve family-specific functions. Conserved residues and motifs in these proteins are identified. Some homologues include extra catalytic domains including those involved in purine nucleotide biosynthesis, ATP and GTP binding, and molecular signaling. The results presented provide guides for future functional studies on members of this superfamily of bacterial transporters.
Bacteria use a variety of means to communicate with one another and with their eukaryotic hosts [1,2,3,4,5,6]. In some cases, social interactions in complex bacterial colonies allow bacteria to synchronize the behavior of all members of the group and thereby act like multicellular organisms. By contrast, some bacterial social engagements promote individuality among members within the group and thereby foster diversity [8, 9]. Bacterial communication systems include long- and short-range chemical signaling; one-way, two-way, and multi-way communication; contact-mediated and contact-inhibited signaling [12, 13], and the use and spread of misinformation, even deadly information [8, 14].
Bacteria use a diversity of small molecules for extra- and intracellular signaling. They scan small-molecule mixtures to access information about both their extracellular environment and their intracellular physiological status. Based on the integrated information, they continuously interpret their circumstances and react rapidly to changes. They must integrate extra- and intracellular signaling information to mount appropriate responses to environmental changes.
Several small-molecule bacterial signaling pathways have been identified. These include extracellular ‘quorum-sensing’ signaling and intracellular cyclic dinucleotide signaling. Possibly, these two pathways converge to control complex processes including multicellularity, biofilm formation, and virulence.
The exchange of extracellular signaling molecules must be mediated by specific transporters, some involved in molecular uptake, others catalyzing efflux [16,17,18]. Unlike other autoinducers, which are specific to a particular species of bacteria, autoinducer-2 (AI-2) is produced by a large number of bacterial species. AI-2 has been proposed to serve as a ‘universal’ signal for inter-species communication. The crystal structure of an AI-2 sensor protein, LuxP, in complex with the autoinducer, revealed the bound ligand to be a furanosyl borate diester that bears no resemblance to previously characterized autoinducers.
Recently, the Escherichia coli YdgG (TqsA) protein was shown to control biofilm formation in E. coli K-12 by mediating AI-2 transport. YdgG had been known to be induced in E. coli biofilms. Deletion of ydgG decreased extracellular and increased intracellular concentrations of AI-2, suggesting that YdgG enhances export of AI-2. Consistent with this hypothesis, deletion of the ydgG gene increased cell motility by increasing transcription of flagellar genes that were known to be induced by AI-2. By expressing ydgG in trans, the wild-type phenotypes for extracellular AI-2 activity, motility, and biofilm formation were restored.
YdgG is a membrane-spanning protein that is conserved in many bacteria. It influences resistance to several antimicrobials, including crystal violet and streptomycin. Deletion of ydgG caused 31% of the bacterial chromosome to be differentially expressed in biofilms. This apparently occurred because AI-2 controls the transcription of hundreds of genes. Since YdgG catalyzes export of the quorum-sensing signal, AI-2, the gene name tqsA (transporter of quorum sensing-A) was suggested.
In this communication, we analyze the family of putative transporters to which TqsA belongs. We name this family the AI-2 exporter (AI-2E) superfamily, based on the function of its first characterized member. We show that it is a huge superfamily (Pfam family PF01594) with members derived from a wide range of bacteria. However, no member of the family was identified from an archaeon, a eukaryote, or a member of certain bacterial kingdoms that include sequenced organisms with reduced genome sizes. Phylogenetic analyses revealed the relationships of these homologues to each other with groupings largely according to organismal type. However, this occurred in an unusual fashion suggestive of several functional types with occasional lateral gene transfer responsible for some of the anomalies. Thus, the analyses reveal groups of apparent orthologues that presumably serve the same function. Conserved motifs are identified which must play specific functional/structural roles. An evolutionary pathway for the appearance of these proteins, based on intragenic duplication of an ancestral 4-transmembrane segment (TMS) unit is established. Finally, extra domains with recognized functions in some of these proteins are identified, and these provide clues as to the associated transport functions.
The PerM protein of E. coli (TC #2.A.86.1.1) was used as the query sequence in PSI-BLAST searches (default settings) with six iterations. A modified CD-HIT program was used to eliminate redundancies and closely similar sequences of greater than 90% identity (the default setting). This program randomly selects one protein of several that are of greater than 90% identity with each other. Membership in the AI-2E superfamily was assigned based on sequence similarity throughout the transmembrane regions. All members of the superfamily thus exhibited comparison scores of at least 9 SD (the probability that the observed degree of similarity arose by chance is less than 10⁻¹⁹). A total of 391 non-redundant homologues from the NCBI protein databank remained and served as our data set for all analyses reported here (see table S1 on our website; http://www.biology.ucsd.edu/~msaier/supmat/AI-2). A neighbor-joining phylogenetic tree (fig. 1f), based on a CLUSTAL X multiple alignment (see fig. S1 on our website), and a dendrogram (fig. S2), drawn using the TreeView program, were generated.
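As a rough illustration of how a comparison score expressed in standard deviations relates to a chance probability, the following minimal sketch assumes that scores from randomized sequences follow a normal distribution and computes the one-sided upper-tail probability. The function name tail_probability is hypothetical, and the Gaussian assumption is ours; the conversion actually used for the values quoted in the text may differ in detail.

```python
import math


def tail_probability(z_score: float) -> float:
    """Upper-tail probability of a standard normal variable at z_score.

    Assumption: shuffled-sequence comparison scores are normally
    distributed, so a score z SD above the mean arises by chance with
    probability 0.5 * erfc(z / sqrt(2)).
    """
    return 0.5 * math.erfc(z_score / math.sqrt(2.0))


if __name__ == "__main__":
    # 9 SD is the membership threshold quoted in the text.
    print(f"9.0 SD -> P = {tail_probability(9.0):.2e}")
```

Under this assumption, a score of 9 SD corresponds to a probability on the order of 10⁻¹⁹, in line with the order of magnitude stated above.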
The proteins generally fell into 5 distinct clusters or families of fairly similar size (fig. 1f). Cluster 1 (family 1) contains 78 proteins; family 2, 58 proteins; family 3, 89 proteins; family 4, 103 proteins, and family 5, 63 proteins. The multiple alignment of all 391 proteins (fig. S1) revealed two fully conserved prolyl residues, one at alignment position 532, and the other at alignment position 627 within hydrophobic peaks 6 and 8, respectively (see below). Several gaps were observed within the hydrophilic regions of these aligned sequences (positions 249–263, 336–384, 404–416, 441–446, 463–468, 508–515, 523–526, and 559–569 of the 1,112 residue position alignment presented in fig. S1).
Each of the 5 families within the AI-2E superfamily was analyzed separately for sequence conservation, topology, and phylogeny. The proteins, their organismal sources, abbreviations, sizes, numbers of putative TMSs and Genbank (gi) identification numbers for the 5 clusters are provided in tables S3–S7. The 5 neighbor-joining phylogenetic trees are shown in figure 1a–e, while the average hydropathy, amphipathicity and similarity plots, based on the AveHAS program, are shown in figure 2a–e. The AveHAS program averages the hydropathy plots for all aligned sequences, providing much greater accuracy than would otherwise be possible. These figures provide the bases for several of the conclusions cited below. They were derived using the multiple alignments presented in figures S3–S7 on our website. Hydropathy and amphipathicity plots for individual proteins were generated using the WHAT program.
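The idea of averaging hydropathy over an alignment can be sketched as follows. This is not the AveHAS program itself: it is a minimal illustration that assumes the Kyte-Doolittle hydropathy scale, skips gap characters in each column, and smooths with a sliding window whose width (19 here) is an arbitrary illustrative choice. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_average_hydropathy(alignment):
    """Average hydropathy per alignment column, ignoring gaps."""
    length = len(alignment[0])
    averages = []
    for col in range(length):
        values = [KYTE_DOOLITTLE[seq[col]] for seq in alignment
                  if seq[col] in KYTE_DOOLITTLE]
        # A column consisting only of gaps contributes a neutral value.
        averages.append(sum(values) / len(values) if values else 0.0)
    return averages


def smooth(values, window=19):
    """Sliding-window mean; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    toy_alignment = ["LLLL-KKK", "IIIIAKKK"]  # hypothetical aligned sequences
    print(smooth(column_average_hydropathy(toy_alignment), window=3))
```

Because random fluctuations in individual sequences tend to cancel when many aligned homologues are averaged, peaks in the averaged profile are more reliable indicators of transmembrane segments than peaks in any single plot.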
Family 1 proteins show eight peaks of hydrophobicity (fig. 2a) and several fully conserved residues (fig. S3). These include seven prolyl residues (alignment positions 104, peak 3; 195, end of peak 4; 217, between peaks 4 and 5; 279, peak 6; 326 and 337, beginning of peak 8; and 361, end of peak 8). The second halves of these proteins proved to be better conserved than the first halves.
We derived consensus sequences for the two best conserved regions of these proteins for all 5 families. In peak 6, the consensus sequence was:
* : : : * : : * *
GLSVLIPY (LIV) GA (LIV)3 TVP
while in peak 8, the consensus sequence was:
* : : : : * : : : * * . * : . :. :*
P (LIV) FSEAVNLHP (LIV)4 (SA) (LIV)3 FGGLWGFWGVFFAIP
where asterisks indicate identities only, colons close similarities, and dots more distant similarities as defined by the CLUSTAL X program; alternative consensus residues at any one position are indicated in parentheses.
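The consensus notation used above can be expanded mechanically into one residue set per position, which makes the position numbering easy to check and allows a candidate segment to be tested against the consensus. The sketch below is only an illustration: the parser, the function names, and the example segment are hypothetical, and real family members are expected to deviate from a strict consensus match.

```python
import re

# Consensus sequences exactly as written in the text.
MOTIF_1 = "GLSVLIPY (LIV) GA (LIV)3 TVP"
MOTIF_2 = "P (LIV) FSEAVNLHP (LIV)4 (SA) (LIV)3 FGGLWGFWGVFFAIP"

# A token is either "(ALTERNATIVES)" with an optional repeat count,
# or a single residue letter.
TOKEN = re.compile(r"\(([A-Z]+)\)(\d*)|([A-Z])")


def consensus_positions(consensus):
    """Expand the notation into a list with one residue set per position."""
    positions = []
    for group, repeat, single in TOKEN.findall(consensus.replace(" ", "")):
        if single:
            positions.append(single)
        else:
            positions.extend([group] * int(repeat or 1))
    return positions


def consensus_regex(consensus):
    """Compile the consensus into a regular expression."""
    parts = [p if len(p) == 1 else f"[{p}]"
             for p in consensus_positions(consensus)]
    return re.compile("".join(parts))


if __name__ == "__main__":
    motif1 = consensus_positions(MOTIF_1)
    print(len(motif1), motif1[6], motif1[9])  # length, position 7, position 10
    # Hypothetical segment that follows the motif 1 consensus exactly.
    print(bool(consensus_regex(MOTIF_1).fullmatch("GLSVLIPYLGAIVLTVP")))
```

With 1-based numbering, the expansion places a P at position 7 and a G at position 10 of the first consensus sequence, and a G at positions 21 and 25 and a P at the final position of the second.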
Figure 3 shows how these two consensus sequences differ between the proteins of the 5 phylogenetic families. Of these 5 families, family 1 appears to be the best conserved with respect to both identities and conservative substitutions. The two sets of consensus sequences show similarities in all 5 families. For example, the P at position 7 in motif 1 and the P in the terminal position of motif 2 are fully conserved in all proteins of the AI-2E superfamily as noted above, suggesting an essential structural or functional role. The G at position 10 in the first consensus sequence and those at positions 21 and 25 in the second consensus sequence are conserved in nearly all families. While the dominant residue may differ in the different families at specific positions, these are not fully conserved in any of these families. Thus, when the consensus residues differ, they are never fully conserved. We conclude that in none of these 5 families is a distinctive function, common to all members in that family, but different from those of the other four families, dictated by a specific residue or set of residues within these two conserved motifs. It appears that these motifs provide structural features or a function that is common to all members of the AI-2E superfamily.
The average hydropathy plots shown in figure 2a–e reveal the characteristics of each of the 5 families in the AI-2E superfamily. Family 1 is most compact, with no protein showing N- or C-terminal extensions. In this plot, peaks 1 and 2 are close to each other, peak 3 is more distant, and peak 4 is still more distant. Peaks 5–7 are all overlapping while peak 8 is distinct but close to peak 7. This pattern is observed for all 5 families (compare figure 2a–e).
The average amphipathicity plots revealed that all 5 families show similar characteristics: amphipathic peaks occur following hydrophobic peaks 3, 4 and 8 and coinciding with peak 6. Motif 1, also coinciding with peak 6, reveals a general pattern of a semipolar residue followed by three hydrophobic residues. A helical wheel depiction showed that strongly hydrophobic residues occur on one side of helix 6, while semipolar residues predominate on the other side (data not shown), thus accounting for the amphipathicity of putative TMS 6.
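The sidedness revealed by a helical wheel can be quantified with a hydrophobic moment: each residue is treated as a vector whose length is its hydropathy and whose direction advances by a fixed angle per residue. The sketch below is only illustrative. It assumes an ideal α-helix with 100° per residue and the Kyte-Doolittle scale, neither of which is taken from the WHAT or AveHAS programs, and the helper names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Mean hydrophobic moment of a segment modeled as an ideal helix."""
    x = y = 0.0
    for i, residue in enumerate(segment):
        angle = math.radians(i * degrees_per_residue)
        hydropathy = KYTE_DOOLITTLE[residue]
        x += hydropathy * math.cos(angle)
        y += hydropathy * math.sin(angle)
    return math.hypot(x, y) / len(segment)


def idealized_amphipathic_helix(length, degrees_per_residue=100.0):
    """Toy sequence: Leu on one face of the wheel, Lys on the other."""
    return "".join(
        "L" if math.cos(math.radians(i * degrees_per_residue)) > 0 else "K"
        for i in range(length)
    )


if __name__ == "__main__":
    print(hydrophobic_moment(idealized_amphipathic_helix(18)))
    print(hydrophobic_moment("L" * 18))  # uniform helix, no sidedness
```

A segment with hydrophobic residues clustered on one face gives a large moment, whereas a uniformly hydrophobic segment gives a moment near zero, which is the distinction drawn for putative TMS 6.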
As noted above, families 2–5 show poorer conservation than observed for family 1 (figs. 2a–e and 3). Further, families 2–4 contain proteins with short N-terminal and long C-terminal hydrophilic sequences, while family 5 contains proteins with both N- and C-terminal extensions maximally of about 160 residues each. The domains in these longer proteins will be analyzed and described below.
The phylogenetic trees for the 5 AI-2E families are presented in figure 1a–e. Family 1 proteins derive exclusively from the proteobacterial phylum of the γ- and δ-orders. Family 2 is largely from α-, β-, γ- and δ-proteobacteria, but three proteins are from a single species each of the Bacteroides, Firmicute and Planctomycetes phyla, respectively. Family 3 includes proteins from the same proteobacteria as for family 2, but Chlorobi and Chloroflexi proteins are also present. Family 4 proteins are exclusively from firmicutes except for one actinobacterial protein. Finally, family 5 proteins are mostly from actinobacteria and cyanobacteria with a few proteins from Deinococcus, Chloroflexi and ε-proteobacterial species. It is therefore clear that phylogenetic clustering is predominantly according to organismal type with a few interesting exceptions. Also of interest is the absence of homologues in specific bacterial kingdoms such as the spirochetes, chlamydia, and mycoplasma. Most of the sequenced genomes of bacteria in these kingdoms are of reduced size due to their obligate parasitic lifestyles. We detected no homologues outside of the bacterial domain.
The GAP and IC programs, with default settings and 500 random shuffles (to establish statistical significance) [27, 28], were used in attempts to establish homology between the putative first and second halves (TMSs 1–4 and 5–8, respectively) of proteins of the AI-2E superfamily. Figure 4a shows an alignment of putative TMSs 1–3 with putative TMSs 5–7, and figure 4b shows an alignment of putative TMS 4 with putative TMS 8. In figure 4a, the alignment gave 33% identity, 46% similarity, and a comparison score of 8.5 SD. In figure 4b, the alignment gave 30% identity, 41% similarity, and a comparison score of 9.4 SD. These values correspond to probabilities of ~10⁻¹⁷ and ~10⁻²⁰, respectively, that the observed degree of sequence similarity could have arisen by chance. The proteins compared are derived from family 3. Interestingly, TMS 4 and the upstream hydrophilic region of the Nmo2 protein aligned with TMSs 6–8 in the Asp2 protein, suggesting that the hydrophilic region between putative TMSs 3 and 4 may have arisen from the equivalent of TMSs 6 and 7. These results clearly suggest that these proteins arose in a process that involved intragenic duplication, but it is not clear whether part or all of the 4-TMS element was duplicated in a single event. The precise path may have involved more than one (partial) intragenic duplication event (see ‘Discussion’).
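The logic of a shuffle-based comparison score can be sketched as follows. This is not the GAP or IC program: the sketch substitutes a simple Needleman-Wunsch global alignment with an arbitrary match/mismatch/gap scheme (all three values are illustrative assumptions) and expresses the real alignment score as the number of standard deviations above the mean score obtained after randomly shuffling one sequence 500 times. The function names and the toy sequence are hypothetical.

```python
import random
import statistics


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (toy scoring)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def comparison_score(a, b, shuffles=500, seed=0):
    """Real score in SD above the mean score of shuffled comparisons."""
    rng = random.Random(seed)
    real = global_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # preserves composition, destroys order
        shuffled_scores.append(global_alignment_score(a, residues))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    return (real - mean) / sd


if __name__ == "__main__":
    toy = "MKTLLVAGIFSWPERHNDQYC"  # hypothetical sequence, not from the data set
    print(f"comparison score: {comparison_score(toy, toy):.1f} SD")
```

Shuffling preserves amino acid composition while destroying residue order, so a high score in SD indicates similarity beyond what composition alone would produce, which is the basis for inferring that the two halves share a common origin.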
Table S2 presents the 44 AI-2E superfamily homologues that exhibit larger than normal sizes. Twenty-two are from family 5 (actinobacteria and Chloroflexi), 14 are from family 2 (α- and β-proteobacteria and one Planctomycetes species), 5 are from family 3 (α-, β- and γ-proteobacteria), and 3 are from family 4 (firmicutes). One of the homologues, Tde1 of 665 residues, has a C-terminal domain homologous throughout its length with phosphoribosyl amino imidazole succinocarboxamide synthetase, involved in purine nucleotide biosynthesis. It is thus possible that Tde1 is an exporter specific for a purine nucleotide or one of its biosynthetic metabolites. A second protein, Bme1, also of family 3, has a C-terminal ATP/GTP binding P-loop-containing domain similar to those present in three E. coli proteins, DnaA (P03004), DNA polymerase III δ-subunit (HolA; P28630) and the ATP binding protein, IstB (AAC33916). This observation can be interpreted to suggest a function concerned with export of a nucleotide or a derivative of it.
Two large protein homologues in family 2, Nha1 and Asp3, possess C-terminal sensor GAF domains. These domains can be signaling domains and are present in cyclic di-GMP-regulated cyclic nucleotide phosphodiesterases, adenylate cyclases and bacterial transcription factors. They are similar in structure to PAS domains and probably bind cyclic nucleotides. This observation agrees with those cited above, suggesting an involvement in cyclic nucleotide export. It should be noted that cyclic AMP efflux from E. coli is dependent on the proton motive force, but that no protein has yet been shown to possess such an activity. It is possible that some of these AI-2E porters are the unidentified cyclic AMP exporters.
Three proteins in family 2, Nwi1, Nsp1 and Nha3, appear to contain partial or full-length Tpr-like ligand-binding domains [32,33]. This suggestion is based on TC-BLAST hits when these three proteins were blasted against TCDB. The searches yielded above threshold hits (e–4) to the Sec72 protein of the SRP complex of Saccharomyces cerevisiae (3.A.5.8.1). This latter protein is known to possess the Tpr domain in the region of sequence similarity. The alignment is shown in figure 5 for Nwi1 versus Sec72. This alignment, with 36% identity and 50% similarity, gave a comparison score of 13.5 SD. This value is considerably in excess of what is required to establish homology. It is worth noting that the region of similarity overlaps the Tpr domain of Sec72 but includes the N-terminally adjacent region as well. This region is actually present in many of the large family 2 proteins.
In this report, we have characterized a superfamily of putative secondary efflux pumps, previously named the PerM family [36, 37]. The functional characterization of the TqsA protein of E. coli (previously annotated as YdgG) by Herzberg et al. revealed that this PerM family member is the AI-2 efflux pump. The present demonstration of the distribution of 391 members of this superfamily from many bacterial kingdoms has caused us to rename this family after the substrate of its only functionally characterized member. We have consequently designated it the AI-2E superfamily.
The AI-2E superfamily exhibits some unique characteristics. First, the members fall into 5 phylogenetic clusters or families that we have analyzed separately. Second, all members seem to have a uniform 8-TMS topology, a characteristic reflected by the average hydropathy plots for all 5 families (fig. 2a–e). Two closely spaced TMSs (TMSs 1 and 2) are followed by a more distant TMS 3 and an even more distant TMS 4. These are then followed by four more TMSs in a 3 + 1 arrangement. Third, these hydrophobic domains are homologous throughout their lengths in all 391 members of the AI-2E superfamily analyzed here, and they include two fully conserved prolyl residues, one in TMS 6 and one in TMS 8. Fourth, the second 4 TMSs are better conserved than the first 4 TMSs, a generalization that appears valid for all 5 families within this superfamily. Finally, the two halves apparently arose by one or more internal duplication event(s). TMSs 1–3 are almost certainly homologous to TMSs 5–7, and TMS 4 is homologous to TMS 8.
Surprisingly, the long hydrophilic domain preceding TMS 4 (but not TMS 8) is similar in sequence to the region bearing TMSs 6 and 7 in spite of the greater hydrophilicity of the former regions. This observation suggests that the pathway by which these proteins arose may have been complex. There may have been two partial or full-length duplications of the compact C-terminal 4-TMS domain. Two partial duplications, followed by sequence divergence, or two full-length duplications, followed by partial deletion and sequence divergence, could have given rise to the current members of the superfamily. If the latter possibility is correct, the one deletion event giving rise to an 8-TMS protein from an ancestral 12-TMS protein might have occurred early during the evolution of the superfamily in order to account for their uniform topologies.
The two fully conserved prolyl residues in all members of the AI-2E superfamily are worthy of note. Full conservation in a wide range of sequence-divergent homologues suggests one or more essential functional or structural roles. No molecular genetic analyses have been reported that provide a clue as to what this role or these roles may be. However, cis/trans-prolyl isomerization has been reported to control ion gating channel activity. It is possible that one or both of the conserved prolyl residues in members of the AI-2E superfamily undergo cis/trans-induced conformational changes that accompany transport. Other possibilities, such as essential structural features, can also be considered.
With only one member of the AI-2E superfamily characterized, the functions of most members remain an open question. However, there are clues. The clustering of members of this superfamily into at least 5 families, each derived from a different but often overlapping set of bacterial kingdoms, leads to the suggestion that each family has a set of characteristic properties. Usually, members of a superfamily catalyze only uptake or efflux, but occasionally large superfamilies (e.g. MFS, ABC, DMT) can catalyze both. In almost all cases where both sym- and antiporters occur within the same superfamily, the symporters cluster separately from the antiporters on a phylogenetic tree [36, 39,40,41,42,43,44,45,46]. Thus, if all AI-2E superfamily members catalyze efflux, as we tentatively predict, they are likely to transport a variety of compounds, possibly all related in structure.
What might these substrates be? For clues, we examined members of the AI-2E superfamily that are fused to other domains. We found that various hydrophilic domains of recognizable function may be fused to them, either N-terminally or C-terminally. Based on homology, these domains appeared to be involved in purine nucleotide biosynthesis, purine nucleotide binding and/or hydrolysis, and molecular signaling. On this basis, as well as the known function of TqsA, we tentatively suggest that many or all members of the AI-2E superfamily catalyze the pmf-dependent efflux of both intracellular and extracellular signaling molecules. If so, they are generally involved in transmembrane signaling. Substrates may include cyclic nucleotides such as cyclic AMP, cyclic GMP and bis-3′,5′-cyclic diguanosine monophosphate (cyclic di-GMP) [27, 32] as well as quorum sensors such as AI-2. Further studies will be required to establish the functional diversity of this large superfamily of secondary carriers.
This work was supported by NIH grant GM077402 from the National Institute of General Medical Sciences. We thank Mary Beth Hiller and Jeeni Criscenzo for assistance in the preparation of the manuscript.