Transcription is the key control point for regulation of numerous cellular activities. Bacteria regulate levels of gene expression by using transcription factors that modulate the recruitment of RNA polymerase (RNAP) to promoter elements in the DNA. Many bacteria also control gene expression by using a second class of transcription factor that uses energy from nucleotide hydrolysis to actively promote transcription initiation.
The sigma (σ) subunit is required for promoter recognition and initiation of transcription by the bacterial RNAP. Typically a bacterial cell may contain several alternative σ subunits with differing sequence specificities that direct the RNAP holoenzyme to different sets of promoters. The σ54 subunit (also known as RpoN, NtrA, and σN) is unique (42) in that it shares no detectable homology with any of the other known sigma factors (e.g., σ70 and σ28). σ54-RNAP binds to specific promoter sites, with the consensus DNA sequence YTGGCACGrNNNTTGCW (6), to form a transcriptionally inactive closed complex consisting of holoenzyme bound to double-stranded DNA. In contrast to σ70-RNAP bound at its cognate promoter sites, σ54-RNAP is unable to spontaneously isomerize from a closed complex to a transcriptionally competent open complex (11, 72). To proceed with initiation of transcription, the closed complex must participate in an interaction with a transcriptional activator, involving nucleotide hydrolysis. This transcriptional activator is usually bound at least 100 bp upstream of the promoter site, and DNA looping is required for the activator to contact the closed complex and catalyze formation of the open promoter complex. In this respect the activator resembles the enhancer-binding proteins (EBPs) found in eukaryotic systems, and for this reason such activators are known as bacterial EBPs.
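As a minimal illustration of how the consensus YTGGCACGrNNNTTGCW can be read, the sketch below expands each IUPAC nucleotide code into a regular-expression character class and reports exact forward-strand matches in a DNA string. The function names and the test sequences are hypothetical, and the lowercase r is assumed to be the standard purine code written in lowercase because that position is less strongly conserved. Real promoters often deviate from the consensus, so an exact-match search of this kind will miss many genuine sites.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Consensus quoted in the text; lowercase 'r' is treated as IUPAC R (assumption).
CONSENSUS = "YTGGCACGrNNNTTGCW"


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_exact_matches(dna, consensus=CONSENSUS):
    """Return (0-based start, matched sequence) for every forward-strand match.

    A lookahead is used so that overlapping matches are also reported.
    The reverse strand is not searched in this sketch.
    """
    pattern = consensus_to_regex(consensus).pattern
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(dna.upper())]
```

For example, `find_exact_matches("AAAACTGGCACGAATATTGCAGGG")` reports one match starting at offset 4 in this made-up sequence.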
From a protein structural point of view, EBPs share a σ54 interaction module (Pfam accession number PF00158) but typically have at least one additional domain (Fig. 1). In nearly all of those investigated so far, there is a DNA-binding domain containing a helix-turn-helix sequence motif, enabling the protein to bind to specific DNA enhancer elements upstream of σ54-dependent promoters (44, 47, 52, 56, 72). One exception to this scenario was recently reported (30), where Pseudomonas aeruginosa FleQ can activate transcription while bound in the downstream vicinity of the promoter.
Although the physiological advantages conferred by the σ54-EBP mode of transcription are not yet clear, activation of σ54-dependent transcription is highly regulated by environmental cues through regulatory modules in the EBPs and in some cases by interactions with other regulatory proteins. Sensory modules in EBPs include CheY-like response regulator domains, PAS domains, GAF domains, PRD modules, and V4R domains, often represented within an N-terminal region (Fig. 1). These sequence features are described in more detail later in this article. Intriguingly, recent complete genome sequences have revealed some unusual EBPs containing regions of homology to other signal transduction domains and enzymes. With the large number of complete bacterial genomes now available, and with the importance of accurate annotation of future sequence data, we feel that it is timely to survey the variety of domain architectures found in these important proteins.
The AAA+ proteins are a large family of chaperone-like ATPases that often act as molecular machines involved in the formation and remodeling of protein and protein-nucleic acid complexes (45, 51). The sequence feature common to all bacterial EBPs is the σ54-RNAP interaction module (Pfam accession number PF00158). This stretch of about 300 amino acids is similar in sequence to the regions that define the AAA+ family (45, 73) and comprises several structural domains.
The EBPs share seven highly conserved sequence motifs, several of which are shared with other AAA+ proteins (44, 56, 72). Open reading frames with homology to EBPs can be identified in genome sequences by using Blast searches and from protein domain databases such as Pfam (8). However, an additional sequence feature that is diagnostic for genuine EBPs is the GAFTGA motif (73), which is found at residues 215 to 220 in Escherichia coli NtrC. This sequence motif, which is almost invariant in most EBPs, represents a structural feature that probably interacts with the σ factor and is unmasked or stabilized by nucleotide binding or hydrolysis (73). Related proteins that lack this GAFTGA motif (e.g., TyrR and Rhodobacter capsulatus NtrC) are unable to activate transcription at σ54-dependent promoters (10, 48).
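The use of the GAFTGA motif as a diagnostic feature can be sketched as a simple sliding-window comparison. The code below is an illustrative assumption rather than the procedure used in this survey: it reports every hexapeptide in a protein sequence that differs from GAFTGA at no more than a chosen number of positions, using 1-based residue numbering. The function names and the default tolerance of two mismatches are hypothetical, and the example sequences in the usage note are invented fragments, not real proteins.

```python
REFERENCE_MOTIF = "GAFTGA"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif_like(protein, motif=REFERENCE_MOTIF, max_mismatches=2):
    """Return (1-based start residue, hexapeptide, mismatch count) tuples.

    Every window of len(motif) residues is compared with the reference
    motif; windows with at most max_mismatches differences are reported.
    """
    protein = protein.upper()
    width = len(motif)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        mismatches = hamming(window, motif)
        if mismatches <= max_mismatches:
            hits.append((start + 1, window, mismatches))
    return hits
```

With the invented fragment `"MKTGAFTGALLL"`, the function reports an intact motif starting at residue 4. A mismatch count of zero would mark a candidate genuine EBP, whereas a nonzero count flags a variant motif whose competence to activate σ54-RNAP would need to be examined experimentally.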
A number of EBPs comprise just the σ54 interaction module plus a DNA-binding domain. The best characterized of these is the phage shock protein, PspF, in enteric bacteria. The pspABCDE operon encodes several proteins that appear to help the cell adapt to changes in membrane integrity (19). It is induced by filamentous phage infection and by the presence of secretin proteins, which assist export of large protein complexes from the cell, including virulence factors and type II and type III secretion systems. PspA appears to have important functions in helping E. coli to cope with insertions of secretins in the membrane in addition to its function as a regulator of PspF, and it may be these additional functions that have selected for its physical uncoupling from PspF. The pspC gene of Yersinia enterocolitica was found to be important for normal growth when the Ysc type III secretion system was expressed in the laboratory (14).
Strikingly, the PspF protein sequence does not contain any known regulatory input domains; its activity is controlled by formation of a repressive complex with the PspA protein. Elderkin et al. (19) suggest that PspA may be an escaped regulatory domain of PspF. However, our phylogenetic analysis suggests that PspF is closely related to AlgB and belongs to a clade of response regulator EBPs (see below) (Fig. 2). Another well-studied example of EBPs lacking regulatory input domains is the HrpR/HrpS regulators of plant pathogenicity genes in Pseudomonas syringae. These EBPs are also negatively regulated by another protein, HrpV, although HrpV displays no detectable homology with PspA.
Perhaps the best-known class of EBPs are those that contain an archetypal two-component response regulator receiver CheY-like domain (Pfam accession number PF00072) at their N termini. Two-component systems have evolved to allow bacteria to sense and respond to a wide range of stresses and environmental cues (25, 62). Examples include the nitrogen assimilation regulatory protein NtrC and the dicarboxylic acid transport regulator DctD. These proteins are activated by phosphorylation of an aspartate residue commonly located at position 54 in the N-terminal domain. In the case of NtrC, phosphorylation causes oligomerization and hence activation of transcription by σ54-RNAP, while in DctD, phosphorylation relieves the inhibitory effect that the N-terminal domain has on the σ54 interaction module. In both cases, phosphorylation is carried out by a separate histidine kinase. Unlike other members of the NtrC family, Rhodobacter capsulatus NtrC lacks the GAFTGA motif and activates transcription by σ70-RNAP rather than the σ54 holoenzyme. Transcriptional activation by this EBP requires ATP binding but not ATP hydrolysis (10).
Intriguingly, the Xanthomonas axonopodis XAC3643 response regulator EBP also has a histidine kinase domain (Pfam accession number PF00512) fused to its C terminus (Fig. 1). The gene encoding this novel protein lies close to ybjY, which is preceded by a strong candidate σ54-dependent promoter. Therefore, XAC3643 probably activates transcription of ybjY, encoding a protein that resembles an ABC transporter permease.
In addition to the two-component system, the other major bacterial signal transduction apparatus that relies on transient phosphorylation of regulatory proteins is the phosphoenolpyruvate-dependent phosphotransferase system (PTS). The main function of the PTS in E. coli is to carry out the fundamental process of substrate-level phosphorylation, whereby the transport and activation of sugar substrates are thermodynamically coupled to dephosphorylation of the glycolytic intermediate phosphoenolpyruvate. The phosphoryl relay proceeds sequentially from phosphoenolpyruvate to enzyme I (EI), HPr, enzyme II (EII), and finally the incoming sugar that is transported across the membrane and concomitantly phosphorylated by EII. Additionally, PTS components participate in signal transduction, chemotaxis, and the regulation of some key physiological processes, e.g., carbohydrate transport, catabolite repression, carbon storage, and coordination of carbon and nitrogen metabolism (50, 53, 54, 58, 67). The molecular basis of the phosphoenolpyruvate-dependent phosphorylation cascades is well characterized thanks to many recent structural and functional studies, including the solution of three-dimensional structures of several of the protein components (69).
Those proteins that are subject to direct regulation by the PTS share a common domain called the PTS regulation domain (PRD) (69). PRDs (Pfam accession number PF00874) are found, usually in pairs, in the BglG/SacY antiterminators as well as in a variety of transcriptional activators such as Bacillus stearothermophilus MtlR and Bacillus subtilis LicR, and their transcription-regulatory activities are modulated by the phosphorylation states of conserved histidine residues. The LevR EBP in B. subtilis activates transcription of levDEFG, encoding a PTS EII complex for the transport and phosphorylation of fructose. This EII complex “fine-tunes” activity of LevR, stimulating it by phosphorylation at His585 and suppressing it by phosphorylation at His869 (40).
Genome sequencing projects have revealed several more PRD-containing EBPs in addition to LevR. In most cases these are in gram-positive bacteria (Bacillus spp., Clostridium spp., Listeria spp., and Thermoanaerobacter tengcongensis), but homologues have also been detected in Salmonella enterica serovar Typhimurium and Klebsiella pneumoniae (63). In all cases, genes encoding PRD-containing EBPs are found situated adjacent to genes encoding PTS components, often with an obvious σ54-dependent promoter. Interestingly, in the food-borne pathogen Listeria monocytogenes the LevR homologue ManR activates an EII complex that confers susceptibility to the bacteriocin mesentericin Y105 (13).
Our survey of the domain organization of EBPs in recently sequenced genomes (Table 1) revealed some novel PRD-containing EBPs. For example, CP2358 and CAC3088 in Clostridium perfringens each contain an HPr domain in addition to a PAS domain (see below) and the σ54 interaction module. HPr is a small cytoplasmic phosphocarrier protein that is a central component of the general PTS. A conserved histidine residue in HPr serves as an acceptor for the phosphoryl group from EI. This phosphoryl group can then be transferred to a number of regulatory targets such as EII components. This situation is not totally without precedent, as domains with homology to PTS components are sometimes found within larger modular proteins. For example, in E. coli the FrwA and FryA proteins each contain HPr, EI, and EIIA domains (67). In the clostridial species, genes that encode HPr-containing EBPs are found adjacent to genes encoding PTS systems.
In the thermophilic anaerobe T. tengcongensis, protein TTE0180 is a PRD-containing EBP that also contains an EIIA domain that probably forms a galactitol-specific EII complex with the TTE0182 and TTE0183 gene products. Lying between the TTE0180 and TTE0182 open reading frames is TTE0181, encoding a PTS EI, which may therefore form part of the phosphotransfer chain with the galactitol-specific EII.
A protein domain found in cyclic GMP-specific and stimulated phosphodiesterases, adenylate cyclases, and E. coli FhlA has been defined as the GAF domain (3). GAF domains (Pfam accession number PF01590) were first described as noncatalytic cyclic GMP-binding domains in cyclic nucleotide phosphodiesterases (12). Subsequently they were detected in the phytochrome regulatory photoreceptors in plants and cyanobacteria, and they have since been detected in diverse bacterial and eukaryotic signaling proteins. GAF domains occur in EI of the PTS in E. coli as well as in the N termini of NifA-related EBPs. In E. coli, the FhlA EBP contains two N-terminal GAF domains, which bind formate, leading to transcriptional activation of the formate-hydrogen lyase system (26).
NifA, which regulates nitrogen fixation, and related EBPs have been well studied in several organisms. The N-terminal GAF domain of Azotobacter vinelandii NifA has been shown to bind 2-oxoglutarate and modulates the response to the regulatory protein NifL (5, 37). According to environmental conditions, NifL forms a repressive complex with NifA analogous to the PspA-PspF and HrpV-HrpR/HrpS interactions (43), although NifL shows no homology to either PspA or HrpV. In contrast, the NifA proteins from Herbaspirillum seropedicae and Azospirillum brasilense detect the nitrogen status via an N-terminal GAF domain, which may interact directly with the PII signal transduction protein (60). It has also been demonstrated that the A. vinelandii NifL-NifA system interacts directly with the PII protein (36, 38, 55, 57).
Other, less well characterized EBPs containing a single GAF domain include YgaA in E. coli and AQ_218 and AQ_1792 in Aquifex aeolicus. YgaA (also known as NorR) has recently been shown to activate transcription of a nitric oxide detoxification system (21, 28). The stimuli detected by AQ_218 and AQ_1792 are unknown, but AQ_218 was shown to bind upstream of the glnBA operon for nitrogen assimilation (65).
Another regulatory domain commonly found in bacterial EBPs is the Per, ARNT, and Sim (PAS) domain (49), which, despite sharing little sequence similarity, has a three-dimensional structure similar to that of GAF, indicating common ancestry (24). PAS domains (Pfam accession number PF00989) are used as signal sensors in many signaling proteins in archaea, bacteria, and eukaryotes (66, 74). PAS domain proteins often detect a signal by way of a bound cofactor such as heme, flavin, or a 4-hydroxycinnamoyl chromophore.
Like R. capsulatus NtrC, the E. coli TyrR protein is clearly homologous to the σ54-dependent EBPs but lacks the GAFTGA motif required for interaction with σ54-RNAP. Consequently, it is unable to activate transcription from σ54-dependent promoters (48) but nevertheless regulates σ70-dependent promoters for genes involved in degradation of aromatic amino acids. TyrR has an additional sequence at its N terminus, in addition to the PAS domain. This additional sequence appears to encode an aspartokinase, chorismate mutase, and TyrA (ACT) domain (4). ACT domains (Pfam accession number PF01842) are found in a wide range of metabolic enzymes that are regulated by amino acid concentration, with pairs of ACT domains binding specifically to a particular amino acid, leading to regulation of the linked enzyme. In solution TyrR normally exists as a dimer, but it undergoes a reversible conformational change to a hexamer in the presence of ATP and tyrosine (18, 71). In P. aeruginosa, the PA2449 sequence (trEMBL accession number Q9I133) appears to have a PAS domain and an ACT domain at its N terminus, and it has an intact GAFTGA motif and so is potentially able to activate transcription of the nearby predicted σ54-dependent promoter upstream of gcvH2, encoding a glycine cleavage protein. Another protein (PhhR, PA0873), which has the sequence GAFEGA for its motif, performs the function of TyrR in P. aeruginosa but requires σ54-RNAP for activation of the phh operon (59). This protein is therefore competent to regulate σ70-dependent promoters but interacts with the σ54 holoenzyme to activate transcription.
The V4R, or vinyl 4 reductase, domain (Pfam accession number PF02830) is predicted to bind small molecules such as hydrocarbons and is found in proteins involved in chlorophyll biosynthesis as well as in regulators of phenol degradation (2). One distinct phylogenetic clade of EBPs comprises proteins containing one or more V4R domains (Fig. 2). Those members of this group that have been investigated regulate the catabolism of aromatic hydrocarbons and phenols. For example, the prokaryotic EBP XylR is the central regulator of the toluene degradation pathway in Pseudomonas species. Genetic and biochemical data indicate that the N-terminal domain of the protein interacts directly with m-xylene, which renders the protein competent as a transcriptional activator. Three-dimensional models that go some way to explaining the molecular basis for interaction with substrate have been generated for the V4R domains of XylR and DmpR (17).
In addition to the response regulators and GAF, PAS, PRD, and V4R domains, additional domains are found in a few bacterial EBPs. For example, the Clostridium sticklandii Q9L4Q6 protein, also called PrdR, contains two cystathionine-β-synthase (CBS) domains in addition to a PAS domain at its N terminus (Fig. 1). CBS domains (Pfam accession number PF00571) are found, usually in pairs, in a wide range of proteins and are assumed to have a regulatory role (7). Mutations in the CBS domains of cystathionine-β-synthase lead to homocystinuria in humans. Among bacteria, CBS-containing proteins include IMP dehydrogenase (in E. coli and B. subtilis), glycine betaine transport ATP-binding protein (in E. coli and B. subtilis), acetoin utilization protein AcuB (in B. subtilis), and numerous proteins of unknown function. PrdR is believed to activate transcription of prdB, encoding the 26-kDa subunit of the selenoenzyme D-proline reductase in C. sticklandii (31, 32).
Pseudomonas putida protein O87788 has a fragment of an IclR domain (PF01614) at its N terminus. This domain is otherwise known only in the IclR family of transcriptional regulators, which includes the glyoxylate shunt repressor in E. coli (39). It is unclear whether the IclR domain is functional in protein O87788.
The BkdR protein (YqiR) is a σ54-dependent EBP that activates expression of an isoleucine and valine degradation pathway in B. subtilis. As Debarbouille et al. (15) point out, there are two PAS domains in the N-terminal portion of BkdR. However, Pfam also detects a fragment of a domain normally found in the enzyme acetaldehyde dehydrogenase (Pfam accession number PF02396).
Now that complete sequences are available for about 70 bacterial genomes, representing many of the major phyla, the pattern of occurrence of enhancer-dependent transcription is becoming clear (Table 2). Genes encoding the σ54-RNAP subunit and EBPs have been identified in numerous proteobacteria, including most of those whose genomes have been sequenced. The only exceptions are highly specialized parasites that have undergone reductive evolution. Genes for enhancer-dependent transcription are widespread, even among organisms with relatively small genomes. For example, in several species of Chlamydia, σ54 is one of only two alternative σ factors present (33).
In contrast, the green nonsulfur bacterium Chloroflexus aurantiacus and the cyanobacterium Synechocystis sp. strain PCC 6803 do not encode σ54 but contain homologues of TyrR, a protein that is clearly very closely related to the EBPs but that does not activate transcription at σ54-dependent promoters (48). The complete genome sequences of several high-GC gram-positive bacteria (actinobacteria) contain no obvious homologues of σ54 or any obvious EBPs, indicating a rather patchy distribution of the enhancer-dependent transcription system among bacteria (64).
It is noteworthy that the genome of the oral pathogen Fusobacterium nucleatum (34) does not appear to encode σ54, yet there are two open reading frames, FN1321 and FN1831, with homology to σ54-dependent EBPs and with response regulator domains at their N termini. This indicates that F. nucleatum must have either lost its σ54 or gained the EBP-like sequences by horizontal transfer, suggesting that the EBP-dependent regulon is subject to significant selection pressure. These EBPs contain the sequences GAFLGA and GSFEGA in place of the conserved GAFTGA motif. Site-directed mutagenesis of the GAFTGA regions of well-characterized EBPs (22, 70) suggests that these proteins may not be competent to activate transcription by σ54-RNAP. Thus, like TyrR and R. capsulatus NtrC, in which the GAFTGA sequence appears to have been deleted rather than replaced, these EBPs may have evolved to regulate transcription by the housekeeping σ70-RNAP holoenzyme.
Potential σ54-dependent promoters, the targets of EBPs, are fairly easy to detect in DNA sequences (53, 63, 64) and have a well conserved sequence that can be represented as YTGGCACGrNNNTTGCW (6). We used the PromScan program (http://www.promscan.uklinux.net) to search for close matches to this consensus σ54-dependent promoter sequence. The search used a frequency matrix based on a compilation of 186 known promoters from a range of bacteria (6). Previously, predictions made in this way have often been subsequently confirmed by experimental studies (see, e.g., references 27, 41, and 64). Furthermore, genes encoding regulatory proteins are often situated adjacent to their target genes on the chromosome. Therefore, by combining promoter prediction and positional information, we can glean clues as to the targets of these regulatory proteins. To better understand the patterns of variation in the enhancer-dependent regulon, we took this approach to analyzing the complement of EBPs in several diverse bacteria (Table 1).
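The general idea of searching with a frequency matrix can be sketched as follows. This is not the PromScan implementation, whose scoring details are not given here; it is a generic log-odds position weight matrix built from a set of aligned sites, with a pseudocount and a uniform background that are both assumptions. The function names are hypothetical, and the three training sites used in the usage note are invented sequences that merely fit the consensus, not members of the compilation of 186 known promoters.

```python
import math


def build_log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix (one dict per column) from equal-length sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(width):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        matrix.append({base: math.log2((counts[base] / total) / background)
                       for base in "ACGT"})
    return matrix


def score_window(matrix, window):
    """Sum the column scores; unknown bases get the column's lowest score."""
    return sum(column.get(base, min(column.values()))
               for column, base in zip(matrix, window.upper()))


def scan(matrix, dna, threshold):
    """Return (0-based start, score) for forward-strand windows >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(dna) - width + 1):
        score = score_window(matrix, dna[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits
```

A matrix could be built from illustrative sites such as `["CTGGCACGAATATTGCA", "TTGGCACGGCGCTTGCT", "CTGGCATGAAAATTGCA"]` and passed to `scan` together with a genome fragment and a score threshold. Unlike an exact consensus match, this kind of scoring tolerates deviations at individual positions, which is why matrix-based searches can recover promoters that differ from the consensus. A full analysis would also scan the reverse strand and then check whether each hit lies upstream of a gene adjacent to an EBP-encoding gene.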
The number of EBPs encoded in genomes is rather variable between organisms, as is the repertoire of functions regulated by enhancer-dependent transcription. Recently the complete sets of EBPs and their probable regulatory targets have been described for E. coli K-12 (53) and for S. enterica serovar Typhimurium, S. enterica serovar Typhi, and Yersinia pestis (63). However, complete genome sequences have since become available for additional strains of E. coli and Y. pestis, allowing us to compare the repertoires of EBPs between closely related organisms and hence make inferences about the recent evolutionary history of this regulon.
Two different biovars of the plague bacillus, Y. pestis, have been completely sequenced, representing two different plague pandemics. The genome of Y. pestis KIM (16) encodes three EBPs: NtrC (regulator of nitrogen assimilation), PspF (regulator of several stress responses), and YfhA (unknown function). However, Y. pestis CO92 encodes an additional EBP, YPO0712 (also annotated as FleQ) (46). Blast searches revealed high similarity to the FleR/FleQ proteins from Pseudomonas species, which activate transcription of several genes involved in flagellar motility in a σ54-dependent fashion. Indeed, YPO0712 is situated close to several operons containing genes for components of the flagellar apparatus, and several of these have good candidate σ54-dependent promoter sequences (data not shown). The regulation of motility and chemotaxis has been well studied in E. coli and Salmonella species and does not directly involve the σ54-dependent regulon. Since close homologues of FleR and FleQ are not found in any of the closely related enteric bacteria, it seems likely that an ancestor of Y. pestis CO92 has recently acquired the flagellar transcriptional regulator from a Pseudomonas-like organism. It is unclear whether Y. pestis KIM has subsequently lost this EBP and the cognate promoters or whether Y. pestis CO92 acquired them since diverging from KIM. As Deng et al. (16) point out, the biological significance of genes encoding flagellar components is obscured by the observation that Y. pestis is nonmotile. We may just be seeing the remnants of an obsolete motility apparatus, or alternatively, the flagellum might be required in some hitherto-undiscovered stage of the plague bacillus life cycle.
Complete genome sequences are available for both the E. coli laboratory strain K-12 (9) and the pathogenic (enterohemorrhagic) E. coli strain O157:H7 (23). Furthermore, the complete genome sequence for the very closely related Shigella flexneri 2a has recently become available (29). E. coli K-12 possesses 12 known σ54-dependent EBPs. Of these, a core of eight are conserved in O157:H7 and S. flexneri, three of which are also conserved in Y. pestis and Salmonella species (53, 63). One of those missing from O157:H7 and S. flexneri is AtoC. In E. coli K-12, AtoC regulates acetoacetate catabolism and is required for utilization of short-chain fatty acids. The absence of this regulator is consistent with the observation that O157:H7 is unable to utilize short-chain fatty acids (23).
In contrast to the situation in the enteric bacteria described above, the two sequenced strains of the gastric pathogen Helicobacter pylori each contain only one EBP-encoding sequence (1, 68). This EBP, called FlgR, is known to play a major role in the transcriptional regulation of flagellar motility (35, 61). However, as well as having reduced motility, an rpoN mutant of H. pylori also displayed reduced viability and altered cell morphology in stationary phase (20). Interestingly, we have found a potential σ54-dependent promoter sequence (AGGCATTGGGTTTGCG) upstream of the gene encoding the MreB cell shape-determining protein in H. pylori 26695 but not in H. pylori J99.
It is clear from the data in Table 2 and from other surveys of this regulon (6, 11, 42, 63, 64) that whereas some genes, e.g., glnA, are commonly controlled by EBPs in many genera, EBPs otherwise regulate diverse physiological functions in different species (and even in different strains within a species). Therefore, the elaborate σ54-EBP mode of transcription has apparently been recruited to provide fine-tuning of specific metabolic processes appropriate to the physiology of the particular host organism. It is presently unclear what selective pressures maintain this plasticity.
To better understand the evolution of domain arrangements in EBPs, we performed a phylogenetic analysis of EBP protein sequences. We obtained an alignment from the Pfam database (8) that contained all of the EBP central region (σ54 interaction module) sequences in SwissProt and trEMBL. We manually edited the alignment to remove obvious false positives and reduced redundancy by removing multiple sequences of greater than 80% identity, and we used this alignment to produce the neighbor-joining phylogenetic tree shown in Fig. 2. We also used maximum-likelihood and parsimony methods to build trees, which were broadly consistent with the neighbor-joining tree.
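The redundancy-reduction step can be illustrated with a simple greedy filter over pre-aligned sequences. The exact procedure and identity definition used for the analysis are not specified in the text, so the sketch below makes explicit assumptions: identity is computed over columns in which neither sequence has a gap, and a sequence is kept only if it is no more than 80% identical to every sequence already kept. The function names and the example alignment in the usage note are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def remove_redundant(aligned, threshold=80.0):
    """Greedily keep sequences that are <= threshold % identical to all kept.

    `aligned` maps sequence names to aligned sequences; the result preserves
    input order, so earlier entries take precedence over later near-duplicates.
    """
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= threshold
               for other in kept.values()):
            kept[name] = seq
    return kept
```

For an invented alignment such as `{"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "MSTPWLV-QR"}`, sequence `b` is 90% identical to `a` and is dropped, while `c` is retained. Because the filter is greedy, the outcome depends on input order; a clustering-based approach would be a reasonable alternative.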
From Fig. 2, it is evident that the sequences generally cluster with sequences from proteins with similar domain architectures. Those EBPs containing a V4R domain and those with PRD domains form single clusters, as do those containing both PAS and ACT domains. However, those EBPs containing a response regulator domain do not fall into a single phylogenetic cluster. Similarly, those EBPs containing GAF or PAS domains are polyphyletic. This suggests that the fusion of these sensory domains with the σ54 interaction module may have arisen several times during the evolution of the bacteria.
Signal transduction proteins are often poorly annotated in sequence databases, largely because of their complex domain organizations. This is particularly true for the bacterial EBPs. For example, E. coli YgaA has been misannotated as a response regulator. Therefore we took a domain-oriented approach to surveying the structural diversity of this important family of regulators and revealed several novel features as well as rationally classifying the more familiar members of the family. Furthermore, we have developed the powerful approach of combining positional analysis of genes with detection of binding sites in DNA to infer novel insights into the enhancer-dependent regulon in diverse completely sequenced organisms.
We thank Alex Bateman and the Pfam group for useful discussions and support.