Related organisms typically respond to a given cue by altering the level or activity of orthologous transcription factors, which, paradoxically, often regulate expression of distinct gene sets. Although promoter rewiring of shared genes is primarily responsible for regulatory differences among related eukaryotic species, in bacteria, species-specific genes are often controlled by ancestral transcription factors and regulatory circuit evolution has been further shaped by horizontal gene transfer. Modifications in transcription factors and in promoter structure also contribute to divergence in bacterial regulatory circuits.
Free-living organisms typically respond to a change in their surroundings or in cellular components by modifying the expression of multiple genes. In addition to sensors that detect chemical or physical cues and signaling molecules that transduce these stimuli within a cell, the responses to such changes often rely on DNA-binding proteins that interact with specific DNA sequences in promoters to activate or repress gene transcription. Thus, the regulatory circuit defined by the "wiring" between regulatory proteins and target genes determines the repertoire of gene products that an organism synthesizes upon encountering a particular signal or experiencing a developmental cue.
Related species usually rely upon orthologous regulatory systems to orchestrate responses to a given signal. In certain circumstances, the elicited responses are largely similar across species, indicating that orthologous regulatory systems control common cellular functions even if the species occupy different niches. In other circumstances, the responses are distinct, either in qualitative or quantitative terms, suggesting that the regulatory systems adopted by individual species are suited to particular habitats and lifestyles.
The different responses that orthologous regulatory systems can elicit when experiencing a given signal indicate that transcription circuits experience modifications in the interactions between regulators and their targets. These modifications may result in abilities that enable organisms to occupy new niches, thus contributing to the phenotypic diversity that exists among related species (McAdams et al., 2004). Yet, the observed rewiring of regulatory circuits does not necessarily result from adaptive processes, even when it causes significant changes in gene expression outputs (Lynch, 2007).
Until very recently, the knowledge of transcription regulatory circuits was limited to a few model organisms belonging to phylogenetically distant groups and sharing relatively few genes. This prevented the comparative analyses of orthologous regulatory circuitries across closely related organisms. Thus, the extent of modifications undergone by regulatory circuitries remained largely unknown. However, the availability of an increasing number of genome sequences, the use of genome-wide computational and experimental approaches to uncover entire sets of regulatory interactions in multiple species, as well as the engineering of organisms harboring the regulatory architecture from a related species, have made it possible to study empirically the patterns of evolution of regulatory circuits. These studies have revealed that differences in regulatory circuitry can play a significant role in the morphological and developmental evolution in animals (Carroll, 2005, 2008; Davidson, 2006), are responsible for the distinct expression of antibiotic resistance determinants in bacteria (Kato et al., 2007; Winfield and Groisman, 2004; Winfield et al., 2005), and may direct the colonization of new niches in unicellular eukaryotic organisms (Borneman et al., 2007; Tuch et al., 2008a). Therefore, tinkering with transcription factors, promoter sequences and circuit architecture, which are often referred to as “the regulatory genome” (Carroll, 2005, 2008; Davidson, 2006), has given rise to a variety of traits both in bacteria and eukaryotes.
The investigation of bacterial regulatory circuits has focused on a relatively small number of extant species (consider that most bacterial species cannot be cultured in the laboratory). Yet, these investigations suggest that the evolution of regulatory circuits in bacteria proceeds in a different manner from what has been described thus far in eukaryotes (Carroll, 2005, 2008; Davidson, 2006; Tuch et al., 2008b). The reasons for the differences are the following: First, unlike closely related eukaryotic organisms, closely related bacterial species exhibit significant differences in gene content due to the pervasiveness of horizontal gene transfer. This means that the spectrum of targets controlled by orthologous transcription factors can be quite different among related bacteria. Moreover, it can create conditions conducive to the modification of a transcription factor (Wagner and Lynch, 2008). And second, the relatively compact bacterial promoters, which are typically <100 nt in length, demand that transcription factor binding sites be located at particular positions and orientations to effectively regulate gene transcription. This is in contrast to the sparse and uneven distribution of binding sites that characterizes eukaryotic promoters (Gasch et al., 2004; Wray et al., 2003).
Here, we explore the evolution of transcriptional regulatory circuits in bacteria by analyzing the genetic bases for the qualitative and quantitative differences in gene expression outputs that often result in phenotypic variation among closely related species. We compare and contrast the modifications of regulatory circuits experienced in bacteria and eukaryotes. Finally, we discuss the molecular mechanisms by which the horizontal acquisition of genes affects both the composition of an ancestral regulon (i.e., the group of genes controlled by a regulatory protein) as well as the activity of ancestral transcription factors and the promoters on which they operate.
"… organisms of the most different sorts are constructed from the very same battery of genes. The diversity of life forms results from small changes in the regulatory systems that govern expression of these genes."
F. Jacob, Of flies, mice and men
Phenotypic differences between two closely related organisms can result when an ancestral regulatory protein controls the expression of a target gene(s) in only one of the two organisms. This could be due to the presence of a binding site for a transcription factor or an alternative sigma factor (the dissociable subunit of bacterial RNA polymerase that determines promoter specificity) in the promoter of the target gene in one of the organisms and its absence from the promoter of the orthologous gene in the other organism (Figure 1). Alternatively, the target gene itself may be present in the genome of only one of the two organisms harboring the ancestral transcription factor or sigma factor (Figure 1).
Because closely related eukaryotic species have similar gene contents (for example, less than 1% of mouse protein coding genes have no detectable homolog in the human genome and vice versa (Waterston et al., 2002)), most investigations into the genetic basis of morphological differences among animal species have focused on the contribution of cis-acting promoter sequences in homologous genes. Indeed, transcription factor binding sites can be readily gained or lost (Dermitzakis and Clark, 2002; Doniger and Fay, 2007; Moses et al., 2006), profoundly affecting transcriptional networks (Ihmels et al., 2005; Tanay et al., 2005). Furthermore, genome-wide experimental comparisons of the distribution of transcription factor binding sites have demonstrated that a large proportion of binding sites are not conserved even among closely related eukaryotic organisms (Borneman et al., 2007; Odom et al., 2007; Tuch et al., 2008a). It has been postulated that gains and losses of binding sites (as opposed to changes in the transcription factors) are the major mechanism contributing to the evolution of gene regulation in higher eukaryotes (Carroll, 2008; Jeong et al., 2008b; Wray, 2007). This is an attractive hypothesis because changes in the binding site(s) for a particular transcription factor can selectively alter transcription of a single gene without affecting the expression of other genes co-regulated by the same transcription factor (Carroll, 2008; Jeong et al., 2008b; Wray, 2007).
As in eukaryotes, orthologous transcription factors often govern the expression of different gene sets in related bacterial species. A genome-wide experimental analysis that combined chromatin immunoprecipitation (ChIP-chip) with gene transcription measurements demonstrated that only ~30% of the homologous genes directly controlled by the DNA-binding protein PhoP in Salmonella enterica or Yersinia pestis are PhoP-regulated in the other species (Perez et al., 2009). For example, the PhoP protein governs transcription of the regulatory gene rstA in Salmonella but not in Yersinia, and the converse is true for the putative aminidase gene y1877 (designated ybjR in Salmonella) (Perez et al., 2009). The Salmonella RstA protein modulates the levels of the alternative sigma factor RpoS (Cabeza et al., 2007) and of a Fur-repressed iron transporter (Jeong et al., 2008a), highlighting that transcriptional rewiring of a regulatory gene can have pleiotropic effects. Similarly, the orthologs of 10 genes whose expression is controlled by the stress response sigma factor RpoE in E. coli and Shigella do not appear to harbor RpoE-regulated promoters in Salmonella; likewise, RpoE-regulated promoters are found upstream of five open reading frames in Salmonella but not upstream of the corresponding orthologs in E. coli and Shigella (Rhodius et al., 2006). Consistent with the notion that DNA-binding regulatory proteins can readily acquire new targets, the CRP regulon of E. coli underwent massive changes in strains that had evolved independently under laboratory conditions for the relatively short period of 20,000 generations (Cooper et al., 2008).
The prevalence of transcriptional rewiring in bacteria is presently not clear because only a small number of experimental studies have addressed its occurrence. Unfortunately, purely computational comparisons of bacterial genomes cannot be used to infer transcriptional rewiring events because such comparisons usually assume that if a transcription factor controls a particular target(s) in a given species, such regulatory interaction(s) will be conserved in another species, without taking into account the presence or absence of binding sites for the investigated transcription factor. And even in instances when binding sites are computationally identified, the uncertainty about the functionality of predicted sites poses limitations to this approach. Indeed, interpretations can be difficult even when computational studies are accompanied by genome-wide identifications of DNA segments bound by a transcription factor in vivo, such as ChIP-chip assays. This is because only a subset of the sites where a transcription factor binds in vivo affect transcription (Shimada et al., 2008). Thus, the number of binding sites for a transcription factor may far exceed the number of genes that it regulates.
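The distinction drawn above between sites that are bound in vivo and genes that are actually regulated can be made concrete with a small sketch. It assumes two hypothetical inputs: a set of genes whose promoters were bound in a ChIP-chip experiment, and a table of expression ratios between wild-type and regulator-mutant strains. The gene names, the ratios and the cutoff of 1.0 are invented for illustration and are not taken from any of the cited studies.

```python
def classify_targets(bound_genes, expression_log2_ratio, threshold=1.0):
    """Split genes by in vivo binding and by expression change in a regulator mutant.

    bound_genes: set of genes with a bound promoter (hypothetical input).
    expression_log2_ratio: gene -> log2(wild type / regulator mutant) (hypothetical).
    threshold: assumed cutoff on |log2 ratio| for calling a gene regulated.
    """
    regulated = {g for g, r in expression_log2_ratio.items() if abs(r) >= threshold}
    return {
        "direct_targets": bound_genes & regulated,       # bound and regulated
        "bound_only": bound_genes - regulated,           # bound, no detectable effect
        "regulated_not_bound": regulated - bound_genes,  # effect without detected binding
    }


# Invented placeholder data, for illustration only.
bound = {"geneA", "geneB", "geneC", "geneD"}
ratios = {"geneA": 3.2, "geneB": 0.1, "geneC": -2.0, "geneD": 0.3, "geneE": 1.8}

result = classify_targets(bound, ratios)
for category, genes in result.items():
    print(category, sorted(genes))
```

In this toy data set, four promoters are bound but only two of the bound genes change expression, which is the situation in which the number of binding sites exceeds the number of regulated genes.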
Ascribing phenotypic behaviors solely on the basis of the predicted presence or absence of transcription factor binding sites in promoter regions can also lead to incorrect conclusions because related bacterial species sometimes rely on different regulatory architectures to express orthologous genes. For instance, growth in low Mg2+ or with Fe3+ renders Y. pestis resistant to the antibiotic polymyxin B (Winfield et al., 2005). This is accomplished by the presence of binding sites for the Mg2+-responsive transcription factor PhoP and the Fe3+-responsive transcription factor PmrA in the promoters of the polymyxin B resistance pbgP and ugd genes (Winfield et al., 2005) (Figure 2). Because the pbgP promoter of S. enterica lacks a PhoP box but harbors a PmrA box, one might predict that S. enterica displays polymyxin B resistance when experiencing Fe3+ but not in low Mg2+ environments. However, Salmonella does promote pbgP and ugd expression and becomes resistant to polymyxin B in low Mg2+ by utilizing an indirect pathway involving the PhoP-activated PmrD protein, a post-translational activator of PmrA (Kato and Groisman, 2004) (Figure 2). Therefore, even if functional transcription factor binding sites can be accurately identified, it is not always possible to infer expression outputs without experimental studies.
The vast phenotypic differences that exist among related bacterial species reflect the extensive modifications that their genomes have undergone, including the acquisition and subsequent loss of genes (McAdams et al., 2004; Pallen and Wren, 2007). For instance, a three-way comparison of the genomes corresponding to uropathogenic, enterohemorrhagic and commensal E. coli strains revealed that only 39.2% of the combined (that is, non-redundant) set of protein-coding genes is common to all three genomes (Welch et al., 2002). These genome differences are due, to a large extent, to horizontal gene transfer events from other organisms (as opposed to selective gene loss from an ancestral E. coli). Likewise, the Gram-negative pathogen S. enterica serovar Typhimurium has acquired and retained more than 200 discrete regions of >100 bp in length since it diverged from its last common ancestor with E. coli. These regions include ~1,400 open reading frames (ORFs), or slightly more than one-quarter of Salmonella's total genetic material (Porwollik and McClelland, 2003).
Horizontal gene transfer can rapidly endow bacteria with new traits, such as virulence, resistance to antibiotics, and the ability to utilize certain compounds as carbon or energy sources (Ochman et al., 2000; Pallen and Wren, 2007). The acquired DNA sequences are often clustered in the genome in regions designated islands, and these typically include structural genes mediating the ability (for instance, those encoding a specialized secretion system and the proteins that are secreted via such system) as well as regulatory genes that govern the expression of the structural genes. Acquired DNA sequences benefit a recipient bacterium only if they are expressed at the right time, in the correct locale, and in a coordinated manner with ancestral genes. Therefore, even when a foreign DNA segment includes a regulatory gene(s), the newly acquired genes are usually embedded into ancestral regulatory networks (Dorman, 2009). For example, the SPI-2 pathogenicity island of S. enterica harbors a large number of structural genes that are coordinately regulated by the SsrB/SpiR two-component system, which is also encoded within the SPI-2 locus (Fass and Groisman, 2009). Expression of the SsrB and SpiR proteins is, in turn, under the control of the ancestral regulatory systems OmpR/EnvZ and PhoP/PhoQ, the regulatory protein SlyA, and several nucleoid-associated proteins (Fass and Groisman, 2009).
The pervasive effect of horizontal gene transfer has resulted in many ancestral regulatory proteins regulating primarily horizontally-acquired genes. For instance, over half of the targets of regulation directly controlled by the ancestral PhoP protein in Salmonella have no homologs outside Salmonella spp. (Perez et al., 2009). Thus, in addition to controlling the SPI-2 genes described above, PhoP governs the expression of genes residing in other pathogenicity islands and islets scattered in the Salmonella genome (Groisman, 2001). This situation is not unique to a particular transcription factor or species as the RovA protein controls transcription of several horizontally-acquired genes in Yersinia enterocolitica and Y. pestis, including some mediating the ability of these pathogens to cause disease (Cathelyn et al., 2007). Furthermore, the ancestral TcpP and ToxR proteins from Vibrio cholerae regulate the expression of ToxT, a horizontally-acquired transcription factor that promotes expression of the cholera toxin genes residing in a bacteriophage genome (Bina et al., 2003; Krukonis and DiRita, 2003). As a general rule, horizontally-acquired genes tend to experience more complex regulation than ancestral genes (Price et al., 2008; Rajewsky et al., 2002), perhaps as a way to prevent the detrimental effects that could result from the non-physiological expression of a product that is new to the organism.
Computational studies that probed the transcriptional regulatory networks of E. coli and Bacillus subtilis across hundreds of bacterial species by estimating the presence and absence of transcription factors and target genes have found that relatively few transcription factor-target gene pairs are maintained beyond closely related species (Lozada-Chavez et al., 2006; Madan Babu et al., 2006; Price et al., 2007). Moreover, target genes tend to be maintained to a greater extent than the transcription factors that control their expression (Balaji and Aravind, 2007). For example, examination of 30 sequenced γ-proteobacterial genomes for the presence of transcription factors and targets identified in E. coli K-12 revealed that only 13 of 143 transcription factors were present in all 30 genomes (Hershberg and Margalit, 2006). The 13 are global regulators or are located at the top of regulatory hierarchies thereby governing a plethora of biological processes (Hershberg and Margalit, 2006), which is unlike the transcription factors that are present exclusively in the close relatives of E. coli from the family Enterobacteriaceae. In other words, it appears that the more targets a transcription factor has, the broader its phylogenetic distribution.
The likelihood of a lineage losing a transcription factor is affected by the mode by which it exerts its effect: a repressor gene tends to be lost from a genome only after the loss of its regulated targets whereas an activator gene may be lost even when its regulated targets remain in the genome (Hershberg and Margalit, 2006). That a repressor gene is lost only after its regulated targets have been eliminated (either by gene loss or rewiring events) is attributed to the reduction of fitness often resulting from constitutive expression of the derepressed genes (Hershberg and Margalit, 2006). An example is the reduction in the ability of a bacterial pathogen to colonize its animal host (Guo et al., 2008). By contrast, loss of an activator gene does not normally affect the ability of an organism to express its regulated targets under different circumstances and could even enhance fitness if the targets are part of a pathway no longer needed by the organism. As a result, E. coli K-12 repressors that control many targets are retained in closely related species whereas activators with a comparable number of targets are often absent from those species (Hershberg and Margalit, 2006).
The targets of regulation of an ancestral regulatory protein can be divided into two groups based on their phylogenetic distribution: a core set, which is shared among all (or most) species that harbor the regulatory protein, and a variable set consisting of species-specific genes. Investigations carried out with different regulatory systems suggest that these two groups of targets play largely different roles. The core set of genes performs two types of functions: modulating the amount of the active form of the regulatory protein and coping with the environmental change that activated the protein. As described below, the two tasks carried out by core regulon members have been recognized in regulatory proteins that are structurally different, operate by dissimilar mechanisms and respond to distinct signals.
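The partition into a core set and a variable set amounts to a simple set computation once the regulon of each species is expressed in terms of shared ortholog groups. The sketch below is illustrative only: the species labels and ortholog-group identifiers are invented placeholders, and the `min_fraction` parameter is an assumed way of encoding "all (or most) species".

```python
def split_regulon(regulons, min_fraction=1.0):
    """Partition regulon members into a core set and species-specific variable sets.

    regulons: species -> set of ortholog-group IDs controlled by the regulator.
    min_fraction: fraction of species that must share a target for it to count
                  as core (1.0 = all species; a lower value approximates "most").
    """
    n_species = len(regulons)
    counts = {}
    for targets in regulons.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    core = {t for t, c in counts.items() if c / n_species >= min_fraction}
    variable = {sp: targets - core for sp, targets in regulons.items()}
    return core, variable


# Invented placeholder regulons, for illustration only.
regulons = {
    "species_A": {"og_feedback", "og_transport", "og_island_1", "og_island_2"},
    "species_B": {"og_feedback", "og_transport", "og_plasmid_1"},
    "species_C": {"og_feedback", "og_transport"},
}

core, variable = split_regulon(regulons)
print("core:", sorted(core))
for species, targets in variable.items():
    print(species, "variable:", sorted(targets))
```

With these placeholder sets, the two shared groups form the core, and the remaining targets are reported as species-specific.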
The alternative sigma factor RpoE is activated in response to envelope stress in several Gram-negative bacteria. Accordingly, ~60% of the core RpoE regulon governs the synthesis and assembly of the lipopolysaccharide and outer membrane proteins or encodes the transcriptional circuitry that maintains homeostasis of these constituents in the outer membrane (Rhodius et al., 2006). Similarly, the core regulon of the LexA protein, which mediates the SOS DNA-damage response, encompasses genes involved mainly in DNA repair and fork stabilization, and in autoregulation of LexA protein levels. The latter is achieved by LexA repressing its own transcription as well as that of the recA gene, which codes for a protein that promotes the autocatalytic cleavage of the LexA protein (Erill et al., 2007). Likewise, the core regulon governed by the Mg2+-responsive PhoP protein mediates the adaptation to Mg2+-limiting environments by promoting transcription of orthologous and non-orthologous Mg2+ transporters as well as proteins that modify Mg2+-binding sites in the bacterial cell envelope. In addition, the PhoP core regulon dictates the levels and activity of the PhoP protein through positive (Shin et al., 2006) and negative (Perez et al., 2009) feedback loops.
Orthologous transcription factors also regulate the expression of a variable set of genes in different bacterial species, suggesting that the species-specific regulon members contribute to survival in the niche where each organism proliferates. For instance, the variable portions of the RpoE (Rhodius et al., 2006) and PhoP (Perez et al., 2009) regulons in the family Enterobacteriaceae have been implicated in pathogenicity-associated functions. Other species-specific regulon members may affect the spread of mobile genetic elements carrying virulence or antibiotic resistance determinants. For example, the LexA protein represses a promoter required for lytic development of the CTX bacteriophage of Vibrio cholerae, which carries the cholera toxin gene (Quinones et al., 2006). CTX phage spread is then favored when V. cholerae experiences DNA damage, which promotes the RecA-dependent autocleavage of the LexA protein. RecA also stimulates autocleavage of the SetR repressor encoded in the SXT mobile element of V. cholerae, which harbors genes conferring resistance to several antibiotics (Beaber et al., 2004). SetR inactivation results in derepression of the transfer genes, thereby stimulating horizontal dissemination of antibiotic resistance determinants.
The regulatory architectures promoting gene expression vary considerably across and within species, ranging from direct transcriptional control to multi-stage circuits involving feedback loops, feedforward loops and regulatory cascades (Alon, 2007). Because regulatory architecture is a major determinant of gene expression output, restructuring the interactions between orthologous regulatory proteins and orthologous target genes has the potential of provoking profound changes on the levels or kinetics with which gene products are synthesized. In other words, even when the same target gene(s) is turned on in response to the same cue, the timing of gene expression can be quite different depending on both the general architecture and the particular components that make up a given architecture (Alon, 2007).
As discussed above, S. enterica and Y. pestis utilize different regulatory circuits to promote transcription of the polymyxin B-resistance pbgP and ugd genes when experiencing low Mg2+ environments (Winfield et al., 2005) (Figure 2). Mathematical modeling of the two circuits revealed that the indirect pathway operating in Salmonella exhibits signal amplification (that is, higher mRNA levels for a given signal level), expression persistence (the time an mRNA is present after a cell is switched from inducing to repressing conditions) and expression delays (the time it takes for the appearance of an mRNA after the organism first experiences an inducing condition) relative to the direct pathway present in Yersinia. By creating a Salmonella strain harboring the Y. pestis regulatory circuit controlling the polymyxin B resistance genes instead of its own, it was demonstrated that the indirect pathway operating in wild-type Salmonella confers higher levels of polymyxin B resistance than the direct one from Y. pestis (Kato et al., 2007).
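The three properties attributed to the indirect pathway can be illustrated with a deliberately simplified toy model. The sketch below is not the model of Kato et al. (2007): it assumes first-order kinetics, equal decay rates for all species, and a single intermediate (a stand-in for a connector protein) between signal and target mRNA. The parameters `k`, `d` and `gain` are arbitrary assumptions. In particular, the amplification seen here follows directly from choosing `gain` greater than 1, whereas the delay and persistence arise from the extra step in the cascade.

```python
def simulate(signal, dt=0.01, k=1.0, d=1.0, gain=2.0):
    """Euler-integrate toy direct and indirect circuits driven by a signal series.

    direct:   d(m)/dt = k*s - d*m
    indirect: d(x)/dt = k*s - d*x   (x is a hypothetical intermediate)
              d(m)/dt = gain*k*x - d*m
    All rate constants are arbitrary assumptions, not measured values.
    """
    direct = connector = indirect = 0.0
    direct_trace, indirect_trace = [], []
    for s in signal:
        d_direct = k * s - d * direct
        d_connector = k * s - d * connector
        d_indirect = gain * k * connector - d * indirect
        direct += dt * d_direct
        connector += dt * d_connector
        indirect += dt * d_indirect
        direct_trace.append(direct)
        indirect_trace.append(indirect)
    return direct_trace, indirect_trace


def time_to_fraction(trace, fraction, dt=0.01):
    """First time at which a trace reaches the given fraction of its maximum."""
    target = fraction * max(trace)
    for i, value in enumerate(trace):
        if value >= target:
            return i * dt
    return None


# Signal on for 10 time units, then off for 10 time units.
signal = [1.0] * 1000 + [0.0] * 1000
direct, indirect = simulate(signal)

delay_direct = time_to_fraction(direct, 0.5)
delay_indirect = time_to_fraction(indirect, 0.5)
# Fraction of the plateau level still present 2 time units after signal removal.
persist_direct = direct[1199] / direct[999]
persist_indirect = indirect[1199] / indirect[999]

print("time to half-maximum:", delay_direct, delay_indirect)
print("plateau level:", direct[999], indirect[999])
print("fraction remaining after shut-off:", persist_direct, persist_indirect)
```

In this sketch the cascade reaches half of its maximum later than the direct circuit and retains a larger share of its plateau level after the signal is removed, which is the qualitative behavior described for the indirect pathway.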
The non-orthologous replacement of a component(s) of a regulatory circuit also has the potential of altering the expression output of a transcription factor. For example, the levels of the alternative sigma factor RpoS increase when E. coli and Salmonella experience specific nutrient limiting conditions (Bougdour et al., 2008). In the case of phosphate limitation, both organisms promote expression of the highly conserved IraP protein that binds to and antagonizes the RssB protein, which is responsible for delivering RpoS to the ClpXP protease for degradation (Figure 3) (Bougdour et al., 2006; Tu et al., 2006). In the case of magnesium limitation, IraP mediates RpoS accumulation in Salmonella but not in E. coli because the iraP promoter harbors a binding site for and is regulated by the Mg2+-responsive PhoP protein in the former but not in the latter species (Tu et al., 2006). However, E. coli does accumulate RpoS and promote transcription of RpoS-regulated genes in the low Mg2+ conditions that activate the PhoP protein by utilizing the E. coli-specific PhoP-activated iraM gene (Figure 3) (Bougdour et al., 2008). Despite exhibiting limited amino acid identity to one another, both IraP and IraM interact with the RssB protein (Bougdour et al., 2008). That growth in low Mg2+ promotes greater RpoS accumulation in Salmonella than in E. coli (Tu et al., 2006) raises the possibility of the Salmonella IraP protein being more efficient at stabilizing RpoS than the E. coli IraM protein.
The IraP and IraM proteins described above, and the PmrD protein, which connects the PhoP/PhoQ and PmrA/PmrB two-component systems (Kato and Groisman, 2004), belong to an emerging class of proteins termed two-component system connectors that integrate signal transduction pathways at a post-translational level (Mitrophanov and Groisman, 2008). These connector proteins, which play roles in a variety of physiological functions including sporulation, competence, antibiotic resistance, and the transition to stationary phase, provide a source of architectural diversity to bacterial regulatory circuits.
Streptococcus pneumoniae and Streptococcus mutans are related Gram-positive species that rely on different circuits to govern competence (the physiological state that allows naturally transformable bacteria to take up naked DNA from the environment). In S. pneumoniae, competence is induced by the small peptide hormone CSP (competence stimulating peptide), which is detected by the membrane sensor kinase ComD. The latter protein promotes phosphorylation of the regulator ComE, and phosphorylated ComE directly activates expression of the comX gene encoding a competence-specific sigma factor that, in turn, directs transcription of the genes encoding the machinery for uptake and processing of DNA (Claverys et al., 2006). Even though S. mutans also relies on the alternative sigma factor ComX to promote transcription of its competence genes, it lacks comD and comE orthologs and uses the sensor BlpH and regulator BlpR to control the production, activity, and stability of ComX (Martin et al., 2006). (Note that the blpH and blpR genes are not orthologs of the S. pneumoniae comD and comE genes.)
The regulation of flagella synthesis and assembly provides a striking example of non-orthologous regulators governing a conserved biological function. The flagellum, a long thin filament that protrudes from the cell body, enables bacterial movement through liquid and highly viscous environments as well as surfaces (McCarter, 2006). Dozens of structural and regulatory genes are involved in the assembly of this organelle. The temporal pattern of gene expression and protein production, for the most part, conforms to the order in which the products are assembled (Chevance and Hughes, 2008). Although different bacterial species express the flagellar genes in a similar order, the transcription factors that are responsible for the expression of each tier of genes can be vastly different. For example, the γ-proteobacterial species E. coli and S. enterica have three tiers of gene control with the master regulators FlhD and FlhC at the top of the hierarchy governing the expression of the genes coding for the basal body and hook regions as well as for a specific alternative sigma factor (known as FliA or σ28), which, in turn, directs transcription of the flagellin gene itself (McCarter, 2006).
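The tiered organization described for E. coli and S. enterica can be written down as a small directed graph from which the tier of each gene class follows. The sketch below encodes only the relationships stated above; the node labels are informal shorthand chosen for this example, and the function assumes the cascade contains no feedback loops, which is a simplification.

```python
def assign_tiers(regulates):
    """Assign each node a tier: 1 for unregulated nodes, else 1 + deepest regulator.

    regulates: regulator -> list of targets. Assumes an acyclic cascade.
    """
    nodes = set(regulates) | {t for targets in regulates.values() for t in targets}
    parents = {node: [] for node in nodes}
    for regulator, targets in regulates.items():
        for target in targets:
            parents[target].append(regulator)

    tiers = {}

    def tier(node):
        if node not in tiers:
            tiers[node] = 1 + max((tier(p) for p in parents[node]), default=0)
        return tiers[node]

    for node in nodes:
        tier(node)
    return tiers


# Shorthand labels for the cascade described in the text.
cascade = {
    "flhDC": ["basal_body_and_hook_genes", "fliA"],
    "fliA": ["flagellin_gene"],
}

tiers = assign_tiers(cascade)
for name in sorted(tiers, key=tiers.get):
    print(tiers[name], name)
```

The same function could be applied to a cascade with a different master regulator or an additional tier by changing only the dictionary, which mirrors the point that the order of expression is conserved while the regulators differ.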
The FlhD and FlhC proteins appear to be restricted to the β- and γ-Proteobacteria (Smith and Hoover, 2009). Moreover, only 44% of the β- and γ-Proteobacteria species predicted to be flagellated contain flhDC orthologs, suggesting the existence of other master regulators within members of these two groups of organisms (Smith and Hoover, 2009). For example, the γ-Proteobacteria species Pseudomonas aeruginosa harbors a four-tiered regulatory cascade with the FleQ protein at the top and involving two alternative sigma factors: σ54 and σ28 (McCarter, 2006). And in the α-proteobacterial organism Caulobacter crescentus, flagellar gene expression is governed by the essential cell cycle regulator CtrA, and late gene expression requires the σ54-dependent transcription factor FlbD (Smith and Hoover, 2009). At least another six non-orthologous master regulators of flagellar genes have been characterized in other species (Smith and Hoover, 2009). In addition, organisms display specificity with respect to the regulatory factors and signals dictating expression of the master regulators (McCarter, 2006).
It has been suggested that transcription factors can be divided into two groups, global and local regulators, based on their DNA binding specificity, being low for the former and high for the latter (Lozada-Chavez et al., 2008; Rajewsky et al., 2002). The lower specificity exhibited by global regulators in comparison to that displayed by local regulators may enable them to control numerous targets. Global regulators appear to evolve more slowly than other regulators with respect to both their primary sequences and the target genes they control (Rajewsky et al., 2002), making it more likely for phenotypic variation to arise from allelic differences in local regulators. Consistent with this notion, amino acid sequence divergence between the orthologous regulatory proteins Nra/RofA from two S. pyogenes strains is responsible for quantitative differences in the output of a circuit governing the levels of pili in this Gram-positive pathogen (Lizano et al., 2008). The nra/rofA allele from a strain that preferentially colonizes the human throat promotes higher levels of pilus expression than the allele from the strain that colonizes the skin. Thus, the particular allele of a regulatory protein affects the human tissue preferentially colonized by S. pyogenes.
Disparate expression outputs may result from allelic differences in sensing and signaling proteins; for instance, members of the Gram-positive genera Bacillus and Clostridium proliferate in diverse ecological niches and form dormant spores that ensure survival under adverse environmental conditions. The initiation of sporulation is a tightly regulated process triggered by signals detected by five different sensor kinases that initiate a phosphorelay, which eventually results in the phosphorylation of the regulator Spo0A, a transcription factor that controls the expression of the sporulation genes (Dworkin and Losick, 2005). A bioinformatic analysis of the amino acid sequence of the sensors and regulatory proteins involved in this process indicated that there is significant variation in the size, domain composition and putative membrane-spanning regions of the signal input domains of the sensor kinases across related species (Stephenson and Hoch, 2002). This is in contrast to the striking conservation found in the signaling domains of the sensor kinases, and in the protein-protein and protein-DNA contacts of the entire phosphorelay. The detected variation in the sensing domains of the sensor kinases may reflect that different species start sporulation in response to distinct signals.
The PmrD protein enables Salmonella to express genes regulated by the PmrA protein, such as those mediating resistance to the antibiotic polymyxin B, in response to the low Mg2+ signal that activates the PhoP protein (Kato and Groisman, 2004). By contrast, E. coli cannot express PmrA-activated genes in low Mg2+ because it harbors a highly divergent PmrD protein that is only 55.3% identical to the Salmonella PmrD (Winfield and Groisman, 2004), much lower than the 90% median amino acid identity between E. coli and Salmonella proteins (McClelland et al., 2001). Replacement of the E. coli pmrD gene by the Salmonella ortholog enables E. coli to transcribe PmrA-regulated genes under PhoP-inducing conditions (Winfield and Groisman, 2004) (Figure 2). The pmrD gene appears to be evolving in a non-neutral fashion in E. coli (Winfield and Groisman, 2004), which may enable interactions with a yet-to-be-identified partner(s) in this species. Alternatively or in addition, the divergence of the E. coli PmrD, which prevents expression of PmrA-regulated genes under PhoP-inducing conditions, is perhaps a means to avoid the hypersensitivity to deoxycholic acid resulting from pmrA hyperactivation (Froelich et al., 2006) when the organism experiences inducing conditions for the PhoP/PhoQ system.
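The identity figures quoted above (55.3% for PmrD versus a 90% median) are pairwise amino acid identities. The cited studies' exact procedure is not described here, so the following is only a minimal sketch of one common convention: count identical residues over the aligned columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical and are not the real PmrD sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: both inputs are already aligned (equal length, gaps marked
    with `gap`). Columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment (hypothetical residues): 4 identical out of 5 ungapped columns.
    print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different numbers, so identities from different sources are comparable only when the denominator is the same.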
"Evolution proceeds like a tinkerer who, during millions of years, has slowly modified his products, retouching, cutting, lengthening, using all opportunities to transform and create"
F. Jacob, The possible and the actual
The number of genes in the sequenced bacterial genomes varies over 40-fold: there are only 182 genes in the symbiont Carsonella ruddii and nearly 8,000 in the soil bacterium Solibacter usitatus (van Passel et al., 2008). However, as realized in the early bacterial genome projects, the number of genes devoted to gene regulation is not directly proportional to genome size; rather, the fraction of a bacterial genome devoted to gene regulation increases with genome size (Stover et al., 2000). For example, the 4.3 Mbp E. coli genome devotes ~6% to gene regulation whereas the portion dedicated to regulatory functions is ~9% and ~12% for the larger P. aeruginosa (6.3 Mbp) and Streptomyces coelicolor (8.7 Mbp), respectively. At the other end of the spectrum, regulatory networks have essentially disappeared from the vastly reduced genomes of many bacterial symbionts, possibly because these organisms live in relatively constant host environments, and rely primarily on the interactions between RNA polymerase and promoter sequences to express their genes at the required levels.
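To make the scaling concrete, the short sketch below multiplies the genome sizes and regulatory fractions quoted in the preceding paragraph. The figures are taken from the text as approximate values; the function names are illustrative only.

```python
# Figures quoted in the text: (genome size in Mbp, approximate fraction of
# the genome devoted to gene regulation).
GENOMES = {
    "E. coli": (4.3, 0.06),
    "P. aeruginosa": (6.3, 0.09),
    "S. coelicolor": (8.7, 0.12),
}


def regulatory_mbp(size_mbp: float, fraction: float) -> float:
    """Approximate amount of sequence devoted to regulation, in Mbp."""
    return size_mbp * fraction


def summarize(genomes: dict) -> dict:
    """Return {species: regulatory Mbp}, rounded to two decimals."""
    return {name: round(regulatory_mbp(size, frac), 2)
            for name, (size, frac) in genomes.items()}


if __name__ == "__main__":
    for name, mbp in summarize(GENOMES).items():
        print(f"{name}: ~{mbp} Mbp devoted to regulation")
```

With these numbers, going from E. coli to S. coelicolor roughly doubles genome size (4.3 to 8.7 Mbp) but roughly quadruples the sequence devoted to regulation (about 0.26 to about 1.04 Mbp), which is the faster-than-proportional growth the paragraph describes.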
The evolution of bacterial regulatory circuits appears to follow two main mechanisms (Martínez-Antonio et al., 2006; Teichmann and Babu, 2004). On the one hand, it entails duplication of the genes for a transcription factor and its regulated targets, which has been estimated to account for 5–8% of the regulatory interactions observed in bacterial genomes (Rajewsky et al., 2002). In the case of E. coli and B. subtilis, the majority of global regulators belong to different paralogous groups. On the other hand, the evolution of bacterial regulatory circuits involves changes in the connections between preexisting elements as well as the incorporation of new regulatory and structural genes as a consequence of horizontal gene transfer.
One may hypothesize that an organism's fitness may be hampered upon the elimination of certain branches of its circuits or by the formation of new regulatory connections, at least when these events first happen. However, this does not appear to be the case because 95% of the connections artificially added to the circuitry of E. coli were tolerated without significantly affecting bacterial growth under standard laboratory culture conditions (Isalan et al., 2008). Whereas it is unclear whether this way of "sampling" new transcriptional regulatory interactions resembles how organisms normally evolve their regulatory circuits, this finding does suggest that the formation of new connections between genes may rarely be detrimental to bacterial survival.
The emergence of novel regulatory circuits may entail the acquisition or invention of new genes as well as the rewiring of connections with ancestral transcription factors. For example, it has been proposed that the PmrD-mediated pathway, which enables S. enterica to express PmrA-dependent genes under PhoP-inducing conditions (Kato and Groisman, 2004), emerged when the ancestral strain that gave rise to the lineage resulting in Klebsiella pneumoniae, S. enterica, Shigella flexneri and E. coli acquired (or "invented") the pmrD gene (Mitrophanov et al., 2008), as opposed to the repeated loss of the pmrD gene by all enteric species except for K. pneumoniae, S. enterica, S. flexneri and E. coli. This proposal implies that the circuit present in Yersinia spp., which lacks PmrD but harbors binding sites for both the PmrA and PhoP proteins in certain promoters (Winfield et al., 2005), corresponds to the ancestral state. K. pneumoniae appears to be an intermediate in the evolution of the PmrD-mediated pathway because it harbors a circuit that combines direct control of PmrA-activated promoters by PhoP, as in Yersinia spp., with a pmrD gene that enables activation of the PmrA protein, as in Salmonella (Figure 2) (Mitrophanov et al., 2008).
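The alternative wirings described in the text can be summarized as a small data structure. The sketch below is a deliberate simplification, not a model from the cited studies: it reduces each species' circuit to two yes/no features and asks whether either route connects the low Mg2+ signal to PmrA-activated genes. The class name, field names and the two-feature abstraction are assumptions made for illustration. The Salmonella entry assumes no direct PhoP route, and the E. coli entry encodes the statement that its divergent PmrD does not activate PmrA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    """Simplified wiring between PhoP and PmrA-activated promoters."""
    direct_phop_sites: bool  # PhoP binding sites in PmrA-activated promoters
    functional_pmrd: bool    # a PmrD protein able to activate PmrA

    def pmra_targets_on_in_low_mg(self) -> bool:
        """True if at least one route links low Mg2+ (PhoP) to PmrA targets."""
        return self.direct_phop_sites or self.functional_pmrd


# Feature assignments follow the text; they are a coarse abstraction.
CIRCUITS = {
    "Yersinia spp.": Circuit(direct_phop_sites=True, functional_pmrd=False),
    "K. pneumoniae": Circuit(direct_phop_sites=True, functional_pmrd=True),
    "S. enterica": Circuit(direct_phop_sites=False, functional_pmrd=True),
    "E. coli": Circuit(direct_phop_sites=False, functional_pmrd=False),
}

if __name__ == "__main__":
    for species, circuit in CIRCUITS.items():
        state = "expressed" if circuit.pmra_targets_on_in_low_mg() else "not expressed"
        print(f"{species}: PmrA-activated genes {state} in low Mg2+")
```

In this abstraction, three different wirings produce the same qualitative output, and only E. coli differs, which matches the text's point that similar responses can rest on distinct circuit architectures.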
Mutations in regulatory genes can eliminate the function of entire circuits without affecting the structural genes (which may still be targeted by other regulatory proteins). For instance, the absence of flagella and the resulting lack of motility exhibited by strains of Y. pestis and Shigella flexneri have been ascribed to mutations in the master regulatory gene flhD and the alternative sigma factor gene fliA, respectively (Hershberg and Margalit, 2006). This illustrates how changes in a single regulatory gene can shut down an entire morphogenic pathway that requires dozens of gene products.
Certain regulatory architectures are present across bacteria and eukaryotes (Milo et al., 2002; Shen-Orr et al., 2002) whereas others appear to be more prevalent in one of these two domains of life. For example, multicomponent regulatory loops are characteristic of eukaryotic circuits. One of the rare cases of bacterial multicomponent regulatory loops is the one identified in S. enterica, where the PmrA protein represses transcription of the pmrD gene (Kato et al., 2003), which encodes the post-translational activator of the PmrA protein (Kato and Groisman, 2004) (Figure 2). This feedback loop is not present in E. coli, possibly because the PmrD protein does not activate the PmrA protein in this species (Winfield and Groisman, 2004). Yet, E. coli does utilize a multicomponent loop to control the level of the sigma factor RpoS, which promotes transcription of the rssB gene, encoding the protein that delivers RpoS to the ClpXP protease for degradation (Ruiz et al., 2001).
The incorporation of a horizontally-acquired gene(s) into an ancestral regulatory circuit requires that the promoter of such a gene(s) harbor (or evolve) sequences matching the motif recognized by an ancestral transcription factor. These sequences must lie at the right distance from, and in the right orientation relative to, the sites recognized by RNA polymerase, so that an activator protein can establish productive interactions with RNA polymerase or, in the case of an ancestral repressor, so that effective silencing can be achieved. This is because, unlike eukaryotic promoters, where transcription factor binding sites occur sparsely and unevenly over large DNA regions, bacterial promoters are short (typically <100 nt) and binding sites must be properly positioned to contact the transcription machinery in a manner that results in gene expression (Browning and Busby, 2004). In fact, studies of the evolution of cis-regulatory sequences in yeast indicate that there is no selective pressure to maintain the exact positions of individual binding sites in these organisms (Gasch et al., 2004). By contrast, modifications in the structure of regulated promoters, in transcription factor binding sites, and in the transcription factors themselves can be key determinants in the reconfiguration of bacterial regulatory circuits.
The architecture of a given promoter dictates the particular mechanism that a transcription factor uses to promote transcription. This is because, depending on the position of a binding site, a specific transcription factor surface will be exposed to make productive contacts with RNA polymerase or other transcription factors (Figure 4). For instance, the transcriptional activator CRP from E. coli interacts with RNA polymerase through more than one surface depending on the location of its binding site (Niu et al., 1996). In addition to the location of a binding site, its orientation can also determine which transcription factor surface is exposed to contact RNA polymerase. This appears to be the case for transcription factors belonging to the OmpR/PhoB family of regulators because these bind as homodimers in a head-to-tail conformation to direct DNA repeats (Blanco et al., 2002; Harrison-McMonagle et al., 1999) and because functional binding sites for a member of this family have been detected in both orientations with respect to the direction of transcription (Zwir et al., 2005).
Given that transcriptional activation entails interaction of a transcription factor with the transcription machinery, the architecture of functional promoters may constrain the location and orientation of binding sites. Consistent with this notion, the PhoP box location has been maintained among many promoters activated by the PhoP protein in several members of the family Enterobacteriaceae that diverged >200 million years ago (Perez and Groisman, 2009). When new genes are brought under PhoP control, their promoters could either acquire this "ancestral" architecture or develop a "novel" architecture. Both scenarios have occurred as some PhoP-regulated horizontally-acquired genes in Salmonella have promoters harboring functional PhoP boxes located at the position shared with the ancestral PhoP-activated gene promoters whereas others harbor a PhoP box further upstream (up to ~60 nt) and in the opposite relative orientation (Perez et al., 2008; Zwir et al., 2005). Interestingly, these promoter architectures are often species- or lineage-specific.
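Position and orientation, the two features of promoter architecture discussed above, can be read off a sequence with a simple scan. The sketch below is a toy illustration under stated assumptions: it looks for exact matches to a motif on either strand and reports each match's offset from a transcription start site supplied by the caller. The function names, the hexamer and the promoter string are hypothetical; real binding sites are degenerate and are usually detected with position weight matrices rather than exact string matching.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sites(promoter: str, motif: str, tss_index: int) -> list:
    """Return sorted (offset from TSS, orientation) for every exact match.

    The offset is that of the match's leftmost base on the given strand,
    relative to `tss_index` (negative values are upstream). "forward" means
    the motif reads as given; "reverse" means its reverse complement was
    found. A palindromic motif is reported in both orientations.
    """
    promoter = promoter.upper()
    motif = motif.upper()
    hits = []
    for orientation, pattern in (("forward", motif),
                                 ("reverse", reverse_complement(motif))):
        start = promoter.find(pattern)
        while start != -1:
            hits.append((start - tss_index, orientation))
            start = promoter.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical 21-nt promoter fragment with the TSS at index 20.
    toy_promoter = "CCTGTTTACCCCCTAAACAGG"
    print(find_sites(toy_promoter, "TGTTTA", tss_index=20))
```

Comparing such offsets and orientations across orthologous and horizontally-acquired promoters is one simple way to distinguish a shared ("ancestral") placement from a displaced or inverted ("novel") one.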
Different promoter architectures may demand distinct modes of transcriptional activation (Figure 4). Therefore, it is likely that a transcription factor adopts distinct strategies to cope with novel promoters, perhaps through a gain-of-function mutation that enables a new interaction(s) with a different RNA polymerase subunit or a co-regulator. A prediction from this model is that changes in promoter architecture and in a transcription factor could result in specialization so that activation of certain promoters may require a "modified" transcription factor even when the motif recognized by it is conserved across bacterial species. If this is the case, transcription factors of different species may no longer be fully functionally equivalent (although they may retain the ability to operate with the "ancestral" promoter architecture). Indeed, this has been shown to be the case for the PhoP proteins from Salmonella and Yersinia, which are interchangeable with respect to transcription of ancestral genes but display species-specificity for particular horizontally-acquired targets (Perez and Groisman, 2009).
What type of structural change(s) modifies the activity of a transcription factor so that it can operate with novel promoter architectures? Gains or losses of protein-protein interactions have been reported in eukaryotic transcription factors (Lynch et al., 2008; Tsong et al., 2006; Tuch et al., 2008a; Tuch et al., 2008b) and several mechanisms have been proposed to promote the emergence of new functions in these proteins, including domain shuffling, short linear motif switches, and variations in simple sequence repeats (reviewed in Lynch and Wagner, 2008). Yet, large modifications in the structure of a bacterial transcription factor do not appear to be necessary to change its ability to operate with novel promoter architectures. Consistent with this notion, functional differences have been detected even among orthologous regulators exhibiting >90% overall identity (Lintner et al., 2008).
The complexity of transcriptional control in bacteria is large enough that the output of a circuit can be altered in multiple ways. This is despite the fact that bacterial promoters are relatively short and the number of proteins required to elicit transcription of a given gene is small compared to eukaryotes. The location of binding sites, the position of the transcription factor contact surface with respect to the contact surface on RNA polymerase as well as the sequences in the core promoter (which determine the kinetics of the individual promoter), all contribute to whether a particular promoter will respond to a particular transcription factor. All these elements can be affected by point mutations and lead to changes in gene expression.
The DNA motifs recognized by orthologous transcription factors are typically conserved across closely related species, but can vary considerably across different phyla. For instance, the bacterial repressor LexA, which governs the response to DNA damage, is widely distributed across most major groups of bacteria. However, the LexA box in the Gram-positive bacterium B. subtilis is unrelated to the LexA box in the Gram-negative E. coli, and the motifs in these two organisms are completely different from the corresponding motif found in α-Proteobacteria (Erill et al., 2007). The changes in the DNA motif have been accompanied by modifications in the transcription factor because a LexA protein recognizing a derived motif cannot take up its regulatory role in other species (Erill et al., 2007). The LexA protein functions as a repressor, making it unlikely that promoter architecture has played a major role in the evolution of the LexA protein (as opposed to the proposed evolutionary scenario for activators such as the PhoP protein). This is because the exact position and orientation of the LexA boxes within the promoter are not as critical, as long as their occupancy by the LexA protein obstructs RNA polymerase binding to the target promoters and represses gene expression.
It is becoming increasingly clear that regulatory circuits are constantly being modified and that these changes contribute significantly to the generation of phenotypic diversity within and across species. The evolution of bacterial regulatory circuits entails four classes of changes: 1) transcriptional rewiring whereby the promoters of orthologous genes in related species differ in the presence or absence of a binding site(s) for a conserved transcription factor(s); 2) embedding horizontally-acquired genes under regulation of an ancestral transcription factor; 3) restructuring of the promoters controlled by a transcription factor; and 4) modifications in the transcription factors themselves. The combination of these changes enables bacteria to expand or modify the repertoire of cellular functions that transcription factors control.
We have focused here on transcriptional regulation because transcription initiation is a prominent regulated step in gene expression. However, it is now evident that non-coding RNAs are key components of regulatory circuits both in eukaryotes (Bartel, 2009) and bacteria (Waters and Storz, 2009). Thus, a comprehensive understanding of how organisms regulate their genomes, and how this regulation evolves, will require the integration of knowledge about both transcription circuits and cis- and trans-acting regulatory RNAs. Some small RNAs encode small biologically active peptides (Wadler and Vanderpool, 2007), raising questions about the selection taking place on genes that encode bifunctional RNA and peptide products.
Finally, to our knowledge, the evolution of transcriptional regulatory circuits has not been investigated in archaeal species. This would be of interest given that gene transcription in archaea more closely resembles transcription by RNA polymerase II in eukaryotes, yet archaea share with bacteria the capacity for horizontal gene transfer (Navarre et al., 2007; Porwollik and McClelland, 2003).
We thank L. Schroer for help with the figures. Our research on bacterial regulatory circuits is supported, in part, by grants 49561 and 42236 from the NIH. E.A.G. is an HHMI investigator.