The bacterial signalling network comprises numerous interacting components that monitor environmental and intracellular parameters and effect cellular responses to changes in these parameters. The complexity of bacterial signalling systems makes comparative genome analysis a particularly valuable tool for studying them. Comparative studies have revealed certain general trends in the organization of diverse signalling systems. These include (i) modular structure of signalling proteins; (ii) common organization of signalling components with the flow of information from N-terminal sensory domains to the C-terminal transmitter or signal output domains (N-to-C flow); (iii) use of common conserved sensory domains by different membrane receptors; (iv) ability of some organisms to respond to one environmental signal by activating several regulatory circuits; (v) abundance of intracellular signalling proteins, typically consisting of PAS or GAF sensor domains and various output domains; (vi) importance of secondary messengers, cAMP and cyclic diguanylate; and (vii) crosstalk between components of different signalling pathways. Experimental characterization of the novel domains and domain combinations will be needed to achieve a better understanding of the mechanisms of signalling response and the intracellular hierarchy of different signalling pathways.
Bacterial signal transduction systems provide a fascinating array of numerous interacting components that sense changes in a variety of environmental and intracellular parameters and transmit these signals to various cellular mechanisms to cause adaptive changes in metabolism, physiology and/or behaviour (for reviews, see Hoch and Silhavy, 1995; Grebe and Stock, 1999; Stock et al., 2000; Inouye and Dutta, 2003). The complexity of signalling networks in model organisms, such as Escherichia coli and Bacillus subtilis, has long hindered their systematic analysis. The first description of a two-component system by Ninfa and Magasanik (1986) was quickly followed by the discovery of crosstalk between nitrogen assimilation and chemotaxis (Ninfa et al., 1988), suggesting complex interactions between different regulatory systems and signal integration. However, various components of the bacterial signalling machinery – histidine kinases and response regulators, cyclic AMP (cAMP)-dependent systems, phosphotransferase system components and chemotaxis proteins – were traditionally viewed and studied as separate entities. Certain components of the signal transduction circuits, such as adenylate cyclases, diguanylate cyclases and phosphodiesterases, and serine/threonine protein kinases and phosphatases, were identified only recently and many remain poorly characterized. The availability of complete genomic sequences from numerous bacterial species allowed researchers, for the first time, to evaluate the total number and composition of the signal transduction proteins encoded in each particular genome and finally appreciate the complexity of the whole system. In a way, instead of describing the legs, trunk and tail of the same elephant, we can now take a look at the entire elephant. Studies of signal transduction offer one of the first examples where genomics has actually led to important biological insights that would otherwise have been impossible.
It has become clear that, in addition to sensory histidine kinases and methyl-accepting chemotaxis proteins, bacteria have other receptor proteins with similar overall organization, namely with an N-terminal periplasmic or extracytoplasmic sensory domain, followed by one or more transmembrane segments and a cytoplasmically located signal transduction domain. This kind of organization has been described for membrane-anchored adenylate cyclases, putative diguanylate cyclases and phosphodiesterases, serine/threonine protein kinases and phosphatases, revealing a much more complex signalling network than has been generally assumed before genomics (Galperin et al., 2001a; Kennelly, 2002). Another interesting result was gleaned from the absence of certain genes in a genome, namely the absence of membrane-associated components (EIIB and EIIC) of the phosphoenolpyruvate-dependent sugar:phosphotransferase system (PTS) in Xylella fastidiosa and several other bacteria that still encode the soluble components of PTS (EI, HPr and EIIA). This observation, which could not have been made without knowledge of the complete genome, suggested that soluble PTS components in these organisms are involved solely in signalling. In other words, studies of signal transduction have grown from ‘sequence gazing’ (a term coined by Henikoff, 1991), to ‘genome grazing’ when comparative genomics became an integral part of discovery and analysis. The rapid rate of genome sequencing (148 complete prokaryotic genomes available in GenBank® at the end of 2003) is contributing to the progress in comparative genome analysis.
When the first bacterial genomes were sequenced, the first order of business, of course, was to enumerate the genes encoding signal transduction proteins in each organism and to perform cross-species comparisons to determine which systems are common and which are specific for a given species or genus. Several independent ‘censuses’ of bacterial signal transduction proteins (Mizuno et al., 1996; Mizuno, 1997; Koretke et al., 2000; Galperin et al., 2001a; Ashby, 2004) brought very similar results, leading to a general consensus on the distribution of various signalling systems in various microorganisms (see Table 1). It has been found, for example, that parasitic bacteria usually encode fewer signalling proteins than free-living bacteria, even if one takes into account their smaller genome sizes. Gram-positive bacteria and archaea turned out to have fewer signal transduction proteins than proteobacteria or cyanobacteria of the same genome size (Galperin et al., 2001a).
At the beginning of the genome era, we predicted that the new paradigm of genome-based microbiology would eventually replace the old paradigm of the gene-by-gene approach (Koonin and Galperin, 1997). Now, seven years later, this paradigm shift is under way in earnest. The ever-improving coverage of microbial diversity by complete genome sequences allows increasingly accurate reconstructions of the metabolic pathways in poorly studied organisms (Koonin and Galperin, 2002; Osterman and Overbeek, 2003) and even prediction of their nutritional requirements (Lemos et al., 2003). In a similar fashion, one could hope that some day it would be possible to reconstruct microbial signalling pathways and predict responses of a given microorganism to various environmental factors, based solely on its genome content. The first glimpses of such approaches are already evident from Table 1. Indeed, two α-proteobacteria, Mesorhizobium loti and Caulobacter crescentus, encode the same number of histidine kinases but differ dramatically in the number of encoded adenylate cyclases and methyl-carrier proteins, suggesting the importance of chemotactic response for the latter, but not the former, organism. Here I briefly discuss the recent insights into microbial signal transduction that originated from comparative genome analyses and list some unresolved problems.
The most important feature of signal transduction proteins is their modular organization, presciently noted by Parkinson and Kofoid (1992) 12 years ago. Modular organization accounts for the enormous diversity of components of the bacterial signal transduction systems, but it also makes their systematic analysis possible. Most signal transduction proteins consist of two or more domains – evolutionarily conserved, individually folding, compact protein units that have more or less the same functions regardless of genomic context (Table 2). If this definition sounds somewhat fuzzy, so are the boundaries of many domains. Nevertheless, most domains can be relatively easily recognized and associated with particular biochemical functions. Thus, response regulators typically contain a phosphate-accepting receiver domain CheY – similar to the chemotaxis transducer protein of the same name, often referred to as ‘chemotaxis response regulator’ – and a DNA-binding signal output domain of the helix-turn-helix (HTH), winged helix, SAPR, LytTR, Fis, or some other family. This means that the CheY domain, like many other signalling domains, is ‘promiscuous’, i.e. can be found in a variety of distinct proteins associated with distinct signalling domains. In sequence similarity searches using BLAST, FASTA or other algorithms, such promiscuous domains readily align with each other, which results in convincingly high similarity scores between otherwise unrelated proteins and significantly complicates sequence analysis (Fedorova et al., 2003). For example, in a blastp search, two different proteins sharing only a common CheY domain would nonetheless be aligned over 100–120 residues, with a reported probability of such a hit arising solely by chance of 10⁻⁴ or even lower. This makes splitting a multidomain signal transduction protein into individual domains the necessary first step in its sequence analysis. Often enough, it is also the easiest way to gain insight into the potential functions of the protein.
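The domain-splitting step described above can be illustrated with a minimal Python sketch. It assumes that domain boundaries have already been obtained from a domain database search; the `DomainHit` record, the function names and the coordinates below are hypothetical and serve only to show why two proteins that share a single promiscuous domain should be compared domain by domain rather than as whole sequences.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One predicted domain (hypothetical record; coordinates are 1-based, inclusive)."""
    name: str
    start: int
    end: int


def split_into_domains(sequence, hits):
    """Cut a protein sequence into (domain name, subsequence) pairs, N- to C-terminus."""
    ordered = sorted(hits, key=lambda hit: hit.start)
    return [(hit.name, sequence[hit.start - 1:hit.end]) for hit in ordered]


def shared_domains(hits_a, hits_b):
    """Return the names of the domains that two proteins have in common."""
    return {hit.name for hit in hits_a} & {hit.name for hit in hits_b}


# Illustrative coordinates only: two unrelated proteins that share a CheY domain.
regulator_hits = [DomainHit("CheY", 1, 120), DomainHit("HTH", 150, 210)]
cyclase_hits = [DomainHit("CheY", 1, 118), DomainHit("GGDEF", 140, 300)]

print(shared_domains(regulator_hits, cyclase_hits))
```

Once the shared CheY segment is set aside, the remaining segments (here HTH and GGDEF) can be searched separately, so that a whole-protein hit is not mistaken for overall relatedness.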
The most straightforward way to delineate the domain composition of a signal transduction protein is to compare it to a protein domain database, such as Pfam (http://www.sanger.ac.uk/Software/Pfam), SMART (http://smart.embl-heidelberg.de), or COG (http://www.ncbi.nlm.nih.gov/COG) (see Bateman et al., 2004; Letunic et al., 2004; Tatusov et al., 2000 respectively). The European Bioinformatics Institute (EBI) and the National Center for Biotechnology Information (NCBI) maintain integrated domain databases: respectively, InterPro (http://www.ebi.ac.uk/interpro), which unifies, among others, SMART and Pfam entries (Mulder et al., 2003), and the Conserved Domains Database (CDD, http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), which unifies SMART, Pfam and COG entries (Marchler-Bauer et al., 2003). Each of these databases provides a brief description of every domain and a variety of external links. SMART lets the user list all the proteins with a given domain organization (or domain composition). CDD offers a useful CDART tool that allows one to look for various domain combinations involving a given domain (Geer et al., 2002). Importantly, these databases rely on different software tools and different default parameters for domain identification (hidden Markov models in SMART, Pfam, and InterPro, COGnitor in COGs and Reverse Position-Specific BLAST in CDD), so the results obtained are not necessarily identical and it is always better to use more than one database (see Koonin and Galperin, 2002, for a discussion of the advantages of each particular database). For example, the recently characterized HWE family of histidine kinases (Karniol and Vierstra, 2004) had been properly annotated in COGs and CDD, but not in SMART or Pfam.
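The advice to consult more than one database can be made concrete with a short Python sketch that records which databases report each domain for a protein and lists the domains on which they disagree. The input dictionary is hand-made and hypothetical (the domain labels `PAS` and `HWE_HK` are placeholders, and no database is actually queried); it merely mirrors the HWE situation mentioned above.

```python
def domains_by_support(hits_by_db):
    """Map each reported domain to the set of databases that reported it."""
    support = {}
    for database, domains in hits_by_db.items():
        for domain in domains:
            support.setdefault(domain, set()).add(database)
    return support


def disputed_domains(hits_by_db):
    """Return domains found by some, but not all, of the databases consulted."""
    total = len(hits_by_db)
    return {
        domain: sorted(databases)
        for domain, databases in domains_by_support(hits_by_db).items()
        if len(databases) < total
    }


# Hypothetical search results for a single protein (placeholder domain labels).
example_hits = {
    "SMART": {"PAS"},
    "Pfam": {"PAS"},
    "COG": {"PAS", "HWE_HK"},
    "CDD": {"PAS", "HWE_HK"},
}

print(disputed_domains(example_hits))
```

A domain that appears in the disputed list is not necessarily spurious; as the HWE example shows, it may simply be missing from the databases that failed to report it.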
Although novel signalling domains continue to be discovered, it is safe to assume that domains with the widest phylogenetic distribution have already been identified through cross-genome comparisons. It has become clear that certain sensory domains have very narrow specificity towards their ligands (e.g. citrate or nitrate binding domains, see below). Some domains, however, are rather indiscriminate in their affinities and interact with a wide variety of ligands. The PAS domain, for example, binds flat heterocyclic molecules from haeme to flavin to cinnamic acid and, reportedly, even the adenine moiety of ATP (see Table 2). The exact functions of many signalling domains remain obscure, which opens new avenues for future experimental studies.
For the purposes of this review, components of the signal transduction system can be subdivided into sensory (usually, ligand-binding) domains, signal transduction (phosphorylation, methylation, homodimerization) domains, and signal output (DNA-binding, heterodimerization or enzymatic) domains.
Recent studies have greatly expanded the diversity of known sensory domains, adding many novel periplasmic (or extracytoplasmic) domains, as well as cytoplasmically located and integral membrane domains (see Table 3). Some recently described sensory domains have well-defined and narrow substrate specificity, e.g. nitrate-binding NIT domain (Shu et al., 2003) and citrate-binding CitAP domain (Gerharz et al., 2003; Pappalardo et al., 2003). For many other domains, the nature of the sensed signal(s) remains unknown and their roles as sensors are deduced solely from their predicted location as periplasmic domains of different transmembrane receptors: histidine kinases, methyl-accepting proteins, adenylate or diguanylate cyclases and phosphodiesterases.
Periplasmic solute-binding proteins have long been known to function as ligand-binding domains of sensor histidine kinases, for example, in the EvgS protein from Escherichia coli. Recently, however, a protein of the same family, Pseudomonas aeruginosa AmiC (PA33664), was found localized in the cytoplasm. AmiC serves as the receptor and negative regulator for amide-inducible aliphatic amidase operon amiEBCRS. Together with the RNA-binding response regulator AmiR, AmiC regulates expression of the AmiE amidase, as well as expression of its own gene, amiC, in response to amides (Wilson et al., 1996).
Another cytoplasmically located sensor domain is the N-terminal turgor-sensing domain of the K+-transport regulatory kinase KdpD from E. coli and other bacteria. Recent data indicate that the transmembrane segments of this protein are needed only for proper positioning of the sensory domain with respect to the histidine kinase domain (Heermann et al., 2003). Remarkably, in Bacillus cereus, Deinococcus radiodurans and several other bacteria this sensory domain is encoded in a stand-alone form.
The most common cytoplasmic signalling domains are PAS and GAF domains (Aravind and Ponting, 1997; Ho et al., 2000; reviewed in Taylor and Zhulin, 1999; Galperin et al., 2001a; Hurley, 2003). Originally recognized as cytoplasmic domains of histidine kinases, these domains have now been found in combination with a great variety of other signalling domains (see Table 4). The PAS and GAF domains were shown to have similar structures, characterized by a presence of a ligand-binding pocket that can accommodate a variety of small-molecule ligands, from haeme to flavin to adenine and guanine (Ho et al., 2000; Crosson and Moffat, 2001). Presence of oxygen was shown to affect the position of the PAS-bound haeme molecule, causing a change in the general conformation of the PAS domain and thereby allowing the sensing of oxygen to effect signal transmission to the C-terminally located signal transduction domains (Delgado-Nixon et al., 2000; Chang et al., 2001). Due to the availability of several comprehensive reviews (Taylor and Zhulin, 1999; Zhulin, 2001; Hurley, 2003), these relatively well-studied domains will not be discussed here in any detail. We will also leave aside small-molecule-binding domains (ACT, 4VR, 3H and others) that are involved in allosteric regulation of metabolic enzymes and modulation of the activity of transcriptional regulators (Anantharaman et al., 2001).
Integral membrane domains can also serve as sensors. Although certain histidine kinases were long known to contain multiple transmembrane segments (Kadner, 1995), it was not clear whether these segments actually worked as sensors or just anchored the enzyme in the membrane. The first membrane domain with proven sensory function was the ethylene-binding domain of the Arabidopsis thaliana ETR1 protein (Bleecker, 1999). This domain was later found in cyanobacteria and several proteobacteria (Mount and Chang, 2002). Histidine kinases from P. aeruginosa (PA3271), Vibrio cholerae (VC0303), and several other bacteria contain proline permease-like N-terminal domains, which led to a suggestion that these proteins might serve as sensors of sodium-motive force (Häse et al., 2001). The final evidence that integral membrane segments may represent evolutionarily mobile conserved domains came when several such domains (MHYT, MASE1 and MASE2) were found in association with two or more different signal output domains (see below), suggesting their involvement in signalling (Galperin et al., 2001b; Nikolskaya et al., 2003). Recently, four new families of membrane signalling domains with seven, seven, five and eight transmembrane segments, respectively, were identified (Anantharaman and Aravind, 2003). Although sequence conservation in these domains allows one to make educated guesses about their ligands (metal binding for MHYT and MHYE, aromatic compound binding for MASE1 and MASE2, carbohydrate binding for 7TMR-DISMED1), the exact nature of these ligands is still obscure and needs to be explored in experimental studies.
Environmental changes, sensed by the periplasmic (extracytoplasmic) or membrane-embedded sensory domains of transmembrane receptors, affect cytoplasmically located domains of these receptors to trigger appropriate cellular responses. These responses include increased gene expression, changes in motility (chemotaxis), changes in secretion and many other processes. Exactly how the signal is transmitted across the membrane from sensory to cytoplasmic domains is still not completely understood. Dimerization (oligomerization) events appear to be important in some cases. A dimerization domain, HAMP (Aravind and Ponting, 1999; Williams and Stewart, 1999), has been found in many transmembrane receptors, but certainly not in all of them (Zhulin et al., 2003). Several potential mechanisms of transmembrane signalling are currently being considered and breakthroughs in this area are expected in the near future (see Falke and Hazelbauer, 2001).
An important recent development coming from comparative genomics was the realization that histidine kinases and methyl-accepting proteins are the major but by no means the only receptor molecules capable of sensing extracellular signals. Fusions of periplasmic sensory domains to adenylate cyclase, serine/threonine protein kinase, diguanylate cyclase and phosphodiesterase domains have been described, reinforcing the notion that bacterial signalling is even more complex than previously thought. The first of these, the receptor adenylate cyclase, was originally recognized in the cyanobacteria Spirulina platensis and Anabaena sp. PCC7120 as a fusion of a type 3 adenylate cyclase domain to a periplasmic sensor domain and was experimentally verified to have adenylate cyclase activity (Yashiro et al., 1996; Katayama and Ohmori, 1997). Sensory adenylate cyclases were soon described in a number of other organisms, including the proteobacteria Stigmatella aurantiaca and Myxococcus xanthus (Coudart-Cavalli et al., 1997; Kimura et al., 2002).
Serine/threonine protein kinases and phosphatases have been known in prokaryotes for quite some time (Yang et al., 1996; Shi et al., 1998), but most of them either were soluble enzymes, or contained cytoplasmic N-terminal kinase domains and were anchored in the membrane by their C-terminal fragments. Recently, however, serine/threonine protein kinase domains were found in membrane receptor proteins, fused to N-terminal periplasmic sensory domains (Zhulin et al., 2003; see Table 2). In addition, transmembrane receptors were found whose cytoplasmic domain was similar to RsbU, a protein phosphatase of the PP2C family that is involved in an environmental stress signalling pathway (Yang et al., 1996; Zhulin et al., 2003). Unfortunately, the signals sensed by these proteins are still obscure, as are their phosphorylation/dephosphorylation targets. Because of their apparent similarity to the eukaryotic signalling systems, bacterial protein kinases and phosphatases are attractive targets for further experimental studies of the signal transduction mechanisms.
The list of bacterial membrane receptors also includes a group of proteins that combine periplasmic sensory domains with the cytoplasmic GGDEF, EAL and HD-GYP domains, whose enzymatic functions are still somewhat uncertain (reviewed in Galperin et al., 2001a). The GGDEF domain is often paired with the EAL domain, forming a diguanylate cyclase/phosphodiesterase combination that catalyses synthesis and hydrolysis of cyclic diguanylate (c-diGMP, Fig. 1). In Gluconobacter xylinum, c-diGMP regulates formation of extracellular cellulose (Tal et al., 1998). It has recently been implicated in regulation of extracellular polysaccharide formation in a number of other bacteria (Rashid et al., 2003). In Caulobacter crescentus, a GGDEF domain-containing response regulator PleD is involved in regulation of the cell development programme, offering a convenient model to study its potential functions (Ausmees et al., 2001; Aldridge et al., 2003). Studies of this protein strongly suggested that the GGDEF domain acts as a diguanylate cyclase that combines two GTP molecules to form c-diGMP (Ausmees et al., 2001; Pei and Grishin, 2001). Although an unequivocal biochemical proof that the purified GGDEF domain indeed carries this activity has not been published so far, preliminary data indicate that this is in fact the case (M. Gomelsky, pers. comm.). The EAL domain must therefore be responsible for the complementary phosphodiesterase activity that degrades c-diGMP, either by itself or in combination with the GGDEF domain. Judging from its sequence motifs, HD-GYP is also a phosphodiesterase domain (Galperin et al., 1999). Its natural phosphoester substrate remains unidentified; it too could be c-diGMP. The biochemical characterization of these domains and processes that they regulate is still in the very early stages. 
In any case, the sheer abundance of genes encoding GGDEF, EAL, and HD-GYP domains in diverse bacterial genomes (Table 2) shows that they represent a major signalling system with c-diGMP most likely functioning as a secondary messenger in signal transduction. The potential importance of this novel signalling pathway came to light only thanks to the availability of complete genome sequences.
Signal transduction downstream of sensory histidine kinases and methyl-accepting proteins often involves intermediate domains, including HPt and CheY domains, which are relatively well characterized and described in detail elsewhere (Stock et al., 2000; Inouye and Dutta, 2003). In contrast, signal transduction downstream of receptor adenylate cyclases does not involve any proteins or domains besides the cAMP receptor protein (CAP). The cAMP-CAP complex has been shown to activate transcription of many genes in diverse bacterial species. Although the mechanisms of c-diGMP action are still obscure, signal transduction from transmembrane receptors containing GGDEF, EAL or HD-GYP signalling domains likewise does not seem to involve any intermediate domains, at least judging from the domain numerology (Table 2). In the only experimentally characterized model, activation of the G. xylinum cellulose synthase by c-diGMP was mediated by a membrane-bound c-diGMP-binding protein (Weinhouse et al., 1997).
In a sense, adenylate cyclase, diguanylate cyclase and phosphodiesterase can be considered output domains, so that input and output modules of these receptors co-exist on the same polypeptide chain. In an even further deviation from the ‘two-component’ paradigm, certain transmembrane sensors were found to contain C-terminal DNA-binding domains (Nikolskaya and Galperin, 2002), following the classical example of the lysine-sensing transcriptional regulator CadC (Dell et al., 1994). These extreme cases clearly show that there are no strict limits on the number of components in the signal transduction chain, which may vary from one to three or more. It should also be noted that certain signalling systems include a stand-alone sensor protein that interacts with a histidine kinase (Kadner, 1995), while others transmit the signal directly to the transcription regulation apparatus (Braun, 1997).
Response regulators of the two-component system typically consist of an N-terminal phosphoacceptor CheY domain and a C-terminal DNA-binding output domain that activates or represses transcription of specific target genes (Martinez-Hackert and Stock, 1997; Grebe and Stock, 1999; Stock et al., 2000). These DNA-binding domains are quite diverse: a majority belongs to the winged helix family, exemplified by the well-known OmpR and PhoB proteins, but there are several families of helix-turn-helix (HTH) domains, such as NarL/FixJ, AraC/XylS and Spo0A domains. In addition, certain response regulators contain non-HTH DNA-binding domains of SAPR, LytTR or Fis families. Although the operons that these response regulators activate or repress are often unknown, SAPR family proteins are typically involved in the regulation of secondary metabolism (Wietzorrek and Bibb, 1997), whereas LytTR family proteins often regulate production of virulence factors (Nikolskaya and Galperin, 2002). In transcriptional regulators of the NtrC family, the N-terminal CheY domains and the C-terminal DNA-binding Fis-like domains are separated by the central AAA-type ATP-binding domains, whose ATPase activity is required for the DNA-binding (Hwang et al., 1999).
In addition to the DNA-binding response regulators, an RNA-binding output domain has been described in Pseudomonas aeruginosa response regulator AmiR and related proteins (Shu and Zhulin, 2002).
In certain response regulators, the output domains are enzymatic and do not necessarily regulate transcription. Such response regulators combine the CheY domain with CheB-type methylesterase domain, GGDEF, EAL, HD-GYP or PP2C domains, mentioned above, or with other, sometimes unknown, enzymatic domains.
In addition to transmembrane receptors, there are several well-studied histidine kinases that have no transmembrane segments, such as chemotaxis histidine kinase CheA and nitrogen regulation protein NtrB (GlnL) from E. coli, sporulation kinase KinA from B. subtilis, or rhizobial oxygen sensor FixL (for reviews, see Hoch and Silhavy, 1995). The receptor census (Table 1) shows that free-living bacteria typically encode a significant number of intracellular histidine kinases, adenylate cyclases, diguanylate cyclases and phosphodiesterases. In fact, their cytoplasmic signalling network may be as complex as the transmembrane signal transduction system. The genome of M. loti, for example, encodes 13 copies of the adenylate cyclase domain (Table 1). Of these, only one appears to be fused to a periplasmic sensor domain, and another one is fused to an integral membrane sensor domain. All the rest are found in predicted cytoplasmic proteins, fused to poorly characterized N-terminal or C-terminal domains, most of which are likely involved in signalling. Of the 32 copies of the GGDEF domain encoded in M. loti, 18 belong to transmembrane sensors and 14 are found in intracellular signal transduction proteins and response regulators (Table 1).
Intracellular signalling proteins typically combine N-terminal cytoplasmic sensor domains, usually PAS or GAF, with a variety of signal transduction or output domains (Table 4). Some of these proteins contain N-terminal CheY domains and can be considered bona fide response regulators. Indeed, phosphorylation of the CheY domain was shown to affect adenylate cyclase activity of the C-terminal ACyc domain, just as it affects DNA-binding properties of classical response regulators (Coudart-Cavalli et al., 1997). However, many intracellular signalling proteins lack the CheY domains. Such proteins should not be confused with response regulators, despite certain parallelism in their domain architectures (see Fig. 2 and Table 4). For example, in addition to four NtrC-type response regulators of the CheY-AAA-Fis domain architecture (AtoC, GlnG, HydG and YfhA, see COG2204), E. coli K12 encodes three intracellular signalling proteins with GAF-AAA-Fis domain structure (FhlA, HyfA, and NorR, see COG3604) and one more protein (YgeV) with GAF-PAS-AAA-Fis domain structure. Whereas the exact nature of the ligands of most of these proteins remains obscure, there is little doubt that they are directly involved in monitoring levels of NO and other intracellular parameters and regulating transcription in response to changes in these parameters (Gardner et al., 2003).
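The distinction drawn above between response regulators and intracellular signalling proteins can be sketched in a few lines of Python. The sketch assumes a deliberately simplified rule, namely that the N-terminal domain alone decides the class (CheY for response regulators, PAS or GAF for intracellular signalling proteins); the function names are hypothetical, and the architectures are the E. coli K12 examples quoted in the text.

```python
SENSOR_DOMAINS = {"PAS", "GAF"}  # cytoplasmic sensor domains named in the text


def classify_architecture(domains):
    """Classify a protein by its N-terminal domain (simplified rule, an assumption)."""
    if not domains:
        return "unclassified"
    if domains[0] == "CheY":
        return "response regulator"
    if domains[0] in SENSOR_DOMAINS:
        return "intracellular signalling protein"
    return "unclassified"


def output_module(domains):
    """Strip leading CheY/PAS/GAF input domains and return the remaining domains."""
    index = 0
    while index < len(domains) and (
        domains[index] == "CheY" or domains[index] in SENSOR_DOMAINS
    ):
        index += 1
    return tuple(domains[index:])


# Domain architectures as given in the text, written N- to C-terminus.
examples = {
    "AtoC": ("CheY", "AAA", "Fis"),
    "NorR": ("GAF", "AAA", "Fis"),
    "YgeV": ("GAF", "PAS", "AAA", "Fis"),
}

for name, architecture in examples.items():
    print(name, classify_architecture(architecture), output_module(architecture))
```

All three proteins share the same AAA-Fis output module and differ only in their input domains, which is the parallelism that makes the two classes easy to confuse.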
Several pioneering studies have provided experimental evidence of the involvement of cytoplasmic signalling proteins in intracellular signalling. An E. coli protein with the PAS-GGDEF-EAL domain combination has been named a ‘direct oxygen sensor’ (DOS), based on the effect oxygen binding has on the conformation of its N-terminal domain (Delgado-Nixon et al., 2000). Further, oxygen binding has been shown to activate the phosphodiesterase activity of a G. xylinum protein with the same domain organization (Chang et al., 2001). Likewise, cGMP binding to the GAF domain of human phosphodiesterase PDE5 was shown to stimulate the activity of its C-terminal enzymatic domain (Rybalkin et al., 2003). Besides oxygen, the DOS protein could also bind NO and CO, indicating that PAS- or GAF-containing molecules could be used for sensing a variety of intracellular parameters and effecting a variety of cellular responses. Finally, the NorR protein of GAF-AAA-Fis domain architecture has been shown to regulate transcription in response to nitric oxide and reactive nitrogen species (Pohlmann et al., 2000; Gardner et al., 2003; Mukhopadhyay et al., 2004). These results clearly demonstrate that ligand binding to the N-terminal PAS and/or GAF domains can modulate the activities of the downstream output domains. Thus, the similarity between CheY-containing response regulators and PAS- or GAF-containing signallers apparently extends to their regulation mechanisms: both phosphorylation of the CheY domain in response to the extracellular signal and ligand binding to PAS or GAF, comprising an intracellular signal, induce conformational changes in these domains. In turn, these conformational changes activate (rather, cause a relief of inhibition) the downstream output domains, allowing them to perform their functions, be that binding DNA or RNA, catalysing synthesis or hydrolysis of cAMP or c-diGMP, demethylation of MCPs, and so on.
Unfortunately, most intracellular signalling proteins are still poorly studied and remain to be recognized as legitimate members of the bacterial signalling network.
In the extreme diversity of signalling domain combinations encoded in different microbial genomes (see Table 4 for examples), several domain architectures stand out, clearly demonstrating the possibility of cross-talk between different signalling systems. It is well known that many histidine kinases and at least some methyl-carrier proteins contain PAS and GAF domains, which appear to modulate the activity of these proteins (Aravind and Ponting, 1997; Taylor and Zhulin, 1999). Likewise, the existence of PAS-PP2C, GAF-PP2C and PAS-GAF-PP2C combinations in many bacteria, particularly Gram-positive bacteria and actinobacteria, indicates a link between energy stress and σB-dependent transcription (Vijay et al., 2000). The existence of the GAF-PtsI combination in many proteobacteria provides a way for GAF-sensed signals to affect PTS-dependent processes of catabolite repression and inducer exclusion. Because the GAF domain can bind cAMP, this might be a feedback mechanism of maintaining cAMP levels. Fusions of the phosphothreonine-binding FHA (forkhead-associated) domain with ACyc and GGDEF domains, found in several cyanobacteria, suggest that protein phosphorylation could affect the activities of the respective cyclases. Finally, two recently sequenced genomes, Thermosynechococcus elongatus and Pirellula sp., encode a FHA-GAF-HisKin domain combination that ties together three different signalling mechanisms. An even more vivid demonstration of the principle that any two signalling pathways can affect each other is the coexistence of the Ser, Thr-kinase and HisKin domains on the same polypeptide chain in several proteins from Anabaena sp. (Ohmori et al., 2001).
For many years, two-component systems were believed to be specifically bacterial, whereas serine-, threonine- and tyrosine-dependent kinases were seen exclusively in eukaryotes. Sequencing of complex bacterial genomes, as well as genomes of plants, animals and lower eukaryotes, revealed the presence of unexpected signalling domains in many of them. How exactly these systems appeared in these genomes is still a matter of controversy. Some of them most likely have come from a common ancestor or were appropriated by the first eukaryotes from their pro-mitochondrial or pro-chloroplast symbionts. In other cases, a relatively recent horizontal gene transfer seems to be a plausible explanation. In any case, one should not be shocked to find in a free-living bacterium a signalling domain seen previously only in eukaryotes. Conversely, many eukaryotic signalling domains appear to have roots in bacteria (Ponting et al., 1999; Koretke et al., 2000; Aravind et al., 2003). This is an unexpected but promising development, as data on eukaryotic signal transduction could help in deciphering the functions of bacterial proteins and vice versa. Of course, one has to be cautious, as, for example, most, if not all, of the genes encoding the GGDEF, EAL and HisKA domains in the mosquito Anopheles gambiae genome (according to the SMART database, nine, six and 10 copies, respectively) probably have come from bacterial contamination.
Studies of domain architectures of metabolic enzymes revealed a very limited number of possible domain architectures, making the annotation relatively straightforward even in cases like the human CAD protein (PYR1_HUMAN, P27708) that consists of three different domains. In signalling systems, however, associations of various sensory, signal transduction and output domains produce an almost infinite number of domain combinations, with a three-domain protein looking fairly mundane. The complexity of domain organization makes correct functional annotation of signal transduction proteins anything but trivial (Fedorova et al., 2003). Annotation is further complicated by the fact that many conserved domains have poorly understood or unknown enzymatic activities and/or binding specificities. Even the best domain analysis tools, employed in domain databases like SMART, Pfam and CDD, are not designed to provide correct annotation for the whole protein. The best they can do is to (i) uncover the domain composition of the given protein and (ii) show the annotations of other proteins with the same domain composition, if available (Geer et al., 2002). In many cases these annotations are inconsistent; at least some of them are likely to be wrong. We have noted, for example, that three virtually identical proteins from Anabaena sp. PCC7120 were originally annotated, respectively, as ‘adenylate cyclase’ (All7310, TrEMBL entry Q8YKI7), ‘similar to adenylate cyclase’ (All3180, Q8YSA9) and a ‘hypothetical protein’ (Alr1378, Q8YX39), although none of them actually contained the adenylate cyclase domain (Zhulin et al., 2003). In fact, annotation of signal transduction proteins as ‘unknown’, ‘hypothetical’, or ‘conserved hypothetical’ proteins is quite common and is generally considered to be appropriate.
As we have argued earlier, short of a systematic mistake in sequencing, a protein that is conserved across diverse phylogenetic lineages should not be considered hypothetical (Galperin, 2001). So what could be an acceptable annotation of a novel signal transduction protein in a newly sequenced genome? First of all, it would be helpful to include the word ‘signalling’, as in ‘signalling protein’. If domain analysis shows the presence of a well-characterized enzymatic domain, the protein should be annotated based on its enzymatic activity, as a histidine kinase, adenylate cyclase, and so on. Presence of a periplasmic or membrane-bound sensory domain should also be reflected in the name, making it, for example, ‘sensory Ser/Thr-protein kinase’. Of course, if the ligand specificity of the sensory domain is known, that, too, should be reflected in the name, for example, in ‘osmosensory cAMP phosphodiesterase’, or ‘pH-sensing histidine kinase’. Finally, if one can say nothing besides the domain composition, annotation of the novel protein should probably look as follows: ‘Predicted signal transduction protein, containing PAS, GAF and HD-GYP domains’. This would still be much better than ‘similar to hypothetical protein’.
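The naming rules proposed above can be summarized as a small decision procedure. The following Python fragment is only a minimal sketch under stated assumptions: the function and parameter names (`suggest_annotation`, `sensory`, `specificity`) and the two-entry lookup table are hypothetical illustrations introduced here, not part of any existing annotation pipeline, and a protein with several enzymatic domains is simply named after the first one found.

```python
from typing import Optional, Sequence

# Hypothetical lookup table. The domain abbreviations appear in the text,
# but this mapping is an illustrative assumption, not a curated resource.
ENZYMATIC_DOMAINS = {
    "HisKin": "histidine kinase",
    "ACyc": "adenylate cyclase",
}


def join_names(names: Sequence[str]) -> str:
    """Join names as 'A, B and C' (no serial comma)."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def suggest_annotation(
    domains: Sequence[str],
    sensory: bool = False,
    specificity: Optional[str] = None,
) -> str:
    """Suggest a name for a signalling protein from its domain composition.

    domains     -- domain names in N-to-C order
    sensory     -- True if a periplasmic or membrane-bound sensory domain
                   of unknown specificity is present
    specificity -- qualifier such as 'pH-sensing', if the signal is known
    """
    if not domains:
        raise ValueError("at least one domain is required")

    # Collect the activities of any well-characterized enzymatic domains.
    activities = [ENZYMATIC_DOMAINS[d] for d in domains if d in ENZYMATIC_DOMAINS]

    if not activities:
        # Nothing is known besides the domain composition.
        noun = "domain" if len(domains) == 1 else "domains"
        return (
            "Predicted signal transduction protein, "
            f"containing {join_names(domains)} {noun}"
        )

    # Simplification: name the protein after the first enzymatic domain.
    name = activities[0]
    if specificity:
        return f"{specificity} {name}"
    if sensory:
        return f"sensory {name}"
    return name
```

Under these assumptions, `suggest_annotation(["PAS", "GAF", "HD-GYP"])` falls through to the composition-only name, whereas `suggest_annotation(["PAS", "HisKin"], specificity="pH-sensing")` is named after its enzymatic activity and the signal it senses. In neither case does the procedure fall back on ‘hypothetical protein’.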
Despite significant progress made in the last several years, we are still far from understanding many key aspects of bacterial signal transduction. First of all, it is often not clear which particular parameters are measured by many sensor domains (Table 3). Osmolarity of the medium, for example, affects the intracellular K+ concentration, the ionic strength in the cytoplasm, water content of the cell and a number of other physicochemical parameters. The effects of changes in the external pH values or temperature are equally dramatic. Therefore, even for the relatively well characterized pH- and osmosensors, the exact nature of the signal often remains elusive (for a recent review, see Heermann and Jung, 2004).
Second, the list of signalling domains is probably far from complete and new domains of poorly defined function are still being described. For example, a predicted hydrolase of the HD superfamily (COG1639), which is found mostly in stand-alone form, serves as the output domain of the P. aeruginosa response regulator PA0267 and several related proteins. However, it is not clear what the substrate of this predicted hydrolase is and what (if any) the function of its inactivated variant in the V. cholerae response regulator VC1081 is.
Third, there are many domains that are likely involved in signalling but whose functions are still enigmatic. One of the most conspicuous examples is the tryptophan-rich sensory protein TspO/CrtK/MBR, an integral membrane protein found in representatives of all domains of life, from archaea to humans (PF03073, COG3476). This protein, often referred to as peripheral-type mitochondrial benzodiazepine receptor, contains five predicted transmembrane segments with 12–14 well conserved aromatic amino acid residues, including seven Trp residues (Yeliseev and Kaplan, 2000). It has been shown to regulate photosynthesis gene expression in Rhodobacter sphaeroides and the nutrient stress response in Sinorhizobium meliloti, and to bind various benzodiazepines, tetrapyrroles and steroids, including cholesterol, protoporphyrin IX, and many others (Gavish et al., 1999; Davey and de Bruijn, 2000; Yeliseev and Kaplan, 2000; Lacapere and Papadopoulos, 2003). None of these functions, however, readily explains the role of this domain in cells of B. subtilis or Archaeoglobus fulgidus, which do not carry out photosynthesis and have no known affinity to benzodiazepines or steroids.
Another important but still uncharacterized signalling domain is PfoR, a predicted membrane protein found in many bacteria and distantly related to the membrane components of fructose- and sucrose-specific PTS (EIICFru, see COG1299). In Clostridium spp., pfoR genes are located upstream of the genes encoding thiol-activated cytotoxins perfringolysin O, tetanolysin O and septicolysin, but PfoR does not seem to regulate toxin expression (Awad and Rood, 2002). In Streptococcus pyogenes, however, the pfoR-like sloR gene was shown to affect streptolysin O expression, despite the fact that these genes are not adjacent in the genome (Savic et al., 2002). These examples clearly demonstrate that comparative sequence analysis is but a first step, rather than a panacea, and has to rely upon and be followed up by experimental studies.
Besides delineating the domain ‘parts set’, an understanding of signal transduction will require answering many critical questions. The following is a selection of questions that I consider most interesting and experimentally tractable within the next several years:
Although this selection primarily reflects personal bias, I strongly believe that the time has come when these and other critical questions about bacterial signalling can finally be addressed. This would require combining a variety of experimental and computational approaches, of which genome analysis will be a significant part.
I thank Mark Gomelsky, Kirsten Jung, Armen Mulkidjanian and Chester Price for critically reading the manuscript and for helpful comments. I apologize to all the colleagues whose important contributions could not be cited here due to the space limit.
†Dedicated to the memory of Moshe Benziman, who discovered c-diGMP, diguanylate cyclases and c-diGMP phosphodiesterases. He passed away on 10 September 2003.
Note added in proof
Recent analysis of the crystal structure of the citrate-binding CitAP domain (Reinelt, S., Hofmann, E., Gerharz, T., Bott, M., and Madden, D.R., 2003, The structure of the periplasmic ligand-binding domain of the sensor kinase CitA reveals the first extracellular PAS domain. J Biol Chem 278: 39189–39196) and of the NMR structure of the fumarate-binding domain of the sensor kinase DcuS (Pappalardo et al., 2003) revealed a PAS-like fold in both of these periplasmic domains. These data further expand the diversity of substrates that can be bound by PAS domains and show that PAS domains can function outside the cytoplasm.