Transmembrane receptors in microorganisms, such as sensory histidine kinases and methyl-accepting chemotaxis proteins, are molecular devices for monitoring environmental changes. We report here that sensory domain sharing is widespread among different classes of transmembrane receptors. We have identified two novel conserved extracellular sensory domains, named CHASE2 and CHASE3, that are found in at least four classes of transmembrane receptors: histidine kinases, adenylate cyclases, predicted diguanylate cyclases, and either serine/threonine protein kinases (CHASE2) or methyl-accepting chemotaxis proteins (CHASE3). Three other extracellular sensory domains were shared by at least two different classes of transmembrane receptors: histidine kinases and either diguanylate cyclases, adenylate cyclases, or phosphodiesterases. These observations suggest that microorganisms use similar conserved domains to sense similar environmental signals and transmit this information via different signal transduction pathways to different regulatory circuits: transcriptional regulation (histidine kinases), chemotaxis (methyl-accepting proteins), catabolite repression (adenylate cyclases), and modulation of enzyme activity (diguanylate cyclases and phosphodiesterases). The variety of signaling pathways using the CHASE-type domains indicates that these domains sense some critically important extracellular signals.
Signal transduction networks are information-processing pathways that recognize various physicochemical stimuli, amplify and process signals, and trigger adaptive responses in a cell. In prokaryotes, the best-studied and most frequently found signal transduction pathways are centered around two components, a sensor histidine kinase and a response regulator (for recent reviews, see references 28, 29, 56, and 62). Most of the characterized two-component systems regulate gene expression and utilize sensor (class I) histidine kinases for both stimulus detection and signal transmission to a response regulator (18). A specialized version of a two-component system that regulates microbial motility (chemotaxis) utilizes chemotaxis transducers (methyl-accepting chemotaxis proteins [MCPs]) that detect stimuli and transmit the signal to a nonsensor (class II) histidine kinase, CheA, which transmits it to its cognate response regulator, CheY (10, 11). Recent experimental data and genomic studies revealed the existence of additional sensory pathways in microorganisms that combine different elements of two-component signaling with various output modules, including adenylate cyclases and putative diguanylate cyclases and phosphodiesterases (8, 22, 25, 40, 48, 58).
The similarity between systems governed by sensor histidine kinases and by chemotaxis histidine kinases was first recognized in conserved but differently organized signaling domains involved in the phosphotransfer between the kinase and response regulator (5, 47). The advent of genomics resulted in detecting less-pronounced similarities between sensor histidine kinases, MCPs, and sensor cyclases. All three major types of sensor proteins share conserved intracellular sensory domains, PAS (49, 60, 70) and GAF (7, 68), as well as a specialized intracellular module for transmembrane signaling, the HAMP domain (6, 63).
A majority of sensor histidine kinases and MCPs recognize extracellular stimuli. The transmembrane MCPs of Escherichia coli serve as the model for studying transmembrane signaling (19). Comparative genomic analysis detected hundreds of MCPs in dozens of sequenced microbial genomes that have the same membrane topology as the E. coli proteins (69) and therefore are predicted to have a similar signaling mechanism. However, the extracellular ligand-binding domains of various MCPs and other sensor proteins that serve as input modules in microbial signal transduction appeared to comprise an extremely diverse group. Experimental studies seem to support this view. For example, the extracellular sensory domains of MCPs from E. coli (19), Bacillus subtilis (1), Pseudomonas aeruginosa (64), and Halobacterium salinarum (36) have quite different repertoires of chemical ligands, ranging from ions to polypeptides.
The recent explosion of microbial genomic data and the improvement in bioinformatic tools helped identify two types of extracellular sensing domains that are present in various transmembrane sensor proteins. The Cache (Ca2+ channels, chemotaxis receptors) domain (3, 69) is common to three major classes of sensor proteins: class I histidine kinases, MCPs, and adenylate cyclases. It is found in several characterized proteins: McpB (26) and McpC (23) of B. subtilis, which serve as amino acid and carbohydrate sensors, respectively, and the DctB histidine kinase of Rhizobium leguminosarum, which is a dicarboxylate sensor (51). In addition, the Cache domain has been found in association with the GGDEF and HD-GYP domains, which have not yet been characterized biochemically but are believed to have diguanylate cyclase (8, 48) and phosphodiesterase (21) activity, respectively (see reference 22 for a review). A recent survey detected Cache domains in more than 100 transmembrane sensors from various species, including the pathogens Borrelia burgdorferi, Campylobacter jejuni, Helicobacter pylori, Pseudomonas aeruginosa, Treponema denticola, and Vibrio cholerae (C. J. Shu, N. Lanka, and I. B. Zhulin, unpublished observations).
The CHASE (cyclase/histidine kinase-associated sensing extracellular) domain (4, 42) is another extracellular module that is present in more than one type of sensory protein. It is less common than the Cache domain and was detected in sensor histidine kinases, adenylate cyclases, and predicted diguanylate cyclases/phosphodiesterases, but not in chemotaxis transducers. It appears to recognize such stimuli as cytokinins and short peptides that are important for the developmental program of an organism.
In this study, we analyzed the extracellular domains that are found in different classes of sensory proteins encoded in recently sequenced bacterial and archaeal genomes. We identified five such domains, including two domains that were found in as many as four classes of transmembrane receptors. We named these novel domains CHASE2 through CHASE6 in order to maintain a uniform nomenclature for the extracellular sensory domains and underscore their occurrence in histidine kinases and various other membrane sensors.
Similarity searches were performed against the nonredundant protein database and the database of finished and unfinished microbial genome sequences at the National Center for Biotechnology Information (NCBI, Bethesda, Md.). Standard BLAST searches (2) were performed with default parameters. Position-specific iterative (PSI) BLAST searches (2) were performed with the inclusion threshold (E value) of 0.01. Domain architecture analyses were carried out by individually comparing the protein sequences against the Conserved Domain Database at NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), using reverse-position-specific (RPS) BLAST, against the Clusters of Orthologous Groups (COG) database at NCBI (http://www.ncbi.nlm.nih.gov/COG) using its COGnitor tool (59) and by searching the SMART (39) and Pfam (9) databases with HMMer with default parameters. The ProDom database (13, 14) was searched for automatically generated alignments of protein sequence regions of interest.
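The role of the inclusion threshold in an iterative search can be illustrated with a minimal Python sketch. It is not the PSI-BLAST implementation; it only shows the selection rule, assuming that hits at or below the stated E value of 0.01 are carried into the profile for the next iteration and that a search is considered converged when an iteration adds no new sequences. The function names, sequence identifiers, and E values are hypothetical.

```python
from typing import Dict, Set

INCLUSION_E_VALUE = 0.01  # inclusion threshold stated in the text


def select_for_profile(hits: Dict[str, float],
                       threshold: float = INCLUSION_E_VALUE) -> Set[str]:
    """Return IDs of hits whose E value is at or below the inclusion threshold."""
    return {seq_id for seq_id, e_value in hits.items() if e_value <= threshold}


def has_converged(previous: Set[str], current: Set[str]) -> bool:
    """Treat the search as converged when no new sequence was added."""
    return current <= previous


# Hypothetical hits (sequence ID -> E value) from two successive iterations.
round1 = {"protA": 1e-40, "protB": 3e-5, "protC": 0.2}
round2 = {"protA": 1e-45, "protB": 1e-8, "protC": 0.004, "protD": 5.0}

included1 = select_for_profile(round1)
included2 = select_for_profile(round2)
print(sorted(included1), sorted(included2), has_converged(included1, included2))
```

In this toy case the second iteration pulls in one additional sequence, so the search has not yet converged.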
Multiple alignments were constructed with Clustal (61) and T-Coffee (44) programs and manually adjusted based on the PSI-BLAST outputs. The consensus for each multiple alignment was calculated with a custom Perl script by Nigel Brown and Jianmei Lai (available at http://www.bork.embl-heidelberg.de/Alignment/consensus.html).
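As an illustration of what a consensus line summarizes, the following Python sketch computes a simple identity-based consensus for each alignment column. It is not the Perl script cited above, which is not reproduced here; the 80% default threshold, the function name, and the toy alignment are assumptions made for the example, and residue-class consensus (e.g., aromatic or aliphatic) is deliberately left out.

```python
from collections import Counter
from typing import List


def column_consensus(alignment: List[str], threshold: float = 0.8,
                     gap: str = "-") -> str:
    """Return one consensus character per column.

    A column yields its most frequent residue if that residue is not a gap
    and reaches the given fraction of sequences; otherwise it yields '.'.
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for i in range(length):
        column = [row[i].upper() for row in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(column) >= threshold:
            consensus.append(residue)
        else:
            consensus.append(".")
    return "".join(consensus)


# Hypothetical four-sequence alignment used only to exercise the function.
toy = ["RGFLLT", "RGYLVT", "RGFILS", "RGFLLT"]
print(column_consensus(toy))         # strict: only invariant columns survive
print(column_consensus(toy, 0.75))   # relaxed: 3 of 4 sequences suffice
```

Lowering the threshold trades specificity for coverage, which is why consensus lines are usually reported at several stringency levels.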
Secondary-structure predictions were performed with the JPRED2 server (16), which utilizes several independent algorithms, including the PHD program (52). Fold recognition was performed with the 3D-PSSM (position-specific scoring matrix) program (32).
All novel domains were identified in the course of systematic genomic analysis of microbial signal transduction proteins (22, 69) and assigning putative signaling proteins from newly sequenced bacterial genomes to the COG database (59). They were initially found as conserved N-terminal domains fused to different C-terminal domains, such as histidine kinase, adenylate cyclase, predicted diguanylate cyclase (with or without the putative phosphodiesterase domain), and/or MCP domains and further defined with exhaustive, reciprocal PSI-BLAST searches initiated with predicted extracellular regions of each signal transduction protein.
Several conserved regions appeared to define novel domains which were not detected in current domain databases, such as SMART, Pfam, or COG. Threading of these regions of conservation against the database of known three-dimensional structures failed to identify a recognizable fold (no hits with more than 50% certainty based on the 3D-PSSM E value), suggesting that each of these domains had a novel structure.
PSI-BLAST searches initiated with a predicted periplasmic region (residues 44 to 309) of a sensor histidine kinase from Pseudomonas aeruginosa (gi 15599231; accession no. AAG07423) identified more than 30 proteins that contained similar regions. Figure 1 shows a multiple alignment of this domain, termed CHASE2 based on its extracellular location and occurrence in sensor cyclases and histidine kinases. CHASE2 comprises a 250- to 300-amino-acid-long domain that consists of six predicted α-helices and eight predicted β-strands. Conserved charged residues are signature elements of the CHASE2 domain: two in the α-1 helix, two bordering the β-1 strand, one in the α-2 helix, one in the α-2-α-3 loop, one in the α-4 helix, and two preceding the β-4 strand.
Analysis of the domain architecture of CHASE2-containing proteins revealed that this domain is present in four classes of sensory proteins from a variety of bacterial signal transduction pathways (Fig. 2): class I histidine kinases, adenylate cyclases, predicted diguanylate cyclases, and serine/threonine kinases. The latter combination, found in two Nostoc sp. strain PCC 7120 (also referred to as Anabaena sp. strain PCC 7120) proteins (Fig. 2), differs from the cytoplasmically located serine/threonine kinases described earlier in Nostoc spp. (24, 67) and in some other bacteria (33, 50, 55), as these proteins have a typical transmembrane receptor organization, indicating that they respond to extracytoplasmic signals. In all cases, the CHASE2 domain is found N-terminally to the signaling domain and is flanked by predicted transmembrane regions, indicating an extracytoplasmic (periplasmic, thylakoidal) localization. The CHASE2 domain was always followed by three transmembrane regions (Fig. 2) but not the HAMP domain, which has been implicated in transmembrane signaling (6, 63).
The CHASE3 domain was identified in PSI-BLAST searches initiated with a predicted periplasmic region (residues 29 to 183) of the adenylate cyclase CyaA from a cyanobacterium, Spirulina platensis (gi 2575805; accession no. BAA22996) (66). Similar regions were identified in more than 20 proteins. Figure 3 shows a multiple alignment of CHASE3 domains. CHASE3 comprises a 130- to 150-amino-acid-long domain that appears to be entirely α-helical (four to six putative α-helices). No coiled-coil regions were predicted in this domain. The signature motif for the CHASE3 domain (Arg-Gly-aromatic-aliphatic-aliphatic-alcohol residue) is located in a highly conserved loop connecting the α-1 and α-2 helices (Fig. 3). Other noticeable elements of the CHASE3 domain are a conserved tyrosine in the α-2 helix and a conserved charged residue in the α-5 helix. Although no statistically significant hits (more than 50% certainty based on the 3D-PSSM E value) were obtained by threading of the CHASE3 domain against the database of known three-dimensional structures, the CHASE3 domain showed remote similarity to the periplasmic ligand-binding domain of the aspartate chemoreceptor Tar from both Escherichia coli (3D-PSSM E value, 2.71) and Salmonella enterica serovar Typhimurium (3D-PSSM E value, 3.69). This similarity may reflect an antiparallel helical bundle characteristic of the Tar protein (41).
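The CHASE3 signature (Arg-Gly-aromatic-aliphatic-aliphatic-alcohol) can be written as a simple sequence pattern. The Python sketch below is only an illustration: the residue classes (aromatic = F, H, W, Y; aliphatic = I, L, V; alcohol = S, T) are assumed groupings that may differ from those used to build the alignment, the function name is hypothetical, and the test sequence is a made-up string that embeds the RGFLLT motif reported in this work for VsrA.

```python
import re
from typing import List, Tuple

# Assumed residue classes; other groupings are in common use.
AROMATIC = "FHWY"
ALIPHATIC = "ILV"
ALCOHOL = "ST"

# Arg-Gly-aromatic-aliphatic-aliphatic-alcohol
CHASE3_SIGNATURE = re.compile(
    f"RG[{AROMATIC}][{ALIPHATIC}][{ALIPHATIC}][{ALCOHOL}]"
)


def find_signature(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each match."""
    return [(m.start() + 1, m.group())
            for m in CHASE3_SIGNATURE.finditer(sequence.upper())]


# Hypothetical fragment, not a real protein sequence.
example = "MKAAARGFLLTQEL"
print(find_signature(example))
```

A six-residue pattern of this kind is far too short to identify a domain on its own; it is useful only for locating the conserved loop within regions already assigned to CHASE3 by profile searches.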
Analysis of the domain architecture of CHASE3-containing proteins revealed that it occurs in four classes of sensory proteins (Fig. 4). CHASE3 domains are present in class I histidine kinases, methyl-accepting chemotaxis proteins, adenylate cyclases, and diguanylate cyclases. In multidomain, receptor-like proteins, the CHASE3 domain is found N-terminal to the signaling domain and is flanked by two predicted transmembrane regions, indicating an extracytoplasmic localization. The CHASE3 domain is often found in combination with the intracellular sensory domains PAS and GAF. Some CHASE3-containing proteins have the HAMP domain, whereas others do not (Fig. 4).
The CHASE4 domain was identified in PSI-BLAST searches initiated with a predicted periplasmic region (residues 47 to 306) of a predicted sensor histidine kinase from the hyperthermophilic archaeon Archaeoglobus fulgidus (gi 11499310; accession no. AAB89530). The novel domain was identified in 19 proteins. Figure 5 shows a multiple alignment of CHASE4 domains. This is a 150- to 160-amino-acid-long domain with an apparent αβ fold: six β-strands flanked by two α-helices. An extensive (20 to 45 amino acid residues) loop is predicted in the middle portion of the CHASE4 domain (omitted from the alignment). The signature motif of the novel domain, WDD (Trp-Asp-Asp), is located in the first long α-helix. Similar to CHASE2 and CHASE3, the CHASE4 domain is found exclusively in signal transduction proteins. We detected CHASE4 in only two classes, sensor histidine kinases and diguanylate cyclases/phosphodiesterases (Fig. 6). Some CHASE4-containing proteins also have HAMP and PAS domains. Like other extracellular sensory domains, CHASE4 is always found between two transmembrane regions N-terminal to known signaling domains.
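The topological criterion used throughout (a sensory region lying between two predicted transmembrane segments) can be expressed as a small coordinate check. The Python sketch below is a hedged illustration, not the procedure used in this study: the function name is hypothetical, the transmembrane coordinates are invented, and only the 47-to-306 region is taken from the text. It does not predict membrane orientation; it only tests whether a region is bracketed by transmembrane segments without overlapping any of them.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive (start, end)


def is_flanked_by_tm(domain: Segment, tm_segments: List[Segment]) -> bool:
    """True if the domain has a TM segment on each side and overlaps none."""
    start, end = domain
    has_upstream = any(tm_end < start for _, tm_end in tm_segments)
    has_downstream = any(tm_start > end for tm_start, _ in tm_segments)
    overlaps_tm = any(tm_start <= end and tm_end >= start
                      for tm_start, tm_end in tm_segments)
    return has_upstream and has_downstream and not overlaps_tm


# Region from the text (residues 47 to 306); TM coordinates are hypothetical.
region = (47, 306)
two_tm = [(25, 45), (308, 330)]
one_tm = [(25, 45)]

print(is_flanked_by_tm(region, two_tm))  # bracketed by two TM segments
print(is_flanked_by_tm(region, one_tm))  # no downstream TM segment
```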
Careful analysis of domain architectures of the transmembrane receptors encoded in complete microbial genomes identified two more extracellular domains that were shared by two classes of transmembrane receptors. The extracellular domain common to histidine kinases and predicted diguanylate cyclases was named CHASE5, and another extracellular domain found in histidine kinases and the HDc phosphodiesterases was named CHASE6 (Fig. 7 and Table 1).
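The distribution of the five domains across receptor classes, as stated in the preceding paragraphs, can be collected into a small lookup table. The Python sketch below is a convenience summary only; the class labels are shorthand chosen for the example, and the CHASE4 entry collapses "diguanylate cyclases/phosphodiesterases" into a single diguanylate cyclase label, which is a simplification.

```python
from typing import Dict, FrozenSet, List, Set

# Receptor classes in which each domain was reported in the text above.
DOMAIN_CLASSES: Dict[str, FrozenSet[str]] = {
    "CHASE2": frozenset({"histidine kinase", "adenylate cyclase",
                         "diguanylate cyclase", "Ser/Thr kinase"}),
    "CHASE3": frozenset({"histidine kinase", "MCP",
                         "adenylate cyclase", "diguanylate cyclase"}),
    "CHASE4": frozenset({"histidine kinase", "diguanylate cyclase"}),
    "CHASE5": frozenset({"histidine kinase", "diguanylate cyclase"}),
    "CHASE6": frozenset({"histidine kinase", "phosphodiesterase"}),
}


def classes_shared_by_all(table: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Receptor classes that contain every domain in the table."""
    return set(frozenset.intersection(*table.values()))


def domains_in_class(table: Dict[str, FrozenSet[str]],
                     receptor_class: str) -> List[str]:
    """Sorted names of domains reported in the given receptor class."""
    return sorted(name for name, classes in table.items()
                  if receptor_class in classes)


print(classes_shared_by_all(DOMAIN_CLASSES))
print(domains_in_class(DOMAIN_CLASSES, "MCP"))
print(domains_in_class(DOMAIN_CLASSES, "adenylate cyclase"))
```

Queried this way, the table makes the pattern explicit: histidine kinases are the one class that carries all five domains, whereas chemotaxis transducers carry only CHASE3.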
The ability to sense extracellular stimuli is a fundamental property of living cells. However, little is known about the diversity of extracellular sensor elements of biological receptors. By using sensitive database searches combined with detailed protein sequence analysis, we have identified several novel extracellular sensory domains that are found in various bacterial transmembrane receptors. All of the domains are found in sensor histidine kinases and nucleotide (adenylate and diguanylate) cyclases, similar to the distribution of two previously described extracellular sensory domains, Cache (3, 69) and CHASE (4, 42). CHASE2 was recently described as being an exclusively cyanobacterial (CMS, cyanobacterial membrane sensor) domain (45). Indeed, in the cyanobacterium Nostoc sp. strain PCC 7120, the CHASE2 domain is present in 13 proteins. However, our results demonstrate that CHASE2 is found in transmembrane receptors in many species outside the cyanobacterial lineage. These include Deinococcus radiodurans, spirochetes, and various proteobacterial species (Fig. 1 and 2). Similarly, the CHASE3 domain is present in several proteins in cyanobacteria but is also found in D. radiodurans and various gram-positive bacteria and proteobacteria (Fig. 3 and 4).
While neither CHASE2 nor CHASE3 has been found in Archaea, CHASE4 and CHASE6 domains are encoded in archaeal species as well as in representatives of gram-positive bacteria and proteobacteria (CHASE4) and cyanobacteria (CHASE6). Thus, novel sensory domains have a wide phyletic distribution. Two-component regulatory systems are found in only a few species of Archaea and could have been acquired from Bacteria through lateral gene transfer (34, 37). Thus, the prevalence of CHASE-like sensory domains in Bacteria is not surprising.
Some of the proteins that contain newly defined domains have been studied experimentally, for example, the CHASE3-containing adenylate cyclase CyaA from the cyanobacteria Spirulina platensis and Nostoc sp. strain PCC 7120 (31, 66) and histidine kinase VsrA from Ralstonia (formerly Pseudomonas) solanacearum (54). The histidine kinase VsrA was identified as a critical sensor required for expression of virulence factors, both polysaccharides and proteins, in this wilt-inducing phytopathogen. CHASE3-containing sensors are also found in other important pathogens of plants and mammals, such as Pseudomonas syringae, Burkholderia fungorum, Legionella pneumophila, and Bacillus anthracis (unpublished genomes; data not shown). A recent study of the adenylate cyclase CyaA from Myxococcus xanthus demonstrated that this CHASE2 domain-containing protein participates in signal transduction during osmotic stress and might function as an osmosensor (35). Thus, it is an attractive hypothesis that the CHASE2 domain is the osmosensing module in all other types of transmembrane receptors in which it is found.
At this time, the stimuli that are recognized by other novel domains are unknown. Identification of conserved residues in CHASE-like domains presented in this work, such as the RGFLLT motif in the VsrA protein (Fig. 3), may assist investigators in future studies of the ligand specificity of various transmembrane receptors.
The CHASE2 and CHASE4 domains have not been found in MCPs. However, as sensor elements of histidine kinases, they are present in proteobacterial species that have large numbers of MCPs, for example, Pseudomonas aeruginosa (57), Ralstonia solanacearum (53), and Vibrio cholerae (27). This observation suggests that CHASE2 and CHASE4 recognize stimuli that are not important for the immediate motility response but might be critical for regulation of metabolism or cell development. It is remarkable that in archaea, CHASE4 appears to be associated exclusively with histidine kinases, while in bacteria it is associated exclusively with diguanylate cyclases (Fig. 6). Uncovering the reasons for this dichotomy should help in understanding the principles of signal transduction in these two groups of prokaryotes.
It is important to note that the complex multidomain organization of transmembrane sensors seriously complicates functional annotation of these proteins. In the course of microbial genome sequencing projects, signaling proteins containing the CHASE2, CHASE3, and CHASE4 domains have been repeatedly misannotated and deposited in GenBank under obscure or erroneous names. For example, the histidine kinase (All3347, gi 17230839) and an uncharacterized protein (Alr0357, gi 17227853) from Nostoc sp. strain PCC 7120 (Fig. 2) were both annotated as adenylate cyclases (30), although their similarity to adenylate cyclases was limited to the common N-terminal CHASE2 domain. Curiously, three virtually identical CHASE2-containing proteins were annotated as adenylate cyclase (All7310, gi 17233326), similar to adenylate cyclase (All3180, gi 17230672), and a hypothetical protein (Alr1378, gi 17228873), although none of them contained the cyclase domain (Fig. 2). In fact, annotation of transmembrane sensors as unknown, hypothetical, or conserved hypothetical proteins is quite common.
As we have argued earlier, short of a systematic mistake in sequencing, a protein that is conserved across diverse phylogenetic lineages is no longer hypothetical (20). We hope that delineation of the novel sensor domains in this work will help simplify the recognition of transmembrane receptor proteins and eventually improve their annotation. It is clear, however, that fully consistent annotation of the CHASE-type domain-containing sensors will be impossible without experimentally identifying their ligands and understanding the full set of environmental signals sensed by these domains.
Although most of the novel domains described here are found in transmembrane receptors, they might utilize different mechanisms of signal transduction across the membrane. CHASE3 and CHASE4 are often found in proteins that also contain the HAMP domain (6, 63). Moreover, the CHASE3 domain shows remote (likely topological) similarity to the periplasmic ligand-binding domain of the HAMP domain-containing Tar chemoreceptor (41). Therefore, in CHASE3-containing receptors, signal translocation across the membrane might be similar to the piston model suggested for Tar (12, 46). However, the CHASE2 domain is never found in combination with the HAMP domain but is always followed by three transmembrane helices. This unusual motif occurred without exception in all proteins in which CHASE2 was detected (Fig. 2) and might be an attractive subject for future studies on novel mechanisms of transmembrane signaling.
In summary, our results indicate that various transmembrane receptors that transduce signals into diverse regulatory pathways may utilize similar sensory (input) domains. The pervasiveness of a particular sensor domain in receptors from diverse regulatory networks within a given organism suggests that the stimuli recognized by this domain might be important for the biology of the organism.
This work was supported, in part, by National Science Foundation grant EIA-0219079 (to I.B.Z.).
We thank Darren Natale for pointing out the cupin output domain. We acknowledge the availability of unfinished genome sequences from the Institute for Genomic Research, the Sanger Institute, and the DOE Joint Genome Institute.