Sequence analysis of bacterial genomes revealed a novel DNA-binding domain. This domain is found in several response regulators of the two-component signal transduction system, such as Pseudomonas aeruginosa AlgR, involved in the regulation of alginate biosynthesis and in the pathogenesis of cystic fibrosis; Clostridium perfringens VirR, a regulator of virulence factors, and in several regulators of bacteriocin biosynthesis, previously unified in the AgrA/ComE family. Most of the transcriptional regulators that contain this DNA-binding domain are involved in biosynthesis of extracellular polysaccharides, fimbriation, expression of exoproteins, including toxins, and quorum sensing. We refer to it as the LytTR (‘litter’) domain, after Bacillus subtilis LytT and Staphylococcus aureus LytR response regulators, involved in regulation of cell autolysis. In addition to response regulators, the LytTR domain is found in combination with MHYT, PAS and other sensor domains.
Response regulators of the microbial two-component signal transduction systems consist of an N-terminal CheY-like receiver domain and a C-terminal output domain, typically a DNA-binding helix–turn–helix (HTH) or winged helix domain (1–3). In response to an environmental stimulus, a phosphoryl group is transferred from the sensor histidine kinase to an Asp residue in the CheY-like receiver domain of the cognate response regulator. Although the CheY-like domains do not bind DNA by themselves, phosphorylation of these domains induces conformational changes that affect the binding of the associated HTH domains to their recognition sites on the chromosomal DNA. Phosphorylation-induced conformational changes in the response regulator molecule have been demonstrated in direct structural studies (4,5).
Recent studies, including analyses of the response regulators encoded in complete bacterial genomes, illuminated a previously unseen diversity of their output domains. The HTH domains were found to fall into two main categories: NarL/FixJ family regulators contain a single HTH motif in the middle of a four-helix bundle (6), whereas regulators of the AraC/XylS family contain two HTH motifs separated by a linker helix and are usually flanked by two additional α-helices (7). A majority of known response regulators, including the well studied OmpR and PhoB proteins, were found to contain an alternative DNA-binding domain, referred to as ‘winged helix’ or ‘winged HTH’ (8). In addition, response regulators can contain other DNA-binding domains (1,9–11), as well as alternative output domains, such as AAA-type ATPase, methylesterase, GGDEF, EAL and HD-GYP domains, which do not bind DNA (1–3,12). Finally, alternative output domains can be combined with the DNA-binding domains, e.g. in the NtrC family of transcriptional regulators (2,4).
A detailed study of 13 response regulators encoded in the Streptococcus pneumoniae genome found a CheY-HTH domain organization in four of them; seven more had an OmpR-like winged helix DNA-binding domain (13). The two remaining response regulators, AgrA and BlpR, were found to have an unusual C-terminal DNA-binding domain, specific for the AgrA/ComE family (13). The importance of this domain was underscored by the observation that BlpR appeared to be the only response regulator essential for growth of S.pneumoniae cells (13,14).
Here, we present sequence analysis of this unusual DNA-binding domain, which shows no significant similarity to the HTH or winged-helix domains found in the previously characterized response regulators. Bacterial response regulators containing this novel DNA-binding domain comprise a well defined family of proteins that regulate production of important virulence factors, such as toxins, bacteriocins and extracellular polysaccharides.
Initial similarity searches against the NCBI non-redundant protein database were performed using the PSI-BLAST program run to convergence with S.pneumoniae BlpS (NCBI protein database entry CAC03514.1) or the cytoplasmic portion of Oligotropha carboxidovorans CoxC (NCBI protein database entry CAB76400.1) as queries. C-terminal DNA-binding domains of various response regulators identified in these initial searches were used as queries for subsequent PSI-BLAST searches with varying parameters (because of the short size of the query proteins, to improve the sensitivity of the search, the expect value threshold was increased to 100 or more, the word size was decreased to 2, and the inclusion threshold was changed to 0.02) and for non-iterated BLAST searches against finished and unfinished microbial genome sequences (http://www.ncbi.nlm.nih.gov/BLAST). The multiple alignment of the representative sequences, retrieved in these searches, was generated by PSI-BLAST, followed by minimal manual editing. A major part of the LytTR domain (missing the first one or two β-strands) is listed in the ProDom database (http://prodes.toulouse.inra.fr/prodom/doc/prodom.html) as domain PD032235. Response regulators containing the LytTR domain comprise COG3279 in the COG database (http://www.ncbi.nlm.nih.gov/COG/).
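For readers who want to try a comparable search with a local installation, the settings quoted above can be written as an argument list for the command-line `psiblast` program from the NCBI BLAST+ package. This is only a sketch: BLAST+ is swapped in here as a stand-in for whichever PSI-BLAST version was used for the searches described above, and the file names and database name are hypothetical placeholders. The function builds the command and does not run it.

```python
from typing import List


def psiblast_command(query_fasta: str, database: str, out_path: str) -> List[str]:
    """Build an argument list for the BLAST+ `psiblast` program.

    The numeric settings mirror the ones quoted in the text; the query file,
    database and output path are placeholders supplied by the caller.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", "100",              # expect value threshold raised to 100
        "-word_size", "2",             # word size lowered to 2
        "-inclusion_ethresh", "0.02",  # profile inclusion threshold
        "-num_iterations", "0",        # 0 = iterate until convergence
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    print(" ".join(psiblast_command("lyttr_domain.fasta", "nr", "lyttr_hits.txt")))
```

The resulting list can be passed to `subprocess.run` on a machine where BLAST+ and a formatted protein database are available.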
Secondary structure predictions were performed using the full set of programs available at the PredictProtein (http://cubic.bioc.columbia.edu/predictprotein/), JPred (http://jpred.ebi.ac.uk/) and UCSC HMM (http://www.cse.ucsc.edu/research/compbio/) web servers. The pI values were calculated using the Compute pI/Mw tool at the ExPASy web server (http://www.expasy.org/).
The consensus sequence of the LytTR-binding site was deduced from a library of the upstream regulatory DNA sequences that have been experimentally demonstrated to bind response regulators with the LytTR domain. This library included AlgR-binding sites RB1, RB2 and RB3 for Pseudomonas aeruginosa algD gene (15,16) and the AlgR-binding site of the algC gene; sequences upstream of Lactobacillus plantarum plnABCD, plnEFI and plnJKLR operons that bind PlnC and PlnD (17); Lactobacillus sake sppA, sppIP, sppT_1 and sppT_3 probes that bind SppR, PlnC and PlnD (18); probes DR1, DR2 and DR3 with sequences upstream of S.pneumoniae comAB, comCDE and orf12 genes that bind ComE (19); and two probes covering the upstream region of Clostridium perfringens pfoA gene that binds VirR (20). With this library as an input, the SignalX program (21), a module of the GenomeExplorer program suite (22), was used to generate the profiles of monomeric and tandem LytTR-binding sites and to use these profiles to search for the likely LytTR-binding sites in selected DNA sequences.
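The general idea of turning a library of aligned binding sites into a searchable profile can be illustrated with a plain position weight matrix. The sketch below is not the SignalX algorithm, whose internals are not described here; it is a generic log-odds matrix with a uniform background and a pseudocount, both of which are assumptions. The four example sites are made up for illustration and are not taken from the experimental library.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return a log-odds matrix (one dict per column) from equal-length sites.

    Background frequencies are assumed to be uniform (0.25 per base).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for col in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[col]] += 1.0
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def best_hit(pwm: List[Dict[str, float]], sequence: str) -> Tuple[float, int]:
    """Scan one strand and return (score, offset) of the best-scoring window."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Made-up example sites (not from the experimental library).
TOY_SITES = ["TACGTTAAT", "AACGTTCGT", "TCAGTTGAG", "TACGTTTGT"]

if __name__ == "__main__":
    matrix = build_pwm(TOY_SITES)
    print(best_hit(matrix, "GGGGGGTACGTTAGTCCCCC"))
```

A real search would also scan the reverse complement and would need a score cutoff calibrated against known sites.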
PSI-BLAST searches of the NCBI non-redundant protein sequence database using the BlpS protein of S.pneumoniae as a query converged after five to eight iterations (depending on the search parameters used), retrieving a set of 94 proteins (as of January 1, 2002) with several characteristic sequence motifs (Fig. 1). Most of those hits comprised C-terminal portions of bacterial response regulators, including such well characterized ones as P.aeruginosa AlgR, Staphylococcus aureus AgrA and LytR, S.pneumoniae ComE and C.perfringens VirR (Table 1). Since these response regulators all contain N-terminal CheY-like domains, which do not interact directly with DNA, their C-terminal parts must be responsible for their well documented binding to specific DNA sequences in the upstream regions of target genes (9,10,15–20). Using the C-terminal domains of any of these response regulators as PSI-BLAST queries each time retrieved the same set of 94 proteins, indicating that these domains form a tight family, unrelated to any other DNA-binding domains. Because of a certain confusion with the nomenclature of individual proteins of this family, we refer to this conserved DNA-binding domain found in Bacillus subtilis protein LytT and S.aureus transcriptional regulator LytR as the LytTR (‘litter’) domain, in line with the designation of the Staphylococcus lugdunensis synergistic hemolysin locus as slush (23). [In addition to the S.aureus LytR protein, the name LytR is also used for another group of transcriptional regulators (e.g. B.subtilis LytR), which are unrelated to members of the LytTR family (see COG1316 in the COG database, http://www.ncbi.nlm.nih.gov/COG/). Likewise, none of the proteins encoded in the comE operon of B.subtilis (ComEA, ComEB or ComEC) are related to the streptococcal ComE proteins or belong to the LytTR family. Finally, the VirR protein from Agrobacterium tumefaciens is also unrelated to the LytTR family proteins with the same name. Instead, it belongs to a large family of uncharacterized ‘VirR-related’ proteins that comprise COG1720 in the COG database and UPF0066 in PROSITE, http://www.expasy.org/prosite/.] Accordingly, we refer to transcriptional regulators that contain the LytTR domain (Table 2) as the LytTR family proteins.
Sequence analysis of the LytTR domain indicates that it is most likely composed of four β-strands, followed by two α-helices, a β-strand and one more α-helix (Fig. 1A). The most conserved sequence motif FhRhH[RK][SNQ]hhVN (where ‘h’ indicates a hydrophobic amino acid residue) is located at the beginning of the second α-helix in the region that could potentially comprise an HTH motif (Fig. 1A and B). However, the variable distance between the two α-helices in different proteins and a lack of sequence conservation in this spacer region (Fig. 1A) argue against its role in DNA binding. In addition, some of the programs used predict a short β-strand in this region, while others predict a single long α-helix, which makes the structural assignment in this region somewhat questionable. In any case, the predicted secondary structure of the LytTR domain is not similar to that of any protein with known three-dimensional structure, indicating that it comprises a novel type of DNA-binding domain.
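The motif notation used above translates directly into a regular expression once a residue set is chosen for ‘h’. The sketch below assumes that ‘hydrophobic’ means any of A, C, F, I, L, M, V, W or Y; this set is an assumption made for illustration and is not defined in the text. The test peptide is hypothetical.

```python
import re
from typing import Optional

# Assumed definition of 'h' (hydrophobic); not specified in the text.
HYDROPHOBIC = "ACFILMVWY"

# Motif exactly as written in the text, with 'h' as the hydrophobic placeholder.
LYTTR_MOTIF = "FhRhH[RK][SNQ]hhVN"


def motif_to_regex(motif: str, hydrophobic: str = HYDROPHOBIC) -> re.Pattern:
    """Expand every lowercase 'h' into a character class of hydrophobic residues."""
    return re.compile(motif.replace("h", f"[{hydrophobic}]"))


def find_motif(protein: str) -> Optional[re.Match]:
    """Return the first motif match in an upper-case protein sequence, or None."""
    return motif_to_regex(LYTTR_MOTIF).search(protein)


if __name__ == "__main__":
    hit = find_motif("MKFIRVHRSYIVNAA")  # hypothetical peptide
    print(hit.start(), hit.group())
```

An exact pattern of this kind is much stricter than the profile searches used to define the family, so it should be read as a quick filter and not as a family membership test.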
The C-terminus of the LytTR domain contains a cluster of Lys and Arg residues (Fig. 1B) that confer strong positive charge to the whole domain, bringing its pI into the 9.0–11.0 range. This positive charge has been previously proposed to play a role in the DNA binding by the response regulators of the AgrA/ComE family (10). However, the LytTR domain appears to bind DNA in a sequence-specific fashion, interacting with direct repeats of 10–12 bp with a relatively well defined consensus sequence (see below). In addition, there are several exceptions, including the B.subtilis LytT protein whose LytTR domain has a pI of 6.5. These observations argue against direct involvement of those Lys and Arg residues in DNA binding by the LytTR domain.
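To see how a cluster of Lys and Arg residues pushes the pI upwards, the isoelectric point can be estimated by finding the pH at which the Henderson–Hasselbalch net charge crosses zero. The sketch below uses an approximate, textbook-style pKa table chosen for illustration; the Compute pI/Mw tool uses its own parameters, so absolute values will differ from those quoted above. Both peptides in the example are hypothetical.

```python
# Approximate side-chain and terminal pKa values (an assumption for this sketch).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    negative = 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            positive += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            negative += 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    print(round(isoelectric_point("MKRKKLRAKK"), 2))  # hypothetical basic peptide
    print(round(isoelectric_point("MDEEDLEADE"), 2))  # hypothetical acidic peptide
```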
In addition to numerous response regulators with CheY-LytTR domain organization, the LytTR domain has been found in other contexts (Table 2). The deep-water marine bacterium Bacillus halodurans encodes a unique protein, combining an ATP-binding domain of ABC-type transporter family with a C-terminal LytTR domain. Such a domain combination has not been seen elsewhere and the regulatory signal transmitted by this protein remains enigmatic.
In CoxC and CoxH proteins from the CO-metabolizing bacterium Oligotropha carboxidovorans (24), the LytTR domain can be found in association with MHYT, a recently described sensor domain, which contains six transmembrane segments carrying conserved His and Met residues (25). The coxC and coxH genes flank the operon encoding the CO dehydrogenase and are involved in transcriptional regulation of the autotrophic metabolism in O.carboxidovorans (24). A LytTR domain-containing protein is also encoded upstream of the CO dehydrogenase-encoding genes in Hydrogenophaga pseudoflava (data not shown).
Several Gram-positive bacteria combine the LytTR domain with a short (~40 amino acids) poorly conserved N-terminal domain (see Table 2 and Fig. S2A in the Supplementary Material), dubbed LMO (‘elmo’). These proteins are likely to be transcriptional regulators whose binding to DNA is controlled by protein–protein interactions involving the LMO domain.
The RpfD protein from the plant pathogen Xanthomonas campestris (26) combines the LytTR domain with an N-terminal membrane-bound domain that consists of three predicted transmembrane segments with conserved Glu, Asp, His and Tyr residues (see Fig. S2B in the Supplementary Material). The membrane topology of this domain is somewhat similar to that of the MHYT domain (25), suggesting that it, too, might be membrane-bound and metal-binding. We refer to it as the MHYE domain. Although the rpfD gene is located in the rpf (regulation of pathogenicity factors) locus of X.campestris, mutants with a transposon insertion in the rpfD gene did not exhibit any discernible phenotype (26), just like the MHYT-less mutants of B.subtilis (25). While the function of this protein still remains unknown, it is worth noting that very similar proteins are encoded in the genomes of Caulobacter crescentus and Mesorhizobium loti (Table 2).
Studies of the DNA binding by response regulators of the AgrA family established that these proteins bind to imperfect direct repeats of the sequence pattern [TA][AC][CA]GTTN[AG][TG] separated by a 12–13 bp spacer (10,14,17–19). Remarkably similar binding sites were also identified in front of AlgR- and VirR-regulated promoters (15,20). A profile-based search with the SignalX program (21), using an alignment of the experimentally characterized LytTR-binding sites, identified potential LytTR-binding sites in the upstream region of the S.aureus lrgAB operon (27) and in front of coxC, coxD and coxH genes (24) of the O.carboxidovorans cox gene cluster (data not shown). One of the most interesting findings from these searches was the prediction of a fourth LytTR-binding site in the near-upstream region of the P.aeruginosa algD gene. This site (–71 TCCGTTTGA –79) is located next to the previously identified RB3 AlgR-binding site (16) and can actually be seen as the second AlgR-protected DNA region in figure 3 of Mohr et al. (16). The adjacency of these two AlgR-binding sites suggests that they might function together in binding an AlgR dimer in a manner similar to that of other LytTR-containing response regulators. A detailed analysis of the potential LytTR-binding sites in completely sequenced bacterial genomes will be reported elsewhere.
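The repeat architecture described above (two copies of the 9 bp pattern with a 12–13 bp spacer) can be written as a single regular expression. This is a rough sketch and not the profile-based method used for the searches: it finds only exact matches to the pattern on one strand, whereas the experimentally characterized repeats are imperfect, and it assumes that the spacer is counted from the end of the first copy to the start of the second. The test sequence is constructed for illustration.

```python
import re
from typing import List, Tuple

# Half-site pattern from the text, with N expanded to any base.
HALF_SITE = "[TA][AC][CA]GTT[ACGT][AG][TG]"

# Lookahead so that overlapping candidate sites are all reported.
TANDEM = re.compile(rf"(?=({HALF_SITE})([ACGT]{{12,13}})({HALF_SITE}))")


def find_tandem_sites(dna: str) -> List[Tuple[int, str, int, str]]:
    """Return (offset, first half-site, spacer length, second half-site) tuples."""
    dna = dna.upper()
    return [
        (m.start(), m.group(1), len(m.group(2)), m.group(3))
        for m in TANDEM.finditer(dna)
    ]


if __name__ == "__main__":
    # Constructed example: two half-sites separated by a 12 bp spacer.
    example = "GG" + "TACGTTAAT" + "C" * 12 + "AACGTTCGT" + "GG"
    print(find_tandem_sites(example))
```

Tolerating mismatches would require a scoring approach such as a weight matrix in place of an exact pattern.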
Involvement of the transcriptional regulators that contain the LytTR domain in the pathogenesis of cystic fibrosis (P.aeruginosa AlgR), toxic shock syndrome (S.aureus AgrA), gas gangrene (C.perfringens VirR) and other deadly diseases has ensured sustained interest in their structure and function. However, a detailed analysis of these proteins has been hampered by the remarkable sequence conservation of their N-terminal CheY domain, which largely masked (dis)similarities of the DNA-binding domains of various response regulators. When Shimizu et al. (28) sequenced C.perfringens virR gene, they noticed the similarity of its product to MrkE, AlgR and AgrA response regulators, but paid most attention to their common N-terminal CheY domains. Likewise, Brunskill and Bayles (29) aligned S.aureus LytR with YehT, AgrA, AlgR and MrkE, but limited that alignment to the CheY portion of the protein. Nevertheless, several previous studies reported the lack of sequence similarity of the LytTR domain to the HTH or winged-helix DNA-binding domains and succeeded in assigning LytTR-containing response regulators to two families. On one hand, Whitchurch et al. (30) linked AlgR with YehT and LytR in a new family of response regulators and Mizuno (11) identified Escherichia coli YehT and MrkE (o244) as response regulators with unique output domains. On the other hand, Pestova et al. (9) identified the S.pneumoniae ComE response regulator and demonstrated its similarity to the quorum-sensing regulators AgrA, CbnR, PlnC and PlnD and to the C.perfringens VirR. Likewise, Grebe and Stock (1) assigned AlgR, LytR, LytT and MrkE to the RC3 group of response regulators with the LytR-type output domain; in contrast, AgrA, CbnR, ComE, PlnC, PlnD, PlsR and SppR were assigned to the RD group of response regulators with the ComE-type output domain. 
Here, we have unified the AlgR/LytR and AgrA/ComE families into one and identified other proteins with the LytTR domain that are encoded in complete microbial genomes.
Although transcriptional regulators with the LytTR domain are widely distributed among the bacteria of the Bacillus/Clostridium group (low G+C Gram-positive bacteria) and are found, albeit usually in just one copy, in many proteobacteria, they are conspicuously missing in other groups of Gram-negative bacteria, in archaea and in eukaryotes (see Table S3 in the Supplementary Material). Although the LytTR domain is missing in Mycobacterium tuberculosis and Mycobacterium leprae, response regulators with this domain are encoded in other actinobacteria (high G+C Gram-positive bacteria), such as Streptomyces coelicolor and Rhodococcus ruber, as well as in the unfinished genome of Mycobacterium smegmatis (data not shown). This skewed phylogenetic distribution generally correlates with the genome size, as only those proteobacteria with genomes >2 MB (i.e. encoding at least 2000 proteins) encode a LytTR domain. Thus, although LytTR-containing transcriptional regulators play important roles in regulating virulence factors, this domain appears to be more typical for free-living bacteria and non-obligate parasites that tend to have relatively large genomes. This correlates with the involvement of LytTR-containing transcriptional regulators in complex behavioral responses, such as quorum sensing (9,18).
It is worth noting that in α-proteobacteria, the LytTR domain is found in combinations with various membrane-bound domains, including all five instances of this domain in C.crescentus (Table 2). Transcriptional regulation mediated by these proteins should occur only when the corresponding target genes are positioned in close vicinity to the cell membrane and may account for the temporary expression of those genes, which could be particularly important for the complex cell cycle of C.crescentus.
As discussed above, the LytTR domain consists of three (predicted) α-helices and four (predicted) β-strands with a potential additional short β-strand between the first and second α-helices (Fig. 1A). Remarkably, some of the available structure prediction programs predict that the conserved F[FYVL][RQ][CIV] motif forms an additional β-strand, while others predict that these residues belong to a very long α-helix, extending into the α-helix shown in Figure 1. Thus, although different programs disagree on the exact structure of this region, they all consider a loop (or turn) structure in that region unlikely. A possibility of helix-to-strand conversion has been reported in the DNA-binding region of the transcriptional regulator Fis (31). In any case, this region has no apparent similarity to the turn regions of the classical HTH motifs and can be confidently predicted to comprise a novel type of DNA-binding domain. Of course, all secondary structure assignments discussed above are just predictions that need to be verified by determining the three-dimensional structure of the LytTR domain (currently in progress). Knowledge of the LytTR structure should also help in refining its binding site, shedding some light on the range of cellular functions regulated by this remarkable system.
Supplementary Material is available at NAR Online.
We thank Eugene Koonin and Igor Rogozin for helpful suggestions, Mikhail Gelfand and Andrey Mironov for letting us use an updated version of their SignalX program, and Chester Price for critically reading the manuscript. We acknowledge the availability of unfinished genome sequences from the Institute for Genomic Research (Geobacter sulfurreducens), Washington University (Klebsiella pneumoniae) and the DOE Joint Genome Institute (Burkholderia cepacia).