The ubiquitous bacterial second messenger cyclic di-GMP (c-di-GMP) has recently become prominent as a trigger for biofilm formation in many bacteria. It is generated by diguanylate cyclases (DGCs; with GGDEF domains) and degraded by specific phosphodiesterases (PDEs; containing either EAL or HD-GYP domains). Most bacterial species contain multiple such proteins, some of which have specific functions based on direct molecular interactions in addition to their enzymatic activities. Escherichia coli K-12 laboratory strains feature 29 genes encoding GGDEF and/or EAL domains, resulting in a set of 12 DGCs, 13 PDEs, and four enzymatically inactive “degenerate” proteins that act by direct macromolecular interactions. We present here a comparative analysis of GGDEF/EAL domain-encoding genes in 61 genomes of pathogenic, commensal, and probiotic E. coli strains (including enteric pathogens such as enteroaggregative, enterohemorrhagic, enteropathogenic, enterotoxigenic, and adherent and invasive Escherichia coli and the 2011 German outbreak O104:H4 strain, as well as extraintestinal pathogenic E. coli, such as uropathogenic and meningitis-associated E. coli). We describe additional genes for two membrane-associated DGCs (DgcX and DgcY) and four PDEs (the membrane-associated PdeT, as well as the EAL domain-only proteins PdeW, PdeX, and PdeY), thus showing the pangenome of E. coli to contain at least 35 GGDEF/EAL domain proteins. A core set of only eight proteins is absolutely conserved in all 61 strains: DgcC (YaiC), DgcI (YliF), PdeB (YlaB), PdeH (YhjH), PdeK (YhjK), PdeN (Rtn), and the degenerate proteins CsrD and CdgI (YeaI). In all other GGDEF/EAL domain genes, diverse point and frameshift mutations, as well as small or large deletions, were discovered in various strains.
IMPORTANCE Our analysis reveals interesting trends in pathogenic Escherichia coli that could reflect different host cell adherence mechanisms. These may either benefit from or be counteracted by the c-di-GMP-stimulated production of amyloid curli fibers and cellulose. Thus, enteroaggregative E. coli (EAEC), which adhere in a “stacked brick” biofilm mode, have a potential for high c-di-GMP accumulation due to DgcX, a strongly expressed additional DGC. In contrast, enterohemorrhagic and uropathogenic E. coli (EHEC and UPEC), which use alternative adherence mechanisms, tend to have extra PDEs, suggesting that low cellular c-di-GMP levels are crucial for these strains under specific conditions. Overall, our study also indicates that GGDEF/EAL domain proteins evolve rapidly and thereby contribute to the adaptation of various types of E. coli to host-specific and environmental niches.
Although cyclic di-GMP (c-di-GMP) was first described as an allosteric activator of cellulose synthase as early as 1987 (1), it was only in the 21st century that it became clear that this nucleotide second messenger is ubiquitous in the bacterial world and generally promotes biofilm formation, downregulates flagella expression and/or activity, and can modulate virulence, the cell cycle, and development. Furthermore, research on c-di-GMP signaling in a small group of model bacterial species has led to novel general concepts in second messenger signaling (2–6).
c-di-GMP is generated by diguanylate cyclases (DGCs) characterized by GGDEF domains, with this amino acid motif representing the active center (the A-site). Most of these enzymes also contain a secondary, inhibitory binding site for c-di-GMP (the I-site), i.e., their activities are feedback inhibited by their own product. Specific phosphodiesterases (PDEs) that degrade c-di-GMP belong to one of two protein families, featuring either EAL or HD-GYP domains. Structures of DGCs and PDEs have been elucidated, and functionally important amino acids have been identified (7). c-di-GMP effector mechanisms operate via many unexpectedly diverse families of c-di-GMP-binding proteins and RNAs (riboswitches) (8–10). These can target virtually any molecular process in bacterial cells, including transcription, mRNA stability, translation, functional protein-protein interactions, or protein degradation.
One of the most striking features of c-di-GMP signaling is the multiplicity of GGDEF/EAL/HD-GYP domain-encoding genes in most bacterial species. The Escherichia coli K-12 laboratory strain, one of the model organisms in c-di-GMP-related research, has a complement of 29 of these genes, with 12 and 10 genes encoding GGDEF and EAL domains only, respectively, and 7 genes encoding proteins with both domains. Based on biochemical evidence and knowledge of the functions of specific highly conserved amino acids, 12 of the gene products are DGCs, 13 are PDEs, and 4 are “degenerate” proteins with nonenzymatic functions (11, 12). This is reflected in a novel systematic nomenclature for the genes encoding these enzymes and their products, proposed by a group of researchers working on c-di-GMP signaling in E. coli (13; see the report by Hengge et al. in this issue of the Journal of Bacteriology). For maximal clarity, we use the new designations here but also provide the previous “Y” designations.
For almost half of these GGDEF/EAL domain proteins, the physiological contexts of action and in some cases also molecular functions and interactions have been clarified. For example, several DGCs can contribute to downregulating flagellar rotation via the c-di-GMP-binding protein YcgR (14–16). Another major target for positive control by c-di-GMP is the expression of the biofilm regulator CsgD (15, 17), which activates the biosynthesis of amyloid curli fibers and cellulose, i.e., the components of the extracellular matrix in colony biofilms of E. coli and related bacteria (18, 19). The underlying mechanism provides a paradigm for highly specific “local” c-di-GMP signaling by distinct DGCs and PDEs: the “trigger enzyme” and PDE PdeR (YciR) and the DGC DgcM (YdaM) form a complex with the transcription factor MlrA and thereby act as a core transcriptional switch module that controls transcription of csgD (20). This switch module responds to the cellular level of c-di-GMP, which under standard laboratory conditions increases during entry into stationary phase as a result of decreasing levels of PdeH (YhjH) and increasing levels of DgcE (YegE) (15). DgcC (YaiC) is another DGC that equally specifically activates a cellular target, i.e., cellulose synthase, but here the mechanism is still unclear (21, 22). Highly specific macromolecular interactions also underlie the functions of the enzymatically inactive “degenerate” GGDEF/EAL domain proteins BluF (YcgF), CsrD (YhdA), and RflP (YdiV) (23–26). However, other GGDEF/EAL domain proteins of E. coli K-12 remain largely uncharacterized at the functional molecular level.
With some rare exceptions (27–30), analyses of GGDEF/EAL domain genes or proteins of E. coli have been performed with laboratory K-12 strains. Therefore, almost nothing is known about how the complement of GGDEF/EAL domain-encoding genes may vary in different E. coli strains, in particular when comparing pathogenic and commensal strains. E. coli is a particularly diverse species with a pangenome several times larger than the core genome conserved in all known strains (31, 32). It colonizes different host-associated niches but also thrives under quite diverse environmental conditions. This diversity suggested that the genomes of E. coli strains may represent fertile ground for discovering novel GGDEF/EAL domain genes and interesting variations in known ones, particularly since different types of pathogenic E. coli differ profoundly in the way they adhere to host tissue (33), and adhesion mechanisms in general are a major target of c-di-GMP signaling. Moreover, our study here was spurred by our analysis of c-di-GMP signaling in the 2011 German outbreak O104:H4 strain, which had already led to the discovery of a novel and extremely highly expressed DGC (DgcX) and some other interesting variations in GGDEF/EAL domain genes in enteroaggregative and enterohemorrhagic E. coli (EAEC and EHEC, respectively) (30).
Here we present a more comprehensive genomic comparison of 61 strains—including pathogens of different pathotypes as well as commensals and probiotics—that led to the discovery of additional DGC/PDE-encoding genes and numerous variations with respect to integrity and expression of already known GGDEF/EAL domain proteins. Certain variations correlate with distinct pathotypes and lifestyles of the E. coli strains analyzed and shed light on how rapid evolution of c-di-GMP signaling can contribute to the diversity and specific adaptations of different E. coli strains and pathotypes.
This study started with a systematic analysis of all 39 complete E. coli genomes available in the National Center for Biotechnology Information (NCBI) database in May 2011 (i.e., during the O104:H4 outbreak in Germany). In May 2012, an updated search added nine more genomes. By June 2014, too many new complete E. coli genomes had been deposited in the NCBI database to include them all in the present study, so 13 additional genomes were selected to complement the previous strains such that the different pathotypes of E. coli are adequately represented. Overall, 61 strains were thus included in this study (see Table S1 in the supplemental material). The strains analyzed included enteric pathogens (enteroaggregative, enterotoxigenic, enterohemorrhagic, enteropathogenic, and adherent and invasive E. coli [EAEC, ETEC, EHEC, EPEC, and AIEC, respectively]), extraintestinal pathogenic E. coli (ExPEC; including uropathogenic E. coli [UPEC] and meningitis-associated E. coli [MNEC]), and commensal E. coli strains (including probiotics such as E. coli Nissle 1917), as well as E. coli strains of nonhuman origin (avian pathogenic E. coli [APEC], porcine ETEC, and environmental isolates). E. coli K-12 W3110 was used as a reference strain here, since it has been extensively used as an experimental model for c-di-GMP signaling in E. coli (20, 24, 34–36).
The Basic Local Alignment Search Tool (BLAST) (37) was used to search for proteins with GGDEF and/or EAL domains encoded in single selected genomes. The 169-amino-acid GGDEF domain of the DgcM (YdaM) protein and the 245-amino-acid EAL domain of the PdeC (YjcC) protein (as identified by the SMART EMBL database) were used as query sequences to detect previously known, as well as unidentified, GGDEF and EAL domain proteins. If a GGDEF/EAL domain gene known from E. coli K-12 was not detected in the genome of a given strain, a second BLAST search was performed using the particular protein from the W3110 strain as the query sequence. This approach revealed true absences but also yielded apparent discrepancies in the lengths of the proteins. To verify whether these differences reflected real genomic variations or merely arbitrary differences in the annotation of the respective genome sequence, a follow-up tBLASTn search was performed, in which the amino acid sequence was searched directly against the nucleotide database, with the algorithm translating the nucleotide sequences into amino acid sequences; again, proteins from the W3110 strain served as query sequences. In this way, various GGDEF/EAL domain genes that initially seemed to be “missing” were found, whereas others were confirmed to be truly absent from the respective genomes. Apparent discrepancies in protein annotation between strains were further addressed by directly inspecting the nucleotide sequences, the annotated start codons, and putative Shine-Dalgarno sequences. Many of these cases, in particular when nucleotide sequences were nearly or completely identical, were found to be due to misannotated start codons. In other cases, true deleterious disruptions, such as insertions, deletions, or point mutations, were found at the nucleotide sequence level and were analyzed in detail by alignments (38) with the corresponding nucleotide sequences from strain W3110.
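The start-codon check described above can be illustrated with a short Python sketch. It is not the pipeline used in this study: the function names, the accepted start codons (ATG, GTG, TTG), the Shine-Dalgarno-like core motifs, and the spacer window of 4 to 15 nucleotides are illustrative assumptions. Starting from an annotated start codon, the sketch walks upstream in frame until the first stop codon and reports alternative start codons, flagging those preceded by a Shine-Dalgarno-like motif.

```python
# Sketch: look for in-frame start codons upstream of an annotated start,
# as a way to spot start codons that may have been misannotated.
# All names, motifs, and thresholds here are illustrative assumptions.

START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
SD_CORES = ("GGAGG", "AGGAG", "GGAG", "AGGA")  # assumed Shine-Dalgarno-like cores


def upstream_inframe_starts(seq, annotated_start):
    """Return 0-based positions of in-frame start codons upstream of
    `annotated_start`, scanning backwards until an in-frame stop codon."""
    seq = seq.upper()
    starts = []
    pos = annotated_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the reading frame cannot extend past an in-frame stop
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return starts


def has_sd_motif(seq, start, min_spacer=4, max_spacer=15):
    """Check for a Shine-Dalgarno-like core in the window lying between
    `max_spacer` and `min_spacer` nucleotides upstream of `start`."""
    seq = seq.upper()
    window = seq[max(0, start - max_spacer):max(0, start - min_spacer)]
    return any(core in window for core in SD_CORES)


# Hypothetical toy sequence: stop, SD-like motif, true ATG, then the
# downstream ATG that an annotation might have picked instead.
codons = ["TAA", "CAA", "GGA", "GGC", "ACA", "CAC",
          "ATG", "AAA", "CTG", "GCT", "ATG", "GCT", "TAA"]
toy = "".join(codons)
annotated = 30  # 0-based position of the annotated (downstream) ATG

for pos in upstream_inframe_starts(toy, annotated):
    label = "SD-like motif" if has_sd_motif(toy, pos) else "no SD-like motif"
    print(pos, toy[pos:pos + 3], label)
```

In this toy case, the scan finds one upstream ATG (position 18) preceded by an SD-like motif, whereas the annotated ATG has none; this is the kind of pattern that points to a misannotated start codon and a longer true protein.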
Newly identified GGDEF/EAL domain proteins and their genes, as well as novel sensory input domains, were designated according to the rules for a systematic nomenclature proposed in another publication in this issue (13).
All GGDEF/EAL domain-containing proteins were analyzed for potential uncharacterized motifs using the MEME program (39). Default settings were used, except that the condition “any number of repetitions” was selected for the prediction of how single motifs were distributed among the sequences. The locations of the motifs were determined for individual proteins relative to the locations of the putative transmembrane helices using the hydropathy plots generated by the TMHMM program (40).
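TMHMM predicts transmembrane helices with a hidden Markov model. As a much simpler stand-in that conveys the idea of a hydropathy plot, the sketch below computes a Kyte-Doolittle sliding-window profile and flags regions whose windowed mean exceeds a threshold. The window of 19 residues and the threshold of 1.6 follow commonly cited Kyte-Doolittle guidance; the function names and the toy sequence are hypothetical, and this is not how TMHMM works internally.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return (center_index, mean_hydropathy) for every full window (0-based)."""
    seq = seq.upper()
    half = window // 2
    return [
        (i + half, sum(KD[aa] for aa in seq[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge consecutive above-threshold window centers into (start, end) ranges."""
    segments = []
    for center, score in hydropathy_profile(seq, window):
        if score < threshold:
            continue
        if segments and segments[-1][1] == center - 1:
            segments[-1] = (segments[-1][0], center)  # extend the current run
        else:
            segments.append((center, center))  # start a new run
    return segments


toy = "K" * 20 + "L" * 22 + "K" * 20  # hypothetical: one hydrophobic stretch
print(candidate_tm_segments(toy))
```

The reported range marks window centers, so it is narrower than the hydrophobic stretch itself; real predictions also weigh membrane topology and loop composition, which is why dedicated tools such as TMHMM were used here.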
Our analysis of GGDEF/EAL domain genes was originally triggered by the outbreak of E. coli O104:H4 infection in May 2011 in Germany (41, 42). At that time, the NCBI genome database contained the genome sequences of 39 E. coli strains. Later on, newly completed E. coli genomes were successively added to the analysis (see Materials and Methods for details) such that the final set of 61 genomes included the different pathogroups, as well as commensal strains of E. coli (see Table S1 in the supplemental material). The following pathogroups are represented in our study: enteric pathogens, including EAEC, ETEC, EHEC (also called Shiga toxin [Stx]-producing E. coli [STEC]), EPEC, and AIEC; ExPEC, including UPEC and MNEC; and pathogenic E. coli of nonhuman origin (APEC and porcine ETEC).
Initial BLAST searches (37) using the GGDEF and EAL domain sequences of the DGC DgcM (YdaM) and the PDE PdeC (YjcC), respectively, were followed by an iterative comparative process that allowed us to pinpoint single nucleotide polymorphisms (SNPs) in specific genes, as well as the exact extent of smaller or larger deletions (see Materials and Methods). A total of 35 GGDEF/EAL domain-encoding genes (see Table S2 in the supplemental material), as well as numerous small and large variations in the sequences of distinct genes, were identified in the 61 genomes. Previously uncharacterized GGDEF/EAL domain-encoding genes, which, based on the presence of functionally relevant amino acids (see Table S2 in the supplemental material), encode active DGCs or PDEs, were named using a dgc/pde nomenclature, as suggested in the accompanying publication on the nomenclature of c-di-GMP-related enzymes in E. coli (13). The 35 genes include two DGC genes (dgcX and dgcY) and four PDE genes (pdeT, pdeW, pdeX, and pdeY) not found in E. coli K-12 (Table 1).
The 35 GGDEF/EAL domain-encoding genes are conserved with various frequencies (Fig. 1). There is a core set of eight genes that are completely conserved among all 61 strains. These encode (i) the two DGCs, DgcC (YaiC) and DgcI (YliF), (ii) the four PDEs PdeB (YlaB), PdeH (YhjH), PdeN (Rtn), and PdeK (YhjK), and (iii) the degenerate GGDEF/EAL proteins CsrD and CdgI (YeaI); these proteins seem to be functionally important independently of any host-related or environmental specialization of different E. coli strains. Furthermore, a large group of 21 GGDEF/EAL domain-encoding genes is found in versions predicted to encode functional proteins in >65% of all strains analyzed, i.e., these genes belong to a complement of ancient and typical E. coli genes but seem dispensable in certain niches or “lifestyles.” Some of these genes also display specific sequence variants that occur frequently in certain groups of strains (for details, see below). Finally, the six GGDEF/EAL domain-encoding genes not present in E. coli K-12 are found in small minorities of strains, indicating that these genes represent recent acquisitions in distinct clades or even single strains that contribute to adaptation to specific host-associated and/or environmental niches.
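The three conservation classes amount to a simple rule on the number of strains that carry a version of a gene predicted to encode a functional protein. The function below is a hypothetical sketch of that rule; the example counts are derived from numbers given in this article (dgcE is corrupted in 9 of 61 strains, dgcT is deleted in 14 and disrupted in 5 further strains, and dgcX occurs in 9 strains, here assumed to be functional wherever present).

```python
TOTAL_STRAINS = 61


def conservation_class(functional_in, total=TOTAL_STRAINS, common_cutoff=0.65):
    """Classify a gene by the number of strains with a presumably functional allele."""
    if functional_in == total:
        return "core"
    if functional_in / total > common_cutoff:
        return "common"
    return "accessory"


# Counts derived from the text; dgcX is assumed functional wherever present.
examples = {
    "dgcC": 61,           # part of the completely conserved core set
    "dgcE": 61 - 9,       # corrupted in nine strains
    "dgcT": 61 - 14 - 5,  # deleted in 14 and disrupted in 5 strains
    "dgcX": 9,            # present in nine strains
}
for gene, n in examples.items():
    print(f"{gene}: {n}/{TOTAL_STRAINS} ({n / TOTAL_STRAINS:.0%}) -> {conservation_class(n)}")
```

With these counts, dgcT (42 of 61 strains, about 69%) falls just above the 65% cutoff, which shows how close some of the "common" genes come to the boundary of the accessory group.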
Below, the novel DGC and PDE genes and their putative gene products (Table 1), as well as a subset of functionally interesting variations in certain previously known GGDEF/EAL domain genes, are described and discussed in detail. A list of these variations detected in the 61 E. coli strains, including those where a functional consequence is not readily apparent, is given in Table 2 (note that synonymous codon changes and occasional variations specifying similar amino acids were not included).
As we have previously described in a study of c-di-GMP signaling in the 2011 outbreak O104:H4 and related strains (30), DgcX is the most highly expressed DGC described thus far in E. coli. It contains a GGDEF domain with intact A- and I-sites linked to an N-terminal domain of unknown function, which is predicted to fold into eight transmembrane helices. A similar putative sensory domain termed MASE4 (membrane-associated sensor) (13) is also present in two other GGDEF domain proteins in E. coli, DgcT (YcdT) and CdgI (YeaI). MASE4-GGDEF proteins do not seem to be widespread (for instance, Salmonella does not have any), but we found two and four similar proteins that can be predicted to be active DGCs in Klebsiella pneumoniae and Enterobacter lignolyticus, respectively. An alignment of these with DgcX, DgcT, and CdgI of E. coli (see Fig. S1 in the supplemental material) revealed patches of conserved amino acids in three of the four periplasmic loops, which are rich in aromatic amino acids (Fig. 2). This may represent a binding site for a ligand that itself has a ring structure, e.g., some aromatic compound, a nucleotide, or a sugar. Our observation that, in contrast to DgcE and DgcC, DgcX (when cloned on a plasmid without any epitope tagging) was unable to suppress a mutation in dgcE (which results in low curli production) suggests that an unknown signaling molecule may have to activate DgcX via its MASE4 domain (T. Povolotsky and R. Hengge, unpublished data). Among the 61 E. coli strains under study here, the dgcX gene was found in nine strains, with six strains belonging to EAEC of the O104:H4 serotype (Table 1). Its location right next to lambdoid prophages in all of these strains suggests its horizontal spreading by specialized transduction (for further details on DgcX, see reference 30).
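The statement that the conserved periplasmic patches are rich in aromatic residues can be quantified with a simple composition check. This is a hypothetical sketch: the loop and background sequences are invented placeholders (not DgcX, DgcT, or CdgI sequences), and counting only F, W, and Y as aromatic and using a 1.5-fold enrichment cutoff are assumptions.

```python
AROMATIC = frozenset("FWY")  # assumption: His is not counted as aromatic here


def aromatic_fraction(segment):
    """Fraction of residues in `segment` that are F, W, or Y."""
    segment = segment.upper()
    if not segment:
        return 0.0
    return sum(aa in AROMATIC for aa in segment) / len(segment)


def enriched_loops(loops, background, fold=1.5):
    """Indices of loops whose aromatic fraction is nonzero and at least
    `fold` times the aromatic fraction of the background sequence."""
    bg = aromatic_fraction(background)
    return [
        i for i, loop in enumerate(loops)
        if aromatic_fraction(loop) > 0 and aromatic_fraction(loop) >= fold * bg
    ]


# Hypothetical placeholder sequences, not taken from any real protein.
loops = ["GYWNFS", "GSTNDK", "WYFPGY", "AQSEKT"]
background = "MSKLTEQADLRAFIEKGSNV"
print(enriched_loops(loops, background))
```

A composition count like this only flags enrichment; the conclusion in the text rests on the conservation of these residues across the aligned MASE4 proteins.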
A novel E. coli DGC gene identified here is dgcY, which occurs in two strains only (Table 1). These are E. coli SMS-3-5 (an environmental pathogenic isolate with multiple antibiotic resistances) and the neonatal meningitis E. coli (NMEC) O7:K1 strain CE10 (44). Its gene product is 349 amino acids long and is predicted to be an active DGC, now termed DgcY, since it features a C-terminal GGDEF domain with an A-site (but no inhibitory I-site), which is most closely related to the GGDEF domain of DgcZ (YdeH). Its N-terminal putative sensory domain (termed MASE5), which is predicted to fold into six transmembrane helices, is unique within E. coli and of unknown function. In both strains, the dgcY gene is preceded by a gene encoding a putative metallo-β-lactamase (EcSMS35_1714) and a small open reading frame (EcSMS35_1715), with the three genes apparently constituting a unique operon not found in any of the other E. coli strains analyzed here (Fig. 3).
With a length of 1,105 amino acids and six domains, DgcE is by far the largest of all GGDEF/EAL domain proteins of E. coli. It consists of a membrane-inserted MASE1 domain (with eight transmembrane helices), followed by two additional transmembrane segments, three PAS domains, an active GGDEF domain, and a degenerate EAL domain (45) (see Table S2 in the supplemental material). DgcE (YegE) probably integrates various signals, and the c-di-GMP it synthesizes plays a key role in initiating the expression of the biofilm regulator CsgD during entry into stationary phase and therefore in the production of amyloid curli fibers and cellulose as biofilm matrix components (15, 20). The dgcE gene was found to be corrupted in nine E. coli strains, with all EHEC of the O157:H7 serotype sharing the same disruption (a one-nucleotide insertion after nucleotide 457, which should result in the production of a short N-terminal fragment of DgcE only). Additional mutations were found in another EHEC (O26:H11), as well as in an EPEC (O127:H6) strain, but also in the EAEC strain 042 (Table 2). Thus, many EHEC/EPEC strains have lost DgcE, a key DGC for the synthesis of curli fibers and cellulose, suggesting that the production of a biofilm matrix may be counterproductive for an important and specific activity of EHEC/EPEC strains, possibly their specialized adherence mechanism (see also below).
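Why a single-nucleotide insertion, such as the one after nucleotide 457 of dgcE in the O157:H7 strains, leaves only a short N-terminal fragment can be seen by translating a toy open reading frame before and after an insertion. The sequence and the helper names are hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(seq):
    """Translate from the first nucleotide up to (not including) the first stop."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_after(seq, position, base):
    """Insert `base` after 1-based nucleotide `position`."""
    return seq[:position] + base + seq[position:]


wild_type = "ATGGCTGATAAACTGGGCTAA"       # hypothetical ORF
mutant = insert_after(wild_type, 4, "T")  # +1 frameshift after nucleotide 4
print(translate_orf(wild_type))  # MADKLG
print(translate_orf(mutant))     # MV: the shifted frame hits TGA at codon 3
```

In a real gene, the premature stop codon need not follow the insertion this closely, but the principle is the same: all codons downstream of the insertion are read out of frame until a stop codon is encountered.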
DgcF, a DGC of 472 amino acids, consists of a MASE1 domain connected to a GGDEF domain via two additional transmembrane segments and a HAMP linker domain. No function has been described for DgcF thus far. In E. coli K-12 strains, a deletion that includes the promoter region as well as the first 433 nucleotides of the dgcF coding sequence was originally overlooked, which led to misannotation of an internal codon as a start codon and prediction of a DGC with 315 amino acids only. However, a comparison to other E. coli strains revealed the corruption of dgcF in E. coli K-12, which in fact results in the absence of DgcF (30). Notably, an ETEC strain of porcine origin carries the same large deletion mutation (Table 2). In addition, five other E. coli strains studied here, including two STEC strains, show various one-nucleotide deletion/frameshift mutations in dgcF that should result in the absence of DgcF.
DgcO and its cognate PdeO (DosP, YddU), a DGC/PDE pair expressed from an operon, have been found in a complex with RNase E, enolase, and polynucleotide phosphorylase (PNPase), with the latter responding to c-di-GMP, which suggests a regulatory role of DgcO and PdeO in RNA turnover in a specialized degradosome (46). However, no target RNAs and therefore no clear physiological role of this system have been identified thus far. The entire operon was deleted in a specific EHEC strain of the O103:H2 serotype (Table 2). Sixteen strains display a complex disruption of dgcO consisting of two deletions (the first ranging from nucleotides –38 to +644, followed by 44 nucleotides of the original dgcO sequence and then by a second deletion of 530 nucleotides). Notably, many ExPEC strains, both UPEC and MNEC, carry this corrupted allele and therefore do not possess DgcO. Moreover, the probiotic strain Nissle 1917 also contains this allele.
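The layout of the complex dgcO allele can be made explicit with a small calculation. The sketch assumes the usual convention that coding positions are numbered from +1 at the first nucleotide of the start codon and that upstream positions are negative, with no position 0; the helper names are hypothetical.

```python
def span_length(start, end):
    """Inclusive length of a span when numbering skips position 0 (-1 is followed by +1)."""
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # there is no nucleotide numbered 0
    return length


def complex_allele_layout(first_deletion=(-38, 644), retained=44, second_deletion=530):
    """Describe a deletion / retained block / deletion allele in coding coordinates."""
    kept_start = first_deletion[1] + 1
    kept_end = kept_start + retained - 1
    second = (kept_end + 1, kept_end + second_deletion)
    return {
        "first_deletion_nt": span_length(*first_deletion),
        "retained_block": (kept_start, kept_end),
        "second_deletion": second,
        # the start codon occupies positions +1 to +3
        "start_codon_removed": first_deletion[0] <= 1 and first_deletion[1] >= 3,
    }


print(complex_allele_layout())
```

Because the first deletion removes the native start codon together with 38 nucleotides of upstream sequence, no DgcO protein is expected from this allele, consistent with the conclusion in the text.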
DgcQ, a DGC of 564 amino acids, consists of a large periplasmic sensor domain (termed CHASE7) flanked by two transmembrane helices and followed by the GGDEF domain. DgcQ plays a minor role in reducing flagellar rotation (15), and it has been reported to contribute to cellulose biosynthesis in a particular E. coli strain (E. coli 1094, for which no genome sequence is available) (47). In a variety of E. coli strains, the dgcQ gene is corrupted by various small frameshift and stop codon mutations and, in one case, a complete deletion (Table 2). Notably, in a series of EAEC, all of the O104:H4 serotype, codon 312 is changed to a stop codon, which, however, is almost immediately followed by an ATG (codon 314). Thus, translational coupling could result in a restart, such that DgcQ would be made in two parts, with the second fragment consisting of the second transmembrane helix and the GGDEF domain. This N-terminally truncated DgcQ protein has indeed been observed when this dgcQ allele was cloned onto a plasmid with a C-terminal His6 tag, but this construct did not complement the low curli production of a dgcE mutant, suggesting that an intact CHASE7 sensory domain is required for DgcQ activity (Povolotsky and Hengge, unpublished).
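The candidate restart in the O104:H4 dgcQ allele (a stop at codon 312 followed by an ATG at codon 314) can be located programmatically. The sketch below finds the first in-frame stop codon and then the nearest in-frame ATG within a few codons downstream. The toy sequence, the function names, and the five-codon search limit are assumptions, and whether ribosomes actually restart at such an ATG has to be tested experimentally, as was done with the His6-tagged construct.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(orf):
    """1-based number of the first in-frame stop codon, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def nearby_restart(orf, stop_codon, max_distance=5):
    """1-based number of the first in-frame ATG within `max_distance` codons
    after `stop_codon`, or None."""
    for n in range(stop_codon + 1, stop_codon + max_distance + 1):
        i = (n - 1) * 3
        if orf[i:i + 3] == "ATG":
            return n
    return None


# Hypothetical ORF: premature TAG at codon 4 and an ATG two codons later
# (codon 6), mirroring the stop-at-312 / ATG-at-314 spacing in dgcQ.
orf = "ATG" "GCT" "GAT" "TAG" "CTG" "ATG" "GGC" "AAA" "TAA"
stop = first_inframe_stop(orf)
print(stop, nearby_restart(orf, stop))  # 4 6
```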
DgcT (YcdT) is a DGC that has been implicated in the production of poly-GlcNAc (PGA), which is expressed within the host and serves as a biofilm matrix component and/or virulence factor in some pathogenic E. coli (48–50). We found that the dgcT gene was entirely deleted in 14 strains and disrupted by small frameshift or stop codon mutations in five additional strains (Table 2). Among the strains with dgcT deletions, there was a high incidence of EHEC/STEC strains, as well as one EPEC strain. In addition, five EHEC and two EPEC strains have an extra PDE gene (pdeT, see below) inserted right after dgcT in an obvious operon, suggesting that this PDE may counteract DgcT activity. This could be an indication that, similar to DgcE-dependent production of a curli and cellulose biofilm matrix (see above), DgcT-driven production of the alternative matrix component PGA may also be detrimental for the specialized adherence mechanism of EHEC/EPEC. Furthermore, two EAEC strains show either a full deletion or an early frameshift mutation in dgcT (Table 2), but in one of these, the EAEC strain 55989, the role of DgcT may be taken over by the strongly expressed DgcX, since these two DGCs share the same type of sensory input domain (see above and reference 30).
Our analyses showed the presence, in one or more E. coli strains, of four PDE genes (pdeT, pdeW, pdeX, and pdeY) not found in E. coli K-12 strains, as well as numerous alterations in PDE genes already known from E. coli K-12. However, four PDE genes, namely, pdeB (ylaB), pdeH (yhjH), pdeK (yhjK), and pdeN (rtn), did not carry a single mutation in any of the 61 E. coli strains studied here, even though these genes can be knocked out experimentally (34). This suggests that these PDEs play an important role under some conditions that all E. coli strains experience during their life cycles. For instance, PdeH is crucial for maintaining a low c-di-GMP level in post-exponentially growing flagellum-expressing cells; in pdeH mutants, flagellar rotation is inhibited by the c-di-GMP-binding effector YcgR (see below), which renders these mutants nonmotile despite their expression of flagella (14–16). For the other three strictly conserved PDEs, however, no physiological functions have been reported.
The pdeT gene is inserted downstream of dgcT (ycdT) in five EHEC and two EPEC strains (Table 1, Fig. 4). PdeT features a membrane-integrated periplasmic loop domain (a CSS domain) (13), followed by an EAL domain. This functionally uncharacterized CSS domain is also found in a subset of five other PDEs present in all or most other E. coli strains, i.e., PdeB (YlaB), PdeC (YjcC), PdeD (YoaD), PdeG (YcgG), and PdeN (Rtn) (13). PdeT was first described in the classical EHEC strain EDL933, where it was shown to constitute an operon with dgcT and to encode an active PDE (28). Thus, PdeT most likely acts as an antagonist of DgcT, which is believed to be involved in the control of the matrix polymer PGA (48, 49), with the pga operon being located right next to and divergently oriented from dgcT. In functional terms, the insertion of pdeT may thus be equivalent to the corruption of dgcT found in some other EHEC and EPEC strains (see above). In fact, a subset of EHEC strains (all of the O145:H8 serotype) show a 5′-end-truncated pdeT gene (Table 2) but no dgcT, suggesting that this lineage originally possessed a dgcT-pdeT operon but then acquired a large deletion that removed dgcT and the first 91 nucleotides of pdeT. This does not necessarily mean that no PdeT activity is present, since in a similar case of a 5′-truncated gene encoding another CSS-PDE (PdeG), gene product activity was observed (see below).
The novel genes pdeW and pdeX, which encode PDEs consisting of an EAL domain only, were each detected in only a single E. coli strain (Fig. 4). pdeW (annotated as ecE24377A_E0054) is located on the uncharacterized plasmid 2 of the ETEC O139:H28 strain E24377A. Together with a few other novel genes involved in the synthesis of Pix fimbriae, pdeX (annotated as ECP_2965) is inserted in the genome of the classical UPEC strain 536. A third stand-alone EAL domain PDE gene, pdeY, was previously found in the meningitis-associated E. coli strain IHE3034, where it is associated with the sfaX(II) locus involved in the synthesis of S fimbriae and was initially termed sfaY (51). In addition, we found pdeY in five other E. coli strains, including three widely studied UPEC strains (536, CFT073, and UTI89). Notably, UPEC strain 536 thus even has two of these additional small PDEs, i.e., PdeX and PdeY. The operon layouts in the six pdeY-containing strains are essentially the same, with the exception of the UPEC strain CFT073 and the commensal E. coli ABU 93972, which show an extra gene inserted in this region (c1248 in CFT073) (Fig. 4C).
The 507-amino-acid PdeG belongs to the group of PDEs that combine a membrane-inserted periplasmic loop CSS domain at the N terminus with a C-terminal EAL domain. A number of pathogenic E. coli of various pathotypes are devoid of PdeG due to larger deletions (that include neighboring genes as well), a frameshift-generating one-nucleotide deletion, or an early stop codon in pdeG (Table 2). Functional consequences are unclear, since knocking out the intact pdeG in a K-12 strain does not produce any phenotype under standard laboratory conditions (34), although PdeG is expressed (G. Klauck and R. Hengge, unpublished data). An interesting allele, in which a larger deletion removes the first 630 nucleotides of pdeG, is found in 14 members of a series of ExPEC, two AIEC, an APEC, and several commensal E. coli strains (Table 2). At first glance, such a deletion seems likely to eliminate the expression of the gene. However, the experimental deletion of this shortened allele (c1610), which had not been recognized as a 5′-truncated version of an originally longer gene, in the UPEC strain CFT073 resulted in increased biofilm formation (27). This indicates that (i) an N-terminally truncated version of PdeG (denoted as PdeG* in Fig. 1) is in fact expressed from this 5′-incomplete allele, probably from an internal secondary start codon, and (ii) this PdeG* variant, which has an intact EAL domain but no CSS domain, shows PDE activity. Since wild-type PdeG is expressed but inactive under comparable conditions, this suggests an inhibitory role of the CSS domain in the control of PDE activity.
The pdeL gene, which encodes a PDE consisting of a putative DNA-binding LuxR-like domain followed by a canonical EAL domain (52), is entirely deleted in two laboratory strains (BW2952 and ED1a) and corrupted by an internal stop codon in an APEC strain (Table 2). In contrast, almost half of the remaining 58 strains with intact pdeL show a large insertion upstream of pdeL (i.e., between betT and pdeL), which contains the gene for an AidA-I-like adhesin (Table 2) (30). The region between this adhesin gene and pdeL does not contain any apparent terminator motifs, suggesting that the two genes constitute an operon. The regulatory regions present upstream of this operon or upstream of pdeL (in the strains that do not contain the AidA-I-like adhesin gene) share important regulatory motifs, e.g., a Cra binding site and the putative promoter (53), with some divergence between these motifs. The physiological role of the AidA-I-like adhesin and its apparent coregulation with PdeL are not yet clear, mainly because the adhesin gene is widespread among different pathogenic E. coli but also occurs in some commensal strains. Although K-12 strains do not have it, all EHEC strains studied here, as well as all EAEC strains of the O104:H4 serotype (including the 2011 German outbreak strain LB226692), possess this AidA-I-like adhesin gene linked to pdeL.
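One way to screen an intergenic region for intrinsic (Rho-independent) terminator-like motifs, i.e., a GC-rich inverted repeat followed by a run of T residues, is sketched below. The article does not state how the region was inspected, so this crude scan is only an illustration; the stem length, loop range, GC cutoff, T-run length, and toy sequences are all assumptions, and dedicated terminator predictors are far more sensitive because they also score imperfect stems and folding energy.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string (uppercase ACGT only)."""
    return s.translate(COMPLEMENT)[::-1]


def terminator_like_hits(seq, stem=6, loop_min=3, loop_max=8, t_run=4, min_gc=0.6):
    """Return (stem_start, loop_length) for perfect GC-rich hairpins that are
    followed within a few nucleotides by a run of at least `t_run` Ts."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - loop_min + 1):
        left = seq[i:i + stem]
        if (left.count("G") + left.count("C")) / stem < min_gc:
            continue  # stem arm not GC-rich enough
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop  # start of the right arm of the stem
            right = seq[j:j + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == revcomp(left):
                tail = seq[j + stem:j + stem + t_run + 2]
                if "T" * t_run in tail:
                    hits.append((i, loop))
    return hits


hairpin = "AAAA" + "GCCGCC" + "TTCG" + revcomp("GCCGCC") + "TTTTTT" + "AAAA"
no_tail = "AAAA" + "GCCGCC" + "TTCG" + revcomp("GCCGCC") + "AAAAAA" + "AAAA"
print(terminator_like_hits(hairpin))  # includes (4, 4)
print(terminator_like_hits(no_tail))  # []
```

An empty result for a real intergenic region would be consistent with, but not proof of, cotranscription; operon structure ultimately needs transcript-level evidence.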
PdeO is a 799-amino-acid protein with two PAS domains and a GAF domain, followed by a degenerate GGDEF and an EAL domain, whose PDE activity is controlled by oxygen via a PAS domain-associated heme (54, 55). It acts as an antagonist to DgcO, with both proteins being part of a specialized degradosome that also contains the c-di-GMP-regulated PNPase (46, 56). Due to a whole dgcO-pdeO operon deletion (in the EHEC O103:H2 strain 12009) or small insertions or deletions that generate frameshifts in pdeO (Table 2), PdeO is absent in several EHEC and ExPEC strains. Since the RNA substrates of this specialized DgcO/PdeO-containing degradosome are unknown, the functional consequences of this absence of PdeO in certain EHEC and ExPEC strains are unclear.
PdeR is a 661-amino-acid composite of a PAS domain, a GGDEF domain with hardly detectable activity, and an active EAL domain. It acts as a c-di-GMP-sensing and inhibitory component of the molecular switch that activates expression of CsgD, the biofilm regulator and activator of matrix production, in response to rising intracellular c-di-GMP levels (20). It is therefore not surprising that pdeR is highly conserved (Fig. 1). There are, however, a few noteworthy exceptions (Table 2). In two EHEC strains of the O157:H7 serotype, a five-nucleotide insertion (in codon 524) produces a frameshift in pdeR and should thus abolish functional PdeR. In the EHEC O111:H− strain 11128, a sense-to-stop codon mutation (in codon 445) has a similar effect. Knocking out pdeR results in hyperactivation of CsgD expression and therefore in very high production of curli fibers and cellulose. This very high CsgD expression is no longer regulated by c-di-GMP but still depends on RpoS-containing RNAP and the transcription factor MlrA (20). In classical O157:H7 EHEC strains, however, a lambdoid stx-carrying phage is inserted within mlrA (57); these strains therefore do not produce CsgD (although derivatives with csgD promoter mutations that restore CsgD expression exist), and the frameshift mutation in pdeR mentioned above should have no consequences. In the O111:H− strain 11128, which also produces Stx, mlrA is fully intact. Moreover, a distinct aggregative behavior that correlated positively with curli fiber production and RpoS function was recently reported for O111 strains (59). Our finding that the O111:H− strain 11128 is a PdeR-deficient mutant but wild type with respect to mlrA suggests that this strain overproduces CsgD and curli fibers compared with most other E. coli strains (i.e., pdeR+ mlrA+ strains).
In that respect, it resembles the Stx-producing 2011 outbreak O104:H4 strain, which also combines very high CsgD and curli expression with Stx production (30). Curli fibers are highly inflammatory (60, 61) and, if expressed at 37°C, may therefore contribute to systemic absorption of Shiga toxin (30, 62). Notably, curli fibers and cellulose also mediate attachment to the surfaces of plants that are important for human nutrition and that have repeatedly been implicated in EHEC transmission (63–66).
Among the four genes for degenerate GGDEF/EAL domain proteins, bluF (ycgF) and rflP (ydiV) show interesting variations, described in detail below. The other two genes are absolutely conserved in the 61 E. coli strains studied here, although they can be knocked out under laboratory conditions: (i) csrD (yhdA), which encodes a protein involved in the turnover of the regulatory RNAs CsrB and CsrC (23, 67), and (ii) cdgI (yeaI), which is hardly expressed (34) and encodes a GGDEF domain protein with a degenerate A-site. Purified CdgI is enzymatically inactive but can bind c-di-GMP via its intact I-site (F. Skopp and R. Hengge, unpublished results) and therefore most likely represents a c-di-GMP-binding effector protein acting in an unknown physiological context.
BluF consists of a blue light-responsive BLUF domain (68) followed by a degenerate EAL domain that neither degrades nor binds c-di-GMP (24). It acts as a blue light-activated direct antagonist of the repressor protein BluR (YcgE) and can thereby induce several small proteins that control the activity of the Rcs phosphorelay system, which, via the sRNA RprA, can downregulate CsgD expression (24, 35, 69). Five EHEC strains of the O157:H7 serotype, as well as a commensal and laboratory strain (MDS42), feature a larger deletion that not only eliminates bluF completely but also extends to pdeG (ycgG) (see above). In three additional strains of diverse patho- and serotypes, bluF is affected by a premature stop codon, a frameshifting four-nucleotide deletion, or an IS element insertion, respectively (Table 2). Why these bluF mutant strains have lost light-dependent modulation of the global Rcs/RprA regulatory system is currently unclear.
RflP (YdiV), a highly degenerate stand-alone EAL domain protein, acts as an inhibitor and proteolytic targeting factor for the flagellar master regulator FlhDC (13, 25, 26). In E. coli K-12, rflP shows only very low expression under standard laboratory conditions (34). In other strains, or under as yet unknown conditions, however, RflP expression might be higher. In that case, RflP-deficient mutants would have higher levels of FlhDC and increased expression of the genes of the flagellar control cascade. This would not only affect flagellar components but would also increase expression of two regulatory factors that downregulate CsgD and the biofilm matrix components curli and cellulose: (i) PdeH (YhjH, see above), which keeps c-di-GMP levels low and thereby interferes with CsgD expression, and (ii) FliZ, a histone-like protein that downregulates many RpoS-dependent genes, including those involved in activating CsgD expression (15, 70). Notably, several widely used laboratory strains (Table 2) carry a one-nucleotide deletion (frameshift mutation) in rflP, which may represent a biofilm-reducing laboratory “domestication” that researchers have inadvertently selected for.
To date, only four effector proteins that respond to the cellular level of c-di-GMP and directly control the activity of distinct targets have been found in E. coli. These are the two PilZ domain proteins YcgR (71) and BcsA (72), the GIL domain protein BcsE (73), and the “trigger enzyme” and PDE PdeR already described above (20).
By directly interacting with the flagellar basal body, c-di-GMP-bound YcgR slows down flagellar rotation (16, 74, 75). This can be observed during entry into stationary phase in liquid medium (15) and may also occur in macrocolony biofilms, where flagella are produced and become entangled in the bottom layer of the colony (76, 77). Notably, several classical EHEC strains of the O157:H7 serotype either carry an IS element within ycgR or exhibit a deletion that eliminates a large 5′ part of ycgR (Table 3), suggesting that these strains do not shut down flagellar rotation when internal c-di-GMP concentrations are high.
BcsA is one of the two subunits of the membrane-inserted cellulose synthase complex and consists of several domains, including a c-di-GMP-binding PilZ domain that allosterically controls the glucosyltransferase domain (78). Several ExPEC strains show small deletions that cause frameshifts and thus premature termination of BcsA. These mutations would not only eliminate BcsA but are also expected to have polar effects on the downstream genes bcsB (encoding the other subunit of cellulose synthase), bcsZ, and bcsC, i.e., to confer a completely cellulose-negative phenotype. Cellulose production may be counterselected in these strains because it could interfere with adhesion to host tissue via the specific fimbriae they produce.
BcsE binds c-di-GMP via a motif (RxGD) that resembles the I-site of DGCs (73). In E. coli, BcsE, BcsF, and BcsG, which are encoded in a single operon, are all required for cellulose biosynthesis (30), whereas in Salmonella BcsE is required only for maximal cellulose production (73), suggesting that BcsE plays a regulatory rather than a structural role in cellulose synthesis. The recently emerged Stx-producing O104:H4 outbreak strains carry a C-terminally truncated bcsE (Table 3) and are cellulose negative, which probably contributes to their virulence because “naked” curli fibers (not in a composite with cellulose) are highly inflammatory (30). In addition, several Stx-producing strains of the O145:H28 serotype show a large deletion that removes most of bcsE; these strains should therefore also be cellulose negative.
In our study, we detected a large number of highly diverse mutations (Table 2) in a total of 35 GGDEF/EAL domain-encoding genes in the genomes of 61 E. coli strains representing the major pathotypes as well as commensals. Overall, our detailed analysis revealed interesting trends in different types of pathogenic E. coli that seem to reflect different host niches and mechanisms of host cell adherence. Moreover, our findings provide a basis for specific hypotheses that may guide future experimental analyses of c-di-GMP signaling in these diverse E. coli strains.
In many EHEC strains, we observe a tendency to lose DGCs such as DgcE (YegE) and DgcT (YcdT), which are involved in the production of CsgD (and therefore curli and cellulose) and of PGA, respectively. Moreover, those EHEC and some EPEC strains that possess an intact dgcT gene often carry an insertion of the extra PDE gene pdeT (vmpA) immediately downstream of dgcT in a common operon, suggesting that PdeT antagonizes DgcT activity. These strains can therefore be expected to produce low levels of biofilm matrix components, which is often further supported by the absence of MlrA, an activator of csgD transcription, due to insertion of the stx-carrying prophage into the mlrA gene (30, 57). Matrix production may be counterselected because it could interfere with the specialized adherence mechanism of EHEC/EPEC, which involves a type III secretion system that induces pedestal formation, followed by adhesion via intimin (33). Furthermore, several classical EHEC O157:H7 strains also lack the c-di-GMP-binding protein YcgR, suggesting that they remain motile under certain conditions of high c-di-GMP levels. These may be the conditions that promote expression of the type III secretion system effectors EspA and EspB, several types of pili, Tir, and intimin, and therefore adhesion to intestinal cells (79). An interesting deviation from this general pattern in classical EHEC/STEC is the Stx-producing O111:H− strain 11128. Owing to a mutation in pdeR (yciR) and an intact mlrA gene, this strain can be expected to overproduce CsgD, curli, and cellulose; indeed, it shows curli-dependent cellular aggregation (59). With respect to Stx and high curli production, this strain thus resembles the O104:H4 strains rather than classical STEC (see below).
In ExPEC as well, in both UPEC and MNEC, our analysis reveals a trend toward reduced c-di-GMP levels. Many of these strains have lost DgcO (DosC), sometimes together with its cognate PdeO (DosP, encoded in the same operon); together, these two proteins form an oxygen-controlled system that affects an unknown target. Moreover, classical UPEC strains tend to possess additional stand-alone EAL domain PDEs (strain 536 even has two, PdeX and PdeY). Low c-di-GMP levels may be crucial for UPEC strains because they depend on motility, which is negatively regulated by c-di-GMP, to establish a urinary tract infection. Moreover, they may benefit from downregulating curli fiber expression, at least during acute infection, since curli can trigger a local immune defense (80).
Finally, EAEC of the O104:H4 serotype, which adhere to intestinal cells in biofilm-like patches with a characteristic stacked brick pattern (81), are characterized by an additional DGC, DgcX. In previous work, we demonstrated extremely high expression of the dgcX gene (30). c-di-GMP produced by DgcX may contribute to the very high curli fiber production of these strains (30), as well as to additional adhesion mechanisms. Besides acquiring DgcX, these EAEC strains have lost another DGC, DgcQ (YedQ). In strain 55989, DgcT (YcdT) is also corrupted, and strain 042 has lost both DgcT and DgcE (YegE). Taken together, these findings indicate a tendency to reduce the diversity of DGCs and to concentrate c-di-GMP production on the extremely strongly expressed DgcX. In addition, the membrane-associated DgcX appears to require activation by an unknown molecule that binds to a conserved motif on the periplasmic side of its transmembrane MASE4 sensory domain. This unknown ligand could be an intestinal metabolite that may guide EAEC to their optimal sites of adherence.
If high c-di-GMP accumulation by DgcX and strong production of highly inflammatory “naked” curli fibers (due to the absence of cellulose synthesis) occur in combination with Stx production, as in the recently emerged Stx-producing O104:H4 variants (30), the result may be enhanced virulence, as observed in the 2011 outbreak (42). It may therefore be useful to complement rapid PCR-based diagnostics, such as those developed during the 2011 outbreak (81), by testing for the presence of dgcX and the status of the mlrA and bcs genes. Overall, our analysis thus indicates that STEC fall into two rather different classes: (i) classical EHEC of several serotypes with reduced c-di-GMP and biofilm matrix production and (ii) nonclassical STEC with high production of c-di-GMP and biofilm matrix, in particular of inflammatory curli fibers, such as the O104:H4 outbreak strain and the Stx-producing O111 strains.
In conclusion, variations in the complement of GGDEF/EAL domain proteins and c-di-GMP-binding effector proteins suggest an intricate interplay of biofilm properties and virulence in pathogenic E. coli. Moreover, these variations within a single bacterial species show that c-di-GMP-related genes and proteins evolve rapidly and thereby contribute to adaptation to host-specific and environmental niches.
We thank our coworkers and colleagues Gisela Klauck and Franziska Mika for helpful discussions and Diego Serra for comments on the manuscript.
Financial support was provided by the European Research Council under the European Union's Seventh Framework Programme (ERC-AdG 249780 to R.H.). T.L.P. was supported by a graduate fellowship from the Deutsche Forschungsgemeinschaft (GRK 1673: Functional Molecular Infection Epidemiology).
Individual author contributions were as follows: concept of the study, R.H.; bioinformatic analyses, T.L.P.; interpretation of data, T.L.P. and R.H.; and writing of the paper, R.H. and T.L.P.
We declare that we do not have any conflicts of interest.
Research reported here has been funded by the European Research Council under the European Union’s Seventh Framework Programme (ERC-AdG 249780 to R.H.). T.L.P. has been partially supported by a fellowship provided by the Graduate Programme GRK 1673 (Functional Molecular Infection Epidemiology) of the Deutsche Forschungsgemeinschaft (DFG).
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JB.00520-15.