Cyclic di-GMP (c-di-GMP) synthetases and hydrolases (GGDEF, EAL, and HD-GYP domains) can be readily identified in bacterial genome sequences by using standard bioinformatic tools. In contrast, identification of c-di-GMP receptors remains a difficult task, and the current list of experimentally characterized c-di-GMP-binding proteins is likely incomplete. Several classes of c-di-GMP-binding proteins have been structurally characterized; for some others, the binding sites have been identified; and for several potential c-di-GMP receptors, the binding sites remain to be determined. We present here a comparative structural analysis of c-di-GMP-protein complexes that aims to discern the common themes in the binding mechanisms that allow c-di-GMP receptors to bind it with (sub)micromolar affinities despite the 1,000-fold excess of GTP. The available structures show that most receptors use their Arg and Asp/Glu residues to bind c-di-GMP monomers, dimers, or tetramers with stacked guanine bases. The only exception is the EAL domains that bind c-di-GMP monomers in an extended conformation. We show that in c-di-GMP-binding signature motifs, Arg residues bind to the O-6 and N-7 atoms at the Hoogsteen edge of the guanine base, while Asp/Glu residues bind the N-1 and N-2 atoms at its Watson-Crick edge. In addition, Arg residues participate in stacking interactions with the guanine bases of c-di-GMP and the aromatic rings of Tyr and Phe residues. This may account for the presence of Arg residues in the active sites of every receptor protein that binds stacked c-di-GMP. We also discuss the implications of these structural data for the improved understanding of the c-di-GMP signaling mechanisms.
Cyclic bis(3′→5′) dimeric GMP (c-di-GMP) (see Fig. S1 in the supplemental material) is a nearly universal bacterial second messenger that regulates a variety of processes, including bacterial switching from motility to sessility, cell motility, intercellular interactions, biofilm formation and dispersal, and the responses to oxygen, nitric oxide, and a variety of other environmental challenges (1–3). In the past several years, the mechanisms of c-di-GMP turnover (synthesis by diguanylate cyclases [DGCs] that contain the GGDEF domain and hydrolysis by c-di-GMP-specific phosphodiesterases [PDEs] that contain EAL or HD-GYP domains) have been thoroughly investigated; this included the structural characterization of several DGCs and PDEs (4, 5). The emphasis now has largely shifted to the identification of (i) the upstream cellular signals that regulate the DGC and PDE activities and (ii) the downstream modules that respond to the changes in the c-di-GMP levels and elicit certain changes in cellular behavior and metabolism. Progress in the latter area has been slow, despite considerable effort to identify potential c-di-GMP-binding proteins. Here, we briefly review the available data on the structural and sequence diversity of experimentally characterized c-di-GMP-binding proteins and try to deduce some general trends in c-di-GMP binding mechanisms that might help to understand c-di-GMP signaling mechanisms and identify additional c-di-GMP receptors.
c-di-GMP was initially described (6) as an allosteric activator of cellulose biosynthesis in the alphaproteobacterium “Acetobacter xylinum” (also known as Gluconacetobacter xylinus and, more recently, Komagataeibacter xylinus). The cellulose synthase (BCS) complex of K. xylinus includes four key subunits, referred to as BcsA, BcsB, BcsC, and BcsD, and fractionation of this complex mapped the c-di-GMP binding site to a 90-kDa BcsB-containing proteolytic fragment (8, 9). As a result, BcsB subunits from a variety of organisms have been routinely annotated as a “c-di-GMP-binding protein.” However, several lines of evidence appeared incompatible with the notion that BcsB is the (sole) c-di-GMP-binding protein. First, progress in microbial genome sequencing led to the identification of the GGDEF and EAL domains, linked to c-di-GMP turnover, in genomic sequences from a variety of bacteria, including some that had not been known to produce cellulose (10, 11). The domain architectures of diverse GGDEF- and EAL-containing proteins indicated that these domains are part of a vast signaling machinery closely linked to the bacterial two-component signal transduction system (10, 11). Second, the sheer abundance of GGDEF, EAL, and HD-GYP domain-containing proteins encoded in the genomes of Escherichia coli, Pseudomonas aeruginosa, Vibrio cholerae, and other bacteria (see the c-di-GMP census [http://ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html] for a recent count) suggested that c-di-GMP-mediated signaling is likely not limited to the regulation of cellulose production.
The discovery of the c-di-GMP-binding allosteric inhibitory (I) site in many GGDEF domains (12, 13) showed that at least some, perhaps inactivated, GGDEF domains could serve as c-di-GMP receptors (14). Likewise, enzymatically inactivated EAL and HD-GYP domains were proposed—and subsequently demonstrated—to function as c-di-GMP-binding receptor proteins in at least some signal transduction systems (5, 15). However, c-di-GMP binding to such proteins did not resolve the downstream signaling mechanism problem, including the mechanism of BCS activation. Obviously, there had to be other c-di-GMP targets.
The early suggestions that c-di-GMP could bind to GTPases, such as p21ras (16, 17), did not look promising, as the observed cellular levels of c-di-GMP were almost 1,000-fold lower than those of GTP. That would make GTP a powerful competitive inhibitor of c-di-GMP signaling, something that had not been observed in bacterial cells.
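The competition argument can be made concrete with the standard expression for single-site competitive binding. The sketch below is illustrative only: the dissociation constants and concentrations are hypothetical placeholders rather than measured values, and only the roughly 1,000-fold excess of GTP is taken from the text. It also assumes that free nucleotide concentrations are close to the total concentrations.

```python
def fractional_occupancy(ligand, kd_ligand, competitor=0.0, kd_competitor=float("inf")):
    """Fraction of a single-site receptor occupied by `ligand` when a
    competitor binds the same site (all values in the same concentration unit).
    Assumes free concentrations are approximately equal to total concentrations."""
    l_term = ligand / kd_ligand
    c_term = competitor / kd_competitor
    return l_term / (1.0 + l_term + c_term)


# Hypothetical concentrations (micromolar): 1 uM c-di-GMP and a 1,000-fold excess of GTP.
C_DI_GMP = 1.0
GTP = 1000.0 * C_DI_GMP

# Case 1: a hypothetical site that binds both nucleotides equally well (Kd = 1 uM each).
nonselective = fractional_occupancy(C_DI_GMP, 1.0, GTP, 1.0)

# Case 2: a hypothetical site that effectively excludes GTP (Kd for GTP = 1e6 uM).
selective = fractional_occupancy(C_DI_GMP, 1.0, GTP, 1e6)

print(f"nonselective site: {nonselective:.4f}")
print(f"selective site:    {selective:.4f}")
```

Under these assumed numbers, a site that does not discriminate against GTP is almost never occupied by c-di-GMP, which is why a genuine receptor has to combine high affinity with strong selectivity.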
For a number of years, the search for a c-di-GMP-binding target(s) was guided by the perception that there should be a single such protein (or protein domain) acting pretty much the same way as the well-characterized cAMP-binding adaptor protein (CAP or CRP). The search for such a protein finally led to the identification of PilZ, an ~100-amino-acid (aa) protein domain (18) that appeared to satisfy at least some of the requirements for being considered the (universal) c-di-GMP-binding adaptor protein.
First of all, a PilZ domain has been found at the C terminus of BcsA, the catalytic subunit of the BCS complex, which was in line with the role of c-di-GMP as an activator of BCS. The apparent contradiction to the earlier observation on c-di-GMP binding to BcsB (9) had been resolved by the fact that in K. xylinus, the bcsA and bcsB genes are fused, encoding a single 1,500-aa BcsAB fusion protein that contains the PilZ domain in the middle. It was reasonable to assume that proteolytic digestion of BcsAB could leave the PilZ domain bound to the BcsB fragment. Second, the PilZ domain had been seen in various combinations with GGDEF, EAL, and HD-GYP domains, which was consistent with all of these domains being part of the same signaling system. Finally, the YcgR protein, which combines the C-terminal PilZ domain with an N-terminal PilZN (PilZ N terminal, also referred to as YcgR_N) domain (18) (see Fig. S2 in the supplemental material), had been shown to participate in the c-di-GMP-dependent regulation of flagellar motility.
Despite all of these observations, which were soon buttressed by direct demonstration of c-di-GMP binding to various PilZ-containing proteins (19–23), the phyletic distribution of various c-di-GMP-related domains indicated that PilZ could be only a partial solution to the problem of c-di-GMP signaling. A variety of organisms, including the important human pathogens Rickettsia spp., encoded predicted c-di-GMP synthetases and hydrolases but no PilZ domains (18, 24). Moreover, c-di-GMP-mediated regulation of virulence has been demonstrated in Anaplasma phagocytophilum (25), a member of the order Rickettsiales and also a human pathogen, which encodes a single GGDEF-containing DGC and no (known) PDEs or PilZ domains (see the c-di-GMP census). Thus, the PilZ domain could not account for all instances of the c-di-GMP receptors.
Remarkably, subsequent studies revealed the existence of three distinct classes of PilZ domains that differ in the ability to bind c-di-GMP. Type I PilZ is a full-length domain containing both RXXXR and (D/N)XSXXG motifs (described in reference 18); it is apparently the only PilZ type capable of directly binding c-di-GMP. Many PilZ domain proteins, including the eponymous PilZ protein from P. aeruginosa, belong to type II, which lacks the N-terminal RXXXR motif and is incapable of binding c-di-GMP (26–28). PilZ type III is a truncated version that forms a stable tetramer via a helical bundle (29); it does not bind c-di-GMP either.
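As a rough illustration of how the two type I signature motifs could be checked in a protein sequence, the sketch below translates RXXXR and (D/N)XSXXG into regular expressions. The function name and the toy sequences are hypothetical and do not correspond to any real protein; requiring the second motif to lie downstream of the first is an assumption based on RXXXR being described as the N-terminal motif. Motif presence alone cannot prove c-di-GMP binding, and it cannot tell type II from type III domains.

```python
import re

# Motifs as written in the text, with X standing for any residue.
RXXXR = re.compile(r"R.{3}R")
DXSXXG = re.compile(r"[DN].S.{2}G")


def pilz_motif_report(seq):
    """Report which PilZ signature motifs occur in `seq`.

    When RXXXR is present, (D/N)XSXXG is searched only downstream of it
    (an assumption of this sketch, not a rule stated in the text)."""
    seq = seq.upper()
    rxxxr = RXXXR.search(seq)
    start = rxxxr.end() if rxxxr else 0
    dxsxxg = DXSXXG.search(seq, start)
    return {
        "RXXXR": rxxxr is not None,
        "(D/N)XSXXG": dxsxxg is not None,
        "type_I_candidate": rxxxr is not None and dxsxxg is not None,
    }


# Toy sequences invented for this example.
toy_with_both = "MAAQRRKAFRVPLAAGGTAAAEVVDISAGGLAL"
toy_without_rxxxr = "MAAQAAKAFAVPLAAGGTAAAEVVDISAGGLAL"

print(pilz_motif_report(toy_with_both))
print(pilz_motif_report(toy_without_rxxxr))
```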
Another possible solution to the c-di-GMP receptor conundrum came from the discovery of c-di-GMP-binding RNA aptamers (riboswitches) (30, 31). However, again, these riboswitches appear to have a limited phylogenetic distribution (see http://ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html) and offer only a partial solution to the problem. To conclude, the multitude of c-di-GMP-dependent systems dictates that there should be a multitude of c-di-GMP receptor proteins (or RNAs).
One of the potential reasons for the diversity of c-di-GMP-binding proteins is the structural diversity of c-di-GMP itself. Indeed, c-di-GMP is a rather flexible molecule that can adopt a variety of conformations, ranging from a fully stacked form to an extended form, by adjusting a few torsional angles or the glycosidic angle (Fig. 1). The hydrophobic surface of the guanine base can also stack with an arginine or phenylalanine/tyrosine residue, which increases the diversity of its interactions. It also has two guanine bases, each containing a Watson-Crick (WC) edge that can bind to an aspartate or glutamate residue (Fig. 1A) and a Hoogsteen (H) edge (see Fig. S1 and reference 32 for the nomenclature) that can interact with an arginine residue in either perpendicular (Fig. 1B) or parallel mode (Fig. 1C). In addition, c-di-GMP can oligomerize; there are protein structures containing c-di-GMP in its monomeric, dimeric, and as of recently, tetrameric forms (33). A dimeric c-di-GMP molecule can adopt even more forms; the four guanine bases can be either completely mutually stacked or stacked only partially, and the destacked guanine base can interact extensively with the surrounding hydrophobic amino acid residues (see below). The superimposition of several typical c-di-GMP molecule conformations found in some protein complexes is shown in Fig. 1E. It is clear that the two guanine bases can span considerable conformational space, allowing significant binding flexibility of this wonderful molecule.
In order to assess the diversity of c-di-GMP-binding proteins and define any potential common trends in c-di-GMP binding mechanisms, it is instructive to examine the structures of known c-di-GMP receptors, particularly those solved as c-di-GMP complexes. Among these structures, three distinct families of c-di-GMP-binding proteins (GGDEF I site based, EAL domain based, and PilZ domain based) are represented by multiple structures in the Protein Data Bank (PDB) (34). In addition, there are several other proteins displaying alternative mechanisms of c-di-GMP binding, each currently represented by a single protein structure (Table 1). Understanding the common and specific features of these complexes could also help attempts to predict additional c-di-GMP-interacting proteins.
Structural characterization of PleD, a DGC from Caulobacter crescentus, revealed a c-di-GMP molecule bound at the autoinhibitory (I) site, which was formed by the Arg and Asp residues of the conserved RXXD motif located five residues upstream of the active-site GG(D/E)EF motif (12, 13). The RXXD motif of the I sites of active or inactive DGCs is arguably the simplest known c-di-GMP-binding motif; it is also the best-characterized one, with five structures of such complexes available in the PDB (Table 1). Figure 2 shows the active site of the enzymatically inactive DGC PelD (35) in a complex with a c-di-GMP dimer (Fig. 2A) and presents a simplified binding scheme (Fig. 2B). The RXXD motif adopts a β-turn loop structure with a hydrogen (H) bond forming between the backbone atoms of the Arg (R367) and Asp (D370) residues. The guanidinium group of R367 forms two H bond/salt bridges with one of the guanine bases of c-di-GMP (G1) via its H edge (O-6 and N-7 atoms), while D370 forms two H bond/salt bridges with another guanine base (G4) via its WC edge. Thus, the RXXD motif enhances the mutual intercalation of the guanine bases of the c-di-GMP dimer. In addition to the RXXD motif, several accessory residues (carbon atoms shown in gray) contribute to the c-di-GMP-binding activity of PelD. For example, Arg161 and Arg402 participate in binding by interacting with the guanine bases of c-di-GMP via their H edge, which seems to be a common binding mode for the arginine residues involved in interactions with intercalating guanine bases. Two hydrophobic residues, Tyr399 and Leu388, form a hydrophobic cluster that helps to stabilize the inner G4 base, whereas the outermost G2 base is exposed to the solvent. The I sites of the active DGCs PleD (PDB code 2V0N) and WspR (PDB code 3I5A) show very similar c-di-GMP-binding modes (36–38), although with fewer accessory residues involved in c-di-GMP binding (Fig. 2C).
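The spatial relationship between the I site and the active site can be expressed as a simple sequence pattern. The sketch below is a hedged illustration: it reads "five residues upstream" as a five-residue spacer between the Asp of RXXD and the first Gly of GG(D/E)EF, which is an interpretation rather than a definition given in the text, and the function name and test fragments are invented for the example. A strict pattern like this would miss domains in which either motif has diverged.

```python
import re

# RXXD, an assumed five-residue spacer, then the GG(D/E)EF active-site motif.
I_SITE = re.compile(r"(R..D)(.{5})(GG[DE]EF)")


def find_i_sites(seq):
    """Return (1-based start, RXXD match, GG(D/E)EF match) for each hit."""
    return [
        (m.start() + 1, m.group(1), m.group(3))
        for m in I_SITE.finditer(seq.upper())
    ]


# Toy fragments invented for this example.
print(find_i_sites("AAARESDLLARYGGEEFAAA"))  # five-residue spacer
print(find_i_sites("AAARESDLLAGGEEF"))       # three-residue spacer, no hit expected
```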
A similar RXGD sequence motif has been shown to participate in c-di-GMP binding in BcsE (39), a cytoplasmic protein that is encoded in a variety of E. coli-like cellulose synthase operons (class II bcs operons in the classification proposed in reference 40). Although BcsE shows no significant sequence similarity to the GGDEF domain, its RXGD motif is located on a loop between two predicted β-strands, which is similar to the secondary structure around the RXXD motif in the I site (39). BcsE might regulate cellulose biosynthesis by transferring c-di-GMP to the PilZ domain of BcsA, the catalytic subunit of the cellulose synthase complex, thereby providing an additional level of control.
While the RXXD motif in the I sites binds the characteristic mutually intercalated dimeric form of c-di-GMP, the EAL-containing PDEs bind and hydrolyze c-di-GMP in its fully extended monomeric form, which is apparently more suitable for cleavage of the central 12-member ribose-phosphate ring (41–43). The same binding mode of the extended monomeric c-di-GMP is retained in the enzymatically inactive EAL domains that function exclusively as c-di-GMP receptors (28, 44, 45). Figure 3A shows the structure of the EAL domain of the blue light-regulated PDE BlrP1 in a complex with c-di-GMP (PDB code 3GG1) and a calcium ion, an inhibitor of c-di-GMP hydrolysis (41). c-di-GMP binding depends on the eponymous EAL motif, with the absolutely conserved Glu residue (E188) coordinating the calcium ion, which is also coordinated by surrounding residues E272, D302, and N239, and one of the oxygen atoms of the central ribose-phosphate ring. Thus, this Glu is a critically important residue that is required to bind the rigid ribose-phosphate ring of c-di-GMP. The Ala residue of the EAL motif is also highly conserved, but its role in binding is unclear. However, the Leu residue of this motif (L190) plays a crucial role in stabilizing the central 12-member ring. This is clearly shown in the van der Waals (sphere) plot in Fig. 3B, where L190 fits next to the ribose-phosphate ring. This plot explains the absolute conservation of this Leu residue; a Val residue would be too short to interact with the ring, while an Ile residue would cause some steric hindrance. Both Val and Ile residues would destabilize the interaction with the ribose-phosphate ring and therefore not be suitable in this position.
The EAL motif can be extended further to include the Arg in the fifth position, which is conserved in nearly all EAL domain sequences (1, 11). This Arg residue (R192 in BlrP1) forms two bonds with the distal phosphate of the central 12-member ribose-phosphate ring (Fig. 3A and B). Thus, the EXLXR motif forms a stable platform to wrap around the ribose-phosphate ring of the c-di-GMP molecule. The two guanine bases of c-di-GMP are also engaged in interaction with amino acid residues. For example, the carboxyl group of Asp215 forms two bonds with the WC edge of the G2 base, and Asn239 stacks with the G2 base from above, while Phe203 and Pro199 stack with the G2 base from below to fix the G2 base. The G1 base at the other end is more exposed to the solvent but also forms two bonds with the carboxyl group of Glu362, as well as hydrophobic stacking with the aromatic group of Phe381 from below. Thus, the conserved EXLXR motif and an array of additional residues cooperate in binding monomeric c-di-GMP.
Two distinct variants of this binding mode are seen in the EAL domains of the enzymatically inactive PDEs FimX from Xanthomonas campestris (XcFimX; PDB code 4F3H) and LapD from Pseudomonas fluorescens (PDB code 3PJT) (27, 45), which have the EXLXR motif replaced with QAFLR and KVLSR, respectively. In the first one, one of the guanine bases of c-di-GMP adopts an unusual syn-glycosidic angle. Figure 3C shows a superposition between the c-di-GMP binding sites of XcFimX and its ortholog FimX from P. aeruginosa (PaFimX; PDB code 3HV8), which contains the typical EVLLR motif (44). While the G1 base in the complex with PaFimX is extended and interacts with the carboxylate of E654 and stacks with Y673 and F652, the glycosidic-angle-switched nucleotide syn-G1 in XcFimX can still interact with E216 via the WC edge of c-di-GMP and stacks with the aromatic ring of F217. While a syn-glycosidic angle for a purine base is rather unusual, it has been seen previously, e.g., in cAMP bound to its CAP receptor (46). Thus, in an extended monomeric c-di-GMP molecule, the guanine base exposed to the solvent may adopt a syn- or anti-glycosidic angle, depending on the local sequences of the proteins to which they bind. Another interesting feature of this structure is that the aromatic group of the Phe residue of the QAFLR motif adopts an orientation that fits nicely into the ribose-phosphate ring in a way similar to that of the usual Leu residue of the EXLXR motif (cf. Fig. 3B and D).
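The motif variants named above can be compared position by position against the EXLXR consensus. The short sketch below does only that; the helper name is hypothetical, and the three five-residue strings are the ones quoted in the text.

```python
CONSENSUS = "EXLXR"  # X stands for any residue


def deviations(motif, consensus=CONSENSUS):
    """List (position, expected, observed) where `motif` departs from the consensus."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have the same length")
    return [
        (i + 1, want, got)
        for i, (want, got) in enumerate(zip(consensus, motif.upper()))
        if want != "X" and want != got
    ]


# Motif variants quoted in the text.
VARIANTS = {
    "PaFimX": "EVLLR",
    "XcFimX": "QAFLR",
    "LapD": "KVLSR",
}

for name, motif in VARIANTS.items():
    print(name, motif, deviations(motif) or "matches EXLXR")
```

In this comparison the Arg in the fifth position is retained in all three proteins, while the substitutions fall on the Glu and Leu positions, in line with the structural observations for XcFimX and LapD.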
In the case of LapD, the c-di-GMP binding mode is largely similar to that in other EAL domains, with the exception that the amino group of the Lys446 residue (which replaces the active-site Glu of the EAL motif) forms a strong electrostatic bond with one of the phosphate oxygen atoms of the ribose-phosphate ring of c-di-GMP (45). Nevertheless, the affinity of the EAL domain of LapD for c-di-GMP is much weaker than that of the EAL domain of FimX (Table 1) and the affinity of full-size LapD is even lower (45). Still, it is remarkable that c-di-GMP binding is accompanied by only a minor change in the LapD EAL domain that is limited to just four residues in the c-di-GMP binding site (45).
The PilZ domain possesses several features that make it unique among c-di-GMP-binding proteins. First, it contains two separate sequence motifs, an RXXXR motif with conserved Arg residues surrounding one guanine base of c-di-GMP and a DXSXXG motif that surrounds the other guanine base. Second, the PilZ domain is able to bind monomeric or dimeric c-di-GMP molecules without much conformational adjustment. In the complex with the PlzD protein from V. cholerae (VCA0042; PDB code 2RDE), only one c-di-GMP molecule is detected (20), while the closely related YcgR protein from Pseudomonas putida (PP4397; PDB code 3KYF) binds dimeric c-di-GMP, as do PilZ domains in the structures of cellulose synthase subunit BcsA (PDB code 4P02) and the Alg44 protein (PDB code 4RT0) (47, 48).
Figure 4A shows the structure of PlzD in complex with monomeric c-di-GMP. The first conserved Arg residue (R136) of the RXXXR motif forms two bonds with the H edge of guanine base G1. It also stacks with the second conserved Arg residue (R140), which forms a bond with two water molecules nearby when only a single c-di-GMP molecule is present. The Leu residue L135 immediately preceding R136 forms a CH-π interaction with the G1 base. The DXSXXG motif comprises a loop that interacts with the second guanine base from above. Its conserved aspartate (D162) forms two bonds with the WC edge of guanine base G2, and the serine residue (S164) forms a bond with the 2-NH2 group of the G2 base.
Figure 4C shows the BcsA structure in complex with two c-di-GMP molecules (47). Interestingly, only a minor conformational change of the residue preceding the RXXXR motif is needed to incorporate the second c-di-GMP molecule. Compared to the PlzD structure (Fig. 4A), the guanine base G3 of the second c-di-GMP intercalates deeply into the first c-di-GMP by displacing the two water molecules binding with R140 of PlzD and by pushing L135 around to make room for guanine base G4. In the BcsA structure, guanine base G3 forms two bonds with the guanidinium group of R580 via its H edge and guanine base G4 interacts with the two residues (Q578 and R579) that precede the first conserved Arg residue (R580). The side chain of Q578 forms two bonds with the WC edge of guanine base G4, while the guanidinium group of R579 is engaged in two bonds with the H edge of guanine base G4. Thus, the two residues prior to the RXXXR motif play important roles in stabilizing the PilZ domain interaction with c-di-GMP. Indeed, sequence analysis of various PilZ domains shows a high degree of conservation of these two residues, Glu/Gln in the −2 position and Arg/Lys in the −1 position with respect to the RXXXR motif (1, 18). From the structure analysis, it is clear that glutamate can also use its carboxylate to interact with the WC edge of guanine base G4; the terminal amino group of lysine can also pair with the H edge of guanine base G4, although only one bond will be formed. This is also seen in the structure of the stand-alone PilZ domain protein PA4608 from P. aeruginosa (PDB code 2L74), which contains a Glu-Arg pair in front of the RXXXR motif and incorporates two c-di-GMP molecules in the active site (23). Thus, the RXXXR motif can be extended to LRXXXR for the PilZ domains binding monomeric c-di-GMP and to (Q/E)(R/K)RXXXR for those binding dimeric c-di-GMP.
The nature of the upstream residue has been suggested to be a critical factor in determining whether a given PilZ domain would interact with a monomeric or dimeric form of c-di-GMP (49). If this residue is large and hydrophobic, such as Leu, it would favor interaction with monomeric c-di-GMP, whereas if this residue is an Arg or Lys, it allows the second c-di-GMP molecule to come in. However, it remains to be seen whether this rule is generally applicable. In a recent study, the PilZ domain from Alg44, which contains a Gln residue in the −1 position, bound the dimeric form of c-di-GMP, and this has not been affected by mutating this residue to Leu (48).
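The proposed rule of thumb can be written down as a small sequence check, which also makes its limits visible. The sketch below is hypothetical: the function name and the toy fragments are invented, and the three-way outcome simply encodes the suggestion from reference 49 as described here. A Gln in the −1 position, as in Alg44, falls outside the rule even though that protein binds dimeric c-di-GMP.

```python
import re

# Capture the residue immediately preceding the first Arg of RXXXR
# (the -1 position in the numbering used in the text).
UPSTREAM_RXXXR = re.compile(r"(.)(R.{3}R)")


def predicted_stoichiometry(seq):
    """Apply the -1 residue rule of thumb to the first RXXXR motif in `seq`.

    Returns "monomer", "dimer", "undetermined", or None when no motif is found."""
    m = UPSTREAM_RXXXR.search(seq.upper())
    if m is None:
        return None
    minus_one = m.group(1)
    if minus_one == "L":
        return "monomer"
    if minus_one in ("R", "K"):
        return "dimer"
    return "undetermined"


# Toy fragments invented for this example.
print(predicted_stoichiometry("AALRAAAR"))  # Leu at -1
print(predicted_stoichiometry("AQRRAAAR"))  # Arg at -1
print(predicted_stoichiometry("AAQRAAAR"))  # Gln at -1, outside the rule
```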
While most of the cases discussed above include c-di-GMP binding to a single receptor molecule, there is growing evidence of c-di-GMP molecules binding at protein interfaces and facilitating protein-protein interactions.
All of the c-di-GMP binding modes described above are nonsymmetrical, with two or four guanine bases of c-di-GMP located in a somewhat different environment. However, there are also symmetrical c-di-GMP binding modes, with monomeric, dimeric, or tetrameric c-di-GMP bound mainly in the protein dimer interface. One such noncanonical binding mode is found in the mammalian STING protein that senses bacterial c-di-GMP to elicit the innate immune response. Several human and mouse STING structures in complex with c-di-GMP have been reported (50–54), all with a single c-di-GMP binding at the interface of the STING dimer in a symmetrical way (Fig. 5A). The most distinctive feature of this c-di-GMP binding mode is the extensive guanine-guanine π-π or arginine-guanine cation-π stack (55, 56), with two guanine bases involved in each stack. The Arg residue again uses its guanidinium group to stack with the guanine base. In addition, the long carbon side chain of arginine (C-β–C-γ–C-δ) can also stack upon the aromatic group of tyrosine residues to form an extensive four-layer stack. This four-layer stack thus comprises a unique Tyr/Gua/Arg/Tyr stacking interaction. In addition, the c-di-GMP molecule bound in the STING interface also adopts a more compact central ring, as the guanine bases are partially stacked on each other (Fig. 5A), unlike all other reported ribose-phosphate ring conformations that are more extended, allowing an arginine residue or guanine base to stack in between.
The c-di-GMP binding mode exhibited by the transcriptional regulator VpsT (57) is somewhat similar to that seen in the RXXD-containing I sites, namely, the four guanine bases in the dimeric c-di-GMP form a partial stack, with two Arg residues (R134 of each VpsT monomer) binding with the inner guanine bases via their respective H edges (Fig. 5B). Yet, it differs from the one with the RXXD motif in that there is no β-turn RXXD loop and, obviously, no aspartate-guanine base binding. Instead, one of the outer guanine bases pairs with the carboxylate group of a tartrate molecule present in the crystallization screening agent. This complex thus comprises another four-layer stack with the Gua/Arg/Arg/Gua stacking interaction.
When the GGDEF domain contains no I site, its DGC activity can still be subject to product feedback inhibition, as seen in the c-di-GMP complex of the isolated GGDEF domain from X. campestris protein Xcc3486 (PDB code 3QYY) (58), which has a HAMP-GGDEF domain architecture. This complex contains dimeric c-di-GMP with a unique semistacked mode for the guanine bases. As shown in Fig. 5C, the two inner G1 bases form a partial stack, while the two outer G2 bases do not stack at all to any base or aromatic group. Instead, the two G2 bases are enclosed by several hydrophobic side chains such as those from L193, L177, and the C-β of D216. In addition, the G2 base forms two bonds with D190 via its WC edge and one bond with R212 via its H edge. It is interesting that this Arg212 residue precedes the GGDEF signature motif and is extremely conserved in all GGDEF domains (1, 11). Another interesting feature of this structure is the stacking of the planar G214-G215 backbone with the G1 base (Fig. 5C) and the backbone G215-D216 turn of 90° upward to prevent potential steric hindrance with the G2 base. Thus, this binding mode includes two guanines in a Gly-Gly/Gua/Gua/Gly-Gly stack and two other guanines completely destacked and accommodated in a hydrophobic environment in the GGDEF dimer interface.
While most c-di-GMP research has been done with Gram-negative bacteria, a recent study revealed a critical role for c-di-GMP in controlling the developmental switch between sporulation and vegetative growth in the actinobacterium Streptomyces coelicolor, which is based upon a very unusual mode of c-di-GMP binding to the master developmental regulator BldD (33). In this complex, a c-di-GMP tetramer interacts with RXD-X8-RXXD signature motifs of two BldD molecules that have no other points of contact. Basically, this novel BldD2-(c-di-GMP)4 complex contains a set of four c-di-GMP molecules comprising two interlocked c-di-GMP dimers (Fig. 6A). One c-di-GMP dimer adopts a stacking configuration similar to that of the RXXXR motif binding mode in the PilZ domain (Fig. 4D) or an RXXD binding mode in the GGDEF I site (Fig. 2B), in that two inner arginine residues (R125, residue numbering from Q7AKQ8_STRCO) that are paired with guanine bases in the familiar Hoogsteen mode (G1 and G3) are stacked by two outer guanine bases (G2 and G4). This c-di-GMP dimer is interlocked with another novel (c-di-GMP)2 stacking configuration in which the two inner guanine bases (G5 and G7) are stacked by two outer arginine residues (R114) paired with guanine bases (G6 and G8) in the same Hoogsteen mode. Moreover, R114 seems to play an important role in interlocking these two c-di-GMP dimers (33). It uses its N-ε and NH1 atoms to H bond with one c-di-GMP dimer on one side and uses the NH2 atom on the other side to pair with another c-di-GMP dimer. Asp116 also helps bind this guanine base via binding to the R114-NH2 atom. Both aspartic acid residues (D128 and D116) again bind the corresponding guanine bases (G2 and G6, respectively) via the WC edge. Thus, in this BldD-(c-di-GMP)4 complex, there are three columns and four rows of stacks. The columns are G2/R125/R125/G4, R114/G3/G1/R114, and G6/G7/G5/G8, respectively. 
The rows are G2/R114/G6/D116, R125/G3/G7, R125/G1/G5, and G4/R114/G8/D116, respectively.
The above-mentioned important residues of BldD, crucial for interaction with tetrameric c-di-GMP (R114, D116, R125, and D128), are all highly conserved in the RXD-X8-RXXD signature motif. Some of the linker residues also seem to play a role in binding. For example, side chain atoms of Val121 form a CH-π bond with guanine base G6/G8, and those of Ser123 form H bonds with guanine base G6/G8. In addition, the side chain atoms of Leu122 and Ile124 turn out toward the BldD protein for hydrophobic interactions, which also contribute to the formation of a stable BldD-(c-di-GMP)4 complex. Thus, the binding motif can be extended to RXD-XXXX-VLSI-RXXD, in which the Val, Leu, or Ile residue can be a large, hydrophobic-group-containing amino acid residue and the Ser residue can be a smaller one containing a polar group. This unique stacking pattern thus includes several interdigitated Gua/Arg/Arg/Gua, Arg/Gua/Gua/Arg, and Gua/Gua/Gua/Gua stacks. The interface between the two BldD subunits spans around 10 Å, so c-di-GMP tetramer binding bridges that gap, allowing the formation of a functional BldD dimer (33).
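Both forms of the BldD signature can be sketched as regular expressions. In the example below, the core pattern follows RXD-X8-RXXD as written, whereas the residue classes used for the extended VLSI segment (which residues count as large hydrophobic or small polar) are an assumption of this sketch and are not defined in the text. The function name and toy sequences are likewise invented.

```python
import re

# Assumed residue classes for the extended linker positions (hypothetical choice).
HYDROPHOBIC = "VLIMF"
SMALL_POLAR = "ST"

# Core signature: RXD, eight linker residues, RXXD.
CORE = re.compile(r"R.D.{8}R..D")

# Extended signature: RXD-XXXX-(hydrophobic)(hydrophobic)(small polar)(hydrophobic)-RXXD.
EXTENDED = re.compile(
    rf"R.D.{{4}}[{HYDROPHOBIC}][{HYDROPHOBIC}][{SMALL_POLAR}][{HYDROPHOBIC}]R..D"
)


def bldd_like_motifs(seq):
    """Return the core match (or None) and whether the extended form also matches."""
    seq = seq.upper()
    core = CORE.search(seq)
    return {
        "core": core.group(0) if core else None,
        "extended": EXTENDED.search(seq) is not None,
    }


# Toy sequences invented for this example.
print(bldd_like_motifs("GGRADAAAAVLSIRAADGG"))  # core and extended forms
print(bldd_like_motifs("GGRADAAAAAAAARAADGG"))  # core form only
```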
All of the c-di-GMP-binding proteins discussed above have three-dimensional structures that have been solved in the presence of c-di-GMP. In addition, there is a growing number of potential c-di-GMP receptors that have been experimentally demonstrated to bind c-di-GMP with physiologically relevant affinities but—so far—produced no crystal structures in the c-di-GMP-bound form (Table 2).
One of the best examples is the interaction of two subunits, PgaC and PgaD, of the E. coli poly-β-1,6-N-acetyl-d-glucosamine synthase PgaABCD, a membrane-bound enzyme that is responsible for the synthesis and secretion of this extracellular polysaccharide, a major component of biofilms produced by a variety of bacteria. The activity of this enzyme has been shown to depend on the presence of c-di-GMP, with half-maximal activation observed already at 62 nM c-di-GMP (59). A detailed study of the mechanisms of enzyme activation revealed that c-di-GMP was concomitantly binding to both the PgaC and PgaD subunits by interacting with the R(TS)qRXYG(NR)V and LhNKXR motifs on the cytoplasmic side of the respective proteins (59). No specific binding to either PgaC or PgaD was observed, indicating that c-di-GMP binds at the interface of the two proteins and stabilizes their interaction (59). Structural characterization of the c-di-GMP complexes of these and other proteins listed in Table 2 is a challenge for the near future, as these complexes may reveal entirely new modes of c-di-GMP binding and bring new insights into the mechanisms of c-di-GMP signaling.
As discussed above, c-di-GMP binding sites typically include a pocket with a hydrophobic lining to accommodate the guanine rings of c-di-GMP and some amino acid residues, usually Arg or Asp/Glu, that form hydrogen bonds with the H edge or the WC edge of the guanine base, respectively. Indeed, Arg residues are present in every active site of the known protein complex structures that contain c-di-GMP molecules in stacked form, and they seem to play an important role in the stabilization of monomeric or dimeric c-di-GMP molecules. Mutation of Arg residues has been successfully used to define the c-di-GMP binding sites in the PgaC, PgaD, and BcsE proteins (39, 59). Arg is a very special amino acid: it has a long linear side chain of three carbon atoms (C-β–C-γ–C-δ) ending with a terminal guanidinium group. The guanidinium group comprises three nitrogen atoms (N-ε, NH1, and NH2) attached to a central carbon atom (C-ζ) (Fig. 1D). It has both H bonding and stacking capability and can therefore interact with a guanine base in diverse ways. For example, the guanidinium group can form two H bonds to the O-6 and N-7 atoms of a guanine base, by using either its NH1 or NH2 amino group or the N-ε atom, and can also stack with a guanine base through the well-known cation-π interaction (55, 56). In addition, its linear alkyl chain can partially stack upon an aromatic group, enabling it to connect two structural moieties (Fig. 1D). A good example is shown in Fig. 4B, in which R584 of BcsA uses its NH1 and NH2 atoms to interact with the N-7 and O-6 atoms on the H edge of G3 in a perpendicular mode, while R580 and R579 use their N-ε and NH1 atoms to interact with the H edges of the G1 and G4 guanine bases, respectively, in a parallel mode (enlarged in Fig. 1A). Another example is shown in Fig. 5A, in which R238 of STING uses its guanidinium group to form a cation-π interaction with the G2 base but uses its C-β–C-γ–C-δ atoms to form a partial hydrophobic interaction with the aromatic amino acid Y240 (shown as spheres in Fig. 1D). This dual stacking capability thus enables an arginine residue to cross-stack two aromatic groups to enhance c-di-GMP binding. Strikingly, in the structures discussed here, arginine always uses its guanidinium group to form bonds with the O-6 and N-7 atoms on the H edge of the guanine base and never interacts with the WC edge atoms. A possible reason for this phenomenon is that the NH1, NH2, and N-ε atoms of Arg all carry protons and can only serve as proton donors. Therefore, Arg cannot interact with the N-1 and N-2 atoms on the guanine WC edge, which are also proton donors. In addition, an Arg residue has more opportunities to stack upon surrounding residues (it can stack upon a guanine base, an aromatic group, or even another guanidinium group). Interestingly, the guanine WC edge seems to be set aside for interaction with the carboxylate oxygen atoms of aspartate, glutamate, or even tartrate, possibly because the carboxylate group has only H bonding but no stacking ability (Fig. 2A, 3A and C, 4A and C, 5C, and 6B). Thus, a guanine base can interact with an aspartate or glutamate through its WC edge atoms and with an arginine through its H edge atoms to significantly enhance its binding affinity for different proteins.
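The donor/acceptor argument above can be written out as a small lookup. This is only an illustrative sketch of the reasoning in the text. The role assignments follow the paragraph: Arg N-ε, NH1, and NH2 and guanine N-1 and N-2 are proton donors, while guanine O-6 and N-7 and the carboxylate oxygens are acceptors. The dictionary and function names are hypothetical, and geometry, stacking, and changes in protonation state are ignored.

```python
# Hydrogen-bond roles as described in the text (simplified).
DONOR, ACCEPTOR = "donor", "acceptor"

# Assumption: O6 is assigned to the Hoogsteen edge only, as in the text.
GUANINE_EDGES = {
    "Hoogsteen": {"O6": ACCEPTOR, "N7": ACCEPTOR},
    "Watson-Crick": {"N1": DONOR, "N2": DONOR},
}

SIDE_CHAINS = {
    "Arg": {"NE": DONOR, "NH1": DONOR, "NH2": DONOR},
    "Asp": {"OD1": ACCEPTOR, "OD2": ACCEPTOR},
    "Glu": {"OE1": ACCEPTOR, "OE2": ACCEPTOR},
}


def compatible_edges(residue):
    """List guanine edges where the residue can pair a donor with an acceptor."""
    roles = set(SIDE_CHAINS[residue].values())
    edges = []
    for edge, atoms in GUANINE_EDGES.items():
        edge_roles = set(atoms.values())
        # An H bond needs one donor and one acceptor.
        if (DONOR in roles and ACCEPTOR in edge_roles) or (
            ACCEPTOR in roles and DONOR in edge_roles
        ):
            edges.append(edge)
    return edges


if __name__ == "__main__":
    for res in SIDE_CHAINS:
        print(res, compatible_edges(res))
```

Under these simplified assignments, Arg is matched only to the Hoogsteen edge and Asp/Glu only to the Watson-Crick edge, which is the pattern described above.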
The diversity of c-di-GMP-binding proteins implies an equal diversity of the mechanisms of signal transmission from these proteins to their downstream targets. Indeed, several distinct mechanisms of c-di-GMP signaling have already been uncovered and many more are expected to be described in the future. Nevertheless, structural comparisons of c-di-GMP receptors in the presence or absence of a bound c-di-GMP ligand revealed several general principles that could narrow down the list of potential regulatory interactions.
First of all, both PilZ and GGDEF domains are structurally rigid and usually do not undergo any major conformational changes upon c-di-GMP binding. In PilZ domains, c-di-GMP binding affects primarily the unstructured N-terminal fragment, which wraps around the ligand, with the Arg residues of the RXXXR motif interacting with c-di-GMP (60). Essentially the same picture is observed in the PilZ domains of the YcgR-like proteins PlzD (VCA0042) and PP4397: c-di-GMP binding affects the mutual positioning of the PilZ and PilZN/YcgR domains but has little effect on the structures of these domains themselves (20, 49). Likewise, in BcsA, c-di-GMP binding causes relatively minor conformational changes in the PilZ domain but shifts the positions of some of its N-terminal residues (47). This shift, however, is sufficient to cause repositioning of the gating loop that controls substrate access to the active site of BcsA. As a result, binding of c-di-GMP leads to a dramatic increase in enzyme activity (almost 100-fold in K. xylinus).
In the GGDEF domains, the effect of c-di-GMP binding seems to be even less pronounced. The inhibition of DGC activity upon c-di-GMP binding to the I sites of PleD and WspR appears to be caused by locking the respective GGDEF domains in an unproductive conformation that still allows them to bind GTP but not to form c-di-GMP (37, 38). In all of these cases, c-di-GMP seems to act at the level of entire protein domains, either bringing them closer together or keeping them apart.
The second mechanism of c-di-GMP signaling is through its binding at protein interfaces, where it functions as a molecular glue that facilitates protein-protein interactions. A good example is the interaction of the enzymatically inactive EAL domain of FimX with the PilZ protein of X. campestris, which regulates extension and retraction of the pilus in the presence of c-di-GMP (27, 28). This XcPilZ protein contains a type II PilZ domain (see above) and cannot bind c-di-GMP on its own. Nor does it bind the XcFimXEAL domain in the absence of c-di-GMP, as both the PilZ and XcFimXEAL domains eluted at the expected molecular weights of the respective monomers (27). In the presence of c-di-GMP, however, the three components eluted as a single peak corresponding to the ternary complex.
The role of c-di-GMP in facilitating protein-protein interactions is particularly important for transcriptional regulators, such as VpsT, BldD, CLP, and MrkH, that need to be in a dimeric (oligomeric) form in order to effectively bind their DNA targets (33, 57, 61, 62). A good example is dimerization of transcription factor BldD, which controls the progression of multicellular differentiation in sporulation in Streptomyces spp. (33). Structural and biochemical analyses show that a tetrameric c-di-GMP links two subunits of BldD through their C-terminal domains, which are otherwise separated by approximately 10 Å and thus would not form dimers without it (33).
c-di-GMP can also disrupt protein-protein interactions, as has been observed in the case of the YajQ protein (63). In X. campestris, YajQ serves as a coactivator for XC_2801, a transcriptional factor of the LysR family, which controls the expression of various virulence-related genes. The c-di-GMP-bound form of YajQ no longer functions as a coactivator, resulting in the switching off of several distinct operons (63). This mechanism positions c-di-GMP near the top of a complex signaling hierarchy of X. campestris.
Finally, there are c-di-GMP-type signaling mechanisms that actually do not include c-di-GMP. Some proteins with highly diverged GGDEF, EAL, or PilZ domains participate in regulatory interactions despite a total loss of enzymatic activity or the ability to bind c-di-GMP. Examples include the GGDEF domain protein GdpS (SA0701) from Staphylococcus aureus, the E. coli EAL domain proteins BluF (YcgF) and CdgR (YdiV), and the E. coli carbon storage regulator CsrD (YhdA), which contains both GGDEF and EAL domains (64–67). A potentially even more striking example is the interaction between the X. campestris response regulator RpfG, which combines a two-component receiver (REC) domain with the c-di-GMP-degrading HD-GYP domain, and various GGDEF domain-containing proteins, which then recruits a PilZ domain protein to form a tripartite HD-GYP–GGDEF–PilZ complex that effectively controls pilus motility (68, 69). While it is still unclear whether c-di-GMP is required for the formation of either of these complexes (69), this might be just the first example of the peculiar interaction between supposedly c-di-GMP-synthesizing and c-di-GMP-hydrolyzing domains.
Recent studies have significantly expanded the list of c-di-GMP-binding proteins, including several novel classes of c-di-GMP receptors (Tables 1 and 2). While most early studies documented c-di-GMP regulation at the posttranslational level, recent characterization of the DNA-binding proteins VpsT, BldD, Bcam1349, BrlR, CLP, FleQ/FlrA, LtmA, MrkH, and VpsR (33, 57, 61, 62, 70–75) provided clear examples of c-di-GMP involvement in transcriptional regulation. Remarkably, while most of these transcriptional regulators bind DNA via standard helix-turn-helix domains, MrkH combines its c-di-GMP-binding PilZ domain with a previously unknown type of DNA-binding domain, which is related to the PilZN/YcgR and PilZNR/YcgR_2 domains (see Fig. S3 in the supplemental material). Proteins with various domain architectures that combine PilZ domains with uncharacterized or poorly characterized domains are obvious candidates for new types of c-di-GMP receptors. PilZ domain entry PF07238 in the Pfam database (76) provides several examples of such proteins, including, among others, combinations of PilZ with Fis-, Cro/C1-, and MarR-type DNA-binding domains, with the two-component REC domain, and with the DnaK- and DnaJ-type chaperone domains (see Fig. S2 in the supplemental material). All of these proteins are attractive candidates for future experimental studies.
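The candidate search described above amounts to filtering domain architectures. The following is a rough sketch under stated assumptions: the input is a hypothetical mapping from protein identifiers to ordered lists of domain names (for example, parsed from a domain annotation table), the identifiers and the set of "already characterized" partner domains are invented for illustration, and no real Pfam interface is called.

```python
# Partner domains treated as already characterized alongside PilZ.
# Assumption: this set is illustrative; the names come from the text.
KNOWN_PARTNERS = frozenset({"PilZN", "PilZNR"})


def pilz_candidates(architectures, known=KNOWN_PARTNERS):
    """Return proteins that pair PilZ with at least one domain outside `known`."""
    result = {}
    for protein, domains in architectures.items():
        if "PilZ" not in domains:
            continue  # not a PilZ protein at all
        novel = [d for d in domains if d != "PilZ" and d not in known]
        if novel:
            result[protein] = novel
    return result


if __name__ == "__main__":
    # Hypothetical identifiers and architectures.
    toy = {
        "protA": ["PilZN", "PilZ"],
        "protB": ["MarR", "PilZ"],
        "protC": ["REC", "PilZ"],
        "protD": ["GGDEF", "EAL"],
    }
    print(pilz_candidates(toy))
```

A filter of this kind only nominates candidates. Whether a given protein actually binds c-di-GMP remains an experimental question.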
The discoveries of distinct new modes of c-di-GMP binding by the PgaC/D, BldD, and STING proteins suggest that there might be further c-di-GMP receptors beyond those listed in Table 2. Predicting them through bioinformatics does not seem feasible; their identification will require experimental approaches such as pulldown assays, the differential radial capillary action of ligand assay (DRaCALA), or the c-di-GMP capture compound (77, 78). These methods have already been used to identify a number of potential c-di-GMP-binding proteins (39, 78, 79), although it remains to be seen whether their binding affinities prove sufficient to be relevant under physiological conditions. In any case, there is every reason to believe that the existing list of c-di-GMP-binding proteins is not yet complete and that there are exciting new discoveries still to be made.
We thank the reviewers for many helpful comments, particularly for suggesting an explanation for why Arg always interacts with the H edge but not with the WC edge.
This work was supported by the Ministry of Education, Taiwan, Republic of China, under the ATU plan, by the National Science Council, Taiwan, Republic of China (grant 102-2113-M005-006-MY3 to S.-H.C.), and by the NIH Intramural Research Program at the National Library of Medicine (M.Y.G.).
Shan-Ho Chou is currently a chair professor and director of the Institute of Biochemistry, National Chung Hsing University, Taiwan. He received his bachelor's degree in chemistry from the National Taiwan Normal University, a master's degree in biochemistry from the National Taiwan University, and a Ph.D. in chemistry from the University of Washington in Seattle, WA. At the first stage of his research career, he studied unusual nucleic acid structures by using nuclear magnetic resonance (NMR) analysis and found several stable nucleic acid structures different from the WC base-paired duplex, which were published in review papers in Nucleic Acids Research and the Journal of Molecular Biology. He then switched to studying structural genomics of the plant pathogen X. campestris by X-ray crystallography and solved several unique c-di-GMP–protein complex structures. He is now combining X-ray, NMR, and small-angle X-ray scattering techniques to study multidomain proteins associated with c-di-GMP and c-di-AMP binding. He is an author of more than 100 research papers, reviews, and book chapters.
Michael Y. Galperin is a lead staff scientist at the Computational Biology Branch of the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine at the National Institutes of Health in Bethesda, MD. He has received an M.Sc. in biochemistry and a Ph.D. in microbiology from the Lomonosov Moscow State University in Russia and postdoctoral training at the University of Louisville and the University of Connecticut. Dr. Galperin has been at the NCBI since 1996. He is an author of a textbook and more than 180 research papers, reviews, and book chapters on various aspects of microbial genomics, genome annotation, and metabolic and signaling pathways. He is a member of the Journal of Bacteriology editorial board and serves as the editor of the Nucleic Acids Research annual database issue and the editor of the Genomics Updates section of Environmental Microbiology.
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JB.00333-15.