Light-oxygen-voltage (LOV) domains serve as the photosensory modules for a wide range of plant and bacterial proteins, conferring blue light dependent regulation to effector activities as diverse as enzymes and DNA binding. LOV domains can also be engineered into a variety of exogenous targets, enabling similar regulation for new protein-based reagents. Common to these proteins is the ability of LOV domains to reversibly form a photochemical adduct between an internal flavin chromophore and the surrounding protein, using this to trigger conformational changes that affect output activity. Using the Erythrobacter litoralis protein EL222 as a model system, which links LOV regulation to a helix-turn-helix (HTH) DNA binding domain, we previously demonstrated that the LOV domain binds and inhibits the HTH domain in the dark, releasing these interactions upon illumination [Nash et al. (2011) Proc. Natl. Acad. Sci. USA 108, 9449–9454]. Here we combine genomic and in vitro selection approaches to identify optimal DNA binding sites for EL222. Within the bacterial host, we observe binding to several genomic sites using a 12 bp sequence consensus that is also found by in vitro selection methods. Sequence-specific alterations in the DNA consensus reduce EL222-binding affinity in a manner consistent with the expected binding mode: a protein dimer binding to two repeats. Finally, we demonstrate the light-dependent activation of transcription of two genes adjacent to an EL222 binding site. Taken together, these results shed light on the native function of EL222 and provide useful reagents for further basic and applied research on this versatile protein.
For cells to respond to changes in their environment, they rely on sensory proteins to perceive these changes and initiate appropriate responses at the biochemical level. Two critical aspects of this process – detecting the signal and transmitting this to downstream effectors – are elegantly combined within several types of small protein domains that bind environmentally-sensitive cofactors, using these to trigger protein structural changes that affect sensor/effector interactions. This principle has been demonstrated for several different types of sensory domains, including the PAS (Per-ARNT-Sim) domain family that includes sensors of oxygen, redox, light and other stimuli1.
The signaling mechanism of PAS domains is nicely exemplified by a subset which utilizes internally-bound flavin chromophores to sense changes in blue light or redox state, known as LOV (Light-Oxygen-Voltage) domains2. In the dark, LOV domains exist with a single non-covalently bound FMN or FAD molecule near a conserved set of residues within a mixed α/β fold common to all PAS domains. Upon illumination, a covalent adduct is formed between one of these residues, a cysteine, and the C4a position of the flavin isoalloxazine ring. This adduct formation triggers the rearrangement or dissociation of protein binding to the external surface of the β-sheet, controlling the activity of effector domains3, 4. Originally demonstrated in studies of isolated LOV domains from phototropins2, a group of light-activated serine/threonine kinases from plants, this type of light-dependent regulation has since been found in a wide range of plant, algal and bacterial proteins with very diverse effectors5. LOV domain regulation is portable enough to be engineered into a variety of downstream targets, enabling the successful design of fusion proteins conferring photoactivation to enzymatic and non-enzymatic targets6–8. As such, understanding the biophysical nature of this control is essential to understanding this type of natural photosensing and furthering engineering efforts.
In this vein, we have examined the generality of this signaling mechanism with studies of several bacterial LOV-containing proteins, which are members of the rapidly-growing ensemble of photoreceptors that control diverse responses in phototrophic and non-phototrophic bacteria (recently reviewed in ref. 9). One such protein, EL222 from the alphaproteobacterium Erythrobacter litoralis HTCC2594 (ref. 10), provides one of the smallest complete LOV-containing proteins with both sensor and effector domains within a small framework (222 aa). An example of a “one-component” signaling protein11, EL222 contains both a LOV sensor and a helix-turn-helix (HTH) DNA binding domain. Combining this domain architecture and LOV signaling principles, we hypothesized that EL222 is a light-dependent DNA binding protein, which we tested with a combination of biophysical and biochemical approaches10. Structural studies indicated that the LOV and HTH domains are tightly associated in the dark, with the LOV domain β-sheet docking to the HTH 4α helix, blocking the ability of this helix and protein to dimerize as is typically required for HTH domains to bind DNA12. Using NMR, limited proteolysis and other approaches, we demonstrated that light dependent conformational changes break this association. To survey the functional effects of these changes, we used a candidate-based approach to identify EL222-binding sequences from within the EL222 promoter. In vitro screening of over twenty overlapping 45-mer duplex DNA sites found several that bound specifically to EL222 in the light but not in the dark. However, the relatively low affinity of this interaction (5–10 μM) compared to other HTH/DNA interactions, reported to be between 0.1 and 1000 nM (refs. 13–16), suggested that higher affinity DNA binding sites might exist. Such sites would be useful reagents for biochemical studies – to verify the binding site preference of EL222 – and for engineering purposes.
To address this shortcoming, we have pursued two independent approaches to identify higher affinity DNA binding sites. Our first approach was to use in vivo ChIP-Seq (Chromatin ImmunoPrecipitation – high throughput Sequencing17, 18) to identify binding sites within the E. litoralis genome. We demonstrate that this approach was successful at providing eleven different sites that we validated with a variety of in vitro approaches. Further, we have found light-dependent gene activation near one of these sites, providing the first indication that EL222 serves as a light-dependent transcription factor. A complementary approach was provided by in vitro selection (SELEX19) of EL222 binding under lit state conditions. The two methods converged on a sequence consensus, shared by the natural and artificial sites, that is sufficient to predict novel sites within the E. litoralis genome. Finally, we present results from mutagenesis studies of the EL222 DNA binding sites, showing position-dependent effects consistent with binding in the predicted dimeric, major groove-binding mode expected of HTH proteins. Taken together, these data advance our understanding of this class of LOV-dependent proteins and lay the foundation for further functional work in the area.
E. litoralis HTCC2594 was grown in 1 l ZoBell marine broth 2216 (HiMedia Laboratories) at 30°C to an OD600 of 0.8. Cells were illuminated using a blue LED panel (14 W) and a white light flood lamp (150 W) for 20 min, crosslinked with 0.6% formaldehyde under illumination for an additional 20 min and subsequently quenched with 0.5 mM glycine. Cells were harvested by centrifugation, washed twice with PBS, and stored at −80°C until ready to use. Pellets were thawed on ice, resuspended in 1 ml lysis buffer (10 mM Tris (pH 8.0), 50 mM NaCl, 10 mM EDTA, 20% sucrose) and incubated with lysozyme (10 mg/ml) for 30 min at 30°C. The suspension was diluted by adding 4 ml PBS with 1% Triton X-100 and sonicated on ice at 60% power for 4 cycles of 15 pulses (1 s on, 1 s off) with 1 min between cycles. Samples were supplemented with 1 mM PMSF and clarified by centrifugation. Chromatin was pre-cleared by incubating with 100 μl of protein A/G-Plus agarose beads (Santa Cruz Biotechnology) for 2 hr at 4°C in a rotator, and subsequently divided into 800 μl aliquots. Each aliquot was incubated overnight at 4°C with 20 μl of protein A/G agarose beads and 4 μg of anti-EL222 antibody (YenZym) (with the exception of the mock sample, which did not have antibody). Beads were washed 4 times with PBS, after which protein and DNA were eluted in 50 μl of elution buffer by heating (65°C, 15 min). Eluted fractions were treated with RNase A and proteinase K, heated for 16 hr at 65°C and processed with the Qiaquick PCR clean up kit (Qiagen).
Single end read libraries were constructed for IP and Mock samples following standard protocols (Illumina). After checking the library quality using an Agilent Technologies 2100 Bioanalyzer and measuring the concentration with PicoGreen reagent (Invitrogen), approximately 0.1 pg of DNA from each of the IP and Mock samples was sequenced on an Illumina Genome Analyzer IIx at the UTSW Next Generation Sequencing Core. Reads were aligned to the E. litoralis HTCC2594 genome20 using the Burrows-Wheeler Aligner21 and the aligned reads were visualized using IGB22. Sequencing statistics are provided in Table S1.
E. litoralis HTCC2594 cultures were grown in triplicate in ZoBell marine broth 2216 (HiMedia Laboratories) to an OD600 of 0.8. Samples (1 ml) were collected at three timepoints: 1). pre-illumination, 2). after 30 min of white and blue light illumination, and 3). after 30 min of incubation in the dark following illumination. RNA was extracted using Trizol reagent (Invitrogen) and treated with DNase I before generation of cDNA with the iScript cDNA synthesis kit (Bio-Rad). Real-time PCR was performed in duplicate for each biological replicate using the iTaq SYBR Green Supermix (Bio-Rad). Changes in mRNA levels were analyzed using the comparative CT method23 using rpoD expression as a reference.
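The comparative CT calculation referred to above can be sketched as follows. This is a minimal illustration rather than the analysis script used in the study: it assumes roughly 100% amplification efficiency for both the target and the rpoD reference, and all CT values shown are hypothetical placeholders.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the comparative CT method: 2 ** -(ddCT).

    ct_target / ct_reference: CT values in the sample of interest
    (e.g. after illumination) for the target gene and the reference gene.
    ct_target_cal / ct_reference_cal: the same two values in the
    calibrator sample (e.g. pre-illumination).
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical CT values, for illustration only (not data from this study).
fold_lit = relative_expression(
    ct_target=22.0, ct_reference=18.0,          # illuminated sample
    ct_target_cal=24.5, ct_reference_cal=18.0,  # pre-illumination sample
)
print(f"fold change relative to pre-illumination: {fold_lit:.2f}")
```

In practice the CT values for each biological replicate would be averaged over the technical duplicates before this calculation is applied.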
WT-EL222 and A79Q-EL222 proteins10, 24 were expressed in E. coli BL21(DE3) cells, grown in LB-AMP at 37°C in the dark and induced with 0.5 mM IPTG at an OD600 of ~0.5–0.7. After inducing for 20 hr at 18°C, cells were centrifuged and the resulting pellets resuspended in buffer A (50 mM Tris-Cl pH 8.0 and 100 mM NaCl) and subsequently lysed by sonication. For SELEX experiments, protein was purified in the dark at 4°C by gravity-flow chromatography with Ni-NTA agarose (Qiagen) equilibrated in buffer A. Proteins were eluted in buffer A plus 75 mM imidazole, exchanged into imidazole-free buffer A and concentrated to 100–200 μM. For EMSA experiments done with ChIP-Seq derived DNA, WT-EL222 was purified by FPLC using Ni-affinity chromatography as previously described10, followed by a final Superdex 75 size exclusion chromatography step.
The initial single-stranded oligonucleotide library (5′-GGGAATGGATCCACATCTACG-(N)33-TTCAACTTGACGAAGCTTGCC-3′) was chemically synthesized (IDT). To amplify the DNA pool, six 50 μl PCR reactions were set up using 0.1 μM of the synthetic pool oligonucleotide as template, 2 μM primers (Fwd-L1 5′-GGGAATGGATCCACATCTACG-3′ and Rev-L1 5′-GGCAAGCTTCGTCAAGTTGAA-3′), 200 μM dNTPs, 20 mM Tris-Cl pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100, 4 mM MgCl2, and 0.04 U Vent (New England Biolabs). Amplified DNAs were purified using the QIAquick PCR purification kit (Qiagen). In a total volume of 500 μl, approximately 15.7 μM DNA (= 5×10^14 molecules) and 7.8 μM His6-tagged EL222 protein (containing an A79Q point mutation to slow dark state recovery24) were incubated in binding buffer (10 mM Tris-Cl pH 8.0, 80 mM NaCl, 3 mM MgCl2, 10% glycerol, 0.025 mg/mL poly(dI-dC), and 0.01 mg/mL BSA). The binding reaction was mixed by rotation for 25 min at 4°C and kept under continuous illumination with a fluorescent white light bulb (23 W). Ni-NTA agarose beads (Qiagen) were pre-blocked with 0.02 mg/mL of poly(dI-dC), added to the binding reaction, and incubated for another 25 min at 4°C with mixing and continuous illumination. The bead/EL222/DNA complexes were pulled down by centrifugation at 400 g for 1 min at 4°C. Next, complexes were washed with 300 μl of wash buffer (50 mM Tris-Cl pH 8.0, 300 mM NaCl) and then centrifuged to pellet the beads; this step was repeated two or more times. After the last wash, the bead/EL222/DNA complexes were resuspended in 400 μl of binding buffer (without poly(dI-dC) and BSA) and incubated in the dark for 30 min at 4°C to elute the DNAs. The bead/EL222 complexes were pulled down by centrifugation and the supernatant (containing the DNAs) was transferred to a new tube. Phenol/chloroform/isoamyl alcohol was added to the supernatant in a 1:1 ratio, and the sample was vortexed and centrifuged at 15,800 g for 5 min at 4°C.
The top aqueous phase, which contains the eluted DNAs, was transferred to a new tube containing 1 mL of 100% ethanol, 0.3 M NaOAc pH 5.2, and 0.01 mg/mL glycogen. The DNAs were precipitated overnight at −20°C and subsequently recovered by centrifugation at 15,800 g for 20 min at 4°C. The pelleted DNAs were resuspended in 12 μl of DNase-free water and later used as the template DNA pool in a second PCR amplification step and round of selection. To sequence individual DNAs from the pools obtained after each SELEX round, the DNA pools were cloned into the pBluescript+ vector (Stratagene) using BamHI and HindIII restriction sites. Computational analysis of the sequences identified with SELEX was carried out using the MEME algorithm25.
All EMSA experiments done with ChIP-Seq derived DNA sequences were performed as previously described10. EMSAs done with DNAs derived from SELEX were carried out as follows. The DNA pools from each cycle were first PCR amplified as described for the SELEX procedure. For the individual SELEX-derived clones, complementary oligonucleotides were chemically synthesized (Sigma). The oligonucleotides were annealed by heating them to 95–100°C for 5 min and then left to cool down to room temperature. The DNA pools and the individual clones were 5′-end labeled in a 50 μl reaction containing 68 nM DNA substrate, 70 mM Tris-Cl pH 7.6, 10 mM MgCl2, 5 mM DTT, 0.6 μCi [γ-32P]ATP (Perkin Elmer), and 0.2 U PNK (New England Biolabs). The reaction was incubated for 30 min at 37°C, followed by a 20 min incubation at 65°C to heat-inactivate the enzyme. The 32P-labeled DNA was purified from the unincorporated [γ-32P]ATP using Illustra ProbeQuant G-50 microcolumns (GE Healthcare). Approximately 13.6 nM radiolabeled DNA was incubated with varying concentrations of WT-EL222 in the same binding buffer used for the SELEX procedure for 25 min at 4°C, either with continuous illumination from a fluorescent white light bulb or in the dark. Reactions were analyzed on a 5% native gel (acrylamide/bis-acrylamide ratio of 29:1) and run in TBE buffer at 150 V for 1.5 hr at 4°C. The gel was exposed to a phosphorimaging plate and visualized using a FujiFilm FLA-5100 imaging system.
As noted above, we first demonstrated the light-dependent DNA binding properties of EL222 using a candidate-based approach to find putative DNA binding sites. Assuming EL222 might be auto-regulatory and bind to sequences in its own promoter, we used EMSA to survey twenty-one 45 bp sequences located within 350 bp of the EL222 gene itself10. Three sequences bound with low micromolar affinities. The best of these, AN-45 (called oligomer 1 in ref. 10), bound EL222 with an EC50 of 5–10 μM in the light and showed minimal binding in the dark, as expected from the presence of the photosensitive LOV domain. However, while AN-45 bound to EL222 with the highest affinity of this limited set of oligonucleotides, the relatively low affinity compared to other HTH-containing proteins13–16 led us to suspect that it was not an ideal binding site.
To search for EL222 binding sequences more broadly and in an unbiased manner, we used chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq17, 18) (Figure 1a). We compared results from two datasets, both of which were generated using E. litoralis cultures grown and crosslinked in lit state conditions. One set used an anti-EL222 antibody for the immunoprecipitation step (IP in Figure 1a); an antibody-free parallel experiment (Mock) controlled for non-specific binding. Reads for both IP and mock-IP were mapped to the E. litoralis HTCC2594 genome sequence20.
From these data, we identified eleven putative EL222 binding sites (peaks A–K in Figure 1a) that were enriched in the IP dataset over the mock-IP counterpart, in the best case by more than 30-fold. For our initial characterization, we focused on peak C, which showed the greatest enrichment in the ChIP-Seq experiment. Using EMSA to survey the ability of 45 bp probes covering this peak to bind 0.5 μM EL222 in vitro, we found two sequences (C7 and C8) that were substantially bound (Figure 1b, c); both are located at the center of the ChIP-Seq peak. We tested smaller 20–24 bp probes covering the sequence spanned by probes C7 and C8 and found a 24 bp probe (C-24mer) that bound with similar affinity to the 45 bp probe C8 (Figure S1). Finally, we confirmed that the binding of EL222 to this 24 bp sequence was light-dependent by comparing EMSA data collected under dark and lit conditions (Figure 1d). Binding in the dark was negligible compared to binding under lit conditions at all concentrations tested, consistent with our prior findings10. Similar studies with peaks A, B and D established 45 bp sequences near the centers of each peak that bound EL222 (Figure 2, S2). As with peak C, we refined probe B7 of peak B to a 22 bp sequence that bound in a light-dependent manner with similar affinity to the C-24mer (Figure 2c).
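One common way to express IP-over-mock enrichment at a peak is to normalize read counts by library size before taking the ratio. The sketch below shows that calculation. It is an assumption about how such a ratio could be computed, not a description of the procedure used here; the function name, the pseudocount and all read counts are hypothetical.

```python
def fold_enrichment(ip_reads, mock_reads, ip_total, mock_total, pseudocount=1.0):
    """Ratio of library-size-normalized read counts (IP over mock) in one window.

    ip_reads / mock_reads: reads overlapping the window in each dataset.
    ip_total / mock_total: total aligned reads in each dataset.
    pseudocount: added to both counts so an empty mock window does not
    cause division by zero (an illustrative choice, not from the source).
    """
    ip_rpm = (ip_reads + pseudocount) / ip_total * 1e6
    mock_rpm = (mock_reads + pseudocount) / mock_total * 1e6
    return ip_rpm / mock_rpm


# Hypothetical counts, for illustration only.
example = fold_enrichment(ip_reads=3099, mock_reads=99,
                          ip_total=1_000_000, mock_total=1_000_000)
print(f"IP/mock enrichment: {example:.1f}-fold")
```

Normalizing to reads per million matters whenever the IP and mock libraries were sequenced to different depths, since raw counts would otherwise favor the deeper library.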
To complement the results we obtained from the ChIP experiment, we developed an in vitro selection strategy based on SELEX (Systematic Evolution of Ligands by EXponential enrichment19). To do so, we used a large library of double-stranded oligonucleotides (~5×10^14 molecules) consisting of a random 33 bp central portion flanked by constant 5′ and 3′ ends (each 21 bp long) with primer binding sites (Figure 3a). Next, we incubated the oligonucleotide library with recombinant His6-tagged EL222 protein and exposed the mixture to light to activate the protein. EL222/DNA complexes were purified by affinity chromatography using Ni-NTA beads under stringent binding conditions, and the DNAs were eluted from the bead-bound complexes by incubation in the dark. Eluted DNAs were amplified by PCR and used as a library for a subsequent round of selection; this entire cycle was repeated a total of four times. To verify that the binding affinity of the DNA pools increased with each successive round, EMSAs were done after each selection cycle. As expected, by the fourth cycle of SELEX we had enriched for a pool of DNAs that bound EL222 appreciably tighter than the AN-45 substrate (Figure 3b).
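To put the library size in context, the short calculation below compares the stated number of molecules with the number of possible 33 bp random sequences, and estimates how often any one fully specified 12 bp sequence would be expected to occur by chance. It is a back-of-the-envelope sketch that assumes uniformly random, independent bases; the function and its parameter names are illustrative, and the 12 bp length is chosen simply to match the length of the consensus discussed in the text.

```python
def library_coverage(n_random=33, n_molecules=5e14, motif_len=12):
    """Rough sampling statistics for a randomized SELEX library.

    Returns (possible_sequences, fraction_sampled, expected_motif_copies),
    where expected_motif_copies is the expected number of library molecules
    containing one particular fully specified motif of length motif_len,
    counted over all windows of the random region (overlaps ignored).
    Assumes uniformly random, independent bases.
    """
    possible = 4 ** n_random
    fraction = n_molecules / possible
    windows = n_random - motif_len + 1
    expected_copies = n_molecules * windows / 4 ** motif_len
    return possible, fraction, expected_copies


possible, fraction, copies = library_coverage()
print(f"possible 33-mers: {possible:.2e}")
print(f"fraction sampled: {fraction:.1e}")
print(f"expected copies of a given 12-mer: {copies:.1e}")
```

Under these assumptions the library samples only a tiny fraction of all possible 33-mers, yet any individual 12 bp sequence is expected to be present in many copies, which is why a selection of this kind can recover short binding motifs.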
To characterize the features of these EL222-binding sequences, we cloned and sequenced 57 unique sequences from the DNA pool obtained after the fourth round of selection (Table S2). Based on analyses done using the motif-based sequence analysis tool MEME (Multiple Em for Motif Elicitation25), the ten highest scoring DNAs were chosen for follow-up binding studies (Figures S3a and S3b). Gel shift assays showed that Clone-1 bound EL222 most tightly out of the ten DNAs tested, with approximately 16–30 fold higher affinity than AN-45 (EC50=0.3 μM Clone-1 versus EC50= 5–10 μM AN-45; Figure S3d and ref. 10). To further refine the sequence determinants for binding within the 45 bp Clone-1 DNA, we designed four overlapping 20 bp fragments derived from Clone-1 and assessed their binding to EL222 by gel shift assay (Figures 3c and 3d). Interestingly, only the C1-2 fragment was bound by recombinant EL222 protein at protein concentrations between 0.3 and 1 μM, while the other three fragments showed no binding under these conditions. Thus, this result suggests that C1-2 contains all of the residues necessary for the binding of EL222.
From these data, it is clear that both ChIP and SELEX were capable of independently identifying sequences with sub-micromolar affinities for EL222. To determine if any shared motifs existed within these sequences, we again used the motif-based sequence analysis tool MEME25. We aligned the 45 bp EL222-binding sequences from ChIP-Seq peaks A–D (probes B7, D8, and the combination of probes A5–A6 and C7–C8) and 33 bp SELEX sites and searched for palindromic motifs. A search for a 12 bp or longer motif revealed the consensus sequence RGNCYWWRGNCY (Y=C/T, W=A/T, R=A/G, N=any nucleotide; Figure 4), while searches with the default minimum motif width of 6 returned only part of this motif, RGNCY. As observed for other HTH domains12, the 12 bp consensus contained two repeats separated by a short spacer. Here, we found 5 bp repeats, each of which contained a highly-conserved GNC element flanked by a purine (5′) and pyrimidine (3′) residue, separated by an AT-rich 2 bp central spacer. Within the search sequences, we found examples of both inverted and direct arrangements of these repeats. Remarkably, we find that the MEME-predicted EL222 consensus motif maps specifically to the 20–24 bp fragments identified by truncation analyses of ChIP-Seq Peaks B and C (Figures 1, 2, S1) and Clone-1 (Figure 3d). We also confirmed that a 22 bp fragment of peak A containing the motif found in the MEME alignment was bound by EL222 (Figure S2c). These results provide independent support for the possibility that the MEME motif may serve as an EL222 binding site; however, they do not confirm that the motif predicted by MEME is the bona fide consensus sequence for EL222, nor do they directly establish the relative energetic importance of different residues within the motif.
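The degenerate consensus can be turned into a simple pattern search by expanding each IUPAC code into the bases it stands for. The sketch below does this with a regular expression; the helper names and the test sequence are hypothetical, and a regular-expression match is only a yes/no screen that says nothing about binding affinity. One convenient property of RGNCYWWRGNCY is that, as a degenerate pattern, it is its own reverse complement, so scanning a single strand is sufficient.

```python
import re

# IUPAC codes used in the consensus (subset sufficient for this motif).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}
CONSENSUS = "RGNCYWWRGNCY"


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[code] for code in consensus)


def find_sites(sequence, consensus=CONSENSUS):
    """Return (start, matched_sequence) for every match, overlaps included."""
    # A lookahead lets the regex engine report overlapping matches.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical test sequence with one embedded site (not a sequence from this study).
hits = find_sites("TTTTAGGCTATGGACCAAAA")
print(hits)
```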
To further explore the EL222-binding roles of residues within the MEME motif, we engineered mutations to disrupt sequence elements in the predicted consensus sequence. We used the SELEX Clone-1 as the template for these mutations, as it both bound EL222 tightly (Figure 3d) and contains a copy of the consensus sequence (Figure 4B), and used EMSA to analyze EL222 binding. Purified recombinant EL222 efficiently binds the wild type Clone-1 sequence with sub-micromolar affinity (Figures 5b and S3); in contrast, mutations in either of the two GNC elements substantially weakened binding (Figures 5b and S4). Simultaneous mutation of both the G and C residues in a single repeat was most detrimental to binding (Figure 5b, mut1 and mut2). We also saw lowered binding affinities when individual G or C residues were changed within either repeat, albeit not as dramatically as with the double mutants (Figure S4, mut3–6). These results suggest that EL222 requires two intact GNC repeats to bind to the Clone-1 sequence with submicromolar affinity. The fact that disrupting only one of the repeats is enough to weaken the binding of EL222 in our gel shift assays is consistent with our signaling model that EL222 binds to DNA as a dimer10, as is commonly observed for other HTH proteins12.
Turning to less well-conserved residues, we examined the importance of the purine and pyrimidine residues that MEME predicted to flank the GNC element on the 5′ and 3′ side, respectively. Consistent with this prediction, swaps among the purines (A/G on 5′, mut7) or pyrimidines (T/C on 3′, mut9) within the left repeat were tolerated, while purine to pyrimidine changes and vice versa (A/T, mut8; T/A, mut10) more substantially lowered binding affinity (Figure 5). Similar results were obtained when we made the equivalent mutations in the right repeat (Figure 5, mut11–mut14). These results confirm that the sequence conservation of these flanking residues reflects an inherent energetic preference for specific types of residues at these sites.
Finally, we also examined the importance of the least conserved positions, the N residues of the GNC repeats and the inter-repeat spacers. As shown in Figure S4, binding of EL222 to DNAs with mutations in the middle N position of either (mut15, 16) or both (mut17) repeats was virtually identical to binding to wild type Clone-1. Turning to the spacer region separating the two repeats, we found that changes to the spacer sequence had little to no effect on binding (mutating both T/A, mut18) (Figure S4). Importantly, however, the length of the spacer itself was critical; insertion of two additional T residues to convert the spacer from 2 to 4 bp completely eliminated binding. In conclusion, these data demonstrate that two properly-spaced RGNCY repeats are the primary determinants of EL222 binding.
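The qualitative pattern of the mutagenesis results can be mirrored by asking which kinds of sequence change keep a site within the RGNCYWWRGNCY consensus and which take it outside. The sketch below uses a hypothetical 12 bp site that matches the consensus (the actual Clone-1 sequence is not reproduced here) and illustrative variant names. Consensus membership is only a coarse proxy and does not capture the graded affinity differences measured by EMSA.

```python
import re

# RGNCYWWRGNCY expanded into a regular expression (R=A/G, Y=C/T, W=A/T, N=any).
CONSENSUS_RE = re.compile("[AG]G[ACGT]C[CT][AT][AT][AG]G[ACGT]C[CT]")


def matches_consensus(site):
    """True if the consensus occurs anywhere within the given sequence."""
    return CONSENSUS_RE.search(site.upper()) is not None


# Hypothetical consensus-matching site: left repeat, 2 bp spacer, right repeat.
wild_type = "AGGCT" + "AT" + "GGACC"

variants = {
    "wild type": wild_type,
    "N position changed (G to T)": "AGTCT" + "AT" + "GGACC",
    "purine swap at 5' flank (A to G)": "GGGCT" + "AT" + "GGACC",
    "purine to pyrimidine at 5' flank (A to T)": "TGGCT" + "AT" + "GGACC",
    "conserved G changed (G to A)": "AAGCT" + "AT" + "GGACC",
    "spacer lengthened from 2 to 4 bp": "AGGCT" + "ATTT" + "GGACC",
}

results = {name: matches_consensus(seq) for name, seq in variants.items()}
for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'outside'} consensus")
```

In this toy example the changes that stay within the consensus (N position, purine-for-purine swap) correspond to the classes of mutation that were tolerated in the binding assays, while those that leave it (conserved G, purine-to-pyrimidine swap, longer spacer) correspond to the classes that weakened or eliminated binding.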
Using the EL222-binding consensus sequence identified from ChIP-Seq peaks A-D and SELEX data (Figure 4), we predicted binding sites for EL222 in sequences for the remaining ChIP-Seq peaks E through K. We searched for the consensus near the centers of these peaks, based on our observations that this is where the binding sequences were located in the peaks we investigated initially. We then tested if EL222 would bind to these sequences in an EMSA (Figure S5); indeed, EL222 bound to 22 bp oligonucleotides containing the sequences predicted from the consensus at protein concentrations between 0.25 and 0.75 μM.
The ChIP-Seq results suggested that EL222 bound to specific sites within the E. litoralis genome, indicating a possible light-dependent regulatory function. To test the specific possibility that this would affect transcription, we examined the expression levels of genes downstream of the peaks with the most ChIP-Seq reads: peaks B (ELI_05375, protein of unknown function, and ELI_05380, radical SAM protein, putative pyrimidine dimer lyase), C (ELI_06040, putative indoleamine 2,3-dioxygenase), and D (ELI_08405, NAD synthetase). Using qRT-PCR, we quantitated expression of these four genes in E. litoralis cultures at three points: before exposure to light, immediately after 30 min illumination, and after a 30 min post-illumination dark state recovery period. Notably, we saw induction of both genes near peak B after illumination: 6-fold up for ELI_05375 and 2-fold up for ELI_05380 (Figure 6). Expression levels diminished after returning the cultures to the dark for 30 min, supporting the light dependence of this effect. Importantly, the timing of this decrease in mRNA levels is consistent with the quick recovery of the dark state EL222 conformation post-illumination, which takes approximately 30 s at room temperature10. We did not observe significant changes in expression for genes from the other two peaks studied.
Utilizing independent ChIP-Seq and SELEX approaches, we have identified a 12 bp consensus sequence recognized by the light-dependent LOV-HTH protein EL222. Several properties of this consensus are consistent with our model for EL222 signaling10, and more broadly, with general characteristics of HTH domain/DNA interactions. First, the consensus contains two 5 bp binding sites that can be oriented as direct or inverted repeats separated by an AT-rich 2 bp spacer (“5-2-5”) for binding an EL222 dimer (Figure 4). Similar arrangements are commonly used by other HTH domains to bind DNA as dimers12, with minor differences in the number of basepairs involved: the related LuxR-type HTH proteins DosR26, NarL27, and TraR28–30 utilize inverted repeats in 9-2-9, 7-2-7, and 8-2-8 configurations, respectively. Interestingly, several HTH proteins have been demonstrated to bind both inverted and direct repeats, including NarL27 and the serine recombinase Sin31. Such binding is facilitated for these proteins by relatively long linkers (approx. 30 aa) between the HTH and the adjacent domain31, 32, as we observed between the LOV and HTH domains in EL222 (22 aa; ref. 10). Studies of several of these proteins show that the inter-repeat spacing is critical to positioning the two half-sites at ideal distances and phasing for interacting with the HTH domains. Alteration of these AT-rich spacers greatly diminishes the protein-binding capability of these repeats, as demonstrated for 2 bp insertions for EL222 (Figure S4, mut19) or TraR30, despite the lack of specific protein/DNA base contacts in this region28, 29. This parallel supports a key feature of our model for EL222 activation by blue light, where covalent adduct formation within the LOV domain triggers an allosteric process that releases inhibitory interactions between the LOV and HTH domains, converting a dark state monomer into a dimeric form that can bind DNA10.
Turning to each DNA half-site, HTH domains typically utilize sequence-specific contacts between residues from one of the HTH helices (predicted to be helix 3α for EL222) and groups located in the major groove from bases in these repeats12, making them particularly important for binding as we observed for EL222 (Figure 5). In the case of TraR, structural data predicted sequence-specific contacts between the protein and the tra box DNA that were subsequently tested by using synthetic tra boxes containing various base substitutions in in vitro binding assays30. Substitutions at positions near the center of each tra box half-site, and which make direct contacts with TraR28, 29, caused the most severe defects in binding affinity. Comparably, our studies showed that binding of EL222 to Clone-1 DNA was most affected by mutations in the GNC element found in the middle of each repeat (Figures 5 and S4; mut1–6). Taken together, these data support parallels between EL222 and other HTH proteins, which will be quantitatively examined in future biophysical and structural studies of lit-state EL222 in complex with its cognate DNA.
Regarding the predicted functional effects of blue light activation on EL222, we note that this is an example of a “one component” bacterial signaling protein11, directly linking an environmental sensor domain with a DNA binding effector domain instead of separating these into distinct “two-component” kinase/response regulator proteins. EL222 links the most common sensor and effector functions (small molecule sensing, DNA binding) observed among one component proteins11, leading to our proposal that EL222 is a light-dependent transcription factor. Here we provide several lines of data supporting this hypothesis: 1) the majority of the EL222 binding sites found by ChIP-Seq are in intergenic regions; 2) all but one of the ChIP-Seq binding sites are within 300 bp upstream of a predicted open reading frame (ORF) translational start (the two genes adjacent to the binding site in peak A, one of which is EL222 itself, have their 3′ ends pointing towards the binding site); 3) in vitro binding assays confirmed that EL222 bound to sites within the ChIP-Seq peaks with sub-μM affinity; 4) genes adjacent to ChIP-Seq peak B exhibited statistically significant upregulation upon illumination (Figure 6). Further studies are needed to determine the complete set of genes regulated by blue light illumination and to establish that EL222 is directly responsible for this control, particularly given that E. litoralis HTCC2594 contains two functional blue-light sensing histidine kinases that rely on comparable LOV photochemistry33. Strong circumstantial evidence in favor of a direct role for EL222 is provided by the fast decrease in transcript level after removal of light, which is consistent with the rapid rate of LOV dark state recovery (τ~30 s)10 but is much faster than expected for the slowly-cycling LOV-HK proteins in E. litoralis and other α-proteobacteria (τ >30 min)33, 34.
If EL222 indeed functions as an E. litoralis transcription factor, what purpose could it serve? We note the increasing appreciation for light-dependent changes in fundamental aspects of microbial metabolism, some of which are mediated by other LOV-containing proteins9. In this vein, several Erythrobacter strains are facultative phototrophs, providing numerous candidates for light-dependent regulation. However, the lack of key genes for photosynthesis (e.g. bacteriochlorophyll synthesis, photosynthetic reaction centers, CO2 fixation)20 and an obvious light-dependent growth phenotype (G.R.C. and K.H.G., unpublished results) for E. litoralis HTCC2594 strongly suggest that other biological responses are controlled by EL222. Some insight is provided by our ChIP-Seq analysis (Table S3); however, limited homology and domain information hampers an in-depth analysis of many genes, including the light-regulated ELI_05375 (Figure 6). Nevertheless, several broad themes can be observed, among them metabolism (including NAD biosynthesis), metabolite/ion transport and DNA repair.
Further examining the last of these three areas, we see several lines of evidence supporting a role for EL222 in controlling DNA repair. Notably, one of our light-induced genes (ELI_05380) encodes a protein that could be implicated in repair because it contains radical S-adenosylmethionine (SAM) and helix-hairpin-helix (HHH) domains. Using SAM and [4Fe-4S] cluster cofactors, radical SAM domains catalyze a large variety of biochemical reactions35, including the spore product (SP) lyase repair of a type of UV-induced thymine dimer35, 36. As shown in the recently-determined structure of Geobacillus thermodenitrificans SP lyase36, this enzyme contains both a radical SAM domain and a C-terminal β-hairpin, the latter of which is thought to be important for SP lesion recognition and insertion into the enzymatic active site. ELI_05380 resembles SP lyase in that it is predicted to have both a radical SAM domain and a C-terminal HHH domain, a conserved non-sequence specific DNA-binding domain37. This similarity in domain organization implies that ELI_05380 might also have a role in the repair of UV-induced DNA lesions such as SP. Further support for this assertion is provided by the presence of ELI_05380. This gene contains a uracil-DNA glycosylase domain, which initiates the removal of uracil from DNA38. Uracil in DNA often results from cytosine deamination, which is accelerated in UV-induced cytosine photodimers39 and can lead to C→T transitions if left unrepaired. The pairing of these two types of proteins is quite common, having been observed as adjacent genes in ~20% of bacterial genomes40, supporting a possible shared role in DNA repair.
At this time, EL222 homologs have been found in the genomes of eight α-proteobacteria. Intriguingly, in five of these strains (Novosphingobium aromaticivorans DSM12444, Sphingopyxis alaskensis RB2256, Sphingomonas sp. KC8, Sphingomonas elodea ATCC31461, Novosphingobium nitrogenifigens DSM19370) these homologs are located adjacent to genes encoding DNA photolyases and GTP cyclohydrolases, enzymes needed for synthesis of the pterin cofactors41. Another homolog with an inverted domain composition (HTH-LOV, instead of LOV-HTH) can be found in the ε-proteobacterium Sulfurimonas denitrificans DSM1251 (previously Thiomicrospira denitrificans) and is also adjacent to a GTP cyclohydrolase. The clustering of these genes among several strains suggests that LOV-HTH light-dependent regulation of DNA repair enzymes might be a conserved theme.
While the natural targets of EL222 regulation remain an open question, we note that the ability to control DNA binding with light provides tremendous opportunities for engineering EL222 to flexibly control transcription and other DNA-modifying activities with unparalleled spatial and temporal resolution. A particular advantage of EL222 is that it directly regulates DNA binding with light, rather than relying on light-dependent dimerization of separate DNA-binding and transactivation components42, 43 or on second messenger systems44, enabling its use as a single component in heterologous systems. Notably, preliminary data indicate that the DNA binding sites discovered here are sufficient to work in a variety of different eukaryotic environments (L.B.M.M. and K.H.G., unpublished results), enabling discoveries in areas far from their native marine bacterial environments.
Funding sources: We gratefully acknowledge funding from the NIH (R01 GM081875 to K.H.G.) and the Robert A. Welch Foundation (I-1424 to K.H.G.) in support of this research.
We thank Fernando Correa and Victor Ocasio for their comments on the manuscript. The pBlueSkript+ vector was a kind gift from Dr. Michael Dellinger (UT Southwestern). E. litoralis HTCC2594 genome sequence data was provided in advance of publication by Stephen Giovannoni’s laboratory (Oregon State University, Corvallis, OR) and The J. Craig Venter Institute with grant support from The Gordon and Betty Moore Foundation Microbial Genome Sequencing Project.
The Supporting Information includes five Supporting Figures showing EL222 binding to a short 24 bp probe from ChIP-Seq peak B (Figure S1), EL222 binding to fragments of ChIP-Seq peaks A and D (Figure S2), differential binding affinity of EL222 to different SELEX-derived sequences (Figure S3), mutational analyses of EL222 binding to SELEX clone 1 (Figure S4) and verification of EL222 binding to sequences within ChIP-Seq peaks E–K (Figure S5). Additionally, three Supporting Tables are provided, including statistics from high-throughput sequencing for ChIP-Seq (Table S1), a list of SELEX-derived EL222 binding sites (Table S2) and a list of E. litoralis genes near EL222 binding sites (Table S3). This material is available free of charge via the Internet at http://pubs.acs.org.