Nucleoid-associated proteins (NAPs) are global regulators of gene expression in Escherichia coli, which affect DNA conformation by bending, wrapping and bridging the DNA. Two of these, H-NS and Fis, bind to specific DNA sequences and structures. Because of their importance to global gene expression, the binding of these NAPs to the DNA was previously investigated on a genome-wide scale using ChIP-chip. However, variation in their binding profiles across the growth phase and the genome-scale nature of their impact on gene expression remain poorly understood. Here, we present a genome-scale investigation of H-NS and Fis binding to the E. coli chromosome using chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-seq). By performing our experiments at multiple time-points during growth in rich media, we show that the binding regions of the two proteins are mutually exclusive under our experimental conditions. H-NS binds to significantly longer tracts of DNA than Fis, consistent with the linear spread of H-NS binding from high- to surrounding lower-affinity sites; the length of binding regions is associated with the degree of transcriptional repression imposed by H-NS. For Fis, a majority of binding events do not lead to differential expression of the proximal gene; however, it has a significant indirect effect on gene expression partly through its effects on the expression of other transcription factors. We propose that direct transcriptional regulation by Fis is associated with the interaction of tandem arrays of Fis molecules with the DNA and possible DNA bending, particularly at operon-upstream regions. Our study serves as a proof-of-principle for the use of ChIP-seq for global DNA-binding proteins in bacteria, which should become significantly more economical and feasible with the development of multiplexing techniques.
Transcription in bacteria is controlled by a combination of DNA sequence, topology and a range of trans-acting factors (1). The best-studied trans-acting regulators are transcription factors (TFs) which modulate transcription by promoting or inhibiting the interaction of the DNA-dependent RNA-polymerase with promoter regions. Bacterial regulators are broadly classified into global and local (2) based primarily on the number of genes that they target for regulation: notable among the former are a subset of ‘nucleoid-associated proteins’ (NAPs) (3).
Many NAPs alter the topology of the bound DNA by bending, wrapping or bridging it (3,4). This has multiple effects on the bacterial cell, among which is transcriptional regulation. Analysis of 12 types of NAPs present in Escherichia coli showed that they differ in their expression during the growth phase (5), the degree of sequence specificity (6) and the capacity for post-translational modifications. Two of the best-studied NAPs, H-NS and Fis, display peak expression during exponential growth and bind to specific DNA sequences and structures. Neither protein is known to be post-translationally modified.
H-NS is a global repressor of transcription in enterobacteria. It acts as a ‘genome sentinel’ (7) by suppressing the transcription of horizontally-acquired genes, thus providing a fitness benefit to Salmonella grown under laboratory conditions (8). It is expressed throughout the growth phase, but shows maximal expression during exponential growth (5), though conflicting data—that H-NS expression is constant across the growth phase—have been presented for Salmonella (9). H-NS displays sequence-specific binding and simultaneously affects chromosome structure and transcription by forming DNA–H-NS–DNA bridges, so reinforcing plectonemically supercoiled structures (10,11).
Fis is a versatile protein which affects multiple processes including transcription, replication and recombination (12). In contrast to H-NS, the expression of Fis peaks during exponential phase, but decreases to undetectable levels in stationary phase (5); therefore, it is thought to be an important player in controlling the growth transition to stationary phase (13). Information currently available in the RegulonDB database (14) indicates that, as a TF, Fis can activate as well as repress gene expression. On binding, Fis introduces an interwound and branched structure in the DNA, where a branch is defined as ‘a separate DNA lobe containing at least one intrinsic crossover’; these structures may be associated with regions of high transcriptional activity (15). Fis influences the distribution of DNA topoisomers in a population of cells: for example, fis deletion leads to a decreased proportion of cells with low negative supercoiling in stationary phase (13), which might have an impact on stationary phase gene expression.
Analyses of general trends for transcriptional control by Fis and H-NS have generally been performed using compilations of data from small-scale experiments (16). Recently, the use of chromatin immunoprecipitation (ChIP) followed by DNA hybridization to genome-tiling microarrays has led to a systematic and relatively less biased identification of genomic loci physically associated with these proteins, primarily in mid-exponential phase of growth. The only study to have investigated both proteins simultaneously, using microarrays with probes tiled at 160-bp resolution, showed that there is significant overlap between the genes targeted by the two proteins (17). There are two contrasting models of how H-NS represses transcription: Lucchini and colleagues proposed that H-NS inhibits the initial RNA-polymerase-DNA interaction (8), whereas Grainger and co-workers and Oshima et al. demonstrated that the polymerase is trapped at the promoter (17,18). For Fis, the majority of bound genes were shown not to change in expression in a fis deletion strain (19), which is intriguing given that Fis is considered to be a global regulator of transcription.
Despite the above studies, we do not know whether and how the binding of these proteins to the DNA varies across the growth phase. This is particularly important since their expression levels are known to change substantially during growth. It has been previously suggested that H-NS might act both as a canonical TF and as a silencer of gene expression (20); however, the distinction between these two modes of H-NS function has not been described on a genomic scale. Finally, given prior observations of limited overlap between genes bound by Fis and those that change in expression in a deletion strain (19), the genome-scale impact of Fis–DNA interactions on gene expression remains poorly understood.
Here, we present an investigation of genomic loci bound by Fis and H-NS in E. coli K12 using ChIP followed by high-throughput sequencing, instead of microarray hybridization, of the immunoprecipitated DNA (ChIP-Seq) (21). Improvements in sequencing have revolutionized genomics by providing a platform for quantifying nucleic acid concentrations that affords higher dynamic range, higher resolution and lower false positive rates (22,23). These are now being used extensively to investigate protein–nucleic acid interactions in eukaryotes (21,24–27). In bacteria, however, their use has been largely limited to whole-genome sequencing and transcriptomic analysis (28–33), though transcriptome-level investigations have been extended using immunoprecipitation-based interrogation of protein–RNA interactions in Salmonella (34). Recently, a ChIP-Seq-based analysis of a Mycobacterium tuberculosis TF DosR, which binds to ~25 loci on the genome, was published (35). To our knowledge, ours is the first detailed genome-scale interrogation of protein–DNA interactions, for any global DNA binding protein in bacteria, using high-throughput sequencing. In addition to providing a proof-of-principle for the use of this new technology for bacteria, we perform our study at multiple time-points during growth in rich medium, thus generating new insights into how these proteins function under different cellular conditions. Further, by analysing our data in conjunction with gene expression and RNA-polymerase–DNA interaction data we provide new interpretation of the regulatory functions of these proteins.
The E. coli K-12 MG1655 bacterial strains used in this work are the following: E. coli MG1655 (F-lambda- ilvG- rfb-50 rph-1); MG1655 Δhns (Δhns::Kanr); MG1655 Δfis (Δfis::Kanr); MG1655 hns-FLAG (hns::3xFLAG::Kanr); MG1655 fis-FLAG (fis::3xFLAG::Kanr). Luria-Bertani (0.5% NaCl) broth and agar (15 g/liter) were used for routine growth. Where needed, ampicillin, kanamycin and chloramphenicol were used at final concentrations of 100, 30 and 30µg/ml, respectively.
Disruption of hns and fis genes in the E. coli chromosome was achieved by the λ Red recombination system, previously described by Datsenko and Wanner (36). Primers designed for this purpose are shown in Supplementary Data 19. Sets of additional external primers were used to verify the correct integration of the PCR fragment by homologous recombination (Supplementary Data 19).
The 3xFLAG epitope was added at the C terminus of the H-NS and Fis protein by a PCR-based method with plasmid pSUB11 as template (37). Primers used for introducing the 3xFLAG tag are shown in Supplementary Data 19. The tagged construct was then introduced onto the chromosome of E. coli MG1655 using the λ Red recombinase system (36). At each stage, DNA and strain constructions were confirmed by PCR and/or sequencing. This approach resulted in the introduction of a kanamycin resistance cassette in the chromosome downstream of the tagged gene. The cassette can be removed by FLP-mediated site-specific recombination (36), although this was not done for the experiments described here. In all cases, the complete functionality of the 3xFLAG-tagged version of the proteins was tested.
To prepare cells for RNA extraction, 100ml of fresh LB was inoculated 1:200 from an overnight culture in a 250ml flask and incubated with shaking at 180rpm in a New Brunswick C76 waterbath at 37°C. Two biological replicates were performed for each strain and samples were taken at early-exponential, mid-exponential, transition-to-stationary and stationary phase. The cells were pelleted by centrifugation (10000g, 10min, 4°C), washed in 1× PBS and pellets were snap-frozen and stored at −80°C until required. RNA was extracted using Trizol Reagent (Invitrogen) according to the manufacturer’s protocol until the chloroform extraction step. The aqueous phase was then loaded onto mirVana™ miRNA Isolation kit (Ambion Inc.) columns and washed according to the manufacturer’s protocol. Total RNA was eluted in 50µl of RNase-free water. The concentration was then determined using a Nanodrop ND-1000 machine (NanoDrop Technologies), and RNA quality was tested by visualization on agarose gels and by Agilent 2100 Bioanalyser (Agilent Technologies).
For the generation of fluorescence-labelled cDNA the FairPlay III Microarray Labelling Kit (Stratagene) was used. Briefly, 1µg of total RNA was annealed to random primers, and cDNA was synthesized in a reverse transcription reaction with an amino allyl modified dUTP in the presence of 1μg of Actinomycin D. The amino allyl labelled cDNA was then coupled to a Cy3 dye (GE Healthcare) containing a NHS-ester leaving group. The labelled cDNA was hybridized to the probe DNA on custom Agilent microarrays by incubating at 65°C for 16 h. The unhybridized labelled cDNA was removed and the hybridized labelled cDNA was visualized using an Agilent Microarray Scanner. Note that we performed a one-colour experiment on the Agilent array.
To validate the results of the microarray analysis, quantitative reverse-transcriptase PCR (qRT-PCR) was carried out using specific primers to the mRNA targets showing up- or down-regulation, and control targets not showing differential expression. RNA was extracted as described above from wild-type, Δhns and Δfis cells and 30ng total RNA was used with the Express One-Step SYBR GreenER kit (Invitrogen) according to the manufacturer’s guidelines, using a MJ Mini thermal cycler (Bio-Rad).
ChIP was performed as previously described (38) with some modifications to the protocol. Cells were grown aerobically at 37°C to the desired OD600 and formaldehyde was added to a final concentration of 1%. After 20min of incubation, glycine was added to a final concentration of 0.5 M to quench the reaction and incubated for a further 5min. Cross-linked cells were harvested by centrifugation and washed twice with ice-cold TBS (pH 7.5). Cells were resuspended in 1ml of lysis buffer [10 mM Tris (pH 8.0), 20% sucrose, 50mM NaCl, 10mM EDTA, 20mg/ml lysozyme and 0.1mg/ml RNase A] and incubated at 37°C for 30min. Following lysis, 3ml immunoprecipitation (IP) buffer [50 mM HEPES–KOH (pH 7.5), 150mM NaCl, 1mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% sodium dodecyl sulphate (SDS) and PMSF (final concentration 1mM)] was added and the lysate passed through a French pressure cell twice. Two-millilitre aliquots were removed and the DNA sheared to an average size of ~200bp using a Bioruptor (Diagenode) with 30 cycles of 30 s on/off at high setting. Insoluble cellular matter was removed by centrifugation for 10min at 4°C, and the supernatant was split into two 800μl aliquots. The remaining 400μl was kept to check the size of the DNA fragments.
Each 800μl aliquot was incubated with 20μl Protein A/G UltraLink Resin (Pierce) on a rotary shaker for 45min at room temperature to get rid of complexes binding to the resin non-specifically. The supernatant was then removed and incubated with either no antibody (mock-IP), FLAG mouse monoclonal antibody (Sigma-Aldrich) or RNAP β subunit mouse monoclonal (NeoClone) and 30μl Protein A/G UltraLink Resin, pre-incubated with 1mg/ml bovine serum albumin (BSA) in TBS, on a rotary shaker at 4°C overnight (FLAG antibody) or at room temperature for 90min (RNAP β subunit antibody). Samples were washed once with IP buffer, twice with IP buffer+500mM NaCl, once with wash buffer [10 mM Tris (pH 8.0), 250mM LiCl, 1mM EDTA, 0.5% Nonidet P-40 and 0.5% sodium deoxycholate] and once with TE (pH 7.5). Immunoprecipitated complexes were eluted in 100μl elution buffer [10 mM Tris (pH 7.5), 10mM EDTA and 1% SDS] at 65°C for 20min.
Immunoprecipitated samples and the sheared DNA following the Bioruptor were de-crosslinked in 0.5× elution buffer containing 0.8mg/ml Pronase at 42°C for 2h followed by 65°C for 6 h. DNA was purified using a PCR purification kit (QIAGEN). Prior to sequencing, the DNA fragment sizes were checked and gene-specific quantitative PCR (qPCR) was carried out.
To measure the enrichment of the Fis, H-NS or RNAP-binding targets in the immunoprecipitated DNA samples, real-time qPCR was performed using a MJ Mini thermal cycler (Bio-Rad). One microlitre of IP or mock-IP DNA was used with specific primers to the promoter regions (primer sequences are available upon request) and Quantitect SYBR Green (QIAGEN).
Before and after library construction, the concentration of the immunoprecipitated DNA samples was measured using the Qubit HS DNA kit (Invitrogen). Library construction and sequencing were done using the ChIP-Seq Sample Prep kit, Reagent Preparation kit and Cluster Station kit (Illumina). Samples were loaded at a concentration of 10 pM.
The E. coli K12 MG1655 genome was downloaded from the KEGG database (39). Annotations of gene coordinates were obtained from Ecocyc 11.5 database (40). Literature-derived transcriptional regulatory network, including known targets for Fis and H-NS, for E. coli K12 was obtained from RegulonDB 6.2 database (14). Targets of the global transcriptional regulator CRP were downloaded from RegulonDB 6.2 and augmented, where required, with additional targets identified by Grainger and colleagues (41). Genomic regions with atypical composition of higher-order oligonucleotides—and thus putatively corresponding to horizontally-acquired DNA—were identified using the Alien Hunter software (42). Lists of genes identified as bound by H-NS or Fis in previous high-resolution tiling microarray studies were downloaded from the respective publications (17–19). Protein occupancy domains of E. coli were downloaded from Vora et al. (43). Orthologs between Salmonella enterica Typhimurium LT2 and E. coli K12 MG1655 were obtained from the work of Moreno-Hagelsieb and Latimer (44).
Sequences obtained from the Illumina Genome Analyzer were mapped to both strands of the E. coli K12 MG1655 genome using BLAT allowing no gaps and up to two mismatches. Each alignment was extended on the 3′ end to 200bp, the approximate average length of DNA fragments. Only reads which mapped to a single region of the genome were considered for further analysis. For each base position on the genome, the number of reads that mapped to that position was calculated. The distribution of read counts thus obtained had a sharp peak at a low value followed by a heavy tail. Since this characteristic of the distribution is similar to that obtained for high-resolution gene expression tiling arrays, we used a procedure adopted earlier for tiling array analysis (45). Briefly, the background was modelled as a normal distribution with the following parameters: μ = mode of the entire distribution (as computed using the ‘shorth’ procedure in R); and σ = 1.483 × median absolute deviation of all values less than the mean of the entire distribution. This gives a better fit of the empirical distribution than a Poisson distribution of the same μ (Supplementary Data 1). The cutoff read count was defined as Z = μ + 3σ. Any consecutive stretch of DNA where each coordinate had a read count greater than or equal to the cutoff was flagged; pairs of adjacent regions so obtained were merged to give a single binding region if they were separated by <200bp. Then the number of reads mapped to each binding region, normalized by the total number of reads obtained for that sample, was compared to the corresponding value from the mock-IP using a binomial test, as described in the PeakSeq algorithm (46). Any region giving a Bonferroni-corrected P≤0.01 was defined as a bona fide protein binding region.
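The thresholding and merging steps described above can be sketched in Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the mode μ is passed in by the caller rather than computed with the ‘shorth’ procedure, the median absolute deviation is assumed to be taken about the median of the selected values, and the binomial comparison with the mock-IP is not covered.

```python
from statistics import median

def background_cutoff(counts, mu, n_sigma=3.0):
    """Return mu + n_sigma * sigma for the background normal distribution.

    mu is supplied by the caller (the text uses the 'shorth' mode estimate
    in R; any mode estimator is a stand-in here).
    """
    mean_all = sum(counts) / len(counts)
    low = [c for c in counts if c < mean_all]    # values below the overall mean
    centre = median(low)
    mad = median(abs(c - centre) for c in low)   # assumption: MAD about the median
    return mu + n_sigma * 1.483 * mad

def call_regions(counts, cutoff, max_gap=200):
    """Flag stretches with count >= cutoff, then merge those < max_gap bp apart."""
    regions, start = [], None
    for pos, c in enumerate(counts):
        if c >= cutoff and start is None:
            start = pos                          # a flagged stretch opens
        elif c < cutoff and start is not None:
            regions.append((start, pos - 1))     # the stretch closes
            start = None
    if start is not None:
        regions.append((start, len(counts) - 1))

    merged = []
    for s, e in regions:
        # gap = number of below-cutoff positions between two flagged stretches
        if merged and s - merged[-1][1] - 1 < max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

Coordinates here are 0-based and inclusive, and the genome is treated as linear; a region spanning the origin of a circular chromosome would need separate handling.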
We performed mock-IP only in the mid-exponential phase taking into consideration the following: (i) it has been suggested that a single control library can be used across multiple ChIP-Seq experiments given that these were performed in the same organism under similar fragmentation conditions (47); (ii) qPCR data for mock-IP experiments from our laboratory show minimal and inconsistent variation across time-points.
To compare our results with previous studies, we downloaded lists of genes identified as bound (either upstream or in the gene body) by H-NS and Fis from published tiling microarray studies (17–19). These genes were compared with those which overlap with binding regions identified in our study; the cut-off for defining an overlap was set at 100bp. We used the union of genes detected as bound by the protein of interest in early- and mid-exponential phases of growth to partly account for possible differences in the environmental/cellular conditions used in the compared studies.
To identify DNA sequence motifs for the binding of H-NS and Fis, we obtained the sequence of 101bp of DNA including 50bp on either side of the summit for each binding region. Here the summit for each binding region was defined as the base coordinate with the highest read count within that region. The sequences so obtained were scanned for motifs using the MEME software (48) with the following parameters: zero or one motif per sequence; motif width ranging from 6 to 24; searching both strands of the sequences; using a background distribution file containing the mono- and di-nucleotide frequencies of the E. coli chromosome. Then the complete sequence of each binding region was searched for the presence of these motifs using the MAST programme (48) with a P≤0.001 and using the same background mono- and di-nucleotide frequencies as above. Any definition of a motif in this work refers to those which were identified within the binding regions.
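The summit-centred window that feeds the motif search can be illustrated with a short Python sketch. The function name is hypothetical, and two details are assumptions not stated in the text: ties are resolved by taking the first position with the maximal read count, and windows are clipped at the ends of a linear coordinate system.

```python
def summit_window(counts, start, end, flank=50):
    """Return (summit, lo, hi) for a binding region given per-base read counts.

    start and end are 0-based inclusive region coordinates. The window spans
    flank bp on either side of the summit (101 bp when flank=50).
    """
    region = counts[start:end + 1]
    summit = start + region.index(max(region))   # first position with the top count
    lo = max(0, summit - flank)                  # clip at the left end
    hi = min(len(counts) - 1, summit + flank)    # clip at the right end
    return summit, lo, hi
```

The sequence between `lo` and `hi` would then be written out for motif discovery.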
We used the operon definitions available in RegulonDB 6.2 (14) to identify a set of 2567 lead genes, which are the first genes of each operon. An operon was flagged as being bound by the protein of interest if at least 50bp of the intergenic region upstream of the operon overlapped with a binding region. For long intergenic regions, only the first 400bp of the sequence immediately upstream of the operon were used.
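The rule for flagging an operon as bound can be sketched as follows. The function names are hypothetical, coordinates are 1-based and inclusive, and the sketch assumes that the 50-bp overlap must come from a single binding region rather than being summed over several regions, a point the text leaves open.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two inclusive intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def trim_upstream(upstream, strand, limit=400):
    """Keep only the limit bp of the intergenic region closest to the operon."""
    s, e = upstream
    if e - s + 1 <= limit:
        return upstream
    # On the + strand the operon starts just after the interval end;
    # on the - strand it lies just before the interval start.
    return (e - limit + 1, e) if strand == "+" else (s, s + limit - 1)

def operon_is_bound(upstream, strand, regions, min_overlap=50):
    """True if any binding region overlaps the trimmed upstream window enough."""
    window = trim_upstream(upstream, strand)
    return any(overlap_bp(window, r) >= min_overlap for r in regions)
```

Under this rule an operon whose upstream intergenic region is shorter than 50bp can never be flagged as bound.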
Reads obtained from the sequencing of RNA-polymerase-associated DNA were mapped to the genome and read counts obtained per base position as described above. For each gene, the median read count across all base positions corresponding to the gene body was defined as its occupancy. In addition, for each lead gene, the highest read count in the upstream region was calculated and used as a representation of transcription initiation. Data from each sample were normalized to the total number of reads obtained for that sample and then divided by the corresponding value from the mock-IP.
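As an illustration of the occupancy calculation, the following Python sketch computes the gene-body median for the IP and mock-IP samples, scales each by its library size and takes the ratio. The function name is hypothetical, and returning None when the mock-IP median is zero is an assumption; the text does not say how such cases were handled.

```python
from statistics import median

def gene_occupancy(ip_counts, mock_counts, ip_total, mock_total, start, end):
    """RNA-polymerase occupancy of a gene body, relative to the mock-IP.

    ip_counts and mock_counts are per-base read counts; ip_total and
    mock_total are the total reads per sample; start and end are 0-based
    inclusive gene coordinates.
    """
    ip = median(ip_counts[start:end + 1]) / ip_total        # library-size scaled
    mock = median(mock_counts[start:end + 1]) / mock_total
    if mock == 0:
        return None   # assumption: undefined ratio is reported as missing
    return ip / mock
```

Replacing `median` with `max` over the upstream interval would give the corresponding proxy for transcription initiation at lead genes.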
Gene expression analyses were performed on custom-designed isothermal Agilent microarrays containing 10 821 60-mer probes covering 4373 genes. In addition to these sense probes, the array contained 4172 anti-sense probes which were excluded from this analysis. These probes were designed using Array Oligo Selector (49).
Microarray data were processed in Bioconductor using standard procedures. Briefly, array data were background corrected using normexp (50). Biological replicates were first normalized using variance stabilization and normalization (VSN) (51). All arrays, across genetic backgrounds, from the same time-point were again normalized together using VSN. Differential expression in the deletion strains compared with the wild-type at the same time-point was called at a false discovery rate (FDR)-adjusted P-value of 0.01 using the LIMMA package (52); this was performed at the level of individual probes. A gene was called differentially expressed if at least one of the probes corresponding to it passed the above threshold. For direct comparison with operons that are bound by the protein of interest, we used only the list of lead genes that were differentially expressed. ‘Absolute’ expression level for each probe under a given genetic background and growth phase, where required, was defined as the average value across replicates; this shows a significant correlation with RNA-seq data obtained in our lab during exponential phase of growth (Spearman Rank correlation=0.73; Supplementary Data 19).
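The step from probe-level calls to gene-level calls can be sketched in Python. The function and argument names are hypothetical; the FDR-adjusted P-values are taken as given (for example, as exported from LIMMA), and the sketch assumes that a probe passes when its adjusted P-value is less than or equal to the threshold.

```python
def differentially_expressed_genes(probe_adj_p, probe_to_gene, alpha=0.01):
    """Return the set of genes with at least one probe passing the threshold.

    probe_adj_p maps probe id to FDR-adjusted P-value;
    probe_to_gene maps probe id to gene name.
    """
    genes = set()
    for probe, p in probe_adj_p.items():
        if p <= alpha:                        # assumption: threshold is inclusive
            genes.add(probe_to_gene[probe])   # one passing probe is sufficient
    return genes
```

Intersecting the returned set with the list of lead genes gives the genes used for comparison with bound operons.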
Fisher’s exact test was used for categorical data. Wilcoxon–Mann–Whitney tests were performed when comparing distributions. Since the sizes of the distributions were typically large, we used the t-test as well to ensure that the results of our comparisons were significant in both tests. In this article, we report P-values from the Wilcoxon test. Unless otherwise stated, a P-value cutoff of 0.01 was used to signal statistical significance. Correlation coefficients of read count ‘Z’ (see above) between two samples were computed at the base resolution, ignoring ‘background’ coordinates where the Z for both samples was <2. All these tests were carried out in R (www.r-project.org).
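The background-filtered correlation between two samples can be illustrated as below. This is a sketch under stated assumptions: the per-base ‘Z’ values are taken as precomputed inputs, the function names are hypothetical, and the Pearson coefficient is used because the text does not name the type of correlation coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def signal_correlation(z_a, z_b, floor=2.0):
    """Correlate two per-base signals, skipping positions where both are < floor."""
    kept = [(a, b) for a, b in zip(z_a, z_b) if not (a < floor and b < floor)]
    xs, ys = zip(*kept)
    return pearson(xs, ys)
```

A position is retained as soon as either sample reaches the floor, so sites bound in only one of the two samples still contribute to the coefficient.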
All microarray and ChIP-seq data have been submitted to ArrayExpress, and have been assigned the following accession numbers: ChIP-seq: E-MTAB-332; Microarray design: A-MEXP-1866; Microarray raw and normalized data: E-MEXP-2838; RNA-seq data: E-MTAB-387.
We investigated H-NS- and Fis-binding to the E. coli K12 chromosome during early-exponential, mid-exponential, transition-to-stationary and stationary phases of growth in LB medium by ChIP combined with high-throughput sequencing (21,23). As controls, we performed mock-IP experiments in mid-exponential phase in the absence of antibodies to identify non-specifically precipitated DNA.
For each sample, we obtained ~6–15-million reads of 36-nt length, amounting to 50–120-fold coverage of the E. coli genome (Table 1). We mapped these reads to both strands of the E. coli K12 MG1655 genome sequence and extended the mapping in the 3′-end to 200bp, which is the approximate average length of the DNA fragments obtained from the immunoprecipitation experiments. To identify bound loci, we calculated the number of reads that mapped to each base-pair in the genome (Figure 1). We expected to see a near-complete representation of the genome in our sequences, irrespective of where the proteins bind; therefore we derived an internal background distribution for each sample as described earlier for tiling microarray data (see ‘Materials and methods’ section, Supplementary Data 1) (45). The cutoff value for calling binding regions was fixed at three standard deviations above the mean of the background normal distribution (not more than 1% of values within the normal distribution are higher than this cutoff). Any stretch of DNA where each position mapped to more reads than the above-defined cutoff was called a binding region. Then, all binding regions separated by <200bp were merged; this was performed to counter possible under-sequencing of chromosomal regions of length equal to a single read (22). Finally, binding regions whose read counts did not differ significantly from the mock-IP sample were removed. Selected binding regions were verified using quantitative PCR (Supplementary Data 19).
First, we compared our dataset with previously published ChIP-chip data. Here we note that cross-study comparisons are not straightforward owing to differences in experimental conditions and platforms, analysis procedure and the manner in which data are presented. However where possible, we use published lists of bound genes and raw binding signals as points of comparison.
We compared our data for H-NS (combining early- and mid-exponential phase data) with that from a tiling microarray-based study by Oshima and co-workers (18). A large majority of genes (75%; Fisher’s exact test P<10⁻⁵⁰) flagged as bound by H-NS in the above study overlap with binding regions (by at least 100bp) we identify; just over a quarter of genes (27%) detected in our study are not identified by Oshima and colleagues.
We then compared ChIP-chip data for Fis by Cho and colleagues (19) with our data. Overall, binding regions that we identify display significantly higher fluorescence intensities in the above dataset than randomly picked regions (Supplementary Data 2). This is despite the fact that Cho et al. performed their experiments in M9 plus glucose whereas ours were carried out in LB without supplemented sugars. Over two-thirds (67%; Fisher’s exact test P<10⁻⁵⁰) of genes bound by Fis in the Cho dataset overlap with binding regions identified here. However, we detect a significantly larger number of bound genes, with binding either in the gene body or in upstream regions (1592 genes, compared with 894 genes in the Cho dataset). Even at a more stringent threshold for identifying binding regions, we identify more bound genes (1006 genes) than Cho et al. with a recovery of 53% (Fisher’s exact test P<10⁻⁵⁰).
We then compared our dataset with the lower resolution (160-bp resolution) ChIP-chip study of Grainger and colleagues (17). For H-NS, there is excellent agreement in binding signals between the two datasets. However, the overlap at the gene level is poor (39% of genes flagged as bound by H-NS by Grainger and colleagues are recovered here; Supplementary Data 2). We believe that the poor overlap at the gene level is a consequence of assumptions made in assigning binding regions to target genes; this could have been exacerbated by the lower resolution of the Grainger study (David Grainger, personal communication). Remarkably for Fis, there is no similarity between the two datasets either at the level of binding signals or bound genes (31% of bound genes in the Grainger dataset are recovered). This might be a consequence of differences in experimental conditions, which might affect Fis more than H-NS because of the former's link with catabolite repression (53,54); in fact, we observe a statistically significant overlap (Fisher’s exact test P=9.8×10⁻⁶) between operons bound in their upstream regions by Fis (but not H-NS) in our study and publicly-available targets of CRP (14,41), the global regulator of catabolite repression. However, we note that there is only a limited correlation in binding signals between the studies of Cho et al. and Grainger et al. despite the fact that both studies were performed in minimal media with a difference only in the identity of the carbon source used (glucose and fructose, respectively; Supplementary Data 2). It is possible that Fis binds to the E. coli genome extensively and that each study sampled only a subset of binding sites: this might be substantiated by the fact that the background signal is higher for Fis than H-NS (Figure 1; Stephen Busby, personal communication).
Finally, we compared the lists of genes identified as H-NS targets in S. enterica Typhimurium (8) with our data. A majority (58%) of genes bound by H-NS in Salmonella do not have orthologs in E. coli (Fisher’s exact test, P=5.2×10⁻³⁶). These genes are probably horizontally acquired, and are exemplified by the H-NS-mediated regulation of the Salmonella-specific pathogenicity islands such as Spi-1 and Spi-2. Similarly, ~46% of genes with H-NS binding in E. coli do not have orthologs in Salmonella. Therefore, the targets of H-NS are substantially different between E. coli and Salmonella. We note here that over 75% of the conserved H-NS targets in Salmonella are bound by H-NS in E. coli; this proportion is similar to the agreement between two independent studies of H-NS targets in E. coli (see above).
We focus on the DNA-binding profiles of H-NS and Fis in mid-exponential phase (Figure 1 and Table 1). H-NS binds to ~17% of the genome in terms of basepairs, whereas Fis binds to ~11%, distributed over 458 and 1464 discrete binding regions, respectively (Figure 2A). In contrast to observations of Grainger and colleagues (17)—made under substantially different growth conditions—we find little overlap between Fis and H-NS-binding regions (Figure 2A). In fact, across the genome, there is a significant negative correlation between the binding signals for H-NS and Fis.
H-NS binds to longer tracts of DNA than Fis (averages of 1686 and 355bp for H-NS and Fis, respectively; Wilcoxon test, P<10⁻⁵⁰; Figure 3A and B; Supplementary Data 3). The observed length distribution for H-NS is in line with the results of a recent study in Salmonella (55). This is consistent with the ability of H-NS to form long oligomers, extending from high affinity nucleation sites to flanking lower affinity sites (10,56).
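As a rough consistency check, the genome fractions quoted above for mid-exponential phase (~17% for H-NS and ~11% for Fis) follow from the number of binding regions and their average lengths. The short Python calculation below assumes a genome length of 4 639 675bp for E. coli K12 MG1655, a value that is not stated in this text, and treats the quoted averages as arithmetic means.

```python
GENOME_BP = 4_639_675   # assumption: MG1655 chromosome length, not given in the text

def fraction_bound(n_regions, mean_length_bp, genome_bp=GENOME_BP):
    """Fraction of the genome covered by n_regions of the given mean length."""
    return n_regions * mean_length_bp / genome_bp

hns_fraction = fraction_bound(458, 1686)    # H-NS: 458 regions, mean 1686 bp
fis_fraction = fraction_bound(1464, 355)    # Fis: 1464 regions, mean 355 bp
print(f"H-NS: {hns_fraction:.1%}, Fis: {fis_fraction:.1%}")
```

Both values agree with the reported coverage to within rounding.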
The H-NS binding motif (57), defined by enriched oligonucleotide sequences within bound regions, is 5–6nt in length and comprises only A/T nucleotides (Figure 2B; Supplementary Data 4). This motif is present in 96% of all binding regions at an average of 19.9 occurrences per region. In agreement with published results (19), the ~15-nt Fis motif consists of an A/T tract flanked by highly conserved G/C on either side (Figure 2B). This motif is present in 91% of binding regions at an average of 2.3 occurrences per region. Note that we differentiate between binding regions and motifs: whereas binding regions are empirically identified by our experiments, motifs represent the computationally identified sequences that fall within our binding regions.
On average, 18 and 17% of all basepairs covered by H-NS and Fis binding regions—corresponding to 24 and 21%, respectively, of binding motifs—fall within intergenic regions upstream of predicted operons (Figure 2C; Supplementary Data 5). Given that ~8% of the E. coli genome comprises operon-upstream intergenic regions, Fis and H-NS display a preference for binding upstream of operons. Most of the other motifs fall within the body of operons (67 and 74% for H-NS and Fis, respectively). In agreement with previous reports (8,18,58), there is significant enrichment of H-NS (but not Fis) binding across horizontally-acquired regions (Figure 2D).
Finally, we define 597 (23%) and 649 (25%) operons as bound in a regulatory capacity by H-NS and Fis, respectively, based on binding in upstream regulatory sequences (applying a limit of 400bp). The rest of our discussion is based on the above operons only and not those with protein binding only in the gene body (as included in our comparison with previous studies). Operons targeted by H-NS are enriched for gene functions associated with fimbrial biogenesis (Fisher’s exact test, P=5.1×10−3), which expands previous work linking H-NS to the regulation of biofilm formation and motility (59). As expected from prior molecular studies (60), operons bound by Fis show an enrichment for genes involved in translation (Fisher’s exact test, P=1.6×10−3). In agreement with the significant overlap of Fis bound genes with CRP targets, carbohydrate metabolism and transport also shows a significant enrichment among Fis targets (Fisher’s exact test, P=4.7×10−3).
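One way to picture the assignment of binding regions to operons is sketched below. It assumes a fixed, strand-aware 400bp window immediately upstream of each operon and ignores neighbouring genes, which is a simplification; the `Operon` record, the function names and the coordinates are hypothetical and serve only as an illustration of the 400bp limit.

```python
from collections import namedtuple

# Hypothetical operon record: 1-based inclusive coordinates with start < end.
Operon = namedtuple("Operon", "name start end strand")

UPSTREAM_LIMIT = 400  # bp, the limit stated in the text


def upstream_window(operon, limit=UPSTREAM_LIMIT):
    """Return the (start, end) window lying upstream of the operon on its own strand."""
    if operon.strand == "+":
        return (max(1, operon.start - limit), operon.start - 1)
    return (operon.end + 1, operon.end + limit)


def is_bound_upstream(operon, regions, limit=UPSTREAM_LIMIT):
    """True if any binding region overlaps the operon's upstream window."""
    win_start, win_end = upstream_window(operon, limit)
    return any(r_start <= win_end and r_end >= win_start for r_start, r_end in regions)


# Toy example (hypothetical coordinates).
toy_operons = [Operon("opA", 1000, 2000, "+"), Operon("opB", 3000, 4000, "-")]
toy_binding = [(700, 900), (4500, 4700)]
bound_names = [op.name for op in toy_operons if is_bound_upstream(op, toy_binding)]
```

Here opA is called as bound because a region falls within the 400bp preceding its start, whereas the region nearest opB lies beyond the window on the reverse strand.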
We had noted earlier that genomic regions bound by H-NS tend to be longer than those bound by Fis (Figure 3). In order to investigate systematically the association between the length of H-NS binding regions and genomic features recognized, we classified H-NS binding regions into those that are longer than 1000bp (‘LH-NS’ regions; n=300 in mid-exponential phase) and those that are shorter (‘SH-NS’ regions; n=158). We observe that a significantly higher proportion of motifs within SH-NS (37%) than LH-NS (22%) regions fall in operon-upstream regions (Fisher’s exact test, P=1.2×10−21; Supplementary Data 5). This might be expected given the differences in their lengths and the fact that operon-upstream regions have high A/T content (61). Unexpectedly however, the proportion of operon-upstream SH-NS motifs is significantly higher than that for Fis motifs as well (Fisher’s exact test, P=1.3×10−20).
We also observe that horizontally-acquired genes are significantly enriched in the LH-NS group (Supplementary Data 6); this is in accordance with the fact that predicted horizontally-acquired genes are located in long regions of typically higher A/T content than the genomic average (Supplementary Data 7).
Therefore, short H-NS binding regions tend to behave in a manner typical of canonical TFs, where the protein binds upstream of the gene whose expression it regulates. On the other hand, longer binding regions wrap large segments of the chromosome, encompassing both genes and intergenic regions.
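The split into LH-NS and SH-NS regions amounts to a single length threshold, sketched below under the assumption that regions are 1-based inclusive (start, end) pairs. The function name and the toy coordinates are hypothetical.

```python
LENGTH_THRESHOLD = 1000  # bp; regions longer than this are treated as LH-NS


def classify_regions(regions, threshold=LENGTH_THRESHOLD):
    """Split (start, end) regions into long (> threshold) and short (<= threshold)."""
    long_regions, short_regions = [], []
    for start, end in regions:
        length = end - start + 1  # 1-based inclusive coordinates
        if length > threshold:
            long_regions.append((start, end))
        else:
            short_regions.append((start, end))
    return long_regions, short_regions


# Toy example (hypothetical coordinates): lengths are 1000, 1501 and 201 bp.
toy_hns_regions = [(1, 1000), (5000, 6500), (9000, 9200)]
toy_long, toy_short = classify_regions(toy_hns_regions)
```

A region of exactly 1000bp is placed in the short class, following the wording 'longer than 1000bp'.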
It was previously demonstrated that Fis–DNA complexes adopt variable structures depending on the A/T content of the DNA surrounding the core binding motif (62). The variability in these complexes is manifested by the degree to which the bound DNA is bent, with greater bending in regions of higher A/T content. To investigate this in our data, we defined binding regions in the top quarter of the distribution of A/T contents (101bp around the summit) as likely to be bent by Fis. The association between the A/T content of the binding region and gene expression is described later.
As intergenic regions tend to have higher A/T contents and intrinsic bending, a greater proportion of motifs falling within high-A/T binding regions are in operon-upstream regions (40 versus 11%; Wilcoxon test, P<10−50; Supplementary Data 5). Furthermore, high A/T Fis-binding regions show significantly greater binding signal than other regions (Wilcoxon test, P=3.5×10−9; Supplementary Data 8); this might reflect the fact that Fis-DNA complexes involving DNA bending dissociate slower than others (62).
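The selection of high-A/T Fis-binding regions can be sketched as follows, assuming a 0-based summit position within a genome sequence string and a window of 50bp on either side of the summit (101bp in total). The function names are hypothetical, and taking the top quarter by simple ranking is an assumption about how the quartile cut-off was applied.

```python
def at_content(sequence, summit, half_window=50):
    """A/T fraction in the window of 2 * half_window + 1 bp centred on the summit."""
    start = max(0, summit - half_window)
    end = min(len(sequence), summit + half_window + 1)
    window = sequence[start:end].upper()
    return sum(base in "AT" for base in window) / len(window)


def top_quarter(values_by_region):
    """Names of the regions whose values rank in the top quarter of the distribution."""
    ranked = sorted(values_by_region, key=values_by_region.get, reverse=True)
    return set(ranked[: max(1, len(ranked) // 4)])


# Toy example (hypothetical sequence and region names).
toy_sequence = "G" * 50 + "A" * 101 + "C" * 50
central_at = at_content(toy_sequence, summit=100)  # window is entirely A
edge_at = at_content(toy_sequence, summit=0)       # window truncated at the sequence start
toy_at = {"r1": 0.70, "r2": 0.55, "r3": 0.48, "r4": 0.62}
likely_bent = top_quarter(toy_at)
```

Windows that run past the end of the sequence are truncated rather than wrapped, which ignores the circularity of the chromosome and would need adjusting for summits near the coordinate origin.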
Next, we investigated the variation in H-NS and Fis-binding profiles from the early-exponential to the stationary phases of growth (Figure 4).
For H-NS, we detected similar binding at all four stages of growth (Table 1 and Figure 4). Though previous experiments showed a peak for H-NS protein expression during exponential growth followed by an ~2–2.5-fold decrease during later stages (5), our western blot experiment showed constant H-NS levels across our experimental conditions (Supplementary Data 9), in agreement with previous results in Salmonella (9). For Fis, we identified comparable numbers of binding regions in both early- and mid-exponential phases (Table 1 and Figure 4). In agreement with earlier studies, our western blots (Supplementary Data 9) show that Fis levels fall below the detection limit after exponential growth (5).
Though the binding profiles are significantly correlated between time-points (Figure 4A), there are specific differences (Supplementary Data 10). For H-NS, the numbers of binding regions and of genes targeted for binding increase as the cells progress from exponential to stationary phase; this includes both stationary phase-specific binding regions and extension of mid-exponential phase binding regions. For Fis, we observe greater variability in binding between early- and mid-exponential phases than for H-NS (ρ=0.85 for Fis compared with 0.95 for H-NS), with more binding in mid-exponential phase. However, we advocate caution in interpreting these results, as they may represent marginal quantitative differences resulting from the thresholds used to call binding events and may therefore have limited biological relevance.
Finally, as mentioned in the section above, there is a negative correlation between H-NS and Fis at each time-point (Figure 4A), suggesting that the binding regions of the two proteins tend to be mutually exclusive.
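A correlation between two binding profiles can be computed along the following lines. This is a minimal sketch that assumes the signals have been summarised as one value per genomic bin and that a rank (Spearman) correlation is intended, which the text above does not state explicitly; the function names and the toy signal values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy per-bin signals (hypothetical): bins high in one protein are low in the other.
toy_hns_signal = [9.0, 7.5, 0.4, 0.2, 0.1]
toy_fis_signal = [0.3, 0.2, 4.0, 5.5, 6.0]
toy_rho = spearman_rho(toy_hns_signal, toy_fis_signal)
```

A negative value, as in this toy case, is what would be expected when the two proteins tend to occupy different bins.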
Having examined the pattern of DNA binding by H-NS and Fis, we investigated whether genes bound by H-NS and Fis showed distinct patterns of gene expression in wild-type E. coli cells during mid-exponential phase. Using one-colour experiments on Agilent oligonucleotide microarrays, we found that absolute gene expression levels (which correlate with expression measures derived from RNAseq data) were: (i) lower for genes bound by H-NS than those that are not; and (ii) higher for genes bound by Fis (Figure 5A; Supplementary Data 11). We make consistent observations in experiments in which we measure genome-wide RNA-polymerase occupancy during mid-exponential phase using ChIP-seq (Figure 5B). The former observation is in line with the accepted role of H-NS as a global repressor of gene expression (63). The latter, linking Fis binding to higher expression levels, may be consistent with the hypothesis that a branched DNA topology, which is induced by Fis binding, is a chromatin state that is associated with transcriptional activity (15).
We compared our data with a public dataset from Vora and colleagues describing general protein occupancy across the E. coli genome (Supplementary Data 12) (43). These authors classified domains of high occupancy into those with high gene expression (hePOD; highly expressed protein occupancy domains) and those that are transcriptionally silent (tsPOD). As expected, we find a strong enrichment for H-NS-bound genes within tsPODs (Fisher’s exact test, P <10−50). In contrast to observations by the above authors (made using computational searches of Fis-binding motifs), we find that Fis-bound genes are under-represented within tsPODs (Fisher’s exact test, P=9.0×10−5).
Though these results show that there is an association between protein binding and the transcriptional state of the corresponding genes, they do not establish causality. In order to test this in vivo, we measured gene expression levels for Δhns and Δfis strains of E. coli K12 MG1655, and verified selected results using RT-PCR. In agreement with our observations of expression levels in wild-type strains, more genes are up- than down-regulated in Δhns when compared with the wild-type (971 are up-regulated; 335 are down-regulated in mid-exponential phase; Supplementary Data 13), whereas the contrary is true for Δfis (338 are down-regulated; 160 are up-regulated). In order to investigate whether these effects are proximal on the chromosome to the binding regions of Fis and H-NS, we compared our ChIP-seq-based binding profiles with the genes that are differentially expressed in the mutant strains when compared with the wild-type.
A significant proportion of genes that are bound by H-NS display differential up-regulation of gene expression in Δhns during mid-exponential phase: 65% of H-NS–bound genes are differentially expressed compared with only 19% of genes not bound by H-NS (Figure 5C; Supplementary Data 14). Similarly, the RNA-polymerase occupancy in the body of genes bound by H-NS increases significantly in Δhns, again demonstrating increased transcription in the mutant strain (Figure 5D).
Previous genome-scale studies had reached conflicting conclusions on the manner in which H-NS represses transcription. ChIP-chip data for S. enterica Typhimurium H-NS by Lucchini and colleagues showed that RNA-polymerase is excluded from H-NS bound regions (8). On the other hand, the work of Grainger et al. and Oshima et al. showed that the polymerase was bound to 50–65% of H-NS bound sites though the associated genes were transcriptionally inactive (17,18); as a result they proposed that H-NS-mediated repression might generally involve trapping the polymerase at the promoter. We find a distinct increase in the RNA-polymerase occupancy upstream of operons bound by H-NS in Δhns when compared with the wild-type (Supplementary Data 15), which is concomitant with a corresponding increase in the enzyme’s occupancy in the gene body; this suggests that our data support the conclusions of Lucchini et al. However, it must be mentioned here that RNA-polymerase trapping by H-NS, though not prevalent in our data, has been experimentally demonstrated at certain promoters (64,65). The differences between the studies, especially with that by Grainger and colleagues, must be interpreted in light of the substantially different numbers of bound genes identified.
In order to extend our analysis further, we performed our DNA microarray-based analysis of gene expression change under all four conditions of growth (Supplementary Data 13 and 14). H-NS has a statistically significant direct impact on gene expression across all phases of growth. However, compared with mid-exponential phase, a much smaller proportion of genes bound by H-NS are differentially regulated in Δhns in stationary phase (65% of H-NS bound genes are flagged as differentially expressed in mid-exponential compared with only 26% in stationary phase). This could partly be a consequence of the relatively poor quality of RNA that could be collected from the stationary phase cells, which would lead to the assignment of weaker statistical significance to differential regulation; the total number of genes called as differentially expressed in stationary phase is far lower than that in mid-exponential phase (1313 differentially expressed genes in mid-exponential compared with only 400 in stationary phase). Alternatively, there could be a biological basis to this, in which any possible gene expression increase in Δhns is suppressed by other stationary phase-specific factors.
Having described the effect of H-NS binding on gene expression, we now examine the effect of different types of binding (LH-NS/long and SH-NS/short) described earlier. Both LH-NS and SH-NS operons show a significant tendency to be differentially expressed in Δhns (Figure 6C and D; Supplementary Data 14); however, LH-NS operons tend to display more differential expression than SH-NS indicating a greater degree of repression. Further, in the wild-type, LH-NS genes show lower expression levels than SH-NS genes (Figure 6A and B, Supplementary Data 11).
To test further whether LH-NS and SH-NS genes represent distinct modes of transcriptional repression, as indicated by the above results, we compared their occurrence within tsPODs which represent transcriptionally silent loci (43). We find that LH-NS genes are enriched within tsPODs, whereas SH-NS genes are not (Fisher’s exact test, P=4.7×10−13 comparing L and S genes; Supplementary Data 12).
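Enrichment comparisons of this kind rest on a 2×2 contingency table. The sketch below implements a two-sided Fisher's exact test using only the standard library; the choice of the two-sided variant is an assumption, and because the text reports only the resulting P-values, the example uses a small textbook table rather than our counts. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Textbook table [[3, 1], [1, 3]]; in our setting the rows would be, for example,
# LH-NS and SH-NS genes and the columns 'within tsPOD' and 'outside tsPOD'.
example_p = fisher_exact_two_sided(3, 1, 1, 3)
```

For this table the tables at least as extreme as the observed one account for 34 of the 70 possible arrangements.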
Together, these suggest that global regulation of transcription by H-NS may encompass: (i) transcriptional modulation, typically mild repression, of SH-NS genes and (ii) ‘total’ transcriptional ‘silencing’ of LH-NS genes, including putative horizontally-acquired genes (20). The former, given the propensity of the corresponding binding regions to lie within operon-upstream regions, might act like a canonical TF; transcriptional silencing on the other hand involves extensive wrapping of large tracts of the chromosome. Based on the overall distribution of the lengths of H-NS-binding regions, we suggest that the predominant role of H-NS is transcriptional silencing.
Though the role of H-NS as a transcriptional repressor is well-established, the impact of Fis on gene expression on a genomic scale remains unclear. Given that genes bound by Fis, on average, have higher expression levels in wild-type E. coli, one might reasonably expect these genes to be down-regulated in Δfis. Activation of transcription of individual operons, those of stable RNA in particular, by Fis is well-characterized (60,66,67); an inspection of regulatory targets for Fis in the RegulonDB database suggests that it activates more genes than it represses. However, it must be emphasised that activation of gene expression does not fully explain the regulatory roles of Fis as it is a key repressor of several non-essential genes during exponential growth (68–70).
In our study, the large number of genes differentially expressed in Δfis account for only a small proportion of Fis-bound genes (Figure 5C; Supplementary Data 14). We also make a consistent observation in the sequencing-based RNA-polymerase occupancy data for mid-exponential phase (Figure 5D), thus indicating that the above is not an artefact of the array technology. Our results are in agreement with a previously published ChIP-chip study of Fis, which showed differential expression for only about a quarter of bound genes (19). Curiously, despite the general agreement in Fis-binding regions between early- and mid-exponential phases of growth, there is little overlap between the sets of genes that are differentially expressed in Δfis between the two time points; similar observations were made earlier for Fis in E. coli (71) and IHF in Salmonella typhimurium (72).
These data indicate that deletion of fis is not sufficient to cause expression change in most genes that are bound by this protein; this might be because Fis only has a weak role as a TF in these genes, or because these effects are compensated for by other cis- and trans-acting players which we do not study here.
We then investigated whether binding regions associated with the relatively fewer differentially-expressed genes in Δfis show any distinctive property. These, when compared with binding regions associated with genes not differentially expressed in Δfis, (i) tend to be longer (Figure 7A; Wilcoxon test, P=2.5×10−10 for mid-exponential phase) and consequently contain more Fis binding motifs (Figure 7B); (ii) have higher A/T content (Figure 7F; Wilcoxon test, P<10−50). Following from the latter (see ‘Variable structures of Fis–DNA complexes’ section), these binding regions also tend to have higher binding signals (Figure 7C and D; Wilcoxon test, P=7.0×10−8), and contain a greater proportion of operon-upstream motifs (Figure 7E; Wilcoxon test, P=3.0×10−28).
These results indicate that change in expression of a gene bound by Fis might require Fis-binding in multiple tandem copies, possibly nucleated by high-affinity sites at operon-upstream regions. The higher A/T content of binding motifs associated with proximal differential expression suggests that, in accordance with observations made on a molecular scale, DNA-bending by Fis might be required for gene expression control (62). These are exemplified by the tyrT promoter which is regulated by three Fis dimers binding and bending the DNA (66,67). However, these features are not predictive of differential expression (Supplementary Data 16), indicating that definitive determinants of gene expression control by Fis are still lacking.
A large number of genes are down-regulated in both Δhns and Δfis, a large majority of which are not bound by the NAPs concerned; therefore these effects are likely to be indirect. Genes that are down-regulated in the two deletion strains tend to have higher expression levels than other genes in the wild-type strain (Figure 8). Thus, despite the dissimilarities in the binding of H-NS and Fis, an important minority of their influence on gene expression—especially of highly expressed genes—is shared. This might be a consequence of the impact the two proteins have on the topology of the chromosome—its supercoiled state in particular (73)—which, despite in vitro studies on plasmids and phage DNA, is only beginning to be characterized on a genome-wide scale and at a high resolution (74,75).
Given that genes that are down-regulated in the deletion strains tend to have high expression levels, we sought to mine our data to speculate on how the free RNA-polymerase molecules thus generated are redistributed in the mutants. A significantly higher proportion of genes up-regulated in Δhns than in Δfis have RNA-polymerase occupancies that place them within the top 10% of highly expressed genes (12% of up-regulated genes in Δhns, 3% in Δfis, P=2.6×10−4). Thus, both deletions lead to a fall in the expression of highly expressed genes; however, the manner in which the free RNA-polymerase molecules are redistributed may differ between the two. In Δfis, these are probably distributed across genes with relatively low expression levels; on the other hand, in Δhns this is compensated for by a subset of genes whose repression is relieved by the lack of H-NS (51 of 80 up-regulated operons in the top 10% of genes with the highest RNA-polymerase occupancy in Δhns are bound by H-NS).
An observable phenotype of Δhns is loss of motility. These genes are not directly regulated by H-NS, making them targets for studying non-proximal effects of H-NS on gene expression. Though the expression of the transcription factor FlhDC—the master regulator of flagellar gene expression—has been reported to be directly regulated by H-NS (76), we do not find evidence for the same in any of the conditions tested. Instead, we find that 17 of the 26 operons coding for cyclic-di-GMP-metabolising GGDEF/EAL domain-containing proteins, which regulate the switch between motility and adhesion, are bound by H-NS in at least one of the four conditions; 22 of the 29 such genes are differentially expressed in Δhns (Supplementary Data 17). It has already been shown that two GGDEF/EAL proteins that inversely control adhesion through regulating curli biogenesis are regulated by H-NS (59); indeed, we observe binding and regulation of csgD—a transcriptional regulator of curli biogenesis—by H-NS under all conditions. Our genome-scale study indicates that H-NS is a global regulator that is positioned at the apex of the c-di-GMP regulatory network controlling motility and adhesion.
Finally, a large majority of genes bound by Fis show little change in gene expression in Δfis. However, the Δfis mutation leads to a global change in gene expression during the exponential phases of growth, with over 950 genes differentially expressed in early- or mid-exponential phases of growth. Clearly, most of these gene expression changes are caused by indirect effects. These effects might be mediated by the impact of Fis on the overall chromosome topology. A second, more tractable, effect might be through cascades of transcription factors. To investigate this, we used the transcriptional regulatory network comprising 3254 interactions between 163 TFs and 1450 target genes available in RegulonDB. We find that 37 TFs, including the prolific global regulator CRP, are differentially expressed in Δfis in early- or mid-exponential phases of growth. Of the 851 annotated targets of these TFs, 316 (37%) are differentially expressed in Δfis; this represents a significant enrichment over other genes, of which only ~20% are differentially expressed (Fisher’s exact test, P=5.9×10−13). Of the 37 TFs differentially expressed, only 12 are bound directly by Fis. The regulatory cascade effect described holds even if we restrict our analysis to the targets of these 12 TFs (199 of 541 targets are differentially expressed; 37%). Of the remaining 25 TFs, 10 are known direct targets of the Fis-bound TFs. Therefore, the expression change of 22 of the 37 TFs can be explained by direct Fis binding or by regulation by Fis-bound TFs.
In summary, a significant proportion of genes are differentially expressed in Δfis probably because of the cascading effects of multiple transcription factors.
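The cascade analysis can be sketched as a few set operations, assuming the regulatory network is held as a mapping from each TF to the set of its annotated targets. The function name, the data structure and the toy network are hypothetical; only the counts in the final lines are taken from the text.

```python
def cascade_summary(network, de_genes, fis_bound):
    """Summarise how far differentially expressed (DE) TFs account for DE genes.

    network: dict mapping each TF to the set of its annotated target genes.
    de_genes: set of genes differentially expressed in the deletion strain.
    fis_bound: set of genes bound directly by Fis.
    """
    de_tfs = {tf for tf in network if tf in de_genes}
    targets = set().union(*(network[tf] for tf in de_tfs))
    de_targets = targets & de_genes
    directly_bound = de_tfs & fis_bound
    # DE TFs not bound by Fis themselves but regulated by a Fis-bound DE TF.
    second_tier = {
        tf for tf in de_tfs - directly_bound
        if any(tf in network[bound_tf] for bound_tf in directly_bound)
    }
    return {
        "de_tfs": len(de_tfs),
        "targets": len(targets),
        "de_targets": len(de_targets),
        "explained_tfs": len(directly_bound | second_tier),
    }


# Toy network (hypothetical gene names).
toy_network = {
    "tfA": {"tfB", "g1", "g2"},
    "tfB": {"g3", "g4"},
    "tfC": {"g5"},
    "tfD": {"g6"},
}
toy_summary = cascade_summary(
    toy_network,
    de_genes={"tfA", "tfB", "tfC", "g1", "g3", "g5"},
    fis_bound={"tfA", "g2"},
)

# Arithmetic check on the counts quoted in the text.
percent_all_targets = round(100 * 316 / 851)    # targets of all 37 DE TFs
percent_bound_targets = round(100 * 199 / 541)  # targets of the 12 Fis-bound TFs
explained_in_text = 12 + 10                     # Fis-bound TFs plus their direct TF targets
```

In the toy network, tfA is bound by Fis and regulates tfB, so two of the three differentially expressed TFs are accounted for, mirroring the 22 of 37 reported above.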
We have investigated the genome-wide binding of two NAPs, H-NS and Fis, to the E. coli K12 MG1655 chromosome using ChIP followed by sequencing of resulting DNA. Though this technique has been extensively adopted in eukaryotic genomics, to our knowledge ours is the first ChIP-Seq experiment for any global bacterial DNA-binding protein. We interpret our data using a combination of deletion strains, microarray-based measurements of gene expression and parallel-sequencing of RNA-polymerase-bound DNA fragments.
The binding of both NAPs has been studied on a genome-wide scale using microarrays. Grainger and colleagues studied the binding of H-NS, Fis (and IHF) in mid-exponential phase and expanded the list of genes known to be bound by these proteins (17). In particular, they reported the presence of extensive overlap between the promoters bound by Fis and H-NS, which we do not observe in our conditions. Our observation of a negative correlation between the ChIP-Seq signals for Fis and H-NS is further manifested by the following observations: (i) H-NS binding is enriched in putative horizontally-acquired regions, whereas Fis binding is not; (ii) H-NS targets are enriched in transcriptionally silent protein-occupied DNA domains, whereas Fis-bound genes are under-represented. This difference in observation between our study and that of Grainger and colleagues (17) is caused by the discrepancy between the two in identifying Fis binding regions. Though these differences are surprising, they may be explained in various ways. First, Fis might bind, with a range of affinities, to most of the E. coli genome; this may be observed in the higher background in our Fis experiment (Figure 1). Therefore, each study may be sampling a distinct set of bound loci. Second, the experimental conditions are vastly different: our experiments were carried out in rich LB medium without sugar supplements, whereas Grainger et al. performed theirs in M9 minimal medium plus fructose. This could have led to substantial differences in Fis binding profiles due to its reported association with catabolite repression and competition with the global transcription factor CRP (53,54). Analysis performed here shows statistically significant overlap between Fis-bound genes and known CRP targets. This suggested association between Fis and CRP targets might be indicative of cooperative or competitive interactions; however given these data, this cannot be substantiated at present. 
Taken together, there might be substantial differences in Fis function between rich and minimal medium, and in the presence and absence of catabolite repression-inducing sugars. In addition to the above, the following factors might have had relatively minor effects on the results. We used antibodies against the FLAG epitope which had been tagged to the protein of interest, whereas Grainger and colleagues used antibodies raised against the native proteins. The use of the same antibody against three different proteins makes the data from each protein more comparable by eliminating the effect of differential affinities that different antibodies might have towards their target proteins. Though the use of a tag might alter the function of the target protein, microarray analysis of gene expression in the tagged strains shows that these effects are insubstantial (Supplementary Data 18). Finally, the lengths of the fragments used for sequencing (~200bp) and microarray hybridization (500–1000bp), and therefore the achievable resolution, are generally different (38).
Lucchini and co-workers used a similar low-resolution array to investigate the binding of H-NS to the genome of S. enterica Typhimurium (8). The important conclusion of this study, which was independently demonstrated in the same organism by Navarre and colleagues (58), was the silencing of horizontally-acquired genes by H-NS. They showed that H-NS-binding regions in general exclude RNA-polymerase. Oshima and colleagues identified binding regions of H-NS in E. coli using high-resolution microarrays and again showed its effect on horizontally-acquired genes (18). In contrast to the conclusions of Lucchini et al. and in agreement with those of Grainger and co-workers (17), these authors identified binding of RNA-polymerase to operon-upstream H-NS-binding regions, though the proximal genes are transcriptionally silent. Our data and analyses do not support this, possibly because of differences between the studies in the sampled binding sites, but are in agreement with those of Lucchini et al. Both Lucchini et al. and Navarre et al. have also demonstrated that uncontrolled expression of H-NS-silenced genes can lead to fitness defects (8,58). However, under the conditions used in our study, the wild-type and Δhns strains have similar growth rates. The difference between our observations might be due to the nature of the genes which are regulated by H-NS in the two organisms. This is reflected in our observation that a majority of H-NS targets in Salmonella are not conserved in E. coli (see section ‘Comparison with previously published high-throughput datasets’), in line with the tendency of H-NS to silence horizontally-acquired genes.
The above studies were performed only during mid-exponential phase of growth, though Grainger and co-workers extended theirs to a medium supporting lower growth rates (17). A more recent genome-wide interrogation of H-NS-genome interactions by Noom and colleagues was interpreted, albeit tenuously, in the context of the formation of looped domain boundaries in the E. coli and S. typhimurium chromosomes (77). These authors performed their study in stationary phase cells, in addition to mid-exponential cells: in agreement with the documented 2-fold decrease in H-NS levels in stationary phase, the authors found that the spacing between adjacent H-NS binding patches doubles in stationary phase. In contrast, we find no evidence for decreased H-NS expression or binding in stationary phase, in agreement with observations made earlier for H-NS in Salmonella (9).
Cho and colleagues used high-density genome-tiling microarrays to interrogate the binding of Fis to the E. coli genome during mid-exponential growth under aerobic and anaerobic conditions, again in minimal medium (19). They showed that there is little difference in binding profiles between aerobic and anaerobic conditions, a comparison we do not perform. On the other hand, unlike our study they did not investigate multiple time-points during a growth phase. Similar to our conclusions, these authors found little association between Fis binding and differential expression in Δfis. This extends the observations made for another global transcriptional regulator CRP in E. coli (41), a large majority of whose binding sites are likely to have little effect on transcription. This led the authors to propose that the primary role of CRP is to structure the chromosome in an as yet uncharacterized manner; its role as a global transcription factor might be an incidental development. A similar interpretation may be valid for Fis as well.
Despite substantial overlap between our study and earlier investigations, we extend our interpretation by analysing the association between the nature of binding patches and their influence on gene expression. We show that H-NS binds to significantly longer patches of the chromosome than Fis, in both early- and mid-exponential phases. We speculate that these long binding tracts might include both arms of the plectonemic supercoils and the apical loops that H-NS introduces on the bound DNA (3,10); however, we note that this does not rule out the possibility that, instead of bridging DNA, H-NS might stiffen the bound DNA at certain sites (78). These long regions of H-NS binding enable transcriptional silencing (they display greater differential expression in Δhns and are enriched within protein occupancy domains associated with transcriptionally silent loci), whereas shorter patches act as gentler modulators of gene expression. Short H-NS binding regions display a greater preference towards binding to operon-upstream regions than both long H-NS- and Fis-binding regions. This tendency of short H-NS-binding regions to behave more like canonical transcription factors than Fis-binding regions do might explain the relatively greater proximal effect of short H-NS binding patches on gene expression when compared with Fis (40% of genes targeted by short H-NS binding regions are differentially expressed in mid-exponential phase, whereas only 15% of Fis targets are; Fisher’s exact test, P=1.6×10−8).
As mentioned above, both our study and that by Cho et al. find that a large majority of strong Fis-binding events are inconsequential from the transcriptional perspective; however, we additionally suggest that the interaction of tandem arrays of Fis molecules with the DNA and possible DNA bending, particularly at operon-upstream regions, might be necessary, though not sufficient, for affecting transcription. Further, we notice that signals in our ChIP-Seq experiments for Fis are weaker than those for H-NS (Figure 1). This observation must be interpreted with caution since the efficiency of immunoprecipitation may depend on the clustering of multiple target proteins on the same chromosomal loci. Additionally, this might also be due to a higher background for Fis, resulting from weak or sporadic binding events across the genome. If this difference is indeed because Fis–DNA interactions in general are weaker and/or more dynamic than H-NS–DNA contacts, it might be responsible for the relatively weaker association between Fis binding and proximal gene expression change.
In contrast to previous studies, we also perform an analysis of the origins of non-proximal effects of the binding of Fis and H-NS to the chromosome. We show a general decrease in the expression of highly expressed transcripts in both the deletion strains, and speculate on the manner in which the RNA-polymerase is redistributed in these mutants: whereas foci of high transcriptional activity may be lost in Δfis, these are replaced by H-NS-bound genes in Δhns.
The main roles of NAPs, particularly in relation to gene expression control, are still under active investigation. Though our study contributes to this field, it leaves several questions, including the following, unanswered. (i) What is the predominant function of Fis-chromosome interactions? (ii) What are the implications, if any, of our observation that, on a genome-wide scale, there is a higher background signal for Fis than H-NS? (iii) What factors definitively link Fis binding to proximal gene expression change?
Finally, we also provide a proof-of-principle study for the use of massively parallel high-throughput sequencing for the analysis of protein–DNA interactions on a genomic scale in bacteria. This is a state-of-the-art technology which affords significantly higher resolution and dynamic range than microarray-based studies. However, there is substantial room for improvement. For example, modifications to the ChIP protocol, which minimize experimental artifacts—including capture of large molecular weight complexes—were proposed very recently (79). Second, from the sequencing perspective, multiplexing techniques are under active development (80). Since 10–15-fold coverage of the genome (compared with ~150-fold obtained in our study) should enable good recovery of binding regions for most bacterial proteins, multiplexing should make ChIP-Seq more economical and therefore prevalent in the field.
Array Express E-MTAB-332, Array Express A-MEXP-1866, Array Express E-MEXP-2838, European Nucleotide Archive ERP000280, Array Express E-MTAB-387.
Cambridge Commonwealth Trust; St. John’s College, University of Cambridge; Girton College, University of Cambridge (to A.S.N.S.); Spanish Ministry of Science and Innovation (to A.I.P); Biotechnology and Biological Sciences Research Council (BBSRC) grant ‘Genomic Analysis of Regulatory Networks for Bacterial Differentiation and Multicellular Behaviour’ (to G.M.F. and N.M.L.); Isaac Newton Trust (to G.M.F); European Molecular Biology Laboratory (EMBL) (to N.M.L.). Funding for open access charge: European Molecular Biology Laboratory.
Conflict of interest statement. None declared.
Supplementary Data are available at NAR Online.
The authors thank David Grainger, Sacha Lucchini and Nadia Abed for their advice on ChIP protocols. We thank Prof. Stephen Busby and Dr David Grainger for helpful discussion.