3.1. TCTCGCGAGA is the primary regulatory element of the HNRNPK promoter
Although computational approaches permit identification of gene regulatory and coding sequences, definitive characterization of promoters and their cis-regulatory motifs requires experimental investigation.
20 To identify which bioinformatically identified potential
HNRNPK promoter(s) preferentially activates transcription, a series of promoter region fragments fused to a reporter gene were generated, and these constructs were transiently transfected into HeLa cells. Reporter plasmids contained four fragments of an
HNRNPK promoter region (Fig. A). Sequences were identical to the human reference sequence (GenBank, NC_000009.10), with the exception of two single nucleotide polymorphisms: rs7859578 in promoter fragment 1, and rs796004 in promoter fragment 3. The highest luciferase activity resulted from the reporter with
HNRNPK promoter fragment 1. Fragment 2 produced <20% of the luciferase activity obtained with reporter fragment 1, while the luciferase activities generated by constructs containing Fragments 3 and 4 were marginal (Fig. B).
Thus, promoter region 1 can be assumed to be the major
HNRNPK promoter. It contains the palindromic motif TCTCGCGAGA, which has been recently grouped with the most-conserved motifs in a genome-wide human–mouse assessment of six to eight nucleotide segments.
12 Palindromic motifs constitute an important group of regulatory elements. To further study the significance of this particular motif, we designed a series of Promoter 1 reporter constructs with mutated sequences at one, two, and three conserved nucleotide positions. Transcription assays revealed marked luciferase activity decreases in cells transfected with constructs containing the mutant motifs. A single point mutation in the palindromic motif resulted in a >80% loss of promoter activity, whereas triple mutations reduced the activity to near basal levels, demonstrating that this single regulatory site is essential for the majority of
HNRNPK promoter activity (Fig. C). Similar results were demonstrated by Guo
et al.,
21 who investigated the role of this motif in
FBN1 gene activity; the activity of a reporter gene decreased significantly when two middle nucleotides in the palindrome were replaced with two non-canonical nucleotides.
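The term "palindromic" is used here for a sequence that is identical to its own reverse complement, the property that would allow a single strand to fold back on itself. A minimal Python sketch of that check is given below. The function names and the triple-mutant sequence are illustrative assumptions; they are not the mutant sequences used in the reporter constructs.

```python
# Illustrative check that a DNA motif is a reverse-complement palindrome.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq: str) -> bool:
    """True when the sequence reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


WILD_TYPE = "TCTCGCGAGA"
# Hypothetical triple mutant, for illustration only (not from the study).
HYPOTHETICAL_MUTANT = "TCTAGAGCGA"

if __name__ == "__main__":
    print(is_palindromic(WILD_TYPE))
    print(is_palindromic(HYPOTHETICAL_MUTANT))
    print(hamming(WILD_TYPE, HYPOTHETICAL_MUTANT))
```

In this sketch the wild-type motif passes the check, whereas substitutions at three positions break the reverse-complement symmetry.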
3.2. In vitro searches for the palindromic motif's interacting protein(s)
Regulatory elements that are concentrated in promoters (cis-elements) are recognized by trans-acting proteins that control transcriptional efficiency. To analyse the interactions between nuclear proteins and the palindromic motif in vitro, we performed EMSA and MS-based analyses.
To characterize pertinent binding interactions, we used NEs, ds 16 and 30 bp DNA probes, and ss 16 and 30 nucleotide (nt) oligonucleotide probes containing either the wild-type palindromic motif or a similar motif mutated at three nucleotide positions (Table ). As shown in Fig. , a protein complex bound to 16 bp (Fig. A) and 30 bp (Fig. B) ds probes, containing either the wild-type or mutated motif, caused a similar DNA mobility shift in the form of a single band (Fig. , left panels). The wild-type ss oligonucleotide probes, on the other hand, formed two main protein–DNA complexes represented by the typically shifted band and a slowly migrating band (Fig. , right panels). When the wild-type 16 nt probe was replaced with a mutated oligonucleotide, the binding activity of the protein complex was substantially abolished (Fig. C). In contrast, replacing the wild-type 30 nt sequence with a mutated probe did not change the intensity of the upper band, and significantly increased the intensity of the lower band (Fig. D).
With the exception of the ss 16 nt probes, the other probes predominantly captured proteins in a sequence-non-specific manner; most of the signal originating from the shifted band was supershifted by antibodies directed against ATP-dependent DNA helicase 2 subunit 2 (Ku80), one of the proteins forming DNA double-strand break repair complexes.
22 Immunodepletion of Ku80 from the NEs had no effect on the supershift pattern, although the overall binding signal was weaker (Fig. ). On the other hand, the EMSA with a 16 nt ss probe produced quite a different picture of affinity to nuclear factors when compared with the activity of the other DNA probes. The supershift signal introduced by the anti-Ku80 antibody was much weaker in this assay, and competitor binding was diminished. This result suggests that the palindromic sequence in a single-stranded conformation may represent a functional cis-element. Since a palindromic motif may form a hairpin structure only when single-stranded, it is plausible that this state preferentially favours certain nuclear factors, or facilitates changes in local chromatin structure that drive transcription from promoters bearing that sequence.
To gain a broader view of the protein repertoire that might exist in complexes with the DNA probes, MS-based analyses were applied. We used NEs isolated from resting and proliferating cells in binding reactions with four 5′-biotinylated ss and ds probes, of two lengths, containing the palindromic motif sequence TCTCGCGAGA. Paired binding reactions were performed with and without a 10-fold excess of a competitor with similar characteristics (ss or ds, shorter or longer, non-biotinylated probe with a mutated motif). In addition, binding reactions were repeated using NEs that were depleted by immunoprecipitation with anti-Ku80 antibodies. Altogether, quantitative MS analyses of proteins bound to the DNA probes were carried out on 32 individual samples.
Database searches were performed on 135 592 MS/MS spectra, resulting in the identification of 15 150 PSMs with
q-values ≤ 0.001 (corresponding to 2268 unique peptides). In total, 522 proteins were identified. Some proteins were represented by multiple peptides in all analytical MS runs, whereas other proteins appeared in individual runs as singular peptides; 297 proteins were identified by at least two peptides. Two proteins were detected by the MS/MS analysis only in the depleted samples, while 33 proteins were found only in the samples using crude NEs. Two hundred and sixty-two proteins were common to both experimental conditions. For statistical analysis we selected a subset of 132 proteins identified by at least three peptides that were annotated with the GO term nucleus (GO: 0005634), after exclusion of ribosomal proteins and histones (
Supplementary Table S2).
Normalized relative protein abundances were first transformed to corresponding principal-component scores. In the resulting plot (Fig. ), the first component divided the analysed samples into two distinct groups by discriminating between the binding of ss and ds sequences, whereas the second component discriminated based on probe length.
Proteins may interact with a DNA sequence directly and/or indirectly, in both sequence-specific and -non-specific manners. Assuming that a competitor competes out non-specific DNA–protein interactions, we constructed a protein abundance ratios matrix. The elements of the matrix were equal to the ratios of the protein abundances measured in the paired binding reactions conducted with and without DNA competitor molecules. We assumed that lack of statistically significant changes in the protein abundance, or its enrichment within protein complexes existing in the presence of a competitor, may indicate that proteins are bound to DNA probes in a sequence-specific manner.
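The ratio matrix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-dictionary layout, the function names, and the `min_ratio` cutoff are hypothetical. In particular, the fixed cutoff stands in for the statistical test of significance, which is not reproduced here.

```python
# Sketch of the abundance-ratio matrix: outer keys are proteins, inner keys are probes.
# Both inputs are assumed to share the same protein and probe keys.


def ratio_matrix(with_comp, without_comp):
    """Ratio of abundance with competitor to abundance without, per protein and probe."""
    ratios = {}
    for protein, per_probe in without_comp.items():
        ratios[protein] = {}
        for probe, baseline in per_probe.items():
            if baseline <= 0:
                continue  # ratio is undefined without a baseline signal
            ratios[protein][probe] = with_comp[protein][probe] / baseline
    return ratios


def retained(ratios, probe, min_ratio=0.8):
    """Proteins whose ratio for one probe stays at or above min_ratio.

    min_ratio is a hypothetical cutoff used in place of a statistical test.
    """
    return sorted(p for p, r in ratios.items() if r.get(probe, 0.0) >= min_ratio)


if __name__ == "__main__":
    # Invented abundances for two placeholder proteins on one placeholder probe.
    without_comp = {"protA": {"ss16": 10.0}, "protB": {"ss16": 10.0}}
    with_comp = {"protA": {"ss16": 9.0}, "protB": {"ss16": 2.0}}
    print(retained(ratio_matrix(with_comp, without_comp), "ss16"))
```

A protein whose ratio stays near or above 1 is not displaced by the competitor and is therefore a candidate sequence-specific binder under the assumption stated in the text.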
As already demonstrated by PCA, the relative abundances of DNA probe-associated proteins, and the protein compositions of DNA-bound complexes, were mostly dependent on probe sequence. Whereas the relative levels of 75 proteins, extracted from the nuclei of both resting and proliferating cells and associated with ss 16 nt and 30 nt probes, were unchanged or enriched in the binding reactions conducted in the presence of a competitor, the same was true for only 21 proteins associated with ds DNA probes (
Supplementary Table S3). Depletion of NEs with anti-Ku80 antibodies further reduced the numbers of proteins whose interaction with the wild-type motif was unaffected by competitors to 40 and 8, respectively (
Supplementary Table S4). Only three proteins (P46013, Antigen KI-67; Q09666, Neuroblast differentiation-associated protein AHNAK; Q5SSJ5, Heterochromatin protein 1-binding protein 3) were common to all experimental settings.
Therefore, analysis by EMSA and MS of the in vitro formation of DNA–protein complexes revealed compositional complexity of proteins interacting with the DNA probes, but did not select trans-acting protein(s) that unambiguously directly recognize and bind the palindromic motif TCTCGCGAGA.
EMSA is a well-established method of monitoring the ability of a protein to bind to a DNA sequence
in vitro. Despite the fact that it has been successfully employed to characterize transcription factors and the transcriptional regulation of various genes,
23 it also responds to non-specific protein binding, as in the case of the abundant DNA repair proteins. In particular, a heterodimer of Ku80 and Ku70 binds with high affinity to DNA ends, attracting the other components of the DNA double-strand break repair machinery.
22 We have encountered this inherent technical flaw during our EMSA of oligonucleotides containing the palindromic motif. Recently, Hu
et al.
24 used a protein microarray-based strategy to systematically characterize the repertoire of proteins bound to 752 predicted DNA motifs from previously published studies; the palindrome TCTCGCGAGA was characterized among them. Using their data deposited in the Human Protein-DNA Interactome database,
25 we obtained a list of 21 proteins with reported
in vitro affinity to this motif: LIG1, MTCP1, HOXB9, TRIP6, PHOX2A, TFAP2E, RBPMS2, NEIL2, MYOD1, ETV7, CREB5, RBM19, TCF7, C1orf25, LIG3, ACTL6B, MAFB, ZNF498, SORCS3, POU6F1, IRF3, and PARP3. We also generated a list of the 10 best matching proteins (RPS4X, PLG, H1FX, HES5, TFEB, USF2, GRHPR, TP73, ZMAT2, and LARP4) when the database was queried with the TCTCGCGAGA motif. Additionally, the hDREF protein, which was shown to regulate the expression of ribosomal protein genes, is another protein with reported
in vitro affinity to this motif.
26 Surprisingly, none of these 32 proteins were found within the group of 132 proteins taken through quantitative statistical analysis of our MS data. This discrepancy again underlines the difficulties inherent in
in vitro methods, where slightly different conditions (salt, detergent, protein, and DNA probe concentrations) may significantly affect the experimental readout and the ultimate data interpretation.
3.3. Cellular proliferation state increases non-specific DNA-binding
We discovered that the composition of protein complexes interacting with the DNA probes relates to the proliferation status of the cells used for NE preparation. Using an FDR threshold of 0.05, our analysis identified 13 proteins that bind to ds 16 and 30 bp DNA probes with significantly higher affinities in proliferating cells than in quiescent cells (Table ). All these proteins are components of the DNA repair process according to the GO annotation analysis (GO: 0006281). The fold change values of their relative abundances were significantly (P < 0.02; Mann–Whitney test) decreased by a competitor and by anti-Ku80 antibody depletion of NEs used in the binding reactions.
Table 2. Relative protein abundances in proliferating vs. resting cells, bound to ds DNA probes
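The text gives an FDR threshold of 0.05 but does not state which FDR-controlling procedure was applied. As one widely used possibility, the Benjamini-Hochberg step-up procedure can be sketched as follows. The choice of procedure, the function name, and the example P-values are assumptions made for illustration only.

```python
# Benjamini-Hochberg step-up procedure (assumed here; the source does not name its method).


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans in input order: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose sorted P-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below max_rank.
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Invented P-values, one per hypothetical protein.
    print(benjamini_hochberg([0.001, 0.03, 0.035, 0.5]))
```

Because the procedure is step-up, a P-value that misses its own rank threshold can still be rejected when a larger P-value further down the sorted list meets its threshold.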
To further characterize the changes in the DNA-binding affinities of non-specific binders in response to mitogenic signalling, quiescent cells were treated with 15% FBS for 0, 1, 6, and 24 h, after which NEs were prepared and used in EMSA. The amount of shifted ds probe again increased with the duration of serum induction of the cells used for NE preparation. Results of a representative experiment with ds 16 bp DNA probes are presented in Fig. A. These results indicate that either expression of the non-specifically interacting proteins in proliferating cells is increased, or that there is an increased interaction with non-specific DNA-binding proteins, or both. To differentiate between these possibilities, we analysed equal amounts of proteins from NEs by western blotting, using anti-Ku80, anti-PARP1, and anti-hnRNP K antibodies. This analysis revealed equal amounts of Ku80 and hnRNP K in extracts from both resting and proliferating cells and a moderately increased PARP1 signal after 6 h of serum stimulation (Fig. B).
3.4. Histone modification at promoters with the palindromic motif TCTCGCGAGA
Chromatin is composed of nucleosomes, in which ~147 bp of DNA are wrapped 1.7 times around a core of two copies each of histone proteins H2A, H2B, H3, and H4.
In vivo positioning and covalent modifications of nucleosomes play an important role in transcriptional regulation. Post-translational covalent modifications of histone proteins include methylation, acetylation, phosphorylation, and ubiquitination. Such processes recruit and bind chromatin remodelling complexes which alter the configuration of the chromatin, and in turn, control gene transcription.
27 There is a high degree of complexity in the number of enzymes, histone modifications, and variant histones involved in chromatin regulation.
28 The spatial and temporal intricacies of the interactions of these factors with the genome suggest that chromatin regulation is at least as important to the transcriptional state of a gene as the DNA cis-elements and trans-factors present in its promoter. Thus, any complete study of the factors regulating gene expression must include the chromatin state at that gene's locus.
The promoters of expressed genes are characterized by decreased nucleosome occupancy (particularly a nucleosome-free region around the transcription start site), acetylation of H3 and H4 histones (H3Ac and H4Ac), trimethylation of histone H3 lysine 4 (H3K4me3), and the presence of histone destabilizing chromatin remodelling complexes.
29 The overall effect of these modifications is a general decreased presence and stability of nucleosomes, and as a result, increased accessibility to transcription factors and the RNA polymerase II (RNAPII) complex. On the other hand, silenced genes are characterized by a different set of histone modifications and, in general, by denser chromatin architecture. Histone H3 lysine 9 and lysine 27 trimethylation levels (H3K9me3 and H3K27me3) are highest at silent genes.
29
To characterize the HNRNPK promoter in greater detail, we compared it to other promoters bearing the palindromic motif TCTCGCGAGA (HNRNPH, heterogeneous ribonucleoprotein H; RPS7, ribosomal protein S7), well-established constitutively active housekeeping genes (HGs) (TUBB6, tubulin beta-6; GAPDH, glyceraldehyde-3-phosphate dehydrogenase), an immediate early gene (EGR1, early growth response 1), a heat shock-induced gene (HSP70, heat shock 70 kDa protein), and a silent gene (HBB, beta globin). We measured histone modification and RNAPII complex levels during serum stimulation in a ChIP assay (Fig. ).
The RNAPII complex levels measured on
HNRNPK,
HNRNPH, and
RPS7 were similar to those at the HGs (~10% of input), and remained relatively unchanged during serum stimulation. As expected, RNAPII levels at the
HBB promoter were lowest, and levels at the
EGR1 promoter after 1 h of serum treatment were 2-fold higher than in quiescent cells, followed by a return to basal levels at 6 and 24 h. We also observed progressive accumulation of RNAPII at the
HSP70 promoter during the serum time course, from 15.8% of input in quiescent cells to 24.1% after 24 h in serum (fold change = 1.52; Fig. ,
Supplementary Table S5).
We subsequently analysed levels of acetylation of histone H3 lysine 9/18 (H3K9/18Ac), trimethylation of H3 lysine 4 (H3K4me3), and trimethylation of H3 lysine 27 (H3K27me3) at the various promoters. Acetylation of lysines eliminates their positive charge which, when it occurs on certain histone tails, has been shown to have a negative effect on the higher order structure of chromatin, essentially making it more open. H3K9/18Ac is associated with actively transcribed genes.
27 As expected, H3Ac levels were mirrored by the presence of RNAPII complexes, and remained similar during the serum stimulation on both palindromic and HG promoters. On the other hand, H3Ac levels measured at the
EGR1 and
HSP70 promoters were low relative to RNAPII amounts. The RNAPII/H3Ac ratios for
EGR1 and
HSP70 were 67:1 and 47:1, respectively, whereas the same measurement for the
HBB promoter was 69:1. On the other hand, the ratios for both palindromic and HG promoters ranged from 11:1 for
HNRNPH to 23:1 for
RPS7 (Fig. ,
Supplementary Table S5). H3K4me3 is associated with the 5′ ends of actively transcribed genes; H3K4me3 has been suggested to mark an open chromatin state by recruiting ATP-dependent chromatin remodelling complexes.
27 H3K27me3, however, marks silenced genes,
29 and appears to be dominant over activating marks; genes that are marked with both H3K27me3 and H3K4me3 are typically silent. As expected, H3K4me3 was detected at the promoters of all analysed genes, including
HBB. Also as expected, the H3K27me3 levels for all promoters were lower in comparison to levels at the transcriptionally silent
HBB. In sum, these results indicate that RNAPII levels and histone modification levels at the promoters containing the palindromic motif TCTCGCGAGA are similar to those of constitutively active HG promoters.
We also wished to determine whether any of the proteins belonging to the DNA double-strand break repair complexes, as determined by EMSA and MS (Table ), were present at these promoters. Using ChIP, we measured the abundances of bound Ku80 and PARP1 at these promoters. PARP1 is responsible for the synthesis and attachment of polymers of ADP-ribose from nicotinamide adenine dinucleotide (NAD+) to the glutamic or aspartic acid residues of target proteins.
30 PARP1 binds to the
HSP70 promoter
in vivo,
31 and therefore we used this gene as a ChIP positive control for PARP1 binding. Although Ku80 did not bind any of the studied promoters (data not shown), unexpectedly, our ChIP experiment demonstrated PARP1 binding not only to the
HSP70 promoter, but to all promoters tested, with the highest level of bound PARP1 protein at the
HBB promoter (Fig. A and B). Additionally, we tested an intergenic region and two sites within the
rhodopsin (
RHO) gene, where we found similar levels of PARP1 as measured at the
HBB promoter (Fig. B), along with the expected depletion of RNAPII abundance (data not shown). Overall, the ChIP assays revealed ubiquitous binding of PARP1 to various promoters, with significantly higher abundance at heterochromatin sites (Fig. C).
A large body of evidence has accumulated which unambiguously supports the involvement of double-strand break repair complex proteins in transcriptional regulation.
32–34 Ku80, Ku70, DNA-PK (DNA-dependent protein kinase), and PARP1 were among the most abundant proteins bound to DNA probes, as quantified by our MS assay (Table ).
PARP1, a ubiquitous and abundant nuclear protein, functions as part of the DNA repair pathway, but also orchestrates transcription through interactions with transcription factors, nucleosomes, histone-modifying proteins, and specific binding to promoters, enhancers, and insulators.
30 PARP1 was found to display an affinity not only to ds DNA ends, but also to DNA secondary structures like hairpins.
35,36 In particular, the
PARP1 promoter contains several imperfect repeats that are capable of forming a hairpin structure recognized by PARP1 itself, resulting in self-mediated transcriptional inhibition.
36 Since the TCTCGCGAGA motif present in the
HNRNPK promoter may form a hairpin structure
in vivo, we speculate that such a spatial scaffold may attract PARP1. This possibility could explain the loss of signal in EMSA employing the mutated 16 nt ss probe, as the correct hairpin structure would be recognized by PARP1 both
in vivo and
in vitro. PARP1 was described as a structural component of heterochromatin that modulates transcription by self-modification in an NAD-dependent manner.
37 Furthermore, it was shown
in vitro that PARP1 is able to promote the compaction of nucleosomes into higher order structures when NAD
+ is not present; on the contrary, saturating amounts of NAD
+ caused decondensation.
38 Overall, PARP1 acts as either an activator or a repressor of specific genes, depending on the physiological conditions,
30 and we cannot rule out the possibility of its direct involvement in the regulation of promoters with the TCTCGCGAGA motif.
3.5. Chromatin accessibility at promoters with the palindromic motif TCTCGCGAGA
During gene activation, nucleosomes are displaced and/or repositioned in a process involving histone modification, ATP-dependent nucleosome remodelling complexes, histone chaperones, and the shuffling of histone variants.
28 Thus, characterizing global nucleosome positions and their chemical and compositional modifications is key to unravelling the mechanism of transcriptional regulation.
The advent of massively parallel DNA sequencing technology has allowed the localization of every nucleosome across the genomes with a high degree of accuracy.
39 We made use of publicly available databases to gain additional insight into chromatin structures surrounding promoters bearing the TCTCGCGAGA palindromic motif. We explored datasets provided by the ENCODE consortium within the Open Chromatin project, which, among other datasets, offers DNase I hypersensitive site (HS) tracks for multiple cell lines.
19 DNase I HS sites contain a mixture of cis-regulatory elements, and it has been shown that there is a striking co-occurrence of HS with regions of well-positioned nucleosomes.
40
First, we examined the DNase I HS profile within the HNRNPK promoter (Fig. A), averaging the HS profile for four cell lines. This analysis revealed decreased values in the HS profile occurring at the palindromic motif, as well as an oscillating pattern of hypersensitive DNA that may represent a combination of trans-elements and positioned nucleosome occupancies within this region. Next, we extracted a HeLa dataset for 50 genes containing the palindromic motif TCTCGCGAGA, centred the dataset on that sequence, and superimposed all HS profiles averaged over a 150 bp window (Fig. B). This analysis revealed that TCTCGCGAGA may define the boundaries of open chromatin, as it is flanked by two distinct HS peaks corresponding to positioned nucleosomes. On the basis of this observation, we believe that the palindrome itself may demarcate nucleosomal positions within promoters bearing it, exhibiting features that prevent nucleosomal occupancy of this motif.
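The motif-centred aggregation described above can be sketched as follows. This is a simplified illustration, not the pipeline used for the figure: the function names are hypothetical, the per-base signal is assumed to be available as a plain list aligned to the promoter sequence, and the 150 bp window is interpreted here as a flank of 75 bp on each side of the motif centre, which is an assumption.

```python
# Sketch: centre per-base DNase I HS signal on the motif and average across promoters.
MOTIF = "TCTCGCGAGA"


def motif_centres(sequence, motif=MOTIF):
    """0-based centre index of every motif occurrence in a sequence."""
    centres = []
    start = sequence.find(motif)
    while start != -1:
        centres.append(start + len(motif) // 2)
        start = sequence.find(motif, start + 1)
    return centres


def centred_window(signal, centre, flank):
    """Signal values from centre - flank to centre + flank, or None near an edge."""
    lo, hi = centre - flank, centre + flank + 1
    if lo < 0 or hi > len(signal):
        return None
    return signal[lo:hi]


def average_profile(windows):
    """Position-wise mean across equally sized windows."""
    if not windows:
        return []
    n = len(windows)
    return [sum(column) / n for column in zip(*windows)]


def aggregate(promoters, flank=75):
    """Average motif-centred profile over (sequence, signal) pairs."""
    windows = []
    for sequence, signal in promoters:
        for centre in motif_centres(sequence):
            window = centred_window(signal, centre, flank)
            if window is not None:
                windows.append(window)
    return average_profile(windows)
```

Calling `aggregate` on a list of `(sequence, signal)` pairs returns one averaged profile of length `2 * flank + 1`; motif occurrences too close to the end of a promoter to yield a full window are skipped.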
The control of transcription is one of the most important and highly regulated steps in eukaryotic gene expression, and thus is instrumental in orchestrating the metabolic, growth, replicative, and differentiation states of cells. It is not surprising then, that transcription is carried out by a complex sequence of events, any combination of which may be regulated. Studies that investigate gene regulation are greatly facilitated by methods that characterize the interactions between proteins and DNA. Several methods have been developed to identify the direct binding of proteins to DNA sequences. In the classical DNA footprinting protocol, sequences that bind to a protein of interest are determined by identifying regions of DNA that are protected from digestion by exonuclease III or DNase I. The protected sequence corresponds to the ‘footprint’ the protein leaves on the DNA during digestion.
41 Another classical method, EMSA, involves the retardation of a radiolabelled DNA probe by the binding of a protein of interest (or by a protein within a complex), indicating a molecular interaction.
23 All of these methods involve the
in vitro interaction between a recombinant or NE-extracted protein and a naked segment of DNA, eliminating the chromatin context and the possible modification state(s) of the protein of interest. Similarly, assays utilizing reporter genes fail to take into account the local chromatin context, because potential cis-elements (promoters and enhancers) are tested using transient transfections, in which the test plasmid remains unintegrated in the nucleus.
20 Despite improvements in both computational prediction of functional DNA sites and experimental verification techniques, current methods allow only snapshots of the transcription process, and as such favour indirect conclusions.
Recent advancements in high-throughput microarray (ChIP-chip) and sequencing methods (ChIP-Seq) have increased our understanding of chromatin composition, revealing transcription factor binding sites (TFBS),
42,43 chromatin accessibility,
44 histone modifications,
45 and nucleosome positions
39 on a genome-wide scale. These varied experimental datasets augment computational TFBS prediction methods, and aid in the description of transcriptional regulation on the scales of single genes to the entire genome.
3.6. Conclusions
In the present study, we have characterized the molecular features of the palindromic motif TCTCGCGAGA, which is found in the
HNRNPK promoter and, as described in our previous work,
12 is a potential cis-regulatory element of ~5% of human genes. We used a reporter system assay to show that out of four identified
HNRNPK promoters, the one containing the palindromic motif exhibits the highest expression of a reporter gene. Although EMSA and MS analyses, performed to discover the palindrome-binding transcription factor, did identify a complex of DNA-binding proteins, neither method unambiguously identified the relevant direct trans-acting protein(s). Moreover, for a collection of proteins functionally implicated in DNA repair processes we observed that changes in their affinity to ds DNA probes are related to the proliferation status of the cells.
To analyse the chromatin configuration at the HNRNPK promoter, we performed comparative ChIP experiments of RNAPII occupancy and histone modification density among the promoters of housekeeping, inducible, and silent genes. Our analysis revealed that promoters with the TCTCGCGAGA motif display properties of the promoters of HGs (GAPDH and TUBB6), containing similar RNAPII levels and relatively more H3Ac modifications than inducible and silent genes.
The analyses of open chromatin data provided by the ENCODE project revealed that the TCTCGCGAGA motif has features that promote constitutive openness of the chromatin and favour DNA accessibility. This function could be due to the binding of an unknown transcription factor, a hairpin spatial structure preventing nucleosome occupancy, or both. In sum, experimental observations using in vitro methods and analyses of open chromatin data support the function of this sequence in local nucleosomal organization, resulting in the regulation of transcription of the HNRNPK gene. Our findings suggest that the TCTCGCGAGA motif may play a critical role in regulating and establishing nucleosome patterns at the promoters containing it. Further advancements in analytical methods should elucidate the comprehensive picture of the function of this motif in gene expression regulation.