The 3′ ends of ZNF genes do not have characteristics of promoter regions. As indicated above, H3K9me3 is often associated with promoter regions but our previous studies showed that it specifically covers the 3′ exons of many ZNF genes. Because H3K9me3 is not generally associated with the 3′ ends of genes, we reasoned that it was possible that the 3′ ends of the ZNF genes may harbor previously unidentified promoters. To investigate this hypothesis, we first performed a bioinformatic analysis. Others 
have previously identified a set of motifs that are highly enriched in human promoters. If the 3′ ends of the ZNF genes are previously unidentified promoters, then we would expect the known promoter motifs to be enriched in these regions. We extracted the target region for each ZNF 3′ end on chr19 that was identified using ChIP-seq of hES cells as being enriched for H3K9me3 (121 genes). As a positive control set for human promoters, we selected +/- 500 bp from the center of the transcription start site of each ZNF target gene and as a negative control set we selected “enhancer regions”, which we define as 1000 nt regions located from −10,000 to −11,000 nt upstream of the promoter of each ZNF gene bound by H3K9me3 at its 3′ end (note that if these enhancer regions overlapped any known promoter regions, they were excluded from analysis). Each of the 3 sets of sequences (upstream negative controls, ZNF promoters, and ZNF 3′ ends) from the 121 ZNF genes were analyzed for the occurrence of the position weight matrix (as defined in 
) for the promoter-enriched motifs CAAT, CREB, NRF2, GC, Inr, NRF1, Sp1, and YY1 using the program MAST. Motifs with an E-value below 10 were used for evaluation and each region was searched for the motif in the forward and reverse orientation. Then, the number of motifs per 1kb region was determined for each category (upstream negative controls, ZNF promoters, and ZNF 3′ ends) to normalize the analysis (). As expected, most of the promoter motifs are found in the ZNF promoters and some motifs are found in the enhancer regions. However, the 3′ ends of the ZNF genes do not contain the known promoter motifs.
ZNF 3′ ends are not enriched for promoter motifs or CAGE tags.
Of course, it is possible that promoters embedded in ZNF 3′ exons might be regulated by a different set of transcription factors and thus would not show enrichment of the common promoter motifs. Therefore, we next investigated whether we could associate the H3K9me3 binding sites with 5′ ends of transcripts. For these analyses, we obtained all CAGE tags available from the Riken CAGE database (http://gerg01.gsc.riken.jp/cage/download/hg17prmtr/chromosomes/
). The number of CAGE tags were counted for each region (upstream negative controls, ZNF promoters, and ZNF 3′ ends) and normalized for region size. As shown in , the set of ZNF promoters could be associated with the CAGE tags but very few CAGE tags could be found that corresponded to the upstream enhancer regions or to the H3K9me3 sites at the ZNF 3′ ends. Thus, bioinformatic analyses do not provide strong support for the hypothesis that ZNF 3′ exons are promoter regions.
As a third approach to test the hypothesis that the 3′ exons of ZNF genes are promoters, we cloned a set of the 3′ exons into a luciferase reporter vector (). As positive controls for these assays, we used the DHFR
promoter (a well-studied housekeeping promoter that is active in all cell types) and the promoter regions of the ZNF554
genes. Eight ZNF 3′ ends were analyzed for promoter activity. It was possible that the ZNF 3′ ends could be promoters oriented such that they produced antisense transcripts relative to that same ZNF gene or they could be alternative upstream promoters of a downstream gene. Therefore, we cloned 5 of the ZNF 3′ ends in both orientations upstream of the luciferase cDNA so that we could analyze transcriptional activity in both directions. To begin, HepG2 and DAOY cells were transfected with the various reporter constructs and luciferase activity was determined. As expected, the DHFR
, and ZNF440
promoters were active in both cell types. However, none of the ZNF 3′ ends showed any promoter activity in either orientation (). We have shown previously that the ZNF 3′ ends bound by H3me3K9 are also bound by KAP1 (also called TRIM28) 
. KAP1 is a corepressor that recruits the histone methyltransferase SETDB1, resulting in trimethylation of histone H3 on lysine 9 
. It is postulated that when KAP1 is located at promoter regions, it can function to repress transcription by recruiting a histone methyltransferase and other repressive proteins such as HP1 family members 
. Thus, it was possible that potential promoter activity from the ZNF 3′ exons was suppressed due to KAP1-mediated repression. Therefore, we also transfected the various promoter-luciferase constructs into both U2OS cells and HEK293 cells that were stably expressing shRNAs against KAP1 
. However, even under conditions of KAP1 knockdown, none of the tested 3′ ends of the ZNF genes functioned as a promoter in the luciferase reporter assays (). Therefore, neither bioinformatics analyses nor a functional assay provides support for the hypothesis that the 3′ exons of the ZNF genes are cryptic promoters, even though they are bound by a mark (H3K9me3) often found at promoter regions.
The 3′ ends of ZNF genes do not display promoter activity.
The 3′ exons of ZNF genes have histone modifications characteristic of both silenced and active regions. As indicated above, H3K9me3 is generally considered to be a mark of repression due to its colocalization with the KAP1 corepressor and the HP1 family of silencing proteins. However, the unusual location of the H3K9me3 marks at the 3′ exons of ZNF genes may suggest that this mark is functioning in a different manner for this set of genes. To investigate whether the ZNF genes that are bound by H3K9me3 are active or repressed, we analyzed the ZNF genes for additional histone modifications, including ChIP-seq data for H3K4me3, H3K4me1, H3AcK9, H3K27me3, and H3K36me3. The only other mark that was found at the ZNF 3′ ends was H3K36me3, a mark of transcriptional elongation. As shown in , the ZNF 3′ ends that have H3K9me3 also have high levels of the H3K36me3 mark. Although the binding patterns of H3K9me3 and H3K36me3 are similar over many ZNF genes, there are clearly many regions identified by H3K9me3 but not by H3K36me3 (e.g. 59680000 and 60060000 which includes a cluster of killer cell and leukocyte immunoglobulin-like receptors). As expected, there are also many regions identified by H3K36me3 and RNA-seq (e.g. the RPS9 gene at ~5940000) but not by H3K9me3. To examine the relationship between H3K9me3 and H3K36me3 on a genome-wide scale, we analyzed hES cell ChIP-seq data obtained using the H3K9me3 and H3K36me3 antibodies. To do so, we first needed to modify Sole-search, our ChIP-seq peak-calling program 
. This program was originally developed to analyze the binding patterns of site-specific DNA binding proteins, which produce very narrow peaks in ChIP-seq analyses. The binding patterns of modified histones, in particular H3K9me3 and H3K36me3, are not narrow but instead can spread over large regions and exhibit very jagged enrichment profiles. Therefore, we modified the program such that it allows “binding regions” rather than narrow peaks to be identified; see Supplementary Information S1
and Figure S1
for more details about Sole-searchv2. A comparison of the number of called peaks and genomic coverage using the standard versus the histone parameters is shown in ; the most important difference is that a larger region of the genome is identified as being covered by histone marks using the modified program. Using the histone-specific program parameters, we identified 20,744 H3K9me3 regions and 32,685 H3K36me3 regions. However, only 2,813 regions are covered by both types of modified histones (). We further characterized the 2,813 regions marked by both H3K9me3 and H3K36me3. First we determined how many RNA-seq tags mapped to the H3K36-specific regions, the H3K9me3-specific regions, and the regions covered by both marks. As expected, the regions covered only by H3K36me3 were transcribed and the regions covered only by H3K9me3 were not transcribed (). Interestingly, the dual bound regions display modest transcription levels, suggesting that the presence of a H3K9me3 mark is not incompatible with transcription. Then, we mapped the dual covered regions to the nearest gene and used the DAVID gene ontology program 
to identify specific functional classes of genes showing these dual patterns. Strikingly, the main gene category identified was Krueppel-associated C2H2 zinc finger genes (243 genes), with a minor category of cadherins (25 genes) (). The cadherins are all clustered on chromosome 5 and consist of a tandem array of alternatively used 5′ exons and common 3′ exons. Unlike the 3′ exon-specific marking of the ZNFs by H3K9me3 and H3K36me3, it is the promoter regions and alternatively used 5′ exons of the cadherin genes that are dually bound by the two histone marks.
ZNF3′ ends are marked by H3K36me3.
ZNFs are the largest category of genes that have both H3K9me3 and H3K36me3 marks.
The H3K9me3 mark is not repressive when located at ZNF 3′ ends. Our ChIP-seq results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed (and covered by H3K36me3) and one allele repressed (and covered by H3K9me3). To test the hypothesis that the contradictory modifications are due to imprinting, we examined levels of RNA corresponding to the ZNF genes in hES cells, using RNA-seq data 
. We found that the ZNF exons that show enrichment for H3K9me3 are transcribed in hES cells (). To further investigate the role of H3K9me3, we identified the top 716 H3K9me3 sites and separated them based on whether they were located in a promoter region (183 sites) or within a gene (433 sites). As shown in , promoter regions bound by H3K9me3 are associated with a very low number of RNA-seq tags in hES cells whereas regions within genes that are bound by H3K9me3 are associated with a larger number of RNA-seq tags. Thus, both RNA analyses and chromatin modification analyses suggest that ZNF genes having H3K9me3 on their 3′ exons are transcribed from at least one allele.
If only one allele of the ZNF genes is transcribed, one might expect that the H3K9me3 and H3K36me3 peak heights at the dually covered regions would be approximately half the peak heights of the marks at the singly covered sites (because only one allele would contribute to each signal in the dually covered sites whereas both alleles would contribute to the signal in the singly covered sites). However, the average height of the H3K9me3 peaks at 3′ ZNF exons that had both marks was higher than the average height of the H3K9me3 peaks at 3′ ZNF exons that had only H3K9me3. Similarly, the average height of the H3K36me3 peaks at 3′ ZNF exons that had both marks was higher than the average height of the H3K36me3 peaks at 3′ ZNF exons that had only H3K36me3 (data not shown). Although not conclusive, these results suggested that the H3K9me3 and H3K36me3 marks on the dually bound ZNF 3′ exons may not be limited to only one allele. We next used the program ssahaSNP to identify single nucleotide polymorphisms (SNPs), as compared to the reference genome sequence, that were located within genes targeted by both H3K9me3 and H3K36me3 on chromosome 19. We then searched for these SNPs within the RNA-seq dataset. Out of 72 SNPs that were identified, 40 SNPs and 3 wt nts were fixed in the RNA sequence data (the SNP or wt nt composed >90% of all nucleotides sequenced for this position). This could suggest either that the ZNFs containing these SNPs were expressed in an allele-specific manner or that the SNP was fixed in the ES cell genome. However, we did identify 29 cases where both wt and SNP alleles are present in the RNA-seq tags approximately equally ( and Table S3
). Since both alleles are transcribed, the hypothesis that transcription is allele-specific is not supported for these ZNF genes bound by both H3K9me3 and H3K36me3.
H3K9me3 and H3K36me3 are not co-regulated. The studies presented above suggest that the H3K9me3 and H3K36me3 marks on the dually bound ZNF3′ exons are not mutually exclusive. We next wished to investigate whether the dual H3K9me3 and H3K36me3 marks are co-regulated. To investigate this hypothesis, we used a method that we have previously developed called eChIP, which allows the in vivo study of a chromosomal region removed from its normal chromosomal location 
. We cloned the 3′ ends of ZNF555, ZNF556, and ZNF77 into the episomal vector, introduced this vector into HEK293 cells that express EBNA1 (required for the maintenance of the episome), and then performed a ChIP analysis using H3K9me3 and H3K36me3 antibodies. As a control, we first demonstrated that in HEK293 cells the 3′ ends of the endogenous ZNF555
, and ZNF426
genes are bound by both H3K9me3 and H3K36me3 and that, as predicted from the hES ChIP-seq results, the 3′end of ZNF556
is bound only by H3K9me3 and not by H3K36me3 (). We also showed that the GAPDH
gene is bound only by H3K36me3 and not by H3K9me3, as expected for a highly transcribed gene and that the GMNN
promoter is not bound by either mark. We next analyzed the episomal constructs. We observed high levels of H3K9me3 on the episomal ZNF 3′ ends, but not on the GMNN promoter, an episomal vector that does not contain an insert, or the hygromycin cDNA region (). In contrast, analysis of the episomal ZNF 3′ ends showed that they are not marked by H3K36me3. As a control to demonstrate that the H3K36me3 antibody could detect the H3K36me3 mark on an episome, we analyzed the hygromycin cDNA that is located on the same episomal vector as the GMNN promoter and cloned downstream of a eukaryotic promoter. Expression of hygromycin is used to select the stable episomes and thus this cDNA is transcribed and we could detect H3K36me3 in this region of the episome. These results demonstrate that although both H3K9me3 and H3K36me3 marks show very high enrichment at the 3′ ends of ZNFs, these two marks are not co-regulated. H3K36me3 is produced by SETD2 
and is thought to be recruited to the transcriptional elongation complex via interaction with Ser 2-phosphorylated RNA Polymerase II (reviewed in 
); the levels of Ser 2-phosphorylated RNA polymerase II, and thus also the levels of H3K36me3, increase from the 5′ to 3′ direction throughout the transcription unit. Our results show that the H3K9me3 deposition is not dependent on the same mechanism as H3K36me3 deposition since the episomes have the H3K9me3 but not the H3K36me3 mark. In fact, the high H3K9me3 mark on the episomal ZNF 3′ exons indicates that the cis elements required for the mark are all present on the episomes. We have recently shown that the KAP1/SETDB1 histone methylation complex colocalizes with H3K9me3 
. Thus, the episomes must contain the cis elements that can recruit the KAP1/SETDB1 complex.
H3K9me3 and H3K36me3 are not co-regulated at ZNF 3′ ends.
H3K9me3 at ZNF 3′ exons corresponds with the number of tandemly repeated finger domains. ZNF genes contain multiple, highly conserved finger domains in their 3′ exons. In essence, these finger domains create highly repetitive genomic regions. H3K9me3 has been shown to bind to other repetitive elements, such as transposons and centromeric repeats 
. If the H3K9me3 was binding to the ZFN 3′ ends simply as a consequence of the repetitive nature of these genomic regions, then we would expect the likelihood of H3K9me3 to be bound to a ZNF gene to increase as the number of finger domains increases. To test this possibility, we classified the ZNF genes into sets depending on how many tandomly repeated finger domains were present in the gene and then plotted the percentage of each set that is covered by H3K9me3 (). We note that ZNFs that have very few finger domains are rarely covered by H3K9me3, whereas ZNFs having greater than 15 domains are almost always covered by H3K9me3. Clearly, there is a strong relationship between the number of tandemly repeated finger domains and the likelihood of being covered by H3K9me3. In contrast, there is no relationship between the number of repeated finger domains and the likelihood of being covered by H3K36me3. As a further analysis of the repetitive nature of H3K9me3 vs. H3K36me3 peaks, we examined 20,744 H3K9me3 peaks and 20,744 H3K36me3 peaks and searched for tandem repeats using the program XSTREAM, identifying sequences that are 50bp or larger, repeated at least 5 times in the human genome, and having at least 60% conservation between repeat elements. We then calculated the percentage of each peak that was a repetitive element. We found that more of the H3K9me3 peaks consisted of a high percentage of repetitive regions. For example, there are ~17 times more H3K9me3 peaks that consist of 91-100% repetitive elements (see Figure S2
H3K9me3 enrichment at ZNF 3′ ends correlates with the number of finger domains.