High-resolution mapping of protein positions by ChIP-seq on small viral episomes offers advantages over similar experiments in larger genomes: technical gains allow confident data analysis, and verified annotations help generate hypotheses about protein function. The high copy number of episomes, ~45 in the case of EBV-infected Raji cells (Sternas et al., 1990), increases the signal-to-noise ratio of ChIP experiments because the viral genome is sequenced far more deeply than the simultaneously examined human genome. The resulting multifold coverage of the background allowed us to define an experimental baseline statistically. This reduced peak calling to choosing a suitable enrichment over background; our conservative threshold of 10-fold enrichment yielded an apparent false discovery rate of 0 and strong confidence in our identification of binding sites. Viral genomes have also been well annotated and subjected to functional mutagenesis experiments. Once protein-binding sites have been determined at high resolution, we can correlate occupancy with known and tested sequence elements. Together, these advantages yield reliable occupancy distributions from which we can draw functional inferences based on protein positions.
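As a rough illustration of the coverage argument: with ~45 episomes per cell and, assuming roughly two copies of a typical human locus (an assumption, not a measured value), each EBV position is represented about 45 / 2 ≈ 22 times more often than a human position. The fold-enrichment thresholding step can then be sketched in Python as below. This is a minimal sketch under stated assumptions, not the pipeline used in this study: the fixed-width bins, the pseudocount, the hypothetical helper names `call_peaks` and `empirical_fdr`, and the estimate of the false discovery rate by applying the same threshold to a control track (a common empirical approach) are all illustrative choices.

```python
def call_peaks(chip_counts, background_counts, min_fold=10.0, pseudocount=1.0):
    """Return half-open (start, end) bin ranges where ChIP/background >= min_fold.

    Assumes (hypothetically) that both tracks use the same fixed-width bins
    and are already normalized to equal sequencing depth; the pseudocount
    avoids division by zero in empty background bins.
    """
    if len(chip_counts) != len(background_counts):
        raise ValueError("tracks must have the same number of bins")
    fold = [(c + pseudocount) / (b + pseudocount)
            for c, b in zip(chip_counts, background_counts)]
    enriched = [f >= min_fold for f in fold]

    peaks, start = [], None
    for i, flag in enumerate(enriched):
        if flag and start is None:            # a new enriched run begins
            start = i
        elif not flag and start is not None:  # the current run ends
            peaks.append((start, i))
            start = None
    if start is not None:                     # run extends to the last bin
        peaks.append((start, len(enriched)))
    return peaks


def empirical_fdr(chip_counts, control_counts, background_counts, min_fold=10.0):
    """Estimate FDR as (peaks called in a control) / (peaks called in the ChIP).

    Returns 0.0 when no ChIP peaks are called (a convention, not a true FDR).
    """
    n_chip = len(call_peaks(chip_counts, background_counts, min_fold))
    n_ctrl = len(call_peaks(control_counts, background_counts, min_fold))
    return n_ctrl / n_chip if n_chip else 0.0


# Hypothetical per-bin counts, already normalized to equal sequencing depth.
chip = [5, 6, 120, 130, 4, 5, 90, 6]
control = [4, 6, 7, 5, 6, 5, 4, 5]
background = [5, 5, 6, 5, 5, 4, 5, 5]
print(call_peaks(chip, background))              # [(2, 4), (6, 7)]
print(empirical_fdr(chip, control, background))  # 0.0
```

With these toy counts, two enriched runs pass the 10-fold threshold in the ChIP track and none in the control, so the empirical FDR estimate is 0.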
The widespread distribution of CTCF and cohesin on the EBV genome reveals binding at or near both latent and lytic genes, suggesting distinct regulatory functions at different positions. Studies of CTCF function in EBV have focused on binding sites within the latency control region, and consequently the proposed roles for CTCF have involved only the determination of latent promoter choice and the establishment of boundaries between latent and lytic genes (Chau et al., 2006; Day et al., 2007; Tempera et al., 2010). Our experiments identified CTCF-binding sites widely dispersed across the genome, including in and around many lytic genes. We also observed colocalization with cohesin, similar to that observed in the human and KSHV genomes (Parelho et al., 2008; Stedman et al., 2008; Wendt et al., 2008), which is likewise not limited to areas of latent transcription initiation. Given these widespread occupancy distributions, we surmise that CTCF and cohesin may perform functions near lytic genes that are separate from those previously proposed in the latency control region.
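Colocalization of two factors, such as CTCF and cohesin, can be quantified by asking what fraction of one factor's peaks overlap, or lie within a small distance of, a peak of the other. The sketch below assumes peaks are half-open `(start, end)` intervals in viral genome coordinates; the function names, the `max_gap` tolerance, and the example coordinates are hypothetical and are not values from this study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in genome coordinates


def near(a: Interval, b: Interval, max_gap: int = 0) -> bool:
    """True if the intervals overlap or are separated by at most max_gap bp."""
    gap = max(a[0], b[0]) - min(a[1], b[1])  # negative when they overlap
    return gap <= max_gap


def colocalized_fraction(query: List[Interval], targets: List[Interval],
                         max_gap: int = 0) -> float:
    """Fraction of query peaks with at least one target peak nearby."""
    if not query:
        return 0.0
    hits = sum(any(near(q, t, max_gap) for t in targets) for q in query)
    return hits / len(query)


# Hypothetical peak coordinates (bp), for illustration only.
ctcf = [(1000, 1200), (50000, 50300), (90000, 90150)]
cohesin = [(1100, 1300), (50350, 50500)]
rnap2 = [(7000, 7400)]
print(colocalized_fraction(ctcf, cohesin, max_gap=100))  # 2 of 3 CTCF peaks
print(colocalized_fraction(ctcf, rnap2, max_gap=100))    # 0.0
```

Because a viral peak set contains relatively few intervals, the all-pairs comparison is adequate here; genome-wide human peak sets would be better served by a sorted sweep or an interval tree.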
RNAP II does not colocalize with CTCF, and its positioning suggests functions beyond the transcription of latent genes. We find RNAP II at the EBER transcripts, short non-coding RNAs whose expression is driven predominantly by RNA polymerase III but that are also partially transcribed by RNAP II (Kirchner et al., 1991). We also find RNAP II at two novel positions, sites that yield the strongest enrichment in our experiments yet do not map to any annotated transcript. The strongest signal maps to the unique sequence adjacent to FR within OriP, the proposed origin of replication for latency. The second strongest RNAP II signal maps upstream of the EBER-1 transcript and does not appear to correspond to any annotated element. We were unable, however, to detect CTCF occupancy at any of these RNAP II-binding sites. Nor could we detect the RNAP II and cohesin colocalization observed in the human genome (Kagey et al., 2010). RNAP II, like CTCF and cohesin, binds near different regulatory elements throughout the genome, suggesting functions ranging from transcription to replication.
RNAP II occupancy identifies only a subset of known transcription units in EBV-infected Raji cells, and CTCF proximity does not distinguish active from repressed promoters. The two EBER RNAs are actively transcribed (Lerner et al., 1981), and strong RNAP II enrichment is observed at them. A complex set of alternatively spliced mRNAs, referred to as BARTs, is generated by rightward transcription of the BamHI A fragment (Chen et al., 1992; Sadler and Raab-Traub, 1995). Different initiation sites have been proposed, but we detect a weak signal at only one promoter (Smith et al., 1993). Similarly, conflicting evidence exists regarding the identity of the transcription start site (Tao et al., 1998; Woisetschlaeger et al., 1990) for the set of alternatively spliced mRNAs that yield the EBNA genes (Allday et al., 1988; Petti et al., 1988; Wang et al., 1987). Although the W promoter is obscured from our analysis by repeat regions, we find RNAP II at neither the C nor the Q promoter. This cell line also expresses the LMP1 gene (Contreras-Brodin et al., 1991), as well as LMP2A but not LMP2B (Sample et al., 1989). Although weak RNAP II enrichment is observed in the general promoter region of these genes, the peak does not correspond to an annotated element. Combining our RNAP II localization with the literature describing known transcription in Raji cells allows us to compare CTCF occupancy with polymerase activity. Again, however, neither activity nor putative repression of promoters correlates with CTCF proximity. We interpret CTCF occupancy near both active and repressed promoters as a further indication of context-dependent gene regulation.
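The comparison between promoter activity and CTCF proximity amounts to measuring, for each promoter, the distance to the nearest CTCF peak and grouping those distances by activity status. A minimal Python sketch follows; the promoter names, positions, activity labels, and peak centers are hypothetical placeholders rather than EBV coordinates, and representing each peak by a single center position is a simplifying assumption.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def nearest_distance(position: int, sorted_centers: List[int]) -> int:
    """Distance in bp from `position` to the closest peak center.

    `sorted_centers` must be sorted in ascending order.
    """
    if not sorted_centers:
        raise ValueError("no peak centers supplied")
    i = bisect_left(sorted_centers, position)
    candidates = []
    if i < len(sorted_centers):  # nearest center at or to the right
        candidates.append(sorted_centers[i] - position)
    if i > 0:                    # nearest center to the left
        candidates.append(position - sorted_centers[i - 1])
    return min(candidates)


def distances_by_status(promoters: Dict[str, Tuple[int, str]],
                        peak_centers: List[int]) -> Dict[str, List[int]]:
    """Group nearest-CTCF distances by promoter activity status."""
    centers = sorted(peak_centers)
    grouped: Dict[str, List[int]] = {}
    for _name, (tss, status) in promoters.items():
        grouped.setdefault(status, []).append(nearest_distance(tss, centers))
    return grouped


# Hypothetical promoters (TSS position, status) and CTCF peak centers.
promoters = {"P1": (1000, "active"), "P2": (5000, "repressed"),
             "P3": (9000, "active")}
ctcf_centers = [20000, 1200, 4800]
print(distances_by_status(promoters, ctcf_centers))
# {'active': [200, 4200], 'repressed': [200]}
```

When the distance distributions for active and repressed promoters overlap, as they do in this toy example, CTCF proximity alone does not discriminate between the two classes.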
Combining confident, high-resolution mapping of protein positions by ChIP-seq with the deep annotation of viral genomes allows us to make specific and testable inferences. What functions can we hypothesize for CTCF, cohesin, and RNAP II? In addition to operating within the latency control region, colocalized CTCF and cohesin near lytic genes may control those regulatory elements, for example by repressing lytic transcription. In addition to generating latent transcripts, RNAP II occupancy at the origin may couple polymerase recruitment with replication. CTCF, cohesin, and RNAP II all share the potential for context-dependent functions throughout the EBV genome.