Genome-wide annotation of NFRs. It is generally accepted that mRNA transcription initiates at the edges of NFRs and can be facilitated by multiple factors that positively regulate the formation and/or size of NFRs. Recent studies have shown that ncRNA transcription also initiates at the edges of NFRs (45, 66). However, the mechanisms that regulate these transcripts are not well understood. Given the widespread prevalence of ncRNA detected throughout the S. cerevisiae genome (12, 13, 43-45, 66), it is critical to understand how ncRNAs are regulated. Isw2 is known to target the 3′ ends of ~250 genes and to repress noncoding transcription at three of the loci tested (63). These results raise the possibility that Isw2 is preferentially targeted to NFRs, where it functions as a unique chromatin regulator to repress ncRNA transcription by restricting the size of NFRs. To test this model, we first systematically annotated NFRs genome-wide by using multiple independent nucleosome mapping data sets.
Global identification of NFRs was first done at the 5′ ends of genes by Yuan et al. (68). Subsequently, numerous studies recapitulated this group's findings and further showed that NFRs are also commonly present at the 3′ ends of genes (1, 30, 41, 45, 59, 66). Generally, these studies based their analyses on a single nucleosome mapping data set and identified NFRs as regions with a wider-than-average linker length or by fitting an idealized model to smoothed nucleosome signals. However, all nucleosome mapping data sets contain a large number of loci where nucleosomes are poorly defined, and while the overall agreement in nucleosome positions between studies is good, significant differences exist between studies (30). Furthermore, some NFRs were identified only by their proximity to the ends of genes, thus precluding the annotation of NFRs located elsewhere in the genome. These limitations necessitated an unbiased and systematic annotation of NFRs.
Therefore, we developed an algorithm that uses multiple criteria to systematically identify NFRs commonly found in multiple nucleosome mapping data sets (see Materials and Methods for details). These criteria were designed to eliminate NFRs from regions with poorly defined nucleosomes and to identify NFRs present in several nucleosome mapping data sets, thus reducing the identification of spurious NFRs and enriching for high-confidence NFRs. This algorithm was applied to four independent nucleosome mapping data sets, two generated with high-resolution microarrays and two with high-throughput sequencing (19, 37, 41, 63), and identified a reference set of 6,589 core NFRs (Fig. A). Because the annotated NFRs represent the overlapping regions identified from each data set and not necessarily the definitive edges of each NFR, the mean length and total genomic coverage (99 bp and ~5.4%, respectively) are likely underestimates.
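The idea of keeping only the regions called nucleosome free in every data set can be illustrated with a small interval-intersection sketch. This is not the authors' algorithm (their criteria are given in Materials and Methods and are not reproduced here); the function names, the toy coordinates, and the round genome size of 12.1 Mb are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp on one chromosome


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both lists (each sorted and non-overlapping)."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def core_regions(datasets: List[List[Interval]], min_len: int = 1) -> List[Interval]:
    """Keep only the stretches called nucleosome free in every data set."""
    core = sorted(datasets[0])
    for calls in datasets[1:]:
        core = intersect_two(core, sorted(calls))
    return [(s, e) for s, e in core if e - s >= min_len]


# Hypothetical NFR calls from three data sets on one chromosome.
calls_a = [(100, 260), (900, 1050)]
calls_b = [(120, 300), (2000, 2100)]
calls_c = [(90, 240), (880, 1000)]
core = core_regions([calls_a, calls_b, calls_c])

# Rough check of the reported coverage: 6,589 NFRs x 99 bp mean length,
# divided by an ASSUMED genome size of about 12.1 Mb.
ASSUMED_GENOME_BP = 12_100_000
coverage_percent = 100 * 6589 * 99 / ASSUMED_GENOME_BP
```

Because the intersection can only shrink each call, a core region is never longer than the shortest contributing call, which is one way to see why the reported mean length and coverage are likely underestimates.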
It should be noted that during the course of this study, Jiang and Pugh (30) independently annotated NFRs genome-wide. In their study, NFRs were annotated based on the length of the linker regions from a compiled reference set of nucleosome positions, resulting in the identification of 14,467 NFRs. A direct comparison of both sets of NFRs revealed a statistically significant overlap (4,548 overlapping NFRs; P < 10^−300). However, many of the NFRs identified by Jiang and Pugh that were not present in our data set corresponded to regions of poorly defined nucleosomes or regions deficient of nucleosomes in only a single data set. Thus, for the purpose of this study, NFRs defined by our algorithm were more suitable.
Classification of NFRs. NFRs were first identified around the TSSs at the 5′ ends of genes and, more recently, around TTSs at the 3′ ends of genes. Of the annotated NFRs, a total of 3,127 (averaging ~111 bp in size and associated with ~64% of nondubious ORFs) and 2,440 (averaging ~109 bp in size and associated with ~50% of nondubious ORFs) were classified as 5′-NFRs and 3′-NFRs, respectively (Fig. ). Because of the compact nature of the yeast genome, some 5′- and 3′-NFRs are shared between two neighboring genes in tandem, divergent, or convergent orientations. Therefore, we further classified 5′- and 3′-NFRs and identified 1,312, 555, and 484 shared tandem, divergent, and convergent NFRs, respectively (Fig. ). Thus, 2,438 (~42%) nondubious genes contain a shared NFR at their 5′ end and 2,291 (~40%) contain a shared NFR at their 3′ end.
Our unbiased genome-wide annotation of NFRs allowed us to identify two additional classes of NFRs, ORF-NFRs and Other-NFRs (Fig. ). The ORF-NFR class comprises 2,114 long linkers (averaging ~91 bp in size) that are located completely within the open reading frames of 1,639 genes. The remaining 758 NFRs, which are not associated with any nondubious ORF, were classified as Other-NFRs (averaging ~88 bp in size) and represent NFRs located within long intergenic regions or upstream of tRNA genes and retrotransposons.
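The four classes and the three kinds of shared NFRs can be made concrete with a small classifier. This is a sketch under stated assumptions, not the procedure used in the study: the `Gene` record, the 100-bp `window`, and the rule that containment within an ORF is checked before proximity to gene ends are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"

    @property
    def five_prime(self) -> int:
        return self.start if self.strand == "+" else self.end

    @property
    def three_prime(self) -> int:
        return self.end if self.strand == "+" else self.start


def classify_nfr(nfr_start: int, nfr_end: int, genes: List[Gene],
                 window: int = 100) -> Tuple[Set[str], Optional[str]]:
    """Return (class labels, shared orientation or None) for one NFR."""
    # Assumed precedence: an NFR lying completely inside an ORF is an ORF-NFR.
    if any(g.start <= nfr_start and nfr_end <= g.end for g in genes):
        return {"ORF-NFR"}, None
    lo, hi = nfr_start - window, nfr_end + window
    five = [g for g in genes if lo <= g.five_prime <= hi]
    three = [g for g in genes if lo <= g.three_prime <= hi]
    labels = set()
    if five:
        labels.add("5'-NFR")
    if three:
        labels.add("3'-NFR")
    if not labels:
        return {"Other-NFR"}, None
    if five and three:
        shared = "tandem"        # 3' end of one gene, 5' end of the next
    elif len(five) > 1:
        shared = "divergent"     # two 5' ends
    elif len(three) > 1:
        shared = "convergent"    # two 3' ends
    else:
        shared = None
    return labels, shared


# Hypothetical gene models on one chromosome.
genes = [Gene("A", 1000, 2000, "+"), Gene("B", 2300, 3300, "+"),
         Gene("C", 5000, 6000, "-"), Gene("D", 6200, 7000, "+")]
```

An NFR shared by tandem genes carries both labels at once, which is why the 5′-NFR and 3′-NFR counts in the text are not mutually exclusive.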
General properties of NFRs. Previously, others have shown that nucleosomes immediately adjacent to 5′- and 3′-NFRs are well positioned and that the phasing of nucleosomes progressively decreases with distance from NFRs (41). More recently, NFRs were also shown to have a sequence-intrinsic tendency to exclude nucleosomes (33, 69). Whether these properties are shared by ORF-NFRs or Other-NFRs is not known. We therefore compared the phasing and sequence-dependent exclusion of nucleosomes around each class of NFRs.
To this end, the nucleosome signals from two in vivo WT mapping data sets and one in vitro-reconstituted nucleosome mapping data set were analyzed around all NFRs (Fig. A). As expected, the in vivo nucleosome maps displayed a prominent NFR that was flanked on both sides by an array of well-positioned nucleosomes whose phasing progressively decreased with distance from the NFR midpoint. The in vitro-reconstituted nucleosomes also exhibited a general depletion of nucleosomes at NFRs (Fig. ).
Next, we individually compared the average in vivo and in vitro nucleosome profiles around each class of NFRs (Fig. ). This analysis revealed that all classes of NFRs are flanked by an array of highly positioned nucleosomes with phasing that progressively decreases with distance from the NFR in vivo. In vitro, all classes of NFRs exhibit a sequence-intrinsic property to exclude nucleosomes. However, the level of phasing or sequence-intrinsic exclusion of nucleosomes varies between classes. For example, 5′- and 3′-NFRs are generally bordered by the most highly phased nucleosomes and display the most prominent in vitro nucleosome exclusion, while ORF-NFRs and Other-NFRs have significantly reduced levels of phasing and sequence-intrinsic nucleosome exclusion.
Because the in vivo and in vitro nucleosome profiles for each class of NFRs are different, we speculated that additional factors, beyond the sequence-dependent exclusion of nucleosomes, are required to establish the in vivo nucleosome architecture surrounding NFRs. We therefore analyzed the distribution of various histone modifications, transcriptional machineries, and chromatin-remodeling enzymes surrounding all NFRs. k-means clustering of these profiles revealed distinct classes of NFRs that are enriched within different chromatin environments (Fig. ). This result is consistent with our model that the chromatin architecture surrounding distinct classes of NFRs is differentially influenced by chromatin and transcription regulators in vivo. The observation that the annotated NFRs define boundaries for some histone modifications, such as H3K4me3, H4ac, and H3K14ac within clusters 1 and 8, as well as for a histone variant, Htz1, within clusters 1, 4, 5, 6, and 8, provides additional support for the quality of the NFR annotation.
Isw2 association with NFRs. We next examined whether Isw2 is preferentially targeted to and functions around NFRs. To address these possibilities, we determined the total number of Isw2 targets, defined as regions where both enrichment of Isw2 chromatin immunoprecipitation (ChIP) signals and Isw2-dependent chromatin remodeling (63) take place, that are in close proximity to one of the annotated NFRs. Because we utilized a strict definition for Isw2 targets and restricted our analysis to the edges of the annotated core NFRs, our calculation for the association of Isw2 targets with NFRs is likely a significant underestimate. Nonetheless, our analysis revealed a striking association of Isw2 targets with NFRs. A total of 406 Isw2 targets (P = 9.5e−279), representing ~48% of all Isw2 targets, were found within a single nucleosome distance (150 bp) of an annotated NFR. Classification of these NFRs targeted by Isw2 showed a statistical enrichment in all classes of NFRs: 5′-NFRs (217 Isw2 targets; P = 1.3e−148), 3′-NFRs (164 Isw2 targets; P = 2.1e−108), ORF-NFRs (50 Isw2 targets; P = 9.0e−14), and Other-NFRs (124 Isw2 targets; P = 4.1e−129). This analysis identified ORF-NFRs as a previously unknown class of Isw2 targets located within genes.
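The proximity criterion (within 150 bp of an NFR edge) can be sketched as a nearest-edge lookup. The code below is illustrative only: it assumes each Isw2 target is summarized by a single coordinate, which the text does not state, and it does not attempt to reproduce the enrichment P values, since the statistical test is not described in this section.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def near_any_edge(position: int, sorted_edges: Sequence[int],
                  max_dist: int = 150) -> bool:
    """True if `position` lies within `max_dist` bp of the nearest edge."""
    i = bisect_left(sorted_edges, position)
    # Only the edges just left and right of the insertion point can be nearest.
    neighbors = sorted_edges[max(i - 1, 0):i + 1]
    return any(abs(position - e) <= max_dist for e in neighbors)


def count_near_nfrs(target_positions: List[int],
                    nfrs: List[Tuple[int, int]],
                    max_dist: int = 150) -> Tuple[int, float]:
    """Number and fraction of targets within `max_dist` bp of any NFR edge."""
    edges = sorted(edge for nfr in nfrs for edge in nfr)
    hits = sum(near_any_edge(p, edges, max_dist) for p in target_positions)
    return hits, hits / len(target_positions)


# Hypothetical coordinates on a single chromosome.
nfrs = [(1000, 1100), (5000, 5090)]
targets = [900, 1260, 3000, 5100]
hits, fraction = count_near_nfrs(targets, nfrs)
```

In a real analysis the lookup would be run per chromosome, and the observed fraction would be compared with the fraction expected for randomly placed targets.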
To further understand the role of Isw2 targeting to NFRs, we first used self-organizing maps (SOMs) of Isw2 ChIP signals and Isw2-dependent chromatin remodeling around each class of Isw2 target NFRs (Fig. A to D, top panels). This analysis revealed that within each class of NFRs, Isw2 tends to be enriched at and to remodel nucleosomes immediately adjacent to NFRs, consistent with preferential targeting of Isw2 to NFRs. Second, plotting the distribution of the change in NFR size between WT and Δisw2 strains (63) revealed an increase in the size of many, but not all, NFRs in Δisw2 strains compared to WT (Fig. , bottom panels). These results suggest that Isw2-dependent chromatin remodeling around target NFRs functions to restrict the size of many, but not all, target NFRs.
Isw2-dependent repression of cryptic transcripts. We next sought to determine a functional consequence of Isw2-dependent chromatin remodeling around target NFRs. We previously showed that Isw2 represses cryptic antisense transcripts from the 3′ ends of three genes (63). However, whether Isw2 generally functions to repress cryptic RNA transcription is currently unknown. Given that Isw2 is targeted to and remodels chromatin around NFRs and that the TSSs of ncRNA are enriched around the edges of NFRs, we speculated that Isw2-dependent chromatin remodeling may result in repression of ncRNA transcription at NFRs.
To test our model, we hybridized total RNA from Isw2 deletion strains to high-resolution, strand-specific microarrays tiling chromosomes III, VI, and XII, covering ~14% of the genome. Because the exosome complex efficiently degrades cryptic transcripts, deletion of components in the exosome pathway, either TRF4 or RRP6, in combination with ISW2 (Δisw2 trf4 and Δisw2 rrp6, respectively) is required to stabilize some cryptic transcripts (63). These mutations are not expected to alter the frequency of ncRNA transcription (28, 53) but are required for detection of cryptic RNA transcripts. Furthermore, because it is unclear how cryptic RNA transcripts are processed in vivo, especially in the absence of Trf4 or Rrp6, we avoided any selection or amplification of RNA (see Materials and Methods for details).
While the Δisw2 single mutant showed relatively few changes compared to WT (Δisw2 versus WT; 35 total Isw2-dependent cryptic transcripts), a total of 80 (mean length of 604 bases) and 141 (mean length of 411 bases) Isw2-dependent cryptic transcripts were identified in the Δtrf4 (Δisw2 trf4 versus Δtrf4) and Δrrp6 (Δisw2 rrp6 versus Δrrp6) backgrounds, respectively. Interestingly, while a larger number of Isw2-dependent cryptic transcripts were identified in Δisw2 rrp6 than in Δisw2 trf4, a more robust increase in cryptic RNA levels was observed in Δisw2 trf4 (data not shown). However, a comparison of cryptic transcripts identified in both double mutants revealed that 47 cryptic transcripts in Δisw2 trf4 cells directly overlapped with 46 cryptic transcripts in Δisw2 rrp6 cells (Fig. A). These data suggest that Trf4 and Rrp6 have overlapping but distinct functions in cryptic RNA regulation.
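The two overlap counts need not be equal (47 versus 46), because one transcript in one background can overlap more than one transcript in the other. The sketch below shows how such asymmetric counts arise. The tuple layout, the requirement that overlapping transcripts lie on the same strand, and the toy coordinates are assumptions for illustration, not details taken from the study.

```python
from typing import List, Tuple

Transcript = Tuple[str, int, int, str]  # (chromosome, start, end, strand)


def overlaps(a: Transcript, b: Transcript) -> bool:
    """Same chromosome and strand (assumed), sharing at least 1 bp."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(set_a: List[Transcript],
                      set_b: List[Transcript]) -> Tuple[int, int]:
    """How many transcripts in each set overlap at least one in the other."""
    n_a = sum(any(overlaps(a, b) for b in set_b) for a in set_a)
    n_b = sum(any(overlaps(a, b) for a in set_a) for b in set_b)
    return n_a, n_b


# Hypothetical transcripts: two short trf4-background transcripts fall
# within one longer rrp6-background transcript.
in_trf4 = [("III", 100, 900, "+"), ("III", 1000, 1400, "+"), ("VI", 50, 500, "-")]
in_rrp6 = [("III", 200, 1200, "+"), ("VI", 600, 900, "-")]
n_trf4, n_rrp6 = count_overlapping(in_trf4, in_rrp6)
```

Here two transcripts in the first set overlap a single transcript in the second, giving counts of 2 and 1.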
We next classified each identified cryptic transcript as cryptic sense, antisense, or other (Fig. ) with respect to the transcriptional direction of an overlapping ORF (mean overlap with ORFs, 491 bp). The major class of Isw2-repressed cryptic transcripts in both Δtrf4 and Δrrp6 backgrounds was cryptic antisense, representing 54% (43 transcripts) and 34% (48 transcripts) of identified cryptic transcripts, respectively (Fig. ). A comparison of the cryptic RNA levels between each class of transcripts further revealed, as expected, that the exosome components contribute little to Isw2-repressed cryptic sense transcripts, as shown by the same RNA levels in Δisw2 and Δisw2 trf4 or Δisw2 rrp6 strains (data not shown). In contrast, Isw2-repressed cryptic antisense transcripts generally require loss of both Isw2 and an exosome component for maximum derepression (data not shown), consistent with the known role of Trf4 and Rrp6 in the selective degradation of cryptic antisense transcripts.
We next examined the relationship between Isw2-dependent chromatin remodeling and cryptic RNA transcription. We found that 22 (28%) and 41 (29%) Isw2-repressed cryptic RNA TSSs in Δisw2 trf4 and Δisw2 rrp6, respectively, are located within 300 bp of an Isw2 target. It should be noted that this is likely an underestimate, as many cryptic transcripts have multiple TSSs, resulting in blurring of microarray signals around the TSSs (see below). Strikingly, at these loci nucleosomes are preferentially shifted downstream of the cryptic TSSs in the absence of Isw2, compared to WT (Fig. and data not shown). These results show that Isw2-dependent chromatin remodeling is often, but not always, associated with the repression of cryptic transcripts. For unknown reasons, we also found that nucleosomes are more highly phased around cryptic TSSs in Δisw2 trf4 than in Δisw2 rrp6 (Fig. and data not shown). Finally, an increased level of cryptic transcripts in Δisw2 trf4 was also observed at non-Isw2 targets (Fig. ). However, the change in transcription, from the baseline to the peak of the cryptic transcript, was lower at nontargets than at targets. These data suggest that indirect effects of Isw2 on ncRNA transcription at nontargets do exist but that these effects tend to be smaller than the direct effects at Isw2 targets.
5′ rapid amplification of cDNA ends (5′-RACE) was then performed at select loci to verify the cryptic transcripts identified by microarray analysis and to map TSS locations (Fig. ). All three loci tested from a Δisw2 trf4 strain revealed capped transcripts with multiple TSSs that initiated from the edges of an NFR (within 150 bp) specifically targeted by Isw2. These data are consistent with previous studies that mapped the TSSs of cryptic transcripts (45, 66) and further confirm the role of Isw2-dependent chromatin remodeling in the repression of cryptic ncRNA transcription around NFRs.
Transcriptional interference. There are several potential biological reasons for the degradation or repression of cryptic transcripts by the exosome or Isw2. These include, but are not limited to, conservation of resources for transcription and translation, prevention of abnormal protein synthesis, and alleviation of transcriptional interference. In particular, recent reports in S. cerevisiae have demonstrated examples of transcriptional interference in the regulation of coding transcription (27, 39, 40). It is currently unknown how frequently transcriptional interference occurs on a global scale. Thus, to identify a potential biological role for the repression of ncRNA by Isw2 or the exosome, we identified genes whose mRNA levels were decreased when the levels of directly overlapping cryptic sense (sense-sense transcriptional interference) or antisense (antisense-sense transcriptional interference) transcripts were increased in each mutant strain analyzed.
Potential antisense-sense transcriptional interference loci were identified in all strains analyzed, totaling 1 locus in the Δisw2 versus WT comparison, 36 loci in Δtrf4 versus WT, 19 loci in Δrrp6 versus WT, 5 loci in Δisw2 trf4 versus Δtrf4, and 3 loci in Δisw2 rrp6 versus Δrrp6 (one example locus is displayed in Fig. A). In contrast, potential sense-sense transcriptional interference loci were less frequent, totaling nine loci in the Δtrf4 versus WT comparison and one locus in the Δisw2 rrp6 versus Δrrp6 comparison (one example locus is displayed in Fig. ). Closer inspection revealed that many of the increased cryptic transcripts directly overlap the TSS of the repressed mRNA, totaling 1 locus in Δisw2 versus WT, 12 loci in Δtrf4 versus WT, 5 loci in Δrrp6 versus WT, 3 loci in Δisw2 trf4 versus Δtrf4, and 1 locus in Δisw2 rrp6 versus Δrrp6 for antisense-sense transcriptional interference and 9 loci in Δtrf4 versus WT and 1 locus in Δisw2 rrp6 versus Δrrp6 for sense-sense transcriptional interference. These data suggest the possibility that transcriptional interference may be caused by cryptic RNA transcription through the promoter of a gene, underscoring the importance of Isw2 and the exosome in the repression of ncRNA. Considering that our microarrays represent ~14% of the genome, both sense-sense and antisense-sense transcriptional interference likely occur at a high frequency throughout the genome. For example, assuming that chromosomes III, VI, and XII accurately represent the global picture, we estimate that ~60 sense-sense and ~250 antisense-sense transcriptional interference loci are present genome-wide in a Δtrf4 mutant alone.
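The genome-wide estimates follow from dividing the observed locus counts by the fraction of the genome covered by the arrays. A minimal sketch of that arithmetic, using the counts and the ~14% coverage quoted above (the function name is ours, and uniform density across the genome is the same assumption the text makes):

```python
def extrapolate_genome_wide(observed_loci: int, fraction_covered: float) -> float:
    """Scale an observed count to the whole genome, assuming uniform density."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return observed_loci / fraction_covered


ARRAY_COVERAGE = 0.14  # chromosomes III, VI, and XII, ~14% of the genome

# Δtrf4 versus WT comparison: 9 sense-sense and 36 antisense-sense loci.
sense_sense = extrapolate_genome_wide(9, ARRAY_COVERAGE)
antisense_sense = extrapolate_genome_wide(36, ARRAY_COVERAGE)
print(f"sense-sense: ~{sense_sense:.0f}, antisense-sense: ~{antisense_sense:.0f}")
```

The unrounded values are about 64 and 257, consistent with the rounded figures of ~60 and ~250 given in the text.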