The recent explosion in the number of genome-wide datasets has greatly increased our appreciation of transcriptome complexity and regulation, particularly the roles of polymerase distribution, intergenic regulatory elements and non-coding RNAs. Here we study transcriptional output in erythroid cells by sequencing nuclear RNA (nucRNA-Seq) and chromatin bound by active RNA polymerase II (RNAPII). We show that nucRNA-Seq identifies mainly unspliced primary transcripts and differs significantly from poly(A)-enriched RNA-Seq. We then investigated the relationship between RNAPII occupancy and nucRNA output. We also identified RNAPII-associated intergenic regions of the genome that have the characteristics of regulatory regions, and discovered novel, stable, nuclear-retained lncRNAs expressed in adult erythroid cells.
Our observations show that the overall level of RNAPII occupancy is a poor predictor of expression for most transcription units. Only very highly expressed RNAPII-transcribed genes show a correlation between RNAPII association and transcriptional output. These results suggest that polymerase occupancy is only one of potentially many factors influencing the level of transcription from chromatin templates. Peaks of RNAPII in promoter-proximal regions have been suggested to represent paused polymerase and to correlate with lower expression. Our analysis confirmed these observations: RNAPII peaks at the 5′ end of genes generally correlated with lower expression. Furthermore, our results show that genes displaying RNAPII peaks at their 3′ ends are also poorly expressed. We also observed genes with RNAPII peaks within the gene body, suggesting that other pause sites exist that may impede transcription. It remains to be determined whether these 3′ and internal RNAPII peaks actually represent engaged, paused polymerase. Consistent with these sites being pausing locations, a study in S. cerevisiae mapped the 3′ ends of nascent transcripts using NET-Seq (native elongating transcript sequencing) and identified numerous pause sites within genes.
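The contrast between all genes and only the most highly expressed genes can be stated as two rank correlations between RNAPII occupancy and expression. The sketch below is illustrative only. The choice of Spearman's rho, the hypothetical `top_fraction` cutoff and all function names are assumptions, not the analysis used in this study. It relies only on the standard library so that tie handling is explicit.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tiered_correlation(occupancy: Sequence[float], expression: Sequence[float],
                       top_fraction: float = 0.1) -> tuple[float, float]:
    """Return (rho over all genes, rho over the most highly expressed genes).

    top_fraction is a hypothetical cutoff; at least 3 genes are kept in the top tier.
    """
    n = len(expression)
    if n != len(occupancy) or n < 3:
        raise ValueError("need >= 3 genes with paired values")
    k = min(n, max(3, int(n * top_fraction)))
    top = sorted(range(n), key=lambda i: expression[i], reverse=True)[:k]
    rho_all = spearman(occupancy, expression)
    rho_top = spearman([occupancy[i] for i in top], [expression[i] for i in top])
    return rho_all, rho_top
```

Consider a toy input in which occupancy tracks expression only among the three most highly expressed of ten genes. In that case the all-gene rho is close to zero (1/15), while the top-tier rho is 1.0. This illustrates how a correlation confined to highly expressed genes can be diluted in a genome-wide comparison.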
We also found that accumulation of polymerase at the 5′ end of genes is not always associated with lower expression. In particular, genes featuring both 5′ and 3′ RNAPII peaks are more efficiently transcribed than genes with either peak alone. In these “double RNAPII peak” genes, the 5′ and 3′ peaks may reflect a chromatin-chromatin interaction between the two regions that allows both to be captured in the RNAPII pull-down. Gene loop interactions between the promoter and 3′ end of inducible genes in S. cerevisiae have been associated with more rapid induction of transcription. Our finding that genes displaying both 5′ and 3′ peaks of RNAPII are more efficiently transcribed suggests that similar gene loop interactions could occur at selected genes in higher eukaryotes. Such interactions may contribute to increased gene expression.
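One way to assign genes to the peak-position groups discussed above is to compare each RNAPII peak summit with the gene's strand-aware transcription start site (TSS) and 3′ end. This is a hypothetical sketch. The 1 kb `window`, the class labels, the precedence rules and the `Gene` record are all assumptions; this section does not specify the study's own peak calling or window definitions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Gene:
    name: str
    start: int        # leftmost coordinate of the gene body
    end: int          # rightmost coordinate of the gene body
    strand: str       # "+" or "-"
    expression: float


def peak_class(gene: Gene, summits: list[int], window: int = 1000) -> str:
    """Classify a gene by where its RNAPII peak summits fall.

    window is a hypothetical distance (bp) around the TSS or 3' end. A summit
    near both ends (very short genes) counts as 5' only, and any 5' or 3'
    peak takes precedence over internal peaks.
    """
    tss, tes = (gene.start, gene.end) if gene.strand == "+" else (gene.end, gene.start)
    has_5p = has_3p = has_internal = False
    for s in summits:
        if abs(s - tss) <= window:
            has_5p = True
        elif abs(s - tes) <= window:
            has_3p = True
        elif gene.start < s < gene.end:
            has_internal = True
    if has_5p and has_3p:
        return "double"
    if has_5p:
        return "5prime"
    if has_3p:
        return "3prime"
    return "internal" if has_internal else "none"


def median_expression_by_class(genes: list[Gene], summits_by_gene: dict[str, list[int]],
                               window: int = 1000) -> dict[str, float]:
    """Group genes by peak class and report the median expression of each group."""
    groups: dict[str, list[float]] = {}
    for g in genes:
        cls = peak_class(g, summits_by_gene.get(g.name, []), window)
        groups.setdefault(cls, []).append(g.expression)
    return {cls: median(vals) for cls, vals in groups.items()}
```

Swapping TSS and 3′ end on the minus strand matters here. Without it, a promoter-proximal peak on a minus-strand gene would be misread as a 3′ peak.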
Long-range chromatin interactions are known to occur between regulatory regions and active genes. Our RNAPII ChIP-seq data identified intergenic regions bound by RNAPII, erythroid-expressed TFs and p300. This approach reveals regulatory regions by virtue of their TF binding properties. It also potentially identifies the subset of regulatory regions that are physically associated with transcribing genes and are therefore immunoprecipitated with the anti-RNAPII antibody. In agreement with this possibility, a subset of neuronal enhancers is bound by RNAPII. However, in contrast to the neuronal study, we failed to detect enhancer-associated RNAs in our dataset. We presume that these RNAs were not captured in our library preparation because of their size, stability or abundance. HS2 of the human HBB LCR has been shown to have promoter activity, and the entire LCR region is transcribed. The mouse LCR likely has similar properties, yet nucRNA-Seq detected no significant levels of nucRNA in this region. This suggests that these transcripts are of relatively low abundance compared with the rest of the nuclear transcriptome. We cannot distinguish whether RNAPII is present at these regulatory regions because of their close association with the highly active Hbb gene, because of synthesis of short-lived LCR ncRNA, or both. A previous study identified LCR transcripts and showed that RNAPII is present at the LCR in mouse embryonic stem cells, which do not express any of the Hbb genes. This suggests that the LCR recruits RNAPII independently of, and prior to, Hbb gene transcription.
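Selecting intergenic RNAPII peaks that are co-bound by transcription factors and p300 amounts to a set of interval-overlap tests. The following is a minimal sketch that assumes BED-style inputs (0-based, half-open intervals); the function names are hypothetical. For clarity it scans every interval. Genome-scale data would normally be handled with an interval index or a tool such as bedtools intersect.

```python
# (chrom, start, end) with 0-based, half-open coordinates, as in BED files
Interval = tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlaps_any(query: Interval, others: list[Interval]) -> bool:
    return any(overlaps(query, o) for o in others)


def intergenic_cobound_peaks(pol2_peaks: list[Interval], gene_bodies: list[Interval],
                             tf_peaks: list[Interval],
                             p300_peaks: list[Interval]) -> list[Interval]:
    """Keep RNAPII peaks outside annotated genes that also overlap a TF peak and a p300 peak."""
    return [
        peak for peak in pol2_peaks
        if not overlaps_any(peak, gene_bodies)
        and overlaps_any(peak, tf_peaks)
        and overlaps_any(peak, p300_peaks)
    ]
```

Requiring overlap with both a TF peak and a p300 peak is one plausible reading of "bound by RNAPII, erythroid-expressed TFs and p300". A looser definition would accept either signal.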
By sequencing the nuclear RNA pool we were able to identify stable, nuclear-retained lncRNAs. These RNA species were enriched in the nuclear fraction, and many are present at low levels. They are likely to be missed by approaches that isolate total RNA, because the cytoplasmic RNA pool is larger than the nuclear pool. Our set showed only limited overlap with existing sets of lncRNAs identified from total RNA. This indicates that by isolating nuclear RNA we identified novel nuclear-retained transcripts that are masked by the cytoplasmic pool in other RNA-Seq studies. In support of this, the 12 candidates we investigated further were found almost exclusively in the nuclear fraction. Note that because our approach excludes candidates that overlap annotated genes, it overlooks antisense and gene-overlapping lncRNAs. Such RNAs are nonetheless readily apparent on inspection; the Kcnq1ot1 transcript is one example (Figure S12). Future experiments using strand-specific methods will help to annotate this part of the nuclear transcriptome further. The nuclear-retained non-coding transcripts we identified are relatively stable. They also show lower association with RNAPII than protein-coding genes expressed at similar levels (they fall in the T sub-group). They would therefore be less easily identified by genome-wide techniques that detect nascent transcripts, such as GRO-Seq, NET-Seq and genome-wide nuclear run-on assays.
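A small filter that optionally compares strands shows why an annotation-overlap filter discards antisense lncRNAs such as Kcnq1ot1, and why strand-specific data would help. This is a hypothetical sketch with toy coordinates. `Feature`, `filter_candidates` and the `strand_aware` flag are illustrative names, not the pipeline used here.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int    # 0-based, half-open (BED-style)
    end: int
    strand: str   # "+" or "-"; ignored when strand_aware is False


def overlaps(a: Feature, b: Feature, strand_aware: bool) -> bool:
    """True if a and b share >= 1 bp (and, optionally, the same strand)."""
    same_locus = a.chrom == b.chrom and a.start < b.end and b.start < a.end
    if strand_aware:
        return same_locus and a.strand == b.strand
    return same_locus


def filter_candidates(candidates: list[Feature], annotated: list[Feature],
                      strand_aware: bool = False) -> list[Feature]:
    """Drop lncRNA candidates that overlap any annotated gene (hypothetical filter)."""
    return [
        c for c in candidates
        if not any(overlaps(c, g, strand_aware) for g in annotated)
    ]


# Toy example: one annotated gene on the + strand
gene = Feature("chr7", 1_000, 5_000, "+")
antisense = Feature("chr7", 2_000, 3_000, "-")     # overlaps gene, opposite strand
sense = Feature("chr7", 2_500, 2_800, "+")         # overlaps gene, same strand
intergenic = Feature("chr7", 10_000, 12_000, "+")

# Unstranded filtering keeps only the intergenic candidate.
print(filter_candidates([antisense, sense, intergenic], [gene]))
# Strand-aware filtering also keeps the antisense candidate.
print(filter_candidates([antisense, sense, intergenic], [gene], strand_aware=True))
```

With `strand_aware=False`, which is the only option for unstranded libraries, an antisense candidate is removed together with genuinely sense-overlapping ones. With strand information, only same-strand overlaps are removed.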
The accurate and thorough characterization of transcriptional output is an important step toward understanding the regulatory environment in which gene expression occurs in a particular cell type or induced state. Sequencing the nuclear transcriptome reveals the relative levels of primary transcripts and also identifies novel nuclear-retained lncRNAs not detected in total RNA-Seq studies. In this study we have presented a detailed description of the nuclear transcriptome in erythroid cells. The methods described here could, however, be applied to any cell type or state, including disease, experimentally perturbed states and cell fate changes.