Detection of short RNAs associated with the 5′ end of genes
We designed a DNA microarray to identify short RNAs at the 5′ end of human genes. RNA from primary human CD4+ T-cells was fractionated by size into short (<200 nt) and large RNAs, labeled with fluorescent dyes and hybridized to the microarrays in replicate experiments (Figure and S1). The arrays contained multiple probes allowing detection of sense-strand RNA at the promoter, 5′-most exons, 1st intron and 3′-most exons of protein-coding genes (Supplemental Data and Table S1). These arrays also contained probes for known short nuclear (sn)RNA and short nucleolar (sno)RNA that acted as positive controls (Figures and S1).
Detection of promoter-associated short RNAs in primary human T-cells
We investigated whether we could detect short RNAs transcribed from protein-coding genes. Applying a threshold derived from our snRNA probes, we found thousands of short RNAs transcribed from the sense-strands of promoters, exons and introns of protein-coding genes in replicate experiments (, Table S2). The RNAs were concentrated at the 5′ end of genes, within 700 bp upstream and downstream of the mRNA TSS (). A substantial subset of the genes associated with short RNAs in primary human CD4+ T-cells also produced short RNAs in cell lines and ES cells (Kapranov et al., 2007; Seila et al., 2008) (), although often from different locations (Figure S1). These results indicate that short RNAs are generated in primary human T-cells, are most abundant in promoter-proximal sequences, and are a general feature of the transcriptome of normal somatic cells.
Transcription of short RNAs from genes that are otherwise repressed
The short RNAs described in previous reports are associated with transcriptionally active genes but not with repressed polycomb target genes (Affymetrix/CSHL ENCODE, 2009; Core et al., 2008; Kapranov et al., 2007; Seila et al., 2008; Taft et al., 2009). We therefore examined the transcriptional status of T-cell genes that produce short RNAs. We found that short RNAs could be detected both at genes that produced mRNA transcripts and at genes that did not (). Although the mRNA signals are quite different between these two sets of genes, the short RNA signals are similar (). This indicates that the transcription of short RNAs can occur in the absence of mRNA transcription.
Transcriptional status of short RNA genes
We next used ChIP-Chip to determine whether RNA pol II was present at genes that transcribe short RNAs but not mRNA (). The antibody used preferentially recognises the non-phosphorylated form of RNA pol II that is recruited to chromatin (Lee et al., 2006). We plotted average enrichment for RNA pol II across the TSS of genes that transcribe short RNA but not mRNA and compared this to genes that transcribe neither RNA type. We found that RNA pol II was enriched at genes that produce short RNAs but not mRNA. This enrichment of RNA pol II at these genes was also apparent using data from independent ChIP-Seq experiments (Barski et al., 2007) (Figure S2).
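The averaging step behind such a profile can be illustrated with a minimal sketch. It is not the analysis code used in this study: the function name `average_profile`, the input layout, the 4 kb window and the 500 bp bin width are all assumptions chosen for illustration. Each probe is expressed as a strand-aware offset from its gene's mRNA TSS, and enrichment values are averaged per bin across the gene set.

```python
from collections import defaultdict

def average_profile(probes, tss_by_gene, strand_by_gene, window=4000, bin_size=500):
    """Average enrichment in fixed-width bins relative to each gene's TSS.

    probes: iterable of (gene, genomic_position, enrichment) tuples.
    window and bin_size are illustrative defaults, not values from the study.
    Returns {bin_start_offset: mean_enrichment}, ordered 5' to 3'.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for gene, position, value in probes:
        offset = position - tss_by_gene[gene]
        if strand_by_gene[gene] == "-":
            offset = -offset  # upstream is always negative, whatever the strand
        if not -window <= offset < window:
            continue  # probe lies outside the plotted region
        bin_index = offset // bin_size  # floor division keeps negative bins aligned
        sums[bin_index] += value
        counts[bin_index] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}

# Toy example: one plus-strand and one minus-strand gene (hypothetical values).
toy_probes = [("A", 1100, 2.0), ("A", 900, 1.0), ("B", 4900, 4.0), ("B", 5100, 3.0)]
toy_profile = average_profile(toy_probes, {"A": 1000, "B": 5000}, {"A": "+", "B": "-"})
```

Computing one such profile for genes that transcribe short RNA but not mRNA, and another for genes that transcribe neither, gives the two curves that are compared.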
At genes that produce short RNAs but not mRNA, the location of the short RNA correlates with the position of the initiating form of RNA pol II. At the set of genes where short RNA was detected at promoter regions, the RNA pol II peak was shifted upstream of the mRNA TSS (Figures and S2). At the set of genes where short RNA was detected at exons or introns, RNA pol II occupancy was enriched downstream of the mRNA TSS. The positioning of RNA pol II at the site of short RNA production, rather than the mRNA TSS, could also be observed at individual genes (). At genes that produce both short RNA and mRNA, RNA pol II was concentrated at the mRNA TSS (Figure S2). These results argue that the RNA pol II observed at repressed genes is primarily involved in the transcription of short RNA and that short RNAs can be produced from TSS distinct from the mRNA TSS.
We next examined the transcriptional status of RNA pol II at short RNA loci. We used ChIP-Chip to map H3K4me3, a marker of transcriptional initiation, and H3K79me2, a marker of transcription through a gene (Guenther et al., 2007; Steger et al., 2008). Looking across all human genes, we found that H3K79me2 showed comparable enrichment to H3K4me3 (). In contrast, genes that produced short RNAs but not mRNAs were enriched for H3K4me3 but not H3K79me2 ( and Figure S2). We conclude that these genes associated with short RNAs experience transcriptional initiation but the protein-coding portion of the gene is not transcribed.
Repressed genes producing short RNAs are targets for polycomb
Genes repressed by polycomb proteins are generally associated with nucleosomes containing H3K4me3, a marker of transcriptional initiation, and often RNA pol II (Azuara et al., 2006; Bernstein et al., 2006; Roh et al., 2006; Stock et al., 2007; Chopra et al., 2009). We therefore considered the possibility that genes that generate short transcripts but not mRNA could be associated with polycomb proteins. To test this, we used ChIP-Chip to measure H3K27me3 and compared the results to our short RNA data. We found that genes that transcribed short RNA but not mRNA were enriched for H3K27me3 (Figures and S3). H3K27me3 could be detected more frequently at genes that transcribe short RNA but not mRNA than at any other category of gene (), demonstrating the close association between short RNA transcription and H3K27 methylation. Sites of short RNA transcription were also commonly associated with CpG islands (), previously linked to polycomb recruitment (Ku et al., 2008).
Short RNA loci are associated with H3K27me3
Transcription of short RNAs from polycomb target genes would explain the association of these genes with H3K4me3. We therefore sought to verify that the transcriptional machinery was present at H3K27-methylated genes that express short RNAs. Plotting enrichment of H3K27me3, H3K4me3 and RNA pol II across these genes confirms that transcriptional initiation occurs at polycomb target genes in CD4+ T-cells (Figures and S3). Strikingly, H3K27-methylated nucleosomes flank the sites of RNA pol II and H3K4me3 occupancy, resembling the pattern of H3K27me3 around Drosophila polycomb response elements (Schwartz et al., 2006).
PRC2 targets developmental regulators that must be repressed to maintain cell identity (Barski et al., 2007; Boyer et al., 2006; Lee et al., 2006). We therefore asked whether the set of genes that expressed short RNAs in the absence of mRNA were enriched for genes that play a role in development. Using Gene Ontology (GO), we found that functional categories such as multicellular development and cell-cell signalling were significantly enriched in the set of genes from which short RNA was transcribed in the absence of mRNA (). Therefore, consistent with polycomb-mediated silencing, genes from which short RNA is transcribed in the absence of mRNA tend to have functions in development and cell differentiation.
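A common way to score over-representation of a GO category in a gene set is a one-sided hypergeometric test. The sketch below illustrates that calculation only; the text does not state which statistical test or software was used for the GO analysis, so the choice of test, the function name and the toy numbers are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, genes_in_category, selected_genes, selected_in_category):
    """One-sided hypergeometric p-value for over-representation.

    Returns the probability of seeing at least `selected_in_category` category
    members when `selected_genes` genes are drawn without replacement from
    `total_genes`, of which `genes_in_category` belong to the category.
    """
    upper = min(genes_in_category, selected_genes)
    denominator = comb(total_genes, selected_genes)
    tail = sum(
        comb(genes_in_category, k)
        * comb(total_genes - genes_in_category, selected_genes - k)
        for k in range(selected_in_category, upper + 1)
    )
    return tail / denominator

# Toy example (hypothetical numbers): 10 genes on the array, 5 in the category,
# 5 genes in the selected set, all 5 of which fall in the category.
p_toy = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice one p-value is computed per GO category and the resulting set is corrected for multiple testing before categories are reported as enriched.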
Short RNAs transcribed from polycomb target genes are ~50-200 nt in length
We next used Northern blotting to verify the transcription of short RNAs from polycomb target genes and to perform a more accurate determination of their size. We selected array probes that detected short RNAs at genes for which no mRNA could be detected and that were also associated with H3K27me3. We then purified short RNA from peripheral blood mononuclear cells (PBMC), treated it with DNase and performed Northern blotting for short RNAs (). Of the 22 probes we tested over the course of this study, 17 (77%) detected a short RNA species (Table S3). Some probes identified a single RNA species of between ~50 and 200 nt, while others detected multiple RNA products. Short RNA transcripts could be detected from exons, introns and promoter regions, consistent with our array data.
Short RNA transcription is not dependent on polycomb activity
We considered that short RNA production might be a by-product of polycomb activity and thereby dependent on H3K27 methylation. To test this, we made use of the murine ES cell line Ezh2-1.3, in which deletion of the Ezh2 SET domain can be induced by tamoxifen (Pereira et al., unpublished data). We first sought to establish if short RNAs are transcribed from polycomb target genes in ES cells. We identified short RNA loci conserved between human and mouse and used histone methylation data (Boyer et al., 2006) to identify those targeted by polycomb in ES cells. Northern blotting detected a short RNA at each of these genes in murine ES cells (), indicating that the transcription of short RNAs from repressed polycomb target genes is common to different cell types and involves loci that are conserved between different mammalian species.
Transcription of short RNAs in murine ES cells deficient for Ezh2 and Ring1
We next measured short RNA transcription in Ezh2-1.3 cells at timepoints after the addition of tamoxifen. Loss of full-length Ezh2 protein (together with the appearance of a truncated form lacking the SET domain) and a loss of H3K27me3 could be observed in these cells over the five-day timecourse (). Blotting for short RNAs at the genes Hes5, Msx1 and Ybx2 showed that loss of Ezh2 and H3K27me3 had no effect on the levels of short RNAs at these genes (). Although we did not observe an increase in mRNA expression from these genes upon Ezh2 deletion (Figure S4), our results show that the production of short RNAs at polycomb target genes is not dependent on H3K27 methylation.
The block to mRNA transcription at bivalent genes is dependent on the PRC1 component Ring1, which catalyzes the ubiquitination of histone H2A. Deletion of Ring1 causes activation of PRC1 target genes, including Msx1 (Stock et al., 2007; Figure S4). We blotted for short RNAs in the murine ES cell line ES-ERT2 with and without addition of tamoxifen, which induces deletion of Ring1b and loss of H2AK119ub (Stock et al., 2007). As we found for deletion of Ezh2, loss of Ring1b had no effect on short RNA transcription (). These data show that transcription of short RNAs is not dependent on H2A ubiquitination and, taken together with the Ezh2 deletion experiments, indicate that short RNA transcription is independent of polycomb activity.
Short RNAs encode stem-loop structures and interact with PRC2
If short RNAs are not a product of polycomb-mediated gene silencing they might instead act upstream. Supporting this hypothesis, the long ncRNAs HOTAIR and Xist RepA interact with PRC2 and this interaction is necessary for H3K27me3 of HOXD and the X-chromosome, respectively (Rinn et al., 2007; Zhao et al., 2008). We therefore considered the possibility that PRC2 may interact with the short ncRNAs identified here.
The PRC2 binding site within mouse Xist RepA appears to be a double stem-loop structure that is repeated 7 times (Zhao et al., 2008). We therefore first examined whether the short RNAs we have identified could form such a structure. We derived a general structural motif based on the RepA sequence and searched for the presence of this motif in the DNA sequences immediately surrounding probes that detect short RNAs from H3K27-methylated genes (). We found that 71% of these sequences encode this PRC2-binding structure, compared to 36% of control sequences not associated with short RNAs but with an equal distribution around the mRNA TSS. Examples of these structures are given in Figures and S5. These data indicate that short RNAs transcribed from polycomb target genes have the potential to interact with PRC2. However, we found that these structures were not limited to short RNAs transcribed from polycomb-associated genes and that they were also present within short RNAs transcribed from genes not associated with H3K27me3 (). These RNAs may therefore also have the potential to bind PRC2.
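To make the idea of a sequence-based search for stem-loops concrete, the following is a deliberately simplified sketch. It is not the RepA-derived motif used in this study, whose definition is not given in this section: the stem length, loop-size range, permitted G-U wobble pairs, maximum spacing between the two hairpins and both function names are assumptions for illustration only.

```python
# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, end, loop_length) for every perfect hairpin in seq.

    A hairpin is `stem` consecutive base pairs closing a loop of
    min_loop..max_loop unpaired bases; `end` is exclusive.
    DNA input is converted to RNA (T -> U).
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for start in range(len(rna)):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem + loop
            if end > len(rna):
                break  # longer loops cannot fit either
            if all((rna[start + i], rna[end - 1 - i]) in PAIRS for i in range(stem)):
                hits.append((start, end, loop))
    return hits

def has_double_stem_loop(seq, max_gap=10, **hairpin_options):
    """True if two non-overlapping hairpins lie within max_gap bases of each other."""
    hits = find_hairpins(seq, **hairpin_options)
    return any(
        0 <= second_start - first_end <= max_gap
        for _, first_end, _ in hits
        for second_start, _, _ in hits
    )

single = "GGGGAAAACCCC"
double = single + "AA" + single
```

Applying a predicate of this kind to the sequences around short RNA probes and to matched control sequences yields the two fractions that are compared. A realistic search would instead use a thermodynamic folding model and tolerate mismatches and bulges.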
PRC2 interacts with short RNAs in vitro
To test for an interaction between short RNAs encoding the stem-loop structure and PRC2, we performed electrophoretic mobility shift assays (EMSA). Consistent with recent observations (Zhao et al., 2008), we found that incubation of T-cell lysate with radiolabeled RNA oligonucleotides encoding the Xist-RepA stem-loop produced a mobility shift and that mutation of the RNA stem-loop structure abolished this interaction (). We then repeated the experiment with RNA oligonucleotides corresponding to the stem-loops encoded by BSN, C20orf112, HEY1, MARK1 and PAX3 short RNAs. We observed a similar shift with RNA oligonucleotides corresponding to these short RNA stem-loop structures as we did for Xist-RepA and did not observe an interaction when the BSN short RNA structure was disrupted (). The BSN short RNA stem-loop was also able to compete with Xist-RepA for binding (), indicating that the interaction between PRC2 and Xist-RepA is similar to that between PRC2 and short RNAs.
We next sought to identify which PRC2 component was responsible for the interaction with short RNAs. We purified GST-tagged recombinant SUZ12, EZH2, EED and RBBP4 from E. coli and incubated each protein with radiolabeled Xist-RepA and BSN oligonucleotides (). We found that SUZ12 interacted strongly with Xist-RepA and BSN short RNA stem-loops. SUZ12 also interacted with other short RNA stem-loops (), displayed a weaker interaction with mutated Xist-RepA RNA () and did not bind to the BSN sequence encoded by single-stranded DNA, double-stranded DNA or DNA:RNA duplexes (). These data demonstrate that SUZ12 specifically interacts with RNA stem-loop structures encoded by Xist-RepA and short RNAs.
PRC2 interacts with short RNAs in cells
We next sought to verify that the interaction between PRC2 and short RNA occurred in living cells. We immunoprecipitated SUZ12 from a female T-cell line, isolated co-purifying RNA and subjected this to quantitative reverse-transcription (RT)-PCR (). Immunoprecipitation with the SUZ12 antibody, but not a non-specific rabbit control antibody, enriched for Xist RNA over Actin mRNA and mRNAs encoding GAPDH and HPRT, indicating we could specifically detect PRC2-RNA interactions in these cells. We then performed RT-PCR for short RNAs transcribed from polycomb target genes. We found that 4 of the 5 short RNAs we tested were enriched by SUZ12 IP, although to a lesser extent than Xist. These short RNAs also possessed a stem-loop structure (Figure S5). Amplification of short RNAs was not observed in control reactions lacking RT and enrichment was maintained under stringent wash conditions (Figure S6). We also examined short RNAs transcribed from genes that were not associated with H3K27me3. We found that these were also often enriched by SUZ12 IP, consistent with the presence of stem-loop structures in these transcripts (). Therefore, PRC2 interacts with short promoter-associated RNA but this interaction does not necessarily produce detectable levels of H3K27me3. The snRNAs U1, U2 and U3 and the short structured RNA pol III transcripts 7SK and 5S ribosomal RNA were not enriched by SUZ12 IP, demonstrating that enrichment of short RNA was specific to the set transcribed from the 5′ ends of genes.
PRC2 interacts with short RNA in cells
PRC2-binding RNA stem-loops cause gene repression in cis
Our data suggest a model in which short RNAs transcribed at the 5′ end of polycomb target genes interact with PRC2, stabilising its interaction with chromatin in cis. This model suggests that addition of sequences encoding PRC2-binding stem-loop structures to the 5′ end of a gene would lead to ectopic PRC2 binding and consequent H3K27 methylation and gene repression. To test the model, we introduced sequences encoding PRC2-binding stem loops into the HIV long terminal repeat (LTR). The R portion of the LTR located immediately downstream of the TSS encodes the structured transactivation response element (TAR) RNA that is transcribed as a short RNA and as the 5′ UTR of HIV mRNA transcripts. We replaced the R and U5 portions of the LTR with the Xist-RepA stem-loop or the stem-loop from the short RNA at C20orf112, which also interacts with PRC2 (Figures and ), and cloned the luciferase coding region downstream ().
We first examined whether the incorporation of these PRC2-binding RNA stem-loops into the HIV LTR would allow binding by PRC2. The different constructs were transfected into HeLa cells and the enrichment of luciferase RNA in SUZ12 IP material relative to input RNA was measured by quantitative RT-PCR (). We found that addition of Xist-RepA and short RNA stem-loops resulted in the specific enrichment of luciferase RNA in the SUZ12 IP fraction.
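Enrichment in IP material relative to input is commonly derived from quantitative RT-PCR threshold cycles (Ct) with the percent-of-input calculation sketched below. This illustrates the general arithmetic rather than the exact calculation used here: the function names, the assumption of 100% amplification efficiency (one doubling per cycle) and the toy Ct values are not taken from the text.

```python
from math import log2

def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input RNA recovered in the IP, from qRT-PCR Ct values.

    input_fraction is the share of the lysate kept as input (e.g. 0.1 for 10%).
    Assumes perfect doubling per cycle, which real assays only approximate.
    """
    # Shift the input Ct to what it would be had 100% of the lysate been assayed.
    adjusted_input_ct = ct_input - log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)

def fold_over_control(target_ip, target_input, control_ip, control_input, input_fraction):
    """Enrichment of a target RNA relative to a control RNA in the same IP."""
    return (percent_input(target_ip, target_input, input_fraction)
            / percent_input(control_ip, control_input, input_fraction))

# Toy Ct values (hypothetical): the target is recovered 3 cycles earlier than the control.
toy_fold = fold_over_control(25.0, 25.0, 28.0, 25.0, input_fraction=0.1)
```

Comparing the value for a stem-loop construct with that for the wild-type or mutated construct, and with a control antibody IP, indicates whether recovery is specific.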
We next tested whether incorporation of PRC2-targeting RNA stem-loops affects expression of the luciferase gene (). We found that there was a significant drop (p<0.05) in luciferase activity in cells transfected with constructs containing wild-type Xist-RepA or short RNA stem-loops. Furthermore, mutation of the stem-loop sequences restored luciferase activity.
Finally, we asked whether incorporation of PRC2-binding RNA stem-loops caused an increase in H3K27 methylation. We performed ChIP for H3K27me3 and total H3 and measured enrichment of luciferase construct DNA by quantitative PCR (). We found that the constructs encoding PRC2-binding RNA stem-loops showed a small increase in H3K27 methylation compared to the wild-type LTR construct. This increase in H3K27me3 was not observed with constructs in which the RNA stem-loops had been mutated. Taken together with our other experiments, these data show that the incorporation of Xist-RepA and short RNA stem-loop sequences at the 5′ end of genes allows PRC2 binding, H3K27me3 and gene repression in cis.
Short RNAs are lost from polycomb target genes active in other cell types
Differential H3K27 methylation allows for cell-type specific expression of developmental regulators and this underlies the differing identities of specialized cell types. We hypothesised that short RNAs may be specifically depleted from polycomb target genes that are derepressed in other cell types. To address this, we examined short RNA expression in the neuronal cell line SH-SY5Y. We chose neuronal cells because our GO analysis () revealed that many genes that express short RNAs in T-cells play roles in neuronal development. We used gene expression and functional data to select polycomb target genes that are repressed in T-cells and active in neuronal tissue (Figure S7). We then purified short RNAs from both PBMC and the neuronal cultures and blotted for short RNAs identified at these genes in T-cells (). We found that the increased expression of mRNA from the genes FOXN4, HEY1, MARK1, NKX2-2 and BSN in the neuronal cells was accompanied by a reduction in the expression of short RNAs. In contrast, short RNAs transcribed from YBX2, a gene expressed only in germ cells, or from the thyroid and lung-specific NKX2-1, were present equally in PBMC and neuronal cells. These results show that high levels of short RNA transcription are specific to polycomb target genes silent in a given cell type.
Loss of short RNAs from polycomb target genes activated in neuronal cells
We next asked whether activation of polycomb target genes during differentiation of ES cells into specialised cell types is accompanied by loss of short RNAs. To test this, we differentiated murine ES cells over a period of 4 days, through embryoid bodies and neural precursor cells to precursor motor neurons (Wichterle et al., 2002). During this process there is an increase in the expression of mRNA encoding the neuronal proteins Hes5 and Pcdh8 (). Blotting for short RNAs at these genes reveals that activation of Hes5 is accompanied by a progressive drop in the levels of the ~190 nt Hes5 and ~100 nt Pcdh8 short RNAs (). As the levels of the ~190 nt Hes5 short RNA decrease, there is a concomitant increase in the levels of shorter species, implying that the longer RNA is degraded. We conclude that derepression of polycomb target genes during ES cell differentiation is accompanied by a decrease in short RNA transcripts.