Long intergenic noncoding RNAs (lincRNAs) are key regulators of chromatin state, yet the nature and sites of RNA-chromatin interaction are mostly unknown. Here we introduce Chromatin Isolation by RNA Purification (ChIRP), where tiling oligonucleotides retrieve specific lincRNAs with bound protein and DNA sequences, which are enumerated by deep sequencing. ChIRP-seq of three lincRNAs reveals that RNA occupancy sites in the genome are focal, sequence-specific, and numerous. Drosophila roX2 RNA occupies male X-linked gene bodies with increasing tendency toward the 3’ end, peaking at CES sites. Human telomerase RNA TERC occupies telomeres and Wnt pathway genes. HOTAIR lincRNA preferentially occupies a GA-rich DNA motif to nucleate broad domains of Polycomb occupancy and histone H3 lysine 27 trimethylation. HOTAIR occupancy occurs independently of EZH2, suggesting the order of RNA guidance of Polycomb occupancy. ChIRP-seq is generally applicable to illuminate the intersection of RNA and chromatin with newfound precision genome-wide.
Long noncoding RNAs are key regulators of chromatin states for important biological processes such as dosage compensation, imprinting, and developmental gene expression (Kelley et al., 1999; Koziol and Rinn, 2010; Mercer et al., 2009; Pandey et al., 2008; Rinn et al., 2007; Wang et al., 2011; Zhao et al., 2008). The recent discovery of thousands of lincRNAs in association with specific chromatin modification complexes, such as Polycomb Repressive Complex 2 (PRC2) that mediates histone H3 lysine 27 trimethylation (H3K27me3), suggests broad roles for numerous lincRNAs in managing chromatin states in a gene-specific fashion (Khalil et al., 2009; Zhao et al., 2010). While some lincRNAs are thought to work in cis on neighboring genes, other lincRNAs work in trans to regulate distantly located genes. For instance, Drosophila ncRNAs roX1 and roX2 bind numerous regions on the X chromosome of male cells, and are critical for dosage compensation (Franke and Baker, 1999; Meller et al., 1997). However, the exact locations of their binding sites are not known at high resolution. Similarly, human lincRNA HOTAIR can affect PRC2 occupancy on hundreds of genes genome-wide (Gupta et al., 2010; Rinn et al., 2007; Tsai et al., 2010), but how specificity is achieved is unclear. LincRNAs can also serve as modular scaffolds to recruit the assembly of multiple protein complexes. The classic trans-acting RNA scaffold is the TERC RNA that serves as the template and scaffold for the telomerase complex (Zappulla and Cech, 2006); HOTAIR can also serve as a scaffold for PRC2 and a H3K4 demethylase complex (Tsai et al., 2010).
Prior studies mapping RNA occupancy at chromatin have revealed substantial insights (Carter et al., 2002; Nagano et al., 2008), but only at a single gene locus at a time. The occupancy sites of most lincRNAs are not known, and the roles of lincRNAs in chromatin regulation have been mostly inferred from the indirect effects of lincRNA perturbation. Just as chromatin immunoprecipitation followed by microarray or deep sequencing (ChIP-chip or ChIP-seq, respectively) has greatly improved our understanding of protein-DNA interactions on a genomic scale, here we introduce a strategy to map long RNA occupancy genome-wide at high resolution, and apply this strategy to illuminate the mechanisms of RNA-guided PRC2 localization.
We developed a method termed ChIRP that allows unbiased high-throughput discovery of RNA-bound DNA and proteins (Fig. 1A). Briefly, cultured cells were crosslinked in vivo, and their chromatin was extracted and homogenized. Biotinylated complementary oligonucleotides that tile the RNA of interest were hybridized to target RNAs, and the resulting complexes were isolated using magnetic streptavidin beads. Co-purified chromatin was eluted for protein, RNA, or DNA, which was then subjected to downstream assays for identification and quantitation.
A key feature of our approach is the use of tiling oligonucleotides, which we found to be critical for success. We initially attempted to capture a lncRNA with morpholino probes, which are high-affinity ribonucleotide analogues resistant to nuclease digestion. As lncRNAs are known to be highly structured (Tsai et al., 2010), we designed three morpholino probes against single-stranded portions of HOTAIR, as determined by prior high-throughput RNA secondary structure measurements (Kertesz et al., 2010; Fig. S1A). As a negative control we synthesized a morpholino probe that bore no sequence homology with any human RNA. We titrated a wide array of hybridization parameters, and consistently obtained the best results under high ionic strength (results not shown), higher hybridization temperature (Fig. S1B) and moderately denaturing conditions (Fig. S1C). However, with the 3-probe approach we could retrieve at most ~10% of HOTAIR RNA (Fig. S1C). Importantly, we found that the HOTAIR transcript was sheared into the size range of ~100 nt to 500 nt during sonication, a step necessary for the solubilization of chromatin (Fig. 1B). We suspected that the 3-oligo approach was ineffective at recovering all fragments of long RNAs such as HOTAIR, and indeed qRT-PCR primers targeting distinct regions of HOTAIR reported drastically different efficiencies of recovery from the same pull-down (~10-fold range, Fig. S1D). This raised a serious concern because functional domains of HOTAIR at its 5’ and 3’ ends (Tsai et al., 2010) could potentially be lost due to their distance from arbitrarily chosen probes. Moreover, without prior knowledge of the DNA-interacting domain within a lncRNA, it would be difficult to decide where to target a small number of oligonucleotide probes or insert an RNA aptamer to consistently retrieve a lncRNA of interest on chromatin.
To develop a method that is applicable to any lncRNA without prior knowledge of its secondary structure or functional domains, we decided to target all parts of HOTAIR equally. We were inspired by the technique of single molecule RNA fluorescent in situ hybridization (Fusco et al., 2004; Raj et al., 2008), where dozens of short oligonucleotide probes that tile the length of an RNA generate highly specific signals. Thus, we designed 48 complementary DNA oligonucleotides, each a 20-mer, that tiled the entire 2.2 kb length of HOTAIR (~50% tiled) (Fig. 1C). Sequences that have extensive complementarity to other sites in the genome or are repetitive were excluded (Methods). As a negative control, we designed a similar set of probes that targeted the LacZ mRNA, normally absent from human cells. With the tiling probes, we could pull down almost all HOTAIR RNA from chromatin (Fig. 1D), a substantial improvement over the 3-morpholino approach. HOTAIR probes did not retrieve GAPDH, nor did LacZ probes retrieve HOTAIR (Fig. 1D), demonstrating the specificity of the method. Furthermore, HOTAIR fragments were recovered with similar efficiency (<2-fold difference between different qRT-PCR primers, Fig. S1E), further demonstrating the strength of an unbiased targeting method.
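The tiling idea can be illustrated with a minimal sketch. This is not the online designer used in this study: the even spacing rule, the function names (`tile_probes`, `tiled_fraction`) and the absence of any homology or repeat filtering are all assumptions made purely for illustration.

```python
# Illustrative sketch only: evenly spaced antisense DNA probes along a target RNA.
# The real probe designer also filters for genomic homology and repeats.

def reverse_complement(seq):
    """Return the antisense DNA sequence of an RNA or DNA target."""
    comp = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def tile_probes(target, n_probes=48, probe_len=20):
    """Place n_probes antisense probes at evenly spaced starts along target.

    Returns a list of (start, probe_sequence) tuples, 0-based starts.
    """
    span = len(target) - probe_len
    if n_probes < 2 or span < 0:
        raise ValueError("need at least 2 probes and a target longer than one probe")
    probes = []
    for i in range(n_probes):
        start = round(i * span / (n_probes - 1))
        probes.append((start, reverse_complement(target[start:start + probe_len])))
    return probes

def tiled_fraction(n_probes, probe_len, target_len):
    """Fraction of the target covered, assuming the probes do not overlap."""
    return n_probes * probe_len / target_len

# Hypothetical 2.2 kb placeholder sequence, not the HOTAIR sequence.
toy_target = "ACGU" * 550
toy_probes = tile_probes(toy_target)
coverage = tiled_fraction(48, 20, 2200)  # 960 / 2200, i.e. about 0.44
```

With 48 probes of 20 nt on a 2.2 kb target, the covered fraction in this sketch is about 0.44, roughly in line with the ~50% tiling quoted above.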
We next examined whether lncRNA-associated DNA and proteins could be co-purified. As a positive control, we examined the TERC RNA, which functions as the template and scaffold for the telomerase complex. In HeLa S3 cells transduced with human TERC and TERT, TERC RNA expression is ~4-fold over control and constitutively bound at telomeric ends of dividing chromosomes (Abreu et al., 2010). Using 19 complementary DNA probes against TERC RNA (84% tiled) or LacZ, ~90% of total TERC RNA was specifically retrieved (Fig. 2A), showing that the method is readily compatible with other lncRNAs. We gently eluted DNA off the beads using a combination of RNase A and RNase H so that only DNA retrieved via an RNA bridge, but not by direct probe-DNA interaction, would be preferentially released. We evaluated various crosslinking strategies for fixing RNA:DNA:protein interactions. Consistent with classic electron micrographs showing RNA at chromatin using the thermo-stable crosslinker glutaraldehyde (Hopwood, 1972; Sabatini et al., 1962), we found that glutaraldehyde crosslinking consistently yielded the highest signal-to-noise ratio in comparison to ultraviolet light or formaldehyde crosslinking (Fig. 2B). TERC ChIRP specifically retrieved telomere DNA but not Alu repeats, while LacZ ChIRP retrieved neither (Fig. 2B). The telomere ChIRP signal could not have arisen from direct probe-telomere interaction: the CCCTAA template region on TERC was avoided in probe design for this reason, and no probe shared homology with telomeric sequences. Furthermore, the ChIRP signal was crosslinking-dependent, suggesting that it was specific to telomere-TERC interaction. As another positive control for the method, we found that TERC ChIRP specifically retrieved TCAB1, a subunit of the telomerase holoenzyme that facilitates telomerase trafficking (Venteicher et al., 2009; Zhong et al., 2011) (Fig. 2C).
Thus, ChIRP is compatible with the simultaneous analysis of DNA and proteins associated with specific RNAs.
One potential source of noise in ChIRP-seq is the precipitation of non-specific DNA fragments from off-target hybridization of the pool of oligonucleotide probes. In order to eliminate such artifacts, we devised the “split-probe” strategy, where we ranked all probes based on their relative positions along the target RNA, and split them into two pools such that all even probes were in one set and all odd probes in another. As the two sets of probes shared no overlapping sequences, the only target they have in common is the RNA of interest and its associated chromatin. Similar to using two independent polyclonal antibodies to obtain high confidence ChIP-seq signal, we performed two independent ChIRP-seq runs with “even” and “odd” probes separately, and focused our analysis exclusively on the overlap between their signals. Notably, “even” and “odd” probe sets enriched each of the target RNAs below with similar efficiency, yielding comparable ChIRP-seq libraries in terms of signal-to-noise ratio (Fig. S2).
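A minimal sketch of the split-probe assignment described above, assuming each probe is represented as a hypothetical `(start_position, sequence)` tuple. The function name and data layout are illustrative and not part of the published protocol.

```python
# Illustrative sketch: rank probes by position along the target RNA and
# alternate them into "odd" and "even" pools.

def split_probe_pools(probes):
    """Split (start, sequence) probes into even and odd pools by rank.

    Probes are ranked by start position; ranks are counted from 1, so the
    5'-most probe goes to the odd pool. Returns (even_pool, odd_pool).
    """
    ordered = sorted(probes, key=lambda probe: probe[0])
    odd_pool = [p for rank, p in enumerate(ordered, start=1) if rank % 2 == 1]
    even_pool = [p for rank, p in enumerate(ordered, start=1) if rank % 2 == 0]
    return even_pool, odd_pool

# Toy example with placeholder sequences.
even_pool, odd_pool = split_probe_pools(
    [(40, "probe_c"), (0, "probe_a"), (20, "probe_b"), (60, "probe_d")]
)
```

Because neighbouring probes land in different pools, each pool still spans the full length of the RNA while sharing no probe sequence with the other.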
To assess the sensitivity and specificity of ChIRP when applied genome-wide, we required a biological system in which the binding sites of a lincRNA are already known genome-wide, so that we could test whether ChIRP-seq selectively retrieves most of these sites. The Drosophila dosage compensation system is ideal for this purpose. In Drosophila, male cells up-regulate the expression of genes on their single X chromosome by two-fold; this dosage compensation requires a ribonucleoprotein complex containing the Male-Specific Lethal (MSL) proteins and two lncRNAs, roX1 and roX2 (Lucchesi et al., 2005). Staining of polytene chromosomes showed that roX and MSL co-localize exclusively on the male X chromosome but not on autosomes (Franke and Baker, 1999), and pioneering work by Kuroda and colleagues has defined the occupancy landscape of MSL proteins at high resolution (Alekseyenko et al., 2008).
We performed ChIRP-seq on endogenous roX2 in male S2 cells. Using the software MACS (Zhang et al., 2008), we identified 308 roX2 binding sites, all of them on the X chromosome and none on autosomes (Fig. 3A, B, Table S1). Autosomes constitute ~80% of the Drosophila genome. Thus, ChIRP-seq is highly specific even on a genome-wide scale, and has a negligible false discovery rate (FDR ~0). The 308 roX2 binding sites recovered 89.3% of known Chromosomal Entry Sites (CES), which are high-affinity binding sites of the roX-MSL complex that have been previously defined by genetic epistasis (Alekseyenko et al., 2008). This number compares favorably with ChIP-seq of single MSL components, whose top 309 peaks identified 91% of CES (Alekseyenko et al., 2008). The roX2 ChIRP-seq profile is highly correlated with the MSL ChIP-seq profile, and both show very strong peaks at CES (Fig. 3A). The roX2 and MSL occupancy profiles show a Pearson correlation of 0.77, which is in the range of correlation for biological replicates of a single MSL protein, or ChIP-seq of different MSL subunits in parallel (Fig. 3A, C; R = 0.65 - 0.94) (Alekseyenko et al., 2006). Remarkably, direct comparison of roX2 ChIRP-seq vs. MSL3 ChIP-seq shows that ChIRP-seq has better dynamic range and discrimination of X chromosome vs. autosomes than ChIP-seq (Fig. 3D). Aligning roX2 ChIRP signal across all bound genes, we discovered that roX2 occupancy is enriched over gene bodies of X chromosome genes, and increases from the 5’ to the 3’ end of each gene. This pattern provides independent support for a recent notion that the roX-MSL complex acts by promoting transcriptional elongation rather than initiation (Larschan et al., 2011) (Fig. 3D). In addition, motif analysis of roX2 ChIRP-seq data revealed a very significantly enriched DNA motif that is nearly identical to the MSL motif (a sequence shown to function as CES when inserted into autosomes) (Alekseyenko et al., 2008) (Fig. 3E).
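For readers who want to reproduce this kind of profile comparison, the following is a minimal sketch of a Pearson correlation between two occupancy profiles. It assumes both profiles have already been summarized as read densities over the same genomic bins; the binning scheme and the function name `pearson_r` are assumptions for illustration, not the procedure used in this study.

```python
# Illustrative sketch: Pearson correlation between two binned occupancy profiles.
import math

def pearson_r(profile_a, profile_b):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mean_a = sum(profile_a) / len(profile_a)
    mean_b = sum(profile_b) / len(profile_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)

# Toy bins (made-up numbers): two profiles that rise and fall together.
r_toy = pearson_r([1.0, 5.0, 2.0, 9.0], [2.0, 10.0, 4.0, 18.0])
```

In the toy example the second profile is an exact multiple of the first, so the coefficient is 1 up to floating-point error; real ChIRP-seq and ChIP-seq profiles would give intermediate values such as the 0.77 reported above.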
These data demonstrate that ChIRP-seq is highly sensitive and specific, and retrieves biologically useful signal.
Consistent with our hypothesis that the shared signal between even and odd probes would improve ChIRP-seq accuracy, we found that the shared signal between the two independent probe sets is highly correlated with that of MSL3 ChIP-seq, while the unique signals in either probe set alone were not (Fig. S2D). Based on these findings, we performed at least two ChIRP-seq experiments using independent sets of non-overlapping probes for each target RNA, and we only accepted binding sites that were concordant in both experiments (Methods). Only the shared signal from two independent ChIRP-seq experiments can be considered meaningful; signal present in only the even or only the odd experiment should not be interpreted. These data also suggest a bound on the false discovery rate in larger genomes. The roX2 results showed no off-target peaks across all Drosophila autosomes, which total ~100 Mb. Assuming a worst-case scenario in which the off-target rate is actually 1 peak per 100 Mb and that this scales linearly to the human genome (3 Gb, 30x the fly autosomal genome), one expects ~30 false positive peaks, a far smaller number than the peaks actually observed in the experiments below (estimated FDR < 0.05 for each). Thus, our data indicate that off-target effects should be limited even in larger mammalian genomes.
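The two ideas in this paragraph, keeping only concordant peaks and scaling the worst-case off-target rate to a larger genome, can be sketched as follows. The interval-intersection rule and the function names are assumptions for illustration; the concordance criteria actually used are those given in Methods. The peak totals (2198 TERC peaks and 832 HOTAIR peaks) are the values reported in this paper.

```python
# Illustrative sketch: concordant peaks and a worst-case false-positive estimate.

def concordant_peaks(even_peaks, odd_peaks):
    """Return the intersections of peaks called in both probe-set experiments.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    Simple O(n*m) comparison, adequate for a few thousand peaks.
    """
    shared = []
    for chrom_e, start_e, end_e in even_peaks:
        for chrom_o, start_o, end_o in odd_peaks:
            if chrom_e == chrom_o and start_e < end_o and start_o < end_e:
                shared.append((chrom_e, max(start_e, start_o), min(end_e, end_o)))
    return shared

def expected_false_peaks(genome_mb, peaks_per_100mb=1.0):
    """Worst-case off-target peaks, assuming linear scaling with genome size."""
    return genome_mb / 100.0 * peaks_per_100mb

# Worked arithmetic from the text: 3 Gb human genome, 1 off-target peak per 100 Mb.
false_peaks = expected_false_peaks(3000)   # 30 peaks
terc_ratio = false_peaks / 2198            # about 0.014
hotair_ratio = false_peaks / 832           # about 0.036

# Toy peak lists (made-up coordinates).
toy_shared = concordant_peaks(
    [("chrX", 100, 400), ("chr2", 50, 90)],
    [("chrX", 300, 600), ("chr3", 50, 90)],
)
```

Under these assumptions, ~30 expected false peaks correspond to roughly 1.4% and 3.6% of the TERC and HOTAIR peak sets, respectively, consistent with the FDR < 0.05 estimate above.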
We next performed ChIRP-seq of TERC in HeLa S3 cells transduced with TERT and TERC. TERC ChIRP-seq showed significant enrichment of telomeric DNA sequences (~9-fold) relative to input reads, whereas Alu repeats were not enriched (Fig. 4A). In addition, we observed numerous specific TERC binding events throughout the genome with signal intensities comparable to conventional ChIP-seq. TERC binding sites were focal; most binding sites are “peaks” of <600 bp that do not spread beyond 1 kb (Fig. 4B), a pattern reminiscent of ChIP-seq peaks of transcription factors. Using the same analysis pipeline employed in the roX2 analysis, we identified 2198 TERC binding sites in the genome, which represent a large resource for studying potential non-canonical functions of TERC RNA and telomerase (Table S2). It is known that TERT can bind to and co-activate Wnt target genes at chromatin (Park et al., 2009), and we hypothesized that TERC, as a component of the TERT complex, may also co-occupy some of the same genes. Unbiased analysis of the TERC-bound peaks revealed that one of the top three enriched Gene Ontology terms is Wnt receptor signaling pathway (p = 1.3 × 10^-6), strongly supporting our initial hypothesis. We found that TERC occupied multiple Wnt genes directly, including Wnt11, which is transcriptionally induced by TERT overexpression in vivo (Choi et al., 2008). ChIRP-seq revealed a series of TERC binding peaks near the MYC gene, concordant with previously documented binding sites of TERT (Fig. 4C). Analysis of TERC-bound sequences identified an enriched cytosine-rich sequence motif (Fig. 4D), suggesting that specific DNA motifs may be involved in TERC occupancy. ChIRP-seq of TERC in wild type HeLa cells identified a largely similar pattern of chromatin occupancy [r = 0.80; 1549 of 2198 peaks independently identified (70% overlap), p < 10^-20, hypergeometric distribution], indicating that endogenous TERC binds similar genomic sites (Fig. S3).
These results bolster the concept of direct connections between chromosome replication and self-renewal pathways (Park et al., 2009). It will be of great interest to use ChIRP-seq to interrogate TERC binding in various biological systems where TERT has been shown to assume non-canonical roles.
We next sought to map the genomic binding sites of HOTAIR and their relationship with Polycomb occupancy. HOTAIR is a 2.2 kb lincRNA from the HOXC locus that binds the Polycomb Repressive Complex 2 (PRC2) and affects PRC2 occupancy at target genes throughout the genome. How HOTAIR guides PRC2 to target genes is not understood. Overexpression of HOTAIR alters the positional identity of cancer cells and promotes cancer metastasis (Gupta et al., 2010). We chose to map HOTAIR occupancy genome-wide by ChIRP-seq in MDA-MB-231 breast cancer cells expressing HOTAIR, which matches the HOTAIR level and phenotypic consequences in metastasis-prone human breast cancers (Gupta et al., 2010).
We identified 832 HOTAIR occupancy sites genome-wide, using the same analysis pipeline described above with two independent ChIRP-seq probe sets (Table S3). HOTAIR binding sites occur on multiple chromosomes and are enriched in genic regions, notably regions annotated as enhancers and introns (Fig. 5A). HOTAIR binding events are focal; typical HOTAIR peaks are no more than a few hundred base pairs, a pattern reminiscent of transcription factors. When we overlaid these sites with previously generated genomic binding data for the PRC2 subunits EZH2 and SUZ12 and for H3K27me3 in the same cell type, we discovered a significant pattern of co-occupancy (Fig. 5B). Focal sites of HOTAIR occupancy are associated with broader domains of PRC2 occupancy and H3K27me3, suggesting that HOTAIR may nucleate Polycomb domains. One prime example of this pattern is in the human HOXD locus, where HOTAIR is known to target PRC2 to silence multiple HOXD genes across 40 kilobases (Rinn et al., 2007). One of the high-confidence HOTAIR ChIRP-seq peaks mapped to the intergenic region between HOXD3 and HOXD4, which corresponds to the middle of a broad domain of H3K27me3 and SUZ12 occupancy that is lost upon HOTAIR depletion (Fig. 5C) (Rinn et al., 2007; Tsai et al., 2010). Endogenous HOTAIR in primary human fibroblasts also bound the same sites (four of four tested, including in HOXD), as indicated by ChIRP-qPCR (Fig. S4). HOTAIR occupancy sites are significantly enriched for genes that gain PRC2 occupancy in a HOTAIR-dependent manner in the same cell type, or become de-repressed when endogenous HOTAIR is depleted (Gupta et al., 2010; Tsai et al., 2010) (p = 2.4 × 10^-5 and p = 8.57 × 10^-3 respectively, hypergeometric distribution). Unbiased analyses of HOTAIR-occupied genes revealed enrichment for genes involved in pattern specification processes (p = 8.7 × 10^-7), consistent with prior data that HOTAIR enforces the epigenomic state of distal and posterior positional identity (Gupta et al., 2010).
These results provide additional evidence that HOTAIR-chromatin interaction is associated with PRC2 relocalization and gene silencing. Despite these significant overlaps, it is clear that the correspondence between HOTAIR occupancy and downstream effects (PRC2 occupancy, gene silencing) does not map one-to-one, which may suggest additional layers of complexity.
ChIRP-seq data enable potentially new mechanistic insights into RNA-chromatin interaction. Analysis of HOTAIR binding sites revealed enrichment of a GA-rich polypurine motif (E = 3.8 × 10^-128, Fig. 5D), which we term the HOTAIR motif. Interestingly, Drosophila Polycomb Response Elements (PREs) are known to bind GAGA protein (Horard et al., 2000), and recent studies of mammalian PREs also identified GA-repeats as a shared feature (Sing et al., 2009; Woo et al., 2010), although other sequences are also required. In addition, the MSL/roX ribonucleoprotein complex responsible for dosage compensation in Drosophila also recognizes a GA-rich element on the fly X chromosome (Alekseyenko et al., 2008), raising the intriguing possibility of similar mechanisms in which lncRNAs serve as guides for chromatin-lncRNA complexes such as PRC2-HOTAIR and MSL-roX.
HOTAIR may actively recruit PRC2 to its target genes, or simply serve as a scaffolding molecule that gets passively transported along with PRC2. The observed pattern of focal HOTAIR occupancy in the midst of broader domains of PRC2 favors the former hypothesis. To formally distinguish between these two possibilities, we performed HOTAIR ChIRP-seq in isogenic cells depleted for EZH2 (Gupta et al., 2010), which directly binds HOTAIR (Kaneko et al., 2010). Notably, the pattern of HOTAIR occupancy was largely preserved upon EZH2 depletion (Fig. 6A), indicating that HOTAIR can bind chromatin without an intact PRC2. Independent ChIRP-qPCR validated the binding sites and confirmed the specificity of ChIRP-seq results in control and shEZH2 cells (Fig. 6B). Together, these results support the role of HOTAIR lincRNA as an active recruiter of chromatin modifying complexes.
Here we described ChIRP-seq, a method of mapping in vivo lincRNA binding sites genome-wide. The key parameters for success are the split pools of tiling oligonucleotide probes and glutaraldehyde crosslinking. The design of affinity probes is straightforward given the RNA sequence and requires no prior knowledge of the RNA’s structure or functional domains. Our success with roX2, TERC, and HOTAIR, three rather different RNAs in two species, suggests that ChIRP-seq is likely generalizable to many lncRNAs. As with all experiments, care and proper controls are required to interpret the results. Different lincRNAs may require titration of conditions, and judicious changes of conditions, such as selection of different affinity probes or crosslinkers, may highlight different aspects of RNA-chromatin interactions. As with ChIP-seq, not all binding events are necessarily functional, and additional studies are required to ascertain the biological consequences of RNA occupancy on chromatin. Nonetheless, we foresee many interesting applications of this technology for researchers of other chromatin-associated lncRNAs, which now number in the thousands (Khalil et al., 2009; Zhao et al., 2010). Just as ChIP-seq has opened the door for genome-wide explorations of DNA-protein interactions, ChIRP-seq studies of the “RNA interactome” may reveal many new avenues of biology.
ChIRP-seq has enabled the first genome-wide views of ncRNA occupancy on the human genome. Commonalities in the occupancy patterns of TERC and HOTAIR suggest several lessons for RNA-chromatin interactions. First, lincRNA binding sites are focal, specific, and numerous. In contrast to histone modifications, which often broadly occupy certain genomic elements (e.g., promoters, enhancers, transcribed exons, or silent genes) (Rando and Chang, 2009), the focal, interspersed, and gene-selective nature of lincRNA occupancy more closely resembles that of transcription factors. Even roX2, which binds across gene bodies of fly X-linked genes, shows focal peaks of high occupancy at CES sites. These results imply that certain lincRNAs may be “selector” elements that can access the genome in a highly discriminating fashion.
Second, lincRNAs access the genome through specific DNA sequences. Using ChIRP-seq, we show that genome-scale collections of RNA binding sites can be used to discover the enriched underlying DNA sequence motifs. These findings indicate the existence of an entirely new class of regulatory elements in the genome: lincRNA target sites. For instance, we discovered a GA-rich homopurine motif for HOTAIR, a lincRNA known to recruit Polycomb. Importantly, mammalian Polycomb response elements are known to have a GAGA motif (Sing et al., 2009; Woo et al., 2010), but the cognate partner has been lacking. The HOTAIR motif also has similarities to the MSL binding motif in that both are GA-rich, but the HOTAIR motif is more degenerate than the MSL motif and does not strictly conserve a GAGA sequence. The discovery of specific RNA targeting motifs may start to unify at a mechanistic level many of the disparate phenomena that involve RNA-mediated chromatin states. The GA-rich HOTAIR motif may enable formation of an RNA:DNA:DNA triplex (facilitated by homopurine runs and known to mediate some lncRNA-chromatin interactions) (Martianov et al., 2007; Schmitz et al., 2010), serve as the binding site of a protein that recruits HOTAIR, or indirectly configure a chromatin state that facilitates HOTAIR binding. Additional studies are required to evaluate these hypotheses, which are now possible due to ChIRP and knowledge of the candidate motif. HOTAIR also binds the LSD1-coREST-REST complex that can target DNA (Tsai et al., 2010), and multiple mechanisms may operate together to target lincRNAs.
Third, comparison of lincRNA occupancy maps with chromatin state maps can reveal the order and logic of the regulatory cascade. For instance, comparison of HOTAIR versus Polycomb occupancy suggested that HOTAIR nucleates Polycomb domains. Focal HOTAIR binding sites (<500 bp) occur in the midst of a broad domain of Polycomb that can extend in both directions for several kilobases. This pattern argues that HOTAIR does not simply bind to or stabilize pre-existing Polycomb, which would have predicted broad co-occupancy of the two. Rather, the maps suggested that the RNA may be a pioneering factor that recruits Polycomb, which then spreads out bilaterally. To directly test the order of occupancy, we depleted the PRC2 subunit EZH2 and showed that HOTAIR can still bind to target chromatin genome-wide. This result uncouples the formation of the HOTAIR-PRC2 ribonucleoprotein complex (the RNA scaffold function) from RNA targeting to chromatin. Because EZH2 is the enzymatic subunit of PRC2, H3K27me3 is also presumably not required a priori for HOTAIR targeting. Thus, the information for target gene selectivity resides in the RNA, which then recruits Polycomb to chromatin. Prior efforts have identified sequence motifs associated with PRC2 occupancy as a function of HOTAIR (Tsai et al., 2010), which may facilitate the spreading of PRC2 occupancy. We previously showed that EZH2 depletion diminished the metastatic potential of HOTAIR-expressing cancer cells (Gupta et al., 2010). The ChIRP-seq data indicate that it is the lack of PRC2, rather than the inactivation of HOTAIR function at chromatin, that is responsible for this epistatic interaction. Together, these experiments suggest that lincRNAs are surprisingly like sequence-specific transcription factors in dictating chromatin states, and again illustrate the utility of ChIRP for generating mechanistic insights.
SuperTelomerase, MDA-MB-231-HOTAIR, and MDA-MB-231 HOTAIR-shEZH2 cells were maintained in DMEM (Invitrogen) supplemented with 10% FBS (HyClone) and 1% Pen/Strep (Invitrogen).
Morpholino probes against HOTAIR were designed by Gene-Tools LLC against three open regions detected by PARS-seq (ref) (HOTAIR Morpho-1: GAGCAGCTCAAGTCCCCTGCATCCA, HOTAIR Morpho-2: GCACCCGCTCAGGTTTTTCCAGCGT, HOTAIR Morpho-3: TACATAAACCTCTGTTCTGTGAGTGC, Mock Morpho: CCTCTTACCTCAGTTACAATTTATA). All morpholino probes were biotinylated at the 3’ end. Antisense DNA probes were designed against the full-length HOTAIR sequence using the online designer at www.singlemoleculefish.com. All probes were compared with the human genome using the BLAT tool, and probes returning noticeable homology to non-HOTAIR targets were discarded (BLAT searches through an index of non-overlapping 11-mers). 48 probes were generated and split into two sets based on their relative positions along the HOTAIR sequence, such that even-numbered and odd-numbered probes were separately pooled. A symmetrical set of probes against LacZ RNA was also generated as the mock control. All DNA probes were biotinylated at the 3’ end with an 18-carbon spacer arm (Protein and Nucleic Acid Facility, Stanford University). 19 probes were generated against TERC RNA and 24 against roX2 by similar methods. Sequences of all probes are listed in Table S4. The absolute levels of the ncRNAs in this study are as follows, in Ct values per 100 ng of total RNA: roX2 = 16.6; TERC = 18.4; HOTAIR = 22.95. Thus, the fly and mammalian experiments are roughly comparable, and the mammalian experiments in fact show that ChIRP is compatible with lower-expressed ncRNAs.
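To put the Ct values above on a more intuitive scale, the following sketch converts a Ct difference into an approximate fold difference in abundance. It assumes ideal amplification (a doubling of product per cycle) and equal assay efficiency for all three transcripts, neither of which is stated in this paper; the function name is hypothetical.

```python
# Illustrative sketch: approximate relative abundance from Ct values,
# assuming perfect doubling per PCR cycle (an idealization).

ct_values = {"roX2": 16.6, "TERC": 18.4, "HOTAIR": 22.95}  # Ct per 100 ng total RNA

def fold_difference(ct_a, ct_b, amplification_per_cycle=2.0):
    """Approximate how many times more abundant transcript A is than B."""
    return amplification_per_cycle ** (ct_b - ct_a)

rox2_vs_hotair = fold_difference(ct_values["roX2"], ct_values["HOTAIR"])
terc_vs_hotair = fold_difference(ct_values["TERC"], ct_values["HOTAIR"])
```

Under these assumptions, HOTAIR is on the order of 80-fold less abundant than roX2 and about 23-fold less abundant than TERC, which illustrates the statement that ChIRP works with lower-expressed ncRNAs.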
Cells were grown to log-phase in tissue culture plates and rinsed once with room temperature PBS. For UV crosslinking, the plates were irradiated in a UV crosslinker (Stratagene) with lids off and PBS aspirated. UV strength was titrated from 240mJ to 960mJ. For chemical crosslinking, cells were fixed on the plate with appropriate amounts of 1% formaldehyde or 1% glutaraldehyde in PBS for 10 minutes at room temperature. Crosslinking was then quenched with 0.125M glycine for 5 minutes. Cells were rinsed again with PBS, scraped into Falcon tubes, and pelleted at 800g for formaldehyde crosslinking and 2500g for glutaraldehyde crosslinking. Cell pellets were then snap frozen in liquid nitrogen and can be stored at -80C indefinitely.
To prepare chromatin, cell pellets were quickly thawed in a 37C water bath and resuspended in Swelling Buffer (0.1M Tris pH7.0, 10mM KOAc, 15mM MgOAc; 1% NP-40, 1mM DTT, 1mM PMSF, complete protease inhibitor (GE), and 0.1U/ul Superase-in (Ambion) were added before use) for 10’ on ice. The cell suspension was then dounced and pelleted at 2500g for 5’. Nuclei were further lysed in nuclear lysis buffer at 100mg/ml (50mM Tris 7.0, 10mM EDTA, 1% SDS; DTT, PMSF, P.I., and Superase-in were added before use) on ice for 10’, and sonicated using a Bioruptor (Diagenode) until most chromatin had solubilized and DNA was in the size range of 100-500bp. Chromatin can be snap frozen in liquid nitrogen and stored at -80C until use.
Chromatin was diluted in 2 volumes of hybridization buffer (500mM NaCl, 1% SDS, 100mM Tris 7.0, 10mM EDTA, 15% Formamide; DTT, PMSF, P.I., and Superase-in were added fresh). 100pmol of probes were added to 3ml of diluted chromatin, which was mixed by end-to-end rotation at 37C for 4 hours. Streptavidin-magnetic C1 beads were washed three times in nuclear lysis buffer, blocked with 500ng/ul yeast total RNA and 1mg/ml BSA for 1 hour at room temperature, and washed three times again in nuclear lysis buffer before being resuspended in their original volume. 100ul of washed/blocked C1 beads were added per 100pmol of probes, and the whole reaction was mixed for another 30min at 37C. Beads:biotin-probes:RNA:chromatin adducts were captured by magnets (Invitrogen) and washed five times with 40x beads volume of wash buffer (2x SSC, 0.5% SDS; DTT and PMSF were added fresh). After the last wash, buffer was removed carefully with a P-10 pipette so that no trace volume was left behind. Beads were then ready for different elution protocols depending on downstream assays.
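The ratios in this hybridization step can be captured in a small helper for scaling a reaction. This is a sketch only: it assumes that probe and bead amounts scale linearly with the volume of diluted chromatin, which the protocol above does not state explicitly, and the function name is hypothetical.

```python
# Illustrative sketch: scale the hybridization reaction from a chromatin volume.
# Ratios taken from the protocol text; linear scaling is an assumption.

def hybridization_plan(chromatin_ml):
    """Return buffer, probe, and bead amounts for a given chromatin volume (ml)."""
    buffer_ml = 2.0 * chromatin_ml              # 2 volumes of hybridization buffer
    diluted_ml = chromatin_ml + buffer_ml       # total diluted chromatin
    probe_pmol = 100.0 * diluted_ml / 3.0       # 100 pmol probes per 3 ml diluted chromatin
    beads_ul = probe_pmol                       # 100 ul C1 beads per 100 pmol probes
    wash_ul_per_wash = 40.0 * beads_ul          # 40x beads volume of wash buffer per wash
    return {
        "buffer_ml": buffer_ml,
        "diluted_ml": diluted_ml,
        "probe_pmol": probe_pmol,
        "beads_ul": beads_ul,
        "wash_ul_per_wash": wash_ul_per_wash,
    }

plan = hybridization_plan(1.0)  # 1 ml sonicated chromatin
```

For 1 ml of chromatin this yields 2 ml of buffer, 3 ml of diluted chromatin, 100 pmol of probes, 100 ul of beads, and 4 ml of wash buffer for each of the five washes.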
For reversible crosslinking (formaldehyde), beads were resuspended in 10x original volume of RNA elution buffer (Tris 7.0, 1% SDS) and boiled for 15min, followed by trizol:chloroform extraction and RNeasy mini column purification. For non-reversible crosslinking (UV and glutaraldehyde), beads were resuspended in 10x original volume of RNA pK buffer (100mM NaCl, 10mM Tris 7.0, 1mM EDTA, 0.5% SDS) and 0.2U/ul Proteinase K (Invitrogen). pK treatment was carried out at 65C for 45’, followed by boiling for 15’ and trizol:chloroform extraction. Eluted RNA was subjected to quantitative reverse-transcription PCR (qRT-PCR) for the detection of enriched transcripts.
Beads were resuspended in 3x original volume of DNase buffer (100mM NaCl and 0.1% NP-40), and protein was eluted with a cocktail of 100ug/ml RNase A (Sigma-Aldrich), 0.1U/ul RNase H (Epicenter), and 100U/ml DNase I (Invitrogen) at 37C for 30’. Protein eluent was supplemented with 0.2 volume of 5x Laemmli buffer (without bromophenol blue or glycerol), boiled for 5’, and dot blotted onto a nitrocellulose membrane with a Bio-Dot apparatus (Biorad). The membrane was then blotted against TCAB1 and tubulin antibodies (gifts from Artandi lab) per normal Western protocol.
Beads were resuspended in 3x original volume of DNA elution buffer (50mM NaHCO3, 1% SDS, 200mM NaCl), and DNA was eluted with a cocktail of 100ug/ml RNase A (Sigma-Aldrich) and 0.1U/ul RNase H (Epicenter). RNase elution was carried out twice at 37C with end-to-end rotation, and the eluates from both steps were combined. For formaldehyde crosslinking, chromatin was reverse-crosslinked at 65C overnight. For non-reversible crosslinking, eluted chromatin was treated with 0.2U/ul pK at 65C for 45min. In either case, DNA was then extracted with an equal volume of phenol:chloroform:isoamyl alcohol (Invitrogen) and precipitated with ethanol at -80C overnight. Eluted DNA was subjected to QPCR, dot blots, or high-throughput sequencing.
DNA was denatured in 0.1 volume of denaturing solution (4M NaOH, 100mM EDTA) at 95C for 5min, and then chilled on ice for 5min. An equal volume of chilled 2M NH4OAc was added to neutralize the DNA on ice, which was then dot blotted onto nitrocellulose membrane using a Bio-Dot apparatus. The membrane was immediately crosslinked at 120mJ in a Stratalinker, and pre-hybridized in Rapid-Hyb (GE) at 42C for 30min. Telomere and Alu repeats were detected using end-labeled radioactive Southern probes CCCTAACCCTAACCCTAACCCTAACCCTAA and GTGATCCGCCCGCCTCGGCCTCCCAAAGTG, respectively.
High-throughput sequencing libraries were constructed from ChIRPed DNA according to the ChIP-seq protocol as described (Johnson et al., 2007), and sequenced on a Genome Analyzer IIx (Illumina) with a read length of 36bp. Raw reads were uniquely mapped to the reference genome (hg18 assembly for HOTAIR, TERC, LacZ and EZH2 ChIRP-seq samples, and dm3 for roX2) using Bowtie (Langmead et al., 2009).
ChIRP-seq workflow consists of three steps.
Sequences within +/-200bp of the peak summits of the top 500 true peaks (ranked by fold enrichment) were extracted, and motif analysis of these 500 peaks was performed using MEME (Bailey and Elkan, 1994). Only motifs of the highest significance were reported. Enriched gene sets were obtained through GREAT (McLean et al., 2010) using all 2198 TERC true peaks and all 832 HOTAIR true peaks. Gene Ontology analysis of both gene sets was performed using DAVID (Huang da et al., 2009; Wishart et al., 2009).
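The peak selection step that precedes motif analysis can be sketched as follows. This is a minimal illustration rather than the authors' code: the `Peak` record and the function name are hypothetical, and the sketch assumes that each peak carries a summit coordinate and a fold-enrichment value and that windows are clipped at the chromosome start. The resulting windows would then be used to extract sequences for a motif finder such as MEME.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    """Hypothetical peak record; field names are illustrative."""
    chrom: str
    summit: int             # coordinate of the peak summit
    fold_enrichment: float


def top_peak_windows(peaks: List[Peak],
                     n_top: int = 500,
                     flank: int = 200) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) windows of +/-flank bp around the summits
    of the n_top peaks ranked by fold enrichment (highest first)."""
    ranked = sorted(peaks, key=lambda p: p.fold_enrichment, reverse=True)
    windows = []
    for peak in ranked[:n_top]:
        start = max(0, peak.summit - flank)   # clip at chromosome start
        end = peak.summit + flank
        windows.append((peak.chrom, start, end))
    return windows


if __name__ == "__main__":
    example = [Peak("chr1", 1000, 5.0), Peak("chr2", 100, 9.0)]
    for window in top_peak_windows(example):
        print(window)
```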
roX2 peaks and motif were obtained as described above. Of 308 predicted true peaks, none was on an autosome, resulting in a false discovery rate (FDR) of 0. Normalized signal of the combined lane of roX2 ChIRP-seq and of MSL3-TAP ChIP-seq was obtained in a manner similar to that described for the HOTAIR ChIRP-seq analysis. Only regions where normalized signal was >=10 were counted in calculating the Pearson correlation between the roX2 and MSL3-TAP samples. Genes that overlap by >=1bp with windows of +/-2kbp around true roX2 peak summits were included in the average diagram. In total, 1087 RefSeq transcripts were included in the chrX average diagram, and 4260 RefSeq transcripts were included in that of chr2L. Distance on the diagram was scaled by gene length, so that the diagram shows signal in a region from 50% gene length upstream to 50% gene length downstream.
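The thresholded correlation described above can be illustrated with a short Python sketch. The text does not state whether the >=10 cutoff applies to one sample or to both; the sketch assumes both, and the function name and input layout (two equal-length lists of per-region normalized signal) are hypothetical.

```python
import math
from typing import Sequence


def pearson_above_threshold(x: Sequence[float],
                            y: Sequence[float],
                            min_signal: float = 10.0) -> float:
    """Pearson correlation over regions where BOTH signals are >= min_signal.

    Assumption (not from the source): the cutoff is applied to both samples.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = [(a, b) for a, b in zip(x, y) if a >= min_signal and b >= min_signal]
    if len(pairs) < 2:
        raise ValueError("need at least two regions above the threshold")
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant signal")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rox2 = [10.0, 20.0, 30.0, 5.0]    # illustrative values, not real data
    msl3 = [20.0, 40.0, 60.0, 100.0]
    print(pearson_above_threshold(rox2, msl3))
```

In the illustrative input, the fourth region is dropped because its first signal is below the cutoff, so only the three remaining regions contribute to the correlation.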
Reads from the “TERC ChIRP” sample and the “Input” sample were compared against the telomere sequence (CCCTAAx5) and the Alu sequence (GTGATCCGCCCGCCTCGGCCTCCCAAAGTG). Complete matches were tallied and normalized to the total number of reads in that sample to give Reads per Million (RPM). RPMs from the TERC-enriched sample were divided by those from the Input sample to give “Fold Enrichment.” We note that the odd probes yield better enrichment of telomeres than the even probes. Because the genome-wide TERC binding sites by definition require comparable pull-down by both sets of probes, this result raises the possibility that TERC interacts with telomeres and with other genomic binding sites via different mechanisms.
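The repeat tally described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' script: a "complete match" is taken to mean that the read contains the full query sequence in the orientation given (reverse complements are not checked), and the function names are hypothetical.

```python
from typing import Iterable

TELOMERE = "CCCTAA" * 5                        # CCCTAAx5, as given in the text
ALU = "GTGATCCGCCCGCCTCGGCCTCCCAAAGTG"         # Alu sequence, as given in the text


def reads_per_million(reads: Iterable[str], query: str) -> float:
    """Reads containing the full query sequence, per million total reads.

    Assumption: only the given orientation of the query is counted.
    """
    total = 0
    hits = 0
    for read in reads:
        total += 1
        if query in read.upper():
            hits += 1
    if total == 0:
        raise ValueError("no reads supplied")
    return hits / total * 1e6


def fold_enrichment(chirp_reads: Iterable[str],
                    input_reads: Iterable[str],
                    query: str) -> float:
    """RPM in the ChIRP sample divided by RPM in the Input sample."""
    input_rpm = reads_per_million(input_reads, query)
    if input_rpm == 0:
        raise ValueError("query not found in Input; fold enrichment undefined")
    return reads_per_million(chirp_reads, query) / input_rpm


if __name__ == "__main__":
    # Toy 36bp reads for illustration only; not real data.
    chirp = [TELOMERE + "CCCTAA", "A" * 36, TELOMERE + "ACGTAC", "C" * 36]
    inp = [TELOMERE + "CCCTAA"] + ["A" * 36] * 9
    print(fold_enrichment(chirp, inp, TELOMERE))
```

Normalizing each sample to its own read count before taking the ratio keeps the comparison independent of sequencing depth.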
Normalized signal within 10kb upstream and downstream of the summits of true HOTAIR peaks was extracted with a smoothing window size of 50bp. Within each 50bp window, the normalized HOTAIR ChIRP signal is calculated via:
Suz12, Ezh2 and H3K27Me3 ChIP-chip data were generated previously (Gupta et al., 2010; Tsai et al., 2010; Rinn et al., 2007). ChIP-chip signals of Suz12, Ezh2 and H3K27Me3 within 10kb upstream and downstream of HOTAIR peak summits were extracted in a similar way.
We thank T. Hung, MC. Tsai, O. Manor, E. Segal, M. Kuroda, T. Swigut, and I. Shestopalov for discussions. Supported by the Agency of Science, Technology and Research of Singapore (C.C., F.L.Z.), NIH R01-CA118750 and R01-HG004361 (H.Y.C.), and California Institute for Regenerative Medicine (H.Y.C.). H.Y.C. is an Early Career Scientist of the Howard Hughes Medical Institute.
Accession Number. Deep sequencing data in this study are available for download from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) (accession ID: GSE31332).