|Home | About | Journals | Submit | Contact Us | Français|
While thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem cells (ESC) and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ESC regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ESCs and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ESC state.
The mammalian genome encodes many thousands of large non-coding transcripts1 including a class of ~3500 large intergenic ncRNAs (lincRNAs) identified using a chromatin signature of actively transcribed genes2–4. These lincRNA genes have been shown to have interesting properties, including clear evolutionary conservation2–5, expression patterns correlated with various cellular processes2,6 and binding of key transcription factors to their promoters2,6, and the lincRNAs themselves physically associate with chromatin regulatory proteins4,7. Yet, it remains unclear whether the RNA transcripts themselves have biological functions8–10. Few have been demonstrated to have phenotypic consequences by loss-of-function experiments6. As a result, the functional role of lincRNA genes has been widely debated. Various proposals include that lincRNA genes act as enhancer regions, with the RNA transcript simply being an incidental by-product8,9, that lincRNA transcripts act in cis to activate transcription11, and that lincRNA transcripts can act in trans to repress transcription12,13.
We therefore sought to undertake systematic loss-of-function experiments on all lincRNAs known to be expressed in mouse embryonic stem cells (ESCs)2,3. ESCs are pluripotent cells that can self-renew in culture and can give rise to cells of any of the three primary germ layers including the germline14. The signalling14, transcriptional15–17, and chromatin15,18–21 regulatory networks controlling pluripotency have been well characterized providing an ideal system to determine how lincRNAs may integrate into these processes.
Here we show that knockdown of the vast majority of ESC-expressed lincRNAs has a strong effect on gene expression patterns in ESCs, of comparable magnitude to that seen for the well-known ESC regulatory proteins. We identify dozens of lincRNAs that upon loss-of-function cause an exit from the pluripotent state and dozens of additional lincRNAs that, while not essential for the maintenance of pluripotency, act to repress lineage-specific gene expression programs in ESCs. We integrate the lincRNAs into the molecular circuitry of ESCs by demonstrating that most lincRNAs are directly regulated by critical pluripotency-associated transcription factors and ~30% of lincRNAs physically interact with specific chromatin regulatory proteins to affect gene expression. Together, these results demonstrate a regulatory network in ESCs whereby transcription factors directly regulate the expression of lincRNA genes, many of which can physically interact with chromatin proteins, affect gene expression programs, and maintain the ESC state.
To perform loss-of-function experiments, we generated five lentiviral-based short hairpin RNAs (shRNAs)22 targeting each of the 226 lincRNAs previously identified in ESCs2,3 (see Methods, Supplemental Table 1). These shRNAs successfully targeted 147 lincRNAs and reduced their expression by an average of ~75% compared to endogenous levels in ESCs (see Methods, Figure 1a, Supplemental Figure 1, Supplemental Table 2). As positive controls, we generated shRNAs targeting ~50 genes encoding regulatory proteins, including both transcription and chromatin factors that have been shown to play critical roles in ESC regulation17,20,23; validated hairpins were obtained against 40 of these genes (Supplemental Table 2). As negative controls, we performed independent infections with lentiviruses containing 27 different shRNAs with no known cellular target RNA.
We infected each shRNA into ESCs, isolated RNA after 4 days, and profiled their effects on global transcription by hybridization to genome-wide microarrays (Figure 1a, see Methods). We employed a stringent procedure to control for non-specific effects due to viral infection, generic RNAi responses, or ‘off-target’ effects. Expression changes were deemed significant only if they exceeded the maximum levels observed in any of the negative controls, showed a two-fold change in expression compared to the negative controls, and had a low false discovery rate (FDR) assessed across all genes based on permutation tests (Figure 1b, see Methods). This approach controls for the overall rate of non-specific effects by estimating the number and magnitude of observed effects in the negative control hairpins, where all effects are non-specific.
For 137 of the 147 lincRNAs (93%), knockdown caused a significant impact on gene expression (Supplemental Table 3), with an average of 175 protein-coding transcripts affected (range: 20–936) (Figure 1c, Supplemental Figure 2, Supplemental Table 4). These results were similar to those obtained upon knockdown of the 40 well-studied ESC regulatory proteins: 38 (95%) showed significant effects on gene expression, with an average of 207 genes affected (range: 28 (for DNMT3L) to 1187 (for Oct4)) (Figure 1c, Supplemental Figure 2, Supplemental Table 4). Although some individual lincRNAs have been found to lead primarily to gene repression12,13, we find that knockdown of the lincRNAs studied here largely led to comparable numbers of activated and repressed genes (Supplemental Figure 2, Supplemental Table 4). To further assess ‘off-target’ effects, we also profiled the effects of the second-best validated shRNA targeting 10 randomly selected lincRNA genes. In all cases, second shRNAs against the same target produced significantly similar expression changes (see Methods, Supplemental Table 5). These results indicate that the vast majority of lincRNAs have functional consequences on overall gene expression of comparable magnitude (in terms of number of affected genes and impact on levels) to the known transcriptional regulators in ESCs.
Following the observation that a few lincRNAs act in cis 24,25, some recent papers have claimed that most lincRNAs act primarily in cis8,11,26. We found no evidence to support this latter notion: knockdown of only 2 lincRNAs showed effects on a neighbouring gene, only 13 showed effects within a window of ten genes on either side, and only 8 showed effects on genes within 300 kb; these proportions are no greater than observed for protein-coding genes (Supplemental Figure 3, Supplemental Table 6). In short, lincRNAs appear to affect expression largely in trans.
Our results contrast with a recent study that concluded that lincRNAs act in cis, based on the observation that knockdown of 7 out of 12 lincRNAs affected expression of a gene within 300 kb11. The explanation appears to be that the threshold in the previous study failed to account for multiple hypothesis testing within the local region. Accounting for this, the effects on neighbouring genes are no greater than expected by chance and are consistent with our observations here (see Methods).
While some lincRNAs can regulate gene expression in cis11,24,25, determining the precise proportion of cis regulators requires more direct experimental approaches. We note that our results are consistent with observed correlations between lincRNAs and neighbouring genes2,26, which may represent shared upstream regulation2,12 or local transcriptional effects10,27. In addition, the lincRNAs studied here should be distinguished from transcripts that are produced at enhancer sites8,9, the function of which has yet to be determined.
We next sought to investigate whether lincRNAs play a role in regulating the ESC state. Regulation of the ESC state involves two components, maintaining the pluripotency program and repressing differentiation programs15. To determine whether lincRNAs play a role in the maintenance of the pluripotency program, we studied their effects on the expression of Nanog, a key transcription factor that is required to establish28 and uniquely marks the pluripotent state29,30. We infected ESCs carrying a luciferase reporter gene expressed from the endogenous Nanog promoter31 with shRNAs targeting lincRNAs or protein-coding genes. We monitored loss of reporter activity after 8 days relative to 25 negative control hairpins across biological replicates (see Methods). To ensure that the observed effects were not simply due to a reduction in cell viability, we excluded shRNAs that caused a reduction in cell numbers (see Methods, Supplemental Figure 4, Supplemental Table 7). Altogether, we identified 26 lincRNAs that had major effects on endogenous Nanog levels with many at comparable levels to the knockdown of the known protein-coding regulators of pluripotency such as Oct4 and Nanog (Figure 2a, Supplemental Table 7). This establishes that these lincRNAs have a role in maintaining the pluripotent state.
To further validate the role of these 26 lincRNAs in regulating the pluripotent state, we knocked down these lincRNAs in wild-type ESCs and measured mRNA levels of pluripotency marker genes Oct4, Sox2, Nanog, Klf4, and Zfp42 after 8 days. In all cases we observe a significant reduction in the expression of multiple pluripotency markers with >90% showing a significant decrease in both Oct4 and Nanog levels (Supplemental Figure 5, Supplemental Tables 8,9). To control for ‘off-target’ effects, we studied additional hairpins targeting these lincRNAs. For 15 lincRNAs we had an effective second hairpin. In all 15 cases, the second hairpin produced comparable reductions in Oct4 expression levels, showing that the observations were not due to ‘off-target’ effects (Figure 2b, Supplemental Table 10). Notably, >90% of lincRNA knockdowns affecting Nanog reporter levels led to loss of ESC morphology (Figure 2c, Supplemental Figures 6, 7). Thus, inhibition of these 26 lincRNAs lead to an increased exit from the pluripotent state.
To determine if lincRNAs act in repressing differentiation programs we compared the overall gene expression patterns resulting from knockdown of the lincRNAs to published gene expression patterns resulting from induced differentiation of ESCs32,33 and assessed significance using a permutation-derived FDR34 (see Methods). These states include differentiation into endoderm, ectoderm, neuroectoderm, mesoderm, and trophectoderm lineages. As a positive control for our analytical method, we confirmed the expected results that the expression pattern caused by Oct4 knockdown was strongly associated with the trophoectoderm lineage35 and the pattern caused by Nanog knockdown was strongly associated with endoderm differentiation30 (Figure 3a).
Using this approach, we identified 30 lincRNAs whose knockdown produced expression patterns similar to differentiation into specific lineages (Supplemental Table 11). Amongst these lincRNAs, 13 are associated with endoderm differentiation, 7 with ectoderm differentiation, 5 with neuroectoderm differentiation, 7 with mesoderm differentiation, and 2 with the trophectoderm lineage (Figure 3a). Consistent with these functional assignments, we observe that the majority (>85%) of the 30 lincRNAs associated with specific differentiation lineages showed upregulation of the well-known marker genes for the identified states17,32 upon knockdown (such as Sox17 (endoderm), Fgf5 (ectoderm), Pax6 (neuroectoderm), Brachyury (mesoderm), and Cdx2 (trophectoderm)) (Figure 3b, Supplemental Figures 8, 9, Supplemental Tables 12, 13).
The fact that knockdown of these 30 lincRNAs induces gene expression programs associated with specific early differentiation lineages suggests that these lincRNAs normally are a barrier to such differentiation. Interestingly, most of the lincRNA knockdowns (~85%) that induce gene expression patterns associated with these lineages did not cause the cells to differentiate as determined by Nanog reporter levels (Supplemental Table 7) and Oct4 expression (Supplemental Figure 10). This is consistent with observations for several critical ESC chromatin regulators, such as the polycomb complex; loss-of-function of these regulators similarly induce lineage-specific markers without causing differentiation18,36,37.
Together, these data indicate that many lincRNAs play important roles in regulating the ESC state, including maintaining the pluripotent state and repressing specific differentiation lineages.
Having demonstrated a functional role for lincRNAs in ESCs, we sought to integrate the lincRNAs into the molecular circuitry controlling the pluripotent state. First, we explored how lincRNA expression is regulated in ESCs. Toward this end, we utilized published genome-wide maps of 9 pluripotency-associated transcription factors (TFs)16,38 and determined whether they bind to the promoters of lincRNA genes. Of the 226 lincRNA promoters ~75% are bound by at least one of 9 pluripotency-associated TFs (including Oct4, Sox2, Nanog, cMyc, nMyc, Klf4, Zfx, Smad, and Tcf3) with a median of 3 factors bound to each promoter (Figure 4a, Supplemental Figure 11, Supplemental Table 14), comparable to the proportion reported for protein-coding genes16. Interestingly, the 3 core factors (Oct4, Sox2, and Nanog) bind to the promoters of ~12% of all ESC lincRNAs and ~50% of lincRNAs involved in the regulation of the pluripotent state.
To determine if lincRNA expression is functionally regulated by the pluripotency-associated TFs, we used shRNAs to knock down the expression of 5 of the 9 pluripotency-associated TF genes for which we could obtain validated hairpins and profiled the resulting changes in lincRNA expression after 4 days. Upon knockdown of a TF, ~50% of lincRNAs genes whose promoters are bound by the TF exhibit expression changes (Figure 4a); this proportion is comparable to that seen for protein-coding genes whose promoters are bound by the TF (Supplemental Figure 12). The strong but imperfect correlation between TF-binding and effect of TF-knockdown is consistent with previous observations39 and may reflect regulatory redundancy in the pluripotency network40. In addition, we profiled the knockdown of an additional 7 pluripotency-associated transcription factors (including Esrrb, Zfp42, and Stat3). Altogether, for ~60% of the ESC lincRNAs, we identified a significant downregulation upon knockdown of one of these 11 TFs (Figure 4b, Supplemental Table 15).
After retinoic-acid-induced differentiation of ESCs, the ESC lincRNAs show temporal changes across the time course with ~75% showing a decrease in expression compared to untreated ESCs (Supplemental Figure 13, Supplemental Table 16). Notably, all of the lincRNAs shown to regulate pluripotency are down-regulated upon retinoic acid treatment (Supplemental Figure 13). Our results establish that lincRNAs are direct transcriptional targets of pluripotency-associated TFs and are dynamically expressed across differentiation. Collectively, these results demonstrate that lincRNAs are an important regulatory component within the ESC circuitry.
To explore how lincRNAs carry out their regulatory roles, we studied whether lincRNAs physically associate with chromatin regulatory proteins in ESCs. We previously showed that many human lincRNAs can interact with the polycomb repressive complex4, a complex that plays a critical functional role in the regulation of ESCs18,19. To determine whether the ESC lincRNAs physically associate with the polycomb complex, we crosslinked RNA-protein complexes using formaldehyde, immunoprecipitated the complex using antibodies specific to both the Suz12 and Ezh2 components of Polycomb, and profiled the co-precipitated lincRNAs using a direct RNA quantification method41 (see Methods). We performed immunoprecipitation of the Polycomb complex across 5 biological replicates and 8 mock-IgG controls, and we assessed significance using a permutation test (see Methods, Supplemental Figure 16). Altogether, we identified 24 lincRNAs (~10% of the ESC lincRNAs) that were strongly enriched for both Polycomb components (Figure 5b, Supplemental Table 17).
To determine if lincRNAs interact with additional chromatin proteins, we systematically analysed chromatin-modifying proteins that have been shown to play critical roles in ESCs18–21,42. Specifically, we screened antibodies against 28 chromatin complexes (see Methods, Supplemental Figure 14, Supplemental Table 18) and identified 11 additional chromatin complexes that are strongly and reproducibly associated with lincRNAs (see Methods, Supplemental Figure 15, 16). These chromatin complexes are involved in ‘reading’ (PRC1, Cbx1, and Cbx3), ‘writing’ (Tip60/P400, PRC2, Setd8, ESET, and Suv39h1), and ‘erasing’ (Jarid1b, Jarid1c, and HDAC1) histone modifications, as well as a chromatin-associated DNA binding protein (YY1) (Figure 5a). Altogether, we found that 74 (~30%) of the ESC lincRNAs are associated with at least one of these 12 chromatin complexes (Figure 5b, Supplemental Table 17). While most of the identified interactions are with repressive chromatin regulators, this is likely due to limitations of our selection criteria and available antibodies.
Many lincRNAs are strongly associated with multiple chromatin complexes (Figure 5b). For example, we identified 8 lincRNAs that bind to the PRC2 H3K27 and ESET H3K9 methyltransferase complexes (‘writers’ of repressive marks) and the Jarid1c H3K4 demethylase complex (an ‘eraser’ of activating marks). Consistent with this, the PRC2 and ESET complexes have been reported to bind at many of the same ‘bivalent’ domains21 and to functionally associate with the Jarid1c complex43. Similarly, we identified a distinct set of 17 lincRNAs that bind to the PRC2 complex (‘writer’ of K27 repressive marks), PRC1 complex (‘reader’ of K27 repressive marks), and Jarid1b complex (‘eraser’ of K4 activating marks) (Figure 5b), as well as other functionally consistent ‘reader’, ‘writer’, and ‘eraser’ combinations (Supplemental Figure 17). One of several potential models consistent with this data is that lincRNAs may bind to multiple distinct protein complexes, perhaps serving as ‘flexible scaffolds’ to bridge functionally related complexes as previously described for telomerase RNA44.
To determine if the identified lincRNA-protein interactions have a functional role, we examined the effects on gene expression resulting from knockdown of individual lincRNAs that are physically associated with particular chromatin complexes and from knockdown of genes encoding the associated complex itself (see Methods). For >40% of these lincRNA-protein interactions, we identified a highly significant overlap in affected gene expression programs compared to just ~6% for random lincRNA-protein pairs (see Methods, Supplemental Table 19). Other cases may reflect the limited power to detect the overlaps, because specific lincRNA-protein complexes may be related to only a fraction of the overall expression pattern mediated by the chromatin complex.
Together, these data demonstrate that many ESC lincRNAs physically associate with multiple different chromatin regulatory proteins and these interactions are likely to be important for the regulation of gene expression programs.
While the mammalian genome encodes thousands of lincRNA genes, few have been functionally characterized. We performed an unbiased loss-of-function analysis of lincRNAs expressed in ESCs and show that lincRNAs are clearly functional and primarily act in trans to affect global gene expression. We establish that lincRNAs are key components of the ESC transcriptional network that are functionally important for maintaining the pluripotent state, and that many are down-regulated upon differentiation. The ESC lincRNAs physically interact with chromatin proteins, many of which have been previously implicated in the maintenance of the pluripotent state18,20,21. In addition to chromatin proteins, lincRNAs interact with other protein complexes including many RNA-binding proteins (data not shown).
Our data suggest a model whereby a distinct set of lincRNAs is transcribed in a given cell type and interacts with ubiquitous regulatory protein complexes to form cell-type-specific RNA-protein complexes that coordinate cell-type specific gene expression programs (Figure 6). Because many of the lincRNAs studied here interact with multiple different protein complexes, they may act as cell-type specific ‘flexible scaffolds’44 to bring together protein complexes into larger functional units (Figure 6). This model has been previously demonstrated for the yeast telomerase RNA44 and suggested for the XIST45 and HOTAIR46 lincRNAs. The hypothesis that lincRNAs serve as flexible scaffolds could explain the uneven patterns of evolutionary conservation seen across the length of lincRNA genes3: the more highly conserved patches could correspond to regions of interaction with protein complexes.
While a model of lincRNAs acting as ‘flexible scaffolds’ is attractive, it is far from proven. Testing the hypothesis for lincRNAs will require systematic studies, including defining all protein-complexes with which lincRNAs interact, determining where these protein interactions assemble on RNA, and ascertaining whether they bind simultaneously or alternatively. Moreover, understanding how lincRNA-protein interactions give rise to specific patterns of gene expression will require determination of the functional contribution of each interaction and possible localization of the complex to its genomic targets.
We cloned 5 shRNAs targeting each lincRNA into a puromycin-resistant lentiviral vector22. ESCs were plated on pre-gelatinized 96-well plates and infected with lentivirus prior to addition of irradiated DR4 MEFs. Media containing 1ug/mL puromycin was added 24 hr after infection. On-target knockdown was assessed after 4 days and the best hairpin showing a knockdown >60% was selected. RNA from 147 lincRNAs, 40 protein-coding genes, and 27 negative controls were hybridized to Agilent microarrays. Differentially expressed genes were defined as having an FDR<5% and fold-change >2-fold compared to controls.
Nanog-Luciferase ESCs31 were infected and measured after 8 days. Hits were identified if they reduced luciferase levels (z<−6) across all replicates and did not reduce AlamarBlue levels. Hits were validated in wild-type ESCs by measuring mRNA levels of Oct4, Nanog, Sox2, Klf4, and Zfp42. Oct4 expression was assessed using immunofluorescence staining and morphology was visually assessed.
Lineage expression programs were defined using published datasets (GSE12982, GSE11523, and GSE4082) and curated gene expression signatures32,33. Overlaps in gene expression affects were assessed using a modified GSEA34. Expression changes in lineage markers were determined using qPCR.
ChIP-Seq data was downloaded (GSE11724 and GSE11431), aligned, and analysed. lincRNA promoters were previously defined using H3K4me3 peaks3. Changes in expression of the lincRNAs upon knockdown of the TFs were analyzed using Agilent microarrays.
ESCs were crosslinked with formaldehyde, lysed, immunoprecipitated, washed, and reverse crosslinked. RNA was hybridized to the Nanostring codeset. We tested antibodies for 28 chromatin complexes and selected successful antibodies that (i) had >10 lincRNAs exceeding a 5-fold change and (ii) had significant enrichments across 3 replicates. We compared the overlap in gene expression using a modified GSEA34.
V6.5 (genotype 129SvJae × C57BL/6) and Nanog-Luciferase31 ES cells were co-cultured with irradiated C57BL/6 MEFs (GlobalStem; GSC-6002C) on pre-gelatinized plates as previously described47. Briefly, cells were cultured in mES media consisting of knock-out DMEM (Invitrogen; 10829018) supplemented with 10% FBS (GlobalStem; GSM-6002), 1% penicillin-streptomycin (Invitrogen; 15140-163), 1% L-glutamine (Invitrogen; 25030-164), 0.001% Beta-mercaptoethanol (Sigma; M3148-100ML) and 0.01% ESGRO (Millipore; ESG1106).
Using our previous catalogue of K4-K36 defined lincRNAs2 along with the reconstructed full-length sequences we determined using RNA-Seq3, we designed shRNA hairpins targeting each lincRNA identified in both sets. Specifically, we used the conservative K4-K36 definitions from our previous work2 that were expressed in mouse ES cells. We further filtered the list to include only multi-exonic lincRNAs that were reconstructed in mouse ES cells3. Together, this yielded 226 lincRNA genes.
We selected protein coding gene controls consisting of both transcription factors and chromatin proteins. These proteins were selected based on their well-characterized role in regulating mouse ES cells and include Pou5f1 (Oct4)35,48, Sox217,49, Nanog29,30, Stat350, Klf451, and Zfp42 (Rex1)52. In addition, we selected additional transcriptional and chromatin regulators that were identified by RNAi screens as regulators of pluripotency17,20,23 and/or were found in smaller focused studies to have critical roles in the maintenance of the pluripotent state (such as Carm153, Chd154, Thap1155, Suz1218,19,36, and Setdb121,56). A full list is provided in Supplemental Table 2.
For each lincRNA we designed 5 hairpins by extending the previously described design rules22 accounting for the sequence content of the hairpin, miRNA seed matches, uniqueness to the target compared to the transcriptome and the genome, and number of lincRNA isoforms covered.
For each lincRNA we enumerated all 21-mer sub-sequences and scored them as follows: (i) A “clamp score” was computed by looking at the nucleotides at positions 18, 19, and 20. If all three positions contained an A/T it was assigned a score of 4, if two positions were A/T it was assigned a score of 1.5 and if one was A/T it was assigned a score of 0.8. We then looked at positions 16, 17, and 21 if all 3 were A/T it was assigned a score of 1.25, if 2 were A/T it was assigned a score of 1.1, and if 1 was A/T is was assigned a score of 0.8. The clamp score was computed as the product of these two scores. (ii) A “GC score” was computed by looking at the total GC percentage of the 21-mer sequence. If the sequence was <25% GC it was assigned a score of 0.01 if it was <55% it was assigned a score of 3, if it was <60% it was assigned a score of 1, and if >60% it was assigned a score of 0.01. (iii) A “4-mer penalty” of 0.01 was assigned for any hairpin containing the same nucleotide in 4 subsequent nucleotides. (iv) A “7 GC penalty” of 0.01 was assigned to any hairpin containing any 7 consecutive G/C nucleotides. (v) We removed all hairpins containing an A in either position 1 or position 2 of the hairpin. (vi) We removed all hairpins containing a repeat masked nucleotide. (vii) Finally, we computed a “miRNA-seed penalty” by looking at the forward positions 11–17, 12–20, and 13–19 of the hairpin as well as the reverse complement of positions 14–20, 15–21, or 16–21 plus a 3′ C. We then looked up whether these positions matched known miRNA seeds and with what frequency. We computed the scores for the forward and reverse positions and defined the score as the product of the forward and reverse scores. The final score for each hairpin sequence is defined as the product of all seven scores.
We then sorted the candidate hairpin sequences by score, breaking high scoring ties by the total number of lincRNA isoforms that are covered by the hairpin. We then aligned each hairpin sequence against both the genome and the RefSeq-defined transcriptome (NCBI Release 39), and filtered any hairpin with fewer than three mismatches to any other gene or position in the genome. Candidate sequences were chosen for shRNA production by first picking the highest scoring candidate and then proceeding to successively lower scores. As each hairpin was selected, all other hairpins overlapping this hairpin were removed. We repeated this process until we identified 5 hairpins that covered each lincRNA.
We designed 1,143 hairpins targeting 226 lincRNA genes. Of these, we successfully cloned 1,010 hairpins targeting 214 lincRNAs. These hairpins were cloned into a vector containing a puromycin resistance gene and incorporated into a lentiviral vector as previously described22. Briefly, synthetic double stranded oligos that represent a stem-loop hairpin structure were cloned into the second-generation TRC (the RNAi Consortium) lentiviral vector, pLKO.5; the expression of a given hairpin produces a shRNA that targets the gene of interest. Lentivirus was prepared as previously described22. Briefly, 100ng of shRNA plasmid, 100ng of packaging plasmid (psPAX2) and 10ng of envelope plasmid (VSV-G) were used to transfect packaging cells (293T) with TransIT-LT1 (Mirus Bio). Virus was harvested 48 and 70 hours post-transfection. Two harvests were combined. Virus titers were measured as previously described22. Briefly, we measured virus titers by infecting A549 cells with appropriately diluted viruses. 24 hours post infection, puromycin was added to a final concentration of 5ug/ml and the selection proceeded for 48 hours. The number of surviving cells, which is correlated to virus titer, was measured by alamarBlue (BioSource) staining utilizing the Envision 2103 Multilabel plate reader (PerkinElmer)
V6.5 ESCs or Nanog-Luciferase ESCs were plated at a density of 5000 cells/well (8-day time point) or 25,000 cells/well (4-day time point) in 100ul mES media onto pre-gelatinized 96-well dishes (VWR; BD356689). Cells were infected with 5ul of a lentiviral shRNA stock and incubated at 37°C for 30 minutes. Puromycin resistant DR4 MEFs (GlobalStem; GSC-6004G) were then added to the plates at a density of ~6000 cells/well and incubated overnight at 37°C, 5% CO2. After 24 hours, all media was removed from the cells and replaced with media containing 1ug/mL puromycin. Media was then changed every other day with fresh media containing 1ug/mL puromycin. The end-point depended on the assay and was either 4-days post infection (knockdown validation and microarrays) or 8-days (reporters and qPCR of marker genes).
ES cells were infected and lysed at day 4 with 150ul of Qiagen’s RLT buffer and 3 replicates of each virus plate were pooled for RNA extraction using Qiagen’s RNeasy 96-well columns (74181). RNA extraction was completed following Qiagen’s RNeasy 96-well protocol with the following modifications: 450ul of 70% ethanol was added to 450ul total lysate prior to the first spin. An additional RPE wash was added to the protocol, for a total of 3 RPE washes.
lincRNA primers were designed using primer3 (http://frodo.wi.mit.edu/primer3/). Specifically, we designed primers spanning exon-exon junctions by specifying each of the regions as preferred inclusion regions in the primer3 program. When a low scoring primer pair (primer penalty <1) was available it was used. If none was available, we then identified all primers that contained amplicons that spanned an exon-exon junction. In a few cases, when we could not identify a primer pair spanning an exon-exon junction, we designed primers within an exon of the lincRNA. For each primer pair, we tested the specificity against the transcriptome57 (Ref Seq NCBI Release 39) and the genome (Mouse MM9) using the isPCR (http://genome.ucsc.edu/cgi-bin/hgPcr) program. Specifically, we required that the primer pair amplify the lincRNA gene and no other genomic of gene amplicon.
For each primer pair, we validated the quantification and specificity prior to use. Specifically, we tested primers in qPCR reactions using a dilution series of mouse ES cDNA including a no reverse transcriptase (RT) sample. We excluded any primer that did not have robust quantification across a 64-fold dilution curve, had high signal in the no RT sample, or had low detectable expression in the undiluted sample (cycle number >34). For primers that failed this validation we redesigned and tested new primers.
To determine if lincRNA hairpins were effective at knocking down the lincRNA of interest, we infected each hairpin into mouse embryonic stem cells, selected for lentiviral integration, and measured changes in the targeted lincRNA expression level. We isolated total cellular RNA after 4 days; this time-point was chosen to allow for identification of robust changes while minimizing secondary effects due to differentiation of the ES cells. We reasoned that this would allow us to determine more direct effects due to RNAi rather than to differentiation.
Gene panels were constructed that contained all 5 hairpins targeting a gene along with an empty vector control pLKO.5-nullT and the GFP-targeting hairpin clonetechGfp_437s1c1. cDNA was generated using 10ul of RNA and 10ul of 2x cDNA master mix containing 5x Transcriptor RT Reaction Buffer (Roche), DTT, MMLV-RT (Roche), dNTPs (Agilent; 200415-51), Random 9-mer oligos (IDT), Oligo-dT (IDT) and water. cDNA was diluted 1:9 and quantitative PCR was performed using 250 nM of each primer in 2x Sybr green master mix (Roche) and run on a Roche Light-Cycler 480. Target lincRNA expression and GAPDH levels were computed for each panel. lincRNA expression levels were normalized by GAPDH levels and this normalized value was compared to the reference control hairpins within the panel. Knockdown levels were computed as the average of the fold decrease compared to the two control hairpins. Hairpins showing a knockdown greater than 60% of the endogenous level were considered validated and the best validated hairpin from a lincRNA panel was selected for microarray studies.
To assess the effects of a lincRNA on gene expression, we profiled the changes in gene expression after knocking down each lincRNA gene. Specifically, for each lincRNA with at least 1 validated hairpin we profiled the genome-wide expression level changes after knockdown across 2 independent infections (see above). To control for expression changes due to viral infection, we performed five independent infections containing no RNAi hairpin (pLKO.5-nullT). This control hairpin was embedded in each RNA prep plate. To control for effects due to an off-target RNAi effect, we profiled 27 distinct negative control hairpins which do not have a known target in the cell. These hairpins included 6 RFP hairpins, 10 GFP hairpins, 6 Luciferase hairpins, and 5 LacZ hairpins. These hairpins provide a measurement of the variability of the RNAi response triggered due to non-specific effects. Furthermore, we profiled hairpins targeting 147 lincRNAs, including 10 with a second best hairpin, and 40 protein-coding genes in biological replicate. The hairpins and their replicates were randomly distributed across 7 96-well plates and prepared in batches. Each RNA preparation batch contained 1 pLKO hairpin and 1 clonetechGfp_437s1c1 hairpin in a random location on the plate. To minimize batch effects, the plate locations of the biological replicates were scrambled and the positions within the plates were scrambled for all hairpins and replicates.
Using Agilent’s One-Color Quick Amp Labelling kit (5190-0442), we amplified and labelled total RNA for hybridization to prototype mouse lincRNA arrays (G4140-90040) according to manufacturer’s instructions with a few variations. The custom Agilent SurePrint G3 8×60K mouse array design used for this study (G4102A, AMADID 025725 G4852A) has probes to 21,503 Entrez genes and 2,230 lincRNA genes. A new updated version of this mouse design is commercially available that contains probes to 34,017 Entrez gene targets as well as 2,230 lincRNA genes (G4825A). The cRNA samples were prepared by diluting 200ng of RNA in 8.3ul water and adding positive control one-color RNA spike-in mix (Agilent, 5188-5282) that was diluted serially 1:20, then 1:25 and finally 1:10. We annealed the T7 promoter primer from the kit by incubating at 65°C for 10 minutes. We prepared the cDNA master mix and added it to the annealed RNA and incubated at 40°C for 2 hours, followed by 65°C for 15 minutes. We prepared the cRNA transcription master mix and added it to the cDNA and incubated at 40°C for 2 hours protected from light. We purified the labeled cRNA using Qiagen’s RNeasy 96-well columns (Qiagen, 74181) by adding 350ul of Qiagen RLT (without BME) to the cRNA followed by the addition of 250ul of 95% ethanol before applying to the plate column. After a 4 minute spin at 6000RPM, we washed the columns 3 times with 800ul buffer RPE. We dried the columns by spinning for 10 minutes and eluted the cRNA with 50ul of water. We measured the cRNA yield and dye incorporation using the Nanodrop 8000 Microarray measurement setting. We mixed 600ng of cRNA with blocking agent and fragmentation buffer (Agilent, 5190-0404) and fragmented for 30 minutes in the dark at 60°C. We added 2x Hybridization Buffer to each sample and loaded 40ul onto an 8-pack Hybridization gasket. We placed the microarray slides on top, sealed in the Hybridization Chamber, and incubated for 18 hours at 65°C. We washed the slides for 1 minute in room temperature GE Wash Buffer 1 and then for 1 minute in 37°C GE Wash Buffer 2 (Agilent 5188-5327, no triton addition). We scanned the microarrays using an Agilent Scanner C (G2565CA) using the following settings: Dye Channel = Red&Green, Scan Region = ScanArea (61×21.6mm), Scan Resolution = 3 μm. We prepared all of the samples simultaneously using homogenous master mixes to limit variability. Fragmentation and hybridization was staggered over time in batches of 3 to 4 slides (24 to 32 samples).
Each array was processed and data extracted using the Agilent feature extraction software (G4462AA, Version 10.7.3). Samples were retained if they passed all the following quality control statistics:
Gene expression values were determined using the gProcessedSignal intensity values. Probes were flagged if they were not detectable well above background or had an expression level lower than the lowest detectable spike-in control value. The values were floored across all samples by taking the maximum of the minimum non-flagged values across all experiments. Any value less than this maximum value were set to the maximum. This conservatively eliminates any detection variability across the samples due to stringency or other array variables.
The result of this is a single value for each probe per array. To normalize expression values across arrays, we performed quantile normalization as previously described58. Briefly, we ranked each array from lowest to highest expression. For each rank, we computed the average expression and each experiment with this value at the associated rank. For each probe, we computed the difference between the second smallest expression value and the second largest expression value. If this difference was less than 2, we filtered the probe. This metric was chosen to eliminate bias due to single sample outliers.
To control for effects due to non-specific effects of shRNAs, we profiled 27 distinct negative control hairpins which do not have a known target in the cell. These hairpins provide a measurement of the variability of the expression profiles due to random variability or triggered by ‘off-target’ effects of the shRNA lentiviruses. Assuming that any observed effects in the negative control hairpins are due to ‘off-target’ effects and observed effects in the targeting hairpins include a mix of both ‘off-target’ effects and ‘on-target’ effects, we use permutations of the negative controls to assign a false discovery rate (FDR) confidence level for being an ‘on-target’ hit to each gene. As such, a gene would only reach genome-wide significance if the number of genes and scale of the effect was much larger than would be observed randomly among all of the expression changes found for the negative control hairpin.
Specifically, for each gene we computed a t-statistic between shRNAs targeting the lincRNA and control shRNA samples. To assess the significance of each gene we permuted the sample and control groups retaining the relative sizes of the groups and computing the same t-statistic. We then assigned an FDR value to each gene by computing the average number of values in the permuted t-statistics that were greater than the observed value of interest and divided this by the number of all observed t-statistics that were greater than the observed value. We defined genes as significantly differentially expressed if the FDR was <5% and the fold-change compared to the negative controls was >2-fold. Using this approach, an effect would only reach a significant FDR if the scale is significantly larger than would be observed in the negative controls. Knockdown of a lincRNA was considered to have a significant effect of gene expression if we identified at least 10 genes that had an effect that passed all of the criteria.
We identified neighbouring genes based on the RefSeq genome annotation57 (NCBI Release 39). We excluded from analysis all RefSeq genes that corresponded to our lincRNA of interest but included all other coding and non-coding transcripts. We identified a significant hit as any lincRNA affecting a neighbour within 10 genes on either side with an FDR<.05 and 2-fold expression change. To compute the closest affected neighbour, we classified all genes affected upon knockdown of the lincRNAs using the same criteria above. We computed the distance between each affected gene and the locus of the lincRNA gene (and protein-coding gene) that was perturbed and took the minimum absolute distance across all affected genes.
To determine the expected number of differentially expressed “neighbouring” genes occurring by chance assuming that the knockdown has no effect on gene expression, we calculated the average number of genes in a 300Kb window around a randomly selected gene in the human and mouse genome. We calculated this to be 11.2 (human) and 11.8 (mouse). For simplicity, we will conservatively round this down to 11. Assuming that no genes are changing between the knockdown and control, using a nominal p-value, which has a uniform distribution under the null hypothesis (nothing effected), we would expect to see a difference called in 5% of cases at a p-value of 0.05. If we test one locus, which has on average 11 neighbours we would expect to identify 0.55 hits by chance (11 × 0.05=0.55). However, if we now test 12 loci we would expect to see 6.6 (12 × 0.55) knockdowns which appear to have an effect under the null hypothesis.
ES cells containing a Nanog-Luciferase construct31 were infected in biological duplicate and monitored after 7 days. Luciferase activity was measured using Bright-Glo (Promega). All reagents and cells were equilibrated to room temperature. 100ul Bright-Glo solution was added to each plate well. Plates were incubated in the dark at room temperature for 10 minutes and luciferase was measured on a plate reader. The luciferase units were normalized to the control hairpins and a z-score compared to the negative controls (excluding luciferase hairpins) was computed. For each hairpin, we computed a Z-Score relative to the negative control hairpins and identified hits reducing Luciferase levels more than 6 standard deviations (Z<−6) for both independent replicates. In all cases we were able to identify a significant reduction in luciferase levels when using distinct hairpins targeting luciferase. To exclude hits that were due to an overall reduction in proliferation (which would also cause a reduction of Nanog positive cells in this read-out) we excluded all hairpins that caused a reduction in proliferation as measured by AlamarBlue incorporation (described below). AlamarBlue incorporation was measured in the same cells immediately before reading out Nanog-Luciferase levels.
After a 7 day infection, Nanog-Luciferase cell viability was measured using AlamarBlue (Invitrogen; DAL1025). AlamarBlue was mixed with mES media in a 1:10 ratio, added to the cells and incubated at 37°C for 1 hour. Absorbance readings at 570nm were taken. To control for possible effects due to virus titer, we measured AlamarBlue incorporation on both puromycin treated and non-puromycin treated samples for each infection.
V6.5 ESCs were infected with shRNAs targeting lincRNAs, protein-coding genes, and 21 negative controls. After 8-days, RNA was extracted and mRNA levels of the Oct4, Nanog, Sox2, Klf4, and Zfp42 pluripotency markers were analysed using qPCR. Primer sequences are listed in Supplemental Table 9. Each sample was normalized to Gapdh levels. Significance was assessed compared to the negative control hairpins using a one-tailed t-test.
To control for ‘off-target’ effects, we analysed additional hairpins against the 26 lincRNAs affecting Nanog-Luciferase levels. Of the 26 lincRNAs, we identified 15 lincRNAs that contained an additional hairpin that reduced lincRNA expression by >50%. V6.5 ESCs were infected with the best and additional hairpin across biological replicates for these 15 lincRNAs and 21 negative control hairpins. RNA was extracted after 8 days and Oct4 expression levels were determined using qPCR. Significance was assessed relative to the negative controls using a one-tailed t-test.
We crosslinked cells in 4% paraformaldehyde for 15 minutes, and washed in 1x PBS three times. To permeabilize the cells, we washed with 1x PBS + 0.1% Triton and then blocked in 1x PBS + 0.1% Triton + 1% BSA for 45 minutes at room temperature. We incubated cells with α-Pou5f1 antibody (Santa Cruz: SC-9081) at 1:100 dilution in blocking solution for 1.5 hours at room temperature and then washed in blocking solution three times. Next, we incubated cells in α-rabbit secondary antibody coupled to GFP (Jackson ImmunoResearch: 111-486-152) at a dilution of 1:1000 in blocking solution for 45 minutes. Finally, we thoroughly washed cells in blocking solution three times, and added vectashield containing DAPI (VWR: 101098-044) to each well.
Traditionally, lineage markers are used to identify changes in phenotypic states. While these markers can be good indicators of differentiation potential, there are two major limitations with this approach. First, there are multiple genes that are associated with each lineage so simply looking at one can often be misleading. Second, this approach only works for classifying states with well-characterized marker genes but would not work for a comprehensive characterization of the function in the cell. Therefore, we decided to take a different approach and look at the entire gene expression profile of each lincRNA knockdown to determine what cell state each lincRNA resembles.
We curated a set of ES perturbations and differentiation states from publicly available sources. Specifically, we utilized the NCBI e-utils (http://eutils.ncbi.nlm.nih.gov/) to programmatically identify all published datasets containing keywords associated with embryonic stem cells. We filtered the list to only include mouse data sets that were generated across one of three commercial array platforms (Affymetrix, Agilent, and Illumina). Following this approach, we manually curated the list to include datasets associated with ESC perturbations (genetic deletions, RNAi, or chemical perturbations) and differentiation or induced differentiation profiles. This curation yielded 41 GEO datasets corresponding to >150 samples.
Specifically, we defined differentiation lineage states using the following datasets.
In addition, for all lineage states we utilized a curated discrete gene expression signature of differentiation which was previously functionally tested and shown to correspond specifically to differentiation into the associated states63.
To determine relationships between lincRNA knockdowns and functional states, we employ a modified Gene Set Enrichment Analysis34 approach that accounts for the continuous nature of the two datasets, similar to previously described extensions34,64,65. For each lincRNA knockdown by functional pair we compute a continuous enrichment score. Specifically, (i) for each lincRNA knockdown we compute a normalized score matrix compared to a panel of negative control hairpins by computing a t-statistic for each gene between the replicate lincRNA knockdown expression values and the control knockdown values. (ii) For each experiment, we sort the matrix by the normalized score such that the most differentially expressed upregulated gene is first and the most differentially expressed downregulated genes is last. Using this ordering we sort the functional dataset such that the ordering corresponds to the differential rank of the lincRNA knockdown set. (iii) We compute a score Si as the running average of values from the first position to position i. We then define the enrichment score E as the maximum of the absolute value of Si for all values of i>10. We require i>10 to avoid small fluctuations in the beginning of the ranked list causing fluctuations in the enrichment score. This score is computed for each lincRNA knockdown by functional set. Since we have many lincRNA knockdowns and functional sets, in reality we have a matrix of scores and we will refer to the enrichment score of the ith knockdown and jth functional set as Eij.
To assess the significance of these scores, we compute a permutation derived false discovery rate and assign a confidence value for each projection. Specifically, to assess the significance of Eij, we permute the lincRNA knockdown samples and control samples and compute the enrichment score for each pair across all permutations. To account for the false discovery rate associated with many lincRNAs and functional sets, we use the values of all permutations directly to assess the FDR level of Eij. Specifically, to assess the FDR for each enrichment value Eij, we accumulate all the permutation values for all lincRNA knockdowns and functional sets and compute the number of values greater than Eij as well as a vector of values greater than Eij corresponding to each permutation. The FDR is computed as the average number of permuted values greater than Eij divided by the observed number greater than Eij. Using this approach, we assign an FDR value to each lincRNA knockdown by functional set and identify significant hits as those with an FDR<0.01.
To highlight the accuracy of this approach, we observed that for publicly available gene perturbations for which we also perturbed the gene we were able to identify a significant association of target genes in ~75% of cases. While the remaining few did not pass our conservative significance criteria, they also showed increased enrichments consistent with their common effects. In addition, the projected effects are highly reproducible across distinct experiments originating from many groups and across multiple expression platforms. Highlighting the specificity of this approach, we note that there are many profiles for which no lincRNA had a similar effect.
To determine whether independent hairpins targeting the same lincRNA gene share common gene targets, we computed a continuous enrichment score described above. Briefly, we computed a t-statistic for both hairpins against the negative controls. We then took the second best hairpin and sorted the genes. We scored the best hairpin affected genes based on this ranked order. We assessed the significance of this enrichment by permuting the samples and controls and assigned an FDR of the overlap of the expression effect (as described above).
Discrete gene sets were analysed using the Gene Set Enrichment Analysis with a slight modification to the scoring procedure to be more analogous to our continuous scoring procedure (described above). Specifically, we computed the average of the expression changes (defined by the t-statistic) for all genes within the discrete gene set upon knockdown63. Significance was assessed by permuting the control and sample labels and recomputing the average statistic for each permutation. The FDR was assessed off of these values as described above.
We curated lineage marker gene sets from published work and publicly available sources17,32,63. We identified lineage marker genes as significantly upregulated using the differential expression criteria outlined above. We validated the expression of these lineage marker genes for a selected set of lineage marker genes using qPCR (as described above) after a 4-day infection. Specifically, we looked at the expression of FGF5 (ectoderm), Sox1 (neuroectoderm), Sox17 (endoderm), Brachyury (mesoderm), and Cdx2 (trophectoderm). Primer sequences are listed in Supplemental Table 9. Expression estimates were normalized to Gapdh and compared to a panel of 25 negative control hairpins.
We obtained genome-wide transcription factor binding data in mouse ES cells from 2 sources. The transcription factors Oct4, Sox2, Nanog, and Tcf3 were downloaded from the Gene Expression Omnibus (GSE11724) and the cMyc, nMyc, Zfx, Stat3, Smad1, Klf4, and Esrrb from GEO (GSE11431). For each ChIP-Seq dataset, the raw reads were obtained from the SRA (http://www.ncbi.nlm.nih.gov/sra) and processed as follows. (i) The reads were all aligned to the mouse genome assembly (build MM9) using the Bowtie aligner66, requiring a single best placement of each read. All reads with multiple acceptable placements were removed from the analysis. (ii) Binding sites were determined from the aligned reads using the MACS67 (http://liulab.dfci.harvard.edu/MACS/) algorithm using the default parameters with –mfold 8 to account for varying read counts in the libraries. (iii) lincRNA promoter regions were defined as previously described2,3 using the location of the K4me3 peaks overlapping or within 5Kb of the transcriptional start site determined by RNA-Seq reconstruction. (iv) The transcription factor binding locations and lincRNA promoter locations were intersected and the enrichment level of the peak overlapping a lincRNA promoter was assigned transcription factor binding enrichment for each lincRNA. We defined transcription factor binding locations for protein-coding genes in a comparable way. (v) To exclude the possibility that some of this binding might be due to transcription factor binding at distal enhancers, we excluded all binding events that showed evidence of P300, a protein associated with active enhancers68, localization. Altogether, we only identified ~5% of promoters overlapping with any P300 enrichment signal, a slightly lower percentage than identified for protein-coding gene promoters with detectable P300 signal.
lincRNA probes on the Agilent microarray were analysed using the differential expression methodology described above after knockdown of the transcription factor and comparison to the negative control hairpins. To confirm the expression changes of these lincRNAs, we hybridized 12 transcription factor knockdowns on a custom lincRNA codeset using the Nanostring nCounter assay41 (LIN-MES1-96). The knockdowns were profiled in biological duplicate along with 15 negative controls. Regulated lincRNAs were identified using the differential expression approach described above.
Nanostring probes against lincRNA genes were designed following the standard nanostring design principles with the following modifications specifically for the lincRNA probes. (i) To exclude possible cross-hybridization, probes were screened for cross-hybridization against both the standard mouse transcriptome as well as a background database constructed from all the lincRNA sequences. (ii) To account for isoform coverage, a first pass design attempted to select a probe that would target as many isoforms as possible for each lincRNA. In cases where it was not possible to target all isoforms for a given lincRNA, the probe that targeted the largest number was selected, and additional probes were chosen when possible to target the remaining isoforms. (iii) The standard restrictions on Tm and sequence composition were relaxed to include probes for as many lincRNAs as possible.
V6.5 cells were cultured on gelatin-coated dishes in mES media in the absence of LIF. 5μM of retinoic acid was added daily and cell samples were taken daily for 6 days. RNA was extracted using Qiagen’s RNeasy spin columns following the manufacturer’s protocol.
30ug of mESC nuclear protein extracts were run on 10% Bis-Tris gels (Invitrogen NP0316BOX) in MOPS buffer (Invitrogen NP0001) at 75 volts for 20 minutes followed by 120 volts for 1 hour. Gels were incubated for 30 minutes in 20% methanol transfer buffer (Invitrogen NP0006-1) and transferred onto PVDF membranes (Invitrogen 831605) at 20 volts for 1 hour using the Bio-Rad semi-dry transfer system (170–3940). Membranes were blocked in Blotto (Pierce, 37530) at room temperature for 1 hour. Antibodies were diluted in Blotto and membranes were incubated overnight at 4°C. Antibodies were diluted in using the following concentrations. Ezh2 1:2000, Suz12 1:5000, hnRNPH 1:1000, Ruvbl2 1:1000, Jarid1b 1:500, HDAC1 1:250, Cbx6 1:500, YY1 1:500. All antibodies tested were raised in rabbit. The next day, membranes were washed 3x in 0.1% TBST for 5 minutes each. The membranes were probed with anti-Rabbit-horse radish peroxidase (GE Healthcare; NA9340V) at a 1:10,000 dilution, washed 3x in 0.1% TBST, incubated in ECL reagent (GE Healthcare RPN2132), and exposed.
V6.5 mES cells were fixed with 1% formaldehyde for 10 minutes at room temperature, quenched with 2.5M glycine, washed with 1x PBS (3x) harvested by scraping, pelleting, and resuspended in modified RIPA lysis buffer (150mM NaCl, 50mM Tris, 0.5% Sodium deoxycholate, 0.2% SDS, 1% NP-40) supplemented with RNase inhibitors (Ambion, AM2694) and protease inhibitors. For UV crosslinking experiments, cells were irradiated with 254nm UV light. Cells were kept on ice and crosslinked in 1x PBS using 400,000 μjoules/cm2.
Cell suspension was sonicated using Branson 250 Sonifier for 3 × 20 s cycles at 20% amplitude. 10ul of Turbo DNase (Ambion, AM2238) was added to sonicated material, incubated at 37°C for 10 minutes, and spun down at max speed for 10 minutes at 4°C. Protein-G beads were washed and pre-incubated with antibodies for 30 minutes at room temperature. Lysate and beads were incubated at 4°C for 2 hours. Beads were washed 3x using the following wash buffer (1x PBS, 0.1% SDS, 0.5% NP-40) followed by 2x using a high salt wash buffer (5x PBS, 0.1% SDS, 0.5% NP-40) and crosslinks were reversed and proteins were digested with 5ul proteinase-K (NEB, P8102S) at 65° for 2–4 hours. RNA was purified using phenol/chloroform/isoamyl alcohol and RNA was precipitated in isopropanol.
500ng of total RNA was hybridized for 17 hours using the lincRNA codeset. The hybridized material was loaded into the nCounter prep station followed by quantification on the nCounter Digital Analyzer following the manufacturer’s protocol. For RNA immunoprecipitation experiments, we used a modified protocol. After reverse crosslinking, RNA was extracted using phenol/chloroform and ethanol precipitation methods and resuspended in 10ul of H2O. 5ul of the eluted material was hybridized for 17 hours using the lincRNA codeset.
Probe values were normalized to negative control probes by dividing the value of the probe by the maximum negative control probe. Probe values were floored to a normalized value of 3 (3-fold higher than maximum negative control). Probes with no value greater than this floor across all samples were removed from the analysis. The values were log transformed. To control for variability between runs and different input material amounts, we normalized all samples simultaneously using the quantile normalization approach described above. The result is a set of normalized log-expression values for each probe normalized across all experiments.
To validate our formaldehyde based RNA immunoprecipitation method we immunoprecipitated the RNA binding protein hnRNPH, which plays a role in mRNA splicing69 and identified the associated RNAs. Consistent with known interactions, we identified a strong enrichment for its binding to intronic regions of mRNA genes. We validated these observed results in mouse ES cells by performing UV-crosslinking experiments70–72 and identified nearly identical results. We identified a similar correlation between the UV and formaldehyde crosslinked samples as for biological replicates of UV crosslinked samples and formaldehyde crosslinked samples and highly comparable enrichments (data not shown).
We selected chromatin proteins that have been implicated in regulation of the pluripotent state along with their known associated ‘reader’, ‘writer’, and ‘eraser’ complexes. Specifically, we tested antibodies against 40 chromatin proteins, corresponding to 28 chromatin complexes. In many cases, we tested multiple antibodies against the same target protein to try and identify an antibody that worked well for immunoprecipitation. A full list of tested complexes and their associated antibodies are listed in Supplemental Table 18.
We tested each antibody using formaldehyde crosslinked cells and had a two-step procedure for considering an antibody successful. (i) We tested all selected antibodies in batches, with each batch containing a mock-IGG (Santa Cruz) negative control and hnRNPH (Bethyl) positive control. Batches with variability in either the mock-IGG or hnRNPH controls were excluded and retested. For each successful batch, we computed enrichment for each lincRNA between the tested antibody and mock-IGG. We considered an antibody successful in the first step if the highest enrichment level exceeded a 5-fold change compared to the mock-IGG control and more than 10 lincRNAs exceeded this threshold. While this approach can yield false positives (antibodies that pass but are not efficient) it significantly reduced the number of antibodies to be tested in the next step. (ii) For all antibodies that successfully passed the first criteria, we performed immunoprecipitation on two additional biological replicates along with 4 mock-IGG controls. We computed a t-statistic for each lincRNA compared to the controls and assessed the significance using a permutation test, by permuting the samples and IGG samples (as above). Hits were considered significant if they exceed a t-statistic cutoff of 2 (log scale) compared to the controls and had an FDR<0.2. We allowed a slightly higher FDR cutoff since the number of permutations was far smaller yielding lower power to estimate the FDR. Only antibodies yielding significant lincRNAs were considered successful. In total, we identified 12 of the 28 complexes (55 antibodies) with at least one successful antibody.
To determine the functional overlap between the lincRNA and the chromatin complexes it physically interacts with, we compared the effects on gene expression upon knockdown of the lincRNA and the associated protein complex. To do this, we utilized the gene expression profiles determined for each lincRNA knockdown and knockdowns of 9 of the 12 identified chromatin complexes for which we had good hairpins. We defined each interaction between a lincRNA and protein, and compute a continuous enrichment score, generated all permutations of the control hairpins and sample hairpins and assigned a false discovery rate to the scores (as described above). At an FDR<0.05 we identified 43% of the interactions to be significant. For 69% of the interactions, we were able to identify an overlap at an FDR<0.1.
We thank Dianali Rivera, Thomas Green, Tashfeen Bhimdi, Griet Verstappen, Christine Surka, Serena Silver, Adam Brown, Daniel Lam, and Oren Ram for technical help, Casey Gifford, Stella Markoulaki, and Rudolph Jaenisch for providing cell lines used in this study, Peter Tsang, Bo Curry, Anya Tsalenko, and Agilent Technologies for microarray and technical help, Bart Challis and Active Motif for antibodies, Gary Geiss, Rich Boykin, and Nanostring technologies for technical help, Eric Wang and Chris Burge for help with RNA immunoprecipitation experiments and helpful discussions, Piyush Gupta, Andreas Gnirke, John Cassady, Erez Lieberman-Aiden, Moran Cabili, and Matt Thompson for discussions and ideas, and Leslie Gaffney for assistance with figures. M. Guttman is a Vertex scholar. This work was funded by NHGRI, a Center for Excellence for Genomic Science, the Merkin Foundation for Stem Cell Research, and funds from the Broad Institute of MIT and Harvard.
Author ContributionsM. Guttman and ESL conceived and designed the overall project with help from AM, AR, JLR, and DER, M. Guttman and JD designed experiments with help from JKG, XY (RNAi), BWC (pluripotency assays), and IA (RNA IP), M. Guttman, JD, GM, ABL, RA, GY performed experiments, M. Guttman, JD, M. Garber analysed data, LB, AM, DER provided reagents, M. Guttman and ESL wrote the manuscript.
Microarray data have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE30245.
Reprints and permissions information is available at www.nature.com/reprints.
The authors declare no competing financial interests.