Many tools are available to analyse genomes but are often challenging to use in a cell type–specific context. We have developed a method similar to the isolation of nuclei tagged in a specific cell type (INTACT) technique [Deal,R.B. and Henikoff,S. (2010) A simple method for gene expression and chromatin profiling of individual cell types within a tissue. Dev. Cell, 18, 1030–1040; Steiner,F.A., Talbert,P.B., Kasinathan,S., Deal,R.B. and Henikoff,S. (2012) Cell-type-specific nuclei purification from whole animals for genome-wide expression and chromatin profiling. Genome Res., doi:10.1101/gr.131748.111], first developed in plants, for use in Drosophila neurons. We profile gene expression and histone modifications in Kenyon cells and octopaminergic neurons in the adult brain. In addition to recovering known gene expression differences, we also observe significant cell type–specific chromatin modifications. In particular, a small subset of differentially expressed genes exhibits a striking anti-correlation between repressive and activating histone modifications. These genes are enriched for transcription factors, recovering those known to regulate mushroom body identity and predicting analogous regulators of octopaminergic neurons. Our results suggest that applying INTACT to specific neuronal populations can illuminate the transcriptional regulatory networks that underlie neuronal cell identity.
The nervous system provides a striking example of cellular diversity, with myriad neuronal, glial, and other cell types organized into neural circuits. The identity of these cell types, established during development and maintained throughout adulthood, requires the expression of unique combinations of genes (1,2). These combinations include genes that implement a particular biochemical or signaling function (e.g. ion channels, neurotransmitter receptors) and other regulatory genes (e.g. transcription factors) that control when, where and at what level each gene is expressed (2). Understanding how these transcriptional networks are established and then control the phenotype of a specific cell type is a fundamental problem in modern molecular biology. This challenge also has practical implications for molecular neuroscience, where characterizing the molecular components of individual neuronal cell types will improve our ability to dissect neural circuits.
In principle, genome-wide methods allow systematic characterization of these regulatory networks (3). However, applying these techniques to specific cell types requires a method for isolating a homogeneous population of cells in quantities sufficient to produce a robust signal. Solutions to this problem, particularly for transcript analysis, include cell purification techniques (e.g. fluorescence-activated cell sorting, laser capture microdissection and manual sorting) and biochemical purification strategies that rely on cell type–specific labeling of core machinery, including ribosomes (translating ribosome affinity purification) and the Argonaute complex (microRNA tagging and affinity purification) (4–8). It would, however, be advantageous to use a single isolation method to characterize cell type–specific gene expression, chromatin modifications, transcription factor binding and other types of genome-wide profiles.
One promising approach is the isolation of nuclei tagged in a specific cell type (INTACT) strategy, first described in Arabidopsis and extended to Caenorhabditis elegans (9,10). This method marks the nucleus of a specific cell type with a genetically encoded tag. After these labeled nuclei are purified, cell type–specific transcriptional profiles and chromatin maps can be constructed. Another approach involves the cell type–specific expression of a GFP-histone H2B fusion protein, which was used to isolate nuclei from Drosophila by fluorescence-activated cell sorting (11). Both of these approaches have been used to characterize embryonic mesoderm in Drosophila.
We are interested in adult Drosophila neuronal cell types and would like to take advantage of an extensive collection of GAL4 lines that target sparse subpopulations of neurons (12). Toward this end, we have independently developed an INTACT procedure that permits the isolation of nuclei from the brains of adult flies. Unlike the original INTACT approach, our system does not rely on streptavidin-mediated capture of biotinylated nuclei (9). Instead, nuclei are immunoaffinity purified with magnetic beads coated with an antibody that recognizes our tag. In addition, we describe a rapid isolation procedure that allows the purification of nuclei from adult flies at reasonable yields and with high purity. Finally, because tag expression is driven by the GAL4/UAS system, it can be used in any class of neuron for which a suitable driver is available (13).
We present a proof-of-principle study profiling gene expression (RNA-seq) and histone modifications (ChIP-seq) in three Drosophila neuronal populations ranging in size from 100000 to 130 cells per brain. We describe the observed differential expression profiles in the context of known marker genes. We further describe patterns of differential histone modifications that indicate active promoters (H3K4me3), open chromatin (H3K27ac) and Polycomb group (PcG)-mediated transcriptional silencing (H3K27me3). In particular, we observed strong cell-specific repression of a small number of transcription factors in one population, accompanied by cell-specific activation of the same factors in the other population and a consistent pattern of differential expression. We close by discussing the utility of our approach for characterizing the regulatory networks that control neuronal cell identity.
A synthetic linker encoding the amino acid sequence LAAASGGGGSGGGGSLAAASEFSAAALSGGGGSGGGGSAAAL was inserted into the unc84 (NP_001024707.1) reading frame between amino acid 1111 and the stop codon. Two copies of the superfolder GFP variant were then cloned into the centrally located EcoRI site (amino acids EF in the linker) (14) to produce UNC-84-2XGFP. The UNC-84-tdTomFL construct used the same linker strategy, except that the fluorescent protein cassette carried a restriction site at its 3′ end that allowed the addition of a C-terminal 3XFlag epitope tag.
P(GawB)ey[OK107-GAL4] (#854) and P(Tdc2-GAL4.C)2 (#9313) were obtained from the Bloomington stock center. R57C10-GAL4 is a promoter fusion of the GAL4 coding region and an 824bp upstream fragment of the n-synaptobrevin gene, defined by the primers atttcccaccccttggccatcggca and gttctagagggttgcgctctcagtg, and was constructed as described previously (12). Similarly, both the UAS_unc84-2XGFP and UAS_unc84-tdTomFl cassettes were inserted into the attP2 site using phiC31-mediated recombination (15).
ML-DmBG3-c2 cells were transfected by the Effectene method (Qiagen: 301425) with the same UAS constructs that were used to make transgenic flies. Expression was driven by ubiquitin-GAL4.
300μl of Dynal Protein-G beads (Invitrogen: 100-03D) were incubated with either 5μg of anti-GFP antibody (Invitrogen: G10362) or 10μg of anti-Flag antibody (Sigma: F7425) in 600μl PBS/0.1% Tween-20 for 30min at 4°C to adsorb the antibody. Beads were then washed once in PBS/0.1% Tween-20 and stored in 300μl of 10mM β-glycerophosphate pH7, 2mM MgCl2.
Adult flies were anesthetized with CO2 and flash frozen in liquid N2. Heads were separated from thoracicoabdominal segments, wings and legs by vigorous vortexing followed by separation over dry-ice-cooled sieves. A total of 600–10000 frozen heads were added to 100ml of 10mM β-glycerophosphate pH7, 2mM MgCl2, 5mM sodium butyrate, 1X complete protease inhibitor cocktail (Roche: 11873580001), and the suspension was passed through a Yamato continuous flow homogenizer, set at 100rpm, five to seven times. The homogenate was filtered through Miracloth (EMD Biosciences: 475855) and brought to 0.7mM β-mercaptoethanol and 0.5% NP-40. After six strokes in two 40ml Dounce homogenizers (tight pestle B), 600μl of antibody-adsorbed beads were added to 100ml of lysate. The binding reaction was performed at 4°C for 30min with constant end-over-end agitation. Beads were then collected on a magnet (Invitrogen: 123-02D) and washed three to four times in 50ml of 10mM β-glycerophosphate pH7, 250mM sucrose, 2mM MgCl2, 25mM KCl and 5mM sodium butyrate. Bead-bound nuclei in 20ml of wash buffer were then passed through a 20μm nylon mesh (Small Parts: B001D8ECDE), returned to the magnet stand and resuspended in 1ml of 10mM β-glycerophosphate pH7, 250mM sucrose, 2mM MgCl2, 25mM KCl and 5mM sodium butyrate. If nuclei were to be used for transcript profiling (RNA-seq), sodium butyrate and the protease inhibitor cocktail were omitted from all buffers.
Bead-bound nuclei collected on a magnet stand (Invitrogen: 123-21D), or whole dissected brains, were resuspended in 400μl of 100mM Tris pH7, 4M guanidinium thiocyanate. After 30min of agitation at 4°C (in the case of bead-bound nuclei), the supernatant containing nuclear RNA was removed from the beads and extracted with an equal volume of phenol:CHCl3. After the addition of 0.1 volume of 3M sodium acetate pH5, the sample was extracted with an equal volume of acid phenol:CHCl3 (Invitrogen: AM9722). The aqueous layer was recovered and brought to 400μl by the addition of H2O. The Agencourt RNA-Advantage kit (Beckman Coulter: 47942) was then used to further purify the sample. Briefly, 100μl of the lysis buffer supplied with the kit was added to the aqueous layer from the acid extraction step. After brief centrifugation to remove insoluble material, the samples were processed exactly as directed by the kit’s instructions (including DNaseI treatment). Nuclear RNA (10–50ng) was then converted to complementary DNA using a Nugen Ovation RNA-seq v2 kit (Nugen: 7102). Amplified complementary DNA (2μg) was then sheared in a Covaris S2 instrument (duty cycle=10%; intensity=5; cycles/burst=100; time=5 minutes; volume=120μl). A total of 200ng of sheared DNA was then end-repaired, linker-adapted and sequenced on an Illumina HiSeq 2000 to 50bp read length. The library synthesis steps are exactly those recommended by Illumina in the Genomic DNA Sample Preparation Kit, except that Agencourt AMPure magnetic bead purification was substituted for Qiagen column purification.
Bead-bound nuclei were collected on a magnet stand (Invitrogen: 123-21D) and resuspended in 1ml of 15mM Hepes pH 7, 1mM KCl, 5mM MgCl2, 2mM CaCl2, 340mM sucrose, 0.5mM spermidine, 0.15mM spermine, 5mM sodium butyrate. The sample was then split into two 500μl volumes, and nuclei were digested for 15min at 37°C after the addition of micrococcal nuclease (Worthington: LS004798) to 0.025 units/μl. The reaction was terminated by adding EGTA to 2mM. Nucleosomes were then extracted on ice for 30min in 200–400μl of 15mM Hepes pH 7, 200mM NaCl, 1mM KCl, 5mM MgCl2, 2mM EGTA, 340mM sucrose, 0.5mM spermidine, 0.15mM spermine and 5mM sodium butyrate. The extraction was repeated with the same buffer adjusted to 400mM NaCl. The supernatant from the second extraction was combined with the first and dialyzed for 2 hours at 4°C against 15mM Hepes pH 7, 25mM KCl, 1mM β-mercaptoethanol, 1mM PMSF, 5mM sodium butyrate. More than 70% of the nucleosomes prepared in this manner are mononucleosomes.
The following antibodies were used to detect modified histones: H3K4me3 (Abcam: 8580), H3K27ac (Abcam: 4729) and H3K27me3 (Millipore: 07-449). In all cases, 10μg of antibody was adsorbed to 3mg of Dynal Protein-G beads in 600μl 1XPBS, 5mg/ml BSA for 4–8 hours at 4°C. After the beads were washed 3X on a magnet stand in 1XPBS, 5mg/ml BSA, they were resuspended in 50μl of the same buffer before ChIP.
Purified nucleosomes (1–5μg) were brought to 500μl in 15mM Hepes pH 7, 25mM KCl and 5mM sodium butyrate. Of this, 50μl was removed and stored as the non-enriched input sample, whereas the remaining 450μl portion was adjusted to 600μl by the addition of 150μl of 34mM Hepes pH 7, 9mM EDTA, 4% Triton X-100, 0.4% deoxycholate, 4X complete protease inhibitor cocktail. Finally, 50μl of antibody-adsorbed Dynal Protein-G beads were added to the nucleosome preparation, and ChIP was carried out at 4°C for 12 hours under constant end-over-end agitation. Bead-bound nucleosomes were then washed 8X on a magnet stand in 50mM Hepes pH 8, 1mM EDTA, 1% IGEPAL, 0.7% deoxycholate, 0.5M LiCl, 1X complete protease inhibitor cocktail. After a single wash in TE, beads were pelleted at 4000rpm for 3min in a microcentrifuge and then incubated in 170μl of 1X TE/1% SDS for 30min at 65°C. After brief centrifugation, 150μl of 400μg/ml glycogen, 933μg/ml proteinase K was added to the supernatant fraction, and the sample was incubated at 37°C for 2 hours. Nucleic acid was recovered by extracting the sample once with phenol, followed by an additional extraction with phenol:CHCl3 and precipitation after the addition of NaCl to 0.2M. Finally, the sample was incubated in 50μl of TE containing RNase A at 330μg/ml for 30min at 37°C for 1 hour, followed by purification on Agencourt AMPure magnetic beads (16) (Beckman Coulter: A63880). Enriched immunoprecipitated and non-enriched input DNA were end-repaired, linker-adapted and sequenced on an Illumina HiSeq 2000 to 50bp read length (17). The library synthesis steps are exactly those recommended by Illumina in the Genomic DNA Sample Preparation Kit, except that Agencourt AMPure magnetic bead purification was substituted for Qiagen column purification.
The 5′ ends of all reads were trimmed by five nucleotides to remove artifacts of the Nugen Ovation kit (FASTX; http://hannonlab.cshl.edu/fastx_toolkit). Reads were then aligned to the annotated transcriptome (FlyBase r5.41) (18) of the fly genome (UCSC dm3) using the TOPHAT splice-aware aligner (v1.4.0) (19). Pairs of libraries were analysed using CUFFDIFF v1.3.0 to estimate the abundance of each isoform and identify differentially expressed genes at a 1% false discovery rate (20). Fragment bias correction, multi-hit read correction and a mask of mitochondrial and non-coding transcripts were used to improve the robustness of expression levels, which were estimated as reads per kilobase of exon model per million mapped reads. Genome tracks of RNA-seq reads were created by counting read alignments per genomic position using BEDTools (v2.15) (21) and scaling these counts to 10 million total read alignments using a custom Perl script. Gene ontology analysis was performed with the FlyMine web server (22). A list of candidate transcription factors (n=749) in the Drosophila genome was obtained from FlyTF (23,24).
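The trimming step and the expression unit described above can be illustrated with a short Python sketch. It is not the FASTX or CUFFDIFF implementation: CUFFDIFF estimates isoform abundances with bias and multi-read corrections, whereas the hypothetical `rpkm` function below only applies the textbook definition (reads per kilobase of exon model per million mapped reads) to raw counts, and `trim_5prime` assumes each read is held as plain sequence and quality strings.

```python
def trim_5prime(seq: str, qual: str, n: int = 5) -> tuple[str, str]:
    """Drop the first n bases (and their quality scores) from a read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n:], qual[n:]


def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads (naive count-based form)."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Example: 500 reads on a 2 kb exon model in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Dividing by both exon length and library size is what makes values comparable across genes and across libraries of different depth.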
ChIP-seq and input library reads were aligned to the fly genome using BOWTIE (v0.12.7) (25), keeping only those that mapped uniquely to a single position in the genome. For visualization, the reads were extended to the mean length of the library fragments (200bp), the number of extended reads covering each genomic position was counted using BEDTools (21), and these counts were scaled to a total of 10 million read alignments using a custom Perl script.
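The visualization steps (extension to 200bp, per-position coverage and scaling to 10 million alignments) can be sketched in Python as follows. The original analysis used BEDTools and a custom Perl script; this stand-in assumes each alignment is a 0-based, half-open `(start, end, strand)` tuple on a single chromosome, and all function names are hypothetical.

```python
FRAGMENT_LEN = 200          # mean library fragment length stated in the text
TARGET_TOTAL = 10_000_000   # tracks are scaled to 10 million alignments


def extend_read(start: int, end: int, strand: str, frag_len: int = FRAGMENT_LEN) -> tuple[int, int]:
    """Extend a read in its 3' direction so it spans frag_len bp."""
    if strand == "+":
        return start, start + frag_len
    return max(0, end - frag_len), end


def coverage(reads, chrom_len: int, frag_len: int = FRAGMENT_LEN) -> list[int]:
    """Count extended reads covering each position, using a difference array."""
    diff = [0] * (chrom_len + 1)
    for start, end, strand in reads:
        s, e = extend_read(start, end, strand, frag_len)
        s, e = min(s, chrom_len), min(e, chrom_len)  # clip at chromosome end
        diff[s] += 1
        diff[e] -= 1
    cov, running = [], 0
    for i in range(chrom_len):
        running += diff[i]
        cov.append(running)
    return cov


def scale_to_total(cov: list[int], total_alignments: int, target: int = TARGET_TOTAL) -> list[float]:
    """Rescale counts as if the library contained `target` alignments."""
    factor = target / total_alignments
    return [c * factor for c in cov]


reads = [(0, 50, "+"), (250, 300, "-")]  # toy alignments
cov = coverage(reads, chrom_len=400)
track = scale_to_total(cov, total_alignments=len(reads))
```

Extending each read to the fragment length approximates where the sequenced DNA fragment actually lay, which gives smoother, more centred ChIP profiles than raw read starts.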
We counted the number of reads in each ChIP and input library within a 10kb window scanned across the genome in 5kb increments using BEDTools. These counts were converted to Z-scores using chromosome-specific means and standard deviations of window counts. To compare marks between cell types, differences in corresponding Z-scores were computed and then plotted on a Hilbert curve representing the euchromatic Drosophila genome (2L, 2R, 3L, 3R, 4, X) (26).
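A minimal Python sketch of the sliding-window Z-score comparison, assuming each read is represented by a single genomic position (for example its 5′ end) and that windows which would run past the chromosome end are skipped; the original analysis used BEDTools, and the function names and the population standard deviation (`pstdev`) are assumptions.

```python
import bisect
import statistics

WINDOW = 10_000  # 10 kb window
STEP = 5_000     # scanned in 5 kb increments


def window_counts(positions, chrom_len, window=WINDOW, step=STEP):
    """Count read positions falling in each [start, start + window) window."""
    pos = sorted(positions)
    counts = []
    # Only full-length windows are used (an assumption about edge handling).
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        lo = bisect.bisect_left(pos, start)
        hi = bisect.bisect_left(pos, start + window)
        counts.append(hi - lo)
    return counts


def zscores(counts):
    """Convert one chromosome's window counts to Z-scores."""
    mean = statistics.fmean(counts)
    sd = statistics.pstdev(counts)
    if sd == 0:
        return [0.0] * len(counts)
    return [(c - mean) / sd for c in counts]


def chromosome_zscores(counts_by_chrom):
    """Apply the Z-transform separately to each chromosome arm."""
    return {chrom: zscores(c) for chrom, c in counts_by_chrom.items()}


def differential(z_a, z_b):
    """Window-by-window difference of Z-scores between two cell types."""
    return [a - b for a, b in zip(z_a, z_b)]
```

Normalizing per chromosome keeps arms with very different overall signal (for example chromosome 4 or X) from dominating the genome-wide comparison.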
Each annotated FlyBase isoform was assigned a score representing the intensity of each mark by counting the number of reads mapping to the gene body or promoter (1kb window surrounding the TSS) and converting these counts to Z-scores using the mean and standard deviation of corresponding counts across all genes. These per-gene scores were corrected by subtracting the corresponding Z-score from an input library of the same cell type.
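The per-gene scoring and input correction can be sketched as follows. This is an illustration under stated assumptions, not the authors' Perl code: reads are single positions on one chromosome, the 1kb promoter is taken as TSS ± 500bp, intervals are 0-based and half-open, and all names are hypothetical.

```python
import bisect
import statistics

PROMOTER_HALF_WIDTH = 500  # 1 kb window centred on the TSS (assumed symmetric)


def promoter(tss):
    """Half-open promoter interval around a 0-based TSS coordinate."""
    return max(0, tss - PROMOTER_HALF_WIDTH), tss + PROMOTER_HALF_WIDTH


def count_in(sorted_positions, start, end):
    """Number of read positions in [start, end)."""
    return bisect.bisect_left(sorted_positions, end) - bisect.bisect_left(sorted_positions, start)


def zscores(values):
    """Z-scores across all genes (population SD; zeros if there is no variance)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def input_corrected_scores(chip_positions, input_positions, intervals):
    """Per-gene ChIP Z-score minus the matching input Z-score."""
    chip = sorted(chip_positions)
    inp = sorted(input_positions)
    chip_z = zscores([count_in(chip, s, e) for s, e in intervals])
    input_z = zscores([count_in(inp, s, e) for s, e in intervals])
    return [c - i for c, i in zip(chip_z, input_z)]
```

For H3K4me3 the intervals would be `promoter(tss)` for each isoform; for H3K27ac and H3K27me3 they would be the annotated gene bodies. Subtracting the input Z-score discounts regions that simply yield many reads regardless of antibody.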
Data analysis was performed using a combination of the aforementioned utilities and custom Perl scripts. Plots were made using the R project (R Development Core Team, 2010) and genome landscapes visualized using the Broad Integrated Genomics Viewer (27). Hilbert curves were visualized using the HilbertVis R package (26).
The DNA constructs described in this article are available at Addgene (http://www.addgene.org). All data have been deposited in National Center for Biotechnology Information’s (NCBI) Gene Expression Omnibus (GSE37033).
When nuclei are harvested in the presence of non-ionic detergents, the outer nuclear membrane is stripped away from the nucleus; thus, our strategy takes advantage of the SUN domain family of proteins, which are embedded in the inner nuclear membrane of all eukaryotes (28). We evaluated several candidate SUN domain proteins for their ability to both localize to the nuclear envelope and to have minimal effects on the viability of flies. In the end, we selected a construct based on the C. elegans protein UNC-84 because both the mouse and Drosophila SUN homologues failed to support efficient tag localization in transfected Drosophila cells (29). For a GFP-based tag (UNC84-2XGFP), two copies of the fluorescent protein were used to increase both the antigenicity and brightness of the tag (Figure 1A). A tdTomato-based tag was also constructed that contained a C-terminal 3XFlag epitope tag (UNC84-tdTomFlag) (Figure 1A). In each tag, the fluorescent protein/epitope tag is oriented into the lumenal space of the nuclear envelope, which requires the removal of the outer nuclear membrane for detection (Figure 1A). The expression of both the red and green tags was driven by the GAL4/UAS system, and proper localization at the periphery of the nucleus was observed in both transfected cultured Drosophila cells and in neurons of the adult fly (Figure 1B–G) (13).
We developed a bead-based immunoaffinity purification scheme and tested its yield and purity in a reconstruction experiment. Equal numbers of nuclei from two populations of transfected cultured Drosophila cells, one expressing UNC84-2XGFP and the other UNC84-tdTomFlag, were mixed and subjected to bead-based immunoaffinity purification (Figure 2A, D). As expected, beads adsorbed with α-GFP antibody selectively capture GFP-labeled nuclei (Figure 2B), and α-Flag beads specifically bind nuclei tagged with tdTomatoFlag (Figure 2E). Under conditions in which the beads are not saturated by nuclei, UNC84-2XGFP tagged nuclei are captured more efficiently than UNC84-tdTomFl tagged nuclei, as seen in the unbound fractions of nuclei (compare Figure 2C and F).
An important requirement for our method is the ability to isolate nuclei from flies in which only a small number of nuclei are tagged per brain. For the experiments described in this report, we used three GAL4 driver lines that express in a range of cell numbers per brain. Pan-neuronal expression was driven with the R57C10-GAL4 driver, which uses the neuron-specific enhancer of the n-synaptobrevin gene; OK107-GAL4 was used to target the Kenyon cell population of the mushroom body; and octopaminergic neurons were targeted with a Tdc2-GAL4 line (Figure 2G–I) (12,31,32). To test the sensitivity of our INTACT procedure, we used a bead binding assay (Figure 2J) to quantify yields from flies in which either green or red tag expression was driven by either the pan-neuronal or the octopaminergic driver. We estimate that the R57C10 driver targets 10^5 nuclei and that the Tdc2 driver targets 100–150 nuclei per brain (33,34). When INTACT was performed on 600 pan-neuronally tagged heads, 1.1×10^7 (three trials: 1.5, 1.0, 0.9×10^7) green and 1.6×10^7 (three trials: 2.2, 0.9, 1.8×10^7) red nuclei were recovered at 15–20% yield. The same experiment using the octopaminergic driver resulted in the recovery of 4.1×10^4 (three trials: 4.0, 4.1, 4.1×10^4) green and 1.1×10^4 (three trials: 0.9, 0.9, 1.5×10^4) red nuclei, at approximately 15–50% yield. The lower yields associated with pan-neuronally tagged brains result from saturation of the binding reaction (ratio of nuclei to magnetic beads), whereas the recovery of nuclei from sparsely tagged brains is more efficient, especially when the green tag is used.
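The yield figures follow from dividing the recovered nuclei by the number of tagged nuclei expected in the input. The Python sketch below reproduces that arithmetic for the green tag, assuming that 600 heads were used with both drivers and using the per-brain estimates of about 10^5 (R57C10) and about 130 (Tdc2) tagged nuclei; the function name is hypothetical.

```python
from statistics import fmean


def yield_fraction(recovered_per_trial, heads, tagged_nuclei_per_brain):
    """Mean number of recovered nuclei divided by the expected number of tagged nuclei."""
    expected = heads * tagged_nuclei_per_brain
    return fmean(recovered_per_trial) / expected


# Trial counts quoted above; 600 heads assumed for both drivers.
pan_green = yield_fraction([1.5e7, 1.0e7, 0.9e7], heads=600, tagged_nuclei_per_brain=1e5)
tdc2_green = yield_fraction([4.0e4, 4.1e4, 4.1e4], heads=600, tagged_nuclei_per_brain=130)
print(f"pan-neuronal green: {pan_green:.0%}, Tdc2 green: {tdc2_green:.0%}")
# pan-neuronal green: 19%, Tdc2 green: 52%
```

Such estimates are only as accurate as the per-brain nuclei counts they divide by.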
The specificity of INTACT was measured in a mixing experiment in which UNC84-2XGFP tagged nuclei were mixed with an excess of UNC84-tdTomFl tagged nuclei. The mixture was generated by combining green nuclei obtained from heads with octopaminergic tag expression and red nuclei obtained from an equal number of heads with pan-neuronal tag expression. Thus, the input mixture contained green and red nuclei in a ratio of 130:10^5. After capture of these nuclei with beads adsorbed with an α-GFP antibody, the exact numbers of correctly captured green and incorrectly captured red nuclei were determined. These experiments showed that our technique can recover the approximately 130 Tdc2 cells per brain at 99% purity and 50% yield (Table 1). Because the assay can be scaled to tens of thousands of animals, we can isolate hundreds of thousands of nuclei from flies in which similar numbers of neurons have been tagged.
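Purity, yield and enrichment in the mixing experiment can be related with a few lines of Python. This sketch assumes that purity means the fraction of captured nuclei carrying the green tag and that enrichment is the ratio of that fraction after capture to the green fraction in the input; the enrichment value it prints is derived arithmetic from the stated 130:10^5 ratio and 99% purity, not a number reported in Table 1. Function names are hypothetical.

```python
def target_fraction(target: float, other: float) -> float:
    """Fraction of nuclei that carry the target (green) tag."""
    return target / (target + other)


def recovery(captured_target: float, input_target: float) -> float:
    """Yield: captured target nuclei as a fraction of target nuclei in the input."""
    return captured_target / input_target


def fold_enrichment(purity_after: float, fraction_before: float) -> float:
    """How many times the target fraction increased during capture."""
    return purity_after / fraction_before


# Per-brain input ratio stated above: ~130 green (Tdc2) to ~10^5 red (pan-neuronal) nuclei.
input_fraction = target_fraction(130, 1e5)
enrichment = fold_enrichment(0.99, input_fraction)
print(f"{enrichment:.1f}")  # 762.5
```

Because the head number appears in both the green and the red counts, it cancels, so the per-brain ratio is sufficient.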
One of the main goals of our method is to characterize gene expression in individual neuronal cell types. Nuclear RNA has already been shown to be sufficient for transcriptionally profiling a cell type (9,10), but we performed a series of experiments to confirm that RNA-seq can be performed with nuclei isolated from Drosophila neurons (35). Before doing so, we assessed the performance of our RNA-seq procedure in the absence of INTACT by first profiling whole-cell RNA isolated from whole dissected brains and comparing the resulting expression levels with microarray results in the FlyAtlas compendium (Figure 3A) (36). Of the 27 tissue profiles in FlyAtlas, our brain RNA-seq levels were most correlated with microarray levels measured in the adult brain (Pearson’s r=0.86, Figure 3A), followed by the adult thoracicoabdominal ganglion (r=0.84) and the larval central nervous system (r=0.74). These correlation values are in line with previous studies comparing RNA-seq and microarrays, suggesting that our RNA-seq procedure is valid (37).
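The tissue comparison amounts to computing one Pearson correlation per FlyAtlas profile and ranking the tissues. The Python sketch below assumes that the expression vectors are already matched gene by gene, and it offers an optional log2(v + 1) transform because the text does not state whether values were log-transformed. The names and the toy values are hypothetical.

```python
import math


def pearson_r(x, y, log_transform=False):
    """Pearson correlation of two equal-length, gene-matched expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    if log_transform:
        x = [math.log2(v + 1) for v in x]
        y = [math.log2(v + 1) for v in y]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_matching_tissue(rnaseq, tissues, log_transform=False):
    """Name of the tissue profile most correlated with the RNA-seq levels."""
    return max(tissues, key=lambda t: pearson_r(rnaseq, tissues[t], log_transform))


# Toy, made-up values for three genes.
print(best_matching_tissue([1, 2, 3], {"brain": [2, 4, 6], "gut": [3, 2, 1]}))  # brain
```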
Next, we used the INTACT method with RNA-seq to characterize gene expression in nuclei isolated from all neurons, Kenyon cells and octopaminergic cells, using the R57C10-GAL4, OK107-GAL4 and Tdc2-GAL4 drivers, respectively (Figure 2G–I) (12,31,32). In the first experiment, RNA obtained from bulk neuronal nuclei (pan-neuronal INTACT) was compared with RNA harvested from whole brain (without INTACT), revealing 426 neuronally enriched genes and 440 depleted genes (CUFFDIFF q-value<0.01) (Figure 3B). If INTACT works as anticipated, pan-neuronal nuclear RNA should be enriched in transcripts encoding genes involved in neuronal function and depleted in transcripts known to be expressed in non-neuronal cell types such as glia. Gene ontology (GO) analysis (38) revealed that neuronally enriched genes (pan-neuronal INTACT) were significantly over-represented for ion channel activity (Holm–Bonferroni P=10^-7; n=24), whereas neuronally depleted genes (relative to whole dissected brains) were over-represented for active transmembrane transporter activity (P=10^-7, n=42) and gliogenesis (P=0.04; n=11). Transcripts identified in a screen for glia-enriched genes were also significantly over-represented in the depleted pool (P=10^-7, n=16) (39). In addition to this broad-scale functional analysis, we checked the levels of genes known to be specific to neurons or glia. The pan-neuronal sample is relatively enriched for transcripts encoding the neuron-specific genes elav (277 versus 161 fragments per kilobase of transcript per million mapped fragments (FPKM) in neurons versus whole brain) and cadN (172 versus 61 FPKM). Neither of these markers reaches the threshold for differential expression, which is not surprising given that 90% of cells in the fly brain are thought to be neuronal (34). In contrast, the glial markers repo (1.6 versus 21 FPKM in pan-neuronal nuclei versus whole brain) and nrv2 (35 versus 1474 FPKM) are significantly depleted in the neuronal sample (Figure 3B) (40–43). It is possible that some of the observed transcriptional differences result from the retention of specific mRNAs inside the nucleus, which has been demonstrated for a small population of mRNAs (44).
Given that 90% of the brain is estimated to be neuronal (34), the maximum attainable enrichment of a strictly neuronal transcript in neuronal nuclei relative to whole brain would seem to be about 1.1× (1/0.9); thus, we were surprised at the number of genes (n=426) that were significantly enriched in the pan-neuronal sample relative to the whole dissected brain. We believe that the main reason for this apparent discrepancy is that the INTACT procedure was performed on whole heads (not dissected brains), which contain not only neurons in the brain but also R57C10-GAL4 expressing neurons in peripheral sensory structures. Supporting this explanation, the pan-neuronal sample is significantly enriched for the mechanosensory channel nompC (7.3-fold), which is expressed in the antennae (45), and for several chemosensory receptors, including ionotropic receptors (Ir47a, 11.5×; Ir56a, 1.2×; Ir76b, 35.8×), gustatory receptors (Gr47a, 15.4×; Gr64b, 33.6×), odorant receptors (Or45a, 26.4×; Or98a, 31.1×) and the chemosensory protein CheB74a (9.5×). Biological replicates showed that RNA-seq on INTACT samples is reproducible (Figure 3C).
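The 1.1× ceiling comes from a simple mixing argument: if a transcript is made only by neurons and every cell contributes the same amount of RNA, its whole-brain level is the neuronal fraction times its level in pure neurons. The Python sketch below states that assumption explicitly and compares it with an observed fold change (elav, 277 versus 161 FPKM, from the text); the function names are hypothetical.

```python
def max_enrichment(fraction_expressing: float) -> float:
    """Upper bound on fold enrichment of a transcript made only by a subpopulation.

    Assumes every cell contributes the same amount of RNA, so the bulk level
    equals fraction_expressing times the level in the pure subpopulation.
    """
    if not 0 < fraction_expressing <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction_expressing


def fold_change(level_a: float, level_b: float, pseudocount: float = 0.0) -> float:
    """Ratio of two expression levels (e.g. FPKM), with an optional pseudocount."""
    return (level_a + pseudocount) / (level_b + pseudocount)


print(f"{max_enrichment(0.9):.2f}")    # 1.11
print(f"{fold_change(277, 161):.2f}")  # elav, neurons vs whole brain: 1.72
```

Fold changes well above the ceiling point to something the mixing model leaves out, such as the additional GAL4-expressing sensory neurons present in whole heads but not in dissected brains.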
We next asked whether RNA-seq of INTACT isolated nuclei is as efficient as conventional RNA-seq of whole cells, given that we are sequencing a more complex nuclear RNA population that also contains introns. To address this issue, we analysed the genomic distribution of RNA-seq reads from each sample. Fewer of the RNA-seq read alignments from the INTACT nuclear RNA samples occurred over exons (pan-neuronal: 63%, 63%; Kenyon cells: 72%, 79%; Octopaminergic neurons: 77%, 85%) when compared with whole-cell RNA alignments (whole brain: 91%). The relatively small decrease in exon-mapped reads is consistent with the finding that splicing occurs co-transcriptionally (46,47). These observations suggest that roughly 25% more RNA-seq reads are necessary to achieve exon coverage of nuclear RNA comparable with whole-cell RNA.
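The "roughly 25%" figure can be checked from the exon-mapping fractions quoted above: the read depth must grow by the ratio of whole-cell to nuclear exonic fractions to give the same number of exonic reads. The Python sketch below averages all six nuclear samples, which is an assumption about how the summary figure was obtained; the names are hypothetical.

```python
from statistics import fmean

WHOLE_CELL_EXONIC = 0.91  # whole-brain, whole-cell RNA
NUCLEAR_EXONIC = {        # replicate values quoted above
    "pan-neuronal": [0.63, 0.63],
    "Kenyon cells": [0.72, 0.79],
    "octopaminergic": [0.77, 0.85],
}


def read_multiplier(nuclear_fraction: float, whole_cell_fraction: float = WHOLE_CELL_EXONIC) -> float:
    """Factor by which total reads must grow to give the same number of exonic reads."""
    return whole_cell_fraction / nuclear_fraction


all_nuclear = [f for values in NUCLEAR_EXONIC.values() for f in values]
overall = read_multiplier(fmean(all_nuclear))
per_population = {name: read_multiplier(fmean(v)) for name, v in NUCLEAR_EXONIC.items()}
print(f"overall: {overall:.2f}x")  # overall: 1.24x, i.e. roughly 25% more reads
```

Per population, the extra depth ranges from under 20% for octopaminergic nuclei to over 40% for pan-neuronal nuclei.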
Having established that RNA-seq of INTACT-isolated nuclei is a reproducible and efficient means of transcriptional profiling, we next asked whether this approach could provide functional insight into neuronal subpopulations. To address this question, we analysed the transcriptional profiles of two neuronal subpopulations: Kenyon cells and octopaminergic cells. We first checked whether their profiles were individually enriched in a functionally diverse set of genes previously shown to be expressed in these two cell types. For example, Kenyon cells express a trio of transcription factors (ey, dac and toy), short neuropeptide F (sNPF) and the octopamine receptor of the mushroom body (OAMB), all of which we found significantly enriched in Kenyon cells versus pan-neuronal nuclei (Figure 3D) (48–51). Octopaminergic cells express two enzymes required for the biosynthesis of octopamine, tyrosine decarboxylase 2 (Tdc2) and tyramine β-hydroxylase (Tbh), both of which we found significantly enriched in nuclear RNA harvested from octopaminergic neurons relative to pan-neuronal nuclei (Figure 3E) (52). The transcript levels of these markers were also appropriately enriched when we directly compared the Kenyon cell and octopaminergic populations (Figure 3F). For a more systematic analysis, we also compared our expression data with FlyBase annotations of gene expression in each cell population (18). We observed at least moderate expression (FPKM>10) in Kenyon cells for 53 of 66 genes (80%) reported to be expressed in the adult mushroom body (Fisher's exact test P=10^-56) and for 8 of 10 genes (80%) reported to be expressed in adult Kenyon cells (P=10^-8). Tbh, which is strongly expressed in the octopaminergic RNA-seq data, is the only gene reported in FlyBase to be expressed in octopaminergic neurons.
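An overlap test of this kind can be written as a one-sided Fisher's exact test on a 2×2 table (annotated versus not annotated, expressed versus not expressed), computed from the hypergeometric distribution. The Python sketch below is illustrative only. The text does not say whether the test was one- or two-sided. It also does not give the genome-wide counts, so the last two numbers in the example call are made-up placeholders, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins, i.e.
    the probability of seeing at least `a` overlapping genes by chance.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b          # e.g. genes annotated in FlyBase for the tissue
    col1 = a + c          # e.g. genes expressed above the FPKM cut-off
    n = a + b + c + d     # all genes considered
    denom = comb(n, col1)
    p = sum(comb(row1, x) * comb(n - row1, col1 - x)
            for x in range(a, min(row1, col1) + 1))
    return p / denom


# 53 of 66 annotated genes expressed (from the text); the genome-wide
# counts of unannotated genes (4000 expressed, 10000 not) are placeholders.
p_value = fisher_exact_greater(53, 13, 4_000, 10_000)
```

Exact integer arithmetic with `math.comb` avoids underflow in the intermediate terms, which matters for P-values as small as those reported here.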
We next turned to the question of which neurotransmitters operate in the two profiled cell types. As expected, the octopaminergic profiles were enriched for the biosynthetic enzymes Tdc2 (58-fold versus pan-neuronal nuclei) and Tbh (30×) and for the vesicular transporter of octopamine, Vmat (‘HIDATA’: CUFFLINKS is unable to reliably estimate an expression level because of the high number of RNA-seq reads). To a lesser extent, the octopaminergic profile was also enriched for genes involved in glutamate synthesis (Got2, 2.7×) and transport (VGlut, 1.7×; Eaat2, 1.7×), in line with previous reports of octopamine and glutamate co-transmission (53,54). In contrast to the clear signal in the octopaminergic profile, no single group of neurotransmitter genes was strongly enriched in the Kenyon cell profile. The Kenyon cell profile was, however, enriched for portabella (CG10251, 5.5×), a recently identified vesicular transporter that is expressed in the mushroom body but whose substrate is unknown (55). Our Kenyon cell data should contribute a rich set of candidate genes to help identify the portabella ligand.
As chromatin profiling has shown promise for systematically identifying transcriptional regulatory regions (e.g. enhancers), we tested its feasibility on INTACT samples (56,57). ChIP-seq was used to profile histone modifications associated with active promoters (trimethylation of histone H3 on lysine 4, H3K4me3) (58), open chromatin (acetylation of histone H3 on lysine 27, H3K27ac) (59) and Polycomb group (PcG)-mediated silencing (trimethylation of histone H3 on lysine 27, H3K27me3) (60). We quantified the level of H3K4me3 modification over promoters, as this signal correlates with gene expression (58). In contrast, H3K27me3 occurs in broad domains that often span the entire body of Polycomb target genes (61). H3K27ac is enriched over active promoters and can also mark whole gene bodies (61). For this reason, both H3K27me3 and H3K27ac levels were quantified over gene bodies. Although assigning a single value to each gene does not capture the subtleties of the histone modification pattern, this representation provides a convenient and compact way of interpreting the signal in a genome-wide manner. We first profiled pan-neuronal nuclei and found that all three histone modifications were reproducibly detected using our ChIP-seq protocol (Figure 4A–C).
When we profiled the histone modifications in octopaminergic and Kenyon cell neurons (Figure 4D–F), we found that nearly all the marker genes were differentially modified in the appropriate population, but with far less enrichment than observed in the RNA signal (Figure 3F). For example, the mushroom body marker ey is more actively marked at its promoter (H3K4me3) and gene body (H3K27ac) in Kenyon cells. Consistent with their proposed active and repressive roles, we observed a statistically significant, although weak, correlation between differential histone modification and differential gene expression in the octopaminergic and Kenyon cell populations (Figure 4G–I). Although most markers were differentially modified in a direction consistent with their expression, we were surprised to see that the biosynthetic enzymes Tbh and Tdc2 were not differentially marked by the PcG-mediated H3K27me3 modification (Figure 4F and I).
As the octopaminergic biosynthetic factors did not appear to be differentially PcG-repressed, we decided to take a closer look at the genes that are targeted by this silencing mechanism. In both the octopaminergic and Kenyon cell populations, repressed loci (H3K27me3 z≥2) were significantly enriched for transcriptional regulators (Figure 5A; Octopaminergic cells, n=168 of 561 genes, P=10−92; Kenyon cells, n=168 of 596 genes, P=10−88), which is in line with previous studies that have shown PcG-mediated silencing to target developmentally regulated transcription factors (60). The silenced genes were also enriched for several GO terms that are associated with neuronal cell fate determination. For example, the genes silenced in octopaminergic neurons were enriched for CNS development (P=10−20, n=46), cell fate specification (P=10−20, n=31), cell fate commitment (P=10−17, n=63), generation of neurons (P=10−16, n=72), neuron differentiation (P=10−10, n=59) and neuron projection development (P=10−5, n=39). The genes silenced in Kenyon cells showed a similar enrichment profile. Based on this observation, we hypothesized that perhaps transcription factors that are required for the establishment or maintenance of neuronal identity undergo PcG-mediated repression in cell types where they have no function (i.e. cell types where ectopic expression would alter their identity). To address this hypothesis, we studied PcG silencing over transcription factors found to be differentially expressed by RNA-seq (Figure 5A, colored points). This analysis revealed only a handful of differentially expressed transcription factors that were significantly repressed in one cell type but not the other. Consistent with our hypothesis, transcription factors known to regulate mushroom body development (ey, toy and dac) are repressed in octopaminergic nuclei, but lack repression in Kenyon cell nuclei. 
By the same logic, we predict that the less-studied factors dmrt99b, Fer2, CG4328 and fd59A play a role in establishing or maintaining the identity of octopaminergic neurons (Figure 5A).
The striking pattern of differential repression and activation is evident in genome landscapes that incorporate all of our ChIP-seq and RNA-seq data for the two most differentially modified loci: dmrt99b and ey (Figure 5B and C). In Kenyon cells, ey is expressed, its promoter is actively marked, its gene body sits in an open chromatin domain and the locus lacks PcG-mediated silencing (H3K4me3+, H3K27ac+, H3K27me3−) (Figure 5B). In octopaminergic neurons, ey is not expressed: its promoter and gene body lack active histone modifications, and the locus sits under a broad island of PcG-mediated silencing (H3K4me3−, H3K27ac−, H3K27me3+). The dmrt99b locus shows the complementary pattern, as the gene is expressed in octopaminergic neurons and repressed in the mushroom body (Figure 5C). At both the ey and dmrt99b loci, measurements from bulk neuronal nuclei show low-level expression together with strong repression over the gene bodies. This is what one would expect from a mixed population of cells: transcripts are detected from the expressing cells, while repression is detected in the non-expressing cells. We reason that genes showing both expression and PcG repressive marks indicate (i) that the cell population is mixed and (ii) that the gene plays an important role in the specification of cell type.
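The sign patterns above (H3K4me3+, H3K27ac+, H3K27me3− versus the reverse) amount to a small lookup table. The sketch below assumes each signal has already been called present or absent at a locus; the thresholds for those calls are not specified here, and the `LocusMarks` type and `classify` function are hypothetical names used only for illustration. Following the reasoning in the text, a locus that carries both active and repressive signals is flagged as a possible mixed population.

```python
from dataclasses import dataclass


@dataclass
class LocusMarks:
    expressed: bool  # RNA-seq signal called above background
    h3k4me3: bool    # active promoter mark present
    h3k27ac: bool    # active mark over promoter or gene body present
    h3k27me3: bool   # PcG-mediated repressive mark present


def classify(m: LocusMarks) -> str:
    """Summarize a locus from present/absent calls for each signal."""
    active = m.expressed or m.h3k4me3 or m.h3k27ac
    if active and m.h3k27me3:
        # Both signatures at once: expected if some cells express the gene
        # while others repress it.
        return "active+repressed (possibly a mixed population)"
    if active:
        return "active"
    if m.h3k27me3:
        return "repressed"
    return "unmarked"
```

For example, the ey pattern in Kenyon cells corresponds to `classify(LocusMarks(True, True, True, False))`, which returns `"active"`.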
If this hypothesis is true, then a combination of active and repressive histone modifications could systematically identify such developmentally important genes. We therefore asked whether there are other genomic regions that are actively marked (H3K27ac) in one population of neurons and repressed (H3K27me3) in the other. We first quantified the level of each modification in the two cell populations over 10kb windows scanned across the whole genome in 5kb increments (Figure 6A and B, top left). We chose a 10kb window to capture broad patterns, as H3K27me3 has been shown to mark the genome in broad domains spanning tens to hundreds of kilobases (61). H3K27ac can also form broad domains, although it is additionally enriched at active promoters (61). The majority of genomic windows were similarly modified in the two cell populations (Figure 6A and B, top right). To provide a genome-wide view of the differential modification, we projected the data onto a Hilbert curve (Figure 6A and B, bottom). The Hilbert curve folds the linear genome onto itself in a self-similar, or fractal, manner so that it fits into a two-dimensional image in which neighboring pixels are typically also close in genomic sequence. Coloring this curve according to a genomic signal, such as differential modification, gives a compact view of that signal's genome-wide spatial distribution. These plots clearly show that differences in histone modifications between the two cell types occur in broad domains rather than in individual windows (Figure 6A and B, bottom). As expected, differential H3K27me3 modification occurs in broader domains than differential H3K27ac (57,61). We next asked how often a stronger H3K27ac signal in one cell type accompanies a stronger H3K27me3 signal in the other. To address this question, we calculated a correlation score between the differential H3K27ac and H3K27me3 modification levels measured in each genomic window (Figure 6C, top).
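The Hilbert-curve projection can be made concrete with the standard iterative index-to-coordinate conversion. This is a minimal sketch, not the plotting code used for Figure 6: it assumes the genomic windows are numbered consecutively in genome order, and the function names `hilbert_d2xy` and `hilbert_image` are hypothetical. Consecutive indices always land on adjacent pixels, which is why neighboring windows stay close in the image.

```python
from typing import Any, List, Optional, Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve of side n = 2**order to a pixel (x, y)."""
    n = 1 << order
    if not 0 <= d < n * n:
        raise ValueError("d is outside the curve")
    x = y = 0
    t = d
    s = 1
    while s < n:
        # Which quadrant of the current 2s x 2s block does t fall in?
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/reflect the s x s sub-square so consecutive quadrants join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(values: List[Any], order: int) -> List[List[Optional[Any]]]:
    """Place per-window values, listed in genome order, onto a 2**order square grid."""
    n = 1 << order
    if len(values) > n * n:
        raise ValueError("curve too small; increase order")
    grid: List[List[Optional[Any]]] = [[None] * n for _ in range(n)]
    for d, value in enumerate(values):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value  # row = y, column = x
    return grid
```

The curve order is chosen as the smallest k for which 4**k is at least the number of windows; unused pixels stay empty (None) and can be rendered as background.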
Projecting this score onto a Hilbert curve indicates only a few discrete loci in the genome with strongly opposing differential H3K27me3 and H3K27ac signals in octopaminergic neurons versus Kenyon cells. These regions cover roughly 700kb of the genome and contain 16 genes, including 10 that are significantly differentially expressed, such as the mushroom body regulators (dac, toy, ey) and the vesicular transporter for octopamine (Vmat) (Figure 6C, bottom). Performing this series of analyses at a 1kb window scale does not significantly change the results. As expected, the Hilbert images become more punctate and the colors more intense; however, the distributions of histone modification levels and the broad domains of differential modification remain similar.
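The windowed comparison can be sketched as follows. The 10kb window and 5kb step come from the text, but the exact form of the "correlation score" is not given here. As an assumption, the sketch counts reads per window, takes a library-size-normalized log2 ratio between the two cell types for each mark, and scores a window by the negated product of the two differences. This score is positive when H3K27ac is higher in one cell type and H3K27me3 is higher in the other. All function names are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Sequence, Tuple


def tile_windows(chrom_length: int, size: int = 10_000, step: int = 5_000) -> List[Tuple[int, int]]:
    """Half-open [start, end) windows of `size` bp every `step` bp (10kb / 5kb by default).

    Bases after the last full window are not covered in this simple sketch.
    """
    last_start = max(chrom_length - size, 0)
    return [(s, min(s + size, chrom_length)) for s in range(0, last_start + 1, step)]


def window_counts(read_starts: Sequence[int], windows: List[Tuple[int, int]]) -> List[int]:
    """Number of read start positions falling inside each window."""
    pos = sorted(read_starts)
    return [bisect_left(pos, end) - bisect_left(pos, start) for start, end in windows]


def log2_diff(count_a: int, count_b: int, total_a: int, total_b: int, pseudo: float = 1.0) -> float:
    """Library-size-normalized log2(A / B) for one window, with a pseudocount."""
    return math.log2((count_a + pseudo) / total_a) - math.log2((count_b + pseudo) / total_b)


def opposing_score(d_k27ac: float, d_k27me3: float) -> float:
    """Positive when H3K27ac rises in one cell type while H3K27me3 rises in the other."""
    return -d_k27ac * d_k27me3
```

Rerunning at the 1kb scale mentioned above only requires changing `size` (and a matching `step`) in `tile_windows`. The resulting per-window scores, in genome order, are what would be colored onto the Hilbert image.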
We then returned to the list of differentially expressed genes (Figure 3F) and ordered them by the anti-correlation of their differential H3K27me3 and H3K27ac signals (Figure 6D). As in Figure 5A, many of the anti-correlated genes were transcription factors. For the Kenyon cell population, four factors known to play a role in mushroom body development were highly ranked by this analysis (ey, toy, dac, Hr51) (48,62). Similarly, the two most anti-correlated loci in octopaminergic cells were CG4328, a homeobox transcription factor, and dmrt99b, a doublesex-related transcription factor.
Our version of the INTACT method enables both the isolation of specific neuronal cell types in Drosophila and their characterization by RNA-seq, ChIP-seq and other systematic genomic methods. Expression of our UAS-nuclear tag cassettes can be driven by any GAL4 line, such as those in large systematic collections of drivers that have been screened for specific neuronal expression patterns (12). We showed that we can isolate tagged nuclei in high yield (~50%) and at high purity (~99%) from sparse lines in which only a few hundred neurons (Tdc2) are tagged per brain. Because the purification protocol starts from frozen adult flies, we can amass many thousands of frozen animals if necessary, either to obtain sufficient cells from a sparsely expressing line or for a genomic analysis that requires a large amount of input material (such as ChIP-seq). An additional advantage of starting with frozen flies is that, when the expression of a GAL4 driver has been characterized only in the brain (12), any expression in the thorax and abdomen can be ignored, because the heads can be separated from the rest of the body by passing dissociated frozen flies over cooled sieves. We expect the protocol to work on lines that are sparser than Tdc2, but its exact limit of sensitivity is currently unknown.
The most immediate application we envision for this technology is generating gene expression profiles of specific Drosophila neuronal cell types by INTACT/RNA-seq. High-resolution anatomical descriptions of specific cell types in neuronal circuits have been made possible by the systematic identification of cell type–specific GAL4 lines (12), which can be used to drive the expression of a nuclear tag and thus enable cell type–specific profiling. This will allow the systematic characterization of the neurotransmitters, receptors, peptides and transcription factors expressed by the individual neurons that populate a neuronal circuit. Our data show that such gene expression profiles can be obtained by either RNA-seq or ChIP-seq, but RNA-seq gives a better signal-to-noise ratio and requires less input material (10²–10³ nuclei for RNA-seq versus 10⁵–10⁶ nuclei for ChIP-seq).
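Taken together with the ~50% yield reported above, these input requirements translate into the number of flies that must be collected. The back-of-the-envelope sketch below makes a hypothetical assumption of 200 tagged nuclei per brain, standing in for "a few hundred" in a line such as Tdc2. It also ignores any losses beyond the stated yield. The function name `flies_needed` is illustrative only.

```python
import math


def flies_needed(target_nuclei: int, tagged_per_brain: int, yield_fraction: float = 0.5) -> int:
    """Flies to collect so that the recovered tagged nuclei reach target_nuclei."""
    recovered_per_fly = tagged_per_brain * yield_fraction
    return math.ceil(target_nuclei / recovered_per_fly)


# Hypothetical: 200 tagged nuclei per brain and ~50% yield.
for label, target in [("RNA-seq, low", 10**2), ("RNA-seq, high", 10**3),
                      ("ChIP-seq, low", 10**5), ("ChIP-seq, high", 10**6)]:
    print(f"{label}: {flies_needed(target, 200)} flies")
```

Under these assumptions, RNA-seq needs only 1 to 10 flies, whereas ChIP-seq needs roughly 1,000 to 10,000. This is consistent with the need to amass many thousands of frozen animals for ChIP-seq on a sparse line.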
An advantage of isolating nuclei (either by INTACT or by other sorting approaches) is that high-throughput genomic characterization protocols beyond transcriptional profiling can be applied to them. Our experiments demonstrate the reliability and feasibility of chromatin profiling by INTACT/ChIP-seq, and we also expect to be able to apply a variety of other methods, such as DNase-seq, GRO-seq, Nascent-seq, Hi-C and ChIA-PET (43,63–66). We therefore expect to gain access not only to gene expression profiles but also to the transcriptional regulatory networks that drive them. Such information has proven critical to the study of the mechanisms that control neuronal identity. For example, in the worm C. elegans, excellent progress has been made in the identification of terminal selector transcription factors, which maintain the identity of differentiated neurons (67–69). These factors were identified by first generating a list of genes specifically expressed in the neuron of interest (a gene battery), followed by a thorough experimental analysis to identify regulatory regions and binding sites around the loci of members of the gene battery. By enabling comprehensive application of the same basic strategy, INTACT should facilitate similar efforts in Drosophila neurons.
When we compared the chromatin profiles of Kenyon cells (OK107) with those of octopaminergic neurons (Tdc2), we fortuitously noticed a pattern that suggests a means of screening for key transcription factors involved in either the establishment or maintenance of neuronal identity. PcG-mediated trimethylation of histone H3 on lysine 27 has been implicated in the regulation of transcription factors that play important roles in development (70,71), and we observed selective PcG silencing of transcription factors in differentiated neurons. Indeed, some of these loci show strongly anti-correlated H3K27me3 and H3K27ac signals in octopaminergic neurons (Tdc2) and Kenyon cells (OK107). We propose that key transcription factors, potentially capable of altering cell fate, must be kept silent in cell types where they should be off, and are therefore targeted with an additional, PcG-mediated layer of repression. We hypothesized that we could enrich for these factors by identifying loci that show expression (measured by RNA-seq) and H3K27ac marking in one cell type, together with a lack of expression and PcG-mediated silencing in the other. Applying this criterion to Kenyon cells identifies a small set of transcription factors, including ey, dac, toy and Hr51, all of which are known to play a role in mushroom body development (48,62). The reverse comparison for octopaminergic neurons, whose transcriptional program is much less well understood, identifies a different set of genes, including the presumptive transcription factors dmrt99B, fd59A, Fer2 and CG4328. Consistent with the hypothesis that these factors play a role in the specification of octopaminergic neurons, all four are expressed on the embryonic midline (72,73), from which the octopaminergic cell population arises (74).
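The screening criterion described above can be expressed as a filter followed by a ranking. In this minimal sketch the inputs are hypothetical per-gene log2 ratios between cell types A and B. The `min_log2fc` cutoff is an assumed stand-in for the significance calls actually made on the RNA-seq data, and the ranking uses the same opposing-signal product as a stand-in for the anti-correlation measure. The type and function names are illustrative, not taken from the study's pipeline.

```python
from typing import List, NamedTuple


class GeneSignal(NamedTuple):
    name: str
    log2fc_rna: float  # log2(expression in A / expression in B), from RNA-seq
    d_k27ac: float     # log2(H3K27ac in A / H3K27ac in B)
    d_k27me3: float    # log2(H3K27me3 in A / H3K27me3 in B)
    is_tf: bool        # annotated as a transcription factor


def screen(genes: List[GeneSignal], min_log2fc: float = 1.0, tf_only: bool = True) -> List[GeneSignal]:
    """Genes expressed and H3K27ac-marked in A but PcG-repressed in B, most opposed first."""
    hits = [
        g for g in genes
        if g.log2fc_rna >= min_log2fc  # higher expression in A
        and g.d_k27ac > 0              # more H3K27ac in A
        and g.d_k27me3 < 0             # more H3K27me3 in B
        and (g.is_tf or not tf_only)
    ]
    # Rank by the opposing-signal score: larger when both differences are large.
    return sorted(hits, key=lambda g: -g.d_k27ac * g.d_k27me3, reverse=True)


def swap_cell_types(genes: List[GeneSignal]) -> List[GeneSignal]:
    """Negate every A-vs-B difference so that screen() runs the reverse comparison."""
    return [
        g._replace(log2fc_rna=-g.log2fc_rna, d_k27ac=-g.d_k27ac, d_k27me3=-g.d_k27me3)
        for g in genes
    ]
```

With A as Kenyon cells and B as octopaminergic neurons, `screen(genes)` corresponds to the Kenyon cell comparison, and `screen(swap_cell_types(genes))` corresponds to the reverse comparison for octopaminergic neurons.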
It is not uncommon for the same transcriptional regulatory network to function in both the early development and the adult maintenance of a neuronal cell type, as has been described for Tv neuropeptidergic cells (75). A role for PcG silencing in the specification of cell types, in particular of specific subsets of neurons, has been suggested by others (76–80).
Our PcG-silencing data can also be used to characterize the heterogeneity of a population of neurons. In bulk neuronal nuclei (57C10), we see many loci that show signatures of being both active (RNA-seq, H3K4me3, H3K27ac) and repressed (H3K27me3). A simple explanation, consistent with previous observations in other systems (81), is that the bulk population is a mixture of expressing and non-expressing (repressed) cells. For example, in bulk neuronal nuclei (57C10) the ey locus appears both active and repressed, consistent with the gene being expressed in only a specific group of cells in the adult brain (82). In the Kenyon cell population (OK107), where ey is broadly expressed, the locus is active and lacks repression, which is consistent with the OK107-GAL4 line being an enhancer trap near the ey locus (31).
A major limitation of INTACT is its application to sparsely tagged lines (1–10 neurons) and to cell types found at earlier developmental stages, such as larval stages, for which freezing the animals is not possible. For example, some downstream genomic protocols, such as ChIP-seq, typically require 10⁵–10⁶ cells. We expect this barrier to drop as more sophisticated amplification methods are interfaced with the technique; for example, a method has been described that allows ChIP-seq to be performed on 10³ cells (83). Another solution for isolating nuclei from sparsely tagged lines might be second-generation tags that have increased antigenicity or that enable two-step purification procedures, similar to the tandem affinity purification used in proteomic assays (84).
Funding for open access charge: Howard Hughes Medical Institute.
Conflict of interest statement. None declared.
We would like to thank Julide Bilen and Todd Laverty for having the patience to teach us how to work with Drosophila; Amanda Cavallaro, Dona Fetter, Jennifer Jeter, Kevin McGowan, Andrey Revyakin, Zhengjian Zhen, Tim Brown, the Eddy lab, Julie Simpson, Jim Truman, Aljoscha Nern, Barret Pfeiffer, Gerry Rubin and David Stern were also helpful at various points in the development of this technique. We also thank Goran Ceric for managing Janelia Farm’s high-performance computing resources.