The binding of sequence-specific regulatory factors and the recruitment of chromatin remodeling activities cause nucleosomes to be evicted from chromatin in eukaryotic cells. Traditionally, these active sites have been identified experimentally through their sensitivity to nucleases. Here we describe the details of a simple procedure for the genome-wide isolation of nucleosome-depleted DNA from human chromatin, termed FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). We also provide protocols for different methods of detecting FAIRE-enriched DNA, including use of PCR, DNA microarrays, and next-generation sequencing. FAIRE works on all eukaryotic chromatin tested to date. To perform FAIRE, chromatin is crosslinked with formaldehyde, sheared by sonication, and phenol-chloroform extracted. Most genomic DNA is crosslinked to nucleosomes and is sequestered to the interphase, whereas DNA recovered in the aqueous phase corresponds to nucleosome-depleted regions of the genome. The isolated regions are largely coincident with the location of DNaseI hypersensitive sites, transcriptional start sites, enhancers, insulators, and active promoters. Given its speed and simplicity, FAIRE has utility in establishing chromatin profiles of diverse cell types in health and disease, isolating DNA regulatory elements en masse for further characterization, and as a screening assay for the effects of small molecules on chromatin organization.
In eukaryotes, packaging of DNA into chromatin reduces the accessibility of genetic information to the set of proteins involved in regulating DNA-templated processes such as transcription. Successful orchestration of DNA-dependent processes is achieved in part by regulating the stability of nucleosomes at these sites [1–3]. Here “stability” refers to the probability of an intact nucleosome at a given nucleotide position, versus a nucleosome in an absent or disrupted state at that position. Several mechanisms exist to modulate nucleosome stability, including competition with sequence-specific factors [4–7], ATP-dependent nucleosome remodeling complexes [8–10] and post-translational modifications of the histone tails [11–14]. Nucleosome stability at any given locus is governed by a combination of factors acting in concert, which results in a context-specific set of DNA elements bound by regulatory factors for each cell type.
Traditionally, active regulatory elements have been identified by their increased sensitivity to digestion by nucleases such as DNase I [15–20]. Typically this involves subjecting isolated nuclei to a mild nuclease treatment, followed by detection using Southern blots to identify nuclease hypersensitive sites. Several groups have recently adapted the procedure for genome-wide detection with DNA microarrays or next-generation sequencing [21–24]. However, the requirement for a clean nuclei preparation from a single-cell suspension and the need for laborious enzyme titrations mean that it is difficult to perform DNase hypersensitivity assays on solid tissues, on a limited number of cells, or in parallel on many different samples.
Here we describe an alternative strategy for genome-wide isolation of active regulatory elements termed FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). It is a simple, high-throughput procedure to isolate and map genomic regions depleted of nucleosomes. The procedure involves crosslinking proteins to DNA using formaldehyde, shearing the chromatin, and performing a phenol-chloroform extraction. The genomic regions preferentially segregated into the aqueous phase are then mapped back to the genome by hybridization to tiling microarrays or are read directly using next-generation DNA sequencing (Figure 1). Quantitative PCR can be used to assay individual loci, which is useful when screening many cell or tissue types. The relatively straightforward nature and tractability of FAIRE give it broad utility for the genome-wide detection of active regulatory elements across all eukaryotic species, in clinical samples, and for high-throughput screens.
FAIRE was first demonstrated in Saccharomyces cerevisiae. In yeast, the genomic regions immediately upstream of genes were preferentially segregated into the aqueous phase, in a manner that was strongly negatively correlated with nucleosome occupancy [26–29]. Subsequent studies demonstrated that FAIRE efficiently isolated nucleosome-depleted regions of the Homo sapiens genome, which included both transcription start sites and distal regulatory elements such as enhancers and silencers (Figure 2). Results from both yeast and human found that enrichment of the upstream regions of genes was positively correlated with transcription of the downstream gene. However, in human cells the vast majority of sites identified were far from any annotated gene. For the majority of these distal sites, it is not yet possible to ascribe a function, identify what factors might be bound, or determine the genes being regulated by each regulatory element.
The enrichment of regulatory regions in the aqueous phase is thought to result from the very high crosslinking efficiency of histone proteins to DNA, versus the lower efficiency of crosslinking sequence-specific proteins to DNA. This difference in crosslinking efficiency is likely due in part to formaldehyde’s very short crosslinking distance. Formaldehyde is a small molecule (HCHO) and crosslinks are only formed between proteins and DNA in direct contact. There are approximately 10 to 15 histone-DNA interactions within a nucleosome that serve as potential crosslinking sites. However, for most DNA-binding proteins there are far fewer potential crosslinking sites. Binding sites are on average 5 to 15 bp long, with only a few of the bases close enough to the protein to be crosslinked. In addition, formaldehyde requires an ε-amino group, such as occurs on lysine, to form a crosslink [34,35]. Approximately 10% of the amino-acid composition of histones is lysine, a much higher proportion than in a typical protein. Due to both of these factors, nucleosomes are much more readily crosslinkable to DNA, and are likely to dominate the crosslinking profile (Figure 3).
The following provides a general framework for performing FAIRE, which specifically emphasizes performing FAIRE on cells grown in culture. The final methods section provides the modifications required to perform FAIRE on tissue samples. The protocols for cells and tissues are also included as one-page supplementary files for easier use at the bench.
For cells grown in culture, add 37% formaldehyde directly to the growth media to a final concentration of 1% and incubate at room temperature on an orbital shaker at 80 rpm. Typically, we incubate for 30 minutes for yeast and 5 minutes for human cultured cells, although these times will vary for different species and cell types. Generally, whatever fixation time and conditions are used for ChIP (Chromatin Immunoprecipitation) experiments will be adequate for FAIRE, with slightly shorter fixation times often being optimal. To quench the fixation, add 2.5 M glycine to a final concentration of 125 mM and incubate for 5 min at room temperature while continuing to shake. Cells grown in suspension should be collected by centrifugation at 700 × g for 5 min at 4 °C. For adherent cells, first remove the media containing formaldehyde and glycine, add ice-cold PBS to cover the cell layer, scrape, and transfer the cells to a conical tube. For both adherent cells and cells in suspension, wash two more times with ice-cold PBS to ensure all residual media is removed.
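The stock volumes needed to reach the final concentrations given above can be worked out with a short calculation. The following Python sketch is illustrative only: the function name and the 10 ml culture volume are hypothetical, and it assumes the stock is added directly to the existing volume, so the added volume itself dilutes the mixture.

```python
def stock_volume_to_add(sample_volume_ml, stock_conc, final_conc):
    """Volume of stock (ml) to add so that the mixture reaches final_conc.

    Both concentrations must use the same unit (for example % or mM).
    Solves stock_conc * v = final_conc * (sample_volume_ml + v) for v,
    which accounts for the volume increase caused by the stock itself.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be positive and below stock_conc")
    return sample_volume_ml * final_conc / (stock_conc - final_conc)


media_ml = 10.0  # hypothetical culture volume

# 37% formaldehyde stock to a final concentration of 1%
formaldehyde_ml = stock_volume_to_add(media_ml, 37.0, 1.0)

# 2.5 M (2500 mM) glycine stock to a final concentration of 125 mM,
# added to the media that already contains the formaldehyde
glycine_ml = stock_volume_to_add(media_ml + formaldehyde_ml, 2500.0, 125.0)

print(f"37% formaldehyde to add: {formaldehyde_ml:.3f} ml")
print(f"2.5 M glycine to add:    {glycine_ml:.3f} ml")
```

In practice the simpler approximation of 1/37 and 1/20 of the culture volume is usually close enough; the exact form is shown only to make the arithmetic explicit.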
Resuspend cells in 1 ml of lysis buffer (2% Triton X-100, 1% SDS, 100 mM NaCl, 10 mM Tris-Cl pH 8.0, 1 mM EDTA) per 10^7 (or 0.4 g of) cells. Transfer 1 ml of the cell suspension to a 2 ml screw-capped tube with a rubber seal and add 1 ml of 500 μm glass beads. Cell disruption is performed in a mini bead-beater (Mini-BeadBeater-8, BioSpec Inc.) set to homogenize for five 1-minute sessions with 2-minute incubations on ice between sessions (see the alternative protocol if a bead-beater is not available). To recover the lysate, puncture the bottom of the 2 ml tube with a 25G syringe needle and drain into a 15 ml tube on ice. Once the lysate has drained, add an additional 500 μl of lysis buffer to clear any remaining sample from the beads. Filtered air can be used to push the liquid through the hole in the bottom of the tube. Proceed directly to sonication.
If a bead-beater is not available, the following procedure is suitable for human or similar cell types, but not yeast. This procedure often requires additional rounds of sonication. Add 10 ml of Lysis Buffer 1 (50 mM HEPES-KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100) per 10^8 cells and rock at 4° C for 10 minutes. Spin at 1,300 × g for 5 minutes at 4° C and remove supernatant. Add 10 ml of Lysis Buffer 2 (10 mM Tris-HCl, pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA) per 10^8 cells and rock at room temperature for 10 minutes. Spin at 1,300 × g for 5 minutes at 4° C and remove supernatant. At this point the pellet should appear white and fluffy. Add 3.5 ml of Lysis Buffer 3 (10 mM Tris-HCl, pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.1% Na-Deoxycholate, 0.5% N-lauroylsarcosine) per 10^8 cells. Proceed directly to sonication.
Transfer the lysate to 1.5 ml tubes in 300 μl aliquots and sonicate for 15 minutes using a Bioruptor UCD-200 (Diagenode) set to pulse on high for 30 seconds followed by 30 seconds of rest. The water bath should be maintained at a constant temperature of 4° C using a recirculator. Alternatively, one may use a microtip sonicator (Branson Sonifier 450) set at 15% amplitude for five sessions of sixty pulses (1 second on/1 second off), incubating the sample on ice for two minutes between sessions. Clear the lysate of cellular debris by spinning at 15,000 × g for 5 minutes at 4° C and transfer the supernatant to a new tube. Run an aliquot, equivalent to 500 ng total genomic DNA, on a 1% agarose gel to ensure fragment sizes range between 100 and 1000 bp.
Add a volume of phenol/chloroform (Sigma #P3803 phenol, chloroform, and isoamyl alcohol 25:24:1 saturated with 10 mM Tris, pH 8.0, 1 mM EDTA) that is equal to the volume of the lysate, vortex well, spin at 12,000 × g for 5 minutes, and transfer the aqueous fraction to a fresh 1.5 ml tube. If there is very little aqueous phase due to an exceptionally large interphase, remove the aqueous phase, add 500 μl TE to the remaining interphase, vortex, and spin again. To ensure all protein has been removed, perform an additional extraction by adding an equal volume of phenol/chloroform to the isolated aqueous fraction. Finally, add an equal volume of chloroform (Fluka BioChemika 25666, chloroform, isoamyl alcohol 24:1) to the aqueous fraction, spin, and transfer the aqueous phase to a new tube.
Add 3 M sodium acetate (pH 5.2) to a final concentration of 0.3 M, and add 1 μl of 20 mg/ml glycogen. Mix by inverting. Add two volumes of 95% ethanol, mix by inverting, and incubate at −20° C overnight. Although overnight incubations are routinely performed, incubation as short as one hour should be sufficient. Pellet the precipitate by spinning at 15,000 × g for 30 minutes at 4° C, wash the pellet with 500 μl ice cold 70% ethanol, spin at 15,000 × g for 5 minutes at room temperature, remove the supernatant, and dry the pellet in a speed-vac. Resuspend the dried pellet in 50 μl of 10 mM Tris-HCl pH 7.5. Add 1 μl of 10 mg/ml RNase A and incubate for 1 hour at 37° C. Earlier versions of the protocol included a step that incubates DNA from crosslinked samples at 65° C overnight to ensure that any DNA-DNA crosslinks do not interfere with downstream enzymatic steps. However, we have found that skipping this step results in no detectable difference in the efficiency of downstream enzymatic reactions.
Clean up the sample using either a spin column capable of recovering small DNA fragments (75–200 bp) or perform an additional phenol/chloroform extraction and ethanol precipitation. We have found that this is necessary to achieve accurate spectrophotometric measurements of our samples for subsequent reactions. Depending on the number of cells used for FAIRE and the final concentration, it may be possible to see the size distribution of FAIRE DNA fragments on a 1% agarose gel, which typically ranges between 75–200 bp. However, gel verification is not necessary and is often omitted.
The following modifications for performing FAIRE in tissues cover preparing the tissue sample for crosslinking, dissociating the cells, and lysing the cells. These modifications have been successfully used on tissue samples as small as 10 mg. Other considerations for working with tissue samples include whether the tissue is fresh or frozen, and how fibrous it is. For fresh soft tissues, such as brain, simply mince the tissue into small pieces using a scalpel, transfer to a dounce with 1 ml of PBS containing formaldehyde at a final concentration of 1% (diluted from a 37% stock), and incubate for 5 minutes at room temperature (22–25°C) with swirling. Add 2.5 M glycine to a final concentration of 125 mM and incubate for an additional 5 minutes. Dissociate the cells with a dounce homogenizer, wash two times with ice cold PBS, and proceed with cell lysis and all remaining steps for FAIRE as described above.
For previously frozen tissues or fresh fibrous tissues, samples should be placed in a 15 ml conical tissue grinder (VWR #47732-446), precooled in a liquid nitrogen bath, incubated for 10 minutes, and ground into a powder until roughly the consistency of sand. Remove the 15 ml tube containing the powder from liquid nitrogen bath, add 1.5 ml of room-temperature PBS containing 1% formaldehyde per 10 mg of tissue, and incubate for 7 minutes at room temperature. For most tissue types you can proceed with the protocol described above, but for especially tough tissue types use larger 2.8 mm ceramic or metal beads (Precellys CK28 or MK28) and perform additional cycles in the mini bead-beater for an efficient lysis before sonication.
Quantitative PCR (qPCR) is used both as a method for detecting open chromatin sites and as a means to validate sites identified using either DNA microarray or high-throughput sequencing data. There are several considerations when designing qPCR experiments, including selection of an appropriate set of reference regions, exact primer localization, and methods for quantitation of the results. It is important to select an appropriate set of reference regions since these will be used to calculate relative enrichment for all other sites tested. This can be difficult due to the limited knowledge of “gold standard” sites of closed chromatin available for most species. Even for cells in which sites of closed chromatin have been mapped, these may be limited to a specific growth condition. Therefore we often use a tiling approach (Figure 1B) for detection of open chromatin sites using qPCR. Here, primer pairs are designed such that the products are either overlapping or closely spaced across the genomic regions being interrogated. The reference regions are those primer sets flanking the regions isolated by FAIRE. This strategy is also useful for validating results from microarray and sequencing data, which requires a set of positive and negative sites to determine both sensitivity and specificity. Primer design is also critical for obtaining accurate results from qPCR, since primer pairs spanning or near the edges of open chromatin sites may be able to detect only a subset of the DNA fragments isolated in the aqueous phase (Figure 1B). Optimally, primer pairs should be designed to amplify 60–100 bp products within the central portion of the identified regions. We typically calculate the relative enrichment for each amplicon using the comparative Ct method. Here, a ratio is calculated using the signal from the FAIRE sample relative to the signal from DNA prepared from an uncrosslinked sample. All ratios are then normalized to the amplicon with the lowest ratio, which is typically from the reference regions. Relative quantitation is used in part because FAIRE enriches for mitochondrial DNA, and since the mitochondrial content can vary considerably between cells it is difficult to get an accurate measurement of the proportion of genomic DNA enriched in each of the FAIRE samples.
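The relative-enrichment calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed analysis: it assumes perfect amplification efficiency (a doubling of product per cycle), equal amounts of FAIRE and uncrosslinked DNA in each reaction, and the amplicon names and Ct values are hypothetical.

```python
def relative_enrichment(ct_faire, ct_input):
    """Relative FAIRE enrichment per amplicon from threshold cycles.

    ct_faire and ct_input map amplicon name -> Ct measured on the FAIRE
    sample and on DNA from the uncrosslinked sample, respectively.
    Assumes product doubles every cycle, so a Ct that is one cycle lower
    corresponds to twice as much starting template.
    """
    # FAIRE signal relative to the uncrosslinked reference, per amplicon
    ratios = {name: 2.0 ** (ct_input[name] - ct_faire[name]) for name in ct_faire}
    # Normalize every ratio to the amplicon with the lowest ratio
    baseline = min(ratios.values())
    return {name: ratio / baseline for name, ratio in ratios.items()}


# Hypothetical tiling experiment: two flanking amplicons and one central amplicon
ct_faire = {"flank_left": 27.0, "site_center": 24.0, "flank_right": 27.5}
ct_input = {"flank_left": 25.0, "site_center": 25.0, "flank_right": 25.0}

enrichment = relative_enrichment(ct_faire, ct_input)
for name, value in enrichment.items():
    print(f"{name}: {value:.2f}")
```

With these invented values the amplicon with the lowest ratio is a flanking amplicon, which therefore receives an enrichment of 1 and serves as the reference for the others.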
High quality FAIRE data have been obtained from several microarray platforms, including Agilent, NimbleGen (Roche), and PCR-based arrays. Any microarray platform will suffice, but there are several factors to consider, such as the type of probe, the genomic regions covered, and the resolution. One of the most important for FAIRE is selecting a microarray design with sufficient resolution (Figure 1C). For oligonucleotide (50–75 bp) tiling microarrays, probe-to-probe spacing should not exceed 100 bp if possible. Wider spacing reduces the number of probes per FAIRE site to just one or two.
Typically, we amplify the DNA using ligation-mediated (LM) PCR. The DNA fragments are made blunt using T4 DNA polymerase, asymmetric linkers (5′-GCGGTGACCCGGGAGATCTGAATTC-3′ and 5′-GAATTCAGATC-3′) are ligated to the blunt ends using T4 DNA ligase, and the ligated fragments are then amplified by PCR with a primer complementary to the linker.
For dual-channel microarray platforms, DNA derived from uncrosslinked cells, processed in parallel to the crosslinked cells, is hybridized as the reference or input sample (Figure 1A). If it is not possible to obtain uncrosslinked cells, which is often the case when cells are limited or with tissues, crosslinks from a portion of the sample can be reversed and used as a reference. Remove an aliquot from the cleared lysate following sonication. Reverse crosslinks by incubating at 65° C overnight, and perform a phenol/chloroform extraction, ethanol precipitation, and RNase A treatment.
For tiling microarrays, raw data extraction is specific to the particular platform selected and entails image acquisition and feature quantitation. Data can be expressed as a raw intensity for single-channel platforms or as a log2 ratio for dual-channel platforms. For data preprocessing, we typically normalize each dataset by calculating the z-score for each log2 ratio. The z-score is calculated by subtracting the mean log2 ratio and dividing by the standard deviation, which centers every dataset on the mean and standardizes the variance. In this way, every dataset has a mean of 0 and standard deviation of 1. This methodology is only applicable to dual channel platforms, although alternative strategies are available for single channel platforms [40,41].
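The z-score normalization described above is simple to express in code. The sketch below is illustrative: the log2 ratios are invented, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which is used.

```python
import statistics


def zscore(log2_ratios):
    """Center log2 ratios on their mean and scale them to unit standard deviation."""
    mean = statistics.fmean(log2_ratios)
    # Assumption: population standard deviation; for arrays with many probes
    # the difference from the sample standard deviation is negligible.
    sd = statistics.pstdev(log2_ratios)
    if sd == 0:
        raise ValueError("all log2 ratios are identical; z-score is undefined")
    return [(value - mean) / sd for value in log2_ratios]


# Hypothetical log2(FAIRE / reference) ratios from one dual-channel array
log2_ratios = [0.1, -0.3, 1.2, 0.0, -0.5, 2.1, -0.2]
z = zscore(log2_ratios)

print([round(value, 2) for value in z])
print("mean:", round(statistics.fmean(z), 6), "sd:", round(statistics.pstdev(z), 6))
```

Normalizing each array in this way places replicate and cross-sample datasets on a common scale before peak finding.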
Identification of regions enriched by FAIRE can be accomplished using most existing peak-finding algorithms used for ChIP-chip experiments [42–45]. For microarray data we typically use ChIPOTle, which uses a sliding window to identify statistically significant signals that comprise a peak. The significance of each region is determined by reflecting the negative portion of the data about zero, and then assuming a Gaussian distribution to estimate the null distribution.
The three main user-adjustable parameters in ChIPOTle are window size, step size, and threshold. We have found the following parameters to be optimal for analyzing FAIRE data from oligonucleotide tiling microarrays. For microarrays with probes spaced every 38 bp we use a window size of 300 bp, whereas for probe spacing of 60 to 100 bp we use a 500 bp window size. The larger window size is necessary to ensure a sufficient number of probes are included in each window. We use a step size equal to the average probe spacing, which is measured from the start of one probe to the start of the next. We often try a range of thresholds and look at how the overlap changes between replicates and genomic features.
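To make the roles of window size, step size, and threshold concrete, the following Python sketch implements a simplified sliding-window caller with a reflected Gaussian null. It is not ChIPOTle's implementation: the function names, the use of a plain p-value cutoff without multiple-testing correction, and the synthetic data are all assumptions made for illustration.

```python
import math


def window_means(positions, values, window_bp, step_bp):
    """Return (window_start, mean_signal) for each sliding window.

    A probe belongs to a window when its start coordinate lies in
    [window_start, window_start + window_bp). Empty windows are skipped.
    """
    means = []
    start = positions[0]
    while start <= positions[-1]:
        inside = [v for p, v in zip(positions, values)
                  if start <= p < start + window_bp]
        if inside:
            means.append((start, sum(inside) / len(inside)))
        start += step_bp
    return means


def null_sigma(means):
    """Null standard deviation from negative window means reflected about zero."""
    negatives = [m for m in means if m < 0]
    if not negatives:
        raise ValueError("no negative window means to build the null from")
    # The reflected set {m, -m} has mean zero, so its variance is the mean square.
    return math.sqrt(sum(m * m for m in negatives) / len(negatives))


def upper_tail_p(x, sigma):
    """One-sided Gaussian p-value for a window mean x under N(0, sigma^2)."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


def call_windows(positions, values, window_bp, step_bp, p_threshold):
    """Return (start, mean, p) for windows whose p-value is below p_threshold."""
    means = window_means(positions, values, window_bp, step_bp)
    sigma = null_sigma([m for _, m in means])
    hits = []
    for start, m in means:
        p = upper_tail_p(m, sigma)
        if p < p_threshold:
            hits.append((start, m, p))
    return hits


# Synthetic z-scored signal: 40 probes spaced every 38 bp, a low-amplitude
# background, and an enriched block at probes 18-24 (684-912 bp).
positions = [i * 38 for i in range(40)]
values = [0.3 * math.sin(1.7 * i) for i in range(40)]
for i in range(18, 25):
    values[i] = 3.0

hits = call_windows(positions, values, window_bp=300, step_bp=38, p_threshold=1e-3)
best_start, best_mean, best_p = max(hits, key=lambda h: h[1])
print(f"{len(hits)} windows called; strongest starts at {best_start} bp")
```

Because the null is estimated only from negative window means, a dataset in which true enrichment is confined to the positive tail leaves the background estimate largely unaffected by the peaks themselves.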
Each of the high-throughput sequencing platforms utilizes a different sample preparation procedure. We are most familiar with library preparation of FAIRE DNA for the Illumina Genome Analyzer II (GAII) (Figure 1D). We use 100 to 200 ng of DNA for starting material. This procedure involves blunting the ends of the DNA fragments (Epicentre #ER0720), adding an “A” overhang (Epicentre #KL06041K), and ligating double-stranded adapters containing a T-overhang to the DNA fragments (Illumina #1000521). Ligation products are then run on a 2% agarose gel, and a portion of the gel corresponding to 125–275 bp is excised. It usually is not possible to see the DNA on the gel at this point. PCR amplification is then carried out using PfuUltraII (Stratagene #600670) and primers complementary to the adapters (Illumina #1000537 and 1000538).
Raw data acquisition for the GAII entails image acquisition and base calling. Approximately 25 million mapped 36 bp reads are typically required for robust detection of FAIRE peaks in a mammalian sample. Several algorithms are available for mapping the reads back to the genome, each utilizing different computational and alignment strategies [47–49]. Typically we use Maq, which incorporates information about read quality into the alignment. Since only the first 36 bp from either end of each ~200 bp double-stranded DNA molecule is sequenced, we computationally extend each aligned read to produce 200 bp extended reads. For visualization, we count the number of extended reads overlapping every basepair in the genome, and compute a density by dividing by the total number of bases contained within the extended reads. These density estimates for each basepair can be loaded into genome browsers, such as the UCSC genome browser (Figure 2).
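The read-extension and density calculation can be illustrated with a short Python sketch. It is a simplified example under stated assumptions: reads are given as hypothetical (start, strand) pairs using 0-based coordinates on a single chromosome, minus-strand reads are extended toward lower coordinates from their rightmost aligned base, and extended reads are clipped at the chromosome ends.

```python
def extend_read(start, strand, read_len, fragment_len, chrom_len):
    """Return the half-open interval [lo, hi) covered by the extended read."""
    if strand == "+":
        lo, hi = start, start + fragment_len
    else:
        # A minus-strand read marks the right end of the fragment.
        hi = start + read_len
        lo = hi - fragment_len
    return max(lo, 0), min(hi, chrom_len)


def coverage_density(reads, chrom_len, read_len=36, fragment_len=200):
    """Per-base count of overlapping extended reads, divided by the total
    number of bases contained within all extended reads."""
    diff = [0] * (chrom_len + 1)  # difference array: +1 at start, -1 at end
    total_bases = 0
    for start, strand in reads:
        lo, hi = extend_read(start, strand, read_len, fragment_len, chrom_len)
        diff[lo] += 1
        diff[hi] -= 1
        total_bases += hi - lo
    if total_bases == 0:
        raise ValueError("no reads to compute a density from")
    density = []
    depth = 0
    for position in range(chrom_len):
        depth += diff[position]
        density.append(depth / total_bases)
    return density


# Hypothetical aligned reads: (leftmost aligned position, strand)
reads = [(100, "+"), (150, "+"), (364, "-")]
density = coverage_density(reads, chrom_len=1000)

# Position 250 is covered by all three extended reads
print(density[250])
```

The difference-array approach keeps the calculation linear in the number of reads plus the chromosome length, which matters when tens of millions of reads are processed.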
Several algorithms exist for identifying enriched regions [52,53] in high-throughput sequencing data. Currently we use fseq, which calculates a density estimate for each base pair by summing a set of Gaussian distributions, each centered on an extended sequence read. Thresholds, based on the set of density estimates throughout the genome, can then be used to identify enriched regions. In addition to identifying regions of open chromatin, we are also able to identify copy number variations by analyzing large-scale (100 kb to 1 Mbp) changes in the data.
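The idea of summing Gaussians centered on extended reads and then thresholding the resulting density can be sketched as follows. This is a minimal illustration and not the fseq implementation: the bandwidth, the truncation of each Gaussian at four standard deviations, the threshold rule (mean plus a multiple of the standard deviation of the density), and the read centers are all assumptions.

```python
import math
import statistics


def gaussian_density(centers, chrom_len, bandwidth):
    """Kernel density estimate at every base from read-center positions."""
    norm = 1.0 / (len(centers) * bandwidth * math.sqrt(2.0 * math.pi))
    reach = int(4 * bandwidth)  # ignore the negligible tail beyond 4 SD
    density = [0.0] * chrom_len
    for center in centers:
        for x in range(max(0, center - reach), min(chrom_len, center + reach + 1)):
            z = (x - center) / bandwidth
            density[x] += norm * math.exp(-0.5 * z * z)
    return density


def enriched_regions(density, n_sd):
    """Half-open intervals where density exceeds mean + n_sd * SD."""
    threshold = statistics.fmean(density) + n_sd * statistics.pstdev(density)
    regions = []
    start = None
    for x, value in enumerate(density):
        if value > threshold and start is None:
            start = x
        elif value <= threshold and start is not None:
            regions.append((start, x))
            start = None
    if start is not None:
        regions.append((start, len(density)))
    return regions


# Hypothetical centers of extended reads: a cluster near 500 bp and one
# isolated read near 1500 bp
centers = [495, 500, 502, 505, 510, 1500]
density = gaussian_density(centers, chrom_len=2000, bandwidth=50)
regions = enriched_regions(density, n_sd=2.0)
print(regions)
```

In this toy example the cluster of reads produces a region above the threshold while the isolated read does not, which mirrors how a genome-wide threshold separates reproducible enrichment from scattered background reads.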
Several aspects of FAIRE make it a powerful genome-wide approach for detecting functional in vivo regulatory elements in eukaryotes. It requires little treatment of cells prior to the addition of formaldehyde and involves only a few reagents: formaldehyde, phenol, chloroform, and ethanol. The successful application of FAIRE to a limited number of cells expands its utility beyond what other DNA accessibility assays can accomplish. This provides an opportunity to perform genome-wide assays of chromatin structure on tissue samples from patients, or to grow cells in small-well plates to screen small molecules for chromatin effects. Additionally, since FAIRE recovers the complete DNA fragments at regulatory elements, it is possible to use this material directly in functional assays, such as with reporter vectors.
Genome-wide maps of active regulatory elements will allow a better understanding of how the availability of sequence-based regulatory elements is coordinated with the regulation of factors that utilize them in a given cellular environment. The emerging set of consortium-based datasets, such as those derived from the ENCODE project, will provide a foundation for understanding the relationships among these factors, and be critical to constructing realistic models of gene regulation in eukaryotic cells. The next major challenge will be to functionally annotate the catalogue of regulatory elements discovered across a diverse set of cell types, organisms, and disease states.
We thank members of the Lieb lab for discussions. Support for this work has been provided by grants from the NHGRI.