In eukaryotes, packaging of DNA into chromatin reduces the accessibility of genetic information to the set of proteins involved in regulating DNA-templated processes such as transcription. Successful orchestration of DNA-dependent processes is achieved in part by regulating the stability of nucleosomes at these sites [1]. Here “stability” refers to the probability of an intact nucleosome at a given nucleotide position, versus a nucleosome in an absent or disrupted state at that position. Several mechanisms exist to modulate nucleosome stability, including competition with sequence-specific factors [4], ATP-dependent nucleosome remodeling complexes [8] and post-translational modifications of the histone tails [11]. Nucleosome stability at any given locus is governed by a combination of factors acting in concert, which results in a context-specific set of DNA elements bound by regulatory factors for each cell type.
Traditionally, active regulatory elements have been identified by their increased sensitivity to digestion by nucleases such as DNase I [15]. Typically this involves subjecting isolated nuclei to a mild nuclease treatment, followed by detection using Southern blots to identify nuclease hypersensitive sites. Several groups have recently adapted the procedure for genome-wide detection with DNA microarrays or next-generation sequencing [21]. However, the requirement for a clean nuclei preparation from a single-cell suspension and the need for laborious enzyme titrations mean that it is difficult to perform DNase hypersensitivity assays on solid tissues, on a limited number of cells, or in parallel on many different samples.
Here we describe an alternative strategy for genome-wide isolation of active regulatory elements termed FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). It is a simple, high-throughput procedure to isolate and map genomic regions depleted of nucleosomes. The procedure involves crosslinking proteins to DNA using formaldehyde, shearing the chromatin, and performing a phenol-chloroform extraction. The genomic regions preferentially segregated into the aqueous phase are then mapped back to the genome by hybridization to tiling microarrays or are read directly using next-generation DNA sequencing. Quantitative PCR can be used to assay individual loci, which is useful when screening many cell or tissue types. The relative simplicity and tractability of FAIRE give it broad utility for the genome-wide detection of active regulatory elements across all eukaryotic species, in clinical samples, and for high-throughput screens.
FAIRE was first demonstrated in Saccharomyces cerevisiae. In yeast, the genomic regions immediately upstream of genes were preferentially segregated into the aqueous phase, in a manner that was strongly negatively correlated with nucleosome occupancy [26]. Subsequent studies demonstrated that FAIRE efficiently isolated nucleosome-depleted regions of the Homo sapiens genome, which included both transcription start sites and distal regulatory elements such as enhancers and silencers [30]. In both yeast and human, enrichment of the upstream region of a gene was positively correlated with transcription of that gene. However, in human cells the vast majority of sites identified were far from any annotated gene. For the majority of these distal sites, it is not yet possible to ascribe a function, identify what factors might be bound, or determine the genes being regulated by each regulatory element.
The enrichment of regulatory regions in the aqueous phase is thought to result from the very high efficiency with which histone proteins crosslink to DNA, compared with the lower crosslinking efficiency of sequence-specific proteins. This difference is likely due in part to formaldehyde’s very short crosslinking distance. Formaldehyde is a small molecule (HCHO), so crosslinks form only between proteins and DNA that are in direct contact. There are approximately 10 to 15 histone-DNA interactions within a nucleosome that serve as potential crosslinking sites [31]. Most DNA-binding proteins, however, offer far fewer potential crosslinking sites. Their binding sites are on average 5 to 15 bp long [32], with only a few of the bases close enough to the protein contacts to be crosslinked [33]. In addition, formaldehyde requires an ε-amino group, such as the one on lysine, to form a crosslink [34]. Lysine accounts for approximately 10% of the amino acids in histones, a much higher proportion than in a typical protein. For both of these reasons, nucleosomes are crosslinked to DNA much more readily and are likely to dominate the crosslinking profile.
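The effect of the number of contacts can be illustrated with a simple toy model. The Python sketch below is not part of the FAIRE procedure and is not fitted to any data. It assumes that each protein-DNA contact is crosslinked independently with the same probability, and it uses a hypothetical per-contact probability of 0.1 and a hypothetical count of 2 usable contacts for a sequence-specific factor. Only the range of 10 to 15 histone-DNA contacts is taken from the text above.

```python
def p_crosslinked(n_contacts: int, p_per_contact: float) -> float:
    """Probability that at least one of n independent contacts is crosslinked.

    Toy model: every contact is assumed to crosslink independently with the
    same probability, which is a simplification made only for illustration.
    """
    if n_contacts < 0:
        raise ValueError("n_contacts must be non-negative")
    if not 0.0 <= p_per_contact <= 1.0:
        raise ValueError("p_per_contact must lie in [0, 1]")
    return 1.0 - (1.0 - p_per_contact) ** n_contacts


# Hypothetical per-contact crosslinking probability (an assumption, not a measurement).
P_PER_CONTACT = 0.1

scenarios = [
    ("sequence-specific factor (assumed 2 contacts)", 2),
    ("nucleosome, 10 histone-DNA contacts", 10),
    ("nucleosome, 15 histone-DNA contacts", 15),
]

for label, n_contacts in scenarios:
    prob = p_crosslinked(n_contacts, P_PER_CONTACT)
    print(f"{label}: P(at least one crosslink) = {prob:.2f}")
```

Under these assumptions the probability of capturing a nucleosome on DNA is several-fold higher than that of capturing a factor with only a few contacts. The gap would widen further if the higher lysine content of histones were modeled as a larger per-contact probability.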
Formaldehyde crosslinking efficiency as the basis for FAIRE