A number of assays analyze chromatin state and identify active gene regulatory elements genome-wide, including mapping of DNaseI hypersensitive sites (DHSs) (11), formaldehyde-assisted isolation of regulatory elements (FAIRE) (13) and chromatin immunoprecipitation (ChIP) (reviewed in 14). Figure 2 schematically shows how these assays work. DHS mapping and FAIRE identify active regulatory elements through detection of nucleosome-free regions, whereas ChIP identifies specific transcription factor-binding sites and the presence of specific histone variants and histone tail modifications. DNA methylation can also be studied using a number of methods (15), but that is beyond the scope of this review. DHS mapping, FAIRE and ChIP are distinct methods, and the strengths and limitations of using these assays in large population studies are discussed below.
Figure 2. A schematic diagram of three chromatin assays. The blue cylinders represent the core histones, around which the DNA is wrapped. The pentagon represents a bound transcription factor (TF) which has displaced histones in a particular region; this region ...
DHSs represent regions of the genome where nucleosomes have been displaced by transcription factors, making them hypersensitive to DNaseI digestion. These regions are commonly described as ‘open’ chromatin, whereas the remaining regions are ‘closed’. DHS mapping can robustly identify all types of active regulatory elements, including promoters, enhancers, silencers, insulators and locus control regions. While DHSs do not directly reveal which transcription factor(s) are binding to each region, they do identify in a general sense where the functional regulatory elements of the genome are and whether they are open or closed across diverse cell types, as well as within the same cell type across many individuals.
FAIRE uses formaldehyde to biochemically separate DNA that is packaged in nucleosomes from DNA that is bound by non-nucleosomal proteins such as transcription factors. Although FAIRE also enriches for open chromatin regions, it is methodologically independent of DNaseI experiments and therefore complementary to them. It is also comprehensive, in that it is an inherently genome-wide method that enriches for all known classes of regulatory elements. Since FAIRE uses formaldehyde, a distinct advantage of this method is that it can readily be used on fixed frozen tissues.
ChIP precisely determines the location of specific DNA-associated proteins, histone variants and histone modifications within the genome, which is more informative than the general open chromatin data generated by DNaseI and FAIRE. Specific factors, variants and/or modifications can be targeted for analysis depending on the disease or suspected gene involvement. While ChIP provides very specific information about factor location, the assay is limited to factors for which high-quality ChIP-grade antibodies are available, and only one factor is tested per experiment. Tagged versions of proteins are an alternative option for cultured cells, but are not suitable for studying primary cell types or tissues.
The original implementations of these methods to study chromatin involved detection of specific signals using Southern blots, PCR or microarray hybridization, but all of these methods have now been adapted to use next-generation sequencing (DNase-seq, FAIRE-seq and ChIP-seq), which in addition to providing a genome-wide readout, also offers the opportunity to resolve allele-specific signals. DNaseI, FAIRE and ChIP experiments generate libraries of DNA fragments that are enriched in genomic regions corresponding to open chromatin (DNaseI and FAIRE), or that were cross-linked to targets of specific antibodies (ChIP). These fragments will vary in size depending on the protocol, but all are amenable to construction of sequencing libraries using any of the currently available platforms. Each sequencing experiment generates tens of millions of short sequence reads that provide a sampling of the DNA in the constructed library.
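To make the idea of an allele-specific signal concrete, the following is a minimal sketch, not a method taken from the studies reviewed here. It assumes reads are already aligned without gaps and are given as hypothetical `(start, sequence)` pairs, counts the reads carrying each allele at one heterozygous SNP, and compares the counts with the 50:50 ratio expected in the absence of allelic imbalance using an exact two-sided binomial test. The function names and the choice of test are illustrative assumptions.

```python
from math import comb


def allele_counts(reads, snp_pos, ref_allele, alt_allele):
    """Count reads carrying the reference or alternate allele at one SNP.

    reads: iterable of (start, sequence) pairs, where start is the 0-based
    reference coordinate of the first base of an ungapped alignment
    (a simplifying assumption for this sketch).
    """
    ref = alt = 0
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):  # the read overlaps the SNP
            base = seq[offset]
            if base == ref_allele:
                ref += 1
            elif base == alt_allele:
                alt += 1
    return ref, alt


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    if n == 0:
        return 1.0
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Toy data: three reads overlap a SNP at position 4 (ref 'A', alt 'G').
reads = [(0, "ACGTA"), (2, "GTACC"), (3, "TGCCA"), (10, "AAAAA")]
ref_n, alt_n = allele_counts(reads, snp_pos=4, ref_allele="A", alt_allele="G")
p_value = binomial_two_sided_p(ref_n, ref_n + alt_n)
```

In real data, the expected ratio may deviate from 0.5 because of alignment and other technical biases, so the null proportion `p` is left as a parameter.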
To determine regions of open chromatin, ChIP targets and allele-specific biology, sequence reads must be aligned to a reference genomic sequence. A large and growing number of software packages are available to align reads and further process these data (16). The short length of the sequence reads, the repetitive nature of large mammalian genomes and the incompleteness of specific types of regions within the reference genome create challenges that must be carefully considered, particularly for detection of allele-specific signals at heterozygous SNPs because of the effect of apparent mismatches on shorter reads as described below. Paired-end sequencing of both ends of an enriched DNA fragment can alleviate some of the inherent uncertainty of aligning reads to the reference genome.
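The effect of apparent mismatches at heterozygous SNPs can be illustrated with a toy model. The sketch below assumes a hypothetical aligner that accepts a read only if it differs from the reference at no more than a fixed number of positions; it does not reproduce the scoring scheme of any real alignment package. A read carrying the non-reference allele starts with one mismatch already spent, so a single sequencing error can push it over the limit while the equivalent reference-allele read still aligns.

```python
def mismatches(read, reference, start):
    """Number of positions at which the read differs from the reference window."""
    window = reference[start:start + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def aligns(read, reference, start, max_mismatches):
    """Toy acceptance rule: keep the read if it has few enough mismatches."""
    return mismatches(read, reference, start) <= max_mismatches


reference = "ACGTACGTAC"  # the SNP is at position 4 (ref 'A', alt 'G')

# Both reads carry the same sequencing error at position 7 (T read as A).
ref_read = "ACGTACGA"  # reference allele: 1 mismatch in total
alt_read = "ACGTGCGA"  # alternate allele: 2 mismatches in total

ref_kept = aligns(ref_read, reference, 0, max_mismatches=1)
alt_kept = aligns(alt_read, reference, 0, max_mismatches=1)
```

Under this rule the reference-allele read is kept and the alternate-allele read is discarded, which would skew allele counts toward the reference allele. The shorter the read, the larger the share of the mismatch budget that the SNP itself consumes.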