Eukaryotic promoters are located directly upstream of the transcribed region, providing a site for recruitment of RNA polymerases and subsequent initiation of transcription. Promoter-driven transcription is influenced by cis-regulatory elements – sequences located on the same chromosome anywhere from a few kb to tens or hundreds of kb from the gene(s) they help to regulate. Cis-regulatory elements that modulate gene expression include enhancers, silencers, locus control regions (LCRs), insulator/boundary elements and matrix attachment regions (MARs). Enhancers greatly increase the basal level of gene transcription regardless of their orientation and location upstream, downstream or even within introns. In contrast, silencers repress expression, either actively or in a ‘hit-and-run’ fashion by inducing the formation of heritable alterations to chromatin (5). LCRs contain both enhancer and insulator activity and are defined as cis-regulatory elements conferring tissue-specific, copy-number-dependent gene expression (6). Insulators/boundary elements may demarcate genomic loci and establish or maintain the organization of chromatin into domains that allow genes to be regulated independently of the influence of neighboring loci (7). Some insulators function as barriers, protecting a genomic locus from silencing by surrounding heterochromatin, while others block the activity of neighboring enhancers. MARs are thought of as sequences through which chromatin loops are tethered, perhaps to the nuclear matrix (8).
Gene expression is governed by these regulatory elements through the binding of TFs and other regulatory proteins, with the probability of binding determined by factor abundance and ability to access target regulatory elements in chromatin. Chromatin accessibility at regulatory elements is influenced by nucleosome composition, position and interactions with DNA, post-translational histone modifications, and methylation of cytosines within CpG dinucleotides. Such modifications also serve as landmarks for the presence and activity of regulatory elements in a given cell type (9), and techniques for their detection are essential tools for finding candidate regulatory elements. Regulatory elements often show evolutionary conservation, which means that they may be predicted by computational searches for evolutionarily conserved non-coding sequences (CNSs) (Box 1).
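The computational search for conserved non-coding sequences mentioned above can be illustrated with a minimal sketch. It scans a pairwise alignment of two orthologous sequences and reports windows whose identity meets a cutoff. The function name `conserved_windows`, the window size and the identity threshold are illustrative assumptions, not values from the text, and the sketch does not mask coding exons, which a real CNS search would have to do.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.70):
    """Return (start, end, identity) for alignment windows at or above a cutoff.

    aln_a and aln_b are equal-length aligned sequences with '-' for gaps.
    Coordinates are alignment columns (0-based, end-exclusive).
    window and min_identity are illustrative defaults (assumptions).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base, 0 for mismatches and gaps
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aln_a.upper(), aln_b.upper())
    ]
    hits = []
    if len(matches) < window:
        return hits
    running = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # slide the window one column: add the new column, drop the old one
            running += matches[start + window - 1] - matches[start - 1]
        identity = running / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits
```

Overlapping windows that pass the cutoff would normally be merged into a single candidate region before being compared with HS sites or histone-mark profiles.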
Box 1. Approaches to Identify or Infer the Presence of Gene Regulatory Regions and to Characterize their Status and Interactions
This box describes complementary experimental approaches for detecting candidate gene regulatory elements. Comparing findings in cell types that do or do not express the gene(s) of interest (e.g., Th1 vs. Th2 cells), or that do or do not have the potential to do so (e.g., naïve CD4 T cells vs. fibroblasts), enhances element detection and, in some cases, allows the function of elements to be imputed, thus helping to prioritize and inform the design of experiments that test function directly.
Chromatin immunoprecipitation (ChIP)
This assay detects modified histones, transcription factors and other regulatory proteins by immunoprecipitation followed by analysis of the associated DNA. It provides information regarding the location of regulatory elements, genes/gene loci and locus boundaries, and their activity in the cell type studied.
- ChIP-PCR and ChIP-qPCR – Focused assessment of candidate sequences/regions by PCR or quantitative real-time PCR.
- ChIP-chip – DNA is amplified and hybridized to a focused or genome-wide microarray (with repetitive regions omitted). More comprehensive than PCR-based approaches; technical and statistical/threshold issues must be carefully considered; may be somewhat less sensitive and have a narrower dynamic range than qPCR (15, 57).
- ChIP-seq – DNA is amplified, size selected and the abundance and sequence of short tags determined by high-throughput sequencing until novel tag discovery reaches/approaches saturation (typically >10^7 tag sequences); more quantitative than ChIP-chip with high dynamic range; limited polymorphisms between individuals/strains do not interfere with detection or quantitation; allele-specific differences can be detected; if sequencing does not reach saturation, may fail to detect more weakly associated sequences (13••, 16•).
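The saturation criterion for ChIP-seq can be checked with a rough sketch like the one below. It shuffles the sequenced tags, counts distinct tags at increasing depths, and reports the fraction of reads in the final increment that were novel. The function names, the representation of a tag as any hashable value (for example a chromosome, position and strand tuple) and the number of steps are assumptions made for illustration.

```python
import random


def saturation_curve(tags, steps=10, seed=0):
    """Return [(reads_sampled, distinct_tags)] at evenly spaced depths.

    tags is any sequence of hashable tag identifiers (assumed representation).
    """
    if not tags:
        return []
    rng = random.Random(seed)
    shuffled = list(tags)
    rng.shuffle(shuffled)
    # depths at which the distinct-tag count is recorded
    checkpoints = sorted(
        {max(1, round(len(shuffled) * i / steps)) for i in range(1, steps + 1)}
    )
    seen = set()
    curve = []
    next_cp = 0
    for n, tag in enumerate(shuffled, start=1):
        seen.add(tag)
        if next_cp < len(checkpoints) and n == checkpoints[next_cp]:
            curve.append((n, len(seen)))
            next_cp += 1
    return curve


def novel_fraction(curve):
    """Fraction of reads in the final increment that had not been seen before."""
    if len(curve) < 2:
        return None
    (n_prev, d_prev), (n_last, d_last) = curve[-2], curve[-1]
    return (d_last - d_prev) / (n_last - n_prev)
```

A novel fraction near zero suggests that further sequencing would add few new tags, whereas a value near one indicates that the library is far from saturation and weakly associated sequences may be missed.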
MethylDIP-chip or MethylDIP-seq
This assay assesses DNA cytosine methylation. DNA methylation of regulatory elements, particularly at promoters and the proximal transcribed regions, typically inhibits transcription. Cytosine methylation is assessed as for ChIP-chip or ChIP-seq using an antibody specific for methylated cytosine. The approach allows genome-wide, semi-quantitative assessment of DNA methylation, but it may underestimate or fail to detect CpG methylation in regions in which the density of CpGs is low, and its resolution is lower than that of sequencing of bisulfite-modified DNA (58).
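Because this assay is sensitive to CpG density, it can help to quantify that density for a region before interpreting a weak signal. The sketch below computes the CpG count, CpGs per 100 bp, GC fraction and the widely used observed/expected CpG ratio (CpG count × length / (C count × G count)). The function name `cpg_stats` and the returned field names are assumptions, and no threshold for "low density" is implied by the text.

```python
def cpg_stats(seq):
    """Return basic CpG statistics for a DNA sequence (single strand)."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    expected = c * g / n  # expected CpG count if C and G were independent
    return {
        "cpg_count": cpg,
        "cpg_per_100bp": 100.0 * cpg / n,
        "gc_fraction": (c + g) / n,
        "obs_exp": cpg / expected if expected > 0 else 0.0,
    }
```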
DNase I hypersensitive site detection (HS)
HS sites are regions where the density of nucleosomes is reduced or the association of DNA with nucleosomes is otherwise altered so that the DNA is more sensitive to digestion with DNase I (and/or other nucleases) than surrounding regions. HS sites are present at all or nearly all gene regulatory elements, including promoters, enhancers, silencers, boundary elements and locus control regions, that are active or poised for activity in the cell type evaluated.
- Southern blot analysis – Yields approximate location and semi-quantitative estimate of the degree of hypersensitivity.
- DNase-qPCR – Focused, high-resolution (~250 bp), quantitative detection of HS sites.
- DNase-chip or DNase-seq – Approach, principles, output, resolution and quantitation analogous to ChIP-chip or ChIP-seq (19••, 59, 60).
Chromatin Conformation Capture (52•)
These assays assess the physical proximity between DNA sequences as they occur in the nucleus. As noted in the text, distal regulatory elements are often approximated to the genes they regulate and active or repressed genes are often approximated to each other at transcription factories or in heterochromatin, respectively.
- 3C – chromatin conformation capture – Focused analysis of interactions between one DNA fragment (the anchor) and a few other regions of interest with detection by PCR or qPCR.
- 4C – circular chromatin conformation capture – Allows genome-wide screening for DNA fragments interacting with a specific (anchor) DNA fragment. 3C is followed by inverse PCR and detection of interacting DNA fragments by hybridization to tiling microarrays or high-throughput sequencing. Most useful for detecting interactions with regions ≥5 Mb from the anchor fragment, with specific interactions difficult to resolve from background for regions <5 Mb away.
- 5C – chromatin conformation capture carbon copy – Multiplexed analysis of hundreds/thousands of DNA fragments within a genomic region of interest, followed by detection as done for 4C.
- ChIP-loop – Chromatin immunoprecipitation is used first to select for sequences associated with a specific protein, after which 3C is performed.
FISH (fluorescence in situ hybridization)
This assay assesses the sub-nuclear location of genes/loci/chromosome territories and the physical proximity of two or more genomic regions to each other. The information is complementary to that obtained by 3C, 4C and 5C; resolution of sequences separated by as little as 90 kb of linear DNA can be achieved (62).
3D-FISH – FISH is performed on cells processed so that the 3-dimensional structure of the nucleus is preserved, with or without 3-D reconstruction of images obtained by confocal fluorescence microscopy.
A variation of 3D-FISH in which 100-200 nm cryosections are made of paraformaldehyde-fixed cells, which improves resolution by removing out-of-focus light that would otherwise be reflected by objects outside the section in the z axis (63).
Immuno-FISH – Immunofluorescence and FISH are done together to detect protein co-localization with specific genes/loci/chromosomes.
Nucleosomes are typically displaced or altered in conformation at functional regulatory elements, thereby rendering the DNA at these sites hypersensitive to digestion by DNase I. Thus, DNase hypersensitive (HS) sites denote the presence of regulatory elements functional in the cell type studied (12). Patterns of histone modifications, DNA methylation and binding of specific transcription and regulatory factors, when correlated to gene expression patterns in specific cell types, provide additional information by which the function of specific elements may be inferred. Technological advances have markedly accelerated element discovery by providing high-resolution, genome-wide profiles of epigenetic marks in embryonic stem cells and certain differentiated cell types, including primary human CD4 T cells (12).
Chromatin immunoprecipitation (ChIP) paired with genome tiling arrays (ChIP-chip) or high-throughput sequencing (ChIP-seq) has revealed that, in active or poised genes, histone H3 mono- or di-methylated on lysine 4 (H3-K4me1) marks the transcriptionally permissive chromatin of distal regulatory elements, transcribed regions and promoters; H3-K4me3 is markedly enriched on nucleosomes flanking the transcription start site; H2A.Z (a variant of H2A) is present at promoters and distal regulatory elements but not within transcribed regions; RNA polymerase II (Pol II) and TAF1 are bound to transcription start sites and nucleosomes are displaced from them (20); DNase HS sites, detected using DNase-chip or DNase-seq, are found at the promoters and at intronic and distal regulatory elements of genes with these permissive chromatin marks (11•); these marks and Pol II binding are most intense at actively transcribed genes, are commonly detected though less intense at primed and poised ‘bivalent’ genes, but are not detected at poised ‘null’ and silenced genes (see below). H3-K4me3 coordinates proper transcription initiation by docking TFIID, CHD1, BPTF/NURF and MLL complexes, which in turn facilitate chromatin remodeling, transcript elongation, splicing and histone acetylation, sustain H3-K4 methylation, and remove repressive H3-K9 and H3-K27 methylation (20).
H3-K36me3 is present throughout the transcribed region of genes undergoing active transcription. Thus, focal H3-K4me3 followed by a region of H3-K36me3 identifies active promoters and their transcribed regions, with the magnitude of these marks correlating directly with transcription. Conversely, H3-K27me3 is not present in genes marked by H3-K36me3, but rather at silent genes, consistent with its role in Polycomb-mediated repression (23).
H3-K9me3/2 is an alternative mark associated with the silent heterochromatin of repetitive elements and transposons. However, discrete peaks of H3-K9me3/2 are present in some active genes where they are thought, like H3-K36me3, to inhibit inappropriate transcription initiation (24). In contrast to the repressive nature of H3-K27me3/2
are found at active genes. Promoters of some genes are ‘bivalent’, marked both by H3-K4me3 and H3-K27me3. ‘Bivalent’ marks are most commonly found at CpG-rich promoters of developmentally regulated genes that are inactive but poised for induction or silencing on differentiation, suggesting that lack of expression in this context is an active process. By contrast, CpG-poor promoters of tissue-specific genes that are rapidly and transiently activated in response to environmental stimuli, i.e., immune response genes, often have neither mark in precursors but gain H3-K4me3 or H3-K27me3 as they differentiate into expressing or non-expressing cell types, respectively (13•). Boundary elements are suggested by the presence of an HS site separating a domain of accessible chromatin from repressive chromatin, where CCCTC-binding factor (CTCF) and cohesin mark and create chromatin domain boundaries (13••). However, CTCF and cohesin may also bind within transcribed regions, and at these sites their binding does not necessarily impede transcription and chromatin remodeling, as is the case for a binding site in the first intron of Ifng, for reasons as yet unclear (26•). Though not defined by such global studies, silencers may have ‘bivalent’ marks that resolve with further differentiation, as shown for the Il4
Figure 1. Typical patterns of histones/histone modifications, DNase hypersensitive sites (HS), RNA polymerase II (Pol II) and CTCF binding to genes and their regulatory elements. A gene, its regulatory elements and surrounding chromatin domain are shown on the ...
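The mark combinations described in the text can be summarized as a simple lookup, sketched below under strong simplifying assumptions. Each mark is reduced to a present/absent call, whereas real ChIP-chip or ChIP-seq data are quantitative and require thresholds. The class name `PromoterMarks`, the function `classify_promoter` and the label strings are hypothetical, and combinations that the text does not describe are returned as "unclassified" rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterMarks:
    k4me3: bool   # focal H3-K4me3 at the promoter
    k27me3: bool  # H3-K27me3 at the promoter
    k36me3: bool  # H3-K36me3 across the transcribed region


# (k4me3, k27me3, k36me3) -> label; only combinations described in the text
_STATES = {
    (True, False, True): "active",     # focal K4me3 followed by K36me3
    (True, True, False): "bivalent",   # both K4me3 and K27me3, poised
    (False, True, False): "silenced",  # K27me3 at silent genes
    (False, False, False): "null",     # neither mark in precursors
}


def classify_promoter(marks):
    """Map a set of boolean mark calls to a descriptive chromatin state."""
    key = (marks.k4me3, marks.k27me3, marks.k36me3)
    return _STATES.get(key, "unclassified")
```

For example, `classify_promoter(PromoterMarks(k4me3=True, k27me3=True, k36me3=False))` returns "bivalent". In practice such calls would be combined with HS sites, Pol II binding and expression data before any function is inferred.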