In contrast to the model of direct gene regulation, several studies have demonstrated transcription factor binding at a large number of sites, many of which cannot be clearly connected with target gene regulation. In Drosophila, several ChIP-chip studies using whole-genome tiling arrays have been performed for developmental transcription factors [21]. These studies have identified a large number of binding regions, on the order of several thousand, for individual factors in the developing embryo, indicating far more DNA binding by developmental factors than had been anticipated. For example, over 2,000 binding regions were observed for Twist in the Drosophila genome in two separate studies utilizing distinct microarray designs [21], vastly exceeding the number of known Twist targets and including many intronic and intergenic sites. Also unexpectedly, Twist binding overlaps significantly with both Dorsal and Snail binding sites, and many of these sites possess highly conserved motifs. This conservation suggests that these sites are functional, but their significance is still unclear.
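Overlap between the binding regions of two factors, as in the Twist, Dorsal and Snail comparison, is commonly quantified with a simple interval-overlap test. The sketch below is a minimal illustration under stated assumptions, not the method used in the cited studies: regions are half-open (chromosome, start, end) tuples, the function names are our own, and the coordinates are hypothetical toy values.

```python
from typing import List, Tuple

# A binding region as (chromosome, start, end), half-open [start, end).
Region = Tuple[str, int, int]

def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(query: List[Region], reference: List[Region]) -> float:
    """Fraction of query regions that overlap at least one reference region.

    A plain O(n * m) scan for clarity; genome-scale tools use sorted or indexed intervals.
    """
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)

# Hypothetical toy coordinates, not data from the Twist or Dorsal studies.
twist = [("chr2L", 100, 400), ("chr2L", 5_000, 5_300), ("chr3R", 900, 1_200)]
dorsal = [("chr2L", 350, 600), ("chr3R", 1_100, 1_500)]

print(fraction_overlapping(twist, dorsal))
```

Two of the three toy Twist regions overlap a Dorsal region here. Real comparisons usually also test whether the observed overlap exceeds what randomly placed regions of the same sizes would give.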
While widespread binding of early developmental transcription factors is perhaps not entirely surprising [24], the unexpected finding has been the identification of numerous binding sites of unclear function, and such sites have been found for other factors as well. Studies of the binding and gene regulation of Myc and other proteins of the dMax family in Drosophila and human cells have shown extensive binding across the genome, but that binding did not necessarily correlate with transcriptional regulation of the nearby target genes [25].
In an early ChIP-seq study examining the interferon-γ (IFN-γ)-responsive transcription factor STAT1 in human cells, a strikingly large number of bound sites was observed [27]. In unstimulated cells, over 10,000 binding sites were identified, and this number increased more than four-fold after stimulation with IFN-γ. In both conditions, approximately 50% of the sites were intragenic and 25% intergenic. Although there was strong overlap with sites of known STAT1 activity, the majority of binding sites were not located adjacent to STAT1-regulated genes, suggesting that many, or most, bound sites were not directly regulating a nearby target gene. The authors suggested that many of the STAT1 sites might correspond to weaker, less favored binding sites, or possibly to functional sites with STAT1 bound in only a subset of the cell population.
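Percentages such as "50% intragenic and 25% intergenic" come from assigning each bound site to a genomic location class. The sketch below shows one simple way to do this, assuming a hypothetical ±1 kb promoter window around each transcription start site (TSS), classification by peak midpoint, and promoter taking priority over intragenic; the cutoff, names and toy coordinates are illustrative and are not those of the STAT1 study.

```python
from collections import Counter
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int, str]  # (chromosome, start, end, strand), half-open [start, end)
Peak = Tuple[str, int, int]       # (chromosome, start, end), half-open [start, end)

PROMOTER_WINDOW = 1_000  # hypothetical cutoff: bp on either side of a TSS counted as "promoter"
LABELS = ("promoter", "intragenic", "intergenic")

def tss(gene: Gene) -> int:
    """Transcription start site: first base on the '+' strand, last base on the '-' strand."""
    _, start, end, strand = gene
    return start if strand == "+" else end - 1

def classify(peak: Peak, genes: List[Gene]) -> str:
    """Label a peak by its midpoint; 'promoter' takes priority over 'intragenic'."""
    chrom, start, end = peak
    mid = (start + end) // 2
    same_chrom = [g for g in genes if g[0] == chrom]
    if any(abs(mid - tss(g)) <= PROMOTER_WINDOW for g in same_chrom):
        return "promoter"
    if any(g[1] <= mid < g[2] for g in same_chrom):
        return "intragenic"
    return "intergenic"

def location_fractions(peaks: List[Peak], genes: List[Gene]) -> Dict[str, float]:
    """Fraction of peaks in each location class (all zeros for an empty peak list)."""
    if not peaks:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(p, genes) for p in peaks)
    return {label: counts[label] / len(peaks) for label in LABELS}

# Hypothetical toy annotation and peaks, not data from the STAT1 study.
genes = [("chr1", 10_000, 30_000, "+"), ("chr1", 50_000, 70_000, "-")]
peaks = [
    ("chr1", 9_800, 10_200),   # midpoint at the '+' gene's TSS
    ("chr1", 20_000, 20_400),  # inside the '+' gene body
    ("chr1", 55_000, 55_400),  # inside the '-' gene body
    ("chr1", 40_000, 40_400),  # between the two genes
]
print(location_fractions(peaks, genes))
```

The reported fractions depend on the window size and on how overlapping classes are prioritized, which is one reason location percentages are hard to compare across studies.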
As another example of widespread binding, the hematopoietic factor GATA1 was reported to have over 15,000 DNA binding sites in a mouse erythroblast line [28]. GATA1 binding is apparently necessary for the binding of another hematopoietic factor, the basic helix-loop-helix (bHLH) factor TAL1, to an adjacent E-box motif, the consensus binding site for bHLH factors. TAL1 binding is strongly associated with erythroid gene regulation [29]: in one study, TAL1 was bound within putative regulatory elements of over 2,000 genes, most of which (90%) were categorized as related to erythroid development, and in another study over half of TAL1-regulated genes had TAL1 bound within a proximal or distal regulatory element [29]. In this case, the widespread binding of GATA1 might mark the sites that can be bound by TAL1, and possibly by other factors at different times or in different cells, to execute cell-type-specific programs of gene expression.
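A GATA site paired with an adjacent E-box can be located directly in DNA sequence. The sketch below assumes exact consensus matching with CANNTG for the E-box and WGATAR for GATA (plus its reverse complement YTATCW), and a hypothetical `max_gap` parameter for how close the two motifs must be; it is not the analysis used in [28] or [29], and real motif scans typically use position weight matrices instead of exact consensus strings.

```python
import re
from typing import List, Tuple

# E-box consensus CANNTG is its own reverse complement, so one pattern covers both strands.
EBOX = re.compile(r"(?=(CA..TG))")
# GATA consensus WGATAR on the given strand, or its reverse complement YTATCW.
GATA = re.compile(r"(?=([AT]GATA[AG]|[CT]TATC[AT]))")

def motif_positions(pattern: re.Pattern, seq: str) -> List[int]:
    """Start positions of all motif hits; the zero-width lookahead lets hits overlap."""
    return [m.start() for m in pattern.finditer(seq.upper())]

def ebox_gata_pairs(seq: str, max_gap: int = 20) -> List[Tuple[int, int]]:
    """(E-box start, GATA start) pairs whose start positions differ by at most max_gap bp."""
    eboxes = motif_positions(EBOX, seq)
    gatas = motif_positions(GATA, seq)
    return [(e, g) for e in eboxes for g in gatas if abs(g - e) <= max_gap]

seq = "TTCAGCTGACCGTTGCAGATAAGG"  # hypothetical toy sequence
print(ebox_gata_pairs(seq))
print(ebox_gata_pairs(seq, max_gap=10))
```

In the toy sequence the E-box starts at position 2 and the GATA motif at position 16, so the pair is reported with the default gap but not with `max_gap=10`.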
The myogenic bHLH factor MyoD is another transcription factor that offers potential insight into genome-wide binding. MyoD directly regulates genes expressed during skeletal muscle differentiation [32] and orchestrates a temporal pattern of gene expression through a feed-forward circuit [33]. ChIP-seq on MyoD in skeletal muscle cells identified approximately 30,000–60,000 MyoD binding sites [34]. As anticipated, genes regulated by MyoD during myogenesis had associated MyoD binding sites. However, almost 75% of all genes were associated with a MyoD binding site, and about 25% of the MyoD sites were in intergenic regions. Because far more genes carried a MyoD site than MyoD is known to regulate, the majority of MyoD binding events were not directly associated with gene regulation. Although local transcription was not detected at these intergenic sites, MyoD binding was shown to induce local chromatin modifications, specifically acetylation of histone H4, a mark generally associated with active and/or accessible regions of the genome.
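A statement such as "almost 75% of all genes were associated with a MyoD binding site" depends on a rule for linking peaks to genes. The sketch below assumes one simple, hypothetical rule (a peak midpoint inside the gene body extended by `flank` bp on each side) and shows how widening the window raises the fraction of bound genes while the fraction of bound genes that are actually regulated falls. The rule, flank values, gene names and regulated set are all illustrative and are not the assignment used in [34].

```python
from typing import List, Set, Tuple

Gene = Tuple[str, str, int, int]  # (name, chromosome, start, end), half-open [start, end)
Peak = Tuple[str, int, int]       # (chromosome, start, end), half-open [start, end)

def bound_genes(genes: List[Gene], peaks: List[Peak], flank: int) -> Set[str]:
    """Names of genes with at least one peak midpoint inside the gene body extended by flank bp."""
    mids = [(chrom, (start + end) // 2) for chrom, start, end in peaks]
    bound = set()
    for name, chrom, start, end in genes:
        if any(c == chrom and start - flank <= m < end + flank for c, m in mids):
            bound.add(name)
    return bound

def summarize(genes: List[Gene], peaks: List[Peak], regulated: Set[str],
              flank: int) -> Tuple[float, float]:
    """(fraction of genes bound, fraction of bound genes that are in the regulated set)."""
    bound = bound_genes(genes, peaks, flank)
    frac_bound = len(bound) / len(genes) if genes else 0.0
    frac_regulated = len(bound & regulated) / len(bound) if bound else 0.0
    return frac_bound, frac_regulated

# Hypothetical toy annotation, peaks and regulated-gene set, not data from the MyoD study.
genes = [
    ("A", "chr1", 1_000, 5_000),
    ("B", "chr1", 20_000, 25_000),
    ("C", "chr1", 60_000, 61_000),
    ("D", "chr2", 1_000, 3_000),
    ("E", "chr2", 90_000, 95_000),
]
peaks = [("chr1", 4_500, 4_700), ("chr1", 27_000, 27_200), ("chr2", 500, 700)]
regulated = {"A"}  # hypothetical: genes whose expression changes with the factor

for flank in (0, 5_000):
    print(flank, sorted(bound_genes(genes, peaks, flank)), summarize(genes, peaks, regulated, flank))
```

With these toy inputs, widening the flank from 0 to 5 kb raises the bound fraction from 0.2 to 0.6, while only one of the three bound genes is in the regulated set.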
Together with the studies discussed above, these findings demonstrate that some transcription factors bind at sites that vastly outnumber the genes they directly regulate. The remainder of this review will discuss the possible significance of this large number of transcription factor binding events that are not directly related to gene transcription. One proposed explanation for large-scale, genome-wide transcription factor binding is the presence of 'non-functional' binding sites that serve no biological purpose [22]. Alternatively, based on a large-scale ChIP-chip study in yeast, it has been proposed that transcription factors may bind to many low-affinity sites in the genome and contribute to gene expression at levels that are low but sufficient to allow evolutionary conservation [35]. If these sites are functional, other possible roles include affecting the functional concentration of factors, inducing chromatin looping, changing chromatin and nuclear structure, or enabling the evolution of new transcriptional regulatory networks.
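The low-affinity binding idea and the effect of many sites on the functional concentration of a factor can be made concrete with a simple equilibrium model. The sketch below assumes independent, non-cooperative 1:1 binding in a well-mixed volume (both simplifications of a real nucleus). Each site's occupancy is θ = [TF]free / ([TF]free + Kd), and the mass balance total = free + Σ n·θ is solved for the free factor concentration by bisection. All site counts, Kd values and concentrations are hypothetical, in one arbitrary unit.

```python
from typing import List, Tuple

def occupancy(free_tf: float, kd: float) -> float:
    """Equilibrium fraction of time one site is occupied (1:1 binding, no cooperativity)."""
    return free_tf / (free_tf + kd)

def free_concentration(total_tf: float, site_classes: List[Tuple[int, float]],
                       rel_tol: float = 1e-12) -> float:
    """Solve total_tf = free + sum(n * occupancy(free, kd)) for free by bisection.

    free + bound increases monotonically with free, so the root lies in [0, total_tf].
    site_classes holds (number_of_sites, kd) pairs; every kd must be positive.
    """
    lo, hi = 0.0, total_tf
    while hi - lo > rel_tol * max(total_tf, 1.0):
        mid = (lo + hi) / 2
        bound = sum(n * occupancy(mid, kd) for n, kd in site_classes)
        if mid + bound > total_tf:
            hi = mid  # more factor accounted for than exists: free must be lower
        else:
            lo = mid
    return (lo + hi) / 2

# Hypothetical toy system: 100 units of factor and 10 high-affinity sites (Kd = 1).
total = 100.0
high_affinity = (10, 1.0)
decoys = (10_000, 100.0)  # hypothetical: many low-affinity sites (Kd = 100)

free_alone = free_concentration(total, [high_affinity])
free_with_decoys = free_concentration(total, [high_affinity, decoys])
print(occupancy(free_alone, 1.0), occupancy(free_with_decoys, 1.0))
```

With these toy numbers, adding the low-affinity sites lowers the occupancy of the high-affinity sites from about 0.99 to just under 0.5, even though each low-affinity site is itself occupied less than 1% of the time.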