Whole genome analysis of transcription factors provides an unbiased view of their regulatory dynamics. Here we present a genome-wide analysis of the DNA binding sites of ERα as present in the MCF-7 breast cancer cell line and map these sites to transcripts regulated by estrogen. We used a cloning and sequencing based technology and identified 1,234 high probability binding sites using an algorithm that minimizes false positives from amplified regions of the genome. That 94% of a sample of these sites could be validated by standard ChIP suggests that the majority of the 1,234 sites identified by ChIP-PET represent bona fide binding regions for ERα. Of note is that 96% of the validated binding sites harbored either full ERE-like (71%) or solely half-ERE motifs (25%). Only 4% had no ERE-like sequences detectable using a two-position degeneracy cut-off, and therefore a pure tethered mechanism of ER transcriptional regulation must occur infrequently.
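The motif classification described above can be illustrated with a minimal sketch. It assumes the palindromic consensus GGTCAnnnTGACC for a full ERE and reads the "two-position degeneracy cut-off" as allowing at most two mismatches across the ten defined positions. The function names, the exact-match rule for half sites, and the test sequence are illustrative assumptions, not the pipeline used in this study.

```python
# Illustrative sketch only: consensus, thresholds, and names are assumptions.
CONSENSUS = "GGTCANNNTGACC"      # full ERE; N marks the 3-bp spacer (any base)
HALF_SITES = ("GGTCA", "TGACC")  # reverse complements of each other

def mismatches(window, consensus=CONSENSUS):
    """Count mismatches to the consensus, ignoring spacer (N) positions."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w != c)

def classify_site(seq, max_mismatch=2):
    """Label a binding-site sequence as full ERE-like, half-ERE only, or neither."""
    seq = seq.upper()
    k = len(CONSENSUS)
    # The consensus is palindromic, so scanning one strand covers both.
    for i in range(len(seq) - k + 1):
        if mismatches(seq[i:i + k]) <= max_mismatch:
            return "full ERE-like"
    # Searching for both half sites covers both strands.
    if any(half in seq for half in HALF_SITES):
        return "half-ERE only"
    return "no ERE-like"

label = classify_site("AAAGGTCACAGTGACCTTT")
```

Applied to every validated site, the three labels would give the kind of breakdown reported above (full ERE-like, half-ERE only, and no ERE-like sequence).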
The dispersed nature of these 1,234 sites relative to the TSSs makes direct molecular assessment of whether the adjacent genes can be regulated by ER highly impractical. We sought to resolve this problem by examining the clinical behavior of the genes adjacent to ER binding sites. We posited that if these adjacent genes were under ER regulation, then their expression should readily distinguish the ER status of primary breast cancers. Our results using a cohort of 251 breast cancers showed that these putative ER-regulated genes can significantly separate breast tumors by ER status and therefore represent a transcriptional regulatory cassette that appears to affect ER response. We further examined this question by studying the behavior of these genes in MCF-7 cells as assessed using expression arrays. Though only 23% of the genes adjacent to an ER binding site are responsive following estrogen treatment, this represents a significant enrichment of bona fide ER binding sites adjacent to estrogen-responsive versus unresponsive genes (p
51) (unpublished data). Therefore, our in vitro and in vivo probabilistic analyses both point to the biological significance of the ER binding sites identified by our ChIP-PET analysis in the regulation of gene expression by ER.
It is important to note the ability of ChIP-PET to identify, in an unbiased manner, bona fide ER binding sites among nearby EREs predicted only by computational methods. For example, for the carbonic anhydrase XII (CA12) gene, matrix-based computational approaches used to identify potential cis-regulatory elements directing ER-regulation of CA12 indicate that five putative EREs reside in the proximal 5′ 5 kb with an additional ERE found in the first intron of the gene. However, we have identified a moPET 5′ binding site approximately 6 kb 5′ to the gene, which was found to be the major regulatory site directing ER-mediated transactivation as a distal enhancer (D.H. Barnett and B.S. Katzenellenbogen, unpublished data). CA12 mRNA is up-regulated by estradiol in MCF-7 breast cancer cells [3] and in other ER-positive cells [33] and is positively associated with ERα status in primary breast tumors [34]. Hence, our findings highlight the ability of ChIP-PET to identify previously undiscovered enhancers of biologically relevant target genes.
Much of the research on ER transcriptional regulation has focused on a few EREs located within the proximal promoter. We have shown with our global binding site data that, in fact, the vast majority of sites are located in distal or intragenic regions relative to the nearest regulated transcripts. Our genome-wide analysis confirmed the more limited observations previously made on Chromosomes 21 and 22 that only a small portion (5%) of the binding sites are within 5 kb of the TSS, which is consistent with our previous predictions [6]. Intriguingly, however, detailed analyses revealed that a statistical preponderance of the genes that are responsive to E2 in MCF-7 cells and adjacent to ChIP-PET-identified ER binding sites were up-regulated rather than down-regulated. Moreover, the location of these sites next to E2-induced genes showed an obvious enrichment around the TSS both in prestart locations and in 5′ introns and within 50 kb from the TSS (B). The number of these sites is small when the entirety of genes regulated by ER is considered and therefore would have been missed by a less specific analysis. This distribution of the ER binding sites relative to the induced transcripts indicates diversity in both proximal and distal mechanisms in regulating RNA polymerase activity and suggests that the proposed looping mechanisms [15] may play a more prominent role in ER-mediated transcriptional regulation than previously thought. We have further mapped the entire transcriptome of the MCF-7 cell line using a full-length cDNA library sequencing approach [17]. In sequencing paired-end tags of over 500,000 full-length cDNA equivalents, we found that 13% of the 22,115 individual transcripts identified were novel. When novel transcripts from MCF-7 are accounted for, 90% of the 1,234 high-quality ER binding sites are within 100 kb of transcript boundaries (G. Bourque, C.L. Wei, and E.T. Liu, unpublished data). This apparent distance restriction may reflect structural and spatial constraints on the distal effects of the bound ER on promoters.
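The distance statistics discussed above (sites within 5 kb of a TSS, sites within 100 kb of a transcript boundary) amount to a nearest-neighbor lookup on sorted coordinates. The following is a minimal sketch under simplifying assumptions: a single chromosome, point coordinates, and a non-empty list of boundaries. The function names and the toy coordinates are hypothetical and are not data from this study.

```python
from bisect import bisect_left

def distance_to_nearest(pos, sorted_coords):
    """Distance from pos to the closest value in a sorted, non-empty list."""
    i = bisect_left(sorted_coords, pos)
    candidates = []
    if i < len(sorted_coords):
        candidates.append(sorted_coords[i] - pos)      # nearest boundary at or after pos
    if i > 0:
        candidates.append(pos - sorted_coords[i - 1])  # nearest boundary before pos
    return min(candidates)

def fraction_within(sites, boundaries, window):
    """Fraction of sites lying within `window` bp of any boundary."""
    coords = sorted(boundaries)
    near = sum(1 for s in sites if distance_to_nearest(s, coords) <= window)
    return near / len(sites)

# Hypothetical toy coordinates on one chromosome (not real data).
toy_boundaries = [1_000_000, 2_500_000]
toy_sites = [1_004_000, 1_090_000, 1_800_000, 2_450_000]

within_5kb = fraction_within(toy_sites, toy_boundaries, 5_000)
within_100kb = fraction_within(toy_sites, toy_boundaries, 100_000)
```

A real analysis would group coordinates by chromosome and treat transcripts as intervals rather than points; the sketch only shows the core lookup.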
Equally intriguing is the possibility that ER-mediated gene repression may use mechanisms very different from those of gene induction, and that genomic topography (i.e., binding site location and affinity) may have a significant role. Consistent with this is our ChIP-qPCR quantification of ChIP-PET-identified ER binding sites, in which the binding sites of genes repressed by ER uniformly had the lowest fold induction after E2 exposure (~1–25), as compared to binding sites adjacent to induced genes (~25–473), and were less likely to harbor a full ERE-like motif. When all sites are taken into account and measured by the number of overlapping PETs (moPETs), up-regulated genes have significantly higher moPET counts than down-regulated genes (p
4, unpublished data). Moreover, in reporter assays performed with the 11 candidate ER binding sites, the only three that did not induce transcription off a TATA promoter were sites associated with repressed genes. These observations are consistent with previous findings that deviations from the full ERE motif reduce ER binding affinity and that binding site dynamics may differ between genes that are induced by ER and those that are repressed by ER [11]. The large number of bona fide, nonproximal ER binding sites reported here are ideal candidates for further characterization of these distinct mechanisms.
It is known that ER can regulate gene expression not by direct DNA binding but through association with an intermediary transcription factor such as AP-1. Theoretically, this mechanism of ER transcriptional regulation does not require ER binding to an ERE. Our motif searches in these non-ERE sites revealed that the predominant motifs in the pure tethered bin are those for the forkhead transcription factors and SRY, with MAF reaching borderline statistical significance (p = 0.056). MAF proteins recognize sequences related to the AP-1 target site and are considered part of the larger AP-1 family of transcription factors; therefore, our results suggest that AP-1 and MAF can bind to these sites [36]. The interesting observation is that in the absence of at least an ERE half site, the fold enrichment of ER binding in these sites is lower (a median of 51-fold enrichment of binding as compared to 81-fold for ERE; Student's t-test, p = 0.027). Moreover, our analysis of AP-1 sites within EREs shows perfect orientation, with one half site resembling AP-1 recognition sequences and the second (cognate) half site primarily an ERE recognition sequence. These “hybrid” sites show higher levels of ER binding. This suggests that AP-1–associated tethering may favor sites with ERE half-site “anchors.” Indeed, previous analysis of the ERE half site associated with the AP-1 site found in the progesterone receptor promoter showed that the integrity of the ERE half site is required for ER and AP-1 binding and estrogen-responsive promoter activity [37].
These genome-wide approaches to nuclear hormone receptor binding sites are revealing in that the large number of validated binding sites provides statistical power in assessing underlying motif structure in the binding sites. The results of our motif search analysis also point to the potential involvement of a number of other transcription factors in ER transcriptional regulatory activity. Included in the list of putative transcriptional coregulators is FOXA1, which has been previously shown to be required for ER functions [15]. However, the fact that 46 other factors are enriched in the ER binding sites with the same probability as the proven interactors FOXA1, AP-1, and Sp1 suggests the potential for highly complex interactions. Of course, not all cis-partner transcription factors will be expressed in every cell type. Although it is highly improbable that every co-occurrence predicts binding by both factors, our analysis of Sp1 action on ten estrogen-responsive genes with adjacent ChIP-PET ER binding sites and predicted Sp1 binding sequences showed down-regulation of all ten. Moreover, we have validated the effect of adjacent GATA3 and BACH interactions in ER binding to EREs (J. Thomsen and E.T. Liu, unpublished data). This suggests that our algorithms to predict adjacent transcription-factor binding are potentially highly accurate.
Perhaps even more interesting is the systematic order of these potential partner transcription factors relative to the position of the central ERE in bona fide ER binding sites. Consistent with the model in which AP-1 binding appears “anchored” by an adjacent ERE, AP-1 is distributed in a nonrandom manner within a 500-bp window of an ER binding site. In this distribution, a number of full EREs are actually composite binding elements with an AP-1 site posing as an ERE half site. These composite EREs are seen with SF-1, MAF/BACH, AP-1, and PAX2 and PAX3. All these factors have recognition sequences that overlap with (but are distinct from) the ERE half site. Unexpectedly, highly skewed positioning was found with the SF-1 and PAX3 recognition sequences, where a large proportion of these response elements are positioned as the second ERE half site within bona fide ER binding sites. Although such overlap may reflect inherent similarity between the computational models of ER binding sites and those of other factor binding sites, in the case of SF-1 it has been previously observed that SF-1 response elements can also bind ERα, but not ERβ [38]. Interestingly, SF-1 knock-out mice exhibited ovarian abnormalities and sterility resembling tissues from ER and aromatase knock-out animals, further suggesting an interaction between SF-1 and the ER-estrogen axis [39]. Thus, such composite sites are potential points of exchange for transcription factors, possibly switching between homodimer and heterodimer states of occupancy, and represent a potential mechanism to augment heterogeneous response to estrogen exposure.
We have previously reported very little conservation of ERE motifs within promoter regions of human and mouse genes even though conserved and nonconserved sites both bind ER [6]. In the promoter regions of putative direct target genes, approximately 6% of predicted EREs were conserved in the mouse. In contrast, Carroll and colleagues reported conservation in sequences flanking ER binding sites they experimentally mapped to human Chromosomes 21 and 22 and in their whole-genome study [15]. To reconcile these apparent differences, we examined the 1,234 ChIP-PET ER binding sites and determined conservation in both flanking sequences and detected ERE motifs. Using analytical approaches similar to those used by Carroll et al., we also find evidence of conservation within the 500-bp windows around the discovered binding sites. However, a more in-depth analysis showed that the conservation signal observed was driven by only 22% of all sites tested. There was limited conservation regardless of whether local sequence similarity or the presence of ERE motifs was used as the metric for conservation. Thus, the conservation observed by Carroll et al. is likely due to a small number of highly conserved sequences and does not represent global conservation of binding sites [15]. We have noted that conservation may be underestimated due to alignment errors in the comparative analysis of whole-genome sequence data [25], but these errors will not fully explain the large number of nonconserved sites. The list of genes with conserved ER binding sites does not appear to have functional coherence (unpublished data). Genes classically thought of as prototypes of ER responsiveness, such as pS2/TFF1 and the progesterone receptor, have bona fide ER binding sites in the human MCF-7 cell line that are not conserved by sequence or motif presence across mammalian species. Moreover, both conserved and nonconserved sites are associated with ER-regulated genes. A total of 287 of all 1,234 binding sites (23.3%) are associated with ER-regulated genes, while 63 of the 273 conserved binding sites (23.1%) are associated with regulated genes (not significantly different).
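The comparison of conserved and nonconserved sites above can be checked with simple arithmetic. The sketch below uses a pooled two-proportion z-test, which is an assumption chosen for illustration; the text does not state which test was used. It also assumes that the 273 conserved sites are a subset of the 1,234 sites, so the nonconserved group is obtained by subtraction.

```python
from math import erfc, sqrt

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    return z, erfc(abs(z) / sqrt(2))

# Counts reported in the text.
all_hits, all_n = 287, 1234
conserved_hits, conserved_n = 63, 273

# Derived by subtraction (assumes conserved sites are a subset of all sites).
other_hits, other_n = all_hits - conserved_hits, all_n - conserved_n

z, p_value = two_proportion_z(conserved_hits, conserved_n, other_hits, other_n)
```

Under these assumptions the two proportions are nearly identical (about 23.1% of conserved sites versus about 23.3% of the remaining sites), and the test statistic is close to zero, in line with the statement that the difference is not significant.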
The limited conservation of ER binding sites does not imply that the genes important in ER function are not regulated by ER in other species, but rather that the precise DNA targets may differ. Given that an occupied ERE can potentially regulate its associated promoter over a distance of up to 100 kb, there is much flexibility in the placement of ER regulatory elements. Nevertheless, these observations indicate that there are likely species-specific differences in the components and the dynamics of estrogen action and that results from animal studies need to be interpreted with this caveat in mind.
In summary, our work provides a new cartography of ER binding on a genome-wide scale. The collective configuration of these binding sites has revealed fundamental rules that describe the characteristics of a bona fide ER recognition motif. The dominance of the ERE, the distributed nature of the binding sites distant to their associated genes, the separate nature of up- versus down-regulated genes, the importance of adjacent binding motifs of other transcription factors, and the frequency of composite ER response elements are all findings that would have been difficult to assess on a gene-by-gene basis. Data from this work will provide the experimental targets that will further dissect the intricacies of ER transcriptional regulation.