Our study aimed to identify the binding sites of HLH-1 genome-wide during embryogenesis while addressing two common problems associated with transcription factor ChIP assays: limited temporal-spatial patterns of expression and a lack of suitable reagents for immunoprecipitation. We addressed these problems using two completely different methodologies. In one approach, we used antibodies directed against HLH-1 in combination with heat shock-induced overexpression of HLH-1 in transgenic animals, with ChIP probes hybridized to a genome tiling array (ChIP-chip). As an alternative, we used an epitope-tagged (GFP) HLH-1 transgene that was immunoprecipitated with anti-GFP antibodies, and the immunoprecipitated DNA was analyzed by next generation sequencing (ChIP-seq). Applying these alternative approaches to the transcription factor HLH-1 in C. elegans embryos revealed similar genome-wide profiles for this master myogenic regulator. Although epitope-tagging has been used previously as a surrogate to interrogate binding site distributions in multiple systems, the use of heat shock-induced overexpression in vivo for such studies is novel. Despite concerns that overexpression might result in a high rate of false positive binding sites, our data suggest that this is not the case in C. elegans. Our results expand the repertoire of experimental approaches for ChIP studies to include options readily engineered in most biological systems.
This study provides valuable insights into HLH-1 biology in C. elegans. Both ChIP datasets demonstrate that HLH-1 can be detected just upstream of the TSS of thousands of potential target genes, that an E-box (CANNTG) is the dominant over-represented motif in these bound sequence intervals, and that the genes associated with binding sites are enriched in those known to be expressed in bodywall muscle. The widespread localization of HLH-1 predominantly to promoter regions is consistent with the expectation that HLH-1 acts directly to regulate target gene expression. HLH-1 activity is likely mediated through direct binding to E-box sequences in the promoters of target genes. The symmetry of the top returned E-box motifs is suggestive of homodimer binding, consistent with the fact that C. elegans bodywall myogenesis occurs in the absence of the E protein binding partner for the MRFs typically found in other biological systems [17]. The strong association of HLH-1 binding sites with regions upstream of genes known to be expressed in bodywall muscle cells, including many structural genes, suggests that HLH-1 acts as a transcriptional activator of these genes. Thus, our data support a model in which HLH-1 homodimers direct bodywall myogenesis by directly binding to, and activating, many genes required for the development and function of muscle.
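As an illustration of what an E-box match and its symmetry mean in practice, the following minimal Python sketch scans a sequence for the degenerate consensus CANNTG and reports whether each hit is a perfect palindrome (identical to its own reverse complement). This is not the motif-discovery pipeline used in this study; the function names and the example sequence are hypothetical. Because the CANNTG pattern is itself reverse-complement symmetric, scanning a single strand is sufficient to find every site.

```python
import re

# Degenerate E-box consensus CANNTG (N = any base). The lookahead lets
# overlapping candidate sites be reported as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_eboxes(seq):
    """Return (0-based start, site) for every CANNTG match in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


def is_palindromic(site):
    """True if the site equals its reverse complement (e.g. CAGCTG)."""
    return site == reverse_complement(site)


# Hypothetical example sequence, not taken from the ChIP data.
hits = find_eboxes("ttCAGCTGaaCACGTGccCATTTG")
for start, site in hits:
    print(start, site, is_palindromic(site))
```

In this toy input, CAGCTG and CACGTG are palindromic, the kind of symmetric site that could in principle be contacted equivalently by both halves of a homodimer, whereas CATTTG matches the consensus but is not symmetric.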
Several aspects of our results were also surprising. We did not anticipate the large number of HLH-1 bound intervals and associated genes, which in the ChIP-seq data amounted to as many as half of all protein coding genes in C. elegans using default threshold values. Equally important was the lack of evidence for widespread, ectopic binding of HLH-1 after overexpression in the ChIP-chip data. We found that only a small fraction (~3.5%) of all possible E-boxes genome-wide were identified by either our ChIP-chip or ChIP-seq (10^-6) approaches, arguing for a high degree of selectivity for HLH-1 binding. A comparison of the HLH-1 ChIP data to that recently published for another transcription factor, PHA-4, also suggests highly selective binding (1). Although the peak overlaps between the HLH-1 and PHA-4 embryonic datasets were slightly above random expectations, the associated gene overlap was at, or far below, random expectations for all pairwise comparisons. HLH-1 and PHA-4 target genes would not be expected to overlap in embryos, as these two transcription factors are expressed in mutually exclusive tissues at this stage of development. Further studies are required to determine whether the binding site specificity we observed is an intrinsic property of HLH-1 or reflects additional constraints, including required cooperativity with other factors or limited accessibility of potential binding sites in the context of general or tissue-specific chromatin organization.
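To make the notion of "random expectation" for a gene-list overlap concrete, the sketch below computes the overlap expected by chance between two gene sets drawn from a common background, together with a one-sided hypergeometric tail probability. This is one common way to frame such a comparison and is an assumption on our part, not necessarily the statistic used for the HLH-1 and PHA-4 comparison; the function names and the numbers in the example are hypothetical.

```python
from math import comb


def expected_overlap(n_total, n_a, n_b):
    """Mean overlap if n_b genes are drawn at random from n_total genes,
    n_a of which belong to the first set."""
    return n_a * n_b / n_total


def overlap_pvalue(n_total, n_a, n_b, k):
    """One-sided hypergeometric P(overlap >= k) under random sampling."""
    upper = min(n_a, n_b)
    denom = comb(n_total, n_b)
    numer = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


# Toy numbers for illustration only (not values from this study):
# 10 genes in the background, sets of 4 and 3 genes, 2 genes shared.
print(expected_overlap(10, 4, 3))
print(overlap_pvalue(10, 4, 3, 2))
```

An observed overlap close to the value returned by `expected_overlap` is what would be described as "at random expectation"; an overlap well below it indicates that the two sets share fewer genes than chance alone would predict.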
We were also surprised to find that neither of the ChIP methods identified a majority of the genes defined previously by various groups cataloging the bodywall muscle transcriptome. The best observed overlap (45%) was between the ChIP-seq (10^-6) dataset and the 297-gene WormAtlas bodywall muscle expression list. The inability to identify all bodywall muscle genes by ChIP of mixed stage embryos is not completely unexpected, because some bodywall muscle genes may be activated late in embryogenesis or post-embryonically. However, the discordance between the ChIP data and bodywall muscle transcriptome profiles also highlights the inherent limitations of experimental manipulation, of the current technologies employed for gene expression and ChIP studies, and of the bioinformatic treatment of the resulting data. At the level of gene expression, each of the several attempts to define the bodywall muscle transcriptome in C. elegans has generated a gene list, yet there is limited consensus among these lists. These differences arise from many factors, including methodological efficiencies, life stage variability, and the use of various experimental platforms to assay expression. At the ChIP level, our results similarly highlight the variability associated with experimental and platform differences. For example, our ability to validate ChIP-seq peaks not detected by ChIP-chip, using the ChIP-chip reagents and methods, demonstrates that our ChIP-chip data under-represent the HLH-1 binding sites. One likely explanation for this under-representation is the limited dynamic range of chip hybridizations, which makes the discrimination of signal from background more difficult than with the more quantitative next generation sequencing approach. In addition, our ChIP-chip peak calls required consensus among three biological replicates, a stringency that may have eliminated true positives.
The identification and validation of ChIP-chip HLH-1 binding sites that were not detected by ChIP-seq are more difficult to explain. Clearly, no single approach captures a clear and complete view of transcriptional systems, suggesting that a combination of approaches is likely needed to more fully understand the relationship between transcription factor DNA binding and gene expression.
Our inability to validate many HLH-1 bound sequences with in vivo assays for enhancer activity, even in the presence of heat shock-induced overexpression of HLH-1, was also unexpected. Although such reporter assays are fraught with caveats, they can frequently be used successfully to demonstrate enhancer function. It seems likely that many sites that bind HLH-1 alone in vivo may not be associated with transcriptional activity. Such sites may never be transcriptionally active, may require additional factors to cooperate for activation through associated cis-acting sequences, or may be sites of transcriptional repression. Regardless, the large number of binding sites makes the correlation of ChIP data with bodywall muscle transcriptomes an imperfect science: the datasets clearly overlap, but at levels that are insufficient to describe the network with any confidence.
A recent study of the genome-wide binding sites of MyoD, the mammalian homolog of HLH-1, reported results very comparable to ours [27]. That study suggested that mouse MyoD binds approximately 60,000 sites on the autosomes, is associated with 41–74% of all protein coding genes, and is positively correlated with genes expressed during myogenesis, and that MyoD binding sites often fail in enhancer activity assays in tissue culture. The remarkable similarity between these observations for MyoD in the mouse and our results for HLH-1 in C. elegans suggests evolutionary conservation in the roles of this master regulatory transcription factor, both in activation of target muscle genes and in as yet undetermined functions genome-wide.
Our results suggest that, as in mammals, many of the binding sites for HLH-1 in C. elegans may be functionally inert or may play a role not directly linked to transcriptional activation, such as modifying or maintaining the chromosomal architecture necessary for a committed muscle cell fate. These alternate functions may underlie the non-random distribution of HLH-1 binding sites on the chromosomes, including a vast over-representation of the ChIP overlapping set on the X chromosome. The high number of genome-wide binding sites for HLH-1/MyoD presents an unanticipated challenge for the field in interpreting the increasing volumes of transcription factor ChIP data becoming available. An unexpectedly large number of binding sites has also been observed for several transcription factors in Drosophila, suggesting this is a general feature of these factors [28], [29], [30]. Clearly, a full understanding of transcription factor function, and of how well binding predicts transcriptional activation or repression, will require combining profiles for multiple factors and chromatin modifications.