|Home | About | Journals | Submit | Contact Us | Français|
Perturbation and time-course data sets, in combination with computational approaches, can be used to infer transcriptional regulatory networks which ultimately govern the developmental pathways and responses of cells. Here, we individually knocked down the four transcription factors PU.1, IRF8, MYB and SP1 in the human monocyte leukemia THP-1 cell line and profiled the genome-wide transcriptional response of individual transcription starting sites using deep sequencing based Cap Analysis of Gene Expression. From the proximal promoter regions of the responding transcription starting sites, we derived de novo binding-site motifs, characterized their biological function and constructed a network. We found a previously described composite motif for PU.1 and IRF8 that explains the overlapping set of transcriptional responses upon knockdown of either factor.
The human genome project (1) and the subsequent annotation efforts (2,3) provided us a catalog of genes present in our genome. These efforts quickly gave rise to system approaches aiming at understanding the interactions between genes that ultimately govern phenotype and disease pathology (4). The complex interactions among transcription factors derived from such networks point to diverse regulatory programs responsible for cell differentiation during development and cellular responses to outside stimuli.
A powerful technique to understand gene regulatory networks is the perturbation of individual transcription factors in concert with high-throughput expression profiling of all genes (5). Commonly, microarrays are used to measure the changes in gene expression (6–8). In addition to defining regulatory interactions, transcription factor binding site (TFBS) motifs can be extracted from promoter regions of affected genes. Searching the genome sequence in silico with such motifs can reveal putative downstream targets of the transcription factors. However, these predictions are fraught with difficulties summarized by the ‘futility theorem’ (9). In brief, most predicted binding sites will have no functional role in general and, despite binding in vitro, may not be functional in the cellular model studied or may only be functional in presence of additional factors (co-regulation). Therefore, it is desirable to couple computational approaches with experimental techniques to identify actively used TFBS.
Chromatin immunoprecipitation (ChIP) in conjunction with tiling microarrays or sequencing is able to tell us the possible binding sites of transcription factors. To be able to perform experiments for specific transcription factors, however, specific antibodies are needed whose production is both difficult and, for many of the transcription factors, not yet available (10). Additional specific experimental optimizations are required.
Here, we describe the use of deep sequencing based Cap Analysis of Gene Expression (deepCAGE) (11) to study the effects of transcription factor (TF) perturbations on target gene expression at the promoter level. Previously, deepCAGE was used to accurately define and compare the transcriptional start sites (TSS) of genes in various tissues (7), determine the distance of the TATA-box from the TSS (12), as well as during cell differentiation (3). Restricting TFBS analysis to the accurately mapped TSSs discards many false-positive predictions in intergenic regions and thus improves the accuracy of transcriptional regulatory networks (3). In contrast to previous approaches, this allows for the construction of transcriptional regulatory gene networks at the resolution of individual promoters.
In this study, we combined our deepCAGE (3,13) technology with knockdown (KD) perturbation experiments of four key transcription factors (PU.1, IRF8, MYB and SP1) expressed in the human monoblastic leukemia cell line THP-1 (14). Previously, we demonstrated by using siRNA-mediated gene knockdown and microarray profiling that these four factors regulate large numbers of genes important to monocyte biology. In particular, MYB knockdown promotes monocytic differentiation of THP-1 cells, indicating a central role in maintaining the undifferentiated monoblast state (3).
DeepCAGE profiles were generated for each of the samples and compared to cells treated with a scrambled negative control oligo. This approach allowed us to identify the most strongly affected TSSs for each TF knockdown and their corresponding promoter regions. We then attempted to derive de novo TFBS motifs from the promoter regions and compared our results to the known binding-site models in the TRANSFAC database. Finally, these data were used to draw a basic regulatory network based on the direct regulatory interactions we identified.
We used RNA extracted from the same knockdown human leukemia THP-1 cell batches used in the recent FANTOM4 project (3,8). In brief, transfection was performed using stealth siRNA (Invitrogen) and RNA was harvested after 48h. TF gene-expression levels in THP-1 cells treated with gene-specific siRNAs (SP1, PU.1, IRF8 and MYB) or the calibrator negative control (NC) siRNA were estimated by qRT-PCR in triplicate [see Supplementary material of Suzuki et al. (3)].
deepCAGE libraries were prepared for the five knockdown experiments according to the deepCAGE protocol (3,13) and sequenced using the Roche 454 sequencer. In total, 6187981 deepCAGE tags were mapped to the human reference genome sequence (hg18) using Nexalign (Lassmann,T., http://genome.gsc.riken.jp/osc/english/dataresource/) allowing up to one mismatch or one indel. Tags with TSS falling into windows of 20bp were grouped into 396118 tag clusters (TCs). For all further analyses, we focused on a filtered set of 3332 robustly detected TCs with a minimum average deepCAGE expression across the five (four KD and control) libraries of 30 tags per million (TPM).
For comparing the perturbation of deepCAGE expression profiles with microarray expression, we first mapped the 3332 robustly detected TCs to Entrez gene models, requiring that the tags originated within the boundaries of known transcripts for the locus or up to 1kb upstream. The 3332 TCs mapped to 3114 Entrez genes using this approach, with 84 genes possessing more than one robustly detected TC. Fold change for the deepCAGE data was then calculated by dividing the gene expression in TF KD by the expression in the negative control experiment. Microarray probe mapping to Entrez gene and expression fold changes were obtained as described in Suzuki et al. (3). This then allowed direct comparison of fold changes measured by deepCAGE with the corresponding measurement by microarray.
Proximal promoter regions of TSSs were defined as previously described (3) and include 300bp upstream and 100bp downstream of the deepCAGE-defined TSS. We extracted the corresponding active deepCAGE promoter regions from the human genome (hg18) and applied the motif-finding program MEME (15). We applied MEME to regions which are at least 1.5-fold up- or downregulated in both microarray and deepCAGE measurement. The selection was further restricted to the top 50 of such regions based on recommendations found in Bailey et al. (15). We hypothesize that this selection enriches for promoters that are direct targets of the transcription factor. In the case of IRF8, SP1 and PU.1, fewer than 50 TCs were upregulated by at least 1.5-fold (20, 22 and 38, respectively); therefore, smaller training sets were used for these classes.
MEME can report multiple motifs for each set of the proximal promoter regions. In such cases, we only selected the motif with the most significant E-value for further analysis. We did not attempt to merge similar motifs.
To assess whether the obtained motifs are biologically relevant, we searched the remaining TCs (3332 TCs, excluding the training sets) using the program Fimo from the Meta-MEME package (16). For comparison, we used the TRANSFAC database and the accompanying Match program (17,18) to scan our sequences for the presence of TRANSFAC defined motifs. Furthermore, we overlaid our TCs with previously published ChIP-chip data (3) for PU.1 and SP1 (detailed Methods available in the Supplementary Data).
We used UCSC browser Vertebrate Multiz Alignment & PhastCons trac to look for conservation of our motifs. A base position in the motif was deemed to be conserved if the conservation was at least 80%.
DNA Data Bank of Japan (DDBJ) Read Archive: DRX000341 (CAGE library I05).
To evaluate deepCAGE as a platform for measuring gene-expression perturbation, we used the same batches of RNAs for both TF suppression and negative control samples as were used in the microarray analysis for the FANTOM4 main paper (3). For these samples, the efficient knockdown was already confirmed by qRT-PCR and western blotting. We observed an overall positive correlation for all four TF knockdown samples across both platforms (Figure 1). In general, deepCAGE fold changes were greater than those measured by microarrays, as has been previously noted (19).
Knockdown of SP1, IRF8, PU.1 and MYB led to induction of 267, 347, 189 and 307 genes and repression of 428, 527, 260 and 1160 genes by 1.5-fold up- or downregulation, respectively. Eight sets of proximal promoter regions were extracted corresponding to the top 50 most upregulated and most downregulated TCs for each knockdown experiment (see ‘Methods’ section). The de novo motif-finding algorithm MEME (15) was used to identify motifs enriched in the perturbed promoters. We identified motifs for all four downregulated promoter sets and also identified motifs in the promoters of the upregulated sets for MYB and PU.1.
Enrichment in the upregulated set of promoters suggests the TF works as a repressor,whereas enrichment in the downregulated set of promoters suggests the TF works as an activator. As an example, we find that knockdown of IRF8, a known activator (20), results in downregulation in both the deepCAGE and microarray experiments of XAF1, a gene which we predict to contain our novel motif (Figure 2). The observation that MYB knockdown yielded motifs for both up- and downregulated sets is consistent with its known role as both a transcriptional activator and repressor (21). Despite this, the motifs found in either set appear to be different, which may suggest different modes or different co-factors for binding repressive and activating sites.
To assess whether our de novo motifs identify functional sites, we examined the expression fold-changes of TCs containing the predicted motifs compared to all other TCs. The TCs used to derive the motifs in the first place were excluded. Instead of using CAGE data based on a single experiment, we used microarray expression data based on three biological replicas from the same RNA batch since we deemed it to be more reliable (Figure 3). However, when using CAGE expression data instead, there are no discernible differences (Supplementary Figure S1).
Of the motifs tested, those for MYB up, MYB down, PU.1 up and SP1 down data sets (Supplementary Figures S2 and S3) did not show significant expression differences between the sequences containing the motifs and those that do not contain the motifs. Hence, we did not further analyze these motifs.
However, promoters containing PU.1 down or IRF8 down motifs were expressed at significantly lower levels than promoters lacking the motif. Moreover, when the same test was carried out using the published TRANSFAC (17,18) motifs for PU.1 and IRF8, or using ChIP data for PU.1 to identify PU.1-bound promoters, neither outperformed the novel motif (Figure 3). Furthermore, comparison to UCSC’s vertebrate-conservation track revealed that 32.8 and 35.5% of the novel PU.1 and IRF8 base positions, respectively, are strictly conserved, while 11 out of 47 and 7 out of 20 PU.1 and IRF8 motifs are completely conserved. This compares with 3–8% average overall conservation and 11–24% conservation in coding regions.
In a parallel effort, we used the program CLOVER (22) to detect enriched motifs in the top 50 downregulated IRF8 and PU.1 CAGE clusters. As expected, we found enrichment for the corresponding known motifs in both data sets (for details see Supplementary Data and Figures S3 and S4 and Tables S2 and S3). However, the enriched motifs are only weakly overrepresented when considering all downregulated clusters. Therefore, the de novo derived motifs describe the transcriptional response to TF knockdown better than using known motifs or the present ChIP-chip data.
The motifs obtained for PU.1 and IRF8 were longer than the corresponding motifs in the TRANSFAC database (Figure 4a). Manual alignment of our matrices to each other and to the TRANSFAC motifs revealed that both of our motifs contain regions similar to the TRANSFAC PU.1 and the IRF8 motifs. Furthermore, we observed 44 promoters that were downregulated in both IRF8 and PU.1 knockdown (Supplementary Figure S5 and Table S4). Our IRF8 motif contains three triple-T (TTT) regions. To understand their significance, we truncated our IRF8 motif by removing the triple-T sub-motif from either end. The expression differences in the test set became less pronounced (Figure 4b), indicating that all three triple-Ts are important for the specificity. Similar examples of combinatorial regulation were previously described for IRF8 and other IRF family members and for the PU.1 transcription factor (20,23).
Above we have demonstrated that KD followed by deepCAGE expression profiling (KD-CAGE) can be effectively used to identify promoters regulated by a given transcription factor. Moreover, highly downregulated promoters in the PU.1 and IRF8 KDs were shown to contain PU.1 and IRF8 motifs indicating they are direct targets of these factors. The approach can thus be used to directly generate a transcriptional network model (24). For illustration purposes, we generated a small sub-network based on genes co-perturbed by the knockdown of at least two of the four factors (Figure 5). Edges upregulated upon knockdown are shown in red and those downregulated are shown in blue. Genes co-regulated by PU.1 and IRF8 were predominantly co-downregulated upon knockdown. Interestingly, there is an antagonistic relationship for genes co-regulated by PU.1 and MYB, with the majority downregulated upon PU.1 KD but upregulated upon MYB KD. The network predicts 47 genes as targets of our novel PU.1 motif. Eight of these (CD74, HCLS1, NRGN, TNFSF13B, IFI6, MLC1, MARCH3 and CHI3L1) are supported by ChIP signal for PU.1 (Supplementary Table S1). Most of these are known to be important in hematopoietic lineages and IFI6 is known to be an interferon-inducible gene. CHI3L1 has been previously reported as a PU.1 target (25). However, this is the first report that TNFSF13B, a myeloid-associated marker gene, is regulated by both PU.1 and IRF8.
These directed edges reflect the regulation of individual TSSs rather than responses at the gene level and represents a powerful new approach to building alternative promoter-aware networks in the near future.
We have demonstrated for the first time that deepCAGE technology is a feasible alternative to microarrays for measuring RNAi-mediated perturbations and generating perturbation networks. As the technique is a direct measure of promoter expression, it allows focusing on the actual promoters used in a given cellular context, rather than ambiguous mapping of microarray expression to the 5′-ends of known transcripts. Furthermore, we have shown that our approach can be used to de novo identify regulatory motifs with a clear demonstration of functional motifs for PU.1 and IRF8 with similarity to the published TRANSFAC motifs. The motifs described by us perform better at describing the response to the KD than TRANSFAC and ChIP-chip data.
In the case of PU.1 and IRF8, many of the same promoters responded to either knockdown and a longer composite motif was identified. While the known IRF8 TFBS contains two copies of a triple-T motif, ours contains three copies. This longer motif, however, is functionally relevant as truncating the motif by removing the first or third triple-T reduced our ability to explain the transcriptional response to IRF8 knockdown. These observations are supported by the previously reported cooperative binding of both factors (20,23). As the significant motifs were identified in the promoters of downregulated genes, we conclude that PU.1 and IRF8 in combination act primarily as activators as previously reported (22), while the motifs observed for MYB suggest it can act both as a repressor or an activator (Supplementary Figure 2A and B).
This pilot experiment paves the way for building regulatory networks and identifying regulatory motifs for the majority of transcription factors. Genome-wide ChIP of TFs is an alternative approach to identify transcriptional regulatory regions (26), which is extensively being used in the ENCODE project (4). However, to date only 160 ChIP grade antibodies are available for the estimated 882 DNA-binding transcription factors in mammals (27). KD-CAGE is not restricted by such reagents, and in the light of constantly reducing costs of DNA sequencing (28) it is possible to test a large collection of all DNA-binding proteins to characterize their function. In addition to the 330 regulatory interactions, we reported in our four knockdown experiments (Supplementary Table S1), only 3 were supported by current ChIP-chip experiments. This highlights that there are sites where the TF is bound but is functionally inactive, as noted by Wasserman and Sandelin (9). However, in spite of this, a combined approach would potentially be a very powerful method to discriminate indirect targets from direct targets bound by factors at both proximal and distal sites including enhancers and insulators.
Finally, we have previously described the application of motif activity response analysis (MARA) in a developmental time course to predict the regulation by TFs on individual promoters (3). However, this approach depends on known TFBS motifs. The approach described here can be used to identify TFBS motifs de novo. In the future, we will aim to extend the set of known motifs using this approach and extend our network analyses to encompass the function and targets of uncharacterized DNA-binding proteins and to provide a network of interactions among such proteins.
Supplementary Data are available at NAR Online.
Research Grant for RIKEN Omics Science Center from Ministry of Education, Culture, Sports, Science and Technology (MEXT) (to Y.H.); International Program Associate stipend from RIKEN (to M.V.). Funding for open access charge: Research Grant for RIKEN Omics Science Center from Ministry of Education, Culture, Sports, Science and Technology (MEXT) (to Y.H.).
Conflict of interest statement. None declared.
Mr Akira Hasegawa assisted in the alignment of the deepCAGE tags. P.C. developed the deepCAGE technology. T.L. conceived perturbation deepCAGE. M.V. and T.L. designed the experiments and carried out the motif analyses and network building. ARRF carried out the microarray analysis and Entrez gene mapping. Y.T. and M.S. carried out the knockdowns. T.L., M.V., A.R.R.F. and C.D. wrote the manuscript. H.S., Y.H. and C.D. advised on the experimental design. All authors read and approved the final manuscript.