|Home | About | Journals | Submit | Contact Us | Français|
Users may view, print, copy, download and text and data- mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
The ability of a transcription factor to regulate its targets is modulated by a variety of genetic and epigenetic mechanisms, resulting in highly context-dependent regulatory networks. However, high-throughput methods for the identification of proteins that affect transcription factor activity are still largely unavailable. Here we introduce a systems biology framework, MINDy (Modulator Inference by Network Dynamics), for the genome-wide identification of post-translational modulators of transcription factor activity within a specific cellular context. When used to dissect the regulation of MYC activity in human B lymphocytes, the approach inferred novel modulators of MYC function, which act by distinct mechanisms, including protein turn-over, transcriptional complex formation, and selective enzyme recruitment. MINDy is generally applicable to study the post-translational modulation of mammalian transcription factors in any cellular context. As such it provides a useful resource to dissect context-specific signaling pathways and combinatorial transcriptional regulation.
Reverse engineering of cellular networks in prokaryotes and lower eukaryotes1–3, as well as in mammals4–6, has started to unravel the remarkable complexity of transcriptional programs. These programs, however, may change substantially as a function of the availability of proteins affecting their post-translational modification, such as phosphorylation, acetylation and ubiquitination enzymes7, as well as of those participating in transcription complexes (co-factors), thus making cellular networks highly context dependent.
Although the large-scale reprogramming of the cell’s transcriptional logic was studied in yeast8, 9, identification of a repertoire of genes that effect these events remains elusive. Indeed, compared to tools such as ChIP-on-Chip or reverse engineering algorithms for the analysis of transcriptional networks4, 10, only one experimentally validated algorithm exists for the dissection of signaling networks in a mammalian context11, which inferred substrates of 73 kinases. Here we propose and experimentally validate MINDy (Modulator Inference by Network Dynamics), a gene expression profile based method for the systematic identification of genes that modulate a transcription factor’s (TF) transcriptional program at the post-translational level, i.e. those encoding proteins that affect the TF’s activity without changing its mRNA abundance. These proteins may post-translationally modify the TF (e.g., kinases), affect its cellular localization or turnover, be its cognate partners in transcriptional complexes, or compete for its DNA binding sites. They may also include proteins that do not physically interact with the TF, such as those in its upstream signaling pathways.
MINDy interrogates a large gene expression profile dataset to identify candidate modulator genes whose expression strongly correlates with changes in a TF’s transcriptional activity. As shown in Supplementary Information (SI) Section 1.1, this can be efficiently accomplished by computing an information theoretic measure known as the Conditional Mutual Information (CMI), I[T F; t| M] 12, between the expression profile of a transcription factor (TF) and one of its putative targets (t), given the expression of a modulator gene (M). Accurate estimation of the CMI requires exceedingly large datasets. Thus MINDy infers candidate modulators using a related, yet simpler estimator, which we denote as ΔI (see Methods). Briefly, the estimator assesses the statistical significance of the difference in Mutual Information (MI) between the TF and a target in two subsets, each including 35% of samples, in which the modulator is least and most expressed, respectively. The 35% parameter was determined empirically as the one optimizing the identification of proteins in the B-cell receptor signaling pathway as modulators of the MYC TF (see Methods).
A schematic representation of the MINDy algorithm is provided in Figure 1A. MINDy takes four inputs: a gene expression profile dataset, a TF of interest, a list of potential modulator genes (M1, M2,…) and a list of potential TF targets (t1,t2,…). The ΔI estimator requires that the expression of the modulator and of the TF be statistically independent (independence constraint), and that the modulator expression have sufficient range (range constraint). Appropriate statistical tests for these constraints are discussed in the Methods. Candidate modulators may include all genes satisfying these constraints (unbiased analysis) or may be filtered by additional criteria, e.g., their molecular functions. Each possible triplet (TF, M, t) is then independently tested using the ΔI estimator. False positives are controlled using appropriate statistical thresholds (see Methods).
A positive or negative Mode of Action is determined, depending on whether the TF-target MI increases or decreases as a function of the modulator abundance (Figure 1A). The Mode of Action, however, is not necessarily equivalent to the modulator’s Biological Activity as a TF activator or antagonist. For instance, a modulator may be such a strong TF-activator that the TF-target kinetics becomes saturated even when the TF is slightly activated. In that case, TF-target MI may decrease as a function of the modulator. Details on the CMI analysis and on how to assess both Mode of Action and Biological Activity of a modulator are provided in the Methods section.
For illustrative purposes, we show a simple synthetic network (Figure 1B, see Methods), which explicitly models two post-translational modulation events (activation by phosphorylation and co-factor binding) differentially affecting a TF’s regulatory logic. Rather than representing a realistic case this model is only a conceptual tool to illustrate two alternative regulatory programs of a TF depending on its modulators (Figure 1C).
We applied MINDy to the genome-wide identification of modulators of the MYC proto-oncogene, using a previously assembled collection of 254 gene expression profiles13, 14, representing 17 distinct cellular phenotypes derived from normal and neoplastic human B lymphocytes (see Methods). MYC encodes a basic helix-loop-helix/leucine zipper (bHLH/Zip) TF, controlling many cellular processes, including cell growth, differentiation, apoptosis and DNA replication15, 16. It is implicated in the pathogenesis of several human cancers17 and can either activate or repress a large number of targets, depending on the cellular context (for a review see ).
To study MINDy’s robustness and generality, we tested its performance using different MYC-target selections. First, we used 340 literature-validated targets from the MYC database19 (DB-targets). Then, to also test whether the algorithm may generalize to TFs whose targets are not characterized in the literature, we used 197 MYC targets inferred by ARACNe4 (AR-targets). Finally, see SI section 1.9, we considered all genes in the gene expression profile data as candidate targets (ALL-targets), representing cases when literature or computationally-inferred TF targets are not available.
From a pool of 3,131 genes satisfying both independence and range constraints and using DB-targets, MINDy inferred 662 MYC modulators (Table S1) at a False Discovery Rate (FDR) of 4.5×10−3 (see Methods). Gene Ontology (GO) molecular function enrichment analysis revealed that the 20 most enriched categories by Fisher’s exact test (FET) include protein kinase activity (p = 0.002), transcription factor binding (p = 0.003), acetyltransferase activity (p = 0.004) and phosphoprotein phosphatase activity (p = 0.016). Thus, inferred modulators were enriched in categories known to include effectors of MYC activity20–22 (see Methods and Table S2 for details on the GO enrichment analysis).
To test whether MINDy could recapitulate literature-based MYC modulators, we assembled an unbiased set of 233 MYC modulators, including both proteins physically interacting with MYC and indirect modulators (Table S3), using the Ingenuity software (Ingenuity® Systems, www.ingenuity.com). The assumption is that physical interactors are likely to affect MYC protein function, although there may be false positives. While not exhaustive, this provides an independently-established, literature-based dataset to assess algorithm performance (see SI Section 1.8 for details on inclusion criteria).
From this set, 150 genes were excluded as not represented on the chip, not expressed in B cells or not satisfying the range or independence constraints. Of the remaining 83, 29 (35%) were inferred as MYC modulators by the algorithm (p = 0.0078 by FET). This suggests that the algorithm is effective in recapitulating known MYC modulators (especially since Ingenuity modulators may not be B cell specific) with recall comparable to high-throughput assays for protein-protein interactions, which on average detect 20% of known interactions23. We note that 54/83 proteins were reported by Ingenuity as MYC-modulators not supported by a direct physical interaction. Of these, MINDy identified 18 (33.3%, p = 0.041 by FET), suggesting that MINDy is useful in identifying both physically interacting and indirect TF-modulators. Indeed, almost twice as many modulators (18 vs. 11) were found by MINDy among proteins not known to interact directly with MYC.
To focus our analysis on specific molecular functions, we restricted candidate modulators to 542 signaling proteins – including protein kinases, phosphatases, acetyltransferases and deacetylases – and to 598 TFs (see Methods). Among these, MINDy identified 91 signaling proteins (Table S4) and 99 TFs (Table S5), respectively, as MYC modulators (FDR = 0.0053). For each modulator, virtually 100% of the ΔI tests inferred the same Mode of Action (see columns “T+” and “T−” in Table S4 and S5) and fewer than 15% of the modulators had an ambiguous Biological Activity.
To assess a lower bound on the fraction of true positives among inferred modulators (i.e. precision), we performed manual literature curation. Given the labor-intensive nature of this step, only 29 signaling proteins and 35 TFs affecting more than 30 MYC targets were considered. Among the former (Table 1), 11 appear as MYC modulators in published reports (precision = 37.9%). Similarly, among the latter (Table 1), 6 appear in published reports as MYC co-factors or antagonists (precision = 17.1%). For TFs with informative binding profiles in TRANSFAC24, we tested whether their binding sites were enriched in promoters of the modulated targets (see Methods). 14 of 35 TFs had appropriate binding profiles. Of these, 11 were highly enriched. Overall, 17 of 35 TFs (precision = 48.6%) were either literature-validated or had enriched binding sites (see Table 1 and Table S6). This suggests that MINDy’s precision may approach experimental assays23 when all modulators will be experimentally tested.
We then compared the performance of MINDy using literature-based targets (DB-targets) to that with targets computationally-inferred by ARACNe (AR-targets). Overlap between the two target sets was highly significant (p = 2.89×10−18) but relatively small (17%). Nonetheless, overlap between MINDy inferred modulators, using the two target sets, was almost complete: 93.8% (p = 3.10×10−27) among signaling proteins and 95.3% (p = 4.56×10−288) among TFs, respectively (See SI Section 1.9 for more details). This suggests that the method is highly robust to target selection and can be effectively generalized to TFs whose targets are not known from the literature but can be inferred computationally.
We selected four candidates among signaling genes and co-TFs for experimental validation, including a kinase (STK38), two TFs (BHLHB2 and MEF2B) and a de-acetylating enzyme (HDAC1). These genes were selected based on the availability of reagents allowing the thorough validation of their activity, and the diversity of the possible mechanisms by which they may modulate MYC activity. Since no single B-cell line expressed more than two of the four tested modulators, appropriate lines were selected for the silencing experiments among those with the highest modulator expression.
As a first candidate, we validated STK38, a Serine/Threonine kinase25 inferred by MINDy as a strong positive modulator of MYC activity, affecting 60 targets (Figure 2A, Table S4). We silenced STK38 by lentiviral vector-mediated shRNA expression in ST486 cells. While qRT-PCR analysis showed that MYC mRNA concentration was unchanged following STK38 silencing, MYC protein levels were significantly affected (Figure 2B), suggesting a post-translational modulation effect. We proceeded to test two MYC targets in MINDy predictions, BCL2 and NME1, which are normally repressed by MYC26, 27. Consistently with what MINDy predicted, both genes were up-regulated following STK38 silencing (Figure 2C). Additionally, to test whether STK38-mediated MYC modulation is target specific, we tested three additional MYC targets not inferred by MINDy. The first two, TERT and EBNA1BP2, which are known to be activated by MYC4, 28, were down-regulated following STK38 silencing, while the third one, p21CIP1, which is known to be repressed by MYC29, was up-regulated (Figure 2C). These results confirm the observed target-independent down-regulation of MYC at the protein level. Furthermore, STK38-mediated modulation of MYC affected its phosphorylation. Immunoblot analysis of MYC protein in ST486 cells, in the presence of inhibitor of proteasomal degradation (MG132), showed accumulation of phosphorylated MYC in cells following STK38 silencing compared to cells treated with control shRNA, suggesting that STK38 mediates MYC phosphorylation (Figure 2D). Finally, co-immunoprecipitation (co-IP) of epitope-tagged STK38 (HA-STK38) and MYC (FLAG-MYC), in 293T cells transfected with vectors expressing both proteins, showed that STK38 and MYC interacts at the protein level (Figure 2E). Immunoprecipitation of endogenous MYC using specific antibodies in Ramos cells confirmed that the two proteins can interact physiologically in native cells (Figure 2F). These results suggest that STK38 modulates MYC activity by directly affecting MYC protein stability.
MINDy infers this TF as a negative modulator of MYC activity, affecting the regulation of 80 targets (Figure 3A, Table S5). Indeed, BHLHB2 is a TF able to bind to E-boxes through its bHLH domain, and it has been proposed to act as a transcriptional repressor30, but not validated as a MYC antagonist. Thus, we tested whether BHLHB2 could antagonize MYC-mediated transcriptional activation of its target genes, by first investigating whether BHLHB2 could affect the transcriptional activation of TERT, a well characterized MYC target28. Transient co-transfection of a reporter gene driven by a segment of the human TERT promoter region, carrying two E-boxes [TERT-Luc80028], and vectors encoding MYC or BHLHB2 in 293T cells showed that BHLHB2 represses MYC-mediated transcriptional activation on TERT in a dose-dependent manner. This effect is MYC-dependent since the basal transcriptional activity of the reporter gene is actually mildly increased by BHLHB2 (1.2 fold, Figure 3B). Thus BHLHB2 represses MYC-mediated regulation but is not a direct repressor of the TERT promoter. We next analyzed whether endogenous BHLHB2 molecules are physiologically bound to E-boxes within the promoter region of MYC target genes in vivo by quantitative chromatin immunoprecipitation (qChIP) assays in B-cells (ODH-III) using antibodies against MYC and BHLHB2. The results showed that the promoters of BOP1, ATIC, MRPL12, EBNA1BP2 and TERT were bound by both MYC and BHLHB2 (Figure 3C). In order to establish the functional significance of BHLHB2-mediated modulation of MYC transcriptional activity, we examined whether shRNA-mediated inhibition of BHLHB2 could affect the response of the 340 canonical MYC-target genes used in the MINDy analysis or, more specifically, of the MYC-target genes modulated by BHLHB2 as inferred by MINDy. The latter signature was used in case the effect may be highly target specific and thus only a subset of MYC-target may be affected by BHLHB2 silencing. To this end, ODH-III cells were transduced with lentiviral vectors expressing BHLHB2-specific or control shRNAs. Western blot analysis showed that BHLHB2 was effectively silenced, while MYC levels were not affected (Figure S5). We then performed gene expression profile analysis to assess the effect of BHLHB2 silencing on the expression of MYC targets. MYC is known to both positively and negatively regulate its targets31. Thus, without prior knowledge of MYC specific activity on each target, we used Gene Set Enrichment Analysis (GSEA)32 to assess whether MYC target signature genes are more differentially expressed than non-MYC target genes following BHLHB2 silencing (see Methods). The analysis confirmed a highly significant enrichment of canonical MYC targets within differentially expressed genes (p< 0.001) (Figure 3D). Among the 80 MINDy inferred BHLHB2-modulated MYC-targets, 30 were among the most differentially expressed genes. This constitutes approximately a 2-fold increase over their enrichment in non-differentially expressed genes (FET p= 8×10−5). MYC mRNA and protein levels were not affected (data not shown and Figure S5) indicating a post-translational effect. These results validate BHLHB2 as an antagonist of MYC activity.
MEF2B: this TF was inferred as a positive modulator of MYC activity, affecting 14 MYC targets (Figure 4A, Table S5). MEF2B is a member of the MEF TF family, which interacts with the myogenic bHLH proteins MyoD and E12 to activate gene transcription through direct binding to E-boxes on target promoters33. Details on the validation assays are provided in SI Section 1.11. Briefly, similar to BHLHB2, we showed (a) that MEF2B physically interacts with MYC both exogenously in 293T cells (Figure 4B), and endogenously in P3HR1 and Ramos cells (Figure 4C), (b) that MYC and MEF2B can synergistically activate a TERT reporter gene (Figure 4D), and (c) that genes differentially expressed following shRNA-mediated silencing of MEF2B were highly enriched in MYC targets by GSEA (p< 0.001, Figure 4E), while MYC expression was not affected (Figure S6B).
Finally, MINDy identified the histone deacetylase and well-known transcriptional co-repressor HDAC134, 35 as a modulator of MYC transcriptional activity on 8 MYC targets (Figure 5A, Table S4). Experiments (see SI Section 1.12 for more details) confirmed (a) that HDAC1 and MYC can interact in vivo, both exogenously in 293T cells (Figure 5B) as also reported in , and endogenously in Ramos and P3HR1 cells (Figure 5C), (b) that genes differentially expressed following shRNA-mediated silencing of HDAC1 were highly enriched in MYC targets by GSEA (p < 0.001, Figure 5D), while MYC expression was not affected (Figure S7A), (c) that HDAC1 can deacetylate MYC in vitro, following its CBP-mediated acetylation (Figure 5E) which has been shown to increase MYC’s activity as a transcriptional activator, and (d) that, based on qChIP assays with anti-MYC and anti-HDAC1 specific antibodies, both MYC and HDAC1 bind to the promoters of p21CIP1 and CR2, which are repressed by MYC in B-cells (Figure 5F). These results suggested that MYC could recruit HDAC1 to repress transcription on selected target genes. Taken together, we have demonstrated that HDAC1 may modulate MYC activity both by de-acetylation of the MYC protein and by transcriptional repression of selected targets.
Extension to other Transcription Factors: to validate MINDy on a broader range of TFs, we used the algorithm to infer all TFs modulated by BHLHB2, MEF2B, HDAC1 and STK38, for which we had already collected gene expression profile data following shRNA-mediated silencing. Specifically, we tested whether their MINDy-inferred targets were enriched in differentially expressed genes (by GSEA), following lentivirus mediated shRNA silencing of the corresponding modulators. As shown in Table S13, 75% of the TFs inferred by MINDy as modulated by any of the four modulators with support greater than 100 targets (33 out of 44, p ≤ 5.1×10−34) could be experimentally validated by the analysis. Furthermore, as one may expect, validation rates increased – from 51% (87 out of 171), to 61% (59 out of 96), to 75% (33 out of 44) – when the minimum number of MINDy-inferred targets supporting the TF was increased from 25, to 50, to 100, respectively. In general, these results are consistent with the MYC analysis and suggest that MINDy is broadly applicable to the analysis of TFs other than MYC. (See SI Section 1.13 for further details).
We have introduced MINDy, a novel method for the identification of context-specific, post-translational modulators of TF activity. Literature-based and experimental validation suggests that MINDy can recapitulate a large fraction of known MYC modulators and infer novel, context-specific modulators, both of MYC and of other TF.
For well studied TFs, targets for the analysis may be selected from the literature or by performing genome-wide ChIP assays37, 38. However, computationally-inferred targets performed as well or better than literature-based ones, likely due to their context-specific nature. About 269 TFs have more than 50 ARACNe-inferred targets using the B-cell profiles, and may thus be effectively analyzed by MINDy.
Algorithm performance was remarkably robust to candidate target selection (DB-targets vs. AR-targets). Additionally, MINDy’s formulation is relatively simple, requiring only the availability of a large gene expression profile dataset (n ≥ 200 profiles) characterizing a sufficient variety of naturally occurring or experimentally perturbed cellular phenotypes. This suggests that MINDy can be used to analyze most TFs in a variety of cellular contexts.
Several limitations should be noted. First, candidate modulators that do not satisfy the range constraint cannot be tested by the method. These, however, include mostly either genes that are not expressed or genes whose availability is so tightly regulated (e.g. housekeeping genes) that variability in the gene expression profiles is too limited to establish a low and a high range of expression. Second, candidate modulators that do not satisfy the independence constraint may not be tested using this approach. In practice, less than half (100/233 = 42.9%) of the Ingenuity modulators were in this category. This is not a theoretical limitation of the method but rather an assumption we use to increase its sensitivity. If desired, the more general test may be used without relying on the assumption I[TF; M] = 0 (see SI Section 1.1). Additionally, transcriptional modulators of MYC can be directly inferred by ARACNe and do not require MINDy. Third, in the rare event that the regulatory program of a TF changes from activation to repression for specific targets, as a function of a modulator, this may not be detected by the algorithm because the MI may not change substantially. In this case the multi-information could be used instead of the conditional.
Comparison with existing methods reveals that MINDy is unique in its ability to discover large numbers of modulators of the same TF. Finding the optimal Bayesian Network structure, assuming arbitrary interactions among genes’ parents, for instance, is hyper-exponential in the number of parents. As a result, dissecting network topologies with large numbers (tens to hundreds) of upstream modulators, as is the case for MYC, may be difficult. Other methods39, while promising in a yeast context, have not yet been extended to mammalian networks. Finally, comparison with a recently introduced algorithm, NetworKIN11, shows that the latter is restricted to substrates of only 73 kinases, from 20 families, while MINDy can be used to dissect post-translational interactions of a much wider nature, including phosphorylation, acetylation, chromatin modification, formation of transcriptional complexes, and binding site antagonism, as shown for STK38, HDAC1, MEF2B and BHLHB2, just to mention those experimentally validated in this manuscript.
The ability to infer direct and upstream modulators of a desired TF’s activity suggests that MINDy may provide highly specific pharmacological targets for the activation or repression of specific transcriptional programs, when modulators are restricted to druggable genes40. This could be valuable because TFs are generally considered difficult pharmacological target.
Although preliminarily applied to the identification of modulators of MYC for experimental validation purposes, MINDy has already provided novel biological insights of general significance. First, the results indicate that not all modulators can influence the program of a TF in a global fashion, but may rather influence specific subsets of the target genes. This observation suggests that additional levels of regulation can influence the relationships between modulators and the TF they control in different cellular contexts or depending on different signals. Second, MINDy provided novel information about molecules that regulate the activity of the MYC protein. These mechanisms may be critically altered in tumors, thus modulating MYC’s established oncogenic activity. Finally, MINDy is not limited to dissecting post-translational interactions and may be applied without modification to identify TFs that are directly modulated by microRNAs or indirectly by genetic and epigenetic alterations.
At manuscript publication, the source code and executables for MINDy will be made available under the Open Source licensing agreement. Additionally, MINDy will be incorporated into the geWorkbench package which is distributed both by the NCI and by the National Center for Biomedical Computing at Columbia University (http://wiki.c2b2.columbia.edu/workbench/index.php/Home).
We used 254 gene expression profiles previously generated by our labs for several studies of normal and tumor related B-cell phenotypes using the Affymetrix HG-U95Av2 GeneChip® System (approximately 12,600 probes)13, 14. The gene expression profiles are available in GEO (series GSE2350, including samples GSM44075-44095, GSM44113-44302 and GSM65674-65716). Probe sets with expression mean μ < 50 and standard deviation σ< 0.3μ, were considered uninformative and were thus excluded, leaving 7907 probe sets for the analysis. Table S9 summarizes the 17 B-cell phenotypes included in this study.
The synthetic network models two post-transcriptional modifications of a TF, affecting its regulatory behavior (Figure 1B). It includes the transcription factor (TF), an activating protein kinase (K), a co-factor (cTF) forming a transcriptionally active complex with the TF, and three downstream TF targets. The full kinetic model is described in Table S10. 250 synthetic expression profiles were generated from this model by: (a) randomly sampling the TF, K, and cTF mRNA concentration from independent Normal distributions (μ = 4 and σ =1), (b) simulating network dynamics until a steady state was reached, and (c) measuring the mRNA concentration of the represented species with a multiplicative Gaussian noise (μ = 0 and σ = 0.1μ).
The statistical test for the range constraint is defined in SI Section 1.3. The statistical significance test for the independence constraint is based on the Mutual Information, as described in  (see also SI Sections 1.5 and 1.6).
For the category-specific analysis, we further selected 542 signaling proteins (GO molecular function: “protein kinase activity”, “phosphoprotein phosphatase activity”, “acetyltransferase activity” and “deacetylase activity”) and 598 TFs (GO molecular category: “transcription factor activity”) as candidate modulators.
Given a triplet (TF, M, t), with t ≠ TF and t ≠ M, MINDy assesses whether the CMI, I[T F; t| M], is constant as a function of M. Assuming that the CMI is a monotonic function of M (see SI Section 1.10), this can be efficiently tested by measuring , where and represent two subsets of the samples where M is respectively most and least expressed. For this dataset is chosen to be 35% of the samples by optimizing the effect of B-cell receptor pathway genes (which is known to modulate MYC activity42) on regulating canonical MYC target genes (see SI Section 1.2). MI was computed using the Gaussian Kernel estimator of  (see also SI Section 1.5). The p-value corresponding to a specific ΔI is obtained by permutation tests (see SI Section 1.1), Bonferroni corrected for the total number of tested target-modulator pairs. If a candidate target list is not provided, triplets are further pruned if there exists a third gene, x, such that I[T F; x] ≥ I [T F; t] and I[t; x] ≥ I[T F; t] in both , showing an indirect relationship between TF and t mediated by x, as suggested by the Data Processing Inequality59 (see SI Section 1.7 and Figure S8 for details).
Once MINDy inference is made on individual (TF, M, t), modulators are selected based on its support, i.e. the number of TF targets it modulates. The minimum support is determined using a permutation test procedure where an identical MINDy run is performed on the same set of candidate modulators and candidate targets except that and are chosen at random. This produces a permutated set of MINDy inferences, based on which we can compute the modulator support under the null. The minimum support is then selected as the support that gives the smallest FDR (i.e. the ratio between the numbers of selected modulators in the permutated run vs. real run). For MINDy analysis based on DB-targets, fifteen was determined as the minimum support searching for modulators among all genes, and 7 if candidate modulators are restricted to only signaling proteins and TFs. For MINDy analysis based on AR-targets, the minimum support is determined to be 8 when candidate modulators are selected among signaling proteins and TFs,
For each significant triplet, we define M as a positive (negative) modulator of the T F→ t interaction if ΔI> 0 (ΔI< 0). This indicates only whether M increases or decreases the mutual information between TF and t, and does not necessarily correspond to the biological activity of the modulator (i.e. TF activator or repressor). The latter can be assessed for each tested triplet as:
where ρ is the Pearson correlation between the TF and the candidate target t, and is the mean expression of t in . We assess these differences using a two-sample Student t-test with 10% type-I error rate (two sided). For modulators affecting more than one MYC target, the biological mode is labeled as undetermined if the undetermined triplets are the majority (>50%) or if neither mode dominates the other by more than 30%. Otherwise it is assigned the dominant mode (See SI Section 1.4 and Table S11 for more description).
GO molecular function categories with fewer than 20 and more than 500 genes were excluded. The enrichment of each category was computed using the Fisher’s exact test And corrected for multiple testing using the method of Storey and Tibshirani60.
For TFs with an appropriate DNA binding profile in the TRANSFAC professional database (version 6.0)24, we determined the binding-site enrichment in the promoter regions (defined as 2Kb upstream and 2Kb downstream of the transcription initiation site, masked for repetitive elements) of the targets they modulate. Sequences were retrieved from the UCSC Golden Path database (build35, May 2004)61. The binding profile match threshold was calibrated for a FDR ≤ 5% per 1 Kb (in both directions). The binding-site enrichment, compared to a 13,000 random human promoter5 background, was computed by Fisher’s exact test.
The human embryonic kidney 293T cells were maintained in DMEM supplemented with 10% FBS and antibiotics. The Burkitt lymphoma cell lines, Ramos, P3HR1, ST486, and ODH-III were maintained in IMDM supplemented with 10% FBS and antibiotics.
The mammalian expression vectors encoding MYC and TERT-800Luc have been previously described28. The mammalian expression vectors encoding BHLHB2/Stra13-FLAG, HDAC1-FLAG, and HA-MEF2B were kindly provided by Dr R. Taneja (Mount Sinai School of Medicine, New York, NY), Dr S.L. Schreiber (Harvard University, Cambridge, MA), and Dr R. Prywes (Columbia University, New York, NY), respectively. HA-STK38 (pcDNA3.1-NDR1-wt) expression vector was provided by Dr. B. Hemmings (FMI, Basel, Switzerland). MYC-HA and FLAG-MYC plasmids were constructed by subcloning the corresponding human cDNA amplified by PCR into the pcDNA3 (Invitrogen) and pCMV-Tag2A (STRATAGENE) vectors, respectively.
293T cells were transiently transfected by using the calcium-phosphate precipitation method and luciferase reporter assays were performed as previously described62, 63. Each transfection was done in duplicate and luciferase activities were measured 48h post-transfection using dual-luciferase reporter assay kit (Promega) according to the manufacturer’s protocol.
Nuclear cell extracts and whole cell lysates were prepared as previously described64. Proteins were analyzed by SDS-PAGE and subsequently by Western blot using the following antibodies: anti-MYC (C33 and N262), anti-HDAC1 (N-19), and anti-NDR1 (STK38) (G15) (Santa Cruz Biotechnology); anti-BHLHB2/DEC1 (BL2928) (BETHYL); anti-MEF2B (ab33540) (Abcam); anti-STK38 (2G8-1F3) (Novus Biologicals); Flag M2 and anti-HA beads (Sigma); and hemagglutinin (Roche).
ChIP assays were performed as previously described65, 66. Antibodies used were anti-MYC (N-262, Santa Cruz), anti-BHLHB2/DEC1 (BL2928, Bethyl) and anti-HDAC1 (AB7028, Abcam). The immunoprecipitated chromatin fragments from two independent experiments were pooled together and the amounts of sample immunoprecipitated by individual antibody were assessed by quantitative real-time PCR. The oligonucleotide pairs are listed in Table S12.
FLAG-MYC and CBP-HA expression vectors were transiently transfected into 293T cells. Acetylated-FLAG-MYC was purified with FLAG-beads. FLAG-HDAC1 was purified by FLAG-beads from 293T cells transiently transfected with FLAG-HDAC1 expression vector. Acetylated-FLAG-MYC and FLAG-HDAC1 were incubated at 30°C for 2hrs in a buffer containing 50mM Tris-HCl, 50mM NaCl, 4mM MgCl2, 0.5mM DTT, 0.2mM PMSF, 0.02% NP-40 and 5% Glycerol with or without 2μM TSA for inhibition of HDAC1 activity.
Lentiviral vectors for BHLHB2 shRNA (TRCN0000013249), MEF2B shRNA (TRCN0000015739), HDAC1 shRNA (TRCN0000004818), STK38 shRNA (TRCN0000010216) and Non-Target control shRNA (SHC002) were purchased from Sigma. Lentiviral supernatants were produced by transiently co-transfecting the lentiviral vectors, the packaging vector delta 8.9, and the VSV-G envelope glycoprotein vector as previously described67, 68. For infection, ODH-III, P3HR1 and ST486 cells (2×106 cells/ml) were mixed with viral supernatants, supplemented with 8μg/ml polybrene and centrifuged for 120 min at 450g. The ODH-III and ST486 cells were collected for analysis 96h and 60h post-infection, respectively. The lentiviral infected P3HR1 cells were selected with puromycin (0.5 μg/ml) for 5 days and collected for analysis.
Polymerase chain reaction with reverse transcription (RT-PCR) analysis was performed using FastLane Cell cDNA Kit (Qiagen) according to the manufacturer’s instructions. qRT-PCR was performed with Quanti Tect SYBR Green PCR kit (Qiagen) using the 7300 Real Time PCR systems (Applied Biosystems) according to manufacturer’s instructions. The oligonucleotide primers are described in Table S12.
For each modulator of interest, 5 samples infected with the modulator-specific shRNA and 5 with non-target control shRNA, were obtained. Total RNA was extracted with Trizol reagent (Invitrogen) and purified using the RNeasy kit (Qiagen). Biotinylated cRNA was produced according to the manufacturer instructions, starting with 6μg of total RNA and following the one-cycle cDNA synthesis protocol (Affymetrix, 701025 Rev.6). 15 ug of fragmented cRNA were hybridized to HG-U95Av2 microarrays (Affymetrix).
We determined the gene expression values by MAS5.0 normalization method provided in Affymetrix’s GeneChip Operating Software (GCOS). Differential expression between modulator shRNA and control shRNA samples was analyzed using two-sample t-test. For the GSEA analysis probe sets were sorted in decreasing order by the −log10 of their t-test p-values. Readers are referred to  for details on the GSEA algorithm. In the GSEA plot specific targets (MYC- or MINDy-signatures) are shown as vertical bars against the background of all B cell expressed genes in the expression profiles, sorted from the most to the least differentially expressed following silencing of the candidate modulator. The curve represents a random walk where the value on the y-axis is increased proportionally each time the gene on the x-axis is one of the selected targets and decreased if it is part of the background. Weights are chosen proportionally to the statistical significance of the differential expression and to the relative number of targets in the signatures vs. the background list, such that the curve starts and ends at y = 0. The statistical significance of the GSEA statistics (i.e., the maximum height of the curve, called Enrichment Score, ES) was determined by permutation test where the ranks of the probe sets were randomly shuffled 1000 times. To determine the enrichment of MINDy predicted modulator-specific targets of MYC among the differentially expressed genes, probe sets that rank before the GSEA leading edge (i.e. the increasing phase of GSEA profile) were determined to be significantly differentially expressed, and the enrichment was calculated using the Fisher’s exact test.
This work was supported by the National Cancer Institute (R01CA109755), the National Institute of Allergy and Infectious Diseases (R01AI066116), and the National Centers for Biomedical Computing NIH Roadmap Initiative (U54CA121852). I.N. was supported in part by the National Institute for General Medical Sciences (R21GM080216). A.A.M. is supported by an IBM Ph.D. fellowship.