The input to TIDAL consists of a gene-expression time-series. Internally, TIDAL also makes use of TRANSFAC transcription factor binding site descriptions and genomic multiple alignments with transcription start site annotations for TF binding site filtering. The method constructs a regulatory network using the following basic steps, with greater implementation details provided in the Methods section:
1. Identify up-regulated genes in the gene expression time-series, and group genes according to the time of their first detected up-regulation (t).
2. Infer transcriptional regulators that are active at each t by identifying TRANSFAC matrices with predicted targets that are over-represented in the group of genes first up-regulated at time t.
3. Define an activity window for each TF as the union of time-points when: (i) the TF gene is up-regulated and (ii) any TRANSFAC matrix annotated to the TF has inferred activity.
4. Connect TFs to targets that contain a TF binding site and are first up-regulated within the TF's activity window. For each target, retain the top few (can be set as a parameter) incoming links ranked by their enrichment P-values.
In the following sections, we use TIDAL to infer the transcriptional regulatory networks that control the immune responses to influenza and measles infections in human monocyte-derived dendritic cells. In each case, infection of these cells triggers a genetic regulatory cascade that controls the differential expression of hundreds of genes. The qualitative results obtained from the application of TIDAL to both datasets are similar. We present our analysis of the influenza infection in the main body of the paper, while analogous results for the measles infection are contained in the Supplementary Information.
Identification of transcription factors driving the influenza response
Human monocyte-derived DCs were infected in vitro with seasonal H1N1 influenza A/New Caledonia/20/1999. Genome-wide mRNA expression analysis was carried out at 0, 2, 4, 6, 8, 10 and 12 hours post-infection. We identify a set of 763 genes that are up-regulated over the course of the response (Figure , top).
Figure 1 Enrichment profiles for TFs implicated in the influenza response. Top panel shows numbers of genes first up-regulated at each time-point, split by TF and non-TF genes. Bottom panel shows a heatmap plot with the over-representation analysis of targets (more ...)
The first step in reconstructing the antiviral response network is to infer the set of TFs that likely regulate these observed gene expression changes. TFs act through distinct cis-regulatory elements located in the promoter regions of their target genes. We assume that genes which are up-regulated at similar times share cis-regulatory logic (i.e., they are regulated by common TFs). This is an important distinction from many other methods (e.g. [9
]), which assume common transcriptional regulation for groups of genes sharing similarity in expression across their entire time-series profiles. We propose that this is likely to be unnecessarily restrictive, and instead infer the identity of the regulators of each distinct time-point by looking for TFs whose target genes are over-represented among the genes up-regulated at that time. Note that we do not attempt to explain the process of down-regulation. We have previously observed that the timing of expression changes among down-regulated genes is not correlated across experimental replicates, suggesting looser regulatory control [26
]. Our goal is thus to explain the more tightly controlled up-regulation events using the set of transcription factors whose genes themselves are transcriptionally regulated as part of the response. These factors are key candidates for propagation of transcriptional signals [3
While genome-wide identification of direct TF targets can be done experimentally using techniques such as ChIP-Seq [27
], it has only been done for few TFs and limited to specific experimental conditions. Instead, we identify TF targets computationally, and use the presence of binding sites as a proxy for each potential regulatory relationship between a TF and target. We begin with the set of 744 TF binding signatures annotated to human proteins within the TRANSFAC database, a broad compilation of experimentally verified binding sites summarized as position weight matrices [28
]. We consider any gene that contains a binding site described by that factor's TRANSFAC [28
] matrix in its promoter region a target. We also require that the binding site be evolutionarily conserved, since conservation has been shown to reduce rates of false positive binding site prediction [29
Genes are grouped according to the time of their first detected up-regulation in the microarray (Figure , top panel). There is one group for each of the microarray sampling times: 2, 4, 6, 8, 10 and 12 hours post-infection for the influenza dataset analyzed here. For each of these groups, over-representation analysis, comparing the number of genes in the group with a specified TF binding site against a background set via the hypergeometric distribution (see Methods
), is applied to infer the activity of transcription factors. Of the 49 TRANSFAC matrices annotated to genes that are up-regulated at some point in the time-series, our analysis identifies 15 of these matrices as having a role in the response. Figure provides a visual display of the inferred activity for each of these TRANSFAC matrices over time. The set of transcription factors associated with these matrices contains many known integral components of the antiviral response, including those linked with interferon activation (IRFs, STATs, NFkB) [24
]. Indeed, these factors feature prominently in the activation heatmap, with the IRFs and some STAT factors showing sustained activity over a long period. The inferred activity for most TFs occurs early in the response, indicating a rapid transition to the antiviral state following infection. Figure also shows the highly redundant nature of the TRANSFAC database, with many matrices annotated to multiple, overlapping sets of TF genes.
We validate the inferred activity profiles through a complementary computational analysis. It has been observed that the location of functional cis-regulatory binding sites relative to the transcription start site (TSS) is non-random [30
]. True cis-regulatory binding sites are often located close to the TSS. Thus, we predict that the location of the binding sites for active TFs would be correlated with their inferred activity profiles, with binding sites located closer to the TSS during times when the activity of the associated TF is inferred. As an example, binding sites for the V$IRF_Q6_01 matrix are located significantly closer to the TSS in genes that are up-regulated at 2, 4, 6 and 10 hours post-infection (P <
0.05, Mann-Whitney test) (Figure , bottom left panel). This pattern closely parallels the temporal activity profile for the IRF matrix, which is significant at 2, 4 and 6 hours post-infection Figure , top left panel). Furthermore, the median distance of the site from the TSS steadily increases over time so that by 12 hours post-infection, the locations of the IRF binding sites are not different from the control set, suggesting that IRF-driven activity is minimal at this point. To test whether this pattern holds for the other TRANSFAC matrices with inferred activity, we compare the shift in binding site locations at the time of peak predicted activity (e.g., 4 hours post-infection for the IRF matrix) with the shift at the time of minimum activity. As seen in Figure , we observe that binding sites show a significantly shifted location distribution when the associated TF is maximally active (P <
0.0002, Mann-Whitney test). Inspection of the location plots shows that the shift in location during times of peak TF activity is towards the TSS, as expected.
Figure 2 Binding site location mirrors TF activity profile. (a) Analysis of the general IRF matrix (V$IFR_Q6_01). Top panel shows profile for the over-respresentation analysis, with red dots indicating significance at 2, 4, and 6 hours post infection (FDR q < (more ...)
The influenza transcriptional response network
Having identified the set of TFs driving the antiviral response to influenza, we next seek to explain how each of the individual TFs becomes up-regulated by connecting these factors into a coherent network. We initially consider all TF pairs such that a binding site of one factor is located in the promoter region of the other based on our promoter analysis. We filter these potential network links in two steps. First, we define a time-window for each TF's activity based on when its mRNA is up-regulated, and when any of its associated TRANSFAC matrices shows significant activity, and retain only regulator-target connections where the target is up-regulated within the regulator's inferred activity window. Since TF binding site locations are correlated with the TF activity profiles (Figure ), limiting links to the activity windows allow us to have greater confidence in the inferred regulatory relationships. Second, to predict the most likely regulators for each target, we rank each target's incoming links by the regulator's closest time of enrichment, and retain the top three links in the network, breaking ties by enrichment P-value (see Methods for details).
To visualize the inferred network, we order nodes (representing individual TFs) vertically based on their up-regulation times. Links between nodes indicate predicted regulatory relationships (Figure ). Since each TF can be associated with multiple TRANSFAC matrices, there are several choices of how to place a TF in the network. We choose to place each TF based on the time its mRNA was first up-regulated in the microarray time-series. Alternate placement of TFs, based on their associated TRANSFAC matrix enrichment, is ruled out for two main reasons. First, many of the matrices are enriched at multiple time-points, making the choice of node placement somewhat arbitrary. Second, this scheme does not allow for differentiation between TFs that are annotated to the same TRANSFAC matrix, making disambiguation of which factor is actually driving the response difficult.
Figure 3 The influenza regulatory response network. Network nodes correspond to individual TFs. Edges indicate predicted regulatory relationships, which can be either forward (green links), feed-back (red links) or reciprocal (black links). Time in the figure (more ...)
Having filtered the links by TF activity windows and having limited each node to three most likely regulators, we obtain an influenza antiviral network that contains 12 TF nodes and 32 regulatory links (Figure ). We visualize it with Cytoscape [32
] using the Cerebral [33
] plug-in to produce a time-dependent layout. To interpret the network visualization in Figure , it is helpful to keep in mind that links connect regulators to targets, and that arrow-tails indicate up-regulation of the regulator itself, while arrow-heads indicate transcriptional activity of the regulator. A majority of the links in the network are forward links (colored green), which propagate the transcriptional signal forward through time. Other links (colored red), indicate regulatory feedback relationships, which may contribute to prolonged activity of a TF that was first activated earlier. Within each time slice the TF nodes are laid out in such a way as to enable the forward links to point downward. Surprisingly, all the predicted TFs could be connected together into a single network, even after a large portion of the edges have been removed by the filtering procedure. Furthermore, considering all of the putative targets of TFs in the network, we account for 53% of the genes that are up-regulated during the influenza response.
A large number of methods have been proposed for inference of gene regulatory interactions based on time-series gene expression data [34
]. Many of these methods rely heavily on knock-out (or knock-down data) [35
], or require large microarray compendia [11
], and are thus not appropriate for the experimental setup considered here. To evaluate the performance of TIDAL, we sought another method that could predict regulator-target pairs and associate these with specific time-points, features that we consider essential benefits of TIDAL. While several approaches were considered [7
], we have chosen to compare TIDAL's results with those of the Dynamic Regulatory Events Miner (DREM) [13
], another state-of-the-art computational approach that can operate on a single time-series dataset. DREM is a method that infers global regulatory networks, assigning transcription factors to individual time-points, and thus allows for a direct comparison of the inferred regulators along with their predicted timing between the two methods. It is important to note that while both DREM and TIDAL rely on inferring response regulators by testing for statistical enrichment of putative targets among differentially regulated genes, a major difference lies in the grouping of genes being tested. TIDAL groups genes by time of first differential expression, and DREM performs a clustering based on the full temporal profile, identifying genes with similar expression patterns across the entire time-series.
DREM identifies points in the time-series where the expression of a subset of genes diverges from the rest. It is assumed that this divergence is the result of transcriptional control mechanisms and DREM associates the divergence events with TFs that regulate them in order to produce a global temporal map. It is also important to note that DREM can identify regulation events that are beyond the scope of our method, such as declines in expression following the initial up-regulation event. Applied to the influenza gene expression data and using the same set of TF matrices and their mapped binding sites as in TIDAL, DREM identifies 9 distinct temporal profiles (see Figure ), with 13 TRANSFAC matrices regulating 4 of the 9 clusters leading out of different branching points. To compare the relative performance of the methods, we compute the overlap between the TFs inferred by TIDAL (pictured in Figure ) and DREM (mapped from the TRANFAC matrices pictured in Figure ) with a set of TFs with 'known' immune involvement. This set of known genes consists of TFs that are part of the general pathogen response signature [1
] and the core dendritic cell response [38
]. High overlap with this set serves as a good indicator of correctness for a method's implication of genes as involved in an influenza or another infection.
Figure 4 Dynamic regulatory map of the influenza response. DREM  is used to create a dynamic map based on the time-series fold-change gene expression data as described in Methods. DREM parameters are left at their defaults. Each line in the figure represents (more ...)
While many more transcription factors, when mapped from TRANSFAC matrices, are predicted by DREM as compared to TIDAL, taken together, they count among their targets approximately 69% of all up-regulated genes (compared with 53% of genes explained by TIDAL). Recall for the two methods (see Figure ) stands at 16% for TIDAL and 25% for DREM. However, since we do not expect that all of the 'known' TFs are involved in the influenza response in dendritic cells, these recall values are likely to be underestimated for both methods. In contrast, TIDAL has much higher precision (100%) compared with DREM (40%). It is important to note that these results for the DREM analysis depend on the preprocessing of the microarray data. When DREM is supplied with the full set of genes without differential expression filtering, it produces much inferior results. Moreover, filtering the inferred transcription factors for differential expression, a step we consider essential to our analysis, also improves DREM's precision (to over 90%) without significantly reducing the recall rate.
Figure 5 Transcription factor overlap comparison with DREM for the influenza response. The inferred TFs from TIDAL (Figure 3) and DREM (Figure 4) are compared to a set of TFs with implicated immune response roles. This 'known' set consists of the union between (more ...)
DREM identifies the most well-known TF families in the antiviral response (namely IRF, STAT and NFkB). However, NFkB and STAT activity is predicted to regulate the latter stages of the response (via the V$STAT_Q6 and V$NFKB_Q6_01 matrices). In contrast, TIDAL predicts their activity early in the response, which is consistent with known biology [24
]. Overall, TIDAL's ability to more precisely infer the important antiviral regulators among the pathogen response signature genes coupled with more accurate temporal identification of their regulatory activity point to its better performance.