Understanding the dynamic programs that a cell uses in response to internal or external stimuli is an important challenge. These programs activate regulatory networks controlled by several transcription factors (TFs) (Harbison et al, 2004) and can involve a large number of genes (Natarajan et al, 2001). Direct information about this process has been obtained from genome-wide chromatin immunoprecipitation (ChIP-chip) experiments and comparative motif studies carried out to identify some of the regulators involved (Hahn et al, 2004; Harbison et al, 2004; Xie et al, 2005; Workman et al, 2006). Time-series microarray expression experiments are a complementary source of data, providing dynamic information about the expression of thousands of genes that are activated or repressed in response to stimuli such as environmental stress (Gasch et al, 2000).
An extensive literature has accumulated on methods to analyze and model time-series gene expression data. Some of these methods have focused on determining a continuous representation of the time-series expression data using splines (D'haeseleer et al, 1999; Bar-Joseph et al, 2003a). Other methods have focused on clustering genes while taking into account gene expression dynamics, using approaches such as auto-regressive equations (Ramoni et al, 2002), hidden Markov models (HMMs) (Schliep et al, 2003), and template-based methods (Ernst et al, 2005). Others have modeled time-series gene expression data using techniques such as differential equations (Chen et al, 1999), dynamic Bayesian networks (Kim et al, 2003), and singular value decomposition (Holter et al, 2001). These methods, although useful, provide only a partial view of the transcriptional regulation process because they do not explicitly integrate information about TF–gene interactions, such as ChIP-chip and motif data.
Most methods that integrate gene expression data with motif or ChIP-chip data do so without explicitly taking into account the dynamic nature of biological systems. Several of these methods combined many expression data sets with motif data to infer transcriptional modules (Pilpel et al, 2001; Ihmels et al, 2002; Segal et al, 2003). Transcriptional modules are subsets of TFs and genes such that genes in the same module tend to be similarly expressed and regulated by the same TFs across a number of experimental conditions. Bar-Joseph et al (2003b) integrated ChIP-chip data with expression data with a similar objective. Das et al (2006) presented a method that combined human expression data and motif information to identify active motifs, combinations of motifs, and target genes under certain conditions. Although these prior methods provided important insights and often used time-series expression data sets, they did not take advantage of the sequential ordering of time points in an expression experiment, essentially treating time-series and static expression data in the same way.
A few recent methods have been proposed to integrate time-series expression data with ChIP-chip or motif data while taking into account the ordering of experiments in time-series data sets. For instance, time-series expression data were used to determine which genes were active at certain phases and then combined with ChIP-chip data using a trace-back algorithm to identify active TFs at these phases (Luscombe et al, 2004). This method in effect identified an ordered series of static regulatory graphs, but its direct connection with the dynamics of observed gene expression patterns is less clear. Other methods have relied more heavily on the dynamics of individual gene expression profiles. For instance, Kundaje et al (2005) formed independent clusters of genes by using a joint probabilistic model for the dynamics of time-series expression profiles of genes and the motifs in their promoter regions. Others have integrated time-series expression data with ChIP-chip data to model the expression of individual genes (Lin et al, 2005) and interactions among TFs (Cokus et al, 2006), applying their techniques to model the cell cycle. Another method (Bonneau et al, 2006) used kinetic equations based on the time-series expression data to associate TFs with subsets of genes across a subset of experimental conditions.
Our objective is different from that of these prior works. We present a computational method that integrates time-series expression data with ChIP-chip or motif information to infer an annotated global temporal map. This map describes the main transcriptional regulatory events leading to the observed time-series expression patterns and the factors controlling these events during a cell's response to stimuli. Our method focuses on bifurcation events. A bifurcation event occurs when sets of genes that have roughly the same expression level up until some time point diverge (see Figure 1). Modeling expression patterns as the result of a series of bifurcation events is consistent with a multilayer hierarchical model of gene regulation previously suggested for some organisms (Balázsi et al, 2005). Our method attempts both to detect these bifurcation events and to explain them in terms of regulation by TFs. By focusing on detecting and explaining bifurcation events, we can determine the time at which TFs exert their influence. The method also assigns genes to paths in the map based on their expression profiles and the TFs that control them. The model we use to learn these maps is based on an instance of an input–output hidden Markov model (IOHMM) (Bengio and Frasconi, 1995), where the ChIP-chip or motif data are the input and the observed expression data are the output.
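To make the notion of a bifurcation event concrete, the following toy sketch in Python finds the first time point at which a set of genes separates into two groups, and then asks which TFs are bound preferentially to one group. It is only an illustration under simplifying assumptions: it uses a greedy largest-gap heuristic rather than the IOHMM described above, and the function names (`first_bifurcation`, `tf_split_scores`), the `min_gap` threshold, and the gene and TF labels are all hypothetical.

```python
def first_bifurcation(profiles, min_gap=1.0):
    """Find the first time point at which the genes split into two groups.

    profiles maps a gene name to a list of expression values (one per time
    point, all lists the same length). At each time point the genes are
    ranked by expression, and the largest gap between neighbouring genes is
    compared with min_gap (an assumed, arbitrary threshold).
    Returns (time_index, low_genes, high_genes), or None if no split is found.
    """
    if len(profiles) < 2:
        return None
    n_times = len(next(iter(profiles.values())))
    for t in range(n_times):
        ranked = sorted(profiles, key=lambda g: profiles[g][t])
        # Gap between each pair of neighbouring genes in the ranking.
        gaps = [
            (profiles[ranked[i + 1]][t] - profiles[ranked[i]][t], i)
            for i in range(len(ranked) - 1)
        ]
        gap, i = max(gaps)
        if gap >= min_gap:
            return t, set(ranked[: i + 1]), set(ranked[i + 1:])
    return None


def tf_split_scores(bound_by, low, high):
    """Score each TF by how unevenly it binds the two groups of genes.

    bound_by maps a gene name to the set of TFs bound to it (static input,
    e.g. derived from ChIP-chip or motif data). The score is the fraction of
    high-path genes bound by the TF minus the fraction of low-path genes.
    """
    tfs = set().union(*bound_by.values())
    scores = {}
    for tf in sorted(tfs):
        frac_high = sum(tf in bound_by.get(g, set()) for g in high) / len(high)
        frac_low = sum(tf in bound_by.get(g, set()) for g in low) / len(low)
        scores[tf] = frac_high - frac_low
    return scores


# Toy data: g1 and g2 diverge upward from g3 and g4 at the third time point.
profiles = {
    "g1": [0.0, 0.1, 1.5, 2.0],
    "g2": [0.0, 0.2, 1.4, 2.2],
    "g3": [0.0, 0.1, -0.1, 0.0],
    "g4": [0.0, 0.0, 0.1, -0.2],
}
bound_by = {
    "g1": {"TF_A"},
    "g2": {"TF_A", "TF_B"},
    "g3": {"TF_B"},
    "g4": set(),
}

split = first_bifurcation(profiles)
if split is not None:
    t, low, high = split
    print("split at time index", t, "low:", sorted(low), "high:", sorted(high))
    print(tf_split_scores(bound_by, low, high))
```

In this toy example the hypothetical factor `TF_A` binds only genes on the upper path, so it is the natural candidate for explaining the split, whereas `TF_B` binds both paths equally and is uninformative. A probabilistic model such as an IOHMM replaces the hard threshold and the simple difference of fractions with learned, input-dependent transition probabilities.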
Figure 1. Model overview. (A) Plots of time-series expression profiles generated to illustrate the model. (B) Static TF-DNA binding data: DREM integrates TF-gene regulatory relationships derived from ChIP-chip or motif data with the time-series expression ...
We applied our method to study several stress responses in yeast. Our method was able to automatically infer many aspects of the temporal responses, some of which were previously known, whereas others were new predictions. These new predictions range from low-level predictions regarding the timing of specific interactions, to mechanistic predictions about the set of TFs controlling recovery from stress, to predictions related to phenotypic changes. We experimentally validated predictions of each of these types, leading to new roles for TFs in controlling the yeast response to stress. We also used our temporal maps to compare different stress experiments and to identify a number of common control mechanisms. By using the time of activation that our method assigned to TFs, we were able to identify cascades of activators. Analysis of these cascades provides insights into the use of network motifs and condition-specific regulation in response to stress.