Modeling gene regulatory networks (GRNs) is a key challenge when studying development and disease progression. These networks are dynamic, with different (overlapping) sets of transcription factors (TFs) activating genes at different time points or developmental stages. Reconstructing the dynamics of these networks is a non-trivial task that requires the integration of datasets from different types of genome-wide assays.
Several methods have been proposed for reconstructing GRNs (for a general overview, see the reviews [1]). These methods often combine expression and protein-DNA interaction data to recover the underlying networks. However, most methods to date have focused on reconstructing static networks, and the resulting models do not provide any temporal information. In this paper we focus on the reconstruction of dynamic GRNs using time series expression data. Such data is available for several species, mostly from microarray studies [4] and, more recently, from RNA-Seq experiments [6].
While several studies measure time series expression data, the available protein-DNA interaction data is almost always static (either from sequence motifs or from ChIP-chip or ChIP-Seq experiments). This creates a major computational challenge when attempting to integrate these dynamic and static datasets.
Several methods have been suggested for clustering time series expression data [9] or for constructing dynamic networks with regression-based techniques that rely only on the temporal expression data [12]. While these approaches have had some success, we show in Results that methods which utilize both the temporal expression data and the static interaction data can improve upon the expression-only methods.
A number of methods have been suggested for addressing these issues, though most of them were targeted at specific input datasets and did not offer any software to support their general use. For example, Luscombe et al. [13] created a dynamic network by overlaying TFs regulating differentially expressed genes for different time points. Lu et al. [14] created a 2D visualization for different dynamic measurements, including time series expression, histone modification, and Pol2-occupancy data, using the GATE software [15], although they did not present a combined model. Bromberg et al. measure TF activation as a time series and derive pathways that explain the activated TFs by integrating subnetworks from protein-protein interaction (PPI) networks [16]. Baugh et al. rely on the expression data of transcription factors to identify representative TFs regulating early development of C. elegans.
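The overlay strategy described for Luscombe et al. [13] can be illustrated in a few lines. This is a minimal sketch, not the published procedure: it assumes the static network is given as a TF-to-target map and that differentially expressed genes have already been called for each time point. The function name `dynamic_subnetworks` and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dynamic_subnetworks(
    static_edges: Dict[str, Set[str]],
    de_genes_by_time: List[Set[str]],
) -> List[Set[Tuple[str, str]]]:
    """For each time point, keep only the static TF->gene edges whose
    target is differentially expressed at that time point (hypothetical helper)."""
    networks = []
    for de_genes in de_genes_by_time:
        active = {
            (tf, gene)
            for tf, targets in static_edges.items()
            for gene in targets & de_genes  # targets active at this time point
        }
        networks.append(active)
    return networks


# Hypothetical toy input: a static network and per-time-point DE calls.
static = {"TF1": {"g1", "g2"}, "TF2": {"g2", "g3"}}
de_calls = [{"g1"}, {"g2", "g3"}, set()]
for t, net in enumerate(dynamic_subnetworks(static, de_calls)):
    print(t, sorted(net))
```

The static edges never change here; only the set of edges that is "switched on" changes with time, which is why this kind of overlay yields a time-indexed series of subnetworks rather than a joint temporal model.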
A different way of formulating the problem is to decompose the gene expression data into TF activity and TF affinity values for each expressed gene, as suggested by Network Component Analysis [18]. From the matrix of TF affinity values one can construct a dynamic network with connections for each time point [19]. There have been many extensions of this idea with different underlying mathematical models, including ordinary differential equations [20] and factor analysis [21]. Note, however, that such regression-based methods do not really take time into account: if one randomly reorders the temporal columns (exchanging, for example, the second time point with the fourth), these models still produce the same network.
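The point about reordering temporal columns can be made concrete with a toy single-regulator example. As an assumption, a no-intercept least-squares fit stands in for the regression-based models cited above (it is not NCA itself). The fit that treats time points as exchangeable samples returns the same coefficient after the second and fourth time points are swapped, whereas a first-order lagged fit (gene at t+1 on TF at t) does not. All names and values are hypothetical.

```python
from typing import List, Sequence


def slope(x: Sequence[float], y: Sequence[float]) -> float:
    """No-intercept least-squares slope of y on x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def static_fit(tf: Sequence[float], gene: Sequence[float]) -> float:
    """Time points treated as exchangeable samples: gene[t] ~ b * tf[t]."""
    return slope(tf, gene)


def lagged_fit(tf: Sequence[float], gene: Sequence[float]) -> float:
    """First-order temporal model: gene[t+1] ~ b * tf[t]."""
    return slope(tf[:-1], gene[1:])


def reorder(v: Sequence[float], order: Sequence[int]) -> List[float]:
    """Apply the same column permutation to a time series."""
    return [v[i] for i in order]


tf = [1.0, 2.0, 3.0, 4.0, 5.0]
gene = [0.5, 1.0, 2.5, 3.0, 4.5]
order = [0, 3, 2, 1, 4]  # exchange the second time point with the fourth

# Same value twice: the static fit ignores the ordering of time points.
print(static_fit(tf, gene), static_fit(reorder(tf, order), reorder(gene, order)))
# Two different values: the lagged fit depends on the ordering.
print(lagged_fit(tf, gene), lagged_fit(reorder(tf, order), reorder(gene, order)))
```

Sums over time points are invariant to any permutation applied to both series, which is exactly why the static fit cannot tell the original order from the shuffled one.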
One of the first approaches to construct networks that change over time while still incorporating the ordering of time series data was suggested by Friedman [22] using dynamic Bayesian networks (DBNs). A DBN is a set of directed networks, one for each time point. Although learning general DBNs is NP-hard, there exist conditions under which these networks can be learned optimally [23]. However, these methods do not scale to hundreds of regulators.
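The following is a deliberately simplified sketch of first-order DBN structure search. Each gene's value at t+1 is explained by a single candidate regulator at t. The scoring function (residual sum of squares of a no-intercept fit) is an assumption made for illustration and is not the Bayesian score used by Friedman [22]. The hypothetical helper `num_parent_sets` counts how many parent sets an exhaustive search would have to score for one target, which shows how quickly this grows with the number of regulators.

```python
from math import comb
from typing import Dict, List


def lagged_rss(x: List[float], y: List[float]) -> float:
    """Residual sum of squares of the no-intercept fit y[t+1] ~ b * x[t]."""
    xs, ys = x[:-1], y[1:]
    sxx = sum(a * a for a in xs)
    b = sum(a * c for a, c in zip(xs, ys)) / sxx if sxx else 0.0
    return sum((c - b * a) ** 2 for a, c in zip(xs, ys))


def best_single_parent(data: Dict[str, List[float]], target: str) -> str:
    """Choose the gene whose value at time t best predicts `target` at t+1."""
    candidates = [g for g in data if g != target]
    return min(candidates, key=lambda g: lagged_rss(data[g], data[target]))


def num_parent_sets(n_regulators: int, max_parents: int) -> int:
    """Parent sets an exhaustive search must score for ONE target gene
    when at most `max_parents` of `n_regulators` candidates are allowed."""
    return sum(comb(n_regulators, k) for k in range(max_parents + 1))


# Hypothetical toy data: T[t+1] = 2 * A[t], so A is the best lagged parent.
data = {"A": [1, 2, 3, 4, 5], "B": [5, 1, 4, 2, 3], "T": [0, 2, 4, 6, 8]}
print(best_single_parent(data, "T"))  # A
print(num_parent_sets(300, 3))        # 4500251
```

Restricting edges to point from time t to time t+1 keeps each time slice acyclic, but even with at most three parents per gene, several hundred candidate regulators already yield millions of parent sets per target.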
To provide a general method that can be widely applied to reconstructing dynamic regulatory networks, the authors of [25] presented DREM, a method that integrates time series and static data using an Input-Output Hidden Markov Model (IOHMM). DREM learns a dynamic GRN by identifying bifurcation points: places in the time series where a group of co-expressed genes begins to diverge. These points are annotated with the TFs controlling the split, leading to a combined dynamic model. Since its release 5 years ago, the DREM software has been used to model a wide range of GRNs, for example stress response in yeast [25] and E. coli, development in fly by the modENCODE consortium [8], stem cell differentiation in mice [27], and disease progression in humans [28].
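The two steps described above, finding where co-expressed genes diverge and annotating the split with TFs, can be illustrated by a toy sketch. This is explicitly not DREM's IOHMM. As assumptions, a split is called at the first interval where some genes rise and others fall by at least `min_delta`, and TFs are assigned to a branch with a hypergeometric enrichment test, a common choice for this kind of annotation. All function names, thresholds and data are hypothetical.

```python
from math import comb
from typing import Dict, List, Optional, Set, Tuple


def first_split(profiles: Dict[str, List[float]], min_delta: float = 1.0
                ) -> Optional[Tuple[int, Set[str], Set[str]]]:
    """Return (t, up, down) for the first interval t -> t+1 in which some genes
    increase and others decrease by at least min_delta; None if no split."""
    n_times = len(next(iter(profiles.values())))
    for t in range(n_times - 1):
        up = {g for g, p in profiles.items() if p[t + 1] - p[t] >= min_delta}
        down = {g for g, p in profiles.items() if p[t] - p[t + 1] >= min_delta}
        if up and down:
            return t, up, down
    return None


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def enriched_tfs(branch: Set[str], all_genes: Set[str],
                 targets_by_tf: Dict[str, Set[str]], alpha: float = 0.05
                 ) -> List[Tuple[str, float]]:
    """TFs whose targets are over-represented in one branch of a split."""
    N, n = len(all_genes), len(branch)
    hits = []
    for tf, targets in targets_by_tf.items():
        bound = targets & all_genes
        p = hypergeom_sf(len(bound & branch), N, len(bound), n)
        if p <= alpha:
            hits.append((tf, p))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical toy profiles: the group diverges between time points 1 and 2.
profiles = {"g1": [0, 0, 2, 3], "g2": [0, 0, 2, 4],
            "g3": [0, 0, -2, -3], "g4": [0, 0, -2, -2]}
print(first_split(profiles))
```

In DREM the split points and the TF assignments are learned jointly by the IOHMM; here they are computed in two independent heuristic steps, which keeps the sketch short but loses that coupling.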
While DREM has been successfully used for multiple species, so far each group using it had to obtain its own protein-DNA interaction data. Since such data is often dispersed among several databases, websites and publications, this step was a major hurdle to using DREM. Other features not supported in the original DREM version included the integration of motif discovery, the ability to utilize dynamic ChIP binding data [29] and TF expression data, and visualization of these new data types. In this paper we discuss a new version of DREM, termed DREM 2.0, that addresses all these limitations. As we show, by addressing these issues DREM 2.0 improves upon both methods that do not integrate static information in the analysis of dynamic data and the previous version of DREM, which lacked the above features.