When a cell is perturbed by external stimuli, it responds by adjusting the levels of the different proteins it needs. Transcriptional regulatory networks form the core of this cellular response system. However, the static wiring of these networks does not reveal which parts of the network are active under certain conditions and how perturbations are propagated through the network. For this reason there has been much interest in integrating the static network topology with gene expression data, which reflect the dynamical or functional state of the network. In a pioneering paper, large changes were identified in the subnetworks of the transcriptional regulatory network of S. cerevisiae active under five different conditions [1]. In reality, the transcriptional regulatory network cannot be considered in isolation; it is integrated with other networks such as the protein-protein interaction network [2]. In [3], a framework was developed which integrates protein-protein and protein-DNA interactions to identify active subnetworks of physical interactions in perturbational data. These subnetworks extend traditional clustering approaches by grouping genes consistent with the constraints of the physical interaction networks. In [4], a further step was taken by introducing a probabilistic model to link a causative gene, via paths in the protein-DNA and protein-protein interaction network, to the set of effect genes which are differentially expressed upon knockout of the causative gene, without requiring that the intermediate genes be differentially expressed as well. This approach was used to map DNA-damage response pathways [5] and jointly model regulatory and metabolic networks [6]. The problem of explaining knockout pairs using physical interactions continues to attract much interest. In [7], an integer programming formulation was introduced, and in [8] an approach based on representing the physical networks by electrical wiring diagrams was applied to the study of expression quantitative trait loci. In [9], a similar approach was used to connect genetic hits to differentially expressed genes using an integrated network containing protein-protein, protein-DNA and metabolic interactions, and in [10] a technique based on the Steiner tree problem was presented. All of these techniques are computationally expensive and try to explain as many knockout or cause-effect pairs as possible in a particular set of experiments, but they do not search for general mechanisms or path structures which are common between different (classes of) knocked-out genes.
A much simpler method was used in [11]. There all paths of length two in an integrated protein-protein and protein-DNA interaction network connecting a transcription factor to its knockout gene set were kept to study the effect of redundancy between paralogous transcription factors in perturbational data. The optimal path length was determined by a hypergeometric test between the knockout set and the set of genes reached by paths of a given length [11].
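To make the path-length selection concrete, the following is a minimal sketch of a hypergeometric test between a knockout set and the genes reached by paths of a given length. It is not the implementation of [11]: the toy network, the gene names, the population size and the helper names `reachable` and `hypergeom_pvalue` are assumptions made for illustration, edges are treated as directed, and walks are used in place of simple paths.

```python
from math import comb


def reachable(adj, source, length):
    """Genes reached from `source` by walks of exactly `length` edges."""
    frontier = {source}
    for _ in range(length):
        frontier = {v for u in frontier for v in adj.get(u, ())}
    return frontier


def hypergeom_pvalue(overlap, population, knockout, reached):
    """Upper-tail probability P(X >= overlap) of the hypergeometric distribution.

    population: number of genes in the background
    knockout:   size of the knockout (differentially expressed) set
    reached:    number of genes reached by paths of the given length
    """
    upper = min(knockout, reached)
    tail = sum(
        comb(knockout, i) * comb(population - knockout, reached - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(population, reached)


# Hypothetical toy network: TF -> {A, B} -> {g1, g2, g3}.
adj = {"TF": ["A", "B"], "A": ["g1", "g2"], "B": ["g2", "g3"]}
knockout_set = {"g1", "g2", "g4"}  # assumed knockout gene set of TF
n_genes = 10                       # assumed background size

pvalues = {}
for length in (1, 2):
    reached = reachable(adj, "TF", length)
    overlap = len(reached & knockout_set)
    pvalues[length] = hypergeom_pvalue(
        overlap, n_genes, len(knockout_set), len(reached)
    )

# The path length with the smallest p-value is taken as the best one.
best_length = min(pvalues, key=pvalues.get)
print(pvalues, best_length)
```

In this toy case only the paths of length two overlap with the knockout set, so length two is selected. On real data a multiple-testing correction over the path lengths tried would usually be added.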
In this paper we present an alternative strategy for elucidating response-to-perturbation mechanisms in integrated networks, based on the notion of a path-like network motif. Standard network motifs are small subgraphs which occur significantly more often in a network than expected by chance and characterize its static properties [12], forming functional modules in integrated networks [14]. Recently, it has been shown that by overlaying functional data on static network structures, additional types of network motifs can be discovered [15]. The motifs studied in [15] are so-called activity motifs, short patterns of timed gene expression regulation events occurring significantly more often than expected by chance in the metabolic network of S. cerevisiae. In the same spirit, we define regulatory path motifs as short, significantly enriched paths in integrated physical networks which connect a causative gene (for example, a transcription factor) to a set of effect genes which are differentially expressed after perturbation of the causative gene. Enrichment of a regulatory path indicates that it connects significantly more true cause-effect pairs than suitably randomized cause-effect pairs.
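The enrichment criterion can be illustrated with a small permutation test. This is a sketch under stated assumptions, not the Pathicular algorithm: the edge-type labels `"pp"` and `"pd"`, the toy network, the function names, and the randomization scheme (each effect set is replaced by a random gene set of the same size) are all hypothetical choices made for illustration.

```python
import random


def targets_by_path(edges, source, path_type):
    """Nodes reached from `source` by following the edge types in order."""
    frontier = {source}
    for edge_type in path_type:
        frontier = {v for u in frontier for v in edges.get((u, edge_type), ())}
    return frontier


def count_explained(edges, cause_effect, path_type):
    """Number of cause-effect pairs connected by a path of the given type."""
    return sum(
        len(targets_by_path(edges, cause, path_type) & effects)
        for cause, effects in cause_effect.items()
    )


def enrichment_pvalue(edges, cause_effect, path_type, genes, n_random=1000, seed=0):
    """Empirical p-value: how often random effect sets are explained as well."""
    rng = random.Random(seed)
    gene_list = sorted(genes)
    observed = count_explained(edges, cause_effect, path_type)
    at_least = 0
    for _ in range(n_random):
        # Assumed null model: same-size random effect set for each cause.
        randomized = {
            cause: set(rng.sample(gene_list, len(effects)))
            for cause, effects in cause_effect.items()
        }
        if count_explained(edges, randomized, path_type) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_random + 1)


# Hypothetical integrated network keyed by (node, edge type):
# "pp" = protein-protein interaction, "pd" = protein-DNA interaction.
edges = {
    ("TF1", "pp"): ["P1"], ("P1", "pd"): ["g1", "g2", "g3"],
    ("TF2", "pp"): ["P2"], ("P2", "pd"): ["g4", "g5"],
}
genes = {f"g{i}" for i in range(1, 21)}
cause_effect = {"TF1": {"g1", "g2", "g3"}, "TF2": {"g4", "g5"}}

observed, pvalue = enrichment_pvalue(edges, cause_effect, ("pp", "pd"), genes)
print(observed, pvalue)
```

Here the path type "protein-protein interaction followed by protein-DNA interaction" connects all five true cause-effect pairs, which random effect sets almost never achieve. The choice of null model strongly affects the result, so the simple scheme above should be read only as a placeholder for a suitable randomization.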
Our method is implemented as a Cytoscape [16] plug-in, 'Pathicular', which identifies regulatory path motifs in integrated networks. As a case study, we used comprehensive microarray data sets for 157 transcription factor deletion experiments [17] and 55 transcription factor overexpression experiments [18] in S. cerevisiae, together with large-scale networks of transcriptional regulatory interactions [19], protein-protein interactions [21] and phosphorylation interactions [22]. Our algorithm identified eight regulatory path motifs, of which five were enriched in both deletion and overexpression data. These eight motifs explain 13% of all genes differentially expressed in deletion data and 24% in overexpression data, a more than five- to ten-fold increase compared with using direct transcriptional links only, confirming that perturbational microarray experiments contain mostly indirect regulatory links. We further observed that regulatory path motifs are organized into modules of genes connected to a transcription factor by the same path and the same intermediate nodes. Perturbed targets forming such modules tend to be highly coexpressed and functionally coherent, and we have used this property to predict periodic genes and to associate novel functions with genes. Finally, we considered two condition-dependent data sets, one containing deletion experiments for 27 transcription factors under DNA-damage conditions [5], and one cell-cycle-specific data set obtained by selecting only the cell cycle regulators from [17], and compared the relative abundance of each path motif between those data sets.
The current version of Pathicular provides functions to calculate regulatory path significance values for user-defined cause-effect networks and directed or undirected physical interaction networks, to visualize regulatory paths on the integrated interaction network, and to extract and visualize regulatory path modules. Pathicular is freely available for academic use.