Molecular profiling of cells sampled from healthy patients and from patients suffering from diseases led to the discovery of signatures of deregulated genes, i.e., distinctive expression patterns of genes that are differentially regulated and thus change expression between these two populations of cells. Such deregulated genes facilitate the classification of tumors into different types [1], define new cancer subtypes, and can serve as predictors of tumor differentiation stages and patient survival [6].
In recent years, the focus of research has moved from analyzing differentially expressed genes to analyzing differential regulatory networks [11]. These approaches are based on the observation that cellular adaptation to different environments and stimuli [12], to changes induced by diseases [13] or gene mutations [17], as well as to developmental processes [18], results in gains or losses of interactions in the molecular networks of the cell [19]. For example, Workman et al. showed extensive re-wiring of gene regulatory networks in yeast cells undergoing DNA damage, using genome-wide measurements of gene expression upon transcription factor (TF) perturbations, as well as of TF binding to DNA.
Computational and statistical analysis of changes in network structure between two cell populations has become a rapidly expanding field of research [11]. Many methods have been developed to infer differential interactions from gene expression data, based either on linear measures of correlation [20] and regression [23] or on non-linear information-theoretic criteria [13]. In addition to methods comparing two cell populations, there are dynamic approaches that infer re-wiring over time [24].
However, these existing approaches to analyzing deregulation between two different cell populations mostly do not take into account available knowledge about cellular signaling pathways or their transcriptional targets, which may also differ between the cell populations. For example, Mani et al. and Taylor et al. take as input a static interactome, which is not specific to the two cell populations, to discover loss or gain of expression correlation between its nodes. The advanced approach of Workman et al. could be further improved by incorporating prior information about the signaling pathways that are differentially activated upstream of the re-wired gene regulatory network, and by exploiting the complementarity between the TF DNA-binding and the TF perturbation data.
Here, we present a novel approach to assess re-wiring in two cell populations that combines two key ideas: (1) we analyze the effects of pathway-targeted experimental single-gene perturbations, and (2) we explicitly include knowledge of pathway topologies and their downstream targets. In this way, our approach supports context-specific research on the biological system under study, implementing the concept of data analysis that gains power from the incorporation of knowledge [25]. Our knowledge-based approach is designed for quantifying deregulation, i.e., changes in the gene regulatory network between two different populations of cells. It performs a joint analysis of perturbation data from the two cell populations and is referred to as joint deregulation analysis (JODA) throughout the text. The cell populations may correspond to healthy and diseased cells, to diseased cells in two different stages, or, more generally, to cells exposed to two different external stimuli, with different cellular signaling and downstream transcriptional targets.
JODA analyzes high-throughput perturbation experiments, where genome-wide expression is measured upon single-gene knock-outs or knock-downs. It is assumed that the set of perturbed genes is composed of regulators, i.e., genes active in signaling or gene regulation systems of the analyzed cells. The perturbations need to be performed on the same set of regulators in both cell populations.
The first kind of knowledge given as input to JODA is the information about the topology of the signaling pathways active in the two cell populations. The pathway topologies are graphs, which represent regulators with nodes and the known signaling relations between the regulators with edges. Internally, based on the given pathway topologies, JODA builds two binary matrix models (one per cell population). The models are used by the algorithm to determine which perturbation experiments affect which regulators in the pathways. Second, JODA takes as input the known regulator-target gene relations downstream of the pathways. These relations, when available, are given for those regulators that are also TFs, together with their known target genes. Since both the signaling and the regulatory relations may differ between the two cell populations, they need to be provided separately for each of them.
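To illustrate how a pathway topology can be turned into a binary matrix model, the following Python sketch assumes, purely for illustration, that knocking down a regulator affects itself and every regulator reachable from it along directed signaling edges. This reachability rule, the function name `perturbation_model` and the example ATM topologies are assumptions, not the documented internals of the 'joda' package.

```python
import numpy as np


def perturbation_model(regulators, edges):
    """Binary model: model[k, j] = 1 if perturbing regulator k is assumed to affect j.

    Illustrative assumption: a knock-down propagates along directed signaling
    edges, so k affects j whenever j is reachable from k (and k affects itself).
    """
    index = {name: i for i, name in enumerate(regulators)}
    n = len(regulators)
    model = np.eye(n, dtype=int)
    for src, dst in edges:
        model[index[src], index[dst]] = 1
    # Warshall's transitive closure: k -> m and m -> j imply k -> j.
    for m in range(n):
        model = model | (model[:, [m]] & model[[m], :])
    return model


regulators = ["ATM", "RelA", "p53"]
# Hypothetical example topologies: no active ATM signaling in healthy cells,
# ATM signaling to RelA and p53 in damaged cells.
model_h = perturbation_model(regulators, [])
model_d = perturbation_model(regulators, [("ATM", "RelA"), ("ATM", "p53")])
print(model_h.tolist())  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(model_d.tolist())  # [[1, 1, 1], [0, 1, 0], [0, 0, 1]]
```

Under this assumed rule, the healthy model reduces to the identity matrix, whereas in the damaged cells an ATM knock-down is also counted as affecting RelA and p53.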
The output of JODA consists of deregulation scores, which quantify deregulation as the difference in perturbation effects between the two cell populations. An up-regulation effect of a perturbation indicates (possibly indirect) inhibition by the perturbed regulator, and a down-regulation effect indicates activation. The most extreme deregulation scores are assigned to those genes that switch regulatory mechanism between the cell populations, i.e., show a different perturbation effect in one cell population than in the other. We show that JODA performs better than investigating gene regulation in each cell population separately: with the deregulation scores, JODA prioritizes genes that are more enriched in those Gene Ontology (GO [26]) terms that are important for the switch between the compared cell populations. Similarly, functionally important genes can be missed when deregulation is analyzed without prior knowledge, based only on differences in expression correlation, adapting the ideas of Mani et al. and Taylor et al. An R package 'joda', implementing the JODA algorithm, is available from Bioconductor [27]. A short summary of the package functionality and its demo are available at http://joda.molgen.mpg.de/
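The idea of a deregulation score as a difference in perturbation effects can be sketched in Python as follows. The sketch assumes that each perturbation effect is summarized as a signed value in [-1, 1] per gene and perturbed regulator; the sign flip encodes the rule stated above (up-regulation upon knock-down suggests inhibition, down-regulation suggests activation). The value range, the functions `regulation_scores`, `deregulation_scores` and `top_deregulated`, and the example numbers are hypothetical; the actual JODA scoring is defined in the 'joda' package.

```python
import numpy as np


def regulation_scores(effects):
    """Convert signed perturbation effects into regulation scores.

    effects[g, k] (assumed in [-1, 1]) is the signed expression change of gene g
    after knocking down regulator k: +1 = clear up-regulation, -1 = clear
    down-regulation. Up-regulation suggests inhibition (negative score) and
    down-regulation suggests activation (positive score), hence the sign flip.
    """
    return -np.asarray(effects, dtype=float)


def deregulation_scores(effects_h, effects_d):
    """Difference in regulation between cell population d and cell population h.

    Both inputs have shape (n_genes, n_regulators) with the same gene and
    regulator order; the result lies in [-2, 2] under the assumed value range.
    """
    return regulation_scores(effects_d) - regulation_scores(effects_h)


def top_deregulated(scores, k):
    """Indices of the k genes with the largest absolute score over all regulators."""
    strength = np.abs(scores).max(axis=1)
    return np.argsort(-strength, kind="stable")[:k].tolist()


# Toy example with one regulator and three genes (hypothetical numbers):
# gene 0 switches from activation (h) to inhibition (d), gene 1 gains
# activation in d, gene 2 is regulated identically in both populations.
effects_h = np.array([[-1.0], [0.0], [-0.5]])
effects_d = np.array([[1.0], [-1.0], [-0.5]])
scores = deregulation_scores(effects_h, effects_d)
print(scores[:, 0].tolist())       # [-2.0, 1.0, 0.0]
print(top_deregulated(scores, 2))  # [0, 1]
```

Under these assumptions, the extreme values of -2 and 2 are reached only by genes whose regulation switches sign between the two populations, matching the intuition described above.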
In an application to the analysis of deregulation driven by DNA damage in human cells, JODA reveals broad changes of the gene regulatory network downstream of the ATM signaling pathway. The analysis integrates expression data from perturbation experiments in healthy cells and in cells undergoing DNA damage [28] (see Figure 1), the knowledge about ATM signaling down to RelA and p53 (absent in the healthy cells and active in the damaged cells), and the known targets of RelA and p53 in both cell populations. The damaged cells are obtained by exposure to neocarzinostatin (NCS), an antibiotic that induces DNA double-strand breaks and activates the ATM pathway [29]. The original data analysis [28] was rigorous but focused exclusively on a small set of 112 genes responding to NCS treatment, whose perturbation effects correctly reconstructed the known ATM pathway interactions. Here, based on the deregulation scores, we cluster 645 genes into thirteen functional clusters, reflecting the rich spectrum of biological activities in the DNA damage response program. We review genes in the functional clusters in terms of the known impact of NCS on its gene targets. By analyzing enrichment in canonical pathways and known gene-regulatory and protein-protein interactions, we elucidate the connectivity within those functional clusters. We list the genes with the most extreme deregulation scores and report their involvement in the DNA damage response. Our results validate that genes with dominant deregulation scores are directed by the ATM pathway and are functionally involved in the switch between the healthy and damaged cells induced by NCS. In the final section, we show that the approach can also lead to testable hypotheses: we investigate the indirect regulatory impact of each of ATM, RelA and p53 on the deregulated genes, and build a hypothetical hierarchy of direct regulation.
Figure 1 Method overview. (A) Two different cells (ovals): a healthy cell h in a neutral environment (left) and a damaged cell d treated with neocarzinostatin (right). Inside each oval: a pathway topology, with regulators RelA, ATM and p53, and a set of remaining ...