Previous work has demonstrated the utility of exploring the reverse causal interpretation of large scale data sets using HYPs as opposed to reasoning downstream from the data [
12-
14]. The ability to deduce the degree of activity for a broad spectrum of biological processes, afforded by an extensive causal knowledgebase, would provide enormous potential for facilitating biological characterization and yield an even deeper mining of information from large-scale data. This approach offers the potential to quantify responses of biological systems to anything from toxicity and disease processes to therapeutic benefit. This study successfully demonstrated the use of the causal connections provided by an appropriate knowledgebase as the basis for quantifying the activity degree of specific biology from high-throughput data. This quantitative application of HYPs, representing possibly complex network models, to experimental data measuring treatment-induced perturbations is called Network Perturbation Amplitude (NPA) scoring.
Causal directionality is key for the HYP framework
For all NPA metrics, the proper scoring of a HYP is dependent upon the directionality (signs) of the causal influences linking the upstream biological entity represented by the HYP to the downstream genes whose expression it regulates. The knowledgebase harbors information about the specific signs (positive or negative) of the regulation exerted by the entity represented by the HYP on the expression of each of the downstream genes. The logic for incorporating differential gene expression measurements into an NPA score based on a knowledgebase-specified directional blueprint can be made via arguments against two specific alternative scoring schemes. The case of scoring an activity without taking into account the sign of causal influence in the HYP can make sense if the HYP represents a transcription factor that always activates or represses genes. However, if there are downstream genes in a HYP that are controlled in an opposite manner to the others, the error of an activity score based on an assumption of a single sign becomes apparent: the score contribution of genes that are known to be negatively regulated within a HYP might cancel, instead of add to, the score contribution of genes that are known to be positively regulated within a HYP. An alternative tactic would be to incorporate the absolute values of the differential expression for each gene. This has the problem of always producing positive scores for a HYP as well as artificially inflating scores: genes that change in a manner opposite of how a HYP is known to control a gene would add to a HYP activity score rather than detract from it. Standard gene enrichment analyses usually ignore regulation signs when scoring pathways [
21,
22], but like some newer gene enrichment methods [
23], NPA assessment methods integrate both types of causal signed relationships of the biology to the measurements for their output.
HYPs can be rapidly constructed from an appropriate knowledgebase
HYPs were constructed from the Selventa Knowledgebase, a database of causal biological knowledge that allows rapid creation of HYPs for any biological process, entity, or causally consistent network model that is adequately connected to gene expression changes (see Methods). The TNF and E2F1-direct HYPs were created from this knowledgebase without any additional literature or experimental investigation. For the NF-κB-direct HYP, because the content of the knowledgebase was biologically too limited in this context, additional genes that are directly regulated by NF-κB were mined from the literature and added to the knowledgebase as causal relationships. The additional knowledge concerning the direct effects of NF-κB was necessary to ensure a broadest representation of NF-κB biology. To construct the IKK/NF-κB signaling HYP, a network model was built by assembling causal relationships between relevant entities that were represented in the Selventa Knowledgebase. Similarly, HYPs and network models can integrate information from other sources besides the Selventa Knowledgebase, including literature articles and curated databases, as long as this information can be interpreted as signed causal relationships. The boundaries of HYPs and network models are defined during their construction (see Methods). For this study the model boundaries were chosen based on the biology known to be associated with the TNFα treatment of NHBE cells. In a case where the expected biology is unknown, a process of identifying biology is required to determine the most appropriate network models and HYPs to score. Such an exploratory perspective can be provided by evaluating the resulting HYPs using the RCR approach [
1], which provides a statistical assessment of whether the activation of a biological entity is consistent with measured data, as previously described [
12-
14].
Building the IKK/NF-κB signaling network model afforded the ability to aggregate the gene expression measurements that underlie all the individual HYPS of a specific NF-κB network and provide a single score for that network. However, in condensing a complex model into a single score, there are caveats to consider. Gene expressions that have an ambiguous relationship to the network (both causally positively and negatively regulated) must effectively be removed for scoring purposes. These ambiguities affect approximately 6 % of the downstream genes of the aggregated IKK/NF-κB signaling HYP, which is similar to the case of single HYPs: the NF-κB-direct and the TNF HYPs contains approximately 4 % and 7 % ambiguous downstream genes, respectively. Their actual impact on the NPA results is expected to remain limited (see below).
Additionally, resolution with regard to which individual entities of the network model are being perturbed is also diluted in the overall network score. Thus, when generating an aggregated HYP for a network model, key information about the network is not explicitly available, and it is important to keep these features in mind when interpreting NPA scores (see below for a further discussion of the methodological perspectives). However, despite these caveats, the IKK/NF-κB signaling HYP produced a near-identical pattern of response as the NF-κB-direct HYP, and thus is also correlated with the measured physiological endpoint, NF-κB activation.
This finding that similar NPA results were obtained using HYPs featuring different characteristics highlights an essential aspect of this work: using the same network model for calculating the NPA scores of the various experimental conditions to be compared (e.g., for all TNFα doses and post-treatment times) provides results that are robust against having exhaustively captured the perturbed biology in the network model used for NPA scoring. This aspect is fundamental, especially given the practical impossibility of constructing networks models capturing all of the biological processes potentially perturbed in a given experiment. It is also exploited when constructing network models describing processes that are sufficiently generic, e.g., cell proliferation or cellular stress [
16,
17], so that they can be meaningfully applied in a variety of experimental situations.
The robustness of the results also preserves the validity of the NPA approach against the possibility of HYPs and network models evolving slightly due to the constantly improving understanding of the biological processes they describe. This property was concretely tested with a simple step-back calculation consisting of randomly removing edges to the HYPs and comparing the corresponding NPA results to the original ones. The results demonstrated a remarkable robustness: typically, after removal of 20 % of their downstream genes, the four HYPs used in this work returned GPI profiles that correlated extremely well with their original values shown on Figure (Spearman correlations of 0.99

±

0.01 obtained on 1000 samples). Therefore reasonable future additions to the Selventa Knowledgebase are not expected to significantly impact the NPA results presented in the work. As a corollary, these robustness considerations also support the choice of discarding the downstream genes expressions that have an ambiguous relationship to the HYP upstream entity. Examination of Additional file
1, Additional file
5, Additional file
7, Additional file
8 showed that the fraction of ambiguous downstream genes never exceeds 10 %, which clearly indicates that this effect does not affect the NPA results.
NPA scoring methods accurately assess biological activation
Four different algorithms were developed to quantify the amplitude of perturbation of a HYP. Each method employs a different approach to evaluate the degree of perturbation between two experimental measures for a given HYP (see Table and Methods). Despite their differences, the four methods generally produced similar qualitative results, suggesting that each NPA scoring method is able to effectively quantify the changes in activity of the underlying biological processes. This claim is supported by the fact that the NPA scores correlated well with NF-κB nuclear translocation, a standard measure of NF-κB activity (correlation coefficients between 0.85 and 0.98). This correlation further validates our method of constructing HYPs from a database of prior knowledge.
Future work will confirm the circumstances in which each method is expected to be the most appropriate. For example, the MASS algorithm was developed to use absolute measurement technologies such as an absolute transcript count offered by quantitative next generation sequencing. Given more appropriate measurement technology, the MASS method may be more applicable in circumstances where small differential expressions in a set of highly expressed genes have a dominant effect over large differential expressions in a set of weakly expressed genes. On the other hand, the GPI algorithm, and to an even larger extent the EPI algorithm, down-weight the contributions from genes with poor statistical significance, favoring small sets of strongly differentially expressed genes rather than large sets of weakly differentially expressed ones. From this point of view, the Strength algorithm is unbiased since it contains no weighting factors. However, because an NPA score represents a condensed view of the biology underlying a HYP, the ability to assess the amplitude of its perturbation with complementary NPA methods also highlights which conclusions are robust versus which conclusions may be specific to a particular NPA method. For example, the four NPA methods supported the same time- and dose- dependent NF-κB activation in response to TNFα (Figure a), whereas only Strength and GPI suggested NF-κB activation in response to CDK inhibition (Figure a).
There are some important considerations when using NPA scoring methods to evaluate HYPs. Scores are meant to be directly compared between different treatment versus control contrasts when using the identical HYP. Scores cannot be quantitatively compared between two different HYPs without first verifying that the relationship between a change in the activity of the HYP’s upstream entity and the resulting change in the expression of downstream genes is conserved between the two HYPs. In general, this relationship is not expected to be conserved due to differences in the dynamic range of expression of individual genes that compose each HYP. Additionally, a HYP with a higher number of downstream gene expressions may be expected to represent broader biology than a smaller HYP, and thus in any given experiment, a smaller fraction of genes in the larger HYP may be perturbed, resulting in a lower score than a smaller HYP. However, additional statistical power is gained in the Uncertainty and Specificity statistics with increasing number of downstream gene expressions in a HYP, such that the weaker scores of larger HYPs can be just as significant and meaningful as higher scores from smaller HYPs.
Although scores cannot be directly compared between two different HYPs, the pattern of scores across a set of contrasts can be qualitatively compared. Likewise, the absolute magnitude of the NPA scores should not be directly compared between two amplitude scoring methods, but the pattern of scores across NPA scoring methods can be qualitatively compared, keeping in mind that the scoring methods may be assessing different aspects of the contrasts.
The NPA score represents an abstracted measure of the data represented in the HYP. The score captures the amplitude of the perturbation of a HYP, but does not capture which genes in the HYP most strongly contribute to the score. For example, of the 20 genes that contribute most to the IKK/NF-κB signaling HYP score upon TNFα treatment (100 ng/mL, 24 hour), only one is in common with the 20 genes that contribute most to the IKK/NF-κB signaling HYP upon CDK inhibition (0.6 μM, 24 hour). Given that the NF-κB-direct HYP consists of only 155 genes, this suggests that there is a significant difference in the biology represented by the NF-κB-direct HYP scores in these two cases.
Uncertainty and specificity of NPA scores
Uncertainty estimates the confidence interval of each NPA statistic, and therefore also tests the nullity of the score accounting for the experimental error. The Specificity statistic gives a measure of whether the score is dependent on the expression of specific genes in the HYP, or is instead dependent on a particular property (the likelihood of modulation) of the ensemble of gene expressions in the HYP. Although this definition of Specificity is useful, there are some important points to ensure that Specificity is interpreted appropriately. First, a weak Specificity does not mean that the score fails to accurately characterize the amplitude of the process described by the HYP. Rather, it means that many other
comparable HYPs would obtain a similar score. For example, a very weak score (approximately zero) for a transcriptomic HYP is likely to have a weak Specificity because the majority of the genes on a microarray are unchanged under most conditions. Thus, any random assortment of genes in a HYP might produce a similarly low score. Weak Specificity for low scores could therefore be an indication that the genes in the HYP are not sufficiently perturbed. Alternatively, a high score with a weak Specificity does not indicate that the process measured by the HYP is not perturbed. Rather, it indicates that comparable gene expressions are perturbed to a similar extent, suggesting that other processes with comparable HYPs are likewise perturbed, and thus the score cannot be attributed with high probability to the process represented in the HYP. For example, the fact that the pattern of Strength scores for the TNF HYP in the CDK inhibitor experiment is similar to the pattern of Strength scores for the E2F1-direct HYP suggests that the TNF HYP may contain some genes that are cell cycle controlled (Figure ). However, this number of genes is not sufficient to distinguish this score from the “background” of scores for comparable HYPs, as only one of the fifteen TNF HYP Strength scores met the Specificity criterion. In fact, 32 genes are common to the TNF HYP and the E2F1-direct HYP, which constitutes more than a third of the E2F1-direct HYP, but only one fortieth of the TNF HYP. Methods such as Network Component Analysis [
24,
25] could possibly be adapted to resolve overlaps between HYPs by assigning shared gene expressions to the most statistically likely HYP, potentially increasing the precision of each HYP and modulating score Specificity appropriately.
Together, the Uncertainty and Specificity statistics enable the identification of non-specific and non-significant scores in HYPs when scored against unrelated perturbations. These statistics demonstrate that TNFα treatment of NHBE cells only has a significant effect on cell cycle progression when the dose is above 0.1 ng/mL, and that this effect takes two-to-four hours to appear. Also, these statistics support the conclusion that some NF-κB-regulated genes are upregulated at 6 and 24 hours after CDK-inhibition in HCT116 cells, but likely not at 4 hours or earlier.
Potential applications beyond the comparative assessment of biological impact
The NPA approach developed in this work aims at quantifying the treatment-induced perturbations of the biological processes described by causal network models. It enables the comparative assessment of the biological impact from high-throughput data in response to given stimuli. However the NPA approach could be also used in more exploratory perspectives. For instance, by applying NPA scoring to each HYP in a causal network model, rather than constructing and scoring a single aggregated HYP for the model (Figure b), differences in activation across a model could be investigated. Scoring individual HYPs within a model instead of the larger aggregated model HYP presents a tradeoff of increased granularity of information at the expense of statistical robustness, due to the smaller sizes of the HYPs being scored. Another possibility would be to use the NPA scores and their companion statistics to identify which processes are potentially activated in response to a given perturbation, and thus help guide the construction of a HYP or of a causal network model that capture the relevant perturbed biology.
Finally, NPA scores could be used as a supplementary source of information in studying different types of mathematical models of regulatory networks. The fact that NPA scores provide quantitative measurements for the response of entities that are not explicitly measured or measureable can be exploited in the construction, calibration, or evaluation phases of such models. For instance, in the case of TNFα-treated NHBE cells considered in this study, the NPA scores provide direct quantitative measurements of the inflammatory response of the system, a quantity that would be difficult to access in the absence of the NF-κB nuclear translocation measurements performed in this work.