3.2.1. Model Structure
gives the observed steady states for 9 single perturbations involving the 9 CRGs (perturbations of all but Plac8 were over-expression). See Appendix II
for details regarding experimental design. The direct perturbation effects are summarized in the connectivity diagram in , in which an edge leading from gene a
is included if there is statistical evidence that a perturbation of a
changes the expression level of b
. This is equivalent to the accessibility graph
described in methodology developed in Wagner (2001
. This type of model can be expected to contain redundant edges, since an edge will be included from a
to all genes downstream of a
, whereas the causal structure could be represented using fewer edges which model sequential, or transitive, causality. Wagner proposes replacing the accessibility graph
with an adjacency graph
, defined as the most parsimonious graph which preserves the pathway structure of the accessibility graph (effectively removing shortcuts
). However, such a model is not feasible in an application in which special interest is placed on the discovery of self sustaining cyclical attractors representing alternative meta-stable states.
Table 9: Steady state summaries for the 20 edge model selected by the thresholding method described in Appendix II. The implied edges are shown in . Persistent perturbations are indicated in bold type.
Connectivity graph derived from the 9 CRG model using a threshold of 20 edges. Merging or splitting of edges is indicated by the solid circles. Increased gene expression or gene repression are represented by arrows or bars, respectively.
shows marginal edge probabilities p(j → i) obtained by averaging 10 full posterior densities estimated by our proposed algorithm (see below for more detail). All probabilities p(j → i) ≥ 0.6 are highlighted, which are represented by a connectivity graph shown in . We note that the edges are not functionally annotated (as in ), and should be regarded as a summary of many networks, rather than a single network model, with edges representing connections of some form present in most models sampled from a posterior density. Many of the general features of the connectivity graph of are preserved in , in particular a pathway structure originating in HoxC13, then passing through Id2, then through a set of three genes Zfp386, Jag2, Notch3, and terminating in a circuit comprising Wnt9a, Sfrp2 and Id4. There is also a potential for feedback from this final circuit to the previous triplet.
Marginal edge posterior probabilities p(j → i) obtained by consolidating 10 separate posterior densities. The Null parent signifies no parent.
Figure 4: A connectivity graph for the 9 CRG model (a) constructed using edges with estimated marginal posterior probability p(j → i) ≥ 0.6 and (b) derived from a ternary network model fit. In both plots merging or splitting of edges is indicated (more ...)
Lastly, we examine the complete ternary network. We note that the network topology () shows a similar partition into highly connected circuits (Zfp385/Jag2/Notch3 and Wnt9a/Sfrp2/Id4) as can been seen in and . The remaining genes play no important role. In general, the networks shown in and are very similar (edges which are not in common are indicated in gray).
The circuit Wnt9a/Sfrp2/Id4 forms an attractor which alternates over-expression of Sfrp2 with under-expression of Wnt9a and Id4. One advantage of the ternary network model is that it predicts those perturbations which are able to force a transition from one meta-stable state to another. Thus we note that Id4 does not induce feedback, so that the self sustaining behavior is due to Wnt9a and Sfrp2 alone. This circuit can be induced by setting Wnt9a or Sfrp2 individually to their attractor levels. Alternatively it can be induced by an over-expression of Jag2, which forces under-expression of Zfp385 and subsequently Wnt9a, which suffices to induce the cycle. This is predicted by applying operator A iteratively to an initial state characterized by over-expression of Jag2 (see ).
Table gives a sample pathway induced by iterating unperturbed operator A from initial state (0,0,0,1,0,0,0,0,0) until the Wnt9a/Sfrp2/Id4 attractor is reached (iterations 4–6).
The network contains an interesting robustness property. If Wnt9a is over-expressed, or Sfrp2 is under-expressed, contrary to the predicted attractor, a regulatory action on the part of the Zfp385/Jag2/Notch3 circuit is induced which forces a return to the attractor (see ).
Table gives a sample pathway induced by iterating unperturbed operator A from initial state (0,0,0,0,0,0,1,0,0) until the Wnt9a/Sfrp2/Id4 attractor is reached (iterations 8–10).
We emphasize that this network is only one of many plausible networks. The advantage of Bayesian averaging is that any feature can be assigned a posterior probability. We may then reject or accept features with probabilities close to zero or one. Further experimentation can then resolve features possessing greater uncertainty.
In spite of any modeling uncertainty, a coherent GRN emerges, with a circuit involving genes Wnt9a, Sfrp2, Id4 predicted as a possible meta-stable state. There is some support in the literature for this model. These genes are known to regulate cell proliferation and differentiation (Lasorella, Uo, and Iavarone, 2001
, Suzuki, Watkins, Jair, Schuebel, Markowitz, Chen, Pretlow, Yang, Akiyama, Van Engeland, Toyota, Tokino, Hinoda, Imai, Herman, and Baylin, 2004
, Xiang, Lin, Zhang, Tan, and Lu, 2008
) and have been associated with promoting or inhibiting tumor activity (McMurray et al., 2008
). In addition, consistent with their proposed role as regulators in the observed network, each of the down-regulated CRGs – Jag2, Notch, and Zfp385 – behaves as a strong inhibitor of the cancer phenotype upon resetting its expression to levels found in normal parental cells (McMurray et al., 2008
). In the present analysis, these genes appear to play a regulatory role with respect to the Wnt9a/Sfrp2/Id4 circuit. Consistent with this prediction, interactions between Notch and Wnt signaling have been observed in the regulation of stem cell proliferation and cell fate choice (Crosnier, Stamataki, and Lewis, 2006
3.2.2. Statistical Properties of Model Fits
To test the model, the algorithm was run with 30 replicates for each score type, using the parameter values described in Appendix I
. We refer to scores using dM
) and dT
), as introduced in Section 2.2, as score types 1 and 2, respectively. 20 replications using score type 1 were run for each value of Ns
= 2,000, 10,000 and 25,000. The type 1 score distributions grouped by Ns
() clearly show the dependence of the closeness of the fit on the Ns
parameter. The minimum score achieved was 2.0, giving the approximate number of incorrectly predicted steady state gene expression outcomes out of 72 (72 = 9 experiments × 8 unperturbed gene outcomes). The mean score for parameter Ns
= 100,000 was 2.87. This is close to the minimum attainable of 0, so we are presented with the question of whether these fits are viable solutions, or whether overfitting plays a dominant role. In this algorithm, no complexity control is employed other than imposing an indegree bound of K
= 2. It was observed that most low scoring fits reached bound K
for most genes. However, the construction of a Bayesian posterior distribution will permit us to directly assess the degree of overfitting due to network complexity.
Figure 5: Analysis of fit replications using score type 1. (a) Boxplots of score values D(A | Y) grouped by Ns parameter. (b) Plot of extreme cycle count against score D(A | Y). A running mean smoother with bandwidth 0.5 is superimposed. (c) A running mean smoother (more ...)
A different form of overfitting involves the appearance of extreme cycles in which expression of a single gene alternates between −1 and 1. Conceivably, such cycles introduce enough complexity to permit closer fitting of the steady states. This effect may be especially relevant for score type 1, since if the outcomes within a cycle are equally frequent then the predicted steady state value will be 0, which is the value of most observed outcomes in the data. For each of the 90 score type 1 fits, the number of extreme cycles was calculated (each of 72 predicted outcomes may have a distinct extreme cycle). A scatter plot of extreme cycle count against score D(A | Y) is shown in , with a running mean smoother superimposed. In the proportion of fits with > 0 extreme cycles is estimated as a function of D(A | Y) using a running mean smoother. The tendency for the model to fit extreme cycles with greater frequency as D(A | Y) decreases is evident. From , when using score type 2 extreme cycles are permitted, but are always penalized. shows histograms of extreme cycle count grouped by score type (70 fits using score type 1, Ns > 2,000, 30 fits using score type 2, Ns = 100,000). The two samples were tested against equality using a permutation procedure, with significance levels P = 0.048 using difference in means and P = 0.025 using difference in zero-deleted means. We conclude that score type 2 succeeds in controlling against extreme cycles and have confined our analyses to fits using this score.
Histograms of extreme cycle count grouped by score type. For score type 1, 70 replications using Ns = 10,000, 25,000, 100,000 are used, for score type 2, 30 replications using Ns = 100,000.
For each score type 2 fit the minimum attained score was 4. Conceivably, this minimum represents the lowest attainable score with indegree bound K = 2, although with no available analytical theory, this can be verified only by an exhaustive model search. We also expect many solutions to achieve exactly this score.
defines a method of summarizing attractors as vectors of single components. Each component of the summary vector is determined by observing the sequence of that component’s outcomes in the attractor cycle. The attractor summary component is set to 0 if all outcomes in the sequence are 0, set to 1 if a 1 occurs at least once but −1 does not occur, set to −1 if a −1 occurs at least once but 1
defines a method of summarizing attractors as vectors of single components. Each component of the summary vector is determined by observing the sequence of that component’s outcomes in the attractor cycle. The attractor summary component is set to 0 if all outcomes in the sequence are 0, set to 1 if a 1 occurs at least once but −1 does not occur, set to −1 if a −1 occurs at least once but 1 does not occur, and set to 9 if both 1 and −1 occur.
It is important to emphasize that analysis involving operator A differs substantially from that involving perturbation operators AEk. The attractors associated with A are not observed, but represent the behavior of the unperturbed cell, and would therefore be of greater interest in a general study of the behavior of the cell, forming the basis of an in silico model of the cells’ GRN. In addition, the steady states resulting from any new perturbation experiment E′ can be predicted computationally using AE′.
We examined the predicted attractor summaries for operator A for a variety of initial states. We use initial states corresponding to experiments E1,…,E9, except that the initial state is not forced to be persistent. For example, corresponding to E1, a single forced persistent over-expression of HoxC13, we set initial state x0 = (1,0,0,0,0,0,0,0,0), then observe iterates Am(x0), m ≥ 1, until a complete attractor is observed. A variety of attractors exist among the replicates, but a clear trend emerges. There are three attractors which dominate. The first is the null attractor (0,0,0,0,0,0,0,0,0), essentially representing indefinite gene expression at control levels. Another highly represented attractor is (0,0,−1,0,0,1,−1,0,0). This is of particular interest, since it represents cyclical behavior or feedback in which high levels of Sfrp2 alternate with low levels of Id4 and Wnt9a. The third attractor is (0,0,1,0,0,1,−1,0,0), a variation of the previous one. There are 9 other attractors represented, however, 8 of these are represented in only 1 fit, and the remaining one in 2 fits. From a Bayesian point of view, the three frequently represented attractors appear to represent a model feature with high posterior probability, while the remaining attractors can be assigned low posterior probability. In addition, there appears to be some consistency across the fits in terms of the transient structure of A. Given initial states defined by over-expression of HoxC13, Id4, Notch3, or under-expression of Plac8, the attractor is the null attractor in most fits (25/30, 29/30, 30/30, 30/30, respectively). For initial states defined by over-expression of Jag2, Sfrp2, Wnt9a, Zfp385, the attractor is a non-null attractors in most fits (27/30, 30/30, 27/30, 27/30). For the initial state defined by over-expression of Id2 there is more uncertainty, in that 16/30 fits predict the null attractor.
3.2.3. Consistency of Bayesian Posterior Densities
Under the methodology defined in Section 2.3 a SA algorithm is first used to determine τ
*, which defines the posterior density ϕτ*
used to estimate posterior probabilities. Some flexibility can be introduced by using posterior density ϕrτ*
for some positive factor r
. For our analysis the posterior density was sampled for the first 10 model fits using the reported model fit as the initial point of the sampler, using r
1,3. Our primary concern will be with consistent convergence. While we do not expect the algorithm to converge consistently to a single model, we should expect consistent convergence to a single posterior density, to within some acceptable approximation.
We first summarize the graph structure. Using Bayesian averaging the posterior probability that j is a parent of i, p(j → i), was estimated for each posterior density. A no parent state was included. These probabilities appear to be consistent across model fits ( and ). Under consistent convergence, these values should be found close to the identity. If we compile all absolute differences | p(j → i) − p′ (j → i) | where p and p′ denote distinct posterior densities, for each pair, we find 97%, 98% and 2% no greater than 0.1, no greater than 0.25 and greater than 0.95, respectively, for r = 1. The corresponding percentages for r = 3 are 87%, 95% and 0%. The numbers, along with the plots themselves, suggest a straightforward consequence of varying r – lower values yield greater overall consistency but a greater number of large differences. This tendency was also observed by the authors for other model features. It is suggestive of the bias/variance trade-off common in many forms of complex statistical modeling. In the our analyses, we use the value r = 3, as the more cautious choice.
Scatterplots of p(j → i) for posterior density pairs (1,2),(2,3),…,(9,10), using temperature τ*. Identity and +/− 0.1 boundaries are indicated.
Scatterplots of p(j → i) for posterior density pairs (1,2),(2,3),…,(9,10), using temperature 3τ*. Identity and +/− 0.1 boundaries are indicated.
We noted above that when examining multiple fits, there was considerable variation in the attractor structure, but that only 3 attractors appeared consistently in most fits. Interestingly, these rarer attractors have little influence on their associated posterior densities. For example, the attractor (1,0,−1,1,9,9,−1,1,1) appears in one model fit, but has negligible posterior probability in the associated posterior density. This is generally true of all the rarer attractors. This is an important feature of the methodology, since it identifies which features of a given model are likely to be spurious, without the need for extensive model fitting replications.
Finally, we briefly raise a technical consideration. Recall that when calculating the score D(A | Y) the operators AEk are iterated from a given initial state until an attractor is reached. This means that there may be table entries which are never used. For example, in the table for Zfp385 it is predicted that the simultaneous under-expression of Jag2 and Notch3 will induce the over-expression of Zfp385. However, that particular combination of outcome levels for Jag2 and Notch3 was not encountered during the scoring of A. Although assigning values to each table entry is needed to conduct a random search, and possibly to perform further Bayesian averaging, it is still appropriate to flag these entries, since they have no effect on the score or on any of the resulting attractors or pathway structure of the perturbed operators.