Below, we formally describe the algorithm, including its motivation, justification, and several optional modifications. The specifics of an implementation will depend on how simple the domain is and on the desired degree of resource investment, with the more involved options requiring a larger number of experiments.
The main motivation for this algorithm is recovering the structure of large protein networks from limited measurement sets while keeping the number of experiments to a minimum. In this section, we present an algorithm for structure learning from small data sets, along with several optional modifications that offer better accuracy in exchange for a larger number of experiments. Finally, we discuss the algorithm and explain why we believe it can give good results in general.
The most basic form of the algorithm has three main steps: determining the neighborhoods of the variables, running constrained BN learning on measurements of those neighborhoods, and pruning the resulting graph of spurious edges. A more detailed description of each step follows.
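As a structural sketch (not part of the original description), the three steps can be organized as follows. The subroutine names detect_neighborhoods, learn_restricted_bn, and prune_spurious_edges are hypothetical placeholders supplied by the caller, and measure stands for whatever experimental protocol produces samples of a chosen subset of variables.

```python
# A minimal sketch of the three-step skeleton, assuming the three subroutines
# are supplied by the caller; none of these names come from the original text.
def markov_neighborhood_algorithm(variables, measure,
                                  detect_neighborhoods,
                                  learn_restricted_bn,
                                  prune_spurious_edges):
    # Step 1: decide, for each variable, which variables to measure with it.
    neighborhoods = detect_neighborhoods(variables, measure)

    # Step 2: measure each neighborhood and run a BN search restricted to
    # graphs whose child/parent sets each fit inside a measured neighborhood.
    data = {frozenset(nbhd): measure(nbhd) for nbhd in neighborhoods.values()}
    graph = learn_restricted_bn(variables, data)

    # Step 3: remove spurious edges forced by the restricted search space.
    return prune_spurious_edges(graph, data)
```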
The BN step is a regular BN search carried out in a restricted space: to score a certain child/parent set in the BN algorithm, the child and its parents must be measured together. The search is therefore restricted to graphs whose child/parent sets are each measured in at least one of the neighborhoods. Since we are considering a smaller search space, it might appear that the quality of the results is strongly compromised; in the following section, we show that this need not be the case.
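The restriction can be made concrete with a small check: a candidate child/parent set enters the search only if it is contained in some measured neighborhood. The helper below is an illustrative sketch, not part of the algorithm's stated interface, and the example data are made up.

```python
# Illustrative helper: a child/parent set is scoreable only if it was measured
# together in at least one neighborhood.
def family_is_scoreable(child, parents, measured_neighborhoods):
    family = {child, *parents}
    return any(family <= set(nbhd) for nbhd in measured_neighborhoods)

# Toy example: with neighborhoods {a, b, c} and {c, d}, the family (c | a, b)
# can be scored, but (d | a) cannot and is excluded from the search space.
neighborhoods = [{"a", "b", "c"}, {"c", "d"}]
assert family_is_scoreable("c", {"a", "b"}, neighborhoods)
assert not family_is_scoreable("d", {"a"}, neighborhoods)
```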
Detecting neighborhoods, and deciding which variables to include in a given neighborhood, is at the heart of the MN algorithm. It can be accomplished in a number of alternative ways. The most basic subroutine determines the variables that are most “indicative” of each xi and adds them to xi's neighborhood: we include a variable xj in xi's neighborhood if, under some experimental condition, there is a strong dependence between xj and xi.
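A minimal sketch of this basic subroutine, assuming continuous measurements and using the absolute Pearson correlation under each condition as one possible stand-in for “strong dependence” (both the dependence measure and the threshold are assumptions, not prescribed by the text):

```python
import numpy as np

def basic_neighborhoods(samples_by_condition, threshold=0.5):
    """samples_by_condition: list of (n_samples, n_variables) arrays,
    one array per experimental condition."""
    n_vars = samples_by_condition[0].shape[1]
    neighborhoods = {i: set() for i in range(n_vars)}
    for samples in samples_by_condition:
        corr = np.corrcoef(samples, rowvar=False)
        for i in range(n_vars):
            for j in range(n_vars):
                # Add x_j to N(x_i) if they are strongly dependent under
                # at least one condition.
                if i != j and abs(corr[i, j]) >= threshold:
                    neighborhoods[i].add(j)
    return neighborhoods
```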
It is important for the results that every variable is measured together with all of its parents. Thus, when a certain neighborhood is too large to be measured at once, it is essential to prune it down. Since the goal is to keep the parents of the variable xi in its neighborhood, variables that are independent of xi when conditioned on (some of) xi's parents can be removed from the neighborhood. More advanced pruning can be done to distinguish parents from children, but in general this requires perturbations of variables in the neighborhood.
This pruning requires additional experiments to be run, since it needs measurements of the variable with different candidate parents. Once a variable is removed from the neighborhood, it does not have to be included in further experiments in this step.
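A sketch of this pruning step, using a small partial-correlation computation as a stand-in for the conditional-independence check (the test, the tolerance, and the choice to condition on all candidate parents at once are assumptions of the sketch):

```python
import numpy as np

def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j of `data` given columns `cond`."""
    cond = list(cond)
    if not cond:
        return np.corrcoef(data[:, i], data[:, j])[0, 1]
    Z = np.column_stack([data[:, cond], np.ones(len(data))])
    res_i = data[:, i] - Z @ np.linalg.lstsq(Z, data[:, i], rcond=None)[0]
    res_j = data[:, j] - Z @ np.linalg.lstsq(Z, data[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]

def prune_neighborhood(data, i, neighborhood, candidate_parents, tol=0.05):
    """Drop x_j from N(x_i) if x_j looks independent of x_i given the
    candidate parents of x_i (columns of `data` are the variables)."""
    kept = set(neighborhood)
    for j in list(kept):
        if j in candidate_parents:
            continue  # never drop a candidate parent here
        if abs(partial_corr(data, i, j, candidate_parents)) < tol:
            kept.discard(j)
    return kept
```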
If the pruning fails to make the neighborhood small enough, two approaches can be used. First, a hard cutoff can be used to determine the neighborhood: for example, only the variables with the strongest effect on xi are included in N(xi), or prior knowledge from the literature can be used to indicate which variables are the most likely parents. Alternatively, several experiments measuring xi with different collections of its candidate parents can be included in the BN learning step. The latter would improve the results in general, but a huge number of experiments might be needed to cover all possible combinations.
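A minimal sketch of the hard-cutoff option, keeping only the k candidates with the strongest dependence score with xi; the scores and the value of k in the toy example are illustrative assumptions:

```python
def hard_cutoff(effect_on_xi, k):
    """effect_on_xi: dict mapping candidate variable -> dependence score with x_i.
    Keep only the k highest-scoring candidates."""
    ranked = sorted(effect_on_xi, key=effect_on_xi.get, reverse=True)
    return set(ranked[:k])

# Toy example: keep the three most indicative candidates.
print(hard_cutoff({"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.6, "e": 0.1}, k=3))
# {'a', 'c', 'd'}
```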
We finally turn to the subroutine that detects parents from perturbations. The premise of this subroutine is that a variable's children are the ones most likely to be strongly affected by a perturbation of that variable. The subroutine therefore has a parameter DN, the number of descendants detected by perturbations that we include as children of the perturbed variable. This parameter can be set from prior knowledge and from the degree of confidence in the perturbations (the quality of the inhibitors, for example).
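A sketch of this subroutine, where the shift in each variable's mean (relative to its baseline standard deviation) is used as one possible effect-size measure; this choice, and excluding the perturbed variable itself from the ranking, are assumptions of the sketch. The parameter dn plays the role of DN.

```python
import numpy as np

def children_from_perturbation(baseline, perturbed, perturbed_index, dn):
    """baseline, perturbed: (n_samples, n_variables) arrays over the same
    variables, measured before and after perturbing variable `perturbed_index`.
    Return the dn most-affected variables as presumed children."""
    effect = np.abs(perturbed.mean(axis=0) - baseline.mean(axis=0))
    effect = effect / (baseline.std(axis=0) + 1e-9)
    effect[perturbed_index] = -np.inf  # exclude the perturbed variable itself
    ranked = np.argsort(effect)[::-1]
    return set(ranked[:dn].tolist())
```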
A final intricacy of the perturbation-based parent detection arises when a perturbation of variable y affects variable x substantially, but x and y are not statistically correlated. In this case, y can be added to the neighborhood of x without taking the place of one of the variables already in the neighborhood. This can be done as follows: an experiment is run in which y and N(x) are measured together (without x). From that experiment, the empirical distribution of Y | N(x) is determined, and the experiment measuring x with N(x) is then augmented with y (by drawing y according to Y | N(x)). This allows larger effective neighborhoods, which in turn produces better results in the BN learning step.
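A sketch of this augmentation, assuming discrete (or discretized) values for the variables in N(x) so that the empirical conditional distribution can be read off directly; the array layout and the helper name are assumptions of the sketch:

```python
import numpy as np
from collections import defaultdict

def augment_with_y(x_and_nx_samples, y_and_nx_samples, nx_cols, y_col, rng=None):
    """x_and_nx_samples: samples of x together with N(x);
    y_and_nx_samples: samples of y together with N(x) (without x);
    nx_cols / y_col: column indices of N(x) and y (same convention in both arrays)."""
    rng = rng or np.random.default_rng(0)
    # Empirical distribution of y for each observed joint state of N(x).
    y_given_nx = defaultdict(list)
    for row in y_and_nx_samples:
        y_given_nx[tuple(row[nx_cols])].append(row[y_col])
    # For each sample measuring x and N(x), draw a compatible value of y.
    drawn = []
    for row in x_and_nx_samples:
        pool = y_given_nx.get(tuple(row[nx_cols]))
        drawn.append(rng.choice(pool) if pool else np.nan)
    return np.column_stack([x_and_nx_samples, drawn])
```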
In this subsection we turn to studying the validity of the output of the Markov Neighborhood Algorithm. We argue that with reasonable neighborhood sizes and some careful study, even the optimal results recovered by the BN algorithm on full data can be reached.
Let GBN be the optimal graph recovered from the BN search on full data, and GMNA(n) the optimal graph recovered by the MNA with neighborhood size n. First note that, for BN scoring, measuring each child/parent set simultaneously suffices, because the Bayesian score decomposes into a product of local terms, one per child/parent set. This implies that if every child/parent set of GBN is measured simultaneously in the MNA, GBN can be scored (or, more accurately, GBN lies in the restricted search space).
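For concreteness, the decomposability being used is the standard one (writing Pa_G(xi) for the parent set of xi in G and Di for the data covering that child/parent set):

```latex
\mathrm{Score}(G : D) \;=\; \prod_{i} \mathrm{Score}\bigl(x_i, \mathrm{Pa}_G(x_i) : D_i\bigr)
```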
Two additional issues have to be accounted for: measuring each child/parent set, and scoring the graphs correctly. The second issue arises because different child/parent sets may in general belong to different numbers of neighborhoods, so we may have different numbers of measurements for different child/parent sets; one has to be careful here because the Bayesian score is sensitive to the number of measurements. This problem is easily solved if the number of measurements per neighborhood is large and the priors on the subgraphs (the child/parent sets) are made inversely proportional to the amount of data measuring them. Alternatively, one can simply use the same number of measurements when scoring every child/parent set.
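A small sketch of the simpler alternative, subsampling every child/parent set down to a common number of measurements before scoring; subsampling uniformly at random is an assumption of the sketch:

```python
import numpy as np

def equalize_family_data(family_samples, rng=None):
    """family_samples: dict mapping a child/parent set -> (n_samples, family_size)
    array of its measurements. Return the same dict with every family reduced
    to the smallest sample count."""
    rng = rng or np.random.default_rng(0)
    n_min = min(len(s) for s in family_samples.values())
    equalized = {}
    for family, samples in family_samples.items():
        idx = rng.choice(len(samples), size=n_min, replace=False)
        equalized[family] = samples[idx]
    return equalized
```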
The previous two points tell us that if we make sure that every child/parent set of GBN is measured within some neighborhood, then GMNA(n) = GBN. This is what motivates our methodology for choosing the neighborhoods: we assume that parents are highly correlated with their children, or that perturbations of parents affect their children the most. These assumptions are natural; in addition, using conditional independence to prune the neighborhoods allows highly correlated grandparents and grandchildren to be filtered out.
Thus, under our assumptions, the Markov Neighborhood algorithm will produce results close to those of BN learning on full data. Some spurious edges might be forced by the restrictions on the search space, and the last step of the MNA attempts to remove them, further improving the results.
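The text does not fix a method for this last step; one reasonable sketch is score-based edge pruning, where each edge is dropped for good if removing it does not decrease the score computed from the available (restricted) data. The function and its arguments below are illustrative assumptions:

```python
def prune_edges(graph, score):
    """graph: set of (parent, child) edges; score: callable taking an edge set
    and returning the data score of the corresponding graph."""
    pruned = set(graph)
    for edge in sorted(graph):
        candidate = pruned - {edge}
        # Drop the edge for good if the score does not get worse without it.
        if score(candidate) >= score(pruned):
            pruned = candidate
    return pruned
```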