Boolean networks were introduced by Kauffman in the sixties and were one of the first methods to describe gene expression data 
and model a gene regulatory network (GRN). GRNs have been the focus of research in the bioinformatics and genome sciences community for more than a decade now. A GRN can be viewed as a network composed of nodes and edges, where nodes represent genes or proteins and edges symbolize various relationships, at the molecular level, such as protein-protein or DNA-protein interactions 
. There are many existing techniques for modeling GRNs, e.g., Bayesian networks 
, neural networks 
, support vector machines 
, metabolic control analysis 
etc. A review of inference techniques of GRNs has been presented by Hecker et al.
. The problems and methods related to GRNs have also been outlined by Han et al.
, where they discuss the various network topologies and the reconstruction methods. One technique used to model GRNs is based on Boolean network models.
Boolean networks are constructed by discretizing the expression data to two states, zero indicating ‘off’ state and one indicating ‘on’. The state of any node at any time is determined by its parent nodes and the updating scheme, i.e., synchronous or asynchronous. This is a very strong simplification with respect to real GRNs that may not be binary, but it has been shown that “meaningful biological information” is still present in data even if it is quantized to two possible states 
. There are also effective methods to discretized gene expression data to binary form (see e.g., 
In this article we present a fast, efficient and accurate method called ReBMM (reverse engineering of Boolean networks using Bernoulli mixture models) for reverse engineering Boolean Networks by making use of a probabilistic model called a Bernoulli mixture model. We illustrate the reconstruction of a Boolean network from state transition data that represents a time series of gene expression data. Our method is based on learning Bernoulli mixture models from raw data and these mixtures are then used to determine the network structure and also the logical rules governing each node in the network. Most of the existing reverse engineering algorithms have a high time complexity, normally a polynomial of a high degree or an exponential in some cases 
. Their application is therefore restricted to small networks. In this work we show that ReBMM has significantly reduced time complexity as compared to other methods and yet it predicts a Boolean network with little or no compromise in accuracy.
When reverse engineering Boolean networks, two problems have to be addressed. One is to determine the network structure in which the parents of each individual node have to be determined. Secondly, the rules governing each node in the network have to be inferred from the given data. A majority of the state of the art reverse engineering algorithms generally involve a combinatorial search through the space of all node combinations to reconstruct a Boolean network, leading to a high time complexity. Normally these techniques limit the indegree,
, of a node to a small number like 3 or 4. Some examples of reconstruction algorithms include REVEAL, which is based on the information theoretic principle and has a time complexity factor a multiple of
; predictor chooser method of Ideker et al.
that uses minimum set covering 
; minimal sets algorithm and genetic algorithms by Dimitrova et al.
Monte-Carlo type randomized algorithm of Akutsu et al.
, which has an exponential time complexity. Best fit extension principle of Lähdesmäki et al.
again involves a factor of
is the total number of nodes in the network. Nam et al.
devised an efficient search algorithm for learning Boolean networks 
, with a complexity of
is the total number of time steps (explained later). For all the afore mentioned methods, the performance deteriorates considerably with an increase in the indegree and the size of the network.
Recently Maucher et al.
introduced a very elegant algorithm for reconstructing Boolean networks by using correlations. The method has a time complexity quadratic in the number of nodes in the network 
. Their approach is one of the first methods which is independent of the indegree of a node. However, their algorithm is only restricted to carrying out step 1 of the algorithm, i.e., the structure of the network is determined but the set of rules governing each node in the network is not determined. Their method is also restricted to reverse engineering networks, which have only Boolean monotone functions like the logical AND and the logical OR and cannot deal with functions like the XOR.
In this paper, we introduce ReBMM, a method that infers a Boolean network and has a complexity quadratic in terms of the number of nodes in the network and does not depend upon the indegree of a node. The method is simple and intuitive and based on a probabilistic model, which is later converted to a rule based system using a very simple conversion technique. We show how Bernoulli mixture models can be used to determine the parents of a node and how these mixtures lend themselves to a natural representation of logical Boolean rules, an approach which has not been explored before. Our method can also infer both monotone and non-monotone functions.
ReBMM is an intuitive method that derives a rule based system from a probabilistic model based on Bernoulli mixture models. Finite mixture models of probability distributions represent a convex combination of different parametric distributions, where each distribution has its own set of parameters 
. These mixtures are used in situations where the actual shape of the distribution is unknown, and a single distribution is not sufficient to model the local variations in data. Previously, finite mixture models have been widely used in cluster analysis, where the task is to find entities of a similar nature and to group them together 
. In this paper we extend the usage of mixture models to extract rules and summaries from data.
ReBMM was devised as a model to explain the data derived from a plant signaling network called the SIGNET dataset (see http://www.causality.inf.ethz.ch/pot-luck.php
. This was launched as a part of the causality challenge, which consisted of different tasks related to causal discovery 
. We developed this method to analyze and infer asynchronous Boolean networks. In this paper we extend our work to synchronous networks and also carry out a detailed analysis of the complexity of this method. We have also devised a model selection scheme for this method and tested it on synthetic and real life datasets. Finally we demonstrate how this algorithm can be applied to bigger networks with varying indegrees of individual nodes.