Search tips
Search criteria 


Logo of ejbiosysbioEURASIP Journal on Bioinformatics and Systems Biology
EURASIP J Bioinform Syst Biol. 2016 December; 2016: 1.
Published online 2016 January 15. doi:  10.1186/s13637-015-0035-4
PMCID: PMC4715135

Bayesian estimation of the discrete coefficient of determination


The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.

Keywords: Discrete coefficient of determination, Bayesian inference, Gene regulatory network inference


DNA regulatory circuits can be often described by networks of Boolean logical gates updated and observed at discrete time intervals [16]. In a stochastic setting, the degree of association between Boolean predictors and targets can be quantified by means of the discrete coefficient of determination (CoD) [7]. As such, the CoD is a function of the joint probability of target and predictor variables, which, however, is usually unknown in practice. Hence, this requires the inference of the discrete CoD- given sample data. A larger sample-based CoD value indicates a tighter regulation between target and predictors.

The concept of CoD has far-reaching applications in genomics. The CoD was perhaps the first predictive paradigm utilized in the context of microarray data, the goal being to provide a measure of nonlinear interaction among genes [7]. The CoD has been used in the reconstruction or inference of gene regulatory networks using gene expression data quantized into discrete levels [811]. It has also been used in the definition of the intrinsically-multivariate prediction (IMP) criterion for the characterization of canalizing genes [12, 13]. In [1416], we studied the inferential theory of the discrete CoD in a classical framework, by means of nonparametric and parametric maximum-likelihood estimation (MLE) approaches.

Classical parametric and nonparametric approaches to CoD estimation have been investigated in [14, 15]. In the present paper, we introduce a fully Bayesian approach to the inference of the discrete CoD, based on a parameterized family of target-predictor distributions. Given the priors, the probability model and sample data, we obtain the posterior distributions of the parameters, which can then be used to obtain the optimal predictors and prediction error estimators for the given problem. Such a Bayesian approach for prediction error estimation was first introduced in [17, 18], in a classification context.

Part of the work presented here appeared in [19], which introduced the minimum mean-square error (MMSE) Bayesian CoD estimator. In the present paper, we provide an exact representation of the analytical expressions of this estimator, and in addition, introduce the optimal Bayesian predictor (OBP) CoD estimator, which is based on an optimal predictor with the minimum expected true error with respect to the posterior distributions of the parameters [20, 21]. We derive exact formulas for the bias, variance, and root-mean-square (RMS) error of the OBP CoD estimator. The accuracy of both Bayesian CoD estimators is compared against that of several nonparametric CoD estimators by numerical simulations. The results indicate that the Bayesian MMSE CoD estimator is the best one when averaged over all distributions and samples, whereas the simpler OBP CoD estimator, though suboptimal in the MMSE sense, can be more accurate than the MMSE CoD estimator, in a frequentist sense, under low-variance informative priors around fixed parameters corresponding to a fixed distribution between target and predictors. It is also unsurprisingly found that priors with higher densities around true fixed distributions produce more accurate Bayesian estimators in a frequentist sense.

This paper is organized as follows. In Section 2, we introduce the discrete model for prediction and present the coefficient of determination in this model. In Section 3, we develop a Bayesian framework of the inference of the discrete CoD, define two Bayesian CoD estimators, one in the sense of minimum mean-square error (MMSE), and the other based on the optimal Bayesian classifier, and derive the analytical expressions for both Bayesian CoD estimators. In Section 4, we first present an exact formulation of accuracy metrics for the OBP CoD estimator. Afterwards, we discuss the accuracy of both Bayesian CoD estimators when averaged over all distributions and samples as well as under fixed distributions under varying priors, and their comparison with the nonparametric CoD estimators. Section 6 describes an approach to the inference of gene regulatory networks using the proposed Bayesian CoD estimators and illustrates the approach with gene expression data from a previously published study on metastatic melanoma. Finally, Section 7 presents concluding remarks.

The discrete coefficient of determination

The CoD, which was originally defined in classical regression analysis, gives the relative decrease in unexplained variability when entering a variable X into the regression of the dependent variable Y, in comparison with the total unexplained variability when entering no variables. Dougherty and collaborators extended the concept of CoD to discrete random variables [7]. Given a specified error criterion, such as the mean-square error or the mean-absolute error, the CoD was defined in [7] as


where ε 0 is the minimum error of predicting Y by a constant (i.e., in the absence of observations) and ε is the minimum error of predicting Y based on the observation of X. Since εε 0 (all sensible error criteria satisfy this property), the CoD ranges from 0 to 1. The closer it is to one, the closer ε is to zero and the tighter the association between predictor and target variables, whereas the closer it is to zero, the closer ε is to ε 0 and the weaker the association is. By convention, CoD=0 when ε 0=0. The CoD is a function only of the distribution of (X,Y); in particular, it is not a function of sample data. This definition of the CoD reduces gracefully to the classical one in the case when (X,Y) is jointly Gaussian [7].

We consider in this paper the case where X = (X1X2, …, Xd) ∈ {0,1}d is a binary vector of predicting variables and Y[set membership]{0,1} is a binary target random variable. For example, X and Y may consist of the active/inactive expression state of various genes. The probability distribution of the pair (X,Y) is specified by the probability c=P(Y=0), and the probabilities p i=P(X=x i[mid ]Y=0) and q i=P(X=x i[mid ]Y=1), for i = 1, …, b, with i=12dpi=1 and i=12dqi=1. Let (x1, …, x2d) be an arbitrary enumeration of the possible values of the predicting vector X. An optimal predictor of Y given X is well-known to be ψ *(X)= arg maxk P(Y=k[mid ]X) [22]. The minimum error of predicting Y based on the observation of X is therefore


where I A is the indicator function, which is equal to 1 if A is satisfied and zero, otherwise. On the other hand, an optimal predictor in the absence of observations is clearly given by ψ *= arg maxk P(Y=k), so that the minimum error of predicting Y by a constant is given by

ε0 =  min {P(Y = 0), P(Y = 1)} =  min {c, 1 − c}, 

Plugging (2) and (3) in (1) results in


This formula gives the relationship between the CoD and the parameters of the distribution of (X,Y).

Bayesian CoD estimators

In practice, the distributional parameters are generally unknown, and one would like to estimate the CoD from sample data. We present in this section the derivation of two Bayesian estimators for the CoD in (4). One approach is analogous to that followed by [17] in defining the Bayesian MMSE prediction error estimator, whereas the other one makes use of the optimal Bayesian predictor (OBP), a straightforward generalization of the optimal Bayesian classifier (OBC), introduced in [20].

We will assume that an i.i.d. sample S n={(X 1,Y 1),…,(X n,Y n)} from the distribution of (X,Y) is available. Given S n, define U i as the number of sample points such that X=x i and Y=0, and V i as the number of sample points such that X=x i and Y=1, for i = 1, …, 2d. Note that N0=i=12dUi and N1=i=12dVi are the (random) sample sizes corresponding to Y=0 and Y=1, respectively.

Let p = (p1, …, p2d), q = (q1, …, q2d), and θ=(c,p,q), where 0≤c,p i,q i≤1, and i=12dpi=i=12dqi=1. As shown in the previous section, the distribution of (X,Y) is completely specified by the parameter vector θ. The Bayesian approach treats θ as a random variable, the prior distribution of which can take advantage of a priori knowledge about the problem. We will assume that c, p, and q are independent, i.e., f(θ)=f(c)f(p)f(q). It is shown in [17] that this implies that the posterior distribution of θ also factors f(θ[mid ]S n)=f(c[mid ]S n)f(p[mid ]S n)f(q[mid ]S n).

In this paper, we will employ the standard choice of priors for discrete distributions, namely, the Beta and Dirichlet distributions (c.f. Appendices A and B):


where the hyperparameters α, β, α i, β i, i=1,…,2d, are positive numbers. These distributions have bounded supports; the Beta distribution is defined over the interval [0,1], while the Dirichlet distribution is defined over the simplex of 2d nonnegative numbers that add up to one. The shapes of the distributions are controlled by the concentration parameters Δ c=α+β, Δp=j=12dαj, and Δq=j=12dβj, and the base measures c 0=α/Δ c, p0 = (α1/Δp, …, α2d/Δp), and q0 = (β1/Δq, …, β2d/Δq). Please refer to Appendices A and B for definitions and important facts about the Beta and Dirichlet distributions, which will be needed in the sequel.

A very important property for our purposes is that the Beta and Dirichlet priors are conjugate priors for the discrete multinomial distribution, i.e., they have the same form as the corresponding posteriors. Given the sample data S n, the posterior distributions are [17, 18]:


where n 0 and n 1 are the observed sample sizes corresponding to Y=0 and Y=1, respectively, while U i and V i are the observed sample values of the random variables U i and V i, respectively.

Minimum mean-square error CoD estimator

Given a CoD estimator CoD^, consider the mean-square error


The minimum MSE solution, as is well known, is given by the expectation of the CoD according to the posterior distribution of the parameters [23]. This defines the Bayesian MMSE CoD estimator:


where the CoD is given by (4).

It is well-known that the MMSE estimator CoD^MMSE not only displays the least root mean-square error (RMS) over the distribution of (θ,S n), but it is also an unbiased estimator (however, for a specific model with fixed θ, CoD^MMSE might not be unbiased or have the least RMS).

In order to derive an expression for the Bayesian MMSE CoD estimator, first note that (4) can be rewritten as


Applying (8) to (9) and using the previously mentioned fact that the posterior distribution factors allows one to write the Bayesian MMSE CoD estimator as


Using (6) and the fact that the marginal distributions of a Dirichlet are Beta (c.f. Appendix B), we have that c [mid ] S n~Beta(α s,β s), piSnBeta(αis,α¯is), and qiSnBeta(βis,β¯is), where α s = n 0+α, β s=n 1+β, αis=ui+αi, α¯is=n0ui+Δpαi, βis=vi+βi, and β¯is=n1vi+Δqβi, for i=1,…,2d. Using the results in Appendix A and assuming that the hyperparameters are integers (if they are not, a simple adjustment to the derivation below can be made; see Appendix A), it follows that


Likewise, we have





where the Beta function B(a,b) and the coefficients r i(a,b) are defined in Appendix A.

Replacing (11)–(14) into (10) produces an exact expression for computing the MMSE CoD estimator in terms of sample sizes and model hyperparameters. Notice that for the previous expressions to make sense, one must have α>Δ p−1 and β>Δ q−1. In particular, if uniform priors are chosen for p or q, then the prior for c cannot be uniform (c.f. Appendix A).

Optimal Bayesian predictor CoD estimator

In this section, we derive a second Bayesian CoD estimator, using the optimal Bayesian predictor (OBP), a simple extension to the Boolean prediction problem of the “optimal Bayesian classifier” (OBC) proposed in [20]. Formally, let ε θ[ψ] denote the error of a predictor ψ under parameter vector θ. The OBP predictor ψ OBP minimizes the average error over the family of (posterior) distributions indexed by the parameter


Using the results of [20] for the OBC, one can verify that the OBP for the Beta-Dirichlet model considered here is given by


for i=1,…,2d, with optimal prediction error


On the other hand, the average errors of the the constant predictors ψ[equivalent]0 and ψ[equivalent]1 are


respectively, so that the OBP error in the absence of observations is


We can then combine (17) and (19) to obtain the optimal Bayesian predictor (OBP) CoD estimator


It is easy to show that 0ε^OBPε^0,OBP, and thus 0CoD^OBP1.

Execution time for computation of the OBP CoD estimator grows as O(2d). By comparison, the complexity for exact computation of the Bayesian MMSE CoD estimator introduced in the previous subsection is O(n 3 × 2d). Neither n or d tends to be too large in Genomics applications, due to small sample sizes and the fact that the average number of predictor genes d per target gene must be small for a stable system, as remarked by S. Kauffman in [2]. However, if n and d become large, one could devise Monte Carlo approximation methods to compute both CoD estimators.

Therefore, the OBP CoD estimator, though suboptimal, is much more efficient computationally than the MMSE CoD estimator, especially at large sample sizes. In addition, we will see in the next section that the OBP CoD can be even more accurate than the MMSE CoD estimator, in frequentist sense, under a fixed value of the parameters.

Performance analysis

In this section, we investigate the accuracy of the Bayesian CoD estimators proposed in the previous section. We distinguish between two types of accuracy metrics: global metrics concern the average performance over all samples and all distributions of (X,Y), weighted by the prior distribution of θ, whereas fixed-parameter metrics have to do with the average performance over all samples, but under a particular distribution of (X,Y), corresponding to a fixed value of the parameter θ. Fixed-parameter metrics thus evaluate the proposed Bayesian estimators from a purely frequentist perspective.

For a given Bayesian CoD estimator, the fixed-parameter accuracy metrics of interest are the bias


the variance,


and the root-mean-square (RMS) error,


It becomes clear that the fixed-parameter bias, variance, and RMS of a Bayesian CoD estimator can be obtained with knowledge of the first and second moments ESnθε^ε^0 and ESnθε^2ε^02.

The corresponding global accuracy metrics are obtained by taking expectation of the previous quantities with respect to the marginal (i.e., prior) distribution of θ.

As mentioned previously, the global bias of the Bayesian MMSE CoD estimator is zero and its global RMS is minimal among all CoD estimators. However, this does not imply that its fixed-parameter bias is zero or that its fixed-parameter RMS is minimum for all values of the parameter.

In what follows, we give exact expressions for the computation of ESnθε^ε^0 and ESnθε^2ε^02for the OBP CoD estimator. As argued previously, this allows the exact computation of the fixed-parameter bias, variance, and RMS of that CoD estimator. Via simple numerical integration, it is possible then to obtain the global bias, variance, and RMS. It turns out that similar expressions for the MMSE CoD estimator are much harder to obtain; the performance of that estimator are studied via a numerical approach in the next section.

All the expectations and probabilities below are with respect to S n [mid ] θ (the subscript will be omitted for convenience). In the expressions below, c, p i, and q i, for i=1,…,2d refer to the (deterministic) parameters in θ.

First note that


where M=(n+α+β)ε^0,OBP=min(n0+α,n1+β) and


where [left floor]x[right floor] denotes that the largest integer smaller or equal to x. Let L0=α,α+1,,n+βα2+α, L1=β,β+1,,n+αβ2+β. There are three possibilities: (1) α[left floor]α[right floor]β[left floor]β[right floor]; (2) α[left floor]α[right floor]=β[left floor]β[right floor] and α=β; (3) α[left floor]α[right floor]=β[left floor]β[right floor] but αβ. We will provide the derivation only in case (3); the other cases are similar, and lead to the exact same expressions.

We assume that α[left floor]α[right floor]=β[left floor]β[right floor] but αβ. Without loss of generality, we assume that α>β, and let α=β+δ, where δ is a positive integer. Notice that it is easy to show in this case that n+βα2+α=n+αβ2+β by considering the evenness and oddness of n and δ. Therefore, we have L 0[subset or is implied by]L 1. In the following, we discuss two cases when n+βα is even and when n+βα is odd.

(1) When n+βα is even, the event [M=m] is equal to the union of the disjoint events [ n 0=mα], and [n 1=mβ]=[n 0=nm+β], for mL0α+n+βα2, whereas M=α+n+βα2=n0=mα=n+βα2, and [ M=m] = [ n 0=nm+β], for m[set membership]L 1[backslash]L 0.

Now, we are going to use the fact that, for a random variable X and disjoint events A and B, one has1


We can then write Eε^OBPε^0,OBP as:


where P(n0=r)=nrcr(1c)nr and


with P(Ui=k,Vi=ln0=r)=rkpik(1pi)rknrlqil(1qi)nrl. The expression for Eε^OBP(n1+β)/(n+Δc)n0=nr is obtained from (27), with r+α replaced by r+β.

(2) When n+βα is odd, the event [M=m] is equal to the union of the disjoint events [n 0=mα], and [n 1=mβ]=[n 0=nm+β], for m[set membership]L 0, whereas [M=m]=[n 0=nm+β], for m[set membership]L 1[backslash]L 0. By applying the same reasoning, we have the same expression as in (26). Note that In1n+αβ2 is always equal to 1 in this case since n+αβ2 is not an integer.

For the second moment, we have


where M=(n+Δc)ε^0,OBP, as before. By using the same reasoning applied previously in the case of the first moment, we have


where, letting ki=(r+α)(k+αi)r+αi, li=(nr+α)(l+βi)nr+βi, rj=(r+α)(r+αi)r+αj, sj=(nr+α)(s+βi)nr+βj, for i,j=1,…,b, we have


with P(Ui=k,Vi=l,Uj=r,Vj=sn0=t)=n0k,rpikpjr(1pipj)n0krnn0l,sqilqjs(1qiqj)nn0ls. The expression for Eε^OBP2(n1+β)2/(n+Δc)2n0=nt is obtained from (30) with t+α replaced by t+β.

Numerical experiments

Global accuracy

In this section, we employ Monte Carlo sampling (with M=10,000 simulated data sets for each sample size) to compute global accuracy metrics of the two Bayesian CoD estimators. Following [17], we let α=2d+1=β=2d+1, which produces a prior for c peaked around the value c=0.5, and α i=β i=1, for all i=1,…,2d, i.e. flat (uniform) prior distributions for (p,q). In each iteration, the values of c and (p,q) are drawn from the respective priors, and then sample data is generated according to these probabilities. Given the sample data, we compute the exact Bayesian MMSE and OBP CoD estimates as expressed in Section 3, and compare them to the standard resubstitution CoD estimator, which is based on plugging in sample frequencies in the expression for the optimal CoD, and corresponds to the the original choice of CoD estimator in [7]. This estimator is also called the nonparametric maximum-likelihood CoD estimator in [15]. For further comparison, we also compute CoD estimators based on leave-one-out, 0.632 bootstrap and 10-repeated twofold cross-validation error estimators—for details on all these CoD estimators, please see [14, 15]. Sample means and sample variances are employed to approximate the global accuracy metrics of each CoD estimator.

Figure Figure11 displays the global bias, variance, and RMS as a function of varying sample size, for different numbers of binary predictive variables, d=1 through d=3. Several observations are evident. First, as expected, the Bayesian MMSE CoD estimator is unbiased and has the least RMS among all the estimators, and the gap in performance widens as dimensionality increases. Secondly, the OBP CoD estimator has the second-best performance, which indicates the benefits of using the Bayesian estimation approach. The accuracy of the OBP CoD estimator is quite close to that of the MMSE estimator for d=1, but the gap widens as d increases. Thirdly, it is also observed that the OBP Bayesian CoD estimator is pessimistically biased. Incidentally, the 0.632 bootstrap CoD estimator displays the best accuracy among the four nonparametric ones according to global RMS, but it is matched by the resubstitution CoD estimator as sample size increases.

Fig. 1
Global bias, variance, and RMS for several CoD estimators versus sample size n, for different numbers of predicting binary variables d. Top row: d=1; Middle row: d=2; Bottom row: d=3. Plot key: MMSE (red), OBP (blue), resubstitution (gold), leave-one-out ...

Fixed-parameter accuracy

In this section, we study the average accuracy of the two proposed Bayesian CoD estimators for a fixed parameter, that is, we evaluate the Bayesian estimators from a purely frequentist perspective.

As in the previous subsection, we consider d=1 through d=3 binary predictive variables. We consider fixed values of the parameters, c *, p *, and q *. In order to examine the effect of prior belief on performance, we consider four scenarios regarding prior density around the true parameters: a flat (“non-informative”) prior and three nonflat “matched,” “poorly matched,” and “mismatched” priors. This is done by assuming different base measures and concentration parameters for the priors (c.f. Section 3). As an illustration of the approach, consider the case d=1. In our simulation, c *=0.5, p *=(0.6,0.4), and q *=(0.4,0.6), and the base measures for the nonflat priors are c 0=0.5, p 0=p *, q 0=q * (matched prior), c 0=0.5, p 0=(0.5,0.5), q 0=(0.5,0.5) (poorly matched prior), and c 0=0.5, p 0=(0.4,0.6), q 0=(0.6,0.4) (mismatched prior). In addition, we consider different values of the concentration parameters to reflect different degrees of peaking of the prior distributions. Figure Figure22 plots the nonflat prior densities for p 1 (which is Beta-distributed), for different values of the concentration parameter: Δ p=5 (high-variance), Δ p=25 (medium variance), and Δ p=50 (low variance). Notice that each density is centered around the expected value. Note that, if the variance is high, even the matched prior becomes very diffuse around its expected value (which is the true value, in this case).

Fig. 2
Beta prior densities for p 1 and d=1, for different values of the concentration parameter: Δ p=5 (high-variance priors), Δ p=25 (medium-variance priors), and Δ p=50 (low-variance priors). Legend: dashed line (mismatched prior), ...

Table Table11 gives the values of the parameters used in the experiments. In all cases, the true value and base measure for c are the same, c *=c 0=0.5. In addition, in each case, the true value q * and base measure q 0 are obtained from p * and p 0, respectively, by flipping the corresponding vector left to right; for example, when p 0=(0.2,0.1,0.3,0.4) then q 0=(0.4,0.3,0.1,0.2). Therefore, only the values for p * and p 0 are shown in Table Table11.

Table 1
True distributions and nonflat prior base measures for fixed-parameter experiments. In all cases, c *=c 0=0.5, and q * and q 0 are obtained from p * and p 0, respectively, by flipping left to right (see text.)

Figures Figures3,3, ,4,4, and and55 show the results for d=1 through d=3 predictors, respectively. Each figure displays the bias, variance, and RMS as a function of the sample size for the Bayesian MMSE and OBP CoD estimators and the nonparametric CoD estimators. The Bayesian estimators assume a flat non-informative prior and three nonflat matched, poorly matched, and mismatched priors, specified by Table Table1.1. For the non-flat priors only, three different variance groups are considered, corresponding to three different settings for the concentration parameters: high variance, medium variance, and low variance priors. Results for the OBP CoD estimator are computed exactly using the results of Section 4. For all other CoD estimators, bias, variance, and RMS are approximated by averaging results over 5000 Monte Carlo samples drawn from the fixed distribution. The curves for the nonparametric CoD estimators and the flat prior Bayesian CoD estimators are repeated across the columns, for comparison with the results for the nonflat prior Bayesian CoD estimators.

Fig. 3
Bias, variance, and RMS for several CoD estimators versus sample size for d=1, under different values of the concentration parameter: Δ c/2=Δ p=Δ q=5 (high-variance priors), Δ c/2=Δ p=Δ q=25 (medium-variance ...
Fig. 4
Bias, variance, and RMS for several CoD estimators versus sample size for d=2, under different values of the concentration parameter: Δ c/2=Δ p=Δ q=10 (high-variance priors), Δ c/2=Δ p=Δ q=50 (medium-variance ...
Fig. 5
Bias, variance, and RMS for several CoD estimators versus sample size for d=3, under different values of the concentration parameter: Δ c/2=Δ p=Δ q=20 (high-variance priors), Δ c/2=Δ p=Δ q=100 (medium-variance ...

We can observe that, as expected, both Bayesian CoD estimators perform better when the prior is matched to the true value of the parameters than when the match is poor or nonexistent. In addition, for the matched prior, accuracy improves substantially as one moves from a diffuse (high-variance) to a peaked (low-variance) prior. This effect is especially visible in the case of the OBP CoD estimator. For example, with d=1 the RMS is reduced by nearly 80 % between the high-variance and low-variance matched priors. In fact, the accuracy of the OBP CoD estimator beats that of the MMSE CoD estimator for peaked priors, while the opposite is true under diffuse priors. Both Bayesian CoD estimators outperform the nonparametric ones in cases d=1 and d=3, whereas, in the d=2 case, the Bayesian estimators based on mismatched or poorly matched priors can perform worse than the nonparametric estimators, for larger sample size. It is also observed that, as the variance of priors decreases (i.e., for a larger δ value), the performance of both Bayesian estimators improves over the nonparametric ones. Moreover, it is interesting that the Bayesian MMSE CoD estimator performs better than the OBP CoD estimator, for a high-variance prior with matched prior, while the OBP CoD estimator beats the Bayesian MMSE CoD estimator for medium and high-variance matched priors. This indicates that the OBP CoD estimator is preferable due to its straightforward representation and superior performance with low-variance priors. Notice that the Bayesian MMSE CoD estimator has the least RMS only when averaged over all distributions and all possible samples, but its optimality does not apply to the settings with a fixed distribution. In addition, we observe that the Bayesian MMSE CoD estimator is less variant than the OBP CoD estimator. It can be seen that the Bayesian CoD estimators based on informative priors are less variant than those based on non-informative uniform priors. In the d=1 and d=3 cases, the OBP CoD estimator with uniform priors becomes more variant than even the cross-validation estimator, for larger sample size. In addition, the OBP CoD estimator is less biased in magnitude than the MMSE estimator for low-variance matched priors. However, as the variance of priors increases, the Bayesian MMSE CoD estimator turns out to have less bias than the OBP estimator.

Gene regulatory network inference: a melanoma example

We discuss in this section the application of the Bayesian CoD estimation approach discussed previously to the inference of gene regulatory networks. We apply the proposed inference procedure on data collected in a study of metastatic melanoma [24], containing 31 binarized sample expression profiles, which have been binarized, with 0 indicate no significant expression whereas 1 represents significant expression (either over- or under-expression). It was found in [24] that the WNT5A gene is a major driver of processes that lead to metastatic melanoma. We derive the logic relationships and wiring of a 7-gene WNT5A network consisting of genes selected using data analysis and prior biological knowledge: WNT5A, pirin, S100P, RET1, MART1, HADHB, and STC2; for more details about the selection of these genes, see [25, 26].

We assume a model where the target binary gene expression Y[set membership]{0,1} is regulated by a binary predictor gene expression vector X=(X 1,…,X d)[set membership]{0,1}d through the relationship

Yf(X)  ⊕  N

where f:{0,1}d→{0,1} is a Boolean function, the symbol “ [plus sign in circle]” indicates modulo-2 addition, and N[set membership]{0,1} is a noise Bernoulli random variable, independent from X, such that P(N=0)=p. The modulo-2 addition behaves as a XOR operation, which flips the state of the target Y when N=1, and leaves it unaltered when N=0. Hence, p quantifies the predictive power of the model: if p=1, the system is noiseless and prediction is deterministic, while if p<1, there is a degree of indeterminacy in the state of the target given the state of the predictors. This model is studied in detail in [15], where an inference procedure, based on a maximum-likelihood CoD estimator, is proposed to select the unknown Boolean function f, assuming that f is a member of a candidate model set F containing Boolean functions that depend on the same number k of essential variables. Each f in F is specified by (1) a Boolean function g:{0,1}k→{0,1} and (2) the indices for the predicting variable set {i 1,…,i k}[subset or is implied by]{1,…,d}, or wiring, such that f(X) =  g(Xi1, …, Xik). If the candidate boolean functions g belong to a model set G, then the total number of possible models is |G|×dl.

Here, we modify the network inference in [15] to allow the use of the Bayesian CoD estimators described previously. For a given target Y and predictor set X, we assume Dirichlet prior distributions as in (5). Instead of adopting a non-informative choice of hyperparameters, we employ an “empirical Bayes” approach, where the hyperparameters are estimated in part from the sample data, as described next.

First, it follows from the model in (31) that the parameters p i=P(X=x i[mid ]Y=0), q i=P(X=x i[mid ]Y=1), and c=P(Y=0) are given by:


The unknown quantities here are the predictive power p and the distribution P(X) of the predictors. Given the sample data S n={(X 1,Y 1),…,(X n,Y n)}, and a fixed Boolean function g and wiring {i 1,…,i k}, p can be very effectively estimated by means of the sample frequency [15]:


The distribution P(X) could in principle be also estimated from the data using sample frequencies; however, such an estimator can become very poor under small sample sizes and large dimensionality d. Therefore, we simply assume a flat distribution P(X=x i)=1/2d, for i=1,…,2d. Substituting this and (33) into (32) gives the values of the hyperparameters used in our experiment:


Recall from Section 3 that the shape of the Dirichlet prior distribution is determined by the hyperparameters through a location parameter, called the base measure, and a concentration parameter. Our strategy is to set up the estimates in (34) as the base measure, so that the Dirichlet priors are concentrated around them, to a degree specified by the concentration parameter. Formally, the hyperparameters are set to: {α1,,α2d}={p^1Δ,,p^2dΔ}, {β1,,β2d}={q^1Δ,,q^2dΔ}, α = ⌈ĉΔ⌉ and β = ⌈(1 − ĉ)Δ, where [left ceiling]x[right ceiling] gives the smallest integer larger or equal to x. The value of δ is tuned by the experimenter, either manually or using a data-driven procedure.

We are now ready to state the procedure to select a function f in F, consisting of a k-predictor Boolean function g and its wiring.

Bayesian model selection procedure

  1. For each of the Boolean functions g[set membership]G, compute the prior hyperparameters as described earlier. Obtain the MMSE Bayesian CoD / OBP CoD estimate under each of the dk possible wirings. Pick the wiring for g that produces the largest CoD estimate. Ties, if any, are broken randomly.
  2. Among the |G| pairs of Boolean function g and wiring obtained in the previous step, select the one that produces the largest predictive power estimate p^. Ties, if any, are broken randomly.

In our experiment with the 7-gene WNT5A network, we consider in turn each gene as a target and the remaining six genes as predictors (so that a gene cannot be a predictor of itself). Hence, d=6. In addition, we assume that that each gene is predicted by three genes out of the six predictors. Therefore, k=3 and there are 63=20 possible wirings for each target gene. The set G contains all 218 Boolean functions of exactly three essential variables (this is less than the full set of 223 = 256 3-input Boolean functions since those that are reducible to 0-, 1-, and 2-input logics are not considered). We set Δ=1.0 and apply the proposed Bayesian model selection procedure to infer a gene regulatory network for the MMSE and OBP CoD estimators. We also obtain the gene regulatory network produced by employing the standard model selection procedure, which picks the predictor set (among all 63=20 choices, in this case) with the largest estimated resubstitution CoD [25].

The results are presented in Fig. Fig.6.6. The diagrams represent the predicted logic functions as binary strings (in the usual logic table order; e.g., AND = 00000001) and the predicted wirings as oriented edges, and, in addition, the estimated CoD in each case is displayed. We can see that the predicted logic functions and wirings for the three networks are similar, especially in the cases of the OBP and resubstitution CoD networks. If one considers only the three top predicted relationships according to CoD magnitude, one obtains the diagrams depicted in Fig. Fig.7,7, which show that the same network is inferred by the OBP and resubstitution CoDs, which differ from the network obtained with the MMSE CoD by only a single arrow shift in the wirings (the inferred logics in all three cases are also very similar, differing by only a few bit shifts). The important difference between the Bayesian and standard approaches that can be observed from this experiment is in the estimated CoD magnitudes: those estimated with the standard resubstitution CoD tend to be much larger than the ones estimated with the Bayesian CoDs. This reflects the optimistic bias that tends to be displayed by resubstitution [27], a problem that is avoided by the Bayesian CoD estimators.

Fig. 6
Gene regulatory networks inferred using the Bayesian model selection procedure for the Bayesian MMSE CoD and OBP CoD and the standard model selection procedure using the resubstitution CoD
Fig. 7
Gene regulatory networks inferred using the Bayesian model selection procedure for the Bayesian MMSE CoD and OBP CoD and the standard model selection procedure using the resubstitution CoD, corresponding to the top three predicted relationships according ...


We introduced a Bayesian framework for the estimation of the CoD in a discrete prediction setting and analyzed the accuracy of the proposed Bayesian MMSE and OBP CoD estimators based on fixed and random parameters, using analytical and simulation methods. We also compared the accuracy of the two Bayesian CoD estimators against those of several classical CoD estimators, based on resubstitution, leave-one-out, bootstrap, and cross-validation prediction error estimation. Our results indicated that the Bayesian MMSE CoD estimator has the best performance with zero bias and least RMS, when averaged over all distributions and sample data, whereas, for fixed distributions, we conclude that priors with higher densities around the fixed distributions present better accuracy with less RMS. It is also interesting to see that the OBP CoD estimator, one with very simple calculation, can beat the Bayesian MMSE CoD estimator when using low-variance priors with higher densities around the parameters of the fixed distributions. Furthermore, we proposed an approach for inference of gene regulatory networks based on the proposed Bayesian CoD estimators, and applied it to the inference of a 7-gene regulatory network using melanoma data. We observed that the inferred boolean functions and wirings were similar for both CoD Bayesian estimators. Interestingly, the network inferred with the OBP CoD estimator was very close to the network obtained with the standard inference method based on the resubstitution CoD estimator; however, the magnitude of the CoDs were larger in the latter case, which is consistent with the fact that resubstitution tends to be optimistic. We hope that this paper will provide a theoretical foundation for further work on Bayesian estimation methodologies for the inference of gene regulatory networks. The issue of obtaining informative priors based on established biological knowledge about regulatory relationships, which was not addressed in detail here, is one that deserves careful consideration in future work on this topic.


1 A proof of this fact is given in the Appendix of [14].

Appendix A: the beta distribution

If X~Beta(a,b), where a,b>0, then the probability density function of X is given by


where the normalizing term B(a,b) is known as the Beta function:


Clearly, B(a,b)=B(b,a).

For k>−a and l>−b,


For example, E[ X]=B(a+1,b)/B(a,b)=a/(a+b) (the second equality can be proved using the definition of the Beta function in terms of the Gamma function and the properties of the latter [28]). Similarly, E[ 1/X]=B(a,b−1)/B(a,b)=(a+b−1)/(b−1), provided that b>1.

The incomplete Beta function is defined as


Notice that B(a,b)=IB(1;a,b).

It is easy to verify that


Finally, for k>−a and l>−b,


which follows easily from the definitions of the Beta density and the incomplete Beta function, and the fact that X[set membership][ 0,1]. In particular, if x≥1, then E[X k(1−X)l I Xx]=E[X k(1−X)l].

Clearly, all the previous quantities can be computed in terms of the incomplete beta function, an expression of which is given by the next result.


If X~Beta(a,b), then


where P=b−1 and


if b is an integer, or P= and




When b is a positive non-integer (that is, [left floor]b[right floor]>0), we have, by using the Taylor series expansion,


Note that [left floor]b[right floor] denotes the largest integer that is less than b. Therefore,


To interchange the integration and summation in (45), we need to construct a sequence of measurable functions g i(x), i=0,1,…,, that satisfy the following three conditions:

  • (i) (1)ib1ixa+i1gi(x), for all k and almost all x;
  • (ii) i=0gi(x) converges for almost all x;
  • (iii) i=001gi(x)dx<.

Let gi(x)=b1ixa+i1,i=0,,, and obviously the condition (i) is satisfied.

For 0≤x<1,




and thus the condition (ii) is satisfied.


Let j=1ibj11a+i=zi(i=1,2,,). Since


i=001gi(x)dx converges by Raabe’s test [29].

Now we can interchange the integration and summation in (45), and we have


When b is an integer, we have (1x)b1=i=0b1(1)ib1ixi, and it is easy to show that


Notice that B(a,b)=i=0Pri(a,b). Note also that the general case reduces to the special case if b is an integer. An equivalent expression can be derived where a appears in the binomial coefficient instead, which can then be used if a is an integer. If neither a nor b are integers, an approximation can be obtained by truncating the resulting infinite series, or by using a numerical software package.

If both a and b are integers, then IB(x;a,b) reduces to a polynomial in x. Otherwise, it is a simple matter to replace the finite summations by infinite series as specified in Theorem 1.

Appendix B: the Dirichlet distribution

Consider a random vector X=(X 1,…,X K), with K≥2, defined over the (K−1)-simplex


If X~Dirichlet(α 1,…,α K), where a i>0, for i=1,…,K, then the probability density function of X is given by


where the normalizing term B(a 1,…,a K) is the multivariate generalization of the Beta function:


The shape of the Dirichlet distributions controlled by the concentration parameter Δ=i=1Kai and the base measure a1,,aK=(a1/Δ,,aK/Δ). Note that the base measure is a valid discrete probability measure. It can be shown easily that


so that the base measure provides the “central” value around which X is distributed. In particular, large components in the base measure bias the distribution in their direction.

The concentration parameters, on the other hand, control the variance of the distribution around the base measure, with large values indicating smaller variance. In fact, it can be shown that [30]


From the previous equations, one can see that, as δ approaches infinity, variances converge to zero and X becomes equal to the base measure with probability 1; in addition, covariances also go to zero, rendering the components of X uncorrelated. The special case a i=1, for all i=1,…,K corresponds to a uniform over S K−1. This corresponds to a uniform base measure and concentration parameter Δ=K. If the base measure is not uniform but Δ=K, the distribution is approximately uniform. For δ approaching zero, the distribution becomes concentrated at the boundary of the simplex.

Summing up, large δ implies large probability density around the base measure, Δ=K implies a nearly uniform distribution, whereas δ close to zero produces sparse sample vectors with most of the components close to zero.

The Dirichlet distribution is the multivariate generalization of the Beta distribution, in the sense that the components of a Dirichlet-distributed vector X=(X 1,…,X K) are Beta distributed: X i~Beta(a i,Δa i), for i=1,…,K. Notice that in the case K=2 the Dirichlet distribution essentially reduces to the Beta distribution.


The authors acknowledge the support of the National Science Foundation, through NSF award CCF-1320884.


Competing interests

The authors declare that they have no competing interests.

About the Authors

Ting Chen received the PhD degree in electrical engineering from Texas A&M University, College Station, TX, in 2013. She worked as a postdoctoral researcher in the Department of Electrical and Computer Engineering at Texas A&M University from October 2013 to April 2014. She is now working as a bioinformatician at the Emmes Corporation, Rockville, MD. Her current research interests include the study of discrete prediction and estimation methods and their applications in genomics.

Ulisses M. Braga-Neto received the PhD degree in Electrical and Computer Engineering from The Johns Hopkins University, Baltimore, MD, in 2002. He is an Associate Professor in the Department of Electrical and Computer Engineering at Texas A&M University, College Station, TX, where he has been a faculty member since 2007. He was a post-doctoral fellow at the University of Texas M.D. Anderson Cancer Center, Houston, from 2002 to 2004, and a researcher at the Oswaldo Cruz Foundation (FIOCRUZ), in Recife, Brazil, from 2004 to 2006. Dr. Braga-Neto received an NSF CAREER Award in 2008 for his work on error estimation for pattern recognition with applications in genomic signal processing. He is a Senior Member of IEEE, was President of the Midsouth Computational Biology and Bioinformatics Society (MCBIOS) in 2010–2011, and was voted to the Texas A&M University Council of Principal Investigators in 2014. He has been associate editor for several journals and special issues. His research interests include signal processing for Boolean dynamical systems and error estimation for pattern recognition, with applications in the study of cancer and infectious diseases.

Contributor Information

Ting Chen, moc.liamg@xrl.nehC.

Ulisses M. Braga-Neto, ude.umat.ece@sessilu.


1. Kauffman S. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor. Biol. 1969;22(3):437–467. doi: 10.1016/0022-5193(69)90015-0. [PubMed] [Cross Ref]
2. Kauffman S. The Origins of Order: Self-Organization and Selection in Evolution. New York, NY: Oxford University Press; 1993.
3. Bornholdt S. Boolean network models of cellular regulation: prospects and limitations. J. R. Soc. Interface. 2008;5(1):S85—S94. [PubMed]
4. Albert R, Othmer H. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in drosophila melanogaster. J. Theor. Biol. 2003;223(1):1–18. doi: 10.1016/S0022-5193(03)00035-3. [PubMed] [Cross Ref]
5. Li F, T Long YLu, Ouyang Q, Tang C. The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. U.S.A. 2004;101(14):4781–4876. doi: 10.1073/pnas.0305937101. [PubMed] [Cross Ref]
6. Faure A, Naldi A, Chaouiya C, Thieffry D. Dynamical analysis of a generic boolean model for the control of the mammalian cell cycle. Bionformatics. 2006;22(14):124–131. doi: 10.1093/bioinformatics/btl210. [PubMed] [Cross Ref]
7. Dougherty ER, Kim S, Chen Y. Coefficient of determination in nonlinear signal processing. EURASIP J. Signal Process. 2000;80(10):2219–2235. doi: 10.1016/S0165-1684(00)00079-7. [Cross Ref]
8. Kim S, Dougherty ER, Chen Y, Sivakumar K, Meltzer P, Trent JM, Bittner M. Multivariate measurement of gene expression relationships. Genom. 2000;67(2):201–209. doi: 10.1006/geno.2000.6241. [PubMed] [Cross Ref]
9. Zhou X, Wang X, Dougherty ER. Binarization of microarray data based on a mixture model. Mol. Cancer Ther. 2003;2(7):679–684. [PubMed]
10. Kim S, Dougherty ER, Bittner ML, Chen Y, Sivakumar K, Meltzer P, Trent JM. General nonlinear framework for the analysis of gene interaction via multivariate expression arrays. J. Biomed. Opt. 2000;5(4):411–424. doi: 10.1117/1.1289142. [PubMed] [Cross Ref]
11. Shmulevich I, Dougherty ER, Kim S, Zhang W. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinforma. 2002;18(2):261–274. doi: 10.1093/bioinformatics/18.2.261. [PubMed] [Cross Ref]
12. Martins D, Braga-Neto U, Hashimoto R, Bittner M, Dougherty ER. Intrinsically multivariate predictive genes. IEEE J. Sel. Top. Sign. Proces. 2008;2(3):424–439. doi: 10.1109/JSTSP.2008.923841. [Cross Ref]
13. Chen T, Braga-Neto UM. Statistical detection of intrinsically multivariate predictive genes. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015;12(4):951–964. doi: 10.1109/TCBB.2014.2377731. [PubMed] [Cross Ref]
14. T Chen, UM Braga-Neto, Exact performance of CoD estimators in discrete prediction. EURASIP J. Adv. Signal Process (2010). (Article ID 2010:487893).
15. Chen T, Braga-Neto UM. Maximum-likelihood estimation of the discrete coefficient of determination in stochastic boolean systems. IEEE Trans. Signal Process. 2013;61(15):3880–3894. doi: 10.1109/TSP.2013.2264054. [Cross Ref]
16. Chen T, Braga-Neto UM. Statistical detection of Boolean regulatory relationships. IEEE/ACM Trans. Comput. Biol. Bioinform. 2013;10(5):1310–1321. doi: 10.1109/TCBB.2013.118. [PubMed] [Cross Ref]
17. Dalton LA, Dougherty ER. Bayesian minimum mean-square error estimation for classification error – Part I: Definition and the Bayesian mmse error estimator for discrete classification. IEEE Trans. Signal Process. 2011;59(1):115–129. doi: 10.1109/TSP.2010.2084572. [Cross Ref]
18. Dalton LA, Dougherty ER. Bayesian minimum mean-square error estimation for classification error – Part II: Linear classification of gaussian models. IEEE Trans. Signal Process. 2011;59(1):130–144. doi: 10.1109/TSP.2010.2084573. [Cross Ref]
19. Chen T, Braga-Neto UM. In Proceedings of the 2013 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS’2013) Houston: TX; 2013. Optimal Bayesian MMSE estimation of the coefficient of determination for discrete prediction; pp. 66–69.
20. Dalton LA, Dougherty ER. Optimal classifiers with minimum expected error within a Bayesian framework – Part I: Discrete and gaussian models. Pattern Recogn. 2013;46(5):1301–1314. doi: 10.1016/j.patcog.2012.10.018. [Cross Ref]
21. Dalton LA, Dougherty ER. Optimal classifiers with minimum expected error within a Bayesian framework – Part II: Properties and performance analysis. Pattern Recogn. 2013;46(5):1288–1300. doi: 10.1016/j.patcog.2012.10.019. [Cross Ref]
22. Devroye L, Gyorfi L, Lugosi G. A Probabilistic Theory of Pattern Recognition. New York: Springer; 1996.
23. Casella G, Berger R. Statistical Inference. Duxbury: Pacific Grove, CA; 2002.
24. Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Sampas N, Dougherty ER, Marincola F, Wang E, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V, Hayward N, Trent J. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature. 2000;406:536–540. doi: 10.1038/35020115. [PubMed] [Cross Ref]
25. Kim S, Dougherty ER, Cao N, Chen Y, Bittner M, Suh E. Can markov chain models mimic biological regulation? J. Biol. Syst. 2002;10:437–458. doi: 10.1142/S0218339002000676. [Cross Ref]
26. Datta A, Choudhary A, Bittner M, Dougherty ER. External control in markovian genetic regulatory networks. Mach. Learn. 2003;52:169–191. doi: 10.1023/A:1023909812213. [Cross Ref]
27. Braga-Neto UM, Dougherty ER. Error Estimation for Pattern Recognition. New York: Wiley; 2015.
28. Ross S. A first course in probability. New York: Macmillan; 1994.
29. Arfken G. Mathematical Methods for Physicists. Orlando, FL: Academic Press; 1985.
30. Balakrishnan N, Nevzorov V. A Primer on Statistical Distributions. Hoboken, NJ: Wiley; 2003.

Articles from EURASIP Journal on Bioinformatics and Systems Biology are provided here courtesy of Springer