Stat Probab Lett. Author manuscript; available in PMC 2011 January 1.
Published in final edited form as:
Stat Probab Lett. 2010 January 1; 80(1): 57–62.
PMCID: PMC2786189
NIHMSID: NIHMS149906

# A Note on the Existence of the Posteriors for One-way Random Effect Probit Models

## Abstract

The existence of the posterior distribution for one-way random effect probit models has been investigated when the uniform prior is applied to the overall mean and a class of noninformative priors are applied to the variance parameter. The sufficient conditions to ensure the propriety of the posterior are given for the cases with replicates at some factor levels. It is shown that the posterior distribution is never proper if there is only one observation at each factor level. For this case, however, a class of proper priors for the variance parameter can provide the necessary and sufficient conditions for the propriety of the posterior.

Keywords: Uniform Prior, Noninformative Prior, Probit Models, Propriety of Posteriors

## 1 Introduction

In Bayesian hierarchical models, the choice of priors for the variance components is crucial. The commonly used prior for a variance component is the inverse-Gamma distribution, which is conjugate for the variance component of a normal distribution. However, the specification of hyperparameters in the inverse-Gamma prior is often subjective. In many applications, little or no prior information about the variance components is available, and a “vague” or “noninformative” prior is used to reflect the uncertainty. One such vague prior is the so-called Inverse-Gamma(ε, ε) with ε a small number (see Spiegelhalter et al., 1994, 2003). The improper uniform prior for the standard deviation was also recommended by Gelman (2006). However, the uniform prior for the standard deviation and the Inverse-Gamma(ε, ε) as ε → 0 both belong to the class of improper priors $1/\delta^{a+1}$ for the variance component δ (Natarajan and McCulloch, 1995), where a is a real number.

Natarajan and McCulloch (1995) investigated the existence of the posteriors for a class of mixed models for binomial responses. Their model specification was

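$(y_i \mid u) \sim \mathrm{Bernoulli}\big(h(x_i\beta + z_i u)\big), \quad (u \mid \delta) \sim N_q(0, \delta I_q), \quad [\delta \mid a] \propto \frac{1}{\delta^{a+1}},$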

where h is a link function, and β and u are the fixed and random effects, respectively. Although their model is based on a general h, the fixed effect parameter β is assumed to be known or to have a proper prior. They gave sufficient conditions to ensure the propriety of the joint posterior distribution of (β, u, δ). One condition concerns the likelihood (data outcomes). The other restricts the hyperparameter a to the interval (–1/2, 0), which excludes the commonly used priors $1/\delta$, $1/\sqrt{\delta}$, and the constant prior, corresponding to a = 0, –1/2, and –1, respectively.

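In this note, we work with the one-way random effect probit model for data $y_{ik}$ with factor-level effects $\alpha_i$:

$(y_{ik} \mid \mu, \alpha_i) \overset{\text{ind.}}{\sim} \mathrm{Bernoulli}\big(\Phi(\mu + \alpha_i)\big), \quad i = 1, \ldots, I, \; k = 1, \ldots, k_i, \qquad (\alpha_i \mid \delta) \overset{\text{i.i.d.}}{\sim} N(0, \delta), \quad i = 1, \ldots, I,$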
(1)

where $k_i \ge 1$. Model (1) belongs to the set of mixed models in Natarajan and McCulloch (1995), and has two hyperparameters μ and δ. The case here differs from Natarajan and McCulloch (1995) in that an improper uniform prior is used for the overall mean μ instead of assuming μ known or giving it a proper prior. Although we consider only the probit link, because of its ease of computation (Albert and Chib, 1993) and the practical similarity between the probit and logit links, it is not hard to generalize the results to a general link. For the variance component parameter δ, the same class of improper priors $1/\delta^{a+1}$ is considered first. This class of improper priors has been widely investigated for the propriety of the posterior in hierarchical linear mixed models (Hobert and Casella, 1996). In Section 2, we prove that if there are replicate observations at some factor levels, the condition on the likelihood can be relaxed and the range of valid a can be wider. We also show that if there is no replicate at any factor level, the posterior is never proper. For this case, in Section 3, we apply another class of priors incorporating a shrinkage effect, and give the necessary and sufficient conditions to ensure the propriety of the posterior.
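To make model (1) concrete, the following minimal Python sketch simulates binary responses from it. The function names `std_normal_cdf` and `simulate_probit_one_way`, their arguments, and the parameter values in the usage line are assumptions introduced here for illustration; they do not come from the original note.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_probit_one_way(mu, delta, replicates, seed=0):
    """Draw y[i][k] from model (1).

    mu         -- overall mean
    delta      -- variance of the factor-level effects alpha_i
    replicates -- list (k_1, ..., k_I) with every k_i >= 1
    """
    rng = random.Random(seed)
    data = []
    for k_i in replicates:
        alpha_i = rng.gauss(0.0, math.sqrt(delta))  # alpha_i ~ N(0, delta)
        p_i = std_normal_cdf(mu + alpha_i)          # success probability at level i
        data.append([1 if rng.random() < p_i else 0 for _ in range(k_i)])
    return data


# Hypothetical parameter values: four levels, one of them without replication.
y = simulate_probit_one_way(mu=0.3, delta=1.5, replicates=[3, 1, 2, 4])
print(y)
```

Each inner list holds the replicates of one factor level, so the cases treated in Sections 2.1 and 2.2 correspond to whether some inner list has more than one entry.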

## 2 Propriety of Posterior for Prior $[\mu, \delta] \propto 1/\delta^{a+1}$

We consider the class of priors of (μ, δ),

$[\mu, \delta] \propto \frac{1}{\delta^{a+1}},$
(2)

where a is a real number. Let τ = 1/δ. Then the corresponding prior for (μ, τ) is

$[\mu, \tau] \propto \tau^{a-1}.$
(3)

Clearly this prior is improper. To ensure correct Bayesian inferences, the posterior distribution must be proper. In Section 2.1, we study the propriety of the posterior distribution for the case that there are replicates at some factor levels. In Section 2.2, we investigate the case without replication.

### 2.1 Case A: Some $k_i > 1$

Theorem 2.1 For model (1) with prior (2), the joint posterior of (μ, δ) is proper if the following two conditions hold:

1. for each $i = 1, 2, \ldots, I_1$ ($2 \le I_1 \le I$), there is at least one success and one failure;
2. $-(I_1 - 1)/2 < a < 0$.

Proof. To ensure a proper posterior, we need to show

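$G \equiv \int L(\mu, \alpha)\, [\alpha \mid \tau]\, [\mu, \tau]\, d\mu\, d\alpha\, d\tau < \infty,$
(4)

where

$L(\mu, \alpha) = \prod_{i=1}^{I} \prod_{k=1}^{k_i} \Phi(\mu + \alpha_i)^{y_{ik}}\, \bar{\Phi}(\mu + \alpha_i)^{1 - y_{ik}}.$

From Condition (i),

$L(\mu, \alpha) \le \prod_{i=1}^{I_1} \Phi(\mu + \alpha_i)\, \bar{\Phi}(\mu + \alpha_i) \le \exp\Big\{-\frac{1}{2} \sum_{i=1}^{I_1} (\mu + \alpha_i)^2\Big\}.$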
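The second inequality rests on the scalar bound Φ(x)(1 − Φ(x)) ≤ exp(−x²/2), applied at each of the first I1 levels. A small numerical sanity check of that scalar bound, assuming a grid on [−6, 6] (the grid and the helper names are illustrative choices, not part of the proof), might look like this:

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bound_gap(x):
    """exp(-x^2 / 2) - Phi(x) * (1 - Phi(x)); nonnegative where the bound holds."""
    p = std_normal_cdf(x)
    return math.exp(-0.5 * x * x) - p * (1.0 - p)


# Evaluate the gap on an evenly spaced grid; a negative value would
# contradict the bound at that grid point.
grid = [i / 100.0 for i in range(-600, 601)]
min_gap = min(bound_gap(x) for x in grid)
print("smallest gap on the grid:", min_gap)
```

A grid check is only supporting evidence; the bound itself follows from the standard tail inequality 1 − Φ(x) ≤ exp(−x²/2)/2 for x ≥ 0 together with symmetry.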

Clearly,

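$G \le \int \exp\Big\{-\frac{1}{2} \sum_{i=1}^{I_1} (\mu + \alpha_i)^2\Big\}\, \tau^{I/2} \exp\Big\{-\frac{\tau}{2} \sum_{i=1}^{I} \alpha_i^2\Big\}\, \tau^{a-1}\, d\mu\, d\alpha\, d\tau \propto \int \exp\Big\{-\frac{1}{2} (\mu, \alpha_1, \ldots, \alpha_{I_1})\, H\, (\mu, \alpha_1, \ldots, \alpha_{I_1})'\Big\}\, \tau^{a + \frac{I_1}{2} - 1}\, d\mu\, d\alpha_1 \cdots d\alpha_{I_1}\, d\tau,$

where

$H = \begin{pmatrix} I_1 & 1 & 1 & \cdots & 1 \\ 1 & 1+\tau & 0 & \cdots & 0 \\ 1 & 0 & 1+\tau & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1+\tau \end{pmatrix}.$

It can be shown that $|H| = I_1 \tau (1 + \tau)^{I_1 - 1}$. Therefore, there exists a constant C such that

$G \le C \int \frac{\tau^{a + \frac{I_1}{2} - 1}}{|H|^{1/2}}\, d\tau \propto \int_0^\infty \frac{\tau^{a + \frac{I_1 - 3}{2}}}{(1 + \tau)^{\frac{I_1 - 1}{2}}}\, d\tau,$

which is integrable when Condition (ii) holds.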
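The determinant formula |H| = I1 τ (1 + τ)^(I1 − 1) used in the proof can be checked numerically. The sketch below builds H for a few hypothetical pairs (I1, τ) and compares a Gaussian-elimination determinant with the closed form; the helper names `determinant`, `build_h`, and `closed_form` are assumptions made for this illustration.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def build_h(i1, tau):
    """The (I1 + 1) x (I1 + 1) matrix H from the proof of Theorem 2.1."""
    size = i1 + 1
    h = [[0.0] * size for _ in range(size)]
    h[0][0] = float(i1)
    for j in range(1, size):
        h[0][j] = h[j][0] = 1.0
        h[j][j] = 1.0 + tau
    return h


def closed_form(i1, tau):
    """I1 * tau * (1 + tau)^(I1 - 1)."""
    return i1 * tau * (1.0 + tau) ** (i1 - 1)


cases = [(2, 0.5), (3, 2.0), (5, 0.1)]
for i1, tau in cases:
    print(i1, tau, determinant(build_h(i1, tau)), closed_form(i1, tau))
```

With this determinant, the integrand behaves like τ^(a + (I1 − 3)/2) near zero and like τ^(a − 1) at infinity, which is where the two sides of Condition (ii) come from.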

Theorem 2.1 has two consequences. First, not every level needs to have at least one success and one failure to guarantee a proper posterior. Second, if many levels have both a success and a failure, the range of valid a can be quite wide. For example, if both success and failure occur at 4 or more levels, the posterior is proper under the constant prior of (μ, δ), which corresponds to a = –1. More importantly, all valid values of a are less than zero, so the range excludes a = 0, the limit of the vague Inverse-Gamma(ε, ε) prior as ε → 0.
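As a rough illustration of how the two conditions might be checked on data, the sketch below counts the levels that contain both outcomes and tests whether a given a falls in the interval of Condition (ii). It assumes I1 is taken as the number of such levels (the largest admissible choice); the function names and the toy data sets are hypothetical.

```python
def levels_with_both_outcomes(data):
    """Count factor levels whose replicates contain a success and a failure."""
    return sum(1 for level in data if 0 in level and 1 in level)


def theorem_2_1_sufficient(data, a):
    """True if Conditions (i) and (ii) of Theorem 2.1 hold for exponent a.

    A False result only means the theorem does not apply: the conditions
    are sufficient, not necessary.
    """
    i1 = levels_with_both_outcomes(data)
    return i1 >= 2 and -(i1 - 1) / 2.0 < a < 0.0


# Hypothetical data: each inner list is one factor level.
data_four = [[1, 0], [0, 1, 1], [1, 0, 0], [0, 1], [1]]   # I1 = 4
data_three = [[1, 0], [0, 1, 1], [1, 0, 0], [1], [0]]     # I1 = 3

constant_prior_four = theorem_2_1_sufficient(data_four, a=-1.0)
constant_prior_three = theorem_2_1_sufficient(data_three, a=-1.0)
reciprocal_prior_four = theorem_2_1_sufficient(data_four, a=0.0)
print(constant_prior_four, constant_prior_three, reciprocal_prior_four)
```

The constant prior (a = –1) passes with four mixed levels but not with three, and the prior 1/δ (a = 0) is never covered, in line with the remarks above.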

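### 2.2 Case B: All $k_i = 1$

Theorem 2.2 Consider model (1) with prior (2). If all $k_i = 1$, then the joint posterior distribution of (μ, δ) is never proper.

Proof. For G defined in (4), by letting $\theta_i = \mu + \alpha_i$, for $i = 1, \ldots, I$, and $\theta = (\theta_1, \ldots, \theta_I)'$, we have

$G = \int \Big[\prod_{i=1}^{I} \Phi(\theta_i)^{y_i}\, \bar{\Phi}(\theta_i)^{1 - y_i}\Big] \frac{1}{\delta^{I/2 + a + 1}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} (\theta_i - \mu)^2\Big\}\, d\mu\, d\theta\, d\delta \propto \int \Big[\prod_{i=1}^{I} \Phi(\theta_i)^{y_i}\, \bar{\Phi}(\theta_i)^{1 - y_i}\Big] \frac{1}{\delta^{(I-1)/2 + a + 1}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\}\, d\theta\, d\delta.$
(5)

It is easy to show that the subset of $\mathbb{R}^I$ defined as $\Theta_{\mathrm{sub}} = \{(\theta_1, \ldots, \theta_I) : \theta_i > 0 \text{ if } y_i = 1;\ \theta_i < 0 \text{ if } y_i = 0, \text{ for } i = 1, \ldots, I\}$ satisfies

$(1/2)^I \le \prod_{i=1}^{I} \Phi(\theta_i)^{y_i}\, \bar{\Phi}(\theta_i)^{1 - y_i} \le 1, \quad \text{for } \theta \in \Theta_{\mathrm{sub}}.$
(6)

Let $w_i = \theta_i / \sqrt{\delta}$, for $i = 1, \ldots, I$. Define $W_{\mathrm{sub}} = \{w = (w_1, \ldots, w_I)' : \sqrt{\delta}\, w \in \Theta_{\mathrm{sub}}\}$. Then

$\int_0^\infty \int_{\Theta_{\mathrm{sub}}} \frac{1}{\delta^{(I-1)/2 + a + 1}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\}\, d\theta\, d\delta = \int_0^\infty \frac{1}{\delta^{1/2 + a}}\, d\delta \int_{W_{\mathrm{sub}}} \exp\Big\{-\frac{1}{2} \sum_{i=1}^{I} (w_i - \bar{w})^2\Big\}\, dw,$
(7)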

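which is not finite, because $\int_0^\infty \delta^{-(1/2 + a)}\, d\delta$ diverges for every real a: at infinity when $a \le 1/2$ and at zero when $a \ge 1/2$. Hence, by (5)-(7), the integral G does not exist, and the result follows.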
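The divergence of the δ-integral in (7) for every real a can be illustrated with the closed form of the truncated integral of δ^(−p) over [ε, M], where p = 1/2 + a. The sketch below lets ε shrink and M grow together and records the results; the function name and the chosen values of a are assumptions for illustration only.

```python
import math


def truncated_power_integral(p, lower, upper):
    """Closed form of the integral of delta**(-p) over [lower, upper]."""
    if p == 1.0:
        return math.log(upper / lower)
    return (upper ** (1.0 - p) - lower ** (1.0 - p)) / (1.0 - p)


# For each a, integrate over [10^-k, 10^k] with k = 2, 4, 8.
growth = {}
for a in (-1.0, -0.5, 0.0, 0.5, 2.0):
    p = 0.5 + a
    growth[a] = [
        truncated_power_integral(p, 10.0 ** -k, 10.0 ** k) for k in (2, 4, 8)
    ]
    print(a, growth[a])
```

For every a in the list the truncated integral keeps growing as the range widens, either because of the upper limit (a ≤ 1/2) or the lower limit (a ≥ 1/2).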

## 3 Propriety of Posterior for Prior $[\mu, \delta] \propto 1/(1+\delta)^{a}$

Theorem 2.2 shows that, without replication, the posterior distribution of (μ, δ) for model (1) is never proper under prior (2). Indeed, in the presence of an overall mean with the constant prior, prior (2) for δ favors variance values that are too low, which keeps the likelihood from contributing useful factors. We now consider another class of priors for (μ, δ),

$[\mu, \delta] \propto \frac{1}{(1 + \delta)^{a}}.$
(8)

This prior incorporates a shrinkage effect. Interestingly, when a = 2, this prior corresponds to the uniform shrinkage prior on the shrinkage parameter s = 1/(1+δ) for the normal-normal hierarchical model; see Daniels (1999). The uniform shrinkage prior has several attractive properties, as discussed by various authors, e.g., Strawderman (1971) and Daniels (1999). When a > 1, prior (8) is marginally proper for δ.
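The correspondence between a = 2 and a uniform prior on s = 1/(1+δ) follows from the change of variables ds = dδ/(1+δ)². A quick Monte Carlo sketch can illustrate it: draw s uniformly, transform to δ, and compare the empirical distribution with the CDF δ/(1+δ) implied by the density 1/(1+δ)². The sample size, seed, and function names are arbitrary choices made here.

```python
import random


def delta_from_shrinkage(s):
    """Invert s = 1 / (1 + delta)."""
    return 1.0 / s - 1.0


def implied_cdf(d):
    """CDF of the density 1 / (1 + delta)^2 on (0, infinity)."""
    return d / (1.0 + d)


rng = random.Random(1)
# 1 - random() lies in (0, 1], which avoids dividing by zero.
draws = [delta_from_shrinkage(1.0 - rng.random()) for _ in range(200_000)]

max_error = 0.0
for d in (0.5, 1.0, 4.0):
    empirical = sum(1 for x in draws if x <= d) / len(draws)
    max_error = max(max_error, abs(empirical - implied_cdf(d)))
    print(d, empirical, implied_cdf(d))
```

The empirical and implied CDF values should agree up to Monte Carlo error, which is small at this sample size.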

Theorem 3.1 Consider model (1) with prior (8) when all $k_i = 1$. The necessary and sufficient conditions to ensure the posterior propriety of (μ, δ) are

1. there is at least one success and one failure among the I levels (I ≥ 2);
2. a > 3/2.
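Because the two conditions are both necessary and sufficient, they translate directly into a yes/no check. The following sketch is one way to write it, assuming the outcomes are supplied as a flat list with one 0/1 entry per factor level; the function name is hypothetical.

```python
def theorem_3_1_proper(y, a):
    """Posterior propriety under prior (8) when every level has one observation.

    y -- list of 0/1 outcomes, one per factor level
    a -- exponent in the prior 1 / (1 + delta)^a
    """
    has_success = 1 in y
    has_failure = 0 in y
    return has_success and has_failure and a > 1.5


mixed_uniform_shrinkage = theorem_3_1_proper([1, 0, 1, 1], a=2.0)
all_successes = theorem_3_1_proper([1, 1, 1, 1], a=2.0)
boundary_exponent = theorem_3_1_proper([1, 0, 1, 1], a=1.5)
print(mixed_uniform_shrinkage, all_successes, boundary_exponent)
```

In particular, the uniform shrinkage prior (a = 2) yields a proper posterior as soon as the data contain at least one success and one failure.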

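Proof. For G defined in (4), by letting $\theta_i = \mu + \alpha_i$, for $i = 1, \ldots, I$, and $\theta = (\theta_1, \ldots, \theta_I)'$, we have

$G = \int \Big\{\prod_{i=1}^{I} \Phi(\theta_i)^{y_i}\, \bar{\Phi}(\theta_i)^{1 - y_i}\Big\} \frac{1}{\delta^{I/2}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} (\theta_i - \mu)^2\Big\} \frac{1}{(1 + \delta)^{a}}\, d\mu\, d\theta\, d\delta \propto \int \Big\{\prod_{i=1}^{I} \Phi(\theta_i)^{y_i}\, \bar{\Phi}(\theta_i)^{1 - y_i}\Big\} \frac{1}{\delta^{(I-1)/2}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\} \frac{1}{(1 + \delta)^{a}}\, d\theta\, d\delta.$
(9)

Necessity of (i) and (ii). By the inequality (6) and by (9), a necessary condition for G to be finite is

$G_1 \equiv \int_0^\infty \int_{\Theta_{\mathrm{sub}}} \frac{1}{\delta^{(I-1)/2}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\} \frac{1}{(1 + \delta)^{a}}\, d\theta\, d\delta < \infty.$

Let $w_i = \theta_i / \sqrt{\delta}$, for $i = 1, \ldots, I$. Define $W_{\mathrm{sub}} = \{w = (w_1, \ldots, w_I)' : \sqrt{\delta}\, w \in \Theta_{\mathrm{sub}}\}$. Then

$G_1 = G_2 \cdot G_3,$

where

$G_2 = \int_0^\infty \frac{\sqrt{\delta}}{(\delta + 1)^{a}}\, d\delta,$
(10)
$G_3 = \int_{W_{\mathrm{sub}}} \exp\Big\{-\frac{1}{2} \sum_{i=1}^{I} (w_i - \bar{w})^2\Big\}\, dw.$
(11)

Obviously, Condition (ii) is required to make $G_2$ finite.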
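For a > 3/2 the integral G2 in (10) equals the Beta function B(3/2, a − 3/2), which makes the role of Condition (ii) visible: the value grows without bound as a decreases to 3/2. The sketch below evaluates this closed form and cross-checks it at a = 3 with a midpoint rule after the substitution s = 1/(1+δ); the function names and the choice a = 3 are assumptions for illustration.

```python
import math


def g2_closed_form(a):
    """Beta(3/2, a - 3/2), the value of G2 when a > 3/2."""
    if a <= 1.5:
        raise ValueError("G2 diverges for a <= 3/2")
    return math.gamma(1.5) * math.gamma(a - 1.5) / math.gamma(a)


def g2_midpoint(a, n=100_000):
    """Midpoint rule for G2 after substituting s = 1 / (1 + delta).

    The substitution turns G2 into the integral over (0, 1) of
    sqrt(1 - s) * s**(a - 5/2).
    """
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        s = (j + 0.5) * h
        total += math.sqrt(1.0 - s) * s ** (a - 2.5)
    return total * h


print(g2_closed_form(3.0), g2_midpoint(3.0))
print([g2_closed_form(a) for a in (2.0, 1.6, 1.51)])
```

At a = 3 both routes give π/8, and the second printed list shows the closed form increasing sharply as a approaches 3/2 from above.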

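To prove the necessity of Condition (i), WLOG (without loss of generality) we consider only the case that all $y_i = 1$, that is, all outcomes are successes.

Suppose I = 2. Let $\Theta_{\mathrm{sub}} = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$. Then $1/2 \le \Phi(\theta_i) \le 1$, $i = 1, 2$, for $(\theta_1, \theta_2) \in \Theta_{\mathrm{sub}}$, and $W_{\mathrm{sub}} = \{(w_1, w_2) : w_1 > 0, w_2 > 0\}$. Clearly $G_3$ in (11) is just

$G_3 = \int_0^\infty \int_0^\infty \exp\Big\{-\frac{1}{2} \sum_{i=1}^{2} (w_i - \bar{w})^2\Big\}\, dw_1\, dw_2 = \int_0^\infty \int_0^\infty \exp\Big\{-\frac{1}{4} (w_1 - w_2)^2\Big\}\, dw_1\, dw_2 = \int_0^\infty \int_{-w_2}^{\infty} \exp\Big\{-\frac{1}{4} z_1^2\Big\}\, dz_1\, dw_2 \ge \int_0^\infty \int_0^\infty \exp\Big\{-\frac{1}{4} z_1^2\Big\}\, dz_1\, dw_2 = \infty.$
(12)

When I > 2, $\Theta_{\mathrm{sub}} = \{(\theta_1, \ldots, \theta_I) : \theta_i > 0, \text{ for } i = 1, \ldots, I\}$ and $W_{\mathrm{sub}} = \{(w_1, \ldots, w_I) : w_i > 0, \text{ for } i = 1, \ldots, I\}$. Using the equality

$\sum_{i=1}^{I} (w_i - \bar{w})^2 = \sum_{i=2}^{I} (w_i - \bar{w}_{(-1)})^2 + \frac{I-1}{I} \Big(w_1 - \frac{1}{I-1} \sum_{i=2}^{I} w_i\Big)^2,$

where $\bar{w}$ is the mean of all the $w_i$ and $\bar{w}_{(-1)}$ is the mean of all the $w_i$ excluding $w_1$, the integral $G_3$ can be written as

$\int_0^\infty \cdots \int_0^\infty G_4 \cdot \exp\Big\{-\frac{1}{2} \sum_{i=2}^{I} (w_i - \bar{w}_{(-1)})^2\Big\}\, dw_2 \cdots dw_I,$
(13)

where

$G_4 \equiv \int_0^\infty \exp\Big\{-\frac{I-1}{2I} \Big(w_1 - \frac{1}{I-1} \sum_{i=2}^{I} w_i\Big)^2\Big\}\, dw_1 \ge \int_0^\infty \exp\Big\{-\frac{I-1}{2I} z_1^2\Big\}\, dz_1.$

This inequality holds because all $w_i > 0$, for $i = 2, \ldots, I$. Combining (12) with (13), it is clear that the integral $G_3$ is not finite if $I \ge 2$ and all observations are successes. Thus, Condition (i) is required to make $G_3$ finite.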
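The sum-of-squares decomposition that leads to (13) is a purely algebraic identity, so it can be spot-checked on random vectors. The sketch below does this for vector lengths between 3 and 8; the helper names, the seed, and the sampling range are arbitrary choices for illustration.

```python
import random


def centered_ss(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def decomposed_ss(w):
    """Right-hand side of the identity: split off the contribution of w[0]."""
    n = len(w)
    rest = w[1:]
    rest_mean = sum(rest) / (n - 1)
    return centered_ss(rest) + (n - 1) / n * (w[0] - rest_mean) ** 2


rng = random.Random(7)
max_abs_diff = 0.0
for _ in range(1000):
    n = rng.randint(3, 8)
    w = [rng.uniform(0.0, 10.0) for _ in range(n)]
    max_abs_diff = max(max_abs_diff, abs(centered_ss(w) - decomposed_ss(w)))
print("largest discrepancy:", max_abs_diff)
```

Any discrepancy should be at the level of floating-point rounding, consistent with the two sides being equal for every w.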

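Sufficiency of (i) and (ii). When I = 2, WLOG, we assume $y_1 = 1$ and $y_2 = 0$. From (9),

$G \propto \int_0^\infty \int_{\mathbb{R}^2} \Phi(\theta_1)\, \bar{\Phi}(\theta_2)\, \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta} (\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\theta\, d\delta.$

We partition $\mathbb{R}^2$ for $(\theta_1, \theta_2)$ as $\Theta_1 \cup \Theta_2 \cup \Theta_3$ with $\Theta_1 = \{(\theta_1, \theta_2) : \theta_1 < 0, -\infty < \theta_2 < \infty\}$, $\Theta_2 = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$, and $\Theta_3 = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 < 0\}$. Correspondingly,

$G \propto \sum_{i=1}^{3} Q_i,$

where

$Q_i \equiv \int_0^\infty \int_{\Theta_i} \Phi(\theta_1)\, \bar{\Phi}(\theta_2)\, \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta} (\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\theta\, d\delta.$

Note that

$Q_1 \le \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{0} \exp\Big\{-\frac{\theta_1^2}{2}\Big\} \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta} (\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\theta_1\, d\theta_2\, d\delta \le \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\Big\{-\frac{\theta_1^2}{2}\Big\} \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta} (\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\theta_1\, d\theta_2\, d\delta = \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\Big\{-\frac{1}{2} (\theta_1, \theta_2)\, H\, (\theta_1, \theta_2)'\Big\} \frac{1}{\sqrt{\delta}} \frac{1}{(\delta + 1)^{a}}\, d\theta_1\, d\theta_2\, d\delta,$

where

$H = \begin{pmatrix} 1 + \frac{1}{2\delta} & -\frac{1}{2\delta} \\ -\frac{1}{2\delta} & \frac{1}{2\delta} \end{pmatrix}.$

It is easy to show that $|H| = 1/(2\delta)$. Then there is a constant C such that

$Q_1 \le C \int_0^\infty \frac{1}{|H|^{1/2}} \frac{1}{\sqrt{\delta}} \frac{1}{(\delta + 1)^{a}}\, d\delta \propto \int_0^\infty \frac{1}{(\delta + 1)^{a}}\, d\delta,$

which is integrable when a > 1. Similarly, $Q_2 < \infty$ if a > 1. Finally,

$Q_3 \le \int_0^\infty \int_{-\infty}^{0} \int_0^\infty \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta} (\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\theta_1\, d\theta_2\, d\delta.$

Let $\theta_1 = r \sin\beta$ and $\theta_2 = r \cos\beta$. On $\Theta_3$, we have $0 < r < \infty$ and $\pi/2 < \beta < \pi$. Then

$Q_3 \le \int_0^\infty \int_{\pi/2}^{\pi} \int_0^\infty \frac{r}{\sqrt{\delta}} \exp\Big\{-\frac{r^2}{4\delta} (\sin\beta - \cos\beta)^2\Big\} \frac{1}{(\delta + 1)^{a}}\, dr\, d\beta\, d\delta \propto \int_0^\infty \frac{\sqrt{\delta}}{(\delta + 1)^{a}}\, d\delta \int_{\pi/2}^{\pi} \frac{1}{(\sin\beta - \cos\beta)^2}\, d\beta,$

which is integrable when Condition (ii) holds.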
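Two ingredients of the sufficiency argument are easy to verify numerically: the determinant |H| = 1/(2δ) of the 2 × 2 matrix used for Q1, and the finiteness of the angular integral in the bound for Q3. Since (sin β − cos β)² = 2 sin²(β − π/4), the angular integral over (π/2, π) works out to exactly 1. The sketch below checks both with hypothetical helper names and a plain midpoint rule.

```python
import math


def det_h(delta):
    """Determinant of the 2 x 2 matrix H used to bound Q1."""
    c = 1.0 / (2.0 * delta)
    return (1.0 + c) * c - c * c


def angular_integral(n=100_000):
    """Midpoint rule for the integral of 1 / (sin b - cos b)^2 over (pi/2, pi)."""
    lo, hi = math.pi / 2.0, math.pi
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        b = lo + (j + 0.5) * h
        total += 1.0 / (math.sin(b) - math.cos(b)) ** 2
    return total * h


for delta in (0.3, 1.0, 7.5):
    print(delta, det_h(delta), 1.0 / (2.0 * delta))
print(angular_integral())
```

With the angular factor bounded, the only remaining requirement for Q3 is the finiteness of the δ-integral, which is the same integral as G2 in (10) and therefore needs a > 3/2.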

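Suppose I > 2. WLOG, we again assume $y_1 = 1$ and $y_2 = 0$. Now,

$L(\mu, \alpha) \le \Phi(\mu + \alpha_1)\, \bar{\Phi}(\mu + \alpha_2).$

Then, there exists a constant C such that

$G \le C \int \Phi(\mu + \alpha_1)\, \bar{\Phi}(\mu + \alpha_2)\, \frac{1}{\delta^{I/2}} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{I} \alpha_i^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\mu\, d\alpha\, d\delta = C \int \Phi(\mu + \alpha_1)\, \bar{\Phi}(\mu + \alpha_2)\, \frac{1}{\delta} \exp\Big\{-\frac{1}{2\delta} \sum_{i=1}^{2} \alpha_i^2\Big\} \frac{1}{(\delta + 1)^{a}}\, d\mu\, d\alpha_1\, d\alpha_2\, d\delta,$

which is exactly the case for I = 2 with $y_1 = 1$ and $y_2 = 0$. Hence, Conditions (i) and (ii) are sufficient for $I \ge 2$.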

## Acknowledgements

The research was supported by grant SES-0720229 of the National Science Foundation and grants R01-MH071418 and R01-CA109675 of the National Institutes of Health. The authors would like to thank a referee for the comments and suggestions.

## References

• Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88:669–679.
• Daniels MJ. A prior for the variance in hierarchical models. The Canadian Journal of Statistics. 1999;27:567–578.
• Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1:515–533.
• Hobert JP, Casella G. The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association. 1996;91:1461–1473.
• Natarajan R, McCulloch C. A note on the existence of the posterior distribution for a class of mixed models for binomial responses. Biometrika. 1995;82:639–643.
• Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation, section 5.7.3. Wiley; Chichester: 2004.
• Spiegelhalter DJ, Thomas A, Best NG, Gilks WR, Lunn D. BUGS: Bayesian inference using Gibbs sampling. MRC Biostatistics Unit; Cambridge, England: 1994. p. 2003. www.mrc-bsu.cam.ac.uk/bugs/
• Strawderman W. Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics. 1971;42:385–388.
