HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
 
Stat Probab Lett. Author manuscript; available in PMC 2011 January 1.
Published in final edited form as:
Stat Probab Lett. 2010 January 1; 80(1): 57–62.
doi:  10.1016/j.spl.2009.09.012
PMCID: PMC2786189
NIHMSID: NIHMS149906

A Note on the Existence of the Posteriors for One-way Random Effect Probit Models

Abstract

The existence of the posterior distribution for one-way random effect probit models is investigated when the uniform prior is applied to the overall mean and a class of noninformative priors is applied to the variance parameter. Sufficient conditions for the propriety of the posterior are given for the case with replicates at some factor levels. It is shown that the posterior distribution is never proper if there is only one observation at each factor level. For this case, however, necessary and sufficient conditions for posterior propriety are given under a class of proper priors for the variance parameter.

Keywords: Uniform Prior, Noninformative Prior, Probit Models, Propriety of Posteriors

1 Introduction

In Bayesian hierarchical models, the choice of priors for the variance components is crucial. A commonly used prior for a variance component is the inverse-Gamma distribution, which is conjugate for the variance of a normal distribution. However, specifying the hyperparameters of the inverse-Gamma prior is often subjective. In many applications, little or no prior information about the variance components is available, and a “vague” or “noninformative” prior is used to reflect this uncertainty. One such vague prior is the so-called Inverse-Gamma(ε, ε) with ε a small number (see Spiegelhalter et al., 1994, 2003). The improper uniform prior for the standard deviation has also been recommended by Gelman (2006). Both the uniform prior for the standard deviation and the limit of Inverse-Gamma(ε, ε) as ε → 0 belong to the class of improper priors $1/\delta^{a+1}$ for the variance component δ, where a is a real number (Natarajan and McCulloch, 1995).
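For concreteness, the two vague priors just mentioned correspond to the following values of a. This short derivation is added for illustration; σ denotes the standard deviation, so δ = σ².

$$[\sigma] \propto 1,\ \delta = \sigma^2 \;\Rightarrow\; [\delta] \propto [\sigma]\,\Big|\frac{d\sigma}{d\delta}\Big| = \frac{1}{2\sqrt{\delta}} \propto \frac{1}{\delta^{1/2}}, \quad \text{so } a = -\tfrac{1}{2};$$

$$[\delta] \propto \delta^{-(\varepsilon+1)} e^{-\varepsilon/\delta} \;\longrightarrow\; \frac{1}{\delta} \quad (\varepsilon \to 0), \quad \text{so } a = 0.$$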

Natarajan and McCulloch (1995) investigated the existence of the posteriors for a class of mixed models for binomial responses. Their model specification was

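$$(y_i \mid \mathbf{u}) \sim \text{Bernoulli}\big(h(\mathbf{x}_i'\boldsymbol{\beta} + \mathbf{z}_i'\mathbf{u})\big), \qquad (\mathbf{u} \mid \delta) \sim N_q(\mathbf{0}, \delta I_q), \qquad [\delta \mid a] \propto \frac{1}{\delta^{a+1}},$$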

where h is a link function, and β and u are the fixed and random effects, respectively. Although their model allows a general h, the fixed effect parameter β is assumed either to be known or to have a proper prior. They gave sufficient conditions to ensure the propriety of the joint posterior distribution of (β, u, δ). One condition concerns the likelihood (the data outcomes). Another restricts the hyperparameter a to the interval (−1/2, 0), which excludes the commonly used priors $1/\delta$, $1/\sqrt{\delta}$, and the constant prior, corresponding to a = 0, −1/2, and −1, respectively.

In this note, we work with the one-way random effect probit model for data $y_{ik}$ with factor-level effects $\alpha_i$:

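$$(y_{ik} \mid \mu, \alpha_i) \overset{\text{ind.}}{\sim} \text{Bernoulli}\big(\Phi(\mu + \alpha_i)\big), \quad i = 1, \ldots, I, \; k = 1, \ldots, k_i; \qquad (\alpha_i \mid \delta) \overset{\text{i.i.d.}}{\sim} N(0, \delta), \quad i = 1, \ldots, I, \tag{1}$$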

where $k_i \ge 1$ and Φ denotes the standard normal distribution function. Model (1) belongs to the class of mixed models in Natarajan and McCulloch (1995) and has two hyperparameters, μ and δ. It differs from the setting of Natarajan and McCulloch (1995) in that an improper uniform prior is placed on the overall mean μ, instead of assuming μ known or giving it a proper prior. Although we consider only the probit link, because of its computational convenience (Albert and Chib, 1993) and the practical similarity between the probit and logit links, it is not hard to generalize the results to a general link. For the variance component δ, we first consider the same class of improper priors $1/\delta^{a+1}$. This class of improper priors has been widely investigated with respect to posterior propriety in hierarchical linear mixed models (Hobert and Casella, 1996). In Section 2, we prove that if there are replicate observations at some factor levels, the condition on the likelihood can be relaxed and the range of valid a can be widened. We also show that if there are no replicates at any factor level, the posterior is never proper. For this case, in Section 3, we apply another class of priors incorporating a shrinkage effect, and give necessary and sufficient conditions to ensure the propriety of the posterior.
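To make model (1) concrete, here is a minimal Python sketch that simulates data from it using only the standard library. The function name `simulate_one_way_probit` and its argument layout are illustrative assumptions, not part of the paper; `statistics.NormalDist().cdf` plays the role of Φ.

```python
import random
from statistics import NormalDist


def simulate_one_way_probit(mu, delta, k, seed=None):
    """Simulate binary data from model (1) (hypothetical helper).

    mu    -- overall mean
    delta -- variance of the level effects alpha_i
    k     -- list of replicate counts k_i, one entry per factor level
    Returns a list of lists: data[i][j] holds y_{i,j+1} in {0, 1}.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF, i.e. the probit link
    data = []
    for k_i in k:
        # random.gauss takes a standard deviation, so pass sqrt(delta)
        alpha_i = rng.gauss(0.0, delta ** 0.5)
        p_i = phi(mu + alpha_i)  # success probability shared by level i
        data.append([1 if rng.random() < p_i else 0 for _ in range(k_i)])
    return data


# Example: I = 4 levels, two with replicates and two without.
example = simulate_one_way_probit(mu=0.3, delta=1.0, k=[3, 2, 1, 1], seed=1)
print(example)
```

Passing `k=[1] * I` produces the no-replication setting of Section 2.2 and Section 3.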

2 Propriety of Posterior for Prior $[\mu, \delta] \propto 1/\delta^{a+1}$

We consider the class of priors of (μ, δ),

$$[\mu, \delta] \propto \frac{1}{\delta^{a+1}}, \tag{2}$$

where a is a real number. Let τ = 1/δ. Then the corresponding prior for (μ, τ) is

$$[\mu, \tau] \propto \tau^{a-1}. \tag{3}$$
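The step from (2) to (3) is a one-line change of variables. Written out (a worked equation added for illustration), with δ = 1/τ and |dδ/dτ| = τ⁻²:

$$[\mu, \tau] = [\mu, \delta]\Big|_{\delta = 1/\tau} \cdot \Big|\frac{d\delta}{d\tau}\Big| \propto \tau^{a+1} \cdot \tau^{-2} = \tau^{a-1}.$$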

Clearly this prior is improper. For Bayesian inference to be valid, the posterior distribution must therefore be proper. In Section 2.1, we study the propriety of the posterior distribution when there are replicates at some factor levels. In Section 2.2, we investigate the case without replication.

2.1 Case A: Some ki > 1

Theorem 2.1 For model (1) with prior (2), the joint posterior of (μ, δ) is proper if the following two conditions hold:

  (i) for each $i = 1, 2, \ldots, I_1$ ($2 \le I_1 \le I$), after relabeling the factor levels if necessary, there is at least one success and one failure among $y_{i1}, \ldots, y_{ik_i}$;
  (ii) $-(I_1 - 1)/2 < a < 0$.
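The two conditions of Theorem 2.1 are easy to check mechanically. The sketch below is an illustration only: `count_mixed_levels` and `theorem_2_1_sufficient` are hypothetical helper names, and the data layout (one list of 0/1 outcomes per factor level) is an assumption. Taking $I_1$ as the number of levels with both outcomes is the most favorable choice, since Condition (ii) only widens as $I_1$ grows.

```python
def count_mixed_levels(data):
    """Return I_1: the number of factor levels with at least one success and one failure."""
    return sum(1 for level in data if 0 < sum(level) < len(level))


def theorem_2_1_sufficient(data, a):
    """Check Conditions (i) and (ii) of Theorem 2.1 for prior (2).

    data -- list of lists of 0/1 outcomes, one inner list per factor level
    a    -- exponent in [mu, delta] proportional to 1 / delta^(a + 1)
    True means the sufficient conditions hold; False only means the
    theorem is silent, not that the posterior is improper.
    """
    i1 = count_mixed_levels(data)
    return i1 >= 2 and -(i1 - 1) / 2 < a < 0


data = [[1, 0, 1], [0, 1], [1], [0]]  # I = 4 levels, I_1 = 2 mixed levels
print(count_mixed_levels(data), theorem_2_1_sufficient(data, -0.25))  # 2 True
```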

Proof. To ensure a proper posterior, we need to show

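$$G \equiv \int L(\mu, \boldsymbol{\alpha})\, [\boldsymbol{\alpha} \mid \tau]\, [\mu, \tau] \, d\mu \, d\boldsymbol{\alpha} \, d\tau < \infty, \tag{4}$$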

where

$$L(\mu, \boldsymbol{\alpha}) = \prod_{i=1}^{I} \prod_{k=1}^{k_i} \Phi(\mu + \alpha_i)^{y_{ik}} \, \Phi(-\mu - \alpha_i)^{1 - y_{ik}}.$$

From Condition (i),

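$$L(\mu, \boldsymbol{\alpha}) \le \prod_{i=1}^{I_1} \Phi(\mu + \alpha_i)\, \Phi(-\mu - \alpha_i) \le \exp\Big\{-\frac{1}{2} \sum_{i=1}^{I_1} (\mu + \alpha_i)^2\Big\}.$$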

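Since $[\boldsymbol{\alpha} \mid \tau] \propto \tau^{I/2} \exp\{-\frac{\tau}{2}\sum_{i=1}^{I} \alpha_i^2\}$, and integrating out $\alpha_{I_1+1}, \ldots, \alpha_I$ contributes a factor proportional to $\tau^{-(I - I_1)/2}$, we have, up to multiplicative constants,

$$G \le \int \exp\Big\{-\frac{1}{2}\sum_{i=1}^{I_1} (\mu + \alpha_i)^2\Big\} \tau^{I/2} \exp\Big\{-\frac{\tau}{2}\sum_{i=1}^{I} \alpha_i^2\Big\} \tau^{a-1} \, d\mu \, d\boldsymbol{\alpha} \, d\tau \propto \int \exp\Big\{-\frac{1}{2} (\mu, \alpha_1, \ldots, \alpha_{I_1})\, H \,(\mu, \alpha_1, \ldots, \alpha_{I_1})'\Big\} \tau^{a + I_1/2 - 1} \, d\mu \, d\alpha_1 \cdots d\alpha_{I_1} \, d\tau,$$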

where

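$$H = \begin{pmatrix} I_1 & 1 & 1 & \cdots & 1 \\ 1 & 1+\tau & 0 & \cdots & 0 \\ 1 & 0 & 1+\tau & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1+\tau \end{pmatrix}.$$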

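It can be shown that $|H| = I_1 \tau (1 + \tau)^{I_1 - 1}$; for example, taking the Schur complement of the lower-right block $(1+\tau) I_{I_1}$ gives $|H| = (1+\tau)^{I_1}\{I_1 - I_1/(1+\tau)\}$. Therefore, there exists a constant C such that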

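$$G \le C \int_0^\infty \tau^{a + I_1/2 - 1} |H|^{-1/2} \, d\tau \propto \int_0^\infty \tau^{a + (I_1 - 3)/2} (1 + \tau)^{-(I_1 - 1)/2} \, d\tau,$$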

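which is integrable when Condition (ii) holds: the integrand behaves like $\tau^{a + (I_1 - 3)/2}$ as τ → 0 and like $\tau^{a - 1}$ as τ → ∞, so the integral is finite exactly when $-(I_1 - 1)/2 < a < 0$.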
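The closed form for |H| can be checked exactly with rational arithmetic. This is a minimal sketch assuming only Python's `fractions` module; `det` and `h_matrix` are hypothetical helper names, and the determinant is computed by plain Gaussian elimination rather than the Schur-complement argument above.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant via Gaussian elimination over rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # find a nonzero pivot in this column
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def h_matrix(i1, tau):
    """The (I_1 + 1) x (I_1 + 1) matrix H from the proof of Theorem 2.1."""
    tau = Fraction(tau)
    h = [[Fraction(0)] * (i1 + 1) for _ in range(i1 + 1)]
    h[0][0] = Fraction(i1)
    for j in range(1, i1 + 1):
        h[0][j] = h[j][0] = Fraction(1)
        h[j][j] = 1 + tau
    return h


# Compare with the closed form |H| = I_1 * tau * (1 + tau)^(I_1 - 1).
for i1 in range(1, 6):
    for tau in (Fraction(1, 3), Fraction(2), Fraction(7, 2)):
        assert det(h_matrix(i1, tau)) == i1 * tau * (1 + tau) ** (i1 - 1)
print("closed form matches for I_1 = 1..5")
```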

Theorem 2.1 has two implications. First, it is not necessary for every level to have at least one success and one failure to guarantee a proper posterior. Second, the more levels there are with both a success and a failure, the wider the admissible range of a. For example, if both a success and a failure occur at 4 or more levels, the posterior is proper under the constant prior of (μ, δ), since a = −1 satisfies Condition (ii) once $I_1 \ge 4$. More importantly, all admissible values of a are negative, so the valid priors in (2) do not include a = 0, the limit of the vague Inverse-Gamma(ε, ε) prior as ε → 0.
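Condition (ii) can be inverted to give the smallest number of mixed levels $I_1$ needed for a given a: it is equivalent to $I_1 > 1 - 2a$. The helper below is a hypothetical illustration of that arithmetic, not something from the paper; it also enforces $I_1 \ge 2$ from Condition (i).

```python
import math


def min_mixed_levels(a):
    """Smallest I_1 with -(I_1 - 1)/2 < a < 0 (Condition (ii) of Theorem 2.1).

    Condition (ii) is equivalent to I_1 > 1 - 2a; Condition (i) needs I_1 >= 2.
    Returns None when a >= 0, where no I_1 satisfies Condition (ii).
    """
    if a >= 0:
        return None
    return max(2, math.floor(1 - 2 * a) + 1)


for label, a in [("constant prior", -1.0), ("uniform on sd", -0.5), ("a = -0.1", -0.1)]:
    print(label, min_mixed_levels(a))
```

For the constant prior this reproduces the "4 or more levels" statement, and for the uniform prior on the standard deviation (a = −1/2) it gives 3.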

2.2 Case B: All ki = 1

Theorem 2.2 Consider model (1) with prior (2). If all ki = 1, then the joint posterior distribution of (μ, δ) is never proper.

Proof. For G defined in (4), by letting θi = μ + αi, for i = 1, . . . , I, and θ = (θ1, . . . , θI)′, we have

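$$G \propto \int \Big[\prod_{i=1}^{I} \Phi(\theta_i)^{y_i} \Phi(-\theta_i)^{1-y_i}\Big] \frac{1}{\delta^{I/2 + a + 1}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} (\theta_i - \mu)^2\Big\} \, d\mu \, d\boldsymbol{\theta} \, d\delta \propto \int \Big[\prod_{i=1}^{I} \Phi(\theta_i)^{y_i} \Phi(-\theta_i)^{1-y_i}\Big] \frac{1}{\delta^{(I-1)/2 + a + 1}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\} \, d\boldsymbol{\theta} \, d\delta. \tag{5}$$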

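Here $y_i$ denotes the single observation $y_{i1}$ at level i and $\bar{\theta} = I^{-1}\sum_{i=1}^{I} \theta_i$. It is easy to show that the subset of $\mathbb{R}^I$ defined as $\Theta_{\text{sub}} = \{(\theta_1, \ldots, \theta_I) : \theta_i > 0 \text{ if } y_i = 1;\ \theta_i < 0 \text{ if } y_i = 0, \text{ for } i = 1, \ldots, I\}$ satisfies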

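$$\Big(\frac{1}{2}\Big)^{I} \le \prod_{i=1}^{I} \Phi(\theta_i)^{y_i} \Phi(-\theta_i)^{1-y_i} \le 1, \quad \text{for } \boldsymbol{\theta} \in \Theta_{\text{sub}}. \tag{6}$$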

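Let $w_i = \theta_i/\sqrt{\delta}$, for i = 1, . . . , I. Define $W_{\text{sub}} = \{\mathbf{w} = (w_1, \ldots, w_I) : \sqrt{\delta}\,\mathbf{w} \in \Theta_{\text{sub}}\}$; because $\Theta_{\text{sub}}$ is defined only through the signs of the $\theta_i$, $W_{\text{sub}} = \Theta_{\text{sub}}$ does not depend on δ. Then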

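$$\int_0^\infty \int_{\Theta_{\text{sub}}} \frac{1}{\delta^{(I-1)/2 + a + 1}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\} \, d\boldsymbol{\theta} \, d\delta = \int_0^\infty \frac{1}{\delta^{1/2 + a}} \, d\delta \int_{W_{\text{sub}}} \exp\Big\{-\frac{1}{2}\sum_{i=1}^{I} (w_i - \bar{w})^2\Big\} \, d\mathbf{w}, \tag{7}$$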

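which is infinite for every real a: the first factor diverges at δ = 0 when a ≥ 1/2 and as δ → ∞ when a ≤ 1/2, while the second factor is positive. By (6), the integrand in (5) is at least $(1/2)^I$ times that in (7) on $\Theta_{\text{sub}}$, so G is infinite by (5)-(7). The result follows.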
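The divergence of the δ-factor in (7) can be seen by truncating it to [ε, M] and using its closed form. The sketch below is illustrative (the helper name is hypothetical): for every a the truncated value keeps growing as the window widens, because the exponent a + 1/2 can never be large enough at infinity and small enough at zero at the same time.

```python
import math


def truncated_delta_integral(a, eps, upper):
    """Closed form of the integral of delta^-(a + 1/2) over [eps, upper].

    This is the delta-factor in (7), truncated to 0 < eps < upper < infinity.
    """
    p = a + 0.5
    if math.isclose(p, 1.0):
        return math.log(upper / eps)
    return (upper ** (1 - p) - eps ** (1 - p)) / (1 - p)


# Widen the truncation window: the value keeps growing for every a.
for a in (-1.0, 0.0, 0.5, 1.0, 2.0):
    values = [truncated_delta_integral(a, 10.0 ** -k, 10.0 ** k) for k in (2, 4, 8)]
    print(a, values)
```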

3 Propriety of Posterior for Prior $[\mu, \delta] \propto 1/(1+\delta)^{a}$

Theorem 2.2 shows that, under prior (2), the posterior distribution of (μ, δ) is never proper for model (1) without replication. Indeed, in the presence of an overall mean with the constant prior, prior (2) for δ tends to favor overly small variance values, which prevents the likelihood from contributing enough information. We now consider another class of priors for (μ, δ),

$$[\mu, \delta] \propto \frac{1}{(1 + \delta)^{a}}. \tag{8}$$

This prior incorporates a shrinkage effect. Notably, when a = 2, it corresponds to the uniform shrinkage prior on the shrinkage parameter s = 1/(1 + δ) for the normal-normal hierarchical model (see Daniels, 1999). The uniform shrinkage prior has several attractive properties, as discussed by various authors, e.g., Strawderman (1971) and Daniels (1999). When a > 1, prior (8) is proper as a marginal prior for δ, since $\int_0^\infty (1+\delta)^{-a}\, d\delta = 1/(a-1)$.
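The correspondence between a = 2 and the uniform shrinkage prior can be checked by simulation: draw s uniformly, set δ = 1/s − 1, and compare the empirical distribution of δ with the normalized CDF of prior (8), which for a > 1 is $1 - (1+d)^{-(a-1)}$. This Monte Carlo sketch is an illustration only; the function names, sample size, and seed are arbitrary choices.

```python
import random


def prior8_cdf(d, a=2.0):
    """CDF of prior (8) normalized on delta > 0 (requires a > 1): 1 - (1 + d)^-(a - 1)."""
    return 1.0 - (1.0 + d) ** (-(a - 1.0))


def uniform_shrinkage_cdf_estimate(d, n=100_000, seed=0):
    """Estimate P(delta <= d) when s = 1/(1 + delta) is uniform on (0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        s = 1.0 - rng.random()  # lies in (0, 1], so 1/s is always defined
        delta = 1.0 / s - 1.0
        if delta <= d:
            hits += 1
    return hits / n


for d in (0.5, 1.0, 3.0):
    print(d, prior8_cdf(d), uniform_shrinkage_cdf_estimate(d))
```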

Theorem 3.1 Consider model (1) with prior (8) when all ki = 1. The necessary and sufficient conditions to ensure the posterior propriety of (μ, δ) are

  (i) there is at least one success and one failure among the I levels (I ≥ 2);
  (ii) a > 3/2.

Proof. For G defined in (4), by letting θi = μ + αi, for i = 1, . . . , I, and θ = (θ1, . . . , θI)′, we have

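$$G \propto \int \Big\{\prod_{i=1}^{I} \Phi(\theta_i)^{y_i} \Phi(-\theta_i)^{1-y_i}\Big\} \frac{1}{\delta^{I/2}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} (\theta_i - \mu)^2\Big\} \frac{1}{(1+\delta)^{a}} \, d\mu \, d\boldsymbol{\theta} \, d\delta \propto \int \Big\{\prod_{i=1}^{I} \Phi(\theta_i)^{y_i} \Phi(-\theta_i)^{1-y_i}\Big\} \frac{1}{\delta^{(I-1)/2}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\} \frac{1}{(1+\delta)^{a}} \, d\boldsymbol{\theta} \, d\delta. \tag{9}$$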

Necessity of (i) and (ii). By inequality (6) and by (9), a necessary condition for G to be finite is

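$$G_1 \equiv \int_0^\infty \int_{\Theta_{\text{sub}}} \frac{1}{\delta^{(I-1)/2}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} (\theta_i - \bar{\theta})^2\Big\} \frac{1}{(1+\delta)^{a}} \, d\boldsymbol{\theta} \, d\delta < \infty.$$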

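Let $w_i = \theta_i/\sqrt{\delta}$, for i = 1, . . . , I. Define $W_{\text{sub}} = \{\mathbf{w} = (w_1, \ldots, w_I) : \sqrt{\delta}\,\mathbf{w} \in \Theta_{\text{sub}}\}$. Then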

$$G_1 = G_2 G_3,$$

where

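$$G_2 = \int_0^\infty \frac{\sqrt{\delta}}{(\delta + 1)^{a}} \, d\delta, \tag{10}$$

$$G_3 = \int_{W_{\text{sub}}} \exp\Big\{-\frac{1}{2}\sum_{i=1}^{I} (w_i - \bar{w})^2\Big\} \, d\mathbf{w}. \tag{11}$$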

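The integrand of $G_2$ is bounded near δ = 0 and behaves like $\delta^{1/2 - a}$ as δ → ∞, so $G_2$ is finite if and only if a > 3/2. Hence Condition (ii) is required to make $G_2$ finite.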
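In fact $G_2$ in (10) is a beta integral of the second kind, $G_2 = B(3/2,\, a - 3/2) = \Gamma(3/2)\Gamma(a - 3/2)/\Gamma(a)$, which makes the threshold a > 3/2 explicit. The sketch below (hypothetical helper names) compares that closed form with a midpoint-rule evaluation after the substitution δ = u/(1 − u), which maps the integrand to $u^{1/2}(1-u)^{a - 5/2}$ on (0, 1).

```python
import math


def g2_closed_form(a):
    """G_2 in (10) as a beta integral: B(3/2, a - 3/2), finite only for a > 3/2."""
    if a <= 1.5:
        return math.inf  # integrand behaves like delta^(1/2 - a) as delta -> infinity
    return math.gamma(1.5) * math.gamma(a - 1.5) / math.gamma(a)


def g2_numeric(a, n=100_000):
    """Midpoint rule after substituting delta = u / (1 - u), u in (0, 1).

    The integrand becomes u^(1/2) * (1 - u)^(a - 5/2).
    """
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        u = (j + 0.5) * h
        total += math.sqrt(u) * (1.0 - u) ** (a - 2.5)
    return total * h


for a in (3.0, 4.0):
    print(a, g2_closed_form(a), g2_numeric(a))
print(g2_closed_form(1.5))  # inf
```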

To prove the necessity of Condition (i), we consider without loss of generality (WLOG) only the case in which all $y_i = 1$, that is, all outcomes are successes; the case of all failures is symmetric.

Suppose I = 2. Let $\Theta_{\text{sub}} = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$. Then $1/2 \le \Phi(\theta_i) \le 1$, i = 1, 2, for $(\theta_1, \theta_2) \in \Theta_{\text{sub}}$, and $W_{\text{sub}} = \{(w_1, w_2) : w_1 > 0, w_2 > 0\}$. Clearly $G_3$ in (11) is just

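$$G_3 = \int_0^\infty \int_0^\infty \exp\Big\{-\frac{1}{2}\sum_{i=1}^{2} (w_i - \bar{w})^2\Big\} \, dw_1 \, dw_2 = \int_0^\infty \int_0^\infty \exp\Big\{-\frac{1}{4}(w_1 - w_2)^2\Big\} \, dw_1 \, dw_2 = \int_0^\infty \int_{-w_2}^\infty \exp\Big\{-\frac{1}{4} z_1^2\Big\} \, dz_1 \, dw_2 \ge \int_0^\infty \int_0^\infty \exp\Big\{-\frac{1}{4} z_1^2\Big\} \, dz_1 \, dw_2 = \infty, \qquad (z_1 = w_1 - w_2). \tag{12}$$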

When I > 2, Θsub = {(θ1, . . . , θI) : θi > 0, for i = 1, . . . , I} and Wsub = {(w1, . . . , wI) : wi > 0, for i = 1, . . . , I}. Using the equality,

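$$\sum_{i=1}^{I} (w_i - \bar{w})^2 = \sum_{i=2}^{I} \big(w_i - \bar{w}_{(-1)}\big)^2 + \frac{I-1}{I}\Big(w_1 - \frac{1}{I-1}\sum_{i=2}^{I} w_i\Big)^2,$$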

where $\bar{w}$ is the mean of all the $w_i$ and $\bar{w}_{(-1)}$ is the mean of $w_2, \ldots, w_I$ (all the $w_i$ except $w_1$), the integral $G_3$ can be written as

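$$\int_0^\infty \cdots \int_0^\infty G_4 \exp\Big\{-\frac{1}{2}\sum_{i=2}^{I} \big(w_i - \bar{w}_{(-1)}\big)^2\Big\} \, dw_2 \cdots dw_I, \tag{13}$$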

where

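$$G_4 \equiv \int_0^\infty \exp\Big\{-\frac{I-1}{2I}\Big(w_1 - \frac{1}{I-1}\sum_{i=2}^{I} w_i\Big)^2\Big\} \, dw_1 \ge \int_0^\infty \exp\Big\{-\frac{I-1}{2I} z_1^2\Big\} \, dz_1.$$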

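This inequality holds because all $w_i > 0$ for i = 2, . . . , I, so under $z_1 = w_1 - \bar{w}_{(-1)}$ the range $w_1 \in (0, \infty)$ maps onto $(-\bar{w}_{(-1)}, \infty) \supset (0, \infty)$. Hence $G_4$ is bounded below by a positive constant, and the remaining integral in (13) has the same form as $G_3$ with I − 1 levels. Combining (12) with (13) by induction on I, the integral $G_3$ is not finite if I ≥ 2 and all observations are successes. Thus, Condition (i) is required to make $G_3$ finite.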
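For I = 2 with all successes, the divergence in (12) is linear: truncating the quadrant to the square $[0, M]^2$ and reducing to $z = w_1 - w_2$ gives $\int_{-M}^{M} (M - |z|)\, e^{-z^2/4}\, dz = 2\sqrt{\pi}\, M\, \mathrm{erf}(M/2) - 4(1 - e^{-M^2/4})$, which grows by about $2\sqrt{\pi}$ per unit of M. The sketch below (hypothetical helper names; the truncation is an illustrative device, not part of the proof) checks the closed form against a midpoint rule.

```python
import math


def g3_truncated_closed_form(m):
    """Integral of exp{-(w1 - w2)^2 / 4} over the square [0, m]^2."""
    return (2.0 * math.sqrt(math.pi) * m * math.erf(m / 2.0)
            - 4.0 * (1.0 - math.exp(-m * m / 4.0)))


def g3_truncated_numeric(m, n=20_000):
    """Midpoint rule for the same integral written as int_{-m}^{m} (m - |z|) exp(-z^2/4) dz."""
    h = 2.0 * m / n
    total = 0.0
    for j in range(n):
        z = -m + (j + 0.5) * h
        total += (m - abs(z)) * math.exp(-z * z / 4.0)
    return total * h


for m in (5.0, 10.0, 20.0):
    print(m, g3_truncated_closed_form(m), g3_truncated_numeric(m))
```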

Sufficiency of (i) and (ii). When I = 2, WLOG, we assume y1 = 1 and y2 = 0. From (9),

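$$G \propto \int_0^\infty \int_{\mathbb{R}^2} \Phi(\theta_1)\, \Phi(-\theta_2)\, \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta}(\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}} \, d\boldsymbol{\theta} \, d\delta.$$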

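We partition $\mathbb{R}^2$ for $(\theta_1, \theta_2)$ as $\Theta_1 \cup \Theta_2 \cup \Theta_3$ with $\Theta_1 = \{(\theta_1, \theta_2) : \theta_1 < 0, -\infty < \theta_2 < \infty\}$, $\Theta_2 = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$, and $\Theta_3 = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 < 0\}$. Correspondingly,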

$$G \propto \sum_{i=1}^{3} Q_i,$$

where

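$$Q_i \equiv \int_0^\infty \int_{\Theta_i} \Phi(\theta_1)\, \Phi(-\theta_2)\, \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta}(\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta + 1)^{a}} \, d\boldsymbol{\theta} \, d\delta.$$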

Since $\Phi(\theta_1) \le \exp\{-\theta_1^2/2\}$ for $\theta_1 < 0$ and $\Phi(-\theta_2) \le 1$, we have

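$$Q_1 \le \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{0} \exp\Big\{-\frac{\theta_1^2}{2}\Big\} \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta}(\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta+1)^{a}} \, d\theta_1 \, d\theta_2 \, d\delta \le \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\Big\{-\frac{\theta_1^2}{2}\Big\} \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta}(\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta+1)^{a}} \, d\theta_1 \, d\theta_2 \, d\delta = \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\Big\{-\frac{1}{2}(\theta_1, \theta_2)\, H \,(\theta_1, \theta_2)'\Big\} \frac{1}{\sqrt{\delta}} \frac{1}{(\delta+1)^{a}} \, d\theta_1 \, d\theta_2 \, d\delta,$$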

where

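$$H = \begin{pmatrix} 1 + \frac{1}{2\delta} & -\frac{1}{2\delta} \\ -\frac{1}{2\delta} & \frac{1}{2\delta} \end{pmatrix}.$$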
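For completeness, the determinant of this 2 × 2 matrix H and the δ-factor it leaves in the bound on $Q_1$ can be written out as follows. This worked step is added for illustration and uses the Gaussian integral $\int_{\mathbb{R}^2} \exp\{-\tfrac{1}{2}\boldsymbol{\theta}' H \boldsymbol{\theta}\}\, d\boldsymbol{\theta} = 2\pi |H|^{-1/2}$.

$$|H| = \Big(1 + \frac{1}{2\delta}\Big)\frac{1}{2\delta} - \Big(\frac{1}{2\delta}\Big)^2 = \frac{1}{2\delta}, \qquad 2\pi\, |H|^{-1/2}\, \frac{1}{\sqrt{\delta}} = 2\pi \sqrt{2\delta}\, \frac{1}{\sqrt{\delta}} = 2\sqrt{2}\,\pi,$$

$$\text{so that} \quad Q_1 \le 2\sqrt{2}\,\pi \int_0^\infty \frac{d\delta}{(\delta + 1)^{a}} = \frac{2\sqrt{2}\,\pi}{a - 1} \quad (a > 1).$$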

It is easy to show that |H| = 1/(2δ). Then there is a constant C such that

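$$Q_1 \le C \int_0^\infty |H|^{-1/2} \frac{1}{\sqrt{\delta}} \frac{1}{(\delta+1)^{a}} \, d\delta \propto \int_0^\infty \frac{1}{(\delta+1)^{a}} \, d\delta,$$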

which is integrable when a > 1. Similarly, since $\Phi(-\theta_2) \le \exp\{-\theta_2^2/2\}$ for $\theta_2 > 0$, $Q_2 < \infty$ if a > 1. Finally,

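$$Q_3 \le \int_0^\infty \int_{-\infty}^{0} \int_0^\infty \frac{1}{\sqrt{\delta}} \exp\Big\{-\frac{1}{4\delta}(\theta_1 - \theta_2)^2\Big\} \frac{1}{(\delta+1)^{a}} \, d\theta_1 \, d\theta_2 \, d\delta.$$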

Let θ1 = r sin β and θ2 = r cos β. On Θ3, we have 0 < r < ∞ and π/2 < β < π. Then

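$$Q_3 \le \int_0^\infty \int_{\pi/2}^{\pi} \int_0^\infty \frac{r}{\sqrt{\delta}} \exp\Big\{-\frac{r^2}{4\delta}(\sin\beta - \cos\beta)^2\Big\} \frac{1}{(\delta+1)^{a}} \, dr \, d\beta \, d\delta \propto \int_0^\infty \frac{\sqrt{\delta}}{(\delta+1)^{a}} \, d\delta \int_{\pi/2}^{\pi} \frac{1}{(\sin\beta - \cos\beta)^2} \, d\beta,$$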

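which is integrable when Condition (ii) holds: since $\sin\beta - \cos\beta = \sqrt{2}\sin(\beta - \pi/4) \ge 1$ for $\pi/2 < \beta < \pi$, the β-integral is finite, and the δ-integral is $G_2$ in (10), which is finite for a > 3/2.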
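The β-factor in the bound on $Q_3$ is in fact exactly 1: writing $\sin\beta - \cos\beta = \sqrt{2}\sin(\beta - \pi/4)$, an antiderivative of the integrand is $-\tfrac{1}{2}\cot(\beta - \pi/4)$, which changes by 1 between β = π/2 and β = π. A quick numerical confirmation in Python (the helper name is ours, not the paper's):

```python
import math


def beta_integral(n=10_000):
    """Midpoint rule for the integral of 1/(sin b - cos b)^2 over (pi/2, pi)."""
    lo, hi = math.pi / 2.0, math.pi
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        b = lo + (j + 0.5) * h
        total += 1.0 / (math.sin(b) - math.cos(b)) ** 2
    return total * h


print(beta_integral())
```

Because the integrand is bounded by 1 on this interval, the midpoint rule converges quickly; the bound $\sin\beta - \cos\beta \ge 1$ is what keeps it bounded.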

Suppose I > 2. WLOG, we again assume y1 = 1 and y2 = 0. Now,

$$L(\mu, \boldsymbol{\alpha}) \le \Phi(\mu + \alpha_1)\, \Phi(-\mu - \alpha_2).$$

Then, there exists a constant C such that

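$$G \le C \int \Phi(\mu + \alpha_1)\, \Phi(-\mu - \alpha_2) \frac{1}{\delta^{I/2}} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{I} \alpha_i^2\Big\} \frac{1}{(\delta+1)^{a}} \, d\mu \, d\boldsymbol{\alpha} \, d\delta \propto \int \Phi(\mu + \alpha_1)\, \Phi(-\mu - \alpha_2) \frac{1}{\delta} \exp\Big\{-\frac{1}{2\delta}\sum_{i=1}^{2} \alpha_i^2\Big\} \frac{1}{(\delta+1)^{a}} \, d\mu \, d\alpha_1 \, d\alpha_2 \, d\delta,$$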

which is exactly the case for I = 2 with y1 = 1 and y2 = 0. Hence, Conditions (i) and (ii) are sufficient for I ≥ 2.

Acknowledgements

The research was supported by grant SES-0720229 of the National Science Foundation and grants R01-MH071418 and R01-CA109675 of the National Institutes of Health. The authors would like to thank a referee for the comments and suggestions.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88:669–679.
  • Daniels MJ. A prior for the variance in hierarchical models. The Canadian Journal of Statistics. 1999;27:567–578.
  • Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1:515–533.
  • Hobert JP, Casella G. The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association. 1996;91:1461–1473.
  • Natarajan R, McCulloch C. A note on the existence of the posterior distribution for a class of mixed models for binomial responses. Biometrika. 1995;82:639–643.
  • Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation, section 5.7.3. Wiley; Chichester: 2004.
  • Spiegelhalter DJ, Thomas A, Best NG, Gilks WR, Lunn D. BUGS: Bayesian inference using Gibbs sampling. MRC Biostatistics Unit; Cambridge, England: 1994. p. 2003. www.mrc-bsu.cam.ac.uk/bugs/
  • Strawderman W. Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics. 1971;42:385–388.