Biometrika. 2010 September; 97(3): 603–620. Published online 2010 June 11. doi: 10.1093/biomet/asq031
On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems

Summary

This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of the parameter of interest θ, θ = (γ, η), H0 : γ = γ0, based on the pseudolikelihood L(θ, ϕ̂), where ϕ̂ is a consistent estimator of ϕ, the nuisance parameter. We show that the asymptotic distribution of T under H0 is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, θ0, or the true value of the nuisance parameter, ϕ0, lies on the boundary of the parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with a misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments.

Some key words: Asymptotic distribution, Boundary problem, Frailty survival model, Likelihood ratio test, Nuisance parameter, Pseudolikelihood, Teratological experiment, Variance component model

1. Introduction

1.1. Motivation

Very often a probability model for data is indexed by two sets of parameters: the parameter of interest, θ, of dimension p, and the nuisance parameter, ϕ, of dimension q. Sometimes the likelihood function L(θ, ϕ), when considered jointly as a function of θ and ϕ, is ill-behaved, yet when considered as a function of θ alone with ϕ held fixed, is well-behaved. When the nuisance parameter cannot be eliminated through conditioning, marginalizing or factorization, one approach is to use the pseudolikelihood proposed by Gong & Samaniego (1981). The key idea of this approach is that inference for θ can be based on L*(θ) = L(θ, ϕ̂), known as the pseudolikelihood for θ, where ϕ̂ is a consistent estimator of ϕ, assuming such an estimator exists. Gong & Samaniego derived the asymptotic distribution of the maximum pseudolikelihood estimator θ̂ of θ under regularity conditions. Thus, Wald-based inference for θ can be made based on this asymptotic distribution. However, it has been well documented in the literature that such inference can be ill-behaved, and that this concern can be alleviated by using likelihood-ratio-based inference (Hauck & Donner, 1977). To this end, Liang & Self (1996) studied the asymptotic behaviour of the likelihood ratio test statistic for testing H0 : θ = θ0 based on L*(θ). The primary purpose of this paper is to extend the work of Liang & Self (1996) to deal with situations where only a subset of θ is tested, and the parameter value of interest or the nuisance parameter may lie on the boundary of its parameter space.

1.2. Example 1: Variance component models

Consider the model

\[ y = X\phi + \varepsilon, \qquad \varepsilon \sim N\{0, \Sigma(\theta)\}, \]

where y is an n × 1 vector of response variables, X is the n × q design matrix and Σ(θ) is the n × n covariance matrix indexed by θ, a p × 1 vector. In pedigree analyses, it has been standard procedure to estimate ϕ by the conventional least-squares estimator ϕ̂ = (X^T X)^{-1} X^T y and to base inference for θ on L(θ, ϕ̂), where L(θ, ϕ) is proportional to the conditional distribution of Y = y given X. Often we are interested in testing the significance of multiple variance components one at a time, leaving the others unspecified (Beaty et al., 1985).
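
To make the construction concrete, here is a minimal sketch in Python of the pseudolikelihood for this example. It assumes, purely for illustration, a single variance component so that Σ(θ) = θZZ^T + I_n for a known matrix Z; the function name and data-generating choices are ours and not part of the original model.

```python
import numpy as np

def log_pseudolikelihood(theta, y, X, Z):
    """log L(theta, phi_hat) with phi_hat the least-squares estimator (Example 1).

    Assumes the illustrative structure Sigma(theta) = theta * Z Z^T + I_n.
    """
    n = len(y)
    phi_hat = np.linalg.solve(X.T @ X, X.T @ y)   # conventional least-squares estimator of phi
    Sigma = theta * (Z @ Z.T) + np.eye(n)         # assumed covariance structure
    resid = y - X @ phi_hat
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (logdet + resid @ np.linalg.solve(Sigma, resid))
```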

1.3. Example 2: Behrens–Fisher problem

Consider y_{ij} ~ N(θ_i, ϕ_i) (j = 1, …, n_i; i = 1, 2), where the y_{ij} are independent of each other. The Behrens–Fisher problem is that of testing the difference between the two means, i.e. H0 : θ1 = θ2. With ϕ_i replaced by ϕ̂_i = Σ_{j=1}^{n_i} (y_{ij} − ȳ_i)²/n_i, the sample variance of the ith group (i = 1, 2), the pseudolikelihood for (θ1, θ2) has the form

\[ L(\theta_1, \theta_2, \hat\phi_1, \hat\phi_2) \propto \prod_{i=1}^{2}\prod_{j=1}^{n_i} (2\pi\hat\phi_i)^{-1/2} \exp\left\{ -\frac{(y_{ij}-\theta_i)^2}{2\hat\phi_i} \right\}. \]

The null hypothesis of interest is H0 : γ = θ1 − θ2 = 0, leaving η = θ2 unspecified.
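
For this example the pseudolikelihood ratio statistic has a closed form: the restricted maximizer under H0 is the precision-weighted mean of the two group means. A small sketch, with a hypothetical function name:

```python
import numpy as np

def behrens_fisher_plr(y1, y2):
    """Pseudolikelihood ratio statistic T for H0: theta1 = theta2 (Example 2)."""
    ybar = np.array([y1.mean(), y2.mean()])
    n = np.array([len(y1), len(y2)])
    phi_hat = np.array([((y1 - ybar[0]) ** 2).mean(), ((y2 - ybar[1]) ** 2).mean()])
    w = n / phi_hat                                     # precision weights with phi fixed at phi_hat
    theta_tilde = np.sum(w * ybar) / np.sum(w)          # restricted estimate under H0
    return float(np.sum(w * (ybar - theta_tilde) ** 2)) # T = -2 log pseudolikelihood ratio
```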

1.4. Example 3: Frailty survival models

Bandeen-Roche & Liang (1996) described a class of models for failure-time data that accounts for multiple levels of clustering. This class of models permits specification of simple distributional forms for all bivariate margins. In addition, the distribution reduces to the univariate frailty model (Vaupel et al., 1979; Oakes, 1989) when there is only a single level. Consider a cluster with two levels; for example, households and villages. Suppose there are four individual members and the households are clustered as {1, 2} and {3, 4}. When the Clayton copula (Clayton, 1978) is used, the multivariate survival function has the form

\[ S(t_1,\ldots,t_4) = \left[ \left\{ \sum_{j=1}^{2} S_j(t_j)^{1-\theta_1} - 1 \right\}^{\alpha} + \left\{ \sum_{j=3}^{4} S_j(t_j)^{1-\theta_1} - 1 \right\}^{\alpha} - 1 \right]^{1/(1-\theta_2)}, \tag{1} \]

where α = (θ2 − 1)/(θ1 − 1), S_j is the marginal survival distribution of the jth member (j = 1, …, 4), and θ1 ≥ 1 and θ2 ≥ 1 characterize the association within and between households, respectively. To ensure that the above survival function is legitimate, one needs θ2 ≤ θ1, i.e. the between-household association must not exceed the within-household association. Let ϕ denote the parameters that specify the S_j. The pseudolikelihood approach to inference about θ has been proposed (Bandeen-Roche & Liang, 1996; Shih & Louis, 1995) because convenient estimators of ϕ are typically available, for example by maximizing the likelihood function with respect to ϕ assuming independence, i.e. fixing θ1 and θ2 at 1. One hypothesis of interest is H0 : θ2 = 1, i.e. no between-household association. Here θ2 = 1 is on the boundary of the parameter space for θ2.
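
The survival function in (1) is easy to evaluate numerically. The following sketch assumes, purely for illustration, unit-exponential margins S_j(t) = exp(−t); in this sketch θ1 ≥ θ2 > 1 is required so that α and the outer exponent are defined.

```python
import numpy as np

def nested_clayton_survival(t, theta1, theta2):
    """Evaluate the two-level survival function (1) for a cluster of four members.

    Households are {1, 2} and {3, 4}; requires theta1 >= theta2 > 1 in this sketch.
    The margins S_j(t) = exp(-t) are an illustrative assumption.
    """
    S = np.exp(-np.asarray(t, dtype=float))
    alpha = (theta2 - 1.0) / (theta1 - 1.0)
    h1 = S[0] ** (1 - theta1) + S[1] ** (1 - theta1) - 1.0   # within-household term, members 1-2
    h2 = S[2] ** (1 - theta1) + S[3] ** (1 - theta1) - 1.0   # within-household term, members 3-4
    return (h1 ** alpha + h2 ** alpha - 1.0) ** (1.0 / (1 - theta2))
```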

1.5. Example 4: Teratological experiments

In teratological experiments, the response variables are often dichotomous and their sums are commonly modelled by beta-binomial distributions (Weil, 1970; Haseman & Kupper, 1979). The distributions are indexed by π, the probability of experiencing an adverse event, and ϕ, the nuisance parameter characterizing the so-called litter effect, also known as overdispersion, the tendency of subjects from the same litter to respond more alike than subjects from different litters. Often investigators would like to establish the dose–response relationship between π and x, the dosage level or treatment status. One may consider the model

\[ g(\pi) = \beta_0 + \beta_1 x, \qquad \operatorname{var}(Y) = m\pi(1-\pi)\{1 + (m-1)\phi\}, \]

where g is a prespecified link function. For beta-binomial models, it has been well documented that the nuisance parameter has a profound impact on inference for the parameter of interest (Williams, 1988). One way to alleviate this concern is to base the inference for β on the pseudolikelihood L(β, ϕ̂), where ϕ̂ is a consistent estimator of ϕ. Such an estimator can be obtained, for example, by the method of moments. One hypothesis of interest is H0 : β1 = 0, i.e. no dose–response relationship. Since the intralitter correlation could be dose-dependent, heterogeneous correlations are often assumed. When the dosage level is zero or a placebo is used, the intralitter correlation is usually close to zero, lying near or on the boundary of its parameter space.

1.6. Example 5: A general setting

Consider the case where the likelihood can be decomposed into two parts,

L(θ,ϕ)=L1(θ,ϕ) L2(ϕ),

where both L1 and L2 are legitimate likelihood functions. In this case, one could use the maximizer of L2(ϕ), ϕ̂, to form the pseudolikelihood function L*(θ) for θ. Many examples fit into this case. It is easy to see that Example 2 above is a special case, with L2(ϕ) proportional to the product of the probability density functions of the sample variances from the two samples. Another example is a genetic linkage study, where the data consist of genetic marker data, M, and genotyping data, T, from pedigrees. In this situation, the likelihood function is proportional to

f(M|T; θ, ϕ)f(T;ϕ)=L1(θ,ϕ)L2(ϕ),

where θj, (j = 1, …, p), is the location of the jth disease gene, and ϕ are parameters characterizing an underlying genetic mechanism for the disease. In the context of measurement errors, the true covariate, X, is observed indirectly through surrogate, W, which is measured with errors. With Y as the dependent variable, the likelihood function is proportional to

\[ f(Y=y \mid W=w; \theta, \phi)\, f(W=w; \phi) = \int_x f(y \mid x; \theta) f(x \mid w; \phi)\, dx \int_x f(w \mid x; \phi) f(x; \phi)\, dx = L_1(\theta, \phi) L_2(\phi), \]

where θ is the regression coefficient relating X to Y and ϕ is the parameter characterizing the measurement errors for the covariates.

2. Main results

2.1. The parameter of interest and the nuisance parameter are interior points

In this subsection, we consider the cases where only a subset of the parameter of interest is tested while the true values of the parameter of interest and the nuisance parameter are interior points of their parameter spaces. Consider the partition of θ into two parts: the parameters to be tested, γ, of dimension p1, and the remaining parameters, η, of dimension p2 = p − p1. The null hypothesis of interest is specified as H0 : γ = γ0. Let T be the pseudolikelihood ratio test statistic, i.e. T = −2 log [L*{γ0, η̂(γ0)}/L*(γ̂, η̂)], where η̂(γ0) = arg max_η L*(γ0, η) and (γ̂, η̂) = arg max_{γ,η} L*(γ, η). Theorem 1 states the large-sample distribution of T under H0.

Theorem 1. Suppose that under a distribution indexed by (θ0, ϕ0), the consistent estimator of ϕ, ϕ̂, has the asymptotic distribution n^{1/2}(ϕ̂ − ϕ0) → N(0, Σ_22) as n → ∞.

Then, under the regularity conditions (A1)–(A6) in Gong & Samaniego (1981),

  1. the p × p matrix

\[ \left\{ I_{11}^{-1} - \begin{pmatrix} 0 & 0 \\ 0 & I_{\eta\eta}^{-1} \end{pmatrix} \right\} I_{11}^{*} \]

     has p1 positive eigenvalues, λ1 ≥ ⋯ ≥ λ_{p1} > 0, and p − p1 zero eigenvalues;
  2. the asymptotic distribution of the pseudolikelihood ratio statistic T is U = Σ_{j=1}^{p1} λ_j U_j, where the U_j are independent χ²_1 variables.

Here

\[ I_{11} = \lim_{n\to\infty}\left[ E\left\{ -\frac{1}{n}\frac{\partial^2 \log L(\theta_0,\phi_0)}{\partial\theta^2};\ \theta_0,\phi_0 \right\} \right], \qquad I_{\eta\eta} = \lim_{n\to\infty}\left[ E\left\{ -\frac{1}{n}\frac{\partial^2 \log L(\gamma_0,\eta_0,\phi_0)}{\partial\eta^2};\ \gamma_0,\eta_0,\phi_0 \right\} \right], \]

and I*_11 is the asymptotic variance of n^{−1/2} ∂ log L(θ0, ϕ̂)/∂θ, which can be calculated as I*_11 = I_11 + I_12 Σ_22 I_12^T with

\[ I_{12} = \lim_{n\to\infty}\left[ E\left\{ -\frac{1}{n}\frac{\partial^2 \log L(\theta_0,\phi_0)}{\partial\theta\,\partial\phi};\ \theta_0,\phi_0 \right\} \right]. \]

The proof is straightforward using Taylor expansions and Theorem 4.4.4 of Graybill (1976) and hence is omitted. Implicitly, we used the fact that n^{1/2}{θ̂(ϕ0) − θ0} and n^{1/2}(ϕ̂ − ϕ0) are asymptotically independent, because θ̂(ϕ0) is asymptotically efficient and n^{1/2}(ϕ̂ − ϕ0) has constant asymptotic mean zero (Pierce, 1982; Parke, 1986). When p1 = p, Theorem 1 reduces to the main result of Liang & Self (1996).

It is easy to show that when I*_11 = I_11, the distribution of U is χ²_{p1}, which is the same as the asymptotic distribution of the conventional likelihood ratio statistic. As noted by Liang & Self (1996), this condition is satisfied when I_12(θ0, ϕ) = 0 for all ϕ, which means that the score functions for θ and ϕ are orthogonal to each other when evaluated at (θ0, ϕ). Another situation in which I*_11 = I_11 is Σ_22 = 0. This can occur when ϕ is estimated from external data whose sample size n* increases at a faster rate than n, i.e. n*/n → ∞ as n → ∞. A less obvious situation that results in the distribution of U being χ²_{p1} occurs when I_{γη} = I_{γϕ} = 0. This means that the score function for γ is orthogonal both to the score function for η and to the score function for ϕ.
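
In practice the weights λ_j and a p-value for T can be obtained numerically. A sketch, under the assumption that consistent estimates of I_11 and I*_11 are available as inputs:

```python
import numpy as np

def theorem1_weights(I11, I11_star, p1):
    """Weights of the limiting distribution in Theorem 1 for testing the first p1 components."""
    p = I11.shape[0]
    M = np.zeros((p, p))
    M[p1:, p1:] = np.linalg.inv(I11[p1:, p1:])        # the matrix diag(0, I_eta_eta^{-1})
    lam = np.linalg.eigvals((np.linalg.inv(I11) - M) @ I11_star)
    return np.sort(lam.real)[::-1][:p1]               # the p1 positive eigenvalues

def weighted_chisq_pvalue(T, lam, B=200_000, seed=0):
    """Monte Carlo tail probability of U = sum_j lam_j * chi2_1 at the observed T."""
    rng = np.random.default_rng(seed)
    U = rng.chisquare(1, size=(B, len(lam))) @ lam
    return float(np.mean(U >= T))
```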

2.2. The parameter of interest lies on the boundary

In this subsection, we consider cases where the true value of θ lies on the boundary of its parameter space while the true value of the nuisance parameter ϕ is an interior point of its parameter space. These situations were informally mentioned in Liang & Self (1996) but were not formally treated therein. Denote the parameter space of θ by Ω and rewrite the hypothesis H0 : γ = γ0 as H0 : θ ∈ Ω0, where Ω0 = {θ : θ = (γ0, η) ∈ Ω}. The complement of Ω0 in Ω is denoted by Ω1. In § 2.1, where θ0 is not on the boundary, Ω, Ω0 and Ω1 are simply R^p, R^{p2} and R^{p1}, respectively. For any subset φ of R^p, we define P_φ = sup_{θ∈φ} L*(θ). We also define the maximum pseudolikelihood estimator in the parameter space φ, θ̂_φ, as the value of θ in the closure of φ that maximizes L*(θ).

The pseudolikelihood ratio test statistic can be written as T = −2 log(P_{Ω0}/P_Ω). Gong & Samaniego (1981) established the n^{1/2}-consistency of the maximum pseudolikelihood estimator of θ when the true value of θ is an interior point of its parameter space. Here we give the consistency result when the true value of θ lies on the boundary.

Lemma 1. If the regularity conditions in Chernoff (1954) hold, the intersections of φ with the closures of neighbourhoods centred about the true value of the parameter of interest, θ0, constitute closed subsets of R^p, and θ0 is a limit point of φ, then with probability tending to 1 as n → ∞, there exists a sequence of points in φ, θ̂_φ, at which local maxima of log L*(θ) occur and which converges to θ0 in probability. Moreover, n^{1/2}(θ̂_φ − θ0) = O_p(1).

We now establish the equivalence between the asymptotic distribution of the pseudolikelihood ratio and the distribution of the likelihood ratio of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix.

Theorem 2. If the regularity conditions in Chernoff (1954) are satisfied, θ0 is a limit point of both Ω0 and Ω1, and the sets Ω0 and Ω1 are approximated by nonnull and disjoint cones C_Ω0 and C_Ω1, respectively, then, when γ = γ0, the asymptotic distribution of the pseudolikelihood ratio statistic T is the same as the distribution of the likelihood ratio test of θ ∈ C_Ω0 against θ ∈ C_Ω1 based on one observation from a multivariate normal distribution with mean θ and covariance matrix I_11^{-1} I*_11 I_11^{-1}, while the covariance matrix is misspecified as I_11^{-1} in the likelihood ratio test.

A sketch of the proof is given in Appendix A. When there is no nuisance parameter ϕ, or I*_11 = I_11, Theorem 2 reduces to Theorem 3 of Self & Liang (1987), in which the likelihood ratio instead of the pseudolikelihood ratio is considered.

The result given by Theorem 2 reduces the general problem of computing the limit distribution of a pseudolikelihood ratio to a problem of computing the distribution of

\[ T_1(Z) = Q_{C_{\Omega_0}}(Z) - Q_{C_{\Omega}}(Z), \tag{2} \]

where Q_φ(Z) = inf_{θ∈φ} (Z − θ)^T I_11 (Z − θ), Z ~ N(0, I_11^{-1} I*_11 I_11^{-1}) and C_Ω = C_Ω0 ∪ C_Ω1.

The distribution of T1(Z) in any given case could be rather complicated. We now offer several special cases in which Theorem 2 may be used to compute the asymptotic distribution of the pseudolikelihood ratio statistic.

From here on we assume Ω = Ω_1 × ⋯ × Ω_p, where the Ω_j are either closed, half-open or open intervals in R. To compute the distribution of T_1(Z), we need the following lemma.

Lemma 2. Suppose Z ~ N(0, I_11^{-1} I*_11 I_11^{-1}) and φ = {0}^{p_1} × R^{p−p_1}. Let Q_φ(Z) = inf_{θ∈φ} (Z − θ)^T I_11 (Z − θ). Then

  1. Q_φ(Z) = Z_{(p_1)}^T e_{(p_1)}^{-1} Z_{(p_1)}, where Z_{(p_1)} = (Z_1, …, Z_{p_1})^T and e_{(p_1)} is the p_1 × p_1 upper left submatrix of I_11^{-1};
  2. the distribution of Q_φ(Z) is U = Σ_{j=1}^{p_1} λ_j U_j, where the U_j are independent χ²_1 variables and λ_1 ≥ ⋯ ≥ λ_{p_1} > 0 are the eigenvalues of the matrix e_{(p_1)}^{-1} e*_{(p_1)}, with e*_{(p_1)} being the p_1 × p_1 upper left submatrix of I_11^{-1} I*_11 I_11^{-1}; and
  3. when I*_11 = I_11, U is simply χ²_{p_1}.

Part (i) can be shown by completing the quadratic form, and the rest of the results follow by Theorem 4.4.4 of Graybill (1976).

Now we consider some special cases, adopting the notation in Self & Liang (1987). Partition the parameter vector θ = (γ, η) into four blocks of coordinates, of dimensions (p_11, p_12, p_21, p − p_11 − p_12 − p_21): the first p_11 coordinates of θ represent the parameters to be tested whose true values are on the boundary; the next p_12 coordinates represent the parameters to be tested whose true values are not on the boundary; the next p_21 coordinates represent the first p_21 components of η with true values on the boundary; and the remaining p − p_11 − p_12 − p_21 coordinates represent the last components of η with true values not on the boundary. Here p_1 = p_11 + p_12.

Case 1. Suppose that the parameter configuration is (0, p_1, 0, p − p_1), i.e. θ0 is an interior point of Ω. Then C_Ω0 = {0}^{p_1} × R^{p−p_1} and C_Ω = R^p. By Theorem 2, the asymptotic distribution of the pseudolikelihood ratio statistic reduces to the distribution of T_1(Z) = Q_{C_Ω0}(Z). By Lemma 2, the distribution of T_1(Z) is U = Σ_{j=1}^{p_1} λ_j U_j, where the U_j are independent χ²_1 variables and λ_1 ≥ ⋯ ≥ λ_{p_1} > 0 are the eigenvalues of e_{(p_1)}^{-1} e*_{(p_1)}. This can be shown to agree with the results in Theorem 1. When I*_11 = I_11, the asymptotic distribution of the pseudolikelihood ratio statistic is χ²_{p_1}.

Case 2. Suppose the parameter configuration is (1, 0, 0, p − 1). Then C_Ω0 = {0} × R^{p−1} and C_Ω = [0, ∞) × R^{p−1}. By Lemma 2, equation (2) reduces to

\[ T_1(Z) = Z_1^2/e_{(1)} - Z_1^2 I(Z_1 < 0)/e_{(1)} = Z_1^2 I(Z_1 > 0)/e_{(1)}, \]

where the first step is due to the fact that Q_{C_Ω}(Z) = Q_{C_Ω0}(Z) when Z_1 ≤ 0. This is true because the quadratic function (Z − θ)^T I_11 (Z − θ) is a continuous function of θ and the infimum in Q_{C_Ω}(Z) is then achieved at the boundary of C_Ω, i.e. C_Ω0. Consequently, the asymptotic distribution of T is a 50:50 mixture of χ²_0 and e*_{(1)}χ²_1/e_{(1)}. When I*_11 = I_11, this distribution is a 50:50 mixture of χ²_0 and χ²_1, which agrees with the result in Case 5 of Self & Liang (1987).
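
A tail probability under this limit is immediate; the sketch below assumes e_(1) and e*_(1) have already been estimated.

```python
from scipy.stats import chi2

def case2_pvalue(t, e1, e1_star):
    """pr(T >= t) under the 50:50 mixture of chi2_0 and (e1_star/e1) * chi2_1 (Case 2)."""
    if t <= 0:
        return 1.0
    return 0.5 * chi2.sf(t * e1 / e1_star, df=1)  # the point mass at 0 contributes nothing for t > 0
```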

Case 3. Suppose the parameter configuration is (1, p_1 − 1, 0, p − p_1). Then C_Ω0 = {0} × {0}^{p_1−1} × R^{p−p_1} and C_Ω = [0, ∞) × R^{p−1}. By Lemma 2 and an argument similar to that in Case 2, equation (2) can be expressed as T_1(Z) = Z_{(p_1)}^T e_{(p_1)}^{-1} Z_{(p_1)} − Z_1^2 I(Z_1 < 0)/e_{(1)}.

Its distribution is a 50:50 mixture of random variables U_a and U_b, where U_a = Σ_{i=1}^{p_1} λ_i U_i, with the λ_i being the p_1 positive eigenvalues of the matrix e_{(p_1)}^{-1} e*_{(p_1)}, and U_b = Σ_{j=1}^{p_1−1} λ_j U_j, with the λ_j being the p_1 − 1 positive eigenvalues of the matrix

\[ \left\{ e_{(p_1)}^{-1} - \begin{pmatrix} 1/e_{(1)} & 0_{1\times(p_1-1)} \\ 0_{(p_1-1)\times 1} & 0_{(p_1-1)\times(p_1-1)} \end{pmatrix} \right\} e_{(p_1)}^{*}, \]

and the U_i and U_j being independent χ²_1 variables. In variance component models, for example, this arises when we test a variance component and p_1 − 1 regression coefficients simultaneously while leaving the other parameters unspecified. When I*_11 = I_11, the distribution is a 50:50 mixture of χ²_{p_1−1} and χ²_{p_1}. This agrees with the result in Case 3 of Stram & Lee (1994), where one variance component and p_1 − 1 covariances are tested simultaneously. When I*_11 = I_11 and p_1 = 2, the result reduces to the results in Case 2 of Stram & Lee (1994) and Case 6 of Self & Liang (1987).

In Cases 2 and 3, we take advantage of the fact that only one component of γ0 lies on the boundary. In these cases, the parameter spaces C_Ω0 and C_Ω, and their boundaries, are straight lines or planes, which are preserved under linear transformations. In Cases 4 and 5, we consider situations where there are two boundary parameters. In Case 4, both boundary parameters are tested, while in Case 5, only one boundary parameter is tested. The calculation of the asymptotic distribution of the pseudolikelihood ratio statistic is more complicated than in Cases 1–3.

Case 4. Suppose the parameter configuration is (2, 0, 0, 0). Then C_Ω0 = {0} × {0} and C_Ω = [0, ∞) × [0, ∞). Let I_11 = P^T P, where P is a nonsingular matrix, and denote C̃_Ω = {θ̃ : θ̃ = Pθ for some θ ∈ C_Ω} and Z̃ = PZ. Then T_1(Z) can be rewritten as

\[ T_1(Z) = \|\tilde Z\|^2 - \inf_{\tilde\theta \in \tilde C_{\Omega}} \|\tilde Z - \tilde\theta\|^2, \tag{3} \]

where ‖ · ‖ is the usual Euclidean norm. Calculation of the second term in equation (3) depends on the location of Z̃ relative to the boundary of C̃_Ω. There are four regions that must be considered. The shaded region in Fig. 1 represents C̃_Ω, and C̃_Ω0 is the origin. The angle of the shaded region is less than 180°, because the convexity of C_Ω is preserved under the linear mapping P. Denote the rotation matrix in R² and the information matrices I_11 and I*_11 by

\[ T_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad I_{11} = \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \qquad I_{11}^{*} = \begin{pmatrix} a^* & b^* \\ b^* & d^* \end{pmatrix}, \]

respectively. Further denote the columns of P by P_1 and P_2, and the inner product of vectors a and b by ⟨a, b⟩ = a^T b. Then the distribution of T_1(Z) can be calculated as

\[ T_1(Z) = \begin{cases} \|\tilde Z\|^2 \sim U = \lambda_1 U_1 + \lambda_2 U_2 & \text{if } \tilde Z \text{ is in the shaded region,} \\ \{\langle \tilde Z, P_2\rangle/\|P_2\|\}^2 \sim d^*\chi^2_1/d & \text{if } \tilde Z \text{ is in region 1,} \\ \{\langle \tilde Z, P_1\rangle/\|P_1\|\}^2 \sim a^*\chi^2_1/a & \text{if } \tilde Z \text{ is in region 2,} \\ 0 & \text{if } \tilde Z \text{ is in region 3,} \end{cases} \]

where λ_1 and λ_2 are the eigenvalues of I*_11 I_11^{-1}, and U_1 and U_2 are independent χ²_1 variables.

Fig. 1.
Diagram of the parameter space for Case 4. The shaded region, CΩ, represents admissible parameter values under the alternative hypothesis. Under the null hypothesis, the parameter is located at the origin. The asymptotic distribution ...

The mixing probabilities for the shaded region, region 1 and region 2, are

\[ p_s = \frac{1}{2\pi}\cos^{-1}\left[ \frac{(0,1)\, I_{11} (I_{11}^{*})^{-1} I_{11} \binom{1}{0}}{\left\{(0,1)\, I_{11} (I_{11}^{*})^{-1} I_{11} \binom{0}{1}\right\}^{1/2} \left\{(1,0)\, I_{11} (I_{11}^{*})^{-1} I_{11} \binom{1}{0}\right\}^{1/2}} \right], \]
\[ p_1 = \frac{1}{2\pi}\cos^{-1}\left[ \frac{(0,1)\, I_{11} (I_{11}^{*})^{-1} P^T T_{\pi/2} P_2}{\left\{(0,1)\, I_{11} (I_{11}^{*})^{-1} I_{11} \binom{0}{1}\right\}^{1/2} \left\{P_2^T T_{\pi/2} P (I_{11}^{*})^{-1} P^T T_{\pi/2} P_2\right\}^{1/2}} \right], \]
\[ p_2 = \frac{1}{2\pi}\cos^{-1}\left[ \frac{(1,0)\, I_{11} (I_{11}^{*})^{-1} P^T T_{\pi/2} P_1}{\left\{(1,0)\, I_{11} (I_{11}^{*})^{-1} I_{11} \binom{1}{0}\right\}^{1/2} \left\{P_1^T T_{\pi/2} P (I_{11}^{*})^{-1} P^T T_{\pi/2} P_1\right\}^{1/2}} \right]. \]

Thus the asymptotic distribution of T is a mixture of U, d*χ²_1/d, a*χ²_1/a and χ²_0 with mixing probabilities p_s, p_1, p_2 and 1 − p_s − p_1 − p_2, respectively. When I_11 = I*_11, the asymptotic distribution reduces to a (p_s, 0.5, 0.5 − p_s) mixture of χ²_2, χ²_1 and χ²_0, respectively, which agrees with the results in Case 7 of Self & Liang (1987) and Example 3 in Chernoff (1954).
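
Rather than evaluating the mixing probabilities analytically, the limit distribution in Case 4 can be approximated by Monte Carlo: draw Z from N(0, I_11^{-1} I*_11 I_11^{-1}) and carry out the cone-restricted minimization in (2) numerically. A sketch, with the small quadratic programme handled by a bound-constrained optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def t1_case4(Z, I11):
    """T1(Z) = Q_{C_Omega0}(Z) - Q_{C_Omega}(Z) with C_Omega0 = {0}x{0}, C_Omega = [0,inf)^2."""
    q = lambda theta: (Z - theta) @ I11 @ (Z - theta)
    q_cone = minimize(q, x0=np.maximum(Z, 0.0), bounds=[(0.0, None)] * len(Z)).fun
    return Z @ I11 @ Z - q_cone

def simulate_case4(I11, I11_star, B=20_000, seed=1):
    """Empirical draws from the limiting null distribution of the statistic T."""
    rng = np.random.default_rng(seed)
    I11_inv = np.linalg.inv(I11)
    cov = I11_inv @ I11_star @ I11_inv
    Zs = rng.multivariate_normal(np.zeros(I11.shape[0]), cov, size=B)
    return np.array([t1_case4(Z, I11) for Z in Zs])
```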

Case 5. Suppose the parameter configuration is (1, 0, 1, 0). Then CΩ0 = {0} × [0, +∞) and CΩ = [0, ∞) × [0, ∞). By a similar argument to that in Case 4, we may write

\[ T_1(Z) = \inf_{\tilde\theta \in \tilde C_{\Omega_0}} \|\tilde Z - \tilde\theta\|^2 - \inf_{\tilde\theta \in \tilde C_{\Omega}} \|\tilde Z - \tilde\theta\|^2. \tag{4} \]

Calculation of the terms in equation (4) depends on the location of Z̃ relative to the boundaries of both C̃_Ω0 and C̃_Ω. In fact, there are two possible partitions of regions that must be considered, depending on whether the angle ψ spanned by P_1 and P_2 is greater than π/2. It is easy to show that ψ > π/2 if and only if b < 0.

If ψ > π/2, i.e. b < 0, there are five different regions to be considered. The shaded region in Fig. 2 represents C̃_Ω, and C̃_Ω0 is the half-line in the direction of the vector P_2 with endpoint at the origin. The distribution of T_1(Z) can be calculated as

\[ T_1(Z) = \begin{cases} \|\tilde Z\|^2 \sim \lambda_1 U_1 + \lambda_2 U_2 & \text{if } \tilde Z \text{ is in region 1,} \\ \|\tilde Z\|^2 - \{\langle \tilde Z, P_2\rangle/\|P_2\|\}^2 \sim \lambda_3 \chi^2_1 & \text{if } \tilde Z \text{ is in region 2,} \\ 0 & \text{if } \tilde Z \text{ is in region 3 or 4,} \\ \{\langle \tilde Z, P_1\rangle/\|P_1\|\}^2 \sim a^*\chi^2_1/a & \text{if } \tilde Z \text{ is in region 5,} \end{cases} \]

where λ_1 and λ_2 are the eigenvalues of I*_11 I_11^{-1}, U_1 and U_2 are independent χ²_1 variables, and λ_3 = (a*d² − 2b*bd + d*b²)/{d(ad − b²)}.

Fig. 2.
Diagrams of the parameter space for Case 5 when the angle spanned by P1 and P2 is (a) greater than π/2, or (b) is less than π/2. The shaded region, CΩ, represents admissible parameter values under the alternative ...

By a similar argument, the distribution of T_1(Z) when ψ ≤ π/2, i.e. b ≥ 0, is calculated as

\[ T_1(Z) = \begin{cases} \|\tilde Z\|^2 - \{\langle \tilde Z, P_2\rangle/\|P_2\|\}^2 \sim \lambda_1 \chi^2_1 & \text{if } \tilde Z \text{ is in region 1,} \\ 0 & \text{if } \tilde Z \text{ is in region 2 or 3,} \\ \{\langle \tilde Z, P_1\rangle/\|P_1\|\}^2 \sim a^*\chi^2_1/a & \text{if } \tilde Z \text{ is in region 4,} \\ \{\langle \tilde Z, P_1\rangle/\|P_1\|\}^2 - \{\langle \tilde Z, P_2\rangle/\|P_2\|\}^2 \sim \lambda_2 U_1 + \lambda_3 U_2 & \text{if } \tilde Z \text{ is in region 5,} \end{cases} \]

where λ_1 = (a*d² − 2b*bd + d*b²)/{d(ad − b²)}, U_1 and U_2 are independent χ²_1 variables, and λ_2 and λ_3 are the eigenvalues of

\[ I_{11}\begin{pmatrix} 1/a & 0 \\ 0 & -1/d \end{pmatrix} I_{11}^{*} I_{11}^{-1}. \]

2.3. The nuisance parameter lies on the boundary

In this subsection, we consider cases where the true value of the nuisance parameter, ϕ, lies on the boundary of its parameter space. In this situation, the asymptotic distribution of the pseudolikelihood ratio statistic is more complicated, since the consistent estimator ϕ̂ may not be asymptotically normally distributed. Specifically, by an argument similar to that in the proof of Theorem 2, the pseudolikelihood ratio statistic can be written as

\[ T = \inf_{\theta \in C_{\Omega_0}} (Z_n - \theta)^T I_{11} (Z_n - \theta) - \inf_{\theta \in C_{\Omega}} (Z_n - \theta)^T I_{11} (Z_n - \theta) + o_p(1), \]

where

\[ Z_n = n^{-1/2} I_{11}^{-1} \frac{\partial \log L(\theta_0, \phi_0)}{\partial \theta} + I_{11}^{-1} I_{12}\, n^{1/2}(\hat\phi - \phi_0) + o_p(1). \]

The second term in Zn may not be asymptotically normally distributed and the results by Pierce (1982) and Parke (1986) cannot be applied here to show the asymptotic independence between the two terms in Zn since the second term in Zn does not have asymptotic constant mean.

In fact, the asymptotic distribution of Z_n is a mixture of normals with identical mean zero and possibly unequal variances. To show this, denote by ϕ̂* the estimator of ϕ without the restriction on the parameter space and assume the parameter space for ϕ is [0, +∞) × R^{q−1}. By applying arguments similar to those in Self & Liang (1987), ϕ̂ can be represented as

\[ \hat\phi^{*} I(\hat\phi^{*}_1 > 0) + \begin{pmatrix} 0 \\ \hat\phi^{*}_2 - (\Sigma_{22}^{21}/\Sigma_{22}^{11})\hat\phi^{*}_1 \\ \vdots \\ \hat\phi^{*}_q - (\Sigma_{22}^{q1}/\Sigma_{22}^{11})\hat\phi^{*}_1 \end{pmatrix} I(\hat\phi^{*}_1 \le 0), \]

where Σ_22^{ij} are the elements of the matrix Σ_22, the asymptotic variance of ϕ̂*. Therefore, Z_n can be rewritten as

\[ \left\{ n^{-1/2} I_{11}^{-1} \frac{\partial \log L(\theta_0, \phi_0)}{\partial \theta} + I_{11}^{-1} I_{12}\, n^{1/2}(\hat\phi^{*} - \phi_0) \right\} I(\hat\phi^{*}_1 > 0) + \left\{ n^{-1/2} I_{11}^{-1} \frac{\partial \log L(\theta_0, \phi_0)}{\partial \theta} + I_{11}^{-1} I_{12} A\, n^{1/2}(\hat\phi^{*} - \phi_0) \right\} I(\hat\phi^{*}_1 \le 0), \]

where

\[ A = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ -\Sigma_{22}^{21}/\Sigma_{22}^{11} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\Sigma_{22}^{q1}/\Sigma_{22}^{11} & 0 & \cdots & 1 \end{pmatrix}. \]

Suppose that as n → ∞,

\[ n^{1/2}\begin{pmatrix} \dfrac{1}{n}\dfrac{\partial \log L(\theta_0,\phi_0)}{\partial\theta} \\[4pt] \hat\phi^{*} - \phi_0 \end{pmatrix} \to \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{pmatrix} \right\}, \tag{5} \]

where Σ_12 is the asymptotic covariance between n^{−1/2} ∂ log L(θ0, ϕ0)/∂θ and n^{1/2}(ϕ̂* − ϕ0), which is in fact zero by Pierce (1982) and Parke (1986). The asymptotic distribution of Z_n is

\[ (I_{11}^{-1} X + I_{11}^{-1} I_{12} Y)\, I(Y_1 > 0) + (I_{11}^{-1} X + I_{11}^{-1} I_{12} A Y)\, I(Y_1 \le 0). \tag{6} \]

Thus the asymptotic distribution of Z_n is a 50:50 mixture of normals with mean zero. For more general cases, where the parameter space for ϕ is [0, +∞)^{q_0} × R^{q−q_0} with q_0 ≥ 1, similar arguments can be applied to show that the asymptotic distribution of Z_n is a mixture of 2^{q_0} normals.

The asymptotic distribution of the pseudolikelihood ratio statistic T is

\[ \inf_{\theta \in C_{\Omega_0}} (Z - \theta)^T I_{11} (Z - \theta) - \inf_{\theta \in C_{\Omega}} (Z - \theta)^T I_{11} (Z - \theta), \]

where Z follows the asymptotic distribution of Z_n. Therefore, the asymptotic distribution of T is that of the likelihood ratio for testing the restricted mean of a multivariate normal distribution with covariance matrix I_11^{-1}, based on one observation from a population with the mixture of normals specified by equations (5) and (6). When the true value of the nuisance parameter is an interior point, the likelihood ratio reduces to equation (2), since the mixture of normals is then in fact a single normal distribution.
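
This limiting law can again be approximated by simulation: draw (X, Y) from (5) with Σ_12 = 0, form Z according to (6), and evaluate the two cone-restricted quadratic forms numerically. A sketch of the first step, for the parameter space [0, ∞) × R^{q−1} for ϕ, assuming estimates of I_11, I_12 and Σ_22 are available:

```python
import numpy as np

def draw_limit_Z(I11, I12, Sigma22, rng):
    """One draw from the 50:50 normal mixture in (6) for Z (nuisance parameter on the boundary)."""
    I11_inv = np.linalg.inv(I11)
    X = rng.multivariate_normal(np.zeros(I11.shape[0]), I11)
    Y = rng.multivariate_normal(np.zeros(Sigma22.shape[0]), Sigma22)
    if Y[0] > 0:
        return I11_inv @ X + I11_inv @ I12 @ Y
    A = np.eye(len(Y))
    A[0, 0] = 0.0                                 # first component of phi is set to the boundary value 0
    A[1:, 0] = -Sigma22[1:, 0] / Sigma22[0, 0]    # regression adjustment from the matrix A
    return I11_inv @ X + I11_inv @ I12 @ (A @ Y)
```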

3. Examples revisited

3.1. Example 1: Variance component models

It can be shown that

\[ \frac{\partial \log L(\theta, \phi)}{\partial \phi} = X^T \Sigma^{-1}(\theta)(y - X\phi). \]

Therefore,

\[ I_{12}(\theta, \phi) = -E\left[ \frac{\partial}{\partial \theta}\left\{ X^T \Sigma^{-1}(\theta)(y - X\phi) \right\} \right] = -\left\{ \frac{\partial}{\partial \theta} X^T \Sigma^{-1}(\theta) \right\} E(y - X\phi) = 0, \]

for all θ and ϕ, and thus I*_11 = I_11. By the result in Case 2, the pseudolikelihood ratio statistic for testing that the jth variance component is zero, i.e. H0 : θ_j = 0, is asymptotically distributed as a 50:50 mixture of χ²_0 and χ²_1.

3.2. Example 2: Behrens–Fisher problem

It can be shown that

\[ E\left\{ \frac{\partial^2 \log L(\theta, \phi)}{\partial \theta_i\, \partial \phi_{i'}} \right\} = -\delta_{ii'}\, E\left( \sum_{j=1}^{n_i} \frac{y_{ij} - \theta_i}{\phi_i^2} \right) = 0 \quad (i, i' = 1, 2), \]

where δ_{ii'} = 1 if i′ = i and 0 otherwise. Therefore, I_12(θ, ϕ) = 0. As indicated in § 2.1, the pseudolikelihood ratio statistic has, asymptotically, a χ²_1 distribution.

3.3. Example 3: Frailty survival models

Let y_i = (y_{i1}, …, y_{i4}) be the observations from the ith cluster (i = 1, …, n), with joint survival distribution of the form (1), where the marginal survival distribution for the jth individual, S_j = S_j(y_j; ϕ), is indexed by ϕ. For simplicity, we assume no censoring, so the likelihood function for (θ1, θ2, ϕ) is

\[ L(\theta_1, \theta_2, \phi) \propto \frac{\partial^4 S(t_1, t_2, t_3, t_4)}{\partial t_1\, \partial t_2\, \partial t_3\, \partial t_4} = \frac{\partial h_1}{\partial t_1} \frac{\partial h_1}{\partial t_2} \frac{\partial h_2}{\partial t_3} \frac{\partial h_2}{\partial t_4} \frac{\partial^4 S}{\partial h_1^2\, \partial h_2^2}, \]

where

\[ h_1 = \sum_{j=1}^{2} S_j(t_j)^{1-\theta_1} - 1, \qquad h_2 = \sum_{j=3}^{4} S_j(t_j)^{1-\theta_1} - 1, \qquad S = h^{1/(1-\theta_2)} = (h_1^{\alpha} + h_2^{\alpha} - 1)^{1/(1-\theta_2)}, \]
\[ \partial h_1/\partial t_j = -(1-\theta_1)\, S_j(t_j)^{1-\theta_1} \lambda_j(t_j) \quad (j = 1, 2), \qquad \partial h_2/\partial t_j = -(1-\theta_1)\, S_j(t_j)^{1-\theta_1} \lambda_j(t_j) \quad (j = 3, 4), \]

λj (tj) is the marginal hazard function for the j th individual, and

\[ \frac{\partial^4 S}{\partial h_1^2\, \partial h_2^2} = \alpha^2(\alpha-1)^2 h_1^{\alpha-2} h_2^{\alpha-2} \frac{\partial^2 S}{\partial h^2} + \alpha^3(\alpha-1)\left\{ h_1^{2(\alpha-1)} h_2^{\alpha-2} + h_1^{\alpha-2} h_2^{2(\alpha-1)} \right\} \frac{\partial^3 S}{\partial h^3} + \alpha^4 (h_1 h_2)^{2(\alpha-1)} \frac{\partial^4 S}{\partial h^4}, \]

with ∂^l S/∂h^l = [Π_{j=1}^{l} {1/(1 − θ2) + 1 − j}] h^{1/(1−θ2) − l} (l = 2, 3, 4). A hypothesis of interest is H0 : θ2 = 1, which implies independence of the y_{ij} for two individuals from different households.

The parameters θ1 and θ2 must satisfy the inequality θ1 ≥ θ2 ≥ 1. The cones that approximate Ω0 and Ω1 are [0, ∞) × {0} and {(t1, t2) : t1 ≥ t2 ≥ 0}. Since the parameter space cannot be written as a product of intervals, the results in Cases 1–5 may not be applied directly. A simple reparameterization from θ = (θ1, θ2) to τ = (θ2, δ), with δ = θ1 − θ2, yields cones approximating the parameter space of (θ2, δ) of the form {0} × R and [0, ∞) × R. By the result in Case 2, the asymptotic distribution of the pseudolikelihood ratio statistic is

\[ \tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\,\frac{e^{*}_{(1)}}{e_{(1)}}\,\chi^2_1, \]

where e_(1) and e*_(1) are the (1, 1) elements of the matrices I_11^{-1} and I_11^{-1} I*_11 I_11^{-1}, respectively,

\[ I_{11} = \lim_{n\to\infty}\left[ E\left\{ -\frac{1}{n}\frac{\partial^2 \log L(\tau_0,\phi_0)}{\partial\tau^2};\ \tau_0,\phi_0 \right\} \right], \qquad I_{12} = \lim_{n\to\infty}\left[ E\left\{ -\frac{1}{n}\frac{\partial^2 \log L(\tau_0,\phi_0)}{\partial\tau\,\partial\phi};\ \tau_0,\phi_0 \right\} \right], \]

L(τ, ϕ) = L(θ2 + δ, θ2, ϕ), I*_11 = I_11 + I_12 Σ_22 I_12^T, and Σ_22 is the asymptotic variance of the consistent estimator of ϕ.

3.4. Example 4: Teratological experiments

For simplicity, we consider the special case of a single treatment group and a control group and assume a logistic dose–response model. Let x_i = 0 if litter i is in the control group and x_i = 1 if it is in the treatment group, and let ϕ_c and ϕ_t denote the intralitter correlations within the control and treatment groups, respectively. The moment estimators of ϕ_c and ϕ_t are found by solving the following equations for nonnegative roots,

\[ \begin{cases} \displaystyle \sum_{x_i=0}\left[ \frac{(p_i - \pi_c)^2}{\pi_c(1-\pi_c)\{1 + (m_i-1)\phi_c\}/m_i} - \frac{n_0 - 1}{n_0} \right] = 0, \\[8pt] \displaystyle \sum_{x_i=1}\left[ \frac{(p_i - \pi_t)^2}{\pi_t(1-\pi_t)\{1 + (m_i-1)\phi_t\}/m_i} - \frac{n_1 - 1}{n_1} \right] = 0, \end{cases} \]

where p_i = y_i/m_i, π_c = exp(β0)/{1 + exp(β0)} and π_t = exp(β0 + β1)/{1 + exp(β0 + β1)}, and n_0 and n_1 are the numbers of litters in the control and treatment groups, respectively. The parameters π_c and π_t are unknown. One may estimate (ϕ_c, ϕ_t) by substituting for (π_c, π_t) a consistent estimator, such as the maximum likelihood estimator ignoring the correlations, i.e. π̂_c = Σ_{x_i=0} y_i / Σ_{x_i=0} m_i and π̂_t = Σ_{x_i=1} y_i / Σ_{x_i=1} m_i.

To alleviate the impact of the nuisance parameter on the parameter of interest, one could base the inference for β on the pseudolikelihood L*(β0, β1) = L(β0, β1, ϕ̂_c, ϕ̂_t). Here we derive the asymptotic distribution of the pseudolikelihood ratio test statistic for H0 : β1 = 0, i.e. no treatment effect, when the intralitter correlation in the control group, ϕ_c, is 0 but the intralitter correlation in the treatment group ϕ_t > 0. Denote θ = (β1, β0) and ϕ = (ϕ_c, ϕ_t). The true value of the nuisance parameter ϕ_c lies on the boundary of its parameter space. According to the results in § 2.3, the asymptotic distribution of the pseudolikelihood ratio test statistic is Z_1²/e_(1), where e_(1) is the (1, 1) element of the matrix I_11^{-1}, I_11 = E[∂ log L(θ0, ϕ0)/∂θ {∂ log L(θ0, ϕ0)/∂θ}^T], and Z = (Z_1, Z_2) is a 50:50 mixture of bivariate normal distributions described by equations (5) and (6). The Fisher information matrix I_11 and the asymptotic variance of ϕ̂*, Σ_22, are calculated in Appendix B.
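
A sketch of the moment estimator of the control-group intralitter correlation, solving the first estimating equation displayed above; the bracketing interval for the root and the fall-back to zero are our assumptions rather than part of the original description.

```python
import numpy as np
from scipy.optimize import brentq

def phi_c_moment(y, m):
    """Moment estimate of phi_c from control-group litters with responses y and litter sizes m."""
    y, m = np.asarray(y, float), np.asarray(m, float)
    n0 = len(y)
    pi_c = y.sum() / m.sum()                           # pooled estimate of pi_c
    p = y / m

    def estimating_eq(phi):
        v = pi_c * (1 - pi_c) * (1 + (m - 1) * phi) / m
        return np.sum((p - pi_c) ** 2 / v - (n0 - 1) / n0)

    if estimating_eq(0.0) <= 0:                        # no nonnegative root: take the boundary value
        return 0.0
    return brentq(estimating_eq, 0.0, 1.0 - 1e-8)      # assumes the root lies in [0, 1)
```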

4. Simulation study

To explore the theoretical findings empirically, we conducted two simulation studies. In the first, we applied the pseudolikelihood ratio test to test for the statistical significance of a variance component in the model described in Example 1 and compared its performance with the likelihood ratio test based on its asymptotic distribution and the exact finite sample distribution (Crainiceanu & Ruppert, 2004). The model described in Example 1 belongs to the linear mixed model with one variance component, specifically,

\[ y = X\phi + Zb + \varepsilon, \qquad E\begin{pmatrix} b \\ \varepsilon \end{pmatrix} = \begin{pmatrix} 0_K \\ 0_n \end{pmatrix}, \qquad \operatorname{cov}\begin{pmatrix} b \\ \varepsilon \end{pmatrix} = \begin{pmatrix} \sigma_b^2 I_K & 0 \\ 0 & \sigma_\varepsilon^2 I_n \end{pmatrix}, \tag{7} \]

where b is a vector of random effects due to specific genetic or nongenetic factors shared among relatives, and ε represents random deviations unique to each individual. With the nuisance parameter ϕ replaced by the conventional least-squares estimator ϕ̂ = (X^T X)^{-1} X^T y, the pseudolikelihood is L*(σ²_b, σ²_ε) = L(ϕ̂, σ²_b, σ²_ε) and the pseudolikelihood ratio test statistic for testing H0 : σ²_b = 0 is

\[ T = 2\left\{ \sup_{\sigma_b^2 \ge 0,\ \sigma_\varepsilon^2 \ge 0} \log L^{*}(\sigma_b^2, \sigma_\varepsilon^2) - \sup_{\sigma_b^2 = 0,\ \sigma_\varepsilon^2 \ge 0} \log L^{*}(\sigma_b^2, \sigma_\varepsilon^2) \right\}. \]

Maximizing L*(σ²_b, σ²_ε) over σ²_b and σ²_ε simultaneously by the Newton–Raphson or Fisher scoring method could lead to negative estimates of the variances, so we reparameterized σ²_b as λσ²_ε, and maximized the pseudolikelihood first for fixed λ and then over λ. Testing the null hypothesis H0 : σ²_b = 0 is equivalent to testing H0 : λ = 0 (Crainiceanu & Ruppert, 2004). Twice the log-pseudolikelihood function is

\[ 2\log\{L(\hat\phi, \lambda, \sigma_\varepsilon^2)\} = -n\log(\sigma_\varepsilon^2) - \log|V_\lambda| - \frac{(y - X\hat\phi)^T V_\lambda^{-1}(y - X\hat\phi)}{\sigma_\varepsilon^2}, \]

where V_λ = I_n + λZZ^T. Under H0, σ̃²_ε = arg max_{σ²_ε} [log{L(ϕ̂, 0, σ²_ε)}] = y^T P_0 y/n, where P_0 = I_n − X(X^T X)^{-1} X^T. For fixed λ, σ̂²_ε(λ) = arg max_{σ²_ε} [log{L(ϕ̂, λ, σ²_ε)}] = y^T P_0^T V_λ^{-1} P_0 y/n. Therefore, the pseudolikelihood ratio test statistic is

\[ T = \sup_{\lambda \ge 0}\left[ -\log|V_\lambda| + n\log(\tilde\sigma_\varepsilon^2) - n\log\{\hat\sigma_\varepsilon^2(\lambda)\} \right]. \tag{8} \]

The above supremum can be found by a grid search over possible values of λ. The likelihood ratio test statistic has a form similar to equation (8), except that σ̂²_ε(λ) is replaced by y^T P_λ^T V_λ^{-1} P_λ y/n, where P_λ = I_n − X(X^T V_λ^{-1} X)^{-1} X^T V_λ^{-1}.
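
The grid-search evaluation of equation (8) is straightforward; a sketch, with the grid for λ chosen arbitrarily:

```python
import numpy as np

def pseudo_lrt(y, X, Z, lam_grid=np.linspace(0.0, 10.0, 501)):
    """Pseudolikelihood ratio statistic (8) for H0: sigma_b^2 = 0 by grid search over lambda."""
    n = len(y)
    P0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    r = P0 @ y                                        # least-squares residuals
    sigma2_null = (r @ r) / n                         # tilde sigma_eps^2 under H0 (lambda = 0)
    T = 0.0
    for lam in lam_grid:
        V = np.eye(n) + lam * (Z @ Z.T)
        _, logdetV = np.linalg.slogdet(V)
        sigma2_lam = (r @ np.linalg.solve(V, r)) / n  # hat sigma_eps^2(lambda)
        T = max(T, -logdetV + n * np.log(sigma2_null) - n * np.log(sigma2_lam))
    return T
```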

To obtain the size of the tests, we drew y from (7) with σ²_b = 0. The family size was set to 3 and the number of families varied from 20 to 40. The true value of ϕ was set to (1, 2, 3)^T and the matrix X was fixed at numbers drawn randomly from the standard normal distribution. The distribution of the pseudolikelihood ratio test statistic does not depend on the magnitude of σ²_ε, because testing the random effect is equivalent to testing the within-group correlation σ²_b/(σ²_b + σ²_ε), which depends only on σ²_b/σ²_ε. For simplicity, we fixed σ²_ε = 1. For each generated dataset, we computed the pseudolikelihood ratio test statistic and the likelihood ratio test statistic, and rejected the null hypothesis if the test statistic exceeded the (1 − α)th quantile of the 50:50 mixture of χ²_0 and χ²_1, i.e. the (1 − 2α)th quantile of χ²_1. We also computed the exact distribution of the likelihood ratio test statistic based on the results in Crainiceanu & Ruppert (2004) and obtained the (1 − α)th quantile of that distribution. To obtain the power of the tests, a similar procedure was implemented, except that y was drawn from model (7) with σ²_b/σ²_ε = 0.1, 0.2 and 0.4, respectively.

Table 1 shows results from 5000 simulations. The nominal level for the Type I error, α, was set to 0.05 and 0.01. When the null hypothesis is true, both the pseudolikelihood ratio test and the likelihood ratio test based on its exact distribution have proportions of rejections within the 95% confidence intervals for the nominal error rates, i.e. 0.007–0.013 for α = 0.01 and 0.044–0.056 for α = 0.05, suggesting that the asymptotic approximation for the pseudolikelihood ratio test statistic is adequate for moderate sample sizes. The pseudolikelihood ratio test is slightly less powerful than the likelihood ratio test based on its exact distribution. On the other hand, the likelihood ratio test based on its asymptotic distribution is liberal but more powerful than the other two tests. In general, the pseudolikelihood ratio test is more conservative and less powerful than the likelihood ratio test when the nuisance parameter is estimated by the maximum likelihood estimator under the null.

Table 1.
Empirical rejection rates (%) in 5000 simulations for the pseudolikelihood ratio test and the likelihood ratio test for testing for a variance component, based on samples of size n, for different variance ratios

In the second simulation study, we applied the pseudolikelihood ratio test to test for the treatment effect in a two-group teratological experiment. We simulated data that mimic the litter size structure of Weil (1970), with β0 = 1.05, ϕ_t = 0.05 and ϕ_c = 0; see Example 4 of § 1 for a detailed description of the statistical models considered. The true value of the treatment effect, β1, was varied from 0 to −1 in order to obtain the size and the power of the test, at nominal levels of 0.05 and 0.01. The number of litters varied from 16 to 64 per group. In addition to the pseudolikelihood ratio test, we applied two other methods for comparison: the Wald test, comparing the Wald test statistic to the standard normal distribution; and the naive likelihood ratio test ignoring the intralitter correlations. Table 2 shows results from 5000 simulations. Under all settings considered, the pseudolikelihood ratio test has proportions of rejections within the 95% confidence intervals for the nominal error rates. The Wald test was liberal and its performance did not improve as the sample size became larger. This is because the true asymptotic distribution of the Wald test statistic is a mixture of two normals (Self & Liang, 1987). The naive likelihood ratio test has proportions of rejections even further above the nominal level than the Wald test. This simulation suggests that neither the Wald test nor the naive likelihood ratio test can be recommended in practice.

Table 2.
Empirical rejection rates (%) in 5000 simulations for the pseudolikelihood ratio test, Wald test and the naive likelihood ratio test for testing the treatment effect in a two-group teratological experiment, based on samples of size n0/n1, for different ...

5. Discussion

One interesting area of future research is to extend the current results to situations where the nuisance parameters are infinite-dimensional. For instance, in Example 3, the multilevel frailty survival model, the marginal survival distributions may be left unspecified. It will be of interest to examine the asymptotic behaviour of the pseudolikelihood ratio test statistic in this situation.

Acknowledgments

The authors thank the referees and the editor for comments that substantially improved the presentation. This research was supported in part by a grant from the National Institutes of Health, U.S.A.

Appendix A

Proof of Theorem 2

Proof . For simplicity of notation, let θ0 = 0. Under the same regularity conditions as Chernoff (1954), for any θ in a neighbourhood N of θ = 0, we have

\[ 2\{\log L^{*}(\theta) - \log L^{*}(0)\} = 2\{\log L(\theta, \hat\phi) - \log L(0, \hat\phi)\} = 2\left\{ \frac{\partial \log L(0, \hat\phi)}{\partial \theta} \right\}^T \theta + \theta^T\left\{ \frac{\partial^2 \log L(0, \hat\phi)}{\partial \theta^2} \right\}\theta + O_p(n)\|\theta\|^3 \]
\[ \qquad = 2(n^{1/2} A_1^{*})^T (n^{1/2}\theta) - (n^{1/2}\theta)^T B_{11}^{*} (n^{1/2}\theta) + O_p(n)\|\theta\|^3, \]

where A_1^* = n^{-1} Σ_{i=1}^{n} ∂ log L_i(0, ϕ̂)/∂θ and B_11^* = −n^{-1} Σ_{i=1}^{n} ∂² log L_i(0, ϕ̂)/∂θ², with L_i denoting the likelihood contribution of the ith observation.

The asymptotic distribution of n^{1/2} A_1^* is normal with mean zero and variance matrix I*_11, where I*_11 = I_11 + I_12 Σ_22 I_12^T; that is, n^{1/2} A_1^* → N(0, I*_11) and B*_11 → I_11 as n → ∞.

Let θ = I_11^{-1} A_1^* + η with η = O_p(n^{−1/2}). Then,

\[ 2\{\log L^{*}(\theta) - \log L^{*}(0)\} = 2(n^{1/2} A_1^{*})^T (n^{1/2}\theta) - (n^{1/2}\theta)^T I_{11} (n^{1/2}\theta) + o_p(1) = -(n^{1/2}\eta)^T I_{11} (n^{1/2}\eta) + (n^{1/2} A_1^{*})^T I_{11}^{-1} (n^{1/2} A_1^{*}) + o_p(1). \]

Let −2 log λ* = −2 log{PΩ0/PΩ1}.

Applying Lemma 1 and the above approximations to Ω0 and Ω1, we get

\[ -2\log\lambda^{*} = \inf_{\theta\in\Omega_0} (n^{1/2}\eta)^T I_{11} (n^{1/2}\eta) - \inf_{\theta\in\Omega_1} (n^{1/2}\eta)^T I_{11} (n^{1/2}\eta) + o_p(1). \]

But for any set φ that can be approximated by a cone C_φ,

\[ \inf_{\theta\in\varphi} (y - \theta)^T I_{11} (y - \theta) = \inf_{\theta\in C_\varphi} (y - \theta)^T I_{11} (y - \theta) + o(\|y\|^2), \]

and therefore,

\[ -2\log\lambda^{*} = \inf_{\theta\in C_{\Omega_0}} (n^{1/2} I_{11}^{-1} A_1^{*} - n^{1/2}\theta)^T I_{11} (n^{1/2} I_{11}^{-1} A_1^{*} - n^{1/2}\theta) - \inf_{\theta\in C_{\Omega_1}} (n^{1/2} I_{11}^{-1} A_1^{*} - n^{1/2}\theta)^T I_{11} (n^{1/2} I_{11}^{-1} A_1^{*} - n^{1/2}\theta) + o_p(1). \]

Since CΩ and CΩ0 are positively homogeneous,

\[ -2\log\lambda^{*} = \inf_{\theta\in C_{\Omega_0}} (Z_n - \theta)^T I_{11} (Z_n - \theta) - \inf_{\theta\in C_{\Omega_1}} (Z_n - \theta)^T I_{11} (Z_n - \theta) + o_p(1), \]

where Z_n = n^{1/2} I_11^{-1} A_1^* and Z_n → Z ~ N(0, I_11^{-1} I*_11 I_11^{-1}) as n → ∞. The function inf_{θ∈φ}(Z − θ)^T I_11 (Z − θ) is a continuous function of Z. Therefore, the asymptotic distribution of −2 log λ* is the distribution of

\[ \inf_{\theta\in C_{\Omega_0}} (Z - \theta)^T I_{11} (Z - \theta) - \inf_{\theta\in C_{\Omega_1}} (Z - \theta)^T I_{11} (Z - \theta) \]

under the assumption that Z ~ N(0, I_11^{-1} I*_11 I_11^{-1}). By the relation T = max{−2 log λ*, 0}, we have that the asymptotic distribution of T is the distribution of

\[ \inf_{\theta\in C_{\Omega_0}} (Z - \theta)^T I_{11} (Z - \theta) - \inf_{\theta\in C_{\Omega}} (Z - \theta)^T I_{11} (Z - \theta), \]

where C_Ω = C_Ω0 ∪ C_Ω1.

Appendix B

Calculation of Fisher information matrices and asymptotic variances of moment estimators in Example 4

The loglikelihood function contributed by the ith litter is

\[ \log L_i(\theta, \phi) \propto \log \operatorname{pr}(y_i \mid m_i, x_i; \theta, \phi) = I(x_i = 0)\left\{ \sum_{k=0}^{y_i-1}\log(\pi_c + k\psi_c) + \sum_{k=0}^{m_i-y_i-1}\log(1 - \pi_c + k\psi_c) - \sum_{k=0}^{m_i-1}\log(1 + k\psi_c) \right\} \]
\[ \qquad + I(x_i = 1)\left\{ \sum_{k=0}^{y_i-1}\log(\pi_t + k\psi_t) + \sum_{k=0}^{m_i-y_i-1}\log(1 - \pi_t + k\psi_t) - \sum_{k=0}^{m_i-1}\log(1 + k\psi_t) \right\}, \tag{B1} \]

where ψ_c = ϕ_c/(1 − ϕ_c), ψ_t = ϕ_t/(1 − ϕ_t), π_c = exp(β0)/{1 + exp(β0)} and π_t = exp(β0 + β1)/{1 + exp(β0 + β1)}. Denote the score vector for θ by S^i_θ = (S^i_{β1}, S^i_{β0})^T, where S^i_{βj} = ∂ log L_i(θ, ϕ)/∂β_j (j = 0, 1), and the score vector for ϕ by S^i_ϕ = (S^i_{ϕc}, S^i_{ϕt})^T, where S^i_{ϕj} = ∂ log L_i(θ, ϕ)/∂ϕ_j (j = c, t). Therefore, the Fisher information matrices I_11 and I_12 can be estimated by their empirical versions, n^{-1} Σ_{i=1}^{n} {S^i_θ (S^i_θ)^T} and n^{-1} Σ_{i=1}^{n} {S^i_θ (S^i_ϕ)^T}, respectively. The simple representation in equation (B1) avoids having different forms of the density function, binomial if ϕ = 0 and beta-binomial if ϕ > 0, and does not require the digamma function when calculating the scores.

To calculate the asymptotic variances of ϕ̂*_c and ϕ̂*_t, we first note that they are asymptotically independent, so we need only calculate the asymptotic variance of ϕ̂*_c; the calculation for ϕ̂*_t is similar.

Denote g(ϕ_c, m_i, y_i, π_c) = (p_i − π_c)²/[π_c(1 − π_c){1 + (m_i − 1)ϕ_c}/m_i] − (n_0 − 1)/n_0. The moment estimator ϕ̂*_c is the solution of Σ_{x_i=0} g(ϕ̂*_c, m_i, y_i, π̂_c) = 0, where π̂_c = Σ_{x_i=0} y_i / Σ_{x_i=0} m_i. Apply the Taylor expansion of g(ϕ̂*_c, m_i, y_i, π̂_c) around ϕ_c:

\[ \sum_{x_i=0}\left[ g(\phi_c, m_i, y_i, \hat\pi_c) + \{\partial g(\phi_c, m_i, y_i, \hat\pi_c)/\partial\phi_c\}(\hat\phi^{*}_c - \phi_c) \right] \approx 0. \]

Therefore,

\[ n_0^{1/2}(\hat\phi^{*}_c - \phi_c) \approx -\left\{ n_0^{-1}\sum_{x_i=0} \partial g(\phi_c, m_i, y_i, \hat\pi_c)/\partial\phi_c \right\}^{-1} n_0^{-1/2}\sum_{x_i=0} g(\phi_c, m_i, y_i, \hat\pi_c). \]

Apply again the Taylor expansion of g(ϕ_c, m_i, y_i, π̂_c), this time around π_c:

\[ n_0^{1/2}(\hat\phi^{*}_c - \phi_c) \approx -\left\{ n_0^{-1}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\phi_c} \right\}^{-1} \left[ \left\{ n_0^{-1/2}\sum_{x_i=0} g(\phi_c, m_i, y_i, \pi_c) \right\} + \left\{ n_0^{-1}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\pi_c} \right\} n_0^{1/2}(\hat\pi_c - \pi_c) \right] \]
\[ \approx -\left\{ n_0^{-1}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\phi_c} \right\}^{-1} \left[ \left\{ n_0^{-1/2}\sum_{x_i=0} g(\phi_c, m_i, y_i, \pi_c) \right\} - \left\{ n_0^{-1}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\pi_c} \right\} \left\{ n_0^{-1}\sum_{x_i=0} m_i \right\}^{-1} n_0^{-1/2}\sum_{x_i=0}(m_i\pi_c - y_i) \right] \]
\[ = -\left\{ n_0^{-1}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\phi_c} \right\}^{-1} n_0^{-1/2}\sum_{x_i=0} \left[ g(\phi_c, m_i, y_i, \pi_c) - \left\{ n_0^{-1}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\pi_c} \right\} \left\{ n_0^{-1}\sum_{x_i=0} m_i \right\}^{-1} (m_i\pi_c - y_i) \right], \]

where the second-to-last step follows from the fact that π̂_c is the solution of the estimating equation Σ_{x_i=0}(m_i π_c − y_i) = 0.

Finally, the asymptotic variance of ϕ̂*_c is

\[ \left\{ \frac{1}{n_0}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\phi_c} \right\}^{-2} \times \frac{1}{n_0}\sum_{x_i=0}\left[ g(\phi_c, m_i, y_i, \pi_c) - \left\{ \frac{1}{n_0}\sum_{x_i=0} \frac{\partial g(\phi_c, m_i, y_i, \pi_c)}{\partial\pi_c} \right\} \left\{ \frac{1}{n_0}\sum_{x_i=0} m_i \right\}^{-1} (m_i\pi_c - y_i) \right]^2. \]
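
This variance can be computed by a direct plug-in; the sketch below uses central finite differences for the two partial derivatives of g, which is our simplification rather than part of the derivation.

```python
import numpy as np

def g(phi, m, y, pi_c, n0):
    """Per-litter terms of the estimating function g(phi_c, m_i, y_i, pi_c)."""
    p = y / m
    v = pi_c * (1 - pi_c) * (1 + (m - 1) * phi) / m
    return (p - pi_c) ** 2 / v - (n0 - 1) / n0

def avar_phi_c(phi_c, y, m, pi_c, eps=1e-6):
    """Plug-in estimate of the asymptotic variance of n0^{1/2}(phi_c* - phi_c)."""
    y, m = np.asarray(y, float), np.asarray(m, float)
    n0 = len(y)
    dg_dphi = (g(phi_c + eps, m, y, pi_c, n0) - g(phi_c - eps, m, y, pi_c, n0)) / (2 * eps)
    dg_dpi = (g(phi_c, m, y, pi_c + eps, n0) - g(phi_c, m, y, pi_c - eps, n0)) / (2 * eps)
    adj = dg_dpi.mean() / m.mean()                     # {n0^{-1} sum dg/dpi}{n0^{-1} sum m_i}^{-1}
    infl = g(phi_c, m, y, pi_c, n0) - adj * (m * pi_c - y)
    return float((infl ** 2).mean() / dg_dphi.mean() ** 2)
```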

References

  • Bandeen-Roche KJ, Liang K-Y. Modelling failure-time associations in data with multiple levels of clustering. Biometrika. 1996;83:29–39.
  • Beaty TH, Self SG, Liang K-Y, Connolly MA, Chase GA, Kwiterovich PO. Use of robust variance components models to analyse triglyceride data in families. Ann. Hum. Genet. 1985;49:315–28.
  • Chernoff H. On the distribution of the likelihood ratio. Ann. Math. Statist. 1954;25:573–8.
  • Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–51.
  • Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. J. R. Statist. Soc. B. 2004;66:165–85.
  • Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: theory and applications. Ann. Statist. 1981;9:861–9.
  • Graybill FA. Theory and Application of the Linear Model. North Scituate, MA: Duxbury Press; 1976.
  • Haseman JK, Kupper LL. Analysis of dichotomous response data from certain toxicological experiments. Biometrics. 1979;35:281–93.
  • Hauck WW, Donner A. Wald’s test as applied to hypotheses in logit analysis. J. Am. Statist. Assoc. 1977;72:851–3.
  • Liang K-Y, Self SG. On the asymptotic behaviour of the pseudolikelihood ratio test statistic. J. R. Statist. Soc. B. 1996;58:785–96.
  • Oakes D. Bivariate survival models induced by frailties. J. Am. Statist. Assoc. 1989;84:487–93.
  • Parke WR. Pseudo maximum likelihood estimation: the asymptotic distribution. Ann. Statist. 1986;14:355–7.
  • Pierce DA. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann. Statist. 1982;10:475–8.
  • Self SG, Liang K-Y. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Statist. Assoc. 1987;82:605–10.
  • Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–99.
  • Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994;50:1171–7.
  • Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–54.
  • Weil CS. Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food Cosmet. Toxicol. 1970;8:177–82.
  • Williams DA. Estimation bias using the beta-binomial distribution in teratology. Biometrics. 1988;44:305–9.
