Biometrika. 2010 September; 97(3): 603–620.
Published online 2010 June 11.
PMCID: PMC3372243

On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems

Summary

This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of parameter of interest θ, θ = (γ, η), H0 : γ = γ0, based on the pseudolikelihood L(θ, ), where is a consistent estimator of ϕ, the nuisance parameter. We show that the asymptotic distribution of T under H0 is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, θ0, or the true value of the nuisance parameter, ϕ0, lies on the boundary of parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments.

Some key words: Asymptotic distribution, Boundary problem, Frailty survival model, Likelihood ratio test, Nuisance parameter, Pseudolikelihood, Teratological experiment, Variance component model

1. Introduction

1.1. Motivation

Very often a probability model for data is indexed by two sets of parameters: the parameter of interest, θ, of dimension p, and the nuisance parameter, ϕ, of dimension q. Sometimes the likelihood function L(θ, ϕ), when considered jointly as a function of θ and ϕ, is ill-behaved, yet when considered as a function of θ alone with ϕ held fixed, is well-behaved. When the nuisance parameter cannot be eliminated through conditioning, marginalizing or factorization, one approach is to use the pseudolikelihood proposed by Gong & Samaniego (1981). The key idea of this approach is that inference for θ can be based on L*(θ) = L(θ, ), known as the pseudolikelihood for θ, where is a consistent estimator of ϕ, assuming such an estimator exists. Gong & Samaniego derived the asymptotic distribution of the maximum pseudolikelihood estimator θ^ for θ under regularity conditions. Thus, Wald-based inference for θ can be made based on this asymptotic distribution. However, it has been well documented in the literature that such inference could be ill-behaved, and that this concern can be alleviated by using likelihood-ratio-based inference (Hauck & Donner, 1977). To this end, Liang & Self (1996) studied the asymptotic behaviour of the likelihood ratio test statistic for testing H0 : θ= θ0 based on L*(θ). The primary purpose of this paper is to extend the work by Liang & Self (1996) to deal with situations where only a subset of θ is tested, and the parameter value of interest or the nuisance parameter may lie on the boundary of its parameter space.

1.2. Example 1: Variance component models

Consider the model

where y is an n × 1 vector of the response variables, X is the design matrix of n × q dimension and Σ(θ) is the n × n covariance matrix indexed by θ, a p × 1 vector. In pedigree analyses, it has been a standard procedure to estimate ϕ by the conventional least-squares estimator = (XTX)−1XTy and to base inference for θ on L(θ, ), where L(θ, ϕ) is proportional to the conditional distribution of Y = y given X. Often we are interested in testing the significance of the multiple variance components one at a time and leave others unspecified (Beaty et al., 1985).

1.3. Example 2: Behrens–Fisher problem

Consider yij ~ N(θi, ϕi) (j = 1, …, ni; i = 1, 2), when the yij s are independent of each other. The Behrens–Fisher problem corresponds to the problem of testing the difference between the two means, i.e. H0 : θ1 = θ2. With ϕi replaced by $ϕ^i=∑i=1ni(yij−y¯i)2/ni$, the sample variance of the ith group (i = 1, 2), the pseudolikelihood for (θ1, θ2) has the form

The null hypothesis of interest is H0 : γ = θ1θ2 = 0, leaving η = θ2 unspecified.

1.4. Example 3: Frailty survival models

Bandeen-Roche & Liang (1996) described a class of models for failure-time data that accounts for multiple levels of clustering. This class of models permits specification of simple distributional forms for all bivariate margins. In addition, the distribution reduces to the univariate frailty model (Vaupel et al., 1979; Oakes, 1989) when there is only a single level. Consider a cluster with two levels; for example, households and villages. Suppose there are four individual members and the households are clustered as {1, 2} and {3, 4}. When the Clayton copula (Clayton, 1978) is used, the multivariate survival function has the form

$S(t1,…,t4)=[{∑j=12Sj(tj)1−θ1−1}α+{∑j=34Sj(tj)1−θ1−1}α−1]1/(1−θ2),$
(1)

where α = (θ2 – 1)/(θ1 – 1), Sj is the marginal survival distribution of the j th member (j = 1, …, 4), and θ1 1 and θ2 1 characterize the association within and between households, respectively. To ensure that the above survival function is legitimate, one needs θ2 θ1, i.e. the between-household association not exceeding the within-house association. Let ϕ denote the parameters which specify the Sj s, the pseudolikelihood approach for inference about θ has been proposed (Bandeen-Roche & Liang, 1996; Shih & Louis, 1995) as convenient estimators of ϕ are typically available by, for example, maximizing the likelihood function with respect to ϕ assuming independence, i.e. fixing θ1 and θ2 at 1. One hypothesis of interest is H0 : θ2 = 1, i.e. no between-household association. Here θ2 = 1 is on the boundary of the parameter space for θ2.

1.5. Example 4: Teratological experiments

In teratological experiments, the response variables are often dichotomous and their sums are commonly modelled by beta-binomial distributions (Weil, 1970; Haseman & Kupper, 1979). The distributions are indexed by π, the probability of experiencing an adverse event, and ϕ, the nuisance parameter characterizing the so-called litter effect, also known as overdispersion, the tendency that the subjects from the same litter to respond more alike than subjects from different litters. Often investigators would like to establish the dose-response relationship between π and x, the dosage level or the treatment status. One may consider the model,

where g is known as the prespecified link function. For beta-binomial models, it has been well documented that the nuisance parameter has a profound impact on the parameter of interest (Williams, 1988). One way to alleviate such concern is to base the inference of β on the pseudo-likelihood L(β, ) where is a consistent estimator of ϕ. Such an estimator could be obtained, for example, by using method of moments. One hypothesis of interest is H0 : β1 = 0, i.e. no dose-response relationship. Since the intralitter correlation could be dose-dependent, heterogeneous correlations are often assumed. When the dosage level is zero or the placebo is used, the intralitter correlation is usually close to zero, lying near or on the boundary of its parameter space.

1.6. Example 5: A general setting

Consider the case where the likelihood can be decomposed into two parts,

where both L1 and L2 are legitimate likelihood functions. In this case, one could use the maximizer of L2(ϕ), , to form the pseudolikelihood function L*(θ) for θ. Many examples fit into this case. It is easy to see that Example 2 above is a special case with L2(ϕ) proportional to the product of probability density functions of sample variances from both samples. Another example is genetic linkage study where the data consist of genetic marker data, M, and genotyping data, T, from pedigrees. In this situation, the likelihood function is proportional to

where θj, (j = 1, …, p), is the location of the jth disease gene, and ϕ are parameters characterizing an underlying genetic mechanism for the disease. In the context of measurement errors, the true covariate, X, is observed indirectly through surrogate, W, which is measured with errors. With Y as the dependent variable, the likelihood function is proportional to

where θ is the regression coefficient relating X to Y and ϕ is the parameter characterizing the measurement errors for the covariates.

2. Main results

2.1. The parameter of interest and the nuisance parameter are interior points

In this subsection, we consider the cases where only a subset of the parameter of interest is tested while the true values of the parameter of interest and nuisance parameter are interior points of their parameter spaces. Consider the partition of θ into two parts: the parameters to be tested, γ, of dimension p1 and the remaining parameters, η, of dimension p2 = pp1. The null hypothesis of interest is specified as H0 : γ = γ0. Let T be the pseudolikelihood ratio test statistic, i.e. T = −2 log [L*{γ0, (γ0)}/L*(, )], where (γ0) = arg maxη L*(γ0, η) and (γ^, ) = arg maxγ,η L*(γ, η). Theorem 1 states the large sample distribution of T under H0.

Theorem 1. Suppose that under a distribution indexed by (θ0, ϕ0), the consistent estimator of ϕ, , has the asymptotic distribution n1/2(ϕ0) → N(0, Σ22) as n → ∞.

Then, under the regularity conditions (A1)–(A6) in Gong & Samaniego (1981),

1. the p × p matrix
${I11−1−(000Iηη−1)}I11*$
has p1 positive eigenvalues, λ1 λp1>0, and pp1 zero eigenvalues;
2. the asymptotic distribution of the pseudolikelihood ratio statistic T is $U=∑j=1p1λjUj$, where the Ujs are independent $χ12$ variables.

Here

and $I11*$ is the asymptotic variance of n−1/2 log L(θ0, )/θ and can be calculated as $I11*=I11+I12Σ22I12T$ with

The proof is straightforward using Taylor expansions and Theorem 4.4.4 of Graybill (1976) and hence is omitted. Implicitly, we used the fact that n1/2{θ^(ϕ0) – θ0}and n1/2(ϕ0) are asymptotically independent because θ^(ϕ0) is asymptotically efficient and n1/2(ϕ0) has asymptotic constant mean zero (Pierce, 1982; Parke, 1986). When p1 = p, Theorem 1 reduces to the main result of Liang & Self (1996).

It is easy to show that when $I11*=I11$, the distribution of U is $χp12$, which is the same as the asymptotic distribution of the conventional likelihood ratio statistic. As noted by Liang & Self (1996), this condition is satisfied when I12(θ0, ϕ) = 0 for all ϕ, which means that the score function for θ and ϕ are orthogonal to each other when evaluated at (θ0, ϕ). Another situation where we have $I11*=I11$ is Σ22 = 0. This could occur when ϕ is estimated through external data whose sample size n* increases at a faster rate than n, i.e. n*/n → ∞ as n → ∞. A less obvious situation, which results in the distribution of U being $χp12$, occurs when Iγη = Iγϕ =0. This means that the score function for γ is orthogonal both to the score function for η and to the score function for ϕ.

2.2. The parameter of interest lies on the boundary

In this subsection, we consider cases where the true value of θ lies on the boundary of its parameter space while the true value of the nuisance parameter ϕ is an interior point of its parameter space. These situations were informally mentioned in Liang & Self (1996) but were not formally treated therein. Denote the parameter space of θ by Ω and rewrite the hypothesis H0 : γ = γ0 as H0 : θ Ω0, where Ω0 = {θ : θ = (γ0, η) Ω}. The complement of Ω0 in Ω is denoted by Ω1. In § 2.1 where θ0 is not on the boundary, Ω, Ω0 and Ω1 are simply Rp, Rp2 and Rp1, respectively. For any subset of Rp, , we define P= supθ L*(θ). We also define the maximum pseudolikelihood estimator in the parameter space , θ^, as that value of θ in the closure of which maximizes L*(θ).

The pseudolikelihood ratio test statistic can be written as T = −2 log (PΩ0/PΩ). Gong & Samaniego (1981) established the n1/2-consistency of the maximum pseudolikelihood estimator for θ when the true value of θ is an interior point of its parameter space. Here we give the consistency result when the true value of θ lies on the boundary.

Lemma 1. If the regularity conditions in Chernoff (1954) hold, the intersection of and the closure of neighbourhoods centred about the true value of the parameter of interest, θ0, constitute closed subsets of Rp, and θ0 is a limit point of , then with probability tending to 1 as n → ∞, there exists a sequence of points in , , at which local maxima of log L*(θ) occur, and that converges to θ0 in probability. Moreover, n1/2(θ0) = Op(1).

We now establish the equivalence between the asymptotic distribution of the pseudolikelihood ratio and the distribution of the likelihood ratio of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix.

Theorem 2. If the regularity conditions in Chernoff (1954) are satisfied, θ0 is a limiting point of both Ω0 and Ω1, and the sets Ω0 and Ω1 are approximated by nonnull and disjoint cones CΩ0 and CΩ1, respectively, then, when γ = γ0, the asymptotic distribution of the pseudolikelihood ratio statistic, T, is the same as the distribution of the likelihood ratio test of θ CΩ0 against θ CΩ1 based on one observation from a population with multivariate normal distribution with mean and unknown covariance matrix $I11−1I11*I11−1$ while the covariance matrix is misspecified as $I11−1$ in the likelihood ratio test.

A sketch of the proof is given in Appendix A. When there is no nuisance parameter ϕ or $I11*=I11$, Theorem 2 reduces to Theorem 3 of Self & Liang (1987) in which the likelihood ratio instead of pseudolikelihood ratio is considered.

The result given by Theorem 2 reduces the general problem of computing the limit distribution of a pseudolikelihood ratio to a problem of computing the distribution of

$T1(Z)=QCΩ0(Z)−QCΩ(Z),$
(2)

where Q(Z) = infθ (Zθ)TI11(Zθ), Z ~ N(0, $I11−1I11*I11−1$) and CΩ = CΩ0 CΩ1.

The distribution of T1(Z) in any given case could be rather complicated. We now offer several special cases in which Theorem 2 may be used to compute the asymptotic distribution of the pseudolikelihood ratio statistic.

From here on we assume Ω = Ω1 × × Ωp, where the Ωjs are either closed, half-open or open intervals in R. To compute the distribution of T1(Z), we need the following lemma.

Lemma 2.Suppose Z ~ N(0, $I11−1I11*I11−1$) and = {0}p1 × Rpp1. Let Q(Z) = infθ (Zθ)TI11(Zθ). Then

1. $Qϕ(Z)=Z(p1)Te(p1)−1Z(p1)$, where Z(p1) = (Z1, …, Zp1)T and e(p1) is the p1 × p1 upper left submatrix of $I11−1$;
2. the distribution of Q(Z) is $U=∑j=1p1λjUj$, where Ujs are independent $χ12$ variables and λ1 λp1 > 0 are the eigenvalues of matrix $e(p1)−1e(p1)*$, with $e(p1)*$ being the p1 × p1 upper left submatrix of $I11−1I11*I11−1$; and
3. when $I11*=I11$, U is simply $χp12$.

Part (i) can be shown by completing the quadratic form, and the rest of the results follow by Theorem 4.4.4 of Graybill (1976).

Now we consider some special cases, adopting the notation in Self & Liang (1987). Partition the parameter vector, θ = (γ, η), into four coordinates: (p11, p12, p21, pp11p12p21) where the first p11 coordinates of θ represent the parameter to be tested with true values on the boundary; the next p12 coordinates of θ represent the parameter to be tested with true values not on the boundary; the next p21 coordinates of θ represent the first p21 component of η with true values on the boundary; and finally the remaining pp11p12p21 coordinates of θ represent the last pp11p12p21 coordinates of η with the true value not on the boundary. Here p1 = p11 + p12.

Case 1. Suppose that the parameter configuration is (0, p1, 0, pp1), i.e. θ0 is an interior point of Ω. Then CΩ0 = {0}p1 × Rp–p1 and CΩ=Rp. By Theorem 2, the asymptotic distribution of the pseudolikelihood ratio reduces to the distribution of T1(Z) = QCΩ0(Z). By Lemma 2, the distribution of T1(Z) is $U=∑j=1p1λjUj$, where Ujs are independent $χ12$ variables and λ1 λp1 > 0 are the eigenvalues of $e(p1)−1e(p1)*$. This can be shown to agree with results in Theorem 1. When $I11*=I11$, the asymptotic distribution of pseudolikelihood ratio statistic is $χp12$.

Case 2. Suppose the parameter configuration is (1, 0, 0, p – 1). Then CΩ0 = {0} × Rp−1 and CΩ = [0, ∞) × Rp−1. By Lemma 2, equation (2) can be reduced to

$T1(Z)=Z12/e(1)−Z12I(Z1<0)/e(1)=Z12I(Z1>0)/e(1),$

where the first step is due to the fact that QCΩ (Z) = QCΩ0 (Z) when Z1 0. This is true because the quadratic function (Zθ)TI11(Zθ) is a continuous function of θ and the infimum in the function QCΩ (Z) is achieved at the boundary of CΩ, i.e. CΩ0. Consequently, the asymptotic distribution of T is a 50:50 mixture of $χ02$ and $e(1)*χ12/e(1)$. When $I11*=I11$, this distribution is a 50:50 mixture of $χ02$ and $χ12$, which agrees with the result in Case 5 of Self & Liang (1987).

Case 3. Suppose the parameter configuration is (1, p1 – 1, 0, pp1). Then CΩ0 = {0} × {0}p1–1 × Rpp1 and CΩ = [0, ∞) × Rp−1. By Lemma 2 and a similar argument to that in Case 2, equation (2) can be expressed as $T1(Z)=Z(p1)Te(p1)−1Z(p1)−Z12I(Z1<0)/e(1)$.

Its distribution is a 50:50 mixture of random variables Ua and Ub, where $Ua=∑i=1p1λiUi$ with the λis being the p1 positive eigenvalues of the matrix $e(p1)−1e(p1)*$ and $Ub=∑j=1p1−1λjUj$ with the λjs being the p1 – 1 positive eigenvalues of the matrix

${e(p1)−1−(1/e(1)01×(p1−1)0(p1−1)×10(p1−1)×(p1−1))}e(p1)*,$

and the Ui s and Uj s being independent $χ12$ variables. In variance component models, for example, we are testing a variance component and p1 − 1 regression coefficients simultaneously while leaving the other parameters unspecified. When $I11*=I11$, the distribution is a 50:50 mixture of $χp1−12$ and $χp12$. This agrees with the result in Case 3 of Stram & Lee (1994), where one variance component and p1 − 1 covariances are tested simultaneously. When $I11*=I11$ and p1 = 2, the result reduces to the results in Case 2 of Stram & Lee (1994) and Case 6 of Self & Liang (1987).

In Cases 2 and 3, we take the advantage of the fact that only one component of γ0 lies on the boundary. In these cases, the parameter spaces CΩ0 and CΩ, and their boundaries, are straight lines or planes which are preserved under linear transformations. In Cases 4 and 5, we consider the situation where there are two boundary parameters. In Case 4, both boundary parameters are tested while in Case 5, only one boundary parameter is tested. The calculation of the asymptotic distribution of the pseudolikelihood ratio statistic is more complicated than that in Cases 1–3.

Case 4. Suppose the parameter configuration is (2, 0, 0, 0). Then CΩ0 = {0} × {0} and CΩ = [0, ∞) × [0, ∞). Let I11 = PT P, where P is a nonsingular matrix and denote Ω = {: = for any θ CΩ} and = P Z. Then T1(Z) can be rewritten as

(3)

where ‖ · ‖ is the usual Euclidean metric. Calculation of the second term in equation (3) depends on the location of relative to the boundary of Ω. There are four regions that must be considered. The shaded region in Fig. 1 represents Ω, and Ω0 is the origin. The angle in the shaded area is less than 180°. This is due to the fact that the convexity of CΩ is preserved under linear mapping P. Denote the rotation matrix in R2 and the information matrices I11 and $I11*$ by

respectively. Further denote the columns of P by P1 and P2, and the inner product of vectors a and b by a, b = aTb. Then the distribution of T1(Z) can be calculated as

where λ1 and λ2 are eigenvalues of $I11*$ and $I11−1$, and U1 and U2 are independent $χ12$ variables.

Diagram of the parameter space for Case 4. The shaded region, Ω, represents admissible parameter values under the alternative hypothesis. Under the null hypothesis, the parameter is located at the origin. The asymptotic distribution ...

The mixing probabilities for the shaded region, region 1 and region 2, are

$ps=cos−1[(0,1)I11{I11*}−1I11(10){(0,1)I11(I11*)−1I11(01)}1/2{(1,0)I11(I11*)−1I11(10)}1/2]/2π,p1=cos−1[(0,1)I11(I11*)−1PTTπ/2P2{(0,1)I11(I11*)−1I11(01)}1/2{P2TT−π/2P(I11*)−1PTTπ/2P2}1/2]/2π,p2=cos−1[(1,0)I11(I11*)−1PTT−π/2P1{(1,0)I11(I11*)−1I11(10)}1/2{P1TTπ/2P(I11*)−1PTT−π/2P1}1/2]/2π.$

Thus the asymptotic distribution of T is a mixture of U, $d*χ12/d$, $a*χ12/a$ and $χ02$ with mixing probabilities ps, p1, p2 and 1 − psp1p2, respectively. When $I11=I11*$, the asymptotic distribution reduces to a ps, 0.5 and 0.5 − ps mixture of $χ22$, $χ12$ and $χ02$, respectively, which agrees with the results in Case 7 of Self & Liang (1987) and Example 3 in Chernoff (1954).

Case 5. Suppose the parameter configuration is (1, 0, 1, 0). Then CΩ0 = {0} × [0, +∞) and CΩ = [0, ∞) × [0, ∞). By a similar argument to that in Case 4, we may write

(4)

Calculation of the terms in equation (4) depends on the location of relative to the boundary of both Ω0 and Ω. In fact, there are two possible partitions of regions that must be considered, depending on whether the angle ψ spanned by P1 and P2 is greater than π/2. It is easy to show ψ > π/2 if and only if b < 0.

If ψ > π/2, i.e. b < 0, there are five different regions to be considered. The shaded region in Fig. 2 represents Ω, and Ω0 is the half-line in the same direction as the vector P2 with the endpoint at the origin. The distribution of T1(Z) can be calculated as

where λ1 and λ2 are eigenvalues of $I11*I11−1$, U1 and U2 are independent $χ12$ variables, and λ3 = (a*d2 − 2b*bd + d*b2)/{d(adb2)}.

Diagrams of the parameter space for Case 5 when the angle spanned by P1 and P2 is (a) greater than π/2, or (b) is less than π/2. The shaded region, Ω, represents admissible parameter values under the alternative ...

By a similar argument, the distribution of T1(Z) when ψ π/2, i.e. b 0, is calculated as

where λ1 = (a*d2 − 2b*bd + d*b2)/{d(adb2)}, U1 and U2 are independent $χ12$ variables, and λ2 and λ3 are eigenvalues of

$I11(1/a00−1/d)I11*I11−1.$

2. 3. The nuisance parameter lies on the boundary

In this subsection, we consider cases where the true value of nuisance parameter, ϕ, lies on the boundary of its parameter space. In this situation, the asymptotic distribution of the pseudolikelihood ratio statistic is more complicated since the consistent estimator may not be asymptotically normally distributed. Specifically, by a similar argument as in the proof of Theorem 2, the pseudolikelihood ratio statistic can be written as

where

The second term in Zn may not be asymptotically normally distributed and the results by Pierce (1982) and Parke (1986) cannot be applied here to show the asymptotic independence between the two terms in Zn since the second term in Zn does not have asymptotic constant mean.

In fact, the asymptotic distribution of Zn is a mixture of normals with identical mean zero and possibly unequal variances. To show this, denote by * the estimator of ϕ without restriction of parameter space and assume the parameter space for ϕ is [0, +∞) × Rq−1. By applying similar arguments as in Self & Liang (1987), can be represented as

where $Σ22ij$ are elements of the matrix Σ22, the asymptotic variance of ϕ*. Therefore, Zn can be rewritten as

where

Suppose that as n → ∞,

(5)

where Σ12 is the asymptotic covariance between n−1/2 log L(θ0, ϕ0)/θ and n1/2(*ϕ0), which is in fact zero by Pierce (1982) and Parke (1986). The asymptotic distribution of Zn is

(6)

Thus the asymptotic distribution of Zn is a 50:50 mixture of normals with mean zero. For more general cases where the parameter space for ϕ is [0, +∞)q0 × Rqq0 with q0 1, similar arguments can be applied to show that the asymptotic distribution of Zn is a 2q0 mixture of normals.

The asymptotic distribution of the pseudolikelihood ratio statistic T is

where Z is the asymptotic distribution of Zn. Therefore, the asymptotic distribution of T is that of the likelihood ratio for testing the restricted mean of a multivariate normal distribution with covariance matrix $I11−1$ based on one observation from a population with a mixture of normals specified by equations (5) and (6). When the true value of the nuisance parameter is an interior point, the likelihood ratio reduces to equation (2) since the mixture of normals is in fact a normal distribution.

3. Examples revisited

3. 1. Example 1: Variance component models

It can be shown that

Therefore,

for all θ and ϕ, thus $I11*=I11$. By result in Case 2, the pseudolikelihood ratio testing the j th variance component being zero, i.e. H0 : θj = 0, has an asymptotically 50:50 mixture of $χ02$ and $χ12$.

3. 2. Example 2: Behrens–Fisher problem

It can be shown that

where δii = 1 if i′ = i and 0 otherwise. Therefore, I12(θ, ϕ) = 0. As indicated in § 2.1, the pseudolikelihood ratio has, asymptotically, a $χ12$ distribution.

3. 3. Example 3: Frailty survival models

Let yi = (yi1, . . ., yi4) be the observations from the i th cluster (i = 1, . . ., n), with the joint survival distribution of the form equation (1) where the marginal survival distribution for the jth individual, Sj = Sj (yj ; ϕ), is indexed by ϕ. For simplicity, we assume no censoring, so the likelihood function for (θ1, θ2, ϕ) is

where

λj (tj) is the marginal hazard function for the j th individual, and

with (l = 2, 3, 4). A hypothesis of interest is H0 : θ2 = 1 which implies the independence of yij s for two individuals from different households.

The parameters θ1 and θ2 must satisfy the inequality θ1 θ2 1. The cones which approximate Ω0 and Ω1 are [0, ∞) × {0} and {(t1, t2) : t1 t2 0}. Since the parameter space cannot be written as the product of intervals, results in Cases 1–5 may not be applied. A simple reparameterization from θ = (θ1, θ2) to τ = (θ2, δ), δ = θ1θ2, yields the cones approximating the parameter space of (θ2, δ) being {0} × R and [0, ∞) × R. By result in Case 2, the asymptotic distribution of the pseudolikelihood ratio is

$12χ02+12e(1)*e(1)χ12,$

where e(1) and $e(1)*$ are the (1, 1) elements of matrix $I11−1$ and $I11−1I11*I11−1$, respectively,

L(τ, ϕ) = L(θ2 + δ, θ2, ϕ), $I11*=I11+I12Σ22I12T$, and Σ22 is the asymptotic variance of the consistent estimator of ϕ.

3. 4. Example 4: Teratological experiments

For simplicity, we consider the special case of a single treatment group and a control group and assume a logistic dose–response model. Denote xi = 0 if the litter i is in the control group and 1 if in the treatment group, and ϕc and ϕt for correlations within control and treatment groups, respectively. The moment estimates for ϕc and ϕt can be found by solving the following equations for nonnegative roots,

${Σxi=0{(pi−πc)2/[πc(1−πc){1+(mi−1)ϕc}/mi]−(n0−1)/n0}=0,Σxi=1{(pi−πt)2/[πt(1−πt){1+(mi−1)ϕt}/mi]−(n1−1)/n1}=0,$

where pi = yi/mi, πc = exp(β0)/{1 + exp(β0)} and πt = exp(β0 + β1)/{1 + exp(β0 + β1)}, and n0 and n1 are numbers of litters in control and treatment groups, respectively. The parameters πc and πt are unknown. One may estimate (ϕc, ϕt) by substituting (πc, πt) by a consistent estimator, such as the maximum likelihood estimator ignoring the correlations, i.e. c = ∑xi = 0 yi / ∑xi=0 mi and t = ∑xi = 1 yi /∑xi = 1 mi.

To alleviate the impact of the nuisance parameter on the parameter of interest, one could base the inference for β on the pseudolikelihood L*(β0, β1) = L(β0, β1, c, t). Here we derive the asymptotic distribution of the pseudolikelihood ratio test statistic for H0 : β1 = 0, i.e. no treatment effect, when the intralitter correlation in the control group, ϕc, is 0, but the intralitter correlation in the treatment group ϕt > 0. Denote θ = (β1, β0) and ϕ = (ϕc, ϕt). The true value of nuisance parameter, ϕc, lies on the boundary of its parameter space. According to the results in § 2.3, the asymptotic distribution of the pseudolikelihood ratio test statistic is $Z12/e(1)$, where (e(1) is the (1, 1) element of the matrix $I11−1$, I11 = E [ log L(θ0, ϕ0)/θ { log L (θ0, ϕ0)/θ}T] and Z = (Z1, Z2) is a 50:50 mixture of bivariate normal distributions described by equations (5) and (6). The Fisher information matrix I11 and the asymptotic variance of *, Σ22, are calculated in Appendix B.

4. Simulation study

To explore the theoretical findings empirically, we conducted two simulation studies. In the first, we applied the pseudolikelihood ratio test to test for the statistical significance of a variance component in the model described in Example 1 and compared its performance with the likelihood ratio test based on its asymptotic distribution and the exact finite sample distribution (Crainiceanu & Ruppert, 2004). The model described in Example 1 belongs to the linear mixed model with one variance component, specifically,

(7)

where b is random effects due to specific genetic or nongenetic factors shared among relatives, and represents random deviations unique to each individual. With nuisance parameter ϕ replaced by the conventional least-squares estimator = (XTX)−1XTy, the pseudolikelihood is and the pseudolikelihood ratio test statistic for testing $H0:σb2=0$ is

Maximizing over $σb2$ and simultaneously by Newton–Raphson or Fisher’s scoring method could lead to negative estimates of variances, so we reparameterized $σb2$ by , and maximized the pseudolikelihood first for fixed λ and then over λ. Testing the null hypothesis $H0:σb2=0$ is equivalent to testing H0 : λ = 0 (Crainiceanu & Ruppert, 2004). Twice the log-pseudolikelihood function is

where Vλ = In +λZ ZT. Under H0, , where P0 = InX(XTX)−1XT. For fixed λ, . Therefore, the pseudolikelihood ratio test statistic is

(8)

The above supremum can be found by a grid search over possible values of λ. The likelihood ratio test statistic has a form similar to equation (8), except that is replaced by $yTPλTVλ−1Pλy/n$, where $Pλ=In−X(XTVλ−1X)−1XTVλ−1$.

To obtain the size of the tests, we drew y from (7) with $σb2=0$. The family size was set to 3 and the number of families varied from 20 to 40. The true value of ϕ was set to be (1, 2, 3)T and the matrix X was fixed at randomly drawn numbers from standard normal distribution. The distribution of the pseudolikelihood ratio test statistic does not depend on the magnitude of because testing the random effect is equivalent to testing the within group correlation , which only depends on . For simplicity, we fixed . For each generated dataset, we computed the pseudolikelihood ratio test statistic and the likelihood ratio test statistic, and rejected the null if the test statistic is larger than the (1 − α)th quantile of 50:50 mixture of $χ02$ and $χ12$, i.e. the (1 − 2α)th quantile of $χ12$. We also computed the exact distribution of the likelihood ratio test based on results in Crainiceanu & Ruppert (2004) and obtained the (1 − α)th quantile of its distribution. To obtain the power of the test, a similar procedure was implemented, except that y was drawn from model (7) with , 0.2 and 0.4, respectively.

Table 1 shows results from 5000 simulations. The nominal level for Type I error, α, was set to 0.05 and 0.01. When the null hypothesis is true, both the pseudolikelihood ratio test and the likelihood ratio test based on its exact distribution have proportions of rejections within 95% confidence intervals for the nominal error rates, i.e. 0.007–0.013 for α = 0.01, and 0.044–0.056 for α = 0.05, suggesting that the asymptotic approximation for the pseudolikelihood ratio test statistic is adequate for moderate sample sizes. The pseudolikelihood test is slightly less powerful than the likelihood ratio test based on its exact distribution. On the other hand, the likelihood ratio test based on its asymptotic distribution is liberal but more powerful compared to the other two tests. In general, the pseudolikelihood ratio test is more conservative and less powerful than the likelihood ratio test when the nuisance parameter is estimated by the maximum likelihood estimator under the null.

Empirical rejection rates (%) in 5000 simulations for the pseudolikelihood ratio test and the likelihood ratio test for testing for a variance component, based on samples of size n, for different variance ratios

In the second simulation study, we applied the pseudolikelihood ratio test to test for the treatment effect in a two-group teratological experiment. We simulated data that mimic the litter size structure of Weil (1970) with β0 = 1.05, ϕt = 0.05 and ϕc = 0; see Example 4 of § 1 for detailed descriptions of the statistical models considered. The true value of treatment effect, β1, was set to vary from 0 to −1 in order to obtain the size and the power of the test, at nominal levels of 0.05 and 0.01, respectively. The number of litters varied from 16 to 64 per group. In addition to the pseudolikelihood ratio test, we also applied two other methods for comparison; the Wald test, comparing the Wald test statistic to the standard normal distribution; and the naive likelihood ratio test ignoring the intralitter correlations. Table 2 shows results from 5000 simulations. Under all settings considered, the pseudolikelihood ratio test has proportions of rejections within 95% confidence intervals for the nominal error rates. The Wald test was liberal and its performance was not improved as the sample size became larger. This is because the true asymptotic distribution of the Wald test statistic is a mixture of two normals (Self & Liang, 1987). The naive likelihood ratio test has proportions of rejections higher than the nominal level compared to the Wald test. This simulation suggests that neither the Wald test nor the naive likelihood ratio test can be recommended in practice.

Empirical rejection rates (%) in 5000 simulations for the pseudolikelihood ratio test, Wald test and the naive likelihood ratio test for testing the treatment effect in a two-group teratological experiment, based on samples of size n0/n1, for different ...

5. Discussion

One interesting area of future research is to extend the current results to situations where the nuisance parameters are infinite-dimensional. For instance, in Example 4 of the multilevel frailty survival models, the marginal survival distribution may be left unspecified. It will be of interest to examine the asymptotic behaviour of the pseudolikelihood ratio test statistic in this situation.

Acknowledgments

The authors thank the referees and the editor for comments that substantially improved the presentation. This research was supported in part by a grant from the National Institutes of Health, U.S.A.

Appendix A

Proof of Theorem 2

Proof . For simplicity of notation, let θ0 = 0. Under the same regularity conditions as Chernoff (1954), for any θ in a neighbourhood N of θ = 0, we have

where and .

The asymptotic distribution of n1/2 A* is normal with mean zero and variance matrix $I11*$, where $I11*=I11+I12Σ22I12T$, i.e. $n1/2A1*→N(0,I11*)$ and $B11*→−I11$ as n → ∞.

Let $θ=I11−1A1*+η$ with η = Op(n −1/2). Then,

Let −2 log λ* = −2 log{PΩ0/PΩ1}.

Applying Lemma 1 and the above approximations to Ω0 and Ω1, we get

But for any set that can be approximated by C,

and therefore,

Since CΩ and CΩ0 are positively homogeneous,

where $Zn=n1/2I11−1A1*$ and $Zn→Z~N(0,I11−1I11*I11−1)$ as n → ∞. The function infθϕ(Zθ)T I11(Zθ) is a continuous function of Z. Therefore, the asymptotic distribution of −2 log λ* is the distribution of

under the assumption that $Z~N(0,I11−1I11*I11−1)$. By the equation T = max{−2 log λ*(x), 0}, we have that the asymptotic distribution of T is the distribution of

where CΩ = CΩ0 CΩ1.

Appendix B

Calculation of Fisher information matrices and asymptotic variances of moment estimators in Example 4

The loglikelihood function contributed by the ith litter is

(B1)

where ψc = ϕc/(1 − ϕc), ψt = ϕt /(1 − ϕt), πc = exp(β0)/{1 + exp(β0)} and πt = exp(β0 + β1)/{1 + exp(β0 + β1)}. Denote the score vector for θ by $Sθi=(Sβ1i,Sβ0i)T$, where ; and the score vector for ϕ by $Sϕi=(Sϕci,Sϕti)T$, where . Therefore, the Fisher information matrices I11 and I12 can be estimated by their empirical versions, $n−1∑i=1n{Sθi(Sθi)T}$ and $n−1∑i=1n{Sθi(Sϕi)T}$, respectively. The simple representation in equation (B1) avoids having different forms of density functions, binomial if ϕ = 0 and beta-binomial if ϕ > 0, and does not require the digamma function when calculating the scores.

To calculate the asymptotic variances of $ϕ^c*$ and $ϕ^t*$, we first notice that they are asymptotically independent and we only need to calculate the asymptotic variance of $ϕ^c*$ since calculation for $ϕ^t*$ is similar.

Denote g(ϕc, mi, yi, πc) = (piπc)2/[πc(1 − πc){1 + (mi − 1)ϕc}/mi] − (n0 − 1)/n0. The moment estimator $ϕ^c*$ is the solution of $∑xi=0g(ϕ^c*,mi,yi,π^c)=0$, where c = ∑xi = 0 yi / ∑xi = 0 mi. Apply the Taylor expansion of g( $ϕ^c*$, mi, yi, c) around ϕc:

Therefore,

Apply again the Taylor expansion of g(ϕc, mi, yi, c), this time around πc:

where the second to last step is by the fact that c is the solution of the estimating equation ∑xi =0(mi πcyi) = 0.

Finally, the asymptotic variance of $ϕ^c*$ is

References

• Bandeen-Roche KJ, Liang K-Y. Modelling failure-time associations in data with multiple levels of clustering. Biometrika. 1996;83:29–39.
• Beaty TH, Self SG, Liang K-Y, Connolly MA, Chase GA, Kwiterovich PO. Use of robust variance components models to analyse triglyceride data in families. Ann Hum Genet. 1985;49:315–28. [PubMed]
• Chernoff H. On the distribution of the likelihood ratio. Ann Math Statist. 1954;25:573–8.
• Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–51.
• Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. J. R. Statist. Soc. B. 2004;66:165–85.
• Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: theory and applications. Ann Statist. 1981;9:861–9.
• Graybill FA. Theory and Application of the Linear Model. North Scituate, MA: Duxbury Press; 1976.
• Haseman JK, Kupper LL. Analysis of dichotomous response data from certain toxicological experiments. Biometrics. 1979;35:281–93. [PubMed]
• Hauck WW, Donner A. Wald’s test as applied to hypotheses in logit analysis. J Am Statist Assoc. 1977;72:851–3.
• Liang K-Y, Self SG. On the asymptotic behaviour of the pseudolikelihood ratio test statistic. J. R. Statist. Soc. B. 1996;58:785–96.
• Oakes D. Bivariate survival models induced by frailties. J Am Statist Assoc. 1989;84:487–93.
• Parke WR. Pseudo maximum likelihood estimation: the asymptotic distribution. Ann Statist. 1986;14:355–7.
• Pierce DA. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann Statist. 1982;10:475–8.
• Self SG, Liang K-Y. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Statist Assoc. 1987;82:605–10.
• Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–99. [PubMed]
• Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994;50:1171–7. [PubMed]
• Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–54. [PubMed]
• Weil CS. Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food Cosmet Toxicol. 1970;8:177–82. [PubMed]
• Williams DA. Estimation bias using the beta-binomial distribution in teratology. Biometrics. 1988;44:305–9. [PubMed]

Articles from Biometrika are provided here courtesy of Oxford University Press

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.