Home | About | Journals | Submit | Contact Us | Français |

**|**Biometrika**|**PMC3372243

Formats

Article sections

- Summary
- 1. Introduction
- 2. Main results
- 3. Examples revisited
- 4. Simulation study
- 5. Discussion
- References

Authors

Related links

Biometrika. 2010 September; 97(3): 603–620.

Published online 2010 June 11. doi: 10.1093/biomet/asq031

PMCID: PMC3372243

Department of Biostatistics, Johns Hopkins University, 615 North Wolfe Street, Baltimore, Maryland 21205, U.S.A., Email: ude.hpshj@nehcnoy, ; Email: ude.hpshj@gnailyk

Received 2009 November; Revised 2010 January

Copyright © 2010 Biometrika Trust

This paper considers the asymptotic distribution of the likelihood ratio statistic *T* for testing a subset of parameter of interest *θ*, *θ* = (*γ*, *η*), *H*_{0} : *γ* = *γ*_{0}, based on the pseudolikelihood *L*(*θ*, ), where is a consistent estimator of *ϕ*, the nuisance parameter. We show that the asymptotic distribution of *T* under *H*_{0} is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, *θ*_{0}, or the true value of the nuisance parameter, *ϕ*_{0}, lies on the boundary of parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of *T* may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments.

Very often a probability model for data is indexed by two sets of parameters: the parameter of interest, *θ*, of dimension *p*, and the nuisance parameter, *ϕ*, of dimension *q*. Sometimes the likelihood function *L*(*θ*, *ϕ*), when considered jointly as a function of *θ* and *ϕ*, is ill-behaved, yet when considered as a function of *θ* alone with *ϕ* held fixed, is well-behaved. When the nuisance parameter cannot be eliminated through conditioning, marginalizing or factorization, one approach is to use the pseudolikelihood proposed by Gong & Samaniego (1981). The key idea of this approach is that inference for *θ* can be based on *L**(*θ*) = *L*(*θ*, ), known as the pseudolikelihood for *θ*, where is a consistent estimator of *ϕ*, assuming such an estimator exists. Gong & Samaniego derived the asymptotic distribution of the maximum pseudolikelihood estimator *θ**^* for *θ* under regularity conditions. Thus, Wald-based inference for *θ* can be made based on this asymptotic distribution. However, it has been well documented in the literature that such inference could be ill-behaved, and that this concern can be alleviated by using likelihood-ratio-based inference (Hauck & Donner, 1977). To this end, Liang & Self (1996) studied the asymptotic behaviour of the likelihood ratio test statistic for testing *H*_{0} : *θ*= *θ*_{0} based on *L**(*θ*). The primary purpose of this paper is to extend the work by Liang & Self (1996) to deal with situations where only a subset of *θ* is tested, and the parameter value of interest or the nuisance parameter may lie on the boundary of its parameter space.

Consider the model

$$y=X\varphi +,~N\{0,\mathrm{\Sigma}(\theta )\},$$

where *y* is an *n* × 1 vector of the response variables, *X* is the design matrix of *n* × *q* dimension and Σ(*θ*) is the *n* × *n* covariance matrix indexed by *θ*, a *p* × 1 vector. In pedigree analyses, it has been a standard procedure to estimate *ϕ* by the conventional least-squares estimator = (*X*^{T}*X*)^{−1}*X*^{T}*y* and to base inference for *θ* on *L*(*θ*, ), where *L*(*θ*, *ϕ*) is proportional to the conditional distribution of *Y* = *y* given *X*. Often we are interested in testing the significance of the multiple variance components one at a time and leave others unspecified (Beaty et al., 1985).

Consider *y _{ij}* ~

$$L({\theta}_{1},{\theta}_{2},{\widehat{\varphi}}_{1},{\widehat{\varphi}}_{2})\underset{\underset{{(2\pi {\widehat{\varphi}}_{i})}^{1/2}}{\overset{}{j=1{n}_{i}}}\text{exp}\left\{-\frac{{({y}_{ij}-{\theta}_{i})}^{2}}{2{\widehat{\varphi}}_{i}}\right\}.}{\overset{}{i=12}}$$

The null hypothesis of interest is *H*_{0} : *γ* = *θ*_{1} – *θ*_{2} = 0, leaving *η* = *θ*_{2} unspecified.

Bandeen-Roche & Liang (1996) described a class of models for failure-time data that accounts for multiple levels of clustering. This class of models permits specification of simple distributional forms for all bivariate margins. In addition, the distribution reduces to the univariate frailty model (Vaupel et al., 1979; Oakes, 1989) when there is only a single level. Consider a cluster with two levels; for example, households and villages. Suppose there are four individual members and the households are clustered as {1, 2} and {3, 4}. When the Clayton copula (Clayton, 1978) is used, the multivariate survival function has the form

$$S({t}_{1},\dots ,{t}_{4})={\left[{\left\{\sum _{j=1}^{2}{S}_{j}{({t}_{j})}^{1-{\theta}_{1}}-1\right\}}^{\alpha}+{\left\{\sum _{j=3}^{4}{S}_{j}{({t}_{j})}^{1-{\theta}_{1}}-1\right\}}^{\alpha}-1\right]}^{1/(1-{\theta}_{2})},$$

(1)

where *α* = (*θ*_{2} – 1)*/*(*θ*_{1} – 1), *S _{j}* is the marginal survival distribution of the

In teratological experiments, the response variables are often dichotomous and their sums are commonly modelled by beta-binomial distributions (Weil, 1970; Haseman & Kupper, 1979). The distributions are indexed by *π*, the probability of experiencing an adverse event, and *ϕ*, the nuisance parameter characterizing the so-called litter effect, also known as overdispersion, the tendency that the subjects from the same litter to respond more alike than subjects from different litters. Often investigators would like to establish the dose-response relationship between *π* and *x*, the dosage level or the treatment status. One may consider the model,

$$g(\pi )={\beta}_{0}+{\beta}_{1}x,\mathrm{\text{var}(Y)=m\pi (1-\pi )\{1+(m-1)\varphi \},}$$

where *g* is known as the prespecified link function. For beta-binomial models, it has been well documented that the nuisance parameter has a profound impact on the parameter of interest (Williams, 1988). One way to alleviate such concern is to base the inference of *β* on the pseudo-likelihood *L*(*β*, ) where is a consistent estimator of *ϕ*. Such an estimator could be obtained, for example, by using method of moments. One hypothesis of interest is *H*_{0} : *β*_{1} = 0, i.e. no dose-response relationship. Since the intralitter correlation could be dose-dependent, heterogeneous correlations are often assumed. When the dosage level is zero or the placebo is used, the intralitter correlation is usually close to zero, lying near or on the boundary of its parameter space.

Consider the case where the likelihood can be decomposed into two parts,

$$L(\theta ,\varphi )={L}_{1}(\theta ,\varphi ){L}_{2}(\varphi ),$$

where both *L*_{1} and *L*_{2} are legitimate likelihood functions. In this case, one could use the maximizer of *L*_{2}(*ϕ*), , to form the pseudolikelihood function *L*^{*}(*θ*) for *θ*. Many examples fit into this case. It is easy to see that Example 2 above is a special case with *L*_{2}(*ϕ*) proportional to the product of probability density functions of sample variances from both samples. Another example is genetic linkage study where the data consist of genetic marker data, *M*, and genotyping data, *T*, from pedigrees. In this situation, the likelihood function is proportional to

$$f(M|T;\theta ,\varphi )f(T;\varphi )={L}_{1}(\theta ,\varphi ){L}_{2}(\varphi ),$$

where *θ _{j}*, (

$$\begin{array}{l}f(Y=y|W=w;\theta ,\varphi )f(W=w;\varphi )=\hfill & {\int}_{x}f(y|x;\theta )f(x|w;\varphi )dx{\int}_{x}f(w|x;\varphi )f(x;\varphi )dx\hfill \hfill & =\hfill & {L}_{1}(\theta ,\varphi ){L}_{2}(\varphi ),\hfill \hfill \end{array}$$

where *θ* is the regression coefficient relating *X* to *Y* and *ϕ* is the parameter characterizing the measurement errors for the covariates.

In this subsection, we consider the cases where only a subset of the parameter of interest is tested while the true values of the parameter of interest and nuisance parameter are interior points of their parameter spaces. Consider the partition of *θ* into two parts: the parameters to be tested, *γ*, of dimension *p*_{1} and the remaining parameters, *η*, of dimension *p*_{2} = *p* − *p*_{1}. The null hypothesis of interest is specified as *H*_{0} : *γ* = *γ*_{0}. Let *T* be the pseudolikelihood ratio test statistic, i.e. *T* = −2 log [*L*^{*}{*γ*_{0}, (*γ*_{0})}*/L*^{*}(, )], where (*γ*_{0}) = arg max_{η}
*L*^{*}(*γ*_{0}, *η*) and (*γ**^, *) = arg max_{γ,η}
*L*^{*}(*γ*, *η*). Theorem 1 states the large sample distribution of *T* under *H*_{0}.

Theorem 1. *Suppose that under a distribution indexed by* (*θ*_{0}, *ϕ*_{0}), *the consistent estimator of ϕ, *, *has the asymptotic distribution n*^{1/2}( – *ϕ*_{0}) → *N*(0, Σ_{22}) *as n* → ∞.

*Then, under the regularity conditions* (A1)–(A6) *in*
*Gong & Samaniego* (1981),

*the p*×*p matrix*$$\left\{{I}_{11}^{-1}-\left(\begin{array}{ll}0\hfill & 0\hfill \\ 0\hfill & {I}_{\eta \eta}^{-1}\hfill \end{array}\right)\right\}{I}_{11}^{*}$$*has p*_{1}*positive eigenvalues,**λ*_{1}λ_{p1}>0,*and p*–*p*_{1}*zero eigenvalues;**the asymptotic distribution of the pseudolikelihood ratio statistic T is*$U={\sum}_{j=1}^{{p}_{1}}{\lambda}_{j}{U}_{j}$*, where the U*_{j}*s are independent*${\chi}_{1}^{2}$*variables.*

*Here*

$$\begin{array}{c}{I}_{11}=\underset{n\to \infty}{\text{lim}}[E\{-\frac{1}{n}\frac{{2}^{\text{log}L({\theta}_{0},{\varphi}_{0}){\theta}^{2};{\theta}_{0},{\varphi}_{0}}}{]},{I}_{\eta \eta}=\underset{n\to \infty}{\text{lim}}[E\{-\frac{1}{n}\frac{{2}^{\text{log}L({\gamma}_{0},{\eta}_{0},{\varphi}_{0}){\eta}^{2};{\gamma}_{0},{\eta}_{0},{\varphi}_{0}}}{]},\end{array}$$

*and*
${I}_{11}^{*}$
*is the asymptotic variance of n*^{−1/2} log *L*(*θ*_{0}, )/*θ*
*and can be calculated as*
${I}_{11}^{*}={I}_{11}+{I}_{12}{\mathrm{\Sigma}}_{22}{I}_{12}^{T}$
*with*

$${I}_{12}=\underset{n\to \infty}{\text{lim}}[E\{-\frac{1}{n}\frac{{2}^{\text{log}}}{]}.$$

The proof is straightforward using Taylor expansions and Theorem 4.4.4 of Graybill (1976) and hence is omitted. Implicitly, we used the fact that *n*^{1/2}{*θ**^*(*ϕ*_{0}) – *θ*_{0}}and *n*^{1/2}( – *ϕ*_{0}) are asymptotically independent because *θ**^*(*ϕ*_{0}) is asymptotically efficient and *n*^{1/2}( – *ϕ*_{0}) has asymptotic constant mean zero (Pierce, 1982; Parke, 1986). When *p*_{1} = *p*, Theorem 1 reduces to the main result of Liang & Self (1996).

It is easy to show that when
${I}_{11}^{*}={I}_{11}$, the distribution of *U* is
${\chi}_{{p}_{1}}^{2}$, which is the same as the asymptotic distribution of the conventional likelihood ratio statistic. As noted by Liang & Self (1996), this condition is satisfied when *I*_{12}(*θ*_{0}, *ϕ*) = 0 for all *ϕ*, which means that the score function for *θ* and *ϕ* are orthogonal to each other when evaluated at (*θ*_{0}, *ϕ*). Another situation where we have
${I}_{11}^{*}={I}_{11}$ is Σ_{22} = 0. This could occur when *ϕ* is estimated through external data whose sample size *n*^{*} increases at a faster rate than *n*, i.e. *n*^{*}*/n* → ∞ as *n* → ∞. A less obvious situation, which results in the distribution of *U* being
${\chi}_{{p}_{1}}^{2}$, occurs when *I*_{γη} = *I*_{γ}* _{ϕ}* =0. This means that the score function for

In this subsection, we consider cases where the true value of *θ* lies on the boundary of its parameter space while the true value of the nuisance parameter *ϕ* is an interior point of its parameter space. These situations were informally mentioned in Liang & Self (1996) but were not formally treated therein. Denote the parameter space of *θ* by Ω and rewrite the hypothesis *H*_{0} : *γ* = *γ*_{0} as *H*_{0} : *θ* Ω_{0}, where Ω_{0} = {*θ : θ* = (*γ*_{0}, *η*) Ω}. The complement of Ω_{0} in Ω is denoted by Ω_{1}. In § 2.1 where *θ*_{0} is not on the boundary, Ω, Ω_{0} and Ω_{1} are simply *R ^{p}*,

The pseudolikelihood ratio test statistic can be written as *T* = −2 log (*P*_{Ω0}/*P*_{Ω}). Gong & Samaniego (1981) established the *n*^{1/2}-consistency of the maximum pseudolikelihood estimator for *θ* when the true value of *θ* is an interior point of its parameter space. Here we give the consistency result when the true value of *θ* lies on the boundary.

Lemma 1. *If the regularity conditions in*
*Chernoff* (1954)
*hold, the intersection of *
*and the closure of neighbourhoods centred about the true value of the parameter of interest, θ*_{0}*, constitute closed subsets of R ^{p}, and θ*

We now establish the equivalence between the asymptotic distribution of the pseudolikelihood ratio and the distribution of the likelihood ratio of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix.

Theorem 2. *If the regularity conditions in*
*Chernoff* (1954)
*are satisfied*, *θ*_{0}
*is a limiting point of both* Ω_{0}
*and* Ω_{1}, *and the sets* Ω_{0}
*and* Ω_{1}
*are approximated by nonnull and disjoint cones C*_{Ω0}
*and C*_{Ω1}, *respectively, then, when*
*γ* = *γ*_{0}, *the asymptotic distribution of the pseudolikelihood ratio statistic, T, is the same as the distribution of the likelihood ratio test of θ* *C*_{Ω0}
*against θ* *C*_{Ω1}
*based on one observation from a population with multivariate normal distribution with mean and unknown covariance matrix*
${I}_{11}^{-1}{I}_{11}^{*}{I}_{11}^{-1}$
*while the covariance matrix is misspecified as*
${I}_{11}^{-1}$
*in the likelihood ratio test*.

A sketch of the proof is given in Appendix A. When there is no nuisance parameter *ϕ* or
${I}_{11}^{*}={I}_{11}$, Theorem 2 reduces to Theorem 3 of Self & Liang (1987) in which the likelihood ratio instead of pseudolikelihood ratio is considered.

The result given by Theorem 2 reduces the general problem of computing the limit distribution of a pseudolikelihood ratio to a problem of computing the distribution of

$${T}_{1}(Z)={Q}_{{C}_{{\mathrm{\Omega}}_{0}}}(Z)-{Q}_{{C}_{\mathrm{\Omega}}}(Z),$$

(2)

where *Q _{}*(

The distribution of *T*_{1}(*Z*) in any given case could be rather complicated. We now offer several special cases in which Theorem 2 may be used to compute the asymptotic distribution of the pseudolikelihood ratio statistic.

From here on we assume Ω = Ω_{1} × × Ω* _{p}*, where the Ω

Lemma 2.*Suppose Z* ~ *N*(0,
${I}_{11}^{-1}{I}_{11}^{*}{I}_{11}^{-1}$) *and * = {0}^{p1} × *R*^{p−p1}*. Let Q _{}*(

- ${Q}_{\varphi}(Z)={Z}_{({p}_{1})}^{T}{e}_{({p}_{1})}^{-1}{Z}_{({p}_{1})}$,
*where Z*_{(p1)}= (*Z*_{1}, …,*Z*_{p1})^{T}*and e*_{(p1)}*is the p*_{1}×*p*_{1}*upper left submatrix of*${I}_{11}^{-1}$; *the distribution of Q*(_{}*Z*)*is*$U={\sum}_{j=1}^{{p}_{1}}{\lambda}_{j}{U}_{j}$*, where U*${\chi}_{1}^{2}$_{j}s are independent*variables and*λ_{1}λ_{p1}> 0*are the eigenvalues of matrix*${e}_{({p}_{1})}^{-1}{e}_{({p}_{1})}^{*}$*, with*${e}_{({p}_{1})}^{*}$*being the p*_{1}×*p*_{1}*upper left submatrix of*${I}_{11}^{-1}{I}_{11}^{*}{I}_{11}^{-1}$;*and**when*${I}_{11}^{*}={I}_{11}$*, U is simply*${\chi}_{{p}_{1}}^{2}$.

Part (i) can be shown by completing the quadratic form, and the rest of the results follow by Theorem 4.4.4 of Graybill (1976).

Now we consider some special cases, adopting the notation in Self & Liang (1987). Partition the parameter vector, *θ* = (*γ*, *η*), into four coordinates: (*p*_{11}, *p*_{12}, *p*_{21}, *p* − *p*_{11} − *p*_{12} − *p*_{21}) where the first *p*_{11} coordinates of *θ* represent the parameter to be tested with true values on the boundary; the next *p*_{12} coordinates of *θ* represent the parameter to be tested with true values not on the boundary; the next *p*_{21} coordinates of *θ* represent the first *p*_{21} component of *η* with true values on the boundary; and finally the remaining *p* − *p*_{11} − *p*_{12} − *p*_{21} coordinates of *θ* represent the last *p* − *p*_{11} − *p*_{12} − *p*_{21} coordinates of *η* with the true value not on the boundary. Here *p*_{1} = *p*_{11} + *p*_{12}.

*Case* 1. Suppose that the parameter configuration is (0, *p*_{1}, 0, *p* − *p*_{1}), i.e. *θ*_{0} is an interior point of Ω. Then *C*_{Ω0} = {0}^{p}^{1} × *R ^{p–p}*

*Case* 2. Suppose the parameter configuration is (1, 0, 0, *p* – 1). Then *C*_{Ω0} = {0} × *R ^{p}*

$${T}_{1}(Z)={Z}_{1}^{2}/{e}_{(1)}-{Z}_{1}^{2}I({Z}_{1}<0)/{e}_{(1)}={Z}_{1}^{2}I({Z}_{1}>0)/{e}_{(1)},$$

where the first step is due to the fact that *Q*_{CΩ} (*Z*) = *Q*_{CΩ0} (*Z*) when *Z*_{1} 0. This is true because the quadratic function (*Z* – *θ*)^{T}*I*_{11}(*Z* – *θ*) is a continuous function of *θ* and the infimum in the function *Q*_{CΩ} (*Z*) is achieved at the boundary of *C*_{Ω}, i.e. *C*_{Ω0}. Consequently, the asymptotic distribution of *T* is a 50:50 mixture of
${\chi}_{0}^{2}$ and
${e}_{(1)}^{*}{\chi}_{1}^{2}/{e}_{(1)}$. When
${I}_{11}^{*}={I}_{11}$, this distribution is a 50:50 mixture of
${\chi}_{0}^{2}$ and
${\chi}_{1}^{2}$, which agrees with the result in Case 5 of Self & Liang (1987).

*Case* 3. Suppose the parameter configuration is (1, *p*_{1} – 1, 0, *p* – *p*_{1}). Then *C*_{Ω0} = {0} × {0}^{p}^{1–1} × *R ^{p}*

Its distribution is a 50:50 mixture of random variables *U _{a}* and

$$\left\{{e}_{({p}_{1})}^{-1}-\left(\begin{array}{ll}1/{e}_{(1)}\hfill & {0}_{1\times ({p}_{1}-1)}\hfill \\ {0}_{({p}_{1}-1)\times 1}\hfill & {0}_{({p}_{1}-1)\times ({p}_{1}-1)}\hfill \end{array}\right)\right\}{e}_{({p}_{1})}^{*},$$

and the *U _{i}* s and

In Cases 2 and 3, we take the advantage of the fact that only one component of *γ*_{0} lies on the boundary. In these cases, the parameter spaces *C*_{Ω0} and *C*_{Ω}, and their boundaries, are straight lines or planes which are preserved under linear transformations. In Cases 4 and 5, we consider the situation where there are two boundary parameters. In Case 4, both boundary parameters are tested while in Case 5, only one boundary parameter is tested. The calculation of the asymptotic distribution of the pseudolikelihood ratio statistic is more complicated than that in Cases 1–3.

*Case* 4. Suppose the parameter configuration is (2, 0, 0, 0). Then *C*_{Ω0} = {0} × {0} and *C*_{Ω} = [0, ∞) × [0, ∞). Let *I*_{11} = *P*^{T}
*P*, where *P* is a nonsingular matrix and denote _{Ω} = {*: * = *Pθ* for any *θ* *C*_{Ω}} and = *P Z*. Then *T*_{1}(*Z*) can be rewritten as

$${T}_{1}(Z)={\Vert \tilde{Z}\Vert}^{2}-\underset{\tilde{\theta}{\tilde{C}}_{\mathrm{\Omega}}}{\text{inf}}$$

(3)

where ‖ · ‖ is the usual Euclidean metric. Calculation of the second term in equation (3) depends on the location of relative to the boundary of _{Ω}. There are four regions that must be considered. The shaded region in Fig. 1 represents _{Ω}, and _{Ω0} is the origin. The angle in the shaded area is less than 180°. This is due to the fact that the convexity of *C*_{Ω} is preserved under linear mapping *P*. Denote the rotation matrix in *R*^{2} and the information matrices *I*_{11} and
${I}_{11}^{*}$ by

$${T}_{\theta}=\left(\begin{array}{ll}\text{cos}(\theta )\hfill & -\text{sin}(\theta )\hfill \\ \text{sin}(\theta )\hfill & \text{cos}(\theta )\hfill \end{array}\right),{I}_{11}=\left(\begin{array}{ll}a\hfill & b\hfill \\ b\hfill & d\hfill \end{array}\right),{I}_{11}^{*}=\left(\begin{array}{ll}a*\hfill & b*\hfill \\ b*\hfill & d*\hfill \end{array}\right),$$

respectively. Further denote the columns of *P* by *P*_{1} and *P*_{2}, and the inner product of vectors *a* and *b* by *a, b* = *a*^{T}*b*. Then the distribution of *T*_{1}(*Z*) can be calculated as

$${T}_{1}(Z)=\{\begin{array}{lll}{\Vert \tilde{Z}\Vert}^{2}~U={\lambda}_{1}{U}_{1}+{\lambda}_{2}{U}_{2}\hfill & \text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{the}\text{shaded}\text{region},\hfill & {\{\frac{\tilde{Z},{P}_{2}\Vert {P}_{2}\Vert}{\}}2~d*{\chi}_{1}^{2}/d}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}1,\hfill & {\{\frac{\tilde{Z},{P}_{1}\Vert {P}_{1}\Vert}{\}}2~a*{\chi}_{1}^{2}/a}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}2,\hfill & 0\hfill & \text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}3,\hfill \hfill \hfill \end{array}$$

where *λ*_{1} and *λ*_{2} are eigenvalues of
${I}_{11}^{*}$ and
${I}_{11}^{-1}$, and *U*_{1} and *U*_{2} are independent
${\chi}_{1}^{2}$ variables.

Diagram of the parameter space for Case 4. The shaded region, _{Ω}, represents admissible parameter values under the alternative hypothesis. Under the null hypothesis, the parameter is located at the origin. The asymptotic distribution **...**

The mixing probabilities for the shaded region, region 1 and region 2, are

$$\begin{array}{l}{p}_{s}={\text{cos}}^{-1}\left[\frac{(0,1){I}_{11}{\left\{{I}_{11}^{*}\right\}}^{-1}{I}_{11}\left(\begin{array}{l}1\\ 0\end{array}\right)}{{\left\{(0,1){I}_{11}{({I}_{11}^{*})}^{-1}{I}_{11}\left(\begin{array}{l}0\\ 1\end{array}\right)\right\}}^{1/2}{\left\{(1,0){I}_{11}{({I}_{11}^{*})}^{-1}{I}_{11}\left(\begin{array}{l}1\\ 0\end{array}\right)\right\}}^{1/2}}\right]/2\pi ,\hfill \\ {p}_{1}={\text{cos}}^{-1}\left[\frac{(0,1){I}_{11}{({I}_{11}^{*})}^{-1}{P}^{\text{T}}{T}_{\pi /2}{P}_{2}}{{\left\{(0,1){I}_{11}{({I}_{11}^{*})}^{-1}{I}_{11}\left(\begin{array}{l}0\\ 1\end{array}\right)\right\}}^{1/2}{\left\{{P}_{2}^{\text{T}}{T}_{-\pi /2}P{({I}_{11}^{*})}^{-1}{P}^{\text{T}}{T}_{\pi /2}{P}_{2}\right\}}^{1/2}}\right]/2\pi ,\hfill \\ {p}_{2}={\text{cos}}^{-1}\left[\frac{(1,0){I}_{11}{({I}_{11}^{*})}^{-1}{P}^{\text{T}}{T}_{-\pi /2}{P}_{1}}{{\left\{(1,0){I}_{11}{({I}_{11}^{*})}^{-1}{I}_{11}\left(\begin{array}{l}1\\ 0\end{array}\right)\right\}}^{1/2}{\left\{{P}_{1}^{\text{T}}{T}_{\pi /2}P{({I}_{11}^{*})}^{-1}{P}^{\text{T}}{T}_{-\pi /2}{P}_{1}\right\}}^{1/2}}\right]/2\pi .\hfill \end{array}$$

Thus the asymptotic distribution of *T* is a mixture of *U*,
${d}^{*}{\chi}_{1}^{2}/d$,
${a}^{*}{\chi}_{1}^{2}/a$ and
${\chi}_{0}^{2}$ with mixing probabilities *p _{s}*,

*Case* 5. Suppose the parameter configuration is (1, 0, 1, 0). Then *C*_{Ω0} = {0} × [0, +∞) and *C*_{Ω} = [0, ∞) × [0, ∞). By a similar argument to that in Case 4, we may write

$${T}_{1}(Z)=\underset{\tilde{\theta}{\tilde{C}}_{{\mathrm{\Omega}}_{0}}}{\text{inf}}$$

(4)

Calculation of the terms in equation (4) depends on the location of relative to the boundary of both _{Ω0} and _{Ω}. In fact, there are two possible partitions of regions that must be considered, depending on whether the angle *ψ* spanned by *P*_{1} and *P*_{2} is greater than *π/*2. It is easy to show *ψ > π/*2 if and only if *b <* 0.

If *ψ > π/*2, i.e. *b <* 0, there are five different regions to be considered. The shaded region in Fig. 2 represents _{Ω}, and _{Ω0} is the half-line in the same direction as the vector *P*_{2} with the endpoint at the origin. The distribution of *T*_{1}(*Z*) can be calculated as

$${T}_{1}(Z)=\{\begin{array}{lll}{\Vert \tilde{Z}\Vert}^{2}~{\lambda}_{1}{U}_{1}+{\lambda}_{2}{U}_{2}\hfill & \text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}1,\hfill & {\Vert \tilde{Z}\Vert}^{2}-{\{\frac{\tilde{Z},{P}_{2}\Vert {P}_{2}\Vert}{\}}2~{\lambda}_{3}{\chi}_{1}^{2}}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}2,\hfill & 0\hfill & \text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}3\text{or}4,{\{\frac{\tilde{Z},{P}_{1}\Vert {P}_{1}\Vert}{\}}2~a*{\chi}_{1}^{2}/a}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}5,\hfill \hfill \hfill \hfill \end{array}$$

where *λ*_{1} and *λ*_{2} are eigenvalues of
${I}_{11}^{*}{I}_{11}^{-1}$, *U*_{1} and *U*_{2} are independent
${\chi}_{1}^{2}$ variables, and *λ*_{3} = (*a*^{*}*d*^{2} − 2*b*^{*}*bd* + *d*^{*}*b*^{2})/{*d*(*ad* − *b*^{2})}.

Diagrams of the parameter space for Case 5 when the angle spanned by *P*_{1} and *P*_{2} is (a) greater than *π/*2, or (b) is less than *π/*2. The shaded region, _{Ω}, represents admissible parameter values under the alternative **...**

By a similar argument, the distribution of *T*_{1}(*Z*) when *ψ* *π/*2, i.e. *b* 0, is calculated as

$${T}_{1}(Z)=\{\begin{array}{l}{\Vert \tilde{Z}\Vert}^{2}-{\{\frac{\tilde{Z},{P}_{2}\Vert {P}_{2}\Vert}{\}}2~{\lambda}_{1}{\chi}_{1}^{2}}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}1,\hfill & 0\hfill & \text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}2\text{or}3,\hfill & {\{\frac{\tilde{Z},{P}_{1}\Vert {P}_{1}\Vert}{\}}2~a*{\chi}_{1}^{2}/a}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}4,\hfill & {\{\frac{\tilde{Z},{P}_{1}\Vert {P}_{1}\Vert}{\}}2-{\{\frac{\tilde{Z},{P}_{2}\Vert {P}_{2}\Vert}{\}}2~{\lambda}_{2}{U}_{1}+{\lambda}_{3}{U}_{2}}^{}\text{if}\hfill & \tilde{Z}\text{is}\text{in}\text{region}5,\hfill}^{}\hfill \hfill \hfill \end{array}$$

where *λ*_{1} = (*a*^{*}*d*^{2} − 2*b*^{*}*bd* + *d*^{*}*b*^{2})*/*{*d*(*ad* − *b*^{2})}, *U*_{1} and *U*_{2} are independent
${\chi}_{1}^{2}$ variables, and *λ*_{2} and *λ*_{3} are eigenvalues of

$${I}_{11}\left(\begin{array}{cc}1/a& 0\\ 0& -1/d\end{array}\right){I}_{11}^{*}{I}_{11}^{-1}.$$

In this subsection, we consider cases where the true value of nuisance parameter, *ϕ*, lies on the boundary of its parameter space. In this situation, the asymptotic distribution of the pseudolikelihood ratio statistic is more complicated since the consistent estimator may not be asymptotically normally distributed. Specifically, by a similar argument as in the proof of Theorem 2, the pseudolikelihood ratio statistic can be written as

$$T=\underset{\theta {C}_{{\mathrm{\Omega}}_{0}}{({Z}_{n}-\theta )}^{\text{T}}{I}_{11}({Z}_{n}-\theta )-\underset{\theta {C}_{\mathrm{\Omega}}{({Z}_{n}-\theta )}^{\text{T}}{I}_{11}({Z}_{n}-\theta )+{o}_{p}(1),}{\text{inf}}}{\text{inf}}$$

where

$${Z}_{n}={n}^{-1/2}{I}_{11}^{-1}\frac{\text{log}L({\theta}_{0},{\varphi}_{0})\theta +{I}_{11}^{-1}{I}_{12}{n}^{1/2}(\widehat{\varphi}-{\varphi}_{0})+{o}_{p}(1).}{}$$

The second term in *Z _{n}* may not be asymptotically normally distributed and the results by Pierce (1982) and Parke (1986) cannot be applied here to show the asymptotic independence between the two terms in

In fact, the asymptotic distribution of *Z _{n}* is a mixture of normals with identical mean zero and possibly unequal variances. To show this, denote by

$${\widehat{\varphi}}^{*}I({\widehat{\varphi}}_{1}^{*}>0)+(\begin{array}{ccc}& 0& \\ {\widehat{\varphi}}_{2}^{*}& -({\mathrm{\Sigma}}_{22}^{21}/{\mathrm{\Sigma}}_{22}^{11})& {\widehat{\varphi}}_{1}^{*}\\ & & {\widehat{\varphi}}_{q}^{*}& -({\mathrm{\Sigma}}_{22}^{q1}/{\mathrm{\Sigma}}_{22}^{11})& {\widehat{\varphi}}_{1}^{*}\end{array})I({\widehat{\varphi}}_{1}^{*}0),$$

where
${\mathrm{\Sigma}}_{22}^{ij}$ are elements of the matrix Σ_{22}, the asymptotic variance of *ϕ*^{*}. Therefore, *Z _{n}* can be rewritten as

$$\begin{array}{l}\{{n}^{-1/2}{I}_{11}^{-1}\frac{\text{log}L({\theta}_{0},{\varphi}_{0})\theta +{I}_{11}^{-1}{I}_{12}{n}^{1/2}({\widehat{\varphi}}^{*}-{\varphi}_{0})\}I({\widehat{\varphi}}_{1}^{*}0)+\{{n}^{-1/2}{I}_{11}^{-1}\frac{\text{log}L({\theta}_{0},{\varphi}_{0})\theta +{I}_{11}^{-1}{I}_{12}A{n}^{1/2}({\widehat{\varphi}}^{*}-{\varphi}_{0})\}I({\widehat{\varphi}}_{1}^{*}0),}{}}{}\end{array}$$

where

$$A=(\begin{array}{cccc}0& 0& \dots & 0\\ -{\mathrm{\Sigma}}_{22}^{21}/{\mathrm{\Sigma}}_{22}^{11}& 1& \dots & 0\\ & -{\mathrm{\Sigma}}_{22}^{q1}/{\mathrm{\Sigma}}_{q2}^{11}& 0& \dots & 1& )& .\end{array}$$

Suppose that as *n* → ∞,

$${n}^{1/2}(\begin{array}{c}\frac{1}{n}\frac{\text{log}L({\theta}_{0},{\varphi}_{0})\theta {\widehat{\varphi}}^{*}-{\varphi}_{0}}{})\to \left(\begin{array}{l}X\\ Y\end{array}\right)~N\left\{\left(\begin{array}{l}0\\ 0\end{array}\right),\left(\begin{array}{cc}{I}_{11}& {\mathrm{\Sigma}}_{12}\\ {\mathrm{\Sigma}}_{12}^{\text{T}}& {\mathrm{\Sigma}}_{22}\end{array}\right)\right\},\end{array}$$

(5)

where Σ_{12} is the asymptotic covariance between *n*^{−1/2} log *L*(*θ*_{0}, *ϕ*_{0})*/θ* and *n*^{1/2}(^{*} − *ϕ*_{0}), which is in fact zero by Pierce (1982) and Parke (1986). The asymptotic distribution of *Z _{n}* is

$$({I}_{11}^{-1}X+{I}_{11}^{-1}{I}_{12}Y)I({Y}_{1}>0)+({I}_{11}^{-1}X+{I}_{11}^{-1}{I}_{12}AY)I({Y}_{1}0).$$

(6)

Thus the asymptotic distribution of *Z _{n}* is a 50:50 mixture of normals with mean zero. For more general cases where the parameter space for

The asymptotic distribution of the pseudolikelihood ratio statistic *T* is

$$\underset{\theta {C}_{{\mathrm{\Omega}}_{0}}{(Z-\theta )}^{\text{T}}{I}_{11}(Z-\theta )-\underset{\theta {C}_{\mathrm{\Omega}}{(Z-\theta )}^{\text{T}}{I}_{11}(Z-\theta ),}{\text{inf}}}{\text{inf}}$$

where *Z* is the asymptotic distribution of *Z _{n}*. Therefore, the asymptotic distribution of

It can be shown that

$$\frac{\text{log}L(\theta ,\varphi )\varphi ={X}^{\text{T}}{\mathrm{\Sigma}}^{-1}(\theta )(y-X\varphi ).}{}$$

Therefore,

$${I}_{12}(\theta ,\varphi )=E[-\frac{\theta \left\{{X}^{\text{T}}{\mathrm{\Sigma}}^{-1}(\theta )(y-X\varphi )\right\}}{]}=\{-\frac{\theta {X}^{\text{T}}{\mathrm{\Sigma}}^{-1}(\theta )}{\}}E(y-X\varphi )=0,$$

for all *θ* and *ϕ*, thus
${I}_{11}^{*}={I}_{11}$. By result in Case 2, the pseudolikelihood ratio testing the *j* th variance component being zero, i.e. *H*_{0} : *θ _{j}* = 0, has an asymptotically 50:50 mixture of
${\chi}_{0}^{2}$ and
${\chi}_{1}^{2}$.

It can be shown that

$$E\{\frac{{2}^{\text{log}L(\theta ,\varphi ){\theta}_{{i}^{\prime}}{\varphi}_{i}\}=-{\delta}_{{i}^{\prime}i}E\left(\sum _{j=1}^{{n}_{i}}\frac{{y}_{ij}-{\theta}_{i}}{{\varphi}_{1}^{2}}\right)=0({i}^{\prime},i=1,2),}}{}$$

where *δ _{i}*

Let *y _{i}* = (

$$L({\theta}_{1},{\theta}_{2},\varphi )\frac{{4}^{S}{t}_{1}{t}_{2}{t}_{3}{t}_{4}=\frac{{h}_{1}{t}_{1}\frac{{h}_{1}{t}_{2}\frac{{h}_{2}{t}_{3}\frac{{h}_{2}{t}_{4}\frac{{4}^{S}{h}_{1}^{2}{h}_{2}^{2},}{}}{}}{}}{}}{}}{}$$

where

$$\begin{array}{c}{h}_{1}=\sum _{j=1}^{2}{S}_{j}{({t}_{j})}^{1-{\theta}_{1}}-1,{h}_{2}=\sum _{j=3}^{4}{S}_{j}{({t}_{j})}^{1-{\theta}_{1}}-1,& S={h}^{1/(1-{\theta}_{2})}={({h}_{1}^{a}+{h}_{2}^{\alpha}-1)}^{1/(1-{\theta}_{2})},{h}_{1}/{t}_{j}=(1-{\theta}_{1}){S}_{j}{({t}_{j})}^{1-{\theta}_{1}}{\lambda}_{j}({t}_{j})(j=1,2),{h}_{2}/{t}_{j}=(1-{\theta}_{1}){S}_{j}{({t}_{j})}^{1-{\theta}_{1}}{\lambda}_{j}({t}_{j})(j=3,4),\end{array}$$

*λ _{j}* (

$$\begin{array}{l}\frac{{4}^{S}{h}_{1}^{2}{h}_{2}^{2}={\alpha}^{2}{(\alpha -1)}^{2}{h}_{1}^{\alpha -2}{h}_{2}^{\alpha -2}\frac{{2}^{S}{h}_{2}+{\alpha}^{3}(\alpha -1)\left\{{h}_{1}^{2(\alpha -1)}{h}_{1}^{\alpha -2}+{h}_{1}^{\alpha -2}{h}_{2}^{2(\alpha -1)}\right\}\frac{{3}^{S}{h}^{3}}{}& +{\alpha}^{4}{({h}_{1}{h}_{2})}^{2(\alpha -1)}\frac{{4}^{S}{h}^{4}}{}}{}}{}\end{array}$$

with
${l}^{S}$ (*l* = 2, 3, 4). A hypothesis of interest is *H*_{0} : *θ*_{2} = 1 which implies the independence of *y _{ij}* s for two individuals from different households.

The parameters *θ*_{1} and *θ*_{2} must satisfy the inequality *θ*_{1} *θ*_{2} 1. The cones which approximate Ω_{0} and Ω_{1} are [0, ∞) × {0} and {(*t*_{1}*, t*_{2}) : *t*_{1} *t*_{2} 0}. Since the parameter space cannot be written as the product of intervals, results in Cases 1–5 may not be applied. A simple reparameterization from *θ* = (*θ*_{1}*, θ*_{2}) to *τ* = (*θ*_{2}*, δ*), *δ* = *θ*_{1} − *θ*_{2}, yields the cones approximating the parameter space of (*θ*_{2}*, δ*) being {0} × *R* and [0, ∞) × *R*. By result in Case 2, the asymptotic distribution of the pseudolikelihood ratio is

$$\frac{1}{2}{\chi}_{0}^{2}+\frac{1}{2}\frac{{e}_{(1)}^{*}}{{e}_{(1)}}{\chi}_{1}^{2},$$

where *e*_{(1)} and
${e}_{(1)}^{*}$ are the (1, 1) elements of matrix
${I}_{11}^{-1}$ and
${I}_{11}^{-1}{I}_{11}^{*}{I}_{11}^{-1}$, respectively,

$${I}_{11}=\underset{n\to \infty}{\text{lim}}[E\{-\frac{1}{n}\frac{{2}^{\text{log}}\}}{}],{I}_{12}=\underset{n\to \infty}{\text{lim}}[E\{-\frac{1}{n}\frac{{2}^{\text{log}}\tau \varphi ;{\tau}_{0},{\varphi}_{0}\}}{}],$$

*L*(*τ, ϕ*) = *L*(*θ*_{2} + *δ, θ*_{2}*, ϕ*),
${I}_{11}^{*}={I}_{11}+{I}_{12}{\mathrm{\Sigma}}_{22}{I}_{12}^{\text{T}}$, and Σ_{22} is the asymptotic variance of the consistent estimator of *ϕ*.

For simplicity, we consider the special case of a single treatment group and a control group and assume a logistic dose–response model. Denote *x _{i}* = 0 if the litter

$$\{\begin{array}{c}{\mathrm{\Sigma}}_{{x}_{i}=0}\{{({p}_{i}-{\pi}_{c})}^{2}/[{\pi}_{c}(1-{\pi}_{c})\{1+({m}_{i}-1){\varphi}_{c}\}/{m}_{i}]-({n}_{0}-1)/{n}_{0}\}=0,\\ {\mathrm{\Sigma}}_{{x}_{i}=1}\{{({p}_{i}-{\pi}_{t})}^{2}/[{\pi}_{t}(1-{\pi}_{t})\{1+({m}_{i}-1){\varphi}_{t}\}/{m}_{i}]-({n}_{1}-1)/{n}_{1}\}=0,\end{array}$$

where *p _{i}* =

To alleviate the impact of the nuisance parameter on the parameter of interest, one could base the inference for *β* on the pseudolikelihood *L*^{*}(*β*_{0}, *β*_{1}) = *L*(*β*_{0}, *β*_{1}, * _{c}*,

To explore the theoretical findings empirically, we conducted two simulation studies. In the first, we applied the pseudolikelihood ratio test to test for the statistical significance of a variance component in the model described in Example 1 and compared its performance with the likelihood ratio test based on its asymptotic distribution and the exact finite sample distribution (Crainiceanu & Ruppert, 2004). The model described in Example 1 belongs to the linear mixed model with one variance component, specifically,

$$y=X\varphi +Zb+,\mathrm{E(\begin{array}{l}b\\ \\ )\end{array}=\left(\begin{array}{l}{0}_{K}\\ {0}_{n}\end{array}\right),\text{cov}(\begin{array}{l}b\\ \\ )\end{array}=(\begin{array}{cc}{\sigma}_{b}^{2}{I}_{K}& 0\\ 0& {\sigma}_{2}^{{I}_{n}}\end{array}),}$$

(7)

where *b* is random effects due to specific genetic or nongenetic factors shared among relatives, and represents random deviations unique to each individual. With nuisance parameter *ϕ* replaced by the conventional least-squares estimator = (*X*^{T}*X*)^{−1}*X*^{T}*y*, the pseudolikelihood is
${L}^{*}({\sigma}_{b}^{2},{\sigma}_{2}^{)}$ and the pseudolikelihood ratio test statistic for testing
${H}_{0}:{\sigma}_{b}^{2}=0$ is

$$T=2\{\underset{\{{\sigma}_{b}^{2}0,{\sigma}_{2}^{0}{L}^{*}({\sigma}_{b}^{2},{\sigma}_{2}^{})-\underset{\left\{{\sigma}_{b}^{2}=0,{\sigma}_{2}^{0}{L}^{*}({\sigma}_{b}^{2},{\sigma}_{2}^{})\right\}.}{\text{sup}}}{\text{sup}}$$

Maximizing
${L}^{*}({\sigma}_{b}^{2},{\sigma}_{2}^{)}$ over
${\sigma}_{b}^{2}$ and
${\sigma}_{2}^{}$ simultaneously by Newton–Raphson or Fisher’s scoring method could lead to negative estimates of variances, so we reparameterized
${\sigma}_{b}^{2}$ by
$\lambda {\sigma}_{2}^{}$, and maximized the pseudolikelihood first for fixed *λ* and then over *λ*. Testing the null hypothesis
${H}_{0}:{\sigma}_{b}^{2}=0$ is equivalent to testing *H*_{0} : *λ* = 0 (Crainiceanu & Ruppert, 2004). Twice the log-pseudolikelihood function is

$$2\text{log}\{L(\widehat{\varphi},\lambda ,{\sigma}_{2}^{})\}=-n\text{log}({\sigma}_{2}^{})-\text{log}\left|{V}_{\lambda}\right|-\frac{{(y-X\widehat{\varphi})}^{\text{T}}{V}_{\lambda}^{-1}(y-X\widehat{\varphi})}{{\sigma}_{2}^{}}$$

where *V _{λ}* =

$$T=\underset{\lambda 0[-\text{log}\left|{V}_{\lambda}\right|+n\text{log}({\tilde{\sigma}}_{2}^{})-n\text{log}\{{\widehat{\sigma}}_{2}^{(}\}].}{\text{sup}}$$

(8)

The above supremum can be found by a grid search over possible values of *λ*. The likelihood ratio test statistic has a form similar to equation (8), except that
$\stackrel{(}{{\sigma}_{2}^{}^}$ is replaced by
${y}^{\text{T}}{P}_{\lambda}^{\text{T}}{V}_{\lambda}^{-1}{P}_{\lambda}y/n$, where
${P}_{\lambda}={I}_{n}-X{({X}^{\text{T}}{V}_{\lambda}^{-1}X)}^{-1}{X}^{\text{T}}{V}_{\lambda}^{-1}$.

To obtain the size of the tests, we drew *y* from (7) with
${\sigma}_{b}^{2}=0$. The family size was set to 3 and the number of families varied from 20 to 40. The true value of *ϕ* was set to be (1, 2, 3)^{T} and the matrix *X* was fixed at randomly drawn numbers from standard normal distribution. The distribution of the pseudolikelihood ratio test statistic does not depend on the magnitude of
${\sigma}_{2}^{}$ because testing the random effect is equivalent to testing the within group correlation
${\sigma}_{b}^{2}/({\sigma}_{b}^{2}+{\sigma}_{2}^{)}$, which only depends on
${\sigma}_{b}^{2}/{\sigma}_{2}^{}$. For simplicity, we fixed
${\sigma}_{2}^{=}$. For each generated dataset, we computed the pseudolikelihood ratio test statistic and the likelihood ratio test statistic, and rejected the null if the test statistic is larger than the (1 − *α*)th quantile of 50:50 mixture of
${\chi}_{0}^{2}$ and
${\chi}_{1}^{2}$, i.e. the (1 − 2*α*)th quantile of
${\chi}_{1}^{2}$. We also computed the exact distribution of the likelihood ratio test based on results in Crainiceanu & Ruppert (2004) and obtained the (1 − *α*)th quantile of its distribution. To obtain the power of the test, a similar procedure was implemented, except that *y* was drawn from model (7) with
${\sigma}_{b}^{2}/{\sigma}_{2}^{=}$, 0.2 and 0.4, respectively.

Table 1 shows results from 5000 simulations. The nominal level for Type I error, *α*, was set to 0.05 and 0.01. When the null hypothesis is true, both the pseudolikelihood ratio test and the likelihood ratio test based on its exact distribution have proportions of rejections within 95% confidence intervals for the nominal error rates, i.e. 0.007–0.013 for *α* = 0.01, and 0.044–0.056 for *α* = 0.05, suggesting that the asymptotic approximation for the pseudolikelihood ratio test statistic is adequate for moderate sample sizes. The pseudolikelihood test is slightly less powerful than the likelihood ratio test based on its exact distribution. On the other hand, the likelihood ratio test based on its asymptotic distribution is liberal but more powerful compared to the other two tests. In general, the pseudolikelihood ratio test is more conservative and less powerful than the likelihood ratio test when the nuisance parameter is estimated by the maximum likelihood estimator under the null.

In the second simulation study, we applied the pseudolikelihood ratio test to test for the treatment effect in a two-group teratological experiment. We simulated data that mimic the litter size structure of Weil (1970) with *β*_{0} = 1.05, *ϕ _{t}* = 0.05 and

One interesting area of future research is to extend the current results to situations where the nuisance parameters are infinite-dimensional. For instance, in Example 4 of the multilevel frailty survival models, the marginal survival distribution may be left unspecified. It will be of interest to examine the asymptotic behaviour of the pseudolikelihood ratio test statistic in this situation.

The authors thank the referees and the editor for comments that substantially improved the presentation. This research was supported in part by a grant from the National Institutes of Health, U.S.A.

*Proof .* For simplicity of notation, let *θ*_{0} = 0. Under the same regularity conditions as Chernoff (1954), for any *θ* in a neighbourhood *N* of *θ* = 0, we have

$$\begin{array}{l}2\{\text{log}{L}^{*}(\theta )-\text{log}{L}^{*}(\theta )\}=\hfill & 2\left\{\text{log}L(\theta ,\widehat{\varphi})-\text{log}L(0,\widehat{\varphi})\right\}\hfill \hfill & \hfill & =\hfill & 2{\{\frac{\text{log}L(0,\widehat{\varphi})\theta \}}{\text{T}}\theta +{\theta}^{\text{T}}\{\frac{{2}^{\text{log}L(0,\widehat{\varphi}){\theta}^{2}}\theta +{O}_{p}(n){\Vert \theta \Vert}^{3}}{}\hfill & =\hfill & 2{({n}^{1/2}{A}_{1}^{*})}^{\text{T}}({n}^{1/2}\theta )+{({n}^{1/2}\theta )}^{\text{T}}{B}_{11}^{*}({n}^{1/2}\theta )+{O}_{p}(n){\Vert \theta \Vert}^{3},\hfill}^{}\hfill \end{array}$$

where ${A}_{1}^{*}={n}^{-1}{\sum}_{i=1}^{n}\text{log}L(0,\widehat{\varphi})/\theta $ and ${B}_{11}^{*}={n}^{-1}{\sum}_{i=1}^{n}{2}^{\text{log}}$.

The asymptotic distribution of *n*^{1/2}
*A*^{*} is normal with mean zero and variance matrix
${I}_{11}^{*}$, where
${I}_{11}^{*}={I}_{11}+{I}_{12}{\mathrm{\Sigma}}_{22}{I}_{12}^{\text{T}}$, i.e.
${n}^{1/2}{A}_{1}^{*}\to N(0,{I}_{11}^{*})$ and
${B}_{11}^{*}\to -{I}_{11}$ as *n* → ∞.

Let
$\theta ={I}_{11}^{-1}{A}_{1}^{*}+\eta $ with *η* = *O _{p}*(

$$\begin{array}{l}2\{\text{log}{L}^{*}(\theta )-\text{log}{L}^{*}(0)\}=2{({n}^{1/2}{A}_{1}^{*})}^{\text{T}}({n}^{1/2}\theta )+{({n}^{1/2}\theta )}^{\text{T}}{I}_{11}^{*}({n}^{1/2}\theta )+{O}_{p}(1)\hfill \hfill & \hfill & =2{({n}^{1/2}\eta )}^{\text{T}}{I}_{11}({n}^{1/2}\eta )+{({n}^{1/2}{A}_{1}^{*})}^{\text{T}}{I}_{11}^{-1}({n}^{1/2}{A}_{1}^{*})+{O}_{p}(1).\hfill \end{array}$$

Let −2 log *λ*^{*} = −2 log{*P*_{Ω0}/*P*_{Ω1}}.

Applying Lemma 1 and the above approximations to Ω_{0} and Ω_{1}, we get

$$-2\text{log}{\lambda}^{*}=\underset{\theta {\mathrm{\Omega}}_{0}}{\text{inf}}$$

But for any set that can be approximated by *C _{}*,

$$\underset{\theta {(y-\theta )}^{\text{T}}{I}_{11}(y-\theta )=\underset{\theta {C}_{}{(y-\theta )}^{\text{T}}{I}_{11}(y-\theta )+o({\Vert y\Vert}^{2}),}{\text{inf}}}{\text{inf}}$$

and therefore,

$$\begin{array}{lll}-2\text{log}{\lambda}^{*}\hfill & =\hfill & \underset{\theta {C}_{{\mathrm{\Omega}}_{0}}}{\text{inf}}\hfill & \hfill & \hfill & -\underset{\theta {C}_{{\mathrm{\Omega}}_{1}}}{\text{inf}}\hfill \end{array}$$

Since *C*_{Ω} and *C*_{Ω0} are positively homogeneous,

$$-2\text{log}{\lambda}^{*}=\underset{\theta {C}_{{\mathrm{\Omega}}_{0}}}{\text{inf}}$$

where
${Z}_{n}={n}^{1/2}{I}_{11}^{-1}{A}_{1}^{*}$ and
${Z}_{n}\to Z~N(0,{I}_{11}^{-1}{I}_{11}^{*}{I}_{11}^{-1})$ as *n* → ∞. The function inf_{θϕ}(*Z* − *θ*)^{T}
*I*_{11}(*Z* − *θ*) is a continuous function of *Z*. Therefore, the asymptotic distribution of −2 log *λ*^{*} is the distribution of

$$\underset{\theta {C}_{{\mathrm{\Omega}}_{0}}}{\text{inf}}$$

under the assumption that
$Z~N(0,{I}_{11}^{-1}{I}_{11}^{*}{I}_{11}^{-1})$. By the equation *T* = max{−2 log *λ*^{*}(*x*), 0}, we have that the asymptotic distribution of *T* is the distribution of

$$\underset{\theta {C}_{{\mathrm{\Omega}}_{0}}}{\text{inf}}$$

where *C*_{Ω} = *C*_{Ω0} *C*_{Ω1}.

The loglikelihood function contributed by the *i*th litter is

$$\begin{array}{l}\text{log}\mathrm{{L}_{i}(\theta ,\varphi )}\text{log}\mathrm{Pr({y}_{i}|{m}_{i},{x}_{i};\theta ,\varphi )}& =I({x}_{i}=0)\left\{\sum _{k=0}^{{y}_{i}-1}\text{log}({\pi}_{c}+k{\psi}_{c})+\sum _{k=0}^{{m}_{i}-{y}_{i}-1}\text{log}(1-{\pi}_{c}+k{\psi}_{c})-\sum _{k=0}^{{m}_{i}-1}\text{log}(1+k{\psi}_{c})\right\}\\ & +I({x}_{i}=1)\{\sum _{k=0}^{{y}_{i}-1}\text{log}({\pi}_{t}+k{\psi}_{t})+\sum _{k=0}^{{m}_{i}-{y}_{i}-1}\text{log}(1-{\pi}_{t}+k{\psi}_{t})-\sum _{k=0}^{{m}_{i}-1}\text{log}(1+k{\psi}_{t})\},\end{array}$$

(B1)

where *ψ*_{c} = *ϕ _{c}*/(1 −

To calculate the asymptotic variances of ${\widehat{\varphi}}_{c}^{*}$ and ${\widehat{\varphi}}_{t}^{*}$, we first notice that they are asymptotically independent and we only need to calculate the asymptotic variance of ${\widehat{\varphi}}_{c}^{*}$ since calculation for ${\widehat{\varphi}}_{t}^{*}$ is similar.

Denote *g*(*ϕ _{c}, m_{i}, y_{i}, π_{c}*) = (

$$\sum _{{x}_{i}=0}[g({\varphi}_{c},{m}_{i},{y}_{i},{\widehat{\pi}}_{c})+\{g({\varphi}_{c},{m}_{i},{y}_{i},{\widehat{\pi}}_{c})/{\varphi}_{c}\}({\widehat{\varphi}}_{c}^{*}-{\varphi}_{c})]\approx 0.$$

Therefore,

$${{n}_{0}}^{1/2}({\widehat{\varphi}}_{c}^{*}-{\varphi}_{c})\approx {\{-{n}_{0}^{-1}\sum _{{x}_{i}=0}g({\varphi}_{c},{m}_{i},{y}_{i},{\widehat{\pi}}_{c})/{\varphi}_{c}\}-1{n}_{0}^{-1/2}\sum _{{x}_{i}=0}g({\varphi}_{c},{m}_{i},{y}_{i},{\widehat{\pi}}_{c}).}^{}$$

Apply again the Taylor expansion of *g*(*ϕ _{c}, m_{i}, y_{i}, _{c}*), this time around

$$\begin{array}{lll}{{n}_{0}}^{1/2}({\widehat{\varphi}}_{c}^{*}-{\varphi}_{c})\hfill & \approx \hfill & {\{-{n}_{0}^{-1}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\varphi}_{c}}{}\}-1}^{}\hfill & \hfill & \times [\left\{{{n}_{0}}^{-1/2}\sum _{{x}_{i}=0}g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c})\right\}+\left\{{n}_{0}^{-1}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\pi}_{c}}{}\}{{n}_{0}}^{1/2}({\widehat{\pi}}_{c}-{\pi}_{c})\right]\hfill & \approx \hfill & {\{-{n}_{0}^{-1}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\varphi}_{c}}{\}}-1\left[\right\{{{n}_{0}}^{-1/2}\sum _{{x}_{i}=0}g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c})\}}^{}\hfill & \hfill & -\left\{{n}_{0}^{-1}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\pi}_{c}}{}\}{\left\{{n}_{0}^{-1}\sum _{{x}_{i}=0}{m}_{i}\right\}}^{-1}{{n}_{0}}^{-1/2}\sum _{{x}_{i}=0}({m}_{i}{\pi}_{c}-{y}_{i})\right]\hfill & =\hfill & {\{-{n}_{0}^{-1}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\varphi}_{c}}{}\}-1\times {{n}_{0}}^{-1/2}\sum _{{x}_{i}=0}}^{}\hfill & \hfill & \times [g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c})-\left\{{n}_{0}^{-1}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\pi}_{c}}{}\}{\left\{{n}_{0}^{-1}\sum _{{x}_{i}=0}{m}_{i}\right\}}^{-1}({m}_{i}{\pi}_{c}-{y}_{i})\right],\hfill \hfill \hfill \hfill \hfill \hfill \end{array}$$

where the second to last step is by the fact that * _{c}* is the solution of the estimating equation ∑

Finally, the asymptotic variance of ${\widehat{\varphi}}_{c}^{*}$ is

$$\begin{array}{l}{\{-\frac{1}{{n}_{0}}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\varphi}_{c}}{}\}-2}^{}\times \frac{1}{{n}_{0}}\sum _{{x}_{i}=0}{[g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c})-\left\{\frac{1}{{n}_{0}}\sum _{{x}_{i}=0}\frac{g({\varphi}_{c},{m}_{i},{y}_{i},{\pi}_{c}){\pi c}_{}}{}\}{\left\{\frac{1}{{n}_{0}}\sum _{{x}_{i}=0}{m}_{i}\right\}}^{-1}({m}_{i}{\pi}_{c}-{y}_{i})\right]2.}^{}\end{array}$$

- Bandeen-Roche KJ, Liang K-Y. Modelling failure-time associations in data with multiple levels of clustering. Biometrika. 1996;83:29–39.
- Beaty TH, Self SG, Liang K-Y, Connolly MA, Chase GA, Kwiterovich PO. Use of robust variance components models to analyse triglyceride data in families. Ann Hum Genet. 1985;49:315–28. [PubMed]
- Chernoff H. On the distribution of the likelihood ratio. Ann Math Statist. 1954;25:573–8.
- Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–51.
- Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. J. R. Statist. Soc. B. 2004;66:165–85.
- Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: theory and applications. Ann Statist. 1981;9:861–9.
- Graybill FA. Theory and Application of the Linear Model. North Scituate, MA: Duxbury Press; 1976.
- Haseman JK, Kupper LL. Analysis of dichotomous response data from certain toxicological experiments. Biometrics. 1979;35:281–93. [PubMed]
- Hauck WW, Donner A. Wald’s test as applied to hypotheses in logit analysis. J Am Statist Assoc. 1977;72:851–3.
- Liang K-Y, Self SG. On the asymptotic behaviour of the pseudolikelihood ratio test statistic. J. R. Statist. Soc. B. 1996;58:785–96.
- Oakes D. Bivariate survival models induced by frailties. J Am Statist Assoc. 1989;84:487–93.
- Parke WR. Pseudo maximum likelihood estimation: the asymptotic distribution. Ann Statist. 1986;14:355–7.
- Pierce DA. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann Statist. 1982;10:475–8.
- Self SG, Liang K-Y. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Statist Assoc. 1987;82:605–10.
- Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–99. [PubMed]
- Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994;50:1171–7. [PubMed]
- Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–54. [PubMed]
- Weil CS. Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food Cosmet Toxicol. 1970;8:177–82. [PubMed]
- Williams DA. Estimation bias using the beta-binomial distribution in teratology. Biometrics. 1988;44:305–9. [PubMed]

Articles from Biometrika are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |