|Home | About | Journals | Submit | Contact Us | Français|
The mixed-effects models with two variance components are often used to analyze longitudinal data. For these models, we compare two approaches to estimating the variance components, the analysis of variance approach and the spectral decomposition approach. We establish a necessary and sufficient condition for the two approaches to yield identical estimates, and some sufficient conditions for the superiority of one approach over the other, under the mean squared error criterion. Applications of the methods to circular models and longitudinal data are discussed. Furthermore, simulation results indicate that better estimates of variance components do not necessarily imply higher power of the tests or shorter confidence intervals.
In the past two decades, the mixed-effects linear model has received considerable attention from both theoretical and practical points of view due to its extensive applications, for example, in analyzing longitudinal data arising from the health sciences, computer graphics and mechanical engineering. For a comprehensive overview of this model, see Davidian and Giltinan (1996), Diggle, et al. (2002) and Demidenko (2004).
Consider the mixed-effects linear model with two variance components
where β is p × 1 vector of fixed effects, ui is random effect, ui and εit are assumed to be independent and normally distributed with mean 0 and variance , respectively. Writing this model in matrix form, we have
where y and X are of dimensions n × 1 and n × p, respectively, n = NT, Z = IN 1T, denotes the Kronecker product, IN is the identity matrix of order N, and 1T is a vector of ones of dimension T, u = (u1, u2,…, uN)′ and ε = (ε11,…, ε1T,…, εN1,…, εNT )′. Thus the covariance matrix of y is given by
Analysis of variance is one of the popular methods in the statistical literature to estimate the variance components . Compared to other estimates such as the maximum likelihood estimate (MLE), restricted maximum likelihood estimate (REMLE), minimum norm quadratic unbiased estimate (MINQUE), the analysis of variance estimate (ANOVAE) has simple closed forms that make it feasible to study analytically its small sample behavior and construct exact tests and confidence intervals for the variance components based on the estimates; see, e.g. Wald (1940), Searle, et al. (1992), and Burch and Iyer (1997). In contrast, when the parameter lies in the boundary of the parametric space, the asymptotic distributions of the likelihood-based tests, such as the likelihood ratio tests, do not follow chi-square distributions, as showed in Stram and Lee (1994) and Crainiceanu and Ruppert (2004). Moreover, solutions based on maximum likelihood (ML) estimates and restricted maximum likelihood (REML) estimates usually have very poor performance in mixed models with relatively small sample size; see Burdick and Larsen (1997).
Exact tests and confidence intervals are often required in practice. For the two-variance-component mixed-effects model (1), a new approach, referred to as the spectral decomposition, to estimating the variance components, was recently developed by Wang and Yin (2002). The resulting estimate, the spectral decomposition estimate (SDE), possesses similar advantages as the ANOVAE, having a simple closed form and allowing the construction of exact tests and confidence intervals.
The present paper serves as the first in the literature to compare analytically the small sample properties of ANOVAE and SDE under model (1). In Section 2 we introduce the ANOVAE and SDE of the variance components. A necessary and sufficient condition is derived in Section 3 under which the two methods yield identical estimates. In Section 4, a relationship between the two estimates is established, and some conditions are found under which one estimate is superior to the other in terms of mean squared error (MSE). Exact test and confidence intervals based on the two estimates are constructed in Section 5. Applications to two special models are given in Section 6. Section 7 presents some simulation results and some discussions are given in Sections 8.
Denote by A− a generalized inverse of a matrix A, and write PA = A(A′ A)−A′. Then PA is the orthogonal projector onto (A), the column space of a matrix A. Let QA = I − PA. Using Henderson Method III (Henderson, 1953 or Searle, et al., 1992), we can obtain the ANOVAE of as
where (X : Z) is a n × (p + N) matrix consisting of the column vectors of the matrices X and Z, rk(A) and tr(A) stand for the rank and trace of a matrix A, respectively, and r0 = rk(X : Z).
Note that the covariance matrix of y can be decomposed as
where . Left-multiplying model (1) by PZ and QZ, respectively, gives
both being singular linear models. From the unified theory of least squares (Rao, 1973), we can use
where m = N − rk(PZX) and b = rk(QZ) − rk(QZX), to estimate and λ, respectively. An estimate of is then given by
Wang and Yin (2002) derived the two estimates and termed them as the spectral decomposition estimates of , respectively. It is easy to verify (Craig’s Theorem, see Rao, 1973) that the two quadratic forms, y′(PZ − PZX(X′PZX)−X′PZ)y and y′(QZ − QZX(X′QZX)−X′QZ)y, are independent, and distributed as , respectively, where denotes a χ2 variable with k degrees of freedom. Using this we can construct exact tests and confidence intervals for based on these quadratic forms; see Section 4 for details.
We first notice that the ANOVA and spectral decomposition approach lead to the same estimate for , as stated below.
Under model (1) the ANOVAE and SDE of are identical, that is,
It follows from (A : B) = (A : QAB) and A′QAB = 0 that
for any matrices A and B with the same number of rows. Thus
However, the estimates of from the ANOVA and spectral decomposition can be quite different. We derive below conditions under which the two estimates, , are identical. Write υ = y − Xβ,
Clearly, both are unbiased estimators of , and can be reexpressed as , where υ ~ N (0, V(σ2)). Using the fact (Wang and Chow, 1994) that for any known symmetric matrix A, Var(υ′Aυ) = 2tr(AV AV), we have
We will need the following two lemmas to prove the main results.
(Wang and Chow, 1994). Let P = PAPB. Then
The following statements are equivalent.
where dim(·) denotes the dimension of a space.
Note that for any vector c (A) ∩ (B), there exist vectors α and γ such that c = Aα = Bγ. It follows that PBPAc = PBPAAα = PBAα = PBBγ = Bγ = c, yielding
If (i) holds, then PA(PBA) = PBPAA = PBA, which is equivalent to (PBA) (A). Note that (PBA) (B). Thus (PBA) (A) ∩ (B). This together with (11) yield (ii).
Conversely, if (ii) is true, then by (11) we have (A)∩(B) =(PBPA), and PA(PBPA) = PBPA. Thus PBPA is symmetric, and (i) holds.
That (ii) and (iii) are equivalent is obvious from (11) and the fact that dim((PBA)) = rk(PBA), and that (iv) and (i) are equivalent is a direct consequence of Lemma 3.1.
The two lemmas lead to the following theorem.
if and only if PXPZ is symmetric, i.e. PXPZ = PZPX.
Note that for any if and only if the following three equalities all hold:
Write r1 = dim((X) ∩ (Z)). Then r0 = rk(X) + rk(Z) − r1. Since rk(Z) = N, we have
Thus the three conditions (a)–(c) are equivalent to
Note that the last equality above is equivalent to rk(PZX) = r1. Hence by Lemma 3.2 the last equality of (12) holds if and only if PXPZ = PZPX. On the other hand, if PXPZ = PZPX, then the first two equalities of (12) hold. Theorem 3.2 follows.
We now give the main results.
if and only if PXPZ = PZPX.
Note that PXPZ = PZPX if and only if QXPZ = PZQX. By Lemma 3.2, if PXPZ = PZPX, then
Using (8) we have
On the other hand, if , then . By Theorem 3.2, we have PXPZ = PZPX. This completes the proof.
Both are uniformly minimum variance ubiased (UMVU) estimates of if PXPZ = PZPX, see Wu and Wang (2004).
We exemplify the main results with two popular mixed-effects models, both satisfying the identity condition PXPZ = PZPX, and thus the ANOVA and spectral decomposition yield the same estimates for .
The one-way classification model has the form
where μ is a fixed parameter, ui is a random effect, ui and εij are independent with . Here X = 1ab, Z = Ia 1b. Thus PXPZ = PZPX = PX = ab, where .
The two-way classification model is given by
where αi is a random effect, βj is a fixed effect. We assume that , and all αis and εijs are mutually independent.
In matrix form, the model can be expressed as
where γ = (μ, β1, , βb)′, α = (α1, , ,αa)′, and X = (1a 1b : 1a Ib), Z = Ia 1b. It is easy to verify that
An important question to be answered concerning ANOVAE and SDE is, “If PXPZ ≠ PZPX, then which estimator is better and in what sense?”. In this section, we obtain conditions under which one estimator has smaller mean squared error than the other. Since both estimators are unbiased, their mean squared error is identical to their variances. To this end, let us express the spectral decomposition of the matrix QXZZ′QX as
where d1, , dg are the different nonzero eigenvalues of the matrix QXZZ′QX, Mi is the orthogonal projector onto the corresponding eigenvalue subspace. Note that g > 1 because PXPZ ≠ PZPX.
Using the facts that ,
with , where mi = tr(Mi), and y′M1y, , y′Mgy are independent.
T is a nonzero eigenvalue of QXZZ′QX and PZ − PZX(X′PZX)−X′PZ is the orthogonal projector onto the eigenvalue subspace corresponding to T. Without loss of generality, we assume that
The ANOVAE of is a weighted average of , that is
Thus the SDE under model (1), in fact, is an unbiased estimate deduced by only one quadratic form of y′M1y, , y′Mgy.
The corresponding coefficients of and σ4 in are larger than their counterparts in if PXPZ ≠ PZPX.
Using the facts that
and the equalities above hold if and only if PXPZ = PZPX by Theorem 3.2. Theorem 4.2 thus follows.
we obtain the ratio of the coefficient of in to that in :
it follows that the first two fractions in the right-hand side of (16) are larger than or equal to 1, and the third fraction is no larger than 1. Combining these with Theorem 4.2 we have
If the right-hand side of (16) is less than 1, then .
The ANOVAE is not uniformly superior to the SDE . If and the right-hand side of (16) > 1, then .
Example 6.2 in Section 6 illustrates that (16) is not always less than 1. Note that is a quadratic function of with the negative coefficients for the terms τ2 and τ, and the common term is obtained by subtracting the denominator from the numerator of (16). Thus when the right-hand side of (16) is larger than 1, for small τ and for large τ; see Example 6.2. That is, the SD estimate is preferred when the predictor varies considerably across subjects and the random effect variance and the number of repeated observations for each subject, T, are small.
A common interest in model (1) is testing hypotheses and constructing confidence intervals concerning the relative magnitude, , of the variation due to the random effects and the random errors. To compare the two approaches under investigation, we assume that PXPZ ≠ PZPX so the two methods yield different estimates of .
Consider the hypothesis
which follows an F-distribution with degrees of freedom m1 and b under the null hypothesis H0 : θ = θ0. Here, , is the spectral decomposition estimate of θ.
We reject the null hypothesis H0 if F(θ0) > Fa,b(α). The power of the test is given by
where is an F-statistic with degrees of freedom m1 and b, and Fm1,b(α) is the corresponding upper 100α% quartile.
In contrast the quadratic forms y′(P(X:Z)−PX)y and y′(I −P(X:Z))y in the ANOVA method can not be directly used to construct an exact test for H0 except when θ0 = 0, though the two quadratic forms are independent for any θ ≥ 0. From (14) we have , which is not a χ2 variable, but rather a linear combination of two or more independently distributed χ2 random variables if θ ≠ 0.
In the case of θ0 = 0, the exact test statistic based on the quadratic forms in the ANOVA method can be given by
If θ0 > 0, FA is intractable because of its complicated distribution. As an alternative instead, Wald’s test statistic
is often used in the literature, see Wald (1940). Clearly, W(0) = FA.
To compare the power of the two tests based on F(θ0) and W(θ0), we define
Then W(θ0), in fact, is a weighted sum of these F-statistics, that is
It is easy to see that W(θ0) follows an F-distribution with degrees of freedom a and b under the null hypothesis H0 : θ = θ0, and the power is given by
Intuitively the power of the two tests based on F(θ0) and W(θ0) should be close when m1/a approaches to 1; see simulation results in Section 7. In this case, the former test is more appealing in practice due to its simplicity.
Note that the pivotal quantities F(θ) ~ Fm1,b and W(θ) ~ Fa,b are decreasing functions of θ. Thus two exact 100(1 − α)% confidence intervals for θ, based on pivotal quantities F(θ) and W(θ), respectively, can be given by the sets
Furthermore, (23) can be simplified as
In Section 7, we will compare the expected lengths of these confidence intervals via Monte Carlo simulation.
Section 3 describes two simple examples in which the identity condition PXPZ = PZPX always holds and thus the ANOVAE and SDE are identical. In this section we examine two more popular models and discuss when the identity condition holds.
The circular feature of a machined part is one of the most basic geometric primitives, and can be described easily by its center and radius. Due to imperfections introduced in manufacturing, the desired feature may not be truly circular. In order to control the production, we need to estimate the geometric parameters (center and radius), which requires data on machined parts along their circumferences, and a corresponding statistical model. In practice the data can be obtained using a computer-controlled coordinate measuring machine, while one of the models adopted for such data is the mixed-effects model provided by Wang and Lam (1997), which takes into consideration of the variability in center location of different machined parts.
Let (xij, yij) be the jth coordinated measurement on the circumference of the ith machined part, i = 1, … , m, j = 1, , n, where m is the number of machined parts, and n is the number of measurements taken from the circumference. Denote by τi(j) the angle of the jth measurement of the ith part. In general the phase change or angular difference τi(j+1) − τi(j) can be assumed to be known. Write
αi = ρicosθi0 and βi = ρisinθi0, where θj is known and θi0 is fixed but unknown. Then for the data points (xij, yij), we have the following model (see Wang and Lam, 1997)
where (ξ, η)′ is the designed location of the center of the part, (u1i, u2i)′ is the random variable in the location of the center of the ith machined part, ϵkij ~ N (0, σ2), , k = 1, 2, i = 1, , m, j = 1, , n, and all ϵkijs and ukis are independent.
Put z = (z1, , zm)′, , xi = (xi1, , xim)′, yi = (yi1, , yim)′, γ = (ξ, η, α1, β1, , αm, βm)′, u = (u11, u21, u12, u22, , u1m, u2m)′ and
with ϕ1 = (cosθ1, , cos θn)′ , ϕ2 = (sin θ1, , sin θn)′. Then model (25) can be rewritten in matrix form as
where X = (1m I2 1n : Im Φ), Z = I2m 1n. Clearly, the covariance matrix of z is
It is easy to verify that PXPZ = PZPX under Model (25) is equivalent to
According to Theorems 3.1 and 3.2, the SDE and ANOVAE of the variance components are equal if (26) holds. And the condition = = 0 can be easily satisfied by sampling n measurements equally spaced around the circumference of the circular feature.
Consider the popular model for longitudinal data:
where ui is the random effect, ui and εit have the same assumption as model (1), see Diggle, et al. (2002). In fact, Model (27) is a special case with design matrix X = (1n : x) where . Without loss of generality, we assume that rk(X) = 2. That is, x ≠ a1n for any scalar quantity a.
. Thus tr(PXPZ) = [tr(PXPZ)]2 if and only if sb(x) = 0 or sb(x) = 1. Combining with the following facts
, we can show that PZPX = PXPZ in Model (27) is equivalent to sb(x) = 0 or sb(x) = 1. In fact, if PZPX = PXPZ then tr(PXPZ) = [tr(PXPZ)]2, thus we prove sb(x) = 0 or sb(x) = 1. On the other hand, we have PZPX = n if sb(x) = 0; PZPX = P(1:c) T if sb(x) = 1, where c = (c1, , cN)′. Thus we obtain PZPX = PXPZ.
It follows from Theorem 3.3 that the necessary and sufficient condition for the identity of the SDE and ANOVAE of under model (27) is 1. = 2. = = N. or xi = ci1T, i = 1, , N. However, the two conditions are not usually satisfied in practice.
In the following, we compare the the SDE and ANOVAE of under general case 0 < sb(x) < 1. For convenience, we simply write sb = sb(x).
Clearly, if 0 < sb < 1, then rk(PZX) = 2, dim((X) ∩ (Z)) = 1, and the ratio of the coefficients of by (16) is equal to
which is a continuous and increasing function of sb in the interval (0, 1), and for any N > 2 and T > 1, it has
So there exists sb0 (0, 1) such that f(sb) < 1 if sb < sb0 and f(sb) > 1 if sb > sb0. In fact, sb0 can be given by
From Corollary 4.1, it follows that the ANOVAE of is superior to the SDE if 0 < sb < sb0.
it follows that (i) f(sb) < 1 if sb < 0.5 − 1/(2T); (ii) f(sb) > 1 if sb > 0.5 + 1/(2N − 3).
Combining with Theorem 4.2, we conclude that for any the ANOVAE of is superior to the SDE if sb < 0.5 − 1/(2T). On the other hand, the ANOVAE of is not always superior to its SDE. Let which is the quadratic function of , and the coefficients of τ2 and τ are negative. Using the fact that υ(0) > 0 if sb > 0.5 + 1/(2N − 3), we can show that there exist two different solutions for the equation υ(τ) = 0 in this case, denoted by τ1, τ2 (τ1 < τ2). Clearly, τ1 < 0 < τ2. Thus υ(τ) > 0 for any τ (0, τ2) if sb > 0.5 + 1/(2N − 3). That is, .
Under model (27), T and Tsb both are distinct nonzero eigenvalues of the matrix QXZZ′QX, and their multiplicities are N − 2 and 1, respectively. We fix , and compare via Monte Carlo simulation the power of F0(θ0) and W(θ0). Results are displayed for two different choices xit = t + eit + αi and xit = log(t + 1) + eit + 2αi, and three different sample sizes (N,T): (12, 3), (20, 4), (5, 10), with 10000 replicates in each case, where αi ~ N(0, 1) and eit ~ N(0, 1) are independent. Note that for each choice of sample size (N, T), xits are fixed. Here sb0 can be calculated by (28) for the three sample sizes of (N,T), that is, 0.354, 0.387 and 0.499, respectively.
We first obtain two sets of simulated values, (0.345, 0.280, 0.039) and (0.732, 0.741, 0.431), of sb under the three sample sizes of (N,T), respectively, for xit = t + eit + αi and xit = log(t + 1) + eit + 2αi. Clearly, the three values of sb in the first set and the last value in the second set are smaller than the corresponding sb0, and the first two values of sb in the second set are larger than the corresponding sb0. Using the results in example 6.2, we can obtain the following results under three sample sizes above. For the first choice xit = t + eit + αi the ANOVAE of is superior to its SDE for any . For the second choice xit = log(t + 1) + eit + 2αi, the SDE of is superior to its ANOVAE only if for (N, T) = (12, 3), or only if for (N, T) = (20, 4).
Secondly, we take random numbers from the independent χ2 distributions with 10000 replicates for each of the six cases above. Using formula (18) and (22), we can obtain simulated test powers of F(θ0) and W(θ0), see Table 1, Table 2.
Table 1 and Table 2 show that the test based on W(θ0) is more powerful than that based on F(θ0) for the null hypothesis H0 : θ = θ0 under sample size (N, T) = (5, 10), a scenario with relatively large number of repeated observations for each subject but small number of subjects. There is no substantial difference between the power of the two tests above under the sample size (N, T) = (12, 3) or (20, 4), a scenario with relatively small number of repeated observations for each subject but large number of subjects, regardless of wether the ANOVAE is superior to the SDE. As a subsequent observation, the result indicates that better estimates of variance components do not necessarily imply higher power of the tests. A similar phenomenon is observed in the expected lengths of confidence interval (24) and (23); see Figure 1 below.
Thus we can adopt the simpler test statistic F(θ0) and confidence interval (23) in practice when T, the number of repeated observations for each subject, is relatively small. However they are not good choices due to the low power of the test and the long length of the confidence interval when T is relatively large, especially, when T is larger than N, the number of subjects.
In this paper, we study the finite sample properties of ANOVAE and SDE under model (1). We establish the necessary and sufficient condition for the equality of the two estimates, and some sufficient conditions for the superiority of one estimate over the other under the mean squared error criterion. Furthermore, we consider exact tests and confidence intervals based on the two estimates, and demonstrate via simulations that better estimates of variance components do not necessarily imply higher power of the tests or shorter confidence intervals.
As comparison, the likelihood method has poor small sample size performance in mixed models (see Burdick, 1997). Moreover, when the parameter lies in the boundary of the parametric space, the likelihood-based tests such as Wald and likelihood ratio tests, do not asymptotically follow chi-square distributions as demonstrated in Stram and Lee (1994) and Crainiceanu and Ruppert (2004). For the two models considered in Section 7 with the same settings, Table 3 presents the coverage probabilities of the confidence intervals based on the maximum likelihood estimate. The results in the table clearly indicate that with relatively small sample sizes, the coverage probabilities are substantially below the nominal level of 95%.
As noticed in the literature, both ANOVAE and SDE may take negative values. This is not a problem for Genetic field in which variance component is permitted to be negative, see Burton et al. (1999) and Hazelton and Gurrin (2003). However, estimates of variance components are often required to be non-negative in many other cases. One remedy is to estimate the two variance components using . However further research is needed to find conditions for the superiority among the two truncated estimates. In this regard Theorem 4.1 in this paper is a potentially useful tool for such investigation.
This research was supported by the Intramural Research Program of the National Institute of Child Health and Human Development, National Institutes of Health. Wu’s research was also partially supported by Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality PHR (IHLB) and National Natural Science Foundation of China (NSFC). The opinions expressed in the article are not necessarily of the National Institutes of Health. The authors wish to thank two referees for their valuable suggestions which considerably improved the paper.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.