J Stat Plan Inference. Author manuscript; available in PMC 2010 December 1.
Published in final edited form as:
J Stat Plan Inference. 2009 December 1; 139(12): 3962–3973.
PMCID: PMC2762235
NIHMSID: NIHMS129742

# Estimation of Variance Components in the Mixed-Effects Models: A Comparison Between Analysis of Variance and Spectral Decomposition

## Abstract

The mixed-effects models with two variance components are often used to analyze longitudinal data. For these models, we compare two approaches to estimating the variance components, the analysis of variance approach and the spectral decomposition approach. We establish a necessary and sufficient condition for the two approaches to yield identical estimates, and some sufficient conditions for the superiority of one approach over the other, under the mean squared error criterion. Applications of the methods to circular models and longitudinal data are discussed. Furthermore, simulation results indicate that better estimates of variance components do not necessarily imply higher power of the tests or shorter confidence intervals.

Keywords: Analysis of variance, Best linear unbiased estimate, Least squares estimate, Mixed-effects model, Spectral decomposition

## 1 Introduction

In the past two decades, the mixed-effects linear model has received considerable attention from both theoretical and practical points of view due to its extensive applications, for example, in analyzing longitudinal data arising from the health sciences, computer graphics and mechanical engineering. For a comprehensive overview of this model, see Davidian and Giltinan (1996), Diggle, et al. (2002) and Demidenko (2004).

Consider the mixed-effects linear model with two variance components

where β is p × 1 vector of fixed effects, ui is random effect, ui and εit are assumed to be independent and normally distributed with mean 0 and variance $σu2andσε2$, respectively. Writing this model in matrix form, we have

$y=Xβ+Zu+ε,$
(1)

where y and X are of dimensions n × 1 and n × p, respectively, n = NT, Z = IN 1T, denotes the Kronecker product, IN is the identity matrix of order N, and 1T is a vector of ones of dimension T, u = (u1, u2,…, uN)′ and ε = (ε11,…, ε1T,…, εN1,…, εNT )′. Thus the covariance matrix of y is given by

$Cov(y)=V(σ2)=σu2ZZ′+σε2In.$
(2)

Analysis of variance is one of the popular methods in the statistical literature to estimate the variance components $σu2andσε2$. Compared to other estimates such as the maximum likelihood estimate (MLE), restricted maximum likelihood estimate (REMLE), minimum norm quadratic unbiased estimate (MINQUE), the analysis of variance estimate (ANOVAE) has simple closed forms that make it feasible to study analytically its small sample behavior and construct exact tests and confidence intervals for the variance components based on the estimates; see, e.g. Wald (1940), Searle, et al. (1992), and Burch and Iyer (1997). In contrast, when the parameter lies in the boundary of the parametric space, the asymptotic distributions of the likelihood-based tests, such as the likelihood ratio tests, do not follow chi-square distributions, as showed in Stram and Lee (1994) and Crainiceanu and Ruppert (2004). Moreover, solutions based on maximum likelihood (ML) estimates and restricted maximum likelihood (REML) estimates usually have very poor performance in mixed models with relatively small sample size; see Burdick and Larsen (1997).

Exact tests and confidence intervals are often required in practice. For the two-variance-component mixed-effects model (1), a new approach, referred to as the spectral decomposition, to estimating the variance components, was recently developed by Wang and Yin (2002). The resulting estimate, the spectral decomposition estimate (SDE), possesses similar advantages as the ANOVAE, having a simple closed form and allowing the construction of exact tests and confidence intervals.

The present paper serves as the first in the literature to compare analytically the small sample properties of ANOVAE and SDE under model (1). In Section 2 we introduce the ANOVAE and SDE of the variance components. A necessary and sufficient condition is derived in Section 3 under which the two methods yield identical estimates. In Section 4, a relationship between the two estimates is established, and some conditions are found under which one estimate is superior to the other in terms of mean squared error (MSE). Exact test and confidence intervals based on the two estimates are constructed in Section 5. Applications to two special models are given in Section 6. Section 7 presents some simulation results and some discussions are given in Sections 8.

## 2 ANOVA and spectral decomposition

Denote by A a generalized inverse of a matrix A, and write PA = A(AA)A′. Then PA is the orthogonal projector onto (A), the column space of a matrix A. Let QA = IPA. Using Henderson Method III (Henderson, 1953 or Searle, et al., 1992), we can obtain the ANOVAE of $σε2andσu2$ as

$σ^ε2=1n−r0y′(I−P(X:Z))y,$
(3)

$σ^u2={y′(P(X:Z)−PX)y−(r0−rk(X))σ^ε2}/{Ttr(QXPZ)},$
(4)

where (X : Z) is a n × (p + N) matrix consisting of the column vectors of the matrices X and Z, rk(A) and tr(A) stand for the rank and trace of a matrix A, respectively, and r0 = rk(X : Z).

Note that the covariance matrix of y can be decomposed as

$Cov(y)=λPZ+σε2QZ,$

where $λ=Tσu2+σε2$. Left-multiplying model (1) by PZ and QZ, respectively, gives

both being singular linear models. From the unified theory of least squares (Rao, 1973), we can use

$σ˜ε2=y′(QZ−QZX(X′QZX)−X′QZ)y/b,$
(5)

$λ˜=y′(PZ−PZX(X′PZX)−X′PZ)y/m,$
(6)

where m = N − rk(PZX) and b = rk(QZ) − rk(QZX), to estimate $σε2$ and λ, respectively. An estimate of $σu2$ is then given by

$σ˜u2=(λ˜−σ˜ε2)/T.$
(7)

Wang and Yin (2002) derived the two estimates $σ˜ϵ2andσ˜u2$ and termed them as the spectral decomposition estimates of $σϵ2andσu2$, respectively. It is easy to verify (Craig’s Theorem, see Rao, 1973) that the two quadratic forms, y′(PZPZX(XPZX)XPZ)y and y′(QZQZX(XQZX)XQZ)y, are independent, and distributed as $λχm2andσε2χb2$, respectively, where $χk2$ denotes a χ2 variable with k degrees of freedom. Using this we can construct exact tests and confidence intervals for $σε2orσu2$ based on these quadratic forms; see Section 4 for details.

## 3 Conditions for equality

We first notice that the ANOVA and spectral decomposition approach lead to the same estimate for $σϵ2$, as stated below.

### Theorem 3.1

Under model (1) the ANOVAE and SDE of $σε2$ are identical, that is,

$σ^ε2=σ˜ε2.$

#### Proof

It follows from (A : B) = (A : QAB) and AQAB = 0 that

$P(A:B)=PA+QAB(B′QAB)−B′QA,$
(8)

and

$rk(A:B)=rk(A)+rk(QAB)$

for any matrices A and B with the same number of rows. Thus

$QZ−QZX(X′QZX)−X′QZ=I−(PZ+QZX(X′QZX)−X′QZ)=I−P(X:Z),b=rk(QZ)−rk(QZX)=n−r0.$

Therefore $σ^ε2andσ˜ε2$.

However, the estimates of $σu2$ from the ANOVA and spectral decomposition can be quite different. We derive below conditions under which the two estimates, $σ˜u2andσ^u2$, are identical. Write υ = yXβ,

$A0={(P(X:Z)−PX)−(r0−rk(X))(I−P(X:Z))n−r0}{Ttr(QXPZ)},A1={PZ−PZX(X′PZX)−X′PZN−rk(PZX)−QZ−QZX(X′QZX)−X′QZn−r0}T.$

Clearly, both $σ˜u2andσ^u2$ are unbiased estimators of $σu2$, and can be reexpressed as $σ^u2=υ′A0υ,σ˜u2=υ′A1υ$, where υ ~ N (0, V2)). Using the fact (Wang and Chow, 1994) that for any known symmetric matrix A, Var(υ′Aυ) = 2tr(AV AV), we have

$Var(σ^u2)=2{σε4(n−rk(X))(r0−rk(X))T2[tr(QXPZ)]2(n−r0)+σε2σu22Ttr(QXPZ)+σu4tr(QXPZ)2[tr(QXPZ)]2},$
(9)

and

$Var(σ˜u2)=2{σε4(n+N−r0−rk(PZX))T2[N−rk(PZX)](n−r0)+σε2σu22T[N−rk(PZX)]+σu41N−rk(PZX)}.$
(10)

We will need the following two lemmas to prove the main results.

### Lemma 3.1

(Wang and Chow, 1994). Let P = PAPB. Then

1. P is an orthogonal projection matrix if and only if PAPB = PBPA.
2. If PAPB = PBPA, then P is the orthogonal projection matrix onto (A) ∩ (B).

### Lemma 3.2

The following statements are equivalent.

1. PAPB = PBPA
2. (A) ∩ (B) = (PBA)
3. rk(PBA) = dim((A) ∩ (B))
4. PBA(APBA)APB = PAPB

where dim(·) denotes the dimension of a space.

#### Proof

Note that for any vector c (A) ∩ (B), there exist vectors α and γ such that c = Aα = Bγ. It follows that PBPAc = PBPAAα = PBAα = PBBγ = Bγ = c, yielding

$M(A)∩M(B)⊆M(PBPA)⊆M(PBA).$
(11)

If (i) holds, then PA(PBA) = PBPAA = PBA, which is equivalent to (PBA) (A). Note that (PBA) (B). Thus (PBA) (A) ∩ (B). This together with (11) yield (ii).

Conversely, if (ii) is true, then by (11) we have (A)∩(B) =(PBPA), and PA(PBPA) = PBPA. Thus PBPA is symmetric, and (i) holds.

That (ii) and (iii) are equivalent is obvious from (11) and the fact that dim((PBA)) = rk(PBA), and that (iv) and (i) are equivalent is a direct consequence of Lemma 3.1.

The two lemmas lead to the following theorem.

### Theorem 3.2

$Var(σ˜u2)=Var(σ^u2)$ if and only if PXPZ is symmetric, i.e. PXPZ = PZPX.

#### Proof

Note that for any $σu2≥0andσε2>0,Var(σ˜u2)=Var(σ^u2)$ if and only if the following three equalities all hold:

1. $(n−rk(X))(r0−rk(X))[tr(QXPZ)]2=n+N−r0−rk(PZX)N−rk(PZX)$,
2. tr(QXPZ) = N − tr(PXPZ) = N − rk(PZX)
3. [tr(QXPZ)]2/[tr(QXPZ)]2 = 1/(N − rk(PZX)).

Write r1 = dim((X) ∩ (Z)). Then r0 = rk(X) + rk(Z) − r1. Since rk(Z) = N, we have

$n+N−r0−rk(PZX)=n−rk(X)+r1−rk(PZX).$

Thus the three conditions (a)–(c) are equivalent to

$[tr(QXPZ)]2=tr(QXPZ)=N−rk(PZX)=N−r1.$
(12)

Note that the last equality above is equivalent to rk(PZX) = r1. Hence by Lemma 3.2 the last equality of (12) holds if and only if PXPZ = PZPX. On the other hand, if PXPZ = PZPX, then the first two equalities of (12) hold. Theorem 3.2 follows.

We now give the main results.

### Theorem 3.3

$σ˜u2=σ^u2$ if and only if PXPZ = PZPX.

#### Proof

Note that PXPZ = PZPX if and only if QXPZ = PZQX. By Lemma 3.2, if PXPZ = PZPX, then

Using (8) we have

$P(X:Z)−PX=QXPZ=PZ−PZX(X′PZX)−X′PZ.$

Thus $σ˜u2=σ^u2$.

On the other hand, if $σ˜u2=σ^u2$, then $Var(σ˜u2)=Var(σ^u2)$. By Theorem 3.2, we have PXPZ = PZPX. This completes the proof.

### Remark 3.1

Both $σ˜u2andσ^u2$ are uniformly minimum variance ubiased (UMVU) estimates of $σu2$ if PXPZ = PZPX, see Wu and Wang (2004).

We exemplify the main results with two popular mixed-effects models, both satisfying the identity condition PXPZ = PZPX, and thus the ANOVA and spectral decomposition yield the same estimates for $σu2$.

### Example 3.1

The one-way classification model has the form

where μ is a fixed parameter, ui is a random effect, ui and εij are independent with $ui~N(0,σu2)andεij~N(0,σε2)$. Here X = 1ab, Z = Ia 1b. Thus PXPZ = PZPX = PX = ab, where $J¯t=1t,1t′/t$.

### Example 3.2

The two-way classification model is given by

where αi is a random effect, βj is a fixed effect. We assume that $αi~N(0,σα2)andεij~N(0,σε2)$, and all αis and εijs are mutually independent.

In matrix form, the model can be expressed as

$y=Xγ+Zα+ε,$

where γ = (μ, β1, , βb)′, α = (α1, , ,αa)′, and X = (1a 1b : 1a Ib), Z = Ia 1b. It is easy to verify that

$PXPZ=PZPX=(J¯a⊗Ib)·(Ia⊗J¯b)=J¯a⊗J¯b.$

## 4 Conditions for superiority

An important question to be answered concerning ANOVAE and SDE is, “If PXPZPZPX, then which estimator is better and in what sense?”. In this section, we obtain conditions under which one estimator has smaller mean squared error than the other. Since both estimators are unbiased, their mean squared error is identical to their variances. To this end, let us express the spectral decomposition of the matrix QXZZQX as

$QXZZ′QX=∑i=1gdiMi,$
(13)

where d1, , dg are the different nonzero eigenvalues of the matrix QXZZQX, Mi is the orthogonal projector onto the corresponding eigenvalue subspace. Note that g > 1 because PXPZPZPX.

Using the facts that $Mi2=Mi,MiMj=0(i≠j)$,

$P(X:Z)−PX=QXZ(Z′QXZ)−Z′QX=PQXZ=∑i=1gMi,$

we obtain

$y′(P(X:Z)−PX)y=∑i=1gy′Miy,$
(14)

with $y′Miy~(diσu2+σε2)χmi2$, where mi = tr(Mi), and yM1y, , yMgy are independent.

Since

$QXZZ′QX(PZ−PZX(X′PZX)−X′PZ)=T(PZ−PZX(X′PZX)−X′PZ),$

T is a nonzero eigenvalue of QXZZQX and PZPZX(XPZX)XPZ is the orthogonal projector onto the eigenvalue subspace corresponding to T. Without loss of generality, we assume that

and

$σ^u2(i)=(y′Miy/mi−σ^ε2)/di,i=1,⋯,g.$

It follows straightforwardly that $σ^u2(1)=σ˜u2$, and each $σ^u2(i)$ is an unbiased estimate of $σu2$. Combining (4) and (14), we obtain the following theorem.

### Theorem 4.1

The ANOVAE of $σu2$ is a weighted average of ${σ^u2(i):i=1,⋯,g}$, that is

$σ^u2=Tm1σ˜u2/Δ+∑i=2gdimiσ˜u2(i)/Δ,$
(15)

where $Δ=Ttr(QXPZ)=∑i=1gdimi$.

Thus the SDE $σ˜u2$ under model (1), in fact, is an unbiased estimate deduced by only one quadratic form of yM1y, , yMgy.

In the following, we will compare the variance of $σ^u2$, the weighted average of ${σ^u2(i):i=1,⋯,g}$, with that of $σ˜u2$ when PXPZPZPX. Recall Eq. (9) and Eq. (10), we have

### Theorem 4.2

The corresponding coefficients of $σu2σε2$ and σ4 in $Var(σ˜u2)$ are larger than their counterparts in $Var(σ˜u2)$ if PXPZPZPX.

#### Proof

Using the facts that

$rk(PZX)=tr(PZX(X′PZX)−X′PZ)≥tr(PZPXPZ)=tr(PXPZ),$

and

$[tr(QXPZ)]2=tr(QXPZQXPZQX)≤tr(QXPZQX)=tr(QXPZ)=N−tr(PXPZ),$

we have

$[tr(QXPZ)]2[tr(QXPZ)]2≤1tr(QXPZ)≤1N−rk(PZX),$

and the equalities above hold if and only if PXPZ = PZPX by Theorem 3.2. Theorem 4.2 thus follows.

Note that

$r0−rk(X)=N−dim(M(X)∩M(Z)),n+N−r0−rk(PZX)=n−rk(PZX:QZX),$

we obtain the ratio of the coefficient of $σε4$ in $Var(σ^u2)$ to that in $Var(σ˜u2)$:

(16)

Because

$rk(PZX)≥tr(PXPz)≥dim(M(X)∩M(Z))≥0,$

and

$n−rk(PZX:QZX)≤n−rk(X),$

it follows that the first two fractions in the right-hand side of (16) are larger than or equal to 1, and the third fraction is no larger than 1. Combining these with Theorem 4.2 we have

### Corollary 4.1

If the right-hand side of (16) is less than 1, then $Var(σ^u2).

### Remark 4.1

The ANOVAE $σ^u2$ is not uniformly superior to the SDE $σ˜u2$. If $(σe2/T)≫σu2$ and the right-hand side of (16) > 1, then $Var(σ^u2)>Var(σ˜u2)$.

Example 6.2 in Section 6 illustrates that (16) is not always less than 1. Note that $(Var(σ^u2)−Var(σ˜u2))/σε2$ is a quadratic function of $τ=Tσu2/σε2$ with the negative coefficients for the terms τ2 and τ, and the common term is obtained by subtracting the denominator from the numerator of (16). Thus when the right-hand side of (16) is larger than 1, $Var(σ^u2)>Var(σ˜u2)$ for small τ and $Var(σ^u2)>Var(σ˜u2)$ for large τ; see Example 6.2. That is, the SD estimate is preferred when the predictor varies considerably across subjects and the random effect variance and the number of repeated observations for each subject, T, are small.

## 5 Exact tests and confidence intervals

A common interest in model (1) is testing hypotheses and constructing confidence intervals concerning the relative magnitude, $θ=σu2/σε2$, of the variation due to the random effects and the random errors. To compare the two approaches under investigation, we assume that PXPZPZPX so the two methods yield different estimates of $σu2$.

Consider the hypothesis

$H0:θ=θ0↔H1:θ>θ0,$

where θ0 ≥ 0. Since the spectral estimates and $σ˜ε2$ in (5) and (6) are independent, with $m1λ˜/λ~χm12,bσ˜ε2/σε2~χb2$, we can construct an exact test statistic

$F(θ0)=Tθ˜+1Tθ0+1=λ˜(Tθ0+1)σ˜ε2=y′M1y(Tθ0+1)m1σ˜ε2,$
(17)

which follows an F-distribution with degrees of freedom m1 and b under the null hypothesis H0 : θ = θ0. Here, $θ˜=σ˜u2/σ˜ε2$, is the spectral decomposition estimate of θ.

We reject the null hypothesis H0 if F0) > Fa,b(α). The power of the test is given by

$P(F(θ0)>Fm1,b(α))=P{Tθ+1Tθ0+1Fm1,b>Fm1,b(α)},$
(18)

where $Fm1,b=bχm12/m1χb2$ is an F-statistic with degrees of freedom m1 and b, and Fm1,b(α) is the corresponding upper 100α% quartile.

In contrast the quadratic forms y′(P(X:Z)PX)y and y′(IP(X:Z))y in the ANOVA method can not be directly used to construct an exact test for H0 except when θ0 = 0, though the two quadratic forms are independent for any θ ≥ 0. From (14) we have $y′(P(X:Z)−PX)y/σε2~∑i=1g(diθ+1)χmi2$, which is not a χ2 variable, but rather a linear combination of two or more independently distributed χ2 random variables if θ ≠ 0.

In the case of θ0 = 0, the exact test statistic based on the quadratic forms in the ANOVA method can be given by

$FA=y′(P(X:Z)−PX)y/ay′(I−P(X:Z))y/b,$
(19)

where b is defined as in (5), $a=rk(QXZ)=∑i=1gmi$, and mi is defined in (14). Under H0 the statistic follows an F-distribution with degrees of freedom a and b.

If θ0 > 0, FA is intractable because of its complicated distribution. As an alternative instead, Wald’s test statistic

$W(θ0)={∑i=1gy′Miy/(diθ0+1)}/ay′(I−P(X:Z))y/b$
(20)

is often used in the literature, see Wald (1940). Clearly, W(0) = FA.

To compare the power of the two tests based on F0) and W0), we define

Then W0), in fact, is a weighted sum of these F-statistics, that is

$W(θ0)=m1aF(θ0)+∑i=2gmiaFi(θ0).$
(21)

It is easy to see that W0) follows an F-distribution with degrees of freedom a and b under the null hypothesis H0 : θ = θ0, and the power is given by

$P{W(θ0)>Fa,b(α)}=P{∑i=1g(diθ+1)χmi2/(diθ0+1)aχb2/b>Fa,b(α)}.$
(22)

Intuitively the power of the two tests based on F0) and W0) should be close when m1/a approaches to 1; see simulation results in Section 7. In this case, the former test is more appealing in practice due to its simplicity.

Note that the pivotal quantities F(θ) ~ Fm1,b and W(θ) ~ Fa,b are decreasing functions of θ. Thus two exact 100(1 − α)% confidence intervals for θ, based on pivotal quantities F(θ) and W(θ), respectively, can be given by the sets

${θ∈[0,+∞):Fm1,b(α/2)≤F0(θ)≤Fm1,b(1−α/2)}$
(23)

and

${θ∈[0,+∞):Fa,b(α/2)≤W(θ)≤Fa,b(1−α/2)}.$
(24)

Furthermore, (23) can be simplified as

${θ∈[0,+∞):y′M1yTm1σ^ε2Fm1,b(1−α/2)−1T≤θ≤y′M1yTm1σ^ε2Fm1,b(α/2)−1T}.$

In Section 7, we will compare the expected lengths of these confidence intervals via Monte Carlo simulation.

## 6 Applications

Section 3 describes two simple examples in which the identity condition PXPZ = PZPX always holds and thus the ANOVAE and SDE are identical. In this section we examine two more popular models and discuss when the identity condition holds.

### 6.1 Circular data

The circular feature of a machined part is one of the most basic geometric primitives, and can be described easily by its center and radius. Due to imperfections introduced in manufacturing, the desired feature may not be truly circular. In order to control the production, we need to estimate the geometric parameters (center and radius), which requires data on machined parts along their circumferences, and a corresponding statistical model. In practice the data can be obtained using a computer-controlled coordinate measuring machine, while one of the models adopted for such data is the mixed-effects model provided by Wang and Lam (1997), which takes into consideration of the variability in center location of different machined parts.

Let (xij, yij) be the jth coordinated measurement on the circumference of the ith machined part, i = 1, … , m, j = 1, , n, where m is the number of machined parts, and n is the number of measurements taken from the circumference. Denote by τi(j) the angle of the jth measurement of the ith part. In general the phase change or angular difference τi(j+1) − τi(j) can be assumed to be known. Write

αi = ρicosθi0 and βi = ρisinθi0, where θj is known and θi0 is fixed but unknown. Then for the data points (xij, yij), we have the following model (see Wang and Lam, 1997)

${xij=ξ+αicosθj−βisinθj+u1i+ϵ1ij,yij=η+αisinθj+βicosθj+u2i+ϵ2ij,$
(25)

where (ξ, η)′ is the designed location of the center of the part, (u1i, u2i)′ is the random variable in the location of the center of the ith machined part, ϵkij ~ N (0, σ2), $uki~N(0,σ02)$, k = 1, 2, i = 1, , m, j = 1, , n, and all ϵkijs and ukis are independent.

Put z = (z1, , zm)′, $zi=(xi′,yi′)′$, xi = (xi1, , xim)′, yi = (yi1, , yim)′, γ = (ξ, η, α1, β1, , αm, βm)′, u = (u11, u21, u12, u22, , u1m, u2m)′ and

$Φ=(ϕ1−ϕ2ϕ2ϕ1)$

with ϕ1 = (cosθ1, , cos θn)′ , ϕ2 = (sin θ1, , sin θn)′. Then model (25) can be rewritten in matrix form as

$z=Xγ+Zu+ϵ,$

where X = (1m I2 1n : Im Φ), Z = I2m 1n. Clearly, the covariance matrix of z is

$Cov(z)=σ2I2mn+σ02(I2m⊗1n1n′).$

It is easy to verify that PXPZ = PZPX under Model (25) is equivalent to

${c¯=1n∑j=1ncosθj=0,s¯=1n∑j=1nsinθj=0.$
(26)

According to Theorems 3.1 and 3.2, the SDE and ANOVAE of the variance components $(σ02,σ2)$ are equal if (26) holds. And the condition = = 0 can be easily satisfied by sampling n measurements equally spaced around the circumference of the circular feature.

### 6.2 Longitudinal data

Consider the popular model for longitudinal data:

(27)

where ui is the random effect, ui and εit have the same assumption as model (1), see Diggle, et al. (2002). In fact, Model (27) is a special case with design matrix X = (1n : x) where $x=(x1′,x2′,⋯,xN′)′,xi=(xi1,xi2,⋯,xiT)′$. Without loss of generality, we assume that rk(X) = 2. That is, xa1n for any scalar quantity a.

Note that

where

$sb(x)=T∑i=1N(x¯i.−x¯‥)2∑i=1N∑t=1T(xit−x¯‥)2,$

$x¯i.=∑t=1Txit/Tandx¯‥=∑i=1N∑t=1Txit/(TN)$. Thus tr(PXPZ) = [tr(PXPZ)]2 if and only if sb(x) = 0 or sb(x) = 1. Combining with the following facts

1. 0 ≤ sb(x) ≤ 1
2. sb(x) = 0 if and only if 1. = 2. = = N.
3. sb(x) = 1 if and only if xi = ci1T, i = 1, , N

, we can show that PZPX = PXPZ in Model (27) is equivalent to sb(x) = 0 or sb(x) = 1. In fact, if PZPX = PXPZ then tr(PXPZ) = [tr(PXPZ)]2, thus we prove sb(x) = 0 or sb(x) = 1. On the other hand, we have PZPX = n if sb(x) = 0; PZPX = P(1:c) T if sb(x) = 1, where c = (c1, , cN)′. Thus we obtain PZPX = PXPZ.

It follows from Theorem 3.3 that the necessary and sufficient condition for the identity of the SDE and ANOVAE of $σu2$ under model (27) is 1. = 2. = = N. or xi = ci1T, i = 1, , N. However, the two conditions are not usually satisfied in practice.

In the following, we compare the the SDE and ANOVAE of $σu2$ under general case 0 < sb(x) < 1. For convenience, we simply write sb = sb(x).

Clearly, if 0 < sb < 1, then rk(PZX) = 2, dim((X) ∩ (Z)) = 1, and the ratio of the coefficients of $σε4inVar(σ^u2)andVar(σ˜u2)$ by (16) is equal to

$f(sb)=(n−2)(N−1)(N−2)(n−3)(N−1−sb)2,$

which is a continuous and increasing function of sb in the interval (0, 1), and for any N > 2 and T > 1, it has

$limsb→0+f(sb)=(n−2)(N−2)(n−3)(N−1)=1−N(T−1)−1(n−3)(N−1)<1,limsb→1_f(sb)=(n−2)(N−1)(n−3)(N−2)>1.$

So there exists sb0 (0, 1) such that f(sb) < 1 if sb < sb0 and f(sb) > 1 if sb > sb0. In fact, sb0 can be given by

$sb0=N−1−(N−1)(N−2)(n−2)(n−3).$
(28)

From Corollary 4.1, it follows that the ANOVAE of $σu2$ is superior to the SDE if 0 < sb < sb0.

Because

$0.5−12T

it follows that (i) f(sb) < 1 if sb < 0.5 − 1/(2T); (ii) f(sb) > 1 if sb > 0.5 + 1/(2N − 3).

Combining with Theorem 4.2, we conclude that for any $σε2>0,σu2≥0$ the ANOVAE of $σu2$ is superior to the SDE if sb < 0.5 − 1/(2T). On the other hand, the ANOVAE of $σu2$ is not always superior to its SDE. Let $υ(τ)={Var(σ^u2)−Var(σ˜u2)}/σε4$ which is the quadratic function of $τ=Tσu2/σε2$, and the coefficients of τ2 and τ are negative. Using the fact that υ(0) > 0 if sb > 0.5 + 1/(2N − 3), we can show that there exist two different solutions for the equation υ(τ) = 0 in this case, denoted by τ1, τ21 < τ2). Clearly, τ1 < 0 < τ2. Thus υ(τ) > 0 for any τ (0, τ2) if sb > 0.5 + 1/(2N − 3). That is, $Var(σ^u2)>Var(σ˜u2)ifsb>0.5+1/(2N−3)andTσu2/σε2<τ2$.

## 7 Simulation

Under model (27), T and Tsb both are distinct nonzero eigenvalues of the matrix QXZZQX, and their multiplicities are N − 2 and 1, respectively. We fix $σe2=1$, and compare via Monte Carlo simulation the power of F00) and W0). Results are displayed for two different choices xit = t + eit + αi and xit = log(t + 1) + eit + 2αi, and three different sample sizes (N,T): (12, 3), (20, 4), (5, 10), with 10000 replicates in each case, where αi ~ N(0, 1) and eit ~ N(0, 1) are independent. Note that for each choice of sample size (N, T), xits are fixed. Here sb0 can be calculated by (28) for the three sample sizes of (N,T), that is, 0.354, 0.387 and 0.499, respectively.

We first obtain two sets of simulated values, (0.345, 0.280, 0.039) and (0.732, 0.741, 0.431), of sb under the three sample sizes of (N,T), respectively, for xit = t + eit + αi and xit = log(t + 1) + eit + 2αi. Clearly, the three values of sb in the first set and the last value in the second set are smaller than the corresponding sb0, and the first two values of sb in the second set are larger than the corresponding sb0. Using the results in example 6.2, we can obtain the following results under three sample sizes above. For the first choice xit = t + eit + αi the ANOVAE of $σu2$ is superior to its SDE for any $σu2≥0andσε2>0$. For the second choice xit = log(t + 1) + eit + 2αi, the SDE of $σu2$ is superior to its ANOVAE only if $σu2≤0.227$ for (N, T) = (12, 3), or only if $σu2≤0.159$ for (N, T) = (20, 4).

Secondly, we take random numbers from the independent χ2 distributions $χm12,χ12andχb2$ with 10000 replicates for each of the six cases above. Using formula (18) and (22), we can obtain simulated test powers of F0) and W0), see Table 1, Table 2.

Power of tests for hypothesis θ = θ0 under Model (27) with xit = t + eit + αi, where all αi ~ N (0, 1), eit ~ N (0, 1) are independent (runs:10000)
Power of tests for hypothesis θ = θ0 under Model (27) with xit = log(t + 1) + eit + 2αi, where all αi ~ N (0, 1), eit ~ N (0, 1) are independent (runs:10000)

Table 1 and Table 2 show that the test based on W0) is more powerful than that based on F0) for the null hypothesis H0 : θ = θ0 under sample size (N, T) = (5, 10), a scenario with relatively large number of repeated observations for each subject but small number of subjects. There is no substantial difference between the power of the two tests above under the sample size (N, T) = (12, 3) or (20, 4), a scenario with relatively small number of repeated observations for each subject but large number of subjects, regardless of wether the ANOVAE is superior to the SDE. As a subsequent observation, the result indicates that better estimates of variance components do not necessarily imply higher power of the tests. A similar phenomenon is observed in the expected lengths of confidence interval (24) and (23); see Figure 1 below.

Expected lengths of exact confidence intervals based on F0) (solid line) and W0) (dash line).

Thus we can adopt the simpler test statistic F0) and confidence interval (23) in practice when T, the number of repeated observations for each subject, is relatively small. However they are not good choices due to the low power of the test and the long length of the confidence interval when T is relatively large, especially, when T is larger than N, the number of subjects.

## 8 Discussion

In this paper, we study the finite sample properties of ANOVAE and SDE under model (1). We establish the necessary and sufficient condition for the equality of the two estimates, and some sufficient conditions for the superiority of one estimate over the other under the mean squared error criterion. Furthermore, we consider exact tests and confidence intervals based on the two estimates, and demonstrate via simulations that better estimates of variance components do not necessarily imply higher power of the tests or shorter confidence intervals.

As comparison, the likelihood method has poor small sample size performance in mixed models (see Burdick, 1997). Moreover, when the parameter lies in the boundary of the parametric space, the likelihood-based tests such as Wald and likelihood ratio tests, do not asymptotically follow chi-square distributions as demonstrated in Stram and Lee (1994) and Crainiceanu and Ruppert (2004). For the two models considered in Section 7 with the same settings, Table 3 presents the coverage probabilities of the confidence intervals based on the maximum likelihood estimate. The results in the table clearly indicate that with relatively small sample sizes, the coverage probabilities are substantially below the nominal level of 95%.

Estimated coverage probabilities * the confidence interval for θ based on MLE under Model (27) (runs:10000). Case I: xit = t + eit + αi. Case II: xit = log(t + 1) + eit + 2αi.

As noticed in the literature, both ANOVAE and SDE may take negative values. This is not a problem for Genetic field in which variance component is permitted to be negative, see Burton et al. (1999) and Hazelton and Gurrin (2003). However, estimates of variance components are often required to be non-negative in many other cases. One remedy is to estimate the two variance components using $max{0,σ^u2}andmax{0,σ˜u2}$. However further research is needed to find conditions for the superiority among the two truncated estimates. In this regard Theorem 4.1 in this paper is a potentially useful tool for such investigation.

## Acknowledgments

This research was supported by the Intramural Research Program of the National Institute of Child Health and Human Development, National Institutes of Health. Wu’s research was also partially supported by Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality PHR (IHLB) and National Natural Science Foundation of China (NSFC). The opinions expressed in the article are not necessarily of the National Institutes of Health. The authors wish to thank two referees for their valuable suggestions which considerably improved the paper.

## Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

## References

• Burch BD, Iyer HK. Exact confidence intervals for a variance ratio (or Heritability) in a mixed linear model. Biometrics. 1997;53:1318–1333.
• Burdick RK, Larsen GA. Confidence intervals on measures of. variability in repeatability and reproducibility study. J. Quality Tech. 1997;29:261–273.
• Burton P, Tiller K, Gurrin L, Cookson W, Musk A, Palmer L. Genetic variance components analysis for binary phenotypes using generalized linear mixed models(GLMMs) and Gibbs sampling. Genetic Epidemiology. 1999;17:118–140. [PubMed]
• Crainiceanu CM, Ruppert D. Likelihood ratio tests in Linear Mixed Models with One Variance Component. J. Royal Stat. Soc. B. 2004;66:165–185.
• Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall; 1996.
• Demidenko E. Mixed Models: Theory and Applications. New Jersey: John Wiley & Sons; 2004.
• Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. New York: Oxford Science; 2002.
• Hazelton ML, Gurrin LC. A note on genetic variance components in mixed models. Genetic Epidemiology. 2003;24:297–301. [PubMed]
• Henderson CR. Estimation of variance and covariance components. Biometrics. 1953;9:226–253.
• Rao CR. Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley; 1973.
• Searle SR, Casella G, McCulloch CE. Variance Components. New York: Wiley; 1992.
• Stram DO, Lee JW. Variance components testing in the longitudinal mixed-effects model. Biometrics. 1994;50:1171–1177. [PubMed]
• Wald A. A note on the analysis of variance with unequal class frequencies. Ann. Math. Statist. 1940;11:96–100.
• Wang CM, Lam CT. A mixed-effects models for the analysis of circular measurements. Technometrics. 1997;39:119–126.
• Wang SG, Chow SC. Advanced Linear Models. New York: Marcel Dekker Inc.; 1994.
• Wang SG, Yin SJ. A new estimate of the parameters in linear mixed models. Science in China (Series A) 2002;45:1301–1311.
• Wu MX, Wang SG. Simultaneous optimal estimates of fixed effects and variance components in the mixed model. Science in China (Series A) 2004;47:787–799.

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.