Stat Biosci. Author manuscript; available in PMC 2010 May 1.
Published in final edited form as:
Stat Biosci. 2009 May 1; 1(1): 10–31.
PMCID: PMC2791377
NIHMSID: NIHMS130405

# Efficient Semiparametric Marginal Estimation for the Partially Linear Additive Model for Longitudinal/Clustered Data

## Abstract

We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.

Keywords: Additive models, Generalized least squares, Interaction testing, Nonparametric regression, Partially linear model, Repeated measures, Smooth backfitting, Tukey-type models

## 1 Introduction

We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, see below, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction.

There has been considerable interest in the simplest version of this problem. Suppose that there are i = 1, … , n individuals, and j = 1, … , J observations per individual. The responses are Yij, the scalar predictors modeled nonparametrically are Xij, and the possibly vector-values parameters modeled parametrically are Zij. A simple model says that given (Xi1, …, XiJ),

$Yij=Zij⊤θtrue+mtrue(Xij)+ϵij;cov(ϵ˜i)=cov(ϵi1,…,ϵiJ)⊤=Σtrue,$
(1)

where ϵ̃ has mean zero. This is the classical partially linear model in the repeated measures case.

When there are no repeated measures (J = 1), there is a vast literature on estimating θtrue under various conditions (Härdle et al. 2000). With the repeated measures, estimation using kernel methods has been treated (Wang et al. 2005) and considerably generalized (Lin and Carroll 2006).

In this paper, our goal is to consider repeated measures models such as (1) in the case that the argument of mtrue(·) is multivariate, rather than a scalar. As is usual, to avoid the curse of dimensionality we take an additive modeling approach. Suppose we have a random sample ${(Yi⊤,Xi⊤)⊤}$ for i = 1, …, n, where Yi = (Yi1, …, YiJ) and Xi = (Xi11, …, Xi1J, …, XiD1, …, XiDJ) with

$Yij=Zij⊤θtrue+m0,true+∑d=1Dmd,true(Xidj)+ϵij;ϵ˜i=(ϵi1,…,ϵiJ)⊤∼(0,∑true),$
(2)

where Σtrue has elements σk,true and is positive definite. We have J repeated measurements consisting of one response and D regressors from n individuals. Assume X1 has the density p(·). We denote the density of (X11, …, X1D) as p(·), the density of (X1sk,X1b) as $pklsb$ and the density of X1dk as $pkd$.

As far as we are aware, there has been no considerations of efficient estimation of θtrue in model (2), and it is the purpose of this paper to provide such estimates and the accompanying theory. If θtrue were known, then the functions {md,true(·)} can be estimated efficiently by a local linear smooth backfitting algorithm (Carroll et al. 2009) and reviewed in Section 2. We combine this methodology with a semiparametric profiling argument (Lin and Carroll 2006) to develop estimates of θtrue and inference for it in Section 3.

While the algorithm for estimating θtrue is quite simple, technically the development is challenging because it involves fitting additive nonparametric models to data that do not have the additive structure. Such a technical issue does not arise in simpler models such as (1), where there is only one nonparametric function being estimated. In particular, as part of the work, we describe the properties of additive model regression when the additive model fails, a result of interest in itself.

Our paper is organized as follows. In Section 2, we review the basic repeated measures smooth backfitting algorithm for local linear regression. Included in this section is a new result about the properties of additive model regression when the additive model fails. Section 3 gives the algorithm for estimating θtrue, and states the asymptotic results for it. A simulation study is described in Section 4, while Section 5 describes an empirical data set from nutritional epidemiology. Section 6 describes a non-standard model arising in genetics in which it is interesting to test a null hypothesis that a parameter equals zero. We develop a score test approach for this problem, the theory for which again uses our new theory of misspecified additive models. Section 7 gives concluding remarks. Proofs are sketched in a technical appendix.

## 2 Local Linear Smooth Backfitting and its Properties

In this section, we describe the local linear smooth backfitting estimator for repeated measures data (Carroll et al. 2009), state its properties when an additive model holds, and then describe its properties when the additive model fails. The latter result is new.

### 2.1 The Local Linear Estimator

Let K0 be a base kernel function and $Kh0(u)=h−1K0(u/h)$. Define a boundary corrected kernel function by $Kh(u,v)=I(u,v∈S)Kh0(u−v)/∫SKh0(w−v)dw$, where S is the support.

We allow for estimation of the covariance matrix Σtrue. Let the elements of $Σtrue−1$ be bjk,true, and for an arbitrary covariance matrix Σ, let the elements of Σ−1 be bjk. Write the symmetric matrix Σ−1/2 to have elements ajk. Note that $∑ℓ=1Jajℓ,Σakℓ,Σ=bjk,Σ$. One choice of Σ is the covariance estimator based on the residuals of the pooled data, but more complex generalizations are easily handled.

Let 1 be a vector of ones and $m^0Σ=n−1∑i=1n1⊤Σ−1Yi/1⊤Σ−11$. For a local linear fit, consider the smoothed sum of squares given as follows:

(3)

Here, x = (x11, … , x1J, … , xD1, … , xDJ). The goal of smooth backfitting is to minimize (3) in the functions (m1, … , mD,m1, … , mD).

Define the following density estimators $p^ka(x)=n−1∑i=1nKha(x,Xiak),p^jkab(x,y)=n−1∑i=1nKha(x,Xiaj)Khb(y,Xibk)$ for j,k = 1, … , J and a,b = 1, … , D. The minimizer of (3) exists and is given as follows (Carroll et al 2009). Define the following estimators, for q = 1, … , D, r = 0,1,2,

$u^sr(x)=∑j=1Jbjj,Σn−1∑i=1n(Xiqj−xhq)rKhq(x,Xiqj)$

and for q,p = 1, … , D; r,s = 0,1,

$υ^qrs(x,y)=∑j=1J∑k=1,≠jJbjk,Σn−1∑i=1n(Xiqj−xhq)r(Xiqk−yhq)sKhq(x,Xiqj)Khq(y,Xiqk);υ^qprs(x,y)=∑j=1J∑k=1Jbjk,Σn−1∑i=1n(Xiqj−xhq)r(Xipk−yhp)sKhq(x,Xiqj)Khp(y,Xipk).$

Then we obtain the following system of fitting integral equations: for d = 1, … , D,

(4)

where

$M^d(xd)={u^d0(xd)u^d1(xd)u^d1(xd)u^d2(xd)};{m˜dΣ(xd)m˜d,Σ(xd)}=M^d−1(xd){n−1∑i=1n∑j,k=1Jbjk,Σ(Yik−m^0Σ)Khd(x,Xidj)n−1∑i=1n∑j,k=1Jbjk,Σ(Yik−m^0Σ)Xidj−xhdKhd(x,Xidj)};V^d(xd,t)={υ^d0,0(xd,t)υ^d0,1(xd,t)υ^d1,0(xd,t)υ^d1,1(xd,t)};V^ds(xd,t)={υ^ds0,0(xd,t)υ^ds0,1(xd,t)υ^ds1,0(xd,t)υ^ds1,1(xd,t)},$
(5)

for d, s = 1, … , D. They impose the identification condition

$∫∑k=1J∑j=1Jbjk,Σ{md(x)p^jd(x)+md(x)n−1∑i=1n(Xidj−xhd)Khd(x,Xidj)}dx=0.$
(6)

It is easy to check that the solution of Equation (4) automatically satisfies the identification condition.

A backfitting algorithm (Carroll et al. 2009) can be used to implement this estimator without having to compute the high dimensional integrals in (3). Define

${𝓖^ddΣ(f,g)⊤}(x)=M^d−1(x)∫V^d(x,t){f(t),g(t)}⊤dt;{𝓖^daΣ(f,g)⊤}(x)=M^d−1(x)∫V^da(x,t){f(t),g(t)}⊤dt.$

Then the backfitting algorithm takes the following form. Equation (4) can be rewritten as

${(I+𝓖^ddΣ)(md,md)⊤}(xd)={m˜dΣ(xd)m˜d,Σ(xd)}−∑s=1,≠dD{𝓖^ds(m^s,m^s)⊤}(xd),d=1,…,D,$

where I is the identity mapping. Based upon this observation, we build up a backfitting type algorithm, as follows.

1. Set the initial values $md[0]$ and md[0] for d = 1, … , D which satisfy the identification condition.
2. Let ($md[k]$, md[k]) be the estimates of the dth function in the kth iteration. The updating equation is defined as
${(I+𝓖^ddΣ)(md[k],md[k])⊤}(xd)={m˜dΣ(xd)m˜d,Σ(xd)}(xd) −∑s=1d−1{𝓖^ds(ms[k],ms[k])⊤}(xd)−∑s=d+1D{𝓖^ds(ms[k−1],ms[k−1])⊤}(xd).$
Note that in practice the integrations in the operators and the inversion of the operators $I+𝓖^ddΣ$ can be done by discrete approximations of functions on grid points. In our simulations and in our data example we applied the trapezoidal rule for numerical integration.
3. Iterate this updating until it converges. Convergence of the algorithm has been proven (Carroll et al. 2009).

### 2.2 Properties When the Additive Model Holds

The following results are known (Carroll et al. 2009). Make the following assumptions.

#### ASSUMPTIONS

• A1. For j, k = 1, … , J and d, s = 1, … , D, $pjkds$ is bounded away from zero and infinity on its support, [0, 1]2, and have continuous partial derivatives.
• A2. E|Y1j|r0 < ∞ for j = 1, … , J and some r0 > 5/2.
• A3. The true component functions md,true’s are twice continuously differentiable.
• A4. The base kernel function K0 is a symmetric density function with compact support, [−1, 1] say, and is Lipschitz continuous.
• A5. n1/5hd converge to constants cd > 0 for d = 1, … , D as n goes to infinity.

Let Σ be any positive definite matrix. We say converges to Σ in probability if

$supx:‖x‖2=1x⊤(Σ^−Σ)x=op(1).$

Let $Pd(xd)=diag{p1d(xd),…,pJd(xd)}$. Define, for d = 1, …, D,

$υdΣ(xd)=cd−1μ0{(K0)2}trace{Σ−1ΣtrueΣ−1Pd(xd)}{∑j=1Jbjj,Σpjd(xd)}−2;βdLL(xd)=(1/2)μ2(K0)hd2md,true″(xd),$

where μr(K) = ∫ trK(t)dt for a function K. They show that if the positive definite estimator converges to Σ in probability, it follows that for any x1, …, xd (0; 1),

$n2/5{m^1Σ^(x1)−m1,true(x1)−β1LL(x1)⋮m^DΣ^(xD)−mD,true(xD)−βDLL(xD)}→Normal[0,diag{υ1Σ(x1),…,υDΣ(xD)}];n1/2(m^0Σ^−m0,true)→Normal{0,1⊤Σ−1ΣtrueΣ−11(1⊤Σ−11)2}.$

If the covariance estimator converges to Σtrue in probability, for d = 1, … , D, we have βd,true(xd) defined with Σtrue instead of Σ and

$υd,true(xd)=cd−1μ0{(K0)2}{∑ℓ=1Jbℓℓ,truepℓd(xd)}−1;n1/2(m^0Σ^−m0,true)→Normal{0,(1⊤∑true−11)−1}.$

##### Remark 1

The bias term of our estimators is the same as for ordinary nonparametric regression with no repeated measures and no additive model. This bias term in our estimator does not involve the density function of the X’s, which is called design independent bias (Fan and Gilbels 1996). Also, when the covariance estimator converges to Σtrue in probability, our estimator is efficient.

##### Remark 2

The assumption A5 means that ordinary bandwidth estimation will work in our context, and the semiparametric contexts to follow. Section 3.2 of Carroll et al. (2009) discusses this issue. Also see Maity, et al. (2007) for other comments on bandwidth estimation in semiparametric models.

### 2.3 When the True Regression Function is not Additive

In this section, we do not assume an additive model but instead assume only that

(7)

and where ϵ̃ has mean zero.

One can expect that the proposed smooth backfitting type estimator converges to an additive projection of m(·) under the model (7), because the estimator minimizes a distance between responses and additive functions. We denote this additive projection by $madd(x1,…,xD)=m0*+∑d=1Dmd*(xd)$. The explicit definition of the $md*’s$ is given in equation (A.2) of Appendix A.1. Define $m˜dϵˇ,Σ$ the same as $m˜dΣ$ in (5) by using ϵ̌ij Yijmadd(Xi1j, … , XiDj) = m(Xi1j, … , XiDj) − madd(Xi1j, … , XiDj) + ϵij instead of Yij0.

#### Proposition 1

Under assumptions A1–A5 with md,true replaced by $md*$, we have, for d = 1, … , D, for the local linear estimator,

$supxdϵ[0,1]|m^d(xd)−md*(xd)−βdLL(xd)−m˜dϵˇ,Σ(xd)|=op(n−2/5).$

##### Remark 3

It is easy to see that

$n2/5m˜dϵˇ,Σ(xd)⇒dNormal{0,cd−1μ0{(K0)2}trace{Σ−1(Σtrue+∑dm)Σ−1Pd(xd)}{∑j=1Jbjj,Σpjd(xd)}−2}$

where $∑dm=covd*{(m(X111,…,X1Dj)−madd(X111,…,X1Dj))j=1,…,J⊤$ and where $covd*$ means the conditional covariance matrix given X1d1 = xd, … , X1dJ = xd. The expansions coincide with the case when the true regression function is additive. However, when it is not additive, in the variance expression the covariance Σtrue is replaced by $Σtrue+Σdm$. This additional price for deviations from additivity also appeared in applications of smooth backfitting to spatial data (Lu et al. 2007).

##### Remark 4

It is important to remember that when the true regression function is not additive, an additive regression model does not consistently estimate the true regression function, but estimates $m0+∑d=1Dmd*(xd)$, which is the projection of the true regression function onto the space of additive models.

## 3 Profiling Estimator in Partially Linear Additive Models

### 3.1 The Method

In many studies, it is useful and interesting to impose a linear relation between the response and some covariates as in (2). As we now show, estimation of θtrue is easily accomplished by a profiling argument.

For any random sample (W1, … , Wn), let $m^0Wand{m^dW(·)}d$ be the (componentwise) smooth backfitting estimate when treating W as the responses and let $md*,W(·)$ be the limit of the latter. Here we suppress the dependence of the estimation on Σ. Define $U^iℓj=Ziℓj−m^0Zℓ−∑d=1Dm^dZℓ(Xidj),𝒰^i=(U^iℓj)ℓ=1,…,p;j=1,…,JandS^Z(Σ)=n−1∑i=1n𝒰^i∑−1𝒰^i⊤$. Mimicking Lin and Carroll (2006), for any working covariance matrix Σ, the profile estimator is easily found to have the closed form expression

$θ^=S^z−1(Σ)n−1∑i=1n∑k=1J∑j=1Jbjk,ΣU^ik{Yij−m^0Y−∑d=1Dm^dY(Xidj)}$

where Ûik = (Ûi1k, … , Ûipk).

#### Remark 5

Note crucially here that Ûik is the result of an additive regression of Z in X, even though that regression need not be additive in actuality. This is where the result in Section 2.3 comes into play.

### 3.2 Asymptotic Theory

Define $Uiℓj=Ziℓj−m0*,Zℓ−∑d=1Dmd*,Zℓ(Xidj),𝒰i=(Uiℓj)ℓ=1,…,p;j=1,…,J,Uik=(Ui1k,…,Uipk)⊤andSZ(Σ)=E(𝒰iΣ−1𝒰i⊤)$. Then, by using Proposition 1 and orthogonality properties in the calculations, we have the following result, sketched in Appendix A.2.

#### Proposition 2

Suppose SZ(Σ) is invertible. Make the assumptions of Proposition 1 for the additive regressions with covariates Xi and with responses Yi and Zi = (Zi1, … , Zij) for = 1, … , p. We also assume that the kernel function K0 has bounded derivatives and E{exp(|Zij|)} < ∞. Then we have

$n1/2(θ^−θtrue)=SZ−1(Σ)n−1/2∑i=1n∑k=1J∑j=1Jbjk,ΣUikϵij+op(1)=SZ−1(Σ)n−1/2∑i=1n𝒰iΣ−1ϵ˜i+op(1)⇒Normal{0,SZ−1(Σ)E(𝒰Σ−1ΣtrueΣ−1𝒰⊤)SZ−1(Σ)}.$
(8)

When the working covariance matrix equals the true covariance matrix, i.e., Σ = Σtrue,

$n1/2(θ^−θtrue)⇒Normal{0,SZ−1(Σtrue)}.$
(9)

##### Remark 6

Plug-in estimators of the covariance matrices in (8) and (9) can be constructed easily. For instance, and SZtrue) can be estimated by and ŜZtrue) as in Section 3.1, thus giving an estimator for the covariance matrix in (9). An estimator of Σtrue can be constructed using the sample covariance of residuals from a nonparametric additive regression using working independence. For the covariance matrix in (8), the term E(Σ−1ΣtrueΣ−1) can be estimated by the sample covariance matrix of the estimates of iΣ−1ϵ̃i.

### 3.3 Semiparametric Efficiency

For Gaussian errors the semiparametric efficiency bound for the variance can be shown to equal $SZ−1(Σtrue)$. Thus in this case is asymptotically efficient. Here we state the main details.

Consider the model given in (2) where ϵ̃i = Normal(0, Σtrue). Recall that $Σtrue−1=(bjk,true)j,k=1,…,J$. Define the symmetric matrix $Σtrue−1/2=(ajk,true)j,k=1,…,J$.

#### Proposition 3

Assume that ϵ̃ = Normal(0, Σtrue). Given Σ = Σtrue, the semiparametric efficient influence function for θ0 is given by

$Ψθ,eff=SZ−1(Σtrue)∑j=1J∑k=1Jbjk,trueUkϵj=SZ−1(Σtrue)𝒰Σtrue−1ϵ˜.$

The semiparametric variance bound for θ0 is given by $E(Ψθ,effΨθ,eff⊤)=SZ−1(Σtrue)$, and in this case our estimator is semiparametric efficient.

## 4 Simulations

In this section, we discuss finite sample properties of the proposed estimators via simulation studies. Compared to the ordinary partially linear model, the models considered in this paper have two important characteristics: one is that we observe independent clusters which may have within-cluster correlations and the other is that the functional parameter in the regression model has an additive structure. Our method considers these two aspects of the model. We will compare the finite sample performance of our estimator and three alternatives, as follows.

1. Our proposed method which considers within-cluster correlations and the additive structure of the nuisance parameter
2. An estimator which consider only the additive structure of the nuisance parameter, but assumes that there is no correlation within the clusters. This is the working independence estimator for an additive model (Yu et al 2009).
3. An estimator which consider only within-cluster correlations, but does not model the nonparametric function additively. This method is based on the estimators of Wang (2003) and Wang, et al. (2005), but instead uses a multivariate kernel function.
4. An estimator which consider neither the correlations nor the additive structure, i.e., the ordinary profile multivariate-kernel estimator.

We considered the semiparametric additive regression model

$Yij=βZij+m1(Xi1j)+m2(Xi2j)+ϵij$

where β = 1:5, m1(x) = sin{2π (x − 0.5) and m2(x) = x − 0.5 + sin{2π(x − 0.5)}. The sample size was 200, the bandwidth was 0.1, there were J = 3 repeated measures and the grid size for integration was 0.01. For each scenario we generated 500 data sets. Estimation of the covariance matrix used the residuals from the pooled data estimator. We generated ϵi = (ϵi1i2i3) from Normal(0, ΣE) where ΣE has elements σij. We investigated seven cases. For Cases 1, 2 and 3, we used the exchangeable covariance matrix ΣE = (1 − ρa)I + ρa11 with ρ1 = 0.9, ρ2 = 0.5 and ρ3 = 0.1, respectively. For Case 4, we used common variances σ11 = σ22 = σ33 = 1 with σ12 = 0.9, σ13 = 0:5 and σ23 = 0:4. For Case 5, we used common variances σ11 = σ22 = σ33 = 1 with AR(1) structure having correlation coefficient −0.9. Finally, for Cases 6 and 7 we allowed heteroscedasticity with σ11 = 9, σ22 = 4 and σ33 = 1, and with common correlation 0.9 and 0.1, respectively.

The six dimensional vector Xi was generated from independent Normal(0.5,0.25) but truncated to the 6 dimensional unit cube [0, 1]6. The covariate Zij was generated as Zij = 3(1−2Xi1j)(1−2Xi2j)+uij where uij were independently from Normal{0, 0.5}. Note that the function f(x,y) = (1−2x)(1−2y) is orthogonal to additive functions under this setup so that $E{f(Xi11,Xi21),…,f(Xi1J,Xi2J)}Σtrue−1{f(Xi11,Xi21),…,f(Xi1J,Xi2J)}⊤$ is the source of our gaining efficiency by considering additivity. To see this, recall that the asymptotic variance of type C estimator is $E(𝒱iΣtrue−1𝒱i⊤)−1$ where i = {Zi1E(Zi1|Xi11,Xi21), … , ZijE(Zij|Xi1j,Xi2j)}. Then

because the function f(x,y) = E(Z|X1 = x,X2 = y) is orthogonal to additive functions.

We report the mean squared errors (MSE) for estimating β in Table 1. The biases were negligible in almost every case and are not reported here. The worst bias was about −0.015 for the local linear estimator of type D in Case 7. Here is a summary of the main findings.

MSEs × 104 of the estimators for β based on 500 samples of size 200. LC stands for local constant estimator and LL for local linear. The seven covariance types are described in the text. Method A accounts for additivity and within-cluster ...
• There is little difference between the local linear and local constant estimators.
• Our method that takes into account both the additivity and the within-cluster correlation does much better than the other methods.
• The comparison of whether to ignore the additivity or the within-cluster correlation, methods B and C, depends on the covariance structure.
• The method that ignores both the additivity and the within-cluster correlations is wildly inefficient.

## 5 Urinary Nitrogen Example

To illustrate the repeated measures smooth backfitting algorithm, we use data from the OPEN Study (Kipnis et al 2003). Background on nutritional epidemiology may be found in Willett (1990). The data was analyzed in Carroll et al (2009) with the additive model framework but no parametric part.

We use a data set of 294 men and women measured at two visits who reported their short-term intake of protein Yij as measured by the biomarker urinary nitrogen. Here the protein biomarker data were log-transformed. To predict protein intake we used three variables, body mass index Xi1j, log-protein intake Xi2j as measured by a 24-hour recall instrument and log-protein intake Zij as measured by a food frequency questionnaire. A preliminary analysis using three dimensional nonparametric additive fit suggested linearity in the last predictor Zij. The residuals from an additive regression fit suggested an estimated covariance matrix with variances 0.065 and 0.075 and with a correlation of 0.507. The bandwidths were selected as in Carroll et al (2009) and are given as 3:02 for body mass index Xi1j and 0.374 for log-protein via the 24-hour recall Xi2j.

Under this working covariance, we fitted the following model:

$E(Yij|Zij,Xi1j,Xi2j)=m0+βZij+m1(Xi1j)+m2(Xi2j).$

We also found that third order and second order polynomials approximate m1(·) and m2(·) fairly well, respectively. Table 2 shows the estimates of β from the parametric polynomial fits and semiparametric fits with (WD) and without (WI) taking account the within-cluster correlations. As is seen in the table, they give similar results with slightly smaller standard errors when taking account of the within-cluster correlations. We also found that the average length of pointwise confidence interval of WD estimators for nonparametric functions were 13.5% shorter than WI estimators. Figure 1 shows the residual plots and Figure 2 shows the fits of the functions m1(·) and m2(·).

Residual plots from parametric polynomial models and semiparametric models with WI and WD estimators.
OPEN Study data: nonparametric and ploynomial function fits for body mass index, left column, and the protein recall, right column, as predictors of the protein biomarker, ”Protein B”, along with the 95% pointwise confidence interval for ...
The estimates of β from parametric polynomial models: WD means taking into account the within-cluster correlations, while WI ignores these correlations.

## 6 Score Testing for Parametric Effects in a Complex Interaction Model

Although our major focus is on the partially linear repeated measures additive model, the power of our results and another application of Proposition 1 can be seen in their application to a unique testing problem not considered previously in the additive models case. In a genetics context (Chatterjee et al. 2006), there is a parametric modeling framework for testing certain parametric components in a way that gains considerable power in the presence of interactions. The natural version of their model in our framework is as follows. We have additional covariates Zij, and the model consists of a parametric component in the Zij, an additive model in the Xidj and a possible interaction determined by a scalar γ, and is

$Yij=m0,true+∑d=1Dmd,true(Xidj)+{1+γ∑d=1Dmd,true(Xidj)}Zij⊤θtrue+ϵij.$
(10)

Here the goal is to test whether Z influences Y, and thus the null hypothesis is that H0true = 0, see Chatterjee et al. for detailed motivation. Basically, the idea is that in some cases it can be expected that the Z’s and the X’s interact, and model (10) is one way to exploit this expectation to obtain a more powerful test for the main effect of Z.

The value of the interaction γ is not identified under the null hypothesis, and Chatterjee et al. refer to this as a Tukey-type formulation, with the exception that the interest is in the main effect and not in the interaction.

Because (10) is a complex model that is not even identified under the null model, Chatterjee et al. suggest using a modified score test procedure. In our case, even more is true, because even if we fix γ ≠ 0, (10) is a non-standard model for which there are no available methods. Maity, et al. (2009) consider this problem when there are no repeated measures, and our development is a generalization of theirs.

The normalized loglikelihood score statistic in θ for model (10) evaluated at θtrue = 0 is easily seen to equal

$𝒮n(γ)=n−1/2∑i=1n∑j=1J∑k=1Jbjk,Σtrue{1+γ∑d=1Dm^d(Xijd)}Zij{Yik−m^0−∑s=1Dm^s(Xisk)},$

where {d(·)}d are computed at the null model with θtrue = 0. Chatterjee et al. (2006) suggest normalizing the score statistic n(γ) so that it is asymptotically chi-square, and then maximizing over an interval of γ-values.

One can derive uniform expansions of the function estimates similar to that of Lin and Carroll (2006) to order op(n−1/2) and use those to derive the asymptotic distribution of n(γ) under the null hypothesis that θtrue = 0. However, two difficulties arise. The first is that undersmoothing is required, i.e., $nhd4→0$ is needed for d = 1, … , D. Second, and critically, the distribution depends upon terms that are not directly computable because they are the solution to an integral equation, and hence computing p-values in simple and easily computed ways is not possible.

To overcome both problems raised above, we use the following projection-type approach. Define $Qij(γ)={1+γ∑d=1Dmd,true(Xidj)}ZijandQ^ij(γ)={1+γ∑d=1Dm^d(Xidj)}Zij$, where again {d(·)}d are computed in model (10) with θtrue = 0.

Let $m^dQ(·,γ)}dandmˇdQ(·,γ)}d$ be the smooth backfitting estimators when the ”responses” are Qij(γ) and ij(γ), respectively, and similarly for $m^0Qandmˇ0Q$. We emphasize that we are not assuming that Qij(γ) follows an additive model, and this is where Proposition 1 comes into play.

We propose to use the adjusted score

$𝒮n,adj(γ)=n−1/2∑i=1n∑j=1J∑k=1Jbjk,Σtrue{Q^ij(γ)−mˇ0Q−∑d=1DmˇdQ(Xidj,γ)}×{Yik−m^0−∑s=1Dm^s(Xisk)}.$

Define $UijQ(γ)=Qij−m0Q−∑d=1DmdQ(Xidj,γ)$ and iQ(γ) = {Ui1Q(γ), … , UiJQ(γ)}. Then we have the following result, which is the first step in the construction of the test-statistic, namely normalization to a chi-squared random variable for fixed γ.

#### Proposition 4

Suppose that the dimension of Z is pw. Assume that the assumptions of Proposition 1 holds for the additive regressions with covariates Xi and responses Yi and (Qi1, … , QiJ). We also assume that the kernel function K0 has bounded derivative and E exp(|Qij|) < ∞. Then, under the null hypothesis, we have

$𝒮n,adj(γ)=n−1/2∑i=1n∑j=1J∑k=1Jbjk,ΣtrueUijQ(γ)ϵik+op(1)→Normal[0,E{𝒰Q(γ)Σtrue−1𝒰Q⊤(γ)}].$
(11)

Let $Σ^true−1$ with elements jk be a consistent estimate of $Σtrue−1$ and define $U^ijQ(γ)=Q^ij−mˇ0Q−∑d=1DmˇdQ(Xidj,γ)$. Then

$Λ^(γ)=n−1∑i=1n∑j=1J∑k=1Jb^jkU^ijQ(γ)U^ikQ⊤(γ)$

is a consistent estimate of $E{𝒰Q(γ)Σtrue−1𝒰Q⊤(γ)}$, and thus $𝓣n(γ)=𝒮n,adj⊤(γ)Λ^−1(γ)𝒮n,adj(γ)$ is asymptotically chisquared with pw degrees of freedom.

### 6.2 Implementation and p-Values

Chatterjee et al. suggest the following procedure. Instead of choosing an arbitrary γ, they form n = max0≤γ≤b n(γ). We emphasize that they do not consider our model, only parametric ones. It is possible to show that n(γ) converges weakly to a Gaussian process in γ. Hence, at least in principle, it is possible to use this fact directly to obtain p-values, but the process is not easy because it requires sampling from the Gaussian process.

An alternative is to use the parametric bootstrap, by assuming that the (ϵij) are normally distributed and generating bootstrap samples under the null hypothesis for model (10) evaluated at θ0 = 0, with the fits {d(·)} replacing the true functions. While simple, this replies on the normality assumption and requires computing smoothing backfitting estimates for all the bootstrap samples.

To avoid these difficulties, we reply on the linear expansion (11). Then we propose to use n as the test statistic. Its percentiles can be estimated as follows. Let B be a large number. For b = 1, … , B, generate mean zero normal random variables bij such that (bi1, … , biJ) = Normal(0, Σtrue). Then compute

$𝒮bn,adj(γ)=n−1/2∑i=1n∑j=1J∑k=1Jbjk,ΣtrueU^ijQ(γ)𝓏bik;𝓣bn=max0≤γ≤b𝒮bn,adj⊤(γ)Λ^−1(γ)𝒮bn,adj(γ).$
(12)

Note that except for the estimation of UijQ, the limiting distribution of bn,adj(γ) is the same as the limiting distribution of n,adj(γ), no matter what the distribution of the (ϵij) is. This means that a consistently estimated p-value for the null hypothesis is $B−1∑b=1BI(𝓣n<𝓣bn)$.

#### Remark

It is worth emphasizing that in constructing bn,adj(γ) for b = 1, … , B in (12), smooth backfitting is not used. Our methods require only that smooth backfitting for repeated measures be applied to the actual data, only once, to construct d(·) and $mˇdQ(·,γ)$. Thus, computing the p-value is not a difficult process.

## 7 Discussion and Conclusions

We have discussed fitting semiparametric additive models in the repeated measures context. We have used the smooth backfitting kernel approach to this problem largely because it has a tractable asymptotic theory, with good efficiency properties. In practice, of course, regression spline approaches can be used as well. In simulations not reported here, we have found that regression spline approaches have similar good small-sample behavior, although asymptotic theory would be challenging.

## Acknowledgments

Yu and Mammen’s research was supported by the Deutsche Forschungsgemeinschaft project MA 1026/7-3. Carroll and Maity’s research was supported by a grant from the National Cancer Institute (CA57030). Carroll’s work was also supported by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST).

## Appendix: Proofs and Complements

#### A.1 Proof of Proposition 1

Let = {f = (f1, … , fJ) : fj are real valued functions on [0,1]D} and $ℋ={f=(f,…,f)⊤:f(x)=∑d=1Dfd(xd)$ is additive functions on [0,1]D}. Note that is a subspace of and has one common function f. We equip with a Hilbert (semi-)norm ‖fΣ = ∫f(x−1f (x)p(x)dx. Let also 0 = {f ϵ :< 1,f >Σ= 0} where 1 = (1, … , 1) ϵ and < ·,· >Σ is the inner product inducing the norm ‖·‖Σ. Let (x) = {m(x1), … , m(xJ)} with xj = (x1j, … , xDj). For simplicity, we assume that the < , 1 >Σ= 0, i.e., it is centered. Define $m*(x)={∑d=1Dmd*(xd1),…,∑d=1Dmd*(xdJ)}⊤$ as the projection of onto 0, i.e. m* = argminh0– h‖Σ: This yields that

$0=Σ=∫{m(x1)−∑d=1Dmd*(xd1),…,m(xJ)−∑d=1Dmd*(xdJ)}×Σ−1{∑d=1Dgd(xd1),…,∑d=1Dgd(xdJ)}⊤p(x)dx$
(A.1)

for any $g(x)=(∑d=1Dgd(xd1),…,∑d=1Dgd(xdJ))⊤∈ℋ0$ for which the integral exists. We can derive following system of integral equations from the equation (A.1):

$md*(xd)=mdΣ(xd)−∫md*(t)∑j=1J∑k≠ℓJbjk,Σpjkdd(xd,t)∑j=1Jbjj,Σpjd(xd)dt −∑s=1,≠dD∫ms*(t)∑j=1J∑k=1Jbjk,Σpjkds(xd,t)∑j=1Jbjj,Σpbd(xd)dt$
(A.2)

where $mdΣ(xd)=∑j=1J∑k=1Jbjk,ΣE(Yik|Xidj=xd)pjd(xd)/∑j=1Jbjj,Σpjd(xd)$.

The proof of Proposition 1 can be done as in Carroll et al. (2009) based on the orthogonality that arises in (A.1) and modified definition of functions they call $m˜dV,M(xd)andm˜dV,E(xd)$, which are given as

$m˜dV,M(xd)=n−1∑i=1n∑j,k=1Jbjk,V∑s=1Dms*(Xisk)Khd(xd,Xidj)/∑j=1Jbjj,Vp^jd(xd);m˜dV,E(xd)=n−1∑i=1n∑j,k=1Jbjk,VϵˇikKhd(xd,Xidj)/∑j=1Jbjj,Vp^jd(xd).$

Note that the orthogonality in equation (A.1) implies that $E{∑j,k=1Jbjk,Σϵˇ1kKhd(xd,X1dj)}=0ford=1,…,D$.

#### A.2 Sketch Proof of Proposition 2 and 4

The basic idea of the proof is related to that of Yu et al. (2009). By Proposition 1, we have $S^Z(Σ)→pSZ(Σ)$. It suffices to show that

Note that $Yij−m^0Y−∑d=1Dm^dY(Xidj)−U^ij⊤θtrue=m0,true−m^0Y(βtrue)+∑d=1Dmd,true(Xidj)−∑d=1Dm^dY(βtrue)(Xidj)+ϵijwhereY(βtrue)=Y−Z⊤βtrue(=m0,true+∑d=1D(md,true(Xd)+ϵ)$ since the estimators are additive in responses. Now, we have

It is clear that Cn2 and Cn3 are op(1) by Proposition 1. Now we will show Cn1 = op(1). By Proposition 1, we have

$supx∈[0,1]D|m0,true−m^0Y(βtrue)+∑d=1D{md,true(xd)−m^dY(βtrue)(xd)}|=op(δn)$
(A.3)

for δn = n−1/5+a with any small positive a. Denote $h^n(x1,…,xd)=m0,true−m^0Y(βtrue)+∑d=1D(md,true(xd)−m^dY(βtrue)(xd))≡h^n,0+∑d=1Dh^n,d(xd)$. Using definition of d(xd), d = 1, … , D and results from Carroll et al (2009), we have

$ddxdm^d(xd)=ddxdm^dΣ(xd)−∫m^d(t)ddxd∑j=1J∑k≠ℓJbjk,Σp^jkdd(xd,t)∑j=1Jbjj,Σp^jd(xd)dt −∑s=1,≠dD∫m^s(t)ddxd∑j=1J∑k=1Jbjk,Σp^jkds(xd,t)∑j=1Jbjj,Σp^bd(xd)dt$
(A.4)

if $∑j=1Jbjj,Σp^jd(xd)>0$, which happens with probability tending to one. By Proposition 1, standard calculations in kernel smoothing and integration by part, we have

$supxd∈[0,1]|ddxdh^n,d(x)−hdbn,d(xd)|=Op(n−3/10logn)=op(δn)$
(A.5)

for some uniformly bounded non-random functions bn,d. Equation (A.3) and (A.5) imply that $δn−1h^n(·)∈B(0,1)≡{g=∑d=1Dgd:gd$ are real functions on [0, 1] with ‖g ≤ 1 and supxd,ydϵ[0,1] |gd(xd) − gd(yd)| ≤ |xdyd| for d = 1, … , D} with probability tending to one. Note that the covering number with bracketing N(η) N{η,B(0,1),‖·‖} of B(0,1) with respect to the uniform norm is bounded by (2η−1)D3Dη−1, i.e., one can find a set (η) that has N(η) elements such that for any g ϵ B(0,1) there exists $giL,giU∈𝒩(η)withgiL≤g≤giUand‖giL−giU‖∞<η$.

Define ${F(Xi,Ziℓ)}g=∑j=1J∑k=1Jbjk,ΣUiℓkg(Xi1j,…,XiDj)$ for g B. Then $Fn(·)≡{n−1/2∑i=1nF(Xi,Ziℓ)}(·)$ can be regarded as a random process indexed by elements of B. Note that E[{F(Xi,Zi)}g] = 0 for any g ϵ B(0,1) by the orthogonality (A.1). Then one can show that supgϵB(0,1) |Fng| = Op(1). This follows by Corollary 8.8 in van de Geer (2000) with the above entropy condition and the tail condition assumed in the proposition.

Finally, we have $pr(|δn−1Cn1|>M)≤pr{supg∈B(0,1)|Fng|>M}+pr{δn−1h^n(·)∉B(0,1)}$. Thus Cn1 = Opn) = op(1). This completes the proof. The same argument can be applied to the proof of Proposition 4.

#### A.3 Proof of Proposition 3

Recall that $U^iℓj=Ziℓj−m^0Zℓ−∑d=1Dm^dZℓ(Xidj)$ and it is a residual of a nonparametric additive regression of (Zi)i=1, … , n on (Xi)i=1, … , n. Hence using Carroll et al. (2009) we obtain that for any p,

$∑ℓ=1J∑j=1Jbℓℓ,truep^ℓd(xd)m^dZp(xd)=n−1∑i=1n∑ℓ=1J∑k=1Jbℓk,true(Zipk−m^0)Khd(xd,Xidℓ) −∫m^dZ(t)∑ℓ=1J∑k≠ℓ=1Jbℓk,truep^ℓkdd(xd,t)dt −∫∑a≠d=1Dm^aZ(t)∑ℓ=1J∑k=1Jbℓk,truep^ℓkda(xd,t)dt.$

Now taking limits as n → ∞ and simplifying, we obtain that for d = 1, … , D,

$0=∑ℓ=1J∑k=1Jbℓk,truepℓd(xd)E[Zpk−m0Zp−∑a=1DmaZp(Xak)|Xdℓ=xd].$
(A.6)

##### A.3.1 Proof of Proposition 3: Semiparametric efficiency

Assume that ϵ̃ = Normal(0, Σtrue). Assuming that Σtrue is known, the loglikelihood (up to a constant) is given by

$ℒ=−(1/2)∑j=1J∑k=1Jbjk,true[Yj−{m0+Zj⊤θ0+∑d=1Dmd(Xdj)}]×[Yk−{m0+Zk⊤θ0+∑d=1Dmd(Xdk)}].$

##### Parametric submodel, score functions and tangent space

Let M denote the true nonparametric model. Now we take a parametric submodel Mγ: for γ = (γ1, … , γD), we replace md(Xdk) by md(Xdkd) and obtain the loglikelihood

$ℒγ=−(1/2)∑j=1J∑k=1Jbjk,true[Yj−{m0+Zj⊤θ0+∑d=1Dmd(Xdj,γd)}]×[Yk−{m0+Zk⊤θ0+∑d=1Dmd(Xdk,γd)}].$

such that γ coincides with for γ = γ0. Under γ, the score functions for m0, θ0 and γ, evaluated at true values of θ0 and m0 and γ = γ0, are given by

$Sm0=∑j=1J∑ℓ=1Jbjℓ,trueϵj;Sθ=∑j=1J∑ℓ=1JZℓbjℓ,trueϵj;Sγd=∑j=1J∑ℓ=1Jηd(Xdℓ)bjℓ,trueϵj,$

where ηd(Xd) are any function.

##### The Tangent Space

We see that the tangent space is spanned by Sm0, Sβ and (Sγ1, … , SγD). We orthogonalize Sθ and the nuisance space spanned by Sm0 and (Sγ1, … , SγD).

We start by observing that by (A.6) it is straightforward that for any function 𝓖(Xd),

$0=E[∑ℓ=1J∑k=1Jbℓk,true𝓖(Xdℓ)E{Zpk−m0Zp−∑a=1DmaZp(Xak)|Xdℓ}].$
(A.7)

Next we derive that for any d,

$E[SθSγd]=E[{∑j=1J∑ℓ=1Jbjℓ,trueϵjZℓ}{∑k=1J∑q=1Jbkq,trueϵkηd(Xdq)}]=E[∑j=1J∑ℓ=1J∑k=1J∑q=1Jbjℓ,truebkq,trueE{Zℓηd(Xdq)ϵjϵk|Xdq}]=E{∑ℓ=1J∑q=1Jbℓq,trueE(Zℓ|Xdq)ηd(Xdq)},$

the last line following from the fact that ϵ̃ = Normal[0, Σtrue]. Similarly we derive that

$E[{∑j=1J∑ℓ=1Jbjℓ,trueϵj(m0Zp+∑a=1DmaZp(Xaℓ))}Sγd] =E[∑ℓ=1J∑q=1Jbℓq,trueE{m0Zp+∑a=1DmaZp(Xaℓ)|Xdq}ηd(Xdq)].$

Hence, using (A.7) we see that for any d,

$E[{Sθ−∑j=1J∑ℓ=1Jbjℓ,trueϵj(m0Z+∑a=1DmaZ(Xaℓ))}Sγd] =E[∑ℓ=1J∑q=1Jbℓq,trueηd(Xdq)E{Zℓ−m0Z−∑a=1DmaZ(Xaℓ)|Xdq}]=0.$

Hence

$SθΣtrue=∑j=1J∑ℓ=1Jbjℓ,true{Zℓ−m0Zℓ−∑a=1DmaZℓ(Xaℓ)}ϵj=𝒰Σtrue−1ϵ˜$

is orthogonal to the nuisance space spanned by (Sγ1, … , SγD).

A similar argument can be used to prove that $SθΣtrue$ is also orthogonal to Sm0.

Hence the efficient score for θ0 is $SθΣtrue$, which in turn gives us the efficient influence function

$Ψθ,eff=SZ−1(Σtrue)𝒰Σtrue−1ϵ˜,$

where $SZ(Σtrue)=E(𝒰Σtrue−1𝒰⊤)$. Hence the result.

## Contributor Information

Raymond Carroll, Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843, USA, ude.umat.tats@llorrac, Telephone 979 845 3141, Fax 979 845 3144.

Arnab Maity, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, U.S.A. ude.dravrah.hpsh@ytiama.

Enno Mammen, Department of Economics, University of Mannheim, L 7, 3–5, 68131 Mannheim, Germany, ed.miehnnam-inu.smmur@nemmame..

Kyusang Yu, Department of Economics, University of Mannheim, L 7, 3-5, 68131 Mannheim, Germany, rk.oc.oohay@ugnasuyk.

## References

• Carroll RJ, Maity A, Mammen E, Yu K. Nonparametric additive regression for repeatedly measured data. Biometrika. 2009 to appear.
• Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions. Am. J. Human Genetics. 2006;79:1002–1016. [PubMed]
• Fan J, Gijbels I. Local Polynomial Modeling and its Applications. London: Chapman and Hall; 1996.
• Härdle W, Liang H, Gao J. Partially Linear Models. Heidelberg: Physica Verlag; 2000.
• Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano R, Bingham S, Schoeller DA, Schatzkin A, Carroll RJ. The structure of dietary measurement error: results of the OPEN biomarker study. Am. J. Epid. 2003;158:14–21. [PubMed]
• Lin X, Carroll RJ. Semiparametric estimation in general repeated measures problems. J. Royal Stat. Soc., Series B. 2006;68:68–88.
• Lu Z, Lundervold L, Tjøstheim D, Yao Q. Exploring spatial nonlinearity using additive approximation. Bernoulli. 2007;13:447–472.
• Maity A, Carroll RJ, Mammen E, Chatterjee N. Powerful multi-locus tests for genetic association with semiparametric gene-environment interactions. J. Royal Stat. Soc., Series B. 2009;71:75–96. [PubMed]
• Maity A, Ma Y, Carroll RJ. Efficient estimation of population-level summaries in general semiparametric regression models with missing response. J. Am. Stat. Assoc. 2007;102:123–139.
• van de Geer S. Empirical processes in M-estimation. Cambridge: Cambridge University Press; 2000.
• Wang N. Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika. 2003;90:43–52.
• Wang N, Carroll RJ, Lin X. Efficient semiparametric marginal estimation for longitudinal/clustered data. J. Am. Stat. Assoc. 2005;100:147–157.
• Willett WC. Nutritional Epidemiology. New York, NY: Oxford University Press; 1990.
• Yu K, Mammen E, Park B. Semiparametric additive regression: gains from the additive structure of the infinite dimensional parameter. Preprint. 2009

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.