Stat Probab Lett. Author manuscript; available in PMC 2010 October 27.

Published in final edited form as:

Stat Probab Lett. 2010 August; 80(15-16): 1242–1252.

doi: 10.1016/j.spl.2010.04.002

PMCID: PMC2964941

NIHMSID: NIHMS218456

We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate procedures for efficient estimation have been proposed, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

The past decade has seen a great deal of interest and activity in nonparametric regression for longitudinal data. A prominent component of this research is the *marginal longitudinal nonparametric regression* problem in which the covariance matrix of the responses for each subject is not modelled conditionally, and instead is an unspecified parameter to be estimated.

Ruppert et al. (2009, Section 3.9) provide a summary of research on this problem up until about 2008. Whilst Zeger and Diggle (1994) is an early reference for marginal longitudinal nonparametric regression, the area started to heat up in response to Lin and Carroll (2001), where it was shown that ordinary kernel smoothers are more efficient if so-called working independence is assumed. This spawned a flurry of activity on the problem. Relevant references include: Lin and Carroll (2000, 2006), Welsh et al. (2002), Wang (2003), Linton et al. (2003), Lin et al. (2004), Carroll et al. (2004), Hu et al. (2004), Chen and Jin (2005), Wang et al. (2005), Fan et al. (2007), Sun et al. (2007), Fan and Wu (2008) and Carroll et al. (2009a,b).

In this article, we describe a relatively simple approach to the marginal longitudinal regression problem and its semiparametric extensions. Our approach is the natural one arising from the mixed model representation of penalized splines (e.g. Brumback et al., 1999; Ruppert et al., 2003) with estimation and inference done using maximum likelihood and best prediction. There is also the option of adopting a Bayesian standpoint and calling upon Markov chain Monte Carlo to achieve approximate inference. An interesting aspect of our marginal longitudinal semiparametric regression models is that Gibbs sampling applies with draws from standard distributions. For the Bayesian version of our model, the BUGS inference engine (Lunn et al., 2000) can be used for fitting, and we provide some illustrative code.

The penalized spline/mixed model approach means that semiparametric extensions of the marginal longitudinal regression problem can be handled straightforwardly. We describe extensions to additive and varying coefficient models, although other extensions can be handled similarly.

Section 2 describes the penalized spline approach and identifies the mixed model structures required to handle marginal longitudinal semiparametric regression problems. In Section 3, we discuss fitting via maximum likelihood and best prediction. Section 4 describes Bayesian inference via Gibbs sampling and BUGS. Illustrations are provided in Section 5 and closing discussion is given in Section 6.

For 1 ≤ *i* ≤ *m* subjects, we observe 1 ≤ *j* ≤ *n* (*n* ≪ *m*) scalar responses *y _{ij}* and predictors *x _{ij}* satisfying

$$E({y}_{ij})=f({x}_{ij}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{Cov}\{{\mathit{y}}_{i}\mid f({\mathit{x}}_{i})\}=\mathit{\sum},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}1\le i\le m,\phantom{\rule{0.38889em}{0ex}}1\le j\le n$$

(1)

for some real-valued smooth function *f* and *n* × *n* covariance matrix **Σ**. The notation *f*(*x _{i}*) means that the function *f* is applied element-wise to the entries of the vector *x _{i}*.

Fig. 1 shows a simulated data set for model (1), with *m* = 100, *n* = 5,

$$f(x)=1+\frac{1}{2}\mathrm{\Phi}((2x-36)/5)\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\mathit{\sum}=\left[\begin{array}{ccccc}0.122& 0.098& 0.078& 0.063& 0.050\\ 0.098& 0.122& 0.098& 0.078& 0.063\\ 0.078& 0.098& 0.122& 0.098& 0.078\\ 0.063& 0.078& 0.098& 0.122& 0.098\\ 0.050& 0.063& 0.078& 0.098& 0.122\end{array}\right],$$

(2)

where *Φ* is the standard normal distribution function. The main problem is efficient estimation of *f* from data such as that shown in Fig. 1. Estimation of **Σ** may also be of interest.
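Data of this kind are easy to generate. The sketch below (in Python, for illustration, although the fitting described later in the paper uses R and BUGS) simulates from (1) with the *f* and **Σ** of (2); the uniform design on (13, 23) is our own assumption, since the paper does not state how the *x _{ij}* were drawn.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import norm

rng = np.random.default_rng(1)
m, n = 100, 5

# Regression function and within-subject covariance from (2)
f = lambda x: 1 + 0.5 * norm.cdf((2 * x - 36) / 5)
Sigma = toeplitz([0.122, 0.098, 0.078, 0.063, 0.050])

# Design: uniform over a range where f is curved (our assumption)
x = rng.uniform(13, 23, size=(m, n))
y = f(x) + rng.multivariate_normal(np.zeros(n), Sigma, size=m)
```

Each row of `y` is one subject's correlated response vector; a working independence analysis would ignore the off-diagonal entries of `Sigma` when smoothing.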

Our approach to function estimation involves spline models for *f* of the form

$$f(x)={\beta}_{0}+{\beta}_{1}x+\sum _{k=1}^{K}{u}_{k}{z}_{k}(x)$$

(3)

where *z*_{1}, …, *z _{K}* is a rich set of spline basis functions. A simple basis arises from setting *z _{k}*(*x*) = (*x* − *κ _{k}*)_{+}, where *κ*_{1}, …, *κ _{K}* are knots spread over the range of the *x _{ij}*.

To avoid over-fitting, the spline coefficients *u _{k}*, 1 ≤ *k* ≤ *K*, are penalized. The mixed model representation of penalized splines achieves this by treating them as random effects with *u _{k}* i.i.d. *N*(0, *σ*^{2}). The model can then be written as

$$\mathit{y}=\mathit{X}\mathbf{\beta}+\mathit{Zu}+\mathbf{\epsilon}$$

(4)

where

$$\begin{array}{l}\mathit{y}=\left[\begin{array}{c}{\mathit{y}}_{1}\\ \vdots \\ {\mathit{y}}_{m}\end{array}\right],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{X}=\left[\begin{array}{cc}\mathbf{1}& {\mathit{x}}_{1}\\ \vdots & \vdots \\ \mathbf{1}& {\mathit{x}}_{m}\end{array}\right],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{Z}=\left[\begin{array}{ccc}{z}_{1}({\mathit{x}}_{1})& \cdots & {z}_{K}({\mathit{x}}_{1})\\ \vdots & \ddots & \vdots \\ {z}_{1}({\mathit{x}}_{m})& \cdots & {z}_{K}({\mathit{x}}_{m})\end{array}\right],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathbf{\epsilon}=\left[\begin{array}{c}{\mathbf{\epsilon}}_{1}\\ \vdots \\ {\mathbf{\epsilon}}_{m}\end{array}\right],\\ \mathbf{\beta}=\left[\begin{array}{c}{\beta}_{0}\\ {\beta}_{1}\end{array}\right]\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{u}=\left[\begin{array}{c}{u}_{1}\\ \vdots \\ {u}_{K}\end{array}\right].\end{array}$$
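The stacked design matrices can be assembled in a few lines. The sketch below uses the truncated-line basis *z _{k}*(*x*) = (*x* − *κ _{k}*)_{+} with knots at equally spaced sample quantiles; both the basis choice and the helper name `design_matrices` are our own illustrative assumptions.

```python
import numpy as np

def design_matrices(x, K=25):
    """X and Z of (4) for one predictor, truncated-line spline basis."""
    xv = x.ravel()                                  # stack all m*n observations
    X = np.column_stack([np.ones_like(xv), xv])     # [1, x] columns
    # K interior knots at equally spaced sample quantiles of the x_ij
    kappa = np.quantile(xv, np.linspace(0, 1, K + 2)[1:-1])
    Z = np.maximum(xv[:, None] - kappa[None, :], 0.0)
    return X, Z

x = np.random.default_rng(2).uniform(13, 23, size=(100, 5))
X, Z = design_matrices(x)   # shapes (500, 2) and (500, 25)
```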

The random vectors on the right-hand side of (4) have mean zero and covariance matrix:

$$\text{Cov}\left[\begin{array}{c}\mathit{u}\\ {\mathbf{\epsilon}}_{1}\\ {\mathbf{\epsilon}}_{2}\\ \vdots \\ {\mathbf{\epsilon}}_{m}\end{array}\right]=\left[\begin{array}{ccccc}{\sigma}^{2}\mathit{I}& \mathbf{0}& \mathbf{0}& \cdots & \mathbf{0}\\ \mathbf{0}& \mathit{\sum}& \mathbf{0}& \cdots & \mathbf{0}\\ \mathbf{0}& \mathbf{0}& \mathit{\sum}& \cdots & \mathbf{0}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}& \mathbf{0}& \mathbf{0}& \cdots & \mathit{\sum}\end{array}\right]=\left[\begin{array}{cc}{\sigma}^{2}\mathit{I}& \mathbf{0}\\ \mathbf{0}& {\mathit{I}}_{m}\otimes \mathit{\sum}\end{array}\right],$$

where ⊗ denotes the Kronecker product. For fixed values of *σ*^{2} and **Σ**, we can call upon best linear unbiased prediction (e.g. Robinson, 1991) to estimate **β** and ** u** and, hence, the regression function *f*. For likelihood-based and Bayesian inference we also adopt the Gaussianity assumption

$$\left[\begin{array}{c}\mathit{u}\\ \mathbf{\epsilon}\end{array}\right]\sim N\left(\left[\begin{array}{c}\mathbf{0}\\ \mathbf{0}\end{array}\right],\left[\begin{array}{cc}{\sigma}^{2}\mathit{I}& \mathbf{0}\\ \mathbf{0}& {\mathit{I}}_{m}\otimes \mathit{\sum}\end{array}\right]\right).$$

(5)

From now on, we assume the *Gaussian* linear mixed model given by (4) and (5). Sections 3 and 4 describe two approaches to fitting and inference. Before getting to that, we describe some semiparametric extensions of (1).

Suppose now that, corresponding to each *y _{ij}*, several predictor variables are available. There are a number of semiparametric regression extensions of (1) that could be considered. In this section, we focus on the additive model extension. To keep the notation simple, we restrict discussion to the situation where there are two continuous predictors, with the marginal model

$$E({y}_{ij})={\beta}_{0}+{f}_{1}({x}_{1ij})+{f}_{2}({x}_{2ij}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{Cov}\{{\mathit{y}}_{i}\mid {f}_{1}({\mathit{x}}_{1i}),{f}_{2}({\mathit{x}}_{2i})\}=\mathit{\sum},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}1\le i\le m,\phantom{\rule{0.38889em}{0ex}}1\le j\le n$$

(6)

where *f*_{1} and *f*_{2} are smooth functions. If each of these is modelled as a penalized spline:

$${f}_{1}({x}_{1})={\beta}_{11}{x}_{1}+\sum _{k=1}^{{K}_{1}}{u}_{1k}{z}_{1k}({x}_{1})\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{f}_{2}({x}_{2})={\beta}_{21}{x}_{2}+\sum _{k=1}^{{K}_{2}}{u}_{2k}{z}_{2k}({x}_{2})$$

(7)

with the coefficients independently distributed according to

$${u}_{1k}\text{i}.\text{i}.\text{d}.N(0,{\sigma}_{1}^{2})\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{u}_{2k}\text{i}.\text{i}.\text{d}.N(0,{\sigma}_{2}^{2})$$

then a Gaussian linear mixed model

$$\mathit{y}=\mathit{X}\mathbf{\beta}+\mathit{Zu}+\mathbf{\epsilon}$$

arises. The differences between this model and that of Section 2 are that the design matrices are now

$$\mathit{X}=\left[\begin{array}{ccc}\mathbf{1}& {\mathit{x}}_{11}& {\mathit{x}}_{21}\\ \vdots & \vdots & \vdots \\ \mathbf{1}& {\mathit{x}}_{1m}& {\mathit{x}}_{2m}\end{array}\right],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{Z}=\left[\begin{array}{cccccc}{z}_{11}({\mathit{x}}_{11})& \cdots & {z}_{1{K}_{1}}({\mathit{x}}_{11})& {z}_{21}({\mathit{x}}_{21})& \cdots & {z}_{2{K}_{2}}({\mathit{x}}_{21})\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ {z}_{11}({\mathit{x}}_{1m})& \cdots & {z}_{1{K}_{1}}({\mathit{x}}_{1m})& {z}_{21}({\mathit{x}}_{2m})& \cdots & {z}_{2{K}_{2}}({\mathit{x}}_{2m})\end{array}\right]$$

where *x*_{1}* _{i}* is the *n* × 1 vector containing the *x*_{1}* _{ij}*, 1 ≤ *j* ≤ *n*, for the *i*th subject, and similarly for *x*_{2}* _{i}*. The coefficient vectors are

$$\mathbf{\beta}=\left[\begin{array}{c}{\beta}_{0}\\ {\beta}_{11}\\ {\beta}_{21}\end{array}\right]\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{u}=\left[\begin{array}{c}{\mathit{u}}_{1}\\ {\mathit{u}}_{2}\end{array}\right]$$

where *u*_{1} is the *K*_{1} × 1 vector containing the *u*_{1}* _{k}* and *u*_{2} is the *K*_{2} × 1 vector containing the *u*_{2}* _{k}*. The random effects and errors are distributed as

$$\left[\begin{array}{c}{\mathit{u}}_{1}\\ {\mathit{u}}_{2}\\ \mathbf{\epsilon}\end{array}\right]\sim N\left(\left[\begin{array}{c}\mathbf{0}\\ \mathbf{0}\\ \mathbf{0}\end{array}\right],\left[\begin{array}{ccc}{\sigma}_{1}^{2}\mathit{I}& \mathbf{0}& \mathbf{0}\\ \mathbf{0}& {\sigma}_{2}^{2}\mathit{I}& \mathbf{0}\\ \mathbf{0}& \mathbf{0}& {\mathit{I}}_{m}\otimes \mathit{\sum}\end{array}\right]\right).$$

(8)

Fitting via maximum likelihood and best prediction is analogous to that described in Section 3. The main difference is that there are two variance parameters
${\sigma}_{1}^{2}$ and
${\sigma}_{2}^{2}$ (and in extensions to additive models with *d* smooth functions there will be *d* such variance components) as well as the error covariance matrix **Σ**. Maximum likelihood fitting, described in Section 3, requires an expression for ** V** ≡ Cov(** y**), which in the present model is

$$\mathit{V}=\mathit{V}({\sigma}_{1}^{2},{\sigma}_{2}^{2},\mathit{\sum})={\sigma}_{1}^{2}{\mathit{Z}}_{[1]}{\mathit{Z}}_{[1]}^{T}+{\sigma}_{2}^{2}{\mathit{Z}}_{[2]}{\mathit{Z}}_{[2]}^{T}+{\mathit{I}}_{m}\otimes \mathit{\sum}$$

where *Z*_{[1]} and *Z*_{[2]} correspond to the column-wise partitioning of ** Z** according to the basis functions for *f*_{1} and *f*_{2}, respectively.
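Numerically, this covariance matrix is a sum of two low-rank terms and a Kronecker product. A small sketch, with arbitrary dimensions and a randomly generated positive definite **Σ** purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, K1, K2 = 4, 3, 6, 5

Z1 = rng.normal(size=(m * n, K1))   # spline basis evaluations for f1
Z2 = rng.normal(size=(m * n, K2))   # spline basis evaluations for f2
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)     # arbitrary s.p.d. error covariance
s1, s2 = 0.7, 1.3                   # values of sigma_1^2 and sigma_2^2

# V = sigma_1^2 Z[1] Z[1]^T + sigma_2^2 Z[2] Z[2]^T + I_m (x) Sigma
V = s1 * Z1 @ Z1.T + s2 * Z2 @ Z2.T + np.kron(np.eye(m), Sigma)
```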

Before closing this section, we briefly mention that the model

$$E({y}_{ij})={\beta}_{0}+{\beta}_{1}{x}_{1ij}+{f}_{2}({x}_{2ij}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{Cov}\{{\mathit{y}}_{i}\mid {\mathit{x}}_{1i},{f}_{2}({\mathit{x}}_{2i})\}=\mathit{\sum},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}1\le i\le m,\phantom{\rule{0.38889em}{0ex}}1\le j\le n$$

(9)

is a simpler type of additive model than (6), since it only has one smooth function component. This is a *bona fide* semiparametric regression model since the right-hand side has the effect of the *x*_{1}* _{ij}*s modelled parametrically and the effect of the *x*_{2}* _{ij}*s modelled nonparametrically.

An important extension of (6) is

$$E({y}_{ij})={\beta}_{0}+{f}_{{x}_{3ij}}({x}_{1ij})+{f}_{2}({x}_{2ij}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{Cov}\{{\mathit{y}}_{i}\mid {f}_{{x}_{3ij}}({\mathit{x}}_{1i}),{f}_{2}({\mathit{x}}_{2i})\}=\mathit{\sum},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}1\le i\le m,\phantom{\rule{0.38889em}{0ex}}1\le j\le n$$

(10)

where the *x*_{3}* _{ij}* correspond to measurements on a categorical variable. Model (10) allows the smooth effect of *x*_{1} to differ between the categories of *x*_{3}; such factor-by-curve interactions can be incorporated simply into additive models (Coull et al., 2001).

Interestingly, the Gaussian linear mixed model for fitting (10) takes the same form as that for fitting the additive model (6). In particular, the covariance matrix of the random effects and error vectors takes the same form as that given in (8), but with two additional variance parameters. The only other difference is the form of the design matrices ** X** and ** Z**.

Another type of multiple-predictor semiparametric regression model is that involving varying coefficients. The simplest marginal longitudinal varying coefficient model is

$$E({y}_{ij})={f}_{0}({s}_{ij})+{f}_{1}({s}_{ij}){x}_{ij},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{Cov}\{{\mathit{y}}_{i}\mid {f}_{0}({\mathit{s}}_{i}),{f}_{1}({\mathit{s}}_{i})\}=\mathit{\sum},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}1\le i\le m,\phantom{\rule{0.38889em}{0ex}}1\le j\le n$$

(11)

where *s _{ij}* are the longitudinal measurements on a continuous predictor variable *s*, and *f*_{0} and *f*_{1} are smooth coefficient functions.

As with the interaction extension, the Gaussian linear mixed model for fitting the varying coefficient model (11) takes the same form as that for fitting the additive model (6). In particular, the covariance matrix of the random effects and error vectors is exactly the same as that given in (8). Only the design matrices require modification. Section 12.4 of Ruppert et al. (2003) provides the relevant details.

Each of the marginal longitudinal semiparametric regression models in the previous section, and their extensions to *d* smooth functions, can be handled using the Gaussian linear mixed model

$$\mathit{y}\mid \mathit{u}\sim N(\mathit{X}\mathbf{\beta}+\mathit{Zu},{\mathit{I}}_{m}\otimes \mathit{\sum}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{u}\sim N(\mathbf{0},\underset{1\le \ell \le d}{\text{blockdiag}}({\sigma}_{\ell}^{2}{\mathit{I}}_{{K}_{\ell}})).$$

(12)

Here, *K*_{ℓ} corresponds to the number of spline basis functions used in the ℓth smooth function estimate. Let
${\mathbf{\sigma}}^{2}=({\sigma}_{1}^{2},\dots ,{\sigma}_{d}^{2})$ be the vector of variance parameters. Then the log-likelihood of ** y** under (12) is

$$\ell (\mathbf{\beta},{\mathbf{\sigma}}^{2},\mathit{\sum})=-\frac{1}{2}\{nlog(2\pi )+log\mid \mathit{V}\mid +\phantom{\rule{0.16667em}{0ex}}{(\mathit{y}-\mathit{X}\mathbf{\beta})}^{T}{\mathit{V}}^{-1}(\mathit{y}-\mathit{X}\mathbf{\beta})\}$$

(13)

where

$$\mathit{V}=\mathit{V}({\mathbf{\sigma}}^{2},\mathit{\sum})\equiv \text{Cov}(\mathit{y})=\sum _{\ell =1}^{d}{\sigma}_{\ell}^{2}{\mathit{Z}}_{[\ell ]}{\mathit{Z}}_{[\ell ]}^{T}+{\mathit{I}}_{m}\otimes \mathit{\sum}$$

and [*Z*_{[1]} · · · *Z*_{[}_{d}_{]}] is the partition of ** Z** corresponding to the basis functions for each smooth function estimate. For any fixed values of (**σ**^{2}, **Σ**), the maximizer of (13) over **β** is

$$\stackrel{\sim}{\mathbf{\beta}}({\mathbf{\sigma}}^{2},\mathit{\sum})={({\mathit{X}}^{T}{\mathit{V}}^{-1}\mathit{X})}^{-1}{\mathit{X}}^{T}{\mathit{V}}^{-1}\mathit{y}.$$

(14)

On substitution into (13), we obtain the *profile log-likelihood* for (**σ**^{2}, **Σ**) as:

$${\ell}_{P}({\mathbf{\sigma}}^{2},\mathit{\sum})=-\frac{1}{2}\left[log\mid \mathit{V}\mid +{\mathit{y}}^{T}{\mathit{V}}^{-1}\{\mathit{I}-\mathit{X}{({\mathit{X}}^{T}{\mathit{V}}^{-1}\mathit{X})}^{-1}{\mathit{X}}^{T}{\mathit{V}}^{-1}\}\mathit{y}\right]-\frac{n}{2}log(2\pi ).$$

(15)

However, the *restricted log-likelihood* (Patterson and Thompson, 1971)

$${\ell}_{R}({\mathbf{\sigma}}^{2},\mathit{\sum})={\ell}_{P}({\mathbf{\sigma}}^{2},\mathit{\sum})-\frac{1}{2}log\mid {\mathit{X}}^{T}{\mathit{V}}^{-1}\mathit{X}\mid $$

(16)

is usually preferred, since it accounts for estimation of the fixed effects vector **β**. The maximizers of ${\ell}_{R}({\mathbf{\sigma}}^{2},\mathit{\sum})$ are the restricted maximum likelihood (REML) estimates, which we denote by ${\widehat{\mathbf{\sigma}}}^{2}$ and $\widehat{\mathit{\sum}}$.

Likelihood-based estimation of the model parameters **β**, **σ**^{2} and **Σ** thus involves:

1. Obtain the REML estimates ${\widehat{\mathbf{\sigma}}}^{2}$ and $\widehat{\mathit{\sum}}$ by maximizing ${\ell}_{R}({\mathbf{\sigma}}^{2},\mathit{\sum})$.
2. Obtain the maximum likelihood estimate $\widehat{\mathbf{\beta}}=\stackrel{\sim}{\mathbf{\beta}}({\widehat{\mathbf{\sigma}}}^{2},\widehat{\mathit{\sum}})$ according to (14).

Step 1 is by far the more challenging, since it involves multivariate numerical optimization.
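Given a routine that evaluates ** V**, expressions (14)–(16) translate almost line-by-line into code. The sketch below treats *n* in the −(*n*/2)log(2*π*) term as the total length of ** y** (our reading of the notation) and uses explicit inverses for clarity rather than numerical efficiency:

```python
import numpy as np

def profile_and_restricted_loglik(y, X, V):
    """Evaluate ell_P of (15) and ell_R of (16) at a given V, and
    return the GLS estimate of beta from (14) as a by-product."""
    N = len(y)
    Vinv = np.linalg.inv(V)
    XtVX = X.T @ Vinv @ X
    beta = np.linalg.solve(XtVX, X.T @ Vinv @ y)     # beta-tilde of (14)
    r = y - X @ beta                                 # GLS residuals
    _, logdetV = np.linalg.slogdet(V)
    ell_P = -0.5 * (logdetV + r @ Vinv @ r) - 0.5 * N * np.log(2 * np.pi)
    _, logdetXtVX = np.linalg.slogdet(XtVX)
    ell_R = ell_P - 0.5 * logdetXtVX                 # REML criterion (16)
    return ell_P, ell_R, beta

# Tiny check: V = I and an intercept-only X reduce to ordinary least squares
y = np.array([1.0, 2.0, 3.0])
X = np.ones((3, 1))
ell_P, ell_R, beta = profile_and_restricted_loglik(y, X, np.eye(3))
```

An optimizer over the variance parameters would call this with `V` rebuilt at each iteration; in practice one works with log-variances and a factorization of **Σ** so that the iterates remain valid covariance parameters.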

Lastly, there is the problem of estimating the spline coefficient vector ** u**. Since ** u** is random in the mixed model formulation, best prediction rather than estimation is appropriate. Standard linear mixed model results give

$$\stackrel{\sim}{\mathit{u}}({\mathbf{\sigma}}^{2},\mathit{\sum})\equiv E(\mathit{u}\mid \mathit{y})={\mathit{G}}_{{\mathbf{\sigma}}^{2}}{\mathit{Z}}^{T}\mathit{V}{({\sigma}^{2},\mathit{\sum})}^{-1}\{\mathit{y}-\mathit{X}\stackrel{\sim}{\mathbf{\beta}}({\sigma}^{2},\mathit{\sum})\}$$

where
${\mathit{G}}_{{\mathbf{\sigma}}^{2}}={\text{blockdiag}}_{1\le \ell \le d}({\sigma}_{\ell}^{2}{\mathit{I}}_{{K}_{\ell}})$. An appropriate estimator for ** u** in this context is the empirical best predictor

$$\widehat{\mathit{u}}={\mathit{G}}_{{\widehat{\mathbf{\sigma}}}^{2}}{\mathit{Z}}^{T}\mathit{V}{({\widehat{\mathbf{\sigma}}}^{2},\widehat{\mathit{\sum}})}^{-1}\{\mathit{y}-\mathit{X}\stackrel{\sim}{\mathbf{\beta}}({\widehat{\mathbf{\sigma}}}^{2},\widehat{\mathit{\sum}})\}.$$

It is straightforward to construct estimates of the regression function *f* at arbitrary locations *x* using $\widehat{\mathbf{\beta}}=\stackrel{\sim}{\mathbf{\beta}}({\widehat{\mathbf{\sigma}}}^{2},\widehat{\mathit{\sum}})$ and ** û**.
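The plug-in steps combine into a short routine; the following is a naive transcription of the formulas above (explicit inverses, our own argument names), not an optimized implementation:

```python
import numpy as np
from scipy.linalg import block_diag

def eblup(y, X, Z, sigma2_hat, K, Sigma_hat, m):
    """Plug-in GLS estimate of beta and empirical best predictor of u."""
    # G = blockdiag of sigma_l^2 I_{K_l}, as in (12)
    G = block_diag(*[s * np.eye(k) for s, k in zip(sigma2_hat, K)])
    V = Z @ G @ Z.T + np.kron(np.eye(m), Sigma_hat)
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    u = G @ Z.T @ Vinv @ (y - X @ beta)
    return beta, u

# Small synthetic example purely to exercise the shapes
rng = np.random.default_rng(5)
m, n = 3, 2
X = np.column_stack([np.ones(m * n), rng.normal(size=m * n)])
Z = rng.normal(size=(m * n, 4))
y = rng.normal(size=m * n)
beta_hat, u_hat = eblup(y, X, Z, [1.0], [4], 0.5 * np.eye(n), m)
```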

Despite (12) being a relatively simple linear mixed model, we have not yet been successful in fitting it with standard mixed model software such as
`lme()` (Pinheiro et al., 2009) in the R computing language (R Development Core Team, 2009). This led us to also consider the Bayesian inference version and implementation via Gibbs sampling, as the next section describes. Note, however, that the approach described in the current section can be implemented via direct programming using either Newton–Raphson or Expectation–Maximization iteration.

An alternative inference strategy, which permits more direct implementation in standard software, involves working with a hierarchical Bayesian version of the Gaussian linear mixed model (12). This entails treating **β**, **σ**^{2} and **Σ** as random and setting prior distributions for each of them. The most convenient choices, because of conjugacy properties, are priors of the form:

$$\mathbf{\beta}\sim N(\mathbf{0},\mathit{F}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\sigma}_{\ell}^{2}\sim \text{Inverse}-\text{Gamma}({A}_{\ell},{B}_{\ell})\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{\sum}\sim \text{Inverse}-\text{Wishart}(a,\mathit{B})$$

(17)

where *A*_{ℓ}, *B*_{ℓ}, 1 ≤ ℓ ≤ *d*, and *a* are positive constants, and ** F** and ** B** are symmetric positive definite matrices. The notation *σ*^{2} ~ Inverse-Gamma(*A*, *B*) means that the prior density of *σ*^{2} is

$$[{\sigma}^{2}]=\frac{{B}^{A}}{\mathrm{\Gamma}(A)}{({\sigma}^{2})}^{-A-1}{\text{e}}^{-B/{\sigma}^{2}},\phantom{\rule{0.38889em}{0ex}}{\sigma}^{2},A,B>0.$$
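As a quick check on this parameterization, the density above matches SciPy's `invgamma` distribution with shape *A* and scale *B*:

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import invgamma

A, B, x = 0.01, 0.01, 1.7   # A, B as in (19); x is an arbitrary point > 0
direct = B**A / Gamma(A) * x**(-A - 1) * np.exp(-B / x)
assert np.isclose(direct, invgamma.pdf(x, A, scale=B))
```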

The notation **Σ** ~ Inverse-Wishart(*a*, ** B**) means that the prior density of **Σ** is

$$[\mathit{\sum}]={C}_{n,a}^{-1}{\mid \mathit{B}\mid}^{a/2}{\mid \mathit{\sum}\mid}^{-(a+n+1)/2}exp\left\{-\frac{1}{2}\text{tr}(\mathit{B}{\mathrm{\sum}}^{-1})\right\},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}a>0,\mathit{\sum},\mathit{B}\phantom{\rule{0.16667em}{0ex}}\text{both}\phantom{\rule{0.16667em}{0ex}}\text{positive}\phantom{\rule{0.16667em}{0ex}}\text{definite}$$

where ${C}_{n,a}\equiv {2}^{an/2}{\pi}^{n(n-1)/4}{\prod}_{i=1}^{n}\mathrm{\Gamma}\left({\scriptstyle \frac{a+1-i}{2}}\right)$.

Bayesian inference is based on the posterior density functions:

$$[\mathbf{\beta}\mid \mathit{y}],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}[\mathit{u}\mid \mathit{y}],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}[{\sigma}^{2}\mid \mathit{y}]\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}[\mathit{\sum}\mid \mathit{y}].$$

(18)

The probability calculus required to obtain each of these is unwieldy and, in practice, either analytic or Monte Carlo approximations need to be called upon. As shown in Section 4.1, the Markov chain Monte Carlo method *Gibbs sampling* is straightforward to implement for the Bayesian version of (12) and the priors (17) and, upon convergence, yields samples of arbitrary size from the posterior densities (18). The software package BUGS (Lunn et al., 2000) facilitates this approach to approximate Bayesian inference and illustrative code is given in Section 4.2.

A final, albeit important, aspect of this approach to fitting and inference is choice of the hyperparameters ** F**, *A*_{ℓ}, *B*_{ℓ}, *a* and ** B**. For **β** and the ${\sigma}_{\ell}^{2}$, standard non-informative choices are

$$\mathit{F}={10}^{8}\mathit{I}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{A}_{\ell}={B}_{\ell}=\mathrm{0.01.}$$

(19)

Reasonable choices for the hyperparameters associated with **Σ** are

$$a=n\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{B}=0.01{\mathit{I}}_{n}.$$

(20)

We recommend that sensitivity checks be done on these hyperparameters. An example of such a check, for a different Bayesian penalized spline model, is Figure 3 of Zhao et al. (2006). Section 3.2 of Crainiceanu et al. (2007) has detailed discussion on this issue, including the consequences of failing to pre-standardize the data.

The hierarchical Bayesian model specified by (12) and (17) can be fitted using a Gibbs sampling scheme with draws from standard distributions. We give the details here.

First, we note (e.g. Robert and Casella, 2004, p. 371) that Gibbs sampling requires successive draws from the full conditional distributions for each member of a particular partition of the parameters in the model. For the present model, we use the partition:

$$\left[\begin{array}{c}\mathbf{\beta}\\ \mathit{u}\end{array}\right],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\sigma}_{1}^{2},\dots ,{\sigma}_{d}^{2},\mathit{\sum}.$$

As an example, the full conditional distribution for ${\sigma}_{1}^{2}$ is

$${\sigma}_{1}^{2}\mid \mathit{y},\left[\begin{array}{c}\mathbf{\beta}\\ \mathit{u}\end{array}\right],\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\sigma}_{2}^{2},\dots ,{\sigma}_{d}^{2},\mathit{\sum}.$$

We denote this by ‘${\sigma}_{1}^{2}\mid \text{rest}$’ for short. Let ** C** ≡ [** X** ** Z**] denote the combined design matrix. The required full conditional distributions are then

$$\begin{array}{l}\left[\begin{array}{c}\mathbf{\beta}\\ \mathit{u}\end{array}\right]\mid \text{rest}\sim N({({\mathit{C}}^{T}({\mathit{I}}_{m}\otimes {\mathit{\sum}}^{-1})\mathit{C}+\text{blockdiag}({\mathit{F}}^{-1},{\mathit{G}}_{{\mathbf{\sigma}}^{2}}^{-1}))}^{-1}{\mathit{C}}^{T}({\mathit{I}}_{m}\otimes {\mathit{\sum}}^{-1})\mathit{y},\\ \phantom{\rule{6em}{0ex}}{({\mathit{C}}^{T}({\mathit{I}}_{m}\otimes {\mathit{\sum}}^{-1})\mathit{C}+\text{blockdiag}({\mathit{F}}^{-1},{\mathit{G}}_{{\mathbf{\sigma}}^{2}}^{-1}))}^{-1}),\\ {\sigma}_{\ell}^{2}\mid \text{rest}\sim \text{Inverse}-\text{Gamma}\left({A}_{\ell}+\frac{1}{2}{K}_{\ell},{B}_{\ell}+\frac{1}{2}{\left|\right|{\mathit{u}}_{\ell}\left|\right|}^{2}\right),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}1\le \ell \le d\end{array}$$

and

$$\mathit{\sum}\mid \text{rest}\sim \text{Inverse}-\text{Wishart}\left(a+m,\mathit{B}+\sum _{i=1}^{m}({\mathit{y}}_{i}-{\mathit{X}}_{i}\mathbf{\beta}-{\mathit{Z}}_{i}\mathit{u}){({\mathit{y}}_{i}-{\mathit{X}}_{i}\mathbf{\beta}-{\mathit{Z}}_{i}\mathit{u})}^{T}\right),$$

where ** X**_{i} and ** Z**_{i} denote the rows of ** X** and ** Z** corresponding to the *i*th subject.

Provided that *a* is an integer, the Inverse-Wishart draws for **Σ** can be achieved by setting

$$\mathit{\sum}={\left(\sum _{i=1}^{a+m}{\mathit{v}}_{i}{\mathit{v}}_{i}^{T}\right)}^{-1}$$

where the *v _{i}* are independent *N*(**0**, ** S**^{−1}) random vectors, with ** S** denoting the scale matrix (the second argument) of the Inverse-Wishart full conditional for **Σ**.
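This construction takes only a few lines; the helper below is our own sketch, with `S` standing for the Inverse-Wishart scale matrix:

```python
import numpy as np

def draw_inverse_wishart(df, S, rng):
    """One draw Sigma ~ Inverse-Wishart(df, S), for integer df >= dim(S),
    via the outer-product construction: v_i i.i.d. N(0, S^{-1})."""
    dim = S.shape[0]
    v = rng.multivariate_normal(np.zeros(dim), np.linalg.inv(S), size=df)
    return np.linalg.inv(v.T @ v)   # (sum_i v_i v_i^T)^{-1}

rng = np.random.default_rng(4)
Sigma = draw_inverse_wishart(df=12, S=0.01 * np.eye(5), rng=rng)
```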

Interestingly, the fact that **Σ** is unstructured means that Gibbs sampling is exact. This is not the case if **Σ** is structured (e.g. autoregressive) and more complicated Markov chain Monte Carlo schemes are then required.

The BUGS language supports implementation of our Bayesian marginal longitudinal semiparametric regression models. It is recommended that the spline basis functions be set up outside of BUGS. We do this in R and then call BUGS using the BRugs package (Ligges et al., 2009). We pass the regression data to
`BUGS` using matrices. For example, the variable
`yMat` is an *m*×*n* matrix with (*i*, *j*) entry containing *y _{ij}*. Our BUGS code for fitting the marginal longitudinal nonparametric regression model is:

```
model
{
   for (i in 1:m)
   {
      for (j in 1:n)
      {
         mu[i,j] <- beta0 + beta1*xMat[i,j] + inprod(u[],Z[(i-1)*n+j,])
      }
      yMat[i,1:n] ~ dmnorm(mu[i,],Omega[1:n,1:n])
   }
   for (k in 1:K)
   {
      u[k] ~ dnorm(0,tau)
   }
   beta0 ~ dnorm(0,1.0E-8) ; beta1 ~ dnorm(0,1.0E-8)
   tau ~ dgamma(0.01,0.01) ; Omega[1:n,1:n] ~ dwish(R[,],n)
   for (i in 1:n)
   {
      for (j in 1:n)
      {
         R[i,j] <- 0.01*equals(i,j)
      }
   }
   sigma <- 1/sqrt(tau) ; Sigma[1:n,1:n] <- inverse(Omega[,])
}
```

Note that BUGS uses precision matrices rather than covariance matrices in its multivariate normal distribution specification. Hence, the above code uses the variable
`Omega`, corresponding to *Ω* = **Σ**^{− 1}. Similarly, the precision parameter
`tau` corresponds to *τ* = 1/*σ*^{2} where *σ*^{2} is the spline penalization variance component.

We tested out BUGS fitting of the four types of models presented in Section 2 on several sets of simulated data, as well as some empirical data. We now present some of these results.

We fitted the penalized spline model (12) to the data of Fig. 1, for which the *y _{ij}* were generated according to (2).

Fig. 2 summarizes the BUGS output for the diagonal entries of **Σ**. The chains mix quite well with no significant autocorrelation. In addition, the true values of *Σ _{ii}* are captured by the 95% credible intervals in four out of the five cases.

Summary of MCMC-based inference for the diagonal entries of **Σ** in the fitted marginal longitudinal nonparametric regression model. The columns are: parameter, trace plot of MCMC sample, plot of sample against 1-lagged sample, and sample autocorrelation.

The results for the off-diagonal entries of **Σ** are summarized in Fig. 3. All ten of the true values of *Σ _{ij}* are captured by the 95% credible intervals.

Summary of MCMC-based inference for the off-diagonal entries of **Σ** in the fitted marginal longitudinal nonparametric regression model. The columns are: parameter, trace plot of MCMC sample, plot of sample against 1-lagged sample, and sample autocorrelation.

The Bayesian penalized spline estimate of *f* is shown in Fig. 4, with and without the data. The thick solid curves correspond to the posterior means of (3) over a grid of *x*s. The dashed curves are corresponding 95% credible sets. The regression function from which the data were generated is shown for comparison. The pointwise posteriors are seen to cover the true *f* quite well.

Left panel: estimated regression function *f*, together with the longitudinal data on which it is based. Dashed curves correspond to pointwise 95% credible sets. The true *f* is shown as a thin grey curve. Right panel: the same estimate displayed without the data.

The good results presented in this section are typical of the performances we observed over several runs, as well as different choices for *f* and **Σ**. An interesting future project would be a large scale simulation study that compares this approach with existing methods.

We fitted the Bayesian additive/interaction model (10) to data from a nutritional epidemiology study (source: Kipnis et al., 2003). The sample sizes are *m* = 294 and *n* = 2 and each penalized spline fit involved *K* = 25 basis functions. The BUGS program, with 5000 burn-in samples and 5000 retained samples (thinned by a factor of 5), took 52 min to run on the third author’s laptop computer. In the notation of (10), the variables are

- *y* = logarithm of intake of protein as measured by the biomarker urinary nitrogen,
- *x*_{1} = body mass index,
- *x*_{2} = logarithm of intake of protein as measured by a 24-hour recall instrument,
- *x*_{3} = gender.

Fig. 5 shows the estimates of *f*_{male}, *f*_{female} and *f*_{2} and corresponding 95% pointwise credible sets.

Fig. 6 summarizes the MCMC-based inference for the entries of **Σ**. Excellent chain mixing is observed. The within-subject covariance is significantly positive, as indicated by the 95% credible set for **Σ**_{12}.

It is somewhat of a quirk that the mixed model-based penalized spline approach to marginal longitudinal nonparametric regression has not been explored in depth until now. Nevertheless, as we have illustrated in the previous section, it is a viable approach that is readily implemented in standard software. Another advantage of this approach is that complications such as missingness can be handled within the same likelihood-based or Bayesian frameworks. It would be interesting to see if the asymptotic efficiency results established for other approaches (e.g. Wang et al., 2005) also apply here. Another interesting line of inquiry is the use of penalized splines in flexible marginal modelling of longitudinal data with binary and count responses. Heagerty (1999), for example, describes parametric models of this type. We believe that such models are amenable to a penalized spline approach, with implementation in BUGS, but we have not explored this deeply.

Wand’s research was partially supported by Australian Research Council Discovery Project DP0877055. Carroll’s research was supported by a grant from the US National Cancer Institute (CA57030) and by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology, Saudi Arabia.

- Brumback BA, Ruppert D, Wand MP. Comment on paper by Shively, Kohn & Wood. Journal of the American Statistical Association. 1999;94:794–797.
- Carroll RJ, Hall P, Apanasovich TV, Lin X. Histospline method in nonparametric regression models with application to clustered/longitudinal data. Statistica Sinica. 2004;14:649–674.
- Carroll RJ, Maity A, Mammen E, Yu K. Nonparametric additive regression for repeatedly measured data. Biometrika. 2009a;96:383–398.
- Carroll RJ, Maity A, Mammen E, Yu K. Efficient semiparametric marginal estimation for the partially linear additive model for longitudinal/clustered data. Statistics in Biosciences. 2009b;1:10–31. [PMC free article] [PubMed]
- Chen K, Jin Z. Local polynomial regression analysis of clustered data. Biometrika. 2005;92:59–74.
- Coull BA, Ruppert D, Wand MP. Simple incorporation of interactions into additive models. Biometrics. 2001;57:539–545. [PubMed]
- Crainiceanu CM, Ruppert D, Carroll RJ, Joshi A, Goodner B. Spatially adaptive Bayesian penalized splines with heteroscedastic errors. Journal of Computational and Graphical Statistics. 2007;16:265–288.
- Fan J, Huang T, Li R. Analysis of longitudinal data with semiparametric estimation of covariance function. Journal of the American Statistical Association. 2007;102:632–641.
- Fan J, Wu Y. Semiparametric estimation of covariance matrixes for longitudinal data. Journal of the American Statistical Association. 2008;103:1520–1533.
- Heagerty PJ. Marginally specified logistic-normal models for longitudinal binary data. Biometrics. 1999;55:688–698.
- Hu ZH, Wang N, Carroll RJ. Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data. Biometrika. 2004;91:251–262.
- Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano R, Bingham S, Schoeller DA, Schatzkin A, Carroll RJ. The structure of dietary measurement error: results of the OPEN biomarker study. American Journal of Epidemiology. 2003;158:14–21.
- Li Y, Ruppert D. On the asymptotics of penalized splines. Biometrika. 2008;95:415–436.
- Ligges U, Thomas A, Spiegelhalter D, Best N, Lunn D, Rice K, Sturtz S. BRugs 0.5-3: Analysis of graphical models using MCMC techniques. R package. 2009. http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.10.
- Lin X, Carroll RJ. Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association. 2000;95:520–534.
- Lin X, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association. 2001;96:1045–1056.
- Lin X, Carroll RJ. Semiparametric estimation in general repeated measures problems. Journal of the Royal Statistical Society, Series B. 2006;68:68–88.
- Lin X, Wang N, Welsh AH, Carroll RJ. Equivalent kernels of smoothing splines in nonparametric regression for clustered/longitudinal data. Biometrika. 2004;91:177–193.
- Linton OB, Mammen E, Lin X, Carroll RJ. Accounting for correlation in marginal longitudinal nonparametric regression. In: Lin DY, Heagerty PJ, editors. Proceedings of the Second Seattle Symposium in Biostatistics: Analysis of Correlated Data. Springer; New York: 2003. pp. 23–33.
- Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS–a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing. 2000;10:325–337.
- Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58:545–554.
- Pinheiro J, Bates D, DebRoy S, Sarkar D, the R Core Team. nlme 3.1: linear and nonlinear mixed effects models. R package. 2009. www.R-project.org.
- R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2009. www.R-project.org.
- Robert CP, Casella G. Monte Carlo Statistical Methods. 2. Springer-Verlag; New York: 2004.
- Robinson GK. That BLUP is a good thing: the estimation of random effects. Statistical Science. 1991;6:15–51.
- Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; New York: 2003.
- Ruppert D, Wand MP, Carroll RJ. Semiparametric regression during 2003–2007. Electronic Journal of Statistics. 2009;3:1193–1256.
- Sun Y, Zhang W, Tong H. Estimation of the covariance matrix of random effects in longitudinal studies. The Annals of Statistics. 2007;35:2795–2814.
- Wand MP, Ormerod JT. On semiparametric regression with O’Sullivan penalized splines. Australian and New Zealand Journal of Statistics. 2008;50:179–198.
- Wang N. Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika. 2003;90:43–52.
- Wang N, Carroll RJ, Lin X. Efficient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association. 2005;100:147–157.
- Welham SJ, Cullis BR, Kenward MG, Thompson R. A comparison of mixed model splines for curve fitting. Australian and New Zealand Journal of Statistics. 2007;49:1–23.
- Welsh AH, Lin X, Carroll RJ. Marginal longitudinal nonparametric regression: locality and efficiency of spline and kernel methods. Journal of the American Statistical Association. 2002;97:482–493.
- Zeger S, Diggle PJ. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics. 1994;50:689–699.
- Zhao Y, Staudenmayer J, Coull BA, Wand MP. General design Bayesian generalized linear mixed models. Statistical Science. 2006;21:35–51.