


Article sections

- Abstract
- 1 Introduction
- 2 Multilevel functional regression models
- 3 Posterior distribution of subject-specific scores
- 4 Model uncertainty
- 5 Bayesian inference
- 6 Simulation studies
- 7 The analysis of sleep data from the SHHS
- 8 Discussion
- References


J Am Stat Assoc. Author manuscript; available in PMC 2010 September 1.

Published in final edited form as:

J Am Stat Assoc. 2009 December 1; 104(488): 1550–1561.

doi: 10.1198/jasa.2009.tm08564. PMCID: PMC2897156

NIHMSID: NIHMS127980

Ciprian M. Crainiceanu is Associate Professor, Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205 (E-mail: ccrainic@jhsph.edu)


We introduce Generalized Multilevel Functional Linear Models (GMFLMs), a novel statistical framework for regression models where the exposure has a multilevel functional structure. We show that GMFLMs are, in fact, generalized linear mixed models (GLMMs). Thus, GMFLMs can be analyzed using the mixed effects inferential machinery and can be generalized within a well-researched statistical framework. We propose and compare two methods for inference: 1) a two-stage frequentist approach; and 2) a joint Bayesian analysis. Our methods are motivated by and applied to the Sleep Heart Health Study (SHHS), the largest community cohort study of sleep. However, our methods are general and easy to apply to a wide spectrum of emerging biological and medical data sets. Supplemental materials for this article are available online.

Recording and processing of functional data has become routine due to advancements in technology and computation. Many current studies contain observations of functional data on the same subject at multiple visits. For example, the Sleep Heart Health Study (SHHS) described in Section 7 contains, for each subject, quasi-continuous electroencephalogram (EEG) signals at two visits. In this paper we introduce a class of models and inferential methods for association studies between functional data observed at multiple levels/visits, such as sleep EEG or functional magnetic resonance imaging (fMRI), and continuous or discrete outcomes, such as systolic blood pressure (SBP) or Coronary Heart Disease (CHD). As most of these data sets are very large, feasibility of methods is a primary concern.

Functional regression is a generalization of regression to the case when outcomes or regressors or both are functions instead of scalars. Functional Regression Analysis is currently under intense methodological research [7, 23, 31, 34, 45] and is a particular case of Functional Data Analysis (FDA) [21, 24, 44, 42]. Two comprehensive monographs of FDA with applications to curve and image analysis are [33, 34]. There has been considerable recent effort to apply FDA to longitudinal data, e.g., [14, 37, 47]; see [30] for a thorough review. However, in all current FDA research, the term “longitudinal” represents single-level time series.

FDA was extended to multilevel functional data; see, for example, [2, 13, 19, 29, 28, 40]. However, all these papers have focused on models for functional data and not on functional regression. The multilevel functional principal component analysis (MFPCA) approach in [13] uses functional principal component bases to reduce data dimensionality and accelerate the associated algorithms, which is especially useful in moderate and large data sets. Thus, MFPCA provides an excellent platform for methodological extensions to the multilevel regression case.

We introduce Generalized Multilevel Functional Linear Models (GMFLMs), a novel statistical framework for regression models where exposure has a multilevel functional structure. This framework extends MFPCA in several ways. First, GMFLMs are designed for studies of association between outcome and functional exposures, whereas MFPCA is designed to describe functional exposure only; this extension is needed to answer most common scientific questions related to longitudinal collection of functional/image data. Second, we show that GMFLMs are the functional analog of measurement error regression models; in this context MFPCA is the functional analog of the exposure measurement error models [5]. Third, we show that all regression models with functional predictors contain two mixed effects sub-models: an outcome and an exposure model. Fourth, we propose and compare two methods for inference: 1) a two-stage frequentist approach; and 2) a joint Bayesian analysis. Using the analogy with measurement error models we provide insight into when using a two-stage method is a reasonable alternative to the joint analysis and when it is expected to fail. Our methods are an evolutionary development in a growth area of research. They build on and borrow strength from multiple methodological frameworks: functional regression, measurement error and multilevel modeling. Given the range of applications and methodological flexibility of our methods, we anticipate that they will become one of the standard approaches in functional regression.

The paper is organized as follows. Section 2 introduces the functional multilevel regression framework. Section 3 describes estimation methods based on best linear prediction. Section 4 presents our approach to model selection. Section 5 discusses the specific challenges of a Bayesian analysis of the joint mixed effects model corresponding to functional regression. Section 6 provides simulations. Section 7 describes an application to sleep EEG data from the SHHS. Section 8 summarizes our conclusions.

In this section we introduce the GMFLM framework and inferential methods.

The observed data for the *i*th subject in a GMFLM is [*Y _{i}*,

We assume that *W _{ij}*(

To ensure identifiability we assume that *X _{i}*(

We assume that the distribution of the outcome, *Y _{i}*, is in the exponential family with linear predictor

$${X}_{i}(t)=\sum _{k\ge 1}{\xi}_{ik}{\psi}_{k}^{(1)}(t),\phantom{\rule{0.38889em}{0ex}}{U}_{ij}(t)=\sum _{l\ge 1}{\zeta}_{\mathit{ijl}}{\psi}_{l}^{(2)}(t);\phantom{\rule{0.38889em}{0ex}}\beta (t)=\sum _{k\ge 1}{\beta}_{k}{\psi}_{k}^{(1)}(t).$$

(1)

This form of the model is impractical because it involves three infinite sums. Instead, we will approximate model (1) with a series of models where the number of predictors is truncated at *K* = *K _{I,J}* and

$$\{\begin{array}{l}{Y}_{i}\sim \text{EF}({\vartheta}_{i}^{K},\alpha );\hfill \\ {\vartheta}_{i}^{K}={\sum}_{k=1}^{K}{\xi}_{ik}{\beta}_{k}+{\mathit{Z}}_{i}^{t}\mathit{\gamma}.\hfill \end{array}$$

(2)

Other multilevel outcome models could be considered by including regression terms for the *U _{ij}*(
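Conditional on estimated scores, the truncated outcome model (2) is an ordinary generalized linear model. The sketch below, with hypothetical dimensions and coefficients (Bernoulli/logit, K = 2 functional scores plus one scalar covariate), simulates from the model and fits it by iteratively reweighted least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# Truncated outcome model (2): Bernoulli/logit with linear predictor
#   theta_i = sum_k xi_ik * beta_k + Z_i' gamma.
# Hypothetical dimensions: K = 2 functional scores, one scalar covariate.
n, K = 4000, 2
beta_true = np.array([1.0, -0.5])        # functional-score coefficients
gamma_true = np.array([0.8])             # scalar-covariate coefficient

xi = rng.normal(0.0, 1.0, size=(n, K))   # subject-specific scores
Z = rng.normal(0.0, 1.0, size=(n, 1))    # scalar covariate
X = np.hstack([xi, Z])                   # combined design matrix
eta = X @ np.concatenate([beta_true, gamma_true])
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

def logit_irls(X, Y, n_iter=25):
    """Fit logistic regression by iteratively reweighted least squares."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ coef)))
        w = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X'(Y - p)
        coef += np.linalg.solve((X * w[:, None]).T @ X, X.T @ (Y - p))
    return coef

coef_hat = logit_irls(X, Y)
```

With the true scores observed, the estimates recover the generating coefficients; the difficulty addressed in the rest of the paper is that the scores must themselves be estimated from noisy multilevel functional data.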

We use MFPCA [13] to obtain the parsimonious bases that capture most of the functional variability of the space spanned by *X _{i}*(

Once consistent estimators of *K ^{X}*(

Conditional on the eigenfunctions and truncation lags *K* and *L*, the model for observed functional data can be written as a linear mixed model. Indeed, by assuming a normal shrinkage distribution for scores and errors, the model can be rewritten as

$$\{\begin{array}{l}{W}_{ij}(t)={\sum}_{k=1}^{K}{\xi}_{ik}{\psi}_{k}^{(1)}(t)+{\sum}_{l=1}^{L}{\zeta}_{\mathit{ijl}}{\psi}_{l}^{(2)}(t)+{\epsilon}_{ij}(t);\hfill \\ {\xi}_{ik}\sim \text{N}\{0,{\lambda}_{k}^{(1)}\};\phantom{\rule{0.16667em}{0ex}}{\zeta}_{\mathit{ijl}}\sim \text{N}\{0,{\lambda}_{l}^{(2)}\};\phantom{\rule{0.16667em}{0ex}}{\epsilon}_{ij}(t)\sim \text{N}(0,{\sigma}_{\epsilon}^{2}).\hfill \end{array}$$

(3)

For simplicity we will refer to ${\psi}_{k}^{(1)}(\xb7),{\psi}_{l}^{(2)}(\xb7)$ and ${\lambda}_{k}^{(1)},{\lambda}_{l}^{(2)}$ as the level 1 and 2 eigenfunctions and eigenvalues, respectively.
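As a concrete illustration of the exposure model (3), the following sketch simulates multilevel functional data under hypothetical orthonormal bases and eigenvalues, and checks one moment implication: across two distinct visits of the same subject, the covariance of the observed functions equals the level 1 covariance K^X, because visit deviations and errors are independent across visits.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate from the exposure model (3) on an equally spaced grid in [0, 1].
# Hypothetical choices: level 1 basis sqrt(2)sin(2*pi*k*t), level 2 basis
# sqrt(2)cos(2*pi*l*t), K = L = 2 components.
I, J, T = 4000, 2, 100                    # subjects, visits, grid points
t = np.linspace(0, 1, T, endpoint=False)
psi1 = np.sqrt(2) * np.sin(2 * np.pi * np.outer(np.arange(1, 3), t))  # (K, T)
psi2 = np.sqrt(2) * np.cos(2 * np.pi * np.outer(np.arange(1, 3), t))  # (L, T)
lam1 = np.array([1.0, 0.5])               # level-1 eigenvalues
lam2 = np.array([0.6, 0.3])               # level-2 eigenvalues
sig_eps = 0.2

xi = rng.normal(0.0, np.sqrt(lam1), size=(I, 2))        # subject scores
zeta = rng.normal(0.0, np.sqrt(lam2), size=(I, J, 2))   # visit scores
W = (xi @ psi1)[:, None, :] + zeta @ psi2 \
    + rng.normal(0.0, sig_eps, size=(I, J, T))          # W_ij(t), shape (I, J, T)

# Moment check: for distinct visits, Cov{W_i1(t), W_i2(t)} = K^X(t, t).
KX_diag = (lam1[:, None] * psi1**2).sum(axis=0)         # theoretical K^X(t, t)
c1 = W[:, 0, :] - W[:, 0, :].mean(axis=0)
c2 = W[:, 1, :] - W[:, 1, :].mean(axis=0)
emp = (c1 * c2).mean(axis=0)                            # empirical cross-visit cov
```

This cross-visit moment identity is exactly what MFPCA exploits to separate the level 1 and level 2 covariance operators.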

We propose to jointly fit the outcome model (2) and the exposure model (3). Because the joint model is a generalized linear mixed effects model, the inferential arsenal for mixed effects models can be used. In particular, we propose a Bayesian analysis via posterior Markov Chain Monte Carlo (MCMC) simulations, as described in Section 5. An alternative is a two-stage analysis: first predict the scores from model (3), then plug these predictions into model (2).

To better understand the potential problems associated with two-stage estimation we describe the induced likelihood for the observed data. We introduce the following notations *ξ** _{i}* = (

$$[{Y}_{i}\mid {\mathit{W}}_{i},{\mathit{Z}}_{i}]=\int [{Y}_{i}\mid {\mathit{\xi}}_{i},{\mathit{Z}}_{i}][{\mathit{\xi}}_{i}\mid {\mathit{W}}_{i}]d{\mathit{\xi}}_{i}.$$

(4)

Under normality assumptions it is easy to prove that [*ξ** _{i}*|

For most nonlinear models the induced model for observed data (4) does not have an explicit form. A procedure to avoid this problem is to use a two-stage approach with the following components: 1) produce predictors of *ξ** _{i}*, say b

Consider, for example, the outcome model *Y _{i}*

$${\mathrm{\Phi}}^{-1}({q}_{i})=\{{m}^{t}({\mathit{W}}_{i})\mathit{\beta}+{\mathit{Z}}_{i}^{t}\mathit{\gamma}\}/{(1+{\mathit{\beta}}^{t}{\mathbf{\sum}}_{i}\mathit{\beta})}^{1/2}.$$

(5)

Thus, using the two-stage procedure, where *ξ** _{i}* is simply replaced by

There are important potential differences between joint and two-stage analyses in a multilevel functional regression context. Indeed, the term
${\sum}_{l=1}^{L}{\zeta}_{\mathit{ijl}}{\psi}_{l}^{(2)}(t)$ in equation (3) quantifies the visit/subject-specific deviations from the subject specific mean. This variability is typically large and makes estimation of the subject-specific scores, *ξ** _{i}*, difficult even when the functions are perfectly observed, that is when
${\sigma}_{\epsilon}^{2}=0$. Thus, the effects of variability on bias in a two-stage procedure can be severe, especially when the within-subject variability is large compared to the between-subject variability. In the next section we provide the technical details associated with a two-stage procedure and provide a simple example to build up the intuition.
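The attenuation in equation (5) can be illustrated numerically in the scalar case. The sketch below uses hypothetical values of the coefficient, posterior mean, and posterior variance; it verifies the closed-form induced probability by Monte Carlo and computes the attenuation factor that shrinks the two-stage estimate toward zero.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# Scalar illustration of the induced probit model (5): if xi | W ~ N(m, Sigma)
# and Y | xi ~ Bernoulli{Phi(beta * xi)}, the induced success probability is
#   q = Phi{beta * m / sqrt(1 + beta^2 * Sigma)},
# so plugging m in for xi (the two-stage shortcut) attenuates beta by the
# factor (1 + beta^2 * Sigma)^(-1/2).  All numeric values are hypothetical.
beta, m, Sigma = 1.5, 0.7, 0.8

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

q_formula = Phi(beta * m / math.sqrt(1.0 + beta**2 * Sigma))

# Monte Carlo check: average Phi(beta * xi) over xi ~ N(m, Sigma).
xi = rng.normal(m, math.sqrt(Sigma), size=300_000)
q_mc = float(np.mean([Phi(beta * x) for x in xi]))

attenuation = 1.0 / math.sqrt(1.0 + beta**2 * Sigma)  # < 1: bias toward zero
```

The larger the conditional variance Sigma, the smaller the attenuation factor, which is why large within-subject variability makes the two-stage bias severe.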

We now turn our attention to calculating the posterior distribution of the subject-specific scores for the MFPCA model (3). While this section is more technical and notation-heavy, the results are important because they underpin any reasonable inferential procedure in this context, whether two-stage or joint. We first introduce some notation for subject *i*. Let **W*** _{ij}* = {

If **Σ**_{Wi} denotes the covariance matrix of *W** _{i}* then its (

**Theorem 1.** Consider the exposure model (3) with a fixed number of observations per visit, i.e. M_{ij} = M_{i}, at the same subject-specific times for each visit, i.e. t_{ijm} = t_{im} for all j = 1, …, J_{i}. Denote by
${\mathit{K}}^{X}={\mathbf{\Psi}}_{i1}^{(1)}{\mathbf{\Lambda}}^{(1)}{\mathbf{\Psi}}_{i1}^{(1)t}$, by
${\mathit{K}}_{T}^{U}={\mathbf{\Psi}}_{i1}^{(2)}{\mathbf{\Lambda}}^{(2)}{\mathbf{\Psi}}_{i1}^{(2)t}$, by **1**_{Ji×Ji} the J_i × J_i dimensional matrix of ones, and by ⊗ the Kronecker product of matrices. Then
${\mathbf{\sum}}_{{\mathbf{W}}_{i}}={\mathbf{1}}_{{J}_{i}\times {J}_{i}}\otimes {\mathit{K}}^{X}+{\mathit{I}}_{{J}_{i}}\otimes ({\sigma}_{\epsilon}^{2}{\mathit{I}}_{{M}_{i}}+{\mathit{K}}_{T}^{U})$ and
${\mathbf{\sum}}_{{\mathbf{W}}_{i}}^{-1}={\mathit{I}}_{{J}_{i}}\otimes {({\sigma}_{\epsilon}^{2}{\mathit{I}}_{{M}_{i}}+{\mathit{K}}_{T}^{U})}^{-1}-{\mathbf{1}}_{{J}_{i}\times {J}_{i}}\otimes \{{({\sigma}_{\epsilon}^{2}{\mathit{I}}_{{M}_{i}}+{\mathit{K}}_{T}^{U})}^{-1}{\mathit{K}}^{X}{({J}_{i}{\mathit{K}}^{X}+{\sigma}_{\epsilon}^{2}{\mathit{I}}_{{M}_{i}}+{\mathit{K}}_{T}^{U})}^{-1}\}$.

**Theorem 2.** Assume the balanced design considered in Theorem 1 and denote by ${\overline{\mathit{W}}}_{i}={\sum}_{j=1}^{{J}_{i}}{\mathit{W}}_{ij}/{J}_{i}$. Then $m({\mathit{W}}_{i})={\mathbf{\Lambda}}^{(1)}{\mathbf{\Psi}}_{i1}^{(1)t}{\{{\mathit{K}}^{X}+{\scriptstyle \frac{1}{{J}_{i}}}({\sigma}_{\epsilon}^{2}{\mathit{I}}_{{M}_{i}}+{\mathit{K}}_{T}^{U})\}}^{-1}{\overline{\mathit{W}}}_{i}$ and ${\mathbf{\sum}}_{i}={\mathbf{\Lambda}}^{(1)}-{\mathbf{\Lambda}}^{(1)}{\mathbf{\Psi}}_{i1}^{(1)t}{\{{\mathit{K}}^{X}+{\scriptstyle \frac{1}{{J}_{i}}}({\sigma}_{\epsilon}^{2}{\mathit{I}}_{{M}_{i}}+{\mathit{K}}_{T}^{U})\}}^{-1}{\mathbf{\Psi}}_{i1}^{(1)}{\mathbf{\Lambda}}^{(1)}$.
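Both theorems can be checked numerically. The sketch below, in a small hypothetical setting, builds the covariance of the stacked data from Kronecker products, compares the closed-form inverse to a direct matrix inverse, and compares the Theorem 2 posterior mean to the generic conditional-normal formula Cov(ξ, W)Σ_W^{-1}W.

```python
import numpy as np

rng = np.random.default_rng(4)

# Small hypothetical setting: M grid points per visit, J visits, K = L = 2.
M, J, K, L = 6, 3, 2, 2
Psi1 = np.linalg.qr(rng.normal(size=(M, K)))[0]   # orthonormal level-1 basis
Psi2 = np.linalg.qr(rng.normal(size=(M, L)))[0]   # orthonormal level-2 basis
Lam1 = np.diag([1.0, 0.4])
Lam2 = np.diag([0.5, 0.2])
sig2 = 0.3

KX = Psi1 @ Lam1 @ Psi1.T                 # level-1 covariance K^X
KU = Psi2 @ Lam2 @ Psi2.T                 # level-2 covariance K_T^U
V = sig2 * np.eye(M) + KU                 # per-visit noise + level-2 part

ones = np.ones((J, J))
SigW = np.kron(ones, KX) + np.kron(np.eye(J), V)          # Theorem 1, direct
SigW_inv = np.kron(np.eye(J), np.linalg.inv(V)) - np.kron(
    ones, np.linalg.inv(V) @ KX @ np.linalg.inv(J * KX + V))  # closed form

# Theorem 2: BLUP of the level-1 scores from the visit average W-bar.
W = rng.normal(size=J * M)                # one subject's stacked data
Wbar = W.reshape(J, M).mean(axis=0)
m_thm2 = Lam1 @ Psi1.T @ np.linalg.solve(KX + V / J, Wbar)

# Generic conditional-normal mean: Cov(xi, W) SigW^{-1} W.
CovXiW = np.kron(np.ones((1, J)), Lam1 @ Psi1.T)
m_generic = (CovXiW @ SigW_inv @ W).ravel()
```

The agreement of the two posterior means confirms that, in the balanced design, the visit average is a sufficient summary for predicting the subject-level scores.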

Proofs can be found in the accompanying web supplement. Theorem 2 provides a particularly simple description of the conditional distribution *ξ** _{i}*|

Consider a balanced design model with *K* = *L* = 1 and *ψ*^{(1)}(*t*) = 1, *ψ*^{(2)}(*t*) = 1 for all *t*. The exposure model becomes a balanced mixed two-way ANOVA model

$$\{\begin{array}{l}{W}_{ij}(t)={\xi}_{i}+{\zeta}_{ij}+{\epsilon}_{ij}(t);\hfill \\ {\xi}_{i}\sim \text{N}(0,{\lambda}_{1});{\zeta}_{ij}\sim \text{N}(0,{\lambda}_{2});{\epsilon}_{ij}(t)\sim \text{N}(0,{\sigma}_{\epsilon}^{2}),\hfill \end{array}$$

(6)

where, for simplicity, we denoted by *ξ _{i}* =

$${\mathrm{\sum}}_{i}=\frac{{\lambda}_{1}\{{\lambda}_{2}/{J}_{i}+{\sigma}_{\epsilon}^{2}/({M}_{i}{J}_{i})\}}{{\lambda}_{1}+\{{\lambda}_{2}/{J}_{i}+{\sigma}_{\epsilon}^{2}/({M}_{i}{J}_{i})\}}\le min\{{\lambda}_{1},{\lambda}_{2}/{J}_{i}+{\sigma}_{\epsilon}^{2}/({M}_{i}{J}_{i})\}.$$

Several important characteristics of this formula have direct practical consequences. First, Σ* _{i}* ≤
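The scalar posterior variance Σ_i above can be verified directly. The sketch below, with hypothetical variance components and design sizes, checks the formula against the precision (information) identity and against a Monte Carlo best linear prediction.

```python
import numpy as np

rng = np.random.default_rng(5)

# Numerical check of the posterior variance for the two-way ANOVA example (6).
# Variance components and design sizes are hypothetical.
lam1, lam2, sig2 = 1.0, 0.5, 0.25
J, M = 2, 50

v = lam2 / J + sig2 / (M * J)             # Var(W-bar_i | xi_i)
Sigma_i = lam1 * v / (lam1 + v)           # formula in the text
Sigma_alt = 1.0 / (1.0 / lam1 + 1.0 / v)  # same quantity via precisions

# Monte Carlo: residual variance of xi_i after best linear prediction
# from W-bar_i should equal Sigma_i.
n = 400_000
xi = rng.normal(0.0, np.sqrt(lam1), n)
Wbar = xi + rng.normal(0.0, np.sqrt(v), n)
b = np.cov(xi, Wbar)[0, 1] / np.var(Wbar)
resid_var = float(np.var(xi - b * Wbar))
```

Note that averaging over visits shrinks the level 2 contribution λ2 only at rate 1/J_i, so with few visits the posterior variance stays large no matter how dense the within-visit grid is.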

Our framework is faced with two distinct types of model uncertainty related to: 1) the choice of *K* and *L*, the dimensions of the two functional spaces in the exposure model (3); and 2) estimating *β*(*t*), the functional effect parameter, conditional on *K* and *L*, in the outcome model (2).

To address the first problem we focus on estimating *K*, as estimating *L* is similar. Note that, as *K* increases, the models described in (3) form a nested sequence of mixed effects models. Moreover, testing for the dimension of the functional space being equal to *K* versus *K* + 1 is equivalent to testing
${H}_{0,K}:{\lambda}_{K+1}^{(1)}=0$ versus
${H}_{A,K}:{\lambda}_{K+1}^{(1)}>0$, which is testing for the null hypothesis that a particular variance component is *equal to zero*. This connection provides a paradigm shift for estimating the dimension of the functional space or, more generally, the number of non-zero eigenvalues in PCA. Current methods are based on random matrix theory and require that eigenvalues be *bounded away from zero*, see, for example, [3, 20]. This is not the correct approach when the null hypothesis is that the eigenvalue is zero.

In this context Staicu, Crainiceanu and Carroll [40] proposed a sequence of Restricted Likelihood Ratio Tests (RLRTs) for zero variance components [10, 12, 41] to estimate *K*. Müller and Stadtmüller [31] proposed to use either the Akaike’s Information Criterion (AIC) [1] or the Bayesian Information Criterion (BIC) [39]. Moreover, they found these criteria to be more stable and less computationally intensive than methods based on cross-validation [38] or relative difference between the Pearson criterion and deviance [6]. Staicu, Crainiceanu and Carroll [40] show that both AIC and BIC are particular cases of sequential RLRT with non-standard *α* levels. They also explain that AIC performs well because its associated *α* level is 0.079, which is different from the standard *α* = 0.05, but might be reasonable in many applications. In contrast, they recommend against using the BIC in very large data sets, such as in our application, because the corresponding *α* level becomes extremely small.

In practice we actually prefer an even simpler method for estimating the number of components based on the estimated explained variance. More precisely, let *P*_{1} and *P*_{2} be two thresholds and define
${N}_{1}=min\{k:{\rho}_{k}^{(1)}\ge {P}_{1},{\lambda}_{k}<{P}_{2}\}$, where
${\rho}_{k}^{(1)}=({\lambda}_{1}^{(1)}+\dots +{\lambda}_{k}^{(1)})/({\lambda}_{1}^{(1)}+\dots +{\lambda}_{T}^{(1)})$. For the cumulative explained variance threshold we used *P*_{1} = 0.9 and for the individual explained variance we used *P*_{2} = 1/*T*, where *T* is the number of grid points. We used a similar method for choosing the number of components at level 2. These choices were slightly conservative, but worked well in our simulations and application. However, the two thresholds should be carefully tuned in any other particular application using simulations.
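A literal reading of this rule takes only a few lines to implement. The sketch below is one such reading, applied to a hypothetical spectrum with the thresholds P1 = 0.9 and P2 = 0.01 (i.e., 1/T with T = 100 grid points); the helper name is ours, not the paper's.

```python
import numpy as np

def choose_ncomp(eigvals, P1=0.9, P2=0.01):
    """Literal reading of N1 = min{k : rho_k >= P1, lambda_k < P2}: the
    smallest k whose cumulative explained-variance ratio reaches P1 and
    whose own explained-variance ratio already falls below P2."""
    lam = np.asarray(eigvals, dtype=float)
    ratio = lam / lam.sum()            # individual explained variance
    rho = np.cumsum(ratio)             # cumulative explained variance
    for k in range(1, lam.size + 1):
        if rho[k - 1] >= P1 and ratio[k - 1] < P2:
            return k
    return lam.size                    # fall back to all components

# Hypothetical spectrum with variance concentrated in the first components.
eigvals = [5.0, 2.0, 1.0, 0.2, 0.05, 0.03, 0.02]
n_comp = choose_ncomp(eigvals)
```

Requiring the k-th individual ratio to already be negligible is what makes the rule slightly conservative: it keeps one more component than the cumulative threshold alone would.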

To address the second problem we note that it can be reduced to a standard model selection problem. Forward, backward, single-variable or all subset selection can be used to identify statistically significant predictors in the outcome model (2). Typical pitfalls reported for these methods are avoided because predictors are mutually orthogonal by construction. In practice, we prefer to do a backward selection combined with sensitivity analysis around the chosen model. More precisely, we obtain an optimal model and the two next best models. For all these models we provide the functional estimates and the log-likelihood differences.

A powerful alternative to estimating *β*(*t*) was proposed in a series of papers by Reiss and Ogden [35, 36] for the single-level functional regression case. In short, they project the original (un-smooth) matrix of functional predictors onto a *B*-spline basis and use the P-spline basis penalty to induce shrinkage directly on the functional parameter. Another alternative is to adapt the forward selection method using pseudo-variables [27, 43], which could work especially well because the estimated eigenvalues are sorted. Both methods could easily be used in our framework. However, they would need to be adapted to a joint analysis context to overcome the bias problem induced by the two-stage analysis described in Section 2.

Because of the potential problems associated with two-stage procedures, we propose to use joint modeling. Bayesian inference using MCMC simulations of the posterior distribution provides a reasonable, robust, and well-tested computational approach for this type of problem. Possible reasons for the current lack of Bayesian methodology in functional regression analysis are: 1) the connection between functional regression models and joint mixed effects models was not known; and 2) Bayesian inferential tools were perceived as unnecessarily complex and hard to implement. We clarified the connection to mixed effects models in Section 2.1 and we now show that 2) is not true, thanks to intense methodological and computational research conducted over the last 10–20 years. See, for example, the monographs [4, 8, 16, 18] and the citations therein for a good overview.

To be specific, we focus on a Bernoulli/logit outcome model with functional regressors. Other outcome models would be treated similarly. Consider the joint model with the outcome *Y _{i}* ~ Bernoulli(

The priors for *ξ** _{i}* and

While standard choices of priors for fixed effects parameters exist and are typically non-controversial, the same is not true for priors of variance components. Indeed, the estimates of the variance components are known to be sensitive to the prior specification, see, for example, [11, 15]. In particular, the popular inverse-gamma priors may induce bias when their parameters are not tuned to the scale of the problem. This is dangerous in the shrinkage context where the variance components control the amount of smoothing. However, we find that with reasonable care, the conjugate gamma priors can be used in practice. Alternatives to gamma priors are discussed by, for example, [15, 32], and have the advantage of requiring less care in the choice of the hyperparameters. Nonetheless, exploration of other prior families for functional regression would be well worthwhile, though beyond the scope of this paper.

We propose to use the following independent inverse gamma priors
${\lambda}_{k}^{(1)}\sim \text{IG}({A}_{k}^{(1)},{B}_{k}^{(1)}),k=1,\dots ,K,{\lambda}_{l}^{(2)}\sim \text{IG}({A}_{l}^{(2)},{B}_{l}^{(2)})$*, l* = 1, … *L*, and
${\sigma}_{\epsilon}^{2}\sim \text{IG}({A}_{\epsilon},{B}_{\epsilon})$, where IG(*A*, *B*) denotes the distribution of the inverse of a Gamma(*A*, *B*) random variable, the Gamma(*A*, *B*) distribution having mean *A*/*B* and variance *A*/*B*^{2}. We first write the full conditional distributions for all the parameters and then discuss choices of non-informative inverse gamma parameters. Here we treat
${\lambda}_{k}^{(1)}$ and
${\lambda}_{l}^{(2)}$ as parameters to be estimated, but a simpler Empirical Bayes (EB) method proved to be a reasonable alternative in practice. More precisely, the EB method estimates
${\lambda}_{k}^{(1)}$ and
${\lambda}_{l}^{(2)}$ by diagonalizing the functional covariance operators as described in Section 2.1. These estimators are then fixed in the joint model. In the following we present the inferential procedure for the case when
${\lambda}_{k}^{(1)}$ and
${\lambda}_{l}^{(2)}$ are estimated with obvious simplifications for the EB procedure where they would be fixed.

We use Gibbs sampling [17] to simulate [**Ω**|** D**], where

$$\begin{array}{l}[\mathit{\beta},\mathit{\gamma}\mid \text{others}]\phantom{\rule{0.38889em}{0ex}}\propto \phantom{\rule{0.38889em}{0ex}}exp[{\sum}_{i=1}^{I}{Y}_{i}({\mathit{\xi}}_{i}^{t}\mathit{\beta}+{\mathit{Z}}_{i}^{t}\mathit{\gamma})-{\sum}_{i=1}^{I}log\{1+exp({\mathit{\xi}}_{i}^{t}\mathit{\beta}+{\mathit{Z}}_{i}^{t}\mathit{\gamma})\}]\\ \times exp(-0.5{\mathit{\beta}}^{t}\mathit{\beta}/{\sigma}_{\beta}^{2}-0.5{\mathit{\gamma}}^{t}\mathit{\gamma}/{\sigma}_{\gamma}^{2});\\ [{\mathit{\xi}}_{i}\mid \text{others}]\phantom{\rule{0.38889em}{0ex}}\propto \phantom{\rule{0.38889em}{0ex}}exp[{Y}_{i}({\mathit{\xi}}_{i}^{t}\mathit{\beta}+{\mathit{Z}}_{i}^{t}\mathit{\gamma})-log\{1+exp({\mathit{\xi}}_{i}^{t}\mathit{\beta}+{\mathit{Z}}_{i}^{t}\mathit{\gamma})\}]\\ \times exp[-0.5{\sum}_{j=1}^{{J}_{i}}\mid \mid {\mathit{W}}_{ij}-{\mathbf{\Psi}}_{ij}^{(1)}{\mathit{\xi}}_{i}-{\mathbf{\Psi}}_{ij}^{(2)}{\mathit{\zeta}}_{ij}\mid {\mid}^{2}/{\sigma}_{\epsilon}^{2}-0.5{\mathit{\xi}}_{i}^{t}{\left\{{\mathbf{\Lambda}}^{(1)}\right\}}^{-1}{\mathit{\xi}}_{i}];\\ [{\mathit{\zeta}}_{ij}\mid \text{others}]=\text{N}\left[{\mathit{A}}_{ij}^{-1}{\mathbf{\Psi}}_{ij}^{(2)t}\{{\mathit{W}}_{ij}-{\mathbf{\Psi}}_{ij}^{(1)}{\mathit{\xi}}_{i}\}/{\sigma}_{\epsilon}^{2},{\mathit{A}}_{ij}^{-1}\right];\\ \left[{\lambda}_{k}^{(1)}\mid \text{others}\right]=\text{IG}\left\{I/2+{A}_{k}^{(1)},{\sum}_{i=1}^{I}{\xi}_{ik}^{2}/2+{B}_{k}^{(1)}\right\};\\ \left[{\lambda}_{l}^{(2)}\mid \text{others}\right]=\text{IG}\left\{{\sum}_{i=1}^{I}{J}_{i}/2+{A}_{l}^{(2)},{\sum}_{i=1}^{I}{\sum}_{j=1}^{{J}_{i}}{\zeta}_{\mathit{ijl}}^{2}/2+{B}_{l}^{(2)}\right\};\\ [{\sigma}_{\epsilon}^{2}\mid \text{others}]=\text{IG}\{{\sum}_{i=1}^{I}{J}_{i}{M}_{i}/2+{A}_{\epsilon},{\sum}_{i=1}^{I}{\sum}_{j=1}^{{J}_{i}}\mid \mid {\mathit{W}}_{ij}-{\mathbf{\Psi}}_{ij}^{(1)}{\mathit{\xi}}_{i}-{\mathbf{\Psi}}_{ij}^{(2)}{\mathit{\zeta}}_{ij}\mid {\mid}^{2}/2+{B}_{\epsilon}\},\end{array}$$

where
${\mathit{A}}_{ij}={\{{\mathbf{\Psi}}_{ij}^{(2)}\}}^{t}{\mathbf{\Psi}}_{ij}^{(2)}/{\sigma}_{\epsilon}^{2}+{\{{\mathbf{\Lambda}}^{(2)}\}}^{-1}$. The first two full-conditionals do not have an explicit form, but can be sampled using MCMC. For Bernoulli outcomes the MCMC methodology is routine. We use the Metropolis-Hastings algorithm with a normal proposal distribution centered at the current value and a small variance tuned to provide an acceptance rate of around 30–40%. The last four conditionals are explicit and can be easily sampled. However, understanding the various components of these distributions provides insight into rational choices of the inverse gamma prior parameters. The first parameter of the full conditional for
${\lambda}_{k}^{(1)}$ is
$I/2+{A}_{k}^{(1)}$, where *I* is the number of subjects and it is safe to choose
${A}_{k}^{(1)}\le 0.01$. The second parameter is
${\sum}_{i=1}^{I}{\xi}_{ik}^{2}/2+{B}_{k}^{(1)}$, where
${\sum}_{i=1}^{I}{\xi}_{ik}^{2}$ is an estimator of
$I{\lambda}_{k}^{(1)}$ and it is safe to choose
${B}_{k}^{(1)}\le 0.01{\lambda}_{k}^{(1)}$. This is especially relevant for those variance components or, equivalently, eigenvalues of the covariance operator, that are small, but estimable. A similar discussion holds for
${\lambda}_{l}^{(2)}$. For
${\sigma}_{\epsilon}^{2}$ we recommend to choose *A _{ε}* ≤ 0.01 and
${B}_{\epsilon}\le 0.01{\sigma}_{\epsilon}^{2}$. Note that method of moments estimators for
${\lambda}_{k}^{(1)},{\lambda}_{l}^{(2)}$ and
${\sigma}_{\epsilon}^{2}$ are available and reasonable choices of
${B}_{k}^{(1)},{B}_{l}^{(2)}$ and ${B}_{\epsilon}$ can be based on these estimators.
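For instance, the conjugate update for a single level-1 eigenvalue can be sketched as below, drawing from the inverse-gamma full conditional as the reciprocal of a Gamma draw; the score vector and prior constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)

# One conjugate Gibbs update for a level-1 eigenvalue lambda_k^(1):
#   lambda_k^(1) | others ~ IG(I/2 + A_k, sum_i xi_ik^2 / 2 + B_k),
# sampled as the reciprocal of a Gamma(shape, rate) draw.
I = 500                                  # number of subjects (hypothetical)
lam_true = 0.8                           # eigenvalue used to fake the scores
A, B = 0.01, 0.01 * lam_true             # weak, scale-aware prior constants

def update_lambda(xi_k, A, B, rng):
    """Draw lambda from its inverse-gamma full conditional."""
    shape = xi_k.size / 2.0 + A
    rate = np.sum(xi_k**2) / 2.0 + B
    return rate / rng.gamma(shape)       # 1/Gamma(shape, rate) = IG(shape, rate)

xi_k = rng.normal(0.0, np.sqrt(lam_true), size=I)   # current score draws
draws = np.array([update_lambda(xi_k, A, B, rng) for _ in range(2000)])
```

With A and B this small relative to I and to the sum of squared scores, the data dominate the prior, which is the sense in which these choices are safe.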

In this section, we compare the performance of the joint analysis procedure with that of the two-stage procedure through simulation studies. We examine the Bernoulli model with probit link when the functional exposure model is single-level and when it is multilevel.

The outcome data was simulated from a Bernoulli/probit model with linear predictor
${\mathrm{\Phi}}^{-1}({p}_{i})={\beta}_{0}+{\int}_{0}^{1}{X}_{i}(t)\beta (t)\phantom{\rule{0.16667em}{0ex}}dt+{z}_{i}\gamma $, for *i* = 1, …, *n*, where *n* = 1000 is the number of subjects. We used the functional predictor *X _{i}*(
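A minimal sketch of this simulation design, under assumed orthonormal bases so that the integral ∫ X_i(t)β(t) dt reduces exactly to the score form Σ_k ξ_ik β_k (the basis functions and coefficient values below are our illustrative choices, not those of the paper):

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Probit outcome with functional predictor:
#   Phi^{-1}(p_i) = beta0 + int_0^1 X_i(t) beta(t) dt + z_i * gamma.
n, T = 1000, 200
t = np.linspace(0, 1, T, endpoint=False)
psi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                 np.sqrt(2) * np.cos(2 * np.pi * t)])   # orthonormal basis
beta_k = np.array([0.7, -0.4])            # beta(t) coefficients (hypothetical)
xi = rng.normal(0.0, 1.0, size=(n, 2))    # subject scores

X = xi @ psi                              # X_i on the grid, shape (n, T)
beta_t = beta_k @ psi                     # beta(t) on the grid
integral = (X * beta_t).mean(axis=1)      # Riemann sum with dt = 1/T
score_form = xi @ beta_k                  # exact value via orthonormality

Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
beta0, gamma = 0.5, 0.3                   # intercept and covariate effect
z = rng.normal(size=n)
p = Phi(beta0 + integral + gamma * z)
Y = rng.binomial(1, p)
```

Because the grid is equally spaced over full periods of the basis, the Riemann sum and the score form agree to machine precision, which is the reduction the paper exploits.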

Consider the case when for each subject, *i*, instead of observing *X _{i}*(

Joint Bayesian analysis (upper panel) versus two-stage analysis with BLUP (bottom panel): box plots of and for different values of *β* and *σ*_{ε}.

For the case *σ _{ε}* = 3, Table 1 displays the root mean squared error (RMSE) and coverage probability of confidence intervals for

Consider now the situation when the predictors are measured through a hierarchical functional design, as in SHHS. To mimic the design of the SHHS, we assume *J* = 2 visits per subject and that the observed noisy predictors *W _{ij}*(

Figure 2 presents the boxplots of the estimates of *β* using the joint Bayesian analysis (top panels) and the two-stage method with BLUP estimation of scores (bottom panels). The left panels correspond to *λ*_{1} = 1, *λ*_{2} = 1 and three values of *σ _{ε}*, 0.5, 1 and 3. The joint Bayesian inference produces unbiased estimates, while the two-stage procedure produces biased estimates with the bias increasing only slightly with the measurement error variance. This confirms our theoretical results that, typically, in the hierarchical setting the noise magnitude is not the main source of bias. The middle and right panels display results when the measurement error variance is fixed,

Joint Bayesian analysis (upper panel) versus two-stage analysis with BLUP (bottom panel): box plots of for *β* = 1 and various values of *σ*_{ε} and *λ*’s.

In spite of the obvious advantages of the joint Bayesian analysis, the message is more nuanced than simply recommending this method. In practice, the two-stage method with BLUP estimation of scores is a robust alternative that often produces similar results to the joint analysis with less computational effort. Our recommendation is to apply both methods and compare their results. We also provided insight into why and when inferential differences may be observed, and, especially, how to address such differences.

We now apply our proposed methods to the SHHS data. We considered the 3,201 subjects with complete baseline and visit 2 data whose sleep duration exceeded 4 hours at both visits, and we analyzed data for the first 4 hours of sleep. We focus on the association between hypertension (HTN) and the sleep EEG *δ*-power spectrum. Complete descriptions of the SHHS data set and of this functional regression problem can be found in [9, 13]. We provide here a short summary.

A quasi-continuous EEG signal was recorded during sleep for each subject at two visits, roughly 5 years apart. This signal was processed using the Discrete Fourier Transform (DFT). More precisely, if *x*_{0}, …, *x _{N}*

Gray solid and dashed lines display percent *δ*-power in 30 seconds intervals for the same 2 subjects at baseline (top panel) and visit 2 (bottom panel). Missing data correspond to wake periods. Solid black line displays visit-specific average **...**

The first step was to subtract from each observed normalized function the corresponding visit-specific population average. Following the notation of Section 3, *W _{ij}*(

To obtain the subject- and visit-level eigenfunctions and eigenvalues we used the MFPCA methodology introduced by [13] and summarized in Section 2.1. Table 2 provides the estimated eigenvalues at both levels, indicating that 95% of level 1 (subject) variability is explained by the first five eigenfunctions and 80% is explained by the first eigenfunction. Table 2 indicates that there are more directions of variation in the level 2 (visit) space. Indeed, 80% of the variability is explained by the first 7 eigenfunctions and 90% of the variability is explained by the first 14 components (results not shown). The proportion of variability explained by subject-level functional clustering was *W* = 0.213 with a 95% confidence interval of (0.210, 0.236), i.e., 21.3% of the variability in sleep EEG *δ*-power is attributable to subject-level variability.

Estimated eigenvalues on both levels for SHHS data. We showed the first 5 components for level 1 (subject level), and 7 components for level 2.

We started with *K* = 5 and performed a backward selection starting with the full outcome model
$\text{logit}\{P({Y}_{i}=1)\}={\beta}_{0}+{\sum}_{k=1}^{K}{\beta}_{k}\phantom{\rule{0.16667em}{0ex}}{\xi}_{ik}$, where *Y _{i}* is the HTN indicator variable and no additional covariates were included into the model. Three principal components were eliminated in the following order: PC4 (p-value= 0.49), PC2 (p-value= 0.46), PC3 (p-value= 0.23). The other two principal components (PCs) were retained in the model: PC1 (p-value< 0.001) and PC5 (p-value= 0.0012). For illustration, Figure 4 displays principal components 1, 2, 3, and 5. PC1 is, basically, a vertical shift. Thus, subjects who are positively loaded on it have a higher long-term

Characteristics of normalized sleep EEG *δ*-power. Principal components 1, 2, 3 and 5 of the subject-level functional space.

Table 3 provides results for two models, one without confounding adjustment (labeled Model 1) and one with confounding adjustment (labeled Model 2). The confounders in Model 2 are sex, smoking status (with three categories: never smokers, former smokers, and current smokers), age, body mass index (BMI) and respiratory disturbance index (RDI). Each model was fitted using a two-stage analysis with BLUP estimates of scores from the exposure model and a joint Bayesian analysis. We note that there is good agreement between the two methods, with the exception of the statistical significance of PC5: the two-stage analysis finds it highly significant whereas the Bayesian analysis does not. As expected, the magnitude of association varies with the amount of confounding adjustment. For example, Model 1 estimates that a one standard deviation increase in PC1 scores corresponds to a relative risk *e*^{−1.55*0.11} = 0.84 (Table 2 provides the variance of PC1 scores). Model 2, which adjusts for confounders, estimates that a one standard deviation increase in PC1 scores corresponds to a relative risk *e*^{−0.85*0.11} = 0.91.
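As an arithmetic check of the reported relative risks (the 0.11 multiplier for a one-standard-deviation change in the PC1 scores is taken from the text):

```python
import math

# Relative risk = exp(coefficient * 0.11), with 0.11 the per-SD change in
# the PC1 score contribution as given in the text.
sd_pc1 = 0.11
rr_model1 = math.exp(-1.55 * sd_pc1)   # unadjusted model
rr_model2 = math.exp(-0.85 * sd_pc1)   # confounder-adjusted model
```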

Mean and standard error estimates (within brackets) for parameters of models of association between hypertension and sleep EEG *δ*-power. Smoking status has three categories: never smokers (reference), former smokers (smk:former) and current smokers **...**

These results are now easy to explain. The bias of the point estimators is likely due to the variability of the PC scores. The wider credible intervals obtained from the Bayesian analysis are likely due to the appropriate incorporation of all sources of variability. The negative relationship between smoking and hypertension may seem counterintuitive. However, in this study smokers are younger, have a lower BMI, and many smokers with severe disease were not included in the study [46].

Figure 5 displays results for *β*(*t*), the functional association effect between subject-specific deviations, *X*_{i}(*t*), from the visit-specific mean, *μ*(*t*) + *η*_{j}(*t*), and HTN in the model without confounders.

Results for *β*(*t*), the functional association effect between subject-specific deviations, *X*_{i}(*t*), from the visit-specific mean, *μ*(*t*) + *η*_{j}(*t*), and HTN in the model without confounders. Two-stage (top panel); joint Bayesian (bottom panel).

Joint Bayesian analysis is simple, robust and requires minimal tuning. This is possible because MFPCA produces a parsimonious decomposition of the functional variability using orthonormal bases. The use of orthonormal bases reduces both the number of parameters and the posterior correlation among parameters, which leads to excellent mixing properties. For example, the web supplement displays chains for the regression coefficients indicating independence-like behavior.
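The mixing property traces back to the orthonormality of the eigenfunctions: scores on distinct principal directions are mutually uncorrelated, so the regression coefficients are nearly a posteriori independent. A small simulated sketch (not the SHHS data) illustrates the property:

```python
import numpy as np

# Sketch: functional PCA on simulated curves. Orthonormal eigenfunctions
# make the empirical covariance of the PC scores diagonal, which is why
# regression on the scores yields well-mixing MCMC chains.
rng = np.random.default_rng(0)
n, T = 200, 50
curves = rng.standard_normal((n, 5)) @ rng.standard_normal((5, T))
curves -= curves.mean(axis=0)           # center: remove the mean curve

# Principal directions via SVD of the centered data matrix
U, s, Vt = np.linalg.svd(curves, full_matrices=False)
phi = Vt[:5]                            # first 5 eigenfunctions (rows)
scores = curves @ phi.T                 # subject-specific PC scores

# Orthonormal basis: phi @ phi.T is the identity matrix
assert np.allclose(phi @ phi.T, np.eye(5), atol=1e-8)
# Scores on distinct components are uncorrelated (diagonal covariance)
cov = scores.T @ scores / n
assert np.allclose(cov, np.diag(np.diag(cov)), atol=1e-8)
print("orthonormal basis; diagonal score covariance")
```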

The methodology introduced in this paper was motivated by many current studies where exposure or covariates are functional data collected at multiple time points. The SHHS is just one example of such studies. The GMFLM methodology provides a self-contained set of statistical tools that is robust, fast and reasonable for such studies. These properties are due to: 1) the connection between GMFLMs and mixed effects models; 2) the parsimonious decomposition of functional variability in principal directions of variation; 3) the modular way mixed effects models can incorporate desirable generalizations; and 4) the good properties of Bayesian posterior simulations due to the orthogonality of the directions of variation.

The methods described in this paper have a few limitations. First, they require a large initial investment in developing and understanding the multilevel functional structure. Second, they require many choices, including the number and type of basis functions, the distribution of the random effects, the method of inference, etc. The choices we made are reasonable, but other choices may be more appropriate in other applications. Third, our framework opens many new theoretical problems; addressing them all exceeds the scope of the current paper, and they will be treated in subsequent work. Fourth, the computational problems may seem daunting, especially when we propose a joint Bayesian analysis of a data set with thousands of subjects, multiple visits and thousands of random effects. However, we do not think that they are insurmountable; see the software we posted at www.biostat.jhsph.edu/~ccrainic/webpage/software/GFR.zip.

Crainiceanu’s and Di’s research was supported by Award Number R01NS060910 from the National Institute of Neurological Disorders and Stroke. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Neurological Disorders and Stroke or the National Institutes of Health.

Ciprian M. Crainiceanu, Ciprian M. Crainiceanu is Associate Professor, Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205 (E-mail: ude.hpshj@ciniarcc)

Ana-Maria Staicu, Ana-Maria Staicu is Assistant Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695 (E-mail: ude.uscn.tats@uciats)

Chong-Zhi Di, Chong-Zhi Di is Assistant Professor, Biostatistics Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109 (E-mail: ude.hpshj@idc)

1. Akaike H. Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika. 1973;60:255–265.

2. Baladandayuthapani V, Mallick BK, Hong MY, Lupton JR, Turner ND, Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics. 2008;64:64–73. [PMC free article] [PubMed]

3. Bilodeau M, Brenner D. Theory of Multivariate Statistics. Springer-Verlag; New York: 1999.

4. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. 2nd ed. Chapman & Hall/CRC; 2000.

5. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. Chapman & Hall/CRC; New York: 2006.

6. Chiou JM, Müller HG. Quasi-likelihood regression with unknown link and variance functions. Journal of the American Statistical Association. 1998;93:1376–1387.

7. Chiou JM, Müller HG, Wang JL. Functional quasi-likelihood regression models with smooth random effects. Journal of the Royal Statistical Society, Series B. 2003;65:405–423.

8. Congdon P. Applied Bayesian Modelling. Wiley; 2003.

9. Crainiceanu CM, Caffo B, Di C, Punjabi N. Nonparametric signal extraction and measurement error in the analysis of electroencephalographic activity during sleep. Journal of the American Statistical Association, to appear. 2009 [PMC free article] [PubMed]

10. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B. 2004;66:165–185.

11. Crainiceanu CM, Ruppert D, Carroll RJ, Joshi A, Goodner B. Spatially adaptive Bayesian penalized splines with heteroscedastic errors. Journal of Computational and Graphical Statistics. 2007;16(2):265–288.

12. Crainiceanu CM, Ruppert D, Claeskens G, Wand MP. Exact likelihood ratio tests for penalized splines. Biometrika. 2005;92(1):91–103.

13. Di C, Crainiceanu CM, Caffo B, Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics. 2009;3(1):458–488. [PMC free article] [PubMed]

14. Fan J, Zhang JT. Two-step estimation of functional linear models with application to longitudinal data. Journal of the Royal Statistical Society, Series B. 2000;62:303–322.

15. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1(3):515–533.

16. Gelman A, Carlin JB, Stern HA, Rubin DB. Bayesian Data Analysis. 2nd ed. Chapman & Hall/CRC; 2003.

17. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;6:721–741. [PubMed]

18. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC; 1996.

19. Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. [PubMed]

20. Hall P, Hosseini-Nasab M. On properties of functional principal components analysis. Journal of the Royal Statistical Society, Series B. 2006;68:109–126.

21. Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics. 2006;34:1493–1517.

22. Indritz J. Methods in Analysis. Macmillan & Collier-Macmillan; 1963.

23. James GM. Generalized Linear Models with Functional Predictors. Journal of the Royal Statistical Society, Series B. 2002;64:411–432.

24. James GM, Hastie TG, Sugar CA. Principal component models for sparse functional data. Biometrika. 2000;87:587–602.

25. Karhunen K. Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Suomalainen Tiedeakatemia; 1947.

26. Loève M. Fonctions aléatoires de second ordre. Comptes Rendus des Séances de l’Académie des Sciences. 1945:220.

27. Luo X, Stefanski LA, Boos DD. Tuning variable selection procedures by adding noise. Technometrics. 2006;48:165–175.

28. Morris JS, Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, B. 2006;68:179–199. [PMC free article] [PubMed]

29. Morris JS, Vanucci M, Brown PJ, Carroll RJ. Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis. Journal of the American Statistical Association. 2003;98:573–583.

30. Müller HG. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics. 2005;32:223–240.

31. Müller HG, Stadtmüller U. Generalized functional linear models. The Annals of Statistics. 2005;33(2):774–805.

32. Natarajan R, Kass RE. Reference Bayesian methods for generalized linear mixed models. Journal of the American Statistical Association. 2000;95:227–237.

33. Ramsay JO, Silverman BW. Applied Functional Data Analysis. Springer-Verlag; New York: 2005.

34. Ramsay JO, Silverman BW. Functional Data Analysis. Springer-Verlag; New York: 2006.

35. Reiss PT, Ogden RT. Functional principal component regression and functional partial least squares. Journal of the American Statistical Association. 2007;102:984–996.

36. Reiss PT, Ogden RT. Functional generalized linear models with images as predictors. Biometrics, to appear. 2009 [PubMed]

37. Rice JA. Functional and longitudinal data analysis. Statistica Sinica. 2004;14:631–647.

38. Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B. 1991;53:233–243.

39. Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.

40. Staicu A-M, Crainiceanu CM, Carroll RJ. Fast methods for spatially correlated multilevel functional data. 2009 manuscript. [PMC free article] [PubMed]

41. Stram DO, Lee JW. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994;50:1171–1177. [PubMed]

42. Wang N, Carroll RJ, Lin X. Efficient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association. 2005;100:147–157.

43. Wu Y, Stefanski LA, Boos DD. Controlling variable selection by the addition of pseudovariables. Journal of the American Statistical Association. 2007;102:235–243.

44. Yao F, Lee TCM. Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society Series B. 2006;68:3–25.

45. Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. The Annals of Statistics. 2005;33:2873–2903.

46. Zhang L, Samet J, Caffo B, Punjabi NM. Cigarette smoking and nocturnal sleep architecture. American Journal of Epidemiology. 2006;164(6):529–537. [PubMed]

47. Zhao X, Marron JS, Wells MT. The functional data analysis view of longitudinal data. Statistica Sinica. 2004;14:789–808.
