Can J Stat. Author manuscript; available in PMC 2010 August 4.
Published in final edited form as:
Can J Stat. 2010 June; 38(2): 256–270.
doi:  10.1002/cjs.10062
PMCID: PMC2915799
NIHMSID: NIHMS184162

Longitudinal functional principal component modeling via Stochastic Approximation Monte Carlo

Abstract

The authors consider the analysis of hierarchical longitudinal functional data based upon a functional principal components approach. In contrast to standard frequentist approaches to selecting the number of principal components, the authors perform model averaging using a Bayesian formulation. A relatively straightforward reversible jump Markov chain Monte Carlo formulation has poor mixing properties and in simulated data often becomes trapped at the wrong number of principal components. In order to overcome this, the authors show how to apply Stochastic Approximation Monte Carlo (SAMC) to this problem, a method that has the potential to explore the entire space and does not become trapped in local extrema. The combination of reversible jump methods and SAMC in hierarchical longitudinal functional data is simplified by a polar coordinate representation of the principal components. The approach is easy to implement and does well in simulated data in determining the distribution of the number of principal components, and in terms of its frequentist estimation properties. Empirical applications are also presented.

Keywords: Functional data analysis, hierarchical models, longitudinal data, Markov chain Monte Carlo, principal components, stochastic approximation

1. INTRODUCTION

Because longitudinal data can be viewed as sparsely observed functional data (Rice, 2004), functional principal components analysis becomes an important tool for analyzing longitudinal data. Since the early contributions of Besse & Ramsay (1986), Ramsay & Dalzell (1991), and Rice & Silverman (1991), a substantial literature has developed for functional principal components analysis, including but not limited to the work of Silverman (1996), Cardot (2000), James, Hastie & Sugar (2000), Rice & Wu (2001), Boente & Fraiman (2000), Ramsay & Silverman (2005, Chapters 8–10), Yao, Müller & Wang (2005a,b), Hall & Hosseini-Nasab (2006), Hall, Müller & Wang (2006), Jank & Shmueli (2006), Ocaña, Aguilera & Escabias (2007), Reiss & Ogden (2007), and Huang, Shen & Buja (2008).

Let Y_{iℓ} be the ℓth response for the ith subject at time t_{iℓ}. Then the hierarchical model we are considering takes the general form

Y_{iℓ} = μ(t_{iℓ}) + g_i(t_{iℓ}) + ε_{iℓ},
(1)

where μ(·) is the fixed effects population-level function, g_i(·) represents random functions associated with each individual, and ε_{iℓ} is noise. The goal is to estimate the fixed-effects function μ(·) and to understand the variability of the random effects functions g_i(·).

It is standard in the functional data analysis literature to model the correlations among responses at different times through a mixed effects model, e.g., via the Karhunen-Loève expansion used in a number of the papers referenced above. As a modification of work by James et al. (2000) and Rice & Wu (2001), to fit models such as (1), Zhou, Huang & Carroll (2008) introduced penalized spline estimation of functional principal components for longitudinal data using a mixed effects model framework. Suppose that there are i = 1, …, n subjects, each of which contributes ℓ = 1, …, m_i observations at times or locations t_{iℓ}. Let b(t) be a p × 1 orthogonal basis with the property that ∫ b(t)b^T(t) dt = I, and let B_i be an m_i × p matrix with ℓth row b^T(t_{iℓ}). Let Y_i be the m_i × 1 vector of responses for the ith subject. Then the model becomes

Y_i = B_iθ_μ + B_iΘ_fα_i + ε_i;  E(ε_i | B_i) = 0;  cov(ε_i | B_i) = σ_ε² I_{m_i};  α_i = Normal{0, D_α = diag(d_1, …, d_M)},
(2)

where the principal components matrix Θ_f is p × M with p > M, the main effects parameter θ_μ is p × 1, and the vector of principal component scores α_i is M × 1. In addition to the orthogonality constraint on the basis functions, for identifiability we need Θ_f to be orthogonal, Θ_f^T Θ_f = I_M. Thus, comparing (1) and (2), we are making the approximations μ(t) ≈ b^T(t)θ_μ and g_i(t) ≈ b^T(t)Θ_f α_i.
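To make the structure of (2) concrete, here is a minimal simulation sketch. All dimensions and parameter values are illustrative, the "basis" evaluations are random stand-ins purely to exercise the linear algebra, and Θ_f is produced by a QR decomposition rather than by the paper's parameterization of Section 2.2.

```python
import numpy as np

# Hypothetical sketch of drawing data from model (2); nothing here is the
# authors' actual setup.
rng = np.random.default_rng(2)
n, m, p, M = 20, 11, 5, 2
sigma_eps = 0.5
d = np.array([1.50, 0.75])                    # diag(D_alpha)
theta_mu = rng.normal(size=p)                 # main effects parameter

# An arbitrary orthogonal p x M matrix with positive first row, as the
# identifiability constraint requires (QR used only for illustration).
Q, _ = np.linalg.qr(rng.normal(size=(p, M)))
Theta_f = Q * np.sign(Q[0])

B = rng.normal(size=(n, m, p))                # stand-in basis evaluations B_i
alpha = rng.normal(size=(n, M)) * np.sqrt(d)  # scores alpha_i ~ N(0, D_alpha)

signal = B @ theta_mu                                   # B_i theta_mu
random_part = np.einsum('imp,pk,ik->im', B, Theta_f, alpha)  # B_i Theta_f alpha_i
Y = signal + random_part + sigma_eps * rng.normal(size=(n, m))
```

Multiplying each column of Q by the sign of its first element keeps the columns orthonormal while enforcing a positive first row.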

It is interesting to note that, in common with Karhunen-Loève approaches such as ours, if the number of principal components is M = 1, then the correlation structure of responses at different times is equicorrelated, but this is not the case when M > 1.

As in any kind of spline-based analysis, it is useful to penalize the likelihood to ensure smoothness of fit and to avoid overfitting. In model (2), we have two functions that need to be penalized: the fixed effects function represented as b^T(t)θ_μ and the random effects functions represented as b^T(t)Θ_f. For the fixed effects function, for example, the usual device is to have a penalty parameter, here called λ_μ, and to penalize the likelihood by λ_μ θ_μ^T K θ_μ, where K is a so-called penalty matrix. The typical form of the penalty matrix is the integrated square of the second derivative, i.e., K = ∫ b″(t)b″(t)^T dt. Methods for constructing the orthogonal basis functions B_i and for computing the penalty matrix K are described by Zhou et al. (2008).
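As an illustration of the penalty construction (ours, not the paper's), the following sketch computes K = ∫ b″(t)b″(t)^T dt numerically for an orthonormal polynomial basis; normalized Legendre polynomials on [−1, 1] stand in for the spline basis of Zhou et al. (2008), and all sizes are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre as leg

p = 5
t = np.linspace(-1.0, 1.0, 2001)
dt = t[1] - t[0]
w = np.full(t.size, dt)          # trapezoid quadrature weights
w[0] = w[-1] = dt / 2

# b_j(t) = sqrt((2j+1)/2) P_j(t) satisfies the orthonormality
# requirement of the text: integral of b_j(t) b_k(t) dt = delta_{jk}.
norms = np.sqrt((2 * np.arange(p) + 1) / 2.0)
B2 = np.empty((p, t.size))       # row j holds the second derivative b_j''(t)
for j in range(p):
    c = np.zeros(p)
    c[j] = norms[j]              # b_j in Legendre coefficient form
    B2[j] = leg.legval(t, leg.legder(c, 2))

K = (B2 * w) @ B2.T              # K_{jk} approximates the curvature integral
```

Constant and linear basis functions have zero second derivative, so the first two rows and columns of K vanish: the penalty shrinks only curvature.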

Enforcing smoothness of the random effects functions is somewhat more involved, because b^T(t)Θ_f has M columns, each of which needs to be penalized. Here is the device used by Zhou et al. (2008). If ϑ_j is the jth column of Θ_f, the penalty for the random functions becomes λ_f Σ_{j=1}^M ϑ_j^T K ϑ_j. Overall then, to ensure smoothness, the log likelihood is penalized by λ_μ θ_μ^T K θ_μ + λ_f Σ_{j=1}^M ϑ_j^T K ϑ_j.

Of course, penalty parameters need to be estimated, and for this purpose there are a variety of possibilities. Zhou et al. (2008) used 10-fold cross-validation to estimate (λ_μ, λ_f) for any given M. To choose the number of principal components, they used an informal ad hoc procedure; see their Section 5.2. James et al. (2000) used the informal percentage-of-variation-explained criterion and a likelihood ratio test, although they found that the performance of the latter was "ambiguous" and they recommended the use of the former.

Especially when the number of subjects is small, there will be ambiguity as to the choice of the number of principal components. We propose to acknowledge this ambiguity by taking a Bayesian approach to the problem.

An outline of this paper is as follows. In Section 2 we present the basic algorithm. Section 3 describes simulation studies, while Section 4 investigates empirical examples. Concluding remarks are presented in Section 5. Technical details are given in an appendix.

2. ALGORITHM

2.1 Parameters of the Basic Model and their Posterior Distributions

The model is as described at (2).

In what follows, let σ_μ² = λ_μ^{−1} and σ_f² = λ_f^{−1}. It is convenient to work with the parameterization ξ_1 = log(σ_ε), ξ_2 = log(σ_μ) and ξ_3 = log{diag(D_α)}. We write the number of principal components as M.

As described in Section 2.2, because of the restriction that Θ_f^T Θ_f = I, we will parameterize Θ_f in terms of auxiliary parameters Ω_f. If α = (α_1, …, α_n), the main effects vector is θ_μ and Y denotes the response data, then except for constants of proportionality, we write the posterior distribution of (Ω_f, ξ_3, ξ_2, ξ_1, σ_f², θ_μ, α) given Y as f(Ω_f, ξ_3, ξ_2, ξ_1, σ_f², θ_μ, α | Y); see the Appendix for details.

In formulating this problem, we face a number of challenges.

  • By definition, Θ_f is constrained to be orthogonal. We achieve this orthogonality by combining the polar coordinate approach of Zhang & Davidian (2001) with a Gram-Schmidt transformation; see Section 2.2.
  • Even though Θ_f is orthogonal by construction, it still needs to be smoothed.
  • Because the models are of different dimensions, a basic reversible jump type algorithm is attractive (Green, 1995). However, we observed difficulty in accepting the death step of such an algorithm, and in order to explore the model space we chose instead to implement the Stochastic Approximation Monte Carlo or SAMC algorithm of Liang, Liu & Carroll (2007).

2.2 Parameterizing Θf

In this section, we show how to write Θ_f as a function of other parameters Ω_f, in such a way that Θ_f is orthogonal and all elements of its first row are positive, as required by the identifiability result in Zhou et al. (2008).

Because Θ_f is an orthogonal p × M matrix with p > M, we turn to a polar coordinate representation, handily provided by Zhang & Davidian (2001). Let Θ_f = (ϑ_1, …, ϑ_M) where p > M + 1, each ϑ_j is p × 1, and define the p × 1 vector s_j as follows. Its first element is sin(γ_{1j}), its kth element for k = 2, …, p − 1 is sin(γ_{kj}) Π_{ℓ=1}^{k−1} cos(γ_{ℓj}), and its pth element is Π_{ℓ=1}^{p−1} cos(γ_{ℓj}). These are thus polar coordinates in p − 1 parameters. Take −π/2 ≤ γ_{kj} ≤ π/2 for all (k, j) except k = j = 1, and let 0 ≤ γ_{11} ≤ π/2 to force identifiability; see Zhou et al. (2008) for details. The vector s_j is of unit length: s_j^T s_j = ‖s_j‖² = 1. Define γ_j^T = (γ_{1j}, …, γ_{(p−1)j}) and Ω_f = (γ_1, …, γ_M); then we can generate γ_{kj} for k = 1, …, p − 1 and j = 1, …, M as uniform on their support.

To form an orthogonal p × M matrix, we use a Gram-Schmidt transformation. Define proj_e(s) = (s^T e)e. Then ϑ_1 = s_1 and, for k = 2, …, M, define ϑ_k* = (I − Σ_{j=1}^{k−1} ϑ_j ϑ_j^T) s_k and ϑ_k = ϑ_k*/‖ϑ_k*‖. Finally, take Θ_f = (ϑ_1, …, ϑ_M).

Write e^T = (1, 0, …, 0). Note that the first element of ϑ_k*, e^Tϑ_k*, is e^T(I − Σ_{j=1}^{k−1} ϑ_j ϑ_j^T)s_k. To force the first element of ϑ_k to be positive, we set ϑ_k = sign(e^Tϑ_k*) ϑ_k*/‖ϑ_k*‖.
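The mapping from the angles Ω_f to Θ_f described above can be sketched as follows. Variable names are ours, and for simplicity the angles are all drawn uniformly, ignoring the special constraint 0 ≤ γ_{11} ≤ π/2.

```python
import numpy as np

def polar_to_unit(gamma):
    """Map p-1 angles to a unit vector in R^p (Zhang & Davidian, 2001)."""
    p = gamma.size + 1
    s = np.empty(p)
    s[0] = np.sin(gamma[0])
    for k in range(1, p - 1):
        s[k] = np.sin(gamma[k]) * np.prod(np.cos(gamma[:k]))
    s[p - 1] = np.prod(np.cos(gamma))
    return s

def angles_to_theta(omega):
    """Gram-Schmidt the unit vectors s_1, ..., s_M into Theta_f."""
    p_minus_1, M = omega.shape
    theta = np.empty((p_minus_1 + 1, M))
    for j in range(M):
        s = polar_to_unit(omega[:, j])
        for k in range(j):                     # remove earlier projections
            s = s - (theta[:, k] @ s) * theta[:, k]
        s = s / np.linalg.norm(s)
        theta[:, j] = np.sign(s[0]) * s        # force positive first element
    return theta

rng = np.random.default_rng(0)
p, M = 5, 2
omega = rng.uniform(-np.pi / 2, np.pi / 2, size=(p - 1, M))
Theta = angles_to_theta(omega)
```

By construction the columns of `Theta` are orthonormal with positive first elements, matching the identifiability constraints of this section.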

2.3 Basic Reversible Jump Algorithm

Let M denote the number of principal components, and assume that it has a uniform prior distribution. We marginalized the posterior by integrating out θ_μ and α, resulting in the function g(Ω_f, ξ_3, ξ_2, ξ_1, σ_f² | Y) given in equation (6) in the Appendix. Let (Ω_f^{(v)}, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)}) denote the sample obtained at iteration v. Below we outline the algorithm assuming that there are only three possible numbers of principal components; however, this can easily be extended to include any number of principal components.

2.3.1 Initial Step

In this step, we decide which component of the model to update at iteration v + 1. We let “MH” here denote a Metropolis-Hastings step, see Section 2.3.2. Birth and death steps are described in Sections 2.3.3 and 2.3.4, respectively.

  • Draw U = Uniform(0, 1).
  • If M = 1 and U < 0.5 do "birth" step and set ω_{1,2} = 0.5; if M = 1 and U ≥ 0.5 do "MH" step.
  • If M = 2 and U < 1/3 do "birth" step and set ω_{2,3} = 1/3; if M = 2 and 1/3 ≤ U < 2/3 do "death" step and set ω_{2,1} = 1/3; if M = 2 and U ≥ 2/3 do "MH" step.
  • If M = 3 and U < 0.5 do "death" step and set ω_{3,2} = 0.5; if M = 3 and U ≥ 0.5 do "MH" step.
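The move-selection rule above can be sketched as a small helper (ours, hypothetical): it returns the chosen move together with the proposal probability ω that later enters the birth and death ratios.

```python
import random

def choose_move(M):
    """Pick birth, death, or MH for M in {1, 2, 3}, as in Section 2.3.1.

    Returns (move, omega) where omega is the proposal probability
    omega_{M, M'} for dimension-changing moves, or None for MH.
    """
    u = random.random()
    if M == 1:
        return ("birth", 0.5) if u < 0.5 else ("MH", None)
    if M == 2:
        if u < 1 / 3:
            return ("birth", 1 / 3)   # omega_{2,3}
        if u < 2 / 3:
            return ("death", 1 / 3)   # omega_{2,1}
        return ("MH", None)
    if M == 3:
        return ("death", 0.5) if u < 0.5 else ("MH", None)
    raise ValueError("this sketch only covers M = 1, 2, 3")
```

Note that birth is impossible at M = 3 and death is impossible at M = 1, so the chain stays inside the allowed model space.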

2.3.2 Metropolis-Hastings Step

In this step, we keep the dimensions of Ω_f^{(v)} and ξ_3^{(v)} unchanged while the parameters are updated. Draw U = Uniform(0, 1). If U ≤ 1/5 make Move-1; if 1/5 < U ≤ 2/5, make Move-2; if 2/5 < U ≤ 3/5, make Move-3; if 3/5 < U ≤ 4/5, make Move-4; and if 4/5 < U ≤ 1 make Move-5. In what follows, we set τ_1 = τ_2 = τ_3 = τ_4 = 0.1.

  • (a)
    (Move-1) Updating ξ_1^{(v)} and ξ_2^{(v)}. Let U_1 and U_2 be independent Uniform(−0.5, 0.5) random variables. Draw ξ_1* = ξ_1^{(v)} + 0.2U_1 and ξ_2* = ξ_2^{(v)} + 0.5U_2. Calculate the MH ratio
    r = f(Ω_f^{(v)}, ξ_3^{(v)}, ξ_2*, ξ_1*, σ_f^{2(v)} | Y) / f(Ω_f^{(v)}, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y).
    Accept (ξ_1*, ξ_2*) with probability min(1, r).
  • (b)
    (Move-2) Updating Ω_f^{(v)}. Draw a vector e* = Normal(0, τ_2² I) and add e* to a column of Ω_f^{(v)} selected at random. Denote the new Ω_f matrix by Ω_f*. Calculate the MH ratio
    r = {f(Ω_f*, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y) I(Ω_f* ∈ A)} / f(Ω_f^{(v)}, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y),
    where A denotes the set (−π/2, π/2)^{dim(Ω_f)}. Accept Ω_f* with probability min(1, r).
  • (c)
    (Move-3) Updating ξ_3^{(v)}. Let U_3 = Uniform(−0.5, 0.5). Draw the jth component of ξ_3* as ξ_{3,j}* = ξ_{3,j}^{(v)} + 0.1U_3. Calculate the MH ratio
    r = f(Ω_f^{(v)}, ξ_3*, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y) / f(Ω_f^{(v)}, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y).
    Accept ξ_3* with probability min(1, r).
  • (d)
    (Move-4) Updating σ_f^{2(v)}. This can be done by a Gibbs step; see the Appendix.
  • (e)
    (Move-5) Put the elements of ξ_3^{(v)} in descending order and exchange the columns of Ω_f^{(v)} and Θ_f^{(v)} accordingly. Because identifiability requires that the elements of D_α = diag{exp(ξ_3)} be in descending order, this step avoids the label switching problem. Accept the reordered Ω_f and ξ_3 with probability min(1, r).
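As one concrete instance, Move-2 can be sketched as below. Here `log_post` is a hypothetical stand-in for the log of the marginal posterior g of equation (6), and the indicator I(Ω_f* ∈ A) is implemented as an immediate rejection of out-of-range proposals.

```python
import numpy as np

rng = np.random.default_rng(3)
tau2 = 0.1   # proposal scale, as in Section 2.3.2

def move2(omega, log_post):
    """One Move-2 update: perturb a random column of Omega_f (a sketch)."""
    prop = omega.copy()
    j = rng.integers(omega.shape[1])
    prop[:, j] += rng.normal(0.0, tau2, size=omega.shape[0])
    # the indicator I(Omega* in A): reject if any angle leaves (-pi/2, pi/2)
    if np.any(np.abs(prop) >= np.pi / 2):
        return omega
    log_r = log_post(prop) - log_post(omega)
    # accept with probability min(1, r), working on the log scale
    return prop if np.log(rng.random()) < min(0.0, log_r) else omega
```

Working on the log scale avoids overflow when the posterior ratio is extreme, which matters later when SAMC multiplies these ratios by exponentiated weights.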

2.3.3 Birth Step

In this step, we try to append a new column to Ω_f^{(v)} and an element to ξ_3^{(v)}.

  • (a)
    (Angle sampling) Draw a random vector γ* whose p − 1 components are independent Uniform(−π/2, π/2), and draw ξ* = Normal(0, τ_4²). Set Ω_f* = (Ω_f^{(v)}, γ*) and ξ_3* = (ξ_3^{(v)}, ξ*).
  • (b)
    (Proposal acceptance) Calculate the MH ratio
    r = {f(Ω_f*, ξ_3*, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y) / f(Ω_f^{(v)}, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y)} × {τ_4 π^{p−1} / φ(ξ*/τ_4)} × (ω_{k+1,k} / ω_{k,k+1}).
    (3)
  • (c)
    Accept Ω_f* and ξ_3* with probability min(1, r).

2.3.4 Death step

In this step, we try to delete a random column of Ω_f^{(v)} and the corresponding element of ξ_3^{(v)}, giving proposals Ω_f* and ξ_3*. Calculate the MH ratio

r = {f(Ω_f*, ξ_3*, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y) / f(Ω_f^{(v)}, ξ_3^{(v)}, ξ_2^{(v)}, ξ_1^{(v)}, σ_f^{2(v)} | Y)} × {π^{−(p−1)} φ(ξ^{(v)}/τ_4) / τ_4} × (ω_{k−1,k} / ω_{k,k−1}),
(4)

where ξ^{(v)} denotes the selected element of ξ_3^{(v)}.

The two groups of parameters, (Ω_f, ξ_3) and (ξ_1, ξ_2, σ_f²), are updated unequally often in the above algorithm; this is permissible within the Gibbs sampling framework.

2.4 SAMC

Because of low acceptance rates in the death step of the algorithm in Section 2.3, we opt to implement the Stochastic Approximation Monte Carlo or SAMC algorithm (Liang et al., 2007, Section 5; Liang, 2009), a method designed to provide samples from the entire space that are then weighted to form posterior distributions. The version of the SAMC algorithm especially appropriate for our problem is given in equation (13) of Liang et al. (2007). We describe the algorithm for three principal components; detailed reasoning for this choice is provided at the end of this section.

Let v be the index in the MCMC calculations. Let ζ = (1/3, 1/3, 1/3) and let v_0 be a tuning parameter that can be adjusted. Define κ_v = v_0/max(v_0, v). Also define θ^{(v)} = (θ_{v,1}, θ_{v,2}, θ_{v,3})^T, where θ_{v,i} is the current estimate of the logarithm of the ith normalizing constant of the target density, which in this case equals log(1/ζ_i). Let T be the sampling space of allowable values of θ^{(v)}. Liang et al. (2007) make this a very large space, but because of the exponentiation involved in the adjusted Metropolis ratio, we have restricted it to the cube [−10, 10]³.

First start with v = 1 and θ^{(v)} = {log(3), log(3), log(3)}^T. Let M^{(v)} be the current number of principal components, and let M* be a proposal for the number of principal components. If M^{(v)} = M*, the Metropolis steps in Section 2.3.2 do not change. If M^{(v)} ≠ M*, and if r is the birth or death ratio in (3) or (4), respectively, the only thing that changes is that r is multiplied by exp(θ_{v,M^{(v)}} − θ_{v,M*}). Let the new model be M^{(v+1)}. Set θ* = θ^{(v)} + κ_{v+1}(e_{M^{(v+1)}} − ζ), where the three-component vector e_{M^{(v+1)}} is all zeros except for a 1.0 in location M^{(v+1)}. Then if θ* ∈ T, we set θ^{(v+1)} = θ*. Otherwise, we set θ^{(v+1)} = θ* + c*, where c* is chosen so that θ* + c* ∈ T.
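The weight update can be sketched as follows (our sketch; the recentring used to return θ to T is one simple choice of c*, not necessarily the authors'):

```python
import numpy as np

zeta = np.array([1 / 3, 1 / 3, 1 / 3])      # desired sampling frequencies
theta = np.log(np.array([3.0, 3.0, 3.0]))   # starting value: log(3) each
v0 = 1000.0                                  # tuning parameter

def samc_update(theta, visited_model, v):
    """One SAMC update after the chain lands in model `visited_model` (0-2)."""
    kappa = v0 / max(v0, v)                  # decreasing gain factor kappa_v
    e = np.zeros(3)
    e[visited_model] = 1.0
    theta = theta + kappa * (e - zeta)
    # keep theta inside T = [-10, 10]^3; recentring is one choice of c*,
    # valid because posterior weights depend on theta only through ratios
    if theta.max() > 10 or theta.min() < -10:
        theta = theta - theta.mean()
    return theta
```

Because the components of e − ζ sum to zero, each update leaves the mean of θ unchanged; only the relative weights of the three models move.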

Having run SAMC, we are now in a position to do posterior inference. Thus for example, assuming equal prior probabilities for the three possible models, the posterior probability that the number of principal components, M = j, is given as

pr(M = j | Y) = {Σ_{v=1}^T exp(θ_{v,M^{(v)}}) I(M^{(v)} = j)} / {Σ_{v=1}^T exp(θ_{v,M^{(v)}})},

where T is the number of iterations used to do posterior inference. Posterior quantities can be calculated similarly. For example, given that we have j principal components, let σ_μ^{(v)} be the value of σ_μ at iteration v. Then

E(σ_μ | M = j, Y) = {Σ_{v=1}^T σ_μ^{(v)} exp(θ_{v,M^{(v)}}) I(M^{(v)} = j)} / {Σ_{v=1}^T exp(θ_{v,M^{(v)}}) I(M^{(v)} = j)};  E(σ_μ | Y) = {Σ_{v=1}^T σ_μ^{(v)} exp(θ_{v,M^{(v)}})} / {Σ_{v=1}^T exp(θ_{v,M^{(v)}})}.
(5)

From (5), it can be seen that our approach employs model averaging to do posterior inference.
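The bookkeeping behind (5) can be sketched as follows; the draws here are fabricated purely to illustrate the re-weighting, so no real posterior is being computed.

```python
import numpy as np

# Hypothetical sketch: re-weight SAMC draws by exp(theta_{v, M^(v)}).
rng = np.random.default_rng(1)
T = 5000
models = rng.integers(1, 4, size=T)          # fake M^(v) draws in {1, 2, 3}
theta_v = rng.normal(0.0, 0.5, size=(T, 3))  # fake theta^(v) trajectories
sigma_mu = rng.lognormal(0.0, 0.2, size=T)   # fake sigma_mu^(v) draws

w = np.exp(theta_v[np.arange(T), models - 1])    # weight of each iteration
post_model = np.array(
    [(w * (models == j)).sum() for j in (1, 2, 3)]) / w.sum()
post_sigma = (w * sigma_mu).sum() / w.sum()      # model-averaged E(sigma_mu|Y)
```

The model probabilities and the unconditional posterior mean use the same weights, which is exactly why the approach amounts to model averaging.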

We provide details only for the case of up to 3 principal components because, in biological data, 3 principal components usually suffice, and most algorithms become unstable for a higher number of principal components; see the comments of Rice & Silverman (1991) and Hall, Müller & Wang (2006) in a different context. If more than 3 principal components are used in our simulations and data analyses, the 4th and higher diagonal elements of the variance matrix D_α, which are in decreasing order, become crowded around zero, making the corresponding elements of Θ_f essentially unidentified.

3. SIMULATION DETAILS AND RESULTS

3.1 Simulation From the Model

We simulated 200 data sets from model (2) with n = 20 subjects, m_i ≡ m = 11 equally-spaced observations per subject, and with Θ_f of dimension p × M, where p = 5, for both M = 1 and M = 2. When M = 1, D_α = 1.00, while with M = 2 we generated two scenarios, D_α = diag(1.00, 0.30) and D_α = diag(1.50, 0.75). We set σ_ε² = 0.25 and θ_μ = (−4.09, 7.07, −7.82, 6.59, −2.10)^T. The values of Ω_f were generated completely at random for each of the 200 data sets.

We ran SAMC with 200,000 Gibbs steps, and repeated the process 5 times with different starting values. In each case, we started by assuming 1 principal component, with Ω_f generated completely at random, and with starting values ξ_1 = log(σ_ε) = log{Uniform(0.4, 0.6)}, ξ_3 = Uniform(0.5, 1.5), ξ_2 = log(σ_μ) = Uniform(4, 6) and σ_f² = 2c, where c is chosen so that the smoother matrix has trace{(B^TB + K/c)^{−1}B^TB} = 4.5 degrees of freedom. We also repeated the analysis with no smoothing: as might be expected, there was slightly more variability in this latter case.

In all cases, σ_ε² and θ_μ were estimated with little bias. In addition, the within-subject function variances diag{cov(B_iΘ_fα_i)} were estimated with only minor bias. An example of this is given in Figure 1, for the case that D_α = diag(1.50, 0.75). The top left panel shows the true function and the 5th and 95th pointwise percentiles of the simulations: there is almost no bias, so the mean of the simulations is not displayed. The bottom right panel shows that the within-subject function variances are also essentially unbiased. The top right panel gives the posterior probabilities of the models: in this case, 2 principal components were selected for all 200 data sets. The bottom left panel gives 16 examples of the within-subject function variances.

Figure 1
Results of the simulation study in the case of two principal components with Dα = diag(1.50; 0.75), as described in Section 3.1. Top left: all 200 posterior means for Biθμ. The true value is within the 200 lines. Top right: the ...

We also compared our analysis with the method of Zhou et al. (2008). Their informal method chose 2 principal components in 90.5% of the cases, and 1 principal component in 7% of the cases. In Table 1, we compare the mean absolute error and the mean squared error for estimating the within-subject function variances for the method of Zhou et al. (2008) versus our method. SAMC had somewhat smaller mean squared errors for estimating the mean function, and much smaller mean squared errors for estimating the variance function.

Table 1
Results of the simulation study for either 1 principal component, “1PC”, or 2 principal components, “2PC”. The simulations labeled “Model” are described in Section 3.1, while those labeled “Function” ...

3.2 Simulation Away From the Model

In this case, we simulated with σ_ε² = 0.25, and in two settings with D_α = 1.00 and D_α = diag(1.50, 0.75). We used B-spline basis functions as in Zhou et al. (2008) with 7 basis functions. In the case of one principal component, the model is

Y_{iℓ} = sin(2πt_{iℓ}) + √2 cos(2πt_{iℓ}) α_i + ε_{iℓ},

while with two principal components, and α_i = (α_{i1}, α_{i2})^T,

Y_{iℓ} = sin(2πt_{iℓ}) + √2 cos(2πt_{iℓ}) α_{i1} + 11.65{(t_{iℓ} − 0.5)² − cos(2πt_{iℓ})/π²} α_{i2} + ε_{iℓ}.
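As a numerical sanity check (ours, not the paper's), the two component functions are orthonormal on [0, 1], which is also where the normalizing constant 11.65 comes from:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100001)
dt = t[1] - t[0]
w = np.full(t.size, dt)                  # trapezoid quadrature weights
w[0] = w[-1] = dt / 2

f1 = np.sqrt(2.0) * np.cos(2 * np.pi * t)
f2 = 11.65 * ((t - 0.5) ** 2 - np.cos(2 * np.pi * t) / np.pi ** 2)

assert abs(np.sum(w * f1 * f1) - 1.0) < 1e-3   # unit norm
assert abs(np.sum(w * f2 * f2) - 1.0) < 1e-3   # 11.65 is the normalizer
assert abs(np.sum(w * f1 * f2)) < 1e-3         # mutually orthogonal
```

Orthonormal component functions are what the identifiability constraint Θ_f^T Θ_f = I_M approximates in the spline representation.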

We used 7 basis functions with 3 interior knots. All functions are adequately but not exactly approximated with this number of knots. The same priors were used as above, except that we centered them at the estimates from Zhou et al. (2008). We also experimented with fixed priors as in Section 3.1, with little change in the results.

The results are again reported in Table 1, with a graphical representation of the two principal components case given in Figure 2. In the former, we see that SAMC is very similar to the method of Zhou et al. (2008) for estimating the mean function sin(2πt), and much better in terms of estimating the variance function. Figure 2 shows that there is more ambiguity in the posterior model probabilities across the simulated data sets, although in 97% of the cases two principal components have posterior probability > 0.50. We actually think that this ambiguity is a strength of our approach. If the model is not correct, then there should be ambiguity in the choice of the number of principal components, with some random data sets coming close to being fit by 2 principal components, and others needing 3.

Figure 2
Results of the simulation study in the case of two principal components with Dα = diag(1.50, 0.75), as described in Section 3.2. Top left: the true mean function and the pointwise 5th and 95th percentiles of the estimated mean function. The true ...

4. EMPIRICAL EXAMPLES

4.1 Outline

The simulations presented in Section 3 suggest that the method of Zhou et al. (2008) and our SAMC method will have roughly equal performance in terms of estimating the mean function, but will tend to have different performance when estimating the variance function. The SAMC method of course comes endowed with Bayesian posterior credible intervals. In this section, we will describe two data sets, each with two responses, and then display the analysis of all four. All analyses used 7 basis functions: the estimates of σ_ε², σ_μ² = 1/λ_μ, σ_f² = 1/λ_f and the first component of D_α from Zhou et al. (2008) were used as starting values. The starting value of Θ_f was random. SAMC was run with 200,000 iterations twice, and the results pooled.

4.2 RNA and CD4 Data

Our first example concerns data from an AIDS clinical trial. Virologic and immunologic surrogate markers such as plasma HIV RNA copies (viral load) and CD4+ cell counts currently play important roles in evaluating antiviral therapies in AIDS clinical research, and each has been used as a sole surrogate marker in AIDS clinical trials. Here we study the time course of plasma HIV RNA copies and CD4+ cell counts obtained from an AIDS clinical study conducted by the AIDS Clinical Trials Group (ACTG 315). In this study, forty-six evaluable HIV-1 infected patients were treated with potent antiviral therapy consisting of ritonavir, 3TC and AZT; see Lederman, Connick, Landay, Kuritzkes, Spritzler, St Clair, Kotzin, Fox, Chiozzi, Leonard, Rousseau, Wade, Roe, Martinez & Kessler (1998) and Wu & Ding (1999) for more details on this study. After initiation of potent antiviral treatment at day 0, patients were followed for up to 10 visits, and at each visit both viral load and CD4+ cell counts were monitored simultaneously. The actual visit times are irregularly spaced and differ across patients, varying from day 0 (first visit) to day 196. We took logarithms of both the CD4 data and the RNA data and then standardized them to have mean zero and variance one. We also standardized follow-up times to the unit interval. In this case, what we have been calling "time", t_{iℓ}, is indeed time.

4.3 Phenotype Data

We apply our method to phenotypic longitudinal data on colonic crypts in the rat model. Colonic crypts are tube-like structures in the colon formed by cells that undergo development from stem cells to fully differentiated mature colonic cells. This development begins at the bottom of the colonic crypt, where the mother stem cells divide and produce new colon cells that mature as they travel up the crypt to the colon surface. Colonic DNA damage can change the behavior of any cell by fundamentally altering its normal function and potentially inducing uncontrollable growth. This can contribute to carcinogenesis; see Hong, Bancroft, Turner, Davidson, Murphy, Carroll, Chapkin & Lupton (2005).

In this model, the times t_{iℓ} correspond to the cell locations, normalized to the unit interval. The responses are the DNA adduct levels, a measure of DNA damage, in the proximal and in the distal region of the colon. In this analysis, there were 30 subjects. The DNA adduct levels in the proximal and distal regions were log-transformed, and then standardized to have mean zero and variance one.

In this case, what we call "time" is not time, but rather the location of the cell. Since stem cells, colloquially "baby" cells, are at the bottom of the crypts (t near zero), and mature cells are at the top of the crypts (t near one), the functions may be thought of as functions of the age of the cells.

4.4 Results

The method of Zhou et al. (2008) picked 1 principal component in all cases. The SAMC method had posterior probabilities of 1 principal component equal to 1.0 for the CD4 and RNA data. However, for the distal adducts, the posterior probability of two principal components was 0.92, while for the proximal data 1 principal component had a posterior probability of 0.60. In all cases, the posterior probability of 3 principal components equalled 0.00.

As expected from the simulations, the estimated posterior mean function b^T(t)θ_μ was almost identical to that found by Zhou et al. (2008) (not displayed). In Figure 3, we display the posterior means and medians of b^T(t)θ_μ, along with 90% pointwise credible intervals. For the adduct data, the distribution of t_{iℓ} was almost uniform, and hence the lengths of the credible intervals might be expected to be nearly constant, a conjecture seen in the plots. On the other hand, in the CD4 and RNA data, the distribution of t_{iℓ} is highly skewed to the right, with most of the observations being between 0.0 and 0.4. In this case, one would expect the credible intervals to be much wider for larger values of t_{iℓ} than for smaller values, an expectation borne out in the plot.

Figure 3
Results of the data analysis of Section 4. Displayed are the posterior mean function (dash-dotted magenta line), the posterior median (black solid line) and 90% pointwise posterior credible intervals (blue and red dashed lines). The labels for each graph ...

Figure 4 displays results for estimating the marginal variances b^T(t) Θ_f D_α Θ_f^T b(t) (magenta dot-dashed lines), the estimate from Zhou et al. (2008) (black solid lines) and posterior pointwise 90% credible intervals. Variance functions are of course much harder to estimate than mean functions, and as seen in the simulations, here we do not expect SAMC and the method of Zhou et al. (2008) to be nearly identical; they are not, although the differences are fairly small given the uncertainties.

Figure 4
Results of the data analysis of Section 4. Displayed are the posterior mean variance function (dash-dotted magenta line), the estimated variance function from the method of Zhou et al. (2008) and labeled “pdfa” (black solid line) and 90% ...

5. DISCUSSION

We have shown how to use Stochastic Approximation Monte Carlo (SAMC) for Bayesian principal component selection and model estimation for longitudinal functional data. Our simulations suggest that our method is comparable to the method of Zhou et al. (2008) in terms of frequentist mean squared errors when estimating the mean function, and generally considerably more accurate when estimating the random variation about the mean function. While there are statistical gains in our approach, it is still MCMC based, and hence in terms of computing time generally much slower than the EM-based algorithm. Both methods scale up similarly when the number of subjects increases.

An interesting case to consider would be one where the random functions g_i(·) in (1), or their spline equivalents b^T(·)Θ_f α_i in (2), are spatially correlated. Problems with such spatial correlation structure have been investigated by Baladandayuthapani, Hong, Mallick, Lupton, Turner & Carroll (2008) in a Bayesian manner and by Li, Wang, Hong, Turner, Lupton & Carroll (2007) in a frequentist manner. In unpublished work, we have developed a direct generalization of model (2) for this context, which again involves estimating the number of principal components. We believe that SAMC can be used successfully in this context.

Another interesting question is how to deal with correlation when, for example, there are pairs of responses, such as might occur for familial data where different family members are followed in time (Sneddon & Sutradhar, 2004). Zhou et al. (2008) discuss a generalization of model (2) that may be one way of attacking the problem within the principal components framework; they use an EM algorithm. It would not be a great leap to generalize the SAMC algorithm to handle this case.

ACKNOWLEDGEMENTS

We wish to thank the editor and a referee for their detailed reading of the original manuscript and their many suggestions that led to a significant improvement in the paper. We also thank the participants of the International Symposium in Statistics on Inferences in Generalized Linear Longitudinal Mixed Models, whose comments led us to rethink important parts of this work.

Martinez was supported by a post-doctoral training grant from the National Cancer Institute (R25T-CA90301), Carroll's research was supported by a grant from the National Cancer Institute (CA57030), Liang's research was supported by a grant from the National Science Foundation (DMS-0706755) and Zhou was supported by a grant from the National Science Foundation (DMS-0907170). Both Carroll and Liang were also supported by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST).

APPENDIX

A.1 Marginal Posterior Density Computation

We provide some detail of integrations which lead to the marginal posterior density

f(Ω_f, ξ_3, ξ_2, ξ_1, σ_f² | Y).

Recall that ξ_1 = log(σ_ε), ξ_2 = log(σ_μ) and ξ_3 = log{diag(D_α)}. Our prior distributions are in terms of (ξ_1, ξ_2, ξ_3), but since they are not being integrated, our displayed equations will use (σ_ε, σ_μ, D_α).

Our prior distributions were ξ_1 = Normal(μ_1, σ_1²), ξ_2 = Normal(μ_2, σ_2²), and ξ_3 = Normal(μ_3, σ_3² I). It is trickier to penalize Θ_f as done by Zhou et al. (2008), because Θ_f must be orthogonal. We took the prior for [σ_f² | Ω_f] to be IG(A + pM/2, B + Σ_{j=1}^M ϑ_j^T K ϑ_j), with A = B = 3, where ϑ_j is the jth column of Θ_f. This has the convenient behavior that samples from this distribution are of the same form as samples for a smoothing parameter in an ordinary penalized spline regression without constraints; see Carroll, Ruppert, Tosteson, Crainiceanu & Karagas (2004). The prior distribution for Ω_f is independent uniforms on [−π/2, π/2], and is written as p(Ω_f) = π^{−M(p−1)} I(−π/2 < Ω_f < π/2).
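Since the likelihood does not involve σ_f², the Gibbs step for σ_f² (Move-4 of Section 2.3.2) reduces to a single draw from this inverse gamma distribution. A sketch, with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(4)
A_, B_ = 3.0, 3.0   # the hyperparameters A = B = 3 stated above

def draw_sigma_f2(Theta, K):
    """Draw sigma_f^2 ~ IG(A + pM/2, B + sum_j theta_j' K theta_j)."""
    p, M = Theta.shape
    shape = A_ + p * M / 2.0
    # sum_j theta_j' K theta_j over the columns theta_j of Theta
    rate = B_ + np.einsum('pj,pq,qj->', Theta, K, Theta)
    # IG(shape, rate) draw = 1 / Gamma(shape, scale = 1/rate) draw
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```

Note the NumPy gamma sampler is parameterized by shape and scale, so the inverse-gamma rate enters as a reciprocal scale.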

Let p(Ω_f, ξ_3, ξ_2, ξ_1, σ_f²) denote the joint prior density for (Ω_f, ξ_3, ξ_2, ξ_1, σ_f²). Then except for a normalizing constant, the posterior density is

f(Ω_f, ξ_3, ξ_2, ξ_1, σ_f², θ_μ, α | Y) = Π_{i=1}^n {p(Y_i | Ω_f, ξ_3, ξ_2, ξ_1, σ_f², θ_μ, α_i) p(α_i | ξ_3)} p(θ_μ) p(Ω_f, ξ_3, ξ_2, ξ_1, σ_f²).

For convenience in displaying the equations, where it causes no confusion we will write this posterior in terms of (σϵ, σμ, Dα), which is given as

(σ_ε²)^{−N/2} exp{−(2σ_ε²)^{−1} Σ_{i=1}^n (Y_i − B_iθ_μ − B_iΘ_fα_i)^T (Y_i − B_iθ_μ − B_iΘ_fα_i)} × |2πD_α|^{−n/2} exp{−(1/2) Σ_{i=1}^n α_i^T D_α^{−1} α_i} × |σ_μ^{−2}K|^{1/2} exp{−(1/2) θ_μ^T (K/σ_μ²) θ_μ} × p(Ω_f, ξ_3, ξ_2, ξ_1, σ_f²),

where N = Σ_{i=1}^n m_i. Integrating with respect to α = (α_1, …, α_n) gives

f(Ω_f, ξ_3, ξ_2, ξ_1, σ_f², θ_μ | Y) = (σ_ε²)^{−N/2} × |2πD_α|^{−n/2} × |σ_μ^{−2}K|^{1/2} × (Π_{i=1}^n |C_{2i}|^{1/2}) × exp{−(2σ_ε²)^{−1} Σ_{i=1}^n R_i^T R_i − (1/2) θ_μ^T (K/σ_μ²) θ_μ + (1/2) Σ_{i=1}^n C_{1i}^T C_{2i} C_{1i}} × p(Ω_f, ξ_3, ξ_2, ξ_1, σ_f²),
where C_{2i} = (Θ_f^T B_i^T B_i Θ_f/σ_ε² + D_α^{−1})^{−1}, C_{1i}^T = R_i^T B_i Θ_f/σ_ε², and R_i = Y_i − B_iθ_μ. Now we integrate with respect to θ_μ and obtain

f(Ω_f, ξ_3, ξ_2, ξ_1, σ_f² | Y) = (σ_ε²)^{−N/2} |2πD_α|^{−n/2} |σ_μ^{−2}K|^{1/2} (Π_{i=1}^n |C_{2i}|^{1/2}) × |C_4|^{1/2} exp{−(1/2) Σ_{i=1}^n Y_i^T (σ_ε^{−2} I_{m_i} − Ψ_i) Y_i + (1/2) C_3^T C_4 C_3} × p(Ω_f, ξ_3, ξ_2, ξ_1, σ_f²),
(6)

where C_4 = (σ_μ^{−2}K + σ_ε^{−2} Σ_{i=1}^n B_i^T B_i − Σ_{i=1}^n B_i^T Ψ_i B_i)^{−1}, C_3^T = σ_ε^{−2} Σ_{i=1}^n Y_i^T B_i − Σ_{i=1}^n Y_i^T Ψ_i B_i, and Ψ_i = σ_ε^{−4} B_i Θ_f C_{2i} Θ_f^T B_i^T.

A.2 Conditional Posterior Density of θμ and αi

Easy calculations show that samples of θ_μ and α_i given the data and the remaining parameters can be obtained as

[θ_μ | Ω_f, ξ_3, ξ_2, ξ_1, σ_f², Y] = Normal(C_4C_3, C_4);  [α_i | Ω_f, ξ_3, ξ_2, ξ_1, σ_f², θ_μ, Y] = Normal(C_{2i}C_{1i}, C_{2i}).

REFERENCES

  • Baladandayuthapani V, Hong MY, Mallick BK, Lupton JR, Turner ND, Carroll RJ. Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics. 2008;64:64–73.
  • Besse P, Ramsay JO. Principal components-analysis of sampled functions. Psychometrika. 1986;51:285–311.
  • Boente G, Fraiman R. Kernel-based functional principal components. Statistics & Probability Letters. 2000;48:335–345.
  • Cardot H. Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics. 2000;12:503–538.
  • Carroll RJ, Ruppert D, Tosteson TD, Crainiceanu C, Karagas MR. Nonparametric regression and instrumental variables. Journal of the American Statistical Association. 2004;99:736–750.
  • Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
  • Hall P, Hosseini-Nasab M. On properties of functional principal components analysis. Journal of the Royal Statistical Society, Series B. 2006;68:109–126.
  • Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics. 2006;34:1493–1517.
  • Huang JZ, Shen H, Buja A. Functional principal components analysis via penalized rank one approximation. Electronic Journal of Statistics. 2008;2:678–695.
  • Hong MY, Bancroft LK, Turner ND, Davidson LA, Murphy ME, Carroll RJ, Chapkin RS, Lupton JR. Fish oil decreases oxidative DNA damage by enhancing apoptosis in rat colon. Nutrition and Cancer. 2005;52(2):166–175.
  • James GM, Hastie TJ, Sugar CA. Principal component models for sparse functional data. Biometrika. 2000;87:587–602.
  • Jank W, Shmueli G. Functional data analysis in electronic commerce research. Statistical Science. 2006;21:155–166.
  • Lederman MM, Connick E, Landay A, Kuritzkes DR, Spritzler J, Clair M, Kotzin BL, Fox L, Chiozzi MH, Leonard JM, Rousseau F, Wade M, Roe JD, Martinez A, Kessler H. Immunological responses associated with 12 weeks of combination antiretroviral therapy consisting of Zidovudine, Lamivudine & Ritonavir: Results of AIDS Clinical Trials Group Protocol 315. J. Infectious Diseases. 1998;178:70–79.
  • Li Y, Wang N, Hong MY, Turner ND, Lupton JR, Carroll RJ. Nonparametric estimation of correlation functions in longitudinal and spatial data, with application to colon carcinogenesis experiments. Annals of Statistics. 2007;35:1608–1643.
  • Liang F. On the use of stochastic approximation Monte Carlo for Monte Carlo integration. Statistics and Probability Letters. 2009;79:581–587.
  • Liang F, Liu C, Carroll RJ. Stochastic approximation in Monte Carlo computation. Journal of the American Statistical Association. 2007;102:305–320.
  • Ocaña FA, Aguilera AM, Escabias M. Computational considerations in functional principal component analysis. Computational Statistics. 2007;22:449–465.
  • Sneddon G, Sutradhar BC. On semiparametric familial longitudinal models. Statistics & Probability Letters. 2004;69(3):369–379.
  • Ramsay JO, Dalzell CJ. Some tools for functional data analysis. Journal of the Royal Statistical Society, Series B. 1991;53:539–572. With discussion.
  • Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. New York: Springer; 2005.
  • Reiss PT, Ogden T. Functional principal component regression and functional partial least squares. Journal of the American Statistical Association. 2007;102:984–996.
  • Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B. 1991;53:233–243.
  • Rice JA, Wu CO. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics. 2001;57:253–259.
  • Rice JA. Functional and longitudinal data analysis: Perspectives on smoothing. Statistica Sinica. 2004;14:613–629.
  • Silverman BW. Smoothed functional principal components analysis by choice of norm. Annals of Statistics. 1996;24:1–24.
  • Wu H, Ding A. Population HIV-1 dynamics in vivo: application models and inference tools for virological data from AIDS clinical trials. Biometrics. 1999;55:410–418.
  • Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005a;100:577–590.
  • Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. Annals of Statistics. 2005b;33:2873–2903.
  • Zhang D, Davidian M. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics. 2001;57:795–802.
  • Zhou L, Huang JZ, Carroll RJ. Joint modeling of paired sparse functional data using principal components. Biometrika. 2008;95:601–619.