Neuroimage. Author manuscript; available in PMC 2014 May 15.
PMCID: PMC3621129
NIHMSID: NIHMS440022

Multiscale Adaptive Generalized Estimating Equations for Longitudinal Neuroimaging Data

Abstract

Many large-scale longitudinal imaging studies have been or are being widely conducted to better understand the progress of neuropsychiatric and neurodegenerative disorders and normal brain development. The goal of this article is to develop a multiscale adaptive generalized estimating equations (MAGEE) method for the spatial and adaptive analysis of neuroimaging data from longitudinal studies. MAGEE is applicable to making statistical inference on regression coefficients in both balanced and unbalanced longitudinal designs and even twin and familial studies, whereas standard software platforms have several major limitations in handling these complex studies. Specifically, conventional voxel-based analyses in these software platforms involve Gaussian smoothing of the imaging data followed by independently fitting a statistical model at each voxel. However, the conventional smoothing methods suffer from the lack of spatial adaptivity to the shape and spatial extent of the region of interest and from the arbitrary choice of smoothing extent, while independently fitting statistical models across voxels does not account for the spatial properties of imaging observations and the noise distribution. To address such drawbacks, we adapt a powerful propagation-separation (PS) procedure to sequentially incorporate the neighboring information of each voxel and develop a novel strategy that updates only a set of parameters of interest, while fixing other nuisance parameters at their initial estimates. Simulation studies and real data analysis show that MAGEE significantly outperforms voxel-based analysis.

Keywords: Gaussian smoothing, generalized estimating equations, hypothesis, longitudinal studies, multiscale adaptive, voxel-based analysis

1. Introduction

Many large-scale longitudinal neuroimaging studies, including the Alzheimer's Disease Neuroimaging Initiative and the NIH magnetic resonance imaging study of normal brain development, have been or are being widely conducted to better understand the progress of neuropsychiatric and neurodegenerative diseases or normal brain development (Evans and Group., 2006; Almli et al., 2007; Petersen et al., 2010; Skup et al., 2011; Meltzer et al., 2009; Kim et al., 2010). The primary goal of longitudinal neuroimaging studies is to characterize individual change in neuroimaging measurements (e.g., volumetric and morphometric measurements) over time and the effect of covariates (or predictors) of interest, such as diagnostic status and gender, on the individual change (Petersen et al., 2010; Evans and Group., 2006). A distinctive feature of longitudinal neuroimaging data is that they have a temporal order. Imaging measurements of the same individual usually exhibit positive correlation, and the strength of the correlation decreases with the time separation. Ignoring the temporal correlation structure in imaging measures would likely compromise subsequent statistical inference, for example through increases in false positive and false negative errors, which may lead to misleading scientific conclusions (Diggle et al., 2002; Fitzmaurice et al., 2004). However, the analysis of longitudinal imaging data has been hindered by the lack of advanced tools that effectively integrate advanced image processing and statistical methods for analyzing complex and correlated imaging data along with behavioral and clinical data.

Standard software platforms have several major limitations. Standard neuroimaging software platforms, including statistical parametric mapping (SPM) (www.fil.ion.ucl.ac.uk/spm/) and the FMRIB Software Library (FSL) (www.fmrib.ox.ac.uk/fsl/), among many others, cannot accurately model longitudinal data when there are more than two visits (repeated measurements) (Nichols and Waldorp, 2010). Specifically, FSL can only accommodate a univariate measure at the second level (e.g., comparing visit 2 − visit 1), and SPM, even though it models the correlation among repeated measures, unrealistically assumes that the correlation is equal over the whole brain. In contrast, proper longitudinal modeling is available in standard statistical software platforms, including proc MIXED and proc GEE in SAS and lme4 and nlme in R. Recently, Analysis of Functional NeuroImages (AFNI) (afni.nimh.nih.gov/afni/) has adopted the linear mixed effects modeling packages nlme (Pinheiro et al., 2011) and lme4 (Bates et al., 2011) in R for longitudinal functional magnetic resonance imaging data. Moreover, FreeSurfer implements linear mixed effects modeling in its LME Matlab toolbox (http://surfer.nmr.mgh.harvard.edu/fswiki/LinearMixedEffectsModels) (Bernal-Rusiel et al., 2013). The conventional analysis of longitudinal neuroimaging data, referred to as voxel-based analysis, may be carried out in two major steps: Gaussian smoothing of the imaging data and subsequently fitting a statistical model at each voxel by using either SAS or R. As discussed below, the voxel-based analysis is generally not optimal in power, and the use of Gaussian smoothing may introduce substantial bias into the statistical results.

The voxel-based analysis has several major limitations. First, it is common to apply a single Gaussian kernel with a full width at half maximum in the range of 8–16 mm to the imaging data in order to account for registration errors, to Gaussianize the data, and to integrate imaging signals from a region, rather than from a single voxel. As pointed out in (Jones et al., 2005; Zhang and Davatzikos, 2011; Zhao et al., 2012; T.Ball et al., 2012), such a Gaussian smoothing method suffers from several major drawbacks, including the arbitrary choice of smoothing extent and the lack of spatial adaptivity to the shape and spatial extent of the region of interest. Thus, it is suboptimal in power. In addition, as discussed in (Li et al., 2012), directly smoothing imaging data from twin and familial studies can introduce substantial bias in estimating the genetic and environmental factors and lead to a dramatic increase in the number of false positives and false negatives. Second, as pointed out in (Worsley et al., 2004; Li et al., 2011), the voxel-based analysis essentially treats all voxels as independent units in the estimation stage, and thus it does not explicitly account for the spatial properties (e.g., location and smoothness) of imaging observations.

There have been several attempts to address the limitations of voxel-based analysis. In (Zhang and Davatzikos, 2011), an optimally-discriminative voxel-based analysis was proposed to determine a spatially adaptive smoothing of the images, followed by applying voxel-wise group analysis. The key drawback of the optimally-discriminative voxel-based analysis is that it uses the imaging data twice, for both determining the optimal weights and performing the group analysis, and thus the test statistics calculated for the group analysis do not have a simple asymptotic null distribution, such as the t distribution. Thus, the optimally-discriminative voxel-based analysis has to resort to permutation tests to calculate the p-values of the test statistics. However, permutation methods are not only computationally intensive, but also require so-called complete exchangeability. Such complete exchangeability is in fact a very strong assumption, and thus the optimally-discriminative voxel-based analysis is limited to univariate imaging measures and two-group comparisons and cannot control for other continuous covariates of interest, such as age. Moreover, the optimally-discriminative voxel-based analysis has not been extended to analyze longitudinal neuroimaging data. In (Tabelow et al., 2006, 2008; Polzehl et al., 2010), the authors generalized a powerful propagation-separation (PS) approach (Polzehl and Spokoiny, 2000, 2006) to develop a multiscale adaptive linear model to adaptively and spatially de-noise functional magnetic resonance images and diffusion tensor images from a single subject and to analyze neuroimaging data from cross-sectional studies. Recently, in (Zhu et al., 2009; Li et al., 2011; Skup et al., 2012), a multiscale adaptive regression model and a multiscale adaptive generalized method of moments approach were developed to integrate the PS approach (Polzehl and Spokoiny, 2000, 2006) with statistical modeling at each voxel for the spatial and adaptive analysis of neuroimaging data from multiple subjects. All these PS-related methods, however, only allow smoothing all parameters simultaneously.

This article has two major aims. The first is to review a class of statistical methods called generalized estimating equations (GEE) for general neuroimaging researchers. We illustrate that GEE is a powerful tool for making statistical inference on regression coefficients in both balanced and unbalanced longitudinal designs and even in twin and familial studies. The second aim is to develop a multiscale adaptive generalized estimating equations (MAGEE) method for the spatial and adaptive analysis of longitudinal neuroimaging data. Compared with the existing literature including (Zhu et al., 2009; Li et al., 2011; Skup et al., 2012; Polzehl et al., 2010), we make several novel contributions. (i) MAGEE integrates the PS approach with GEE, which is a semiparametric model, into a simultaneous smoothing and estimation framework, allowing adaptive smoothing of images while accounting for the spatial pattern of activation regions. (ii) We develop a novel strategy for estimation and hypothesis testing in MAGEE. Specifically, the new strategy allows smoothing only the images of a set of parameters of interest, while fixing other parameters at their initial estimates. For instance, the scientific interest of many neuroimaging studies typically focuses on the comparison of imaging measures across diagnostic groups, while controlling for age, gender, and other covariates. MAGEE allows smoothing only the images of the parameter estimates of the diagnostic effect without smoothing the images of other parameter estimates, such as those of age and gender. (iii) We use simulated data sets to show that the new strategy can yield a dramatic gain in statistical power in some scenarios. (iv) Theoretically, in the appendix, the adaptive estimates and test statistics of MAGEE are shown to have appropriate statistical properties. We will validate companion software for MAGEE and release it to the public.

2. Methods

2·1. Balanced Versus Unbalanced Designs

In a typical longitudinal study, one collects a fixed number of repeated measurements on all study participants at a set of common time points. When all individuals have the same number of repeated measurements on a common set of occasions, the study is “balanced” over time. Many of the early statistical methods, such as repeated-measures analysis of variance, have been developed specifically for balanced longitudinal designs. However, in most longitudinal studies over a relatively long duration in the health sciences, some individuals almost always miss their scheduled visit or date of observation. Consequently, the sequence of observation times may vary across individuals. In that case, we call the data “unbalanced” over time.

2·2. Missing Data

Missing data, a ubiquitous problem in longitudinal studies, can arise for various reasons, such as skipped assessments, bad MRI scans, or study dropout. Therefore, in practice, longitudinal data are often unbalanced and are frequently called “incomplete” to emphasize the fact that an intended measurement for an individual could not be obtained. Complete case analysis, a common and simple method for handling incomplete data, excludes all individuals with incomplete measurements from the analysis. This approach, however, can be highly inefficient when a large proportion of the subjects are excluded. Moreover, when the individuals with complete data are not a random sample from the target population, this approach can also seriously bias estimates of longitudinal change. Fortunately, most statistical methods for longitudinal analysis, such as the GEE discussed below, accommodate incomplete data under less stringent assumptions, such as missing at random (Diggle et al., 2002; Fitzmaurice et al., 2004). A good longitudinal analysis should include a serious assessment of these assumptions for the data at hand and a consideration of the effects of their violation on the results of the analysis, which is beyond the scope of this paper. See, for example, (Ibrahim and Molenberghs, 2009) for an exhaustive review of missing data methods in longitudinal studies.

2·3. Data Structure

In a typical longitudinal neuroimaging study, we observe repeated imaging and clinical measures from n subjects. Let mi denote the total number of time points and tij be the j-th time point for the i-th subject, in which i = 1, …, n and j = 1, …, mi. Let $\mathcal{D}$ be an imaging space and d represent a specific voxel of $\mathcal{D}$. Specifically, for the i-th subject at time tij, we observe imaging data, denoted by Yij = {yi(tij, d) : $d \in \mathcal{D}$}, and a q × 1 covariate vector of interest, denoted by xij = xi(tij), where yi(t, d) is a p × 1 vector of imaging measures (e.g., diffusion tensor) at the voxel d. The xij may include age, time point, gender, genetic marker, diagnostic status, height, and their interactions, among many others. Without loss of generality, we assume that all imaging data have already been registered to a common template and that $\mathcal{D}$ is the common template. Moreover, our main scientific interest is focused on characterizing the longitudinal change of the imaging measures.

2·4. GEE Model Formulation

The GEE method for longitudinal data has two main components: (i) a mean model for the mean response and its dependence on covariates; and (ii) a working covariance model for the covariance among repeated measures. GEE allows characterization and comparison of changes in the imaging measure of interest over time, complex models for the covariance, and accommodation of incomplete data. GEE can also handle unbalanced data, accommodate continuous and discrete covariates, and model the covariance in a parsimonious way (Liang and Zeger, 1986; Diggle et al., 2002). Moreover, GEE is also free of distributional assumptions. For notational simplicity, we temporarily drop the voxel d from our notation.

The mean model of GEE characterizes the trajectories of the imaging measure over time and their association with inter-individual differences in selected covariates (e.g., diagnostic group and gender). Such trajectories can be linear or nonlinear. Specifically, the mean model of GEE is given by

$E[y_i(t_{ij})] = \mu(x_{ij}, \beta),$
(1)

where β is a q × 1 vector of regression coefficients and μ(·, ·) is a p × 1 vector of known functions that describe longitudinal change in the responses. For instance, it is common to set $\mu(x_{ij}, \beta) = x_{ij}^T\beta$ for linear models. Furthermore, we consider a quadratic growth model given by

$\mu(x_{ij}, \beta) = \beta_1 + \beta_2 g_i + \beta_3 t_{ij} + \beta_4 t_{ij}^2 + \beta_5 g_i t_{ij} + \beta_6 g_i t_{ij}^2,$
(2)

where β = (β1, …, β6)T and $x_{ij} = (1, g_i, t_{ij}, t_{ij}^2, g_i t_{ij}, g_i t_{ij}^2)^T$, in which gi represents gender. As an illustration, we considered the fractional anisotropy values of 38 subjects obtained from our neonatal study of normal brain development; each subject has three repeated measures. Panels (a)–(d) of Fig. 1 are, respectively, the line plots of fractional anisotropy versus age with the fitted quadratic growth curves for four selected voxels.
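As a small worked example of the quadratic growth model (2), the snippet below evaluates μ(x, β) for female and male subjects at a few time points; the coefficient ordering follows the reconstruction of (2) above, and the numerical values are purely illustrative assumptions.

```python
import numpy as np

def mu_quadratic(beta, g, t):
    """Quadratic growth mean model (2):
    mu = b1 + b2*g + b3*t + b4*t**2 + b5*g*t + b6*g*t**2."""
    b1, b2, b3, b4, b5, b6 = beta
    return b1 + b2 * g + b3 * t + b4 * t ** 2 + b5 * g * t + b6 * g * t ** 2

# Illustrative coefficients only; t in years, g = 1 for male, 0 for female
beta = np.array([0.10, 0.02, 0.30, -0.05, 0.05, -0.01])
t = np.array([0.0, 1.0, 2.0])
print(mu_quadratic(beta, g=0, t=t))   # female trajectory at ages 0, 1, 2
print(mu_quadratic(beta, g=1, t=t))   # male trajectory at ages 0, 1, 2
```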

Fig. 1
Results of Real Longitudinal Data I: the raw line curves (in black) and fitted growth curves (in red) from MAGEE-A at a selected voxel obtained from the genu (panel (A)), splenium (panel (B)), optic radiation (panel (C)), and cerebral peduncle (panel ...

The working covariance model of GEE characterizes the correlation among repeated measurements on an individual and heterogeneous variability, which are two common features of longitudinal data. That is, knowledge of the value of the response on one occasion provides information about the likely value of the response on a future occasion, and the variance of the response changes over the duration of the study. Specifically, the covariance matrix of Yi = (yi(ti1)T, …, yi(timi)T)T is given by

$V_i(\theta) = \mathrm{Cov}(Y_i) = A_i(\beta, \gamma)^{1/2}\, C_i(\alpha)\, A_i(\beta, \gamma)^{1/2},$
(3)

for i = 1, …, n, where θ = (β, α, γ), $A_i(\beta, \gamma)^{1/2}$ is a pmi × pmi diagonal matrix containing the standard deviations of Yi, and Ci(α) represents the correlation among all mi repeated measurements over time and the correlation among all p imaging measures. Moreover, γ and α are, respectively, additional parameter vectors for characterizing the variances of the imaging measures and the correlations among the imaging measures across time.

As an illustration, we consider several covariance structures of the form (3) for univariate responses. Some additional parametric models for the covariance structure can be found in Appendix A. Without loss of generality, we consider homogeneous variances of yi(tij), that is, $\mathrm{var}(y_i(t_{ij})) = \sigma_y^2$ for all i, j. In this case, γ = (σy). There are several commonly used correlation structures, including exchangeable, autoregressive AR(1), unstructured, and m-dependent. These four correlation structures are summarized as follows:

  • exchangeable: Corr(yi(tij), yi(tik)) = ρ for j ≠ k;
  • autoregressive: Corr(yi(tij), yi(tik)) = $\rho^{|j-k|}$ for j ≠ k;
  • m-dependent: Corr(yi(tij), yi(tik)) = $\rho_{|j-k|}$ for |j − k| ≤ m, and 0 otherwise;
  • unstructured: Corr(yi(tij), yi(tik)) = $\rho_{jk}$ for j ≠ k.

Based on a specific covariance structure of yi(tij), we can derive the explicit form of Ai(β, γ) and Ci(α). For instance, let’s consider the exchangeable correlation. In this case, we have

$V_i = \sigma_y^2\,[(1 - \rho)\, I_{m_i} + \rho\, 1_{m_i} 1_{m_i}^T],$
(4)

where $1_{m_i}$ is an mi × 1 vector of ones and $I_{m_i}$ is an mi × mi identity matrix.

There are several advantages and drawbacks associated with the four correlation structures. The exchangeable working correlation structure is well known as compound symmetry in the longitudinal literature. The exchangeable correlation has a simple interpretation, but it may not be reasonable for longitudinal studies with more than three measurements. The autoregressive AR(1) correlation structure assumes that the correlation decreases as the distance between two measures increases. The m-dependent correlation structure assumes zero correlation when two measures are more than m steps apart. Both the autoregressive and m-dependent correlation structures are appealing for equally spaced data, but less so for unequally spaced data. The unstructured working correlation structure leaves the correlation matrix completely unspecified and has mi(mi − 1)/2 parameters to be estimated, which limits its use to studies with few observation times or conditions. The unstructured working correlation is also not useful in the presence of missing data and/or varying numbers of observations per subject.
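As a small illustration of these working covariance choices, the sketch below builds the exchangeable and AR(1) correlation matrices and the corresponding Vi in (3)–(4) under homogeneous variances; the helper names are hypothetical and the numbers purely illustrative.

```python
import numpy as np

def exchangeable_corr(m, rho):
    """Exchangeable (compound symmetry): Corr = rho for all pairs j != k."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def ar1_corr(m, rho):
    """AR(1): Corr(y_ij, y_ik) = rho**|j - k|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def working_cov(m, sigma_y, corr):
    """V_i = A^{1/2} C A^{1/2} with homogeneous variance sigma_y**2, as in (3)-(4)."""
    A_half = sigma_y * np.eye(m)
    return A_half @ corr @ A_half

# Example: 3 repeated measures, standard deviation 0.1, correlation 0.7
V = working_cov(3, 0.1, ar1_corr(3, 0.7))
print(np.round(V, 4))
```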

2·5. Voxelwise GEE Estimation Procedure

In most longitudinal studies, our primary interest focuses on making inference on β or subcomponents of β. At each voxel, voxelwise GEE is iteratively solved as follows.

  • Given estimates $\hat\gamma^{(s)}$ and $\hat\alpha^{(s)}$, one calculates $\hat\beta^{(s+1)}$ as the solution of the following GEE given by
    $\sum_{i=1}^{n} U_i(\beta)^T V_i(\beta, \hat\alpha^{(s)}, \hat\gamma^{(s)})^{-1}\,[Y_i - \mu_i(\beta)] = 0,$
    (5)
    where μi(β) = (μ(xi1, β)T, …, μ(ximi, β)T)T and Ui(β) = ∂μi(β)/∂βT. Specifically, one sets $\beta^{(0)} = \hat\beta^{(s)}$ and then updates $\beta^{(k)}$ according to a Newton-type algorithm until $\|\beta^{(k+1)} - \beta^{(k)}\|_2$ is smaller than a fixed small number ε, say $10^{-4}$. One sets $\hat\beta^{(s+1)} = \beta^{(k+1)}$.
  • Given $\hat\beta^{(s+1)}$, one computes the residuals $\hat r_{ij} = y_i(t_{ij}) - \mu(x_{ij}, \hat\beta^{(s+1)})$ for all i, j and uses them to construct some estimates of (α, γ), denoted by $(\hat\alpha^{(s+1)}, \hat\gamma^{(s+1)})$. See Appendix B for some estimation methods for (α, γ).
  • Iterate the previous two steps until convergence.
  • Obtain the final estimator $\hat\theta = (\hat\beta, \hat\gamma, \hat\alpha)$ and then calculate $V_i(\hat\theta)$, $\hat r_{ij} = y_i(t_{ij}) - \mu(x_{ij}, \hat\beta)$, and the sandwich estimator of the covariance matrix of $\hat\beta$, denoted by $\Sigma_n(\hat\beta)$, which is given by
    $\Sigma_n(\hat\beta) = \Big(\sum_{i=1}^{n}\hat U_i^T\hat V_i^{-1}\hat U_i\Big)^{-1}\Big(\sum_{i=1}^{n}\hat U_i^T\hat V_i^{-1}\hat r_i\hat r_i^T\hat V_i^{-1}\hat U_i\Big)\Big(\sum_{i=1}^{n}\hat U_i^T\hat V_i^{-1}\hat U_i\Big)^{-1},$
    (6)
    where $\hat r_i = Y_i - \mu_i(\hat\beta)$, $\hat U_i = U_i(\hat\beta)$, and $\hat V_i = V_i(\hat\theta)$.

The estimator $\hat\beta$ based on the GEE (5) has three attractive properties. Firstly, $\hat\beta$ can be almost as efficient as the maximum likelihood estimate of β in many practical applications provided that the Vi(θ) for all i are reasonably approximated (Diggle et al., 2002). In fact, the GEE in (5) is exactly the set of maximum likelihood score equations for multivariate Gaussian data when the Vi(θ) are correctly specified (Fitzmaurice et al., 2004). Secondly, $\hat\beta$ converges to the true, but unknown, β* as n → ∞, even if the Vi(θ) are incorrectly specified. When regression coefficients in β are the scientific focus, which is the case for most longitudinal neuroimaging studies, one should concentrate on modeling the mean structure, while using a reasonable approximation to Vi(θ). Thirdly, $\Sigma_n(\hat\beta)$ is close to the true covariance matrix of $\hat\beta$ when the sample size is large, even if Vi(θ) is misspecified and $(\hat\alpha, \hat\gamma)$ is prefixed. This property is very appealing in practice, since GEE is directly applicable to twin and family studies if our primary interest is in β. In practice, to ensure the robustness of the inferences about β, one may fit GEE by using two different choices of Vi(θ) and compare the two sets of estimates and their robust standard errors. If they differ substantially, a more careful treatment of Vi(θ) is necessary.
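To make the iteration above concrete, the following is a minimal, hypothetical sketch of a voxel-wise GEE fit for an identity-link (linear) mean model with an exchangeable working correlation; the function name, the simple moment updates of (σy², ρ), and the omission of any degrees-of-freedom corrections are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

def fit_gee_identity(Y, X, groups, tol=1e-4, max_iter=50):
    """Minimal GEE sketch: Y (N,) stacked responses, X (N, q) stacked covariates,
    groups (N,) subject ids. Returns (beta_hat, sandwich covariance estimate)."""
    q = X.shape[1]
    ids = np.unique(groups)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]          # ordinary least squares start
    rho, sigma2 = 0.0, np.mean((Y - X @ beta) ** 2)
    for _ in range(max_iter):
        # Step 1: update beta given (sigma2, rho); for a linear mean, (5) has a
        # closed-form generalized least squares solution.
        lhs, rhs = np.zeros((q, q)), np.zeros(q)
        for i in ids:
            sel = groups == i
            Xi, Yi, mi = X[sel], Y[sel], int(sel.sum())
            Ci = (1 - rho) * np.eye(mi) + rho * np.ones((mi, mi))
            Vi_inv = np.linalg.inv(sigma2 * Ci)
            lhs += Xi.T @ Vi_inv @ Xi
            rhs += Xi.T @ Vi_inv @ Yi
        beta_new = np.linalg.solve(lhs, rhs)
        # Step 2: rough moment updates of (sigma2, rho) from the residuals.
        r = Y - X @ beta_new
        sigma2 = np.mean(r ** 2)
        num, cnt = 0.0, 0
        for i in ids:
            ri = r[groups == i]
            for j in range(len(ri)):
                for k in range(j + 1, len(ri)):
                    num += ri[j] * ri[k]
                    cnt += 1
        rho = num / (cnt * sigma2)
        done = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if done:
            break
    # Sandwich estimator (6) of the covariance of beta_hat.
    bread_inv, meat = np.zeros((q, q)), np.zeros((q, q))
    for i in ids:
        sel = groups == i
        Xi, ri, mi = X[sel], (Y - X @ beta)[sel], int(sel.sum())
        Vi_inv = np.linalg.inv(sigma2 * ((1 - rho) * np.eye(mi) + rho * np.ones((mi, mi))))
        bread_inv += Xi.T @ Vi_inv @ Xi
        meat += np.outer(Xi.T @ Vi_inv @ ri, Xi.T @ Vi_inv @ ri)
    bread = np.linalg.inv(bread_inv)
    return beta, bread @ meat @ bread
```

In practice, one would typically rely on an established implementation (e.g., PROC GENMOD or PROC GEE in SAS, geepack in R, or the GEE class in Python's statsmodels) rather than a hand-rolled solver.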

2·6. Weighted Generalized Estimating Equations

In longitudinal studies, β(d) is commonly decomposed into a qI × 1 vector of parameters of interest, denoted by βI(d), and a qN × 1 vector of nuisance parameters, denoted by βN(d), where q = qI + qN. For instance, in model (2), if our primary interest focuses on the gender and time interaction, then βI(d) includes the regression coefficients of $g_i t_{ij}$ and $g_i t_{ij}^2$, whereas all other regression coefficients can be regarded as nuisance parameters, that is,

$\beta_I(d) = (\beta_5(d), \beta_6(d))^T \quad \text{and} \quad \beta_N(d) = (\beta_1(d), \beta_2(d), \beta_3(d), \beta_4(d))^T.$

Moreover, the components of (α(d), γ(d)) in Vi(θ) are often nuisance parameters. Throughout the paper, we fix (α(d), γ(d)) at the voxel-wise GEE estimates $(\hat\alpha(d), \hat\gamma(d))$ obtained from the voxel-wise GEE procedure.

We propose a weighted generalized estimating equations method, referred to as weighted GEE-A, to spatially and adaptively update $\{\beta_I(d) : d \in \mathcal{D}\}$, while fixing βN(d) at the voxel-wise GEE estimates $\hat\beta_N(d)$ obtained from the voxel-wise GEE procedure. Since weighted GEE-A focuses on updating a subset of β(d), it is distinct from the original multiscale adaptive regression model and PS methods (Li et al., 2011; Polzehl and Spokoiny, 2000, 2006). The key idea of weighted GEE-A is to combine GEEs for βI(d′) in a neighboring sphere of voxel d to make inference on βI(d) at the voxel d. Specifically, let B(d, h) be a sphere with radius h centered at voxel d and ω(d, d′; h) be a weight function of the triple (d, d′, h) such that

$\omega(d, d'; h) \geq 0 \quad \text{and} \quad \sum_{d' \in B(d, h)} \omega(d, d'; h) = 1;$

weighted GEE-A is based on a set of weighted GEEs, denoted by Gn(βI (d); ω, h), which is defined as follows:

$G_n(\beta_I(d); \omega, h) = \sum_{d' \in B(d, h)} \omega(d, d'; h) \sum_{i=1}^{n} U_{iI}(\beta_I(d), \hat\beta_N(d'))^T\, V_i(\hat\theta(d'))^{-1}\, e_i(\beta_I(d), \hat\beta_N(d')),$
(7)

where $e_i(\beta_I(d), \hat\beta_N(d')) = Y_i(d') - \mu_i(\beta_I(d), \hat\beta_N(d'))$, $U_{iI}(\beta) = \partial\mu_i(\beta)/\partial\beta_I^T$, and $V_i(\hat\theta(d'))$ is evaluated at the voxel-wise GEE estimate of θ(d′). Therefore, in (7), only βI(d) is unknown, whereas the other parameters are fixed at their voxel-wise GEE estimates. Given the current weights $\{\omega(d, d'; h) : d, d' \in \mathcal{D}\}$, we consider the weighted GEE estimator of βI(d), denoted by $\hat\beta_I(d, h)$, which satisfies

$G_n(\hat\beta_I(d, h); \omega, h) = 0.$
(8)

A good ω(d, d′; h) plays a critical role in preventing oversmoothing of the estimates of βI(d) across voxels, while preserving the edges of significant regions. We require that ω(d, d′; h) characterize the similarity between βI(d) and βI(d′). If βI(d) differs from βI(d′), then the data in voxel d′ do not contain much information on βI(d) and ω(d, d′; h) should be close to 0. However, if βI(d) is close to βI(d′), indicating that the data in voxel d′ contain useful information on βI(d), then ω(d, d′; h) should be substantially bigger than zero. See the explicit expression of ω(d, d′; h) in Appendix C.
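The following is a minimal sketch of one weighted GEE-A update, i.e., solving (8) for βI(d) under an identity-link mean model, with the nuisance parameters fixed at their voxel-wise GEE estimates and the weights ω(d, d′; h) precomputed; all container layouts and names are hypothetical assumptions for illustration.

```python
import numpy as np

def weighted_gee_update(neighbors, weights, Y, X_I, X_N, betaN_hat, Vinv):
    """One weighted GEE-A update (equations (7)-(8)) at a voxel d, identity link.

    neighbors: voxel indices d' in B(d, h); weights[d'] = omega(d, d'; h)
    Y[d'][i]: (m_i,) responses of subject i at voxel d'
    X_I[i]: (m_i, q_I) design block of interest; X_N[i]: (m_i, q_N) nuisance block
    betaN_hat[d']: voxel-wise GEE estimate of the nuisance coefficients at d'
    Vinv[d'][i]: inverse working covariance V_i(theta_hat(d'))^{-1}."""
    qI = X_I[0].shape[1]
    lhs, rhs = np.zeros((qI, qI)), np.zeros(qI)
    n = len(X_I)
    for dp in neighbors:
        w = weights[dp]
        if w <= 0:
            continue
        for i in range(n):
            Vi_inv = Vinv[dp][i]
            resid = Y[dp][i] - X_N[i] @ betaN_hat[dp]   # data minus the fixed nuisance part
            lhs += w * X_I[i].T @ Vi_inv @ X_I[i]
            rhs += w * X_I[i].T @ Vi_inv @ resid
    return np.linalg.solve(lhs, rhs)                     # beta_I(d, h) solving (8)
```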

2·7. Test Statistics for Weighted GEE

We present a test statistic based on the weighted GEE (7) at each $d \in \mathcal{D}$ for a fixed radius h. Our choice of which hypotheses to test is motivated by either a comparison of brain structure (or function) across diagnostic groups or the detection of a change in brain structure (or function) across time (Skup et al., 2011; Meltzer et al., 2009; Kim et al., 2010). These questions usually can be formulated as testing hypotheses about βI(d) as follows:

$H_{0,\beta}: R\beta_I(d) = b_0 \quad \text{vs.} \quad H_{1,\beta}: R\beta_I(d) \neq b_0,$
(9)

where R is an r × qI matrix and b0 is a specified r × 1 vector, such as an r × 1 vector of zeros. For instance, in model (2) with βI(d) = (β5(d), β6(d))T, if we are interested in testing the time and gender interaction, then we have

$R = I_2 \quad \text{and} \quad b_0 = (0, 0)^T.$

We test the null hypothesis H0,β using a Wald test statistic given by

$W_\beta(d, h) = [R\hat\beta_I(d, h) - b_0]^T\,[R\,\Sigma_n(\hat\beta_I(d, h))\,R^T]^{-1}\,[R\hat\beta_I(d, h) - b_0],$
(10)

where $\Sigma_n(\hat\beta_I(d, h))$ is an approximation of the covariance matrix of $\hat\beta_I(d, h)$. See the explicit expression of $\Sigma_n(\hat\beta_I(d, h))$ in (28) for details.
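A minimal sketch of the Wald test (10) follows; it assumes, as is standard for Wald statistics with a sandwich covariance estimate, that the null reference distribution is chi-square with rank(R) degrees of freedom, and the numerical values are purely illustrative.

```python
import numpy as np
from scipy import stats

def wald_test(betaI_hat, Sigma_n, R, b0):
    """Wald statistic (10) for H0: R betaI = b0; p-value from the chi-square
    distribution with r = number of rows of R degrees of freedom."""
    diff = R @ betaI_hat - b0
    W = float(diff.T @ np.linalg.inv(R @ Sigma_n @ R.T) @ diff)
    return W, stats.chi2.sf(W, df=R.shape[0])

# Example: testing beta5 = beta6 = 0 (time-by-gender interaction), R = I_2, b0 = 0
betaI_hat = np.array([0.04, -0.01])
Sigma_n = np.array([[4e-4, 1e-5], [1e-5, 9e-5]])   # illustrative covariance only
W, p = wald_test(betaI_hat, Sigma_n, np.eye(2), np.zeros(2))
print(W, p)
```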

2·8. Multiscale Adaptive Generalized Estimating Equations

We develop a PS procedure based on multiscale adaptive generalized estimating equations, referred to as MAGEE-A, by integrating weighted GEE-A and the PS procedure proposed in (Polzehl and Spokoiny, 2000, 2006). Since PS and the choice of its associated parameters have been discussed in detail in (Polzehl et al., 2010; Li et al., 2011), we only briefly mention them here.

  • At each $d \in \mathcal{D}$, the PS procedure evolves along a sequence of nested spheres with increasing radii hs as follows:
    $B(d, h_0) \subset B(d, h_1) \subset \cdots \subset B(d, h_S), \qquad 0 = h_0 < h_1 < \cdots < h_S = r_0.$
    (11)
  • At the scale h0 = 0, we just use the voxel-wise GEE estimator $\hat\theta(d, h_0) = \hat\theta(d)$ and then fix (βN(d), α(d), γ(d)) at $(\hat\beta_N(d), \hat\alpha(d), \hat\gamma(d))$.
  • We combine all information contained in $\{\hat\beta_I(d) : d \in \mathcal{D}\}$ to calculate the weights ω(d, d′; h1) at scale h1 for all $d \in \mathcal{D}$. Subsequently, we utilize all data in $\{B(d, h_1) : d \in \mathcal{D}\}$, all weights $\{\omega(d, d'; h_1) : d, d' \in \mathcal{D}\}$, and the weighted GEEs (7) to estimate $\hat\beta_I(d, h_1)$ for all $d \in \mathcal{D}$. In this way, we can sequentially determine ω(d, d′; hs) and adaptively update $\hat\beta_I(d, h_s)$ from h0 = 0 to hS = r0. At the end of PS, we calculate the Wald test statistics Wβ(d, hS) across all voxels $d \in \mathcal{D}$.

A path diagram of the PS procedure for MAGEE-A is given as follows:

$\hat\theta(d, h_0) \rightarrow \omega(d, d'; h_1) \rightarrow \hat\beta_I(d, h_1) \rightarrow \omega(d, d'; h_2) \rightarrow \hat\beta_I(d, h_2) \rightarrow \cdots \rightarrow \omega(d, d'; h_S) \rightarrow \hat\beta_I(d, h_S) \rightarrow W_\beta(d, h_S).$
(12)

The detailed steps of MAGEE-A are given in Appendix D.
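The following schematic sketch mirrors the PS loop in (11)–(12): a voxel-wise GEE fit at h0 = 0, followed by alternating weight computation and weighted GEE-A updates over increasing radii, and Wald statistics at the final scale. The callable hooks (voxelwise_fit, compute_weights, weighted_update, wald) are hypothetical placeholders, not the authors' implementation.

```python
def magee_a(voxels, radii, voxelwise_fit, compute_weights, weighted_update, wald):
    """Schematic PS loop for MAGEE-A (path diagram (12)).

    voxelwise_fit(d)                  -> (betaI0, betaN_hat, alpha_hat, gamma_hat) at h0 = 0
    compute_weights(d, h, betaI_map)  -> {d': omega(d, d'; h)} from the current betaI map
    weighted_update(d, h, w, betaI_map, nuisance) -> betaI(d, h) solving the weighted GEE (8)
    wald(d, betaI)                    -> Wald statistic W_beta(d, h_S)."""
    betaI, nuisance = {}, {}
    for d in voxels:                               # scale h0 = 0: voxel-wise GEE
        b_I, b_N, a, g = voxelwise_fit(d)
        betaI[d] = b_I
        nuisance[d] = (b_N, a, g)                  # kept fixed at all later scales
    for h in radii:                                # h1 < h2 < ... < hS
        new_betaI = {}
        for d in voxels:
            w = compute_weights(d, h, betaI)       # adaptive weights from current estimates
            new_betaI[d] = weighted_update(d, h, w, betaI, nuisance)
        betaI = new_betaI
    return {d: wald(d, betaI[d]) for d in voxels}  # test statistics at the final scale
```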

3. Results

3·1. Simulation: Scenario I

We simulated data at all m = 23,232 voxels on an 88 × 88 × 5 phantom image for n = 80 subjects. Each slice contains the same activation region. Specifically, at each voxel d in $\mathcal{D}$, we simulated yij(d) according to

$y_{ij}(d) = x_{ij}^T\beta(d) + \varepsilon_{ij}(d),$

where β(d) = (β1(d), β2(d), β3(d))T and xij = (1, xij2, xij3)T. To create unbalanced data, we set mi = 2 for i = 1, …, 40 and mi = 3 for i = 41, …, 80. We independently generated εi(d) = (εi1(d), …, εimi(d))T from a multivariate N(0, Ωi) distribution, where diag(Ωi) equals an mi × 1 vector of all ones and the correlation between $\varepsilon_{ij_1}(d)$ and $\varepsilon_{ij_2}(d)$ equals $0.7^{|j_1 - j_2|}$ for j1, j2 = 1, …, mi and i = 1, …, n. To include time-dependent and time-invariant covariates in longitudinal studies, we set xij2 and xij3 to be a time-dependent covariate and a time-invariant covariate, respectively. Specifically, we generated xij2 ~ U[j − 1, j] for j = 1, …, mi, where U[a, b] denotes the uniform distribution on [a, b]. We also generated xij3 independently from a Bernoulli distribution with equal probability for each i. We set β1(d) = 0 across all voxels d. For (β2(d), β3(d)), we divided the phantom image into six different regions of interest with different shapes and then varied (β2(d), β3(d)) as (0, 0), (0.05, 0.9), (0.1, 0.8), (0.2, 0.6), (0.3, 0.4), and (0.4, 0.2) across the six regions of interest. Moreover, we chose two different shapes in order to test our methods on a relatively rich spatial structure of activation areas. By varying (β2(d), β3(d)) in the different regions of interest, representing different signal-to-noise ratios, we can examine the finite-sample performance of our methods at different signal-to-noise ratios and shapes.
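For concreteness, the sketch below generates data at a single voxel following the Scenario I design (unbalanced visits, AR(1) errors with correlation 0.7, a time-dependent uniform covariate, and a time-invariant Bernoulli covariate); the seed, function name, and output layout are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_voxel(beta, n=80, rho=0.7):
    """Generate unbalanced longitudinal data at one voxel, Scenario I style."""
    rows = []
    for i in range(n):
        mi = 2 if i < 40 else 3                          # 40 subjects with 2 visits, 40 with 3
        x3 = rng.integers(0, 2)                          # time-invariant Bernoulli(0.5) covariate
        idx = np.arange(mi)                              # AR(1) errors: corr = rho**|j - k|
        Omega = rho ** np.abs(idx[:, None] - idx[None, :])
        eps = rng.multivariate_normal(np.zeros(mi), Omega)
        for j in range(mi):
            x2 = rng.uniform(j, j + 1)                   # U[j-1, j] with 1-based visit index
            x = np.array([1.0, x2, x3])
            rows.append((i, j, x2, x3, x @ beta + eps[j]))
    return np.array(rows)                                # columns: subject, visit, x2, x3, y

data = simulate_voxel(beta=np.array([0.0, 0.2, 0.6]))    # e.g., (beta1, beta2, beta3) of one ROI
print(data[:5])
```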

We fitted GEE with the AR(1) working correlation structure and homogeneous variances and set

$\mu(x_{ij}, \beta(d)) = \beta_1(d) + \beta_2(d)\,x_{ij2} + \beta_3(d)\,x_{ij3}.$

We used MAGEE-A to spatially and adaptively calculate the parameter estimates of β(d) across all voxels. Moreover, if βI(d) = β(d) for all $d \in \mathcal{D}$, then we refer to this special case of MAGEE-A as MAGEE-B. MAGEE-B can also be regarded as a direct extension of the original PS methods for functional magnetic resonance imaging to the longitudinal setting. For MAGEE-A, we set βI(d) = β2(d) and βN(d) = (β1(d), β3(d))T, whereas for MAGEE-B, we simultaneously updated all subcomponents of β(d). For both MAGEE-A and MAGEE-B, we applied the PS procedure described in Appendix D to calculate adaptive parameter estimates across all voxels at 10 different scales. Furthermore, for β(d), we calculated the bias, the empirical standard error, the root-mean-square error estimates, and the ratio of the empirical standard error over the mean of the root-mean-square error estimates in all six regions of interest based on the results obtained from the 1,000 simulated data sets. We also smoothed the simulated data by using heat kernel smoothing with 1, 4, 9, 16, 25, 36, 49, and 64 iterations, yielding effective smoothness of approximately 1, 2, 3, 4, 5, 6, 7, and 8 mm, respectively (Chung et al., 2005), and calculated parameter estimates by using the voxel-wise GEE method.

For simplicity, we present some selected results for [beta]2(d, hs). Both MAGEE-A and MAGEE-B show better accuracy of parameter estimates compared with the voxel-wise GEE estimates based on the effective smoothness scales of 4 and 8mm in terms of bias (Fig. 2 (B)–(E)). Moreover, MAGEE-B slightly outperforms MAGEE-A in terms of bias and the root-mean-square error (Table 1), since β2(d) and β3(d) have the same imaging pattern and updating all components of β(d) can lead to a better preservation of the edges of regions of interest. Comparing the panels (D) and (E) of Fig. 2 confirms this observation.

Fig. 2
Simulation results for scenario I: Panel (A) is a selected slice of the ground truth image for simulation study of the six regions of interest with different gray levels representing β2(d)=0, 0.05, 0.1, 0.2, 0.3, 0.4 respectively; Panel (B) represents ...
Table 1
Simulation results: Average Bias, RMS, SD, and RE of β2(d) parameters in the six regions of interest at scale h10 for MAGEE-A, MAGEE-B, and the voxel-wise GEE methods with the heat kernel smoothing with k iterations, which are denoted as SGEE- ...

We tested the hypotheses H0 : β2(d) = 0 and H1 : β2(d) ≠ 0 across all voxels by using Wβ(d, h10) for MAGEE-A and MAGEE-B and evaluated their performance with cluster-based thresholding (Salimi-Khorshidi et al., 2011; Chumbley et al., 2010; Silver et al., 2011). Specifically, we thresholded the images of Wβ(d, h10) by using a false-discovery rate corrected threshold, pc = 0.05, to identify the resulting clusters of contiguous supra-threshold voxels. Then, we separated the clusters into two groups: a 'good' (or true positive) group, in which every cluster contains at least one true positive voxel, and a 'bad' (or false positive) group, in which every cluster contains only truly negative voxels. Similarly, we applied the same procedure to the Wald-test statistic maps of the voxel-wise GEE results based on smoothed imaging data with different effective smoothness scales.

We calculated three statistics based on the two groups of clusters including the dice overlap ratio between the clusters in the ‘good’ group and the true activation regions, the cluster number, and the spatial extent (or the number of voxels) of all clusters in the ‘bad’ group. The value of dice overlap ratio is between 0 and 1 and a larger dice overlap ratio value represents better performance in detecting true positive voxels. Moreover, if the spatial extent of a cluster is smaller than a threshold, such as 10, then the cluster is not detected to be significant by the cluster based thresholding. Table 2 presents the three statistics of each method.
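For reference, a simple implementation of the dice overlap ratio between a detected mask and the true activation mask is sketched below; the exact cluster-wise bookkeeping behind Table 2 may differ, so this is only an illustration of the overlap measure itself.

```python
import numpy as np

def dice_overlap(detected, truth):
    """Dice overlap ratio 2|A ∩ B| / (|A| + |B|) for two boolean masks."""
    detected = np.asarray(detected, bool)
    truth = np.asarray(truth, bool)
    inter = np.logical_and(detected, truth).sum()
    denom = detected.sum() + truth.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0

# Toy 1-D example: two of three detected voxels fall inside the true region
print(dice_overlap([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # 2*2 / (3+3) ≈ 0.667
```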

Table 2
Simulation results: Dice Overlap Ratio (DOR), the non overlap cluster number and cluster size for testing β2(d) = 0 in the six regions of interest for MAGEE-A, MAGEE-B, and the voxel-wise GEE methods with the heat kernel smoothing with k iterations, ...

Inspecting Table 2 reveals several facts. (i) Both MAGEE-A and MAGEE-B have a larger dice overlap ratio compared with voxel-wise GEEs with relatively large smoothing scales (≥4 mm). See Fig. 4 (a) for details. (ii) Although MAGEE-A and MAGEE-B have many more non-overlap small clusters than the voxel-wise GEE methods, their spatial extent is much smaller than the standard spatial extent threshold (e.g., 10). (iii) MAGEE-B outperforms MAGEE-A in terms of the dice overlap ratio and the non-overlap cluster number. (iv) For voxel-wise GEEs, small smoothing scales (≤3 mm) outperform large smoothing scales (≥4 mm) in terms of the dice overlap ratio, and vice versa in terms of the non-overlap cluster number. However, as expected, the larger the smoothing scale, the larger the non-overlap clusters. This can lead to the false detection of non-overlap clusters.

Fig. 4
Simulation results for scenarios I and II for comparing MAGEE-A, MAGEE-B, and the voxel-wise GEE methods with the heat kernel smoothing with k iterations, which are denoted as SGEE-k for k = 1, 4, 9, 16, 25, 36, 49, and 64. Panel (a) presents Dice Overlap ...

Figure 4 (b) and (c) present the results for testing β2(d) = 0 in all voxels in the six regions of interest for all methods by using the false-discovery rate corrected threshold, pc = 0.05, based on 1,000 simulations. For each method, we calculated the type I error rate and power based on the voxels in all regions of interest for each simulated data set. Figure 4 (b) presents their average type I error rates, whereas Figure 4 (c) presents the detection power for voxels in each of the active regions of interest. MAGEE-A and MAGEE-B have higher detection power in most regions of interest, while their type I error rates are quite small. When the smoothing scale is larger than 3 mm, the type I error rate of the voxel-wise GEE becomes more liberal and is not well controlled. In contrast, at small smoothing scales the type I error rate of the voxel-wise GEE is small, but its detection power is poor compared with the MAGEE-A and MAGEE-B methods. In summary, the MAGEE methods outperform the voxel-wise GEEs with different smoothing scales in terms of both type I and type II error rates.

3·2. Simulation: Scenario II

We used the same setup as in scenario I except that we randomly generated β3(d) from U[−5, 5] in the prefixed five regions of interest with β2(d) ≠ 0. In this case, β3(d) varies within these five regions of interest, and thus the β2 image and the β3 image do not share the same imaging pattern. We are interested in examining the effect of the different patterns in the β3 image on $\hat\beta_2(d, h_s)$ across all $d \in \mathcal{D}$. We also smoothed the simulated data using heat kernel smoothing with the effective smoothing scales varying from 1 to 8 mm.

For $\hat\beta_2(d, h_S)$, MAGEE-A, which smooths β2(d) only, outperforms MAGEE-B, leading to smaller empirical standard errors and root-mean-square error estimates, even though their biases are comparable (Table 1). For Wβ(d, h10), MAGEE-A significantly increases the statistical power for rejecting the null hypothesis on β2(d) in the five regions of interest, while the non-overlap cluster size is still under control (Table 2 and Fig. 3 (D)). Compared with scenario I, inspecting Table 2 and Figure 4 reveals several different facts for scenario II. (i) MAGEE-A outperforms MAGEE-B in terms of the dice overlap ratio. (ii) Although MAGEE-A and MAGEE-B have many more small clusters than the voxel-wise GEE methods, their spatial extent is much smaller than the standard spatial extent threshold (e.g., 10). (iii) MAGEE-A has higher power in detecting voxels in the regions of interest than MAGEE-B in general. Figure 4 (b) and (d) show the average type I error rate and the power of detection of the voxels in the regions of interest, respectively, and clearly show the advantage of the MAGEE methods: they achieve high detection power, while controlling the type I error rate. In summary, when the images of the 'nuisance' parameter estimates do not share the same pattern with those of the parameters of interest, smoothing only the parameters of interest may lead to better results.

Fig. 3
Simulation results for scenario II: Panel (A) is a selected slice of the ground truth image for simulation study of the six regions of interest with different gray levels representing β2(d)=0, 0.05, 0.1, 0.2, 0.3, 0.4 respectively; Panel (B) represents ...

3·3. Real Data I: Longitudinal Study

A wealth of cross-sectional diffusion tensor imaging studies has been conducted in the past decade to characterize white matter development (from prenatal to adolescent stages) using various diffusion parameters, such as fractional anisotropy and radial diffusivity. Current diffusion tensor imaging studies involving neonates have revealed three phases in early postnatal brain development: rapid changes within the first 12 months, slow maturation from 12 to 24 months, and a steady state afterwards. In particular, in white matter, neonates have significantly lower fractional anisotropy values and significantly higher mean diffusivity values compared to adults (Neil et al., 1998). These diffusion tensor imaging studies also reveal the temporal non-linearity and spatial inhomogeneity of the apparent changes in diffusion tensor imaging parameters within the brain (Mukherjee and McKinstry, 2006; Schneider et al., 2004).

We used 38 subjects from a longitudinal study designed to investigate early brain development, led by Dr. John Gilmore at the University of North Carolina at Chapel Hill. For each subject, diffusion-weighted images were acquired at 2 weeks, year 1, and year 2. The diffusion-weighted imaging acquisition scheme included 18 repeated measures of six non-collinear directions, (1,0,1), (−1,0,1), (0,1,1), (0,1,−1), (1,1,0), and (−1,1,0), at a b-value of 1000 s/mm2, and a b = 0 reference scan. Forty-six contiguous slices with a slice thickness of 2 mm covered a field of view of 256×256 mm2 with an isotropic voxel size of 2 × 2 × 2 mm3. High resolution T1-weighted images were acquired using a 3D MP-RAGE sequence. We then used a weighted least squares estimation method to construct the diffusion tensors (Basser et al., 1994; Zhu et al., 2007). All images were visually inspected before analysis to ensure that there was no bulk motion. All diffusion tensor images (38 subjects, 3 time points each) were registered onto a template, which is a randomly selected brain diffusion tensor image of a 2-year-old subject (Yap et al., 2009).

We used fractional anisotropy images to identify the spatial patterns of white matter maturation. We fitted a GEE with the AR(1) correlation structure, a homogeneous variance structure, and

$\mu(x_{ij}, \beta(d)) = \beta_1(d) + \beta_2(d)g_i + \beta_3(d)t_{ij} + \beta_4(d)t_{ij}^2 + \beta_5(d)g_i t_{ij} + \beta_6(d)g_i t_{ij}^2$

for i = 1, …, n and j = 1, 2, 3 at each voxel of the template. We are interested in testing two sets of hypotheses, namely the gender and time interaction effect, and then the time effect if the gender and time interaction effect is not significant. We used MAGEE-A and MAGEE-B to address these hypotheses. We also smoothed the fractional anisotropy imaging data using an isotropic Gaussian kernel with a full width at half maximum of 8 mm and used voxel-wise GEE to address those hypotheses as well. Moreover, although fractional anisotropy is between 0 and 1, μ(x, β(d)) may fall outside [0, 1]. Therefore, it may be better to use a nonlinear transformation (e.g., logit) of the fractional anisotropy values for better prediction.

We ran the PS procedure with ch = 1.15 and S = 10 and then tested H0 : β5(d) = β6(d) = 0 for the time and gender interaction effect across all voxels d. We corrected for multiple comparisons using the false discovery rate at level 0.05. We did not observe any significant voxels or clusters where the two genders have different time trends (results not presented). Therefore, we fitted a GEE with the AR(1) correlation structure and

$\mu(x_{ij}, \beta(d)) = \beta_1(d) + \beta_2(d)g_i + \beta_3(d)t_{ij} + \beta_4(d)t_{ij}^2$

for i = 1, …, n and j = 1, 2, 3 at each voxel of the template. The findings for MAGEE-A are similar to those for MAGEE-B, although MAGEE-A detects more significant clusters and voxels than MAGEE-B at S = 10 (Fig. 5 (B), (C), (B′), and (C′)). As expected, both MAGEE-A and MAGEE-B outperform voxel-wise GEE based on unsmoothed and smoothed FA images in terms of the number of both significant clusters and voxels (Fig. 5 (A) and (A′)). Specifically, the voxel-wise GEE results based on the smoothed fractional anisotropy images show obvious oversmoothing in the cerebrospinal fluid and gray matter areas, such as the ventricles (Fig. 5 (D) and (D′)).

Fig. 5
Results of Real Longitudinal Data I: panels (A), (B), (C), and (D) respectively, show the – log10(p) values of Wβ(d, hs) for testing the age effect for voxel-wise GEE for unsmoothed FA images, MAGEE-A at the 10-th iteration, and MAGEE-B ...

We also plotted the line plots and fitted growth curves at four selected voxels in the genu, splenium, optic radiation, and cerebral peduncle (Fig. 1). Different growth patterns were observed for the genu, splenium, optic radiation, and cerebral peduncle (Fig. 1). The genu and cerebral peduncle have similar and small fractional anisotropy values at birth, whereas the genu's fractional anisotropy increases very fast and the cerebral peduncle's fractional anisotropy increases slowly. The splenium and optic radiation have similar and moderate fractional anisotropy values at birth, whereas the fractional anisotropy values of the splenium and optic radiation increase relatively slowly. There is substantial variability across different anatomic regions, even though fractional anisotropy is highly significantly different between neonates and 1-year-olds for all regions of interest.

3·4. Real Data II: Twin Study

We considered the same early postnatal brain development project led by Dr. Gilmore at the University of North Carolina at Chapel Hill (Gilmore et al., 2010). A total of 49 twin pairs (36 males and 62 females) were selected for this illustration. All 49 pairs were scanned as neonates within a few weeks after birth at term. Written consent was obtained from their parents before image acquisition. The mean gestational age at magnetic resonance scanning was 246 ± 18.3 days (range: 192 to 270 days). All infants were fed and calmed to sleep on a warm blanket with proper ear protection, and they slept comfortably inside the MR scanner. None of the infants was sedated during the imaging session.

We then employed a nonlinear fluid deformation based, high-dimensional, unbiased atlas computation method to process all 98 diffusion tensor imaging data sets (Goodlett et al., 2009). The atlas building procedure started with an affine registration and was followed by a nonlinear registration of a set of feature images for all subjects. The feature images are the maximum eigenvalues of the Hessian of the fractional anisotropy image, which are sensitive to the geometry of white matter. Using the computed deformation fields, we warped all tensor images into the unbiased atlas space via log-Euclidean based interpolation (Arsigny et al., 2006). We also averaged all the warped tensor images to create a study-specific diffusion tensor imaging atlas.

Fractional anisotropy has been widely used as a measurement to assess directional organization of the brain, which is greatly influenced by the magnitude and orientation of white matter tracts. Here, fractional anisotropy images are employed to identify the spatial patterns of white matter maturation. We considered a structural equation model given by

$y_{ij}(d) = \beta_1(d) + \beta_2(d)g_{ij} + \beta_3(d)z_{ij} + a_{ij}(d) + c_i(d) + e_{ij}(d),$
(13)

for i = 1, …, 49 and j = 1, 2 at each voxel of the template, where gij and zij, respectively, represent the dummy variables for gender (male = 1 and female = 0) and zygosity (monozygotic twin = 1 and dizygotic twin = 0), and aij(d), ci(d), and eij(d) are, respectively, the additive genetic, common environmental, and residual effects for the i-th twin pair.

Our objective is to show that MAGEE is applicable to making statistical inference on regression coefficients, including β1(d), β2(d), and β3(d), in twin studies. As an illustration, we consider the gender effect, that is, H0 : β2(d) = 0. We applied MAGEE-A and MAGEE-B and compared their results with those obtained from three other methods: a maximum likelihood estimation method, TwinMARM, and voxel-wise GEE based on Gaussian smoothed images with a full width at half maximum of 8 mm. For MAGEE-A and MAGEE-B, we fitted the weighted GEE with the compound-symmetry correlation and μ(xij, β(d)) = β1(d) + β2(d)gij + β3(d)zij. The maximum likelihood estimation method calculates the maximum likelihood estimates of the unknown parameters in model (13) at each voxel in a voxel-based analysis (Feng et al., 2009). We apply the maximum likelihood estimation method to the twin imaging data without the use of a Gaussian kernel, so that the results do not contain the biases introduced by Gaussian smoothing. See Li et al. (2012) for detailed discussions of such biases. TwinMARM is a two-stage multiscale adaptive regression method for the spatial and adaptive analysis of twin neuroimaging and behavioral data (Li et al., 2012).

Inspecting Fig. 6 (A)–(J) reveals that the images of parameter estimates obtained from all five methods have similar patterns. All multiscale adaptive methods, including TwinMARM, MAGEE-A, and MAGEE-B, significantly outperform the maximum likelihood estimation method in terms of the spatial smoothness of significant clusters and the magnitude of the standard deviations (Fig. 6). Comparing Fig. 6 (E) with Fig. 6 (A)–(D) reveals some subtle differences between voxel-wise GEE based on the smoothed images and all other methods. This may confirm the biases introduced by the use of a Gaussian kernel as discussed in Li et al. (2012).

Fig. 6
Results of Real Twin Data II: panels (A), (B), (C), (D) and (E) are, respectively, the estimates of β2(d) with β2(d, hs) > 0 for the maximum likelihood estimation method, MAGEE-A at the scale h10, MAGEE-B at the scale h10, TwinMARM ...

4. Discussion

4·1. GEE

We have reviewed the method of GEE for analyses of repeated-measures data. GEE is powerful for handling unbalanced designs and missing data obtained from longitudinal, twin, and familial studies, accommodating continuous and discrete covariates and their interactions, relaxing distributional assumptions, and allowing complex models for the covariance. Specifically, we discussed how to set up the mean model and the working covariance model of GEE. The mean model is used to directly characterize the individual trajectory in repeated measurements as a linear or nonlinear function of time, while adjusting for the effect of other predictors, such as diagnostic status and gender, on the individual trajectory. Although linear mean models are generally satisfactory approximations for most neuroimaging applications, there are many cases, such as growth curves and dose-response relationships, where an empirically indicated or theoretically justified nonlinear mean model is more appropriate. For instance, since growth from birth to maturity in human subjects is typically nonlinear in nature, a nonlinear mean model should be used. The logistic and Gompertz nonlinear models described in Appendix A have been widely used to characterize rapid growth shortly after birth, pronounced growth during puberty, and a leveling off sometime before adulthood. In many applications, if a simple nonlinear mean model is unavailable at the beginning, it is common to fit a nonparametric mean model to the longitudinal data and then use a simple mean model to approximate the fitted nonparametric mean model (Wang, 2003; Wu and Zhang, 2006; Wang, 1998).

The working covariance model of GEE is used to model and understand the likely sources of random variation in longitudinal data. As discussed in Diggle et al. (2002), a useful working covariance model should include at least three qualitatively different sources of random variation: (i) random effects, (ii) serial correlation, and (iii) measurement error. Various strategies have been developed to incorporate them into specific models (Diggle et al., 2002). For instance, we have discussed four commonly used correlation structures, such as the exchangeable structure, in Section 2·4, and several parametric models for the covariance structure in Appendix A. Although these parametric models may be sufficient for longitudinal data with few repeated measures, most of the parametric models discussed here are not appealing for sparsely and irregularly observed longitudinal data. Recently, several nonparametric covariance models have been proposed to deal with this issue (Yao et al., 2005; Ramsay and Silverman, 2005). Similar to the mean model, one may fit a nonparametric covariance model to the longitudinal data and then use a simple covariance model to approximate the nonparametric model.

4·2. Advantages of MAGEE

We have developed MAGEE for the spatial and adaptive analysis of longitudinal neuroimaging data. MAGEE is essentially a locally adaptive spatial smoothing method and is adaptive to the spatial pattern and extent of each activation region for each regression coefficient map. Such an adaptive and spatial property is very appealing from both theoretical and practical perspectives. Specifically, according to the matched filter theorem, the size of the optimal filter should match the size of the target signal, while accounting for the noise distribution. When there are multiple effect regions with different spatial patterns and extents, multiple filters with different shapes and kernel sizes should be used. Moreover, MAGEE, as a weighted GEE method, explicitly incorporates the mean model and the working covariance model of GEE. In contrast, the single-filter and multi-filter methods are independent of the model assumptions of GEE (Jones et al., 2005; Poline and Mazoyer, 1994; Zhao et al., 2012; T.Ball et al., 2012; Siegmund and Worsley, 1995). Thus, the smoothed imaging data may not follow the assumed model assumptions of GEE, which can lead to biases in the analysis of longitudinal neuroimaging data. Such biases can be substantial for all kinds of neuroimaging data from longitudinal, twin, and familial studies. See Li et al. (2012) for detailed discussions of the biases introduced by directly smoothing twin imaging data. The same discussion is valid for longitudinal neuroimaging data when either the nonlinear mean model is valid or the covariance model is the primary quantity of interest. Therefore, one should interpret the statistical findings obtained from the voxel-based analysis of directly smoothed longitudinal neuroimaging data with great caution.

We have used both simulations and real imaging data to demonstrate that MAGEE outperforms the voxel-wise GEE method coupled with a single Gaussian kernel. As shown in our simulations, the kernel size of the single Gaussian kernel can have a substantial impact on brain mapping results. Specifically, as the kernel width varies from 1 to 8 mm, the dice overlap ratio and the type I and type II error rates of voxel-wise GEE change substantially. As shown in the first real imaging data set, the use of a single smoothing kernel can lead to false positive findings of significant diffusion property changes in the cerebrospinal fluid and gray matter areas. Our results are also consistent with previous results on single-filter and multi-filter analyses for various neuroimaging data, including positron emission tomography, diffusion tensor imaging, functional magnetic resonance imaging, and cortical morphometry (Jones et al., 2005; Poline and Mazoyer, 1994; Zhao et al., 2012; T.Ball et al., 2012; Siegmund and Worsley, 1995). As shown in the second real imaging data set, the commonly used Gaussian kernel for smoothing imaging data can introduce biases into the analysis of twin imaging data.

4·3. Future Works

Several important issues need to be addressed in future research. Firstly, the MAGEE procedure is mainly powerful for detecting relatively large effect regions that are smooth in their interior and consistent across subjects. In practice, however, the extent and location of effect regions may vary dramatically across subjects due to both registration error and population heterogeneity. Therefore, it is important and interesting to model population heterogeneity, while accounting for registration error. Secondly, although we focus on parametric growth curves, it is interesting to develop more flexible nonparametric growth curve models, which are important for sparsely and irregularly observed longitudinal studies. Developing multiscale adaptive methods for such nonparametric models faces many new computational and theoretical challenges. Thirdly, more research is needed to optimize the choices of the parameters in MAGEE and to incorporate other edge-preserving local smoothing methods into MAGEE (Qiu, 2005; Qiu and Mukherjee, 2010; Mukherjee and Qiu, 2011). Fourthly, we will extend MAGEE from simple longitudinal studies to more complex longitudinal twin/familial studies. Fifthly, it is interesting and important to treat the whole image as a single piecewise smooth function, instead of a collection of isolated voxels, and then develop new statistical models to directly model such functions from cross-sectional and longitudinal studies (Zhu et al., 2011; Ramsay and Silverman, 2005; Yao et al., 2005; Greven et al., 2010).

Highlights

  • Develop MAGEE for longitudinal neuroimaging data.
  • Characterize the development of white matter diffusivities.
  • Evaluate the finite sample performance of MAGEE.
  • MAGEE significantly outperforms VBA.

Appendix A. Some Parametric Models for Mean and Covariance of Yi

We may consider other parametric models for mean structure. The μ(xij, β) of the logistic and Gompertz models are, respectively, given by

$\mu(x_{ij}, \beta) = \frac{\beta_1}{1 + \beta_2\exp(-\beta_3 t_{ij})} \quad \text{and} \quad \mu(x_{ij}, \beta) = \beta_1\exp\{-\beta_2\exp(-\beta_3 t_{ij})\},$
(14)

where xij = (tij) and β = (β1, β2, β3)T. Fig. 7 plots sample growth curves based on the quadratic growth model in (2) for the female and male groups and the two other nonlinear growth models in (14).

Fig. 7
The growth curve patterns for different models. Red: Gompertz growth model with μ(x, β) = 3.6 exp(−2 exp(−2t)); Blue: Quadratic growth model for females with μ(x, β) = 0.1 + 0.8t − 0.05t2; Magenta: ...

We may consider other parametric models for the covariance structure. See (Diggle et al., 2002; Pourahmadi, 2011; Zimmerman and Nunez-Anton, 2001) for overviews of different parametric models for covariance structure. For instance, one may consider a heteroscedastic model for the variances of yi(tij). Specifically, one may assume that var(yi(tij)) takes the form σy(xij, γ)2, where σy(·, ·) is a known function of xij and γ. For instance, it is common to use the log-linear model and the power-of-mean model, which correspond to $\sigma_y(x_{ij}, \gamma)^2 = \exp(\gamma^T x_{ij})$ and $\sigma_y(x_{ij}, \gamma)^2 = \gamma_1\,\mu(x_{ij}, \beta)^{\gamma_2}$, respectively. One may use other parametric models to model the correlation structure of yi(tij). For instance, one may assume a stationary correlation structure as follows:

corr(yi(tij), yi(tik)) = ρ(|tij − tik|),
(15)

where ρ(u) is a known function of u. Two popular choices of ρ(u) are the exponential correlation model and the Gaussian correlation model, which are, respectively, given by

ρ(u) = exp(−γ1u)   and   ρ(u) = exp(−γ1u²),
(16)

for some γ1 > 0. Based on these more general forms of covariance structure, one can derive the explicit form of the covariance of Yi in (3).
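As a minimal sketch of how these covariance pieces fit together, the code below assembles a subject-level working covariance matrix from the power-of-mean variance model and the exponential correlation model; it is an illustration under these particular choices, not the exact parameterization of (3), and all names are illustrative.

```python
import numpy as np

def power_of_mean_var(mu, gamma):
    """Power-of-mean variance: var(y) = gamma1 * mu**gamma2."""
    g1, g2 = gamma
    return g1 * mu ** g2

def exponential_corr(times, gamma1):
    """Stationary exponential correlation: rho(u) = exp(-gamma1 * |u|)."""
    u = np.abs(np.subtract.outer(times, times))
    return np.exp(-gamma1 * u)

def working_covariance(mu, times, gamma_var, gamma_corr):
    """V_i = diag(sigma) R diag(sigma), with sigma^2 from the variance model."""
    sigma = np.sqrt(power_of_mean_var(mu, gamma_var))
    R = exponential_corr(times, gamma_corr)
    return np.outer(sigma, sigma) * R

# Example: a subject scanned at three time points, with an illustrative linear mean trajectory.
times = np.array([0.0, 1.0, 2.0])
mu = 0.1 + 0.8 * times
V = working_covariance(mu, times, gamma_var=(0.5, 1.0), gamma_corr=0.7)
print(V)
```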

Appendix B. Methods for Estimating (α, γ)

To estimate (α, γ), one may use different statistical methods, such as generalized estimating equations, moment estimates, and a quasi-least squares method based on the assumed covariance form (3). For instance, consider the generalized estimating equations method based on Vecs([Yi − μi(β)][Yi − μi(β)]T), where Vecs(A) denotes the half-vectorization of a symmetric matrix A. We define

equation M31
(17)

In many cases, the dimension of (α, γ) is small, say 3 or 4, and (17) has a simple form. Thus, computing (α̂, γ̂) is straightforward. In most statistical software platforms, it is also common to implement moment estimates of (α, γ). For instance, for the equicorrelated structure, the GEE moment estimates of (α, γ) are given by

equation M32

For other covariance structures, the exact expressions of the moment estimates of (α, γ) can be found in (Ratcliffe and Shults, 2008).
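Since the exact moment estimators are not reproduced here, the following sketch illustrates the idea for an equicorrelated (exchangeable) structure: the common variance is estimated by pooling squared residuals, and the common correlation by pooling products of residuals over all within-subject pairs. The function and variable names are illustrative.

```python
import numpy as np

def exchangeable_moment_estimates(residuals):
    """Moment estimates of (variance, common correlation) for an equicorrelated structure.

    residuals: list of 1-D arrays, one array of within-subject residuals y_i - mu_i per subject.
    """
    # Pooled variance estimate from all residuals.
    all_res = np.concatenate(residuals)
    sigma2 = np.mean(all_res ** 2)

    # Pooled correlation estimate from all distinct within-subject pairs (j < k).
    num, n_pairs = 0.0, 0
    for r in residuals:
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                num += r[j] * r[k]
                n_pairs += 1
    alpha = num / (n_pairs * sigma2)
    return sigma2, alpha

# Example with two subjects observed at three and two time points, respectively.
res = [np.array([0.3, 0.2, 0.4]), np.array([-0.5, -0.3])]
print(exchangeable_moment_estimates(res))
```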

Appendix C. Asymptotic covariance matrix of β̂(d, h)

We derive an approximation of the covariance matrix of β̂I(d, h) as follows. It should be noted that Gn(βI(d); ω, h) contains two sets of nuisance parameters: {β̂N(d′) : d′ ∈ B(d, h)} and {(α̂(d′), γ̂(d′)) : d′ ∈ B(d, h)}. As shown below, {(α̂(d′), γ̂(d′)) : d′ ∈ B(d, h)} have negligible effects on the asymptotic distribution of β̂I(d, h), whereas {β̂N(d′) : d′ ∈ B(d, h)} do.

We need to introduce some notation. Let θN(d) = (βN(d), α(d), γ(d)) be the vector of all nuisance parameters and θ*(d) = (β*(d), α*(d), γ*(d)) be the true value of θ(d). Let op(1) denote a sequence of random vectors that converges to zero in probability. Let 0qN×qI and 0qI×qN be, respectively, a qN × qI matrix of zeros and a qI × qN matrix of zeros, and let IqN and IqI be, respectively, a qN × qN identity matrix and a qI × qI identity matrix. Since Gn(βI(d); ω, h) defined in (7) is also a function of θ̂N(d′; B(d, h)) = {θ̂N(d′) : d′ ∈ B(d, h)}, we make this fact explicit by writing

equation M33

We define

equation M34
(18)

in which we have partitioned F*(d)−1 according to the partition of β(d) = (βI (d), βN (d)).

The derivations consist of five steps as follows.

  • Step 1 is to derive an asymptotic expansion of Gn(β̂I(d, h); ω, h). By using a Taylor series expansion, we have
    equation M35
    (19)
    equation M36
    (20)
    where θN*(d) and θN*(d′; B(d, h)) are, respectively, the true values of θN(d) and θN(d′; B(d, h)).
  • Step 2 is to characterize the asymptotic properties of the first-order derivative of Gn(βI*(d), θN*(d′; B(d, h)); ω, h) with respect to θN(d′). It follows from the law of large numbers and (1) that
    equation M37
    (21)
    equation M38
    (22)
    equation M39
    (23)
    where ≈ denotes equality up to an op(1) term.
  • Step 3 is to characterize the asymptotic expansion of equation M40. Substituting (23) into (19) leads to
    equation M41
    (24)
    By using (21)–(22) and (24), we have
    equation M42
    (25)
  • Step 4 is to characterize the asymptotic expansion of equation M43. Since β̂N(d′) is a subvector of β̂(d′) and β̂(d′) is the solution of (5), it follows from the asymptotic results in (Liang and Zeger, 1986) that for any voxel d, equation M44 is asymptotically equivalent to
    equation M45
    (26)
    where ei(d′; β(d)) = Yi(d′) − μi(β(d)) for any voxel d′ in the brain region.
  • Step 5 is to characterize the asymptotic expansion of equation M46. By substituting (26) into (25), we have
    equation M47
    (27)
    where S(d′; d, h) is given by
    equation M48
    Finally, it follows from (27) that the covariance matrix of equation M49, denoted by equation M50, can be approximated by
    equation M51
    (28)

Appendix D. The PS procedure for MAGEE-A

The PS procedure for MAGEE-A has five key steps: initialization, weights adaptation, estimation, stop checking, and inference.

  • In the initialization step (i), we calculate the voxel-wise GEE estimate θ̂(d) at each voxel d in the brain region. Then, we prefix a geometric series {hs = ch^s : s = 1, …, S} of radii with h0 = 0, where ch ∈ (1, 2), say ch = 1.15. We suggest a relatively small ch, since a small ch prevents incorporating too many neighboring voxels at the beginning of PS, which improves the robustness of PS and the accuracy of parameter estimation. We then set s = 1 and h1 = ch.
  • In the weights adaptation step (ii), we compute DβI(d, d′; hs−1) and the adaptive weights ω(d, d′; hs), which are defined as
    equation M53
    (29)
    where ΔβI(d, d′; hs−1) = β̂I(d, hs−1) − β̂I(d′, hs−1), Kloc(u) and Kst(u) are two nonnegative kernel functions with compact support, Cn is a number associated with n, and ||·||2 denotes the Euclidean norm of a vector (or a matrix). The weights Kloc(||d − d′||2/hs) give less weight to voxels d′ ∈ B(d, hs) whose locations are far from the voxel d. The weights Kst(u) downweight voxels d′ with large differences between β̂I(d′, hs−1) and β̂I(d, hs−1). In practice, we set Kloc(u) = (1 − u)+ in order to increase the smoothness of the images of β̂I(d, h) across the brain region, whereas we choose Kst(u) = exp(−u) in order to heavily penalize moderate to large differences. We choose Cn = n^{1/3}χ2(qI)_{0.8} based on our experiments, where χ2(qI)_a is the upper 1 − a percentile of the χ2(qI) distribution. A schematic sketch of this weight computation is given after this list.
  • In the estimation step (iii), for the radius hs, we substitute ω(d, d′; hs) into (7) to calculate β̂I(d, hs) by using the Newton–Raphson algorithm at each voxel d in the brain region.
  • In the stop checking step (iv), for s > S0, we compute
    equation M54
    where ΔβI(d; hs, hS0) = β̂I(d, hS0) − β̂I(d, hs). If DI(d; hs, hS0) ≥ C(s), then we set β̂I(d, hS) = β̂I(d, hs−1) and s = S. If s = S, we go to the inference step (v). If s ≤ S0, or if DI(d; hs, hS0) < C(s) for S − 1 ≥ s > S0, then we set hs+1 = ch·hs, increase s by 1, and continue with the weights adaptation step (ii). Throughout the paper, we set S0 = 3 and C(s) = χ2(qI)_{0.8}/(s − 2)^{0.9}.
  • In the inference step (v), when s = S, we report the final β̂I(d, hS) and compute the p-values for Wβ(d, hS). In practice, we usually set the maximal step S to be relatively small, say 10, so that each B(d, h10) only contains a relatively small number of voxels compared to the whole volume. Throughout the paper, we have used the false discovery rate (FDR) method in (Benjamini and Yekutieli, 2001), since the test statistics obtained from PS satisfy the positive dependency condition. One may use other multiple comparison correction methods (e.g., random field theory (RFT) or permutation methods) (Benjamini and Yekutieli, 2001; Worsley et al., 2004; Chumbley et al., 2010; Salimi-Khorshidi et al., 2011).
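The following is a schematic sketch of the radius sequence in step (i) and the adaptive weights in step (ii). The similarity statistic used here is a simple scaled squared difference of the current estimates, standing in for DβI in (29), whose exact form is not reproduced in this appendix; all function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def radii(ch=1.15, S=10):
    """Geometric sequence of radii h_s = ch**s for s = 1, ..., S (with h_0 = 0)."""
    return [0.0] + [ch ** s for s in range(1, S + 1)]

def k_loc(u):
    """Location kernel K_loc(u) = (1 - u)_+ with compact support."""
    return np.maximum(1.0 - u, 0.0)

def k_st(u):
    """Similarity kernel K_st(u) = exp(-u), heavily penalizing large differences."""
    return np.exp(-u)

def adaptive_weights(coords, beta_prev, d, h, n, q_I):
    """Weights omega(d, d'; h) for all voxels d' in the ball B(d, h).

    coords:    (n_vox, 3) voxel coordinates
    beta_prev: (n_vox, q_I) estimates of beta_I from the previous radius h_{s-1}
    d:         index of the center voxel
    """
    C_n = n ** (1.0 / 3.0) * chi2.ppf(0.8, df=q_I)   # C_n = n^{1/3} * chi^2(q_I)_{0.8}
    dist = np.linalg.norm(coords - coords[d], axis=1)
    # Placeholder similarity statistic: scaled squared Euclidean difference of estimates.
    diff = np.sum((beta_prev - beta_prev[d]) ** 2, axis=1)
    w = k_loc(dist / h) * k_st(diff / C_n)
    w[dist > h] = 0.0                                  # restrict to the ball B(d, h)
    return w

# Toy example: 5 voxels on a line, a scalar beta_I, and n = 40 subjects.
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]], float)
beta_prev = np.array([[0.5], [0.52], [0.48], [1.2], [0.51]])
print(adaptive_weights(coords, beta_prev, d=0, h=radii()[2], n=40, q_I=1))
```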

Footnotes

1This work was partially supported by NIH grants R01ES17240, MH091645, U54 EB005149, P30 HD03110, RR025747-01, P01CA142538-01, MH086633, AG033387, MH064065, HD053000, and MH070890. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.


References

  • Almli CR, Rivkin MJ, McKinstry RC. BDC Group. The nih mri study of normal brain development (objective-2): newborns, infants, toddlers, and preschoolers. IEEE Transactions on Medical Imaging. 2007;35:308–325. [PubMed]
  • Arsigny V, Fillard P, Pennec X, Ayache N. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine. 2006;56:411–421. [PubMed]
  • Basser PJ, Mattiello J, LeBihan D. Estimation of the effective self-diffusion tensor from the nmr spin echo. Journal of Magnetic Resonance Ser B. 1994;103:247–254. [PubMed]
  • Bates D, Maechler M, Bolker B. lme4: Linear mixed-effects models using s4 classes. R package-42 2011
  • Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics. 2001;29:1165–1188.
  • Bernal-Rusiel J, Greve D, Reuter M, Fischl B, Sabuncu MR. Statistical analysis of longitudinal neuroimage data with linear mixed effects models. NeuroImage. 2013;66:249–260. [PMC free article] [PubMed]
  • Chumbley J, Worsley KJ, Flandin G, Friston KJ. Topological fdr for neuroimaging. Neuroimage. 2010;49 (4):3057–3064. [PMC free article] [PubMed]
  • Chung MK, Robbins S, Dalton KM, Davidson RJ, Alexander AL, Evans AC. Cortical thickness analysis in autism via heat kernel smoothing. NeuroImage. 2005;25:1256–1265. [PubMed]
  • Diggle P, Heagerty P, Liang KY, Zeger S. Analysis of Longitudinal Data. 2. Oxford University Press; New York: 2002.
  • Evans AC. BDC Group. The nih mri study of normal brain development. NeuroImage. 2006;30:184–202. [PubMed]
  • Feng R, Zhou G, Zhang M, Zhang H. Analysis of twin data using sas. Biometrics. 2009;65:584–589. [PMC free article] [PubMed]
  • Fitzmaurice G, Laird N, Ware J. Applied Longitudinal Analysis. John Wiley and Sons; New York: 2004.
  • Gilmore JH, Schmitt JE, Knickmeyer RA, Smithm JK, Lin W, Styner M, Gerig G, Neale MC. Genetic and environmental contributions to neonatal brain structure: a twin study. Human Brain Mapping. 2010;31:1174–1182. [PMC free article] [PubMed]
  • Goodlett CB, Fletcher PT, Gilmore JH, Gerig G. Group analysis of dti fiber tract statistics with application to neurodevelopment. NeuroImage. 2009;45:S133–S142. [PMC free article] [PubMed]
  • Greven S, Crainiceanu C, Caffo B, Reich B. Longitudinal functional principal components analysis. Electronic Journal of Statistics. 2010;4:1022–1054. [PMC free article] [PubMed]
  • Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test. 2009;18:1–43. [PMC free article] [PubMed]
  • Jones DK, Symms MR, Cercignani M, Howard RJ. The effect of filter size on vbm analyses of dt-mri data. NeuroImage. 2005;26:546–554. [PubMed]
  • Kim P, Leckman JF, Mayes L, Feldman R, Wang X, Swain JE. The plasticity of human maternal brain: longitudinal changes in brain anatomy during the early postpartum period. Behavioral Neuroscience. 2010;124:695–700. [PubMed]
  • Li Y, Gilmore J, Wang J, Styner M, Lin WL, Zhu H. Twin-marm: two-stage spatial adaptive analysis of twin neuroimaging data. IEEE Transactions on Medical Imaging. 2012;31:1100–1112. [PMC free article] [PubMed]
  • Li Y, Zhu H, Shen D, Lin W, Gilmore JH, Ibrahim JG. Multiscale adaptive regression models for neuroimaging data. Journal of the Royal Statistical Society: Series B. 2011;73:559–578. [PMC free article] [PubMed]
  • Liang K, Zeger S. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  • Meltzer JA, Postman-Caucheteux W, McArdle JJ, Braun A. Strategies for longitudinal neuroimaging studies of overt language production. Neuroimage. 2009;47:745–755. [PMC free article] [PubMed]
  • Mukherjee P, McKinstry RC. Diffusion tensor imaging and tractography of human brain development. Neuroimaging Clin N Am. 2006;16:19–43. [PubMed]
  • Mukherjee PS, Qiu P. 3-d image denoising by local smoothing and nonparametric regression. Technometrics. 2011;53:196–208.
  • Neil JJ, Shiran SI, McKinstry RC, Schefft GL, Snyder AZ, Almli CR, Akbudak E, Aronovitz JA, Miller JP, Lee BC, Conturo TE. Normal brain in human newborns: apparent diffusion coefficient and diffusion anisotropy measured by using diffusion tensor mr imaging. Radiology. 1998;209:57–66. [PubMed]
  • Nichols T, Waldorp L. Accurate and computationally efficient analysis of longitudinal fmri data. 17th Annual Meeting of the Organization for Human Brain Mapping; Barcelona. 2010.
  • Petersen R, Aisen P, Beckett L, Donohue M, Gamst A, Harvey D, Jack C, Jagust W, Shaw L, Toga A, Trojanowski J, Weiner M. Alzheimer’s disease neuroimaging initiative (ADNI): clinical characterization. Neurology. 2010;74:201–209. [PMC free article] [PubMed]
  • Pinheiro J, Bates D, DebRoy S, Sarkar D. the R Development Core team. nlme: linear and nonlinear mixed effects models. R package version 3.1 2011
  • Poline J, Mazoyer B. Enhanced detection in brain activation maps using a multifiltering approach. J Cereb Blood Flow Metab. 1994;14:639–642. [PubMed]
  • Polzehl J, Spokoiny VG. Adaptive weights smoothing with applications to image restoration. J R Statist Soc B. 2000;62:335–354.
  • Polzehl J, Spokoiny VG. Propagation-separation approach for local likelihood estimation. Probab Theory Relat Fields. 2006;135:335–362.
  • Polzehl J, Voss HU, Tabelow K. Structural adaptive segmentation for statistical parametric mapping. Neuroimage. 2010;52:515–523. [PubMed]
  • Pourahmadi M. Modeling covariance matrices: The glm and regularization perspectives. Statistical Science. 2011;26:369–387.
  • Qiu P. Image Processing and Jump Regression Analysis. New York: John Wiley & Sons; 2005.
  • Qiu P, Mukherjee PS. Edge structure preserving image denoising. Signal Processing. 2010;90:2851–2862.
  • Ramsay JO, Silverman BW. Springer Series in Statistics. 2. Springer; New York: 2005. Functional data analysis.
  • Ratcliffe S, Shults J. Geeqbox: A matlab toolbox for generalized estimating equations and quasi-least squares. Journal of Statistical Software. 2008;25:1–14.
  • Salimi-Khorshidi G, Smith S, Nichols TE. Adjusting the effect of nonstationarity in cluster-based and tfce inference. Neuroimage. 2011;54 (3):2006–2019. [PubMed]
  • Schneider J, Il’yasov K, Hennig J, Martin E. Fast quantitative diffusion-tensor imaging of cerebral white matter from the neonatal period to adolescence. Neuroradiology. 2004;46:258–266. [PubMed]
  • Siegmund DO, Worsley KJ. Testing for a signal with unknown location and scale in a stationary gaussian random field. Ann Statist. 1995;23:608–639.
  • Silver M, Montana G, Nichols TE, ADNI. False positives in neuroimaging genetics using voxel-based morphometry data. NeuroImage. 2011;54:992–1000. [PMC free article] [PubMed]
  • Skup M, Zhu H, Zhang H. Multiscale adaptive marginal analysis of longitudinal neuroimaging data with time-varying covariates. Biometrics. 2012 in press. [PMC free article] [PubMed]
  • Skup M, Zhu HT, Wang Y, Giovanello KS, Lin JA, Shen DG, Shi F, Gao W, Lin W, Fan Y, Zhang HP, ADNI Sex differences in grey matter atrophy patterns among ad and amci patients: Results from adni. NeuroImage. 2011;56:890–906. [PMC free article] [PubMed]
  • Tabelow K, Polzehl J, Spokoiny V, Voss HU. Diffusion tensor imaging: structural adaptive smoothing. NeuroImage. 2008;39:1763–1773. [PubMed]
  • Tabelow K, Polzehl J, Voss HU, Spokoiny V. Analyzing fmri experiments with structural adaptive smoothing procedures. NeuroImage. 2006;33:55–62. [PubMed]
  • Ball T, Breckel TP, Mutschler I, Aertsen A, Schulze-Bonhage A, Hennig J, Speck O. Variability of fmri-response patterns at different spatial observation scales. Human Brain Mapping. 2012;33:1155–1171. [PubMed]
  • Wang N. Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika. 2003;90:43–52.
  • Wang Y. Smoothing spline models with correlated random errors. Journal of the American Statistical Association. 1998;93 (441):341–348.
  • Worsley KJ, Taylor JE, Tomaiuolo F, Lerch J. Unified univariate and multivariate random field theory. NeuroImage. 2004;23:189–195. [PubMed]
  • Wu HL, Zhang JT. Nonparametric Regression Methods for Longitudinal Data Analysis. John Wiley & Sons, Inc; Hoboken, New Jersey: 2006.
  • Yao F, Müller H-G, Wang J-L. Functional linear regression analysis for longitudinal data. Ann Statist. 2005;33(6):2873–2903.
  • Yap P, Wu G, Zhu H, Lin W, Shen D. Timer: tensor image morphing for elastic registration. NeuroImage. 2009;47:549–563. [PMC free article] [PubMed]
  • Zhang T, Davatzikos C. Odvba: optimally-discriminative voxel-based analysis. IEEE Transactions on Medical Imaging. 2011;30:1441–1454. [PMC free article] [PubMed]
  • Zhao L, Boucher M, Rosa-Neto P, Evans A. Impact of scale space search on age- and gender-related changes in mri-based cortical morphometry. Human Brain Mapping. 2012 in press. [PubMed]
  • Zhu H, Kong L, Li R, Styner M, Gerig G, Lin W, Gilmore JH. Fadtts: functional analysis of diffusion tensor tract statistics. NeuroImage. 2011;56:1412–1425. [PMC free article] [PubMed]
  • Zhu HT, Li YM, Ibrahim JG, Lin WL, Shen DG. Marm: Multiscale adaptive regression models for neuroimaging data. Information Processing in Medical Imaging. 2009;5636:314–325. [PubMed]
  • Zhu HT, Zhang HP, Ibrahim JG, Peterson BG. Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance image data (with discussion). Journal of the American Statistical Association. 2007;102:1085–1102.
  • Zimmerman D, Nunez-Anton V. Parametric modeling of growth curve data: an overview (with discussions). Test. 2001;10:1–73.