Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC4812165

Formats

Article sections

- Summary
- 1. Introduction
- 2. Two Sample Testing for Functional Data
- 3. Extension to Practical Situations
- 4. Simulation Studies
- 5. Diffusion Tensor Image Data Analysis
- 6. Discussion
- Supplementary Material
- References

Authors

Related links

J R Stat Soc Ser C Appl Stat. Author manuscript; available in PMC 2017 April 1.

Published in final edited form as:

Published online 2016 January 9. doi: 10.1111/rssc.12130

PMCID: PMC4812165

NIHMSID: NIHMS723553

The publisher's final edited version of this article is available at J R Stat Soc Ser C Appl Stat

See other articles in PMC that cite the published article.

Motivated by an imaging study, this paper develops a nonparametric testing procedure for testing the null hypothesis that two samples of curves observed at discrete grids and with noise have the same underlying distribution. The objective is to formally compare white matter tract profiles between healthy individuals and multiple sclerosis patients, as assessed by conventional diffusion tensor imaging measures. We propose to decompose the curves using functional principal component analysis of a mixture process, which we refer to as *marginal functional principal component analysis*. This approach reduces the dimension of the testing problem in a way that enables the use of traditional nonparametric univariate testing procedures. The procedure is computationally efficient and accommodates different sampling designs. Numerical studies are presented to validate the size and power properties of the test in many realistic scenarios. In these cases, the proposed test has been found to be more powerful than its primary competitor. Application to the diffusion tensor imaging data reveals that all the tracts studied are associated with multiple sclerosis and the choice of the diffusion tensor image measurement is important when assessing axonal disruption.

In this paper, we consider testing the hypothesis that two or multiple samples of curves have the same distribution, when the observed data are realizations of the curves at finite grids and possibly corrupted by noise. The motivating application is a diffusion tensor imaging study, where the objective is to formally assess whether several imaging modalities vary differently between patients with multiple sclerosis and healthy controls. We address this problem by introducing a novel framework based on functional principal component analysis of an appropriate mixture process, which we refer to as marginal functional principal component analysis (marginal FPCA). We develop a computationally stable and theoretically valid procedure. Our approach is applicable to a variety of realistic scenarios, including 1) curves observed at dense or sparse grids of points, with or without measurement error; 2) different sampling designs for different samples; and 3) different sample sizes. The methodology scales well with the total sample size and it can be extended to test for the equality of distributions of more than two samples of curves.

Multiple sclerosis (MS) is a disease that affects the central nervous system and in particular damages white matter tracts in the brain through lesions, myelin loss, and axonal damage. One of the recent approaches to visualize the white matter tracts is the use of diffusion tensor imaging (DTI), a magnetic resonance imaging technique that measures water diffusivity in the brain. In the brain, water diffuses isotropically in both gray matter and cerebrospinal fluid and anisotropically in the white matter regions, which makes DTI an ideal approach to study the white matter tracts (Goldsmith et al. (2012)). There are several well identified white matter tracts in the brain: right/left corticospinal tract (rCST, lCST), corpus callosum (CCA), and right/left optic radiations tract (rOPR, lOPR). Along each tract, there are several measures that DTI provides. Two of these measures (referred to as modalities) that are most commonly used are fractional anisotropy (FA) and parallel diffusivity (L0), which characterize the tissue microstructure. Briefly, FA describes the degree of anisotropy of the water diffusion process while LO measures the diffusivity along the principal axis. In this paper, we focus on continuous summaries of these measures of water diffusivity, as parameterized by distance along the tract and refer to them as tract-specific modality profiles.

The DTI study comprises 162 subjects with MS and 42 healthy controls observed at multiple hospital visits. Details of this study have been described in Staicu et al. (2012); Gertheiss et al. (2013a), Gertheiss et al. (2013b); they addressed some of the scientific questions by focusing on specific aspects of the data. Our objective is to study whether the water diffusivity, as measured by the conventional FA or LO, along each of several well identified white matter tracts varies between MS patients and controls. In this work, we consider the data collected at the patients' baseline visit: Figure 1 displays the two DTI measures, FA and L0, along the CCA, rCST, lCST, and lOPR tracts for all the subjects in the study. For example the top left plot shows the FA profile along the CCA tract for the MS patients. Each line corresponds to an MS patient and is obtained by connecting the patient's FA at the 93 locations along the CCA. A subject's data can be viewed as arising from a subject-specific latent smooth function that is evaluated at a grid of locations and the evaluations are corrupted by noise. It is this latent function that characterizes the FA of the water diffusivity along the CCA for each subject. The interest is to formally assess whether the profiles of the true FA or L0 of water diffusivity along each of the five tracts vary differently between the MS patients and healthy controls. The result of this investigation has the potential to shed new light onto our understanding of the neurodegeneration associated with this disease. In particular, it facilitates the identification of the white matter tracts that are likely targeted by MS, as well as the modalities that capture such neurodegeneration.

Top: Fractional anisotropy (left) and parallel diffusivity (right) for MS cases and healthy controls. The pointwise means are displayed by the solid black lines.

In studies involving multiple groups, formally assessing whether the distribution of the ‘characteristics’ of interest is the same across the groups is the first step of the analysis. In a recent study of the neuronal electrophysiological properties of rodents of different sexes, Dorris et al. (2014) compare the distribution of each of several characteristics of interest across the two gender groups. For scalar or vector ‘characteristics’, there is extensive literature on this topic. However, this problem is beginning to attract increasing interest when the ‘characteristics’ are curves that are observed with error, at finite grids of points. Hall and Van Keilegom (2007) and Corain et al. (2014) have considered this problem when the sampling design is dense, e.g. the curves are observed at very fine grids of points. To the best of the authors' knowledge no methodology exists outside of this case. In the DTI study, although both FA and L0 are sampled on a regular grid, some subjects have missing data, with the percentage of missing values ranging up to 22%. This sampling design does not fall under the situations studied by the existing methods. In this paper, we propose testing methodology that is applicable to noisy curves observed under both dense and sparse sampling designs.

While our research is motivated by the DTI study, it is increasingly common to observe curves that are separated into groups and to test whether the distribution of the curves is the same across the groups. One example is studied by Annette Moller et al. (2015), where the interest is to test whether the Raman spectra of boar-tainted or non boar-tainted mean samples have the same distribution. Another example is a recent study of activity in cats suffering from degenerative joint disease where the interest is to test whether the minute-by-minute activity profiles of the cats who receive treatment vary differently compared to those who receive placebo.

In this section, we formally describe the statistical framework for this problem. Suppose we observe data arising from two groups,
${\{({t}_{1ij},{Y}_{1ij}):j=1,\dots ,{m}_{1i}\}}_{i=1}^{{n}_{1}}$ and
${\{({t}_{2ij},{Y}_{2ij}):j=1,\dots ,{m}_{2i}\}}_{i=1}^{{n}_{2}}$, where *t*_{1ij}, *t*_{2ij}
, a compact interval; for simplicity we take
= [0, 1]. For example *Y*_{1ij} could be the FA at location *t*_{1ij} along the CCA for the *i*th MS patient, while *Y*_{2ij} could be the FA at location *t*_{2ij} along the CCA for the *i*th healthy control. The notation of the time-points, *t*_{1ij} and *t*_{2ij}, allows for different observation points in the two groups. It is assumed that the *Y*_{1ij}'s and the *Y*_{2ij}'s are independent realizations of two underlying (stochastic) processes observed with noise on a finite grid of points. Specifically, assume

$${Y}_{1ij}={X}_{1i}({t}_{1ij})+{\epsilon}_{1ij},\phantom{\rule{0.5em}{0ex}}\text{and}\phantom{\rule{0.5em}{0ex}}{Y}_{2ij}={X}_{2i}({t}_{2ij})+{\epsilon}_{2ij},$$

(1)

where
${X}_{1i}(\xb7)\stackrel{\mathit{\text{iid}}}{~}{X}_{1}(\xb7)\phantom{\rule{0.5em}{0ex}}\text{and}\phantom{\rule{0.5em}{0ex}}{X}_{2i}(\xb7)\stackrel{\mathit{\text{iid}}}{~}{X}_{2}(\xb7)$ are independent and square-integrable random functions over
, for some underlying (latent) random processes *X*_{1}(·) and *X*_{2}(·). In the DTI example, *X*_{1i}(·) and *X*_{2i}(·) would be the true latent smooth FA along the CCA for the *i*th MS patient and *i*th control, respectively. It is assumed that *X*_{1}(·) and *X*_{2}(·) are second order stochastic processes with mean functions assumed to be continuous and covariance functions assumed to be continuous and positive semi-definite, both being unknown. The measurement errors, {*ε*_{1ij}} and {*ε*_{2ij}}, are independent and identically distributed with mean zero, and with variances
${\sigma}_{1}^{2}$ and
${\sigma}_{1}^{2}$ respectively, and are assumed independent of *X*_{1i}(·) and *X*_{2i}(·). Our objective is to test the null hypothesis,

$${H}_{0}:{X}_{1}(\xb7)\stackrel{d}{=}{X}_{2}(\xb7)$$

(2)

versus the alternative ${H}_{A}:{X}_{1}(\xb7)\stackrel{d}{\ne}{X}_{2}(\xb7),\text{where}\phantom{\rule{0.3em}{0ex}}\u201c\stackrel{d}{=}\u201d$ denotes that the processes on either side have the same distribution. In this paper we develop a non-parametric and computationally stable method to test the null hypothesis in (2).

Since *X*_{1}(·) and *X*_{2}(·) are processes defined over a continuum, hypothesis (2) implies that the two infinite dimensional objects have the same generating distribution. This is different from the two sample testing in a multivariate framework, where the dimension of the random objects of interest is finite and the same. In the case where the sampling design is common to all the subjects (i.e. *t*_{1ij} = *t*_{2ij} = *t _{j}* and

Two sample hypothesis testing for functional data has been considered in many contexts; ranging from testing for specific types of differences, such as differences in the mean or covariance functions, to testing for overall differences in the cumulative density functions. To detect differences in the mean functions of two independent samples of curves Ramsay and Silverman (2005) introduced a pointwise t-test, Zhang et al. (2010) presented an *L*^{2}–norm based test, Horváth et al. (2013) proposed a test based on the sample means of the curves, and Staicu et al. (2014) developed a pseudo likelihood ratio test. Extension to *k* independent samples of curves was discussed in Cuevas et al. (2004), Estévez-Pérez and Vilar (2008), and Laukaitis and Račkauskas (2005), who proposed ANOVA-like testing procedures for testing the equality of mean functions. For detecting differences in the covariance functions of independent samples of curves Ferraty et al. (2007) proposed a factor-based test, Kraus and Panaretos (2012) introduced a regularized M-test, and Fremdt et al. (2012) proposed a chi-squared test.

Literature on testing the equality of the distributions of two sets of curves observed at a discrete grid of points and possibly with error is rather scarce. Corain et al. (2014) proposed a permutation testing framework for comparing two sets of functions. However, their method requires the functions to be observed on a regularly spaced grid of points. Thus, pre-smoothing would be required to apply this test to the DTI data; such extension has not been studied. Hall and Van Keilegom (2007) investigated the effect of pre-smoothing on testing procedures. They proposed an extension of the multivariate Cramer-von Mises (CVM) test and use of bootstrapping to approximate the null distribution of the test. These methods are again developed for noisy functional data observed on dense designs.

Benko et al. (2009) attempted to address this problem by considering a common functional principal components model for the two samples and testing for the equality of the corresponding model components using bootstrapping. In the proposed form, this test does not account for multiple comparisons and is not directly applicable to our general testing problem. We use their proposed bootstrap test in the DTI application for testing the equality of the mean profiles of the MS patients and healthy controls. The testing procedure is compared with our proposed test using simulated data and the results are presented in the appendix. The use of bootstrap techniques, while advantageous because they do not rely on specific distributional assumptions, comes with the price of a large computational burden. Therefore, it is often numerically unfeasible to perform extensive empirical power analysis when the sample size is moderately large.

In this paper, we propose an approach based on the so-called marginal FPCA, which facilitates representation of the curves using the marginal eigenbasis. This reduces the original infinite dimensional two-sample functional testing problem to an approximate simpler finite dimensional testing problem. The methodology is applied using the two-sample Anderson-Darling statistic (Pettitt, 1976); however, any other two-sample distribution-free tests can also be used. Our simulation results show that the method performs well, and in cases where the approach of Hall and Van Keilegom (2007) applies, our proposed test is considerably more powerful.

To begin with, we describe how to test hypothesis (2) under the assumption that the curves are observed entirely and without noise (Hall et al., 2006). Extension to practical settings is discussed in Section 3. Consider two sets of independent curves {*X*_{11}(·), …, *X*_{1n1}(·)} and {*X*_{21}(·), …, *X*_{2n2}(·)}, defined on [0, 1], where *X*_{1i}(·) ~ *X*_{1}(·) and *X*_{2i}(·) ~ *X*_{2}(·) are square integrable and have continuous mean and covariance functions respectively.

The large sample validity of our methodology is developed under the assumption that both *n*_{1}, *n _{2}* → ∞ such that lim

$${H}_{0}^{K}:{\{{\xi}_{1k}\}}_{k=1}^{K}\stackrel{d}{=}{\{{\xi}_{2k}\}}_{k=1}^{K},$$

(3)

where the superscript *K* in
${H}_{0}^{k}$ emphasizes the dependence of the reduced null hypothesis on the finite truncation *K*. See the Supplementary Material for further justification.

One possible approach to test hypothesis (3) is to consider two-sample multivariate procedures; see for example Wei and Lachin (1984), Schilling (1986) or Bohm and Zech (2010), Ch.10. For simplicity, we consider multiple two-sample univariate tests combined with a multiple comparison adjustment (e.g. a Bonferroni correction). In particular, testing the null hypothesis (3) can be conducted by testing of the null hypotheses
${H}_{k0}^{K}$, for *k* = 1, …, *K*, where

$${H}_{k0}^{K}:{\xi}_{1k}\stackrel{d}{=}{\xi}_{2k}.$$

(4)

There are several common univariate two sample tests: for example the Kolmogorov-Smirnov test (KS, Massey Jr (1951)) or the Anderson-Darling test (AD, Pettitt (1976)). The KS and AD tests are both capable of detecting higher order moment shifts between two univariate distributions, by using differences in the empirical cumulative distributions. Empirical studies have shown that the AD test tends to have higher power than the KS test (Stephens, 1974; Bohm and Zech, 2010). Thus, we present the proposed testing procedure using the AD test.

Recall that the *X*_{1i}(·)'s and the *X*_{2i}(·)'s are two samples of independent and identically distributed curves observed from *X*_{1}(·) and *X*_{2}(·) respectively, and assume that both the mean function, *μ*(*t*), and the eigenbasis {*ϕ _{k}*(·)}

Extending the testing procedure described in Section 2.1 to practical applications is not straightforward, as the true smooth trajectories *X _{i}*(·), and the true scores

Our logic is based on the result that under null hypothesis (2)
${\widehat{\xi}}_{1ik}-{\widehat{\xi}}_{2ik}\stackrel{p}{\to}0$ as *n* → ∞ where “
$\stackrel{p}{\to}$” denotes convergence in probability, for *k* = 1, …, *K*. Thus, to test (4), one can use the testing procedure described in Section 2, but with the estimated basis coefficients _{1ik} and _{2ik} instead of the true ones.

First, consider the situation when the grid of points for each subject is dense in [0, 1], that is *m*_{1i} and *m*_{2i} are very large. Zhang and Chen (2007) proved that one can reconstruct the curves *X _{i}*(

Consider the pooled sample {_{1i}(·) : *i* = 1, …, *n*_{1}} {_{2i}(·) : *i* = 1, …, *n*_{2}}, and let * _{i}*(

Next, consider the situation when the grid of points for each subject is sparse (*m*_{1i}, *m*_{2i} are as small as few observations) but the overall sets of observed points {*t*_{1ij} : *j* = 1, …, *m*_{1i}, *i* = 1, …, *n*_{1}} and {*t*_{2ij} : j = 1, …, *m*_{2i}, *i* = 1, …, *n*_{2}} are dense sets in [0, 1]. The sparse setting requires different methodology for several reasons. First, the bounding constraint on the number of repeated observations *m*_{1i} and *m*_{2i} implies a sparse setting at the curve level and does not provide accurate estimators by smoothing each curve separately. Secondly, estimation of the basis coefficients *ξ _{ik}* via numerical integration is no longer reliable. Instead, we consider the pooled sample
${\{({t}_{1ij},{Y}_{1ij}):j\}}_{i=1}^{{n}_{1}}\cup {\{({t}_{2ij},{Y}_{2ij}):j\}}_{i=1}^{{n}_{2}}$, where {(

Common FPCA-techniques can be applied to reconstruct the underlying subject-trajectories, * _{i}*(·) from the observed {(

Once the marginal mean function, marginal eigenfunctions, eigenvalues, and noise variance are estimated, the model for the observed data {*Y _{ij}* : 1 ≤

The performance of the proposed testing procedure is studied under a variety of settings and for varying sample sizes. We focus on a small number of FPCs first and later study more complex data settings. Section 4.1 studies the Type I error rate of the FAD test, and the sensitivity to sample size and the percentage of explained variance, *τ*, used to estimate the parameter, *K*. Section 4.2 provides a numerical comparison of the proposed approach with the closest available competitor - the Cramér-von Mises (CVM) -type test introduced by Hall and Van Keilegom (2007).

We construct datasets
${\{({t}_{1ij},{Y}_{1ij}):j\}}_{i=1}^{{n}_{1}}$ and
${\{({t}_{2ij},{Y}_{2ij}):j\}}_{i=1}^{{n}_{2}}$ using model (1) for *t*_{1ij} = *t*_{2ij} = *t _{j}* observed on an equally spaced grid of

The FAD test is employed to test the null hypothesis in (2); the marginal mean functions, the marginal eigenbasis functions, and the corresponding basis coefficients are estimated using the methods described in Section 3. The number of marginal eigenbasis functions is estimated using the percentage of explained variance, *τ*, for the pooled data. The estimates for all the model components, including the basis coefficients, are obtained using the
`R` (Team, 2015) package
`refund` (Crainiceanu et al., 2012). The
`R` package
`AD` (Scholz, 2011) is used to test the equality of the corresponding univariate distributions for each pair of basis coefficients. The Bonferroni multiple testing correction is used to maintain the desired level of the FAD test. The null hypothesis is rejected/not rejected according to the approach described in Section 2.1. All the results in this section are based on *α* = 0.05 significance level.

First, we assess the Type I error rate for various threshold parameter values, *τ*. For simplicity, we set *μ*_{1}(*t*) = *μ*_{2}(*t*) = 0 for all *t* and consider *ξ*_{1ik}, *ξ*_{2ik} ~ *N*(0, *λ _{k}*), for

Estimated Type I error rate of FAD, based on 5000 replications for several values *τ*, leading to different estimates of the truncation parameter *K*.

Next, we set *τ* = 0.95 and study the power performance of the FAD test. Setting the percentage of explained variance to be too high, or implicitly selecting an unnecessarily large number of components may result in a loss of power of the testing procedure. In particular, if we set the threshold value at a level that results in *K* marginal eigenfunctions, and the difference between the sets of curves is captured by the distribution of the coefficients of the (*K* + 1)th eigenfunction, then the proposed testing procedure does not have any power.

The distribution of the true processes under the alternative is described by the mean functions, as well as by the distributions of the basis coefficients. The following scenarios allow us to study the FAD test for specific types of changes in the two data sets. Settings A, B and C correspond to deviations in the first, second and third moments, respectively, of the corresponding distribution of the first set of basis coefficients. Throughout this section it is assumed that *λ*_{1k} = *λ*_{2k} = 0 for all *k* ≥ 3. **[A] Mean Shift:** Set the mean functions as *μ*_{1}(*t*) = *t* and *μ*_{2}(*t*) = *t* + *δt*^{3}. Generate the coefficients as *ξ*_{1i1}, *ξ*_{2i1} ~ *N*(0, 10), *ξ*_{1i2}, *ξ*_{2i2} ~ *N*(0, 5). The index *δ* controls the departure in the mean behavior of the two distributions. **[B] Variance Shift:** Set *μ*_{1}(*t*) = *μ*_{2}(*t*) = 0. Generate the coefficients *ξ*_{1i1} ~ *N*(0, 10), *ξ*_{2i1} ~ *N*(0,10 + *δ*), and *ξ*_{1i2}, *ξ*_{2i2} ~ *N*(0, 5). Here *δ* controls the difference in the variance of the first basis coefficient between the two data sets. **[C] Skewness Shift:**
*ξ*_{1i1} ~ *T*_{4}(0, 10) and *ξ*_{2i1} ~ *ST*_{4}(0, 10, 1 + *δ*), and *ξ*_{1i2}, *ξ*_{2i2} ~ *T*_{4}(0, 5). Here, *T*_{4}(*μ*, *σ*) denotes the common student T distribution with 4 degrees of freedom, that is standardized to have mean *μ* and standard deviation *σ* and *ST*_{4}(*μ,σ,γ*) is the standardized skewed T distribution (Wurtz et al., 2006) with 4 degrees of freedom, mean *μ*, standard deviation *σ*, and shape parameter 0 < *γ* < ∞, which controls skewness. The
`rstd` and
`rsstd` functions in the
`R` package
`fgarch` (Wurtz et al., 2006) are used to generate random data from a standardized *T* and standardized skewed *T* distribution respectively. The shape parameter *γ* is directly related to the skewness of this distribution and the choice *γ* = 1 corresponds to the symmetric T distribution. In this case, the index *δ* controls the difference in the skewness of distribution of the first basis coefficient.

For all the settings, *δ* = 0 corresponds to the null hypothesis that the two samples of curves have the same generating distribution, whereas *δ* > 0 corresponds to the alternative hypothesis that the two sets of curves have different distributions. Thus, *δ* represents the departure from the null hypothesis and it will be used to characterize empirical power curves. The estimated power is based on 1000 MC replications. Results are presented in Figure 2 for the case of equal/unequal sample sizes in the two groups, and for various total sample sizes.

Empirical power curves under simulation setting A (leftmost panels), B (middle panels) and C (rightmost panels). The top panels consist of the case in which both sets of curves have equal sample sizes. The bottom panels correspond to unequal sample sizes **...**

Column A of Figure 2 displays the empirical power curves of the FAD test when the mean discrepancy index *δ* ranges from 0 to 8. It appears that the performance of the power is affected more by the combined sample size, *n* = *n*_{1} + *n*_{2}, than the magnitude of each sample size *n*_{1} or *n*_{2}. Column B shows the empirical power, when the variance discrepancy index *δ* ranges from 0 to 70. The empirical power increases at a faster rate for equal sample sizes than unequal sample sizes, when the total sample size is the same. However, the differences become less pronounced as the total sample size increases. Finally, column C displays the power behavior for observed data generated under setting C for *δ* between 0 and 6. For moderate sample sizes, irrespective of their equality, the probability of rejection does not converge to 1 no matter how large *δ* is; see the results corresponding to a total sample size equal to *n* = 200 or 400. This is in agreement with our intuition that detecting differences in higher order moments of the distribution becomes more difficult and requires increased sample sizes. In contrast, for larger total sample sizes, the empirical power curve has a fast rate of increase.

Following a suggestion by an anonymous reviewer, we numerically compared our testing procedure with a simpler test for the mean; we used the *L*^{2}-based mean test proposed in Benko et al. (2009). The two testing procedures are developed for different null hypotheses; to compare them for testing the null hypothesis that the distribution of the two curves is the same across groups, the mean detection test requires an additional working assumption that the distribution of the curves in each group differs by at most a mean shift. In practice one typically does not know if the way two sets of curves vary is captured solely by the mean function. Thus, we generate two sets of curves as described in settings A (mean shift) and B (variance shift) above. In both cases we record the rejection probabilities. Note that the necessary assumption for the mean test is true for setting A but not true for B. One expects a higher power for the mean test in setting A, as this test uses the additional information that the distribution of the sets of curves differ solely in the mean function, while our proposed test does not use such information. On the contrary, when the distribution of the curves does not differ just by the mean function, the loss of power may be substantial. In setting B, the mean of the two distributions is the same across groups. Therefore, the test for the mean does not have power to detect the alternative hypothesis. These additional results are included in the Supplementary Material, due to space limitation. In the DTI study, we use the FAD test for initial screening and consider the “targeted” test for the mean as a second stage analysis.

To our best of our knowledge, the work of Hall and Van Keilegom (2007) is the only available alternative that considers testing that the distributions of two samples of curves are the same, when the observed data are noisy, discrete and irregular realizations of the latent curves. In this section, we compare the performance of our proposed FAD test compared to their Cramer-von Mises (CVM) - type test, based on the empirical distribution functions after local-polynomial smoothing of the two samples of curves.

We generate data
${\{({t}_{1ij},{Y}_{1ij}):j\}}_{i=1}^{{n}_{1}}$ and
${\{({t}_{2ij},{Y}_{2ij}):j\}}_{i=1}^{{n}_{2}}$ exactly as in Hall and Van Keilegom (2007), and for completeness we describe it next: *Y*_{1ij} = *X*_{1i} (*t*_{1ij}) + *ε*_{1ij} and *Y*_{2ij} = *X*_{2i} (*t*_{2ij}) + *ε*_{2ij}, where, *ε*_{1ij} ~ N (0, 0 .01), *ε*_{2ij} ~ *N*(0, 0.09),
${X}_{1i}(t)={\sum}_{k=1}^{15}{e}^{-k/2}{N}_{k1i}{\psi}_{k}(t)$ and
${X}_{2i}(t)={\sum}_{k=1}^{15}{e}^{-k/2}{N}_{k21i}{\psi}_{k}(t)+\delta {\sum}_{k=1}^{15}{k}^{-2}{N}_{k22i}{\psi}_{k}^{\ast}(t)$, such that
${N}_{k1i},{N}_{k21i},{N}_{k22i}\stackrel{\mathit{\text{iid}}}{~}N(0,1)$. Here *ψ*_{1}(*t*) 1 and *ψ _{k}*(

The null hypothesis that the underlying distribution is identical in the two samples is tested using CVM (Hall and Van Keilegom, 2007) and FAD testing procedures for various values of *δ*. Figure 3 illustrates the comparison between the approaches for significance level *α* = 0.05; the results are based on 500 Monte Carlo replications. In practice, we observe that as the number of observations on each curve decreases, the empirical size of the test seems to increase slightly. The CVM test is conducted using the procedure described in Hall and Van Keilegom (2007), and the p-value is determined based on 250 bootstrap replicates; the results are obtained using the
`R` code provided by the authors. To apply the marginal FPCA and determine the marginal eigenbasis, we use the
`refund` package (Crainiceanu et al., 2012) in
`R`, which requires that the data are formatted corresponding to a common grid of points, with possible missingness. Thus, a pre-processing step is necessary. For each scenario, we consider a common grid of *m* = 100 equally spaced points in [0, 1] and bin the data of each curve according to this grid. This procedure introduces missingness for the points where data are not observed. This pre-processing step is not necessary if one uses
`PACE` package (Yao et al., 2005) in
`Matlab`. However, our preference for using open-source software motivates the use of
`refund`. Comparison of
`refund` and
`PACE` revealed that the two methods lead to similar results when smoothing trajectories from noisy and sparsely observed ‘functional’ data.

Estimated power curves for the FAD (solid line) and CVM (dashed) for equal sample sizes, varying from *n*_{1} = *n*_{2} = 15, 25 to 50 and for *m*_{1} = *m*_{2} = 20 (dot) and *m*_{1} = *m*_{2} = 100 (square). Results use 5% significance level and 500 MC replications.

As Figure 3 illustrates, both procedures maintain the desired level of significance. However, the empirical power of the FAD test increases at a faster rate than the CVM test (Hall and Van Keilegom, 2007) under all the settings considered. This should not seem surprising, since representing the data using the marginal eigenbasis expansion, as detailed in Section 3, removes extraneous components. In contrast, the CVM test attempts to estimate all basis functions by smoothing the data. This could lead to cumulative estimation errors that can ultimately lower the power of the test. Additionally, due to the usage of bootstrapping to approximate the null distribution of the test, the CVM test has a much higher computational burden than the FAD test. Thus we restrict our simulation studies to 500 MC replications.

We now return to our motivating brain tractography application. Recall that for each patient we measure the water diffusivity along five well identified tracts -right/left corticospinal tract (rCST, lCST), corpus callosum (CCA), and right/left optic radiation tract (rOPR, lOPR) - by two common DTI modalities fractional anisotropy (FA) and parallel diffusivity (L0). Three of these 10 tract-modality profiles are available in the
`R`-package refund (Crainiceanu et al., 2012). Our interest is to study if FA or L0 profiles along the five tracts have different distributions for subjects affected by MS when compared to controls. Such an assessment would provide researchers with valuable information about the neurodegeneration of MS caused by the axonal disruption along the white mater tracts and the usefulness of the commonly acquired modalities to capture this process. In the following, we focus first on the CCA, as measured by FA and L0, since there is scientific evidence that MS, in advanced stages, is associated with significant neuronal loss in the corpus callosum (Ozturk et al., 2010).

The top four panels in Figure 1 display the FA profiles (left most two panels) and L0 profiles (right most two panels) sampled at 93 locations along CCA for the 162 MS patients and 42 healthy controls. Though the modalities are observed on a regular sampling design, there are missing observations and the data are likely corrupted with measurement error. It seems reasonable to suspect that the distribution of the modalities has a relatively similar mean profile for MS cases compared to controls. Furthermore, it seems the pointwise variance in the two groups may be different though this observation may be an artifact of the larger group size of the MS patients compared to the healthy controls. More importantly, L0 seems to exhibit group-specific distributional characteristics that vary across the locations along the tract (such as skewness). Staicu et al. (2012) investigated this point in detail and proposed different semi-parametric Gaussian copula-based models to describe the distribution of the L0 profiles along CCA (CCA-L0) in MS patients and healthy controls. Based on bootstrapping the subjects, they inferred that the marginal distribution parameters - pointwise mean, variance and skewness functions - are significantly different in MS cases and controls. In our analysis we go one step further and formally assess whether the distribution of the CCA-L0 profiles in the MS population is different than the distribution of the CCA-L0 profiles in the healthy controls, without making any parametric assumptions.

We begin with the study of the CCA-FA profiles and test the null hypothesis that the CCA-FA profiles have the same distribution for the MS patients and controls. We also investigate if there are differences in the mean of CCA-FA between cases and controls using the *L*^{2}-based mean test presented in Benko et al. (2009); this can be considered as a more ‘targeted’ testing procedure. The null distribution of the test is approximated by bootstrapping the curves. The method does not directly account for missingness. Therefore, to employ it for our application we need some pre-processing. Specifically, we reconstruct the latent smooth profiles within each group over equally spaced grids. To do this, we use the *fpca.sc* function in the
`R` package
`refund` with the percentage of explained variance parameter set to 99% (Crainiceanu et al. (2012)). The p-value with this approach is based on 5000 bootstrap replications. The p-value is 0.00, indicating that the means profiles are statistically different in the two groups.

We turn next to the CCA-L0 profiles and study whether they vary according to the same distribution in the MS patients compared to controls; we use FAD as described above. The analysis follows roughly the same procedure as above; to avoid repetition we focus only on the part that is new. Five eigenfunctions are selected to estimate 95% of the total variability of the combined dataset. Figure S3 displays the leading three eigenfunctions, along with the boxplots of the corresponding coefficients presented separately for the MS and control groups. The leading eigenfunction accounts for 76% of the total variability; it is negative and has a concave shape with a dip around location 60 of the CCA. This component gives the direction along which the two curves differ the most. Near location 60, the distribution of the CCA-L0 is highly skewed for the MS group, but not as skewed as in the control group. Examination of the boxplot of the coefficients corresponding to the first eigenfunction (left, bottom panel of Figure S3) shows that most healthy individuals (controls) are positive, yielding CCA-L0 profiles that are lower than the population average profile. In contrast, over half of the MS cases have a negative coefficient for this function, leading to CCA-L0 profiles that are larger than the population average. Our findings are consistent with Staicu et al. (2012), who discuss possible reasons for which L0 tends to be higher in the MS cases than controls. We apply the AD testing procedure to assess whether the distribution of the coefficients is different for the two groups. The p-values of the univariate tests are *p*_{1} = 0.00001, *p*_{2} = 0.07165, *p*_{3} = 0.02479, *p*_{4} = 0.19887, and *p*_{5} = 0.29161, leading to the p-value *p* = 5 × min_{1≤k≤5}
*p _{k}* = 0.00005 of the FAD test. This shows significant evidence that the parallel diffusivity has a different distribution in MS subjects compared to controls. The

Overall the results provide strong support that along the CCA, the fractional anisotropy and parallel diffusivity properties of the water diffusion vary differently in the MS patients relative to controls. This is in agreement with the scientists' expectations that MS is generally associated with lesions and axonal demyelination on the CCA (Evangelou et al. (2000); Ozturk et al. (2010)). Thus, the water diffusivity is affected along this tract for diseased subjects. The “targeted” mean test (Benko et al. (2009)) yields that both FA and L0 have a different mean along the CCA for MS cases and controls.

In the attempt to gain insight into the neurodegeneration of the white mater tracts associated with MS, we study whether the measured properties of water diffusivity, FA and L0, vary differently in MS cases and controls over the remaining four tracts. For each tract modality data set, the
`fpca.sc` function in the
`R` package
`refund` is employed for the marginal FPCA and the percentage of explained variance is fixed at *τ* = 95%. Table 2 shows the p-values corresponding to testing the equality of the distributions of the FA and L0 profiles along each of the five tracts. When interpreting the results, Bonferroni correction is used to ensure a family error rate equal to the desired nominal level; the analysis uses 0.05 level of significance.

The results show that the water diffusivity varies differently along the lCST tract in MS patients compared to healthy individuals, as assessed through FA and L0. The p-values for these cases are very small and less than 0.005 - which is the threshold level using the Bonferroni adjustment for 10 comparisons. The situation is not as clear for rCST, lOPR, and rOPR tracts, where only one of the DTI modalities seems to capture the difference in the distribution of the water diffusivity properties. Take for example the rCST tract. When assessing the null hypothesis that the FA profiles vary in the same way for MS patients and controls using the FAD test, we obtain a p-value equal to 0.0103, which does not provide support against this null hypothesis. On the other hand, testing the null hypothesis that the L0 profiles vary in the same way for MS patients and controls using the FAD test yields a p-value equal to 0.00002, which indicates that if the null hypothesis were true, obtaining such a value of the test would be very unlikely. These two results show evidence that the rCST tract is also affected by MS and that the parallel diffusivity property of the water diffusivity appropriately captures this degeneration. Furthermore, our findings suggest that both the lOPR and the rOPR tracts are associated with MS, and that FA is the more appropriate modality to capture this effect.

As before, we consider a second stage analysis to investigate whether the difference in the distribution of the profiles along various tracts is in the mean, or whether it is in the higher order moments. The second stage analysis is only conducted on the tract-modality profiles for which the FAD test determines significant difference in the distributions. We test the null hypothesis that the means of various modalities along corresponding tracts is the same for MS patients and controls, using the test proposed by Benko et al. (2009) with 5000 bootstrap samples. As discussed above, this procedure requires some pre-processing of the data, to reconstruct the true latent signals. The
`fpca.sc` function was used for this purpose with the percentage of explained variance set at 99%. The results show that CCA-FA, lOPR-FA, rOPR-FA, CCA-L0, lCST-L0, and lCST-L0 have mean profiles that are significantly different in MS patients and controls (p-values of 0.00). In contrast the difference between the distribution of lCST-FA in MS subjects and controls seems to be in the higher order moments (p-value of 0.0672).

Earlier versions of this study consisting of data from fewer and possibly different patients were considered in Greven et al. (2011) and Goldsmith et al. (2011). Greven et al. (2011) proposed a longitudinal functional principal component framework analysis for the CCA-FA profiles observed over time. Goldsmith et al. (2011) studied the association between CCA-L0 profiles of MS observed at the baseline and the patients' cognitive scores. In the earlier version of the study, a different registration technique was used for the various tracts: the tracts were sampled at 120 equidistant locations - based on a processing algorithm that used 20 equally spaced points between known landmarks. In more recent versions of this study (Crainiceanu et al. (2012), Staicu et al. (2012); Gertheiss et al. (2013a); McLean et al. (2014), Ivanescu et al. (2014); Li et al. (2014)) including the one considered in this paper, 93 locations are used for the CCA tract - based on slice numbers in the atlas space after registering images across subjects (Reich et al. (2010)). This process aligns the functions across patients using anatomical characteristics of the brain. Nevertheless, this work is the first one that formally assesses whether the way the several modalities along the main tracts vary is different in MS patients relative to healthy controls.

Perhaps the closest work to ours is that of Gertheiss et al. (2013a), who considered a simultaneous study of the magnetic resonance imaging indices along the various tracts in an attempt to find which ones are important in predicting disease status. Their study included six magnetic resonance imaging indices, including the two DTI modalities, FA and L0. Their results indicated that CCA-FA and CCA-L0 are not deemed important in predicting the disease status. Our presented analysis does find that these tract-modality profiles differ between patients with MS and healthy controls. Gertheiss et al. (2013a) did identify a different modality - perpendicular diffusivity - to be more relevant. Some of our results do seem to agree with theirs, since lCST-FA, rCST-FA, lOPR-FA, rOPR-FA, and rOPR-L0 are found to be associated with MS. However their functional variable selection approach centered and scaled the functional covariates (modalities along tracts), to have pointwise zero mean and unit variance in the overall sample; no such transformation was used by our approach. While their transformation is not required, the comparison of our results with their findings requires further exploration.

We develop methodology to compare the white matter tract neurodegeneration in MS patients and healthy individuals using state-of-the-art DTI. We formally assess whether two conventional DTI modalities along each of several well identified white matter tracts have different distributions in MS patients compared to controls. Our findings confirm that the CCA and lCST tracts are associated with the disease. Our results suggest evidence of axonal disruption along the other three tracts in the study - rCST, lOPR and rOPR - though only one of the two DTI measures actually captures it. Furthermore, while the mean of most DTI profiles is one of the reasons (or the reason) that the modalities vary significantly differently in MS patients than controls, the way that FA along the lCST tract varies differently in the two groups is captured by higher order moments.

When dealing with functional data, which theoretically are infinite dimensional objects, it is important to use data reduction techniques. We propose marginal FPCA to represent the curves using the eigenbasis of the marginal covariance of an appropriate mixture process. This reduces the dimension of the testing problem and allows the application of well known lower-dimensional testing procedures. The proposed testing approach combines marginal FPCA and classical univariate procedures, scales well to large samples sizes, and can be easily extended to test the null hypothesis that multiple (more than two) groups of curves have identical distribution. We used Anderson-Darling test and showed through simulation studies that the proposed FAD test is very competitive when compared with alternatives.

We thank Daniel Reich and Peter Calabresi for the DTI data and Ingrid Van Keilegom for sharing the R code used in Hall and Van Keilegom (2007). Pomann was funded by the AT&T Graduate Research Fellowship and NSF grant DGE-0946818. Staicu was funded by the NSF grants DMS 1007466 and DMS 0454942 and NIH grants R01 NS085211 and R01 MH086633. Ghosh was funded by the NSF grant DMS-1358556 (IPA assignment) and DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute.

Gina-Maria Pomann, Duke University, Durham, USA.

Ana-Maria Staicu, North Carolina State University, Raleigh, USA.

Sujit Ghosh, North Carolina State University, Raleigh and Statistical and Applied Mathematical Sciences Institute, RTP, NC. USA.

- Annette Moller A, Tutz G, Gertheiss J. Random forests for functional covariates. preparation 2015
- Aslan B, Zech G. New test for the multivariate two-sample problem based on the concept of minimum energy. Journal of Statistical Computation and Simulation. 2005;75(2):109–119.
- Benko M, Hardle W, Kneip A. Common functional principal components. The Annals of Statistics. 2009;37:1–34.
- Bohm G, Zech G. Introduction to statistics and data analysis for physicists. DESY; 2010.
- Bosq D. Linear processes in function spaces: theory and applications. Vol. 149. Springer; 2000.
- Corain L, Melas VB, Pepelyshev A, Salmaso L. New insights on permutation approach for hypothesis testing on functional data. Advances in Data Analysis and Classification. 2014;8(3):339–356.
- Crainiceanu C, Reiss P, Goldsmith J, Huang L, Huo L, Scheipl F, Greven S, Harezlak J, Kundu MG, Zhao Y. refund : Regression with functional data. R Package 0.1-6 2012
- Cuevas A, Febrero M, Fraiman R. An anova test for functional data. Computational statistics & data analysis. 2004;47(1):111–122.
- Di C, Crainiceanu CM, Caffo BS, Naresh NM, Punjabi M. Multilevel functional principal component analysis. The annals of Applied Statistics. 2009;3(1):458–488. [PMC free article] [PubMed]
- Dorris DM, Cao J, Willett JA, Hauser CA, Meitzen J. Intrinsic excitability varies by sex in pre-pubertal striatal medium spiny neurons. Journal of neurophysiology. 2014 jn–00687. [PubMed]
- Estévez-Pérez G, Vilar JA. Functional anova starting from discrete data: an application to air quality data. Environmental and Ecological Statistics. 2008;20:495–515.
- Evangelou N, Konz D, Esiri M, Smith S, Palace J, Matthews P. Regional axonal loss in the corpus callosum correlates with cerebral white matter lesion volume and distribution in multiple sclerosis. Brain. 2000;123(9):1845–1849. [PubMed]
- Ferraty F, Vieu P, Viguier-Pla S. Factor-based comparison of groups of curves. Computational Statistics & Data Analysis. 2007;51(10):4903–4910.
- Fremdt S, Horvath L, Kokoszka P, Steinebach J. Testing the equality of covariance operators in functional samples. Scand J Statist. 2012;40:138–152.
- Friedman J, Rafsky L. Multivariate generalizations of the wald-wolfowitz and smirnov two-sample tests. The Annals of Statistics. 1979;7:697–717.
- Gertheiss J, Maity A, Staicu AM. Variable selection in generalized functional linear models. Stat. 2013a;2(1):86–101. [PMC free article] [PubMed]
- Gertheiss J, Goldsmith J, Crainiceanu C, Greven S. Longitudinal scalar-on-functions regression with application to tractography data. Biostatistics. 2013b;14:447–461. [PMC free article] [PubMed]
- Goldsmith J, Bobb J, Crainiceanu C, Caffo B, Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics. 2011;20:850–851. [PMC free article] [PubMed]
- Goldsmith J, Crainiceanu CM, Caffo B, Reich D. Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2012;61(3):453–469. [PMC free article] [PubMed]
- Greven S, Crainiceanu C, Caffo B, Reich D. Recent Advances in Functional Data Analysis and Related Topics. Springer; 2011. Longitudinal functional principal component analysis; pp. 149–154.
- Hall P, Muller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics. 2006;34:1493–1517.
- Hall P, Van Keilegom I. Two-sample tests in functional data analysis starting from discrete data. Statistica Sinica. 2007;17(4):1511–1531.
- Horváth L, Kokoszka P, Reeder R. Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2013;75(1):103–122.
- Ivanescu AE, Staicu AM, Scheipl F, Greven S. Penalized function-on-function regression. Computational Statistics. 2014:1–30.
- Kraus D, Panaretos V. Dispersion operators and resistant second-order analysis of functional data. Biometrika. 2012;99:813–832.
- Laukaitis A, Račkauskas A. Functional data analysis for clients segmentation tasks. European journal of operational research. 2005;163(1):210–216.
- Li M, Staicu AM, Bondell HD. Incorporating covariates in skewed functional data models. Biostatistics. 2014 kxu055. [PubMed]
- Massey F., Jr The kolmogorov-smirnov test for goodness of fit. Journal of the American statistical Association. 1951;46(253):68–78.
- McLean MW, Hooker G, Staicu AM, Scheipl F, Ruppert D. Functional generalized additive models. Journal of Computational and Graphical Statistics. 2014;23(1):249–269. [PMC free article] [PubMed]
- Ozturk A, Smith S, Gordon-Lipkin E, Harrison D, Shiee N, Pham D, Caffo B, Calabresi P, Reich D. Mri of the corpus callosum in multiple sclerosis: association with disability. Multiple Sclerosis. 2010;16(2):166–177. [PMC free article] [PubMed]
- Pettitt AN. A two-sample anderson-darling rank statistic. Biometrika. 1976;63(1):161–168.
- Ramsay J, Silverman B. Functional Data Analysis. Springer; 2005.
- Read T, Cressie N. Goodness-of-fit statistics for discrete multivariate data. Vol. 7. Springer-Verlag; New York: 1988.
- Reich DS, Ozturk A, Calabresi PA, Mori S. Automated vs. conventional tractography in multiple sclerosis: variability and correlation with disability. NeuroImage. 2010;49(4):3047–3056. [PMC free article] [PubMed]
- Schilling M. Multivariate two-sample tests based on nearest neighbors. Journal of the American Statistical Association. 1986;81(395):799–806.
- Scholz F. Anderson darling k sample test. R Package adk 2011
- Scholz F, Stephens M. K-sample anderson–darling tests. Journal of the American Statistical Association. 1987;82(399):918–924.
- Staicu AM, Crainiceanu CM, Carroll RJ. Fast methods for spatially correlated multilevel functional data. Biostatistics. 2010;11(2):177–194. [PMC free article] [PubMed]
- Staicu AM, Crainiceanu CM, Reich DS, Ruppert D. Modeling functional data with spatially heterogeneous shape characteristics. Biometrics. 2012;68(2):331–343. [PMC free article] [PubMed]
- Staicu AM, Li Y, Crainiceanu CM, Ruppert D. Likelihood ratio tests for dependent data with applications to longitudinal and functional data analysis. Scandinavian Journal of Statistics to appear 2014
- Staniswalis JG, Lee JJ. Nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association. 1998;93(444):1403–1418.
- Stephens MA. Edf statistics for goodness of fit and some comparisons. Journal of the American Statistical Association. 1974;69(347):730–737.
- Team, R. C. vienna, austria: 2015. R: A language and environment for statistical computing. URL http://www.R-project.org.
- Wei L, Lachin J. Two-sample asymptotically distribution-free tests for incomplete multivariate observations. Journal of the American Statistical Association. 1984;79(387):653–661.
- Wurtz D, Chalabi Y, Luksan L. Parameter estimation of arma models with garch/aparch errors an r and splus software implementation. University of Pennsylvania; 2006. Online Article.
- Yao F, Muller H, Wang J. Functional data analysis for sparse longitudinal data. JASA. 2005;100:577–591.
- Zhang JT, Chen J. Statistical inferences for functional data. The Annals of Statistics. 2007;35(3):1052–1079.
- Zhang JT, Liang X, Xiao S. On the two-sample behrens-fisher problem for functional data. Journal of Statistical Theory and Practice. 2010;4(4):571–587.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |