|Home | About | Journals | Submit | Contact Us | Français|
We propose a spatial Bayesian variable selection method for detecting blood oxygenation level dependent activation in functional magnetic resonance imaging (fMRI) data. Typical fMRI experiments generate large datasets that exhibit complex spatial and temporal dependence. Fitting a full statistical model to such data can be so computationally burdensome that many practitioners resort to fitting oversimplified models, which can lead to lower quality inference. We develop a full statistical model that permits efficient computation. Our approach eases the computational burden in two ways. We partition the brain into 3D parcels, and fit our model to the parcels in parallel. Voxel-level activation within each parcel is modeled as regressions located on a lattice. Regressors represent the magnitude of change in blood oxygenation in response to a stimulus, while a latent indicator for each regressor represents whether the change is zero or non-zero. A sparse spatial generalized linear mixed model captures the spatial dependence among indicator variables within a parcel and for a given stimulus. The sparse SGLMM permits considerably more efficient computation than does the spatial model typically employed in fMRI. Through simulation we show that our parcellation scheme performs well in various realistic scenarios. Importantly, indicator variables on the boundary between parcels do not exhibit edge effects. We conclude by applying our methodology to data from a task-based fMRI experiment.
This article develops a spatial Bayesian variable selection (SBVS) procedure for detecting blood oxygenation level dependent (BOLD) activation in functional magnetic resonance imaging (fMRI) experiments. fMRI is an established method for revealing region-specific activation in the human brain that occurs in response to tasks or stimuli (Bandettini, 2009). During a single-subject fMRI experiment, a participant performs a task in response to a stimulus while 3D images of the subject's brain are collected every 2–3s. The images capture the BOLD signal contrast, allowing us to indirectly observe neuronal activity. The presence (or absence) of neuronal activity in response to stimuli is used to describe brain function networks, to assess brain development, or to assess impairment related to conditions, including alcoholism and brain trauma.
Single-subject fMRI data have a complex structure. Each collected image comprises a 3D square grid, or lattice, of volume elements (voxels). The BOLD response is observed over time at each voxel, resulting in a large quantity of data (in excess of 20million observations per person) that exhibit both spatial and temporal dependence. Regression models are used to analyze fMRI data such that the per-voxel regression coefficients quantify the brain activity, relative to a rest condition, in that voxel during the task; a voxel is considered inactive if the magnitude of the BOLD contrast is not significantly different from zero. The standard non-Bayesian approach to computing estimates of the activation magnitude for a single brain follow, e.g., Friston and others (1995) and Worsley (2003), which do not incorporate spatial modeling. Several Bayesian approaches have appeared in the neuroimaging literature (Woolrich and others, 2004; Bowman, 2007; Smith and Fahrmeir, 2007). In particular, Smith and Fahrmeir (2007) proposed an SBVS approach in which the regressions are located on a voxel-based lattice, and latent binary indicators represent whether each voxel on the lattice is active or inactive.
This SBVS approach considers voxels as having spatially linked regressions. By borrowing strength across neighboring voxels, this approach results in more reliable inference for voxels where there is little information, i.e., small signal-to-noise ratio (SNR). The primary drawback of the above-mentioned SBVS approaches is that temporal correlation within each voxel time series is ignored in the interest of computational efficiency (Smith and others, 2003; Smith and Fahrmeir, 2007). Our model accounts for both types of dependence. Another innovation of our proposed method is the use of the sparse spatial generalized linear mixed model (SGLMM) of Hughes and Haran (2013) to model latent indicator variables. The sparse SGLMM permits much faster computation than does the more commonly used Ising model (Ising, 1925).
This article also utilizes a brain parcellation. A typical adult brain has more than 50,000 voxels, whereas our parcellation method results in parcels potentially comprising voxels. This parcellation speeds computation in two ways: (1) parcel-level analyses can be done in parallel, and (2) the “size” of the required spatial model is reduced. Thus, the use of the sparse SGLMM in conjunction with a parcellation allows for rapid simulation of posterior samples. The computational efficiency of our approach allows us to make more statistically principled modeling decisions, i.e., our modeling need not be driven by computational considerations. The result is a full, and fully Bayesian, model than can be applied efficiently.
We show that our method is insensitive to model misspecification and suffers minimal edge effects (edge effects could arise because voxels on the edge of a parcel are strongly spatially correlated with voxels on the edges of neighboring parcels). Using simulated data, we demonstrate that our parcellation-based method is able to achieve a high accuracy rate in predicting voxel- and task-specific activations. A full-brain dataset is used to illustrate the applicability of the method. Maps of estimated activation probabilities are produced for these data.
The article is organized as follows. In Section 2, our model is specified, and previous methods are described. In Section 3, the innovations introduced in our approach are described in detail. In Section 4, a simulation study is carried out. In Section 5, our model is applied to a single-subject dataset from a study on depression in adolescents (Musgrove and others, 2015). The article ends in Section 6 with a discussion. Additional supporting material and details regarding our Markov chain Monte Carlo (MCMC) sampling scheme are given in Appendix A of the supplementary materials (available at Biostatistics online).
For the th voxel, the voxel time series comprises the BOLD response over time points. For each such time series, we have a linear model:
The baseline trend is modeled as a fixed parameter or as a set of polynomial or trigonometric functions (Smith and Fahrmeir, 2007). The activation profile is captured via the -vector , which is the magnitude of the BOLD response to the stimuli. The design matrix , , is composed of columns that are a function of the task stimuli corresponding to . Temporal correlation is accounted for by , an -vector of autoregression coefficients, with the matrix of lagged prediction errors, while . Spatial dependence between voxels is induced via latent indicator variables such that if and if , where indexes the stimulus effects in . Estimation of and is of primary interest.
SBVS methods were first applied in fMRI by Smith and others (2003), and more recently in Smith and Fahrmeir (2007). In the latter paper, Smith and Fahrmeir considered a full-brain approach that introduced spatial dependence via an Ising prior for the latent binary indicator variables. The Ising model (Ising, 1925) is a binary Markov random field model (Kindermann and Snell, 1980) that was first employed in the study of ferromagnetism. The Ising prior is appealing from a modeling point of view but has an intractable normalizing function. This problem can be addressed using perfect sampling (Møller, 1999; Moller and others, 2006) or one of various approximations (Gelman and Meng, 1998; Wang and Landau, 2001; Zhang and Ma, 2007). The former leads to burdensome computation. The latter fails to guarantee that the Markov chain has the desired stationary distribution.
where . For , the vector of binary indicator variables over all voxels for the th stimulus effect, the Ising prior is given by
where denotes neighboring voxels, denotes the indicator function, is fixed and represents prior knowledge of activation locations, is the interaction effect between neighboring voxels, and is a pre-specified weight. For , the marginal posterior distribution is sampled using MCMC, and the probabilities of activation are estimated as the sample means of the indicator variables. Specifically, for samples from the posterior distribution of , the estimated probability of activation is
where denotes the full indicator variable set sans . The regression coefficients are estimated, using Rao–Blackwellization (Smith and Fahrmeir, 2007), as
The data-based g-prior was chosen for convenience, as it allows the regression coefficients to be integrated out analytically, with the goal of speeding convergence. Further, magnitudes of the stimulus effects (represented by the regression coefficients) are estimated by least squares. The method is purely spatial and so does not account for temporal dependence in voxel time series.
Noting that the lack of temporal modeling leads to lower quality inference, Lee and others (2014) developed a spatiotemporal model using the Ising prior as in Smith and Fahrmeir (2007). There, the authors captured the temporal dependence via a first-order autoregressive (AR(1)) model. Such an approach proves problematic when the regression coefficients are analytically integrated out of the likelihood: the resulting voxel time-series covariance matrix includes an unknown AR(1) coefficient, which necessitates the inversion of a dense matrix during each iteration of the sampler. This implies a sizable computational burden.
Our approach makes use of a parcellation that divides the brain into many non-overlapping parcels. Within each parcel, we apply a SBVS method that also accounts for voxel-level temporal correlation. In the context of fMRI, our procedure is novel in that a sparse SGLMM prior is used to model the spatial dependence among the activation indicators. Parcellation and the sparse SGLMM together permit very efficient sampling even though our model is fully Bayesian.
We use a divide-and-conquer approach to achieve speed gains over previous methods via analysis of parcels generated pseudorandomly as follows. First, divide the brain into cubes, each of which has voxels on a side (where is chosen by the investigator). For example, results in parcels of size 729 voxels. Since the brain is not rectangular, many of the parcels will include fewer than voxels. We have found that a minimum of 500 voxels per parcel results in excellent spatial inference. Next, an iterative algorithm is implemented that ensures parcels have a minimum of 500 voxels while ensuring that the underlying graph is connected. The final dataset for analysis comprises parcels, with the th parcel having voxels. Computational gains are achieved since the parcels can be analyzed in parallel. The parcellation could potentially be based on an existing anatomical atlas, such as Brodmann areas (Tzourio-Mazoyer and others, 2002), but we found that such an approach leads to an excessive number of parcels with fewer than 500 voxels. Thus, we will apply the divide-and-conquer approach in both our simulation study and application to real data.
We develop a Bayesian variable selection scheme (George and McCulloch, 1997) for the regression coefficients of the voxel-level regression (2.1). We place a spike-and-slab mixture prior on the regression coefficients such that each is drawn from a diffuse normal distribution (the slab) or a point mass at zero (the spike). This structure reflects the prior belief that a coefficient is non-zero or zero, respectively. Latent indicator variables are introduced such that the mixture prior for each has the form
where is an unknown stimulus-level variance and denotes a point mass at zero. We make the natural assumption that the regression coefficients are a priori independent conditional on the indicator variables (George and McCulloch, 1997). We model the spatial dependence between voxels by placing a spatial prior on the indicator variables (discussed in Section 3.3).
The columns of the design matrix contain the transformed stimulus function, a hemodynamic response function (HRF) convolved with task onset times and durations. The use of the HRF is motivated by the fact that the BOLD response is delayed in time after the underlying neuronal activation, and occurs continuously, rather than in a binary fashion, at the task onset times. Since the focus of this article is on our spatial prior and our parcellation scheme, we use an “off-the-shelf” HRF, namely the gamma density. Other possibilities include the Poisson probability mass function, and the double-gamma canonical HRF (Lindquist and others, 2009).
To account for the serial correlation present in the univariate voxel time series, we employ an AR(2) model that is easy to implement and computationally efficient. The AR(2) model offers a trade-off between the computational complexity of a higher-order AR process and a simpler AR(1) process. Similar to Penny and others (2003), we include a matrix of lagged prediction errors, denoted , in the regression model. We use a zero mean and vague normal prior on the AR(2) coefficients . To complete the voxel-level prior specification, the prior on the error variance is , and the prior on the variance of the non-zero regression coefficients is . In this way, regression coefficients across voxels share a prior variance, which results in additional smoothing beyond that induced by the spatial prior.
Here we describe our spatial prior on the voxel- and task-specific binary indicator variables . We assume a sparse areal generalized linear mixed model (Hughes and Haran, 2013) for each . Specifically, are conditionally independent Bernoulli random variables with a probit link function, i.e.,
where denotes the cumulative distribution function of a standard normal random variable, is a vector of synthetic spatial predictors, is a vector of spatial random effects, is an auxiliary variable introduced to facilitate Gibbs sampling (Holmes and Held, 2006), and is fixed to reflect the prior probability of activation. Voxels are located at the vertices of an underlying undirected graph, the structure of which reflects spatial adjacency among voxels. We represent the graph using its parcel-level adjacency matrix , which is with entries given by and . In a 2D analysis, we let a voxel neighborhood comprise the four nearest voxels. With 3D fMRI data, a neighborhood contains the 26 nearest voxels. The prior for the spatial random effects is
where is a smoothing parameter; is an matrix, the columns of which are the principal eigenvectors of ; and is the graph Laplacian. Note that above is the th row of . The columns of are multiresolutional spatial basis vectors that are well suited for spatial smoothing and allow us to capture both small-scale and large-scale spatial variation typically exhibited by fMRI data (Woolrich and others, 2004). Following Hughes and Haran (2013), we introduce sparsity by selecting a small subset of the columns of . This choice permits spatial smoothing while reducing the dimensionality considerably (typically, ). Our choice of prior on the smoothing parameter follows Kelsall and Wakefield (1999) such that the shape and the scale . Such a choice results in a large prior mean of the precision implying a small variance of the random effects which does not produce artifactual spatial structure in the posterior.
Denote the voxel-level parameters as , and the parcel-level parameters as . Within the th parcel (having voxels), we assume that
which implies that the between-voxel and between-task parameters are conditionally independent a priori. The posterior distribution is obtained in the usual way by combining priors and the likelihood. To obtain updates for each , we use a voxel-level likelihood where has been integrated out analytically. For , let , and let . Then, conditional on , the likelihood can be written as a mixture with two components:
where is the voxel-level likelihood when , and is the likelihood when . Integrating out of , it is straightforward to show that
where . The posterior probability that is , where
Conditional on , set . Otherwise, is updated from its full conditional distribution. We can write , and since the error term is normally distributed, and each has a normal prior distribution, conditional on , the posterior distribution of is given by , where and .
Posterior sampling of each of , , and uses probit regression with auxiliary variables, conditional on only (Holmes and Held, 2006). The full conditional distributions of all remaining variables are derived in Appendix A of the supplementary materials (available at Biostatistics online).
In order to declare a voxel active or inactive, a threshold on the estimated posterior probability of activation must chosen. We follow Smith and Fahrmeir (2007) and Lee and others (2014) in choosing a threshold value of 0.8722. Lee and others (2014) showed that a threshold of 0.8722 corresponds to a p-value of 0.05. Using a simulation study, we found that this threshold works well in practice.
We carried out two simulation studies to assess the performance of our approach. First, we simulated data from our model under six scenarios to investigate the effects on inference of different values of the spatial smoothing parameter and the error variance . Next, we investigated the parcellation method's effect on activation detection. The latter simulation study employs the neuRosim package (Welvaert and others, 2011) for R (Ihaka and Gentleman, 1996) to simulate 2D datasets exhibiting spatial and temporal dependence.
We applied our proposed method to data simulated from the prior distributions given in (3.2) and (3.3). Under investigation is the influence of the spatial smoothing parameter and the error variance . We choose or . Simulated probabilities are shown in Figure Figure1(a)–(c).1(a)–(c). We chose values of 0.5 and 2 for the error variance.
We fix and simulate on the (625 voxels) square lattice. On the 2D square lattice, voxels were considered neighbors if they shared a face, thus the maximum number of neighbors per voxel is 4 voxels. The spatial random effects are generated according to (3.3), using eigenvectors of the adjacency matrix . This value of is specific to the chosen parcel structure. The probability of activation is computed as (recall that is the th row of the matrix of spatial basis vectors). For purposes of the simulation study, the fixed prior probability, , was set to zero. Thus, the implied prior probability of activation is 0.5 for each voxel. Simulated activation locations are shown in Figure Figure1(d)–(f).1(d)–(f). A design matrix of dimension is created such that the first column is the appropriate vector of ones, and the second column corresponds to a stimulus from our application to a block design study, as shown by the black curve in Figure Figure2.2. We set the baseline signal to 1,200. Each is set to zero or six depending on the value of . The value of six was chosen to represent a very small percentage increase over baseline. We assume a second-order autoregressive structure with coefficients , and fix the error variance . In this setting, the value of , along with the stimulus time series, corresponds to an average activation amplitude of 6.3. Thus, the SNRs for the simulation study are and . The latter SNR is similar to that observed in our application to real data.
For each scenario, we simulated and analyzed 100 datasets. We used a fixed-width approach to convergence diagnosis (Flegal and others, 2008). This led us to draw 200,000 samples for each dataset, which ensured that no Monte Carlo standard error (MCSE) was larger than 0.005. We define accuracy as the percentage of voxels correctly classified, i.e., and simultaneously, and false-positive rate as the percentage of false positives, i.e., . Using a threshold for the estimated posterior means of the of 0.8722, the accuracy and false-positive rate are estimated for each simulated dataset, and the results are presented as medians across all 100 datasets.
The summaries are given in Table Table1.1. We see that our approach performs well across all values of the spatial smoothing parameter . The median voxel accuracy rates are above 99% for an error variance of 0.5, while the median false-positive rate is 2%. As the error variance increases to two, the accuracy rate remains very high. Figure Figure1(g)–(l)1(g)–(l) depicts our model's ability to recover the probability and binary maps. Regression and autoregression coefficients, error variances, and spatial parameters were estimated with minimal error (not shown).
For the second simulation study, we used the neuRosim package (Welvaert and others, 2011) to simulate spatiotemporal fMRI outcomes on the square lattice. Four regions of activation were selected, as shown in Figure Figure3(a),3(a), and the faces stimulus was used, as shown in Figure Figure2.2. The locations of activation were chosen such that a region of activation overlaps two quadrants, and one quadrant is empty. Each of the 2,500 voxels was simulated from a white noise process or from an autoregressive process of order 2 with , dependent on the true value of shown in Figure Figure3(a).3(a). The SNR was chosen to be 4.5, as in our application to real data.
To assess sensitivity to the chosen parcellation, we simulated 100 datasets from this scenario, and applied our model to each dataset in full. Next, we partitioned each dataset into four parcels, and carried out estimation on each of the parcels independently. Then the four parcels were rejoined before classification accuracy was assessed.
The summaries are given in Table Table1.1. We see that, similar to the varying precisions simulation study, the high sensitivity, the high percentage of voxels correctly classified, and the relatively low false-positive rate suggest that our model is insensitive to parcellation. Figure Figure3(c)3(c) shows the resulting binary map for a randomly selected dataset. Somewhat surprisingly, the false-positive rate under parcellation was slightly lower than for analyses of the full datasets. We investigated the sensitivity and false-positive rate among voxels along the partition lines, and the results were similar to those for the full datasets (not shown). Regression and autoregression coefficients, error variances, and spatial parameters were estimated with minimal error (not shown).
The dataset used to illustrate our method is a single-subject fMRI time series from a visual experiment. The dataset was collected as part of a larger connectivity study designed to quantify the interaction between depression and neural response in adolescents with major depressive disorder (Musgrove and others, 2015). Subjects completed a 6.5min fMRI task during the scan that had two matching conditions (facial expressions of fear and anger) or neutral objects (geometric shapes). The task included 13 counterbalanced blocks (3 fixation, 5 shape, and 5 faces). Face and shape blocks comprised six consecutive 5-s trials. Fixation blocks were presented at the beginning, after five blocks of shapes and faces, and at the conclusion of the task. In particular, the task was designed to elicit a negative emotional response (Hariri and others, 2002), and so activation was expected in regions, including the amygdala, anterior cingulate cortex (subgenual and supragenual regions), and prefrontal cortex (ventromedial and dorsolateral regions) (Musgrove and others, 2015).
Imaging was performed using a 3T Siemens Trio MRI Scanner at the Center for Magnetic Resonance Research at the University of Minnesota. The task was projected onto a screen inside the scanner's bore that participants could view using a mirror on the head coil. Using a T2-sensitive echo planar sequence, 197 scans were acquired, 46 axial slices (no gap, interleaved slice acquisition), 4mm in thickness (repetition time 2s, echo time 28ms, array, and a flip angle of ). The voxel size was and there were 61 thousand brain voxels in total. The following preprocessing steps were used (FMRIB software library, FSL; http://fsl.fmrib.ox.ac.uk): motion correction using the first volume in the functional series as the reference volume, slice timing correction, skull stripping, high-pass temporal filtering with a 25s cutoff, prewhitening, and registration to Montreal Neurological Institute standard space.
Onset time indicators for each of the stimuli were convolved with the double-gamma HRF, as shown in Figure Figure2.2. The resulting shape and face stimuli time series were used in the design matrix. Our model was fit to a single subject's dataset. The parcellation technique resulted in 90 parcels with a median parcel size of 729 voxels (min: 500, max: 1,000). Use of the sparse SGLMM resulted in an average spatial dimension reduction of 72% per parcel. The prior probability of activation was fixed at 2% across all voxels by choosing .
Initial values were chosen as the maximum likelihood estimates of each parameter. A single chain with no burn-in was used to analyze each parcel. Owing to the high dimensionality, it is difficult to use convergence criteria beyond a sufficiently small MCSE. Thus, to ensure small MCSE for all parameter estimates, 200,000 samples were drawn from the posterior distribution for each parcel. The largest MCSE of the estimated posterior means of the activation indicator variables was below 0.005, indicating that a sufficient number of samples were drawn. Figure Figure44 displays axial slices showing voxels with BOLD level changes for the faces stimulus exceeding fixation with a high probability. The highlighted voxels have estimated posterior probabilities that exceed 0.8722. Regions highlighted include the amygdala, anterior cingulate cortex, and prefrontal cortex.
The results of Figure Figure44 demonstrate the model's ability to encourage spatial clustering in areas with a high posterior probability of activation. This result reflects the goal of task-based experiments in which an investigator aims to reveal localization of task-related activation effects. Of primary interest for these data is activation of the amygdala, a relatively small functional region located medially within the temporal lobes (shown in Figure Figure4(a)).4(a)). Panels (a) and (b) of Figure Figure44 show the expected activity in the visual cortex. Panels (c) and (d) of Figure Figure44 show activity in the prefrontal cortex, especially the dorsolateral region.
This article presented a new approach to SBVS for regressions located on a lattice, with an emphasis on applications in single-subject fMRI studies. Our approach is novel in that we avoid the use of Zellner's g-prior, and instead use a spike-and-slab prior, where the spike is a point mass at zero. We also employ a flexible autoregressive model to account for the temporal correlation that is known to exist in voxel time series.
We assumed space-time separability, as in Lee and others (2014) and Smith and Fahrmeir (2007). To ensure convergence of the Markov chains in our MCMC scheme, a large number of draws must be made. We found that 200,000 posterior samples are required to ensure convergence. Computation time is limited by the number of parcels and access to a computing cluster. In our application, computations were carried out in a supercomputing environment using Intel Xeon E5-2680, 2.5GHz processors. Thus, our model required 40min of computing time for the 90 parcels. We have seen that inference appears insensitive to parcellation, and that the joint use of the parcellation and the sparse SGLMM greatly reduces the computational burden relative to analysis of a full dataset. The MCMC procedure produces samples from the joint posterior distribution of the model parameters, which obviates adjustment for multiple comparisons in a post-processing step.
It is important to account for voxel-level temporal correlation, and we found that accounting for spatial dependence is no less important. As for the former, our modeling framework efficiently handles temporal autoregression in a fully Bayesian way. Regarding spatial dependence, we compared our method with a non-spatial approach. The results (see Appendix B of the supplementary materials available at Biostatistics online) suggest that our approach yields a lower false-positive rate, and so spatial dependence should not be ignored.
Software in the form of R and C code, together with a sample dataset and complete documentation, is available upon request.
Subject data acquisition was supported by the National Institute of Mental Health (K23MH090421 to Kathryn R. Cullen); and the Biotechnology Research Center [P41RR008079 (Center for Magnetic Resonance Research)]. D.R.M. was supported by University of Minnesota Academic Health Center Faculty Research Development Grant #11.12 (to L.E.E. and Kathryn R. Cullen).
The authors are grateful to Kathryn R. Cullen, MD, University of Minnesota, for providing the single-subject dataset used in the application. This work was carried out in part using computing resources at the University of Minnesota Supercomputing Institute. Conflict of Interest: None declared.