Search tips
Search criteria 


Logo of biostsLink to Publisher's site
Biostatistics. 2016 April; 17(2): 291–303.
Published online 2015 November 9. doi:  10.1093/biostatistics/kxv044
PMCID: PMC5006116

Fast, fully Bayesian spatiotemporal inference for fMRI data


We propose a spatial Bayesian variable selection method for detecting blood oxygenation level dependent activation in functional magnetic resonance imaging (fMRI) data. Typical fMRI experiments generate large datasets that exhibit complex spatial and temporal dependence. Fitting a full statistical model to such data can be so computationally burdensome that many practitioners resort to fitting oversimplified models, which can lead to lower quality inference. We develop a full statistical model that permits efficient computation. Our approach eases the computational burden in two ways. We partition the brain into 3D parcels, and fit our model to the parcels in parallel. Voxel-level activation within each parcel is modeled as regressions located on a lattice. Regressors represent the magnitude of change in blood oxygenation in response to a stimulus, while a latent indicator for each regressor represents whether the change is zero or non-zero. A sparse spatial generalized linear mixed model captures the spatial dependence among indicator variables within a parcel and for a given stimulus. The sparse SGLMM permits considerably more efficient computation than does the spatial model typically employed in fMRI. Through simulation we show that our parcellation scheme performs well in various realistic scenarios. Importantly, indicator variables on the boundary between parcels do not exhibit edge effects. We conclude by applying our methodology to data from a task-based fMRI experiment.

Keywords: Bayesian variable selection, Dimension reduction, MCMC, Parallel computation

1. Introduction

This article develops a spatial Bayesian variable selection (SBVS) procedure for detecting blood oxygenation level dependent (BOLD) activation in functional magnetic resonance imaging (fMRI) experiments. fMRI is an established method for revealing region-specific activation in the human brain that occurs in response to tasks or stimuli (Bandettini, 2009). During a single-subject fMRI experiment, a participant performs a task in response to a stimulus while 3D images of the subject's brain are collected every 2–3 s. The images capture the BOLD signal contrast, allowing us to indirectly observe neuronal activity. The presence (or absence) of neuronal activity in response to stimuli is used to describe brain function networks, to assess brain development, or to assess impairment related to conditions, including alcoholism and brain trauma.

Single-subject fMRI data have a complex structure. Each collected image comprises a 3D square grid, or lattice, of volume elements (voxels). The BOLD response is observed over time at each voxel, resulting in a large quantity of data (in excess of 20 million observations per person) that exhibit both spatial and temporal dependence. Regression models are used to analyze fMRI data such that the per-voxel regression coefficients quantify the brain activity, relative to a rest condition, in that voxel during the task; a voxel is considered inactive if the magnitude of the BOLD contrast is not significantly different from zero. The standard non-Bayesian approach to computing estimates of the activation magnitude for a single brain follow, e.g., Friston and others (1995) and Worsley (2003), which do not incorporate spatial modeling. Several Bayesian approaches have appeared in the neuroimaging literature (Woolrich and others, 2004; Bowman, 2007; Smith and Fahrmeir, 2007). In particular, Smith and Fahrmeir (2007) proposed an SBVS approach in which the regressions are located on a voxel-based lattice, and latent binary indicators represent whether each voxel on the lattice is active or inactive.

This SBVS approach considers voxels as having spatially linked regressions. By borrowing strength across neighboring voxels, this approach results in more reliable inference for voxels where there is little information, i.e., small signal-to-noise ratio (SNR). The primary drawback of the above-mentioned SBVS approaches is that temporal correlation within each voxel time series is ignored in the interest of computational efficiency (Smith and others, 2003; Smith and Fahrmeir, 2007). Our model accounts for both types of dependence. Another innovation of our proposed method is the use of the sparse spatial generalized linear mixed model (SGLMM) of Hughes and Haran (2013) to model latent indicator variables. The sparse SGLMM permits much faster computation than does the more commonly used Ising model (Ising, 1925).

This article also utilizes a brain parcellation. A typical adult brain has more than 50,000 voxels, whereas our parcellation method results in parcels potentially comprising equation M1 voxels. This parcellation speeds computation in two ways: (1) parcel-level analyses can be done in parallel, and (2) the “size” of the required spatial model is reduced. Thus, the use of the sparse SGLMM in conjunction with a parcellation allows for rapid simulation of posterior samples. The computational efficiency of our approach allows us to make more statistically principled modeling decisions, i.e., our modeling need not be driven by computational considerations. The result is a full, and fully Bayesian, model than can be applied efficiently.

We show that our method is insensitive to model misspecification and suffers minimal edge effects (edge effects could arise because voxels on the edge of a parcel are strongly spatially correlated with voxels on the edges of neighboring parcels). Using simulated data, we demonstrate that our parcellation-based method is able to achieve a high accuracy rate in predicting voxel- and task-specific activations. A full-brain dataset is used to illustrate the applicability of the method. Maps of estimated activation probabilities are produced for these data.

The article is organized as follows. In Section 2, our model is specified, and previous methods are described. In Section 3, the innovations introduced in our approach are described in detail. In Section 4, a simulation study is carried out. In Section 5, our model is applied to a single-subject dataset from a study on depression in adolescents (Musgrove and others, 2015). The article ends in Section 6 with a discussion. Additional supporting material and details regarding our Markov chain Monte Carlo (MCMC) sampling scheme are given in Appendix A of the supplementary materials (available at Biostatistics online).

2. Current approaches for the statistical analysis of fMRI data

For the equation M2th voxel, the voxel time series equation M3 equation M4 comprises the BOLD response over equation M5 time points. For each such time series, we have a linear model:

equation M6

The baseline trend equation M7 is modeled as a fixed parameter or as a set of polynomial or trigonometric functions (Smith and Fahrmeir, 2007). The activation profile is captured via the equation M8-vector equation M9, which is the magnitude of the BOLD response to the equation M10 stimuli. The design matrix equation M11, equation M12, is composed of columns that are a function of the task stimuli corresponding to equation M13. Temporal correlation is accounted for by equation M14, an equation M15-vector of autoregression coefficients, with equation M16 the equation M17 matrix of lagged prediction errors, while equation M18. Spatial dependence between voxels is induced via latent indicator variables equation M19 such that equation M20 if equation M21 and equation M22 if equation M23, where equation M24 indexes the stimulus effects in equation M25. Estimation of equation M26 and equation M27 is of primary interest.

SBVS methods were first applied in fMRI by Smith and others (2003), and more recently in Smith and Fahrmeir (2007). In the latter paper, Smith and Fahrmeir considered a full-brain approach that introduced spatial dependence via an Ising prior for the latent binary indicator variables. The Ising model (Ising, 1925) is a binary Markov random field model (Kindermann and Snell, 1980) that was first employed in the study of ferromagnetism. The Ising prior is appealing from a modeling point of view but has an intractable normalizing function. This problem can be addressed using perfect sampling (Møller, 1999; Moller and others, 2006) or one of various approximations (Gelman and Meng, 1998; Wang and Landau, 2001; Zhang and Ma, 2007). The former leads to burdensome computation. The latter fails to guarantee that the Markov chain has the desired stationary distribution.

The prior on the regression coefficients in Smith and Fahrmeir (2007) is based on Zellner's g-prior (Smith and Kohn, 1996), which has the form

equation M28

where equation M29. For equation M30, the vector of binary indicator variables over all voxels for the equation M31th stimulus effect, the Ising prior is given by

equation M32

where equation M33 denotes neighboring voxels, equation M34 denotes the indicator function, equation M35 is fixed and represents prior knowledge of activation locations, equation M36 is the interaction effect between neighboring voxels, and equation M37 is a pre-specified weight. For equation M38, the marginal posterior distribution equation M39 is sampled using MCMC, and the probabilities of activation are estimated as the sample means of the indicator variables. Specifically, for equation M40 samples from the posterior distribution of equation M41, the estimated probability of activation is

equation M42

where equation M43 denotes the full indicator variable set sans equation M44. The regression coefficients are estimated, using Rao–Blackwellization (Smith and Fahrmeir, 2007), as

equation M45

The data-based g-prior was chosen for convenience, as it allows the regression coefficients to be integrated out analytically, with the goal of speeding convergence. Further, magnitudes of the stimulus effects (represented by the regression coefficients) are estimated by least squares. The method is purely spatial and so does not account for temporal dependence in voxel time series.

Noting that the lack of temporal modeling leads to lower quality inference, Lee and others (2014) developed a spatiotemporal model using the Ising prior as in Smith and Fahrmeir (2007). There, the authors captured the temporal dependence via a first-order autoregressive (AR(1)) model. Such an approach proves problematic when the regression coefficients are analytically integrated out of the likelihood: the resulting voxel time-series covariance matrix includes an unknown AR(1) coefficient, which necessitates the inversion of a dense equation M46 matrix during each iteration of the sampler. This implies a sizable computational burden.

3. Extending current approaches

Our approach makes use of a parcellation that divides the brain into many non-overlapping parcels. Within each parcel, we apply a SBVS method that also accounts for voxel-level temporal correlation. In the context of fMRI, our procedure is novel in that a sparse SGLMM prior is used to model the spatial dependence among the activation indicators. Parcellation and the sparse SGLMM together permit very efficient sampling even though our model is fully Bayesian.

3.1. Partitioning the brain

We use a divide-and-conquer approach to achieve speed gains over previous methods via analysis of parcels generated pseudorandomly as follows. First, divide the brain into equation M47 cubes, each of which has equation M48 voxels on a side (where equation M49 is chosen by the investigator). For example, equation M50 results in parcels of size 729 voxels. Since the brain is not rectangular, many of the parcels will include fewer than equation M51 voxels. We have found that a minimum of 500 voxels per parcel results in excellent spatial inference. Next, an iterative algorithm is implemented that ensures parcels have a minimum of 500 voxels while ensuring that the underlying graph is connected. The final dataset for analysis comprises equation M52 parcels, with the equation M53th parcel having equation M54 voxels. Computational gains are achieved since the parcels can be analyzed in parallel. The parcellation could potentially be based on an existing anatomical atlas, such as Brodmann areas (Tzourio-Mazoyer and others, 2002), but we found that such an approach leads to an excessive number of parcels with fewer than 500 voxels. Thus, we will apply the divide-and-conquer approach in both our simulation study and application to real data.

3.2. SBVS with temporal correlation

We develop a Bayesian variable selection scheme (George and McCulloch, 1997) for the regression coefficients of the voxel-level regression (2.1). We place a spike-and-slab mixture prior on the regression coefficients equation M55 such that each equation M56 is drawn from a diffuse normal distribution (the slab) or a point mass at zero (the spike). This structure reflects the prior belief that a coefficient is non-zero or zero, respectively. Latent indicator variables equation M57 are introduced such that the mixture prior for each equation M58 has the form

equation M59

where equation M60 is an unknown stimulus-level variance and equation M61 denotes a point mass at zero. We make the natural assumption that the regression coefficients are a priori independent conditional on the indicator variables (George and McCulloch, 1997). We model the spatial dependence between voxels by placing a spatial prior on the indicator variables (discussed in Section 3.3).

The columns of the design matrix equation M62 contain the transformed stimulus function, a hemodynamic response function (HRF) convolved with task onset times and durations. The use of the HRF is motivated by the fact that the BOLD response is delayed in time after the underlying neuronal activation, and occurs continuously, rather than in a binary fashion, at the task onset times. Since the focus of this article is on our spatial prior and our parcellation scheme, we use an “off-the-shelf” HRF, namely the gamma density. Other possibilities include the Poisson probability mass function, and the double-gamma canonical HRF (Lindquist and others, 2009).

To account for the serial correlation present in the univariate voxel time series, we employ an AR(2) model that is easy to implement and computationally efficient. The AR(2) model offers a trade-off between the computational complexity of a higher-order AR process and a simpler AR(1) process. Similar to Penny and others (2003), we include a matrix of lagged prediction errors, denoted equation M63, in the regression model. We use a zero mean and vague normal prior on the AR(2) coefficients equation M64. To complete the voxel-level prior specification, the prior on the error variance is equation M65, and the prior on the variance of the non-zero regression coefficients is equation M66. In this way, regression coefficients across voxels share a prior variance, which results in additional smoothing beyond that induced by the spatial prior.

3.3. Sparse SGLMM prior

Here we describe our spatial prior on the voxel- and task-specific binary indicator variables equation M67. We assume a sparse areal generalized linear mixed model (Hughes and Haran, 2013) for each equation M68 equation M69. Specifically, equation M70 are conditionally independent Bernoulli random variables with a probit link function, i.e.,

equation M71

where equation M72 denotes the cumulative distribution function of a standard normal random variable, equation M73 is a vector of synthetic spatial predictors, equation M74 is a vector of spatial random effects, equation M75 is an auxiliary variable introduced to facilitate Gibbs sampling (Holmes and Held, 2006), and equation M76 is fixed to reflect the prior probability of activation. Voxels are located at the vertices of an underlying undirected graph, the structure of which reflects spatial adjacency among voxels. We represent the graph using its parcel-level adjacency matrix equation M77, which is equation M78 with entries given by equation M79 and equation M80. In a 2D analysis, we let a voxel neighborhood comprise the four nearest voxels. With 3D fMRI data, a neighborhood contains the 26 nearest voxels. The prior for the spatial random effects is

equation M81

where equation M82 is a smoothing parameter; equation M83 is an equation M84 matrix, the columns of which are the equation M85 principal eigenvectors of equation M86; and equation M87 is the graph Laplacian. Note that equation M88 above is the equation M89th row of equation M90. The columns of equation M91 are multiresolutional spatial basis vectors that are well suited for spatial smoothing and allow us to capture both small-scale and large-scale spatial variation typically exhibited by fMRI data (Woolrich and others, 2004). Following Hughes and Haran (2013), we introduce sparsity by selecting a small subset of the columns of equation M92. This choice permits spatial smoothing while reducing the dimensionality considerably (typically, equation M93). Our choice of prior on the smoothing parameter equation M94 follows Kelsall and Wakefield (1999) such that the shape equation M95 and the scale equation M96. Such a choice results in a large prior mean of the precision implying a small variance of the random effects which does not produce artifactual spatial structure in the posterior.

3.4. Posterior computation and inference

Denote the voxel-level parameters as equation M97, and the parcel-level parameters as equation M98. Within the equation M99th parcel (having equation M100 voxels), we assume that

equation M101

which implies that the between-voxel and between-task parameters are conditionally independent a priori. The posterior distribution is obtained in the usual way by combining priors and the likelihood. To obtain updates for each equation M102, we use a voxel-level likelihood where equation M103 has been integrated out analytically. For equation M104, let equation M105, and let equation M106. Then, conditional on equation M107, the likelihood can be written as a mixture with two components:

equation M108


equation M109

where equation M110 is the voxel-level likelihood when equation M111, and equation M112 is the likelihood when equation M113. Integrating equation M114 out of equation M115, it is straightforward to show that

equation M116

where equation M117. The posterior probability that equation M118 is equation M119, where

equation M120

Conditional on equation M121, set equation M122. Otherwise, equation M123 is updated from its full conditional distribution. We can write equation M124, and since the error term is normally distributed, and each equation M125 has a normal prior distribution, conditional on equation M126, the posterior distribution of equation M127 is given by equation M128, where equation M129 and equation M130.

Posterior sampling of each of equation M131, equation M132, and equation M133 uses probit regression with auxiliary variables, conditional on equation M134 only (Holmes and Held, 2006). The full conditional distributions of all remaining variables are derived in Appendix A of the supplementary materials (available at Biostatistics online).

In order to declare a voxel active or inactive, a threshold on the estimated posterior probability of activation must chosen. We follow Smith and Fahrmeir (2007) and Lee and others (2014) in choosing a threshold value of 0.8722. Lee and others (2014) showed that a threshold of 0.8722 corresponds to a p-value of 0.05. Using a simulation study, we found that this threshold works well in practice.

4. Simulation studies

We carried out two simulation studies to assess the performance of our approach. First, we simulated data from our model under six scenarios to investigate the effects on inference of different values of the spatial smoothing parameter equation M135 and the error variance equation M136. Next, we investigated the parcellation method's effect on activation detection. The latter simulation study employs the neuRosim package (Welvaert and others, 2011) for R (Ihaka and Gentleman, 1996) to simulate 2D datasets exhibiting spatial and temporal dependence.

4.1. Model performance for varying precisions

We applied our proposed method to data simulated from the prior distributions given in (3.2) and (3.3). Under investigation is the influence of the spatial smoothing parameter equation M137 and the error variance equation M138. We choose equation M139 or equation M140. Simulated probabilities are shown in Figure Figure1(a)–(c).1(a)–(c). We chose values of 0.5 and 2 for the error variance.

Fig. 1.
Representative simulated and estimated probability and binary maps. Each image column corresponds to a value of spatial smoothing parameter equation M141. For simulated probability maps (a–c), a binary map is simulated as conditionally independent Bernoullis ...

We fix equation M142 and simulate on the equation M143 (625 voxels) square lattice. On the 2D square lattice, voxels were considered neighbors if they shared a face, thus the maximum number of neighbors per voxel is 4 voxels. The spatial random effects equation M144 are generated according to (3.3), using equation M145 eigenvectors of the adjacency matrix equation M146. This value of equation M147 is specific to the chosen parcel structure. The probability of activation is computed as equation M148 (recall that equation M149 is the equation M150th row of the matrix equation M151 of spatial basis vectors). For purposes of the simulation study, the fixed prior probability, equation M152, was set to zero. Thus, the implied prior probability of activation is 0.5 for each voxel. Simulated activation locations are shown in Figure Figure1(d)–(f).1(d)–(f). A design matrix equation M153 of dimension equation M154 is created such that the first column is the appropriate vector of ones, and the second column corresponds to a stimulus from our application to a block design study, as shown by the black curve in Figure Figure2.2. We set the baseline signal equation M155 to 1,200. Each equation M156 is set to zero or six depending on the value of equation M157. The value of six was chosen to represent a very small percentage increase over baseline. We assume a second-order autoregressive structure with coefficients equation M158, and fix the error variance equation M159. In this setting, the value of equation M160, along with the stimulus time series, corresponds to an average activation amplitude of equation M1616.3. Thus, the SNRs for the simulation study are equation M162 and equation M163. The latter SNR is similar to that observed in our application to real data.

Fig. 2.
Experiment stimuli. Each curve represents a set of stimulus onset times convolved with the gamma density. The gray curves are the shape and fixation stimuli, and the black curve is the faces stimulus. The faces stimulus was used to simulate data for each ...

For each scenario, we simulated and analyzed 100 datasets. We used a fixed-width approach to convergence diagnosis (Flegal and others, 2008). This led us to draw 200,000 samples for each dataset, which ensured that no Monte Carlo standard error (MCSE) was larger than 0.005. We define accuracy as the percentage of voxels correctly classified, i.e., equation M164 and equation M165 simultaneously, and false-positive rate as the percentage of false positives, i.e., equation M166. Using a threshold for the estimated posterior means of the equation M167 of 0.8722, the accuracy and false-positive rate are estimated for each simulated dataset, and the results are presented as medians across all 100 datasets.

The summaries are given in Table Table1.1. We see that our approach performs well across all values of the spatial smoothing parameter equation M168. The median voxel accuracy rates are above 99% for an error variance of 0.5, while the median false-positive rate is equation M1692%. As the error variance increases to two, the accuracy rate remains very high. Figure Figure1(g)–(l)1(g)–(l) depicts our model's ability to recover the probability and binary maps. Regression and autoregression coefficients, error variances, and spatial parameters were estimated with minimal error (not shown).

Table 1.
Median accuracy and false-positive rates for simulated data. Median percentage (minimum and maximum) of correctly classified voxels and false positives over 100 simulated datasets for each scenario are displayed. The Full dataset and Parcellation simulations ...

4.2. Edge effects

For the second simulation study, we used the neuRosim package (Welvaert and others, 2011) to simulate spatiotemporal fMRI outcomes on the equation M174 square lattice. Four regions of activation were selected, as shown in Figure Figure3(a),3(a), and the faces stimulus was used, as shown in Figure Figure2.2. The locations of activation were chosen such that a region of activation overlaps two quadrants, and one quadrant is empty. Each of the 2,500 voxels was simulated from a white noise process or from an autoregressive process of order 2 with equation M175, dependent on the true value of equation M176 shown in Figure Figure3(a).3(a). The SNR was chosen to be 4.5, as in our application to real data.

Fig. 3.
Parcellation for a dataset simulated on the equation M177 square lattice. (a) True binary map showing areas of activation in white. (b) Estimated binary map based on analysis of full dataset. (c) Estimated binary map based on analysis of four equation M178 parcels. Dashed lines represent ...

To assess sensitivity to the chosen parcellation, we simulated 100 datasets from this scenario, and applied our model to each equation M179 dataset in full. Next, we partitioned each equation M180 dataset into four equation M181 parcels, and carried out estimation on each of the parcels independently. Then the four parcels were rejoined before classification accuracy was assessed.

The summaries are given in Table Table1.1. We see that, similar to the varying precisions simulation study, the high sensitivity, the high percentage of voxels correctly classified, and the relatively low false-positive rate suggest that our model is insensitive to parcellation. Figure Figure3(c)3(c) shows the resulting binary map for a randomly selected dataset. Somewhat surprisingly, the false-positive rate under parcellation was slightly lower than for analyses of the full datasets. We investigated the sensitivity and false-positive rate among voxels along the partition lines, and the results were similar to those for the full datasets (not shown). Regression and autoregression coefficients, error variances, and spatial parameters were estimated with minimal error (not shown).

5. Application to a single subject from an adolescent-depression study

The dataset used to illustrate our method is a single-subject fMRI time series from a visual experiment. The dataset was collected as part of a larger connectivity study designed to quantify the interaction between depression and neural response in adolescents with major depressive disorder (Musgrove and others, 2015). Subjects completed a 6.5 min fMRI task during the scan that had two matching conditions (facial expressions of fear and anger) or neutral objects (geometric shapes). The task included 13 counterbalanced blocks (3 fixation, 5 shape, and 5 faces). Face and shape blocks comprised six consecutive 5-s trials. Fixation blocks were presented at the beginning, after five blocks of shapes and faces, and at the conclusion of the task. In particular, the task was designed to elicit a negative emotional response (Hariri and others, 2002), and so activation was expected in regions, including the amygdala, anterior cingulate cortex (subgenual and supragenual regions), and prefrontal cortex (ventromedial and dorsolateral regions) (Musgrove and others, 2015).

Imaging was performed using a 3T Siemens Trio MRI Scanner at the Center for Magnetic Resonance Research at the University of Minnesota. The task was projected onto a screen inside the scanner's bore that participants could view using a mirror on the head coil. Using a T2-sensitive echo planar sequence, 197 scans were acquired, 46 axial slices (no gap, interleaved slice acquisition), 4 mm in thickness (repetition time 2 s, echo time 28 ms, equation M182 array, and a flip angle of equation M183). The voxel size was equation M184 and there were 61 thousand brain voxels in total. The following preprocessing steps were used (FMRIB software library, FSL; motion correction using the first volume in the functional series as the reference volume, slice timing correction, skull stripping, high-pass temporal filtering with a 25 s cutoff, prewhitening, and registration to Montreal Neurological Institute standard space.

Onset time indicators for each of the stimuli were convolved with the double-gamma HRF, as shown in Figure Figure2.2. The resulting shape and face stimuli time series were used in the design matrix. Our model was fit to a single subject's dataset. The parcellation technique resulted in 90 parcels with a median parcel size of 729 voxels (min: 500, max: 1,000). Use of the sparse SGLMM resulted in an average spatial dimension reduction of 72% per parcel. The prior probability of activation was fixed at 2% across all voxels by choosing equation M185.

Initial values were chosen as the maximum likelihood estimates of each parameter. A single chain with no burn-in was used to analyze each parcel. Owing to the high dimensionality, it is difficult to use convergence criteria beyond a sufficiently small MCSE. Thus, to ensure small MCSE for all parameter estimates, 200,000 samples were drawn from the posterior distribution for each parcel. The largest MCSE of the estimated posterior means of the activation indicator variables was below 0.005, indicating that a sufficient number of samples were drawn. Figure Figure44 displays axial slices showing voxels with BOLD level changes for the faces stimulus exceeding fixation with a high probability. The highlighted voxels have estimated posterior probabilities that exceed 0.8722. Regions highlighted include the amygdala, anterior cingulate cortex, and prefrontal cortex.

Fig. 4.
Voxel-level estimated posterior probabilities of increased activity over baseline for the faces stimulus (thresholded at 0.8722). Slice numbers denote axial slices (46 slices total). Important contiguous regions of activation include the amygdala (slice ...

The results of Figure Figure44 demonstrate the model's ability to encourage spatial clustering in areas with a high posterior probability of activation. This result reflects the goal of task-based experiments in which an investigator aims to reveal localization of task-related activation effects. Of primary interest for these data is activation of the amygdala, a relatively small functional region located medially within the temporal lobes (shown in Figure Figure4(a)).4(a)). Panels (a) and (b) of Figure Figure44 show the expected activity in the visual cortex. Panels (c) and (d) of Figure Figure44 show activity in the prefrontal cortex, especially the dorsolateral region.

6. Discussion

This article presented a new approach to SBVS for regressions located on a lattice, with an emphasis on applications in single-subject fMRI studies. Our approach is novel in that we avoid the use of Zellner's g-prior, and instead use a spike-and-slab prior, where the spike is a point mass at zero. We also employ a flexible autoregressive model to account for the temporal correlation that is known to exist in voxel time series.

We assumed space-time separability, as in Lee and others (2014) and Smith and Fahrmeir (2007). To ensure convergence of the Markov chains in our MCMC scheme, a large number of draws must be made. We found that 200,000 posterior samples are required to ensure convergence. Computation time is limited by the number of parcels and access to a computing cluster. In our application, computations were carried out in a supercomputing environment using Intel Xeon E5-2680, 2.5 GHz processors. Thus, our model required 40 min of computing time for the 90 parcels. We have seen that inference appears insensitive to parcellation, and that the joint use of the parcellation and the sparse SGLMM greatly reduces the computational burden relative to analysis of a full dataset. The MCMC procedure produces samples from the joint posterior distribution of the model parameters, which obviates adjustment for multiple comparisons in a post-processing step.

It is important to account for voxel-level temporal correlation, and we found that accounting for spatial dependence is no less important. As for the former, our modeling framework efficiently handles temporal autoregression in a fully Bayesian way. Regarding spatial dependence, we compared our method with a non-spatial approach. The results (see Appendix B of the supplementary materials available at Biostatistics online) suggest that our approach yields a lower false-positive rate, and so spatial dependence should not be ignored.

7. Software

Software in the form of R and Cequation M186 code, together with a sample dataset and complete documentation, is available upon request.


Subject data acquisition was supported by the National Institute of Mental Health (K23MH090421 to Kathryn R. Cullen); and the Biotechnology Research Center [P41RR008079 (Center for Magnetic Resonance Research)]. D.R.M. was supported by University of Minnesota Academic Health Center Faculty Research Development Grant #11.12 (to L.E.E. and Kathryn R. Cullen).

Supplementary Material

Supplementary Data:


The authors are grateful to Kathryn R. Cullen, MD, University of Minnesota, for providing the single-subject dataset used in the application. This work was carried out in part using computing resources at the University of Minnesota Supercomputing Institute. Conflict of Interest: None declared.


  • Bandettini P. A. (2009). Seven topics in functional magnetic resonance imaging. Journal of Integrative Neuroscience 83, 371–403. [PMC free article] [PubMed]
  • Bowman F. D. (2007). Spatiotemporal models for region of interest analyses of functional neuroimaging data. Journal of the American Statistical Association 102478, 442–453.
  • Flegal J. M., Haran M., Jones G. L. (2008). Markov Chain Monte Carlo: can we trust the third significant figure? Statistical Science 232, 250–260.
  • Friston K., Ashburner J., Frith C., Poline J., Heather J., Frackowiak R. (1995). Spatial registration and normalization of images. Human Brain Mapping 2, 165–189.
  • Gelman A., Meng X.-L. (1998). Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical science 132, 163–185.
  • George E., McCulloch R. (1997). Approaches for Bayesian variable selection. Statistica Sinica 7, 1126–1141.
  • Hariri A. R., Tessitore A., Mattay V. S., Fera F., Weinberger D. R. (2002). The amygdala response to emotional stimuli: a comparison of faces and scenes. Neuroimage 171, 317–323. [PubMed]
  • Holmes C. C., Held L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis 11, 145–168.
  • Hughes J., Haran M. (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 751, 139–159.
  • Ihaka R., Gentleman R. (1996). R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5, 299–314.
  • Ising E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik 311, 253–258.
  • Kelsall J. E., Wakefield J. C. (1999). Discussion of “Bayesian Models for Spatially Correlated Disease and Exposure Data”, by Best et al. In: Berger, J. O., Bernardo, J. M., Dawid, A. P. and Smith, A. F. M. (editors), Bayesian Statistics 6. Oxford: Oxford University Press.
  • Kindermann R., Snell J. L. (1980) Markov Random Fields and Their Applications. Providence, RI: American Mathematical Society.
  • Lee K. J., Jones G., Caffo B., Bassett S. S. (2014). Spatial Bayesian variable selection models on functional magnetic resonance imaging time-series data. Bayesian Analysis 93, 699–732. [PMC free article] [PubMed]
  • Lindquist M., Loh J. M., Atlas L., Wager T. (2009). Modeling the hemodynamic response function in fMRI: efficiency, bias and mis-modeling. Neuroimage 451, S187–S198. [PMC free article] [PubMed]
  • Møller J. (1999). Perfect simulation of conditionally specified models. Journal of the Royal Statistical Society, Series B, Methodological 61, 251–264.
  • Moller J., Pettitt A., Reeves R., Berthelsen K. (2006). An efficient Markov Chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 932, 451–458.
  • Musgrove D., Eberly L. E., Klimes-Dougan B., Basgoze Z., Thomas K., Mueller B., Houri A., Lim K., Kathryn K. C. (2015). Impaired bottom-up effective connectivity between amygdala and subgenual anterior cingulate cortex in unmedicated adolescents with major depression: results from a dynamic causal modeling analysis. Brain Connectivity, doi:10.1089/brain.2014.0312. [PMC free article] [PubMed]
  • Penny W., Kievel S., Friston K. (2003). Variational Bayesian inference for fMRI time series. Neuroimage 193, 727–741. [PubMed]
  • Smith M., Fahrmeir L. (2007). Spatial Bayesian variable selection with application to functional magnetic resonance imaging. Journal of the American Statistical Association 102478, 417–431.
  • Smith M., Kohn R. (1996). Nonparametric regression using Bayesian variable selection. Journal of Econometrics 752, 317–343.
  • Smith M., Pütz B., Auer D., Fahrmeir L. (2003). Assessing brain activity through spatial Bayesian variable selection. NeuroImage 202, 802–815. [PubMed]
  • Tzourio-Mazoyer N., Landeau B., Papathanassiou D., Crivello F., Etard O., Delcroix N., Mazoyer B., Joliot M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 151, 273–289. [PubMed]
  • Wang F., Landau D. P. (2001). Efficient, multiple-range random walk algorithm to calculate the density of states. Physical Review Letters 8610, 2050. [PubMed]
  • Welvaert M., Durnez J., Moerkerke B., Verdoolaege G., Rosseel Y. (2011). neuRosim: an R package for generating fMRI data. Journal of Statistical Software 4410, 1–18.
  • Woolrich M., Jenkinson M., Brady J., Smith S. (2004). Fully Bayesian spatio-temporal modeling of fMRI data. IEEE Transactions on Medical Imaging 232, 213–231. [PubMed]
  • Worsley K. J. (2003). Detecting activation in fMRI data. Statistical Methods in Medical Research 12, 401–418. [PubMed]
  • Zhang C., Ma J. (2007). Simulation via direct computation of partition functions. Physical Review E 763, 036708. [PMC free article] [PubMed]

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press