Computer simulations have played a critical role in functional magnetic resonance imaging (fMRI) research, notably in the validation of new data analysis methods. Many approaches have been used to generate fMRI simulations, but there is currently no generic framework to assess how realistic each of these approaches may be. In this paper, a statistical technique called parametric bootstrap was used to generate a simulation database that mimicked the parameters found in a real database comprising 40 subjects and 5 tasks. The simulations were evaluated by comparing the distributions of a battery of statistical measures between the real and simulated databases. Two popular simulation models were evaluated for the first time by applying the bootstrap framework. The first model was an additive mixture of multiple components and the second one implemented a non-linear motion process. In both models, the simulated components included the following brain dynamics: a baseline, physiological noise, neural activation and random noise. These models were found to successfully reproduce the relative variance of the components and the temporal autocorrelation of the fMRI time series. By contrast, the level of spatial autocorrelation was drastically lower with the additive model. Interestingly, the motion process in the second model intrinsically generated some slow time drifts and increased the level of spatial autocorrelation. These experiments demonstrated that the bootstrap framework is a powerful new tool that can pinpoint the respective strengths and limitations of simulation models.
Functional magnetic resonance imaging (fMRI) measures the hemodynamic correlates of brain neural activity [1,2]. This modality has had a profound impact on human cognitive neuroscience and clinical research over the past fifteen years [3, Figure 1]. Computer simulations have critically contributed to advancing fMRI research in at least two directions. They have first been used to evaluate the performance of data analysis methods in a completely controlled environment, e.g. fMRI data preprocessing, hypothesis testing strategies [5,6], activation detection techniques [7–10] or motion-correction algorithms [11–14]. Simulations have also played a key role in uncovering some of the underlying mechanisms of the fMRI data-generating process, including neural dynamics [15,16], brain metabolism [17,18] and MR physics.
There has been a wide variety of approaches to generating fMRI simulations: a new ad hoc environment has essentially been developed for each particular study. As a consequence, it has so far been difficult to compare the simulation results reported by different groups working on the same question. To fully appreciate the potential value of the results reported in a given simulation study, the fMRI research community would need a rigorous and comprehensive evaluation of the strengths and limitations of the employed simulation framework. Some groups have used an additive model with three main components: a constant baseline, a signal of interest called activation and a random noise, e.g. [20,7,4,8,21–23,9,24,5,6]. Such additive models essentially gave access to the signal-to-noise ratio and the shape of activated regions. The investigators mostly did not question whether the simulations were realistic, as long as plausible values were assigned to the parameters of the simulation model.
More elaborate approaches included motion as part of the data-generating process, either through spatial interpolation [11–13] or using a first-principle model of the MR physics [19,25,26]. Such models gave access to the motion parameters and, in some cases, to the pulse sequence parameters. They were also much more computationally demanding, requiring hours or even days of computation on a single workstation. Unfortunately, whether these models generated realistic simulations has not yet been quantitatively assessed. Rather, the evaluation was a qualitative examination, i.e. a visual check of the presence or the shape of known artefacts on static images.
This paper presents a principled framework to generate an fMRI simulation database and to evaluate its strengths and limitations. In general, the proximity between a simulation database and real fMRI acquisitions depends on the quality of the simulation model, but is also related to the way the simulation parameters are selected to generate the database. Obviously, if the parameters themselves strongly deviated from realistic values, even the most accurate model of the fMRI data-generating process would lead to simulations of poor quality.
We developed a flexible method to address these issues which could, in theory, be applied to any parametric simulation model. The rationale of our method was to replicate a real fMRI database by systematically using the parameters estimated on the real data to set up the parameters of the simulations. This approach was a direct application of a statistical technique called parametric bootstrap. Iterating this process on each subject of a large database ensured that the variability of the simulated parameters precisely mimicked the variability found in the real database under scrutiny, for example in terms of brain anatomy or signal-to-noise ratio. The evaluation of the quality of the simulated database proceeded by assessing which statistical properties of fMRI data were well replicated and which ones were not. Specifically, a battery of statistical measures was estimated on each individual dataset and group-level summaries were derived. The significance of the differences between the real and simulated group-level summaries was established using a non-parametric bootstrap of the subjects found in the database.
We applied the bootstrap framework to two simulation models that encapsulated the main features of common models used in the literature, namely an additive (ADD) model and a simple motion (SM) model. These experiments were designed to demonstrate that the bootstrap approach is feasible on recent popular models. To the best of our knowledge, this was also the first quantitative evaluation conducted on an fMRI simulation database. The simulated components included a baseline, slow time drifts, physiological noise, neural activation and random noise. This range of brain dynamics is as large as, or larger than, what has been considered thus far by other groups working on fMRI simulations. Parametric bootstrap was used to replicate a real database, the so-called functional reference battery (FRB), featuring a large population (N = 40 subjects) and 5 tasks designed to engage a broad range of brain systems. The FRB database was replicated using each model, resulting in two simulation databases, respectively called FRB-ADD and FRB-SM. We evaluated the quality of each simulation database by investigating the following individual statistical measures: the respective variance of the estimated components of the model, the spatial and temporal autocorrelations, and the distribution of the variance in a principal component analysis.
The most critical ingredient in a computer-based simulation study is the data-generating process (DGP). The DGP f describes the mechanisms used to generate a data sample y of a random variable Y. For example, f could represent the process of drawing multiple independent samples from a Gaussian random number generator. The DGP f is perfectly known and controlled in a simulation study, yet simulations are also generally intended to replicate a real experiment. In the case of brain imaging, y would be an individual set of fMRI time series, Y would represent a subject, and f would be a model of the random process leading from the individual brain functional architecture to a digital BOLD measure of the brain activity acquired during an MRI session. The real DGP f or its parameters θ are actually unknown, and the investigator is bound to rely on a more or less elaborate approximation f̂. The bootstrap is a general statistical method to perform this estimation: it relies on a real data sample y to derive a bootstrap-estimated DGP f̂y.
The other main ingredient in computer simulations is a function m that can be estimated on each data sample y, resulting in one measurement m(y) (see Figure C.1). The measure m(y) could for example be a set of maps of regression coefficients estimated in a linear model, or their associated level of significance against a null hypothesis, i.e. a statistical map. One basic objective in a computer simulation study would be to uncover the distribution of the measures m(y) when y follows the DGP f. This can be formally done through Monte-Carlo simulations, which consist in drawing from f a number B of independent and identically distributed (i.i.d.) data replicates and using these samples to derive an estimate Ĝ of the (left-sided) cumulative distribution function (cdf) G of m(y):
Ĝ(x) = (1/B) #{ b ∈ {1, …, B} : m(y*b) ≤ x } ≃ G(x),  (1)

where # is the cardinality of a set and the symbol ≃ means that the two terms are asymptotically equal as B tends toward infinity.
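As an illustration, the Monte-Carlo estimate of Equation (1) can be sketched in a few lines; the toy DGP and measure below (a Gaussian sample and its mean) are hypothetical stand-ins, not part of the original study:

```python
import numpy as np

def monte_carlo_cdf(dgp, measure, B, rng):
    """Monte-Carlo estimate of the cdf G of m(y) under the DGP f.

    dgp(rng) draws one i.i.d. data replicate y*, measure(y) returns a
    scalar m(y); B replicates are used, as in Equation (1)."""
    samples = np.array([measure(dgp(rng)) for _ in range(B)])
    def G_hat(x):
        # proportion of replicates b such that m(y*b) <= x
        return float(np.mean(samples <= x))
    return G_hat

rng = np.random.default_rng(0)
# toy DGP: 100 i.i.d. N(0,1) samples; toy measure: their mean
G_hat = monte_carlo_cdf(lambda r: r.normal(size=100), np.mean, B=2000, rng=rng)
# the sample mean is ~N(0, 1/100), so G_hat(0) should be close to 0.5
```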
Some further explicit parametric assumptions on the DGP make it possible to raise even more interesting questions. A parametric model provides the investigator with explicit control over the different aspects of the DGP, such as the localisation of truly activated regions or the signal-to-noise ratio. Formally, the DGP takes a closed-form expression fθ which depends on a set of ground-truth parameters θ. In our case, these parameters would represent all the features of brain dynamics included in the model, e.g. brain anatomy, characteristics of brain tissues, modulations related to the metabolic response to neural activity, motion, etc. Under such a parametric assumption, the parametric bootstrap simulation (PBS) consists in replacing the real unknown parameters θ by some estimate derived from a real dataset y. Section 3 will describe some particular models in detail, but the techniques presented here are general. One possible application of a parametric DGP would be to perform a sensitivity analysis, i.e. to determine the influence of one parameter in θ on G(x). To build on the previous examples, it would be interesting to assess how the signal-to-noise ratio affects the estimated statistics in the activated regions. Moreover, if m(y) were an estimator θ̂ of the parameters θ, it would be possible to investigate the accuracy of the estimator, and even to compare the accuracy of different estimators, as was commonly done using receiver-operating characteristic (ROC) curves.
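The parametric bootstrap idea can be sketched on a deliberately simplified, hypothetical DGP: a single activation regressor with Gaussian noise. The estimator and the t statistic used as the measure m below are illustrative choices, not the estimation procedure of Section 3:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical "real" dataset: one noisy activation time course
T = 88
x_act = np.sin(2 * np.pi * np.arange(T) / 14.0)       # known regressor X_A
y_real = 2.0 * x_act + rng.normal(scale=1.5, size=T)  # unknown beta, sigma

# parametric bootstrap: estimate theta = (beta, sigma) on the real data ...
beta_hat = float(x_act @ y_real / (x_act @ x_act))
sigma_hat = float(np.std(y_real - beta_hat * x_act, ddof=1))

# ... and plug the estimate into the closed-form DGP to draw replicates
def pbs_replicate(rng):
    return beta_hat * x_act + rng.normal(scale=sigma_hat, size=T)

def t_stat(y):
    # t statistic of the regression coefficient, the measure m(y)
    b = x_act @ y / (x_act @ x_act)
    s = np.std(y - b * x_act, ddof=1)
    return float(b * np.sqrt(x_act @ x_act) / s)

# distribution of m(y) under the estimated DGP f_theta_hat
t_vals = [t_stat(pbs_replicate(rng)) for _ in range(500)]
```

Rerunning this sketch with a different `sigma_hat` would amount to a sensitivity analysis of the t statistic with respect to the signal-to-noise ratio.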
The parametric bootstrap can be extended to a full fMRI database. An fMRI database is a collection of individual datasets collected for a number N of subjects. Those subjects are independent samples from a specific population, for example adults between the ages of 20 and 40 with no history of neurological or psychiatric disorder. Each fMRI dataset yn is modeled as one sample drawn from a subject-specific random variable Yn with some subject-specific distribution fθn. Replicating the real database using PBS simply consists in estimating the parameters θn for each subject Yn and generating a PBS replicate from fθ̂n. The PBS method to replicate an fMRI simulation database was summarised in the central part of Figure C.2.
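Replicating a database with PBS then amounts to a loop over subjects; the per-subject estimator and DGP below are toy placeholders for the models of Section 3:

```python
import numpy as np

def estimate_theta(y):
    # toy per-subject estimator of theta_n (mean and std of the series)
    return {"mu": float(np.mean(y)), "sigma": float(np.std(y))}

def simulate(theta, T, rng):
    # toy subject-specific DGP f_theta
    return rng.normal(theta["mu"], theta["sigma"], size=T)

rng = np.random.default_rng(2)
# hypothetical real database: N = 5 subjects, 88 time samples each
real_db = [rng.normal(100.0 + n, 5.0, size=88) for n in range(5)]
# PBS replication: estimate theta_n on each subject, then draw from f_theta_n
sim_db = [simulate(estimate_theta(y_n), len(y_n), rng) for y_n in real_db]
```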
There are a number of concerns with the theoretical foundation of PBS in the context of individual fMRI simulations. The chief concern, which would actually stand for any parametric approach, is the validity of the parametric model. Other possible shortcomings are the quality of the procedure used to estimate the parameters of the model, or the applicability of the bootstrap approach itself, which depends on the properties of the measure m. As a consequence, it is crucial to evaluate whether an fMRI simulation database generated through PBS exhibits realistic statistical features. Our evaluation strategy was to compare the distribution of a battery of statistical features between the real fMRI database and its PBS replication.
For a particular statistical feature mn = m(yn), evaluating the quality of the simulation database boils down to comparing the distribution of the measures derived from the real samples to the distribution of their simulated replications. For that purpose, it is possible to derive some summary statistics s(m1, …, mN) of the distribution, such as the mean, the median or the upper and lower 5% percentiles, and to compare this summary s derived on the real database to the value s* derived on the simulation database. To statistically assess any departure between s and s*, we resorted to a non-parametric bootstrap of the subjects of the database. Those subjects are independent and identically distributed (i.i.d.) and, in addition, our summary statistics are smooth functions of the mean, so the most standard version of the non-parametric bootstrap is adapted to that problem, see for example p. 80–90. The non-parametric bootstrap consists in drawing subjects with replacement from the population and deriving a new replicate s*b of the summary statistics for this surrogate population. By repeating this step a large number B of times, the bootstrap replicates can be used to derive a two-sided symmetric bootstrap confidence interval on s*, based on the bootstrap cdf defined in Equation (1). If the real summary statistics s did not fall within the bootstrap confidence interval for s* at a reasonable level of confidence, e.g. p < 0.01, we would conclude that there was a significant departure between the real and simulated data. The whole evaluation procedure was summarised in Figure C.2.
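A minimal sketch of this subject-level non-parametric bootstrap test, with hypothetical per-subject measures and an illustrative real summary value:

```python
import numpy as np

def bootstrap_ci(measures, summary, B=2000, alpha=0.01, rng=None):
    """Two-sided (1 - alpha) bootstrap confidence interval on a group
    summary s(m_1, ..., m_N), drawing subjects with replacement."""
    if rng is None:
        rng = np.random.default_rng(0)
    m = np.asarray(measures)
    reps = np.array([summary(m[rng.integers(0, len(m), len(m))])
                     for _ in range(B)])
    return (float(np.quantile(reps, alpha / 2)),
            float(np.quantile(reps, 1 - alpha / 2)))

rng = np.random.default_rng(3)
m_sim = rng.normal(0.5, 0.1, size=40)   # hypothetical simulated measures m_n*
lo, hi = bootstrap_ci(m_sim, np.median, rng=rng)
s_real = 0.52                           # hypothetical real-database summary s
significant_departure = not (lo <= s_real <= hi)
```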
Ideally, the proposed evaluation method would be performed on the very same measure(s) of interest as the one(s) investigated in the simulation study. Unfortunately, such a measure m usually relies on the ground-truth parameters θn. In such a case, it would not be possible to derive the measures on the real data, because the ground-truth parameters would be unknown. The evaluation method instead relies on a battery of different statistical features m that can be estimated both on the real and the simulated data. The list of statistical features should cover the most salient aspects that can potentially affect a data analysis method. The features used in this work were listed in Section 4.2.
The following notations were used to describe the models of fMRI data generation:

Y = ϕ(B + D + P + A) + E,  (2)

where Y, B, D, P, A and E were matrices and ϕ was a function. The matrix Y was a 3D+t dataset of size T × S that represented the simulated fMRI time series, with T and S standing for the number of time samples and voxels, respectively. The other matrices, of size T × R with R possibly greater than S for high spatial resolution, each represented one component of the model: B the baseline, D the slow time drifts, P the physiological noise, A the neural activation and E the random noise.
The function ϕ simulated the MR imaging process and could be either a linear or a non-linear function. This function potentially depended on additional unknown parameters, e.g. the motion parameters.
The ADD simulations were based on a linear mixture model with a Gaussian noise:

Y = XBβB + XDβD + XPβP + XAβA + E*,  (3)

where XC and βC were respectively the temporal and spatial distributions of component C, with C standing for any of the components B, D, P or A. The T × S matrix E* was a sample of a Gaussian random variable independent in time and space, with a zero mean and a voxel-specific variance σ²υ for all voxels υ. A diagram representation of Equation (3) can be found in Figure C.3. The following parameters were a fixed, known part of the parametric model:
The following parameters of the model were unknown and had to be estimated using a real dataset:
Both the spatial and temporal distributions of the physiological noise had to be estimated, which is a blind source separation problem. As a consequence, the estimation of the parameters of the ADD model was addressed using an iterative procedure combining least-squares linear fits and an independent component analysis (ICA). The outline of the estimation pipeline was:
The parameter estimation pipeline was described in further detail in Annex A. This procedure was not standard and notably relied on an approximation for the estimation of the noise variance. The components B, A and E* of the model have been commonly used in fMRI simulations. By contrast with a common approach, e.g. [23,5,6,22], the noise was not modeled as a random process exhibiting dependencies in both time and space. Instead, the slow time drifts and physiological noise components were added as deterministic mixtures, as was done for example in [20,21]. Lund et al. showed that adding components in the linear model would account for the temporal autocorrelation of the noise, even though this result cannot be directly translated to the ADD model because the study by Lund et al. did not use ICA to identify the physiological noise. The slow time drifts and physiological noise may also account for the spatial correlations in the fMRI noise, but this question is, to our knowledge, still open. All the questions and potential concerns raised in this paragraph were investigated in the evaluation of the ADD simulation database presented in Section 4.
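For illustration, one sample from an ADD-style linear mixture (Equation (3)) can be generated as follows; the regressors, effect sizes and the omission of the physiological noise component P are simplifications for brevity, not the parameters estimated on the FRB database:

```python
import numpy as np

rng = np.random.default_rng(4)
T, S = 88, 500                           # time samples and voxels (toy sizes)

def add_simulation(X, beta, sigma_v, rng):
    """One ADD sample: Y = sum_C X_C beta_C + E*, with E* Gaussian,
    independent in time and space, with voxel-specific variance."""
    Y = sum(X[c] @ beta[c] for c in X)
    return Y + rng.normal(size=Y.shape) * sigma_v  # sigma_v broadcasts over time

X = {"B": np.ones((T, 1)),                            # constant baseline
     "D": np.linspace(-1.0, 1.0, T)[:, None],         # slow time drift regressor
     "A": np.tile([0.0] * 7 + [1.0] * 7, 7)[:T, None]}  # on-off block paradigm
beta = {"B": 100.0 * np.ones((1, S)),
        "D": rng.normal(0.0, 2.0, size=(1, S)),
        "A": np.zeros((1, S))}
beta["A"][0, :50] = 3.0                               # 50 "activated" voxels
sigma_v = rng.uniform(1.0, 3.0, size=S)               # voxel-specific noise std
Y = add_simulation(X, beta, sigma_v, rng)
```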
The simulation model SM took the following form:

Y = ϕSM(XBβB + XPβP + XAβA) + E*.  (4)
The deterministic components of the model, i.e. B = XBβB, P = XPβP and A = XAβA, were first added to generate a motion-free and noise-free space-time dataset Yhigh of size T × R. The spatial dimension R was significantly higher than the native functional spatial dimension S, e.g. a high-resolution voxel size of 1 × 1 × 1 mm3 for Yhigh compared to a native functional resolution of 4 × 4 × 4 mm3 for Y. A simple rigid-body motion process ϕSM was then applied to Yhigh using the translation parameters τ and rotation parameters ρ. For each slice and each volume of the fMRI dataset, a cubic interpolation was performed in time on Yhigh, τ and ρ to derive a brain volume V of size 1 × R and a set of six motion parameters that precisely corresponded to the time of acquisition of this slice and this volume. The rigid-body transformation was applied to V using a tricubic interpolation scheme. For each voxel of the slice at the native functional resolution, the values of all high-resolution voxels inside the low-resolution voxel were averaged. This process was iterated over all slices and volumes to produce a low-resolution dataset Ylow of size T × S. A Gaussian noise E* of size T × S was finally added to Ylow to generate one sample of fMRI simulation. The flowchart of the SM model was depicted in Figure C.4.
The parameters XB and XA were known and defined in the same way as for the ADD model. The unknown parameters of the model SM were almost the same as for the ADD model. The differences were that SM did not have a slow time drifts component but instead included some unknown motion parameters τ and ρ, each of size T × 3, for a total of 6 parameters per volume of the time-space fMRI dataset.
The parameters of the model SM were estimated using essentially the same pipeline as for the ADD model. This pipeline actually featured a motion parameter estimation, even though motion was not included in ADD. The main additional step was to oversample the spatial distributions βB, βP and βA to the high spatial dimension R. This was done through nearest-neighbour interpolation for the physiological noise and the activation components. The high-resolution baseline spatial distribution was derived by combining a segmentation of a T1 image of the subject and the ADD estimate of B, see Annex B for details.
As was noted for the ADD model, the method used to estimate the parameters of the model SM relied on a number of approximations. First, the nearest-neighbour oversampling of the spatial distributions of P and A neglected the partial-volume effects in the low-resolution estimates of the ADD model. Second, the high-resolution baseline was derived by assuming a one-to-one correspondence between the T1 tissue classes and the baseline coefficients, which may not be accurate. Note also that the transformation ϕSM was very similar to some other transformations previously used in the literature [12,11], with small differences: Freire et al. used a low-resolution space to simulate the motion, and neither this group nor Ardekani et al. included the slice timing effects, i.e. the same volume and transformation were used to generate all the slices in each volume of the fMRI simulation. To our knowledge, none of the studies that have used a model of motion have considered a structured noise such as P in addition to the baseline. Another important modelling choice was to exclude the slow time drifts component from the SM model. The origin of these slow time drifts is somewhat controversial, and they probably represent a mixture of motion, physiological noise and scanner drifts. We postulated that the motion process could account for most of the drifts. Excluding the drifts was therefore regarded as the lesser of two evils, because adding an extra slow time drifts component on top of the motion-induced drifts would result in a far too large total amount of slow time drifts. The potential concerns raised in this paragraph were investigated in the evaluation of the SM simulation database presented in Section 4.
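The spirit of the SM resampling step (rigid-body transformation of a high-resolution volume by cubic interpolation, followed by averaging into low-resolution voxels) can be sketched with scipy; the single rotation axis, the fixed factor-4 downsampling and the function name are illustrative assumptions, and slice timing is ignored:

```python
import numpy as np
from scipy.ndimage import affine_transform

def apply_rigid_motion(vol_high, angle_rad, shift_vox):
    """Rigid-body motion (one rotation about the z axis plus a translation,
    in high-resolution voxel units) applied by cubic spline interpolation,
    then averaged down to the native resolution (factor 4 per axis)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    centre = (np.array(vol_high.shape) - 1) / 2.0
    # affine_transform computes output[o] = input[R @ o + offset];
    # choose the offset so the rotation is about the volume centre
    offset = centre - R @ (centre + np.asarray(shift_vox, dtype=float))
    moved = affine_transform(vol_high, R, offset=offset, order=3)
    # average each 4 x 4 x 4 block of high-resolution voxels
    x, y, z = (d // 4 for d in moved.shape)
    blocks = moved[:4 * x, :4 * y, :4 * z].reshape(x, 4, y, 4, z, 4)
    return blocks.mean(axis=(1, 3, 5))

vol_high = np.ones((16, 16, 16))                  # toy high-resolution volume
vol_low = apply_rigid_motion(vol_high, 0.0, [0.0, 0.0, 0.0])  # identity motion
```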
The PBS framework relies on a large number of real datasets acquired following a specific scanning protocol, such that the variability of the simulation parameters would be representative of the characteristics found in this protocol. For this purpose, we used the so-called functional reference battery (FRB) database, which was designed and acquired by the International Consortium for Brain Mapping (ICBM). Forty right-handed healthy volunteers (20 men, 20 women; age ranging from 20 to 38 years, median 25 years) participated in this study, which was approved by the local ethics committee. Subjects had no history of neurological or psychiatric disorders. Functional data were acquired while subjects performed five different tasks: auditory naming (AN); external ordering (EO), a working memory task; hand imitation (HA), a visuo-motor task; an oculomotor task (OM); and a verbal generation task (VG). The control condition used for all tasks was to fixate a small arrow at the centre of a screen and press a button when the arrow pointed to the left (very infrequently). An extensive description of the stimuli and tasks can be found on the internet.
For each functional run, 88 brain volumes of blood oxygenation level dependent signals were recorded on a 1.5T Siemens Sonatavision MRI scanner at the McConnell Brain Imaging Centre in Montreal, using a 2D echo-planar imaging sequence with the following parameters: TR/TE = 4 s/50 ms, 64 × 64 matrix, in-plane resolution 4 mm × 4 mm, 26 non-contiguous slices with a 1 mm slice gap, FOV = 256 mm × 256 mm, slice thickness 4 mm and flip angle = 90°. A high-resolution T1-weighted scan was also acquired using the following FLASH gradient echo sequence: TR/TE = 22 ms/9.2 ms, 192 × 256 × 256 matrix, FOV = 192 mm × 256 mm × 256 mm and flip angle = 30°. Subjects performed on-off block paradigms with 28-s (7-volume) blocks. There were 12 blocks per run (6 'off-on' cycles), 3 volumes at the beginning of the run to wait for magnetisation stabilisation, and one extra 'off' volume at the end.
In total, there were 200 fMRI datasets in the FRB database (40 subjects times 5 tasks). To generate the simulation databases, the parameters of data simulation were estimated for each dataset, and plugged into the simulation engines ADD and SM to derive one parametric bootstrap replication. The resulting simulation databases comprised 200 datasets each and were called FRB-ADD and FRB-SM respectively.
The following statistical features were investigated to evaluate the quality of the simulation databases. The first set of features aimed at quantifying the empirical signal-to-noise ratio. The same processing pipeline that was applied to estimate the parameters of the ADD model on the real FRB database was used again on the FRB-ADD and FRB-SM simulation databases. Maps of relative percentage of variance were derived for each component of the model. For example, the variance map of the slow time drifts was defined as the absolute difference between the variance of the functional data before and after temporal filtering, divided at each voxel by the total variance of the time series after motion correction and slice timing correction. The same procedure was applied for the physiological noise and the activation. The relative variance of the residuals was defined in a similar way, except that it simply involved the variance map of the residuals rather than a difference between two variance maps.
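The relative variance maps described above can be computed along the following lines; the toy drift component and the "temporal filtering" step below are stand-ins for the actual preprocessing pipeline:

```python
import numpy as np

def relative_variance_map(y_before, y_after, var_total):
    """Relative percentage of variance of one component: absolute difference
    of the time-series variance before and after removing the component,
    divided at each voxel by the total variance."""
    dv = np.abs(np.var(y_before, axis=0) - np.var(y_after, axis=0))
    return 100.0 * dv / var_total

rng = np.random.default_rng(5)
T, S = 88, 10
drift = np.linspace(-3.0, 3.0, T)[:, None] * np.ones((1, S))  # toy slow drift
y = drift + rng.normal(size=(T, S))
y_filtered = y - drift                # stand-in for the temporal filtering step
var_total = np.var(y, axis=0)
v_drift = relative_variance_map(y, y_filtered, var_total)
# the drift dominates this toy data, so v_drift should be well above 50%
```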
Another batch of measures was used to investigate the amount of correlation structure present in the data. First, local spatial autocorrelation maps were derived, i.e. the average correlation between the time series associated with each voxel and its 6 spatial neighbours, as well as the temporal (lag 1) autocorrelation maps. These maps were derived at the very beginning of the processing pipeline, i.e. on motion-corrected data, and also on the residuals of the linear model. To get an insight into correlations regardless of the spatial scale, the distribution of the variance across the first 60 components of a spatial principal component analysis (PCA), relative to the total variance explained by these components, was also derived.
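Two of these measures, the lag-1 temporal autocorrelation map and the PCA variance curve, can be sketched as follows (the spatial autocorrelation map is analogous, correlating each voxel's time series with those of its 6 neighbours); the AR(1) toy data is hypothetical:

```python
import numpy as np

def temporal_autocorr_lag1(Y):
    """Lag-1 temporal autocorrelation at each voxel of a T x S array."""
    a = Y[:-1] - Y[:-1].mean(axis=0)
    b = Y[1:] - Y[1:].mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

def pca_variance_curve(Y, k=60):
    """Relative variance of the first k components of a spatial PCA."""
    Yc = Y - Y.mean(axis=0)
    sv = np.linalg.svd(Yc, compute_uv=False)[:k]
    return sv ** 2 / (sv ** 2).sum()

rng = np.random.default_rng(6)
T, S = 200, 20
Y = np.zeros((T, S))
e = rng.normal(size=(T, S))
for t in range(1, T):                 # toy AR(1) time series, autocorr ~0.9
    Y[t] = 0.9 * Y[t - 1] + e[t]
ac = temporal_autocorr_lag1(Y)        # close to 0.9 at every voxel
curve = pca_variance_curve(Y, k=10)   # decreasing, sums to 1 over 10 components
```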
For all measures that took the form of a map, a group-level average was computed after non-linear coregistration in the MNI space. Axial slices of these averages were presented in the figures for qualitative evaluation. The corresponding slices were also generated on a structural average for anatomical localisation, see Figure C.5. For each individual map, some quantitative summaries were derived, i.e. the 1%, 25%, 50%, 75% and 99% percentiles within the brain. The summary at the group level was the mean of the individual summaries. Regarding the PCA variance curves, the group-level summary was also the average curve. The bootstrap test described in Section 2.2 was applied to assess any statistical departure between the group-level summaries derived from the FRB and FRB-ADD/FRB-SM databases. The text gives a concise report of the main results and a comprehensive description can be found in the tables.
On the real FRB database, the slow time drifts explained a large part of the variance (median: 14%), with the most important contributions on the edges of the brain and in the cerebro-spinal fluid (99% percentile: 67%) and the smallest contributions in the white matter, see Figure C.6a. The physiological noise explained a smaller part of the variance (median: 7%), yet it was prominent in the ventricles and the medial cerebral artery (99% percentile: 58%), see Figure C.6b. The activation component explained only a very small part of the variance (75% percentile: 3%), yet this component had a significant contribution in a localised set of regions (99% percentile: 22%), see Figure C.6c for task HA. The residuals explained the largest amount of variance (median: 72%), with the highest variance located in the white matter, an almost uniform contribution in the grey matter and the lowest contribution in the cerebro-spinal fluid, see Figure C.6d.
With the ADD simulations, the distribution of relative variances closely matched what was observed on the real FRB database for all components but the activation, see Figure C.6. For the upper percentiles of the distribution (50%, 75%, 99%), no significant difference could actually be detected between FRB and FRB-ADD for the slow time drifts, physiological noise and the residuals. A significant bias was still present for the 1% percentile of the distribution, possibly related to a limitation of the shrinkage correction applied to the spatial effects of the components of the model. Note that in the absence of a shrinkage correction, i.e. with the conventional least-squares estimates, a much larger systematic (upward) bias of about 5% in the variance of the slow time drifts and physiological noise components could be observed (results not shown), a known statistical property of least-squares estimates termed the Stein effect. The close replication of the variance of the residuals suggested that the approximation made in the estimation procedure of the ADD parameters (Equation A.3) was accurate, see Figure C.6d. The activation component had a downward bias of about 0.5% in median for the 99% percentile. This was likely due to the way the spatial effects were estimated for the activation component: an arbitrary peak threshold of p < 0.001 was applied to define the activated regions, which may have led to a biased estimate of the effects, and the evaluation indicated that it did. There was unfortunately no way to derive an unbiased threshold a priori for the activation component, as the effect size and the real activated regions were unknown. Shrinkage estimation would have been a way to avoid this problem, but it would have resulted in some very large activated regions which were not physiologically plausible. The bootstrap evaluation proved able to identify this shortcoming of the parameter estimation procedure.
Interestingly, on the FRB-SM database, a significant amount of variance was explained by the slow time drifts (median: 12%) even though the SM model did not explicitly include this component. The motion process induced a spatial distribution of drifts prominent on the edges of the brain and in the ventricles which shared some similarities with what was observed on the real database, see Figure C.6a. However, some clear departures could be observed, for example in the white matter. In quantitative terms, all percentiles of the slow time drifts variance significantly departed between FRB-SM and FRB. The size of the bias was still smaller than 5% of the real estimate for the upper percentiles (50%, 75%, 99%). Regarding the physiological noise, there was a clear downward bias, even though the bias was still less than 5% of the FRB estimate. This was probably a consequence of neglecting the partial-volume effects in the estimation procedure of the SM parameters. The partial-volume effects also slightly reduced the variance of task-induced fluctuations, see Figure C.6c. These multiple departures caused a biased variance for the residuals, but still in the range of 5% of the real estimates except for the 1% percentile.
As a whole, the evaluation performed on the FRB database demonstrated that the FRB-ADD database had a realistic distribution of the relative variance among the different components of the model, and that the FRB-SM database had only a small bias in its relative variance distribution. This bias could probably be neglected for most applications. As an interesting by-product, this experiment also demonstrated the accuracy of the approximation made to estimate the parameters of the ADD model, at least for the range of parameters present in the FRB database. The FRB-SM database also confirmed the hypothesis that motion alone could induce slow time drifts at a quite realistic level, without having to add any type of arbitrary fluctuations in the low-frequency band.
Percentiles of the spatial and temporal autocorrelation maps were reported in Table C.2. For real datasets, the motion-corrected data exhibited on average some positive spatial autocorrelation, prominent in the midline of the brain and in the grey matter, see Figure C.7a. This structure was almost unchanged on residual data, except for the tails (median: 0.075 and 0.073 for motion-corrected data and residuals, respectively), see Figure C.7c. Temporal autocorrelation maps behaved quite differently: on motion-corrected data, real datasets exhibited a positive median temporal autocorrelation of 0.12, prominent at the edges of the brain and in the grey matter, see Figure C.7b. This temporal autocorrelation was almost entirely removed along with the components of the model, leaving the residuals with a negative median temporal autocorrelation of −0.047, see Figure C.7d.
The ADD simulations qualitatively reproduced the distributions of temporal autocorrelations, with prominent values at the edges of the brain and in the gray matter for motion-corrected data, and negative temporal autocorrelation on the residuals, see Figure C.7b,d. Despite this qualitatively satisfying behaviour, there were slight but significant quantitative differences with the real distribution, see Table C.2. Regarding the spatial autocorrelations, there were marked discrepancies between real and ADD data: motion-corrected ADD data exhibited much smaller spatial autocorrelation than the real data (median: 0.012 and 0.075, respectively), see Figure C.7a, and the effect was drastic on residuals (median: 0.001 and 0.073, respectively), see Figure C.7b.
The spatial autocorrelation of motion-corrected SM data was closer to that observed on real datasets (median: 0.031 and 0.075, respectively). As on real data, the spatial autocorrelation on residuals was very close to the one derived on motion-corrected data (median: 0.031 and 0.028 for SM motion-corrected data and residuals, respectively). As a consequence, on residuals, the spatial autocorrelation of FRB-SM was much closer to FRB than that of FRB-ADD (median: 0.073, 0.001 and 0.028 for FRB, FRB-ADD and FRB-SM, respectively). Despite this improvement, there were still some significant differences between the spatial autocorrelations of FRB and FRB-SM, see Table C.2. The behaviour of the temporal autocorrelations was very similar between FRB-ADD and FRB-SM. The distribution on motion-corrected data was actually a bit closer to the real database for FRB-SM, with no significant difference detected on the 99% percentile (0.703 and 0.698 for FRB and FRB-SM, respectively).
Consistent with the findings on autocorrelation maps, there were significant differences for every component between the average PCA variance curves derived for FRB, FRB-ADD and FRB-SM, both on motion-corrected data and on residuals. On motion-corrected data, however, the relative size of the bias was small, and both FRB-ADD and FRB-SM gave a reasonable approximation of the real PCA variance curve, suggesting that the estimation procedure had achieved a good identification of the main sources of variance, see Figure C.8a. On residuals, the ADD simulations exhibited an almost flat PCA variance curve. By contrast, on real datasets, the distribution clearly departed from uniformity, which indicated the presence of remaining correlations in the data. The PCA variance of SM residuals clearly departed from the uniform distribution observed on ADD residuals and came closer to the one actually observed on real residuals, see Figure C.8b.
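A PCA variance curve of the kind compared here can be computed as the normalised eigenvalue spectrum of the time-by-voxel data matrix. The sketch below is an assumption about the form of the curve (the paper's averaging across runs and subjects is omitted): a flat curve indicates residuals close to white noise, while a steep curve reveals remaining structured correlations.

```python
import numpy as np

def pca_variance_curve(data_2d, n_comp=20):
    """Fraction of total variance captured by each principal component.

    data_2d: (time, voxels) array. Illustrative sketch only; the
    paper's exact PCA variance curve (averaged over a database) is
    not reproduced here.
    """
    d = data_2d - data_2d.mean(axis=0)
    s = np.linalg.svd(d, compute_uv=False)  # singular values, descending
    var = s ** 2
    return (var / var.sum())[:n_comp]
```

For pure white noise the leading component carries only a small fraction of variance, whereas adding a single strong spatio-temporal component makes the curve sharply peaked, mimicking the departure from uniformity seen on real residuals.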
In this paper, we proposed a bootstrap method for the generation and evaluation of simulation databases. This approach was applied to the FRB database with two simulation models, called ADD and SM. The ADD model was based on a linear data-generating process, while the SM model implemented a non-linear motion process.
The method proposed here to generate fMRI simulations is a straightforward application of parametric bootstrap. The evaluation of the database was an original approach based on a non-parametric bootstrap applied at the population level. The bootstrap is a flexible statistical tool that has great potential for developing new fMRI data analysis methods. From a hypothesis-testing perspective, the non-parametric bootstrap should be preferred to the parametric form because it alleviates potential issues regarding the validity of the parametric model. From a strict simulation perspective, the parametric bootstrap still comes with the critical advantage of providing control over the parameters of the model.
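The parametric-bootstrap principle (fit a generative model to real data, then simulate replicates from the fitted model) can be illustrated on a toy example. The AR(1) model below is purely illustrative; the paper's actual generative models (ADD, SM) are far richer.

```python
import numpy as np

def fit_ar1(y):
    """Estimate the AR(1) coefficient and innovation std of one series."""
    y = y - y.mean()
    rho = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    resid = y[1:] - rho * y[:-1]
    return rho, resid.std()

def parametric_bootstrap(y, n_rep, rng):
    """Generate n_rep surrogate series from the fitted AR(1) model.

    Toy illustration of the parametric bootstrap: parameters are
    estimated on the real series, then new data are drawn from the
    fitted model.
    """
    rho, sigma = fit_ar1(y)
    t = len(y)
    sims = np.zeros((n_rep, t))
    for r in range(n_rep):
        e = rng.normal(0.0, sigma, t)
        for k in range(1, t):
            sims[r, k] = rho * sims[r, k - 1] + e[k]
    return sims
```

The evaluation step of the paper then amounts to comparing statistical measures (here, for instance, the refitted AR coefficients) between the real series and the simulated replicates.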
The parametric bootstrap requires an estimation of the parameters of the model in a statistically sound way. This may become intractable if the simulation model gets too complex, which was the case for both the ADD and the SM models. An interesting feature of the evaluation procedure is that it detects any discrepancy between the real and simulation databases, whether that discrepancy originates from intrinsic limitations of the model or from limitations of the parameter estimation procedure. It was therefore possible to rely on various simplifications in the estimation procedure. Estimation may even rely on an "educated guess" for some parameters, i.e. values found in the literature, as is commonly done in the simulation community.
If the parameter estimation procedure is correct, the evaluation of the simulation database can be viewed as a diagnostic study of the model of the fMRI data-generating process. Such a diagnostic tool would have a wider application range than existing tools specifically tailored for linear models. This flexibility comes at the price of a large increase in computational cost, because the database needs to be fully replicated and reanalysed in the bootstrap approach. However, the purpose of the evaluation procedure was not to demonstrate that a given model of fMRI simulation would be able to perfectly reproduce any type of statistical measure on a real dataset; one may argue that such an analytical model probably does not exist anyway. Rather than simply testing for the presence of a significant deviation between the real and the simulation databases, the bootstrap evaluation method provided an estimation of the potential bias for a predefined list of statistical measures. This approach was designed to assess the strengths and limitations of a model from different perspectives, rather than to strictly demonstrate its validity.
The evaluation method was able to pinpoint some of the strengths and limitations of the two simulation models, which are recapitulated in Table C.3. The ADD model covered a realistic range of components, including slow time drifts and physiological noise, which have not been routinely considered so far in the literature. Generating a dataset with the ADD model was relatively fast: it took about 2.5 minutes on average5, including the time to read the parameters and then write and compress the results, for a total of about 8 hours to generate the FRB-ADD database. Such a computational cost would make the ADD model suitable for very computationally intensive evaluations that might involve thousands of simulations. This model achieved a very accurate replication of the relative variance of the components of the model. It was also able to reproduce the level of temporal autocorrelation of the real data with moderate bias. The ADD model failed, however, to generate a realistic level of spatial correlation, both in the raw data and in the residuals of the model, as demonstrated by the spatial autocorrelation maps and by the distribution of PCA variance.
The SM model gave control over the same components as ADD, except for the slow time drifts, and in addition gave access to the motion parameters. Rather than imposing an arbitrary model of slow time drifts as was done with ADD, the drifts were intrinsically generated as part of the motion process. The SM model was markedly more computationally demanding than ADD: each dataset took about 7.5 hours to generate, totaling roughly two months for the full FRB-SM database. This model had a small bias in the variance of the components of the model that could probably be neglected for most applications. It closely approximated the temporal autocorrelation of the fMRI time series, or at least performed as well as the ADD model. It also generated a significant amount of spatial correlation, both in the raw data and in the residuals of the model, yet this level of spatial correlation was still significantly smaller than the amount observed in real data. These features would favour the SM model over the ADD model whenever the accuracy of the simulations is the main priority.
Please note that the two simulation databases FRB-ADD and FRB-SM strictly reproduced the characteristics of the FRB database. In particular, the parameters were only representative of the 1.5 T scanner of the MNI at the time of scanning (sequence and hardware upgrades may impact the observed characteristics) and of the block-design experiments employed. Investigating the variability of simulation parameters across field strengths, scanning sites and experimental designs will be a major direction for future work.
There are at least three promising directions to expand the proposed simulation models and potentially fix some of their limitations. Rather than generating a single task-related large-scale network, other networks could be included, e.g. the default-mode network, by using an exploratory identification method [22,36].
The simulations could also be expanded to include more realistic models of large-scale network activity and of the associated response of brain metabolism and hemodynamics, such as the model of Riera et al. However, at this stage, and despite some promising results, estimating the parameters of such models is still challenging when a large number of regions is involved. In this case, the parameters of the model would have to be fixed in an arbitrary way by the investigator, yet a study based on linear models could still provide estimates of the variance of fMRI activity.
Another direction would be to include a realistic simulation engine, such as the one recently proposed by Drobnjak et al. Special attention would then need to be given to the residuals: the estimated residual variance in the current approach mixes the random measurement noise with, notably, some unmodelled generation artefacts. The variance related to the random (Rician) measurement noise could be more specifically estimated using regions of interest outside the brain, and the realistic engine would then take the scanner artefacts into account.
This work is part of a larger effort by a consortium named MIDAS6, formed by our group in Montreal, a group at FMRIB, Oxford, and a group at the University of Hawaii (formerly at the University of Pittsburgh). The aim of this consortium is to build a web interface that would make some pregenerated simulated databases accessible and allow users to request customised simulations. Multiple choices of simulation model and reference database will be made available. In particular, additional simulation databases will be generated to cover a wide range of scanning protocols. Public simulation databases would critically help standardise the benchmarking of new data analysis methods and would also grant the fMRI community at large access to some elaborate simulation models. The bootstrap method presented here provides a rigorous framework to generate and evaluate such fMRI simulation databases.
The authors would like to thank Habib Benali, Christophe Grova and Keith Worsley for fruitful discussions, and to acknowledge the work of Roger Woods, Tomas Paus and the International Consortium for Brain Mapping (ICBM) fMRI community in creating the FRB database. This research has been supported in part by CLUMEQ7, which is funded in part by NSERC (MRS), FQRNT, and McGill University. Pierre Bellec was supported partly by NIH (NIFTI) grant R01 MH67172-01 and partly by an FRSQ-Inserm fellowship.
The components of the ADD model were estimated iteratively, in the following order: βB and βD; βP, XP and . The notations and an outline of the estimation pipeline were described in Section 3, and the details of each stage of the pipeline are given below.
The first volumes, corresponding to approximately 10 seconds of acquisition, were discarded to allow the magnetisation to reach equilibrium. For each subject and each functional run, translation and rotation parameters were estimated on the gradient of each volume using MINC-TRACC, with the gradient of the volume acquired in the middle of the first functional run of the subject as a target reference. The estimated within-run rigid-body transformation was inverted and applied to each volume using sinc spatial interpolation. Raw functional data were then corrected for interslice acquisition time delay with the implementation found in SPM2, using sinc temporal interpolation. The resulting motion-corrected and slice-timing-corrected dataset was denoted Ym.
The baseline and slow time drifts were modelled using a discrete cosine basis:
where T is the number of volumes in the motion-corrected dataset Ym. The frequency associated with Xd(k) is k/(2T × TR) Hz, where TR is the repetition time in seconds. To model the slow time drifts in the [0, fc] band, k thus ranged from 0 to Kd, the integer division of 2T × TR by 1/fc. In our case, with TR = 4 s, T = 85 and fc = 0.01 Hz, the basis used to model the slow time drifts was composed of Kd + 1 = 7 discrete cosines. Note that the baseline Xb was the constant cosine Xd(k) for k = 0. The spatial distributions of the baseline B and slow time drifts D were estimated through a least-square fit of the slow time drifts on Ym. The drift-corrected dataset Yd was the motion-corrected dataset Ym minus the estimated contributions of the baseline B̂ = XBβB and the slow time drifts D̂ = XDβD.
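The drift basis described above can be constructed directly from the stated definitions. The sketch below uses the standard DCT-II sampling cos(πk(2t + 1)/(2T)), whose k-th column has frequency k/(2T × TR) Hz as in the text; the exact phase convention used by the authors is an assumption.

```python
import numpy as np

def drift_basis(T, TR, fc=0.01):
    """Discrete cosine basis spanning the [0, fc] Hz band.

    Cosine k has frequency k / (2 * T * TR) Hz; k ranges from 0 (the
    constant baseline Xb) up to Kd, the integer division of 2*T*TR
    by 1/fc, as in the text.
    """
    Kd = int(2 * T * TR * fc)
    t = np.arange(T)
    return np.column_stack(
        [np.cos(np.pi * k * (2 * t + 1) / (2 * T)) for k in range(Kd + 1)]
    )
```

With the values given in the text (T = 85, TR = 4 s, fc = 0.01 Hz), the basis indeed has Kd + 1 = 7 columns, the first of which is the constant baseline; the DCT-II columns are mutually orthogonal, which makes the least-square fit well conditioned.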
Spatial independent component analysis (ICA) can achieve blind identification of the physiological noise related to heart beat and respiration in fMRI data [41,42]. A spatial ICA of Yd into 60 components was estimated by optimising spatial independence with the InfoMax algorithm9, starting from an initial decomposition into the principal components with highest variance. A procedure called CORSICA was used to identify the ICA components related to physiological noise XP, along with the spatial distributions βP. This method finds components with a predominant influence in areas known a priori to be systematically corrupted by physiological noise. The physiology-corrected dataset Yp is the drift-corrected dataset Yd minus the estimated contribution of physiological noise P̂ = XPβP.
The local hemodynamic response XA to the task, of size T × 1, was assumed to be a boxcar function convolved with a canonical hemodynamic response function accounting for the metabolic and vascular consequences of neural activity, as implemented in fMRIstat10. The least-square estimate β̂A(υ) is in general different from zero for all voxels υ in the brain, while a given task is likely to engage only a subset of brain regions. The false-positive rate p(υ) associated with the null hypothesis βA(υ) = 0 at a voxel υ was therefore derived using t-statistics. The null hypothesis was rejected for p < 0.001, uncorrected for multiple comparisons. This led to the following estimate of the hemodynamic consequences of neural activity at voxel υ:
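The construction of the task regressor XA can be sketched as a boxcar convolved with a double-gamma hemodynamic response. The HRF parameters below are a common Glover-style choice, not necessarily those of fMRIstat, so they should be read as illustrative assumptions.

```python
import numpy as np
from math import gamma as gamma_fn

def canonical_hrf(TR, duration=32.0):
    """Double-gamma hemodynamic response sampled every TR seconds.

    Illustrative Glover-style shape; the exact parameters used by
    fMRIstat are not reproduced here.
    """
    t = np.arange(0.0, duration, TR)
    def g(a, b):
        return (t ** (a - 1) * np.exp(-t / b)) / (gamma_fn(a) * b ** a)
    h = g(6.0, 0.9) - 0.35 * g(12.0, 0.9)  # main response minus undershoot
    return h / h.max()

def task_regressor(T, TR, block_len):
    """Boxcar (alternating off/on blocks of block_len volumes)
    convolved with the canonical HRF, truncated to T volumes."""
    box = ((np.arange(T) // block_len) % 2).astype(float)
    return np.convolve(box, canonical_hrf(TR))[:T]
```

The resulting regressor rises a few seconds after each task block onset and shows a small post-stimulus undershoot, which is what the least-square fit of βA is matched against.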
The residuals of the model were Ê = Ym − B̂ − D̂ − P̂ − Â, and var(Ê(υ)) was the residual variance at voxel υ. Note that Â here refers to the least-square estimate, and not to the estimate derived using a threshold. In a full linear model, the estimate of the variance of the Gaussian error term σ2(υ) would simply be related to the variance of the residuals var(Ê(υ)) and the total number of covariates in the model:
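The correction relating residual variance to noise variance can be sketched as the standard degrees-of-freedom adjustment for a linear model, sigma2 = SS_res / (T − K); whether this matches the paper's equation (A.3) term for term is an assumption, since the equation itself is not reproduced in this excerpt.

```python
import numpy as np

def residual_noise_variance(residuals, n_covariates):
    """Degrees-of-freedom corrected noise variance at each voxel.

    residuals: (time, voxels) array of least-square residuals;
    n_covariates: total number K of covariates in the model.
    Standard linear-model correction, assumed to approximate the
    paper's equation (A.3).
    """
    T = residuals.shape[0]
    ss = (residuals ** 2).sum(axis=0)
    return ss / (T - n_covariates)
```

On simulated data with known noise level, this recovers the true error variance on average, which is the property the uncorrected residual variance lacks (it is biased downward by the fitted covariates).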
The differences between this estimation pipeline and the conventional linear model case are twofold. First, the contributions of the components of the model were not estimated jointly, but iteratively. This is only a minor difference, because the covariates of the different components are almost uncorrelated, so joint and iterative estimations are almost equivalent. Second, ICA was used to estimate both the spatial and temporal distributions of physiological noise in the brain, and there is, to our knowledge, no analytical derivation for estimating the variance of the residual noise in that case. We assumed that the estimate of equation (A.3) was still a fair approximation, and this assumption was checked when evaluating the ADD simulations.
The least-square estimator is known to be unbiased, meaning that its statistical expectation is the true value. Some biased estimators may nevertheless outperform the least-square estimator in terms of the accuracy of variance estimates. This phenomenon is clear in the following example: consider fitting a set of covariates on pure Gaussian noise. The variance explained by the model is strictly positive, so it is necessarily a biased estimator of the true variance of the model, which is zero; the larger the number of covariates, the larger the bias. This is an instance of a general phenomenon called the Stein effect. The so-called shrinkage estimators address this issue by shrinking the estimator, and eventually setting it to zero if it is too close to the level expected from noise alone. The respective variances of the components of the model can be seen as a measure of the signal-to-noise ratio. The accuracy of the variance estimation is consequently of primary interest in computer simulations, and we advocate that it should be favoured over unbiasedness.
A James-Stein shrinkage correction was applied to the components of the model involving more than two covariates, i.e. the slow time drifts and the physiological noise. To simplify notations, the correction is described below for D, but it was applied in the same fashion to P. Let the following correction factor be defined at every voxel υ:
Let the truncated factor be defined as max(0, ĈD(υ)). The James-Stein corrected estimate of βD(υ) was obtained by multiplying the least-square estimate by this truncated factor.
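The classic positive-part James-Stein correction can be sketched as follows. The correction factor C = 1 − (K − 2)σ²/‖β̂‖² is the textbook form; whether this is exactly the paper's ĈD(υ) is an assumption, since the equation is not reproduced in this excerpt.

```python
import numpy as np

def james_stein_shrink(beta_hat, sigma2):
    """Positive-part James-Stein shrinkage of one coefficient vector.

    beta_hat: (K,) least-square estimates at one voxel, K > 2;
    sigma2: variance of the estimation noise. The textbook correction
    factor C = 1 - (K - 2) * sigma2 / ||beta_hat||^2 is truncated at
    zero before rescaling the estimate (assumed form of C_D).
    """
    k = beta_hat.size
    c = 1.0 - (k - 2) * sigma2 / float(beta_hat @ beta_hat)
    return max(0.0, c) * beta_hat
```

When the least-square estimates are close to the level expected from noise alone, the factor goes negative and the truncation sets the component to zero, which is exactly the behaviour motivated in the text.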
The estimation of the SM parameters consisted in oversampling the low-resolution spatial distributions that had been estimated for the ADD model. The high-resolution spatial maps were derived by oversampling the low-resolution maps at a 1 × 1 × 1 mm3 resolution, on the same field-of-view as the native functional images, using a nearest-neighbour interpolation scheme. The high-resolution baseline was processed differently, because the baseline was the major source of contrast in the fMRI volumes; the high spatial frequencies of this component therefore had a critical impact on the simulated motion process.
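For integer oversampling factors, the nearest-neighbour scheme amounts to repeating each low-resolution voxel value over the corresponding high-resolution block. This sketch ignores the alignment to the native field of view handled in the actual pipeline.

```python
import numpy as np

def oversample_nn(vol, factor):
    """Nearest-neighbour oversampling of a 3D map by an integer
    factor per axis, e.g. factor=4 to go from 4 x 4 x 4 mm3 voxels
    to 1 x 1 x 1 mm3. Simplified sketch: no field-of-view handling."""
    hi = np.repeat(vol, factor, axis=0)
    hi = np.repeat(hi, factor, axis=1)
    return np.repeat(hi, factor, axis=2)
```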
The low-resolution baseline map was first coregistered to a T1 image of the same subject using MINC-TRACC. The T1 image was segmented into white matter, gray matter and cerebrospinal fluid using the CIVET11 pipeline, and partial-volume effect (PVE) maps were estimated at each voxel for these tissue types. The baseline and PVE images were resampled in the high-resolution space using a sinc spatial interpolation scheme. The PVE maps were converted into a crisp classification by selecting the dominant tissue type at each voxel. The median of the baseline map was calculated in the white matter and in the gray matter, as well as the maximal value inside the cerebrospinal fluid. The value of the high-resolution baseline at each voxel was then a weighted sum of these three statistics, the weights being the partial-volume effects of the corresponding tissue types.
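The final weighting step described above reduces to a voxel-wise linear combination of the three tissue statistics with the PVE maps as weights. The function and argument names below are illustrative, not from the paper.

```python
import numpy as np

def highres_baseline(pve_wm, pve_gm, pve_csf, med_wm, med_gm, max_csf):
    """High-resolution baseline as a PVE-weighted sum of tissue stats.

    pve_wm / pve_gm / pve_csf: partial-volume maps (same shape,
    summing to ~1 at each voxel); med_wm / med_gm: median of the
    low-resolution baseline in white and gray matter; max_csf:
    maximal baseline value inside CSF. Hypothetical names, sketching
    the weighting described in the text.
    """
    return pve_wm * med_wm + pve_gm * med_gm + pve_csf * max_csf
```

At a voxel entirely inside one tissue type, the baseline simply equals the corresponding statistic; partial-volume voxels get a smooth mixture, preserving the high spatial frequencies that drive the simulated motion process.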
1This process was actually restricted to a neighbourhood of the slice of interest to save computation time.
4A preliminary experiment was conducted in which the motion was simulated at the native, low resolution of 4 × 4 × 4 mm3 rather than at the high resolution of 1 × 1 × 1 mm3. This experiment demonstrated that the maximum of variance for drifts in FRB-SM could be as high as 2779% of the real FRB variance (results not shown), with consequently very unrealistic edge effects.
5All computation times reported here were for a single SUN dual Opteron 875 node, and represent strictly the time to generate a simulated dataset; the time to process the data and estimate the parameters was not included. The data processing was done on a cluster of 8 nodes (32 CPUs).