In this editorial, we outline a set of guidelines for the reporting of methods and results in functional magnetic resonance imaging studies and provide a checklist to assist authors in preparing manuscripts that meet these guidelines.
Students in the sciences are often taught that one goal of a scientific paper should be to allow other researchers to replicate the study. However, as scientific research becomes more complex, it is increasingly difficult to report all of the details necessary to allow exact replication. This lack of methodological details can also hinder understanding and assessment of reported results by reviewers and readers. In addition, precise specification of relevant methodological details is crucial to ensure that large-scale databases contain the metadata necessary for effective data mining and meta-analysis. Because neuroimaging is such a multidisciplinary science, papers may be written and read by physicists, physiologists, psychologists and statisticians, to name just a few possibilities. This poses a unique challenge, since it is important to give sufficient detail that any of these readers will be satisfied that they understand what was done in the study.
Our goal in the present editorial is to present some straightforward guidelines aimed at making fMRI papers more complete in their description of methodological details and results. We outline which details should be specified in any fMRI paper. Rather than specifying how a study should be done, we instead focus on what needs to be reported, regardless of how the study is done. In Appendix A, we present a more explicit checklist, which authors can use to ensure that their papers report all of the necessary information outlined in the guidelines. We realize that any guidelines must be responsive to changes in research methods over time and to differences in opinion regarding what should be reported. We are also anxious that this editorial not be seen as an exercise in dogma or governance but rather as a starting point for encouraging debate aimed towards a widely accepted and flexible set of guidelines. With this in mind, a Wiki-based web site has been established at http://www.fmrimethods.org/. At this site, researchers can debate and collaboratively edit the guidelines, which should ensure that they reflect current standards in the field rather than the opinion of a select set of researchers. The web site also presents an example of a complete methods section that follows the proposed guidelines, which can serve as a model for implementing them.
It is standard to provide basic demographic information about the participants in a study, but additional information is necessary to provide a full description. First, any inclusion and exclusion criteria beyond those implied in the demographics should be described (e.g., “Subjects reported no history of psychiatric or neurological disorders, and no current use of any psychoactive medications”). If the subject sample was recruited in a targeted manner, then the nature of the sampling strategy should be noted. In addition, it is important to note how many subjects were excluded from the study after the data were collected, and specifically why they were excluded.
When describing a psychological task used in fMRI, you should aim to provide sufficient detail that another experimenter could implement the task in a way that is functionally equivalent to the reported procedure; although this is often difficult even with papers from the experimental psychology literature, it is a worthy goal nonetheless. Many details matter; the psychophysicist may be concerned with the visual angle and luminance of the stimuli, whereas the economist may focus on details of how payment was determined. Writers must use their judgment to decide which details are important for a specific study, but a general rule is that it is better to include too much detail than too little.
The term “Talairach space” has become a potent source of confusion in neuroimaging, and researchers need to be careful when using it. A brain or atlas is in Talairach space if the anterior and posterior commissures are on the same horizontal line (the AC–PC line) and the midline plane contains this line. The fact that a brain is in Talairach space does not imply any particular brain shape or size, and in particular, does not mean that such a brain matches a particular template, such as the original Talairach atlas or the MNI/ICBM template. In fact, there are substantial differences between the original atlas described by Talairach and the MNI305 template that is most commonly used today (Brett et al., 2002; Devlin and Poldrack, 2007). In addition, even when the same template is used, different software packages can result in significant differences in the localization of specific structures in 3D space (Van Essen and Dierker, 2007). Therefore, reporting coordinates as being in “Talairach space” without more details is too generic to be useful. It is critical that you specify the atlas or template that you have matched to. You should also give the specific details of the spatial normalization method, including the type of transformation used, and what kind of image is being transformed. Imaging papers often label activations according to Brodmann areas; if you do this, be clear how the label was identified (nearest coordinate in Talairach daemon, cytoarchitectonic definitions from the SPM Anatomy toolbox, etc.). This issue is of particular importance for databasing efforts, which require the accurate mapping of data into a common space across datasets produced using different methods.
Regions of interest (ROIs) may be used either to extract estimates of evoked signals or to limit corrections for multiple tests to a subset of all voxels (Poldrack, 2007). In either case, it is essential that the paper describes how the ROIs were determined. It is particularly important that ROIs used for multiple test correction (often called “small volume correction”) are determined independently of the specific test on which the correction is performed, either using an orthogonal contrast or an independent scan. If ROIs are determined anatomically, then the rules for anatomical demarcation should be specified explicitly (e.g., “the inferior frontal gyrus pars triangularis was defined as the region bounded dorsally by the inferior frontal sulcus, ventrally by the lateral fissure, posteriorly by the ascending ramus of the lateral fissure and anteriorly by the horizontal ramus of the lateral fissure, as described by Petrides and Pandya, 2004”). If the ROIs are functionally defined, then the specific contrast used to define them should be specified. We recommend that researchers provide ROI definitions in some appropriate format in the Supplementary material of the paper.
While very powerful, fMRI analysis packages can produce results that are easily misinterpreted or, more problematic, have advanced features that can be misused. To ensure that you and your reader understand the model exactly, it is essential that the approach be described in detail. Although most fMRI studies now report analyses using the general linear model (GLM), there remain substantial differences in how these models are specified and estimated. To a great degree these differences can be captured by knowing which software package was used to perform the analysis, but there can be substantial variability within packages depending upon which options are chosen. Whenever possible, provide a rationale for the user-specified parameters of the software. Some of the important details that may vary even within a package include how the error covariance structure is modeled (e.g., temporal autocorrelation in fMRI timeseries, or correlation induced by repeated measures across subjects). Even within the framework of GLM-based analyses, there are many different approaches to building models. For task-related regressors, it is important to be clear about how the task was modeled (e.g., for a blocked design, was the model based on a boxcar or a series of impulses for each trial within a block?) and how the BOLD impulse response was modeled (e.g., a single or dual-gamma canonical hemodynamic response, or a finite impulse response basis set?). If other regressors such as motion parameters or behavioral covariates are included, these should also be described, as should any measures to orthogonalize these regressors. One increasingly common way to present GLM-based design matrices is as an image, which is available from most statistical packages. It is also important to describe how group effects, as opposed to those in individual subjects, were analyzed and, finally, what precise statistical tests formed the basis for inferences reported.
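To make the boxcar-versus-impulse distinction concrete, the following minimal sketch builds both kinds of task regressor for a blocked design and convolves each with a dual-gamma response. It is an illustration only: the repetition time, block timing, trial spacing and the gamma shape parameters (6 and 16, with an undershoot ratio of 1/6) are assumed values chosen for the example, not the defaults of any particular analysis package.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(tr, duration=32.0, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Dual-gamma response sampled every `tr` seconds, scaled to sum to 1.

    The shape parameters are illustrative assumptions, not package defaults.
    """
    samples = [gamma_pdf(i * tr, peak) - ratio * gamma_pdf(i * tr, undershoot)
               for i in range(int(duration / tr))]
    total = sum(samples)
    return [s / total for s in samples]

def convolve(signal, kernel):
    """Discrete convolution, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out

TR = 2.0               # seconds per scan (assumed)
N_SCANS = 60
BLOCK_ONSETS = [10, 40]  # block onsets in scans (assumed)
BLOCK_LENGTH = 10        # block length in scans (assumed)

# Model 1: a boxcar that is "on" for the whole block.
boxcar = [0.0] * N_SCANS
# Model 2: one impulse per trial, here one trial every second scan.
impulses = [0.0] * N_SCANS
for onset in BLOCK_ONSETS:
    for scan in range(onset, onset + BLOCK_LENGTH):
        boxcar[scan] = 1.0
    for scan in range(onset, onset + BLOCK_LENGTH, 2):
        impulses[scan] = 1.0

hrf = double_gamma_hrf(TR)
boxcar_regressor = convolve(boxcar, hrf)
impulse_regressor = convolve(impulses, hrf)

print("peak of boxcar regressor: ", round(max(boxcar_regressor), 3))
print("peak of impulse regressor:", round(max(impulse_regressor), 3))
```

The two regressors differ in both amplitude and shape even though they describe the same blocks, which is why the modeling choice needs to be stated explicitly.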
The comparisons that have been performed should be clearly specified in terms of which regressors were included in the contrast and be related to the hypotheses that these comparisons are meant to test.
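One way to make a contrast unambiguous is to state it as weights over named regressors. The sketch below assumes a hypothetical design with two task regressors and two motion regressors; the regressor names and parameter estimates are invented for illustration.

```python
# Hypothetical regressor names and parameter estimates for one voxel.
REGRESSORS = ["faces", "houses", "motion_x", "motion_y"]
BETAS = {"faces": 1.8, "houses": 1.1, "motion_x": 0.3, "motion_y": -0.2}

def contrast_vector(weights, regressors):
    """Expand {name: weight} into a full vector; unnamed regressors get 0."""
    unknown = set(weights) - set(regressors)
    if unknown:
        raise KeyError(f"unknown regressors in contrast: {sorted(unknown)}")
    return [weights.get(name, 0.0) for name in regressors]

def contrast_estimate(vector, betas, regressors):
    """Weighted sum of parameter estimates (c' * beta)."""
    return sum(w * betas[name] for w, name in zip(vector, regressors))

# "faces > houses": tests whether the response to faces exceeds that to houses.
faces_vs_houses = contrast_vector({"faces": 1.0, "houses": -1.0}, REGRESSORS)
estimate = contrast_estimate(faces_vs_houses, BETAS, REGRESSORS)

print(dict(zip(REGRESSORS, faces_vs_houses)))
print("contrast estimate:", round(estimate, 3))
```

Reporting the weights alongside the regressor names, and the hypothesis each contrast addresses, removes any doubt about which columns of the design matrix entered the comparison.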
The majority of published studies today use methods that are part of established software packages and have been described in methodological publications. However, it is not uncommon for a paper to present results using a method that has not been previously described in a methodological publication. In this case, it is critical that the method be described in algorithmic detail so that it can be reproduced by others. We encourage researchers to do this by making their code available with their publication as the most complete description of the procedure. It may also be useful to attach an appendix that describes the method, either mathematically or with pseudocode.
The best test of reproducibility is allowing others to directly reproduce the analysis on your own data. We strongly encourage researchers to make their raw data publicly available with their publication, e.g., via a central database or local web site.
Any empirical claim that is reported should be supported by a specific statistical test. While this may seem obvious, it is a principle that is often violated in the neuroimaging literature. Most commonly, this occurs when an author observes that activation is present in one comparison but absent in another comparison and concludes on this basis that there is a difference in the two effects; Henson (2005) referred to this as the “imager's fallacy” due to its prevalence in the literature. However, presence versus absence of a significant effect across two comparisons (e.g., groups) does not demonstrate a significant difference between the two; demonstrating that the two effects are different requires a direct statistical comparison of the effects. Likewise, claims about differences in activation across hemispheres or regions must be supported by a significant interaction. It is critical to note that identification of a significant regional response does not imply that this region is uniquely or more strongly involved in the process of interest compared to other regions, merely that, while the null hypothesis has been refuted in this region, it has not been so refuted elsewhere. Authors should try to avoid implying that their activated region is the only region involved in the task. If they do wish to directly assert that one or more other regions were not active, this assertion should be accompanied by effect sizes, confidence intervals, or Bayesian posterior probabilities for the effect.
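The fallacy can be illustrated with a small worked example. The sketch below uses invented effect estimates for two conditions in the same ten hypothetical subjects and compares each t statistic to the two-tailed critical value for 9 degrees of freedom at alpha = 0.05 (2.262). A paired design is assumed purely for simplicity; the same logic applies to group comparisons.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic for the null hypothesis that the mean of `values` is zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

T_CRIT = 2.262  # two-tailed critical t for alpha = 0.05 with df = 9

# Invented per-subject effect estimates for two conditions (same subjects).
cond_a = [0.9, 0.1, 0.6, -0.2, 0.8, 0.3, 0.7, 0.0, 0.5, 0.4]
cond_b = [0.5, -0.3, 0.6, 0.1, 0.2, -0.4, 0.9, 0.0, 0.7, -0.1]

t_a = one_sample_t(cond_a)  # condition A tested against zero
t_b = one_sample_t(cond_b)  # condition B tested against zero
# The direct comparison: a paired test on the within-subject differences.
t_diff = one_sample_t([a - b for a, b in zip(cond_a, cond_b)])

for label, t in [("A vs 0", t_a), ("B vs 0", t_b), ("A vs B", t_diff)]:
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{label}: t(9) = {t:.2f} ({verdict})")
```

With these numbers, condition A is significant and condition B is not, yet the direct test of A against B is also not significant, so the data do not support a claim that the two effects differ.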
fMRI provides an embarrassment of riches due to the high dimensionality of the data, but this comes with the cost of a high risk of type I error due to the very large number of concurrent statistical tests. Hence it is essential that authors specify the magnitude of the multiple testing problem and how this issue is dealt with. The severity of the problem is described by the number of voxels tested and smoothness of the data (the estimated smoothness, not applied smoothness, if reported by the software). Examples of specific approaches to multiple testing include voxel- or cluster-wise control of family-wise error, voxel-wise control of the false discovery rate, or formal heuristics which have been shown (in peer-reviewed publications) to control false positives in some objective manner. Be clear about the inferences that can be drawn from your approach. For example, if you have used an uncorrected threshold, then state clearly that your control of family-wise error is unquantified. Either corrected inferences alone, or both corrected and uncorrected inferences, should be reported and clearly labeled according to the type of correction. When cluster-based inference is used, this should be clearly noted and both the threshold used to create the clusters and the threshold for cluster size should be reported. Finally, while thresholds must account for the multiplicity of tests, we do encourage authors to make available unthresholded statistic and effect size images in order to display the whole range of effects in the data, including those that do not reach significance. These maps also make it easier to compare effect sizes across studies and increase the options for future meta-analyses.
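As a rough illustration of how the choice of correction changes what is reported, the sketch below applies three rules to the same ten invented p-values: an uncorrected threshold, Bonferroni control of family-wise error, and the Benjamini-Hochberg step-up procedure for the false discovery rate. These are two common procedures chosen here as examples; the guidelines above do not prescribe a particular method, and real voxel-wise analyses usually rely on the corrections implemented in the analysis package.

```python
def bonferroni(p_values, alpha=0.05):
    """Family-wise error control: test each p-value at alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """False discovery rate control via the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject

# Invented p-values standing in for ten tests.
p_values = [0.001, 0.008, 0.012, 0.04, 0.2, 0.6, 0.03, 0.9, 0.011, 0.5]

n_uncorrected = sum(p <= 0.05 for p in p_values)
n_bonferroni = sum(bonferroni(p_values))
n_fdr = sum(benjamini_hochberg(p_values))

print("uncorrected p <= 0.05:", n_uncorrected)  # 6
print("Bonferroni (FWE):     ", n_bonferroni)   # 1
print("Benjamini-Hochberg:   ", n_fdr)          # 4
```

The same data yield six, one, or four "significant" tests depending on the rule, which is why the type of correction and the number of tests must be stated with every reported inference.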
Effective presentation of fMRI results often involves figures with thresholded color-coded statistical maps or tables listing locations of significant activation. For figures, important details include the nature of the statistical map, the intensity and cluster size threshold used to create the image, the identity of the underlying anatomical image, and any additional operations that have been performed on the map (such as masking out particular regions). It is helpful to put these details in the caption. It is best to present statistical maps at the same threshold used in the results section, but if different thresholds are used for the figure and results text, then this must be clearly specified. For multi-contrast experiments, plots of effect size for each contrast (e.g., condition) in a given region of interest can be helpful, though it is important to indicate how the ROI was identified. Likewise, tables should include information about the nature of the statistical map and thresholding operations. At a minimum, a table should include the location of activation in stereotactic space (e.g., that of the maximum for voxel-level inference), statistics regarding the activation cluster (including maximum statistic value and size of the cluster), and anatomical labels. The means by which the anatomical labels are derived (e.g., an atlas or automated labeling method) should be clearly specified. We also recommend that tables or figures include some form of effect size measure (e.g., mean percent signal change and standard deviation) in order to allow future meta-analyses.
The question often arises as to how data should best be presented. There are many acceptable forms for presentation of fMRI results, from bar graphs to maximum intensity projections (‘glass brains’) to full color cortical surface renderings, and each has its rightful place. Our general recommendation is that the nature of the data presentation should follow from the hypotheses that are being tested. Thus, if hypotheses are being tested at the group level, it likely makes most sense to present group-averaged maps, whereas a study that is testing hypotheses about individual differences should present some representation of the data that makes these differences clear (e.g., scatterplots or boxplots).
There is a broad range of quality control measures that are applied in fMRI data acquisition and analysis, with no common set of measures or methods across laboratories. We encourage both the use and the detailed documentation of quality control measures in order to provide reviewers and readers with the best possible ability to estimate the presence of potential problems with the data or analysis. One particular measure that we recommend is the presentation (either in supplementary materials or in a downloadable online format) of the voxel mask used in the group data analysis, which demonstrates which voxels were included in the analysis. In our experience, examination of the mask can provide a quick way to determine the presence of a number of problems with the data. In recommending the presentation of data for the purposes of quality control, we follow the example of other fields, such as human genetics (Chanock et al., 2007) and gene microarrays (Shi et al., 2006).
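A group mask of this kind can be summarized in a few numbers. The sketch below assumes, for illustration only, that the group mask is the intersection of per-subject brain masks (a common choice, but not one specified above) and represents each mask as a flat list of 0/1 values over the same voxels; the toy masks are invented.

```python
def group_mask(subject_masks):
    """Voxels present in every subject's mask (intersection; an assumption)."""
    n_voxels = len(subject_masks[0])
    if any(len(mask) != n_voxels for mask in subject_masks):
        raise ValueError("all masks must cover the same voxel grid")
    return [all(mask[v] for mask in subject_masks) for v in range(n_voxels)]

def union_mask(subject_masks):
    """Voxels present in at least one subject's mask."""
    return [any(voxel) for voxel in zip(*subject_masks)]

# Toy masks for three hypothetical subjects over eight voxels.
masks = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 1, 0],
]

included = group_mask(masks)
n_included = sum(included)
n_any = sum(union_mask(masks))

print("voxels in group analysis:", n_included)
print("voxels in any subject:   ", n_any)
print("fraction retained:       ", round(n_included / n_any, 2))
```

A low retained fraction, or a group mask missing regions of interest, is the sort of problem that inspecting the mask can reveal quickly.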
Instituting a more consistent and coherent policy for the reporting of fMRI methods should ensure that reviewers and readers of publications have the greatest possible ability to understand and potentially reconstruct the methods employed in the study. Furthermore, we believe that the generally accessible web page may help promote a broadly collaborative approach to defining and refining these guidelines and, in so doing, may promote their wider acceptance. We realize that this could result in published papers that are longer, but the costs of such lengthening should be outweighed by a more effective literature, and the ability to publish online supplementary materials in many journals also facilitates the presentation of more extensive methodological details without lengthening the main text.
Many other areas of bioscience are currently undergoing similar debates regarding minimal information standards for methodological reporting, such as the MIAME guidelines for microarray research (Brazma et al., 2001) and the CONSORT guidelines for clinical trials (Begg et al., 1996), and we hope that the fMRI community will join us in working towards a community standard for fMRI methods reporting. In some areas (e.g., clinical trials), checklists like the one in our Appendix A are required to be completed for submission of papers. We hope neuroimaging journals will consider this requirement.
This document is derived from a set of guidelines for presenting neuroimaging analyses originally developed by Tom Nichols, with contributions to the discussion from the following people: Max Gunther, Karsten Specht, Kent Kiehl, Mauro Pesenti, Jesper Andersson, Iain Johnstone, Robert Welsh, Dara Ghahremani, Alexa Morcom, Lena Katz, Jack Kelly, Cyril Pernet and Alex Shackman. Thanks to Joe Devlin, Dara Ghahremani, Karl Friston, Jeanette Mumford, and Jack Van Horn for helpful comments on an earlier version, Nelson Freimer for useful discussion, and Marisa Geoghegan for assistance with manuscript preparation.