Over the past year, a heated discussion about ‘circular' or ‘nonindependent' analysis in brain imaging has emerged in the literature. An analysis is circular (or nonindependent) if it is based on data that were selected for showing the effect of interest or a related effect. The authors of this paper are researchers who have contributed to the discussion and span a range of viewpoints. To clarify points of agreement and disagreement in the community, we collaboratively assembled a series of questions on circularity herein, to which we provide our individual current answers in ≤100 words per question. Although divergent views remain on some of the questions, there is also a substantial convergence of opinion, which we have summarized in a consensus box. The box provides the best current answers that the five authors could agree upon.
Brain imaging produces very large data sets of brain activity measurements. However, the neuroscientific conclusions in papers are typically based on a small subset of data. The necessary selection—unless carefully accounted for in the analysis—can bias and invalidate statistical results (Vul et al, 2009a; Kriegeskorte et al, 2009).
The large number of brain locations measured in parallel allows us to discover brain regions with particular functional properties. However, the more we search a noisy data set for active locations, the more likely we are to find spurious effects by chance. This complicates statistical inference and decreases our sensitivity to true brain activation. In functional magnetic resonance imaging, the goal is typically twofold: (1) to identify voxels that contain a particular effect and (2) to estimate the size of the effect, typically within a region of interest. Whether widely used analyses meet the resulting statistical challenges has been hotly debated in the past year.
Let us consider the first goal: finding brain regions that contain a particular effect. For example, we may wish to answer questions such as: Which voxels respond more to faces than houses? Or, in which voxels does the face-house contrast correlate with IQ across subjects? The use of many null-hypothesis tests across brain locations presents a multiple testing problem: the more voxels that are tested, the greater the family-wise error rate (FWE), i.e., the probability that one or more voxels will pass the significance threshold by chance even when there are no true effects (false-alarm voxels). A number of statistical methods have been developed to control the FWE (for a review, see Nichols and Hayasaka, 2003).
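Under the usual assumption that the voxel-wise tests are independent and no true effects exist, the growth of the FWE with the number of tests can be sketched in a few lines (the voxel counts here are arbitrary; real whole-brain analyses involve tens of thousands of voxels):

```python
# A minimal sketch of why mass voxel-wise testing inflates the family-wise
# error rate (FWE). Assume every voxel is truly inactive and the tests are
# independent; each voxel passes an uncorrected 0.05 threshold by chance alone.
def family_wise_error(n_voxels, alpha=0.05):
    """P(one or more false-alarm voxels among n_voxels independent null tests)."""
    return 1 - (1 - alpha) ** n_voxels

for n in (1, 10, 100, 1000):
    print(f"{n:>5} voxels tested -> FWE = {family_wise_error(n):.3f}")
```

With 100 independent null voxels tested at an uncorrected 0.05 threshold, one or more false alarms become nearly certain.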
The Bonferroni method makes the significance threshold for each voxel more stringent to ensure that the FWE does not exceed, say, 0.05. However, because Bonferroni's method does not account for image smoothness, it is overly conservative and not optimally sensitive. Random field theory methods (Worsley et al, 1992; Friston et al, 1994) adjust for spatial correlation between voxels to achieve greater sensitivity (i.e., power—the probability that a truly active voxel will be identified as such). Whereas voxel-wise methods detect individual voxels, cluster-wise methods (Poline and Mazoyer, 1993) report as significant those clusters (contiguous sets of voxels that all exceed a primary threshold) that are larger than a predetermined cluster-size threshold (chosen to ensure a 5% FWE for clusters).
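The Bonferroni adjustment itself is a one-line computation; a minimal sketch (the p-values below are hypothetical and the voxel count is tiny for illustration):

```python
# Bonferroni correction: divide the desired FWE level by the number of tests.
alpha = 0.05
p_values = [0.00001, 0.004, 0.012, 0.3]  # hypothetical per-voxel p-values
m = len(p_values)

# By the union bound, P(any false alarm) <= m * (alpha / m) = alpha,
# regardless of the data -- which is also why the method ignores smoothness
# and ends up conservative when neighboring voxels are correlated.
bonferroni_threshold = alpha / m
significant_voxels = [i for i, p in enumerate(p_values) if p < bonferroni_threshold]
```

Here only voxels whose p-values fall below 0.05/4 = 0.0125 survive the correction.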
Instead of limiting the probability of any false alarms (i.e., the FWE), false-discovery rate methods (Genovese et al, 2002) limit the average proportion of false alarms among the voxels identified as significant. This approach promises greater sensitivity when there are effects in many voxels. When used appropriately, these methods solve the multiple testing problem and ensure that we are unlikely to mistake an inactive region for an active region.
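The standard way to control the false-discovery rate is the Benjamini-Hochberg step-up procedure; a minimal sketch (assuming independent or positively dependent tests):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the indices of tests
    declared significant while controlling the expected proportion of
    false alarms among the discoveries at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    # Find the largest rank k with p_(k) <= (k / m) * q ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    # ... and reject the k_max smallest p-values.
    return sorted(order[:k_max])
```

Because the threshold rises with the rank of the p-value, the procedure becomes more permissive when many voxels show effects, which is the source of its sensitivity advantage.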
The second goal is estimating the size of the effect. For example, we may wish to answer questions such as: How strongly do these voxels respond to faces? Or, how highly does the activation contrast in this region correlate with IQ across subjects? Unfortunately, we cannot accurately address such questions by simply analyzing the selected voxels without worrying about the selection process. The effect-size statistics need to be independent of the selection criterion; otherwise the results will be affected by ‘selection bias.' For intuition, imagine the data were pure noise. If we select voxels by some criterion, those voxels are going to better conform to that criterion than expected by chance (for randomly selected voxels). Even if the selected voxels truly contain the effect of interest, the noise in the data will typically have pushed some voxels into the selected set and some others out of it, thus inflating the apparent effect in the selected set.
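The pure-noise intuition can be simulated directly; a minimal sketch (the voxel and trial counts are arbitrary choices for illustration):

```python
import random

random.seed(0)  # reproducible noise
n_voxels, n_trials = 1000, 20

# Pure noise: every voxel's true effect is exactly zero; each voxel's
# effect estimate is the mean of its noisy trials.
voxel_means = [sum(random.gauss(0, 1) for _ in range(n_trials)) / n_trials
               for _ in range(n_voxels)]

# Circular analysis: select the top 5% of voxels by apparent effect size,
# then report the mean effect within the selected set.
selected = sorted(voxel_means, reverse=True)[: n_voxels // 20]
circular_estimate = sum(selected) / len(selected)

# Ground truth for comparison: the mean over all voxels is near zero.
overall_mean = sum(voxel_means) / n_voxels
```

Although the data contain no signal at all, the circular estimate comes out strongly positive, whereas the unselected mean stays near zero: the selection criterion has harvested the noise.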
This problem has long been well understood in theory, but it is not always handled correctly in practice. Variants of bias due to selection among noisy effect estimates affect many parts of science. Just as voxels are selected for inclusion in an ROI, studies are selected for publication in scientific journals (Ioannidis, 2005, 2008). In either case, the selection criterion is effect strength, and effect estimates are inflated as a result.
Vul et al (2009a) suggested that cross-subject correlation neuroimaging studies in social neuroscience are affected by ‘nonindependence' (see also Vul and Kanwisher, 2010). Kriegeskorte et al (2009) discussed the problem of ‘circularity' more generally as a challenge to systems neuroscience.
These authors argued that effect estimates and tests based on selected data need to be independent of the selection process, and that this can be ensured by using independent data for selection (e.g., using half of the data to select signal-carrying voxels, and the other half to estimate the signal) or by using inherently independent functional or anatomic selection criteria.
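The split-half remedy can be sketched in a small simulation (voxel and trial counts are arbitrary, and simple trial means stand in for proper test statistics):

```python
import random

random.seed(1)
n_voxels, n_trials = 500, 40

def mean(xs):
    return sum(xs) / len(xs)

# Pure noise again: any apparent effect is chance, so an unbiased
# estimate should come out near zero.
data = [[random.gauss(0, 1) for _ in range(n_trials)] for _ in range(n_voxels)]

# Split each voxel's trials into two independent halves.
half1 = [mean(v[: n_trials // 2]) for v in data]
half2 = [mean(v[n_trials // 2:]) for v in data]

# Select the 25 apparently most active voxels using half 1 only ...
top = sorted(range(n_voxels), key=lambda i: half1[i], reverse=True)[:25]

# ... then estimate their effect circularly (same half used for selection)
# and independently (held-out half).
circular_estimate = mean([half1[i] for i in top])    # inflated by selection
independent_estimate = mean([half2[i] for i in top])  # unbiased
```

The estimate from the half used for selection is strongly inflated, whereas the estimate from the held-out half hovers around the true value of zero, because the noise in the two halves is independent.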
Although there is little controversy about the basic mechanism of selection bias, the 2009 papers have sparked a debate about exactly which analysis practices are affected and to what degree (Diener, 2009; Nichols and Poline, 2009; Yarkoni, 2009; Lieberman et al, 2009; Lazar, 2009; Lindquist and Gelman, 2009; Barrett, 2009; Vul et al, 2009b; Poldrack and Mumford, 2009). Herein, we collaboratively assembled and then individually answered a series of questions on circular analysis to clarify points of agreement and disagreement. Each answer is ≤100 words. We hope to contribute to a convergence within the community toward statistical practices that ensure that systems and cognitive neuroscience remain solidly grounded in empirical truth.
The authors declare no conflict of interest.