The main purpose of this article is to advocate for detailed descriptions of what is meant by visual inspection of IC spatial maps and/or time courses, in articles that use this term. Such descriptions are needed to facilitate the exploration of visual inspection as part of fMRI data denoising procedures. The level of detail in such descriptions must be sufficient to allow reproduction by independent investigators, meaningful comparison of findings, and scientific discussion of the potential uses for and advantages of one method of visual inspection over another. We provide an example of a description of an operationalized denoising procedure that includes visual inspection of ICs, which we believe is detailed enough to meet this requirement. The procedure involves a novel combination of methods and conceptual framework that allows for the possibility that some ICs might represent a combination of noise and neural signals of interest, rather than representing exclusively one category or the other. Two independent trained raters achieved 96% (κ = 0.91) agreement using this procedure; and testing the procedure on a separate, task-related dataset resulted in statistically significant (18% increase in z-score) and visually noticeable improvements in sensitivity for detecting responses in brain areas relevant to the experimental paradigm.
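The inter-rater agreement statistics quoted above (percent agreement alongside Cohen's κ) can be made concrete with a small sketch. The label counts below are hypothetical, chosen only to illustrate how κ corrects raw agreement for chance; they are not the study's actual rating data.

```python
# Illustrative sketch: how percent agreement and Cohen's kappa relate for
# two raters labeling the same set of ICs. Counts are hypothetical.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    pe = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return po, (po - pe) / (1 - pe)

# Hypothetical example: 100 ICs labeled "noise" vs "signal", 4 disagreements.
rater1 = ["noise"] * 60 + ["signal"] * 40
rater2 = ["noise"] * 56 + ["signal"] * 44
po, kappa = cohens_kappa(rater1, rater2)
print(f"agreement = {po:.0%}, kappa = {kappa:.2f}")
# → agreement = 96%, kappa = 0.92
```

Note that κ depends on the raters' marginal label frequencies as well as on raw agreement, which is why the paper reports both numbers.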
Our goal was to define a data-driven VIID procedure that combines elements from previously described ICA-based denoising approaches and can be performed with a minimum of training and labeling time, using widely available brain imaging software packages. The procedure focuses primarily on the appearance of IC spatial maps because we found that IC spatial maps are often highly suggestive of artifacts or of neural activity corresponding to known brain functions (constellations of brain areas used for particular brain functions). The use of peripheral “activation” to identify artifacts was suggested in McKeown et al. (1998) and implemented in Tohka et al. (2008). Considering diffuse, speckled (or spotty) patterns as indications of noise was suggested in McKeown et al. (1998); and conversely, considering tightly clustered, not diffusely spread spatial patterns as indications of neural signal was discussed in Sui et al. (2009). Utilization of information concerning what portion of spatial maps lies in CSF, GM, and WM for automated methods of IC labeling was described in Stevens et al. (2007) and Sui et al. (2009). Using “activation” in the superior sagittal sinus (Criterion D) as an indication of artifacts is similar to the focus on vasculature in Zou et al. (2009). Such activation has been hypothesized to be due to breathing-related changes in central venous pressure (Windischberger et al., 2002), and has been found to correlate with the cardiac cycle (Dagli et al., 1999). The sinus co-activation criterion was secondary because we noticed that such activation was sometimes present in the spatial maps of components that in all other respects seemed to reflect neural activity. Only three temporal aspects of ICs were considered in the VIID procedure, to be applied as secondary criteria when labeling based on spatial characteristics was inconclusive. The high-frequencies criterion (A) is exactly as implemented in Greicius et al. (2004); the “spikes” criterion (B) is similar to criteria suggested in McKeown et al. (1998) and implemented in Tohka et al. (2008); and we included the saw-tooth pattern criterion (C), similar to the “quasiperiodic” pattern suggested in McKeown et al. (1998), because we hypothesized that saw-tooth temporal patterns are a sign of aliasing of cardiac and/or respiratory signals whose frequencies exceed the Nyquist frequency for our experiment (0.25 Hz) (Huettel et al., 2004a; Beckmann et al., 2005). The main reason for focusing only secondarily on IC temporal characteristics was that the frequency ranges of artifactual and neural activity sometimes overlap: some artifacts manifest at frequencies typically dominated by neural activity (Beckmann et al., 2005; Birn et al., 2006), and some neural activity occurs at relatively higher frequencies, as illustrated by EEG data (Luck, 2005). Such overlap in frequencies complicates the determination of how much of a component's variance is due to artifacts vs. neural signal. In addition, we found that some of the methods for labeling ICs based on temporal characteristics required programs that were not available in the FSL software package.
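The aliasing hypothesis behind the saw-tooth criterion can be sketched numerically. A Nyquist frequency of 0.25 Hz implies a sampling rate of 0.5 Hz (TR = 2 s); the cardiac and respiratory rates below are typical physiological values chosen for illustration, not measurements from the study.

```python
# Sketch: where a physiological oscillation appears after aliasing, given
# a Nyquist frequency of 0.25 Hz (sampling rate 0.5 Hz, TR = 2 s).
# The example rates are typical values, used only for illustration.

def aliased_frequency(f, fs):
    """Apparent frequency (Hz) of a sinusoid at f Hz sampled at fs Hz."""
    f = f % fs            # fold into one sampling period
    return min(f, fs - f) # reflect into the observable band [0, fs/2]

fs = 0.5  # sampling rate in Hz (TR = 2 s, Nyquist = fs / 2 = 0.25 Hz)
for label, f in [("respiration ~0.3 Hz", 0.3), ("cardiac ~1.1 Hz", 1.1)]:
    print(f"{label}: appears at {aliased_frequency(f, fs):.2f} Hz")
```

A cardiac rhythm near 1.1 Hz thus folds down to roughly 0.1 Hz, squarely inside the band where neural signal is expected, which is consistent with the caution above about labeling ICs on temporal characteristics alone.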
We introduced a two-pass system because individual ICs and group ICs each offer advantages over the other: individual ICs allow comparison with a more precise segmentation of high-resolution images, while group ICs may reflect signals that might only be revealed after combining data from all study participants. In some cases, the group ICAs might not be necessary, depending upon how much of the variance from artifacts is removed in the first pass of denoising, as illustrated in the denoising of our task-related data. We performed intensity normalization between the first and second passes because the (spatial) ICA constraint of spatial independence reduces the likelihood that ICA would detect components affecting most of the brain (Thomas et al., 2002), so some form of global brain signal removal may enhance ICA-based denoising, as was the case with denoising of our task-related data. We did not regress out global brain signal (e.g., CSF, WM, GM, or whole-brain signal) as in Fox et al. (2005) because we were concerned that in cases where the data variance is dominated by neural signals of interest, regressing out global brain signals might remove much of the neural signal (Petersson et al., 1999; Birn et al., 2006), more so than would be the case with intensity normalization; however, for “noisy” data, regressing out global brain signals should be an acceptable alternative to intensity normalization.
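For readers unfamiliar with the Fox et al.-style alternative discussed above, the core regression step can be sketched on synthetic data. This is only an illustration of global-signal regression as such, on a voxels × timepoints array; it is not the pipeline used in this study, and real implementations operate on 4D images with proper masking.

```python
import numpy as np

# Minimal sketch of global-signal regression on synthetic data:
# remove the whole-brain mean time course from every voxel by least squares.
rng = np.random.default_rng(0)
n_voxels, n_timepoints = 200, 100
data = rng.standard_normal((n_voxels, n_timepoints))
data += 2.0 * np.sin(np.linspace(0, 10, n_timepoints))  # shared "global" drift

g = data.mean(axis=0)                               # global (mean) time course
g = (g - g.mean()) / np.linalg.norm(g - g.mean())   # demean, unit norm
beta = data @ g                                     # per-voxel regression weight
cleaned = data - np.outer(beta, g)                  # residuals after removing g

# After regression, every voxel's time series is orthogonal to the
# global regressor (the projection onto g is near machine zero).
print(np.abs(cleaned @ g).max())
```

The concern raised in the text is visible here: any neural signal correlated with `g` is removed along with the artifact, which is why the authors preferred intensity normalization when neural signals of interest dominate the data variance.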
The rationale for further exploration of VIID procedures includes the following. 1) The current study and others have demonstrated the potential improvement in fMRI data analysis sensitivity that may be obtained by adding ICA-based denoising to conventional fMRI data preprocessing. 2) Procedures for manual labeling need to be defined and validated, in order for validation of automated methods based on manual methods to be meaningful. 3) Improved methods for IC labeling with visual inspection may model enhancements to IC labeling with automated methods. 4) Perhaps the most compelling reason for exploring IC labeling with visual inspection is its presumed accuracy, despite potential limitations regarding rater expertise, time expended by raters, and rater subjectivity. The situation is analogous to that with visual inspection for determination of regions of interest (ROIs) in fMRI and other brain imaging studies. The amount of training and time required for manual drawing of ROIs can be prohibitive. According to one estimate, at least one month of training is required, followed by hours to weeks to manually label the anatomy of a single brain (Klein et al., 2009). However, in spite of such costs and the availability of automated methods for drawing ROIs, some studies still utilize manual drawing of ROIs (McKeown and Hanlon, 2004; Pereira et al., 2007; Wager et al., 2008), presumably because of anticipated improvement in drawing accuracy with manual methods (Huettel et al., 2004c). Although human subjectivity can decrease reliability for manual drawing of ROIs or visual inspection of ICs, this decrease would be inconsequential if it could be demonstrated that the overall accuracy for manual methods is greater than that for automated methods. Also, a potential selection bias due to the human element can frequently be neutralized by blinding the rater with respect to knowledge of subject and/or scanning session characteristics.
We believe that the demands of time for training and performance of IC labeling with visual inspection are small enough that visual inspection may be preferred over automated methods in many applications involving ICA-based denoising. Raters performing IC labeling should have some knowledge of the general locations of brain CSF, WM, and GM, but detailed knowledge of comparative brain anatomy, as required for drawing ROIs, is not necessary. The results from our inter-rater reliability study indicate that training of raters need not take more than a few hours, and once raters are trained, approximately 5–10 minutes (possibly fewer) are required per subject, depending upon the number of ICs generated per subject and the rater's experience with the visual inspection labeling procedure. Most components can be classified quickly through pattern recognition. The process of visual inspection can also save time by serving a dual role in facilitating fMRI quality assurance (Huettel et al., 2004b) through identifying artifacts, including those related to scanner malfunction.
Our example description of fMRI data denoising through visual inspection of ICs is not intended as an optimal, finished proposal, but as one of many conceivable proposals, and a starting point for discussion of what elements should be included in a VIID procedure. In addition to the minor modifications to the procedure that we have indicated, many potential enhancements can be considered. To provide three examples: 1) Methods involving task-related information (if available) can be added to the VIID procedure (McKeown, 2000; Calhoun et al., 2001a; Thomas et al., 2002; Moritz et al., 2003; Kochiyama et al., 2005; McKeown et al., 2005). 2) Combining the proposed visual inspection method with an automated component clustering method such as Partner-Matching (PM) (Wang and Peterson, 2008) might save time and mental effort in the labeling process by organizing components across subjects or sessions into groups according to spatial similarity. One could begin manual labeling on clusters of homologous ICs identified by PM as being highly reliable (most significantly reproducible), representing artifacts or neural activity that are common across subjects or sessions. Then, one could visually inspect the remaining clusters of components that are less significantly reproducible across subjects or sessions, which may represent components of artifacts or neural activity that exist only in some individuals. Conversely, the proposed visual inspection method may be applied to verification of the automated labeling of highly reliable clusters of similar components identified by PM, to explore the possibility that such components, spatial similarities notwithstanding, might appear to differ somewhat in degree of contribution from artifactual vs. neural signals. 3) We have observed that ICs that are reproducible from one run to the next or that correlate well with an experimental task appear to maintain the characteristic form of their spatial maps regardless of what z-score is used for thresholding. In contrast, the spatial maps of some other components dwindle in size and disappear quickly as thresholding z-scores are increased. We speculate that this property might be useful in distinguishing components representing predominantly neural signals from those representing predominantly random noise. Automated methods could be used to generate statistics such as the slope of the log of the number of thresholded voxels as a function of thresholding z-score. Such statistics could then be incorporated into a visual inspection procedure.
This example illustrates the general principle that automated methods can be used to enhance visual inspection by providing the rater with precise statistics that would otherwise be time-consuming or impossible for the rater to approximate.
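The thresholding-stability statistic proposed in example 3 could be computed along the following lines. The sketch uses synthetic one-dimensional "z-maps" in place of real IC spatial maps, and the function name and thresholds are our own illustrative choices rather than part of the described procedure.

```python
import numpy as np

# Sketch of the proposed statistic: the slope of log(number of
# suprathreshold voxels) as a function of the z-score threshold.
def log_count_slope(zmap, thresholds):
    counts = np.array([(zmap > t).sum() for t in thresholds], dtype=float)
    keep = counts > 0  # log is undefined once no voxels survive
    # Least-squares slope of log(count) vs threshold.
    slope, _ = np.polyfit(thresholds[keep], np.log(counts[keep]), 1)
    return slope

rng = np.random.default_rng(1)
thresholds = np.arange(2.0, 5.0, 0.5)

noise_map = rng.standard_normal(50_000)   # pure Gaussian noise
signal_map = noise_map.copy()
signal_map[:2_000] += 4.0                 # a "cluster" of strong signal

# Noise-only maps lose suprathreshold voxels rapidly as the threshold
# rises (steep negative slope); maps containing a genuine cluster decay
# much more slowly (shallow negative slope).
print(log_count_slope(noise_map, thresholds))
print(log_count_slope(signal_map, thresholds))
```

Presenting such a slope to the rater alongside each spatial map would be one way of realizing the automation-assisted inspection described here.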
We envision that the development of VIID methods might proceed as follows. Potentially useful elements that may be incorporated in visual inspection procedures would be described and motivated. Based on face validity considerations, one or more VIID procedures would be selected for evaluation of reliability and validity. Testing of inter-rater reliability would be straightforward. However, validity testing would be complicated by the absence of a definitive standard for what constitutes a “correct” IC. With manual drawing of ROIs, the structures shown in brain images can ultimately be compared to those seen in cadavers, and well-established comparative neuroanatomy considerations can help determine the accuracy of ROI drawings. With ICA-based denoising, no physical entity corresponding to an IC can be examined. Spatially independent components are an abstraction of the fMRI data that does not take into account non-linear effects and that is based upon the assumption that the complexities of brain function can be modeled simply, with no more than a few dozen ICs that are assumed to reflect perfectly temporally synchronized brain activity (McKeown and Sejnowski, 1998; Thomas et al., 2002). Thus, validation efforts cannot determine the “correctness” with which ICs reflect brain function. Instead, VIID validation must proceed by circuitous routes. For example, artificial signal can be added to fMRI data to gain insight into how such additions might affect the generated set of ICs. Experiments of this kind might elucidate the conditions under which an IC might represent a synthesis of neural signal, structured noise, and/or random noise.
Ultimately, the most important indicator of the success of a VIID procedure is the improvement in functional signal-to-noise ratio that results from its application. A variety of methods can be used to assess the improvement in sensitivity and/or specificity of fMRI data analysis after denoising (Biswal et al., 1996; Liu et al., 2001; Stone et al., 2002; Thomas et al., 2002; Kochiyama et al., 2005; McKeown et al., 2005; Gretton et al., 2006). If an improvement in receiver operating characteristics is observed after a modification of a VIID procedure, we can judge that the modification has a positive effect on the process, even if we cannot demonstrate that the accuracy of generating ICs or of labeling them has been enhanced. Thus, it should be possible to improve VIID procedures through a process of trial and error, systematically including or excluding various elements of the procedures. We hope that such an iterative process will result in successively better denoising procedures and elucidate the potential advantages and disadvantages of IC labeling with visual inspection methods compared with automated methods.
In conclusion, the use of visual inspection to label ICs has been reported as part of procedures for denoising fMRI data and as a standard of comparison for automated labeling methods. In order for such studies to be reproducible, detailed descriptions of visual inspection procedures are needed. We address this need by providing an example of an operationalized VIID procedure and demonstrating its reliability, its costs in terms of time for training and IC labeling, and its capacity for improving the sensitivity of results from fMRI data analysis. In addition to serving as a procedure that can readily be implemented using a standard brain imaging software package (FSL, among others), we hope that the procedure will serve as a starting point for discussion of what elements should be included in a VIID procedure, and will encourage investigators to document the steps involved in their visual inspection procedures.