|Home | About | Journals | Submit | Contact Us | Français|
Detecting artifacts produced in EEG data by muscle activity, eye blinks and electrical noise is a common and important problem in EEG research. It is now widely accepted that independent component analysis (ICA) may be a useful tool for isolating artifacts and/or cortical processes from electroencephalographic (EEG) data. We present results of simulations demonstrating that ICA decomposition, here tested using three popular ICA algorithms Infomax, SOBI, and FastICA, can allow more sensitive automated detection of small non-brain artifacts than applying the same detection methods directly to the scalp channel data. We tested the upper bound performance of five methods for detecting various types of artifacts by separately optimizing and then applying them to artifact-free EEG data into which we had added simulated artifacts of several types, ranging in size from thirty times smaller (−50 dB) to the size of the EEG data themselves (0 dB). Of the methods tested, those involving spectral thresholding were most sensitive. Except for muscle artifact detection where we found no gain of using ICA, all methods proved more sensitive when applied to the ICA-decomposed data than applied to the raw scalp data: the mean performance for ICA was higher and situated at about two standard deviations away from the performance distribution obtained on raw data. We note that ICA decomposition also allows simple subtraction of artifacts accounted for by single independent components, and/or separate and direct examination of the decomposed non-artifact processes themselves.
In event-related experiments, each data epoch normally represents one or more experimental trials time locked to one or more experimental events of interest. Typically, software for ERP analysis first subtracts a baseline – e.g., the average pre-stimulus potential– from each trial, then finds and eliminates ‘bad’ electrodes at which the resulting potential values exceed some predefined bound or some level of noise. The remaining ‘good’ electrodes usually include central scalp sites containing mainly brain activity, temporo-parietal sites that may contain temporal muscle artifacts, and frontal sites that may contain prominent blinks, eye movement and other muscle artifacts. It is critical to detect such artifacts contaminating event-related EEG data for several reasons. First, artifactual signals often have high amplitudes relative to brain signals. Thus, even if their appearance in the recorded EEG is infrequent, they can bias average evoked potential or other measures computed from the data and, as a consequence, bias or dilute the results of an experiment. In clinical research, however, artifacts may be abundant, limiting the usability of the data altogether.
In most current EEG analysis, single data trials that contain out-of-bounds potential values at single electrodes are selected for rejection from analysis. A problem with the simple thresholding criterion is that it only takes into account low-order signal statistics (minimum and maximum). This rejection method may fail to detect e.g. muscle activity, which typically involves rapid electromyographic (EMG) signals of small to moderate size, nor will it detect artifacts generated by small eye blinks. Statistical measures of EEG signals may contain more relevant information about these and other types of artifacts. For instance, linear trend detection may help in isolating current drift. Computing the probability of each data epoch, given the probability distribution of potential values over all epochs, may help in detecting trials with improbable artifacts. A 4th order moment of the data distribution, its kurtosis, may detect activity distribution indicative of some artifacts. Finally standard threshold detection methods applied to the single trial data spectra may help in detecting artifacts with specific spectral signatures.
Independent Component Analysis (ICA) (Bell and Sejnowski, 1995; Jung et al., 2001; Makeig et al., 1996) applied to concatenated collections of single-trial EEG data has also proven to be an efficient method for separating distinct artifactual processes including eye blink, muscle, and electrical artifacts (Barbati et al., 2004; Delorme et al., 2001; Iriarte et al., 2003; James and Gibson, 2003; Joyce et al., 2004; Jung et al., 2000b; Tran et al., 2004; Urrestarazu et al., 2004; Zhukov et al., 2000). Although several ICA algorithms in different implementations have been used to separate artifacts from EEG and MEG data, they all can be derived from related mathematical principles (Lee et al., 2000). While ICA is now considered an important technique for detecting artifacts, there are still few quantitative measures of the advantage for artifact detection that is gained from ICA decomposition.
Here we develop a framework for comparing artifact detection methods and use it to determine whether preprocessing EEG data using ICA can help in detecting brief data epochs that contain artifacts. We first apply a set of statistical and spectral analysis methods to detect artifacts in the raw data, optimizing a free parameter for each method so as to optimally detect known artifactual data epochs. Then, we apply the same procedure to the data decomposed using ICA. Finally, we quantitatively compare results of these artifact detection methods applied either to raw or to ICA-preprocessed data.
Software routines for performing the artifact detection methods described above are available within the EEGLAB toolbox (Delorme and Makeig, 2004).
To test and optimize the artifact detection process, we used event-related EEG data from a ‘Go/Nogo’ visual categorization task (Delorme et al., 2004). EEG was recorded at a 1000 Hz sampling rate using a 32-electrode scalp montage with all channels referenced to the vertex electrode (Cz). The montage did not include specific eye artifact channels, but did include channels for electrodes located above the eyes (FPz; FP1, FP2). Responses to target and non-target stimuli presented about every 2 seconds were recorded for each subject. Data epochs were extracted surrounding each stimulus, extending from 100 before to 600 ms after stimulus onsets. The mean value in the pre-stimulus baseline (−100 to 0 ms) was subtracted from each individual epoch. Data were then pruned of noticeable eye and muscle artifacts by careful visual inspection (AD), resulting in 119 “clean” data epochs.
We then simulated five types of artifacts (Fig. 1):
In the test data depicted above, each data channel could only have one type of artifact, excepting the first two artifact types (eye and muscles), which projected with varying strengths to all the electrodes. We took care that the randomly selected channels for each artifact type differed from each other and did not coincide with channels where the two first topographical artifacts had maximum amplitude.
Since our goal was to test the sensitivity of each method for detecting artifacts, we varied simulated artifact amplitude to find the smallest artifacts that each method in Section II could detect. Artifacts at the smallest amplitude level (−50 dB) were so small that none of the methods were able to detect them. For each artifact type, amplitude was gradually increased from −50 dB to 0 dB. To compute signal to noise ratio (SNR; i.e., artifact to background brain EEG signal ratio), we divided the spectrum of each type of artifact (not mixed yet with data) at each frequency by the data spectrum at the same frequency. We then found the frequency with the largest SNR and converted it to dB scale (10*log10(SNR)). Prior to computing SNR for the first two (topographic) artifacts, we scaled their amplitudes by the highest channel gain in the applied scalp map.
Since we knew which data trials contained simulated artifacts, for each type of artifact we could determine the most efficient artifact detection method. For each method, we chose one free threshold parameter(s) which we optimized to estimate an upper bound on the ability of the method to detect artifacts of the given type.
We assumed voltage thresholds to be symmetrical, so only one parameter had to be optimized in the standard thresholding method. Linear trend detection had two parameters (minimum slope and goodness of fit). We set the slope to be equal or higher than the minimal artifactual slope at the maximum level of noise (0.5 µV/700 ms) and then optimized the goodness-of-fit parameter. Since this was time consuming and was specifically aimed at detecting trends, we applied this method only to detecting linear trend artifacts. For the probability and kurtosis methods, we optimized the z threshold. Finally, for the spectral measure, we optimized the dB limits independently for three frequency bands (0 to 3 Hz, 20 to 60 Hz, and 60 to 125 Hz) and then used the frequency band that returned the best results.
For each parameter, the optimization procedure minimized the total number of trials misclassified (misses plus false alarms). To reduce the computational load, we used a procedure that recursively divided its value range until a minimum was reached. More powerful non-linear optimization methods we considered, using Matlab Nelder-Mead multidimensional unconstrained nonlinear minimization, proved computationally infeasible. We attempted to design an algorithm requiring the least number of iterations while minimizing the risk of falling into local minima. To perform the optimization, we thus divided the parameter interval range into 10 equally spaced intervals, and evaluated the number of correctly classified trials at these points. We located the two values surrounding local minima and created a new interval. Then, we ran the procedure recursively three times, repetitively dividing the newly created value range into 10 intervals and counting correctly classified trials, thus obtaining the ‘optimal’ parameters for each detection algorithm.
ICA separates EEG processes whose time waveforms are maximally independent of each other. The separated processes may be generated either within the brain or outside it. For instance, eye blinks and muscle activities produce ICA components with specific activity patterns and component maps (Jung et al., 2000b; Makeig et al., 1997). However, scalp EEG activity as recorded at different electrodes is highly correlated and thus contains much redundant information. Also, several artifacts might project to overlapping sets of electrodes. Thus it would be useful to isolate and measure the overlapping projections of the artifacts to all the electrodes and this is what ICA does (Jung et al., 2000a; Makeig et al., 1996).
To build intuition about how ICA works, one might imagine an n-electrode recording array as an n-dimensional space. The recorded signals can be projected into a more relevant coordinate frame than the single-electrode space: e.g. the independent component space. In this new coordinate frame, the projections of the data on each basis vector – i.e. the independent components – are maximally independent of each other. Intuitively, by assessing the statistical properties of the data in this space, we might be able to isolate and remove the artifacts more easily and efficiently.
Multiplication of the scalp data, X, by the unmixing matrix, W, found for example by infomax ICA represents a linear change of coordinates from the electrode space to the independent component space, or
where S is the ‘activation’ matrix of the components across time. Each component is a linear weighted sum of the activity of an independent source process projecting to all of the electrodes. Each row of the unmixing matrix W that extracts each component activity time course may be seen as a spatial filter for a distinctive activity pattern in the data. Each independent component comprises an activation time course plus an associated scalp map (the corresponding column of W−1) that gives the relative projection strengths (and polarities) of the component to each of the electrodes.
All the artifact detection methods described in the previous section were also applied to raw potential values decomposed using different ICA algorithms. In this study we used three ICA algorithms most often used to date to process EEG data: Infomax ICA, SOBI, and FastICA. These ICA algorithms have the same overall goal (Lee et al., 2000) and generally produce near-identical results when applied to idealized source mixtures. The approach of each algorithm to estimating independence is different: Infomax employs a parametric approach to estimate the component probability distributions (Bell and Sejnowski, 1995), whereas FastICA maximizes the neg-entropy of the component distributions (Hyvarinen and Oja, 2000). SOBI is a second-order method that requires and takes advantage of temporal correlations in the source activities (Belouchrani and Cichocki, 2000). Results produced by these algorithms may differ when the component activities are not far from Gaussian or are not independent.
For infomax, we used default parameters as implemented in the runica function of the EEGLAB toolbox (Delorme and Makeig, 2004). These involved pre-sphering of the data, and stopping training when weight change was less than 10−6. Since extended infomax ICA decompositions revealed that the EEG data did not contain any sub-Gaussian component, we here used the non-extended version of infomax for somewhat faster computation (sub-Gaussian components of EEG data include single-frequency line noise, not simulated here). Since we were processing data epochs rather than continuous data, we slightly modified the SOBI algorithm (Belouchrani and Cichocki, 2000) so that data covariance matrices for different time delays were computed for each epoch, then averaged over data epochs. This modified version of the SOBI algorithm is currently been made available for testing in the EEGLAB toolbox (Delorme and Makeig, 2004). When running SOBI, we set to 100 the number of correlation matrices at different time delays (Akaysha Tang, personal communication). For the FastICA algorithm (Hyvarinen and Oja, 2000), we forced the decomposition to estimate all components in parallel (“symmetric” approach for version 2.1 of the FastICA algorithm available on the Internet), setting believed to be suitable for EEG data analysis (Aapo Hyvärinen, personal communication). Since we could not determine which component contained relevant artifacts, for each artifact type each detection method was applied to all components and the single component returning the best results was used in the detection process.
Altogether, we generated a total of 20 datasets with different artifact and noise instantiations, and then used a Linux cluster of 36 processors (1.4 GHz or faster) to optimize parameters for each detection method and each dataset. The results presented here required about 24 hours of computation on this cluster.
Results for each detection method and each artifact type are presented in Fig. 2, which shows results for one artifact type in each row and one detection method in each column. Applied either to the raw data or to ICA component activities, the frequency thresholding method performed the best overall; the joint probability method was second best, and standard thresholding third. Kurtosis thresholding performed the poorest, though it was partly successful in detecting large discontinuity and trend artifacts. Finally, the trend detection method was the most efficient for detecting linear trends in the data, although nearly equal performance was achieved using the frequency thresholding method (1–3 Hz band).
However, one has to balance the performance advantage against the speed of computation. Here, simple thresholding was fastest, while the joint probability was 4 times slower, kurtosis rejection was 8 times slower, trend rejection was about 25 times slower, and spectral rejection was about 120 times slower. However, for all detection methods, the measure of interest (joint probability, kurtosis, slope, or spectrum) does not need to be computed several times. Once a measure has been computed, simple thresholding may be applied repeatedly to obtain optimal results (see Methods). The EEGLAB routines that implement these detection methods use this strategy (Delorme and Makeig, 2004).
We then attempted to compare the performance of artifact detection applied to the raw data and to the same data preprocessed using ICA. To do so, we only considered the frequency thresholding method since it outperformed most methods for all types of artifacts. Since the performance trend, as artifact size decreased, was different for each artifact type, we normalized each performance curve to the logistic function before averaging. To do so, we fitted each performance curve obtained from simulated data with a sigmoid function by optimizing its slope and horizontal offset. We then normalized the observed data points on each curve so they would lie on a sigmoid curve centered on 0 with slope 1. We applied the same transformation to the ICA results using the normalization parameters obtained from the simulated data (this explains why standard deviations were larger for the ICA results than for raw data results). In Fig. 3, we plot the mean normalized performance curves for the raw and ICA-separated data for all three ICA algorithms we tested - Infomax, SOBI and FastICA (see Methods). We expected that frequency thresholding would be more efficient when applied to data preprocessed by any of the tested ICA algorithms than when applied to raw data. This is indeed what we observed (Fig. 3). Data preprocessing by ICA led to a 10–20% increase in artifact detection performance for all the three ICA algorithms tested. Detection performance was best using infomax ICA, approaching two standard deviations above raw data mean detection performance (shaded area in Fig. 3).
We have shown that optimally applying spectral methods to identify artifacts in 32-channel EEG data epochs allowed more reliable detection of smaller artifacts than optimally applying the same thresholding methods to the scalp channel data itself. Preprocessing the data using ICA allows more effective artifact detection. However, it should be noted that the frequency thresholding methods are as efficient at detecting muscle artifact in either the raw or ICA decomposed data. For this type of artifact, ICA decomposition does not seem to improve artifact detection.
In our simulated data, mixing of artifacts with data was perfectly linear. Might this not be the case for real data? In fact, instantaneous mixing via EEG volume conduction of artifacts and EEG processes is linear. By Ohm’s law, externally imposed electrical artifacts (DC trends, discontinuities, white noise) also mix linearly with EEG data at scalp electrodes. On this basis, at least, linear ICA decomposition algorithms are not inappropriate for separating artifacts from other data processes. On the other hand, these simulations may have disadvantaged the ICA approach since, for example, the simulated muscle artifacts may have been mixed with some actual low-level muscle activity in the ‘clean’ EEG. This might explain why muscle artifact detection applied to the ICA decompositions did not strongly outperform artifact detection applied to the raw data.
Here we optimized each measure threshold based on our ‘ground-truth’ knowledge of the simulated data. The results showed that threshold optimization might be of most benefit when applied to the raw channel data, where frequency domain thresholds must be finely tuned to best separate artifacts from the background noise. Optimal tuning of frequency-domain thresholds for ICA component activities might be less important in the (typical) case in which ICA largely isolates stereotyped eye blink, muscle, heart and line noise artifacts to a single ICA component.
First, several major assumptions of ICA seem to be fulfilled in the case of EEG recordings. As mentioned previously, a first assumption is that the ICA component projections are summed linearly at scalp electrodes. A second assumption is that sources are independent. This is not strictly realistic but even if the appearance of artifacts might be related to brain activity – muscle contractions, for example, triggered by activity in the motor cortex – the time courses of the resulting artifacts and the triggering brain events should typically be different across all or some trials. Thus, they may be accounted for by different independent components (Jung et al., 2000b). A third assumption concerns the non-Gaussianity of the source activity distributions. This last condition is quite plausible for artifacts, which are usually sparsely active and thus far from Gaussian in value distribution. Finally, the spatial stationarity assumption for the component projections is compatible with many, though not all observations (for more detailed discussion, see Jung et al., 2000a; Zhukov et al., 2000). In our results, we noticed that the Infomax algorithm seems to perform better than the two other ICA algorithms, SOBI and FastICA. Conceivably, this advantage may have resulted from using a more fortunate set of training parameters For example, the FastICA algorithm has more than a dozen tunable parameters. Since, running the whole analysis using each algorithm required about 8 hours on a cluster of 36 workstations, optimizing each algorithm’s parameters was not feasible.
It is also important to note the difference between stereotyped biological and line noise artifacts considered here and non-stereotyped artifacts such as may be induced by generalized head and electrode movements. Such non-stereotyped artifacts may quickly introduce a variety of unique scalp patterns into the EEG data, which may in turn confound and compromise ICA decompositions. Therefore it is important to identify and discard such noise periods from the data before running ICA, as was done, by visual inspection, for the EEG data used in this study. Some types of artifacts (e.g., line noise or muscle artifact) may also be partially removed using frequency-band filtering methods. We did not explore such methods for removing of artifacts from data epochs, but instead focused here on methods for detecting artifactual data epochs.
To perform artifact detection in practice, we generally recommend (1) setting detection thresholds such that roughly 10% of data trials are detected using a specified method, (2) visually inspecting data trials marked as artifactual, and (3) optimizing the thresholds manually. For instance, for the joint data probability measure (which in our tests here performed better than standard thresholding yet was much faster to compute than spectral thresholding), we usually use thresholds outside of 5 standard deviations above the mean. (For a Gaussian distribution, the probability that a tagged artifact trial belongs to the ‘ordinary’ trial distribution would then be less than 10−11). After finally rejecting the marked and checked data artifacts, the cleaned data may be decomposed again by ICA to study the recovered brain source dynamics and/or processed by other analysis methods.
It is important to consider whether outright rejection of epochs containing artifacts is necessary. Beyond advantages for the detection of artifacts, there may be several other advantages to using ICA in EEG analysis. Using ICA allows direct examination of information sources in the data (in a particular sense), rather than their summed effects at the scalp electrodes. By removing or minimizing the effects of volume conduction, ICA allows detailed examination of the separate dynamics and dynamic inter-relationships of different cortical areas (Delorme & Makeig, 2003; Makeig et al., 2002). Stereotyped artifacts separated into one or more components by ICA may be literally subtracted from the data by subtracting their summed back-projections from the raw data. Independent components analysis, by its nature, separates both artifactual and non-artifactual processes, many of which have distinct dynamics relative to experimental event and scalp maps consistent with their generation in single (or sometimes dual bilateral) patches of cortex. Analysis of the dynamics of such components may include epochs in which artifact component activities also occur, if ICA has already separated the ongoing brain EEG and artifact processes (Makeig et al., 2004).
This work was supported by a fellowship from the INRIA organization, by the Howard Hughes Foundation and the Swartz Foundation (Old Field, NY), and by the National Institutes of Health USA (grant RR13651-01A1).