Independent component analysis (ICA) is a multivariate approach that has become increasingly popular for analyzing brain imaging data. In contrast to the widely used general linear model (GLM), which requires the user to parameterize the brain’s response to stimuli, ICA allows the researcher to explore the factors that constitute the data and alleviates the need for explicit spatial and temporal priors about the responses. In this paper, we introduce ICA for hemodynamic (fMRI) and electrophysiological (EEG) data processing, together with one of the possible extensions to the population level that is available for both data types. We then selectively review work that has employed ICA to decompose EEG and fMRI data in order to facilitate the integration of the two modalities, providing an overview of what is available and of the purposes for which ICA has been used. An optimized method for symmetric EEG-fMRI decomposition is proposed, and the outstanding challenges in multimodal integration are discussed.
Since its inception in 1992 (Frahm, Bruhn, Merboldt, & Hanicke, 1992; Kwong, et al., 1992; Ogawa, et al., 1992), functional magnetic resonance imaging (fMRI) has become a major tool to study human brain function (Amaro & Barker, 2006; Bandettini, Birn, & Donahue, 2000; Huettel, Song, & McCarthy, 2004). This was propelled by its non-invasiveness, excellent spatial resolution, flexible experimental design and repeatability. The fMRI signal is based on changes in the magnetic susceptibility of the blood during brain activation; it does not directly reflect neuro-electric activity in the brain. Instead, the signal depends on the complex coupling of neuronal activity, metabolic activity and blood flow parameters in the brain (Heeger & Ress, 2002; Logothetis & Wandell, 2004; Raichle & Mintun, 2006). The hemodynamic response is delayed and smoothed relative to the neuronal activity, and it is usually not possible to reconstruct the neuro-electric process from the hemodynamic process. Nevertheless, the hemodynamic signal remains a very informative surrogate for neuronal activity. Moreover, what fMRI alone lacks in temporal resolution and in its direct relationship with neuro-electric activity can be compensated for by combining it with concurrent EEG measurements (Debener, Ullsperger, Siegel, & Engel, 2006; Eichele, et al., 2005).
A wide variety of studies have been performed to delineate the neuronal correlates of cognitive functions (for a review see e.g. Cabeza & Nyberg, 2000), to use activity patterns to predict perception and behavior, to identify aberrant regional activation in clinical populations, and much more. Inferences made from fMRI data most often rely on pre-specified temporal models, i.e. on the correlation between the measured data and a predicted time course that models the activation to stimuli/responses. However, in many studies the temporal dynamics of event-related or intrinsic regional activations are difficult to predict due to the lack of a well-understood brain-activation model. In contrast, independent component analysis (ICA) is an exploratory, data-driven tool that can reveal inter-subject and inter-event differences in the temporal dynamics of the fMRI signal without a prior model. ICA is increasingly utilized as a tool for evaluating the hidden spatio-temporal structure contained within electrophysiological and hemodynamic brain imaging data. The strength of ICA is its ability to reveal function-relevant dynamics for which a temporal model cannot be specified a priori (Calhoun & Adali, 2006; Eichele, et al., 2008b), or is not available, such as in resting state data (Damoiseaux, et al., 2006; Fox & Raichle, 2007). Below, we introduce the ICA generative model and the extension of ICA to group data. We then review how the method has been used in fMRI, EEG and concurrent EEG-fMRI research, and identify some of the outstanding challenges for future work.
ICA is a multivariate statistical method used to uncover hidden sources from multiple data channels (e.g., electrodes, microphones, images) such that these sources are maximally independent. It typically assumes a generative model in which the observations are linear mixtures of statistically independent sources. Unlike principal component analysis (PCA), which only decorrelates the data, ICA also exploits higher-order statistics to achieve independence. An intuitive example of ICA can be given by a scatter-plot of two independent signals s1 and s2. Figure 1a (left and middle panels) shows the projections found by PCA and ICA, respectively, for a linear mixture of s1 and s2, and Figure 1a also plots the two independent signals (s1, s2) in a scatter-plot. PCA finds the orthogonal vectors u1, u2, but cannot identify the independent vectors. In contrast, ICA is able to find the independent vectors a1, a2 of the linearly mixed signals (s1, s2), and is thus able to restore the original sources.
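The PCA half of this picture can be checked numerically. The minimal sketch below (assuming NumPy; the uniform sources and the 2×2 mixing matrix are arbitrary illustrative choices, not those of Figure 1) whitens a two-channel mixture with PCA. The outputs come out uncorrelated, yet the covariance of their squares, a simple fourth-order statistic, stays clearly non-zero, so PCA alone has not recovered independent sources.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
# Two independent, unit-variance, sub-Gaussian (uniform) sources s1, s2.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])   # hypothetical mixing matrix
X = A @ S                    # observed linear mixtures x = A s

def pca_whiten(X):
    """Project centred data onto the principal axes and scale them to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    return (E / np.sqrt(d)).T @ Xc

def square_cov(Y):
    """Covariance between the squared, standardised rows of Y (zero if independent)."""
    Z = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
    return np.cov(Z[0] ** 2, Z[1] ** 2)[0, 1]

Y = pca_whiten(X)
print(np.round(np.cov(Y), 3))   # identity: the PCA outputs are uncorrelated
print(round(square_cov(S), 3))  # near 0 for the true sources
print(round(square_cov(Y), 3))  # clearly negative: PCA outputs are still dependent
```

Decorrelation constrains only second-order statistics; the remaining rotation that makes the outputs independent is exactly what ICA estimates from higher-order statistics.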
A typical ICA model assumes that the recorded signal consists of statistically independent and non-Gaussian sources that are linearly mixed. Consider an observed $M$-dimensional random vector denoted by $\mathbf{x} = [x_1, x_2, \ldots, x_M]^T$, generated as

$$\mathbf{x} = \mathbf{A}\mathbf{s} \qquad (1)$$

where $\mathbf{s} = [s_1, s_2, \ldots, s_N]^T$ is an $N$-dimensional vector whose elements are the random variables that refer to the independent sources, and $\mathbf{A}$ is an unknown $M \times N$ mixing matrix. Typically $M \ge N$, and $\mathbf{A}$ is assumed to be of full rank. The goal of ICA is to estimate an $N \times M$ unmixing matrix $\mathbf{W}$ such that $\mathbf{y}$, given by

$$\mathbf{y} = \mathbf{W}\mathbf{x} \qquad (2)$$

is a good approximation to the ‘true’ sources $\mathbf{s}$.
Two commonly used ICA algorithms for EEG and fMRI data derived within this framework are Infomax (Bell & Sejnowski, 1995) and FastICA (Hyvärinen, Karhunen, & Oja, 2001). Both Infomax and FastICA typically work with a fixed nonlinearity or one that is selected from a small set, e.g., two in the case of extended Infomax (Lee, Girolami, & Sejnowski, 1999). These algorithms work well for symmetric distributions and are less accurate for skewed distributions and for sources close to Gaussian. An intuitive introduction to ICA is provided by Stone (Stone, 2004), and a detailed mathematical description by Hyvärinen and colleagues (Hyvärinen, Karhunen, & Oja, 2001).
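To make the estimation of the unmixing matrix W concrete, here is a minimal NumPy sketch of symmetric FastICA with the tanh (log-cosh) nonlinearity. It is an illustrative implementation written for this text, not the code shipped in EEGLAB, GIFT or MELODIC; the helper names, toy sources and mixing matrix are assumptions.

```python
import numpy as np

def whiten(X):
    """Centre the mixtures and whiten them with PCA; returns (Z, V) with Z = V @ Xc."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    V = (E / np.sqrt(d)).T
    return V @ Xc, V

def sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, which keeps the rows of W orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

def fastica(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity). X: mixtures x samples.

    Returns (Y, W_total) with Y = W_total @ (X - mean), i.e. y = W x.
    """
    Z, V = whiten(X)
    n, T = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update per row: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = (G @ Z.T) / T - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is unchanged up to its sign
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    W_total = W @ V
    return W_total @ (X - X.mean(axis=1, keepdims=True)), W_total

# Usage on toy data: one super-Gaussian and one sub-Gaussian source.
rng = np.random.default_rng(42)
S = np.vstack([rng.laplace(size=5000),
               np.sign(np.sin(np.linspace(0, 60, 5000)))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # hypothetical mixing matrix
X = A @ S
Y, W = fastica(X)
C = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(np.round(C, 2))   # each row has one entry near 1
```

ICA recovers sources only up to permutation, sign and scale, which is why the check compares absolute correlations rather than the raw signals.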
The initial application of single-subject temporal ICA to EEG data was introduced by Makeig using multichannel event related potentials (Makeig, Jung, Bell, Ghahremani, & Sejnowski, 1997). Since then, ICA of EEG signals has become popular with a wide user community, facilitated by the open-source toolbox EEGLAB (www.sccn.eeglab.edu). The use of ICA for multi-channel EEG recordings has been reviewed (Onton, Westerfield, Townsend, & Makeig, 2006), and a conceptual framework for using ICA for the study of event-related brain dynamics has been proposed by Makeig and colleagues (Makeig, Debener, Onton, & Delorme, 2004b).
The popularity of ICA is in large part due to two key features. First, it is a powerful way to remove artefacts from EEG data (Jung, et al., 2000a; Jung, et al., 2000b), and second, it helps to disentangle otherwise mixed brain signals (Makeig, et al., 2002). Figure 2 shows one of the most common artefacts identified by ICA. Eye blinks can often be easily identified by their characteristic topography and time course. Also very common in EEG recordings are ICs reflecting lateral eye movements and muscle activity (EMG), and ICA of high-density EEG recordings typically reveals several such components. Beyond artifacts, various examples have been published demonstrating the potential of ICA for the separation of event-related brain activity patterns (Debener, Makeig, Delorme, & Engel, 2005; Debener, et al., 2005; Delorme, Westerfield, & Makeig, 2007; Makeig, et al., 2002; Onton, Delorme, & Makeig, 2005). These studies have combined ICA with single-trial EEG analysis, thereby allowing exploration of brain dynamics beyond the evoked fraction of the signal that is preserved in the ERP. Typically, the inspection of non-artefact ICs suggests that some can be specifically linked to stimulus processing and explain condition-specific variance, which affords functional inferences about event-related component activity (for a review, see Onton & Makeig, 2006).
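Artefact removal with ICA amounts to back-projecting all components except the rejected ones. The sketch below assumes NumPy and that a mixing matrix A (channels × components, e.g. the inverse of an estimated unmixing matrix) is already available; the helper name and the toy "blink" data are hypothetical.

```python
import numpy as np

def remove_components(X, A, reject):
    """Back-project all ICs except the rejected ones.

    X      : channels x time EEG data
    A      : channels x components mixing matrix (columns = IC scalp maps)
    reject : indices of components judged to be artefacts (e.g. blinks)
    """
    W = np.linalg.pinv(A)                  # unmixing matrix
    S = W @ X                              # IC activations, components x time
    rejected = set(reject)
    keep = [k for k in range(A.shape[1]) if k not in rejected]
    return A[:, keep] @ S[keep, :]         # data reconstructed without rejected ICs

# Toy example: component 0 is a sparse, large "blink" source.
rng = np.random.default_rng(0)
n_ch, n_t = 8, 1000
blink = np.zeros(n_t)
blink[[200, 550, 800]] = 50.0
blink = np.convolve(blink, np.hanning(40), mode="same")
S_true = np.vstack([blink, rng.standard_normal((2, n_t))])
A = rng.standard_normal((n_ch, 3))         # hypothetical IC scalp maps
X = A @ S_true
X_clean = remove_components(X, A, reject=[0])
print(np.allclose(X_clean, A[:, 1:] @ S_true[1:]))   # True
```

In real data the rejection decision (which ICs are blinks, lateral eye movements or EMG) is the hard part; the arithmetic of removal is just this reduced back-projection.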
Following its first application to fMRI (McKeown, et al., 1998), ICA has been successfully utilized in a number of fMRI studies, especially in those that have proven challenging to analyze with the standard regression-type approaches (Calhoun & Adali, 2006; McKeown, Hansen, & Sejnowski, 2003). Spatial ICA of fMRI finds systematically non-overlapping, temporally coherent brain regions without constraining the shape of the temporal response. Note that ICA can be used to discover either spatially or temporally independent components (Stone, 2004). Because the number of observed volume elements (voxels) by far exceeds the number of observed time points, most applications to fMRI use the former approach and seek components that are maximally independent in space. The aim of ICA is then to factor a two-dimensional data matrix (voxels-by-timepoints) into the product of a set of time courses and a set of independent spatial patterns. The choice of spatial or temporal independence has been somewhat controversial, although these are just two possible modeling assumptions. McKeown et al. argued that the sparse, distributed nature of the spatial pattern for typical cognitive activation paradigms would work well with spatial ICA (sICA). Furthermore, since the prototypical image artifacts are also sparse and localized, e.g., vascular pulsation, CSF flow signals in the ventricles, or breathing-induced motion (signal localized to strong tissue contrast near discontinuities), the Infomax algorithm with a sparse prior is very well suited for spatial analysis; it has also been used for temporal ICA, as have decorrelation-based algorithms (Calhoun, Adali, Pearlson, & Pekar, 2001; Petersen, Hansen, Kolenda, Rostrup, & Strother, 2000). Stone et al. proposed a method that attempts to maximize both spatial and temporal independence (Stone, Porrill, Buchel, & Friston, 1999). An interesting combination of spatial and temporal ICA was presented by Seifritz et al. (Seifritz, et al., 2002). They used an initial sICA to reduce the spatial dimensionality of the data by locating a region of interest, in which they subsequently performed temporal ICA to study the structure of the response in the human auditory cortex in more detail.
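For a single run, the spatial model described above can be written compactly as follows; $T$, $V$ and $N$ are illustrative symbols for the numbers of time points, voxels and components, chosen for this sketch:

$$\mathbf{Y} = \mathbf{A}\mathbf{S}, \qquad \mathbf{Y} \in \mathbb{R}^{T \times V},\quad \mathbf{A} \in \mathbb{R}^{T \times N},\quad \mathbf{S} \in \mathbb{R}^{N \times V}$$

Here the rows of $\mathbf{S}$ are the maximally independent spatial maps, with each voxel acting as one sample, and the columns of $\mathbf{A}$ are the associated time courses. Temporal ICA applies the same model to the transposed data, $\mathbf{Y}^{\top} = \mathbf{B}\mathbf{C}$, with $\mathbf{C} \in \mathbb{R}^{N \times T}$ holding independent time courses and $\mathbf{B} \in \mathbb{R}^{V \times N}$ the maps, so that only $T$ samples are available to estimate the unmixing. This is why spatial ICA is the usual choice when $V \gg T$.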
Unlike univariate methods such as the general linear model (GLM), ICA does not naturally generalize to a method suitable for drawing inferences about observations from multiple subjects. For example, when using the GLM, the investigator specifies a fixed set of regressors, and so drawing inferences about group data comes naturally, since all individuals in the group share the same regressors. In ICA, by contrast, different individuals in the group will have different time courses and component maps. Even if components are very similar across subjects they require a subsequent clustering effort to be grouped together. Accordingly, it is not immediately clear how to draw inferences about group data by using single-subject ICA.
To overcome this problem, several multi-subject ICA approaches have been proposed (Beckmann & Smith, 2005; Calhoun, Adali, Pearlson, & Pekar, 2001; Esposito, et al., 2005; Guo & Pagnoni, 2008; Lukic, Wernick, Hansen, & Strother, 2002; Schmithorst & Holland, 2004; Svensen, Kruggel, & Benali, 2002). The various approaches differ in terms of how the data are preprocessed and organized prior to the ICA analysis and what types of outputs are generated. One possibility is to perform single-subject ICA and then attempt to combine the output into a group post hoc by using a clustering metric or simple correlation of the components (Calhoun, et al., 2001; Esposito, et al., 2005). This has the advantage of allowing for unique spatial and temporal features, but has the disadvantage that the components are not necessarily unmixed in the same way for each subject.
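A minimal version of post hoc grouping by simple correlation is sketched below, assuming NumPy, equal numbers of components per subject and a greedy assignment; the helper is hypothetical and is not the clustering procedure of the cited studies. Because ICA leaves the order and sign of components undetermined, both are resolved from the correlation matrix.

```python
import numpy as np

def match_components(ref_maps, maps):
    """Greedily pair each reference IC with its most correlated IC in `maps`.

    ref_maps, maps : components x voxels arrays from two single-subject ICAs.
    Returns (order, signs) so that signs[:, None] * maps[order] aligns with ref_maps.
    """
    n = ref_maps.shape[0]
    C = np.corrcoef(ref_maps, maps)[:n, n:]   # reference x candidate correlations
    order = np.full(n, -1)
    signs = np.ones(n)
    free = np.abs(C).copy()
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(free), free.shape)
        order[i], signs[i] = j, np.sign(C[i, j])
        free[i, :] = -1.0                     # reference i and candidate j are used
        free[:, j] = -1.0
    return order, signs

# Toy example: subject 2's maps are a permuted, sign-flipped, noisy copy.
rng = np.random.default_rng(3)
ref = rng.laplace(size=(4, 2000))
perm = np.array([2, 0, 3, 1])
flip = np.array([1.0, -1.0, 1.0, -1.0])
other = flip[:, None] * ref[perm] + 0.3 * rng.standard_normal((4, 2000))
order, signs = match_components(ref, other)
aligned = signs[:, None] * other[order]
print(order, signs)
```

A greedy pass is the simplest choice; an optimal one-to-one assignment or a clustering over many subjects would be the natural extensions.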
Other approaches involve ICA computed on condensed group data. The advantage of these approaches is that they perform one single ICA on the group data, in which subject-specific and condition-specific contributions can be identified. Note that a temporal concatenation approach allows for unique time courses for each subject, but assumes common group maps. A spatial concatenation approach on the other hand allows for unique maps but assumes common time courses. It appears that temporal concatenation works better for fMRI data (Guo & Pagnoni, 2008; Schmithorst & Holland, 2004). Temporal concatenation is implemented in MELODIC (http://www.fmrib.ox.ac.uk/fsl/) and in GIFT (http://icatb.sourceforge.net/). GIFT additionally implements a back-projection step which returns subject specific maps and timecourses for further analysis (Calhoun, Adali, Pearlson, & Pekar, 2001; Eichele, et al., 2008a). A detailed comparison of several group ICA approaches including temporal concatenation and tensor ICA is provided in recent papers (Guo & Pagnoni, 2008; Schmithorst & Holland, 2004).
While group spatial ICA is more prominent in fMRI, it has been more typical in EEG research to employ single-subject temporal ICA and cluster individual component sets along features of interest for inferences about groups of subjects (Debener, Makeig, Delorme, & Engel, 2005; Debener, et al., 2005; Onton, Delorme, & Makeig, 2005; Onton, Westerfield, Townsend, & Makeig, 2006). In group temporal ICA of time-domain EEG data, spatial concatenation, aggregation and PCA data reduction precede component estimation, enabling the identification of components that contribute to event-related potentials (Eichele, et al., 2008a); this approach is implemented in EEGIFT (http://icatb.sourceforge.net/). A similar group ICA model in combination with partial least squares was recently proposed by Kovacevic (Kovacevic & McIntosh, 2007). The drawback, however, is that processes that are not well time-locked across subjects, such as ongoing EEG rhythms, cannot be satisfactorily reconstructed in this model. The accuracy of component detection and back-reconstruction with such a group ICA model depends on the degree of intra- and inter-individual latency jitter of event related EEG processes (see also Moosmann, Eichele, Nordby, Hugdahl, & Calhoun, 2008).
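We focus on the group ICA approach implemented in the GIFT software, which is freely available and has been designed for the analysis of EEG and fMRI data. GIFT uses multiple data reduction steps following data concatenation, and enables a statistical comparison of individual maps and time courses. GIFT essentially estimates a mixing matrix that has partitions unique to each subject. Once the mixing matrix is estimated, the component maps for each subject can be computed by projecting the single-subject data onto the inverse of the partition of the mixing matrix that corresponds to that subject. Mathematically, let $\mathbf{X}_i = \mathbf{F}_i^{-1}\mathbf{Y}_i$ be the $L \times V$ reduced data matrix from subject $i$, where $\mathbf{Y}_i$ is the $K \times V$ data matrix containing the preprocessed fMRI or EEG data, $\mathbf{F}_i^{-1}$ is the $L \times K$ reducing matrix determined by the PCA decomposition, $V$ is the number of fMRI voxels or EEG timepoints, $K$ is the number of fMRI time points or EEG channels, and $L$ is the size of the dimension following reduction. The reduced data from all subjects are concatenated into a matrix and reduced using PCA to $N$ dimensions, the number of components to be estimated. The $N \times V$ reduced, concatenated matrix for the $M$ subjects is

$$\mathbf{X} = \mathbf{G}^{-1}\begin{bmatrix}\mathbf{F}_1^{-1}\mathbf{Y}_1 \\ \vdots \\ \mathbf{F}_M^{-1}\mathbf{Y}_M\end{bmatrix} \qquad (3)$$

where $\mathbf{G}^{-1}$ is an $N \times LM$ reducing matrix (also determined by a PCA decomposition) and is multiplied on the right by the $LM \times V$ concatenated data matrix for the $M$ subjects. Following ICA estimation, we can write $\mathbf{X} = \hat{\mathbf{A}}\hat{\mathbf{S}}$, where $\hat{\mathbf{A}}$ is the $N \times N$ mixing matrix and $\hat{\mathbf{S}}$ is the $N \times V$ matrix of component fMRI maps or EEG timecourses. Substituting this expression for $\mathbf{X}$ into equation (3) and multiplying both sides by $\mathbf{G}$, we have

$$\mathbf{G}\hat{\mathbf{A}}\hat{\mathbf{S}} = \begin{bmatrix}\mathbf{F}_1^{-1}\mathbf{Y}_1 \\ \vdots \\ \mathbf{F}_M^{-1}\mathbf{Y}_M\end{bmatrix} \qquad (4)$$

Partitioning the matrix $\mathbf{G}$ by subject provides the following expression

$$\begin{bmatrix}\mathbf{G}_1 \\ \vdots \\ \mathbf{G}_M\end{bmatrix}\hat{\mathbf{A}}\hat{\mathbf{S}} = \begin{bmatrix}\mathbf{F}_1^{-1}\mathbf{Y}_1 \\ \vdots \\ \mathbf{F}_M^{-1}\mathbf{Y}_M\end{bmatrix} \qquad (5)$$

We then write the equation for subject $i$ by working only with the elements in partition $i$ of the above matrices, such that

$$\mathbf{G}_i\hat{\mathbf{A}}\hat{\mathbf{S}}_i = \mathbf{F}_i^{-1}\mathbf{Y}_i \qquad (6)$$

The matrix $\hat{\mathbf{S}}_i$ in equation (6) contains the single-subject maps for subject $i$ and is calculated from the following equation

$$\hat{\mathbf{S}}_i = \left(\mathbf{G}_i\hat{\mathbf{A}}\right)^{-1}\mathbf{F}_i^{-1}\mathbf{Y}_i \qquad (7)$$

We now multiply both sides of equation (6) by $\mathbf{F}_i$ and write

$$\mathbf{F}_i\mathbf{G}_i\hat{\mathbf{A}}\hat{\mathbf{S}}_i = \mathbf{Y}_i \qquad (8)$$

which provides the ICA decomposition of the data from subject $i$, contained in the matrix $\mathbf{Y}_i$. The $N \times V$ matrix $\hat{\mathbf{S}}_i$ contains the $N$ source maps in fMRI and timecourses in the EEG modality, and the $K \times N$ matrix $\mathbf{F}_i\mathbf{G}_i\hat{\mathbf{A}}$ is the single-subject mixing matrix and contains the fMRI component time course or the EEG component topography for each of the $N$ components.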
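The derivation can be checked numerically. The NumPy sketch below follows the subject-level PCA, group-level PCA and back-reconstruction steps described above under simplifying assumptions: PCA is computed by an SVD of data assumed to be already centred, the subject-level dimension L equals N so that the inverse of G_i Â exists (a pseudo-inverse is used anyway), and the group mixing matrix Â is taken as given, here a random stand-in, because the back-reconstruction algebra does not depend on which ICA algorithm produced it. Function names are hypothetical; this is not GIFT's code.

```python
import numpy as np

def pca_reducer(Y, k):
    """Return (R, R_pinv): R is the k x rows reducing matrix, R_pinv its rows x k inverse."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)   # PCA of (assumed centred) data
    Uk = U[:, :k]
    return Uk.T, Uk

def gica_back_reconstruct(Ys, L, N, A_hat):
    """Two-stage PCA reduction and subject-wise back-reconstruction.

    Ys    : list of K x V subject data matrices Y_i
    L, N  : subject-level and group-level PCA dimensions
    A_hat : N x N group mixing matrix from an ICA of X (assumed given)
    """
    F_inv, F = zip(*(pca_reducer(Y, L) for Y in Ys))   # F_i^{-1}: L x K, F_i: K x L
    stacked = np.vstack([Fi_inv @ Y for Fi_inv, Y in zip(F_inv, Ys)])   # LM x V
    G_inv, G = pca_reducer(stacked, N)                 # G^{-1}: N x LM, G: LM x N
    X = G_inv @ stacked                                # reduced group data, N x V
    S_hat = np.linalg.solve(A_hat, X)                  # X = A_hat S_hat
    results = []
    for i, (Fi_inv, Fi, Y) in enumerate(zip(F_inv, F, Ys)):
        Gi = G[i * L:(i + 1) * L, :]                   # partition i of G, L x N
        Si = np.linalg.pinv(Gi @ A_hat) @ Fi_inv @ Y   # subject sources
        Ai = Fi @ Gi @ A_hat                           # K x N subject mixing matrix
        results.append((Ai, Si))
    return X, S_hat, results

# Toy data: every subject mixes the same N sources with its own K x N matrix.
rng = np.random.default_rng(7)
K, V, N, M = 20, 500, 3, 4
S_common = rng.laplace(size=(N, V))
Ys = [rng.standard_normal((K, N)) @ S_common for _ in range(M)]
A_hat = rng.standard_normal((N, N))                    # stand-in for an ICA estimate
X, S_hat, results = gica_back_reconstruct(Ys, L=N, N=N, A_hat=A_hat)
for (Ai, Si), Y in zip(results, Ys):
    print(np.allclose(Ai @ Si, Y))
```

For noise-free data of rank N the subject-level reconstruction is exact and every subject recovers the group sources; with real data both PCA steps discard variance, so the reconstruction is only approximate.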
It has become popular to collect multiple types of imaging and other (e.g. genetic) data from the same participants, often in settings where relatively large groups are sampled. Each imaging method informs on a limited domain and typically provides both common and unique information about the problem in question. Approaches for combining or fusing data in brain imaging can be placed on an analytic spectrum, with meta-analysis (highly distilled data) examining convergent evidence at one end and highly detailed large-scale computational modeling at the other (Husain, et al., 2002). In between are methods that attempt to perform a direct data fusion (Horwitz & Poeppel, 2002). One promising data fusion approach is to first process each image type and extract features from the different modalities. These features are then examined for relationships among the data types at the group level. This approach takes advantage of the ‘cross’-information among data types and, when used for multimodal fusion, provides a natural link among them (Ardnt, 1996; Savopol & Armenakis, 2002). Here we focus selectively on ‘integration by prediction’, where typically some feature from the EEG (e.g. alpha power, P300 amplitude) is convolved with a canonical hemodynamic response function and used as a predictor of hemodynamic activity in a GLM. Integration by prediction is based on the assumption that the hemodynamic response is linearly related to local changes in neuronal activity, in particular local field potentials (Heeger & Ress, 2002; Lauritzen & Gold, 2003; Logothetis, Pauls, Augath, Trinath, & Oeltermann, 2001). Since large-scale synchronous field potentials are the basis for the scalp EEG (Nunez, 1995), such integration can be achieved by investigating correlations between BOLD and scalp EEG.
This can be done either continuously over time, as in the study of background rhythms and epileptic discharges in the EEG (for a review see Laufs, Daunizeau, Carmichael, & Kleinschmidt, 2008), or in the context of inducing variation in some cognitive operation (Debener, Ullsperger, Siegel, & Engel, 2006; Debener, et al., 2005; Eichele, et al., 2005). ICA has been used in this context on many levels: for artifact reduction during preprocessing, for feature extraction from the EEG, for decomposition of fMRI, for parallel decomposition of EEG and fMRI, and for joint ICA of multimodal data.
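Integration by prediction can be sketched as a two-regressor GLM. The example below assumes NumPy, an SPM-like double-gamma HRF (peak and undershoot gamma shapes 6 and 16, undershoot ratio 1/6, all assumed parameter values), stick functions at scan resolution and a mean-centred single-trial EEG feature such as P300 amplitude; all names and the simulated data are hypothetical.

```python
import math
import numpy as np

def double_gamma_hrf(tr, duration=32.0):
    """Canonical-style HRF sampled every `tr` seconds (assumed SPM-like shape)."""
    t = np.arange(0.0, duration, tr)
    h = t ** 5 * np.exp(-t) / math.gamma(6) - t ** 15 * np.exp(-t) / (6 * math.gamma(16))
    return h / h.sum()

def eeg_regressors(onsets_s, eeg_feature, n_scans, tr):
    """Event regressor and EEG-modulated regressor, both convolved with the HRF."""
    events = np.zeros(n_scans)
    modulated = np.zeros(n_scans)
    feature = np.asarray(eeg_feature, dtype=float)
    feature = feature - feature.mean()          # mean-centre the EEG feature
    for t, f in zip(onsets_s, feature):
        idx = int(round(t / tr))
        if idx < n_scans:
            events[idx] += 1.0
            modulated[idx] += f
    h = double_gamma_hrf(tr)
    return np.convolve(events, h)[:n_scans], np.convolve(modulated, h)[:n_scans]

def fit_glm(y, regressors):
    """Ordinary least squares with an intercept; returns one beta per regressor."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[:-1]

# Simulated voxel whose BOLD signal follows the trial-by-trial EEG amplitude.
rng = np.random.default_rng(0)
tr, n_scans = 2.0, 300
onsets = 10.0 + np.cumsum(rng.uniform(4.0, 12.0, 60))   # jittered onsets (s)
p300 = rng.normal(10.0, 3.0, 60)                        # toy single-trial amplitudes
main, eeg = eeg_regressors(onsets, p300, n_scans, tr)
bold = 100.0 + 1.0 * main + 0.5 * eeg + 0.1 * rng.standard_normal(n_scans)
beta = fit_glm(bold, [main, eeg])
print(np.round(beta, 2))   # the EEG-modulated beta should be close to 0.5
```

Keeping the unmodulated event regressor in the model lets the EEG-modulated regressor explain only trial-to-trial variation, which is the quantity of interest in this kind of integration.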
Besides its use for feature extraction and inference, ICA has been successfully employed during pre-processing to denoise EEG data acquired in the scanner. Temporal ICA is very useful for artifact reduction in EEG data (Jung, et al., 2000a; Jung, et al., 2000b), and it is similarly useful in pre-processing of simultaneously acquired EEG-fMRI data. In principle, the same artifacts affect EEG recorded inside the scanner as outside it. In addition, in-scanner EEG recordings suffer from further artifacts, most prominently the pulse artifact. A number of authors have used ICA for removing the pulse artifact, given that template-based rejection algorithms (Allen, Josephs, & Turner, 2000; Allen, Polizzi, Krakow, Fish, & Lemieux, 1998) do not always sufficiently control for this type of artifact. ICA has been shown to be helpful in reducing the pulse artifact only at lower field strengths (Debener, Mullinger, Niazy, & Bowtell, 2008; Debener, et al., 2007; Mantini, et al., 2007). Since this artifact seems to violate the spatial stationarity assumption inherent in temporal ICA, other existing tools such as optimal basis sets (OBS) might be more advantageous (Niazy, Beckmann, Iannetti, Brady, & Smith, 2005). In our experience, ICA returns better unmixing results when used after pulse artifact correction with OBS or template-based rejection algorithms (Debener, et al., 2007).
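The template-based idea referred to above can be illustrated with a simplified average-artefact subtraction for a single channel. The sketch assumes NumPy, already-detected ECG R-peaks, non-overlapping artefact epochs and one global average template, which simplifies the published algorithms; names and parameters are hypothetical.

```python
import numpy as np

def subtract_average_artifact(eeg, r_peaks, pre, post):
    """Template-based pulse-artefact subtraction for one channel (simplified sketch).

    eeg       : 1-D channel signal
    r_peaks   : sample indices of ECG R-peaks (assumed already detected)
    pre, post : samples before/after each R-peak spanned by the artefact
    """
    peaks = [p for p in r_peaks if p - pre >= 0 and p + post <= len(eeg)]
    epochs = np.array([eeg[p - pre:p + post] for p in peaks])
    template = epochs.mean(axis=0)            # average artefact waveform
    cleaned = eeg.astype(float).copy()
    for p in peaks:
        cleaned[p - pre:p + post] -= template
    return cleaned

# Toy channel: brain activity plus a repeating pulse-like artefact.
rng = np.random.default_rng(0)
fs = 250
n = 30 * fs
r_peaks = np.arange(100, n - 200, 220)        # hypothetical R-peak positions
shape = 20.0 * np.hanning(100)                # toy artefact waveform
artefact = np.zeros(n)
for p in r_peaks:
    artefact[p - 30:p + 70] += shape
brain = 2.0 * rng.standard_normal(n)
eeg = brain + artefact
clean = subtract_average_artifact(eeg, r_peaks, pre=30, post=70)
print(np.std(eeg - brain), np.std(clean - brain))   # residual artefact drops sharply
```

Averaging assumes the artefact repeats identically with each heartbeat; where its shape varies over time, such a template leaves residuals, which is one reason for following it with ICA or using OBS.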
Another powerful option is to use ICA to identify and select a brain signal of interest, such as the power modulation of the posterior alpha rhythm (Feige, et al., 2005), frontal theta (Scheeringa, et al., 2008), or single trial variability in the error-related negativity (ERN, Debener, et al., 2005). ICA here provides an unmixed and denoised source waveform, and the assumption is that this improves the correlation between EEG and fMRI signals. This approach assumes that unmixed data allows a better matching of specific EEG timecourses to their underlying spatial sources in the fMRI. An intuitive example is activity in the 8-12 Hz range, and a common ICA observation often ignored in the literature is that this frequency range is occupied by at least posterior alpha and central mu sources. These rhythms are regionally and functionally separable, and it is easily conceivable that sampling alpha activity from single or grouped posterior electrodes in which all these activities mix provides a less clear measure of any of these sub-processes than decomposition and selection of one feature (Feige, et al., 2005). Similar considerations apply to other rhythms, and to event related activity. For example, the isolation of an ERN-like topography and time course separates the variability associated with response-locked error-related processing from stimulus-locked and background activity (Debener, et al., 2005).
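Extracting a band-limited feature such as alpha power from an unmixed IC time course can be sketched as follows, assuming NumPy, consecutive non-overlapping windows (e.g. trials) and a Hann-windowed FFT; the function name and the toy signal are hypothetical. The resulting per-window power values are what would then be convolved with an HRF and used as a predictor.

```python
import numpy as np

def band_power(x, fs, lo, hi, win):
    """Mean spectral power of `x` in [lo, hi] Hz for consecutive windows of `win` samples."""
    n_win = len(x) // win
    segs = x[:n_win * win].reshape(n_win, win)
    segs = segs - segs.mean(axis=1, keepdims=True)      # remove each window's mean
    spec = np.abs(np.fft.rfft(segs * np.hanning(win), axis=1)) ** 2
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return spec[:, band].mean(axis=1)

# Toy IC activation: a 10 Hz rhythm whose amplitude changes between 2-s windows.
rng = np.random.default_rng(1)
fs, win = 250, 500
amp = np.repeat([1.0, 3.0, 1.0, 3.0, 1.0], win)
t = np.arange(amp.size) / fs
ic = amp * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(amp.size)
alpha = band_power(ic, fs, 8.0, 12.0, win)
print(np.round(alpha))   # windows 1 and 3 carry much more alpha power
```

Computing the feature on the IC activation rather than on a posterior electrode is the point made above: the electrode would mix posterior alpha with mu and other 8 to 12 Hz sources.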
In other circumstances, where it is difficult to define a feature of interest, it seems more appropriate to employ ICA for artifact reduction only and to choose the back-projected, mixed EEG rather than the sources for correlation analysis with the fMRI (Eichele, et al., 2005). Both choices may yield comparable results (Bagshaw & Warbrick, 2007). However, one caveat is that temporal independence in the EEG does not necessarily equate with different regional generators of the activity in the fMRI, so that a variety of scenarios are conceivable in which unmixing the EEG could reduce the sensitivity of the EEG-fMRI correlation. Also, although the mixing problem is acknowledged and addressed in the EEG, these publications do not extend this treatment to the fMRI data, where the problem exists as well.
While the work cited above used ICA to decompose EEG data for a better integration with the fMRI BOLD signal, alternative approaches have been used more recently. Here, ICA is applied to address the spatio-temporal mixing problem that is inherent in fMRI data as well, while the EEG signals are not subjected to ICA. For instance, Mantini and colleagues (Mantini, Corbetta, Perrucci, Romani, & Del Gratta, 2009; Mantini, Perrucci, Del Gratta, Romani, & Corbetta, 2007) decomposed resting state and event related fMRI data with FastICA and subsequent component clustering across subjects, and extracted distributed regional networks. In the resting state data (Mantini, Perrucci, Del Gratta, Romani, & Corbetta, 2007), the BOLD signals of these networks were correlated with power fluctuations in the delta, theta, alpha, beta, and gamma bands from concurrently recorded EEG. Each network was characterized by a signature that involved variable contributions from the different bands. The interesting idea of this work is the perspective that wide-band EEG activity describes large-scale brain networks more comprehensively than narrow-band activity. In the event related data (Mantini, Corbetta, Perrucci, Romani, & Del Gratta, 2009), the amplitude fluctuation of the P300 revealed correlations with attentional networks. The approach illustrates how to meaningfully reduce the multiple testing problem in fMRI without defining regions of interest a priori. However, again, only one modality was unmixed; in this case the EEG remained mixed.
The two approaches described above each solve a part of the mixing problem and make way for refined spatio-temporal mapping. However, the choice to unmix only one modality but not the other is somewhat inconsistent with the reasoning that leads researchers to use ICA in the first place. We now lay out how parallel unmixing could be used for EEG-fMRI integration.
The motivation for implementing parallel EEG-fMRI decomposition is based on the data structure and the presumed physiology of signal generation in EEG and fMRI. First, consider that a concurrent experiment generates a multidimensional dataset consisting of about 10⁴ to 10⁵ volume elements by 10² to 10³ timepoints sampled in the fMRI, by 10² to 10³ timepoints by 10² to 10³ trials by 32-128 scalp channels sampled in the EEG, by a number of participants that constitute a group study, providing a rich and complex source of information. The utility of ICA lies in the visualization and exploratory assessment of such data. Second, we can assume that processing of simple stimuli and tasks produces widespread event-related responses that are spatially and temporally mixed across the brain (Baudena, Halgren, Heit, & Clarke, 1995; Halgren, et al., 1995a; Halgren, et al., 1995b; Halgren & Marinkovic, 1995; Makeig, Debener, Onton, & Delorme, 2004a; Onton, Westerfield, Townsend, & Makeig, 2006). The scalp EEG samples a spatially degraded version of brain activity, where the potential is a mixture of independent timecourses from large-scale synchronous field potentials. Similarly, the neurovascular transformation of the electrophysiological activity into hemodynamics provides temporally degraded and spatially mixed signals across the fMRI volume (Calhoun & Adali, 2006; McKeown, Hansen, & Sejnowski, 2003).
The spatial and temporal mixing in both modalities creates situations in which prediction of fMRI activity by EEG features in a mass-univariate, voxel-by-voxel framework may be difficult, since neither the predictor nor the response variables are likely to represent single, coupled sources of variability. Thus, one improvement towards a more symmetric treatment of the data is to unmix both modalities in parallel. We have developed such an analysis framework for multisubject data that employs Infomax ICA (Bell & Sejnowski, 1995) to recover a set of statistically independent maps from the fMRI, and independent time-courses from the EEG, separately (Eichele, et al., 2008a). We utilized the approach described above, i.e. we create aggregate data containing observations from all subjects, estimate a single set of ICs and then back-reconstruct these in the individual data (Calhoun, Adali, Pearlson, & Pekar, 2001; Schmithorst & Holland, 2004). The analysis is done on the single trial level rather than on averaged data (cf. Calhoun, Adali, Pearlson, & Kiehl, 2006), and it does not assume a joint mixing matrix for both modalities (cf. Calhoun, Adali, Pearlson, & Kiehl, 2006; cf. Moosmann, Eichele, Nordby, Hugdahl, & Calhoun, 2008). The components are matched across modalities by correlating their trial-to-trial modulation: we can use the convolved trial-by-trial modulation in EEG components to predict the fMRI, or, vice versa, employ deconvolution and single trial estimation on the fMRI component timecourses and use these as predictors for the trial-by-trial amplitude modulation in the EEG.
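Matching components across modalities by their trial-to-trial modulation reduces to a correlation matrix between single-trial amplitudes of EEG and fMRI components. The sketch below assumes NumPy and that the single-trial fMRI amplitudes are already on the same trial grid as the EEG amplitudes (e.g. obtained by deconvolution); names and data are hypothetical.

```python
import numpy as np

def crossmodal_correlations(eeg_amp, fmri_amp):
    """Pearson correlations between single-trial amplitudes of EEG and fMRI ICs.

    eeg_amp  : trials x n_eeg_components
    fmri_amp : trials x n_fmri_components
    Returns an n_eeg x n_fmri correlation matrix.
    """
    ze = (eeg_amp - eeg_amp.mean(0)) / eeg_amp.std(0)
    zf = (fmri_amp - fmri_amp.mean(0)) / fmri_amp.std(0)
    return ze.T @ zf / eeg_amp.shape[0]

# Toy data: EEG IC 1 and fMRI IC 2 share their trial-by-trial modulation.
rng = np.random.default_rng(5)
n_trials = 120
eeg = rng.standard_normal((n_trials, 4))
fmri = rng.standard_normal((n_trials, 6))
fmri[:, 2] += 1.5 * eeg[:, 1]
R = crossmodal_correlations(eeg, fmri)
i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
print(i, j)   # 1 2
```

In practice the resulting correlations would need correction for the number of component pairs tested.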
Parallel group ICA provides a means to disentangle and visualize large scale networks both in their spatial and temporal form, given coherent neuronal sources jointly express scalp electrophysiologic and hemodynamic features (Calhoun & Adali, 2006; Debener, Ullsperger, Siegel, & Engel, 2006; Makeig, Debener, Onton, & Delorme, 2004a; McKeown, Hansen, & Sejnowski, 2003; Onton, Westerfield, Townsend, & Makeig, 2006).
Joint ICA (jICA) is another approach, which enables the joint decomposition of multi-modal data that have been collected from the same sample of subjects. Here, we primarily consider a set of extracted features from each subject’s data, and these data form the multiple observations. Given two (or more) sets of group data, $\mathbf{X}_F$ and $\mathbf{X}_G$, we concatenate the two datasets side-by-side to form $\mathbf{X}_J$ and write the likelihood as

$$L(\mathbf{W}) = \prod_{n=1}^{N}\prod_{v=1}^{V} p_{J,v}(u_{J,v}) \qquad (9)$$

where $\mathbf{u}_J = \mathbf{W}\mathbf{x}_J$. Here, we use the notation in terms of random variables such that each entry in the vectors $\mathbf{u}_J$ and $\mathbf{x}_J$ corresponds to a random variable, which is replaced by the observation for each sample $n = 1, \ldots, N$ as rows of matrices $\mathbf{U}_J$ and $\mathbf{X}_J$. When posed as a maximum likelihood problem, we estimate a joint unmixing matrix $\mathbf{W}$ such that the likelihood $L(\mathbf{W})$ is maximized.

Let the two datasets $\mathbf{X}_F$ and $\mathbf{X}_G$ have dimensionality $N \times V_1$ and $N \times V_2$ (so that $V = V_1 + V_2$); then we have

$$L(\mathbf{W}) = \prod_{n=1}^{N}\left(\prod_{v=1}^{V_1} p_{F,v}(u_{F,v}) \prod_{v=1}^{V_2} p_{G,v}(u_{G,v})\right) \qquad (10)$$
Depending on the data types in question, the above formula can be made more or less flexible. This formulation characterizes the basic jICA approach and assumes that the sources associated with the two data types modulate the same way across N subjects. Note that the assumption of the same linear covariation for both modalities is fairly strong. However, it has the advantage of providing a parsimonious way to link multiple data types and has been demonstrated in a variety of cases with meaningful results, for example for fusing ERPs with fMRI contrast images (Calhoun, Adali, Pearlson, & Kiehl, 2006). The extension of joint ICA for concurrent single trial data is described in a simulation study by Moosmann and colleagues (Moosmann, Eichele, Nordby, Hugdahl, & Calhoun, 2008). Although robust results can be acquired with this methodology, broader validity of the application to real single trial data would require an iterative procedure with estimation and removal of the hemodynamic transfer function prior to fusion (see below).
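The basic jICA model can be sketched by concatenating the two feature sets side by side and estimating a single unmixing matrix. In this NumPy sketch, a minimal symmetric FastICA stands in for whichever ICA algorithm is used in practice (e.g. Infomax), the number of subjects equals the number of components (real analyses first reduce N with PCA), and the features and subject loadings are simulated; all names are hypothetical.

```python
import numpy as np

def fastica_sym(X, n_iter=500, tol=1e-8, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity); X is mixtures x samples."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    V = (E / np.sqrt(d)).T                     # whitening matrix
    Z = V @ Xc
    n, T = Z.shape
    W = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, n)))[0]
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / T - np.diag((1 - G ** 2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt                         # symmetric decorrelation
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if done:
            break
    return W @ V

def joint_ica(XF, XG, **kw):
    """Joint ICA: one unmixing matrix for side-by-side concatenated features.

    XF : N x V1 features (e.g. fMRI contrast maps, one row per subject)
    XG : N x V2 features (e.g. ERPs, one row per subject)
    Returns (A, UF, UG): shared N x N mixing matrix (subject loadings) and
    the modality-specific parts of each joint source.
    """
    XJ = np.hstack([XF, XG])                   # N x (V1 + V2)
    W = fastica_sym(XJ, **kw)
    UJ = W @ (XJ - XJ.mean(axis=1, keepdims=True))
    V1 = XF.shape[1]
    return np.linalg.inv(W), UJ[:, :V1], UJ[:, V1:]

# Toy data: both modalities share the same subject loadings A.
rng = np.random.default_rng(11)
N, V1, V2 = 3, 3000, 1000
SF = rng.laplace(size=(N, V1))
SG = rng.laplace(size=(N, V2))
A = np.array([[1.0, 0.4, 0.2],
              [0.3, 1.0, 0.5],
              [0.6, 0.2, 1.0]])                # hypothetical subject loadings
A_est, UF, UG = joint_ica(A @ SF, A @ SG)
S_true = np.hstack([SF, SG])
C = np.abs(np.corrcoef(S_true, np.hstack([UF, UG]))[:N, N:])
print(np.round(C.max(axis=1), 2))              # each joint source recovered
```

The single matrix W applied to both halves of the concatenated data is exactly the "same linear covariation across subjects" assumption discussed above.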
The methodological and conceptual development in the field of multimodal integration is ongoing, and ICA plays a prominent role in this effort. With respect to EEG-fMRI integration, we highlight two issues that are currently being addressed: first, the utility of hemodynamic deconvolution and single trial estimation in the fMRI, and second, the need for more flexible modeling of the EEG-fMRI coupling.
An outstanding problem that was not addressed in the work described above concerns the assumptions made about the hemodynamic response function (HRF). Most previous concurrent EEG-fMRI studies assumed a generic HRF, describing the transfer from the transient neuronal response evoked by discrete sensory stimulation to the hemodynamic activation (e.g. derived from Boynton, Engel, Glover, & Heeger, 1996), whereby the parameters of the HRF, such as the shape and latency of the peak and undershoot, are implicitly assumed to be fixed across the entire brain and across subjects. However, considerable variability of the HRF exists across regions and subjects (Aguirre, Zarahn, & D’Esposito, 1998; Glover, 1999; Handwerker, Ollinger, & D’Esposito, 2004). Inclusion of temporal and dispersion derivative terms of the HRF in the model can alleviate such differences at the first level. However, this ignores the potential for amplitude bias induced by model mismatch due to variable hemodynamic delays, and is typically not helpful for second-level random effects analyses (Calhoun, Stevens, Pearlson, & Kiehl, 2004). Therefore, to achieve a more sensitive and less biased analysis, it is desirable to estimate subject- and region-specific HRFs. Another consideration is that the conditions that elicit a response conforming to the canonical HRF do not necessarily hold for intrinsic/resting state activity or for more complex cognitive operations. An illustrative example of estimating the HRF from EEG data has been published by de Munck and colleagues (de Munck, et al., 2008; de Munck, et al., 2007). Here, HRFs were directly estimated from the alpha power modulation in concurrent EEG-fMRI data, and indeed showed systematic latency and shape differences from the canonical HRF.
The regions with significant HRFs were much more widespread than previously reported, which suggests that the sparser activation patterns in earlier studies (Goldman, Stern, Engel, & Cohen, 2002; Moosmann, et al., 2003) may have resulted from poor sensitivity.
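To make the latency argument concrete, the sketch below fits a noise-free response that is delayed relative to the model HRF, once with the canonical regressor alone and once with an added temporal derivative. It is an illustration only, not the analysis of the cited studies: the double-gamma form, its parameters (peak and undershoot gamma shapes of 6 and 16, undershoot ratio 6, loosely modeled on commonly used canonical HRFs), the 0.1 s sampling step and the helper names are all assumptions.

```python
import math

import numpy as np


def gamma_pdf(t, shape):
    """Unit-scale gamma density, set to zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=6.0):
    """Peak gamma minus a scaled undershoot gamma (assumed, SPM-like defaults)."""
    return gamma_pdf(t, peak) - gamma_pdf(t, undershoot) / ratio


def amplitude_estimates(delay, dt=0.1, length=32.0):
    """Fit a delayed, unit-amplitude response with and without a temporal derivative.

    Returns (canonical-only beta, array([canonical beta, derivative beta])).
    """
    t = np.arange(0.0, length, dt)
    canonical = double_gamma_hrf(t)
    derivative = np.gradient(canonical, dt)  # numerical temporal derivative
    y = double_gamma_hrf(t - delay)          # "true" response: shifted, noise-free
    b_can = np.linalg.lstsq(canonical[:, None], y, rcond=None)[0][0]
    X = np.column_stack([canonical, derivative])
    b_both = np.linalg.lstsq(X, y, rcond=None)[0]
    return b_can, b_both


for delay in (0.0, 1.0, 2.0):
    b_can, (b1, b2) = amplitude_estimates(delay)
    print(f"delay {delay:.1f} s: canonical-only beta {b_can:.3f}, "
          f"with derivative beta {b1:.3f} (derivative coef {b2:.3f})")
```

The canonical-only coefficient equals 1 only at zero delay and drops below 1 as the delay grows. The derivative coefficient absorbs the shift (it is negative for a late response), but because the canonical regressor and its derivative are nearly orthogonal, the canonical coefficient itself barely changes, so the amplitude estimate remains biased, which is the problem discussed by Calhoun et al. (2004).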
Apart from estimating the HRF from the EEG, event-related HRFs can be estimated from independent component timecourses through deconvolution (Eichele, et al., 2008b). Whereas voxel-by-voxel deconvolution typically requires noise-tolerant models, ICA forms large regions of interest with quasi-denoised timecourses, so that deconvolution can be done simply by forming a convolution matrix from the stimulus onsets over an assumed HRF duration. fMRI single-trial response amplitudes can then be recovered by fitting onto the IC timecourse a design matrix X that contains a separate predictor for each trial (its onset convolved with the estimated HRF), and estimating the scaling coefficients β of the multiple linear regression model y = Xβ + ε by least squares. In this approach, the HRF is estimated and then effectively removed from the hemodynamic data, leaving only the amplitude modulation across trials, which has the same trial-by-trial resolution as the EEG dynamics (see figure 4).
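The two-step procedure described above (deconvolution of an event-related response from an IC timecourse, followed by single-trial amplitude estimation) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used by Eichele et al. (2008b): the timecourse is simulated, and the HRF duration of 12 scans, the jittered inter-trial intervals, the noise level and the function names (fir_design, deconvolve_hrf, single_trial_betas) are hypothetical. Drift and baseline regressors that a real analysis would include are omitted.

```python
import numpy as np

N_LAGS = 12  # assumed HRF duration in scans (e.g., 24 s at an assumed TR of 2 s)


def fir_design(onsets, n_scans, n_lags=N_LAGS):
    """Convolution (FIR) matrix: column k marks scan onset + k for every trial."""
    X = np.zeros((n_scans, n_lags))
    for onset in onsets:
        for k in range(n_lags):
            if onset + k < n_scans:
                X[onset + k, k] += 1.0
    return X


def deconvolve_hrf(y, onsets, n_lags=N_LAGS):
    """Least-squares estimate of the average event-related response, lag by lag."""
    X = fir_design(onsets, len(y), n_lags)
    hrf, *_ = np.linalg.lstsq(X, y, rcond=None)
    return hrf


def single_trial_betas(y, onsets, hrf):
    """One predictor per trial (its onset convolved with `hrf`); return LS betas."""
    n_scans = len(y)
    X = np.zeros((n_scans, len(onsets)))
    for j, onset in enumerate(onsets):
        stop = min(n_scans, onset + len(hrf))
        X[onset:stop, j] = hrf[: stop - onset]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves y = X @ beta + noise
    return beta


# --- Simulated IC timecourse (all values below are illustrative assumptions) ---
rng = np.random.default_rng(0)
n_scans = 300
true_hrf = np.array([0.0, 0.3, 0.8, 1.0, 0.8, 0.5, 0.2, 0.0, -0.15, -0.1, -0.05, 0.0])

onsets = []
t = 5
while t + N_LAGS < n_scans:
    onsets.append(t)
    t += int(rng.integers(8, 17))  # jittered inter-trial interval (8-16 scans)

true_amps = 1.0 + 0.3 * rng.standard_normal(len(onsets))
y = 0.05 * rng.standard_normal(n_scans)
for amp, onset in zip(true_amps, onsets):
    y[onset:onset + N_LAGS] += amp * true_hrf  # overlapping responses add linearly

hrf_est = deconvolve_hrf(y, onsets)
betas = single_trial_betas(y, onsets, hrf_est)
print("estimated HRF:", hrf_est.round(2))
print("corr(betas, true amplitudes):", np.corrcoef(betas, true_amps)[0, 1].round(3))
```

Because the deconvolved HRF carries the average response amplitude, the single-trial betas are expressed relative to that average (they scatter around 1); these trial-by-trial values are what can then be related to EEG single-trial measures.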
There are various ways to conceive of the coupling between electrophysiological and hemodynamic signals. Linear regression between fMRI and EEG is commonly used in combination with ICA to investigate links between the modalities (e.g., Debener et al., 2006). In searching for such one-to-one mappings, it is assumed that a particular EEG feature is related to a particular fMRI network. Although this is justifiable because the fMRI maps are distinct from one another, as are the EEG topographies, and both are stationary, it appears oversimplified, since in principle several fMRI networks can affect several EEG signals. One explanation might be that one component is a generator, i.e., it has an open-field configuration and is large enough and close enough to the cortical surface to generate a scalp potential, while other components transiently and remotely modulate this source but are electrophysiologically silent at the scalp (otherwise they would induce topographic variability in the EEG). This explanation already implies that the fMRI networks are coupled with each other, which is reflected in their functional connectivity (Eichele, et al., 2008b; Jafri, Pearlson, Stevens, & Calhoun, 2008) and effective connectivity (Stevens, Kiehl, Pearlson, & Calhoun, 2007). A related explanation is that the fMRI networks might be considered spatially independent nodes in regionally distributed, functionally coherent source networks, such that many nodes can contribute directly or indirectly to the scalp potentials in a given paradigm/task. In this case, the EEG components, although temporally independent at the scalp, would not represent a single source but rather a weighted average of multiple spatially independent regional sources.
Both explanations are physiologically plausible (Baudena, Halgren, Heit, & Clarke, 1995; Halgren, et al., 1995a; Halgren, et al., 1995b; Halgren & Marinkovic, 1995), and although it is typically not the aim of integration-by-prediction to directly solve the inverse problem, it is helpful to consider how the topographies and timecourses relate to each other.
If we assume that many fMRI networks affect a particular EEG feature, one way to address this is multiple regression with a design matrix containing all fMRI trial-by-trial modulations as predictors of EEG activity. The problem of collinearity between regressors can be addressed by orthogonalization (Andrade, Paradis, Rouquette, & Poline, 1999), stepwise selection, or decomposition of the fMRI component timecourses into a set of unrelated factors by means of, e.g., PCA or ICA (similar to the procedure in Seifritz, et al., 2002). However, such a treatment also makes the interpretation of the results less straightforward. Another option is canonical correlation analysis (CCA), although second-level inference is less well defined in this case. A joint (temporal) ICA of the sIC and tIC trial-by-trial modulations also appears feasible (Calhoun, Adali, Pearlson, & Kiehl, 2006; Moosmann, Eichele, Nordby, Hugdahl, & Calhoun, 2008), but has not yet been explored in detail. A conceptually more advanced approach, which however requires detailed model specification, would be to adapt dynamic causal models that explain EEG/ERP responses as changes in the effective connectivity between independent sources (Friston, 2005; Friston, Harrison, & Penny, 2003; Garrido, Kilner, Kiebel, & Friston, 2007).
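The collinearity remedies mentioned above can be illustrated on simulated data. The sketch below assumes hypothetical trial-by-trial amplitudes for three fMRI components (two of which share most of their variance) and a hypothetical single-trial EEG feature, and compares an ordinary multiple regression with serial Gram-Schmidt orthogonalization (one way of orthogonalizing regressors) and with regression on PCA scores. All names and simulation settings are assumptions, not the procedure of any cited study.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients and fitted values for y ~ X @ beta (no intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


def orthogonalize(X, order):
    """Serial Gram-Schmidt orthogonalization: each column (taken in `order`)
    is replaced by its residual after regressing it on the preceding ones."""
    Xo = X[:, order].astype(float)
    for j in range(1, Xo.shape[1]):
        prev = Xo[:, :j]
        coef, *_ = np.linalg.lstsq(prev, Xo[:, j], rcond=None)
        Xo[:, j] = Xo[:, j] - prev @ coef
    return Xo


def pca_scores(X):
    """Uncorrelated principal-component scores of column-centered regressors."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U * s


# --- Simulated data (all settings are illustrative assumptions) ---
rng = np.random.default_rng(0)
n_trials = 120
shared = rng.standard_normal(n_trials)
F = np.column_stack([
    shared + 0.3 * rng.standard_normal(n_trials),  # fMRI component 1
    shared + 0.3 * rng.standard_normal(n_trials),  # fMRI component 2 (collinear with 1)
    rng.standard_normal(n_trials),                 # fMRI component 3
])
F -= F.mean(axis=0)
eeg = 0.8 * F[:, 0] + 0.5 * F[:, 2] + 0.5 * rng.standard_normal(n_trials)
eeg -= eeg.mean()

beta_full, fit_full = ols(F, eeg)
beta_orth, fit_orth = ols(orthogonalize(F, [0, 1, 2]), eeg)
beta_pca, fit_pca = ols(pca_scores(F), eeg)

print("condition number of F:", round(float(np.linalg.cond(F)), 1))
print("full model betas:     ", beta_full.round(2))
print("orthogonalized betas: ", beta_orth.round(2))
print("PCA-score betas:      ", beta_pca.round(2))
```

All three models yield identical fitted values because they span the same regressor space; only the attribution of shared variance changes. After serial orthogonalization, the first regressor receives all shared variance (its coefficient equals that of a simple regression on it alone), while the last one keeps its full-model coefficient, and the PCA coefficients refer to mixtures of components. This is the loss of interpretability noted above.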
ICA is a powerful data-driven approach that has been successfully used to analyze EEG, fMRI, and simultaneous EEG-fMRI data. It has been shown repeatedly that the two modalities can be integrated at the (statistical) source level provided by ICA. The overview provided here demonstrates the utility and diversity of the existing ICA-based approaches for the analysis of brain imaging data. We identified two key challenges in the field: the estimation of the HRF and more flexible ways of integrating the data. Multimodal integration by means of ICA is still under development; however, we expect ICA to contribute to more refined ways of achieving comprehensive joint mapping of electrophysiological and hemodynamic signals.
This work was supported by a grant from the L. Meltzer university fund (801616) to TE, and by the National Institutes of Health, under grants 1 R01 EB 000840, 1 R01 EB 005846, and 1 R01 EB 006841 to VDC.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.