Independent component analysis (ICA) has become an increasingly utilized approach for analyzing brain imaging data. In contrast to the widely used general linear model (GLM), which requires the user to parameterize the data (e.g., the brain's response to stimuli), ICA, by relying upon a general assumption of independence, allows the user to be agnostic regarding the exact form of the response. In addition, ICA is intrinsically a multivariate approach; hence each component provides a grouping of brain activity into regions that share the same response pattern, thus providing a natural measure of functional connectivity. A wide variety of ICA approaches have been proposed; in this paper we focus upon two distinct methods. The first part of this paper reviews the use of ICA for making group inferences from fMRI data. We provide an overview of current approaches for utilizing ICA to make group inferences, with a focus upon the group ICA approach implemented in the GIFT software. In the next part of this paper, we provide an overview of the use of ICA to combine or fuse multimodal data. ICA has proven particularly useful for data fusion of multiple tasks or data modalities such as single nucleotide polymorphism (SNP) data or event-related potentials. As demonstrated by a number of examples in this paper, ICA is a powerful and versatile data-driven approach for studying the brain.
Independent component analysis (ICA) is increasingly utilized as a tool for evaluating the hidden spatiotemporal structure contained within brain imaging data. In this paper, we first provide a brief overview of ICA and of ICA as applied to functional magnetic resonance imaging (fMRI) data. Next, we discuss group ICA and ICA for data fusion, with an emphasis upon the methods developed within our group, and place them within the larger context of the many alternative approaches currently in use.
ICA is a statistical method used to discover hidden factors (sources or features) from a set of measurements or observed data such that the sources are maximally independent. Typically, it assumes a generative model where observations are assumed to be linear mixtures of independent sources, and unlike principal component analysis (PCA), which only decorrelates the data, ICA works with higher-order statistics to achieve independence. Uncorrelatedness is only part of the way to independence: if two random variables are independent they are uncorrelated; however, not all uncorrelated random variables are independent. An intuitive example of ICA can be given by a scatter-plot of two independent signals s1 and s2. Fig. 1a (left, middle) shows the projections for PCA and ICA, respectively, for a linear mixture of s1 and s2, and Fig. 1a (right) shows a plot of the two independent signals (s1, s2) in a scatter-plot. PCA finds the orthogonal vectors u1, u2 but cannot identify the independent vectors. In contrast, ICA is able to find the independent vectors a1, a2 of the linearly mixed signals (s1, s2) and is thus able to restore the original sources.
A typical ICA model assumes that the source signals are not observable, are statistically independent and non-Gaussian, and are combined by an unknown, but linear, mixing process. Consider an observed M-dimensional random vector denoted by x = [x1, x2, ..., xM]^T, which is generated by the ICA model:

x = As,   (1)

where s = [s1, s2, ..., sN]^T is an N-dimensional vector whose elements are the random variables that refer to the independent sources and A (M × N) is an unknown mixing matrix. Typically M ≥ N, so that A is usually of full rank. The goal of ICA is to estimate an unmixing matrix W (N × M) such that

y = Wx   (2)

is a good approximation to the 'true' sources s.
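As a concrete illustration of the generative model and its inversion, the sketch below mixes two independent non-Gaussian sources and recovers them with a fixed-point iteration in the style of FastICA (tanh nonlinearity, symmetric decorrelation). The sources, the mixing matrix, and the algorithm settings are all hypothetical choices made for this example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform, hence sub-Gaussian) sources.
n = 5000
S = rng.uniform(-1.0, 1.0, size=(2, n))

# Unknown square mixing matrix; observations follow x = A s.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Whitening: remove the mean and decorrelate to unit variance.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# FastICA-style fixed-point iteration with a tanh nonlinearity.
W = rng.standard_normal((2, 2))
for _ in range(200):
    Y = W @ Z
    g, gp = np.tanh(Y), 1.0 - np.tanh(Y) ** 2
    W = (g @ Z.T) / n - np.diag(gp.mean(axis=1)) @ W
    # Symmetric decorrelation: W <- (W W^T)^(-1/2) W.
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt

# Estimated sources y = W z match the true sources only up to
# permutation, sign, and scale -- the standard ICA ambiguities.
Y = W @ Z
```

Because independence says nothing about the ordering or amplitude of the sources, a recovered component can only be matched to a true source by correlation magnitude, which is why ICA components for fMRI must be sorted or labeled after estimation.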
Achieving ICA requires statistical information higher than second order; this information can either be generated implicitly using nonlinear functions or computed explicitly. Algorithms that use nonlinear functions to generate higher-order statistics have been the most popular ICA approaches, and a number of algorithms have been derived based on maximum likelihood estimation, maximization of information transfer, mutual information minimization, and maximization of non-Gaussianity. The first three approaches are equivalent to each other, and they coincide with maximization of non-Gaussianity when the unmixing matrix W is constrained to be orthogonal (Adali et al., 2008). The algorithms derived within these formulations have optimal large sample properties in the maximum likelihood sense when the nonlinearity within each algorithm is chosen to match the source density. Two commonly used ICA algorithms derived within these formulations are Infomax (Bell and Sejnowski, 1995; Lee et al., 1999) and FastICA (Hyvarinen and Oja, 1997). Another popular algorithm is joint approximate diagonalization of eigenmatrices (JADE) (Cardoso and Souloumiac, 1993), which relies on explicit computation of fourth-order statistical information. Both Infomax and FastICA typically work with a fixed nonlinearity or one that is selected from a small set, e.g., two in the case of extended Infomax (Bell and Sejnowski, 1995; Lee et al., 1999). These algorithms typically work well for symmetric distributions and are less accurate for skewed distributions and for sources close to Gaussian. Since the optimality condition requires the nonlinearities to match the form of the source distributions, a number of adaptation strategies have been developed. A flexible ICA method using a generalized Gaussian density model is introduced in (Choi et al., 2000).
Other flexible extensions of ICA include non-parametric ICA (Boscolo et al., 2001) and kernel ICA (Bach and Jordan, 2002) as well as approaches introduced in (Hong et al., 2005; Vlassis and Motomura, 2001). The variety of recent approaches for performing ICA and its applications in areas as diverse as biomedicine, astrophysics, and communications demonstrates the vitality of research in this area.
Following its first application to fMRI (McKeown et al., 1998), ICA has been successfully utilized in a number of exciting fMRI applications, especially in those that have proven challenging with standard regression-type approaches; for a recent collection of examples see (Calhoun and Adali, 2006; McKeown et al., 2003). Spatial ICA finds systematically non-overlapping, temporally coherent brain regions without constraining the shape of the temporal response. The temporal dynamics of many fMRI experiments are difficult to study due to the lack of a well-understood brain-activation model, whereas ICA can reveal inter-subject and inter-event differences in the temporal dynamics. A strength of ICA is its ability to reveal dynamics for which a temporal model is not available (Calhoun et al., 2002). A comparison of the GLM approach and ICA as applied to fMRI analysis is shown in Fig. 1b.
Independent component analysis is used in fMRI modeling to study the spatio-temporal structure of the signal, and it can be used to discover either spatially or temporally independent components (Jung, 2001). Most applications of ICA to fMRI use the former approach and seek components that are maximally independent in space. In such a setting, we let the observation data matrix be X, an N × M matrix (where N is the number of time points and M is the number of voxels), as shown in Fig. 1b. The aim of fMRI component analysis is then to factor the data matrix into a product of a set of time courses and a set of spatial patterns. An illustration of how ICA decomposes the data into a parsimonious summary of images and time courses is shown in Fig. 1c. The number of components is a free parameter, which has previously been either empirically determined or estimated. There are a number of approaches for estimating the number of components using information-theoretic criteria (Beckmann and Smith, 2004; Li et al., 2007).
Since the introduction of ICA for fMRI analysis, the choice of spatial or temporal independence has been controversial. However, the two options are merely two different modeling assumptions. McKeown et al. argued that the sparse distributed nature of the spatial pattern for typical cognitive activation paradigms would work well with spatial ICA (sICA). Furthermore, since the prototypical confounds are also sparse and localized, e.g., vascular pulsation (signal localized to larger veins that move as a result of cardiac pulsation) or breathing-induced motion (signal localized to strong tissue contrast near discontinuities: "tissue edges"), the Infomax algorithm with a sparse prior is very well suited for spatial analysis (Petersen et al., 2000); it has also been used for temporal ICA (Calhoun et al., 2001c), as have decorrelation-based algorithms (Petersen et al., 2000). Stone et al. proposed a method which attempts to maximize both spatial and temporal independence (Stone et al., 1999). An interesting combination of spatial and temporal ICA was pursued by Seifritz et al. (2002); they used an initial sICA to reduce the spatial dimensionality of the data by locating a region of interest, in which they then performed temporal ICA to study in more detail the structure of the non-trivial temporal response in the human auditory cortex.
Unlike univariate methods (e.g., regression analysis, Kolmogorov–Smirnov statistics), ICA does not naturally generalize to a method suitable for drawing inferences about groups of subjects. For example, when using the general linear model, the investigator specifies the regressors of interest, and so drawing inferences about group data comes naturally, since all individuals in the group share the same regressors. In ICA, by contrast, different individuals in the group will have different time courses, and these will be sorted differently, so it is not immediately clear how to draw inferences about group data using ICA. Despite this, several ICA multi-subject analysis approaches have been proposed (Beckmann and Smith, 2005; Calhoun et al., 2001a; Calhoun et al., 2001b; Calhoun et al., 2004a; Esposito et al., 2005; Guo and Giuseppe, In Press; Lukic et al., 2002; Schmithorst and Holland, 2004; Svensen et al., 2002). The various approaches differ in terms of how the data are organized prior to the ICA analysis, what types of output are available (e.g., single-subject contributions, group averages), and how the statistical inference is made.
A summary of several group ICA approaches is provided in Fig. 2. Approaches can be grouped into five categories. Fig. 2a illustrates approaches that perform single-subject ICA and then attempt to combine the output into a group post hoc by using approaches such as self-organized clustering or spatial correlation of the components (Calhoun et al., 2001a; Esposito et al., 2005). This has the advantage of allowing for unique spatial and temporal features, but has the disadvantage that, since the data are noisy, the components are not necessarily unmixed in the same way for each subject. The other four approaches involve an ICA computed on the group data directly. Temporal concatenation (Fig. 2b) and spatial concatenation (Fig. 2c) have both been examined. The advantage of these approaches is that they perform one ICA, which can then be divided into subject-specific parts; hence the comparison of subject differences within a component is straightforward. The temporal concatenation approach allows for unique time courses for each subject but assumes common group maps, whereas the spatial concatenation approach allows for unique maps but assumes common time courses. Although these are really just two different approaches for organizing the data, temporal concatenation appears to work better for fMRI data (Schmithorst and Holland, 2004), most likely because the temporal variations in the fMRI signal are much larger than the spatial variations, and it has been widely used for group ICA of fMRI data.
The temporal concatenation approach is implemented in the MELODIC software (http://www.fmrib.ox.ac.uk/fsl/) and also in the GIFT Matlab software (http://icatb.sourceforge.net/). The GIFT software additionally implements a back-reconstruction step which produces subject-specific images (Calhoun et al., 2001b). This enables a comparison of both the time courses and the images for one group or multiple groups (Calhoun et al., 2008) (see the simulation in (Calhoun et al., 2001b), which shows that ICA with temporal concatenation plus back-reconstruction can capture variations in subject-specific images). The approach implemented in GIFT thus trades off the use of a common model for the spatial maps against the difficulties of combining single-subject ICA. An in-between approach would be to utilize temporal concatenation separately for each group (Celone et al., 2006), although in this case matching the components post hoc again becomes necessary. The approach in Fig. 2d involves averaging the data prior to performing ICA. This approach is less computationally demanding, but makes a more stringent assumption that requires a common time course and a common spatial map. Finally, the tensorial approach in Fig. 2e (implemented in MELODIC) involves estimating a common time course and a common image for each component but allows for a subject-specific parameter to be estimated.
Higher-order tensor decompositions (also known as multidimensional, multi-way, or n-way decompositions) have recently received renewed interest, although their adaptation to group and multi-group fMRI data is still being explored. Fig. 2e shows an approach based upon a three-dimensional tensor that has been developed to estimate a single spatial, temporal, and subject-specific 'mode' for each component in an attempt to capture the multidimensional structure of the data in the estimation stage (Beckmann and Smith, 2005). This approach, however, may not work as well (without additional preprocessing) if the time courses differ between subjects, such as in a resting-state study. A detailed comparison of several group ICA approaches, including temporal concatenation and tensor ICA, is provided in a recent paper (Guo and Giuseppe, In Press).
In the remainder of this paper, we focus on the group ICA approach implemented in the GIFT software (Calhoun et al., 2001b), which uses multiple data reduction steps following data concatenation to reduce the computational load, along with back-reconstruction and statistical comparison of individual maps and time courses following ICA estimation. An example group ICA analysis of nine subjects performing a four cycle alternating left/right visual stimulation task is presented in Fig. 3 (from Calhoun et al., 2001b). Separate components for primary visual areas on the left and the right visual cortex (depicted in red and blue, respectively) were consistently task-related with respect to the appropriate stimulus. A large region (depicted in green) including occipital areas and extending into parietal areas appeared to be sensitive to changes in the visual stimuli. Additionally some visual association areas (depicted in white) had time courses that were not task related. As we discuss later, group inference or comparison of groups can be performed by performing statistics on either the ICA images or the time courses.
Many studies are currently collecting multiple types of imaging data from the same participants. Each imaging method reports on a limited domain and typically provides both common and unique information about the problem in question. Approaches for combining or fusing data in brain imaging can be conceptualized as occupying an analytic spectrum, with meta-analysis (highly distilled data) to examine convergent evidence at one end and large-scale computational modeling (highly detailed theoretical modeling) at the other (Husain et al., 2002). In between are methods that attempt to perform a direct data fusion (Horwitz and Poeppel, 2002). One promising data fusion approach is to first process each image type and extract features from the different modalities. These features are then examined for relationships among the data types at the group level (i.e., variations among individuals or between patients and controls). This approach allows us to take advantage of the 'cross'-information among data types and, when performing multimodal fusion, provides a natural link among different data types (Arndt, 1996; Calhoun et al., 2006c; Savopol and Armenakis, 2002).
A natural set of tools for performing data fusion include those that transform data matrices into a smaller set of modes or components. Such approaches include those based upon singular value decomposition (SVD) (Friston et al., 1996; McIntosh et al., 1996) as well as ICA (McKeown et al., 1998). An advantage of ICA over variance-based approaches like SVD or PCA is the use of higher-order statistics to reveal hidden structure. In this paper, we describe two approaches for data fusion, joint ICA and parallel ICA. We show two examples, the first one involving event-related potential (ERP) and fMRI data and a second one on fMRI and genetic data.
In this section, we review the methods behind group ICA, joint ICA, and parallel ICA.
As we mentioned earlier, the group ICA approach implemented in GIFT incorporates temporal concatenation plus back-reconstruction. Fig. 4 (top) provides a graphical representation of the GIFT approach that essentially involves estimating a mixing matrix which has partitions that are unique to each subject. Once the mixing matrix is estimated, the component maps for each subject can be computed by projecting the single subject data onto the inverse of the partition of the mixing matrix that corresponds to that subject. In the end this provides subject specific time courses and images which can be used to make group and inter-group inferences.
An additional aspect to consider is that GIFT performs multiple data reduction steps, typically using PCA, primarily for computational reasons, to reduce the amount of required memory. Mathematically, let Xi = Fi^-1 Yi be the L × V reduced data matrix from subject i, where Yi is the K × V data matrix (containing the preprocessed and spatially normalized data), Fi^-1 is the L × K reducing matrix (determined by the PCA decomposition), V is the number of voxels, K is the number of fMRI time points, and L is the size of the time dimension following reduction. The reduced data from all subjects are concatenated into a single matrix and reduced using PCA to N dimensions (the number of components to be estimated). The reduced, concatenated matrix for the M subjects is

X = G^-1 [F1^-1 Y1 ; F2^-1 Y2 ; ... ; FM^-1 YM],   (3)

where G^-1 is an N × LM reducing matrix (also determined by a PCA decomposition) and is multiplied on the right by the LM × V concatenated data matrix for the M subjects. Following ICA estimation, we can write X = AS, where A is the N × N mixing matrix and S is the N × V component map. Substituting this expression for X into Eq. (3) and multiplying both sides by G, we have

GAS = [F1^-1 Y1 ; F2^-1 Y2 ; ... ; FM^-1 YM].   (4)

Partitioning the matrix G by subject provides the following expression:

[G1 AS ; G2 AS ; ... ; GM AS] = [F1^-1 Y1 ; F2^-1 Y2 ; ... ; FM^-1 YM].   (5)

We then write the equation for subject i by working only with the elements in partition i of the above matrices, such that

Gi A Si = Fi^-1 Yi.   (6)

The matrix Si in Eq. (6) contains the single-subject maps for subject i and is calculated from the following equation:

Si = (Gi A)^-1 Fi^-1 Yi.   (7)

We now multiply both sides of Eq. (6) by Fi and write

Yi = Fi Gi A Si,   (8)

which provides the ICA decomposition of the data from subject i, contained in the matrix Yi. The N × V matrix Si contains the N source maps and the K × N matrix Fi Gi A is the single-subject mixing matrix containing the time course for each of the N components.
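The reduction and back-reconstruction bookkeeping of Eqs. (3), (6), and (7) can be sketched in a few lines of linear algebra. To keep the example short, the ICA step is replaced by a trivial factorization (A = I); in GIFT an ICA algorithm estimates A. All dimensions, and the noise-free simulated data, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Dimensions: M subjects, K time points, V voxels, N sources; L = N here.
M, K, V, N = 4, 50, 1000, 3
L = N

# Ground truth: shared spatial maps, subject-specific time courses.
S_true = rng.laplace(size=(N, V))
Y = [rng.standard_normal((K, N)) @ S_true for _ in range(M)]  # noise-free

# Step 1: subject-level PCA reduction X_i = F_i^-1 Y_i (L x V).
F_inv, F, X_red = [], [], []
for Yi in Y:
    U, _, _ = np.linalg.svd(Yi, full_matrices=False)
    F_inv.append(U[:, :L].T)          # L x K reducing matrix
    F.append(U[:, :L])                # K x L back-projection
    X_red.append(U[:, :L].T @ Yi)

# Step 2: concatenate (LM x V) and apply group PCA to N dimensions (Eq. (3)).
stack = np.vstack(X_red)
Ug, _, _ = np.linalg.svd(stack, full_matrices=False)
G_inv = Ug[:, :N].T                   # N x LM reducing matrix
G = Ug[:, :N]                         # LM x N
X = G_inv @ stack                     # N x V

# Step 3: ICA would factor X = A S; a trivial stand-in keeps the sketch short.
A = np.eye(N)
S = X

# Step 4: back-reconstruct subject maps and data, S_i = (G_i A)^-1 F_i^-1 Y_i
# and Y_i = F_i G_i A S_i (Eqs. (6)-(8) in spirit, with pinv for (G_i A)^-1).
S_sub, Y_hat = [], []
for i in range(M):
    Gi = G[i * L:(i + 1) * L, :]
    Si = np.linalg.pinv(Gi @ A) @ F_inv[i] @ Y[i]
    S_sub.append(Si)
    Y_hat.append(F[i] @ Gi @ A @ Si)
```

Because the simulated data are exactly rank N, the two PCA stages are lossless and each subject's data are reconstructed exactly; with real, noisy data the reductions are approximations.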
Group inferences can be made by analyzing the subject-specific time courses and spatial maps. Fig. 4 (bottom) categorizes these analyses into three main areas. To evaluate spatial properties of a given component statistically, one can perform voxel-wise tests on the spatial maps (Fig. 4; bottom left). The time courses can be analyzed by fitting them to a GLM (the same model one would use for a GLM analysis, e.g., multiple regression), but instead of fitting to the voxel-wise data, the ICA time courses are the dependent variable (Fig. 4; bottom middle). The estimated parameters can then be entered into a second-level statistical analysis to make inferences about how much each component is modulated by a given stimulus, whether one component is modulated more by one stimulus than another, whether one group shows a stronger task-modulation than another, etc. (Stevens et al., 2007). This provides a powerful way to make inferences about the components. Finally, one may be interested in the non-task-related components (or the components in a resting-state study). In this case, one can evaluate differences in spectral power between groups (Garrity et al., 2007) or compute additional parameters such as the fractal dimension of the subject component time courses (Fig. 4; bottom right).
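The time-course analysis above can be sketched as a simple least-squares regression with the ICA time course as the dependent variable; the boxcar design, the noise level, and the simulated "component time course" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical design: a boxcar task regressor plus an intercept.
K = 120                                        # time points
box = (np.arange(K) % 40 < 20).astype(float)   # 20-on / 20-off blocks
design = np.column_stack([box, np.ones(K)])

# A simulated ICA time course that is task-modulated.
beta_true = 2.0
tc = beta_true * box + 0.3 * rng.standard_normal(K)

# Fit the GLM with the ICA time course as the dependent variable; the
# estimated task beta would then enter a second-level (e.g., t) test.
beta, *_ = np.linalg.lstsq(design, tc, rcond=None)
```

One beta per subject per component per regressor is produced this way, which is exactly the quantity entered into the second-level comparisons described in the text.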
Next, we introduce two approaches for performing data fusion with ICA: joint ICA and parallel ICA, both of which are implemented in the Matlab-based Fusion ICA Toolbox (FIT; http://icatb.sourceforge.net).
Joint ICA is an approach that enables us to jointly analyze multiple modalities that have all been collected in the same set of subjects. In our development, we primarily consider a set of extracted features from each subject's data; these features form the multiple observations, the vertical dimension in our group dataset. Given two sets of group data (there can be more than two; for simplicity, we first consider two), XF and XG, we concatenate the two datasets side-by-side to form XJ and write the likelihood as

L(W) = ∏n=1..N pJ(uJ(n)) |det(W)|,   (9)

where uJ = WxJ. Here, we use the notation in terms of random variables such that each entry in the vectors uJ and xJ corresponds to a random variable, which is replaced by the observation for each sample n = 1,...,N as rows of the matrices UJ and XJ. When posed as a maximum likelihood problem, we estimate a joint unmixing matrix W such that the likelihood L(W) is maximized.
Let the two datasets XF and XG have dimensionality N × V1 and N × V2. Assuming the sources from the two modalities are independent of one another, the joint density factors across modalities and we have

L(W) = ∏n=1..N pF(uF(n)) pG(uG(n)) |det(W)|,   (10)

where uF and uG are the partitions of uJ corresponding to the two data types. Depending on the data types in question, the above formula can be made more or less flexible.
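The jICA data organization can be sketched as follows, with hypothetical feature dimensions: the two modalities are concatenated side-by-side and a single unmixing matrix acts on both (here W is a random stand-in for the maximum likelihood estimate).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical feature matrices: N subjects, V1 ERP and V2 fMRI features.
N, V1, V2 = 20, 300, 500
X_F = rng.standard_normal((N, V1))
X_G = rng.standard_normal((N, V2))

# jICA concatenates the modalities side by side and estimates ONE
# unmixing matrix W (N x N) shared by both: U_J = W X_J.
X_J = np.hstack([X_F, X_G])
W = rng.standard_normal((N, N))       # stand-in for the ML ICA estimate
U_J = W @ X_J

# The joint sources split back into modality-specific parts that share
# the same mixing coefficients (rows of A = W^-1) across subjects.
U_F, U_G = U_J[:, :V1], U_J[:, V1:]
```

The shared-W constraint is exactly the "same modulation across subjects" assumption discussed below; relaxing it leads to parallel ICA.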
This formulation characterizes the basic jICA approach and assumes that the sources associated with the two data types (F and G) modulate the same way across N subjects (see Fig. 5a). The assumption of the same linear covariation for both modalities is fairly strong, however it has the advantage of providing a parsimonious way to link multiple data types and has been demonstrated in a variety of cases with meaningful results (Calhoun et al., 2006a; Calhoun et al., 2006b; Calhoun et al., 2006c; Eichele et al., 2008; Moosmann et al., 2008).
There are different ways to relax the assumptions made in the formulation above. For example, instead of constraining the two types of sources to share the same mixing coefficients, i.e., to have the same modulation across the N samples, we can require the modulation across samples for the sources from the two data types to be correlated but not necessarily the same (Correa et al., 2008). The approach we discuss next, called parallel ICA, provides this additional flexibility in modeling (Liu and Calhoun, 2006; Liu and Calhoun, 2007).
As noted earlier, the strong regularization imposed by the jICA framework can be relaxed in a number of ways to allow for more flexibility in the estimation. One such approach we developed is called parallel independent component analysis (paraICA). As a framework to investigate the integration of data from two imaging modalities, this method identifies components of both modalities and the connections between them by enhancing their intrinsic interrelationships (see Fig. 5b). We have applied this approach to link fMRI/ERP data and also fMRI and genetic data (single nucleotide polymorphism arrays) (Liu and Calhoun, 2006; Liu and Calhoun, 2007; Liu et al., In Press). Results show that paraICA provides stable results and can identify the linked components with relatively high accuracy.
In our initial application of paraICA, we defined a genetic independent component as a specific SNP association, i.e., a group of SNPs with various degrees of contribution, which partially determines a specific phenotype or endophenotype. This association can be modeled as a linear combination of SNP genotypes (Dawy et al., 2005; Lee and Batzoglou, 2003):

y = β1·snp1 + β2·snp2 + ... + βk·snpk,   (11)
where snpi is the genotype at a given locus and βi is the weight contributed by that SNP to the genetic association. Besides the independent component itself, the weight is also of interest, indicating the strength and type of influence, i.e., inhibitory or excitatory, on a phenotype. With the assumption that each component has an independent distribution pattern across the 367 SNPs, we constructed the SNP data matrix, X, in a participant-by-SNP orientation. The mixing process is presented in Eq. (12):

X = As Ss,   (12)
where n is the number of participants and m is the number of components. Each row xsi of X is a vector of 367 SNP genotypes for one participant, and each row ssi of Ss is a vector of 367 SNP weights for one genetic component. As is the n × m matrix of loading parameters, representing the influence of each SNP component on each participant.
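The dimension bookkeeping of Eq. (12) can be sketched as follows. The participant count, component count, and 0/1/2 genotype coding are hypothetical, and a least-squares fit of loadings to fixed components stands in for the actual ICA estimation.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical dimensions; 367 SNPs as in the text.
n_subj, n_snp, m = 63, 367, 5

# SNP genotypes coded numerically (e.g., minor-allele counts 0/1/2).
X_snp = rng.integers(0, 3, size=(n_subj, n_snp)).astype(float)

# Eq. (12): X = As Ss -- each row of Ss is one genetic component
# (367 SNP weights); As holds the loading of each component per subject.
# Here we only illustrate the shapes, fitting As to fixed components.
S_s = rng.standard_normal((m, n_snp))
A_s, *_ = np.linalg.lstsq(S_s.T, X_snp.T, rcond=None)
A_s = A_s.T                              # n_subj x m loading matrix
```

The columns of As are the subject-wise loadings that are later correlated with the fMRI loadings.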
In our current formulation, the relationship between brain function and the genetic component is calculated as the correlation between the columns of the fMRI Af matrix and the SNP As matrix (note that this can also be defined using other criteria, such as mutual information, to identify nonlinear coupling between the fMRI and SNP data). The objective function thus combines this correlation term with the entropy-based ICA maximization. The procedure of parallel ICA is illustrated in Fig. 5b, where data 1 is the fMRI data and data 2 is the SNP data. The algorithm proceeds such that the two demixing matrices W are updated separately; during the updates, the component pair with the highest correlation across the two modalities is selected and used to modify the update of the demixing matrices based on the correlation value, using appropriate stopping criteria.
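The correlation-based linking step can be sketched as follows, using hypothetical loading matrices in which one fMRI column and one SNP column share a common subject-wise modulation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical loadings: n subjects, m_f fMRI and m_s SNP components.
n, m_f, m_s = 60, 5, 4
shared = rng.standard_normal(n)      # modulation shared across modalities
A_f = rng.standard_normal((n, m_f))
A_s = rng.standard_normal((n, m_s))
A_f[:, 2] = shared + 0.2 * rng.standard_normal(n)
A_s[:, 1] = shared + 0.2 * rng.standard_normal(n)

# Correlate every fMRI loading column with every SNP loading column and
# pick the most strongly linked pair (paraICA uses this correlation to
# steer the updates of the two demixing matrices).
C = np.corrcoef(A_f.T, A_s.T)[:m_f, m_f:]
i, j = np.unravel_index(np.abs(C).argmax(), C.shape)
```

The selected pair (i, j) identifies the linked fMRI and SNP components; in paraICA this correlation value feeds back into the gradient updates rather than being computed once after convergence.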
In this section, we present examples of results from previous work using group ICA, joint ICA, and parallel ICA. The first example shows an analysis of a simulated driving paradigm, a case in which ICA is particularly useful because it is a naturalistic task that is difficult to parameterize for use in a traditional GLM analysis. fMRI data from 15 subjects were collected during a 10 min paradigm with alternating 1 min blocks of fixation, simulated driving, and watching (Calhoun et al., 2002). The ICA time courses were first analyzed to evaluate task-relatedness. Six components were identified, entered into voxel-wise one-sample t-tests, and are presented showing different dynamics in response to simulated driving (Fig. 6). In this case, ICA proved to be a very powerful approach for analysis and enabled us to develop a model for the neural correlates of simulated driving which relates nicely to existing models based upon behavioral data (Calhoun et al., 2005; Calhoun et al., 2002; Calhoun et al., 2004b; Meda et al., In Press).
The second example we present is an analysis of fMRI data collected during an auditory oddball task from two patient groups as well as healthy controls. Back-reconstructed component maps were entered into two-sample t-tests to evaluate pair-wise differences between the three groups. Results are presented for each group for two components, one in the temporal lobe and one comprising the default mode network (Fig. 7; left). We performed a multiple regression including the target, novel, and standard stimuli, and the mean of the estimated beta parameters is shown in Fig. 7 (right). We were also able to utilize these results to accurately differentiate healthy controls, schizophrenia patients, and patients with bipolar disorder. This example illustrates the ability of group ICA to differentiate groups and shows a comparison of both the spatial maps and the time courses.
The next example involves data fusion of ERP and fMRI data using joint ICA. The fMRI data and the 64-channel ERP data are entered into a joint ICA analysis. This provides us with not only a temporal ERP profile and a spatial fMRI profile, but also the topography from the ERP data, which provides additional information for interpretation (see Fig. 8). We also developed a method for parallel spatial and temporal independent component analysis of concurrent multi-subject single-trial EEG-fMRI that addresses the mixing problem in both modalities and integrates the data via correlation of the trial-to-trial modulation of the recovered fMRI maps and EEG time courses. The method affords extraction of a previously missed spatiotemporal process corresponding to the auditory onset response and subsequent low-level orienting/change detection. For full details see (Eichele et al., 2008).
Our final example shows results from a parallel ICA analysis of auditory oddball fMRI data and 367 SNPs from schizophrenia patients and healthy controls (Liu et al., In Press). When 43 healthy controls and 20 schizophrenia patients, all Caucasian, were studied, we found a correlation of 0.38 between one fMRI component and one SNP component. This fMRI component consisted of regions in the parietal lobe, right temporal lobe, and bilateral frontal lobe. The relevant SNP component was contributed to significantly by 10 SNPs located in genes including those coding for the nicotinic alpha-7 cholinergic receptor, aromatic amino acid decarboxylase, and disrupted in schizophrenia 1, among others. Both the fMRI and SNP components showed significant differences in loading parameters between the schizophrenia and control groups (p=0.0006 for the fMRI component; p=0.001 for the SNP component). The parallel ICA framework enabled us to identify interactions between brain functional and genetic information; our findings provide a proof-of-concept that genomic SNP factors can be investigated by using endophenotypic imaging findings in a multivariate format (Fig. 9).
ICA is a powerful data driven approach that can be used to analyze group fMRI data or to analyze multimodal data including fMRI, ERP, and genetic data. The examples demonstrate the utility and diversity of ICA-based approaches for the analysis of brain imaging data.
This research was supported in part by the National Institutes of Health (NIH), under grants 1 R01 EB 000840, 1 R01 EB 005846, and 1 R01 EB 006841.
Conflict of interest The authors declare that there are no conflicts of interest.