The widespread use of data-driven methods, such as independent component analysis (ICA), for the analysis of functional magnetic resonance imaging (fMRI) data has enabled deeper understanding of neural function. However, most popular ICA algorithms for fMRI analysis make several simplifying assumptions, thus ignoring sources of statistical information (types of diversity) and limiting their performance.
We propose the use of complex entropy rate bound minimization (CERBM) for the analysis of actual fMRI data in its native complex domain. CERBM achieves enhanced performance by exploiting the three types of diversity inherent to complex fMRI data (noncircularity, non-Gaussianity, and sample-to-sample dependence), but it produces results that are more variable than those of simpler methods. This motivates the development of a minimum spanning tree (MST)-based stability analysis that mitigates the variability of CERBM.
In order to validate our method, we compare the performance of CERBM with the popular CInfomax as well as complex entropy bound minimization (CEBM).
We show that by leveraging CERBM and the MST-based stability analysis, we are able to consistently produce components that have a greater number of activated voxels in physically meaningful regions and can more accurately classify patients with schizophrenia than components generated using simpler models.
Our results demonstrate the advantages of using ICA algorithms that can exploit all inherent types of diversity for the analysis of fMRI data when coupled with appropriate stability analyses.
Recent advances in technology have greatly enhanced both the spatial resolution of functional magnetic resonance imaging (fMRI) data and the speed at which it can be collected, thus allowing researchers the unprecedented ability to characterize and classify brain disorders including schizophrenia. Since relatively little is known about the exact processes underlying neural activation, it is important to make as few assumptions as possible when performing an analysis on such data, especially when the analysis is exploratory in nature. This incentivizes the use of data-driven methods, such as independent component analysis (ICA) (Mckeown et al., 1998). Despite the popularity of ICA for fMRI analysis, see, e.g., (McKeown et al., 2003, Rosazza and Minati, 2011, Calhoun and Adalı, 2012, Smith et al., 2013, Lee et al., 2013, Griffanti et al., 2014), most of the popular ICA algorithms used for fMRI analysis, such as Infomax (Bell and Sejnowski, 1995) and FastICA (Hyvärinen, 1999), make several implicit assumptions, which can substantially affect their performance. The first implicit assumption of many popular ICA algorithms for fMRI analysis involves the use of a fixed nonlinearity. A fixed nonlinearity can significantly bias the estimated sources when there is a difference between the distribution implied by the nonlinearity and the true source distribution (Adalı et al., 2015). A second assumption involves analyzing the fMRI data in the real domain. Since fMRI data is naturally complex, performing an analysis in the real domain will lead to a loss of information (Hoogenraad et al., 1998, Menon, 2002, Rauscher et al., 2005, Tomasi and Caparelli, 2007, Arja et al., 2010, Yu et al., 2015) stemming from the transformation of the complex signal to the real domain, and it ignores the potentially useful property of noncircularity, which fMRI data has been shown to exhibit (Li et al., 2011, Rodriguez et al., 2012, Loesch and Yang, 2013).
A final implicit assumption made by the majority of ICA algorithms for fMRI analysis is that there exists no sample-to-sample dependence between adjacent voxels. In fMRI data, activation patterns of blood-oxygenation-level-dependent (BOLD) signals tend to be spatially smooth and clustered; hence, ignoring the potentially informative property of spatial dependence can lead to inferior performance (Du et al., 2014). To avoid such assumptions, the adoption of a more general ICA framework is necessary. For this reason, we propose to use complex entropy rate bound minimization (CERBM) to study fMRI data in its native complex domain and take full advantage of all important statistical properties—types of diversity—inherent to complex fMRI data: dynamic non-Gaussianity, noncircularity, and sample-to-sample dependence (Fu et al., 2015). We compare the performance of the popular complex Infomax (CInfomax) (Calhoun and Adalı, 2002) with that of two more general algorithms: complex entropy bound minimization (CEBM) (Li and Adalı, 2010), a more recent ICA algorithm that exploits both noncircularity and dynamic non-Gaussianity but ignores sample-to-sample dependence, and CERBM. Note that we use the term dynamic non-Gaussianity to differentiate techniques that use a fixed nonlinearity, such as CInfomax, from those that use a dynamic nonlinearity, such as CEBM and CERBM.
Though able to exploit a greater number of statistical properties due to their more general source models, CEBM and CERBM are iterative in nature and can converge to local minima, hence producing solutions that are more variable than those of less general methods. This variability motivates a stability analysis in order to identify a result that represents a consistent estimate to use as a final output. A widely used approach for resolving this issue in spatial ICA is ICASSO (Himberg and Hyvärinen, 2003), an exploratory visualization method for investigating the relations between estimates. ICASSO runs an ICA algorithm several times, clusters the estimated components from all runs based on the absolute value of the correlation between estimates, and then selects the centrotype of each cluster as the consistent estimate. However, the direct use of centrotypes can lead to loss of information, since more than one type of component may be grouped into the same cluster, especially when the ICA model order is high, but only one type of component can be selected as the centrotype. Additionally, some approaches for selecting centrotypes with ICASSO can result in the selection of different runs for different components, which breaks the connection with the ICA mixing model. A related and much explored area of research, particularly for multi-subject fMRI data, is clustering analysis, see, e.g., (Sabuncu et al., 2010, Lashkari et al., 2010a,b, Deligianni et al., 2011, Varoquaux et al., 2013, Ng et al., 2013). Generally, these methods focus on matching estimated components across subjects, though it is a simple task to adjust the methods to match components based on runs. However, a natural goal when performing a stability analysis on multi-subject fMRI data is to determine a set of robust T-maps for use in further analyses, an issue that clustering methods, in general, ignore.
Additionally, the method for performing a stability analysis should be readily applicable to both real-valued and complex fMRI data, the latter being a case that is rarely explored in the context of clustering. In order to address such a diverse set of challenges, we discuss a novel minimum spanning tree (MST)-based approach to investigate the stability of ICA algorithms in both the real and complex domains (Du et al., 2014). Our method not only addresses the issue of determining robust T-maps in both the complex and real domains, but also provides a metric for interpreting the statistical reliability of estimated components for fMRI data. Note that an MST-based method has been previously used for clustering in the context of an fMRI analysis (Baumgartner et al., 2001); however, in that study the MST is used to cluster fMRI time courses, not fMRI spatial maps.
The remainder of the paper is organized as follows. In Section 2 we discuss the statistical basis of CERBM, the MST-based stability analysis method, as well as the complex fMRI data used in this study. Section 3 contains the discussion of the simulation results as well as the experimental results using actual complex-fMRI data. We present our conclusions in Section 4.
The noiseless spatial ICA model for complex fMRI data gathered from a single subject is given by

x(v) = As(v), v = 1, …, V, (1)
where the N latent sources, or spatial neural activation patterns, s(v) ∈ ℂ^N, are strictly linearly mixed by the mixing matrix, A ∈ ℂ^(N×N), producing observations, x(v) ∈ ℂ^N. We gather the observations together into a single complex matrix, X ∈ ℂ^(N×V), where the ith row is formed by flattening the ith brain volume of V voxels. By assuming spatial independence on the part of the latent sources, ICA seeks to estimate a demixing matrix, W, such that the estimated components are given by Ŝ = WX.
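As a quick numerical illustration of this model (our own sketch, not code from the paper), the following builds complex observations X = AS from random sources and recovers the sources exactly using the oracle demixing matrix W = A^(-1); in practice, ICA must estimate W from X alone:

```python
import numpy as np

rng = np.random.default_rng(0)
N, V = 4, 1000  # components, voxels

# Simulated complex sources (stand-ins for spatial activation maps).
S = rng.standard_normal((N, V)) + 1j * rng.standard_normal((N, V))

# Random complex mixing matrix A and observations X = A S.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = A @ S

# ICA estimates a demixing matrix W; with the oracle W = A^{-1},
# the estimated components S_hat = W X equal S exactly.
W = np.linalg.inv(A)
S_hat = W @ X
assert np.allclose(S_hat, S)
```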
The model in (1) can be extended in a straightforward manner to K data sets as
Since there are scaling and permutation ambiguities associated with ICA, running ICA individually for each subject and aligning the results is impractical. For this reason and to exploit similarities across subjects, we assume that each subject has N common components but individual mixing matrices. Following two successive steps of principal component analysis, first at the subject level and then at the group level, a single Group ICA (Calhoun et al., 2001), implemented in the Group ICA of fMRI Toolbox (GIFT) (GIFT, 2011), is performed and individual subject maps are back-reconstructed. Clearly, the proper choice of ICA algorithm is vital to the success of Group ICA, thus using an algorithm which can take advantage of multiple types of statistical properties, such as CERBM, which we describe next, is important.
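The two successive PCA steps can be sketched schematically; the dimensions below and the use of real-valued data are illustrative assumptions, and GIFT's actual Group ICA implementation includes back-reconstruction and other details omitted here:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, V, N = 5, 50, 200, 3  # subjects, timepoints, voxels, components

# One data matrix per subject (real-valued here purely for brevity).
subjects = [rng.standard_normal((T, V)) for _ in range(K)]

def pca_reduce(X, n):
    # Project the rows of X onto its top-n left singular vectors (n x V).
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n].T @ X

# Stage 1: subject-level reduction; stage 2: group-level reduction of the
# concatenated subject-reduced data. The N x V result is what ICA decomposes.
reduced = np.vstack([pca_reduce(Xk, 2 * N) for Xk in subjects])
group = pca_reduce(reduced, N)
assert group.shape == (N, V)
```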
For CERBM, mutual information rate, which leads to the minimization of entropy rate, provides the umbrella under which dynamic non-Gaussianity, noncircularity, and sample-to-sample dependence are jointly exploited to yield an independent decomposition. The mutual information rate of the estimated sources, (ŝ1, …, ŝN), where ŝi is the ith estimated component, can be written as

Ir(ŝ1, …, ŝN) = Σn Hr(ŝn) − log det(WW^H) − Hr(x),
where Hr is the entropy rate and (·)^H is the Hermitian operator. Since Hr(x) is constant with respect to W and det(WW^H) = |det(W)|², the cost function for CERBM can be written as (Fu et al., 2015)

J(W) = Σn Hr(ŝn) − 2 log|det(W)|. (2)
From (2), it can be seen that CERBM naturally incorporates diversity in the form of sample-to-sample dependence through the use of entropy rate in the formulation. When samples are independent and identically distributed, entropy rate reduces to simple entropy. In addition, by considering a density model that takes full noncircularity into account (Adalı et al., 2008), CERBM is able to exploit another form of diversity, noncircularity (Fu et al., 2015).
Note that the performance of CERBM is greatly affected by the estimation of Hr(ŝi). For this reason, and due to the difficulty of estimating the entropy rate for a complex random variable, CERBM uses an invertible source model to model the sample-to-sample dependence and bounds the entropy rate with four measuring functions. These four measuring functions each correspond to a different class of distributions: symmetric, asymmetric, unimodal, and bi-modal, with CERBM selecting the measuring function corresponding to the tightest bound on the entropy rate (Fu et al., 2015). This dynamic selection of the appropriate bound enables CERBM to exploit non-Gaussianity dynamically, in contrast to CInfomax, which does not exploit non-Gaussianity dynamically due to its use of a fixed hyperbolic tangent nonlinearity (Adalı et al., 2004). It should be noted that the Infomax algorithm has been extended in the real domain to fit both super-Gaussian and sub-Gaussian sources (Lee et al., 1999); however, this is a weaker form of dynamic non-Gaussianity than that possessed by CEBM and CERBM, since extended Infomax cannot describe either bi-modal or asymmetric distributions.
The general framework provided by CERBM allows enhanced performance through the exploitation of three types of statistical properties: dynamic non-Gaussianity, noncircularity, and sample-to-sample dependence. However, this enhancement comes at a cost. As a method becomes more general, its stability from one run to another decreases, incentivizing the development of a method to determine a consistent estimate for use in further analyses or to compare the performance of different ICA algorithms. We must stress that the goal of our method is not to determine a “best run,” which can simply be found by selecting the run with the lowest value of the likelihood function, but to select a stable run, which represents the “average” performance of the algorithm and varies little from one set of runs to another. The computation and use of such a stable run, here referred to as a “final run,” enables greater confidence in the reproducibility of results and is described next. Note that in the complex regime considered in this paper, the scaling ambiguity inherent to ICA algorithms includes a phase term. Thus, in order to account for the phase information, rather than normalize the magnitude of the components, the components are normalized using the Mahalanobis distance (Mahalanobis, 1936, Rodriguez et al., 2012) prior to performing the stability analysis. For this reason, we refer to the spatial maps corresponding to components normalized using the Mahalanobis distance as Mahalanobis distance maps, Zc-maps, to differentiate them from spatial maps corresponding to components normalized using only the magnitude, Z-maps.
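The Mahalanobis normalization described above can be sketched for a single component; this is our own minimal illustration (treating each voxel's value as a 2-D real vector of real and imaginary parts), not the authors' code:

```python
import numpy as np

def zc_map(s):
    """Mahalanobis-normalized map (Zc-map) for one complex component.

    Each voxel value is treated as a 2-D real vector (Re, Im); the
    Mahalanobis distance from the component mean retains phase (and
    noncircularity) information that magnitude-only Z-maps discard.
    """
    y = np.column_stack([s.real, s.imag])          # V x 2
    y = y - y.mean(axis=0)
    c_inv = np.linalg.inv(np.cov(y, rowvar=False)) # inverse 2x2 covariance
    return np.sqrt(np.einsum('vi,ij,vj->v', y, c_inv, y))

rng = np.random.default_rng(2)
s = rng.standard_normal(500) + 1j * rng.standard_normal(500)
z = zc_map(s)
```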
The MST-based method depicted in Fig. 1 considers the problem of finding a stable run, first, as a generalized assignment problem (GAP) and then exploits the inherent structure of the GAP using graph theory (Du et al., 2014). In the first stage, the objective is to align the N estimated components from each pairwise combination of runs such that the assignment cost is minimized. For run k and run l, this is accomplished through the use of a cost matrix, Ckl, whose ijth element is given by cij = 1 − Ψ(ŝi[k], ŝj[l]), 0 ≤ Ψ ≤ 1, where ŝi[k], ŝj[l], and Ψ are the ith estimated source from run k, the jth estimated source from run l, and an appropriate similarity measure, respectively. In this work we use the Pearson correlation coefficient as the similarity measure for two main reasons: first, it is symmetric, thus cij = cji; second, the correlation coefficient describes statistical properties of the sources and, unlike mutual information, is bounded between 0 and 1, thus cij is 0 if and only if ŝi[k] and ŝj[l] are perfectly correlated. The Hungarian algorithm (Kuhn, 1955) is used to solve the assignment problem, i.e., align the components, for each pair of runs and to obtain the cost of aligning the runs. Then, a fully connected, undirected graph is constructed with nodes representing runs and edges whose weights represent the cost of aligning each pair of runs (Du et al., 2014).
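The pairwise alignment step can be sketched with scipy's implementation of the Hungarian algorithm; this is a simplified illustration of the idea (cost = 1 minus the magnitude of the correlation coefficient), not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_runs(S_k, S_l):
    """Align the N components of run l to those of run k.

    Builds the cost matrix c_ij = 1 - |corr(s_i^[k], s_j^[l])| and solves
    the assignment problem; returns the permutation of run l's components
    and the total alignment cost (the edge weight between the two runs).
    """
    Sk = S_k - S_k.mean(1, keepdims=True)
    Sl = S_l - S_l.mean(1, keepdims=True)
    num = np.abs(Sk @ Sl.conj().T)
    den = np.outer(np.linalg.norm(Sk, axis=1), np.linalg.norm(Sl, axis=1))
    C = 1.0 - num / den
    rows, cols = linear_sum_assignment(C)   # Hungarian algorithm
    return cols, C[rows, cols].sum()

rng = np.random.default_rng(3)
S = rng.standard_normal((4, 300))
perm = rng.permutation(4)
cols, cost = align_runs(S, S[perm])       # align a shuffled copy of S
assert np.array_equal(cols, np.argsort(perm)) and cost < 1e-6
```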
The second stage of the MST-based method processes the graph produced in the first stage, aligning the estimated components across all runs. First, an MST (Kruskal, 1956) and corresponding central node, Ŝ[m], are found for the graph, allowing all runs to be connected with low, though sub-optimal, cost. The central node provides local information on the run that is most similar to the other runs, since it is only affected by runs that share the greatest degree of similarity and not by those that share little similarity with the other runs, which may not be the case for other metrics, such as lowest average cost of alignment. Having selected the central run, the components of the other runs are aligned with the components of the central run. Then, for each corresponding component across K runs, a one-sample t-test across subjects is performed, producing a single T-map for each of the N groups of components. The use of T-maps is well justified for fMRI analysis, since T-maps can be used to investigate the reliability of components across subjects or runs. Finally, the “final run,” Ŝ[F], is selected as the run with the highest average correlation between its estimated components and the corresponding T-maps. This final step gives us the run whose maps best correspond with the group result, i.e., the run that, unlike the central run, is a good representation of the “average” performance of the algorithm. Note that this step avoids the two major issues associated with ICASSO, since the most consistently estimated components are used to cluster components and a single run is guaranteed to be selected as the “final run,” thus maintaining the relation with the ICA mixing model.
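The MST and central-node step can be sketched with scipy; note that the centrality criterion below (highest node degree within the MST) is one simple illustrative choice, and the exact criterion in Du et al. (2014) may differ:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def central_run(cost):
    """Pick an MST central node of the fully connected run graph.

    `cost` is a symmetric K x K matrix of pairwise run-alignment costs.
    The central node is taken as the MST node with the most neighbors.
    """
    mst = minimum_spanning_tree(cost).toarray()
    adj = (mst + mst.T) > 0               # undirected MST edges
    return int(np.argmax(adj.sum(axis=1)))

# Toy example: run 0 is cheap to align with every other run.
cost = np.array([[0., .1, .1, .1],
                 [.1, 0., .9, .9],
                 [.1, .9, 0., .9],
                 [.1, .9, .9, 0.]])
print(central_run(cost))  # -> 0
```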
Additionally, the MST-based method naturally takes advantage of the expectation that meaningful components will be more consistently estimated than noise components, leading to a higher similarity measure, and thus the cost of aligning meaningful components will be lower than the cost of aligning less consistent components.
The complex fMRI data used in this study were gathered from 69 schizophrenia patients and 31 healthy controls. The participants were scanned while listening to three kinds of auditory stimuli: standard (500 Hz, occurring with probability 0.80), novel (nonrepeating random digital sounds, with probability 0.10), and target (1 kHz, with probability 0.10, to which a button press with the right index finger was required). Further details of the auditory oddball (AOD) paradigm and image acquisition parameters are described in (Kiehl et al., 2005).
For real brain data, the ground truth, i.e., the collection of true latent sources and subject covariations, is not available. Thus, in order to test the performance of CERBM, we generate simulated data similar to the artificial complex fMRI sources used in (Xiong et al., 2008) and examine the performance of CInfomax, CEBM, and CERBM, in terms of intersymbol interference (ISI) (Amari et al., 1996), as the number of samples, V, increases from 100 to 100,000. The 8 sources are corrupted by first-order auto-regressive (AR) noise with an AR coefficient of 0.4 and have noncircularity coefficients between 0.2 and 0.8. The AR processes were driven by a generalized Gaussian distribution (GGD) with shape parameter, β, between 0.4 and 7. The shape parameter controls the non-Gaussianity of the GGD, with β > 1 being sub-Gaussian, i.e., distributions with tails that approach zero faster than a Gaussian, and β < 1 being super-Gaussian, i.e., distributions with tails that approach zero slower than a Gaussian. Thus, generating sources with a range of values for the shape parameter favors algorithms that can incorporate non-Gaussianity dynamically. In addition, the added information provided by the sample-to-sample dependence imposed by the AR source model favors algorithms that can exploit this added type of diversity. The plots presented here are the average result of 100 independent simulations, where, for each simulation, every method was used on the same data and each algorithm was initialized at the same random point.
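A real-valued sketch of one such source can clarify the construction; we assume here a GGD parameterization with density proportional to exp(−|x|^(2β)), so that β = 1 is Gaussian as in the text, and draw samples via the standard gamma transform:

```python
import numpy as np

def ggd(beta, size, rng):
    # GGD with density ∝ exp(-|x|^(2*beta)): beta = 1 is Gaussian,
    # beta < 1 super-Gaussian, beta > 1 sub-Gaussian (matching the text).
    # Uses the transform |x|^(2*beta) ~ Gamma(1/(2*beta), 1).
    p = 2.0 * beta
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=size)
    return rng.choice([-1.0, 1.0], size=size) * g ** (1.0 / p)

def ar1_source(beta, a, V, rng):
    """First-order AR process with coefficient a, driven by GGD samples."""
    e = ggd(beta, V, rng)
    s = np.empty(V)
    s[0] = e[0]
    for v in range(1, V):
        s[v] = a * s[v - 1] + e[v]
    return s

rng = np.random.default_rng(4)
s = ar1_source(beta=0.5, a=0.4, V=10_000, rng=rng)
# Lag-1 autocorrelation should sit near the AR coefficient 0.4.
r1 = np.corrcoef(s[:-1], s[1:])[0, 1]
```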
We observe in Fig. 2 the high level of performance obtained using CERBM for simulated sources. This performance is due to the exploitation of the three types of diversity: non-circularity, dynamic non-Gaussianity, and sample-to-sample dependence. The value of exploiting sample-to-sample dependence can be seen in the performance improvement of CERBM over CEBM, which only exploits non-circularity and dynamic non-Gaussianity. Additionally, the value of exploiting dynamic non-Gaussianity can be seen in the improvement in performance of methods that use this diversity, i.e., CEBM and CERBM, over CInfomax, which does not.
In order to evaluate the performance of CERBM, Group ICA using CInfomax, CEBM, and CERBM was performed on the complex fMRI data, and 30 independent components (ICs) were estimated for each algorithm. The number of components estimated was based on the order selection results achieved using a modified minimum description length (MDL) criterion (Xiong et al., 2008). In order to compare the results using different algorithms, each algorithm was run 10 times and the representative “final run” for each algorithm was determined using the MST-based method described previously.
Task-related components were selected to evaluate the performance of CERBM on the complex fMRI data used in this study. The task-related components were extracted based upon the magnitude of the correlation between the IC timecourse and the model timecourse. The ICs were further evaluated visually using prior information about the task as well as the spatial properties of fMRI, such as smooth and focal regions of activity. Figure 3 shows the Z-maps corresponding to the “final run” for each of the three algorithms under study. Numbered 1 through 10, these ICs include: 1, DMN-1; 2, DMN-2; 3, temporal; 4, motor-temporal; 5, motor; 6, sensorimotor; 7, right frontoparietal; 8, left frontoparietal; 9, medial visual; and 10, lateral visual. Note that to aid in displaying the results, the maps have been flipped so as to only show activation. These maps show a clear increase in the strength and bilateral nature of activation for CEBM and CERBM over CInfomax in IC1, IC2, IC3, IC5, IC6, IC7, IC9, and IC10. We also note a reduction in the number of unwanted “noise voxels,” corresponding to fragmented or physically meaningless activation, for CEBM and CERBM over CInfomax in IC3, IC4, IC5, and IC10. These results demonstrate, in a qualitative manner, the performance improvement, in terms of the extraction of physically meaningful components, achieved by using CEBM and CERBM over CInfomax.
In order to quantify the performance of CInfomax, CEBM, and CERBM in the results presented in Figure 3, receiver operating characteristic (ROC) curves were generated for all components from binary masks. These masks were composed of joined dilated Brodmann areas (BAs), generated using WFU PickAtlas (Maldjian et al., 2003, 2004). For components that represented the same functional area but were split between more than one IC during the estimation, such as IC1 and IC2 for the default mode network (DMN), IC5 and IC6 for motor, IC7 and IC8 for frontoparietal, and IC9 and IC10 for visual, the corresponding ICs were combined prior to mask formation. Note that for brevity, ROC curves for all subjects and the corresponding spatial masks for only the DMN, temporal, and motor components are presented in Figures 4 and 5, respectively. Additionally, note that ROC curves were generated for the Zc-maps as well as for the corresponding Z-maps to show the significant improvement achieved when accounting for phase information. The generation of the ROC curves is described below.
At a given value of Zc (Z for the magnitude map), the number of voxels within the mask, Nin, and the number of voxels outside the mask, Nout, with values higher than Zc or Z are found. Then, the probability of detection, PD, and the probability of false alarm, PFA, are calculated using the ratios Nin/Ntrue and Nout/Nfalse, respectively, where Ntrue and Nfalse, the total numbers of voxels inside and outside the mask, are determined by the masks. The ROC curve is generated by plotting PD against PFA as the value of Zc or Z is increased from 0.
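The procedure above can be sketched directly; the mask and map below are synthetic stand-ins for the Brodmann-area masks and Zc-maps described in the text:

```python
import numpy as np

def roc_points(z, mask, thresholds):
    """PD/PFA pairs for a thresholded Zc- (or Z-) map against a binary mask.

    Ntrue/Nfalse are the total in-/out-of-mask voxel counts; Nin/Nout count
    the supra-threshold voxels inside/outside the mask at each threshold.
    """
    n_true, n_false = mask.sum(), (~mask).sum()
    pd_ = np.array([((z > t) & mask).sum() / n_true for t in thresholds])
    pfa = np.array([((z > t) & ~mask).sum() / n_false for t in thresholds])
    return pd_, pfa

rng = np.random.default_rng(5)
mask = np.zeros(1000, dtype=bool)
mask[:100] = True                            # hypothetical region-of-interest mask
z = rng.standard_normal(1000) + 2.0 * mask   # elevated values inside the mask
pd_, pfa = roc_points(z, mask, np.linspace(0, 4, 20))
```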
From Figure 4, we can see that for all three components CERBM has better performance than CInfomax, though such claims cannot be made about CEBM. We can also see the clear improvement in the performance for each of the algorithms when Zc-maps are used rather than the Z-maps. This demonstrates the importance of accounting for the phase information and performing an ICA analysis on fMRI data in its native, complex, domain (Li et al., 2011, Rodriguez et al., 2011, 2012).
Additionally, we perform classification on selected components to further investigate the estimation of spatial components. In the classification, we apply a linear support vector machine (SVM) (Cortes and Vapnik, 1995) to the Zc-maps of the “final runs” for each algorithm in order to distinguish schizophrenia patients from controls. Table 1 shows the classification results for each selected component, including accuracy, sensitivity, and specificity, which are calculated using a leave-one-out approach. Sensitivity measures the proportion of actual patients that are correctly identified in classification, and specificity represents the percentage of correctly classified controls. Overall, the results indicate that CERBM tends to estimate better spatial components than CEBM and CInfomax in terms of classification performance. We should note that it may be possible to improve the classification performance through the use of feature selection and other classifiers.
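The leave-one-out evaluation of accuracy, sensitivity, and specificity can be sketched as follows; to keep the sketch dependency-free, a nearest-class-centroid rule stands in for the paper's linear SVM, and the features are synthetic:

```python
import numpy as np

def loo_metrics(X, y):
    """Leave-one-out accuracy / sensitivity / specificity.

    A nearest-class-centroid rule replaces the linear SVM used in the
    paper, purely for illustration; y = 1 marks patients, y = 0 controls.
    """
    pred = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out sample i
        c0 = X[keep & (y == 0)].mean(axis=0)
        c1 = X[keep & (y == 1)].mean(axis=0)
        pred[i] = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
    acc = (pred == y).mean()
    sens = pred[y == 1].mean()                 # correctly identified patients
    spec = 1.0 - pred[y == 0].mean()           # correctly identified controls
    return acc, sens, spec

rng = np.random.default_rng(6)
y = np.r_[np.ones(20, int), np.zeros(20, int)]
X = rng.standard_normal((40, 5)) + 2.0 * y[:, None]  # separable toy classes
acc, sens, spec = loo_metrics(X, y)
```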
Data-driven methods such as ICA have achieved widespread popularity for the analysis of fMRI data due to their ability to let the data “speak.” However, the popular ICA algorithms for fMRI analysis, such as Infomax, make several simplifying assumptions that can substantially limit their performance. These include: the analysis of fMRI data in the real domain, the use of a fixed nonlinearity, and the assumption of no sample-to-sample dependence. For this reason, we propose the use of more general ICA algorithms, such as complex entropy rate bound minimization (CERBM), for the analysis of fMRI data. We have shown that CERBM achieves improved performance over CInfomax and CEBM in both simulations and the analysis of actual complex fMRI data. Though able to achieve enhanced performance, CERBM produces results that are more variable than those of simpler methods, motivating the development of a stability analysis that can identify a result that represents a consistent estimate to use as a final output. To this end, we also discussed a minimum spanning tree (MST)-based method through which we are able to compare the performance of different algorithms on actual fMRI data and mitigate the inherent instability of CERBM. We have shown that by leveraging CERBM and the MST-based stability analysis, we are able to produce components that tend to more accurately classify patients with schizophrenia than those generated using simpler models. These results show the advantage of CERBM when combined with the MST-based stability analysis. The code for CERBM is freely available on the authors’ webpage (mlsp.umbc.edu), and the code for the MST-based stability analysis is available as part of the open-access Group ICA of fMRI Toolbox (GIFT).
A final note is that leveraging a more general ICA algorithm in combination with the MST-based stability analysis can also benefit real-valued ICA, potentially enhancing both the quality and interpretability of the results of the fMRI analysis.
This work was supported by grants NIH-NIBIB R01 EB 005846 and NSF-CCF-1117056.