Independent Components Analysis (ICA) has previously been used to denoise EEG/MEG signals before performing neural source localization. Source localization is then performed using a method such as beamforming or dipole fitting. Here we show how ICA can also be used as a source localization method, negating the need for beamforming and dipole fitting. This type of approach is valid whenever an estimate of the forward (mixing) model for all putative source locations is available, which includes EEG and MEG applications. The proposed method consists of estimating the forward model using the laws of physics, estimating a second forward model using ICA, and then correlating the columns of the matrices that represent the two forward models. We show that, when synthetic data are used, the proposed localization method produces a smaller localization error than several alternatives. We also show localization results for real auditory-evoked MEG data.
Many data sets can be modeled as a linear multiple-input, multiple-output (MIMO) mixture of source signals. The main goal of standard Independent Components Analysis (ICA) methods is to learn a linear MIMO function that, when applied to a mixture of sources (the observations), extracts the original, unmixed sources. The MIMO function that maps the source space to the observation space is commonly referred to as the mixing matrix in the field of ICA and as the forward solution in the field of neuroscience. Likewise, the MIMO function that correctly extracts the sources from the observations is referred to as either the demixing matrix or the inverse solution.
Since its inception, researchers have used ICA for a variety of applications, including blind source separation (BSS), blind identification, angle-of-arrival estimation, jammer suppression, deconvolution, feature extraction, denoising/artifact rejection, genetic analyses, and estimation of the number of active sources. Unlike for many of the applications listed previously, it is possible to use the laws of physics to estimate the forward solution for electroencephalographic (EEG) and magnetoencephalographic (MEG) applications.
ICA can also be used to estimate the forward solution, but the two estimates have important differences (the differences below were previously pointed out by Knuth). The physics-derived forward solution contains information on the spatial location of all of the putative neural sources, but it is not able to determine which of the putative sources are active. The ICA-derived forward solution depends only on the active sources. It does not, however, contain any information on the location of these sources. Hence, neither of these approaches can be used, individually, to locate the neural sources.
Localization is commonly performed on EEG and MEG data after applying ICA denoising using, e.g., EEGLAB or its MEG counterpart known as NUTMEG. The ICA denoising is used to extract meaningful components, such as evoked responses, or to remove unwanted interference sources. Localization is then performed on the processed data using techniques such as beamforming, dipole fitting, or standardized low-resolution brain electromagnetic tomography (sLORETA).
In this paper, which extends our previous results , we show how the physics-derived and ICA-derived forward solutions can be combined to directly estimate the spatial locations of the active sources, thus negating the need for a separate localization method.
We will assume MEG data throughout, although the ideas can also be applied to EEG data with little modification. The following assumptions are made,
Assumption H1 is standard for EEG and MEG data, H2 is ubiquitous, and H3-H6 are standard whenever ICA is used. Note, however, that the noise in each sensor represents a unique source in addition to any other sources (thus violating H6). Furthermore, real neural sources are almost certainly distributed (thus violating H5 and H6) and, based on neurophysiological arguments, we expect the time courses of neighboring voxels to be correlated (thus violating H3). Fortunately, as far as ICA is concerned, sources having a relatively small total energy in sensor space can oftentimes be neglected, thus reducing the number of effective sources (blind identification and bias removal may also help alleviate concerns about H6). In addition, multiple focal sources (used to represent a single distributed source) become one effective source (1) in the limit as the maximum pairwise distance between the constituent focal sources approaches zero or (2) in the limit as the minimum pairwise absolute normalized correlation coefficient approaches one. In the former case, it may be possible to accurately localize the effective source when ICA processing is used, but only the mean time course of the constituent focal sources can be recovered. In the latter case, it may be possible to recover the correct time course, but ICA processing necessarily biases the localization. Hence, in cases where an ICA-based method yields meaningful results, it may be that the data are represented well by a relatively small number of energetic effective sources (where case (1) holds for localization applications and case (2) holds for time course extraction applications) and that the effective sources are approximately statistically independent. Also, H3 and H4 can be traded for different assumptions commonly used in BSS, e.g., those that require the sources to have unique spectra or to be statistically non-stationary.
where Ω represents the three-dimensional source space, r is a spatial location within Ω expressed in Cartesian coordinates, L(r) and ψ(r) are, respectively, the (Nb×Nψ) lead field matrix and the (Nψ×1) orientation vector of the neural source at location r, s(r, t) is the amplitude of the zero-mean neural source (in source space) at location r and time t, y(t) is the amplitude of the zero-mean interference sources (in observation space), and ν(t) is the zero-mean noise. If the brain is modeled well as a homogeneous spherical conductor, which is a standard assumption for MEG data, then we can use Nψ = 2 since the field outside the head vanishes for radially-oriented dipoles. For the general case Nψ = 3. The neural sources have both an amplitude and an orientation since they can be thought of as infinitesimal current dipoles.
For temporally discrete signals the magnetic field at sample n is assumed to be given by,
where bn = b(nT), sn(r) = s(r, nT), yn = y(nT), and νn = ν(nT), n is the integer-valued sample number, and T is the sampling period. For spatially discrete signals, and under conditions for which we are allowed to interchange the summation and integration, the magnetic field is given by,
where ri corresponds to the location of the ith neural source, si,n corresponds to the ith (scalar) element of the (Ns×1) vector sn (used below), and Li = L(ri) and ψi = ψ(ri) are, respectively, the shorthand notations for the lead field matrix and orientation vector associated with the ith neural source. The precise expressions for the lead field matrices, Li, depend on additional assumptions about the head model. The lead field matrices for several different head models are found in papers authored by Sarvas and by Mosher et al.
We define the (Nb×NψNs) composite lead field matrix as,
and the (NψNs×Ns) orientation matrix as,
We let Fs = LCΨ and yn = Fxxn, where Fs is the (Nb×Ns) forward solution for the neural sources of interest, Fx is the (Nb×Nx) forward solution for the interference sources, and xn is the (Nx×1) vector of interference sources. The final expression for the magnetic field is found by inserting Equations (7) and (8) into Equation (6), which produces,
The physical model given by Equation (7) expresses the observations, bn, as a function of three different types of signals: neural sources, interference sources, and noise. The neural source vector, sn, is defined as the application-specific sources of interest. The interference source vector, xn, is defined as the set of signals that are of no interest and are not generated in the sensors. They may originate in the brain, other parts of the body (e.g., the heart), or outside the body. The noise is defined as the set of signals that originate in the sensors. Notice that, under H6, Equation (7) is identical to the standard ICA model with additive noise, which is given by bn = F̄s̄n + νn, where F̄ = [Fs Fx] and s̄n is the vertical concatenation of sn and xn.
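To make the model concrete, the discrete-time observation model can be simulated with random stand-in matrices (all dimensions and values below are hypothetical; random Gaussian matrices stand in for the true forward solutions):

```python
import numpy as np

rng = np.random.default_rng(0)
Nb, Ns, Nx, N = 275, 2, 3, 1000  # sensors, neural sources, interference sources, samples

Fs = rng.standard_normal((Nb, Ns))       # stand-in for the physics-derived forward solution
Fx = rng.standard_normal((Nb, Nx))       # forward solution of the interference sources
s = rng.standard_normal((Ns, N))         # neural source time courses
x = rng.standard_normal((Nx, N))         # interference time courses
nu = 0.1 * rng.standard_normal((Nb, N))  # sensor noise

b = Fs @ s + Fx @ x + nu                 # Equation (7): noisy instantaneous mixture

# Under H6 this matches the standard noisy ICA model: b_n = F_bar @ s_bar_n + nu_n
F_bar = np.hstack([Fs, Fx])
s_bar = np.vstack([s, x])
assert np.allclose(b, F_bar @ s_bar + nu)
```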
Our goal is to localize each of the neural sources. In our model of the data, given by Equation (7), only the observations, bn, are known. Anything else we desire must be estimated from the data. We propose to perform localization using two different estimates of Fs, which we hereafter refer to as simply the forward solution (since Fx is of no interest). We denote the ICA-derived and physics-derived estimates of the forward solution as A and F, respectively.
The proposed source localization method consists of three main steps. The steps are,
Many ICA algorithms estimate the inverse solution, as opposed to the forward solution. In this case we need to apply a pseudoinverse. First, we choose values for Ns and Nx. Then we learn the three matrices that yield the source estimate vector and the neural source estimate vector, which are given by,
where Wp is a (Np×Nb) projection matrix, Wd is a (Np×Np) demixing matrix, and Ī is a (Ns×Np) selection matrix. Finally, we apply the Moore-Penrose pseudoinverse to the inverse solution, which produces the ICA-derived estimate of the forward solution,
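A minimal numerical sketch of the composed inverse solution and its pseudoinverse (the matrices below are hypothetical random stand-ins; in practice Wp would come from, e.g., PCA and Wd from an ICA algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
Nb, Np, Ns = 275, 5, 2

Wp = rng.standard_normal((Np, Nb))   # projection matrix (e.g., learned by PCA)
Wd = rng.standard_normal((Np, Np))   # demixing matrix (learned by ICA)
I_sel = np.eye(Np)[:Ns]              # selection matrix: keep the Ns rows tagged as neural

W = I_sel @ Wd @ Wp                  # (Ns x Nb) composed inverse solution
A = np.linalg.pinv(W)                # (Nb x Ns) ICA-derived forward-solution estimate

# sanity check: W @ A is the (Ns x Ns) identity when W has full row rank
assert np.allclose(W @ A, np.eye(Ns))
```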
In Step 1a we learn the projection matrix, Wp. The projection matrix is intended to suppress the undesired signals, especially the noise (under the proper conditions the demixing matrix can suppress the interference sources). It also helps to control the complexity of the proposed algorithm by reducing the dimensionality of bn prior to learning the demixing matrix. The projection matrix can be found using any desired dimension-reduction method, including methods such as Principal Components Analysis (PCA), Partitioned Factor Analysis, and a supervised method appropriate for brain computer interfaces (BCI) known as the common spatial pattern decomposition.
Partitioned Factor Analysis (PFA) denoising is an alternative to PCA. PFA can be used when all or part of the data can be partitioned into two regions: one where the neural sources are known to be active and one where they are known to be inactive. This condition is approximately met when the data are collected using the stimulus-evoked paradigm and then averaged over trials. In this case the active region is assumed to be the post-stimulus window and the inactive region is assumed to be the pre-stimulus window. PFA is designed to suppress both the interference and the noise and it directly extracts the neural sources from the data. Thus, for PFA denoising we (ideally) use Np = Ns.
In Step 1b we learn the demixing matrix, Wd. The demixing matrix is intended to separate the sources. ICA algorithms typically learn Wd by making the source estimates as statistically independent as possible. The demixing matrix also enables us to suppress the interference sources through the process of selection.
In Step 1c we learn the selection matrix, Ī. Recall that there is no need for selection if PFA is used, in which case we let Ī equal the identity matrix. When PFA is not used, we identify the subset of source estimates that are thought to be interference sources. The selection matrix is then formed by taking an identity matrix and removing all rows that are associated with the interference sources. We select the neural sources by taking the Ns source estimates having the largest ratio of post-stimulus energy to pre-stimulus energy.
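The selection rule in Step 1c can be sketched as follows (the helper name and the toy signals are our own, not from the paper):

```python
import numpy as np

def select_neural_sources(u, n_stim, Ns):
    """Return the indices of the Ns source estimates (rows of u) having
    the largest post-/pre-stimulus energy ratio; the stimulus onset is
    at column n_stim."""
    pre = np.sum(u[:, :n_stim] ** 2, axis=1)
    post = np.sum(u[:, n_stim:] ** 2, axis=1)
    ratio = post / np.maximum(pre, np.finfo(float).eps)  # guard divide-by-zero
    return np.argsort(ratio)[::-1][:Ns]

# toy example: row 1 is quiet before the stimulus and active after it
u = np.zeros((3, 100))
u[0] = 1.0        # constant source: ratio ~ ratio of window lengths
u[1, 60:] = 5.0   # evoked-like source: very large post/pre ratio
u[2, :60] = 5.0   # interference-like source: near-zero post/pre ratio
print(select_neural_sources(u, n_stim=60, Ns=1))  # -> [1]
```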
The forward solution, Fs, is a function of the location of the Nb sensors and the Ns neural sources. For source localization applications the locations of the sensors are oftentimes known. However, the locations of the neural sources are always, by definition, unknown. To address this shortcoming we form a grid of Ng putative locations and then we use the data to estimate the activity of the putative neural source at each of these voxels. This is the approach taken by beamforming methods, which estimate an upper bound of the energy at each voxel. One consequence of using a grid is that the forward solution, Fs, and its physics-derived estimate, F, differ in size. The former is the (Nb×Ns) matrix that results from the product of LC and Ψ, whereas the latter is the (Nb×Ng) matrix that results from the product of the spatially oversampled versions of LC and Ψ.
In Step 2a we build a dense uniform grid of Ng >> Ns putative locations throughout the brain. The grid is formed over the region corresponding to the brain. The brain region is determined by the co-registration of MEG data with a structural magnetic resonance image (MRI). More specifically, we place three fiducial coils (one nasion and two opposing preauricular points) on each subject/patient before acquiring their MEG and structural MRI data. We use the NUTMEG suite to load both types of data into memory and then select the brain region with the mouse by clicking on the structural MRI. Once the desired grid resolution is chosen, the NUTMEG program returns r1 to rNg, which are the coordinates of the Ng putative source locations.
In Step 2b we learn the (spatially oversampled) composite lead field matrix, LC. This (Nb×NψNg) matrix is constructed using Equation (4) with the exception that the neural sources are assumed to be located at the Ng grid points.
In Step 2c we learn the (spatially oversampled) orientation matrix, Ψ. This (NψNg×Ng) matrix is constructed using Equation (5) with the exception that it must use the same set of Ng locations used to create LC. The (Nψ×1) orientation vector at voxel j, ψj, is chosen so that it maximizes a similarity measure between A and the jth column of F. The estimation involves two maximizations. The first is used to find the optimal value of ψj associated with the ith neural source estimate (optimal in the sense of maximizing the similarity metric below). The second is used to find the optimum value taken over all Ns neural source estimates.
The optimal value of ψj for the ith neural source estimate is given by,
where Ai is the ith column of A, Fj is the jth column of F, and where we use the facts that the square root operator is monotonically increasing over its domain, that the maximization is independent of the positive scalar quantity ‖Ai‖, and that Fj equals Ljψj. It is obvious that the optimal solution is independent of the norm of ψj. Hence, we add the constraint that ψj must have unit norm. Notice that both LjᵀAiAiᵀLj and LjᵀLj are symmetric, positive semi-definite matrices. Hence, the vector ψj that maximizes (15) is given by the eigenvector associated with the largest eigenvalue of the generalized eigenvalue decomposition (GED) of LjᵀAiAiᵀLj with respect to LjᵀLj. The overall optimum orientation vector at the jth voxel is found by jointly optimizing over all possible orientation vectors and all possible neural sources. This is accomplished by replacing the subscript i in Equation (10) with f(j), which is defined by,
We denote the resulting maximizer as the overall optimum orientation vector at voxel j.
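A sketch of the per-voxel orientation optimization, under the assumption that the similarity metric is the absolute normalized correlation between Ai and Fj = Ljψj (as used below). Because the numerator matrix of the generalized eigenvalue problem is rank one, the GED result can be cross-checked against the closed-form least-squares direction:

```python
import numpy as np

def optimal_orientation(L, a):
    """Unit-norm psi maximizing |a.T @ L @ psi| / ||L @ psi||, solved as the
    generalized eigenvalue problem (L.T a a.T L) psi = lam (L.T L) psi
    via a Cholesky whitening of L.T L (assumed full column rank)."""
    M1 = L.T @ np.outer(a, a) @ L        # numerator matrix (rank one)
    M2 = L.T @ L                         # denominator matrix
    R = np.linalg.cholesky(M2)           # M2 = R @ R.T
    Rinv = np.linalg.inv(R)
    C = Rinv @ M1 @ Rinv.T               # whitened standard eigenproblem
    w, V = np.linalg.eigh(C)
    psi = Rinv.T @ V[:, -1]              # back-transform the top eigenvector
    return psi / np.linalg.norm(psi)

rng = np.random.default_rng(2)
L = rng.standard_normal((275, 2))        # lead field at one voxel (N_psi = 2)
a = rng.standard_normal(275)             # one column of the ICA-derived estimate A

psi = optimal_orientation(L, a)
# rank-one numerator => same direction (up to sign) as the least-squares solution
psi_ls = np.linalg.solve(L.T @ L, L.T @ a)
psi_ls /= np.linalg.norm(psi_ls)
assert min(np.linalg.norm(psi - psi_ls), np.linalg.norm(psi + psi_ls)) < 1e-6
```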
If assumptions H1 and H2 are valid, the orientation vector at each (true) neural source location is estimated accurately, and if the true locations of the neural sources happen to fall on the chosen grid, which is guaranteed for uniform sampling in the limit as Ng → ∞, then (by definition) each of the Ns columns of F that correspond to a neural source is identical to one of the columns in Fs.
If, in addition, assumptions H3 to H6 are valid, the user-defined value of Ns equals the true number of neural sources, the combination of denoising and demixing successfully removes all interference and noise, and A is such that the neural source estimates are mutually statistically independent, then each of the Ns columns of A equals (to within a multiplicative scalar) one of the columns in Fs.
The consequence of the two preceding paragraphs is that each of the columns of A also equals (to within a multiplicative scalar) one of the columns in F. Although each column of F corresponds to a known location in source space (under the above conditions), we do not know which of the columns correspond to (active) neural sources. Likewise, each column of A corresponds to an active neural source, but A does not provide information on the location of the associated sources. In the proposed approach we combine the information contained in F and A by comparing their columns. More specifically, we use the following similarity metric to express the confidence level that a neural source exists at voxel j,
where ρj is the absolute value of the normalized correlation coefficient (except that the means of the columns are not removed), which is guaranteed to take values between 0 and 1, and Equation (11) is used to determine f(j), which is the column of A that is most similar to the jth column of F. The means are not removed from the columns in the similarity metric because they represent parameters of a system, as opposed to a signal, in which case differences in the mean are significant. Notice that Equations (15) and (16) also use the same similarity metric given by Equation (12).
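The confidence map built from this column correlation can be sketched as follows (hypothetical dimensions; the column means are intentionally not removed, and the metric is invariant to column scaling):

```python
import numpy as np

def confidence_map(A, F):
    """rho_j = max_i |A_i . F_j| / (||A_i|| ||F_j||); means are NOT removed."""
    An = A / np.linalg.norm(A, axis=0)   # normalize the columns of A
    Fn = F / np.linalg.norm(F, axis=0)   # normalize the columns of F
    C = np.abs(An.T @ Fn)                # (Ns x Ng) pairwise similarities
    return C.max(axis=0)                 # best-matching source per voxel

rng = np.random.default_rng(3)
F = rng.standard_normal((275, 50))       # physics-derived estimate on Ng = 50 voxels
A = 2.5 * F[:, [7, 31]]                  # ideal case: columns of A are scaled columns of F
rho = confidence_map(A, F)
assert np.isclose(rho[7], 1.0) and np.isclose(rho[31], 1.0)
```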
Due to the well-known gain and permutation indeterminacies of ICA, the ICA-derived estimate of the forward solution, under the ideal conditions listed above, is related to the true forward solution by Fs = APD, where P is a permutation matrix and D is a diagonal matrix. Because of the permutation ambiguity of ICA we do not know which column of F is associated with a given column of A. For this reason we compare all columns of F with all columns of A. Likewise, because of the gain ambiguity of ICA we use a similarity metric that is invariant to multiplicative gains.
For the synthetic data we use a total of 500 different evoked-response data sets, which corresponds to 50 Monte Carlo trials for each of 2 experiments and 5 different values of the input signal-to-interference ratio (SIR), which is defined by,
where the stimulus onset is at n = 0, sn,k and xn,k are the neural and interference source vectors for the kth trial, and the squared quantities in the numerator and denominator are the squares of the (scalar) ith elements of the corresponding sums. Spatially and temporally white Gaussian noise is added to each sensor. The power of the isotropic noise is chosen so that the input signal-to-noise ratio (SNR) is −5 dB. Each dataset is 1000 samples in length, 5/8 of which corresponds to the pre-stimulus period and 3/8 to the post-stimulus period, and the sampling frequency is 1 kHz. Each neural source has a damped sinusoidal time course with a random frequency between 5 and 20 Hz, whereas each (neural) interference source has an undamped sinusoidal time course. Damping is performed such that the energy of the neural sources is concentrated within the post-stimulus period. Determination of the lead field matrices requires knowledge of the spatial locations of the sensors. For this we use the true spatial locations of the sensors corresponding to an in-house 275-channel whole-head MEG system (Omega 2000, VSM MedTech Inc., Port Coquitlam, Canada). To simplify the analysis, all of the sources of the synthetic data are placed within the sagittal plane that passes through the center of the head. The voxel size is 3.7 mm per side and Ng = 1600. The number of sensors is Nb = 275, the number of neural sources is Ns = 2, and the number of interference sources is Nx = 3. We assume that the true values of Ns and Nx are known.
The performance is measured using the localization error, which is the mean distance between the true neural source locations and the estimated locations. The estimated locations correspond to the tallest two peaks of the tomographic map. Since there are 2 neural sources, there are two different ways of associating the true source locations with the estimated source locations. We use the more meaningful association, which is the one that has a smaller mean distance.
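The error measure can be sketched as a brute-force search over the possible associations (feasible here since Ns = 2; the function name is our own):

```python
import numpy as np
from itertools import permutations

def localization_error(true_locs, est_locs):
    """Mean distance under the best one-to-one association of true and
    estimated source locations (exhaustive search; fine for small Ns)."""
    best = np.inf
    for perm in permutations(range(len(est_locs))):
        d = np.mean([np.linalg.norm(t - est_locs[p])
                     for t, p in zip(true_locs, perm)])
        best = min(best, d)
    return best

true_locs = np.array([[0.0, 0.0], [10.0, 0.0]])
est_locs = np.array([[10.0, 1.0], [0.0, 1.0]])   # estimates returned in swapped order
print(localization_error(true_locs, est_locs))   # -> 1.0
```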
We compare the performance of the proposed column correlation method, 3 scalar minimum-variance beamformers (operating on the raw data and on denoised data using PCA and PFA denoising), and 2 methods that use ICA denoising prior to localization (sLORETA and dipole fits). The proposed column correlation method uses PFA denoising and the AR-MOG ICA algorithm with Kg = 0 and KQ = 3. The AR-MOG algorithm assumes each source distribution is a mixture of Gaussians, the parameters of which are learned from the data. Hence, it is appropriate for EEG and MEG sources, which are thought to have multi-modal distributions. For the beamforming methods we use the denoised data covariance matrix whenever denoising is used. For all 6 methods we use Bayesian regularization for inverting ill-conditioned matrices, where a matrix is defined to be ill-conditioned when the ratio of its maximum eigenvalue to its minimum eigenvalue is greater than 1e12.
First we report the computational complexities of the 6 methods, which are measured in terms of mean completion time per dataset. The mean completion times are 3.0, 3.1, 18.7, 19.3, 40.6, and 63.0 seconds for the PCA/Beamformer, Raw Data/Beamformer, PFA/Beamformer, PFA/Correlate-Columns, ICA/Dipole Fit, and ICA/sLORETA, respectively.
Figures 1 and 2 pertain to the first experiment. Figure 1 shows contour plots for 4 of the methods, where the y-axis represents left-right position and the z-axis represents vertical position. For this example, the Raw Data/Beamformer has peaks at all 5 (neural and interference) sources (the peaks at the neural sources are barely visible), the PCA/Beamformer map has a single peak occurring between 2 of the interference sources, whereas the PFA/Beamformer and PFA/Column Correlation methods have peaks near both neural sources.
Figure 2 shows the mean localization error for the first experiment as a function of the (input) SIR. The ICA/Dipole Fits method, not shown in order to reduce clutter, performs slightly better than PCA/Beamforming for low SIR and slightly worse than PCA/Beamforming for high SIR. The ICA/Dipole Fits and PCA/Beamformer methods consistently perform the worst, PFA/Column Correlation has the best performance for most values of SIR, and the Raw Data/Beamformer performs poorly for low SIR and performs the best (as expected) for extremely high SIR.
Figures 3 and 4 pertain to the second experiment. In this experiment we replace one of the focal neural sources with a distributed source. The single distributed source spans a 1 cm × 1 cm area and consists of 9 uniformly-placed focal sources, each of which has an independently-drawn time course and orientation. Each of the methods continues to assume that there are only 2 sources of interest, although there are actually 10. Figure 3 shows the contour plots of 4 of the methods. In this example, the Raw Data/Beamformer has a single large peak at one of the interference sources and the PCA/Beamformer has peaks at 2 of the interference sources, whereas PFA/Beamformer and PFA/Column Correlation have peaks near both of the neural sources and 1 or 2 extraneous peaks.
Figure 4 shows the mean localization error for the second experiment. The ICA/Dipole Fits method (not shown) performs the same as ICA/sLORETA for low SIR and slightly worse than PCA/Beamforming for high SIR. The ICA/Dipole Fits, ICA/sLORETA, and PCA/Beamformer methods consistently perform the worst and the PFA/Column Correlation consistently performs the best.
Next we present results from real auditory-evoked data from a male subject in his mid-20s. The stimulus is a 1 kHz tone of 400 ms duration, which is presented binaurally once every 1.35 sec (±100 ms jitter). The sample rate is 4 kHz, the number of trials is 247, and each trial lasts for 800 ms. We apply a 4 Hz high-pass filter to the data in addition to the hardware anti-alias filter. The structural MRI data are collected using a 1.5 T Signa Excite by General Electric (Milwaukee, WI). The MEG data are collected using the 275-channel VSM system described earlier. The voxel size is 7 mm per side and Ng = 7000. For PFA denoising we use Ns = 2 and Nx = 5 and for ICA demixing we use AR-MOG with Kg = 0 and KQ = 3.
Figure 5 shows the trial-averaged waveforms for 20 and 247 trials of data, where the trials are aligned according to the stimulus onset prior to averaging. Also shown in this figure are the two source estimates of the proposed method found for the case that there are only 20 trials. The M100 is visible in the source estimates of the proposed method with as few as 5 trials, whereas it takes 10 or 20 trials in order for the M100 to be evident in the trial-averaged data (not shown). Also, the proposed method indicates there is significant activity around 200 ms post-stimulus using as few as 20 trials, although this later activity cannot be seen in the trial-averaged data until there are many more than 50 trials.
Figure 6 shows the localization results of the PFA/Column Correlation method using the three standard orthogonal projections and the neurological convention. The voxels having the highest energy are highlighted, where the energy of each voxel is found by time-averaging over the post-stimulus period. Both coronal slices of the tomographic map produced by the proposed method show activation in the primary auditory cortex.
The three methods, known to the authors, that are most similar to the proposed method are Bayesian methods by Knuth and Roberts, and a convolutive ICA method by Took et al. These methods assume that the sources are statistically independent and they incorporate a physical forward model, as does the proposed method. The first of the methods assumes that the sources radiate spherically and the amplitude falls as the distance squared. This is unlike our model, where each source has an orientation (hence, radiation is not spherical) and the magnitude falls as the distance cubed. The second method imposes a non-negativity constraint on the ICA mixing matrix. The third method is designed for convolutive mixtures and it requires that the number of sensors is equal to or larger than the number of potential source locations. This is unlike the standard model for EEG and MEG data, which consists of an instantaneous mixture of the sources and where the number of putative source locations far exceeds the number of sensors.
Good denoising performance does not necessarily equate with good localization performance. For example, the mean PCA/Beamformer performance shown in Figures 2 and 4 is always worse than that of the Raw Data/Beamformer even though PCA successfully removes noise (as measured by output SNR, the results of which are not shown here). For our data, PFA denoising outperforms PCA denoising when both use the same localization method (beamformer) and column correlation outperforms the beamformer when both use the same denoising method (PFA). The peaks in the tomographic map of the proposed method are not as sharp as those of the beamformer methods since the correlation of two columns of Fs decays slowly as the distance between the two associated voxels increases (especially if the orientations associated with the two voxels are approximately collinear). ICA denoising consistently suffers from over-fitting, which is not surprising given the number of parameters it must learn. Hence, the two methods that use ICA denoising, ICA/sLORETA and ICA/Dipole Fit, perform poorly. The proposed method, on the other hand, has far fewer parameters to learn. The ICA/sLORETA, ICA/Dipole Fit, and all 3 beamformer methods require the inversion of a (Nb×Nb) matrix (Nb = 275 for our data), whereas the proposed method requires the inversion of a (Ns×Ns) matrix (Ns = 2 for our data). Finally, the magnitude of the normalized correlation between the localization and separation metrics is 0.46, which indicates that the localization performance of the proposed method is not overly sensitive to the quality of the source separation.
This work was partially supported by NIH grant F32-NS052048 and NIH Grant R01-DC004855.
Kenneth E. Hild II (M’91, SM’05) received a Ph.D. in Electrical Engineering from the University of Florida, Gainesville, where he studied information theoretic learning and blind source separation in the Computational NeuroEngineering Laboratory. Later, he studied biomedical informatics at Stanford University, Palo Alto. From 1995 to 1999, Dr. Hild worked in the Advanced Concepts group at Seagate Technologies. From 2000 to 2003, he taught several graduate-level classes on adaptive filter theory and stochastic processes at the University of Florida. From 2003 to 2007, he was at the Biomagnetic Imaging Laboratory at UCSF, San Francisco, where he applied variational Bayesian techniques for biomedical signal processing of encephalographic and cardiographic data. He is currently employed at the Oregon Health & Science University, Portland, where he is performing research on human and computer vision. His interests include full Bayesian models for competing versions of evolution, reinforcement learning using a hierarchy of innate concepts, and general-purpose machine learning algorithms.
Srikantan S. Nagarajan (S’90, M’91, SM’07) began his research career at the Applied Neural Control Laboratory of Case Western Reserve University (CWRU), where he obtained both an M.S. and a Ph.D. in Biomedical Engineering. After graduate school, he did a postdoctoral fellowship at the Keck Center for Integrative Neuroscience at the University of California, San Francisco, under the mentorship of Drs. Michael Merzenich and Christoph Schreiner. Subsequently, he was a tenure-track faculty member in the Department of Bioengineering at the University of Utah. Currently, he is the Director of the Biomagnetic Imaging Laboratory, an Associate Professor in Residence in the Department of Radiology, and a member in the UCSF/UCB Joint Graduate Program in Bioengineering. His research interests are in the area of Neural Engineering where his goal is to better understand dynamics of brain networks involved in processing, and learning, of complex human behaviors such as speech, through the development of functional brain imaging technologies.