
Neuroimage. Author manuscript; available in PMC 2010 July 15.

Published in final edited form in Neuroimage. Published online 2009 February 6. doi: 10.1016/j.neuroimage.2009.01.056

PMCID: PMC2850823

NIHMSID: NIHMS152942

Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53715 USA


This article presents a new spatio-temporal method for M/EEG source reconstruction based on the assumption that only a small number of events, localized in space and/or time, are responsible for the measured signal. Each space-time event is represented using a basis function expansion which reflects the most relevant (or measurable) features of the signal. This model of neural activity leads naturally to a Bayesian likelihood function which balances the model fit to the data with the complexity of the model, where the complexity is related to the number of included events. A novel Expectation-Maximization algorithm which maximizes the likelihood function is presented. The new method is shown to be effective on several MEG simulations of neurological activity as well as data from a self-paced finger tapping experiment.

The ill-posed nature of the M/EEG inverse problem is widely recognized (Baillet et al., 2001), and unique solutions are usually obtained by placing prior expectations on the nature of the brain activity associated with the measured data. In this paper we present an approach for solving the inverse problem that assumes the underlying brain activity is due to a small number or sparse set of space-time events. Each space-time event corresponds to activity that is localized spatially and in a temporal subspace. Sets of spatial and temporal basis functions are employed to represent the activity due to each space-time event and are chosen to match the characteristics of expected source activity. Seeking to explain the measured data in terms of a small number of space-time events leads to a convex optimization problem, which we solve via a novel Expectation-Maximization (EM) algorithm.

Common approaches to the M/EEG inverse problem may be classified as either parametric or imaging methods (Baillet et al., 2001). Parametric or scanning methods use a small number of dipoles (Scherg and von Cramon, 1985), multipoles (Mosher et al., 1999; Nolte and Curio, 2000), or cortical patches (Limpiti et al., 2006; Wagner et al., 2002, 2000) and scan over locations to find the best set of sources to represent the data. Examples of scanning methods include MUSIC (Mosher et al., 1992), beamforming (Van Veen et al., 1997), and maximum likelihood estimation (Dogandžić and Nehorai, 2000). One persistent challenge for scanning methods is choosing the number of sources in the solution.

The method proposed in this paper may be classified as an imaging technique. Imaging methods tessellate the brain using a large number of dipoles and attempt to simultaneously solve for all dipole amplitudes. This leads to the following linear model:

$$\mathbf{\text{Y}}=\mathbf{\text{HX}}+\mathbf{\text{N}}$$

(1)

where **Y** is an *M* × *T* matrix of spatio-temporal measurements from *M* sensors at *T* time points, **H** is an *M* × 3*D* (or *M* × *D* in the case of known dipole moment orientation) lead field matrix relating sensor measurements to *D* discrete dipole locations (voxels) on the cortex, **X** is the 3*D* × *T* signal matrix representing the unknown time courses and orientations of each dipole on the cortical surface, and **N** is a matrix of additive noise. The goal of imaging techniques is to estimate each element of **X** simultaneously given **Y** and **H**. Typically *M* is on the order of several hundred, while *D* exceeds several thousand, so Eq. (1) is severely underdetermined. Furthermore, the effective rank of **H** is often significantly less than *M*.
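As a concrete illustration of Eq. (1), the following sketch simulates the linear measurement model with toy dimensions; the sizes, lead field, and noise level here are arbitrary placeholders, not values from the paper.

```python
# Illustrative sketch of the linear measurement model Y = H X + N (Eq. 1).
# All sizes (M sensors, D dipoles, T samples) are toy values.
import numpy as np

rng = np.random.default_rng(0)
M, D, T = 8, 50, 20                 # sensors, dipoles (fixed orientation), samples
H = rng.standard_normal((M, D))     # hypothetical lead field matrix
X = np.zeros((D, T))                # dipole time courses, mostly silent
X[3, 5:15] = 1.0                    # one active dipole over a short window
N = 0.01 * rng.standard_normal((M, T))
Y = H @ X + N                       # M x T sensor measurements
```

With D much larger than M, recovering X from Y is underdetermined, which is the motivation for the priors discussed next.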

The problem of nonuniqueness with imaging techniques is often reduced by use of a Bayesian prior to favor some solutions over others. The prior is incorporated as a penalty term in a cost function which is minimized to estimate the neural activity. Under the assumption of white noise, these cost functions take the form:

$$\sum_{t=1}^{T}{\Vert {\mathbf{Y}}_{t}-\mathbf{H}{\mathbf{X}}_{t}\Vert}_{2}^{2}+{\lambda}_{t}\,f({\mathbf{X}}_{t})$$

(2)

Equation (2) is expressed in terms of individual time samples since the optimization, in most previous work, is solved separately at each time point, and solutions are often found only at specific time points selected by the user (such as signal peaks). The first term of Eq. (2) ensures that solutions explain the data well, while *f* acts as a penalty term. The constant *λ _{t}* balances the contribution of the two terms. A common choice of penalty is simply
$f({\mathbf{X}}_{t})={\Vert \mathbf{W}{\mathbf{X}}_{t}\Vert}_{2}^{2}$, which corresponds to a Gaussian prior on **X*** _{t}*.
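For the quadratic penalty *f*(**X**_t) = ‖**WX**_t‖₂², Eq. (2) has the familiar closed-form Tikhonov solution **X**_t = (**H**ᵀ**H** + λ**W**ᵀ**W**)⁻¹**H**ᵀ**Y**_t, sketched here with toy dimensions (all names and sizes are illustrative):

```python
# Per-sample Tikhonov solution for Eq. (2) with f(X_t) = ||W X_t||_2^2.
# Toy sizes; W = I gives the classic minimum-norm prior.
import numpy as np

rng = np.random.default_rng(1)
M, D, T, lam = 6, 30, 10, 0.5
H = rng.standard_normal((M, D))
Y = rng.standard_normal((M, T))
W = np.eye(D)                        # identity weighting (minimum-norm prior)

A = H.T @ H + lam * W.T @ W
X_hat = np.linalg.solve(A, H.T @ Y)  # solves all T samples at once

# The normal equations hold at the optimum for every column.
assert np.allclose(A @ X_hat, H.T @ Y)
```

Because the regularizer is quadratic, each time sample decouples and the same matrix factorization serves all T columns.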

Another set of methods aimed at producing focal solutions to the inverse problem uses the ℓ₁ norm of **X*** _{t}* as a regularizer. The corresponding solutions tend to be sparse, i.e., most entries in the estimated **X*** _{t}* are exactly zero.

Temporal priors and sparsity have also been used to combat the ill-posed nature of the M/EEG inverse problem. Baillet and Garnero (1997) assume dipole magnitudes vary slowly with respect to sampling rate and thus penalize large changes between time samples. A similar approach is taken by Daunizeau et al. (2006) in which the second derivative of dipolar temporal evolution is penalized. Penalization of roughness encourages solutions that are low dimensional or sparse, that is, solutions implicitly described in terms of a small number of smooth bases. Dalal et al. (2008) present a parametric method for localizing cortical activity in time, frequency, and space. Localization in time and frequency implicitly assumes cortical activity is sparse in the time-frequency plane. A number of researchers have used wavelet temporal basis functions to explicitly find sparse representations of M/EEG activity. Effern et al. (2000) present a method for representing event related potentials (ERPs) recorded via EEG (or invasive cortical sensors) using a sparse set of wavelet bases. Similarly, a wavelet shrinkage procedure is used to select a small number of coefficients representing M/EEG signals prior to a Variational Bayes inversion in Trujillo-Barreto et al. (2008).

The “event sparse penalty” (ESP) regularization we present here is closely related to ℓ₁ regularization. The ESP approach seeks a solution composed of a small number of “space-time events” (STEs). Each STE is a spatio-temporal signal defined in terms of a group of basis functions. Bases are a flexible and powerful tool for balancing prior expectations of source characteristics, such as focal or bandlimited source activity, against uncertainty in exact signal behavior, such as dipole orientation or temporal evolution. The STE is constrained to lie in the space spanned by the bases, but is unconstrained within this space. Example spatial bases include x-, y-, and z-components of a dipolar source, or multipoles and patch bases for extended sources. Example temporal basis functions include Fourier bases that represent specific frequency bands, wavelets that represent desired time-frequency characteristics, and singular value decomposition (SVD) derived bases which attempt to capture temporal characteristics of the signal from the data itself.

Use of spatial and temporal bases in the ESP regularizer effectively combines the spatial and temporal aspects of the source activity estimation problem. We present ESP regularization as well as a novel Expectation-Maximization (EM) algorithm for solving the resulting optimization problem. A simple criterion for guiding the selection of an appropriate regularization parameter λ is incorporated into the optimization procedure. The effectiveness of the method is illustrated on simulations as well as human subject data from a self-paced finger tapping experiment.

The remainder of the paper is organized as follows. In Materials and Methods, we first introduce the concept of spatio-temporal basis function expansions of **X** in Eq. (3). This expansion allows us to define STEs and develop the ESP regularization technique for finding solutions which are sparse in STEs. Next, we present an EM algorithm for solving the ESP regularized least squares problem. The Results section describes several MEG simulations, including a set of randomized source simulations, and application of ESP regularization to MEG recordings of a subject performing self-paced finger tapping. We conclude with a discussion of the ESP method and possible extensions.

Matrices and vectors are represented by upper and lower case boldface letters, respectively. The notation **e*** _{i}* represents the vector of all zeros except for a one in the *i*^th position.

The ESP approach to the M/EEG inverse problem relies on representing the cortical signal **X** as a linear combination of physiologically meaningful basis functions from carefully crafted spatial and temporal dictionaries. In general, a spatio-temporal expansion of **X** can be written:

$$\mathbf{\text{X}}=\mathbf{\text{S}}\mathbf{\Theta}{\mathbf{\text{T}}}^{T}$$

(3)

where the columns of **S** are a dictionary containing spatial bases, the columns of **T** are a dictionary containing temporal bases, and the (*k*, *l*)^th element of the coefficient matrix **Θ** weights the contribution of the *k*^th spatial basis paired with the *l*^th temporal basis.

Given a spatio-temporal expansion in the form of Eq. (3), many methods can be used to solve the reformulated problem for **Θ**. In particular, regularized least squares problems analogous to Eq. (2) can be formulated with some function of **Θ** as the regularization term. However, improved quality solutions are possible by designing the bases and the penalty term concurrently.

Our approach assumes that M/EEG signals are the result of a small number of space-time events (STEs) which occur over limited time spans and occupy contiguous, local areas of cortex. This assumption is reflected in the following spatio-temporal expansion:

$$\mathbf{X}=\mathbf{S}\mathbf{\Theta}{\mathbf{T}}^{T}=\left[{\mathbf{S}}_{1}\,{\mathbf{S}}_{2}\dots {\mathbf{S}}_{I}\right]\left[\begin{array}{cccc}{\mathbf{\Theta}}_{1,1}& {\mathbf{\Theta}}_{1,2}& \dots & {\mathbf{\Theta}}_{1,J}\\ {\mathbf{\Theta}}_{2,1}& {\mathbf{\Theta}}_{2,2}& \dots & {\mathbf{\Theta}}_{2,J}\\ \vdots & \vdots & \ddots & \vdots \\ {\mathbf{\Theta}}_{I,1}& {\mathbf{\Theta}}_{I,2}& \dots & {\mathbf{\Theta}}_{I,J}\end{array}\right]\left[\begin{array}{c}{\mathbf{T}}_{1}^{T}\\ {\mathbf{T}}_{2}^{T}\\ \vdots \\ {\mathbf{T}}_{J}^{T}\end{array}\right]=\sum_{i=1}^{I}\sum_{j=1}^{J}{\mathbf{S}}_{i}{\mathbf{\Theta}}_{i,j}{\mathbf{T}}_{j}^{T}$$

(4)

Here the spatial and temporal dictionaries, as well as the coefficient matrix, have been partitioned into space-time events. The columns of each submatrix **S*** _{i}* are an orthogonal basis for measurable spatial activity, such as that from a local region of cortex. Likewise, the columns of each submatrix **T*** _{j}* are an orthogonal basis for temporal activity, such as activity confined to a limited time interval or frequency band.

The choice of spatial and temporal bases in Eq. (4) allows users a great deal of flexibility in both signal model and assumptions about cortical activity. The temporal bases in **T*** _{j}* may be chosen using wavelets, Gabor functions, Fourier bases, etc., to span time intervals and bandwidths consistent with the temporal characteristics of the signals of interest. Possible spatial bases **S*** _{i}* include the x-, y-, and z-components of individual dipoles, as well as multipole or cortical patch bases representing spatially extended activity.
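The block-partitioned expansion of Eq. (4) can be sketched numerically; the following toy example (arbitrary dimensions, random dictionaries) verifies that the full product **SΘT**ᵀ equals the sum of per-event terms **S**_i**Θ**_{i,j}**T**_jᵀ:

```python
# Toy check of the block expansion in Eq. (4): X = S Θ T^T as a sum of STEs.
import numpy as np

rng = np.random.default_rng(2)
D, T_len = 12, 16
I, J = 3, 2            # number of spatial / temporal blocks
ks, kt = 2, 4          # bases per spatial / temporal block
S = rng.standard_normal((D, I * ks))
Tmat = rng.standard_normal((T_len, J * kt))
Theta = np.zeros((I * ks, J * kt))
Theta[0:ks, 0:kt] = rng.standard_normal((ks, kt))   # a single active STE

X_full = S @ Theta @ Tmat.T
# Equivalent block form: sum over (i, j) of S_i Θ_{i,j} T_j^T
X_blocks = sum(
    S[:, i*ks:(i+1)*ks] @ Theta[i*ks:(i+1)*ks, j*kt:(j+1)*kt] @ Tmat[:, j*kt:(j+1)*kt].T
    for i in range(I) for j in range(J)
)
assert np.allclose(X_full, X_blocks)
```

With only one nonzero block **Θ**_{1,1}, the entire signal X is a single space-time event, which is exactly the sparsity the ESP penalty encourages.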

The motivation behind Event Sparse Penalty (ESP) regularization is the assumption that the activity of interest is a collection of a small number of space-time events. This assumption, combined with appropriate spatial and temporal dictionaries, implies that the true underlying signal should have **Θ**_{i,j} = **0** for all but a small number of (*i*, *j*) pairs.

Many recent results have suggested that relaxing nonconvex counting functions to the convex ℓ₁ norm in regularized least squares problems provides a very good approximation to the desired regularization and results in sparse solutions (Candes and Tao, 2005; Donoho, 2006; Tropp, 2006). In the spirit of these results, we propose the ℓ₁ block penalty regularized problem:

$$\underset{\mathbf{\Theta}}{\min}\,{\Vert \mathbf{Y}-\mathbf{HS}\mathbf{\Theta}{\mathbf{T}}^{T}\Vert}_{F}^{2}+\lambda\sum_{i,j}{\Vert \text{vec}({\mathbf{\Theta}}_{i,j})\Vert}_{p}$$

(5)

where *p* ≥ 1 is chosen to capture the magnitude of the contribution of each block. Obvious choices are *p* = 1, 2, or ∞. If *p* = 1, then the block structure of Eq. (4) is lost because the penalty in Eq. (5) becomes ‖vec (**Θ**) ‖₁. This encourages a solution which is sparse in the number of nonzero coefficients in **Θ**, but not necessarily sparse in terms of the number of STEs. If *p* = 2 or *p* = ∞, then the penalty corresponds to the ℓ₁ norm of the ℓ₂ or ℓ∞ norm, respectively, of each block **Θ**_{i,j}. This results in solutions that involve a small number of STEs, yet can accommodate uncertainty in the event characteristics by placing minimal restrictions on the coefficients within each block.

We prove in (Bolstad et al., 2007) that when one STE is present, the *p* = 2 case with lead field normalization (see Section 2.7) provides a higher probability of activating the correct STE than the *p* = ∞ case. Using *p* = 2 in Eq. (5) and noting that ‖vec (**Θ**_{i,j}) ‖₂ = ‖**Θ**_{i,j}‖_*F*, we arrive at the following optimization problem:

$$\widehat{\mathbf{\Theta}}=\arg\,\underset{\mathbf{\Theta}}{\min}\,{\Vert \mathbf{Y}-\mathbf{HS}\mathbf{\Theta}{\mathbf{T}}^{T}\Vert}_{F}^{2}+\lambda\sum_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}$$

(6)

This objective function is convex since it is a sum of convex functions. At least one minimizer exists since the objective function is nonnegative (hence bounded below) and convex for all **Θ**. Section 2.3 describes a novel EM algorithm for finding **Θ^**. Next we discuss possible spatio-temporal dictionary constructions.
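For reference, the objective of Eq. (6) is straightforward to evaluate. This minimal sketch assumes equally sized blocks of `ks` spatial and `kt` temporal bases per STE, a simplification of the general partition:

```python
# Evaluate the ESP objective of Eq. (6): data fit + λ * sum of block Frobenius norms.
# Assumes every block Θ_{i,j} is ks x kt (an illustrative simplification).
import numpy as np

def esp_objective(Y, HS, Theta, Tmat, lam, ks, kt):
    """Data-fit term plus the l1-of-Frobenius block penalty."""
    fit = np.linalg.norm(Y - HS @ Theta @ Tmat.T, 'fro') ** 2
    I = Theta.shape[0] // ks
    J = Theta.shape[1] // kt
    penalty = sum(np.linalg.norm(Theta[i*ks:(i+1)*ks, j*kt:(j+1)*kt], 'fro')
                  for i in range(I) for j in range(J))
    return fit + lam * penalty
```

Because the penalty is a sum over blocks, any block driven exactly to zero drops out of the model, which is how solutions become sparse in STEs.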

The ESP approach is not limited to a specific set of space-time basis functions. Degenerate choices for the basis functions in Eq. (3) transform ESP regularization into existing regularization techniques. For example, ESP can reconstruct activity independently over all time points by setting each submatrix **T*** _{j}* to the standard basis vector **e*** _{j}*.

The power of the ESP approach is that it allows for more general and flexible spatial and temporal representations of expected source activity than these degenerate examples. Temporal bases can be used, for example, to isolate a desired frequency band, or may be grouped into different frequency bands such that, for example, **T**_{1} spans the *α*-band, **T**_{2} the *β*-band, **T**_{3} the *γ*-band, and so on, reflecting the expectation that different source activity occupies different bands. Temporal bases can also be used to isolate time ranges. Alternatively, wavelets or other time-frequency bases can be used to identify source activity based on time-frequency characteristics analogous to Dalal et al. (2008). Similarly, the spatial bases can represent spatially extended source activity. The expectation that signals occupy extended areas rather than single dipoles is discussed, for example, in (Baillet and Garnero, 1997; Daunizeau and Friston, 2007). Representations for multipoles and cortical patches have been proposed to accommodate extended source activity (Limpiti et al., 2006; Mosher et al., 1999; Wagner et al., 2002, 2000) and are easily incorporated as bases, as we illustrate in Section 3.

The gradient of Σ_{i,j}‖**Θ**_{i,j}‖_*F* is undefined wherever a block **Θ**_{i,j} = **0**, so standard gradient-based methods cannot be applied directly to Eq. (6). Instead, we minimize Eq. (6) using an EM algorithm.

The EM algorithm monotonically increases the Bayesian likelihood *p*(**Y**|**Θ**)*p*(**Θ**) over **Θ**. Assuming white Gaussian noise and the density
$p(\mathbf{\Theta})\propto \exp\left\{-\frac{\lambda}{2{\sigma}^{2}}{\sum}_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}\right\}$, the negated Bayesian log-likelihood is proportional to
$\frac{{\Vert \mathbf{Y}-\mathbf{HS}\mathbf{\Theta}{\mathbf{T}}^{T}\Vert}_{F}^{2}}{2{\sigma}^{2}}+\frac{\lambda}{2{\sigma}^{2}}{\sum}_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}$. Multiplying through by 2*σ*^{2}, we see that the EM algorithm monotonically decreases
${\Vert \mathbf{Y}-\mathbf{HS}\mathbf{\Theta}{\mathbf{T}}^{T}\Vert}_{F}^{2}+\lambda{\sum}_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}$, which is the objective of Eq. (6).

The key aspect of solving an estimation problem via the EM algorithm is to identify a so-called “hidden data set” which, if known, would make the problem easy to solve. EM alternates between solving the easy problem given an estimate of the hidden data and updating the estimate of the hidden data. We identify hidden data **Z** for the ESP problem based on the decomposition:

$$\mathbf{Z}=\mathbf{\Theta}+\alpha {\mathbf{N}}_{1}$$

(7)

$$\mathbf{Y}=\mathbf{\text{HSZ}}{\mathbf{T}}^{T}+{\mathbf{N}}_{2}$$

(8)

Here the noise matrix **N** of Eq. (1) is decomposed into two independent random matrices **N**_{1} and **N**_{2} with distributions vec (**N**_{1}) ~ *N*(**0**, **I** ⊗ **I**) and vec (**N**_{2}) ~ *N*(**0**, *σ*^{2}**I** − *α*^{2}**TT**^{T} ⊗ **HSS**^{T}**H**^{T}). The scalar *α* divides the noise between the two stages of the decomposition and must be small enough that the covariance of **N**_{2} remains positive semidefinite.

The EM algorithm alternates between an Expectation Step (E-Step) and a Maximization Step (M-Step). The E-Step calculates the Q-function *Q*(**Θ**|**Θ^**^{(k−1)}) = *E*[log *p*(**Y**, **Z**|**Θ**)|**Y**, **Θ^**^{(k−1)}], where the expectation is over **Z** and **Θ^**^{(k−1)} is the estimate from the previous iteration. The M-Step maximizes *Q*(**Θ**|**Θ^**^{(k−1)}) + log *p*(**Θ**) over **Θ** to produce **Θ^**^{(k)}. This produces a series of iterates **Θ^**^{(k)} which are guaranteed to converge to a stationary point provided that *Q*(**A**|**B**) is continuous in both **A** and **B** (Wu, 1983).

In the E-Step, we calculate the Q-function using the measured data and the estimate **Θ^**^{(}^{k}^{−1)}:

$$Q(\mathbf{\Theta}|{\widehat{\mathbf{\Theta}}}^{(k-1)})=E[\log p(\mathbf{Y},\mathbf{Z}|\mathbf{\Theta})|\mathbf{Y},{\widehat{\mathbf{\Theta}}}^{(k-1)}]=K-\frac{{\Vert \mathbf{\Theta}-{\widehat{\mathbf{Z}}}^{(k)}\Vert}_{F}^{2}}{2{\alpha}^{2}}$$

(9)

where *K* is not a function of **Θ** and:

$${\widehat{\mathbf{Z}}}^{(k)}=E[\mathbf{Z}|\mathbf{Y},{\widehat{\mathbf{\Theta}}}^{(k-1)}]={\widehat{\mathbf{\Theta}}}^{(k-1)}+\frac{{\alpha}^{2}}{{\sigma}^{2}}{\mathbf{S}}^{T}{\mathbf{H}}^{T}(\mathbf{Y}-\mathbf{HS}{\widehat{\mathbf{\Theta}}}^{(k-1)}{\mathbf{T}}^{T})\mathbf{T}$$

(10)

Thus, specifying *Q*(**Θ**|**Θ^**^{(k−1)}) amounts to calculating **Z^**^{(k)} using Eq. (10).
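The E-Step update of Eq. (10) is a single correction of the previous estimate by the back-projected residual. A minimal sketch, where `HS` denotes the product of the lead field and spatial dictionary and `c` plays the role of α²/σ² (names are illustrative):

```python
# E-Step of Eq. (10): Z^(k) = Θ^(k-1) + c (HS)^T (Y - HS Θ^(k-1) T^T) T.
import numpy as np

def e_step(Y, HS, Tmat, Theta_prev, c):
    """One E-Step: back-project the current residual into coefficient space."""
    residual = Y - HS @ Theta_prev @ Tmat.T
    return Theta_prev + c * HS.T @ residual @ Tmat
```

Note that if **Θ^**^{(k−1)} already reproduces the data exactly, the residual vanishes and **Z^**^{(k)} = **Θ^**^{(k−1)}.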

The M-Step is a generalization of the M-Step for an ℓ₁ penalized least squares problem (see Figueiredo and Nowak (2003)) since the penalty in Eq. (6) is a generalization of the ℓ₁ norm. We seek to solve the following problem:

$${\widehat{\mathbf{\Theta}}}^{(k)}=\arg\,\underset{\mathbf{\Theta}}{\max}\,Q(\mathbf{\Theta}|{\widehat{\mathbf{\Theta}}}^{(k-1)})+\log p(\mathbf{\Theta})=\arg\,\underset{\mathbf{\Theta}}{\min}\left\{\frac{{\Vert \mathbf{\Theta}-{\widehat{\mathbf{Z}}}^{(k)}\Vert}_{F}^{2}}{2{\alpha}^{2}}+\frac{\lambda}{2{\sigma}^{2}}\sum_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}\right\}$$

(11)

This is the maximum a posteriori estimate of **Θ** given **Z^**^{(k)} assuming
$p(\mathbf{\Theta})\propto \exp\left(-\frac{\lambda}{2{\sigma}^{2}}{\sum}_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}\right)$. The key feature facilitating the solution of Eq. (11) is the separability of the problem into blocks corresponding to the blocks of Eq. (4):

$$\frac{{\Vert \mathbf{\Theta}-{\widehat{\mathbf{Z}}}^{(k)}\Vert}_{F}^{2}}{2{\alpha}^{2}}+\frac{\lambda}{2{\sigma}^{2}}\sum_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}=\sum_{i,j}\left[\frac{{\Vert {\mathbf{\Theta}}_{i,j}-{\widehat{\mathbf{Z}}}_{i,j}^{(k)}\Vert}_{F}^{2}}{2{\alpha}^{2}}+\frac{\lambda}{2{\sigma}^{2}}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}\right]$$

(12)

We solve Eq. (11) by finding the **Θ**_{i,j} that minimizes
${\Vert {\mathbf{\Theta}}_{i,j}-{\widehat{\mathbf{Z}}}_{i,j}^{(k)}\Vert}_{F}^{2}+\frac{{\alpha}^{2}\lambda}{{\sigma}^{2}}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}$ for every pair (*i*, *j*). Appendix A shows that the minimizer is:

$${\widehat{\mathbf{\Theta}}}_{i,j}^{(k)}=\max\left(0,\,1-\frac{{\alpha}^{2}\lambda}{2{\sigma}^{2}{\Vert {\widehat{\mathbf{Z}}}_{i,j}^{(k)}\Vert}_{F}}\right){\widehat{\mathbf{Z}}}_{i,j}^{(k)}$$

(13)
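The single-block problem is solved by scaling each block of **Z^** toward zero and setting it exactly to zero when its Frobenius norm falls below the threshold. A sketch of this block soft-thresholding rule, where `tau` plays the role of α²λ/σ²:

```python
# Block soft-threshold: minimize ||Θ - Z||_F^2 + tau ||Θ||_F over one block Θ.
import numpy as np

def block_soft_threshold(Z_block, tau):
    """Shrink the block toward zero; exactly zero when ||Z||_F <= tau/2."""
    nrm = np.linalg.norm(Z_block, 'fro')
    scale = max(0.0, 1.0 - tau / (2.0 * nrm)) if nrm > 0 else 0.0
    return scale * Z_block
```

This is the matrix-block analogue of the scalar soft-threshold used in ℓ₁ penalized least squares, and it is what makes the EM iterates exactly sparse.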

We summarize the *k ^{th}* iteration of the EM algorithm as:

$$\begin{array}{l}{\widehat{\mathbf{Z}}}^{(k)}={\widehat{\mathbf{\Theta}}}^{(k-1)}+c\,{\mathbf{S}}^{T}{\mathbf{H}}^{T}(\mathbf{Y}-\mathbf{HS}{\widehat{\mathbf{\Theta}}}^{(k-1)}{\mathbf{T}}^{T})\mathbf{T}\\ {\widehat{\mathbf{\Theta}}}^{(k)}=\arg\,\underset{\mathbf{\Theta}}{\min}\left\{{\Vert \mathbf{\Theta}-{\widehat{\mathbf{Z}}}^{(k)}\Vert}_{F}^{2}+c\,\lambda\sum_{i,j}{\Vert {\mathbf{\Theta}}_{i,j}\Vert}_{F}\right\}\end{array}$$

(14)

where *c* = *α*^{2}/*σ*^{2} acts as a step size parameter. Setting *α*^{2} at the upper bound *α*^{2} = *σ*^{2}(‖**TT**^{T}‖₂‖**HSS**^{T}**H**^{T}‖₂)^{−1}, the largest value for which the covariance of **N**_{2} remains positive semidefinite, gives the largest step size with guaranteed convergence.

As described in (Wu, 1983), continuity of the Q function and the existence of a lower bound to Eq. (6) imply convergence to a stationary point. Since Eq. (6) is convex, the only stationary points are global minima, and **Θ^**^{(k)} converges to a global minimizer. Various stopping criteria may be used to terminate the algorithm. To ensure a correct solution has been reached, we run the algorithm until the subgradient condition (i.e., the subdifferential of Eq. (6) at **Θ^** contains **0**) is satisfied to within a small tolerance.
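Putting the E- and M-Steps together, the iteration of Eq. (14) can be sketched as follows. This is an illustrative implementation with equally sized blocks and a fixed step size, not the authors' code (which adds a variable step size and a subgradient-based stopping rule):

```python
# Sketch of the EM iteration of Eq. (14): E-Step then blockwise soft-thresholding.
import numpy as np

def esp_em(Y, HS, Tmat, lam, ks, kt, c=None, iters=200):
    """Iterate Eq. (14) from Θ = 0; blocks are ks x kt (a simplification)."""
    if c is None:
        # conservative fixed step size from the two spectral norms
        c = 1.0 / (np.linalg.norm(Tmat.T @ Tmat, 2) * np.linalg.norm(HS @ HS.T, 2))
    Theta = np.zeros((HS.shape[1], Tmat.shape[1]))
    for _ in range(iters):
        # E-Step: back-project the residual (Eq. 10 with c = α²/σ²)
        Z = Theta + c * HS.T @ (Y - HS @ Theta @ Tmat.T) @ Tmat
        # M-Step: block soft-threshold every (i, j) block of Z
        for i in range(Theta.shape[0] // ks):
            for j in range(Theta.shape[1] // kt):
                blk = Z[i*ks:(i+1)*ks, j*kt:(j+1)*kt]
                nrm = np.linalg.norm(blk, 'fro')
                shrink = max(0.0, 1.0 - c * lam / (2.0 * nrm)) if nrm > 0 else 0.0
                Theta[i*ks:(i+1)*ks, j*kt:(j+1)*kt] = shrink * blk
    return Theta
```

Because the M-Step zeroes whole blocks, the iterates are exactly sparse at every step rather than merely small.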

An advantage of our EM algorithm approach is that it provides an easy means to calculate solutions **Θ^**(λ) for a wide variety of λ via a “continuation” approach. Since a small change in λ does not typically produce a large change in **Θ^** (λ), the solution **Θ^** (λ_{1}) can be used to initialize the algorithm when solving for **Θ^**(λ_{2}). If λ_{2} ≈ λ_{1}, then this initialization point will be close to **Θ^**(λ_{2}). Furthermore, we show in Appendix B that **Θ^**(λ) = **0** for λ ≥ λ* _{MAX}*, where:

$${\lambda}_{\mathit{MAX}}=2\,\underset{i,j}{\max}\,{\Vert {\mathbf{S}}_{i}^{T}{\mathbf{H}}^{T}\mathbf{Y}{\mathbf{T}}_{j}\Vert}_{F}$$

(15)

depends on the data **Y**. Thus, we solve for a range of λ by setting **Θ^**(λ* _{MAX}*) = **0** and gradually decreasing λ, using each solution to initialize the algorithm at the next smaller value of λ.
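A sketch of the λ_MAX computation of Eq. (15), again assuming equally sized blocks; the block norm here is the Frobenius norm, matching the penalty in Eq. (6):

```python
# λ_MAX of Eq. (15): twice the largest block norm of S^T H^T Y T.
# Above this value the all-zero solution is optimal.
import numpy as np

def lambda_max(Y, HS, Tmat, ks, kt):
    """Smallest λ for which Θ^(λ) = 0, over ks x kt blocks of S^T H^T Y T."""
    G = HS.T @ Y @ Tmat
    I = G.shape[0] // ks
    J = G.shape[1] // kt
    return 2.0 * max(np.linalg.norm(G[i*ks:(i+1)*ks, j*kt:(j+1)*kt], 'fro')
                     for i in range(I) for j in range(J))
```

In a continuation sweep, one starts at λ_MAX (where the solution is all zeros) and decreases λ geometrically, warm-starting each solve from the previous solution.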

The EM algorithm described above solves Eq. (6) efficiently compared to second-order cone programming (SOCP) methods often used to minimize cost functions similar to Eq. (6) (e.g., Ding and He (2007); Malioutov et al. (2005); Ou et al. (2009)). The most computationally expensive calculation in EM is matrix multiplication, namely the products (**HS**)**Θ^**^{(k−1)}**T**^{T} and (**HS**)^{T}(**Y** − **HS Θ^**^{(k−1)}**T**^{T})**T** in Eq. (14).

There are also qualitative differences between solutions found by EM and those found by SOCP. Solutions found by SOCP are not sparse; that is, many of the coefficients are very small, but not zero. This means users must define some threshold to decide which blocks of coefficients should be set to zero. In contrast, EM returns truly sparse solutions, with most **Θ^**_{i,j} = **0** exactly.

The main criticism of EM algorithms in general is that they converge slowly when the cost function is relatively flat. This slow convergence is addressed in three ways in the ESP procedure. First, our modified EM algorithm uses a variable step size to speed up convergence in flat areas (see Sec. 2.3). Second, EM typically identifies the nonzero **Θ^**_{i,j} quickly, but takes longer to adjust the coefficients within each block. Since debiasing is typically applied to the EM result (see Sec. 2.8), the algorithm may be stopped once the “sparsity pattern” has stabilized. Third, use of EM continuation (see Sec. 2.4) provides a sequence of good initial guesses. Examples of convergence behavior are given in Fig. 15.

The quality of the estimate **Θ^** given by Eq. (6) depends on the penalty weight parameter λ. Intuitively, large λ corresponds to heavy reliance on the prior and gives a very sparse (or all zero) estimate, while small λ corresponds to heavy reliance on the noisy data and gives solutions with activity spread all over the cortical surface. Clearly, both of these extremes are undesirable. In general, user expertise may provide the best results for selecting λ, and in many cases it may be desirable to consider a range of solutions from multiple values of λ. However, automatic selection of λ is necessary for objective performance evaluation and to provide a starting point for user based selection.

We have developed a simple heuristic to automate selection of λ based on the number of nonzero **Θ**_{i,j}, that is, the cardinality of the active set *A^*(λ) = {(*i*, *j*) : **Θ^**_{i,j}(λ) ≠ **0**}.

Figure 1. Characteristic behavior of the number of STEs identified by ESP vs. *λ/λ*_{MAX}. The presence of source activity results in a plateau in the number of STEs selected as *λ/λ*_{MAX} decreases.

These observations suggest a simple strategy analogous to the L-curve method of regularization in which we select the value of λ where the number of STEs begins to increase rapidly with further decrease in λ. We implement one such strategy by comparing the cardinality of *A^*(λ), which we denote #*A*(λ), with a piecewise approximation based on two curve segments: a linear function for large λ and a quadratic function for small λ. The linear function connects the points (λ* _{MAX}*, 0) and (λ, #*A*(λ)).

This method of λ selection requires that #*A*(λ) be calculated for small enough λ such that the characteristic shape of Fig. 1 is observed before applying the heuristic. Two examples of the heuristic λ selection are shown in Fig. 2. These two examples suggest that the λ selected with this heuristic is typically smaller than the “optimal” λ. Hence, this heuristic provides a good lower bound on λ.

A key assumption behind the ESP method as presented so far is that the signal is contaminated with white noise. We accommodate colored noise by first whitening the data with the noise covariance matrix. Suppose the noise in Eq. (1) is distributed **N** ~ *N*(**0**, **R*** _{N}*). Taking the matrix square root
${\mathbf{\text{R}}}_{N}={\mathbf{\text{R}}}_{N}^{1/2}{\mathbf{\text{R}}}_{N}^{T/2}$, the whitened data is:

$$\stackrel{~}{\mathbf{\text{Y}}}={\mathbf{\text{R}}}_{N}^{-1/2}\mathbf{\text{Y}}={\mathbf{\text{R}}}_{N}^{-1/2}\mathbf{\text{HX}}+{\mathbf{\text{R}}}_{N}^{-1/2}\mathbf{\text{N}}=\stackrel{~}{\mathbf{\text{H}}}\mathbf{\text{X}}+\stackrel{~}{\mathbf{\text{N}}}$$

(16)

where **N˜** ~ *N*(**0**, **I**). ESP is applied to the whitened system. In practice **R*** _{N}* is usually unknown, so whitening requires that **R*** _{N}* be estimated from the data, for example from prestimulus measurements.
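The whitening transform of Eq. (16) can be sketched with a Cholesky factor serving as the matrix square root, assuming a symmetric positive definite noise covariance:

```python
# Whitening of Eq. (16): multiply Y and H by R_N^{-1/2} via a Cholesky factor.
import numpy as np

def whiten(Y, H, R_N):
    """Return whitened measurements and lead field; R_N must be SPD."""
    L = np.linalg.cholesky(R_N)      # R_N = L L^T
    Y_w = np.linalg.solve(L, Y)      # equivalent to L^{-1} Y
    H_w = np.linalg.solve(L, H)
    return Y_w, H_w
```

After this transform the noise is white with unit variance, matching the assumption under which the ESP objective was derived.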

After whitening the data and expanding the signal with ESP basis functions, the signal model is:

$$\stackrel{~}{\mathbf{\text{Y}}}=\stackrel{~}{\mathbf{\text{H}}}\mathbf{\text{S}}\mathbf{\Theta}{\mathbf{\text{T}}}^{T}+\stackrel{~}{\mathbf{\text{N}}}$$

(17)

We normalize the lead fields using a diagonal matrix **W** whose entries equal the ℓ₂ norm of the corresponding columns of **H˜S**. This eliminates the bias toward sources closest to the sensors. Thus we have:

$$\stackrel{~}{\mathbf{\text{Y}}}=\stackrel{~}{\mathbf{\text{H}}}{\mathbf{\text{SW}}}^{-1}\mathbf{\text{W}}\mathbf{\Theta}{\mathbf{\text{T}}}^{T}+\stackrel{~}{\mathbf{\text{N}}}$$

(18)

Here the columns of the matrix **H˜SW**^{−1} are unit norm due to the normalization. The columns of **T** may be normalized as well, though typically the **T*** _{j}* are constructed so that
${\mathbf{T}}_{j}^{T}{\mathbf{T}}_{j}=\mathbf{I}$ and thus normalization of **T** is unnecessary.
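A sketch of the column normalization of Eq. (18); the weights are the ℓ₂ norms of the columns of the combined matrix (variable names are illustrative):

```python
# Lead field normalization of Eq. (18): scale each column of H~S to unit norm.
import numpy as np

def normalize_lead_fields(HS):
    """Return the column-normalized matrix and the diagonal weights W."""
    w = np.linalg.norm(HS, axis=0)   # l2 norm of each column
    return HS / w, w
```

Because **W** multiplies **Θ** inside the model, the estimated coefficients must be divided by the same weights to recover amplitudes in the original scale.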

The penalty in Eq. (6) tends to “shrink” the coefficient matrices **Θ**_{i,j} and produces biased amplitude estimates. This bias can be removed by using ESP to estimate the set of active STEs, then re-estimating the coefficients of the active STEs with an unpenalized least squares fit restricted to those blocks.

We illustrate examples of ESP reconstruction in this section. We begin by describing the specific spatio-temporal bases used for the examples. Next, we present a simulation in which the true activity consists of three space-time events and the measurements are contaminated by realistic sensor noise, followed by results from a randomized simulation involving two cortical signals. Finally, we apply ESP reconstruction to measurements collected during a self-paced finger tapping experiment.

As noted previously there are many possible ways to define STEs. We illustrate the attributes of the ESP reconstruction approach using STEs defined spatially by cortical patches and temporally by fixed time windows and a common bandwidth. Use of cortical patches to design spatial bases results in event bases that are locally supported and particularly parsimonious. This significantly reduces the dimension of the inverse problem. The basis construction procedure described here exemplifies several key attributes of appropriate bases and is derived from the cortical patch bases of Limpiti et al. (2006).

Assume the *i*^th cortical patch is tessellated with a set of dipoles whose lead fields are concatenated into a patch lead field matrix **G**^*i* with singular value decomposition **G**^*i* = **U**^*i***Σ**^*i*(**V**^*i*)^{T}.

The columns of **V**^*i* are a natural basis for the patch signal: expanding the patch amplitudes as **x**^*i* = **V**^*i***a** yields the noiseless patch measurement

$${\mathbf{Y}}^{i}=\sum_{k}{a}_{k}{\sigma}_{k}^{i}{\mathbf{u}}_{k}^{i}$$

(19)

where
${\mathbf{u}}_{k}^{i}$ is the *k*^th column of **U**^*i*, ${\sigma}_{k}^{i}$ is the *k*^th singular value of **G**^*i*, and *a_k* is the *k*^th expansion coefficient.
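A sketch of the patch basis construction: take the SVD of a (hypothetical) patch lead field matrix and keep the leading singular vectors, in the spirit of the construction derived from Limpiti et al. (2006):

```python
# Patch spatial bases from the SVD of a hypothetical patch lead field matrix
# G_patch (sensors x patch dipoles). Keeping the leading singular vectors
# retains the patch activity patterns most measurable at the sensors.
import numpy as np

def patch_basis(G_patch, n_bases=3):
    """Return leading sensor-space vectors, singular values, and patch patterns."""
    U, s, Vt = np.linalg.svd(G_patch, full_matrices=False)
    return U[:, :n_bases], s, Vt[:n_bases].T
```

The truncation level `n_bases` trades spatial detail within the patch against the SNR needed to estimate it, as discussed below for the 20 mm patches.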

Figure 3 depicts the singular values for two sets of patches computed for the 275 channel CTF MEG system. The singular values shown in Figure 3(a) correspond to all 393 patches of geodesic radius 20 mm which tile both hemispheres of a subject with 50% overlap. Note that in each case the 7^th and higher singular values are an order of magnitude smaller than the largest singular value. Furthermore, the first two or three singular values are five times larger than the higher order singular values. The number of significant singular values is related to the size of the patch. Figure 3(b) shows singular values corresponding to 57 patches of geodesic radius 50 mm which again tile both hemispheres with 50% overlap. The singular values for the larger patches decay more slowly.

Figure 3. Normalized singular values associated with patch lead field matrices from (a) 393 patches of 20 mm geodesic radius and (b) 57 patches of 50 mm geodesic radius which span the cortical surface with 50% overlap. The singular values decay rapidly in every case.

Figure 4 illustrates the spatial pattern associated with several right singular vectors for a typical 20 mm patch by depicting each element of **v*** _{k}* at the corresponding dipole location in the patch. The singular vectors associated with the largest singular values tend to represent low resolution features within the patch. The level of detail or fine structure represented by a singular vector increases as the corresponding singular values decrease. Selecting the number of significant singular values and vectors thus involves a tradeoff between representing the details of the spatial distribution of activity within the patch and having adequate SNR to estimate the detail. Relatively high SNR is required to reliably identify the fine structure associated with small singular values. Similar tradeoffs exist with other approaches to choosing spatial bases. For example, the contributions of higher order terms in a multipole expansion also decay rapidly, so the SNR dictates the number of terms that can be reliably estimated and used as spatial bases.

Color-coded cortical activity pattern for singular vectors **v**_{k} of a representative patch. The vertices in the mesh represent dipole locations in the cortical tessellation of the patch. (a) **v**_{1}, largest singular value. (b) **v**_{3}, third largest singular value.

For the examples that follow, we choose patches of 20 mm geodesic radius on the cortical surface extracted from an MRI of the subject. Note that geodesic distances calculated using the edges of triangles representing the cortical surface are greater than the true geodesic distance. Figures 13 and 14 illustrate typical 20 mm patches. We choose three spatial bases per patch for each STE since the majority of measurable activity in any patch is represented by the first three singular vectors, as indicated in Fig. 3.

We represent the temporal aspect of STEs for our examples by dividing the total time extent of interest into time windows that overlap by 50%. A wavelet basis that spans the frequency band of interest (0–100 Hz) is used for each window. We have found that wavelet bases result in smooth transitions at the window edges. We use 32 wavelets per time interval, so the number of coefficients associated with each STE is 96 (**Θ**_{i,j} is 3 by 32).
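The windowing arithmetic can be sketched as follows; a 256-sample record with 64-sample windows is assumed purely for illustration (treating one sample per millisecond), matching the seven 64 ms windows used in Section 3.2:

```python
# Sketch: start indices of time windows that overlap their neighbors by
# 50%, as used to partition the measurement period.
def window_starts(n_samples, win_len):
    """Starts of 50%-overlapping windows of length win_len."""
    hop = win_len // 2
    return list(range(0, n_samples - win_len + 1, hop))

starts = window_starts(256, 64)   # illustrative 256-sample record

# With 3 spatial bases and 32 wavelets per window, each STE carries
# 3 * 32 = 96 coefficients.
coeffs_per_ste = 3 * 32
```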

We test the ESP approach using simulated data from a 275 channel CTF MEG system. The simulated cortical signal consists of three STEs: activation of a patch in both the left and right primary auditory cortex with a slight offset in time, followed by a patch of activity in the motor cortex. All three simulated patches have geodesic radii of roughly 10 mm. The patches used as simulated sources do not correspond to patches used in the ESP procedure; the simulated source patches are smaller and have different center locations. Furthermore, the simulated spatial activity pattern within each patch (raised cosine) does not correspond to the singular vectors used to represent any patch. Lastly, the temporal evolution of the signal, although primarily low-pass, does not correspond directly to the wavelet bases used in reconstruction. The measurements are simulated using Eq. (1) for 60 trials, with **N** consisting of spatially colored noise drawn from a *N*(**0**, **I** ⊗ **R**) distribution, where **R** was calculated from the prestimulus portion of an MEG experiment. The noise level is set to achieve an average SNR (defined as ${\Vert \mathbf{\text{HX}}\Vert}_{F}^{2}/E[{\Vert \mathbf{\text{N}}\Vert}_{F}^{2}]$) of −6 dB after averaging over all epochs. Figure 5 illustrates the time courses and spatial extents of the three space-time events, as well as the noisy measurements.
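A minimal sketch of setting the noise level to hit a target average SNR under the definition above; the signal and noise arrays are synthetic stand-ins for **HX** and **N**:

```python
import numpy as np

# Sketch: scale additive noise so that SNR = ||HX||_F^2 / E[||N||_F^2]
# equals a target value in dB.  Arrays below are synthetic stand-ins.
rng = np.random.default_rng(1)
signal = rng.standard_normal((275, 128))   # plays the role of HX
noise = rng.standard_normal((275, 128))

target_snr_db = -6.0
target_snr = 10.0 ** (target_snr_db / 10.0)

# Scale noise so the realized SNR matches the target exactly.
scale = np.sqrt(np.sum(signal**2) / (target_snr * np.sum(noise**2)))
noise *= scale

snr_db = 10.0 * np.log10(np.sum(signal**2) / np.sum(noise**2))
```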

Simulated three source scenario. True source location and spatial extent of sources at times (a) 12, (b) 30, and (c) 116 ms. (d) True source time courses. (e) Simulated signals in noise for SNR = -6 dB.

After whitening the data as described in Section 2.7 using a noise covariance matrix estimated by removing the mean from each trial, we apply ESP reconstruction to the simulated signal using the STEs described in Section 3.1. We use 7 overlapping time windows of 64 ms each. Combined with the 393 spatial patches, this results in a total of 2751 candidate STEs, each with 96 coefficients.
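The bookkeeping for the candidate set can be checked directly (a minimal sketch of the counts stated above):

```python
# Candidate STE count: 393 spatial patches x 7 overlapping time windows,
# with 3 spatial bases and 32 wavelets per window (Sections 3.1-3.2).
n_patches, n_windows = 393, 7
n_spatial_bases, n_wavelets = 3, 32

n_candidate_stes = n_patches * n_windows        # total candidate STEs
coeffs_per_ste = n_spatial_bases * n_wavelets   # coefficients per STE
```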

The heuristic of Section 2.6 is used to select λ. The result of debiasing the ESP reconstruction for this λ is shown in Figs. 6 and 7. All three STEs are recovered at the correct locations in space and time. Activity is spread spatially around the true location. Some of this spreading is due to the use of larger patches in the ESP reconstruction, while some spreading is inherent to the ill-posed nature of the inverse problem. We also point out that some temporal spreading occurs as a result of noise lying in the subspace defined by fixed time windows.

Cortical signal estimate from ESP before (a – c) and after (d – f) debiasing. Estimates shown for peak signal times: (a,d) 12 ms, (b,e) 30 ms, and (c,f) 116 ms.

(a) Noise-free simulated data. (b) ESP reconstruction of sensor measurements before debiasing. (c) ESP reconstruction of sensor measurements after debiasing. Estimated temporal evolution of: (d) a single dipole in the right auditory cortex, (e) a single dipole in the left auditory cortex, and (f) a single dipole in the motor cortex.

This temporal spreading is evident between 97 and 105 ms, where the motor cortex exhibits low-level activity in the reconstruction, though the true motor activity starts at 106 ms. The fixed time window in which the motor cortex is active begins at 97 ms. The reconstructed cortical activity in Fig. 7(d) – 7(f) is consistent with the true sources depicted in Fig. 5(d).

We now illustrate the flexibility of ESP by employing different sets of temporal basis functions for the **T**_{j}. We also show the advantage of assuming sparsity in the temporal domain. First, we use the same seven overlapping time windows, but perform singular value decomposition (SVD) of the noisy data in each window and keep the two most significant singular vectors for each window as the temporal basis.

(a) Reconstructed sensor measurements using multiple time windows and SVD temporal basis functions. (b) Reconstructed sensor measurements using single time window and SVD temporal basis functions.

Reconstructed cortical signal using a single temporal block at time: (a,d) 12 ms (b,e) 30 ms (c,f) 116 ms.

In our final variation on this example, we illustrate an alternative way to exploit sparsity in the temporal domain. Here we consider a scenario involving a “slow” event which spans a long period of time and two “fast” events spanning much shorter time periods. These signals are illustrated in Fig. 10. Simulated measurements are generated using Eq. (1) assuming white noise at 0 dB. We apply ESP using multiscale temporal blocks. For **T**_{1} we use 32 low frequency, large scale wavelets which occupy the entire time period. In contrast, **T**_{2} and **T**_{3} consist of 32 wavelets each spanning, respectively, the first and second half of the 256 ms measurement period. Thus the temporal blocks implement a time-frequency partition of the data. We illustrate the resulting reconstruction (after heuristic λ selection and debiasing) in Fig. 11. ESP estimates the “slow” portion of the data using the large scale wavelets (**T**_{1}). The “fast” components are reconstructed with the shorter scale, higher frequency bases of **T**_{2}. **T**_{3} is identified by ESP as inactive. This example illustrates the use of temporal sparsity to identify events spanning different time scales. Similarly, by appropriate basis selection, temporal sparsity can be used to identify events spanning different frequency bands.
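A rough sketch of such a multiscale time-frequency partition follows. Simple boxcar functions stand in for the actual wavelet families of **T**_{1}–**T**_{3}, and the sizes mirror the 256 ms example; only the block structure (one block spanning the full period, two blocks spanning the halves) is taken from the text:

```python
import numpy as np

# Sketch: multiscale temporal blocks.  The T_1 block collectively covers
# all 256 samples, while T_2 and T_3 each cover one half.  Boxcar
# functions are illustrative stand-ins for wavelets.
n = 256

def boxcar_basis(start, length, n_funcs, total):
    """n_funcs orthonormal, non-overlapping boxcars on [start, start+length)."""
    T = np.zeros((total, n_funcs))
    step = length // n_funcs
    for k in range(n_funcs):
        T[start + k * step : start + (k + 1) * step, k] = 1.0 / np.sqrt(step)
    return T

T1 = boxcar_basis(0, n, 32, n)            # block covering the whole period
T2 = boxcar_basis(0, n // 2, 32, n)       # block covering the first half
T3 = boxcar_basis(n // 2, n // 2, 32, n)  # block covering the second half
```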

We now conduct a series of randomized simulations to illustrate the performance of the ESP procedure under various signal conditions. Two spatially extended cortical sources of activity are present in each simulation. Each source follows a damped cosine time course as shown in Fig. 12. Trials with less than about 25 ms between the onset of source 1 and source 2 exhibit temporal overlap at the sensors. For example, the onset of source 2 is 14 ms later than that of source 1 in Fig. 12 and significant temporal overlap is present. The location, size, and starting time of each source are selected at random to provide a rich variety of source configurations. Specifically, the center point of each event is selected uniformly at random from all 24514 voxels on the cortical surface. The geodesic radius of each source is selected uniformly at random between 8 and 40 mm. The amplitude of activity of each dipole in a source is given by
$\text{cos}\left(\frac{1.5\mathrm{\Delta}}{r}\right)$ where Δ is the geodesic distance from the center of the source and *r* is the geodesic radius of the source. Finally, the onset of activity in each source is selected uniformly at random between 0 and 75 ms. The measurement period is 128 ms and the average SNR is set to 0 dB.
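The amplitude profile above is straightforward to implement directly:

```python
import math

# Sketch of the simulation's spatial amplitude profile: each dipole's
# amplitude is cos(1.5 * Delta / r), where Delta is the geodesic distance
# from the source center and r is the source's geodesic radius.
def dipole_amplitude(delta, r):
    return math.cos(1.5 * delta / r)

# Amplitude is 1 at the source center and falls to cos(1.5) ~ 0.07 at
# the edge of the source (delta = r).
```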

ESP reconstruction is performed using the 20 mm radius spatial bases and 64 ms duration wavelet temporal bases of Sec. 3.1 with three overlapping time windows. We calculate the distance error between the center of the true source and the center of the debiased ESP estimate of that source at the peak time sample. The center is found by taking a signal-energy weighted average of all active voxels. When sources overlap in time, a weighted k-means algorithm is used to separate voxels into two groups (one for each source). The SNRs of sources one and two are defined as ${\Vert {\mathbf{\text{HX}}}_{1}\Vert}_{F}^{2}/E[{\Vert \mathbf{\text{N}}\Vert}_{F}^{2}]$ and ${\Vert {\mathbf{\text{HX}}}_{2}\Vert}_{F}^{2}/E[{\Vert \mathbf{\text{N}}\Vert}_{F}^{2}]$, respectively. These values depend on both the source spatial extent and the source location. Note that the individual source SNRs do not sum to the overall SNR of 0 dB because of spatio-temporal interaction between the sources. The heuristic of Section 2.6 is used to select λ in each trial. Table 1 summarizes the results of 10 trials.
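The energy-weighted centering step can be sketched as follows; the voxel positions and energies are illustrative, not taken from any trial:

```python
import numpy as np

# Sketch: signal-energy weighted centroid of the active voxels, as used
# to score centroid localization error.
def weighted_centroid(positions, energies):
    """positions: (n, 3) voxel coordinates; energies: (n,) signal energies."""
    w = np.asarray(energies, dtype=float)
    p = np.asarray(positions, dtype=float)
    return (p * w[:, None]).sum(axis=0) / w.sum()

pos = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
c = weighted_centroid(pos, [1.0, 3.0])  # pulled toward the stronger voxel
```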

Figure 13 illustrates ESP spatial reconstruction for both sources in Trial 6. Some spatial spreading occurs around both sources. ESP identifies low level activity in the region of source 2 before the activity begins. This is a result of noise present between the edge of the active time window and the onset of activity. Figure 13(d) shows the ESP reconstruction at the time of the peak source 2 activity. This time point corresponds roughly to the second temporal peak of the source 1 activity (see Fig. 12), so activity associated with source 1 is visible.

Spatial reconstructions for Trial 8 are shown in Fig. 14. Here again spatial spreading is evident. Though the correct location is included in the solution for both sources, the peak of activity in both cases is biased.

We conclude this example with an illustration of EM algorithm convergence. Figure 15 shows the number of EM iterations required to find each **Θ^**(λ) for all ten trials of Table 1. For nearly all trials, fewer than 50 iterations are required when λ ≥ 0.3λ_{MAX}.

We also applied ESP to MEG measurements of a subject performing self-paced finger tapping. The data were collected with a Magnes 3600 whole-head system at a sampling rate of 678 Hz and band-pass filtered to lie between 2 and 40 Hz. The measured data were grouped into 80 epochs of 755 ms duration (512 samples) referenced to the tap at time 0 detected via a switch closure. Whitening was performed via Eq. (16) where **R**_{N} was estimated by removing the average response from all epochs and calculating a sample covariance matrix by averaging over time and epochs. The averaged sensor measurements before and after whitening are shown in Fig. 16. Cortical patches of 20 mm geodesic radius are used to construct the spatial bases.

The debiased reconstruction shown in Fig. 17 is calculated using the heuristic of Section 2.6 to select λ. The estimate of activity begins about 200 ms pre-tap and persists through the end of the measurement period. The earliest activity occurs in the left motor cortex, which corresponds with the action of initiating the finger tap. The next region to become active spans the somatosensory and posterior parietal cortex. The somatosensory activity likely reflects the sensation of pushing down the switch. The posterior parietal cortex is involved in spatial relationships, body image, and movement planning (Bear et al., 2001). The parietal lobe activity begins at low levels about 100 ms before the switch closes and continues throughout the measurement period. It is possible that the early onset of activity in the parietal lobe is a result of noise in the same temporal subspace as an active time window. Beginning at about 0 ms the ESP reconstruction includes a small patch on the left frontal lobe. Some weak activity is also identified in the right hemisphere, primarily in the parietal lobe. This is consistent with earlier studies' finding that the right parietal cortex plays a role in spatial processing (Culham et al., 2006).

The ESP approach searches for a small number of STEs which best explain the measurements. An appropriately chosen basis function representation for each STE reduces the number of parameters in the inverse problem by only representing activity that is measurable at the M/EEG sensors. Thus, our approach can incorporate the inherent limitations of the forward problem to formulate a more tractable inverse problem. The spatial bases can be chosen to represent extended sources such as patches or multipoles. The temporal basis functions naturally constrain the solution based on temporal features of interest, such as the time-frequency characteristics. Use of both spatial and temporal bases allows ESP to automatically determine both where and when significant activity occurs. A similar joint spatio-temporal philosophy based on beamforming has been proposed recently by Dalal et al. (2008).

We have illustrated the effectiveness of the ESP approach on simulated data and MEG recordings from a finger tapping paradigm. The simulated example in Sec. 3.2 shows the benefit of exploiting spatio-temporal sparsity in a three source scenario. The randomized two source scenarios in Sec. 3.3 indicate that ESP is capable of recovering almost all cortical sources with the majority of centroid localization errors less than 1 cm. The centroid localization errors generally decrease as the source SNR increases. In Trials 3 and 7, the weaker of the two sources was not detected using ESP due to the very low SNRs associated with these sources. In the self-paced finger tapping experiment of Sec. 3.4, ESP recovers spatio-temporal activity consistent with known brain behavior.

Our examples include source activity of varying complexity. In some cases, individual events have similar powers. Others include a weak source and a strong source. The spatial separation and temporal overlap of sources also vary. Evaluating the degradation in performance of ESP (or any other approach) as a function of increasing source complexity is an important problem. Unfortunately, quantifying this degradation is difficult, as “source complexity” can include both the number of events and the pattern of spatial and/or temporal activity of each source. We believe the use of temporal priors in the ESP approach renders it more robust to degradation due to source complexity than methods which use only spatial priors. Sources falling in different temporal subspaces are easier to separate with ESP than simple ℓ_{1} spatial regularization at a single time point.

The use of sparse representations in the spatial and temporal/frequency domains for M/EEG is not unique to our approach. Sparsity in the spatial domain is enforced by limiting the number of dipolar, multipolar, patch based, or “meso-” sources used to represent measured data, usually via an ℓ_{1} penalty (Auranen et al., 2005; Daunizeau and Friston, 2007; Ding and He, 2007; Gorodnitsky and Rao, 1997; Huang et al., 2006; Jeffs et al., 1987; Matsuura and Okabe, 1995, 1997; Ou et al., 2009; Uutela et al., 1999). Sparsity in the time domain has been used both implicitly and explicitly in M/EEG. One common implicit approach is to project measured data onto a few temporal basis functions derived from a singular value decomposition of the data (Huang et al., 2006; Ou et al., 2009). The more explicit approaches often involve finding a small number of wavelets which represent the signal well. This is accomplished via thresholding of wavelet coefficients (Effern et al., 2000; Trejo and Shensa, 1999) or wavelet shrinkage approaches (Trujillo-Barreto et al., 2008). A similar approach using damped sinusoids instead of wavelets is proposed by Demiralp et al. (1998).

The ESP criterion involves the weighted combination of the squared fitting error and a penalty designed to encourage solutions consisting of a small number of STEs. The resulting cost function is convex and is minimized with a novel EM algorithm. The EM algorithm efficiently identifies a parameterized family of solutions that balance sparsity against squared error. By choosing appropriate bases and minimizing the number of STEs, ESP solutions avoid the “spiky” spatial and temporal artifacts typical of ℓ_{1} methods that minimize the number of active dipoles at individual time instants.

We propose choosing the penalty weight by evaluating the number of STEs in the solution as the weight decreases from its maximum value. The number of STEs increases relatively slowly with decreasing weight while unrepresented signal components remain, and then increases very rapidly once the STEs entering the solution begin to represent noise. Hence, the weight value at which the number of STEs begins to increase rapidly is a good choice. We present one automated approach for locating this transition by fitting a curve to the number of STEs as a function of the weight.
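The idea behind the heuristic can be sketched as follows; the "largest jump" rule below is an illustrative stand-in for the paper's curve-fitting procedure, and the λ values and STE counts are invented:

```python
# Sketch: sweep lambda downward from lambda_max, count active STEs at
# each value, and pick the lambda just before the count grows rapidly.
def pick_lambda(lambdas, ste_counts):
    """lambdas: decreasing sequence; ste_counts: active STEs at each lambda."""
    jumps = [b - a for a, b in zip(ste_counts, ste_counts[1:])]
    knee = max(range(len(jumps)), key=lambda k: jumps[k])
    return lambdas[knee]   # last lambda before the sharp increase

# Invented example: the count explodes below lambda = 0.3.
lam = pick_lambda([1.0, 0.5, 0.3, 0.1], [1, 3, 5, 40])
```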

The ESP approach differs significantly from recent work on ℓ_{1} regularization. In most existing ℓ_{1} approaches, the regularization term penalizes every active dipole via ‖vec (**X**) ‖_{1} or, if applied in the context of a basis expansion (Eq. (4)), penalizes every active coefficient via ‖vec(**Θ**) ‖_{1}. In contrast, ESP gives a smaller penalty to active coefficients in the same block than to active coefficients in different blocks. This promotes solutions consisting of a few active spatio-temporal blocks, rather than a few active coefficients or dipoles, while accommodating uncertainty in the spatio-temporal evolution of each block. Given appropriate spatial and temporal bases for each STE, ESP favors solutions which are connected spatially and vary smoothly over time as opposed to the “spiky”, disconnected results often seen in standard ℓ_{1} minimization.
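The contrast between the two penalties can be sketched numerically; the block shapes mirror the 3-by-32 coefficient blocks of Section 3.1, while the block contents are random stand-ins:

```python
import numpy as np

# Sketch: a plain l1 penalty on all coefficients vs. the block penalty,
# the sum over blocks (i, j) of ||vec(Theta_ij)||_2.
rng = np.random.default_rng(2)
blocks = {                          # hypothetical STE coefficient blocks
    (0, 0): rng.standard_normal((3, 32)),
    (1, 2): np.zeros((3, 32)),      # an inactive STE costs nothing
}

l1_penalty = sum(np.abs(B).sum() for B in blocks.values())
block_penalty = sum(np.linalg.norm(B.ravel()) for B in blocks.values())
```

Because the ℓ2 norm of a vector never exceeds its ℓ1 norm, coefficients grouped inside one active block are penalized less than the same coefficients scattered across blocks, which is what drives solutions toward a few active STEs.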

The mesostate model recently proposed by Daunizeau and Friston (2007) also treats cortical activity as a collection of a small number of events, or meso-sources, which are identified through a Variational Bayesian (VB) inversion. The mesostate approach differs in a number of ways from the ESP approach. First, the maximum number of meso-sources must be known a priori. In Daunizeau and Friston (2007), this number is between one and eight. Second, the VB inversion “switches off” mesostates with little or no evidence. The ESP approach, on the other hand, works by “switching on” STEs with strong evidence from a large number of candidate STEs. Third, each meso-source is described by a mean time course and spatial location. Dipoles comprising a meso-source are modeled as perturbations of this mean activity. An STE, on the other hand, describes any measurable activity occurring in a local region of space and time, and thus is not limited to representing average activity.

Another recent source imaging method proposed by Ou et al. (2009) uses a cost function similar to our ESP cost function. Unlike our approach, Ou et al. (2009) do not exploit potential sparsity in the temporal and/or frequency domains to separate sources. Instead, the measured data is projected onto a few dominant singular vectors of the data, similar to the VESTAL approach. As a result, these bases span the entire measurement period under study. The main downside to this approach is that, unlike ESP, it cannot resolve temporally localized activity (compare Figs. 6 and 9). Furthermore, the method proposed in Ou et al. (2009) decimates the mesh representing the cortex, using single dipolar sources to represent larger cortical areas. A third difference between ESP and the Ou et al. (2009) approach is the method by which the cost function is minimized. Ou et al. (2009) use second-order cone programming (SOCP) on problems with far fewer parameters. ESP uses a novel EM algorithm which scales linearly with problem dimensions, whereas SOCP scales cubically. SOCP approaches are computationally intractable for the problem sizes considered in this paper.

Several block penalty methods similar to our ESP approach have been proposed (Malioutov et al., 2005; Tropp, 2006; Turlach et al., 2005; Yuan and Lin, 2006), though none of them in the context of M/EEG. All of these aim to find solutions which are sparse spatially, but not temporally, i.e., are sparse in only one of the dimensions. Also, all but Yuan and Lin (2006) construct blocks containing only a single spatial basis repeated over multiple time samples. This corresponds to defining blocks based on single rows of **X**. Malioutov et al. (2005) penalize the sum of the ℓ_{2} norms of each row of the signal matrix in direction of arrival (DOA) estimation for radar. SOCP is used to perform the optimization. Both Turlach et al. (2005) and Tropp (2006) use the sum of ℓ_{∞} norms as a penalty, but again consider only single rows of the signal matrix as blocks and do not employ a temporal decomposition. Turlach et al. (2005) solve a constrained version of the ESP problem via an interior point method. Tropp (2006) does not specify an optimization algorithm, but instead provides useful analysis of the properties of minimizers. Yuan and Lin (2006) suggest essentially fixed point iteration on the nonlinear functions comprising the subgradient condition for a minimizer of Eq. (6). However, they indicate that computational complexity limits their approach to small problems. None of these earlier works consider the cost function as a Bayesian likelihood, nor does there appear to be any published Expectation-Maximization approach to these “block-sparse” problems.

The choice *p* = 2 in Eq. (5) results in a convex optimization and appears to yield very good results. However, modifying the M-Step in Eq. (14) slightly allows solution of Eq. (5) for other values of *p* as well. In this case Eq. (11) is written:

$${\widehat{\mathbf{\Theta}}}^{(k)}=\text{arg}\phantom{\rule{0.2em}{0ex}}\underset{\mathbf{\Theta}}{\text{min}}\left\{{\Vert \mathbf{\Theta}-{\widehat{\mathbf{\text{Z}}}}^{(k)}\Vert}_{F}^{2}+c\mathrm{\lambda}{\sum}_{i,j}{\Vert \text{vec}\left({\mathbf{\Theta}}_{i,j}\right)\Vert}_{p}\right\}$$

(20)

Exact solutions to the M-Step represented by Eq. (20) can be obtained for the cases *p* = 1 and *p* = ∞ in addition to *p* = 2. The Generalized Expectation-Maximization approach (Gellman et al., 2003) can be used to solve Eq. (20) for other values of *p*.
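For *p* = 2, the M-step has the well-known block soft-threshold closed form: each block of **Ẑ**^{(k)} is shrunk toward zero, and blocks whose norm falls below the threshold are switched off entirely. A minimal sketch, with an illustrative threshold value:

```python
import numpy as np

# Sketch: closed-form minimizer of ||theta - z||_2^2 + clam * ||theta||_2
# (the p = 2 case of the M-step), i.e. block soft thresholding.
def block_soft_threshold(z, clam):
    norm = np.linalg.norm(z)
    if norm <= clam / 2.0:
        return np.zeros_like(z)      # block is switched off entirely
    return (1.0 - clam / (2.0 * norm)) * z

z = np.array([3.0, 4.0])             # ||z||_2 = 5
theta = block_soft_threshold(z, 4.0) # shrink factor 1 - 4/(2*5) = 0.6
```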

Using a set of spatial basis functions based on cortical patches and wavelet temporal bases, we demonstrated the effectiveness of ESP regularization on simulated cortical signals as well as data collected from a self-paced finger tapping experiment. These specific basis functions are not required to use ESP; other methods for constructing the **S**_{i} and **T**_{j} may be used.

An anatomical approach to patch design is used in the recently proposed MiMS technique (Cottereau et al., 2007). The spatial extent of each patch is chosen based on the local characteristics of the cortical surface. A multipolar expansion of fixed order is used to represent the activity from each candidate patch. MiMS employs a coarse-to-fine strategy to identify active regions. It begins with large patches, finds a subset of active patches that best fit the data using cross validation, then subdivides the active patches and repeats the patch selection/subdivision process until a fitting criterion is satisfied. MiMS does not address selection of temporal areas of interest or solve for activity jointly over space and time.

The assumptions behind ESP apply to a wide range of experimental paradigms and can accommodate varying specific assumptions through the choice of basis functions. Simulated and actual MEG data suggest ESP is effective at localizing cortical activity in both space and time.

We show that **θ*** given by Eq. (13) is the minimizer of $f(\mathbf{\theta})={\Vert \mathbf{\theta}-\mathbf{\text{Z}}\Vert}_{2}^{2}+\frac{{\alpha}^{2}\mathrm{\lambda}}{{\sigma}^{2}}{\Vert \mathbf{\theta}\Vert}_{2}$ using facts from the theory of subgradients (Rockafellar, 1970). A vector **g** is a subgradient of a convex function *f*(**θ**) at **θ**_{0} if *f*(**θ**) ≥ *f*(**θ**_{0}) + **g**^{T}(**θ** − **θ**_{0}) for all **θ**.

It is easy to see that the subdifferential of ‖**θ**‖_{2} is {**θ**/‖**θ**‖_{2}} when **θ** ≠ **0**, and the set {**g** : ‖**g**‖_{2} ≤ 1} when **θ** = **0**.
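Setting the subgradient of the single-block objective to zero yields the block soft-threshold form sketched below; this reconstruction is intended to be consistent with Eq. (13), which is not reproduced in this excerpt, so the exact constants should be checked against the original:

```latex
% Sketch: minimizer of f(\theta) = \|\theta-\mathbf{Z}\|_2^2
%         + (\alpha^2\lambda/\sigma^2)\|\theta\|_2
\theta^{\ast} =
\begin{cases}
\left(1 - \dfrac{\alpha^{2}\lambda}{2\sigma^{2}\,\lVert \mathbf{Z} \rVert_{2}}\right)\mathbf{Z},
  & \lVert \mathbf{Z} \rVert_{2} > \dfrac{\alpha^{2}\lambda}{2\sigma^{2}},\\[1ex]
\mathbf{0}, & \text{otherwise.}
\end{cases}
```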

We show that **Θ^**(λ) = **0** for λ ≥ λ_{MAX} given in Eq. (15). We again use the theory of subgradients (Rockafellar, 1970). We first rewrite the objective function of Eq. (6) as $f(\mathbf{\theta})={\Vert \mathbf{\text{Y}}-{\sum}_{i,j}({\mathbf{\text{T}}}_{j}\otimes {\mathbf{\text{HS}}}_{i}){\mathbf{\theta}}_{i,j}\Vert}_{2}^{2}+\mathrm{\lambda}{\sum}_{i,j}{\Vert {\mathbf{\theta}}_{i,j}\Vert}_{2}$.

The authors thank the MEG Lab at the University of Texas Health Science Center Houston for the finger tapping data.

^{2}It can be shown that the minimizer of ${\Vert {\mathbf{\text{Y}}}_{t}-{\mathbf{\text{HX}}}_{t}\Vert}_{2}^{2}+\mathrm{\lambda}{\Vert {\mathbf{\text{X}}}_{t}\Vert}_{2}$ is also the minimizer of ${\Vert {\mathbf{\text{Y}}}_{t}-{\mathbf{\text{HX}}}_{t}\Vert}_{2}^{2}+\stackrel{~}{\mathrm{\lambda}}{\Vert {\mathbf{\text{X}}}_{t}\Vert}_{2}^{2}$ for some $\stackrel{~}{\mathrm{\lambda}}$.

^{3}Although the problem formulation is the same, we solve the problem with a different algorithm.

^{4}In ML estimation, the M-Step is to maximize the Q-function alone; however, in the case of MAP estimation, the log likelihood of Θ is included as well (see Dempster et al. (1977)).


- Auranen T, Nummenmaa A, Hämäläinen M, Jääskeläinen I, Lampinen J, Vehtari A, Sams M. Bayesian analysis of the neuromagnetic inverse problem with ℓ^{p}-norm priors. NeuroImage. 2005;26(3):870–884. [PubMed]
- Baillet S, Garnero L. A Bayesian approach to introducing anatomo-functional prior in the EEG/MEG inverse problem. IEEE Trans Biomed Eng. 1997;44(5):374–385. [PubMed]
- Baillet S, Mosher J, Leahy R. Electromagnetic brain mapping. IEEE Signal Processing Magazine. 2001;18(6):14–30.
- Bear M, Connor B, Paradiso M. Neuroscience: Exploring the Brain. 2nd. Lippincott, Williams and Wilkins; Philadelphia, PA: 2001.
- Bolstad A, Van Veen B, Nowak R. Magneto-/electroencephalography with space-time sparse priors. IEEE Statistical Signal Processing Workshop; Madison, WI. Aug 2007. pp. 190–194.
- Brewer J. Kronecker products and matrix calculus in system theory. IEEE Transactions on Circuits and Systems. 1978;CAS-25(9):772–781.
- Candes EJ, Tao T. Decoding by linear programming. IEEE Transactions on Information Theory. 2005;51(12):4203–4215.
- Cottereau B, Jerbi K, Baillet S. Multiresolution imaging of meg cortical sources using an explicit piecewise model. NeuroImage. 2007;38(3):439–451. [PubMed]
- Culham JC, Cavina-Pratesi C, Singhal A. The role of parietal cortex in visuomotor control: What have we learned from neuroimaging? Neuropsychologia. 2006;44(13):2668–2684. [PubMed]
- Dalal S, Guggisberg A, Edwards E, Sekihara K, Findlay A, Canolty R, Berger M, Knight R, Barbaro N, Kirsch H, Nagarajan S. Five-dimensional neuroimaging: Localization of the time-frequency dynamics of cortical activity. NeuroImage. 2008;40(4):1686–1700. [PMC free article] [PubMed]
- Daunizeau J, Friston K. A mesostate-space model for EEG and MEG. NeuroImage. 2007;38(1):67–81. [PubMed]
- Daunizeau J, Mattout J, Clonda D, Goulard B, Benali H, Lina J. Bayesian spatio-temporal approach for eeg source reconstruction: Conciliating ecd and distributed models. IEEE Transactions on Biomedical Engineering. 2006;53(3):503–516. [PubMed]
- Demiralp T, Ademoglu A, Istefanopulos Y, Gülçür H. Analysis of event-related potentials (erp) by damped sinusoids. Biological Cybernetics. 1998;78:487–493. [PubMed]
- Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the em algorithm (with discussion) Journal of the Royal Statistical Society B. 1977;39:1–38.
- Ding L, He B. Proceedings of NFSI & ICFBI. Oct, 2007. Sparse source imaging in EEG.
- Dogandžić A, Nehorai A. Estimating evoked dipole responses in unknown spatially correlated noise with EEG/MEG arrays. IEEE Trans Signal Processing. 2000;48(1):13–25.
- Donoho D. De-noising by soft-thresholding. IEEE Transactions on Information Theory. 1995;41(3):613–627.
- Donoho D. For most large underdetermined systems of linear equations, the minimal ell-1 norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics. 2006;59(6):797–829.
- Effern A, Lehnertz K, Fernández G, Grunwald T, David P, Elger C. Single trial analysis of event related potentials: non-linear de-noising with wavelets. Clinical Neurophysiology. 2000;111:2255–2263. [PubMed]
- Figueiredo M, Nowak R. An em algorithm for wavelet-based image restoration. IEEE Transactions on Image Processing. 2003;12(8):906–916. [PubMed]
- Gellman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis. 2nd. CRC Press; Boca Raton, FL: 2003.
- Gorodnitsky I, Rao B. Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. IEEE Transactions on Signal Processing. 1997;45(3).
- Huang M, Dale A, Song T, Halgren E, Harrington D, Podgorny I, Canive J, Lewis S, Lee R. Vector-based spatial-temporal minimum l1-norm solution for meg. NeuroImage. 2006;31:1025–1037. [PubMed]
- Jeffs B, Leahy R, Singh M. An evaluation of methods for neuromagnetic image reconstruction. IEEE Transactions on Biomedical Engineering. 1987;34:713–723. [PubMed]
- Limpiti T, Van Veen B, Wakai R. Cortical patch basis model for spatially extended neural activity. IEEE Transactions on Biomedical Engineering. 2006;53(9):1740–1754. [PubMed]
- Malioutov D, Çetin M, Wilsky A. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Transactions on Signal Processing. 2005;53(8):3010–3022.
- Matsuura K, Okabe Y. Selective minimum-norm solutions of the biomagnetic inverse problem. IEEE Transactions on Biomedical Engineering. 1995;42(6):608–615. [PubMed]
- Matsuura K, Okabe Y. A robust reconstruction of sparse biomagnetic sources. IEEE Transactions on Biomedical Engineering. 1997;44(8):720–726. [PubMed]
- Mosher J, Leahy R, Shattuck D, Baillet S. MEG source imaging using multipolar expansions. Proc IPMI'99, Lecture Notes in Computer Science, Vol. 1613. Springer-Verlag; 1999.
- Mosher J, Lewis P, Leahy R. Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Trans Biomed Eng. 1992;39(6):541–557. [PubMed]
- Nolte G, Curio G. Current multipole expansion to estimate lateral extent of neuronal activity: A theoretical analysis. IEEE Trans Biomed Eng. 2000;47(10):1347–1355. [PubMed]
- Ou W, Hämäläinen M, Golland P. A distributed spatio-temporal eeg/meg inverse solver. NeuroImage. 2009;44(3) [PMC free article] [PubMed]
- Pascual-Marqui RD, Michel CM, Lehmann D. Low resolution electromagnetic tomography: a new method for localizing electrical activity in the brain. International Journal of Psychophysiology. 1994;18:49–65. [PubMed]
- Rockafellar RT. Convex Analysis. Princeton University Press; Princeton, New Jersey: 1970.
- Scherg M, von Cramon D. Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalogr Clin Neurophysiol. 1985;62:32–44. [PubMed]
- Trejo L, Shensa M. Feature extraction of event-related potentials using wavelets: An application to human performance monitoring. Brain and Language. 1999;66:89–107. [PubMed]
- Tropp J. Algorithms for simultaneous sparse approximation, part ii: Convex relaxation. Signal Processing. 2006;86:589–602.
- Trujillo-Barreto N, Aubert-Vázquez, Penny W. Bayesian m/eeg source reconstruction with spatio-temporal priors. NeuroImage. 2008;39(1):318–335. [PubMed]
- Turlach B, Venables W, Wright S. Simultaneous variable selection. Technometrics. 2005;47(3):349–363.
- Uutela K, Hämäläinen M, Somersalo E. Visualization of magnetoencephalographic data using minimum current estimates. NeuroImage. 1999;10:173–180. [PubMed]
- Van Veen B, Van Drongelen W, Yuchtman M, Suzuki A. Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Trans Biomed Eng. 1997;44(9):867–880. [PubMed]
- Wagner M, Fuchs M, Kastner J. Proceedings of the 13th International Conference on Biomagnetism. Jena; Germany: Aug, 2002. Current density reconstructions and deviation scans using extended sources.
- Wagner M, Köhler T, Fuchs M, Kastner J. An extended source model for current density reconstructions. In: Nenonen J, Ilmoniemi R, Katila T, editors. Proceedings of the 12th International Conference on Biomagnetism; Espoo, Finland: Espoo: Helsinki Univ of Technology; Aug, 2000. pp. 749–752.
- Wright S, Nowak R, Figueiredo M. Sparse reconstruction by separable approximation. IEEE International Conference on Acoustics, Speech, and Signal Processing; Las Vegas, NV. 2008.
- Wu C. On the convergence properties of the em algorithm. Annals of Statistics. 1983;11:95–103.
- Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B. 2006;68:49–67.
