Inverse problems are common in applied physics. They consist of estimating the parameters of a model from incomplete and noisy measurements. Examples are tomography which is a key technology in the field of medical imaging, or identifying the targets using sonars and radars. Blind source separation, which is an active topic of research in audio-processing, also falls in this category. In this contribution, we target the localization of the brain regions whose neural activations produce electromagnetic fields measured by Magnetoencephalography (MEG) and Electroencephalography (EEG), which we will refer to collectively as M/EEG. The *sources* of M/EEG are current generators classically modeled by current dipoles. Given a limited number of noisy measurements of the electromagnetic fields associated to neural activity, the task is to estimate the positions and amplitudes of the sources that have generated the signals. By solving this problem, M/EEG become noninvasive methods for functional brain imaging with a high temporal resolution.

Finding a solution to an inverse problem requires finding a good model for the observed data given the model parameters: this is called the forward problem. The task in the inverse problem is to infer the model parameters given the measurements. This is particularly challenging for an under-determined problem where the number of parameters to estimate is greater than the number of measurements. In such settings, several different source configurations can explain the experimental data and additional constraints are needed to provide a sound solution. In addition, the solution may be highly sensitive to noise in the measurements. Such problems are said to be *ill-posed*. Note that even over-determined problems can be ill-posed.

The linearity of Maxwell’s equations implies that the signals measured by M/EEG sensors are linear combinations of the electromagnetic fields produced by all current generators. The linear operator, called

*gain matrix* in the context of M/EEG, predicts the fields measured by the sensors due to a configuration of sources (

Mosher et al. 1999). Computing the gain matrix accurately is particularly crucial for EEG, and involves complex numerical solvers (

Kybic et al. 2005,

Gramfort et al. 2010). In the M/EEG literature, solvers known as

*distributed inverse solvers* essentially seek to invert the gain matrix. In practice, the

*distribution* of estimated currents is defined over a discrete set of locations where are positioned current dipoles. The distribution is scalar valued when only their amplitudes are unknown, and vector valued when both amplitudes and orientations of the dipoles need to be estimated. The current generators are commonly assumed to lie on the cortex and are in practice fixed at the locations of the vertices of a cortical mesh (

Dale & Sereno 1993*a*). However, the number of generators largely exceeds the number of M/EEG sensors. To tackle this problem, one needs to use

*a priori* knowledge on the characteristics of a realistic source configuration.

The priors most commonly used in the M/EEG community are based on the

_{2} norm, leading to what is known as the

*Minimum Norm* (MN) inverse solver (

Wang et al. 1992,

Dale & Sereno 1993*a*,

Hämäläinen & Ilmoniemi 1994,

Pascual-Marqui et al. 1994). This MN inverse solver leads to a linear solution:

*i.e.,* Minimum Norm Estimates (MNE) are obtained by simple matrix multiplication (

Tikhonov & Arsenin 1977). This makes the estimation extremely fast. However,

_{2}-based solvers suffer from several limitations. Among which is the smearing of the even focal activations, often leading to overestimation of the extents of the activated areas. Also, they require to use a

*two-step* approach where the MNE are post-processed to obtain an interpretable picture of the spatio-temporal activation patterns (

Pantazis et al. 2003,

Gramfort et al. 2011). To address these limitations many alternatives to MNE have been proposed.

In the mid 90’s, Matsuura

*et al* (

Matsuura & Okabe 1995) proposed to regularize the amplitudes of the estimated sources with an

_{1} prior using an optimization procedure based on the simplex method. This approach was then later slightly modified by Uutela

*et al* (

Uutela et al. 1999), who called the

_{1} penalized solutions minimum-current estimates (MCEs). About the same time, Gorodnitsky

*et al* proposed to use Iterative Reweighted Least Squares (IRLS) with the FOCUSS algorithm (

Gorodnitsky et al. 1995) to approximate the solution that would be obtained with an

_{0} prior. Subsequently, (

Phillips et al. 1997) proposed a Bayesian approach based of Markov random fields (MRF) and solved with mean field annealing. All these approaches are motivated by the fact that realistic source configurations are likely to have only a limited number of active sites. For example, only a few brain regions are typically significantly activated by a given cognitive task. The source configuration is said to be spatially

*sparse*. This assumption has proved to be relevant for clinical applications (

Huppertz et al. 2001) and also justifies dipole fitting which is currently the most widely used method in clinical settings.

However, the above approaches suffer from significant limitations. As they promote a sparse solution independently at each time instant, they fail to recover the time courses of cortical sources. In order to go beyond these limitations, there has been a growing interest for methods that promote spatially sparse solutions while taking into the temporal dynamics of the data (

Phillips et al. 2005,

Friston et al. 2008,

Wipf & Nagarajan 2009,

Valdés-Sosa et al. 2009,

Haufe et al. 2008,

Ou et al. 2009). While the methods proposed in (

Phillips et al. 2005,

Friston et al. 2008,

Wipf & Nagarajan 2009) are related to sparse Bayesian learning where the problem boils down to the maximization of a non-convex cost function called the

*model evidence*, (

Haufe et al. 2008,

Ou et al. 2009) address the problem using a sparsity-inducing prior that mixes both

_{1} and

_{2} norms. A

_{1} prior is used to promote a spatially sparse solution and a smooth

_{2} prior is used either for orientations (

Haufe et al. 2008) or both time and orientations (

Ou et al. 2009), leading to a convex optimization problem. A problem is convex when it consists in minimizing a convex function over a convex set (

Boyd & Vandenberghe 2004). The main reason for the success of these solvers is the

*structured sparsity* induced by the

_{21} mixed-norm. illustrates source estimates with a simple

_{1} norm compared to a structured prior with a

_{21} mixed-norm. The latter leads to a structured sparsity pattern while a simple

_{1} norm provides a scattered pattern that is not consistent with what is known about the sources. Here, the

_{21} prior guarantees that the active source sites will be the same over the entire time interval of interest. Furthermore, grouping the temporal coefficients with an

_{21} norm is a natural way to take into account the smooth temporal dynamics of M/EEG data. More generally, mixed-norm based priors offer a general way to take the structure of a problem into consideration. We call solutions obtained with such priors Mixed-Norm Estimates (MxNE). For an application to other brain imaging methods, see, for example (

Varoquaux et al. 2010), where a two-level mixed norm was employed for the identification of brain networks using functional Magnetic Resonance Images (MRI) data.

Despite this growing interest, the use of sparsity-inducing priors is still limited to a small group of researchers. One possible reason is that solvers proposed so far are slow when applied to the analysis of real datasets. Another explanation is that algorithms proposed so far are complex and difficult to implement. Indeed, while a basic minimum norm can be computed in a few hundreds of milliseconds, sparse inverse solvers as proposed in (

Haufe et al. 2008,

Ou et al. 2009) can take an hour to converge when realistic dimensions are used. A challenge is therefore to develop efficient optimization strategies that can solve the M/EEG inverse with such priors in a few seconds.

In the last few years, the machine learning and signal processing communities have devoted a lot of efforts into the improvement of the optimization methods that help to solve non-differentiable problems arising when considering sparse priors. One reason is that, under certain conditions, it has been proved that

*sparsity* could enable the perfect resolution of ill-posed problems (

Donoho 2006,

Candés & Tao 2005). Among the list of algorithms that have been proposed are IRLS methods, similar to the FOCUSS algorithm, that consist in iteratively computing weighted MN solutions with weights updated after each iteration (

Li 1993,

Daubechies et al. 2008). The LARS-LASSO algorithm (

Tibshirani 1996,

Efron et al. 2004), which is a variant of the homotopy method from Osborne (

Osborne et al. 2000), is an extremely powerful method for solving the

_{1} problem. Simple coordinate descent methods (

Friedman et al. 2007) or blockwise coordinate descent, also called Block Coordinate Relaxation (BCR) (

Bruce et al. 1998), are also possible strategies. Alternatively, methods based on projected gradients and proximity operators have been proposed (

Daubechies et al. 2004,

Combettes & Wajs 2005,

Nesterov 2007*a*,

Beck & Teboulle 2009). Even if some MxNE can be obtained efficiently,

*e.g.*, with coordinate descent, the algorithms proposed in this contribution rely on proximal operators and gradient based methods as they provide a generic approach for all MxNE. They are also grounded on the current mathematical understanding and convergence properties of these solvers.

In this paper, we introduce efficient methods to compute mixed-norm estimates from M/EEG data. The three main contributions of this article are:

- We present the M/EEG inverse problem as a convex optimization problem and we explain how structured solutions can be promoted via appropriate priors based on mixed-norms.
- We present in detail optimization methods that outperform in terms of convergence speed previously proposed algorithms and derive optimality conditions to control the convergence of the algorithm.
- We then give two examples of MxNE that are relevant for M/EEG using two and three-level mixed-norms, including application to real data.

The first section of the paper provides the necessary background and notation. Mixed-norm estimates with two or three-level mixed-norms are introduced. The second contains the algorithmic and mathematical details of the optimization methods. The third section provides experimental results on real MEG data, demonstrating the efficiency and relevance of the proposed methods.