SIAM J Sci Comput. Author manuscript; available in PMC 2010 May 4.

Published in final edited form as: SIAM J Sci Comput. 2010 January 20; 31(6): 4533–4579. doi: 10.1137/080730822

PMCID: PMC2863111. NIHMSID: NIHMS184504.

A problem that arises in slice-selective magnetic resonance imaging (MRI) radio-frequency (RF) excitation pulse design is abstracted as a novel linear inverse problem with a simultaneous sparsity constraint. Multiple unknown signal vectors are to be determined, where each passes through a different system matrix and the results are added to yield a single observation vector. Given the matrices and lone observation, the objective is to find a *simultaneously sparse* set of unknown vectors that approximately solves the system. We refer to this as the *multiple-system single-output* (MSSO) simultaneous sparse approximation problem. This manuscript contrasts the MSSO problem with other simultaneous sparsity problems and conducts an initial exploration of algorithms with which to solve it. Greedy algorithms and techniques based on convex relaxation are derived and compared empirically. Experiments involve sparsity pattern recovery in noiseless and noisy settings and MRI RF pulse design.

In this work we propose a linear inverse problem that requires a *simultaneously sparse* set of vectors as the solution, i.e., a set of vectors where only a small number of each vector's entries are nonzero, and where the vectors' *sparsity patterns* (the locations of the nonzero entries) are equal. Sparsity constraints and regularizations that promote sparsity have a long history that we will not attempt to recount; the reader is referred to [15] both for a theoretical result on the numerical robustness of using sparsity constraints and for references to early empirical work. While perhaps an old topic, the estimation of sparse or approximately sparse signals is also an extremely active area of research because of the emergence of compressed sensing [5, 17] and a multitude of applications for wavelet-domain sparsity of images; a review of recent developments with an emphasis on algorithms and performance guarantees is provided by [4].

We call the problem of interest *multiple-system single-output (MSSO) simultaneous sparse approximation*:

$$\mathbf{d}\approx \underset{p=1}{\overset{P}{\Sigma}}{\mathbf{F}}_{p}{\mathbf{g}}_{p},$$

(1.1)

where **d** ∈ ℂ^{M}, **F**_{p} ∈ ℂ^{M×N} for each *p*, and each **g**_{p} ∈ ℂ^{N} has a common sparsity pattern or, more precisely, we measure the sparsity level by the number of positions at which *any* of the **g**_{p}'s is nonzero. This name highlights the distinction from the more common setting of *single-system multiple-output (SSMO) simultaneous sparse approximation*:

$${\mathbf{d}}_{p}\approx \mathbf{F}{\mathbf{g}}_{p}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}p=1,\dots ,P,$$

(1.2)

where **d**_{p} ∈ ℂ^{M} for each *p*, **F** ∈ ℂ^{M×N}, and the **g**_{p}'s again have a common sparsity pattern. In either case, associated inverse problems are to minimize the simultaneous sparsity level of the **g**_{p}'s for a given goodness of fit, to optimize the goodness of fit for a given sparsity level, or to minimize an objective function that combines goodness of fit and sparsity level. There is a large and growing literature on SSMO problems, often using the phrase *multiple measurement vectors*, but MSSO problems have received much less attention.

MSSO sparse approximation came to our attention as an abstraction of the placement of spokes in short-duration, slice-selective magnetic resonance imaging (MRI) radio-frequency (RF) excitation pulse design [70, 73, 74], and we know of no earlier application. To keep the bulk of the developments applicable generally, we defer a description of the MRI application until the end of the paper. The paper is focused on formalizing the MSSO sparse approximation problem and introducing and contrasting several algorithms for finding approximate solutions. We have implemented and tested three greedy algorithms—generalizing matching pursuit (MP) [44], orthogonal matching pursuit (OMP) [6, 51, 14, 46, 9, 10], and least squares matching pursuit (LSMP) [10]—and also algorithms for solving a convex relaxation of the MSSO problem, applying second-order cone programming (SOCP) [3, 43], and generalizing iteratively reweighted least squares (IRLS) [36, 31] and iterative shrinkage [20, 12, 21, 22]. We evaluate the performance of the algorithms across three experiments: the first and second involve sparsity pattern recovery in noiseless and noisy scenarios, respectively, while the third deals with MRI RF excitation pulse design.

It is worth noting explicitly that this paper does not provide conditions for optimality (i.e., equivalence with a generally intractable sparsity-constrained problem) of greedy algorithms or convex relaxations, nor does it analyze random ensembles of problems. The sparse approximation literature is increasingly dominated by these types of analyses, and we cite these results in the appropriate sections.

The structure of this paper is as follows: in section 2, we provide background information about ordinary sparse approximation and SSMO sparse approximation. In section 3, we formulate the MSSO problem. Algorithms for solving the problem are then posed in section 4. The coverage of some algorithms is very brief; additional details on these can be found in [70, 72]. Details and results of the numerical experiments appear in section 5. Section 6 highlights the strengths and weaknesses of the algorithms and presents ideas for future work. Concluding remarks are given in section 7.

Following our naming rubric, ordinary sparse approximation can be called single-system single-output (SSSO) sparse approximation for emphasis. The problem can be written as

$$\underset{\mathbf{g}\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}{\mathbb{C}}^{N}}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{d}-\mathbf{Fg}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{g}\parallel}_{0}\right\},\phantom{\rule{1em}{0ex}}\text{given}\phantom{\rule{thickmathspace}{0ex}}\mathbf{d}\in {\mathbb{C}}^{M}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\mathbf{F}\in {\mathbb{C}}^{M\times N},$$

(2.1)

where ‖ · ‖_{0} denotes the number of nonzero elements of a vector and *λ* ∈ (0, ∞) is a control parameter. Varying *λ* determines the relative importance of fitting the data **d** and keeping the solution **g** sparse. In many applications, *λ* is set to yield a specified sparsity or specified residual; it may also have other significance. For general **F**, solving (2.1) is NP-hard [13, 46]. Thus, great effort has gone into the design and analysis of approximate algorithms.

A greedy approach is to iteratively choose one position for a nonzero entry of **g** at a time or, equivalently, to pick one column of **F** at a time. Selecting at each step the column of **F** that maximizes the magnitude of the inner product with the current residual is called matching pursuit (MP) [44], and several more sophisticated variants have been proposed [6, 51, 14, 46, 9, 10]. Most important among these is orthogonal MP (OMP), which avoids deleterious interactions from iteration to iteration by working in the orthogonal complement of all previously selected columns. Notable analyses of OMP are those in [19, 61], where sufficient conditions are given for OMP to recover the sparsity pattern of the solution of (2.1). Analyses of OMP for random problem ensembles are given in [26, 64].

A second approach is to replace ‖ · ‖_{0} in (2.1) with its ℓ_{1} *relaxation* [7, 57]:

$$\underset{\mathbf{g}\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}{\mathbb{C}}^{N}}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{d}-\mathbf{Fg}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{g}\parallel}_{1}\right\},\phantom{\rule{1em}{0ex}}\text{given}\phantom{\rule{thickmathspace}{0ex}}\mathbf{d}\in {\mathbb{C}}^{M}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\mathbf{F}\in {\mathbb{C}}^{M\times N}.$$

(2.2)

This is a convex optimization and thus may be solved efficiently [3]. Certain conditions on **F** guarantee proximity of the solutions to (2.1) and (2.2) [18, 19, 63]. Analyses of random problem ensembles are given in [53, 67].

Note that (2.2) applies an ℓ_{1} norm to **g**, but an ℓ_{p} norm (where *p* < 1) may also be used to promote sparsity [31, 7]; this leads to a nonconvex problem and will not be considered in this paper. A problem of the form (2.2) may arise as a proxy for (2.1) or as the inherent problem of interest. For example, in a Bayesian estimation setting, (2.2) yields the maximum a posteriori probability estimate of **g** from **d** when the observation model involves **F** and Gaussian noise and the prior on **g** is Laplacian. Similar statements can be made about the relaxations in the following sections.

In SSMO, each of *P* observation vectors **d**_{p} is approximated by a product **Fg**_{p}, where the **g**_{p}'s are simultaneously sparse. This yields the problem

$$\underset{\mathbf{G}\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}{\mathbb{C}}^{N\times P}}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{D}-\mathbf{FG}\parallel}_{\mathrm{F}}^{2}+\lambda {\parallel \mathbf{G}\parallel}_{0,2}\right\},\phantom{\rule{1em}{0ex}}\text{given}\phantom{\rule{thickmathspace}{0ex}}\mathbf{D}\in {\mathbb{C}}^{M\times P}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\mathbf{F}\in {\mathbb{C}}^{M\times N},$$

(2.3)

where **D** = [**d**_{1}, …, **d**_{P}], **G** = [**g**_{1}, …, **g**_{P}], ‖ · ‖_{F} is the Frobenius norm, and ‖ · ‖_{0, 2} is the number of rows with nonzero ℓ_{2} norm (i.e., the simultaneous sparsity level).^{1} This reduces to (2.1) when *P* = 1 and is thus also generally computationally intractable.

Greedy algorithms for approximating (2.3) were first developed in [11, 65]. Deterministic conditions for sparsity pattern recovery and average case analyses for greedy algorithms are both presented in [32].

Analogous to the relaxation (2.2) is

$$\underset{\mathbf{G}\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}{\mathbb{C}}^{N\times P}}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{D}-\mathbf{FG}\parallel}_{\mathrm{F}}^{2}+\lambda {\parallel \mathbf{G}\parallel}_{\mathrm{S}}\right\},\phantom{\rule{1em}{0ex}}\text{given}\phantom{\rule{thickmathspace}{0ex}}\mathbf{D}\in {\mathbb{C}}^{M\times P}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\mathbf{F}\in {\mathbb{C}}^{M\times N},$$

(2.4)

where

$${\parallel \mathbf{G}\parallel}_{\mathrm{S}}={\parallel \mathbf{G}\parallel}_{0,1}=\underset{n=1}{\overset{N}{\Sigma}}\sqrt{\underset{p=1}{\overset{P}{\Sigma}}{\left|\mathbf{G}(n,p)\right|}^{2}}=\underset{n=1}{\overset{N}{\Sigma}}\sqrt{\underset{p=1}{\overset{P}{\Sigma}}{\left|{\mathbf{g}}_{p}\left[n\right]\right|}^{2}};$$

(2.5)

i.e., ‖**G**‖_{S} is the ℓ_{1} norm of the ℓ_{2} norms of the rows of the **G** matrix.^{2} This latter operator is a *simultaneous sparsity norm*: it penalizes the program (produces large values) when the columns of **G** have dissimilar sparsity patterns [43]. Fixing *λ* to a sufficiently large value and solving this optimization yields simultaneously sparse **g**_{p}'s.

Given the convex objective function in (2.4), one may then attempt to find a solution that minimizes the objective using an IRLS-based [11] or SOCP-based [43] approach. A formal analysis of the minimization of the convex objective may be found in [62]. Convex relaxations for this problem are also studied in [25, 28]. A related approach that may incorporate additional prior information is given in [68], and a boosting strategy that may be combined with either a greedy algorithm or a convex relaxation is presented in [45] and analyzed further in [66]. Also applicable to both greedy algorithms and convex relaxation are results in [8] that are analogous to the principal results of [19, 61, 63].

We outline the MSSO problem in a style analogous to that of SSMO systems in (2.3), (2.4) and then pose a second formulation that is useful for deriving several algorithms in section 4.

Consider the following system:

$$\mathbf{d}={\mathbf{F}}_{1}{\mathbf{g}}_{1}+\cdots +{\mathbf{F}}_{P}{\mathbf{g}}_{P}=[{\mathbf{F}}_{1}\cdots {\mathbf{F}}_{P}]\left[\begin{array}{c}\hfill {\mathbf{g}}_{1}\hfill \\ \hfill \vdots \hfill \\ \hfill {\mathbf{g}}_{P}\hfill \end{array}\right]={\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot}},$$

(3.1)

where **d** ∈ ℂ^{M} and the **F**_{p} ∈ ℂ^{M×N} are known. Unlike the SSMO problem, there are now only one observation and *P* different system matrices. Here again we desire an approximate solution where the **g**_{p}'s are simultaneously sparse, formally,

$$\underset{{\mathbf{g}}_{1},\dots ,{\mathbf{g}}_{P}}{\mathrm{min}}{\parallel \mathbf{d}-{\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot}}\parallel}_{2}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{thickmathspace}{0ex}}(\mathrm{s}.\mathrm{t}.)\phantom{\rule{thickmathspace}{0ex}}\text{the simultaneous}\phantom{\rule{thinmathspace}{0ex}}K\text{-sparsity of the}\phantom{\rule{thinmathspace}{0ex}}{\mathbf{g}}_{p}\text{'s},$$

(3.2)

or an equivalent unconstrained formulation. There are no constraints on the values of *M*, *N*, or *P*; i.e., there is no explicit requirement that the system be overdetermined or underdetermined.

In the first half of section 4, we will pose three approaches that attempt to solve the MSSO problem (3.2) in a greedy fashion. Another approach is to apply a relaxation similar to (2.2), (2.4):

$$\underset{\mathbf{G}}{\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{d}-{\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot}}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{G}\parallel}_{\mathrm{S}}\right\},$$

(3.3)

where **G** and ‖**G**‖_{S} are the same as in (2.4) and (2.5), respectively. In the second half of section 4, we will outline four algorithms for solving this relaxed problem.

The MSSO problem can be expressed in an equivalent form using new variables that are simply permutations of the **F**_{p}'s and **g**_{p}'s. First we define *N* new matrices:

$${\mathbf{C}}_{n}=[{\mathbf{f}}_{1,n},\dots ,{\mathbf{f}}_{P,n}]\in {\mathbb{C}}^{M\times P}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}n=1,\dots ,N,$$

(3.4)

where **f**_{p,n} is the *n*th column of **F**_{p}. Next we construct *N* new vectors:

$${\mathbf{h}}_{n}={[{\mathbf{g}}_{1}\left[n\right],\dots ,{\mathbf{g}}_{P}\left[n\right]]}^{T}\in {\mathbb{C}}^{P}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}n=1,\dots ,N,$$

(3.5)

where **g**_{p}[*n*] is the *n*th element of **g**_{p} and ^{T} is the transpose operation. Using the **C**_{n}'s and **h**_{n}'s, we have another way to write **d**:

$$\mathbf{d}={\mathbf{C}}_{1}{\mathbf{h}}_{1}+\cdots +{\mathbf{C}}_{N}{\mathbf{h}}_{N}=[{\mathbf{C}}_{1}\cdots {\mathbf{C}}_{N}]\left[\begin{array}{c}\hfill {\mathbf{h}}_{1}\hfill \\ \hfill \vdots \hfill \\ \hfill {\mathbf{h}}_{N}\hfill \end{array}\right]={\mathbf{C}}_{\mathrm{tot}}{\mathbf{h}}_{\mathrm{tot}}.$$

(3.6)

The relationship between the **g**_{p}'s and **h**_{n}'s implies that if we desire to find a set of simultaneously sparse **g**_{p}'s to solve (3.1), we should seek out a set of **h**_{n}'s where many of the **h**_{n}'s equal an all-zeros vector, **0**. This claim is apparent if we consider the fact that **H** = [**h**_{1}, …, **h**_{N}] is equal to the transpose of **G**. This formulation of MSSO has recently been termed block-sparsity [24, 55] because the nonzero entries of **h**_{tot} come in contiguous blocks.
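To make the reformulation concrete, the following minimal NumPy sketch builds the **C**_{n}'s of (3.4) and **h**_{n}'s of (3.5) from the **F**_{p}'s and **g**_{p}'s and checks that (3.1) and (3.6) describe the same observation; the function and variable names are illustrative and not taken from the paper's implementation.

```python
# Sketch of the reformulation (3.4)-(3.6): interleave the columns of the F_p's
# into C_n matrices and the entries of the g_p's into h_n vectors, so that
# sum_p F_p g_p equals sum_n C_n h_n. Names and shapes are illustrative.
import numpy as np

def build_C_and_h(F_list, g_list):
    """F_list: P matrices of shape (M, N); g_list: P vectors of length N."""
    P, N = len(F_list), F_list[0].shape[1]
    C = [np.column_stack([F_list[p][:, n] for p in range(P)]) for n in range(N)]  # each C_n is M x P
    h = [np.array([g_list[p][n] for p in range(P)]) for n in range(N)]            # each h_n has length P
    return C, h

rng = np.random.default_rng(0)
M, N, P = 6, 5, 3
F_list = [rng.standard_normal((M, N)) for _ in range(P)]
g_list = [rng.standard_normal(N) for _ in range(P)]
C, h = build_C_and_h(F_list, g_list)
d_from_F = sum(F_list[p] @ g_list[p] for p in range(P))   # right-hand side of (3.1)
d_from_C = sum(C[n] @ h[n] for n in range(N))             # right-hand side of (3.6)
assert np.allclose(d_from_F, d_from_C)
```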

Continuing with this alternate formulation, and given our desire to find a solution where most of the **h**_{n}'s are all-zeros vectors, we relax the problem as follows:

$$\underset{{\mathbf{h}}_{\mathrm{tot}}}{\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{d}-{\mathbf{C}}_{\mathrm{tot}}{\mathbf{h}}_{\mathrm{tot}}\parallel}_{2}^{2}+\lambda \underset{n=1}{\overset{N}{\Sigma}}{\parallel {\mathbf{h}}_{n}\parallel}_{2}\right\}.$$

(3.7)

The equivalence of ${\Sigma}_{n=1}^{N}{\parallel {\mathbf{h}}_{n}\parallel}_{2}$ and ‖**G**‖_{S} means that (3.7) is equivalent to (3.3), and thus, just as in (3.3), the approach in (3.7) finds a set of simultaneously sparse **g**_{p}'s.

In the SSMO problem, the ratio of unknowns to knowns is *N/M* regardless of the number of observations, *P*. Increasing *P* has little effect on the residual error, but it is beneficial for estimation of a “best” or “correct” sparsity pattern because the simultaneous sparsity of the underlying **g**_{p}'s becomes more exploitable. Empirical evidence of improved sparsity pattern recovery with increasing *P* may be found, for example, in [11, 43], and related theoretical results on random problem ensembles are given in [47, 48].

In contrast, the ratio of unknowns to knowns is not constant with respect to *P* in the MSSO problem; rather it is equal to *PN/M*. As *P* is increased, it becomes easier to achieve a small residual, but sparsity pattern recovery becomes harder. This is supported experimentally in section 5.

After the initial submission of this manuscript, algorithms similar to our OMP variation in section 4.2 were introduced as block OMP (BOMP) [24] and parallel OMP (POMP) [41]. (POMP is more general, and three variations are given in [41].) The speed of these algorithms over general-purpose convex optimization is demonstrated in [41], and theoretical results analogous to those for ordinary sparse approximation from [19, 61] are given in [24]. Also, an empirical study of iterative shrinkage approaches to solving mixed-norm formulations like (3.3) is given in [38].

Another recent work is a random-ensemble analysis of high-dimensional block-sparse problems in [55]. This work uses the convex relaxation of (3.3) and studies problem size scalings at which a correct sparsity pattern can be recovered in the undersampled (*M* < *NP*), noiseless (**d** = **C**_{tot}**h**_{tot} for some block-sparse **h**_{tot}) case.

We now present algorithms for (approximately) solving the MSSO problem defined in section 3. Some algorithms are described in more detail, with pseudocode, in [70, 72].

The classic MP technique first finds the column of the system matrix that best matches the observed vector and then subtracts from the observation its projection onto this chosen column. It proceeds to select a second column of the system matrix that best matches the *residual observation*, and continues until either *K* columns have been chosen (as specified by the user) or the residual observation is the all-zeros vector.

Now let us consider the MSSO system as posed in (3.6). In the MSSO context, we need to determine which of the *N* matrices **C**_{n} can best be used to represent **d** when the columns of **C**_{n} undergo an arbitrarily weighted linear combination. The key difference here is that on an iteration-by-iteration basis, we are no longer deciding which column vector best represents the observation, but which *matrix* does so. For the *k*th iteration of the algorithm, we need to select the proper index *n* ∈ {1, …, *N*} by solving

$${q}_{k}=\underset{n}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\phantom{\rule{thinmathspace}{0ex}}\underset{{\mathbf{h}}_{n}}{\mathrm{min}}{\parallel {\mathbf{r}}_{k-1}-{\mathbf{C}}_{n}{\mathbf{h}}_{n}\parallel}_{2}^{2},$$

(4.1)

where *q*_{k} is the index that will be selected and **r**_{k−1} is the residual observation after iteration *k* − 1 (with **r**_{0} = **d**). Solving the inner minimization via the pseudoinverse, **h**_{n} = **C**_{n}^{†}**r**_{k−1}, gives

$${q}_{k}=\underset{n}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\phantom{\rule{thinmathspace}{0ex}}{\parallel {\mathbf{r}}_{k-1}-{\mathbf{C}}_{n}\left({\mathbf{C}}_{n}^{\dagger}{\mathbf{r}}_{k-1}\right)\parallel}_{2}^{2}=\underset{n}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{max}}\phantom{\rule{thinmathspace}{0ex}}{\mathbf{r}}_{k-1}^{\mathrm{H}}{\mathbf{C}}_{n}{\mathbf{C}}_{n}^{\dagger}{\mathbf{r}}_{k-1},$$

(4.2)

where ^{H} is the Hermitian transpose. From (4.2) we see that, analogously to standard MP, choosing the best index for iteration *k* involves computing and ranking a series of inner product–like quadratic terms.
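As an illustration, a minimal NumPy sketch of the MSSO MP iteration implied by (4.1), (4.2) follows; `np.linalg.pinv` stands in for the pseudoinverse, and the names (`mp_select`, `msso_mp`) are illustrative rather than the paper's.

```python
# Sketch of the selection rule (4.2) and the resulting MP loop for the MSSO
# problem. C_list holds the C_n matrices of (3.4); names are illustrative.
import numpy as np

def mp_select(C_list, residual):
    # Rank the quadratic scores r^H C_n C_n^dagger r of (4.2) and return the best index.
    scores = [np.real(np.vdot(residual, Cn @ (np.linalg.pinv(Cn) @ residual))) for Cn in C_list]
    return int(np.argmax(scores))

def msso_mp(C_list, d, K):
    # Greedy MP: pick up to K matrices, each time removing the projection of the
    # residual onto the chosen C_n (indices may repeat, as in single-vector MP).
    r, chosen = d.copy(), []
    for _ in range(K):
        q = mp_select(C_list, r)
        chosen.append(q)
        r = r - C_list[q] @ (np.linalg.pinv(C_list[q]) @ r)
        if np.allclose(r, 0):
            break
    return chosen, r
```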

In single-vector MP, the residual **r**_{k} is orthogonal to the *k*th column of the system matrix, but as the algorithm continues iterating, there is no guarantee that the residual remains orthogonal to column *k* or is minimized in the least squares sense with respect to the entire set of *k* chosen column vectors (indexed by *q*_{1}, …, *q*_{k}). Furthermore, *K* iterations of single-vector MP do not guarantee that *K* different columns will be selected. Single-vector OMP is an extension to MP that attempts to mitigate these problems by improving the calculation of the residual vector. During the *k*th iteration of single-vector OMP, column *q*_{k} is chosen exactly as in MP (by ranking the inner products of the residual vector with the columns of the system matrix), but the residual is then recomputed by projecting **d** onto the orthogonal complement of all previously selected columns.

To extend OMP to the MSSO problem, we choose matrix *q*_{k} during iteration *k* exactly as in MP via (4.2), but update the residual as

$${\mathbf{r}}_{k}=\mathbf{d}-{\mathbf{S}}_{k}\left({\mathbf{S}}_{k}^{\dagger}\mathbf{d}\right),$$

(4.3)

where **S**_{k} = [**C**_{q1}, …, **C**_{qk}] and ${\mathbf{S}}_{k}^{\dagger}\mathbf{d}$ is the best choice of **x** that minimizes the residual error ‖**d** − **S**_{k}**x**‖_{2}. That is, to update the residual we now employ all chosen matrices, weighting and combining them to best represent **d** in the least squares sense, yielding an **r**_{k} that is orthogonal to the columns of **S**_{k} (and thus orthogonal to **C**_{q1}, …, **C**_{qk}), which has the benefit of ensuring that OMP will select a new **C**_{n} matrix at each step.
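A minimal sketch of the residual update (4.3) follows; as above, `np.linalg.pinv` plays the role of the pseudoinverse, and the names are illustrative.

```python
# Sketch of the OMP residual update (4.3): stack the chosen C matrices into S_k
# and project d onto their column space, leaving r_k orthogonal to all of them.
import numpy as np

def omp_residual(C_list, d, chosen):
    S_k = np.hstack([C_list[q] for q in chosen])     # S_k = [C_q1 ... C_qk], size M x Pk
    x = np.linalg.pinv(S_k) @ d                      # least squares coefficients S_k^dagger d
    return d - S_k @ x
```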

Beyond OMP there exists a greedy algorithm with greater computational complexity known as LSMP. In single-vector LSMP, one solves *N* − *k* + 1 least squares minimizations during iteration *k* to determine which column of the system matrix may be used to best represent **d** [10].

To extend LSMP to MSSO systems, we must ensure that during iteration *k* we account for the *k* − 1 previously chosen **C**_{n} matrices when choosing the *k*th one to best construct an approximation to **d**. Specifically, index *q _{k}* is selected as follows:

$${q}_{k}=\underset{n\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}\{1,\dots ,N\},\phantom{\rule{thinmathspace}{0ex}}n\phantom{\rule{thinmathspace}{0ex}}\notin \phantom{\rule{thinmathspace}{0ex}}{I}_{k-1}}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{min}}\underset{\mathbf{x}}{\mathrm{min}}{\parallel {\mathbf{S}}_{k}^{\left(n\right)}\mathbf{x}-\mathbf{d}\parallel}_{2}^{2},$$

(4.4)

where *I*_{k−1} is the set of indices chosen up through iteration *k* − 1, ${\mathbf{S}}_{k}^{\left(n\right)}=[{\mathbf{S}}_{k-1},{\mathbf{C}}_{n}]$, **S**_{k−1} = [**C**_{q1}, …, **C**_{qk−1}], and **x** ∈ ℂ^{Pk}. For fixed *n*, the solution of the inner minimization is ${\mathbf{x}}^{\mathrm{opt}}={\left({\mathbf{S}}_{k}^{\left(n\right)}\right)}^{\dagger}\mathbf{d}$; it is this step that ensures the residual observation error will be minimized by using *all* chosen matrices. Substituting **x**^{opt} into (4.4) and simplifying the expression yields

$${q}_{k}=\underset{n\phantom{\rule{thinmathspace}{0ex}}\notin \phantom{\rule{thinmathspace}{0ex}}{I}_{k-1}}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{max}}{\mathbf{d}}^{\mathrm{H}}{\mathbf{Q}}_{k}^{\left(n\right)}\mathbf{d},$$

(4.5)

where ${\mathbf{Q}}_{k}^{\left(n\right)}=\left({\mathbf{S}}_{k}^{\left(n\right)}\right){\left({\mathbf{S}}_{k}^{\left(n\right)}\right)}^{\dagger}$.

Algorithm 1 describes the LSMP method. The complexity here is much greater than that of OMP because *N* − *k* + 1 pseudoinversions of an *M* × *Pk* matrix are required during each iteration *k*. Furthermore, the dependence of ${\mathbf{Q}}_{k}^{\left(n\right)}$ on both *n* and *k* makes precomputing all such matrices infeasible in most cases. One way to decrease computation and runtime might be to extend the projection-based recursive updating scheme of [10] to MSSO LSMP.
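A minimal NumPy sketch of the selection step (4.4), (4.5) follows, using the same illustrative conventions as the snippets above; Algorithm 1's bookkeeping and stopping rules are omitted.

```python
# Sketch of one LSMP selection, (4.4)-(4.5): for each unchosen index n, append
# C_n to the previously chosen matrices and score d^H Q_k^(n) d, where Q_k^(n)
# projects onto the span of [S_{k-1}, C_n].
import numpy as np

def lsmp_select(C_list, d, chosen):
    best_n, best_score = None, -np.inf
    S_prev = [C_list[q] for q in chosen]
    for n in range(len(C_list)):
        if n in chosen:
            continue
        S_kn = np.hstack(S_prev + [C_list[n]])                           # S_k^(n) = [S_{k-1}, C_n]
        score = np.real(np.vdot(d, S_kn @ (np.linalg.pinv(S_kn) @ d)))   # d^H Q_k^(n) d
        if score > best_score:
            best_n, best_score = n, score
    return best_n
```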

Having posed three greedy approaches for solving the MSSO problem, we now turn our attention toward minimizing (3.7), the relaxed objective function. Here, the regularization term *λ* is used to trade off simultaneous sparsity with residual observation error.

One way to minimize (3.7) is to use an IRLS-based approach [36]. To begin, consider manipulating the right-hand term of (3.7) as follows:

$$\begin{array}{cc}\hfill \lambda \underset{n=1}{\overset{N}{\Sigma}}{\parallel {\mathbf{h}}_{n}\parallel}_{2}& =\lambda \underset{n=1}{\overset{N}{\Sigma}}\frac{{\parallel {\mathbf{h}}_{n}\parallel}_{2}^{2}}{{\parallel {\mathbf{h}}_{n}\parallel}_{2}}=\lambda \underset{n=1}{\overset{N}{\Sigma}}\frac{{\left|{\mathbf{h}}_{n}\left[1\right]\right|}^{2}+\cdots +{\left|{\mathbf{h}}_{n}\left[P\right]\right|}^{2}}{{\parallel {\mathbf{h}}_{n}\parallel}_{2}}\hfill \\ \hfill & \approx \frac{\lambda}{2}\underset{n=1}{\overset{N}{\Sigma}}[{\mathbf{h}}_{n}^{\ast}\left[1\right],\dots ,{\mathbf{h}}_{n}^{\ast}\left[P\right]]\left[\begin{array}{ccc}\hfill \frac{2}{{\parallel {\mathbf{h}}_{n}\parallel}_{2}+\epsilon}\hfill & \hfill \hfill & \hfill \hfill \\ \hfill \hfill & \hfill \ddots \hfill & \hfill \hfill \\ \hfill \hfill & \hfill \hfill & \hfill \frac{2}{{\parallel {\mathbf{h}}_{n}\parallel}_{2}+\epsilon}\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\mathbf{h}}_{n}\left[1\right]\hfill \\ \hfill \vdots \hfill \\ \hfill {\mathbf{h}}_{n}\left[P\right]\hfill \end{array}\right]\hfill \\ \hfill & =\frac{\lambda}{2}\underset{n=1}{\overset{N}{\Sigma}}{\mathbf{h}}_{n}^{\mathrm{H}}{\mathbf{W}}_{n}{\mathbf{h}}_{n}\hfill \\ \hfill & =\frac{\lambda}{2}[{\mathbf{h}}_{1}^{\mathrm{H}}\cdots {\mathbf{h}}_{N}^{\mathrm{H}}]\left[\begin{array}{ccc}\hfill {\mathbf{W}}_{1}\hfill & \hfill \hfill & \hfill \hfill \\ \hfill \hfill & \hfill \ddots \hfill & \hfill \hfill \\ \hfill \hfill & \hfill \hfill & \hfill {\mathbf{W}}_{N}\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\mathbf{h}}_{1}\hfill \\ \hfill \vdots \hfill \\ \hfill {\mathbf{h}}_{N}\hfill \end{array}\right]=\frac{\lambda}{2}{\mathbf{h}}_{\mathrm{tot}}^{\mathrm{H}}{\mathbf{W}}_{\mathrm{tot}}{\mathbf{h}}_{\mathrm{tot}},\hfill \end{array}$$

(4.6)

where * denotes the complex conjugate of a scalar, **W**_{n} is a *P* × *P* real-valued diagonal matrix whose diagonal elements each equal 2/(‖**h**_{n}‖_{2} + ε), and ε is some small non-negative value introduced to mitigate poor conditioning of the **W**_{n}'s. If we fix **W**_{tot} ∈ ℝ^{PN×PN} by computing it using some prior estimate of **h**_{tot}, then the right-hand term of (3.7) becomes a quadratic function and (3.7) transforms into a Tikhonov optimization [58, 59]:

$$\underset{{\mathbf{h}}_{\mathrm{tot}}}{\mathrm{min}}\left\{\frac{1}{2}{\parallel \mathbf{d}-{\mathbf{C}}_{\mathrm{tot}}{\mathbf{h}}_{\mathrm{tot}}\parallel}_{2}^{2}+\frac{\lambda}{2}{\mathbf{h}}_{\mathrm{tot}}^{\mathrm{H}}{\mathbf{W}}_{\mathrm{tot}}{\mathbf{h}}_{\mathrm{tot}}\right\}.$$

(4.7)

Finally, by performing a change of variables and exploiting the properties of **W**_{tot}, we can convert (4.7) into an expression that may be minimized using the robust and reliable conjugate-gradient (CG) least squares solver LSQR [50, 49], so named because it applies a QR decomposition [30] when solving the system in the least squares sense. LSQR works better in practice than several other CG methods [2] because it restructures the input system via the Lanczos process [40] and applies a Golub–Kahan bidiagonalization [29] prior to solving it.

To apply LSQR to this problem, we first construct ${\mathbf{W}}_{\mathrm{tot}}^{1/2}$ as the element-by-element square root of the diagonal matrix **W**_{tot} and then take its inverse to obtain ${\mathbf{W}}_{\mathrm{tot}}^{-1/2}$. Defining $\mathbf{q}={\mathbf{W}}_{\mathrm{tot}}^{1/2}{\mathbf{h}}_{\mathrm{tot}}$ and $\mathbf{A}={\mathbf{C}}_{\mathrm{tot}}{\mathbf{W}}_{\mathrm{tot}}^{-1/2}$, (4.7) becomes

$$\underset{\mathbf{q}}{\mathrm{min}}\left\{{\parallel \mathbf{d}-\mathbf{Aq}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{q}\parallel}_{2}^{2}\right\}.$$

(4.8)

LSQR is formulated to solve the exact problem in (4.8). Calling LSQR with **d**, *λ*, and **A** yields **q**^{opt}, and the solution ${\mathbf{h}}_{\mathrm{tot}}^{\mathrm{opt}}$ is backed out via ${\mathbf{W}}_{\mathrm{tot}}^{-1/2}{\mathbf{q}}^{\mathrm{opt}}$.

Algorithm 2 outlines how one may iteratively apply (4.8) to attempt to find a solution that minimizes the original cost function, (3.7). The technique iterates until the objective function decreases by less than *δ* or the maximum number of iterations, *K*, is exceeded. The initial solution estimate is obtained via pseudoinversion of **C**_{tot} (an all-zeros initialization would cause **W**_{tot} to be poorly conditioned). A line search is used to step between the prior solution estimate and the upcoming one; this improves the rate of convergence and ensures that the objective decreases at each step. This method is global in that all *PN* unknowns are being estimated concurrently per iteration.
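A compact sketch of one such IRLS pass is given below, assuming real-valued data for brevity. SciPy's `lsqr` solves min ‖**Aq** − **d**‖² + damp²‖**q**‖², so damp = √λ reproduces (4.8); the line search and stopping tests of Algorithm 2 are omitted, and the names are illustrative.

```python
# Sketch of one IRLS pass, (4.6)-(4.8): build the diagonal weights W_tot from the
# current h_tot, change variables q = W^(1/2) h, A = C_tot W^(-1/2), and solve the
# resulting Tikhonov problem with LSQR. Real-valued data assumed for simplicity.
import numpy as np
from scipy.sparse.linalg import lsqr

def irls_step(C_tot, d, h_tot, lam, P, eps=1e-8):
    N = C_tot.shape[1] // P
    h_rows = h_tot.reshape(N, P)                                    # row n holds h_n
    w = np.repeat(2.0 / (np.linalg.norm(h_rows, axis=1) + eps), P)  # diagonal of W_tot
    A = C_tot / np.sqrt(w)                                          # C_tot W_tot^(-1/2)
    q = lsqr(A, d, damp=np.sqrt(lam))[0]                            # solve (4.8)
    return q / np.sqrt(w)                                           # back out h_tot = W_tot^(-1/2) q
```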

The proposed IRLS technique solves for all *PN* unknowns during each iteration, but this is cumbersome when *PN* is large. An alternative approach is to apply an inner loop that fixes *n* and then iteratively tunes **h**_{n} while holding the other **h**_{m}s (*m* ≠ *n*) constant; thus only *P* (rather than *PN*) unknowns need to be solved for during each inner iteration.

This idea inspires the row-by-row shrinkage (RBRS) algorithm. The term “row-by-row” is used because each **h**_{n} corresponds to row *n* of the **G** matrix in (3.3), and “shrinkage” is used because the ℓ_{2} energy of most of the **h**_{n}'s will essentially be “shrunk” (to some extent) during each inner iteration: when *λ* is sufficiently large and many iterations are undertaken, many **h**_{n}'s will be close to all-zeros vectors.

Assume that **d** and the **C**_{n}'s of (3.7) are real-valued. We seek to minimize the function by extending the single-vector sequential shrinkage technique of [21] and making modifications to (3.7). Assume that we have prior estimates of **h**_{1} through **h**_{N}, and that we now desire to update only the *j*th vector while keeping the other *N* − 1 fixed. The shrinkage update of **h**_{j} is achieved via

$$\underset{\mathbf{x}}{\mathrm{min}}\left\{\frac{1}{2}{\parallel [{\Sigma}_{n=1}^{N}{\mathbf{C}}_{n}{\mathbf{h}}_{n}-{\mathbf{C}}_{j}{\mathbf{h}}_{j}]+{\mathbf{C}}_{j}\mathbf{x}-\mathbf{d}\parallel}_{2}^{2}+\lambda {\parallel \mathbf{x}\parallel}_{2}\right\},$$

(4.9)

where ${\Sigma}_{n=1}^{N}{\mathbf{C}}_{n}{\mathbf{h}}_{n}-{\mathbf{C}}_{j}{\mathbf{h}}_{j}$ forms an approximation of **d** using the prior solution coefficients, but discards the component contributed by the original *j*th vector, replacing the latter via an updated solution vector, **x**. It is crucial to note that the right-hand term does not promote the element-by-element sparsity of **x**; rather, it penalizes the overall ℓ_{2} energy of **x**, and thus both sparse and dense **x**'s are penalized equally if their overall ℓ_{2} energies are the same.

One way to solve (4.9) is to take its derivative with respect to **x**^{T} and find **x** such that the derivative equals **0**. By doing this and shuffling terms, and assuming we have an initial estimate of **x**, we may solve for **x** iteratively:

$${\mathbf{x}}_{i}={\left[{\mathbf{C}}_{j}^{\mathrm{T}}{\mathbf{C}}_{j}+\frac{\lambda}{{\parallel {\mathbf{x}}_{i-1}\parallel}_{2}+\epsilon}\mathbf{I}\right]}^{-1}{\mathbf{C}}_{j}^{\mathrm{T}}{\mathbf{r}}_{j},$$

(4.10)

where ${\mathbf{r}}_{j}=\mathbf{d}+{\mathbf{C}}_{j}{\mathbf{h}}_{j}-{\Sigma}_{n=1}^{N}{\mathbf{C}}_{n}{\mathbf{h}}_{n}$, **I** is a *P* × *P* identity matrix, and ε is a small value that avoids ill-conditioned results.^{3} By iterating on (4.10) until (4.9) changes by less than *δ*_{inner}, we arrive at a solution to (4.9), **x**^{opt}, and this then replaces the prior solution vector, **h**_{j}. Having completed the update of the *j*th vector, we proceed to update the rest of the vectors, looping this outer process *K* times or until the main objective function, (3.7), changes by less than *δ*_{outer}. Algorithm 3 details the entire procedure; unlike IRLS, here we repeatedly invert *P* × *P* matrices to pursue a row-by-row solution, rather than *PN* × *PN* matrices to pursue a solution that updates *all* rows per iteration.
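A minimal sketch of the inner RBRS update is given below for real-valued data; for simplicity a fixed number of inner iterations stands in for the *δ*_{inner} test, and the names are illustrative.

```python
# Sketch of the RBRS inner update, (4.9)-(4.10): hold every row except h_j fixed,
# form the partial residual r_j, and iterate the damped normal equations.
import numpy as np

def rbrs_update(C_list, h_list, d, j, lam, eps=1e-8, n_inner=50):
    r_j = d + C_list[j] @ h_list[j] - sum(C_list[n] @ h_list[n] for n in range(len(C_list)))
    Cj, x = C_list[j], h_list[j].copy()
    for _ in range(n_inner):
        damp = lam / (np.linalg.norm(x) + eps)
        x = np.linalg.solve(Cj.T @ Cj + damp * np.eye(Cj.shape[1]), Cj.T @ r_j)   # eq (4.10)
    return x   # replaces h_j
```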

If (3.7) contains complex-valued terms, we may structure the row-by-row updates as in (4.9), but because the derivative of the objective function in (4.9) is more complicated due to the presence of complex-valued terms, the simple update equation given in (4.10) is no longer applicable. One way to overcome this problem is to turn the complex-valued problem into a real-valued one.

Let us create several real-valued expanded vectors,

$$\stackrel{~}{\mathbf{d}}=\left[\begin{array}{c}\hfill \mathrm{Re}\left(\mathbf{d}\right)\hfill \\ \hfill \mathrm{Im}\left(\mathbf{d}\right)\hfill \end{array}\right]\in {\mathbb{R}}^{2M},\phantom{\rule{1em}{0ex}}{\stackrel{~}{\mathbf{h}}}_{n}=\left[\begin{array}{c}\hfill \mathrm{Re}\left({\mathbf{h}}_{n}\right)\hfill \\ \hfill \mathrm{Im}\left({\mathbf{h}}_{n}\right)\hfill \end{array}\right]\in {\mathbb{R}}^{2P},$$

(4.11)

as well as real-valued expanded matrices,

$${\stackrel{~}{\mathbf{C}}}_{n}=\left[\begin{array}{cc}\hfill \mathrm{Re}\left({\mathbf{C}}_{n}\right)\hfill & \hfill -\mathrm{Im}\left({\mathbf{C}}_{n}\right)\hfill \\ \hfill \mathrm{Im}\left({\mathbf{C}}_{n}\right)\hfill & \hfill \mathrm{Re}\left({\mathbf{C}}_{n}\right)\hfill \end{array}\right]\in {\mathbb{R}}^{2M\times 2P}.$$

(4.12)

Due to the structure of (4.11), (4.12) and the fact that ${\parallel {\mathbf{h}}_{n}\parallel}_{2}$ equals ${\parallel {\stackrel{~}{\mathbf{h}}}_{n}\parallel}_{2}$, the following optimization is equivalent to (3.7):

$$\underset{{\stackrel{~}{\mathbf{h}}}_{1},\dots ,{\stackrel{~}{\mathbf{h}}}_{N}}{\mathrm{min}}\left\{\frac{1}{2}{\Vert \stackrel{~}{\mathbf{d}}-{\Sigma}_{n=1}^{N}{\stackrel{~}{\mathbf{C}}}_{n}{\stackrel{~}{\mathbf{h}}}_{n}\Vert}_{2}^{2}+\lambda \underset{n=1}{\overset{N}{\Sigma}}{\parallel {\stackrel{~}{\mathbf{h}}}_{n}\parallel}_{2}\right\}.$$

(4.13)

This means that we may apply RBRS to complex-valued scenarios by substituting the ${\stackrel{~}{\mathbf{h}}}_{n}$'s for the **h**_{n}'s and the ${\stackrel{~}{\mathbf{C}}}_{n}$'s for the **C**_{n}'s in (4.9), (4.10), and Algorithm 3. (Equation (4.10) becomes an applicable update equation because (4.9) will consist of only real-valued terms and the derivative calculated earlier is again applicable.) Finally, after running the algorithm to obtain finalized ${\stackrel{~}{\mathbf{h}}}_{n}$'s, we may simply restructure them into complex **h**_{n}'s.

In embedding the complex-valued problem into a larger real-valued problem, we have demonstrated a fact about simultaneous sparse approximation that seems to not have been remarked upon previously: by setting *P* = 1, we see that seeking a single sparse complex-valued vector is equivalent to seeking two *simultaneously sparse* real-valued vectors.
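The embedding (4.11), (4.12) is easy to verify numerically; the sketch below builds the expanded quantities and checks that the real-valued product carries the same information as the complex one (illustrative names).

```python
# Sketch of the real-valued embedding (4.11)-(4.12): stack real and imaginary
# parts of d and h_n, expand each C_n into a 2M x 2P real matrix, and confirm
# that C~_n h~_n reproduces C_n h_n.
import numpy as np

def embed_real(d, C_list, h_list):
    d_t = np.concatenate([d.real, d.imag])
    C_t = [np.block([[Cn.real, -Cn.imag], [Cn.imag, Cn.real]]) for Cn in C_list]
    h_t = [np.concatenate([hn.real, hn.imag]) for hn in h_list]
    return d_t, C_t, h_t

rng = np.random.default_rng(1)
M, P = 4, 2
Cn = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))
hn = rng.standard_normal(P) + 1j * rng.standard_normal(P)
d_t, C_t, h_t = embed_real(Cn @ hn, [Cn], [hn])
y = C_t[0] @ h_t[0]
assert np.allclose(y[:M] + 1j * y[M:], Cn @ hn)
```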

We have also developed a dual of RBRS, column-by-column shrinkage (CBCS), which sequentially updates the *columns* of **G** (i.e., the **g**_{p}'s) in (3.1), (3.3) rather than its rows (the **h**_{n}'s). This approach yields a *separable* optimization and reduces the overall problem to simply repeated *element-by-element* shrinkages of each **g**_{p}. For a detailed derivation and pseudocode, see [70, 72].

We now propose a seventh and final algorithm for solving the MSSO problem as given in (3.3). We branch away from the shrinkage approaches that operate on individual columns or rows of the **G** matrix and again seek to concurrently estimate all *PN* unknowns. Rather than using an IRLS technique, however, we pursue an SOCP approach, motivated by the fact that second-order cone programs may be solved via efficient interior point algorithms [56, 60] and are able to encapsulate conic, convex-quadratic [1], and linear constraints. (Quadratic programming is not an option because the **g**_{p}'s, **F**_{p}'s, and **d** may be complex.)

Second-order conic constraints require that a vector **a** = [*a*_{1}, …, *a*_{N}]^{T} satisfy

$${\parallel {[{a}_{1},\dots ,{a}_{N-1}]}^{\mathrm{T}}\parallel}_{2}\le {a}_{N}.$$

(4.14)

The generic format of an SOC program is

$$\underset{\mathbf{x}}{\mathrm{min}}{\mathbf{c}}^{\mathrm{T}}\mathbf{x}\phantom{\rule{1em}{0ex}}\mathrm{s}.\mathrm{t}.\phantom{\rule{1em}{0ex}}\mathbf{Ax}=\mathbf{b}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\mathbf{x}\in \mathbf{K},$$

(4.15)

where $\mathbf{K}={\mathbb{R}}_{+}^{N}\times {\mathbf{L}}_{1}\times \cdots \times {\mathbf{L}}_{N}$, ${\mathbb{R}}_{+}^{N}$ is the *N*-dimensional positive orthant cone, and the **L**_{n}'s are second-order cones [1]. To convert (3.3) into the second-order cone format, we first write

$$\begin{array}{c}\hfill \underset{\mathbf{G}}{\mathrm{min}}\left\{\frac{1}{2}s+\lambda {1}^{\mathrm{T}}\mathbf{t}\right\}\phantom{\rule{1em}{0ex}}\mathrm{s}.\mathrm{t}.\phantom{\rule{1em}{0ex}}\mathbf{z}={\mathbf{d}}_{\mathrm{tot}}-{\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot}},\phantom{\rule{1em}{0ex}}{\parallel \mathbf{z}\parallel}_{2}^{2}\le s,\hfill \\ \hfill \text{and}\phantom{\rule{1em}{0ex}}{\parallel {[\mathrm{Re}\left({\mathbf{g}}_{1}\left[n\right]\right),\mathrm{Im}\left({\mathbf{g}}_{1}\left[n\right]\right),\dots ,\mathrm{Re}\left({\mathbf{g}}_{P}\left[n\right]\right),\mathrm{Im}\left({\mathbf{g}}_{P}\left[n\right]\right)]}^{\mathrm{T}}\parallel}_{2}\le {t}_{n},\hfill \end{array}$$

(4.16)

where *n* {1, …,*N*} and **t** = [*t*_{1}, …,*t*_{N}]^{T}. The splitting of the complex elements of the **g**_{p}'s mimics the approach used when extending CBCS to complex data, and (4.16) makes the objective function linear, as required. Finally, in order to represent the ${\parallel \mathbf{z}\parallel}_{2}^{2}\le s$ inequality in terms of second-order cones, an additional step is needed. Given that $s=\frac{1}{4}{(s+1)}^{2}-\frac{1}{4}{(s-1)}^{2}$, the inequality may be rewritten as ${\mathbf{z}}^{\mathrm{H}}\mathbf{z}+\frac{1}{4}{(s-1)}^{2}\le \frac{1}{4}{(s+1)}^{2}$ and then expressed as a conic constraint: ${\parallel {[{\mathbf{z}}^{T},\frac{1}{2}(s-1)]}^{\mathrm{T}}\parallel}_{2}\le \frac{1}{2}(s+1)$ [1, 42]. Applying these changes yields

$$\begin{array}{c}\hfill \mathrm{min}\phantom{\rule{thickmathspace}{0ex}}\left\{\frac{1}{2}s+\lambda {1}^{\mathrm{T}}\mathbf{t}\right\}\phantom{\rule{1em}{0ex}}\mathrm{s}.\mathrm{t}.\phantom{\rule{1em}{0ex}}\mathbf{z}={\mathbf{d}}_{\mathrm{tot}}-{\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot}},\hfill \\ \hfill {\parallel {[{\mathbf{z}}^{\mathrm{T}},u]}^{\mathrm{T}}\parallel}_{2}\le v,\phantom{\rule{1em}{0ex}}u=(s-1)/2,\phantom{\rule{1em}{0ex}}v=(s+1)/2,\phantom{\rule{1em}{0ex}}s\ge 0,\phantom{\rule{1em}{0ex}}\text{and}\hfill \\ \hfill {\parallel {[\mathrm{Re}\left({\mathbf{g}}_{1}\left[n\right]\right),\mathrm{Im}\left({\mathbf{g}}_{1}\left[n\right]\right),\dots ,\mathrm{Re}\left({\mathbf{g}}_{P}\left[n\right]\right),\mathrm{Im}\left({\mathbf{g}}_{P}\left[n\right]\right)]}^{\mathrm{T}}\parallel}_{2}\le {t}_{n},\hfill \end{array}$$

(4.17)

which is a fully defined second-order cone program that may be implemented and solved numerically. There is no algorithm pseudocode for this technique because having set up the variables in (4.17), one may simply plug them into an SOCP solver. In this paper we implement (4.17) in SeDuMi (Self-Dual-Minimization) [56], a free software package consisting of MATLAB and C routines.
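As a point of reference, the same relaxation (3.3) can be expressed in a few lines of CVXPY, which performs a comparable conic reformulation internally; this is a stand-in for, not a reproduction of, the SeDuMi implementation of (4.17), and the row-by-row `cp.norm` sum may be slow for large *N*.

```python
# Sketch of solving the relaxation (3.3) with CVXPY as a stand-in for SeDuMi.
# G holds the g_p's as columns; the penalty is the l1 norm of the row l2 norms,
# i.e., ||G||_S of (2.5). Names are illustrative.
import numpy as np
import cvxpy as cp

def solve_msso_relaxation(F_list, d, lam):
    N = F_list[0].shape[1]
    P = len(F_list)
    G = cp.Variable((N, P), complex=np.iscomplexobj(d))
    fit = sum(F_list[p] @ G[:, p] for p in range(P)) - d
    simsparse = sum(cp.norm(G[n, :], 2) for n in range(N))
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(fit) + lam * simsparse)).solve()
    return G.value
```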

Our motivation for solving MSSO sparse approximation problems comes from MRI RF excitation pulse design. Before turning to this problem in section 5.3, we present several synthetic experiments. These experiments allow comparisons among algorithms and also empirically reveal some properties of the relaxation (3.3). Theoretical exploration of this relaxation is also merited but is beyond the scope of this manuscript. One work in this area is [55].

Experiments were performed on a Linux server with a 3.0-GHz Intel Pentium IV processor. The system has 16 GB of random access memory, ample to ensure that none of the algorithms requires the use of virtual memory and to avoid excessive hard drive paging. MP, LSMP, IRLS, RBRS, and CBCS are implemented in MATLAB, whereas SOCP is implemented in SeDuMi. The runtimes could be reduced significantly by implementing the methods in a completely compiled format such as C. Note: OMP is not evaluated because its performance always falls in between that of MP and LSMP.

We now evaluate how well the algorithms of section 4 estimate sparsity patterns when the underlying **g**_{p}'s are each strictly and simultaneously *K*-sparse and the observation **d** of (3.1) is not corrupted by noise. This caricatures a high signal-to-noise ratio (SNR) source localization scenario, where the sparsity pattern indicates locations of emitters and our goal is to find the locations of these emitters [34, 39, 42, 43].

We synthetically generate real-valued sets of **F**_{p}'s and **g**_{p}'s in (3.1), apply the algorithms, and record the fraction of correct sparsity pattern entries recovered by each. We vary *M* in (3.1) to see how performance at solving the MSSO problem varies when the **F**_{p}'s are underdetermined vs. overdetermined and also vary *P* to see how rapidly performance degrades as more system matrices and vectors are employed.

For all trials, we fix *N* = 30 in (3.1) and *K* = 3, which means each **g**_{p} vector consists of 30 elements, three of which are nonzero. We consider *P* ∈ {1, 2, …, 8} and *M* ∈ {10, 15, …, 40}. For each of the 56 fixed (*M*, *P*) pairs, we create 50 random instances of (3.1) over which to report the average performance. Each of the 2 800 instances is constructed with the sparsity pattern chosen uniformly at random and the nonzero entries of the **g**_{p}'s drawn from the standard normal *N*(0, 1) distribution. The columns of the **F**_{p}'s are chosen independently from the uniform distribution on the ℝ^{M} unit sphere.
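A sketch of how one such random instance might be generated follows; drawing Gaussian columns and normalizing them gives the uniform-on-the-sphere columns described above (illustrative names).

```python
# Sketch of a random noiseless instance for section 5.1: unit-norm columns for
# each F_p, a common K-sparse support, and standard normal nonzero entries.
import numpy as np

def make_instance(M=25, N=30, P=3, K=3, rng=np.random.default_rng()):
    F_list = []
    for _ in range(P):
        F = rng.standard_normal((M, N))
        F /= np.linalg.norm(F, axis=0)          # columns uniform on the unit sphere in R^M
        F_list.append(F)
    support = rng.choice(N, size=K, replace=False)
    g_list = []
    for _ in range(P):
        g = np.zeros(N)
        g[support] = rng.standard_normal(K)
        g_list.append(g)
    d = sum(F_list[p] @ g_list[p] for p in range(P))
    return F_list, g_list, support, d
```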

MP and LSMP are applied by iterating until *K* elements are chosen or the residual approximation is **0**. If fewer than *K* terms are chosen, this hurts the recovery score.

For IRLS, RBRS, CBCS, and SOCP, we approximate a *λ* oracle as follows: loop over roughly 70 values of *λ* in [0, 2], running the given algorithm each time. For each of the estimated **ĝ**_{tot}'s (which vary with *λ*), estimate a sparsity pattern by noting the rows of the associated **Ĝ** matrix with the largest ℓ_{2} energy.^{4} Record the highest fraction recovered across all *λ*'s.
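The scoring step can be summarized in a few lines (illustrative names): keep the *K* rows of **Ĝ** with the largest ℓ_{2} energy and report the overlap with the true support.

```python
# Sketch of the sparsity pattern score: the K largest-energy rows of G-hat form
# the estimated support, and the score is the fraction matching the true support.
import numpy as np

def recovered_fraction(G_hat, true_support, K):
    row_energy = np.linalg.norm(G_hat, axis=1)           # l2 norm of each of the N rows
    est_support = set(np.argsort(row_energy)[-K:].tolist())
    return len(est_support & set(true_support)) / K
```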

Each subplot of Figure 5.1 depicts the average fraction of recovered sparsity pattern elements vs. the number of knowns, *M*, for a fixed value of *P*, revealing how performance varies as the **F**_{p} ^{M×N} matrices go from being underdetermined to overdetermined.

As the number of knowns *M* increases, recovery rates improve substantially, which is sensible. For large *M* and small *P*, the six algorithms behave similarly, consistently achieving nearly 100% recovery. For large *P* and moderate *M*, however, sparsity pattern recovery rates are dismal—as *P* increases, the underlying simultaneous sparsity of the **g**_{p}'s is not enough to combat the increasing number of unknowns, *PN*. As *M* is decreased and especially when *P* is increased, the performance of the greedy techniques falls off relative to that of IRLS, RBRS, CBCS, and SOCP, showing that the convex relaxation approach itself is a sensible way to approximately solve the formal NP-hard combinatorial MSSO simultaneous sparsity problem. Furthermore, the behavior of the convex algorithms relative to the greedy ones coincides with the studies of greedy vs. convex programming sparse approximation methods in single-vector [7, 10] and SSMO contexts [11]. LSMP tends to perform slightly better than MP because it solves a least squares minimization and explicitly considers earlier chosen rows whenever it seeks to choose another row of **G**.

Across most trials, IRLS, RBRS, CBCS, and SOCP converge rapidly and do not exceed the maximum limit of 500 outer iterations. The exception is CBCS when *M* is small and *P* = 8: here, the objective function frequently fails to decrease by less than the specified *δ* = 10^{−5}.

For several fixed (*M, P*) pairs, Table 5.1 lists the average runtimes of each algorithm across the 50 trials associated with each pair. For IRLS, RBRS, CBCS, and SOCP, runtimes are also averaged over the many *λ* runs. Among the convex minimization methods, SOCP seems superior given its fast runtimes in three out of four cases. Peak memory usage is not tracked here because it is difficult to do so when using MATLAB for such small problems; it will be tracked during the third experiment where the system matrices are vastly larger and differences in memory usage across the six algorithms are readily apparent.

Table 5.1. Average algorithm runtimes for noiseless sparsity pattern estimation.
For several fixed (*M*, *P*) pairs, each algorithm's average runtime over the corresponding 50 trials is given in units of milliseconds; *N* = 30 and *K* = 3 for all trials (runtimes of the **...**

Some details on how IRLS, RBRS, CBCS, and SOCP differ in the objective functions that they achieve are provided in [70, 72].

We now modify the scenario of section 5.1 to include additive white Gaussian noise in the observation **d**. The simultaneous sparsity level *K* and SNR (in dB) are varied across sets of Monte Carlo trials. The variance of each component of the noise is related to the SNR as follows:

$${\sigma}^{2}=\frac{1}{M}{\parallel {\mathbf{d}}_{\text{true}}\parallel}_{2}^{2}\cdot {10}^{-\mathrm{SNR}/10}.$$

(5.1)

This noise measure is analogous to that of [11].
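A short sketch of the corresponding noise generation, assuming the real-valued data used in these trials, is shown below (illustrative names).

```python
# Sketch of adding noise per (5.1): sigma^2 = ||d_true||_2^2 / M * 10^(-SNR/10),
# with independent noise added to each of the M components of the observation.
import numpy as np

def add_noise(d_true, snr_db, rng=np.random.default_rng()):
    sigma2 = np.sum(d_true ** 2) / d_true.size * 10.0 ** (-snr_db / 10.0)
    return d_true + np.sqrt(sigma2) * rng.standard_normal(d_true.shape)
```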

We fix *N* = 30, *M* = 25, and *P* = 3, and consider SNR ∈ {−10, −5, 0, …, 25, 30} and *K* ∈ {1, 3, 5, 7, 9}. For each fixed (SNR, *K*) pair, we generate 100 random instances over which to report the average performance. The **g**_{p}'s and **F**_{p}'s are generated as in section 5.1.2. The algorithms are applied as before, with the exception that IRLS, RBRS, CBCS, and SOCP use a fixed *λ* as described below.

Before running the overall experiment, we generate three noisy observations for each (SNR, *K*) pair. We then apply IRLS, RBRS, CBCS, and SOCP, tuning the control parameter *λ* by hand until finding a single value that produces reasonable solutions. All algorithms then use this fixed *λ* for all 100 trials with the (SNR, *K*) pair under consideration. Thus, in contrast to the noiseless experiment, we no longer assume an ideal *λ* is known for each denoising trial.

Each subplot of Figure 5.2 depicts the average fraction of recovered sparsity pattern elements vs. SNR for a fixed *K*, revealing how well the six algorithms are able to recover the sparsity pattern amid noise.

When *K* = 1, we see from the upper-left subplot of Figure 5.2 that all algorithms have essentially equal performance for each SNR. Recovery rates improve substantially with increasing SNR, which is sensible. For each algorithm, we see across the subplots that performance generally decreases with increasing *K*; this too is as expected. For low SNRs, e.g., −5 dB, all algorithms tend to perform similarly, but the greedy algorithms perform increasingly worse than the others as *K* goes from moderate to large and SNR surpasses zero dB. In general, MP performs worse than LSMP, and LSMP in turn performs worse than IRLS, SOCP, RBRS, and CBCS, while the latter four methods exhibit essentially the same performance across all SNRs and *K*'s. Overall, Figure 5.2 shows that convex programming algorithms are superior to greedy methods when estimating sparsity patterns in noisy situations; this coincides with data collected in the noiseless experiment in section 5.1, as well as the empirical findings of [10, 11].

CBCS typically requires more iterations than the other techniques in order to converge. At times, it fails to converge to within the specified *δ* = 10^{−5}, similarly to how it behaves during the noiseless experiment of section 5.1.

Across all denoising trials, MP, LSMP, IRLS, RBRS, CBCS, and SOCP have average runtimes of 3.1, 25.1, 57.2, 247.0, 148.5, and 49.2 milliseconds. It seems that SOCP is best for denoising given that it is the fastest algorithm among the four methods that outperform the greedy ones. IRLS is nearly as fast as SOCP and thus is a close second choice for sparsity pattern estimation.

This experiment is extended with a discussion of mean-squared error of the estimates in [70, 72].

For the final experiment we study how well the six algorithms design MRI RF excitation pulses. In the interest of space and because the conversion of the physical problem into an MSSO format involves MRI physics and requires significant background, we only briefly outline how the system matrices arise and why simultaneously sparse solutions are necessary. A complete formulation of the problem for engineers and mathematicians is given in [70, 71]; MRI pulse designers may refer to [74]. We limit our evaluation here to fidelity of the designed excitations. Related papers provide results from a real system for head-shaped water phantom imaging [74] and in vivo human brain imaging [73].

Consider an MRI experiment in which thin slices are desired in a spatial dimension defined to be the *z* direction. Thin-slice imaging is a dominant use of RF excitation in clinical MRI. For the purposes of this paper, the design of an MRI RF excitation pulse reduces to the following problem: assume that we are given *M* points in the two-dimensional (2-D) (*x*, *y*) spatial domain, **r**_{1}, …, **r**_{M}, along with *N* points in a 2-D “Fourier-like” domain, **k**_{1}, …, **k**_{N}. Each **r**_{m} equals [*x*_{m}, *y*_{m}]^{T}, and each **k**_{n} equals [*k*_{x,n}, *k*_{y,n}]^{T}. The image that forms across the spatial domain is modeled by the linear system

$$\mathbf{m}=\mathbf{Ag},$$

(5.2)

where **A** ∈ ℂ^{M×N} is a known dense Fourier matrix^{5} and the *m*th element of **m** ∈ ℂ^{M} is the image that arises at **r**_{m}, denoted *m*(**r**_{m}), due to the energy deposition along the *N* points on the *k*-space grid as described by the weights in the **g** vector. The energy deposition in the *k*_{z} dimension is along sinc-like pulse segments that do not enter our design process because they determine only the slice profile along *z*, which is fixed in advance.

The goal now is to form a desired (possibly complex-valued) spatial-domain image *d*(**r**) at the given set of spatial-domain coordinates (the **r**_{m}'s) by placing energy at some of the given **k**_{n} locations while obeying a special constraint on how the energy is deposited. To produce the spatial-domain image, we will use a “*P*-channel MRI parallel excitation system” [37, 54]—each of the system's *P* channels is able to deposit energy of varying magnitudes and phases at the *k*-space locations and is able to influence the resulting spatial-domain pattern *m*(**r**) to some extent. Each channel *p* has a known “profile” across space, *S*_{p}(**r**), that weights its contribution to the resulting pattern *m*(**r**).

We now finalize the formulation: first, we construct *P* diagonal matrices **S**_{p} ∈ ℂ^{M×M} such that **S**_{p}(*m*, *m*) = *S*_{p}(**r**_{m}). The overall system is then

$$\mathbf{m}={\mathbf{S}}_{1}{\mathbf{Ag}}_{1}+\cdots +{\mathbf{S}}_{P}{\mathbf{Ag}}_{P}={\mathbf{F}}_{1}{\mathbf{g}}_{1}+\cdots +{\mathbf{F}}_{P}{\mathbf{g}}_{P}={\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot}},$$

(5.3)

where **m** = [*m*(**r**_{1}), …, *m*(**r**_{M})]^{T}. Essentially, (5.3) refines the model of (5.2) by incorporating multiple excitation channels and accounting for coil transmission profiles ${\left\{{S}_{p}\left(r\right)\right\}}_{p=1}^{P}$ that are not necessarily constant across **r**.

Recalling that our overall goal is to deposit energy in *k*-space to produce the image *d*(**r**), and given the special constraint that we may deposit energy only among a small subset of the *N* points in *k*-space, we arrive at the following problem:

$$\underset{{\mathbf{g}}_{1},\dots ,{\mathbf{g}}_{P}}{\mathrm{min}}{\parallel \mathbf{d}-\mathbf{m}\parallel}_{2}\phantom{\rule{1em}{0ex}}\mathrm{s}.\mathrm{t}.\phantom{\rule{thinmathspace}{0ex}}\text{the simultaneous}\phantom{\rule{thinmathspace}{0ex}}K\text{-sparsity of the}\phantom{\rule{thinmathspace}{0ex}}{\mathbf{g}}_{p}\text{'s},$$

(5.4)

where **d** = [*d*(**r**_{1}), …, *d*(**r**_{M})]^{T} ∈ ℂ^{M} and **m** is given by (5.3). That is, we seek out *K* < *N* locations in *k*-space at which to deposit energy to produce an image *m*(**r**) that is close in the ℓ_{2} sense to the desired image *d*(**r**). Strictly and simultaneously *K*-sparse **g**_{p}'s are the only valid solutions to the problem.

One sees that (5.4) is precisely the MSSO system given in (3.2), and thus the algorithms posed in section 4 are applicable to the pulse design problem. In order to apply the convex minimization techniques (IRLS, SOCP, RBRS, and CBCS) to this problem, the only additional step needed is to retune any given solution estimate **ĝ**_{tot} (alg, *λ*) into a strictly and simultaneously *K*-sparse set of vectors.

An alternative approach to deciding where to place energy at *K* locations in *k*-space is to compute the Fourier transform of *d*(**r**) and place energy at the (*k*_{x}, *k*_{y}) grid locations where the transform's magnitude is largest.

Using an eight-channel system (i.e., *P* = 8) whose profile magnitudes (the *S*_{p}(**r**)'s) are depicted in Figure 5.3, we will design pulses to produce the desired image shown in the left subplot of Figure 5.4. We sample the spatial (*x*, *y*) domain at *M* = 356 locations within the region where at least one of the profiles in Figure 5.3 is active; this region of interest is referred to as the *field of excitation* (FOX) in MRI literature.^{6} The spatial samples are spaced by 0.8 cm along each axis, and the FOX has a diameter of roughly 20 cm. Given our choice of **r**_{1}, …, **r**_{356}, we sample the *S*_{p}(**r**)'s and *d*(**r**) and construct the **S**_{p}'s and **d**. Next, we define a grid of *N* = 225 points in (*k*_{x}, *k*_{y})-space that is 15 × 15 in size and extends outward from the *k*-space origin. The points are spaced by $\frac{1}{20}\phantom{\rule{thinmathspace}{0ex}}{\mathrm{cm}}^{-1}$ along each *k*-space axis, and the overall grid is shown in the right subplot of Figure 5.4. Finally, because we know the 356 **r**_{m}'s and 225 **k**_{n}'s, we construct the 356 × 225 matrix **A** in (5.2), (5.3) along with the **F**_{p}'s in (5.3). We now have all the data we need to apply the algorithms and determine simultaneously *K*-sparse **g**_{p}'s that (approximately) solve (5.4).
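The assembly of the system matrices can be sketched as below. The exact scaling of the Fourier matrix **A** is not reproduced here; entries proportional to exp(*i*2π **r**_{m} · **k**_{n}) are assumed as an illustrative stand-in for the paper's design matrix, and all names are hypothetical.

```python
# Sketch of assembling the MSSO system (5.3): a Fourier-like matrix A built from
# the spatial samples r_m and k-space samples k_n (assumed form, up to scaling),
# and F_p = S_p A with S_p the diagonal matrix of channel p's profile samples.
import numpy as np

def build_pulse_design_system(r, k, profiles):
    """r: (M, 2) spatial samples [cm]; k: (N, 2) k-space samples [1/cm];
    profiles: (P, M) complex samples S_p(r_m)."""
    A = np.exp(1j * 2 * np.pi * (r @ k.T))                              # M x N (assumed form)
    F_list = [profiles[p][:, None] * A for p in range(len(profiles))]   # F_p = S_p A
    return A, F_list
```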

We apply the algorithms and evaluate designs where the use of *K* ∈ {1, …, 30} candidate points in *k*-space is permitted (in practical MRI scenarios, *K* up to 30 is permissible). Typically, the smallest *K* that produces a version of *d*(**r**) to within some ℓ_{2} fidelity is the design that the MRI pulse designer will use on a real system.

To obtain simultaneously *K*-sparse solutions with MP and LSMP, we set *K* = 30, run each algorithm once, remember the ordered list of chosen indices, and back out every solution for *K* = 1 through *K* = 30.

For each convex minimization method (IRLS, SOCP, RBRS, and CBCS), we apply the following procedure: first, we run the algorithm for 14 values of $\lambda \in [0,\frac{1}{4}]$, storing each resulting solution **ĝ**_{tot}(alg, *λ*). Then, for fixed *K*, to determine a simultaneously *K*-sparse deposition of energy on the *k*-space grid, we apply the retuning process to each of the 14 solutions, obtaining 14 strictly simultaneously *K*-sparse retuned sets of solution vectors, ${\widehat{\mathbf{g}}}_{\mathrm{tot}}^{\left(K\right)}(\mathrm{alg},\lambda )$. The one solution among the 14 that best minimizes the ℓ_{2} error between the desired and resulting images, ${\parallel \mathbf{d}-{\mathbf{F}}_{\mathrm{tot}}{\widehat{\mathbf{g}}}_{\mathrm{tot}}^{\left(K\right)}(\mathrm{alg},\lambda )\parallel}_{2}$, is chosen as the solution for the *K* under consideration. Essentially, we assume a good value of *λ* is known when applying each of the convex minimization methods. To attempt to avoid convergence problems, RBRS and CBCS are permitted 5 000 and 10 000 maximum outer iterations, respectively (see below).
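The retuning step itself is detailed in [70, 72]; a minimal version of the kind of step described there, assuming it keeps the *K* rows of **Ĝ** with the largest ℓ_{2} energy and refits those entries by least squares, might look as follows (illustrative names).

```python
# Sketch of a retuning step (assumption: keep the K largest-energy rows of G-hat,
# then refit the retained h_n blocks by least squares on the restricted support).
import numpy as np

def retune_to_K_sparse(F_list, d, G_hat, K):
    P = len(F_list)
    support = np.argsort(np.linalg.norm(G_hat, axis=1))[-K:]
    C_restricted = np.hstack([np.column_stack([F_list[p][:, n] for p in range(P)])
                              for n in support])                  # M x PK
    h = np.linalg.lstsq(C_restricted, d, rcond=None)[0]
    G_new = np.zeros_like(G_hat)
    for i, n in enumerate(support):
        G_new[n, :] = h[i * P:(i + 1) * P]
    return G_new
```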

The left subplot of Figure 5.5 shows the ℓ_{2} error vs. *K* curves for each algorithm. As *K* is increased, each method produces images with lower ℓ_{2} error, which is sensible: depositing energy at more locations in *k*-space gives each technique more degrees of freedom with which to form the image. In contrast to the sparsity pattern estimation experiments in sections 5.1 and 5.2, however, here LSMP is the best algorithm: for each fixed *K* considered in Figure 5.5, the LSMP technique yields a simultaneously *K*-sparse energy deposition that produces a higher-fidelity image than all other techniques. For example, when *K* = 17, LSMP yields an image with ℓ_{2} error of 3.3. In order to produce an image with equal or better fidelity, IRLS, RBRS, and SOCP need to deposit energy at *K* = 21 points in *k*-space and thus produce less useful designs from an MRI perspective. CBCS fares the worst, needing to deposit energy at *K* = 25 grid points in order to compete with the fidelity of LSMP's *K* = 17 image.

The right subplot of Figure 5.5 shows how well the four convex minimization algorithms minimize the objective function (3.3), (3.7) before retuning any solutions and enforcing strict simultaneous *K*-sparsity. For each fixed *λ*, SOCP and IRLS find solutions that yield the same objective function value. RBRS's solutions generally yield objective function values equal to those of SOCP and IRLS, but at times lead to higher values: in these cases RBRS almost converges but does not do so completely. Finally, for many *λ*'s, CBCS fails to converge and yields extremely large objective function values. Some additional detail on convergence behavior is provided in [70, 72].

Setting *K* = 30, we run MP and LSMP and record the runtime of each. Across the 14 *λ*'s, the IRLS, RBRS, CBCS, and SOCP runtimes are recorded and averaged. The peak memory usage of each algorithm is also noted; these statistics are presented in Table 5.2. In distinct contrast to the smaller-variable-size experiments in sections 5.1 and 5.2, where all four convex minimization methods have relatively short runtimes (under one second), here RBRS and CBCS are much slower, leaving IRLS and SOCP as the only feasible techniques among the four. Furthermore, while LSMP does indeed outperform IRLS and SOCP on an ℓ_{2} error basis (as shown in Figure 5.5), the runtime statistics show that LSMP requires an order of magnitude more runtime to solve the problem; in real-life scenarios where pulses must be designed in under 5 minutes, IRLS and SOCP are therefore superior. Finally, in contrast to section 5.1's runtimes given in Table 5.1, here IRLS is 1.9 times faster than SOCP and uses 1.4 times less peak memory, making it the superior technique for MRI pulse design, since IRLS's ℓ_{2} error performance and ability to minimize the objective function (3.3), (3.7) essentially equal those of SOCP.

To conclude the experiment, we fix *K* = 17 and observe the images produced by the algorithms along with the points at which each algorithm chooses to deposit energy on the grid of candidate points in (*k*_{x}, *k*_{y})-space. Figure 5.6 illustrates the images (in both magnitude and phase) that arise from each algorithm's simultaneously 17-sparse set of solution vectors.^{7}

The MRI pulse design problem in section 5.3 differs substantially from the source localization problem in section 5.1, the denoising experiment in section 5.2, and other routine applications of sparse approximation (e.g., [16, 7, 27, 22, 10, 11, 43]). It differs not only in purpose but in numerical properties, the latter of which are summarized in Table 6.1. While this list will not always hold true on an application-by-application basis, it does highlight general differences between the two problem classes.

Even though LSMP, IRLS, and SOCP tend to exhibit superior performance across the different experiments in this manuscript, RBRS and CBCS remain worthwhile because, unlike the former methods, which update all *PN* unknowns concurrently, the shrinkage techniques update only a subset of the variables during each iteration.

For example, RBRS as given in Algorithm 3 updates only *P* unknowns at once, while CBCS updates but a single scalar at a time [70, 72]. RBRS and CBCS are thus applicable in scenarios where *P* and *N* are exceedingly large and tuning all *PN* parameters during each iteration is not possible. If storing and handling *M* × *PN* or *PN* × *PN* matrices exceeds a system's available memory and causes disk thrashing, RBRS and CBCS, though they require far more iterations, might still be better options than LSMP, IRLS, and SOCP in terms of runtime.
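The memory argument can be made concrete with the generic row-by-row group-shrinkage sweep sketched below: each step touches only the *P* columns associated with the current grid index, never an *M* × *PN* or *PN* × *PN* matrix. This is a proximal block-coordinate stand-in for Algorithm 3, not a transcription of it.

```python
import numpy as np

def row_shrinkage_sweep(F, d, G, lam, n_sweeps=10):
    """Generic row-by-row shrinkage for an l1,2-regularized least-squares
    objective.  F: list of P (M x N) matrices; d: observation; G: N x P array
    of coefficients (row n holds the P entries tied to grid index n); lam:
    regularization weight.  Each inner step forms only the M x P block for the
    current row and applies a proximal (group soft-threshold) update."""
    P, (M, N) = len(F), F[0].shape
    r = d - sum(F[p] @ G[:, p] for p in range(P))                # current residual
    for _ in range(n_sweeps):
        for n in range(N):
            B = np.column_stack([F[p][:, n] for p in range(P)])  # M x P block only
            L = np.linalg.norm(B, 2) ** 2 + 1e-12                # local Lipschitz constant
            z = G[n] + (B.conj().T @ r) / L                      # gradient step on the block
            new = max(0.0, 1.0 - lam / (L * np.linalg.norm(z) + 1e-12)) * z
            r = r + B @ (G[n] - new)                             # update residual for the change
            G[n] = new                                           # group soft-threshold result
    return G
```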

A fast technique for finding ideal values of *λ* remains an open problem. Automated selection approaches worth investigating include the “L-curve” method [33], universal parameter selection [20], and min-max parameter selection [35].

As noted in section 4, LSMP's computation and runtime could be improved upon by extending the projection-based recursive updating schemes of [10, 11] to MSSO LSMP. In regard to the MRI design problem, runtime for any method might be reduced via a multiresolution approach (as in [43]) by first running the algorithm with a coarse *k*-space frequency grid, noting which energy deposition locations are revealed, and then running the algorithm with a grid that is finely sampled around the locations suggested by the coarse result. This is faster than providing the algorithm a large, finely sampled grid and attempting to solve the sparse energy deposition task in one step. An initial investigation has shown that reducing the maximum frequency extent of the *k*-space grid (and thus the number of grid points, *N*) may also reduce runtime without significantly impacting image fidelity [74].
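A rough sketch of the coarse-to-fine idea follows. Here `run_sparse_design` is a hypothetical wrapper around any of the MSSO algorithms, `nonzero_rows` extracts the grid indices at which energy was deposited, and the grid spacing and refinement radius are illustrative values only.

```python
import numpy as np

def refine_grid(coarse_k, chosen_idx, spacing_fine=0.025, radius=0.05):
    """Build a fine (kx, ky) grid clustered around the coarse locations that a
    first, coarse-grid design selected (units of 1/cm assumed)."""
    offsets = np.arange(-radius, radius + 1e-9, spacing_fine)
    dx, dy = np.meshgrid(offsets, offsets)
    patch = np.column_stack([dx.ravel(), dy.ravel()])
    fine = np.vstack([coarse_k[i] + patch for i in chosen_idx])
    return np.unique(np.round(fine, 6), axis=0)          # drop duplicate points

# Two-pass usage (hypothetical driver functions):
# g_coarse = run_sparse_design(coarse_k, K)              # pass 1: coarse grid
# fine_k   = refine_grid(coarse_k, nonzero_rows(g_coarse))
# g_fine   = run_sparse_design(fine_k, K)                # pass 2: fine grid near pass-1 picks
```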

When solving the MRI pulse design problem, both RBRS and CBCS required excessive iterations and hence exhibited lengthy runtimes. To mitigate these problems, one may consider extending the parallel coordinate descent (PCD) shrinkage techniques used for SSSO sparse approximation (as in [21, 22]). Sequential subspace optimization (SESOP) [23] might also be employed to reduce runtime. Combining PCD with SESOP and adding a line search after each iteration would yield more sophisticated variants of RBRS and CBCS.

It may be useful to consider a combined problem where there are multiple observations as well as multiple system matrices. That is, assume we have a series of *J* observations, **d**_{1}, …, **d**_{J}, each of which arises due to a set of *P* simultaneously *K*-sparse unknown vectors **g**_{1,j}, …, **g**_{P,j}^{8} passing through a set of *P* system matrices **F**_{1,j}, …, **F**_{P,j} and then undergoing linear combination, as follows:

$${\mathbf{d}}_{j}={\mathbf{F}}_{1,j}{\mathbf{g}}_{1,j}+\cdots +{\mathbf{F}}_{P,j}{\mathbf{g}}_{P,j}=\underset{p=1}{\overset{P}{\Sigma}}{\mathbf{F}}_{p,j}{\mathbf{g}}_{p,j}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j=1,\dots ,J.$$

(6.1)

If each **F**_{p,j} is the same for all *J* observations (i.e., **F**_{p,j} = **F**_{p}), then the problem reduces to

$${\mathbf{d}}_{j}={\mathbf{F}}_{1}{\mathbf{g}}_{1,j}+\cdots +{\mathbf{F}}_{P}{\mathbf{g}}_{P,j}={\mathbf{F}}_{\mathrm{tot}}{\mathbf{g}}_{\mathrm{tot},j}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j=1,\dots ,J,$$

(6.2)

and we may stack the matrices and terms as follows:

$$[{\mathbf{d}}_{1},\dots ,{\mathbf{d}}_{J}]={\mathbf{F}}_{\mathrm{tot}}[{\mathbf{g}}_{\mathrm{tot},1},\dots ,{\mathbf{g}}_{\mathrm{tot},J}].$$

(6.3)

Having posed (6.1), (6.2), (6.3), one may formulate optimization problems similar to (2.4), (3.3) to determine simultaneously sparse **g**_{p,j}'s that solve (6.3). Algorithms for solving such problems may arise by combining the concepts of SSMO algorithms [11, 43, 65, 62] with those of the MSSO algorithms posed earlier.
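The sketch below illustrates the stacked form (6.3) for constant **F**_{p}'s and the joint sparsity structure it induces; the dimensions are illustrative, and the zero-initialized unknowns simply mark where the simultaneously sparse coefficients would live.

```python
import numpy as np

# Stacked form (6.3): the J observations are the columns of D, the unknowns are
# the columns of G_tot, and the sparsity pattern is shared across channels p
# and observations j.
M, N, P, J = 64, 100, 4, 3                       # illustrative sizes
F = [np.random.randn(M, N) for _ in range(P)]
F_tot = np.hstack(F)                             # M x (P*N), as in (6.2)

G_tot = np.zeros((P * N, J))                     # column j stacks g_{1,j}, ..., g_{P,j}
D = F_tot @ G_tot                                # columns are d_1, ..., d_J, as in (6.3)

# Grid index n is "active" if any of the P*J entries g_{p,j}[n] is nonzero, so
# the simultaneous sparsity level is the number of active indices.
active = np.any(G_tot.reshape(P, N, J) != 0, axis=(0, 2))
K_simultaneous = int(active.sum())
```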

We defined the linear inverse MSSO simultaneous sparsity problem where simultaneously sparse sets of unknown vectors are required as the solution. This problem differed from prior problems involving multiple unknown vectors because, in this case, each unknown vector was passed through a different system matrix and the outputs of the various matrices underwent linear combination, yielding only one observation vector.

To solve the proposed MSSO problem, we formulated three greedy techniques (MP, OMP, and LSMP) along with algorithms based on IRLS, iterative shrinkage, and SOCP methodologies. The MSSO algorithms were evaluated across noiseless and noisy sparsity pattern estimation experiments as well as an MRI pulse design experiment; for sparsity pattern recovery, algorithms that minimized the relaxed convex objective function outperformed the greedy methods, whereas in the MRI pulse design experiment, greedy LSMP exhibited superior performance.

When deriving RBRS for complex-valued data, we showed that seeking a single sparse complex-valued vector is equivalent to seeking two simultaneously sparse real-valued vectors: we mapped single-vector sparse approximation of a complex vector to the MSSO problem, increasing the applicability of algorithms that solve the latter.
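The equivalence is easy to check numerically, as in the sketch below: splitting a sparse complex vector into its real and imaginary parts yields two real-valued vectors with a common sparsity pattern, and the stacked real-valued system is a *P* = 2 MSSO instance. The dimensions and random data are illustrative only.

```python
import numpy as np

# Check that a complex system d = F g maps to a P = 2 real-valued MSSO problem.
M, N, K = 20, 40, 3
rng = np.random.default_rng(0)
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
g = np.zeros(N, dtype=complex)
support = rng.choice(N, K, replace=False)
g[support] = rng.standard_normal(K) + 1j * rng.standard_normal(K)
d = F @ g

# Real-valued MSSO form: stack real and imaginary parts of the observation and
# use one system matrix per real-valued unknown vector.
d_ri = np.concatenate([d.real, d.imag])          # length 2M
F1 = np.vstack([F.real, F.imag])                 # multiplies Re(g)
F2 = np.vstack([-F.imag, F.real])                # multiplies Im(g)

# d_ri = F1 Re(g) + F2 Im(g), and Re(g), Im(g) are simultaneously K-sparse.
assert np.allclose(d_ri, F1 @ g.real + F2 @ g.imag)
```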

Overall, while improvements upon these seven algorithms (and entirely new algorithms) surely exist, this manuscript has laid the groundwork for the MSSO problem and conducted an initial exploration of algorithms with which to solve it.

The authors thank D. M. Malioutov for assistance with the derivation step that permitted the transition from (4.16) to (4.17) in section 4.7, as well as K. Setsompop, B. A. Gagoski, V. Alagappan, and L. L. Wald for collecting the experimental coil profile data in Figure 5.3. The authors also gratefully acknowledge discussions on related work with A. C. Gilbert, R. Maleh, and D. Yoon.

^{*}This material is based upon work supported by the National Institutes of Health under grants 1P41RR14075, 1R01EB000790, 1R01EB006847, and 1R01EB007942; the National Science Foundation under CAREER Grant 0643836; United States Department of Defense National Defense Science and Engineering Graduate Fellowship F49620-02-C-0041; the MIND Institute; the A. A. Martinos Center for Biomedical Imaging; Siemens Medical Solutions; and R. J. Shillman's Career Development Award.

^{1}The choice of ℓ_{2} here is arbitrary; it can be replaced by any vector norm. We write ‖·‖_{0,2} because the relaxation we use subsequently is then naturally ‖·‖_{1,2}.

^{2}An ℓ_{p} norm with *p* < 1 could be used in place of the ℓ_{1} norm if one is willing to deal with a nonconvex objective function. Further, an ℓ_{q} norm (with *q* > 2) rather than an ℓ_{2} norm could be applied to each row of **G**, because the purpose of the row operation is to collapse the elements of the row into a scalar value without introducing a sparsifying effect.

^{3}Equation (4.10) consists of a direct inversion of a *P* × *P* matrix, which is acceptable in this paper because all experiments involve *P* ≤ 10. If *P* is large, (4.10) could be solved via a CG technique (e.g., LSQR).

^{4}This could be described as thresholding the output of the original computation.

^{5}Formally, **A**(*m*, *n*) = *jγe*^{*i***r**_{m}·**k**_{n}}, where $j=\sqrt{-1}$ and *γ* is a known lumped gain constant.

^{6}Sampling points outside of the FOX where no profile has influence is unnecessary because an image can never be formed at these points no matter how much energy any given channel places throughout *k*-space.

^{7}Each image is generated by taking the corresponding solution **g**_{tot}, computing **m** in (5.3), unstacking the elements of **m** into *m*(**r**), and then displaying the magnitude and phase of *m*(**r**).

^{8}The *K*-term simultaneous sparsity pattern of each set of **g**_{p,j}'s may or may not change with *j*.

1. Ben-Tal A, Nemirovski A. MPS/SIAM Ser. Optim. 2. SIAM; Philadelphia: 2001. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications.

2. Bjorck A, Elfving T. Technical report LiTH-MAT-R-1978-5. Department of Mathematics, Linköping University; Linköping, Sweden: 1978. Accelerated Projection Methods for Computing Pseudoinverse Solutions of Systems of Linear Equations.

3. Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge, UK: 2004.

4. Bruckstein AM, Donoho DL, Elad M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 2009;51:34–81.

5. Candès EJ, Tao T. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory. 2006;52:5406–5425.

6. Chen S, Billings SA, Luo W. Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control. 1989;50:1873–1896.

7. Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 1998;20:33–61.

8. Chen J, Huo X. Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Process. 2006;54:4634–4643.

9. Chen S, Wigger J. Fast orthogonal least squares algorithm for efficient subset model selection. IEEE Trans. Signal Process. 1995;43:1713–1715.

10. Cotter SF, Adler J, Rao BD, Kreutz-Delgado K. Forward sequential algorithms for best basis selection. IEE Proc. Vision Image Signal Process. 1999;146:235–244.

11. Cotter SF, Rao BD, Engan K, Kreutz-Delgado K. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans. Signal Process. 2005;53:2477–2488.

12. Daubechies I, Defrise M, De Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 2004;57:1413–1457.

13. Davis G. Adaptive Nonlinear Approximations. New York University; New York: 1994. Ph.D. thesis.

14. Davis G, Mallat S, Zhang Z. Adaptive time-frequency decomposition. Opt. Eng. 1994;33:2183–2191.

15. Donoho DL. Superresolution via sparsity constraints. SIAM J. Math. Anal. 1992;23:1309–1331.

16. Donoho DL. De-noising by soft-thresholding. IEEE Trans. Inform. Theory. 1995;41:613–627.

17. Donoho DL. Compressed sensing. IEEE Trans. Inform. Theory. 2006;52:1289–1306.

18. Donoho DL. For most large underdetermined systems of equations, the minimal ℓ_{1}-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 2006;59:907–934.

19. Donoho DL, Elad M, Temlyakov VN. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory. 2006;52:6–18.

20. Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.

21. Elad M. Why simple shrinkage is still relevant for redundant representations? IEEE Trans. Inform. Theory. 2006;52:5559–5569.

22. Elad M, Matalon B, Zibulevsky M. Image denoising with shrinkage and redundant representations; Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR); 2006. pp. 1924–1931.

23. Elad M, Matalon B, Zibulevsky M. Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization. Appl. Comput. Harmon. Anal. 2007;23:346–367.

24. Eldar YC, Bölcskei H. Block-sparsity: Coherence and efficient recovery. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing; Taipei, Taiwan. 2009. pp. 2885–2888.

25. Eldar YC, Rauhut H. Average Case Analysis of Multichannel Sparse Recovery Using Convex Relaxation. 2009. available online from http://www.arxiv.org/abs/0904.0494v1.

26. Fletcher AK, Rangan S. Orthogonal matching pursuit from noisy random measurements: A new analysis. Proceedings of the Neural Information Processing Systems Conference; Vancouver, Canada. 2009.

27. Fletcher AK, Rangan S, Goyal VK, Ramchandran K. Denoising by sparse approximation: Error bounds based on rate-distortion theory. EURASIP J. Appl. Signal Process. 2006;2006:26318.

28. Fornasier M, Rauhut H. Recovery algorithms for vector-valued data with joint sparsity constraints. SIAM J. Numer. Anal. 2008;46:577–613.

29. Golub G, Kahan W. Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. 1965;2:205–224.

30. Golub GH, Van Loan CF. Matrix Computations. Johns Hopkins University Press; Baltimore: 1983.

31. Gorodnitsky IF, Rao BD. Sparse signal reconstruction from limited data using FOCUSS: A recursive weighted norm minimization algorithm. IEEE Trans. Signal Process. 1997;45:600–616.

32. Gribonval R, Rauhut H, Schnass K, Vandergheynst P. Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms. J. Fourier Anal. Appl. 2008;14:655–687.

33. Hansen PC. Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numer. Algorithms. 1994;6:1–35.

34. Johnson DH, Dudgeon DE. Array Signal Processing: Concepts and Techniques. Prentice–Hall; Englewood Cliffs, NJ: 1993.

35. Johnstone IM. On minimax estimation of a sparse normal mean vector. Ann. Statist. 1994;22:271–289.

36. Karlovitz LA. Construction of nearest points in the L^{p}, p even, and L^{∞} norms. I. J. Approx. Theory. 1970;3:123–127.

37. Katscher U, Bornert P, Leussler C, van den Brink JS. Transmit SENSE. Magn. Reson. Med. 2003;49:144–150.

38. Kowalski M, Torrésani B. SPARS'09—Signal Processing with Adaptive Sparse Structured Representations. Saint-Malo, France: 2009. Structured sparsity: From mixed norms to structured shrinkage.

39. Krim H, Viberg M. Two decades of array signal processing research. The parametric approach. IEEE Signal Process. Mag. 1996;13:67–94.

40. Lanczos C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Research Nat. Bur. Standards. 1950;45:255–282.

41. Maleh R, Yoon D, Gilbert AC. SPARS'09—Signal Processing with Adaptive Sparse Structured Representations. Saint-Malo, France: 2009. Fast algorithm for sparse signal approximation using multiple additive dictionaries.

42. Malioutov DM. A Sparse Signal Reconstruction Perspective for Source Localization with Sensor Arrays. Massachusetts Institute of Technology; Cambridge, MA: 2003. Master's thesis.

43. Malioutov DM, Çetin M, Willsky AS. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 2005;53:3010–3022.

44. Mallat SG, Zhang Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993;41:3397–3415.

45. Mishali M, Eldar YC. Reduce and boost: Recovering arbitrary sets of jointly sparse vectors. IEEE Trans. Signal Process. 2008;56:4692–4702.

46. Natarajan BK. Sparse approximate solutions to linear systems. SIAM J. Comput. 1995;24:227–234.

47. Negahban S, Wainwright MJ. Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ_{1,∞}-regularization. Proceedings of the Neural Information Processing Systems Conference; Vancouver, Canada. 2008.

48. Obozinski G, Wainwright MJ, Jordan MI. High-dimensional support union recovery in multivariate regression. Proceedings of the Neural Information Processing Systems Conference; Vancouver, Canada. 2008.

49. Paige CC, Saunders MA. Algorithm 583: LSQR: Sparse linear equations and least squares problems. ACM Trans. Math. Software. 1982;8:195–209.

50. Paige CC, Saunders MA. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software. 1982;8:43–71.

51. Pati YC, Rezaiifar R, Krishnaprasad PS. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems, and Computers; Pacific Grove, CA. 1993. pp. 40–44.

52. Pauly JM, Nishimura D, Macovski A. A k-space analysis of small-tip-angle excitation. J. Magn. Reson. 1989;81:43–56.

53. Rangan S, Fletcher AK, Goyal VK. Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing. 2009. available online from http://www.arxiv.org/abs/0906.3234v1.

54. Setsompop K, Wald LL, Alagappan V, Gagoski BA, Hebrank F, Fontius U, Schmitt F, Adalsteinsson E. Parallel RF transmission with 8 channels at 3 Tesla. Magn. Reson. Med. 2006;56:1163–1171.

55. Stojnic M, Parvaresh F, Hassibi B. On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. Signal Process. 2009;57:3075–3085.

56. Sturm J. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 1999;11/12:625–653.

57. Tibshirani R. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Ser. B. 1996;58:267–288.

58. Tikhonov AN. Solution of incorrectly formulated problems and the regularization method. Dokl. Akad. Nauk SSSR. 1963;151:501–504. in Russian.

59. Tikhonov AN, Arsenin VA. Solution of Ill-Posed Problems. Winston & Sons; Washington, DC: 1977.

60. Toh KC, Todd MJ, Tutuncu RH. SDPT3—a MATLAB software package for semidefinite programming, version 1.3. Optim. Methods Softw. 1999;11/12:545–581.

61. Tropp JA. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory. 2004;50:2231–2242.

62. Tropp JA. Algorithms for simultaneous sparse approximation: Part II: Convex relaxation. Signal Process. 2006;86:589–602.

63. Tropp JA. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory. 2006;52:1030–1051.

64. Tropp JA, Gilbert AC. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inform. Theory. 2007;53:4655–4666.

65. Tropp JA, Gilbert AC, Strauss MJ. Algorithms for simultaneous sparse approximation: Part I: Greedy pursuit. Signal Process. 2006;86:572–588.

66. van den Berg E, Friedlander MP. Joint-Sparse Recovery from Multiple Measurements. 2009. available online from http://www.arxiv.org/abs/0904.2051v1.

67. Wainwright MJ. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ_{1}-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory. 2009;55:2183–2202.

68. Wipf DP, Rao B. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process. 2007;55:3704–3716.

69. Yip CY, Fessler JA, Noll DC. Advanced 3D tailored RF pulse for signal recovery in ${T}_{2}^{\ast}$-weighted functional magnetic resonance imaging. Magn. Reson. Med. 2006;56:1050–1059.

70. Zelinski AC. Improvements in Magnetic Resonance Imaging Excitation Pulse Design. Massachusetts Institute of Technology; Cambridge, MA: 2008. Ph.D. thesis.

71. Zelinski AC, Goyal VK, Adalsteinsson E, Wald LL. Sparsity in MRI RF excitation pulse design. Proceedings of the 42nd Annual Conference on Information Sciences and Systems; Princeton, NJ. 2008. pp. 252–257.

72. Zelinski AC, Goyal VK, Adalsteinsson E. Simultaneously Sparse Solutions to Linear Inverse Problems with Multiple System Matrices and a Single Observation Vector. 2009. available online from http://www.arxiv.org/abs/0907.2083v1.

73. Zelinski AC, Wald LL, Setsompop K, Alagappan V, Gagoski BA, Goyal VK, Adalsteinsson E. Fast slice-selective radio-frequency excitation pulses for mitigating ${B}_{1}^{+}$ inhomogeneity in the human brain at 7 Tesla. Magn. Reson. Med. 2008;59:1355–1364.

74. Zelinski AC, Wald LL, Setsompop K, Goyal VK, Adalsteinsson E. Sparsity-enforced slice-selective MRI RF excitation pulse design. IEEE Trans. Med. Imaging. 2008;27:1213–1229.
