SIAM J Sci Comput. Author manuscript; available in PMC 2010 May 4.
Published in final edited form as:
SIAM J Sci Comput. 2010 January 20; 31(6): 4533–4579.
doi: 10.1137/080730822
PMCID: PMC2863111
NIHMSID: NIHMS184504

SIMULTANEOUSLY SPARSE SOLUTIONS TO LINEAR INVERSE PROBLEMS WITH MULTIPLE SYSTEM MATRICES AND A SINGLE OBSERVATION VECTOR*

Abstract

A problem that arises in slice-selective magnetic resonance imaging (MRI) radio-frequency (RF) excitation pulse design is abstracted as a novel linear inverse problem with a simultaneous sparsity constraint. Multiple unknown signal vectors are to be determined, where each passes through a different system matrix and the results are added to yield a single observation vector. Given the matrices and lone observation, the objective is to find a simultaneously sparse set of unknown vectors that approximately solves the system. We refer to this as the multiple-system single-output (MSSO) simultaneous sparse approximation problem. This manuscript contrasts the MSSO problem with other simultaneous sparsity problems and conducts an initial exploration of algorithms with which to solve it. Greedy algorithms and techniques based on convex relaxation are derived and compared empirically. Experiments involve sparsity pattern recovery in noiseless and noisy settings and MRI RF pulse design.

Keywords: iterative shrinkage, iteratively reweighted least squares, magnetic resonance imaging excitation pulse design, matching pursuit, multiple measurement vectors, second-order cone programming, simultaneous sparse approximation, sparse approximation

1. Introduction

In this work we propose a linear inverse problem that requires a simultaneously sparse set of vectors as the solution, i.e., a set of vectors where only a small number of each vector's entries are nonzero, and where the vectors' sparsity patterns (the locations of the nonzero entries) are equal. Sparsity constraints and regularizations that promote sparsity have a long history that we will not attempt to recount; the reader is referred to [15] both for a theoretical result on the numerical robustness of using sparsity constraints and for references to early empirical work. While perhaps an old topic, the estimation of sparse or approximately sparse signals is also an extremely active area of research because of the emergence of compressed sensing [5, 17] and a multitude of applications for wavelet-domain sparsity of images; a review of recent developments with an emphasis on algorithms and performance guarantees is provided by [4].

We call the problem of interest multiple-system single-output (MSSO) simultaneous sparse approximation:

d \approx \sum_{p=1}^{P} F_p g_p,
(1.1)

where d ∈ ℂM, Fp ∈ ℂM×N for each p, and the vectors gp ∈ ℂN share a common sparsity pattern; more precisely, we measure the sparsity level by the number of positions at which any of the gp's is nonzero. This name highlights the distinction from the more common setting of single-system multiple-output (SSMO) simultaneous sparse approximation:

d_p \approx F g_p \quad \text{for } p = 1, \ldots, P,
(1.2)

where dp ∈ ℂM for each p, F ∈ ℂM×N, and the gp's again have a common sparsity pattern. In either case, associated inverse problems are to minimize the simultaneous sparsity level of the gp's for a given goodness of fit, to optimize the goodness of fit for a given sparsity level, or to minimize an objective function that combines goodness of fit and sparsity level. There is a large and growing literature on SSMO problems, often using the phrase multiple measurement vectors, but MSSO problems have received much less attention.
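To make the distinction concrete, the following NumPy sketch (our own illustration, not from the paper; all variable names are ours) generates a simultaneously K-sparse G and forms the single MSSO observation of (1.1) alongside the P SSMO observations of (1.2):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, P, K = 20, 30, 3, 4
support = rng.choice(N, size=K, replace=False)      # shared sparsity pattern

# Simultaneously K-sparse unknowns g_1, ..., g_P stored as the columns of G
G = np.zeros((N, P), dtype=complex)
G[support, :] = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))

# MSSO: P system matrices, one observation  d = sum_p F_p g_p
F = [rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)) for _ in range(P)]
d_msso = sum(F[p] @ G[:, p] for p in range(P))      # a single vector in C^M

# SSMO: one system matrix, P observations  d_p = F g_p
F0 = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
D_ssmo = F0 @ G                                     # an M-by-P matrix of observations
```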

MSSO sparse approximation came to our attention as an abstraction of the placement of spokes in short-duration, slice-selective magnetic resonance imaging (MRI) radio-frequency (RF) excitation pulse design [70, 73, 74], and we know of no earlier application. To keep the bulk of the developments applicable generally, we defer a description of the MRI application until the end of the paper. The paper is focused on formalizing the MSSO sparse approximation problem and introducing and contrasting several algorithms for finding approximate solutions. We have implemented and tested three greedy algorithms—generalizing matching pursuit (MP) [44], orthogonal matching pursuit (OMP) [6, 51, 14, 46, 9, 10], and least squares matching pursuit (LSMP) [10]—and also algorithms for solving a convex relaxation of the MSSO problem, applying second-order cone programming (SOCP) [3, 43], and generalizing iteratively reweighted least squares (IRLS) [36, 31] and iterative shrinkage [20, 12, 21, 22]. We evaluate the performance of the algorithms across three experiments: the first and second involve sparsity pattern recovery in noiseless and noisy scenarios, respectively, while the third deals with MRI RF excitation pulse design.

It is worth noting explicitly that this paper does not provide conditions for optimality (i.e., equivalence with a generally intractable sparsity-constrained problem) of greedy algorithms or convex relaxations, nor does it analyze random ensembles of problems. The sparse approximation literature is increasingly dominated by these types of analyses, and we cite these results in the appropriate sections.

The structure of this paper is as follows: in section 2, we provide background information about ordinary sparse approximation and SSMO sparse approximation. In section 3, we formulate the MSSO problem. Algorithms for solving the problem are then posed in section 4. The coverage of some algorithms is very brief; additional details on these can be found in [70, 72]. Details and results of the numerical experiments appear in section 5. Section 6 highlights the strengths and weaknesses of the algorithms and presents ideas for future work. Concluding remarks are given in section 7.

2. Background

2.1. Single-system single-output (SSSO) sparse approximation

Following our naming rubric, ordinary sparse approximation can be called SSSO sparse approximation for emphasis. The problem can be written as

\arg\min_{g \in \mathbb{C}^N} \left\{ \tfrac{1}{2} \| d - F g \|_2^2 + \lambda \| g \|_0 \right\}, \quad \text{given } d \in \mathbb{C}^M \text{ and } F \in \mathbb{C}^{M \times N},
(2.1)

where ‖ · ‖0 denotes the number of nonzero elements of a vector and λ ∈ (0, ∞) is a control parameter. Varying λ determines the relative importance of fitting the data d and keeping the solution g sparse. In many applications, λ is set to yield a specified sparsity or specified residual; it may also have other significance. For general F, solving (2.1) is NP-hard [13, 46]. Thus, great effort has gone into the design and analysis of approximate algorithms.

A greedy approach is to iteratively choose one position for a nonzero entry of g at a time or, equivalently, to pick one column of F at a time. Selecting the column of F that maximizes the magnitude of its inner product with the current residual is called matching pursuit (MP) [44], and several more sophisticated variants have been proposed [6, 51, 14, 46, 9, 10]. Most important among these is orthogonal MP (OMP), which avoids deleterious interactions from iteration to iteration by working in the orthogonal complement of all previously selected columns. Notable analyses of OMP are those in [19, 61], where sufficient conditions are given for OMP to recover the sparsity pattern of the solution of (2.1). Analyses of OMP for random problem ensembles are given in [26, 64].

A second approach is to replace ‖ · ‖0 in (2.1) with its relaxation [7, 57]:

\arg\min_{g \in \mathbb{C}^N} \left\{ \tfrac{1}{2} \| d - F g \|_2^2 + \lambda \| g \|_1 \right\}, \quad \text{given } d \in \mathbb{C}^M \text{ and } F \in \mathbb{C}^{M \times N}.
(2.2)

This is a convex optimization and thus may be solved efficiently [3]. Certain conditions on F guarantee proximity of the solutions to (2.1) and (2.2) [18, 19, 63]. Analyses of random problem ensembles are given in [53, 67].

Note that (2.2) applies an ℓ1 norm to g, but an ℓp norm (where p < 1) may also be used to promote sparsity [31, 7]; this leads to a nonconvex problem and will not be considered in this paper. A problem of the form (2.2) may arise as a proxy for (2.1) or as the inherent problem of interest. For example, in a Bayesian estimation setting, (2.2) yields the maximum a posteriori probability estimate of g from d when the observation model involves F and Gaussian noise and the prior on g is Laplacian. Similar statements can be made about the relaxations in the following sections.

2.2. Single-system multiple-output (SSMO) simultaneous sparse approximation

In SSMO, each of P observation vectors dp is approximated by a product Fgp, where the gp's are simultaneously sparse. This yields the problem

\arg\min_{G \in \mathbb{C}^{N \times P}} \left\{ \tfrac{1}{2} \| D - F G \|_F^2 + \lambda \| G \|_{0,2} \right\}, \quad \text{given } D \in \mathbb{C}^{M \times P} \text{ and } F \in \mathbb{C}^{M \times N},
(2.3)

where D = [d1, …, dP], G = [g1, …, gP], ‖ · ‖F is the Frobenius norm, and ‖ · ‖0,2 is the number of rows with nonzero ℓ2 norm (i.e., the simultaneous sparsity level).1 This reduces to (2.1) when P = 1 and is thus also generally computationally intractable.

Greedy algorithms for approximating (2.3) were first developed in [11, 65]. Deterministic conditions for sparsity pattern recovery and average case analyses for greedy algorithms are both presented in [32].

Analogous to the relaxation (2.2) is

\arg\min_{G \in \mathbb{C}^{N \times P}} \left\{ \tfrac{1}{2} \| D - F G \|_F^2 + \lambda \| G \|_S \right\}, \quad \text{given } D \in \mathbb{C}^{M \times P} \text{ and } F \in \mathbb{C}^{M \times N},
(2.4)

where

\| G \|_S = \| G \|_{1,2} = \sum_{n=1}^{N} \sqrt{ \sum_{p=1}^{P} | G(n,p) |^2 } = \sum_{n=1}^{N} \sqrt{ \sum_{p=1}^{P} | g_p[n] |^2 };
(2.5)

i.e., ‖G‖S is the ℓ1 norm of the ℓ2 norms of the rows of the G matrix.2 This latter operator is a simultaneous sparsity norm: it penalizes the program (produces large values) when the columns of G have dissimilar sparsity patterns [43]. Fixing λ to a sufficiently large value and solving this optimization yields simultaneously sparse gp's.

Given the convex objective function in (2.4), one may then attempt to find a solution that minimizes the objective using an IRLS-based [11] or SOCP-based [43] approach. A formal analysis of the minimization of the convex objective may be found in [62]. Convex relaxations for this problem are also studied in [25, 28]. A related approach that may incorporate additional prior information is given in [68], and a boosting strategy that may be combined with either a greedy algorithm or a convex relaxation is presented in [45] and analyzed further in [66]. Also applicable to both greedy algorithms and convex relaxation are results in [8] that are analogous to the principal results of [19, 61, 63].

3. Multiple-system single-output (MSSO) simultaneous sparse approximation

We outline the MSSO problem in a style analogous to that of SSMO systems in (2.3), (2.4) and then pose a second formulation that is useful for deriving several algorithms in section 4.

3.1. MSSO problem formulation

Consider the following system:

d = F_1 g_1 + \cdots + F_P g_P = [F_1, \ldots, F_P] \begin{bmatrix} g_1 \\ \vdots \\ g_P \end{bmatrix} = F_{\text{tot}} g_{\text{tot}},
(3.1)

where d ∈ ℂM and the Fp ∈ ℂM×N are known. Unlike the SSMO problem, there is now only one observation and P different system matrices. Here again we desire an approximate solution where the gp's are simultaneously sparse, formally,

\min_{g_1, \ldots, g_P} \| d - F_{\text{tot}} g_{\text{tot}} \|_2 \quad \text{subject to (s.t.) the simultaneous } K\text{-sparsity of the } g_p\text{'s},
(3.2)

or an equivalent unconstrained formulation. There are no constraints on the values of M, N, or P; i.e., there is no explicit requirement that the system be overdetermined or underdetermined.

In the first half of section 4, we will pose three approaches that attempt to solve the MSSO problem (3.2) in a greedy fashion. Another approach is to apply a relaxation similar to (2.2), (2.4):

\min_{G} \left\{ \tfrac{1}{2} \| d - F_{\text{tot}} g_{\text{tot}} \|_2^2 + \lambda \| G \|_S \right\},
(3.3)

where G and ‖G‖S are the same as in (2.4) and (2.5), respectively. In the second half of section 4, we will outline four algorithms for solving this relaxed problem.

3.2. Alternate formulation of the MSSO problem

The MSSO problem can be expressed in an equivalent form using new variables that are simply permutations of the Fp's and gp's. First we define N new matrices:

C_n = [f_{1,n}, \ldots, f_{P,n}] \in \mathbb{C}^{M \times P} \quad \text{for } n = 1, \ldots, N,
(3.4)

where fp,n is the nth column of Fp. Next we construct N new vectors:

h_n = [g_1[n], \ldots, g_P[n]]^T \in \mathbb{C}^{P} \quad \text{for } n = 1, \ldots, N,
(3.5)

where gp[n] is the nth element of gp and T is the transpose operation. Using the Cn's and hn's, we have another way to write d:

d = C_1 h_1 + \cdots + C_N h_N = [C_1, \ldots, C_N] \begin{bmatrix} h_1 \\ \vdots \\ h_N \end{bmatrix} = C_{\text{tot}} h_{\text{tot}}.
(3.6)

The relationship between the gp's and hn's implies that if we desire to find a set of simultaneously sparse gp's to solve (3.1), we should seek out a set of hn's where many of the hn's equal an all-zeros vector, 0. This claim is apparent if we consider the fact that H = [h1, …, hN] is equal to the transpose of G. This formulation of MSSO has recently been termed block-sparsity [24, 55] because the nonzero entries of htot come in contiguous blocks.
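As a concrete illustration of the permutation in (3.4)–(3.6) (a sketch in our own notation; the helper name is ours), the Cn's gather the nth columns of the Fp's and the hn's gather the nth entries of the gp's, i.e., the rows of G:

```python
import numpy as np

def to_block_form(F_list, G):
    """F_list: length-P list of (M, N) matrices F_p; G: (N, P) array whose columns are the g_p's.
    Returns the C_n's of (3.4) and the h_n's of (3.5)."""
    P = len(F_list)
    N = G.shape[0]
    C = [np.column_stack([F_list[p][:, n] for p in range(P)]) for n in range(N)]  # each C_n is M x P
    h = [G[n, :].copy() for n in range(N)]                                        # each h_n has length P
    # Sanity check of (3.6): the two factorizations of d agree
    d1 = sum(F_list[p] @ G[:, p] for p in range(P))
    d2 = sum(C[n] @ h[n] for n in range(N))
    assert np.allclose(d1, d2)
    return C, h
```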

Continuing with this alternate formulation, and given our desire to find a solution where most of the hn's are all-zeros vectors, we relax the problem as follows:

\min_{h_{\text{tot}}} \left\{ \tfrac{1}{2} \| d - C_{\text{tot}} h_{\text{tot}} \|_2^2 + \lambda \sum_{n=1}^{N} \| h_n \|_2 \right\}.
(3.7)

The equivalence of Σn=1N ‖hn‖2 and ‖G‖S means that (3.7) is equivalent to (3.3), and thus, just as in (3.3), the approach in (3.7) finds a set of simultaneously sparse gp's.

3.3. Differences between the SSMO and MSSO problems

In the SSMO problem, the ratio of unknowns to knowns is N/M regardless of the number of observations, P. Increasing P has little effect on the residual error, but it is beneficial for estimation of a “best” or “correct” sparsity pattern because the simultaneous sparsity of the underlying gp's becomes more exploitable. Empirical evidence of improved sparsity pattern recovery with increasing P may be found, for example, in [11, 43], and related theoretical results on random problem ensembles are given in [47, 48].

In contrast, the ratio of unknowns to knowns is not constant with respect to P in the MSSO problem; rather it is equal to PN/M. As P is increased, it becomes easier to achieve a small residual, but sparsity pattern recovery becomes harder. This is supported experimentally in section 5.

3.4. Previous results on MSSO problems

After the initial submission of this manuscript, algorithms similar to our OMP variation in section 4.2 were introduced as block OMP (BOMP) [24] and parallel OMP (POMP) [41]. (POMP is more general, and three variations are given in [41].) The speed of these algorithms over general-purpose convex optimization is demonstrated in [41], and theoretical results analogous to those for ordinary sparse approximation from [19, 61] are given in [24]. Also, an empirical study of iterative shrinkage approaches to solving mixed-norm formulations like (3.3) is given in [38].

Another recent work is a random-ensemble analysis of high-dimensional block-sparse problems in [55]. This work uses the convex relaxation of (3.3) and studies problem size scalings at which a correct sparsity pattern can be recovered in the undersampled (M < NP), noiseless (d = Ctothtot for some block-sparse htot) case.

4. Proposed algorithms

We now present algorithms for (approximately) solving the MSSO problem defined in section 3. Some algorithms are described in more detail, with pseudocode, in [70, 72].

4.1. Matching pursuit (MP)

The classic MP technique first finds the column of the system matrix that best matches with the observed vector and then removes from the observation vector the projection of this chosen column. It proceeds to select a second column of the system matrix that best matches with the residual observation, and continues doing so until either K columns have been chosen (as specified by the user) or the residual observation ends up as a vector of all zeros.

Now let us consider the MSSO system as posed in (3.6). In the MSSO context, we need to seek out which of the N Cn matrices can be best used to represent d when the columns of Cn undergo an arbitrarily weighted linear combination. The key difference here is that on an iteration-by-iteration basis, we are no longer deciding which column vector best represents the observation, but which matrix does so. For the kth iteration of the algorithm, we need to select the proper index n ∈ {1, …, N} by solving

q_k = \arg\min_{n} \min_{h_n} \| r_{k-1} - C_n h_n \|_2^2,
(4.1)

where qk is the index that will be selected and rk−1 is the current residual observation. For fixed n, the solution to the inner minimization is obtained via the pseudoinverse, hnopt = Cn†rk−1, yielding

q_k = \arg\min_{n} \| r_{k-1} - C_n ( C_n^{\dagger} r_{k-1} ) \|_2^2 = \arg\max_{n} \; r_{k-1}^H C_n C_n^{\dagger} r_{k-1},
(4.2)

where H is the Hermitian transpose. From (4.2) we see that, analogously to standard MP, choosing the best index for iteration k involves computing and ranking a series of inner product–like quadratic terms.
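A minimal sketch of the selection rule (4.2) follows (our own code, not the paper's implementation; np.linalg.pinv stands in for the pseudoinverse Cn†):

```python
import numpy as np

def mp_select(C_list, r):
    """Return the index n maximizing r^H C_n C_n^+ r, as in (4.2)."""
    scores = []
    for C in C_list:
        proj = C @ (np.linalg.pinv(C) @ r)           # projection of r onto range(C_n)
        scores.append(np.real(np.vdot(r, proj)))     # r^H C_n C_n^+ r (real and nonnegative)
    return int(np.argmax(scores))
```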

4.2. Orthogonal matching pursuit (OMP)

In single-vector MP, the residual rk is orthogonal to the kth column of the system matrix, but as the algorithm continues iterating, there is no guarantee that the residual remains orthogonal to column k or is minimized in the least squares sense with respect to the entire set of k chosen column vectors (indexed by q1, …, qk). Furthermore, K iterations of single-vector MP do not guarantee that K different columns will be selected. Single-vector OMP is an extension to MP that attempts to mitigate these problems by improving the calculation of the residual vector. During the kth iteration of single-vector OMP, column qk is chosen exactly as in MP (by ranking the inner products of the residual vector rk−1 with the various column vectors), but the residual vector is updated by accounting for all columns chosen up through iteration k rather than simply the last one [46, 10].

To extend OMP to the MSSO problem, we choose matrix qk during iteration k as in MSSO MP and then, in the spirit of single-vector OMP, compute the new residual as follows:

r_k = d - S_k ( S_k^{\dagger} d ),
(4.3)

where Sk = [Cq1, …, Cqk] and Sk†d is the choice of x that minimizes the residual error ‖d − Skx‖2. That is, to update the residual we now employ all chosen matrices, weighting and combining them to best represent d in the least squares sense, yielding an rk that is orthogonal to the columns of Sk (and thus orthogonal to Cq1, …, Cqk), which has the benefit of ensuring that OMP will select a new Cn matrix at each step.
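A sketch of the residual update (4.3), assuming chosen_C holds Cq1, …, Cqk (names ours; the least squares solve plays the role of Sk†d):

```python
import numpy as np

def omp_residual(d, chosen_C):
    """Recompute the residual against all matrices chosen so far, per (4.3)."""
    S = np.concatenate(chosen_C, axis=1)            # S_k = [C_q1, ..., C_qk], size M x Pk
    x, *_ = np.linalg.lstsq(S, d, rcond=None)       # x = S_k^+ d
    return d - S @ x                                # orthogonal to the columns of S_k
```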

4.3. Least squares matching pursuit (LSMP)

Beyond OMP there exists a greedy algorithm with greater computational complexity known as LSMP. In single-vector LSMP, one solves N − k + 1 least squares minimizations during iteration k to determine which column of the system matrix may be used to best represent d [10].

To extend LSMP to MSSO systems, we must ensure that during iteration k we account for the k − 1 previously chosen Cn matrices when choosing the kth one to best construct an approximation to d. Specifically, index qk is selected as follows:

q_k = \arg\min_{n \in \{1, \ldots, N\}, \; n \notin I_{k-1}} \; \min_{x} \| S_k^{(n)} x - d \|_2^2,
(4.4)

where Ik−1 is the set of indices chosen up through iteration k − 1, Sk(n) = [Sk−1, Cn], Sk−1 = [Cq1, …, Cqk−1], and x ∈ ℂPk. For fixed n, the solution of the inner minimization is xopt = (Sk(n))†d; it is this step that ensures the residual observation error will be minimized by using all chosen matrices. Substituting xopt into (4.4) and simplifying the expression yields

q_k = \arg\max_{n \notin I_{k-1}} \; d^H Q_k^{(n)} d,
(4.5)

where Qk(n) = Sk(n)(Sk(n))†.

Algorithm 1 describes the LSMP method. The complexity here is much greater than that of OMP because N − k + 1 pseudoinversions of an M × Pk matrix are required during each iteration k. Furthermore, the dependence of Qk(n) on both n and k makes precomputing all such matrices infeasible in most cases. One way to decrease computation and runtime might be to extend the projection-based recursive updating scheme of [10] to MSSO LSMP.

[Algorithm 1: LSMP for the MSSO problem; pseudocode rendered as an image in the original (nihms-184504-f0001).]

4.4. Iteratively reweighted least squares (IRLS)

Having posed three greedy approaches for solving the MSSO problem, we now turn our attention toward minimizing (3.7), the relaxed objective function. Here, the regularization parameter λ trades off simultaneous sparsity against residual observation error.

One way to minimize (3.7) is to use an IRLS-based approach [36]. To begin, consider manipulating the right-hand term of (3.7) as follows:

\lambda \sum_{n=1}^{N} \| h_n \|_2 = \lambda \sum_{n=1}^{N} \frac{\| h_n \|_2^2}{\| h_n \|_2} = \lambda \sum_{n=1}^{N} \frac{| h_n[1] |^2 + \cdots + | h_n[P] |^2}{\| h_n \|_2}
\approx \frac{\lambda}{2} \sum_{n=1}^{N} \left[ h_n[1]^{*}, \ldots, h_n[P]^{*} \right] \begin{bmatrix} \frac{2}{\| h_n \|_2 + \epsilon} & & \\ & \ddots & \\ & & \frac{2}{\| h_n \|_2 + \epsilon} \end{bmatrix} \begin{bmatrix} h_n[1] \\ \vdots \\ h_n[P] \end{bmatrix}
= \frac{\lambda}{2} \sum_{n=1}^{N} h_n^H W_n h_n = \frac{\lambda}{2} \left[ h_1^H, \ldots, h_N^H \right] \begin{bmatrix} W_1 & & \\ & \ddots & \\ & & W_N \end{bmatrix} \begin{bmatrix} h_1 \\ \vdots \\ h_N \end{bmatrix} = \frac{\lambda}{2} h_{\text{tot}}^H W_{\text{tot}} h_{\text{tot}},
(4.6)

where * is the complex conjugate of a scalar, Wn is a P × P real-valued diagonal matrix whose diagonal elements each equal 2/(‖hn‖2 + ε), and ε is some small nonnegative value introduced to mitigate poor conditioning of the Wn's. If we fix Wtot ∈ ℝPN×PN by computing it using some prior estimate of htot, then the right-hand term of (3.7) becomes a quadratic function and (3.7) transforms into a Tikhonov optimization [58, 59]:

\min_{h_{\text{tot}}} \left\{ \tfrac{1}{2} \| d - C_{\text{tot}} h_{\text{tot}} \|_2^2 + \tfrac{\lambda}{2} h_{\text{tot}}^H W_{\text{tot}} h_{\text{tot}} \right\}.
(4.7)

Finally, by performing a change of variables and exploiting the properties of Wtot, we can convert (4.7) into an expression that may be minimized using the robust and reliable conjugate-gradient (CG) least squares solver LSQR [50, 49], so named because it applies a QR decomposition [30] when solving the system in the least squares sense. LSQR works better in practice than several other CG methods [2] because it restructures the input system via the Lanczos process [40] and applies a Golub–Kahan bidiagonalization [29] prior to solving it.

To apply LSQR to this problem, we first construct Wtot1/2 as the element-by-element square root of the diagonal matrix Wtot and then take its inverse to obtain Wtot−1/2. Defining q = Wtot1/2htot and A = CtotWtot−1/2, (4.7) becomes

\min_{q} \left\{ \| d - A q \|_2^2 + \lambda \| q \|_2^2 \right\}.
(4.8)

LSQR is formulated to solve the exact problem in (4.8). Calling LSQR with d, λ, and A yields qopt, and the solution htotopt is backed out via Wtot−1/2qopt.

Algorithm 2 outlines how one may iteratively apply (4.8) to attempt to find a solution that minimizes the original cost function, (3.7). The technique iterates until the objective function decreases by less than δ or the maximum number of iterations, K, is exceeded. The initial solution estimate is obtained via pseudoinversion of Ctot (an all-zeros initialization would cause Wtot to be poorly conditioned). A line search is used to step between the prior solution estimate and the upcoming one; this improves the rate of convergence and ensures that the objective decreases at each step. This method is global in that all PN unknowns are being estimated concurrently per iteration.

[Algorithm 2: IRLS-based minimization of (3.7); pseudocode rendered as an image in the original (nihms-184504-f0002).]
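The sketch below (our own simplification, not the paper's code) captures the core of Algorithm 2 for fixed λ: it rebuilds Wtot from the current iterate and re-solves the Tikhonov problem (4.7). For brevity it solves the stationarity (normal) equations of (4.7) directly rather than calling LSQR on (4.8), and it omits the line search and stopping tests described above.

```python
import numpy as np

def irls_msso(C_tot, d, N, P, lam, eps=1e-6, iters=50):
    """C_tot: (M, P*N) concatenation [C_1, ..., C_N]; returns an (N, P) array whose row n is h_n."""
    h = np.linalg.pinv(C_tot) @ d                          # nonzero initialization, as in Algorithm 2
    for _ in range(iters):
        norms = np.linalg.norm(h.reshape(N, P), axis=1)    # ||h_n||_2 for each n
        w = np.repeat(2.0 / (norms + eps), P)              # diagonal of W_tot, per the text
        # Stationarity of (4.7) for fixed W_tot:  (C^H C + lam * W_tot) h = C^H d
        A = C_tot.conj().T @ C_tot + lam * np.diag(w)
        h = np.linalg.solve(A, C_tot.conj().T @ d)
    return h.reshape(N, P)
```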

4.5. Row-by-row shrinkage (RBRS)

The proposed IRLS technique solves for all PN unknowns during each iteration, but this is cumbersome when PN is large. An alternative approach is to apply an inner loop that fixes n and then iteratively tunes hn while holding the other hm's (m ≠ n) constant; thus only P (rather than PN) unknowns need to be solved for during each inner iteration.

This idea inspires the RBRS algorithm. The term “row-by-row” is used because each hn corresponds to row n of the G matrix in (3.3), and “shrinkage” is used because the ℓ2 energy of most of the hn's will essentially be “shrunk” (to some extent) during each inner iteration: when λ is sufficiently large and many iterations are undertaken, many hn's will be close to all-zeros vectors.

4.5.1. RBRS for real-valued data

Assume that d and the Cn's of (3.7) are real-valued. We seek to minimize the function by extending the single-vector sequential shrinkage technique of [21] and making modifications to (3.7). Assume that we have prior estimates of h1 through hN, and that we now desire to update only the jth vector while keeping the other N − 1 fixed. The shrinkage update of hj is achieved via

\min_{x} \left\{ \tfrac{1}{2} \left\| \left[ \sum_{n=1}^{N} C_n h_n - C_j h_j \right] + C_j x - d \right\|_2^2 + \lambda \| x \|_2 \right\},
(4.9)

where Σn=1N Cnhn − Cjhj forms an approximation of d using the prior solution coefficients but discards the component contributed by the original jth vector, replacing it with an updated solution vector, x. It is crucial to note that the right-hand term does not promote the element-by-element sparsity of x; rather, it penalizes the overall ℓ2 energy of x, and thus both sparse and dense x's are penalized equally if their overall ℓ2 energies are the same.

One way to solve (4.9) is to take its derivative with respect to xT and find x such that the derivative equals 0. By doing this and shuffling terms, and assuming we have an initial estimate of x, we may solve for x iteratively:

x_i = \left[ C_j^T C_j + \frac{\lambda}{\| x_{i-1} \|_2 + \epsilon} I \right]^{-1} C_j^T r_j,
(4.10)

where rj = d + Cjhj − Σn=1N Cnhn, I is a P × P identity matrix, and ε is a small value that avoids ill-conditioned results.3 By iterating on (4.10) until (4.9) changes by less than δinner, we arrive at a solution to (4.9), xopt, and this then replaces the prior solution vector, hj. Having completed the update of the jth vector, we proceed to update the rest of the vectors, looping this outer process K times or until the main objective function, (3.7), changes by less than δouter. Algorithm 3 details the entire procedure; unlike IRLS, here we essentially repeatedly invert P × P matrices to pursue a row-by-row solution, rather than PN × PN matrices to pursue a solution that updates all rows per iteration.

[Algorithm 3: RBRS; pseudocode rendered as an image in the original (nihms-184504-f0003).]
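A sketch of a single RBRS row update for real-valued data (our code, not the paper's; a fixed inner iteration count stands in for the δinner test):

```python
import numpy as np

def rbrs_row_update(C_list, h_list, d, j, lam, eps=1e-8, inner_iters=20):
    """Update h_j by iterating (4.10) while the other rows are held fixed."""
    C_j = C_list[j]
    r_j = d + C_j @ h_list[j] - sum(C @ h for C, h in zip(C_list, h_list))   # r_j of (4.10)
    x = h_list[j].copy()
    P = C_j.shape[1]
    for _ in range(inner_iters):
        shrink = lam / (np.linalg.norm(x) + eps)
        x = np.linalg.solve(C_j.T @ C_j + shrink * np.eye(P), C_j.T @ r_j)
    return x    # becomes the new h_j
```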

4.5.2. Extending RBRS to complex-valued data

If (3.7) contains complex-valued terms, we may structure the row-by-row updates as in (4.9), but because the derivative of the objective function in (4.9) is more complicated due to the presence of complex-valued terms, the simple update equation given in (4.10) is no longer applicable. One way to overcome this problem is to turn the complex-valued problem into a real-valued one.

Let us create several real-valued expanded vectors,

\tilde{d} = \begin{bmatrix} \operatorname{Re}(d) \\ \operatorname{Im}(d) \end{bmatrix} \in \mathbb{R}^{2M}, \qquad \tilde{h}_n = \begin{bmatrix} \operatorname{Re}(h_n) \\ \operatorname{Im}(h_n) \end{bmatrix} \in \mathbb{R}^{2P},
(4.11)

as well as real-valued expanded matrices,

\tilde{C}_n = \begin{bmatrix} \operatorname{Re}(C_n) & -\operatorname{Im}(C_n) \\ \operatorname{Im}(C_n) & \operatorname{Re}(C_n) \end{bmatrix} \in \mathbb{R}^{2M \times 2P}.
(4.12)

Due to the structure of (4.11), (4.12) and the fact that ‖h̃n‖2 equals ‖hn‖2, the following optimization is equivalent to (3.7):

\min_{\tilde{h}_1, \ldots, \tilde{h}_N} \left\{ \tfrac{1}{2} \left\| \tilde{d} - \sum_{n=1}^{N} \tilde{C}_n \tilde{h}_n \right\|_2^2 + \lambda \sum_{n=1}^{N} \| \tilde{h}_n \|_2 \right\}.
(4.13)

This means that we may apply RBRS to complex-valued scenarios by substituting the h̃n's for the hn's and the C̃n's for the Cn's in (4.9), (4.10), and Algorithm 3. (Equation (4.10) becomes an applicable update equation because (4.9) will consist of only real-valued terms and the derivative calculated earlier is again applicable.) Finally, after running the algorithm to obtain finalized h̃n's, we may simply restructure them into complex hn's.
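A sketch of the embedding (4.11)–(4.12) and its inverse (our code; the sign pattern of the C̃n blocks follows the usual real representation of complex matrix–vector multiplication and is an assumption of this reconstruction):

```python
import numpy as np

def realify(d, C_list):
    """Map complex d and C_n's to the real-valued d~ and C~_n's of (4.11)-(4.12)."""
    d_t = np.concatenate([d.real, d.imag])
    C_t = [np.block([[C.real, -C.imag],
                     [C.imag,  C.real]]) for C in C_list]
    return d_t, C_t

def unrealify(h_t):
    """Restructure a real 2P-vector h~_n back into the complex P-vector h_n."""
    P = h_t.size // 2
    return h_t[:P] + 1j * h_t[P:]
```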

In embedding the complex-valued problem into a larger real-valued problem, we have demonstrated a fact about simultaneous sparse approximation that seems to not have been remarked upon previously: by setting P = 1, we see that seeking a single sparse complex-valued vector is equivalent to seeking two simultaneously sparse real-valued vectors.

4.6. Column-by-column shrinkage (CBCS)

We have also developed a dual of RBRS—a technique that sequentially updates the columns of G (i.e., the gp's) in (3.1), (3.3) rather than its rows (the hn's). This approach yields a separable optimization and reduces the overall problem to simply repeated element-by-element shrinkages of each gp. For a detailed derivation and pseudocode, see [70, 72].

4.7. Second-order cone programming (SOCP)

We now propose a seventh and final algorithm for solving the MSSO problem as given in (3.3). We branch away from the shrinkage approaches that operate on individual columns or rows of the G matrix and again seek to concurrently estimate all PN unknowns. Rather than using an IRLS technique, however, we pursue an SOCP approach, motivated by the fact that second-order cone programs may be solved via efficient interior point algorithms [56, 60] and are able to encapsulate conic, convex-quadratic [1], and linear constraints. (Quadratic programming is not an option because the gp's, Fp's, and d may be complex.)

Second-order conic constraints are of the form a = [a1, …, aN]T such that

\| [a_1, \ldots, a_{N-1}]^T \|_2 \leq a_N.
(4.14)

The generic format of an SOC program is

\min_{x} \; c^T x \quad \text{s.t.} \quad A x = b \;\; \text{and} \;\; x \in \mathcal{K},
(4.15)

where 𝒦 = ℝ+N × ℒ1 × ⋯ × ℒN, ℝ+N is the N-dimensional positive orthant cone, and the ℒn's are second-order cones [1]. To convert (3.3) into the second-order cone format, we first write

\min_{G} \left\{ \tfrac{1}{2} s + \lambda \mathbf{1}^T t \right\} \quad \text{s.t.} \quad z = d - F_{\text{tot}} g_{\text{tot}}, \quad \| z \|_2^2 \leq s, \quad \text{and} \quad \left\| \left[ \operatorname{Re}(g_1[n]), \operatorname{Im}(g_1[n]), \ldots, \operatorname{Re}(g_P[n]), \operatorname{Im}(g_P[n]) \right]^T \right\|_2 \leq t_n,
(4.16)

where n ∈ {1, …, N} and t = [t1, …, tN]T. The splitting of the complex elements of the gp's mimics the approach used when extending CBCS to complex data, and (4.16) makes the objective function linear, as required. Finally, in order to represent the ‖z‖2² ≤ s inequality in terms of second-order cones, an additional step is needed. Given that s = (1/4)(s + 1)² − (1/4)(s − 1)², the inequality may be rewritten as zHz + (1/4)(s − 1)² ≤ (1/4)(s + 1)² and then expressed as a conic constraint: ‖[zT, (1/2)(s − 1)]T‖2 ≤ (1/2)(s + 1) [1, 42]. Applying these changes yields

\min \left\{ \tfrac{1}{2} s + \lambda \mathbf{1}^T t \right\} \quad \text{s.t.} \quad z = d - F_{\text{tot}} g_{\text{tot}}, \quad \| [z^T, u]^T \|_2 \leq v, \quad u = \tfrac{s - 1}{2}, \quad v = \tfrac{s + 1}{2}, \quad s \geq 0, \quad \text{and} \quad \left\| \left[ \operatorname{Re}(g_1[n]), \operatorname{Im}(g_1[n]), \ldots, \operatorname{Re}(g_P[n]), \operatorname{Im}(g_P[n]) \right]^T \right\|_2 \leq t_n,
(4.17)

which is a fully defined second-order cone program that may be implemented and solved numerically. There is no algorithm pseudocode for this technique because having set up the variables in (4.17), one may simply plug them into an SOCP solver. In this paper we implement (4.17) in SeDuMi (Self-Dual-Minimization) [56], a free software package consisting of MATLAB and C routines.
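As an aside, a modern conic modeling layer can accept the relaxation (3.3) directly and dispatch it to an interior point or first-order solver, which avoids building (4.17) by hand. The sketch below uses cvxpy (not used in the paper; it assumes a cvxpy version with complex-variable support and an installed conic solver), and all function and variable names are ours:

```python
import numpy as np
import cvxpy as cp

def solve_msso_relaxation(F_list, d, lam):
    """Minimize 0.5*||d - sum_p F_p g_p||_2^2 + lam * sum_n ||row n of G||_2, cf. (3.3)."""
    N = F_list[0].shape[1]
    P = len(F_list)
    G = cp.Variable((N, P), complex=True)                      # columns are the g_p's
    residual = d - sum(F_list[p] @ G[:, p] for p in range(P))  # d - F_tot g_tot
    simsparse = sum(cp.norm(G[n, :], 2) for n in range(N))     # ||G||_S, the l1 norm of row l2 norms
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(residual) + lam * simsparse))
    prob.solve()
    return G.value
```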

5. Experiments and results

Our motivation for solving MSSO sparse approximation problems comes from MRI RF excitation pulse design. Before turning to this problem in section 5.3, we present several synthetic experiments. These experiments allow comparisons among algorithms and also empirically reveal some properties of the relaxation (3.3). Theoretical exploration of this relaxation is also merited but is beyond the scope of this manuscript. One work in this area is [55].

Experiments were performed on a Linux server with a 3.0-GHz Intel Pentium IV processor. The system has 16 GB of random access memory, ample to ensure that none of the algorithms requires the use of virtual memory and to avoid excessive hard drive paging. MP, LSMP, IRLS, RBRS, and CBCS are implemented in MATLAB, whereas SOCP is implemented in SeDuMi. The runtimes could be reduced significantly by implementing the methods in a completely compiled format such as C. Note: OMP is not evaluated because its performance always falls in between that of MP and LSMP.

5.1. Sparsity pattern estimation in a noiseless setting

5.1.1. Overview

We now evaluate how well the algorithms of section 4 estimate sparsity patterns when the underlying gp's are each strictly and simultaneously K-sparse and the observation d of (3.1) is not corrupted by noise. This caricatures a high signal-to-noise ratio (SNR) source localization scenario, where the sparsity pattern indicates locations of emitters and our goal is to find the locations of these emitters [34, 39, 42, 43].

We synthetically generate real-valued sets of Fp's and gp's in (3.1), apply the algorithms, and record the fraction of correct sparsity pattern entries recovered by each. We vary M in (3.1) to see how performance at solving the MSSO problem varies when the Fp's are underdetermined vs. overdetermined and also vary P to see how rapidly performance degrades as more system matrices and vectors are employed.

5.1.2. Details

For all trials, we fix N = 30 in (3.1) and K = 3, which means each gp vector consists of 30 elements, three of which are nonzero. We consider P ∈ {1, 2, …, 8} and M ∈ {10, 15, …, 40}. For each of the 56 fixed (M, P) pairs, we create 50 random instances of (3.1) over which to report the average performance. Each of the 2,800 instances is constructed with the sparsity pattern chosen uniformly at random and the nonzero entries of the gp's drawn from the standard normal N(0, 1) distribution. The columns of the Fp's are chosen independently from the uniform distribution on the ℝM unit sphere.

MP and LSMP are applied by iterating until K elements are chosen or the residual approximation is 0. If fewer than K terms are chosen, this hurts the recovery score.

For IRLS, RBRS, CBCS, and SOCP, we approximate a λ oracle as follows: loop over roughly 70 λ's in [0, 2], running the given algorithm each time. For each of the estimated ĝtot's (that vary with λ), estimate a sparsity pattern by noting the largest ℓ2-energy rows of the associated Ĝ matrix.4 Remember the highest fraction recovered across all λ's.
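The scoring used here can be sketched as follows (our implementation; the true support and the estimated Ĝ are assumed given):

```python
import numpy as np

def support_recovery_fraction(G_hat, true_support, K):
    """Fraction of the true K-element support found among the K largest-energy rows of G_hat."""
    row_energy = np.linalg.norm(G_hat, axis=1)       # l2 energy of each row (each h_n)
    est_support = np.argsort(row_energy)[-K:]        # indices of the K most energetic rows
    return len(set(est_support) & set(true_support)) / K
```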

5.1.3. Results

Each subplot of Figure 5.1 depicts the average fraction of recovered sparsity pattern elements vs. the number of knowns, M, for a fixed value of P, revealing how performance varies as the Fp ∈ ℝM×N matrices go from being underdetermined to overdetermined.

Fig. 5.1
Sparsity pattern estimation in a noiseless setting

Recovery trends

As the number of knowns M increases, recovery rates improve substantially, which is sensible. For large M and small P, the six algorithms behave similarly, consistently achieving nearly 100% recovery. For large P and moderate M, however, sparsity pattern recovery rates are dismal—as P increases, the underlying simultaneous sparsity of the gp's is not enough to combat the increasing number of unknowns, PN. As M is decreased and especially when P is increased, the performance of the greedy techniques falls off relative to that of IRLS, RBRS, CBCS, and SOCP, showing that the convex relaxation approach itself is a sensible way to approximately solve the formally NP-hard combinatorial MSSO simultaneous sparsity problem. Furthermore, the behavior of the convex algorithms relative to the greedy ones coincides with the studies of greedy vs. convex programming sparse approximation methods in single-vector [7, 10] and SSMO contexts [11]. LSMP tends to perform slightly better than MP because it solves a least squares minimization and explicitly considers earlier chosen rows whenever it seeks to choose another row of G.

Convergence

Across most trials, IRLS, RBRS, CBCS, and SOCP converge rapidly and do not exceed the maximum limit of 500 outer iterations. The exception is CBCS when M is small and P = 8: here, the objective function frequently fails to decrease by less than the specified δ = 10−5.

Runtimes

For several fixed (M, P) pairs, Table 5.1 lists the average runtimes of each algorithm across the 50 trials associated with each pair. For IRLS, RBRS, CBCS, and SOCP, runtimes are also averaged over the many λ runs. Among the convex minimization methods, SOCP seems superior given its fast runtimes in three out of four cases. Peak memory usage is not tracked here because it is difficult to do so when using MATLAB for such small problems; it will be tracked during the third experiment where the system matrices are vastly larger and differences in memory usage across the six algorithms are readily apparent.

Table 5.1
Average algorithm runtimes for noiseless sparsity pattern estimation For several fixed (M, P) pairs, each algorithm's average runtime over the corresponding 50 trials is given in units of milliseconds; N = 30 and K = 3 for all trials (runtimes of the ...

Some details on how IRLS, RBRS, CBCS, and SOCP differ in the objective functions that they achieve are provided in [70, 72].

5.2. Sparsity pattern estimation in the presence of noise

5.2.1. Overview

We now modify the scenario of section 5.1 to include additive white Gaussian noise in the observation d. The simultaneous sparsity level K and SNR (in dB) are varied across sets of Monte Carlo trials. The variance of each component of the noise is related to the SNR as follows:

\sigma^2 = \frac{1}{M} \| d_{\text{true}} \|_2^2 \, 10^{-\mathrm{SNR}/10}.
(5.1)

This noise measure is analogous to that of [11].
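Concretely, a noisy observation can be synthesized as in the sketch below (our code; real-valued data, matching the experiments in this section):

```python
import numpy as np

def add_noise(d_true, snr_db, rng=None):
    """Add white Gaussian noise whose per-component variance follows (5.1)."""
    rng = np.random.default_rng() if rng is None else rng
    M = d_true.size
    sigma2 = (np.linalg.norm(d_true) ** 2 / M) * 10.0 ** (-snr_db / 10.0)
    return d_true + np.sqrt(sigma2) * rng.standard_normal(M)
```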

5.2.2. Details

We fix N = 30, M = 25, and P = 3, and consider SNR ∈ {−10, −5, 0, …, 25, 30} and K ∈ {1, 3, 5, 7, 9}. For each fixed (SNR, K) pair, we generate 100 random instances over which to report the average performance. The gp's and Fp's are generated as in section 5.1.2. The algorithms are applied as before, with the exception that IRLS, RBRS, CBCS, and SOCP use a fixed λ as described below.

Control parameter selection

Before running the overall experiment, we generate three noisy observations for each (SNR, K) pair. We then apply IRLS, RBRS, CBCS, and SOCP, tuning the control parameter λ by hand until finding a single value that produces reasonable solutions. All algorithms then use this fixed λ for all 100 trials with the (SNR, K) pair under consideration. Thus, in contrast to the noiseless experiment, we no longer assume an ideal λ is known for each denoising trial.

5.2.3. Results

Each subplot of Figure 5.2 depicts the average fraction of recovered sparsity pattern elements vs. SNR for a fixed K, revealing how well the six algorithms are able to recover the sparsity pattern amid noise.

Fig. 5.2
Sparsity pattern estimation in the presence of noise

Recovery trends

When K = 1, we see from the upper-left subplot of Figure 5.2 that all algorithms have essentially equal performance for each SNR. Recovery rates improve substantially with increasing SNR, which is sensible. For each algorithm, we see across the subplots that performance generally decreases with increasing K; this too is as expected. For low SNRs, e.g., −5 dB, all algorithms tend to perform similarly, but the greedy algorithms perform increasingly worse than the others as K goes from moderate to large and SNR surpasses zero dB. In general, MP performs worse than LSMP, and LSMP in turn performs worse than IRLS, SOCP, RBRS, and CBCS, while the latter four methods exhibit essentially the same performance across all SNRs and K's. Overall, Figure 5.2 shows that convex programming algorithms are superior to greedy methods when estimating sparsity patterns in noisy situations; this coincides with data collected in the noiseless experiment in section 5.1, as well as the empirical findings of [10, 11].

Convergence

CBCS typically requires more iterations than the other techniques in order to converge. At times, it fails to converge to within the specified δ = 10−5, similarly to how it behaves during the noiseless experiment of section 5.1.

Runtimes

Across all denoising trials, MP, LSMP, IRLS, RBRS, CBCS, and SOCP have average runtimes of 3.1, 25.1, 57.2, 247.0, 148.5, and 49.2 milliseconds. It seems that SOCP is best for denoising given that it is the fastest algorithm among the four methods that outperform the greedy ones. IRLS is nearly as fast as SOCP and thus is a close second choice for sparsity pattern estimation.

This experiment is extended with a discussion of mean-squared error of the estimates in [70, 72].

5.3. MRI RF excitation pulse design

5.3.1. Overview

For the final experiment we study how well the six algorithms design MRI RF excitation pulses. In the interest of space and because the conversion of the physical problem into an MSSO format involves MRI physics and requires significant background, we only briefly outline how the system matrices arise and why simultaneously sparse solutions are necessary. A complete formulation of the problem for engineers and mathematicians is given in [70, 71]; MRI pulse designers may refer to [74]. We limit our evaluation here to fidelity of the designed excitations. Related papers provide results from a real system for head-shaped water phantom imaging [74] and in vivo human brain imaging [73].

5.3.2. Formulation

Consider an MRI experiment in which thin slices are desired in a spatial dimension defined to be the z direction. Thin-slice imaging is a dominant use of RF excitation in clinical MRI. For the purposes of this paper, the design of an MRI RF excitation pulse reduces to the following problem: assume that we are given M points in the two-dimensional (2-D) (x, y) spatial domain, r1, …, rM, along with N points in a 2-D “Fourier-like” domain, k1, …, kN. Each rm equals [xm, ym]T, a point in space, while each kn equals [kx,n, ky,n]T, a point in the Fourier-like domain, referred to as “k-space.” The rm's and kn's are in units of centimeters (cm) and inverse centimeters (cm−1), respectively. The kn's are Nyquist-spaced relative to the sampling of the rm's and may be visualized as a 2-D grid located at low kx and ky frequencies (where “kx” denotes the frequency domain axis that corresponds to the spatial x axis). Under a small-tip-angle approximation, energy placed at one or more points in k-space produces a pattern in the spatial domain; this pattern is related to the k-space energy via a “Fourier-like” transform [52]. Assume that we place an arbitrary complex weight gn ∈ ℂ (i.e., both a magnitude and phase) at each of the N locations in k-space. Let us represent these weights using a vector g = [g1, …, gN]T ∈ ℂN. In an ideal setting (i.e., using the small-tip-angle approximation [52] as mentioned above and neglecting coil transmission profiles), the following holds:

m=Ag,
(5.2)

where A ∈ ℂM×N is a known dense Fourier matrix5 and the mth element of m ∈ ℂM is the image that arises at rm, denoted m(rm), due to the energy deposition along the N points on the k-space grid as described by the weights in the g vector. The energy deposition in the kz dimension is along sinc-like pulse segments that do not enter our design process because the z dependence and (x, y) dependence of the desired excitation are separable. A formulation that considers three dimensions more generally is presented in [70, sect. 5.2].

The goal now is to form a desired (possibly complex-valued) spatial-domain image d(r) at the given set of spatial-domain coordinates (the rm's) by placing energy at some of the given kn locations while obeying a special constraint on how the energy is deposited. To produce the spatial-domain image, we will use a “P-channel MRI parallel excitation system” [37, 54]—each of the system's P channels is able to deposit energy of varying magnitudes and phases at the k-space locations and is able to influence the resulting spatial-domain pattern m(r) to some extent. Each channel p has a known “profile” across space, Sp(r) ∈ ℂ, that describes how the channel influences the magnitude and phase of the resulting image at different spatial locations. For example, if S3(r5) = 0, then the third channel is unable to influence the image that arises at location r5, regardless of how much energy it deposits along k1, …, kN. The special constraint mentioned above is that the system's channels may visit only a small number of points in k-space—they may deposit energy only at K ≪ N locations.

We now finalize the formulation: first, we construct P diagonal matrices Sp ∈ ℂM×M such that Sp(m, m) = Sp(rm), m = 1, …, M. Now we assume that each channel deposits arbitrary energies at each of the N points in k-space and describe the weighting of the k-space grid by the pth channel with the vector gp ∈ ℂN. Based on the physics of the P-channel parallel excitation system, the overall image m(r) that forms is the superposition of the profile-scaled subimages produced by each channel:

m = S_1 A g_1 + \cdots + S_P A g_P = F_1 g_1 + \cdots + F_P g_P = F_{\text{tot}} g_{\text{tot}},
(5.3)

where m = [m(r1), …, m(rM)]T. Essentially, (5.3) refines the model of (5.2) by incorporating multiple excitation channels and accounting for coil transmission profiles Sp(r), p = 1, …, P, that are not necessarily constant across r.
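Under these definitions, the MSSO system matrices can be assembled as in the sketch below (our illustration, not the paper's code; the exponent's sign, the 2π convention, and the omitted physical constants of the Fourier-like kernel are assumptions here):

```python
import numpy as np

def fourier_like_matrix(r, k):
    """r: (M, 2) spatial samples [cm]; k: (N, 2) k-space samples [1/cm].
    Entries proportional to exp(i 2*pi * r_m . k_n); physical constants omitted."""
    return np.exp(2j * np.pi * (r @ k.T))

def build_system_matrices(A, profiles):
    """profiles: (P, M) array with profiles[p, m] = S_p(r_m). Returns F_p = S_p A per channel."""
    return [np.diag(S_p) @ A for S_p in profiles]
```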

Recalling that our overall goal is to deposit energy in k-space to produce the image d(r), and given the special constraint that we may deposit energy only among a small subset of the N points in k-space, we arrive at the following problem:

\min_{g_1, \ldots, g_P} \| d - m \|_2 \quad \text{s.t. the simultaneous } K\text{-sparsity of the } g_p\text{'s},
(5.4)

where d = [d(r1), …, d(rM)]T ∈ ℂM and m is given by (5.3). That is, we seek out K < N locations in k-space at which to deposit energy to produce an image m(r) that is close in the ℓ2 sense to the desired image d(r). Strictly and simultaneously K-sparse gp's are the only valid solutions to the problem.

One sees that (5.4) is precisely the MSSO system given in (3.2), and thus the algorithms posed in section 4 are applicable to the pulse design problem. In order to apply the convex minimization techniques (IRLS, SOCP, RBRS, and CBCS) to this problem, the only additional step needed is to retune any given solution estimate ĝtot (alg, λ) into a strictly and simultaneously K-sparse set of vectors.

Aside

An alternative approach to deciding where to place energy at K locations in k-space is to compute the Fourier transform of d(r) and decide to place energy at (kx, ky) frequencies where the Fourier coefficients are largest in magnitude [69]. This method does yield valid K-sparse energy placement patterns, but empirically it is always outperformed by the convex minimization approaches [73, 71, 74]; thus we do not delve into the Fourier-based method in this paper.

5.3.3. Experimental setup

Using an eight-channel system (i.e., P = 8) whose profile magnitudes (the Sp(r)'s) are depicted in Figure 5.3, we will design pulses to produce the desired image shown in the left subplot of Figure 5.4. We sample the spatial (x, y) domain at M = 356 locations within the region where at least one of the profiles in Figure 5.3 is active; this region of interest is referred to as the field of excitation (FOX) in MRI literature.6 The spatial samples are spaced by 0.8 cm along each axis, and the FOX has a diameter of roughly 20 cm. Given our choice of r1, …, r356, we sample the Sp(r)'s and d(r) and construct the Sp's and d. Next, we define a grid of N = 225 points in (kx, ky)-space that is 15 × 15 in size and extends outward from the k-space origin. The points are spaced by 1/20 cm−1 along each k-space axis, and the overall grid is shown in the right subplot of Figure 5.4. Finally, because we know the 356 rm's and 225 kn's, we construct the 356 × 225 matrix A in (5.2), (5.3) along with the Fp's in (5.3). We now have all the data we need to apply the algorithms and determine simultaneously K-sparse gp's that (approximately) solve (5.4).

Fig. 5.3
Profile magnitudes of the eight-channel parallel excitation MRI system
Fig. 5.4
Desired image and k-space grid

We apply the algorithms and evaluate designs where the use of K ∈ {1, …, 30} candidate points in k-space is permitted (in practical MRI scenarios, K up to 30 is permissible). Typically, the smallest K that produces a version of d(r) to within some ℓ2 fidelity is the design that the MRI pulse designer will use on a real system.

To obtain simultaneously K-sparse solutions with MP and LSMP, we set K = 30, run each algorithm once, remember the ordered list of chosen indices, and back out every solution for K = 1 through K = 30.

For each convex minimization method (IRLS, SOCP, RBRS, and CBCS), we apply the following procedure: first, we run the algorithm for 14 values of λ ∈ [0, 1/4], storing each resulting solution ĝtot(alg, λ). Then, for fixed K, to determine a simultaneously K-sparse deposition of energy on the k-space grid, we apply the retuning process to each of the 14 solutions, obtaining 14 strictly simultaneously K-sparse retuned sets of solution vectors, ĝtot(K)(alg, λ). The one solution among the 14 that best minimizes the ℓ2 error between the desired and resulting images, ‖d − Ftotĝtot(K)(alg, λ)‖2, is chosen as the solution for the K under consideration. Essentially, we know a good value for λ when applying each of the convex minimization methods. To attempt to avoid convergence problems, RBRS and CBCS are permitted 5,000 and 10,000 maximum outer iterations, respectively (see below).

5.3.4. Results

Image ℓ2 error vs. number of energy depositions in k-space

The left subplot of Figure 5.5 shows the ℓ2 error vs. K curves for each algorithm. As K is increased, each method produces images with lower ℓ2 error, which is sensible: depositing energy at more locations in k-space gives each technique more degrees of freedom with which to form the image. In contrast to the sparsity pattern estimation experiments in sections 5.1 and 5.2, however, here LSMP is the best algorithm: for each fixed K considered in Figure 5.5, the LSMP technique yields a simultaneously K-sparse energy deposition that produces a higher-fidelity image than all other techniques. For example, when K = 17, LSMP yields an image with ℓ2 error of 3.3. In order to produce an image with equal or better fidelity, IRLS, RBRS, and SOCP need to deposit energy at K = 21 points in k-space and thus produce less useful designs from an MRI perspective. CBCS fares the worst, needing to deposit energy at K = 25 grid points in order to compete with the fidelity of LSMP's K = 17 image.

Fig. 5.5
MRI pulse design results

Closer look: Objective function vs. λ

The right subplot of Figure 5.5 shows how well the four convex minimization algorithms minimize the objective function (3.3), (3.7) before retuning any solutions and enforcing strict simultaneous K-sparsity. For each fixed λ, SOCP and IRLS find solutions that yield the same objective function value. RBRS's solutions generally yield objective function values equal to those of SOCP and IRLS, but at times lead to higher values: in these cases RBRS almost converges but does not do so completely. Finally, for many λ's, CBCS fails to converge and yields extremely large objective function values. Some additional detail on convergence behavior is provided in [70, 72].

Runtimes and peak memory usage

Setting K = 30, we run MP and LSMP and record the runtime of each. Across the 14 λ's, the IRLS, RBRS, CBCS, and SOCP runtimes are recorded and averaged. The peak memory usage of each algorithm is also noted; these statistics are presented in Table 5.2. In distinct contrast to the smaller-variable-size experiments in sections 5.1 and 5.2 where all four convex minimization methods have relatively short runtimes (under one second), here RBRS and CBCS are much slower, leaving IRLS and SOCP as the only feasible techniques among the four. Furthermore, while LSMP does indeed outperform IRLS and SOCP on an ℓ2 error basis (as shown in Figure 5.5), the runtime statistics here show that LSMP requires order-of-magnitude greater runtime to solve the problem—therefore, in some real-life scenarios where designing pulses in less than 5 minutes is a necessity, IRLS and SOCP are superior. Finally, in contrast to section 5.1's runtimes given in Table 5.1, here IRLS is 1.9 times faster than SOCP and uses 1.4 times less peak memory, making it the superior technique for MRI pulse design since IRLS's ℓ2 error performance and ability to minimize the objective function (3.3), (3.7) essentially equal those of SOCP.

Table 5.2
Algorithm runtimes and peak memory usage for MRI pulse design Each algorithm's runtime and peak memory usage are listed. The runtimes of the latter four algorithms are averaged over the 14 λ's per trial. MP is again faster than the other techniques, ...

Closer look: Images and chosen k-space locations forK = 17

To conclude the experiment, we fix K = 17 and observe the images produced by the algorithms along with the points at which each algorithm chooses to deposit energy along the grid of candidate points in (kx,ky)-space. Figure 5.6 illustrates the images (in both magnitude and phase) that arise due to each algorithm's simultaneously 17-sparse set of solution vectors,7 while Figure 5.7 depicts the placement pattern chosen by each method. From Figure 5.6, we see that each algorithm forms a high-fidelity version of the desired image d(r) given in the left subplot of Figure 5.4, but among the images, LSMP's most accurately represents d(r) (e.g., consider the sharp edges of the LSMP image's rectangular box). MP's and CBCS's images are noticeably fuzzy relative to the others. The placements in Figure 5.7 give insight into these performance differences. Here, LSMP places energy at several higher frequencies along the ky and kx axes, which ensures the resulting rectangle is narrow with sharp edges along the spatial y and x axes. In contrast, CBCS fails to place energy at moderate-to-high (kx,ky)-space frequencies and thus cannot produce a rectangle with desirable sharp edges, while MP branches out to some extent but fails to utilize high ky frequencies. IRLS, RBRS, and SOCP branch out to higher ky frequencies but not to high kx frequencies, and thus their associated rectangles in Figure 5.6 are sharp along the y axis but exhibit less distinct transitions (more fuzziness) along the spatial x axis. In general, each algorithm has determined 17 locations at which to place energy that yield a fairly good image, and each has avoided the computationally impossible scenario of searching over all N-choose-K (225-choose-17) possible placements.

Fig. 5.6
MRI pulse design: Images per algorithm for K = 17
Fig. 5.7
MRI pulse design: Energy deposition patterns per algorithm for K = 17

6. Discussion

6.1. MRI pulse design vs. denoising and source localization

The MRI pulse design problem in section 5.3 differs substantially from the source localization problem in section 5.1, the denoising experiment in section 5.2, and other routine applications of sparse approximation (e.g., [16, 7, 27, 22, 10, 11, 43]). It differs not only in purpose but in numerical properties, the latter of which are summarized in Table 6.1. While this list will not always hold true on an application-by-application basis, it does highlight general differences between the two problem classes.

Table 6.1
Unique trends of the MRI pulse design problem This table highlights differences between the MRI problem and standard denoising and source localization applications. Items here will not always be true, instead providing general highlights about each problem ...

6.2. Merits of row-by-row and column-by-column shrinkage

Even though LSMP, IRLS, and SOCP tend to exhibit superior performance across different experiments in this manuscript, RBRS and CBCS are worthwhile because, unlike the former methods that update all PN unknowns concurrently, the shrinkage techniques update only a subset of the total variables during each iteration.

For example, RBRS as given in Algorithm 3 updates only P unknowns at once, while CBCS updates but a single scalar at a time [70, 72]. RBRS and CBCS are thus applicable in scenarios where P and N are exceedingly large and tuning all PN parameters during each iteration is not possible. If storing and handling M × PN or PN × PN matrices exceeds a system's available memory and causes disk thrashing, RBRS and CBCS, though they require far more iterations, might still be better options than LSMP, IRLS, and SOCP in terms of runtime.

6.3. Future work

6.3.1. Efficient automated control parameter selection

A fast technique for finding ideal values of λ is an open problem. It might be useful to investigate several approaches to automated selection such as the “L-curve” method [33], universal parameter selection [20], and min-max parameter selection [35].

6.3.2. Runtime, memory, and complexity reduction

As noted in section 4, LSMP's computation and runtime could be improved upon by extending the projection-based recursive updating schemes of [10, 11] to MSSO LSMP. In regard to the MRI design problem, runtime for any method might be reduced via a multiresolution approach (as in [43]) by first running the algorithm with a coarse k-space frequency grid, noting which energy deposition locations are revealed, and then running the algorithm with a grid that is finely sampled around the locations suggested by the coarse result. This is faster than providing the algorithm a large, finely sampled grid and attempting to solve the sparse energy deposition task in one step. An initial investigation has shown that reducing the maximum frequency extent of the k-space grid (and thus the number of grid points, N) may also reduce runtime without significantly impacting image fidelity [74].

6.3.3. Shrinkage algorithm convergence improvements

When solving the MRI pulse design problem, both RBRS and CBCS required excessive iterations and hence exhibited lengthy runtimes. To mitigate these problems, one may consider extending parallel coordinate descent (PCD) shrinkage techniques used for SSSO sparse approximation (as in [21, 22]). Sequential subspace optimization (SESOP) [23] might also be employed to reduce runtime. Combining PCD with SESOP and adding a line search after each iteration would yield sophisticated versions of RBRS and CBCS.

6.3.4. Multiple-system multiple-output (MSMO) simultaneous sparse approximation

It may be useful to consider a combined problem where there are multiple observations as well as multiple system matrices. That is, assume we have a series of J observations, d_1, …, d_J, each of which arises from a set of P simultaneously K-sparse unknown vectors g_{1,j}, …, g_{P,j} (see footnote 8) passing through a set of P system matrices F_{1,j}, …, F_{P,j} and then undergoing linear combination, as follows:

d_j = F_{1,j} g_{1,j} + \cdots + F_{P,j} g_{P,j} = \sum_{p=1}^{P} F_{p,j} g_{p,j}, \qquad j = 1, \ldots, J.
(6.1)

If each F_{p,j} is constant across the J observations (i.e., F_{p,j} = F_p), then the problem reduces to

d_j = F_1 g_{1,j} + \cdots + F_P g_{P,j} = F_{\mathrm{tot}} g_{\mathrm{tot},j}, \qquad j = 1, \ldots, J,
(6.2)

and we may stack the matrices and terms as follows:

[d_1, \ldots, d_J] = F_{\mathrm{tot}} [g_{\mathrm{tot},1}, \ldots, g_{\mathrm{tot},J}].
(6.3)

Having posed (6.1)–(6.3), one may formulate optimization problems similar to (2.4) and (3.3) to determine simultaneously sparse g_{p,j}'s that solve (6.3). Algorithms for solving such problems may arise by combining the concepts of SSMO algorithms [11, 43, 65, 62] with those of the MSSO algorithms posed earlier.
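
As a quick numerical check of the reduction from (6.2) to (6.3), the snippet below verifies that horizontally stacking the channel matrices and vertically stacking the unknowns reproduces every observation; the dimensions and random data are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
M, N, P, J = 8, 5, 3, 4

# One set of system matrices shared by all J observations, as in (6.2)
F = [rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)) for _ in range(P)]
# G[p][:, j] plays the role of g_{p,j}
G = [rng.standard_normal((N, J)) + 1j * rng.standard_normal((N, J)) for _ in range(P)]

# Columns of D are the observations d_j = sum_p F_p g_{p,j}
D = sum(F[p] @ G[p] for p in range(P))

# Stacked form of (6.3): F_tot = [F_1 ... F_P], column j of G_tot is g_{tot,j}
F_tot = np.hstack(F)          # M x (N*P)
G_tot = np.vstack(G)          # (N*P) x J
assert np.allclose(D, F_tot @ G_tot)
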

7. Conclusion

We defined the linear inverse MSSO simultaneous sparsity problem where simultaneously sparse sets of unknown vectors are required as the solution. This problem differed from prior problems involving multiple unknown vectors because, in this case, each unknown vector was passed through a different system matrix and the outputs of the various matrices underwent linear combination, yielding only one observation vector.

To solve the proposed MSSO problem, we formulated three greedy techniques (MP, OMP, and LSMP) along with algorithms based on IRLS, iterative shrinkage, and SOCP methodologies. The MSSO algorithms were evaluated across noiseless and noisy sparsity pattern estimation experiments as well as an MRI pulse design experiment; for sparsity pattern recovery, algorithms that minimized the relaxed convex objective function outperformed the greedy methods, whereas in the MRI pulse design experiment, greedy LSMP exhibited superior performance.

When deriving RBRS for complex-valued data, we showed that seeking a single sparse complex-valued vector is equivalent to seeking two simultaneously sparse real-valued vectors: we mapped single-vector sparse approximation of a complex vector to the MSSO problem, increasing the applicability of algorithms that solve the latter.
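
One compact way to express this mapping (the notation below is ours and only sketches the idea; it is not quoted from the manuscript's derivation): write F = F_R + iF_I and g = g_R + ig_I with real-valued F_R, F_I, g_R, and g_I. Then

\begin{aligned}
Fg &= (F_R g_R - F_I g_I) + i\,(F_I g_R + F_R g_I), \\
\begin{bmatrix} \operatorname{Re}(d) \\ \operatorname{Im}(d) \end{bmatrix}
&\approx
\begin{bmatrix} F_R \\ F_I \end{bmatrix} g_R
+ \begin{bmatrix} -F_I \\ F_R \end{bmatrix} g_I,
\end{aligned}

which is an MSSO instance with P = 2 real-valued system matrices. Entry n of g is nonzero exactly when row n of [g_R  g_I] is nonzero, so the sparsity of g coincides with the simultaneous sparsity of g_R and g_I.
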

Overall, while improvements upon these seven algorithms (and entirely new algorithms) surely exist, this manuscript has laid the groundwork for the MSSO problem and conducted an initial exploration of algorithms with which to solve it.

Acknowledgments

The authors thank D. M. Malioutov for assistance with the derivation step that permitted the transition from (4.16) to (4.17) in section 4.7, as well as K. Setsompop, B. A. Gagoski, V. Alagappan, and L. L. Wald for collecting the experimental coil profile data in Figure 5.3. The authors also gratefully acknowledge discussions on related work with A. C. Gilbert, R. Maleh, and D. Yoon.

Footnotes

*This material is based upon work supported by the National Institutes of Health under grants 1P41RR14075, 1R01EB000790, 1R01EB006847, and 1R01EB007942; the National Science Foundation under CAREER Grant 0643836; United States Department of Defense National Defense Science and Engineering Graduate Fellowship F49620-02-C-0041; the MIND Institute; the A. A. Martinos Center for Biomedical Imaging; Siemens Medical Solutions; and R. J. Shillman's Career Development Award.

1The choice of ℓ2 here is arbitrary; it can be replaced by any vector norm. We write || · ||0,2 because the relaxation we use subsequently is then naturally || · ||1,2.

2An ℓp norm with p < 1 could be used in place of the ℓ1 norm if one is willing to deal with a nonconvex objective function. Further, an ℓq norm (with q > 2) rather than an ℓ2 norm could be applied to each row of G because the purpose of the row operation is to collapse the elements of the row into a scalar value without introducing a sparsifying effect.

3Equation (4.10) consists of a direct inversion of a P × P matrix, which is acceptable in this paper because all experiments involve P ≤ 10. If P is large, (4.10) could be solved via a CG technique (e.g., LSQR).

4This could be described as thresholding the output of the original computation.

5Formally, A(m, n) = jγe^{i r_m · k_n}, where j = √−1 and γ is a known lumped gain constant.

6Sampling points outside of the FOX where no profile has influence is unnecessary because an image can never be formed at these points no matter how much energy any given channel places throughout k-space.

7Each image is generated by taking the corresponding solution gtot, computing m in (5.3), unstacking the elements of m into m(r), and then displaying the magnitude and phase of m(r).

8The K-term simultaneous sparsity pattern of each set of g_{p,j}'s may or may not change with j.

REFERENCES

1. Ben-Tal A, Nemirovski A. MPS/SIAM Ser. Optim. 2. SIAM; Philadelphia: 2001. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications.
2. Bjorck A, Elfving T. Technical report LiTH-MAT-R-1978-5. Department of Mathematics, Linköping University; Linköping, Sweden: 1978. Accelerated Projection Methods for Computing Pseudoinverse Solutions of Systems of Linear Equations.
3. Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge, UK: 2004.
4. Bruckstein AM, Donoho DL, Elad M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 2009;51:34–81.
5. Candès EJ, Tao T. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory. 2006;52:5406–5425.
6. Chen S, Billings SA, Luo W. Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control. 1989;50:1873–1896.
7. Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 1998;20:33–61.
8. Chen J, Huo X. Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Process. 2006;54:4634–4643.
9. Chen S, Wigger J. Fast orthogonal least squares algorithm for efficient subset model selection. IEEE Trans. Signal Process. 1995;43:1713–1715.
10. Cotter SF, Adler J, Rao BD, Kreutz-Delgado K. Forward sequential algorithms for best basis selection. IEE Proc. Vision Image Signal Process. 1999;146:235–244.
11. Cotter SF, Rao BD, Engan K, Kreutz-Delgado K. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans. Signal Process. 2005;53:2477–2488.
12. Daubechies I, Defrise M, De Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 2004;57:1413–1457.
13. Davis G. Adaptive Nonlinear Approximations. New York University; New York: 1994. Ph.D. thesis.
14. Davis G, Mallat S, Zhang Z. Adaptive time-frequency decomposition. Opt. Eng. 1994;33:2183–2191.
15. Donoho DL. Superresolution via sparsity constraints. SIAM J. Math. Anal. 1992;23:1309–1331.
16. Donoho DL. De-noising by soft-thresholding. IEEE Trans. Inform. Theory. 1995;41:613–627.
17. Donoho DL. Compressed sensing. IEEE Trans. Inform. Theory. 2006;52:1289–1306.
18. Donoho DL. For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 2006;59:907–934.
19. Donoho DL, Elad M, Temlyakov VN. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory. 2006;52:6–18.
20. Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.
21. Elad M. Why simple shrinkage is still relevant for redundant representations? IEEE Trans. Inform. Theory. 2006;52:5559–5569.
22. Elad M, Matalon B, Zibulevsky M. Image denoising with shrinkage and redundant representations; Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR); 2006. pp. 1924–1931.
23. Elad M, Matalon B, Zibulevsky M. Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization. Appl. Comput. Harmon. Anal. 2007;23:346–367.
24. Eldar YC, Bölcskei H. Block-sparsity: Coherence and efficient recovery. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing; Taipei, Taiwan. 2009. pp. 2885–2888.
25. Eldar YC, Rauhut H. Average Case Analysis of Multichannel Sparse Recovery Using Convex Relaxation. 2009. available online from http://www.arxiv.org/abs/0904.0494v1.
26. Fletcher AK, Rangan S. Orthogonal matching pursuit from noisy random measurements: A new analysis. Proceedings of the Neural Information Processing Systems Conference; Vancouver, Canada. 2009.
27. Fletcher AK, Rangan S, Goyal VK, Ramchandran K. Denoising by sparse approximation: Error bounds based on rate-distortion theory. EURASIP J. Appl. Signal Process. 2006;2006:26318.
28. Fornasier M, Rauhut H. Recovery algorithms for vector-valued data with joint sparsity constraints. SIAM J. Numer. Anal. 2008;46:577–613.
29. Golub G, Kahan W. Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. 1965;2:205–224.
30. Golub GH, Van Loan CF. Matrix Computations. Johns Hopkins University Press; Baltimore: 1983.
31. Gorodnitsky IF, Rao BD. Sparse signal reconstruction from limited data using FOCUSS: A recursive weighted norm minimization algorithm. IEEE Trans. Signal Process. 1997;45:600–616.
32. Gribonval R, Rauhut H, Schnass K, Vandergheynst P. Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms. J. Fourier Anal. Appl. 2008;14:655–687.
33. Hansen PC. Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numer. Algorithms. 1994;6:1–35.
34. Johnson DH, Dudgeon DE. Array Signal Processing: Concepts and Techniques. Prentice–Hall; Englewood Cliffs, NJ: 1993.
35. Johnstone IM. On minimax estimation of a sparse normal mean vector. Ann. Statist. 1994;22:271–289.
36. Karlovitz LA. Construction of nearest points in the Lp, p even, and L∞ norms. I. J. Approx. Theory. 1970;3:123–127.
37. Katscher U, Bornert P, Leussler C, van den Brink JS. Transmit SENSE. Magn. Reson. Med. 2003;49:144–150. [PubMed]
38. Kowalski M, Torrésani B. SPARS'09—Signal Processing with Adaptive Sparse Structured Representations. Saint-Malo, France: 2009. Structured sparsity: From mixed norms to structured shrinkage.
39. Krim H, Viberg M. Two decades of array signal processing research. The parametric approach. IEEE Signal Process. Mag. 1996;13:67–94.
40. Lanczos C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Research Nat. Bur. Standards. 1950;45:255–282.
41. Maleh R, Yoon D, Gilbert AC. SPARS'09—Signal Processing with Adaptive Sparse Structured Representations. Saint-Malo, France: 2009. Fast algorithm for sparse signal approximation using multiple additive dictionaries.
42. Malioutov DM. A Sparse Signal Reconstruction Perspective for Source Localization with Sensor Arrays. Massachusetts Institute of Technology; Cambridge, MA: 2003. Master's thesis.
43. Malioutov DM, Çetin M, Willsky AS. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 2005;53:3010–3022.
44. Mallat SG, Zhang Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993;41:3397–3415.
45. Mishali M, Eldar YC. Reduce and boost: Recovering arbitrary sets of jointly sparse vectors. IEEE Trans. Signal Process. 2008;56:4692–4702.
46. Natarajan BK. Sparse approximate solutions to linear systems. SIAM J. Comput. 1995;24:227–234.
47. Negahban S, Wainwright MJ. Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1,∞-regularization. Proceedings of the Neural Information Processing Systems Conference; Vancouver, Canada. 2008.
48. Obozinksi G, Wainwright MJ, Jordan MI. High-dimensional support union recovery in multivariate regression. Proceedings of the Neural Information Processing Systems Conference; Vancouver, Canada. 2008.
49. Paige CC, Saunders MA. Algorithm 583. LSQR: Sparse linear equations and least squares problems. ACM Trans. Math. Software. 1982;8:195–209.
50. Paige CC, Saunders MA. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software. 1982;8:43–71.
51. Pati YC, Rezaiifar R, Krishnaprasad PS. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems, and Computers; Pacific Grove, CA. 1993. pp. 40–44.
52. Pauly JM, Nishimura D, Macovski A. A k-space analysis of small-tip-angle excitation. J. Magn. Reson. 1989;81:43–56.
53. Rangan S, Fletcher AK, Goyal VK. Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing. 2009. available online from http://www.arxiv.org/abs/0906.3234v1.
54. Setsompop K, Wald LL, Alagappan V, Gagoski BA, Hebrank F, Fontius U, Schmitt F, Adalsteinsson E. Parallel RF transmission with 8 channels at 3 Tesla. Magn. Reson. Med. 2006;56:1163–1171. [PubMed]
55. Stojnic M, Parvaresh F, Hassibi B. On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. Signal Process. 2009;57:3075–3085.
56. Sturm J. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 1999;11/12:625–653.
57. Tibshirani R. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Ser. B. 1996;58:267–288.
58. Tikhonov AN. Solution of incorrectly formulated problems and the regularization method. Dokl. Akad. Nauk SSSR. 1963;151:501–504. in Russian.
59. Tikhonov AN, Arsenin VA. Solution of Ill-Posed Problems. Winston & Sons; Washington, DC: 1977.
60. Toh KC, Todd MJ, Tutuncu RH. SDPT3—a MATLAB software package for semidefinite programming, version 1.3. Optim. Methods Softw. 1999;11/12:545–581.
61. Tropp JA. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory. 2004;50:2231–2242.
62. Tropp JA. Algorithms for simultaneous sparse approximation: Part II: Convex relaxation. Signal Process. 2006;86:589–602.
63. Tropp JA. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory. 2006;52:1030–1051.
64. Tropp JA, Gilbert AC. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inform. Theory. 2007;53:4655–4666.
65. Tropp JA, Gilbert AC, Strauss MJ. Algorithms for simultaneous sparse approximation: Part I: Greedy pursuit. Signal Process. 2006;86:572–588.
66. van den Berg E, Friedlander MP. Joint-Sparse Recovery from Multiple Measurements. 2009. available online from http://www.arxiv.org/abs/0904.2051v1.
67. Wainwright MJ. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory. 2009;55:2183–2202.
68. Wipf DP, Rao B. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process. 2007;55:3704–3716.
69. Yip CY, Fessler JA, Noll DC. Advanced 3D tailored RF pulse for signal recovery in T2-weighted functional magnetic resonance imaging. Magn. Reson. Med. 2006;56:1050–1059. [PubMed]
70. Zelinski AC. Improvements in Magnetic Resonance Imaging Excitation Pulse Design. Massachusetts Institute of Technology; Cambridge, MA: 2008. Ph.D. thesis.
71. Zelinski AC, Goyal VK, Adalsteinsson E, Wald LL. Sparsity in MRI RF excitation pulse design. Proceedings of the 42nd Annual Conference on Information Sciences and Systems; Princeton, NJ. 2008. pp. 252–257.
72. Zelinski AC, Goyal VK, Adalsteinsson E. Simultaneously Sparse Solutions to Linear Inverse Problems with Multiple System Matrices and a Single Observation Vector. 2009. available online from http://www.arxiv.org/abs/0907.2083v1. [PMC free article] [PubMed]
73. Zelinski AC, Wald LL, Setsompop K, Alagappan V, Gagoski BA, Goyal VK, Adalsteinsson E. Fast slice-selective radio-frequency excitation pulses for mitigating B1+ inhomogeneity in the human brain at 7 Tesla. Magn. Reson. Med. 2008;59:1355–1364. [PMC free article] [PubMed]
74. Zelinski AC, Wald LL, Setsompop K, Goyal VK, Adalsteinsson E. Sparsity-enforced slice-selective MRI RF excitation pulse design. IEEE Trans. Med. Imaging. 2008;27:1213–1229. [PMC free article] [PubMed]