Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3199372

Formats

Article sections

Authors

Related links

Magn Reson Imaging. Author manuscript; available in PMC 2011 November 1.

Published in final edited form as:

Published online 2011 September 9. doi: 10.1016/j.mri.2011.07.017

PMCID: PMC3199372

NIHMSID: NIHMS314821

Ze Wang^{*}

Department of Psychiatry, School of Medicine, Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104

The publisher's final edited version of this article is available at Magn Reson Imaging

See other articles in PMC that cite the published article.

Constrained independent component analysis (CICA) eliminates the order ambiguity of standard ICA by incorporating prior information into the learning process to sort the components intrinsically. However, the original CICA (OCICA) and its variants depend on a learning rate, which is not easy to be tuned for various applications. To solve this problem, two learning-rate free CICA algorithms were derived in this paper using the fixed-point learning concept. A complete stability analysis was provided for the proposed methods, which also made a correction to the stability analysis given to OCICA. Variations for adding constraints either to the components or the associated time courses were derived too. Using synthetic data, the proposed methods yielded a better stability and a better source separation quality in terms of higher SNR and smaller performance index (PI) than OCICA. For the artificially generated brain activations, the new CICAs demonstrated a better sensitivity/specificity performance than standard univariate general linear model (GLM) and standard ICA. OCICA showed a similar sensitivity/specficity gain but failed to converge for several times. Using fMRI data acquired with a well-characterized sensorimotor task, the proposed CICAs yielded better sensitivity than OCICA, standard ICA, and GLM in all the target functional regions in terms of either higher t-values or larger suprathreshold cluster extensions using the same significance threshold. In addition, they were more stable than OCICA and standard ICA for analyzing the sensorimotor fMRI data.

Independent component analysis (ICA) [17] is an advanced signal processing method designed to separate mutually independent components (ICs) from their observed mixtures without using any prior information of them. As a multivariate and data-driven method, ICA has gained popularity in many signal processing fields [17], including fMRI data analysis [4, 24, 30]. Without need of prior modeling for brain response, ICA bestows a large freedom to analyzing fMRI data with various kinds of tasks or even the null hypothesis data [2, 11]. However, standard ICA does not extract ICs with a fixed order [6], resulting in a problem of IC comparisons across different runs or across different subjects. Constrained ICA (CICA) [13, 20] provides a solution for this problem. In this approach, prior information contained in constraints is integrated into the source separation process, so that the decomposed ICs are sorted intrinsically. This idea was first proposed by Luo *et al*.[23] but was explicitly defined by Lu and Rajapakse who also provided a CICA algorithm based on a more advanced ICA method [20, 21]. For this reason, the conceptual framework of CICA proposed by Lu and Rajapakse will be employed in this paper, though a similar approach, the semi-blind ICA, has later been proposed by Calhoun *et al* [1].

One problem of the original CICA (OCICA) algorithm proposed by Lu and Rajapakse [20, 21] is that it uses a learning rate to control the updating gradient at each iteration. This learning-rate is not easy to tune for various applications, and a bad choice of it could completely destroy the algorithm convergence [3, 17]. For fMRI data analysis, the OCICA was directly adopted to extract temporally independent components using temporal constraints. As extracting spatially independent components represents another major interest of ICA-based fMRI data analysis [5, 24], CICA should be adapted to have these two capabilities, as well as applying either spatial or temporal constraints. Moreover, the stability analysis for CICA given in [21] was incomplete and was based on a non-established condition, so a complete stability analysis is still required to prove the convergence of CICA.

We have recently proposed a fixed-point CICA in a conference abstract [31], which does not need a learning rate. The purpose of this paper is to provide full derivations, comprehensive evaluations, and detailed convergence analysis for the fixed-point CICA algorithms. Method evaluations were conducted using synthetic 1D signal and 2D fMRI data and sensorimotor fMRI data.

Assuming the observed signal **x** = (*x*_{1}, *x*_{2}, …, *x _{n}*)

$$\mathcal{O}\left(\mathrm{y}\right)=-\sum _{i}\mathcal{J}\left({y}_{i}\right)$$

(1)

where $\mathcal{J}$(.) is the negentropy and can be approximated by [15]:

$$\mathcal{J}\left({\mathrm{y}}_{i}\right)\approx c{\left[E\left\{G\left({\mathrm{y}}_{i}\right)\right\}-E\left\{G\left(v\right)\right\}\right]}^{2}$$

(2)

where *c* is a positive constant, *G*(.) is a nonquadratic function, and ν is a Gaussian variable with a zero mean and a unit variance. This approximation was used in the OCICA and will be used in our proposed methods as well.

Due to the blindness to **s** and **A**, ICA is up to two ambiguities [17]: the scale ambiguity and the order ambiguity. Since signal shape is usually more informative than the amplitude in practice, the scale ambiguity is generally less problematic than the order ambiguity. To eliminate the order ambiguity, prior information was incorporated to constrain the directions of IC learning in CICA [20, 21]. Denote the constraint function by **q**(**y**) = (*q*_{1}(*y*_{1}), …, *q _{M}*(

$$\mathfrak{L}(\mathbf{W},\mu ,\mathbf{z},\lambda )=\mathcal{O}\left(\mathbf{y}\right)+{\mu}^{T}(\mathbf{q}\left(\mathbf{y}\right)+{\mathbf{z}}^{2})+\frac{1}{2}\gamma {\Vert \mathbf{q}\left(\mathbf{y}\right)+{\mathbf{z}}^{2}\Vert}^{2}+{\lambda}^{T}\mathbf{h}\left(\mathbf{y}\right)+\frac{1}{2}\gamma {\Vert \mathbf{h}\left(\mathbf{y}\right)\Vert}^{2}$$

(3)

where μ = (μ_{1}, …, μ_{M})^{T} and λ = (λ_{1}, …, λ_{N})^{T} are two sets of Lagrange multipliers, γ is the scalar penalty parameter, **z** = (*z*_{1}, …, *z _{i}*, …,

$${\mathfrak{L}}_{\mathbf{z}\ast}(\mathbf{W},\mu ,\lambda )=\mathcal{O}\left(\mathbf{y}\right)+{\mathfrak{L}}_{1\mathbf{z}\ast}(\mathbf{W},\mu )+{\lambda}^{T}\mathbf{h}\left(\mathbf{y}\right)+\frac{1}{2}\gamma {\Vert \mathbf{h}\left(\mathbf{y}\right)\Vert}^{2}$$

(4)

where ${\mathfrak{L}}_{1\mathbf{z}\ast}=\mathfrak{L}(\mathbf{W},\mu ,\mathbf{z}\ast )$ (see Eq. A.4). Without introducing any ambiguity, ${\mathfrak{L}}_{\mathbf{z}\ast}$ and $\mathfrak{L}$ will be used interchangeably hereafter in this paper, and so will be ${\mathfrak{L}}_{1\mathbf{z}\ast}$ and ${\mathfrak{L}}_{1}$. A Newton-like gradient method can be used to solve the optimization problem of CICA [20]:

$${\mathbf{W}}_{k+1}={\mathbf{W}}_{k}-\eta {\nabla}_{\mathbf{W}}\mathfrak{L}\u2215{\nabla}_{\mathbf{W}}{\nabla}_{\mathbf{W}}\mathfrak{L}$$

(5)

where _{W} and _{W}_{W} mean the first and second derivative respectively, η is the learning rate. This algorithm involves an inversion of the Hessian matrix, which is usually computationally intensive even when an approximation was used [20, 21]. Nevertheless, this matrix inversion can be avoided [19] upon using data prewhitening that is now a general preprocessing step in ICA [17]. We can then get ${\nabla}_{\mathbf{w}}{\mathfrak{L}}_{1}=[{\nabla}_{\mathbf{w}1}{\mathfrak{L}}_{1},\dots ,{\nabla}_{\mathbf{w}m}{\mathfrak{L}}_{1}]$ and ${\nabla}_{\mathbf{W}}{\nabla}_{\mathbf{W}}{\mathfrak{L}}_{1}=[{\nabla}_{\mathbf{w}1}{\nabla}_{\mathbf{w}1}{\mathfrak{L}}_{1},\dots ,{\nabla}_{\mathbf{w}m}{\nabla}_{\mathbf{w}m}{\mathfrak{L}}_{1}]$ with:

$${\nabla}_{\mathbf{w}p}{\mathfrak{L}}_{1}={\mu}_{p}{\nabla}_{{\mathbf{w}}_{p}}{q}_{p}\left({y}_{p}\right)={\mu}_{p}E\left\{\mathbf{x}{q}_{p}^{\prime}\left({y}_{p}\right)\right\}$$

(6a)

$${\nabla}_{{\mathbf{w}}_{p}}{\nabla}_{{\mathbf{w}}_{p}}{\mathfrak{L}}_{1}={\mu}_{p}E\left\{{\mathbf{x}}^{T}\mathbf{x}{q}_{p}^{\u2033}\left({y}_{p}\right)\right\}\approx {\mu}_{p}E\left\{{\mathbf{x}}^{T}\mathbf{x}\right\}E\left\{{q}_{p}^{\u2033}\left({y}_{p}\right)\right\}={\mu}_{p}E\left\{{q}_{p}^{\u2033}\left({y}_{p}\right)\right\}$$

(6b)

Since the maxima of $\mathcal{J}\left({\mathbf{y}}_{i}\right)$ is approached at certain optima of *E*{*G*(**y**_{i})}, the second item in Eq. 2 can be ignored during the process of pursuing the optimal **W** [16]. Also assuming that *G*(.) is second-order differentiable, we can get the first and second derivative of $\mathcal{O}$ wrt **W** as:

$${\nabla}_{\mathbf{W}}\mathcal{O}=-\widehat{c}{\nabla}_{\mathbf{W}}\left[E\left\{G\left(\mathbf{Wx}\right)\right\}\right]=-\widehat{c}E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}$$

(7a)

$${\nabla}_{\mathbf{W}}{\nabla}_{\mathbf{W}}\mathcal{O}=-\widehat{c}E\{{\mathbf{x}}^{T}\mathbf{x}G\u2033\left(\mathbf{Wx}\right)\}-2c{E}^{2}\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}$$

(7b)

$$\approx -\widehat{c}E\{G\u2033\left(\mathbf{Wx}\right)\}+2c{E}^{2}\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}$$

(7c)

where

$$\widehat{c}=2c[E\left\{G\left(\mathbf{Wx}\right)\right\}-E\left\{G\left(v\right)\right\}]$$

(8)

By inserting Eqs. 6 and 7 and the first and second derivatives of the last two items in Eq. 4 into Eq. 5, we can get:

$${\mathbf{W}}^{+}=\mathbf{W}-\eta \frac{\widehat{c}E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}-{\mu}^{T}E\{\mathbf{xq}\prime \left(\mathbf{y}\right)\}-2\lambda E\left\{{\mathbf{yx}}^{T}\right\}}{\widehat{c}E\{G\u2033\left(\mathbf{Wx}\right)\}+2c{E}^{2}\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}-{\mu}^{T}E\{\mathbf{q}\u2033\left(\mathbf{y}\right)\}-2\lambda}$$

(9)

followed by a normalization step:

$$\mathbf{W}={\mathbf{W}}^{+}\u2215\Vert {\mathbf{W}}^{+}\Vert $$

(10)

and an orthogonalization [16] to prevent different weight vector **w**_{i} from converging to the same optima [16], which could also reduce the accumulated residual errors in the sequential component extractions [34]:

$$\mathbf{W}={\left({\mathbf{WW}}^{T}\right)}^{-1\u22152}\mathbf{W}$$

(11)

The Lagrangian multipliers μ and λ are updated with an ascent iteration process [3]:

$${\mu}_{k+1}={\mu}_{k}+\gamma (\mathbf{q}\left(\mathbf{y}\right)+{(\mathbf{z}\ast )}^{2})=\mathrm{max}\{0,{\mu}_{k}+\gamma \mathbf{q}\left(\mathbf{y}\right)\}$$

(12a)

$${\lambda}_{k+1}={\lambda}_{k}+\gamma \mathbf{h}\left(\mathbf{y}\right)$$

(12b)

Eqs. 9, 10, 11, and 12 together constitute a new version of the OCICA method. Although this is a simplified and improved version, we will use it as the OCICA in this paper in order to get a fair comparison with the methods to be derived below, because all of these methods are dependent on the same data prewhitening and post-learning weight normalization processes.

As in [16], *c* in Eq. 2 can be set to be 1, and [*E*{*G*(**Wx**)} − *E*{*G*(ν)}] can be treated as a constant, so Eq. 7 can be simplified as:

$${\nabla}_{\mathbf{W}}\mathcal{O}\approx -\stackrel{\u2012}{c}E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}$$

(13a)

$${\nabla}_{\mathbf{W}}{\nabla}_{\mathbf{W}}\mathcal{O}\approx -\stackrel{\u2012}{c}E\{G\u2033\left(\mathbf{Wx}\right)\}$$

(13b)

where $\stackrel{\u2012}{c}=\mathrm{sign}(E\left\{G\left(\mathbf{Wx}\right)\right\}-E\left\{G\left(v\right)\right\})$. Therefore, the Newton-like CICA algorithm Eq. 5 can be further simplified as:

$${\mathbf{W}}^{+}=\mathbf{W}-\eta \frac{\stackrel{\u2012}{c}E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}-{\mu}^{T}E\{\mathbf{xq}\prime \left(\mathbf{y}\right)\}-2\lambda E\left\{{\mathbf{yx}}^{T}\right\}}{\stackrel{\u2012}{c}E\{G\u2033\left(\mathbf{Wx}\right)\}-{\mu}^{T}E\{\mathbf{q}\u2033\left(\mathbf{y}\right)\}-2\lambda}$$

(14)

As mentioned above, the separating matrix **W** is forced to be orthogonal and normalized after each iteration, so that **WW**^{T} = **I**. A simple Lagrangian function can be formed to minimize $\mathcal{O}$ with this constraint: $\mathfrak{L}=\mathcal{O}+{\lambda}^{T}({\mathbf{WW}}^{T}-\mathbf{I})$. Subsequently, the gradient $\Delta \mathbf{W}\propto {\nabla}_{\mathbf{W}}\mathfrak{L}={\nabla}_{\mathbf{W}}\mathcal{O}+2{\lambda}^{T}\mathbf{W}$. According to the Karesh-Kuhn-Tucker (KKT) condition [22, 29], at a stable point of this optimization problem, this gradient turns to be 0. Because λ is a scalar, this KKT condition indicates $\mathbf{W}\propto {\nabla}_{\mathbf{W}}\mathcal{O}$ and subsequently Δ**W** **W**. In other words, the updating gradient of **W** must fall in the same direction of **W** around the stable point, and they will only differ by a scaling factor. This is the basic idea of the fixed-point ICA learning algorithm proposed by Hyvärinen et al. [15, 16]. The same idea can be used here to derive a fixed-point CICA through replacing **W** by the calculated gradient at each iteration, and canceling the scaling factor out during the post normalization step (Eq. 10). Three approaches can be used to derive the fixed-point CICA algorithms.

The first approach is similar to that used to derive the FastICA in [15]. Since _{W}_{W}$\mathfrak{L}$ becomes diagonal due to the approximations used in Eq. 13b, multiplying **W** with _{W}_{W}$\mathfrak{L}$ will not affect **W** after normalization. Resultantly, a fixed-point CICA algorithm (CICA1) can be obtained by multiplying the right side of Eq. 15 with the denominator item in the 2nd item of Eq. 14 and then updating **W** with the multiplication result:

$${\mathbf{W}}^{+}=\stackrel{\u2012}{c}E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}-\stackrel{\u2012}{c}E\{G\u2033\left(\mathbf{Wx}\right)\}\mathbf{W}-{\mu}^{T}(E\{\mathbf{xq}\prime \left(\mathbf{y}\right)\}-E\{\mathbf{q}\u2033\left(y\right)\}\mathbf{W})$$

(15)

As compared to the simplified OCICA in Eq. 9, λ is canceled out in CICA1, so only one Lagrangian multiplier is required in CICA1. The complete CICA1 learning process consists of Eqs. 15, 10, 11, and 12a.

The second approach (CICA2) is to update **W** directly with the Newton-like learning gradient (the second item in Eq. 15) and dropping the diagonal denominator for the same reason as described above. The corresponding learning procedure, which is called CICA2 in this paper, is:

$${\mathbf{W}}^{+}=-\stackrel{\u2012}{c}E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}+{\mu}^{T}E\{\mathbf{xq}\prime \left(\mathbf{y}\right)\}$$

(16)

The third approach [31] is to remove the last item in the right side of Eq. 4. That item is usually added to enhance the local convexity so that the optimization problem is guaranteed to converge [3], but can be ignored here because of the additional normalization step involved in ICA. As we will discuss later, this simplification-based CICA will still converge to an optimum point. After removing the last term, the Lagrangian object function becomes:

$${\mathfrak{L}}_{\mathbf{z}\ast}\left(W,\mu \right)=\mathcal{O}\left(\mathbf{y}\right)+{\mathfrak{L}}_{1}(\mathbf{W},\mu )+{\lambda}^{T}({\mathbf{WW}}^{T}-\mathbf{I})$$

(17)

The optimal **W** can be approached by setting the first derivative of ${\mathfrak{L}}_{\mathbf{z}\ast}$_{z}* wrt **W** to zero, resulting in the following equation:

$$2{\lambda}^{T}\mathbf{W}={\nabla}_{\mathbf{W}}\mathcal{O}+{\nabla}_{\mathbf{W}}{\mathfrak{L}}_{1}$$

(18)

As the preceding scaling factor 2λ will be removed due to the weigh normalization after each learning procedure, the above equation can be simplified as:

$$\mathbf{W}={\nabla}_{\mathbf{W}}\mathcal{O}+{\nabla}_{\mathbf{W}}{\mathfrak{L}}_{1}$$

(19)

which is exactly the same as CICA2 as described above in Eq. 16. However, in order to make the algorithm converge for both super- and sub- Gaussian sources as discussed in the following stability analysis, Eq. 16 needs to be changed as:

$${\mathbf{W}}^{+}=E\{\mathbf{x}G\prime \left(\mathbf{Wx}\right)\}\pm {\mu}^{T}E\{\mathbf{xq}\prime \left(\mathbf{y}\right)\}$$

(20)

the plus sign + before μ is for using the distance-based closeness measure *q*_{1}, and the negative sign - is for using the correlation-based closeness measure *q*_{2}.

As in [20, 21], the closeness can be measured by distance of correlation to the reference. To facilitate the method convergence and allow **y** (or **W**) to evolve along both the same or the opposite direction of **r**, two functions can be used: **q**_{1}(**y**) = 1/4(min(*E*{(**y** − **r**)^{4}}, *E*{(**y** + **r**)^{4}}) − ξ^{4}) and **q**_{2} = 1/4(ξ^{4} − *E*{(**y**^{T }**r**)^{4}}), where ξ^{2} is the desired closeness.

The stability analysis given in [21] assumes λ > 0, which is not a classical KKT condition and can not be proven. In the following, we give a full derivation of the stability analysis for CICA1 and CICA2, which can be easily adapted for OCICA too. The standard KKT conditions can be described as: ${\nabla}_{\mathbf{W}}{\mathfrak{L}}_{\mathbf{z}\ast}(\mathbf{W}\ast ,\mu \ast )=0,\mathbf{q}(\mathbf{y}\ast )\le 0,\mu \ast \ge 0$ and μ*^{T}**q**(**y***) = 0, which represent the necessary conditions for whether the solution **y*** is an optimum of CICA. Since CICA1 is still based on the object function $\mathfrak{L}$ in Eq. 4, the corresponding Hessian matrix of the object function can be approximated by:

$${\nabla}_{\mathbf{W}}{\nabla}_{\mathbf{W}}\mathfrak{L}=-\stackrel{\u2012}{c}E\{G\u2033\left(\mathbf{Wx}\right)\}+{\mu}^{T}E\{\mathbf{q}\u2033\left(\mathbf{y}\right)\}+2\lambda +2\gamma \mathbf{h}\left(\mathbf{y}\right)+4\gamma E\left\{{\mathbf{WW}}^{T}\right\}\approx -\stackrel{\u2012}{c}E\{G\u2033\left(\mathbf{Wx}\right)\}+{\mu}^{T}E\{q\u2033\left(\mathbf{y}\right)\}+2\lambda +4\gamma $$

(21)

Eq. 12b and *E*{**WW**^{T}} = **I** have been used in the above derivation. The first item in the Hessian matrix, − $\stackrel{\u2012}{c}E\{G\u2033\left(\mathbf{Wx}\right)\}$ can be ensured to be no less than 0 by choosing the appropriate nonlinear function *G*(.) as described in [21]. For the two closeness functions used in CICA (see the following sub-section), μ^{T}*E*{**q**″(**y**) ≥ 0 is satisfied. If we choose a large positive value for γ, which is true in Lagrangian multiplier-based optimization [3], we can then make sure **2**λ+4γ > 0 and ${\nabla}_{\mathbf{W}}{\nabla}_{\mathbf{W}}\mathfrak{L}>0$. Different from [21], λ > 0 is not required here. This relation is further guaranteed in CICA1 for any positive γ since λ is canceled out after taking the fixed-point manipulation (Eq. 15), which is equivalent to λ = 0. The approximation of *E*{**WW**^{T}} = **I** does not change the validity of this condition since *E*{**WW**^{T}} > 0 and we can always adjust γ if that approximation is not true. In summary, CICA1 has a positive definite Hessian matrix, ensuring that any solution of the KKT conditions, **y***, is an optimum of the CICA optimization problem. Due to the use of a Newton-like gradient updating process as well as the modified closeness measures, CICA1 is also guaranteed to have an at least quadratic convergence rate [26].

Similar to the convergence analysis for FastICA given in [27], the stability of CICA2 requires the following assumption and the full analysis is given in the Appendix B.

*Assumption* 1: Functions *g*(.) = *G*′(.) and *q*′(.) are odd, twice differentiable, which satisfies:

$$E\left\{{s}_{i}g\left({s}_{i}\right)\right\}\pm \mu E\{{s}_{i}q\prime \left({s}_{i}\right)\}\ne 0$$

(22)

for all sources *s _{i}*,

*g*(.) is ensured to be odd since it is the derivative of the even nonlinear function *G*(.). For the two closeness functions described in last sub-section, *q*′(.) is also guaranteed to be odd. The 2 order differentiability can be easily verified for the functions used in CICA. With CCCM as the closeness measure, *E*{*s _{i}g*(

Since the constraints themselves may not be optima at all, a too small closeness threshold will destroy the convergence of any CICA algorithm. As proposed in [21], the closeness threshold ξ can be initialized with a small value to avoid any local maxima, and gradually increased so that the algorithm converges at the global maximum corresponding to the desired IC. When the threshold becomes large, the inequality constraints will be automatically satisfied, and μ will be 0, and the convergence performance of CICA1 will be the same as that of FastICA, which has been fully proven in [16, 27]. For CICA2, Assumption 1 will be automatically true for either of the closeness measure functions, and its convergence can be proven following the analysis in Appendix B as well.

For fMRI data analysis, ICA is usually applied along the temporal domain to extract some independent spatial component maps. Using the above CICA on fMRI data will require spatial brain activation patterns that are generally to be discovered during data analysis. A more widely available constraint is the temporal behavior of the brain activity since many fMRI experiments do have explicit design paradigms. This kind of temporal constraints acts on the mixing matrix **A**, or equivalently the separating weight **W**. The above described algorithms should be accordingly adjusted by replacing **q**(**y**) with **q**(**W**) in the CICA learning process.

As a data-driven method, ICA does not have a statistical framework for assessing the extracted components. A widely used approach to threshold the component map is to convert it to a z map, and applying a standard z significance threshold [24]. That method can only tell whether a voxel value in the component map is significantly different from 0 or not. To assess whether these patterns are invoked by the tasks or the stimuli, a statistical inference can be achieved by relating the component map associated TC to the design function. Since CICA is right designed to select the interested components, population level inference can be easily made using a standard one-sample t-test.

The FastICA (http://www.cis.hut.fi/projects/ica/fastica) package was modified to implement all the CICA algorithms in Matlab (Mathworks, Natick MA) scripts. The proposed CICA1 and CICA2 were compared to OCICA and FastICA (fICA) for performance evaluations. *G*(.) was chosen to be the log cosh function [16], and the constant *c* was set to be 1. The termination criteria for the ICA learning procedure was set to be ||**W**|| < 1*e*−6. For CICA, the Lagrangian multiplier μ was initialized to be 1, and the positive scalar parameter γ for the quadratic penalty item in Eq. 3 was set to 0.1 × 4^{k−1} (*k* is the iteration index). As suggested in [20, 21], the learning rate η in OCICA was initialized to be 1 and decreased by 10 times after every 200 iterations if the learning process did not converge. All input data were centered and dewhitened using PCA through singular vector decomposition [28]. And the number of ICs was estimated after PCA decomposition so that the remained components represented no less than 99% of the original data's variance. For each ICA decomposition run, a maximum of 1000 iterations was allowed. If an algorithm did not meet the convergence criteria as described above, a new decomposition run will be started.

A sequential mode implementation of each CICA was given and tested in the following corresponding listing. The batch mode implementations can be easily obtained based on the equations presented above.

Implementation of CICA2 was nearly the same as that of CICA1 in Listing 2 except that in CICA2, line 8 of Listing 2 should be replaced by “update **w*** _{i}* using Eq. 20”. In all CICA implementations,

Thirty experiments were performed using randomized mixing matrixes and source signals (each with 2000 samples). Fig. 1A shows the first 400 time points of the 6 source signals, S1, S2, S3, S4, S5, and S6 used in one of the 30 experiments. S2 is an ECG signal; S4 is a brain activation function generated by convolving a boxcar function with a canonical hemodynamic reference function (HRF) [9]; S5 is a Gaussian noise which was randomly generated each time; S6 is a 1/f noise which was regenerated at each time of simulation. During each experiment, S2 was randomly extracted from a segment of a long ECG signal; S1, S3, and S4 were generated by randomly shifting the full waves as shown in Fig. 1A by ± 100 points. At each experiment, the regenerated 6 sources were mixed 5 times using 6 randomly generated mixing matrices. The mixed data were then separated using the 4 ICA algorithms as described above. For CICA, the 4 reference signals shown in Fig. 1B were used as the constraints to extract the signals of interest. Those references were generated to have the same major frequency item as in their corresponding sources so that they were correlated to the sources. They were not chosen to be the same as the sources in order to test the capability of CICA for source sorting using constraints with large shape deviations from the sources of interest.

1D sources used for algorithm evaluations. A) The 5 source signals to be separated, B) the 4 references used to extract the 4 sources of interest using CICA.

After decomposition, the fICA extracted ICs were sorted based on their correlations to the original signals; their signs were also flipped if the correlations were negative. For CICAs, only the signs of the ICs were flipped if their correlations to the constraints were negative. The accuracy of each IC was measured using the signal-to-noise-ratio (SNR) in dB: SNR = 10 log_{10}(σ^{2}/MSE), where σ^{2} represents the variance of the source signal and MSE is the mean square error between the extracted IC and original source signal. The performance of each implemented algorithm for source separation was measured with the convergence time and an aggregate performance index (PI):

$$\mathrm{PI}=\frac{1}{m}(\sum _{i=1}^{m}{\mathrm{rPI}}_{i}+\sum _{j=1}^{m}{\mathrm{cPI}}_{j})$$

(23)

where ${\mathrm{rPI}}_{i}={\Sigma}_{j=1}^{m}\frac{\mid {p}_{ij}\mid}{{\mathrm{max}}_{k}{p}_{ik}}-1$, and ${\mathrm{cPI}}_{j}={\Sigma}_{i=1}^{m}\frac{\mid {p}_{ij}\mid}{{\mathrm{max}}_{k}{p}_{kj}}-1$, and *p _{ij}* denotes the (i,j)-th element of the permutation matrix

Synthetic fMRI data were generated from 6 source components (**S**) and the associated 6 time courses (**A**) as shown in Fig. 2 by **X** = **AS** + **v**, where **v** is random Gaussian noise. The Gaussian noise added here was to simulate the noise in real fMRI data after preprocessing and should not be confused as one of the target sources. Signal intensities were increased in several brain regions in the grey matter to simulate brain activations in the first 3 components: S1, S2, and S3. S1 was created to simulate activation in response to a standard “on-off” fMRI stimulus series as shown in the associated time course (tc1) in Fig. 2, which is a box function convolved with a canonical HRF [9]. S2 and S3 were generated to simulate some transient activations. S4 was related to resting brain activity that associated with a slowly fluctuating time course (tc4). S5 was related to head motion as the abrupt motion can be perceived easily around the 40th timepoint in tc5. S6 was simulating a quickly fluctuating noise component with CSF involved. Gaussian noise was added to simulate the additive noise in real fMRI environment. The same signal mixing process was repeated 50 times with different Gaussian noise. The resultant 50 pseudo-fMRI time series were then fed into the above mentioned 4 ICA methods to separate the source components using the same parameter settings. The image masks shown in Fig. 2C were used as spatial constraints for the source components that to be separated using CICA. The references were chosen to be not the same as the sources but just with similar spatial locations for them to be correlated with the sources. After decomposition, the fICA extracted ICs were sorted based on their correlations to the original signals; their signs were also flipped if the correlations were negative. For CICAs, only the signs of the ICs were flipped if their correlations to the constraints were negative. PI and SNR were collected to measure the algorithm performance.

The same pseudo-fMRI data were used to validate the performance of the proposed CICA methods when temporal constraints were used. Giving the fact that the stimulus paradigm is usually the only available reference for the temporal behavior of brain activations, temporally constrained ICAs were only used to extract the first source component: S1, as S1 was generated to represent brain activation in response to the pseudo stimulus paradigm. The box function of the pseudo stimulus paradigm (as shown in Fig. 6.TREF) was used as the temporal constraint, which was subsequently converted into the constraint for the separating weight **W** using the same dewhitening matrix as used for data dewhitening. Corresponding adjustments in Eqs. 15, 20, and 9 were made by replacing **q**(**y**) with **q**(**W**) in order to implement the temporally constrained ICA learning process. Other configurations were the same as in the spatially constrained cases.

The first source component and the associated time course identified by various ICA algorithms. AS, BS, CS and the associated time course ASTC, BSTC, CSTC were extracted by CICA1, CICA2, and OCICA, respectively using the spatial constraint sref1 as shown **...**

GLM was also applied to the random noise contaminated synthetic fMRI data using TC1 in Fig. 2A as the regressor. The statistical parametric maps were collected. To compare the sensitivity and specificity of CICA and GLM for brain activation detection, receiver operator characteristic (ROC) [25] curves were calculated by moving the threshold from the minimum to the maximum of the signal intensity of the extracted ICs or the statistical parametric maps generated by GLM and then calculating the true positive rate and false positive rate from the suprathresholded (using the current threshold) voxel as described in [33]. The true activated voxels were defined to be those within the two big blobs in s1 shown in Fig. 2A. The area under curve (AUC) was used as the surrogate score for the efficacy of each method.

Imaging experiments were performed using the same protocol as mentioned in previous work [35]. Briefly, both T1-weighted images and BOLD fMRI data were obtained on a 3T Siemens Trio whole body MR scanner. Seventeen young healthy subjects were scanned with written informed consent obtained before scanning following an Institutional Review Board approved protocol. The acquisition parameters for BOLD were: 30 interleaving slices, slice thickness=3 mm, flip angle=75°, TR/TE=3000/30 msec, in plane FOV=220 mm. The stimuli comprised 5 pairs of a 48 sec 8.3 Hz reversing black and white checkerboard and a 48 sec black screen block (baseline). Each participant was also asked to perform a self-paced left hand only fingertapping task during the visual stimuli.

All data preprocessing was performed using SPM (Wellcome Department of Cognitive Neurology, London, UK, http://www.fil.ion.ucl.ac.uk/spm) based batch scripts [32]. Functional images of each subject were motion corrected, coregistered with the structural images, and smoothed with an isotropic Gaussian filter (FWHM=6 mm). The structural image was normalized to the Montreal Neurological Institute/International Consortium for Brain Mapping (MNI/ICBM) 152 standard brain using SPM5. Standard GLM was performed on the spatially smoothed sensorimotor fMRI data using the canonical HRF function [9] convolved design function as the reference. The temporal autocorrelated noise was considered using the AR(1) model [8], and the high frequency noise was lowpassed with a cutoff of 1/128 Hz. The 3 directional translation and 3 directional rotation movement time course estimated during image realignment were included as nuisance covariates in the GLM model. Statistical parametric maps (SPMs) of the task minus baseline contrast were collected at each voxel using a standard t-test, which were subsequently normalized to the MNI space using the same normalization parameters estimated from the T1 images and finally a group analysis was performed by running a one-sample t-test on these spatially normalized parametric maps.

For CICA, the realigned and smoothed fMRI series were filtered using the same GLM model as described above but without the HRF convolved reference function in the model in order to preserve the task effects for the successive activation detection. The filtered images were obtained from the residual images after running the GLM estimation. CICA1, CICA2, OCICA, and fICA were used to analyze the preprocessed fMRI data. The box-car design function was used as the temporal constraint in all CICAs for extracting the activation component. Both OCICA and fICA were applied to each subject' data first, and the number of runs that they needed to converge was recorded. Then, the proposed CICAs were run on the same data for the same number of times in order to validate their stability. The extracted IC map of interest was sign flipped if the associated TC was negatively correlated to the box-car function. For fICA, the IC of interest was identified as the one whose TC presented the highest correlation to the box-car function. The extracted ICs using the constraints were then spatially normalized to the MNI standard space and a one-sample t-test was performed to assess the group level statistical significance for these revealed spatially distributed activation patterns. The same ICA process was run several times until it converged.

All algorithms converged for all 150 source separation trials when log cosh was used as the nonlinear function *G*(.). The average convergence time of CICA1, CICA2, OCICA, and fICA were 0.246, 0.237, 0.507, and 0.028 sec, respectively. Although the time saving was moderate, the differences were significant across different methods (*P* < 0.000001). The proposed CICAs still converged in another 150 separation trials when the kurtosis was used as the nonlinear function. By comparison, OCICA and fICA both did not converge for a few times. In the following, the performance validation results were based on the first nonlinear function used in the ICA algorithms. Fig. 3 shows the aggregate PI, and mean SNR of each ICA method for separating the 4 sources of interest from their randomly mixed observations. As compared to OCICA and fICA, the proposed 2 CICA algorithms yielded significantly lower PI value. OCICA did not improve the aggregate PI as compared to fICA. CICA1 and CICA2 yielded the same PI. As shown in Fig. 3B, CICA1 and CICA2 performed the same and both yielded higher SNR than OCICA and fICA for separating s2, s3, and s4. CICA1 and CICA2 also demonstrated higher SNR than OCICA for extracting s1. OCICA performed better than fICA only for separating the 3rd source. All the difference reported above were statistically significant at *P* < 0.005 (one-way ANOVA).

Fig. 4 shows the first 400 points of the signals extracted from one randomly mixed dataset using CICAs and fICA. The 4 signals of interest extracted by CICA1 (Fig. 4A) and CICA2 (Fig. 4B) were almost identical to the original sources as displayed in Fig. 1A. The most obvious performance differences were found in the 3rd and 4th sources, Y3 and Y4 extracted using OCICA and fICA, though minor variations still presented in Y1 and Y2 in Fig. 4C and 4D.

All tested ICA algorithms, including the spatially constrained CICA1, CICA2, OCICA, and fICA converged for all 50 source separation trials for those 50 different random noise contaminated 2D source data. Fig. 5 shows the aggregate PI, and SNR of each ICA method for separating the 6 sources. Using the same spatial constraints as shown in Fig. 2C, CICA1and CICA2 showed lower PI values than OCICA and fICA, and OCICA yielded lower PI than fICA. No PI difference was found between CICA1 and CICA2. Source extraction SNR was shown in Fig. 5B; spatial constraints were used in all the 3 CICA methods. CICA1 and CICA2 demonstrated higher SNR for extracting all the 6 sources than OCICA and fICA. OCICA presented higher SNR for extracting S1, S2, S3, and S5 than fICA. No performance difference was found between CICA1 and CICA2. Both CICA1 and CICA2 yielded slightly increased (not significant) CC for extracting S1 as compared to OCICA when the same temporal constraint was used (results not shown here). However, the latter did not converge for 2 times out of the 50 source separation runs. CICA with temporal constraint still outperformed fICA though only the first component was tested. The significance level used for reporting the differences in all comparisons was *P* < 0.005 (one-way ANOVA).

Fig. 6 shows the 2D source separation results using fICA and CICA with either spatial or temporal constraints. To save space, only the separation results for the 1st source were displayed here. As shown in Fig. 6AS (AT), BS (BT), ASTC (ATTC), BSTC (BTTC), the IC maps and TCs extracted by the proposed CICA methods using either spatial (or temporal) constraint were almost the same. When converged (as shown in Fig. 6), OCICA still yielded a good quality source separation, while fICA did not perform well for separating S1 from the same noise contaminated mixed signal. By comparing both the IC maps and the TCs in Fig. 6, we can find that CICA with spatial constraints demonstrated slightly better spatial map quality in terms of less background noise while CICA with temporal constraints yielded a slightly better temporal quality in terms of less temporal fluctuation in the TC.

In ROC analysis, both the proposed CICAs and OCICA (when converged) yielded the same AUC values (mean ± std = 0.9998464±5*e*−05), better than GLM-based method (0.9905818±6*e*−04). However, fICA yielded the lowest AUC (0.9543832 ± 5.48*E* − 02) in all compared methods.

OCICA and fICA converged at the first time when they were applied to 15 subjects' data, but both did not converge for 2 subjects until the 3rd run and OCICA did not converge until the 2nd run. The proposed CICA methods converged at the 1st run for analyzing all subjects' data, and also converged at the 2nd and 3rd time when they were applied to those 2 particular subjects' data for which OCICA and fICA did not converge at the 1st run. OCICA also converged at the 3rd run for those 2 subjects. Fig. 7 shows the statistical parametric maps of the 3 CICA methods, fICA, and GLM-based group level analyses using the sensorimotor fMRI data. At the same significance level (t=4.67, P=0.05 with the false detection rate based multiple comparison correction [10]), the proposed 2 CICA methods demonstrated higher sensitivity than OCICA in terms of higher t-values in all regions activated by the sensorimotor task. The 2 new CICA methods also yielded larger suprathreshold clusters in thalamus and left Insula than those revealed by OCICA. The peak locations of the group level statistical parametric maps of CICA1, CICA2, and OCICA based analyses are in the same location in both visual cortex (Vis) and motor cortex (Mot), with the value of 12.72/12.73/11.57 (CICA1/CICA2/OCICA) for Vis, and 12.87/12.87/12.62 (CICA1/CICA2/OCICA) for Mot. All CICA methods outperformed fICA for sensorimotor task activated brain activations in terms of larger suprathreshold clusters in all activated brain regions and also larger peak-t values in both Vis and Mot (9.53/8.05 in Vis/Mot in fICA-based group level statistical parametric map). As compared to GLM-based method, all CICA yielded higher sensitivity in terms of both peak t-value (12.12/8.5 in Vis/Mot in GLM based group analysis result) and cluster extensions in all activated brain regions except the left Insula.

Two fixed-point CICA algorithms were proposed in order to improve the stability of the OCICA method. CICA1 is based on a Newton-like gradient learning process; CICA2 is using a regular gradient descent approach. Algorithms for applying constraints to either the components or to the mixing time courses were derived especially for 2D source separations. No learning rate was used in both proposed algorithms. 1D randomly mixed synthetic data and 2D random noise contaminated synthetic data were used to evaluate the performance of the proposed algorithms with comparisons to the OCICA and standard ICA. The 1D dataset was used to validate the algorithms for separating randomly mixed signal. The 2D synthetic data were simulating fMRI data and were used to validate the proposed algorithms for source separation in the case of randomly generated noise; as well as their performance for source separation with either spatial constraints or temporal constraints. In both 1D and 2D source separation experiments, the proposed methods were demonstrated to be better than OCICA in terms of higher stability (converged for all experiments for both 1D and 2D source separations), higher source separation quality, and lower PI. All CICAs outperformed fICA for source separation with higher CC and lower PI due to the use of constraints. No performance difference was observed between CICA1 and CICA2. The proposed CICAs presented higher convergence rate than OCICA. Because fICA does not include any additional constraints, it showed the fastest convergence rate when converged. The ROC analysis showed that CICA yielded better sensitivity/specificity performance than GLM for brain activation detection. However, fICA was demonstrated to be even less effective than GLM for the pseudo-activation detection, which may partly due to the interference of noise added to the mixed signal. Another reason may be that FastICA does not yield the best separation results for fMRI data as compared to other ICA algorithms [7]. The additive noise could contribute significant background fluctuations, and the sub-optimal fMRI data source separation could make ICA converge to a local optima, which subsequently contributes component split or cross-contaminations. Among the 50 times source separations, the fICA identified ICs had large background noise or contaminations from other sources as shown in Fig. 6.D, which markedly degraded the sensitivity and specificity performance for the activation detection. By attracting the separating weight to the correct direction in the first several iterations, CICA provides a practical solution to deal with both the noise interference and the local optima problem.

A detailed stability analysis was provided for the proposed CICA methods. It is worth to note that the stability analysis for the OCICA [21] was based on an additional KKT condition λ > 0, which is not necessary to be true. However, the convergence of the OCICA is still satisfied if the stability analysis is based on the classical KKT condition and the convexity condition as described in this paper.

Random noise is usually not explicitly modeled in standard ICA since it can be treated as a regular IC if there is only one Gaussian noise source. However, noise added after source mixing could drastically degrade the source separation quality of regular ICA as demonstrated in Fig. 6.D. By contrast, CICA is much less sensitive to this additive noise as shown in Fig. 6. An interesting finding in the additive noise contaminated 2D source separation simulations is that better separation quality for IC can be achieved by using CICA with constraints for the ICs (the so-called spatial constraints in the 2D data separation simulations) and better separation quality for TC can be achieved by using CICA with constraints for the TCs (the so-called temporal constraints in the 2D data separation simulations), suggesting gaining further quality improvement for both ICs and the associated TCs using constraints for both of them if we do have the constraints.

Since the noise environment and source features might be quite different in real fMRI from what was simulated in the synthetic datasets, sensorimotor fMRI data were used to further validate the proposed methods for analyzing fMRI data. The proposed 2 CICA algorithms were able to converge at the first time when they were applied to each subject's data. As compared to OCICA, the proposed methods demonstrated increased sensitivity for detecting brain activations in response to the sensorimotor and visual task. As in the simulations, all CICAs demonstrated better activation detection sensitivity than fICA due to the use of constraints. Both CICA1 and CICA2 performed nearly the same for detecting both visual and sensorimotor activations. CICA including the proposed new CICAs and OCICA (when converged) yielded much higher sensitivity for sensorimotor brain activation detection than the GLM-based univariate method. The aforementioned ROC analysis results and the demonstrated superiority of CICA for the well-characterized sensorimotor task activation detection suggest that the proposed CICA methods can be employed as sensitive tools for brain activation in fMRI. One reason for this sensitivity gain is the multivariate data processing in ICA. As each input sample is treated as a single unit, ICA-based fMRI data analysis is searching the spatial patterns as an entity rather than pursuing a collection of discrete patterns separately at each voxel like in the univariate GLM. Because the spatially coherent activation patterns are more robust to noise interference than each single voxel if the noise is spatially independent which is a quite common assumption taken in the real MRI field [12], assessing the patterns as a whole should then gain a sensitivity increase as compared to the way that assesses the patterns at each isolated voxel. However, standard ICA especially fICA might not be able to fully demonstrate this benefit as partly shown in Fig. 7. A right solution is then to incorporating prior information into the ICA learning process as in CICA.

The sensorimotor data analysis results (Fig. 7) showed that all CICA methods and fICA yielded less sensitivity than GLM in left Insula in terms of lower peak t-value and small cluster size, which actually reflected a caveat of a multivariate fMRI data analysis method that it could induce a sensitivity loss to some focal brain activations if they are not coherent with the overall spatially connected activation pattern. Actually, we have identified in [31] that the left insula was activated less coherently than the other three regions, resulting in a less sensitive detection using ICA and CICA. Nevertheless, combining prior information and the data-driven property of ICA, CICA yielded better sensitivity for brain activation detection in the target visual and motor cortex in a well characterized sensorimotor fMRI experiment. In reality, resting fMRI data analysis is one of the hottest ICA application in fMRI. Since several resting networks have been repeatedly identified in the literature, it is possible to build templates for those networks using CICA and retrospectively use them in CICA for general resting state analysis. However, either topic needs additional extensive future work and should be reported separately.

One issue of CICA is that adding constraints could possibly affect the blindness feature of ICA. Similar to [20, 21], this potential bias is minimized by quickly releasing the constraints, so that the constraints will be totally inactive after several iterations and the rest of iterations will be the same as those in standard ICA. From this point of view, adding constraints does not change the data-driven property of ICA. Including constraints does not force ICA to produce the desired ICs if there does not exist any separating weights or ICs close to the constraints, but CICA through the proposed CICA1, CICA2, or OCICA (when an appropriate learning rate is used) is guaranteed to converge if some local optima exist regardless of whether they are close to the constraints or not. All the simulations demonstrated that the proposed CICA even the OCICA (when converged) were capable to identify the sources when the corresponding constraints were chosen to be a binary pulse or box-car function or binary image mask which are correlated to but with large deviations from the latent sources.

Although the proposed methods do not use any learning rate, they are still using two other parameters, the Lagrangian multiplier μ and the scalar penalty parameter γ. Different settings could potentially give different convergence rate. But we did not observe significant difference when the initial value of μ was set to be any number between 0.1 to 5 and when two additional different updating rules: 0.1 × 5^{k−1} and 4^{k−1} were used for γ. More details about these two parameters can be found in [3].

For CICA itself, we noticed that there is another approach called semi-blind ICA [1], which incorporates the prior information by adding an additional adjusting step to the mixing matrix to attract the separating weights to the desired directions. A possible issue is that appending an additional adjusting step is not guaranteed to keep the stability of the original ICA. The two factors for controlling the additional adjustment may not be easy to be tuned for different applications too. By contrast, the convergence of the proposed CICA methods was fully proven. As the focus of this paper was CICA method development, a direct comparison between CICA and semi-blind ICA would be beyond the scope of this paper and was not provided. We also noticed that a CICA algorithm very similar to CICA2 proposed in this paper has recently been published by Lin *et al.* [18] using a Lagrangian similar to Eq. 17. Although they did not provide a stability analysis for their method, they had demonstrated the superiority of their improved CICA (similar to CICA2 in this paper) to the improved OCICA proposed by the same group [19].

- Center and prewhiten the observed signal
**x**; - Set ε to a small value to control the learning precision ;
- Initialize η;
- Generate a zero–mean unit–norm Gaussian noise vector
**v**; - For the i–th constrained IC:
- Initialize μ;
- Initialize
**w**_{i}with a random vector, set its norm to be 1; - Repeat the following steps until the norm of Δ
**w**_{i}is less than ε or $\mid 1-\Vert {\mathbf{w}}_{i,k}^{T}{\mathbf{w}}_{i,k+1}\Vert \mid \le $ ε (*k*is the iteration index); - Calculateĉ with Eq. 8;
- Update γ;
- Update μ using Eq. 12a;
- Adjust ε to gradually release the constraints.

- Center and prewhiten the observed data
**x**; - Set ε to a small value to control the learning precision ;
- For the i–th constrained IC:
- Initialize μ;
- Initialize
**w**_{i}with a random vector, set the norm to 1; - Repeat the following steps until the norm of Δ
**w**_{i}is less than ε or $\mid 1-\Vert {\mathbf{w}}_{i,k}^{T}{\mathbf{w}}_{i,k+1}\Vert \mid \le \u220a$ (*k*is the iteration index); - Update γ;
- Update μ using Eq. 12a;
- Adjust ε to gradually release the constraints.

This research was supported by NIH grants: R21DC011074, R03DA023496, RR02305, R21DA026114, R01DA025906 and NS045839.

According to KKT condition [29], μ and **q** should be orthogonal at the local optimal point:

$${\mu}^{T}\mathbf{q}\left(\mathbf{y}\right)=0$$

(A.1)

Considering γ > 0, we can obtain the following equations using the optimal solution of (**z***)^{2} and A.1:

$${\mu}^{T}{(\mathbf{z}\ast )}^{2}=0$$

(A.2)

$${\mu}^{T}(\mathbf{q}\left(\mathbf{y}\right)+{(\mathbf{z}\ast )}^{2})=0$$

(A.3)

Using Eqs. A.1, A.2, and A.3, we can get the following classical equation of ${\mathfrak{L}}_{1\mathbf{z}\ast}$ (simplified as ${\mathfrak{L}}_{1,\mathbf{z}}$)as described in [3]:

$$\begin{array}{cc}\hfill {\mathfrak{L}}_{1\mathbf{z}}(\mathbf{W},\mu )& =\frac{\gamma}{2}(\mathbf{q}\left(\mathbf{y}\right)+{(\mathbf{z}\ast )}^{2})=\frac{1}{2\gamma}{(\text{max}\{\mu +\gamma \mathbf{q}\left(\mathbf{y}\right),0\}-\mu )}^{2}\hfill \\ \hfill & =\frac{1}{2\gamma}({\text{max}}^{2}\{\mu +\gamma \mathbf{q}\left(\mathbf{y}\right),0\}+{\mu}^{2}-2\phantom{\rule{thinmathspace}{0ex}}\text{max}\{{\mu}^{2}+\gamma {\mu}^{T}\mathbf{q}\left(\mathbf{y}\right),0\})\hfill \\ \hfill & =\frac{1}{2\gamma}(\text{max}\{\mu +\gamma \mathbf{q}\left(\mathbf{y}\right),0\}-{\mu}^{2})\hfill \end{array}$$

(A.4)

As in [27], let's assume the source signals *s _{i}* are zero mean, unit variance, so that they are whitened because of their independence. Since the input data

$$\mathbf{z}=\mathbf{Vx}=\mathbf{VAs}$$

(B.1)

which is actually the real input to the CICA2 learning process as described in Eq. 20. It is easy to see that (**VA**)^{T} (**VA**) = **I**. If the mixing matrix **A** were know, the sources could be estimated as **s** = (**VA**)^{T}**z**. To facilitate an easy derivation of the convergence stability, a simplifying linear transformation is made in [27]

$$\mathbf{U}=\mathbf{W}\left(\mathbf{VA}\right)\phantom{\rule{1em}{0ex}}\stackrel{\u2012}{\mathbf{U}}=\stackrel{\u2012}{\mathbf{W}}\left(\mathbf{VA}\right)$$

(B.2)

where **U** = (**u**_{1}, …, **u**_{m})^{T}. Combining with Eq. B.1, we get **Wz** = **W**(**VA**)**s** = **Us** and ${\mathbf{w}}_{i}^{T}\mathbf{z}={\mathbf{u}}_{i}^{T}\mathbf{s}$.

Replacing **x** with its whitened version **z** in Eq. 20, and left multiplying both sides of Eq. 20 with (**VA**)^{T}, we get

$${\stackrel{\u2012}{\mathbf{u}}}_{i}=E\left\{\mathbf{s}g\left({\mathbf{u}}_{i}\mathbf{s}\right)\right\}\pm {\mu}^{T}E\{\mathbf{s}q\prime \left({\mathbf{u}}_{i}\mathbf{s}\right)\}$$

(B.3)

Note that this derivation has considered ** s** = (

By right multiplying both sides of Eq. 11 with (**VA**), we can get [27]

$$\mathbf{W}\left(\mathbf{VA}\right)={\left(\stackrel{\u2012}{\mathbf{W}}\left(\mathbf{VA}\right){\left(\mathbf{VA}\right)}^{T}{\stackrel{\u2012}{\mathbf{W}}}^{T}\right)}^{-1\u22152}\stackrel{\u2012}{\mathbf{W}}\left(\mathbf{VA}\right)$$

(B.4)

which is equivalent to say [27]

$$\mathbf{U}={\left(\stackrel{\u2012}{\mathbf{U}}{\stackrel{\u2012}{\mathbf{U}}}^{T}\right)}^{-1\u22152}\stackrel{\u2012}{\mathbf{U}}$$

(B.5)

Therefore, Eqs. B.3 and B.5 form the general CICA2 algorithm in the transformed space.

Based on Assumption 1, we have *Lemma* 1*:* Matrix **U** = **D** = diag(±1, …, ±1) is a fixed point of Eqs. B.3 and B.5.

*Proof.* Since the i-th row vector of **D** is ${\mathbf{d}}_{i}^{T}=(0,\dots ,\pm 1,\dots ,0)$, we have ${\mathbf{d}}_{i}^{T}\mathbf{s}=\pm {\mathbf{s}}_{i}$, $g\left({\mathbf{d}}_{i}^{T}\mathbf{s}\right)=g(\pm {\mathbf{s}}_{i})$ and *q*′(**d**_{i}**s**) = *q*′(±**s**_{i}), which are independent of other sources **s**_{j}, *j* ≠ *i*. Consequently, *E*{**s**_{j}*g*(**d**_{i}**s**)} = 0 and *E*{**s**_{j}*q*′(**d**_{i}**s**)} = 0 for *i* ≠ *j*. From Eq. B.3, we have

$${\stackrel{\u2012}{\mathbf{d}}}_{i}={(0,\dots ,E\left\{{\mathbf{s}}_{i}g(\pm {\mathbf{s}}_{i})\right\}\pm {\mu}^{T}E\{{\mathbf{s}}_{i}q\prime (\pm {\mathbf{s}}_{i})\},\dots ,0)}^{T}$$

(B.6)

For ${\mathbf{d}}_{i}^{T}=(0,\dots ,1,\dots ,0)$, the above equation becomes

$${\stackrel{\u2012}{\mathbf{d}}}_{i}={(0,\dots ,E\left\{{\mathbf{s}}_{i}g\left({\mathbf{s}}_{i}\right)\right\}-{\mu}^{T}E\{{\mathbf{s}}_{i}q\prime \left({\mathbf{s}}_{i}\right)\},\dots ,0)}^{T}={c}_{i}{\mathbf{d}}_{i}$$

(B.7)

where *c _{i}* =

$${\stackrel{\u2012}{\mathbf{d}}}_{i}={(0,\dots ,-(E\left\{{\mathbf{s}}_{i}g\left({\mathbf{s}}_{i}\right)\right\}-{\mu}^{T}E\{{\mathbf{s}}_{i}q\prime \left({\mathbf{s}}_{i}\right)\}),\dots ,0)}^{T}={c}_{i}{\mathbf{d}}_{i}$$

(B.8)

as well because of the oddness of both *g*(.) and *q*′(.).

Therefore, we have ${\stackrel{\u2012}{\mathbf{d}}}_{i}={c}_{i}{\mathbf{d}}_{i}$ for both positive and negative cases, which can be also written in matrix form

$$\stackrel{\u2012}{\mathbf{D}}=\mathbf{KD}$$

(B.9)

where **K** = diag(*c*_{1}, …, *c _{m}*). According to Assumption 1, all

$${\left(\stackrel{\u2012}{\mathbf{D}}{\stackrel{\u2012}{\mathbf{D}}}^{T}\right)}^{-1\u22152}\stackrel{\u2012}{\mathbf{D}}={\left({\mathbf{KDD}}^{T}\mathbf{K}\right)}^{-1\u22152}\mathbf{KD}={\mathbf{K}}^{-1}\mathbf{KD}=\mathbf{D}$$

(B.10)

which means that **D** satisfies Eq. B.5 and then completes the proof.

*Lemma* 2*:* With Assumption 1, the fixed point stated in Lemma 1 is asymptotically stable and the convergence rate of CICA2 is at least two.

*Proof.* Assume the fixed point **D** in Lemma 1 is perturbed by:

$${u}_{ij}={d}_{ij}+{\u220a}_{ij}$$

(B.11)

where *d _{ij}* are the elements of

$$\begin{array}{cc}\hfill {\stackrel{\u2012}{\mathbf{u}}}_{i}& =E\left\{\mathbf{s}g({\mathbf{d}}_{i}\mathbf{s}+{\mathbf{e}}_{i}\mathbf{s})\right\}-{\mu}^{T}E\{\mathbf{s}q\prime ({\mathbf{d}}_{i}\mathbf{s}+{\mathbf{e}}_{i}\mathbf{s})\}\hfill \\ \hfill & =E\left\{\mathbf{s}g(\pm {\mathbf{s}}_{i}+{\mathbf{e}}_{i}\mathbf{s})\right\}-{\mu}^{T}E\{\mathbf{s}q\prime (\pm {\mathbf{s}}_{i}+{\mathbf{e}}_{i}\mathbf{s})\}\hfill \end{array}$$

(B.12)

According to Taylor series expansion theory, *g*(±**s**_{i} + **e**_{i}**s**) and *q*′(±**s**_{i} + **e**_{i}**s**) can be expanded at the points of ±**s**_{i} through:

$$g(\pm {\mathbf{s}}_{i}+{\mathbf{e}}_{i}s)=\pm g\left({\mathbf{s}}_{i}\right)+g\prime \left({\mathbf{s}}_{i}\right){\mathbf{e}}_{i}^{T}\mathbf{s}+O\left({\u220a}^{2}\right)$$

(B.13a)

$$q\prime (\pm {\mathbf{s}}_{i}+{\mathbf{e}}_{i}s)=\pm q\prime \left({\mathbf{s}}_{i}\right)+g\u2033\left({\mathbf{s}}_{i}\right){\mathbf{e}}_{i}^{T}\mathbf{s}+O\left({\u220a}^{2}\right)$$

(B.13b)

*O*(^{2}) collects all the remaining terms that are two and above order in the elements of **e**_{i}. It is then bounded by ^{2} times a scalar constant. We then have

$${\stackrel{\u2012}{\mathbf{u}}}_{i}\pm E\left\{\mathbf{s}g\left({\mathbf{s}}_{i}\right)\right\}\mp {\mu}^{T}E\{\mathbf{s}q\prime \left({\mathbf{s}}_{i}\right)\}+E\{\mathbf{s}g\prime \left({\mathbf{s}}_{i}\right){\mathbf{e}}_{i}^{T}\mathbf{s}\}-{\mu}^{T}E\{\mathbf{s}q\u2033\left({\mathbf{s}}_{i}\right){\mathbf{e}}_{i}^{2}\mathbf{s}\}+O\left({\u220a}^{2}\right)$$

(B.14)

Because

$$E\{\mathbf{s}g\prime \left({\mathbf{s}}_{i}\right){\mathbf{e}}_{i}^{T}\mathbf{s}\}-{\mu}^{T}E\{\mathbf{s}q\u2033\left({\mathbf{s}}_{i}\right){\mathbf{e}}_{i}^{T}\mathbf{s}\}=\pm (E\{g\prime \left({\mathbf{s}}_{i}\right){\mathbf{s}}_{i}^{2}\}-{\mu}^{T}E\{q\u2033\left({\mathbf{s}}_{i}\right){\mathbf{s}}_{i}^{2}\}){\u220a}_{ii}{\mathbf{d}}_{i}$$

(B.15)

we have

$${\stackrel{\u2012}{\mathbf{u}}}_{i}=(E\left\{\mathbf{s}g\left({\mathbf{s}}_{i}\right)\right\}-{\mu}^{T}E\{\mathbf{s}q\prime \left({\mathbf{s}}_{i}\right)\}){\mathbf{d}}_{i}\pm (E\{g\prime \left({\mathbf{s}}_{i}\right){\mathbf{s}}_{i}^{2}\}-{\mu}^{T}E\{q\u2033\left({\mathbf{s}}_{i}\right){\mathbf{s}}_{i}^{2}\}){\u220a}_{ii}{\mathbf{d}}_{i}+O\left({\u220a}^{2}\right)$$

(B.16)

This can be written in matrix form

$$\stackrel{\u2012}{\mathbf{U}}=\mathbf{KD}+\widehat{\mathbf{K}}\mathbf{D}+O\left({\u220a}^{2}\right)$$

(B.17)

where $\widehat{\mathbf{K}}$ is the diagonal matrix whose diagonal elements are $(E\{g\prime \left({\mathbf{s}}_{i}\right){\mathbf{s}}_{i}^{2}\}-{\mu}^{T}E\{q\u2033\left({\mathbf{s}}_{i}\right){\mathbf{s}}_{i}^{2}){\u220a}_{ii}$. With some trivial algebraic manipulations and based on **D**^{2} = **I**, we can get [27]:

$$\stackrel{\u2012}{\mathbf{U}}{\stackrel{\u2012}{\mathbf{U}}}^{T}={\mathbf{K}}^{2}(\mathbf{I}-2\widehat{\mathbf{K}}{\mathbf{K}}^{-1})+O\left({\u220a}^{2}\right)$$

(B.18)

Since $-2\widehat{\mathbf{K}}{\mathbf{K}}^{-1}$ is first order in , using the 1st order Taylor series expansion on ${(\mathbf{I}-2\widehat{\mathbf{K}}{\mathbf{K}}^{-1})}^{-1\u22152}$, we can have

$${\left(\stackrel{\u2012}{\mathbf{U}}{\stackrel{\u2012}{\mathbf{U}}}^{T}\right)}^{-1\u22152}={\mathbf{K}}^{-1}(\mathbf{I}+\widehat{\mathbf{K}}{\mathbf{K}}^{-1})+O\left({\u220a}^{2}\right)$$

(B.19)

and

$$\begin{array}{cc}\hfill {\left(\stackrel{\u2012}{\mathbf{U}}{\stackrel{\u2012}{\mathbf{U}}}^{T}\right)}^{-1\u22152}\stackrel{\u2012}{\mathbf{U}}& ={\mathbf{K}}^{-1}(\mathbf{I}+\widehat{\mathbf{K}}{\mathbf{K}}^{-1})(\mathbf{K}-\widehat{\mathbf{K}})\mathbf{D}+O\left({\u220a}^{2}\right)\hfill \\ \hfill & =\mathbf{D}+O\left({\u220a}^{2}\right)\hfill \end{array}$$

(B.20)

This equation proves the stability of the fixed point **D** of CICA2 with second-order convergence. The proof can also be adapted to prove the convergence of the fixed point solution of CICA1 with minor math manipulations as well as changes to Assumption 1.

1. Semi-blind ICA of fMRI: A method for utilizing hypothesis-derived time courses in a spatial ica analysis. NeuroImage. 2005;25(2):527–538. [PubMed]

2. Beckmann CF, DeLuca M, Devlin JT, Smith SM. Investigations into resting-state connectivity using independent component analysis. Philosophical Transactions of the Royal Society B: Biological Sciences. 2005;360(1457):1001–1013. [PMC free article] [PubMed]

3. Bertsekas DP. Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific; Belmont, Mass: 1996.

4. Calhoun V, Adali T, McGinty V, Pekar J, Watson T, Pearlson G. fMRI activation in a visual-perception task: Network of areas detected using the general linear model and independent components analysis. NeuroImage. 2001;14:1080–1088. [PubMed]

5. Calhoun V, Adali T, Pearlson G, Pekar J. Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Hum. Brain Map. 2001;13:43–53. [PubMed]

6. Comon P. Independent component analysis-a new concept? Signal Processing. 1994;36:287–314.

7. Daubechies I, Roussos E, Takerkart S, Benharrosh M, Golden C, D'Ardenne K, Richter W, Cohen JD, Haxby J. Independent component analysis for brain fMRI does not select for independence. Proceedings of the National Academy of Sciences. 2009;106(26):10415–10422. [PubMed]

8. Friston K, Holmes A, Poline J, Grasby PJ, Williams SCR, Frackowiak R, Turner R. Analysis of fmri time series revisited. NeuroImage. 1995;2:45–53. [PubMed]

9. Friston K, Worsley K, Frackowiak R, Mazziotta J, Evans A. Assessing the significance of focal activations using their spatial extent. Human Brain Mapping. 1994;1:214–220. [PubMed]

10. Genovese C, Lazar N, Nichols T. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage. 2002:870–878. [PubMed]

11. Greicius MD, Srivastava G, Reiss AL, Menon V. Default-mode network activity distinguishes Alzheimer's disease from healthy aging: Evidence from functional MRI. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(13):4637–4642. [PubMed]

12. Haacke EM, Brown RW, Thompson MR, Venkatesan R, Haacke EM, Brown RW, Thompson MR, Venkatesan R. Magnetic Resonance Imaging: Physical Principles and Sequence Design. 1 edition Wiley-Liss; 1999.

13. Hesse C, James C. The fastica algorithm with spatial constraints. Signal Processing Letters, IEEE. 2005 Nov.12(11):792–795.

14. Hestenes MR. Multiplier and gradient methods. J.O.T.A. 1969;4:303–320.

15. Hyväarinen A. New approximation of differential entropy for independent component analysis and projection pursuit. Advances in Neural Information Processing Systems. 1998;10:273–279.

16. Hyväarinen A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. on Neural Network. 1999;10(3):626–634. [PubMed]

17. Hyväarinen A, Karhunen J, Oja E. Independent Component Analysis. John Wiley & Sons; New York: 2001.

18. Lin Q-H, Liu J, Zheng Y-R, Liang H, Calhoun VD. Semiblind spatial ica of fmri using spatial constraints. Human Brain Mapping. 2010 page Articles online in advance of print. [PMC free article] [PubMed]

19. Lin Q-H, Zheng Y-R, Yin F-L, Liang H, Calhoun VD. A fast algorithm for one-unit ICA-R. Information Sciences. 2007;177:1265–1275.

20. Lu W, Rajapakse JC. Constrained independent component analysis. In: Leen TK, Dietterich TG, Tresp V, editors. Advances in Neural Information Processing Systems. 13. volume 16. MIT Press; Cambridge, MA: 2000. pp. 570–576.

21. Lu W, Rajapakse JC. Approach and applications of constrained ICA. IEEE Trans Neural Netw. 2005;16(1):203–12. [PubMed]

22. Luenberger DG. Optimization by Vector Space Methods. John Wiley and Sons; 1969.

23. Luo J, Hu B, Ling X-T, Liu R-W. Principal independent component analysis. IEEE Transactions on Neural Networks. 1999;10(4):912–917. [PubMed]

24. McKeown MJ, Sejnowski TJ. Independent component analysis of fMRI data: Examining the assumptions. Human Brain Mapping. 1998;6(5–6):368–372. [PubMed]

25. Metz CE. Basic principle of ROC analysis. Semin Nucl Med. 1978;VIII(4):283–98. [PubMed]

26. Nocedal J, Wright S. Numerical Optimization. 2nd edition Springer; New York: 2006.

27. Oja E, Yuan Z. The FastICA Algorithm revisited: convergence analysis. IEEE Transactions on Neural Networks. 2006;17(6) [PubMed]

28. Press W, Teukolsky S, Vetterling W, Flannery B. Numerical Recipes in C++: the art of scientific computing. 2 edition Cambridge University Press; 2002.

29. Rockafellar RT. Lagrange multipliers and optimality. SIAM Rev. 1993;35:183–238.

30. Stone JV, Porrill J, Porter NR, Wilkinson ID. Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions. NeuroImage. 2002;15(2):407–421. [PubMed]

31. Wang Z. Proc of ISMRM. Stockholm, Sweden: 2010. A fixed-point iteration based constrained independent component analysis and its application in fMRI; p. 3496. ISMRM.

32. Wang Z, Aguirre GK, Rao H, Wang J, Fernández-Seara MA, Childress AR, Detre JA. Empirical optimization of ASL data analysis using an ASL data processing toolbox: ASLtbx. Magnetic Resonance Imaging. 2008;26(2):261–269. [PMC free article] [PubMed]

33. Wang Z, Childress AR, Wang J, Detre JA. Support vector machine learning-based fMRI data group analysis. NeuroImage. 2007;36(4):1139–1151. [PMC free article] [PubMed]

34. Wang Z, Lee Y, Fiori S, Leung C, Zhu Y. An improved sequential method for principal component analysis. Pattern Recognition Letters. 2003;24(9–10):1409–1415.

35. Wang Z, Mechanic-Hamilton D, Pluta J, Glynn S, Detre JA. Function lateralization via measuring coherence laterality. NeuroImage. 2009;47(1):281–288. [PMC free article] [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |