


Neuroimage. Author manuscript; available in PMC 2012 January 15.


Published online 2010 September 25. doi: 10.1016/j.neuroimage.2010.09.052

PMCID: PMC2997172

NIHMSID: NIHMS241256

Address for Correspondence: Srikanth Ryali, Ph.D. and Vinod Menon, Ph.D., Department of Psychiatry & Behavioral Sciences, 780 Welch Rd, Room 201, Stanford University School of Medicine, Stanford, CA 94305-5778, Email: sryali@stanford.edu and menon@stanford.edu, Tel: 1-650-736-0128, Fax: 1-650-736-7200



Analysis of dynamical interactions between distributed brain areas is of fundamental importance for understanding cognitive information processing. However, estimating dynamic causal interactions between brain regions using functional magnetic resonance imaging (fMRI) poses several unique challenges. For one, fMRI measures Blood Oxygenation Level Dependent (BOLD) signals, rather than the underlying latent neuronal activity. Second, regional variations in the hemodynamic response function (HRF) can significantly influence estimation of causal interactions between brain regions. Third, causal interactions between brain regions can change with experimental context over time. To overcome these problems, we developed a novel state-space Multivariate Dynamical Systems (MDS) model to estimate intrinsic and experimentally-induced modulatory causal interactions between multiple brain regions. A probabilistic graphical framework is then used to estimate the parameters of MDS as applied to fMRI data. We show that MDS accurately takes into account regional variations in the HRF and estimates dynamic causal interactions at the level of latent signals. We develop and compare two estimation procedures using maximum likelihood estimation (MLE) and variational Bayesian (VB) approaches for inferring model parameters. Using extensive computer simulations, we demonstrate that, compared to Granger causal analysis (GCA), MDS exhibits superior performance for a wide range of signal to noise ratios (SNRs), sample lengths and network sizes. Our simulations also suggest that GCA fails to uncover causal interactions when there is a conflict between the direction of intrinsic and modulatory influences. Furthermore, we show that MDS estimation using VB methods is more robust and performs significantly better at low SNRs and shorter time series than MDS with MLE. Our study suggests that VB estimation of MDS provides a robust method for estimating and interpreting causal network interactions in fMRI data.

Functional magnetic resonance imaging (fMRI) has emerged as a powerful tool for investigating human brain function and dysfunction. fMRI studies of brain function have primarily focused on identifying brain regions that are activated during performance of perceptual or cognitive tasks. There is growing consensus, however, that localization of activations provides a limited view of how the brain processes information and that it is important to understand functional interactions between brain regions that form part of a neurocognitive network involved in information processing (Bressler and Menon 2010; Friston, 2009c; Fuster, 2006). Furthermore, evidence is now accumulating that the key to understanding the functions of any specific brain region lies in disentangling how its connectivity differs from the pattern of connections of other functionally related brain areas (Passingham et al., 2002). A critical aspect of this effort is to better understand how causal interactions between specific brain areas and networks change dynamically with cognitive demands (Abler et al., 2006; Deshpande et al., 2008; Friston, 2009b; Goebel et al., 2003; Mechelli et al., 2003; Roebroeck et al., 2005; Sridharan et al., 2008). These and other related studies in the literature highlight the importance of dynamic causal interactions for understanding brain function at the systems level.

In recent years, several methods have been developed to estimate causal interactions in fMRI data (Deshpande et al., 2008; Friston et al., 2003; Goebel et al., 2003; Guo et al., 2008; Rajapakse and Zhou, 2007; Ramsey et al., 2009; Roebroeck et al., 2005; Seth, 2005; Smith et al., 2009; Valdes-Sosa et al., 2005). Of these, Granger causal analysis (GCA) (Roebroeck et al., 2005; Seth, 2005) and dynamic causal modeling (DCM) (Friston et al., 2003) are among the more commonly used approaches thus far. There is a growing debate about the relative merits and demerits of these approaches for estimating causal interactions using fMRI data (Friston, 2009a, b; Roebroeck et al., 2009). The main limitations of GCA highlighted by this debate are that: (1) GCA estimates causal interactions in the observed Blood-Oxygenation-Level-Dependent (BOLD) signals, rather than in the underlying neuronal responses; (2) GCA may not be able to accurately recover causal interactions because of regional variations in hemodynamic response; and (3) GCA does not take into account the experimentally induced modulatory effects while estimating causal interactions (Friston, 2009a, b). The main limitations of DCM highlighted in this debate are that: (1) DCM is a confirmatory method wherein several causal models are tested and the model with the highest evidence is chosen. This is problematic when the number of regions under investigation is large, since the number of models to be tested increases exponentially with the number of regions; (2) conventional DCM uses a deterministic model to describe the dynamics of the latent neuronal signals, which may not be adequate to capture the dynamics of the underlying neuronal processes (a stochastic version was recently proposed (Daunizeau et al., 2009)); and (3) the assumptions used by DCM for deconvolution of the hemodynamic response have not yet been adequately verified (Roebroeck et al., 2009).
Here, we develop a new method that incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.

We propose a novel multivariate dynamical systems (MDS) approach (Bishop, 2006) for modeling causal interactions in fMRI data. MDS is based on a state-space approach which can be used to overcome many of the aforementioned problems associated with estimating causal interactions in fMRI data. State-space models have been successfully used in engineering applications of control systems and machine learning (Bishop, 2006) but their use in neuroscience has been limited. Notable examples of state-space models include Hidden Markov models (HMM) which are widely used in speech recognition applications (Rabiner, 1989) and Kalman filters for object tracking (Koller and Friedman, 2009). Critically, state-space models can be represented as probabilistic graphical models (Koller and Friedman, 2009), which as we show below (Figure 1), greatly facilitate representation and inference for causal modeling of fMRI data.

Probabilistic graphical model for the multivariate dynamical system (MDS). All conditional interdependencies in MDS can be inferred from this model. The state variables **s**(*t*) are modeled as a linear dynamical system. The non-diagonal elements of matrices …

Critically, MDS estimates causal interactions in the underlying latent signals, rather than in the observed BOLD-fMRI signals. In order to estimate causal interactions from the observed fMRI data, it is important to take into account variations in the hemodynamic response function (HRF) across different brain regions (David et al., 2008). MDS is a state-space model in which a “state equation” is used to model the unobserved states of the system and an “observation equation” is used to model the observed data as a function of the latent state signals (Figure 1). The state equation is a vector autoregressive model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. The observation equation models the BOLD-fMRI signals as a linear convolution of the latent signals with basis functions spanning the space of HRF variations.

The latent signals and unknown parameters that characterize causal interactions between brain regions are estimated using two different approaches. In the first approach, we use expectation maximization (EM) to obtain maximum likelihood estimates (MLE) of the parameters, and test the statistical significance of the estimated causal relationships between brain regions using a nonparametric approach. We refer to this approach as MDS-MLE. In the second approach, we use a Variational Bayes (VB) framework to approximate the posterior distribution of the latent variables and parameters, which cannot be computed analytically. We refer to this approach as MDS-VB. By representing MDS as a probabilistic graphical network (Figure 1), we show that MDS-VB provides an elegant analytical solution for computing the posterior distributions and deriving causal connectivity estimates that are sparse and more readily interpretable.

We first describe our MDS model and discuss MLE and VB approaches for estimating intrinsic and modulatory causal interactions between multiple brain regions. We then evaluate the performance of MDS using extensive computer simulations, as a function of network size, number of fMRI time points and signal to noise ratio (SNR), and examine several metrics, including sensitivity, false positive rate and accuracy, in terms of correctly identifying both intrinsic and modulatory causal interactions. Finally, we contrast our results with those obtained with GCA.

In the following sections, we represent matrices by upper-case letters and scalars and vectors by lower-case letters. Random matrices are represented by boldface upper-case letters, whereas random vectors and scalars are represented by boldface lower-case letters.

Consider the following state-space model to represent the multivariate fMRI time series

$$\mathit{s}(t)=A\mathit{s}(t-1)+{\sum}_{j=1}^{J}{v}_{j}(t){C}_{j}\mathit{s}(t-1)+Du(t)+\mathit{w}(t)$$

(1)

$${\mathit{x}}_{m}(t)={[{\mathit{s}}_{m}(t)\;{\mathit{s}}_{m}(t-1)\;\dots \;{\mathit{s}}_{m}(t-L+1)]}^{\prime}$$

(2)

$${y}_{m}(t)={b}_{m}\mathrm{\Phi}{\mathit{x}}_{m}(t)+{\mathit{e}}_{m}(t)$$

(3)

In Equation (1), **s**(*t*) is an *M* × 1 vector of latent signals at time *t* in the *M* regions, *A* is an *M* × *M* matrix of intrinsic causal interactions, *J* is the number of modulatory inputs, *v_j*(*t*) is the *j*-th modulatory input, *C_j* is an *M* × *M* matrix of causal interactions induced by the *j*-th modulatory input, *u*(*t*) is the external stimulus, *D* is the matrix of weights from the external stimulus to the latent signals, and **w**(*t*) is zero-mean Gaussian state noise. The non-diagonal elements of *A* and *C_j* respectively represent the intrinsic and modulatory causal interactions between distinct regions.

We model the fMRI time series in region *“m”* as a linear convolution of the HRF and the latent signal *s_m*(*t*) in that region. In Equation (2), *x_m*(*t*) is an *L* × 1 vector containing the current and *L* − 1 past samples of the latent signal in region *m*, where *L* is the embedding dimension corresponding to the length of the HRF in samples.

In Equation (3), *y_m*(*t*) is the observed BOLD-fMRI signal in region *m* at time *t*, Φ is a matrix whose rows are basis functions spanning the space of HRF variations, *b_m* is the vector of coefficients weighting those basis functions in region *m*, and *e_m*(*t*) is zero-mean Gaussian observation noise.

Equations (1–3) together represent a state-space model for estimating the causal interactions in latent signals based on observed multivariate fMRI time series. This model can be seen both as a multivariate extension of univariate time series models (Makni et al., 2008; Penny et al., 2005), and as an extension of GCA in which a vector autoregressive model for latent, rather than BOLD-fMRI, signals is used to model the causal interactions among brain regions. Furthermore, our MDS model also takes into account variations in the HRF, as well as the influences of modulatory and external stimuli, in estimating causal interactions between the brain regions.
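To make the generative model concrete, the following sketch simulates equations (1)–(3) directly for a small two-node system. Everything here is illustrative: the connection weights echo the 2-node example described later, but the inputs, noise levels and the Gaussian stand-in for the HRF basis are our own assumptions, not the simulation settings used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T, L = 2, 200, 16          # regions, time points, HRF length (assumed sizes)

# Intrinsic (A) and modulatory (C) connection matrices (illustrative values)
A = np.array([[0.7, 0.0],
              [-0.3, 0.7]])   # A[1,0]: intrinsic influence of node 1 on node 2
C = np.array([[0.0, 0.0],
              [0.5, 0.0]])    # modulatory influence, opposite sign to A[1,0]
D = np.array([[1.0], [0.0]])  # external input drives node 1 only

u = (rng.random(T) < 0.1).astype(float).reshape(1, T)  # sparse event input
v = (np.arange(T) % 100 < 50).astype(float)            # boxcar modulatory input

# State equation (1): s(t) = A s(t-1) + v(t) C s(t-1) + D u(t) + w(t)
s = np.zeros((M, T))
for t in range(1, T):
    s[:, t] = (A + v[t] * C) @ s[:, t - 1] + D @ u[:, t] \
              + 0.1 * rng.standard_normal(M)

# Observation equation (3): y_m(t) = b_m' Phi x_m(t) + e_m(t).
# Phi rows should be the canonical HRF and its temporal derivative;
# a Gaussian bump is used here purely as a toy stand-in.
tax = np.arange(L)
hrf = np.exp(-0.5 * ((tax - 5) / 2.0) ** 2)
Phi = np.vstack([hrf, np.gradient(hrf)])
b = np.array([[1.0, 0.2], [1.0, -0.2]])  # per-region basis coefficients

y = np.zeros((M, T))
for m in range(M):
    kernel = b[m] @ Phi                  # region-specific HRF, b_m' Phi
    y[m] = np.convolve(s[m], kernel)[:T] + 0.05 * rng.standard_normal(T)
```

Because *x_m*(*t*) stacks the lagged latent samples, the product *b_m*′Φ*x_m*(*t*) is exactly a convolution of *s_m* with the region-specific kernel, which is how it is implemented above.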

Estimating causal interactions between the *M* regions specified in the model is equivalent to estimating the unknown parameters *A* and *C_j*, *j* = 1, 2, …, *J*, together with *D*, the HRF coefficients *b_m*, and the state and observation noise variances, given the observed fMRI time series.

Maximum likelihood estimates of the MDS model parameters and latent signals are obtained by maximizing the log-likelihood of the observed fMRI data. We use the EM algorithm to estimate the unknown parameters and latent variables of the model. The EM algorithm is an iterative method consisting of two steps: the E-step and the M-step. In the E-step, the posterior distribution of the latent variables is computed given the current estimates of the parameters. In the M-step, given the current posterior distribution of the latent variables, the parameters of the model are estimated by maximizing the conditional expectation of the log of the complete likelihood given the data. The E and M steps are repeated until convergence; the log-likelihood of the data is guaranteed to increase or remain the same at every iteration, and the algorithm asymptotically yields maximum likelihood estimates of the parameters. In the E-step, the posterior distributions are obtained using Kalman filtering and smoothing algorithms (Bishop, 2006). The detailed equations for the E and M steps are given in Appendix A. We refer to this solution as MDS-MLE.
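For readers unfamiliar with the E-step machinery, the following sketch implements a Kalman filter followed by a Rauch–Tung–Striebel smoother for a generic linear-Gaussian state-space model. It is a simplified stand-in for the full MDS E-step (the actual model also carries the modulatory terms and the convolutional observation equation); all variable names and test parameters are our own.

```python
import numpy as np

def kalman_smoother(y, A, H, Q, R, mu0, P0):
    """Forward Kalman filter + backward RTS smoother.
    Returns smoothed posterior means and covariances of the latent states."""
    T, n = y.shape[1], A.shape[0]
    mu_f = np.zeros((n, T)); P_f = np.zeros((T, n, n))
    for t in range(T):
        # predict
        mu_p = mu0 if t == 0 else A @ mu_f[:, t - 1]
        P_p = P0 if t == 0 else A @ P_f[t - 1] @ A.T + Q
        # update with observation y[:, t]
        K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
        mu_f[:, t] = mu_p + K @ (y[:, t] - H @ mu_p)
        P_f[t] = (np.eye(n) - K @ H) @ P_p
    mu_s, P_s = mu_f.copy(), P_f.copy()
    for t in range(T - 2, -1, -1):
        # backward smoothing recursion
        P_pred = A @ P_f[t] @ A.T + Q
        J = P_f[t] @ A.T @ np.linalg.inv(P_pred)
        mu_s[:, t] = mu_f[:, t] + J @ (mu_s[:, t + 1] - A @ mu_f[:, t])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_pred) @ J.T
    return mu_s, P_s

# toy 1-D example: AR(1) latent state observed in noise
rng = np.random.default_rng(1)
T = 100
x = np.zeros((1, T))
for t in range(1, T):
    x[:, t] = 0.9 * x[:, t - 1] + rng.normal(0.0, np.sqrt(0.1))
y = x + rng.normal(0.0, np.sqrt(0.5), size=(1, T))
mu_s, _ = kalman_smoother(y, np.array([[0.9]]), np.array([[1.0]]),
                          np.array([[0.1]]), np.array([[0.5]]),
                          np.zeros(1), np.eye(1))
```

In MDS-MLE, the M-step would then re-estimate the model parameters from the smoothed sufficient statistics; the exact update equations are given in Appendix A.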

The statistical significance of the intrinsic (*A*(*m*, *n*)) and modulatory (*C_j*(*m*, *n*)) interactions estimated by MDS-MLE is assessed using a nonparametric bootstrap procedure.

In this approach, we use a VB framework to obtain the posterior distributions of the unknown parameters and latent variables. Let **Θ** denote the set of unknown parameters of the model (**A**, **C**_1, …, **C**_J, **D**, the HRF coefficients and the noise parameters) and let **S** denote the latent signals. For any distribution *q*(**S**, **Θ**|Y) over the latent signals and parameters given the observed data Y, the log evidence can be decomposed as

$$logP(Y)=L(q)+KL(q||p)$$

(4)

where

$$L(q)=\int d\mathit{S}d\mathbf{\Theta}q(\mathbf{S},\mathbf{\Theta}\mid \text{Y})log\frac{\text{p}(\text{Y},\mathbf{S},\mathbf{\Theta})}{\text{q}(\mathbf{S},\mathbf{\Theta}\mid \text{Y})}$$

(5)

$$KL(q||p)=-\int d\mathit{S}d\mathbf{\Theta}q(\mathbf{S},\mathbf{\Theta}\mid \text{Y})log\frac{\text{p}(\mathbf{S},\mathbf{\Theta}\mid \text{Y})}{\text{q}(\mathbf{S},\mathbf{\Theta}\mid \text{Y})}$$

(6)

KL(q||p) is the Kullback-Leibler divergence between *q*(**S**, **Θ**|Y) and *p*(**S**, **Θ**|Y). KL(q||p) ≥ 0, with equality if and only if *q*(**S**, **Θ**|Y) = *p*(**S**, **Θ**|Y). Therefore, *L*(*q*) serves as a lower bound on the log of the evidence (log *P*(*Y*)). The maximum of this lower bound occurs when the KL divergence is zero, for which the optimal choice of *q*(**S**, **Θ**|Y) is *p*(**S**, **Θ**|Y). Since *p*(**S**, **Θ**|Y) is not tractable, certain assumptions on the form of *q*(**S**, **Θ**|Y) are made and the optimal distribution is then found by maximizing the lower bound *L*(*q*). In this work, we assume that the posterior distribution *q*(**S**, **Θ**|Y) factorizes over **S** and **Θ**, i.e.,

$$q(\mathbf{S},\mathbf{\Theta}\mid \text{Y})={\text{q}}_{\mathbf{s}}(\mathbf{S}\mid \text{Y}){\text{q}}_{\mathbf{\Theta}}(\mathbf{\Theta}\mid \text{Y})$$

(7)

We note that no further assumptions are made on the functional form of the distributions q_s(**S**|Y) and q_Θ(**Θ**|Y). Maximizing the lower bound *L*(*q*) with respect to each factor in turn, holding the other fixed, yields

$$log\phantom{\rule{0.16667em}{0ex}}{\text{q}}_{\mathbf{s}}(\mathbf{S}\mid \text{Y})\propto {\text{E}}_{\mathbf{\Theta}}(log\text{p}(\text{Y},\mathbf{S},\mathbf{\Theta}))$$

(8)

$$log\phantom{\rule{0.16667em}{0ex}}{\text{q}}_{\mathbf{\Theta}}(\mathbf{\Theta}\mid \text{Y})\propto {\text{E}}_{\mathbf{s}}(log\text{p}(\text{Y},\mathbf{S},\mathbf{\Theta}))$$

(9)

Equations (8) and (9) are respectively the VB-E and VB-M steps. The expectation in Equation (8) is computed with respect to q_Θ(**Θ**|Y), and the expectation in Equation (9) with respect to q_s(**S**|Y). The two steps are iterated until the lower bound *L*(*q*) converges.
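The alternating updates of Equations (8) and (9) can be made concrete with the textbook factorized (mean-field) approximation for a Gaussian with unknown mean and precision (Bishop, 2006). The sketch below is an analogy for the update scheme, not the MDS equations themselves; data, priors and iteration count are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=200)       # data with true mean 2, precision 1
N, xbar, sx2 = len(x), x.mean(), (x ** 2).sum()

# Broad (non-informative) conjugate priors:
# mu ~ N(mu0, (lam0 * tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_tau = 1.0
for _ in range(50):
    # update q(mu) using the current expectation of tau (analogue of Eq. 8)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N ** 2 + 1.0 / lam_N
    # update q(tau) using the current moments of mu (analogue of Eq. 9)
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (sx2 - 2 * E_mu * N * xbar + N * E_mu2
                      + lam0 * (E_mu2 - 2 * E_mu * mu0 + mu0 ** 2))
    E_tau = a_N / b_N
```

At convergence, `mu_N` approaches the sample mean and `E_tau` the reciprocal of the sample variance, illustrating how each factor is refined using expectations under the other.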

The Bayesian approach allows the specification of both informative and non-informative priors on the model parameters. The specification of these priors helps in regularizing the solution and avoids overfitting when the number of parameters to be estimated is large compared to the number of observations, as is generally the case when the number of brain regions to be modeled is large. Since we do not have a priori information on these parameters, we specify non-informative conjugate priors on them. Here, we briefly explain the notion of non-informative conjugate priors (Bishop, 2006). Let **z** be a Gaussian random variable with mean *μ* and precision *λ*. When a Gamma prior is placed on the precision *λ*, the posterior of *λ* is also a Gamma distribution; the Gamma family is therefore conjugate for the precision of a Gaussian, and choosing small values for its hyperparameters makes the prior broad and hence non-informative. Accordingly, we place zero-mean Gaussian priors on the connection weights; for the intrinsic connections,

$${\mathit{A}}_{i,j}\sim N\left(0,\frac{1}{{\lambda}_{i,j}}\right)$$

(10)

where *λ_{i,j}* is the prior precision for *A*(*i*, *j*), with a non-informative Gamma hyperprior on each *λ_{i,j}*. During inference, the precisions associated with irrelevant connections grow large and shrink the corresponding weights toward zero. This automatic relevance determination mechanism yields the sparse, more readily interpretable connectivity estimates of MDS-VB. Analogous priors are specified for the elements of *C_j* and *D*.
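To illustrate how per-weight precisions of the kind in Equation (10) induce sparsity, the following toy sketch runs MacKay-style automatic relevance determination fixed-point updates on a linear regression with one precision hyperparameter per weight. The design matrix, noise level and update rule are our illustrative choices, not the MDS-VB update equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 6
X = rng.standard_normal((n, d))
w_true = np.array([1.5, 0.0, -0.8, 0.0, 0.0, 0.0])   # only two relevant weights
y = X @ w_true + 0.1 * rng.standard_normal(n)

beta = 1.0 / 0.01              # known noise precision (assumed for simplicity)
lam = np.ones(d)               # one prior precision per weight, as in Eq. (10)
for _ in range(100):
    # Gaussian posterior over weights given the current precisions
    Sigma = np.linalg.inv(beta * X.T @ X + np.diag(lam))
    m = beta * Sigma @ X.T @ y
    # MacKay fixed-point update: precisions of irrelevant weights diverge
    gamma = 1.0 - lam * np.diag(Sigma)
    lam = np.clip(gamma / (m ** 2 + 1e-12), 1e-6, 1e6)
```

The precisions for the irrelevant weights grow without bound, driving their posterior means to zero, which is the same mechanism that prunes spurious connections in *A* and *C_j*.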

We assess the performance of MDS using a number of computer-simulated datasets generated at various SNRs (10, 5 and 0 dB), for different numbers of brain regions or nodes (*M* = 2, 3 and 5) and for different numbers of time samples (*T* = 200, 300 and 500).

Figure 3 shows the intrinsic and modulatory connectivity of three networks with 2, 3 and 5 nodes. For example, in the two-node network (Figure 3A), node 1 receives an external input and there is an intrinsic causal connection from node 1 to node 2 with a weight of −0.3 (*A*(2,1) = −0.3). A modulatory input induces a connection from node 1 to node 2 with a weight of 0.5 (*C*(2,1) = 0.5), whose sign is opposite to that of the intrinsic connection. Similarly, in the five-node structure (Figure 3C), node 1 receives the external input and has causal influences on nodes 2, 3 and 4 (matrix *A*). Nodes 4 and 5 have bidirectional influences. Modulatory inputs induce causal influences from node 1 to 2 and from node 3 to 2 (matrix *C*). Note that all three networks have intrinsic and modulatory connections from node 1 to 2 with weights −0.3 and 0.5 respectively. We simulated networks with these weights to explicitly test whether MDS could recover interactions that could be missed by GCA because of the opposing signs of the intrinsic and modulatory connection weights. We set the autocorrelations of the time series (diagonal elements of matrix *A*) to 0.7 (Ge et al., 2009; Roebroeck et al., 2005). Our simulations also modeled variations in HRFs across regions. Figure 4 shows the simulated HRFs at each of the network nodes. The HRFs were constructed in such a way that the direction of hemodynamic delays is confounded with the direction of latent signal delay, making the task of recovering network parameters more challenging. For example, in the 5-node network, node 1 drives nodes 2, 3 and 4 at the level of latent signals, but the HRF at node 1 peaks later than those at nodes 2, 3 and 4. These HRFs were simulated using a linear combination of the canonical HRF and its temporal derivative. Analogous to most fMRI data acquisition paradigms, we assume that the sampling interval, also referred to as repetition time (TR), is 2 seconds. This corresponds to an embedding dimension of *L* = 16, which is also the length of the HRF; note that the duration of the canonical HRF is approximately 32 seconds.
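Region-specific HRFs of this kind can be built from a canonical double-gamma HRF and its temporal derivative. In the sketch below, the double-gamma parameters and the derivative weights are standard illustrative values, not the exact ones used in our simulations.

```python
import numpy as np
from math import gamma as gfun

def canonical_hrf(t, p_delay=6.0, u_delay=16.0, ratio=1/6.0):
    """Double-gamma HRF (standard SPM-style parameter values assumed)."""
    g = lambda t, h: (t ** (h - 1) * np.exp(-t)) / gfun(h)
    return g(t, p_delay) - ratio * g(t, u_delay)

TR, L = 2.0, 16                  # 16 samples at TR = 2 s cover ~32 s
t = np.arange(L) * TR
h = canonical_hrf(t)
dh = np.gradient(h, TR)          # temporal-derivative basis function
Phi = np.vstack([h, dh])         # 2 x 16 basis matrix, as in the text

# A region-specific HRF is a linear combination of the two basis functions;
# varying the derivative weight shifts the peak earlier or later.
b_early = np.array([1.0,  0.5])
b_late  = np.array([1.0, -0.5])
hrf_early, hrf_late = b_early @ Phi, b_late @ Phi
```

Because h(*t*) + c·h′(*t*) ≈ h(*t* + c), a positive derivative weight shifts the HRF peak earlier in time, which is how delay confounds between nodes can be constructed.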

Simulated models with intrinsic and experimentally induced modulatory connections for (**A**) 2-node, (**B**) 3-node and (**C**) 5-node networks. Intrinsic connections are shown as solid lines and modulatory connections are shown as broken lines and highlighted with …

Figure 5 shows the experimental and modulatory inputs applied to the nodes shown in Figure 3. The external input was simulated to reflect an event-related fMRI design with stimuli occurring at random times, with the constraint that the time difference between two consecutive events is at least 2 TR (Figure 5A). This input could equally be a slow event-related or block design; in the MDS framework, there is no restriction on the nature of the experimental design. The modulatory input is assumed to be a boxcar function (Figure 5B). The modulatory inputs indicate the time periods during which the network configuration could change because of context-specific influences such as changes in attention, alertness and explicit experimental manipulations.

Onset and duration of event related experimental stimuli (**A**) and modulatory inputs (**B**) used in the simulations.

The simulated datasets were generated using the model described in equations (1)–(3). The latent noise covariance was fixed at *Q* = 0.1*I_M*, where *I_M* is the *M* × *M* identity matrix. The variance of the observation noise in region *m* was set according to the desired SNR as

$${\sigma}_{m}^{2}=\mathit{Var}({y}_{m}){10}^{-0.1\mathit{SNR}}$$

(11)
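Equation (11) maps a target SNR in decibels to an observation-noise variance; a direct implementation (the test signal below is an arbitrary stand-in):

```python
import numpy as np

def noise_variance(y_m, snr_db):
    """Observation-noise variance implied by Equation (11):
    sigma_m^2 = Var(y_m) * 10**(-0.1 * SNR)."""
    return np.var(y_m) * 10 ** (-0.1 * snr_db)

y_m = np.sin(np.linspace(0, 10, 500))    # stand-in noiseless BOLD signal
sigmas = {snr: noise_variance(y_m, snr) for snr in (10, 5, 0)}
```

At 0 dB the noise variance equals the signal variance; each 10 dB increase in SNR reduces it by a factor of ten.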

We assume that the canonical HRF and its temporal derivative span the space of HRFs. Therefore, they constitute the rows of Φ, which is a 2×16 matrix. The coefficients of the matrices *A* and *C* for each network structure are shown in Figure 3.

We generated 25 datasets for each *SNR,* network structure and time samples. The performance of the method was assessed using the performance metrics described in the next section.

To examine the performance of the methods when there are no modulatory or external stimulus effects, we simulated 25 datasets for a 3-node network shown in Figure 3(B) at 5 dB SNR. We set the weights to the same values as in the previous case except that there are no causal interactions from modulatory inputs. The weights corresponding to external stimuli were all set to zero. The diagonal elements (autocorrelations) in matrix A were set to 0.8, 0.7 and 0.6 respectively. These datasets were created to provide a more appropriate, albeit less general, comparison of MDS with GCA.

Interactions between brain regions occur at finer time scales than the sampling intervals of fMRI signals: fMRI signals are typically sampled at TR = 1 or 2 seconds, while neuronal processes occur at millisecond resolution. To investigate the effects of fMRI down-sampling on MDS performance, we adopted the approach described by Deshpande and colleagues (Deshpande et al., 2009). We generated datasets with a 1 millisecond sampling interval at 0 dB SNR for the 2-node network shown in Figure 3A. We obtained neuronal signals in node 1 and node 2 with a delay of *d_n* milliseconds between them. In this case, node 1 drives node 2 under both intrinsic and modulatory conditions with the weights shown in Figure 3A. The autocorrelations in nodes 1 and 2 were set to 0.8 and 0.7 respectively. We then convolved the neuronal signal at node 1 with a canonical HRF generated at the same 1 KHz sampling rate, and re-sampled the result to a sampling interval of TR = 2 seconds to obtain the fMRI signal. In node 2, we convolved the neuronal signal with an HRF that was delayed with respect to the HRF at node 1, and again re-sampled to TR = 2 seconds.
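The down-sampling procedure can be sketched as follows. The smoothed-noise stand-in for the neuronal signals, the single-gamma HRF shape and the particular 400 ms and 500 ms delays are illustrative placeholders for the settings described above.

```python
import numpy as np

def fftconv(a, k):
    """Linear convolution via FFT (fast enough for millisecond grids)."""
    n = len(a) + len(k) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(k, n), n)

rng = np.random.default_rng(0)
dt, TR, n_vol = 0.001, 2.0, 200          # 1 ms grid, TR = 2 s, 200 volumes
T_ms = int(n_vol * TR / dt)

# "neuronal" signals: node 2 lags node 1 by d_n milliseconds
d_n = int(0.4 / dt)                       # 400 ms latent delay (illustrative)
s1 = np.convolve(rng.standard_normal(T_ms), np.ones(50) / 50, mode="same")
s2 = np.zeros(T_ms); s2[d_n:] = s1[:-d_n]

# gamma-shaped canonical HRF on the same millisecond grid (~32 s support)
t = np.arange(0, 32, dt)
hrf = (t / 5.4) ** 6 * np.exp(-(t - 5.4) / 0.9)
d_h = int(0.5 / dt)                       # 500 ms HRF delay at node 2
y1 = fftconv(s1, hrf)[:T_ms]
y2 = fftconv(s2, np.concatenate([np.zeros(d_h), hrf]))[:T_ms]

# re-sample to the fMRI acquisition grid (every TR/dt-th sample)
step = int(TR / dt)
y1_fmri, y2_fmri = y1[::step], y2[::step]
```

The causal estimation is then run on `y1_fmri` and `y2_fmri`, i.e., on signals in which the millisecond-scale latent delay has been both blurred by the HRF and decimated by the TR.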

The performance of MDS in discovering the intrinsic and modulatory causal interactions in simulated datasets was assessed using various performance metrics such as sensitivity, false positive rate and accuracy in correctly identifying causal intrinsic and modulatory interactions, where:

$$\mathit{sensitivity}=\frac{TP}{TP+FN}$$

(12)

$$\mathit{false}\phantom{\rule{0.16667em}{0ex}}\mathit{positive}\phantom{\rule{0.16667em}{0ex}}\mathit{rate}=\frac{FP}{TN+FP}$$

(13)

$$\mathit{accuracy}=\frac{TP+TN}{TP+FP+FN+TN}$$

(14)

where TP is the number of true positives, TN is the number of true negatives, FN is the number of false negatives and FP is the number of false positives. These performance metrics are computed for each of the 25 datasets and then averaged to obtain the overall performance.
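These three metrics can be computed directly from binarized true and estimated connection matrices; in the sketch below, excluding the diagonal self-connections from the counts is our assumption.

```python
import numpy as np

def performance(est, true):
    """Sensitivity (Eq. 12), false positive rate (Eq. 13) and accuracy (Eq. 14)
    for binarized connection matrices; diagonal (self) entries are excluded."""
    off = ~np.eye(true.shape[0], dtype=bool)
    e, t = est[off].astype(bool), true[off].astype(bool)
    TP = np.sum(e & t); TN = np.sum(~e & ~t)
    FP = np.sum(e & ~t); FN = np.sum(~e & t)
    sens = TP / (TP + FN) if (TP + FN) else float("nan")
    fpr = FP / (TN + FP) if (TN + FP) else float("nan")
    acc = (TP + TN) / (TP + TN + FP + FN)
    return sens, fpr, acc

true_A = np.array([[0, 0, 0],
                   [1, 0, 0],
                   [1, 0, 0]])       # node 1 drives nodes 2 and 3
est_A = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 0, 0]])        # one hit, one miss, one false alarm
sens, fpr, acc = performance(est_A, true_A)
```

For this example there is one true positive, one false negative, one false positive and three true negatives among the six off-diagonal entries, giving a sensitivity of 0.5, a false positive rate of 0.25 and an accuracy of 4/6.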

We first illustrate the performance of MDS-MLE and MDS-VB by computing the estimated intrinsic and modulatory connections and the deconvolved (or estimated) latent signals for the five-node network (Figure 3C) simulated at 5 dB SNR. The MDS approach, using either MLE or VB, simultaneously estimates both the latent signals and the unknown parameters of the model, using the E and M steps respectively. The left and right panels of Figure 6 respectively show the actual and estimated latent signals, and the actual and estimated BOLD signals, at the five nodes in the network using MDS-MLE and MDS-VB. The estimated BOLD signal $\widehat{{y}_{m}}$ at the *m*-th node was computed as follows

$$\widehat{{y}_{m}}(t)={\widehat{{b}_{m}}}^{\prime}\mathrm{\Phi}\widehat{{x}_{m}}(t)$$

(15)

where $\widehat{{b}_{m}}$ are the estimated coefficients (using MLE or VB) corresponding to the basis functions spanning the subspace of HRFs and $\widehat{{x}_{m}}(t)$ is the estimated latent signal vector (using MLE or VB) at the *m*-th node. As shown in this figure, both MDS-MLE and MDS-VB were able to recover the latent and BOLD signals at this SNR. Table 1 shows the mean square error (MSE) between the estimated and actual latent signals, and between the estimated and actual BOLD-fMRI responses, in each node for the two methods. The MSE in estimating these signals is very low for both methods. Figures 7A and 7B respectively show the intrinsic and modulatory causal interactions estimated by MDS-MLE and MDS-VB in the simulated five-node network. MDS-VB correctly identified both intrinsic (solid lines) and modulatory (dotted lines) connections in this network, as shown in Figure 7B. MDS-MLE also correctly recovered both the intrinsic and modulatory networks, but introduced an additional false modulatory connection from node 3 to node 1, as shown in Figure 7A.

(**A**) Intrinsic and modulatory connections estimated by MDS using maximum likelihood estimates (MDS-MLE) and (**B**) Variational Bayes estimates (MDS-VB). (**C**) Causal interactions estimated by Granger Causal Analysis (GCA). MDS-VB correctly identified …

We next compare the performance of MDS with that of GCA using the same simulated data. This analysis was performed using the multivariate GCA toolbox developed by Seth (Seth, 2010). We applied GCA to the same dataset to verify whether it can recover the causal connections (either intrinsic or modulatory). As shown in Figure 7C, GCA missed both the intrinsic and modulatory interactions from node 1 to 2, although it was able to recover modulatory interactions from node 3 to 4 in addition to other connections. However, unlike MDS, GCA cannot distinguish between intrinsic and modulatory interactions. GCA missed the connection from node 1 to 2 because these nodes have both intrinsic and modulatory interactions with opposing actions; since GCA does not model these interactions separately, the net connection strength between the nodes is not significant. MDS, on the other hand, models these interactions explicitly and is therefore able to recover both types of connections. This example demonstrates that GCA cannot recover all the connections under these conditions, while both MDS methods recover all the connections and, at the same time, differentiate between the two types of interactions.

We evaluated the performance of MDS-MLE and MDS-VB on simulated datasets by computing sensitivity, false positive rate and accuracy in finding intrinsic and modulatory causal interactions as a function of SNR, network size and the number of time samples. Figures 8, 9 and 10 respectively show the performance of MDS-MLE and MDS-VB for time samples *T* = 500, 300 and 200. For each *T* and network size, the performance of MDS-MLE and MDS-VB is evaluated for SNR = 0, 5 and 10 dB. In each of these figures, panels A, B and C show the performance of MDS-MLE and MDS-VB with respect to sensitivity, false positive rate and accuracy in identifying the intrinsic and modulatory interactions for 2, 3 and 5 node networks respectively. The performance of both methods improved with increasing SNR and number of time samples (*T*).

Between the two MDS methods, MDS-VB showed superior performance compared to MDS-MLE across all SNRs, time samples and network sizes, and significantly better performance at low SNR and for shorter time series. For example, for SNR = 0 dB, *T* = 200 time points and 5 nodes (Figure 10C), the sensitivity of MDS-VB in recovering intrinsic and modulatory interactions was about 0.75 and 0.6 respectively, while MDS-MLE had sensitivities of only about 0.3 and 0.5 respectively. The accuracy of MDS-VB is also high (>0.8) under all conditions (panel C in Figures 8, 9 and 10) because its sensitivities are high and its false positive rates are very low (panels A and B in Figures 8, 9 and 10). More generally, in cases with high noise, shorter sample length and larger network size, MDS-VB consistently outperforms MDS-MLE.

Finally, we compared the performance of GCA with the MDS methods on the same datasets and performance metrics. Figures 8, 9 and 10 respectively show the comparative performance of GCA, MDS-VB and MDS-MLE for *T* = 500, 300 and 200 at various SNRs and network sizes. The results suggest that the performance of GCA is poor compared to both MDS methods with respect to sensitivity and accuracy in identifying causal interactions between brain nodes. Since GCA cannot distinguish between intrinsic and modulatory interactions, we computed its performance metrics by considering both connection types together. The performance of GCA declined with decreasing SNR for networks of size 3 and 5 (Figures 8–10). The performance of GCA is worse even for the 2-node network, as shown in Figures 8A, 9A and 10A: the sensitivity of GCA for this network is less than 10% because the intrinsic and modulatory interactions have weights of opposite sign. Since GCA does not model these interactions explicitly, it does not detect the interactions between the two nodes. Both MDS methods, on the other hand, showed better sensitivity and accuracy in identifying both types of interactions at all SNRs for this network.

Table 2 shows the relative performance of MDS-VB, MDS-MLE and GCA on 25 datasets simulated for a 3-node network at 5 dB SNR without any modulatory effects or external stimuli. The performance of GCA improved markedly: it recovered the causal network with a sensitivity of 0.9, an FPR of 0 and an accuracy of 0.98. In this case, the performance of GCA is comparable to that of MDS, suggesting that in the absence of modulatory effects and external stimuli GCA can perform as well as MDS, even in the presence of HRF variations.

We examined the performance of MDS-VB on simulated data in which latent signals were generated at various delays using a sampling interval of 1 millisecond and convolved with HRFs at various delays. Causal interactions were then estimated from the observed time series obtained at a sampling interval of 2 seconds. We examined the performance of MDS-VB under four different cases:

In this case, there are no causal interactions between the two nodes with respect to the latent signals, but the observed fMRI time series are delayed with respect to each other because of delays in their respective HRFs. MDS-VB was accurate in that it did not recover any causal interactions (either intrinsic or modulatory) despite the variations in HRF.

In this case, HRFs in both nodes are identical. As shown in Figure S1(A), the sensitivity of MDS-VB in recovering both intrinsic and modulatory interactions is above 0.9 (left panel) with FPRs below 0.1 (middle panel) and therefore with accuracies of above 0.9 (right panel) at all the latent signal delays. The performance of MDS-VB improved with the increase in latent signal delays.

In this scenario, HRF delays do not confound the causal interactions between the nodes at the latent signal level. For HRF delays of 500 and 2500 milliseconds, the performance metrics shown in Figures S1(B) and S1(C) suggest that MDS-VB is able to recover both intrinsic and modulatory causal interactions reliably. For the HRF delay of 2500 milliseconds, there is a small drop in sensitivity at latent signal delays of 800 and 1000 milliseconds.

This is the most difficult situation for any method because HRF delays confound the causal interactions at the latent signal level. Figure S2(B) shows the performance of MDS-VB when the HRF in node 2 peaks 500 milliseconds before the HRF in node 1, while node 1 drives node 2 at the latent signal level. The performance of MDS-VB improved with increasing latent signal delay. Figure S2(C) shows the performance of MDS-VB for an HRF delay of 2500 milliseconds. Although the sensitivities for latent signal delays from 200 to 600 milliseconds are higher (left panel), they are accompanied by more false positives (middle panel) and therefore poorer accuracies (right panel). The performance of MDS-VB in recovering causal interactions improved at latent signal delays of 800 and 1000 milliseconds.

We have developed a novel dynamical systems method to model intrinsic and modulatory interactions in fMRI data. MDS uses a vector autoregressive state-space model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context dependent influences. Our proposed MDS method overcomes key limitations of commonly used methods for estimating the causal relations using fMRI data.

Critically, causal interactions in MDS are modeled at the level of latent signals, rather than at the level of the observed BOLD-fMRI signals. Our simulations clearly demonstrate that this eliminates the confounding effects of regional variability in the HRF. The parameters and latent variables of the state-space model were estimated using two different methods. In the MDS-MLE method, the statistical significance of the parameters of the state equation, which represent the causal interactions between multiple brain nodes, was tested using a bootstrap method. In the MDS-VB method, we used non-informative priors to facilitate automatic relevance detection. We first discuss findings from our simulations and show that MDS-VB provides robust and accurate solutions even at low SNRs and with a small number of observed samples (time points). We then contrast the performance of MDS with the widely used GCA method. In this context, we highlight instances where GCA works reasonably well and where it fails. Finally, we discuss several important conceptual issues concerning the investigation of dynamic causal interactions in fMRI, contrasting MDS with other recently developed methods.

In the following sections, we evaluate and discuss the performance of MDS under various scenarios. Importantly, we demonstrate, for the first time, that VB approaches provide better estimates of the model parameters than MLE based approaches. We investigated the performance of MDS-MLE and MDS-VB on simulated datasets generated at SNRs of 0, 5 and 10 dB, for network structures of sizes 2, 3 and 5, and for 200, 300 and 500 time samples. We simulated regional HRF variations such that the hemodynamic response delays ran in the direction opposite to the delays in the latent signals (Figure 4). HRF delays could therefore confound the estimation of causal interactions when methods are applied directly to the observed BOLD-fMRI signals. This makes the problem of estimating causal interactions particularly challenging and provides novel insights into the strengths and weaknesses of the multiple approaches used here.

The performance of MDS was found to be robust when tested under various simulated conditions. Specifically, MDS was able to reliably recover both intrinsic and modulatory causal interactions from the simulated datasets, and its performance was found to be superior to conventional approaches such as GCA. Among the MDS methods, MDS-VB was found to be superior to MDS-MLE with respect to performance metrics such as sensitivity, false positive rate and accuracy in identifying intrinsic and modulatory causal interactions (Figures 7–10). MDS-VB showed significantly improved performance over MDS-MLE, especially under adverse conditions such as low SNR, large network size and few observed samples (Figure 10C).

The superior performance of MDS-VB can be attributed to the regularization imposed by the priors in this method. Our priors not only regularized the solution but also helped in achieving sparse solutions. By using sparsity promoting priors, the weights corresponding to insignificant links are driven towards zero, thereby enabling automatic relevance detection (Tipping, 2001). This approach is useful not only for regularizing solutions when the number of unknown parameters is high, but also for providing sparse and interpretable solutions. This feature of VB can be especially important in analyzing networks with a large number of nodes, an aspect often overlooked in most analyses of causality in complex networks.
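The shrinkage mechanism behind automatic relevance detection can be illustrated with a small sketch. This is not the MDS-VB update itself: it applies Tipping-style per-weight precision updates to an ordinary linear regression, with an assumed fixed noise precision `beta` and point (MacKay-style) rather than fully Bayesian hyperparameter estimates.

```python
import numpy as np

def ard_regression(X, y, iters=50, beta=10.0):
    """Sparse linear regression with one precision alpha_i per weight (ARD).

    Illustrative sketch: the noise precision beta is held fixed and the
    hyperparameters are updated by MacKay-style point estimates."""
    n, p = X.shape
    alpha = np.ones(p)                                       # per-weight precisions
    for _ in range(iters):
        S = np.linalg.inv(beta * X.T @ X + np.diag(alpha))   # posterior covariance
        m = beta * S @ X.T @ y                               # posterior mean
        gamma = 1.0 - alpha * np.diag(S)                     # effective d.o.f. per weight
        alpha = np.clip(gamma / (m ** 2 + 1e-12), 1e-6, 1e6)
    return m, alpha
```

For a design with a truly irrelevant predictor, the corresponding `alpha_i` is driven to its upper bound and the posterior mean of that weight collapses towards zero, while relevant weights remain essentially unshrunk; this is the behavior that makes insignificant connections drop out of the estimated network.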

Another advantage of Bayesian analysis lies in computing the statistical significance of the network connections estimated by MDS methods. In the MLE approach, we need to resort to a bootstrap approach, which can be computationally expensive. MDS-VB, on the other hand, provides posterior probabilities of each model parameter, as opposed to the point estimates of MLE, which can be used to compute their statistical significance. From a computational perspective, MDS-VB is several orders of magnitude faster than MDS-MLE because it does not require nonparametric tests for statistical significance testing. Taken together, these findings suggest that MDS-VB is a superior and more powerful method than MDS-MLE.

We demonstrated the importance of modeling the influence of both external and modulatory stimuli for estimating causal networks by applying GCA to a five-node network. On this dataset, GCA failed to detect both the modulatory and intrinsic connections between nodes 1 and 2 (Figure 7C). As mentioned earlier, GCA missed this connection because the network has both intrinsic and modulatory connections between these nodes but with weights of opposite signs. Therefore, in GCA the net strength of this connection is very small and it did not survive a conservative test of statistical significance. Our MDS methods, on the other hand, uncovered both these connections. This phenomenon is most obvious in our simulations of the 2-node network, wherein GCA could not find causal interactions between the nodes (Figures 8A, 9A and 10A). In this network also, the intrinsic and modulatory connections have weights with opposite signs. These results demonstrate the importance of explicitly modeling the influence of external and modulatory stimuli. Overall, the performance of GCA, when applied to datasets generated at various SNRs, network sizes and numbers of observations, was found to be inferior to that of both MDS methods. This was true with respect to both sensitivity and accuracy in identifying causal interactions between multiple brain nodes (Figures 8, 9 and 10). When compared to MDS, the performance of GCA drops significantly at lower SNRs. These results suggest that MDS is more robust against observation noise than GCA. Therefore, our simulations suggest that MDS-VB outperforms GCA for networks consisting of fewer than 6 nodes. More extensive simulations, however, are needed to compare the performance of MDS with GCA for larger networks.
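The cancellation effect can be reproduced with a toy two-node sketch (the weights and block design below are illustrative choices, not the simulation parameters used in the paper): when the intrinsic weight `a` and the modulatory weight `c` have opposite signs, a lumped autoregressive fit that ignores the modulatory input sees a near-zero net coefficient, while fitting separately within each context recovers both effects.

```python
import random

random.seed(0)
T = 4000
a, c = 0.5, -1.0                        # intrinsic and modulatory weights, opposite signs
v = [(t // 200) % 2 for t in range(T)]  # blocked modulatory input (0/1)

x1 = [random.gauss(0, 1) for _ in range(T)]
x2 = [0.0] * T
for t in range(1, T):
    x2[t] = (a + c * v[t]) * x1[t - 1] + 0.1 * random.gauss(0, 1)

def ar_coeff(mask):
    """Least-squares coefficient of x2(t) on x1(t-1) over times where mask holds."""
    num = sum(x2[t] * x1[t - 1] for t in range(1, T) if mask(t))
    den = sum(x1[t - 1] ** 2 for t in range(1, T) if mask(t))
    return num / den

beta_lumped = ar_coeff(lambda t: True)     # ~ a + c/2 ~ 0: the link "disappears"
a_hat = ar_coeff(lambda t: v[t] == 0)      # ~ a
ac_hat = ar_coeff(lambda t: v[t] == 1)     # ~ a + c
```

A method that conditions on the modulatory input, as MDS does, recovers `a` and `a + c` separately; a lumped fit averages them away.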

Conventional GCA methods do not take into account dynamic changes in modulatory inputs and their effect on context dependent causal interactions. In order to compare GCA more directly with MDS, we examined causal interactions in the absence of modulatory influences. As expected, in this case, the performance of GCA was comparable to that of MDS. Together, these findings suggest that GCA can accurately recover causal interactions in the absence of modulatory effects. Although newer dynamic GCA methods have been proposed, they appear to be designed more for improving estimates of causal interactions than for examining context dependent dynamic changes in causal interactions (Havlicek et al., 2010; Hemmelmann et al., 2009; Hesse et al., 2003; Sato et al., 2006). Further simulation studies are needed to assess how well dynamic GCA can estimate context specific modulatory effects.

We next contrast our findings using GCA and MDS in the context of the equivalence of ARIMA and structural time series models. GCA is based on the autoregressive integrated moving average (ARIMA) models proposed by Box and Jenkins, whereas MDS is a structural time series model (Box et al., 1994). In the econometrics literature, it is well known that linear structural time series models have equivalent ARIMA model representations (Box et al., 1994). This equivalence has been under-appreciated in the neuroimaging literature, as demonstrated by the recent discussion regarding the relative merits of GCA and DCM (Friston, 2009a, b; Roebroeck et al., 2009). Our detailed simulations suggest that, under certain conditions, GCA is able to recover much of the causal network structure in spite of the presence of HRF delay confounds. This is most clearly illustrated by the simulations shown in Figure 7C, where we found that GCA could recover the network structure except for the intrinsic/modulatory connection from node 1 to 2. Our simulations also suggest that GCA may not be able to uncover causal connections when there is a conflict between intrinsic and modulatory connections (Figures 7C, 8, 9 and 10), but in other cases it is able to recover the underlying networks. In estimating the causal interactions, the estimated model order for GCA using the Akaike information criterion (AIC) was more than 3. Note that in our simulations, the causal interactions at the latent signal level were generated using a VAR model of order 1 (Equation 1). It is plausible that in GCA the higher model order compensates for variations in HRF delay and for experimental effects such as context specific modulatory connections (Deshpande et al., 2009). Our simulations suggest that this is indeed the case and that optimal model order selection in GCA results in improved estimation of causal interactions between nodes.
Nevertheless, structural time series based models like MDS and DCM can provide better interpretation of network structure since they can distinguish between intrinsic and context specific modulatory causal interactions in latent signals.

In most fMRI studies, data are typically acquired with a sampling interval of about 2 seconds (TR = 2 seconds). However, dynamical interactions between brain regions occur at faster time scales of 10–100 milliseconds. To examine the effects of downsampling fMRI data on the performance of MDS, we first simulated interactions between nodes at a sampling rate of 1 kHz and then re-sampled the time series to 0.5 Hz after convolving with region-specific HRFs. MDS-VB was then applied to these datasets to estimate causal interactions between nodes. We also examined the influence of HRF delays on the estimation of causal interactions under four scenarios (Figures S1 and S2), similar to the strategy used by Deshpande and colleagues (Deshpande et al., 2009) to study the effect of HRF variability on GCA. In the first scenario, there were no causal interactions between nodes but the HRFs were delayed between the nodes. In this case, MDS-VB performed accurately and did not infer any false causal interactions. This shows that MDS-VB can model and remove the effects of HRF variation while estimating causal interactions at the latent signal level. In the second scenario, we introduced causal interactions between nodes, but without HRF variations. MDS reliably estimated causal interactions for various delays in the latent signals (Figure S1A). In the third scenario, we introduced causal interactions between nodes and also varied HRFs such that delays in the latent signals and HRFs were in the same direction. In this case, MDS was able to recover both intrinsic and modulatory causal interactions accurately (Figures S1B, S1C). In the fourth scenario, when delays in the latent signals and HRFs opposed each other, performance dropped significantly, just as with GCA (Deshpande et al., 2009). Further research is needed to examine whether causal interactions under this scenario are inherently unresolvable by MDS and other techniques such as DCM.
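The simulation pipeline described above can be sketched as follows. This is a simplified stand-in, not the paper's actual code: it uses a single-gamma HRF, a 100 Hz simulation rate instead of 1 kHz (to keep pure-Python convolution cheap), and arbitrary coupling values; the key steps — generate delayed latent signals, convolve each node with a region-specific HRF, then downsample to TR = 2 s — are the same.

```python
import math
import random

fs = 100.0                 # simulation rate, Hz (the paper used 1 kHz)
dt = 1.0 / fs
n = int(30.0 * fs)         # 30 s of latent signal

def hrf(peak_s, dur_s=25.0):
    """Single-gamma HRF peaking at `peak_s` seconds (simplified stand-in)."""
    k = peak_s * fs
    return [(t / k) ** 5 * math.exp(5 * (1 - t / k)) for t in range(int(dur_s * fs))]

# latent signals: node 1 drives node 2 with a 50 ms delay
random.seed(1)
s1 = [random.gauss(0, 1) for _ in range(n)]
s2 = [0.0] * n
lag = int(0.05 * fs)
for t in range(lag, n):
    s2[t] = 0.9 * s1[t - lag] + 0.1 * random.gauss(0, 1)

def bold(s, h, tr=2.0):
    """Convolve a latent signal with an HRF and sample every TR seconds."""
    out = []
    for t0 in range(0, n, int(tr * fs)):
        out.append(dt * sum(s[t0 - i] * h[i] for i in range(min(len(h), t0 + 1))))
    return out

# region-specific HRFs: node 2 peaks EARLIER, opposing the latent delay (scenario 4)
y1 = bold(s1, hrf(6.0))
y2 = bold(s2, hrf(4.0))
```

The resulting `y1` and `y2` are the slow, HRF-confounded observations on which a causal method must operate; the millisecond-scale asymmetry between `s1` and `s2` survives only implicitly.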

As noted above, like GCA, MDS can be used to estimate causal interactions between a large number of brain nodes. Unlike GCA, however, causal interactions are estimated on the underlying latent signals while simultaneously accounting for regional variations in the HRF. Furthermore, unlike GCA, MDS can differentiate between intrinsic and stimulus-induced modulatory interactions. Like DCM, MDS takes into account regional variations in the HRF while estimating the causal interactions between brain regions. And like DCM, MDS explicitly models external and modulatory inputs, allowing us to simultaneously estimate intrinsic and modulatory causal interactions between brain regions. Unlike DCM, however, MDS does not require the investigator to test multiple models and choose the one with the highest model evidence. This overcomes an important limitation of DCM: as the number of brain regions of interest increases, an exponentially large number of models needs to be examined, and the computational burden of evaluating these models and identifying the appropriate one can become prohibitively high. MDS overcomes such problems and, as our study illustrates, incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.

Both DCM and MDS are state-space models, but DCM uses a deterministic state model (although a stochastic version has recently been developed (Daunizeau et al., 2009)), whereas MDS employs a stochastic model. Modeling latent interactions as a stochastic process is important for taking into account intrinsic variations in latent signals that are not induced by experimental stimuli. Another important difference is that MDS uses empirical basis functions to model variations in the HRF, whereas DCM uses a biophysical Balloon model (Friston et al., 2003). Since the Balloon model is nonlinear, several approximations are required to solve it. In contrast, empirical HRF basis functions allow MDS to use a linear dynamical systems framework. The relative accuracy of these approaches is currently not known.
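The idea of empirical HRF basis functions can be made concrete with a small sketch. Here two fixed single-gamma basis functions (an illustrative basis, not the paper's actual choice) are linearly combined, and the weights are fit by least squares; a region whose HRF peaks between the two basis peaks is captured simply by re-weighting, which is what keeps the observation model — and hence the whole system — linear in the unknown weights.

```python
import math

L = 16  # HRF support in seconds, sampled at 1 s

def gamma_hrf(peak):
    """Single-gamma HRF peaking at `peak` seconds (illustrative shape)."""
    return [(t / peak) ** 5 * math.exp(5 * (1 - t / peak)) for t in range(1, L + 1)]

phi = [gamma_hrf(3.0), gamma_hrf(5.0)]   # two fixed basis functions
target = gamma_hrf(4.0)                  # a region-specific HRF to represent

# least-squares weights b so that target ~ b[0]*phi[0] + b[1]*phi[1]
# (2x2 normal equations solved in closed form)
a11 = sum(x * x for x in phi[0])
a12 = sum(x * y for x, y in zip(phi[0], phi[1]))
a22 = sum(x * x for x in phi[1])
c1 = sum(x * y for x, y in zip(phi[0], target))
c2 = sum(x * y for x, y in zip(phi[1], target))
det = a11 * a22 - a12 * a12
b = ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)

fit = [b[0] * x + b[1] * y for x, y in zip(phi[0], phi[1])]
rel_err = sum((f - t) ** 2 for f, t in zip(fit, target)) / sum(t ** 2 for t in target)
```

No nonlinear biophysical model needs to be inverted: regional HRF variability is absorbed into the linearly estimated weight vector.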

One important advantage of MDS is that, unlike methods based on vector autoregressive modeling such as GCA, it does not assume that the fMRI time series is stationary. This is important because the dynamics of the latent signals can be altered significantly by experimental stimuli, leading to highly non-stationary signals. In GCA, the time series is tested for stationarity either by examining its autocorrelation or by investigating the presence of unit roots. If the time series is found to be nonstationary, then one commonly used approach to remove the non-stationarity is to replace the original time series with the difference of the current and previous time points (Seth, 2010). A problem with this manipulation is that it acts as a high-pass filter that can significantly distort the estimated causal interactions (Bressler and Seth, 2010).
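The high-pass character of first differencing is easy to verify: the filter y(t) − y(t − 1) has transfer function H(f) = 1 − e^(−i2πf), so its gain is |H(f)| = 2|sin(πf)|, with f in cycles per sample. A short sketch:

```python
import math

def diff_gain(f):
    """Magnitude response of first differencing at frequency f (cycles/sample)."""
    return 2.0 * abs(math.sin(math.pi * f))

# slow fluctuations are almost annihilated; the Nyquist component is doubled
low = diff_gain(0.01)    # ~0.063
high = diff_gain(0.5)    # 2.0
```

At TR = 2 s, f = 0.01 cycles per sample corresponds to a 200 s cycle, within the slow band where task-related BOLD fluctuations live, so differencing suppresses the signal band of interest while amplifying high-frequency noise.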

Two other dynamical systems based approaches for modeling fMRI data have been proposed recently (Ge et al., 2009; Smith et al., 2009). Smith and colleagues used a switching linear dynamical systems model wherein modulatory inputs were treated as random variables (Smith et al., 2009). In contrast, MDS models them as deterministic quantities which are known for a given fMRI experiment. Modeling modulatory inputs as unknown random variables is useful for fMRI experiments in which the occurrence of modulatory inputs is unknown. However, for most fMRI studies modulatory inputs are known, and modeling them as unknown quantities unnecessarily increases the number of parameters to be estimated. Also, the switching dynamical systems model makes additional assumptions in computing the probability distributions of the state variables (Murphy, 1998). Further, Smith and colleagues used an MLE approach to estimate latent signals and model parameters. As we show in this study, compared to MLE, a VB approach yields more robust, computationally efficient and accurate model estimation even when the SNR and the number of time points are low, as is generally the case with fMRI data. Another difference is that Smith and colleagues combine the intrinsic and modulatory matrices, i.e., for every *j*-th modulatory input, the combined connection matrix *A _{j}* = *A* + *C _{j}* is estimated from the data. In MDS, we estimate the intrinsic matrix *A* and the modulatory matrices *C _{j}* separately, which explicitly dissociates intrinsic and modulatory effects on causal interactions between brain regions. Another difference lies in testing the statistical significance of the estimated causal connections: in MDS-MLE we use a non-parametric approach, while in MDS-VB the posterior probabilities of the model parameters are used to test the significance of the causal interactions. Finally, it should be noted that the performance of this method under varying SNRs and sample sizes is not known, since no simulations were performed.

Ge and colleagues (2009) used a different state-space approach to estimate causal interactions in the presence of external stimuli. They used vector autoregressive modeling for the state equation to model causal interactions among brain regions, whereas the observation model was nonlinear. They used an extended Kalman filtering approach to estimate the state variables and model parameters. This method was applied to local field potential data, so its usefulness for fMRI data is unclear. However, there are several differences between MDS and this approach. MDS has been developed explicitly for fMRI data to account for HRF variations in brain regions while simultaneously estimating causal interactions. In the work of Ge and colleagues, the unknown model parameters were appended to the state vector and extended Kalman filtering was used to obtain maximum likelihood estimates of the augmented state (Ge et al., 2009). In MDS, we have taken a different approach - state variables are separated from model parameters. This allowed us to use sparsity promoting priors in the MDS-VB approach. Our results on simulated data suggest that MDS-VB outperforms MDS-MLE, especially at low SNRs and with a smaller number of temporal observations. In addition, Ge and colleagues used Kalman filtering to estimate the state variables, while in MDS we used Kalman smoothing to estimate the latent signals (Ge et al., 2009). In Kalman smoothing, both past and future data are used to estimate the latent signals, whereas filtering uses only past values to estimate the current values. In general, smoothing provides better estimates of the latent signals than filtering (Bishop, 2006). Finally, although Ge and colleagues validated their approach on two three-node toy models, the performance of this method is not known under varying conditions such as different SNRs, network sizes and numbers of data samples.

The Bayesian multivariate dynamical systems framework we have developed here provides a robust method for estimating and interpreting causal network interactions in simulated BOLD-fMRI data. Extensive computer simulations demonstrate that MDS is more accurate and robust than GCA, and that among the MDS methods developed here, MDS-VB exhibits superior performance over MDS-MLE. Critically, MDS estimates both intrinsic and experimentally induced modulatory interactions in the latent signals, rather than in the observed BOLD-fMRI signals. Unlike DCM, our proposed MDS framework does not require testing multiple models and may therefore be more useful for analyzing networks with a large number of nodes and connections. One limitation of this work is that our simulations were based on datasets created using the same model as the one used to estimate causal interactions. In this vein, preliminary analysis using simulations with delayed latent signals at millisecond temporal resolution suggests that MDS can accurately recover intrinsic and modulatory causal interactions in the presence of confounding delays in the HRF. Future studies will examine the performance of MDS using more realistic simulations in which causal influences are generated independently of any one particular model, as well as the application of MDS to real experimental fMRI data (Menon et al., in preparation).

This research was supported by grants from the National Institutes of Health (HD047520, HD059205, HD057610) and the National Science Foundation (BCS-0449927).

In this appendix, we provide detailed equations for estimating the model parameters and latent states of MDS using an expectation maximization algorithm.

The state-space and observation equations (1)–(3) can be expressed in standard state-space form so that the Kalman filtering and smoothing recursions can be used to estimate the probability distribution of the latent signals, which constitutes the E-step of our EM algorithm (Penny et al., 2005).

Let

$$\mathit{x}(t)={[{\mathit{s}}^{\prime}(t){\mathit{s}}^{\prime}(t-1)\dots \mathrm{..}{\mathit{s}}^{\prime}(t-L+1)]}^{\prime}$$

(A.1)

Equations (1)–(3) can then be written in terms of **x**(*t*) as

$$\mathit{x}(t)=\stackrel{\sim}{G}(t)\mathit{x}(t-1)+\stackrel{\sim}{D}u(t)+\stackrel{\sim}{\mathit{w}}(t)$$

(A.2)

$$y(t)=B\stackrel{\sim}{\mathrm{\Phi}}\mathit{x}(t)+\mathit{e}(t)$$

(A.3)

where,

$$\begin{array}{cc}G(t)=A+\sum _{j=1}^{J}{v}_{j}(t){C}_{j},& \stackrel{\sim}{G}(t)=\stackrel{\sim}{A}+\sum _{j=1}^{J}{v}_{j}(t){\stackrel{\sim}{C}}_{j}\\ \stackrel{\sim}{A}=\left(\begin{array}{cc}A& {0}_{M\times (L-1)}\\ \psi & {0}_{M}\end{array}\right),& {\stackrel{\sim}{C}}_{j}=\left(\begin{array}{cc}{C}_{j}& {0}_{M\times (L-1)}\\ \psi & {0}_{M}\end{array}\right)\end{array}$$

(A.4)

ψ is the *M*(*L* – 2) × *ML* delay matrix that fills the lower rows of *Ã*.

Similarly,

$$\stackrel{\sim}{D}=\left(\begin{array}{cc}D& {0}_{M\times (L-1)}\\ {0}_{M(L-2)\times ML}& {0}_{M}\end{array}\right)$$

(A.5)

$$U(t)={[\begin{array}{ll}{u}^{\prime}(t)\hfill & {0}_{1,M(L-1)}\hfill \end{array}]}^{\prime}$$

(A.6)

$$\stackrel{\sim}{\mathit{w}}(t)={[\begin{array}{ll}{\mathit{w}}^{\prime}(t)\hfill & {0}_{1,M(L-1)}\hfill \end{array}]}^{\prime}$$

(A.7)

$$\stackrel{\sim}{\mathit{w}}(t)\sim N(0,\stackrel{\sim}{Q})$$

(A.8)

$$\stackrel{\sim}{Q}=\left(\begin{array}{cc}Q& {0}_{M\times (L-1)}\\ {0}_{M(L-2)\times ML}& {0}_{M}\end{array}\right)$$

(A.9)

$$B=\left[\begin{array}{ccc}{b}_{1}& \cdots & 0\\ \vdots & \ddots & \vdots \\ 0& \cdots & {b}_{M}\end{array}\right]$$

(A.10)

$${[\stackrel{\sim}{\mathrm{\Phi}}\mathit{x}(t)]}_{m}=\mathrm{\Phi}{\mathit{x}}_{m}(t),\phantom{\rule{0.38889em}{0ex}}{\mathit{x}}_{m}(t)={[{\mathit{s}}_{m}(t)\phantom{\rule{0.16667em}{0ex}}{\mathit{s}}_{m}(t-1)\dots {\mathit{s}}_{m}(t-L+1)]}^{\prime},\phantom{\rule{0.38889em}{0ex}}m=1,2,\dots M$$

(A.11)

$$\mathit{e}(t)={[{\mathit{e}}_{1}(t)\phantom{\rule{0.16667em}{0ex}}{\mathit{e}}_{2}(t)\dots {\mathit{e}}_{M}(t)]}^{\prime}$$

(A.12)

$$\mathit{e}(t)\sim N(0,R)$$

(A.13)

$$R=\mathit{diag}({\sigma}_{1}^{2},{\sigma}_{2}^{2},\dots .{\sigma}_{M}^{2})$$

(A.14)

Let **x**(0) be the initial state, assumed Gaussian with mean *μ _{o}* and covariance Σ_{o} and uncorrelated with the state noise.

In the E-step, the posterior probability of the latent signals **x**(*t*), given the observations, is computed using forward (filtering) and backward (smoothing) recursions.

In the filtering step, the goal is to compute the following posterior distribution of **x**(*t*) given the observations up to time *t*:

$$p(\mathit{x}(t)\mid y(1),y(2)\dots .y(t))=N({x}_{t}^{t},{\mathrm{\sum}}_{t}^{t})$$

(A.15)

The mean and covariance of this distribution can be computed using the following forward recursive steps.

$${x}_{t}^{t-1}=\stackrel{\sim}{G}(t){x}_{t-1}^{t-1}+\stackrel{\sim}{D}U(t)$$

(A.16)

$${\mathrm{\sum}}_{t}^{t-1}=\stackrel{\sim}{G}(t){\mathrm{\sum}}_{t-1}^{t-1}{\stackrel{\sim}{G}}^{\prime}(t)+\stackrel{\sim}{Q}$$

(A.17)

$$K(t)={\mathrm{\sum}}_{t}^{t-1}{E}^{\prime}{(E{\mathrm{\sum}}_{t}^{t-1}{E}^{\prime}+R)}^{-1}$$

(A.18)

where, *E* = *B*Φ̃ is the effective observation matrix.

$${x}_{t}^{t}={x}_{t}^{t-1}+K(t)(y(t)-{Ex}_{t}^{t-1})$$

(A.19)

$${\mathrm{\sum}}_{t}^{t}={\mathrm{\sum}}_{t}^{t-1}-K(t)E{\mathrm{\sum}}_{t}^{t-1}$$

(A.20)

The above recursion is initialized using ${x}_{1}^{0}={\mu}_{0}$ and ${\mathrm{\sum}}_{1}^{0}={\mathrm{\sum}}_{o}$.
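The forward recursions (A.16)–(A.20) translate directly into code. The sketch below uses hypothetical variable names; the time-varying transition matrix G̃(t) is passed as a list, and E = BΦ̃ is the effective observation matrix:

```python
import numpy as np

def kalman_filter(y, G, D, u, E, Q, R, mu0, S0):
    """Forward recursions (A.16)-(A.20): returns the means x_t^t and
    covariances S_t^t of p(x(t) | y(1),...,y(t))."""
    T = len(y)
    d = mu0.size
    xf = np.zeros((T, d))
    Sf = np.zeros((T, d, d))
    x_pred, S_pred = mu0, S0                                    # initialization
    for t in range(T):
        if t > 0:
            x_pred = G[t] @ xf[t - 1] + D @ u[t]                # (A.16)
            S_pred = G[t] @ Sf[t - 1] @ G[t].T + Q              # (A.17)
        K = S_pred @ E.T @ np.linalg.inv(E @ S_pred @ E.T + R)  # (A.18)
        xf[t] = x_pred + K @ (y[t] - E @ x_pred)                # (A.19)
        Sf[t] = S_pred - K @ E @ S_pred                         # (A.20)
    return xf, Sf
```

Each iteration predicts the latent state from the (possibly time-varying) dynamics and then corrects the prediction with the current observation, weighted by the Kalman gain.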

In the smoothing step, the goal is to compute the posterior distribution of **x**(*t*) given all *T* observations:

$$p(\mathit{x}(t)\mid y(1),y(2)\dots .y(T))=N({x}_{t}^{T},{\mathrm{\sum}}_{t}^{T})$$

(A.21)

The mean ${x}_{t}^{T}$ and covariance ${\mathrm{\sum}}_{t}^{T}$ at each time t can be estimated using the following backward recursions.

$${x}_{t}^{T}={x}_{t}^{t}+{J}_{t}({x}_{t+1}^{T}-{x}_{t+1}^{t})$$

(A.22)

$${\mathrm{\sum}}_{t}^{T}={\mathrm{\sum}}_{t}^{t}+{J}_{t}({\mathrm{\sum}}_{t+1}^{T}-{\mathrm{\sum}}_{t+1}^{t}){J}_{t}^{\prime}$$

(A.23)

where, *J _{t}* is defined as

$${J}_{t}={\mathrm{\sum}}_{t}^{t}{\stackrel{\sim}{G}}^{\prime}(t+1){({\mathrm{\sum}}_{t+1}^{t})}^{-1}$$

(A.24)

The above backward recursions are initialized by noting that, for *t = T*, ${x}_{T}^{T}$ and ${\mathrm{\sum}}_{T}^{T}$ are the filtered estimates obtained from equations (A.19) and (A.20), respectively.

In the M-step, the goal is to find the unknown parameters **θ** = {*A, C*_{1,..,}*C _{J}, D, Q, B, R*} given the data and the current posterior distributions of the latent variables **x**(*t*).

The complete log-likelihood of the data is given by

$$L=\log p(\mathit{x}(1),\mathit{x}(2)\dots \mathit{x}(T),y(1),y(2)\dots y(T)\mid \mathrm{\Theta})=\log p(\mathit{x}(1)\mid \mathrm{\Theta})+\sum _{t=2}^{T}\log p(\mathit{x}(t)\mid \mathit{x}(t-1),\mathrm{\Theta})+\sum _{t=1}^{T}\log p(y(t)\mid \mathit{x}(t),\mathrm{\Theta})$$

(A.25)

The complete log-likelihood that depends on the parameters *A, C* and *D* is given by

$$L(A,{C}_{1},\mathrm{..}{C}_{J},D)\propto -0.5{\mathrm{\sum}}_{t=2}^{T}({\mathit{x}}_{s}(t)-(A+{\mathrm{\sum}}_{j=1}^{J}{v}_{j}(t){C}_{j})F\mathit{x}(t-1)-du{(t))}^{\prime}{Q}^{-1}({\mathit{x}}_{s}(t)-(A+{\mathrm{\sum}}_{j=1}^{J}{v}_{j}(t){C}_{j})F\mathit{x}(t-1)-du(t))$$

(A.26)

where, *F* = [*I _{M}* 0_{*M*×*M*(*L*−1)}] extracts the first *M* elements of **x**(*t*).

Taking expectations of equation (A.26) with respect to *p*(**x**(*t*)|*Y*) and maximizing with respect to *A*, *C*_{1},..,*C _{J}* and *d* yields the following system of linear equations:

$$[A\phantom{\rule{0.16667em}{0ex}}{C}_{1},\mathrm{..}{C}_{J}\phantom{\rule{0.16667em}{0ex}}d]\left(\begin{array}{cc}\sum _{t=2}^{T}F(t)P(t-1)F{(t)}^{\prime}& \sum _{t=2}^{T}F(t){x}_{t-1}^{T}u(t)\\ \sum _{t=2}^{T}{\mathit{x}}_{t-1}^{T}u(t)F{(t)}^{\prime}& \sum _{t=2}^{T}u{(t)}^{2}\end{array}\right)=\left[\begin{array}{cc}\sum _{t=2}^{T}{P}_{s}(t,t-1)F{(t)}^{\prime}& \sum _{t=2}^{T}{m}_{s}(t)u(t)\end{array}\right]$$

(A.27)

Where,

$$P(t)={\mathrm{\sum}}_{t}^{T}+({x}_{t}^{T}){({x}_{t}^{T})}^{\prime}$$

(A.28)

$${m}_{s}(t)={x}_{t}^{T}(1:M)$$

(A.29)

$$F(t)={[{I}_{M}\phantom{\rule{0.16667em}{0ex}}{v}_{1}(t){I}_{M}\dots \mathrm{..}{v}_{J}(t){I}_{M}]}^{\prime}F$$

(A.30)

(*m _{s}*(*t*) denotes the first *M* elements of ${x}_{t}^{T}$, i.e., the smoothed estimate of **s**(*t*).) The lag-one covariance *P*(*t*, *t* − 1) is given by

$$P(t,t-1)={J}_{t-1}{\mathrm{\sum}}_{t}^{T}+({x}_{t}^{T}){({x}_{t-1}^{T})}^{\prime}$$

(A.31)

where *P _{s}*(*t*, *t* − 1) denotes the first *M* rows of *P*(*t*, *t* − 1).

Taking expectations of equation (A.26) with respect to *p*(**x**(*t*)|*Y*) and maximizing with respect to *Q* gives

$$\widehat{Q}=\frac{1}{T-1}\sum _{t=2}^{T}\left({P}_{s}(t)-{P}_{s}(t,t-1){F}^{\prime}G{(t)}^{\prime}-{m}_{s}(t)u(t){d}^{\prime}-G(t)F{P}_{s}{(t,t-1)}^{\prime}+G(t)FP(t-1){F}^{\prime}G{(t)}^{\prime}+G(t)F{x}_{t-1}^{T}u(t){d}^{\prime}-du(t){m}_{s}^{\prime}(t)+du(t){({x}_{t-1}^{T})}^{\prime}{F}^{\prime}G{(t)}^{\prime}+du{(t)}^{2}{d}^{\prime}\right)$$

(A.32)

Where,

*P _{s}*(*t*) is the first *M* × *M* sub-matrix of *P*(*t*).

Each row vector *b _{m}*, *m* = 1, 2,…*M*, of *B* is estimated by maximizing the expected complete log-likelihood in equation (A.25) with respect to *b _{m}*, giving

$${\widehat{{b}_{m}}}^{\prime}={(\mathrm{\Phi}\sum _{t=1}^{T}{P}_{m}(t){\mathrm{\Phi}}^{\prime})}^{-1}\mathrm{\Phi}\sum _{\text{t}=1}^{\text{T}}{y}_{m}(t){x}_{t}^{T}(m)$$

(A.33)

where, *P _{m}*(*t*) is the sub-matrix of *P*(*t*) corresponding to the *L* lagged latent signals of the *m*-th region and ${x}_{t}^{T}(m)$ is the corresponding sub-vector of ${x}_{t}^{T}$.

The diagonal observation covariance matrix *R* can be estimated by maximizing the conditional expectation of the complete log-likelihood given in equation (A.25). The estimates of the diagonal components of *R* are given by

$$\widehat{R}(m,m)=\frac{1}{T}\sum _{t=1}^{T}({y}_{m}^{2}(t)-2{\widehat{{b}_{m}}}^{\prime}\mathrm{\Phi}{y}_{m}(t){x}_{t}^{T}(m)+\mathit{trace}({\widehat{{b}_{m}}}^{\prime}\mathrm{\Phi}{P}_{r}(t)\mathrm{\Phi}\widehat{{b}_{m}}),m=1,2,\dots M$$

(A.34)

The maximum likelihood estimates of the initial state mean *μ _{o}* and covariance Σ_{o} are given by

$$\widehat{{\mu}_{o}}={x}_{1}^{T}$$

(A.35)

$$\widehat{{\mathrm{\sum}}_{o}}={\mathrm{\sum}}_{1}^{T}$$

(A.36)

The above E and M steps are repeated until the change in log-likelihood of the data between two iterations is below a specified threshold.

In VB, the goal is to find the posterior distributions of the latent variables q_{**S**}(**S**|Y) and the parameters q_{**Θ**}(**Θ**|Y) that maximize a lower bound on the log model evidence. This is accomplished by iterating between the VB-E and VB-M steps described below.

In the VB-E step, the posterior distributions of the latent variables q_{**S**}(**S**|Y) are computed using the Kalman filtering and smoothing recursions described above, with the model parameters fixed at the current estimates of their posterior distributions.

In the VB-M step, the posterior distributions of the model parameters q_{**Θ**}(**Θ**|Y) are computed. The posterior factorizes as

$${\text{q}}_{\mathbf{\Theta}}(\mathbf{\Theta}\mid \text{Y})=\text{q}(\mathbf{A},{\mathit{C}}_{\mathbf{1}},\mathrm{..}{\mathit{C}}_{\mathit{J}},\mathbf{D},\mathbf{Q})\text{q}(\mathbf{B},\mathbf{R})$$

(B.1)

In this work, we also assume the state and observation noise covariance matrices (**Q** and **R**) to be diagonal. Therefore, the parameters in each row of the state equation can be estimated independently. The state equation for the *m*-th latent signal is

$${\mathit{s}}_{m}(t)=({\mathit{a}}_{m}+{\mathrm{\sum}}_{j=1}^{J}{v}_{j}(t){\mathit{c}}_{j,m})\mathit{s}(t-1)+{\mathit{d}}_{m}u(t)+{\mathit{w}}_{m}(t),\phantom{\rule{0.38889em}{0ex}}{\mathit{w}}_{m}(t)\sim N(0,{\mathit{\beta}}_{m}^{-1})$$

(B.2)

where, **a**_{m}, **c**_{j,m} and **d**_{m} are the *m*-th rows of *A*, *C _{j}* and *D*, respectively, and *β _{m}* is the precision of the state noise in the *m*-th node. Collecting these parameters into a single vector **θ**_{m}, equation (B.2) can be written as

$${\mathit{s}}_{m}(t)={{\mathit{\theta}}^{\prime}}_{m}[F(t)\mathit{x}(t-1);u(t)]+{\mathit{w}}_{m}(t)$$

(B.3)

where, **θ**′_{m} = [**a**_{m} **c**_{1,m} … **c**_{J,m} **d**_{m}] is the row vector of unknown parameters for the *m*-th node.

We assume the following Gaussian-Gamma conjugate priors for **θ**_{m} and *β _{m}*:

$$p({\mathit{\theta}}_{m},{\mathit{\beta}}_{m}\mid \mathit{\alpha})\sim N(0,{({\mathit{\beta}}_{m}{\mathit{A}}_{\alpha})}^{-1})Ga({a}_{o},{b}_{o})$$

(B.4)

where, **α** is the vector of hyperparameters and **A**_{α} = diag(**α**).

Let the prior on ** α** be

$$p(\mathit{\alpha})=\prod _{i=1}^{2M+1}Ga({c}_{o},{d}_{o})$$

(B.5)

Therefore, by applying equation (9), the joint posterior for **θ**_{m} and *β _{m}* is given by

$$q({\mathit{\theta}}_{m},{\mathit{\beta}}_{m}\mid Y)=N(\overline{{\theta}_{m}},{\beta}_{m}^{-1}{\mathrm{\sum}}_{m})Ga({a}_{m,N},{b}_{m,N})$$

(B.6)

Where,

$${\mathrm{\sum}}_{m}^{-1}=\left(\begin{array}{cc}{\mathrm{\sum}}_{t=2}^{T}F(t)P(t-1)F{(t)}^{\prime}& {\mathrm{\sum}}_{t=2}^{T}F(t){x}_{t-1}^{T}u(t)\\ {\mathrm{\sum}}_{t=2}^{T}{x}_{t-1}^{T}u(t)F{(t)}^{\prime}& {\mathrm{\sum}}_{t=2}^{T}u{(t)}^{2}\end{array}\right)$$

(B.7)

$$\overline{{\theta}_{m}}={\mathrm{\sum}}_{m}\left(\begin{array}{c}\sum _{t=2}^{T}F(t)E({\mathit{s}}_{m}(t)\mathit{x}(t-1))\\ \sum _{t=2}^{T}{P}_{s}(t,t-1)F{(t)}^{\prime}\end{array}\right)$$

(B.8)

$${a}_{m,N}={a}_{o}+\frac{T+2M}{2}$$

(B.9)

$${b}_{m,N}={b}_{o}+0.5\left(\sum _{t=2}^{T}E({\mathit{s}}_{m}^{2}(t))-{\overline{\theta}}_{m}^{\prime}{\mathrm{\sum}}_{m}^{-1}{\overline{\theta}}_{m}\right)$$

(B.10)

The posterior for hyper parameters ** α** is given by

$$q(\mathit{\alpha}\mid Y)=\prod _{i=1}^{2M+1}Ga({c}_{N},{d}_{Ni})$$

(B.11)

Where,

$${c}_{N}={c}_{o}+\frac{1}{2}$$

(B.12)

$${d}_{Ni}={d}_{o}+\frac{1}{2}\left({\overline{\theta}}_{m}^{2}(i)\frac{{a}_{m,N}}{{b}_{m,N}}+{\mathrm{\sum}}_{m}(i,i)\right)$$

(B.13)

The posteriors for **θ**_{m}, *β _{m}* and **α** are estimated by iterating equations (B.6)–(B.13) until convergence.

Similarly, the posterior distribution for the model parameters in the output equation is computed. Since **R** is assumed to be diagonal, the observation equation in the *m*-th node is given by

$${y}_{m}(t)={b}_{m}\mathrm{\Phi}{\mathit{x}}_{m}(t)+{\mathit{e}}_{m}(t),\phantom{\rule{0.38889em}{0ex}}{\mathit{e}}_{m}(t)\sim N(0,{\mathit{\lambda}}_{m}^{-1})$$

(B.14)

Where,
${\mathit{\lambda}}_{m}={\scriptstyle \frac{1}{\mathit{R}(m,m)}}$ is the observation noise precision. Again assuming Gaussian-Gamma conjugate priors for **b**_{m} and *λ _{m}*,

$$p({\mathit{b}}_{m},{\mathit{\lambda}}_{m}\mid \mathit{\alpha})\sim N(0,{({\mathit{\lambda}}_{m}{\mathit{A}}_{\alpha})}^{-1})Ga({a}_{o},{b}_{o})$$

(B.15)

where, **α** = [α_{1} … α_{P}]′ is the vector of hyperparameters and **A**_{α} = diag(**α**).

Let the prior on ** α** be

$$p(\mathit{\alpha})=\prod _{i=1}^{P}Ga({a}_{o},{b}_{o})$$

(B.16)

By applying equation (9), the joint posterior for **b**_{m} and *λ _{m}* is given by

$$q({\mathit{b}}_{m},{\mathit{\lambda}}_{m}\mid Y)=N({\overline{b}}_{m},{\mathit{\lambda}}_{m}^{-1}{V}_{m})Ga({a}_{m,N},{b}_{m,N})$$

(B.17)

$${V}_{m}^{-1}=\mathrm{\Phi}\sum _{t=1}^{T}{P}_{m}(t){\mathrm{\Phi}}^{\prime}+{\overline{A}}_{\alpha}$$

(B.18)

$${\overline{b}}_{m}={V}_{m}\mathrm{\Phi}\sum _{\text{t}=1}^{\text{T}}{y}_{m}(t){x}_{t}^{T}(m)$$

(B.19)

$${a}_{m,N}={a}_{o}+\frac{T+P-1}{2}$$

(B.20)

$${b}_{m,N}={b}_{o}+0.5\left(\sum _{t=2}^{T}E({y}_{m}^{2}(t))-{\overline{b}}_{m}^{\prime}{V}_{m}^{-1}{\overline{b}}_{m}\right)$$

(B.21)

$${c}_{N}={c}_{o}+\frac{1}{2}$$

(B.22)

$${d}_{Ni}={d}_{o}+\frac{1}{2}\left({\overline{b}}_{m}^{2}(i)\frac{{a}_{m,N}}{{b}_{m,N}}+{V}_{m}(i,i)\right)$$

(B.23)

$${\overline{A}}_{\alpha}(i,i)=\frac{{c}_{N}}{{d}_{Ni}}$$

(B.24)

The posteriors for **b**_{m}, *λ _{m}* and **α** are estimated by iterating equations (B.17)–(B.24) until convergence.

The VB-E and VB-M steps are repeated until convergence.

The above iterative procedure (E and M steps) needs to be initialized. In this work, we estimate the initial values of the latent signals (**ŝ**(*t*)) from the observed data and use these estimates to initialize the model parameters.


