There is much current interest in identifying the anatomical and functional circuits that we believe are the basis of the brain's computations (Varela *et al*. 2001). Interest in neuroscience has shifted away from mapping sites of *activation*, towards identifying the *connectivity* that weaves them together into dynamical systems (Lee *et al*. 2003; Bullmore *et al*. 2004). More importantly, the availability of functional neuroimaging techniques, such as fMRI, optical imaging and EEG/MEG, raises hopes for the *in vivo* study of these neural processes through the statistical analysis of the time-series they produce. Unfortunately, the complexity of our object of study far outstrips the amount of data we are able to measure. Activation studies already face the daunting problem of analysing large numbers of correlated variables measured on comparatively few observational units. These problems escalate when all pairwise relations between variables are of interest, a situation that has led some to consider the concept of connectivity itself ‘elusive’ (Horwitz 2003).

A neural system is an instance of a complex network. A convenient representation is that of a graph, defined by a set of nodes that represent observed or unobserved (latent) variables, a set of edges that indicate relations between nodes, and a set of probability statements about these relations (Speed & Kiiveri 1986; Wermuth & Lauritzen 1990; Cowell *et al*. 1999; Jensen 2002; Jordan 2004). Graphs with only undirected edges have been used extensively in the analysis of covariance relations (Wermuth & Lauritzen 1990; Wermuth & Cox 1998, 2004), but do not attempt causal interpretations. Neuroimaging studies based on this type of model identify what Friston has defined as ‘functional connectivity’ (Friston 1994). To apply graphical models to functional neuroimaging data, one must be aware of the additional specificity that they are vector-valued time-series, with $\mathbf{y}_t$ the vector of observations at time *t*, observed at Nt time instants. The *p* components of the vector are sampled at different nodes or spatial points in the brain. There has been much recent work on combining graphical models with multiple time-series analysis. An excellent example of the use of undirected graphs in the frequency domain is Bach & Jordan (2004), with applications to fMRI functional connectivity in Salvador *et al*. (2005).

A different line of work is represented by Pearl (1998, 2000, 2003) and Spirtes *et al*. (1991, 1998, 2000), among others, who studied graphs with directed edges that represent causal relations between variables. In the context of neuroimaging, searching for causality is what Friston terms the identification of ‘effective connectivity’. We will be concerned with this more ambitious type of modelling.

For functional neuroimages, the arrow of time may be used to help in the identification of causal relations. To be more specific, we model these time-series by means of a linear (stationary) multivariate autoregressive (MAR) model (Hamilton 1994; Harrison *et al*. 2003). While this type of model is very restrictive and far from biologically realistic, it will serve our purpose of developing methods for identifying connectivities in large complex neural networks in which the number of nodes *p* is very large compared with Nt. The general MAR model of order *q* reads

$$\mathbf{y}_t = \sum_{k=1}^{q} \mathbf{A}_k \mathbf{y}_{t-k} + \mathbf{e}_t. \tag{1.1}$$

The dynamics of the process are determined by the matrices of autoregressive coefficients $\mathbf{A}_k$, defined for the different time lags *k*, and by the spatial covariance matrix $\boldsymbol{\Sigma}$ of $\mathbf{e}_t$, the white-noise input process (innovations). MAR modelling has been widely applied in neuroscience research (Baccala & Sameshima 2001; Kaminski *et al*. 2001; Harrison *et al*. 2003).
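As a concrete illustration (a toy sketch, not the authors' code; all variable names and parameter values are hypothetical), the snippet below simulates a two-node MAR(1) process of the form (1.1) and recovers its coefficient matrix by ordinary least squares, the baseline estimator that the sparse methods discussed later improve upon:

```python
import numpy as np

rng = np.random.default_rng(0)
p, Nt = 2, 500
A1 = np.array([[0.5, 0.0],
               [0.4, 0.3]])     # node 1 drives node 2, but not vice versa
Sigma = 0.1 * np.eye(p)         # innovation (white-noise) covariance

# Simulate y_t = A1 @ y_{t-1} + e_t
y = np.zeros((Nt, p))
for t in range(1, Nt):
    y[t] = A1 @ y[t - 1] + rng.multivariate_normal(np.zeros(p), Sigma)

# OLS: regress y_t on y_{t-1}; A1_hat estimates A1
X, Y = y[:-1], y[1:]
A1_hat = np.linalg.solve(X.T @ X, X.T @ Y).T
print(np.round(A1_hat, 2))
```

With Nt = 500 and only p = 2 nodes the estimate is close to the true matrix, including the zero entry that encodes the absent connection; the difficulties described below arise precisely when *p* grows while Nt does not.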

Unfortunately, there is a problem with this approach when dealing with neuroimaging data: the brain is a network with extremely large *p*, of the order of hundreds of thousands. A ‘curse of complexity’ immediately arises. The total number of parameters to be estimated for model (1.1) is $qp^2 + p(p+1)/2$ (one coefficient per ordered pair of nodes at each of the *q* lags, plus the free elements of the innovation covariance), a situation for which the usual time-series methods break down. One approach to overcoming this curse of complexity is to pre-select a small set of regions of interest (ROI) on the basis of prior knowledge. Statistical dependencies may then be assayed by standard methods of time-series modelling (Hamilton 1994), which in turn are specializations of multivariate regression analysis (Mardia *et al*. 1979). The real danger is the probable effect of spurious correlations induced by the brain structures not included in the study. Thus, the ideal would be to develop MAR models capable of dealing with large *p*.
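To see how quickly the parameter count outgrows any realistic number of scans, here is a back-of-the-envelope calculation (assuming the count $qp^2 + p(p+1)/2$ for a model of order *q*):

```python
# Free parameters in a MAR(q) model on p nodes: q coefficient matrices
# of size p x p, plus a symmetric p x p innovation covariance.
def mar_params(p, q=1):
    return q * p ** 2 + p * (p + 1) // 2

for p in (10, 100, 1000):
    print(p, mar_params(p))
# 10 nodes  ->       155 parameters
# 100 nodes ->    15 050 parameters
# 1000 nodes -> 1 500 500 parameters, vs. Nt typically in the hundreds
```

Even a single-lag model on a few hundred ROIs already has far more parameters than a typical fMRI run has time points.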

An alternative to using ordinary multivariate regression techniques for model (1.1) is to attempt regression based on selection of variables. This could drastically reduce the number of edges in the network graph to be determined, effectively restricting our attention to networks with sparse connectivity. That this is a reasonable assumption is justified by studies of the numerical characteristics of network connectivity in anatomical brain databases (Sporns *et al*. 2000; Stephan *et al*. 2000; Hilgetag *et al*. 2002; Kotter & Stephan 2003; Sporns *et al*. 2004). The main objective of this paper is to develop methods for the identification of sparse connectivity patterns in neural systems. We expect this method eventually to be scaled to cope with hundreds or thousands of voxels. Explicitly, we propose to fit the model with sparsity constraints on the autoregressive matrices $\mathbf{A}_k$ and the innovation covariance $\boldsymbol{\Sigma}$.

Researchers into causality (Scheines *et al*. 1998; Pearl 2000) have explored the use of regression via the oldest of the variable selection techniques, stepwise selection, for the identification of causal graphs. This is the basis of popular algorithms such as PC, embodied in programmes such as Tetrad. These techniques have been proposed for use in graphical time-series models by Demiralp & Hoover (2003). Unfortunately, they do not work well for large *p*/Nt ratios. A considerable improvement may be achieved by stochastic search variable selection (SSVS), which relies on Markov chain Monte Carlo (MCMC) exploration of possible sparse networks (Dobra *et al*. 2004; Jones & West 2005). These approaches, however, are computationally very intensive and not practical for implementation in a neuroimage-analysis pipeline.

A different approach has arisen in the data-mining context, motivated to a great extent by the demands posed by the analysis of micro-array data (West 2002; Efron *et al*. 2004; Hastie & Tibshirani 2004; Hastie *et al*. 2001). It involves extensive use of Bayesian regression modelling and variable selection capable of dealing with the *p*≫Nt situation. Of particular interest is recent work on the use of penalized regression methods for variable selection (Fan & Li 2001; Fan & Peng 2004), which unify nearly all variable selection techniques into an easy-to-implement iterative application of minimum-norm or ridge regression. These techniques have been shown to be useful for identifying the topology of huge networks (Leng *et al*. 2004; Meinshausen & Bühlmann 2004).
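A minimal sketch of this iterated-ridge idea, in the spirit of Fan & Li's local quadratic approximation of an L1 penalty (an illustrative reimplementation, not the code used in this paper; the function name, penalty weight and pruning threshold are all assumptions):

```python
import numpy as np

def sparse_ar_fit(X, y, lam=10.0, n_iter=100, tol=1e-8):
    """Penalized regression of y on X; returns a sparse coefficient vector."""
    n, m = X.shape
    b = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)    # ridge start
    for _ in range(n_iter):
        w = lam / np.maximum(np.abs(b), tol)   # LQA weights, roughly lam/|b_j|
        b = np.linalg.solve(X.T @ X + np.diag(w), X.T @ y)     # reweighted ridge
    b[np.abs(b) < 1e-3] = 0.0                  # prune negligible coefficients
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))             # 10 candidate predictors
true_b = np.zeros(10)
true_b[0], true_b[3] = 1.0, -0.8               # only two real connections
y = X @ true_b + 0.1 * rng.standard_normal(200)
b_hat = sparse_ar_fit(X, y)
print(np.round(b_hat, 2))
```

Each pass solves a ridge problem whose per-coefficient weight grows as the coefficient shrinks, so negligible edges are driven to zero while strong ones survive with modest bias; this is the sense in which sparse selection reduces to repeated ridge regressions.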

Methods for variable selection may also be combined with procedures for the control of false discovery rates (FDR) (Efron 2003, 2004, 2005) in situations where a large number of null hypotheses are expected to be true. Large *p* in this case becomes a strength instead of a weakness, because it allows non-parametric estimation of the distribution of the null hypotheses, which in turn permits effective control of false discoveries.
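The local FDR machinery applied later is more elaborate, but the underlying logic of FDR control can be illustrated with the simpler Benjamini–Hochberg step-up rule (an illustrative sketch, not the procedure used in this paper; the p-values are hypothetical):

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up: boolean mask of rejections at FDR level q."""
    m = len(pvals)
    order = np.argsort(pvals)
    # Compare sorted p-values against the step-up thresholds q*i/m
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True     # reject the k smallest p-values
    return reject

# p-values for six hypothetical edge tests; only the strong ones survive
pvals = np.array([0.001, 0.008, 0.04, 0.2, 0.6, 0.9])
print(bh_reject(pvals))          # [ True  True False False False False]
```

When most edges are truly absent, as assumed here, the large number of null tests is exactly what makes the null distribution, and hence the rejection threshold, estimable.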

In a previous paper, Valdes-Sosa (2004) introduced a Bayesian variant of MAR modelling designed for the situation in which the number of nodes far outnumbers the time instants (*p*≫Nt). This approach is, therefore, useful for the study of functional neuroimaging data. However, that paper stopped short of proposing practical methods for variable selection. The present work introduces a combination of penalized regression with local FDR methods that is shown to achieve efficient detection of connections in simulated neural networks. The method is additionally shown to give plausible results with real fMRI data and is capable of being scaled to analyse large datasets.

It should be emphasized that, in the context of functional imaging, there are a number of techniques for estimating the effective connectivity, or edges, among the nodes of small pre-specified neuroanatomical graphs. These range from maximum-likelihood techniques using linear and static models (e.g. structural equation modelling; McIntosh & Gonzalez-Lima 1994) to Bayesian inference on dynamic nonlinear graphical models (e.g. dynamic causal modelling; Friston *et al*. 2003). Almost universally, these approaches require the specification of a small number of nodes and, in some instances, a pre-specified sparsity structure, i.e. the elimination of edges to denote conditional independence among some nodes. The contribution of this work is to enable the characterization of graphical models with hundreds of nodes using short imaging time-series. Furthermore, the sparsity or conditional independence does not need to be specified *a priori* but is disclosed automatically by an iterative process. In short, we use the fact that the brain is sparsely connected as part of the solution, as opposed to treating it as a specification problem.

The structure of this paper is as follows. The subsequent section introduces a family of penalized regression techniques useful for identifying sparse effective connectivity patterns. The effectiveness of these methods for detecting the topology of large complex networks is explored in §2 by means of extensive simulations and is quantified by means of ROC measures. These methods are then applied together with local FDR techniques to evaluate real fMRI data. The paper concludes with a discussion of implications and possible extensions.