

Neuroimage. 2007 December; 38(4): 677–695.
PMCID: PMC2643839

Diffusion-based spatial priors for imaging


We describe a Bayesian scheme to analyze images, which uses spatial priors encoded by a diffusion kernel, based on a weighted graph Laplacian. This provides a general framework to formulate a spatial model, whose parameters can be optimized. The application we have in mind is a spatiotemporal model for imaging data. We illustrate the method on a random effects analysis of fMRI contrast images from multiple subjects; this simplifies exposition of the model and enables a clear description of its salient features. Typically, imaging data are smoothed using a fixed Gaussian kernel as a pre-processing step before applying a mass-univariate statistical model (e.g., a general linear model) to provide images of parameter estimates. An alternative is to include smoothness in a multivariate statistical model (Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J., 2005. Bayesian fMRI time series analysis with spatial priors. Neuroimage 24, 350–362). The advantage of the latter is that each parameter field is smoothed automatically, according to a measure of uncertainty, given the data. In this work, we investigate the use of diffusion kernels to encode spatial correlations among parameter estimates. Nonlinear diffusion has a long history in image processing; in particular, flows that depend on local image geometry (Romeny, B.M.T., 1994. Geometry-driven Diffusion in Computer Vision. Kluwer Academic Publishers) can be used as adaptive filters. This can furnish a non-stationary smoothing process that preserves features, which would otherwise be lost with a fixed Gaussian kernel. We describe a Bayesian framework that incorporates non-stationary, adaptive smoothing into a generative model to extract spatial features in parameter estimates. Critically, this means adaptive smoothing becomes an integral part of estimation and inference. We illustrate the method using synthetic and real fMRI data.

Keywords: Diffusion kernel, Weighted graph Laplacian, Spatial priors, Gaussian process model, fMRI, General linear model, Random effects analysis


Functional MRI data are typically transformed to a three-dimensional regular grid of voxels in anatomical space, each containing a univariate time series of responses to experimental perturbation. The data are then used to invert a statistical model, e.g., general linear model (GLM), after a number of pre-processing steps, which include spatial normalization and smoothing (i.e., convolving the data with a spatial kernel). In mass-univariate approaches (e.g., statistical parametric mapping), a statistical model is used to extract features from the smoothed data by treating each voxel as a separate observation. Model parameters, at each voxel, are estimated (Friston et al., 2002) and inference about these parameters proceeds using SPMs or posterior probability maps (Friston and Penny, 2003). Smoothing the data ensures the maps of parameter estimates are also smooth. This can be viewed as enforcing a smoothness prior on the parameters. The current paper focuses on incorporating smoothness into the statistical model by making smoothness a hyperparameter of the model and estimating it using empirical Bayes. This optimizes the spatial dependencies among parameter estimates and has the potential to greatly enhance spatial feature detection.

Recently Penny et al. (2005) extended the use of shrinkage priors on parameter estimates (Penny et al., 2003), which assume spatial independence, to spatial priors in a statistical model of fMRI time series. They developed an efficient algorithm using a mean-field approximation within a variational Bayes framework. The result is a smoothing process that is incorporated into a generative model of the data, where each parameter is smoothed according to a measure of uncertainty in that parameter. The advantage of a mean-field approximation is that inversion of a requisite spatial precision matrix is avoided. The advantage of a Bayesian framework is that the evidence for different spatial priors can be compared (MacKay, 2003). Other Bayesian approaches to spatial priors in fMRI include those of Gossl et al. (2001); Woolrich et al. (2004); and more recently Flandin and Penny (2007).

There are two main departures from this previous work on spatiotemporal models in the current method. The first is that we use a Gaussian process prior (GPP) over parameter estimates. Spatial correlations are then encoded using a covariance matrix instead of precisions (cf. Penny et al., 2005). The second is that the covariance matrix is the Green's function of a diffusive process, i.e., a diffusion kernel, which encodes the solution of a diffusion equation involving a weighted graph Laplacian. This has the advantage of providing a full spatial covariance matrix and enables inference with regard to the spatial extent of activations. This is not possible using a mean-field approximation that factorizes the posterior distribution over voxels. The result is an adaptive smoothing that can be spatially non-stationary, depending on the data. This is achieved by allowing the local geometry of the parameter field to influence the diffusion kernel (smoothing operator). This is important because stationary smoothing reveals underlying spatial signal at the expense of blurring spatial features. Given the convoluted spatial structure of the cortex and patchy functional segregation, it is reasonable to expect variability in the gradient structure of a parameter field. The implication is that the local geometry of activations should be preserved. This can be achieved with a nonlinear smoothing process that adapts to local geometric ‘features’. A disadvantage is the costly operation of evaluating matrix exponentials and inverting potentially large covariance matrices, which the mean-field approach avoids. However, many approximate methods exist (MacKay, 2003; Rasmussen and Williams, 2006) that can ameliorate this problem, e.g., sparse GPPs (see discussion and Quinonero-Candela and Rasmussen, 2005).

The paper is organized as follows. First, we discuss background and related approaches, before giving an outline of the theory of the method. We start with the model, which is a two-level general linear model (GLM) with matrix-variate density priors on GLM parameters. We focus on reducing the model to the specification of covariance components, in particular, the form of the covariance and its hyperparameters. We then look at the form of the spatial priors using graph Laplacians and diffusion kernels. We then describe the EM algorithm that is used to update the hyperparameters of covariance components, which embody empirical spatial priors. The edge-preserving quality of diffusion over a weighted graph is demonstrated using synthetic data and then applied to real fMRI data. The illustrations in this paper use 2D spatial images; however, the method can be easily extended to 3D, subject to computational resources, which would be necessary to analyze a volume of brain data. We perform a random effects (between subjects) analysis (Penny and Holmes, 2003) on a sample of contrast images from twelve subjects. This means that we consider a scalar field of parameter estimates encoding the population response. However, the nonlinear diffusion kernels described here can be extended to fields of vectors and matrices (ChefD'Hotel et al., 2004; Zhang and Hancock, 2006b). This paper concludes with comments on outstanding issues and future work.


The current work draws on two main sources in the literature; diffusion-based methods in image processing and Gaussian process models (GPM). The image processing community has been using diffusion models for many years, e.g., for the restoration of noisy images (Knutsson et al., 1983). For overviews, from the perspective of scale-space theories, see Romeny (1994, 2003). These models rest on the diffusion equation, which is a nonlinear partial differential equation describing the density fluctuations in an ensemble undergoing diffusion; μ̇ = ∇·D(μ)∇μ, where μ can be regarded as the density of the ensemble (e.g., image intensity) and D is the diffusion coefficient. Generally, the diffusion coefficient depends on the density; however, if D is a constant, the equation reduces to the ‘classical heat equation’; μ̇ = D∇²μ, where ∇² ≡ Δ is the Laplacian operator (second-order spatial derivative). A typical use in image processing is to de-noise an image, where the noisy image is the initial condition, μ(t = 0), and a smoothed, de-noised image is the result of integrating the heat equation to evaluate the diffused image at some time later; μ(t). In particular, Perona and Malik (1990) used nonlinear diffusion models to preserve the edges of images using an image-dependent diffusion term, D = D(∇μ). The dependence on this spatial gradient has the effect of reducing diffusion over regions with high gradient, i.e., edges. Later formulations of nonlinear diffusion methods include those of Alvarez et al. (1992) and Weickert (1996). Of particular relevance to the method presented here are graph-theoretic methods, which use graph Laplacians (Chung, 1991). These have been used recently to adaptively smooth scalar, vector and matrix-valued images (Zhang and Hancock, 2005). Graphical methods provide a general formulation on arbitrary graphs, which is easy to implement. There are also many useful graph-based algorithms in the literature, e.g., image processing on arbitrary graphs (Grady and Schwartz, 2003) and, more generally, graph partitioning to sparsify and solve large linear systems (Spielman and Teng, 2004).
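For intuition, the following sketch implements the classic explicit Perona–Malik scheme in Python (NumPy). It is a minimal illustration, not code from the paper: the conduction function exp(−(|∇μ|/κ)²), the step size and the reflecting boundaries are standard but arbitrary choices.

```python
import numpy as np

def perona_malik(img, n_iter=100, kappa=0.1, dt=0.2):
    """Explicit scheme for Perona-Malik diffusion, mu_dot = div(D(grad mu) grad mu),
    with gradient-dependent conduction D = exp(-(|grad mu|/kappa)^2).
    dt <= 0.25 keeps the 2D explicit update stable."""
    u = img.astype(float)
    for _ in range(n_iter):
        p = np.pad(u, 1, mode='edge')            # reflecting boundaries
        # differences to the four nearest neighbours
        dN = p[:-2, 1:-1] - u
        dS = p[2:, 1:-1] - u
        dW = p[1:-1, :-2] - u
        dE = p[1:-1, 2:] - u
        # conductivities shrink where gradients are large, preserving edges;
        # with constant conduction this reduces to the classical heat equation
        u = u + dt*sum(np.exp(-(d/kappa)**2)*d for d in (dN, dS, dE, dW))
    return u
```

With κ large the conduction is effectively constant everywhere and the scheme reduces to linear (Gaussian) smoothing.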

Gaussian process models also have a long history. A Gaussian process prior (GPP) is a collection of random variables, any finite number of which have a joint Gaussian distribution (MacKay, 2003; Rasmussen and Williams, 2006). As such, it is completely specified by a mean and covariance function. This is a very flexible prior, as it is a prior over functions and can be used to model general data, not just images. A function over space is assumed to be a sample from a Gaussian random field specified by a mean and covariance, which can take many forms, provided the covariance is positive semi-definite.

Diffusion methods in image processing and covariance functions in GPMs both furnish the basis of a spatial smoothing operator; however, the emphasis of each approach is different. One main difference is that a GPM is a statistical model from which inferences and predictions can be made (MacKay, 1998). The objective is not solely to smooth data, but to estimate an optimal smoothing operator, which is embedded in a model of how the data were generated. Graphical models in machine learning (Jordan, 1999) provide a general and easy formulation of statistical models. The similar benefits of graph-based diffusion methods in image processing further motivate the use of graph-theoretic approaches to represent and estimate statistical images, given functional brain data.

The relation between models of diffusion and GPPs is seen when considering a random variable as a diffusive process, which locally is a Gaussian process. We can see this by comparing the Green's function of the classical heat equation, used in early diffusion methods in image processing (Romeny, 1994), and the squared exponential (SE) covariance function used in GPMs (Rasmussen and Williams, 2006). In two dimensions, with locations u_k and u_l in the domain and scalar D;

$$\begin{aligned}
\dot{\mu} &= D\nabla^2\mu\\
\mu(t+\tau) &= \int K(u_k, u_l; \tau)\,\mu(u_l, t)\,du_l\\
\mu(t+\tau) &= \exp(-DL\tau)\,\mu(t)\\
K(u_k, u_l; \tau) &= \frac{1}{4\pi D\tau}\exp\!\left(-\frac{\|u_k - u_l\|^2}{4D\tau}\right)
\end{aligned} \tag{1}$$

where K(τ) is the Green's function (solution) of the diffusion equation that represents the evolution of a solution over time. The first line is the special case of a constant diffusion coefficient. The solution of this equation is given in the second and third lines, where the image at time t, μ(t), is propagated to t + τ by convolution with the Green's function, or practically by the matrix–vector product using the matrix exponential of the scaled discrete Laplacian. This Green's function is Gaussian with variance 2Dτ, meaning that the image at t + τ is a smoothed version of μ(t). This is shown explicitly in the last line¹, which has the same form as the SE covariance function, given below, where the squared characteristic length scale is σ² = 2Dτ. Typically, a GPP has an additional scale hyperparameter to give


$$k(u_k, u_l) = \upsilon\exp\!\left(-\frac{\|u_k - u_l\|^2}{2\sigma^2}\right) \tag{2}$$

where λ = (υ, σ). A zero-mean GPP is then specified, at a set of locations, by the multivariate density, μ ~ N(0, K) (Rasmussen and Williams, 2006). In what follows, we use a diffusion kernel as the covariance of a GPP. This is a spatial prior on model parameter images.
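This correspondence is easy to check numerically. The sketch below is a minimal example, assuming a 1D chain with unit spacing and D = 1; it compares a row of the discrete kernel exp(−Lτ) with the continuum Green's function, a Gaussian with variance σ² = 2τ, and the two agree closely away from the boundaries.

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import norm

N, tau = 201, 10.0
W = np.diag(np.ones(N - 1), 1); W = W + W.T   # chain adjacency
L = np.diag(W.sum(1)) - W                     # unweighted graph Laplacian
K = expm(-tau*L)                              # discrete diffusion kernel
x = np.arange(N) - N//2
se = norm.pdf(x, scale=np.sqrt(2*tau))        # continuum Green's function
print(np.abs(K[N//2] - se).max())             # small (~1e-3 here)
```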

There are a number of papers applying methods from image processing to anatomical and functional brain images. These include those of Gerig et al. (1992), who applied nonlinear diffusion methods to MRI data and Chung et al. (2003) who used the Laplace–Beltrami operator (a generalization of the heat equation to a Riemannian manifold) in a statistical approach to deformation-based morphometry. Nonlinear diffusion methods have been used to adaptively smooth functional images (Kim and Cho, 2002; Kim et al., 2005). Other approaches to adaptive analysis of fMRI include those of Cosman et al. (2004); Friman et al. (2003); and Teo et al. (1997). Graph-theoretic approaches to image processing have been used to regularize diffusion tensor images (DTI) (Zhang and Hancock, 2006a). These authors used a weighted graph Laplacian, which is a discrete analogue of the Laplace–Beltrami operator, to adaptively smooth over a field of diffusion tensors, thereby preserving boundaries between regions, e.g., white matter tracts and grey matter.

The contribution of our work is to combine graph-theoretic methods from image processing and Gaussian process models from machine learning to provide a spatial model of fMRI data. We are essentially constructing a manifold out of the parameter estimates of a linear model of fMRI data and performing isotropic diffusion on the induced manifold, which is anisotropic from the perspective of the domain. In other words, the diffusion is isotropic on the sub-manifold (that represents the surface of the image) embedded in anatomical–feature space (see Fig. 1 ), which is anisotropic in anatomical space. This is somewhat related to the random field approach (Worsley et al., 1999), where isotropic smoothing is attained by smoothing along an induced manifold. In our application we use anisotropic diffusion as an empirical spatial prior in a Bayesian setting.

Fig. 1
GLM parameters as a function over physical space: (a) parameter estimates, considered as a function, μ(u1, u2), over a 2D regular mesh, is thought of as a 2D sub-manifold embedded within a 3D space comprising ‘anatomical’ and ‘feature’ ...

The model

In this section, we formulate a two-level GLM in terms of matrix-variate normal densities (Gupta and Nagar, 2000). Our focus is the formulation of this as a multivariate normal model, with emphasis on covariance components and their hyperparameters. We start with a linear model, under Gaussian assumptions, of the form

$$\begin{aligned}
Y &= X\theta + \varepsilon_1 \qquad & p(Y, \theta|X) &= p(Y|X, \theta)\,p(\theta)\\
\theta &= \varepsilon_2 \quad\implies & p(Y|X, \theta) &= \mathrm{MN}\!\left(X\theta,\ S_1\otimes K_1\right)\\
\varepsilon_i &\sim \mathrm{MN}\!\left(0,\ S_i\otimes K_i\right) & p(\theta) &= \mathrm{MN}\!\left(0,\ S_2\otimes K_2\right)
\end{aligned} \tag{3}$$

where the left-hand expressions specify a hierarchical linear model and the right-hand expressions define the implicit generative density in terms of a likelihood, p(Y|X, θ), and prior, p(θ). MN stands for matrix-normal, where the density on a matrix A ∈ ℝ^{r×c}, A ~ MN(M, S⊗K), has a mean, M, of size r × c, with covariances, K and S, of size c × c and r × r, that encode covariance between columns and rows respectively². Here, Y is a T × N data matrix and X is a T × P design matrix with an associated unknown P × N parameter matrix θ.

The errors at both levels have covariance S_i over rows (e.g., time, subjects or regressors) and K_i over columns (e.g., voxels). In this paper the S_i are fixed. Eq. (3) is a typical model used in the analysis of fMRI data comprising T scans, N voxels and P parameters. The addition of the second level places empirical shrinkage priors on the parameters. This model can now be simplified by vectorizing each component using the identity vec(ABC) = (Cᵀ⊗A)vec(B) (see Appendix A and Harville, 1997)
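The vectorization identity is easily verified numerically; note that vec denotes column-wise stacking, which is Fortran-order flattening in NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))
vec = lambda M: M.flatten(order='F')          # column-major vec(.)
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))   # True
```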


$$y = Zw + e_1,\qquad w = e_2,\qquad e_i \sim N(0, \Sigma_i) \tag{4}$$

where y = vec(Y), Z = I_N⊗X, w = vec(θ), e_i = vec(ε_i) and Σ_i = K_i⊗S_i. ⊗ is the Kronecker product of two matrices and I_N is the identity matrix of size N. The unknown covariances Σ(λ)₁ and Σ(λ)₂ depend on hyperparameters, λ. The model parameters and hyperparameters are estimated using expectation maximization (EM) by maximizing the log-marginal likelihood

$$\ln p(y|\lambda) = -\tfrac{1}{2}\left(\ln|\Sigma| + y^T\Sigma^{-1}y + TN\ln 2\pi\right),\qquad \Sigma(\lambda) = \Sigma_1 + Z\Sigma_2 Z^T \tag{5}$$


with respect to the parameters in the E-Step and the covariance hyperparameters in the M-Step. Here, Σ(λ) represents the covariance of the data induced by both levels of the model. The model inversion with EM will be described later (see also Appendices A and B). First, we look at the hyperparameterization of the spatial covariances and the specific forms of K(λ) entailed by Σ_i = K_i⊗S_i.

The priors

In the previous section, we reduced the problem of specifying a linear empirical Bayesian model to the specification of prior covariance components for noise and signal. In this section, we introduce diffusion-based spatial priors and consider adaptive priors that are functions of the parameters. In brief, we will assume the error or noise covariance is spatially unstructured; i.e., Σ₁ = K₁⊗S₁, where K(λ)₁ = υ₁I_N and S₁ = I_T. This means that υ₁ is the error variance. For simplicity, we will assume that this is fixed and the same for all voxels; however, it is easy to specify a component for each voxel, as in conventional mass-univariate analyses.

For the signal, we adopt an adaptive prior using a non-stationary diffusion kernel, which is based on a weighted graph Laplacian (see Chung, 1991 and the next section), L(μ), which is a function of the conditional expectation of the parameters, μ = ⟨w⟩

$$K(\lambda)_2 = \upsilon_2 K_D,\qquad K_D = \exp\!\left(-L(\mu)\tau\right) \tag{6}$$
This means the hyperparameters comprise λ = {υ₁, υ₂, τ}, where the first controls a stationary, independent and identically distributed (i.i.d.) noise component, the second the amplitude of the parameter image and the third its dispersion. The matrix L in Eq. (6) is a weighted graph Laplacian, which is a discrete analogue of the Laplace–Beltrami operator used to model diffusion processes on a Riemannian manifold. The solution of the heat equation is³

$$\dot{\mu} = -L\mu \quad\Rightarrow\quad \mu(t+\tau) = \exp(-L\tau)\,\mu(t) \tag{7}$$




The diffusion kernel, K_D = exp(−Lτ), is the local solution to the heat equation on a graph and corresponds to a symmetric diffusion kernel that encodes the dispersion of μ over a period τ. The diffusion kernel also evolves according to the heat equation

$$\dot{K}_D = -LK_D \tag{8}$$


We use this diffusion kernel as the covariance matrix of a GPM. Generally, the Laplacian is a function, L(μ^(m)), of the current image (of parameter expectations), where the superscript indicates the m-th iteration. In this situation, the Green's function is a composition of local solutions

$$K_D^{(m)} = \exp\!\left(-L(\mu^{(m-1)})\tau\right)K_D^{(m-1)} = \exp\!\left(-L(\mu^{(m-1)})\tau\right)\cdots\exp\!\left(-L(\mu^{(0)})\tau\right) \tag{9}$$


Updating K_D^(m) requires computation of the current Laplacian, L^(m), and a matrix multiplication, both of which are costly operations on large matrices. However, if the Laplacian is approximately constant then K_D^(m) can be evaluated much more simply

$$K_D^{(m)} = \exp(-L\tau)^m = \exp(-mL\tau) \tag{10}$$


This approximation retains the edge-preserving character of the diffusive flow, without incurring the computational cost of re-evaluating the Laplacian. Our experience is that weighted graph Laplacians based on the ordinary least squares (OLS) estimate, μ_ols, give very reasonable results. However, the need to update the Laplacian may arise when the OLS parameter estimate is very noisy. All anisotropic Laplacian priors in this paper are based on μ_ols; however, we have included update equations based on the Baker–Campbell–Hausdorff formula in Appendix D.

In summary, the covariance components and their derivatives are:



$$K_1 = \upsilon_1 I_N,\qquad K_2 = \upsilon_2 K_D \tag{11}$$

$$\frac{\partial K_1}{\partial\upsilon_1} = I_N,\qquad \frac{\partial K_2}{\partial\upsilon_2} = K_D,\qquad \frac{\partial K_2}{\partial\tau} = -\upsilon_2 L K_D$$

The hyperparameters λ = {υ₁, υ₂, τ} are optimized to balance signal and noise and to give the parameter estimates a non-stationary, anisotropic smoothness encoded by the spatial covariance, K₂. In the next section, we review graph Laplacians and the diffusion model in more detail and then conclude with a summary of the EM scheme used for optimization.
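The form of ∂K₂/∂τ follows from differentiating the matrix exponential (L commutes with exp(−Lτ), so the derivative is exact). A quick finite-difference check, using an arbitrary small Laplacian as a stand-in:

```python
import numpy as np
from scipy.linalg import expm

L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])                  # Laplacian of a 3-node path
v2, tau, h = 2.0, 0.5, 1e-6
analytic = -v2 * L @ expm(-tau*L)                # dK2/dtau = -v2 * L * K_D
numeric = v2*(expm(-(tau + h)*L) - expm(-(tau - h)*L))/(2*h)
print(np.abs(analytic - numeric).max())          # ~1e-9
```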

Diffusion on graphs

In this section, we describe diffusion on graphs and illustrate how this furnishes useful spatial priors for parameters at each voxel. This formulation is useful as it is easily extended to vector and matrix-valued images, which will be necessary when modeling a general vector-field of parameter estimates, e.g., when the number of columns of the design matrix is greater than one. We start with some basic graph theory and then discuss diffusion in terms of graph Laplacians. The end point of this treatment is the form of the diffusion kernel, K D, used in the spatial prior of the previous section. We will see that this is a function of the parameters that enables the prior smoothness to adapt locally to non-stationary features in the image of parameter estimates.

GLM parameters as a function on a graph

We consider the parameter estimates as a function on a graph, Γ, with vertices, edges and weights, Γ = (V, E, W). The vertices are indexed 1 to N and pairs are connected by edges, E_kn, where k, n ∈ V. If two vertices are connected, i.e., are neighbors, we write k ~ n. Consider a regular 2D mesh with spatial coordinates u₁ and u₂. Fig. 1a shows a surface plot of OLS parameter estimates of synthetic data described later (see Fig. 4) to illustrate the construction of a weighted graph Laplacian. To simplify the discussion, we concentrate on a small region over a 3 × 3 grid, or stencil (see inset). Pairs of numbers (u₁, u₂) indicate a vertex or pixel location, where each number corresponds to a spatial dimension. The function has a value at each pixel (voxel if in 3D anatomical space) given by its parameter estimate, μ(u), so that three numbers locate a pixel at which a parameter has a specific value: (u₁, u₂, μ(u₁, u₂)). These are coordinates of the parameter estimate at a pixel in Euclidean space, ℝ³, which decomposes into ‘anatomical’ and ‘feature’ space coordinates (lower right of Fig. 1a); in this case of dimension 2 and 1 respectively. The 2D image is considered as a 2D sub-manifold of this 3D embedding space (Sochen et al., 1998), which provides a general framework that is easily extended to 3D anatomical space and feature dimensions greater than one. We represent the k-th pixel by v_k. The distance between two pixels is taken as the shortest distance along the 2D sub-manifold of parameter estimates embedded in ℝ³. This is a geodesic distance between points on the sub-manifold, d_s(v_k, v_n), shown schematically in Fig. 1c between neighboring pixels. The shortest distance is easy to compute for direct neighbors (example shown in red); however, if the stencil were larger then fast marching algorithms (Sethian, 1999) may be used to compute the shortest path between two points on the sub-manifold. Note that the displacement along the feature coordinate is scaled by a_μ, such that if a_μ = 0 then d_s reduces to distance on the 2D domain and is no longer a function of image intensity (see the subsection on special cases). The construction of a weighted graph Laplacian starts by specifying the weights of edges between vertices, w_kn. These are a function of the geodesic distance, d_s(v_k, v_n), and are important for specifying non-stationary diffusion. This is shown in Fig. 1b for the 3 × 3 stencil in Fig. 1a.

Fig. 4
Synthetic data — random effects analysis: (a) twelve samples of the binary image shown in Fig. 3, (b) original image and OLS estimate, (c) estimated conditional means using GSP, EGL and GGL, (d) posterior probability maps at thresholds, p(w > 0.33) > 0.95. ...

Graph Laplacian

As mentioned above, a graph is composed of vertices, edges and weights. Neighboring vertices are encoded by the adjacency matrix, A, with elements

$$A_{kn} = \begin{cases} 1 & k \sim n\\ 0 & \text{otherwise} \end{cases} \tag{12}$$

Weights make up a weight matrix, W, with elements

$$w_{kn} = \begin{cases} \exp\!\left(-d_s(v_k, v_n)^2/\kappa\right) & k \sim n\\ 0 & \text{otherwise} \end{cases} \tag{13}$$
The un-normalized Laplacian of Γ is L = D − W, where D is a diagonal matrix with elements D_kk = Σ_n w_kn, the degree of the k-th vertex. The graph Laplacian is sometimes called the admittance or Kirchhoff matrix. The weights w_kn ∈ [0, 1] encode the relationship between neighboring pixels and are symmetric; i.e., w_kn = w_nk. They play the role of conductivities, where a large value enables flow between pixels. κ is a constant that controls the velocity of diffusion, which we set to one.
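A minimal sketch of the construction for a small image, with a dense Laplacian. Two simplifications are assumptions made here for brevity: a 4-neighbour stencil (the paper uses the 3 × 3, 8-neighbour stencil of Appendix D), and the Gaussian weight form shown above as the decreasing function of geodesic distance.

```python
import numpy as np

def weighted_graph_laplacian(mu, a_mu=1.0, kappa=1.0):
    """Dense L = D - W for a 2D parameter image mu."""
    n1, n2 = mu.shape
    N = n1*n2
    idx = np.arange(N).reshape(n1, n2)
    W = np.zeros((N, N))
    for d1, d2 in [(0, 1), (1, 0)]:              # right and down neighbours
        a, b = idx[:n1 - d1, :n2 - d2], idx[d1:, d2:]
        # squared geodesic step on the embedded surface:
        # unit domain step plus scaled feature displacement
        ds2 = (d1 + d2) + a_mu*(mu[:n1 - d1, :n2 - d2] - mu[d1:, d2:])**2
        w = np.exp(-ds2/kappa)                   # conductivity in [0, 1]
        W[a.ravel(), b.ravel()] = w.ravel()
        W[b.ravel(), a.ravel()] = w.ravel()      # symmetric weights
    return np.diag(W.sum(1)) - W                 # degree matrix minus weights
```

Rows of the returned L sum to zero, and setting a_mu = 0 removes the dependence on image intensity, giving the Euclidean special case discussed below.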

The weights are a function of the distance, d_s(v_k, v_n), on the surface of the function, μ(u), between vertices v_k and v_n. It is this distance that defines the nature of diffusion generated by the weighted graph Laplacian. In brief, we will define this distance to make the diffusion isotropic on the surface of the parameter image. This means, when looked at from above, that the diffusion will appear less marked when the spatial gradients of parameters are high. In other words, diffusion will be attenuated at the edges of regions with high parameter values. More formally, we specify the distance by choosing a map, χ, from the surface of the function μ(u) to an embedding space, the Euclidean space ℝ³. Each space has a manifold and metric, (M, g) and (N, h), respectively (see Appendix C for more details and a heuristic explanation).

$$\chi: M \to N,\qquad \chi(u_1, u_2) = \left(u_1,\ u_2,\ \mu(u_1, u_2)\right) \tag{14}$$

Choosing a metric, H, of the embedding space (see below) and computing the Jacobian, J, we can calculate the induced metric, G, on μ(u) (Sochen et al., 1998). In matrix form, these are

$$H = \begin{pmatrix} a_1 & 0 & 0\\ 0 & a_2 & 0\\ 0 & 0 & a_\mu \end{pmatrix},\qquad J = \frac{\partial\chi}{\partial u} = \begin{pmatrix} 1 & 0\\ 0 & 1\\ \mu_{u_1} & \mu_{u_2} \end{pmatrix} \tag{15}$$

where the a_i are the relative scales among dimensions and derivatives are with respect to physical space; i.e., μ_{u1} = ∂μ/∂u₁, computed using central differences. The induced metric is then

$$G = J^T H J = \begin{pmatrix} a_1 + a_\mu\mu_{u_1}^2 & a_\mu\mu_{u_1}\mu_{u_2}\\ a_\mu\mu_{u_1}\mu_{u_2} & a_2 + a_\mu\mu_{u_2}^2 \end{pmatrix} \tag{16}$$

which is used to calculate distance

$$d_s^2 = du^T G\, du \tag{17}$$

where du = (du₁, du₂)ᵀ. Due to the dependence of the graph Laplacian on the parameters, we write the Laplacian as L = L(∇μ_ols), where the Laplacian is computed using the OLS estimate, μ_ols. As this depends on geodesic distances on the embedded sub-manifold of an image, we call it a geodesic graph Laplacian (GGL). If a_μ = 0 then the Laplacian is based on Euclidean distance in anatomical space. We refer to this as a Euclidean graph Laplacian (EGL) (see also the subsection below on special cases). Note that we have chosen the embedding coordinates and embedding space metric. This is one of the advantages of a geometric formulation, as we could have chosen a non-Euclidean anatomical space, e.g., spherical coordinates to model diffusion on the surface of a sphere (see Sochen et al., 2003; Sochen and Zeevi, 1998). The diffusion kernel can be computed efficiently using an eigenvalue decomposition of the Laplacian.


$$L = V\Lambda V^T \quad\Rightarrow\quad K_D = \exp(-L\tau) = V\exp(-\Lambda\tau)V^T \tag{18}$$

This is a standard method for computing the matrix exponential (Moler and Van Loan, 2003), with the added benefit that knowing the eigensystem simplifies many computations in the algorithm. See Appendix D for more details.
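In code, the symmetric eigendecomposition gives the kernel directly; scipy's general-purpose expm is used below only as a check.

```python
import numpy as np
from scipy.linalg import expm

W = np.diag(np.ones(4), 1); W = W + W.T          # 5-node path graph
L = np.diag(W.sum(1)) - W
tau = 0.8
lam, V = np.linalg.eigh(L)                       # L = V diag(lam) V^T
K = (V * np.exp(-lam*tau)) @ V.T                 # K_D = V exp(-Lambda*tau) V^T
print(np.abs(K - expm(-tau*L)).max())            # ~1e-15
# by-products the EM scheme needs come almost for free:
# ln|K_D| = -tau*lam.sum() and tr(K_D) = np.exp(-lam*tau).sum()
```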

Expectation maximization

Inversion of the multivariate normal model in Eq. (4) is straightforward and can be formulated in terms of expectation maximization (EM). EM entails the iterative application of an E-Step and M-Step (see Appendix A and Friston et al., 2002, 2007 for details). The E-Step evaluates the conditional density of the parameters in terms of their expectation and precision (i.e., inverse covariance); p(w|y, λ) = N(μ, Π⁻¹), where

$$\text{E-Step:}\qquad \Pi = Z^T\Sigma_1^{-1}Z + \Sigma_2^{-1},\qquad \mu = \Pi^{-1}Z^T\Sigma_1^{-1}y \tag{19}$$
The unknown covariances Σ(λ)_i = K_i⊗S_i are functions of covariance hyperparameters, λ. These are estimated by maximizing the log-marginal likelihood, ln p(y|λ), in an M-Step. This involves updating the hyperparameters by maximizing a bound on the log-marginal likelihood or log-evidence

$$F = -\tfrac{1}{2}\left(\ln|\Sigma| + y^T\Sigma^{-1}y + TN\ln 2\pi\right)$$


We update the hyperparameters (indexed by subscripts) using a Fisher-scoring scheme⁴, where Δλ represents the incremental change in λ

$$\begin{aligned}
\text{M-Step:}\qquad \Delta\lambda &= I(\lambda)^{-1}\nabla_\lambda F\\
\frac{\partial F}{\partial\lambda_k} &= -\tfrac{1}{2}\mathrm{tr}\!\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\lambda_k}\right) + \tfrac{1}{2}y^T\Sigma^{-1}\frac{\partial\Sigma}{\partial\lambda_k}\Sigma^{-1}y\\
I_{kl} &= -\left\langle\frac{\partial^2 F}{\partial\lambda_k\,\partial\lambda_l}\right\rangle = \tfrac{1}{2}\mathrm{tr}\!\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\lambda_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\lambda_l}\right)
\end{aligned}$$

I(λ) is the expected information matrix (see Wand, 2002), with elements I_kl, where the expectation, ⟨·⟩, is over the marginal likelihood of the data; ∇_λF is the score, i.e., a vector of gradients (k-th element given by ∂F/∂λ_k) with respect to the hyperparameters; and Σ is the current maximum likelihood estimate of the data covariance (see Appendix A). In the examples below, we fix S₁ = I_T and S₂ = 1; this means the only unknown covariances are the K(λ)_i. This scheme is formally identical to classical restricted maximum likelihood (ReML) (see Friston et al., 2007).
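A generic sketch of one Fisher-scoring update, written with dense solves for clarity; the function name and the convention of passing precomputed derivative matrices are illustrative, and a practical implementation would exploit the Kronecker structure described in Appendix A.

```python
import numpy as np

def fisher_scoring_step(y, Sigma, dSigmas):
    """One M-Step update: Delta-lambda = I(lambda)^{-1} grad_lambda F."""
    Si_y = np.linalg.solve(Sigma, y)
    Si_dS = [np.linalg.solve(Sigma, dS) for dS in dSigmas]
    # score: dF/dlambda_k = -1/2 tr(Si dS_k) + 1/2 y' Si dS_k Si y
    score = np.array([-0.5*np.trace(SdS) + 0.5*Si_y @ dS @ Si_y
                      for SdS, dS in zip(Si_dS, dSigmas)])
    # expected information: I_kl = 1/2 tr(Si dS_k Si dS_l)
    info = 0.5*np.array([[np.trace(A @ B) for B in Si_dS] for A in Si_dS])
    return np.linalg.solve(info, score)          # Delta lambda
```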

In summary, to invert our model we simply specify the covariances K(λ)_i and their derivatives, ∂K/∂λ_i. These enter an M-Step to provide ReML or ML-II estimates of the covariance hyperparameters. The K(λ)_i are then used in the E-Step to provide the conditional density of the parameters. E- and M-Steps are iterated until convergence, after which the objective function for the M-Step can be used as an approximation to the model's log-evidence. This quantity is useful in model comparison and selection, as we will see later when comparing models based on different spatial priors.

We now have all the components of a generative model (shown schematically in Fig. 2 ) that, when inverted, furnishes parameter estimates that are adaptively smooth, with edge preserving characteristics. Furthermore, this smoothing is chosen automatically and optimizes the evidence of the model. Before applying this scheme to synthetic and real data we will consider some special cases that will be compared in the final section.

Fig. 2
Generative model: single observations of an image are generated from the GPP, p(vec(θ)|λ) = N(0, Σ2), parameterized by υ2,τ. Multiple observations are collected in the matrix Y. X is the design matrix ...

Special cases

Linear diffusion: aμ = 0

If we set the scale of the parameter dimension a μ  = 0 (see Eq. (16) and Fig. 1c) we recover linear diffusion. The Laplacian (EGL) is now independent of μ. In this case edges are not preserved by the smoothness prior. Although these kernels are not the focus of this paper they are still useful and, as we demonstrate later, produce compelling results compared to non-spatial priors.

Global shrinkage priors: KD = I

If we removed diffusion by setting the Laplacian to zero then K 2  =  υ 2 I N. This corresponds to global or spatially independent (shrinkage) priors (GSP) of the sort used in early posterior probability maps using empirical Bayes (Friston and Penny, 2003). Here, we use the variability of parameter estimates over pixels to shrink their estimates appropriately and provide a posterior or conditional density.

Ordinary least squares estimate: K2 = 0

The OLS estimate obtains when we remove the empirical priors completely by setting K 2  = 0.

Synthetic data: de-noising an image

In this section, we apply the algorithm to one image, i.e., T = 1, to demonstrate edge preservation, provide some intuition as to how this is achieved using a diffusion kernel based on a GGL, and compare it to an EGL. We use the synthetic data shown in Fig. 3. The model is

$$y = w + e_1,\qquad w = e_2,\qquad e_1 \sim N(0, \upsilon_1 I_N),\qquad e_2 \sim N(0, \upsilon_2 K_D)$$
Fig. 3
Synthetic data — de-noising an image: (a) a binary image with added noise is shown in the central panel (see grey-scale for pixel values). GPMs using diffusion kernels from EGL and GGL are shown on left and right respectively. On the left, noise ...

The central panel of Fig. 3a contains a binary image of a 2D closed curve (with values equal to one within and on the curve) on a circular background of zeros. Gaussian noise has been added, which we will try to remove using a GPM based on a diffusion kernel. The left panel shows the conditional expectation or mean using a diffusion kernel from an EGL, while the GGL is shown on the right. Below each image is a color bar, which encodes the grey-scale of each pixel in the image. It is immediately obvious that smoothing with an EGL removes noise at the expense of blurring the image, while the GGL preserves edges. This is reflected in the values of the log-marginal likelihood achieved for each prior (see Table 1). We will discuss the ratio of these values in the next section, when we consider model comparison.

Table 1
Model comparison for synthetic data shown in Fig. 3: fixed parameters and log evidence (natural logarithm) for EGL and GGL (difference shown in parentheses)

Fig. 3b shows contours of local diffusion kernels at three locations in the image, where the local diffusion kernel at the k-th pixel is a 2D image reconstructed from the appropriate row of K₂. These are superimposed on the smoothed image. Locations include (i) within the perimeter of the object, (ii) near the edge inside the perimeter of the object and (iii) outside the perimeter of the object. The EGL is on the left, where local kernels are the same throughout the image. This is not the case for the GGL on the right. The EGL smoothes the image isotropically, i.e., without a preferred direction. On the right, local kernels within and outside the perimeter of the object are different, depending on their relation to the edge of the central object. The contours of kernels within the perimeter spread into the interior of the object, but stop abruptly at the edge, encoding the topology of the surface, much like the contours on a geographic map. As a result, the image is smoothed such that noise is removed, but not at the expense of over-smoothing the edges of signal. This is shown in Fig. 3c for a cross-section at the level indicated by a dotted line in Fig. 3b, showing the original binary image, the noisy image and smoothing with the EGL and GGL.

Further intuition comes from considering the induced metric tensor, G; consider the square root of its determinant, √det(G), which is the ratio of surface and domain areas. An area element on the surface of μ(u) is dA_S = √det(G) du₁du₂, while one on the domain is dA_D = du₁du₂. This gives a ratio dA_S/dA_D = √det(G), which can be calculated at each location of the image and is referred to as the magnification factor (Bishop, 1999). This provides a scalar value at each pixel that represents a salient difference between the sub-manifold of the function and the flat surface of the domain. This is shown in Fig. 3d. Flat regions have a ratio of about one, while at edges it is greater than unity. High values correspond to locations where the distance on μ(u) between adjacent pixels (see Fig. 1) is large; i.e., at an edge, where gradients are large. This results in a small weight across the edge connecting these pixels and reduced flow. The effect is that regions with large gradients have less smoothing. As large gradients are a feature of edges, this means that they are preserved. To highlight the anisotropic nature of the ensuing diffusion, we have superimposed ellipses representing the orientation and magnitude (eigenvectors and eigenvalues respectively) of G at a selection of different locations. Red and blue ellipses represent √det(G) greater and lower than 2, respectively. It can be seen that the metric tensor is aligned with the edge of the central figure and isotropic elsewhere. This leads to preferential diffusion within the image and edge preservation.
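With a₁ = a₂ = 1, the determinant simplifies to det(G) = 1 + a_μ(μ_{u1}² + μ_{u2}²), so a magnification-factor image can be computed directly from image gradients. A minimal sketch:

```python
import numpy as np

def magnification_factor(mu, a_mu=1.0):
    """sqrt(det(G)) per pixel; ~1 in flat regions, > 1 at edges."""
    mu1, mu2 = np.gradient(mu)                   # central differences
    # det(G) = (1 + a*mu1^2)(1 + a*mu2^2) - (a*mu1*mu2)^2
    #        = 1 + a*(mu1^2 + mu2^2)  when a1 = a2 = 1
    return np.sqrt(1.0 + a_mu*(mu1**2 + mu2**2))
```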

Lastly, we have included a representation of a global property of the graph Laplacian, using the graph plot routine in Matlab (gplot.m) applied to the second and third eigenvectors of the Laplacian, in Figs. 3e and f. This is a standard representation of similarity between vertices on a graph. The second eigenvector is known as the Fiedler vector (Chung, 1991) and is used in graph-partitioning algorithms. The EGL embedding is regular, whereas the GGL embedding illustrates partitioning of the image into two distinct parts: the central region, which represents the central object in the image of Fig. 3a, and the background, which is represented by the periphery of the plot.


In this section, we compare the performance of three different Gaussian process priors used to model the same data. These were global shrinkage priors (GSP) and diffusion kernels from Euclidean (EGL) and geodesic graph Laplacians (GGL).

Synthetic data: random effects analysis

The next simulation is similar to the above, except that now we have twelve samples of the image. Their purpose is to demonstrate posterior probability maps (PPMs) and model selection. The samples, original image, OLS estimate and estimated posterior means are shown in Figs. 4a–c. Fig. 4d compares PPMs using the three different priors. The first observation is enhanced detection of the signal with EGL and GGL compared to GSP. The second is the edge-preserving nature of the GGL. The evidence for each model is shown in Table 2. As expected, given data with edges, the GGL attains the highest evidence. The log-marginal likelihood ratio (natural logarithm) for EGL and GGL was 146, which is very strong evidence in favor of the GGL. A direct comparison of the log-marginal likelihood is possible as the number of hyperparameters is equal for EGL and GGL. Additional penalty terms can be included for model comparison, based on the number of hyperparameters used in a model and their uncertainty (Bishop, 1995). Details of these additional terms are included in Appendix B.

Table 2
Model comparison for synthetic (Fig. 4) and real data (Fig. 5)

Real data: random effects analysis

fMRI data collected from twelve subjects during a study of the visual motion system (Harrison et al., 2007) were used for our comparative analyses. The study had a 2 × 2 factorial design with motion type (coherent or incoherent) and motion speed as the two factors. Single subject analyses were performed, with no smoothing, using SPM2 to generate contrast images of the main effect of coherence. One slice from each of the twelve contrast images is shown in Fig. 5a. These constitute the data, Y, and the design matrix, X = 1, was a column of ones, implementing a single-sample t-test. The aim was to estimate μ(u); the conditional expectation of the main effect of coherent motion as a function of position in the brain. We calculated μ(u) under the different priors above.

Fig. 5
Real data: twelve contrast images of a slice showing bilateral responses in the posterior cingulate gyri (pCG) during a study of coherent motion (Harrison et al., 2007). (a) Twelve samples (b) estimated conditional means using EGL and GGL (left and right). ...

For demonstration purposes we selected a slice of the whole brain volume, which contained punctate responses from the bilateral posterior cingulate gyri (pCG). The conditional expectations under the Laplacian priors are shown in Fig. 5b for EGL and GGL. Although regions of high or low parameter estimates can be distinguished in the EGL analysis (left panel), the borders are difficult to make out. The posterior mean under the GGL is different, with well-delineated borders between regions of high and low coefficients, e.g., increased response in pCG. Activated regions are no longer ‘blobs’. This feature is easily seen in Fig. 5c, with a contour plot of a local kernel superimposed.

Posterior probability maps (Friston and Penny, 2003) of full slices for EGL and GGL are shown in Fig. 5d with thresholds p(w  > 0.5) > 0.95 and close-ups of all three priors in Fig. 5e. The odd-one-out is the model with global shrinkage priors that had only one positive region within the whole slice. Surface plots are shown in Figs. 5f–h and graph embeddings in Figs. 5i and j. Note the vertical axes of surface plots showing largest magnitude for GGL and the degree of shrinkage with GSP.

The log-marginal likelihood for each model is shown in Table 2. The highest log-evidence was for GGL. The difference in log-evidence for the geodesic and Euclidean Laplacian priors was 260. This represents very strong evidence that the data were likely to be generated from a field with spatial covariance given by GGL compared to EGL. This sort of model comparison suggests that the data support the use of adaptive diffusion kernels when modeling spatial covariance of activation effects.


We have outlined a Bayesian scheme to estimate the optimal smoothing of conditional parameter estimates using a Gaussian process model. In this model, the prior covariance uses a diffusion kernel generated by a weighted graph Laplacian.

There are many issues to consider. We have only demonstrated the model using scalar-valued parameter images. A typical GLM of single subject data has a vector of parameters of size P × 1 at each voxel, with a covariance matrix of size P × P, where the design matrix has P columns. This means that there is a vector-field of regression coefficients over a brain volume. The weights of a GGL can be a function of scalars, vectors or matrices, which makes it very flexible. For example, a GGL based on the distance between parameter vectors at different voxels is easily implemented using the scheme presented here. More complex spaces, such as a field of symmetric positive definite (SPD) matrices, used to regularize diffusion tensor images (DTI) (Chefd'hotel et al., 2002; Tschumperle and Deriche, 2003; Zhang and Hancock, 2006a), require methods from Lie group analysis, where the SPD matrices form a sub-manifold of ℝ^{P²}. Matrices can be represented by vectors, and probabilities over such a space can be represented by Gaussian densities (Begelfor and Werman, 2005), which suggests the possibility of using a Gaussian process prior over a spatial distribution of SPD matrices, or a Lie–Gaussian process prior. We have also considered only the simplest noise model in this paper; however, noise models that vary over space, i.e., a heteroscedastic noise process, are also easily formulated using Gaussian process priors (see Chapter 5 of Rasmussen and Williams, 2006; Goldberg et al., 1998; Kersting et al., 2007). A possible use in fMRI is a GPP over autoregressive model coefficients in single subject data-sets, following Penny et al. (2007).

A major computational issue is the time needed to compute the eigensystem of the Laplacian, from which the matrix exponential, inverse, trace and determinant can be computed. The computational complexity scales as N³, which is an issue for large data-sets. We have made no attempt to address this issue here, as our focus was on the combination of graph-theoretic approaches to image processing and spatial GPMs. The time taken to process the 3319 voxels in the random-effects analysis above was about 20 min on a standard personal computer. This has to be reduced, especially if a whole volume is to be processed. An advantage of a geometric formulation of the Laplacian is that 2D coordinates of the cortical surface can be used as the anatomical space; this suggests using a cortical mesh, similar to that used in MEG source reconstruction. The cortical mesh is constructed from anatomical MRI and contains subject-specific anatomical information. A GPP based on such a diffusion kernel provides a way to formulate not only anatomically, but also functionally, informed basis functions, thereby extending the work of Kiebel et al. (2000).

There is a growing literature on sparse GPPs for regression (Lawrence, 2006; Quinonero-Candela and Rasmussen, 2005) used to formulate an approximate instead of a full GPP for use on large data-sets. Alternatively, multi-grid methods may be used (Larin, 2004) or we can utilize the graphical structure of the model and apply graph-theoretic methods to optimally partition a graph into sub-graphs or nested graphs (Tolliver et al., 2005). Recently, the computationally efficient properties of wavelets have been used to adaptively smooth fMRI parameter images (Flandin and Penny, 2007). Diffusion wavelets are an established method for fast implementation of general diffusive processes (Coifman and Maggioni, 2006; Maggioni and Mahadevan, 2006) and suggest an alternative implementation of the current method.

The issue of inverting large matrices is avoided by using the mean-field approximation of Penny et al. (2005). The spatial precision matrix they used is equivalent to the Euclidean graph Laplacian used here. This encodes local information, given by the neighborhood of a vertex, which they use in a variational Bayes scheme to estimate scaling parameters for each regressor, given data. The spatial covariance matrix can be calculated from the matrix exponential of this, which requires consideration when dealing with large data-sets as outlined above. What we get in return is a full spatial covariance that encodes global properties of the graph and the possibility of multivariate statistics over anatomical space.

As we focused on second level (between-subject) analyses the issue of temporal dynamics did not arise. However, for single-subject data this is not the case. A sensible approach would be to use a GLM to summarize temporal features in the signal and adaptively smooth over the vector-field of GLM regression coefficients, as mentioned above. Alternatively, a Kalman filter approach could be used; however, this may be more appropriate for EEG/MEG. The resulting algorithm would have a GPP for spatial signal with temporal covariance constrained by the Kalman filter.

The application of the current method to random-effects analysis was for demonstration purposes only. The method may also be useful in modeling functional and structural data from lesion studies, retinotopic mapping in fMRI, high-resolution fMRI and diffusion tensor imaging (Faugeras et al., 2004; Zhang and Hancock, 2006a).


Acknowledgements

The Wellcome Trust and the British Council funded this work.


Footnotes

¹ Element-wise exponential, as opposed to the matrix exponential used in Eq. (6).

² This means that the vectorized random matrix has a multivariate normal density, given by vec(Aᵀ) ~ N(vec(Mᵀ), S⊗K) and vec(A) ~ N(vec(M), K⊗S).

³ Minus sign used by convention.

⁴ This is equivalent to a Newton step, but using the expected curvature as opposed to the local curvature of the objective function.

Appendix A. Linear algebra for the EM scheme

This appendix provides notes on the linear algebra used to compute the gradients and curvatures necessary for the EM scheme in the main text. They are not necessary to understand the results presented above but help optimize implementation.

We require the bound on the log-marginal likelihood, ln p(y|λ)

$$F = -\tfrac{1}{2}\left(\ln|\Sigma| + y^T\Sigma^{-1}y + TN\ln 2\pi\right),\qquad \Sigma(\lambda) = \Sigma_1 + Z\Sigma_2 Z^T,\qquad \Sigma_i = K_i\otimes S_i \tag{A.1}$$

and its derivatives. The first term of A.1 is

$$\ln|\Sigma| = \ln|\Sigma_1| + \ln|\Sigma_2| + \ln|\Pi| \tag{A.2}$$

(see the Appendix of Rasmussen and Williams, 2006). This can be reduced further using

$$\ln|\Sigma_i| = \ln|K_i\otimes S_i| = \mathrm{rank}(S_i)\ln|K_i| + \mathrm{rank}(K_i)\ln|S_i| \tag{A.3}$$

The second term of A.1 is

$$y^T\Sigma^{-1}y = \hat{e}_1^T\Sigma_1^{-1}\hat{e}_1 + \mu^T\Sigma_2^{-1}\mu = \mathrm{tr}\!\left(\hat{\varepsilon}_1^T S_1^{-1}\hat{\varepsilon}_1 K_1^{-1}\right) + \mu^T\Sigma_2^{-1}\mu \tag{A.4}$$

where we have used vec(A)ᵀvec(B) = tr(AᵀB) and ε̂₁ = Y − Xθ̂ is the matrix of prediction errors, where μ = vec(θ̂) (see Eq. (19)).

The conditional precision is

$$\Pi = Z^T\Sigma_1^{-1}Z + \Sigma_2^{-1} = K_1^{-1}\otimes X^T S_1^{-1}X + K_2^{-1}\otimes S_2^{-1}$$

Given that the number of columns, P, of the design matrix used in this paper is one (i.e., a scalar field of parameter estimates) and K₂ includes a scale term, υ₂, we can assume XᵀS₁⁻¹X = S₂⁻¹ (note that this is not the case in general). The conditional precision then factorizes and we avoid computing the Kronecker tensor products implicit in the EM scheme. The precision then simplifies to

$$\Pi = \bar{K}\otimes\bar{S},\qquad \bar{K} = K_1^{-1} + K_2^{-1},\qquad \bar{S} = S_2^{-1}$$

However, more generally, i.e., P  > 1

$$\Pi^{-1} = \bar{\Sigma}_1 - \bar{\Sigma}_1 V_2\left(D_2 + V_2^T\bar{\Sigma}_1 V_2\right)^{-1}V_2^T\bar{\Sigma}_1,\qquad \bar{\Sigma}_1 = K_1\otimes\left(X^T S_1^{-1}X\right)^{-1},\qquad V_2 = V_{K_2}\otimes V_{S_2},\qquad D_2 = D_{K_2}\otimes D_{S_2}$$

where we have used the matrix inversion lemma (see below) and the eigenvalue decomposition Σ₂ = V₂D₂V₂⁻¹, with

$$C = A\otimes B = (V_A\otimes V_B)(D_A\otimes D_B)(V_A\otimes V_B)^{-1}$$

where V and D are the eigenvectors and eigenvalues, respectively, of matrices A and B and, in this case, V⁻¹ = Vᵀ. Using S₁ = I_T and S₂ = 1, the conditional mean decomposes into the OLS estimate and a shrinkage term due to the priors

$$\mu = \left(K_1^{-1} + K_2^{-1}\right)^{-1}K_1^{-1}\mu_{ols}$$
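For the simplest case T = P = 1 (de-noising a single image, where μ_ols = y itself), the shrinkage form can be checked against the equivalent GP-regression expression for the posterior mean, using the identity (K₁⁻¹ + K₂⁻¹)⁻¹K₁⁻¹ = K₂(K₁ + K₂)⁻¹, which holds for any symmetric positive-definite K₁, K₂. The covariances below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
K1 = 0.5*np.eye(N)                               # noise covariance, v1*I_N
A = rng.standard_normal((N, N))
K2 = A @ A.T/N + np.eye(N)                       # stands in for v2*K_D
y = rng.standard_normal(N)
K1i, K2i = np.linalg.inv(K1), np.linalg.inv(K2)
mu = np.linalg.solve(K1i + K2i, K1i @ y)         # shrunken OLS estimate
mu_gp = K2 @ np.linalg.solve(K1 + K2, y)         # GP-regression posterior mean
print(np.allclose(mu, mu_gp))                    # True
```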


To compute the derivatives required for the M-Step, we use the matrix inversion lemma (MIL)

$$\Sigma^{-1} = \left(\Sigma_1 + Z\Sigma_2 Z^T\right)^{-1} = \Sigma_1^{-1} - \Sigma_1^{-1}Z\,\Pi^{-1}Z^T\Sigma_1^{-1}$$

and standard results for Kronecker tensor products to show that the score and expected information reduce to

$$\frac{\partial F}{\partial\lambda_k} = -\tfrac{1}{2}\left(\mathrm{tr}(A_k)\mathrm{tr}(B_k) - \mathrm{tr}(C_k)\mathrm{tr}(D_k) - \mathrm{tr}\!\left(\tilde{A}_k A^T\tilde{B}_k^T A\right)\right),\qquad I_{kl} = \tfrac{1}{2}\mathrm{tr}\!\left(\left(A_k\otimes B_k - C_k\otimes D_k\right)\left(A_l\otimes B_l - C_l\otimes D_l\right)\right)$$

The second expression can be simplified further using tr(A⊗B) = tr(A)tr(B). Expressions for Ã_k, B̃_k, A_k, B_k, C_k and D_k are given in Table 3.

Table 3

Expressions used to compute log-marginal likelihood and its derivatives

∂K₁/∂υ₁    S₁
∂K₂/∂υ₂    XS₂Xᵀ
∂K₂/∂τ     XS₂Xᵀ

The remaining entries combine these factors with the error and prior covariances, e.g., K₁⁻¹Ã_kS₁⁻¹, Ã_kK₁⁻¹K̄ and K₁⁻¹Ã_kS₁⁻¹XS̄XᵀS₁⁻¹, where we have used XᵀS₁⁻¹X = S₂⁻¹ (special case where P = 1).

Appendix B. Computing the model evidence; accuracy and complexity

Given A.2 and A.4, we can write the bound on the log-marginal likelihood as

$$F = -\tfrac{1}{2}\left(\ln|\Sigma_1| + \ln|\Sigma_2| + \ln|\Pi| + \hat{e}_1^T\Sigma_1^{-1}\hat{e}_1 + \mu^T\Sigma_2^{-1}\mu + TN\ln 2\pi\right) \tag{A.11}$$
After convergence of the EM scheme, this bound can be used as an approximation to the log-evidence, which can be expressed in terms of accuracy and complexity terms,

$$F = -\tfrac{1}{2}\left(\ln|\Sigma_1| + \hat{e}_1^T\Sigma_1^{-1}\hat{e}_1 + \mathrm{tr}\!\left(Z^T\Sigma_1^{-1}Z\,\Pi^{-1}\right)\right) - \tfrac{1}{2}\left(\ln|\Sigma_2\Pi| + \mu^T\Sigma_2^{-1}\mu + \mathrm{tr}\!\left(\Sigma_2^{-1}\Pi^{-1}\right) - \mathrm{tr}\left(I_{PN}\right)\right) \tag{A.12}$$
where ê₁ = vec(ε̂₁). The first three terms of A.12 represent the accuracy and the remaining terms the complexity. To see the equivalence of A.11 and A.12 more clearly,

$$\begin{aligned}
y^T\Sigma^{-1}y &= y^T\Sigma_1^{-1}y - y^T\Sigma_1^{-1}Z\,\Pi^{-1}Z^T\Sigma_1^{-1}y = y^T\Sigma_1^{-1}\left(y - Z\mu\right)\\
&= \hat{e}_1^T\Sigma_1^{-1}\hat{e}_1 + \mu^T\Sigma_2^{-1}\mu
\end{aligned}$$

where the MIL and Eq. (19) are used in the first line and Eq. (19) again in the second. To see the decomposition into accuracy and complexity terms, consider the likelihood, prior and posterior of the parameters (p(y|w, λ), p(w) and q(w), respectively) and note that

$$\begin{aligned}
F(q, \lambda) &= \int dw\, q(w)\ln p(y|w, \lambda) - \int dw\, q(w)\ln\frac{q(w)}{p(w)} = \left\langle L(w)\right\rangle_{q(w)} - KL\!\left(q(w)\,\|\,p(w)\right)\\
\left\langle L(w)\right\rangle_{q(w)} &= -\tfrac{1}{2}\left(\ln|\Sigma_1| + \hat{e}_1^T\Sigma_1^{-1}\hat{e}_1 + \mathrm{tr}\!\left(Z^T\Sigma_1^{-1}Z\,\Pi^{-1}\right)\right)\\
KL(q\,\|\,p) &= \tfrac{1}{2}\left(\ln|\Sigma_2\Pi| + \mu^T\Sigma_2^{-1}\mu + \mathrm{tr}\!\left(\Sigma_2^{-1}\Pi^{-1}\right) - \mathrm{tr}\left(I_{PN}\right)\right)
\end{aligned}$$

where ⟨L(w)⟩_q(w) is the average log-likelihood under q(w) and the Kullback–Leibler (KL) divergence is a penalty term on the parameters. We have used standard results for Gaussian densities N(w; m, S)

$$KL(q\,\|\,p) = \tfrac{1}{2}\left(\ln\frac{|\Sigma_1|}{|\Sigma_0|} + \mathrm{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right) + \left(\mu_0 - \mu_1\right)^T\Sigma_1^{-1}\left(\mu_0 - \mu_1\right) - PN\right)$$

where the KL divergence is over two Gaussian densities, q(w) ~ N(μ₀, Σ₀) and p(w) ~ N(μ₁, Σ₁), i.e., the posterior and prior densities over the parameters.
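Because the model is linear-Gaussian, the bound is tight at the exact posterior: ⟨L(w)⟩ − KL equals the log-marginal likelihood once the TN ln 2π constant from A.1 is carried in the expected log-likelihood. A toy numerical check, with arbitrary small dimensions and P = 1:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
T, N = 3, 4
X = rng.standard_normal((T, 1))
Z = np.kron(np.eye(N), X)                        # Z = I_N kron X
Sig1 = np.kron(0.7*np.eye(N), np.eye(T))         # Sigma_1 = K1 kron S1
A = rng.standard_normal((N, N))
Sig2 = A @ A.T/N + np.eye(N)                     # Sigma_2 (P = 1)
y = rng.standard_normal(T*N)
# exact posterior, Eq. (19)
Pi = Z.T @ np.linalg.solve(Sig1, Z) + np.linalg.inv(Sig2)
Pinv = np.linalg.inv(Pi)
mu = np.linalg.solve(Pi, Z.T @ np.linalg.solve(Sig1, y))
e1 = y - Z @ mu
# expected log-likelihood under q (with the 2*pi constant included)
avgL = -0.5*(np.linalg.slogdet(2*np.pi*Sig1)[1] + e1 @ np.linalg.solve(Sig1, e1)
             + np.trace(Z.T @ np.linalg.solve(Sig1, Z) @ Pinv))
KL = 0.5*(np.linalg.slogdet(Sig2 @ Pi)[1] + mu @ np.linalg.solve(Sig2, mu)
          + np.trace(np.linalg.solve(Sig2, Pinv)) - N)
logml = multivariate_normal(np.zeros(T*N), Sig1 + Z @ Sig2 @ Z.T).logpdf(y)
print(avgL - KL - logml)                         # ~0
```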

In practice, optimization of the non-negative scale parameters in the M-Step uses the transform γ_i = ln λ_i. The derivatives in Table 3 are then ∂K/∂γ_i = λ_i ∂K/∂λ_i. Under this change of variables, the hyperparameters have non-informative log-normal hyperpriors. Uncertainty about the hyperparameters can be included in the log-evidence for any model m. For example, the approximate log-evidence including uncertainty about one hyperparameter is

$$\ln p(y|m) \approx \ln p(y|\gamma, m) - \tfrac{1}{2}\ln\!\left(-\frac{\partial^2 F}{\partial\gamma^2}\right) = \ln p(y|\gamma, m) - \tfrac{1}{2}\ln I$$

where the second-order derivative is the (negative) expected information used in the M-Step. See Friston et al. (2007) for details.

Appendix C. Metrics on manifolds

The intuition behind the induced metric comes from considering Pythagoras' theorem (see Fig. 1c, inset):

$$ds^2 = du^2 + a\,d\mu^2$$
More formally, consider a one-dimensional curve embedded in two-dimensional Euclidean space. A map from one manifold, (M, g), to another, (N, h), where G and H are the metrics associated with each respectively, is

$$\chi: M \to N,\qquad \chi(u) = \left(\chi_1(u), \chi_2(u)\right) = \left(u, \mu(u)\right)$$

where u is a local coordinate on the curve and χ₁ and χ₂ are coordinates in the embedding space. A distance d_s on the curve in terms of du is given by

$$H = \begin{pmatrix} 1 & 0\\ 0 & a \end{pmatrix},\qquad J = \frac{\partial\chi}{\partial u} = \begin{pmatrix} 1\\ \mu_u \end{pmatrix}$$

$$G = J^T H J = \begin{pmatrix} 1 & \mu_u \end{pmatrix}\begin{pmatrix} 1 & 0\\ 0 & a \end{pmatrix}\begin{pmatrix} 1\\ \mu_u \end{pmatrix} = 1 + a\mu_u^2,\qquad ds^2 = G\,du^2$$

where a is the relative scale between the domain and feature coordinates and G is the induced metric, i.e., the metric on the curve.

Appendix D. Computing the graph Laplacian

We assemble the graph Laplacian using a 3 × 3 stencil with 8 nearest neighbors. See Fig. 1, which shows how the distance between function values (the parameter image) at different points on the embedded sub-manifold is represented by a graph with edge weights, w_kn. Computing the eigensystem of the graph Laplacian simplifies many computations; for example

$$\begin{aligned}
L &= V\Lambda V^T\\
\mathrm{diag}(\Lambda) &= (\lambda_0, \lambda_1, \ldots, \lambda_{N-1}),\quad \lambda_i \ge 0\\
K &= V\exp(-\Lambda\tau)V^T\\
K^{-1} &= V\exp(\Lambda\tau)V^T\\
|K| &= \prod_{i=0}^{N-1}\exp(-\lambda_i\tau)\\
\mathrm{tr}(K) &= \sum_{i=0}^{N-1}\exp(-\lambda_i\tau)
\end{aligned}$$

The third line affords a way to compute the matrix exponential (Moler and Van Loan, 2003). The issue with updating the graph Laplacian at each iteration, based on the posterior mean of the parameters, is that it entails a composition of non-commuting matrices. An approximation is possible using a truncation of the Baker–Campbell–Hausdorff (BCH) formula (Rossmann, 2002)

$$\exp\!\left(-L^{(m)}\tau\right)\exp\!\left(-L^{(m-1)}\tau\right) \approx \exp\!\left(-\left(L^{(m)} + L^{(m-1)}\right)\tau + \tfrac{\tau^2}{2}\left[L^{(m)}, L^{(m-1)}\right]\right)$$
where we have used the first three terms of the BCH formula

$$\exp(A)\exp(B) = \exp\!\left(A + B + \tfrac{1}{2}[A, B] + \ldots\right)$$

and [A, B] is the commutator of the operators A and B, given by [A, B] = AB − BA.


References

Alvarez L., Lions P.L., Morel J.M. Image selective smoothing and edge-detection by nonlinear diffusion: 2. SIAM J. Numer. Anal. 1992;29:845–866.
Begelfor E., Werman M. How to put probabilities on homographies. IEEE Trans. Pattern Anal. Mach. Intell. 2005;27:1666–1670.
Bishop C. Oxford University Press; Oxford: 1995. Neural Networks for Pattern Recognition.
Bishop C.M. Latent variable models. In: Jordan M.I., editor. Learning in Graphical Models. MIT Press; Massachusetts: 1999. pp. 371–403.
Chefd'hotel C., Tschumperle D., Deriche R., Faugeras O. Constrained flows of matrix-valued functions: application to diffusion tensor regularization. Comput. Vis. - Eccv. 2002;2350(Pt 1):251–265.
ChefD'Hotel C., Tschumperle D., Deriche R., Faugeras O. Regularizing flows for constrained matrix-valued images. J. Math. Imaging Vis. 2004;20:147–162.
Chung F. American Mathematical Society; Providence, Rhode Island: 1991. Spectral Graph Theory.
Chung M.K., Worsley K.J., Robbins S., Paus T., Taylor J., Giedd J.N., Rapoport J.L., Evans A.C. Deformation-based surface morphometry applied to gray matter deformation. NeuroImage. 2003;18:198–213.
Coifman R.R., Maggioni M. Diffusion wavelets. Appl. Comput. Harmon. Anal. 2006;21:53–94.
Cosman E.R., Fisher J.W., Wells W.M. Exact MAP activity detection in fMRI using a GLM with an Ising spatial prior. Med. Image Comput. Comput.-Assist. Interv. - MICCAI. 2004;3217(Pt 2):703–710.
Faugeras O., Adde G., Charpiat G., Chefd'Hotel C., Clerc M., Deneux T., Deriche R., Hermosillo G., Keriven R., Kornprobst P. Variational, geometric, and statistical methods for modeling brain anatomy and function. NeuroImage. 2004;23:S46–S55.
Flandin G., Penny W.D. Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage. 2007;34:1108–1125.
Friman O., Borga M., Lundberg P., Knutsson H. Adaptive analysis of fMRI data. NeuroImage. 2003;19:837–845.
Friston K.J., Penny W. Posterior probability maps and SPMs. NeuroImage. 2003;19:1240–1249.
Friston K.J., Penny W., Phillips C., Kiebel S., Hinton G., Ashburner J. Classical and Bayesian inference in neuroimaging: theory. NeuroImage. 2002;16:465–483.
Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. Variational free energy and the Laplace approximation. NeuroImage. 2007;34:220–234.
Gerig G., Kubler O., Kikinis R., Jolesz F. Nonlinear anisotropic filtering of MRI data. IEEE Trans. Med. Imag. 1992;11:221–232.
Goldberg P.W., Williams C.K.I., Bishop C. NIPS 10. MIT Press; 1998. Regression with input-dependent noise: a Gaussian process treatment.
Gossl C., Auer D.P., Fahrmeir L. Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics. 2001;57:554–562.
Grady L., Schwartz E.L. Boston University; Boston, MA: 2003. The Graph Analysis Toolbox: Image Processing on Arbitrary Graphs.
Gupta A.K., Nagar D.K. Chapman and Hall/CRC; Boca Raton: 2000. Matrix Variate Distributions.
Harrison L., Stephan K.E., Rees G., Friston K. Extra-classical receptive field effects measured in striate cortex with fMRI. NeuroImage. 2007;34(3):1199–1208.
Harville D. Springer Science+Business Media Inc; New York: 1997. Matrix Algebra from A Statistician's Perspective.
Jordan M.I., editor. Learning in Graphical Models. The MIT press; 1999.
Kersting K., Plagemann C., Pfaff P., Burgard W. Most likely heteroscedastic Gaussian process regression. International Conference on Machine Learning; 2007.
Kiebel S.J., Goebel R., Friston K.J. Anatomically informed basis functions. NeuroImage. 2000;11:656–667.
Kim H.Y., Cho Z.H. Proceedings of the 15th Brazilian Symposium on Computer Graphics and Image Processing. 2002. Robust anisotropic diffusion to produce clear statistical parametric map from noisy fMRI; pp. 11–17.
Kim H.Y., Javier G., Cho Z.H. Robust anisotropic diffusion to produce enhanced statistical parametric map from noisy fMRI. Comput. Vis. Image Underst. 2005;99:435–452.
Knutsson H.E., Wilson R., Granlund G.H. Anisotropic nonstationary image estimation and its applications: 1. Restoration of noisy images. IEEE Trans. Commun. 1983;31:388–397.
Larin M. On a multigrid method for solving partial eigenproblems. Sib. J. Numer. Math. 2004;7:25–42.
Lawrence, N., 2006. Large scale learning with the Gaussian process latent variable model (Technical Report No. CS-06-05. University of Sheffield).
MacKay D.J.C., editor. Introduction to Gaussian Processes, Neural Networks and Machine Learning. Springer; Berlin: 1998.
MacKay D.J.C. Cambridge University Press; Cambridge: 2003. Information Theory, Inference, and Learning Algorithms.
Maggioni M., Mahadevan S. University of Massachusetts; Massachusetts: 2006. A Multiscale Framework For Markov Decision Processes Using Diffusion Wavelets.
Moler C., Van Loan C. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 2003;45:3–49.
Penny W., Holmes A. Random-effects analysis. In: Frackowiak R., Friston K., Frith C., Dolan R., Price C., Zeki S., Ashburner J., Penny W., editors. Human Brain Function. Elsevier Science (USA); San Diego, California: 2003.
Penny W., Kiebel S., Friston K. Variational Bayesian inference for fMRI time series. NeuroImage. 2003;19:727–741.
Penny W.D., Trujillo-Barreto N.J., Friston K.J. Bayesian fMRI time series analysis with spatial priors. NeuroImage. 2005;24:350–362.
Penny W., Flandin G., Trujillo-Barreto N. Bayesian comparison of spatially regularised general linear models. Hum. Brain Mapp. 2007;28:275–293.
Perona P., Malik J. Scale-space and edge-detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 1990;12:629–639.
Quinonero-Candela J.Q., Rasmussen C.E. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005;6:1939–1959.
Rasmussen C., Williams C. The MIT Press; Massachusetts, Cambridge: 2006. Gaussian Processes for Machine Learning.
Romeny B.M.T. Kluwer Academic Publishers; 1994. Geometry-driven Diffusion in Computer Vision.
Romeny B.M.T. Springer; 2003. Front-end Vision and Multi-Scale Image Analysis.
Rossmann W. Oxford University Press; Oxford: 2002. Lie Groups: An Introduction through Linear Groups.
Sethian J.A. Cambridge University Press; Cambridge: 1999. Level Set Methods and Fast Marching Methods Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science.
Sochen N., Zeevi Y.Y. Representation of colored images by manifolds embedded in higher dimensional non-euclidean space. International Conference on Image Processing; 1998.
Sochen N., Kimmel R., Malladi R. A general framework for low level vision. IEEE Trans. Image Process. 1998;7:310–318.
Sochen N., Deriche R., Lucero-Lopez P. The Beltrami flow over implicit manifolds. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV); 2003.
Spielman D.A., Teng S.H. Proceedings of the 36th Annual ACM Symposium on Theory of Computing. 2004. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems; pp. 81–90.
Teo P.C., Sapiro G., Wandell B.A. Creating connected representations of cortical gray matter for functional MRI visualization. IEEE Trans. Med. Imag. 1997;16:852–863.
Tolliver D., Baker S., Collins R. 2005. Multilevel Spectral Partitioning for Efficient Image Segmentation and Tracking.
Tschumperle D., Deriche R. DT-MRI images: estimation, regularization, and application. Comput. Aided Syst. Theor. - Eurocast. 2003;2809:530–541.
Wand M.P. Vector differential calculus in statistics. Am. Stat. 2002;56:55–62.
Weickert, J., 1996. Anisotropic diffusion in image processing.
Woolrich M.W., Jenkinson M., Brady J.M., Smith S.M. Fully Bayesian spatio-temporal modeling of FMRI data. IEEE Trans. Med. Imag. 2004;23:213–231.
Worsley K.J., Andermann M., Koulis T., MacDonald D., Evans A.C. Detecting changes in nonisotropic images. Hum. Brain Mapp. 1999;8:98–101.
Zhang F., Hancock E.R. Image scale-space from the heat kernel. Progr. Pattern Recogn. Image Anal. Applic. Proc. 2005;3773:181–192.
Zhang F., Hancock E.R. Riemannian graph diffusion for DT-MRI regularization. Med. Image Comput. Comput.-Assist. Interv. - MICCAI. 2006;4191(Pt 2):234–242.
Zhang F., Hancock E.R. Smoothing tensor-valued images using anisotropic geodesic diffusion. Struct. Syntact. Stat. Pattern Recogn. Proc. 2006;4109:83–91.