
J R Stat Soc Series B Stat Methodol. Author manuscript; available in PMC 2010 October 25.

Published in final edited form as:

J R Stat Soc Series B Stat Methodol. 2007 November 1; 69(5): 817–838.

doi: 10.1111/j.1467-9868.2007.00612.x

PMCID: PMC2963450

NIHMSID: NIHMS237254

University of Minnesota, Minneapolis, USA


With the ready availability of spatial databases and geographical information system software, statisticians are increasingly encountering multivariate modelling settings featuring associations of more than one type: spatial associations between data locations and associations between the variables within the locations. Although flexible modelling of multivariate point-referenced data has recently been addressed by using a *linear model of co-regionalization*, existing methods for multivariate areal data typically suffer from unnecessary restrictions on the covariance structure or undesirable dependence on the conditioning order of the variables. We propose a class of Bayesian hierarchical models for multivariate areal data that avoids these restrictions, permitting flexible and order-free modelling of correlations both between variables and across areal units. Our framework encompasses a rich class of multivariate conditionally autoregressive models that are computationally feasible via modern Markov chain Monte Carlo methods. We illustrate the strengths of our approach over existing models by using simulation studies and also offer a real data application involving annual lung, larynx and oesophageal cancer death-rates in Minnesota counties between 1990 and 2000.

The last decade has seen an explosion of interest in disease mapping, with recent developments in advanced spatial statistics and increasing availability of computerized geographic information system technology. For example, the databases from the National Center for Health Statistics or from the ‘Surveillance, epidemiology, and end results’ (SEER) programme of the National Cancer Institute, which are publicly available to anyone with a Web browser, provide an enormous supply of georeferenced data.

Disease mapping is an epidemiological technique that is used to describe the geographic variation of disease and to generate aetiological hypotheses about the possible causes for apparent differences in risk. Disease maps are used to highlight geographic areas with high and low incidence or mortality rates of a specific disease, and the variability of such rates over a spatial domain. They can also be used to detect spatial clusters which may be due to common environmental, demographical or cultural effects that are shared by neighbouring regions. However, mapping of crude incidence or mortality rates can be misleading when the population sizes for some of the units are small, resulting in large variability in the estimated rates, and making it difficult to distinguish chance variability from genuine differences. The correct geographic allocation of health care resources would be greatly enhanced by the development of statistical models that allow a more accurate depiction of ‘true’ rates of disease and their relation to explanatory variables (e.g. covariates).

For reasons of confidentiality or practicality, disease incidence or mortality data are often reported as counts or rates at a regional level (county, census tract, zip code, etc.). Conditionally autoregressive (CAR) models have been widely used for disease mapping with such data. They allow the borrowing of strength across regions by using not only the data from a given region but also the data from neighbouring regions. When we have multivariate areal (lattice) data (say, counts of *p* ≥ 2 diseases over the same regions), an obvious first choice would be to use *p* separate univariate CAR models. But correlation across diseases may occur if they share the same set of (spatially distributed) risk factors, or are linked by aetiology, a common risk factor or an affected organ. Moreover, the presence of one disease might encourage or inhibit the presence of another over a region. A multivariate areal model can permit modelling of dependence between those diseases while maintaining spatial dependence between regions. Identifying similar patterns in geographical variation of related diseases in a multivariate way may provide more convincing evidence for any real clustering in the underlying risk than would be available from the analysis of any single disease separately.

Several multivariate areal models have been proposed to date, any of which could be applied to multiple-disease mapping. The primary underlying challenge of multivariate areal modelling is to formulate valid probability models that account for association between different variables (e.g. diseases) *within* areal units along with the spatial association *between* areal units. Although they provide valuable theoretical insight into joint modelling of areal data, current methods often fall short of offering a template that is at once versatile and practical. Mardia (1988) described the theoretical background for multivariate Gaussian Markov random-field specifications, extending the pioneering work of Besag (1974), but used *separable* models that force identical spatial smoothing for all variables. The ‘twofold CAR’ model of Kim *et al.* (2001) offers richer spatial covariance structures for counts of two different diseases over each areal unit, but its extension to the case of more than two variables is unclear. Knorr-Held and Best (2001) developed a latent variable ‘shared component’ model for bivariate disease mapping; here extension to more than two diseases is possible (Held *et al.*, 2005) but can be awkward. Sain and Cressie (2002) discussed a multivariate version of the CAR model that allows for flexible modelling of the spatial dependence structure, the cross-correlations in particular, but may become computationally prohibitive. Carlin and Banerjee (2003) and Gelfand and Vounatsou (2003) developed essentially equivalent multivariate CAR (MCAR) models for hierarchical modelling to include non-separable models, but they left room for further generality in the covariance structures.

Adapting the multivariate point level data approach of Royle and Berliner (1999), Jin *et al.* (2005) proposed a generalized MCAR (GMCAR) model for areal data that formulates the joint distribution for a multivariate Markov random field by specifying simpler conditional and marginal models. These models are computationally efficient and allow sufficient flexibility in the specification of the spatial covariance structure. Indeed, many of the above models arise as special cases of the GMCAR model. However, an inherent problem with these methods is that their conditional specification imposes a potentially arbitrary order on the variables being modelled, as they lead to different marginal distributions depending on the conditioning sequence. This problem is somewhat ameliorated in certain (e.g. medical and environmental) contexts where a *natural* order is reasonable, but in many disease mapping contexts this is not so. Although Jin *et al.* (2005) suggested using model comparison techniques to decide on the proper order of modelling, since all possible permutations of the variables would need to be considered this seems feasible only with relatively few variables. In any case, the principle of choosing between conditioning sequences by using model comparison metrics is itself open to question.

In this paper we develop an order-free framework for multivariate areal modelling that allows versatile spatial structures yet is computationally feasible for many variables. Our approach is based on a *linear model of co-regionalization* (LMC) that has recently been proposed for multivariate point-referenced data (Wackernagel, 2003; Schmidt and Gelfand, 2003; Gelfand *et al.*, 2004). Essentially, the idea is to develop richer spatial association models by using linear transformations of much simpler spatial distributions. In this paper, we apply the LMC approach to the analysis of multivariate areal data, with an eye towards developing models for multiple-disease mapping. In the process, we arrive at a very versatile framework, which encompasses a rich class of MCAR models (including most of the existing models) as special cases. In particular, we consider modelling the annual mortality rates from lung, larynx and oesophageal cancer between 1990 and 2000 in Minnesota counties, a setting in which association would be expected both within and across the areal units.

The format of our paper is as follows. In Section 2, we briefly review the spatial modelling of a single disease and multiple diseases. In Section 3, we introduce new LMC-based ways of multivariate spatial modelling. Bayesian computing issues are discussed in Section 4, whereas Section 5 evaluates our approach in terms of the average mean-square error (AMSE) and the deviance information criterion (DIC) via simulation. Section 6 then illustrates our approach in the aforementioned multiple-cancer data mapping setting. Finally, Section 7 summarizes our findings and suggests avenues for future research in this area.

Disease incidence or mortality data are often reported as counts or rates at a regional level (county, census tract, zip code, etc.) and are called *areal* (or *lattice*) data. Markov random-field models for lattice data are based on the Markov property, where the conditional distribution of a site's response given the responses of all the other sites depends only on the observations in the neighbourhood of this site. In this paper we define the neighbourhood by area adjacency, although other definitions are sometimes used (e.g. regions with centroids within a given fixed distance).

Let *Y _{i}* be the observed number of cases of a certain disease in region *i*, and let *E _{i}* be the corresponding expected number of cases, *i* = 1, …, *n*. A common starting model is

$${Y}_{i}\stackrel{\mathrm{ind}}{\sim}\text{Poisson}\left\{{E}_{i}\mathrm{exp}\left({\mu}_{i}\right)\right\},\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}i=1,\dots ,n,$$

(1)

where ${\mu}_{i}={\mathbf{x}}_{i}^{\prime}\beta +{\varphi}_{i}$. The **x**_{i} are explanatory, region level spatial covariates, having parameter coefficients *β*. The parameter *μ _{i}* represents the log-relative-risk, estimates of which are often based on the departures of observed from expected counts. We place a form of Gaussian Markov random-field model, which is commonly referred to as the CAR prior, on the vector of random effects *ϕ* = (*ϕ*_{1}, …, *ϕ*_{n})′:

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{n}[0,{\left\{\tau (D-\alpha W)\right\}}^{-1}],$$

(2)

where *N _{n}* denotes the *n*-dimensional normal distribution, *α* is a smoothing parameter, *τ* is a precision parameter, *D* = diag(*m*_{i}) is diagonal with *m*_{i} equal to the number of neighbours of region *i*, and *W* is the adjacency matrix of the map (*w*_{ij} = 1 if regions *i* and *j* are neighbours, and *w*_{ij} = 0 otherwise). The prior (2) is equivalent to the set of full conditional distributions

$${\varphi}_{i}\mid {\varphi}_{j},j\ne i,\phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}N\left(\frac{\alpha}{{m}_{i}}\underset{i\sim j}{\Sigma}{\varphi}_{j},\frac{1}{\tau {m}_{i}}\right),\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}i,j=1,\dots ,n,$$

(3)

where *i* ~ *j* denotes that region *j* is a *neighbour* (which is typically defined in terms of spatial adjacency) of region *i*. The CAR structure (2) reduces to the well-known intrinsic CAR (ICAR) model (Besag *et al.*, 1991) if *α* = 1, or an independence model if *α* = 0. The ICAR model induces ‘local’ smoothing by borrowing strength from the neighbours, whereas the independence model assumes independence of spatial rates and induces ‘global’ smoothing. The smoothing parameter *α* in the CAR prior (2) controls the strength of spatial dependence between regions, though it has long been appreciated that a fairly large *α* may be required to deliver significant spatial correlation; see Wall (2004) for recent discussion and exemplification.
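As a concrete check, the full conditionals in expression (3) can be read off the joint precision in expression (2). The sketch below uses a hypothetical four-county ‘path’ map (counties strung in a line) and illustrative values of *α* and *τ*, none of which come from the paper's data:

```python
import numpy as np

# Hypothetical map: four counties in a line, so counties i and i+1 are neighbours.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix
m = W.sum(axis=1)                            # m_i = number of neighbours
D = np.diag(m)
alpha, tau = 0.9, 2.0                        # illustrative smoothing and precision

Q = tau * (D - alpha * W)                    # CAR precision, expression (2)
assert np.all(np.linalg.eigvalsh(Q) > 0)     # proper distribution for this alpha

# Read the full conditional of phi_i off the joint precision:
#   E(phi_i | rest) = -(1/Q_ii) sum_{j != i} Q_ij phi_j,  var = 1/Q_ii.
rng = np.random.default_rng(0)
phi = rng.normal(size=4)
i = 1
mask = np.arange(4) != i
cond_mean = -(Q[i, mask] @ phi[mask]) / Q[i, i]
cond_var = 1.0 / Q[i, i]

# These agree with the closed forms in expression (3):
assert np.isclose(cond_mean, (alpha / m[i]) * (W[i] @ phi))
assert np.isclose(cond_var, 1.0 / (tau * m[i]))
```

The same precision-to-conditional bookkeeping underlies all of the multivariate constructions that follow.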

A similar approach proposes a Gaussian convolution prior for the modelling of the random effects *ϕ*. The random effects *ϕ* are assumed to be the sum of two independent components, one having a Gaussian independence prior and the other a Gaussian ICAR prior (Besag *et al.*, 1991). With such a convolution prior, we may capture the relative contributions of both regionwide heterogeneity and local clustering. Although this method has limitations (see for example Banerjee *et al.* (2004), pages 163–165), since the convolution process priors are among the most widely used we implement this model for our Minnesota cancer data analysis in Section 6.

Now let *Y _{ij}* be the observed number of cases of disease *j* in region *i*, and let *E _{ij}* be the corresponding expected count, *i* = 1, …, *n*, *j* = 1, …, *p*. Analogously to model (1), we assume

$${Y}_{\mathit{ij}}\stackrel{\mathrm{ind}}{\sim}\text{Poisson}\left\{{E}_{\mathit{ij}}\mathrm{exp}({\mathbf{x}}_{\mathit{ij}}^{\prime}{\beta}_{j}+{\varphi}_{\mathit{ij}})\right\},\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}i=1,\dots ,n,\phantom{\rule{1em}{0ex}}j=1,\dots ,p,$$

(4)

where the **x**_{ij} are explanatory, region level spatial covariates for disease *j* having (possibly region-specific) parameter coefficients *β _{j}*.

Carlin and Banerjee (2003) and Gelfand and Vounatsou (2003) generalized the univariate CAR model (2) to a joint model for the random effects *ϕ _{ij}* under a separability assumption, which permits modelling of correlation between the *p* diseases within each areal unit while maintaining spatial dependence across units:

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{\mathit{np}}[0,{\{\Lambda \otimes (D-\alpha W)\}}^{-1}],$$

(5)

where $\varphi ={({\varphi}_{1}^{\prime},\dots ,{\varphi}_{p}^{\prime})}^{\prime},{\varphi}_{j}={({\varphi}_{1j},\dots ,{\varphi}_{\mathit{nj}})}^{\prime}$, Λ is a *p* × *p* positive definite matrix that is interpreted as the non-spatial precision (inverse dispersion) matrix between cancers and ‘⊗’ denotes the Kronecker product. We denote the distribution in expression (5) by MCAR(*α*, Λ). This distribution can be further generalized by allowing different smoothing parameters for each disease, i.e.

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{\mathit{np}}[0,{\left\{\mathrm{diag}({R}_{1},\dots ,{R}_{p})(\Lambda \otimes {I}_{n\times n})\mathrm{diag}{({R}_{1},\dots ,{R}_{p})}^{\prime}\right\}}^{-1}],$$

(6)

where ${R}_{j}{R}_{j}^{\prime}=D-{\alpha}_{j}W$, *j* = 1, …, *p*. We denote the distribution in expression (6) by MCAR(*α*_{1}, …, *α _{p}*, Λ). Note that the off-diagonal block matrices (the cross-covariances between the diseases) are induced implicitly by the *R _{j}* and Λ, rather than being modelled through their own smoothing parameters.
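A minimal numerical sketch of expression (6), again on a hypothetical four-county path map with illustrative Λ and *α*_{j} values, also confirms that (6) collapses to the separable precision of expression (5) when the smoothing parameters coincide:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
n, p = 4, 2
Lam = np.array([[4.0, 1.0],
                [1.0, 3.0]])                 # illustrative non-spatial precision Lambda
a = [0.9, 0.5]                               # disease-specific smoothing parameters

# R_j R_j' = D - a_j W; the Cholesky factor is one valid (non-unique) choice.
R = [np.linalg.cholesky(D - aj * W) for aj in a]
Rblk = np.block([[R[0], np.zeros((n, n))],
                 [np.zeros((n, n)), R[1]]])
Q6 = Rblk @ np.kron(Lam, np.eye(n)) @ Rblk.T  # precision in expression (6)
assert np.all(np.linalg.eigvalsh(Q6) > 0)

# With a common smoothing parameter, expression (6) collapses to the
# separable precision Lambda kron (D - alpha W) of expression (5).
Rc = np.kron(np.eye(p), np.linalg.cholesky(D - a[0] * W))
Q5 = Rc @ np.kron(Lam, np.eye(n)) @ Rc.T
assert np.allclose(Q5, np.kron(Lam, D - a[0] * W))
```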

Recently, Jin *et al.* (2005) developed a more flexible GMCAR model for the random effects *ϕ*. For example, in the bivariate case (*p* = 2), they specified the conditional distribution *ϕ*_{1}|*ϕ*_{2} as *N*[(*η*_{0}*I* + *η*_{1}*W*)*ϕ*_{2}, {*τ*_{1}(*D* − *α*_{1}*W*)}^{−1}], and the marginal distribution of *ϕ*_{2} as *N*[**0**, {*τ*_{2}(*D* − *α*_{2}*W*)}^{−1}], both of which are univariate CAR as in model (2). This formulation yields the models of Kim *et al.* (2001) as a special case and introduces explicit smoothing parameters (*η*_{0} and *η*_{1}) for the cross-covariances, unlike the MCAR models in expression (6), where the cross-covariances are not smoothed explicitly.

Kim *et al.* (2001) and Jin *et al.* (2005) demonstrated that explicit smoothing of the cross-covariances yields better model fits to areally referenced bivariate data. However, to model the random effects *ϕ* with the GMCAR model, we need to specify the order of conditioning, since different orders of conditioning will result in different marginal distributions for *ϕ*_{1} and *ϕ*_{2} and, hence, different joint distributions for *ϕ*. As mentioned in Section 1, in disease mapping contexts a natural conditioning order is often not evident, a problem that is exacerbated when we have more than two diseases. What we seek, therefore, are models that avoid this dependence on conditional ordering, yet are computationally feasible with sufficiently rich spatial structures.

Our primary methodological objective is to formulate MCAR distributions that allow explicit smoothing of cross-covariances while not being hampered by conditional ordering. The most natural model here would parameterize the cross-covariances themselves as *D* − *γ _{ij}W*, instead of using the *R _{j}*-matrices in expression (6). However, directly parameterizing the joint dispersion matrix in this way makes it difficult to guarantee symmetry and positive definiteness; we therefore arrive at this structure (see expression (13) below) through a linear model of co-regionalization.

It is worth pointing out that our use of the LMC here is somewhat broader than usually encountered in geostatistics. In geostatistics we typically transform independent latent effects, which suffices in meeting the primary goal of introducing a different spatial range for each variable. This is akin to introducing different smoothing parameters for each variable and indeed, as we show below in Section 3.2, independent latent effects lead to the MCAR(*α*_{1}, …, *α _{p}*; Λ) model in expression (6). However, to smooth the cross-covariances explicitly with identifiable parameters, we shall relax the independence of latent effects. Still, in our ensuing parameterization, we can derive conditions that yield valid joint distributions. To be precise, let $\varphi ={({\varphi}_{1}^{\prime},\dots ,{\varphi}_{p}^{\prime})}^{\prime}$ be an *np* × 1 vector of random effects, which we model as *ϕ* = (*A* ⊗ *I _{n×n}*)**u**, where *A* is a *p* × *p* non-singular matrix and $\mathbf{u}={({\mathbf{u}}_{1}^{\prime},\dots ,{\mathbf{u}}_{p}^{\prime})}^{\prime}$, with each **u**_{j} = (*u*_{1j}, …, *u _{nj}*)′ an *n* × 1 vector of latent spatial effects.

First, we shall assume that the random spatial processes **u**_{j}, *j* = 1, …, *p*, are independent and identically distributed. Since each spatial process **u**_{j} is a univariate process over areal units, we might adopt a CAR structure (2) for each of them, i.e.

$${\mathbf{u}}_{j}\phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{n}\{0,{(D-\alpha W)}^{-1}\},\phantom{\rule{1em}{0ex}}j=1,\dots ,p.$$

(7)

Since the **u**_{j} are independent of each other, the joint distribution of $\mathbf{u}={({\mathbf{u}}_{1}^{\prime},\dots ,{\mathbf{u}}_{p}^{\prime})}^{\prime}$ is **u** ~ *N _{np}*{**0**, *I _{p×p}* ⊗ (*D* − *αW*)^{−1}}. The linear transformation *ϕ* = (*A* ⊗ *I _{n×n}*)**u** then yields

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{\mathit{np}}\{0,\Sigma \otimes {(D-\alpha W)}^{-1}\},$$

(8)

defining Σ = *AA*′. We denote the distribution in expression (8) by MCAR(*α*, Σ). Note that the joint distribution of expression (8) is identifiable only up to Σ = *AA*′ and is independent of the particular choice of *A*. Thus, without loss of generality, we can specify the matrix *A* as the upper triangular Cholesky decomposition of Σ. The MCAR(*α*, Σ) distribution that is given in expression (8) is exactly the same as the MCAR(*α*, Λ) structure in expression (5), as in Carlin and Banerjee (2003) and Gelfand and Vounatsou (2003), with Σ corresponding to Λ^{−1}. Since *ϕ* = (*A* ⊗ *I _{n × n}*)**u**, posterior computation can proceed on the simpler independent latent vectors **u**_{j} together with the matrix *A*.
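These two facts, that the transformation yields the separable covariance of expression (8) and that the result depends on *A* only through Σ = *AA*′, can be confirmed numerically. The map, *A* and *α* below are all illustrative:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
n, alpha = 4, 0.9
A = np.array([[1.0, 0.5],
              [0.0, 0.8]])                   # illustrative upper triangular A
Sigma = A @ A.T

C = np.linalg.inv(D - alpha * W)             # covariance of each latent u_j
Cu = np.kron(np.eye(2), C)                   # u_1, u_2 i.i.d.
T = np.kron(A, np.eye(n))
Cphi = T @ Cu @ T.T                          # covariance of phi = (A kron I) u

# Matches Sigma kron (D - alpha W)^{-1} ...
assert np.allclose(Cphi, np.kron(Sigma, C))

# ... and depends on A only through Sigma: A P gives the same covariance
# for any orthogonal P.
th = 0.7
P = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
T2 = np.kron(A @ P, np.eye(n))
assert np.allclose(T2 @ Cu @ T2.T, Cphi)
```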

Currently, the WinBUGS package (`http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml`) can fit the MCAR(*α* = 1, Σ) distribution (using its `mv.car` distribution), but not the MCAR(*α*, Σ) distribution. However, through the LMC approach we can still fit the MCAR(*α*, Σ) distribution in WinBUGS by writing *ϕ* = (*A* ⊗ *I _{n×n}*)**u**, placing independent proper CAR priors on the **u**_{j}, and assigning a prior to Σ = *AA*′.

In case 1, we assumed that the random spatial processes **u**_{j}, *j* = 1,…, *p*, were independent and identically distributed. However, it will often be preferable to have *p* different spatial processes. In this subsection, we shall continue to assume that the **u**_{j} are independent, but we relax the assumption that they are identically distributed. Adopting the CAR structure (2), the distribution of **u**_{j} is assumed to be

$${\mathbf{u}}_{j}\phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{n}\{0,{(D-{\alpha}_{j}W)}^{-1}\},\phantom{\rule{1em}{0ex}}j=1,\dots ,p,$$

(9)

where *α*_{j} is the smoothing parameter for the *j*th spatial process. Since the **u**_{j} are independent of each other and *ϕ* = (*A* ⊗ *I _{n×n}*)**u**, the joint distribution of *ϕ* is

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{\mathit{np}}\{0,(A\otimes {I}_{n\times n}){\Gamma}^{-1}{(A\otimes {I}_{n\times n})}^{\prime}\},$$

(10)

where Σ = *AA*′ and Γ is an *np* × *np* block diagonal matrix with *n* × *n* diagonal blocks Γ_{j} = *D* − *α*_{j}*W*, *j* = 1,…, *p*. We denote the distribution in expression (10) by MCAR(*α*_{1},…, *α*_{p}, Σ).

In this case, from the joint distribution in expression (10) it can be seen that different joint distributions of *ϕ*, having different covariance matrices, emerge under different linear transformation matrices *A*. To ensure that *A* is identifiable, we could again specify it to be the upper triangular Cholesky decomposition of Σ, although this might not be the best choice computationally. Through the LMC approach in this case, the distribution in expression (10) is similar to the MCAR(*α*_{1},…, *α*_{p}, Λ) structure (6), which was developed in Carlin and Banerjee (2003) and Gelfand and Vounatsou (2003). All of these have the same number of parameters, and there is no unique joint distribution for *ϕ* with the MCAR(*α*_{1},…, *α*_{p}, Λ) structure, since there is not a unique *R _{j}*-matrix such that ${R}_{j}{R}_{j}^{\prime}=D-{\alpha}_{j}W$.

Again, a valid joint distribution in expression (10) requires *p* valid distributions for the **u**_{j}, i.e. 1/*ξ*_{min} < *α _{j}* < 1, *j* = 1, …, *p*, where *ξ*_{min} is the smallest eigenvalue of *D*^{−1/2}*WD*^{−1/2}.
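The construction of expression (10) and this validity range can be sketched numerically. The four-county path map, *A* and *α*_{j} values below are illustrative; note that the intrinsic limit *α*_{j} = 1 gives a singular precision:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = W.sum(axis=1)
D = np.diag(m)
n = 4

# Eigenvalues of D^{-1/2} W D^{-1/2} determine the valid range for each alpha_j.
Dm12 = np.diag(1.0 / np.sqrt(m))
xi = np.linalg.eigvalsh(Dm12 @ W @ Dm12)
xi_min = xi.min()                            # equals -1 for this bipartite path map

a = [0.2, 0.95]                              # both inside (1/xi_min, 1)
assert all(1.0 / xi_min < aj < 1.0 for aj in a)
for aj in a:
    assert np.all(np.linalg.eigvalsh(D - aj * W) > 0)

# Covariance of expression (10): (A kron I) Gamma^{-1} (A kron I)'.
A = np.array([[1.0, 0.5],
              [0.0, 0.8]])                   # illustrative A with Sigma = A A'
Gamma = np.block([[D - a[0] * W, np.zeros((n, n))],
                  [np.zeros((n, n)), D - a[1] * W]])
T = np.kron(A, np.eye(n))
Cphi = T @ np.linalg.inv(Gamma) @ T.T
assert np.all(np.linalg.eigvalsh(Cphi) > 0)

# At alpha_j = 1 the precision D - W is singular (the intrinsic ICAR limit):
assert np.isclose(np.linalg.eigvalsh(D - W).min(), 0.0, atol=1e-10)
```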

Finally, in this case we shall assume that the random spatial processes **u**_{j} = (*u*_{1j},…, *u _{nj}*)′, *j* = 1, …, *p*, are no longer independent, but instead follow a joint Markov random field with conditional means

$$E({u}_{\mathit{ij}}\mid {u}_{k\ne i,j},{u}_{i,l\ne j},{u}_{k\ne i,l\ne j})={b}_{\mathit{jj}}\underset{k\sim i}{\Sigma}{u}_{\mathit{kj}}\u2215{m}_{i}+\underset{l\ne j}{\Sigma}\left({b}_{\mathit{jl}}\underset{k\sim i}{\Sigma}{u}_{\mathit{kl}}\u2215{m}_{i}\right),$$

and conditional variances var(*u _{ij}* | *u*_{*k*≠*i*,*j*}, *u*_{*i*,*l*≠*j*}, *u*_{*k*≠*i*,*l*≠*j*}) = 1/*m _{i}*. By Brook's lemma, these full conditional distributions yield the joint distribution

$$\mathbf{u}\phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{\mathit{np}}\{0,{({I}_{p\times p}\otimes D-B\otimes W)}^{-1}\},$$

(11)

where *I* is a *p* × *p* identity matrix and *B* is a *p* × *p* symmetric matrix with elements *b _{jl}*, *j*, *l* = 1, …, *p*. For expression (11) to be a valid distribution, *I*_{p×p} ⊗ *D* − *B* ⊗ *W* must be positive definite. Note that we can write

$${I}_{p\times p}\otimes D-B\otimes W={({I}_{p\times p}\otimes D)}^{1\u22152}({I}_{\mathit{pn}\times \mathit{pn}}-B\otimes {D}^{-1\u22152}{\mathit{WD}}^{-1\u22152}){({I}_{p\times p}\otimes D)}^{1\u22152}.$$

Denoting the eigenvalues of *D*^{−1/2 }*WD*^{−1/2} as *ξ _{i}*, *i* = 1, …, *n*, positive definiteness holds if and only if every eigenvalue *ζ*_{j} of *B* satisfies 1 − *ξ _{i}ζ*_{j} > 0 for all *i*; since the largest *ξ _{i}* equals 1 for a connected map, this amounts to requiring 1/*ξ*_{min} < *ζ*_{j} < 1, where *ξ*_{min} is the smallest (negative) eigenvalue.
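This eigenvalue condition is cheap to check numerically. The sketch below uses the hypothetical four-county path map together with the *B* that later appears in the Section 5 simulation:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = W.sum(axis=1)
D = np.diag(m)
Dm12 = np.diag(1.0 / np.sqrt(m))
xi = np.linalg.eigvalsh(Dm12 @ W @ Dm12)     # eigenvalues xi_i

B = np.array([[0.8, 0.4],
              [0.4, 0.1]])                   # symmetric smoothing matrix (Section 5 values)
zeta = np.linalg.eigvalsh(B)                 # eigenvalues zeta_j of B

# I kron D - B kron W is positive definite iff 1 - xi_i * zeta_j > 0 for all i, j.
ok = bool(np.all(1.0 - np.outer(xi, zeta) > 0))
Q = np.kron(np.eye(2), D) - np.kron(B, W)
assert ok == bool(np.all(np.linalg.eigvalsh(Q) > 0))
assert ok
```

Because the *ξ*_{i} depend only on the map, they can be computed once and reused for every candidate *B*, which is what makes this check practical inside an MCMC sampler.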

Model (11) introduces smoothing parameters in the cross-covariance structure through the matrix *B* but, unlike the MCAR models in Sections 3.1 and 3.2, does not have the Σ-matrix to capture non-spatial variances. To remedy this, we model *ϕ* = (*A* ⊗ *I _{n×n}*)**u**, where Σ = *AA*′, so that

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{\mathit{np}}\{0,(A\otimes {I}_{n\times n}){({I}_{p\times p}\otimes D-B\otimes W)}^{-1}{(A\otimes {I}_{n\times n})}^{\prime}\}.$$

(12)

Since *ϕ* = (*A* ⊗ *I _{n×n}*)**u** with **u** distributed as in expression (11), we denote the distribution in expression (12) by MCAR(*B*, Σ).

To see the generality of expression (12), note that the joint distribution of *ϕ* reduces to the MCAR(*α*_{1},…,*α _{p}*,Σ) distribution (10) if *B* = diag(*α*_{1},…,*α _{p}*), and to the MCAR(*α*, Σ) distribution (8) if *B* = *αI*_{p×p}. The matrices *A* and *B* are not, however, jointly identifiable: for any *p* × *p* orthogonal matrix *P*, setting *T* = *AP* gives

$$(A\otimes {I}_{n\times n}){({I}_{p\times p}\otimes D-B\otimes W)}^{-1}{(A\otimes {I}_{n\times n})}^{\prime}=(T\otimes {I}_{n\times n}){({I}_{p\times p}\otimes D-C\otimes W)}^{-1}{(T\otimes {I}_{n\times n})}^{\prime},$$

where *C*=*P*′*BP*. Without loss of generality, then, we can choose the matrix *A* as the upper triangular Cholesky decomposition of Σ.
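The non-identifiability of the pair (*A*, *B*) is easy to exhibit numerically: replacing (*A*, *B*) by (*AP*, *P*′*BP*) for an arbitrary orthogonal *P* leaves the covariance of expression (12) unchanged. The map and parameter values below are illustrative:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
n = 4
A = np.array([[1.0, 0.5],
              [0.0, 0.8]])                   # illustrative A
B = np.array([[0.8, 0.4],
              [0.4, 0.1]])                   # valid smoothing matrix for this map

def cov12(Amat, Bmat):
    """Covariance of expression (12) for given (A, B)."""
    Q = np.kron(np.eye(2), D) - np.kron(Bmat, W)
    T = np.kron(Amat, np.eye(n))
    return T @ np.linalg.inv(Q) @ T.T

th = 0.6
P = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])    # an arbitrary orthogonal matrix
assert np.allclose(cov12(A, B), cov12(A @ P, P.T @ B @ P))
```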

To understand the features of the MCAR(*B*,Σ) distribution (12), we illustrate in the bivariate case (*p*=2). Define

$${\left({\mathit{AA}}^{\prime}\right)}^{-1}={\Sigma}^{-1}=\left(\begin{array}{cc}\hfill {\Lambda}_{11}\hfill & \hfill {\Lambda}_{12}\hfill \\ \hfill {\Lambda}_{12}\hfill & \hfill {\Lambda}_{22}\hfill \end{array}\right),$$

and

$$B={A}^{\prime}\left(\begin{array}{cc}\hfill {\gamma}_{1}{\Lambda}_{11}\hfill & \hfill {\gamma}_{12}{\Lambda}_{12}\hfill \\ \hfill {\gamma}_{12}{\Lambda}_{12}\hfill & \hfill {\gamma}_{2}{\Lambda}_{22}\hfill \end{array}\right)A,$$

where

$$A=\left(\begin{array}{cc}\hfill {a}_{11}\hfill & \hfill {a}_{12}\hfill \\ \hfill 0\hfill & \hfill {a}_{22}\hfill \end{array}\right)$$

and

$$B=\left(\begin{array}{cc}\hfill {b}_{11}\hfill & \hfill {b}_{12}\hfill \\ \hfill {b}_{12}\hfill & \hfill {b}_{22}\hfill \end{array}\right).$$

Note that the *γ*s are not identifiable from the matrix Λ and our reparameterization in terms of *B* must be used to conduct posterior inference on *B* and Λ (see Section 4), from which the cross-covariances may be recovered. The above expression does allow the MCAR(*B*,Σ) distribution (12) to be rewritten as

$$\varphi \phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{N}_{2n}\left\{0,{\left(\begin{array}{cc}\hfill (D-{\gamma}_{1}W){\Lambda}_{11}\hfill & \hfill (D-{\gamma}_{12}W){\Lambda}_{12}\hfill \\ \hfill (D-{\gamma}_{12}W){\Lambda}_{12}\hfill & \hfill (D-{\gamma}_{2}W){\Lambda}_{22}\hfill \end{array}\right)}^{-1}\right\},$$

(13)

which is precisely the general dispersion structure that we set out to achieve.
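The algebra behind expression (13) can be verified numerically: with the reparameterization of *B* given above, the precision of expression (12) equals the block matrix in expression (13). The sketch uses a hypothetical four-county path map; the *A* matches the Section 5 simulation value and the *γ* values are illustrative:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
n = 4
A = np.array([[0.3, 0.1],
              [0.0, 0.3]])                   # upper triangular, Sigma = A A'
Lam = np.linalg.inv(A @ A.T)                 # Lambda = Sigma^{-1}
g1, g2, g12 = 0.8, 0.1, 0.4                  # illustrative smoothing parameters

# Reparameterization of B given in the text:
Mid = np.array([[g1  * Lam[0, 0], g12 * Lam[0, 1]],
                [g12 * Lam[0, 1], g2  * Lam[1, 1]]])
B = A.T @ Mid @ A

# Precision of phi = (A kron I) u under expression (12):
T = np.kron(A, np.eye(n))
Cov = T @ np.linalg.inv(np.kron(np.eye(2), D) - np.kron(B, W)) @ T.T
Qphi = np.linalg.inv(Cov)

# Block precision of expression (13):
Q13 = np.block([[Lam[0, 0] * (D - g1  * W), Lam[0, 1] * (D - g12 * W)],
                [Lam[0, 1] * (D - g12 * W), Lam[1, 1] * (D - g2  * W)]])
assert np.allclose(Qphi, Q13)
```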

To see how the parameters in expression (13) affect smoothing, we obtain the conditional means

$$E({\varphi}_{\mathit{i}1}\mid {\varphi}_{k\ne \mathit{i},1},{\varphi}_{\mathit{i}2},{\varphi}_{k\ne \mathit{i},2})=-\frac{{\Lambda}_{12}}{{\Lambda}_{11}}{\varphi}_{\mathit{i}2}+\frac{1}{{m}_{i}}\underset{k\sim i}{\Sigma}\left\{{\gamma}_{1}{\varphi}_{k1}-{\gamma}_{12}\left(-\frac{{\Lambda}_{12}}{{\Lambda}_{11}}\right){\varphi}_{k2}\right\}$$

and

$$E({\varphi}_{\mathit{i}2}\mid {\varphi}_{k\ne \mathit{i},2},{\varphi}_{\mathit{i}1},{\varphi}_{k\ne \mathit{i},1})=-\frac{{\Lambda}_{12}}{{\Lambda}_{22}}{\varphi}_{\mathit{i}1}+\frac{1}{{m}_{i}}\underset{k\sim i}{\Sigma}\left\{{\gamma}_{2}{\varphi}_{k2}-{\gamma}_{12}\left(-\frac{{\Lambda}_{12}}{{\Lambda}_{22}}\right){\varphi}_{k1}\right\},$$

and the conditional variances $\mathrm{var}({\varphi}_{i1}\mid {\varphi}_{k\ne i,1},{\varphi}_{i2},{\varphi}_{k\ne i,2})={\Lambda}_{11}^{-1}\u2215{m}_{i}$ and $\mathrm{var}({\varphi}_{i2}\mid {\varphi}_{k\ne i,2},{\varphi}_{i1},{\varphi}_{k\ne i,1})={\Lambda}_{22}^{-1}\u2215{m}_{i}$, *i*, *k* = 1, …, *n*. (The conditional moments for special cases such as the separable model (5) arise by simply setting *γ*_{1} = *γ*_{2} = *γ*_{12} = *α*.) We note that −Λ_{12}/Λ_{11} and −Λ_{12}/Λ_{22} appear as regression parameters, regressing one component of *ϕ* at a given site on the other. The spatial smoothing is incorporated by an autoregressive term for each site in the neighbour set, corrected at each site for the regression on the other component. The smoothing parameters are different for each spatial association and cross-spatial association. For neighbouring regions *i* and *k*, expression (13) yields partial correlations between *ϕ*_{i1} and *ϕ*_{k1}, between *ϕ*_{i2} and *ϕ*_{k2}, and between *ϕ*_{i1} and *ϕ*_{k2} that are governed by *γ*_{1}, *γ*_{2} and *γ*_{12} respectively.
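These conditional moments can be checked against the joint precision of expression (13) directly, using the same read-off rule as in the univariate case. The map, Λ and *γ* values below are illustrative:

```python
import numpy as np

# Hypothetical four-county path map.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = W.sum(axis=1)
D = np.diag(m)
n = 4
Lam = np.array([[11.0, -3.5],
                [-3.5, 12.0]])               # illustrative Lambda = Sigma^{-1}
g1, g2, g12 = 0.8, 0.1, 0.4

# Joint precision of expression (13):
Q = np.block([[Lam[0, 0] * (D - g1  * W), Lam[0, 1] * (D - g12 * W)],
              [Lam[0, 1] * (D - g12 * W), Lam[1, 1] * (D - g2  * W)]])

rng = np.random.default_rng(0)
phi = rng.normal(size=2 * n)                 # stacked (phi_1', phi_2')'
i = 1
mask = np.arange(2 * n) != i                 # condition phi_{i,1} on everything else
cm = -(Q[i, mask] @ phi[mask]) / Q[i, i]
cv = 1.0 / Q[i, i]

# Closed forms from the text:
nbr = W[i] > 0                               # neighbours of region i
cm_formula = (-(Lam[0, 1] / Lam[0, 0]) * phi[n + i]
              + (g1 * phi[:n][nbr].sum()
                 + g12 * (Lam[0, 1] / Lam[0, 0]) * phi[n:][nbr].sum()) / m[i])
assert np.isclose(cm, cm_formula)
assert np.isclose(cv, 1.0 / (Lam[0, 0] * m[i]))
```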

Our proposed MCAR(*B*, Σ) model is straightforwardly implemented in a Bayesian framework by using Markov chain Monte Carlo (MCMC) methods. As in Section 3.3, we write *ϕ* = (*A* ⊗ *I*_{n×n})**u**, where $\mathbf{u}={({\mathbf{u}}_{1}^{\prime},{\mathbf{u}}_{2}^{\prime})}^{\prime}$ and **u**_{j} = (*u*_{1j},…, *u*_{nj})′. The joint posterior distribution is

$$p(\beta ,{\sigma}^{2},\mathbf{u},A,B\mid {\mathbf{Y}}_{1},{\mathbf{Y}}_{2})\propto L({\mathbf{Y}}_{1},{\mathbf{Y}}_{2}\mid \mathbf{u},{\sigma}^{2},A)p(\mathbf{u}\mid B)\phantom{\rule{thinmathspace}{0ex}}p\left(B\right)\phantom{\rule{thinmathspace}{0ex}}p\left(\beta \right)\phantom{\rule{thinmathspace}{0ex}}p\left(A\right)\phantom{\rule{thinmathspace}{0ex}}p\left({\sigma}^{2}\right),$$

(14)

where **Y**_{1} = (*Y*_{11}, …, *Y*_{n1})′ and **Y**_{2} = (*Y*_{12}, …, *Y*_{n2})′, and *L*(**Y**_{1}, **Y**_{2}|**u**, σ^{2}, *A*) is the data likelihood.

The second term on the right-hand side of expression (14) is *p*(**u**|*B*) = *N _{np}*{**0**, (*I*_{p×p} ⊗ *D* − *B* ⊗ *W*)^{−1}}, as in expression (11). Updating the elements of *B* directly is awkward, since every Metropolis proposal must be checked to keep *I*_{p×p} ⊗ *D* − *B* ⊗ *W* positive definite.

Instead, here we outline a different strategy to update the matrix *B*. Our approach is to represent *B* by using the spectral decomposition, which we write as *B* = *P*Δ*P*′, where *P* is the corresponding orthogonal matrix of eigenvectors and Δ is a diagonal matrix of ordered eigenvalues, *ζ*_{1},…, *ζ*_{p}. We parameterize the *p* × *p* orthogonal matrix *P* in terms of the *p*(*p* − 1)/2 Givens angles *θ*_{ij} for *i* = 1,…, *p* − 1 and *j* = *i* + 1,…, *p* (Daniels and Kass, 1999). The matrix *P* is written as the product of *p*(*p* − 1)/2 matrices, each associated with a Givens angle. Specifically, *P* = *G*_{12}*G*_{13} … *G*_{1p} … *G*_{(p−1)p}, where *i* and *j* are distinct and *G*_{ij} is the *p* × *p* identity matrix with the *i*th and *j*th diagonal elements replaced by cos(*θ*_{ij}), and the (*i*, *j*)th and (*j*, *i*)th elements replaced by ± sin(*θ*_{ij}). Since the Givens angles *θ*_{ij} are unique with a domain (−*π*/2, *π*/2) and the eigenvalues *ζ*_{j} of *B* are in the range (1/*ξ*_{min}, 1), we put a *U*(−*π*/2, *π*/2) prior on the *θ*_{ij} and a *U*(1/*ξ*_{min}, 1) prior on the *ζ*_{j}. To update the *θ*_{ij} or *ζ*_{j} by using random-walk Metropolis–Hastings steps with Gaussian proposals, we need to transform them to have support equal to the whole real line. A straightforward solution here is to use

$$g\left({\theta}_{ij}\right)=\mathrm{log}\left(\frac{\pi \u22152+{\theta}_{ij}}{\pi \u22152-{\theta}_{ij}}\right),$$

a transformation having Jacobian ${\Pi}_{i=1}^{p-1}{\Pi}_{j=i+1}^{p}(\pi \u22152+{\theta}_{\mathit{ij}})(\pi \u22152-{\theta}_{ij})$. In practice, the *ζ*_{j} must be bounded away from 1 (say, by insisting that 1/*ξ*_{min} < *ζ*_{j} < 0.999, *j* = 1,…, *p*) to maintain identifiability and hence computational stability. In fact, with our approach it is also easy to calculate the determinant of the precision matrix, i.e. $|{I}_{p\times p}\otimes D-B\otimes W|\propto {\Pi}_{i=1}^{n}{\Pi}_{j=1}^{p}(1-{\xi}_{i}{\zeta}_{j})$, where the *ξ*_{i} are the eigenvalues of *D*^{−1/2 }*WD*^{−1/2}, which can be calculated before any MCMC iteration. For the special case of the MCAR(*α*_{1}, …, *α*_{p}, Σ) models, we could assign each *α*_{i} ~ *U*(0, 1), which would be sufficient to ensure a valid model (e.g. Carlin and Banerjee (2002)). However, we also investigated more informative priors on the *α*_{i}, such as the beta(18, 2) prior, which centres the smoothing parameters closer to 1 and leads to greater smoothing.
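The Givens-angle construction and the determinant identity can both be sketched numerically. The angles, eigenvalues and four-county path map below are illustrative (the determinant's proportionality constant is |*D*|^{p}):

```python
import numpy as np

# Build B = P Delta P' from Givens angles and eigenvalues (p = 3 illustration).
p = 3
angles = {(0, 1): 0.3, (0, 2): -0.5, (1, 2): 1.0}   # illustrative Givens angles
P = np.eye(p)
for (i, j), t in sorted(angles.items()):            # P = G_12 G_13 G_23
    G = np.eye(p)
    G[i, i] = G[j, j] = np.cos(t)
    G[i, j], G[j, i] = np.sin(t), -np.sin(t)
    P = P @ G
zeta = np.array([0.9, 0.5, -0.2])                   # eigenvalues of B
B = P @ np.diag(zeta) @ P.T
assert np.allclose(P @ P.T, np.eye(p))              # P is orthogonal
assert np.allclose(np.sort(np.linalg.eigvalsh(B)), np.sort(zeta))

# Determinant identity on a hypothetical four-county path map:
# |I kron D - B kron W| = |D|^p * prod_{i,j} (1 - xi_i * zeta_j).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = W.sum(axis=1)
D = np.diag(m)
xi = np.linalg.eigvalsh(np.diag(1 / np.sqrt(m)) @ W @ np.diag(1 / np.sqrt(m)))
Q = np.kron(np.eye(p), D) - np.kron(B, W)
lhs = np.linalg.det(Q)
rhs = np.linalg.det(D) ** p * np.prod(1.0 - np.outer(xi, zeta))
assert np.isclose(lhs, rhs)
```

Since the *ξ*_{i} are fixed by the map, each MCMC iteration only needs the current *ζ*_{j} to evaluate the determinant, avoiding any *np* × *np* factorization.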

With respect to the prior distribution *p*(*A*) on the right-hand side of expression (14), we can put independent priors on the individual elements of *A*, such as inverse gamma for the square of the diagonal elements of *A* and normal for the off-diagonal elements. In practice, we cannot assign non-informative priors here, since then MCMC convergence is poor. In our experience it is easier to assign a vague (i.e. weakly informative) prior on Σ than to put such priors on the elements of *A* in terms of letting the data drive the inference and obtaining good convergence. Since Σ is a positive definite covariance matrix, the inverse Wishart prior distribution renders itself as a natural choice, i.e. Σ^{−1} ~ Wishart{*ν*, (*νR*)^{−1}} (see for example Carlin and Louis (2000), page 328). Hence, we instead place a prior directly on Σ and then use the one-to-one relationship between the elements of Σ and *A*. Then the prior distribution *p*(A) becomes

$$p\left(A\right)\propto \mid {AA}^{\prime}{\mid}^{-(\nu +4)\u22152}\mathrm{exp}\left[-\frac{1}{2}\mathrm{tr}\left\{\nu R{\left({AA}^{\prime}\right)}^{-1}\right\}\right]\left|\frac{\partial \Sigma}{\partial {a}_{ij}}\right|,$$

where |∂Σ/∂*a*_{ij}| is the Jacobian. For example, when *p* = 2, the Jacobian is $4{a}_{22}^{2}{a}_{11}$. Rather than updating Σ as a block by using a Wishart proposal, updating the elements *a*_{ij} of *A* offers better control. These are updated via a random-walk Metropolis step, using log-normal proposals for the diagonal elements and normal proposals for the off-diagonal elements. With regard to choosing *ν* and *R* in the Wishart{*ν*, (*νR*)^{−1}} prior, since *E*(Σ^{−1}) = *R*^{−1}, if there is no information about the prior mean structure of Σ, a diagonal matrix *R* can be chosen, with the scale of the diagonal elements being judged by using ordinary least squares estimates based on independent models for each response variable. Although this leads to a data-dependent prior, it typically lets the data drive the results, leading to robust posterior inference. In this study we adopt *ν* = 2 (i.e. the smallest value for which this Wishart prior is proper) and *R* = diag(0.1, 0.1). Finally, for the remaining terms on the right-hand side of expression (14), flat priors are chosen for *β*_{1} and *β*_{2}, whereas *σ*^{2} is assigned a vague inverse gamma prior, i.e. an IG(1, 0.01) prior parameterized so that *E*(*σ*^{2}) = *b*/(*a* − 1). In this study, **β** and *σ*^{2} are then updated by using standard Metropolis and Gibbs steps respectively.
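The *p* = 2 Jacobian $4{a}_{22}^{2}{a}_{11}$ can be checked by numerically differentiating the map from the free elements of the upper triangular *A* to those of Σ = *AA*′; the (*a*_{11}, *a*_{12}, *a*_{22}) values below are arbitrary:

```python
import numpy as np

def sigma_vec(a):
    """Free elements (S11, S12, S22) of Sigma = A A' for upper triangular A."""
    a11, a12, a22 = a
    return np.array([a11 ** 2 + a12 ** 2, a12 * a22, a22 ** 2])

a0 = np.array([0.7, 0.3, 0.5])               # illustrative (a11, a12, a22)
eps = 1e-6
J = np.zeros((3, 3))
for k in range(3):                           # central-difference Jacobian
    e = np.zeros(3)
    e[k] = eps
    J[:, k] = (sigma_vec(a0 + e) - sigma_vec(a0 - e)) / (2 * eps)

a11, a12, a22 = a0
assert np.isclose(np.linalg.det(J), 4 * a22 ** 2 * a11, rtol=1e-6)
```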

To evaluate our new approach for modelling multivariate areal data, we begin with some simulation studies. The studies use the spatial lay-out of the 87 counties in the state of Minnesota, which is a fairly typical areal arrangement and the one that is used by our Section 6 data set. We generated count data from a Poisson distribution, as is typical in disease mapping settings:

$${Y}_{ij}\stackrel{\mathrm{ind}}{\sim}\mathrm{Po}\left\{{E}_{ij}\,\mathrm{exp}\left({\mu}_{ij}\right)\right\},\qquad i=1,\dots ,n,\quad j=1,2,$$

(15)

where *Y*_{ij} is the observed number of cases of cancer type *j* (one of two types) in region *i* and *μ*_{ij} = *β*_{j} + *ϕ*_{ij}, with the *β*_{j}s fixed at *β*_{1} = −0.05 and *β*_{2} = −0.01. This set-up mimics the Minnesota oesophageal (relatively low incidence) and lung (higher incidence) cancer incidence data and uses the actual internally standardized expected mortalities *E*_{ij} from our SEER data.
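A minimal sketch of this data-generating mechanism follows. The expected counts and random effects here are placeholders (the studies use the actual SEER *E*_{ij} values and MCAR or Gaussian process draws for *ϕ*):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 87                            # Minnesota counties
beta = np.array([-0.05, -0.01])   # beta_1, beta_2 from the text

# Stand-ins for the internally standardized expected counts E_ij
E = rng.uniform(5.0, 50.0, size=(n, 2))

# Placeholder spatial random effects; the studies draw these from
# MCAR or geostatistical models instead of white noise
phi = 0.3 * rng.standard_normal((n, 2))

mu = beta + phi                   # mu_ij = beta_j + phi_ij
Y = rng.poisson(E * np.exp(mu))   # Y_ij ~ Po{E_ij exp(mu_ij)}
```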

To assess the relative performance of our proposed model, we designed five simulation studies. In study 1, we generate *ϕ*_{1} = (*ϕ*_{11},…, *ϕ*_{1n})′ and *ϕ*_{2} = (*ϕ*_{21},…, *ϕ*_{2n})′ from the MCAR(*B*, Σ) distribution (12), where *D* = diag(*m*_{i}) and the adjacency matrix correspond to the Minnesota county map, and Σ = *AA*′ with

$$A=\left(\begin{array}{cc}0.3 & 0.1\\ 0.0 & 0.3\end{array}\right),$$

which yields a non-spatial correlation of 0.32, and

$$B=\left(\begin{array}{cc}0.8 & 0.4\\ 0.4 & 0.1\end{array}\right).$$

These choices yielded *ϕ*_{ij}s ranging over (−1.5, 1.5), with correlations between the spatial effects ranging from 0.08 to 0.73. In study 2, we generate *ϕ* from the MCAR(*B*, *I*) model (11) retaining the same *B* as in study 1, whereas, in study 3, we generate them from the MCAR(*α*, Σ) model (8) with the same *A* as in study 1. In study 4, we generate *ϕ* from the GMCAR(*α*_{1}, *α*_{2}, *η*_{0}, *η*_{1}, *τ*_{1}, *τ*_{2}) model under the conditioning order *ϕ*_{1}|*ϕ*_{2}. We choose the true parameter values in this GMCAR distribution to be *α*_{1} = 0.1, *α*_{2} = 0.8, *η*_{0} = 0.4, *η*_{1} = 0.3, *τ*_{1} = 10 and *τ*_{2} = 10. Finally, study 5 examines the effect of model misspecification by using a geostatistical model (instead of an MCAR model) as the true random-effect distribution. Here *ϕ*_{1} and *ϕ*_{2} arise from a Gaussian process with exponential covariance functions exp(−0.01*d*_{ij}) and exp(−0.05*d*_{ij}) respectively, where *d*_{ij} is the distance between the centroids of counties *i* and *j*.
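The study 5 generation step might be sketched as follows; random planar coordinates stand in for the true Minnesota county centroids, and a small jitter is added for numerical stability of the Cholesky factorization:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 87
# Hypothetical coordinates standing in for county centroids;
# the actual study uses the Minnesota map.
coords = rng.uniform(0, 400, size=(n, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Independent Gaussian processes with the two exponential covariances
L1 = np.linalg.cholesky(np.exp(-0.01 * d) + 1e-10 * np.eye(n))
L2 = np.linalg.cholesky(np.exp(-0.05 * d) + 1e-10 * np.eye(n))
phi1 = L1 @ rng.standard_normal(n)
phi2 = L2 @ rng.standard_normal(n)
```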

The data that are analysed in this paper and the programs that were used to analyse them can be obtained from `http://www.blackwellpublishing.com/rss`

To evaluate the performance of our proposed model, we simulated *N* = 1000 data sets and fitted several multivariate models to each. In each study of Table 1, model 1 is the MCAR(*B*, Σ) model that is specified by expression (12), whereas model 2 is the MCAR(*B*, *I*) model in expression (11). Model 3 is the MCAR(*α*, Σ) model from Section 3.2 or, equivalently, the MCAR(*α*, Λ) model in expression (5). Models 4 and 5 are the order-specific GMCAR(*α*_{1}, *α*_{2}, *η*_{0}, *η*_{1}, *τ*_{1}, *τ*_{2}) models (Jin *et al.*, 2005) with the conditioning order *ϕ*_{1}|*ϕ*_{2} and the reverse conditioning order *ϕ*_{2}|*ϕ*_{1} respectively. Finally, model 6 is a bivariate independent and identically distributed (IID) model.

AMSE, associated Monte Carlo standard errors se and percentage change in AMSE (Δ) relative to the true model in each simulation study with Poisson responses^{†}

For each data set and model in each study, we first ran a few initially overdispersed parallel MCMC chains, and monitored them by using sample autocorrelations within the chains, cross-correlations between parameters and plots of sample traces. From these, we decided to use 20000 iterations for the preconvergence burn-in period, and then a further 20000 iterations as our ‘production’ run for posterior summarization. Unfortunately, the complexity of model (12) precluded us from using the WinBUGS package, so we instead relied on our own programs written in C and executed in R (`http://www.r-project.org`) using the `.C` function. Random-number generation and posterior summarization were also implemented in R.

To evaluate the relative performance of the models, we compare their average mean-squared errors (AMSEs). Since the true *ϕ*_{ij}-values are known in the simulation, the AMSE for each disease can be estimated as

$${\widehat{\mathrm{AMSE}}}_{j}=\frac{1}{Nn}\sum_{r=1}^{N}\sum_{i=1}^{n}{\left({\widehat{\varphi}}_{ij}^{(r)}-{\varphi}_{ij}\right)}^{2}$$

with associated Monte Carlo standard error estimate

$$\widehat{\mathrm{se}}\left({\widehat{\mathrm{AMSE}}}_{j}\right)=\sqrt{\frac{1}{Nn(Nn-1)}\sum_{r=1}^{N}\sum_{i=1}^{n}{\left\{{\left({\widehat{\varphi}}_{ij}^{(r)}-{\varphi}_{ij}\right)}^{2}-{\widehat{\mathrm{AMSE}}}_{j}\right\}}^{2}},$$

where ${\widehat{\varphi}}_{ij}^{(r)}$ is the posterior mean estimate for disease *j* at county *i* based on the *r*th data set. In our case we have *N* = 1000, *n* = 87 and *j* = 1, 2.
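These two estimators are straightforward to compute from the simulation output. A sketch, with synthetic posterior means standing in for actual MCMC output:

```python
import numpy as np

def amse_and_se(phi_hat, phi_true):
    """AMSE over N replicate data sets and n areas for one disease,
    with its Monte Carlo standard error.
    phi_hat: (N, n) posterior means; phi_true: (n,) true effects."""
    sq_err = (phi_hat - phi_true) ** 2          # (N, n) squared errors
    Nn = sq_err.size
    amse = sq_err.mean()
    se = np.sqrt(((sq_err - amse) ** 2).sum() / (Nn * (Nn - 1)))
    return amse, se

rng = np.random.default_rng(3)
phi_true = rng.standard_normal(87)
# synthetic "posterior means": truth plus N(0, 0.1^2) estimation error
phi_hat = phi_true + 0.1 * rng.standard_normal((1000, 87))
amse, se = amse_and_se(phi_hat, phi_true)       # amse close to 0.01 here
```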

Table 1 gives the estimated AMSE_{j}-values and their associated Monte Carlo standard errors for each disease and each model in each simulation study. The estimated overall AMSE values are calculated by aggregating over the two diseases. Here we also calculate the percentage change in estimated AMSE for each model compared with the true model in each study, i.e. $\Delta =100\,({\widehat{\mathrm{AMSE}}}_{j}^{(k)}-{\widehat{\mathrm{AMSE}}}_{j}^{(\text{true})})/{\widehat{\mathrm{AMSE}}}_{j}^{(\text{true})}$ for models *k* = 1,…,5; negative values would indicate superiority over the true model.

From Table 1, the performance of the MCAR(*B*, Σ) model appears quite impressive. Not surprisingly, in study 1, where it is the true model, it excels over the other MCAR and GMCAR models. In study 2 its performance is highly comparable with that of the true model MCAR(*B*, *I*). In fact, we find a marginally lower AMSE score for *ϕ*_{1} (−0.13%) and a virtually identical overall AMSE (0.80%). The story is similar for study 3, where the MCAR(*α*, *Σ*) model is the true model, and now we find a marginally lower overall AMSE score (−0.40%). In study 4, where the GMCAR model is the true model, it performs a little worse (the overall AMSE score exceeds that of the true model by 1.93%) but is still appreciably better than the other competing models. Finally, in study 5, where all the models are misspecified, the MCAR(*B*, *Σ*) model remains the best, with considerably lower overall AMSE scores and only a hint of competition from the GMCAR (reverse order) model.

The performance of the MCAR(*B*, *I*) model is disappointing. The only study in which it performs competitively is study 2, where it is in fact the true model. In all the remaining studies its overall AMSE scores beat only those of the bivariate IID model and are usually much higher than those of the GMCAR models (especially that having the better ordering) and the MCAR(*α*, *Σ*) model. That this occurs despite the smoothing of the cross-covariances through the matrix *B* is probably a reflection of the model's inadequacy in capturing the scaling of the spatial effects that is offered by Σ. The MCAR(*α*, *Σ*) model, unlike the MCAR(*B*, *I*) model, incorporates only a single smoothing parameter but captures the within-site association between the two diseases through Σ. The MCAR(*α*, *Σ*) model's overall AMSE performance is typically seen to lie between those of the MCAR(*B*, *Σ*) and the better GMCAR model.

The GMCAR model also presents some interesting features. Recall that its parameterization is quite rich and it does allow spatially adaptive smoothing of the variances and cross-covariances. However, its primary drawback lies in its sensitivity to the order of conditioning, and this is apparent in Table 1. Except for study 4, where *ϕ*_{1}|*ϕ*_{2} is the true model, *ϕ*_{2}|*ϕ*_{1} appears to offer much better estimation. In fact, its performance is seen to be only marginally inferior to the MCAR(*B*, *Σ*) model, and its overall AMSE score never exceeds that of the true model by more than 3.81%. It typically beats the MCAR(*α*, *Σ*) model, except in study 3, where the latter is the true model and has an overall AMSE score lower by 3.56%. In study 5, where all the models are misspecified, the better GMCAR model's AMSE score exceeds that of the MCAR(*B*, *Σ*) model by only 3% and is considerably better than all the remaining models.

In summary, although the MCAR(*B*, *Σ*) model consistently produces the lowest AMSE scores in Table 1, the absolute differences in AMSE among most of the models are rather small, of the order of 10^{−2} on our scale. The associated Monte Carlo standard errors of these AMSEs are of the order of 10^{−5}, making these differences highly reliable computationally. However, in disease mapping exercises we would not expect to see practical differences between the models. Rather, the main message here is that, despite the rich parameterization of the MCAR(*B*, *Σ*) model, it does not appear to suffer from excessive overfitting (i.e. there are few negative Δs with respect to the true model) while offering very robust estimation compared with its competitors.

To investigate the predictive performance of our methods, we now turn to the *deviance information criterion* DIC (Spiegelhalter *et al.*, 2002) as our model choice criterion. This criterion is based on the posterior distribution of the *deviance* statistic, *D*(*θ*) = −2 log{*f*(**y**|*θ*)} + 2 log{*h*(**y**)}, where *f*(**y**|*θ*) is the likelihood and *h*(**y**) is a standardizing function of the data alone.

To calculate DIC in our MCMC setting, we need to calculate only the average deviance *D̄* and the deviance of the posterior mean, *D*(*θ̄*); the effective number of parameters is then *p*_{D} = *D̄* − *D*(*θ̄*), and DIC = *D̄* + *p*_{D}. Here *D*(*θ*) is the same for the models that we wish to compare, since they differ only in their random-effect distributions *p*(** ϕ**| ·).
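A minimal sketch of the DIC computation from MCMC output, assuming the log-likelihood has already been evaluated at each posterior draw and at the posterior mean of the parameters; any standardizing term 2 log *h*(**y**) cancels when comparing models and is omitted:

```python
import numpy as np

def dic(log_lik_draws, log_lik_at_post_mean):
    """DIC from MCMC output.
    log_lik_draws: log f(y | theta^(g)) for each posterior draw g;
    log_lik_at_post_mean: log f(y | theta_bar) at the posterior means."""
    D_bar = np.mean(-2.0 * log_lik_draws)   # posterior mean deviance
    D_hat = -2.0 * log_lik_at_post_mean     # deviance at posterior mean
    p_D = D_bar - D_hat                     # effective number of parameters
    return D_bar + p_D, p_D                 # DIC = D_bar + p_D

# toy illustration with two draws: D_bar = 22, D_hat = 18, so p_D = 4
dic_val, p_D = dic(np.array([-10.0, -12.0]), -9.0)   # DIC = 26.0
```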

Table 2 gives the simulated results for *D̄*, *p*_{D} and DIC for the same six models and five studies as were considered in Table 1. In Table 2 we redefine Δ as the absolute (not relative) change when compared with the true model in each case. Thus positive Δ-values indicate poorer fit, larger effective model size and poorer overall model selection scores for the model in question.

Simulated *D̄*, *p*_{D} and DIC-scores and associated standard deviations, along with absolute change (Δ) relative to the true model in each simulation study^{†}

Overall, the results are similar to those which were seen for the mean-squared error, with generally strong support for the MCAR(*B*, *Σ*) model. The large improvements in fit that are brought about by this model are typically not offset by the relatively modest increases in effective model size, leading to significant improvements in overall DIC-score (the ‘sd’ measures in Table 2 are simply sample standard deviations of the MCMC draws themselves, not Monte Carlo standard errors as in Table 1). Not only does it excel in study 1, but it actually outperforms the true models in terms of actual DIC in studies 2 and 3. In fact, in terms of *D̄* it outperforms all the true models in studies 2–4 but marginally loses out to the true GMCAR model in terms of overall DIC owing to a substantial increase in the effective number of parameters. The full MCAR(*B*, *Σ*) model also dominates the other models in study 5 even though it, like the rest of the models, is not designed to capture the true geostatistical nature of the simulated data. Although the differences in the average DIC-scores in Table 2 are not very large relative to the standard deviations, the ranking of the models is consistently preserved. For instance, the percentage of replicate data sets for which the MCAR(*B*, *Σ*) model had the lowest DIC-score was roughly 99% in study 1, 57% in study 2, 86% in study 3, 47% in study 4 (the true GMCAR model had lower DIC in the remaining 53% of the replications) and a full 100% in study 5.

The behaviour of the remaining models is also quite similar to the results seen for the AMSE, probably for similar reasons to those given there. For instance, we again note the typically poorer performance of the MCAR(*B*, *I*) model relative to the other spatial models; the GMCAR model with the better ordering is the closest competitor to the MCAR(*B*, *Σ*) model, whereas the MCAR(*α*, *Σ*) model performs somewhere between the two GMCAR models in studies 1 and 2, better than either of them in study 3 (the true model) and study 5 (marginally ahead of the better GMCAR model), and worse in study 4.

We now turn from the Gaussian likelihood with *p* = 2 to a non-Gaussian likelihood with *p* = 3 in a disease mapping context, and we illustrate our methods with a data set that was extracted from the public use SEER mortality database (`http://seer.cancer.gov`). SEER county level mortality databases (National Cancer Institute, 2003) provide the numbers of deaths and corresponding numbers of person-years at risk in quinquennial age brackets for each county in a particular state and each cancer site. Our data consist of the numbers of deaths due to cancers of the lung, larynx and oesophagus in the years from 1990 to 2000 at the county level in Minnesota. The larynx and oesophagus are sites of the upper aerodigestive tract, so they are closely related anatomically. Epidemiological evidence shows a strong and consistent relationship between exposure to alcohol and tobacco and the risk of cancer at these two sites (Baron *et al.*, 1993). Meanwhile, lung cancer is the leading cause of cancer death for both men and women. An estimated 160440 Americans will have died in 2004 from lung cancer, accounting for 28% of all cancer deaths. It has long been established that tobacco, and particularly cigarette smoking, is the major cause of lung cancer. More than 87% of lung cancers are smoking related (`http://www.lungcancer.org`).

These cancers are sufficiently rare relative to the population in each county that the Poisson spatial regression model (4) in Section 2.2 is appropriate. To calculate the expected counts *E*_{ij}, we must take each county's age distribution (over the 18 age groups) into account. To do so, we calculate the expected number of deaths in county *i* due to cancer *j* as ${E}_{ij}={\sum}_{k=1}^{18}{N}_{i}^{k}{\omega}_{j}^{k}$, where

$${\omega}_{j}^{k}=\sum_{i=1}^{87}{D}_{ij}^{k}\Big/\sum_{i=1}^{87}{N}_{i}^{k}$$

is the age-specific death-rate due to cancer *j* for age group *k* over all Minnesota counties, ${D}_{\mathit{ij}}^{k}$ is the number of deaths in age group *k* of county *i* due to cancer *j* and ${N}_{i}^{k}$ is the total population at risk in county *i* and age group *k*, which we assume to be the same for each type of cancer.
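This internal standardization is easy to express in code. The sketch below uses synthetic death and population counts for one cancer; a useful property to check is that, by construction, the expected counts sum to the observed total number of deaths:

```python
import numpy as np

def expected_counts(D, N):
    """Internally standardized expected deaths for one cancer.
    D: (n, K) deaths by county and age group;
    N: (n, K) person-years at risk by county and age group.
    omega_k = sum_i D_ik / sum_i N_ik; E_i = sum_k N_ik * omega_k."""
    omega = D.sum(axis=0) / N.sum(axis=0)   # statewide age-specific rates
    return N @ omega                        # county-level expected counts

# synthetic data: 87 counties, 18 age groups
rng = np.random.default_rng(4)
N = rng.integers(1_000, 50_000, size=(87, 18)).astype(float)
D = rng.binomial(N.astype(int), 0.001).astype(float)
E = expected_counts(D, N)
# sum_i E_i equals sum_ik D_ik exactly (up to floating point)
```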

To compare models, we might set aside the values for some of the cancers in some of the counties, impute them by using the posterior predictive distribution based on the remaining data and then compare the resulting total mean-squared prediction error across models. However, several practical difficulties arise, including the number and precise choice of validation counties, and which cancer counts (one or all three) to delete in each. The dependence structure that is imposed by the county lattice and the problem of spatial edge effects also suggest that this would not offer a general model checking method in our context. As such, we again compare models by using the DIC-criterion.

The county level maps of the raw age-adjusted standardized mortality ratios (i.e. SMR_{ij} = *Y*_{ij}/*E*_{ij}) for the three cancers are shown in Fig. 1.

Maps of raw standardized mortality ratios SMR for (a) lung, (b) larynx and (c) oesophageal cancer in the years from 1990 to 2000 in Minnesota

As in the previous section, the deviance *D*(*θ*) is the same for the models that we wish to compare, since they differ only in their random-effect distributions *p*(** ϕ**| ·).

In what follows, models 1–6 are multivariate lattice models with different assumptions about the smoothing parameters. Model 1 is the full MCAR(*B*, *Σ*) model (with a 3 × 3 matrix *B* whose elements are the six smoothing parameters), whereas model 2 is the MCAR(*B*, *I*) model. Model 3 is the MCAR(*α*_{1}, *α*_{2}, *α*_{3}, Σ) model (10) with a different smoothing parameter for each cancer. Model 4 assumes a common smoothing parameter *α*, model 5 fits three separate univariate CAR models, and model 6 is the trivariate IID model. Fit measures *D̄*, effective numbers of parameters *p*_{D} and DIC-scores for each model are seen in Table 3. We find that the MCAR(*B*, *Σ*) model achieves the best (lowest) DIC-score.

Models 7–11 are the convolution prior models corresponding to models 1–5, formed by adding IID random effects (distributed *N*(0, *τ*^{2})) to the *ϕ*_{ij}s. Here the distinctions between the models are somewhat more pronounced, owing to the added variability in the models that is caused by the IID effects. The relative performances of the models remain the same, with the MCAR(*B*, *Σ*)-based convolution model again performing best.

In this subsection, we begin by summarizing our results from the MCAR(*B*, *Σ*) model, or model 1 in Table 3. Table 4 provides posterior means and associated standard deviations for the parameters ** β**, Σ and *B*.

Turning to geographical summaries, Fig. 2 maps the posterior means of the fitted standardized mortality ratios SMR of lung, larynx and oesophageal cancer from our MCAR(*B*, *Σ*) model. The correlation between the cancers is apparent, with higher fitted ratios extending from the Twin Cities metro area to the north and north-east (an area where cigarette smoking may be more common). In Fig. 1, the range of the raw SMRs is seen to be from 0 to 3.3, whereas, in Fig. 2, the range of the fitted SMRs is from 0.7 to 1.3, owing to spatial shrinkage in the random effects.

In this paper we have applied the notion of the linear model of co-regionalization to the analysis of multivariate areal data and proposed the MCAR(*B*, *Σ*) model for mapping multiple diseases. As with the existing MCAR(*α*, Λ) and multivariate IID models in the literature, the MCAR(*B*, *Σ*) model is order free, i.e. independent of the ordering of the variables in the hierarchical model. But the MCAR(*B*, *Σ*) model is much more flexible for modelling spatial correlations in multivariate areal data. Our simulations and data example demonstrate the improved performance of the MCAR(*B*, *Σ*) model over the existing alternatives as measured by the AMSE or DIC, as well as its efficient implementation by using MCMC algorithms. Our approach is also readily extended to higher dimensional (*p*> 2) settings; the computational burden does not increase much with dimension *p* since it does not involve large matrix calculations.

Though our emphasis has been on mapping multiple diseases, our proposed MCAR model may also be useful for mapping a single disease. For example, consider a *spatially varying-coefficients model* (Assunção, 2003) extending model (1) in Section 2.1 to

$${Y}_{i}\stackrel{\mathrm{ind}}{\sim}\text{Poisson}\left\{{E}_{i}\,\mathrm{exp}\left({\mathbf{x}}_{i}^{\prime}\beta +{\varsigma}_{i1}{z}_{i1}+{\varsigma}_{i2}{z}_{i2}+{\varphi}_{i}\right)\right\},\qquad i=1,\dots ,n,$$

where the **x**_{i} are explanatory, region level spatial covariates having parameter coefficients ** β**, and *ς*_{i1} and *ς*_{i2} are spatially varying coefficients of the covariates *z*_{i1} and *z*_{i2}. The vectors (*ς*_{i1}, *ς*_{i2}, *ϕ*_{i})′ could then be assigned our order-free MCAR distribution.

In our current work, we have only considered mapping the geographic pattern of multiple diseases at a single point in time. However, disease data are often reported over a series of time periods. For example, the SEER database currently provides cancer mortality information for the years from 1969 to 2001. In such cases, we may be interested in temporal effects as well as spatial effects. This motivates an extension of the proposed MCAR model to multivariate *spatiotemporal* data. Knorr-Held (2000) proposed a general framework for spatiotemporal modelling of a single disease by using ICAR models, which could perhaps be extended to our multivariate setting. Alternatively, we may work with a multivariate Gaussian AR(1) time series of spatial processes in the setting of dynamic models (Gelfand *et al.*, 2005). This would extend model (4) in Section 2.2 to

$${Y}_{ijt}\stackrel{\mathrm{ind}}{\sim}\text{Poisson}\left\{{E}_{ijt}\,\mathrm{exp}\left({\mathbf{x}}_{ijt}^{\prime}{\beta}_{jt}+{\varphi}_{ijt}\right)\right\},\qquad i=1,\dots ,n,\quad j=1,\dots ,p,\quad t=1,\dots ,T,$$

where the **x**_{ijt} are explanatory, region level spatial covariates for disease *j* at time period *t* having parameter coefficients *β*_{jt}. Let *ϕ*_{j,t+1} = *H*_{j}*ϕ*_{j,t} + **u**_{j,t+1}, where the innovations **u**_{j,t+1} could be assigned one of our MCAR distributions.

Finally, here we have studied methods for mapping *rare* disease rates, where the rarity is required to ensure the validity of the Poisson approximation to a binomial sampling distribution. However, it is easy to adapt the statistical methods that we have presented here to analyse non-rare diseases by replacing the Poisson likelihood with a binomial distribution for the data (MacNab, 2003). Also, although in this paper we considered only mapping disease rates, we can also apply our MCAR(*B*, *Σ*) model methodologies when modelling random effects in a *hazard function* (Carlin and Banerjee, 2003).

The work of all three authors was supported in part by National Institutes of Health grants 2-R01-ES07750 and 1-R01-CA95955, and that of the second and third authors was supported in part by National Institutes of Health grant 1-R01-CA112444.

- Assunção RM. Space-varying coefficient models for small area data. Environmetrics. 2003;14:453–473.
- Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall–CRC; Boca Raton: 2004.
- Baron AE, Franceschi S, Barra S, Talamini R, La Vecchia C. Comparison of the joint effect of alcohol and smoking on the risk of cancer across sites in the upper aerodigestive tract. Cancer Epidem. Biomark. Prevn. 1993;2:519–523. [PubMed]
- Besag J. Spatial interaction and the statistical analysis of lattice systems (with discussion) J. R. Statist. Soc. B. 1974;36:192–236.
- Besag J, York JC, Mollié A. Bayesian image restoration, with two applications in spatial statistics (with discussion) Ann. Inst. Statist. Math. 1991;43:1–59.
- Carlin BP, Banerjee S. Hierarchical multivariate CAR models for spatio-temporally correlated survival data (with discussion) In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M, editors. Bayesian Statistics 7. Oxford University Press; Oxford: 2003. pp. 45–63.
- Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. 2nd edn. Chapman and Hall–CRC; Boca Raton: 2000.
- Celeux G, Forbes F, Robert CP, Titterington DM. Deviance information criteria for missing data models (with discussion) Bayes. Anal. 2006;1:651–706.
- Chilès JP, Delfiner P. Geostatistics: Modelling Spatial Uncertainty. Wiley; New York: 1999.
- Daniels MJ, Kass RE. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. J. Am. Statist. Ass. 1999;94:1254–1263.
- Gelfand AE, Banerjee S, Gamerman D. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics. 2005;16:1–15.
- Gelfand AE, Schmidt AM, Banerjee S, Sirmans CF. Nonstationary multivariate process modelling through spatially varying coregionalization (with discussion) Test. 2004;13:263–312.
- Gelfand AE, Vounatsou P. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics. 2003;4:11–25. [PubMed]
- Harville DD. Matrix Algebra from a Statistician's Perspective. Springer; New York: 1997.
- Held L, Natário I, Fenton SE, Rue H, Becker N. Towards joint disease mapping. Statist. Meth. Med. Res. 2005;14:61–82. [PubMed]
- Jin X, Carlin BP, Banerjee S. Generalized hierarchical multivariate CAR models for areal data. Biometrics. 2005;61:950–961. [PubMed]
- Kim H, Sun D, Tsutakawa RK. A bivariate Bayes method for improving the estimates of mortality rates with a twofold conditional autoregressive model. J. Am. Statist. Ass. 2001;96:1506–1521.
- Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Statist. Med. 2000;19:2555–2567. [PubMed]
- Knorr-Held L, Best NG. A shared component model for detecting joint and selective clustering of two diseases. J. R. Statist. Soc. A. 2001;164:73–85.
- MacNab YC. Hierarchical Bayesian spatial modelling of small-area rates of non-rare disease. Statist. Med. 2003;22:1761–1773. [PubMed]
- Mardia KV. Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. J. Multiv. Anal. 1988;24:265–284.
- National Cancer Institute . Mortality—All COD, Public-use with County, Total U.S. for Expanded Races (1990-2000) <18 Age Groups>. National Cancer Institute; Bethesda: 2003.
- Royle JA, Berliner LM. A hierarchical approach to multivariate spatial modeling and prediction. J. Agric. Biol. Environ. Statist. 1999;4:29–56.
- Sain SR, Cressie N. Multivariate lattice models for spatial environmental data. Proc. Statist. Environ. Sect. Am. Statist. Ass. 2002:2820–2825.
- Schmidt AM, Gelfand AE. A Bayesian coregionalization approach for multivariate pollutant data. J. Geophys. Res. Atmos. 2003;108 article 8783.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion) J. R. Statist. Soc. B. 2002;64:583–639.
- Wackernagel H. Multivariate Geostatistics: an Introduction with Applications. 3rd edn. Springer; New York: 2003.
- Wall MM. A close look at the spatial structure implied by the CAR and SAR models. J. Statist. Planng Inf. 2004;121:311–324.
