Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Environ Ecol Stat. Author manuscript; available in PMC 2010 October 13.
Published in final edited form as:
Environ Ecol Stat. 2010 March 1; 17(1): 73–95.
doi:  10.1007/s10651-008-0102-z
PMCID: PMC2953810

Space-time Bayesian small area disease risk models: development and evaluation with a focus on cluster detection


This paper extends the spatial local-likelihood model and the spatial mixture model to the space-time (ST) domain. For comparison, a standard random effect space-time (SREST) model is examined to allow evaluation of each model’s ability in relation to cluster detection. To pursue this evaluation, we use the ST counterparts of spatial cluster detection diagnostics. The proposed criteria are based on posterior estimates (e.g., misclassification rate) and some are based on post-hoc analysis of posterior samples (e.g., exceedance probability). In addition, we examine more conventional model fit criteria including mean square error (MSE). We illustrate the methodology with a real ST dataset, Georgia throat cancer mortality data for the years 1994–2005, and a simulated dataset where different levels and shapes of clusters are embedded. Overall, it is found that conventional SREST models fair well in ST cluster detection and in goodness-of-fit, while for extreme risk detection the local likelihood ST model does best.

Keywords: Cluster, Space-time, Diagnostic, Local likelihood, Mixture, MCMC

1 Introduction

Space-time (ST) modeling of small area health data is a complex subject where spatial, and temporal systems and their interaction must be considered. In this paper we address the development of models that are aimed both at the description of relative risk for a disease of interest in ST, and also the ability of these models to detect unusual aggregations of disease. While these aims could be considered somewhat at odds, there is a growing literature that suggests that cluster detection can be effectively carried out by the use of sampled output from random effect smoothing models (Lawson and Denison 2002). The focus of this paper is the extension to the ST domain of two spatial models that have been proposed for risk estimation and cluster detection: the local likelihood cluster model (LLC) (Kauermann and Opsomer 2003; Hossain and Lawson 2005; Lawson 2006) and the spatial mixture Poisson (MP) model of Fernandez and Green (2002). The ST versions of these models are denoted as LLST and MPST, respectively.

The motivation for the development of spatio-temporal models is the need to provide a parsimonious description of the risk variation over spatial and temporal units when temporal variation is also available. This scenario is commonly found when using publicly available health data: most public health departments provide temporally-varying geo-referenced count incidence data for diseases. Space-time data potentially involves two confounded dimensions of dependency: between different spatial locations and between different time points. So, any proposal of modeling to this kind of data also needs to address the interdependence between these dimensions. Although accounting for this interdependence increases the computational burden considerably, doing so may improve the identification of areas where risks of disease may be particularly notable. We also wish to develop models that can provide tools or output useful in the detection of unusual aggregations of disease or clusters. While relative risk estimation models employ smoothing components, and these could remove evidence of localized unusual risk aggregations, it is also convenient to consider these models in the context of detection of unusual risk as they can have reasonable detection behavior and they are often relatively easy to fit.

1.1 Cluster detection models and evaluation

Here we briefly describe the background of the extension of the chosen models to the ST domain. For a Gaussian model, Higdon (2007) gave a detail illustration how a ST model can be constructed for spatial and temporal dependencies. Haas (1995) proposed a ST local estimator model for kriging where the spatial and temporal dimensions of the cylinder are defined separately. The temporal range of the cylinder is a user-selected value to ensure approximate second-order stationarity along the temporal dimension.

The spatial LLC model of Hossain and Lawson (2005) appears to provide a novel approach to clustering and emphasizes the local nature of estimation. The model assumes a circular window for each data point whose radius is a lasso parameter. This lasso parameter controls the size of the window and so thence a local relative risk parameter. The data point can be the centroid of a county for count data or a residential location for case event data. The extension that we propose to ST data is based on extending the idea of a circular window for the lasso parameter to a cylindrical window where the base represents space and cylinder height represents time. The optimum values for the radius of the base and height of the cylinder are determined by using the spatial and temporal associations that exist in the data.

Using mixture models to model heterogeneity is a common approach to risk assessment (Richardson and Green 1997; Agarwal et al. 2002). Fernandez and Green (2002) proposed a spatial MP model with a variable number of mixing components which requires the implementation of a reversible jump Markov chain Monte Carlo (MCMC) algorithm for the estimation of all model parameters. The weights in their model are spatially-dependent and are estimated from a Markov random field in order to incorporate spatial neighborhood information. We extend this model to the ST domain by making the weights ST dependent. In the estimation of these weights, we employ a Markov random field to explain the spatial heterogeneity and an autoregressive-type model for temporal heterogeneity.

In addition to these two models, another ST model, a ST extension of the spatial convolution model (Knorr-Held and Besag 1998; Besag et al. 1991) is also used to check the relative performances of each model in cluster detection. As for the estimation procedure for these three models, we employ a Bayesian hierarchical modeling paradigm with joint implementation of Gibbs and Metropolis-Hasting MCMC computational methods to obtain posterior estimates of all model parameters.

The ability of models to detect unusual aggregations of disease incidence (cluster detection) can be assessed in a variety of ways. Here we extend the proposal of Hossain and Lawson (2006) for spatial diagnostics to the ST domain. Some of the criteria are based on functionals of the posterior distribution estimated from posterior sample output, while others are computed directly from the model fit. The criteria we utilize can be used for single region checking and can be extended to groups of regions based on neighborhood information. In addition to these extensions, some further criteria are proposed based on mis-classification rates.

The remaining sections of this paper are organized as follows: Section 2 proposes the extensions of the LLC model and MP model to space-time data (LLST and MPST) and also presents the specific form of standard random effect space-time (SREST) model that will be used in comparison with the LLST and MPST models. Section 3 explains how the posterior samples can be generated from these three models for posterior inference. Section 4 extends the previously developed cluster diagnostic criteria to its ST analog and also proposes a number of new criteria based on misspecification rate found in simulated data. Section 5 illustrates the simulation of a dataset for Ohio county geographies in which clusters of different shapes and sizes are embedded and the results are reported in a subsection. Section 6 introduces the Georgian throat cancer data and reports the analysis. Section 7 gives the concluding remarks.

2 Space-time models considered in the evaluation

We assume that disease incidence data are observed in a fixed set of small areas and for discrete time periods. Denote the small area units as i = 1, …, n, and the temporal units as t = 1, …, T. Denote the unit counts as Oit to denote the observed count of disease and Eit to denote the expected count for area i and period t. The expected count is usually assumed to be fixed and obtained from a standard population after adjustment for case-mix, and that is assumed here.

2.1 Local likelihood space-time (LLST) model

The local LLC assumes a Poisson count data model. Hence, at the first level of hierarchy, we model the observed count as


where θit is an implicit function of δit, and Oδit=diieκi,dtteγtOit,Eδit=diieκi,dtteγtEit,i=1,,nandt=1,,T.

The ST lasso can be defined as δit = {eκi, eγt}. The Euclidean distance between the centroids of counties i and i′ is denoted by dii′ and between years t and t′ is denoted by dtt′. The spatial and temporal distances are standardized to have the values between [0,2] and [0, 1], respectively. One of the simplest ways of defining these lasso parameters, κi and γt, are as spatial and temporal structured random effects. So, the spatial lasso parameter, eκi, is the radius and the temporal lasso parameter, eγt, is the half-height of (i, t)-th cylinder. The estimates of radius and height are stabilized by the spatial and temporal associations. In other words, the volume of a cylinder at (i, t)-th point is determined by its spatial and temporal association with other neighboring points.

In the local likelihood formulation each Poisson component is assumed to enter independently and so the local likelihood derived from this model is given by:


It is possible to confine the local likelihood model to the direct dependence of the θit on the (Oit, Eit) pairs resulting from a given δit. However, it is natural to specify a model at the second level of the hierarchy for the log relative risk which focuses on the link between the cylinder parameters and the risk. Hence we assume that the logarithm of relative risk is defined as


where ρ is an overall log-scale mean effect, κi and γt are two structured spatial and temporal random effects (as mentioned above)which control the volume of risk, εi and ξt are two unstructured spatial and temporal random effects and λit is a ST interaction random effects. In effect, the relative risk is modified by the scaling exp(κi + γt) which represents a product of the dimension parameters.

Parameter prior specification

To complete the above models (2.1) and (2.2), we need to specify prior and hyper-prior distributions for all the model parameters. First we assume a zero-mean Gaussian prior distribution for the overall rate ρ:


where the variance kρ is assumed to be large but known. The other parameters assumed to have zero-mean Gaussian distributions are the temporal effect:


the unstructured spatial heterogeneity effect


and the ST interaction effect


Knorr-Held (2000), found that this last assumption was appropriate for the ST disease incidence data he examined. We have also found this to be the case in a range of disease mapping studies. The structured effects have conventional prior distributions often used to reflect localized spatial and temporal correlation (the conditional autoregressive (CAR) model in space and Markov random walk prior distribution in time):



where, ni=j=1nI(j{Δi}),κ¯i=j{Δi}κjniand{Δi} is the set of first-order neighbors of the ith region. The assumed first-order random walk prior distribution for the structured temporal effect, γt, leads to a relatively nonparametric trend in time.

At the next level of hierarchy, the choices of hyper-priors for all variance parameters (σκ2,σγ2,σε2,σξ2,σλ2) are assumed to be inverse gamma distribution: IG (a, b). The choice of inverse gamma distribution as the prior distribution for the variance parameter is equivalent to choosing a gamma distribution for the precision parameter. The scale and shape parameters for the inverse gamma are chosen so as to have large variance. A similar idea for gamma priors is also recommended in Kelsall and Wakefield (1999). The variance parameter for ρ, kρ, is set to a large number, 10,000, to ensure non-informativeness.

The joint posterior distribution of all the variables (θ, ρ, κ, γ, ε, ξ, σκ, σγ, σε, σξ, σλ) for the above LLST model is proportional to L (Oδ|Eδ, θ, δ) p (κκ) p (γγ) p (εε) p (ξξ) p (λλ) pκ) pγ) pε) pξ) pλ) p (ρ), where the ST lasso, δit depends on κi and γt. We defer discussion of posterior sampling until Sect. 3.

2.2 Space-time with mixture of poissons (MPST)

The observed count for the ith county and tth year, Oit, is assumed to follow a mixture of poisson distributions where the total number of mixing components, k, is assumed to be fixed. Mathematically,

Oit|k,ω,λ~l=1kωitl Poisson (Oit|Eitλl)

where the weights are to satisfy two conditions, 0 < ωitl < 1 and l=1kωitl=1, and is determined by a hidden (unobserved) allocation variable, {zit}i=1,,nt=1,,T as ωitl = p (zit = l). A reversible jump (RJ) MCMC will be required when the number of mixing components, k is a random variable. However, in this paper, we do not explore this possibility. Rather, a goodness-of-fit statistic, deviance information criteria (DIC) (Spiegelhalter et al. 2002) or mean square predictive error (MSPE) (Gelfand and Ghosh 1998), will be used to determine the optimum value for k. When we adopt this strategy, we are aware of the criticism that this approach does not estimate k and other unknown parameters jointly. None-the-less, by using an information criterion to find the value for k, we do ensure the selection of the most parsimonious model. In another paper, the above model will be implemented using variable dimension methods.

Fernandez and Green (2002) illustrated a way to calculate the location dependence weights for the spatial mixture model by using a Markov random field. These structured spatial random effects are then logistic transformed to calculate the weights. For our spatio-temporal mixture model, these ST weights can be generalized to include structured spatial effects, structured temporal effects and non-separable ST interaction effects. So, besides considering a Markov random field for spatial dependence, we can consider an autoregressive type model for temporal dependence and a model that arises as the product of theses two (space and time) main effects. Thus the weights are calculated by a logistic transformed function

ωitl=exp (fitl/ϕ)l=1kexp (fitl/ϕ),l=1,,k

where ϕ > 0 is a scaling parameter, which controls the smoothness of the weight function, and fitl is a function of random effect terms. The exact form of fitl can be specified to suit the application. One such example could be fitl = κil + γtl + ηitl where different terms describe separate space and time effects and non-separable interaction effects. In the application with a real and a simulated datasets, we will consider a much simpler model as fitl = κil + γtl, where the κ’s are the structured spatial random effects, γ’s are the structured temporal random effects. The structured spatial random effects κl = (κ1l, … , κnl) have a CAR prior specification:


and the structured temporal random effects γl = (γ1l, … , γTl) have the first-order autoregressive prior specification:


As for defining prior distributions for other unknown parameters in the above mixture model: the allocation variables z11, …, znT are assumed to have a multinomial prior distribution with probability p (zit = l|k, ω, λ) = ωitl, for l = 1, …, k. The ϕ has a uniform prior distribution with the range (0, ϕmax), ϕmax is assumed to be a small number (generally, 10). Choosing a uniform prior for ϕ allows the restriction of the estimate for ϕ to a small number by assigning a reasonably narrow range for the uniform distribution, otherwise large values for ϕ (i.e., ϕ → ∞) lead ωitl = 1/k for all i, t and l, indicating no spatial and temporal dependences in the weights. The prior distribution for the λ’s is assumed to have a gamma form and is defined as


The order restriction in the above gamma distribution is imposed to ensure identifiability of the estimates and a constant, k!, arises because of this order restriction. It has been observed for the spatial model in Fernandez and Green (2002) that this order restriction was sufficient to generate distinct posterior samples for each parameter. The hyperparameters α and β are assumed to be known.

Using the conditional independence assumption, the joint posterior distribution of all the variables is proportional to p (O|E, λ, k, ω) p (λ|α, β, k) p (z|k, ω) p (κ|σκ, k) p (γ|σγ, k) p (η|ση, k) p (σκ|k) p (σγ|k) p (ση|k), where the likelihood is defined as

p(O|E,λ,k,ω)=i=1nt=1Tl=1kωitl Poisson (Oit|Eit,λl).

Sampling algorithms for this model are discussed in Sect. 3 and Appendix B.

2.3 Standard Random Effect Space-Time (SREST) model

The SREST model is defined by assuming the same Poisson distribution for the observed counts at the first level of hierarchy, given as

Oit|θit~ Poisson (Eitθit).

At the second level of the hierarchy, the logarithm of relative risk, θit, is defined as

log (θit)=ρ+κi+γt+εi+ξt+λit.

This model is similar to a random effect model suggested by Knorr-Held and Besag (1998) for ST data (see also Knorr-Held 2000). To complete the above SREST model specification, the prior distributions are specified as:

ρ~N(0,kρ),   kρ is assumed to be known,




ξt|σξ2  ~  N(0,σξ2),and,


The choice of prior distributions reflects the nature of the effects assumed. The choices of hyper-prior distributions for all variance parameters are as for the LLST model.

The joint posterior distribution of all the variables is proportional to p (O|E, θ) p (κκ) p (γγ) p (εε) p (ξξ) p (λλ) pκ) pγ) pε) pξ) pλ) p (ρ) where the likelihood is defined as

p(O|E,θ)=i=1nt=1TPoisson (Oit|Eit,θit).

3 Posterior sampling

The parameters of LLST and SREST models have fixed dimension. The posterior sampling in this fixed dimension model is relatively straightforward (Gamerman and Lopes 2006). Within this hierarchy there is conditional conjugacy with IG prior distributions for Gaussian variances and with normal prior distributions for overall mean and random effects. The posterior samples for each parameter are generated by an MCMC algorithm with joint implementation of Metropolis-Hastings and Gibbs sampling. The details and specific form of posterior distributions for LLST model are given in Appendix A. For the relatively simpler SREST model, the posterior distributions are similar as for the LLST model (Appendix A) where the pair (Oδ, Eδ) is replaced by (O, E).

As for the initial values for the lasso parameters, κ and γ of LLST model, we set these two equal to a large negative number such that the exponentiation of this will be close to zero. The reason of this prior guess, the exponentiation of the estimates of κ and γ should be in the ranges [0,2] and [0, 1], respectively, since [0,2] is the range for the distance between two centroids and [0, 1] is for the time points. Thus, estimates close to the lower limits of these ranges imply no association whereas, close to the upper limits imply maximum association. We exploit these arguments in determining the initial values for κ and γ, and set to a value −10, i.e., a priori we are assuming the observed data contains no cluster.

For MPST model, due to the fixed dimension of the number of mixing components k, we do not have the issue of variable dimensionality for the parameters λ, κ and γ. The implementation details and the posterior distributions for all the parameters are illustrated in Appendix B. Appendix C also presents the rationale for choice of k. We implement the Metropolis-Hastings sampling algorithm when the posterior distributions are known up to proportionality and Gibbs sampling when conditional distributions can be sampled easily.

4 Cluster diagnostics

In order to check the ability of models in the recovery of clusters we propose a set of criteria for ST model where spatial and/or spatio-temporal clusters are likely to be present. Here we assume that a model has been fitted under the Bayesian paradigm and that estimates of relative risk are available in each small area of the study. Alternative non-Bayesian estimates of relative risk could also be examined using some of the diagnostics, but some of our diagnostics address quantities only directly available from posterior sampling. Among these measures that we will propose; the exceedance-based measure is for both simulated and real data; the measures based on misspecification rates, Receiver Operating Characteristic (ROC) curves and mean square errors (MSEs) are for simulated data only.

4.1 Exceedence-based measures

The first diagnostic we consider is the posterior exceedence probability (PEP) of the relative risk estimate. The exceedence probability of the sampled θit can be computed as qit=Pr^(θ^it>c)=1Gg=1GI(θ^itg>c) where θ^itg is the estimated risk for the gth sample value from converged posterior sampling output, G is the posterior sample size and c is a threshold value. This probability estimate is commonly used to provide evidence of ‘significant’ excess risk within individual areas (see, e.g., Richardson et al. 2004). This can be regarded as a criterion for identifying ‘hot-spot’ clusters. For simulated data, the PEP for the relative risk is given by qits=Pr^(θ^its>c) for sth simulated data. The average PEP (APEP), over the simulated data, is defined as q¯it=1Ssqits. In case of real data, S is set to one.

We could use PEP to examine single regions. However we could also believe that clustering should have some spatial and/or temporal integrity and so criteria that also examine neighborhoods around points could be useful. Define a set, {qitj; j = 0, 1, …, ni}, of first-order neighbor q values of ith region and tth year, where qit0 is the q value of ith region and tth year itself. A local measure Ri, which was defined previously in Hossain and Lawson (2006) for spatial data, can be extended to include a temporal dimension as


to calculate the proportion having APEP greater than 0.95 based on the first-order neighbors. The first indicator function in the right hand side of above equation, I (qit > 0.95), is to ensure that only the regions of excess risk are used in checking the model ability in cluster detection. The measure Rit shows the grouping of excess risk where the average posterior probability of excess risk is greater than 0.95. Note that there is a trade off between the choice of c and the chosen critical probability value (here defined as 0.95). Higher values of c will lead to fewer areas signaling while lower critical probability values will admit more areas. For calculations of Rit here we have used c = 1.

The above measure Rit includes only the spatial (first-order) neighborhood information in order to check the models ability in detecting clusters. This measure can be extended to include preceding year information with an weight, ω, as


The value for the weight (0 < ω < 1) can be chosen depending on the significance of preceding year information in detecting clusters of excess risks; in a typical situation it is equal to 0.5. The measure Rit,t−1 can be useful to detect spatial clusters since it adds evidence of excess risk from one preceding year with the weight 1 − ω. This can also be generalized over longer time periods. We do not pursue this generalization here.

4.2 Misspecification rate

Denote θittrue as the true relative risk that was assigned to region i and time t in the data generating process. A set of threshold values in the local relative risk can be defined to allow the definition of a cluster. Hence we can use a risk threshold as a classification tool for groups of regions to be regarded as clusters. The levels we examine here are d = (1.0, 2.0, 3.0), exceeding these values means presence of clusters of excess risks. Choosing these threshold values will help to define three risk levels as: low-risk level when d = 1, medium-risk level when d = 2 and high-risk level when d = 3. An indicator matrix with the element, Zit(d)=I(θittrued) is also defined to show the ST location of clusters for a given threshold value d.

In a similar way, based on the mean posterior estimate (θ^¯) that obtained for each model and for a given threshold value d, two indicator matrices Z1(d)andZ0(d) for clusters and not-clusters, respectively, can be defined where the elements of each matrices are

Z1it(d)={1,if θ^¯itd and θittrued0,if θ^¯it<d and θittrued


Z0it(d)={1,if θ^¯it<d and θittrue<d0,if θ^¯itd and θittrue<d.

The elements of Z1(d)(Z0(d)) will be one if the clusters (not-clusters) for a given threshold value are correctly estimated by a specific model. Two misspecification rates based on these two indicator matrices, Z1(d)andZ0(d), can be defined for the purpose of checking each model’s ability in detecting true clusters or not-clusters. These are

M R1(d)=1(i,t):θittruedZ1it(d)(i,t)Zit(d),


M R0(d)=1(i,t):θittrue<dZ0it(d)(i,t)(1Zit(d)).

It is expected that with the increasing value of d, the direction of these two misspecification rates will be reversed.

4.3 ROC curves and AUC

The above misspecification rates are calculated for the numbers based on true clusters (true positive) and true not-clusters (true negative), but do not exploit the information in false clusters (false positive) and false not-clusters (false negative). An ROC curve can be used in this situation which is a plot of the true positive rate (sensitivity) against the false positive rate (1-specificity) for the different possible cut-off of a diagnostic test (see, Hossain and Lawson 2006 for details). In doing so, we first define the true state of “clusters” and “not-clusters” by an indicator function Zit(d)asI(θ^¯itd) for a given value of d. We use the ROC curve (and hence the area under the curve (AUC)) to judge each model’s ability in detecting true clusters.

4.4 Mean square error

When the true relative risks are known, using MSE is a common practice to check the adequacy of fit (see e.g., Green and Richardson 2002). The measure is defined as,




is the averaged [theta w/ hat]it over replicated datasets. This MSE is a global measure, and shows the overall goodness-of-fit for each model over all regions and time periods. The main pitfall attached to the MSE is that it does not show any localized evidence how well the imbedded clusters (or, the true clusters) are estimated by each model. We extend the MSE to pursue this objective, as a goodness-of-fit measure only for the cluster regions for a given threshold value, d. Thus, MSE for clusters is defined as,


In a similar way, the MSE for ‘no cluster’ can also be defined.

5 Simulation design: Ohio County geographies

We adopt the Clark and Lawson (2002) definition of spatio-temporal clusters, in the sense of persistence of clusters over space and time. We have used the 88 counties of the state of Ohio, USA, for a period of 21 years (1968–1988) to illustrate the method, as a base for simulation. This period matches the well-known lung cancer dataset analyzed by various authors (see, e.g., The available variables in the lung cancer dataset are observed and expected death counts for ith county and tth year where i =1, …, 88 and t = 1, …, 21. However, this dataset does not demonstrate strong localized clustering patterns and so we have resorted to the use of the Ohio geographies over a 21-year period with simulated data based on true relative risks which include known clusters. We conducted a simulation experiments for the given Ohio geography where the true risks are assigned in a way to yield spatial and/or spatio-temporal clusters. The simulated data also makes feasible a relative comparison among the LLST, MPST and SREST models, when the true risks are known.

A dataset is simulated to have a large number of clusters of different shapes and magnitudes. The underlying risks, {μit}, are generated randomly from an Uniform (0.5, 1.5) distribution since in a normal background null situation it is expected that the relative risk at each county and year will be close to ‘one’ with some small variation. Then, clusters of different shapes and magnitudes (δit) are embedded to the desired counties and years to give θittrue=μit+δit. In Fig. 1 (top-left), the maps for the embedded spatio-temporal clusters are given. For these clusters, θittrue ranges from 1.7 to 4.0, which is fairly high. The simulation study was intended to generate clusters of high risk areas and to check model performances for this scenario.

Fig. 1
Thematic maps of true and the posterior estimates of relative risks for the simulated data. Top-left: true relative risks, top-right: LLST model, bottom-left: MPST model, bottom-right: SREST model

Given the true relative risks with the clusters added, defined under the model and the expected counts from the real data, we conditionally simulated S sets of observed counts for each county and year from a Poisson distribution with:

yits|eit,θittrue~ Poisson (eitθittrue);i=1,C,  t=1,,T  and  s=1,,S

where θittrue is assumed to be the true value for θit. This number of replication was fixed at S = 100 and was chosen to reduce computation time in model fitting, but was deemed large enough to allow averaging to be representative. The results reported are for averages over the S replications.

5.1 Results

The LLST model was implemented in the software environment R (R Development Core Team 2004). The MPST and SREST models were implemented in WinBUGS (Spiegelhalter et al. 2003). The MPST model was implemented for a fixed dimension and Appendix C illustrates the reasoning how the fixed dimension was chosen. Figure 1 displays the true (top-left) and posterior simulation-averaged relative risk maps resulting from fitting the LLST model (top-right), MPST model (bottom-left) and the SREST model (bottom-right). It shows clearly that the estimates of relative risks from each model are very close to the true values. The maps for the averaged PEP (q) and R are given in left and right columns of Fig. 2. The results reported are for the threshold value c = 1 and averages over the 100 replications. This threshold level seems to signal the clusters of high intensity well. It is also visible in these maps that all the models seem correctly signal all the spatio-temporal clusters. For the increasing threshold value, the average PEP and R maps appear to become ‘cleaned out’, in that lower risk areas no longer signal (the maps are not included here for space limitation).

Fig. 2
Thematic maps of posterior exceedance probability q for threshold value 1 (left column) and R (right column) for simulated data. Top-row: LLST model, middle-row: MPST model and bottom-row: SREST model

Table 1 gives the cluster misspecification rates for each model for various threshold values. The no-cluster misspecification rates are within parenthesis. It appears that each model performs differently for each risk level. The MPST model has the lowest misspecification rate in detecting clusters of low-risk (d = 1), the SREST has lowest rate for clusters of medium-risk (d = 2) and LLST model has the lowest rate for clusters of high-risk (d = 3). From this table, it can be concluded that the detection rates of true clusters of low and medium risks by MPST and SREST models are better than LLST model. In case high-risk level clusters, LLST performs better than the other two models.

Table 1
Cluster (not-cluster) misspecification rates for each model for a given threshold value, d

Figure 3 displays the ROC curve for each model with the corresponding AUC. The top three rows are for the threshold values d =1, 2 and 3. Left, middle and right columns are for the LLST, MPST and SREST models, respectively. For threshold values 1 and 2, MPST and SREST models perform better than LLST model. For high threshold value (d = 3), the LLST model performs equally well to the MPST and SREST models. The ROC curves in the bottom row are for the PEP. The binary classifications rule of a true cluster is defined as before (Hossain and Lawson 2006): a true status of a ‘cluster’ is defined when the average exceedance probability, qit at (i, t)th point, is greater than 0.95 since at this value the evidence of excess relative risk is substantial. The test status is defined for θitt based on a cut-off and we call all points above the cut-off cluster, and all those below not cluster. The AUCs indicate that the LLST model performs slightly better than MPST and SREST models.

Fig. 3
ROC curves and AUC. Top-three rows are for the threshold values d = 1, 2 and 3, respectively, for simulated data. The bottom-row is for the exceeding probability. Left, middle and right columns are respectively, for the LLST, MPST and SREST models

Table 2 gives the MSE for various threshold values. For clusters with low and medium risks (i.e., threshold values 1 and 2), SREST model has the lowest MSE and clusters with high risks (i.e., threshold value, 3) LLST model has the lowest MSE. These results are in coherent to the findings from Table 1. The overall MSE is given in the last row and indicates that SREST model has the lowest MSE.

Table 2
Mean square error (MSE) for each model for a given threshold value, d

6 Georgia throat cancer data and results

We analyze throat cancer mortality data for the state of Georgia for the years 1994–2005. The state of Georgia has 159 counties. The data is available from the Georgia division of Public health website In order to fit the models discussed in Sect. 2, we need to calculate the case-mix adjusted expected counts. The expected counts are obtained by the indirect method of standardization. The age-sex distribution of throat cancer death for the year 2000 of US population is considered as a reference population in the expected deaths calculation. The county specific observed SMRs (ratio of observed to expected) for the years 1994–2005 are given in Fig. 4 (top-left). The observed SMRs do not show any indication of persistent clusters of excess risks, but show some evidence of sporadic spatio-temporal clusters, e.g., one cluster in the south for the year 1994, one cluster each near the center for 1996, 1997 and 1999, and one cluster in the south-east for 2002.

Fig. 4
Thematic maps of observed SMR and the posterior estimates of relative risks of throat cancer for the state of Georgia. Top-left: observed SMR, top-right: LLST model, bottom-left: MPST model, bottom-right: SREST model

The LLST model was implemented in the software environment R. The MPST and SREST models were implemented in WinBUGS. We check the density and trace plots of posterior samples to ensure the convergence of each model parameters. The posterior mean of relative risk estimates for each county and year are given in Fig. 4: LLST model in top-right, MPST model in bottom-left and SREST model in bottom-right. The number of mixing components for the mixture model was set to three after checking the MSPE, as illustrated for simulated data in Appendix C. It appears that all three models predict the spatio-temporal clusters well. The relative risk estimates from MPST model appears to shrink near one for many counties. We suspect, an excess of zero counts for the throat cancer mortality is the main reason for this shrinkage.

Figure 5 shows the posterior estimates of exceedance probability for the LLST model (top-two rows), the MPST model (middle-two rows) and the SREST model (bottom-two rows). The probabilities are calculated for the threshold value one. It appears that all three models give almost identical signals for the regions of excess risks. There is not much evidence of clusters, group of regions of excess risks. The maps for Rit do not signal any clusters (the maps are not included for space limitation). It is also interesting to observe that MPST model performs equally well in comparing with LLST and SREST models in signaling those high risk regions. Overall, analyzing this real data also confirms the findings of the simulation study that the SREST model, in general, performs better, but the LLST model picks up the extreme risk counties well.

Fig. 5
Thematic maps of posterior exceedance probability q for threshold value 1 for the state of Georgia. Top-two rows: LLST model, middle-two rows: MPST model and bottom-two rows: SREST model

7 Conclusions

The primary focus of this paper is the proposed extension of LLST model and MPST model to ST data. The LLST model appears to provide a novel approach to clustering and emphasizes the local nature of estimation. The MCMC implementation is also straightforward. The MPST model is a semiparametric model and can be very useful in cluster analysis as it models the spatial, temporal and spatio-temporal clusters according to the mixing weights. The only difficulty lies in the MCMC implementation which could require a complicated RJ algorithm because of variable dimension of mixing components. Here we have used an approximate profile likelihood approach for inference in this model.

All three models considered in this paper, the LLST, MPST and SREST, included only the random effects corresponding to spatial, temporal and interaction effects to mimic various sources of variability. This is in agreement with the recent suggestion in Best et al. (2005) that cluster detection can be effectively carried out by the use of sampled output from random effects smoothing models. Depending on study objectives, sometimes it may be essential to detect clusters after the adjustment for covariate effects. Adding covariates to these models would not be difficult. For the spatial LLC model, Hossain and Lawson (2005) illustrated two ways of adding covariates. Along these lines, the LLST model can also be extended to include covariates. For the MPST model, the Poisson distribution can be extended to include the covariates, X,asf(Oit|λl,β,Eit,Xit)=Poisson(EitejβjXitsλl). For the SREST model, the covariates can be added to calculate the logarithm of relative risk at the second level of hierarchy.

The secondary focus of this paper is the demonstration of the use of ST diagnostics, which are intended to check the models’ ability in the recovery of known clusters. Among the proposed diagnostics, Rit is based on neighborhood information with the belief that clustering could have spatial integrity. The other proposed diagnostics: misspecification rate, MSE and ROC curve, are risk specific, i.e., threshold value specific. This makes the model comparison feasible for various risk levels.

In general, the major results of the evaluation of the proposed methods is that we have observed that MPST and SREST models perform better for detecting clusters of low or medium risks, and LLST model performs better for detecting clusters of high risks. The misspecification rates and the MSEs for any of these models are very small. So in real applications, the SREST model may be preferable and is easy to implement and available on standard Bayesian software packages. This certainly supports the widespread use of the SREST model. Because of space limitation, we avoided providing the WinBUGS and R codes that we have developed for this paper, but they can be made available from the first author on request.


The support of National Institutes of Health NIH grant # 1 R03 CA125828-01 is gratefully acknowledged. We thank the editor and two anonymous referees for many thorough and constructive suggestions.



Md. Monir Hossain is a Research Assistant Professor of Biostatistics in the Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina. His research interests include developing models for spatial and spatial-temporal data, and developing method for cluster diagnostics.

Andrew B. Lawson is a Professor of Biostatistics in the Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina. He has a research specialization in statistical methods in spatial epidemiology. He has published extensively, including 6 books in the area of spatial epidemiology.

Appendix A

The posterior samples for each unknown parameters (θ, ρ, κ, γ, ε, ξ, σκ, σγ, σε, σξ, σλ) of LLST model are generated by the joint implementation of Gibbs and Metropolis-Hasting updating scheme. The parameter sampling schemes are given as:

θit|    Poisson(Oδit|Eδit,θit)N(logθit|ρ+κi+γt+εi+ξt,σλ2),

ρ|  ~  N(ρ|kρi=1nt=1T(logθitκiγtεiξt)σλ2+nTkρ,kρσλ2σλ2+nTkρ),

κi|  ~  N(κi|σλ2jΔiκj+σκ2t=1T(logθitργtεiξt)niσλ2+Tσκ2,σκ2σλ2niσλ2+Tσκ2),

γt|  ~  N(γt|σλ2γt1+σγ2i=1n(logθitρκiεiξt)σλ2+nσγ2,σγ2σλ2σλ2+nσγ2),

εi|  ~  N(εi|σε2t=1T(logθitρκiγtξt)σλ2+Tσε2,σε2σλ2σλ2+Tσε2),

ξt|  ~  N(ξt|σξ2i=1n(logθitρκiγtεi)σλ2+nσξ2,σξ2σλ2σλ2+nσξ2),

σλ2|  ~  Inv-Gamma(σλ2|aλ+nT2,  bλ+12i=1nt=1T(logθitρκiγtεiξt)2),

σκ2|  ~  Inv-Gamma(σκ|aκ+n2,  bκ+12i=1nni(κiκ¯i)2),

σγ2|  ~  Inv-Gamma(σγ2|aγ+T2,  bγ+12t=1T(γtγt1)2),

σε2|  ~  Inv-Gamma(σε2|aε+n2,  bε+12i=1nεi2),


σξ2|  ~  Inv-Gamma(σξ2|aξ+T2,  bξ+12t=1Tξt2).

Candidate values for (θ11,,θnT) are drawn by a Metropolis-Hasting updating scheme from the posterior for θit. For all other parameters, samples are drawn directly by Gibbs sampling.

Appendix B

For the MPST model and a given value of k, the conditional distributions are:

κil|    t=1T{ωitlPoisson(Oit|Eit,λl)}N(κil|κ¯il,σκl2ni),

γtl|    i=1n{ωitlPoisson(Oit|Eit,λl)}N(γtl|γt1l,σγl2),

σκl2|  ~  Inv-Gamma(σκl2|aκ+n2,  bκ+12i=1nni(κilκ¯il)2),

σγl2|  ~  Inv-Gamma(σγl2|aγ+T2,  bγ+12t=1T(γtlγt1l)2),

ϕ|  ~  i=1nt=1T{l=1kωitlPoisson(Oit|Eit,λl)}I(0<ϕ<ϕmax),

zit=l|  ~  ωitlPoisson(Oit|Eitl,λl)I(l{1,,k}),


λ|  ~  l=1kGamma(λl|α+i,t:zit=lOit,β+i,t:zit=lEit)I(λ1,,λk).

Candidate values for (κi1,,κik),(γt1,,γtk) and ϕ′ are drawn by a Metropolis-Hasting updating scheme from the posterior distributions for (κi1, …, κik), (γt1, … , γtk) and ϕ, respectively. For all other parameters, samples are drawn directly by Gibbs sampling.

Appendix C

The WinBUGS implementation is based on the idea of profile likelihood where k is fixed to an integer value since k is an integer valued function. The results reported for the MPST model is for k = 3 which is chosen based on the mean square prediction error (MSPE). It is calculated by


where Oit(g,pred) is the predicted O for the given estimates of all model parameters at gth iteration and G is the MCMC sample size. The predicted values are generated from the posterior predictive distribution,


where β includes all the model parameters. The predicted values are generated after ensuring the convergence of all model parameters.

The MSPE, like the DIC, includes a penalizing function for model complexity. However, the type of mixture models that we have considered, it appears that the MSPE keeps decreasing even for dimension k = 5, although the rates of decreasing reduce remarkably after k = 3. The MSPEs for the current simulated data for the number of mixing components 2, 3, 4 and 5 are respectively, 342.4, 167.1, 128.7 and 125.3. Adding a component to k = 2 reduces the MSPE by 175.3 whereas increment from 3 to 4 reduces the MSPE by 38.4 and afterwards there is not much substantial decrease. The second reduction may not be justified for the increased model complexity (i.e., the increased number of model parameters for the increase of mixing component). Note that, increase the mixing component by 1 will increase the number of parameters by 110 (one λ, eighty-eight κ’s and twenty-one γ’s).


  • Agarwal DK, Gelfand AE, Citron-Pousty S. Zero-inflated models with application to spatial count data. Environ Ecol Stat. 2002;9:341–355. doi:10.1023/A:1020910605990.
  • Besag J, York J, Mollie A. Bayesian image restoration with two applications in spatial statistics (with discussion) Ann Inst Stat Math. 1991;43:1–59. doi:10.1007/BF00116466.
  • Best N, Richardson S, Thomas A. A comparison of Bayesian spatial models for disease mapping. Stat Methods Med Res. 2005;14:35–59. doi:10.1191/0962280205sm388oa. [PubMed]
  • Clark AB, Lawson AB. Spatio-temporal cluster modelling of small area health data, Ch 14. In: Lawson AB, Denison D, editors. Spatial cluster modelling. London: Chapman and Hall; 2002.
  • Fernandez C, Green PJ. Modelling spatially correlated data via mixtures: a Bayesian approach. J R Stat Soc B. 2002;64:805–826. doi:10.1111/1467-9868.00362.
  • Gamerman D, Lopes HF. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. 2nd edn. Boca Raton, FL: Chapman & Hall/CRC; 2006.
  • Gelfand AE, Ghosh SK. Model choice: a minimum posterior predictive loss. Biometrika. 1998;85:1–11. doi:10.1093/biomet/85.1.1.
  • Green PJ, Richardson S. Hidden Markov models and disease mapping. J Am Stat Assoc. 2002;97:1–16. doi:10.1198/016214502388618870.
  • Haas TA. Local prediction of a spatio-temporal process with an application to wet sulfate deposition. J Am Stat Assoc. 1995;90:1189–1199. doi:10.2307/2291511.
  • Higdon H. A primer on space-time modeling from a Bayesian perspective, Ch 6. In: Finkenstadt B, Held L, Isham V, editors. Statistical methods for spatio-temporal systems. Florida: Chapman & Hall/CRC; 2007.
  • Hossain MM, Lawson AB. Local likelihood disease clustering: development and evaluation. Environ Ecol Stat. 2005;12:259–273. doi:10.1007/s10651-005-1512-9.
  • Hossain MM, Lawson AB. Cluster detection diagnostics for small area health data: with reference to evaluation of local likelihood models. Stat Med. 2006;25:771–786. doi:10.1002/sim.2401. [PubMed]
  • Kauermann G, Opsomer JD. Local likelihood estimation in generalized additive models. Scand J Stat. 2003;30:317–337. doi:10.1111/1467-9469.00333.
  • Kelsall JE, Wakefield JC. Discussion of “Bayesian models for spatially correlated disease and exposure data” by Best et al. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian statistics 6. Oxford: Oxford University Press; 1999. p. 151.
  • Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Stat Med. 2000;19:2555–2567. doi:10.1002/1097-0258(20000915/30)19:17/18<2555::AID-SIM587>3.0.CO;2-#. [PubMed]
  • Knorr-Held L, Besag J. Modelling risk from a disease in time and space. Stat Med. 1998;17:2045–2060. doi:10.1002/(SICI)1097-0258(19980930)17:18<2045::AID-SIM943>3.0.CO;2-P. [PubMed]
  • Lawson AB. Cluster detection: a critique and a Bayesian proposal. Stat Med. 2006;25:897–916. doi:10.1002/sim.2417. [PubMed]
  • Lawson AB, Denison DGT. Spatial cluster modeling: an overview, Ch 1. In: Lawson AB, Denison D, editors. Spatial cluster modelling. Boca Raton, FL: Chapman and Hall; 2002.
  • R Development Core Team. R Foundation for Statistical Computing. Austria: Vienna; 2004. R: A language and environment for statistical computing.
  • Richardson S, Green PJ. On Bayesian analysis of mixtures with an unknown number of components. J R Stat Soc B. 1997;59(4):731–792. doi:10.1111/1467-9868.00095.
  • Richardson S, Thomson A, Best N, Elliott P. Interpreting posterior relative risk estimates in disease-mapping studies. Environ Health Perspect. 2004;112:1016–1025. [PMC free article] [PubMed]
  • Spiegelhalter D, Best N, Carlin B, Linde A. Bayesian measures of model complexity and fit (with discussion) J Roy Statist Soc Ser B Methodological. 2002;64:583–639. doi:10.1111/1467-9868.00353.
  • Spiegelhalter D, Thomas A, Best N, Lunn D. MRC Biostatistics Unit. Cambridge, UK: Institute of Public Health; 2003. WinBUGS user manual.