J Am Stat Assoc. Author manuscript; available in PMC 2010 July 29.

Published in final edited form as:

J Am Stat Assoc. 2010 June 1; 105(490): 506–521.

doi: 10.1198/jasa.2009.ap09068. PMCID: PMC2911798

NIHMSID: NIHMS206706

Sudipto Banerjee, Associate Professor of Biostatistics, Andrew O. Finley, Assistant Professor, Patrik Waldmann, Statistician, and Tore Ericsson, Senior Scientist

Sudipto Banerjee, School of Public Health, University of Minnesota, Minneapolis, MN 55455;

Sudipto Banerjee: sudiptob@biostat.umn.edu; Andrew O. Finley: finleya@msu.edu; Patrik Waldmann: Patrik.Waldmann@genfys.slu.se; Tore Ericsson: tore.ericsson@skogforsk.se


This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets of multiple traits of interest. Direct application of such multivariate models to large spatial datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts, where such computations are performed for several thousand iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects we discuss a multivariate *predictive process* that reduces the computational burden by projecting the original process onto a subspace generated by realizations of the original process at a specified set of locations (or knots). We illustrate the proposed methods using a synthetic dataset with multivariate additive and dominant genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (*Pinus sylvestris* L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.

Increasing international carbon credit markets and growing demand for wood fiber to supply burgeoning bioeconomies have spurred interest in improving forest ecosystem services. Much of this demand will be met by large plantation forests and, as a result, foresters seek methods to improve the productivity of these plantations. Tree breeding is one of the most important such methods. Here, the focus is on increasing the genetic gain of certain traits in future generations by identifying and mating superior individuals that have high breeding values. In plantation settings, foresters are typically interested in improving traits which increase wood production (e.g., increasing both stem height and diameter). Depending on the anticipated use of the wood, other traits are also considered, such as those that determine stem quality (e.g., the number and form of knots produced by branches from the stem). The principal objective is to reveal whether interrelationships among these various traits of primary or secondary interest are favorable or unfavorable for future selection (i.e., evaluating whether traits are positively or negatively correlated, depending on the objectives of the study). Henderson and Quaas (1976) introduced the multitrait animal (or tree or individual) model in quantitative genetics. Efficient breeding requires accurate estimates of the genetic variance and covariance parameters within the multitrait animal model, as well as individual breeding values (Lynch and Walsh 1998).

Quantitative genetics studies the inheritance of polygenic traits, focusing upon estimation of additive genetic variance and *heritability*, often estimated as the proportion of additive variance out of the total genetic and unexplained variation. A high heritability should result in a larger selection response, that is, a higher probability of genetic gain in future generations. A feature common to genetic trials in forestry and agriculture is the presence of systematic heterogeneity among observational units. If this spatial heterogeneity is neglected in the model, estimates of the genetic parameters might be biased (Dutkowski et al. 2002; Cappa and Cantet 2007). For spatially arranged observational units, heterogeneity typically results from small-scale environmental conditions (e.g., soil characteristics, microclimates, light availability, etc.) that might cause correlation among units as a function of distance and/or direction. While traditional randomized designs sought to negate any confounding spatial correlation, recent interest has shifted to models that account for spatial correlation, most often incorporating it through differencing or by constructing covariates using residuals from neighboring units (see, e.g., Zimmerman and Harville 1991; Cullis et al. 1998).

Hierarchical random effects models implemented through Markov chain Monte Carlo (MCMC) methods have recently gained popularity within quantitative genetics (see, e.g., Sorensen and Gianola 2002; Waldmann and Ericsson 2006; and references therein). Both single-component and blocked Gibbs samplers have been proposed. However, the former suffer from slow mixing and the latter often involve matrix decompositions whose complexity increases as *O*(*n*^{3}) in the number of observations, *n*, at every iteration of the MCMC algorithm. These computationally intensive matrix calculations limit inference to small datasets insufficient for breeding purposes. Recently, Finley et al. (2009a) proposed a class of hierarchical models, for a single outcome variable with genetic and spatial random effects, that are suited to large datasets. A regular spectral decomposition of the genetic covariance matrices, together with a reduced-rank spatial process, enables full Bayesian inference using MCMC methods.

The current article extends the work of Finley et al. (2009a) to a *multivariate* spatial setting, where georeferenced trees with known pedigrees have been observed. Multiple traits of breeding interest are measured on each tree. We posit that the variation within and between these traits can be explained by heredity, spatial variation (e.g., soil characteristics or small scale topographic attributes), and some unstructured measurement error. Here we focus upon three traits that are vitally important in the production of trees for lumber and biomass: height (H), diameter at 1.4 meters from the ground (D), and branch angle (B). The first two traits describe the amount of wood produced by a tree (i.e., the wood volume) and the third describes both the wood quality (i.e., due to knots resulting from branches connecting to the tree bole) and structural integrity of the tree (i.e., susceptibility to breakage due to heavy snow loadings). By estimating the variability of each trait’s additive and, to some extent, nonadditive effects while adjusting for their associations within a location, along with the breeding values, plantation tree nurseries are better able to choose among breeding stock to produce seed with the greatest genetic potential for the desired traits.

One approach builds hierarchical multivariate spatial models by exploiting conditional relationships (spatial regressions) between the variables (Royle and Berliner 1999). This can accrue some computational benefits (see, e.g., Banerjee, Carlin, and Gelfand 2004, section 7.2.2) and can be useful in situations where there is a natural chronology, or causality, for the conditioning. For genetic traits, however, such chronology is often absent or, at best, controversial and one seeks *joint* multivariate models (e.g., Gelfand et al. 2004; Jin, Banerjee, and Carlin 2007).

For the genetic effects we make use of separable association structures and derive a multivariate eigen-decomposition that separates heredity from association among the traits and produces significant computational benefits. For multivariate spatial dependence, a computationally feasible choice is “separable” or “intrinsic” association structures (see, e.g., Banerjee, Carlin, and Gelfand 2004, chapter 7). Here each trait is assigned the same spatial correlation function. It is unlikely that the strength and nature of association will be the same for the different traits and we seek more flexible correlation structures that accommodate a different spatial correlation function for each trait.

A “linear model of coregionalization” (Gelfand et al. 2004) yields arbitrarily rich multivariate correlation models, but its estimation for large datasets is onerous. A reduced-rank multivariate “kriging” model can help achieve dimension reduction. One possibility is to use low-rank smoothing splines that have been fairly conspicuous in univariate settings (Kammann and Wand 2003; Ruppert, Wand, and Carroll 2003; Crainiceanu, Diggle, and Rowlingson 2008). With multiple traits we require multivariate splines whose specifications may be awkward. A reduced-rank multivariate spatial process is arguably more natural and at least as flexible here, with the nice interpretability of correlation functions (see Banerjee et al. 2008; Cressie and Johannesson 2008).

A key point here is to address the issue of systematic biases in the estimates of variance components that arise in low-rank kriging or spline models. Since most existing low-rank methods focus upon prediction and smoothing, this issue has remained largely unaddressed in the aforementioned literature. Here we offer further insight into this systematic bias and improve the predictive process model with a bias-adjustment that easily applies to multivariate settings. Again, the spatial process approach that we adopt here facilitates understanding and rectifying these biases that, arguably, would have been more difficult to resolve with multivariate splines.

In Section 2, we develop the multivariate spatial models for the data. Section 3 discusses the dimension-reducing predictive process models, the systematic biases in estimating variance components from such models, and their remedy. Section 4 outlines the implementation details for the genetic and spatial effects. A synthetic data analysis and the Scots pine analyses are presented in Section 5. Finally, Section 6 concludes the article with a brief discussion.

For our subsequent development we consider a finite collection of spatial locations, say $\mathcal{S}=\{{\mathbf{s}}_{1},\dots ,{\mathbf{s}}_{n}\}$, referenced by Easting–Northing coordinates.

Finley et al. (2009a) (also see references therein) discussed structured random effect models for a single trait. Here we extend their work to multivariate traits over spatial domains. The following multivariate spatially referenced trait model

$$\mathbf{Y}({\mathbf{s}}_{i})=\mathbf{X}{({\mathbf{s}}_{i})}^{\prime}\mathit{\beta}+{\mathbf{a}}_{i}+{\mathbf{d}}_{i}+\mathbf{w}({\mathbf{s}}_{i})+\mathit{\epsilon}({\mathbf{s}}_{i});\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}i=1,\dots ,n.$$

(1)

Here **X**(**s*** _{i}*)′ is an

While **a*** _{i}* and

In modeling these multivariate effects, we need to model associations between the outcome variables within a site as well as association across sites. The *m* × 1 additive and dominance vector effects, **a*** _{i}* and

Estimating (1) from large datasets can be a computationally onerous task as it involves the inverses of the dispersion matrices of
$\mathbf{a}={({\mathbf{a}}_{1}^{\prime},\dots ,{\mathbf{a}}_{n}^{\prime})}^{\prime}$, **d** defined analogously and **w** = (**w**(**s**_{1})′, …, **w**(**s*** _{n}*)′)′. For the genetic effects, separability simplifies computations through the Kronecker products var{
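As a toy numeric sketch of this separability (all matrices below are made up; *n* = 3 individuals, *m* = 2 traits), the *nm* × *nm* dispersion matrix of the stacked genetic effects and its inverse both factor into Kronecker products of small matrices:

```python
import numpy as np

# Toy illustration (hypothetical matrices): n = 3 individuals, m = 2 traits
A = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])        # additive relationship matrix (from pedigree)
Sigma_a = np.array([[2.0, 0.3],
                    [0.3, 1.0]])        # m x m cross-trait additive covariance

# Separable dispersion of the stacked vector a = (a_1', ..., a_n')'
V = np.kron(A, Sigma_a)                 # nm x nm

# The inverse factors into the two much smaller inverses
V_inv = np.kron(np.linalg.inv(A), np.linalg.inv(Sigma_a))
assert np.allclose(V_inv @ V, np.eye(6))
```

The practical gain is that only the *n* × *n* and *m* × *m* factors ever need to be inverted, never the full *nm* × *nm* Kronecker product.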

Spatially structured dependence is introduced in (1) through a multivariate (*m* × 1) spatial process **w**(**s**) ~ GP(**0**, **C _{w}**(·, ·;

The first condition asserts that **Σ _{w}** be symmetric, although the cross-covariance function itself need not be. The second ensures the positive definiteness of

One possibility for **C _{w}**(

More general cross-covariance functions are not routine to specify since they demand that for any number of locations and any choice of these locations the resulting covariance matrix for the associated data be positive definite. Various constructions are possible. One possibility is the moving average approach of Ver Hoef, Cressie, and Barry (2004). The technique is also called kernel convolution and is a well-known approach for creating rich classes of stationary and nonstationary spatial processes (e.g., Higdon 2002). Yet another approach could attempt a multivariate version of local stationarity, extending ideas in Fuentes (2002). The primary characterization theorem for cross-covariance functions (Yaglom 1987) says that real-valued functions, say *C _{ij}*(

Cross-spectral forms are often computationally infeasible resulting in integral representations with embedded parameters that are difficult to estimate. We motivate an alternative approach as follows. For the separable model, notice that we can write **C _{w}**(

For modeling **L**(**s**), without loss of generality one can assume that
$\mathbf{L}(\mathbf{s})={\mathbf{C}}_{\mathbf{w}}^{1/2}(\mathbf{s},\mathbf{s})$ is a lower-triangular square root; the one-to-one correspondence between the elements of **L**(**s**) and *C*** _{w}**(

Finally, we turn to the choice of the spatial correlation functions. The nonstationary Matérn correlation function (Paciorek and Schervish 2006),

$$\frac{1}{\mathrm{\Gamma}(\nu ){2}^{\nu -1}}\mid \mathbf{\Sigma}({\mathbf{s}}_{i}){\mid}^{1/4}\mid \mathbf{\Sigma}({\mathbf{s}}_{j}){\mid}^{1/4}{\left|\frac{\mathbf{\Sigma}({\mathbf{s}}_{i})+\mathbf{\Sigma}({\mathbf{s}}_{j})}{2}\right|}^{-1/2}\times {(2\sqrt{\nu d({\mathbf{s}}_{i},{\mathbf{s}}_{j})})}^{\nu}{K}_{\nu}(2\sqrt{\nu d({\mathbf{s}}_{i},{\mathbf{s}}_{j})}),$$

offers a very flexible choice for *ρ*(**s*** _{i}*,

Once a valid cross-covariance function is specified for a multivariate Gaussian process, the realizations of **w**(**s**) over the set of observed locations are given by N(**0**, **Σ _{w}**(

Modeling large spatial datasets has received much attention in the recent past. Vecchia (1988) proposed approximating the likelihood with a product of appropriate conditional distributions to obtain maximum-likelihood estimates. Stein, Chi, and Welty (2004) adapt this effectively for restricted maximum likelihood estimation. Another possibility is to approximate the likelihood using spectral representations of the spatial process (Fuentes 2007). These likelihood approximations yield a joint distribution, but not a process that facilitates spatial interpolation. Furthermore, these approaches do not seem to extend easily to multivariate settings, especially in the presence of other structured dependencies such as genetic traits. Yet another approach considers compactly supported correlation functions (Furrer, Genton, and Nychka 2006; Kaufman, Schervish, and Nychka 2008; Du, Zhang, and Mandrekar 2009) that yield sparse correlation structures. More efficient sparse solvers can then be employed for kriging and variance estimation, but the tapered structures may limit modeling flexibility. Also, full likelihood-based inference still requires determinant computations that may be problematic.

In recent work, Rue, Martino, and Chopin (2009) propose a promising Integrated Nested Laplace Approximation (INLA) algorithm as an alternative to MCMC that utilizes sparser matrix structures to deliver fast and accurate posterior approximations. This exploits conditional independence to achieve sparse spatial precision matrices that considerably accelerate computations, but the process must then be *approximated* by a Gaussian Markov random field (GMRF) (Rue and Held 2006), and relaxing the conditional independence assumption would significantly detract from the computational benefits of INLA. Furthermore, the method involves a mode-finding exercise for the hyperparameters that may be problematic when the number of hyperparameters exceeds 10. In short, its effectiveness is unclear for our multivariate genetic models with different structured random effects and unknown covariance matrices as hyperparameters.

We opt for a class of models that emerge from representations of the spatial process in a lower-dimensional subspace and that easily adapt to multivariate processes. These are often referred to as low-rank or reduced-rank spatial models and have been explored in different contexts (Kammann and Wand 2003; Stein 2007, 2008; Banerjee et al. 2008; Crainiceanu, Diggle, and Rowlingson 2008; Cressie and Johannesson 2008). Many of these methods are variants of the so-called “subset of regressors” methods used in Gaussian process regressions for large datasets in machine learning (e.g., Rasmussen and Williams 2006). The idea here is to consider a smaller set of locations, or “knots,” say
${\mathcal{S}}^{\ast}=\{{\mathbf{s}}_{1}^{\ast},\dots ,{\mathbf{s}}_{{n}^{\ast}}^{\ast}\}$, where the number of knots, *n*^{*}, is much smaller than the number of observed sites, and to express the spatial process realizations over $\mathcal{S}$ in terms of its realizations over the smaller set of knots. It is reasonable to assume that the spatial information available from the *n* sites can be summarized by a smaller, but representative, set of locations—the knots. This is achieved by an optimal projection of **w**(**s**) using the information available from the knots.

A process-based reduced-rank formulation has recently been explored by Banerjee et al. (2008). We call **w**(**s**) the *parent process* and consider its realizations over an arbitrary set of *n*^{*} locations
${\mathcal{S}}^{\ast}=\{{\mathbf{s}}_{1}^{\ast},\dots ,{\mathbf{s}}_{{n}^{\ast}}^{\ast}\}$ with *n*^{*} ≪ *n*. Letting
${\mathbf{w}}^{\ast}={(\mathbf{w}{({\mathbf{s}}_{1}^{\ast})}^{\prime},\dots ,\mathbf{w}{({\mathbf{s}}_{{n}^{\ast}}^{\ast})}^{\prime})}^{\prime}$ be the realizations of the parent process over ${\mathcal{S}}^{\ast}$, we formulate a *multivariate predictive process* defined as

$$\begin{array}{l}\stackrel{\sim}{\mathbf{w}}(\mathbf{s})=\text{E}[\mathbf{w}(\mathbf{s})\mid {\mathbf{w}}^{\ast}]\\ =\text{cov}(\mathbf{w}(\mathbf{s}),{\mathbf{w}}^{\ast}){\text{var}}^{-1}({\mathbf{w}}^{\ast}){\mathbf{w}}^{\ast}\\ =\mathcal{C}{(\mathbf{s};\mathit{\theta})}^{\prime}{\mathbf{\Sigma}}^{\ast}{(\mathit{\theta})}^{-1}{\mathbf{w}}^{\ast},\end{array}$$

(2)

where $\mathcal{C}{(\mathbf{s};\mathit{\theta})}^{\prime}$ is a 1×

$$\mathbf{Y}({\mathbf{s}}_{i})=\mathbf{X}{({\mathbf{s}}_{i})}^{\prime}\mathit{\beta}+{\mathbf{a}}_{i}+{\mathbf{d}}_{i}+\stackrel{\sim}{\mathbf{w}}({\mathbf{s}}_{i})+\mathit{\epsilon}({\mathbf{s}}_{i});\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}i=1,\dots ,n.$$

(3)

No new parameters are introduced and the cross-covariance function of the predictive process derives directly from that of the parent process.
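A univariate numpy sketch of the projection in (2), assuming an exponential covariance for the parent process (all parameter values and locations below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_star = 200, 25                       # observed sites and knots (illustrative)
sigma2, phi = 2.0, 6.0                    # illustrative covariance parameters

S = rng.uniform(size=(n, 2))              # observed locations
S_star = rng.uniform(size=(n_star, 2))    # knots S*

def exp_cov(A, B):
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return sigma2 * np.exp(-phi * d)

Sigma_star = exp_cov(S_star, S_star)      # var(w*), n* x n*
C_cross = exp_cov(S, S_star)              # cov(w(s_i), w*), stacked over sites

# Draw the parent process at the knots, then project as in (2):
#   w~(s) = C(s; theta)' Sigma*(theta)^{-1} w*
L = np.linalg.cholesky(Sigma_star + 1e-10 * np.eye(n_star))
w_star = L @ rng.standard_normal(n_star)
w_tilde = C_cross @ np.linalg.solve(Sigma_star, w_star)

# The projection reproduces the parent realizations at the knots themselves
assert np.allclose(exp_cov(S_star, S_star) @ np.linalg.solve(Sigma_star, w_star), w_star)
```

The only expensive object is the *n*^{*} × *n*^{*} matrix Σ^{*}(θ), which is why choosing *n*^{*} ≪ *n* yields the dimension reduction.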

Unlike many other knot-based methods, the predictive process model neither introduces additional parameters nor requires projecting the data onto a grid. Also, some approaches (e.g., Cressie and Johannesson 2008) require obtaining an empirical estimate of the covariance function. This may be challenging for the hierarchical models we envision here, as the variability in the “data” is assumed to be a sum of different structured effects; hence we cannot use variograms on the data to isolate empirical estimates of the covariance function from the other random effects. These effects are identified through the second-stage specifications of their dispersion structures. In addition, the predictive process enjoys several desirable theoretical properties in terms of approximating the parent process. For instance, $\stackrel{\sim}{\mathbf{w}}$(**s**), being a conditional expectation, can be regarded as an orthogonal projection of **w**(**s**) onto an appropriate linear subspace (e.g., Stein 1999). Other properties are discussed in Banerjee et al. (2008).

For multivariate spatial processes, we have the following inequality:

$$\begin{array}{l}var\{\mathbf{w}(\mathbf{s})\}-var\{\stackrel{\sim}{\mathbf{w}}(\mathbf{s})\}\\ ={\mathbf{C}}_{\mathbf{w}}(\mathbf{s},\mathbf{s})-\mathcal{C}{(\mathbf{s};\mathit{\theta})}^{\prime}{\mathbf{\Sigma}}^{\ast}{(\mathit{\theta})}^{-1}\mathcal{C}(\mathbf{s};\mathit{\theta})\\ =var\{\mathbf{w}(\mathbf{s})\mid {\mathbf{w}}^{\ast}\}\succcurlyeq 0,\end{array}$$

(4)

where var{·} denotes the variance–covariance matrix and ⪰ 0 indicates nonnegative definiteness. Equality holds only when $\mathbf{s}\in {\mathcal{S}}^{\ast}$, and thus the predictive process coincides with the parent process over the knots.

To understand the implications of the inequality in (4), let us momentarily focus upon a univariate predictive process. For the univariate process (i.e., with *m* = 1), let *w*(**s**) follow a spatial process with covariance function parameterized as
${C}_{w}({\mathbf{s}}_{1},{\mathbf{s}}_{2})={\sigma}_{w}^{2}\rho ({\mathbf{s}}_{1},{\mathbf{s}}_{2};\mathit{\phi})$. The corresponding covariance function for (**s**) is given by
${C}_{\stackrel{\sim}{w}}({\mathbf{s}}_{1},{\mathbf{s}}_{2})={\sigma}_{w}^{2}\mathbf{r}{({\mathbf{s}}_{1};\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathbf{r}({\mathbf{s}}_{2};\mathit{\phi})$, where
${\sigma}_{w}^{2}$ now represents the variance of the spatial process, **r**(**s**; *ϕ*) is the *n*^{*} × 1 vector with the spatial correlation between **s** and
${\mathbf{s}}_{i}^{\ast}$, that is,
$\rho (\mathbf{s};{\mathbf{s}}_{i}^{\ast})$, as its *i*th element, and **R**^{*}(*ϕ*) is the *n*^{*} × *n*^{*} correlation matrix of the realizations of the parent process over ${\mathcal{S}}^{\ast}$. In particular, the variance of the predictive process at any location **s** is given by
${\sigma}_{\stackrel{\sim}{w}}^{2}(\mathbf{s})={\sigma}_{w}^{2}\mathbf{r}{(\mathbf{s};\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathbf{r}(\mathbf{s};\mathit{\phi})$. The inequality in (4) implies
${\sigma}_{\stackrel{\sim}{w}}^{2}(\mathbf{s})\le {\sigma}_{w}^{2}$ at every location **s**.
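This variance inequality is easy to check numerically. A sketch assuming an exponential correlation function, with σ_w² = 5 and ϕ = 12 as in the synthetic example of Section 5.1, at an arbitrary location and knot set (both invented here):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_w, phi, n_star = 5.0, 12.0, 30      # sigma_w^2 and phi as in Section 5.1
S_star = rng.uniform(size=(n_star, 2))     # arbitrary knots in the unit square
s0 = np.array([[0.5, 0.5]])                # an arbitrary location s

def rho(A, B):
    # exponential correlation rho(s, s') = exp(-phi ||s - s'||)
    return np.exp(-phi * np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2))

R_star = rho(S_star, S_star)               # R*(phi), n* x n*
r = rho(s0, S_star).ravel()                # r(s; phi), n* x 1

# sigma~_w^2(s) = sigma_w^2 r(s)' R*(phi)^{-1} r(s)  <=  sigma_w^2
var_pp = sigma2_w * (r @ np.linalg.solve(R_star, r))
assert 0.0 <= var_pp <= sigma2_w
```

The quadratic form r′R*^{−1}r lies in [0, 1] because σ_w²(1 − r′R*^{−1}r) is the (nonnegative) kriging variance at **s**, which is what the inequality in (4) formalizes.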

The above argument reveals the lower variability of the predictive process, which leads to oversmoothing by predictive process models. This, in turn, leads the predictive process model to overestimate the residual (unstructured) variance, *τ*^{2}, as it attempts to absorb the remaining variability. Furthermore, note that the predictive process is inherently nonstationary. This calls for care in interpreting the parameter
${\sigma}_{w}^{2}$ in predictive process models. In fact, it may be tempting to conclude that the oversmoothing caused by the predictive process would lead to underestimation of
${\sigma}_{w}^{2}$. The matter, however, is somewhat more subtle and in practice we tend to see an *upward* bias in the estimation of *both* the spatial and unstructured residual variance components (see Section 5.1).

To explain this, consider a single trait *Y*(**s**) in a univariate version (i.e., *m* = 1) of the model in (1). Let $\mathcal{S}=\{{\mathbf{s}}_{1},\dots ,{\mathbf{s}}_{n}\}$ denote the set of observed data locations and assume (for convenience) that $\mathcal{S}\cap {\mathcal{S}}^{\ast}$ is empty, that is, the knots are completely exclusive of the data locations. Then, standard multivariate normal calculations reveal

$$var(\mathbf{w}\mid {\mathbf{w}}^{\ast})={\sigma}_{w}^{2}(\mathbf{R}(\mathit{\phi})-\mathcal{R}{(\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathcal{R}(\mathit{\phi}))\succ 0,$$

where **w** = (*w*(**s**_{1}), …, *w*(**s**_{n}))′,
$\mathcal{R}{(\mathit{\phi})}^{\prime}=\{\rho ({\mathbf{s}}_{i},{\mathbf{s}}_{j}^{\ast})\}$ is the *n* × *n*^{*} cross-correlation matrix between the realizations over the locations and over the knots, and ≻ 0 denotes positive definiteness. This implies that $\mathbf{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n}\succ \mathcal{R}{(\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathcal{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n}$ and hence, inverting both sides,

$${\mathbf{u}}^{\prime}{(\mathcal{R}{(\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathcal{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n})}^{-1}\mathbf{u}\ge {\mathbf{u}}^{\prime}{(\mathbf{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n})}^{-1}\mathbf{u},$$

(5)

where **u** = **y** − **X β** −

If
${\widehat{\sigma}}_{w2}^{2}$ and
${\widehat{\sigma}}_{w1}^{2}$ denote the likelihood-based estimates from the predictive process model (3) and the parent model (1), respectively, then using standard eigen-analysis (Appendix A) we obtain
${\widehat{\sigma}}_{w1}^{2}/{\widehat{\sigma}}_{w2}^{2}\le {\lambda}_{max}((\mathbf{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n}){(\mathcal{R}{(\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathcal{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n})}^{-1})$, where λ_{max}(**A**) denotes the maximum eigenvalue of the matrix **A**. Arguments analogous to the above can be used to demonstrate biases in variance component estimation in other low-rank spline models as well.

Using equivalence of Gaussian measures and fixed-domain asymptotics, Zhang (2004) proved that the spatial covariance parameters cannot all be estimated consistently in classical geostatistical settings with the isotropic Matérn correlation function (see Section 2.4), although the product ${\sigma}^{2}{\phi}^{2\nu}$ is consistently estimable. However, assigning priors to such reparametrizations may not always be straightforward. From a Bayesian perspective, this implies that the effect of the priors on the inference of the variance components may not be easily overwhelmed by the data, even when the datasets are large.

In light of the above, the biases we point out are derived holding the parameters in **θ** constant. In such settings, our simulation experiments reveal that both the parent model and the predictive process model often tend to estimate the regression coefficients **β** and

The key point is that the predictive process, univariate or multivariate, introduces nonstationarity in the spatial variance component. The variance component parameters, therefore, have different interpretations for stationary parent process models and their nonstationary predictive process counterparts. To adjust for the nonstationary spatial variance component, we need to introduce a nonstationary unstructured variance while making sure not to incur computational expenses.

Returning to the more general *m* × 1 multivariate processes, a rectification is obtained by adding a spatially adaptive “nugget” process to the predictive process $\stackrel{\sim}{\mathbf{w}}$(**s**). More precisely, we introduce $\stackrel{\sim}{\mathit{\epsilon}}$(

$$\mathbf{Y}({\mathbf{s}}_{i})=\mathbf{X}{({\mathbf{s}}_{i})}^{\prime}\mathit{\beta}+{\mathbf{a}}_{i}+{\mathbf{d}}_{i}+{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}({\mathbf{s}}_{i})+\mathit{\epsilon}({\mathbf{s}}_{i});\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}i=1,\dots ,n.$$

(6)

Notice that var{${\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}$(
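In the univariate case, the adjustment amounts to an independent nugget whose variance is the site-specific conditional variance in (4). A numpy sketch under an assumed exponential covariance (all parameter values and locations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2_w, phi, n, n_star = 5.0, 12.0, 100, 20   # illustrative values
S = rng.uniform(size=(n, 2))                    # observed sites
S_star = rng.uniform(size=(n_star, 2))          # knots

def cov(A, B):
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return sigma2_w * np.exp(-phi * d)

Sigma_star = cov(S_star, S_star)
C_cross = cov(S, S_star)

# Variance deficit of the predictive process at each site:
#   var{w(s_i)} - var{w~(s_i)} = C_w(s_i, s_i) - C(s_i)' Sigma*^{-1} C(s_i)
deficit = sigma2_w - np.einsum('ij,ji->i', C_cross,
                               np.linalg.solve(Sigma_star, C_cross.T))

# Spatially adaptive nugget eps~(s_i) ~ N(0, deficit_i), independent across sites;
# adding it restores the parent-process variance at every site
eps_tilde = np.sqrt(np.clip(deficit, 0.0, None)) * rng.standard_normal(n)
```

Because each deficit term involves only its own site, the adjustment adds no matrix computations beyond those the predictive process already requires.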

We provide some brief remarks on knot selection (see Finley et al. 2009b, for details). With a fairly even distribution of data locations, one possibility is to select knots on a uniform grid overlaid on the domain. Alternatively, selection can be achieved through a formal design-based approach based upon minimization of a spatially averaged predictive variance criterion (e.g., Diggle and Lophaven 2006). However, in general, the locations are highly irregular, generating substantial areas of sparse observations where we wish to avoid placing knots, since they would be “wasted” and possibly lead to inflated predictive process variances and slower convergence. Here, more practical space-covering designs (e.g., Royle and Nychka 1998) can yield a representative collection of knots that better cover the domain. Another alternative is to apply popular clustering algorithms such as k-means or more robust median-based *partitioning around medoids* algorithms (e.g., Kaufman and Rousseeuw 1990). User-friendly implementations of these algorithms are available in R packages such as fields and cluster and have been used in spline-based low-rank kriging models (Ruppert, Wand, and Carroll 2003; Crainiceanu, Diggle, and Rowlingson 2008).
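As a sketch of the clustering option (the point pattern and knot count below are invented), SciPy's k-means routine returns centroids that concentrate knots where observations are dense:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(4)
# Hypothetical irregular pattern: two dense clusters plus sparse background scatter
locs = np.vstack([rng.normal([0.3, 0.3], 0.05, size=(400, 2)),
                  rng.normal([0.7, 0.8], 0.05, size=(400, 2)),
                  rng.uniform(size=(200, 2))])

# k-means centroids serve as knots: they track the observed point density,
# avoiding "wasted" knots in sparse regions of the domain
knots, labels = kmeans2(locs, 50, minit='++', seed=4)
```

A space-covering or medoid-based design could be substituted for `kmeans2` without changing the rest of the workflow; only the resulting knot coordinates feed into Σ^{*}(θ).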

Let **Y** = (**Y**(**s**_{1})′, …, **Y**(**s*** _{n}*)′)′ be

$$\begin{array}{l}p(\{\mathbf{\Psi}\})p(\mathit{\theta})\text{N}(\mathit{\beta}\mid {\mathit{\mu}}_{\beta},{\mathbf{\Sigma}}_{\beta})\text{IW}({\mathbf{\Sigma}}_{a}\mid {r}_{a},{\mathrm{\Upsilon}}_{a})\text{N}(\mathbf{a}\mid \mathbf{0},\mathbf{A}\otimes {\mathbf{\Sigma}}_{a})\times \text{IW}({\mathbf{\Sigma}}_{d}\mid {r}_{d},{\mathrm{\Upsilon}}_{d})\text{N}(\mathbf{d}\mid \mathbf{0},\mathbf{D}\otimes {\mathbf{\Sigma}}_{d})\times \text{N}({\mathbf{w}}^{\ast}\mid \mathbf{0},{\mathbf{\Sigma}}^{\ast}(\mathit{\theta}))\times \text{N}({\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \mathbf{F}(\mathit{\theta}){\mathbf{w}}^{\ast},{\mathbf{\Sigma}}_{\stackrel{\sim}{\epsilon}})\times \prod _{i=1}^{n}\text{N}\{\mathbf{Y}({\mathbf{s}}_{i})\mid \mathbf{X}{({\mathbf{s}}_{i})}^{\prime}\mathit{\beta}+{\mathbf{a}}_{i}+{\mathbf{d}}_{i}+{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}({\mathbf{s}}_{i}),\mathbf{\Psi}\}.\end{array}$$

(7)

The matrices **A** and **D** are fixed, while **Σ*** _{a}* and

Assuming stationary cross-covariances [i.e., **L**(**s**) = **L** in Section 2.3], we have ** θ** = {

Estimation proceeds via MCMC, employing a Gibbs sampler with Metropolis–Hastings (MH) steps (Gelman et al. 2004); see Appendix B for details. Briefly, the full conditional distributions for the regression coefficients and the additive, dominance, and spatial effects are all normal, while those for the matrices **Σ***_{a}*,

Gibbs updates still require computing
${[{\mathbf{A}}^{-1}\otimes {\mathbf{\Sigma}}_{a}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}$ for updating **a** and a similar matrix for **d**, defined analogously, both of which are *nm* × *nm*. A brute-force approach is not computationally feasible, so we derive the following *multivariate eigen-decomposition* (Appendix C) to obtain

$${\mathbf{\Sigma}}_{a\mid \cdot}=({\mathbf{P}}_{\mathbf{A}}\otimes {I}_{m})\left[\underset{i=1}{\overset{n}{\oplus}}{\left(\frac{1}{{\lambda}_{\mathbf{A}}^{(i)}}{\mathbf{\Sigma}}_{a}^{-1}+{\mathbf{\Psi}}^{-1}\right)}^{-1}\right]({\mathbf{P}}_{\mathbf{A}}^{\prime}\otimes {I}_{m}).$$

(8)

Here
${\mathbf{A}}^{-1}={\mathbf{P}}_{\mathbf{A}}{\mathrm{\Lambda}}_{\mathbf{A}}^{-1}{\mathbf{P}}_{\mathbf{A}}^{\prime}$ and
${\mathbf{\Sigma}}_{a}^{-1}={\mathbf{P}}_{{\mathbf{\Sigma}}_{a}}{\mathrm{\Lambda}}_{{\mathbf{\Sigma}}_{a}}^{-1}{\mathbf{P}}_{{\mathbf{\Sigma}}_{a}}^{\prime}$ are the spectral decompositions of **A**^{−1} and
${\mathbf{\Sigma}}_{a}^{-1}$, and
${\lambda}_{\mathbf{A}}^{(i)}$ denotes the *i*th eigenvalue of **A**. Massive computational gains are achieved because the matrix **A** is fixed: its spectral decomposition, which delivers the eigenvalues, needs to be computed only *once*, before running the Gibbs sampler. All other matrix inversions on the right-hand side of (8) involve only the much smaller *m* × *m* matrices. The setting for **Σ**_{d|·} is exactly analogous.
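A small numerical check of the identity in (8), on a toy problem with invented *n*, *m*, and matrices (`A` plays the role of the fixed relationship matrix):

```python
import numpy as np

# Toy sizes and hypothetical matrices; A is symmetric positive definite
n, m = 4, 2
A = np.array([[1.0, 0.5, 0.25, 0.125],
              [0.5, 1.0, 0.5, 0.25],
              [0.25, 0.5, 1.0, 0.5],
              [0.125, 0.25, 0.5, 1.0]])         # fixed relationship matrix
Sigma_a = np.array([[2.0, 0.3], [0.3, 1.0]])    # cross-trait additive covariance
Psi = np.array([[1.5, 0.2], [0.2, 0.8]])        # residual dispersion

# One-time spectral decomposition A = P_A diag(lam) P_A' (done before the sampler)
lam, P = np.linalg.eigh(A)
Sa_inv, Psi_inv = np.linalg.inv(Sigma_a), np.linalg.inv(Psi)

# Direct sum of n small m x m inverses -- the only per-iteration inversions
mid = np.zeros((n * m, n * m))
for i, l in enumerate(lam):
    mid[i*m:(i+1)*m, i*m:(i+1)*m] = np.linalg.inv(Sa_inv / l + Psi_inv)
rhs = np.kron(P, np.eye(m)) @ mid @ np.kron(P.T, np.eye(m))

# Brute-force nm x nm inversion that (8) replaces
lhs = np.linalg.inv(np.kron(np.linalg.inv(A), Sa_inv) + np.kron(np.eye(n), Psi_inv))
assert np.allclose(lhs, rhs)
```

The check works because (P_A ⊗ I_m) diagonalizes both Kronecker terms simultaneously, leaving a block-diagonal middle factor whose blocks are only *m* × *m*.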

A useful strategy for estimating spatial hierarchical models is to replace the conditionally independent likelihood in (7) with the marginalized likelihood obtained after integrating out the spatial effects (e.g., Banerjee et al. 2008; Finley et al. 2009b). However, this becomes problematic in the presence of other structured random effects, as we are unable to arrive at forms such as (8). Instead, we integrate **w**^{*} out of (7) and design the Gibbs sampler to obtain the realizations of the predictive process ${\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}$. More specifically, this leads to replacing N(

Updating ${\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}$ requires the inverse of

$${\left[{\mathbf{\sum}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}}{(\mathit{\theta})}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}\right]}^{-1}=\mathbf{H}{(\mathit{\theta})}^{-1}+\mathbf{H}{(\mathit{\theta})}^{-1}\mathbf{V}{(\mathit{\theta})}^{\prime}\mathbf{J}{(\mathit{\theta})}^{-1}\mathbf{V}(\mathit{\theta})\mathbf{H}{(\mathit{\theta})}^{-1},$$

where **H**(** θ**) is block-diagonal with [

$$det({\mathbf{\sum}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}}(\mathit{\theta}))=det(\mathbf{M}(\mathit{\theta}))\prod _{i=1}^{n}det\{{\mathbf{C}}_{\mathbf{w}}({\mathbf{s}}_{i},{\mathbf{s}}_{i})-\mathcal{C}{({\mathbf{s}}_{i},\mathit{\theta})}^{\prime}{\mathbf{\sum}}^{\ast}{(\mathit{\theta})}^{-1}\mathcal{C}({\mathbf{s}}_{i},\mathit{\theta})\}/det({\mathbf{\sum}}^{\ast}(\mathit{\theta})).$$
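The inverse and determinant above exploit the Sherman–Morrison–Woodbury formula and the matrix determinant lemma (see Henderson and Searle 1981): a low-rank-plus-block-diagonal matrix can be handled at the cost of one small inversion. A hedged NumPy sketch of the general identities (the generic `D`, `U`, `C` below are illustrative stand-ins, not the paper's **H**(θ), **V**(θ), **J**(θ), whose exact sign conventions depend on how **J**(θ) is defined):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 5  # data size vs. (small) knot-induced rank

D = np.diag(rng.uniform(1.0, 2.0, size=n))  # cheap-to-invert diagonal part
U = rng.normal(size=(n, r))                 # tall-and-thin factor
C = np.eye(r)                               # low-rank core

# Woodbury: (D + U C U')^{-1} = D^{-1} - D^{-1} U (C^{-1} + U' D^{-1} U)^{-1} U' D^{-1}
full = np.linalg.inv(D + U @ C @ U.T)                     # O(n^3)
Dinv = np.diag(1.0 / np.diag(D))                          # O(n)
small = np.linalg.inv(np.linalg.inv(C) + U.T @ Dinv @ U)  # only r x r
woodbury = Dinv - Dinv @ U @ small @ U.T @ Dinv
assert np.allclose(full, woodbury)

# determinant lemma: det(D + U C U') = det(C^{-1} + U' D^{-1} U) det(C) det(D)
lhs = np.linalg.det(D + U @ C @ U.T)
rhs = (np.linalg.det(np.linalg.inv(C) + U.T @ Dinv @ U)
       * np.linalg.det(C) * np.linalg.det(D))
assert np.allclose(lhs, rhs)
```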

This setup still allows us to recover full posterior inference on the two components of ${\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}$, viz. $\stackrel{\sim}{\mathbf{w}}$ and $\stackrel{\sim}{\mathit{\epsilon}}$.

To supplement the discussion on the predictive process bias-adjustment in Section 3.3, we offer a brief analysis of a synthetic univariate dataset. This dataset comprises 500 locations distributed randomly within a unit square domain. The univariate outcome *Y*(**s**) is generated over these 500 locations from a zero-centered isotropic Gaussian process, with covariance
$cov\{Y(\mathbf{s}),Y({\mathbf{s}}^{\prime})\}={\sigma}_{w}^{2}exp(-\phi \mid \mid \mathbf{s}-{\mathbf{s}}^{\prime}\mid \mid )+{\tau}^{2}{I}_{[\mathbf{s}={\mathbf{s}}^{\prime}]}$, where
${\sigma}_{w}^{2}=5$, *τ*^{2} = 5, and *φ* = 12 [i.e., the spatial correlation drops to 0.05 at a distance of 0.25 units]. To illustrate the relationship between bias and knot intensity over the domain, we fit regular (unadjusted) and bias-adjusted predictive process models to these data based on 11 knot intensities between 0 and 500. For each model,
${\sigma}_{w}^{2}$, *τ*^{2}, *φ*, and the model intercept *β*_{0} were estimated using three MCMC chains of 25,000 post burn-in iterations.
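A data-generating step of this kind can be sketched as follows (assuming a simple Cholesky-based simulation; the exact mechanism is not spelled out in the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
S = rng.uniform(size=(n, 2))          # random locations on the unit square

sigma2_w, tau2, phi = 5.0, 5.0, 12.0  # true values used in the text
dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
K = sigma2_w * np.exp(-phi * dist) + tau2 * np.eye(n)

# one zero-centered realization Y(s) via the Cholesky factor of K
Y = np.linalg.cholesky(K) @ rng.normal(size=n)

# with exponential correlation, exp(-phi * d) = 0.05 at d ~= 3/phi = 0.25
assert abs(np.exp(-phi * 0.25) - 0.05) < 1e-3
```

The final line verifies the stated effective range: with an exponential correlation function and *φ* = 12, the correlation falls to roughly 0.05 at a distance of 3/*φ* = 0.25 units.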

For brevity, we present only the results for the variance parameters of interest. Results for the unadjusted models are presented in the top subfigures of Figure 1, which illustrate the upward bias in
${\sigma}_{w}^{2}$ and *τ*^{2} estimates as a function of knot intensity. Here, we see a large bias in
${\sigma}_{w}^{2}$ and *τ*^{2} estimates when knot intensity is low. Increasing knot intensity does reduce the bias, but only marginally for *τ*^{2} which fails to recover the true parameter value even as the number of knots approaches the number of observations. In comparison, the bias-adjusted predictive process recovers the true
${\sigma}_{w}^{2}$ and *τ*^{2} values even at low knot intensities, as illustrated in the bottom subfigures in Figure 1. For the analyses offered in Sections 5.2 and 5.3 only the bias-adjusted models are considered.

One replicate of full-sib families (800 offspring in total) was constructed by crossing 50 unrelated parents according to a partial diallel design (see Kempthorne and Curnow 1961, for design details). The 850 individuals were assigned randomly to intersections of a 34 × 25 unit grid. The values of three traits of interest were generated for each individual. The synthetic population mean of each trait was described with an intercept and a single slope coefficient [with the corresponding variable drawn from N(0, 1)], and the multitrait random polygenic effects **a** and **d** were drawn from MVN(**0**, **A** ⊗ **Σ*** _{a}*) and MVN(**0**, **D** ⊗ **Σ*** _{d}*), respectively.

We estimate the hierarchical model in (7) using a stationary but anisotropic Matérn correlation function as described in Section 2.4. The mean fixed effects and lower-triangles of **Σ*** _{a}* and

The left column shows the interpolated surfaces of the three response variables over the 850 synthetic data points (overlaid on top left surface). The right column shows the associated estimates of the random spatial effects based on 80 knots (overlaid **...**

Parameter credible intervals, 50% (2.5%, 97.5%) for the synthetic dataset nonspatial and spatial predictive process models. Credible intervals that do not include the parameter’s true value are bolded

The synthetic dataset is used to illustrate the extent to which the proposed methods can recover genetic parameters of interest. Assuming ignorance about the partitioning of the variance in **Y**, we assign an IW(*m* + 1, *I _{m}*) prior to each genetic dispersion matrix, while each diagonal element of

Four candidate models were considered: the nonspatial model with only genetic effects and three models with genetic effects and predictive process based random spatial effects. For the spatial models, we considered knot intensities of 80, 101, and 125. Knot configuration followed a *lattice plus close pairs* design (see Diggle and Lophaven 2006). This design consists of a regular *k* × *k* lattice of knots intensified by a set of *m*′ knots placed at random adjacent to the lattice knots—say within a circle having the knot point as center and a radius that is some fraction of the spacing on the lattice. This design is illustrated in the top right panel in Figure 2.
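A minimal sketch of such a knot design (our own illustrative implementation; parameter names such as `frac` are assumptions, and Diggle and Lophaven 2006 should be consulted for the formal design):

```python
import numpy as np

rng = np.random.default_rng(4)

def lattice_plus_close_pairs(k, m_close, frac=0.5, extent=1.0):
    """k x k regular lattice of knots, plus m_close extra knots each placed
    uniformly in a disc (radius = frac * lattice spacing) centered on a
    randomly chosen lattice knot."""
    spacing = extent / (k + 1)
    axis = spacing * (np.arange(k) + 1)
    gx, gy = np.meshgrid(axis, axis)
    lattice = np.column_stack([gx.ravel(), gy.ravel()])

    centers = lattice[rng.choice(len(lattice), m_close, replace=False)]
    r = frac * spacing * np.sqrt(rng.uniform(size=m_close))  # uniform in disc
    theta = rng.uniform(0, 2 * np.pi, size=m_close)
    close = centers + np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    return np.vstack([lattice, close])

knots = lattice_plus_close_pairs(k=8, m_close=16)  # 64 + 16 = 80 knots
assert knots.shape == (80, 2)
```

An 8 × 8 lattice intensified with 16 close pairs reproduces the 80 knot intensity used above; the other intensities follow by varying `k` and `m_close`.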

For each model, three MCMC chains were run for 25,000 iterations. The sampler was coded in C++ and leveraged threaded BLAS and LAPACK routines for matrix computations. The code was run on an Altix 350 SGI with two 64-bit 1.6 GHz Intel Itanium processors. Each chain of 25,000 iterations took less than 2 hours to complete. The CODA package in R (http://cran.r-project.org/web/packages/coda/) was used to diagnose convergence by monitoring mixing using Gelman–Rubin diagnostics and autocorrelations (Gelman et al. 2004, section 11.6). Acceptable convergence was diagnosed within 10,000 iterations and therefore 45,000 samples (15,000 × 3) were retained for posterior analysis.
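For reference, the Gelman–Rubin potential scale reduction factor monitored here can be sketched for a single scalar parameter as follows (a simplified version of what CODA's `gelman.diag` computes; this is not the authors' code):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter.
    `chains` is an (m_chains, n_iter) array of post burn-in draws."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(5)
mixed = rng.normal(size=(3, 15000))        # three well-mixed chains
assert gelman_rubin(mixed) < 1.1           # close to 1 suggests convergence
```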

Tables 1 and 2 offer parameter estimates and associated credible intervals for the nonspatial and predictive process models based on the 80 and 125 knot intensities. The nonspatial model fails to correctly estimate three of the six fixed effects (identified in bold text in Table 1). Interestingly, despite the misspecification of the model (i.e., the missing spatial effects), the additive and dominance dispersion matrices are estimated correctly, except for **Σ**_{d}_{;2,2}. However, apportioning spatial variation to pure error causes the 95% credible intervals of two diagonal elements in **Ψ** to miss their true values (Table 2). This can cause a downward bias in the measures of heritability. The true heritabilities for the three traits are **Σ**_{a}_{;1,1}/(**Σ**_{a}_{;1,1} + **Σ**_{d}_{;1,1} + **Ψ**_{1,1}) = 25/(25 + 25 + 5) = 0.45, **Σ**_{a}_{;2,2}/(**Σ**_{a}_{;2,2} + **Σ**_{d}_{;2,2} + **Ψ**_{2,2}) = 0.48, and **Σ**_{a}_{;3,3}/(**Σ**_{a}_{;3,3} + **Σ**_{d}_{;3,3} + **Ψ**_{3,3}) = 0.9, while the nonspatial model estimates, with 95% credible intervals, are 0.43 (0.32, 0.51), 0.45 (0.35, 0.55), and 0.69 (0.55, 0.78), as provided in Table 2. A similar downward bias is seen in the dominance ratio when it is calculated.
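The heritability arithmetic above is simply the ratio of additive variance to total phenotypic variance; for example, using the trait-1 true values from the simulation:

```python
def heritability(sigma_a, sigma_d, psi):
    # narrow-sense heritability: additive variance over total variance
    return sigma_a / (sigma_a + sigma_d + psi)

# trait-1 true values from the synthetic dataset
assert round(heritability(25.0, 25.0, 5.0), 2) == 0.45
```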

The spatial models provide improved estimates of the fixed and genetic random effects, as well as estimates of heritability and dominance ratio. For example, at the 80 knot intensity the heritability estimates for the three traits are 0.43 (0.31, 0.54), 0.49 (0.36, 0.63), and 0.97 (0.88, 0.99). We note that two parameters fall outside of their 95% credible intervals in the 80 knot model (i.e., **Σ**_{d}_{;3,3} and **C _{w}**

Field measurements from a 26-year-old scots pine (*Pinus sylvestris* L.) progeny study in northern Sweden serve as a trial dataset for our proposed methods. The original design established in 1971 (by Skogforsk; trial identification S23F7110264 Vindeln) consisted of 8160 seedlings divided into two spatially disjoint trial sites (denoted as the northern and southern trial sites). The analysis presented here focuses on the northern planting site. At planting, this site was subdivided into 105 8.8 m × 22 m blocks, each containing 40 seedlings placed on a 2.2 m × 2.2 m grid. Parent stock was crossed according to a partial diallel design of 52 parent trees assumed unrelated, and seedlings were planted out on the grids with unrestricted randomization. In 1997, the 2598 surviving trees were measured for various traits of breeding interest, including diameter (D) (measured at 130 cm above ground), total tree height (H), and branch angle (B). The branch angle was scored such that an average tree was given score 5, and the range 1–9 reflects desirable to undesirable branch angle (i.e., horizontal branches were given low scores and upward vertical branches high scores). The top panel in Figure 3 shows the locations of the surviving trees used in the subsequent analysis.

First row: Each point represents a scots pine tree location. The left column shows the interpolated surfaces of nonspatial model residuals for stem height (H), diameter (D), and branch angle (B), respectively from top to bottom. The middle and right columns **...**

As in Section 5.2, we fit both nonspatial and spatial models with random spatial effects estimated with varying knot intensities. Specifically, we fit a nonspatial model with a single intercept and additive and dominance effects for each trait. Then random spatial effects were added with knot intensities of 78, 124, and 160. An initial exploratory analysis with an omnidirectional semivariogram of the nonspatial additive and dominance model residuals showed no evidence of an anisotropic process for any of the response variables; therefore, we consider only the spatial isotropic models here.

The left column of Figure 3 shows clear spatial association among the residuals, which is further affirmed by the empirical semivariograms for the traits of interest (not shown). Specifically, in the left column of Figure 3, we see that D and H are strongly associated and have a relatively longer spatial range compared to the surface for B. Inspection of the semivariograms supports this observation, showing an effective range of ~50 m for D and H, while B's spatial range is less than 10 m. This disparity in effective range suggests that a separable model, with a single spatial range parameter, would be inappropriate. Further, the relatively short spatial range of B will require a knot configuration where the distance between knots is less than the residual spatial range. We again use the *lattice plus close pairs* design to help estimate small-scale variability and local dependence. Using the isotropic Matérn correlation function, the decay parameters *φ* were assigned *U*(0.03, 3) priors and the smoothness *ν* was fixed at 0.05. This implies an effective spatial range of about 1 to 100 meters. Again, assuming ignorance about the partitioning of the variance in **Y**, we assign an IW(*m* + 1, *I _{m}*) prior to each dispersion matrix with the exception of

For each model, three MCMC chains were run for 25,000 iterations and 45,000 post burn-in samples (15,000 × 3) were retained for posterior analysis. The parameter estimates for the nonspatial additive and dominance model as well as the spatial models based on 78 and 160 knots are provided in Table 3. Here, overlapping 95% credible intervals suggest agreement among the nonspatial and spatial models’ fixed and genetic dispersion parameters (i.e., ** β**,

Parameter credible intervals, 50% (2.5%, 97.5%) for the scots pine dataset nonspatial and spatial predictive process models

We compare the candidate models using Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002). With Ω denoting the generic set of parameters being estimated for each model, we compute
$\overline{D(\mathrm{\Omega})}={E}_{\mathrm{\Omega}\mid \mathbf{Y}}\{-2logL(\mathit{Data}\mid \mathrm{\Omega})\}$, the expected posterior deviance, where *L*(*Data*|Ω) is the first stage likelihood from the respective model and the effective number of parameters (penalty) is
${p}_{D}=\overline{D(\mathrm{\Omega})}-D(\stackrel{\sim}{\mathrm{\Omega}})$, where $\stackrel{\sim}{\mathrm{\Omega}}$ is the posterior mean of the model parameters. The DIC is then computed easily from the posterior samples as
$\overline{D(\mathrm{\Omega})}+{p}_{D}$ with lower values indicating better models.
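Given posterior deviance samples, the DIC computation reduces to a few lines (a schematic sketch; the deviance values below are made-up toy numbers, not results from the paper):

```python
import numpy as np

def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = Dbar + pD, where pD = Dbar - D(posterior mean)."""
    Dbar = np.mean(deviance_samples)                 # expected posterior deviance
    pD = Dbar - deviance_at_posterior_mean           # effective number of parameters
    return Dbar + pD, pD

# toy inputs: deviances evaluated at posterior draws, and at the posterior mean
dev = np.array([100.4, 101.2, 99.8, 100.6])
score, pD = dic(dev, 98.0)
assert np.isclose(pD, dev.mean() - 98.0)
assert np.isclose(score, 2 * dev.mean() - 98.0)      # DIC = 2*Dbar - D(mean)
```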

Table 4 provides DIC scores and associated *p _{D}* for the non-spatial and predictive process models. Here, we see a large decrease in DIC moving from the nonspatial to the spatial models. As the knot intensity increases the models become more responsive to local variation in the residual surface and, as a result, we see the DIC scores incrementally decrease.

From a breeding point of view, it is encouraging that the additive genetic correlations between branch angle (B) and growth traits (H and D) are small or negative, which means that faster growing trees develop more desirable branching characteristics (lower B scores denote more favorable branch angles). For instance, the 160 knot model estimates the median and 95% credible intervals *ρ _{a;H}*

Ultimately, interest turns to estimation and subsequent ranking of the breeding values (i.e., additive effects), and specifically to disparities in the ranking orders given by the different models. Figure 4 offers a trait-specific comparison between the rankings of the top 100 breeding values from the nonspatial and 160 knot predictive process models. Here, departures from the 1:1 line indicate different rankings. These dispersed scatters suggest the addition of the spatial effect substantially influences breeding preference.
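Rank disparities of this kind can be quantified with, e.g., a Spearman rank correlation and the top-100 overlap (a sketch on simulated breeding values, not the paper's data):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(9)
# hypothetical posterior-mean breeding values under the two models
bv_nonspatial = rng.normal(size=500)
bv_spatial = bv_nonspatial + 0.5 * rng.normal(size=500)  # perturbed ranking

rho, _ = spearmanr(bv_nonspatial, bv_spatial)   # overall rank agreement
top_ns = set(np.argsort(-bv_nonspatial)[:100])  # top-100 under each model
top_sp = set(np.argsort(-bv_spatial)[:100])
overlap = len(top_ns & top_sp)
assert -1.0 <= rho <= 1.0 and 0 <= overlap <= 100
```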

Comparative ranking of the additive genetic effects for scots pine height (left), diameter (middle), and branch angle (right) from the nonspatial and 160 knot predictive process models. The *x*-axis plots the nonspatial model estimated position of the 100 **...**

This analysis supports the idea that ignoring spatial variability in these models can lead to biased estimates of genetic variance parameters and ranking of breeding values, which could result in nonoptimal selection response and lower genetic gain (Magnussen 1993). Analysis of the scots pine data reveals that tree height and diameter are highly genetically correlated, have relatively low heritability, and display similar long distance effective spatial range. In contrast, branch angle seems to be more influenced by genetic components, as indicated by a higher heritability and shorter effective spatial range. Comparison between the nonspatial and spatial predictive process models also shows that the nonspatial model estimates lower heritability for all three traits, although the effect is less pronounced for height. Similar results have been found in other spatial genetic studies of trees. Zas (2006) reported that application of an iterative kriging method increased heritability in *Pinus pinaster* from a 95% confidence interval of 0.04–0.11 to 0.10–0.17 for diameter, and from 0.13–0.18 to 0.17–0.33 for height growth. In Cappa and Cantet (2007), the heritability of diameter at breast height in *Eucalyptus globulus* increased from 0.08 to 0.27 when a spatial spline component was included in the model. Moreover, both Zas (2006) and Cappa and Cantet (2007) showed a dramatic increase in accuracy of breeding values for spatial models, a trend that is likely also seen in the scots pine analysis presented here.

While the modeling of large spatial datasets has received much attention in the literature, estimation of variance components in reduced-rank kriging models has been explored less. Here we proposed computationally feasible multivariate spatial genetic trait models that robustly estimate variance components in datasets of sizes that would normally be out of reach because of prohibitive matrix computations. The process-based approach laid out here not only helps us identify a bias in the estimates of variance components that plagues most reduced-rank spline models, but also leads to a remedy that neither involves additional computational burden nor introduces any additional parameters. Future work may include extending predictive process models to space-time dynamic models and exploring variance components in temporal contexts (see, e.g., Pourahmadi 1999; Ghosh et al. 2009).

Our simulated and field analyses reveal that the spatial component demonstrably improves the model's ability to recover correct estimates of fixed and genetic random effects, as well as estimates of heritability and dominance ratios. The spBayes package in R (http://cran.r-project.org/web/packages/spBayes/) now includes functions for these bias-adjusted multivariate predictive process models. Functions specifically geared toward spatial quantitative genetics are being planned.

Finally, selecting the “knots” (Section 3) is an integral part of any reduced-rank spatial methodology. Since our field trials are conducted over a fairly regular configuration of plots, fixed knots on a regular grid provided robust inference. An arguably better strategy is to allow the knots to vary in their positions and/or their number. In particular, the knots can learn from the realizations, and while they will change with each MCMC iteration, the number of adaptive knots required to deliver robust inference will likely be far smaller. These will form a part of future explorations.

The authors thank the editor, the associate editor and two anonymous referees for valuable suggestions that greatly helped to improve the manuscript. The work by the first two authors was supported in part by National Science Foundation grant DMS-0706870. The first author’s work was also supported in part by NIH grant 1-R01-CA95995.

That the ratio
${\widehat{\sigma}}_{w1}^{2}/{\widehat{\sigma}}_{w2}^{2}$ is bounded can be seen as follows. Let $\mathbf{A}=\mathbf{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n}$ and $\mathbf{B}=\mathcal{R}{(\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathcal{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n}$, and write $\mathbf{B}={\mathbf{C}}^{\prime}\mathbf{C}$ for a nonsingular $\mathbf{C}$. Then, with $\mathbf{z}=\mathbf{Cu}$,

$$\underset{\mathbf{u}\ne \mathbf{0}}{max}\frac{{\mathbf{u}}^{\prime}\mathbf{Au}}{{\mathbf{u}}^{\prime}\mathbf{Bu}}=\underset{\mathbf{z}\ne \mathbf{0}}{max}\frac{{\mathbf{z}}^{\prime}{{\mathbf{C}}^{-1}}^{\prime}\mathbf{A}{\mathbf{C}}^{-1}\mathbf{z}}{{\mathbf{z}}^{\prime}\mathbf{z}}={\lambda}_{max}({{\mathbf{C}}^{-1}}^{\prime}\mathbf{A}{\mathbf{C}}^{-1}).$$

Note that the eigenvalues of **C**^{−1′}**AC**^{−1} are all real and are the same as those of **C**^{−1}**C**^{−1′}**A** = **B**^{−1}**A** (applying Theorem 21.10.1, pp. 545–546, of Harville 1997). This shows that
${\widehat{\sigma}}_{w1}^{2}/{\widehat{\sigma}}_{w2}^{2}\le {\lambda}_{max}((\mathbf{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n}){(\mathcal{R}{(\mathit{\phi})}^{\prime}{\mathbf{R}}^{\ast}{(\mathit{\phi})}^{-1}\mathcal{R}(\mathit{\phi})+{\gamma}^{2}{I}_{n})}^{-1})$.
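The generalized Rayleigh-quotient argument above is easy to verify numerically (an illustrative NumPy/SciPy sketch with random symmetric positive-definite matrices standing in for the **A** and **B** of this appendix):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
n = 20

def rand_spd(k):
    # random symmetric positive-definite matrix
    M = rng.normal(size=(k, k))
    return M @ M.T + k * np.eye(k)

A, B = rand_spd(n), rand_spd(n)

# generalized symmetric eigenproblem A u = lambda B u gives the quotient's maximum
lam_max = eigh(A, B, eigvals_only=True)[-1]

# no direction u can beat lambda_max
U = rng.normal(size=(n, 1000))
ratios = np.einsum('ij,ij->j', U, A @ U) / np.einsum('ij,ij->j', U, B @ U)
assert ratios.max() <= lam_max + 1e-10

# and lambda_max agrees with the largest (real) eigenvalue of B^{-1} A
assert np.allclose(lam_max, np.linalg.eigvals(np.linalg.solve(B, A)).real.max())
```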

The full conditional distributions are listed below:

$$\begin{array}{l}\mathit{\beta}\mid \cdot \sim \text{MVN}({\mathit{\mu}}_{\beta \mid \cdot},{\mathbf{\sum}}_{\beta \mid \cdot}),\quad \text{where}\\ {\mathbf{\sum}}_{\beta \mid \cdot}={[{\mathbf{\sum}}_{\beta}^{-1}+{\mathbf{X}}^{\prime}({I}_{n}\otimes {\mathbf{\Psi}}^{-1})\mathbf{X}]}^{-1}\quad \text{and}\\ {\mathit{\mu}}_{\beta \mid \cdot}={\mathbf{\sum}}_{\beta \mid \cdot}[{\mathbf{\sum}}_{\beta}^{-1}{\mathit{\mu}}_{\beta}+{\mathbf{X}}^{\prime}({I}_{n}\otimes {\mathbf{\Psi}}^{-1})(\mathbf{Y}-\mathbf{a}-\mathbf{d}-{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}})];\\ \mathbf{a}\mid \cdot \sim \text{MVN}({\mathit{\mu}}_{a\mid \cdot},{\mathbf{\sum}}_{a\mid \cdot}),\quad \text{where}\\ {\mathbf{\sum}}_{a\mid \cdot}={[{\mathbf{A}}^{-1}\otimes {\mathbf{\sum}}_{a}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}\quad \text{and}\\ {\mathit{\mu}}_{a\mid \cdot}={\mathbf{\sum}}_{a\mid \cdot}[({\mathbf{A}}^{-1}\otimes {\mathbf{\sum}}_{a}^{-1}){\mathit{\mu}}_{a}+({I}_{n}\otimes {\mathbf{\Psi}}^{-1})(\mathbf{Y}-\mathbf{X}\mathit{\beta}-\mathbf{d}-{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}})];\\ \mathbf{d}\mid \cdot \sim \text{MVN}({\mathit{\mu}}_{d\mid \cdot},{\mathbf{\sum}}_{d\mid \cdot}),\quad \text{where}\\ {\mathbf{\sum}}_{d\mid \cdot}={[{\mathbf{D}}^{-1}\otimes {\mathbf{\sum}}_{d}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}\quad \text{and}\\ {\mathit{\mu}}_{d\mid \cdot}={\mathbf{\sum}}_{d\mid \cdot}[({\mathbf{D}}^{-1}\otimes {\mathbf{\sum}}_{d}^{-1}){\mathit{\mu}}_{d}+({I}_{n}\otimes {\mathbf{\Psi}}^{-1})(\mathbf{Y}-\mathbf{X}\mathit{\beta}-\mathbf{a}-{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}})];\quad \text{and}\\ {\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \cdot \sim \text{MVN}({\mathit{\mu}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \cdot},{\mathbf{\sum}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \cdot}),\quad \text{where}\\ {\mathbf{\sum}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \cdot}={[{\mathbf{\sum}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}\quad \text{and}\\ {\mathit{\mu}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \cdot}={\mathbf{\sum}}_{{\stackrel{\sim}{\mathbf{w}}}_{\stackrel{\sim}{\epsilon}}\mid \cdot}({I}_{n}\otimes {\mathbf{\Psi}}^{-1})(\mathbf{Y}-\mathbf{X}\mathit{\beta}-\mathbf{a}-\mathbf{d}).\end{array}$$

For the full conditional distribution of the dispersion matrices, we have conjugate Wishart distributions. Assuming that **Σ*** _{a}* ~ IW(

$$p({\mathbf{\sum}}_{a})\propto \frac{1}{det{({\mathbf{\sum}}_{a})}^{{r}_{a}+(m+1)/2}}exp\left\{-\frac{1}{2}tr({\mathrm{\Upsilon}}_{a}^{-1}{\mathbf{\sum}}_{a}^{-1})\right\},$$

and **Σ*** _{d}* ~ IW(

$$\begin{array}{l}{\mathbf{\sum}}_{a}\mid \xb7\sim \text{IW}\left({r}_{a}+n,{\left[{\mathrm{\Upsilon}}_{a}^{-1}+\sum _{i=1}^{n}\sum _{j=1}^{n}{\stackrel{\sim}{a}}^{ij}{\mathbf{a}}_{j}{\mathbf{a}}_{i}^{\prime}\right]}^{-1}\right),\\ {\mathbf{\sum}}_{d}\mid \xb7\sim \text{IW}\left({r}_{d}+n,{\left[{\mathrm{\Upsilon}}_{d}^{-1}+\sum _{i=1}^{n}\sum _{j=1}^{n}{\stackrel{\sim}{d}}^{ij}{\mathbf{d}}_{j}{\mathbf{d}}_{i}^{\prime}\right]}^{-1}\right),\end{array}$$

where **A**^{−1} = {*ã ^{ij}*} and

$${\tau}_{j}^{2}\mid \cdot \sim \text{IG}\left({r}_{\epsilon j}+\frac{n}{2},{s}_{\epsilon j}+\frac{1}{2}\sum _{i=1}^{n}{({y}_{j}({\mathbf{s}}_{i})-{\mathbf{x}}_{j}{({\mathbf{s}}_{i})}^{\prime}\mathit{\beta}-{a}_{ij}-{d}_{ij}-{\stackrel{\sim}{w}}_{\stackrel{\sim}{\epsilon}j}({\mathbf{s}}_{i}))}^{2}\right)$$

for *j* = 1, …, *m*, where *a _{ij}*,

In the above expressions, we can simplify ${\mathbf{\sum}}_{a\mid \xb7}={[{\mathbf{A}}^{-1}\otimes {\mathbf{\sum}}_{a}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}$ as

$$\begin{array}{l}{\mathbf{\sum}}_{a\mid \xb7}={[{\mathbf{A}}^{-1}\otimes {\mathbf{\sum}}_{a}^{-1}+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}\\ ={[({\mathbf{P}}_{\mathbf{A}}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}})({\mathrm{\Lambda}}_{\mathbf{A}}^{-1}\otimes {\mathrm{\Lambda}}_{{\mathbf{\sum}}_{a}}^{-1})({\mathbf{P}}_{\mathbf{A}}^{\prime}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}}^{\prime})+{I}_{n}\otimes {\mathbf{\Psi}}^{-1}]}^{-1}\\ =({\mathbf{P}}_{\mathbf{A}}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}}){[{\mathrm{\Lambda}}_{\mathbf{A}}^{-1}\otimes {\mathrm{\Lambda}}_{{\mathbf{\sum}}_{a}}^{-1}+{I}_{n}\otimes ({\mathbf{P}}_{{\mathbf{\sum}}_{a}}^{\prime}{\mathbf{\Psi}}^{-1}{\mathbf{P}}_{{\mathbf{\sum}}_{a}})]}^{-1}\times ({\mathbf{P}}_{\mathbf{A}}^{\prime}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}}^{\prime})\\ =({\mathbf{P}}_{\mathbf{A}}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}})({I}_{n}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}}^{\prime})\left[\underset{i=1}{\overset{n}{\oplus}}{\left(\frac{1}{{\lambda}_{\mathbf{A}}^{(i)}}{\mathbf{\sum}}_{a}^{-1}+{\mathbf{\Psi}}^{-1}\right)}^{-1}\right]\times ({I}_{n}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}})({\mathbf{P}}_{\mathbf{A}}^{\prime}\otimes {\mathbf{P}}_{{\mathbf{\sum}}_{a}}^{\prime})\\ =({\mathbf{P}}_{\mathbf{A}}\otimes {I}_{m})\left[\underset{i=1}{\overset{n}{\oplus}}{\left(\frac{1}{{\lambda}_{\mathbf{A}}^{(i)}}{\mathbf{\sum}}_{a}^{-1}+{\mathbf{\Psi}}^{-1}\right)}^{-1}\right]({\mathbf{P}}_{\mathbf{A}}^{\prime}\otimes {I}_{m}),\end{array}$$

where
${\lambda}_{\mathbf{A}}^{(i)}$ is the *i*th diagonal element of Λ** _{A}**. Note that

Sudipto Banerjee, School of Public Health, University of Minnesota, Minneapolis, MN 55455.

Andrew O. Finley, Departments of Forestry and Geography, Michigan State University, East Lansing, MI 48824.

Patrik Waldmann, Thetastats, Uar-davgen, 91, SE-224 71 Lund, Sweden.

Tore Ericsson, Skogforsk (the Forestry Research Institute of Sweden) and Associate Professor, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences; Skogforsk, Box 3, SE-918 21 Sävar, Sweden.

- Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman & Hall/CRC Press; 2004.
- Banerjee S, Gelfand AE, Finley AO, Sang H. Gaussian Predictive Process Models for Large Spatial Datasets. Journal of the Royal Statistical Society, Ser B. 2008;70:825–848. [PMC free article] [PubMed]
- Cappa EP, Cantet RJC. Bayesian Estimation of a Surface to Account for a Spatial Trend Using Penalized Splines in an Individual-Tree Mixed Model. Canadian Journal of Forest Research. 2007;37:2677–2688.
- Crainiceanu CM, Diggle PJ, Rowlingson B. Bivariate Binomial Spatial Modeling of Loa Loa Prevalence in Tropical Africa (with discussion). Journal of the American Statistical Association. 2008;103:21–37.
- Cressie N. Statistics for Spatial Data. 2. New York: Wiley; 1993.
- Cressie N, Johannesson G. Fixed Rank Kriging for Very Large Spatial Data Sets. Journal of the Royal Statistical Society, Ser B. 2008;70:209–226.
- Cullis BR, Gogel B, Verbyla A, Thompson R. Spatial Analysis of Multi-Environment Early Generation Variety Trials. Biometrics. 1998;54:1–18.
- Daniels MJ, Kass RE. Nonconjugate Bayesian Estimation of Covariance Matrices and Its Use in Hierarchical Models. Journal of the American Statistical Association. 1999;94:1254–1263.
- Diggle P, Lophaven S. Bayesian Geostatistical Design. Scandinavian Journal of Statistics. 2006;33:53–64.
- Du J, Zhang H, Mandrekar VS. Fixed-Domain Asymptotic Properties of Tapered Maximum Likelihood Estimators. The Annals of Statistics. 2009;37:3330–3361.
- Dutkowski GW, Silva JC, Gilmour AR, Lopez GA. Spatial Analysis Methods for Forest Genetic Trials. Canadian Journal of Forest Research. 2002;32:2201–2214.
- Finley AO, Banerjee S, Waldmann P, Ericsonn T. Hierarchical Spatial Modeling of Additive and Dominance Genetic Variance for Large Spatial Trial Datasets. Biometrics. 2009a;61:441–451. [PMC free article] [PubMed]
- Finley AO, Sang H, Banerjee S, Gelfand AE. Improving the Performance of Predictive Process Modeling for Large Datasets. Computational Statistics and Data Analysis. 2009b;53:2873–2884. [PMC free article] [PubMed]
- Fuentes M. Periodogram and Other Spectral Methods for Nonstationary Spatial Processes. Biometrika. 2002;89:197–210.
- Fuentes M. Approximate Likelihood for Large Irregularly Spaced Spatial Data. Journal of the American Statistical Association. 2007;102:321–331. [PMC free article] [PubMed]
- Furrer R, Genton MG, Nychka D. Covariance Tapering for Interpolation of Large Spatial Datasets. Journal of Computational and Graphical Statistics. 2006;15:502–523.
- Gaspari G, Cohn SE. Construction of Correlation Functions in Two and Three Dimensions. The Quarterly Journal of the Royal Meteorological Society. 1999;125:723–757.
- Gelfand AE, Schmidt AM, Banerjee S, Sirmans CF. Nonstationary Multivariate Process Modeling Through Spatially Varying Coregionalization (with discussion). Test. 2004;13:263–312.
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2. Boca Raton, FL: Chapman & Hall/CRC Press; 2004.
- Ghosh SK, Lee H, Davis J, Bhave P. Spatio-Temporal Analysis of Total Nitrate Concentrations Using Dynamic Statistical Models. Technical report. North Carolina State University; 2009.
- Harville DA. Matrix Algebra From a Statistician’s Perspective. New York: Springer; 1997.
- Henderson CR, Quaas RL. Multiple Trait Evaluation Using Relatives Records. Journal of Animal Science. 1976;43:1188–1197.
- Henderson HV, Searle SR. On Deriving the Inverse of a Sum of Matrices. SIAM Review. 1981;23:53–60.
- Higdon D. Space and Space-Time Modeling Using Process Convolutions. In: Anderson C, Barnett V, Chatwin PC, El-Shaarawi AH, editors. Quantitative Methods for Current Environmental Issues. New York: Springer-Verlag; 2002. pp. 37–56.
- Jin X, Banerjee S, Carlin BP. Order-Free Coregionalized Lattice Models With Application to Multiple Disease Mapping. Journal of the Royal Statistical Society, Ser B. 2007;69:817–838. [PMC free article] [PubMed]
- Kamman EE, Wand MP. Geoadditive Models. Applied Statistics. 2003;52:1–18.
- Kaufman CG, Schervish MJ, Nychka DW. Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets. Journal of the American Statistical Association. 2008;103:1545–1555.
- Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley; 1990.
- Kempthorne O, Curnow RN. The Partial Diallel Cross. Biometrics. 1961;17:229–250.
- Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Assoc; 1998.
- Magnussen S. Bias in Genetic Variance Estimates Due to Spatial Autocorrelation. Theoretical and Applied Genetics. 1993;86:349–355. [PubMed]
- Majumdar A, Gelfand AE. Multivariate Spatial Modeling for Geostatistical Data Using Convolved Covariance Functions. Mathematical Geology. 2007;39:225–245.
- Paciorek CJ, Schervish MJ. Spatial Modelling Using a New Class of Nonstationary Covariance Functions. Environmetrics. 2006;17:483–506. [PMC free article] [PubMed]
- Pourahmadi M. Joint Mean-Covariance Model With Applications to Longitudinal Data: Unconstrained Parameterisation. Biometrika. 1999;86:677–690.
- Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. Cambridge, MA: The MIT Press; 2006.
- Reich BJ, Fuentes M. A Multivariate Nonparametric Bayesian Spatial Framework for Hurricane Surface Wind Fields. The Annals of Applied Statistics. 2007;1:249–264.
- Royle JA, Berliner LM. A Hierarchical Approach to Multivariate Spatial Modeling and Prediction. Journal of Agricultural Biological and Environmental Statistics. 1999;4:29–56.
- Royle JA, Nychka D. An Algorithm for the Construction of Spatial Coverage Designs With Implementation in SPLUS. Computers and Geosciences. 1998;24:479–488.
- Rue H, Held L. Gaussian Markov Random Fields: Theory and Applications. Boca Raton, FL: Chapman & Hall/CRC Press; 2006.
- Rue H, Martino S, Chopin N. Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations (with discussion). Journal of the Royal Statistical Society, Ser B. 2009;71:1–35.
- Ruppert D, Wand MP, Caroll RJ. Semiparametric Regression. Cambridge, U.K: Cambridge University Press; 2003.
- Sorensen D, Gianola D. Likelihood, Bayesian and MCMC Methods in Quantitative Genetics. New York: Springer-Verlag; 2002.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian Measures of Model Complexity and Fit (with discussion). Journal of the Royal Statistical Society, Ser B. 2002;64:583–639.
- Stein ML. Interpolation of Spatial Data: Some Theory of Kriging. New York: Springer; 1999.
- Stein ML. Spatial Variation of Total Column Ozone on a Global Scale. The Annals of Applied Statistics. 2007;1:191–210.
- Stein ML. A Modeling Approach for Large Spatial Datasets. Journal of the Korean Statistical Society. 2008;37:3–10.
- Stein ML, Chi Z, Welty LJ. Approximating Likelihoods for Large Spatial Datasets. Journal of the Royal Statistical Society, Ser B. 2004;66:275–296.
- Vecchia AV. Estimation and Model Identification for Continuous Spatial Processes. Journal of the Royal Statistical Society, Ser B. 1988;50:297–312.
- Ver Hoef JM, Cressie NAC, Barry RP. Flexible Spatial Models Based on the Fast Fourier Transform (FFT) for Cokriging. Journal of Computational and Graphical Statistics. 2004;13:265–282.
- Wackernagel H. Multivariate Geostatistics: An Introduction With Applications. 3. New York: Springer-Verlag; 2006.
- Waldmann P, Ericsson T. Comparison of REML and Gibbs Sampling Estimates of Multi-Trait Genetic Parameters in Scots Pine. Theoretical and Applied Genetics. 2006;112:1441–1451. [PubMed]
- Yaglom AM. Correlation Theory of Stationary and Related Random Functions. I. New York: Springer-Verlag; 1987.
- Zas R. Iterative Kriging for Removing Spatial Autocorrelation in Analysis of Forest Genetic Trials. Tree Genetics and Genomes. 2006;2:177–185.
- Zhang H. Inconsistent Estimation and Asymptotically Equal Interpolations in Model-Based Geostatistics. Journal of the American Statistical Association. 2004;99:250–261.
- Zhang H. Maximum-Likelihood Estimation for Multivariate Spatial Linear Coregionalization Models. Environmetrics. 2007;18:125–139.
- Zimmerman DL, Harville DA. A Random Field Approach to the Analysis of Field-Plot Experiments and Other Spatial Experiments. Biometrics. 1991;47:223–239.
