Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2748961

Formats

Article sections

- Summary
- 1 Introduction
- 2 Reparameterizations of a Correlation Matrix
- 3 Priors for R via the partial autocorrelations
- 4 Partial correlations in the behavior and social sciences
- 5 Discussion
- References

Authors

Related links

J Multivar Anal. Author manuscript; available in PMC 2010 November 1.

Published in final edited form as:

J Multivar Anal. 2009 November 1; 100(10): 2352–2363.

doi: 10.1016/j.jmva.2009.04.015PMCID: PMC2748961

NIHMSID: NIHMS115017

M.J. Daniels, Department of Statistics, University of Florida;

M.J. Daniels: ude.lfu.tats@sleinadm; M. Pourahmadi: ude.uin.htam@mharuop

See other articles in PMC that cite the published article.

We study the role of partial autocorrelations in the reparameterization and parsimonious modeling of a covariance matrix. The work is motivated by and tries to mimic the phenomenal success of the partial autocorrelations function (PACF) in model formulation, removing the positive-definiteness constraint on the autocorrelation function of a stationary time series and in reparameterizing the stationarity-invertibility domain of ARMA models. It turns out that once an order is fixed among the variables of a general random vector, then the above properties continue to hold and follows from establishing a one-to-one correspondence between a correlation matrix and its associated matrix of partial autocorrelations. Connections between the latter and the parameters of the modified Cholesky decomposition of a covariance matrix are discussed. Graphical tools similar to partial correlograms for model formulation and various priors based on the partial autocorrelations are proposed. We develop frequentist/Bayesian procedures for modelling correlation matrices, illustrate them using a real dataset, and explore their properties via simulations.

Positive-definiteness and high-dimensionality are two major obstacles in modeling the *p* × *p* covariance matrix Σ of a random vector *Y* = (*Y*_{1}, *Y _{p}*)′. These can partially be alleviated using various decompositions which in increasing order of effectiveness are the variance-correlation (Barnard et al., 2000), spectral (Yang and Berger, 1994;) and Cholesky (Pourahmadi, 1999; Chen and Dunson, 2003) decompositions. Only the latter has the unique distinction of providing an unconstrained and statistically interpretable reparameterization of a covariance matrix, but at the expense of imposing an order among the entries of

We present yet another unconstrained and statistically interpretable reparameterization of Σ using the notion of partial autocorrelation from time series analysis (Box et al., 1994; Pourahmadi, 2001, Chap. 7), which, like the Cholesky decomposition also imposes an order among the entries of *Y*; this reparamaterization is also ideal for models that directly include correlation matrices, instead of covariance matrices, including multivariate probit models (Chib and Greenberg, 1998) and copulas (Pitt et al., 2006). For covariance matrices, we start with the decomposition Σ = *DRD* or the variance-correlation strategy (Barnard et al, 2000) and reduce the problem to and focus on reparameterizing a correlation matrix *R =* (*ρ _{ij}*) in terms of a simpler symmetric matrix Π = (

Compared to the long history of the use of the PACF in time series analysis (Quenouille, 1949; Daniels, 1956; Barndorff-Nielson and Schou, 1973; Ramsey, 1974; Jones, 1980; Jones 1987), research on establishing a one-to-one correspondence between a general covariance matrix, its PACF and connecting the latter to the entries of the Cholesky factor of the former has a rather short history. An early work in the Bayesian context is due to Eaves and Chang (1992), followed by Zimmerman (2000) and Pourahmadi (1999; 2001, p.102) and Daniels and Pourahmadi (2002) for longitudinal data, and Dégerine and Lambert-Lacroix (2003) for the time series setup. For a general random vector, Kurowicka and Cooke (2003, 2006) and Joe (2006) have relied on graph-theoretical and standard multivariate techniques, respectively. The origins of a fundamental determinantal identity involving the PACF, unearthed recently by these three groups of researchers can be traced to a notable and somewhat neglected paper of Yule (1907, equ. 25) and in the literature of time series in connection with the Levinson-Durbin type algorithms. It plays a central role in Joe's (2006) method of generating random correlation matrices with distributions independent of the order of the indices, and we use it effectively in introducing priors for the Bayesian analysis of correlation matrices. For a similar application in time series analysis, see Jones (1987).

Correlation matrices themselves are accompanied by additional challenges. The constraint of diagonal elements fixed (at one) complicates both reparameterizations, decompositions and computations. Other than the partial correlation parameterization proposed here, there are no unconstrained parameterizations currently in the statistical literature for a correlation matrix. In addition, recent advances in Bayesian computations for correlation matrices (i.e., sampling with Markov chain Monte Carlo algorithms) rely on augmenting the correlation matrix with a diagonal scale matrix to create a covariance matrix (i.e., parameter expansion algorithms). The strategy is to then sample the inverse of this covariance matrix from a Wishart distribution and then transform back to the correlation matrix; see, e.g., Liu (2001) and Liu and Daniels (2006). However, these approaches do not easily extend to structured correlation matrices (as we will discuss here).

The outline of the paper is as follows. In Section 2, we review the recent results in reparameterizing a correlation matrix via PACF and the Cholesky decomposition. We use the latter to derive a remarkable identity expressing determinant of *R* as a simple function of the partial au-tocorrelations. This identity obtained by Dégerine and Lambert-Lacroix (2003), Joe (2006) and Kurowicka and Cooke (2006), plays a fundamental role in introducing prior distributions for the correlation matrix *R* which is independent of the order of indices used in defining the PACF. The role of a generalized partial correlogram in formulating parsimonious models for *R* is discussed and illustrated using Kenward's (1987) cattle data. In Section 3, we introduce new priors for correlation matrices based on this parameterization, examine their properties and relation to other priors that have appeared in the literature (Barnard, et al., 2000), present a simple approach to sample from the posterior distribution of a correlation matrix, and do some simulations to examine the behavior of these new priors. Section 4 provides guidance on use of these models and tools in applications in behavior and social sciences. Section 5 wraps up and provides directions for future work.

Modeling correlation matrices and simulating random or “typical” correlation matrices are of central importance in various areas of statistics (Chib and Greenberg, 1998; Barnard et al. 2000; Pan and Mackenzie, 2003; Pitt, Chan, and Kohn, 2006), engineering and signal processing (Holmes, 1991), social and behavior sciences (Liu, Daniels, and Marcus, 2009), finance (Engle, 2002) and numerical analysis (Davies and Higham, 2000). An obstacle in dealing with a correlation matrix *R* is that all its diagonal entries are the same and equal to one.

In this section, first we reparameterize correlation/covariance matrices of a general random vector *Y* = (*Y*_{1}, , *Y _{p}*)′ in terms of the

The notion of PACF is known to be indispensable in the study of stationary processes and situations dealing with Toeplitz matrices such as the Szegö's orthogonal polynomials, trigonometric moment problems, geophysics, digital signal processing and filtering (see Landau, 1987; Pourahmadi, 2001), identification of ARMA models, the maximum likelihood estimation of their parameters (Jones, 1980) and simulating a random or “typical” ARMA model (Jones, 1987). The one-to-one correspondence between the stationary autocorrelation functions {*ρ _{k}*} and their PACF {

We parameterize a (non-Toeplitz) correlation matrix *R =* (*ρ _{ij}*) in terms of the lag-1 correlations

Following Joe (2006), for *j* = 1, , *p*−*k*, *k* = 1, , *p*−1, let
${r}_{1}^{\prime}\left(j,k\right)=({\rho}_{j,j+1},\dots ,{\rho}_{j,j+k-1}),{r}_{3}^{\prime}\left(j,k\right)=({\rho}_{j+k,j+1},\dots ,{\rho}_{j+k,j+k-1})$, and *R*_{2}(*j,k*) be the correlation matrix corresponding to the components (*j* + 1,…, *j + k−*1). Then, the partial autocorrelations between *Y _{j}* and

$${\pi}_{j,j+k}=\frac{{\rho}_{j,j+k}-{r}_{1}^{\prime}\left(j,k\right){R}_{2}{\left(j,k\right)}^{-1}{r}_{3}\left(j,k\right)}{{\left[1-{r}_{1}^{\prime}\left(j,k\right){R}_{2}{\left(j,k\right)}^{-1}{r}_{1}\left(j,k\right)\right]}^{1/2}{\left[1-{r}_{3}^{\prime}\left(j,k\right){R}_{2}{\left(j,k\right)}^{-1}{r}_{3}\left(j,k\right)\right]}^{1/2}}\cdot $$

(1)

In what follows and in analogy with *R*, it is convenient to arrange these partial autocorrelations in a matrix Π where its (*j,j + k*)th entry *π _{j}*

$${\rho}_{j,j+k}={r}_{1}^{\prime}\left(j,k\right){R}_{2}{\left(j,k\right)}^{-1}{r}_{3}\left(j,k\right)+{D}_{jk}\phantom{\rule{0.2em}{0ex}}{\pi}_{j,j+k},$$

(2)

where *D _{jk}* is the denominator of the expression in (1). Then, the formulae (1)-(2) clearly establish a one-to-one correspondence between the matrices

Evidently, when *R* is a stationary (Toeplitz) correlation matrix, then *π _{j}*

Moreover, for a stationary correlation matrix, *R* reduces precisely to the celebrated Levinson-Durbin formula (Pourahmadi, 2001, Theorem 7.3) for computing the PACF recursively.

Next, we present an alternative reparameterization of a covariance matrix via its Cholesky decomposition or the idea of autoregression for the underlying random vector.

Consider a mean-zero random vector *Y* with the positive-definite covariance matrix Σ = (*σ _{st}*). For1

$${Y}_{t}=\sum _{j=1}^{t-1}{\phi}_{tj}{Y}_{j}+{\epsilon}_{t},t=1,\cdots ,p.$$

(3)

Let *ε* = (*ε*_{1}, , *ε _{p}*)′ be the vector of successive uncorrelated prediction errors with
$\mathit{\text{Cov}}(\epsilon )=\mathit{\text{diag}}\left({\sigma}_{1}^{2},\cdots ,{\sigma}_{p}^{2}\right)\phantom{\rule{0.2em}{0ex}}=\phantom{\rule{0.2em}{0ex}}D$. Then, (3) rewritten in matrix form becomes

Computing covariances using *ε = TY*, it follows that

$$T\sum {T}^{\prime}=D,\left|\sum \right|=\prod _{t=1}^{p}{\sigma}_{t}^{2}.$$

(4)

The first factorization in (4), called the modified Cholesky decomposition of Σ, makes it possible to swap the *p(p +* 1)/2 constrained parameters of Σ with the unconstrained set of parameters * _{tj}* and log
${\sigma}_{t}^{2}$ of the same cardinality. In view of the similarity of (3) to a sequence of varying order autoregressions, we refer to the parameters

It should be noted that imposing structures on Σ will certainly lead to constraints on *T* and *D* in (4). For example, a correlation matrix *R* with 1's as its diagonal entries is structured with possibly *p(p −* 1)/2 distinct parameters. In this case, certain entries of *T* and *D* are either known, redundant or constrained. In fact, it is easy to see that the diagonal entries of the matrix *D* for a correlation matrix are monotone decreasing with
${\sigma}_{1}^{2}=1$ For this reason and others, it seems more prudent to rely on the ordered partial correlations when reparameterizing a correlation matrix *R* as in Section 2.1, than using its Cholesky decomposition.

First, we study the role of partial autocorrelations in measuring the reduction in prediction error variance when a variable is added to the set of predictors in a regression model. Using this and the second identity in (4) we obtain a fundamental determinantal identity expressing |Σ| in terms of the partial autocorrelations and diagonal entries of Σ. Joe (2006) and Kurowicka and Cooke (2006) had obtained this identity using determinantal recursions and graph-theoretical methods based on (1), respectively. An earlier and a slightly more general determinantal identity for covariance matrices in the context of nonstationary processes was given by Dégerine and Lambert-Lacroix (2003, p.54), using an analogue of the Levinson-Durbin algorithm.

For *u* and *v* two distinct integers in {1, 2, , *p*}, let *L* be a subset of {1, 2, , *p*}*\*{*u, v*} and *π _{uv|L}* stand for the partial correlation between

**Lemma 1.** Let *Y* = (*Y*_{1}, , *Y*_{p})′ be a mean-zero random vector with a positive-definite covariance matrix Σ. Then, with *u*,*v* and *L* as above, we have

- $${\widehat{Y}}_{u|Lv}={\widehat{Y}}_{u|L}+{\alpha}_{uv\phantom{\rule{0.2em}{0ex}}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)\phantom{\rule{0.2em}{0ex}},{\alpha}_{uv}={\pi}_{uv|L}\sqrt{\frac{\mathit{\text{Var}}\left({Y}_{u}-{\widehat{Y}}_{u|L}\right)}{\mathit{\text{Var}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)}}.$$(5)
- $$\mathit{\text{Var}}\phantom{\rule{0.2em}{0ex}}\left({Y}_{u}-{\widehat{Y}}_{u|Lv}\right)=\left(1-{\pi}_{uv|L}^{2}\right)\phantom{\rule{0.2em}{0ex}}\mathit{\text{Var}}\phantom{\rule{0.2em}{0ex}}\left({Y}_{u}-{\widehat{Y}}_{u|L}\right).$$(6)

**Proof.** (a) Let *sp*{*Y _{u}; uεL*} stand for the linear subspace generated by the indicated random variables. Since

$$sp\left\{{Y}_{u};u\epsilon Lv\right\}\phantom{\rule{0.2em}{0ex}}=sp\left\{{Y}_{u};u\epsilon L\right\}\oplus sp\left\{{Y}_{v}-{\widehat{Y}}_{v|L}\right\},$$

and from the linearity of the orthogonal projection we have

$${\widehat{Y}}_{u|Lv}={\widehat{Y}}_{u|L}+{\alpha}_{uv}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right),$$

where *α _{uv}*, the regression coefficient of

$${\alpha}_{uv}=\frac{\mathit{\text{Cov}}\left({Y}_{u},{Y}_{v}-{\widehat{Y}}_{v|L}\right)}{\mathit{\text{Var}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)}.$$

Since *Ŷ _{u|L}ε sp*{

$$\begin{array}{l}{\alpha}_{uv}=\frac{\mathit{\text{Cov}}\left({Y}_{u}-{\widehat{Y}}_{u|L},{Y}_{v}-{\widehat{Y}}_{v|L}\right)}{\sqrt{\mathit{\text{Var}}\left({Y}_{u}-{\widehat{Y}}_{u|L}\right)\mathit{\text{Var}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)}}\cdot \sqrt{\frac{\mathit{\text{Var}}\left({Y}_{u}-{\widehat{Y}}_{u|L}\right)}{\mathit{\text{Var}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)}}\\ \phantom{\rule{1.7em}{0ex}}={\pi}_{uv|L}\phantom{\rule{0.2em}{0ex}}\sqrt{\frac{\mathit{\text{Var}}\left({Y}_{u}-{\widehat{Y}}_{u|L}\right)}{\mathit{\text{Var}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)}}.\end{array}$$

(b) From the first identity in (5), it is immediate that

$${Y}_{u}-{\widehat{Y}}_{u|Lv}={Y}_{u}-{\widehat{Y}}_{u|L}-{\alpha}_{uv\phantom{\rule{0.2em}{0ex}}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right),$$

or

$${Y}_{u}-{\widehat{Y}}_{u|Lv}+{\alpha}_{uv\phantom{\rule{0.1em}{0ex}}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right)={Y}_{u}-{\widehat{Y}}_{u|L}.$$

(7)

Computing variances of both sides of (7) and using the fact that *Y _{u}* −

$$\mathit{\text{Var}}\phantom{\rule{0.1em}{0ex}}\left({Y}_{u}-{\widehat{Y}}_{u|Lv}\right)=\mathit{\text{Var}}\phantom{\rule{0.1em}{0ex}}\left({Y}_{u}-{\widehat{Y}}_{u|L}\right)-{\alpha}_{uv}^{2}\mathit{\text{Var}}\phantom{\rule{0.1em}{0ex}}\left({Y}_{v}-{\widehat{Y}}_{v|L}\right).$$

Now, substituting for *α _{uv}* from (a), the desired result follows.

Next, we use the recursion (6) to express the innovation variances or the diagonal entries of *D* in terms of the partial correlations or the entries of *π*. Similar expressions for * _{tj}*'s, the entries of

**Theorem 1.** Let *Y* = (*Y*_{1}, , *Y*_{p}) be a mean-zero random vector with a positive-definite covariance matrix Σ which can be decomposed as in (4).

- Then, for
*t*= 2, ,*p*;*j*= 1, ,*t*− 1, we have$${\sigma}_{t}^{2}={\sigma}_{tt}\prod _{j=1}^{t-1}\left(1-{\pi}_{jt}^{2}\right),$$ - $$\left|\sum \right|=\left(\prod _{t=1}^{p}{\sigma}_{tt}\right)\prod _{t=2}^{p}\prod _{j=1}^{t-1}\left(1-{\pi}_{jt}^{2}\right).$$
- For
*t*= 2, ,*p; L*= {2, ,*t*− 1}$${\phi}_{t1}={\pi}_{1t}\sqrt{\frac{\mathit{\text{Var}}\left({Y}_{t}-{\widehat{Y}}_{t|L}\right)}{\mathit{\text{Var}}\left({Y}_{1}-{\widehat{Y}}_{1|L}\right)}}\cdot $$ =_{tj}−_{tj|L}_{t1}_{1,t−j|L}, for*j*= 2, ,*t*−1,

where * _{tj|L}* and

$${\widehat{Y}}_{t|L}=\sum _{j=2}^{t-1}{\phi}_{tj|L\phantom{\rule{0.2em}{0ex}}}{Y}_{j},{\widehat{Y}}_{1|L}=\sum _{j=2}^{t-1}{\phi}_{t,t-j}{Y}_{j}\cdot $$

**Proof** (a) follows from the repeated use of (6) with *u = t* and *v = j,j =* 1, , *t* −1. (b) follows from (4) and (a).

Part (c) proved first in Pourahmadi (2001, p.102) shows that only the entries of the first column of *T* are multiples of the ordered partial correlations appearing in the first column of *π*. However, for *j* > 1, since * _{tj}* is a multiple of the partial correlation between

Parsimonious modeling of the GARP of the modified Cholesky decomposition often relies on exploring for structure as a function of lag; for example, fitting a polynomial to the GARP as a function of lag (Pourahmadi, 1999). For such models, the GARP, in a sense, have different interpretations within lag; i.e., the lag 1 coefficient from the regression of *Y*_{3} on (*Y*_{2}, *Y*_{1}) is the *Y*_{2} coefficient, when another variable, *Y* is also in the model; however, for the regression of *Y*_{2} on *Y*_{1}, the lag 1 coefficient is the Y_{1} coefficient with no other variable in the model. So, for a given lag *k*, the lag *k* coefficients all come from conditional regressions where the number of variables conditioned on are different. However, by construction, the lag *k* partial autocorrelations are all based on conditional regressions where the number of conditioning variables are the same, always conditioning on *k −* 1 intervening variables. This will facilitate building models for the partial autocorrelations as a function of lag. We discuss such model building in the next section.

In this section, we use the generalized partial correlogram, i.e. the plot of {*π _{j}*

Note that the partial autocorrelations *π _{j}*

To illustrate the capabilities of the generalized correlograms in revealing patterns, we use the cattle data (Kenward, 1987) which consists of *p =* 11 bi-weekly measurements of the weights of *n =* 30 cows. Table 1, displays the sample (partial) correlations for the cattle data in the lower (upper) triangular segment and the sample variances are along the main diagonal. It reveals several interesting features of the dependence in the data that the commonly used profile plot of the data cannot discern. For example, note that all the correlations are positive, they decrease monotonically within the columns (time-separation), they are not constant (nonstationary) within each subdiagonal. In fact, they tend to increase over time (learning effect). Furthermore, the partial autocorrelations of lags 2 or more are insignificant except for the entries 0.56 and 0.35.

Cattle Data. Sample correlations (below the main diagonal), sample PACF (above the main diagonal) and sample variances (along the main diagonal).

Figure 1 presents the generalized correlograms corresponding to the sample correlation matrix of the data, the full partial correlations, the generalized partial correlogram and the Fisher's *z* transform of the PACF. Note that the first two correlograms suggest linear and quadratic patterns in the lag *k*, but in fitting such models one has to be mindful of the constraints on the coefficients so that the corresponding fitted correlation matrices are positive definite. Details of fitting such models and the ensuing numerical results can be found in Pourahmadi (2001). The generalized partial correlogram in (c) reveals a cubic polynomial in the lags, i.e. *π _{j}*

(a) Generalized sample correlogram for the cattle data, (b) Generalized inverse correlogram, (c) Generalized partial correlogram, (d) Plot of Fisher *z* transform of the PACF.

In addition to the advantages for formulating parsimonious models, the unconstrainedness of the PACF suggests some approaches for constructing priors for *R* using independent linearly transformed Beta priors on (−1,1) for the PACF.

Given that each partial autocorrelation is free to vary in the interval (−1,1), we may construct priors for *R* derived from independent priors on the PACF. For example, independent Unif[-1,1] priors (IU priors) on the partial autocorrelations (Jones, 1987; Joe, 2006) can be shown to induce the following prior on the correlation matrix *R*:

$$p(R)\propto [\prod _{k=1}^{p-1}\prod _{j=1}^{p-k}{(1-{\pi}_{j,j+k}^{2})}^{p-1-k}{]}^{-1/2}.$$

(8)

One can express the prior in (8) in terms of the marginal correlations, *ρ _{j,j+k}* by plugging in for

Marginal priors on *ρ*_{jk} from independent uniform priors on the partial correlations, *π*_{jk}. The subplots are arranged as the matrix *R*.

It is also evident from Figure 2 that the priors for *ρ _{j}*

**Theorem 2.** The partial autocorrelations, *π _{j,k}* have independent “stationary” priors, i.e

$$p\left({\pi}_{jk}\right)=p\left({\pi}_{il}\right)\phantom{\rule{0.2em}{0ex}}if\left|j-k\right|=\left|i-l\right|,$$

(9)

(or the priors are the same along the subdiagonals of Π), if and only if the marginal priors on the correlations *ρ _{jk}* are also “stationary”, i.e.

$$p\left({\rho}_{jk}\right)=p\left({\rho}_{il}\right)\phantom{\rule{0.2em}{0ex}}if\left|j-k\right|=\left|i-l\right|.$$

(10)

(or the priors are the same along the corresponding subdiagonal of *R*).

This implies that “stationary” priors on Π induce “stationary” priors on the marginal correlations and vice versa. Most priors we introduce here satisfy this property.

More generally independent linearly transformed Beta priors on the interval (-1,1) for partial correlations are a convenient and flexible way to specify a prior for a correlation matrix *R*. These priors, denoted by *Beta*(*α*,γ), have the density

$$p\left(\rho \right)=\frac{1}{2\beta \left(\alpha ,\gamma \right)}{(\frac{1+\rho}{2})}^{\alpha -1}{(\frac{1-\rho}{2})}^{\gamma -1}.$$

(11)

Interestingly, the uniform prior on the correlation matrix (Barnard et al, 2000) corresponds to the following stationary Beta priors on the partial correlations:

$${\pi}_{i,i+k}\sim \mathit{\text{Beta}}\left({\alpha}_{k},{\alpha}_{k}\right),$$

(12)

where
${\alpha}_{k}=1+\frac{1}{2}\left(p-1-k\right)$; see Joe (2006). We will refer to this prior as Barnard Beta (BB). As noted in Barnard et al. (2000), such a prior on *R* results in the marginal priors for each of the marginal correlations being somewhat peaked around zero (same peakedness for *all ρ _{jk}*). Also note that the priors become more peaked as p grows.

In general, priors for the correlation matrix proportional to powers of the determinant of the correlation matrix,

$$p\left(R\right)\propto {\left|R\right|}^{{\alpha}_{p}-1}$$

(13)

are constructed by setting
${\alpha}_{k}={\alpha}_{p}+\frac{1}{2}\left(p-1-k\right)$ in (12) Priors so constructed are proper, so improper priors like Jeffreys' for a correlation matrix in a multivariate normal model, *π* (*R*) = |*R*|^{−(}^{p}^{+1)/2}, are not special cases.

The IU priors on the PACF induce desirable behavior for longitudinal (ordered data) by ‘shrinking’ higher lag correlations toward zero. The Beta priors in (12), which induce a uniform prior for *R* (BB priors) place a uniform (-1,1) prior on the lag *p −* 1 partial autocorrelations and shrink the other partial autocorrelations toward zero with the amount of shrinkage being *inversely* proportional to lag. This induces the desired behavior on the marginal correlations, making their marginal priors equivalent, but it is counter-intuitive for ordered/longitudinal data with serial correlation; in addition, the shrinkage of the lag one partial autocorrelations for the BB prior increases with *p* (recall the form in (12)). In such data, we would expect lower lag correlations to be less likely to be zero and higher lag correlations to be more likely to be zero. Thus, the independent uniform priors are likely a good default choice for the partial autocorrelations in terms of inducing desirable behavior on the marginal correlations and not counter-intuitively shrinking the partial autocorrelations. We explore this shrinkage behavior further via some simulations in Section 3.5.

In addition, we expect many of the higher lag partial correlations to be close to zero for longitudinal data with serial correlation via conditional independence (see Table 1). To account for this, we could (aggressively) shrink the partial autocorrelations toward zero (with the shrinkage increasing with lag) using shrinkage priors similar to those proposed in Daniels and Pourahmadi (2002) and Daniels and Kass (2001) or by creating such priors based on the Beta distributions proposed here. We are currently exploring this.

Other priors for *R* have been proposed in the literature which cannot be derived based on independent priors on the partial autocorrelations. For example, the prior on *R* that induces marginal uniform (-1,1) priors on the *ρ _{ij}'s* (Barnard et al., 2000) has the form:

$$p(R)=\phantom{\rule{0.2em}{0ex}}|R{|}^{p(p-1)/2-1}\prod _{i=1}^{p}\phantom{\rule{0.2em}{0ex}}|R[-i,-i]{|}^{-\left(p+1\right)/2}$$

(14)

where *R*[−*i*,−*i*] is the submatrix of *R* with the *i*th row and column removed and
$|R\left[-i,-i\right]={\left[R\right]}_{ii}^{-1}\left|R\right|$. Such a prior might not be a preferred one for longitudinal (ordered) data where the same marginal priors on all correlations (irrespective of lag) may not be the best default choice.

Eaves and Chang (1992) derived some related reference priors for the set of partial correlations, *π*_{1},*j* for *j* = 2,…, *p*; however, their priors are not natural for longitudinal data. Chib and Greenberg (1998) specified a truncated multivariate normal distribution on the marginal correlations. Liechty et al. (2004) placed normal distributions on the marginal correlations with the goals of grouping the marginal correlations into clusters. The latter two priors along with those in Wong et al. (2003) for the full partial correlations (*p ^{ij}*) are highly constrained given that they model the marginal correlations directly.

An additional issue with modeling the correlation matrix is computational. Our development here will focus on cases without covariates in the correlation matrix (this will be left for future work) under the class of independent priors on the PACF discussed in Section 3.1. The proposal here might be viewed as an alternative to the PX-RPMH algorithm in Liu and Daniels (2006) that explicitly exploits the fact that we are modeling the partial correlations themselves (a computational comparison will be left for future work). However, our approach will naturally allow structures in the partial correlations which cannot be done when using current versions of the PX-RPMH (or similar) algorithms; for example, if the partial autocorrelations are zero or constant within lag since then the correlation matrix is highly constrained.

In the following, we assume the data, {*Y*_{i} : *i* = 1,…, *n*}are independent, normally distributed *p*-vectors with mean *X*_{i}*β* and with covariance matrix Σ = *R* (a correlation matrix). A natural way to sample the partial autocorrelations is via a Gibbs sampling algorithm in which we sample from the full conditional distributions of each of the partial autocorrelations. Given that the full conditional distributions of the partial autocorrelations are not available in closed form there are several options to sample them. We explore a simple one next.

We propose to use an auxiliary variable approach to sample each partial autocorrelation. Define the likelihood for the partial autocorrelation, *π _{jk}* as

$$L({\pi}_{\text{jk}})p({\pi}_{\text{jk}})={\int}_{0}^{\infty}I\{{u}_{\text{jk}}<L({\pi}_{\text{jk}})\}p({\pi}_{jk})d{u}_{jk}.$$

(15)

To sample *π _{jk}* we can proceed in two steps,

- Sample
*U*~_{jk}*Unif*(0,*L*(*π*))._{jk} - Sample
*p*(*π*) constrained to the set {_{jk}*π*:_{jk}*L*(*π*) >_{jk}*U*}._{jk}

Truncated versions of the priors proposed here, linearly transformed Beta distributions (of which the uniform is a special case), can easily be sampled using the approach in Damien and Walker (2001). The truncation region for step 2, given that the domain of *π _{jk}* is bounded, can typically be found quickly numerically.

The likelihood evaluations needed to find the truncation interval can be made simpler by taking advantage of the following facts.

**Fact 1.** If we factor *R ^{−}*

**Fact 2.** To isolate the likelihood contribution of π_{j,j}_{+}* _{k}*, we can factor the entire multivariate normal distribution into

**Fact 3.** Using the determinantal identity in Theorem 1(b), the determinant of submatrices of *R* in terms of partial correlations can be written as a function of the partial correlations,

$$\left|R\left[j:j+k\right]\right|=\prod _{l=1}^{k}\prod _{i=j}^{j+k-l}(1-{\pi}_{i,i+l}^{2})$$

(16)

It can be shown that the likelihood in terms of one of the partial correlations of interest, say *π _{jk}*, can be written as

$$\mathit{\text{Lik}}({\pi}_{jk})\propto \phantom{\rule{0.2em}{0ex}}exp(-\frac{1}{2}tr\{{R}^{-1}[j:k]{s}_{y}[j:k]\})(1-{\pi}_{jk}^{2}{)}^{-n/2}$$

(17)

where *S _{y}*[

Extensions of these computational procedures to modelling the correlation matrix when the matrix of interest is a covariance matrix is straightforward (see, e.g., Liu and Daniels, 2006).

We now conduct some simulations in a longitudinal setting to

- examine the mixing behavior of the auxiliary variable sampler here and
- compare the risk of the IU prior to the BB prior, the standard default prior for a correlation matrix.

In terms of the mixing of the Markov chain, the auxiliary variable sampler on the partial autocorrelations works quite well, with the lag correlation in the chain dissipating quickly. For example, for *p =* 5, *n =* 20, the lag correlations for each partial autocorrelation was negligible by lag 10. Similar results were seen for other *p*/*n* combinations.

For our simulation, we consider three true matrices representing typical serial correlation, an AR(1) with lag 1 correlation of .8 and one with lag correlation .6. Both these matrices have all partial autocorrelation beyond lag 1 equal to 0. We also considered a matrix that had more non-zero partial autocorrelations with lags 1,2,3,4 equal to (.8, .4, .1,0) respectively. We consider two size matrices/sample size combinations, *p =* 5 with *n =* 10,25,50,100 and *p =* 10 with sample sizes, *n =* 20,50,100. We also consider several loss functions, log likelihood (LL) loss, *tr*(*R*^{−1}) − log|*R*^{−1}| −*p*, with Bayes estimator the inverse of the posterior expectation of R^{−1} and squared error loss on Fisher's z-transform of the partial autocorrelations, *π _{jk}* (SEL-P) and the marginal correlations,

For *p = 5*, the risk reductions from the IU prior are clear from Table 3, with percentage reductions as large as 30% for *n =* 10, 20% for *n =* 25 and 10% for *n =* 50. For *p =* 10, the risk reductions from the uniform prior are clear from Table 4, with percentage reductions as large as 40%. The largest risk reductions were for loss SEL-M (squared error loss on the Fisher's z-transform on the marginal correlations). The lower risk reductions for the first order autoregressive covariance matrices are related to only the lag 1 partial autocorrelations being non-zero; so the shrinkage of the BB prior for all the other partial autocorrelations is not unreasonable. Examination of squared error loss for the partial autocorrelations by lag indicates large reductions for the lag 1 partial autocorrelations and small increases for the other lag partial autocorrelations.

Results for *p* = 5. Each row corresponds to *n* = 10, 25, 50, 100. IU: independent uniform priors; BB: Barnard Beta priors. LL: log likelihood loss; SEL-P: squared error loss on Fisher's z-transformation of the partial autocorrelations; SEL-M: squared error **...**

Results for *p* = 10. Each row corresponds to *n* = 20, 50, 100. IU: independent uniform priors; BB: Barnard Beta priors. LL: log likelihood loss; SEL-P: squared error loss on Fisher's z-transformation of the partial autocorrelations; SEL-M: squared error **...**

In addition, the estimates of the first order lag correlation themselves show large differences (not shown). For AR(.8) with *p =* 10 and *n =* 20, the means were .76 under the IU prior and about .71 under the BB prior with similar discrepancies for *p = 5*.

Some of the risk reductions from using the IU priors on the partial autocorrelations are small. However, they come at no computational cost (unlike some priors for covariance matrices proposed in the literature) and are consistent with prior beliefs about partial autocorrelations representing serial correlation. The BB prior is not a good default choice due to its dampening effects on the important lower order partial autocorrelations. Further risk improvement might be expected through the use of more targeted shrinkage (Daniels and Pourahmadi, 2002).

The models and priors for partial correlations are extremely important for many applications involving longitudinal and functional data in the behavior and social sciences. In particular, modeling longitudinal data using structural equation and factor analytic models (i.e., latent variable models in general) typically require careful modeling of correlation matrices (see e.g., Daniels and Normand, 2006) as do multivariate probit models (Chib and Greenberg, 1998; Czado, 2000; Liu, Daniels, and Marcus, 2009). The tools here provide both a general class of methods for using the ordered partial correlations that allow parsimonious modeling of correlations via regression modeling and sensible priors on correlations within such models which is often essential in small to medium sized datasets. Such modeling takes on even more importance in the presence of incomplete data (Daniels and Hogan, 2008). In addition, the uniform priors on the partial correlations recommended in Section 3 provide no additional computational challenges over standard priors for a correlation matrix. Future work will illustrate these methods more fully in applications.

Using the variance-correlation separation strategy, modeling a covariance matrix is reduced to that of its correlation matrix *R* which has the additional constraint that all its diagonal entries must equal to one. Though the Cholesky decomposition can handle the positive-definiteness, it cannot be applied directly when there are additional constraints such as stationarity or constancy along diagonals (Pourahmadi, 1999, Sec. 2.6), zero entries (Chaudhuri, Drton and Richardson, 2007) and separable covariance structures (Lu and Zimmerman, 2005). The reparameterization in terms of partial autocorrelations is shown to work well in the face of an additional constraint. It requires ordering the variables which is not a problem for longitudinal and functional data, but might be difficult to justify for other situations. Related work on trying to ‘order’ data that does not have a natural ordering can be found in Stein et al (2004) and Berger and Bernardo (1992). The long history and successful use of the PACF in the time series literature provide valuable graphical and analytical tools which can be generalized to the nonstationary setup.

Given the conditioning structure of the partial autocorrelations, we expect many of them to be zero (see Table 1). Thus, it would be natural to adapt the approach in Wong, Carter, and Kohn (2003) to zero out the partial correlations. We might expect computational simplifications given that the PACF are free to vary independently in [−1, 1] unlike the full partial correlations. In addition, when constructing priors for the probability of a partial autocorrelation being zero, the lack of exchangeability of the partial autocorrelations (vs. the full partial correlations) given that they condition on different numbers of variables (i.e., only the intervening variables) must be taken into account; such issues have been addressed in Liu et al. (2009) in a related setting.

We will explore the computational efficiency of other proposals for Bayesian computing in future work, including sampling all the partial autocorrelations together. In addition, we will derive strategies for Bayesian inference when modeling Fisher's *z* transform of the partial autocorrelations as a function of covariates.

We thank Yanpin Wang for coding the simulations. This research was partially funded by grants from NIH (Daniels) and NSF (Pourahmadi).

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

M.J. Daniels, Department of Statistics, University of Florida.

M. Pourahmadi, Division of Statistics, Northern Ill. University.

- Anderson TW. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons; 1984.
- Barnard J, McCulloch R, Meng X. Modeling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage. Statistica Sinica. 2000;10:1281–1312.
- Barndorff-Nielsen O, Schou G. On the parameterization of autoregressive models for partial autocorrelation. J of Multivariate Analysis. 1973;3:408–419.
- Berger JO, Bernardo JM. On the development of reference priors. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 4: Proceedings of the Fourth Valencia Meeting; Clarendon Press; 1992. pp. 35–49.
- Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis-Forecasting and Control. Revised 3rd. Prentice Hall; NJ: 1994.
- Chen Z, Dunson D. Random effects selection in linear mixed models. Biometrics. 2003;59:159–182.
- Chib S, Greenberg E. Analysis of multivariate probit models. Biometrika. 1998;85:347–361.
- Chaudhuri S, Drton M, Richardson TS. Estimation of a covariance matrix with zeros. Biometrika. 2007;94:199–216.
- Chiu TYM, Leonard T, Tsui KW. The matrix-logarithm covariance model. Journal of the American Statistical Association. 1996;91:198–210.
- Czado C. Multivariate regression analysis of panel data with binary outcomes applied to unemployment data. Statistical Papers. 2000;41:281–304.
- Damien P, Wakefield J, Walker SG. Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 1999;61:331–344.
- Damien P, Walker SG. Sampling truncated normal, Beta, and Gamma densities. Journal of Computational and Graphical Statistics. 2001;10:206–215.
- Daniels HE. The approximate distribution of serial correlation coefficients. Biometrika. 1956;43:169–185.
- Daniels MJ, Hogan JW. Missing data in longitudinal studies: Strategies for Bayesian modeling and sensitivity analysis. Chapman & Hall (CRC Press); 2008.
- Daniels M, Kass R. Nonconjugate Bayesian estimation of covariance matrices in hierarchical models. Journal of the American Statistical Association. 1999;94:1254–1263.
- Daniels MJ, Kass RE. Shrinkage estimators for covariance matrices. Biometrics. 2001;57:1173–1184. [PMC free article] [PubMed]
- Daniels M, Normand SL. Longitudinal profiling of health care units based on mixed multivariate patient outcomes. Biostatistics. 2006;7:1–15. [PMC free article] [PubMed]
- Daniels MJ, Pourahmadi M. Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika. 2002;89:553–566.
- Davies PI, Higham NJ. Numerically stable generation of correlation matrices and their factors. BIT. 2000;40:640–651.
- Dégerine S, Lambert-Lacroix S. Partial autocorrelation function of a nonstationary time series. J Multivariate Analysis. 2003;89:135–147.
- Dempster AP. Covariance selection. Biometrics. 1972;28:157–175.
- Eaves D, Chang T. Priors for ordered conditional variances and vector partial correlation. J of Multivariate Analysis. 1992;41:43–55.
- Engle RF. Dynamic conditional correlation: A simple class of multivariate GARCH models. Journal of Business and Economics. 2002;20:339–350.
- Holmes RB. On random correlation matrices. SIAM J Matrix Anal Appl. 1991;12:239–272.
- Joe H. Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis. 2006;97:2177–2189.
- Jones MC. Randomly choosing parameters from the stationarity and invertibility region of autoregressive-moving average models. Applied Statistics. 1987;36:134–138.
- Jones RH. Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics. 1980;22:389–395.
- Kenward MG. A method for comparing profiles of repeated measurements. Biometrics. 1987;44:959–971.
- Kurowicka D, Cooke R. A parameterization of positive definite matrices in terms of partial correlation vines. Linear Algebra and its Applications. 2003;372:225–251.
- Kurowicka D, Cooke R. Completion problem with partial correlation vines. Linear Algebra and its Applications. 2006;418:188–200.
- Landau HJ. Maximum entropy and the moment problem. Bull of the Amer Math Soc. 1987;16:47–77.
- Liechty JC, Liechty MW, Muller P. Bayesian correlation estimation. Biometrika. 2004;91:1–14.
- Liu C. Comment on “The art of data augmentation” (Pkg: p1-111) Journal of Computational and Graphical Statistics. 2001;10:75–81.
- Liu X, Daniels MJ. A new algorithm for simulating a correlation matrix based on parameter expansion and reparameterization. Journal of Computational and Graphical Statistics. 2006;15:897–914.
- Liu X, Daniels MJ, Marcus B. Joint models for the association of longitudinal binary and continuous processes with application to a smoking cessation trial. Journal of the American Statistical Association 2009 [PMC free article] [PubMed]
- Lu N, Zimmerman DL. The likelihood ratio test for a separable covariance matrix. Statistics and Probability Letters. 2005;73:449–457.
- McCullagh P, Nelder JA. Generalized Linear Models. 2nd. London: Chapman & Hall; 1989.
- Pan J, MacKenzie G. On modelling mean-covariance structures in longitudinal studies. Biometrika. 2003;90:239–244.
- Pitt M, Chan D, Kohn R. Efficient Bayesian inference for Gaussian copula regression models. Biometrika. 2006;93:537–554.
- Pourahmadi M. Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika. 1999;86:677–690.
- Pourahmadi M. Foundations of Time Series Analysis and Prediction Theory. Wiley; New York: 2001.
- Pourahmadi M. Cholesky decompositions and estimation of a covariance matrix: Orthogonality of variance-correlation parameters. Biometrika. 2007;94:1006–1013.
- Quenouille MH. Approximate tests of correlation in time series. Journal of Royal Statistical Society, Series B. 1949;11:68–84.
- Ramsey FL. Characterization of the partial autocorrelation function. Annals of Statistics. 1974;2:1296–1301.
- Stein ML, Chi Z, Welty LJ. Approximating likelihoods for large spatial data sets. Journal of Royal Statistical Society Series B. 2004;50:275–296.
- Wermuth N, Cox DR, Marchetti GM. Covariance chains. Bernoulli. 2006;12:841–862.
- Wong F, Carter CK, Kohn R. Efficient estimation of covariance selection models. Biometrika. 2003;90:809–830.
- Yang R, Berger JO. Estimation of a covariance matrix using the reference prior. Annals of Statistics. 1994;22:1195–1211.
- Yule GU. On the theory of correlation for any number of variables treated by a new system of notation. Roy Soc Proc. 1907;79:85–96.
- Zimmerman DL. Viewing the correlation structure of longitudinal data through a PRISM. The American Statistician. 2000;54:310–318.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |