Article sections

- Abstract
- 1. Introduction
- 2. Quality of approximation
- 3. Testing for independence of two sets of variables
- 4. Canonical correlation analysis
- 5. Tests of common means or variances
- 6. Multivariate linear model
- 7. Concluding discussion
- References

Ann Appl Stat. Author manuscript; available in PMC 2010 June 3.

Published in final edited form as:

Ann Appl Stat. 2009; 3(4): 1616–1633.

doi: 10.1214/08-AOAS220. PMCID: PMC2880335

NIHMSID: NIHMS199316

Iain M. Johnstone, Department of Statistics, Stanford University, Stanford, California 94305, USA;

Iain M. Johnstone: imj@stanford.edu


The greatest root distribution occurs everywhere in classical multivariate analysis, but even under the null hypothesis the exact distribution has required extensive tables or special purpose software. We describe a simple approximation, based on the Tracy–Widom distribution, that in many cases can be used instead of tables or software, at least for initial screening. The quality of approximation is studied, and its use illustrated in a variety of settings.

The greatest root distribution is found everywhere in classical multivariate analysis. It describes the null hypothesis distribution for the union intersection test for any number of classical problems, including multiple response linear regression, MANOVA, canonical correlations, equality of covariance matrices and so on. However, the exact null distribution is difficult to calculate and work with, and so the use of extensive tables or special purpose software has always been necessary.

This paper describes a simple asymptotic approximation, based on the Tracy–Widom distribution. Although derived asymptotically, the approximation is, we argue, reasonably accurate over the entire range of the parameters. “Reasonably accurate” means, for example, less than ten percent relative error in the 95th percentile, even when working with *two* variables and any combination of error and hypothesis degrees of freedom.

This paper focuses on the approximation, its accuracy and its applicability to a range of problems in multivariate analysis. A companion paper [Johnstone (2008)] contains all proofs and additional discussion.

Our main claim is that for many applied purposes, the Tracy–Widom approximation can often, if not quite always, substitute for the elaborate tables and computational procedures that have until now been needed. Our hope is that this paper might facilitate the use of the approximation in applications in conjunction with appropriate software.

To briefly illustrate the Tracy–Widom approximation in action, we revisit the rootstock data, as discussed in Rencher (2002), pages 170–173. In a classical experiment carried out from 1918 to 1934, apple trees of different rootstocks were compared (Andrews and Herzberg [(1985), pages 357–360] has more detail). Rencher (2002) gives data for eight trees from each of six rootstocks. Four variables are measured for each tree: Girth4 = trunk girth at 4 years (mm), Growth4 = extension growth at 4 years (m), Girth15 = trunk girth at 15 years (mm), and Wt15 = weight of tree above ground at 15 years (lb).

| | Stock | Girth4 | Growth4 | Girth15 | Wt15 |
|---|---|---|---|---|---|
| 1 | I | 111 | 2.569 | 358 | 760 |
| 2 | I | 119 | 2.928 | 375 | 821 |
| … | | | | | |
| 47 | VI | 113 | 3.064 | 363 | 707 |
| 48 | VI | 111 | 2.469 | 395 | 952 |

A one-way multivariate analysis of variance can be used to examine the hypothesis of equality of the four-dimensional vectors of mean values corresponding to each of the six groups (rootstocks). The standard tests are based on the eigenvalues of (**W** + **B**)^{−1}**B**, where **W** and **B** are the sums of squares and products matrices within and between groups respectively. We focus here on the largest eigenvalue, with observed value *θ*^{obs} = 0.652. Critical values of the null distribution depend on parameters, here s = 4, m = 0, n = 18.5 [using (8) below, along with the conventions of Section 5.1 and the definition of *θ*(*p*, *m*, *n*)]. Traditionally these are found by reference to tables or charts. Here, the 0.05 critical value is found—after manual interpolation in those tables—to be *θ*_{0.05} = 0.377. The approximation (6) of this paper yields the approximate 0.05 critical value
${\theta}_{0.05}^{\text{TW}}=0.384$, which clearly serves just as well for rejection of the null hypothesis.

It is more difficult in standard packages to obtain *p*-values corresponding to *θ*^{obs}. The default is to use a lower bound based on the *F* distribution [see (12)], which, as Section 2 shows, can understate the true *p*-value by orders of magnitude.

The rest of this introduction provides enough background to state the main Tracy–Widom approximation result. Section 2 focuses on the quality of the approximation by looking both at conventional percentiles and at very small *p*-values. The remaining Sections 3–6 describe some of the classical uses of the largest root test in multivariate analysis, in each case in enough detail to identify the parameters used. Some extra attention is paid in Section 6 to the multivariate linear model, in view of the wide variety of null hypotheses that can be considered.

Our setting is the distribution theory associated with samples drawn from the multivariate normal distribution. For definiteness, we use the notation of Mardia, Kent and Bibby (1979), to which we also refer for much standard background material. Thus, if **x**_{1}, …, **x**_{*n*} denotes a random sample from *N*_{*p*}(**μ**, **Σ**), the *n* × *p* matrix **X** whose rows are the **x**′_{*i*} is called a normal data matrix.

A *p* × *p* matrix **A** that can be written **A** = **X**′**X** in terms of such a normal data matrix is said to have a Wishart distribution with scale matrix **Σ** and degrees of freedom parameter *n*, written **A** ~ *W*_{*p*}(**Σ**, *n*).

We consider matrix analogs of the *F* and Beta distributions, which in the univariate case are built from two independent chi-squared variates. Thus, let **A** ~ *W*_{*p*}(**I**, *m*) be independent of **B** ~ *W*_{*p*}(**I**, *n*).

The matrix analog of a Beta variate is based on the eigenvalues of (**A** + **B**)^{−1}**B**, and leads to the following:

Let **A** ~ *W*_{*p*}(**I**, *m*) and **B** ~ *W*_{*p*}(**I**, *n*) be independent, with *m* ≥ *p*. Then *θ*(*p*, *m*, *n*) denotes the largest eigenvalue of (**A** + **B**)^{−1}**B**.

Since **A** is positive definite, we have 0 < *θ* < 1. Clearly *θ* (*p*, *m*, *n*) can also be defined as the largest root of the determinantal equation

$$det[\mathbf{B}-\theta (\mathbf{A}+\mathbf{B})]=0.$$

Specific examples will be given below, but in general the parameter *p* refers to dimension, *m* to the “error” degrees of freedom and *n* to the “hypothesis” degrees of freedom. Thus, *m* + *n* represents the “total” degrees of freedom.

There are min(*n*, *p*) nonzero eigenvalues of **A**^{−1}**B** or, equivalently, min(*n*, *p*) nonzero roots *θ* = (*θ _{i}*) of the determinantal equation above. The joint density function of these roots is given by

$$p(\theta )=C\prod _{i=1}^{min(n,p)}{\theta}_{i}^{(\mid n-p\mid -1)/2}{(1-{\theta}_{i})}^{(m-p-1)/2}\mathrm{\Delta}(\theta ),$$

(1)

where Δ(*θ*) = Π_{*i*<*j*} |*θ*_{*i*} − *θ*_{*j*}| and *C* is a normalizing constant depending on (*p*, *m*, *n*).

The greatest root distribution has the property

$$\theta (p,m,n)\stackrel{\mathcal{D}}{=}\theta (n,m+n-p,p),$$

useful, in particular, in the case when *n* < *p* [e.g. Mardia, Kent and Bibby (1979), page 84].

Empirical and theoretical investigation has shown that it is useful to develop the approximation in terms of the logit transform of *θ*; thus, we define

$$W(p,m,n)=\text{logit}\phantom{\rule{0.16667em}{0ex}}\theta (p,m,n)=log\left(\frac{\theta (p,m,n)}{1-\theta (p,m,n)}\right).$$

(2)

Our main result, stated more formally below, is that with appropriate centering and scaling, *W* is approximately Tracy–Widom distributed:

$$\frac{W(p,m,n)-\mu (p,m,n)}{\sigma (p,m,n)}\stackrel{\mathcal{D}}{\Rightarrow}{F}_{1}.$$

(3)

The centering and scaling parameters are defined by

$$\mu (p,m,n)=2logtan\left(\frac{\phi +\gamma}{2}\right),$$

(4)

$${\sigma}^{3}(p,m,n)=\frac{16}{{(m+n-1)}^{2}}\frac{1}{{sin}^{2}(\phi +\gamma )sin\phi sin\gamma},$$

(5)

where the angle parameters *γ* and *φ* are defined by

$$\begin{array}{l}{sin}^{2}\left(\frac{\gamma}{2}\right)=\frac{min(p,n)-1/2}{m+n-1},\\ {sin}^{2}\left(\frac{\phi}{2}\right)=\frac{max(p,n)-1/2}{m+n-1}.\end{array}$$

The *F*_{1} distribution, due to Tracy and Widom (1996) and plotted in Figure 1, has its origins in mathematical physics—see Tracy and Widom (1996); Johnstone (2001) for further details. The density is asymmetric, with mean −1.21 and SD 1.27. Both tails have exponential decay, the left tail like $e^{-|s|^{3}/24}$ and the right tail like $e^{-(2/3)s^{3/2}}$.

For the present paper, what is important is that the *F*_{1} distribution does not depend on any parameters, and the distribution itself, along with its inverse and percentiles, can be tabulated as univariate special functions. These functions play the same role in this paper as the standard normal distribution Φ, its inverse Φ^{−1} and percentiles *z _{α}* play in traditional statistical application.

An R package RMTstat is available at CRAN (cran.r-project.org). It facilitates computation of the distributional approximations and largest root tests described in this paper, and the use of percentiles and random draws from the *F*_{1} distribution. Its scope and use are described in more detail in an accompanying report Johnstone et al. (2010). A parallel MATLAB package is in development; it will also contain code to reproduce the figures and table in this paper.

Let *f*_{*α*} denote the *α*th percentile of the Tracy–Widom distribution *F*_{1}; for example,

$$\begin{array}{lll}{f}_{0.90}=0.4501,\hfill & {f}_{0.95}=0.9793,\hfill & {f}_{0.99}=2.0234\hfill \end{array}.$$

Then the *α*th percentile of *θ* (*p*, *m*, *n*) is given approximately by

$${\theta}_{\alpha}={e}^{\mu +{f}_{\alpha}\sigma}/(1+{e}^{\mu +{f}_{\alpha}\sigma}),$$

(6)

where *μ* = *μ*(*p*, *m*, *n*), *σ* = *σ* (*p*, *m*, *n*) are given by (4) and (5).
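To make the recipe concrete, here is a minimal Python sketch of (4)–(6). The function name is ours; the example call reproduces the rootstock critical value θ_{0.05}^{TW} = 0.384 quoted in the introduction, using the MKB parameters (*p*, *m*, *n*) = (4, 42, 5) that correspond to the table parameters (s, m, n) = (4, 0, 18.5) under (8).

```python
import math

def tw_percentile(p, m, n, f_alpha):
    """Approximate alpha-th percentile of theta(p, m, n) via (4)-(6).

    f_alpha is the corresponding Tracy-Widom F_1 percentile,
    e.g. f_0.95 = 0.9793."""
    N = m + n - 1
    # angle parameters gamma and phi from sin^2(gamma/2), sin^2(phi/2)
    gamma = 2 * math.asin(math.sqrt((min(p, n) - 0.5) / N))
    phi = 2 * math.asin(math.sqrt((max(p, n) - 0.5) / N))
    # centering (4) and scaling (5)
    mu = 2 * math.log(math.tan((phi + gamma) / 2))
    sigma3 = (16 / N**2) / (math.sin(phi + gamma) ** 2
                            * math.sin(phi) * math.sin(gamma))
    sigma = sigma3 ** (1 / 3)
    # invert the logit transform (2), as in (6)
    x = mu + f_alpha * sigma
    return math.exp(x) / (1 + math.exp(x))

# rootstock example: MKB (p, m, n) = (4, 42, 5), alpha = 0.95
theta_05 = tw_percentile(4, 42, 5, 0.9793)  # ~0.384
```

The same computation in table parameters is given by (9) below; only the bookkeeping differs.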

The more formal statement of (3) goes as follows. Assume *p*, *m* and *n* → ∞ together in such a way that

$$lim\frac{p\wedge n}{m+n}>0,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}lim\frac{m}{p}>1.$$

(7)

For each *s*_{0}, there exist *c*, *C* > 0 such that for *s* ≥ *s*_{0},

$$\mid P\{W(p,m,n)\le \mu (p,m,n)+\sigma (p,m,n)s\}-{F}_{1}(s)\mid \le C{p}^{-2/3}{e}^{-cs}.$$

For the full proof and much more discussion and detail, see the companion paper [Johnstone (2008)].

If **A** and **B** are as in the definition of *θ* (*p*, *m*, *n*), then let $\stackrel{\sim}{\theta}(p,m,n)$ denote the *smallest* eigenvalue of (**A** + **B**)^{−1}**B**. Its distribution is given by

$$\stackrel{\sim}{\theta}(p,m,n)\stackrel{\mathcal{D}}{=}1-\theta (p,n,m),$$

(note the reversal of *m* and *n*!). In particular, the Tracy–Widom distribution will give a generally useful approximation to the *lower* tail of $\stackrel{\sim}{\theta}(p,m,n)$.

There is an entirely analogous result when **A** and **B** follow complex Wishart distributions, with a modified limit distribution *F*_{2}. Details are given in Johnstone (2008).

There is a substantial literature computing percentage points of the greatest root distribution for selected parameter values, partially reviewed below. The standard parameterization used in these tables arises from writing the joint density of the roots *θ*_{*i*} as

$$p(\theta )=C\prod _{i=1}^{\text{s}}{\theta}_{i}^{\text{m}}{(1-{\theta}_{i})}^{\text{n}}\mathrm{\Delta}(\theta ).$$

From this and (1) it is apparent that our “MKB” parameters (*p*, *m*, *n*) are related to the “Table” parameters (s, m, n) via

$$\begin{array}{ll}\text{s}=min(n,p),\hfill & p=\text{s},\hfill \\ \text{m}=(\mid n-p\mid -1)/2,\hfill & m=\text{s}+2\text{n}+1,\hfill \\ \text{n}=(m-p-1)/2,\hfill & n=\text{s}+2\text{m}+1.\hfill \end{array}$$

(8)
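The conversion (8) is mechanical but easy to get wrong; a small Python sketch (function names are ours) with the rootstock parameters as a round-trip check:

```python
def mkb_to_table(p, m, n):
    """MKB parameters (p, m, n) -> table parameters (s, m, n), per (8)."""
    return min(n, p), (abs(n - p) - 1) / 2, (m - p - 1) / 2

def table_to_mkb(s, m, n):
    """Table parameters (s, m, n) -> MKB (p, m, n), per (8).

    Since p = s = min(n, p), this inverts mkb_to_table only in the
    case covered by the tables, i.e. when MKB n <= p."""
    return s, s + 2 * n + 1, s + 2 * m + 1

# rootstock example: MKB (4, 42, 5) <-> table (4, 0, 18.5)
s, tm, tn = mkb_to_table(4, 42, 5)   # (4, 0.0, 18.5)
```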

In terms of the table parameters and N = 2(s + m + n) + 1, the centering and scaling constants of the Tracy–Widom approximation are given by

$${sin}^{2}\left(\frac{\gamma}{2}\right)=\left(\text{s}-\frac{1}{2}\right)/\text{N},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{sin}^{2}\left(\frac{\phi}{2}\right)=\left(\text{s}+2\text{m}+\frac{1}{2}\right)/\text{N}$$

and

$$\mu =2logtan\left(\frac{\phi +\gamma}{2}\right),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\sigma}^{3}=\frac{16}{{\text{N}}^{2}}\frac{1}{{sin}^{2}(\phi +\gamma )sin\phi sin\gamma}.$$

(9)

We turn to the comparison of percentage points
${\theta}_{\alpha}^{\text{TW}}$ from the Tracy–Widom approximation (6) with the exact values *θ*_{*α*} for small values of the table parameters (s, m, n). The most extensive tabulations of *θ*_{*α*} are due to Chen (2002, 2003, 2004a, 2004b).

Figures 2 and 3 plot
${\theta}_{\alpha}^{\text{TW}}$ against *θ*_{*α*} at the 95th percentile for s = 2 and the 90th percentile for s = 4. The value s = 2 is the smallest relevant one—otherwise we are in the univariate case covered by the classical *F* distribution.

Fig. 2. Comparison of exact and approximate 95th percentiles for s = 2. Top panel: solid line is the Tracy–Widom approximation ${\theta}_{\alpha}^{\text{TW}}(2,\text{m},\text{n})$ plotted as a function of m for values of n shown. Dashed lines are the exact percentiles *θ*_{*α*}.

Fig. 3. Comparison of exact and approximate 90th percentiles for s = 4. Top panel: solid line is the Tracy–Widom approximation ${\theta}_{\alpha}^{\text{TW}}(4,\text{m},\text{n})$ plotted as a function of m for values of n shown. Dashed lines are the exact percentiles *θ*_{*α*}.

Define the relative error of the approximation as

$$r=({\theta}_{\alpha}^{\text{TW}}/{\theta}_{\alpha})-1.$$

Figure 2 shows that even for s = 2, the 95th percentile of the TW approximation has a relative error of less than 1 in 20 except in the zone where both m ≤ 2 and n ≥ 10, where the relative error is still less than 1 in 10. Note that the relative error is always positive in sign, implying that the approximate critical points yield a conservative test. More extensive contour plots covering s = 2(1)6 and 90th, 95th and 99th percentiles may be found in Johnstone and Chen (2007).

There has been a large amount of work to prepare tables or charts for the null distribution of the largest root, much of which is reviewed in Chen (2003). We mention contributions by the following: Nanda (1948, 1951); Foster and Rees (1957); Foster (1957, 1958); Pillai (1955, 1956b, 1956a, 1957a, 1965a, 1967); Pillai and Bantegui (1959); Heck (1960); Krishnaiah (1980); Pillai and Flury (1984); Chen (2002, 2003, 2004b, 2004a).

Because of the dependence on the three parameters, these tables can run up to 25 pages in typical textbooks, such as those of Johnson and Wichern (2002) and Morrison (2005).

Constantine (1963) expresses the c.d.f. of the largest root distribution in terms of a matrix hypergeometric function. Koev and Edelman (2006) have developed efficient algorithms (and a MATLAB package available at http://www-math.mit.edu/~plamen) for the evaluation of such matrix hypergeometric functions using recursion formulas from group representation theory.

Koev (2010) collects useful formulas and explains how to use them and mhg to compute the exact c.d.f. and percentiles for the largest root distribution over a range of values of the “MKB” parameters corresponding to *m*, *n*, *p* ≤ 17, and *m*, *n*, *p* ≤ 40 when *n* − *p* is odd.

SAS/STAT 9.0 made available an option for computing exact *p*-values using Davis (1972); Pillai and Flury (1984). There is also some stand-alone software described by Lutz (1992, 2000).

We recall the hypothesis that **A** ~ *W*_{*p*}(**I**, *m*) and **B** ~ *W*_{*p*}(**I**, *n*) are independent. The largest eigenvalue has the extremal representation

$${\lambda}_{max}({\mathbf{A}}^{-1}\mathbf{B})=\underset{\mid \mathbf{u}\mid =1}{max}\frac{{\mathbf{u}}^{\prime}\mathbf{Bu}}{{\mathbf{u}}^{\prime}\mathbf{Au}}.$$

(10)

For fixed **u** of unit length, the numerator and denominator are distributed as independent
${\chi}_{(n)}^{2}$ and
${\chi}_{(m)}^{2}$ variates respectively, and so, again for fixed **u**, the scaled ratio (*m*/*n*)**u**′**Bu**/**u**′**Au** has an *F*_{*n*,*m*} distribution. Since maximizing over **u** can only increase the ratio, we have, stochastically,

$$\frac{m}{n}{\lambda}_{max}({\mathbf{A}}^{-1}\mathbf{B})>F\sim {F}_{n,m}.$$

Using the *F*_{*n*,*m*} distribution to calibrate the largest root statistic therefore produces *p*-values that are systematically too small: the resulting test is anti-conservative.

The default *p*-value provided in both SAS and R (through package car) uses this unsatisfactory distribution bound.

Table 1 attempts to capture a variety of scenarios within the computational range of Koev’s software.

Column Exact shows a range of significance levels *α* covering several orders of magnitude. Column Largest Root shows the corresponding quantiles *θ*_{*α*} of the largest root distribution, for the given values of (s, m, n)—these are computed using Koev’s MATLAB routine qmaxeigjacobi. Thus, an observed value of *θ* equal to the tabulated quantile *θ*_{*α*} has exact *p*-value *α*.

The remaining columns compare the Tracy–Widom approximation and the *F* bound. The *p*-value obtained from the Tracy–Widom approximation is given by

$${P}_{\text{TW}}({\theta}_{\alpha})=1-{F}_{1}((\text{logit}({\theta}_{\alpha})-\mu )/\sigma ),$$

(11)

where *μ* and *σ* are computed from (9).

The *F* bound on the *p*-value is given by

$$P(\theta (\text{s},\text{m},\text{n})>{\theta}_{\alpha})>{P}_{F}({\theta}_{\alpha})=1-{F}_{{\nu}_{1},{\nu}_{2}}({\nu}_{2}{\theta}_{\alpha}/({\nu}_{1}(1-{\theta}_{\alpha}))),$$

(12)

where *ν*_{1} = s+ 2m+ 1 and *ν*_{2} = s+ 2n+ 1 denote the hypothesis and error degrees of freedom respectively.

The two parts of Table 1 consider s = 2 and 6 variables respectively. The values of m = −0.5 and 5 correspond to s and s + 11 hypothesis degrees of freedom, while the values of n = 2 and 10 translate to s + 5 and s + 21 error degrees of freedom respectively.

At the 10% and 5% levels, the Tracy–Widom approximation is within 20% of the true *p*-value at s = 6, and within 35% of truth at s = 2. The *F*-value is wrong by a factor of four or more at s = 2, and by three orders of magnitude at s = 6. At smaller significance levels, the Tracy–Widom approximation generally stays within one order of magnitude of the correct *p*-value—except at (s, m, n) = (2, −0.5, 10). The *F* approximation is off by many orders of magnitude when s = 6.

In addition, we note that the Tracy–Widom approximation is conservative in nearly all cases, the exception being for *θ* ≥ 0.985 in the case (s, m, n) = (6, −0.5, 2). In contrast, the *F* approximation is *always* [cf. (12)] anti-conservative, often badly so.

In applications one is often concerned only with the general order of magnitude of the *p*-values associated with tests of the various hypotheses that are entertained—not least because the assumptions of the underlying model are at best approximately true. For this purpose, then, it may be argued that the TW approximate *p*-value is often quite adequate over the range of (s, m, n) values. Of course, if (s, m, n) is not too large and greater precision is required, then exact *p*-values can be sought, using, for example, SAS or Koev’s software.

Let **x**_{1}, …, **x**_{*n*} be a random sample from *N*_{*q*+*p*}(**μ**, **Σ**), and partition each observation into a *q*-dimensional **x**-set and a *p*-dimensional **y**-set, with **Σ** partitioned conformally into blocks **Σ**_{11}, **Σ**_{12}, **Σ**_{21}, **Σ**_{22}. The hypothesis of independence of the two sets is then **Σ**_{12} = **0**.

Again we have two sets of variables, an **x**-set with *q* variables and a **y**-set with *p* variables. The goal is to find maximally correlated linear combinations *η* = **a**′**x** and *φ* = **b**′**y**. We suppose that (**X**, **Y**) is a data matrix of *n* samples (rows) on *q* + *p* variables (columns) such that each row is an independent draw from *N*_{*q*+*p*}(**μ**, **Σ**).

Often it may be apparent that the first *k* canonical correlations are nonzero and the main interest focuses on the significance of
${r}_{k+1}^{2},{r}_{k+2}^{2}$, etc. We let *H*_{*s*} denote the null hypothesis that the population canonical correlations beyond the *s*th all vanish.

*If* **Σ** *satisfies H*_{*s*}, *then*

$$\mathcal{L}({r}_{s+1}\mid p,q,n;\mathbf{\sum})\stackrel{st}{<}\mathcal{L}({r}_{1}\mid p,q-s,n;\mathbf{I}).$$

This nonasymptotic result follows from interlacing properties of the singular value decomposition (Appendix). Since
$\mathcal{L}({r}_{1}^{2}\mid p,q-s,n;\mathbf{I})$ is given by the null distribution *θ* (*p*, *n* + *s* − *q* − 1, *q* − *s*), we may use the latter to provide a conservative *p*-value for testing *H*_{*s*}. In turn, this null distribution may be approximated by the Tracy–Widom law (3).

Waugh (1942) gave perhaps the first significant illustration of CCA using data on *n* = 138 samples of Canadian Hard Red Spring wheat and the flour made from each of these samples. The aim was to seek highly correlated indices **a**′**x** of wheat quality and **b**′**y** of flour quality, since a well correlated grading of raw (wheat) and finished (flour) products was believed to promote fair pricing of each. In all, *q* = 5 wheat characteristics—kernel texture, test weight, damaged kernels, foreign material, crude protein in wheat—and *p* = 4 flour characteristics—wheat per bushel of flour, ash in flour, crude protein in flour, gluten quality index—were measured. The resulting squared canonical correlations were
$({r}_{1}^{2},{r}_{2}^{2},{r}_{3}^{2},{r}_{4}^{2})=(0.923,0.554,0.056,0.008)$. The leading correlation would seem clearly significant and, indeed, from our approximate formula (6),
${\theta}_{0.99}^{\text{TW}}=0.184$.

To assess the second correlation *r*_{2}, we appeal to the conservative test discussed above based on the null distribution with *q* −1 = 4, *p* = 4 and *n* = 138. The Tracy–Widom approximation
${\theta}_{0.99}^{\text{TW}}$, computed from (6) with *f*_{0.99} = 2.0234, is approximately 0.166 ≪ 0.554, which strongly suggests that this second correlation is significant as well.

Marginal histograms naturally reveal some departures from symmetric Gaussian tails, but they do not seem extreme enough to invalidate the conclusions, which are also confirmed by permutation tests.

Suppose that we have *k* populations with independent data matrices **X**_{*i*}, the *i*th consisting of *n*_{*i*} rows drawn independently from *N*_{*p*}(**μ**_{*i*}, **Σ**).

Suppose that independent samples from two normal distributions *N*_{*p*}(**μ**_{1}, **Σ**_{1}) and *N*_{*p*}(**μ**_{2}, **Σ**_{2}) are available, and that we wish to test the hypothesis of equal covariance matrices, **Σ**_{1} = **Σ**_{2}.

The multivariate linear model blends ideas well known from the univariate setting with new elements introduced by correlated multiple responses. In view of the breadth of models covered, and the variety of notation in the literature and in the software, we review the setting in a little more detail, beginning with the familiar model for a single response

$$\mathbf{y}=\mathbf{X}\mathit{\beta}+\mathbf{u}.$$

Here **y** is an *n* × 1 column vector of observations on a response variable, **X** is an *n* × *q* model matrix, and **u** is an *n* × 1 column vector of errors, assumed here to be independent and identically distributed as *N* (0, *σ*^{2}). The *q* × 1 vector **β** of unknown parameters has the least squares estimate—assuming **X** has full rank *q*—

$$\widehat{\mathit{\beta}}={({\mathbf{X}}^{\prime}\mathbf{X})}^{-1}{\mathbf{X}}^{\prime}\mathbf{y}.$$

The error sum of squares *SS*_{*E*} = (**y** − **X**$\widehat{\mathit{\beta}}$)′(**y** − **X**$\widehat{\mathit{\beta}}$) has *n* − *q* degrees of freedom.

Consider the linear hypothesis *H*_{0}: **C**_{1}**β** = 0, where **C**_{1} comprises the first *g* rows of an invertible *q* × *q* matrix **C**. Then we may write

$$\mathbf{X}\mathit{\beta}=\left[\begin{array}{ll}\mathbf{X}{\mathbf{C}}^{(1)}\hfill & \mathbf{X}{\mathbf{C}}^{(2)}\hfill \end{array}\right]\left(\begin{array}{l}{\mathbf{C}}_{1}\mathit{\beta}\hfill \\ {\mathbf{C}}_{2}\mathit{\beta}\hfill \end{array}\right),$$

where we have partitioned **C**^{−1} = [**C**^{(1)} **C**^{(2)}] into blocks with *g* and *q* − *g* columns respectively.

Let **P**_{1} denote the orthogonal projection onto the subspace orthogonal to the columns of **XC**^{(2)}, and let **P** denote the projection orthogonal to all columns of **X**. We have the sum of squares decomposition

$${\mathbf{y}}^{\prime}{\mathbf{P}}_{1}\mathbf{y}={\mathbf{y}}^{\prime}\mathbf{Py}+{\mathbf{y}}^{\prime}({\mathbf{P}}_{1}-\mathbf{P})\mathbf{y}$$

and the hypothesis sum of squares for testing *H*_{0}: **C**_{1}**β** = 0 is *SS*_{*H*} = **y**′(**P**_{1} − **P**)**y**. Under *H*_{0},

$$F=\frac{S{S}_{H}/g}{S{S}_{E}/(n-q)}\sim {F}_{g,n-q}.$$

Explicit expressions for the sums of squares are given by

$$\begin{array}{l}S{S}_{E}={\mathbf{y}}^{\prime}(\mathbf{I}-\mathbf{X}{({\mathbf{X}}^{\prime}\mathbf{X})}^{-1}{\mathbf{X}}^{\prime})\mathbf{y},\\ S{S}_{H}={({\mathbf{C}}_{1}\widehat{\mathit{\beta}})}^{\prime}{[{\mathbf{C}}_{1}{({\mathbf{X}}^{\prime}\mathbf{X})}^{-1}{\mathbf{C}}_{1}^{\prime}]}^{-1}{\mathbf{C}}_{1}\widehat{\mathit{\beta}}.\end{array}$$
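As a numerical sanity check on these formulas, the following self-contained Python sketch (the data values are made up for illustration) fits a simple regression with intercept and slope, takes **C**_{1} = [0 1] so that *H*_{0} is "slope = 0" with *g* = 1, and verifies the decomposition **y**′**P**_{1}**y** = *SS*_{*E*} + *SS*_{*H*}, which here equals the total centered sum of squares since **XC**^{(2)} reduces to the column of ones:

```python
# Simple regression y = b0 + b1*x, testing H0: b1 = 0 (C1 = [0, 1], g = 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
n = len(x)

# Solve the normal equations (X'X) beta = X'y by hand for q = 2.
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(u * v for u, v in zip(x, y))
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# C1 (X'X)^{-1} C1' with C1 = [0, 1] is the (2,2) entry of (X'X)^{-1}.
c1_xtx_inv_c1 = n / det

SS_E = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
SS_H = b1 * b1 / c1_xtx_inv_c1

# Here X C^{(2)} is the column of ones, so P1 is the centering projector
# and y'P1 y is the total centered sum of squares.
ybar = sy / n
SS_T = sum((yi - ybar) ** 2 for yi in y)
assert abs(SS_T - (SS_E + SS_H)) < 1e-8  # y'P1 y = y'P y + y'(P1 - P) y
```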

In the multivariate linear model,

$$\mathbf{Y}=\mathbf{XB}+\mathbf{U},$$

the single response **y** is replaced by *p* response vectors, organized as columns of the *n* × *p* matrix **Y**. The model (or design) matrix **X** remains the same for each response; however, there are separate vectors of unknown coefficients and errors for each response; these are organized into a *q* × *p* matrix **B** of regression coefficients and an *n* × *p* matrix **U** of errors. The multivariate aspect of the model is the assumption that the rows of **U** are independent, with multivariate normal distribution having mean **0** and common covariance matrix **Σ**. Thus, **U** is a normal data matrix of *n* samples from *N*_{*p*}(**0**, **Σ**), and the least squares estimate of **B** becomes

$$\widehat{\mathbf{B}}={({\mathbf{X}}^{\prime}\mathbf{X})}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y}.$$

The linear hypothesis becomes

$${H}_{0}:{\mathbf{C}}_{1}\mathbf{B}=\mathbf{0}.$$

The sums of squares of the univariate case are replaced by hypothesis and error sums of squares and products matrices:

$$\begin{array}{l}E={\mathbf{Y}}^{\prime}\mathbf{PY}={\mathbf{Y}}^{\prime}(\mathbf{I}-\mathbf{X}{({\mathbf{X}}^{\prime}\mathbf{X})}^{-\mathbf{1}}{\mathbf{X}}^{\prime})\mathbf{Y},\\ H={\mathbf{Y}}^{\prime}{\mathbf{P}}_{2}\mathbf{Y}={({\mathbf{C}}_{1}\widehat{\mathbf{B}})}^{\prime}{[{\mathbf{C}}_{1}{({\mathbf{X}}^{\prime}\mathbf{X})}^{-\mathbf{1}}{\mathbf{C}}_{1}^{\prime}]}^{-1}{\mathbf{C}}_{1}\widehat{\mathbf{B}},\end{array}$$

(13)

in which the univariate quantities **y** and $\widehat{\mathit{\beta}}$ are simply replaced by their multivariate analogs **Y** and $\widehat{\mathbf{B}}$. It follows that **E** ~ *W*_{*p*}(**Σ**, *n* − *q*) and that, under the null hypothesis, **H** ~ *W*_{*p*}(**Σ**, *g*) independently of **E**.

Thus, under the null hypothesis **C**_{1}**B** = 0, Roy’s maximum root statistic *θ*_{1} has null distribution

$$\begin{array}{ll}{\theta}_{1}\sim \theta (p,n-q,g),\hfill & \text{where}\hfill \\ p=\text{dimension},\hfill & g=\text{rank}({\mathbf{C}}_{1}),\hfill \\ q=\text{rank}(\mathbf{X}),\hfill & n=\text{sample}\phantom{\rule{0.16667em}{0ex}}\text{size}.\hfill \end{array}$$

(14)

The model matrix **X** may fail to have full rank. This situation routinely occurs when redundant parameterizations are used, for example, when dealing with factors in analysis of variance models. One approach (e.g., MKB, Section 6.4) is to rearrange the columns of **X** and partition **X** = [**X**_{1} **X**_{2}] so that **X**_{1} has full rank. We must also assume that the matrix **C**_{1} is *testable* in the sense that, as a function of **B**, **XB** = **0** implies **C**_{1}**B** = **0**. In such cases, if we partition **C**_{1} = [**C**_{11} **C**_{12}] conformally with **X**, then
${\mathbf{C}}_{12}={\mathbf{C}}_{11}{({\mathbf{X}}_{1}^{\prime}{\mathbf{X}}_{1})}^{-1}{\mathbf{X}}_{1}^{\prime}{\mathbf{X}}_{2}$ is determined from **C**_{11}.

With these assumptions, we use **X**_{1} and **C**_{11} in (13) and (14) in place of **X** and **C**_{1}.

A straightforward extension is possible in order to test null hypotheses of the form

$${\mathbf{C}}_{1}\mathbf{B}{\mathbf{M}}_{1}=0,$$

where **M**_{1} is *p* × *r* of rank *r*. The columns of **M**_{1} capture particular linear combinations of the dependent variables—for an example, see, e.g., Morrison (2005), Chapter 3.6.

We simply consider a modified linear model

$$\mathbf{Y}{\mathbf{M}}_{1}=\mathbf{XB}{\mathbf{M}}_{1}+\mathbf{U}{\mathbf{M}}_{1}.$$

An important point is that the rows of **UM**_{1} are still independent, now distributed as
${N}_{r}(\mathbf{0},{\mathbf{M}}_{1}^{\prime}\mathbf{\Sigma}{\mathbf{M}}_{1})$. So we may simply apply the above analysis, replacing **Y**, **U** and **B** by **YM**_{1}, **UM**_{1} and **BM**_{1} respectively. In particular, the greatest root statistic now has null distribution given by

$${\theta}_{1}\sim \theta (r,n-q,g).$$

Analyses involving the four multivariate tests are provided in a number of SAS routines, such as GLM and CANCORR. The parameterization used here can be translated into that used in SAS by means of the documentation given in the SAS/STAT Users Guide—we refer to the section on Multivariate Tests in version 9.1, page 48 ff. The linear hypotheses correspond to MKB notation via

| MKB | SAS |
|---|---|
| **C**_{1} | L |
| **B** | β |
| **M**_{1} | M |

while the parameters of the greatest root distribution are given by

| | MKB | | SAS | |
|---|---|---|---|---|
| dimension | *r* | rank(**M**_{1}) | rank(M) | p |
| hypothesis | *g* | rank(**C**_{1}) | rank(L) | q |
| error | *n* − *q* | | | v |

(Note: we use sans serif font for the SAS parameters!) Finally, the SAS printouts use the following parameters:

$$\begin{array}{l}\text{s}=\text{p}\wedge \text{q},\\ \text{m}=(\mid \text{p}-\text{q}\mid -1)/2,\\ \text{n}=(\text{v}-\text{p}-1)/2.\end{array}$$
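In code, this translation from the SAS quantities to the table parameters is a one-liner (a small sketch; the function name is ours). For the rootstock MANOVA of the introduction, p = 4 response variables, q = 5 hypothesis degrees of freedom (six rootstocks minus one) and v = 42 error degrees of freedom recover (s, m, n) = (4, 0, 18.5):

```python
def sas_table_params(p, q, v):
    """Greatest-root table parameters (s, m, n) from SAS's
    p = rank(M), q = rank(L) and v = error degrees of freedom."""
    return min(p, q), (abs(p - q) - 1) / 2, (v - p - 1) / 2

# rootstock MANOVA: p = 4, q = 5, v = 42
s, m, n = sas_table_params(4, 5, 42)  # (4, 0.0, 18.5)
```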

We have described the Tracy–Widom approximation to the null distribution for the largest root test for a variety of classical multivariate procedures. These procedures exhibit varying degrees of sensitivity to the assumption of normality, independence etc. Documenting the sensitivity/robustness of the T–W approximation is clearly an important issue for further work. Two brief remarks can be made. In the corresponding single Wishart setting [e.g., Johnstone (2001)], the largest eigenvalue can be shown, under the null distribution, to still have the T–W limit if the original data have “light tails” (i.e., sub-Gaussian) [see Soshnikov (2002); Péché (2009)]. In the double Wishart settings, simulations for canonical correlation analysis with *n* = 100 samples on *q* = 20 and *p* = 10 variables, each following i.i.d. *t*_{(5)} or i.i.d. random sign distributions, showed that the T–W distribution for the leading correlation
${r}_{1}^{2}$ still holds in the central 99% of the distribution.

William Chen graciously provided an electronic copy of his tables for the distribution of the largest root.

If **Σ** satisfies *H*_{*s*}, there are at most *s* nonzero population canonical correlations. Form the QR decompositions

$$\mathbf{X}={Q}_{X}{R}_{X},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathbf{Y}={Q}_{Y}{R}_{Y}.$$

Let
$C={Q}_{X}^{T}{Q}_{Y}$ and form the SVD *C* = *URV*^{*T*}. Then the diagonal elements of *R* are the sample canonical correlations *r*_{1} ≥ *r*_{2} ≥ ⋯.

Now consider the reduced *n* × (*q* − *s*) matrix **X**^{−} obtained by dropping the last *s* columns from **X**. Form the QR decomposition **X**^{−} = *Q*_{X−}*R*_{X−}. From the nature of the decomposition, we have *Q*_{*X*} = [*Q*_{X−} *Q*_{2}] for some *n* × *s* matrix *Q*_{2}, and setting ${C}_{-}={Q}_{{X}^{-}}^{T}{Q}_{Y}$, the interlacing property of singular values gives

$${\sigma}_{s+1}(C)\le {\sigma}_{1}({C}_{-}).$$

Indeed, our earlier discussion implies that **X**^{−} and **Y** are independent, and so *σ*_{1}(*C*_{−}) has the null distribution $\mathcal{L}({r}_{1}\mid p,q-s,n;\mathbf{I})$.

^{1}Supported in part by NSF DMS-05-05303, 09-06812 and NIH RO1 EB 001988.

^{2}This (actually approximate) value is obtained by interpolation from Koev’s function pmaxeigjacobi which only handles integer values of n.

- Anderson TW. An Introduction to Multivariate Statistical Analysis. 3. Wiley; Hoboken, NJ: 2003. MR1990662.
- Andrews DF, Herzberg AM. Data. Springer; New York: 1985.
- Chen WW. Some new tables of the largest root of a matrix in multivariate analysis: A computer approach from 2 to 6. Presented at the 2002 American Statistical Association.2002.
- Chen WW. Table for upper percentage points of the largest root of a determinantal equation with five roots. InterStat. 2003. Available at interstat.statjournals.net.
- Chen WW. The new table for upper percentage points of the largest root of a determinantal equation with seven roots. InterStat. 2004a. Available at interstat.statjournals.net.
- Chen WW. Some new tables for the upper probability points of the largest root of a determinantal equation with seven and eight roots. In: Dalton J, Kilss B, editors. Special Studies in Federal Tax Statistics. Statistics of Income Division, Internal Revenue Service; 2004b. pp. 113–116.
- Constantine AG. Some non-central distribution problems in multivariate analysis. Ann Math Statist. 1963;34:1270–1285. MR0181056.
- Davis AW. On the marginal distributions of the latent roots of the multivariate beta matrix. Ann Math Statist. 1972;43:1664–1670. MR0343465.
- Foster FG. Upper percentage points of the generalized Beta distribution. II. Biometrika. 1957;44:441–453. MR0090199.
- Foster FG. Upper percentage points of the generalized Beta distribution. III. Biometrika. 1958;45:492–503. MR0100366.
- Foster FG, Rees DH. Upper percentage points of the generalized Beta distribution. I. Biometrika. 1957;44:237–247. MR0086462.
- Golub GH, Van Loan CF. Matrix Computations. 3. Johns Hopkins Univ. Press; Baltimore: 1996. MR1417720.
- Heck DL. Charts of some upper percentage points of the distribution of the largest characteristic root. Ann Math Statist. 1960;31:625–642. MR0119301.
- Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. 6. Pearson Prentice Hall; Upper Saddle River, NJ: 2002.
- Johnstone IM. On the distribution of the largest eigenvalue in principal components analysis. Ann Statist. 2001;29:295–327. MR1863961.
- Johnstone IM. Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence. Ann Statist. 2008;36:2638–2716. MR2485010. [PMC free article] [PubMed]
- Johnstone IM, Chen WW. Finite sample accuracy of Tracy–Widom approximation for multivariate analysis. 2007 JSM Proceedings; Alexandria, VA: Amer. Statist. Assoc; 2007. pp. 1161–1166.
- Johnstone IM, Ma Z, Perry PO, Shahram M. RMTstat: Distributions, statistics and tests derived from random matrix theory. 2010 Manuscript in preparation.
- Koev P. Computing multivariate statistics. 2010 Manuscript in preparation.
- Koev P, Edelman A. The efficient evaluation of the hypergeometric function of a matrix argument. Math Comp. 2006;75:833–846. (electronic). MR2196994.
- Krishnaiah PR. Computations of some multivariate distributions. In: Krishnaiah PR, editor. Handbook of Statistics, Volume 1—Analysis of Variance. North-Holland; Amsterdam: 1980. pp. 745–971.
- Lutz JG. A Turbo Pascal unit for approximating the cumulative distribution function of Roy’s largest root criterion. Educational and Psychological Measurement. 1992;52:899–904.
- Lutz JG. Roy table: A program for generating tables of critical values for Roy’s largest root criterion. Educational and Psychological Measurement. 2000;60:644–647.
- Mardia KV, Kent JT, Bibby JM. Multivariate Analysis. Academic Press; London: 1979. MR0560319.
- Morrison DF. Multivariate Statistical Methods. 4. Thomson; Belmont, CA: 2005.
- Muirhead RJ. Aspects of Multivariate Statistical Theory. Wiley; New York: 1982. MR0652932.
- Nanda DN. Distribution of a root of a determinantal equation. Ann Math Statist. 1948;19:47–57. MR0024106.
- Nanda DN. Probability distribution tables of the largest root of a determinantal equation with two roots. J Indian Soc Agricultural Statist. 1951;3:175–177. MR0044788.
- Péché S. Universality results for largest eigenvalues of some sample covariance matrices ensembles. Probab Theory Related Fields. 2009;143:481–516.
- Pillai KCS. Some new test criteria in multivariate analysis. Ann Math Statist. 1955;26:117–121. MR0067429.
- Pillai KCS. On the distribution of the largest or smallest root of a matrix in multivariate analysis. Biometrika. 1956a;43:122–127. MR0077052.
- Pillai KCS. Some results useful in multivariate analysis. Ann Math Statist. 1956b;27:1106–1114. MR0081852.
- Pillai KCS. Concise Tables for Statisticians. The Statistical Center, Univ. of the Philippines; Manila: 1957.
- Pillai KCS. On the distribution of the largest characteristic root of a matrix in multivariate analysis. Biometrika. 1965;52:405–414. MR0205391.
- Pillai KCS. Upper percentage points of the largest root of a matrix in multivariate analysis. Biometrika. 1967;54:189–194. MR0215433. [PubMed]
- Pillai KCS, Bantegui CG. On the distribution of the largest of six roots of a matrix in multivariate analysis. Biometrika. 1959;46:237–240. MR0102152.
- Pillai KCS, Flury BN. Percentage points of the largest characteristic root of the multivariate beta matrix. Commun Statist Part A. 1984;13:2199–2237. MR0754832.
- Rencher AC. Methods of Multivariate Analysis. 2. Wiley; New York: 2002. MR1885894.
- Soshnikov A. A note on universality of the distribution of the largest eigenvalues in certain classes of sample covariance matrices. J Statist Phys. 2002;108:1033–1056. MR1933444.
- Tracy CA, Widom H. On orthogonal and symplectic matrix ensembles. Commun Math Phys. 1996;177:727–754. MR1385083.
- Waugh FV. Regressions between sets of variables. Econometrica. 1942;10:290–310.
