Ann Appl Stat. Author manuscript; available in PMC 2010 June 18.

Published in final edited form as:

Ann Appl Stat. 2009 January 1; 3(3): 922–942.

doi: 10.1214/09-AOAS236 | PMCID: PMC2887702

NIHMSID: NIHMS93108

Stanford University


Having observed an *m* × *n* matrix *X* whose rows are possibly correlated, we wish to test the hypothesis that the columns are independent of each other. Our motivation comes from microarray studies, where the rows of *X* record expression levels for *m* different genes, often highly correlated, while the columns represent *n* individual microarrays, presumably obtained independently. The presumption of independence underlies all the familiar permutation, cross-validation, and bootstrap methods for microarray analysis, so it is important to know when independence fails. We develop nonparametric and normal-theory testing methods. The row and column correlations of *X* interact with each other in a way that complicates test procedures, essentially by reducing the accuracy of the relevant estimators.

The formal statistical problem considered here can be stated simply: having observed an *m* × *n* data matrix *X* with possibly correlated rows, test the hypothesis that the columns are independent of each other. Relationships between the row correlations and column correlations of *X* complicate the problem’s solution.

Why are we interested in column-wise independence? The motivation in this paper comes from microarray studies, where *X* is a matrix of expression levels for *m* genes on *n* microarrays. In the “Cardio” study I will use for illustration there are *m* = 20426 genes each measured on *n* = 63 arrays, with the microarrays corresponding to 63 subjects, 44 healthy controls and 19 cardiovascular patients^{1}. We expect the gene expressions to be correlated, inducing substantial correlations *within* each column (Owen, 2005; Efron, 2007a; Qiu, Brooks, Klebanov and Yakovlev, 2005a), but most of the standard analysis techniques begin with an assumption of independence *across* microarrays, that is, across the columns of *X*. This can be a risky assumption: all of the familiar permutation, cross-validation and bootstrap methods for microarray analysis, such as the popular SAM program of Tusher, Tibshirani and Chu (2001), depend on column-wise independence of *X*; dependence can invalidate the usual choice of a null hypothesis, as discussed next, leading to flawed assessments of significance.

An immediate purpose of the Cardio study is to identify genes involved in the disease process. For gene *i* we compute the two-sample *t*-statistic “*t _{i}*” comparing sick versus healthy subjects. It will be convenient for discussion to convert these to

$${z}_{i}={\Phi}^{-1}\left({F}_{61}\left({t}_{i}\right)\right)\phantom{\rule{1em}{0ex}}i=1,2,\dots ,m,$$

(1.1)

with Φ and *F*_{61} the cumulative distribution functions (cdf) of the standard normal and *t*_{61} distributions; under the usual assumptions, *z _{i}* will have a standard *N*(0, 1) distribution for any gene showing no difference between the sick and healthy subjects.
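As a quick computational sketch (not from the paper; a minimal Python/scipy version of transformation (1.1)):

```python
import numpy as np
from scipy import stats

def t_to_z(t, df):
    """z_i = Phi^{-1}(F_df(t_i)), as in (1.1): maps t-statistics to
    z-values that are N(0, 1) under the usual null assumptions."""
    return stats.norm.ppf(stats.t.cdf(t, df))

# For the Cardio setup, df = 61 (n = 63 arrays, two-sample t-test).
z = t_to_z(np.array([0.0, 2.0, -1.5]), df=61)
```

Because the *t*_{61} distribution has slightly heavier tails than the normal, a null *t*-statistic of 2.0 maps to a *z*-value a bit below 2.0.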

The left panel of Figure 1 shows the histogram of all 20426 *z _{i}* values, which is seen to be much wider than the theoretical *N*(0, 1) null distribution (an empirical null fit gives *N*(.03, 1.57^{2}), as discussed at the end of Section 2).

The right panel of Figure 1 seems to offer a “smoking gun” for correlation: the scattergram of expression levels for microarrays 31 and 32 looks strikingly correlated, with sample correlation coefficient .805. Here *X* has been standardized by subtraction of its row means, so the effect is not due to so-called ecological correlations. (*X* is actually “doubly standardized,” as defined in Section 2.) Nevertheless the question of whether or not correlation .805 is significantly positive turns out to be surprisingly close, as discussed in Section 4, because the row-wise correlations in *X* drastically reduce the degrees of freedom for the scatterplot. Despite the massive appearance of 20426 points, the scattergram’s accuracy is no more than would be given by 17 independent bivariate normal pairs.

Answering the title’s question, that is, testing for column-wise independence in the presence of row-wise dependence, has both easy and difficult aspects. Section 2 introduces a class of simple permutation tests which, in the case of the Cardio data, clearly discredit column-wise independence. However these tests depend on the ordering of the *n* columns, and can’t be used if the initial order is lost. It is natural and desirable to look for test statistics of column-wise independence that are invariant under permutation of the columns. Classical multivariate analysis, as in Anderson (2003), develops column independence tests in terms of the eigenvalues of an *n* by *n* Wishart matrix. However, this theory depends on the assumption of row-wise independence, disqualifying it for use here.

Sections 3 through 5 consider more general classes of independence tests, both from nonpara-metric and normal theory points of view. The theorem in Section 3 illustrates a key difficulty: correlation between the rows of *X* (ruled out in the classic theory) can give a misleading appearance of column-wise dependence. Similarly, row-wise dependence can greatly degrade the accuracy of the usual *n* × *n* sample covariance matrix of the columns, as shown by the theorem in Section 4. Various non-permutation normal-theory tests are discussed in Section 5, some promising, but with difficulties seen for all of them. The paper ends in Section 6 with a collection of remarks and details.

Simple permutation tests can provide strong evidence against column-wise independence, as we will see for the Cardio data. Our main example concerns the 44 healthy subjects, where *X* is now an *m* × *n* matrix with *m* = 20426 and *n* = 44. For convenience we assume that *X* has been “demeaned” by the subtraction of row and column means, giving

$$\sum _{i}{x}_{ij}=\sum _{j}{x}_{ij}=0\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}i=1,2,\dots ,m\phantom{\rule{thickmathspace}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}j=1,2,\dots ,n.$$

(2.1)

Our numerical results go further and assume “double standardization”: that in addition to (2.1),

$$\sum _{j}{x}_{ij}^{2}=n\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\sum _{i}{x}_{ij}^{2}=m\phantom{\rule{1em}{0ex}}for\phantom{\rule{thickmathspace}{0ex}}i=1,\dots ,m\phantom{\rule{thickmathspace}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}j=1,\dots ,n,$$

(2.2)

i.e., that each row and column of *X* has mean 0 and variance 1; see Remark 6.4 in Section 6.
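Double standardization can be carried out by alternately standardizing rows and columns until both sets of constraints (2.1)-(2.2) hold; the alternation scheme below is our illustrative assumption (see Remark 6.4 for the paper's own discussion), sketched in Python/numpy:

```python
import numpy as np

def double_standardize(X, n_iter=50):
    """Alternately standardize rows and columns of X so that each row and
    each column has mean 0 and variance 1, as in (2.1)-(2.2).
    np.std's divisor-n convention matches sum_j x_ij^2 = n exactly."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
        X = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
    return X

rng = np.random.default_rng(0)
X = double_standardize(rng.normal(size=(200, 20)))
```

For generic data the alternation converges rapidly, although convergence in general is the assumption being made here.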

Let $\widehat{\Delta}$ be the familiar estimate of the *n* × *n* covariance matrix between the columns of *X*,

$$\widehat{\Delta}=\left({X}^{\prime}X\right)\u2215m,$$

(2.3)

Under double standardization, $\widehat{\Delta}$ is actually the sample correlation matrix, which we expect to be near the identity matrix *I _{n}* under column-wise independence. Also let *v*_{1} denote the first eigenvector of $\widehat{\Delta}$, the one corresponding to its largest eigenvalue.

Let *S(v*_{1}) be a statistic that measures structure, for instance a linear regression of *v*_{1} versus array index. Comparing *S(v*_{1}) with a set of permuted values

$$\left\{{S}^{\ast l}=S\left({v}^{\ast l}\right),\phantom{\rule{1em}{0ex}}l=1,2,\dots ,L\right\},$$

(2.4)

with *v*^{*l} a random permutation of the components of *v*_{1}, then provides a permutation test of the null hypothesis that the columns of *X* are independent and identically distributed.

Permutation testing was applied to *v*_{1} for the Cardio data, using the “block” statistic

$$S\left({v}_{1}\right)={v}_{1}^{\prime}B{v}_{1},$$

(2.5)

where *B* is the *n* × *n* matrix

$$B=\sum _{h}{\beta}_{h}{\beta}_{h}^{\prime}.$$

(2.6)

The sum in (2.6) is over all vectors *β _{h}* of the form

$${\beta}_{h}=(0,0,\dots ,0,1,1,\dots ,1,0,0,\dots ,0).$$

(2.7)

with the 1s forming blocks of length between 2 and 10 inclusive. A heuristic rationale for block testing appears below; intuitively, microarray experiments are prone to block disturbances because of the way they are developed and read; see Callow et al. (2000). After *L* = 5000 permutations, only three *S** values exceeded the actual value *S(v*_{1}), *p*-value .0006, yielding strong evidence against the i.i.d. null hypothesis.
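A hypothetical numpy implementation of the block permutation test (2.5)-(2.7) might look like the following; the p-value convention (fraction of *S** values at or above *S*) matches the 3/5000 = .0006 calculation above:

```python
import numpy as np

def block_matrix(n, min_len=2, max_len=10):
    """B = sum over h of beta_h beta_h', (2.6), where each beta_h is a 0/1
    vector whose 1s form a contiguous block of length 2..10, (2.7)."""
    B = np.zeros((n, n))
    for length in range(min_len, max_len + 1):
        for start in range(n - length + 1):
            beta = np.zeros(n)
            beta[start:start + length] = 1.0
            B += np.outer(beta, beta)
    return B

def block_perm_test(v1, L=5000, seed=0):
    """Permutation p-value for the block statistic S(v1) = v1' B v1, (2.5)."""
    rng = np.random.default_rng(seed)
    B = block_matrix(len(v1))
    S = v1 @ B @ v1
    S_star = np.empty(L)
    for l in range(L):
        p = rng.permutation(v1)       # v*^l: permuted components of v1
        S_star[l] = p @ B @ p
    return np.mean(S_star >= S)
```

A vector whose components show a strong trend or block pattern produces a large *S* relative to its permutations, hence a small p-value.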

Figure 2 pertains to a microarray prostate cancer study (Singh et al., 2002) discussed in Efron (2008): *m* = 6033 genes were measured on each of *n* = 102 men, 50 healthy controls and 52 prostate cancer patients. The right panel plots first eigenvectors for $\widehat{\Delta}$, (2.3), computed separately for the healthy controls and the cancer patients (the two matrices being individually doubly standardized). Both vectors increase almost linearly from left to right. Taking *S(v*_{1}) as the linear regression of *v*_{1} versus array number, permutation testing overwhelmingly rejected the i.i.d. null hypothesis, as it also did using the block test. The prostate study appears as a favorable example of microarray technology in Efron (2008). Nevertheless, Figure 2 indicates a systematic drift in the expression level readings as the study progressed. Some genes drift up, others down (the average drift equaling 0 because of standardization), inducing a small amount of column-wise correlation.

Section 5 discusses models for *X* where the *n* × *n* column covariance matrix is of the “single degree of freedom” form

$$\Delta =I+\lambda \beta {\beta}^{\prime}$$

(2.8)

for some known fixed vector *β*, the null hypothesis of column-wise independence being *H*_{0} : λ = 0. An obvious choice of test statistic in this situation is

$${S}_{\beta}={\beta}^{\prime}\left(\widehat{\Delta}-I\right)\beta ,$$

(2.9)

a monotone increasing function of ${\beta}^{\prime}\widehat{\Delta}\beta $. If *β* is unknown we can replace *S _{β}* with

$${S}_{B}=\sum _{h=1}^{H}{\beta}_{h}^{\prime}\widehat{\Delta}{\beta}_{h}=\mathrm{tr}\left(\widehat{\Delta}\sum _{h}{\beta}_{h}{\beta}_{h}^{\prime}\right)\equiv \mathrm{tr}\left(\widehat{\Delta}\mathrm{B}\right),$$

(2.10)

where {*β*_{1}, *β*_{2}, … , *β _{H}*} is a catalog of “likely prospects” as in (2.7).

Permutation test statistics such as (2.5) can be motivated from the singular value decomposition (SVD) of *X*,

$$\underset{m\times n}{X}=\underset{m\times K}{U}\underset{K\times K}{d}\underset{K\times n}{{V}^{\prime}},$$

(2.11)

where *K* is the rank, *d* the diagonal matrix of ordered singular values, and *U* and *V* orthonormal matrices of sizes *m* × *K* and *n* × *K*,

$${U}^{\prime}U={V}^{\prime}V={I}_{K},$$

(2.12)

*I _{K}* the *K* × *K* identity matrix. The squared singular values

$${e}_{1}\ge {e}_{2}\ge \dots \ge {e}_{K}>0\phantom{\rule{1em}{0ex}}({e}_{k}={d}_{k}^{2})$$

(2.13)

are the eigenvalues of *X′X* = *Vd*^{2}*V*′.

*S _{B}* in (2.10) can now be written as

$${S}_{B}=\sum _{j=1}^{K}\frac{{e}_{j}}{m}\left({v}_{j}^{\prime}B{v}_{j}\right).$$

(2.14)

Model (2.8) suggests that most of the information against the null hypothesis *H*_{0} of independence lies in the first eigenvector *v*_{1}, getting us back to test statistic $S\left({v}_{1}\right)={v}_{1}^{\prime}B{v}_{1}$ as in (2.5).
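The identity behind (2.14) is easy to verify numerically; the following sketch (our construction, with an arbitrary symmetric *B* standing in for (2.6)) checks tr(Δ̂*B*) against the eigenvector expansion:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 60, 8
X = rng.normal(size=(m, n))
B = rng.normal(size=(n, n))
B = B @ B.T                                  # any symmetric matrix B

Delta_hat = X.T @ X / m                      # (2.3)
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U d V', (2.11)
e = d ** 2                                   # eigenvalues of X'X, (2.13)

S_direct = np.trace(Delta_hat @ B)           # S_B = tr(Delta_hat B), (2.10)
S_svd = sum(e[j] / m * (Vt[j] @ B @ Vt[j]) for j in range(len(e)))  # (2.14)
```

The two computations agree because tr(*X*′*XB*)/*m* = tr(*Vd*^{2}*V*′*B*)/*m* = Σ (*e _{j}*/*m*) *v _{j}*′*Bv _{j}*.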

What should the statistician do if column-wise independence is strongly rejected, as in the Cardio example? Use of an empirical null rather than a permutation or theoretical null, *N* (.03, 1.57^{2}) rather than *N*(0, 1) in Figure 1, removes the reliance on column-wise independence for hypothesis testing methods such as False Discovery Rates, at the expense of increased variability. Efron (2008) discusses these points.

Two objections can be raised to our permutation tests: (1) they are really testing i.i.d., not independence; (2) non-independence might not manifest itself in the order of *v*_{1} (particularly if the order of the microarrays has been shuffled in some unknown way).

Column-wise standardization makes the column distributions more similar, mitigating objection (1). Going further, “quantile standardization” — say replacing each column’s entries by normal scores (Bolstad, Irizarry, Åstrand and Speed, 2003) — makes the marginals exactly the same. The Cardio data was reanalyzed using normal scores, with almost identical results.

Objection (2) is more worrisome from the point of view of statistical power. The order in which the arrays were obtained *should* be available to the statistician, and should be analyzed to expose possible trends like those in Figure 2^{2}. It would be desirable, nevertheless, to have independence tests that do not depend on order — that is, test statistics invariant under column-wise permutations. The remainder of this paper concerns both the possibilities and difficulties in the development of “non-permutation” tests.

There is an interesting relationship between the row and column correlations of the matrix *X*, which complicates the question of column-wise independence. For the notation of this section define the *n* × *n* matrix of sample covariances between the columns of *X* as

$$\widehat{\mathit{Cov}}={X}^{\prime}X\u2215m,$$

(3.1)

called $\widehat{\Delta}$ in Section 2, and likewise

$$\widehat{\mathit{cov}}=X{X}^{\prime}\u2215n,$$

(3.2)

for the *m* × *m* matrix of row-wise sample covariances (having more than 400,000,000 entries in the Cardio example!).

**Theorem 1**. *If X has row and column means* 0, (2.1), *then the n*^{2} *entries of* $\widehat{\mathit{Cov}}$ *have empirical mean* 0 *and variance c*_{2},

$${c}_{2}=\sum _{k=1}^{K}{e}_{k}^{2}\u2215{\left(\mathit{mn}\right)}^{2},$$

(3.3)

*with e _{k} the eigenvalues* (2.13).

*Proof*. The entries of $\widehat{\mathit{Cov}}$ sum to zero,

$${1}_{n}^{\prime}{X}^{\prime}X{1}_{n}\u2215m=0,$$

(3.4)

according to (2.1), while the mean of squared entries is

$$\frac{\sum _{j=1}^{n}\sum _{{j}^{\prime}=1}^{n}{\widehat{\mathit{Cov}}}_{j{j}^{\prime}}^{2}}{{n}^{2}}=\frac{\mathrm{tr}\left({\left({\mathrm{X}}^{\prime}\mathrm{X}\right)}^{2}\right)}{{m}^{2}{n}^{2}}=\frac{\mathrm{tr}\left({\mathrm{V}}^{\prime}{d}^{4}\mathrm{V}\right)}{{m}^{2}{n}^{2}}={c}_{2}.$$

(3.5)

Replacing *X′X* with *XX*′ yields the same results for the row covariances $\widehat{\mathit{cov}}$.
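Theorem 1 can be checked numerically on simulated data; this sketch (ours, not the paper's) demeans rows and columns as in (2.1) and compares both mean-square values to *c*_{2} of (3.3):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 10
X = rng.normal(size=(m, n))
# impose row and column means 0, (2.1): one row pass then one column pass
X -= X.mean(axis=1, keepdims=True)
X -= X.mean(axis=0, keepdims=True)

Cov_cols = X.T @ X / m                          # column covariances, (3.1)
cov_rows = X @ X.T / n                          # row covariances, (3.2)
e = np.linalg.svd(X, compute_uv=False) ** 2     # eigenvalues e_k, (2.13)
c2 = np.sum(e ** 2) / (m * n) ** 2              # (3.3)
```

Both covariance matrices have empirical mean 0 and mean-square *c*_{2}, exactly as the theorem asserts.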

Under double standardization (2.1)-(2.2), the covariances become sample correlations, say $\widehat{\mathit{Cor}}$ for the columns and $\widehat{\mathit{cor}}$ for the rows. Theorem 1 has a surprising consequence: whether or not the columns of *X* are independent, the column sample correlations will have the same mean and variance as the row correlations. In other words, substantial row-wise correlation can induce the appearance of column-wise correlation.

Figure 3 concerns the 44 healthy subjects in the Cardio study, with *X* an (*m, n*) = (20426, 44) doubly standardized matrix. All 44^{2} column correlations are shown by the solid histogram, while the line histogram is a random sample of 10,000 row correlations. Here *c*_{2} = .283^{2}, so according to the Theorem both histograms have mean 0 and standard deviation .283.

The 44 diagonal elements of $\widehat{\mathit{Cor}}$ protrude as a prominent spike at 1. (We can’t see the corresponding spike of 20426 diagonal elements for the row correlation matrix $\widehat{\mathit{cor}}$ because they form such a small fraction of all 20426^{2} entries.) It is easy to remove the diagonal 1’s from consideration.

**Corollary.** *In the doubly standardized situation, the off-diagonal elements of the column correlation matrix* $\widehat{\mathit{Cor}}$ *have empirical mean and variance*

$$\widehat{\mu}=-\frac{1}{n-1}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\widehat{\alpha}}^{2}=\frac{n}{n-1}\left({c}_{2}-\frac{1}{n-1}\right).$$

(3.6)

For *n* = 44 and *c*_{2} = .283^{2} this gives

$$(\widehat{\mu},{\widehat{\alpha}}^{2})=(-.023,{.241}^{2}).$$

(3.7)


The corresponding diagonal-removing corrections for the row correlations (replacing *n* by *m* in (3.6)) are negligible for *m* = 20426. However *c*_{2} overestimates the variance of the row correlations for another reason: with only 44 points available to estimate each correlation, estimation error adds a considerable component of variance to the $\widehat{\mathit{cor}}$ histogram in the left panel, as discussed next.

Suppose now that the columns of *X* are in fact independent, in which case the substantial column correlations seen in Figure 3 must actually be induced by row correlations, via Theorem 1. Let *cor _{ii′}* indicate the true correlation between rows *i* and *i*′, and define the *total correlation* α as the root mean square of these values,

$${\alpha}^{2}=\sum _{i<{i}^{\prime}}co{r}_{i{i}^{\prime}}^{2}/\left(\begin{array}{c}\hfill m\hfill \\ \hfill 2\hfill \end{array}\right).$$

(3.8)


Remark 6.5 of Section 6 shows that ${\widehat{\alpha}}^{2}$ in (3.6) is an approximately unbiased estimate of α^{2}, assuming column-wise independence. For the Cardio example $\widehat{\alpha}=.241$, similar to the size of the microarray correlation estimates in Efron (2007a), Owen (2005), and Qiu et al. (2005a). Section 4 discusses the crucial role of α in determining the accuracy of estimates based on *X*.
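The Corollary's off-diagonal corrections (3.6) can likewise be verified on a doubly standardized random matrix (the alternating standardization below is our assumption, not part of the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 15))
for _ in range(100):   # alternate row/column standardization to (2.1)-(2.2)
    X = (X - X.mean(1, keepdims=True)) / X.std(1, keepdims=True)
    X = (X - X.mean(0, keepdims=True)) / X.std(0, keepdims=True)

m, n = X.shape
Cor = X.T @ X / m                                   # column correlation matrix
e = np.linalg.svd(X, compute_uv=False) ** 2
c2 = np.sum(e ** 2) / (m * n) ** 2                  # (3.3)

mu_hat = -1 / (n - 1)                               # (3.6)
alpha2_hat = n / (n - 1) * (c2 - 1 / (n - 1))       # (3.6)

off = Cor[~np.eye(n, dtype=bool)]                   # off-diagonal elements
```

The identities are algebraic consequences of (2.1)-(2.2) and Theorem 1, so they hold to machine precision once the standardization has converged.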

The right panel of Figure 3 compares the histogram of the column correlations ${\widehat{\mathit{Cor}}}_{j{j}^{\prime}}$, now excluding cases *j* = *j*′, with the row correlation histogram corrected for sampling overdispersion via the shrinkage factor .241/.283. As predicted by Theorem 1, the similarity is striking. A possible difference lies in the long right tail of the $\widehat{\mathit{Cor}}$ distribution (including ${\widehat{\mathit{Cor}}}_{31,32}$, the case illustrated in Figure 1), whose significance is examined in Section 4.

The results of Sections 2 and 3 were developed nonparametrically. This section concerns multivariate normal theory, afterwards used in Section 5 to draw the connection with classical multivariate independence tests. We consider the *matrix normal* distribution for *X*,

$$X\sim {N}_{m,n}(0,\Sigma \otimes \Delta ),$$

(4.1)

where the Kronecker notation indicates covariance structure

$$\mathrm{cov}({X}_{ij},{X}_{kl})={\Sigma}_{ik}{\Delta}_{jl},$$

(4.2)

Σ the *m* × *m* covariance matrix of the rows and Δ the *n* × *n* covariance matrix of the columns.

Row *x _{i}* of *X* has an *n*-dimensional normal distribution,

$${x}_{i}\sim {N}_{n}(0,{\Sigma}_{ii}\Delta ),$$

(4.3)

(*not* independently across rows unless Σ is diagonal), and likewise column *x _{j}* ~ *N _{m}*(0, Δ_{jj}Σ).

Much of classical multivariate analysis focuses on the situation Σ = *I*, where the rows *x _{i}* are independent replicates,

$${x}_{i}\stackrel{\mathrm{ind}}{\sim}{N}_{n}(0,\Delta )\phantom{\rule{1em}{0ex}}i=1,2,\dots ,m,$$

(4.4)

in which case the sample covariance matrix $\widehat{\Delta}=X\prime X\u2215m$ has a scaled Wishart distribution,

$$\widehat{\Delta}\sim \mathrm{Wishart}(m,\Delta )\u2215m.$$

(4.5)

Distribution (4.5) has first and second moments

$$\underset{n\times n}{\widehat{\Delta}}\sim \left(\underset{n\times n}{\Delta},\underset{{n}^{2}\times {n}^{2}}{{\Delta}^{\left(2\right)}}/m\right)\phantom{\rule{1em}{0ex}}\text{with}\phantom{\rule{thickmathspace}{0ex}}{\Delta}_{jk,lh}^{\left(2\right)}={\Delta}_{jl}{\Delta}_{kh}+{\Delta}_{jh}{\Delta}_{kl}$$

(4.6)

for *j, k, l, h* = 1, 2, … , *n*; see Mardia, Kent and Bibby (1979, p. 92).

Relation (4.6) says that when Σ = *I*, that is, when the rows of *X* are independent, $\widehat{\Delta}$ unbiasedly estimates the column covariance matrix Δ with accuracy proportional to *m*^{-1/2}. Correlation between rows reduces the accuracy of $\widehat{\Delta}$, as shown next.

Returning to the general situation (4.1)-(4.3), define

$$\stackrel{~}{\Delta}={X}^{\prime}{\sigma}^{-2}X\u2215m,$$

(4.7)

where *σ*^{2} is the *m* × *m* diagonal matrix with diagonal entries Σ_{ii}.

**Theorem 2**. *Under model* (4.1), $\stackrel{~}{\Delta}$ *has first and second moments*

$$\stackrel{~}{\Delta}\sim \left(\Delta ,{\Delta}^{\left(2\right)}\u2215\stackrel{~}{m}\right),\phantom{\rule{1em}{0ex}}\stackrel{~}{m}=m\u2215[1+(m-1\left){\alpha}^{2}\right],$$

(4.8)

*where α is the* total correlation *as in* (3.8), *now computed from the true row correlations* ${\stackrel{~}{\Sigma}}_{i{i}^{\prime}}={\Sigma}_{i{i}^{\prime}}/{({\Sigma}_{ii}{\Sigma}_{{i}^{\prime}{i}^{\prime}})}^{1/2}$,

$${\alpha}^{2}=\sum _{i<{i}^{\prime}}{\stackrel{~}{\Sigma}}_{i{i}^{\prime}}^{2}/\left(\begin{array}{c}\hfill m\hfill \\ \hfill 2\hfill \end{array}\right),$$

(4.9)

*and* Δ^{(2) }*is the Wishart covariance* (4.6).

Comparing (4.8) with (4.6), we see that correlation between the rows reduces “effective sample size” from *m* to $\stackrel{~}{m}$ : for α = .241 as in (3.7), the reduction is from *m*=20426 to $\stackrel{~}{m}=17.2!$ (Notice that row standardization effectively makes $\stackrel{~}{\Delta}\stackrel{.}{=}\widehat{\Delta}$ (2.3), justifying the comparison.) The total correlation α shows up in other efficiency calculations; see Remark 6.7.
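The effective sample size formula in (4.8) is a one-liner; plugging in the Cardio values *m* = 20426 and α = .241 reproduces $\stackrel{~}{m}$ ≈ 17.2:

```python
def effective_sample_size(m, alpha):
    """m_tilde = m / [1 + (m - 1) alpha^2], Theorem 2, (4.8)."""
    return m / (1 + (m - 1) * alpha ** 2)

m_tilde = effective_sample_size(20426, 0.241)   # the Cardio values
```

With α = 0 the formula returns *m* itself, recovering the classical independent-rows case.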

*Proof*. The row-standardized matrix $\stackrel{~}{X}={\sigma}^{-1}X$ has matrix normal distribution

$$\stackrel{~}{X}\sim {N}_{m,n}(0,\stackrel{~}{\Sigma}\otimes \Delta ),$$

(4.10)

where $\stackrel{~}{\Sigma}={\sigma}^{-1}\Sigma {\sigma}^{-1}$ has diagonal elements ${\stackrel{~}{\Sigma}}_{ii}=1$. From (4.2) we see that ${\stackrel{~}{\Sigma}}_{i{i}^{\prime}}$ is the correlation between elements *X _{ij}* and *X _{i′j}*. The mean calculation is immediate,

$$E\left\{{\stackrel{~}{\Delta}}_{jk}\right\}={\Delta}_{jk},$$

(4.11)

using (4.2).

The covariance calculation for $\stackrel{~}{\Delta}$ involves expansion

$${\stackrel{~}{\Delta}}_{jk}{\stackrel{~}{\Delta}}_{lh}=\left(\sum _{i}{\stackrel{~}{X}}_{ij}{\stackrel{~}{X}}_{ik}\u2215m\right)\left(\sum _{{i}^{\prime}}{\stackrel{~}{X}}_{{i}^{\prime}l}{\stackrel{~}{X}}_{{i}^{\prime}h}\u2215m\right)$$

(4.12)

$$=\frac{1}{{m}^{2}}\left(\sum _{i}{\stackrel{~}{X}}_{ij}{\stackrel{~}{X}}_{ik}{\stackrel{~}{X}}_{il}{\stackrel{~}{X}}_{ih}+\sum _{i\ne {i}^{\prime}}{\stackrel{~}{X}}_{ij}{\stackrel{~}{X}}_{ik}{\stackrel{~}{X}}_{{i}^{\prime}l}{\stackrel{~}{X}}_{{i}^{\prime}h}\right).$$

(4.13)

Using the formula

$$E\left\{{Z}_{1}{Z}_{2}{Z}_{3}{Z}_{4}\right\}={\gamma}_{12}{\gamma}_{34}+{\gamma}_{13}{\gamma}_{24}+{\gamma}_{14}{\gamma}_{23}$$

(4.14)

for a normal vector (*Z*_{1}*Z*_{2}*Z*_{3}*Z*_{4})^{′} with 0 means and covariances γ_{ij}, (4.2) gives

$$E\left\{\sum _{i}{\stackrel{~}{X}}_{ij}{\stackrel{~}{X}}_{ik}{\stackrel{~}{X}}_{il}{\stackrel{~}{X}}_{ih}\right\}=m[{\Delta}_{jk}{\Delta}_{lh}+{\Delta}_{jl}{\Delta}_{kh}+{\Delta}_{jh}{\Delta}_{kl}]$$

(4.15)

and, using $E\{{\stackrel{~}{X}}_{ij}{\stackrel{~}{X}}_{{i}^{\prime}l}\}={\stackrel{~}{\Sigma}}_{i{i}^{\prime}}{\Delta}_{jl}$ together with the fact that the average of ${\stackrel{~}{\Sigma}}_{i{i}^{\prime}}^{2}$ over *i* ≠ *i*′ equals α^{2},

$$E\left\{\sum _{i\ne {i}^{\prime}}{\stackrel{~}{X}}_{ij}{\stackrel{~}{X}}_{ik}{\stackrel{~}{X}}_{{i}^{\prime}l}{\stackrel{~}{X}}_{{i}^{\prime}h}\right\}=m(m-1)\left[{\Delta}_{jk}{\Delta}_{lh}+{\alpha}^{2}({\Delta}_{jl}{\Delta}_{kh}+{\Delta}_{jh}{\Delta}_{kl})\right].$$

(4.16)

Substituting (4.15) and (4.16) into (4.13) gives

$$E\left\{{\stackrel{~}{\Delta}}_{jk}{\stackrel{~}{\Delta}}_{lh}\right\}={\Delta}_{jk}{\Delta}_{lh}+({\Delta}_{jl}{\Delta}_{kh}+{\Delta}_{jh}{\Delta}_{kl})\left(\frac{1+(m-1){\alpha}^{2}}{m}\right),$$

(4.17)

giving

$$\mathrm{cov}({\stackrel{~}{\Delta}}_{jk},{\stackrel{~}{\Delta}}_{lh})=({\Delta}_{jl}{\Delta}_{kh}+{\Delta}_{jh}{\Delta}_{kl})\u2215\stackrel{~}{m}$$

(4.18)

as in (4.8).
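Theorem 2 can be checked by Monte Carlo. The sketch below (our construction) uses an equicorrelated row covariance Σ = (1 − ρ)*I* + ρ11′, for which every off-diagonal row correlation is ρ and hence α^{2} = ρ^{2}, and compares the simulated variance of an off-diagonal entry of Δ̂ = *X*′*X*/*m* with the predicted 1/$\stackrel{~}{m}$ (take Δ = *I* and *j* ≠ *k* in (4.18)):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, rho = 40, 5, 0.3
alpha2 = rho ** 2                              # all row correlations equal rho
m_tilde = m / (1 + (m - 1) * alpha2)           # (4.8)

Sigma = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
L_chol = np.linalg.cholesky(Sigma)

reps = 4000
offdiag = np.empty(reps)
for r in range(reps):
    X = L_chol @ rng.normal(size=(m, n))       # X ~ N_{m,n}(0, Sigma (x) I)
    offdiag[r] = (X.T @ X / m)[0, 1]           # off-diagonal entry of Delta_hat

var_theory = 1.0 / m_tilde                     # (4.18) with Delta = I, j != k
var_mc = offdiag.var()
```

Under row independence the variance would be 1/*m* = .025; the simulated value is several times larger, in line with the theorem.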

A corollary of Theorem 2, used in Section 5, concerns bilinear functions of Δ and $\stackrel{~}{\Delta}$,

$${\tau}^{2}={w}^{\prime}\Delta w\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\stackrel{~}{\tau}}^{2}={w}^{\prime}\stackrel{~}{\Delta}w,$$

(4.19)

where *w* is a given *n*-vector.

**Corollary.** *Under model* (4.1), ${\stackrel{~}{\tau}}^{2}$ *has mean and variance*

$${\stackrel{~}{\tau}}^{2}\sim ({\tau}^{2},2{\tau}^{4}\u2215\stackrel{~}{m}).$$

(4.20)

The proof follows that for Theorem 2; see Remark 6.9.

If Σ = *I* in (4.1), then $\stackrel{~}{\Delta}=\widehat{\Delta}$ and ${\stackrel{~}{\tau}}^{2}$ has a scaled chi-squared distribution,

$${\stackrel{~}{\tau}}^{2}\sim {\tau}^{2}\cdot {\chi}_{m}^{2}\u2215m,$$

(4.21)

with mean and variance ${\stackrel{~}{\tau}}^{2}\sim ({\tau}^{2},2{\tau}^{4}/m)$, so again the effect of correlation within Σ is to reduce the effective sample size from *m* to $\stackrel{~}{m}$ (4.8).

We can approximate $\stackrel{~}{\Delta}$ (4.7), with

$$\widehat{\Delta}={X}^{\prime}{\widehat{\sigma}}^{-2}X\u2215m,$$

(4.22)

where ${\widehat{\sigma}}_{ii}^{2}$ is an estimate of Σ_{ii} based on the observed variability in row *i*. If the rows of *X* have been standardized, then ${\widehat{\sigma}}_{ii}^{2}=1$ and $\widehat{\Delta}$ returns to its original definition *X*′*X/m*.

Both Theorem 2 and the Corollary encourage us to think of $\widehat{\Delta}$ as having, approximately, a scaled Wishart distribution based on an independent sample of size $\stackrel{~}{m}$,

$$\widehat{\Delta}\stackrel{.}{\sim}\mathrm{Wishart}(\stackrel{~}{m},\Delta )\u2215\stackrel{~}{m}.$$

(4.23)

The dangers of this approximation are discussed in Section 5, but it is, nevertheless, an evocative heuristic, as shown below.

Figure 4 returns to the question of the seemingly overwhelming correlation .805 between arrays 31 and 32 seen in Figure 1. A one-sided *p*-value was calculated for each of the 946 column correlations, using as a null hypothesis the normal theory correlation coefficient distribution based on a sample size of $\stackrel{~}{m}$ = 17.2 pairs of *N*_{2}(0*, I*) points (the correct null if Δ = *I* in (4.23)). Benjamini and Hochberg’s (1995) False Discovery Rate procedure, level *q* = .1, was applied to the 946 *p*-values. This yielded 7 significant cases, those with sample correlation at least .723; all 7 were from the block of arrays 27 to 32 indicated in Figure 2. Correlation .805 does turn out to be significant, but by a much closer margin than Figure 1’s scattergram suggests.
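A sketch of the significance calculation (our code, not the paper's; the *t*-transform of a null sample correlation is the standard normal-theory result, here applied with the effective sample size $\stackrel{~}{m}$ in place of *m*):

```python
import numpy as np
from scipy import stats

def cor_pvalue_onesided(r, n_eff):
    """One-sided p-value for a sample correlation r from n_eff independent
    bivariate normal pairs with true correlation 0:
    t = r sqrt(n_eff - 2) / sqrt(1 - r^2) ~ t_{n_eff - 2}."""
    t = r * np.sqrt(n_eff - 2) / np.sqrt(1 - r ** 2)
    return stats.t.sf(t, n_eff - 2)

def benjamini_hochberg(pvals, q=0.1):
    """Indices declared significant by the BH(q) step-up rule."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    return order[:k]

p_obs = cor_pvalue_onesided(0.805, 17.2)   # the correlation from Figure 1
```

With only 17.2 effective pairs, even *r* = .805 gives a p-value that must then survive a 946-fold multiple-testing correction, which is why the margin is close.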

The Fdr procedure was also applied using the simpler null distribution *N*(−.023, .241^{2}) (3.7). This raised the significance threshold from .723 to .780, removing two of the previously significant correlations.

Theorem 1 showed that the variance of the observed column correlations is useless for testing column-wise independence, since any value at all can be induced by row correlations. The test in Figure 4 avoids this trap by looking for unusual outliers among the column correlations. It does *not* depend on the order of the columns, objection (2) in Section 2 for permutation tests, but pays the price of increased modeling assumptions.

Theorem 2 offers a normal-theory strategy for testing column-wise independence. We begin with *X* ~ *N _{m,n}*(0, Σ ⊗ Δ), (4.1), taking

$${\Sigma}_{ii}=1\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}i=1,2,\dots ,m,$$

(5.1)

as suggested by double standardization. The null hypothesis of column-wise independence is equivalent to the column correlation matrix equaling the identity,

$${H}_{0}:\Delta =I,$$

(5.2)

since then (4.2) says that all pairs in different columns are independent.

To test (5.2), we estimate Δ with $\widehat{\Delta}$, (4.22), or more simply $\widehat{\Delta}=X\prime X/m$ after standardization, and compute a test statistic

$$S=s\left(\widehat{\Delta}\right),$$

(5.3)

where *s*(·) is some measure of distance between $\widehat{\Delta}$ and *I*. The accuracy approximation $\widehat{\Delta}\stackrel{.}{\sim}(\Delta ,{\Delta}^{\left(2\right)}/\stackrel{~}{m})$ from (4.8), with Δ = *I*, is used to assess the significance level of the observed *S*, maybe even employing the more daring approximation $\widehat{\Delta}\stackrel{.}{\sim}\text{Wishart}(\stackrel{~}{m},I)/\stackrel{~}{m}$. Strategy (5.3) looks promising but, as the examples of this section will show, it suffers from serious difficulties that are absent under the classic assumption of independent rows.

One of the difficulties stems from Theorem 1. An obvious test statistic for *H*_{0} : Δ = *I* is

$$S=\sum _{j<{j}^{\prime}}{\widehat{\Delta}}_{j{j}^{\prime}}^{2}/\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right),$$

(5.4)

the average squared off-diagonal element of $\widehat{\Delta}$. But $\widehat{\Delta}=\widehat{\mathit{Cov}}$ (3.1), so in the doubly standardized situation of (3.6), *S* is a monotone increasing function of $\widehat{\alpha}$, the estimated total correlation. This disqualifies *S* as a test statistic for (5.2), since large values of $\widehat{\alpha}$ can always be attributed to row-wise correlation alone.

Similarly, the variance of the eigenvalues (2.13),

$$S=\sum _{k=1}^{K}{({e}_{k}-\stackrel{\u2012}{e})}^{2}/K\phantom{\rule{1em}{0ex}}\left(\stackrel{\u2012}{e}=\sum {e}_{k}/K\right),$$

(5.5)

looks appealing since the true eigenvalues all equal 1 when Δ = *I*. However (5.5) is also a monotonic function of $\widehat{\alpha}$; see Remark 6.1.

The general difficulty here is “leakage,” the fact that row-wise correlations affect the observed pattern of column-wise correlations. This becomes clearer by comparison with classical multivariate methods, where row-wise correlations are assumed away by taking Σ = *I* in (4.1). Johnson and Graybill (1972) consider a two-way ANOVA problem where, after subtraction of main effects, *X* has the form

$${X}_{ij}={a}_{i}{\beta}_{j}+{\u220a}_{ij}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}i=1,2,\dots ,m\phantom{\rule{thickmathspace}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}j=1,2,\dots ,n,$$

(5.6)

with *a _{i}* ~ *N*(0, λ) and ∊_{ij} ~ *N*(0, 1), all mutually independent.

In the Kronecker notation (4.1), *X* ~ *N*_{m,n}(0, *I* ⊗ Δ) with

$$\Delta =I+\lambda \beta {\beta}^{\prime}.$$

(5.7)

Now (5.2) becomes *H*_{0} : λ = 0. Johnson and Graybill show that, with *β* unknown, the likelihood ratio test rejects *H*_{0} for large values of the eigenvalue ratio (2.13),

$$S={e}_{1}/\sum _{k=1}^{K}{e}_{k}.$$

(5.8)

Since the *m* rows of *X* are assumed independent, they can test *H*_{0} by comparison of *S* with values ${S}^{\star}={e}_{1}^{\star}\u2215{\sum}_{k=1}^{K}{e}_{k}^{\star}$ obtained from

$${\widehat{\Delta}}^{\ast}\sim \mathrm{Wishart}(m,I)\u2215m,$$

(5.9)

as in (4.5).

Getting back to the correlated rows situation, Theorem 2 suggests comparing *S* with values *S** from

$${\widehat{\Delta}}^{\ast}\sim \mathrm{Wishart}(\stackrel{~}{m},I)\u2215\stackrel{~}{m},$$

(5.10)

$\stackrel{~}{m}$ as in (4.8). The solid histogram in Figure 5 compares 100 *S** values from (5.10), $\stackrel{~}{m}$ = 17.2 for the Cardio data, with the observed value *S* = .207 from the doubly standardized Cardio matrix for the healthy subjects used in Figure 3. All 100 *S** values are much smaller than *S*, providing strong evidence against *H*_{0} : Δ = *I*.
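Simulating the null eigenratio distribution (5.10) is straightforward; in the sketch below (ours), the non-integer $\stackrel{~}{m}$ = 17.2 is rounded to an integer, a simplification:

```python
import numpy as np

def eigenratio(Delta):
    """S = e_1 / sum(e_k), the Johnson-Graybill statistic (5.8)."""
    e = np.linalg.eigvalsh(Delta)       # ascending; e[-1] is e_1
    return e[-1] / e.sum()

def simulate_null_S(n, m_tilde, reps=100, seed=6):
    """Draw S* values from Delta* ~ Wishart(m_tilde, I) / m_tilde, (5.10)."""
    rng = np.random.default_rng(seed)
    k = int(round(m_tilde))             # rounding non-integer m_tilde
    out = np.empty(reps)
    for r in range(reps):
        G = rng.normal(size=(k, n))     # G'G ~ Wishart(k, I_n)
        out[r] = eigenratio(G.T @ G / k)
    return out

S_star = simulate_null_S(n=44, m_tilde=17.2)
```

With $\stackrel{~}{m}$ < *n* the Wishart matrix is rank deficient, but the eigenratio remains well defined, lying strictly between 1/*n* and 1.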

The evidence looks somewhat weaker, though, if we simulate *S** values with ${\widehat{\Delta}}^{\star}$ obtained from random matrices

$${X}^{\ast}\sim {N}_{m,n}(0,{\Sigma}^{\ast}\otimes I),$$

(5.11)

doubly standardized, where Σ* has total correlation α = .241, the estimated value for *X*, (4.9). The line histogram in Figure 5 shows 100 such *S** values, all still smaller than *S*, but substantially less so. (Remark 6.8 describes the construction of *X**.)

Why does (5.11) produce larger “null” *S** values than (5.10)? The answer is simple: even though the first and second moments of ${\widehat{\Delta}}^{\star}={X}^{\star}\prime {X}^{\star}/m$ match those of ${\widehat{\Delta}}^{\star}$ from (5.10), its eigenvalues do not. The non-zero eigenvalues of *X**′*X**/*m* equal those of *X***X**′/*m*, which are inflated by the row covariance Σ*. This is another example of leakage, where the fact that Σ* in (5.11) is not the identity *I _{m}* distorts the estimated eigenvalues of ${\widehat{\Delta}}^{\star}$ even if Δ = *I _{n}*.

The eigenratio statistic *S* = *e*_{1}/Σ*e _{k}* is invariant under permutations of the columns of *X*, as desired, but leakage makes its null distribution difficult to calibrate without knowledge of Σ.

The bilinear form (4.19)-(4.20) yields another class of test statistics,

$${\widehat{\tau}}^{2}={w}^{\prime}\widehat{\Delta}w\stackrel{.}{\sim}({\tau}^{2},2{\tau}^{4}\u2215\stackrel{~}{m}),$$

(5.12)

where *w* is a pre-chosen *n*-vector and ${\tau}^{2}={w}^{\prime}\Delta w$. Delta-method arguments give $CV\left(\widehat{\tau}\right)\stackrel{.}{=}{\left(2\stackrel{~}{m}\right)}^{-1/2}$ for the coefficient of variation of $\widehat{\tau}$. Defining

$${Z}_{i}={x}_{i}^{\prime}w\phantom{\rule{1em}{0ex}}\left({x}_{i}^{\prime}\phantom{\rule{thickmathspace}{0ex}}\text{the}\phantom{\rule{thickmathspace}{0ex}}i\text{th row of}\phantom{\rule{thickmathspace}{0ex}}X\right),$$

(5.13)

yields the alternative form

$${\widehat{\tau}}^{2}=\sum _{i=1}^{m}{Z}_{i}^{2}\u2215m.$$

(5.14)


In a two-sample situation like that for the Cardio study, sample sizes n_{1} and n_{2}, we can choose

$${w}^{\prime}={\left(\frac{{n}_{1}{n}_{2}}{{n}_{1}+{n}_{2}}\right)}^{1\u22152}(-{1}_{{n}_{1}}\u2215{n}_{1},{1}_{{n}_{2}}\u2215{n}_{2}),$$

(5.15)

“1_{n}” indicating a vector of *n* 1’s. This choice makes

$${Z}_{i}={\left(\frac{{n}_{1}{n}_{2}}{{n}_{1}+{n}_{2}}\right)}^{1\u22152}({\stackrel{\u2012}{x}}_{2i}-{\stackrel{\u2012}{x}}_{1i}),$$

(5.16)

the multiple of the mean response difference between the two samples that has variance 1 if Δ = *I*. In terms of (5.12), *w*′*w* = 1, so τ^{2} = 1 when Δ = *I*.
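A minimal numpy version of the contrast (5.15) and the resulting $\widehat{\tau}$, (5.12)-(5.14); the function names are ours:

```python
import numpy as np

def two_sample_w(n1, n2):
    """Contrast vector w, (5.15); satisfies w'w = 1, so tau^2 = 1 if Delta = I."""
    c = np.sqrt(n1 * n2 / (n1 + n2))
    return c * np.concatenate([-np.ones(n1) / n1, np.ones(n2) / n2])

def tau_hat(X, n1, n2):
    """tau_hat = sqrt(w' Delta_hat w) = sqrt(sum_i Z_i^2 / m), (5.12)-(5.14),
    for a row-standardized m x (n1 + n2) matrix X."""
    w = two_sample_w(n1, n2)
    Z = X @ w                      # Z_i = x_i' w, (5.13)
    return np.sqrt(np.mean(Z ** 2))

w = two_sample_w(44, 19)           # the Cardio sample sizes
```

The quadratic form *w*′Δ̂*w* and the *Z _{i}* average in (5.14) are the same number computed two ways.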

For the Cardio study, with *n*_{1} = 44, *n*_{2} = 19, and $\stackrel{~}{m}$ = 17.2, we obtain $\widehat{\tau}=1.48$, coefficient of variation 0.17. This puts $\widehat{\tau}$ more than 2.8 standard errors above the null hypothesis value τ = 1, again providing evidence against column-wise independence. The *Z _{i}* values from (5.16) are nearly indistinguishable from the *z _{i}* values (1.1) pictured in Figure 1.

Once again, however, there are difficulties with this as a test for column-wise independence. There is no question that the *Z _{i}*’s are overdispersed compared to the theoretical value τ = 1. But problems other than column dependence can cause overdispersion, in particular unobserved covariate differences between subjects in the two samples (Efron, 2004, 2008).

The statistic $S = w'\widehat{\Delta}w$ in (5.15) does not depend upon the order of the columns of *X* within each of the two samples, answering objection (2) against permutation tests, but it is the only such choice for a two-sample situation. Other *w*’s might yield interesting results. The version of (5.15) comparing the first 22 healthy Cardio subjects with the second 22 provided the spectacular value $\widehat{\tau} = 1.87$, and here the “unobserved covariate” objection has less force.

Now, however, the test statistic depends on the order of the columns within the healthy subjects’ matrix, reviving objection (2). Again we might want to check a catalog of possible *w* vectors *w*_{1}, *w*_{2}, …, *w _{H}*, leading back to test statistic

$$S_{B} = \sum_{h} w_{h}'\widehat{\Delta}w_{h} = \mathrm{tr}\left(\widehat{\Delta}B\right) \qquad \left(B = \sum_{h} w_{h}w_{h}'\right)$$

(5.17)

as in (2.10), the only difference being that the null distribution of $\widehat{\Delta}$ now involves normal theory rather than permutations. Remark 6.9 shows that the null first and second moments of *S _{B}* are similar to (5.12),

$$S_{B} \underset{H_{0}}{\sim} \left(\mathrm{tr}(B),\; \frac{2}{\tilde{m}}\,\mathrm{tr}(B^{2})\right).$$

(5.18)
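A minimal sketch of the catalog statistic (5.17) and its null standardization (5.18). The data are synthetic (independent columns, so `m_tilde = m`), and the catalog `W` of unit-norm contrast vectors is a hypothetical stand-in for whatever *w*_{1}, …, *w*_{H} one wishes to check:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, H = 3000, 20, 5
X = rng.standard_normal((m, n))            # null case: independent columns

Delta_hat = X.T @ X / m                    # estimate of the column covariance Delta

# Hypothetical catalog of H unit-norm contrast vectors w_1, ..., w_H.
W = rng.standard_normal((n, H))
W /= np.linalg.norm(W, axis=0)

B = W @ W.T                                # B = sum_h w_h w_h', as in (5.17)
S_B = np.trace(Delta_hat @ B)              # S_B = tr(Delta_hat B)

# Null moments from (5.18).
m_tilde = m
mean0 = np.trace(B)                        # equals H for unit-norm w_h's
var0 = 2.0 * np.trace(B @ B) / m_tilde
z = (S_B - mean0) / np.sqrt(var0)
```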


In summary, normal-theory methods are interesting and promising, but are not yet proven competitors for the permutation tests of Section 2.

This section presents some brief remarks and details supplementing the previous material.

**Remark 6.1.** *The constant c*_{2} The variance constant *c*_{2} in Theorem 1, (3.3), can be expressed as

$$c_{2} = \frac{K}{(mn)^{2}}\left[\bar{e}^{2} + \sum_{k=1}^{K}(e_{k}-\bar{e})^{2}\right] \qquad \left(\bar{e} \equiv \sum_{1}^{K} e_{k}/K\right),$$

(6.1)

so that *c*_{2} ≥ *K*(*ē*/*mn*)², with equality only if the eigenvalues *e*_{k} are all equal. In the doubly standardized case *ē* = *mn*/*K*, giving

$$c_{2} \ge 1/K,$$

(6.2)

where *K* is the rank of *X*.

**Remark 6.2.** *Permutation invariance* If the columns of *X* are i.i.d. observations from a distribution on $\mathbf{R}^{m}$, then $X\pi \sim X$ for any *n* × *n* permutation matrix π. Hence, for any transformation $\tilde{X} = L(X)$ satisfying $L(X)\pi = L(X\pi)$,

$$\tilde{X}\pi = L(X)\pi = L(X\pi) \sim L(X) = \tilde{X},$$

(6.3)

showing that $\stackrel{~}{X}$ is permutation invariant.

Similarly, suppose $\tilde{X} = R(X)$, performing the same operation $\tilde{X}_{i} = r(X_{i})$ on each row of *X*, where now we require $r(x)\pi = r(x\pi)$ for all *n*-vectors *x*. The same argument as (6.3) demonstrates that $\tilde{X}$ is still permutation invariant. Iterating row and column standardizations as in Table 1 then shows that if the original data matrix *X* is permutation invariant, so is its doubly standardized version.
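The commutation requirement $r(x)\pi = r(x\pi)$ is easy to check numerically; row standardization, used throughout the paper, satisfies it because the row mean and standard deviation are symmetric functions of the entries. A small sketch on synthetic data with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 6
X = rng.standard_normal((m, n))

def row_standardize(X):
    """Apply the same operation r(.) to each row: subtract its mean, divide by its sd."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

pi = rng.permutation(n)                  # a column permutation

# r(x) pi and r(x pi): permuting columns before or after standardizing
# gives the same matrix, which is the commutation property the remark needs.
lhs = row_standardize(X)[:, pi]
rhs = row_standardize(X[:, pi])
```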

**Remark 6.3.** *Covariances after demeaning* Suppose that *X* is normally distributed, with column covariances Δ, (4.2), all columns having the same expectation vector μ. Let $\tilde{X}$ be the demeaned matrix obtained by subtracting all the row and column means of *X*. Then

$$\tilde{X} \sim \mathcal{N}\left(0, \tilde{\Delta}\right),$$

(6.4)

where

$$\tilde{\Delta}_{jj'} = \Delta_{jj'} - \Delta_{\cdot j'} - \Delta_{j\cdot} + \Delta_{\cdot\cdot},$$

(6.5)

dots indicating averaging over the missing subscripts, and similarly for the row covariances. This shows that demeaning tends to reduce covariances by recentering them around 0.
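Formula (6.5) can be verified directly: column demeaning is $X \mapsto XP$ with $P = I - 11'/n$, so the demeaned column covariance is $P\Delta P$, whose entries are exactly (6.5). A quick numerical check on an arbitrary (hypothetical) covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7

# Any positive semi-definite matrix serves as a hypothetical column covariance Delta.
A = rng.standard_normal((n, n))
Delta = A @ A.T

# (6.5): Delta~_{jj'} = Delta_{jj'} - Delta_{.j'} - Delta_{j.} + Delta_{..},
# the dots denoting averages over the missing subscripts.
row_avg = Delta.mean(axis=1, keepdims=True)   # Delta_{j.}
col_avg = Delta.mean(axis=0, keepdims=True)   # Delta_{.j'}
Delta_tilde = Delta - row_avg - col_avg + Delta.mean()

# Demeaning as a matrix product: P Delta P with P = I - 11'/n.
P = np.eye(n) - np.ones((n, n)) / n
```

The recentering claim is visible in the result: every row of `Delta_tilde` sums to zero.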

**Remark 6.4.** *Standardization* A matrix *X* is “column standardized” by individually subtracting the mean and dividing by the standard deviation of each column, and similarly for row standardization. Table 1 shows the effect of successive row and column standardizations on the 20426 × 44 demeaned matrix of healthy Cardio subjects. Here “Col” is the empirical standard deviation of the 946 column-wise correlations $\widehat{cor}_{jj'}$, *j* < *j*′; “Eig” is $\widehat{\alpha}$ in (3.6); and “Row” is the empirical standard deviation $\widehat{\beta}$ of a 1% sample of the row correlations $\widehat{cor}_{ii'}$, adjusted for overdispersion,

$$\text{Row}^{2} = \frac{n}{n-1}\left(\widehat{\beta}^{2} - \frac{1}{n-1}\right).$$

(6.6)

Sampling error of the Row entries is about ±.0034.

The doubly standardized matrix *X* used for Figure 3 was obtained after five successive column-row standardizations. This was excessive; the Figure looked almost the same after two iterations. Other microarray examples converged equally rapidly, though small counterexamples can be constructed where double standardization isn’t possible.

Microarray analyses usually begin with some form of column-wise standardization (Bolstad et al., 2003; Qiu, Klebanov and Yakovlev, 2005b), designed to negate “brightness” differences between the *n* microarrays. In the same spirit, row standardization helps prevent incidental gene differences (for example, very great or very small expression level variabilities) from obscuring the actual effects of interest. Standardization tends to reduce the apparent correlations as in Remark 6.3. Without standardization, the scatterplot in Figure 1 stretches out along the main diagonal, correlation .917, driven by genes with unusually large or small inherent expression levels.
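The successive standardizations of Remark 6.4 are straightforward to implement. This sketch (synthetic Gaussian matrix, illustrative dimensions) alternates column and row standardization as in Table 1; as the remark notes, a couple of iterations already get close, and further iterations drive both sets of means and standard deviations toward their targets:

```python
import numpy as np

def standardize(X, axis):
    """Subtract the mean and divide by the sd along the given axis."""
    return (X - X.mean(axis=axis, keepdims=True)) / X.std(axis=axis, keepdims=True)

def doubly_standardize(X, iters=20):
    """Alternate column (axis=0) and row (axis=1) standardizations, as in Table 1."""
    for _ in range(iters):
        X = standardize(X, axis=0)
        X = standardize(X, axis=1)
    return X

rng = np.random.default_rng(4)
X = doubly_standardize(rng.standard_normal((200, 30)))
```

After the final row step the rows are exactly standardized; the columns are standardized to high accuracy.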

**Remark 6.5.** *Corrected estimates of the total correlation* Suppose that the true row correlations *cor*_{ii′} have mean 0 and variance α². The sample correlations then satisfy

$$\widehat{cor}_{ii'} \,\dot{\sim}\, \left[cor_{ii'},\; \left(1 - cor_{ii'}^{2}\right)^{2}/(n-3)\right],$$

(6.7)

(6.7) being a good normal-theory approximation (Johnson and Kotz, 1970, Chap. 32). Letting $\bar{\alpha}^{2}$ be the empirical variance of the $\widehat{cor}_{ii'}$ values, a standard empirical Bayes derivation yields

$$\widehat{\alpha}^{2} = A^{2} - \frac{3}{n-5}A^{4} \qquad \left[A^{2} = \frac{(n-3)\bar{\alpha}^{2} - 1}{n-5}\right]$$

(6.8)

as an approximately unbiased estimate of α². (If $\overline{cor}$ is not assumed to equal 0, a slightly more complicated formula applies.) Of course $\widehat{\alpha}^{2}$ is set to 0 if the right side of (6.8) is negative.

Theorem 1 implies that the null value of $\bar{\alpha}^{2}$, under column-wise independence, nearly equals *c*_{2}, (3.3), in the doubly standardized situation. Formula (3.6), with say

$$\tilde{\alpha}^{2} = \frac{n}{n-1}\left(\bar{\alpha}^{2} - \frac{1}{n-1}\right)$$

(6.9)

is not identical to (6.8), but provides an excellent approximation for values of $\bar{\alpha} \le 0.5$: with *n* = 44 and $\bar{\alpha} = .283$ as in (3.6), $\widehat{\alpha} = .2415$ while $\tilde{\alpha} = .2412$.
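Formulas (6.8) and (6.9) are one-liners; the sketch below reproduces the text's numerical comparison at *n* = 44, $\bar{\alpha} = .283$ (the function names are ours, not the paper's):

```python
def alpha_hat_sq(abar2, n):
    """(6.8): approximately unbiased estimate of alpha^2 from the empirical
    variance abar2 of the sample row correlations (their mean assumed 0)."""
    A2 = ((n - 3) * abar2 - 1) / (n - 5)
    return max(A2 - 3.0 / (n - 5) * A2 ** 2, 0.0)   # truncate negatives at 0

def alpha_tilde_sq(abar2, n):
    """(6.9): the simpler adjusted estimate."""
    return max(n / (n - 1) * (abar2 - 1.0 / (n - 1)), 0.0)

# Numerical check from the text: n = 44, abar = .283.
n, abar = 44, 0.283
alpha_hat = alpha_hat_sq(abar ** 2, n) ** 0.5     # approximately .2415
alpha_tilde = alpha_tilde_sq(abar ** 2, n) ** 0.5 # approximately .2412
```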

**Remark 6.6.** *Column and row centerings* The column correlation mean $\widehat{\mu} = -1/(n-1)$ in (3.6) is forced by the row-wise demeaning $\sum_{j} x_{ij} = 0$, (2.1), centering the solid histogram in the right panel of Figure 3 at −.023. With

**Remark 6.7.** *The total correlation α* The total correlation α, which plays a key role in Theorem 2, (4.9), is also the central parameter of the theory developed in Efron (2007a). Equations (3.15)–(3.16) there are equivalent to (5.12) here. In both papers, α has the very convenient feature of summarizing the effects of an enormous *m* × *m* correlation matrix in a single number.

**Remark 6.8.** *X** *for simulation, (5.11)* The *X** simulation used in Figure 5 began with the *m* × *n* matrix *Y* = (*y*_{ij}),

$$y_{ij} = c_{Ij} + e_{ij} \qquad \begin{cases} e_{ij} \sim \mathcal{N}(0,1) \\ c_{Ij} \sim \mathcal{N}(0,\gamma^{2}) \end{cases} \quad \text{(all independent)},$$

(6.10)

where *I* = 1, 2, 3, 4, 5 according as *i* falls in the first, second, …, last fifth of 1 through *m*; *Y* was then column standardized to give *X**, so that the row correlation matrix had a block form, with large positive correlations (about 0.61) in the (*m*/5) × (*m*/5) diagonal blocks. The choice γ = 1.23 was required to yield α = .241.
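The simulation (6.10) is easy to reproduce. With γ = 1.23, rows in the same block share the effects *c*_{Ij}, so their correlation across columns is γ²/(1 + γ²) ≈ 0.60, matching the remark's 0.61 (the value of `m` below is illustrative, not the one used for Figure 5):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, gamma = 1000, 63, 1.23          # gamma = 1.23 as in the remark

I = np.repeat(np.arange(5), m // 5)   # block label: the fifth of 1..m containing row i
c = rng.normal(0.0, gamma, size=(5, n))   # c_Ij: one effect per block and column
e = rng.standard_normal((m, n))
Y = c[I] + e                          # y_ij = c_Ij + e_ij, as in (6.10)
X_star = (Y - Y.mean(axis=0)) / Y.std(axis=0)   # column standardized, as in the remark

# Rows i, i' in the same block share c_Ij, giving within-block correlation
# gamma^2 / (1 + gamma^2) ~= 0.60.
theory = gamma ** 2 / (1 + gamma ** 2)
C = np.corrcoef(Y)
same_block = (I[:, None] == I[None, :]) & ~np.eye(m, dtype=bool)
emp = C[same_block].mean()            # average within-block row correlation
```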

**Remark 6.9.** *Bilinear statistics* Since $\widehat{\Delta} \sim (\Delta, \Delta^{(2)}/\tilde{m})$, (4.8), it is clear that $E\{\widehat{\tau}^{2}\} = \tau^{2}$ in Corollary (4.20). The variance calculation proceeds as in Theorem 2:

$$\begin{aligned} \mathrm{var}\{\widehat{\tau}^{2}\} &= \sum_{jk}\sum_{lh} \Delta^{(2)}_{jk,lh}\, w_{j}w_{k}w_{l}w_{h}/\tilde{m} \\ &= \sum_{jk}\sum_{lh} \left[\Delta_{jl}\Delta_{kh} + \Delta_{jh}\Delta_{kl}\right] w_{j}w_{k}w_{l}w_{h}/\tilde{m} \\ &= \left[\sum_{jl}\sum_{kh}\left(\Delta_{jl}w_{j}w_{l}\right)\left(\Delta_{kh}w_{k}w_{h}\right) + \sum_{jh}\sum_{kl}\left(\Delta_{jh}w_{j}w_{h}\right)\left(\Delta_{kl}w_{k}w_{l}\right)\right]/\tilde{m} \\ &= 2\left(\sum_{jk}\Delta_{jk}w_{j}w_{k}\right)^{2}/\tilde{m} = 2\tau^{4}/\tilde{m}. \end{aligned}$$

(6.11)

The verification of (5.18) is the same, except with element *b*_{jk} of *B* replacing the product *w*_{j}*w*_{k}.
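The collapsing of the quadruple sum in (6.11) to 2τ⁴ can be checked by brute force on a small example (an arbitrary covariance matrix and weight vector; *n* kept tiny so the *n*⁴ loop is cheap):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
n = 4

A = rng.standard_normal((n, n))
Delta = A @ A.T                        # an arbitrary covariance matrix
w = rng.standard_normal(n)

# Brute-force the middle line of (6.11):
# sum_{jklh} [Delta_jl Delta_kh + Delta_jh Delta_kl] w_j w_k w_l w_h
total = 0.0
for j, k, l, h in product(range(n), repeat=4):
    total += (Delta[j, l] * Delta[k, h] + Delta[j, h] * Delta[k, l]) \
             * w[j] * w[k] * w[l] * w[h]

# The sum collapses to 2 (w' Delta w)^2 = 2 tau^4, the final line of (6.11).
tau2 = w @ Delta @ w
```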

^{1}The entries of *X* are log(red/green) ratios obtained from oligonucleotide arrays.

^{2}The referee points out that when Affymetrix CEL files are available, array run dates will usually be found in the DatHeader lines.

^{3}Most multivariate texts reverse the situation, taking the columns as independent replicas of possibly correlated rows.

- Anderson TW. An Introduction to Multivariate Statistical Analysis. 3rd ed. Wiley; New York: 2003.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B. 1995;57:289–300.
- Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. Available at http://web.mit.edu/biomicro/education/RMA.pdf. [PubMed]
- Callow M, Dudoit S, Gong E, Speed T, Rubin E. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Research. 2000;10:2022–2029. [PubMed]
- Efron B. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 2004;99:96–104.
- Efron B. Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 2007a;102:93–103.
- Efron B. Size, power and false discovery rates. Ann. Statist. 2007b;35:1351–1377.
- Efron B. Microarrays, empirical Bayes, and the two-groups model. Statist. Sci. 2008;23:1–47. with discussion and Rejoinder.
- Johnson DE, Graybill FA. An analysis of a two-way model with interaction and no replication. J. Amer. Statist. Assoc. 1972;67:862–868.
- Johnson NL, Kotz S. Continuous Univariate Distributions-1. Houghton Mifflin Company; Boston: 1970.
- Mardia K, Kent J, Bibby J. Multivariate Analysis. Academic Press; London San Diego: 1979.
- Owen AB. Variance of the number of false discoveries. J. Roy. Statist. Soc. Ser. B. 2005;67:411–426.
- Qiu X, Brooks AI, Klebanov L, Yakovlev A. The effects of normalization on the correlation structure of microarray data. BMC Bioinformatics. 2005;6:120. Available at http://www.biomedcentral.com/1471-2105/6/120. [PMC free article] [PubMed]
- Qiu X, Klebanov L, Yakovlev A. Correlation between gene expression levels and limitations of the empirical Bayes methodology for finding differentially expressed genes. Statist. Appl. Genet. Mol. Biol. 2005;4: Article 34. Available at http://www.bepress.com/sagmb/vol4/iss1/art34. [PubMed]
- Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002;1:203–209. [PubMed]
- Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA. 2001;98:5116–5121. Available at http://www.pnas.org/cgi/content/full/98/9/5116. [PubMed]
