A simple microarray experiment may be carried out to detect the differences in expression between two conditions. Each condition may be represented by one or more RNA samples. Using two-color cDNA microarrays, samples can be compared directly on the same microarray or indirectly by hybridizing each sample with a common reference sample [4,6]. The null hypothesis being tested is that there is no difference in expression between the conditions; when conditions are compared directly, this implies that the true ratio between the expression of each gene in the two samples should be one. When samples are compared indirectly, the ratios between the test sample and the reference sample should not differ between the two conditions. It is often more convenient to use logarithms of the expression ratios than the ratios themselves because effects on intensity of microarray signals tend to be multiplicative; for example, doubling the amount of RNA should double the signal over a wide range of absolute intensities. The logarithm transformation converts these multiplicative effects (ratios) into additive effects (differences), which are easier to model; the log ratio when there is no difference between conditions should thus be zero. If a single-color expression assay is used - such as the Affymetrix system [7] - we are again considering a null hypothesis of no expression-level difference between the two conditions, and the methods described in this article can also be applied directly to this type of experiment.
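As an illustration of why log ratios are convenient, the following minimal Python sketch computes base-2 log ratios for a direct comparison and shows how the reference cancels in an indirect comparison. The function names and the intensity values are invented for this example and do not come from any particular software package.

```python
import math

def log2_ratio(test_intensity, reference_intensity):
    """Return log2(test / reference); both intensities must be positive."""
    return math.log2(test_intensity / reference_intensity)

def indirect_log2_ratio(a_vs_ref, b_vs_ref):
    """Compare two conditions through a common reference by subtracting log ratios."""
    return a_vs_ref - b_vs_ref

# Direct comparison: a doubling, a halving and no change (made-up intensities).
direct = [
    log2_ratio(2000.0, 1000.0),
    log2_ratio(500.0, 1000.0),
    log2_ratio(1000.0, 1000.0),
]
print(direct)

# Indirect comparison: the reference intensity (800) cancels in the difference,
# leaving the log ratio between the two conditions.
indirect = indirect_log2_ratio(log2_ratio(2000.0, 800.0), log2_ratio(1000.0, 800.0))
print(indirect)
```

A doubling and a halving become symmetric values around zero on the log scale, and a gene with no difference between conditions has a log ratio of zero, which is the null value used by the tests described in this article.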
A distinction should be made between RNA samples obtained from independent biological sources - biological replicates - and those that represent repeated sampling of the same biological material - technical replicates. Ideally, each condition should be represented by multiple independent biological samples in order to conduct statistical tests. If only technical replicates are available, statistical testing is still possible but the scope of any conclusions drawn may be limited [3]. If both technical and biological replicates are available, for example if the same biological samples are measured twice each using a dye-swap assay, the individual log ratios of the technical replicates can be averaged to yield a single measurement for each biological unit in the experiment. Callow et al. [8] describe an example of a biologically replicated two-sample comparison, and our group [9] provides an example with technical replication. More complicated settings that involve multiple layers of replication can be handled using the mixed-model analysis of variance techniques described below.
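The averaging of technical replicates can be sketched as follows. This is a minimal example under stated assumptions: each biological sample is measured on a dye-swap pair, both values are recorded as log2(Cy5/Cy3), and the labels are reversed on the second array, so its sign is flipped before averaging. The sample names and values are hypothetical.

```python
from statistics import mean

def combine_dye_swap(forward, swapped):
    """Average a dye-swap pair into one log ratio for a biological sample.

    Assumes both inputs are log2(Cy5/Cy3) and that the dyes were reversed on
    the second array, so its log ratio is negated before averaging.
    """
    return mean([forward, -swapped])

# Hypothetical log ratios for one gene: (forward array, dye-swapped array).
pairs = {
    "sample_1": (1.2, -0.8),
    "sample_2": (0.9, -1.1),
    "sample_3": (1.4, -1.0),
}

# One value per biological unit; these are the inputs to any subsequent test.
per_sample = {name: combine_dye_swap(f, s) for name, (f, s) in pairs.items()}
print(per_sample)
```

After this step the effective sample size is the number of biological samples (three here), not the number of arrays (six).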
'Fold' change
The simplest method for identifying differentially expressed genes is to evaluate the log ratio between two conditions (or the average of ratios when there are replicates) and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed [10-12]. For example, if the cut-off value chosen is a two-fold difference, genes are taken to be differentially expressed if the expression under one condition is over two-fold greater or less than that under the other condition. This test, sometimes called 'fold' change, is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. The fold-change method is subject to bias if the data have not been properly normalized. For example, an excess of low-intensity genes may be identified as being differentially expressed because their fold-change values have a larger variance than the fold-change values of high-intensity genes [13,14]. Intensity-specific thresholds have been proposed as a remedy for this problem [15].
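A fold-change rule with a fixed cut-off can be written in a few lines. The sketch below assumes that each gene has already been summarized as a mean log2 ratio; the function name, the gene labels and the values are illustrative only.

```python
import math

def fold_change_calls(mean_log2_ratios, fold_cutoff=2.0):
    """Flag genes whose absolute mean log2 ratio exceeds log2(fold_cutoff)."""
    threshold = math.log2(fold_cutoff)
    return {gene: abs(value) > threshold for gene, value in mean_log2_ratios.items()}

# Hypothetical mean log2 ratios for three genes.
example = {"gene_a": 1.6, "gene_b": -1.3, "gene_c": 0.4}
calls = fold_change_calls(example)
print(calls)
```

The rule returns only a yes/no call per gene. It uses no information about variability, which is why no confidence level can be attached to the result.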
The t test
The t test is a simple, statistically based method for detecting differentially expressed genes (see Box for details of how it is calculated). In replicated experiments, the error variance (see Box) can be estimated for each gene from the log ratios, and a standard t test can be conducted for each gene [8]; the resulting t statistic can be used to determine which genes are significantly differentially expressed (see below). This gene-specific t test is not affected by heterogeneity in variance across genes because it only uses information from one gene at a time. It may, however, have low power because the sample size - the number of RNA samples measured for each condition - is small. In addition, the variances estimated from each gene are not stable: for example, if the estimated variance for one gene is small, by chance, the t value can be large even when the corresponding fold change is small. It is possible to compute a global t test, using an estimate of error variance that is pooled across all genes, if it is assumed that the variance is homogeneous between different genes [16,17]. This is effectively a fold-change test because the global t test ranks genes in an order that is the same as fold change; that is, it does not adjust for individual gene variability. It may therefore suffer from the same biases as a fold-change test if the error variance is not truly constant for all genes.
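The contrast between the gene-specific and global t tests can be seen in a small numerical sketch. It assumes a direct comparison with the same number of replicate log ratios for every gene, uses the one-sample form of the t statistic (mean log ratio divided by its standard error), and pools variance by simple averaging across genes. These choices, the function names and the data are assumptions made for illustration and may differ in detail from the formulas in the Box.

```python
import math
from statistics import mean, variance

def gene_specific_t(log_ratios):
    """One-sample t statistic using the gene's own variance estimate."""
    n = len(log_ratios)
    return mean(log_ratios) / math.sqrt(variance(log_ratios) / n)

def global_t(data):
    """t statistics that share one variance estimate pooled across all genes."""
    pooled = mean([variance(values) for values in data.values()])
    return {g: mean(v) / math.sqrt(pooled / len(v)) for g, v in data.items()}

# Hypothetical replicate log2 ratios: gene_a has a tiny fold change and, by
# chance, a tiny variance; gene_b has a large fold change and larger variance.
data = {
    "gene_a": [0.10, 0.11, 0.09, 0.10],
    "gene_b": [1.5, 0.5, 2.0, 0.0],
}

t_gene = {g: gene_specific_t(v) for g, v in data.items()}
t_global = global_t(data)
print(t_gene)
print(t_global)
```

With these numbers the gene-specific statistic is far larger for gene_a than for gene_b despite its small fold change, whereas the global statistic orders the genes by mean log ratio, exactly as a fold-change rule would.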
Modifications of the t test
As noted above, the error variance (the square root of which gives the denominator of the t tests) is hard to estimate and subject to erratic fluctuations when sample sizes are small. More stable estimates can be obtained by combining data across all genes, but these are subject to bias when the assumption of homogeneous variance is violated. Modified versions of the t test (see Box) find a middle ground that is both powerful and less subject to bias.
In the 'significance analysis of microarrays' (SAM) version of the t test (known as the S test) [18], a small positive constant is added to the denominator of the gene-specific t test. With this modification, genes with small fold changes will not be selected as significant; this addresses the instability of gene-specific variance estimates mentioned above. The regularized t test [19] combines information from gene-specific and global average variance estimates by using a weighted average of the two as the denominator for a gene-specific t test. The B statistic proposed by Lonnstedt and Speed [20] is a log posterior odds ratio of differential expression versus non-differential expression; it allows for gene-specific variances but it also combines information across many genes and thus should be more stable than the t statistic (see Box for details).
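The two simplest modifications can be sketched in Python as follows. This is a simplified illustration, not the implementation used by the SAM or Cyber T packages: the constant `s0`, the `weight` parameter and the global variance value are hypothetical inputs chosen by hand here, whereas the published methods estimate their tuning quantities from the data. The B statistic is not shown because it requires the prior specification described in the Box.

```python
import math
from statistics import mean, variance

def plain_t(log_ratios):
    """Gene-specific one-sample t statistic, for comparison."""
    n = len(log_ratios)
    return mean(log_ratios) / math.sqrt(variance(log_ratios) / n)

def s_statistic(log_ratios, s0):
    """SAM-style statistic: a positive constant s0 is added to the standard error."""
    n = len(log_ratios)
    standard_error = math.sqrt(variance(log_ratios) / n)
    return mean(log_ratios) / (standard_error + s0)

def regularized_t(log_ratios, global_variance, weight=0.5):
    """t statistic whose variance is a weighted average of global and gene-specific estimates."""
    n = len(log_ratios)
    blended = weight * global_variance + (1.0 - weight) * variance(log_ratios)
    return mean(log_ratios) / math.sqrt(blended / n)

# A gene with a small fold change and, by chance, a very small variance.
low_variance_gene = [0.10, 0.11, 0.09, 0.10]

print(plain_t(low_variance_gene))
print(s_statistic(low_variance_gene, s0=0.2))
print(regularized_t(low_variance_gene, global_variance=0.4))
```

Both modified statistics are much smaller than the plain t statistic for this gene, so a small fold change with an accidentally small variance no longer stands out.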
The t and B tests based on log ratios can be found in the Statistics for Microarray Analysis (SMA) package [21]; the S test is available in the SAM software package [22]; and the regularized t test is in the Cyber T package [23]. In addition, the Bioconductor project [24] provides a collection of analysis tools for microarray experiments. Additional modifications of the t test are discussed by Pan [25].
Graphical summaries (the 'volcano plot')
The 'volcano plot' is an effective and easy-to-interpret graph that summarizes both fold-change and t-test criteria (see Figure). It is a scatter-plot of the negative log10-transformed p-values from the gene-specific t test (calculated as described in the next section) against the log2 fold change (Figure). Genes with statistically significant differential expression according to the gene-specific t test will lie above a horizontal threshold line. Genes with large fold-change values will lie outside a pair of vertical threshold lines. The significant genes identified by the S, B, and regularized t tests will tend to be located in the upper left or upper right parts of the plot.
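The coordinates and threshold lines of a volcano plot can be computed as in the sketch below. It assumes that the log2 fold change and the p-value of each gene have already been obtained elsewhere; the cut-off values, gene labels and numbers are hypothetical, and the drawing step is left to whichever plotting library is preferred.

```python
import math

def volcano_points(results, p_cutoff=0.01, fold_cutoff=2.0):
    """Return plot coordinates and threshold flags for each gene.

    results maps gene -> (log2 fold change, p-value from the gene-specific t test).
    """
    horizontal_line = -math.log10(p_cutoff)   # significance threshold on the y axis
    vertical_line = math.log2(fold_cutoff)    # drawn at +/- this value on the x axis
    points = {}
    for gene, (log2_fc, p_value) in results.items():
        y = -math.log10(p_value)
        points[gene] = {
            "x": log2_fc,
            "y": y,
            "above_horizontal": y > horizontal_line,
            "outside_vertical": abs(log2_fc) > vertical_line,
        }
    return points

# Hypothetical results: (log2 fold change, p-value).
results = {
    "gene_a": (0.1, 0.0001),   # significant by t test, small fold change
    "gene_b": (2.3, 0.2),      # large fold change, not significant
    "gene_c": (-1.8, 0.001),   # both criteria met (upper left of the plot)
}
points = volcano_points(results)
for gene, point in points.items():
    print(gene, point)
```

Genes for which both flags are true fall in the upper left or upper right corners of the plot, the region where the modified t tests tend to place their significant genes.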