Among the various reasons to conduct microarray experiments we may identify the following key objectives: comparison of different gene expression levels in varying conditions; identification of functionally related genes, discriminating samples through the clustering of gene expression profiles; and the unravelling of gene interaction networks from time series data. Here, we are interested in microarray data with at least three replicates under two different ‘conditions’. These conditions can either be normalized channel intensities, or gene expression levels at two different time points, comparing cell lines or strains, or generally ‘before’ versus ‘after’ or ‘control’ versus ‘treatment’ comparative experiments. Since in Section 4 we apply this technique to gene deletion studies, we refer to ‘normalized channel intensities’ when gDNA was hybridized against gDNA, instead of using the term ‘gene expression’. The aim is to reliably identify genes with significant differences in gene expression between the two conditions. This problem is non-trivial due to uncertainties arising from various sources of non-biological variation during experimentation, measurement and data pre-processing. For differences in expression levels the use of ‘fold changes’ is unreliable and a statistical analysis is required to distinguish true changes from random variations and to assign significance values to differences.
The data set we analysed was based on comparative genomic experiments between strains of Mycobacterium bovis
, the agent of bovine tuberculosis, and the human pathogen M.tuberculosis
. These organisms share >99.9% identity at the nucleotide level, but this is offset by a range of gene deletions from M.bovis
(Gordon et al., 2001
). It is still unclear whether these deletions confer any phenotype on M.bovis
, or whether the distribution of deletions differs across M.bovis
isolates. This would have major implications for strain evolution. Furthermore, this data set should in theory have been simple, in that we were comparing the presence or absence of genes across strains. However, as we shall show, even this relatively simple data set requires a robust statistical approach to ensure the validity of the results.
An important step in pre-processing microarray data is ‘normalization’ (Kepler et al., 2002
; Yang et al., 2002
; Quackenbush, 2002
), which ensures that data from different arrays are comparable. For the significance analysis investigated in this paper, we assume that data were normalized before we investigate differences in normalized channel intensities of the genes.
Various statistical approaches are available to test whether two samples are drawn from the same population or distribution. Although non-parametric tests such as Wilcoxon’s 2-sample test are applied extensively to microarray data, these are out of the scope of the present study. On the other hand, parametric tests applied to microarray data are based on the t
-test at one stage or another. They can be further classified according to whether they use assumptions about gene-to-gene interactions. Regression models and mixture models require prior assumptions that some calculated quantities have special, implicit relationships (e.g. expression level and standard deviation of genes) (Baldi and Long, 2001
; Huber et al., 2002
). The other type of tools, such as the simple t
-test, concentrate only on individual genes. We are going to demonstrate that these methods, concentrating on individual genes, perform poorer than methods based on regression.
In this paper we compare several methods including SAM (Tusher et al., 2001
; Efron et al., 2000
), Bayesian regularized t
-test (Baldi and Long, 2001
), mixture modeling (Pan, 2002
) (analyzed only theoretically), Wilk’s lambda score (Hwang et al., 2002
) and variance stabilization (Huber et al., 2002
). We discuss implicit assumptions, the model structure, computational complexity and applicability to microarray data. From this comparison, we developed a weighted resampling approach and applied it to the study of differences in channel intensities as the result of gene deletions in M.bovis
. The approach is derived from ideas in SAM and the regularized t
-test and employs weighted resampling. We introduced a new way of replicate handling to grasp the reliability of the replicates in two steps. First, this approach detects particular outliers gene-by-gene (which reflects the local disturbance on the array), second, it weights every resampled group according to the probability of containing an outlier. The results of our comparison justified the theoretical basis of this approach, which outperforms previously published algorithms. All the methods are not restricted to specific microarray experiments, thus can be applied to any comparative experiment.
The outline of this paper is as follows. In Section 2, we provide a short summary of the methods used in the comparative study. Section 3 introduces our weighted resampling approach to identify significant differences in gene expression levels (or in normalized channel intensities). This is followed by a comparison of our approach to the others with an application to data sets obtained for gene deletions in M.bovis
. The summary of the results and discussion are in Sections 4 and 5, respectively. In the supplementary material
) we prove the equality of variance stabilization with a special case of the modified t
-test. The web-link also includes the proof of the equivalence of Wilk’s lambda score (for two-class comparisons) with the simple t
-test. Note that in this paper we call two statistical tests equivalent if their test statistics are monotone function of each other, thus they give the same significance ranking for the genes. This does not necessarily mean that they pick the same group of genes at each confidence level.