Bioinformation. 2007; 1(10): 384–389.
Published online 2007 April 10.
PMCID: PMC1896052

An adaptive alpha spending algorithm improves the power of statistical inference in microarray data analysis

Abstract

The adaptive alpha-spending algorithm incorporates additional contextual evidence about differential expression (including correlations among genes) to adjust the initial p-values, yielding alpha-spending adjusted p-values. The algorithm is so named because of its similarity to the alpha-spending algorithm used in interim analysis of clinical trials, in which stage-specific significance levels are assigned to each stage of the trial. We show that the Bonferroni correction applied to the alpha-spending adjusted p-values approximately controls the family-wise error rate (FWER) under the complete null hypothesis. Using simulations, we also show that the alpha-spending algorithm yields increased power over the unadjusted p-values while controlling the false discovery rate (FDR). The benefits of the alpha-spending algorithm were greater with increasing sample size and increasing correlation among genes. Use of the alpha-spending algorithm will allow microarray experiments to make more efficient use of their data and may help conserve resources.

Keywords: microarray data, contextual evidence, adaptive alpha spending

Background

Microarray technology has become a widely used and effective research tool in modern molecular biology. It can produce a snapshot of the expression levels of thousands of genes simultaneously at a very low cost per data point. However, researchers are often more interested in how biological pathways respond to changes in experimental conditions than in changes in the expression levels of individual genes. The total flux through a pathway can change dramatically through subtle changes in the expression levels of the genes involved in that pathway. [1] Thus, the prevalence of microarray technology in research on complex metabolic disorders makes the problem of identifying genes with subtle differential expression increasingly important. Unfortunately, identifying genes with subtle differential expression is challenging because of the huge number of genes involved, the noisiness of the data, and the very small sample sizes (often no more than five observed expression levels per gene and/or per treatment group).

Most approaches for identifying differentially expressed genes may have limited power because they neither take into account nor capitalize on dependencies among genes. As an alternative, we propose an adaptive alpha-spending algorithm that explicitly takes into account the dependencies of expression levels among genes by assigning a gene-specific significance level to each gene. The algorithm is so named because of its similarity to alpha-spending algorithms used in interim analysis of clinical trials. [2] Interim analyses are often carried out at multiple times during a clinical trial, for example to check adherence to the protocol or for economic and ethical reasons. Because the same null hypothesis is tested multiple times in interim analysis, failing to correct for multiple testing inflates the type I error rate. The alpha-spending algorithm controls this multiplicity by assigning a stage-specific significance level to each stage of the trial such that the stage-specific significance levels sum to the overall significance level (the formula is given in the PDF version of the article).
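To make the clinical-trial analogy concrete, the sketch below shows how a Lan-DeMets spending function [2] distributes an overall level α over interim analyses: the alpha spent at look k is α(t_k) − α(t_{k−1}), so the stage-specific amounts telescope to α(1) = α. The O'Brien-Fleming-type and Pocock-type spending functions are the two standard examples; the function names and the four equally spaced looks are illustrative assumptions. Note that these increments are the alpha "spent" per stage, not the nominal critical values, which a real group-sequential design derives from the joint distribution of the interim test statistics.

```python
import math
from statistics import NormalDist
from typing import Callable, List, Sequence

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def obrien_fleming_spending(t: float, alpha: float) -> float:
    """O'Brien-Fleming-type spending function: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    if t <= 0.0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))


def pocock_spending(t: float, alpha: float) -> float:
    """Pocock-type spending function: alpha * ln(1 + (e - 1) * t)."""
    if t <= 0.0:
        return 0.0
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def stage_levels(fractions: Sequence[float], alpha: float,
                 spending: Callable[[float, float], float] = obrien_fleming_spending) -> List[float]:
    """Alpha spent at each look: spending(t_k) - spending(t_{k-1}).

    `fractions` are increasing information fractions ending at 1.0, so the
    increments telescope to spending(1.0, alpha), which equals alpha.
    """
    levels = []
    spent = 0.0
    for t in fractions:
        cumulative = spending(t, alpha)
        levels.append(cumulative - spent)
        spent = cumulative
    return levels


if __name__ == "__main__":
    looks = [0.25, 0.5, 0.75, 1.0]  # four equally spaced interim analyses (illustrative)
    for fn in (obrien_fleming_spending, pocock_spending):
        levels = stage_levels(looks, 0.05, fn)
        print(fn.__name__, [round(a, 5) for a in levels], "total:", round(sum(levels), 6))
```

The O'Brien-Fleming-type function saves most of α for the final analysis, whereas the Pocock-type function spends it more evenly; in both cases the stage-specific levels add up to the overall level.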

Methodology

The gene-specific significance levels are based on a prediction equation similar to the linear regression prediction equation; the equation is given in the PDF version of the article.
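The prediction equation itself is not reproduced in this text, so the sketch below does not implement it. Instead it illustrates the general idea of gene-specific significance levels under a fixed budget using a swapped-in technique, weighted Bonferroni: a non-negative evidence score s_i per gene is turned into a level α_i = α·s_i/Σs, so the levels sum to α, and gene i is rejected when p_i ≤ α_i. The score used here (a correlation-weighted average of the other genes' |z| statistics) and all function names are hypothetical. If the levels were fixed in advance, the union bound would give FWER ≤ α; because they are computed from the same correlated data, the guarantee is only heuristic.

```python
from typing import List, Sequence, Tuple


def evidence_scores(z: Sequence[float], corr: Sequence[Sequence[float]]) -> List[float]:
    """HYPOTHETICAL contextual-evidence score (not the paper's prediction equation).

    For gene i: average of |z_j| over the other genes j, weighted by |corr[i][j]|.
    A gene with no correlated partners gets a neutral score of 1.0.
    """
    scores = []
    for i in range(len(z)):
        num = den = 0.0
        for j, zj in enumerate(z):
            if j == i:
                continue  # exclude the gene's own statistic
            w = abs(corr[i][j])
            num += w * abs(zj)
            den += w
        scores.append(num / den if den > 0 else 1.0)
    return scores


def gene_levels(scores: Sequence[float], alpha: float) -> List[float]:
    """Gene-specific levels alpha_i = alpha * s_i / sum(s), which sum to alpha.

    Equivalent to weighted Bonferroni alpha * w_i / m with weights
    w_i = m * s_i / sum(s), whose mean is 1.
    """
    m = len(scores)
    total = sum(scores)
    if total <= 0:
        return [alpha / m] * m  # no usable evidence: fall back to plain Bonferroni
    return [alpha * s / total for s in scores]


def weighted_bonferroni(pvals: Sequence[float], levels: Sequence[float],
                        alpha: float) -> Tuple[List[bool], List[float]]:
    """Reject H_i when p_i <= alpha_i; adjusted p-value is min(1, p_i * alpha / alpha_i).

    A gene whose level is 0 is never rejected and gets adjusted p-value 1.
    """
    rejected = [a > 0 and p <= a for p, a in zip(pvals, levels)]
    adjusted = [min(1.0, p * alpha / a) if a > 0 else 1.0 for p, a in zip(pvals, levels)]
    return rejected, adjusted
```

For example, with z = [3.0, 3.0, 0.1, 0.2] and two correlated pairs (genes 0 and 1, genes 2 and 3, each with correlation 0.8), gene 0 receives the level 0.05 × 3/6.3 ≈ 0.0238, so p = 0.02 is rejected even though the plain Bonferroni threshold 0.05/4 = 0.0125 would not reject it.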

Discussion

We have proposed an adaptive alpha-spending algorithm for finding differentially expressed genes in microarray data sets, in which observed dependencies among genes are incorporated by assigning a gene-specific significance level to each gene. This procedure may increase the power to detect differentially expressed genes. Further details are given in the supplementary material.

Our simulation study confirms that the alpha-spending algorithm controls the per-comparison error rate (PCER) and the false discovery rate (FDR) in many practical situations. Under the complete null, the PCER was controlled both over all genes and within the group of uncorrelated genes; within the group of correlated genes, the PCER tended to be inflated (Table 1). Under the partial null, the PCER was controlled in all simulation settings and the FDR was controlled in most of them (Figure 1). The observed PCER decreased with increasing group size and correlation, but this relationship was not seen in the observed FDR. On average, the alpha-spending algorithm improved power, and the improvement grew with increasing group size or correlation, reaching up to 47% for ρ = 0.7 and n = 6 (Figure 2). However, the power improvement varied substantially across individual simulated data sets. For lower values of ρ and n, power decreased for some simulated data sets, by up to 15% for ρ = 0.3 and n = 4. For n ≥ 6 the alpha-spending algorithm appeared to add value. We also increased the number of genes in the simulation to 2000 for some settings; the results were very similar to those obtained with 700 genes.
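The observed PCER, FDR and power reported above are averages over simulated data sets. The sketch below shows one way to compute them from rejection decisions and the known truth, assuming the usual definitions: per data set, PCER = V/m, false discovery proportion = V/max(R, 1) and power = S/m₁, where V and S are the numbers of false and true rejections, R = V + S, m is the number of genes and m₁ the number of differentially expressed genes. The paper's exact estimators may differ; the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ErrorRates:
    pcer: float   # mean over data sets of V / m
    fdr: float    # mean over data sets of V / max(R, 1)
    power: float  # mean over data sets of S / m1 (NaN if no gene is truly DE)


def observed_error_rates(rejections: Sequence[Sequence[bool]],
                         is_de: Sequence[bool]) -> ErrorRates:
    """Average error rates over simulated data sets.

    rejections[k][i] is True if gene i was declared DE in data set k;
    is_de[i] is True if gene i is truly differentially expressed.
    """
    m = len(is_de)
    m1 = sum(is_de)
    pcer_sum = fdr_sum = power_sum = 0.0
    for rej in rejections:
        v = sum(1 for r, de in zip(rej, is_de) if r and not de)  # false positives
        s = sum(1 for r, de in zip(rej, is_de) if r and de)      # true positives
        pcer_sum += v / m
        fdr_sum += v / max(v + s, 1)  # FDP is 0 when nothing is rejected
        if m1 > 0:
            power_sum += s / m1
    n = len(rejections)
    power = power_sum / n if m1 > 0 else float("nan")
    return ErrorRates(pcer=pcer_sum / n, fdr=fdr_sum / n, power=power)
```

Under the complete null (m₁ = 0) every rejection is false, so only the PCER and FWER-type summaries are meaningful there, which is consistent with Table 1 reporting PCER alone.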

Table 1
Observed PCER for the alpha-spending post-processed p-values estimated for correlated genes, uncorrelated genes, and all genes under the complete null hypothesis that all genes are non-differentially expressed. The number of genes in each simulation was ...
Figure 1
Observed PCER and observed FDR of the alpha-spending algorithm as a function of power of the ordinary t-test for different correlations ρ = 0.3, 0.5, 0.7 and different group sizes n = 4, 6, 10 for k = 700. The number of genes in each simulation ...
Figure 2
Power improvement of alpha-spending p-values with respect to the ordinary t-test. The results are from the partial null hypothesis simulations with 20% of the genes differentially expressed and correlated with the same correlation coefficient ρ ...
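The Figure 2 caption describes genes that are correlated with a common coefficient ρ. One standard way to simulate such an exchangeable (equicorrelated) block, assumed here rather than taken from the paper's simulation code, is a one-factor model X_j = √ρ·F + √(1−ρ)·E_j with independent standard normal F and E_j, which gives Var(X_j) = 1 and Cov(X_j, X_k) = ρ for j ≠ k. The mean shift for differentially expressed genes and all names below are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def equicorrelated_block(n_samples: int, n_genes: int, rho: float, shift: float = 0.0,
                         rng: Optional[random.Random] = None) -> List[List[float]]:
    """Return n_samples rows of n_genes values with pairwise correlation rho.

    One-factor model: x_j = shift + sqrt(rho) * f + sqrt(1 - rho) * e_j, where f is
    shared by all genes in a row; `shift` is a hypothetical effect size.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this one-factor construction")
    if rng is None:
        rng = random.Random()
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n_samples):
        f = rng.gauss(0.0, 1.0)  # shared factor induces the correlation across genes
        rows.append([shift + a * f + b * rng.gauss(0.0, 1.0) for _ in range(n_genes)])
    return rows


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)
```

For instance, a two-group comparison with n = 6 per group could draw `equicorrelated_block(6, 140, 0.5)` for the control group and `equicorrelated_block(6, 140, 0.5, shift=delta)` for the treated group, where 140 is 20% of k = 700 genes and `delta` is a hypothetical effect size.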


Conclusion

We have proposed an adaptive alpha-spending algorithm for finding differentially expressed genes in microarray data sets, in which observed dependencies among genes are incorporated by assigning a gene-specific significance level to each gene. We have shown that the alpha-spending algorithm approximately controls the FWER under the complete null. In a simulation study, we have illustrated that the alpha-spending algorithm controls the PCER and FDR and improves power when applied to the ordinary t-test under specific conditions in two-group comparisons with equal group sizes. However, there may be situations in which the PCER is inflated, as was shown for the correlated genes under the complete null.

Supplementary Material

Data 1:
Data 2:
Data 3:
Data 4:

Acknowledgments

We thank Dr. Gary Gadbury for helpful contributions to earlier versions of this paper and Mr. Jelai Wang for help in developing early versions of the simulation code and the online supplement. We thank Dr. Purushotham Bangalore from the Department of Computer and Information Sciences (CIS) at the University of Alabama at Birmingham (UAB) for allowing us to use CPU cycles on the Everest cluster of UAB CIS, and Dr. Alan Shih for allowing us to use CPU cycles on the Cahaba cluster at The Enabling Technology Lab, which is part of the Mechanical Engineering Department at UAB. This research was supported in part by NIH grants P30DK56336, P01AG11915, R01AG018922, P20CA093753, R01AG011653, R01DK56366, R01ES09912, U24DK058776, and U54CA100949; NSF grant 0090286; and a grant from the University of Alabama Health Services Foundation.

Footnotes

Citation: Brand et al., Bioinformation 1(10): 384–389 (2007)

References

1. Mootha VK, et al. Nat Genet. 2003;34:267.
2. Lan KKG, DeMets DL. Biometrika. 1983;70:659.
3. Lachin JM. Stat Med. 2005;24:2747.
4. Benjamini Y, Hochberg Y. J R Stat Soc B. 1995;57:289.
5. Morris CN. J Am Stat Assoc. 1983;78:47.
6. Edwards JW, et al. Funct Integr Genomics. 2005;5:32.
7. Cherepinsky V, et al. Proc Natl Acad Sci USA. 2003;100:9668.
8. Cui XG, et al. Biostatistics. 2005;6:59.
