*p*-value evaluation is a crucial ingredient in statistical hypothesis testing. Most often, the *p*-value can be obtained from the asymptotic distribution of the test statistic when the sample size is sufficiently large. But in many applications, the asymptotic distribution is either unreliable due to insufficient sample size or unavailable for complicated test statistics. Resampling-based procedures, such as the permutation test (Good, 2005) or the bootstrap (Efron and Tibshirani, 1994), have been widely used for evaluating the *p*-value in such situations.
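As a concrete illustration of such a resampling scheme, the sketch below shows a basic permutation test for a difference in group means. The data, statistic, and iteration count are hypothetical, chosen only to make the generic procedure explicit; they are not taken from the studies discussed below.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10_000, rng=None):
    """Estimate the p-value of the difference-in-means statistic by
    randomly permuting group labels (a generic resampling scheme)."""
    rng = np.random.default_rng(rng)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # simulate data under the null
        stat = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        count += stat >= observed
    # add-one correction keeps the estimate strictly positive
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=50)   # hypothetical control group
y = rng.normal(1.0, 1.0, size=50)   # hypothetical case group
print(permutation_pvalue(x, y, n_perm=2000, rng=1))
```

Note that every iteration recomputes the test statistic on a full resampled data set, which is exactly the cost that becomes prohibitive when the statistic is expensive or the target *p*-value is tiny.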

The resampling-based procedure is easy to implement, but it requires repeated calculation of the test statistic on a large number of data sets simulated under some particular simulation scheme. In much current biomedical research, such as medical imaging studies (e.g. Naiman and Priebe, 2001) and genetic studies (e.g. Wellcome Trust, 2007), a large number of tests (sometimes over 1 million tests) are conducted in order to identify very few targets of interest. Invariably, when so many tests are conducted, we face the task of evaluating extremely small *p*-values. Also, if the family-wise false-positive rate is to be maintained at an acceptable level, any individual test must have an extremely small *p*-value to be declared globally significant. For instance, in the case of a genome-wide association study (GWAS) with half a million genetic markers, a commonly accepted threshold for global significance is 10^{ − 7}. Using standard resampling-based procedures to estimate such a small *p*-value can be a daunting task, as [100/*p*] iterations are needed to reach the precision of σ = *p*/10, where *p* denotes the true *p*-value and σ is the standard deviation of the resampling-based estimate (Kimmel and Shamir, 2006; Shi *and others*, 2007). Thus, it generally requires about 10^{9} iterations to achieve a reliable estimate for a *p*-value of 10^{ − 7}. This puts a huge computational burden on resampling-based testing procedures.
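The iteration count above follows from the binomial variance of the naive resampling estimate: with *N* iterations, the estimate has standard deviation sqrt(*p*(1 − *p*)/*N*) ≈ sqrt(*p*/*N*) for small *p*, so demanding σ = *p*/10 forces *N* ≈ 100/*p*. A quick numerical check of this arithmetic (the function name is ours, for illustration only):

```python
import math

def required_iterations(p, rel_sd=0.1):
    """Iterations N so that the naive Monte Carlo estimate of a p-value p
    has standard deviation sqrt(p*(1-p)/N) <= rel_sd * p."""
    return math.ceil(p * (1.0 - p) / (rel_sd * p) ** 2)

p = 1e-7
N = required_iterations(p)            # approximately 100 / p
sd = math.sqrt(p * (1.0 - p) / N)     # resulting standard deviation
print(N, sd)                          # roughly 1e9 iterations, sd close to p/10
```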

Our method was motivated by a novel analysis of the Cancer Genetic Markers of Susceptibility (CGEMS) genetic association study of prostate cancer (Yeager *and others*, 2007; Thomas *and others*, 2008). The analysis was performed on the stage II follow-up study, in which 3941 prostate cancer cases and 3964 controls were genotyped for 27 383 single nucleotide polymorphisms (SNPs) that showed promising association evidence in the CGEMS stage I GWAS. A number of GWAS, including CGEMS, have previously shown that the chromosomal region 8q24 contains multiple susceptibility SNPs for prostate cancer and some other cancers as well. In spite of its central role in cancer genetics, the biologic function of this region is not well understood because of the lack of known genes in this chromosomal region. The goal of our analysis of the CGEMS study was to scan for SNPs that may influence the risk of prostate cancer through interaction with 8q24 SNPs and thus could provide insight into the biology of the 8q24 region. We used the Tukey score test proposed by Chatterjee *and others* (2006) to conduct such a scan. Since the asymptotic distribution for the Tukey score test is not available, Chatterjee *and others* (2006) proposed a resampling-based procedure for the *p*-value evaluation. For detection of a significant SNP among 27 383 SNPs at a family-wise type I error rate of 0.05, the Bonferroni-corrected *p*-value threshold for individual SNPs is 1.83×10^{ − 6}. To accurately evaluate *p*-values around or below this level, according to the above argument, the standard resampling-based procedure needs at least 10^{8} iterations. However, due to its complex nature and large sample size, a single run of the Tukey score test for this project takes about 0.02 s of CPU time on a 2.6 GHz Opteron Linux machine. As a result, the *p*-value evaluation for a single SNP using the resampling-based test with 10^{8} iterations requires a total of 556 h of CPU time. Thus, it is computationally very challenging, if not infeasible, to apply the standard resampling-based test to this project.
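The threshold and run time quoted for the CGEMS scan follow from simple arithmetic, checked below (the 0.02 s per test and 10^{8} iterations are the figures stated above):

```python
n_snps = 27_383
alpha = 0.05
bonferroni = alpha / n_snps
print(f"{bonferroni:.2e}")        # per-SNP threshold, about 1.83e-06

iterations = 10**8
seconds_per_test = 0.02           # CPU time per Tukey score test run
cpu_hours = iterations * seconds_per_test / 3600
print(round(cpu_hours))           # about 556 h for a single SNP
```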

To alleviate the computational obstacle presented by the standard resampling-based procedure, various alternatives have been proposed. Theoretical approximation formulas have been derived for particular problems (e.g. Siegmund, 1988; Feingold *and others*, 1993; Conneely and Boehnke, 2007; Li *and others*, 2008). Monte Carlo methods based on importance sampling have been developed for some specific applications, such as the scan statistic for genetic linkage studies (Anquist and Hossjer, 2004; Shi *and others*, 2007) and the genetic association tests based on contingency table analysis (Kimmel and Shamir, 2006). Although those alternatives work well for their targeted applications, they are not sufficiently general to be adapted readily to other applications. They achieve computational efficiency at the cost of application generality.

In this paper, we propose a *p*-value evaluation procedure that achieves both computational efficiency and application generality. The procedure is based on the recently developed stochastic approximation–based Monte Carlo algorithm (SAMC) (Liang *and others*, 2007). Once a simulation scheme (e.g. random permutation of the outcome) has been selected for generating the data under the null hypothesis, the SAMC algorithm can be used to evaluate the distribution of the test statistic effectively under that simulation scheme and to estimate the *p*-value in a dramatically reduced number of iteration steps.