Association studies have emerged as a powerful tool for discovering the genetic basis of human diseases. With the development of sequencing and high-throughput genotyping technologies, the number of single nucleotide polymorphism (SNP) markers genotyped by current association studies is dramatically increasing. The large number of correlated markers brings to the forefront the multiple hypothesis testing correction problem and has motivated much recent activity to address it.
There are two common versions of the multiple testing correction problem: per-marker threshold estimation and p-value correction. In a typical study that collects many markers, we perform a statistical test at each marker and obtain a p-value, which we refer to as a pointwise p-value. We would like to know how significant a pointwise p-value needs to be in order to obtain a significant result, given the number of markers we are observing. The per-marker threshold can be defined as the threshold for pointwise p-values which controls the probability of one or more false positives. Similarly, we would like to quantitatively measure the significance of a pointwise p-value, taking into account the number of markers we are observing. For each pointwise p-value, the corrected p-value can be defined as the probability that, under the null hypothesis, a p-value equal to or smaller than the pointwise p-value will be observed at any marker. For example, given M markers and a significance threshold α, the Bonferroni correction corrects a pointwise p-value p to Mp, or equivalently estimates the per-marker threshold as α/M.
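As an illustration, these standard corrections are simple to compute (a minimal sketch; M, p, and α denote the number of markers, a pointwise p-value, and the significance threshold):

```python
def bonferroni(p, m):
    """Bonferroni-corrected p-value for a pointwise p-value p over m markers."""
    return min(1.0, m * p)

def sidak(p, m):
    """Sidak-corrected p-value; exact only when the m tests are independent."""
    return 1.0 - (1.0 - p) ** m

def per_marker_threshold(alpha, m):
    """Bonferroni per-marker threshold for significance level alpha."""
    return alpha / m

# Example: 500,000 markers and a pointwise p-value of 1e-7
corrected = bonferroni(1e-7, 500_000)            # 0.05
threshold = per_marker_threshold(0.05, 500_000)  # 1e-7
```

Note that the Šidák correction is always slightly less conservative than Bonferroni, but both ignore correlation between markers.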
While the Bonferroni (or Šidák) correction provides the simplest way to correct for multiple testing by assuming independence between markers, permutation testing is widely considered the gold standard for accurately correcting for multiple testing. However, permutation is often computationally intensive for large datasets. For example, running 1 million permutations for a dataset of 500,000 SNPs over 5,000 samples takes up to 4 CPU years using widely used software such as PLINK. On the other hand, the Bonferroni (or Šidák) correction ignores correlation between markers and leads to an overly conservative correction, which is exacerbated as the marker density increases.
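The permutation test can be sketched as follows (an illustrative minP permutation on synthetic data, using the absolute marker-phenotype correlation as the per-marker statistic; real tools such as PLINK use the study's actual association statistic):

```python
import numpy as np

def min_p_permutation(X, y, n_perm=500, seed=0):
    """Illustrative minP permutation test: permute the phenotype, recompute the
    most extreme per-marker statistic each time, and report the fraction of
    permutations at least as extreme as the observed maximum.  X is an
    (individuals x markers) genotype matrix, y the phenotype vector."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each marker
    yc = (y - y.mean()) / y.std()
    obs = np.abs(Xc.T @ yc / n).max()           # observed max |correlation|
    count = 0
    for _ in range(n_perm):
        yp = rng.permutation(yc)                # break marker-phenotype link
        if np.abs(Xc.T @ yp / n).max() >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)           # corrected p-value estimate
```

Because every permutation rescans all markers, the cost grows with both the number of markers and the number of permutations, which is the bottleneck described above.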
In this paper, we correct for multiple testing using the framework of the multivariate normal distribution (MVN). For many widely used statistical tests, the statistics over multiple markers asymptotically follow an MVN. Using this observation, several recent studies proposed efficient alternative approaches to the permutation test, and showed that they are as accurate as the permutation test for small regions of the size of candidate gene studies (with <1% average error in corrected p-values). However, when applied to genome-wide datasets, they are not as accurate. In our analysis of the Wellcome Trust Case Control Consortium (WTCCC) data, these methods eliminate only two-thirds of the error in the corrected p-values relative to the Bonferroni correction. There are two main reasons why these methods do not eliminate all of the error. First, the previous MVN-based methods can be extended to genome-wide analyses only by partitioning the genome into small linkage disequilibrium (LD) blocks and assuming markers in different blocks are independent, because in practice they can handle only up to hundreds of markers. This block-wise strategy leads to conservative estimates because inter-block correlations are ignored. Second, these methods do not account for the previously unrecognized phenomenon that the true null distribution of a test statistic often fails to follow the asymptotic distribution at the extreme tails, even with thousands of samples.
(Figure: Block-wise strategy and sliding-window approach.)
We propose a method for multiple testing correction called SLIDE (a Sliding-window approach for Locally Inter-correlated markers with asymptotic Distribution Errors corrected), which differs from previous methods in two aspects. First, SLIDE uses a sliding-window approach instead of the block-wise strategy. SLIDE approximates the correlation matrix as a band matrix (a matrix whose non-zero elements lie along a diagonal band), which can effectively characterize the overall correlation structure between markers given a sufficiently large bandwidth. SLIDE then uses a sliding-window Monte Carlo approach which samples a statistic at each marker by conditioning on the statistics at previous markers within the window, accounting for the entire correlation structure captured in the band matrix.
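The sliding-window sampling idea can be sketched as follows (an illustrative Monte Carlo sampler, not SLIDE itself: each marker's statistic is drawn from its conditional normal distribution given the previous window-1 statistics, so only the band of the correlation matrix is ever used):

```python
import numpy as np

def sliding_window_mvn_sample(R, w, n_samples, seed=0):
    """Sample null statistics from an MVN with correlation matrix R using a
    sliding window of size w: marker i is drawn conditionally on the previous
    (at most w-1) markers, which requires only the band of R."""
    rng = np.random.default_rng(seed)
    m = R.shape[0]
    Z = np.empty((n_samples, m))
    Z[:, 0] = rng.standard_normal(n_samples)
    for i in range(1, m):
        j = max(0, i - (w - 1))          # start of the conditioning window
        Rww = R[j:i, j:i]                # correlations among previous markers
        r = R[j:i, i]                    # correlations with the current marker
        coef = np.linalg.solve(Rww, r)   # conditional regression coefficients
        var = 1.0 - r @ coef             # conditional variance
        mu = Z[:, j:i] @ coef            # conditional mean per sample
        Z[:, i] = mu + np.sqrt(max(var, 1e-12)) * rng.standard_normal(n_samples)
    return Z
```

When the bandwidth covers the true extent of local LD, the samples reproduce the full correlation structure, while the cost per marker stays bounded by the window size rather than the genome size.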
Second, SLIDE takes into account the phenomenon that the true null distribution of a test statistic often fails to follow the asymptotic distribution at the tails. It is well known that the true and asymptotic distributions diverge when the sample size is small. However, to the best of our knowledge, the effect of this discrepancy in the context of association studies has not been recognized, since thousands of samples are typically not considered a small sample. We observe that this discrepancy often appears in genome-wide association studies, even with thousands of samples, because of the extremely small genome-wide per-marker threshold (or pointwise p-value). The error caused by this discrepancy is more serious for datasets with a large number of rare variants, highlighting the importance of this problem for association studies based on next-generation sequencing technologies (see Materials and Methods). SLIDE corrects for this error by scaling the asymptotic distribution to fit the true distribution.
With these two advances, SLIDE is as accurate as the permutation test. In our simulation using the WTCCC dataset, the error rate of SLIDE's corrected p-values is more than 20 times smaller than that of previous MVN-based methods, and 80 times smaller than that of the Bonferroni-corrected p-values. Our simulation using the 2.7 million HapMap SNPs shows that SLIDE is accurate for higher-density marker datasets as well. In contrast, the error rates of previous MVN-based methods increase with the marker density, since denser datasets include more rare variants. Computationally, our simulation shows that SLIDE is orders of magnitude faster than the permutation test and faster than other competing methods.
The MVN framework for multiple testing correction is very general, allowing it to be applied to many different contexts such as quantitative trait mapping or multiple disease models. We show that the MVN framework can also correct for multiple testing for the weighted haplotype test and for the test of imputed genotypes based on posterior probabilities.
In addition to multiple testing correction, we extend the MVN framework to solve the problem of estimating the statistical power of an association study with correlated markers. There are two traditional approaches to this problem: a simulation approach that constructs case/control panels from a reference dataset, which is widely considered the standard but is computationally intensive; and the best-tag Bonferroni method, which is an efficient approximation but is often inaccurate.
The power estimation problem can be solved within the MVN framework because the test statistic under the alternative hypothesis follows an MVN centered at the non-centrality parameters (NCPs). The vector of NCPs turns out to be approximately proportional to the vector of correlation coefficients (r) between the causal SNP and the markers. This is a multi-marker generalization of the Pritchard and Przeworski single-marker derivation of the NCP as proportional to r. Our method SLIP (a Sliding-window approach for Locally Inter-correlated markers for Power estimation) efficiently estimates a study's power using the MVN framework.
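Given these observations, power estimation reduces to a Monte Carlo computation over a shifted MVN. The following is a sketch of the idea, not SLIP's optimized sliding-window implementation; R is the marker correlation matrix, r the causal-to-marker correlations, and z_threshold a per-marker z cutoff (all hypothetical inputs for illustration):

```python
import numpy as np

def mvn_power(R, r, ncp_causal, z_threshold, n_samples=20000, seed=0):
    """Monte Carlo power estimate: marker statistics follow an MVN with
    correlation R, centered at NCPs approximately equal to ncp_causal * r.
    Power is the probability that at least one marker exceeds the cutoff."""
    rng = np.random.default_rng(seed)
    mean = ncp_causal * np.asarray(r, dtype=float)   # per-marker NCPs
    L = np.linalg.cholesky(R)
    Z = rng.standard_normal((n_samples, R.shape[0])) @ L.T + mean
    return float(np.mean((np.abs(Z) > z_threshold).any(axis=1)))
```

Setting ncp_causal to zero recovers the null case, so the same machinery also yields the false positive rate at a given per-marker threshold.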
Seaman and Müller-Myhsok and Lin pioneered the use of the MVN for multiple testing correction. Seaman and Müller-Myhsok described the direct simulation approach (DSA). Conneely and Boehnke increased its efficiency by adapting an available software package called mvtnorm. Both studies primarily focused on datasets used in candidate gene studies and suggested the block-wise strategy as a possible approach for genome-wide studies.
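The direct simulation idea can be sketched as follows (an illustrative DSA-style correction, not the published implementation; R is the marker correlation matrix and p_point the pointwise p-value to correct):

```python
import numpy as np
from statistics import NormalDist

def dsa_corrected_pvalue(R, p_point, n_samples=20000, seed=0):
    """DSA-style corrected p-value: sample null statistics from the MVN with
    correlation R and report how often any marker reaches a two-sided
    pointwise p-value <= p_point."""
    rng = np.random.default_rng(seed)
    z_cut = NormalDist().inv_cdf(1.0 - p_point / 2.0)  # two-sided z cutoff
    L = np.linalg.cholesky(R)                          # factor the full matrix
    Z = rng.standard_normal((n_samples, R.shape[0])) @ L.T
    return float(np.mean((np.abs(Z) >= z_cut).any(axis=1)))
```

The Cholesky factorization of the full correlation matrix is what limits this approach to hundreds of markers in practice, which is exactly the limitation the block-wise and sliding-window strategies address.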
Another approach for multiple testing correction is to estimate the effective number of tests from the eigenvalues of the correlation matrix. Recently, Moskvina and Schmidt and Pe'er et al. showed that the effective number of tests varies with the p-value level, demonstrating that a method estimating a constant effective number can be inaccurate. Moskvina and Schmidt proposed a pairwise correlation-based method called Keffective, which estimates the effective number taking the significance level into account. Keffective is a sliding-window approach similar to SLIDE, but it differs in that within each window it uses the pairwise correlation to the most correlated marker, while SLIDE uses the conditional distribution given all markers in the window. Fitting the minimum p-value distribution with a beta distribution has often been shown to be inaccurate. Kimmel and Shamir developed an importance sampling procedure called the rapid association test (RAT). RAT is efficient for correcting very significant p-values, but requires phased haplotype data.
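For reference, one common eigenvalue-based estimate is the Cheverud/Nyholt-style formula (a sketch; as noted above, any single constant effective number can be inaccurate because the effective number varies with the p-value level):

```python
import numpy as np

def effective_number_of_tests(R):
    """Cheverud/Nyholt-style effective number of tests from the eigenvalues
    of the correlation matrix R: Meff = 1 + (M - 1) * (1 - Var(lambda) / M)."""
    lam = np.linalg.eigvalsh(R)
    m = len(lam)
    return 1.0 + (m - 1.0) * (1.0 - np.var(lam) / m)
```

For independent markers the eigenvalues are all one, so Meff equals the number of markers; as correlation grows, the eigenvalue variance increases and Meff shrinks toward one.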
Connecting the multiple testing correction and power estimation problems leads to the insight that the per-marker threshold estimated from the reference dataset for estimating power can be used as a precomputed approximation to the true per-marker threshold for the collected samples. In simulations using the WTCCC control data, we show that the per-marker threshold estimated from the HapMap CEU population data approximately controls the false positive rate.
Our methods SLIP and SLIDE require only summary statistics, such as the correlations between markers within the window, allele frequencies, and the number of individuals. Therefore, unlike the permutation test, our methods can be applied even when the actual genotype data are not accessible. Our methods are available at http://slide.cs.ucla.edu.