


Ann Hum Genet. Author manuscript; available in PMC 2012 March 1.

Published in final edited form as: Ann Hum Genet. Published online 2010 November 25. doi: 10.1111/j.1469-1809.2010.00626.x

PMCID: PMC3117224

NIHMSID: NIHMS302257

The publisher's final edited version of this article is available free at Ann Hum Genet


New technology for large-scale genotyping has created new challenges for statistical analysis. Correcting for multiple comparisons without discarding true positive results, and extending methods to triad studies, are among the important problems facing statisticians. We present a one-sample permutation test for testing transmission disequilibrium hypotheses in triad studies, and show how this test can be used to test multiple single nucleotide polymorphisms (SNPs). In the case of the transmission disequilibrium test, the resulting multiple comparison procedure is shown to control the familywise error rate. Furthermore, this procedure can handle multiple possible modes of risk inheritance per SNP. Simulation of SNP data shows the resulting permutational procedure to be more powerful than the Bonferroni procedure when the SNPs are in linkage disequilibrium. Moreover, permutations implicitly avoid any multiple comparison correction penalty for SNPs with rare alleles. The method is illustrated by analyzing a large candidate gene study of neural tube defects and an independent study of oral clefts, where the smallest adjusted p-values from the permutation procedure are approximately half those from the Bonferroni procedure. We conclude that permutation tests are more powerful for identifying disease-associated SNPs in candidate gene studies and are useful for the analysis of triad studies.

Advances in technology have led to an increase in large genetic association studies of disease. Along with the ability to look at large numbers of single nucleotide polymorphisms (SNPs) has come the need for improved methods of statistical correction for multiplicity. The Bonferroni procedure is the simplest and most often used method of correction. However, the Bonferroni procedure is well known to be overly conservative in the presence of correlation (Han et al., 2009).

Permutation tests implicitly account for correlation by permuting entire data vectors, thereby improving power over Bonferroni-type methods. Permutation tests are also the only multiple comparison procedures capable of exact error control for small or moderate sample sizes. Multiple comparison procedures based on permutations have several other attractive features. One is that they implicitly reduce the penalty for comparisons involving rare events (Westfall & Troendle, 2008), such as a SNP whose rare allele is so uncommon that its test cannot produce a p-value small enough to affect the permutational correction for multiplicity. The permutational procedure essentially treats such a SNP as untested, effectively reducing the correction factor. This is an important advantage when some of the SNPs under study have rare alleles. Another advantage of permutation tests is that they easily handle multiple tests of the same hypothesis. Genetic association studies commonly consider several modes of inheritance, typically dominant, recessive, and multiplicative, and each leads to a different test of the null hypothesis. Permutational procedures need only consider the minimum p-value across all tests and SNPs to produce adjusted p-values that account for both the multiple SNPs and the multiple tests applied to each SNP.
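To make the rare-allele point concrete, the sketch below computes the smallest p-value an exact binomial TDT can attain for a SNP with a given number of informative (heterozygous-parent) transmissions. It assumes a two-sided p-value formed by doubling the smaller binomial tail and truncating at 1, which is one common convention and not necessarily the one used in this study; the function names are illustrative.

```python
from math import comb


def exact_tdt_pvalue(b, c):
    """Two-sided exact binomial TDT p-value: under H0, b ~ Binomial(b + c, 1/2).

    Assumed convention: double the smaller tail, truncate at 1.
    """
    n = b + c
    if n == 0:
        return 1.0  # no heterozygous parents: the SNP carries no transmission information
    lower_tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


def smallest_attainable_pvalue(n_informative):
    """Most extreme possible outcome: every informative transmission goes the same way."""
    return exact_tdt_pvalue(n_informative, 0)


for n_inf in (1, 3, 5, 8, 20):
    print(n_inf, smallest_attainable_pvalue(n_inf))
```

With three informative transmissions the p-value can never fall below 0.25, so such a SNP essentially never supplies the minimum p-value in a permuted dataset and adds almost nothing to the multiplicity correction.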

Another statistical challenge is providing improved methods for family-based studies. Some diseases, such as birth defects, are well suited to collecting genetic information on triads (case child, mother, father). The use of triads avoids two problems: (1) the ascertainment bias inherent in control selection (Schlesselman, 1982) and (2) population stratification, in which the case group may contain a different proportion of an ethnic group than the control group. Both can inflate the type I error rate of association tests in case–control designs (Lee & Wang, 2008). Triads permit methods that condition on the parental genotypes and are therefore robust to population stratification. A very common genetic association test for a single bi-allelic locus (e.g., a SNP), based only on triad data, is the transmission disequilibrium test (TDT) (Spielman et al., 1993). This test can be obtained as a likelihood ratio test for the child's genotype in a multiplicative model, conditioned on the parental genotypes.

In this report, we describe a one-sample permutational approach for the inheritance-association hypothesis, and show how it can be used when correcting for multiplicity. The resulting multiple comparison procedure permits testing multiple SNPs, with multiple tests of each SNP to allow for different inheritance models. We show that the method strongly controls the familywise error rate (FWE), regardless of sample size. Simulations show the permutational procedure to have more power than the Bonferroni procedure under varying inheritance modes, risk allele proportions, and SNP correlations. We analyze a study of candidate genes in neural tube defect (NTD [MIM #182940]) triads as well as a study of oral cleft (OFC1 [MIM #119530]) triads in Ireland, showing that the smallest adjusted p-values from the permutational procedure can be approximately half those of the Bonferroni procedure.

We start with the classical one-sample permutation test that arises from a pre-test/post-test design, in which a single sample of subjects is observed before and after some intervention. Let $(Z_i, W_i)$ be the pre-intervention and post-intervention values of the measured variable on subject $i$, $i = 1, \dots, n$. A test of whether the intervention changed the measured variable is based on the differences, $D_i$ =
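A minimal sketch of this classical one-sample (sign-flip) permutation test follows. Under the null hypothesis the two measurements on each subject are exchangeable, so each difference is equally likely to have either sign. The sign convention $D_i = W_i - Z_i$, the number of random permutations, and counting the observed data as one permutation are assumptions of this sketch.

```python
import random


def one_sample_permutation_test(z, w, n_perm=10_000, seed=0):
    """Two-sided one-sample (sign-flip) permutation test for paired data.

    z, w: pre- and post-intervention values. Under H0 the two values on each
    subject are exchangeable, so swapping them (which flips the sign of D_i)
    is as likely as not. D_i = w_i - z_i is an assumed sign convention; the
    two-sided statistic |sum of D_i| does not depend on it.
    """
    d = [wi - zi for zi, wi in zip(z, w)]
    observed = abs(sum(d))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        total = sum(di if rng.random() < 0.5 else -di for di in d)
        if abs(total) >= observed:
            hits += 1
    # The observed data counts as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```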

Consider the case of testing for genetic association using genotype data on triads. Designate the allele of interest $A$, and denote the other allele $G$ (the labels are arbitrary; the alleles need not actually be A and G). The transmission data from $n$ triads are summarized in Table 1. The TDT is based on whether or not each heterozygous parent transmits the designated allele to the case child (Spielman et al., 1993). The null hypothesis is that there is no association between disease and transmission of the designated allele. The standard $\chi^2$ test for this hypothesis has test statistic $TS = (b - c)^2/(b + c)$, where $b$ and $c$ (Table 1) count the heterozygous parents who do and do not transmit $A$, respectively; under the null hypothesis, $TS$ has an asymptotic ${\chi}_{\left(1\right)}^{2}$ distribution.
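The statistic can be computed directly from the two transmission counts. The sketch below uses hypothetical counts $b = 30$ and $c = 15$ and the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ for the asymptotic p-value, so only the Python standard library is needed.

```python
from math import erfc, sqrt


def tdt_chi_square(b, c):
    """TDT statistic TS = (b - c)^2 / (b + c) and its asymptotic p-value.

    For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    if b + c == 0:
        raise ValueError("no heterozygous parents: the TDT is undefined")
    ts = (b - c) ** 2 / (b + c)
    return ts, erfc(sqrt(ts / 2))


ts, p = tdt_chi_square(30, 15)  # hypothetical counts
print(f"TS = {ts:.2f}, p = {p:.4f}")
```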

The data for the $i$th triad can be represented as ($Z_i$,

A full permutation test of the null hypothesis considers each of the possible permuted datasets ($Z_1$, $W_1$), …, ($Z_n$,
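For small $n$ the full permutation test can be carried out by brute force. This sketch assumes a hypothetical per-triad summary for a single SNP, $(z_i, w_i)$, the numbers of heterozygous parents in triad $i$ that transmit $A$ and $G$. Permuting a triad swaps the two counts, and all $2^n$ swap patterns are enumerated.

```python
from itertools import product


def tdt_stat(b, c):
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def full_permutation_tdt(triads):
    """Exact permutation p-value for one SNP by enumerating all 2**n datasets.

    triads: list of (z_i, w_i), the numbers of heterozygous parents in triad i
    transmitting A and G (hypothetical per-triad summary). Permuting triad i
    swaps the pair; every subset of swaps is one permuted dataset.
    """
    observed = tdt_stat(sum(z for z, _ in triads), sum(w for _, w in triads))
    hits = 0
    for swaps in product((False, True), repeat=len(triads)):
        b_star = sum(w if s else z for (z, w), s in zip(triads, swaps))
        c_star = sum(z if s else w for (z, w), s in zip(triads, swaps))
        if tdt_stat(b_star, c_star) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / 2 ** len(triads)


p = full_permutation_tdt([(1, 0), (2, 0), (1, 0), (1, 1), (0, 1)])  # hypothetical triads
```

For these five hypothetical triads, 16 of the 32 permuted datasets give a statistic at least as large as the observed one, so the exact p-value is 0.5.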

In most applications, a full permutation test is not computationally feasible; for example, a study with $n$ triads has $2^n$ different permuted datasets. In such cases a random sample of permuted datasets is used instead. For a random permutation test, let $TS^r$ be the test statistic computed from the
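A Monte Carlo version replaces the enumeration with $M$ random swap patterns. The sketch reuses the hypothetical $(z_i, w_i)$ encoding. The estimator $(1 + \#\{TS^r \ge TS\})/(M + 1)$ is a common convention that keeps the estimate valid; the paper's own definition is cut off here and may differ.

```python
import random


def tdt_stat(b, c):
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def random_permutation_tdt(triads, n_perm=10_000, seed=1):
    """Monte Carlo permutation p-value for one SNP (hypothetical (z_i, w_i) encoding)."""
    rng = random.Random(seed)
    observed = tdt_stat(sum(z for z, _ in triads), sum(w for _, w in triads))
    hits = 0
    for _ in range(n_perm):
        b_star = c_star = 0
        for z, w in triads:
            if rng.random() < 0.5:  # swap this triad's Z and W with probability 1/2
                z, w = w, z
            b_star += z
            c_star += w
        if tdt_stat(b_star, c_star) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```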

Suppose now that there are data from $n$ triads on $k$ SNPs. One might want to test the null hypothesis that disease is not associated with any of the $k$ SNPs. This leads naturally to a multivariate permutation test. Suppose that for each SNP one allele is designated $A$ and the other $G$. The data for the $i$th triad on the $j$th SNP can now be represented as ($Z_{ij}$,

$${Z}_{i}=\left(\begin{array}{c}{Z}_{i1}\hfill \\ {Z}_{i2}\hfill \\ \vdots \hfill \\ {Z}_{\mathit{ik}}\hfill \end{array}\right)$$

and

$${W}_{i}=\left(\begin{array}{c}{W}_{i1}\hfill \\ {W}_{i2}\hfill \\ \vdots \hfill \\ {W}_{\mathit{ik}}\hfill \end{array}\right).$$

The null hypothesis is that, at every SNP, transmission of the designated allele has no effect on the risk of being a case. Many test statistics could be chosen to test this composite (overall) null hypothesis, but a simple and very effective choice is the maximum of the individual TDT statistics for the SNPs. Therefore, denote $TS$ = max{$TS_j$ :

The multi-SNP case illustrates one of the advantages of permutations. Permutations estimate the exact conditional distribution of $TS$ under the null hypothesis. In contrast, a parametric approach requires the distribution of the maximum of $k$ correlated ${\chi}_{\left(1\right)}^{2}$ variables, which cannot be obtained without resorting to asymptotics or approximations, and neither works well as $k$ increases.
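The sketch below applies the max-statistic test to $k$ SNPs by swapping each triad's whole $Z_i$ and $W_i$ vectors, so all SNPs in a triad are permuted together and their correlation is preserved. It assumes the row layout used in the two-SNP example of this paper (two rows per SNP, one per parent); the function names, the seed, and the $(1 + \text{count})/(M + 1)$ estimator are illustrative choices.

```python
import random


def tdt_stat(b, c):
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def max_tdt(Z, W, k):
    """TS = max_j TS_j. Each Z[i], W[i] has 2k rows; rows 2j and 2j + 1 hold the
    two parents' transmission indicators for SNP j (assumed layout)."""
    best = 0.0
    for j in range(k):
        rows = (2 * j, 2 * j + 1)
        b = sum(z[r] for z in Z for r in rows)
        c = sum(w[r] for w in W for r in rows)
        best = max(best, tdt_stat(b, c))
    return best


def multi_snp_permutation_test(Z, W, k, n_perm=5_000, seed=2):
    """Monte Carlo permutation p-value for the global null over all k SNPs."""
    rng = random.Random(seed)
    observed = max_tdt(Z, W, k)
    hits = 0
    for _ in range(n_perm):
        Zs, Ws = [], []
        for z, w in zip(Z, W):
            if rng.random() < 0.5:
                z, w = w, z  # swap whole vectors: every SNP in the triad at once
            Zs.append(z)
            Ws.append(w)
        if max_tdt(Zs, Ws, k) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```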

Suppose there are $n$ triads genotyped on two SNPs. Here we show in detail what the permutations might look like and how they are obtained. Figure 1 shows the first three triads from a hypothetical dataset. Using the notation of the previous section, the data vectors for the first three triads are

$${Z}_{1}=\left(\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 0\hfill \end{array}\right),\phantom{\rule{thinmathspace}{0ex}}{W}_{1}=\left(\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{array}\right),$$

$${Z}_{2}=\left(\begin{array}{c}\hfill 1\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{array}\right),\phantom{\rule{thinmathspace}{0ex}}{W}_{2}=\left(\begin{array}{c}\hfill 0\hfill \\ \hfill 1\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{array}\right),$$

and

$${Z}_{3}=\left(\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 0\hfill \end{array}\right),\phantom{\rule{thinmathspace}{0ex}}{W}_{3}=\left(\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{array}\right).$$

The first two rows of the $Z$ and $W$ vectors correspond to the parents' transmissions at SNP1, whereas the latter two rows correspond to transmissions at SNP2. Rows in which both the $Z$ and $W$ components are 0 correspond to parents who are not heterozygous. Permutations are applied to each triad's $Z$ and $W$ pair as a unit. Thus, if ${Z}_{1}^{\ast}$ denotes the permuted $Z_1$ vector, ${Z}_{1}^{\ast}$ will be either $\left(\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 1\hfill \\ \hfill 0\hfill \end{array}\right)$ or $\left(\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \\ \hfill 0\hfill \end{array}\right)$ with equal probability. After each $Z$ and $W$ pair is independently permuted, one has a new dataset ${Z}_{1}^{\ast},{W}_{1}^{\ast},\dots ,{Z}_{n}^{\ast},{W}_{n}^{\ast}$, which is then used to obtain $TS^1$. Expressed in terms of the $Z$ and $W$ vectors, the TDT statistic given previously is $(b - c)^2/(b + c)$, where, for each SNP, $b$ is the sum over all triads of that SNP's $Z$ components and $c$ is the corresponding sum of its $W$ components. This process is repeated $M$ times to obtain $TS^r$ for
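The counts for the worked example can be checked in a few lines. The sketch uses only the three triads shown in Figure 1, so the resulting $b$ and $c$ are partial sums rather than full-data statistics, and it enumerates the $2^3$ permuted versions of these triads, in each of which ${Z}_{i}^{\ast}$ is either $Z_i$ or $W_i$.

```python
from itertools import product

# First three triads of the hypothetical two-SNP dataset in Figure 1.
# Rows: SNP1 parent 1, SNP1 parent 2, SNP2 parent 1, SNP2 parent 2.
Z = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
W = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]


def counts(Z, W, snp):
    """(b, c) for one SNP: sums of its Z rows and of its W rows over all triads."""
    rows = (2 * snp, 2 * snp + 1)
    return (sum(z[r] for z in Z for r in rows),
            sum(w[r] for w in W for r in rows))


# Enumerate all 2**3 permuted datasets: each triad keeps or swaps (Z_i, W_i).
permuted = []
for swaps in product((False, True), repeat=len(Z)):
    Zs = [w if s else z for z, w, s in zip(Z, W, swaps)]
    Ws = [z if s else w for z, w, s in zip(Z, W, swaps)]
    permuted.append((Zs, Ws))

print(counts(Z, W, 0), counts(Z, W, 1))  # (5, 1) (2, 0) from these three triads
```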

Consider again the situation described in the previous section, with data from $n$ triads on $k$ SNPs. Let $p_j$ be the exact binomial TDT p-value for SNP

Sequential versions of the Bonferroni procedure, such as the Holm procedure (Holm, 1976), provide improvements. However, in large genetic association studies typically few SNPs survive correction, and the improvement over the Bonferroni procedure is small unless the fraction of rejected null hypotheses is substantial. In this paper, we consider only single-step procedures, which test each hypothesis without regard to whether any other hypotheses are rejected.
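For reference, the single-step Bonferroni adjustment with exact binomial TDT p-values is sketched below. The two-sided convention (doubling the smaller tail, truncated at 1) is an assumption, since the definition of $p_j$ is cut off in this text, and the counts in the usage lines are hypothetical.

```python
from math import comb


def exact_tdt_pvalue(b, c):
    """Two-sided exact binomial TDT p-value (b ~ Binomial(b + c, 1/2) under H0)."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni_adjust(pvalues):
    """Single-step Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


# Hypothetical (b, c) counts for three SNPs.
p = [exact_tdt_pvalue(b, c) for b, c in [(20, 6), (12, 9), (3, 0)]]
print(bonferroni_adjust(p))
```

In the neural tube defect analysis, `min(1.0, 1339 * 0.002233)` evaluates to 1.0 because the unadjusted product is about 2.99.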

A permutational procedure is easily applied to this problem by slightly modifying the multivariate permutation version of the TDT described in the previous section. Let $TS^r$ = max{
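Because the definition of $TS^r$ is truncated here, the sketch below uses the minimum-p-value form mentioned in the introduction. In each permuted dataset it takes the smallest exact binomial p-value over all SNPs, and the adjusted p-value for SNP $j$ is the fraction of datasets (observed data counted once) whose minimum is at or below $p_j$. This is the standard single-step minP construction; the per-SNP triad summary $(z, w)$, the seed, and all function names are hypothetical.

```python
import random
from math import comb


def exact_tdt_pvalue(b, c):
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def snp_pvalues(triads, k):
    """triads: list of (z, w); z[j] and w[j] count the parents in the triad who
    transmit A and G at SNP j (a hypothetical summary of Z_i and W_i)."""
    return [exact_tdt_pvalue(sum(z[j] for z, _ in triads),
                             sum(w[j] for _, w in triads))
            for j in range(k)]


def minp_adjusted(triads, k, n_perm=2_000, seed=3):
    """Single-step minP permutation-adjusted p-values, one per SNP."""
    rng = random.Random(seed)
    observed = snp_pvalues(triads, k)
    min_perm = []
    for _ in range(n_perm):
        # Swap each triad's whole (z, w) pair, i.e. all SNPs together.
        swapped = [(w, z) if rng.random() < 0.5 else (z, w) for z, w in triads]
        min_perm.append(min(snp_pvalues(swapped, k)))
    # Small relative tolerance so equal exact p-values count as "as extreme".
    return [(1 + sum(m <= p * (1 + 1e-9) for m in min_perm)) / (n_perm + 1)
            for p in observed]
```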

Let $H_j$ be the null hypothesis for SNP

The JDC here says, in essence, that if the null hypothesis holds for individual SNPs, then it holds in a multivariate sense for those same SNPs taken as a collection. The advantage of the JDC is that, when it holds, multivariate permutation gives the exact joint distribution of the test statistics under the null hypothesis. Without the JDC, the multivariate distribution of the test statistics would be unknown, even though each statistic would have a known marginal distribution.

When the JDC holds, it is easy to see that the procedure controls the FWE regardless of which hypotheses are true and regardless of sample size. We now show that the JDC does hold for the genetic association tests of multiple SNPs considered here. The reasoning is that because each $H_j$,

The JDC does not always hold. In the usual two-sample case of case–control comparisons on multiple SNPs, it would not be expected to hold in general, and correction for multiplicity based on multivariate permutation of cases and controls then does not strongly control the familywise error rate unless the JDC is assumed. If for any reason the covariance structure of the SNPs were different for cases than for controls, the JDC would not hold. One way such differential correlation could arise is through a particular type of interaction between SNPs. In the presence of interaction, however, it might be seen as a benefit rather than a drawback that the method may reject the null hypothesis for SNPs that are part of the interaction, although technically this would be a familywise error for the multiple comparison procedure.

Often in genetic association studies one would like to apply several tests of the same null hypothesis. Typically, dominant, recessive, and multiplicative inheritance models are assumed for a given SNP, leading to different tests of the null hypothesis of no association. Whatever inheritance models are used to test the hypothesis of no association between the $j$th SNP and disease, the only modification needed in the permutational procedure is that the max in the definition of $TS^r$ extends also over the different tests applied to the
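A minimal sketch of this extension follows. It assumes that p-values for each SNP and each inheritance-model test have already been computed on the observed data and on every permuted dataset; how the dominant, recessive, and multiplicative tests are constructed is not shown. The only change from the one-test-per-SNP case is that the minimum runs over SNPs and tests together.

```python
def adjusted_over_snps_and_tests(observed, permuted):
    """Single-step minP adjusted p-values over SNPs and inheritance-model tests.

    observed[j][t]    : p-value of test t (e.g. dominant, recessive,
                        multiplicative) for SNP j on the observed data.
    permuted[r][j][t] : the same p-value recomputed on permuted dataset r;
                        all tests for all SNPs use the same permuted dataset.
    """
    # Smallest p-value over every SNP *and* every test in each permuted dataset.
    min_perm = [min(p for snp in dataset for p in snp) for dataset in permuted]
    n_perm = len(permuted)
    # The observed data is counted as one extra permutation.
    return [[(1 + sum(m <= p for m in min_perm)) / (n_perm + 1) for p in snp]
            for snp in observed]
```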

Monte Carlo simulations were used to assess the FWE and power of the permutational multiple testing procedure and to compare it with the Bonferroni procedure. A genotype relative risk model was assumed at each SNP $j$, where $\psi_1$ is the risk of disease with one copy of the allele of interest divided by the risk of disease with no copies, and $\psi_2$ is the risk of disease with two copies divided by the risk with no copies. Correlated multivariate genotype data for triads were generated by first obtaining haplotype data in linkage disequilibrium (LD) for the parents and then applying an inheritance model.
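The effect of $\psi_1$ and $\psi_2$ on a case child's genotype, given the parental genotypes, can be sketched as Mendelian transmission probabilities reweighted by the relative risks $(1, \psi_1, \psi_2)$ and renormalized. This is a generic illustration of the genotype relative risk model, not the simulation code used for this paper. The effect sizes in `MODELS` are hypothetical, parameterized so that dominant means $\psi_1 = \psi_2$, recessive means $\psi_1 = 1$, and multiplicative means $\psi_2 = \psi_1^2$.

```python
def child_genotype_probs(mother, father, psi1, psi2):
    """P(case child carries 0, 1, 2 copies of the allele of interest | parents).

    mother, father: copies (0, 1 or 2) of the allele of interest each parent carries.
    Mendelian probabilities are weighted by the genotype relative risks
    (1, psi1, psi2) and renormalized, i.e. conditioned on the child being affected.
    """
    pm, pf = mother / 2, father / 2  # chance each parent transmits the allele
    mendel = [(1 - pm) * (1 - pf),
              pm * (1 - pf) + (1 - pm) * pf,
              pm * pf]
    weights = [m * r for m, r in zip(mendel, (1.0, psi1, psi2))]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical effect sizes illustrating the three inheritance patterns.
MODELS = {
    "dominant": (2.0, 2.0),        # psi1 = psi2
    "recessive": (1.0, 2.0),       # psi1 = 1
    "multiplicative": (2.0, 4.0),  # psi2 = psi1 ** 2
}
for name, (psi1, psi2) in MODELS.items():
    print(name, child_genotype_probs(1, 1, psi1, psi2))
```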

To generate haplotype data in LD for the parents of cases, SNPs on the same strand were assumed to lie in linked blocks of length $n_b$, with SNPs in different blocks independent. For the purpose of the simulations the proportion of the allele of interest in the population,

$$\Pr\{\text{allele of interest on current SNP} \mid \text{allele of interest on previous SNP}\}=\frac{D_{00}+p^{2}}{p}$$

and

$$\Pr\{\text{allele of interest on current SNP} \mid \text{no allele of interest on previous SNP}\}=\frac{p(1-p)-D_{00}}{1-p}.$$

Each parental chromosome is generated independently.
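The two conditional probabilities define a two-state Markov chain along each block, and one can check that they keep the marginal frequency of the allele of interest at $p$ for every SNP. The sketch below generates one parental haplotype this way. The values $p = 0.3$ and $D_{00} = 0.1$ in the usage lines are hypothetical, since the values used in the simulations are not given in this text.

```python
import random


def simulate_haplotype(n_snps, n_b, p, d00, rng):
    """One parental haplotype (1 = allele of interest) with SNPs in LD blocks.

    The first SNP of each block of n_b SNPs is drawn with probability p, so
    blocks are independent; within a block each SNP depends on the previous one
    through the two conditional probabilities.
    """
    p_given_a = (d00 + p * p) / p                  # Pr{A | A on previous SNP}
    p_given_not_a = (p * (1 - p) - d00) / (1 - p)  # Pr{A | no A on previous SNP}
    if not (0.0 <= p_given_a <= 1.0 and 0.0 <= p_given_not_a <= 1.0):
        raise ValueError("D00 is out of range for this allele frequency")
    hap = []
    for s in range(n_snps):
        if s % n_b == 0:
            prob = p
        else:
            prob = p_given_a if hap[-1] == 1 else p_given_not_a
        hap.append(1 if rng.random() < prob else 0)
    return hap


# Hypothetical parameters: p = 0.3, D00 = 0.1, 100 SNPs in blocks of 5.
rng = random.Random(4)
mother_strands = [simulate_haplotype(100, 5, 0.3, 0.1, rng) for _ in range(2)]
```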

Once haplotypes for the parents are generated, genotypes for the children are generated using a random crossover model. In this model, the child inherits from the same parental strand until a random crossover event, occurring with probability $p_c$, after which inheritance switches to the other strand.
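A sketch of the random crossover model for one transmitted haplotype follows. It assumes that the starting strand is chosen with probability 1/2 and that a crossover can occur independently between each pair of adjacent SNPs with probability $p_c$. The child's genotype at each SNP is then the sum of the two transmitted haplotypes. The parental haplotypes in the usage lines are hypothetical, and selecting children as cases under the genotype relative risk model is a separate step not shown here.

```python
import random


def transmit(strand_a, strand_b, p_c, rng):
    """Haplotype a parent passes to the child under a random crossover model.

    Assumptions: the starting strand is chosen with probability 1/2, and a
    crossover (switch to the other strand) occurs independently between each
    pair of adjacent SNPs with probability p_c.
    """
    strands = (strand_a, strand_b)
    current = 0 if rng.random() < 0.5 else 1
    child = []
    for s in range(len(strand_a)):
        if s > 0 and rng.random() < p_c:
            current = 1 - current
        child.append(strands[current][s])
    return child


rng = random.Random(5)
mother = ([1, 1, 0, 0, 1], [0, 1, 1, 0, 0])  # hypothetical haplotypes
father = ([0, 0, 0, 1, 1], [1, 0, 1, 1, 0])
from_mother = transmit(*mother, p_c=0.1, rng=rng)
from_father = transmit(*father, p_c=0.1, rng=rng)
child_genotype = [m + f for m, f in zip(from_mother, from_father)]
```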

A total of 500 triads were simulated for each replication of the simulation experiment, with 100 SNPs in blocks of size $n_b = 5$. The LD parameter was

Table 2 shows the results of the null simulations for testing at level 0.05. Each simulation consisted of 100,000 replications. Both procedures control the FWE at the desired level of 5%. However, the Bonferroni procedure becomes increasingly conservative as the correlation between SNPs within blocks increases (correlation increases as $p_c$ decreases).

Table 3 shows the results of simulations under three different nonnull models, corresponding to dominant, recessive, and multiplicative patterns of disease inheritance. Each simulation consisted of 10,000 replications, with 5 nonnull SNPs out of 100, and the average power over the nonnull SNPs is reported. When correlation is low, there is very little difference in power between the methods. When correlation is high, however, the permutational method has substantially higher power. This agrees with similar comparisons between the Bonferroni and two-sample permutational procedures for controlling the FWE.

As part of a candidate gene study, we analyzed 1339 SNPs from 93 genes on 277 complete NTD triads from the Republic of Ireland. Birth defects are ideal candidates for triad studies because the parents are usually easily identified and likely to agree to participate. Correcting for multiplicity over all 1339 SNPs is extremely harsh: the Bonferroni-corrected p-values are all 1.0 after truncation, even though the smallest unadjusted p-value is 0.002233. With a Bonferroni multiplier of 1339, the adjusted p-value for the most significant SNP is not even close to being below 1.0, as 1339 × 0.002233 ≈ 3.0. In contrast, the permutational procedure gives adjusted p-values below 1.0 (the smallest permutational adjusted p-value was 0.74), much smaller than the Bonferroni values. Thus, although none of the adjusted p-values is close to significance, the Bonferroni adjusted p-values are clearly extremely conservative compared with those from the permutational procedure.

To see how the permutational adjustment compares with the Bonferroni when some of the adjusted p-values are relatively small, we present a subanalysis of the above study. This is presented as an example to examine the relative size of the adjusted p-values, not to represent what we consider appropriate control for multiplicity. We consider 18 SNPs from a single gene on 277 complete NTD triads. The p-values adjusted only for the 18 SNPs are presented in Table 4. Again, the permutational procedure gives smaller adjusted p-values than the Bonferroni. Moreover, the improvement is quite large as a proportion of the Bonferroni adjusted p-value: for the smallest unadjusted p-value, the permutational procedure yields an adjusted p-value more than 40% smaller than the corresponding Bonferroni adjusted p-value.

A final example is given from an independent study. As part of an analysis of oral clefts in Ireland (Carter et al., 2010), 31 SNPs on 250 complete cleft-palate-only case triads were analyzed. Again, this is presented as an example to compare the adjusted p-values of the procedures, not to represent what we consider appropriate control for multiplicity or a complete analysis of cleft cases on these SNPs. The p-values for the 10 most significant SNPs, adjusted for all 31 SNPs, are presented in Table 5. In this case, the smallest adjusted p-value from the permutational procedure is almost 50% smaller than the corresponding Bonferroni adjusted p-value.

We have shown that permutations can be used to approximate the null distribution under the TDT null hypothesis, leading to a one-sample permutational test. This extends to tests of multiple SNPs by permuting vectors of genotypes. Furthermore, we have shown how an FWE-controlling multiple comparison procedure can be obtained quite simply and have proven that it strongly controls the FWE. The methodology extends easily to multiple tests per hypothesis, which accommodates testing under multiple inheritance models in the same study. The permutational approach given here may also be used in more complex family-based designs that include affected or unaffected siblings. Simulations show that the permutational procedure has the desired FWE level and a substantial power advantage over the Bonferroni procedure when the SNPs are in LD. This is important when a segment of a gene with multiple SNPs in LD is being examined. Although the power advantages of our approach over the Bonferroni procedure were most notable under LD, there is an additional, perhaps more important, reason why this approach may be valuable in genome-wide association studies (GWAS): its handling of rare alleles. Our simulations did not include SNPs with rare alleles, a situation in which permutational adjustments are more powerful than the Bonferroni procedure.

In the future, technology will doubtless become available to examine more genetic variants than can currently be studied, creating further multiple comparison problems for statisticians. The permutation procedure described here will aid in dealing with these problems because it handles rare alleles and LD more efficiently than currently used methods (e.g., Bonferroni). We conclude that a permutational version of the TDT is feasible and leads to more powerful detection of associated SNPs in candidate gene studies of triads.

This research was supported in part by the Intramural Research Program of the NIH, NICHD. We thank Dr. Lawrence Brody for his insightful advice.

- Carter TC, Molloy AM, Pangilinan F, Troendle JF, Kirke PN, Conley MR, Orr DJA, Earley M, McKiernan E, Lynn EC, Doyle A, Scott JM, Brody LC, Mills JL. Testing reported associations of genetic risk factors for oral clefts in a large Irish study population. Birth Defects Res A. 2010;88:84–93.
- Han B, Kang HM, Eskin E. Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet. 2009;5:e1000456.
- Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1976;6:65–70.
- Lee W-C, Wang L-Y. Simple formulas for gauging the potential impact of population stratification bias. Am J Epidemiol. 2008;167:86–89.
- Pesarin F. Multivariate Permutation Tests: With Applications in Biostatistics. Wiley; Chichester: 2001.
- Schaid DJ, Sommer SS. Genotype relative risks: Methods for design and analysis of candidate-gene association studies. Am J Hum Genet. 1993;53:1114–1126.
- Schlesselman JJ. Case-Control Studies: Design, Conduct, Analysis. Oxford University Press; New York: 1982.
- Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516.
- Westfall PH, Troendle JF. Multiple testing with minimal assumptions. Biom J. 2008;50:745–755.
