Stat Appl Genet Mol Biol. 2009 January 1; 8(1): 23.

Published online 2009 April 16. doi: 10.2202/1544-6115.1437

PMCID: PMC2703613

NIHMSID: NIHMS105545

Copyright © 2009 The Berkeley Electronic Press. All rights reserved

Multiple hypothesis testing is commonly used in genome research such as genome-wide studies and gene expression data analysis (Lin, 2005). The widely used Bonferroni procedure controls the family-wise error rate (FWER) for multiple hypothesis testing, but has limited statistical power as the number of hypotheses tested increases. The power of multiple testing procedures can be increased by using weighted p-values (Genovese et al., 2006). The weights for the p-values can be estimated by using certain prior information. Wasserman and Roeder (2006) described a weighted Bonferroni procedure, which incorporates weighted p-values into the Bonferroni procedure, and Rubin et al. (2006) and Wasserman and Roeder (2006) estimated the optimal weights that maximize the power of the weighted Bonferroni procedure under the assumption that the means of the test statistics in the multiple testing are known (these weights are called optimal Bonferroni weights). This weighted Bonferroni procedure controls FWER and can have higher power than the Bonferroni procedure, especially when the optimal Bonferroni weights are used. To further improve the power of the weighted Bonferroni procedure, first we propose a weighted Šidák procedure that incorporates weighted p-values into the Šidák procedure, and then we estimate the optimal weights that maximize the average power of the weighted Šidák procedure under the assumption that the means of the test statistics in the multiple testing are known (these weights are called optimal Šidák weights). This weighted Šidák procedure can have higher power than the weighted Bonferroni procedure. Second, we develop a generalized sequential (GS) Šidák procedure that incorporates weighted p-values into the sequential Šidák procedure (Scherrer, 1984). This GS Šidák procedure is an extension of and has higher power than the GS Bonferroni procedure of Holm (1979). 
Finally, under the assumption that the means of the test statistics in the multiple testing are known, we incorporate the optimal Šidák weights and the optimal Bonferroni weights into the GS Šidák procedure and the GS Bonferroni procedure, respectively. Theoretical proof and/or simulation studies show that the GS Šidák procedure can have higher power than the GS Bonferroni procedure when their corresponding optimal weights are used, and that both of these GS procedures can have much higher power than the weighted Šidák and the weighted Bonferroni procedures. All proposed procedures control the FWER well and are useful when prior information is available to estimate the weights.

Multiple hypothesis testing involves testing multiple hypotheses simultaneously; each hypothesis is associated with a test statistic (Rubin *et al.*, 2006). Multiple hypothesis testing is a common problem in genome research, such as genome-wide studies and gene expression data analysis (Lin, 2005). For multiple hypothesis testing, a traditional criterion for error (type I) control is the family-wise error rate (FWER), which is the probability of rejecting one or more true null hypotheses (Hochberg and Tamhane, 1987; Lin, 2005).

The Bonferroni procedure (Bonferroni, 1937) and the Šidák procedure (Šidák, 1967) are two well-known methods for controlling FWER with computational simplicity and wide applicability (Olejnik *et al.*, 1997). However, both of these methods have limited statistical power as the number of hypotheses tested (*m*) increases (Nakagawa, 2004). Holm (1979) proposed a (step-down) sequential Bonferroni procedure which has slightly higher power than the Bonferroni procedure but there is little difference between these two procedures when the number of tests (*m*) is large (Lin, 2005). As an extension of the (step-down) sequential Bonferroni procedure, Holm (1979) proposed a generalized sequential (GS) Bonferroni procedure by using different weights for hypotheses of different importance. Although Holm did not show how to estimate the weights, the method has the potential to improve the power of multiple hypothesis testing when prior information is available to estimate the weights.

Rubin *et al.* (2006) and Wasserman and Roeder (2006) proposed a weighted Bonferroni procedure that adjusts p-values by using optimal weights. These optimal weights were calculated by maximizing the average power of the weighted Bonferroni procedure under the assumption that the means of all test statistics are known, and these weights are called optimal Bonferroni weights. Under such assumption, the average power of the weighted Bonferroni procedure is much higher than that of the Bonferroni procedure (Rubin *et al.*, 2006; Genovese *et al.*, 2006; Wasserman and Roeder, 2006). In practice, the means of the test statistics are unknown. However, if some prior information is available to estimate the means, this weighted Bonferroni procedure can be more powerful than the Bonferroni procedure (Rubin *et al.*, 2006; Wasserman and Roeder, 2006; Roeder *et al.*, 2006; Roeder *et al.*, 2007).

The purpose of this study is to develop more powerful weighted hypothesis testing procedures as extensions of the weighted Bonferroni procedure. First, we propose a weighted Šidák procedure, and then under the assumption that the means of all test statistics are known, we estimate the optimal weights maximizing the average power of the weighted Šidák procedure (these weights are called optimal Šidák weights). The weighted Šidák procedure has slightly higher power than the weighted Bonferroni procedure. Second, we develop a GS Šidák procedure as an extension of the GS Bonferroni procedure of Holm (1979) and the sequential Šidák procedure (Scherrer, 1984). Finally, assuming that the means of all test statistics are known, we incorporate the optimal Šidák weights and the optimal Bonferroni weights into the GS Šidák procedure and the GS Bonferroni procedure, respectively. Theoretical proof and/or simulation studies show that, using their corresponding optimal weights, the GS Šidák procedure has slightly higher power than the GS Bonferroni procedure, and that both GS Šidák procedure and GS Bonferroni procedure have much higher power than the weighted Šidák procedure and the weighted Bonferroni procedure. All the proposed procedures can control the FWER well.

Consider testing *m* (null) hypotheses **H** = *H*_{1}, *H*_{2}, …, *H*_{m} with corresponding test statistics

As described earlier, FWER is the probability of falsely rejecting at least one true null hypothesis (Hochberg and Tamhane, 1987), which can be written as

$$\text{FWER}=\text{Pr}(\text{rejecting at least one }{H}_{i}\mid {H}_{i}\in {\mathbf{\text{H}}}_{0}).$$

A multiple testing procedure is said to control the family-wise error rate at a significance level *α* if FWER *≤ α*.

The power for a single test is called *per-hypothesis* power. For a single test with hypothesis *H*_{i}, the

In the Bonferroni procedure, if ${p}_{j}\le \frac{\alpha}{m}$, then reject the null hypothesis *H*_{j}; otherwise, fail to reject

$$\frac{1}{m}\sum _{j=1}^{m}{w}_{j}=1.$$

(1)

For hypothesis *H*_{j} (1

This procedure controls FWER at level *α*. The weights (*w*_{1}, *w*_{2}, …, *w*_{m}) can be specified by using prior information available to the researcher. For example, in genome-wide association studies, the prior information can be linkage signals or results from gene expression analyses. Roeder

Rubin *et al.* (2006) and Wasserman and Roeder (2006) independently proposed very similar approaches to estimate the optimal weights by maximizing the average power of the procedure, assuming that the means **μ** = (*μ*_{1}, *μ*_{2}, …, *μ*_{m}) are known. We call these optimal weights optimal Bonferroni weights and they are calculated (Wasserman and Roeder, 2006) by

$${w}_{j}=\frac{m}{\alpha}\overline{\Phi}\left(\frac{{\mu}_{j}}{2}+\frac{\Delta}{{\mu}_{j}}\right)I\left({\mu}_{j}>0\right),$$

(2)

where $\overline{\Phi}(x)$ is the upper tail probability of the standard normal distribution (i.e., $\overline{\Phi}(x)=1-\Phi (x)$, where Φ(*x*) denotes the cumulative distribution function (CDF) of the standard normal distribution) and Δ is the constant that satisfies equations (1) and (2), i.e.

$$\frac{1}{m}\sum _{j=1}^{m}\frac{m}{\alpha}\overline{\Phi}\left(\frac{{\mu}_{j}}{2}+\frac{\Delta}{{\mu}_{j}}\right)I\left({\mu}_{j}>0\right)=1.$$

(3)
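As an illustration of equations (1) to (3), the following Python sketch finds Δ by bisection so that the weights of equation (2) average to 1. It is not the authors' implementation: the function names, the bisection approach, and the bracket-widening strategy are assumptions made for this example, and the means are taken as known.

```python
import math


def upper_tail(x):
    """Upper tail probability of the standard normal, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bonferroni_weights(mu, alpha, delta):
    """Equation (2): weights for a trial value of the constant Delta."""
    m = len(mu)
    return [(m / alpha) * upper_tail(u / 2.0 + delta / u) if u > 0 else 0.0
            for u in mu]


def optimal_bonferroni_weights(mu, alpha=0.05, iterations=200):
    """Find Delta so that the weights average to 1 (equation (1))."""
    m = len(mu)
    if not any(u > 0 for u in mu):
        raise ValueError("at least one mean must be positive")

    def excess(delta):
        # Mean weight minus 1; this is decreasing in delta.
        return sum(bonferroni_weights(mu, alpha, delta)) / m - 1.0

    # Widen the bracket until excess(lo) >= 0 >= excess(hi).
    lo, hi = -1.0, 1.0
    while excess(lo) < 0:
        lo *= 2.0
    while excess(hi) > 0:
        hi *= 2.0

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    return bonferroni_weights(mu, alpha, delta), delta


# Hypothetical example: 100 tests, 10 of which have mean 3.
weights, delta = optimal_bonferroni_weights([0.0] * 90 + [3.0] * 10)
```

When all positive means are equal, as in this hypothetical example, the weight budget *m* is split evenly among the hypotheses with positive means, and hypotheses with non-positive means receive weight zero.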

As an illustrative example, Figure 1 shows the optimal Bonferroni weights as a function of the means *μ*_{j} in a multiple testing problem with 100 tests. The means

Since the Šidák procedure has higher power than the Bonferroni procedure for independent tests (Simes, 1986), we propose a weighted Šidák procedure that incorporates weighted p-values into the Šidák procedure as an extension of the weighted Bonferroni procedure. We also describe how to calculate the optimal weights for the weighted Šidák procedure assuming means of the test statistics are known.

In the Šidák procedure (Šidák, 1967), for any null hypothesis *H** _{j}* (1

$${p}_{j}\le 1-{(1-\alpha )}^{\frac{{w}_{j}}{m}}\quad \text{or equivalently}\quad {(1-{p}_{j})}^{\frac{1}{{w}_{j}}}\ge {(1-\alpha )}^{\frac{1}{m}},$$

(4)

then reject the null hypothesis *H** _{j}*; on the other hand, when

**Theorem 1.** *If the m tests are independent, then the weighted Šidák procedure controls the family-wise error rate at significance level* *α*.

**Proof.** P(failing to reject any true null hypotheses in **H**_{0})

$$=\prod _{j:{H}_{j}\in {H}_{0}}\text{P}\left({p}_{j}>1-{(1-\alpha )}^{{w}_{j}/m}\right)=\prod _{j:{H}_{j}\in {H}_{0}}{(1-\alpha )}^{{w}_{j}/m}={(1-\alpha )}^{\sum _{j:{H}_{j}\in {H}_{0}}{w}_{j}/m}\ge 1-\alpha ,$$

where, *p** _{j}* follows standard uniform distribution when

From the Taylor series expansions, we obtain

$$\frac{\alpha}{m}{w}_{j}\le 1-{(1-\alpha )}^{\frac{{w}_{j}}{m}}.$$
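The inequality above can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and it assumes non-negative weights that satisfy equation (1), so that 0 ≤ *w*_{j} ≤ *m*. It computes both per-hypothesis thresholds and applies the corresponding single-step rejection rule.

```python
import math


def bonferroni_threshold(w, m, alpha=0.05):
    """Weighted Bonferroni threshold: alpha * w_j / m."""
    return alpha * w / m


def sidak_threshold(w, m, alpha=0.05):
    """Weighted Sidak threshold of equation (4): 1 - (1 - alpha)^(w_j / m)."""
    # expm1/log1p keep precision when the threshold is very small.
    return -math.expm1((w / m) * math.log1p(-alpha))


def weighted_reject(pvalues, weights, alpha=0.05, method="sidak"):
    """Single-step weighted procedure: reject H_j when p_j <= its threshold."""
    m = len(pvalues)
    threshold = sidak_threshold if method == "sidak" else bonferroni_threshold
    return [p <= threshold(w, m, alpha) for p, w in zip(pvalues, weights)]


# Hypothetical example with m = 4 equal weights: p = 0.0127 lies between
# the two thresholds, so only the Sidak rule rejects the first hypothesis.
p = [0.0127, 0.5, 0.5, 0.5]
by_sidak = weighted_reject(p, [1.0] * 4, method="sidak")
by_bonferroni = weighted_reject(p, [1.0] * 4, method="bonferroni")
```

The gap between the two thresholds is small for small *α*, which is in line with the slight power differences reported in the simulations.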

Based on this inequality, when the same pre-determined weights (*w*_{1}, *w*_{2}, …, *w*_{m}) are used by the weighted Šidák procedure and the weighted Bonferroni procedure, if any hypothesis

**Theorem 2.** *For m independent tests, if the same pre-determined weights* (*w*_{1}, *w*_{2}, …, *w*_{m})

**Remark 1.** If all weights

As stated earlier, how to estimate optimal weights by using the prior information still needs further investigation. Here, we derive the optimal weights that maximize the average power of the weighted Šidák procedure under the assumption that the means (*μ*_{1}, *μ*_{2}, …, *μ*_{m}) are known. These optimal weights are called optimal Šidák weights, which are an extension of the optimal Bonferroni weights of Wasserman and Roeder (2006).

For any specified weights (*w*_{1}, *w*_{2}, …, *w*_{m})

$${\text{Power}}_{j}=\text{P}\left({p}_{j}<1-{(1-\alpha )}^{\frac{{w}_{j}}{m}}|{\mu}_{j}>0\right)=\overline{\Phi}\left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{\frac{{w}_{j}}{m}}\right)-{\mu}_{j}\right).$$

The average power of the weighted Šidák procedure is

$$P{W}_{\text{average}}=\frac{1}{{m}_{2}}\sum _{j:{\mu}_{j}>0}\overline{\Phi}\left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{\frac{{w}_{j}}{m}}\right)-{\mu}_{j}\right).$$
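The two power formulas above translate directly into code. The following is a minimal sketch, assuming one-sided tests with unit-variance normal statistics and known means; the function names are hypothetical, and the inverse upper-tail function is obtained from `statistics.NormalDist.inv_cdf` in the Python standard library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def upper_tail(x):
    """Upper tail probability of the standard normal, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def inverse_upper_tail(p):
    """Value x with upper_tail(x) = p, for 0 < p < 1."""
    return -_STD_NORMAL.inv_cdf(p)


def sidak_power(w, mu, m, alpha=0.05):
    """Per-hypothesis power of the weighted Sidak procedure for weight w."""
    if w <= 0:
        return 0.0  # a zero weight gives a zero threshold, so no rejection
    threshold = -math.expm1((w / m) * math.log1p(-alpha))
    return upper_tail(inverse_upper_tail(threshold) - mu)


def average_power(weights, mu, alpha=0.05):
    """Average of the per-hypothesis powers over hypotheses with mu_j > 0."""
    m = len(mu)
    powers = [sidak_power(w, u, m, alpha)
              for w, u in zip(weights, mu) if u > 0]
    if not powers:
        raise ValueError("no hypothesis has a positive mean")
    return sum(powers) / len(powers)


# Hypothetical comparison: all weight on the 10 non-null tests vs. equal weights.
mu = [0.0] * 90 + [3.0] * 10
weighted = average_power([0.0] * 90 + [10.0] * 10, mu)
unweighted = average_power([1.0] * 100, mu)
```

In this hypothetical setting, concentrating the weights on the hypotheses with positive means gives a larger average power than equal weights, which is the effect the optimal weights are designed to exploit.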

To find the optimal weights that maximize this average power subject to the constraint of equation (1), we used the method of Lagrange multipliers to obtain the conditional extremum of *PW*_{average}.

**Theorem 3.** *Given FWER level* *α* *and known means* (*μ*_{1}, *μ*_{2}, …, *μ*_{m}) *of the m independent test statistics* (*Z*_{1}, *Z*_{2}, …, *Z*_{m})*, the optimal non-negative weights* (*w*_{1}, *w*_{2}, …, *w*_{m})

$$c-\frac{{w}_{i}}{m}\text{ln}(1-\alpha )={\mu}_{i}{\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)-\frac{{\mu}_{i}^{2}}{2},\quad \text{for }i=1,\dots ,m,$$

(5)

*where c is a constant (given in Appendix A*).

The proof of this theorem is given in Appendix A. These equations, together with the non-negativity constraints on the weights, can be solved numerically by using the **nlminb()** function in R.
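As an alternative illustration in Python (not the authors' R code, and using nested bisection rather than **nlminb()**), equation (5) can be solved as follows. Writing δ_{i} for the inverse upper-tail term in (5), the identity (*w*_{i}/*m*) ln(1 − *α*) = ln Φ(δ_{i}) turns (5) into *c* = *μ*_{i}δ_{i} − *μ*_{i}²/2 + ln Φ(δ_{i}), whose right-hand side is increasing in δ_{i}. The sketch assumes that hypotheses with non-positive means receive weight zero, as in the Lagrange function of Appendix A; the function names and the hard-coded search brackets (adequate only for moderate means) are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def log_cdf(x):
    """log Phi(x) for the standard normal, computed stably in both tails."""
    if x < 0:
        return math.log(0.5 * math.erfc(-x / SQRT2))
    return math.log1p(-0.5 * math.erfc(x / SQRT2))


def bisect(f, lo, hi, iterations=100):
    """Root of an increasing function f on [lo, hi]; drifts to an end if none."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weight_for(c, mu, m, alpha):
    """Solve equation (5) for a single weight, given c and a mean mu > 0."""
    # Find delta with mu*delta - mu^2/2 + log Phi(delta) = c.
    delta = bisect(lambda d: mu * d - 0.5 * mu * mu + log_cdf(d) - c,
                   -35.0, 35.0)
    # Recover w from (w/m) * ln(1 - alpha) = log Phi(delta).
    return m * log_cdf(delta) / math.log1p(-alpha)


def optimal_sidak_weights(mu, alpha=0.05):
    """Choose c so that the weights sum to m, i.e. satisfy equation (1)."""
    m = len(mu)
    if not any(u > 0 for u in mu):
        raise ValueError("at least one mean must be positive")

    def total(c):
        return sum(weight_for(c, u, m, alpha) for u in mu if u > 0)

    # total(c) decreases in c, so m - total(c) is increasing in c.
    c = bisect(lambda value: m - total(value), -1000.0, 1000.0)
    return [weight_for(c, u, m, alpha) if u > 0 else 0.0 for u in mu], c


# Hypothetical example: 100 tests, 10 of which have mean 3.
weights, c = optimal_sidak_weights([0.0] * 90 + [3.0] * 10)
```

The nested bisection was chosen here because both the inner and the outer problems are monotone, so no derivative information or external optimizer is needed.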

In the following simulation studies we will show that the weighted Šidák procedure using the optimal Šidák weights can have higher power than the weighted Bonferroni procedure using the optimal Bonferroni weights and that the weighted Šidák procedure using the optimal Šidák weights can have much higher power than the Šidák procedure.

Holm (1979) introduced a GS Bonferroni procedure that is a step-down procedure using ordered weighted p-values. If the (unknown) weights used in the procedure are estimated appropriately by using prior information, the procedure can have higher power than the weighted Bonferroni procedure (also see below). In this section, we first review this GS Bonferroni procedure, and then we propose a GS Šidák procedure as an extension of the GS Bonferroni procedure.

When assuming that the means of the statistics are known, it is difficult to derive the optimal weights by maximizing the average power of these GS procedures as done before for the weighted Bonferroni and the weighted Šidák procedures. We incorporate the optimal Bonferroni (Šidák) weights described in Section 2.2 (2.3) into the GS Bonferroni (Šidák) procedure. We will show below that when these optimal weights are used, the GS Bonferroni (Šidák) procedure has higher power than the weighted Bonferroni (Šidák) procedure.

Given nonnegative weights (*w*_{1}, *w*_{2}, …, *w*_{m}) for the

- Step 1. If ${B}_{(1)}>\frac{\alpha}{\sum _{i=1}^{m}{w}_{(i)}}$, stop the procedure; otherwise, reject *H*_{(1)} and go to the next step.
- ...
- Step *j*. If ${B}_{(j)}>\frac{\alpha}{\sum _{i=j}^{m}{w}_{(i)}}$, stop the procedure; otherwise, reject *H*_{(j)} and go to the next step.
- ...

Continue these steps until the procedure is stopped or all *B*-values have been processed.
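The steps above can be sketched in Python as follows. The sketch assumes that the B-values are *B*_{j} = *p*_{j}/*w*_{j}, ordered from smallest to largest, which is what the comparison with the weighted Bonferroni procedure in this section implies. Treating a zero-weight hypothesis as having an infinite B-value (so that it is never rejected) is an assumption of this example, as is the function name.

```python
import math


def gs_bonferroni(pvalues, weights, alpha=0.05):
    """Step-down GS Bonferroni procedure; returns one rejection flag per test."""
    m = len(pvalues)
    # B-values; a zero weight is treated as an infinite B-value (assumption).
    b = [p / w if w > 0 else math.inf for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda j: b[j])

    # remaining[k] = sum of the weights at sorted positions k, ..., m - 1.
    remaining = [0.0] * (m + 1)
    for k in range(m - 1, -1, -1):
        remaining[k] = remaining[k + 1] + weights[order[k]]

    rejected = [False] * m
    for k, j in enumerate(order):
        # Stop at the first B-value above alpha / (sum of remaining weights).
        if weights[j] <= 0 or b[j] > alpha / remaining[k]:
            break
        rejected[j] = True
    return rejected


# Hypothetical example: the second hypothesis has B = 0.04 > alpha / m = 0.0125,
# so a single-step weighted Bonferroni rule would not reject it, but the
# step-down rule does once the first hypothesis has been removed.
flags = gs_bonferroni([0.02, 0.02, 0.5, 0.5], [3.0, 0.5, 0.5, 0.0])
```

With all weights equal to 1, this function reduces to the sequential Bonferroni procedure of Holm (1979).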

This procedure controls FWER at level *α*. If we set all weights *w*_{j} equal to 1, the inequality ${B}_{(j)}>\frac{\alpha}{\sum _{i=j}^{m}{w}_{(i)}}$ in step

Now we compare the power of the GS Bonferroni procedure and the weighted Bonferroni procedure when the same pre-determined weights are used in these two procedures. For pre-specified weights (*w*_{(1)}, *w*_{(2)}, …, *w*_{(m)}) associated with hypotheses (*H*_{(1)}, *H*_{(2)}, …, *H*_{(m)}) such that ${m}^{-1}\sum _{j=1}^{m}{w}_{j}=1$ (i.e., ${m}^{-1}\sum _{j=1}^{m}{w}_{(j)}=1$), if any false hypothesis *H*_{(j)} is rejected by the weighted Bonferroni procedure, that is, ${B}_{(j)}\le \frac{\alpha}{m}$ is true, then ${B}_{(j)}\le \frac{\alpha}{\sum _{i=j}^{m}{w}_{(i)}}$. Since $\sum _{i=j}^{m}{w}_{(i)}\le m$, we have

$${B}_{(1)}\le {B}_{(2)}\le \dots \le {B}_{(j)}\le \frac{\alpha}{m}\le \frac{\alpha}{\sum _{i=j}^{m}{w}_{(i)}}.$$

Thus, *H*_{(j)} will also be rejected by the GS Bonferroni procedure. Therefore, we have Theorem 4.

**Theorem 4.** *Given weights* (*w*_{1}, *w*_{2}, …, *w*_{m})

As stated earlier, it is difficult to estimate the optimal weights that maximize the average power of the GS Bonferroni procedure under the assumption that the means of statistics are known. Here we propose to use the optimal Bonferroni weights described in Section 2.2. When these optimal Bonferroni weights are used, from Theorem 4, we know that the GS Bonferroni procedure has higher average power than the weighted Bonferroni procedure. Our simulation studies will confirm this.

The GS Bonferroni procedure is based on the Bonferroni procedure. As stated earlier, the Šidák procedure has higher power than the Bonferroni procedure. Therefore, we propose a GS Šidák procedure.

Given nonnegative weights (*w*_{1}, *w*_{2}, …, *w*_{m}) for the

Step 1. If ${S}_{(1)}<{(1-\alpha )}^{\frac{1}{\sum _{i=1}^{m}{w}_{(i)}}}$, then stop the procedure; otherwise reject *H*_{(1)} and go to the next step.

...

Step *j*. When *H*_{(1)}, …, *H*_{(j–1)} have been tested and rejected: if

$${S}_{(j)}<{(1-\alpha )}^{\frac{1}{\sum _{i=j}^{m}{w}_{(i)}}},$$

(6)

stop the procedure; otherwise reject the hypothesis *H*_{(j)}, and go to the next step.

...,

Continue these steps until the procedure is stopped or all S-values have been processed.
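A minimal Python sketch of these steps is given below. It assumes that the S-values are *S*_{j} = (1 − *p*_{j})^{1/*w*_{j}}, ordered from largest to smallest, as used in the proofs in this section. The comparison is carried out on the logarithmic scale, which is equivalent because the logarithm is increasing; that choice, the handling of zero weights (never rejected), and the function name are assumptions of this example.

```python
import math


def gs_sidak(pvalues, weights, alpha=0.05):
    """Step-down GS Sidak procedure; returns one rejection flag per test."""
    m = len(pvalues)

    def log_s(p, w):
        # log S_j = log(1 - p_j) / w_j; zero weight or p = 1 gives S_j = 0.
        if w <= 0 or p >= 1:
            return -math.inf
        return math.log1p(-p) / w

    log_s_values = [log_s(p, w) for p, w in zip(pvalues, weights)]
    # Largest S-value first.
    order = sorted(range(m), key=lambda j: log_s_values[j], reverse=True)

    # remaining[k] = sum of the weights at sorted positions k, ..., m - 1.
    remaining = [0.0] * (m + 1)
    for k in range(m - 1, -1, -1):
        remaining[k] = remaining[k + 1] + weights[order[k]]

    log_level = math.log1p(-alpha)
    rejected = [False] * m
    for k, j in enumerate(order):
        # Stop when S_(j) < (1 - alpha)^(1 / sum of remaining weights).
        if weights[j] <= 0 or log_s_values[j] < log_level / remaining[k]:
            break
        rejected[j] = True
    return rejected


# Hypothetical example with equal weights: p = 0.0127 exceeds the first
# Bonferroni-type cut-off 0.05 / 4 but not the Sidak cut-off 1 - 0.95^(1/4).
flags = gs_sidak([0.0127, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0])
```

With all weights equal to 1, the function reduces to a sequential Šidák rule, in which the *j*-th smallest p-value is compared with 1 − (1 − *α*)^{1/(*m* − *j* + 1)}.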

**Theorem 5.** *If the m tests are independent, then the GS Šidák procedure controls the family-wise error rate at significance level* *α*.

**Proof.** Let **I**_{0} be the set of index subscripts for the true null hypotheses, **I**_{0} = {*t*: *H*_{t} ∈ **H**_{0}}. Let ${S}_{{\mathbf{\text{I}}}_{0}}^{l}={\text{max}}_{t\in {\mathbf{\text{I}}}_{0}}{S}_{t}$ denote the largest S-value among all *S*_{t} with *t*

P(failing to reject any true null hypotheses in **H**_{0})

$$\begin{array}{l}=P\left(\underset{j=1}{\overset{k}{\cup}}({S}_{(j)}<{\left(1-\alpha \right)}^{1/\sum _{i=j}^{m}{w}_{(i)}})\right)\ge P\left({S}_{(k)}<{\left(1-\alpha \right)}^{1/\sum _{i=k}^{m}{w}_{(i)}}\right)\\ =\prod _{t\in {\mathbf{\text{I}}}_{0}}P\left({S}_{t}<{\left(1-\alpha \right)}^{1/\sum _{i=k}^{m}{w}_{(i)}}\right)=\prod _{t\in {\mathbf{\text{I}}}_{0}}P\left({(1-{p}_{t})}^{1/{w}_{t}}<{\left(1-\alpha \right)}^{1/\sum _{i=k}^{m}{w}_{(i)}}\right)\\ =\prod _{t\in {\mathbf{\text{I}}}_{0}}P\left((1-{p}_{t})<{\left(1-\alpha \right)}^{{w}_{t}/\sum _{i=k}^{m}{w}_{(i)}}\right)={\left(1-\alpha \right)}^{{\sum}_{t\in {\mathbf{\text{I}}}_{\mathbf{0}}}{w}_{t}/\sum _{i=k}^{m}{w}_{(i)}}>1-\alpha ,\end{array}$$

where
${\sum}_{t\in {\mathbf{\text{I}}}_{0}}{w}_{t}/\sum _{i=k}^{m}{w}_{(i)}\le 1$, and 1 - *p** _{t}* follows uniform distribution when

Now we compare the power of the GS Šidák procedure to that of the weighted Šidák procedure when both procedures use the same weights *w*_{j} that satisfy ${m}^{-1}\sum _{j=1}^{m}{w}_{j}=1$ (i.e., ${m}^{-1}\sum _{j=1}^{m}{w}_{(j)}=1$). If

$${S}_{(1)}\ge {S}_{(2)}\ge \dots \ge {S}_{(j)}\ge {(1-\alpha )}^{1/m}\ge {(1-\alpha )}^{1/\sum _{i=j}^{m}{w}_{(i)}}.$$

Thus, *H*_{(j)} will also be rejected by the GS Šidák procedure, and we have Theorem 6.

**Theorem 6.** *Given weights* (*w*_{1}, *w*_{2}, …, *w*_{m})

Furthermore, we compare the power of the GS Šidák procedure to that of the GS Bonferroni procedure when the same pre-specified weights are used in these two procedures.

**Theorem 7.** *For m independent tests, if the same weights* (*w*_{1}, *w*_{2}, …, *w*_{m})

**Proof.** From the definition of B-value (*B** _{i}* =

If ${B}_{(j)}\le \alpha /\sum _{i=j}^{m}{w}_{(i)}$, from the Taylor series expansions, we have

$${p}_{(j)}\le \alpha {w}_{(j)}/\sum _{i=j}^{m}{w}_{(i)}\le 1-{(1-\alpha )}^{{w}_{(j)}/\sum _{i=j}^{m}{w}_{(i)}}.$$

Thus, ${S}_{(j)}={(1-{p}_{(j)})}^{1/{w}_{(j)}}\ge {(1-\alpha )}^{1/\sum _{i=j}^{m}{w}_{(i)}}$.

**Remark 2.** If all the weights are set equal to 1, the GS Šidák procedure reduces to the sequential Šidák procedure (Scherrer, 1984).

In the GS Šidák procedure, a major issue is how to calculate the weights. As we stated before, it is difficult to derive optimal weights that maximize the average power of the GS Šidák procedure under the assumption that the means of the statistics are known. Here, under this assumption, we suggest using the optimal Šidák weights calculated by equation (5). From Theorem 6, the GS Šidák procedure has higher average power than the weighted Šidák procedure when the optimal Šidák weights are used by these two procedures.

We will show by simulation studies (see below) that the GS Šidák procedure using the optimal Šidák weights has higher power than the GS Bonferroni procedure using the optimal Bonferroni weights. It appears to be difficult to prove this statement theoretically because the optimal Šidák weights are not the same as the optimal Bonferroni weights.

To further evaluate the performance of the proposed testing procedures, we compared by simulation studies the average power of six multiple testing procedures: the Šidák procedure, the Bonferroni procedure, the weighted Šidák (Bonferroni) procedure using the optimal Šidák (Bonferroni) weights, and the GS Šidák (Bonferroni) procedure using the optimal Šidák (Bonferroni) weights.

When we assume that the means of statistics are known, for each true null hypothesis *H*_{j}:

We simulated datasets in a similar way to Rubin *et al.* (2006). Each simulated dataset **X** = (*X*_{i,j})

When generating each dataset that was associated with 1,000 covariates, we randomly chose 50 covariates and set the means *γ*_{j} > 0 for these 50 covariates (i.e.,

Table 1 shows the results of the estimated average power of the six multiple testing procedures in Scenarios 1 and 2. From Table 1, we can see that the GS Šidák, weighted Šidák, and Šidák procedures have slightly higher estimated average power than the corresponding GS Bonferroni, weighted Bonferroni, and Bonferroni procedures, and that the GS Šidák procedure is most powerful among the six procedures. The GS Šidák procedure and the GS Bonferroni procedure can have much higher power than both the weighted Šidák procedure and the weighted Bonferroni procedure. For example, in Scenario 1, when *μ* = 3, the estimated average power of the GS Šidák procedure, GS Bonferroni procedure, the weighted Šidák procedure, and the weighted Bonferroni procedure is 0.5820, 0.5792, 0.4670 and 0.4639, respectively (see Table 1).

In previous sections, all weights are calculated under the assumption that the means **μ** = (*μ*_{1}, *μ*_{2}, …, *μ*_{m}) of statistics are known. However, in real data analysis, the means

It is beyond the scope of this study to determine how to effectively estimate the means (*μ*_{1}, *μ*_{2}, …, *μ*_{m}) by using prior information. To show the performance of our proposed procedures when estimated means are used, as an example, we implemented our proposed procedures by incorporating the data-splitting method and applied these methods to the simulated Scenarios 1–2 data sets described in the previous section. The only exception is that we assume here that the first 950 covariates have means equal to zero and the last 50 covariates have the common mean value

For each simulated dataset, the Bonferroni procedure and the Šidák procedure were implemented on the entire dataset, while the other four procedures used the data-splitting method (under the assumption that the order of means (*μ*_{1}, *μ*_{2}, …, *μ*_{m}) is known). Table 2 shows the estimated average power and family-wise error rates of the six procedures for Scenarios 1 and 2. We only show the results with the proportion π of the first part

Table 2. The estimated FWERs and average power of the six multiple testing procedures over 1,000 replicated data sets when the means of the test statistics are unknown (*α* = 0.05, *m* = 1000, *m*_{2} = 50, *n* = 100, and π = 0.1 for data-splitting).

From Table 2, we see that the weighted Šidák and the GS Šidák procedures have slightly higher estimated average power than the weighted Bonferroni and the GS Bonferroni procedures, respectively, and that the GS Šidák procedure has the highest estimated average power among these six procedures. For example, when *μ* is equal to 4, the estimated average power of the GS Šidák procedure is 0.8719, nearly 13 percentage points higher than that of the weighted Bonferroni procedure (0.7427). In addition, it is interesting that the estimated average power of the six procedures is smaller than their estimated FWERs when *μ* is equal to 1. This occurs because the average power is the average (not the cumulative value) of the per-hypothesis powers for the 50 false null hypotheses, whereas the FWER is a cumulative value (not an average) of the type I error rates over the 950 true null hypotheses.

From Table 2, we can also find that the six procedures can control FWERs quite well. Interestingly, the estimated FWERs are much lower in the four procedures using weights (i.e. the weighted Bonferroni, weighted Šidák, GS Bonferroni and GS Šidák) than in the two procedures without using weights (Bonferroni and Šidák). The reason is that the four weighted procedures used the prior information of the order of means of the test statistics.

In this article, we propose a weighted Šidák procedure and a GS Šidák procedure for multiple hypothesis testing based on the weighted Bonferroni procedure. Under the assumption that the means of the test statistics are known, we further describe how to estimate the optimal Šidák weights which maximize the average power of the weighted Šidák procedure. We show that the weighted Šidák procedure using the optimal Šidák weights can have higher power than the weighted Bonferroni procedure using the optimal Bonferroni weights. Furthermore, we incorporate the optimal Šidák (Bonferroni) weights into the GS Šidák (Bonferroni) procedure. Using these optimal weights, the GS Šidák (Bonferroni) procedure can have higher power than the corresponding weighted Šidák (Bonferroni) procedure, and the GS Šidák procedure often has the highest power among these procedures.

For the multiple testing procedures using weights described in this article, how to estimate the weights (*w*_{1}, *w*_{2}, …, *w*_{m}) by using prior information is still an open problem. Several investigations have been reported in the literature. Roeder

It appears that the optimal Šidák weights and optimal Bonferroni weights have better properties than the weights described in the previous paragraph because these optimal weights are based on maximizing the average power of the procedures. However, the optimal Šidák weights and optimal Bonferroni weights are calculated assuming that the means of test statistics are known, and in practice, these means are unknown. The means of test statistics may be estimated by using prior information (Roeder *et al.*, 2007). When certain prior information is available to estimate the means of statistics, the procedures proposed in this paper are useful and can have much higher power than the widely used Bonferroni procedure. However, how to use prior information to estimate the optimal Šidák weights and optimal Bonferroni weights is still a challenge. We will pursue studies on this topic in the future.

Most of the proposed methods focus on the normal distribution model and one-sided tests. It is trivial to modify the formulas to handle two-sided tests for the normal distribution and the *χ*^{2} distribution. All the proposed methods assume independence among the multiple tests. This assumption is very conservative. In a real data analysis, multiple tests are often highly correlated. For example, in genome-wide association studies, the tests for different markers may be correlated due to linkage disequilibrium among the markers (Conneely and Boehnke, 2007; Nyholt, 2004). How to extend our proposed method to account for correlation among tests is another issue we will pursue in the future.

All the proposed methods focus on the control of the family-wise error rate for multiple testing. However, a similar idea can be applied to control the false discovery rate by using weighted p-values (see also Genovese *et al.*, 2006).

We thank the editor and two referees for their helpful comments and useful suggestions. This research was supported by grants GM073766, GM077490, and GM081488 from the National Institute of General Medical Sciences. Address for correspondence: Dr. Guimin Gao, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294. Email: ggao@ms.soph.uab.edu. Phone: 205-975-9188.

**Proof.** For the *m* independent test statistics (*Z*_{1}, *Z*_{2}, …, *Z*_{m}), we estimate the optimal weights

$$G(\lambda ,\mathbf{\text{w}})=\frac{1}{{m}_{2}}\sum _{j:{\mu}_{j}>0}\overline{\Phi}\left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{j}/m}\right)-{\mu}_{j}\right)-\lambda \left(m-\sum _{j:{\mu}_{j}>0}{w}_{j}\right).$$

By setting the derivatives, with respect to *w*_{i}, for

$$\frac{\partial G(\lambda ,\hspace{0.17em}\mathbf{\text{w}})}{\partial {w}_{i}}\hspace{0.17em}=\hspace{0.17em}-{(1-\alpha )}^{{w}_{i}/m}\frac{\phi \left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)-{\mu}_{i}\right)}{\phi \left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)\right)}\frac{1}{m{m}_{2}}\text{ln}(1-\alpha )+\lambda \hspace{0.17em}=\hspace{0.17em}0,$$

that is

$$\frac{\lambda m{m}_{2}}{\text{ln}(1-\alpha )}=\frac{\phi \left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)-{\mu}_{i}\right)}{\phi \left({\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)\right)}{(1-\alpha )}^{{w}_{i}/m},$$

(A.1)

where $\phi (x)$ is the probability density function of the standard normal distribution. From (A.1), we have

$$\frac{\lambda m{m}_{2}}{\text{ln}(1-\alpha )}=\text{exp}\left({\mu}_{i}{\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)-\frac{{\mu}_{i}^{2}}{2}\right){(1-\alpha )}^{{w}_{i}/m}.$$

Taking logarithm on both sides, we obtain

$$\text{ln}\left(\frac{\lambda m{m}_{2}}{\text{ln}(1-\alpha )}\right)-\frac{{w}_{i}}{m}\text{ln}(1-\alpha )={\mu}_{i}{\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)-\frac{{\mu}_{i}^{2}}{2},$$

(A.2)

or

$$c-\frac{{w}_{i}}{m}\text{ln}\left(1-\alpha \right)={\mu}_{i}{\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)-\frac{{\mu}_{i}^{2}}{2},$$

where $c=\text{ln}\left(\frac{\lambda m{m}_{2}}{\text{ln}(1-\alpha )}\right)$. Therefore, **w** satisfies equations (5).

To make sure that equations (5) provide optimal values, we need to investigate the second derivatives of the Lagrange function with respect to *w*_{i}.

$$\begin{array}{l}\frac{{\partial}^{2}G(\lambda ,\mathbf{\text{w}})}{\partial {w}_{i}^{2}}=\frac{\partial}{\partial {w}_{i}}\left[-{(1-\alpha )}^{{w}_{i}/m}\frac{\phi (\delta -{\mu}_{i})}{\phi (\delta )}\frac{1}{m{m}_{2}}\text{ln}(1-\alpha )+\lambda \right]\\ \quad =-\frac{1}{{m}_{2}}{\left(\frac{1}{m}\text{ln}(1-\alpha )\right)}^{2}{(1-\alpha )}^{{w}_{i}/m}\frac{\phi (\delta -{\mu}_{i})}{\phi (\delta )}\\ \quad \quad +\frac{1}{m{m}_{2}}{(1-\alpha )}^{{w}_{i}/m}\text{ln}(1-\alpha )\left\{\frac{{m}^{-1}{(1-\alpha )}^{{w}_{i}/m}\text{ln}(1-\alpha )\phi (\delta -{\mu}_{i})(\delta -{\mu}_{i})}{{[\phi (\delta )]}^{2}}\right.\\ \quad \quad \left.+\frac{{m}^{-1}{(1-\alpha )}^{{w}_{i}/m}\text{ln}(1-\alpha )\phi (\delta -{\mu}_{i})(-\delta )}{{[\phi (\delta )]}^{2}}\right\}\\ \quad =\frac{{m}_{2}^{-1}{\left({m}^{-1}\text{ln}(1-\alpha )\right)}^{2}{(1-\alpha )}^{{w}_{i}/m}\phi (\delta -{\mu}_{i})\left[-\phi (\delta )-{\mu}_{i}{(1-\alpha )}^{{w}_{i}/m}\right]}{{\left[\phi (\delta )\right]}^{2}}<0,\end{array}$$

where, $\delta ={\overline{\Phi}}^{-1}\left(1-{(1-\alpha )}^{{w}_{i}/m}\right)$. Note that the off-diagonal elements of the Hessian matrix are all zeroes. We conclude that the Hessian matrix is negative definite. Consequently, the solutions of the weights are optimal.

- Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. In: Volume in Onore di Riccardo dalla Volta. Università di Firenze; 1937. pp. 1–62.
- Conneely KN, Boehnke M. So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. Am J Hum Genet. 2007;81:1158–1168. doi: 10.1086/522036.
- Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93:509–524. doi: 10.1093/biomet/93.3.509.
- Hochberg Y, Tamhane AC. Multiple comparison procedures. New York: Wiley; 1987.
- Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65–70.
- Ionita-Laza I, McQueen MB, Laird NM, Lange C. Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan. Am J Hum Genet. 2007;81:607–614. doi: 10.1086/519748.
- Lin DY. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005;21:781–787. doi: 10.1093/bioinformatics/bti053.
- Nakagawa S. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behavioral Ecology. 2004;14:1044–1045. doi: 10.1093/beheco/arh107.
- Nyholt DR. A simple correction for multiple testing for single-nucleotide-polymorphisms in linkage disequilibrium with each other. Am J Hum Genet. 2004;74:765–769. doi: 10.1086/383251.
- Olejnik S, Li JM, Huberty CJ, Supattathum S. Multiple testing and statistical power with modified Bonferroni procedures. J Educat Behavioral Statist. 1997;22:389–406.
- Roeder K, Bacanu S, Wasserman L, Devlin B. Using linkage genome scans to improve power of association scans. Am J Hum Genet. 2006;78:243–252. doi: 10.1086/500026.
- Roeder K, Devlin B, Wasserman L. Improving power in genome-wide association studies: weights tip the scale. Genet Epidemiol. 2007;31:741–747. doi: 10.1002/gepi.20237.
- Rubin D, Dudoit S, van der Laan MJ. A method to increase the power of multiple testing procedures through sample splitting. Stat Appl Genet Mol Biol. 2006;5: Article 19. doi: 10.2202/1544-6115.1148.
- Scherrer B. Biostatistique. G. Morin; Quebec: 1984. p. 850.
- Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc. 1967;62:626–633. doi: 10.2307/2283989.
- Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–754. doi: 10.1093/biomet/73.3.751.
- Wasserman L, Roeder K. Weighted hypothesis testing. 2006. http://arxiv.org/abs/math.ST/0604172 (accessed July 5, 2007).

Articles from Statistical Applications in Genetics and Molecular Biology are provided here courtesy of **Berkeley Electronic Press**
