Meta-analysis of P-values generally benefits from weighting. When samples are obtained from the same or similar populations, as in the model studied by Whitlock and Chen, the optimal weights for the Z-test are given by $w_i = \sqrt{n_i}$. In this case, the weighted Z-test, Lancaster’s test and the test based on pooled data provide very similar power. This is expected, because Lancaster’s method approaches the weighted Z method asymptotically as $\min(n_i)$ increases. When there is heterogeneity of variances but the true mean is the same across studies, weighting by $\sqrt{n_i}/\sigma_i$ is optimal, but the gain in power over weighting by $\sqrt{n_i}$ is not great (0.784 vs. 0.743 at α = 5%). To an extent, the power increase is small because of the large range of sample sizes. A constant sample size of n = 289 would have given powers of 0.801 vs. 0.743, respectively, for the same assumed heterogeneity of variances, $\max(\sigma^2)/\min(\sigma^2) = 13.4$. When there is heterogeneity of means, the Z-test that uses standardized effect sizes as weights has the largest power (Lipták, 1958; Won et al., 2009); however, applying this test requires knowledge of μ. Note that this value needs to be pre-specified: plugging in an estimate $\hat{\mu}$ obtained from the same data that was used to compute the P-values would invalidate the combination test.
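As a rough illustration of the weighting schemes discussed above, the following Python sketch combines one-sided P-values with the weighted Z-test. It assumes the usual form of the weighted Z statistic, Σ w_i Z_i / sqrt(Σ w_i²) with Z_i = Φ⁻¹(1 − p_i), and P-values strictly between 0 and 1. The function names and all input values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def weighted_z(p_one_sided, weights):
    """Combine one-sided P-values (each strictly between 0 and 1)."""
    # Map each P-value to a normal score; a small P gives a large positive Z.
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_one_sided]
    numerator = sum(w * zi for w, zi in zip(weights, z))
    denominator = sqrt(sum(w * w for w in weights))
    return 1.0 - STD_NORMAL.cdf(numerator / denominator)


def sqrt_n_weights(n):
    """Weights sqrt(n_i), for samples from the same or similar populations."""
    return [sqrt(ni) for ni in n]


def sqrt_n_over_sigma_weights(n, sigma):
    """Weights sqrt(n_i) / sigma_i, for studies with heterogeneous variances."""
    return [sqrt(ni) / si for ni, si in zip(n, sigma)]


# Hypothetical inputs (not taken from the study).
p_values = [0.04, 0.20, 0.01]
n = [25, 100, 400]
sigma = [1.0, 2.0, 1.5]

p_by_n = weighted_z(p_values, sqrt_n_weights(n))
p_by_n_sigma = weighted_z(p_values, sqrt_n_over_sigma_weights(n, sigma))
```

The standardized-effect-size weights are not included, because they require a pre-specified μ that cannot be estimated from the same data.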
In this study, one-sided P-values were assumed. Such P-values are appropriate for meta-analytic combination of P-values from several studies. Two-sided P-values are generally inappropriate, because they are oblivious to the effect direction. Two-sided P-values from two studies in which the effect direction is flipped can nevertheless both be small, resulting in an inappropriately small combined P-value. On the other hand, the combined result of the corresponding one-sided P-values will properly reflect the cancellation of the pooled effect that would have been observed if raw data from the two studies were combined.
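The cancellation argument can be illustrated numerically. The sketch below assumes two hypothetical studies of equal size whose Z statistics are equal in magnitude but opposite in sign (2.5 and −2.5), and applies an equally weighted Z-test. The helper name `combine_z` is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def combine_z(p_values):
    """Equally weighted Z-test applied to the supplied P-values."""
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    return 1.0 - STD_NORMAL.cdf(sum(z) / sqrt(len(z)))


# Hypothetical Z statistics with flipped effect directions.
z_stats = [2.5, -2.5]

# Two-sided P-values ignore the sign, so both are small (about 0.012 each).
p_two_sided = [2.0 * (1.0 - STD_NORMAL.cdf(abs(z))) for z in z_stats]

# One-sided P-values keep the sign: one is small, the other is close to 1.
p_one_sided = [1.0 - STD_NORMAL.cdf(z) for z in z_stats]

# Feeding the two-sided P-values into the combination gives a misleadingly
# small result, whereas the one-sided P-values cancel to about 0.5.
combined_from_two_sided = combine_z(p_two_sided)
combined_from_one_sided = combine_z(p_one_sided)
```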
Despite the fact that the mechanics of the meta-analytic process involve manipulation of one-sided P-values, it is often the case that the final result needs to be a two-sided P-value. For example, when allele frequencies are compared between two groups of individuals classified based on the presence or absence of a trait, the null hypothesis is usually that the frequency is the same, and the alternative hypothesis does not specify a particular effect direction. The weighted Z-test provides an important advantage in dealing with this situation, due to the symmetry of the normal transformation. There are two possible one-sided combined P-values, one for each assumed effect direction, but with the weighted Z method, the combined P-value for the first assumed direction is the same distance from 1/2 as the combined P-value for the second assumed direction. Therefore, one can arbitrarily assume either one of the two directions when computing one-sided P-values, and obtain a combined one-sided P-value, $p_{\text{one-sided}}$. The two-sided combined P-value is the same regardless of the assumed direction:

$$p_{\text{two-sided}} = 2\min\left(p_{\text{one-sided}},\; 1 - p_{\text{one-sided}}\right) \qquad (3)$$
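This symmetry can be checked numerically, assuming that the weighted Z statistic has the usual form Σ w_i Z_i / sqrt(Σ w_i²). Switching the assumed direction replaces every one-sided P-value p_i by 1 − p_i, which flips the sign of the combined statistic and leaves the two-sided result unchanged. The function names and input values below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def weighted_z(p_one_sided, weights):
    """Weighted Z-test for one-sided P-values strictly between 0 and 1."""
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_one_sided]
    numerator = sum(w * zi for w, zi in zip(weights, z))
    return 1.0 - STD_NORMAL.cdf(numerator / sqrt(sum(w * w for w in weights)))


def two_sided(p_one_sided):
    """Two-sided P-value from a combined one-sided P-value."""
    return 2.0 * min(p_one_sided, 1.0 - p_one_sided)


# Hypothetical one-sided P-values under the first assumed direction.
p_direction_a = [0.02, 0.30, 0.10]
weights = [3.0, 5.0, 4.0]

# Under the opposite assumed direction each P-value becomes its complement.
p_direction_b = [1.0 - p for p in p_direction_a]

combined_a = weighted_z(p_direction_a, weights)
combined_b = weighted_z(p_direction_b, weights)

# combined_a and combined_b sum to 1 (up to rounding error), so both
# yield the same two-sided P-value.
p_two_sided_a = two_sided(combined_a)
p_two_sided_b = two_sided(combined_b)
```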
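What if the available individual P-values are all two-sided? Often, studies report P-values that correspond to statistics such as |T| and |Z|, or their squared values, i.e. the one degree of freedom chi-square. These individual P-values can be converted to one-sided before combining as follows:

$$p_{\text{one-sided}} = \begin{cases} p_{\text{two-sided}}/2, & \text{if the observed effect is in the assumed direction,} \\ 1 - p_{\text{two-sided}}/2, & \text{otherwise.} \end{cases}$$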
Once again, the assumed effect direction can be chosen arbitrarily. For example, in testing for association of an allele with a trait at a biallelic locus A/a, we can arbitrarily choose one of the alleles, e.g. allele A. Then the “effect direction” for the i-th study is positive if there is a positive correlation of that allele with the presence of the trait in that study. Once these one-sided P-values are combined, the result can be converted back to two-sided by Equation (3).
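The whole procedure, from reported two-sided P-values to a combined two-sided P-value, can be sketched as follows. The function names, the use of a `+1`/`-1` sign to encode each study's effect direction relative to the arbitrarily chosen reference allele, and the example inputs are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def to_one_sided(p_two_sided, sign):
    """Convert a two-sided P-value, given the observed effect sign (+1 or -1)."""
    half = p_two_sided / 2.0
    return half if sign > 0 else 1.0 - half


def weighted_z(p_one_sided, weights):
    """Weighted Z-test for one-sided P-values strictly between 0 and 1."""
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_one_sided]
    numerator = sum(w * zi for w, zi in zip(weights, z))
    return 1.0 - STD_NORMAL.cdf(numerator / sqrt(sum(w * w for w in weights)))


def to_two_sided(p_one_sided):
    """Convert a combined one-sided P-value back to two-sided."""
    return 2.0 * min(p_one_sided, 1.0 - p_one_sided)


def combine_two_sided(p_two_sided, signs, weights):
    """Convert to one-sided, combine, then convert the result back."""
    p_one = [to_one_sided(p, s) for p, s in zip(p_two_sided, signs)]
    return to_two_sided(weighted_z(p_one, weights))


# Hypothetical studies: reported two-sided P-values, effect signs relative
# to the chosen reference allele, and sqrt(n) weights.
p_reported = [0.03, 0.50, 0.008]
signs = [+1, -1, +1]
weights = [sqrt(n) for n in (120, 80, 300)]

p_combined = combine_two_sided(p_reported, signs, weights)

# Choosing the other allele as the reference flips every sign but does not
# change the combined two-sided P-value.
p_combined_flipped = combine_two_sided(p_reported, [-s for s in signs], weights)
```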
Another advantage of the weighted Z-test is that it can be easily extended to account for correlated statistics between studies. For the test to be valid under dependence, we need the assumption that the set of $\{Z_i\}$ jointly follows a multivariate normal distribution under the null hypothesis. If $\mathrm{cor}(Z_i, Z_j) = r_{ij}$, the modification amounts to replacing the denominator in Equation (1) with $\sqrt{\sum_i w_i^2 + 2\sum_{i<j} w_i w_j r_{ij}}$. The multivariate normal assumption is often justified asymptotically, and in certain situations the correlations $\{r_{ij}\}$ are known. For example, when each $Z_i$ is the result of comparing group i of sample size $n_i$ to a common “control” group of sample size $n_0$ by the two-sample T-test, then $r_{ij} = 1/\sqrt{(1 + n_0/n_i)(1 + n_0/n_j)}$ (Dunnett, 1955). In principle, a variation of the weighted Fisher’s method can be extended to this situation, if we can assume that the chi-square statistics formed from individual P-values can be represented by squares of underlying multivariate normal variables with correlations $r_{ij}$. However, the required computations are more involved. First, the two degree of freedom chi-square transformation in Equation (2) would have to be replaced with the one degree of freedom transformation. Then one would need to compute eigenvalues of

, where
w is the vector of weights and
R
R is the matrix of squared correlations. Finally, to compute the combined
P-value, one can use the fact that the weighted sum of these correlated chi-squares can be represented by the sum of independent weighted chi-squares with weights given by the above eigenvalues (
Box, 1954). Thus, one can use the observed weighted sum of correlated chi-squares with weights substituted by the eigenvalues as an input to a routine for computing the cumulative distribution of the sum of independent weighted chi-squares.
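The correlation adjustment to the weighted Z-test described at the start of this paragraph can be sketched as follows. The sketch assumes that the null variance of Σ w_i Z_i is Σ_i Σ_j w_i w_j r_ij (with r_ii = 1), and that the correlation for comparisons against a shared control group is 1/sqrt((1 + n_0/n_i)(1 + n_0/n_j)). The function names and inputs are hypothetical, and the eigenvalue-based extension of Fisher’s method is not attempted here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def shared_control_corr(n_i, n_j, n_0):
    """Correlation of two statistics that share a control group of size n_0."""
    return 1.0 / sqrt((1.0 + n_0 / n_i) * (1.0 + n_0 / n_j))


def correlation_matrix(group_sizes, n_0):
    """Full correlation matrix, with ones on the diagonal."""
    k = len(group_sizes)
    return [
        [
            1.0 if i == j
            else shared_control_corr(group_sizes[i], group_sizes[j], n_0)
            for j in range(k)
        ]
        for i in range(k)
    ]


def weighted_z_correlated(p_one_sided, weights, corr):
    """Weighted Z-test with the denominator adjusted for correlation."""
    z = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_one_sided]
    numerator = sum(w * zi for w, zi in zip(weights, z))
    k = len(weights)
    # The double sum over all (i, j) equals
    # sum_i w_i^2 + 2 * sum_{i<j} w_i * w_j * r_ij.
    variance = sum(
        weights[i] * weights[j] * corr[i][j]
        for i in range(k)
        for j in range(k)
    )
    return 1.0 - STD_NORMAL.cdf(numerator / sqrt(variance))


# Hypothetical example: three groups, each compared with one control group.
group_sizes = [50, 80, 120]
n_control = 100
corr = correlation_matrix(group_sizes, n_control)
p_combined = weighted_z_correlated(
    [0.04, 0.10, 0.02], [sqrt(n) for n in group_sizes], corr
)
```

With an identity correlation matrix the function reduces to the ordinary weighted Z-test; positive correlations inflate the denominator and so make the combined P-value less extreme.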
Although there is no single method for combining P-values that is most powerful in all situations, the meta-analytic setup considered by Whitlock and extended here to include study heterogeneity is quite general, because many forms of one-sided statistics approach a normal distribution asymptotically. Therefore, the $\sqrt{n_i}$- or $\sqrt{n_i}/\sigma_i$-weighted Z-test for combining one-sided P-values can be recommended in most situations.
In this study, the weighted Fisher’s method showed slightly smaller power values than the other methods. If absolute values or squares of T-statistics for each study had been assumed instead, as in the calculation of two-sided tests, the weighted Fisher’s method would have yielded higher power values than either Lancaster’s or the weighted Z methods. As already noted, combining individual two-sided P-values is generally not appropriate in meta-analysis, where the same hypothesis is tested in all studies. Combination of two-sided P-values is more appropriate when individual tests are concerned with separate hypotheses. A small combined P-value in that case can be interpreted as evidence that one or more individual null hypotheses are false. Owing to its sensitivity to small P-values, the weighted Fisher’s method would provide good power, especially in those situations where there is pronounced heterogeneity of effect sizes between studies.