In this manuscript, we describe methods for combining and adjusting P-values in the context of the shared controls design. We suggest that P-value based analysis provides a powerful means of inference for association studies that reuse control individuals.
The first problem that we consider is the conditional P-value adjustment. This type of adjustment arises specifically in GWAS, when a particular SNP comes into the spotlight because of a significant P-value for a particular disease. The scenario that we consider is one in which case samples for several etiologically related diseases are contrasted against a common control group. Suppose that a small P-value, significant at the GWAS level, is observed for one disease. Suppose that we next notice that the association P-value for a related disease at that SNP is 0.01. This value by itself would not stand out among the GWAS results. However, if these two P-values were independent, we could claim that there is support for a hypothesis of common etiology. With the shared controls design, the P-values are no longer independent. The chance of observing a second P-value of 0.01 or smaller is no longer 1%, and can be considerably higher. Thus, without taking the correlation into account, we would arrive at a spurious conclusion. Our approach quantifies just how likely it is to observe a small P-value, given observations of small P-values for one or several related diseases, and leads to a conditional P-value adjustment.
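The effect of shared controls on a second P-value can be illustrated with a minimal sketch. It is not the exact adjustment derived in the paper. It assumes that both Z-scores are standard normal under the joint null hypothesis, that their correlation is approximated by the usual shared-controls expression sqrt(n1 n2 / ((n1 + n0)(n2 + n0))), and that we condition on the observed value of the first Z-score. The function names and the sample sizes in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def shared_control_correlation(n_case1, n_case2, n_control):
    """Approximate null correlation of two case-control Z-scores that
    reuse the same controls (an assumed textbook approximation)."""
    return sqrt(n_case1 * n_case2
                / ((n_case1 + n_control) * (n_case2 + n_control)))


def conditional_two_sided_p(p_first, p_second, rho):
    """P(second two-sided P-value <= p_second | first Z-score as observed).

    Assumes a bivariate normal null with correlation rho (|rho| < 1),
    so that Z2 given Z1 = z1 is N(rho * z1, 1 - rho**2).
    """
    z1 = -STD_NORMAL.inv_cdf(p_first / 2.0)   # |Z1|; its sign does not matter here
    a = -STD_NORMAL.inv_cdf(p_second / 2.0)   # threshold on |Z2|
    s = sqrt(1.0 - rho ** 2)                  # conditional standard deviation
    upper_tail = STD_NORMAL.cdf((rho * z1 - a) / s)    # P(Z2 >= a | z1)
    lower_tail = STD_NORMAL.cdf((-a - rho * z1) / s)   # P(Z2 <= -a | z1)
    return upper_tail + lower_tail


# Hypothetical sizes: two case groups of 1000 and 1000 shared controls.
rho = shared_control_correlation(1000, 1000, 1000)
p_if_independent = conditional_two_sided_p(5e-8, 0.01, 0.0)
p_with_sharing = conditional_two_sided_p(5e-8, 0.01, rho)
```

With zero correlation the function returns the nominal 0.01, whereas with the shared-controls correlation the conditional probability is much larger, which is the reason an unadjusted second P-value can be misleading.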
Next, we consider the problem of combining association signals across several diseases. We find that weighted versions of P-value combination methods that take into account correlation due to shared individuals are as powerful as analysis that aggregates individual genotype information. These methods are especially useful in meta-analytic applications. Association signals can be combined across distinct diseases with similar, genetically mediated etiology. The inverse chi-square method, which is robust in the presence of either association heterogeneity or association direction reversal, is especially useful. Alternatively, P-values obtained for a single disease and several independent case samples, contrasted against the same control group, can be combined to ascertain an overall strength of association. The proposed inverse normal method is most appropriate for this situation. The P-value combination methods described here are useful in broader contexts than just the shared controls design. These methods can be applied whenever asymptotically normal statistics, or their squared versions, are used, and the correlation between the statistics is either known or the tests are independent. The idea of combining several two-sided P-values in a meta-analytic application by first converting them to one-sided, combining, and converting the result back to a two-sided P-value was considered previously. Overall and Rhoades considered such an approach based on the Fisher combination method [Overall and Rhoades, 1986]. However, because the effect direction cannot be chosen beforehand, with that approach one would have to consider both directional hypotheses in turn, compute two one-sided P-values, and then double the minimum of the two. Doubling is nothing more than the Bonferroni penalty, which results in a conservative test. In fact, the resulting P-value can be greater than one. With the inverse normal approach that we advocate, the two combined Z-scores are identical but opposite in sign. Thus, our approach avoids the penalty and gives a single two-sided combined P-value. Moreover, we find that in most cases, the resulting P-value is nearly the same as that provided by the overall statistic based on pooling raw data. Our weighting approach for combined P-values gives an improvement over a previously suggested weighting by the study size for combining independent tests [Whitlock, 2005]. We suggest that for independent tests, the optimal weighting is by the square root of the study size, that is, the weights should be proportional to the inverse of the standard error.
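The contrast between the two ways of combining two-sided P-values can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weighted inverse normal statistic is taken to be the weighted sum of signed Z-scores divided by sqrt(w'Rw) for a known correlation matrix R, and the doubling approach is reproduced with the closed-form chi-square tail for even degrees of freedom. All function names, weights, and P-values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_normal_combine(p_values, directions, weights, corr=None):
    """Weighted inverse normal combination of two-sided P-values.

    directions: +1 or -1 for each effect; corr: correlation matrix of the
    Z-scores (identity, i.e. independent tests, when omitted).
    Returns the combined Z-score and its two-sided P-value.
    """
    k = len(p_values)
    if corr is None:
        corr = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    # Signed Z-score: magnitude from the two-sided P-value, sign from direction.
    z = [d * -STD_NORMAL.inv_cdf(p / 2.0) for p, d in zip(p_values, directions)]
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(k) for j in range(k))
    z_comb = numerator / sqrt(variance)
    return z_comb, 2.0 * STD_NORMAL.cdf(-abs(z_comb))


def fisher_doubled_min(p_values, directions):
    """Fisher combination of one-sided P-values in both directions,
    followed by doubling the smaller result (may exceed one)."""
    def fisher(one_sided):
        half_x = -sum(log(p) for p in one_sided)  # X / 2, X ~ chi-square, 2k df
        term, tail = 1.0, 1.0
        for j in range(1, len(one_sided)):        # sum of (X/2)^j / j!, j < k
            term *= half_x / j
            tail += term
        return exp(-half_x) * tail
    upper = [p / 2.0 if d > 0 else 1.0 - p / 2.0
             for p, d in zip(p_values, directions)]
    lower = [1.0 - p for p in upper]
    return 2.0 * min(fisher(upper), fisher(lower))


# Two independent studies of 1000 subjects each, weighted by sqrt(size).
w = [sqrt(1000), sqrt(1000)]
z_pos, p_pos = inverse_normal_combine([0.05, 0.05], [+1, +1], w)
z_neg, p_neg = inverse_normal_combine([0.05, 0.05], [-1, -1], w)
_, p_opposed = inverse_normal_combine([0.05, 0.05], [+1, -1], w)
doubled = fisher_doubled_min([1.0, 1.0], [+1, +1])
```

Reversing every direction flips the sign of the combined Z-score and leaves the two-sided P-value unchanged, so no doubling is needed. Effects in opposite directions cancel rather than reinforce each other, and the doubled Fisher result for two uninformative P-values exceeds one.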
The last issue is that of multiple testing. Whenever tests are combined with the goal of obtaining consensus evidence in support of a common hypothesis, there is a possibility that a significant result is driven by just one very small P-value. If P-values for k diseases are combined, one might be interested in examining individual P-values, adjusted for having made k tests. In general, taking into account the correlation between these tests results in a less conservative penalty than that provided by a simple Bonferroni adjustment. However, we find that when the shared control group is at least as large as the largest case group, the improvement over the Bonferroni adjustment is negligible, especially at the small significance levels that are appropriate in the context of GWAS. Thus, we recommend that when the multiplicity adjustment is made based on testing association for k diseases, the Bonferroni adjustment is sufficient.
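How little is gained over Bonferroni can be checked numerically in a simplified setting. The sketch below assumes that the k Z-scores are standard normal under the null with one common pairwise correlation rho, which corresponds to equally sized case groups sharing one control group, and it evaluates the probability that at least one two-sided P-value is as small as the observed minimum by one-dimensional numerical integration over a shared factor. This is an illustrative calculation under those assumptions, not the procedure used in the paper, and the function names and grid settings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bonferroni(p_min, k):
    """Bonferroni-adjusted minimum P-value for k tests."""
    return min(1.0, k * p_min)


def equicorrelated_min_p(p_min, k, rho, grid=4001, span=10.0):
    """P(at least one of k two-sided P-values <= p_min) when the Z-scores
    share a common pairwise correlation rho (0 <= rho < 1).

    Uses Z_i = sqrt(rho) * U + sqrt(1 - rho) * E_i with independent
    standard normal U and E_i, and trapezoidal integration over U.
    """
    a = -STD_NORMAL.inv_cdf(p_min / 2.0)      # threshold on |Z_i|
    c, s = sqrt(rho), sqrt(1.0 - rho)
    step = 2.0 * span / (grid - 1)
    total = 0.0
    for i in range(grid):
        u = -span + i * step
        # Chance that a single test exceeds the threshold, given U = u.
        t = STD_NORMAL.cdf((c * u - a) / s) + STD_NORMAL.cdf((-a - c * u) / s)
        any_hit = 1.0 - (1.0 - t) ** k        # at least one of k exceeds
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * STD_NORMAL.pdf(u) * any_hit
    return total * step


# Hypothetical case: k = 3 diseases, control group as large as each case
# group, so the assumed common correlation is n / (n + n0) = 0.5.
adjusted_moderate = equicorrelated_min_p(0.01, 3, 0.5)
adjusted_gwas = equicorrelated_min_p(1e-7, 3, 0.5)
ratio_gwas = adjusted_gwas / bonferroni(1e-7, 3)
```

In this setting the correlation-aware adjustment is somewhat smaller than the Bonferroni value at a moderate threshold, while at a GWAS-scale threshold the ratio of the two is very close to one.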
Recently, Lin and Sullivan described ways to perform meta-analysis by combining individual records as well as summary statistics, which also allow for shared study subjects [Lin and Sullivan, 2009]. Their approaches and the approaches we describe are mutually complementary, building toward a statistical framework for comprehensive analysis of genetic data with overlapping subjects. The P-value based approach is efficient, and it is also useful because of its simplicity and broad applicability. Most of the analysis described here requires only access to association P-values and knowledge of sample sizes. The inverse normal approach has the additional, obvious requirement that the association direction be known: with this approach, one would not want P-values to reinforce the combined result unless the respective effect directions are in agreement.