In a statistical experiment, a P-value is considered significant if it is less than that experiment's chosen alpha value. The alpha value specifies the accepted probability of declaring a result statistically significant when it is in fact merely the result of random chance. For example, in an experiment using an alpha value of 0.05, there is a 1 in 20 chance that any given true ‘null’ test would seem significant just by chance. When multiple hypotheses are tested, each hypothesis has a probability of being falsely determined to be significant. If 10 hypotheses are tested and the alpha level is 0.05, then the chance of finding at least one apparently significant difference due to random chance is approximately 0.40 (which is 1 − 0.95^10).
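The arithmetic above can be sketched directly; the function name here is illustrative, and the formula assumes independent tests:

```python
# Family-wise error rate: the probability of at least one false positive
# among n independent tests, each using significance threshold alpha.
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    return 1.0 - (1.0 - alpha) ** n_tests

# A single test at alpha = 0.05 gives a 1-in-20 chance of a spurious result;
# ten such tests give roughly a 40% chance of at least one.
single = family_wise_error_rate(0.05, 1)   # 0.05
ten = family_wise_error_rate(0.05, 10)     # ~0.40
```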
Correction for multiple hypotheses attempts to keep the probability of falsely finding any significant hypothesis at the alpha value. The most common multiple hypothesis correction method is the Bonferroni correction, whereby the alpha value is simply divided by the number of tests, so that the overall chance of finding any false positive remains the same as in a single-hypothesis experiment. The Bonferroni correction assumes that the tests are independent, and is usually considered a conservative adjustment (Sokal and Rohlf, 1995). In our case, the hypotheses (GO nodes) are not independent, because the nodes themselves are structured in a DAG, and it is thus not clear whether a Bonferroni adjustment is appropriate. To determine whether the Bonferroni correction is appropriate for multiple hypothesis correction, we implemented a simulation-based correction within GO::TermFinder. For each simulation, the same number of genes as were provided in the real data were picked randomly from the list of genes that defines the background distribution, and P-values were calculated as normal. The adjusted P-value for each node in the real data was calculated as the fraction of 1000 null-hypothesis simulations in which any node achieved a P-value as good as or better than that node's P-value, where the null hypothesis states that a randomly chosen list of genes should not be significantly annotated to any GO node. By examining the output of the simulations to determine the correction factor that would need to be applied to the uncorrected P-values, and comparing it with the Bonferroni-adjusted P-values, we determined that the Bonferroni adjustment is in fact somewhat liberal, rather than conservative. Both simulation and Bonferroni corrections are provided as options for multiple hypothesis correction; while the simulation-based analysis is the more accurate, it also takes three orders of magnitude longer to run, because 1000 independent simulations are needed.
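A minimal sketch of this simulation-based correction, assuming a caller-supplied `pvalues_fn` that maps a gene list to uncorrected per-node P-values (GO::TermFinder itself is written in Perl; the function and variable names here are illustrative, not the module's API):

```python
import random

def simulate_corrected_pvalues(study_genes, background_genes, pvalues_fn, n_sims=1000):
    """Adjust per-node P-values by simulation under the null hypothesis.

    study_genes      -- the real gene list of interest
    background_genes -- the list of genes defining the background distribution
    pvalues_fn       -- maps a gene list to {go_node: uncorrected P-value}
    Returns {go_node: adjusted P-value} for the real data.
    """
    real_pvals = pvalues_fn(study_genes)
    # For each simulation, draw a random gene list of the same size and
    # record the single best (smallest) P-value achieved by any node.
    best_null_pvals = []
    for _ in range(n_sims):
        random_genes = random.sample(background_genes, len(study_genes))
        best_null_pvals.append(min(pvalues_fn(random_genes).values()))
    # Adjusted P-value for a node: the fraction of simulations whose best
    # null P-value is as good as or better than that node's real P-value.
    return {
        node: sum(best <= p for best in best_null_pvals) / n_sims
        for node, p in real_pvals.items()
    }
```

Taking the minimum over all nodes in each simulation is what makes this a family-wise correction: a node's adjusted value estimates the chance that *any* node would look at least as significant on a random gene list, which is the same quantity the Bonferroni correction bounds analytically.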