Searching for sets of predictive markers results in many putative models being taken into consideration. In the extreme case, all possible combinations are evaluated and a criterion, such as a measure of statistical association, is scored; the combination with the best score is taken. This combination may be examined in an independent follow-up study to verify that it is not a “false positive”. If there are
k markers to start with, there are
k(
k − 1)/2 pairs, and, at most,
L = 2
k − 1 combinations, including individual markers, all possible marker pairs, triples, and so on. A considerable problem with this approach is that the probability of a false positive increases with the number of combinations. The probability that a
p-value of
p or smaller represents a false positive is related to the false discovery rate, FDR (
Morton 1998,
Storey, 2002). At a given
fixed value of
p,
where
F(
p) is the
p-value cumulative distribution function under the alternative hypothesis, which reflects the power. For example, with the χ
2 test,

where Ψ
d,λ and Ψ
d,0 are the cumulative distribution functions of central and non-central χ
2, respectively, and
d denotes the degrees of freedom.
Supposing that statistical tests and their p-values are used to rank the hypotheses and that there is a single “true” association, HA, among L−1 null hypotheses, H0, where L is the number of combinations, then the probability that a randomly picked hypothesis represents a false discovery is
If we let

(FDR) = FDR/(1 − FDR), then this is an increasing function of FDR. Using Pr(
H0) = 1 − 1/
L,
One interpretation of this formula is that the power gained by considering combinations must increase more sharply than the number of combinations, L, otherwise FDR is going to increase. If L is an exponential function of the number of markers, k, the condition that FDR should not increase is difficult to achieve. Practically, this means that when L is large, the chances that a claimed association turns out to be a false positive are high.
For example, assume the power to detect a single true combination at α = 0.05 is 80%. Our analysis depends on the definition of a “true combination” and for illustrative purposes we assume that there is only a single subset. One can use a different definition, whereby every subset that partially overlaps with the combination of interest does represent a true discovery, which will reduce L in half. If we start with k = 10 markers and examine all L = 2k−1 = 1023 combinations, then among p-values ≤ 0.05, 98% are going to be false positives, with the assumption of a single true combination.
It is worthy to note that simple multiple testing correction alone does not easily get around this problem: suppose the testing is at the Bonferroni-corrected α = 0.05/1023 and that the proportion of false positives among p-values as small as 0.05/1023 ≈ 4.9 × 10−5 is examined. The proportion of false positives still remains high; about 32%. The situation worsens with a larger k. For example, when k = 15 the proportion of false positives among significant p-values after applying the Bonferroni correction (1.5 × 10−6 or smaller) is 69%.
This problem is very hard to circumvent, because L quickly increases with the number of markers, but the sample size does not increase correspondingly (see
Witte et al., 2000, for requirements on that). On the other hand, RP has the implicit assumption that at least one of the individual markers in the “true” associated set, as well as the smaller subsets (combinations) of markers in this set, should show some degree of association. As a consequence, the number of combinations remains linear as a function of the number of markers, keeping FDR relatively low (e.g., a tree with three splits will make approximately 3 ×
k comparisons).
How reasonable is the assumption that at least one of the individual effects will be present? One might think it is easy to come up with examples of marker interactions that preclude “marginal” effects. Consider the simplest possible scenario: a binary case (e.g., the presence/absence of a disease) and the control phenotype (
Y, N) and two diallelic loci (two binary predictors),
x1 = {0, 1} and
x2 = {0, 1}. If one considers a situation where allele ”1” has to be present at both markers in order for the disease risk to be high, then this is a clear case of interaction; however, the “case” category is likely to be enriched with the alleles of type ”1” at both markers compared to controls. A more subtle and seemingly “no-marginal effects” scheme is the following: ”1,1” or ”0,0” combinations are both associated with high risk, whereas ”0,1” and ”1,0” have low risk. Similar situations have been considered as justification for methods that go after “pure interactions”.
Jannot et al. 2003 considered such interactions for the case of genotypic interactions where the penetrance array is 3 × 3 for two loci. They evaluated an approach wherein all 2
k−1 marker combinations were analyzed and
p-values associated with the best combination were obtained by a permutation algorithm.
Lin et al. 2004 took a similar approach using transmission disequilibrium tests. They state that because of the usage of the permutation approach to multiple tests “the gains from considering haplotypes in our exhaustive allelic method are not overshadowed by the penalty paid for doing far more tests”. These approaches assume that the accurate type-I error protection through permutations is balanced with the appropriate increase in power provided by considering the proper subset.
Ritchie et al. 2001 proposed an algorithm that starts with a multi-dimensional table that allows for all possible interactions for a given subset of markers. The multi-dimensional table is represented by one of 2
k−1 combinations of
k markers. The high-risk cells are then pooled into one group and the low-risk cells into another group, and a cross-validation is used to assess robustness of the reduced model. A combinatorial partitioning method (CPM) proposed by
Nelson et al. 2001 evaluates all partitions of
a cells (multilocus genotypes) into
b groups. The number of partitions is given by:
which is substantially larger than the number of marker subsets. For example, with three diallelic markers there are 2
3−1 = 7 marker subsets that could be tested separately. However, assuming we start with the binary classification at each locus and
a = 2
3 multilocus combinations, there are 127 ways to partition the genotypes into two groups, and 966 ways to partition them into three groups. Another combinatorial algorithm by
Tahri-Daizadeh et al. 2003 considers increasingly complex models and evaluates proposed models on the basis of information criterion (
Akaike, 1974). For quantitative traits,
Culverhouse et al. 2004 proposed a modification of CPM that reduces this number of putative models by successfully merging cells with similar values of response. Such short-cut algorithms may consider a single model after the collapsing is finished. Significance levels are adjusted accordingly by a permutation algorithm. Nevertheless, the null distribution obtained via permutations essentially accounts for the number of looks through the data. This can be thought of as the adjustment by the ”effective” number of tests,
Le ≤
L.
While methods optimized for discovery of pure interactions are important, even the most contrived penetrance configurations allow for marginal effects – the feature exploited by the RP. The advantage of the RP method is in drastically reducing the number of potential models, thus lowering Pr(
H0) in (
1) and, therefore, reducing the probability of a false discovery.
Cordell and Clayton (2002) proposed a stepwise logistic regression procedure for evaluating the effects of the different polymorphisms within a gene.
Curran (2003) compared such approach using the combination of forward and backward selection with the performance of RP. The research indicated a remarkable similarity between the two methods in terms of the predictive power and the effects and interactions entering the final model.
Returning to the “no-marginal effects” scheme (”1,1”/”0,0” vs. ”0,1”/”1,0”) it may seem that all alleles are going to be present at equal frequencies in both phenotype groups. However, this is generally not the case and it is possible to discover such haplotype-trait associations by looking at just a single marker. The single marker effect may be smaller than a multilocus effect; however the reduction in degrees of freedom or the number of tests may balance a reduction in the effect magnitude.
The four possible haplotypes formed by the two SNPs are X = {00, 01, 10, 11}, with frequencies P = {p00, …, p11}. Four haplotype penetrance values are:
With random pairing of haplotypes and multiplicativity, the population prevalence of Y is:
In terms of the two SNPs that constitute a haplotype, the values γij define two-locus interactions with respect to the binary phenotype. For example, a high value of Pr(Y | X = 11) relative to γ indicates that both x1 = 1 and x2 = 1 are required for this probability to be above the population average. Suppose we have allele information at one of the markers, x1. Then the association can be detected if the penetrance of one of the alleles, e.g. γ1·. = Pr(Y | x1 = 1) is different from γ. This marginal allele penetrance value is
using the allele probabilities among the case and the control groups:
In turn, these are calculated from the frequencies of haplotypes and the penetrance values as
When γ1· ≠ γ, the marginal effect of a SNP can be detected. Thus, in general, this effect depends not only on the haplotype penetrances but on the population frequencies of the haplotypes as well. Looking at the “no marginal effect” difference,
it can be seen that for a plausible “pure interaction” penetrance configuration,

, the additional requirement on the population frequencies of haplotypes is such that the two pairs of frequencies should match:
As an example let the haplotype penetrance array {γ00, γ01, γ10, γ11} be Γ = {0.9, 0.2, 0.2, 0.9}, which may seem to imply that an SNP effect is unlikely, because of the ”orthogonal” structure of Γ. As often happens in reality, one of the four haplotypes can be of relatively high population frequency. In such a case, the marginal (SNP) effects can be quite pronounced, as can be seen from two following examples.
Example 1. Consider a situation when one of the low penetrance haplotype has a high frequency, P = {0.05, 0.05, 0.85, 0.05}. In this case, γ = 0.27 and γ1· = 0.55. The frequencies among the cases and the controls are Pr(x1 = 1 | Y) = 0.204 and Pr(x1 = 1 | N) = 0.062 corresponding to the loge of relative risk of 1.191.
Example 2. Now suppose that one of the high penetrance haplotypes has a high frequency, P = {0.05, 0.05, 0.05, 0.85}. The values become γ = 0.83 and γ1· = 0.55. The frequencies among the cases and the controls are Pr(x1 = 1 | Y) = 0.066 and Pr(x1 = 1 | N ) = 0.265 corresponding to the log relative risk of −1.39. Such value provides 99% power (at α = 0.05) with samples of 160 cases and 160 controls.
Allelic penetrance
equation (5) can be easily generalized to multiple genetic loci. Let ”
j” index all possible haplotypes that contain allele ”1” at the locus of interest. Then the marginal effect associated with allele ”1” is
where the denominator gives the frequency of ”1”. The condition of no marginal effect associated with allele ”1” is again γ1· = γ.
Example 3. As a three-locus illustration using RP, consider three diallelic loci, x1, x2, x3, with eight possible haplotype combinations, {000, 001, ..., 111}, with penetrances and population frequencies sampled from the Dirichlet(1/4, ..., 1/4) distribution, as given in .
| Table 1Population parameters for the RP example |
In this example there are two high risk haplotypes (000 and 111) carrying different alleles. By looking at the penetrance values alone, a technique that relies on the marginal effects associated with one of the three SNPs may appear unlikely to be successful. For simplicity, we considered a haploid population and sampled 250 case and 250 control haplotypes. A sample RP tree using HelixTree
® is shown in . HelixTree is a software system that uses recursive partitioning algorithms customized to the field of genetics (
www.goldenhelix.com).
The first split is on x3 with a p-value 4.6 ×10−4, signifying a substantial effect associated with this SNP. Allele ”0” at this SNP appears predictive of the cases, with the sample prevalence increasing from 0.5 in the root node to 0.86 at the first split. Following splits yield even smaller p-values. It appears that as the interaction is successfully uncovered, the p-values can still get progressively smaller even though the sample size is decreased. The right branch involving all three SNPs is significant with a p-value of 1.1 × 10−17, indicating that there is substantial interaction among SNPs, i.e. “haplotype effect”. Therefore, even in this odd situation, RP found one haplotype perfectly (x1 = x2 = x3 = 1). It also found two of the three alleles defining the second high penetrance haplotype (x1 = x2 = x3 = 0).
To examine the “no marginal effect” condition in more detail, the haplotype classes need to be reordered so that the first i = 1, ..., k of the total of m haplotypes contain the SNP of interest. For example, suppose we are looking at the effect of allele ”0” at the first SNP in . Then i = 1, ..., 3. The marginal penetrance of allele ”0” is
The marginal penetrance of allele ”1” (or all remaining alleles) is
and the population prevalence is
The condition of ”no marginal effect” can be expressed as
Consider some specific situations.
- Obviously, the equality holds when all susceptibilities γi are the same.
- Next, assume very high haplotype diversity. In the extreme, pi = 1/m for all i. In this case (12) simplifies to
If markers are diallelic,
k =
m/2, then (
12) simplifies to
The ”pure interaction” penetrance considered above with the condition given in (
7) corresponds to this case.
- To examine the “orthogonal penetrance” case across various frequencies of haplotypes, we simulated the penetrance configuration similar to that given in with haplotype population frequencies following the Dirichlet(1) distribution. In each of the 50,000 simulations we uniformly sampled penetrances of high risk haplotypes (000, 111) between 0.2 and 0.9. Penetrances of low risk haplotypes were set equal to each other and sampled from the interval (0, 0.1) at each simulation. shows the simulation results. Haplotype and SNP risk distribution refer to the risk of high susceptibility haplotype or SNP relative to the population prevalence that is specific to a particular simulation run. We denote this quantity by RRP: “risk of a genetic variant relative to the population prevalence”. The overall distribution of the population prevalence, as well as the prevalence among the carriers of a susceptibility SNP, is given in the second row. The 95% quantile of the population prevalence is 0.326. However, the last graph (SNP penetrance) is shifted to the right relative to the prevalence distribution, and 25% of its distribution is above the 95% quantile for the population prevalence, indicating the presence of effects associated with SNPs.
- Many penetrance configurations, e.g. the case of a single highly penetrant haplotype, will induce marginal effects associated with SNPs. It is noteworthy to mention that even if the equality (12) holds for a particular SNP, the indices i = 1, ...k, i = k + 1, ...m would need to be rearranged each time a different SNP is considered, which is likely to result in the marginal effect at one of the SNPs. A situation we examined numerically is when there is one high frequency haplotype with the frequency p′ and the penetrance γ′, while other haplotypes are relatively rare. When all haplotypes except p′ are of the same frequency, the condition (12) becomes
where

and

. The following results are based on 150,000 simulations.
- The high susceptibility haplotype has the lowest frequency (). Mean SNP relative risk (RRP, as defined above) was 1.903. Mean haplotype RRP was 5.843. The prevalence 95% quantile was 0.089. The proportion of SNP penetrance distribution above that value was 0.627.
- The high susceptibility haplotype has an intermediate (random) but never the highest frequency (). Mean SNP relative risk was 2.076. Mean haplotype RRP was 5.269. The prevalence 95% quantile was 0.097. The proportion of SNP penetrance distribution above that value was 0.695.
- The high susceptibility haplotype has the highest frequency (). Mean SNP relative risk was 1.243. Mean haplotype RRP was 1.516. The prevalence 95% quantile was 0.348. the proportion of SNP penetrance distribution above that value was 0.170.
Examples and simulation results presented in this section suggest that even in situations where the penetrance configuration is in favor of no marginal SNP effect, there needs to be a very specific set of population frequencies of haplotypes in order for the marginal effect to be absent. We considered highly interacting models so that the effect of a high-penetrance joint set of predictors (e.g., “haplotype”) is expected to be larger than that of a SNP. Nevertheless, one would need to balance the potential increase in the effect size with the number of looks through potential models. In this light, an analyst using the RP algorithm makes a very reasonable “bet” that marginal effects are going to be present among the markers that form the best predictive combination. The multiple testing implication of this sequence of order k searches is the reduction in the false discovery rate.
An interesting observation from Figures (3–5) is that the difference in magnitude between the haplotypic and SNP effects decreases as the frequency of the high-penetrance haplotype increases. It follows that “common” susceptibility haplotypes can be mapped by looking at individual SNP associations. The relative risk discrepancy is higher for rare susceptibility haplotypes. This case deserves further consideration. Low frequency of high-risk haplotypes implies problems with frequency estimation in the case of an unobserved haplotype phase, as well as a lack of statistical power for analysis using haplotypes as predictors. In this case, the relatively higher frequency of SNPs may still provide better power despite the smaller effect size.