We studied type I error rate and power for tests of genotype-exposure interaction for several study designs: our proposed SACO design with 1 unaffected sibling per case, the case-parent design (5), the tetrad design (3), the case-sibling design analyzed by enforcing within-sibship genotype-exposure independence (4), the population-based case-control design, and the original case-only design (1). We applied conditional logistic regression, using pseudosiblings as necessary, for the first 4 designs and unconditional logistic regression for the last 2 (details in Web Appendix). Restricting attention to dichotomous exposures, we considered scenarios in which the marker under study was related to risk only through linkage disequilibrium with an unmeasured causative SNP, assuming that linkage was strong enough that the recombination rate between the 2 loci in a single meiosis was effectively 0. Thus, we studied 2-locus haplotypes comprising SNPs at a causative locus and a marker locus.
To simulate exposure-related genetic population stratification, we constructed populations consisting of 2 equal-sized subpopulations, each mating only within itself. Each subpopulation had the same 4 two-SNP haplotypes in Hardy-Weinberg equilibrium but with frequencies that differed between subpopulations. (Hardy-Weinberg equilibrium, though convenient here, is not needed for validity.) For each subpopulation, we specified the following parameters: baseline disease risk, exposure prevalence, the marginal SNP frequencies at the causative locus and at the marker locus, and the correlation between them—quantities that determine the frequencies of the 4 haplotypes. To simulate homogeneous populations, we assigned the 2 subpopulations the same parameters. We based all our calculations on 1,000 families (plus 1,000 unrelated population controls for case-control studies). For family-based methods, the number of informative families was random and smaller than 1,000.
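The mapping from marginal allele frequencies and between-locus correlation to the 4 haplotype frequencies can be sketched as follows. This is a minimal illustration that assumes the correlation is the usual allelic correlation r between the variant alleles on a haplotype, so that the linkage disequilibrium coefficient is D = r·sqrt(pq(1−p)(1−q)); the function name and the haplotype keys are hypothetical and are not taken from the original analysis.

```python
from math import sqrt

def haplotype_frequencies(p_causal, p_marker, r):
    """Return the frequencies of the 4 two-SNP haplotypes.

    p_causal, p_marker: variant allele frequencies at the two loci.
    r: assumed allelic correlation between the variant alleles.
    Keys are (causal_allele, marker_allele), with 1 = variant allele.
    """
    # Linkage disequilibrium coefficient implied by the correlation.
    d = r * sqrt(p_causal * (1 - p_causal) * p_marker * (1 - p_marker))
    freqs = {
        (1, 1): p_causal * p_marker + d,
        (1, 0): p_causal * (1 - p_marker) - d,
        (0, 1): (1 - p_causal) * p_marker - d,
        (0, 0): (1 - p_causal) * (1 - p_marker) + d,
    }
    if min(freqs.values()) < 0:
        raise ValueError("correlation not attainable for these allele frequencies")
    return freqs

# Example: both allele frequencies 0.3 and correlation 0.5.
freqs = haplotype_frequencies(0.3, 0.3, 0.5)
```

Giving both subpopulations the same arguments corresponds to the homogeneous-population case; giving them different correlations or allele frequencies yields subpopulation-specific haplotype frequencies.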
The vector (R1, R2, I1, I2, Re) represents the relative risk parameters associated with the causative variant in each subpopulation. R1 and R2 are the relative risks for an unexposed person carrying 1 or 2 copies of the causal variant allele, respectively, relative to an unexposed person carrying no copies; I1 and I2 are the interaction parameters, that is, the respective ratios of the 2 genotype relative risks in the exposed over the unexposed; and Re is the relative risk associated with the exposure in noncarriers.
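To make the parameterization concrete, the sketch below computes the risk of a person with a given genotype and exposure relative to an unexposed noncarrier, following the definitions of (R1, R2, I1, I2, Re) given above. The function name and argument layout are hypothetical; multiplying the result by a subpopulation's baseline risk would give the absolute risk under this multiplicative reading of the parameters.

```python
def relative_risk(copies, exposed, R1, R2, I1, I2, Re):
    """Risk relative to an unexposed person with no copies of the causal allele.

    copies: number of causal variant alleles carried (0, 1, or 2).
    exposed: True if the person is exposed.
    """
    genotype_rr = (1.0, R1, R2)[copies]   # genotype effect in the unexposed
    interaction = (1.0, I1, I2)[copies]   # ratio of genotype RRs, exposed over unexposed
    if not exposed:
        return genotype_rr
    # Exposure effect in noncarriers, times genotype effect, times interaction.
    return Re * genotype_rr * interaction

params = dict(R1=2, R2=3, I1=1.5, I2=2.25, Re=1.5)
rr_exposed_two_copies = relative_risk(2, True, **params)   # 1.5 * 3 * 2.25
rr_exposed_noncarrier = relative_risk(0, True, **params)   # Re alone
```

Setting I1 = I2 = 1 removes the interaction, so the genotype relative risks are the same in exposed and unexposed people.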
We calculated expected cell counts for the multinomial joint distribution of 2-SNP haplotypes and exposure in cases and unaffected siblings given the parental diplotypes. With 4 haplotypes, there are 10 distinct diplotypes and 100 pairs of ordered parental diplotypes, each with 4 possible (not necessarily distinct) offspring diplotypes or 16 possible case-sibling diplotype pairs, in which each sibling pair could have experienced 4 different exposure assignments. We imposed a rare disease assumption and assumed that siblings’ exposures were independent. Given (R1, R2, I1, I2, Re), the subpopulation parameters, and the number of cases, we calculated expected cell counts for the 100 × 16 × 4 cells and summed across appropriate sets to calculate expected counts for each design studied.
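The cell structure described above can be enumerated directly. The following sketch only reproduces the counting (10 diplotypes, 100 ordered parental pairs, and 16 × 4 cells per parental pair); it does not compute expected counts, and the haplotype labels and function names are hypothetical.

```python
from itertools import combinations_with_replacement, product

# Hypothetical labels: first letter = causative locus, second = marker locus.
HAPLOTYPES = ("AB", "Ab", "aB", "ab")

# Unordered pairs of haplotypes: the distinct diplotypes.
DIPLOTYPES = list(combinations_with_replacement(HAPLOTYPES, 2))

# Ordered (mother, father) pairs of parental diplotypes.
PARENT_PAIRS = list(product(DIPLOTYPES, repeat=2))

def offspring_diplotypes(mother, father):
    """Four equally likely offspring diplotypes, assuming no recombination."""
    return [tuple(sorted((m, f))) for m in mother for f in father]

def cells_for_parents(mother, father):
    """All (case, sibling, case_exposure, sibling_exposure) cells for one parental pair."""
    kids = offspring_diplotypes(mother, father)
    sib_pairs = product(kids, repeat=2)           # 16 ordered case-sibling pairs
    exposures = list(product((0, 1), repeat=2))   # 4 exposure assignments
    return [(case, sib, e_case, e_sib)
            for case, sib in sib_pairs
            for e_case, e_sib in exposures]

n_cells = sum(len(cells_for_parents(m, f)) for m, f in PARENT_PAIRS)
print(len(DIPLOTYPES), len(PARENT_PAIRS), n_cells)  # prints "10 100 6400"
```

Each cell would then receive an expected count from the parental diplotype frequencies, Mendelian transmission, the exposure prevalence, and the risk model, before summing over the cells relevant to each design.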
We computed chi-squared noncentrality parameters (NCPs) by calculating log-likelihood-ratio test statistics from pseudodata that comprised expected cell counts from the multinomial distribution described above (5). We transformed values of the NCPs into power values for 2-sided tests with nominal type I error rate α = 0.05 using the cumulative distribution function for a noncentral χ². Multiplying our NCP values by K/1,000 transforms them to NCPs for K cases instead of 1,000.
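The conversion from an NCP to power, and the rescaling to a different number of cases, can be sketched as below. This illustration assumes a test with 1 degree of freedom, in which case the noncentral χ² tail probability reduces to a sum of two standard normal probabilities and only the Python standard library is needed. The function names and the example NCP value are hypothetical and are not results from the study.

```python
from statistics import NormalDist

def power_from_ncp(ncp, alpha=0.05):
    """Power of a 1-df chi-squared test with noncentrality parameter ncp.

    Assumes 1 degree of freedom: the statistic is (Z + sqrt(ncp))**2 with Z
    standard normal, so the rejection probability is a 2-sided normal tail.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # square root of the 1-df chi-squared critical value
    shift = ncp ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

def rescale_ncp(ncp_1000, n_cases):
    """Convert an NCP computed for 1,000 cases to one for n_cases cases."""
    return ncp_1000 * n_cases / 1000

# Hypothetical NCP of 7.85 for 1,000 cases, rescaled to 500 cases.
power_1000 = power_from_ncp(7.85)
power_500 = power_from_ncp(rescale_ncp(7.85, 500))
```

With an NCP of 0 the function returns the nominal α, which is a quick check that the critical value is handled correctly.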
To investigate type I error rates for a marker under exposure-related population stratification, we considered a range of no-interaction null scenarios. The risk parameters in relation to the causative variant in each subpopulation were set at (R1, R2, I1, I2, Re) = (2, 3, 1, 1, 1.5). In 1 set of scenarios, subpopulations had the same baseline risks but different exposure prevalences (0.08 and 0.32). In each subpopulation, the frequencies of the causative SNP and the marker were both 0.3. We examined no-interaction scenarios across varying degrees of exposure-related population stratification as the correlation between the causative locus and the marker ranged from −0.1 to 0.7 in one subpopulation and from 0.7 to −0.1 in the other. When the subpopulations have the same correlation, the population has no exposure-related genetic population stratification; as the 2 subpopulation-specific correlations diverge, the exposure-related stratification in the population grows. In another set of scenarios with the same allele frequencies as before, we fixed the correlations between the causative variant and marker at 0.5 and 0.1 in the 2 subpopulations and generated varying degrees of exposure-related population stratification by letting the exposure prevalence range from 0.04 to 0.36 in one subpopulation and from 0.36 to 0.04 in the other.
We also assessed the power of tests of genotype-exposure interaction in a homogeneous population assuming that the causative SNP was typed. Risk parameters were set at (R1, R2, I1, I2, Re) = (2, 3, 1.5, 2.25, 1.5) or (1, 1, 1.5, 2.25, 1.5). We used log-additive genetic coding for interaction with a saturated coding for genetic main effects (i.e., fitting 2 relative risk parameters) to ensure validity of the interaction test.