In this issue of the Journal, Campa et al. (1) empirically examine, for breast cancer, interactions between single-nucleotide polymorphisms (SNPs) that have been associated with disease incidence and certain risk factors. The authors analyzed data from 8576 breast cancer case subjects and 11892 control subjects from the Breast and Prostate Cancer Cohort Consortium (BPC3), more than 80% of whom are of European ancestry, for interactions between 17 SNPs that were “strongly and statistically significantly” associated with breast cancer risk in previous studies, and nine established breast cancer risk factors. They summarize that “this study does not support the hypothesis that known common breast cancer susceptibility loci strongly modify the association between established risk factors and cancer,” and they argue that “these findings are important given the size, prospective design, and comprehensive approach of the study.”
It is of interest to examine whether SNP genotype establishes a background that can affect a woman’s ability to tolerate, or to benefit from, another emergent characteristic or exposure in relation to breast cancer risk. For example, there could be practical implications if some women, defined by genotype, did not experience a breast cancer risk elevation in response to regular alcohol consumption or to the use of combined estrogen plus progestin postmenopausal hormone therapy—two of the risk factors considered in (1). Defining interaction as departure from a multiplicative odds ratio (OR) model, as the authors (1) do, seems natural for this purpose as a multiplicative model would imply that the odds ratio for the risk factor or exposure does not depend at all on SNP genotype.
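To make the definition of interaction concrete, the following minimal sketch computes the ratio of the joint odds ratio to the product of the two marginal odds ratios for a binary genotype and a binary exposure. A ratio of 1 corresponds to the multiplicative model. The function names and the case and control counts are hypothetical and chosen only for illustration; they are not taken from (1).

```python
def odds_ratio(cases, controls, ref_cases, ref_controls):
    """Odds ratio of one cell relative to a reference cell."""
    return (cases * ref_controls) / (controls * ref_cases)


def interaction_ratio(counts):
    """Ratio OR_GE / (OR_G * OR_E) for binary genotype g and exposure e.

    counts maps (g, e) -> (number of cases, number of controls).
    A value of 1.0 means no departure from the multiplicative model.
    """
    ref_cases, ref_controls = counts[(0, 0)]

    def or_vs_reference(cell):
        cases, controls = counts[cell]
        return odds_ratio(cases, controls, ref_cases, ref_controls)

    or_g = or_vs_reference((1, 0))   # genotype only
    or_e = or_vs_reference((0, 1))   # exposure only
    or_ge = or_vs_reference((1, 1))  # both
    return or_ge / (or_g * or_e)


# Hypothetical counts constructed so that 1.56 = 1.2 * 1.3 exactly.
example_counts = {
    (0, 0): (100, 100),
    (1, 0): (120, 100),
    (0, 1): (130, 100),
    (1, 1): (156, 100),
}
ratio = interaction_ratio(example_counts)
```

With these illustrative counts the ratio is 1 up to rounding, so the exposure odds ratio is the same in both genotype groups.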
At face value, the findings of Campa et al. (1) provide a strong endorsement for the usual focus on simple ratio models in epidemiological research. If the odds ratios for genotypic characteristics vary little with the values of other disease risk factors, then each can be studied in relative isolation, and odds ratios can be expected to be similar across populations, thereby simplifying the conduct and interpretation of epidemiological studies. Another recent study (2) reached a similar conclusion, based on 7610 breast cancer cases and 10196 control subjects from the Million Women Study (MWS). That study included 12 breast cancer–associated SNPs and 10 risk factors, with considerable overlap with the factors studied in the BPC3 (1).
To what extent should we consider these findings concerning interactions to be definitive, and what are the implications for interactions helping to explain disease risk associations among family members—the so-called missing heritability (3)? Also, are there implications for the likely importance of genotype in determining breast cancer risk more generally?
The power to identify a moderate departure from a multiplicative odds ratio model typically depends strongly on the strength of the association of each factor with disease risk. Campa et al. (1) describe the SNPs they studied as having odds ratios (high risk vs low risk based on previous reports) ranging from 1.15 to 1.45, with each association achieving an acceptable threshold for genome-wide statistical significance (P < 5 × 10⁻⁷). However, the per-allele odds ratios (higher vs lower risk) for these SNPs in the BPC3 consortium range only from 1.00 to 1.21, as shown in Supplementary Table 3 (available online) (1). Indeed, three of the 17 SNPs have nominal P values greater than .20. It is worth remembering that genome-wide statistical significance criteria are intended to ensure a multiple testing–corrected P value less than .05, and that the odds ratios in the genome-wide association studies in which these associations were identified may be substantially biased away from the null hypothesis of no association, particularly for SNPs that barely met the genome-wide P value criteria (4,5). Similarly, and remarkably, in the MWS study (2), only seven of the 12 SNPs studied had nominal P values of less than .05, with per-allele odds ratios ranging from 1.01 to 1.22. It follows that the genetic variants so far included in these large-scale empiric interaction studies have been mostly very weakly associated with disease risk. The investigators of MWS (2) acknowledge this issue and write that their study “lacked power to assess moderate gene-environment interactions for all but four or so SNPs most strongly related to breast cancer risk …” while noting that tens of thousands of breast cancer cases would be needed for a comprehensive evaluation of biologically plausible interactions.
The odds ratios are somewhat larger for the established risk factors considered in the BPC3, as shown in Supplementary Table 2 (available online) (1). Specifically, the estimated odds ratios between extreme risk factor categories are age at menarche (OR = 1.16), height (OR = 1.16), body mass index (OR = 0.89 for premenopausal; OR = 1.06 for postmenopausal), parity (OR = 1.23), age at menopause (OR = 1.20), family history of breast cancer (OR = 1.51), smoking status (OR = 1.14), alcohol consumption (OR = 1.31), and postmenopausal hormone therapy (OR = 1.22). However, these are rather modest associations also, from the perspective of interaction testing, and some associations seem smaller than one might anticipate from the collective epidemiological literature. Of the factors considered, age at menarche, height, body mass index, parity, and age at menopause themselves are complex phenotypes, which have both genetic and environmental risk factors. Diet and physical activity patterns over the life span may be key drivers of these phenotypes, and reliable assessment of these difficult-to-measure exposures would be needed to address the underlying gene and environment interactions. The fact that very little of breast cancer heritability is explained by findings from genome-wide association studies (6) suggests that there are strong genetic disease determinants yet to be identified, or that breast cancer familial aggregation results substantially from a shared environment among family members, or both.
The remaining risk factors considered by Campa et al. (1), such as smoking, alcohol, and hormone therapy, are more purely “environmental,” although, of course, genotype could influence decisions concerning these behaviors. One of these factors, combined postmenopausal hormone therapy, appears to be associated with breast cancer risk quite strongly (7) and illustrates some additional issues and challenges related to interaction testing.
First, estrogen-only (E-alone) and combined estrogen plus progestin (E + P) are evidently quite different in their associations with breast cancer risk. For E-alone use, it is not clear whether there is any association, with observational studies mostly suggesting a weak positive association with breast cancer, but with the Women’s Health Initiative (WHI) randomized trial (8,9) suggesting a reduced risk of breast cancer. The BPC3 analyses do not provide evidence of any association with ever use of E-alone (OR = 1.02, 95% confidence interval = 0.93 to 1.11), as shown in Supplementary Table 2 (available online) (1). For E + P use, the WHI trial yields a two- to threefold increased risk among women who adhere for 5 or more years (7), similar to observational studies (10). However, this elevation in risk evidently drops back to basal levels within 2–3 years following cessation of use (7). Hence, the odds ratio estimate of 1.48 for ever use of E + P, as shown in Supplementary Table 2 (available online) (1), is presumably much diluted by the inclusion of short-term and, especially, distant former use of hormone therapy. A comparison of long-term current E + P users with never users, along with SNP genotype, could have considerably greater power for detecting departure from a multiplicative model.
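The dilution argument can be illustrated with a small sketch. For a crude (unadjusted) odds ratio, pooling several exposure subgroups into one "ever use" category gives an odds ratio equal to the average of the subgroup odds ratios, weighted by each subgroup's share of exposed control subjects. The function name, the subgroup odds ratios, and the control fractions below are hypothetical illustrations, not estimates from the BPC3 or the WHI.

```python
def pooled_odds_ratio(subgroup_odds_ratios, control_fractions):
    """Crude odds ratio for a pooled exposure category vs never users.

    subgroup_odds_ratios: odds ratio of each exposed subgroup vs never users.
    control_fractions: share of exposed control subjects in each subgroup
        (any positive weights; they are normalized here).
    """
    if len(subgroup_odds_ratios) != len(control_fractions):
        raise ValueError("need one control fraction per subgroup")
    total = sum(control_fractions)
    weighted = sum(
        odds_ratio * fraction
        for odds_ratio, fraction in zip(subgroup_odds_ratios, control_fractions)
    )
    return weighted / total


# Hypothetical scenario: 20% of ever users are long-term current users with
# OR = 2.5; the remaining 80% (short-term or distant former users) have OR = 1.0.
ever_use_or = pooled_odds_ratio([2.5, 1.0], [0.2, 0.8])
```

In this hypothetical scenario a 2.5-fold elevation among long-term current users appears as an ever-use odds ratio of about 1.3, which shows how strongly the pooled category can understate the association in the subgroup of interest.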
A rather fundamental statistical feature of interaction testing is the use of a flexible model for the marginal associations for both the genetic variant and for the risk factor, so that evidence for interaction, or lack thereof, is not confounded by inadequate marginal association modeling. Even though per-allele odds ratios provide a convenient summary of marginal associations of SNPs with breast cancer risk, Campa et al. (1) do not flexibly model such associations as a function of the number of minor SNP alleles. Hence, interaction testing could be strengthened by including indicator variables for one and two minor SNP alleles in the marginal model for the SNP association, rather than simply modeling the number of SNP minor alleles.
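The two marginal models contrasted here can be written out as follows, in notation introduced only for this illustration: D denotes disease status, G the number of minor alleles (0, 1, or 2), and I(·) an indicator function.

```latex
\begin{align}
\text{per-allele:}\quad
  & \operatorname{logit} P(D = 1 \mid G) = \beta_0 + \beta_1 G, \\
\text{indicator:}\quad
  & \operatorname{logit} P(D = 1 \mid G)
    = \beta_0 + \gamma_1 I(G = 1) + \gamma_2 I(G = 2).
\end{align}
```

The per-allele model is the special case of the indicator model with \gamma_2 = 2\gamma_1, so the indicator coding costs one additional parameter but does not force the log odds ratio to be linear in the number of minor alleles.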
The actual interaction test statistic could arise from including product terms between the number of minor SNP alleles and indicator variables for corresponding risk factor categories as in (1) or from products of indicator variables for categories of each. The former may often be more efficient statistically, but may not be sensitive to “environmental” effects that are localized in a specific SNP genotype. For example, Prentice et al. (11) provided preliminary evidence of a more favorable breast cancer risk pattern for both E + P and E-alone among the approximately 14% of women who had the homozygous minor allele (TT) for SNP rs3750817 in intron 2 of the fibroblast growth factor receptor 2 (FGFR2) gene. Campa et al. (1) find a nominal P value of .05 for interaction of ever use of hormone therapy with FGFR2-rs3750817, with a one df test, as shown in Supplementary Table 5 (available online). Our two df test in the analysis of data from the WHI clinical trials yielded nominal P values of .03 and .05 for E + P and E-alone, respectively (11). Instead, if we had used a one df test, the corresponding P values would have been similar for E + P (P = .02), but quite different for E-alone (P = .37). For various reasons, the BPC3 analyses (1) do not seem to definitively test interaction hypotheses concerning FGFR2-rs3750817.
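The difference between the one df and two df codings can be seen in the product terms each adds to the regression model. The following is a minimal sketch for a binary exposure, with a hypothetical function name; it shows only the coding of the interaction columns and is not the analysis code used in (1) or (11).

```python
def interaction_columns(minor_alleles, exposed):
    """Interaction columns for one subject under two codings.

    minor_alleles: number of minor SNP alleles (0, 1, or 2).
    exposed: 1 if the subject is in the exposure category, else 0.

    Returns (one_df, two_df):
      one_df: [allele count x exposure], a single trend product term.
      two_df: [I(1 allele) x exposure, I(2 alleles) x exposure].
    """
    if minor_alleles not in (0, 1, 2):
        raise ValueError("minor_alleles must be 0, 1, or 2")
    if exposed not in (0, 1):
        raise ValueError("exposed must be 0 or 1")
    one_df = [minor_alleles * exposed]
    two_df = [
        int(minor_alleles == 1) * exposed,
        int(minor_alleles == 2) * exposed,
    ]
    return one_df, two_df


# An exposed homozygous minor allele carrier (e.g., TT).
tt_exposed = interaction_columns(2, 1)
# An exposed heterozygous carrier.
het_exposed = interaction_columns(1, 1)
```

Under the one df coding, the exposure log odds ratio is constrained to change by the same amount with each additional minor allele. The two df coding gives the homozygous minor genotype its own coefficient, so an effect confined to that genotype can be captured without requiring a corresponding half-sized effect in heterozygous women.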
In summary, investigations of the joint association of genotype and environmental exposures with breast cancer risk are at an early stage. Measurement challenges for underlying key exposures (eg, diet and physical activity) present important barriers to interaction identification, as well as to assessment of the marginal environmental factor associations with disease risk. Analyses to date do not go far toward elucidating the overall importance of genetic variants in breast cancer risk determination. Also, although interactions may sometimes occur with weak or nonexistent marginal associations, it seems that future interaction testing may wisely limit multiple testing adjustments by focusing on genetic variants having a more substantial disease association than is the case for many of the genetic variants so far considered.