Genome-wide association studies have identified variants on chromosome 15q25.1 that increase the risks of both lung cancer and nicotine dependence and associated smoking behavior. However, there remains debate as to whether the association with lung cancer is direct or is mediated by pathways related to smoking behavior. Here, the authors apply a novel method for mediation analysis, allowing for gene-environment interaction, to a lung cancer case-control study (1992–2004) conducted at Massachusetts General Hospital using 2 single nucleotide polymorphisms, rs8034191 and rs1051730, on 15q25.1. The results are validated using data from 3 other lung cancer studies. Tests for additive interaction (P = 2 × 10^−10 and P = 1 × 10^−9) and multiplicative interaction (P = 0.01 and P = 0.01) were significant. Pooled analyses yielded a direct-effect odds ratio of 1.26 (95% confidence interval (CI): 1.19, 1.33; P = 2 × 10^−15) for rs8034191 and an indirect-effect odds ratio of 1.01 (95% CI: 1.00, 1.01; P = 0.09); the proportion of increased risk mediated by smoking was 3.2%. For rs1051730, direct- and indirect-effect odds ratios were 1.26 (95% CI: 1.19, 1.33; P = 1 × 10^−15) and 1.00 (95% CI: 0.99, 1.01; P = 0.22), respectively, with a proportion mediated of 2.3%. Adjustment for up to 75% measurement error in smoking behavior increased the proportions mediated to 12.5% and 9.2%, respectively. These analyses indicate that the association of the variants with lung cancer operates primarily through other pathways.
Three genome-wide association studies (1–3) have found associations between genetic variants on chromosome 15q25.1 and lung cancer. These variants are known to be associated with smoking behavior as well (3–9), raising the question of whether the association of the variants with lung cancer operates primarily through smoking or through other pathways (1–5, 10–14). In addition to possible effects of genetic variants on 15q25.1 on lung cancer risk either through smoking or independent of smoking, Thorgeirsson et al. (3, 13) noted a third possible explanation for the associations: that the variant may increase individuals’ vulnerability to the harmful effect of tobacco smoke, a form of gene-environment interaction. Prior studies attempting to discriminate between these possibilities have been limited by lack of adequate methods to accommodate interaction (5, 12, 14) in assessing direct and indirect effects and inadequate handling of case-control data (14). Traditional mediation methods do not allow for interaction between the effects of the exposure (the genetic variant) and the effects of the mediator (smoking), though such interaction would be present if the variant increased vulnerability to the effect of smoking (3, 13).
The method (15) we demonstrate here overcomes these limitations and is applied to a case-control lung cancer study of 1,836 cases and 1,452 controls (16) conducted at Massachusetts General Hospital (MGH). Allowing for such gene-environment interaction in estimating direct and indirect effects may be important, since prior literature has noted the possibility that the variant increases vulnerability to the effects of smoking on lung cancer (13), and there is evidence that carriers of the variant allele extract more nicotine and toxins from each cigarette (17). The same method was also applied to 3 other genome-wide case-control studies of lung cancer (1, 2) to replicate the results.
We drew 1,836 cases and 1,452 controls from a case-control study assessing the molecular epidemiology of lung cancer, which was conducted at MGH from 1992 to 2004 and is described in detail elsewhere (16). Briefly, eligible cases included any person over the age of 18 years with a diagnosis of primary lung cancer that was further confirmed by an MGH lung pathologist. The controls were recruited from the friends or spouses of cancer patients or the friends or spouses of other surgery patients in the same hospital. Potential controls that carried a previous diagnosis of any cancer (other than nonmelanoma skin cancer) were excluded from participation. Interviewer-administered questionnaires collected information on sociodemographic variables, including age (years; continuous), sex, educational history (college degree or more; yes/no), smoking intensity (cigarettes/day), and duration of smoking (years), from each subject. The study was reviewed and approved by the institutional review boards of MGH and the Harvard School of Public Health.
To confirm the findings, we replicated the analyses in 3 additional case-control studies of lung cancer: a study conducted at the University of Texas M. D. Anderson Cancer Center (2) with 2,827 cases and 2,345 controls (1995–2006); a central European study conducted by the International Agency for Research on Cancer (1) with 1,871 cases and 2,472 controls (1998–2002); and a study conducted in Toronto, Ontario, Canada (1), with 333 cases and 501 controls (1997–2002). All of the analyses were limited to Caucasians.
We selected the 2 single nucleotide polymorphisms (SNPs) rs8034191 and rs1051730, based on published reports that have shown them to have the most consistent statistically significant associations with smoking behavior. The association region of 15q25.1 is one of high linkage disequilibrium, and the 2 SNPs are highly correlated with other significant SNPs in the area and fairly representative of the region. In the MGH study, peripheral blood samples were obtained from all study participants at the time of enrollment. DNA was extracted from peripheral blood samples using the Puregene DNA Isolation Kit (Gentra Systems, Minneapolis, Minnesota). The polymorphisms in the MGH study were genotyped using a 5′-nuclease assay (TaqMan) and the ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster City, California). Genotyping was performed by laboratory personnel blinded to clinical variables and case-control status, and analysis of a randomly selected 5% of the samples was repeated to validate genotyping procedures. Blinded genotyping results were independently reviewed by 2 of the authors. To check for genotyping error, departures from Hardy-Weinberg equilibrium in controls were examined.
Genotype data from the M. D. Anderson study were obtained with Illumina HumanHap 300 BeadChips for 1,154 cases and 1,137 controls at the Johns Hopkins Center for Inherited Disease Research (Baltimore, Maryland), and the remaining cases and controls were genotyped using TaqMan. The International Agency for Research on Cancer’s central Europe study (1) was also genotyped with the HumanHap300 BeadChip using the Illumina Infinium platform, and genotyping was conducted at the Centre National Genotypage (Paris, France). Genotype data for the Toronto study were obtained by genotyping cases and controls with the HumanHap300 BeadChip with the Illumina Infinium platform at McGill University and the Genome Quebec Innovation Centre (Montreal, Canada). Further details on genotyping in these studies can be found elsewhere (1, 2).
Number of cigarettes smoked per day was used as a measure of smoking intensity and has been shown to be a good marker for nicotine dependence (18–20). Linear regression was used for models of smoking intensity, measured as the square root of cigarettes per day so as to better approximate a linear fit. Analyses using total cigarettes per day gave similar results. Logistic regression was used to model lung cancer status both with and without a smoking × variant interaction term. A multiplicative model was used for the number of risk alleles throughout. Covariates included in the models were genotype, age, sex, college education, and smoking duration; models for lung cancer also included smoking intensity (square root of cigarettes per day). Analyses which omitted smoking duration as a covariate gave qualitatively similar conclusions.
The linear regression for smoking intensity was weighted; cases were weighted by the prevalence of lung cancer divided by the proportion of cases, and controls were weighted by 1 minus the prevalence divided by the proportion of controls in the study (15, 21). Weights were further adjusted by sampling fractions in studies (M. D. Anderson and Toronto) in which sampling fractions varied by smoking status (22). The weighting takes into account the fact that in the case-control study design, cases were selected by lung cancer status, not by smoking (15, 21). The weighted regression corresponds to the associations that would be observed in a cohort study of the same population. Robust standard errors were used to account for weighting and possible nonnormality. When sampling fractions of cases and controls varied by smoking status (M. D. Anderson and Toronto)—for example, oversampling of controls to match the cases according to smoking behavior—an offset term was used in the logistic regression to account for sampling design (22).
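The weighting scheme described above can be illustrated with a short sketch. The function name and the prevalence value are hypothetical and chosen only for illustration; the case and control counts are those of the MGH study. The further adjustment for sampling fractions that vary by smoking status is not shown.

```python
def case_control_weights(n_cases, n_controls, prevalence):
    """Return (case_weight, control_weight) for the weighted mediator regression.

    Cases are weighted by prevalence / (proportion of cases in the sample);
    controls by (1 - prevalence) / (proportion of controls in the sample).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    n_total = n_cases + n_controls
    p_cases = n_cases / n_total
    p_controls = n_controls / n_total
    return prevalence / p_cases, (1.0 - prevalence) / p_controls


# MGH sample sizes from the text; the 0.1% prevalence is an assumed,
# purely illustrative value, not one reported in the study.
w_case, w_control = case_control_weights(1836, 1452, prevalence=0.001)
```

With these weights, the weighted share of cases in the sample equals the assumed prevalence, which is why the weighted regression mimics the associations a cohort study of the same population would show.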
The regression for smoking intensity and the regression for lung cancer risk were combined to obtain direct and indirect effects using odds ratios for mediation analysis for a dichotomous outcome (15). The direct effect can be interpreted as the odds ratio for lung cancer among persons with the genetic variant versus those without the variant if smoking behavior were what it would have been without the variant. The indirect effect can be interpreted as the odds ratio for lung cancer for those with the genetic variant present comparing the risk if smoking behavior were what it would have been with versus without the genetic variant. Direct and indirect effects are averaged over all individuals (smokers and nonsmokers) and are also evaluated at the mean population level of the covariates. The proportion mediated is reported on the risk difference scale (15) and is obtained by ORd × (ORi − 1)/(ORd × ORi − 1), where ORd is the direct-effect odds ratio and ORi is the indirect-effect odds ratio.
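How the two regressions might be combined can be sketched as follows. The proportion-mediated formula is the one given in the text. The closed-form expressions for the direct and indirect effects are an assumption: they are the forms commonly used for a logistic outcome model with an exposure-mediator interaction and a linear mediator model when the outcome is rare, and the text does not state them. All function names, parameter names, and coefficient values are hypothetical.

```python
import math


def natural_effect_odds_ratios(theta1, theta2, theta3, beta0, beta1,
                               beta_c_dot_c, sigma2, a=1.0, a_star=0.0):
    """Direct- and indirect-effect odds ratios for a change a_star -> a.

    Assumed outcome model:  logit P(Y=1) = th0 + th1*A + th2*M + th3*A*M + th4'C
    Assumed mediator model: E[M] = b0 + b1*A + b2'C, residual variance sigma2
    beta_c_dot_c is the covariate contribution b2'C at the chosen covariate
    values (for example, the population means).
    """
    log_or_direct = (
        (theta1 + theta3 * (beta0 + beta1 * a_star + beta_c_dot_c
                            + theta2 * sigma2)) * (a - a_star)
        + 0.5 * theta3 ** 2 * sigma2 * (a ** 2 - a_star ** 2)
    )
    log_or_indirect = (theta2 * beta1 + theta3 * beta1 * a) * (a - a_star)
    return math.exp(log_or_direct), math.exp(log_or_indirect)


def proportion_mediated(or_direct, or_indirect):
    """ORd * (ORi - 1) / (ORd * ORi - 1), as given in the text."""
    return or_direct * (or_indirect - 1.0) / (or_direct * or_indirect - 1.0)


# Hypothetical coefficients, not estimates from any of the studies.
or_d, or_i = natural_effect_odds_ratios(
    theta1=0.20, theta2=0.50, theta3=0.0,
    beta0=3.0, beta1=0.02, beta_c_dot_c=0.0, sigma2=4.0,
)
pm = proportion_mediated(or_d, or_i)
```

Plugging the rounded odds ratios printed in the text into `proportion_mediated` is unlikely to reproduce the reported percentages exactly, presumably because the proportion is sensitive to rounding when the indirect-effect odds ratio is this close to 1.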
Analyses assume that conditional on the covariates, there is no confounding of 1) the exposure-outcome relation, 2) the mediator-outcome relation, or 3) the exposure-mediator relation and that 4) there is no effect of the exposure that itself confounds the mediator-outcome relation (15). No confounding of the effect of the exposure on the mediator and on the outcome (assumptions 1 and 3), when the exposure is a genetic variant with analysis restricted to a single ethnic group, is likely to hold approximately and is generally assumed in genetic studies. The robustness of results to the confounding assumptions can be examined through sensitivity analysis techniques (23).
P values for interaction on the additive scale are reported using the relative excess risk due to interaction (24–26); P values for multiplicative interaction were obtained by means of a Wald test of the interaction coefficient in the logistic regression. Measures correspond to a 1-allele change in the genetic variants and to a 1-unit change in the cigarette measure. The measure of additive interaction assesses the extent to which the odds ratio for a 1-unit increase in both exposures exceeds the sum of the odds ratios for a 1-unit increase in each exposure considered separately. The measure of multiplicative interaction assesses the log of the ratio of the odds ratios for a 1-unit increase in both exposures relative to the product of the odds ratios for a 1-unit increase in each exposure considered separately. Analyses for measurement error are conducted using bias analysis (27). Results from the 4 studies were combined on the log-odds scale using sample-size-based meta-analysis. Analyses were implemented with SAS 9.2 (SAS Institute, Inc., Cary, North Carolina) and R 2.4 (R Foundation for Statistical Computing, Vienna, Austria).
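The two interaction measures can be written in terms of the logistic regression coefficients, as in the minimal sketch below. It assumes a model with a genotype term, a smoking term, and their product; the function name and the coefficient values are hypothetical. The additive measure is computed as the relative excess risk due to interaction in its usual odds ratio form, OR11 − OR10 − OR01 + 1.

```python
import math


def interaction_measures(b_gene, b_smoke, b_interaction):
    """Additive (RERI) and multiplicative interaction from logistic coefficients.

    Odds ratios are relative to the reference group (no 1-unit increase in
    either exposure): OR10 = gene only, OR01 = smoking only, OR11 = both.
    """
    or10 = math.exp(b_gene)
    or01 = math.exp(b_smoke)
    or11 = math.exp(b_gene + b_smoke + b_interaction)
    reri = or11 - or10 - or01 + 1.0            # additive scale
    log_ratio = math.log(or11 / (or10 * or01))  # multiplicative scale
    return reri, log_ratio


# Hypothetical coefficients, not estimates from any of the studies.
reri, log_ratio = interaction_measures(0.10, 0.30, 0.05)
```

The multiplicative measure reduces to the interaction coefficient itself, which is why a Wald test of that coefficient suffices. The additive measure can be positive even when the interaction coefficient is 0, so the two scales can give different answers.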
Table 1 summarizes the demographic characteristics of the 4 studies used in the analysis. Models for lung cancer and for smoking intensity (cigarettes per day) can be combined to calculate indirect effects mediated by smoking and direct effects through other pathways (15). Table 2 shows direct and indirect effects for rs8034191, along with tests for gene-by-smoking interaction from the 4 studies and a pooled meta-analysis (full details are available in Web Table 1 (http://aje.oxfordjournals.org/)). Analyses from the MGH study indicated strong evidence for a direct effect and suggested that the indirect effect is small. Ignoring possible gene-environment interaction gave a direct-effect odds ratio of 1.35 (95% confidence interval (CI): 1.21, 1.52; P = 3 × 10^−7) and an indirect-effect odds ratio of 1.01 (95% CI: 0.99, 1.02; P = 0.15) per rs8034191 C allele, with 3.6% of the increased risk being mediated by smoking. Tests for interaction were significant on the additive risk scale (P = 1 × 10^−3), indicating that the effect of smoking varied by genotype, with weaker evidence on the multiplicative scale (P = 0.17). Allowing for smoking-by-gene interaction gives, for changes from 0 to 1 C allele and 0 to 2 C alleles, respectively, direct-effect odds ratios of 1.31 (95% CI: 1.15, 1.49; P = 3 × 10^−5) and 1.72 (95% CI: 1.34, 2.21; P = 2 × 10^−5) and indirect-effect odds ratios of 1.01 (95% CI: 0.99, 1.02; P = 0.15) and 1.03 (95% CI: 0.99, 1.07; P = 0.16), with 6.3% of the increased risk being mediated by smoking intensity in the latter scenario. The confidence interval for the indirect effect is relatively narrow, and the mediated effect is a relatively small portion of the total effect.
Table 3 shows results of similar analyses for rs1051730 (full details are available in Web Table 2). In the MGH study, ignoring possible interaction gives a direct-effect odds ratio of 1.32 (95% CI: 1.18, 1.48; P = 2 × 10^−6) and an indirect-effect odds ratio of 1.01 (95% CI: 0.99, 1.02; P = 0.19) per A allele, with a proportion mediated of 3.5%. Tests for interaction between the effects of smoking and rs1051730 alleles indicated additive interaction (P = 0.004), with weaker evidence for multiplicative interaction (P = 0.26). Allowing for smoking-by-gene interaction gives direct-effect odds ratios of 1.29 (95% CI: 1.14, 1.46; P = 8 × 10^−5) and 1.66 (95% CI: 1.30, 2.12; P = 1 × 10^−8) and indirect-effect odds ratios of 1.01 (95% CI: 0.99, 1.02; P = 0.22) and 1.02 (95% CI: 0.99, 1.06; P = 0.20) for changes from 0 to 1 A allele and 0 to 2 A alleles, respectively, with 5.7% of the increased risk being mediated by smoking in the latter scenario.
As shown in Tables 2 and 3, all 3 replication studies exhibited patterns similar to those of the MGH study, and overall replication P values for the 3 studies again indicated significant direct effects and small indirect effects. For both rs8034191 and rs1051730, there was moderately strong evidence in the replication for interaction on both additive (P = 6 × 10^−8; P = 1 × 10^−7) and multiplicative (P = 0.03; P = 0.03) scales. Likewise, pooled estimates from all 4 studies, reported in the Abstract, indicated large, highly significant direct effects (P = 2 × 10^−15; P = 1 × 10^−15), indirect-effect odds ratios close to 1 (P = 0.09; P = 0.22), and interaction on both additive (P = 2 × 10^−10; P = 1 × 10^−9) and multiplicative (P = 0.01; P = 0.01) scales.
We conducted further analyses using all 4 studies to allow for the possibility that the cigarettes/day measure was recorded with error and that the measure does not capture all of the relevant smoking behavior. Assuming that the cigarettes/day measure explains only 50% of the variability in the biologically relevant measure gives corrected direct- and indirect-effects odds ratios of 1.24 and 1.01, respectively, for rs8034191, with a proportion mediated of 6.3% and odds ratios of 1.25 and 1.01, respectively, for rs1051730, with a proportion mediated of 4.7%. Assuming that cigarettes/day explains only 25% of the variability in the biologically relevant measure (i.e., 75% measurement error) gives odds ratios of 1.23 and 1.03 for rs8034191 with a proportion mediated of 12.5% and odds ratios of 1.24 and 1.02 for rs1051730 with a proportion mediated of 9.2%. Measurement error may have attenuated estimates of the proportion mediated, but even in the more extreme scenario the majority of the effect appears to be direct.
The analyses here indicate that the associations of these genetic variants with lung cancer operate principally through pathways other than changing smoking intensity. Although this may initially appear surprising, further evidence supports these results. In the studies conducted here and as verified in larger studies (6–9), the effect size of the variants on smoking is only approximately 1 cigarette per day, which may be of limited biologic relevance for lung cancer.
Our conclusion is further supported by the fact that recent studies have found no association between the variants and lung cancer among nonsmokers (28, 29). These studies, in conjunction with the results presented here, suggest that although the association is not principally mediated by changing smoking behavior, it occurs in the presence of smoking. The strong empirical evidence here of variant-by-smoking interaction, on both additive and multiplicative scales, provides further support for this. The direct association of the variants with lung cancer may operate only for smokers, even though the variants do not substantially increase smoking intensity itself.
The interpretation of direct-effect estimates is complicated in the presence of a smoking-by-variant interaction: The direct effect may vary by smoking status. There appears to be a direct association for smokers, but perhaps not for nonsmokers. The natural direct effects estimated in our analyses essentially average over the direct effects for smokers and nonsmokers. Importantly, however, the indirect effects were small in all of our analyses. The associations of the variants with lung cancer do not operate primarily through changing the number of cigarettes smoked per day.
Certain biologic hypotheses are consistent with the statistical evidence. As noted above, it may be that the variant serves to increase the amount of nicotine and toxins extracted from each cigarette (17); such an effect would only be observed for smokers. Such an effect would also be observed even if the variant did not operate primarily by changing the number of cigarettes smoked per day. Smoking (or nicotine) has an effect on the regulation or is involved in some downstream action (e.g., expression) of the genes for which these SNPs are markers (30). In addition, nicotine and the tobacco derivatives N′-nitrosonornicotine and 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone have strong affinity for the nicotinic acetylcholine receptors that are also known to be present in lung tissue (31). Nicotine has been shown to be involved in lung carcinogenesis through activation of the nicotinic acetylcholine receptors in nonneuronal cells (32). If the variant or any functional SNPs for which the variants are markers served to activate nicotinic receptors (33), such an effect would again only be observed for smokers.
The result that most of the association operates through pathways other than smoking seems reasonably robust to measurement error. It is possible that our estimates are inaccurate due to unmeasured confounders that affect both smoking and lung cancer. However, such factors (e.g., low socioeconomic status) would probably affect smoking and lung cancer in the same direction, and sensitivity analysis (23) indicates that this would likely lead to underestimation of the direct effect and overestimation of indirect effects. This would further support our conclusion that the vast majority of the association is direct, and so we have not explored such sensitivity analyses further. The method we employed for mediation (15) allows for potential variant-by-smoking interaction, as had been previously suggested (3, 13), and correctly handles case-control data. These are advantages not shared by other mediation analysis approaches. Our conclusions appear to be on fairly solid ground.
Our results here are also of historical interest. Over 50 years ago, Fisher (34) suggested that there might be a genetic variant responsible for both smoking behavior and lung cancer. He proposed that this common genetic cause might explain the association between smoking and lung cancer and thus that smoking may not itself in fact have a causal effect on lung cancer. Our results here show that, in some respects, Fisher was at least slightly correct. In previous studies, the variants on chromosome 15q25.1 have been shown to affect smoking behavior (3–9); here we have provided fairly conclusive evidence that these variants also affect lung cancer through pathways other than by increasing smoking behavior. Thus, there is indeed a common genetic cause of smoking and lung cancer. Fisher was partially correct, but only partially. As Cornfield et al. (35) clearly demonstrated, in response to Fisher, using sensitivity analysis for a hypothetical genetic variant, the effect sizes of the variants on smoking and on lung cancer here are much too small to try to explain away the causal effect of smoking itself on lung cancer.
Author affiliations: Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts (Tyler J. VanderWeele, Eric J. Tchetgen Tchetgen, David C. Christiani); Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts (Tyler J. VanderWeele, Eric J. Tchetgen Tchetgen, Xihong Lin); Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts (Kofi Asomaning, David C. Christiani); Department of Epidemiology, University of Texas M. D. Anderson Cancer Center, Houston, Texas (Younghun Han, Margaret R. Spitz, Sanjay Shete, Xifeng Wu, Christopher I. Amos); International Agency for Research on Cancer, Lyon, France (Valerie Gaborieau, Paul Brennan); Samuel Lunenfeld Research Institute of Mount Sinai Hospital, Toronto, Ontario, Canada (Ying Wang, Rayjean J. Hung); Cancer Care Ontario, Toronto, Ontario, Canada (John McLaughlin); and Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts (David C. Christiani).
The authors acknowledge funding from the US National Institutes of Health (grants ES017876, HD060696, CA076404, CA134294, CA074386, CA092824, P50CA70907-05, CA121197, CPRIT RP10043, U19 CA148127, and R01CA133996) and the Canadian Cancer Society Research Institute (grant 020214).
Conflict of interest: none declared.