In this study, we investigated a multiple-mediation model involving a three-path mediating effect using data from a case-control study. Such multiple-mediation models have been studied previously, but not in the setting where study subjects are sampled according to a case-control design. We found that bias arises in evaluating the indirect effects if the case-control sampling design is ignored and standard logistic regressions are applied. Therefore, we proposed an approach that corrects the bias in the estimated coefficients from the mediation analysis and provides accurate estimates of the specific indirect effects. This approach can also be employed when the original case-control study is frequency matched on one of the mediators. We used bootstrapping to assess the significance of the indirect effects. We conducted simulation studies to investigate the performance of the proposed approach and showed that, compared with the standard approach, it provides more accurate estimates of the indirect effects as well as of the percentages mediated by the mediators. The multiple-mediation model investigated in this study is related to directed graphical models, which have been applied to the study of genetic data. For example, Zhu and Zhang investigated the association between genetic variants and multiple traits using a scenario similar to the one considered here. However, their analysis focused on testing multiple traits (e.g., primary disease and mediators) simultaneously to identify a common genetic variant, whereas our study focuses on decomposing the potential direct and/or indirect effects of a genetic variant on the primary disease. Moreover, their study was based on a family-based design, whereas ours is based on a case-control design for the primary disease in which the controls may be frequency matched to cases with respect to one of the mediators.
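The bootstrap assessment of the three specific indirect effects can be illustrated with a minimal sketch. It is a simplified stand-in, not the bias-corrected logistic procedure proposed in this study: it assumes continuous mediators and outcome, ordinary least squares, and the product-of-coefficients definition of each path. The names `a1`, `a2`, `d21`, `b1`, `b2`, the helper functions, and the simulated data are all hypothetical.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """OLS slopes of y on the predictors (an intercept is fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

def indirect_effects(x, m1, m2, y):
    """Product-of-coefficients estimates for the three specific indirect paths."""
    (a1,) = ols_slopes(m1, x)                    # X -> M1
    a2, d21 = ols_slopes(m2, x, m1)              # X -> M2 and M1 -> M2
    _direct, b1, b2 = ols_slopes(y, x, m1, m2)   # X -> Y, M1 -> Y, M2 -> Y
    return {
        "via_m1_only": a1 * b1,
        "via_m2_only": a2 * b2,
        "via_m1_then_m2": a1 * d21 * b2,
    }

def bootstrap_ci(x, m1, m2, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for each indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = {name: [] for name in indirect_effects(x, m1, m2, y)}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        estimates = indirect_effects(x[idx], m1[idx], m2[idx], y[idx])
        for name, value in estimates.items():
            draws[name].append(value)
    bounds = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    return {name: tuple(np.percentile(vals, bounds)) for name, vals in draws.items()}

# Simulated data with assumed (hypothetical) path coefficients.
rng = np.random.default_rng(1)
n = 2000
x = rng.binomial(2, 0.3, size=n).astype(float)  # allele count 0/1/2
m1 = 0.5 * x + rng.normal(size=n)
m2 = 0.2 * x + 0.4 * m1 + rng.normal(size=n)
y = 0.1 * x + 0.3 * m1 + 0.3 * m2 + rng.normal(size=n)

point = indirect_effects(x, m1, m2, y)
ci = bootstrap_ci(x, m1, m2, y)
```

An indirect effect would be judged significant at level `alpha` when its interval excludes zero. For case-control data, one reasonable variation (again an assumption here) is to resample cases and controls separately so that each bootstrap sample preserves the sampling design.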
We applied the multiple-mediation approach to investigate the mediating effects of smoking and COPD on the association between the SNP rs1051730 and lung cancer risk using data from a lung cancer case-control GWA study. On the basis of the results obtained from the proposed approach, we concluded that the SNP rs1051730 influences lung cancer risk indirectly through all three pathways: through smoking only, bypassing COPD (18.3%); through COPD only, bypassing smoking (30.2%); and through both smoking and COPD (20.6%). According to our simulation results, the percentages mediated through the different pathways (69.1% in total) obtained using the proposed approach are more accurate, whereas those obtained using the standard approach were either under-estimated or over-estimated. Our finding that COPD mediates the association between the SNP and lung cancer concurs with a previous study of the association between the SNP rs16969968 (in tight linkage disequilibrium with rs1051730) and COPD, in which the authors proposed that the association between the α5 subunit nAChR SNP and lung cancer could be largely explained through its relationship to COPD. Importantly, our results confirm previous findings from our group that the association between the SNP rs1051730 and COPD was mediated by smoking behavior (percentage mediated, ~40%). Thus, the study emphasizes the complex interrelationships among smoking, genes, COPD, and lung cancer.
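As an arithmetic check on the pathway percentages reported above, the decomposition can be written out explicitly. This assumes that the percent mediated is each specific indirect effect divided by the total effect, taken as the direct effect plus the sum of the indirect effects; the symbols $\mathrm{IE}_k$, $\mathrm{DE}$, and $\mathrm{PM}_k$ are notation introduced here for illustration.

```latex
\[
\mathrm{PM}_k = \frac{\mathrm{IE}_k}{\mathrm{DE} + \sum_{j=1}^{3} \mathrm{IE}_j},
\qquad k = 1, 2, 3,
\]
\[
\mathrm{PM}_1 + \mathrm{PM}_2 + \mathrm{PM}_3
  = 18.3\% + 30.2\% + 20.6\% = 69.1\%,
\qquad
\frac{\mathrm{DE}}{\mathrm{DE} + \sum_{j=1}^{3} \mathrm{IE}_j}
  = 100\% - 69.1\% = 30.9\%.
\]
```

Under this definition, the share not mediated by smoking or COPD is the remaining 30.9%.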
One may argue that the use of self-reported, physician-diagnosed emphysema as a COPD measure could result in misclassification of the disease. For example, some studies have shown that when spirometry is used to assess COPD in smokers, estimates of undiagnosed COPD range from 50% to 80%. Such misclassification would lead to under-estimation of effect sizes for the association between genetic variants and COPD risk. However, a few studies suggest that the questionnaire-based approach to defining COPD is quite accurate for epidemiologic studies.
This study extends our previous work investigating the mediating effects of smoking and COPD on the association between the rs1051730 SNP and lung cancer using a single-mediator model. However, the previous study ignored the case-control study design, which might have led to under-estimation of the indirect effect of each mediator, as well as of the percent mediated by each mediator. VanderWeele et al. used a weighted regression approach to address the problem of the case-control study design when assessing the direct and indirect effects of genetic variants at 15q25.1 on lung cancer risk through smoking. That study focused on a single mediator (i.e., smoking) and showed that smoking intensity explained only a small portion (~5%) of the association between the SNP rs1051730 and lung cancer risk, which differs from the percentage we obtained for the path through smoking only (~18%). This difference could be due to multiple reasons. First, different types of data sets were employed in the two studies: we used only ever smokers, whereas VanderWeele et al. used both never and ever smokers for the analysis. Second, we employed a multiple-mediation model, so the indirect effect through smoking only was assessed by controlling for the other mediator, COPD, whereas VanderWeele et al. did not include COPD in their model. Moreover, the study of VanderWeele et al. used a different measure, based on ORs, to evaluate the percentage of the effect of the SNP mediated by smoking intensity; this measure assumes a rare outcome and is not applicable to our situation because lung cancer is not rare in ever smokers. Most importantly, the difference in the results is due to the different scales used for the smoking intensity measure as the mediator variable. In the study of VanderWeele et al., the square root of the number of cigarettes smoked per day was employed as a continuous mediator variable. In this case, the mediating effect can be interpreted as the effect, on the square-root scale, of an individual smoking one cigarette per day on the association between the SNP and lung cancer risk. In contrast, in our study we categorized the individuals into light smokers (<25 cigarettes smoked per day; mean number of cigarettes smoked per day = 17) and heavy smokers (≥25 cigarettes smoked per day; mean number of cigarettes smoked per day = 38). In this sense, the mediating effect should be interpreted as the effect of heavy smoking compared with light smoking on the association between the SNP and lung cancer risk, which, as expected, would be higher than the effect on the square-root scale used in the study of VanderWeele et al.
Munafo et al. studied the association between genetic variants at the chromosome 15q25 locus and tobacco exposure, as measured both by self-reported daily cigarette consumption and by a single measurement of cotinine levels in current smokers. They found that the genetic variants have a stronger association with cotinine level than with self-reported cigarette consumption, and that the per-allele increase in cotinine level indicated a per-allele increased risk of lung cancer with OR = 1.31. Since the lung cancer GWA studies suggested that the genetic variants increase lung cancer risk by 1.32-fold, Munafo and colleagues concluded that the association of the 15q25 locus with lung cancer risk is likely to be mediated largely via tobacco exposure. In contrast to our approach, that study did not actually perform a formal mediation analysis but inferred its results partially from published data, and therefore could not provide the percentage of the genetic variant-lung cancer association mediated by tobacco exposure. This fact was also noted by Spitz et al. The major difference in the conclusions of these two studies could also be due to the different samples (current smokers versus ever smokers) and the different smoking measures (cotinine level versus smoking quantity) used.
In our study, we focused on a multiple-mediator model that allows for a causal association from one mediator to another (i.e., smoking to COPD). In our real data analysis, the causal association of smoking with COPD was known from previous studies. In general, however, the causal direction might not be known in advance and has to be obtained using theoretical justification or intuition about the area of investigation. The alternative is to allow both mediators to co-vary in the model, as in a parallel multiple-mediator model. Our approach can be applied to such models as well to correct the potential bias in the estimates of the indirect effects when case-control study data are employed.
The measure of percent mediated used in our study is usually applicable when the signs of the indirect and direct effects are the same. However, in the multiple-mediation model, it is possible that the indirect effects, as well as the direct effect, will have different signs. In this situation, the total effect, assessed as the sum of the indirect effects and the direct effect, could be arbitrary, and therefore the percent mediated by each mediator could be greater than 1 (i.e., the total effect is less than the indirect effect), negative (i.e., the total effect and the indirect effect have opposite signs), or undefined (i.e., the total effect approaches zero). One possible solution is to assess the percentages mediated using the absolute values of all indirect and direct effects. Alternatively, one may use other measures, such as the indirect effect relative to the direct effect or the proportion of the variance in the outcome variable explained by the indirect effect. However, these measures might have the same issues, such as producing a negative value. In this study, we assumed there were no confounding factors affecting the associations among the SNP, smoking behavior, COPD, and lung cancer risk. It should be noted that, when we refer to the direct effect, we mean the effect of the SNP on lung cancer risk directly or through pathways other than smoking and COPD.
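The behavior of the percent mediated under mixed signs can be shown with a small numerical sketch. The function name `percent_mediated`, the path labels, and the effect values are hypothetical and chosen only to reproduce the cases described above (a ratio above 1, a negative ratio, an undefined ratio), together with the absolute-value alternative.

```python
import math

def percent_mediated(indirect, direct, absolute=False):
    """Share of the total effect carried by each indirect path.

    indirect: dict mapping a path label to its indirect effect.
    direct:   the direct effect.
    absolute: if True, replace every effect by its absolute value first.
    Returns NaN for every path when the total effect is exactly zero.
    """
    effects = dict(indirect)
    if absolute:
        effects = {label: abs(value) for label, value in effects.items()}
        direct = abs(direct)
    total = direct + sum(effects.values())
    if total == 0:
        # Ratio undefined; in practice values near zero are unstable as well.
        return {label: math.nan for label in effects}
    return {label: value / total for label, value in effects.items()}

# Hypothetical effects with mixed signs (illustrative numbers only).
paths = {"path_a": 0.30, "path_b": -0.10}
raw = percent_mediated(paths, direct=0.05)                    # total = 0.25
absolute = percent_mediated(paths, direct=0.05, absolute=True)  # total = 0.45
undefined = percent_mediated({"path_a": 0.20}, direct=-0.20)  # total = 0
```

With these numbers, the raw measure gives 1.2 for `path_a` and -0.4 for `path_b`, whereas the absolute-value version gives shares between 0 and 1 that, together with the direct share, sum to 1.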
In summary, we investigated a multiple-mediator model that involves a three-path mediating effect, from one mediator to another, in a case-control study. We proposed an approach to correct the bias in the estimates of the indirect effects that arises in such models from the case-control study design. The proposed approach can provide accurate estimates of the indirect effects and of the percent mediated. It is also robust to the case-control study being frequency matched on one of the mediators. The application of the proposed multiple-mediation approach to the association between the SNP rs1051730 and lung cancer risk suggests that the SNP has an indirect association with lung cancer risk, mainly through its effect on both smoking behavior and COPD, as well as a relatively weaker direct association with lung cancer risk. Currently, several studies are ongoing to identify genetic variants associated with smoking behaviors and COPD by applying simplistic regression analyses to existing GWA study data collected for lung cancer. Such studies should use more sophisticated statistical models that take into account the complex interplay of smoking, COPD, and lung cancer. Finally, additional studies that include metabolomic markers and biochemical assays of lung carcinogens, as suggested by Spitz et al., and spirometry assessment among smokers, as suggested by Young et al., together with CT scans, would be needed to tease out more accurately the direct and indirect effects of the genetic variants on lung cancer risk.