


PLoS One. 2012; 7(10): e47705.

Published online 2012 October 15. doi: 10.1371/journal.pone.0047705

PMCID: PMC3471886

Jian Wang,^{1,4} Margaret R. Spitz,^{2} Christopher I. Amos,^{3} Xifeng Wu,^{4} David W. Wetter,^{5} Paul M. Cinciripini,^{6} and Sanjay Shete^{1,4,*}

Juan P. de Torres, Editor

Clinica Universidad de Navarra, Spain

* E-mail: sshete@mdanderson.org

Conceived and designed the experiments: JW SS. Performed the experiments: JW SS. Analyzed the data: JW. Contributed reagents/materials/analysis tools: JW MRS CIA XW DWW PMC SS. Wrote the paper: JW SS.

Received 2012 July 2; Accepted 2012 September 14.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


A mediation model explores the direct and indirect effects between an independent variable and a dependent variable by including other variables (or mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator in the study samples is not sampled following the principles of case-control study design. In this case, the mediation analysis using data from case-control studies might lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects as well as the percent mediated than standard regressions. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.

A mediation model is a statistical approach that explores the direct and indirect effects of an independent variable (i.e., initial variable) on a dependent variable (i.e., outcome variable) by including one or more mediating variables (or mediators) [1]. In some scenarios, the mediation model can infer the causal effects from the initial variable to the mediator variable and then to the outcome variable [1]. Mediation models have been widely applied in many different fields [2], such as psychology, behavioral science, genetic epidemiology, prevention research, and political communication research. Recently, there have been efforts in using mediation analysis to dissect the direct and indirect effects of genetic variants on complex diseases in genetic variant association studies [3]–[7]. Most of these studies used data from genome-wide association (GWA) studies, in which the outcome variables were selected on the basis of case-control study design. For example, our group has applied single-mediator analysis (i.e., the Baron-Kenny procedure) to identify the mediation effects of smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 genetic locus and lung cancer risk using data from a case-control GWA study of lung cancer [6]. However, ignoring the case-control study design and applying standard regressions might result in biased estimations of the indirect effects. According to recent studies of secondary phenotypes, the bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator (i.e., cases and controls with respect to the mediator) is not sampled following the principles of case-control study design [8]–[12]. In this case, the mediation analysis using data from case-control studies might lead to biased indirect effect estimates, either over- or under-estimated depending on the prevalence values of outcome and mediators.
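The source of this bias can be illustrated with a small exact calculation. The sketch below is not the correction proposed in this article; it assumes a single binary initial variable *X*, a single binary mediator *M*, and a binary disease *Y* linked by two logistic models, with parameter values chosen purely for illustration. It compares the *X*–*M* odds ratio in the general population with the odds ratio expected in a sample made up of equal fractions of disease cases and controls.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def joint_xmy(p_x, a0, a1, b0, c, b):
    """Population joint P(X=x, M=m, Y=y) for binary X, M, Y (illustrative model)."""
    joint = {}
    for x in (0, 1):
        px = p_x if x == 1 else 1.0 - p_x
        pm1 = sigmoid(a0 + a1 * x)  # P(M=1 | X=x)
        for m in (0, 1):
            pm = pm1 if m == 1 else 1.0 - pm1
            py1 = sigmoid(b0 + c * x + b * m)  # P(Y=1 | X=x, M=m)
            joint[(x, m, 1)] = px * pm * py1
            joint[(x, m, 0)] = px * pm * (1.0 - py1)
    return joint


def odds_ratio_xm(cell):
    """Odds ratio between X and M from a 2x2 table of probabilities."""
    return cell[(1, 1)] * cell[(0, 0)] / (cell[(1, 0)] * cell[(0, 1)])


def xm_or_population_and_sample(p_x, a0, a1, b0, c, b, case_fraction=0.5):
    """X-M odds ratio in the population and in a case-control sample on Y."""
    joint = joint_xmy(p_x, a0, a1, b0, c, b)
    p_case = sum(v for (x, m, y), v in joint.items() if y == 1)
    cells = [(x, m) for x in (0, 1) for m in (0, 1)]
    population = {k: joint[k + (0,)] + joint[k + (1,)] for k in cells}
    # The sample mixes the case and control distributions in fixed proportions.
    sample = {
        k: case_fraction * joint[k + (1,)] / p_case
        + (1.0 - case_fraction) * joint[k + (0,)] / (1.0 - p_case)
        for k in cells
    }
    return odds_ratio_xm(population), odds_ratio_xm(sample)


# Hypothetical parameter values, chosen only to make the effect visible.
pop_or, sample_or = xm_or_population_and_sample(
    p_x=0.5, a0=-2.0, a1=0.5, b0=-2.5, c=0.3, b=1.0
)
print(round(pop_or, 3), round(sample_or, 3))
```

Under these assumed values the population odds ratio equals exp(*a _{1}*), whereas the odds ratio in the case-control sample does not, because cases are over-represented relative to the population.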

Lung cancer GWA studies have consistently shown that the CHRNA5-A3 gene cluster is strongly associated with an increased risk of lung cancer. Also, multiple studies have associated SNPs spanning this region with heavy smoking, nicotine dependence, smoking cessation and COPD [13]–[19]. Thus, there is a debate about whether the genetic variants have an impact on lung cancer risk directly or exert their effect largely through the profound effect of the variants on smoking intensity [20]–[22] or COPD [23]. Further work investigating this association concluded that there are dual pathways linking the genetic variant to lung cancer, acting independently via a direct effect on lung carcinogenesis and through smoking behavior [6], [7], [15], [24]–[26]. More recent studies of current smokers have shown that the genetic variants in the CHRNA5-A3 gene cluster have a stronger association with cotinine levels than with self-reported smoking behavior, and suggested that the effect of the genetic variants on lung cancer risk is largely, if not exclusively, through their effect on smoking intensity [27]–[29]. However, in an accompanying editorial, Spitz et al. [21] concluded that the degree to which the association is mediated by smoking is yet to be determined. Prior studies focused on one mediator (e.g., smoking) at a time, and none has studied multiple mediators simultaneously in one model. In reality, more than one mediator could affect the association between the genetic variant and lung cancer risk. In our previous analysis [6], we found that in single-mediator analyses smoking and COPD were mediators of the association between the single-nucleotide polymorphism (SNP) rs1051730 and risk of lung cancer. Analyzing multiple mediators in one model, however, could have some advantages over such single-mediator analyses [30].

The multiple-mediation model used for the study of the SNP, smoking, COPD and lung cancer risk is depicted as a path diagram in Figure 1. The multiple-mediation model includes a three-path mediating effect through both smoking and COPD, which allows one mediator (i.e., smoking) to causally affect the other mediator (i.e., COPD) [31]. This causal association is biologically compelling because smoking is the known major risk factor for COPD [32]. The underlying assumption of this three-path mediating effect is that the individuals carrying the deleterious allele of rs1051730 are more likely to be heavy smokers, which in turn leads to a higher risk of COPD, which in turn increases the risk of lung cancer. Thus, in addition to the indirect effects passing through each of the mediators alone, we will investigate the indirect effect passing through both mediators.
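Written out, the path structure just described corresponds to three regressions, one for each mediator and one for the outcome. The display below is a sketch rather than a quotation of the model equations: it assumes binary mediators and a binary outcome with logistic links, takes *X* as the SNP, *M _{1}* as smoking, *M _{2}* as COPD, and *Y* as lung cancer, uses the coefficient names *a _{1}*, *d*, *b _{1}*, *b _{2}*, and *c′* as they appear elsewhere in this article, and introduces the intercept symbol $d_0$ and the labels $IE_2$ and $IE_3$ for illustration only.

```latex
\begin{align*}
\operatorname{logit} P(M_1 = 1 \mid X) &= a_0 + a_1 X \\
\operatorname{logit} P(M_2 = 1 \mid X, M_1) &= d_0 + a_2 X + d\, M_1 \\
\operatorname{logit} P(Y = 1 \mid X, M_1, M_2) &= b_0 + c' X + b_1 M_1 + b_2 M_2 \\[4pt]
IE_1 = a_1 b_1, \qquad IE_2 &= a_2 b_2, \qquad IE_3 = a_1\, d\, b_2 \\
IE_t = IE_1 + IE_2 + IE_3, \qquad \text{total effect} &= c' + IE_t, \qquad
\text{percent mediated}_k = \frac{IE_k}{c' + IE_t}
\end{align*}
```

Under this reading, $IE_1$ is the path through smoking alone, $IE_2$ the path through COPD alone, and $IE_3$ the three-path effect running from the SNP through smoking and then COPD.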

To our knowledge, there has been no previous study investigating such a multiple mediation model in the case-control study design setting, in which the standard regression approach could provide biased estimations for the indirect effects as we described above. Therefore, we developed an approach to conduct a multiple-mediation analysis using the model shown in Figure 1. We conducted simulations to investigate the performance of the proposed approach, and these showed the approach can provide accurate estimates of the indirect effects. The bootstrapping approach was applied to assess the significance of the indirect effects and total effect. We also developed an approach for when the original case-control study is frequency matched on one of the mediators, as in our lung cancer case-control study where controls are frequency matched to cases with respect to smoking status. We applied the proposed approach to the multiple-mediation study of the simultaneous mediating effects of smoking and COPD on the association between SNP rs1051730 and lung cancer risk using lung cancer case-control GWA study data.

Let *X*, *M _{1}*,

(1)

(2)

(3)

where *a _{0}*,

When the data of interest are randomly sampled from the general population, the estimations of the indirect effects and the percent mediated are accurate. However, if the data are sampled based on a case-control study design, the estimated associations among the initial variable and both mediators (i.e., *a _{1}*,

As stated above, the regression coefficient *d*, of the *M _{1}*–

(4)

where *E _{kj}* is the expected number of individuals in the sample, with

where *j*, *k*, *r*=0, 1. The conditional probability *p _{kj|r}* is written as

The probabilities *p _{1}* and

and where *b _{0}*,

(5)

(6)

Given a sample with *N* independent individuals for a case-control study of the disease (*Y*), one can estimate the regression coefficients *b _{1}* and

When the genetic variant is assumed to be additive, special care needs to be taken. In this situation, we used a categorical random variable, *X*, to denote the three genotypes: wild-type homozygous (*X*=0), heterozygous (*X*=1), and homozygous for the variant allele (*X*=2). We employed the property that the biased OR obtained using logistic regression is given by the per-allele OR and adapted the approach for an additive model proposed in our previous study [35]. To obtain the true per-allele OR, we assessed the biased OR in two ways. First, we obtained the biased OR_{1} by calculating the OR of SNP random variable *X*=1 versus *X*=0, which gives the OR for the heterozygous genotype against the wild-type homozygous genotype. Second, we obtained the biased OR_{2} by calculating the OR of SNP random variable *X*=2 versus *X*=0, which gives the OR for the genotype homozygous for the variant allele against the wild-type homozygous genotype. On the basis of OR_{1} and OR_{2}, and following the different formulas in our previous study [12], we obtained two corrected coefficients, and the final corrected coefficient for the additive genetic model is the average of these.

Frequency matching is an important and commonly used study design for known risk confounders and has been widely used in case-control studies [38]. In the analysis of real lung cancer data, because smoking is a well-known risk confounder for the association between lung cancer and other risk factors, controls were frequency matched to lung cancer cases with respect to smoking status. That is, for the multiple mediation model shown in Figure 1, the disease cases and controls are frequency matched on the mediator *M _{1}*. In this scenario, frequency-matching design also contributes to bias in the estimate of the coefficients for associations among the SNP and the mediators (i.e.,

for *i=*0, 1, 2 and *j*=0, 1.

The parameter was denoted as the difference in the proportions of individuals with the presence of the mediator *M _{1}* in the disease cases and controls, given as =prop(

and

for *i=*0, 1, 2, and *j*=0, 1.

When assessing the corrected coefficient *d*, we used a similar formula to evaluate the expected numbers of individuals *E _{kj}*:

for *j*, *k*=0, 1.

The conditional probabilities and are defined as:

and

for *j*, *k*=0, 1.

If the original disease case-control study is frequency matched on the mediator *M _{1}*, the estimated value of

Bootstrapping has been employed to evaluate the significance of indirect effects in a multiple-mediator model [30], [33] to overcome the difficulty in assessing standard errors for the indirect effects. In this study, we also used the empirical confidence intervals (CIs), based on a resampling-based method with replacement [39]. Given the regression coefficients *b _{1}*, and

- Take *B* samples with replacement from the study data, each with *n _{1}* individuals from the disease cases and *n _{0}* individuals from the disease controls (*n* = *n _{0}* + *n _{1}*). Note that *n _{0}* ≤ *N _{0}* and *n _{1}* ≤ *N _{1}*, where *N _{1}* and *N _{0}* are the numbers of cases and controls, respectively, with respect to the disease in the study sample.
- Evaluate the bootstrap regression coefficients using logistic regressions based on the bootstrap samples, indexing the coefficients obtained from the *u*th bootstrap sample by *u* = 1, 2, …, *B*. The corrected coefficients for *a _{1}*, *a _{2}*, and *d* are then calculated for each *u* by using the approaches described above.
- The bootstrap specific indirect effects and their sum, the total indirect effect, are assessed for each *u* = 1, 2, …, *B*, and the estimates of each effect are sorted in ascending order. The 100(1−*α*)% CIs of the indirect effects are then given by the corresponding lower and upper empirical percentiles of the ordered bootstrap estimates.
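The resampling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the analyses: cases and controls are resampled separately with *n _{1}* = *N _{1}* and *n _{0}* = *N _{0}*, the function name `stratified_bootstrap_ci` and the toy data are hypothetical, and the estimator is a simple stand-in (the log odds ratio between two binary columns) for the corrected indirect effects.

```python
import math
import random


def log_odds_ratio(rows):
    """Log OR between two binary columns (x, m), with a 0.5 continuity correction."""
    counts = {(x, m): 0.5 for x in (0, 1) for m in (0, 1)}
    for x, m in rows:
        counts[(x, m)] += 1
    return math.log(
        counts[(1, 1)] * counts[(0, 0)] / (counts[(1, 0)] * counts[(0, 1)])
    )


def stratified_bootstrap_ci(cases, controls, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from resampling cases and controls separately, with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = (rng.choices(cases, k=len(cases))
                     + rng.choices(controls, k=len(controls)))
        estimates.append(estimator(resampled))
    estimates.sort()
    # Empirical alpha/2 and 1 - alpha/2 percentiles of the ordered estimates.
    lower = estimates[int(math.floor((alpha / 2) * (n_boot - 1)))]
    upper = estimates[int(math.ceil((1 - alpha / 2) * (n_boot - 1)))]
    return lower, upper


# Toy data (hypothetical): each record is (x, m) for one individual.
cases = [(1, 1)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 15 + [(0, 0)] * 35
controls = [(1, 1)] * 15 + [(1, 0)] * 30 + [(0, 1)] * 10 + [(0, 0)] * 45

point = log_odds_ratio(cases + controls)
lower, upper = stratified_bootstrap_ci(cases, controls, log_odds_ratio)
print(point, (lower, upper))
```

In an actual analysis, the estimator passed to the function would refit the regressions, apply the coefficient corrections, and return the indirect effect of interest; an interval that excludes zero is then taken as evidence of a significant effect.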

We performed simulation studies to investigate the performance of our approach for evaluating the indirect effects in the multiple-mediation model in a case-control study (Figure 1). To mimic the real data analysis of lung cancer, we assumed a single di-allele SNP with a minor allele frequency (MAF) of 37%. We used 14%, 24%, and 12% as the prevalence values for the disease (*Y*), the mediator *M _{2}*, and the mediator

First, we generated genotypes for a SNP using the genotype frequencies, which can be calculated from the MAF. The mediator *M _{1}* values were then generated on the basis of the dataset of realizations of the SNP using Equation (1), assuming different genetic models for the SNP. Conditioned on mediator
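A data-generating procedure of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the simulation settings reported here: genotype frequencies follow Hardy-Weinberg proportions, the SNP acts under a dominant model, the mediators and the disease follow logistic models, and all coefficient values (as well as the intercept name `d0`) are hypothetical.

```python
import math
import random


def genotype_frequencies(maf):
    """Frequencies of 0, 1, 2 minor alleles, assuming Hardy-Weinberg proportions."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_population(n, maf, coef, seed=0):
    """Generate (x, m1, m2, y) records following the three-path structure."""
    rng = random.Random(seed)
    freqs = genotype_frequencies(maf)
    people = []
    for _ in range(n):
        genotype = rng.choices([0, 1, 2], weights=freqs)[0]
        x = 1 if genotype > 0 else 0  # dominant coding of the SNP
        m1 = int(rng.random() < sigmoid(coef["a0"] + coef["a1"] * x))
        m2 = int(rng.random() < sigmoid(coef["d0"] + coef["a2"] * x + coef["d"] * m1))
        y = int(rng.random() < sigmoid(
            coef["b0"] + coef["c"] * x + coef["b1"] * m1 + coef["b2"] * m2))
        people.append((x, m1, m2, y))
    return people


def draw_case_control(people, n_cases, n_controls, seed=1):
    """Sample disease cases and controls without replacement from the population."""
    rng = random.Random(seed)
    cases = [p for p in people if p[3] == 1]
    controls = [p for p in people if p[3] == 0]
    return rng.sample(cases, n_cases), rng.sample(controls, n_controls)


# Hypothetical coefficients on the log-odds scale.
coef = {"a0": -2.0, "a1": 0.4, "d0": -1.5, "a2": 0.3, "d": 0.8,
        "b0": -2.5, "c": 0.2, "b1": 0.6, "b2": 0.9}

freqs = genotype_frequencies(0.37)
population = simulate_population(20000, 0.37, coef)
cases, controls = draw_case_control(population, 500, 500)
print(freqs, len(cases), len(controls))
```

Fitting the three regressions to `cases + controls` and comparing the estimates with the coefficients in `coef` gives a direct check of how far the standard approach drifts from the true values.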

The average results of the regression coefficients *a _{1}*,

For the unmatched case-control study design, when the standard logistic regressions were applied, the estimates of *c′*, *b _{1}*, and

When the case-control study was frequency-matched with mediator *M _{1}*, in addition to the coefficients

Table 2 reports the average results for the indirect effects and the percent mediated through two mediators on the effect of the genetic variant on the disease, assessed on the basis of the regression coefficient results reported in Table 1. The true indirect effects, total effect, and percent mediated are listed in the table for each scenario. We considered several specific indirect effects involved in the multiple-mediation model (Figure 1), including the indirect effect through the mediator *M _{1}*, bypassing mediator

For the unmatched case-control study, when the standard regression approach was applied, the estimates of the specific indirect effects, as well as the total effect, were biased compared to the true values. This was expected because the coefficients used to assess the indirect effects and total effect were biased. For example, for scenario one with a dominant genetic model (unmatched study), the specific indirect effects and the total effect were given as *IE _{1}*=0.51,

When the case-control study was frequency matched with mediator *M _{1}*, the magnitudes of bias in the estimations of indirect effects, total effect, and percent mediated were larger than those for the unmatched study when applying the standard approach. For example, in scenario one for the frequency-matched design, when the proportion of individuals with presence of

Therefore, we observed from the overall simulation results that the standard logistic regressions provided biased estimates for the coefficients *a _{1}*,

We applied our approach to assess the mediating effects of smoking behavior and COPD simultaneously on the association between the SNP rs1051730 and lung cancer risk using a multiple-mediation model (Figure 1) based on the data from a lung cancer GWA study [6], [20], [25]. This analysis included *N _{1}*=1,153 lung cancer case subjects who were current or former smokers and

Table 3 reports the estimated coefficients, indirect effects, total effects, and percentages mediated for the SNP-lung cancer association obtained using both the standard and proposed approaches. As we showed in the simulation studies, the estimated coefficients *b _{2}* and

When the standard logistic regression approach was applied, not all three specific indirect effects were statistically significant, as evidenced by some bootstrap CIs containing zeros (Table 3). The first indirect effect carries the effect of the SNP on increasing lung cancer risk through only smoking, bypassing COPD. This indirect effect was assessed by the product of *a _{1}* and

The total effect of the SNP on lung cancer risk was calculated as the sum of the direct (*c′*) and total indirect (*IE _{t}*) effects (

Therefore, we applied the new approach proposed in this article to estimate the indirect effects of smoking and COPD on the association between the SNP and lung cancer risk (see Table 3). The indirect effect of smoking, bypassing COPD, was evaluated by using the product of and the fixed *b _{1}* value (i.e., log(1.86)) and was found to be equal to

In this study, we investigated the multiple-mediation model involving a three-path mediating effect using data from a case-control study. Such multiple-mediation models have been studied previously, but not in the context in which the study subjects are sampled according to a case-control design [31], [33]. We found that bias arises in evaluating the indirect effects if the case-control sampling design is ignored and standard logistic regressions are applied. Therefore, we proposed an approach to correct bias in estimating coefficients from the mediation analysis and provide accurate estimates of the specific indirect effects. This approach can also be employed when the original case-control study is frequency matched on one of the mediators. We employed the bootstrapping approach to assess the significance of the indirect effects. We conducted simulation studies to investigate the performance of the proposed approach and showed that, compared with the standard approach, the proposed approach provides more accurate estimates of the indirect effects as well as of the percentages mediated by the mediators. The multiple-mediation model investigated in this study is related to directed graphical models, which have been applied to the study of genetic data. For example, Zhu and Zhang [47] investigated the association between genetic variants and multiple traits using a scenario similar to that considered in Figure 1. However, their analysis focused on testing multiple traits (e.g., primary disease and mediators) simultaneously to identify a common genetic variant, whereas our study focuses on decomposing the potential direct and/or indirect effects of a genetic variant on the primary disease. Moreover, their study was based on a family-based study design, whereas our study focuses on a case-control study design of the primary disease in which the controls may be frequency-matched to cases with respect to one of the mediators.

We applied the approach to investigate the mediating effects of smoking and COPD on the association between the SNP rs1051730 and lung cancer risk, using the multiple-mediation model with lung cancer case-control GWA study data. We concluded on the basis of the results obtained from the proposed approach that the SNP rs1051730 influences lung cancer risk indirectly through all three pathways: through smoking only, bypassing COPD (18.3%); through COPD only, bypassing smoking (30.2%); and through both smoking and COPD (20.6%). According to our simulation results, the percentages mediated through the different pathways (total 69.1%) obtained using the proposed approach are more accurate than those obtained using the standard approach, which were either under-estimated or over-estimated. Our finding that COPD mediates the SNP-lung cancer association concurs with a previous study of the association between the SNP rs16969968 (in tight linkage disequilibrium with rs1051730) and COPD [23], in which the authors proposed that the association between the α5 subunit nAChR SNP and lung cancer could be largely explained through its relationship to COPD. Importantly, our results confirm previous findings from our group [6] that the association between the SNP rs1051730 and COPD was mediated by smoking behavior (percentage mediated=~40%). Thus, the study emphasizes the complex interrelationships among smoking, genes, COPD, and lung cancer.

One may argue that the use of self-reported, physician-diagnosed emphysema as a COPD measure could result in misclassification of the disease. For example, some studies have shown that when spirometry is used to assess COPD in smokers, estimates of undiagnosed COPD range from 50–80% [23], [48]–[52]. Such misclassification would lead to under-estimation of effect sizes for the association between genetic variants and COPD risk. However, a few studies suggest that the questionnaire-based approach to defining COPD is quite accurate for epidemiologic studies [53]–[55].

This study extends our previous work investigating the mediating effects of smoking and COPD on the association between the rs1051730 SNP and lung cancer using a single-mediator model [6]. However, the previous study ignored the case-control study design, which might under-estimate the indirect effect of each mediator, as well as the percent mediated by each mediator. VanderWeele et al. [7] used a weighted regression approach [56] to address the problem of case-control study design when assessing the direct and indirect effects of genetic variants on 15q25.1 on the lung cancer risk through smoking. That study focused on only a single mediator (i.e., smoking) and showed that smoking intensity only explained a small portion (~5%) of the association between the SNP rs1051730 and lung cancer risk, which differs from the percentage we have obtained for the path through smoking only (~18%). This difference could be due to multiple reasons. First, different types of data sets were employed in the two studies: we used only ever smokers, whereas VanderWeele et al. used both never and ever smokers for the analysis. Second, we employed a multiple-mediation model, so the indirect effect through smoking only was assessed by controlling for the other mediator, COPD, whereas VanderWeele et al. did not include COPD in their model. Moreover, the study of VanderWeele et al. used a different measure based on ORs to evaluate the percentage of the effect of the SNP mediated by smoking intensity, which assumes a rare outcome disease [56] and is not applicable to our situation because lung cancer is not rare in ever smokers. Most importantly, the difference in the results is due to the different scales used for the smoking intensity measure as the mediator variable. In the study of VanderWeele et al. [7], the square root of the number of cigarettes smoked per day was employed as a continuous mediator variable. 
In this case, the mediating effect can be interpreted as the effect, on the square-root scale, of an individual smoking one cigarette per day on the association between the SNP and lung cancer risk. In contrast, in our study we categorized the individuals into light smokers (<25 cigarettes smoked per day [mean number of cigarettes smoked per day = 17]) and heavy smokers (≥25 cigarettes smoked per day [mean number of cigarettes smoked per day = 38]). In this sense, the mediating effect should be interpreted as the effect of heavy smoking compared to light smoking on the association between the SNP and lung cancer risk, which, as expected, would be higher than the effect on the square-root scale used in the VanderWeele study [7].

Munafo et al. [27] studied the association between genetic variants at the chromosome 15q25 locus and tobacco exposure, as measured both by self-reported daily cigarette consumption and by a single measurement of cotinine levels in current smokers. They found that the genetic variants have a stronger association with cotinine level than with self-reported cigarette consumption, and that the per-allele increase in cotinine level corresponded to a per-allele increase in the risk of lung cancer with OR=1.31. Since the lung cancer GWA studies suggested that the genetic variants increase lung cancer risk by 1.32 fold [20], Munafo and colleagues concluded that the association of the 15q25 locus with lung cancer risk is likely to be mediated largely via tobacco exposure. In contrast to our approach, that study did not perform a formal mediation analysis but inferred its results partially from published data, and therefore could not provide the percentage of the genetic variant-lung cancer association mediated by tobacco exposure. This fact was also noted by Spitz et al. [28]. The major difference in the conclusions of these two studies could also be due to the different samples (current smokers versus ever smokers) and different smoking measures (cotinine level versus smoking quantity) used.

In our study, we focused on the multiple-mediator model shown in Figure 1, which allows for the causal association of one mediator to another mediator (i.e., smoking to COPD). In our real data analysis, the causal association of smoking to COPD was known from previous studies. However, in reality, the assumed causal direction might not be known in advance and has to be obtained using theoretical justification or intuition about the area of investigation [57]. The alternative is to consider both mediators to co-vary in the model, as in a parallel multiple-mediator model [30]. Our approach can be applied to such models as well to correct the potential bias in the estimations of the indirect effects when case-control study data are employed.

The measure of percent mediated used in our study is usually applicable when the signs of the indirect and direct effects are the same [58]. However, in the multiple-mediation model, it is possible that the indirect effects, as well as the direct effect, will have different signs. In this situation, the total effect assessed by the summation of the indirect effects and the direct effect could be arbitrary, and therefore, the percent mediated by each mediator could be greater than 1 (i.e., the total effect is less than the indirect effect), negative (i.e., the total effect and the indirect effect have opposite signs), or undefined (i.e., the total effect approaches zero) [59]. One possible solution is to assess the percentages mediated using the absolute values for all indirect and direct effects [60]. Alternatively, one may use other measures, such as the measure referencing the indirect effect relative to the direct effect and the proportion of the variance in outcome variable explained by the indirect effect [61]. However, these measures might have the same issues, such as producing a negative value. In this study, we assumed there were no confounding factors mitigating associations among the SNP, smoking behavior, COPD, and lung cancer risk [7].
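The behavior of these measures can be checked numerically. The following sketch assumes the three-path structure of Figure 1 with hypothetical coefficient values; the function name `mediation_summary` and the pathway labels are introduced here for illustration. It reports the usual percent mediated alongside the absolute-value variant mentioned above, and flags the case in which the effects do not share a sign.

```python
def mediation_summary(a1, a2, d, b1, b2, c_prime):
    """Specific indirect effects, total effect, and two percent-mediated measures."""
    indirect = {
        "M1 only": a1 * b1,         # through the first mediator, bypassing the second
        "M2 only": a2 * b2,         # through the second mediator, bypassing the first
        "M1 then M2": a1 * d * b2,  # three-path effect through both mediators
    }
    total_indirect = sum(indirect.values())
    total_effect = c_prime + total_indirect
    terms = list(indirect.values()) + [c_prime]
    same_sign = all(t > 0 for t in terms) or all(t < 0 for t in terms)
    # The usual measure is undefined when the total effect is zero.
    percent = None
    if total_effect != 0:
        percent = {k: v / total_effect for k, v in indirect.items()}
    # Absolute-value variant: each share lies between 0 and 1 by construction.
    abs_total = sum(abs(t) for t in terms)
    percent_abs = None
    if abs_total > 0:
        percent_abs = {k: abs(v) / abs_total for k, v in indirect.items()}
    return {
        "indirect": indirect,
        "total_effect": total_effect,
        "same_sign": same_sign,
        "percent": percent,
        "percent_abs": percent_abs,
    }


# Hypothetical coefficients: all effects positive, then a sign flip on a1.
consistent = mediation_summary(0.5, 0.4, 1.0, 0.6, 0.8, 0.2)
mixed = mediation_summary(-0.5, 0.4, 1.0, 0.6, 0.8, 0.2)
print(consistent["percent"])
print(mixed["same_sign"], mixed["percent"], mixed["percent_abs"])
```

With the sign of *a _{1}* flipped, the total effect becomes negative and the usual percent mediated for the path through the second mediator alone turns negative, while the absolute-value shares remain between 0 and 1.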

It should be noted that, when we refer to the direct effect, we mean the effect of the SNP on lung cancer risk directly or through pathways other than smoking and COPD.

In summary, we investigated the multiple-mediator model, which involves a three-path mediating effect from one mediator to another, in a case-control study. We proposed an approach to correct the biased estimates of the indirect effects that arise in such models because of the case-control study design. The proposed approach can provide accurate estimates of the indirect effects and the percent mediated. It also remains applicable when the case-control study is frequency matched on one of the mediators. The application of the proposed multiple-mediation approach to the study of the association between SNP rs1051730 and lung cancer risk suggests that the SNP has an indirect association with lung cancer risk mainly through its effect on both smoking behavior and COPD, as well as a relatively weaker direct association with lung cancer risk. Currently, several studies are ongoing to identify genetic variants associated with smoking behaviors and COPD by applying simplistic regression analyses to existing GWA study data collected for lung cancer. Such studies should use more sophisticated statistical models that take into account the complex interplay of smoking, COPD, and lung cancer. Finally, additional studies that include metabolomics markers and biochemical assays of lung carcinogens, as suggested by Spitz et al. [28], and spirometry assessment among smokers, as suggested by Young et al. [23], together with CT scans, would be needed to more accurately tease out the direct and indirect effects of the genetic variants on lung cancer risk.

Correction of coefficients *a _{1}* and


This work was supported by United States National Institutes of Health (NIH) grant R01CA131324 (SS) and by a faculty fellowship from The University of Texas MD Anderson Cancer Center Duncan Family Institute for Cancer Prevention and Risk Assessment (JW). This study makes use of lung cancer data generated by support from NIH grants U19CA148127 and R01CA121197. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

1. MacKinnon DP (2008) Introduction to statistical mediation analysis. New York: Erlbaum. 488 p.

2. Shrout PE, Bolger N (2002) Mediation in experimental and nonexperimental studies: new procedures and recommendations. Psychol Methods 7: 422–445 [PubMed]

3. Blackburn EH (2011) Walking the walk from genes through telomere maintenance to cancer risk. Cancer Prev Res (Phila) 4: 473–475 [PubMed]

4. Gu J, Chen M, Shete S, Amos CI, Kamat A, et al. (2011) A genome-wide association study identifies a locus on chromosome 14q21 as a predictor of leukocyte telomere length and as a marker of susceptibility for bladder cancer. Cancer Prev Res (Phila) 4: 514–521 [PMC free article] [PubMed]

5. Ishii T, Wakabayashi R, Kurosaki H, Gemma A, Kida K (2011) Association of serotonin transporter gene variation with smoking, chronic obstructive pulmonary disease, and its depressive symptoms. J Hum Genet 56: 41–46 [PubMed]

6. Wang J, Spitz MR, Amos CI, Wilkinson AV, Wu X, et al. (2010) Mediating effects of smoking and chronic obstructive pulmonary disease on the relation between the CHRNA5-A3 genetic locus and lung cancer risk. Cancer 116: 3458–3462 [PMC free article] [PubMed]

7. Vanderweele TJ, Asomaning K, Tchetgen Tchetgen EJ, Han Y, Spitz MR, et al. (2012) Genetic variants on 15q25.1, smoking, and lung cancer: an assessment of mediation and interaction. Am J Epidemiol 175: 1013–1020 [PMC free article] [PubMed]

8. Li H, Gail MH, Berndt S, Chatterjee N (2010) Using cases to strengthen inference on the association between single nucleotide polymorphisms and a secondary phenotype in genome-wide association studies. Genet Epidemiol 34: 427–433 [PMC free article] [PubMed]

9. Lin DY, Zeng D (2009) Proper analysis of secondary phenotype data in case-control association studies. Genet Epidemiol 33: 256–265 [PMC free article] [PubMed]

10. Richardson DB, Rzehak P, Klenk J, Weiland SK (2007) Analyses of case-control data for additional outcomes. Epidemiology 18: 441–445 [PubMed]

11. Wang J, Shete S (2011) Power and type I error results for a bias-correction approach recently shown to provide accurate odds ratios of genetic variants for the secondary phenotypes associated with primary diseases. Genet Epidemiol 35: 739–743 [PMC free article] [PubMed]

12. Wang J, Shete S (2011) Estimation of odds ratios of genetic variants for the secondary phenotypes associated with primary diseases. Genet Epidemiol 35: 190–200 [PMC free article] [PubMed]

13. Chen LS, Baker TB, Piper ME, Breslau N, Cannon DS, et al. (2012) Interplay of Genetic Risk Factors (CHRNA5-CHRNA3-CHRNB4) and Cessation Treatments in Smoking Cessation Success. Am J Psychiatry.

14. Chen LS, Saccone NL, Culverhouse RC, Bracci PM, Chen CH, et al. (2012) Smoking and genetic risk variation across populations of European, Asian, and African American ancestry–a meta-analysis of chromosome 15q25. Genet Epidemiol 36: 340–351

15. Kaur-Knudsen D, Bojesen SE, Tybjaerg-Hansen A, Nordestgaard BG (2011) Nicotinic acetylcholine receptor polymorphism, smoking behavior, and tobacco-related cancer and lung and cardiovascular diseases: a cohort study. J Clin Oncol 29: 2875–2882

16. Liu JZ, Tozzi F, Waterworth DM, Pillai SG, Muglia P, et al. (2010) Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat Genet 42: 436–440

17. Pillai SG, Ge D, Zhu G, Kong X, Shianna KV, et al. (2009) A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet 5: e1000421

18. Furberg H, Kim Y, Dackor J, Boerwinkle E, Franceschini N, et al. (2010) Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat Genet 42: 441–447

19. Thorgeirsson TE, Gudbjartsson DF, Surakka I, Vink JM, Amin N, et al. (2010) Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior. Nat Genet 42: 448–453

20. Amos CI, Wu X, Broderick P, Gorlov IP, Gu J, et al. (2008) Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet 40: 616–622

21. Hung RJ, McKay JD, Gaborieau V, Boffetta P, Hashibe M, et al. (2008) A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature 452: 633–637

22. Thorgeirsson TE, Geller F, Sulem P, Rafnar T, Wiste A, et al. (2008) A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 452: 638–642

23. Young RP, Hopkins RJ, Hay BA, Epton MJ, Black PN, et al. (2008) Lung cancer gene associated with COPD: triple whammy or possible confounding effect? Eur Respir J 32: 1158–1164

24. Lips EH, Gaborieau V, McKay JD, Chabrier A, Hung RJ, et al. (2010) Association between a 15q25 gene variant, smoking quantity and tobacco-related cancers among 17 000 individuals. Int J Epidemiol 39: 563–577

25. Spitz MR, Amos CI, Dong Q, Lin J, Wu X (2008) The CHRNA5-A3 region on chromosome 15q24-25.1 is a risk factor both for nicotine dependence and for lung cancer. J Natl Cancer Inst 100: 1552–1556

26. Wacholder S, Chatterjee N, Caporaso N (2008) Intermediacy and gene-environment interaction: the example of CHRNA5-A3 region, smoking, nicotine dependence, and lung cancer. J Natl Cancer Inst 100: 1488–1491

27. Munafo MR, Timofeeva MN, Morris RW, Prieto-Merino D, Sattar N, et al. (2012) Association between genetic variants on chromosome 15q25 locus and objective measures of tobacco exposure. J Natl Cancer Inst 104: 740–748

28. Spitz MR, Amos CI, Bierut LJ, Caporaso NE (2012) Cotinine conundrum–a step forward but questions remain. J Natl Cancer Inst 104: 720–722

29. Keskitalo K, Broms U, Heliovaara M, Ripatti S, Surakka I, et al. (2009) Association of serum cotinine level with a cluster of three nicotinic acetylcholine receptor genes (CHRNA3/CHRNA5/CHRNB4) on chromosome 15. Hum Mol Genet 18: 4007–4012

30. Preacher KJ, Hayes AF (2008) Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behav Res Methods 40: 879–891

31. Hayes AF, Preacher KJ, Myers TA (2011) Mediation and the estimation of indirect effects in political communication research. In: Bucy EO, Lance Holbert R, editors. Sourcebook for Political Communication Research: Methods, Measures, and Analytical Techniques. New York: Routledge. 434–465.

32. Young RP, Hopkins RJ, Christmas T, Black PN, Metcalf P, et al. (2009) COPD prevalence is increased in lung cancer, independent of age, sex and smoking history. Eur Respir J 34: 380–386

33. Taylor AB, MacKinnon DP, Tein JY (2008) Tests of the three-path mediated effect. Organizational Research Methods 11: 241–269

34. MacKinnon DP, Lockwood CM, Brown CH, Wang W, Hoffman JM (2007) The intermediate endpoint effect in logistic and probit regression. Clin Trials 4: 499–513

35. Wang J, Shete S (2012) Analysis of secondary phenotype involving the interactive effect of the secondary phenotype and genetic variants on the primary disease. Under review.

36. Mathworks (2002) Matlab. Cambridge, MA: Mathworks.

37. Powell MJD (1970) A fortran subroutine for solving systems of nonlinear algebraic equations. In: Rabinowitz P, editor. Numerical methods for nonlinear algebraic equations. New York: Gordon and Breach. 115–161.

38. Rothman KJ, Greenland S (1998) Modern epidemiology. Philadelphia, PA: Lippincott Williams & Wilkins. 737 p.

39. Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. New York: Chapman and Hall. 436 p.

40. Villeneuve PJ, Mao Y (1994) Lifetime probability of developing lung cancer, by smoking status, Canada. Can J Public Health 85: 385–388

41. Lamprecht B, McBurnie MA, Vollmer WM, Gudmundsson G, Welte T, et al. (2011) COPD in never smokers: results from the population-based burden of obstructive lung disease study. Chest 139: 752–763

42. CDC (2005) Cigarette smoking among adults – United States, 2004. MMWR 54: 1121–1124

43. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106: 9362–9367

44. Spencer CC, Su Z, Donnelly P, Marchini J (2009) Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet 5: e1000477

45. Shete S, Hosking FJ, Robertson LB, Dobbins SE, Sanson M, et al. (2009) Genome-wide association study identifies five susceptibility loci for glioma. Nat Genet 41: 899–904

46. Peto R, Darby S, Deo H, Silcocks P, Whitley E, et al. (2000) Smoking, smoking cessation, and lung cancer in the UK since 1950: combination of national statistics with two case-control studies. BMJ 321: 323–329

47. Zhu W, Zhang H (2009) Why Do We Test Multiple Traits in Genetic Association Studies? J Korean Stat Soc 38: 1–10

48. Lindberg A, Bjerg A, Ronmark E, Larsson LG, Lundback B (2006) Prevalence and underdiagnosis of COPD by disease severity and the attributable fraction of smoking: Report from the Obstructive Lung Disease in Northern Sweden Studies. Respir Med 100: 264–272

49. Mannino DM, Watt G, Hole D, Gillis C, Hart C, et al. (2006) The natural history of chronic obstructive pulmonary disease. Eur Respir J 27: 627–643

50. Stav D, Raz M (2007) Prevalence of chronic obstructive pulmonary disease among smokers aged 45 and up in Israel. Isr Med Assoc J 9: 800–802

51. Zaas D, Wise R, Wiener C (2004) Airway obstruction is common but unsuspected in patients admitted to a general medicine service. Chest 125: 106–111

52. Young RP, Hopkins RJ, Hay BA, Gamble GD (2011) GSTM1 null genotype in COPD and lung cancer: evidence of a modifier or confounding effect? The Application of Clinical Genetics 4: 137–144

53. Barr RG, Herbstman J, Speizer FE, Camargo CA Jr (2002) Validation of self-reported chronic obstructive pulmonary disease in a cohort study of nurses. Am J Epidemiol 155: 965–971

54. Straus SE, McAlister FA, Sackett DL, Deeks JJ (2002) Accuracy of history, wheezing, and forced expiratory time in the diagnosis of chronic obstructive pulmonary disease. J Gen Intern Med 17: 684–688

55. Eisner MD, Trupin L, Katz PP, Yelin EH, Earnest G, et al. (2005) Development and validation of a survey-based COPD severity score. Chest 127: 1890–1897

56. Vanderweele TJ, Vansteelandt S (2010) Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol 172: 1339–1348

57. Hayes AF (2012) An analytical primer and computational tool for observed variable moderation, mediation, and conditional process modeling. Manuscript submitted for publication.

58. Imai K, Keele L, Tingley D (2010) A general approach to causal mediation analysis. Psychol Methods 15: 309–334

59. Hayes AF (2009) Beyond Baron and Kenny: statistical mediation analysis in the new millennium. Commun Monogr 76: 408–420

60. Alwin DF, Hauser RM (1975) Decomposition of Effects in Path Analysis. Am Sociol Rev 40: 37–47

61. Fairchild AJ, Mackinnon DP, Taborga MP, Taylor AB (2009) R^{2} effect-size measures for mediation analysis. Behav Res Methods 41: 486–498

