Conceived and designed the experiments: NLS RCC THSA DSC XC SC IG SH YH KKV XK MTL JZM SES SHS VLS YW NB PB ACH MH NRH DJH MKJ NGM GWM TJP LP MLP JPR MRS JCW RBW NEC MAE TE SMG JG RSH JK KSK PK MFL MDL PAFM MMN MR DR AS CIA LJB. Performed the experiments: NLS RCC THSA XK SES SHS VLS YW PB JC NRH JCW SHW MAE TE RSH JK PK SP CIA LJB. Analyzed the data: NLS RCC THSA DSC XC SC SH YH KKV XK JZM SES SHS LS YW ASW SHA PB NC NRH TN RS MRS JS WW BZY MAE TE RSH SP CIA LJB. Wrote the paper: NLS RCC THSA DSC IG SH KKV JZM SES SHS VLS LS NB MH NRH NGM GWM TN TJP LP MLP JPR RS MRS JCW RBW BZY MAE JG JK PK MDL PAFM DR AS CIA LJB. Contributed analysis tools: NLS RCC THSA YH CIA. Performed meta-analysis: NLS RCC THSA LS. Wrote first draft of paper: NLS RCC THSA LS LJB.
Recently, genetic association findings for nicotine dependence, smoking behavior, and smoking-related diseases converged to implicate the chromosome 15q25.1 region, which includes the CHRNA5-CHRNA3-CHRNB4 cholinergic nicotinic receptor subunit genes. In particular, association with the nonsynonymous CHRNA5 SNP rs16969968 and correlates has been replicated in several independent studies. Extensive genotyping of this region has suggested additional statistically distinct signals for nicotine dependence, tagged by rs578776 and rs588765. One goal of the Consortium for the Genetic Analysis of Smoking Phenotypes (CGASP) is to elucidate the associations among these markers and dichotomous smoking quantity (heavy versus light smoking), lung cancer, and chronic obstructive pulmonary disease (COPD). We performed a meta-analysis across 34 datasets of European-ancestry subjects, including 38,617 smokers who were assessed for cigarettes-per-day, 7,700 lung cancer cases and 5,914 lung-cancer-free controls (all smokers), and 2,614 COPD cases and 3,568 COPD-free controls (all smokers). We demonstrate statistically independent associations of rs16969968 and rs588765 with smoking (mutually adjusted p-values<10−35 and <10−8 respectively). Because the risk alleles at these loci are negatively correlated, their association with smoking is stronger in the joint model than when each SNP is analyzed alone. Rs578776 also demonstrates association with smoking after adjustment for rs16969968 (p<10−6). In models adjusting for cigarettes-per-day, we confirm the association between rs16969968 and lung cancer (p<10−20) and observe a nominally significant association with COPD (p=0.01); the other loci are not significantly associated with either lung cancer or COPD after adjusting for rs16969968. This study provides strong evidence that multiple statistically distinct loci in this region affect smoking behavior. 
This study is also the first report of association between rs588765 (and correlates) and smoking that achieves genome-wide significance; these SNPs have previously been associated with mRNA levels of CHRNA5 in brain and lung tissue.
Nicotine binds to cholinergic nicotinic receptors, which are composed of a variety of subunits. Genetic studies for smoking behavior and smoking-related diseases have implicated a genomic region that encodes the alpha5, alpha3, and beta4 subunits. We examined genetic data across this region for over 38,000 smokers, a subset of which had been assessed for lung cancer or chronic obstructive pulmonary disease. We demonstrate strong evidence that there are at least two statistically independent loci in this region that affect risk for heavy smoking. One of these loci represents a change in the protein structure of the alpha5 subunit. This work is also the first to report strong evidence of association between smoking and a group of genetic variants that are of biological interest because of their links to expression of the alpha5 cholinergic nicotinic receptor subunit gene. These advances in understanding the genetic influences on smoking behavior are important because of the profound public health burdens caused by smoking and nicotine addiction.
Smoking is associated with many different diseases. Lung cancer is the illness most identified with smoking, and its prevalence over time mirrors per capita tobacco consumption. There has been a reduction in smoking in the United States, and a concomitant decline in the incidence of lung cancer is beginning to emerge. Nonetheless, more people die from lung cancer each year than from any other cancer. Chronic obstructive pulmonary disease (COPD), another serious lung disease largely attributable to smoking, is also among the leading causes of death.
Recently, genetic findings for nicotine dependence and smoking-related diseases converged to implicate the chromosome 15q25.1 region, which includes the CHRNA5-CHRNA3-CHRNB4 cluster of cholinergic nicotinic receptor subunit genes. The nicotine dependence locus tagged by the single nucleotide polymorphism (SNP) rs16969968 and correlates has been replicated for smoking-related traits including cigarettes-per-day and heavy smoking, and has been reported as the most significant association genome-wide in very recent meta-analyses. This locus has also been associated with risk for lung cancer and COPD in several genome-wide association studies (GWAS). This represents an exciting overlap of genetic findings for nicotine dependence and smoking-related diseases. Though different SNPs may be reported by each study, the high correlation between the associated SNPs (r2>0.8 with rs16969968) implies that these statistical signals tag the same locus in European-ancestry populations. The SNP rs16969968 results in an amino acid change (D398N) in the alpha5 receptor subunit protein and has been shown to affect receptor function.
Extensive genotyping of the CHRNA5-CHRNA3-CHRNB4 region has provided potential evidence for at least two additional distinct signals for nicotine dependence. A second locus, tagged by rs578776, is associated with nicotine dependence and smoking in several samples of European ancestry, with the minor allele protective in the sense that it is elevated in controls; rs578776 has only low correlation with rs16969968 in European-ancestry populations (r2=0.24 in the HapMap CEU panel), though the linkage disequilibrium (LD) coefficient |D'| is 1. A third important locus in this region is a group of highly correlated SNPs, tagged by rs588765, which are associated with mRNA levels of CHRNA5 in brain tissue and lung tissue from European-ancestry subjects. When rs16969968 and rs588765 (or correlates) are studied together, three common haplotypes are observed, each with distinct effects on risk. There are hints that other, less common variants (minor allele frequency (MAF)≤5%) also contribute to nicotine dependence in this region, including a fourth locus represented by rs12914008, which has shown a relatively strong odds ratio of 0.73 in European-American subjects.
With the support of the National Institute on Drug Abuse (NIDA), we formed the Consortium for the Genetic Analysis of Smoking Phenotypes (CGASP), which includes smoking, lung cancer, and COPD researchers, to enable the pursuit of several research goals. For this first analysis project we focused on the chromosome 15q25.1 region containing CHRNA5-CHRNA3-CHRNB4. Specifically, we focused on the four distinct loci discussed above, which have low correlation with each other and have demonstrated evidence for involvement in nicotine dependence. Analyses were undertaken to investigate two questions: first, are there multiple statistically distinct genetic loci in this region that exert independent effects on smoking, and second, are similar patterns of genetic risk shared across smoking, lung cancer, and COPD.
This study was conducted according to the principles expressed in the Declaration of Helsinki and obtained informed consent from participants and approval from the appropriate institutional review boards.
All subjects included in these meta-analyses were current or former smokers of European ancestry. Results from 34 datasets, which include a total of 38,617 unrelated subjects who were assessed for cigarettes-per-day, contributed to the meta-analyses. Eight of the datasets were drawn from family-based studies and contributed only a subset of unrelated individuals to these analyses. Table 1 gives sample sizes and demographics of each participating study sample. Text S7 describes additional details for each dataset, including ascertainment criteria and genotyping methods, and documents that four datasets are also members of other consortia. All datasets contributed to the analyses of smoking. A subset of these 34 datasets also had information on lung cancer cases and lung-cancer-free smoker controls (6 datasets, N=13,614 smokers) and/or COPD cases and COPD-free smoker controls (4 datasets, N=6,182 smokers). The data for these traits are described in Table 2 and Table 3 respectively.
The traits examined were smoking quantity, lung cancer, and COPD. Two smoking traits were derived from measurements of cigarettes smoked per day (CPD): a 4-level categorical trait (CPD≤10, 10<CPD≤20, 20<CPD≤30, and CPD>30) and a dichotomous trait contrasting subjects from the lowest smoking category (CPD≤10: light-smoking “controls”) to those in the two highest categories combined (CPD>20: heavy smoking cases). The dichotomous trait of heavy versus light smoking was our primary trait for analysis. For one study (NAG-Finland), which used different boundaries to record CPD as detailed in the supplemental material, the distribution of CPD was examined to harmonize the phenotypes and select alternative boundaries. The numbers of subjects in each smoking category, total and by study, are given in Table 1. Lung cancer and COPD were analyzed as dichotomous traits. COPD cases were defined to have COPD as determined by post-bronchodilator spirometry as GOLD Stage II or worse (N=1,719), or self-reported COPD, emphysema or chronic bronchitis.
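The trait definitions above can be sketched as two small functions. This is an illustrative reading of the stated CPD boundaries, not the consortium's distributed SAS or R scripts; the function names are hypothetical, and the study-specific harmonization used for NAG-Finland is not modeled.

```python
def cpd_category(cpd):
    """Return the 4-level category (1-4) for cigarettes smoked per day."""
    if cpd <= 10:
        return 1
    if cpd <= 20:
        return 2
    if cpd <= 30:
        return 3
    return 4


def heavy_vs_light(cpd):
    """Return 1 for heavy smokers (CPD>20), 0 for light smokers (CPD<=10),
    and None for the middle category (10<CPD<=20), which is excluded
    from the dichotomous trait."""
    category = cpd_category(cpd)
    if category == 1:
        return 0
    if category >= 3:
        return 1
    return None
```

Under this reading, a subject smoking 15 cigarettes per day contributes to the categorical trait (category 2) but not to the primary heavy versus light comparison.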
In European-ancestry populations, each of the four loci of interest can be represented by various highly correlated SNPs (SNPs having high r2 with each other). For each locus, we chose one target SNP for analysis: rs16969968 (locus 1), rs578776 (locus 2), rs588765 (locus 3), and rs12914008 (locus 4); the pairwise correlations between any two of these loci are r2<0.5 (Table S1). In samples for which a given target SNP was not available, we chose a highly correlated proxy SNP based on r2 computed with Haploview using downloaded HapMap CEU genotype data, Release 23. Table S2 lists the proxy SNPs used and their r2 with the corresponding target SNPs. Figure S1 displays the SNPs for each of the 4 loci in relation to the CHRNA5-CHRNA3-CHRNB4 cluster.
To ensure uniform analyses, SAS (SAS Institute, Cary, NC) and R scripts for genetic association analyses were developed centrally and then distributed. The scripts were executed by each participating site, and the results returned to the coordinating group.
In each dataset, associations between the loci and the traits were evaluated using logistic regression. Our primary analysis model coded genotypes additively as the number of copies of the minor allele according to the HapMap CEU reference population. This allele is referred to as the “coded allele” (C) and the major allele is referred to as the “reference allele” (R). To confirm the appropriateness of the additive model, for each locus a 2 degree of freedom model including the additive term and a heterozygote deviation term was evaluated. The analyses of the 4-level CPD trait used generalized logistic regression to obtain separate effect estimates (beta coefficients) for each category with respect to the lowest smoking category as the referent. All these association analyses included sex and age as covariates. In addition, lung cancer and COPD analyses included categorical cigarettes-per-day as an unordered covariate.
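As a minimal sketch of the genotype coding described above, the following builds one row of predictors for the additive model and, optionally, the 2 degree of freedom model with a heterozygote deviation term. The function names, the two-character genotype strings, and the numeric coding of sex are assumptions made for illustration; the actual analysis scripts are not reproduced in this paper.

```python
def additive_code(genotype, coded_allele):
    """Count copies of the coded (minor) allele in a two-character genotype."""
    return sum(1 for allele in genotype if allele == coded_allele)


def design_row(genotype, coded_allele, sex, age, with_het_term=False):
    """Build one row of predictors: intercept, additive term, optional
    heterozygote deviation term, then the sex and age covariates."""
    dose = additive_code(genotype, coded_allele)
    row = [1.0, float(dose)]
    if with_het_term:
        # Indicator for heterozygotes; a nonzero coefficient signals
        # departure from the additive (per-allele) model.
        row.append(1.0 if dose == 1 else 0.0)
    row.extend([float(sex), float(age)])
    return row
```

For example, `design_row("AG", "A", 1, 50, with_het_term=True)` yields an additive term of 1 and a heterozygote indicator of 1, whereas both homozygotes have an indicator of 0.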
Association results from each dataset, including the beta coefficient and standard error, were provided to the coordinating team. Meta-analysis was carried out using PLINK to obtain overall summary odds ratios (ORs) and statistics. The R package rmeta was used to verify results and create plots. There was no evidence of significant heterogeneity across datasets for these analyses (minimum heterogeneity p=0.21 for dichotomous CPD, 0.07 for lung cancer, 0.24 for COPD; for categorical CPD a nominally significant p was seen only for category 3 and locus 1 (p=0.007)). Because of varying study designs, ascertainment strategies, and representative SNPs, we nevertheless report results from random effects meta-analyses.
As noted earlier, locus 1 (representing rs16969968) is a highly replicated association finding and furthermore rs16969968 has been shown to have functional effects on the resulting alpha5-containing receptor. Therefore an important question is whether the remaining loci demonstrate additional independent effects on disease risk. Although loci 2, 3 and 4 are not highly correlated with rs16969968, |D'| is high. A high |D'| can correspond to a low r2 if the alleles that tend to co-occur on the same haplotype have very different allele frequencies. Previous results in the COGEND data suggest that there may be independent or synergistic effects on nicotine dependence between locus 1 and locus 3, and haplotype analyses in the Utah and LHS samples, and in the COGEND and CPS-II-CPD samples, also indicate effects of haplotypes containing loci 1, 2 and 3.
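The pooling step can be illustrated with a short random-effects calculation from per-dataset beta coefficients and standard errors. The sketch below assumes the DerSimonian-Laird estimator of between-study variance, a common choice for random-effects meta-analysis; the paper does not state which estimator was applied, and the function name is hypothetical.

```python
import math


def random_effects_meta(betas, ses):
    """DerSimonian-Laird random-effects pooling of per-dataset log odds ratios."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": p,
        "q": q,
        "tau2": tau2,
    }
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the random-effects result coincides with the fixed-effect result, which is the expected behavior when there is little heterogeneity across datasets.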
To test whether additional loci contribute to dichotomous smoking quantity over and above the effect of rs16969968, we included both locus 1 and each of the other loci in the logistic regression models adjusting for sex and age, with and without a SNP×SNP interaction term. For lung cancer and COPD the models also included categorical cigarettes-per-day as an unordered covariate. These results were then meta-analyzed as described above. The SNP×SNP interaction term was never significant in the meta-analysis (p>0.3), so we report results from the joint models without interactions. To allow comparison between single-SNP and joint results on comparable data, for each locus pair we also repeated the univariate single-SNP meta-analyses on the subset of datasets that had genotypes available at both loci. For dichotomous smoking quantity we also tabulated pair-wise joint genotype by case status counts for locus 1 (rs16969968) versus each of the other three loci across the contributing datasets that had both loci.
Across the four target loci, multiple traits (4), the multiple models (additive and additive+heterozygote deviation), and the 2-SNP joint analyses (3 loci), our study was designed to perform fewer than 80 tests. A conservative Bonferroni correction would result in an uncorrected p-value threshold of 6.25×10−4 corresponding to an experiment-wide alpha of 0.05. The results tables report uncorrected p-values which we compared to this threshold to determine statistical significance.
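The threshold arithmetic is simple enough to check directly. The snippet below is only an illustration: it applies the stated Bonferroni threshold to the four single-SNP p-values for dichotomous CPD that are reported in the Results, and the variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the experiment-wide alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 80)  # 0.05 / 80 = 6.25e-4

# Single-SNP p-values for dichotomous CPD as reported in the Results.
reported = {"locus 1": 5.96e-31, "locus 2": 1.38e-25,
            "locus 3": 0.00027, "locus 4": 0.45}
significant = {name: p < threshold for name, p in reported.items()}
```

Because the design called for fewer than 80 tests, dividing by 80 is slightly more stringent than strictly necessary.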
We calculated allele frequencies within each sample to confirm that the coded allele (minor allele in HapMap CEU) was indeed the minor allele as expected in these European-ancestry subjects. Table S3 shows allele frequencies in each sample for the SNPs used. For each locus, frequencies are similar across studies and proxy SNPs, and similar to the frequencies in the HapMap CEU reference population.
All reported results are based on additive models. The additive model is appropriate because none of the tests for deviation from the additive assumption were significant. For each analysis, the tables and figures report the number of individuals successfully genotyped for the relevant SNP or SNPs.
Table 4 summarizes the meta-analysis results of dichotomous CPD (heavy/light smoking) in single-SNP analysis. Meta-analysis across all 34 samples clearly shows a highly significant association between dichotomous CPD and locus 1 (tagging rs16969968). Figure 1 displays a forest plot of the summary meta-analysis results for locus 1 (p=5.96×10−31, OR=1.33, 95% confidence interval (1.26–1.39)), and also the ORs in each contributing dataset.
The same analysis of locus 2 (tagging rs578776) yields a meta-analysis p-value of 1.38×10−25 and an OR of 0.78 (0.74–0.81), indicating a protective association for the minor allele as has previously been reported (Figure 2). Locus 3 (tagging rs588765) under the same model gives a p-value of 0.00027 and OR of 0.93 (0.89–0.97), which meets our threshold for multiple-test corrected significance but, unlike locus 1 and locus 2, does not surpass genome-wide significance (Figure 3). Locus 4 (tagging rs12914008) does not show a main effect on dichotomous CPD (p=0.45, OR=1.05 (0.93–1.17)). The forest plot for locus 4 is given in Figure S2.
The categorical CPD analysis, which includes all 4 CPD levels in a generalized logit model, allows us to evaluate genetic effects for each CPD category with respect to the lowest smoking class (CPD≤10). Table 5 shows the results.
For locus 1 (rs16969968), we see an ordinal effect with increasing CPD; that is, the odds ratio increases from 1.15 to 1.29 to 1.40 for categories 2, 3 and 4, with a corresponding decrease in p-value from 3.17×10−8 to 2.12×10−12 to 5.47×10−40. A similar ordinal effect is seen for locus 2 (rs578776), with the odds ratio decreasing from 0.88 to 0.79 to 0.77. For locus 3 (rs588765) we see an effect only with the highest smoking category (CPD>30). For locus 4 no effect is seen across smoking categories, consistent with the dichotomous CPD results.
To dissect the potential distinct effects of these loci on heavy versus light smoking, we carried out meta-analyses of joint SNP models that included sex, age, locus 1 and each of the other loci, coded additively.
In the joint analysis of locus 1 and locus 2, there is suggestive evidence of distinct effects, but the association at locus 2 is no longer genome-wide significant in the presence of locus 1. Both SNPs become less significant compared to their single locus models: in the joint model, locus 1 gives p=2.15×10−22, OR=1.27 (1.21–1.33) and locus 2 gives p=4.50×10−7, OR=0.87 (0.83–0.92). When each SNP is placed individually in the model and meta-analyzed across the 32 datasets that provided data for both loci, locus 1 gives p=1.41×10−32, OR=1.34 while locus 2 gives p=1.38×10−25, OR=0.76. The risk-increasing alleles at locus 1 (C) and locus 2 (R) are positively correlated, even though the minor alleles are negatively correlated.
In joint analysis of locus 1 and locus 3, locus 1 (rs16969968) yields a p-value of 3.52×10−36, OR=1.47 (1.38–1.56); locus 3 (rs588765) gives p=6.03×10−9, OR=1.17 (1.11–1.23). Thus locus 3 attains genome-wide significance (p<5×10−8) after adjusting for the effect of locus 1. Note that adjusting for locus 1 changes the direction of effect for locus 3 (OR>1) compared to the single-SNP results. In the 33 datasets that have both loci genotyped, we obtain p=5.39×10−29, OR=1.32 for locus 1 alone, and p=0.00027, OR=0.93 (0.89–0.97) for locus 3 alone. The evidence for association in the joint model is stronger than when each SNP is analyzed alone. In fact, when locus 1 is not taken into account, the effect of locus 3 is potentially masked, and the effect of the minor allele is in an opposite direction (protective versus risk).
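A rough numerical check shows how negatively correlated risk alleles can produce this pattern. The sketch rests on two loudly stated assumptions: that the HapMap CEU haplotype frequencies quoted in the Discussion approximate the study samples, and that the omitted-variable formula from linear regression can be borrowed as an approximation on the log-odds scale. It is not the analysis performed in the paper, and the variable names are hypothetical.

```python
import math

# Haplotype frequencies (locus 1 allele, locus 3 allele), collapsed from the
# HapMap CEU three-locus haplotypes quoted in the Discussion:
# A-G-C 0.425, G-G-T 0.333, G-A-C 0.207, G-A-T 0.035.
hap_freq = {("A", "C"): 0.425, ("G", "T"): 0.368, ("G", "C"): 0.207}

# Coded (minor) allele frequencies: A at locus 1, T at locus 3.
p1 = sum(f for (a1, _), f in hap_freq.items() if a1 == "A")
p3 = sum(f for (_, a3), f in hap_freq.items() if a3 == "T")
d = hap_freq.get(("A", "T"), 0.0) - p1 * p3  # LD coefficient D
r = d / math.sqrt(p1 * (1 - p1) * p3 * (1 - p3))

# Joint (mutually adjusted) log odds ratios from the meta-analysis.
b1_joint, b3_joint = math.log(1.47), math.log(1.17)

# Approximate single-SNP effects: a locus's own effect plus the other
# locus's effect scaled by the regression slope between allele counts.
b1_marginal = b1_joint + b3_joint * d / (p1 * (1 - p1))
b3_marginal = b3_joint + b1_joint * d / (p3 * (1 - p3))

or1_marginal = math.exp(b1_marginal)
or3_marginal = math.exp(b3_marginal)
```

Under these assumptions the approximate single-SNP odds ratios come out near 1.33 for locus 1 and near 0.90 for locus 3, close to the observed single-SNP values of 1.32 and 0.93, and the implied correlation between the coded alleles is about −0.66. The agreement is only approximate, but it illustrates how a risk-increasing allele at locus 3 can appear protective when locus 1 is left out of the model.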
To further examine these interesting results for locus 1 and locus 3, we show the number of heavy and light smokers in each joint genotype class, and corresponding odds ratios using the genotype that is homozygous for both reference (major) alleles as the reference group (Table 6). The reference alleles (major in HapMap CEU) are labeled “R” and the coded alleles (minor in HapMap CEU) are labeled “C”.
The first important observation is that there are very few subjects in certain cells, namely the cells corresponding to RC/CC at locus 1/locus 3, CC/RC, and CC at both loci. This table therefore reveals that the risk alleles at locus 1 (C) and locus 3 (C) are negatively correlated, and explains why the effect of rs588765 is seen only after adjusting for rs16969968. This pattern also reflects the high |D'| between the loci.
The second observation is that for the remaining, well populated cells, the coded allele at locus 3 increases risk on the background of a fixed genotype at locus 1 (e.g. row 1 of the table, corresponding to the stratum of RR homozygotes at locus 1). Similarly, for a fixed genotype at locus 3, the coded allele at locus 1 increases risk (e.g. column 1 of the table, corresponding to the stratum of RR homozygotes at locus 3). Thus for each locus, the effect seen in the joint, 2-SNP logistic regression is confirmed in the most informative stratum at the other locus.
For locus 1 and locus 4 in the joint model, locus 1 gives p=1.01×10−38, OR=1.35 (1.29–1.41) and locus 4 gives p=5.55×10−3, OR=1.17 (1.05–1.31). While the effect for locus 4 is stronger than was seen in single-SNP analysis, it does not meet our multiple test threshold for significance. In single-SNP analysis of the 25 datasets that have genotypes at both loci, locus 1 alone gives p=7.56×10−35, OR=1.33; locus 4 is non-significant (p=0.45, OR=1.05).
In Table 7 we report the single-SNP meta-analysis results for the six lung cancer datasets; recall that all subjects were smokers, and sex, age and categorical CPD were included as covariates. As with the CPD traits, locus 1 (rs16969968) shows highly significant evidence for association with lung cancer (p=1.99×10−21). The summary odds ratio of 1.31 (1.24–1.38) closely matches the dichotomous CPD odds ratio of 1.33 (1.26–1.39). Figure 4 shows the association results for locus 1 by dataset and the overall meta-analysis results.
Locus 2 (rs578776) also shows evidence of association with lung cancer in single-SNP analysis (p=9.74×10−10; OR=0.82 (0.77–0.87)) (Figure 5). Locus 3 results in a p-value of 0.0004 (OR=0.90 (0.86–0.96)) (Figure 6); as with dichotomous CPD, this meets our multiple-test-corrected threshold but is not genome-wide significant. Locus 4 shows no evidence for association with lung cancer; the forest plot is given in Figure S3.
Similar to our analyses of dichotomous CPD, we carried out joint analyses of locus 1 with each of the other 3 loci, with covariates for sex, age and dummy-coded CPD. After adjusting for the effect of locus 1, none of the other loci reached our multiple-test-corrected significance threshold.
For locus 1 and locus 2 jointly in the model, locus 1 gave p=2.68×10−13, OR=1.26 (1.19–1.34) and locus 2 gave p=0.012, OR=0.91 (0.85–0.98). In joint analysis of locus 1 and locus 3, locus 1 yields p=2.24×10−19, OR=1.39 (1.30–1.50) and locus 3 gives p=0.0050, OR=1.11 (1.03–1.19), showing the same change from protective to risk for the minor allele as was observed in the dichotomous CPD analysis. Finally, in the last pairing, locus 1 gives p=2.66×10−22 OR=1.33 (1.26–1.41) and locus 4 gives p=0.028, OR=1.26 (1.02–1.55).
Table 8 summarizes the meta-analysis results for the 3 datasets with the COPD trait; as with lung cancer, all subjects were smokers and sex, age, and categorical CPD were included as covariates. In these analyses, only locus 1 provides even suggestive evidence for association though it does not survive multiple test correction (uncorrected p=0.01). The locus 1 odds ratio is 1.12 (1.02–1.23), a point estimate lower than that for CPD (1.33) and lung cancer (1.31) (Figure 7).
The first goal of this meta-analysis project was to test whether distinct loci in the CHRNA5-CHRNA3-CHRNB4 gene cluster demonstrate independent effects on smoking behavior (heavy (CPD>20) versus light (CPD≤10) smoking). We selected loci for study based on prior statistical and/or functional evidence for involvement. The second goal was to test whether similar patterns of association are seen across these loci in the smoking-related diseases of lung cancer and COPD. This meta-analysis marks the first large-scale effort to line up association results for these related traits – smoking, lung cancer, and COPD – using a uniform analysis protocol. Our results contribute important new insights about genetic risk for these traits. In particular, we demonstrate strong evidence that smoking behavior is influenced by multiple distinct loci in this region, including two loci that are associated with relevant biological effects in functional studies.
First, our results show that locus 1, representing the CHRNA5 amino acid change rs16969968 and correlates, demonstrates highly significant association with smoking behavior (OR=1.33, p=5.96×10−31). Our strong evidence for the involvement of locus 1 with smoking across these samples marks the robustness of its genetic effect. The contributing datasets for the smoking analyses range from samples ascertained for nicotine dependence, lung cancer, or COPD, to adolescent samples, to populations ascertained for a variety of diseases including schizophrenia, alcohol or other substance dependence, breast cancer, type 2 diabetes, and heart disease. This meta-analysis represents a very diverse group, and yet the association between rs16969968 and smoking behavior is consistent.
The second, and novel, finding from this meta-analysis is the evidence for an additional, distinct locus in this region that is associated with heavy/light smoking and is genome-wide significant. We demonstrated that locus 3, representing rs588765 and correlates, attains p=6.03×10−9 (OR=1.17) when we adjust for locus 1 in a logistic regression model. It is notable that the association between locus 3 and CPD is not as apparent in the single-SNP analysis that does not control for locus 1 (e.g. meta-analysis p=0.0003, OR=0.93, which does not reach genome-wide significance). The negative correlation between the risk alleles at locus 1 and locus 3 (r=−0.64) masks the effect at the latter locus in single-SNP analysis, a phenomenon known as suppression. The association evidence for both SNPs is strengthened in the joint analysis, with a reversal of the direction of effect for locus 3. This evidence of statistically independent association for locus 3 with smoking in our analysis is compelling given that these SNPs have also been implicated in altered mRNA levels for CHRNA5 in brain and lung tissue from European-ancestry subjects. Thus, both statistical and functional evidence indicate that at least one SNP correlated with CHRNA5 mRNA levels is involved in risk, and highlight locus 3 as an important group of SNPs for further investigation.
A third observation from this study is that locus 2 (rs578776 and correlates) shows evidence for involvement in heavy/light smoking. Locus 2 is genome-wide significant in the single-SNP analysis of dichotomous CPD without adjustment for locus 1, with the minor allele elevated in controls (meta-analysis p=1.38×10−25, OR=0.78). However the association is much weaker (p=4.50×10−7, OR=0.87) in the joint logistic regression model that includes locus 1 and locus 2. One interpretation is that part of the single-SNP association at locus 2 is driven by the effect of locus 1 (perhaps related to the high |D'|). Nevertheless, there is evidence for residual signal at locus 2.
We tested a fourth locus representing rs12914008, a relatively uncommon (MAF ~5%) non-synonymous SNP in CHRNB4 that has previously shown suggestive evidence for association in European-Americans. In both the univariate analysis and the joint analysis with locus 1, locus 4 is not associated with smoking behavior after multiple test correction. Because of the low allele frequency of this variant, the power to detect an effect is lower than for the other three loci.
This meta-analysis therefore highlights locus 1, locus 2, and locus 3, and indicates dependencies in their effects on risk for heavy smoking. Haplotypes based on these three loci have been described and are seen in HapMap CEU, where the observed haplotype patterns for rs16969968 (locus 1), rs578776 (locus 2), and rs588765 (locus 3) are: A-G-C (frequency 0.425), G-G-T (0.333), G-A-C (0.207), G-A-T (0.035). Only four of the eight possible haplotypes are observed. This is consistent with the correlation structure between the loci. Locus 2 and locus 3 have low correlation with each other (e.g. r2=0.07 between rs578776 and rs588765 in HapMap CEU release 23); however their correlation sharply increases when locus 1 is taken into account (e.g. in GG homozygotes at rs16969968, r2=0.74 in HapMap CEU).
Our association results together with the correlation patterns of these three loci suggest that future haplotype or diplotype analyses across large datasets could clarify the relative contributions of these loci. Our evidence that multiple distinct genetic loci affect smoking quantity is consistent with previous reports of risk and protective haplotypes for nicotine dependence in the Utah and LHS samples, and in the COGEND and CPS-II-CPD samples. The Utah/LHS study haplotype included 5 SNPs: two that represent locus 1 (rs16969968 and rs1051730), two that represent locus 2 (rs569207 and rs578776), and one that represents locus 3 (rs680244). The COGEND and CPS-II-CPD haplotype analyses included up to 3 loci, one each for locus 1, 2 and 3. Across all these published studies, the high-risk haplotype carries the risk allele at rs16969968 (locus 1); because of the high |D'| between loci, only one haplotype carries that allele. Among the remaining haplotypes, a low risk haplotype is obtained when the minor allele at locus 2 or the major allele at locus 3, or both, is paired with the non-risk allele at rs16969968.
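The correlation structure can be recomputed from the four haplotype frequencies listed above. The following is a sketch using the standard definitions of D, |D'| and r2 for two biallelic loci; the function and variable names are hypothetical, and it works from haplotype frequencies rather than from the HapMap genotype data, so small differences from genotype-based values are to be expected.

```python
import math

# Three-locus haplotypes (rs16969968, rs578776, rs588765) and their
# HapMap CEU frequencies as quoted in the text.
HAPLOTYPES = {
    ("A", "G", "C"): 0.425,
    ("G", "G", "T"): 0.333,
    ("G", "A", "C"): 0.207,
    ("G", "A", "T"): 0.035,
}


def pairwise_ld(haps, i, j, allele_i, allele_j):
    """Return (r2, |D'|) between allele_i at position i and allele_j at position j."""
    total = sum(haps.values())
    pi = sum(f for h, f in haps.items() if h[i] == allele_i) / total
    pj = sum(f for h, f in haps.items() if h[j] == allele_j) / total
    pij = sum(f for h, f in haps.items()
              if h[i] == allele_i and h[j] == allele_j) / total
    d = pij - pi * pj
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d < 0:
        d_max = min(pi * pj, (1 - pi) * (1 - pj))
    else:
        d_max = min(pi * (1 - pj), (1 - pi) * pj)
    r2 = d * d / (pi * (1 - pi) * pj * (1 - pj))
    return r2, abs(d) / d_max


# Minor alleles: A at rs16969968, A at rs578776, T at rs588765.
r2_12, dprime_12 = pairwise_ld(HAPLOTYPES, 0, 1, "A", "A")
r2_23, dprime_23 = pairwise_ld(HAPLOTYPES, 1, 2, "A", "T")

# Restrict to haplotypes carrying G at rs16969968 (the non-risk background).
g_background = {h: f for h, f in HAPLOTYPES.items() if h[0] == "G"}
r2_23_given_g, _ = pairwise_ld(g_background, 1, 2, "A", "T")
```

With these frequencies, locus 1 and locus 2 show r2 of about 0.24 with |D'| of 1, and locus 2 and locus 3 show r2 of about 0.07, matching the values quoted in the text. On the G background at rs16969968 the haplotype-based r2 between locus 2 and locus 3 rises to roughly 0.77, in line with, though not identical to, the 0.74 reported for GG homozygotes.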
Taken together, our meta-analysis results argue strongly for the existence of at least two statistically distinct loci in this region that affect risk for heavy smoking. In particular, both locus 1 and locus 3, which have known functional effects, are genome-wide significant in joint, mutually-adjusted analysis. The minor allele at locus 3 shifts from a marginally significant protective factor when considered alone to a robust risk factor when considered in combination with locus 1. The statistical evidence and negatively correlated alleles at locus 1 and locus 3 are consistent with at least two mechanistic models: distinct effects of two loci where the minor allele at each locus increases risk across a constant background at the other locus, or a haplotype dose effect where alleles at the two loci act in concert on the same haplotype strand. In the latter model, the minor-major and major-minor haplotypes each increase risk relative to the major-major haplotype, as can be seen in Table 6 once it is recognized that the rarity of the minor-minor haplotype implies that the double-heterozygote cell essentially represents the minor-major and major-minor diplotype. It is also possible that multiple rare variants underlie these findings, as has been suggested in general for disease associations with common SNPs. It remains possible that these associations with locus 1, locus 2 and locus 3 are reflecting correlation with yet another underlying, untyped variant that alone explains the altered biology leading to risk. However, biological involvement of multiple loci appears more likely given that two of these loci represent two distinct, relevant functional consequences: namely, locus 1 (the amino acid change at rs16969968) is associated with altered receptor response to a nicotine agonist in vitro, and locus 3 (rs588765 and correlates) is associated with altered mRNA levels of CHRNA5 in brain and lung tissue.
Further investigation via resequencing, biological/functional assays, and animal models is needed to dissect the causal biology that underlies the statistical evidence.
An important open question is the degree to which the associations between chr15q25 variants and lung cancer are due to their effects on smoking. When comparing smoking and lung cancer single-SNP results, the patterns of association (odds ratios and directions of effect) were similar across the loci studied. Locus 1 is associated with lung cancer even when controlling for amount smoked per day (p=1.99×10−21, OR=1.31). This result suggests possible direct genetic effects of locus 1 on this cancer, at least in the presence of smoking. However, CPD is not a sufficient proxy for carcinogen exposure, and in never-smokers there is a lack of association between locus 1 and lung cancer, so it is possible that more refined adjustment for smoking will reduce or abolish this association.
For lung cancer, after controlling for categorical CPD and effects of locus 1, we were not able to definitively demonstrate association at either locus 2 or locus 3 after correction for multiple tests. For the mutually adjusted analysis of locus 1 and locus 3 for lung cancer, we observed the same change in the direction for the locus 3 odds ratio that we observed in the joint-SNP analysis of smoking. However, unlike what was seen for smoking, for lung cancer the magnitude (and significance) of the effects did not increase. There are several possible reasons for this, including: chance, the smaller sample size for lung cancer, or qualitative differences in the relationship between these loci and smoking behavior versus the relationship between these loci and lung cancer (after adjusting for smoking quantity). This highlights the challenges posed when attempting to dissect the contributions of multiple loci of modest effect on complex, correlated traits. Further studies, and larger sample sizes, are needed.
For COPD, when controlling for cigarettes-per-day we did not find evidence for association with any of the loci after correction for multiple tests. For locus 1, the odds ratio of 1.12 (1.01–1.23) is lower than for smoking and lung cancer. The COPD analyses were based on smaller samples than those available for CPD or for lung cancer.
Very recently, three other large smoking genetics consortia published their meta-analysis findings that confirm locus 1 (representing not only rs16969968 but also rs1051730 and other SNPs) as the locus most associated with smoking quantity, genome-wide. All three studies used linear regression to test for association with either quantitative CPD value or categorical CPD (1–10, 11–20, 21–30, and 31+). Those consortia also report results from conditional analyses in which a locus 1 SNP was included as a covariate, paralleling our joint analyses.
In contrast to our novel finding in CGASP of genome-wide significance for locus 3 when analyzed jointly with locus 1, none of the other consortia report strong evidence for locus 3 when paired with locus 1. In the Oxford-GSK study, imputation using 1000 Genomes data detected the most significant single-SNP association for CPD at the locus 1 SNP rs55853698 (r²>0.96 with rs16969968). After conditioning on rs55853698, the strongest residual signal was detected at a locus 2 SNP, rs6495308 (p=3.96×10⁻⁵; r²=0.825 with rs578776 in HapMap CEU); they do not report the association result for rs588765 in the conditioned analysis, although it must have been less significant than 3.96×10⁻⁵. In their single-SNP analysis, rs6495308 (locus 2) gave a p-value of 2.2×10⁻¹⁰. Their results for locus 2 are therefore consistent with our observation that in joint analysis of locus 1 and locus 2, the significance at locus 2 is reduced compared to the single-SNP analysis. They do not report on whether the evidence for locus 1 and locus 3 strengthens in the joint analysis compared to single-SNP analysis, as we observed in the CGASP datasets. They do note that there is no obvious residual association with a third SNP after conditioning on either the pairing of locus 1 (rs16969968) and locus 3 (rs588765), or the pairing of locus 1 (rs55853698) and locus 2 (rs6495308). That result is consistent with the correlation and haplotype structure of these three loci discussed previously.
In the ENGAGE study, conditioning on the locus 1 SNP rs1051730 identified residual evidence at rs2869046 (p=4.8×10⁻⁵) and rs2036534 (p=9.1×10⁻⁵), neither of which is genome-wide significant. Rs2036534 tags locus 2 (r²=0.74 with rs578776 in HapMap CEU) while rs2869046 is only weakly correlated with locus 3 (r²=0.46).
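For readers less familiar with the r² values quoted here, the following minimal sketch shows how pairwise linkage disequilibrium statistics can be computed from haplotype and allele frequencies. The function name and the frequencies are hypothetical and are not taken from HapMap or from any dataset in this study; the formulas are the standard definitions of D, D′ and r² for two biallelic SNPs.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying the minor allele at both SNPs
    p_a, p_b: minor-allele frequencies at SNP A and SNP B
    """
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical proxy SNP in positive LD with a target SNP.
d, d_prime, r2 = ld_stats(p_ab=0.20, p_a=0.25, p_b=0.28)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

An r² near 1 means one SNP is almost a perfect stand-in for the other, whereas a moderate r² such as this one means the proxy captures only part of the signal at the target SNP.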
In TAG, the conditional analyses indicated residual association at rs684513 (p=6.3×10⁻⁹), rs9788682 (p=1.06×10⁻⁸), and rs7163730 (p=1.22×10⁻⁸), which attain genome-wide significance. These SNPs are each correlated with locus 2, and much less correlated with locus 3 (r²=0.7, 0.55 and 0.56 respectively with rs578776 in HapMap CEU; r²<0.11 with rs588765). It is possible that differences in samples, phenotype definitions, or analysis methods may be contributing to the differences between our strong findings for locus 3 and the three other consortium reports. To further understand the genetic contributions in this region, more work is needed, and not only statistical evidence but also biological evidence will be important.
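The conditional p-values quoted above for the three consortia can be compared side by side. The short sketch below uses only the values reported in the text; the threshold of 5×10⁻⁸ is the conventional genome-wide significance level and is an assumption here, since the individual consortia may have applied their own criteria.

```python
# Conventional genome-wide threshold (an assumption, not taken from the text).
GENOME_WIDE_ALPHA = 5e-8

# Residual signals after conditioning on a locus 1 SNP, as quoted above.
conditional_p = {
    ("Oxford-GSK", "rs6495308"): 3.96e-5,
    ("ENGAGE", "rs2869046"): 4.8e-5,
    ("ENGAGE", "rs2036534"): 9.1e-5,
    ("TAG", "rs684513"): 6.3e-9,
    ("TAG", "rs9788682"): 1.06e-8,
    ("TAG", "rs7163730"): 1.22e-8,
}

significant = {key: p < GENOME_WIDE_ALPHA for key, p in conditional_p.items()}

for (study, snp), is_sig in significant.items():
    label = "genome-wide significant" if is_sig else "not genome-wide significant"
    print(f"{study:11s} {snp:10s} p={conditional_p[(study, snp)]:.2e}  {label}")
```

Under this threshold only the three TAG signals pass, in line with the description above.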
In summary, our meta-analysis demonstrates significant, robust association of locus 1, representing the non-synonymous CHRNA5 SNP rs16969968 as well as rs1051730 and rs55853698, with smoking heaviness across very diverse datasets. Our study also demonstrates strong evidence that at least one additional distinct locus in this region affects risk for heavy smoking. In particular, we have identified for the first time that locus 3 – representing the CHRNA5 expression-associated SNPs rs588765 and correlates – surpasses GWAS-level significance for association with heavy smoking in European-ancestry subjects; this effect is detectable after adjusting for the effect of rs16969968. This new result for locus 3 raises the corresponding SNPs (rs588765 and correlates) to the level of interest already accorded to the two loci which have previously been detected at GWAS-level significance in single-SNP analyses: locus 1 (rs16969968 and correlates) and locus 2 (rs578776 and correlates). Our result also has implications for all genetic association studies, as it illustrates that joint analysis of SNPs is an important tool for identifying genome-wide significant effects that, soberingly, may be obscured in single SNP analyses.
Our study used multiple highly correlated SNPs to represent each of the 4 tested loci, depending on availability in each dataset, and all subjects were of European ancestry. Hence this study is not designed to determine which SNP(s), among the highly correlated SNPs for each locus, are most likely to be biologically involved. Future work, involving large-scale meta-analysis of other populations (e.g. Asian or African ancestry) to capitalize on LD differences between populations, comprehensive functional annotation of genetic variants, DNA re-sequencing and variant discovery, and functional and animal studies may help narrow down these large sets of correlated SNPs to the most promising causal alleles.
The CHRNA5-CHRNA3-CHRNB4 region containing the target SNPs rs16969968 (locus 1), rs578776 (locus 2), rs588765 (locus 3), and rs12914008 (locus 4). The SNPs used in this study to represent each locus are drawn with dotted lines connecting them to each other.
(1.09 MB TIF)
Forest plot for dichotomous CPD and locus 4.
(0.38 MB TIF)
Forest plot for lung cancer and locus 4.
(0.30 MB TIF)
Correlation (r-squared) between the four target SNPs representing loci 1, 2, 3, and 4 (HapMap CEU Release 23).
(0.05 MB DOC)
Correlation (r-squared) between the target SNPs and their proxies (HapMap CEU Release 23).
(0.06 MB DOC)
Genotyped SNPs and overall allele frequencies, by sample dataset.
(0.10 MB DOC)
For facilitating this collaboration of the Consortium for the Genetic Analysis of Smoking Phenotypes, we thank Jonathan Pollock and the National Institute on Drug Abuse, which provided infrastructure support through conference calls and two meetings (June 2009 and February 2010). We thank Gary Swan and Marco Ramoni for their support. For this project we wish to acknowledge and thank the following people. For meta-analysis coordination at Washington University: Weimin Duan and Cindy Helms. For administrative support at Washington University: Tracey Richmond and Sherri Fisher. For the Washington University Collaborative Genetic Study of Nicotine Dependence (COGEND) study: Michael Brent, LiShiun Chen, Alison Goate, Sarah Hartz, Dorothy Hatsukami, Anthony Hinrichs, Eric Johnson, Heidi Kromrei, Tracey Richmond, Joe Henry Steinbach, Jerry Stitzel, Scott Saccone, Sharon Murphy; in memory of Theodore Reich, founding Principal Investigator of COGEND, we are indebted to his leadership in the establishment and nurturing of COGEND and acknowledge with great admiration his seminal scientific contributions to the field. For the University of Utah studies: Andrew von Niederhausern, Diane M. Dunn, Nori Matsunami, Nanda A. Singh, Lisa Baird, Hilary Coon, William M. McMahon, Mary Beth Scholand, Richard E. Kanner, Lorise C. Gahring, Scott W. Rogers, John R. Hoidal, Timothy B. Baker. For GlaxoSmithKline: Wayne Anderson, Meg Ehm and the ECLIPSE investigators. For the University of Colorado CADD study: Thomas Crowley, John K. Hewitt, Michael C. Stallings, Christian Hopfer, Kenneth Krauter, Robin P. Corley, Matthew B. McQueen; for the University of Colorado Add Health study: John K. Hewitt, Andrew Smolen, Kathleen M. Harris; for the University of Colorado NYSFS study: Scott Menard and David Huizinga. For the Finnish studies: Anu Loukola, Ulla Broms, Tellervo Korhonen, Kauko Heikkilä, Markus Perola, Samuli Ripatti, Veikko Salomaa, Arpo Aromaa, Antti Jula. 
For the Washington University Nicotine Addiction Genetics (NAG) and BigSib studies: Andrew Schrage, Rachel Qin Zhu. For the Yale study: Henry R. Kranzler, Lindsay A. Farrer, John Farrell, Roger D. Weiss, Kathleen T. Brady. For the UVA study: Tianhua Tim, Qing Xu. For the Harvard HPFS and NHS studies: Susan Hankinson, Eric Rimm, Frank Hu, Gary Curhan.
This analysis uses data from Add Health, a program project designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris, and funded by a grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. No direct support was received from grant P01-HD31921 for this analysis. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Persons interested in obtaining data files from Add Health should contact Add Health, Carolina Population Center, 123 W. Franklin Street, Chapel Hill, NC 27516-2524 (addhealth@unc.edu).
NLS is the spouse of S.F. Saccone, who is listed as an inventor on a patent, “Markers of Addiction”, covering the use of certain SNPs in diagnosing, prognosing, and treating addiction. LJB, JCW and JPR are listed as inventors on a patent, “Markers of Addiction,” covering the use of certain SNPs in diagnosing, prognosing, and treating addiction. LJB has served as a consultant to Pfizer in 2008. XK is a full time employee of GlaxoSmithKline. SP was a full time employee of GlaxoSmithKline. Current affiliation is with Hoffman-La Roche. JK has served as a consultant to Pfizer in 2008. MDL has served as a consultant to NIH, deCODE genetics, University of Pennsylvania, Reckitt Benckiser Pharmaceuticals, Pennsylvania Department of Health, and Informational Managements Consulting. Dr. Li also serves as a scientific advisor to ADial Pharmaceuticals. TJP receives compensation from the University of Mississippi Medical Center; part of his salary has been supported by grants from NIDA, NCI, the University of Mississippi Health Care Cancer Institute, the Mississippi State Department of Health, Pfizer Inc., and GlaxoSmithKline.
Meta-analysis was supported by the National Institute on Drug Abuse (NIDA; R01 DA026911, R03 DA023166), National Institute of General Medical Sciences (NIGMS; K25 GM69590), and the American Cancer Society (ACS; IRG-58-010-50). The Washington University COGEND contribution was supported by the National Cancer Institute (NCI; P01 CA089392), The National Human Genome Research Institute (NHGRI; U01 HG04422-01), and NIDA (K02 DA021237). COGEND genotyping at Perlegen Sciences was performed under NIDA Contract HHSN271200477471C; phenotypic and genotypic data are stored in the NIDA Center for Genetic Studies (NCGS) at http://zork.wustl.edu/ under NIDA Contract HHSN271200477451C (PIs J Tischfield and J Rice); genotyping services were also provided by the Center for Inherited Disease Research (CIDR), which is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number HHSN268200782096. Utah was supported by NIDA/NHLBI (P01-HL72903). GlaxoSmithKline funded data with (SCO104960, NCT00292552, and RES11080). MD Anderson work was supported by NCI (CA55769, CA121197, CA133996, CA016672, and CA127219) and by the National Institute of Environmental Health (NIEH; P30ES007784). University of Colorado was supported by funds from NIDA (R21 DA026901, P60 DA011015, R01 DA012845, R01 DA021913), NIAAA (K01 AA015336, R01 AA017889, T32 AA007464, R01 AA011949), and NICHD (P01 HD31921). Wayne State University was funded by NCI (R01 CA60691, R01 CA87895, N01 PC35145, P30 CA22453). Finland studies were funded by NIDA (R01 DA12854), the Center of Excellence in Complex Disease Genetics of the Academy of Finland (21356 and 129680), The European Community's Seventh Framework Programme/ENGAGE Consortium (HEALTH-F4-2007-201413), Nordic Center of Excellence in Disease Genetics, and Wellcome Trust.
The NAG-BigSib contribution was funded by NIDA (R01 DA12854, R56 DA12854, K08 DA019951), NIAAA (R01 AA11998, R01 AA13320, R01 AA13321), and the Australian National Health and Medical Research Council. The VA twin study was supported by NIDA (K01DA019498, R21DA027070). Studies conducted at Yale were funded by NIDA (R01s DA12890, DA12849, K01DA024758) and NIAAA (AA11330). NIDA supported studies done at UVA (R01 DA-12844, R01 DA-13783). Harvard studies (HPFS, NHS) were funded by NIDDK (5P01DK070756), NCI (P01CA087969), NHGRI (5U01HG004399), and NHLBI (5R01HL035464); genotyping for HPFS_CHD and NHS_CHD was supported by Merck Research Laboratories. The University of Bonn and Central Institute of Mental Health Mannheim studies were supported by the German Federal Ministry of Education and Research (BMBF) within the context of the German National Genome Research Network (NGFN-2 and NGFN-plus) by grants to MR (01GS8152). EAGLE and PLCO were supported by the Intramural Research Program of NIH, NCI, Division of Cancer Epidemiology and Genetics. PLCO was also supported by individual contracts from the NCI to the University of Colorado Denver (NO1-CN-25514), Georgetown University (NO1-CN-25522), Pacific Health Research Institute (NO1-CN-25515), Henry Ford Health System (NO1-CN-25512), University of Minnesota (NO1-CN-25513), Washington University (NO1-CN-25516), University of Pittsburgh (NO1-CN-25511), University of Utah (NO1 CN-25524), Marshfield Clinic Research Foundation (NO1-CN-25518), University of Alabama at Birmingham (NO1 CN-75022), Westat, Inc. (NO1-CN-25476), University of California, Los Angeles (NO1-CN-25404). The datasets HPFS-T2D, NHS-T2D, NCI-EAGLE, NCI-PLCO, and the Study of Addiction: Genetics and Environment (SAGE), which overlaps with COGEND, are among the genome-wide association studies funded as part of the Gene Environment Association Studies (GENEVA) under the NIH Genes, Environment and Health Initiative (GEI).
Assistance with data cleaning for GENEVA studies, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning for GENEVA studies was also provided by the National Center for Biotechnology Information. Genotyping for GENEVA studies was performed at the Johns Hopkins University Center for Inherited Disease Research, with support from the NIH GEI (U01HG004438) and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C), and the Broad Institute of MIT and Harvard, with funding support from the NIH GEI (U01HG04424). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.