Recent meta-analyses of European ancestry subjects show strong evidence for association between smoking quantity and multiple genetic variants on chromosome 15q25. This meta-analysis extends the examination of association between distinct genes in the CHRNA5-CHRNA3-CHRNB4 region and smoking quantity to Asian and African American populations to confirm and refine specific reported associations.
Association results for a dichotomized cigarettes smoked per day (CPD) phenotype in 27 datasets (European ancestry (N=14,786), Asian (N=6,889), and African American (N=10,912) for a total of 32,587 smokers) were meta-analyzed by population and results were compared across all three populations.
We demonstrate association between smoking quantity and markers in the chromosome 15q25 region across all three populations, and narrow the region of association. Of the variants tested, only rs16969968 is associated with smoking (p < 0.01) in each of these three populations (OR=1.33, 95%C.I.=1.25–1.42, p=1.1×10−17 in meta-analysis across all population samples). Additional variants displayed a consistent signal in both European ancestry and Asian datasets, but not in African Americans.
The observed consistent association of rs16969968 with heavy smoking across multiple populations, combined with its known biological significance, suggests rs16969968 is most likely a functional variant that alters risk for heavy smoking. We interpret additional association results that differ across populations as providing evidence for additional functional variants, but we are unable to further localize the source of this association. Using the cross-population study paradigm provides valuable insights to narrow regions of interest and inform future biological experiments.
Recent genetic meta-analyses including tens of thousands of subjects of European ancestry show strong evidence of association between smoking quantity (cigarettes per day, CPD) and multiple genetic markers on chromosome 15q25 [Liu, et al. 2010; Saccone, et al. 2010; TAG 2010; Thorgeirsson, et al. 2010]. Those studies synthesized evidence across many independent datasets to highlight specific variants in the region of the CHRNA5-CHRNA3-CHRNB4 gene cluster associated with smoking behavior in European ancestry subjects. It is important to determine the biological mechanisms underlying these associations; however, the high linkage disequilibrium (LD) in this region among individuals of European ancestry makes it difficult to differentiate potentially causal variants from the many correlated variants. Because the genetic architecture of chromosome 15q25 varies across populations, comparing associations across diverse populations with differing genetic architecture can help refine the region of association and point to variants more likely to have functional relevance [Rotimi and Jorde 2010; Saccone, et al. 2008; Zaitlen, et al. 2010].
The most robust genetic finding on chromosome 15q25 in subjects of European ancestry is the region tagged by rs16969968, rs1051730, and other correlated variants. This finding has been replicated for smoking-related traits in multiple distinct datasets [Baker, et al. 2009; Berrettini, et al. 2008; Keskitalo, et al. 2009; Saccone, et al. 2009; Saccone, et al. 2007; Sherva, et al. 2008; Stevens, et al. 2008; Thorgeirsson, et al. 2008; Weiss, et al. 2008], and has now been reported as the most significant genome-wide association in recent meta-analyses of European ancestry subjects (e.g. rs16969968, p = 5.57×10−72, or rs1051730, p = 2.75×10−73) [Liu, et al. 2010; Saccone, et al. 2010; TAG 2010; Thorgeirsson, et al. 2010]. We will use the term “bin” to denote a group of correlated SNPs (r2 ≥0.7) that may constitute the same association signal in European ancestry samples [Carlson, et al. 2004]. Under this definition and using the 1000 Genomes Pilot 1 CEU as the European ancestry reference sample [Durbin, et al. 2010], the single bin tagged by rs16969968 and rs1051730 includes 52 known variants. This bin, which we will call bin A, groups together and unifies the most significant meta-analysis findings as well as individual dataset reports of SNPs associated with nicotine dependence, heavy smoking, lung cancer, and other smoking related diseases in European ancestry datasets.
There are additional markers of interest in this region that are not strongly correlated with bin A. Because of the clear association between smoking behavior and bin A, each of the large-scale meta-analyses of European ancestry samples carried out association tests conditional on bin A variants for other SNPs to determine whether additional genetic markers in 15q25 are associated after adjusting for effects of bin A [Liu, et al. 2010; Saccone, et al. 2010; TAG 2010; Thorgeirsson, et al. 2010]. After conditioning on bin A, the meta-analyses identified additional SNPs in this region associated with smoking behavior. These SNPs can be grouped into three distinct bins (B, C, D) (Table 1). Bin B, tagged by rs588765 and rs880395, is associated with genome-wide significance among heavy versus light smokers but only in analyses conditioning on bin A (p=1.2×10−9) [Saccone, et al. 2010]. Notably, bin B is also associated with mRNA levels of CHRNA5 in brain and lung [Falvella, et al. 2010; Smith, et al. 2010; Wang, et al. 2009]. Bin C, tagged by rs6495308 [Liu, et al. 2010], rs2036534 [Thorgeirsson, et al. 2010], rs7163730, rs9788682, rs684513 [TAG 2010], and rs578776 [Saccone, et al. 2010], is associated with heavy smoking after conditioning on bin A (p values from 9.1×10−5 to 6.3×10−9). In contrast to bin B, bin C is less significant in conditional analysis compared to single SNP analysis. Bin D is represented by rs2869046, which also displayed residual association after conditioning on bin A (p=4.8×10−5) [Thorgeirsson, et al. 2010]. Markers from these different bins (A, B, C, and D) are only modestly correlated with one another, with r2≤0.52 in the 1000 Genomes Pilot 1 CEU (N=180; Table 2).
Differences in the correlational structure of markers spanning the region 15q25 between populations result in distinct sub-bins of correlated markers among Asian and African American populations which provide an opportunity to refine the source of the previously reported signals. For example, bin A, consisting of 52 variants including rs16969968, separates into 20 sub-bins in Asians (based on 1000 Genomes Pilot 1 JPT/CHB) and 38 sub-bins in African Americans (based on combined information from the 1000 Genomes Pilot 1 YRI and HapMap 3 Release 2 ASW) [Altshuler, et al. 2010]. In particular, rs16969968 and rs1051730 are highly correlated in European ancestry (r2=1) and Asian populations (r2=1), but display only moderate correlation (r2=0.40) in the African American population. These differences in genetic architecture can be used to dissect the association signals.
The purpose of this meta-analysis is to determine if bins A, B, C, and D show consistent association with smoking behavior across populations and, if so, to leverage these differences in genetic correlation across populations to refine the genetic associations in this region previously reported in subjects of European ancestry. We expect a sub-bin showing consistent evidence across all three populations to be more likely to contain a variant altering a biological mechanism. We performed meta-analyses of results from a total of 27 datasets: nine European ancestry samples (used to evaluate consistency with previous results), seven Asian samples, and eleven African American samples. We tested for association between smoking phenotypes and the four distinct bins (A through D) across all three populations. This cross-population study therefore improves our understanding of genetic risk for smoking by highlighting potentially functional variants.
Results from 27 datasets, containing a total of 32,587 smokers with measures of cigarettes per day, contributed to the meta-analyses. Of these datasets, 9 consisted of European ancestry subjects (N=14,786), 7 consisted of Asians (N=6,889), and 11 consisted of African Americans (N=10,912). Twenty datasets were samples of unrelated individuals. The remaining seven datasets were family-based studies, for which the primary analyses involved an extraction of unrelated individuals. To be included in the analyses, each subject was required to have reported smoking cigarettes in his/her lifetime. Genotyping varied among studies from extensive coverage based on genome wide association genotyping to only a limited number of candidate SNPs genotyped in this 15q25 region. Text S1 provides additional details for each dataset, including recruitment, primary phenotypes, definitions for smokers and cigarettes per day, DNA source, genotyping platforms, and genotyping quality control. Table S1 shows the sample size and demographics for each participating dataset. Four of nine datasets of European ancestry were included in the previous report [Saccone, et al. 2010] (see Table S1 for the overlap, which involves only European-ancestry samples). The informed consent from participants and approval from the appropriate institutional review boards were obtained.
Smoking quantity was assessed with cigarettes smoked per day (CPD). The primary phenotype for analysis was a dichotomous trait contrasting light smoking controls (CPD≤10) to heavy smoking cases (CPD>20). In addition, a 4-level ordered trait (CPD≤10; 11≤CPD≤20; 21≤CPD≤30; CPD≥31, coded as 0, 1, 2, 3 respectively) was developed for confirmatory analysis. The only exception was one study (Women’s Health Initiative) that measured smoking amount with different threshold levels (CPD≤14, 15≤CPD≤24, 25≤CPD≤34, CPD≥35); in that study, CPD≤14 defined the light smoking controls, which were contrasted with CPD≥25 as heavy smoking cases.
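The phenotype definitions above can be sketched in a few lines of Python. This is an illustration only, not the study's analysis scripts: the function names are hypothetical, and the sketch assumes that smokers reporting 11 to 20 cigarettes per day are simply left out of the dichotomous contrast, since they fall in neither group.

```python
from typing import Optional


def dichotomize_cpd(cpd: float) -> Optional[int]:
    """Primary phenotype: 0 = light smoker (CPD <= 10), 1 = heavy smoker (CPD > 20).

    Returns None for intermediate smokers (assumed here to be excluded
    from the heavy-versus-light contrast).
    """
    if cpd <= 10:
        return 0
    if cpd > 20:
        return 1
    return None


def four_level_cpd(cpd: float) -> int:
    """Confirmatory phenotype: 0 (<=10), 1 (11-20), 2 (21-30), 3 (>=31)."""
    if cpd <= 10:
        return 0
    if cpd <= 20:
        return 1
    if cpd <= 30:
        return 2
    return 3
```

The thresholds for the Women’s Health Initiative sample would differ (CPD≤14 for light smoking), so a per-study threshold argument would be needed to cover that dataset.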
Multiple SNPs in 15q25 have been identified as associated with smoking behavior in studies of European ancestry subjects. We focused on the results highlighted in the most powerful studies, namely the large meta-analyses [Liu, et al. 2010; Saccone, et al. 2010; TAG 2010; Thorgeirsson, et al. 2010] (Table 1). Table 2 lists the 11 targeted SNPs and illustrates how linkage disequilibrium (LD, measured by r2) structure varies across different populations. We used SNAP [Johnson, et al. 2008] with 1000 Genomes Pilot 1 reference samples [Altshuler, et al. 2010; Durbin, et al. 2010] and HapMap3 ASW to obtain LD estimates for our three populations: CEU for European ancestry, JPT/CHB for Asians, and ASW/YRI for African Americans.
It is important to examine not just the 11 previously identified SNPs listed in Table 1, but all SNPs correlated with these 11 SNPs in Europeans. We used a two-step process to define distinct groups of correlated SNPs, which we call bins. First, we grouped previously identified SNPs by their correlation in the 1000 Genomes Pilot 1 CEU (i.e. European ancestry) reference sample, using r2≥0.7 as our threshold [Durbin, et al. 2010]. Under this strategy, the 11 previously identified SNPs listed in Table 1 are partitioned into four groups: Group A (rs16969968, rs1051730), Group B (rs588765, rs880395), Group C (rs6495308, rs2036534, rs7163730, rs9788682, rs684513, rs578776), and Group D (rs2869046) (Table 2). From these four groups, we established the bins by including all SNPs correlated (r2≥0.7) in the European ancestry reference sample with at least one of the SNPs defining the bin. The threshold of 0.7 was chosen to provide an inclusive collection of tested SNPs. Using SNAP to obtain correlated variants in a bin based on 1000 Genomes Pilot 1 CEU, we identified 52 SNPs in bin A, 111 SNPs in bin B, 82 SNPs in bin C, and 15 SNPs in bin D.
Next, we partitioned these SNPs within a bin into “sub-bins” based on r2≥0.8 in the Asian and African American populations. The higher threshold of 0.8 was used for sub-bins in the other populations to refine the focus of the analyses. In Asians, we identified 20 sub-bins for bin A, 39 sub-bins for bin B, 24 sub-bins for bin C, and 7 sub-bins for bin D. In African Americans, we identified 38 sub-bins for bin A, 37 sub-bins for bin B, 26 sub-bins for bin C, and 7 sub-bins for bin D.
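The partitioning step can be illustrated with a small Python sketch. The text does not state the exact grouping algorithm, so the sketch assumes single-linkage grouping (two SNPs share a sub-bin if they are connected by a chain of pairs with r2 at or above the threshold); the function name `partition_by_ld` is hypothetical. The only r2 values used are the two reported in this paper for rs16969968 and rs1051730.

```python
from itertools import combinations


def partition_by_ld(snps, r2, threshold):
    """Group SNPs into sets linked by pairwise r2 >= threshold (single linkage).

    `r2` maps frozenset({snp_a, snp_b}) to the pairwise r2; missing pairs
    are treated as uncorrelated (an assumption of this sketch).
    """
    parent = {s: s for s in snps}

    def find(s):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(snps, 2):
        if r2.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in snps:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


snps = ["rs16969968", "rs1051730"]
pair = frozenset(snps)

# r2 = 1 in the Asian reference sample: the two SNPs stay in one sub-bin.
asian_sub_bins = partition_by_ld(snps, {pair: 1.0}, threshold=0.8)

# r2 = 0.40 in African Americans: the two SNPs fall into separate sub-bins.
african_american_sub_bins = partition_by_ld(snps, {pair: 0.40}, threshold=0.8)
```

With the full set of 52 bin A variants and population-specific r2 estimates, the same routine would yield the sub-bin counts only if the grouping rule assumed here matches the one actually used.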
We evaluated the genetic associations between heavy smoking and each genotyped SNP in three populations. Standardized scripts were developed centrally by the coordinating site (Washington University) for analyses of all participating datasets at each individual research center. Results were returned to the coordinating site for quality checks and meta-analyses. Individual SNP analyses were performed using SAS (SAS Institute, Cary, NC).
In each dataset, association between heavy versus light smoking based on cigarettes per day and all SNPs was evaluated with logistic regression models as the primary analysis. Genotypes were coded additively as the number of non-reference alleles, where the reference allele was defined as the major allele in the European ancestry population in dbSNP [Sherry, et al. 2001]; consistency of allelic coding was confirmed by comparing allele labels and allele frequencies across all datasets within each population. Age as a continuous variable and gender were included as covariates. Secondary analyses of the 4-level cigarettes per day trait used linear regression models with the same covariates, assuming that the trait has a simple linear relationship with the predictors.
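The additive genotype coding can be illustrated as follows. This is a minimal Python sketch rather than the SAS scripts used in the study; the function name `additive_code` and the two-character genotype string format are assumptions made for illustration.

```python
def additive_code(genotype: str, reference_allele: str) -> int:
    """Count the non-reference alleles in a diploid genotype such as 'AG'.

    The result (0, 1, or 2) is the predictor that would enter the
    regression model alongside age and gender.
    """
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype of two allele characters")
    return sum(1 for allele in genotype if allele != reference_allele)
```

Under this coding, an odds ratio above 1 means that each additional copy of the non-reference allele increases the odds of heavy smoking.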
Analyses were stratified by ancestry: European, Asian, and African American. We evaluated the effect of each bin A SNP using single SNP association analyses. For bins B, C, and D, both single SNP association and conditional analyses controlling for bin A were performed. Analyses conditional on bin A (rs16969968) served as our primary analysis model for bins B, C, and D because they were targeted due to previously reported results of analyses conditional on bin A in European ancestry meta-analyses.
For each ancestry group, every dataset with at least one genotyped SNP in a given sub-bin contributed to the meta-analysis of that sub-bin. For each sub-bin, a SNP was selected as the target. In samples where the target SNP was missing, we used the results from the SNP with the highest correlation (r2) with the target SNP in the sub-bin, as defined by the 1000 Genomes Pilot 1 JPT/CHB for Asians and by the 1000 Genomes Pilot 1 YRI or HapMap3 ASW for African Americans.
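The proxy rule for a missing target SNP can be sketched as below. The function name `pick_snp` and the SNP labels in the usage note are hypothetical; the r2 values would come from the population-specific reference panel.

```python
def pick_snp(target, genotyped, r2_with_target):
    """Choose which SNP represents a sub-bin in one dataset.

    target         -- the sub-bin's target SNP
    genotyped      -- set of SNPs genotyped in this dataset
    r2_with_target -- dict mapping other sub-bin SNPs to their r2 with the target

    Returns the target if it was genotyped, otherwise the genotyped SNP with
    the highest r2 to the target, or None if the dataset has no SNP in the sub-bin.
    """
    if target in genotyped:
        return target
    candidates = [s for s in r2_with_target if s in genotyped]
    if not candidates:
        return None
    return max(candidates, key=lambda s: r2_with_target[s])
```

For example, `pick_snp("snp_t", {"snp_x", "snp_y"}, {"snp_x": 0.85, "snp_y": 0.95})` selects `"snp_y"`, and a dataset returning `None` would not contribute to that sub-bin's meta-analysis.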
We used PLINK to perform meta-analyses and generate overall summary odds ratios (OR), standard errors, and p values [Purcell, et al. 2007]. The R package rmeta was used to confirm results and generate meta-analysis plots. Meta-analysis results were based on fixed effect models, which assess the evidence for association within our collected samples; we are therefore not making a general inference about what might be observed in other samples.
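For readers unfamiliar with fixed effect pooling, the calculation can be sketched in Python. The sketch assumes the standard inverse-variance estimator applied to per-dataset log odds ratios and a two-sided normal (Wald) p value; it is not PLINK or rmeta code, and the function name `fixed_effect_meta` is hypothetical.

```python
import math


def fixed_effect_meta(odds_ratios, standard_errors):
    """Inverse-variance fixed effect pooling of per-dataset odds ratios.

    standard_errors are the standard errors of the log odds ratios.
    Returns (pooled OR, SE of the pooled log OR, two-sided p value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_log_or / pooled_se
    # Two-sided p value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_log_or), pooled_se, p_value
```

Because each dataset is weighted by the inverse of its variance, large samples dominate the pooled estimate, and the model attributes all between-dataset differences to sampling error.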
Our primary analysis was to determine if any intersecting sub-bins across Asians and African Americans would display evidence of consistent association when comparing heavy versus light smokers, where we defined a consistent association as having the same direction and p value < 0.01 in both populations. Our binning strategy resulted in 100 single sub-bin tests and 67 conditional association tests across the four bins: a total of 167 tests. Because the probability of any particular test resulting in a p-value < 0.01 in both non-European populations by chance would be 0.0001 (=0.01×0.01), results consistently associated in both populations would remain significant after Bonferroni correction (167×0.0001< 0.05).
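The arithmetic behind this threshold can be written out explicitly. The product step assumes that, under the null hypothesis H0, the tests in the Asian and African American (AA) samples are independent, which is reasonable because the samples do not overlap:

```latex
\begin{align*}
P\left(p_{\text{Asian}} < 0.01 \ \text{and}\ p_{\text{AA}} < 0.01 \mid H_0\right)
  &= 0.01 \times 0.01 = 10^{-4},\\
167 \times 10^{-4} &= 0.0167 < 0.05 .
\end{align*}
```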
Bin A (tagged by rs16969968 and rs1051730 in Europeans) includes 52 SNPs correlated (r2≥0.7) in the European ancestry reference sample. This bin separates into 20 sub-bins in Asian populations and 38 sub-bins in African American populations. We had adequate coverage to test 9 of these 20 sub-bins in Asian data and 27 of these 38 sub-bins in African American data.
We detected a strong association between the dichotomous phenotype of heavy smoking versus light smoking and bin A in European ancestry data (OR=1.31, 95% C.I.=1.22–1.40, p=1.3×10−14). The only sub-bin showing consistent association with heavy smoking across the other two populations is tagged by rs16969968 (Asian population: OR=1.64, 95%C.I.=1.15–2.32, p=5.8×10−3; African American population: OR=1.62, 95% C.I.=1.21–2.17, p=1.1×10−3). As noted in the Methods, because the probability of any particular test resulting in a p<0.01 in both populations by chance alone would be 0.0001, this result of consistent association in both populations remained significant after Bonferroni correction.
Figure 1 shows all SNPs in bin A, and the only consistently associated sub-bins (p<0.01 in both Asians and African Americans). Bin A variants span six genes in the European ancestry population, the sub-bin tagged by rs16969968 in the Asian population spans three genes, and the sub-bin tagged by rs16969968 in the African American population spans only one gene (CHRNA5). Figure 2 provides a forest plot summary of the stratified meta-analyses for the bin/sub-bin tagged by rs16969968, the only consistent association for bin A, in all three populations. Each plot lists ORs for each contributing sample. The overall cross-population meta-analysis across all datasets gave an OR of 1.33 (95%C.I.=1.25–1.42, p=1.1×10−17).
In European and Asian populations, rs16969968 and rs1051730 are highly correlated. However, due to the different LD structure in African Americans, rs16969968 and rs1051730 represent two different sub-bins (r2=0.40 in HapMap 3 Release 2 ASW). In our analysis of African Americans, there is stronger evidence of association between the dichotomous phenotype heavy smoking versus light smoking and the sub-bin tagged by rs16969968 (OR=1.62, 95% C.I.=1.21–2.17, p=1.1×10−3), compared to the sub-bin tagged by rs1051730 (OR=1.15, 95%C.I.=1.03–1.28, p=1.1×10−2). This stronger finding is seen despite the lower minor allele frequency (MAF) and much smaller available sample for rs16969968 (MAF=0.06, 667 cases/1140 controls) compared to that for rs1051730 (MAF=0.12, 1712 cases/5640 controls).
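As a consistency check on the two African American results, the standard error and an approximate p value can be recovered from each reported odds ratio and confidence interval. The Python sketch below assumes a symmetric Wald interval on the log odds scale; the function name `p_from_or_ci` is hypothetical, and rounding in the published values means the output will only approximate the reported p values.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate (SE, z, two-sided p) from an OR and its 95% confidence interval."""
    # The interval spans 2 * z_crit standard errors on the log odds scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


# Values reported above for the African American meta-analysis.
se_968, z_968, p_968 = p_from_or_ci(1.62, 1.21, 2.17)  # rs16969968
se_730, z_730, p_730 = p_from_or_ci(1.15, 1.03, 1.28)  # rs1051730
```

The recovered p values are on the order of 10−3 and 10−2 respectively, in line with the reported figures. The larger standard error for rs16969968 reflects its smaller sample and lower allele frequency, so its smaller p value is driven by the larger effect estimate.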
For bin A, no tested sub-bin other than the one tagged by rs16969968 shows consistent association across populations. The meta-analyzed genetic associations between all available constituent sub-bins and heavy smoking are shown in Table S2.
Bin B (tagged by rs588765 and rs880395 in Europeans) includes 111 SNPs correlated (r2≥0.7) in the European ancestry reference sample, which was partitioned into 39 sub-bins in Asian and 37 sub-bins in African American ancestry reference samples. We had adequate coverage to test 10 of these 39 sub-bins in Asian samples and 22 of these 37 sub-bins in African American samples. Consistent with the previous report [Saccone, et al. 2010], which used some of these same data (see Table S1 for the overlap, which involves only European-ancestry samples), we find that in European ancestry samples, bin B is associated (OR=1.27, 95%C.I.=1.16–1.38, p=8.7×10−8) with heavy smoking in analyses conditional on rs16969968; bin B is not associated in single SNP analyses (OR=1.0, 95%C.I.=0.94–1.07, p=0.99). In Asian samples, testing for SNP association conditioning on rs16969968 shows an association between heavy smoking and bin B, with the strongest result for the sub-bin tagged by rs514743 (OR=1.30, 95% C.I.=1.07–1.58, p=9.7×10−3), which is similar to the single SNP test (OR=1.28, 95% C.I.=1.05–1.56, p=0.014). In African American subjects, there is a trend of association for the same sub-bin in conditional association (OR=1.16, 95%C.I.=0.99–1.36, p=0.064) (Table 3), compared to the single SNP association (OR=1.05, 95%C.I.=0.96–1.15, p=0.24). Thus, we found evidence of association in the Asian samples consistent with the association observed in the samples of European ancestry, but only a trend toward association in the African American subjects. The meta-analyzed conditional and single SNP associations between these constituent sub-bins and heavy smoking are shown in Tables S3 and S6.
Bin C (tagged by rs6495308, rs2036534, rs7163730, rs9788682, rs684513, and rs578776 in Europeans) includes 82 SNPs correlated (r2≥0.7) in the European ancestry reference sample, which was partitioned into 24 sub-bins in Asian and 26 sub-bins in African American reference samples. We had adequate coverage to test 12 of these 24 sub-bins in Asian samples and 19 of these 26 sub-bins in African American samples. Consistent with the previous studies [Liu, et al. 2010; Saccone, et al. 2010; TAG 2010; Thorgeirsson, et al. 2010], in European ancestry samples there is an association between heavy smoking and bin C (OR=0.79, 95% C.I.=0.72–0.86, p= 2.5×10−7) in association tests conditioning on rs16969968, as well as an association in a single SNP analysis (OR=0.77, 95% C.I.=0.71–0.83, p= 4.0×10−11).
Neither the Asian nor African American populations provide strong evidence of association with heavy smoking in any tested sub-bin in bin C under conditional association tests (all p>0.01). In the Asian data, the strongest single SNP signal was the sub-bin tagged by rs6495308 (OR=0.83, 95%C.I.= 0.72–0.96, p=9.8×10−3). In the African American data, there was no evidence of consistent association in either single SNP or conditional analyses (p>0.01) for the sub-bin tagged by rs6495308 or any other sub-bin. The meta-analyzed conditional and single SNP associations between tested sub-bins and heavy smoking are shown in Tables S4 and S7.
Bin D (tagged by rs2869046 in Europeans) includes 15 SNPs correlated (r2≥0.7) in the European ancestry reference sample, which was partitioned into 7 sub-bins in Asians and 7 sub-bins in African Americans. We had adequate coverage to test 2 of these 7 sub-bins in Asian samples and 3 of the 7 sub-bins in African American samples. We found no evidence of association between bin D and heavy smoking in European ancestry data, or across populations in single SNP or conditional association analyses (p>0.1). The meta-analyzed genetic associations between available sub-bins and heavy smoking conditional and single SNP associations are shown in Tables S5 and S8. All bins were tested in secondary analyses using the 4 level phenotype measured by cigarettes per day and results were similar.
This collaborative genetic meta-analysis of smoking behavior is the first to show consistent association in the chromosome 15q25 region with heavy smoking, across samples representing three genetically distinct populations – European ancestry, Asian, and African American. Previous meta-analyses examined only European ancestry data to definitively identify associations between chromosome 15q25 and smoking behavior. Smaller individual studies of Asians and African Americans have previously examined this region for association with smoking and related phenotypes. Smoking quantity has been reported as associated with variants correlated with rs16969968 in subjects of Asian and African American descent [Amos, et al. 2010; Li, et al. 2005; Li, et al. 2010; Saccone, et al. 2009; Schwartz, et al. 2010; Shiraishi, et al. 2009; Wu, et al. 2009]. Our meta-analysis synthesizes reported findings of individual SNP associations and compares genetic associations across multi-population samples to take the correlations between genetic variants within each population into account. Our meta-analysis strengthens the evidence of association between the specific SNP rs16969968 in bin A and heavy smoking across these diverse populations.
The strongest association signal seen in this gene cluster in European ancestry populations is represented by a group of 52 correlated variants, including rs16969968, which we call bin A. Due to these high correlations, the ability to statistically refine the association between smoking and these SNPs is very limited when using only European ancestry subjects. However, the LD structure between these 52 variants breaks down into 20 sub-bins in Asians and 38 sub-bins in African Americans.
By requiring consistent genetic effects across the three populations, we can refine a genetic association to variants that are more likely to reflect potential functional variants. Two SNPs in bin A are the most frequently reported from previous meta-analyses of smoking behavior in European ancestry subjects: rs16969968 and rs1051730. They are highly correlated (r2=1) in European ancestry and Asian populations, but display only modest correlation in African Americans (r2=0.40) (HapMap 3 Release 2 ASW). We can leverage this difference in LD architecture to differentiate the association of heavy smoking with these two variants.
In our meta-analysis of African Americans, rs16969968 is more strongly associated with heavy smoking (OR=1.62, 95%C.I.=1.21–2.17, p=0.0011, N=1807) than rs1051730 (OR=1.15, 95%C.I.=1.03–1.28, p=0.011, N=7352). SNP rs16969968 is the most strongly associated polymorphism across all three populations and the only variant meeting the consistent association threshold in our study. In addition, SNP rs16969968 causes an amino acid change in the nicotinic receptor α5 subunit and alters function of its receptor [Bierut, et al. 2008]. The observed consistent associations across diverse populations, combined with the results of biological experiments on rs16969968, provide converging evidence that rs16969968, rather than rs1051730, is most likely one causative variant in this region driving the strongest association signal.
Prior meta-analyses in European ancestry populations have reported additional association signals distinct from bin A, and they cluster into three groups. Bin B, a group of 111 variants highly correlated in Europeans, includes the previously reported associated SNPs rs588765 and rs880395. The association with bin B previously reported in Europeans was seen only in association analyses conditioning on rs16969968. Bin B consists of 39 sub-bins in Asian subjects and 37 sub-bins in African American subjects. In conditional analyses, we found evidence of association between bin B and heavy smoking in the Asian data (OR=1.30, 95% C.I.=1.07–1.58, p=9.7×10−3), as well as reproducing the European ancestry finding (OR=1.27, 95%C.I.=1.16–1.38, p=8.7×10−8). In the African American data, there was a trend toward association in the same direction (OR=1.16, 95%C.I.=0.99–1.36, p=0.064).
Bin B variants, located upstream of the coding region of CHRNA5, are associated with variability in CHRNA5 mRNA levels in European ancestry samples [Falvella, et al. 2010; Smith, et al. 2010; Wang, et al. 2009]. Low levels of CHRNA5 mRNA expression are associated with lower risk for nicotine dependence. No data exist on CHRNA5 mRNA expression in other populations, and further work to examine expression data and smoking behavior in other populations is needed. Because the risk allele of rs16969968 occurs primarily on the low mRNA expression alleles represented by bin B, conditional SNP analysis controlling for bin A (rs16969968) is important to distinguish between these two distinct mechanisms [Saccone, et al. 2010; Wang, et al. 2009]. This is an important example to demonstrate how a genetic effect could be better detected and characterized when additional related variants are taken into account.
Bin C, a group of 82 variants correlated in Europeans, consisted of 24 sub-bins in Asians and 26 sub-bins in African Americans. Variants reported in previous meta-analyses of European ancestry (rs6495308 [Liu, et al. 2010], rs2036534 [Thorgeirsson, et al. 2010], rs7163730, rs9788682, rs684513 [TAG 2010], rs578776 [Saccone, et al. 2010]) were all examined. However, no tested SNP in bin C was consistently associated with heavy smoking with p < 0.01 in both Asians and African Americans. Similarly, we found no consistent associations with bin D, which contains 15 SNPs correlated in Europeans and consists of 7 sub-bins in Asians and 7 sub-bins in African Americans.
In undertaking this project, we faced numerous challenges. First, smoking behavior differs substantially across populations. Smoking quantity distributions differ across populations; smokers of European ancestry smoke more heavily than do Asians or African Americans. As a result, we decided to compare heavy (>20 cigarettes per day) versus light smoking (≤10 cigarettes per day) in our primary association analysis to more closely capture the contrast between nicotine dependent smokers and non-dependent smokers. We then confirmed the consistency of results using the full distribution of smoking quantity in subsequent analyses.
Second, genotyping coverage varied between studies, and several studies in our meta-analysis had only a few variants genotyped. As a result, not all sub-bins were tested and the sample size varied across the tested sub-bins. For example, SNP rs55853698, which was imputed and reported as highly associated with smoking quantity in a previous meta-analysis of European ancestry subjects [Liu, et al. 2010], lies in bin A, but no genotyping data were available for testing this SNP or the sub-bin it represents in Asian and African American populations. Use of imputed data has the potential to mitigate these problems. However, imputation was not possible for our low-coverage studies. Therefore, the concerns about untested SNPs and unequal numbers of subjects across the region would remain even with imputed data. We believe it is important to report our findings based on directly genotyped variants, and the interpretation of the consistent associations is not expected to change with imputation.
In addition, we were not able to perform thorough admixture tests in all datasets due to variable genotyping. Population stratification unaccounted for by our stratified analyses of self-identified ancestry (European, Asian, and African American) could be a confounder in our results. Although we are leveraging the admixture to separate the effects of different genetic variants, there may be differential admixture in the cases and controls among African Americans. The impact of varied genetic architecture within given, broadly defined populations as well as within and across populations represented by individual sites (e.g., Japanese, Chinese, and Korean) needs to be elucidated in future larger scale studies with sufficient representation of individuals from different population backgrounds and more comprehensive genotyping.
Lastly, an association seen in one population that is not consistent across all three may nonetheless represent a true biological signal. Lack of consistency for the association may simply reflect differences in power for our population samples. Issues that can affect power across our three diverse populations include sample size, minor allele frequency, and even population-specific effect size. The last factor could arise in a variety of ways, including differences in LD structure, background variation, and marker information content. Thus we suggest caution when interpreting the negative or non-consistent association results from this study. Though these results strengthen the evidence for rs16969968 as a likely causal variant, this region remains in need of further interrogation with additional genotyping and standardized imputation across all populations.
Despite the limitations of this study, this meta-analysis refines the association signals with heavy smoking across samples representing European ancestry, Asian, and African American populations. In particular, for bin A, we present evidence showing rs16969968 is a likely causal variant for heavy smoking among the common SNPs in the bin. Our evidence also suggests there are additional distinct genetic variants in the chromosome 15q25 region associated with smoking, but we are unable to clearly identify these other associations across all three populations. For example, we extend the finding of association with bin B in European ancestry samples to an association in Asians, and a trend towards association in African Americans.
This consistent pattern of cross-population association despite many unmeasured genetic and environmental differences has provided important evidence to support true causal variants. It also provides critical information by narrowing a region of interest so laboratory experiments that must follow association studies can focus on a smaller number of variants. Thus, this study represents an important step on the pathway from association to function.
Please see Attachment A: Acknowledgements and Funding.