Cigarette smoking is the leading preventable cause of death in the US. Although smoking behavior has a substantial genetic component, the specific genes and mechanisms underlying it are largely unknown. Here, we performed a genome-wide association study (GWAS) of smoking behavior in 840 Caucasians (417 males and 423 females), examining ~380,000 SNPs. We found that a cluster of nine SNPs upstream of the IL15 gene was associated with smoking status in males; the most significant SNP, rs4956302, reached genome-wide significance (p = 8.80 × 10^-8). Another SNP, rs17354547, which is highly conserved across multiple species, achieved a p value of 5.65 × 10^-5. These two SNPs, together with two others (rs1402812 and rs4956396), were selected from the nine for replication in an African-American sample of 1,251 subjects (412 males and 839 females). The SNP rs17354547 was replicated in the male subgroup of this sample, where it was associated with smoking quantity (SQ), the Heaviness of Smoking Index (HSI) and the Fagerström Test for Nicotine Dependence (FTND), with p values of 0.031, 0.0046 and 0.019, respectively. In addition, a haplotype formed by rs17354547, rs1402812 and rs4956396 was associated with SQ, HSI and FTND (p = 0.039, 0.0093 and 0.0093, respectively). To further confirm our findings, we performed an in silico replication study of the nine SNPs in a Framingham Heart Study sample of 7,623 Caucasians (3,491 males and 4,132 females) from 1,731 families. Again, a male-specific association with smoking status was observed: in males, seven of the nine SNPs achieved significant p values (p < 0.05) and two achieved marginally significant p values (p < 0.10).
Several of the nine SNPs, including rs17354547, the SNP that is highly conserved across species, are located at potential transcription factor binding sites, suggesting that they may act by regulating transcription. Through this function, the SNPs may modulate expression of IL15, a key cytokine regulating immune function. As the immune system has long been recognized to influence drug addiction behavior, our association findings suggest a novel mechanism for smoking addiction involving immune modulation via the IL15 pathway.
Cigarette smoking results in an annual death toll of 438,000 in the United States, where one in every five deaths is smoking-related (1). Smoking is strongly associated with the development of cardiovascular disease, respiratory disease and cancer (2). Most dramatically, men and women who smoke increase their risk of developing lung cancer 13-fold and 23-fold, respectively, compared to non-smokers (2).
Despite extensive smoking-control efforts, more than 20% (more than 45 million) of American adults continue to smoke (3). Smoking behavior has a substantial genetic component, with heritability estimated at more than 50% in twin studies (4). The specific genes underlying smoking behavior, however, remain largely unknown. To date, more than 20 genome-wide linkage studies have identified a number of loci potentially linked to smoking behavior, but few of these loci have been replicated across studies with high statistical significance (for review, see (5)). Genetic association studies have also implicated several genes in smoking behavior, such as GABAB2 (6), DOPA decarboxylase (7), the nicotinic acetylcholine receptor α4 subunit gene (CHRNA4) (8) and the α5/α3/β4 subunit gene cluster on chromosome 15 (9). However, each of these studies focused on genes with known significance in neurobiology; consequently, they were not designed to identify novel genes or regulatory mechanisms underlying smoking behavior. Moreover, most of the genes implicated in these association studies still await confirmation by independent studies.
A promising strategy for identifying genes underlying smoking behavior is the genome-wide association study (GWAS), which takes advantage of knowledge of linkage disequilibrium (LD) patterns in humans and of the rapid development of high-throughput SNP genotyping platforms. Because high SNP density helps localize causal DNA variants to a narrow genomic region, the GWAS approach has proven powerful for identifying novel genes associated with complex human diseases and traits (10-14).
Here we conducted one of the first GWAS to search for novel genetic factors underlying smoking behavior. Using the Affymetrix 500K array set, we genotyped and analyzed a total of 379,319 single nucleotide polymorphisms (SNPs) in 840 unrelated Caucasians (417 males and 423 females). We identified a cluster of nine SNPs upstream of the IL15 gene that were strongly associated with smoking status in males. One of these SNPs, rs17354547, is highly conserved across multiple species. This SNP and several others are located at potential transcription factor binding sites, suggesting functional importance, possibly through regulation of IL15 gene expression. Furthermore, rs17354547 was replicated in an African-American (AA) cohort, where it was associated with several nicotine dependence (ND) phenotypes, and all nine SNPs showed significant or marginally significant association with smoking status in males of a Framingham Heart Study (FHS) Caucasian sample analyzed in silico. Our findings suggest a novel mechanism for smoking behavior in which the IL15 pathway may play an important role.
The study was approved by the Institutional Review Boards of all involved institutions. Signed informed consent was obtained from all participants before they entered the study. For our GWAS, a random sample of 840 unrelated Caucasians was drawn from our established and expanding genetic repository, which currently contains more than 6,000 subjects. All of the chosen subjects were US Caucasians of European origin living in Omaha, Nebraska and its surrounding areas. They were healthy subjects recruited for genetic research on common complex traits, such as bone mineral density and body mass index. Detailed recruitment and exclusion criteria have been published elsewhere (15). Briefly, subjects with chronic diseases or conditions involving vital organs (heart, lung, liver, kidney and brain) or with severe endocrine, metabolic or nutritional diseases were excluded from this study. The relevant characteristics of the study subjects are listed in Table 1.
Smoking-related data were recorded for all subjects in a nurse-administered questionnaire, which also included a detailed medical history. Subjects were categorized as “smokers” based on their answer to the questionnaire item “Do/did you smoke cigarettes?”; subjects who had never smoked were defined as “non-smokers”. Non-smokers younger than 25 years were intentionally excluded. Because most smokers start smoking in adolescence, this exclusion was intended to ensure that our control subjects (non-smokers) were unlikely to develop smoking behavior if exposed to cigarettes in the future. For smokers, cigarette consumption (the number of cigarettes smoked per day) was also collected. For our analyses, cigarette consumption was transformed into an indexed smoking quantity (SQ) using the criterion from the Fagerström Test for ND (FTND) questionnaire: SQ = 1 if 10 or fewer cigarettes were smoked per day, SQ = 2 for 11-20, SQ = 3 for 21-30, and so on.
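The case/control rule and the SQ index described above can be sketched as two small functions. This is a minimal illustration, not the authors' code: the function names are hypothetical, and "and so on" is assumed to mean that SQ rises by 1 for each further band of 10 cigarettes per day.

```python
from typing import Optional


def classify_smoking(ever_smoked: bool, age: int) -> Optional[str]:
    """Return "smoker", "non-smoker", or None when the subject is excluded."""
    if ever_smoked:
        return "smoker"
    if age < 25:
        # Young never-smokers may still take up smoking, so they are
        # not used as controls.
        return None
    return "non-smoker"


def indexed_sq(cigs_per_day: int) -> int:
    """Indexed smoking quantity: 1 for <=10 cigarettes/day, 2 for 11-20,
    3 for 21-30; assumed to continue in steps of 10 beyond that."""
    if cigs_per_day < 1:
        raise ValueError("SQ is defined only for smokers (>= 1 cigarette/day)")
    # Integer ceiling of cigs_per_day / 10
    return (cigs_per_day + 9) // 10


# Usage with hypothetical subjects
print(classify_smoking(False, 22))  # None (excluded control)
print(indexed_sq(15))               # 2
```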
For replication of our GWAS findings, we used a family-based sample of 402 nuclear families comprising 1,251 subjects (412 males and 839 females) (16). All subjects are of African-American (AA) origin and were recruited primarily from the Mid-South states of Tennessee, Mississippi and Arkansas during 1999-2004. Proband smokers were required to be at least 21 years of age, to have smoked for at least the last five years, and to have consumed an average of 20 cigarettes per day for the last 12 months. Siblings and parents of a smoking proband were recruited whenever possible, regardless of their smoking status. Extensive data were collected on each participant, including demographics (e.g., sex, age, race, biological relationships, weight, height, years of education and marital status), medical history, smoking history and current smoking behavior, ND, and personality traits assessed by various questionnaires, available at the NIDA Genetics Consortium website (http://zork.wustl.edu/nida). All participants provided informed consent. The study protocol and forms/procedures were approved by all participating Institutional Review Boards.
In the present replication cohort, ND was assessed by the three measures most commonly used in the literature: SQ (as defined in the above section); the Heaviness of Smoking Index (HSI; 0-6 scale), which combines SQ with time to first cigarette (i.e., how soon after waking the subject smokes the first cigarette); and the FTND (0-10 scale) (17). The demographic and clinical characteristics of the sample are presented in Table 1.
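As an illustration of how the 0-6 HSI scale is built, the sketch below sums the two FTND items it comprises, using the conventional FTND point assignments for time to first cigarette and cigarettes per day. The function name is hypothetical, and the scoring is the standard published one rather than something stated in this paper. Note that in this conventional scoring the cigarettes-per-day item runs from 0 to 3.

```python
def hsi(cigs_per_day: int, minutes_to_first_cig: float) -> int:
    """Heaviness of Smoking Index (0-6), conventional FTND item scoring."""
    # Time to first cigarette after waking: 0-3 points
    if minutes_to_first_cig <= 5:
        ttf_points = 3
    elif minutes_to_first_cig <= 30:
        ttf_points = 2
    elif minutes_to_first_cig <= 60:
        ttf_points = 1
    else:
        ttf_points = 0
    # Cigarettes per day: 0-3 points
    if cigs_per_day <= 10:
        cpd_points = 0
    elif cigs_per_day <= 20:
        cpd_points = 1
    elif cigs_per_day <= 30:
        cpd_points = 2
    else:
        cpd_points = 3
    return ttf_points + cpd_points


# Usage: a hypothetical smoker of 25 cigarettes/day who lights up 10 min after waking
print(hsi(25, 10))  # 4
```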
To further replicate our GWAS findings in Caucasians, we used a sample from the FHS population comprising 7,623 Caucasians (3,491 males and 4,132 females) from 1,731 families. Phenotype and genotype data for the cohort were downloaded from Framingham SHARe (SNP Health Association Resource), accessed through NCBI dbGaP (http://view.ncbi.nlm.nih.gov/dbgap). Appropriate procedures were followed for use of the data, including approval from the UMKC IRB and signing of the Data Distribution Agreement by all UMKC investigators with access to the data.
Self-reported smoking status was available for all 7,623 subjects in Framingham SHARe. To determine smoking status, subjects were asked “Did you smoke cigarettes regularly in the last year?”; those answering “yes” were treated as smokers and those answering “no” as non-smokers. In total, there were 1,172 smokers (542 males and 630 females). For consistency with our GWAS sample, non-smokers younger than 25 were excluded from the analyses. The basic characteristics of the study subjects are presented in Table 1.
Genomic DNA was extracted from whole blood using a commercial isolation kit (Gentra Systems, Minneapolis, MN, USA) following the manufacturer's protocol. Genotyping with the Affymetrix Mapping 250K Nsp and Affymetrix Mapping 250K Sty arrays was performed at the Vanderbilt Microarray Shared Resource using the standard protocol recommended by Affymetrix. Genotype calls were determined from fluorescence intensities using both the DM (dynamic model-based) algorithm with a 0.33 P-value setting (18) and the BRLMM algorithm (19). DM calls were used for quality control, while BRLMM calls were used for all subsequent data analyses. BRLMM clustering was performed with 94 samples per cluster.
The final average BRLMM call rate across the entire sample was 99.14%. Of the initial 500,568 SNPs, we discarded 32,961 with sample call rates < 95%, another 36,965 whose genotype frequencies deviated from Hardy-Weinberg equilibrium (HWE; P < 0.001) and 51,323 with minor allele frequencies (MAF) < 1%. The final SNP set retained for subsequent analyses therefore contained 379,319 SNPs, an average marker spacing of ~7.9 kb across the human genome.
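The three quality-control filters, and the arithmetic behind the final SNP count and marker spacing, can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the `SnpSummary` record and `qc_filter` function are hypothetical, the filters are assumed to be applied in the order listed (so each SNP is counted under the first filter it fails), and the spacing calculation assumes a genome length of ~3.0 Gb.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium test p value
    maf: float        # minor allele frequency


def qc_filter(snps, min_call_rate=0.95, hwe_alpha=0.001, min_maf=0.01):
    """Apply call-rate, HWE and MAF filters in sequence.

    Returns the retained SNPs and the number dropped by each filter.
    """
    dropped = {"call_rate": 0, "hwe": 0, "maf": 0}
    kept = []
    for s in snps:
        if s.call_rate < min_call_rate:
            dropped["call_rate"] += 1
        elif s.hwe_p < hwe_alpha:
            dropped["hwe"] += 1
        elif s.maf < min_maf:
            dropped["maf"] += 1
        else:
            kept.append(s)
    return kept, dropped


# Arithmetic check on the counts reported above
initial = 500_568
removed = 32_961 + 36_965 + 51_323
final = initial - removed            # 379,319 SNPs
spacing_kb = 3.0e9 / final / 1e3     # ~7.9 kb, assuming a ~3.0 Gb genome
print(final, round(spacing_kb, 1))   # 379319 7.9
```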
Based on our GWAS findings and budget constraints, we selected four SNPs for replication analyses (rs4956302, rs17354547, rs4956396 and rs1402812). The justification for selecting these SNPs is detailed in the Results section.
DNA was extracted from peripheral blood samples of each participant using a kit from Qiagen Inc. (Valencia, CA). All SNPs were genotyped using the TaqMan SNP Genotyping Assay in a 384-well microplate format (Applied Biosystems, Foster City, CA). Briefly, 15 ng of DNA was amplified in a total volume of 7 μl containing an MGB probe and 2.5 μl of TaqMan Universal PCR Master Mix. Allelic discrimination analysis was performed on the ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA). To ensure genotyping quality, SNP-specific control samples were included on each 384-well plate.
The final call rates for the four genotyped SNPs, rs17354547, rs1402812, rs4956396 and rs4956302, were 99.35%, 99.30%, 99.50% and 99.30%, respectively. As a further check on genotyping quality, we tested each SNP for significant departure from HWE using the χ² test, with the significance level set at 0.0125 (= 0.05/4) to adjust for testing four SNPs. All genotyped SNPs were in HWE (p > 0.0125); for rs17354547, the p value was 0.258.
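A minimal sketch of the HWE χ² test with the Bonferroni-adjusted level described above, assuming a biallelic SNP with genotype counts for AA, AB and BB. The function name is hypothetical, and this is the simple Pearson 1-df test rather than an exact test; the upper tail of a 1-df χ² distribution is computed as erfc(√(x/2)). A monomorphic SNP (one allele absent) would give zero expected counts and is not handled.

```python
import math


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


ALPHA = 0.05 / 4  # Bonferroni-adjusted level for four SNPs (0.0125)

# Usage with hypothetical counts in perfect HWE
print(hwe_chisq(25, 50, 25))  # (0.0, 1.0)
```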
Using the FHS sample, we performed in silico replication of the nine SNPs of interest identified in our GWAS cohort (see the Results section for details on these nine SNPs). These SNPs were genotyped in the FHS sample with the Affymetrix 500K mapping array plus the Affymetrix 50K supplementary array. For details of the genotyping method, please refer to Framingham SHARe on the NCBI dbGaP website (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000007.v3.p2). For the nine SNPs of interest, the call rates were as follows: 98.98% for rs12505771, 99.10% for rs6838494, 98.58% for rs17354547, 99.14% for rs17354568, 98.84% for rs17007301, 98.85% for rs1402812, 98.41% for rs4956396, 98.60% for rs13133830, and 98.33% for rs4956302. The HWE test p values for these nine SNPs ranged from 0.41 to 0.88, consistent with HWE and good genotyping quality.
GWAS statistical analyses were performed with the SAS software package (SAS Institute Inc., Cary, NC). Genotypic association was tested with logistic regression for “smoking status” and with Poisson regression for “SQ”. In each model, genotype was treated as the independent variable and the phenotype (smoking status or SQ) as the dependent variable. To adjust for the effects of age and sex, these two covariates were included in a full model together with genotype, and its likelihood was compared with that of a restricted model containing only the covariates. The covariate-adjusted association was tested with a likelihood-ratio chi-square test, whose statistic is twice the difference in log-likelihood between the two models.
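The likelihood-ratio step can be illustrated without a regression library in the special case of no covariates. When genotype is coded as dummy variables and nothing else is in the model, the fitted logistic model reproduces each genotype group's observed smoker proportion, so the LR statistic equals the G statistic of the genotype-by-status table. This is a simplified sketch, not the authors' SAS model: it omits the age and sex covariates (with them, one would fit both models with a logistic regression routine and compare log-likelihoods the same way), the function name is hypothetical, and the 3-genotype, 2-df coding is an assumption about what "genotypic" means here.

```python
import math


def genotype_lrt(smokers, non_smokers):
    """Likelihood-ratio test of genotype on a binary trait, no covariates.

    smokers[g], non_smokers[g]: counts for genotype group g (e.g. AA, AB, BB).
    Returns (LR statistic, degrees of freedom, p value).
    """
    def loglik(k, n):
        # Binomial log-likelihood at the MLE p = k/n (0 * log 0 taken as 0)
        p = k / n
        ll = 0.0
        if k > 0:
            ll += k * math.log(p)
        if n - k > 0:
            ll += (n - k) * math.log(1 - p)
        return ll

    totals = [s + c for s, c in zip(smokers, non_smokers)]
    groups = [(k, n) for k, n in zip(smokers, totals) if n > 0]
    ll_restricted = loglik(sum(smokers), sum(totals))   # intercept only
    ll_full = sum(loglik(k, n) for k, n in groups)      # + genotype dummies
    lr = max(0.0, 2 * (ll_full - ll_restricted))        # clamp rounding noise
    df = len(groups) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(lr / 2))   # chi-square(1) upper tail
    elif df == 2:
        p_value = math.exp(-lr / 2)              # chi-square(2) upper tail
    else:
        raise ValueError("expected 2 or 3 non-empty genotype groups")
    return lr, df, p_value
```

Usage: `genotype_lrt([30, 10, 2], [10, 30, 40])` tests hypothetical counts of smokers and non-smokers in three genotype groups and returns a very small p value, since the smoker proportion differs sharply across groups.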
The predisposing risk conferred by each significant SNP was evaluated by an odds ratio (OR) and a corresponding 95% confidence interval (CI), calculated with the Stata software package (Stata Corporation, College Station, Texas). To calculate the OR for a given SNP, subjects were first divided into two genotype groups: carriers of the minor allele (minor-allele homozygotes and heterozygotes) and non-carriers of the minor allele (major-allele homozygotes). The OR was then calculated by comparing the odds of smoking (i.e., the ratio of smokers to non-smokers) between the two genotype groups.
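The carrier versus non-carrier OR can be sketched from a 2×2 table. The paper does not state which CI method Stata was asked to use; the sketch swaps in the Woolf (log-scale) interval as one common choice, so its bounds may differ slightly from the published ones. The function name and cell labels are hypothetical.

```python
import math


def carrier_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio of smoking for minor-allele carriers vs. non-carriers.

    a, b: smokers and non-smokers among carriers
    c, d: smokers and non-smokers among non-carriers
    Returns (OR, lower, upper) using a Woolf (log-scale) 95% CI.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf CI undefined without a correction")
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Usage with hypothetical counts: carriers smoke at half the odds of non-carriers
print(carrier_odds_ratio(10, 20, 20, 20)[0])  # 0.5
```

An OR below 1 for carriers, as reported here for the minor alleles in males, corresponds to a protective effect.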
To explore potential functions of the SNPs identified through the GWAS, we used the FASTSNP (function analysis and selection tool for SNPs) program (http://fastsnp.ibms.sinica.edu.tw), which analyzes SNP functions based on up-to-date information extracted at query time from 11 external bioinformatics databases (21). We also evaluated inter-species conservation of the identified SNPs and their flanking sequences using the UCSC Human Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway).
We adopted a genome-wide significance threshold of p = 4.2 × 10^-7, which was derived by Freimer and Sabatti (22) using a gene-wise approach and subsequently modified by Lencz et al. (23) to take into account a more accurate estimate of the total number of genes in the human genome.
To detect population stratification that could produce spurious association results, we used the software Structure 2.2 (http://pritch.bsd.uchicago.edu/software.html) to investigate potential substructure in our sample. The program uses a Markov chain Monte Carlo (MCMC) algorithm to cluster individuals into cryptic subpopulations on the basis of multilocus genotype data (24). To ensure robustness, we performed independent analyses under three assumed numbers of population strata (k = 2, 3 and 4), using 2,000 unlinked markers randomly selected across the entire genome. To confirm the results obtained with Structure 2.2, we further tested our GWAS sample for population stratification using the genomic control method (25).
To determine whether the association findings in our GWAS could be due to bias (e.g., genotyping error), we examined the distribution of p values for all ~380,000 SNPs analyzed in our sample using a quantile-quantile (Q-Q) plot.
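The coordinates of such a Q-Q plot can be computed as below: the sorted observed −log10(p) values are paired with the −log10 of the expected uniform order statistics. This is a sketch only; the plotting positions (i − 0.5)/n are one common convention (i/(n + 1) is another), the paper does not say which was used, and the function name is hypothetical. All p values must be strictly positive.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, both sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected order statistics of Uniform(0, 1), plotting position (i - 0.5) / n
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))
```

Under the null hypothesis the points lie on the identity line; the upward departure only in the extreme tail, as reported for this GWAS, is the pattern expected when a few true associations are present without genome-wide inflation.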
For the SNPs genotyped in our AA replication sample, the PedCheck program was used to check genotype consistency with Mendelian inheritance (26). Pair-wise linkage disequilibrium (LD) between all SNP markers was assessed with Haploview (27), with haplotype blocks determined according to the block definition proposed by Gabriel and colleagues (28). Association between individual SNPs and the ND measures (SQ, HSI and FTND) was tested with the PBAT program using generalized estimating equations (29), with gender and age entered as covariates; the association test was allele-based. Associations between each ND measure and haplotypes of multiple SNPs were examined with the FBAT program, with p values for the Z statistic computed by Monte Carlo sampling under the null hypothesis of no linkage and no association (30). The haplotype analysis tested one haplotype at a time. Because four major haplotypes were tested, the significance level for this analysis was set at 0.0125 (= 0.05/4) to adjust for multiple testing by Bonferroni correction.
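The core rule behind a Mendelian consistency check can be sketched for a single parent-offspring trio: the child must carry one allele that could come from each parent. PedCheck does considerably more (missing parents, larger pedigrees, locating the likely erroneous genotype); this hypothetical function covers only the basic trio rule with complete genotypes.

```python
def mendelian_consistent(father, mother, child) -> bool:
    """Genotypes are 2-tuples of alleles, e.g. ("A", "C").

    True if the child's genotype can arise from one allele of each parent.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


# Usage: an AC child cannot have two AA parents
print(mendelian_consistent(("A", "A"), ("A", "A"), ("A", "C")))  # False
```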
The association analyses of the nine SNPs of interest (see details in the Results section) in the FHS sample were performed in a similar way as for the AA sample, using the FBAT program to examine association of the SNPs with smoking status (29).
We performed genome-wide genotypic association analyses for smoking status (adjusted for age and sex) and identified a cluster of nine SNPs approximately 93 kb upstream of the IL15 gene that ranked among the 30 most significant SNPs tested genome-wide in the total sample (Appendix I). Of particular interest, three of the nine SNPs, rs4956302, rs17354547 and rs1402812, ranked among the most significant of all SNPs tested genome-wide (Appendix I). We therefore focused our subsequent analyses on these nine SNPs.
In our total sample of 840 subjects, the nine SNPs achieved p values of 10^-5 to 10^-6 for association with smoking status; the most significant SNP, rs4956302, reached a p value of 1.19 × 10^-6 (Table 2). We further performed gender-specific association analyses (adjusted for age) for these nine SNPs and found that they were associated with smoking status much more strongly in males than in females, showing an apparently male-specific association (Table 2). In males, the association of these SNPs with smoking status reached significance levels similar to those in the total sample, even though males made up only about half of the total sample. In particular, rs4956302 achieved a p value of 8.80 × 10^-8 in males, which is significant at the genome-wide level (Table 2).
Detailed information for these nine SNPs is shown in Table 2. All SNPs have minor allele frequencies (MAF) of ~0.20, and male carriers of the minor alleles had odds ratios (OR) < 0.50, suggesting that the minor alleles of these SNPs protect against smoking behavior in males. Genotype frequency distributions at the nine SNPs in the smoking and non-smoking groups of males, females and the total sample are presented in Appendix II.
To further examine the relevance of these nine SNPs to smoking behavior, we also tested their association with another important smoking phenotype, cigarette consumption (SQ), in the total sample and in males and females separately. The male-specific association pattern was also observed for SQ: association was non-significant in females but reached p values of 10^-3 to 10^-4 in both the total and male samples (Table 2).
Haplotype analyses of our genotype data with the Haploview program indicated strong LD among these nine SNPs, which form a single haplotype block. The haplotype formed by these nine SNPs was also associated with smoking status and cigarette consumption in males, with p values of 8.96 × 10^-5 and 6.20 × 10^-3, respectively. Figure 1 presents the structure of the haplotype block formed by these nine SNPs, together with the association signals in males for “smoking status” and “cigarette consumption” at these SNPs. (Figure 1 about here)
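The pairwise LD measures underlying a Haploview block plot can be sketched from haplotype and allele frequencies. Haploview estimates haplotype frequencies from unphased genotypes by an EM algorithm, and the Gabriel et al. block definition uses confidence bounds on D′; this hypothetical sketch assumes the two-locus haplotype frequency is already known and computes only the point estimates of D, D′ and r².

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Pairwise LD from haplotype frequency p_AB and allele frequencies p_A, p_B.

    Returns (D, D', r^2). Both loci must be polymorphic (0 < p_A, p_B < 1).
    """
    d = p_ab - p_a * p_b
    # D is normalized by its maximum attainable magnitude given the allele freqs
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Usage: two SNPs with MAF 0.2 whose minor alleles always occur together
print(ld_stats(0.2, 0.2, 0.2))  # D' and r^2 both ~1 (complete LD)
```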
Using the FASTSNP program, we analyzed the potential functions of these nine SNPs. According to these analyses, four of the SNPs are located at potential transcription factor (TF) binding sites. At rs4956302, a T→C change may create a binding site for the TF Lyf-1. At rs17354547, an A→C change may abolish binding sites for the TFs CdxA and YY1. At rs17354568, an A→C change may create a binding site for the TF Ik-2. At rs6838494, a C→A change may create binding sites for the TFs CdxA and Nkx-2 but abolish the binding site for Oct-1.
Using the UCSC Human Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway), we found that rs17354547 and the potential TF binding sequence containing it (as predicted by the FASTSNP program) are highly conserved among 28 vertebrate species (Figure 2). (Figure 2 about here)
For the replication study, rs4956302 was selected because it achieved the highest significance for association with smoking status among all SNPs tested genome-wide. rs17354547 was selected because it is highly conserved across multiple species. The other two SNPs (rs4956396 and rs1402812) were randomly selected from the remaining seven, since these seven SNPs are in high LD and achieved similar p values (Figure 1).
In the total sample of our AA replication cohort, none of the four SNPs was significantly associated with SQ, HSI or FTND. In the male subgroup, however, rs17354547 was significantly associated with SQ, HSI and FTND, with p values of 0.031, 0.0046 and 0.019, respectively. In addition, a haplotype formed by rs17354547, rs1402812 and rs4956396 was associated with SQ, HSI and FTND, with p values of 0.039, 0.0093 and 0.0093, respectively. Because a total of four major haplotypes were tested, the significance level for the haplotype association analysis was set at 0.0125 (= 0.05/4) by Bonferroni correction; at this adjusted level, the haplotype association was significant for only two ND phenotypes, HSI and FTND. Further details of these results are presented in Table 3.
We performed in silico replication of the nine SNPs of interest identified in our GWAS using the FHS sample. No significant results were obtained in the total sample or in the female subgroup. In the male subgroup, however, seven of the SNPs achieved significant (p < 0.05) and two achieved marginally significant (p < 0.10) p values for association with smoking status. Further details of the results are presented in Table 4.
To detect potential stratification of our GWAS sample, we analyzed it with the software Structure 2.2 (24). When 2,000 randomly selected unlinked markers were used to cluster our subjects, all subjects clustered tightly together under every assumed number of population strata (k = 2, 3 or 4), suggesting no population stratification. The results are shown in Appendix III.
We further tested our GWAS sample for population stratification using the genomic control method (25). Based on genome-wide SNP data, we estimated the inflation factor (λ), a measure of population stratification; for a homogeneous population with no stratification, λ should be equal to or close to 1.0. For our total sample, the estimated λ was 1.009 for smoking status and 1.012 for cigarette consumption (SQ), indicating essentially no population stratification and further confirming the results obtained with Structure 2.2.
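The genomic control inflation factor is the median of the observed association test statistics divided by the median expected under the null. The sketch assumes 1-df χ² statistics, whose null median is (Φ⁻¹(0.75))² ≈ 0.4549; the paper does not say which statistics were used, and for 2-df genotypic tests the null median would instead be 2 ln 2 ≈ 1.386. The function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution: (Phi^-1(0.75))^2 ~= 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats) -> float:
    """Inflation factor: median observed 1-df chi-square / null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value near 1.0, such as the 1.009 and 1.012 reported above, means the bulk of the test statistics follows the null distribution; values well above 1 would indicate genome-wide inflation, for example from stratification.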
Using the Q-Q plot, we examined the distribution of p values for smoking status for all ~380,000 SNPs analyzed in our GWAS (Appendix IV). As shown in the plot, the observed p values match the expected p values well over a wide range of −log10(p), from 0 to ~4, and gradually depart from them only in the extreme tail, where −log10(p) ≥ ~4. This pattern suggests that our GWAS association findings are more likely attributable to true genetic effects than to systematic bias such as genotyping errors.
This study reports one of the first GWAS of smoking behavior in Caucasians. We identified a cluster of nine SNPs upstream of the IL15 gene that ranked among the 30 most significant SNPs associated with smoking status in our GWAS (Appendix I). Gender-specific analyses indicated that these nine SNPs were associated with smoking status in a male-specific manner (Table 2). In particular, rs4956302 achieved a genome-wide significant p value of 8.80 × 10^-8 in males. Another SNP, rs17354547, is highly conserved across multiple species (Figure 2), suggesting functional importance. Notably, these nine SNPs were also associated with another important smoking phenotype, cigarette consumption (SQ), and this association was also male-specific (Table 2).
From these nine SNPs, we chose four (rs4956302, rs17354547, rs1402812 and rs4956396) to replicate our association findings in an AA family-based cohort of 1,251 subjects (412 males and 839 females). We selected rs4956302 because it achieved the highest significance among all SNPs tested genome-wide in our GWAS cohort, and rs17354547 because it is highly conserved across multiple species. The other two SNPs (rs1402812 and rs4956396) were chosen at random, since the remaining seven significant SNPs are all in high LD and had similar p values. Interestingly, rs17354547 showed a male-specific association with multiple ND phenotypes: in males of our AA replication cohort, it achieved p values of 0.031, 0.0046 and 0.019 for association with SQ, HSI and FTND, respectively (Table 3). Furthermore, in these males, a haplotype formed by rs17354547, rs1402812 and rs4956396 was also associated with SQ, HSI and FTND, with p values of 0.039, 0.0093 and 0.0093, respectively. These replication results support our GWAS finding of an association between smoking behavior and SNPs located upstream of the IL15 gene.
To further confirm our GWAS findings, we performed an in silico replication study of the nine SNPs upstream of the IL15 gene in a large FHS sample of 7,623 Caucasians from 1,731 families. Again, a clear male-specific pattern of association with smoking status was observed: no SNP achieved a p value below 0.44 in the female subgroup, whereas seven SNPs achieved p values below 0.05 and two achieved p values below 0.10 in the male subgroup (Table 4). These results provide additional support for our GWAS findings and for the replication achieved in the AA cohort.
Only recently has intergenic transcription been recognized as an active and common cellular process. Evidence has shown that a significant portion of the transcriptome arises from outside annotated genes (31,32). One important function of intergenic transcription is to regulate the expression of nearby genes (33,34). In particular, intergenic transcription has been found to be an important mechanism underlying the expression of cytokine genes such as GM-CSF, IL3, IL4, IL5, IL10 and IL13 (35-38). Given their location at potential TF binding sites, the SNPs identified in our GWAS upstream of the cytokine gene IL15 might regulate IL15 gene expression through intergenic transcription. Importantly, rs17354547, which was replicated in both the AA and FHS cohorts, and the TF binding site it may modulate are highly conserved across multiple species (Figure 2), further supporting a role for this SNP in transcriptional regulation. Overall, our findings suggest that the observed association of SNPs upstream of the IL15 gene with smoking status and multiple ND phenotypes may be mediated through regulation of IL15 gene expression, which would represent a novel mechanism underlying smoking behavior.
Multiple lines of evidence demonstrate that the immune system, in particular lymphoid cells, plays an important role in drug addiction. Destruction of the immune system by irradiation or immunosuppressive drugs has been shown to significantly alleviate the opiate-withdrawal syndrome (39,40). Conversely, transfer of lymphoid cells to irradiated rats before morphine administration restores drug-withdrawal signs (41). These findings suggest a neuro-immunological interaction in which factors derived from the immune system regulate functions of the central nervous system and thereby influence addictive behaviors. This mechanism is supported by the discovery of functional synapses between neurons and lymphocytes (42). Given that IL15 is an important immunoregulatory cytokine influencing the activation and proliferation of T lymphocytes and natural killer cells, it is reasonable to speculate that IL15 influences smoking addiction through its immunoregulatory effects.
Population stratification and/or ethnic admixture can be an important source of spurious association in genetic association studies. Several lines of evidence, however, indicate that these factors are unlikely to have affected our GWAS association results. Our study cohort came from an apparently homogeneous US Midwestern white population living in Omaha, Nebraska and its surrounding areas. The allele frequencies of the SNPs of interest in our sample are very similar to those reported for the representative Caucasian HapMap CEU sample (Table 2). Using the program Structure 2.2 (24), we also analyzed our study subjects for potential subpopulations; all subjects clustered tightly together as a single group, suggesting no significant population substructure (Appendix III). In addition, the measure of population stratification (λ) for our GWAS sample, calculated by the genomic control method (25), was 1.009 for smoking status and 1.012 for cigarette consumption, indicating essentially no stratification. For these reasons, the associations detected in our GWAS are unlikely to be spurious results of population admixture or stratification.
In our GWAS discovery cohort, control subjects were defined as never-smokers. This criterion differs from the conventional one, under which current non-smokers with a limited degree of previous exposure (e.g., having smoked more than 1 but fewer than 100 cigarettes in their lifetime) are normally selected as controls. A potential weakness of our design is therefore that some "control" subjects in our GWAS sample may become smokers in the future if exposed to cigarettes. Depending on the number of such subjects, this misclassification may reduce the statistical power of our study, leading to false-negative results. We tried to minimize this problem by excluding control (non-smoking) subjects younger than 25 from our study. Because most smokers initiate smoking in adolescence, non-smokers under the age of 25 are much more likely than older people to take up smoking if exposed to cigarettes. After excluding these younger subjects, the number of controls who might later become smokers, if any, is likely to be small, so any resulting misclassification should have only a modest effect on our overall results. The robustness of our GWAS findings is supported by their replication in both the AA and FHS cohorts.
Another limitation of our study is that we did not adjust for multiple testing across the smoking behavior-related phenotypes examined in the GWAS and AA replication cohorts. However, because the number of phenotypes is small (2 in the GWAS cohort and 3 in the AA cohort) and these phenotypes are correlated smoking-behavior traits, such an adjustment would likely have only minor effects on the current results. Even under the most stringent option, a Bonferroni correction that ignores the correlation among traits, the most significant SNP in our GWAS, rs4956302, remains significant for association with smoking status at the corrected genome-wide significance level of 2.1×10−7 (= 4.2×10−7/2), and the most significant SNP in our AA replication study, rs17354547, remains significant for association with HSI at the corrected level of 0.017 (= 0.05/3). Again, replication of our GWAS findings in two different cohorts attests to their robustness and lessens concern about testing several phenotypes.
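The Bonferroni arithmetic above can be checked directly: the corrected per-test threshold is the nominal significance level divided by the number of tests. The sketch below is illustrative only; it takes the levels (4.2×10−7 and 0.05), phenotype counts (2 and 3), and p values (8.80×10−8 for rs4956302 with smoking status; 0.0046 for rs17354547 with HSI) from this study, and the helper name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Nominal levels and phenotype counts as stated in the text.
gwas_threshold = bonferroni_threshold(4.2e-7, 2)  # 2 phenotypes in the GWAS cohort
aa_threshold = bonferroni_threshold(0.05, 3)      # 3 phenotypes in the AA cohort

# Most significant p values reported for each cohort.
p_rs4956302_status = 8.80e-8  # GWAS, smoking status
p_rs17354547_hsi = 0.0046     # AA replication, HSI

for label, p, threshold in [
    ("rs4956302 / smoking status", p_rs4956302_status, gwas_threshold),
    ("rs17354547 / HSI", p_rs17354547_hsi, aa_threshold),
]:
    verdict = "significant" if p < threshold else "not significant"
    print(f"{label}: p = {p:.3g}, threshold = {threshold:.3g} -> {verdict}")
```

Because Bonferroni treats the phenotypes as independent, it is conservative for correlated traits such as these; passing it is therefore a stringent check.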
In summary, we identified a group of SNPs upstream of the IL15 gene that were associated with both smoking status and cigarette consumption. Notably, a key SNP, rs17354547, which is highly conserved across multiple species, was replicated in an independent AA cohort for association with multiple ND phenotypes. Moreover, all nine SNPs showed at least marginal association with smoking status in the in silico FHS replication (p<0.05 for seven and p<0.10 for the remaining two). Remarkably, the association of the SNPs with smoking behavior-related phenotypes appeared to be male-specific in both our GWAS and the two replication samples. The higher prevalence of smoking among males than females in the US (3) lends additional significance to our findings. Some of the SNPs, located at potential transcription factor (TF) binding sites, may regulate IL15 gene expression and could consequently have an important regulatory effect on the immune system. These findings, together with previous data from studies of drug addiction, lead us to propose a novel mechanism for smoking addiction modulated by the immune system, in which the IL15 pathway may play a key role. Confirming and elaborating this hypothetical mechanism will require detailed functional studies of IL15.
Investigators of this work were partially supported by grants from NIH (R01 AR050496-01, R21 AG027110, R01 AG026564, R21 AA015973, P50 AR055081, and R01 DA12844). The study also benefited from grants from National Science Foundation of China, Huo Ying Dong Education Foundation, HuNan Province, Xi'an Jiaotong University, and the Ministry of Education of China.
The Framingham Heart Study and the Framingham SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. The Framingham SHARe data used for the analyses described in this manuscript were obtained through dbGaP (accession number phs000007.v3.p2). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI.
| SNP | P value | Chromosome | Position | Associated gene |
| SNP | Smoking status | Genotype^a | N (total)^b | N (male)^b | N (female)^b |