Abilities to successfully quit smoking display substantial evidence for heritability in classic and molecular genetic studies. Genome-wide association (GWA) studies have demonstrated single-nucleotide polymorphisms (SNPs) and haplotypes that distinguish successful quitters from individuals who were unable to quit smoking in clinical trial participants and in community samples. Many of the subjects in these clinical trial samples were aided by nicotine replacement therapy (NRT). We now report novel GWA results from participants in a clinical trial that sought dose/response relationships for “precessation” NRT. In this trial, 369 European-American smokers were randomized to 21 or 42 mg NRT, initiated 2 wks before target quit dates. Ten-week continuous smoking abstinence was assessed on the basis of self-reports and carbon monoxide levels. SNP genotyping used Affymetrix 6.0 arrays. GWA results for smoking cessation success provided no P value that reached “genome-wide” significance. Compared with chance, these results do identify (a) more clustering of nominally positive results within small genomic regions, (b) more overlap between these genomic regions and those identified in six prior successful smoking cessation GWA studies and (c) sets of genes that fall into gene ontology categories that appear to be biologically relevant. The 1,000 SNPs with the strongest associations form a plausible Bayesian network; no such network is formed by randomly selected sets of SNPs. The data provide independent support, based on individual genotyping, for many loci previously nominated on the basis of data from genotyping in pooled DNA samples. These results provide further support for the idea that aid for smoking cessation may be personalized on the basis of genetic predictors of outcome.
Cigarette smoking is a significant cause of premature death and disease (1). Although abstinence reduces risks to smokers, success rates after attempts to quit smoking remain modest. One year after unaided attempts to quit smoking, abstinence rates are <5%. Even with pharmacologic aids that increase success, long-term abstinence rates are <25% (2). Twin studies document substantial heritability for smokers’ abilities to successfully abstain from smoking, suggesting substantial genetic components to individual differences in abilities to quit (3,4).
We recently reported genome-wide association (GWA) studies for success in quitting smoking in six independent samples of carefully monitored individuals who attempted to quit smoking in clinical trials or in community quitters, using carefully validated DNA pooling approaches (5–8). No result from any of these studies achieves “genome-wide” significance. However, the molecular genetic results from these independent samples display substantial convergence with each other (that is, the nominally positive results from each of these samples cluster in small chromosomal regions to extents much greater than expected by chance, and the same small chromosomal regions are identified by the clustered, nominally positive results from different samples to greater extents than those expected by chance) (5,9–13).
We report GWA studies of smoking cessation success in individually genotyped European-American participants in a smoking cessation trial that examined effects of 21 versus 42 mg/24 h precessation nicotine replacement therapy (NRT) (14). Although the sample size is modest for GWA, we nevertheless describe the highly significant overlap between the chromosomal regions identified in this work and those identified by nominally significant associations with successful quitting in other studies of smoking cessation. We identify specific gene ontology classes into which candidate “quit success” genes identified in these analyses fall more often than expected by chance. We describe a Bayesian network into which the quit success–associated SNPs fall.
Adult smokers who expressed desires to quit were recruited and screened at one of four North Carolina centers. Participants provided written informed consent; reported smoking an average of ≥10 cigarettes/day that each yielded ≥0.5 mg nicotine; displayed end-expired air carbon monoxide (CO) ≥10 ppm; failed to display any exclusionary features on history, physical examination or laboratory evaluations; and were compensated up to $140. Smokers were subdivided into low- and high-dependence subgroups (Fagerström Test for Nicotine Dependence [FTND] scores ≤6 or >6, respectively), and individuals in each of these subgroups were randomly assigned to 21 mg/24 h or 42 mg/24 h nicotine patch doses. During seven study sessions, brief supportive counseling was provided, clinical trial materials were dispensed and dependent measures were assessed. Dependent measures included measured end-expired air CO and reports of smoking, withdrawal symptoms and adverse effects including nausea and/or emesis.
Each participant wore two skin patches daily for 6 wks, beginning 2 wks before the target quit date. One 21-mg active patch (GlaxoSmithKline, Research Triangle Park, NC, USA) was applied in the morning. At noon, either another 21-mg patch (42 mg/day) or a placebo patch (Rejuvenation Labs, Cadillac, MI, USA) (21 mg/day) was applied. NRT doses were gradually reduced beginning 4 or 6 wks after the quit date for the 42 and 21 mg/24 h groups, respectively. Participants with sleep disturbances removed patches at bedtime and applied new ones upon awakening. Subjects experiencing other symptoms of nicotine toxicity reduced doses until symptoms abated according to the following sequence: reduce morning patch from 21 to 14 to 7 to 0 mg/day and then discontinue the afternoon patch. All participants were provided with denicotinized cigarettes (<0.05 mg nicotine yield; Vector Tobacco, Mebane, NC, USA) to smoke during the 2-wk precessation period.
The primary outcome—continuous abstinence from the target quit date through the end of treatment (10 wks)—was assessed on the basis of self-reports of continuous abstinence that were confirmed by end-expired CO levels ≤10 ppm. An intent-to-treat criterion was used. Participants who withdrew from the study or were lost to follow-up were classified as nonabstinent.
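The outcome rule above can be written as a short classification function. This is a minimal sketch, not the trial's own software: the `Participant` record and its field names are hypothetical, and it assumes that each participant's self-reports and CO readings have already been summarized into one value per person.

```python
from dataclasses import dataclass
from typing import Optional

CO_THRESHOLD_PPM = 10.0  # end-expired CO cutoff stated in the text


@dataclass
class Participant:
    # Hypothetical per-person summary record (field names are assumptions).
    completed: bool                           # False if withdrew or lost to follow-up
    self_reported_abstinent: Optional[bool]   # None when no report is available
    max_co_ppm: Optional[float]               # highest CO reading after the quit date


def is_abstinent(p: Participant) -> bool:
    """Intent-to-treat rule: anyone without confirmed abstinence is nonabstinent."""
    if not p.completed:
        return False
    if not p.self_reported_abstinent:
        return False
    if p.max_co_ppm is None:
        return False
    return p.max_co_ppm <= CO_THRESHOLD_PPM
```

Under this rule a participant who reports abstinence but shows CO above 10 ppm, or who drops out before the end of treatment, is counted as nonabstinent.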
DNA was extracted from blood, quantitated and genotyped by using Affymetrix 6.0 microarrays according to the manufacturer’s instructions. Genotypes for each individual passed Affymetrix quality control metrics with a contrast quality control threshold >0.4 and provided calls for >97% of SNP genotypes. Imputation using PLINK (15) with a confidence threshold >0.95 determined most missing genotype calls. We assessed data from 905,273 SNPs, of which 868,154 were autosomal, 36,862 were located on X and 257 were located on Y.
Genetic background was assigned for each individual on the basis of principal component analyses of data from all SNPs and was confirmed by self-report in almost all cases (14). Data from the 369 participants of European-American descent who were identified in this way are analyzed herein.
Differences between allele frequencies in successful quitters versus unsuccessful quitters were compared by using the χ2 test. We performed preplanned primary “nontemplate” GWA analyses similar to those we have previously described (16). We identified SNPs that (a) display χ2 values with P < 0.01 “nominally positive” significance compared with data from individuals who were successful versus unsuccessful in quitting smoking and (b) cluster in small chromosomal regions, so that at least four of these nominally positive SNPs lie within 25 kb of at least one other positive SNP. A number of these clustered, nominally positive SNPs identify genes; many also lie between currently annotated genes.
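The two steps described above, an allelic χ2 test followed by a search for clusters of nominally positive SNPs, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used here: the function names are hypothetical, the χ2 statistic is the uncorrected Pearson form for a 2 × 2 allele-count table, and the clustering rule is read as chaining positive SNPs whose neighbors lie within 25 kb and keeping chains of at least four SNPs.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def allelic_chi2_p(a: int, b: int, c: int, d: int) -> float:
    """P value (1 df) for a 2x2 allele-count table.

    a, b: counts of allele 1 and allele 2 in successful quitters
    c, d: counts of allele 1 and allele 2 in unsuccessful quitters
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a monomorphic SNP or an empty group carries no signal
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def find_clusters(
    positive_snps: List[Tuple[str, int]],
    max_gap_bp: int = 25_000,
    min_snps: int = 4,
) -> List[Tuple[str, List[int]]]:
    """Group (chromosome, position) pairs into runs with neighbor gaps <= max_gap_bp."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in positive_snps:
        by_chrom[chrom].append(pos)

    clusters: List[Tuple[str, List[int]]] = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] <= max_gap_bp:
                run.append(pos)
            else:
                if len(run) >= min_snps:
                    clusters.append((chrom, run))
                run = [pos]
        if len(run) >= min_snps:
            clusters.append((chrom, run))
    return clusters
```

SNPs with `allelic_chi2_p(...) < 0.01` would be passed to `find_clusters` as the nominally positive set; other readings of the 25-kb rule (for example, a fixed window rather than chained neighbors) would give somewhat different cluster boundaries.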
To seek additional support for the chromosomal regions identified by these clusters of nominally positive SNPs, we sought additional association signals in these same regions from clustered, nominally positive SNPs identified in relevant independent GWA studies: (a) Uhl et al.: 1,000,000 SNP studies of smokers who quit versus those who continued to smoke in the “patch in practice” study of NRT in UK smokers (8,17); (b) Uhl et al.: 1,000,000 SNP GWA studies of smokers who quit versus those who continued to smoke in a clinical trial of denicotinized cigarettes (7); (c) Drgon et al.: 500,000 SNP GWA studies of smokers who quit versus those who continued to smoke in community settings (6); (d–f) each of three samples from Uhl et al.: 500–600,000 SNP GWA studies of smokers who were successful versus unsuccessful in quitting in clinical trial settings (5); and (g) Bierut et al.: 38,000 SNP GWA studies of nondependent (FTND) versus dependent (FTND) smokers (11). To provide insight into some of the genes likely to harbor variants that contribute to individual differences in ability to quit, we identify genes that are identified by clustered, nominally positive SNPs from the current sample and at least two other quit success or nicotine dependence samples.
We compare observed results for smoking cessation success to those expected by chance using 10,000 Monte Carlo simulation trials, as described (18). For each trial, a randomly selected set of SNPs from the current data set was assessed to see if it provided results equal to or greater than the results that we actually observed. The number of Monte Carlo trials for which the randomly selected SNPs displayed (at least) the same features as the observed results was then tallied to generate an empirical P value. These simulations thus corrected for the number of repeated comparisons made in these analyses, an important consideration in evaluating these GWA data sets. We also performed permutation analyses using PLINK to provide a secondary assessment of significance.
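The Monte Carlo procedure described above can be sketched as follows. This is a simplified illustration, not the published simulation code (18): it works on positions from a single chromosome, uses the number of clusters as the test statistic, and the function names and default parameters are assumptions made for the example.

```python
import random
from typing import Iterable, Sequence, Tuple


def count_clusters(positions: Iterable[int], max_gap_bp: int = 25_000, min_snps: int = 4) -> int:
    """Count runs of >= min_snps positions whose neighbor gaps are <= max_gap_bp."""
    count, run_len, prev = 0, 0, None
    for pos in sorted(positions):
        if prev is not None and pos - prev <= max_gap_bp:
            run_len += 1
        else:
            if run_len >= min_snps:
                count += 1
            run_len = 1
        prev = pos
    if run_len >= min_snps:
        count += 1
    return count


def monte_carlo_p(
    observed_clusters: int,
    all_positions: Sequence[int],
    n_selected: int,
    n_trials: int = 10_000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Return (hits, hits / n_trials) for randomly selected SNP sets.

    A hit is a trial in which the random set yields at least as many
    clusters as were actually observed.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(all_positions, n_selected)
        if count_clusters(sample) >= observed_clusters:
            hits += 1
    return hits, hits / n_trials
```

When no trial matches the observed result, the empirical P value is reported as less than 1/`n_trials`, which is P < 0.0001 for 10,000 trials.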
To assess the power of our current approach for smoking cessation success, we used current sample sizes and standard deviations, the program PS v2.1.31 (19,20) and α = 0.05. To provide controls for the possibility that differences between quitters and nonquitters observed herein were due to occult ethnic/racial allele frequency differences or noisy assays, we assessed the overlap between the results obtained here and the SNPs that displayed the largest (a) allele frequency differences between African-American versus European-American control individuals and (b) assay “noise.”
Bayesian networks are probabilistic graphical models that represent a set of variables as nodes and their conditional interdependencies as edges. These networks thus provide data-driven probabilistic classifications that can identify ways in which results from sets of SNPs provide a reasonable network, which SNPs provide the most direct relationship to quit success and which SNPs provide a more indirect relationship to quit success. We thus used BayesWare (Markov Chain Monte Carlo methods; BayesWare™, http://www.bayesware.com) to seek networks for sets of the 25, 50, 100, 200, 500 and 1,000 SNPs that displayed the strongest evidence for association with quit success from the current data, or from sets of 25, 50, 100, 200 and 500 SNPs that came from lists in which there were random relationships between the P values and SNPs. The numbers of SNPs included in the networks formed were tabulated for each set of SNPs from true and permuted control data sets. The network based on 1,000 true SNPs was used for subsequent analyses that sought relationships between SNPs and quit success and between SNPs in the inner versus outer circles of this Bayesian network.
Gene ontology analyses were performed in BioBase™. The gene names in lists of genes identified by clustered, nominally significant results were matched to BioBase gene annotations. Functional enrichment analyses were performed by using “biological process” gene ontology (GO) terms as defined in the BioBase knowledge base. Functional enrichment was tested by using hypergeometric tests. To provide a control, random gene lists of the same size were assembled from the list of all genes using a Perl script (Drgon et al., unpublished data); GO analysis was then performed on these random gene lists. The hypergeometric test P value distributions of the randomized gene lists analyses were compared with the P value distributions obtained from GO analysis of the bona fide lists.
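The hypergeometric test and the random-list control described above can be illustrated with a short sketch. It is not the BioBase implementation or the Perl script mentioned in the text; the function names are hypothetical, and the sketch assumes that every annotated gene is a member of the full gene list.

```python
import random
from math import comb
from typing import List, Sequence, Set


def enrichment_p(n_all: int, n_annotated: int, n_list: int, n_hits: int) -> float:
    """Upper-tail hypergeometric P value: P(X >= n_hits).

    n_all:       number of genes in the background
    n_annotated: background genes carrying the GO term
    n_list:      size of the gene list being tested
    n_hits:      genes in the list that carry the GO term
    """
    upper = min(n_annotated, n_list)
    total = comb(n_all, n_list)
    tail = sum(
        comb(n_annotated, i) * comb(n_all - n_annotated, n_list - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / total


def random_control_pvalues(
    all_genes: Sequence[str],
    annotated: Set[str],
    list_size: int,
    n_lists: int = 100,
    seed: int = 0,
) -> List[float]:
    """Enrichment P values for random gene lists of the same size (control)."""
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n_lists):
        genes = rng.sample(all_genes, list_size)
        hits = sum(gene in annotated for gene in genes)
        pvalues.append(enrichment_p(len(all_genes), len(annotated), list_size, hits))
    return pvalues
```

The distribution returned by `random_control_pvalues` plays the role of the randomized-list control: enrichment P values from the bona fide gene list are compared against it.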
When data from European-American trial participants who quit successfully are compared with data from those who did not, nominally positive SNPs cluster significantly in small chromosomal regions. There are 5,898 “nominally positive” SNPs with nominal P < 0.01. A total of 2,147 of these SNPs lie in 338 clusters, each containing at least four nominally positive SNPs separated from each other by ≤25 kb. We would expect eight such clusters by chance (Monte Carlo P < 0.0001). A total of 176 of the regions identified by these clustered, nominally positive SNPs contain a total of 206 genes (Table 1). None of 10,000 permutation tests in which individuals were randomly assigned to be “pseudo abstinent” or “pseudo nonabstinent” ever identified as many SNPs that achieved nominally significant results and that clustered in small chromosomal regions as found in the actual data set (thus P < 0.0001).
We calculated the power of these samples for detection of 5%, 7.5% and 10% differences in allele frequency. We used the mean 0.24 minor allele frequency that we found for nominally positive SNPs in these samples. The power to detect these differences was 0.15, 0.28 and 0.43, respectively.
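The dependence of power on the allele frequency difference can be illustrated with a normal-approximation calculation for a two-sided comparison of two allele frequencies. This is a swapped-in textbook approximation, not the method implemented in the PS program, and it is not expected to reproduce the values reported above; the function name is hypothetical and group sizes must be supplied by the user.

```python
from statistics import NormalDist


def allelic_power(
    freq_a: float,
    freq_b: float,
    alleles_a: int,
    alleles_b: int,
    alpha: float = 0.05,
) -> float:
    """Approximate power of a two-sided z-test comparing two allele frequencies.

    freq_a, freq_b:       allele frequencies in the two groups (strictly between 0 and 1)
    alleles_a, alleles_b: number of alleles sampled per group (2 x number of subjects)
    """
    std_normal = NormalDist()
    # Unpooled standard error of the difference in allele frequencies.
    se = (freq_a * (1 - freq_a) / alleles_a + freq_b * (1 - freq_b) / alleles_b) ** 0.5
    z_effect = abs(freq_a - freq_b) / se
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)
```

For example, `allelic_power(0.29, 0.19, 200, 538)` evaluates a 10% difference centered on a 0.24 frequency for hypothetical groups of 100 and 269 subjects; these group sizes are illustrative only and are not taken from the trial.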
These data for clustered, nominally positive SNPs from the current data set provide significant chromosomal overlap with genes that have been identified by other relevant data sets, largely those derived from validated pooled genotyping approaches (Table 2). These approaches identify the same genes that are identified by nominally positive results in other studies to extents much greater than what we would expect by chance. The overlaps between the clustered, nominally positive SNPs from the current sample and the clustered, nominally positive SNPs from at least two other samples of successful versus unsuccessful quitters and/or nicotine dependence identify 59 genes. Whereas the empirical P values associated with most of these genes do not withstand stringent Bonferroni corrections for multiple testing, several of these gene-wise P values are <0.0008 and thus survive this correction for multiple testing (21) (Table 2).
Control for occult stratification was based on examining the overlap between the 2,147 clustered, nominally positive SNPs from the present quit success analyses with the 2.5% of the SNPs for which the racial/ethnic differences in control individuals from prior data sets were largest. We identified 48 SNPs with these properties; 50 would have been expected by chance. Controls for noisy SNPs found that 70 of the clustered, nominally positive SNPs overlapped with the set of SNPs that provided the largest variance in other assessments of these SNPs using Affymetrix 6.0 arrays, while 50 would be expected by chance. We identify the clusters that contain SNPs that provide greater assay variance in Table 1.
Bayesian networks incorporated many of the SNPs that provided the strongest 25, 50, 100, 200, 500 or 1,000 P values for the true quit success data when analyzed by using BayesWare (Figure 1) (http://bayesware.com [22–25]). By contrast, only a few SNPs were included in the corresponding analyses of data from permuted control sets of SNPs in which there were random relationships between SNPs and the set of P values obtained from the bona fide data (Figure 1). Figure 2 provides a graphic representation of the Bayesian network for data from the 1,000 SNPs with the strongest P values. Interestingly, the relationship between the SNPs for which data directly predict abstinence in this data set (for example, those in the inner circle forming the “Markov Blanket” of the outcome node) and the SNPs located in the outer circle can be explained by the linkage disequilibrium between the SNPs (data not shown). This relationship would be expected if the network were detecting true biological relationships, but not if the network were detecting noise. However, there were relatively few interrelationships between these “inner circle” SNPs (data not shown), suggesting that linkage disequilibrium was not responsible for much of the influence of these SNPs on quit success.
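The Markov blanket of a node in a Bayesian network consists of its parents, its children and the other parents of its children. A minimal sketch of how this set can be read from a list of directed edges is given below; it does not use BayesWare, and the node names in the usage note are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent, child)


def markov_blanket(edges: Iterable[Edge], target: Hashable) -> Set[Hashable]:
    """Parents, children and co-parents of `target` in a directed acyclic graph."""
    edge_list = list(edges)
    parents = {u for u, v in edge_list if v == target}
    children = {v for u, v in edge_list if u == target}
    co_parents = {u for u, v in edge_list if v in children and u != target}
    return parents | children | co_parents
```

For a toy network with edges `[("snp1", "quit"), ("quit", "snp2"), ("snp3", "snp2"), ("snp4", "snp1")]`, the blanket of `"quit"` contains `snp1`, `snp2` and `snp3`; `snp4` is related to the outcome only through `snp1`, and so would sit outside the blanket.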
The 5,898 SNPs, for which alleles are identified by these results as directly predicting abstinence, display P values that range from 0.0000028 to 0.01 in the primary data set. A total of 960 of these SNPs also display nominally significant association with quit success in at least one other previously reported quit success data set, whereas 32 of these SNPs display such nominally significant associations in at least two prior samples.
A number of genes identified by clusters of nominally significant SNPs in this work fall into several functional classes identified by gene ontology. Functional enrichment analysis (BioBase) that compares the representation of functional classes with all human genes identified significant overrepresentation, when corrected for false discovery rate (FDR), of genes involved in the following: molecular functions, the membrane/plasma membrane, synapses and synaptic transmission of nerve impulses, cell communication, radial glia-guided migration of Purkinje cells, protein binding, neuron projections, protein kinase C activity, cell–cell signaling and communication, cell migration in hindbrain, negative regulation of response to stimulus, localization of the cell, hindbrain radial glia-guided cell migration, cell motion, axon guidance, binding, cell junctions, hindbrain development, signal transduction, nucleoside monophosphate and cAMP metabolic process, G-protein complexes and glutamate receptor activity.
The current results provide independent support, from individually genotyped GWA, for data derived from prior studies of smoking cessation success in clinical trial and community settings that used validated methods for pooled genotyping. The substantial overlaps between the autosomal data obtained with individual genotyping and those obtained previously in pooled DNA samples provide mutual validation for the current and previous data sets. The current results provide additional support for polygenic contributions to individual differences in the ability to quit smoking.
These observations can be discussed in light of the strengths and limitations of the current data set. The data display several strengths: (a) the successful and unsuccessful subjects were recruited at the same time from the same study centers, providing significant assurance that contributions of underlying stratification to the results obtained herein have been minimized; (b) both the careful clinical and biochemical monitoring of these participants support the accuracy of smoking cessation assessments; (c) nominally positive results from this work cluster into small chromosomal regions to extents greater than expected by chance; (d) many more of the positive results from this work than we would expect by chance identify the same chromosomal regions that were identified by other studies of smoking cessation and/or vulnerability to develop nicotine dependence in smokers; (e) in these same subjects, a single genotype score per subject that was based on data from the study by Uhl et al. (5) predicted quit success via interactions with nicotine dose and FTND dependence significantly better than at random (P = 0.015); (f) the true results from this trial, but not permuted results, form a plausible Bayesian network; and (g) the genes identified by these results provide overrepresentation of plausible groups of biological mechanisms in functional enrichment analyses (BioBase).
There are also limitations of these analyses. First, the sample is of modest size from the perspective of GWA, although it is relatively large from the perspective of a clinical trial. This modest sample size provides modest power. This modest power led us to forgo analyses of subgroups, such as comparisons between subjects treated with 21 versus 42 mg nicotine. It reduces our confidence in the genes that are identified in this work but not in prior studies and in the negative data concerning genes that have been reproducibly identified in prior studies but not in the current work. Second, individuals in this trial were recruited so that an equal number of participants with FTND scores ≤6 and >6 were randomized to 42 or 21 mg NRT. We combined individuals treated with both doses in the current analysis to increase power, since overall effects of dose on quit success rates were not significant (although effects can be noted in subsets of subjects). Third, we identify no large effects of any SNP assessed here. Data for individual SNPs are less robust than data for clusters of nominally positive SNPs or sets of these clustered SNPs. The data from individual SNPs from this trial, for example, fail to achieve significance in permutation analyses (data not shown). Fourth, more than one-quarter of the SNPs that form seven of the clusters identified in Table 1 are found among the sets of SNPs for which assay variation is large in prior studies using these same Affymetrix 6.0 reagents. Although no cluster is identified solely on the basis of SNPs with these properties, we label these clusters in Table 1 to provide additional cautions in interpreting these results. Fifth, we have not used SNPs, samples or treatments that are identical to those used in prior smoking cessation GWA studies. Each of these issues has limited our enthusiasm about use of SNP-by-SNP meta-analyses, although these meta-analyses might be appropriate when larger data sets are assessed (26–29). Sixth, because some of the chromosomal clusters contain genes with related functions, selecting all of the genes in a cluster for BioBase analyses may introduce some selection bias.
Clustering of SNPs whose allele frequencies display nominally significant differences between successful quitters and those who were not successful provides a major preplanned signal that lies at the core of the analyses used herein. We would anticipate the observed highly significant clustering of SNPs that display nominally positive results in this and several additional independent samples if many of these positive SNPs lay near and were in linkage disequilibrium with functional allelic variants that distinguished subjects who were more able to quit smoking from those who were less able to quit. We would not anticipate this degree of clustering if the results were due to chance. The Monte Carlo P values noted here are thus likely to receive contributions from both the extent of linkage disequilibrium among the clustered, nominally positive SNPs and the extent of linkage disequilibrium between these SNPs and the functional haplotype(s) that lead to associations with quit success. These Monte Carlo P values thus weigh against two null hypotheses: (a) that all of the results are random “noise” (Monte Carlo P values for clustering data from the current study alone) and (b) that the results are caused by stochastic differences in haplotype frequencies between the successful versus unsuccessful quitters (Monte Carlo P value for clustering data from the current versus prior quit success GWA studies).
The current work has thus identified a set of SNPs that, based on Bayesian network analyses and overlap with prior data sets, are likely to identify a network of SNPs and genes with true biological relationships. Indeed, the genes identified in the current and prior smoking cessation studies are overrepresented in specific GO categories (Table 3). Most of these genes are expressed in the brain, as we might expect for addiction-related traits. Many can be related to neurotransmission processes, as we again might expect for such traits. Although the large number of genes identified in this work precludes detailed discussions of each gene, it is especially interesting to note the substantial representation of “cell adhesion”–related genes among those likely to contain allelic variants that associate with the ability to quit smoking. These genes include DAB1, ASTN, CTNNA2, FHIT, SLIT3, MAGI2, SEMA3A, CSMD1, PTPRD, GPC5, GPC6 and CDH13 (30). It is also interesting that the GO results point to several kinds of biological processes of importance for development of and function of selected brain circuits. We could speculate that variations in such genes could influence brain development, alter basal or preexisting behavioral traits and thus indirectly influence smoking cessation (31).
The current data add appreciably to the increasingly robust sets of studies that document molecular genetic contributions to the ability to quit smoking. The present results add to the support for personalized approaches to smoking cessation treatment that come from recent analyses of single genotype–based scores for each of these subjects (14). In this work, abstinence varied on the basis of individual and/or interactive effects of genotype score, nicotine dose and baseline level of nicotine dependence in predicting the degree to which participants were able to reduce smoking during a two-week precessation treatment with NRT. We need to continue to work to apply an integrated sum of SNPs in the context of appropriate clinical information (http://www.genome.gov/27529204) to match individuals with the best type and/or intensity of therapy to maximize benefits and minimize side effects in smoking cessation. One current stepped-care approach based on these aggregate data might entail the following: (a) initial use of NRT, with assignment of nicotine dose based on dependence level and quit success genotype scores, (b) identification of individuals who do not reduce CO sufficiently during initial NRT and (c) prompt reassignment of such non–CO reducers to alternative therapies, such as bupropion or varenicline.
More precise information about genetic influences on the ability to quit smoking from these and prior data sets will aid us in constructing improved “quit success” genotype scores. In subsequent studies, for example, we can test whether the quit success scores in which data from SNPs are selected and weighted by P values (14) perform better or worse than quit success scores in which data from SNPs are selected and weighted on the basis of participation in Bayesian networks, such as those documented here. It is conceivable that such scores may also help us to assess the genetic determinants of generalized abilities to change other health-related behaviors. For both dependent individuals and individuals with other health problems that can be modified through behavior change, these data might thus add to an increasingly rich basis for improved understanding and for development of personalized treatment strategies.
This study was supported by the National Institutes of Health (NIH) Intramural Research Program, National Institute on Drug Abuse, Department of Health and Human Services (GRU) and by a grant to Duke University (JE Rose, principal investigator) from Philip Morris USA (Richmond, VA, USA). The funders had no role in the planning or execution of the study, data analysis or publication of results. We are grateful for help from Joseph E. Herskovic, PhD, Eric C. Westman, MD, Qing-Rong Liu, PhD, and Donna Walther, MS. The underlying clinical trial was registered with clinicaltrials.gov (ID# NCT00734617). This study used BioBase (http://biobase-international.com), installed on the Helix System at the Center for Information Technology (CIT), National Institutes of Health, Bethesda, Maryland (http://helix.nih.gov).
GR Uhl and JE Rose are listed as inventors for a patent application filed by Duke University that specifies sets of genomic markers that distinguish successful quitters from unsuccessful quitters in data from other clinical trials. MF Ramoni has financial interest in BayesWare LLC.
Online address: http://www.molmed.org