A genetic contribution to smoking behavior is well established. To identify loci that increase the risk for smoking behavior, many genomewide linkage scans have been performed using various smoking behavior assessments. Numerous putative susceptibility loci have been identified, but only a few of these were replicated in independent studies.
We used genome search meta-analysis (GSMA) to identify risk loci by pooling all available independent genome scan results on smoking behavior. Additionally, to minimize locus heterogeneity, subgroup analyses were carried out for smoking behavior assessed by the Fagerstrom Test for Nicotine Dependence (FTND) and by the maximum number of cigarettes smoked in a 24-hour period (MaxCigs24). Samples of European ancestry were also analyzed separately.
A total of 15 genome scan results were available for analysis, including 3404 families with 10,253 subjects. The primary GSMA across all smoking behavior assessments identified genomewide suggestive linkage on chromosome 17q24.3–q25.3 (PSR = 0.001). A secondary analysis of FTND in European-ancestry samples (625 families with 1878 subjects) detected genomewide suggestive linkage in 5q33.1–5q35.2 (PSR = 0.0076). Subgroup analysis of MaxCigs24 (966 families with 3273 subjects) identified genomewide significant linkage in 20q13.12–q13.32 (PSR = 0.00041, POR = 0.048). This region contains CHRNA4, a strongly supported candidate gene for nicotine dependence (ND).
The regions identified in the current study deserve close attention and should be helpful for future candidate gene identification or targeted resequencing studies.
Cigarette smoking is highly prevalent in many populations around the globe. Despite increasing awareness of the risks associated with smoking, the World Health Organization (1999) estimated that 1.1 billion people still smoke, and predicted that by 2025 the number will increase to 1.6 billion worldwide. Understanding the various factors that influence smoking behavior is therefore critical to the prevention and cessation of smoking. Although the etiology of smoking behavior is complex, a genetic contribution, presumably based on addiction to nicotine, is well established from twin and adoption studies (1, 2). Genetic linkage analysis is a useful design for detecting genetic effects that segregate in families, including common variants, multiple rare variants within one locus, and copy number variation.
More than 20 genomewide linkage scans on smoking behavior have been performed using a variety of smoking behavior assessments. These include DSM-IV defined nicotine dependence, the Fagerstrom Test for Nicotine Dependence (FTND) (3, 4), the Fagerstrom Tolerance Questionnaire (FTQ), Fagerstrom-derived smoking quantity (SQ) and Heaviness of Smoking Index (HSI), habitual smoking, persistent smoking, maximum number of cigarettes smoked in a 24-hour period (MaxCigs24), and others (5–24). Numerous putative susceptibility loci have been identified, but only a few of these have been replicated in independent studies, which is not uncommon for linkage analysis of genetically complex traits. Complex traits are likely to involve many risk loci of low-to-moderate effect, and each study has a relatively small sample size. Discrepancies among study results are therefore expected, because a single study may be statistically underpowered to detect a real but weak linkage signal. Although literature reviews have provided valuable overviews of progress in linkage studies on smoking behavior, they are not intended to provide a formal statistical assessment of the pooled evidence for linkage across studies. Given the quantity of genome scan results on smoking behavior now available, a rigorous statistical synthesis of the reported linkage results could provide a powerful approach to detecting previously unappreciated linkage signals.
The genome search meta-analysis (GSMA) method has been proposed as a valid and robust meta-analysis technique for combining the evidence for linkage across multiple linkage scans using a non-parametric ranking method (25, 26). GSMA has greater power to detect small but consistent evidence for linkage. It can also combine linkage results from studies with different family structures, marker sets, and statistical analysis methods. While a unique genetic spectrum might characterize each specific smoking behavior, we hypothesize that some risk loci are shared across different assessments. Consequently, the aim of the current study is to use GSMA to identify potential risk loci that are shared across distinct smoking behavior assessments, by pooling all independent genome scan results on smoking behavior. Increased sample homogeneity can reduce locus heterogeneity and thereby increase power to detect regions specific to a more narrowly defined sample set. For this reason, subgroup GSMA based on FTND and MaxCigs24 was also carried out. Samples consisting mostly of subjects of European ancestry were also analyzed separately.
To identify existing genomewide linkage studies on smoking behavior, we conducted a computerized literature search of the PubMed database using the following keywords and subject terms: ‘linkage’, ‘smoking’, ‘nicotine dependence’, ‘genome-wide’, or ‘genomewide’. Review articles on the genetics of smoking behavior were also screened. The genome scans included in the current GSMA were required to meet the following criteria: 1) a whole genome linkage scan of smoking-related traits performed in humans; 2) whole genome linkage results either available from the original investigators or extractable from published graphs; 3) samples independent of those used in the other included genome scans. When the same sample had been analyzed repeatedly with different statistical methods or phenotype measures, we included only one of these analyses, chosen from those for which whole genome linkage results were available. When a study reported whole genome linkage results on different samples, we treated each sample as a separate genome scan. When a study reported a two-stage analysis, only the original results were used and any follow-up studies in candidate regions were excluded, as GSMA requires a uniform distribution of markers across the genome (25).
In total, 20 studies (24 genome scans, 5428 families) were identified (5–24). Of these, 12 studies (15 complete genome scans, 3404 families) met the criteria and were included in the GSMA (5–16), as listed in Table 1. Eight studies (9 genome scans, 2024 families) were excluded for the following reasons (17–24).

- Four studies were repeated analyses of the Framingham Heart Study (6, 19–21). Of these, the result from Goode (6) was used because the whole-genome linkage result was available from the published graph.
- Three linkage analyses were performed on data from the Collaborative Study on the Genetics of Alcoholism (7, 17, 18). We used the result from Bierut (7) because the whole-genome linkage results were available from the authors.
- Two linkage analyses of two different assessments of smoking behavior (9, 22) were performed on the same sample from the Netherlands Twin Register. The result for “maximum number of cigarettes per day” was included (9).
- Results for two studies (three genome scans) were not available (23, 24).

In the earliest study, the authors carried out a two-stage genome scan. Only the stage 1 result was used, since stage 2 was a follow-up scan of candidate regions (5). In addition, two studies (15, 16) used the same sample of Finnish twin families but performed the genome scan on different assessments of smoking behavior (FTND and MaxCigs24). The result from Loukola (16) was included in the primary GSMA of smoking behavior and of FTND. The result from Saccone (15) was used only in the secondary GSMA of MaxCigs24. Consequently, the data for this analysis were collected from 12 studies (15 genome scans), including 3404 families with 10,253 individuals.
The relevant characteristics of each study included in the GSMA are summarized in Table 1. For each study, the following information was extracted: first author, journal, year of publication, ethnicity of the study population, number of families and subjects, phenotype definition, number of markers, linkage statistic, and software used for linkage analysis. If the genomewide linkage results were available from the published graphs, the linkage statistics required for GSMA were extracted from the figures using the graph-digitizing software g3data (http://www.frantz.fi/software/g3data.php). Otherwise, the authors were invited to contribute their whole genome linkage results, including marker names, genetic positions and linkage statistics for each marker. The authors of five studies, comprising seven genome scans (7, 13–16), provided the original linkage results. Genetic map positions and marker locations were unified based on the Marshfield genetic map.
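Graph digitizers recover data values from an image by calibrating each axis against two reference ticks and interpolating linearly between them. The sketch below illustrates only that general idea. It rests on several assumptions: linear axes, hypothetical pixel coordinates, and the hypothetical names `AxisCalibration` and `digitize`. It does not reproduce g3data itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Two reference ticks on one axis: pixel_a -> value_a and pixel_b -> value_b."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear map through the two reference ticks (assumes a linear, not log, axis).
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def digitize(points, x_axis, y_axis):
    """Convert clicked (x_pixel, y_pixel) points into (position_cM, score) pairs."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points]


# Hypothetical calibration: x pixels 100 and 500 mark 0 and 200 cM;
# y pixels 400 and 100 mark LOD 0 and 3 (image y coordinates grow downward).
x_axis = AxisCalibration(100, 0.0, 500, 200.0)
y_axis = AxisCalibration(400, 0.0, 100, 3.0)
curve = digitize([(100, 400), (300, 250), (500, 100)], x_axis, y_axis)
print(curve)
```

The digitized (cM, score) pairs would then be mapped onto the common genetic map before binning.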
The GSMA method was used to synthesize the evidence for linkage across multiple genome scans. In the primary GSMA, the chromosomes were divided into bins of approximately equal length, traditionally ~30 cM. Based on the Marshfield genetic map, this gave a total of 118 autosomal bins. The notation “c.n” refers to the nth bin on chromosome c. Within each study, each bin was ranked by its highest LOD, NPL or Z score (or its minimum P-value). The bin with the strongest linkage evidence received rank 118, and the remaining bins were ranked in descending order of their evidence for linkage. The ranks were then summed across studies for each bin to obtain the summed rank (SR), which is the basic test statistic for linkage within the bin. Bins with high SR may show significant evidence for linkage.
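The binning and ranking steps can be sketched in Python as follows. This is a minimal illustration, not the GSMA software (27), and it makes several assumptions:

- the helper names and the toy chromosome lengths are hypothetical;
- the number of bins per chromosome is length/30 rounded to a whole number;
- empty bins tie at the bottom of the ranking.

P-values would be converted to -log10(P) first, so that a larger score always means stronger linkage. Tied scores receive average ranks, and the best bin receives rank N (118 for the full autosomal genome, 5 in the toy example).

```python
import math


def make_layout(chrom_lengths_cm, target_width=30.0):
    """Split each chromosome into n approximately equal bins of ~target_width cM."""
    layout = {}
    for chrom, length in chrom_lengths_cm.items():
        n_bins = max(1, round(length / target_width))
        layout[chrom] = (n_bins, length / n_bins)
    return layout


def bin_labels(layout):
    """All bin labels in genome order, using the 'c.n' notation."""
    return [f"{c}.{i + 1}" for c, (n, _) in layout.items() for i in range(n)]


def bin_of(chrom, pos_cm, layout):
    n_bins, width = layout[chrom]
    idx = min(int(pos_cm // width), n_bins - 1)  # a marker at the chromosome end joins the last bin
    return f"{chrom}.{idx + 1}"


def average_ranks(values):
    """Rank ascending (1 = weakest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def within_study_ranks(markers, layout):
    """markers: (chrom, pos_cm, score) with larger score = stronger linkage.
    Each bin is scored by its best marker; the best bin gets rank N."""
    labels = bin_labels(layout)
    best = {label: -math.inf for label in labels}  # simplification: empty bins tie at the bottom
    for chrom, pos, score in markers:
        label = bin_of(chrom, pos, layout)
        best[label] = max(best[label], score)
    return dict(zip(labels, average_ranks([best[label] for label in labels])))


def summed_ranks(studies, layout):
    """SR for each bin: the sum of its within-study ranks across studies."""
    per_study = [within_study_ranks(markers, layout) for markers in studies]
    return {label: sum(r[label] for r in per_study) for label in bin_labels(layout)}


layout = make_layout({"1": 60.0, "2": 90.0})  # toy genome: 2 + 3 bins
study_a = [("1", 10, 0.5), ("1", 40, 2.1), ("2", 5, 1.0), ("2", 35, 0.2), ("2", 70, 1.0)]
study_b = [("1", 20, 3.0), ("1", 50, 0.1), ("2", 10, 0.4), ("2", 45, 2.2), ("2", 80, 0.3)]
print(summed_ranks([study_a, study_b], layout))
# {'1.1': 7.0, '1.2': 6.0, '2.1': 6.5, '2.2': 5.0, '2.3': 5.5}
```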
We used the GSMA software (27) to evaluate the significance of the SR empirically. Briefly, for each study, the observed rank values were randomly reassigned to the 118 bins, so that tied ranks within each study were incorporated in the null distribution. Bin ranks were summed across studies, and this procedure was repeated 10,000 times. The empirical P-value of the SR (PSR) was calculated as the proportion of simulated bins whose summed rank was equal to or larger than the observed value. An ordered-rank P-value (POR) was also calculated through the same permutation procedure. It is the proportion of replicates in which the simulated nth-highest summed rank (ordered rank, OR) was equal to or greater than the observed nth-highest summed rank. Simulation studies have shown that bins with significant P-values (P < 0.05) for both SR and OR are likely to represent true linkage signals (26). Applying a Bonferroni correction for multiple testing (assuming 118 independent bins), PSR values of 0.05/118 = 0.00042 and 1/118 = 0.0085 correspond to genomewide significant and suggestive evidence for linkage, respectively. These thresholds have been shown by simulation to be appropriate for GSMA (28). A P-value below the genomewide significant threshold is expected by chance once in 20 meta-analyses, and a P-value below the genomewide suggestive threshold is expected by chance once per meta-analysis.
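A minimal permutation sketch of PSR and POR follows. It assumes the within-study ranks have already been computed, and it is not the GSMA software (27). The function name, seed and toy ranks are hypothetical.

- **PSR:** simulated SR values are pooled over all bins. This is valid here because shuffling each study's ranks uniformly gives every bin the same null distribution.
- **POR:** within each replicate, the nth-highest simulated SR is compared with the nth-highest observed SR.

```python
import random
from bisect import bisect_left


def gsma_permutation(ranks, n_reps=10_000, weights=None, seed=2024):
    """Permutation P-values for summed ranks (SR) and ordered summed ranks (OR).

    ranks[s][b] is the within-study rank of bin b in study s. Each replicate
    shuffles every study's observed ranks across bins, so tied ranks are kept.
    """
    n_studies, n_bins = len(ranks), len(ranks[0])
    w = weights if weights is not None else [1.0] * n_studies
    observed = [sum(w[s] * ranks[s][b] for s in range(n_studies)) for b in range(n_bins)]
    observed_desc = sorted(observed, reverse=True)

    rng = random.Random(seed)
    pooled = []             # simulated SR values from every bin of every replicate
    or_hits = [0] * n_bins  # or_hits[n]: replicates whose (n+1)th-highest SR >= observed
    for _ in range(n_reps):
        sim = [0.0] * n_bins
        for s in range(n_studies):
            shuffled = list(ranks[s])
            rng.shuffle(shuffled)
            for b in range(n_bins):
                sim[b] += w[s] * shuffled[b]
        pooled.extend(sim)
        for n, value in enumerate(sorted(sim, reverse=True)):
            if value >= observed_desc[n]:
                or_hits[n] += 1

    pooled.sort()
    total = len(pooled)
    # PSR: share of pooled simulated SR values that are at least the observed SR.
    p_sr = [(total - bisect_left(pooled, obs)) / total for obs in observed]
    # POR: the bin holding the (n+1)th-highest observed SR gets or_hits[n] / n_reps.
    p_or = [0.0] * n_bins
    order = sorted(range(n_bins), key=lambda b: observed[b], reverse=True)
    for n, b in enumerate(order):
        p_or[b] = or_hits[n] / n_reps
    return observed, p_sr, p_or


# Toy input: 3 studies x 5 bins; every study ranks bin 0 highest.
ranks = [
    [5, 1, 2, 3, 4],
    [5, 2, 1, 4, 3],
    [5, 3, 4, 1, 2],
]
observed, p_sr, p_or = gsma_permutation(ranks, n_reps=2000)
print(observed)
print([round(p, 4) for p in p_sr], [round(p, 4) for p in p_or])
```

The toy call uses fewer replicates than the full analysis. With 15 studies, 118 bins and 10,000 replicates, the pure-Python loop performs roughly 18 million additions, so it is slow but workable.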
We performed both unweighted and weighted GSMA. In the unweighted analysis, each study was assumed to contribute equally. The weighted GSMA takes into account the relative contribution of each study. The most appropriate weighting factor is not obvious; simulation studies have shown that the square root of the number of affected cases in each study performs well (26). Most of the phenotypes included in this GSMA were quantitative traits (FTND, MaxCigs24) rather than binary outcomes. We therefore used the square root of the number of genotyped subjects in each study as the primary weighting factor, and we report those results in detail. To evaluate the influence of the weighting factor on the results, we also used an alternative weighting factor: the square root of (number of pedigrees × number of markers) in each study. This alternative is an imperfect approximation of each study's information content, because information content varies across markers and diminishes beyond a marker set that achieves genomewide coverage.
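The two weighting factors described above can be sketched as follows. Rescaling the weights to average 1 is an assumed convention that keeps weighted SRs on the same scale as unweighted ones. Because rescaling multiplies every observed and simulated SR by the same constant, it does not change permutation P-values. The study sizes below are hypothetical, not those in Table 1.

```python
import math


def normalized(raw):
    """Scale weights to average 1 (assumed convention; permutation P-values are unaffected)."""
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weights_by_subjects(n_subjects):
    """Primary factor: square root of the number of genotyped subjects per study."""
    return normalized([math.sqrt(n) for n in n_subjects])


def weights_by_pedigrees_and_markers(n_pedigrees, n_markers):
    """Alternative factor: square root of (number of pedigrees x number of markers)."""
    return normalized([math.sqrt(p * m) for p, m in zip(n_pedigrees, n_markers)])


def weighted_summed_ranks(ranks, weights):
    """Weighted SR for each bin: sum over studies of weight x within-study rank."""
    n_bins = len(ranks[0])
    return [sum(w * study[b] for w, study in zip(weights, ranks)) for b in range(n_bins)]


# Hypothetical studies with 400, 100 and 900 genotyped subjects.
w = weights_by_subjects([400, 100, 900])
print(w)  # [1.0, 0.5, 1.5]
ranks = [[3, 1, 2], [1, 3, 2], [2, 1, 3]]
print(weighted_summed_ranks(ranks, w))  # [6.5, 4.0, 7.5]
```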
A bin width of around 30 cM is the one most frequently used in GSMA, and simulation has shown it to be optimal (26). To detect weak linkage signals near the boundary between two bins, a shifted 30 cM bin GSMA was also applied. In this analysis, the bin boundaries were moved by 15 cM, so that each bin started at the midpoint of a bin used in the primary 30 cM analysis (29).
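The following sketch shows how a position maps to primary and shifted bins, assuming equal-width bins on a chromosome. The text does not say how the half-width pieces at the two chromosome ends are handled. Here they are assumed to be merged into the first and last shifted bins. The function names and the 90 cM chromosome are hypothetical. Markers at 25 and 35 cM fall in different primary bins but in the same shifted bin, which is the situation the shifted analysis targets.

```python
def primary_bin(pos_cm, length_cm, n_bins):
    """1-based index of the primary bin containing pos_cm (equal-width bins)."""
    width = length_cm / n_bins
    return min(int(pos_cm // width), n_bins - 1) + 1


def shifted_bin(pos_cm, length_cm, n_bins):
    """1-based index of the bin after shifting all boundaries by half a bin width.

    Assumption: the half-width end pieces are merged into the first and last
    shifted bins, leaving n_bins - 1 shifted bins.
    """
    width = length_cm / n_bins
    n_shifted = max(1, n_bins - 1)
    idx = int((pos_cm - width / 2) // width)
    return min(max(idx, 0), n_shifted - 1) + 1


length, n = 90.0, 3  # hypothetical chromosome: three 30 cM primary bins
for pos in (10, 25, 35, 80):
    print(pos, primary_bin(pos, length, n), shifted_bin(pos, length, n))
```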
The current GSMA combined a variety of smoking behavior assessments across different populations. Two major sources of heterogeneity therefore arise: the differing definitions of smoking behavior and the differing ethnicities of the samples. We postulated that some genes regulating smoking behavior might act independently of specific assessments and population groups. However, restricting the combined analyses to studies with similar ascertainment criteria, or to subjects from similar ethnic backgrounds, could increase power to detect linkage in regions where the relevant risk loci are specific to a trait or to a population. Therefore, subgroup analyses of the families assessed by FTND or MaxCigs24 were performed. Subjects of European ancestry were also analyzed separately. Other populations or phenotype groups had too few studies available for separate analysis.
First, we performed the primary 30 cM bin width GSMA over all independent genome scans on smoking behavior, encompassing 3404 families with 10,253 genotyped subjects. Figure 1 illustrates the weighted and unweighted PSR for all bins across the genome. Table 2 gives full details of the regions containing bins with nominal significance in the weighted analysis, together with the unweighted results for comparison. The strongest evidence for a smoking behavior risk locus was found on chromosome 17q24.3–q25.3 (bin 17.4). Here, suggestive evidence for linkage was achieved in both the unweighted (PSR = 0.002) and the weighted analysis (PSR = 0.001).
To identify loci that might increase susceptibility to a specific smoking behavior trait, we performed subgroup GSMA of the genome scan results for smoking behavior measured by FTND and by MaxCigs24. Five genome scan results were included in the analysis of FTND (1347 families with 3995 subjects). The weighted and unweighted PSR for all bins across the genome are illustrated in Figure 2. No region achieved genomewide significant or suggestive evidence for linkage, although six regions showed nominally significant evidence for linkage in the weighted analysis (Table 3). In the subgroup analysis of MaxCigs24, four genome scan results, all from populations of European ancestry, were included (966 families with 3273 subjects). The genomewide results of the weighted and unweighted analyses are illustrated in Figure 3. Genomewide significant linkage was identified in 20q13.12–q13.32 by both the weighted (PSR = 0.00041, POR = 0.048) and the unweighted analysis (PSR = 0.00032, POR = 0.037). Three regions (22q12.3–q13.32, 20p12.1–q13.12 and 17q24.3–q25.3) achieved genomewide suggestive evidence for linkage. Eleven regions on six chromosomes (2, 12, 16, 17, 20 and 22) had both PSR and POR below 0.05 and may therefore harbor risk loci for the MaxCigs24 trait (Table 4).
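The thresholds defined in the Methods can be applied mechanically to reported P-values, as in this sketch. It compares against the rounded thresholds stated in the text (0.00042 and 0.0085) using `<=`. This is an assumption, made because the reported P-values are themselves rounded. The helper name `classify_bin` is hypothetical. The example inputs are P-values reported in this article.

```python
# Thresholds as stated in the text (rounded values of 0.05/118 and 1/118).
GENOMEWIDE_SIGNIFICANT = 0.00042
GENOMEWIDE_SUGGESTIVE = 0.0085
NOMINAL = 0.05


def classify_bin(p_sr, p_or=None):
    """Return (evidence level, whether both PSR and POR are below 0.05)."""
    if p_sr <= GENOMEWIDE_SIGNIFICANT:
        level = "genomewide significant"
    elif p_sr <= GENOMEWIDE_SUGGESTIVE:
        level = "genomewide suggestive"
    elif p_sr < NOMINAL:
        level = "nominal"
    else:
        level = "not significant"
    both_below = p_or is not None and p_sr < NOMINAL and p_or < NOMINAL
    return level, both_below


print(round(0.05 / 118, 5), round(1 / 118, 4))  # 0.00042 0.0085
print(classify_bin(0.00041, 0.048))  # bin 20.3, MaxCigs24, weighted
print(classify_bin(0.001))           # bin 17.4, all smoking behavior, weighted
print(classify_bin(0.017))           # bin 11.1, all smoking behavior
```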
To increase homogeneity and evaluate population-specific linkage, samples of European (or mostly European) ancestry were analyzed separately. Because the primary GSMA of MaxCigs24 included only populations of European ancestry, a separate European-ancestry analysis was needed only for smoking behavior and FTND. Figures S1 and S2 in the Supplement illustrate the results across all bins for smoking behavior and FTND, respectively. Here, we compare the weighted results of the European-ancestry GSMA with those of the overall GSMA. For smoking behavior, 11 genome scan results were included (2486 families with 7270 subjects). Bin 17.4 still achieved suggestive linkage (PSR = 0.002). Three bins (11.1, 3.8 and 5.4) were no longer nominally significant, and two bins (22.2 and 5.6) became nominally significant (Table 2). For FTND, three genome scan results were included (625 families with 1878 subjects). Two bins (6.6 and 5.4) were no longer nominally significant, and two bins (5.7 and 5.6) became nominally significant (Table 3).
To identify weak linkage signals near the boundaries between bins, we re-analyzed the data using a shifted 30 cM bin GSMA starting at the midpoint of the bins used in the primary analysis. Figures S3, S4 and S5 in the Supplement illustrate the genomewide results of the shifted 30 cM GSMA for smoking behavior, FTND and MaxCigs24, respectively. Here, we report the bins that achieved genomewide suggestive evidence for linkage in the weighted analysis. For smoking behavior, bin 16.1 (16p13.2–16p12.1) achieved genomewide suggestive linkage in both the full sample (PSR = 0.0074) and the European-ancestry samples (PSR = 0.0085). Bin 5.6 (5q33.1–5q35.2) reached genomewide suggestive linkage for FTND in European-ancestry samples (PSR = 0.0076). For the MaxCigs24 trait, bin 22.1 (22q11.22–22q13.2) reached genomewide suggestive linkage (PSR = 0.0016).
Lastly, to evaluate how the choice of weighting factor could influence the results, we repeated the weighted analysis using the square root of (number of pedigrees × number of markers) as the weighting factor. The top-ranked bins and P-values generated by the two weighting factors remained close to each other. A Wilcoxon signed-rank test showed no significant difference (P > 0.7) in bin ranks between the two weighting factors for any of the traits investigated. Table S1 in the Supplement compares the top five bins identified for each trait under both weighting approaches.
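A standard-library sketch of a two-sided Wilcoxon signed-rank test of the kind described above is shown below. It makes the following choices:

- it uses the normal approximation;
- zero differences are dropped;
- tied absolute differences receive average ranks;
- the variance is tie-corrected.

The text does not say which implementation was used; it may, for example, have been an exact test. The input vectors are hypothetical bin ranks, not the values behind Table S1.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Normal approximation; zero differences are dropped, tied |differences|
    get average ranks, and the variance is tie-corrected.
    Returns (W_plus, z, two_sided_p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        t = j - i + 1
        tie_term += t ** 3 - t  # tie correction term for this group
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return w_plus, 0.0, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical ranks of six bins under the two weighting factors.
ranks_by_subjects = [118, 117, 116, 115, 114, 113]
ranks_by_peds_markers = [117, 118, 115, 116, 113, 114]
print(wilcoxon_signed_rank(ranks_by_subjects, ranks_by_peds_markers))
```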
The current GSMA, which included 3404 families with 10,253 subjects, identified many regions with varying degrees of evidence for linkage to smoking behavior. In the primary 30 cM GSMA of combined smoking behavior, genomewide suggestive linkage was detected at chromosome 17q24.3–q25.3. No bin reached genomewide significant evidence for linkage in this primary analysis. This may reflect relatively high genetic heterogeneity arising from the variety of smoking behavior assessments and sample ancestries. Only nominal significance was detected in the primary GSMA of FTND. However, genomewide suggestive linkage was observed in 5q33.1–5q35.2 when the bin boundaries were shifted by 15 cM in a secondary analysis of European-ancestry samples. Subgroup analysis of the MaxCigs24 phenotype identified numerous linkage signals. This might be attributable to improved power from the greater homogeneity of both phenotype and sample ancestry. For example, genomewide significant linkage was identified in bin 20.3 (20q13.12–q13.32), and the adjacent bin 20.2 (20p12.1–q13.12) showed suggestive linkage, providing further support for a true linkage signal in this region. The eleven regions with both PSR and POR < 0.05 suggest that some or all of them harbor risk loci for the MaxCigs24 trait.
A necessary follow-up step is to evaluate whether notable candidate genes map to the regions nominated by the current GSMA.

A strong candidate gene for nicotine dependence (ND), CHRNA4 (20q13.2–q13.3) (30–34), is located in bin 20.3 (20q13.12–q13.32), where genomewide significant linkage was reached in the primary GSMA of MaxCigs24. CHRNA4 encodes the α4 subunit of the nicotinic acetylcholine receptor. It is highly expressed in the central nervous system (CNS) and plays a major role in tolerance, reward, and the modulation of mesolimbic dopamine function, all of which are critical to the development of ND (35).

Two genes, PLEKHG1 (36) and OPRM1 (37), are located in bin 6.5 (6q23.2–q25.3), which ranks highest in the GSMA of FTND; the adjacent bin 6.6 ranks second highest. PLEKHG1 encodes a protein containing a pleckstrin homology (PH) domain and is expressed in the brain and peripheral nervous system (38). Variants in genes encoding PH-domain-containing proteins could affect the cell-signaling pathways that regulate neuronal plasticity, and thus could influence predisposition to ND. The µ-opioid receptor gene OPRM1 has been found to be associated with FTQ-defined nicotine dependence (37). It also plays a role in substance use and dependence across several drug classes (39–41).

Two other previously identified candidate genes, DRD4 (42, 43) and COMT (44), are located in bins 11.1 and 22.1, respectively. Each of these bins showed nominal significance in the primary GSMA of smoking behavior or MaxCigs24. Other well-known candidate genes, such as the NCAM1-TTC12-ANKK1-DRD2 gene cluster (45, 46) and DDC (47, 48), are not located in any of the identified chromosomal regions. This may indicate that the effect sizes of these genes are too small to be detected by the current GSMA.
Recent genomewide association studies (GWAS) have identified many more genes implicated in smoking behavior. The first GWAS of smoking, which used sample pooling and 2.4 million SNPs, suggested several novel genes possibly associated with ND (49). Among the top candidate genes in this first GWAS, five were located in GSMA-nominated bins: NRXN1 in bin 2.3 (2p22.1–p13.2), FTO in bin 16.2 (16p12.3–q12.2), GPSM3 in bin 20.2 (20p12.1–q13.12), TRPC7 in bin 5.6 (5q31.2–q34) and FBXL17 in bin 5.4 (5q14.1–q21.3). Another recent GWAS of smoking behavior (50), conducted in 840 European-ancestry subjects using ~380,000 SNPs, also identified genes possibly associated with smoking behavior. Five of these map within regions nominated by the GSMA: TBC1D22A in bin 22.2 (22q12.3–q13.32), PDE10A in bin 6.6 (6q25.3–q27), RDH11 in bin 14.3 (14q23.3–q31.1), CENTD3 in bin 5.6 (5q31.2–q34) and LEP in bin 7.5 (7q31.1–q34).
Several GWAS have consistently identified a region on chromosome 15q24 associated with smoking intensity or lung cancer (51–55). Candidate gene studies have also confirmed the association between variation in the CHRNA5, CHRNA3 and CHRNB4 gene cluster at 15q24 and different smoking behaviors (34, 56–59). We are not aware of any individual genomewide linkage scan that has reported genomewide suggestive or significant linkage in the 15q24 region. Our primary 30 cM GSMA did not detect any signal in 15q24. In the shifted 30 cM GSMA of smoking behavior in populations of European descent, bin 15.2 (15q21.1–q25.1), which covers 15q24, gave only PSR = 0.058. The current GSMA thus fails to provide linkage support for 15q24, a region that apparently harbors ND susceptibility genes. The lack of stronger evidence for linkage in this region has two possible explanations. The genetic effect may be small, so that the current GSMA lacks the power to achieve significant evidence for linkage there. Alternatively, heterogeneity among samples may obscure linkage loci with minor effects. This lack of linkage evidence is hard to reconcile with the consistency with which this region has been identified in GWAS.
We briefly discuss additional novel candidate genes that map to regions discovered by the GSMA. In particular, we focus on chromosome 17q24.3–q25.3 (bin 17.4). This region ranks highest in the meta-analysis of combined smoking behavior and was consistently nominated by the meta-analyses of FTND and MaxCigs24. According to the NCBI database (http://www.ncbi.nlm.nih.gov/), there are 253 genes in 17q24.3–q25.3. Among them we find two particularly promising candidate genes for smoking. One is G protein pathway suppressor 1 (GPS1), which suppresses G-protein and mitogen-activated signal transduction. Variants of this gene might influence the regulation of the dopamine signaling pathway and thereby be associated with smoking behavior. The other is suppressor of cytokine signaling 3 (SOCS3). A recent GWAS discovered a genomewide significant association of IL15 with smoking behavior in males (50). IL15 is an important cytokine that regulates T and natural killer cell activation and proliferation. Its genetic association with smoking may therefore point to a novel mechanism for nicotine dependence involving immune modulation through the IL15 pathway. Hence, it is reasonable to suspect that variants in SOCS3 might influence immune regulation through a cytokine signaling pathway.
One advantage of GSMA is its ability to confirm consistent evidence for linkage across studies. We compared the regions nominated in the current GSMA with regions previously identified as consistent across several independent genome scans. Genomic regions on chromosomes 9 (from 91.9 to 136.5 cM on the Marshfield map), 10 (62–158 cM), 11 (2–76 cM), and 17 (20–82 cM) have been replicated independently more often than other regions (1). However, the current GSMA did not provide strong evidence for these regions. GSMA is not used for exclusion mapping. The failure to show strong evidence of linkage for these regions therefore does not necessarily mean that they do not harbor risk loci for smoking behavior. The GSMA method is particularly useful for identifying regions that show weak but consistent evidence of linkage across multiple studies. It does not directly consider whether the individual linkage signals being aggregated have themselves reached genomewide significant or suggestive evidence for linkage. Nonetheless, some evidence from the current GSMA does support linkage on chromosomes 11 and 17. In the primary 30 cM GSMA of smoking behavior, bin 11.1 (0–29.6 cM) showed nominal significance (PSR = 0.017). Suggestive evidence for linkage was also achieved for bin 17.4 (95–126 cM), which is near the region (20–82 cM) most frequently reported on chromosome 17.
GWAS have greater power to detect small phenotypic effects of common variants and copy number variations (CNVs). An adequately powered linkage design, however, has the advantage of detecting diverse genetic effects that segregate in families, including common variants, multiple rare variants within one locus, and heritable CNVs. There is growing evidence for the role of rare variants and CNVs in psychiatric disorders (60, 61). Consensus regions discovered by linkage studies may therefore serve as a useful complement to the emerging GWAS approach in reconstructing the genetic architecture of psychiatric disorders. They may be especially useful in pinpointing causal rare variants that cannot be captured by common tag SNPs in the GWAS design. In conclusion, the current meta-analysis of 15 genome scans of smoking behavior has identified many regions showing evidence of linkage with smoking behavior. Known and novel candidate genes that map to the highly ranked regions are of particular interest. The regions identified in the current study therefore deserve close attention and should be helpful for future candidate gene identification or targeted resequencing studies.
This work was supported by the U.S. National Institutes of Health (career development award K01 DA024758 to B.Z.Y., and R01 DA12690 and R01 DA12849 to J.G.) and by the U.S. Department of Veterans Affairs (VA CT REAP and New England MIRECC Center). We express our appreciation to the following investigators who contributed their original linkage results to the current meta-analysis: Dr. Laura Jean Bierut; Dr. Kirk C. Wilhelmsen; Dr. Jaakko Kaprio and Dr. Scott F. Saccone. We would like to thank Dr. Cathryn Lewis for helping us understand the GSMA. We also acknowledge Dr. Paola Forabosco for providing valuable scripts for carrying out the GSMA. Finally, we thank Ms. Yingdi Xu for digitizing the linkage results from the published graphs.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.