|Home | About | Journals | Submit | Contact Us | Français|
The molecular mechanisms causing smoking‐induced health decline are largely unknown. To elucidate the molecular pathways involved in cause and consequences of smoking behavior, we conducted a genome‐wide gene expression study in peripheral blood samples targeting 18238 genes. Data of 743 smokers, 1686 never smokers and 890 ex‐smokers were available from two population‐based cohorts from the Netherlands. In addition, data of 56 monozygotic twin pairs discordant for ever smoking were used. One hundred thirty‐two genes were differentially expressed between current smokers and never smokers (P<1.2×10−6, Bonferroni correction). The most significant genes were G protein‐coupled receptor 15 (P<1×10−150) and leucine‐rich repeat neuronal 3 (P<1×10−44). The smoking‐related genes were enriched for immune system, blood coagulation, natural killer cell and cancer pathways. By taking the data of ex‐smokers into account, expression of these 132 genes was classified into reversible (94 genes), slowly reversible (31 genes), irreversible (6 genes) or inconclusive (1 gene). Expression of 6 of the 132 genes (three reversible and three slowly reversible) was confirmed to be reactive to smoking as they were differentially expressed in monozygotic pairs discordant for smoking. Cis‐expression quantitative trait loci for GPR56 and RARRES3 (downregulated in smokers) were associated with increased number of cigarettes smoked per day in a large genome‐wide association meta‐analysis, suggesting a causative effect of GPR56 and RARRES3 expression on smoking behavior. In conclusion, differential gene expression patterns in smokers are extensive and cluster in several underlying disease pathways. Gene expression differences seem mainly direct consequences of smoking, and largely reversible after smoking cessation. However, we also identified DNA variants that may influence smoking behavior via the mediating gene expression.
The molecular mechanisms causing smoking‐induced health decline are largely unknown. Associations between gene expression and smoking behavior have been observed in multiple tissues (airway epithelial cells, alveolar macrophages, leucocytes, lymphocytes, B cells, monocytes and in whole blood). With one exception (n=1240) (Charlesworth et al. 2010), all studies had relatively small sample sizes (n<200) (Spira et al. 2004; Heguy et al. 2006; Beane et al. 2007; Lodovici et al. 2007; Charlesworth et al. 2010; Pan et al. 2010; Beineke et al. 2012; Paul & Amundson 2014). Gene expression studies can help elucidating the molecular pathways involved in the etiology and consequences of smoking behavior.
Most studies explored differential gene expression patterns for smoking by comparing gene expression in current smokers with never smokers. Reversibility of gene expression due to smoking can be addressed when ex‐smokers are also included. Reversible genes are differentially expressed between current and ex‐smokers, but not between ex‐smokers and never smokers (Beane et al. 2007). If a gene shows a reversible gene expression pattern, this might suggest that the gene expression pattern is a reaction to smoking (a reactive gene expression). This does not have to be the case: the higher expression in the current smokers compared with non‐smokers could be the result of a higher genetic liability to smoking behavior, making a person more vulnerable to start smoking and continue smoking (causative gene expression).
When studying gene expression associations with smoking behavior in a cross‐sectional design, reactive and causative gene expression cannot easily be distinguished. By studying monozygotic (MZ) twin pairs discordant for smoking, reactive genes may be detected, as differential gene expression between a smoking MZ twin and the genetically identical non‐smoking co‐twin cannot be caused by differences in genetic liability, and the differential expression is therefore likely to be reactive to smoking (van Leeuwen et al. 2007). Studying expression quantitative trait loci (eQTLs) may identify causative gene expression. When differential expression between smokers and non‐smokers is associated with DNA variants that also influence smoking behavior, this influence may be mediated by gene expression (which is therefore likely to be causative for smoking).
In the present study, we analyze micro‐array gene expression measurements in peripheral blood in two Dutch cohorts (Jansen et al. 2014; Wright et al. 2014). First, we aim to identify genes with differential expression between 743 current smokers and 1686 never smokers. Second, we aim to determine the reversibility of the identified genes by comparing the gene expression levels of current and never smokers with the gene expression levels of 890 ex‐smokers. The third aim is to classify the expression of the identified genes as reactive or causative for smoking, using the MZ twins and eQTL lookups in the Tobacco and Genetic Consortium (TAG Consortium 2010) genome‐wide association (GWA) study for smoking behavior. This is the first large‐scale gene expression study for smoking in peripheral blood combining these different approaches.
Two projects supplied data for this study: the Netherlands Twin Register (NTR) (Willemsen et al. 2010) and the Netherlands Study of Depression and Anxiety (NESDA) (Spijker et al. 2004). Both studies were approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Center, Amsterdam [Institutional Review Board (IRB) number IRB‐2991 under Federalwide Assurance 3703; IRB/institute codes, NESDA 03‐183; NTR 03‐180], and all subjects provided written informed consent. The sample consisted of 3109 participants from NTR and 1962 from NESDA. The age ranged from 18 to 88years (mean 38, standard deviation 13), and 65 percent of the sample was female. Data were used of 2892 individuals (from 1433 families) from the NTR and 427 unrelated participants from NESDA. NESDA participants without data on blood counts (n=1535) were excluded. In addition, data of 56 MZ twin pairs discordant for smoking were available from the NTR. eQTL lookup was performed using data of 5071 subjects (3109 NTR and 1962 NESDA participants) for which both genotype and gene expression data were available.
Information on smoking behavior was obtained during the home visit for blood collection as part of the NTR biobank project (2004–2008). Participants were asked whether they were current smokers, ex‐smokers or never smokers. For current smokers, the number of cigarettes smoked per day was obtained, and for ex‐smokers, the time since quitting (in years). Subjects who exclusively smoked pipe or cigar were excluded. Data were verified by linking this information to survey data available from the longitudinal survey study of the NTR (Supporting Information Method A). For NESDA subjects, information on smoking was obtained during the day of blood sampling as part of the face‐to‐face baseline interview. Sex, age at the time of blood draw, body mass index (weight/height squared in kg/m2) at the time of blood draw, lymphocyte count, monocyte count, neutrophil count, eosinophil count, basophil count and blood hemoglobin (mmol/l) were extracted from NESDA and the NTR databases (Jansen et al. 2014; Wright et al. 2014). In addition, cotinine levels were available for 612 of 743 current smokers.
The blood sampling and RNA extraction procedures have been described in detail previously (Willemsen et al. 2010; Jansen et al. 2014; Wright et al. 2014). In short, for NTR, venous blood samples were drawn between 7 and 10am after an overnight fast. Within 20minutes of sampling, heparinized whole blood was transferred into PAXgene Blood RNA tubes (Qiagen, Valencia, Florida, USA) and stored at −20°C. The PAXgene tubes were shipped to the Rutgers University Cell and DNA Repository (http://www.rucdr.org). RNA was extracted using Qiagen Universal liquid handling system (PAXgene extraction kits from manufacturer's protocol). Total RNA was measured by spectroscopy (Trinean DropSense, GentBrugge, Belgium) to determine purity and concentration; RNA fidelity was measured by the (Agilent Bioanalyzer, Santa Clara, California, USA) analysis.
From NESDA subjects, venous whole‐blood samples were obtained between 8 and 10am, after overnight fasting. Between 10 and 60minutes after blood draw, 2.5ml of heparinized blood was transferred into PAXgene tubes. This tube was kept at room temperature for a minimum of 2hours and then stored at −20°C. Average time between blood sampling and RNA extraction was 113weeks. Total RNA was extracted according to the manufacturer's protocol (Qiagen) (Spijker et al. 2004).
Gene expression was assessed at the Rutgers University Cell and DNA Repository. Samples were randomly assigned to plates. For cDNA synthesis, 50ng of RNA was reverse transcribed and amplified in a plate format on a Biomek FX liquid handling robot (Beckman Coulter, Brea, California, USA) using Ovation Pico WTA reagents per the manufacturer's protocol (NuGEN, San Carlos, California, USA). Products purified from single‐primer isothermal amplification were then fragmented and labeled with biotin using Encore Biotin Module (NuGEN). Prior to hybridization, the labeled cDNA was analyzed using electrophoresis to verify the appropriate size distribution (Caliper AMS90, HT DNA 5K/RNA LabChip). Samples were hybridized to Affymetrix U219 array plates (GeneTitan, Affymetrix, Santa Clara, California, USA). The U219 array contains 530467 probes for 49293 transcripts. All probes are 25 bases in length and designed to be ‘perfect match’ complements to a designated transcript. Array hybridization, washing, staining and scanning were carried out in an Affymetrix GeneTitan System per the manufacturer's protocol.
Gene expression data were required to pass standard Affymetrix quality control metrics (Affymetrix expression console). Probes were removed when their location was uncertain or if their location intersected a polymorphic single‐nucleotide polymorphism (SNP), leaving 44241 probe sets for analysis. Expression values were obtained using robust multi‐array average normalization implemented in Affymetrix Power Tools (v 1.12.0). Data for samples that displayed an average Pearson correlation below 0.8 with the probe set expression values of other samples and samples with incorrect sex chromosome expression were removed.
1a. Differential gene expression between current smokers and never smokers
Linear mixed models were used to explore differences in gene expression patterns between current smokers (n=743) and never smokers (n=1686) while correcting for family relatedness (Visscher, Benyamin, & White 2004). For each of the 44241 probe sets, a mixed model was fitted with gene expression as dependent variable and smoking status (current versus never) as independent variable. Fixed‐effect covariates included in the final model were sex, age at the time of blood draw, body mass index, lymphocyte count, monocyte count, neutrophil count, eosinophil count, basophil count, average correlation between arrays, hemoglobin (mmol/l), cohort (NTR or NESDA), time of blood sampling, month of blood sampling, time between blood sampling and RNA extraction and the time between RNA extraction and RNA amplification. Random effects were plate, well, family ID and zygosity. Non‐significant covariates were not included in the final model: depression status (DSM‐IV depression yes/no), psychotropic medication (yes or no), alcohol use (≥12 glasses of alcohol per week versus <12 glasses per week or not drinking), education level (low, moderate or high), time between RNA amplification and RNA fragmentation and time between RNA fragmentation and RNA hybridization. Mixed models and P‐values were computed using the R function lmer from the package lme4. To correct for multiple testing, a Bonferroni correction was applied (P<0.05/44241=12−06). The genes differentially expressed between current and never smokers are called ‘smoking‐related genes’ throughout the rest of the manuscript.
1b. Pathway analyses of smoking‐related genes. We used Fisher's exact test to explore whether gene categories were enriched among the identified smoking‐related genes. We used gene categories from Gene Ontology Biological Process (GOBP), Kyoto Encyclopedia of Genes and Genome (KEGG) and Ingenuity Pathway Analysis (IPA). KEGG, GOBP and IPA have overlapping but also unique gene categories. GOBP does not contain disease gene categories, but KEGG and IPA do. GOBP covers many biological processes not covered by KEGG and IPA. KEGG contains strongly curated pathways, whereas IPA provides more broadly defined categories. IPA provides P‐values and corresponding false discovery rates (FDRs) for enrichment. For the GOBP and KEGG pathways, the statistical software package R was used. GOBP categories were retrieved using the R package org.Hs.eg.db (version 2.10.1). KEGG pathways were downloaded from http://www.broadinstitute.org/gsea/downloads.jsp (c2.cp.kegg.v4.0.entrez.gmt). The reference set and the gene set were defined separately for KEGG and GOBP analysis. In total, 9902 GOBP and 186 KEGG categories contained one or more genes measured by the U219 microarrays. The gene set with smoking‐related genes was tested for enrichment (separately for upregulated and downregulated genes) in categories with more than one gene overlap. The P‐value from the exact test is the chance that the overlap between the gene set and the gene category is not larger than for random gene sets of this size within the reference set. For the P‐values derived from these tests, the FDR was computed.
2. Reversibility of the associations between smoking and gene expression
In the ex‐smoking and never smoking groups, the percentage of women and the mean age is higher compared with that of the group of current smokers. Importantly, neutrophil (P=1.1×10−40), monocyte (P=1.7×10−34), lymphocyte (P=1.2×10−30) and eosinophil (P=9.4×10−14) counts differed between current smokers and never smokers. Blood subcell constitution has a major impact on whole‐blood gene expression, and thus, blood cell counts are major confounders when identifying smoking‐associated gene expression differences. The characteristics of the participants are summarized in Supporting Information Table S1.
The present study is the largest investigation of gene expression levels in peripheral blood of smokers and non‐smokers to date. The results revealed 132 differentially expressed genes in current smokers compared with non‐smokers. These smoking‐related genes were enriched for several disease‐related pathways. Expression of most of these genes is likely to be reactive to smoking as can be derived from integrating data from MZ twin and ex‐smokers. However, using eQTL analysis, we identified two SNPs (rs8058865 in GPR56 and rs10897430 in RARRES3) that may influence smoking behavior through intermediate gene expression.
Most previous studies investigating gene expression patterns in relation to smoking were carried out in relatively small samples (Spira et al. 2004; Heguy et al. 2006; Beane et al. 2007; Lodovici et al. 2007; Charlesworth et al. 2010; Pan et al. 2010; Beineke et al. 2012; Wright et al. in press). So far, 1 larger study is published including 1240 individuals (Charlesworth et al. 2010). They detected 323 genes (FDR 5 percent) whose expression levels in lymphocytes were significantly correlated with smoking behavior. In total, 54 of the 323 genes were also present in our list of 132 smoking‐related genes (=41 percent of our genes). In a smaller study (n=209 subjects+180 subjects for replication), a five‐gene predictive model for smoking status in whole blood was developed (Beineke et al. 2012), and three of the five genes (LRRN3, CLDND1 and LEF1) were also found in our list of 132 smoking‐related genes. This confirms that the differential gene expression in smokers is a robust finding. In contrast, there was no overlap between the genes expressed in airway epithelial cells of smokers versus non‐smokers and the 132 smoking‐related genes detected in our study using peripheral blood. This might point to tissue‐specific gene expression: smoking a cigarette might influence the expression of other genes in lung tissue compared with blood cells.
To rule out the possibility that the effects we found are merely due to difference in general health between current smokers and non‐smokers, we added a variable reflecting overall health status to the model. Betas and P‐values for smoking status were highly correlated between models with and without the health variable (Spearman's rho=0.99 and 0.91, respectively; data not shown), indicating that health variables are unlikely to explain the observed associations between smoking status and gene expression.
Pathway analyses of the 132 smoking‐related genes showed that both the upregulated and downregulated genes are involved in the immune response, the immune system and natural killer cell activation. It is well known that inflammatory cells produce a variety of mediators in response to smoking (Sopori 2002; Rom et al. 2013). It has been speculated that many of the health consequences of chronic inhalation of cigarette smoke might be due to its adverse effects on the immune system (reviewed by Rom et al. 2013). In addition, the downregulated genes were involved in blood coagulation, asthma and cardiac infarction, while the upregulated genes were part of cancer pathways. The expression of some well‐known cancer genes, like MYC and LEF1, was upregulated in smokers compared with never smokers. LEF1 codes for a lymphoid enhancer‐binding factor 1, which is located in the nucleus, and the protein encoded by MYC is a multifunctional, nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis and cellular transformation. Both MYC and LEF1 act as transcription regulators and are involved in many types of cancers including lung cancer (Nguyen et al. 2009; Dang 2012).
Most of the genes were reversible (71 percent) or slowly reversible (24 percent) after smoking cessation, and only six genes (4.5 percent) were irreversible. Of the six genes that were classified as irreversible, LEF1 is also associated with smoking in a previous study (Beineke et al. 2012). The other five genes are not reported in previous studies on smoking but need further investigation in order to explain the differential gene expression between ever smokers and never smokers. The genes classified as ‘slowly reversible’ included our top hits GPR15 and LRRN3. For GPR15, this is in line with a previous study, while LRRN3 was classified as both rapidly reversible (Wan et al. 2012) and slowly reversible (Beineke et al. 2012) in previous studies. GPR15 is a G protein‐coupled receptor that acts as a chemokine receptor for human immunodeficiency virus types 1 and 2. LRRN3 is a gene coding for a leucine‐rich repeat neuronal 3, and it is highly expressed in lymphocytes. Both genes were associated with smoking status in peripheral blood data sets (Wan et al. 2012; Paul & Amundson 2014), and GPR15 has been associated with cigarette smoking in three different epigenetic studies focusing on changes in DNA methylation (Breitling et al. 2011; Wan et al. 2012; Sun et al. 2013). More evidence could be obtained with longitudinal data investigating differences in gene expression levels within individuals who quit or start smoking.
The fact that we also observed a relation between the number of cigarettes smoked per day or cotinine levels and the level of gene expression suggests a dose–response relationship. The pattern was observed for most genes but was significant for the expression of 28 genes with cigarettes per day and 10 genes with cotinine. This dose–response relationship is confirmed in the ex‐smokers in whom the gene expression levels of long‐term quitters were often closer to those of non‐smokers than those of short‐term quitters.
In order to assess the relative importance of environmental versus genetic sources, we compared the gene expression patterns in MZ twin pairs discordant for smoking. The MZ twin design is a powerful design. A paper of Haque, Gottesman & Wong (2009) reviewed several studies that showed substantial epigenetic differences in MZ twin pairs discordant for psychiatric phenotypes. In smoking research, only one small study with nine MZ twin pairs discordant for smoking reported several genes in which the expression differed in smokers and their non‐smoking co‐twins, but none of these genes overlapped with our top results. Of the 132 smoking‐related genes, six genes were differentially expressed in the MZ pairs consisting of a current smoking twin and a twin who never smoked: GPR15, PF4, TTC38, ACO63977.1, ALAS2 and EIF2AK1 (FDR<5 percent for 132 tests). Gene expression of the six genes was categorized as reversible (n=3) or slowly reversible (n=3). Those genes can be considered to show reactive gene expression. The other 126 genes might also have reactive gene expression, but considering the small sample size, the power might be limited. For most of the 132 smoking‐related genes (75 percent), the gene expression effects (upregulated or downregulated) in the MZ twin pairs were in the same direction as the effects in the total population. This suggests that cigarette smoking influences gene expression.
In addition, we identified two SNPs (eQTLs of the smoking‐related genes GPR56 and RARRES3), which were positively associated with the number of cigarettes per day in the large meta‐analyses of the TAG Consortium. Both cis‐eQTLs were also found in another eQTL study in whole blood (Westra et al. 2013). In our sample, these SNPs were both significantly downregulated in current smokers. GPR56 is a member of the G protein‐coupled receptor family. GPR56 has been shown to have numerous roles in cell guidance/adhesion as demonstrated by its roles in tumor inhibition and neuron development (Fève et al. 2014). RARRES3 codes for the retinoic acid receptor responder protein 3. RARRES3 is thought to act as a tumor suppressor or growth regulator. A recent study showed that loss of function of RARRES3 in estrogen receptor‐negative breast cancer cells stimulates their invasive capacity and promotes metastasis to the lung. It is remarkable that both genes are associated with tumor inhibition. Previous studies have demonstrated that the well‐known association between cancer and smoking is largely explained by causal effects of smoking (Lee & Hashibe in press), but these results surprisingly suggest an additional role for pleiotropy (same genes influencing both smoking and cancer risk). The TAG consortium GWA study P‐values for these two SNPS were not very low and need replication in future studies.
Our study was carried out in blood. Although effects of smoking on gene expression might be present in other tissues too, like airway epithelial cells, peripheral blood appears to be a good surrogate tissue for investigating the effect of smoking on gene expression as it expresses a large proportion of the genes encoded in the human genome. Comparison of peripheral blood transcriptome with genes expressed in nine different human tissue types revealed that over 80 percent of gene expression was shared with any given tissue (Liew et al. 2006). The current study showed that it is possible to identify gene expression sites and eQTLs with demonstrable criterion validity for smoking behavior in peripheral blood draws.
Another important note is that the current study focused on cigarette smoking; we did not include questions on other ways to take in nicotine (like e‐cigarettes or water pipe) or on cannabis use (often smoked) in the biobank study. It should be noted that this information is available from the longitudinal survey study of the NTR for a subsample, but especially with gene expression studies, it is crucial that the information is collected on the same time as the blood collection.
Lastly, it should be noted that persons who have quit a long time ago might be healthier than those who quit recently. We did not ask for the reasons of quitting, so this might represent a mix of health and non‐health related reasons. However, when we corrected the models for self‐reported health, the health variable did not explain the observed associations between smoking and gene expression.
In conclusion, our results suggest that cigarette smoke causes differential gene expression. The differentially expressed genes play a role in several disease pathways including cancer. Most smoking‐related gene expression seem reversible after smoking cessation. In addition, we found two genetic variants influencing gene expression and making subjects vulnerable for smoking behavior. The current results are an important step to provide insights into the association between smoking behavior and differential gene expression.
Figure S1 (A) The effect (beta) of the comparison between current and never on the x‐axis, with the effect (beta) of the comparison between current and ex on the y‐axis for the 132 candidate genes. (B) The effect (beta) of the comparison between current and never on the x‐axis, with the effect (beta) of the comparison between ex and never on the y‐axis for the 132 candidate genes. Effect is in the same direction when dots are located in the upper right square of the plot (both positive) or in the lower left square of the plot (both negative), which is the case for 100 percent of the genes in (A) and for 80 percent of the genes in (B).
Figure S2 (A) The effect (beta) of the regression of CPD on gene expression levels on the y‐axis, with the effect (beta) of the comparison of gene expression levels between current and never on the x‐axis for the 132 smoking‐related genes. (B) The effect of the regression of time since quitting on gene expression levels on the y‐axis with the effect (beta) of the comparison of gene expression levels between current and never on the x‐axis for the 132 smoking‐related genes. (C) The effect of the comparison of gene expression levels of the 132 candidate genes between monozygotic twin pairs discordant for current smoking on the x‐axis, with the effect of the comparison between current and never smokers in the total population on the y‐axis.
Table S1 Characteristics of the study sample consisting of smokers, ex‐smokers and never smokers.
Table S2 Overview of the 132 smoking‐related genes.
Table S3 Pathway analyses for the 132 smoking‐related genes.
Supporting info item
Supporting info item
Supporting info item
Supporting info item
Supporting info item
Supporting info item
Supporting info item
Supporting info item
Supporting info item
Supporting info item
This work was supported by the Netherlands Organization for Scientific Research (MagW/ZonMW grants 904‐61‐090, 985‐10‐002, 904‐61‐193, 480‐04‐004, 400‐05‐717 and 912‐100‐20; Spinozapremie 56‐464‐14192; Geestkracht program grant 10‐000‐1002); the Center for Medical Systems Biology (NWO Genomics), Biobanking and Biomolecular Resources Research Infrastructure, VU University's Institutes for Health and Care Research and Neuroscience Campus Amsterdam, NBIC/BioAssist/RK (2008.024); the European Science Foundation (EU/QLRT‐2001‐01254); the European Community's Seventh Framework Program (FP7/2007‐2013); ENGAGE (HEALTH‐F4‐2007‐201413); and the European Research Council (ERC 284167 and ERC 230374). Gene expression data were funded by the US National Institute of Mental Health (RC2 MH089951) as part of the American Recovery and Reinvestment Act of 2009. R.J. was supported by the Biobank‐based Integrative Omics Study (BIOS) consortium, which is funded by the Biobanking and Biomolecular Research Infrastructure (BBMRI‐NL, NWO project 184.021.007).
Vink J. M., Jansen R., Brooks A., Willemsen G., van Grootheest G., de Geus E., Smit J. H., Penninx B. W., and Boomsma D. I. (2017) Differential gene expression patterns between smokers and non‐smokers: cause or consequence?. Addiction Biology, 22: 550–560. doi: 10.1111/adb.12322.