III.A. Analytical Approach
III.A.1 Smoking Genetics and Genetic Instruments
As discussed above, previous studies have used cigarette prices/taxes and other area-level instruments for smoking. In this study, we use a new source of variation in smoking that is due to genetic predisposition. There is convergent evidence of strong links between specific genetic factors and cigarette smoking. Smoking behaviors are well known to involve a complex etiology of genetic, social and economic factors (Li, 2006; Tyndale, 2003). Many twin and adoption studies have demonstrated that genetic heritability is at least 50% for both smoking initiation and smoking persistence (Carmelli et al., 1992; Heath and Martin, 1993; Lessov et al., 2004; Maes et al., 2004). Researchers have identified several variants (SNPs) in nicotine, detoxification and neurotransmitter genes that are significantly correlated with several smoking behaviors, including smoking initiation, smoking intensity (such as cigarettes per day and other intensity measures), and smoking cessation.4
Of course, non-genetic factors including economic, social, human capital and environmental factors such as prices, income, education, availability/supply of cigarettes, cigarette advertising, peer behavior and social network effects, and culture also affect smoking decisions. Nonetheless, there is strong and consistent evidence that various types of smoking behaviors are influenced by genetic factors. This clearly does not negate the importance of the economic, social and other non-genetic effects, but rather highlights the complex etiology that underlies smoking behaviors and that likely involves several genetic and non-genetic factors acting independently or interactively with each other. Furthermore, typically evaluated economic, social and demographic factors generally explain 10% or less of the variation in smoking behaviors (e.g., Cheng and Kenkel, 2010). The large percentage of variation in smoking that is unexplained by these factors is consistent with the suggested involvement of genetic factors in smoking behaviors.
The increasingly available data on specific genetic contributors to risk behaviors such as smoking provide a wealth of information for economists and other social scientists who wish to study the effects of these risk behaviors on outcomes (Fletcher, 2011; Wehby et al., 2008). Specifically, genetic variants may be utilized as instruments in order to account for the endogenous selection of risk behaviors. Other disciplines such as epidemiology have also begun to embrace this approach, referred to as “Mendelian randomization” in the epidemiology literature (Lawlor et al., 2008).
Genetic variants provide several advantages over traditional instruments for behaviors, which have ranged from individual-level enabling factors (such as income or employment) to area-level characteristics related to prices or policies. There are different types of DNA variants, but the most commonly studied are DNA base-pair variants, called single nucleotide polymorphisms (SNPs). Each SNP has two variants, called alleles (allele A or B). For each SNP, an individual may have a homozygote combination of either allele (AA or BB) or a heterozygote combination (AB). At the time of germ cell generation – a process called meiosis – each of a parent's two alleles has an equal 50% chance of being transferred. In other words, the chance of inheriting either allele from a heterozygote parent is 50%.
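The inheritance rule described above can be illustrated with a short calculation. The following is a minimal sketch, assuming a single biallelic SNP with alleles labeled A and B and independent transmission from each parent; the function name `offspring_genotype_probs` is hypothetical and is not part of the study's analysis.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1: str, parent2: str) -> dict:
    """Return genotype probabilities for a child of two parents.

    Each parent genotype is a two-character string over the alleles
    "A" and "B" (for example "AA", "AB", or "BB"). Each parent passes on
    one of its two alleles with probability 1/2, independently of the
    other parent (an assumption of this sketch).
    """
    probs = {}
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "BA" and "AB" count as the same heterozygote.
        genotype = "".join(sorted(allele1 + allele2))
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


# Two heterozygote parents: the child's genotype is random.
print(offspring_genotype_probs("AB", "AB"))
# Two AA-homozygote parents: the child is AA with certainty.
print(offspring_genotype_probs("AA", "AA"))
```

The second call shows why random transmission within a family does not give every individual in the population the same genotype probabilities: the distribution depends on the parental genotypes.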
Note that this “random” allele inheritance process does not imply that all individuals have the same probability of inheriting a certain allele. For example, a child born to AA-homozygote parents has a 100% chance of inheriting the AA genotype.5 Nonetheless, the process of random assignment of alleles from each parent to their child over time suggests that genetic variants that are related to the risk behavior of interest (smoking in this study) are unlikely to be correlated with unobserved behaviors that are correlated with this risk behavior through preferences for health and risk taking, if these variants do not directly affect these unobserved behaviors or the unobserved preferences. Studies have found evidence of this randomizing effect of genetic instruments (Smith et al., 2007). Another advantage of genetic instruments is that they precede all behaviors in time and cannot be affected by behaviors. In contrast, other instruments (including area-level instruments such as cigarette prices or public smoking policies) may be affected by behaviors if individuals choose where to live based in part on their behaviors and area characteristics, and if population behaviors affect these characteristics (such as if smoking rates are a factor in policy decisions on cigarette taxes and policies).
Of course, violations of the exogeneity of genetic instruments may occur due to certain factors, as has been previously highlighted (von Hinke Kessler Scholder, 2011). First, genetic variants in physical proximity on the same chromosome may be correlated with each other, a phenomenon called linkage disequilibrium. If genetic instruments of interest are correlated with other genetic variants that are associated with unobserved confounders, such as other risk behaviors correlated with smoking, then the genetic instruments become endogenous (Lawlor et al., 2008). This threat to instrument exogeneity may be evaluated when suspected by studying the correlations between the genetic instruments and neighboring SNPs that are suspected of being related to unobserved confounders. Also, some genes have multiple functions and may affect several relevant behaviors and risk factors, which, if unobserved, may violate the instrument exogeneity. Therefore, it is important to understand the gene functions and their implications for the instrument validity. Other violations may occur due to compensatory effects for genetic risks, although this is unlikely to be relevant for most applications aimed at studying behavior effects (von Hinke Kessler Scholder, 2011).
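As an illustration of how the correlation between an instrument and a neighboring SNP might be examined, the sketch below computes the squared Pearson correlation between two genotype vectors coded as minor-allele counts (0, 1, or 2). This genotype-based correlation is only a simple proxy for linkage disequilibrium that does not require haplotype phase; it is an assumption of this example and not necessarily the measure used in the study. The function name and the toy data are hypothetical.

```python
def genotype_r_squared(snp_x, snp_y):
    """Squared Pearson correlation between two genotype dosage vectors.

    Each vector holds one minor-allele count (0, 1, or 2) per individual,
    with individuals in the same order in both vectors.
    """
    n = len(snp_x)
    if n != len(snp_y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(snp_x) / n
    mean_y = sum(snp_y) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snp_x, snp_y))
    sxx = sum((x - mean_x) ** 2 for x in snp_x)
    syy = sum((y - mean_y) ** 2 for y in snp_y)
    if sxx == 0 or syy == 0:
        raise ValueError("a SNP with no variation has an undefined correlation")
    return (sxy * sxy) / (sxx * syy)


# Toy data (hypothetical): an instrument SNP and a neighboring SNP.
instrument = [0, 1, 2, 1, 0, 2, 1, 0]
neighbor = [0, 1, 2, 1, 0, 1, 1, 0]
print(genotype_r_squared(instrument, neighbor))
```

A value near 1 would indicate that the instrument is nearly interchangeable with the neighboring SNP, so any confounding pathway through that neighbor would carry over to the instrument.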
Another potential violation of instrument exogeneity may occur due to non-random matching within couples based simultaneously on genetic and non-genetic factors (behavioral, economic, and social factors) that may contribute to their children’s behaviors. If smokers are more likely to marry smokers due to shared preferences (including preferences for risk taking, future discounting and health importance), and if parental smoking increases children’s propensity to smoke (due for example to the availability of cigarettes in the household and the “acceptability” of smoking in the family), then this does not violate the exogeneity of genetic instruments by itself. In other words, as long as such preferences are not caused by the genetic instruments, the instruments remain exogenous even if matching occurs based on these preferences. However, if smokers are more likely to intermarry due to the genetic risk factors that predispose to smoking, which is less expected compared with matching based on preferences that affect smoking, then this would violate the exogeneity of the genetic instruments. A standard test for Hardy-Weinberg Equilibrium (HWE) can be employed to evaluate whether the proportions of SNP genotypes satisfy the expected distribution based on random mating (Wigginton and Abecasis, 2005), which we employ as described below.
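To make the logic of an HWE check concrete, the sketch below implements the textbook chi-square goodness-of-fit version of the test with one degree of freedom. This is a simpler substitute chosen for illustration; the cited reference describes an exact test, and nothing here should be read as the study's implementation. The function name and the genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi2, p_value). Under random mating, the genotype
    frequencies are expected to be p^2, 2pq, and q^2, where p is the
    frequency of allele A and q = 1 - p.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: AA, AB, BB.
chi2, p_value = hwe_chi_square(360, 480, 160)
print(chi2, p_value)
```

A small p-value would indicate that the genotype proportions depart from those expected under random mating, which is the pattern that matching on genetic risk factors could produce.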
Another potential limitation of genetic instruments that may be common in several applications and that needs to be acknowledged is that the instruments may have significant but “weak” effects on the endogenous behavior – i.e., they do not explain a large percentage of the variation in behavior – based on the commonly used thresholds for non-weak instruments such as an F-statistic above 10 or specific F-statistic thresholds based on 2SLS bias relative to OLS (Stock and Yogo, 2005). This is expected because of the complex etiologies of behaviors that involve several genetic and non-genetic risk factors. Both the current knowledge of biological pathways and empirical evidence to date suggest that most genetic factors are likely to explain individually only a small portion of the variation of complex traits such as behaviors. Weak instruments, even when exogenous as they should be, may bias the estimated effect of the endogenous variable toward the estimate obtained assuming exogeneity and bias its variance downward, which may result in incorrect inference of significant effects (Hahn and Hausman, 2003). Inference that is robust to weak instruments is needed in this case in order to avoid the bias that arises with standard inference approaches. Therefore, there may be a tradeoff for several applications with genetic instruments between the theoretical advantage of instrument exogeneity compared to other non-genetic individual- or area-level instruments, and the empirical challenge of identifying the behavioral effect based on limited exogenous variation. However, with the availability of weak-instrument robust inference methods, and given the limitations and mixed evidence of previous studies of smoking and body weight employing other identification approaches, evaluating the smoking effects using genetic instruments likely adds significant knowledge to this area.
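The F-statistic threshold mentioned above refers to the joint test of the instruments in the first-stage regression of the endogenous variable on the instruments and the exogenous covariates. As a rough sketch, assuming homoskedastic errors, this statistic can be computed from the R-squared of the first stage with and without the instruments. The function name and the numbers in the usage example are hypothetical.

```python
def first_stage_f(r2_restricted, r2_unrestricted, n_obs, n_params, n_instruments):
    """Joint F-statistic for the excluded instruments in the first stage.

    r2_restricted   -- R-squared without the instruments
    r2_unrestricted -- R-squared with the instruments added
    n_obs           -- number of observations
    n_params        -- coefficients in the unrestricted model,
                       including the intercept and the instruments
    n_instruments   -- number of excluded instruments being tested
    """
    if not 0 <= r2_restricted <= r2_unrestricted < 1:
        raise ValueError("require 0 <= r2_restricted <= r2_unrestricted < 1")
    if n_obs <= n_params or n_instruments < 1:
        raise ValueError("need n_obs > n_params and at least one instrument")
    numerator = (r2_unrestricted - r2_restricted) / n_instruments
    denominator = (1 - r2_unrestricted) / (n_obs - n_params)
    return numerator / denominator


# Hypothetical first stage: six genotype indicators added to ten
# exogenous covariates plus an intercept, with 1,000 observations.
f_stat = first_stage_f(0.100, 0.118, n_obs=1000, n_params=17, n_instruments=6)
print(f_stat, "weak by the rule of thumb" if f_stat < 10 else "not weak")
```

Because the numerator is the incremental R-squared per instrument, instruments that each explain only a small share of the variation in behavior will tend to produce a low F-statistic even when they are jointly significant.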
Our study uses the following three SNPs in the gene GABBR2 as instruments for smoking: rs1435252, rs3780422 and rs1930139. GABBR2 is on chromosome 9 and is involved in neuronal activity inhibition by coding for a protein for a GABA-B receptor involved in neurotransmitter release. This gene is considered to be a high priority candidate gene for nicotine dependence (NICSNP, 2007). Two studies using the same data source but different sample sizes found variants in GABBR2 to be significantly involved in several smoking measures including the Fagerström Test for Nicotine Dependence (FTND), cigarettes per day, and the Heaviness of Smoking index (Beuten, 2005; Li et al., 2009). A candidate gene study of a sample of pregnant women from the Danish National Birth Cohort that was included in a GWAS prematurity study found several SNPs in GABBR2 to be significantly associated with the average number of cigarettes smoked per day during pregnancy, although the associations were not very strong, with F-statistics ranging from 3 to 7 (Prater et al., 2011). Furthermore, a study found that nicotine significantly modifies the expression of GABBR2 in the brain of rats both at the messenger RNA and protein levels, providing support for the involvement of this gene in smoking behaviors (Sun et al., 2007).
The evidence from the above studies suggests a role for GABBR2 in smoking and provides some support for considering it as an instrument. However, not all studies find significant association between GABBR2 and smoking behaviors. A candidate-gene study of smoking that evaluated several GABBR2 variants did not find significant associations between these variants and smoking (Agrawal et al., 2008). The inconsistency in results between that study and the first two studies finding significant association with smoking introduces some uncertainty regarding whether GABBR2 truly affects smoking behaviors. However, there are major differences between that study and those other two studies in design and smoking measures. The first two studies reporting significant association use a family-based association design where probands – the nuclear participants who define the family structure for the genetic association analysis and determine the family’s eligibility to participate in the study – are defined based on consuming a minimum of 20 cigarettes per day for the last 12 months (Beuten, 2005; Li et al., 2009). In contrast, the study finding no significant association for GABBR2 used a case-control design with cases (affected individuals) having an FTND equal to or greater than 4, and controls (unaffected individuals) having an FTND equal to 0 (Agrawal et al., 2008). Therefore, the first two studies finding significant association between GABBR2 and smoking were more enriched with intensive smokers relative to the study finding no association.
Two genome-wide association studies (GWAS) did not report GABBR2 to have significant effects on smoking (Berrettini et al., 2008; Caporaso et al., 2009). However, none of the SNPs in these studies was significant at the GWAS significance threshold corrected for multiple testing; some SNPs in CHRNA3 had low p-values but did not pass the p-value threshold. The samples for these two GWAS were also less enriched in intensive smokers relative to the two family-based studies finding significant association with GABBR2. Recently, two meta-analyses of GWAS data reported some variants to be significantly related to smoking at GWAS-appropriate significance thresholds, with SNPs in CHRNA3 having the largest effect and strongest statistical significance for cigarettes per day (Liu et al., 2010; The Tobacco and Genetics Consortium, 2010). These meta-analyses did not find evidence for GABBR2 effects on smoking. The CHRNA3 and other gene variants found to be significant in these GWAS meta-analyses have not been genotyped in our study sample.
Ideally, the selected genetic instruments should have unequivocal evidence of being related to the endogenous variable of interest (smoking behavior in this case). An increase in the number of previous studies replicating the associations of the genes or specific genetic variants being considered as instruments with the endogenous variable certainly provides stronger support for choosing these as instruments. Realistically, however, replications of genetic effects are not observed in many cases of complex phenotypes. The lack of replication is a rather common phenomenon for phenotypes and behaviors such as smoking that have complex genetic etiologies involving multiple genes and alleles that may interact amongst themselves and with environmental factors. The genetic/allelic heterogeneity may contribute to differences in results, especially when comparing the results of studies that differ in their sample characteristics, phenotype measures, and designs. Further, the effects of genetic variants may be modified by environmental, behavioral and demographic characteristics, which may result in differences in genetic effects across studies employing samples that vary in these characteristics.
GWAS provide several advantages over candidate gene association studies for detecting small to moderate genetic effects. However, they also face some limitations that should be considered when evaluating the replication of results. These include very low significance thresholds that may mask some important genetic effects as well as grouping heterogeneous samples from multiple data sources that may introduce noise into the phenotype measures and increase the variance of the genetic effect estimates. The smoking GWAS described above included samples from several studies of various health conditions (diabetes, cardiovascular outcomes, depression/anxiety, others) which may increase measurement error in the smoking phenotypes.
In any application of genetic instrumental variables, it is important to evaluate the evidence both for and against employing the genetic variants as instruments and the implications for interpreting the study findings. While there is mixed evidence for GABBR2, the previous findings of significant effects of GABBR2 variants on smoking, the knowledge of the gene’s functions including the study results for gene expression in response to nicotine summarized above, and the significant relationship that we find in our study between GABBR2 and number of cigarettes provide support for considering this gene as an instrument for smoking.
The second condition that instruments should satisfy is that they do not affect the outcome (body weight) either directly or through unobserved confounders. In other words, the instruments should affect body weight only indirectly through their effects on smoking. The support for this condition should be primarily obtained from the current knowledge of the gene functions based on previous studies. After searching the literature on GABBR2, we are unable to find any published studies that directly associate this gene with other behaviors or pathways that can affect body weight besides smoking.7 Also, there is no evidence that this particular gene is involved in general addiction or in addictive personalities, and no studies have found evidence of it being linked to other addictions besides nicotine dependence. One study has reported that patients with autism have lower protein levels of GABA(B) receptor 2 units in the cerebellum compared to unaffected individuals (Fatemi et al., 2009). Another study found protein-level reductions for this receptor in the lateral cerebella of patients with schizophrenia, depression, and bipolar disorder compared to individuals without these conditions (Fatemi et al., 2011). While important, these studies are based on very small samples (fewer than 15 cases for each disorder) and do not provide evidence that GABBR2 affects these health conditions or affects body weight through them. Furthermore, the fact that our sample is limited to pregnant and mostly healthy women reduces any potential threats of the instrument being correlated with body weight through these health conditions.
As would be the case for most genes involved in behaviors, it is important to recognize that the lack of evidence to date for a role of GABBR2 in other behaviors besides smoking that may also affect body weight does not provide complete assurance against identifying such a role in the future. The continuously expanding capacity of genetic studies and gene-function assays for behavioral traits is expected to further characterize the functions of GABBR2 and other candidate genes for smoking, which will allow for a more comprehensive evaluation of the fit of these genes as instruments in future studies. However, based on the current knowledge, there is reasonable support for GABBR2 being unrelated to body weight other than through smoking.
The excludability of the instruments from the body weight function cannot be fully tested due to the role of unobservable factors but can be partially evaluated by testing the over-identification restrictions (Basmann, 1960; Hausman, 1983; Wooldridge, 2002). These tests evaluate the excludability of the additional instruments from the body-weight function after identifying the smoking effect using one instrument. We employ these tests in order to provide a partial statistical evaluation of the instrument exogeneity. As described below, the instruments pass the over-identification restriction test with a very high p-value (0.998), which provides some assurance for the exogeneity of the instruments and is consistent with the current knowledge of the gene effects. As an additional check of whether the instruments may violate the assumption of being unrelated to BMI other than through smoking, we correlate the instruments with observed socioeconomic, demographic, and behavioral characteristics. While this does not fully evaluate the excludability condition, it reveals whether the instruments are systematically correlated with other observable confounders, which would suggest that the instruments may not be exogenous. As described below, we find that the instruments are not related to these characteristics, which provides some further assurance that the instruments may be exogenous.
Two of the three SNPs that we employ as instruments, rs1435252 and rs3780422, have been found to be significantly related to smoking behaviors in European-American populations both on their own and as part of haplotypes (groups of alleles across multiple SNPs) involving other SNPs (Beuten, 2005). We include direct indicators for the genotypes of the three SNPs as instruments instead of haplotypes for two reasons. First, we are unable to find significant effects of haplotypes that have an adequate frequency to be used as instruments. Second, including indicators for multiple SNPs simultaneously in the model captures differences in the effects of various combinations of genotypes across these SNPs and some of the haplotype effects.8 For each SNP, we include two indicators for the minor allele homozygote and heterozygote genotypes, with the major allele homozygote as the reference genotype. This avoids the assumption that the minor allele has similar effects in its homozygote and heterozygote forms.
The three SNPs we employ as instruments are significantly correlated with each other; the hypothesis of no linkage disequilibrium is rejected at p < 0.0001. However, this has no effect on the IV analysis, which depends on the statistical strength of the instruments as a group in predicting the endogenous variable. Therefore, significant correlations between the instruments that increase the standard errors for the individual instrument effects on smoking do not reduce the significance of the joint instrument effects as a group. We find that the instruments have significant joint effects on smoking after adjusting for all the model exogenous variables, but that they are “weak” using the standard thresholds, with an F-statistic of 3.4. Therefore, we employ weak-instrument robust inference approaches that are described below in detail. In addition, we evaluate HWE for these SNPs.