Our approach to analyzing susceptibility pathways for smoking quantity was based on the identification of GO terms and KEGG pathways common to the OZALC-NAG and SAGE datasets that showed replication in the ARIC study. Our decision to unify the set of SNPs analyzed in each of the two exploratory studies, genotyped by different Illumina chips, proved to be a valid option that identifies common categories of genes while maximizing the chances of observing the same enriching genes. We implemented this strategy by extending the set of SNPs originally genotyped in the OZALC-NAG study, incorporating imputed data to encompass the ones ascertained in SAGE. We applied this same approach to analyze the ARIC dataset, which was genotyped using the Affymetrix platform. Moreover, apart from the differences in the enriching genes, all of the replicated GO terms and KEGG pathways were also significant when we restricted analysis of ARIC to the genotyped SNPs in the Affymetrix Human SNP Array 6.0 (data not shown).
We found GO terms for the cholinergic receptors, that included genes tagged by SNPs previously reported to achieve genome-wide significant levels (i.e. 5.0E−08), although these SNPs did not necessarily achieve this level in the datasets we evaluated. In addition, other cholinergic receptor genes with more modest
p-values were also enriched in these GO terms. Each study identified CHRNA7 but the significant SNPs were different and in low r
2 (<0.20) but high D’ (>0.80) values, suggesting the existence of a shared risk allele. However, we could not identify a same SNP that was significant for the three studies in the region (
p-value<0.05). This might indicate the presence of an untyped SNP, possibly with a minor allele frequency too low to be accurately imputed, or might be a synthetic association representing the effects of multiple rare variants. In contrast, for the CHRNA9 gene, we could neither identify a common significant SNP nor a common allele tagged by SNPs in linkage disequilibrium (r
2>0.5 or D’>0.5). Despite this, the pathway analysis was robust enough to highlight the associations of these two genes to smoking quantity. Indeed, the presence of these moderate signals in
CHRNA7 and
CHRNA9, as well as the ones in cholinergic muscarinic receptors, sustained the significance of the terms GO:0005892, GO:0042166, GO:0004889 and GO:0015464 for the analysis of the least stringent threshold for SNPs (). In contrast, the absence of these moderate signals resulted in the terms GO:0035095, GO:0035094 and GO:0007274 only being significant for the analysis of the SNPs that satisfied the most stringent threshold (
Table S2).
We did not restrict our analysis to the ALIGATOR method, but also applied the MAGENTA method, as each of these methods can provide complementary findings
[35]. Only the GO terms for cholinergic receptors were consistently significant in the OZALC-NAG, SAGE and ARIC studies using the MAGENTA method, increasing the levels of certainty of the original ALIGATOR predictions. In contrast, the ALIGATOR method identified other GO terms and KEGG pathways common to the three datasets. We verified if the assumption of one causal gene per signal made by the MAGENTA method caused the different results. However, correcting or not for physically proximal genes did not change substantially the MAGENTA results for these categories of genes. It could be argued that ALIGATOR results are a product of the specificity of the method, and are not susceptibility factors for smoking. However, some of the genes included in these significant GO terms were previously reported to influence smoking behaviors, which increases the confidence in these findings (
e.g., GRM8
[32] and
GRM7
[31],
[30]). Our analysis detected other genes common to all datasets, including the bitter taste receptor
TAS2R1, suspected to be able to sense the nicotine in cigarette smoke
[36]. It has been shown that nicotine activates taste receptor pathways both specific for nicotine and also common to other bitter substances
[37],
[38]. This provides support for the finding that variants in some of the taste receptors can modulate cigarette consumption. Similarly, retinoic acid genes were also specific to the ALIGATOR analysis; but again it has been suggested that the activation of nicotinic receptors affects cellular signaling associated with retinoic acid target genes
[39].
One caveat to consider when performing pathway analysis is that the results obtained are biased, or at least restricted, to the biological knowledge that is incorporated into the GWAS, as well as its representation and modeling. This may explain the absence of any replicated GO term or KEGG pathway for the metabolism of nicotine.
CYP2A6 encodes the enzyme that metabolizes approximately 70 to 80% of nicotine to cotinine
[10],
[40],
[41]. A single SNP in this chromosomal region is typed by the Illumina Human 1 M chip.: rs3733829, which is in the first intron of EGLN2 located 40 kb downstream from
CYP2A6
[10]. This SNP is not in high linkage disequilibrium with any other SNP included in the chip in the extended genetic region considered for the
CYP2A6. Moreover,
CYP2A6 is a complex locus involving structural and rare functional variants that are not well tagged by the SNPs included in the genotyping platforms. The
UGT complex of genes, which catalyze nicotine and cotinine glucuronidation
[34], were significant in both the OZALC-NAG and SAGE studies, and were identified among the other retinoid binding genes. Furthermore, the flavin monooxygenase 1 gene (
FMO1) is also associated with nicotine metabolism
[42] and was tagged in the three studies. There is one KEGG pathway (hsa00982 “
Drug metabolism – cytochrome P450”) and two GO terms (GO:0005792 “
microsome” and GO:0042598 “
vesicular fraction”) with less than 500 genes that include FMO1 and UGT1A4. However, each of these terms and pathways has a very large number of genes (i.e., 73, 241 and 248 genes respectively); and thus are not specific enough to formally represent nicotine metabolism genes. In contrast, the identification of ribosome genes was an unexpected result of our analysis. The relationship, if any, of this gene family to nicotine consumption is not currently understood.
Both ALIGATOR and MAGENTA provide methods to correct for the multiple pathways tested. ALIGATOR applies a bootstrap approach
[15] whereas MAGENTA implements both Bonferroni multiple test correction and false discovery rate method
[27]. To analyze the combined evidence of the multiple studies evaluated independently () we chose the most stringent method, the Bonferroni multiple test correction, although it is most likely too conservative for the nested organization of GO terms. The corrected
p-value is 6.11E−6 for the nominal
p-value
=

0.05 and 8179 pathways tested. Using this significance level the terms GO:0042166 and GO:0004984 were significant for the three combined studies (weighted Z-method
[43]) (). Similarly, both methods provide mechanisms to correct for linkage disequilibrium; and thus avoid the situation that a same signal, which spans across multiple genes, inflates the number of significant genes in a same pathway. Because this inflation can be a source of false positive results for terms including clusters of genes, we re-executed the ALIGATOR method collapsing all physically proximal genes in the same category into a single entity. The GO terms representing the cholinergic receptor genes remained significant (category-specific
p-value <0.05) after this correction (
Table S5) for the OZALC-NAG, the SAGE and the ARIC studies. Similarly the GO term GO:0004984
“olfactory receptor activity” remained significant in the three studies, but the other two sensory perception GO terms were not consistently significant (
Table S5).
Our systematic analysis of smoking quantity, conditioning GWAS to extrinsic information, has identified both expected and novel GO terms and KEGG pathways. This new information should lead to a further prioritization of genes that do not include genome-wide significance SNPs. Many genetic variants, each one with small effects, are expected to be associated to complex traits
[44], Pathway analysis can be considered as a signal-to-noise filter for the true signals that are not strong enough to clearly stand out from the statistical background in a traditional GWAS.