Lung cancer is frequently cited as a malignancy attributable solely to environmental exposures, primarily cigarette smoke. However, numerous studies have provided evidence that genetic factors influence lung cancer susceptibility, beginning with the landmark study of Tokuhata and Lilienfeld¹, which demonstrated a 2.5-fold higher risk in smoking first-degree relatives of lung cancer cases compared with smoking relatives of controls and showed that the familial aggregation of lung cancer in case relatives compared to control relatives occurred irrespective of the relative’s smoking history. Subsequent epidemiological case-control analyses have consistently provided evidence for a two- to threefold increased lung cancer risk in relatives of cases compared with those of controls².
Direct evidence for a genetic predisposition to lung cancer is provided by the increased risk associated with constitutional TP53 (tumor protein p53)⁴ gene mutations, rare mendelian cancer syndromes such as Bloom’s⁷ and Werner’s⁸ syndromes, and strongly familial lung cancer⁹. The genetic basis of inherited susceptibility to lung cancer outside the context of these disorders is at present undefined, but a model in which high-risk alleles account for all of the excess familial risk seems unlikely. Alternatively, part of the inherited genetic risk may be caused by low-penetrance alleles. This hypothesis implies that testing for allelic association should be a powerful strategy for identifying alleles that predispose to lung cancer.
We conducted a genome-wide association study (GWAS) of histologically confirmed non–small cell lung cancer (NSCLC) to identify common low-penetrance alleles influencing lung cancer risk. To minimize confounding effects from cigarette smoking and increase the power to detect genetic effects, we frequency matched controls to cases according to smoking behavior. We also matched controls to cases by age (within 5-year categories) and sex, and we further matched former smokers by years of cessation. To minimize confounding by ethnic variation, we restricted our study population to individuals of self-reported European descent.
Characteristics of study populations
Using Illumina HumanHap300 v1.1 BeadChips, we genotyped 317,498 tagging SNPs in a series of 1,154 ever-smoking lung cancer cases and 1,137 ever-smoking controls (Texas discovery series). There was no evidence of genome-wide inflation of the χ² test statistics, which can occur in the presence of population substructure. The GWAS identified several genomic locations as potentially associated with lung cancer risk. We further verified that these findings were robust to potential substructure by conditioning on marker similarity, either by using Cochran-Mantel-Haenszel tests (Supplementary Fig. 1 online) or by conditioning on eigenvectors (Supplementary Table 1 online).
Summary of ten fast-track SNPs analyzed in discovery and replication studies
Results from genome-wide association analysis of directly tested SNPs in the Texas discovery set using Illumina 300K HumanHap v1.1 Beadchips.
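The check for genome-wide inflation of χ² statistics mentioned above is commonly summarized by the genomic inflation factor λ, the ratio of the median observed 1-df χ² statistic to its expected median under the null. The following is a minimal sketch of that calculation, not necessarily the procedure used in this study; the function name and the example statistics are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approx. 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Return lambda = median(observed 1-df chi-square statistics) / expected median."""
    stats = list(chi2_stats)
    if not stats:
        raise ValueError("need at least one test statistic")
    return median(stats) / CHI2_1DF_MEDIAN


# Hypothetical test statistics for illustration only; NOT data from this study.
example_stats = [0.02, 0.10, 0.30, 0.45, 0.60, 1.20, 2.70]
lam = genomic_inflation(example_stats)
print(f"lambda = {lam:.3f}")
```

Values of λ close to 1 are usually read as little or no inflation; values well above 1 suggest substructure or other systematic bias.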
We performed a fast-track replication of the ten most significant associations from the GWAS in two additional case-control datasets. One replication set was drawn from the same case-control population in Texas (711 cases and 632 controls) as the discovery phase, following the same criteria for matching. The other replication set was from the UK (2,013 cases and 3,062 controls). Frequency matching in the discovery phase was adequate for smoking behavior, age and sex, cigarette smoking intensity and years of smoking exposure, but currently smoking cases reported heavier packyears (packs per day × years smoked) than currently smoking controls. The Texas replication set included more recently recruited participants for whom matching was incomplete. The UK replication set was not matched, and included some small-cell lung cancers and some lifetime never smokers. We could not assess potential effects of substructure in the replication sets, but the Texas replication used the same study population and control selection procedures as the discovery set, and previous studies from the same UK controls showed that population substructure did not influence risk estimation for colorectal cancer¹⁰.
We replicated the elevated risks associated with two of the ten SNPs selected for validation in these additional case-control series, rs1051730 and rs8034191, both mapping to an 88-kb region of chromosome 15. Through joint analysis of genotype data for cases and controls from the three series (Supplementary Table 2 online), we found unequivocal evidence for an association between these two SNPs and lung cancer risk. For rs8034191 and rs1051730, the combined P values were 3.15 × 10⁻¹⁸ and 7.00 × 10⁻¹⁸, respectively. P values from the replication data were < 10⁻¹², and a similar level of significance was obtained when the joint tests were Bonferroni adjusted for 315,450 tests (results not shown). No other SNP showed significant evidence for association. Using Cochran-Mantel-Haenszel analysis, we did not observe any heterogeneity in the odds ratios (ORs) among the series (P > 0.9) for these two SNPs. Combined adjusted ORs for lung cancer associated with rs8034191 and rs1051730 were 1.32 (95% CI: 1.24–1.41) and 1.32 (95% CI: 1.23–1.39), respectively. Combined adjusted ORs among all ever-smokers from the three studies were 1.28 for heterozygotes for both SNPs, and 1.81 and 1.80 for homozygotes with minor alleles of rs8034191 and rs1051730, respectively.
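Two of the calculations referred to above can be sketched briefly: a Bonferroni adjustment of the combined P values for 315,450 tests, and a Mantel-Haenszel pooled odds ratio across several case-control series. This is an illustrative sketch only. The combined adjusted ORs reported here may well have been obtained by a different method, the function names are hypothetical, and the allele counts are invented for the example.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across 2x2 tables, one table per stratum (series).

    Each stratum is a tuple (a, b, c, d):
      a = risk-allele count in cases,    b = other-allele count in cases,
      c = risk-allele count in controls, d = other-allele count in controls.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Combined P values and number of tests as quoted in the text.
for snp, p in [("rs8034191", 3.15e-18), ("rs1051730", 7.00e-18)]:
    print(snp, bonferroni(p, 315_450))

# Hypothetical allele counts (NOT study data), one tuple per series.
hypothetical_series = [(400, 600, 300, 700), (250, 350, 200, 400)]
print(mantel_haenszel_or(hypothetical_series))
```

With a single stratum the pooled estimate reduces to the ordinary cross-product ratio ad/bc, which is a convenient sanity check.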
Figure 2 The 15q25.1 locus. The top panel shows SNP single marker association results. Results in blue depict genotyped SNPs, and results in red are for imputed SNPs. All known genes and predicted transcripts in the local area are shown. Positions are that of ...
Association of rs8034191 and rs1051730 genotypes with lung cancer risk among ever smokers before and after adjustment for age, sex and packyears of cigarette exposure
rs1051730 and rs8034191 map to a 100-kb region of strong linkage disequilibrium (LD) on chromosome 15 extending from 76,593,078 bp to 76,681,394 bp. Three genes map to this region: CHRNA3 and CHRNA5 (nicotinic acetylcholine receptor alpha subunits 3 and 5) and PSMA4 (proteasome alpha 4 subunit isoform 1), as well as the hypothetical gene LOC123688 isoform 1. Although rs1051730 and rs8034191 are separated by 88 kb, the genotypes are highly correlated (r² = 0.88 in the discovery set and 0.81 in HapMap for the population of European ancestry (CEU)). Intervening genotyped markers in the region showed weaker associations with lung cancer in the discovery set, but the imputed SNP rs931794 at position 76,613,235 in LOC123688 showed the most significant association with lung cancer risk (P = 1.8 × 10⁻⁶).
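The r² statistic quoted above measures the squared correlation between alleles at two loci. A minimal sketch of the textbook definition follows, assuming haplotype frequencies are already known; in practice they are usually estimated from unphased genotypes. The function name and the example frequencies are hypothetical and are not estimates from this study.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci from haplotype and allele frequencies.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies for illustration only (NOT estimates from this study).
print(round(ld_r_squared(p_ab=0.30, p_a=0.35, p_b=0.33), 3))
```

An r² of 1 means the two alleles always occur together on the same haplotype, and an r² of 0 means they are statistically independent.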
We determined the haplotype block structure across the entire region. To further study genetic effects in the candidate region, we estimated haplotypes from nine SNPs genotyped on the Illumina panel spanning the haplotype block that includes rs1051730 and rs8034191. A single extended haplotype was significantly associated with lung cancer risk (P = 7.0 × 10⁻⁵), but this did not improve the prediction of case status over that provided by the individual SNPs rs1051730 or rs8034191 (Supplementary Table 3 online). This result provides evidence against multiple alleles or loci in the region contributing to disease susceptibility.
There is a growing body of evidence implicating the nicotinic acetylcholine receptor pathway in both the etiology and the progression of lung cancer¹¹⁻¹³. Specifically, nicotine has been reported to promote cancer cell proliferation, survival, migration, invasion and tumor angiogenesis through the acetylcholine receptor pathway. The nicotinic acetylcholine receptor may also be a key player in nicotine-mediated suppression of apoptosis in lung cancer cells¹². Furthermore, it has been demonstrated that stimulation of nicotinic cholinergic receptors by nicotine promotes growth of human mesothelial cells¹⁴.
[…] is perhaps the more attractive candidate susceptibility gene for lung cancer. A previous study has shown¹⁵ that the nicotinic acetylcholine receptor could increase risk of lung cancer through a mechanism in which the CHRNA3 subunit binds NNK and subsequently upregulates nuclear factor kappa B to induce cell proliferation. PSMA4 is a component of the ATP- and ubiquitin-dependent nonlysosomal pathway, and although it is involved in the processing of class I major histocompatibility complex (MHC) peptides, there is little evidence to date for a role in lung cancer.
[…] may have a role in nicotine dependence¹⁶, we evaluated the relationship between the SNPs and lung cancer risk by smoking phenotype. Even though cases and controls from Texas were frequency matched on smoking behavior, lung cancer cases who smoked reported higher cumulative levels of exposure than controls who smoked. Hence, it might be conjectured that the genetic associations we have identified relate to smoking behavior, which in turn modulates lung cancer risk, rather than to a direct effect of a genetic susceptibility factor per se. There was, however, no consistent trend of genotypic risk associated with different strata of smoking behavior and years since smoking cessation among former smokers (Supplementary Table 4 online). We also did not observe any significant change in risk of lung cancer associated with rs8034191 or rs1051730 after adjusting for age, sex and packyears of smoking in the Texas populations. For the UK population, smoking adjustment decreased the ORs slightly. For the UK sample, the OR among participants who had never smoked was nearly 1 for both risk genotypes. These results, if subsequently confirmed with a larger sample of never-smoking cases and controls, would indicate that these SNPs play a role in determining lung cancer risk only among ever-smokers. We found similar risks associated with genotypes for heavier and lighter smokers (Supplementary Table 5 online), with marginally higher genotypic risks among lighter smokers. Adjusting for genotype of either candidate SNP did not affect the association between smoking and lung cancer risk, indicating that the candidate SNPs and smoking have independent effects on lung cancer risk in our study (Supplementary Table 6 online).
Figure 3 Effects of SNPs according to smoking behavior in current, former and never smokers adjusting for age, sex and packyears of tobacco smoke exposure. (a–c) The x axis indicates the extent of exposure, starting with never smokers (UK population, panel ...
To characterize in further detail the relationships between genotypes and smoking, we carried out additional exploratory studies. We analyzed whether rs8034191 or rs1051730 was associated with selected measures of nicotine dependence, that is, number of cigarettes consumed per day and packyears of exposure (Supplementary Table 7 online). Results showed weak evidence that these SNPs influence smoking behavior; however, the effects seemed consistently significant across studies only in former smokers and not in current smokers. Collectively, these data provide evidence that, although the nicotinic acetylcholine receptor may have a role in smoking behavior, variation at 15q25.1 defined by rs8034191 or rs1051730 directly contributes to lung cancer susceptibility. A previous study¹⁶ found an association of rs16969968, a marker in strong LD with rs1051730, with an index of nicotine dependence (Fagerström index) in nondiseased individuals. Our study shows a weak effect of rs8034191 or rs1051730 on smoking behaviors and an extremely significant effect on lung cancer risk, whether or not an adjustment for smoking behavior is made during the analysis.
In conclusion, we have identified and replicated a locus associated with lung cancer risk. Given that the carrier frequencies of rs8034191 and rs1051730 are ~50% in populations of European ancestry, they may be of importance from a public health perspective. These data are the strongest evidence to date for common susceptibility alleles for lung cancer risk. CHRNA5 and CHRNA3 are promising candidate genes in this region of 15q25.1.