Texas: C.I.A. and M.R.S. conceived of this study. M.R.S. established the Texas lung cancer study. C.I.A. supervised and performed the analyses. G.M. provided oversight in manuscript development and in the conduct of genetic studies. I.P.G., Q.D., Q.Z., W.V.C. and X.G. performed statistical analyses. S.S. developed and implemented statistical procedures for joint analysis. X.W. and J.G. oversaw genotyping for Texas studies. ICR: R.S.H. and T.E. established GELCAPS. R.S.H. supervised laboratory analyses. A.M. oversaw GELCAPS and developed the database. P.B. supervised sample organization, genotyping and sequencing. Y.W. provided database management. K.S. and J.V. performed DNA preparation and sequencing. CIDR: K.D. and Y.-Y.T. were responsible for direction of GWA genotyping and genotype data quality assurance conducted by the Center for Inherited Disease Research. All authors contributed to the final paper, with C.I.A., R.S.H., M.R.S., I.P.G., K.D., S.S. and Y.-Y.T. playing key roles.
To identify risk variants for lung cancer, we conducted a multistage genome-wide association study. In the discovery phase, we analyzed 315,450 tagging SNPs in 1,154 current and former (ever) smoking cases of European ancestry and 1,137 frequency-matched, ever-smoking controls from Houston, Texas. For replication, we evaluated the ten SNPs most significantly associated with lung cancer in an additional 711 cases and 632 controls from Texas and 2,013 cases and 3,062 controls from the UK. Two SNPs, rs1051730 and rs8034191, mapping to a region of strong linkage disequilibrium within 15q25.1 containing PSMA4 and the nicotinic acetylcholine receptor subunit genes CHRNA3 and CHRNA5, were significantly associated with risk in both replication sets. Combined analysis yielded odds ratios of 1.32 (P < 1 × 10−17) for both SNPs. Haplotype analysis was consistent with there being a single risk variant in this region. We conclude that variation in a region of 15q25.1 containing nicotinic acetylcholine receptor genes contributes to lung cancer risk.
Lung cancer is frequently cited as a malignancy attributable solely to environmental exposures—primarily cigarette smoke. However, evidence that genetic factors influence lung cancer susceptibility has been provided by numerous studies, beginning with the landmark study of Tokuhata and Lilienfeld1, which demonstrated a 2.5-fold higher risk in smoking first-degree relatives of lung cancer cases compared with smoking relatives of controls and showed that the familial aggregation of lung cancer in case relatives compared to control relatives occurred irrespective of the relative’s smoking history. Subsequent epidemiological case-control analyses have consistently provided evidence for a two- to threefold increased lung cancer risk in relatives of cases compared with those of controls2.
Direct evidence for a genetic predisposition to lung cancer is provided by the increased risk associated with constitutional TP53 (tumor protein p53)4 and RB1 (retinoblastoma)5,6 gene mutations, rare mendelian cancer syndromes such as Bloom’s7 and Werner’s syndromes8, and strongly familial lung cancer9. The genetic basis of inherited susceptibility to lung cancer outside the context of these disorders is at present undefined, but a model in which high-risk alleles account for all of the excess familial risk seems unlikely. Alternatively, part of the inherited genetic risk may be caused by low-penetrance alleles. This hypothesis implies that testing for allelic association should be a powerful strategy for identifying alleles that predispose to lung cancer.
We conducted a genome-wide association study (GWAS) of histologically confirmed non–small cell lung cancer (NSCLC) to identify common low-penetrance alleles influencing lung cancer risk. To minimize confounding effects from cigarette smoking and increase the power to detect genetic effects, we frequency matched controls to cases according to smoking behavior. We also matched controls to cases by age (within 5 year categories) and sex, and we further matched former smokers by years of cessation (Table 1). To minimize confounding by ethnic variation, we restricted our study population to individuals of self-reported European descent.
Using Illumina HumanHap300 v1.1 BeadChips, we genotyped 317,498 tagging SNPs in a series of 1,154 ever-smoking lung cancer cases and 1,137 ever-smoking controls (Texas discovery series; Table 2). There was no evidence of genome-wide inflation of χ2 tests, which can occur in the presence of population substructure. The GWAS identified several genomic locations as potentially associated with lung cancer risk (Fig. 1). We further verified that these findings were robust to potential substructure by conditioning on marker similarity either by using Cochran-Mantel-Haenszel tests (Supplementary Fig. 1 online) or by conditioning on eigenvectors (Supplementary Table 1 online).
We performed a fast-track replication of the ten most significant associations from the GWAS in two additional case-control datasets (Table 1). One replication set was drawn from the same case-control population in Texas (711 cases and 632 controls) as the discovery phase, following the same criteria for matching. The other replication set was from the UK (2,013 cases and 3,062 controls). Table 1 shows adequate frequency matching in the discovery phase for smoking behavior, age and sex, cigarette smoking intensity and years of smoking exposure, but currently smoking cases reported heavier packyears (cigarettes per day × years smoked) than currently smoking controls. The Texas replication set included more recently recruited participants for whom matching was incomplete. The UK replication set was not matched, and included some small-cell lung cancers and some lifetime never smokers. We could not assess potential effects of substructure in the replication sets, but the Texas replication used the same study population and control selection procedures as the discovery set, and previous studies from the same UK controls showed that population substructure did not influence risk estimation for colorectal cancer10.
We replicated the elevated risks associated with two of the ten SNPs selected for validation in these additional case-control series, rs1051730 and rs8034191, both mapping to an 88-kb region of chromosome 15 (Table 2 and Fig. 2). Through joint analysis of genotype data for cases and controls from the three series (Table 3 and Supplementary Table 2 online), we found unequivocal evidence for an association between these two SNPs and lung cancer risk. For rs8034191 and rs1051730, the combined P values were 3.15 × 10−18 and 7.00 × 10−18, respectively (Table 2). P values from the replication data were < 10−12 (Table 2), and a similar level of significance was obtained when the joint tests were Bonferroni adjusted for 315,450 tests (results not shown). No other SNP showed significant evidence for association. Using Cochran-Mantel-Haenszel analysis, we did not observe any heterogeneity in the odds ratios (ORs) among the series (P > 0.9) for these two SNPs. Combined adjusted ORs for lung cancer associated with rs8034191 and rs1051730 were 1.32 (95% CI: 1.24–1.41) and 1.32 (95% CI: 1.23–1.39), respectively. Combined adjusted ORs among all ever-smokers from the three studies were 1.28 for heterozygotes for both SNPs, and 1.81 and 1.80 for homozygotes with minor alleles of rs8034191 and rs1051730, respectively (Table 3).
rs1051730 and rs8034191 map to a 100-kb region of strong linkage disequilibrium (LD) on chromosome 15 extending from 76,593,078 bp to 76,681,394 bp (Fig. 2). Three genes map to this region: CHRNA3 and CHRNA5 (nicotinic acetylcholine receptor alpha subunits 3 and 5) and PSMA4 (proteasome alpha 4 subunit isoform 1), as well as the hypothetical gene LOC123688 isoform 1. Although rs1051730 and rs8034191 are separated by 88 kb, the genotypes are highly correlated (r2 = 0.88 in the discovery set and 0.81 in HapMap for the population of European ancestry (CEU)). Intervening genotyped markers in the region showed weaker associations with lung cancer in the discovery set (Fig. 2), but the imputed SNP rs931794 at position 76,613,235 in LOC123688 showed the most significant association with lung cancer risk (P = 1.8 × 10−6).
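The r2 values quoted above measure the squared correlation between alleles at the two SNPs. As a minimal sketch of how that quantity is defined, the snippet below computes r2 from phased haplotype frequencies. The function name and the example frequencies are illustrative assumptions, not values from this study, and the study's own estimates were derived from genotype data rather than from known haplotypes.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a, p_b: marginal frequencies of alleles A and B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Hypothetical frequencies in which the two alleles always occur together,
# which corresponds to complete LD (r^2 close to 1).
print(ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3))
```

An r2 near 1 means that one SNP is almost a perfect proxy for the other, which is why association signals at two highly correlated markers are difficult to separate statistically.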
We determined the haplotype block structure across the entire region (Fig. 2). To further study genetic effects in the candidate region, we estimated haplotypes from nine SNPs genotyped on the Illumina panel spanning the haplotype block that includes rs1051730 and rs8034191. A single extended haplotype was significantly associated with lung cancer risk (P = 7.0 × 10−5), but this did not improve the prediction of case status over that provided by the individual SNPs rs1051730 or rs8034191 (Supplementary Table 3 online). This result provides evidence against multiple alleles or loci in the region contributing to disease susceptibility.
There is a growing body of evidence implicating the nicotinic acetylcholine receptor pathway in both the etiology and the progression of lung cancer11–13. Specifically, nicotine has been reported to promote cancer cell proliferation, survival, migration, invasion and tumor angiogenesis through the acetylcholine receptor pathway. The nicotinic acetylcholine receptor may also be a key player in nicotine-mediated suppression of apoptosis in lung cancer cells12. Furthermore, it has been demonstrated that stimulation of nicotinic cholinergic receptors by nicotine promotes growth of human mesothelial cells14. CHRNA3 is perhaps the more attractive candidate susceptibility gene for lung cancer. A previous study has shown15 that the nicotinic acetylcholine receptor could increase risk of lung cancer through a mechanism in which the CHRNA3 subunit binds NNK and subsequently upregulates nuclear factor kappa B to induce cell proliferation. PSMA4 is a component of the ATP- and ubiquitin-dependent nonlysosomal pathway, and although it is involved in the processing of class I major histocompatibility complex (MHC) peptides, there is little evidence to date for a role in lung cancer.
Because CHRNA3 and CHRNA5 may have a role in nicotine dependence16, we evaluated the relationship between the SNPs and lung cancer risk by smoking phenotype. Even though cases and controls from Texas were frequency matched on smoking behavior, lung cancer cases who smoked reported higher cumulative levels of exposure than controls who smoked (Table 1). Hence, it might be conjectured that the genetic associations we have identified relate to smoking behavior, which in turn modulates lung cancer risk, rather than a direct effect of a genetic susceptibility factor per se. There was, however, no consistent trend of genotypic risk associated with different strata of smoking behavior and years since smoking cessation among former smokers (Fig. 3 and Supplementary Table 4 online). We also did not observe any significant change in risk of lung cancer associated with rs8034191 or rs1051730 after adjusting for age, sex and packyears of smoking (Table 3) in the Texas populations. For the UK population, smoking adjustment decreased the ORs slightly. As shown in Figure 3, for the UK sample, the OR among participants who had never smoked was nearly 1 for both risk genotypes. These results, if subsequently confirmed with a larger sample of never-smoking cases and controls, would indicate that these SNPs play a role in determining lung cancer risk only among ever-smokers. We found similar risks associated with genotypes for heavier and lighter smokers (Supplementary Table 5 online), with marginally higher genotypic risks among lighter smokers. Adjusting for genotype of either candidate SNP did not affect the association between smoking and lung cancer risk, indicating that the candidate SNPs and smoking have independent effects on lung cancer risk in our study (Supplementary Table 6 online).
To characterize in further detail the relationships between genotypes and smoking, we carried out additional exploratory studies. We analyzed whether rs8034191 or rs1051730 were associated with selected measures of nicotine dependence, that is, number of cigarettes consumed per day and packyears of exposure (Supplementary Table 7 online). Results showed weak evidence that these SNPs influence smoking behavior; however, the effects seemed consistently significant across studies only in former smokers, not in current smokers. Collectively, these data provide evidence that, although the nicotinic acetylcholine receptor may have a role in smoking behavior, variation at 15q25.1 defined by rs8034191 or rs1051730 directly contributes to lung cancer susceptibility. A previous study16 found an association of rs16969968, a marker in strong LD with rs1051730, with an index of nicotine dependence (Fagerstrom index) in nondiseased individuals. Our study shows a weak effect of rs8034191 or rs1051730 on smoking behaviors and an extremely significant effect on lung cancer risk, whether or not an adjustment for smoking behavior is made during the analysis.
In conclusion, we have identified and replicated a locus associated with lung cancer risk. Given that the carrier frequencies of rs8034191 and rs1051730 are ~50% in populations of European ancestry, they may be of importance from a public health perspective. These data are the strongest evidence to date for common susceptibility alleles for lung cancer risk. CHRNA5 and CHRNA3 are promising candidate genes in this region of 15q25.1.
For detailed descriptions of the component studies, see Supplementary Methods online. The study protocols were approved by the Institutional Review Board of The University of Texas M.D. Anderson Cancer Center and by a review board at the Institute for Cancer Research Foundation. Informed consent was obtained from all participants.
Genotyping procedures and quality control approaches are described in Supplementary Methods. From the Illumina analysis, we retained data for 315,860 SNPs that had genotyping results in 90% or more of subjects; 410 of these were monomorphic in individuals of European descent and hence uninformative, leaving 315,450 SNPs for analysis. Confirmatory genotyping in Houston was conducted on an independent sample of 711 cases and 632 controls using a TaqMan genotyping platform for the ten most significant SNPs identified in the discovery phase. The Texas replication sample comprised cases and controls independent of the discovery set who were from the same study population source but who tended to be more recently enrolled participants with incomplete frequency matching. Genotyping of UK samples was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences).
We used similarity in genotypes as implemented in PLINK to identify individuals and clusters of individuals who deviated by more than 4 standard deviations from other study subjects, and we excluded these outliers. We identified genetically related subjects using PLINK software, which uses the similarity in identity by state of genotypes to estimate identity by descent values17, setting the clustering value at 0.0001 and excluding 639 markers that deviated from Hardy-Weinberg equilibrium in the controls (P < 0.0001) and 584 SNPs with minor allele frequency (MAF) < 0.01.
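The two marker filters mentioned above (Hardy-Weinberg P < 0.0001 in controls and MAF < 0.01) can be sketched as follows. This is an illustrative asymptotic chi-square version with hypothetical function names and made-up genotype counts; it is not necessarily the test PLINK applied, and it assumes for simplicity that both filters are evaluated on control genotype counts.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """1-d.f. chi-square test of Hardy-Weinberg proportions.

    Returns (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0  # monomorphic marker: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-d.f. chi-square: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def passes_marker_qc(control_counts, hwe_threshold=1e-4, maf_threshold=0.01):
    """Keep a marker only if it passes both filters (thresholds from the text)."""
    _, p_value = hwe_chi2(*control_counts)
    return (p_value >= hwe_threshold
            and minor_allele_frequency(*control_counts) >= maf_threshold)


# Hypothetical control genotype counts for three markers.
for counts in [(25, 50, 25), (40, 20, 40), (995, 5, 0)]:
    print(counts, passes_marker_qc(counts))
```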
Association between SNP genotype and disease status was primarily assessed using the allelic 1 degree-of-freedom (d.f.) test or Fisher’s exact test where an expected cell count was <5. We also carried out association analysis using the Armitage-Doll trend test18. The ORs associated with each SNP and the 95% confidence intervals were estimated by allele and by genotype using unconditional logistic regression. None of the markers associated with lung cancer risk showed deviations from Hardy-Weinberg equilibrium (P > 0.05).
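To make the two association tests concrete, the sketch below implements the allelic 1-d.f. chi-square test and the Armitage trend test with additive genotype scores 0, 1 and 2. The function names and genotype counts are hypothetical and chosen only for illustration; Fisher's exact test for sparse tables is not covered here.

```python
import math


def chi2_1df_p(chi2: float) -> float:
    """Upper-tail P value of a 1-d.f. chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def allelic_test(cases, controls):
    """Allelic 1-d.f. test; each argument is (hom_ref, het, hom_alt) counts."""
    a = cases[1] + 2 * cases[2]        # alternate alleles in cases
    b = 2 * cases[0] + cases[1]        # reference alleles in cases
    c = controls[1] + 2 * controls[2]  # alternate alleles in controls
    d = 2 * controls[0] + controls[1]  # reference alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("marker is monomorphic or a group is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, chi2_1df_p(chi2)


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test across the three genotype classes."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    numerator = n * (n * sum_xr - n_cases * sum_xn) ** 2
    denominator = n_cases * (n - n_cases) * (n * sum_x2n - sum_xn ** 2)
    if denominator == 0:
        raise ValueError("marker is monomorphic or a group is empty")
    chi2 = numerator / denominator
    return chi2, chi2_1df_p(chi2)


# Hypothetical genotype counts (hom_ref, het, hom_alt).
cases, controls = (40, 40, 20), (60, 30, 10)
print(allelic_test(cases, controls))
print(armitage_trend_test(cases, controls))
```

The allelic test counts chromosomes and therefore relies on Hardy-Weinberg proportions, whereas the trend test counts individuals and does not; this is one common reason for reporting both.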
We evaluated the adequacy of the case-control matching and the possibility of differential genotyping of cases and controls using quantile-quantile plots of test statistics. A test inflation factor λ was calculated by dividing the median of the test statistics by the expected median from a χ2 distribution with 1 d.f.19. The mean and median of the χ2 tests in Figure 1 were 1.0196 and 0.4675, very close to the expected values of 1.00 and 0.456. Comparison of the median χ2 test with its expected value yielded a λ value of 1.025, very close to expected, indicating that population substructure, if present, did not have any substantial effect upon the discovery stage analyses presented here.
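The inflation factor calculation described above amounts to one division. A minimal sketch, with hypothetical function names, is given here; it derives the expected median of a 1-d.f. chi-square distribution from the standard normal quantile and then reproduces the arithmetic using the medians reported in the text.

```python
from statistics import NormalDist, median


def expected_median_chi2_1df() -> float:
    # A 1-d.f. chi-square variable is a squared standard normal, so its
    # median is the square of the normal 75th percentile (about 0.455).
    z = NormalDist().inv_cdf(0.75)
    return z * z


def inflation_factor(chi2_stats) -> float:
    """Lambda: observed median test statistic over its expected median."""
    return median(chi2_stats) / expected_median_chi2_1df()


# Figures reported in the text: observed median 0.4675, expected median 0.456.
reported_lambda = 0.4675 / 0.456
print(round(reported_lambda, 3))  # 1.025
```

A value of λ close to 1 indicates that the bulk of the test statistics follow the null distribution, so any substructure or differential genotyping error has little overall effect.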
We used HelixTree for preliminary analyses and for initial data manipulation; we then transferred data to PLINK17 and EIGENSTRAT20. We evaluated the association of markers with lung cancer risk allowing for potential effects of population substructure by using a Cochran-Mantel-Haenszel test22 in PLINK. Strata were defined by a nearest neighbor cluster analysis of genetic similarity, which identified 44 clusters. We also carried out a second analysis to allow for substructure effects using EIGENSTRAT20. All genetic data from the discovery set were used to obtain correlation matrices among the subjects. Spectral analysis was done to extract those eigenvectors explaining the largest proportion of interindividual variation. A scree plot of the associated eigenvalues showed a point of inflection when three eigenvalues were included, and these three eigenvalues alone exceeded 2.0 (results not shown). Results from all analyses were very similar for significantly associated SNPs whether or not adjustments for population structure were made (Supplementary Table 1).
We used SAS Genetics v9.1 to conduct association tests for Hardy-Weinberg equilibrium and to perform haplotype analyses. Logistic regression, implemented in SAS version 9.1, was used to perform analyses adjusting for smoking and other covariates. We conducted joint analysis of data generated from multiple phases using standard methods for combining raw data based on the Mantel-Haenszel method. We used Cochran’s Q statistic to test for heterogeneity.
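For readers unfamiliar with these pooling methods, the following sketch shows a textbook Mantel-Haenszel pooled odds ratio and Cochran's Q statistic computed over per-study 2 × 2 tables. It is a generic formulation under stated assumptions (Woolf variances for the log ORs, no zero cells) and is not the SAS procedure used in the study; the function names and counts are hypothetical.

```python
import math


def mantel_haenszel_or(tables):
    """Pooled OR over strata.

    Each table is (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def cochran_q(tables):
    """Cochran's Q for heterogeneity of log ORs across strata.

    Returns (Q, degrees of freedom); under homogeneity Q is approximately
    chi-square distributed with (number of strata - 1) d.f.
    Assumes every cell count is positive.
    """
    log_ors = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    weights = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * t for w, t in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, log_ors))
    return q, len(tables) - 1


# Hypothetical allele-count tables for three study series.
series = [(20, 80, 10, 90), (45, 155, 25, 175), (30, 70, 18, 82)]
print(mantel_haenszel_or(series))
print(cochran_q(series))
```

A small Q relative to its degrees of freedom is what one would expect when, as reported for the two replicated SNPs, the ORs do not differ among the series.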
We used Haploview21 software (v3.2) to infer the LD structure of the genome in the regions containing loci associated with disease risk. To impute SNPs from multimarker tags, we used a procedure described previously22 based upon haplotype frequencies from HapMap release 21, build 35.
We obtained P values combining data from the discovery phase as well as the two replication phases following a procedure outlined previously23. Specifically, we set the critical value for the discovery phase to be the least significant result among the ten SNPs retained for follow-up (P = 4.9 × 10−5). We obtained the joint test statistics by comparing allele frequencies in cases versus controls from all studies according to their sample sizes (results from the two replication phases were combined prior to joint analysis). We used the joint statistic value conditioning on the critical value P = 4.9 × 10−5 using the program CaTS to estimate the P value required to reach the observed joint Z value. The pointwise P value so derived can be adjusted for multiple testing using a Bonferroni approach by multiplying the pointwise P value by the number of tests (results not shown). For several cases in which the replication P value was very much larger than the discovery P value, the CaTS software could not provide a result because of numerical overflow, and these results were indicated by > 1 × 10−5, which was the least significant P value obtained before the overflow. We also provided P values from the replication phase only by combining results from the Texas replication and UK studies, and adjusting for center effects using a Cochran-Mantel-Haenszel procedure implemented in SAS.
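The Bonferroni step described above is simple arithmetic. As a hedged illustration, the sketch below applies it to the combined P value reported for rs8034191 (3.15 × 10−18) and the 315,450 SNPs analyzed; the function name is hypothetical, and the adjusted values actually obtained by the authors were not shown in the paper and may differ.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Combined P value for rs8034191 and the number of SNPs tested, both taken
# from the text; the adjusted result is an illustration only.
adjusted = bonferroni_adjust(3.15e-18, 315_450)
print(f"{adjusted:.2e}")
```

Even after multiplying by the number of tests, a P value of this size remains many orders of magnitude below conventional significance thresholds.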
Haploview, http://www.broad.mit.edu/mpg/haploview/; Eigenstrat, http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm; CaTS, http://www.sph.umich.edu/csg/abecasis/CaTS/.
Partial support for this study has been provided by US National Institutes of Health grants R01CA133996, R01CA55769, P50 CA70907 and R01CA121197, the Kleberg Center for Molecular Markers at M.D. Anderson Cancer Center, and by support from the Flight Attendants Medical Research Institute. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, Contract Number N01-HG-65403. We thank the Kelsey Research Foundation for facilitating control selection in Texas. At the Institute for Cancer Research, work was undertaken with support primarily from Cancer Research UK. We are also grateful to the National Cancer Research Network, HEAL and Sanofi-Aventis. A. Matakidou was the recipient of a clinical research fellowship from the Allan J. Lerner Fund. We are also thankful for the unstinting efforts of the study coordinators and interviewers, including S. Honn, P. Porter, S. Ritter and J. Rogers. We also thank the study participants, who had the most critical role in this research.
Note: Supplementary information is available on the Nature Genetics website.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions