Recent genome-wide association studies (GWASs) have identified common genetic variants at 5p15.33, 6p21–6p22 and 15q25.1 associated with lung cancer risk. Several other genetic regions including variants of CHEK2 (22q12), TP53BP1 (15q15) and RAD52 (12p13) have been demonstrated to influence lung cancer risk in candidate- or pathway-based analyses. To identify novel risk variants for lung cancer, we performed a meta-analysis of 16 GWASs, totaling 14 900 cases and 29 485 controls of European descent. Our data provided increased support for previously identified risk loci at 5p15 (P = 7.2 × 10−16), 6p21 (P = 2.3 × 10−14) and 15q25 (P = 2.2 × 10−63). Furthermore, we demonstrated histology-specific effects for 5p15, 6p21 and 12p13 loci but not for the 15q25 region. Subgroup analysis also identified a novel disease locus for squamous cell carcinoma at 9p21 (CDKN2A/p16INK4A/p14ARF/CDKN2B/p15INK4B/ANRIL; rs1333040, P = 3.0 × 10−7) which was replicated in a series of 5415 Han Chinese (P = 0.03; combined analysis, P = 2.3 × 10−8). This large analysis provides additional evidence for the role of inherited genetic susceptibility to lung cancer and insight into biological differences in the development of the different histological types of lung cancer.
Lung cancer is a major cause of cancer death worldwide accounting for over 1 million deaths each year (1). The major lung cancer histologic subtypes (small cell and non-small cell) have different clinicopathological characteristics reflective of differences in carcinogenesis (2).
While lung cancer is largely caused by tobacco smoking, there is increasing evidence for the role of inherited genetic factors in disease etiology (3). Notably, genome-wide association studies (GWASs) of lung cancer have robustly demonstrated that polymorphic variation at 5p15.33 (TERT/CLPTM1L), 6p21.33 (BAT3/MSH5) and 15q25.1 (CHRNA5/CHRNA3/CHRNB4) influences lung cancer risk in European populations (4–9). Additionally, single-nucleotide polymorphisms (SNPs) at 22q12 (10,11) and the 15q15.2 locus containing the TP53-binding protein 1 gene have been associated with lung cancer risk (12–14). Three additional susceptibility regions at 13q12.12, 22q12.2 (15) and 3q28 (16) have been identified in GWASs on Asian populations, but to date these regions have not been implicated in lung cancer risk in individuals of European ancestry.
Given the biological differences across lung cancer subtypes, histology- and smoking-specific association analyses have been conducted. These have shown that SNP rs2736100 (TERT) is primarily associated with adenocarcinoma risk (17,18) and variation in RAD52 at 12p13 with squamous cell carcinoma risk (19). Significant heterogeneity by smoking status and age of onset has been shown for SNPs at the 15q25 locus harboring nicotinic acetylcholine receptor genes (17).
The statistical power of individual GWAS has been limited by the modest effect sizes of genetic variants, the need to establish stringent thresholds of statistical significance and financial constraints on the numbers of variants that could be studied. Additionally, due to sample size limitations, few comprehensive histology and smoking history subgroup analyses have been performed in individual GWAS. Meta-analysis of existing GWAS data therefore offers the opportunity to discover additional disease loci harboring common variants associated with lung cancer risk and explore the variability in genetic effects according to disease heterogeneity.
In this study, we conducted a pooled analysis of data from 16 GWASs of lung cancer providing data on 14 900 cases and 29 485 controls of European ancestry. We studied associations by histology, sex, smoking status, age of onset, stage and family history of lung cancer and explored the individual contribution of SNPs in previously identified risk loci. To explore how these genetic findings translate into non-European populations, we evaluated selected SNPs in a Han Chinese study of 2338 lung cancer cases and 3077 controls.
A total of 14 900 lung cancer cases and 29 485 controls of European descent from 16 previously reported lung cancer GWASs undertaken by nine analytical centers were included in the meta-analysis (Table 1, Supplementary Material, Table S1 and Fig. S1). The meta-analysis was primarily based on pooling GWAS summary results from 318 094 SNPs featured on Illumina HumanHap 300 BeadChips arrays. For studies genotyped on HumanHap550 or 610Quad Illumina platforms, an additional 217 914 SNPs were available to inform our analysis.
Some degree of genomic over-dispersion (genomic inflation) is expected under a polygenic model even in the absence of population stratification and other technical artifacts (20), and the meta-analysis showed modest evidence of over-dispersion (λ = 1.10) for the core 318 094 SNPs typed on Illumina HumanHap 300 BeadChips platform (Fig. 1, Supplementary Material, Fig. S2). Adjustment for a genomic inflation factor of 1.10 in this meta-analysis conservatively reduces the power to detect an association. The λ normalized to 1000 cases and 1000 controls was only 1.005, when the approach proposed by Freedman et al. (21) was applied.
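The rescaling of λ to a reference study of 1000 cases and 1000 controls can be illustrated with a short calculation. This is a minimal sketch that assumes the commonly used formula λ1000 = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/1000 + 1/1000); the formula itself is not printed in the text, and the function name `lambda_1000` is hypothetical. Only λ and the sample sizes are taken from the study.

```python
def lambda_1000(lam, n_cases, n_controls):
    """Rescale a genomic inflation factor to 1000 cases and 1000 controls.

    Assumes the commonly used rescaling formula; it is not quoted from the paper.
    """
    # Ratio of the study's sampling-variance term to that of a 1000/1000 study.
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


# Values reported in the text: lambda = 1.10, 14 900 cases, 29 485 controls.
print(round(lambda_1000(1.10, 14900, 29485), 3))  # 1.005
```

Under this assumption the result reproduces the reported value of 1.005, which shows that most of the observed over-dispersion is attributable to the large sample size.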
SNPs mapping to the previously identified risk loci at 5p15, 6p21 and 15q25 provided the best evidence for an association with lung cancer (Supplementary Material, Tables S2 and S3 and Fig. 1). The strongest association was found for rs1051730, which maps to exon 5 of CHRNA3 at 15q25 (P = 2.2 × 10−63; Fig. 2G), and rs8034191 (r2 = 0.93, D′ = 1 between the two SNPs, P = 9.5 × 10−59), which is located in the second intron of the AGPHD1 gene. Consistent with previous observations (17,18), the rs1051730 association was significant in smokers (P = 1.8 × 10−59) and not in never smokers (P = 0.06, Fig. 2, Supplementary Material, Fig. S3). The association also appeared slightly stronger in females than in males [respective odds ratios (ORs) 1.42 and 1.29, Phet = 0.02] and for late-stage rather than early-stage disease (respective ORs 1.39 and 1.28, Phet = 0.06).
Thirty-one additional SNPs localizing to 15q25.1 with a varied level of linkage disequilibrium (LD) with rs1051730 (Supplementary Material, Table S4) showed a genome-wide significant association (Supplementary Material, Table S2). The third strongest evidence for association was observed for rs6495309 (P = 1.1 × 10−32), which maps 3′ downstream of CHRNB4 and shows weaker correlation with rs1051730 (r2 = 0.15; D′ = 1.00; Fig. 2F). After adjusting for rs1051730, the effect estimate for rs6495309 was greatly attenuated (P = 4.0 × 10−5), while rs1051730 remained significant (P = 2.4 × 10−26; Fig. 3) when allele dosage for rs6495309 was included in the model. Two intronic variants of CHRNA5 (rs680244, effect allele T, OR = 0.90, P = 7.2 × 10−10, and rs6495306, effect allele G, OR = 0.91, P = 1.8 × 10−9) changed the direction of effect when controlling for rs1051730 (OR = 1.14, P = 1.4 × 10−8 and OR = 1.13, P = 4.1 × 10−8 for rs680244, effect allele T, and rs6495306, effect allele G, respectively, after controlling for allelic dosage). Conversely, an analysis adjusting for rs6495309 enhanced their effects consistently across studies (P = 9.2 × 10−31 and 5.1 × 10−32 for rs680244 and rs6495306, respectively). No other 15q25.1 variant showed a significant association when allelic dosages for rs1051730 and rs6495309 were included in the statistical model.
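A pair of SNPs can show D′ = 1.00 together with a low r2, as reported above for rs6495309 and rs1051730, when their minor alleles never occur on the same haplotype but differ in frequency. The sketch below computes both measures from haplotype and allele frequencies using the standard definitions. The function name `ld_stats` and the frequencies 0.35 and 0.22 are hypothetical values chosen for illustration; they are not the observed frequencies of these SNPs.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b
    # D is normalized by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: the two minor alleles never share a haplotype (p_ab = 0).
d, d_prime, r2 = ld_stats(p_ab=0.0, p_a=0.35, p_b=0.22)
print(d_prime, round(r2, 2))  # 1.0 0.15
```

In this configuration only three of the four possible haplotypes exist, which gives D′ = 1, while the difference in allele frequencies keeps r2 low. This is why conditioning on one SNP need not remove the signal at the other.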
After imputation, the most significant association in the meta-analysis of GWAS data from individuals of European ancestry was shown by rs951266 (P = 2.8 × 10−62, Supplementary Material, Fig. S4), which maps to intron 2 of CHRNA5 and is in LD with both rs1051730 and rs16969968. Rs951266 also showed evidence for an association in the Han Chinese population (MAF = 0.04, P < 0.01). Several other rare imputed variants that do not directly correlate with the 15q25 variants identified in GWASs of individuals of European descent (r2 < 0.05 and D′ = 1) showed association with lung cancer risk in the Han Chinese population (Supplementary Material, Fig. S6). These variants map within or in close proximity to IREB2.
As previously reported (6), two independent susceptibility variants, rs2736100 and rs401681, which annotate to TERT and CLPTM1L, were identified in the 5p15.33 region (Fig. 2A and B). Also consistent with previous findings (17,18), the risk associated with rs2736100 was largely confined to adenocarcinoma (P = 1.7 × 10−19). In contrast, rs401681 influenced the risk of all lung cancer histologies, but had its strongest effect on squamous cell carcinoma (OR = 0.84, P = 3.7 × 10−11) and large-cell carcinoma (OR = 0.78, P = 0.006). Both SNPs in the 5p15.33 locus had a stronger effect in never smokers (OR = 1.25 for rs2736100 and OR = 0.80 for rs401681) than in ever smokers (OR = 1.11, Phet = 0.04 for rs2736100 and OR = 0.88; Phet = 0.11 for rs401681). The rs2736100 association was stronger in women than in men (respective ORs = 1.21 and 1.12; Phet = 0.05) and in late-stage versus early-stage disease (respective ORs = 1.19 and 1.07; Phet = 0.05). Logistic regression including the allelic dosage of two independent SNPs (rs401681 and rs2736100) as covariates showed no support for additional independent associations at 5p15.33 (Fig. 3). Consistent with previous reports (15), rs2736100 and rs401681 both showed an association with lung cancer risk in Han Chinese (Supplementary Material, Fig. S5). In the meta-analysis of imputed genotypes, rs2853677, localizing to intron 2 of TERT, showed the strongest evidence for an association with adenocarcinoma (OR for the G allele = 1.33; P = 2.2 × 10−18) and rs465498, localizing to intron 10 of CLPTM1L, showed the strongest association with lung cancer overall (OR for the A allele = 1.15, P = 8.2 × 10−18; Supplementary Material, Fig. S4). Both variants showed correlations with the previously identified SNPs (r2 = 1.0 and D′ = 1.0 between genotyped rs401681 and imputed rs465498 at CLPTM1L and r2 = 0.54 and D′ = 0.80 between genotyped rs2736100 and imputed rs2853677 at TERT).
Analysis of the previously reported lung cancer risk locus at 6p21–6p22, which encompasses HLA, is complicated by an extended LD structure (7). The strongest 6p21–6p22 association was shown for rs3117582 (P = 2.3 × 10−14), which maps 5′ upstream of the DNA repair gene BAT3, and is in complete LD with rs3131379 in MSH5, a DNA mismatch repair gene. This association was stronger for squamous cell carcinoma (respective ORs for squamous carcinoma and adenocarcinoma, 1.30 and 1.12, Phet = 0.02; Fig. 2, Supplementary Material, Fig. S3). Logistic regression including the allelic dosage of rs3117582 did not identify any SNPs associated with lung cancer risk with genome-wide significance (Fig. 3). When markers were imputed, the strongest signal for squamous cell carcinoma in this region was observed for two correlated variants (r2 = 1.0, D′ = 1.0; r2 = 0.76, D′ = 0.93 with rs3117582 for both SNPs): rs2523546 (effect allele G, OR = 0.76, P = 1.1 × 10−10) and rs2523571 (effect allele T, OR = 0.76, P = 9.7 × 10−11) located in the 3′UTR region of HLA-B (Supplementary Material, Fig. S4).
Excluding SNPs mapping to 5p15.33, 6p21–6p22 and 15q25.1, no SNP showed evidence for a genome-wide significant association with lung cancer risk (Fig. 1). Stratification of lung cancer cases by histology did, however, reveal associations with squamous cell carcinoma risk for the previously described risk locus at 12p13 and two potential novel disease loci at 9p21.3 and 2q32.1 (P < 5.0 × 10−7; Figs 1 and 2, Supplementary Material, Fig. S7). No evidence for association was noted for chromosome X variants in the gender subgroup analysis.
Specifically, rs10849605, which maps within intron 1 of the DNA double-strand repair gene RAD52 in the 12p13.33 region, was inversely associated with lung cancer risk (effect allele T; OR = 0.92, P = 5.0 × 10−7). This association was particularly strong among smokers (OR = 0.92, P = 6.0 × 10−7) and cases with squamous (OR = 0.87; P = 6.0 × 10−8) and small-cell carcinoma (OR = 0.85, P = 2.0 × 10−6; Phet = 0.0002 across histologies; Fig. 2E, Supplementary Material, Fig. S3). This variant was not significantly associated with lung cancer risk overall or any histology group in the Han Chinese GWAS (Table 2, Supplementary Material, Fig. S5).
To further explore this region, we performed a meta-analysis of imputed variants from 15 GWASs from eight analytical centers on individuals of European ancestry. In this analysis, rs3748522, which maps to intron 1 of RAD52 and is in strong LD with rs10849605 (r2 = 0.90, D′ = 1), provided the strongest association with squamous cell carcinoma (effect allele A, OR = 0.86, P = 3.4 × 10−8; Supplementary Material, Fig. S4).
The strongest evidence for association at 9p21 was shown with rs1333040, which is located ~74 kb upstream of CDKN2B, within intron 12 of CDKN2B antisense RNA 1 or ANRIL (effect allele C, OR = 1.06, P = 9.4 × 10−5; Figs 1 and 2D). A subgroup analysis by histology revealed strong heterogeneity (Phet = 0.003) with the strongest association for squamous cell cancer (OR = 1.14, P = 2.9 × 10−7). Rs1333040 also showed an association with squamous cell carcinoma in the Han Chinese population (P = 0.03, Supplementary Material, Fig. S5). In the combined analysis of all data sets, this association attained genome-wide significance (P = 2.3 × 10−8; Table 2). In an analysis of imputed data across eight studies, the lowest P-value was shown for rs1537372 (effect allele G, OR = 1.14, P = 3.3 × 10−6) (r2 = 0.60, D′ = 0.95 with rs1333040) located in intron 14 of CDKN2B-AS/ANRIL (Supplementary Material, Fig. S4).
The next strongest association with squamous cell carcinoma risk was shown for rs11683501 at 2q32.1 after adjustment for smoking (effect allele G, OR = 1.17, P = 1.6 × 10−7, Supplementary Material, Fig. S7). This SNP is located 3′ downstream of the nucleoporin 35 kDa (NUP35) gene (22). Imputation did not identify any stronger association (Supplementary Material, Fig. S4), and rs11683501 did not show an association with risk in the Han Chinese population (Supplementary Material, Fig. S5).
We additionally interrogated variation at 15q15.2, which has been previously identified as a determinant of lung cancer risk (8,12–14,23). In the meta-analysis, rs504417 showed the strongest association, but did not attain genome-wide significance (P = 1.2 × 10−6; Supplementary Material, Fig. S7).
Finally, we evaluated the 3q28, 2q29, 13q12.12 and 22q12.2 regions previously identified in GWASs of Asian populations (15,16,24) as risk factors for lung cancer. None of the SNPs (or their proxies) mapping to these loci showed evidence for an association in European populations (Supplementary Material, Table S5).
By pooling summary results from 16 GWASs, we have provided additional evidence for inherited genetic predisposition to lung cancer and have refined associations at the 5p15, 6p21–6p22, 12p13 and 15q25 risk loci. Furthermore, we have shown that 9p21.3 variation is a determinant of squamous cell lung cancer risk.
Consistent with previous studies (6,7,17,18), our meta-analysis confirmed two independent signals at 5p15.33 (annotating TERT and CLPTM1L genes) as determinants of lung cancer risk impacting differentially on lung cancer histology. The rs2736100 variant in TERT was principally associated with adenocarcinoma risk and showed stronger effects in women, early-onset disease and never smokers where the proportion of adenocarcinoma cases is generally higher (25–27). Although indirect, the possibility that the association between rs2736100 and adenocarcinoma risk is mediated through an effect on TERT is supported by an observation of TERT amplification and mRNA overexpression in adenocarcinoma (28), as well as the inhibition of lung adenocarcinoma cell growth promoted by the suppression of hTERT expression (29).
The CLPTM1L association appears stronger in squamous cell lung carcinoma and large-cell lung cancer, two histology groups strongly linked to tobacco smoking. This is consistent with the finding that a variant in CLPTM1L (rs402710, G) has been associated with high levels of DNA adducts caused by smoking (30).
A role for the 6p21–6p22 locus in lung cancer development has been previously shown by some (4,7), but not all studies (17,31). This meta-analysis identified 61 SNPs at 6p21–6p22 showing a significant association with lung cancer risk. Most of these SNPs were highly correlated with rs3117582, which had the strongest effect for squamous cell carcinoma. rs3117582 is located 73 bp 5′ of the gene encoding BCL2-associated athanogene 6 (BAG6/BAT3), which belongs to a family of BAG domain-containing proteins that interact with Hsp70/Hsc70 (32). BAT3/BAG6 deficiency in mice is embryonic lethal, with defects in the development of the lung, brain and kidney (33). BAT3/BAG6 plays an essential role in p53-mediated apoptosis induced by genotoxic stress (34). rs3117582 is in perfect LD with rs3131379, which maps to intron 10 of the DNA mismatch repair mutS homolog 5 (Escherichia coli) (MSH5) gene. Both BAT3 and MSH5 are expressed in lung tissue and are strong potential candidates for being the functional basis for the association (35,36). Since the development of squamous cell lung cancer is strongly influenced by environmental exposure to carcinogens that cause DNA damage, it is highly plausible that genetic variation in the DNA repair mechanism and/or DNA-damage-induced apoptosis would play an etiologic role.
The 9p21.3 region encodes three tumor suppressor genes that play key roles in cell cycle inhibition, senescence and stress-induced apoptosis: CDKN2A/p16INK4A (cyclin-dependent kinase inhibitors 2A), p14ARF (alternative transcript generated by alternative exon 1 of CDKN2A/p16INK4A) and CDKN2B/p15INK4B (cyclin-dependent kinase inhibitors 2B) (37). CDKN2A/p16INK4A was originally identified as a melanoma susceptibility gene (38), but is inactivated in many tumors including lung cancer (39–42). 9p21.3 variants associated with lung cancer risk in our study are located 5′ upstream of CDKN2B, within the intronic region of the CDKN2B antisense RNA (ANRIL/CDKN2B-AS). Recent studies have demonstrated 9p21.3 to be a susceptibility locus in many GWASs (43) including on breast cancer (44), glioma (45,46), type 2 diabetes (47–49), endometriosis (50), coronary artery disease (51,52), intracranial aneurysm (53) and glaucoma (44). Several splice variants with varied enhancer activity have been described for ANRIL (54), including GQ495921, GQ495919 and GQ495923, which are expressed in lung cancer cell lines (55). Multiple SNPs, including rs1333040 reported here, have been shown to be associated with ANRIL mRNA expression in peripheral blood (56). ANRIL recruits a polycomb repression complex (PRC2) to silence CDKN2B but not CDKN2A (54,57,58).
The identified SNP rs1333040 correlates (0.7 < r2 < 0.8) with 24 variants located within or 3′UTR downstream of CDKN2B-AS. None of these variants are located within the coding sequence. However, the possibility that the identified variant is tagging a functional SNP located directly within the CDKN2A/p16INK4A, p14ARF or CDKN2B/p15INK4B genes cannot be excluded. Further studies are needed to evaluate the effect that the SNPs we identified may have on ANRIL/CDKN2B-AS.
The 12p13 risk variants map within the RAD52 homolog gene, which plays a role in DNA double-strand repair and homologous recombination (59,60). A role for RAD52 in lung carcinogenesis was originally proposed from a candidate gene study reported by Danoy et al. (61), a finding confirmed by a pathway-based analysis using GWAS data from the National Cancer Institute (NCI), UK, and the MD Anderson Cancer Center (MDACC) studies which robustly demonstrated an association for squamous lung cancer (19). The role of RAD52 in repairing double-strand breaks induced by tobacco smoking is supported by the association being confined to smokers.
The present study has confirmed the smoking-related effect of 15q25 variation on lung cancer risk and has provided additional support for the existence of several independent disease loci within the CHRNA5/CHRNA3/CHRNB4 region. This is consistent with genotyping data which has shown several distinct signals for smoking behavior and lung cancer risk within this region (62–64). Saccone et al. (63) described four distinct loci influencing smoking behavior at 15q25 with at least two of them (locus 1 annotated by rs1051730/rs16969968 and locus 3 annotated by rs588765) having independent effects on smoking behavior. The second locus annotated by rs6495308 was more strongly associated with heavy smoking. In contrast, the Oxford-GlaxoSmithKline study reported a secondary locus distinct from rs6495308 (62).
Our current study supports the existence of the two distinct signals defined by rs1051730/rs16969968 and rs6495309/rs6495308/rs2036534. Reciprocal attenuation of the effects for these two signals when allele dosage for an opposite variant is included into a model raises the possibility of an underlying haplotypic effect (r2 = 0.17, D′ = 1.0 between these two SNPs) or an imperfect correlation with an unknown functional variant. We also observed an effect for rs680244/rs6495306 (r2 = 1.0, D′ = 1.0 with rs588765 for both) in our meta-analysis, which remained significant at a genome-wide level when controlled for rs6495309 and strongly diminished when controlled for both rs1051730 and rs6495309. This suggests that the rs588765/rs680244/rs6495306 effect on lung cancer risk is not independent. Similar to the earlier observation from Saccone et al. (63), these variants had opposite effects when adjusted for rs1051730, which may reflect a haplotypic organization in which the rs1051730 allele increases risk while other associated SNPs decrease the risk.
Our study confirmed a different genetic background for the two major histological subtypes of lung cancer—squamous cell carcinoma and adenocarcinoma. Although the role of the CHRNA5/CHRNA3/CHRNB4 locus at 15q25 and, to some extent, the CLPTM1L locus at 5p15.33 appeared independent from the histology type, all other identified genomic regions showed strong heterogeneity by histology, suggesting different genetic etiologies for these lung cancer subtypes. The significance of cell cycle control (CDKN2A/ARF/CDKN2B/ANRIL), DNA damage response and DNA repair genes (RAD52 and BAT3/MSH5) in squamous cell carcinoma is consistent with the notion of a particularly strong effect of smoking on the development of this histological subtype (65) and suggests candidate drug targets that may have clinical utility (66).
The power of the meta-analysis to identify 5p15.33, 6p21–6p22 and 15q25.1 risk SNPs and loci was over 90%, making it unlikely that additional lung cancer susceptibility variants of similar magnitude and allele frequencies can be identified by simply increasing sample size in Europeans. In contrast, our study had limited power to detect rarer variants (i.e. MAFs < 0.05) and common variants of a small effect size (i.e. RR ≤ 1.05) and/or with modest effects confined to a specific histology (Supplementary Material, Fig. S8). The present study was also limited to the genetic variants tagged by the genotyping arrays used. Several novel variants were identified within 5p15.33, 6p21–6p22, 9p21.3, 12p13.33 and 15q25.1 through imputation. The imputed variants correlated with the previously genotyped SNPs in individuals of European descent, suggesting that no additional independent signals remain to be identified within the known loci. However, the replication of imputed variants by direct genotyping would be helpful to completely characterize the strength of effects of these SNPs.
In summary, by pooling results from 16 GWASs, we have been able to comprehensively assay the relationship between common genetic variation and lung cancer risk. Furthermore, we have been able to demonstrate a novel relationship between 9p21.3 variation and squamous cell lung carcinoma. This study provides valuable insights into the pathogenesis of lung cancer, indicating that there is etiological heterogeneity to disease development which is influenced by inherited genetic variation. The identification of additional risk loci is likely to require genotyping larger series using arrays formatted to capture variants poorly tagged by current platforms.
The study was conducted under the auspices of the Transdisciplinary Research In Cancer of the Lung (TRICL) Research Team, which is a part of the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium, and associated with the International Lung Cancer Consortium (ILCCO).
The meta-analysis was based on summary data from 16 previously reported lung cancer GWASs undertaken by nine analytical centers providing genotype data on 14 900 lung cancer cases and 29 485 controls of European descent: the MD Anderson Cancer Center lung cancer study (5); cases from the Liverpool Lung Project and control individuals from the UK Blood Service collections (UKBS) (4,67); the UK lung cancer GWAS from the Institute for Cancer Research including lung cancer cases from the Genetic Lung Cancer Predisposition Study (GELCAPS) and controls from the 1958 Birth Cohort (7,68,69); the deCODE Genetics lung cancer study (9); the Helmholtz-Gemeinschaft Deutscher Forschungszentren (HGF) lung cancer GWAS (70); the lung cancer study from Canada (University of Toronto and Samuel Lunenfeld Research Institute) (4); the Harvard lung cancer study (71); the NCI lung cancer GWAS including the Environment and Genetics in Lung Cancer Etiology (EAGLE) study (72), the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC) (73), the Prostate, Lung, Colon, Ovary Screening Trial (PLCO) (74) and the Cancer Prevention Study II Nutrition Cohort (CPS-II) (18,75); the IARC lung cancer GWAS (4) including Central Europe GWAS (76), the Carotene and Retinol Efficacy Trial (CARET) cohort lung cancer GWAS (77), the HUNT2/Tromso 4 study (78), lung cancer GWAS from France (79) and the lung cancer study from Estonia (80,81) (Table 1; Supplementary Material, Material and methods). All participants provided informed written consent. All studies were reviewed and approved by institutional ethics review committees at the involved institutions. In each of these studies, SNP genotyping had been performed using Illumina HumanHap 300 BeadChips, HumanHap550 or 610 Quad arrays. Further details about selection criteria, cancer diagnosis, genotyping and quality control in each study are provided in the Supplementary Material, Material and methods. 
Lung cancer diagnosis in most studies was based on histopathology or cytology but in a minority on clinical history and imaging.
The Chinese lung cancer GWAS included 2338 lung cancer cases and 3077 controls from the Nanjing and Beijing Lung Cancer Studies (15) genotyped using Affymetrix Human SNP Array 6.0 chips (Supplementary Material, Material and methods). The Nanjing and Beijing Lung Cancer Studies provided summary data on the top SNPs for overall lung cancer risk and risk by specific histology. The selected loci were 5p15.33 (1.20–1.61 Mb), 6p22.3–6p21.31 (22.0–36.5 Mb), 15q25.1 (76.1–77.2 Mb), 18q2.3 (40.0–21.5 Mb), 12p13.33 (0.54–1.54 Mb), 2q32.1 (183.4–184.5 Mb) and 9p21.3 (21.66–22.2 Mb) (NCBI Build 36).
Associations between SNP genotypes and lung cancer risk were evaluated under a log-additive model of inheritance. Additionally, we explored dominant, recessive and co-dominant models. Each study center provided summary statistics from two models: (i) unconditional logistic regression adjusted for sex, age at diagnosis or age at recruitment (5-year age intervals), country/study center where appropriate and significant principal components for population stratification and (ii) the same model additionally adjusted for smoking status coded as a categorical variable (never/current/former). Analyses stratified by histology (adenocarcinoma, squamous cell carcinoma, large-cell carcinoma and small-cell carcinoma), sex, age at diagnosis for cases or recruitment for controls (≤50 and >50 years), smoking status (current, former, never), tumor stage (I–IV) and family history of lung cancer in a first-degree relative were performed (Supplementary Material, Table S1). Neither of the UK studies contributed data to the smoking analysis, since this information was not available for controls. In addition to the above analyses, each center provided lung cancer risk estimates for the 15q25, 6p21 and 5p15 loci after controlling for allelic dosage for the most significantly associated SNP(s) within the locus. For the 15q25 locus, the statistical model included rs1051730 and/or rs6495309 allelic dosages as covariates; for the 6p21 locus, rs3117582 allelic dosage; and for the 5p15 locus, allelic dosages for rs401681 and/or rs2736100.
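The inheritance models named above differ only in how the effect-allele count enters the regression. The following is a minimal sketch of the usual genotype codings, not the study's analysis code; the function names `code_genotype` and `genotype_odds_ratio` are hypothetical.

```python
def code_genotype(n_effect_alleles, model):
    """Map an effect-allele count (0, 1 or 2) to the covariate(s) used in logistic regression."""
    if n_effect_alleles not in (0, 1, 2):
        raise ValueError("genotype must carry 0, 1 or 2 effect alleles")
    if model == "log-additive":
        # One covariate: each extra allele multiplies the odds by the same factor.
        return (n_effect_alleles,)
    if model == "dominant":
        # Carriers of at least one effect allele versus non-carriers.
        return (int(n_effect_alleles >= 1),)
    if model == "recessive":
        # Homozygotes for the effect allele versus everyone else.
        return (int(n_effect_alleles == 2),)
    if model == "co-dominant":
        # Two indicator covariates: heterozygote and effect-allele homozygote.
        return (int(n_effect_alleles == 1), int(n_effect_alleles == 2))
    raise ValueError("unknown model: " + model)


def genotype_odds_ratio(per_allele_or, n_effect_alleles):
    """Odds ratio implied by a log-additive model for a given allele count."""
    return per_allele_or ** n_effect_alleles


for g in (0, 1, 2):
    print(g, code_genotype(g, "log-additive"), code_genotype(g, "co-dominant"))
```

Under the log-additive coding, a per-allele OR of 1.14 implies an OR of about 1.30 (1.14 squared) for carriers of two effect alleles relative to carriers of none. Conditioning on a lead SNP amounts to adding its allelic dosage as one more covariate in the same model.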
Prior to undertaking the meta-analysis of all GWAS data sets, we searched for potential errors and biases in data from each case–control series (82). With the exception of the Liverpool Lung Project, quantile–quantile (Q–Q) plots showed that there was minimal inflation of the test statistics, indicating that substantial cryptic population substructure or differential genotype calling between cases and controls was unlikely in each of the GWASs (Supplementary Material, Fig. S1).
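Inflation of test statistics is often summarized by a single factor λ. The sketch below uses the median-based estimator: the median observed 1-df chi-square statistic divided by the median expected under the null. This is one common estimator and is an assumption here, since the text does not state which estimator was applied to the individual studies; the function names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution, i.e. the statistic for P = 0.5 (about 0.455).
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.25) ** 2


def p_to_chi2(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; squaring removes the sign
    return z * z


def inflation_factor(p_values):
    """Median-based genomic inflation factor for a collection of P-values."""
    return median(p_to_chi2(p) for p in p_values) / _CHI2_1DF_MEDIAN


# Uniformly spread P-values, as expected with no association and no stratification.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(inflation_factor(null_p), 2))
```

Values close to 1 indicate little inflation; values well above 1 point to population substructure, differential genotype calling or, in very large samples, polygenic signal.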
To refine the association of the previously reported and newly identified disease loci, we imputed untyped genotypes using Impute2 (83), Mach1 (84,85) or minimac (86) software and HapMap Phase II, Phase III and/or 1000 Genome Project data release 2010-08 or 2010-06 reference genotypes (Supplementary Material, Table S6). The selected loci were 5p15.33 (1.20–1.61 Mb), 6p22.3–6p21.31 (22.0–36.5 Mb), 15q25.1 (76.1–77.2 Mb), 18q2.3 (40.0–21.5 Mb), 12p13.33 (0.54–1.54 Mb), 2q32.1 (183.4–184.5 Mb) and 9p21.3 (21.66–22.2 Mb) (NCBI Build 36). The analytical scheme was similar to the meta-analysis but taking imputation uncertainty into account by using posterior means or allele dosage in logistic regression. Imputed allele dosage for each SNP was tested for association with lung cancer risk using the two models with and without adjustment for smoking as described above. The meta-analysis of imputed genotypes included all studies except the HGF Germany where imputed data were not available. Poorly imputed SNPs defined by an RSQR < 0.30 with MACH1/minimac or an information measure Is < 0.30 with IMPUTE2 were excluded from the analyses (Supplementary Material, Table S6).
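Taking imputation uncertainty into account through allele dosage means replacing a hard genotype call by its expected effect-allele count. A minimal sketch follows, assuming the imputation output provides three posterior genotype probabilities per individual; the function names are hypothetical, and the default threshold is the 0.30 cut-off stated above.

```python
def expected_dosage(p_hom_other, p_het, p_hom_effect):
    """Expected number of effect alleles (0 to 2) from posterior genotype probabilities."""
    total = p_hom_other + p_het + p_hom_effect
    if abs(total - 1.0) > 1e-6:
        raise ValueError("posterior probabilities must sum to 1")
    return p_het + 2.0 * p_hom_effect


def keep_snp(quality, threshold=0.30):
    """Apply the exclusion rule: SNPs with a quality metric below the threshold are dropped."""
    return quality >= threshold


# Hypothetical individual: most likely heterozygous, with some remaining uncertainty.
print(round(expected_dosage(0.05, 0.80, 0.15), 2))
print(keep_snp(0.29), keep_snp(0.30))
```

The dosage then enters the logistic regression as a continuous covariate in place of the 0/1/2 genotype count.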
The meta-analysis was primarily based on pooling GWAS results for the log-additive model of inheritance from 318 094 SNPs featured on Illumina HumanHap 300 BeadChips arrays. For studies genotyped on HumanHap550 or 610Quad Illumina platforms, an additional 217 914 SNPs were available to inform our analysis.
Meta-analysis under fixed- and random-effects models was conducted. As with individual studies, we examined the over-dispersion of P-values in the meta-analysis by generating Q–Q plots and deriving an inflation factor λ by comparison of observed versus expected P-values for the meta-analysis, applying the estlambda function within the GenABEL package (87). Cochran's Q statistic to test for heterogeneity and the I2 statistic to quantify the proportion of the total variation due to heterogeneity were calculated. I2 values ≥75% are considered characteristic of large heterogeneity (88). To assess the robustness of associations in the meta-analysis, we performed a sensitivity analysis sequentially excluding studies. Wherever removing one study resulted in a >10% change of the OR point estimates, we reported results separately (89).
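The pooling and heterogeneity statistics described above can be sketched with the standard inverse-variance formulas. The code assumes each study supplies a log odds ratio and its standard error; it shows the fixed-effect model only (the random-effects model is not shown), it is not the software used in the study, and all function names are hypothetical.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted pooled log OR and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def heterogeneity(betas, ses):
    """Cochran's Q and I2 (in percent) around the fixed-effect estimate."""
    pooled, _ = fixed_effect(betas, ses)
    q = sum(((b - pooled) / se) ** 2 for b, se in zip(betas, ses))
    df = len(betas) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def leave_one_out_flags(betas, ses, threshold=0.10):
    """Flag each study whose removal shifts the pooled OR by more than the threshold."""
    or_all = math.exp(fixed_effect(betas, ses)[0])
    flags = []
    for i in range(len(betas)):
        rest_b = betas[:i] + betas[i + 1:]
        rest_s = ses[:i] + ses[i + 1:]
        or_rest = math.exp(fixed_effect(rest_b, rest_s)[0])
        flags.append(abs(or_rest - or_all) / or_all > threshold)
    return flags


# Hypothetical per-study log ORs and standard errors, for illustration only.
betas = [0.25, 0.30, 0.28, 0.60]
ses = [0.05, 0.06, 0.04, 0.20]
pooled, se = fixed_effect(betas, ses)
q, i2 = heterogeneity(betas, ses)
print(round(math.exp(pooled), 2), round(q, 2), round(i2, 1))
print(leave_one_out_flags(betas, ses))
```

Because the weights are inverse variances, a small imprecise study with an outlying estimate has little influence on the pooled OR, which is why the leave-one-out check uses a relative change in the OR rather than the size of the excluded study's own estimate.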
All calculations were performed using PLINK (90) and SAS version 9.2 (SAS Institute Inc., Cary, NC, USA). Q–Q and Manhattan plots were created using an R library GenABEL (87). We used LocusZoom for regional visualization of results (91). Power calculations were performed using QUANTO version 1.2.4 for the main effect of gene and the log-additive model of inheritance stipulating a P-value of 5.0 × 10−8 (92).
HapMap Project: http://www.hapmap.org
1000 Genomes Project: http://www.1000genomes.org/
SNP Annotation and Proxy Search: http://www.broadinstitute.org/mpg/snap/index.php
R project: www.r-project.org/
AceView: integrative annotation of cDNA-supported genes in human, mouse, rat, worm and Arabidopsis: http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/
International Lung Cancer Consortium: http://ilcco.iarc.fr/
Genetic Association Mechanisms in Oncology (GAME-ON): A Post-Genome Wide Association Initiative: http://epi.grants.cancer.gov/gameon/
This study was supported by a grant from the National Institutes of Health (NIH) (U19CA148127). The SLRI study was supported by Canadian Cancer Society Research Institute (020214), Ontario Institute of Cancer and Cancer Care Ontario Chair Award to R.H. The ICR study was supported by Cancer Research UK (C1298/A8780 and C1298/A8362, Bobby Moore Fund for Cancer Research UK) and NCRN, HEAL and Sanofi-Aventis. Additional funding was obtained from NIH grants (5R01CA055769, 5R01CA127219, 5R01CA133996, and 5R01CA121197). The Liverpool Lung Project (LLP) was supported by The Roy Castle Lung Cancer Foundation, UK. The ICR and LLP studies made use of genotyping data from the Wellcome Trust Case Control Consortium 2 (WTCCC2); a full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Sample collection for the Heidelberg lung cancer study was in part supported by a grant (70-2919) from the Deutsche Krebshilfe. The work was additionally supported by a Helmholtz-DAAD fellowship (A/07/97379 to M.N.T.) and by the National Institutes of Health (USA) (U19CA148127). The KORA Surveys were financed by the GSF, which is funded by the German Federal Ministry of Education, Science, Research and Technology and the State of Bavaria. The LUng Cancer in the Young study (LUCY) was funded in part by the National Genome Research Network (NGFN), the DFG (BI 576/2-1; BI 576/2-2), the Helmholtzgemeinschaft (HGF) and the Federal Office for Radiation Protection (BfS: STSch4454). Genotyping was performed in the Genome Analysis Center (GAC) of the Helmholtz Zentrum Muenchen. Support for the Central Europe, HUNT2/Tromsø and CARET genome-wide studies was provided by Institut National du Cancer, France. Support for the HUNT2/Tromsø genome-wide study was also provided by the European Community (Integrated Project DNA repair, LSHG-CT-2005-512113), the Norwegian Cancer Association and the Functional Genomics Programme of Research Council of Norway.
Support for the Central Europe study, Czech Republic, was also provided by the European Regional Development Fund and the State Budget of the Czech Republic (RECAMO, CZ.1.05/2.1.00/03.0101). Support for the CARET genome-wide study was also provided by grants from the National Cancer Institute of the U.S. National Institutes of Health (R01 CA111703 and UO1 CA63673) and by funds from the Fred Hutchinson Cancer Research Center. Additional funding for study coordination, genotyping of replication studies and statistical analysis was provided by the US National Cancer Institute (R01 CA092039). The lung cancer GWAS from Estonia was partly supported by an FP7 grant (REGPOT 245536), by the Estonian Government (SF0180142s08), by EU RDF in the frame of Centre of Excellence in Genomics and Estonian Research Infrastructure's Roadmap and by University of Tartu (SP1GVARENG). The work reported in this paper was partly undertaken during the tenure of a Postdoctoral Fellowship from the IARC (for MNT). The Environment and Genetics in Lung Cancer Etiology (EAGLE), the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC) and the Prostate, Lung, Colon, Ovary Screening Trial (PLCO) studies and the genotyping of ATBC, the Cancer Prevention Study II Nutrition Cohort (CPS-II) and part of PLCO were supported by the Intramural Research Program of NIH, NCI, Division of Cancer Epidemiology and Genetics. ATBC was also supported by U.S. Public Health Service contracts (N01-CN-45165, N01-RC-45035 and N01-RC-37004) from the NCI.
PLCO was also supported by individual contracts from the NCI to the University of Colorado Denver (NO1-CN-25514), Georgetown University (NO1-CN-25522), Pacific Health Research Institute (NO1-CN-25515), Henry Ford Health System (NO1-CN-25512), University of Minnesota (NO1-CN-25513), Washington University (NO1-CN-25516), University of Pittsburgh (NO1-CN-25511), University of Utah (NO1-CN-25524), Marshfield Clinic Research Foundation (NO1-CN-25518), University of Alabama at Birmingham (NO1-CN-75022), Westat, Inc. (NO1-CN-25476) and University of California, Los Angeles (NO1-CN-25404). The Cancer Prevention Study II Nutrition Cohort was supported by the American Cancer Society. The NIH Genes, Environment and Health Initiative (GEI) partly funded DNA extraction and statistical analyses (HG-06-033-NCI-01 and RO1HL091172-01), genotyping at the Johns Hopkins University Center for Inherited Disease Research (U01HG004438 and NIH HHSN268200782096C) and study coordination at the GENEVA Coordination Center (U01 HG004446) for EAGLE and part of PLCO studies. Funding for the MD Anderson Cancer Study was provided by NIH grants (P50 CA70907, R01CA121197, RO1 CA127219, U19 CA148127, RO1 CA55769) and CPRIT grant (RP100443). Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is funded through a federal contract from the National Institutes of Health to The Johns Hopkins University (HHSN268200782096C). The Harvard Lung Cancer Study was funded by the National Institutes of Health (CA074386, CA092824, CA090578). The Nanjing and Beijing studies were funded by the China National High-Tech Research and Development Program Grant (2009AA022705), the National Key Basic Research Program Grant (2011CB503805) and the National Natural Science Foundation of China (30730080, 30972541, 30901233 and 30872178).
Funding to pay the Open Access publication charges for this article was provided by a grant from the National Institutes of Health (NIH) (U19CA148127).
We would like to thank all individuals who participated in this study and who made this study possible.
We gratefully acknowledge the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium.
We are grateful to clinicians who took part in the GELCAPS consortium and the Liverpool Lung Project (LLP) program. The ICR and LLP studies made use of genotyping data from the WTCCC2; a full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk.
We thank Dr H. Dally, Mrs A. Seidel, Dr T. Muley, Dr S. Thiel, Dr H. Wikman, Ms U. von Seydlitz-Kurzbach, Ms D. Bodemer, Dr C. Klappenecker, Mr M. Hoffmann and Mr C. Beynon for help with sample and/or data collection and archiving for the Heidelberg lung study. We are grateful to all patients and staff at the Thoraxklinik-Heidelberg, who participated in the Heidelberg lung cancer study and particularly Prof. Peter Drings for making it possible.
We gratefully acknowledge the KORA study group especially G. Fischer, H. Grallert, N. Klopp, C. Gieger, R. Holle, C. Hanrieder and A. Steinwachs, Institute of Epidemiology, Helmholtz Centre Munich, Neuherberg, Germany, and all individuals, who participated as cases or controls in this study and the KORA Study Center and their co-workers for organizing and conducting the data collection.
We also gratefully acknowledge the LUCY-consortium especially G. Wölke and V. Zietemann for coordinating recruitment, all patients and staff of the participating hospitals: Aurich (Dr Heidi Kleen); Bad Berka (Dr med. R. Bonnet, Klinik für Pneumologie, Zentralklinik Bad Berka GmbH); Bonn (Prof. Ko, Dr Geisen, Innere Medizin I, Johanniter Krankenhaus); Bonn (Dr Stier, Medizinische Poliklinik, Universität Bonn); Bremen (Prof. Dr D. Ukena, Dr Penzl, Zentralkrankenhaus Bremen Ost, Pneumologische Klinik); Chemnitz (PD Dr Schmidt, OA Dr Jielge, Klinikum Chemnitz, Abteilung Innere Medizin); Coswig (Prof. Höffken, Dr Schmidt, Fachkrankenhaus Coswig); Diekholzen (Dr Hamm, Kreiskrankenhaus Diekholzen, Klinik für Pneumologie); Donaustauf (Prof. Pfeifer, Dr v. Bültzingslöwen, Fr. Schneider, Fachklinik für Atemwegserkrankungen); Essen (Prof. Teschler, Dr Fischer, Ruhrlandklinik—Universitätsklinik, Abt. Pneumologie); Gauting (Prof. Häußinger, Prof. Thetter, Dr Düll, Dr Wagner, Pneumologische Klinik München-Gauting); Gera (CA MR Dr Heil, OÄ Dr Täuscher, OA Dr Lange, II. Medizinische Klinik, Wald-Klinikum Gera); Göttingen (Prof. Trümper, Prof. Griesinger, Dr Overbeck, Abteilung Onkologie, Hämatologie); Göttingen (Prof. Schöndube, Dr Danner, Abteilung Thorax-, Herz- und Gefäßchirurgie); Göttingen/Weende (Dr med. Fleischer, Ev. Krankenhaus Göttingen-Weende e.V., Abteilung Allgemeinchirurgie); Greifenstein (Prof. Morr, Dr M. Degen, Dr Matter, Pneumologische Klinik, Waldhof Elgershausen); Greifswald (Prof. Ewert, Dr Altesellmeier, Universitätsklinik Greifswald, Klinik für Innere Medizin B); Hannover (Prof. Schönhofer, Dr Kohlaußen, Klinikum Hannover Oststadt, Medizinische Klinik II, Pneumologie); Heidelberg (Prof. Drings, Dr Herrmann, Thoraxklinik-Heidelberg GmbH, Abt. Innere Medizin-Onkologie); Hildesheim (Prof. Kaiser, St. Bernward Krankenhaus, Medizinische Klinik II); Homburg (Prof. Sybrecht, OA Dr Gröschel, Dr Mack, Uniklinik des Saarlandes, Innere Medizin V); Immenhausen (Prof. 
Andreas, Dr Rittmeyer, Fachklinik für Lungenerkrankungen); Köln (Priv. Doz. Dr Stölben, Kliniken der Stadt Köln, Lungenklinik Krankenhaus Merheim); Köln (Prof. Wolf, Dr Staratschek-Jox, Klinikum der Universität Köln, Klinik I für Innere Medizin); Leipzig (Prof. Gillisen, OA Dr Cebulla, Städt. Klinikum St Georg, Robert-Koch-Klinik); Leipzig (Kreymborg, Universitätsklinikum Leipzig, Medizinische Klinik I, Abteilung Pneumologie); Lenglern (Prof. Criée, Dr Körber, Dr Knaack, Ev. Krankenhaus Weende e.V., Standort Lenglern, Abt. Pneumologie); München (Prof. Huber, Dr Borgmeier, Klinikum der LMU-Innenstadt, Abt. Pneumologie); Neustadt a. Harz (Dr Keppler, Schäfer, Evangelisches Fachkrankenhaus für Atemwegserkrankungen); Rotenburg (Prof. Schaberg, Dr Struß, Diakoniekrankenhaus Rotenburg, Lungenklinik Unterstedt); St. Pölten – Österreich (OA Dr M. Wiesholzer, Zentralklinikum St Pölten, I. Medizinische Klinik).
We thank Li Su, Yang Zhao, Geoffrey Liu, John Wain, Rebecca Heist and Kofi Asomaning.
We gratefully acknowledge the Icelandic Cancer Registry (www.krabbameinsskra.is) for assistance in the ascertainment of the Icelandic UBC patients for the deCODE lung cancer study.
Conflict of Interest statement. Some authors involved in this study own stock in DeCODE Genetics.