Chromosome 5p15.33 has been identified by genome-wide association studies as one of the regions that associate with lung cancer risk. A few single-nucleotide polymorphisms (SNPs) in the telomerase reverse transcriptase (TERT) and cleft lip and palate transmembrane 1-like (CLPTM1L) genes located in this region have shown consistent associations. We performed dense genotyping of SNPs in this region to refine the previously reported association signals for lung cancer risk. Two hundred and fifteen SNPs were genotyped on an Illumina iSelect panel, in a hospital-based case–control study of 1681 lung cancer cases and 1235 unaffected controls. Association was tested using unconditional logistic regression, while adjusting for age, sex and pack-years smoked. Furthermore, since many of the SNPs were in linkage disequilibrium (LD), haplotype blocks were constructed, from which tagging SNPs at an r2 threshold of ≥0.95 were included in a stepwise forward selection logistic regression model. Of the 215 SNPs, 69 were significant at P < 0.05 in univariate analysis; of these, 35 SNPs meeting the r2 threshold were included in the multiple logistic regression model. Two SNPs, rs370348 (odds ratio = 0.76, P = 1.6 × 10−6) and rs4975538 (odds ratio = 1.18, P = 0.005), significantly associated with risk in the overall sample. Among ever smokers, rs4975615 (odds ratio = 0.75, P = 1.2 × 10−4) and rs4975538 (odds ratio = 1.26, P = 0.002) were significant, whereas among never-smokers, rs451360 (odds ratio = 0.62, P = 7.6 × 10−5) was significant. We refined the consistent association signal in this region, allowing for the considerable LD between SNPs and identified four novel SNPs that were independently and significantly associated with lung cancer risk. Results of these analyses strongly suggest effects on risk from several loci in the TERT/CLPTM1L region.
Genome-wide association studies (GWAS) have identified chromosome 5p15.33 as one of the regions that reproducibly associates with lung cancer risk and with risk for several other cancers. Along with 15q25.1 and 6p21.33, 5p15.33 is one of the three regions with the strongest and most consistent association signals for lung cancer risk (1–3). Several studies have confirmed the association of single-nucleotide polymorphisms (SNPs) in the 5p15.33 region with lung cancer risk (4–8), suggesting that this region may have a role in lung cancer etiology. The 5p15.33 region contains two candidate susceptibility genes, telomerase reverse transcriptase (TERT) and cleft lip and palate transmembrane 1-like (CLPTM1L). TERT encodes the catalytic subunit of telomerase, which maintains telomere ends. Its overexpression prolongs the life span of the cell (9). Although not detectable in most normal tissues, it is overexpressed in cancer cells. CLPTM1L plays a role in apoptosis and has been found to be upregulated in cisplatin-resistant cell lines (10).
That the 5p15.33 region is important in susceptibility to lung cancer is further supported by replication of the association in diverse populations and in subgroups stratified by smoking history, histological subtype and sex. A few SNPs in the TERT and CLPTM1L genes have been replicated in African Americans (4) and Asians (7,11). Similarly, the rs2361000 SNP in the hTERT gene has been associated with lung cancer risk in smokers and shows stronger associations among never-smokers (8). Indeed, this SNP appears to strongly influence risk in those with adenocarcinoma (4,11), the most common form of lung cancer in never-smokers. Finally, tumor studies confirm the relevance of the 5p15.33 region in lung cancer etiology. Comparative genomic hybridization and fluorescence in situ hybridization studies of lung tumors by Kang et al. (12) have identified amplification in the 5p15.33 region as one of the most consistent alterations in early stage lung cancer. Likewise, Zienolddiny et al. (13) found that rs402710 (TERT–CLPTM1L locus) is associated with increased formation of DNA adducts in the lungs, which are possible precursors to lung carcinogenesis.
Since GWAS coverage is roughly one SNP per 10000 bp and may not be adequate to refine the localization of the causal variants in this region, we conducted a dense analysis of 215 SNPs across the TERT and CLPTM1L genes to refine the association signal.
The study participants for this case–control study were a consecutive series of newly diagnosed Caucasian lung cancer cases from an ongoing lung cancer study that has been accruing participants at The University of Texas MD Anderson Cancer Center since 1995. The controls were recruited from the Kelsey-Seybold Foundation, Houston’s largest multidisciplinary physician practice. The controls were frequency matched to the cases on age (±5 years), sex, smoking status and ethnicity (14). These study subjects were not included as participants in the GWAS for lung cancer conducted at MD Anderson that was reported recently (1). Only non-Hispanic white subjects (both cases and controls) were included in these analyses. All participants provided informed consent and the study was approved by the Institutional Review Board.
We used SNP browser version 4.0 (15) to identify SNPs for further study. The software was designed for selection of SNPs based on observed linkage disequilibrium (LD), including construction of metric LD maps and the selection of haplotype-tagging SNPs. SNP selection was based on the ethnic-specific LD patterns identified by HapMap Project http://hapmap.ncbi.nlm.nih.gov/.
We analyzed SNPs located in the area between 1290 and 1450 kb on chromosome 5. Because we had a relatively large sample size and wanted coverage of the candidate region to be as complete as possible, we used liberal criteria for SNP selection: a minor allele frequency ≥0.01, inclusion of all non-synonymous SNPs and no initial exclusion based on the distance from the neighboring SNP. The area includes four genes: SLC6A18, TERT, CLPTM1L and SLC6A3. A panel of 568 validated SNPs was generated.
A total of 221 SNPs in the 5p15.33 region were included as part of a custom-designed Illumina iSelect Genotyping Beadchip (San Diego, CA) that included 19949 SNPs, a majority of which (11930) were in inflammation pathways for a separate analysis. Of the 568 validated SNPs in the target region of interest, many SNPs that had been identified had to be removed because the Illumina iSelect chemistry cannot accommodate SNPs <50 bp from each other. SNPs were also excluded if other quality control metrics established by Illumina indicated that the SNP was likely to fail. Genotypes were called using the Beadstudio software. There were 64 expected duplicates and 4 unexpected duplicates (individuals who had DNA analyzed twice and were found to have the same DNA and are therefore either identical twins or the same person). The data were first filtered to remove any SNPs with <95% call rate and then individuals with <90% call rate across SNPs were removed. Among the remaining samples, the error rate obtained by comparing original to replicate samples was 0.0176%. The final data set consisted of 1681 Caucasian lung cancer cases and 1235 unaffected controls, with genotyped results for 215 SNPs.
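The two-stage call-rate filter described above (SNPs first, then individuals) can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in the study: the function name `filter_by_call_rate`, the data layout (a dictionary of per-sample genotype lists with `None` for a missing call) and the toy data are all assumptions made for the example.

```python
def filter_by_call_rate(genotypes, snp_min=0.95, sample_min=0.90):
    """Drop low-call-rate SNPs, then low-call-rate samples.

    genotypes: dict mapping a sample ID to a list of calls, one per SNP,
    with None marking a missing call (hypothetical layout).
    Returns (indices of kept SNPs, dict of kept samples restricted to those SNPs).
    """
    samples = list(genotypes)
    n_snps = len(genotypes[samples[0]])

    # Stage 1: keep SNPs called in at least snp_min of all samples.
    keep_snps = []
    for j in range(n_snps):
        called = sum(genotypes[s][j] is not None for s in samples)
        if called / len(samples) >= snp_min:
            keep_snps.append(j)

    # Stage 2: keep samples called at >= sample_min of the remaining SNPs.
    kept = {}
    for s in samples:
        calls = [genotypes[s][j] for j in keep_snps]
        if calls and sum(c is not None for c in calls) / len(calls) >= sample_min:
            kept[s] = calls
    return keep_snps, kept


# Toy data: genotypes coded as minor-allele counts, None = no call.
toy = {
    "s1": [0, 1, None],
    "s2": [1, None, None],
    "s3": [2, 1, None],
    "s4": [0, 0, 1],
}
# Looser SNP threshold than in the text, only so the toy data show both stages.
snps, kept_samples = filter_by_call_rate(toy, snp_min=0.75, sample_min=0.90)
```

The order matters: sample call rates are computed only over SNPs that survive the first stage, so a sample is not penalized for missing calls at SNPs that were discarded anyway.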
Hardy–Weinberg equilibrium was tested for each of the SNPs using a Fisher’s exact test.
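An exact test of Hardy–Weinberg equilibrium for a biallelic SNP can be sketched as below. The text does not specify the implementation that was used, so this is only one common formulation, assumed here for illustration: conditional on the observed allele counts, the probability of every possible heterozygote count is computed, and the P value is the total probability of outcomes no more likely than the one observed. The function name `hwe_exact_p` and the example counts are hypothetical.

```python
import math


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg P value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb      # number of individuals
    n_a = 2 * n_aa + n_ab       # copies of allele A
    n_b = 2 * n_bb + n_ab       # copies of allele B

    def log_prob(het):
        # log P(het heterozygotes | n, n_a) under Hardy-Weinberg equilibrium
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1)
                - math.lgamma(hom_a + 1) - math.lgamma(het + 1) - math.lgamma(hom_b + 1)
                + het * math.log(2.0)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele count n_a.
    probs = {h: math.exp(log_prob(h))
             for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    total = sum(probs.values())  # equals 1 up to rounding; used to normalize
    observed = probs[n_ab]
    # Sum outcomes that are no more probable than the observed one.
    p = sum(v for v in probs.values() if v <= observed * (1 + 1e-9)) / total
    return min(1.0, p)


p_balanced = hwe_exact_p(25, 50, 25)   # genotype counts at HWE proportions
p_no_hets = hwe_exact_p(50, 0, 50)     # complete absence of heterozygotes
```

Working with `math.lgamma` keeps the factorials in log space, which avoids overflow at sample sizes in the thousands such as the one in this study.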
A three-step strategy was applied to perform the analyses (Figure 1). First, association between each SNP and lung cancer was tested by unconditional logistic regression in STATA 10 (StataCorp LP, College Station, TX). Both allelic (additive or per allele) and genotypic (each genotype is compared with the other two by creating dummy variables) genetic models were tested. Only those SNPs significant at a χ2 P < 0.05 in univariate analysis were moved forward to the second stage of analysis. Next, to overcome problems of collinearity between SNPs significant in univariate analysis, haplotype blocks were constructed using Haploview v.4.1 (16) to identify sets of SNPs in LD. Then, to filter out highly correlated SNPs, Tagger (17), a program implemented within Haploview, was used to select tagging SNPs from each block to represent SNPs that were in high LD with each other at an r2 threshold of ≥0.95. In this selection process, the SNP with the smallest P value (i.e. most significant SNP, rs370348) from the univariate analysis and two SNPs (rs401681 and rs31489) of the eight previously published SNPs that were not selected by Tagger were forced to be included so that all eight published SNPs were retained for analysis. This subset of SNPs was carried forward for further analysis by stepwise forward selection logistic regression to determine those SNPs that continued to show association (at P < 0.01) while adjusting for age (continuous), sex, pack-years smoked (continuous) and other SNPs in the model. Stepwise forward selection was repeated for subgroup analysis by sex, histological subtype and pack-years smoked.
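The additive (per-allele) model in the first step can be illustrated with a deliberately simplified sketch: a logistic regression of case status on minor-allele dosage (0, 1 or 2), fitted by Newton–Raphson in plain Python. Unlike the analysis described above, it includes no covariates (age, sex, pack-years) and does not reproduce STATA's output; the function name `fit_additive_logistic` and the toy data are assumptions made for the example.

```python
import math


def fit_additive_logistic(dosages, status, max_iter=25, tol=1e-10):
    """Fit logit P(case) = b0 + b1 * dosage by Newton-Raphson.

    dosages: minor-allele counts (0, 1 or 2); status: 1 = case, 0 = control.
    Returns the per-allele odds ratio, its Wald standard error (log scale)
    and a two-sided Wald P value.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p           # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w              # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Variance of b1 from the inverse information matrix
    # (evaluated at the last iterate, which is at convergence).
    se = math.sqrt(h00 / det)
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {"odds_ratio": math.exp(b1), "se_log_or": se, "p_value": p_value}


# Toy data: 100 cases and 100 controls with a 0/1 dosage only.
toy_dosage = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 80
toy_status = [1] * 100 + [0] * 100
result = fit_additive_logistic(toy_dosage, toy_status)
```

With a single binary predictor the fitted odds ratio coincides with the cross-product ratio of the 2 × 2 table, (30 × 80)/(70 × 20) here, which gives a quick sanity check on the fit.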
In view of the strong correlation between smoking and lung cancer risk, we also performed logistic regression analyses stratified by smoking status and tested for SNP–smoking interaction by the Likelihood ratio test (testing the model with and without the SNP–smoking interaction term included in the model). In addition, we analyzed the SNP–lung cancer association with and without smoking (in pack-years) included as a covariate and examined differences in effect size in the smoking-adjusted and unadjusted logistic regression analyses. Finally, we examined the association of each SNP with smoking (ever smoker–never-smoker as the outcome variable) in cancer free controls, while adjusting for age and sex.
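The likelihood ratio test for SNP–smoking interaction compares the maximized log-likelihoods of the models with and without the interaction term. A minimal sketch, assuming a single interaction term (1 degree of freedom) and using made-up log-likelihood values; the function name `lrt_p_value_1df` is hypothetical:

```python
import math


def lrt_p_value_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test with 1 degree of freedom.

    loglik_reduced: log-likelihood of the model without the interaction term.
    loglik_full: log-likelihood of the model with the interaction term.
    Returns (test statistic, P value).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Illustrative values only; not taken from the study.
stat, p = lrt_p_value_1df(-100.0, -98.0)
```

If the genotypic (two dummy variable) coding were used instead, the interaction would take 2 degrees of freedom and this closed form would no longer apply.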
To control for type I error due to multiple testing, instead of applying the Bonferroni correction, we applied a method proposed by Li and Ji (18) that controls the error rate for correlated tests. We applied this method in view of the large number of highly correlated SNPs in our data. The Li and Ji method is based on calculating the effective number of independent tests performed, M(eff), a quantity first proposed by Cheverud (19). Li and Ji further developed the method to provide a more accurate estimate of M(eff) and thereby control the experiment-wise significance level. Using this method, the effective number of independent marker loci was calculated as 99.49 and the experiment-wide significance threshold required to keep the type I error rate at 5% was 0.0005.
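The arithmetic behind the threshold can be sketched as follows. The sketch assumes the usual formulation of the Li and Ji estimate, in which each eigenvalue of the SNP correlation matrix contributes an indicator (1 if its absolute value is at least 1) plus its fractional part, and a Šidák-type correction for the per-test threshold; the eigenvalues shown are made up, and the function names are hypothetical.

```python
import math


def meff_li_ji(eigenvalues):
    """Effective number of independent tests from correlation-matrix eigenvalues."""
    total = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        # Indicator for a full independent test plus the fractional remainder.
        total += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return total


def sidak_threshold(alpha, meff):
    """Per-test significance threshold for an experiment-wide error rate alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / meff)


# Made-up eigenvalues, for illustration only.
toy_meff = meff_li_ji([2.5, 0.5, 0.0])

# Threshold implied by the M(eff) of 99.49 reported in the text.
threshold = sidak_threshold(0.05, 99.49)
```

With M(eff) = 99.49 this evaluates to roughly 0.0005, in line with the experiment-wide threshold reported above; a plain Bonferroni division, 0.05/99.49, rounds to the same value.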
We performed haplotype analysis for the SNPs that were significantly associated with lung cancer risk in the stepwise forward selection logistic regression analysis.
Imputation was performed to increase coverage of SNPs in the 5p15.33 region. An additional 136 SNPs were imputed using HapMap 3 reference data (release #2, February 2009) plus 1000 Genomes reference data (March 2010 genotypes; June 2010 haplotypes) that had predicted imputation r2 values ≥0.9. After imputation, the most likely genotype was used for analyses.
We also evaluated whether or not rare variants in the region associated with lung cancer risk using the WHaIT: weighted haplotype and imputation-based tests program (20). In this analysis, we restricted the study to only include SNPs that had a minor allele frequency of ≤0.05 and then evaluated whether, in aggregate, there was evidence that SNPs showed an association with lung cancer risk, upweighting according to the rarity of alleles.
The analyses included 1681 lung cancer case subjects and 1235 unaffected controls. Although frequency matching was performed by age (±5 years), sex and smoking status, the matching was incomplete; the cases were on average 6 years older than the controls, the proportion of affected males was higher than the affected females and smokers were over-represented in the case group (Table I).
Of the 215 SNPs genotyped, 69 were statistically significant at P < 0.05 in univariate analysis in either an additive (per allele) or a genotypic model. Of these 69 SNPs, four SNPs (rs2735940, rs2853672, rs37002 and rs402710) were out of Hardy–Weinberg equilibrium at P < 0.05, but we did not exclude these from the analysis. All eight previously reported SNPs in the 5p15.33 region (3–8,11,21) (Supplementary Table 1 is available at Carcinogenesis Online) were significantly associated with lung cancer risk at P < 0.05 in our data at this stage of the analysis. To filter out highly correlated SNPs to overcome collinearity in the data, 35 tag SNPs were selected from haplotype blocks to represent the 69 significant SNPs using Tagger in Haploview. These 35 SNPs that were retained for further analysis included all eight published SNPs and the most significant SNP (rs370348) in the univariate analysis. The adjusted odds ratios, 95% confidence intervals and P values for the 35 selected SNPs are presented in Table II. The results of the association analysis for the 35 SNPs are presented in Figure 2, along with the observed correlation (r2) between the SNPs. The results of the eight previously reported SNPs in this region are indicated by square boxes in Figure 2A; all eight were significant at P < 0.05 in our data.
In the stepwise forward selection logistic regression analysis, two SNPs, rs370348 and rs4975538, were significantly associated with lung cancer (Table III). However, only rs370348 met the multiple testing corrected threshold for significance at P < 0.0005. Among smokers, rs4975615 (but not rs370348) and rs4975538 were significant, whereas among never-smokers, rs451360 was the only SNP significantly related to lung cancer risk (Table III, Figure 3). In stratified analysis by sex, rs370348 was significantly associated with lung cancer risk in men, reaching genome-wide significance (P = 4.9 × 10−8), but none of the SNPs reached statistical significance in women. In analysis by histological subtype, results for adenocarcinoma were consistent with results of the overall analyses. Detailed results by histological subtype are presented in Figure 3.
We also performed analyses to determine whether the genes identified in the 5p15.33 region are possibly associated with smoking and not lung cancer. In stratified analysis by smoking status (presented in Table II), the top SNPs in overall analyses continued to be the top SNPs in both smokers and non-smokers, although the P values were attenuated due to stratification. The exception was rs6889886, a SNP that was not significant in ever smokers but was statistically significant in never-smokers (P = 4.3 × 10−4). This SNP was also statistically significant for interaction with smoking (P = 0.03). However, this SNP was not independently associated with lung cancer in the forward selection logistic regression analysis overall or by smoking subgroups. Two other SNPs, rs7727912 and rs27065, were significant at P < 0.05 in the test for multiplicative interaction but the main effects for both these SNPs were only marginally significant. Positive tests for interactions may reflect the large number of tests and should be viewed with some skepticism. We also performed logistic regression analysis with and without adjusting for smoking (pack-years smoked) and found that the effect size did not change by >10% for any of the SNPs (Supplementary Table 2 is available at Carcinogenesis Online). Finally, in the logistic regression analysis of the SNP–smoking association in unaffected controls (with ever–never smoked as the outcome), none of the SNPs was statistically significant (P > 0.05; results not shown). These results indicate that the association detected for the SNPs in the 5p15.33 region is with lung cancer risk and not with smoking.
In the haplotype analysis (results not presented), none of the haplotypes was more significant than the most significant underlying SNP tests. Hence, haplotype analysis did not identify a haplotype that fit the data better than individual SNP studies did. The omnibus test was also not better than the individual SNP tests.
Other than rs370348 and rs4975615 (r2 = 0.78), there was low correlation between the other significant SNPs (Figure 3). LD mapping with the eight previously reported SNPs showed that two of the significant SNPs identified, rs4975538 (maximum r2 = 0.27 with rs2736100) and rs451360 (maximum r2 = 0.58 with rs4635969) were largely independent of the other SNPs (Figure 3).
Results of the imputed SNPs are presented in Supplementary Figure 1 (available at Carcinogenesis Online), along with the results of the genotyped SNPs in this region. None of the imputed SNPs was more significant than the genotyped SNPs, so the imputed SNPs were not analyzed further.
Analysis of rare variants was not contributory and the P value derived from the rare variant analysis program WHaIT (20) did not identify an excess of rare variants in the case or control populations (P = 0.52 for ever smokers and P = 0.8 for never-smokers).
In this fine mapping analysis of genetic variants in the 5p15.33 region, we identified four novel SNPs associated with lung cancer risk, one of which was specific to smokers and one to non-smokers. None of the SNPs was in a protein-coding region or was a promoter or splice site variant; rs4975538 is an intronic SNP in TERT, rs451360 and rs370348 are intronic SNPs in CLPTM1L and rs4975615 is in the intergenic region between the two genes. Although none of these SNPs is in a putative functional region, our findings confirm that the TERT–CLPTM1L region is related to lung cancer risk. Interestingly, there are very few common variants in the exonic protein-coding regions in TERT or CLPTM1L; other than rs2736098 (corresponding to A305A), all are rare variants with a <5% minor allele frequency.
The GWAS by McKay et al. (5) first suggested two genes, TERT and CLPTM1L, in the 5p15.33 region that could have a role in lung cancer susceptibility. Their GWAS identified four SNPs in this region, rs402710, rs2736100, rs401681 and rs31489, of which the first two SNPs were replicated in an independent sample of cases and controls. The studies that followed confirmed the association of these and other SNPs in the TERT–CLPTM1L region (Supplementary Table 1 is available at Carcinogenesis Online). Furthermore, lung cancer risk for SNPs in the TERT–CLPTM1L region was also reported by other authors in specific subgroups, including never-smokers (8,11), people with adenocarcinoma (4), women (11,21), African Americans (21) and Asians (7,11,22). SNPs in the TERT–CLPTM1L region that reached genome-wide significance in these studies are listed in Supplementary Table 1 (available at Carcinogenesis Online). In particular, rs2736100 was found to be significant across different studies and subgroups, which suggests that this SNP or another SNP in LD with it is likely to be causally related to lung cancer risk. In comparison, although all eight previously reported SNPs were nominally significant at P < 0.05 in our study (Table II and Figure 2A), rs2736100 was not one of the most significant SNPs (P = 0.009).
There is evidence that the 5p15.33 region may be important in susceptibility to other cancers as well. Rafnar et al. examined rs401681 and rs2736098, two SNPs in the 5p15.33 region for their association with risk for many different types of cancer and found a significant association for rs401681 with several cancers including basal cell carcinoma, lung, bladder, prostate and cervical cancers (6). Another study confirmed the association of variants in this region with bladder cancer (23) and associations were also determined for rs401681 with squamous cell carcinoma of the head and neck (24). Furthermore, in a GWAS for pancreatic cancer, rs401681 was identified as one of the susceptibility loci (25). Interestingly, rs401681 was the second most significant SNP in our study (P = 1.1 × 10−5). Rafnar et al. also examined the association between rs401681 and rs2736098 and telomere length in DNA from whole blood as telomere shortening is a possible mechanism of carcinogenesis related to the 5p15.33 region. Their results suggested that the variants may lead to a gradual shortening over time, although this effect was only apparent in older women (6). However, these results were not confirmed by Pooley et al., who found that rs401681, which is located in intron 13 of CLPTM1L was not associated with mean telomere length (26).
Both TERT and CLPTM1L at the 5p15.33 susceptibility locus are attractive candidate genes for lung cancer as they have both been plausibly linked with carcinogenesis. TERT encodes the catalytic subunit of telomerase, an enzyme that maintains telomere ends by adding the telomere repeat TTAGGG. It has been shown that telomerase expression is high in progenitor and cancer cells and absent or low in normal somatic cells (27). Telomere length is linked to aging and anti-apoptosis, and mouse studies have shown that dysregulation of telomerase expression may be involved in oncogenesis (28). Similarly, CLPTM1L encodes an enzyme—cleft lip and palate transmembrane 1-like that is upregulated in cisplatin-resistant cell lines and may be associated with apoptosis (10). Furthermore, the risk allele of rs402710 within the CLPTM1L gene has also been found to be associated with a higher accumulation of DNA damage measured by bulky aromatic/hydrophobic DNA adducts, which may be an early step in lung carcinogenesis (13).
One of the lingering questions about the GWAS hits identified for lung cancer is whether genes identified are associated with smoking and not lung cancer. We tested for gene–smoking interaction to see if smoking modified the SNP–lung cancer association. However, other than for rs6889886, our results did not show that smoking modified the SNP–lung cancer association for any of the SNPs. Even for rs6889886, which is an intergenic SNP between CLPTM1L and SLC6A3 genes, evidence for an interaction with smoking could reflect type 1 error, given the number of tests performed. We also examined smoking (as pack-years smoked) as a confounder of the SNP–lung cancer association and did not find a significant change in the effect sizes when we compared the results of the unadjusted and adjusted analyses. Finally, we examined the SNP–smoking association in the unaffected controls and found that none of the SNPs was associated with smoking. Our findings clearly suggest that the SNPs in the 5p15.33 region are strongly associated with lung cancer and not smoking.
In summary, we used a fine mapping approach to evaluate additional, possibly causal SNPs in the 5p15.33 GWAS-identified lung cancer susceptibility locus. We used multiple logistic regression according to haplotype blocks to identify independent variants associated with lung cancer risk. We identified rs370348 and rs4975538 as novel SNPs associated with lung cancer risk and two additional SNPs that may be susceptibility markers for lung cancer risk in smokers (rs4975615) and non-smokers (rs451360). Our results show that, after fine mapping of the 5p15.33 locus, which has repeatedly been identified as a strong susceptibility locus for lung cancer, there appear to be several distinct loci influencing disease risk. None of the SNPs we identified was an obvious functional SNP, that is, one in an exonic, splice site or promoter region. A limitation of this study was that only a subset of the SNPs identified in the SNP selection process could be incorporated on the SNP array. Future analyses using sequencing approaches may help to identify all causal variants in this region, and animal and cell models may be needed to establish mechanisms of cancer risk.
This research was supported in part by the National Institutes of Health through MD Anderson’s Cancer Center Support Grant (CA016672); research grants (CA55769, CA127219, R01 CA121197, 1P50 CA70907, U19 CA148127); Cancer Prevention & Research Institute of Texas grant (RP100443).
The authors gratefully acknowledge the contribution of Emily Lu for help with data imputation and analysis, Huifeng Zhang for genotyping and Stephanie Deming for scientific editing.
Conflict of Interest Statement: None declared.