This study provides further evidence that the genes underlying susceptibility to lung cancer may include genes relevant to susceptibility to COPD. This has been made possible by using cohorts of smokers matched for smoking exposure but quite different in their phenotypic response to that exposure. This phenotypic response has been defined in part by the presence or absence of COPD, itself a common sub-phenotype of lung cancer
[8],
[13],
[14], defined by a measurable biomarker (FEV1) with a strong genetic basis
[2],
[7]. By comparing chronic smokers with normal lung function against those with COPD and those with lung cancer (sub-phenotyped for COPD), the genetic associations identified to date can be better understood. Indeed, by re-examining the associations from recently reported lung cancer and COPD (FEV1) GWA studies, the results of the current study suggest that the genetic effects from these loci confer specific protective or susceptibility effects on COPD, lung cancer or both. Despite comparatively small sample sizes here, using this approach the authors have recently shown that the 15q25 (CHRNA 3/5) and 4q31 (HHIP/GYPA) loci might be relevant in both COPD and lung cancer
[26],
[28]. The results of this study suggest that the rs1052486 SNP on the 6p21 locus (BAT3) confers susceptibility to lung cancer in smokers with pre-existing COPD, and that the rs402710 SNP on the 5p15 locus (CRR9/TERT) and the rs2808630 SNP on the 1q23 locus (CRP) confer susceptibility to lung cancer in those with no pre-existing COPD. The rs1422795 SNP on the 5q33 locus (ADAM 19), previously associated with reduced FEV1 [22],
[23], might also confer susceptibility to both COPD and lung cancer. The rs7671167 SNP on the 4q22 locus (FAM13A), previously linked to reduced lung function and COPD
[23], is associated with both COPD and lung cancer. Larger studies will be needed to confirm these findings as the sample sizes here are small, particularly after sub-phenotyping the lung cancer cases for COPD. These results also suggest that the previously published risk algorithm
[27],
[32], which combines risk genotypes and clinical variables identified in a multivariate analysis, can segment smokers into moderate, high and very high risk of lung cancer. The authors conclude that when spirometry is used to sub-phenotype smokers, genes with effects on reduced lung function or COPD appear to be relevant in “susceptibility” to lung cancer. This provides further evidence to support existing epidemiological studies suggesting that COPD and lung cancer are related not only by smoking exposure
[24],
[30] but also by an overlapping genetic susceptibility to smoking
[26],
[28].
Epidemiological studies suggest COPD is an important sub-phenotype of lung cancer. The results of this study suggest genetic associations broadly define three disease groups: smokers primarily susceptible to COPD (G1), smokers susceptible to both COPD and lung cancer (G2), and smokers susceptible to lung cancer only (G3). More importantly, the epidemiological studies also show there is a fourth group of smokers, consisting of the majority of smokers (≈70%)
[4],
[5],
[12], who maintain normal or near normal lung function. This group has a “resistant” phenotype (G0) and either does not develop, or is at least risk of, COPD and lung cancer
[4],
[5],
[8],
[9],
[12]. This is likely to be due, in part, to an excess of protective genetic variants compared to susceptibility variants
[27],
[31]. Based on the results of this study, the G0 genes conferring protection from COPD and lung cancer include the rs7671167 SNP (FAM13A gene on the Chr 4q22 locus) and the rs1489759 and rs2202507 SNPs (HHIP and GYPA genes on the Chr 4q31 locus). The rs2070600 SNP (AGER on the Chr 6p21 locus), previously linked to reduced FEV1, appears to be a susceptibility variant for COPD but not lung cancer (G1). Both the rs16969968 SNP (CHRNA3/5 gene on the Chr 15q25 locus) and the rs1052486 SNP (BAT3 gene on the Chr 6p21 locus) appear to confer susceptibility to lung cancer, but the latter only in conjunction with COPD (G2). The rs402710 SNP (CRR9 (TERT) on the Chr 5p15 locus) appears to confer susceptibility to lung cancer in those with no pre-existing COPD, in keeping with other studies (G3)
[34],
[43],
[44]. These observations require validation in larger studies where SNP effects on histological subtypes might also be relevant to our findings
[1],
[43]. Several loci linked to lung function in the general population, such as the rs10516526, rs11168048 and rs11155242 SNPs (GSTCD on 4q24, HTR4 on 5q33, and GPR126 on 6q24, respectively)
[22],
[23] do not appear to be related to COPD in this study. However, given that the population study did not look specifically at smokers, it is possible that these loci are not relevant to the lung's response to tobacco smoke exposure. The authors conclude that the novel study design used here provides a viable approach with which to better understand the genetic epidemiology of lung cancer.
The pathologic link between COPD and lung cancer may stem in part from the overlapping inflammatory, apoptotic and matrix remodelling/repair processes
[45]–
[47] underlying COPD, and the development of squamous metaplasia, epithelial-mesenchymal transition (EMT) and DNA damage that underlies lung carcinogenesis
[28],
[45],
[48]–
[51]. In particular, there is growing evidence that suggests these smoking induced changes are orchestrated by the bronchial epithelium
[28],
[45],
[48]–
[51]; the HHIP, CHRNA 3/5 and FAM13A proteins are all known to be expressed on the bronchial epithelium (see below). Although several of the SNPs investigated in this study have been shown to have functional effects on gene expression or protein function, they may not themselves be the causal variants, but may instead represent the causal allele through linkage disequilibrium
[52]. We note that in many instances, the physical distance between these risk SNPs and the proposed candidate genes is large. Despite this, it remains possible that the investigated SNPs are themselves functional as (a) studies have shown that SNPs with regulatory effects on genes may be some distance away
[50], and (b) it has recently been recognised that common SNPs with consistent disease association signals, through “Synthetic associations”, may represent the biological effects of rare variants in nearby genes as much as 2 mega-bases apart
[53]. If such an effect were true, then there is potential for considerable overlap between the susceptibility genes for COPD and for lung cancer. The rs16969968 SNP (CHRNA 3/5 on 15q25) investigated in this study results in a non-synonymous amino-acid change in a highly conserved region of the second intra-cellular loop of the α5 subunit of the nicotinic acetylcholine receptor. This receptor is expressed on both bronchial epithelial cells and inflammatory cells, and is believed to moderate pulmonary inflammation
[54] and lung destruction
[34]. This receptor also binds both nitrosamines (known carcinogens in cigarette smoke
[55]) and nicotine linking it to lung cancer and nicotine addiction respectively
[56]. The rs1052486 SNP (BAT3 on 6p21) is a missense mutation (Ser619Pro) in the BAT3 gene and has been previously linked to lung cancer
[57]. BAT3 is a nuclear protein that influences apoptosis through its interaction with p53
[58], linking it to both COPD and lung cancer. The rs1489759 SNP (HHIP on 4q31) is 93 kb upstream of the HHIP gene and of unknown function. The HHIP protein is believed to be important in the bronchial epithelial response to smoking
[59] and epithelial repair processes in lung cancer
[60]. The HHIP protein has been linked with epithelial-mesenchymal transition, a pathological process that results from lung remodelling (with release of metalloproteinases and growth factors
[29],
[45],
[61]) and initiates lung carcinogenesis
[48]. The rs2202507 SNP (GYPA on 4q31) is of unknown function and downstream of the GYPA gene. The GYPA protein, found on erythrocytes, shows reduced expression in COPD and is indicative of oxidative stress
[62]. Whether the GYPA association with COPD and lung cancer reflects an independent effect or linkage effect with the HHIP locus (LD≈0.70) is still debated
[21]. The rs7671167 SNP (FAM13A on 4q22) is found in intron 4 of the FAM13A gene and has no known biological function
[43],
[63]. The FAM13A protein, expressed in respiratory cells, is thought to be involved in signal transduction with possible tumor suppressor activity
[63],
[64]. The rs1422795 SNP (ADAM 19 on 5q33) is a missense mutation (Ser284Gly) in the ADAM 19 gene. ADAM 19 is a transmembrane protein expressed in human lung and implicated in cell-matrix interactions
[65], pulmonary inflammation
[66] and lung cancer
[67]. The rs402710 SNP (CRR9 (TERT) on 5p15) is an intronic SNP of unknown function in the CRR9 gene and is associated with lung cancer in many studies
[1],
[17],
[18],
[34]. This SNP is 25 kb upstream from the TERT gene, which encodes the catalytic subunit of telomerase, a reverse transcriptase that affects telomere shortening, which has been implicated in both aging and lung cancer
[68]. The results of the current study suggest that the CRR9/TERT locus confers susceptibility to lung cancer in the absence of COPD. Such a finding is in accordance with those recently reported by Yang et al.
[34], who found that, after adjusting for the presence of COPD, only the rs402710 SNP (Chr5p15 locus) was associated with lung cancer while the effects of the other GWA associated SNPs were lost. The rs2808630 SNP (CRP on 1q23) is found in the 3′ flanking region of the CRP gene and has been associated with serum CRP levels (C allele with reduced CRP)
[69]. Elevated CRP levels have been shown in prospective studies to be associated with greater decline in lung function
[70] and elevated lung cancer risk after adjustment for smoking
[71]. In the current study, where all cohorts were matched for smoking exposure, the CC genotype (low CRP level) was less frequent in both COPD and lung cancer cases, although this only achieved significance in the lung cancer only sub-phenotype. The rs2070600 SNP (AGER on 6p21) is a missense mutation (Gly82Ser) of the AGER gene and has been shown to affect the inflammatory response in humans
[72]. AGER protein expression has been shown to be increased in the lungs of smokers with COPD
[73] whilst decreased in human lung cancer cell lines
[74]. We conclude that the SNP associations described here with COPD and/or lung cancer can be explained by plausible, but as yet unproven, biological functions. We also conclude that through sub-phenotyping for COPD, possible clues as to the independent and overlapping pathogenic processes underlying COPD and lung cancer can be better examined.
The use of healthy smokers as controls in this study represents a novel though possibly controversial approach
[31] to identifying the genetic basis of lung cancer. The authors contend that such an approach is classically used in pharmacogenetic studies where the disparate response to a standardised dose of drug provides a dynamic phenotype (high vs low metabolisers or responders vs nonresponders) from which to identify relevant genes
[75]. In the setting of lung cancer, smoking is the drug and FEV1 the biomarker of responsiveness. The latter is based on the epidemiological studies showing that FEV1 is the most important risk factor for lung cancer among smokers
[8],
[9],
[12],
[25],
[76] and has a bimodal distribution among chronic smokers
[10]–
[12]. The latter is very relevant as a bimodal distribution supports a genetic basis, as suggested by twin studies where heritability of FEV1 is estimated to be 40–77% compared to only 15–25% for lung cancer
[6],
[7]. From a genetic epidemiology perspective, a cohort of chronic smokers with the resistant or “non-responder” phenotype (normal or near normal FEV1) might provide an alternate control group to the non-random (and unscreened) smokers used in case-control studies to date
[17]–
[19]. Controls recruited from hospital clinics, or recruited as volunteers without spirometric screening, have a reported COPD prevalence of 30% or more
[33]. If the control group includes a high proportion of smokers with COPD, the effect of the COPD related genes on lung cancer susceptibility will be diluted or lost. This is also relevant as the proportion of COPD patients who eventually develop lung cancer may be as high as 25–30%
[8],
[77] and the frequencies of several disease-related SNPs are very similar between lung cancer and COPD groups (e.g. FAM13A, HHIP). This might explain why the lung cancer GWA studies to date failed to consistently identify the Chr4q31 (HHIP/GYPA) and Chr4q22 (FAM13A) loci as protective loci
[17]–
[19], and the Chr 5q33 (ADAM19) locus as a possible susceptibility locus. It would also explain why matching for COPD in the lung cancer cases and controls might identify only the Chr5p15 (CRR9/TERT) locus, which in the current study was associated with lung cancer in smokers with no underlying COPD
[34]. The authors propose that FEV1 be routinely measured in genetic epidemiology studies of lung cancer to better understand the role of “COPD genes” in lung cancer
[8],
[12]. Subtyping for emphysema using computerised tomography or reduced diffusion capacity would further refine the subphenotyping for COPD
[78].
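The dilution argument above can be illustrated with a short worked calculation. This is only a sketch: the carrier frequencies below are hypothetical values chosen for illustration and are not taken from this study, and the function names are our own. It assumes a risk genotype that is equally common in COPD and lung cancer cases, and compares the odds ratio obtained with spirometry-screened "resistant" controls against the odds ratio obtained when 30% of the controls have unrecognised COPD.

```python
def odds(p):
    """Convert a genotype frequency (0 < p < 1) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_cases, p_controls):
    """Odds ratio for carrying the risk genotype, cases versus controls."""
    return odds(p_cases) / odds(p_controls)


def mixed_control_frequency(p_resistant, p_copd, copd_fraction):
    """Genotype frequency in a control group that is a mixture of
    resistant smokers and smokers with unrecognised COPD."""
    return (1.0 - copd_fraction) * p_resistant + copd_fraction * p_copd


# Hypothetical risk-genotype frequencies (assumptions, not study data).
p_resistant = 0.30    # smokers with normal lung function
p_copd = 0.40         # smokers with COPD
p_lung_cancer = 0.40  # lung cancer cases, same frequency as COPD

# Controls screened by spirometry: all resistant smokers.
or_screened = odds_ratio(p_lung_cancer, p_resistant)

# Unscreened controls: 30% have COPD, as suggested for hospital controls.
p_mixed = mixed_control_frequency(p_resistant, p_copd, copd_fraction=0.30)
or_unscreened = odds_ratio(p_lung_cancer, p_mixed)

print(f"OR with screened controls:   {or_screened:.2f}")
print(f"OR with unscreened controls: {or_unscreened:.2f}")
```

Under these assumed frequencies the odds ratio moves towards 1 when the controls are unscreened, which is the sense in which a COPD-related genetic effect on lung cancer could be diluted or lost.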
It is possible that the specific associations reported in this study reflect, in part, small sample size and chance findings. This represents an important limitation of the current study, which requires replication in a larger study. It is also possible that the findings reflect true associations that have been better identified, despite small sample sizes, by more precise phenotyping of subjects. Minimising misclassification has been shown to improve the power of a study to identify true associations
[36]. The authors suggest that some important associations may be either missed
[18],
[19] or misassigned
[17]–
[19] in studies where the COPD status of smoking controls is unknown, especially using hospital based controls where the prevalence of COPD has been found to be as high as 30%
[33]. The latter would be analogous to searching for type 2 diabetes genes by comparing obese patients with type 2 diabetics, thereby missing the genetic effects contributing to obesity. If previous case-control studies used control groups where the prevalence of COPD is 25–30%, then relevant genetic effects may be obscured. This is well illustrated by several SNPs (e.g. HHIP, GYPA, CRR9 (TERT), ADAM19 and CHRNA 3/5) for which the frequencies of “risk genotypes” in COPD and lung cancer cases are very similar. In addition, matching for other confounding variables, in particular smoking dose exposure, may also help to detect relevant genetic associations which might otherwise be diluted by using unexposed people (non-smokers
[17]–
[19]). Matching for smoking is particularly important in these studies of smoking related disease as the penetrance of SNP effects, reflected in the odds ratio, is likely to be related to the degree and/or duration of smoking exposure. The effects of certain SNPs have been shown to be greater when investigated only in those with greater smoking exposure
[21],
[29]. This is the case in α1-antitrypsin deficiency, where people homozygous for the Z allele (low α1-antitrypsin level) are at risk of emphysema when they smoke, but much less so when they are non-smokers
[79]. Lastly, there remains the possibility that the SNP associations reported here result from gender, age or height differences between the groups compared. Although our sample sizes are modest, we think this is unlikely as the groups are comparable with respect to these variables, and we specifically examined this possibility and did not find any SNP effects confounded by these variables.
The authors have previously reported a lung cancer susceptibility model whereby genotype data is combined with non-genetic data
[27],
[32]. This model is based on the results of a multivariate analysis that includes the genotypes, scored according to whether they conferred a small protective (-1) or susceptibility (+1) effect
[27],
[32]. The clinical variables identified as independent predictors of lung cancer following multivariate analysis were age over 60 years, a family history of lung cancer and a previous diagnosis of COPD. In stepwise regression, family history of lung cancer is independently associated with lung cancer risk after inclusion of the SNP genotype data
[80] and likely reflects rare family-specific genetic effects not accounted for by the genotypes tested here. An example of such a genetic effect is represented by the RGS17 gene on Chr 6q24 implicated in familial lung cancer but not investigated here
[81]. Similarly, the prior diagnosis of COPD is independently associated with lung cancer risk and likely reflects the contribution of genetic susceptibility to COPD not otherwise accounted for by the SNPs in the panel. The SNP data provides an important and significant contribution to the overall score as “risk genotypes” are a risk variable present from birth and, unlike family history and diagnosis of COPD, not dependent on age or natural history of disease. This is very relevant to prevention as high risk SNP genotypes can be identified early in a person's smoking history, before irreversible malignant transformation has occurred. Although lung function data itself is also an important variable in defining the risk of lung cancer, it is usually not available for the majority of smokers, in whom spirometry is often not done until exertional breathlessness is severe and over 50% of lung function is irreversibly lost
[12]. For each subject in the control smoker and lung cancer cohorts, a lung cancer susceptibility score was derived according to these variables and their distributions compared
[27],
[32]. The distribution showed a bimodal separation suggesting utility as a screening test of risk
[27],
[32],
[82]. Using the same approach in the current study, with the susceptibility and protective genotypes derived from the GWA SNPs (9 SNP panel), the lung cancer susceptibility score was also bimodal and showed a limited utility in an ROC analysis (AUC = 0.69). This utility was increased when the 10 most informative SNPs from the previous study were added (N = 19 SNP model, AUC = 0.72, data not shown). This suggests that as new genetic variants are identified and added to the risk model, a greater utility based on ROC analysis might be achieved
[31],
[80]. This study provides further evidence that lung cancer results from the combined effects of several genetic variants
[83] with low penetrance
[84] from genes implicated in both COPD and lung cancer
[26]–
[28]. This study also highlights the limitations of the lung cancer GWA studies reported to date
[85] and the need to consider sub-phenotyping using spirometry-defined COPD to better understand the relative effects of genetic variants on lung cancer susceptibility
[26],
[28]. In conclusion, this study provides additional evidence that genes involved in the risk of COPD may also be relevant to the risk of lung cancer and that spirometry should be routinely used to identify COPD, an important sub-phenotype of lung cancer. This study also supports the potential of combining genotype data
[27],
[32] in an algorithmic fashion to identify smokers at greatest risk of lung cancer.
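The scoring approach described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions rather than the published algorithm: the text specifies only that genotypes are scored -1 (protective) or +1 (susceptibility) and that age over 60 years, family history of lung cancer and prior COPD are the clinical predictors. The weight given to each clinical variable (`clinical_weight`), the function names and the example subjects are all hypothetical. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
from itertools import product


def susceptibility_score(genotype_effects, age_over_60, family_history,
                         prior_copd, clinical_weight=1):
    """Sum per-SNP effects (-1 protective, 0 neutral, +1 susceptibility)
    and add a weight for each clinical variable that is present.
    The clinical weight of 1 is an assumption made for illustration."""
    for effect in genotype_effects:
        if effect not in (-1, 0, 1):
            raise ValueError("genotype effects must be -1, 0 or +1")
    genetic = sum(genotype_effects)
    clinical = clinical_weight * (
        int(age_over_60) + int(family_history) + int(prior_copd)
    )
    return genetic + clinical


def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and
    controls (equivalent to the Mann-Whitney U statistic, rescaled)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical subjects (not study data): three SNPs per person.
cases = [
    susceptibility_score([1, 1, -1], True, False, True),
    susceptibility_score([1, 0, 0], True, True, False),
]
controls = [
    susceptibility_score([-1, -1, 0], False, False, False),
    susceptibility_score([1, -1, 0], True, False, False),
]

print("case scores:", cases)
print("control scores:", controls)
print("AUC:", roc_auc(cases, controls))
```

Adding further SNPs to such a model simply lengthens `genotype_effects`; whether the AUC improves, as suggested for the 19 SNP model, depends on the data.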