Using a candidate gene approach in a two-stage selection process, a panel of protective and susceptibility SNPs was identified, each of which individually confers only a small effect on the risk of lung cancer (OR ranging from 0.3 to 2.6). This is very much in keeping with the experience from case-control association studies to date. Consistent with existing risk models, relevant factors (in this study including SNP data) were combined using an algorithm to derive a susceptibility score on a simple linear scale. This study design, and the algorithmic approach that underlies this lung cancer susceptibility score, are comparable to those of a recent study in prostate cancer. Moreover, the approach takes into account important epidemiological observations relevant to genetic predisposition to lung cancer. First, although smoking exposure is essentially a prerequisite for developing lung cancer, increasing age and poor lung function have important independent effects on lung cancer susceptibility. Second, the genetic factors underlying lung cancer risk are likely to be both polygenic and heterogeneous, conferred by a variable combination of genetic variants (i.e. SNPs with low penetrance and small effect sizes). Third, genetic factors may confer either a protective
or susceptibility phenotype for lung cancer. Fourth, the potential confounding effect of COPD has been accounted for in the model. Here we report a 20-SNP panel which, combined with family history, defines risk (OR) across quintiles ranging from 1 to 10, with an AUC of 0.70. A risk tool with greater clinical utility can be derived by including age to identify those at greatest susceptibility to lung cancer (OR ranging 1–19 and AUC
This study sought to minimize false positive results in several ways. The most important was internal validation of the SNP associations using a two-stage design: an initial discovery cohort (run 1) identified SNPs of potential interest, and only these SNPs were then tested in a second (validation) cohort of cases and controls (run 2). SNPs were selected on the basis of replication, using univariate analysis of the two runs independently. Second, population stratification was excluded. Third, genotyping error was minimized through Hardy-Weinberg equilibrium (HWE) analysis and by excluding SNPs with a call rate below 95% (genotyping failure is invariably genotype specific and can therefore generate false positive associations). With respect to possible confounding, in a sensitivity analysis in which lung cancer cases and healthy smoking controls were matched for smoking exposure (pack years), age, gender and presence of COPD, the performance of the lung cancer score was not reduced.
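The genotyping quality filters described above can be illustrated with a minimal sketch. It assumes a biallelic SNP coded as "AA", "AB" or "BB" (with None for a failed call) and a 1-degree-of-freedom chi-square test for HWE. The function names, the genotype coding and the HWE significance threshold of 0.05 are illustrative assumptions; only the 95% call-rate threshold comes from the text, and the study's actual software and test are not specified here.

```python
from collections import Counter
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


def call_rate(genotypes):
    """Fraction of samples with a successful genotype call (None = failed)."""
    if not genotypes:
        raise ValueError("empty genotype list")
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_qc(genotypes, min_call_rate=0.95, hwe_alpha=0.05):
    """True if the SNP meets the call-rate threshold and shows no HWE deviation.

    min_call_rate follows the 95% threshold in the text; hwe_alpha is an
    assumed, commonly used value, not one reported by the study.
    """
    if call_rate(genotypes) < min_call_rate:
        return False
    counts = Counter(g for g in genotypes if g is not None)
    _, p_value = hwe_chi_square(counts["AA"], counts["AB"], counts["BB"])
    return p_value >= hwe_alpha
```

In practice the HWE test is usually applied to the control group, since a true disease association can itself distort genotype proportions among cases.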
Weaknesses of this study include the modest size of the cohorts, the borderline significance of some SNPs in the absence of correction for multiple comparisons, the cross-sectional design, and recruitment limited to Caucasians with a minimum smoking history of 15 pack years. Furthermore, we chose to recruit smokers with essentially normal lung function as controls, both to improve power and to best represent those least susceptible to the adverse effects of smoking (COPD and lung cancer) while being most representative of smokers in general, who maintain normal lung function. For this reason, COPD was not included in the model, although it is an important risk factor and added to the score's utility in a post-hoc analysis. A further limitation of the study is that, although the cases and controls were arguably representative, not all variables were precisely matched in the initial analysis (e.g. age, gender and smoking patterns). It should be noted that although precise matching of all demographic variables reduces the potential for confounding, it can also obscure important effects of variables in a risk model. Although only 14 of the 20 SNPs reached traditional levels of significance in the combined cohorts, and the remaining six SNPs contributed only modestly to the model, this was a two-stage design in which replication of associations (in this and other studies) and biological plausibility were the basis of SNP selection. Further studies are needed to validate this SNP panel and risk model in unselected populations.
In this study a candidate gene (i.e. hypothesis-driven) approach was used to identify potentially functional SNPs associated with the development of both COPD and lung cancer. Although the SNPs identified in this study may only reflect linkage disequilibrium with nearby functional variants, they are likely to have functional effects and to be directly involved in susceptibility to lung cancer. The 20-SNP panel consists of variants in genes that encode proteins underlying important pathways implicated in lung carcinogenesis, specifically: metabolism of smoking-derived carcinogens (N-Acetyl Transferase 2 and Cytochrome P450 2E1), inflammatory cytokines (Interleukins 1, 8 and 18, Tumor necrosis factor alpha1 receptor, Toll-like receptor 9), smoking addiction (dopamine D2 receptor and dopamine transporter 1), anti-oxidant response to smoking (α1 anti-chymotrypsin and extracellular superoxide dismutase), cell cycle control, DNA repair and apoptosis (Xeroderma Pigmentosum complementation group D, p73, Bcl-2, FasL, Cerb1 and REV1), and integrins implicated in apoptosis. One of the SNPs (α5 nAChR) has recently been associated with both lung cancer and COPD in candidate gene and genome-wide association studies. This receptor appears to be directly related to nicotine effects on airway inflammation. As can be seen, the SNP panel (Table III) is made up of a variety of SNPs from genes implicated in many inter-related pathways. Twelve of these SNPs have been associated with lung cancer in other cohorts. It is likely that other SNPs from as yet unidentified genes will be identified in the future. To assess further the utility of the lung cancer susceptibility score, a prospective study is in progress. To date, the lung cancer cases in that study (n = 43) have the same mean score and score distribution as the lung cancer cases reported here (unpublished data). Further case-control and functional studies will be needed to explore the role of these SNPs in lung cancer susceptibility.
The authors propose that the clinical utility of genotype data requires that many SNPs be analyzed and their effects combined with other relevant epidemiological factors. The algorithmic approach used in this study assumes a simple additive model comparable to that recently published in prostate cancer and involves minimal assumptions (it is not hierarchical or path-analysis based). The patient's score can be compared in a simple linear fashion with the scores of smokers least susceptible to lung cancer (lowest quintiles). Such an approach is comparable to the risk tools developed by others. The potential clinical utility of the lung cancer susceptibility score was assessed by receiver operating characteristic (ROC) curve analysis. This showed a c statistic of 0.77 and, at a cut-off of ≥3, an estimated sensitivity of 89% with a corresponding specificity of 45%. These findings are comparable to the ROC performance of the Framingham score (c statistic of 0.74). The c statistic for the 20-SNP panel on its own was 0.68 (and 0.70 when combined with family history), indicating its utility in the current cohort. There is evidence, although limited, that genetic testing may positively alter the behavior of smokers, either in the context of smoking cessation (increasing intent and possibly improving quit rate) or by lowering smoking prevalence. Although further validation studies are required, this study suggests that genetic data may be combined with other risk variables from smokers or ex-smokers to identify individuals most susceptible to developing lung cancer. Further studies are planned in larger cohorts of unselected cases and controls.
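To make the additive scoring and ROC evaluation concrete, the following is a minimal sketch under stated assumptions. It assigns +1 for each susceptibility genotype and -1 for each protective genotype, plus optional points for non-genetic factors. These unit weights, the function names and the toy scores in the usage example are hypothetical and do not reproduce the study's algorithm or data. The c statistic is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half, which is equivalent to the area under the ROC curve.

```python
def susceptibility_score(n_susceptibility, n_protective, clinical_points=0):
    """Simple additive score: susceptibility genotypes add, protective subtract.

    Unit weights and clinical_points are illustrative assumptions.
    """
    return n_susceptibility - n_protective + clinical_points


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def c_statistic(case_scores, control_scores):
    """Area under the ROC curve via pairwise case-control comparison."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5  # ties count as half
    return concordant / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Invented toy scores, for illustration only.
    cases = [5, 4, 3, 2]
    controls = [3, 2, 1, 0]
    sens, spec = sensitivity_specificity(cases, controls, cutoff=3)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
    print(f"c statistic={c_statistic(cases, controls):.3f}")
```

Lowering the cut-off trades specificity for sensitivity, which is why a screening-oriented threshold such as ≥3 can pair high sensitivity with modest specificity.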