Mol Carcinog. Author manuscript; available in PMC 2013 October 1.
Published in final edited form as:
PMCID: PMC3289753

Genetic variability in DNA repair and cell cycle control pathway genes and risk of smoking-related lung cancer


DNA repair and cell cycle control play an important role in the repair of DNA damage caused by cigarette smoking. Given this role, functionally relevant single nucleotide polymorphisms (SNPs) in genes in these pathways may well affect the risk of smoking-related lung cancer. We examined the relationship between 240 SNPs in DNA repair and cell cycle control pathway genes and lung cancer risk in a case-control study of white current and ex-cigarette smokers (722 cases and 929 controls). Additive, dominant and recessive genetic models were evaluated for each SNP. A genetic risk summary score was also constructed. Odds ratios (OR) for lung cancer risk and 95% confidence intervals (95% CI) were estimated using logistic regression models. Thirty-eight SNPs were associated with lung cancer risk in our study population at P<0.05. The strongest associations were observed for rs2074508 in GTF2H4 (P=0.003, additive model), rs10500298 in LIG1 (P=2.7×10⁻⁴, recessive model), rs747658 and rs3219073 in PARP1 (rs747658: P=5.8×10⁻⁵; rs3219073: P=4.6×10⁻⁵; additive model for both), and rs1799782 and rs3213255 in XRCC1 (rs1799782: P=0.006, dominant model; rs3213255: P=0.004, recessive model). Compared to individuals with first quartile (lowest) risk summary scores, individuals with third and fourth quartile summary score results were at increased risk for lung cancer (OR: 2.21, 95% CI: 1.66–2.95 and OR: 3.44, 95% CI: 2.58–4.59, respectively; P for trend <0.0001). Our data suggest that variation in DNA repair and cell cycle control pathway genes is associated with smoking-related lung cancer risk. Additionally, combining genotype information for SNPs in these pathways may assist in classifying current and ex-cigarette smokers according to lung cancer risk.

Keywords: SNP, case-control, lung cancer


Lung cancer causes approximately 160,000 deaths annually in the United States [1]. Although cigarette smoking can explain over 80% of this risk [2], lung cancer also occurs in never smokers [3] and fewer than one of five smokers develop lung cancer [4]. Plausible explanations for variable risk in smokers, occurrence in never smokers and aggregation in families include inherited differences in genes intimately involved in carcinogenesis by way of carcinogen metabolism, DNA repair, cell cycle control, apoptosis or inflammation [5].

Approximately 160 human genes mediate DNA repair through processes that include nucleotide excision repair (NER), base excision repair (BER), mismatch repair, homologous recombination, and non-homologous end-joining, the latter two pathways responsible for repairing double strand DNA breaks [6]. The NER pathway repairs DNA damage caused by the tobacco-related carcinogen benzo(a)pyrene, while the BER pathway repairs DNA damage caused by reactive oxygen species produced by cigarette smoke [7]. Given these roles, functionally relevant single nucleotide polymorphisms (SNPs) in genes in these pathways may well affect the risk of smoking-related lung cancer. In addition, cell cycle control proteins, including p53, p21, cyclin D, and cyclin E, mediate DNA damage responses that lead either to apoptosis or cell cycle arrest [8]. Therefore, variation in relevant cell cycle control pathway genes could magnify or attenuate cumulative effects from deficiencies in DNA repair.

Early studies typically evaluated associations with lung cancer risk for a limited number of SNPs in selected NER genes, mainly ERCC1, ERCC2, ERCC5, XPA and XPC [9,10], in selected BER genes, mainly APEX1, OGG1 and XRCC1 [9], or in selected cell cycle control genes TP53 (p53) [9,11], CDKN1A (p21) [11], and CCND1 (cyclin D1) [11,12]. Unfortunately, this approach has not yet identified a single gene variant useful for lung cancer risk stratification. This result motivates the search for stronger and more useful patterns of variation that involve multiple candidate genes representing more than one cancer-relevant biological pathway. Therefore, we used a custom-designed 384-SNP microarray to examine the relationship between common variation in NER, BER, and cell cycle control pathway genes and risk of smoking-related lung cancer in a large case-control study of white current and ex-cigarette smokers (722 cases and 929 controls).

Materials and Methods

Study population


The case group includes patients enrolled between 1990 and 2008 within one year of a thoracic surgery procedure for lung cancer diagnosis, staging or treatment at a University of Pittsburgh Medical Center hospital. Additional eligibility criteria for the current study included: 1) non-missing sex, 2) current or ex-cigarette smoker, 3) ≥10 pack-year cumulative cigarette dose exposure, 4) 45–85 years old at time of lung cancer diagnosis, 5) pathologically verified lung cancer diagnosis (excluding carcinoid), and 6) DNA sample available for genotyping. Cases previously enrolled in a computed tomography (CT) lung cancer screening study (Pittsburgh Lung Screening Study, PLuSS) were excluded. In total, 837 cases fulfilled these criteria.


Controls for this study were selected from PLuSS. Between 2002 and 2005, PLuSS enrolled 50–79 year-old current and ex-cigarette smokers of at least one-half pack/day for at least 25 years and excluded individuals who 1) quit smoking more than 10 years earlier, 2) reported a history of lung cancer, or 3) reported chest CT within one year of enrollment. For the current study, we randomly selected 1000 controls from among the 3463 white or black race CT-screened PLuSS participants with DNA available and no interval lung cancer diagnosis as of September 2008. All cases and controls provided written informed consent and were enrolled under University of Pittsburgh Institutional Review Board-approved protocols.

Gene and SNP selection

SNPs in key genes in the NER, BER and cell cycle control pathways were included on the 384-SNP microarray (see Supplemental Tables 1 and 2 for a listing of all included SNPs). The SNP selection strategy incorporated six features: 1) functional significance as determined by amino acid substitution; 2) potential for regulatory change; 3) alteration in protein stability; 4) evolutionary conservation across species; 5) disease association reported in the literature; and 6) tagSNP status in either Utah residents with Northern or Western European ancestry (HapMap CEU) or Yoruban Africans from Ibadan, Nigeria (HapMap YRI). We additionally included 10 ancestry informative markers (AIMs) (rs1426654, rs16891982, rs1871534, rs2814778, rs3827760, rs6003, rs722098, rs723632, rs7349 and rs930072), previously validated against self-reported race in a study population including Caucasian and African-American subjects.

DNA isolation and genotyping

Genomic DNA was isolated using Gentra Systems Inc. isolation kits and analyzed using an Illumina® GoldenGate custom-designed 384-SNP microarray. Analyses serially excluded: 1) genotypes assigned by Illumina® GenomeStudio 2010.1 with a GenTrain score <0.45 or a ClusterSep score <0.25, 2) individuals with genotyping call rates <90%, and 3) SNPs that failed in more than 2% of the samples. Forty cases and eight controls had call rates <90% and were excluded, leaving 797 cases and 992 controls. Seven SNPs failed completely and 21 SNPs failed in >2% of the samples; these SNPs were excluded. SNPs with minor allele frequency (MAF) <0.02 in the control group were excluded as well (N=62, mainly HapMap YRI tagSNPs), as were 19 SNPs that had been included on the 384-SNP microarray for a separate study and were located in genes not directly involved in DNA repair or cell cycle control (11 in ABCB1 and 8 in GSTP1). Finally, 25 SNPs that deviated from Hardy-Weinberg equilibrium (HWE; P<0.01) were also excluded from the analyses. Supplemental Table 1 lists all 134 SNPs that were excluded. Supplemental Table 2 shows HWE P-values and control group MAFs for the remaining 240 non-AIM SNPs. The 240 SNP set consists of 185 SNPs in 20 NER pathway genes (gene: number of SNPs – ERCC1: 5, ERCC2: 12, ERCC3: 10, ERCC4: 8, ERCC5: 15, ERCC6: 1, ERCC8: 9, GTF2H1: 10, GTF2H3: 6, GTF2H4: 6, GTF2H5: 2, LIG1: 21, MNAT1: 8, RAD23A: 1, RAD23B: 9, RPA1: 17, RPA2: 3, RPA3: 25, XPA: 6, XPC: 11), 29 SNPs in four BER pathway genes (MPG: 2, OGG1: 2, PARP1: 12, XRCC1: 13), and 26 SNPs in seven cell cycle control genes (CDKN2A: 5, CCND1: 5, CCNE1: 2, CCNH: 2, CDK7: 4, E2F1: 2, TP53/P53: 6).

Statistical analyses

Using data from the 10 AIMs for the 797 cases and 992 controls with ≥90% call rates and a two-cluster solution in Structure 2.3.1 [13], we inferred genetic ancestry using the estimated white genome fraction cutoff that correctly classified 929 (99.5%) of 934 and 58 (100%) of 58 eligible control subjects self-reporting white and black race, respectively. To avoid spurious candidate genotype-disease association due to population stratification [14], all subsequent data analyses used the 722 cases (677 self-reported white and 45 no reported race) and the 929 controls from the white genetic ancestry cluster.

Deviation from HWE was examined in the control population for each SNP using the χ² goodness-of-fit test. We also used χ² tests to evaluate case-control differences with respect to sex, age, enrollment year, and cumulative cigarette dose (pack-year) exposure, and to test independence, in the control group, between genotype and other factors (sex, age, enrollment year, and pack-year). We used logistic regression (SAS 9.2, SAS Institute, Cary, NC) to estimate crude and adjusted odds ratios (ORs) for lung cancer risk and corresponding 95% confidence intervals (95% CIs). Additive, dominant and recessive genetic models were evaluated for each SNP. Adjusted models contained parameters for sex, age, year enrolled, and pack-year. To screen for gene-environment interaction, we fitted logistic regression models containing terms for genotype group, exposure group (sex, age, year enrolled, or pack-year), and the genotype group by exposure group cross classification, and applied the log-likelihood ratio test (at P<0.05) to the cross classification terms. Case-case comparisons of 258 squamous cell carcinoma cases with 301 adenocarcinoma cases were conducted to evaluate heterogeneity in genetic risk factors for these two non-small cell lung cancer subsets.
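
As an illustration of the HWE check described above, the following minimal Python sketch computes the one-degree-of-freedom χ² goodness-of-fit statistic from control-group genotype counts. It is not the authors' code: the function name `hwe_chi_square` and the example counts are hypothetical, and the P-value relies on the identity between the upper tail of a χ² distribution with one degree of freedom and the complementary error function.

```python
import math

def hwe_chi_square(n_common_hom, n_het, n_minor_hom):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value). Function name and interface are illustrative.
    """
    n = n_common_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Allele frequencies estimated from the observed genotype counts.
    p = (2 * n_common_hom + n_het) / (2 * n)
    q = 1.0 - p
    observed = (n_common_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        # Monomorphic SNP: the test is undefined, so report no deviation.
        return 0.0, 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control-group counts (not study data).
chi2, p_value = hwe_chi_square(30, 40, 30)
exclude_snp = p_value < 0.01  # exclusion threshold stated in the text
```

With these made-up counts the SNP would be retained, because the P-value is above the 0.01 exclusion threshold.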

We used HapMap phase 1 & 2 data and Haploview 4.1 [15] to determine, for each gene, the fraction of all successfully genotyped (HWE P-value ≥0.01) common-variant (MAF≥0.05) SNPs in the HapMap-CEU population captured at r²≥0.8 by the subset of SNPs included in our panel and screened for association with lung cancer risk. The 240 SNP set captured all (100%) common-variant HapMap-CEU SNPs in five genes (E2F1, GTF2H1, LIG1, MPG, and TP53/P53), most (80–99%) common-variant HapMap-CEU SNPs in 12 genes (CCND1, CDKN2A, ERCC1, ERCC3, ERCC4, ERCC5, GTF2H3, GTF2H4, GTF2H5, MNAT1, PARP1, and RPA2), many (60–79%) common-variant HapMap-CEU SNPs in nine genes (CCNH, CDK7, ERCC2, OGG1, RPA1, RPA3, XPA, XPC, and XRCC1), and comparatively few (<60%) common-variant HapMap-CEU SNPs in five genes (CCNE1, ERCC6, ERCC8, RAD23A, and RAD23B).
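
The tagging criterion above rests on the pairwise r² statistic. The sketch below shows one way to compute it, assuming phased haplotype frequencies are available for two biallelic SNPs. Haploview performs this calculation internally; the function names and example frequencies here are hypothetical and serve only to illustrate the r²≥0.8 rule.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared allelic correlation between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    for freq in (p_a, p_b):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

def is_captured(p_ab, p_a, p_b, threshold=0.8):
    """True if a genotyped SNP tags the other SNP at the given r^2 threshold."""
    return r_squared(p_ab, p_a, p_b) >= threshold

# Hypothetical frequencies (not HapMap data).
perfect_ld = r_squared(0.30, 0.30, 0.30)   # alleles always travel together
weak_ld = r_squared(0.15, 0.30, 0.30)      # partial association only
```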

A compound statistic [16,17] was used to rank SNPs according to statistical association with lung cancer risk. That is, SNPs were ranked according to the minimum, unadjusted P-value obtained under the additive, dominant, and recessive genetic models (min 3p). Results from the additive and recessive models were ignored in sample sets containing fewer than 10 minor allele homozygotes.
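
The ranking rule above can be sketched in a few lines of Python. The sketch assumes the three per-model P-values have already been obtained (for example from logistic regression) and only illustrates the genotype coding and the min 3p selection, including the rule that additive and recessive results are ignored when fewer than 10 minor allele homozygotes are present. All function names, SNP labels, and P-values are hypothetical.

```python
def code_genotype(minor_allele_count, model):
    """Numeric coding of a genotype (0, 1 or 2 minor alleles) under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return minor_allele_count
    if model == "dominant":
        return 1 if minor_allele_count >= 1 else 0
    if model == "recessive":
        return 1 if minor_allele_count == 2 else 0
    raise ValueError("unknown model: " + model)

def min3p(p_by_model, n_minor_homozygotes, min_homozygotes=10):
    """Return (best_model, smallest P-value) among the usable genetic models."""
    usable = dict(p_by_model)
    if n_minor_homozygotes < min_homozygotes:
        # Too few minor allele homozygotes: keep the dominant model only.
        usable = {"dominant": p_by_model["dominant"]}
    best_model = min(usable, key=usable.get)
    return best_model, usable[best_model]

def rank_snps(results):
    """Sort SNPs by min 3p; results maps SNP -> (p_by_model, n_minor_homozygotes)."""
    scored = [(snp,) + min3p(p, n) for snp, (p, n) in results.items()]
    return sorted(scored, key=lambda row: row[2])

# Hypothetical inputs (not study results).
example = {
    "snpA": ({"additive": 0.04, "dominant": 0.20, "recessive": 0.01}, 35),
    "snpB": ({"additive": 0.002, "dominant": 0.03, "recessive": 0.0005}, 4),
    "snpC": ({"additive": 0.50, "dominant": 0.60, "recessive": 0.70}, 50),
}
ranking = rank_snps(example)
```

In this made-up example, snpB is ranked on its dominant-model P-value alone because it has only four minor allele homozygotes.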

Whole gene haplotype analyses used HPlus 3.2 (Fred Hutchinson Cancer Research Center, WA) to identify lung cancer risk-associated haplotypes, taking the most frequent haplotype as referent and ignoring haplotypes with a control group frequency <0.01. Using selected lung cancer risk-associated SNPs in low linkage disequilibrium, we additionally summarized genetic lung cancer risk by summing genotypes coded as shown in Table 4, footnote 2.

Table 4
Lung cancer case (N=722 total) and control (N=929 total) subjects distributed according to genotype risk summary score.


Characteristics of the study population are presented in Table 1. Compared with the control group (N=929), the lung cancer case group (N=722) included more men, more subjects in the two oldest age categories, and more subjects in the highest pack-year category.

Table 1
Characteristics of lung cancer case and control subjects

Thirty-eight SNPs (15.8% of 240) were associated with risk of lung cancer at P<0.05 (unadjusted). Table 2 shows genotype distribution, best genetic model (i.e., model with smallest P-value), rank based on min 3p, unadjusted and adjusted ORs and 95% CIs and P-values for these SNPs. Eleven SNPs had unadjusted P-values <0.01, and P-values remained <0.01 after adjustment for sex, age, year enrolled, and pack-years for six of these SNPs (Table 2). This latter set includes: rs2074508 in GTF2H4 (OR: 0.73, 95% CI: 0.59–0.90, P=0.003, additive model), rs10500298 in LIG1 (OR: 1.73, 95% CI: 1.29–2.32, P=2.7×10⁻⁴, recessive model), rs747658 and rs3219073 in PARP1 (two SNPs in high linkage disequilibrium, r²≥0.8; rs747658: OR: 0.62, 95% CI: 0.49–0.78, P=5.8×10⁻⁵; rs3219073: OR: 0.61, 95% CI: 0.48–0.77, P=4.6×10⁻⁵; additive model for both), and rs1799782 (Arg194Trp) and rs3213255 in XRCC1 (rs1799782: OR: 0.53, 95% CI: 0.34–0.84, P=0.006, dominant model; rs3213255: OR: 1.52, 95% CI: 1.14–2.01, P=0.004, recessive model). Similar results were observed when the case group was limited to non-small cell lung cancer only (data not shown).

Table 2
SNPs statistically (unadjusted P<0.05) associated with lung cancer risk, by pathway and gene.

Evaluation of gene-environment interactions was limited to the 38 SNPs associated with risk of lung cancer at P<0.05 (Table 2). Being homozygous for the rare allele of SNP rs7783714 in RPA3 was associated with a reduced risk of lung cancer among women (OR: 0.41, 95% CI: 0.23–0.74), but not among men (OR: 0.91, 95% CI: 0.55–1.49) (Pinteraction=0.04). Otherwise, analyses stratified by sex, age, enrollment year, or pack-year did not identify meaningful instances of SNP-lung cancer risk associations modified by co-factor level (data not shown).

Case-case comparisons conducted to evaluate heterogeneity in genetic risk factors for the two largest lung cancer subsets (squamous cell carcinomas and adenocarcinomas) were limited to the same 38 SNP set. PARP1 rs2048424 minor allele homozygotes were more common among adenocarcinoma cases than among squamous cell carcinoma cases (12.0% vs. 7.0%, P=0.049); among controls, 14.1% were homozygous for the rs2048424 minor allele. LIG1 rs10500298 minor allele homozygotes were also more common among adenocarcinoma cases than among squamous cell carcinoma cases (28.0% vs. 19.9%, P=0.027); among controls, 13.9% were homozygous for the rs10500298 minor allele. Frequencies for the other SNPs did not differ significantly between the two tumor subsets (data not shown).

Compared to the most common haplotype, rarer haplotypes in five different genes (ERCC2, GTF2H4, PARP1, XRCC1 and CCND1) were associated with a decreased risk of lung cancer and in one gene (GTF2H1) with an increased risk of lung cancer (P<0.01; Table 3). Results were qualitatively similar after adjustment for sex, age, enrollment year, and pack-year (Table 3). As indicated in Table 3, the haplotypes associated with decreased lung cancer risk were as follows: the rare ERCC2 haplotype contained the rs238405 common allele; the GTF2H4 haplotype contained the rs2074508 minor allele; the PARP1 haplotypes contained the rs8679 common allele, and the rarest PARP1 haplotype additionally contained the minor alleles from rs747658, rs3219073, rs3219090, rs2048424 and rs2666428; one of the XRCC1 haplotypes contained the rs1799782 minor allele and the common alleles from rs762507 and rs3213255, and the other XRCC1 haplotype contained the rs1799782 common allele and the rs3213266 minor allele; and the CCND1 haplotype contained the rs603965 common allele and the minor alleles from rs3918298 and rs678653. The rare GTF2H1 haplotype associated with increased risk of lung cancer contained the rs4150667 minor allele (Table 3).

Table 3
Haplotypes associated with risk of lung cancer at P<0.01. Underlines identify loci associated with lung cancer risk in single SNP analyses (Table 2). SNP labels identify common (“0”) and minor (“1”) alleles.

We constructed a genotype risk summary score that used best genetic model results for 31 of the 38 SNPs that were statistically significantly associated with lung cancer risk at P<0.05 (see Table 2). The 7 SNPs that were excluded (RPA1 rs4281767 and rs7503173, RPA2 rs7356, RPA3 rs13227585, and PARP1 rs747658, rs2048424 and rs2666428) were in high linkage disequilibrium (r²≥0.80) with one or more of the 31 SNPs included in the summary score (data not shown). Summary scores were stratified according to quartile distribution in the control group. Individuals with third and fourth quartile summary score results were at increased risk for lung cancer compared to individuals with first quartile (lowest) summary scores (OR: 2.21, 95% CI: 1.66–2.95 and OR: 3.44, 95% CI: 2.58–4.59, respectively; P for trend <0.0001; Table 4). Adjustment for sex, age, enrollment year, and pack-year attenuated these associations minimally (Table 4). Summary score quartile predicted lung cancer with a 0.636 area under the receiver operating characteristic (ROC) curve.
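
For readers who wish to see how a quartile-versus-reference odds ratio of this kind is formed, the sketch below computes an unadjusted OR and Wald 95% CI from a 2×2 table. For an unadjusted comparison this cross-product estimate coincides with what a logistic regression containing a single indicator term would return. The counts are hypothetical, since the Table 4 cell counts are not reproduced in the text, and the function name is illustrative.

```python
import math

def odds_ratio_wald_ci(cases_exposed, controls_exposed, cases_ref, controls_ref, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 gives ~95%)."""
    cells = (cases_exposed, controls_exposed, cases_ref, controls_ref)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (cases_exposed * controls_ref) / (controls_exposed * cases_ref)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: a high-score quartile versus the lowest (reference) quartile.
or_q, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```

Adjusted estimates such as those reported in Table 4 would additionally require the covariates (sex, age, enrollment year, and pack-year) in a fitted regression model, which this sketch does not attempt.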


In this case-control study, we evaluated associations of 240 common SNPs in genes involved in NER, BER or cell cycle control with risk of smoking-related lung cancer. Thirty-eight SNPs, 24 in NER, 10 in BER and four in cell cycle control genes, were associated with lung cancer risk at P<0.05 in our study population. The strongest associations with risk of lung cancer were observed for rs2074508 in GTF2H4, rs10500298 in LIG1, rs747658 and rs3219073 in PARP1, and rs1799782 and rs3213255 in XRCC1. In addition, haplotypes in six different genes (ERCC2, GTF2H1, GTF2H4, PARP1, XRCC1 and CCND1) were associated with lung cancer risk.

GTF2H4 codes for the p52 subunit of NER transcription factor IIH (TFIIH). The TFIIH complex has ATPase and helicase activities and opens DNA at sites of DNA distorting damage. The p52 subunit may regulate the ATPase activity of the TFIIH subunit XPB, a protein coded by ERCC3 [18]. In our study, single SNP analyses found the minor allele of rs2074508 in GTF2H4 to be significantly associated with a reduced risk of lung cancer (additive model). The common GTF2H4 haplotype associated with reduced lung cancer risk captured this minor allele. Wang et al. [19] recently reported associations between cervical dysplasia and two SNPs near GTF2H4: rs2894054, located 3.7 kb upstream, and rs6926723, located 0.9 kb downstream of GTF2H4 (within valyl-tRNA synthetase, VARSL). To our knowledge, no other studies of GTF2H4 genetic variation and cancer risk have been published [19]. The GTF2H4 SNPs evaluated in our study captured 9 (90%) of the 10 HapMap phase 1 & 2 common-variant (MAF≥0.05) CEU SNPs and therefore provide a first comprehensive survey of common variation in GTF2H4 and lung cancer susceptibility.

DNA ligase I (encoded by LIG1) joins Okazaki fragments in the lagging strand during DNA synthesis and completes BER and NER [20]. In our study, case and control genotype distributions differed significantly for one of the 21 examined LIG1 SNPs. LIG1 rs10500298 minor allele homozygotes were more common among lung cancer cases than controls and more common among adenocarcinomas than squamous cell carcinomas. A literature search identified five published studies on lung cancer risk and LIG1 genetic variability [study #1: one SNP, 530 and 570 non-Hispanic white cases and controls [21]; study #2, 34 SNPs, 143 and 172 French Caucasian cases and controls [22]; study #3: four SNPs, ~440 and ~790 mixed race (~60% Caucasian) cases and controls [23]; study #4: five SNPs, 113 and 299 Latino American cases and controls [24], and study #5: five SNPs, 255 and 280 African American cases and controls [24]]. The French Caucasian, mixed race, and African American studies reported significant lung cancer associations involving either single LIG1 SNPs and/or haplotypes. Our SNP panel included two SNPs (rs20581, rs20579) associated with risk of lung cancer in the mixed race study. However, neither SNP was found to be associated with lung cancer risk in our study. SNP rs10500298 was not in strong linkage disequilibrium with a third mixed race lung cancer risk-associated SNP (rs20580) nor with three French Caucasian lung cancer risk-associated SNPs (rs3730994, rs3786763, rs3730912).

Upon detection of DNA strand breaks, poly(ADP-ribose) polymerase 1 (PARP1) adds poly(ADP-ribose) to nuclear proteins, recruits XRCC1, and initiates BER [25]. Interestingly, inhibition of PARP1 creates a state of synthetic lethality in cells that are unable to complete homologous recombination as a result of BRCA1 or BRCA2 loss, and PARP1 inhibitors show promise in treating BRCA-deficient breast and ovarian cancer [25,26]. In our study, the minor alleles of two PARP1 SNPs, rs747658 and rs3219073 (in high linkage disequilibrium), were each associated with a reduction in lung cancer risk of roughly one third per minor allele. Consistent with this, the relatively common PARP1 haplotype that contained the minor alleles from these two SNPs was also strongly associated with a lower risk of lung cancer. Thus far, only a few studies on PARP1 variability and lung cancer risk have been published. Among Japanese, having at least one minor allele of SNP rs3219145 (Lys940Arg), a SNP absent in HapMap CEU whites and not evaluated by us, was associated with increased lung cancer risk (OR 1.40, 95% CI 1.04–1.90) [27]. Lockett et al. [28] reported that poly(ADP-ribose) polymerase activity is lower in lymphocytes collected from individuals with at least one 762Ala allele (Val762Ala; rs1136410), and in vitro experiments confirmed lower enzymatic activity of PARP1-Ala762 [29]. Consistent with this, among Han Chinese, having at least one minor allele of SNP rs1136410 was associated with an increased lung cancer risk (OR 1.26, 95% CI 1.05–1.52) [30]. However, no significant association was observed in Korean [31] and Japanese study populations [27], and in our study population the association was not statistically significant either (OR 1.18, 95% CI 0.96–1.46). The PARP1 SNPs evaluated in our study captured 46 (96%) of 48 phase 1 & 2 common-variant HapMap CEU SNPs.

Commonly described as a scaffold protein, XRCC1 may coordinate BER through interactions not only with PARP1, but also with DNA ligase III (LIG3), DNA polymerase β (POLB), apurinic/apyrimidinic (AP) endonuclease (APEX1), and polynucleotide kinase 3′-phosphatase (PNKP) [32]. In our study population, two SNPs in XRCC1, rs1799782 (Arg194Trp) and rs3213255, were strongly associated with risk of lung cancer. Many published studies on XRCC1 variability and lung cancer risk have evaluated rs1799782 and two other non-synonymous SNPs, rs25489 (Arg280His) and rs25487 (Arg399Gln) (for example, [33–35]). However, four recent meta-analyses, Kiyohara et al. [36], Wang et al. [37], Zheng et al. [38], and Vineis et al. [39], did not find any of these three commonly studied non-synonymous coding SNPs to be associated with lung cancer risk in whites. In analyses adjusted for age and restricted to smokers with high pack-year exposures, however, a large white-only case-control study [34] also observed significantly lower lung cancer risk in association with Trp194. Therefore, our observation with respect to rs1799782 may reflect a study population enriched with heavy smokers combined with a Trp194 lung cancer protective effect that depends on a history of heavy smoking. The XRCC1 SNPs evaluated in our study captured 18 (75%) of 24 phase 1 & 2 common-variant (MAF≥0.05) HapMap CEU SNPs.

The effect of genetic variation in DNA repair and cell cycle control genes on lung cancer risk may become detectable only in the presence of certain environmental factors that cause DNA damage such as cigarette smoking. Our study population consisted of former and current smokers only and different results may be observed among never smokers. It should also be noted that multiple testing may have led to chance findings and that associations will need to be confirmed in additional studies. However, SNPs rs747658 and rs3219073 in PARP1 did remain associated with lung cancer risk at P<0.05 after applying the conservative Bonferroni correction.
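
To make the multiple-testing remark concrete, the sketch below applies a Bonferroni threshold to the six P-values reported in the Results. The number of tests is an assumption: the text does not state the denominator used, so the example tries both 240 (one test per SNP) and 720 (240 SNPs under three genetic models). The function name is illustrative.

```python
# P-values for the six most strongly associated SNPs, as reported in the Results.
reported_p = {
    "GTF2H4 rs2074508": 0.003,
    "LIG1 rs10500298": 2.7e-4,
    "PARP1 rs747658": 5.8e-5,
    "PARP1 rs3219073": 4.6e-5,
    "XRCC1 rs1799782": 0.006,
    "XRCC1 rs3213255": 0.004,
}

def bonferroni_survivors(p_values, n_tests, alpha=0.05):
    """Names of SNPs whose P-value is below alpha / n_tests, in sorted order."""
    threshold = alpha / n_tests
    return sorted(name for name, p in p_values.items() if p < threshold)

# Assumed denominators (not stated in the text): one test per SNP, or three per SNP.
survivors_240 = bonferroni_survivors(reported_p, n_tests=240)
survivors_720 = bonferroni_survivors(reported_p, n_tests=720)
```

Under either assumed denominator only the two PARP1 SNPs fall below the corrected threshold, in line with the statement above.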

Although our study population was relatively large, it is possible that some associations were not detected due to insufficient power. Specifically, testing under a log-additive model at a 0.05 two-sided significance level, our case-control study (722 cases and 929 controls) had under 90% power for very common variants (population MAF≥0.25) with lung cancer effects (OR) less than 1.3 and for common variants (population MAF=0.05) with lung cancer effects (OR) less than 1.6 [40]. Haplotype-based analyses may have improved our power for detecting genes with lung cancer association involving uncommon or multiple susceptibility alleles [41]. Evaluations of gene-environment interactions and case-case comparisons were limited to the 38 SNPs associated with lung cancer risk at P<0.05 and it is possible that important interactions and/or associations were not identified due to the limited number of SNPs investigated.

Unfortunately, our study population contained only a relatively small number of non-white subjects which limited our ability to evaluate SNP-lung cancer risk associations in non-white race groups. However, using 45 case and 58 control subjects with African genetic ancestry, we observed that four SNPs (ERCC3 rs4150407, LIG1 rs175628, RPA2 rs7356, and RPA3 rs13246995) were statistically significantly (P<0.05) associated with lung cancer risk, and replicated directional associations observed in subjects with Caucasian genetic ancestry.

Contributing to an expanding knowledge base about the role of genetic variation in DNA repair and cell cycle pathway genes in lung carcinogenesis, our findings may help identify biological processes behind lung cancer susceptibility. However, the genotype of a single common SNP cannot usefully assign smokers to very high or very low risk groups. This limitation motivates the search for multi-SNP indices capable of stratifying smokers into more meaningful risk groups [42]. As an illustration of one uncomplicated approach [42], we formed a 31-SNP genotype risk summary score that identified second, third, and fourth quartile groups at 1.04-fold, 2.21-fold, and 3.44-fold increased risk, respectively, relative to a first, lowest risk, quartile reference group.

To conclude, our data suggest that common variation in DNA repair and cell cycle control pathway genes is associated with smoking-related risk of lung cancer. Specifically, we observed strong associations with lung cancer risk for SNPs in GTF2H4, LIG1, PARP1 and XRCC1. An illustrative genotype risk summary score that combined genotype information for 31 SNPs in 15 genes risk-stratified current and ex-cigarette smokers over a 3.4-fold range. If confirmed in additional studies, combining genotype information for SNPs known to be associated with lung cancer risk may assist in classifying current and former smokers according to risk.

Supplementary Material

Supp Table S1

Supp Table S2


Grant support: This work was supported by National Cancer Institute grant 5P50 CA090440 to J. M. S. and Cancer Center Core Grant 2P30 CA047904 to the University of Pittsburgh Cancer Institute.


Abbreviations

CI: confidence interval
CT: computed tomography
MAF: minor allele frequency
OR: odds ratio
PLuSS: Pittsburgh Lung Screening Study
SNP: single nucleotide polymorphism


1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ. Cancer statistics, 2009. CA Cancer J Clin. 2009;59(4):225–249. [PubMed]
2. The Health Consequences of Smoking: A report of the Surgeon General. Atlanta, Ga: Dept. of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health; Washington, D.C: U.S. G.P.O; 2004. The impact of smoking on disease and the benefits of smoking reduction (Table 7.3) For sale by the Supt. of Docs.
3. Samet JM, Avila-Tang E, Boffetta P, Hannan LM, Olivo-Marston S, Thun MJ, Rudin CM. Lung cancer in never smokers: Clinical epidemiology and environmental risk factors. Clin Cancer Res. 2009;15(18):5626–5645. [PMC free article] [PubMed]
4. Villeneuve PJ, Mao Y. Lifetime probability of developing lung cancer, by smoking status, Canada. Can J Public Health. 1994;85(6):385–388. [PubMed]
5. Rudin CM, Avila-Tang E, Harris CC, et al. Lung cancer in never smokers: Molecular profiles and therapeutic implications. Clin Cancer Res. 2009;15(18):5646–5661. [PMC free article] [PubMed]
6. Wood RD, Mitchell M, Lindahl T. Human DNA repair genes, 2005. Mutat Res. 2005;577(1–2):275–283. [PubMed]
7. Asami S, Hirano T, Yamaguchi R, Tomioka Y, Itoh H, Kasai H. Increase of a type of oxidative DNA damage, 8-hydroxyguanine, and its repair activity in human leukocytes by cigarette smoking. Cancer Res. 1996;56(11):2546–2549. [PubMed]
8. el-Deiry WS, Harper JW, O’Connor PM, et al. WAF1/CIP1 is induced in p53-mediated G1 arrest and apoptosis. Cancer Res. 1994;54(5):1169–1174. [PubMed]
9. Hung RJ, Christiani DC, Risch A, et al. International Lung Cancer Consortium: pooled analysis of sequence variants in DNA repair and cell cycle pathways. Cancer Epidemiol Biomarkers Prev. 2008;17(11):3081–3089. [PMC free article] [PubMed]
10. Shen M, Berndt SI, Rothman N, et al. Polymorphisms in the DNA nucleotide excision repair genes and lung cancer risk in Xuan Wei, China. Int J Cancer. 2005;116(5):768–773. [PubMed]
11. Wang W, Spitz MR, Yang H, Lu C, Stewart DJ, Wu X. Genetic variants in cell cycle control pathway confer susceptibility to lung cancer. Clin Cancer Res. 2007;13(19):5974–5981. [PubMed]
12. Pabalan N, Bapat B, Sung L, Jarjanazi H, Francisco-Pabalan O, Ozcelik H. Cyclin D1 Pro241Pro (CCND1-G870A) polymorphism is associated with increased cancer risk in human populations: a meta-analysis. Cancer Epidemiol Biomarkers Prev. 2008;17(10):2773–2781. [PubMed]
13. Pritchard J, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:949–959. [PubMed]
14. Thomas DC, Witte JS. Point: Population stratification: A problem for case-control studies of candidate-gene associations? Cancer Epidemiol Biomarkers Prev. 2002;11:505–512. [PubMed]
15. Barrett J, Fry B, Maller J, Daly M. Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–265. [PubMed]
16. Friedlin B, Zheng G, Li Z, Gastwirth J. Trend tests for case-control studies of genetic markers: Power, sample size and robustness. Hum Hered. 2002;53:146–152. [PubMed]
17. Kuo C-L, Feingold E. What’s the best statistic for a simple test of genetic association in a case-control study? Genet Epidemiol. 2010;34:246–253. [PubMed]
18. Oksenych V, Coin F. The long unwinding road: XPB and XPD helicases in damaged DNA opening. Cell Cycle. 2010;9(1):90–96. [PubMed]
19. Wang SS, Gonzalez P, Yu K, et al. Common genetic variants and risk for HPV persistence and progression to cervical cancer. PLoS One. 2010;5(1):e8667. [PMC free article] [PubMed]
20. Lindahl T, Barnes DE. Mammalian DNA ligases. Annu Rev Biochem. 1992;61:251–281. [PubMed]
21. Shen H, Spitz MR, Qiao Y, Zheng Y, Hong WK, Wei Q. Polymorphism of DNA ligase I and risk of lung cancer--a case-control analysis. Lung Cancer. 2002;36(3):243–247. [PubMed]
22. Michiels S, Danoy P, Dessen P, et al. Polymorphism discovery in 62 DNA repair genes and haplotype associations with risks for lung and head and neck cancers. Carcinogenesis. 2007;28(8):1731–1739. [PubMed]
23. Lee Y-CA, Morgenstern H, Greenland S, et al. A case-control study of the association of the polymorphisms and haplotypes of DNA ligase I with lung and upper-aerodigestive-tract cancers. Int J Cancer. 2008;122(7):1630–1638. [PMC free article] [PubMed]
24. Chang JS, Wrensch MR, Hansen HM, et al. Nucleotide excision repair genes and risk of lung cancer among San Francisco Bay Area Latinos and African Americans. Int J Cancer. 2008;123(9):2095–2104. [PMC free article] [PubMed]
25. Rouleau M, Patel A, Hendzel MJ, Kaufmann SH, Poirier GG. PARP inhibition: PARP1 and beyond. Nat Rev Cancer. 2010;10(4):293–301. [PMC free article] [PubMed]
26. Fong PC, Boss DS, Yap TA, et al. Inhibition of poly(ADP-ribose) polymerase in tumors from BRCA mutation carriers. N Engl J Med. 2009;361(2):123–134. [PubMed]
27. Sakiyama T, Kohno T, Mimaki S, et al. Association of amino acid substitution polymorphisms in DNA repair genes TP53, POLI, REV1 and LIG4 with lung cancer risk. Int J Cancer. 2005;114(5):730–737. [PubMed]
28. Lockett KL, Hall MC, Xu J, et al. The ADPRT V762A genetic variant contributes to prostate cancer susceptibility and deficient enzyme function. Cancer Res. 2004;64(17):6344–6348. [PubMed]
29. Wang X-G, Wang Z-Q, Tong W-M, Shen Y. PARP1 Val762Ala polymorphism reduces enzymatic activity. Biochem Biophys Res Commun. 2007;354(1):122–126. [PubMed]
30. Zhang X, Miao X, Liang G, et al. Polymorphisms in DNA base excision repair genes ADPRT and XRCC1 and risk of lung cancer. Cancer Res. 2005;65(3):722–726. [PubMed]
31. Choi JE, Park SH, Jeon H-S, et al. No association between haplotypes of three variants (codon 81, 284, and 762) in poly(ADP-ribose) polymerase gene and risk of primary lung cancer. Cancer Epidemiol Biomarkers Prev. 2003;12(9):947–949. [PubMed]
32. Horton JK, Watson M, Stefanick DF, Shaughnessy DT, Taylor JA, Wilson SH. XRCC1 and DNA polymerase beta in cellular protection against cytotoxic DNA single-strand breaks. Cell Res. 2008;18(1):48–63. [PMC free article] [PubMed]
33. Cote ML, Yoo W, Wenzlaff AS, et al. Tobacco and estrogen metabolic polymorphisms and risk of non-small cell lung cancer in women. Carcinogenesis. 2009;30(4):626–635. [PMC free article] [PubMed]
34. Hung RJ, Brennan P, Canzian F, et al. Large-scale investigation of base excision repair genetic polymorphisms and lung cancer risk in a multicenter study. J Natl Cancer Inst. 2005;97(8):567–576. [PubMed]
35. Zhou W, Liu G, Miller DP, et al. Polymorphisms in the DNA repair genes XRCC1 and ERCC2, smoking, and lung cancer risk. Cancer Epidemiol Biomarkers Prev. 2003;12(4):359–365. [PubMed]
36. Kiyohara C, Takayama K, Nakanishi Y. Association of genetic polymorphisms in the base excision repair pathway with lung cancer risk: a meta-analysis. Lung Cancer. 2006;54(3):267–283. [PubMed]
37. Wang Y, Yang H, Li H, Li L, Wang H, Liu C, Zheng Y. Association between X-ray repair cross complementing group 1 codon 399 and 194 polymorphisms and lung cancer risk: a meta-analysis. Cancer Lett. 2009;285(2):134–140. [PubMed]
38. Zheng H, Wang Z, Shi X, Wang Z. XRCC1 polymorphisms and lung cancer risk in Chinese populations: a meta-analysis. Lung Cancer. 2009;65(3):268–273. [PubMed]
39. Vineis P, Manuguerra M, Kavvoura FK, et al. A field synopsis on low-penetrance variants in DNA repair genes and cancer susceptibility. J Natl Cancer Inst. 2009;101(1):24–36. [PubMed]
40. Gauderman W, Morrison J. QUANTO 1.1: A computer program for power and sample size calculations for genetic-epidemiology studies. 2006.
41. Morris RW, Kaplan NL. On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genet Epidemiol. 2002;23:221–233. [PubMed]
42. Young RP, Hopkins RJ, Hay BA, et al. A gene-based risk score for lung cancer susceptibility in smokers and ex-smokers. Postgrad Med J. 2009;85(1008):515–524. [PubMed]