Few studies on the association between nucleotide excision repair (NER) variants and lung cancer risk have included Latinos and African Americans. We examine variants in six NER genes (ERCC2, ERCC4, ERCC5, LIG1, RAD23B and XPC) in association with primary lung cancer risk among 113 Latino and 255 African American subjects newly diagnosed with primary lung cancer from 1998 to 2003 in the San Francisco Bay Area, and 579 healthy controls (299 Latinos and 280 African Americans). Individual single nucleotide polymorphism (SNP) and haplotype analyses, multifactor dimensionality reduction, and principal components analysis were performed to assess the association between variants in these six NER genes and lung cancer risk. Among Latinos, ERCC2 haplotype CGA (rs238406, rs11878644, rs6966) was associated with reduced lung cancer risk [odds ratio (OR) = 0.65; 95% confidence interval (CI): 0.44-0.97], especially among non-smokers (OR=0.29; 95% CI: 0.12-0.67). In the multifactor dimensionality reduction analysis of Latinos, smoking and three SNPs (ERCC2 rs171140, ERCC5 rs17655, and LIG1 rs20581) together had a prediction accuracy of 67.4% (p=0.001) for lung cancer. Among African Americans, the His/His genotype of ERCC5 Asp1104His (rs17655) was associated with increased lung cancer risk (OR=1.78; 95% CI: 1.09-2.91), and LIG1 haplotype GGGAA (rs20581, rs156641, rs3730931, rs20579, and rs439132) was associated with reduced lung cancer risk (OR=0.61; 95% CI: 0.42-0.88). Our study suggests that different elements of the NER pathway may be important in different ethnic groups, possibly reflecting different linkage relationships, genetic backgrounds, and/or exposure histories.
Nucleotide excision repair (NER) has been well described and is one of the three DNA repair pathways that cells use to repair DNA base damage 1, 2. Despite numerous publications on the association between several NER genetic polymorphisms and lung cancer risk 3-41, only three studies included African Americans 7, 8, 31 and only two included Latinos 7, 31. Although 80 to 90 percent of lung cancer is attributable to smoking 42, smoking patterns may not fully explain differences in lung cancer incidence, particularly among African Americans 43, 44, who have the highest lung cancer rates in the United States 45. This suggests that ethnic differences in lung cancer incidence may be partially explained by inherited variation among ethnic/racial groups. The current study therefore examines the association between ERCC2, ERCC4, ERCC5, LIG1, RAD23B, and XPC and lung cancer risk in two understudied populations, African Americans and Latinos (who have the lowest lung cancer rates in the United States) 45. We used logistic regression of individual candidate SNPs and haplotypes, as well as principal components analysis and multifactor dimensionality reduction, to thoroughly explore genetic associations and gene-environment interactions with lung cancer risk. Moreover, to control for potential population stratification in these admixed populations 46, all analyses in this study were adjusted for individual genetic ancestry estimated from a panel of 184 ancestry informative markers.
Cases were identified through the Northern California Cancer Center’s rapid case ascertainment program and included San Francisco Bay Area residents newly diagnosed with primary lung cancer between September 1998 and March 2003. Subjects’ treating physicians were sent a letter asking whether subjects had any contraindications to participation in the study. If the physicians indicated no contraindications, subjects were sent a letter describing the purpose of the study and a postcard to return if they did not want to participate. Subjects who did not refuse participation were telephoned for a short interview to obtain information on ethnicity, pre-diagnostic smoking history, occupational history, and dietary habits. Self-identified Latinos or African Americans were individually asked to participate in a more detailed in-person interview and to donate blood or buccal specimens.
Recruitment of control subjects has been described in detail previously 47. Briefly, control subjects were recruited through three sources: random-digit dialing, Health Care Financing Administration records, and community-based recruitment (e.g., health fairs, churches, and senior centers). Controls were frequency-matched to cases on age, gender, and race/ethnicity (Latino or African American) with a control to case ratio of approximately 2 to 1. Control subjects completed in-person interviews and donated a blood and/or buccal specimen.
The study was approved by the Committee on Human Research of the University of California, San Francisco and by the Institutional Review Boards of all collaborating institutions.
The current analysis includes 17 single nucleotide polymorphisms (SNPs) in 6 NER genes (ERCC2, ERCC4, ERCC5, LIG1, RAD23B, and XPC) and 1 SNP in PPP1R13L, which forms a haplotype block with several ERCC2 SNPs but is not involved in nucleotide excision repair (SNPs are listed in Supplemental Table 1). SNPs were selected using a candidate gene approach and were drawn from multiple sources. Four ERCC2 SNPs (rs13181, rs1052555, rs3916876, and rs238406) were identified from the literature 48-50, and rs17655, rs1805329, rs1800067, and rs2228001 were selected for their potential influence on DNA repair pathways 51. The SNP500Cancer database 52 was queried for SNPs in the candidate genes with a minor allele frequency (MAF) >5% in the combined SNP500Cancer population of 102 individuals; rs1799787, rs20581, rs156641, rs3730931, rs20579, and rs439132 were selected in this manner. Finally, the HapMap database 53 was used to generate haplotypes for the candidate genes and their 10,000-bp flanking regions in the Yoruba from Ibadan, Nigeria (YRI) and CEPH (Utah residents with ancestry from northern and western Europe) populations. Three tag SNPs with MAF >5% in the CEPH data set (rs1799793, rs171140, and rs11878644) were identified in this way.
In addition to the NER SNPs, a panel of biallelic SNPs designed by co-author M. Seldin was genotyped to account for potential population stratification among Latinos and African Americans, two admixed populations. European ancestral DNA was collected from 47 healthy white controls of European descent from an ongoing population-based cancer study in the San Francisco Bay Area 54. African ancestral DNA (N = 47) was provided by co-author R. Kittles and was collected from 23 subjects from the Bini, a Niger-Congo group of Bantu speakers from Edo State, and 24 subjects from the Kanuri, a group of Nilo-Saharan speakers from the Lake Chad region of northern Nigeria. Amerindian ancestral DNA (N = 46) was provided by co-author G. Silva and was collected from Mayans living in two villages, Bola De Oro and Cienega Grande, in Chimaltenango. One hundred eighty-four unlinked autosomal SNPs with large differences in allele frequencies between ancestral populations were identified as ancestry informative markers (mean differences in allele frequencies ranged from 0.43 to 0.49). Genetic ancestry (percent European, Amerindian, and African ancestry) was estimated from these 184 ancestry informative markers using a maximum likelihood-based program written in R specifically for this project, based on the methods described by Chakraborty et al. 55 and Hanis et al. 56.
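The project-specific R program is not described beyond its basis in Chakraborty et al. and Hanis et al. As a rough illustration only, the Python sketch below estimates one person's ancestry proportions by maximum likelihood. It uses an EM algorithm (our choice of optimizer, not necessarily the authors') and assumes that markers are unlinked, that ancestral allele frequencies are known without error, and that each allele copy independently descends from population k with probability equal to the ancestry proportion. The function name `estimate_ancestry` and its arguments are hypothetical.

```python
import numpy as np

def estimate_ancestry(genotypes, anc_freqs, n_iter=1000, tol=1e-10):
    """Maximum-likelihood admixture proportions for one individual, fitted by EM.

    genotypes: length-L sequence of reference-allele counts (0, 1, 2); np.nan = missing.
    anc_freqs: L x K reference-allele frequencies in K ancestral reference panels.
    Returns a length-K vector of ancestry proportions summing to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    # Keep frequencies off 0 and 1 so every allele has nonzero probability.
    p = np.clip(np.asarray(anc_freqs, dtype=float), 1e-6, 1 - 1e-6)
    keep = ~np.isnan(g)                    # drop missing genotypes
    g, p = g[keep], p[keep]
    k = p.shape[1]
    q = np.full(k, 1.0 / k)                # start from equal contributions
    for _ in range(n_iter):
        # E-step: probability that a reference (or alternative) allele copy
        # at each marker descends from each ancestral population.
        w_ref = q * p
        w_ref /= w_ref.sum(axis=1, keepdims=True)
        w_alt = q * (1.0 - p)
        w_alt /= w_alt.sum(axis=1, keepdims=True)
        # M-step: expected share of all 2L allele copies from each population.
        copies = (g[:, None] * w_ref + (2.0 - g)[:, None] * w_alt).sum(axis=0)
        q_new = copies / (2.0 * g.size)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q
```

For a Latino subject, `anc_freqs` would have three columns (European, Amerindian, African reference panels); with informative markers the estimate sharpens as more markers are added.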
Genotyping was performed on an Illumina BeadStation 500G GoldenGate genotyping platform using a custom panel of 384 candidate and ancestry informative SNPs and unamplified DNA extracted from blood. For six subjects with insufficient DNA from blood, genotypes from whole genome amplified (WGA) blood or buccal DNA samples are included in the data set. WGA was performed as previously described 57. Genotype reproducibility was verified with duplicates of unamplified DNA (N=31) and WGA/genomic DNA pairs. Reproducibility of unamplified duplicates averaged 99.99% (range 99.86-100%). WGA/genomic pairs averaged 99.39% (range 98.93-99.60%) genotype reproducibility when the WGA DNA was derived from blood (N=18 pairs tested) and 98.49% (range 96.11-99.73%) when it was derived from buccal cells (N=28 pairs tested).
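Reproducibility figures like those above are, in essence, per-pair genotype concordance rates summarized by a mean and a range. A minimal Python sketch, assuming genotypes coded as minor-allele counts with `None` for a failed call, and assuming failed calls are simply excluded from the denominator (the text does not say how no-calls were handled); `concordance` and `summarize` are hypothetical helper names.

```python
def concordance(calls_a, calls_b):
    """Percent of SNPs with identical genotype calls in one duplicate pair.

    SNPs with a missing call (None) in either sample are skipped (an assumption).
    """
    compared = agree = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        agree += int(a == b)
    return 100.0 * agree / compared if compared else float("nan")

def summarize(pairs):
    """Mean, minimum, and maximum concordance over a list of (calls_a, calls_b) pairs."""
    rates = [concordance(a, b) for a, b in pairs]
    return sum(rates) / len(rates), min(rates), max(rates)
```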
All Latino (n = 131) and African American (n = 267) cases were genotyped along with all available Latino controls (n = 308). Due to budget constraints, we selected a random sample (n=290) of African American controls for genotyping. For the current analysis, we excluded subjects who reported belonging to other ancestral/ethnic groups in addition to Latino or African American. The final sample of this study consists of 412 Latino subjects (113 cases and 299 controls) and 535 African American subjects (255 cases and 280 controls).
All analyses were conducted separately within the two ethnic groups, Latino and African American. We calculated allele frequencies for all NER SNPs and excluded from further analysis those with a minor allele frequency less than 5%. Hardy-Weinberg equilibrium was tested for each SNP using the exact test in PROC ALLELE of SAS/Genetics (Cary, NC), and SNPs failing the Hardy-Weinberg test with a false discovery rate (FDR) <0.05 (after adjustment for multiple comparisons) were excluded.
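The two quality-control steps in this paragraph can be sketched in plain Python. The HWE function below follows the exact-test recursion published by Wigginton, Cutler and Abecasis (2005); SAS PROC ALLELE may differ in implementation details. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment is assumed here. Function names are hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact test of Hardy-Weinberg equilibrium for one biallelic SNP."""
    n_rare_hom, n_common_hom = min(n_hom1, n_hom2), max(n_hom1, n_hom2)
    n = n_het + n_rare_hom + n_common_hom
    if n == 0:
        return 1.0
    rare = 2 * n_rare_hom + n_het              # copies of the rarer allele
    probs = [0.0] * (rare + 1)                 # unnormalized P(het count | allele counts)
    # Start near the most likely heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk down: two fewer heterozygotes, one more of each homozygote.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    # Walk up: two more heterozygotes, one fewer of each homozygote.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    total = sum(probs)
    p_obs = probs[n_het]
    # p-value: total probability of configurations no more likely than the observed one.
    p = sum(x for x in probs if x <= p_obs * (1 + 1e-7)) / total
    return min(1.0, p)

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this reading, a SNP would be dropped when its entry in `bh_fdr([...])` (computed over the HWE p-values of all SNPs within one ethnic group) falls below 0.05.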
Further analyses were performed in the following order:
Unconditional logistic regression was performed for each individual SNP without assuming any mode of inheritance by including two indicator variables in the model (one for the heterozygous and one for the homozygous variant genotype). In addition, tests for trend were performed using a log-additive model, coding the number of minor alleles as 0, 1, and 2.
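A sketch of the two genotype codings and an unconditional logistic fit (Newton-Raphson) in Python with NumPy. It is illustrative only: the paper's analyses were run in standard statistical software, and the helper names below are hypothetical. Covariates such as age, sex, and genetic ancestry would be appended as extra columns of `X`.

```python
import math
import numpy as np

def codominant(g):
    """Minor-allele count (0/1/2) -> two indicators: heterozygous, homozygous variant."""
    g = np.asarray(g)
    return np.column_stack([g == 1, g == 2]).astype(float)

def log_additive(g):
    """Trend coding: the number of minor alleles as a single numeric column."""
    return np.asarray(g, dtype=float).reshape(-1, 1)

def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson; an intercept column is prepended.

    Returns (coefficients, standard errors, maximized log-likelihood), intercept first.
    """
    X = np.column_stack([np.ones(len(X)), X])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])      # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    loglik = float(np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)))
    return beta, se, loglik

def odds_ratios(beta, se, z=1.959964):
    """ORs and Wald 95% CIs for every non-intercept coefficient."""
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)

def wald_p(b, s):
    """Two-sided Wald p-value, e.g. for the log-additive trend coefficient."""
    return math.erfc(abs(b / s) / math.sqrt(2.0))
```

Because the three ancestry proportions sum to one, only two of them can enter a model together; the MDR follow-up model, for example, adjusts for percent European and Amerindian ancestry.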
Haplotype blocks were determined for the two genes with more than one genotyped SNP (ERCC2 and LIG1) using Haploview version 3.32 59 and the block definition described by Gabriel et al. 60. Haplotype analysis was then performed for each haplotype block. Haplotypes were estimated from SNPs belonging to the same haplotype block by the expectation-maximization (EM) algorithm using the SAS macro HAPPY written by Kraft and Chen (http://www.hsph.harvard.edu/faculty/kraft/soft.htm). The HAPPY SAS macro includes SAS PROC HAPLOTYPE with the “stepem” option, based on the haplotype estimation program SNPHAP by David Clayton 61. A study by Adkins compared four methods of estimating haplotypes, including PHASE and SNPHAP, and showed that all four methods performed equally well 62. Haplotypes with frequency less than 5% were combined into one group for analysis. Haplotype trend regression was performed to estimate the odds ratio (OR) associated with each additional copy of a specific haplotype, using the most common haplotype as the reference group 63, 64. To account for uncertain haplotype phase, the probabilities of the possible haplotype combinations were incorporated as weights in the regression model. Global tests for the association between the haplotypes of a block and lung cancer compared the full model with the haplotype variables to the submodel without the haplotype variables using the likelihood ratio test.
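HAPPY and SNPHAP handle any number of SNPs; for intuition, the Python sketch below implements the EM algorithm for the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. The posterior phase probability it returns is the kind of quantity that enters a weighted haplotype trend regression: a double heterozygote would contribute one row per possible haplotype pair, weighted by that pair's posterior probability. The function is a hypothetical illustration, not the HAPPY macro.

```python
def em_two_snp_haplotypes(genotypes, max_iter=1000, tol=1e-12):
    """EM haplotype-frequency estimates for two biallelic SNPs.

    genotypes: list of (g1, g2) minor-allele counts (0/1/2), one pair per subject.
    Returns (freqs, post): freqs maps haplotypes '00', '01', '10', '11' to
    frequencies; post is P(phase 00/11) for a double heterozygote.
    """
    copies = {0: (0, 0), 1: (1, 0), 2: (1, 1)}   # allele copies implied by a genotype
    known = {"00": 0, "01": 0, "10": 0, "11": 0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1                    # phase ambiguous: 00/11 or 01/10
            continue
        for a1, a2 in zip(copies[g1], copies[g2]):
            known[f"{a1}{a2}"] += 1              # phase is determined for everyone else
    n_copies = 2 * len(genotypes)
    f = {h: 0.25 for h in known}

    def phase_posterior(freqs):
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        return cis / (cis + trans) if cis + trans > 0 else 0.5

    for _ in range(max_iter):
        post = phase_posterior(f)                # E-step
        counts = {"00": known["00"] + n_double_het * post,
                  "11": known["11"] + n_double_het * post,
                  "01": known["01"] + n_double_het * (1 - post),
                  "10": known["10"] + n_double_het * (1 - post)}
        new = {h: c / n_copies for h, c in counts.items()}   # M-step
        delta = max(abs(new[h] - f[h]) for h in f)
        f = new
        if delta < tol:
            break
    return f, phase_posterior(f)
```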
To assess gene-smoking interaction, analyses stratified by smoking status were performed with the ERCC2 and LIG1 haplotypes or individual ERCC4, ERCC5, RAD23B, and XPC SNPs with a main effect p ≤ 0.10. Tests for interaction were performed by including product terms between smoking status and the SNPs or haplotypes in the unconditional logistic regression model. The p-value for interaction was obtained from a likelihood ratio test comparing the full model with product terms to the submodel without them.
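Both the global haplotype test and the interaction test reduce to a likelihood ratio test between nested logistic models, with degrees of freedom equal to the number of extra terms (for example, two product terms for a codominant SNP by ever/never smoking). A standard-library Python sketch, assuming the two maximized log-likelihoods are already available from the fitted models; the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(n, y) = exp(-y) * sum_{i<n} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1..n} y^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def likelihood_ratio_test(loglik_full, loglik_reduced, n_extra_terms):
    """LR statistic 2*(llf_full - llf_reduced) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, n_extra_terms)
```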
Multifactor dimensionality reduction (MDR) analysis was performed to assess high-order smoking-SNP and SNP-SNP interactions. Subjects with missing data on at least one SNP were excluded from the MDR analysis (12 Latinos and 20 African Americans). A detailed description of MDR has been published previously 65, 66. Briefly, MDR is a nonparametric method that reduces n-dimensional data to a single variable with two levels (high vs. low risk). The MDR procedure exhaustively searches all possible combinations of n genetic/environmental factors, and the best n-factor combination is the one with the highest prediction accuracy and the highest cross-validation consistency. For the current analysis, we allowed MDR to choose up to four variables among all qualified SNPs and smoking status (ever vs. never). We repeated the 10-fold cross-validation 10 times using 10 different random seeds to reduce the probability of spurious findings due to chance division of the data. P-values were calculated by permutation testing with 1000 permutations. The best n-factor combination was then included in the unconditional logistic regression model as a dichotomous predictor (high vs. low risk) to estimate the associated OR while adjusting for age, sex, and genetic ancestry.
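The core of MDR can be sketched in a few lines of Python, under simplifying assumptions that are ours rather than the MDR software's: a multi-factor cell is labeled high risk when its training case:control ratio exceeds the overall training ratio, accuracy is measured as balanced accuracy, folds are stratified by case status, and the best combination is chosen by mean testing accuracy alone (cross-validation consistency and the permutation test are omitted). All names are hypothetical.

```python
import itertools
import random
from collections import defaultdict

def high_risk_cells(rows, labels, threshold):
    """Cells (factor-level combinations) whose case:control ratio exceeds threshold."""
    cases, controls = defaultdict(int), defaultdict(int)
    for cell, y in zip(rows, labels):
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    return {c for c in set(cases) | set(controls) if cases[c] > threshold * controls[c]}

def balanced_accuracy(high, rows, labels):
    """Mean of sensitivity and specificity of the high/low-risk classification."""
    tp = fn = tn = fp = 0
    for cell, y in zip(rows, labels):
        predicted_case = cell in high
        if y == 1:
            tp, fn = tp + predicted_case, fn + (not predicted_case)
        else:
            fp, tn = fp + predicted_case, tn + (not predicted_case)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def cv_accuracy(data, labels, factors, k=10, seed=0):
    """Mean testing balanced accuracy of one factor combination under k-fold CV."""
    rows = [tuple(d[f] for f in factors) for d in data]
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(labels) if y == 1]
    ctrl_idx = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(case_idx)
    rng.shuffle(ctrl_idx)
    folds = [case_idx[j::k] + ctrl_idx[j::k] for j in range(k)]   # stratified folds
    accs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(rows)) if i not in held_out]
        tr_rows, tr_y = [rows[i] for i in train], [labels[i] for i in train]
        n_case = sum(tr_y)
        threshold = n_case / (len(tr_y) - n_case)    # overall training case:control ratio
        high = high_risk_cells(tr_rows, tr_y, threshold)
        accs.append(balanced_accuracy(high, [rows[i] for i in test], [labels[i] for i in test]))
    return sum(accs) / k

def best_combination(data, labels, candidates, max_order=4, k=10, seeds=range(10)):
    """Exhaustive search over 1..max_order factors; highest mean testing accuracy wins."""
    best = None
    for order in range(1, max_order + 1):
        for combo in itertools.combinations(candidates, order):
            acc = sum(cv_accuracy(data, labels, combo, k, s) for s in seeds) / len(seeds)
            if best is None or acc > best[1]:
                best = (combo, acc)
    return best
```

Each subject is a dict such as `{"smoke": 1, "rs17655": 2}`, so smoking enters the search exactly like a SNP.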
As an alternative to haplotype analysis, principal components analysis (PCA) was performed with ERCC2 and LIG1 SNPs using a method described by Gauderman et al. 67. The PCA method captures the linkage-disequilibrium pattern within a gene but does not require estimating haplotypes with unknown phase 67. Simulations showed that PCA is at least as powerful as both genotype- and haplotype-based approaches 67. First, PCA was performed to generate principal components that capture the correlation structure between SNPs within a gene. Then, the principal components that together explained at least eighty percent of the variance were modeled for their association with lung cancer status by logistic regression. The eighty-percent cut-off was shown to provide sufficient statistical power according to Gauderman et al. 67. SNPs that are strongly correlated with the principal components significantly associated with lung cancer risk are thought to be the important SNPs (or linked to the important SNPs) for disease susceptibility. Tests for interaction between smoking and the principal components generated from PCA were also performed.
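A NumPy sketch of the PCA step, assuming genotypes coded 0/1/2 and components taken from the SNP correlation matrix. It follows the general idea of Gauderman et al. but is not their code, and the function name is hypothetical. The retained scores would enter the logistic model in place of the individual SNPs, and the returned SNP-PC correlations are what one would inspect to see which SNPs drive a significant component.

```python
import numpy as np

def gene_principal_components(genotypes, min_var=0.80):
    """PCA of a subjects x SNPs genotype matrix (minor-allele counts 0/1/2).

    Keeps the smallest number of leading PCs explaining at least `min_var` of the
    total variance. Returns (scores, snp_pc_corr, explained): subject scores on the
    kept PCs, correlations between each SNP and each kept PC, and the variance
    share of every PC.
    """
    X = np.asarray(genotypes, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize; assumes no monomorphic SNP
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                # eigh returns ascending eigenvalues
    eigvals, eigvecs = np.clip(eigvals[order], 0.0, None), eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    n_keep = min(int(np.searchsorted(np.cumsum(explained), min_var)) + 1, len(eigvals))
    scores = Z @ eigvecs[:, :n_keep]
    # For standardized data, corr(SNP j, PC k) = loading_jk * sqrt(eigenvalue_k).
    snp_pc_corr = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
    return scores, snp_pc_corr, explained
```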
Among Latinos, compared with controls, cases were more likely to have ever smoked, had smoked more pack-years, had higher income, and had a higher mean percentage of European ancestry and a lower mean percentage of Amerindian ancestry (Table 1). Among African Americans, cases were more likely than controls to have ever smoked, had smoked more pack-years, and had fewer years of schooling, but notably the percentages of European and African genetic ancestry were very similar in cases and controls.
One SNP (ERCC2 rs3916876) among Latinos and three SNPs (ERCC2 rs3916876, ERCC4 rs1800067, and RAD23B rs1805329) among African Americans were excluded from analysis due to MAF < 0.05 (supplemental table 1). All SNPs were in Hardy-Weinberg equilibrium except for rs238406 (p=0.01) among Latino controls and rs20581 (p=0.03) among African American controls; however, the deviation of those two SNPs from Hardy-Weinberg equilibrium could be due to chance after accounting for multiple testing (FDR > 0.05), and thus they were kept in the analyses (FDRs for rs238406 among Latino controls and rs20581 among African American controls were 0.44 and 0.58, respectively).
Among Latinos (Table 2), two of the seventeen SNPs tested were significantly associated with risk of lung cancer (p<0.05): ERCC2 rs13181 (Lys751Gln) and PPP1R13L rs6966, which forms a haplotype block with several ERCC2 SNPs. Among African Americans (Table 2), three of fifteen SNPs were significantly associated with risk of lung cancer: ERCC5 rs17655 (Asp1104His) and LIG1 rs20579 and rs439132. We performed sensitivity analyses of these significant SNPs adjusting for smoking as an additional covariate, and the results were either similar or more statistically significant (see footnote of Table 2).
For ERCC2, Latinos had three haplotype blocks, whereas African Americans had two haplotype blocks (supplemental figures 1 and 2). For LIG1, Latinos had one haplotype block of three SNPs and African Americans had one haplotype block of five SNPs (supplemental figures 3 and 4).
Among Latinos, reduced lung cancer risk was associated with ERCC2 haplotypes 2B and 3B compared with the most frequent haplotypes (2A and 3A, respectively) (Table 3). Among African Americans, the most significant result was observed for LIG1 haplotype 1B, which was inversely associated with risk of lung cancer compared with haplotype 1A (Table 3). The reduced risk with this haplotype appears attributable to the combination of the G allele of rs3730931 and the A allele of rs20579. We performed sensitivity analyses of these significant haplotypes adjusting for smoking as an additional covariate, and the results were either similar or more statistically significant (see footnote of Table 3).
Among Latinos, for the proposed prior probabilities of 0.25 and 0.1, the false-positive report probabilities (FPRPs) for the Gln/Gln genotype of ERCC2 Lys751Gln were 0.42 and 0.69, respectively, suggesting weak to moderate evidence for the association. The FPRPs for ERCC2 haplotype 3B were 0.19 and 0.41 for prior probabilities of 0.25 and 0.1, respectively, suggesting moderate to strong evidence for the association.
Among African Americans, for the proposed prior probabilities of 0.25 and 0.1, the FPRPs for the His/His genotype of ERCC5 Asp1104His were 0.21 and 0.44, respectively, suggesting moderate evidence for the association. The FPRPs for LIG1 haplotype 1B were 0.07 and 0.19 for prior probabilities of 0.25 and 0.1, respectively, suggesting strong evidence for the association.
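The FPRP calculation itself is not described in this section. The standard-library Python sketch below follows the formula of Wacholder et al. (2004), FPRP = α(1 - π) / [α(1 - π) + (1 - β)π], with α taken as the observed two-sided p-value recovered from a reported OR and 95% CI, and power (1 - β) computed for an assumed alternative OR. The alternative OR of 1.5 (or 1/1.5 for inverse associations) is our assumption, so results need not match the values reported above; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def fprp(or_point, ci_low, ci_high, prior, prior_or=1.5, level=0.95):
    """False-positive report probability from a reported OR and its confidence interval."""
    nd = NormalDist()
    z_ci = nd.inv_cdf(1 - (1 - level) / 2)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_ci)  # SE of log(OR) from CI width
    z_obs = abs(math.log(or_point)) / se
    alpha = 2 * (1 - nd.cdf(z_obs))                           # observed two-sided p-value
    # Power to detect the assumed alternative OR at significance level alpha
    # (z_obs is the critical value corresponding to alpha).
    effect = abs(math.log(prior_or)) / se
    power = nd.cdf(effect - z_obs) + nd.cdf(-effect - z_obs)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)
```

For example, `fprp(0.61, 0.42, 0.88, prior=0.25)` evaluates an OR of 0.61 (95% CI: 0.42-0.88) at a prior probability of 0.25; lowering the prior raises the FPRP.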
Among Latinos, for ERCC2 block 2, haplotype 2B was associated with a statistically significant reduced risk of lung cancer compared with haplotype 2A only among non-smokers (Table 5), but the test for interaction was not statistically significant (p=0.22). For block 3, haplotype 3B was associated with a reduced risk of lung cancer compared with haplotype 3A only among non-smokers, and the test for interaction was borderline statistically significant (p=0.09).
Among African Americans, for ERCC5 Asp1104His, the risk for those with the His/His variant genotype was significantly increased compared with those with the Asp/Asp wildtype genotype only among ever smokers (OR=1.92, 95% CI: 1.10-3.36); however, the test for interaction was not statistically significant (Table 6).
LIG1 haplotype 1B, which was associated with a statistically significant reduced risk of lung cancer in the combined analysis, showed similar ORs across smoking strata (Table 6). Although the test for interaction between LIG1 haplotypes and smoking was borderline statistically significant (p=0.05), this was mainly attributable to differences in the risk associated with the rare haplotype groups across smoking strata, which cannot be easily interpreted. We therefore concluded that there is no evidence of interaction between smoking and the major LIG1 haplotypes on the risk of lung cancer.
For Latino subjects, the MDR procedure identified smoking, rs171140 of ERCC2, rs17655 of ERCC5, and rs20581 of LIG1 as the best combination for predicting the case/control status (Table 7 and supplemental figure 5) with a prediction accuracy of 67.4% and an associated p-value of 0.001, although smoking alone had a good prediction accuracy of 62.5%. The OR associated with the “high risk” group as defined by the best combination (smoking, rs171140, rs17655, and rs20581) was 8.02 (95% CI: 4.67 - 13.77, p-value < 0.001), adjusting for age, sex, percent of European and Amerindian genetic ancestry using unconditional logistic regression.
For African American subjects, smoking was the best predictor of lung cancer, with a prediction accuracy of 63.5% and a p-value of <0.001 (supplemental table 2).
Results for the PCA (supplemental tables 3-6) were consistent with those from haplotype analyses.
Among Latinos, the PCA of ERCC2 identified a significant inverse interaction between principal component 2 (PC2) and smoking (supplemental table 3), meaning that the increased risk associated with PC2 weakened as the number of pack-years smoked increased. Three of the four SNPs strongly correlated with PC2 (rs238406, rs11878644, and rs6966) also made up block 3 of ERCC2 in the haplotype analysis. The direction of the correlation between these three SNPs and PC2 indicated that the C allele of rs238406, the G allele of rs11878644, and the A allele of rs6966 constituted a lower-risk group; the result of the PCA was therefore consistent with that of the haplotype analysis (Table 3).
Among African Americans, the PCA indicated a significantly decreased risk of lung cancer associated with PC1 of LIG1, which is strongly correlated with rs3730931 and rs20579 (supplemental table 6). The positive correlations of rs3730931 and rs20579 with PC1 indicated that the alleles associated with reduced lung cancer risk are G for rs3730931 and A for rs20579, which is consistent with the results of the haplotype analysis (Table 3).
In this study, we used an integrative approach to analyze both single variants and haplotypes of genes in the NER pathway, including MDR analysis to account for complex gene-gene and gene-smoking interactions, and principal components analysis, which does not depend on linkage phase, to thoroughly explore correlations among variants. For Latinos, in the MDR analyses, smoking was a strong predictor of lung cancer, as expected, but three SNPs (ERCC2 rs171140, ERCC5 rs17655, and LIG1 rs20581) also increased the case-control prediction accuracy, suggesting that additional effect modification by genetic factors may also be important. Because MDR is a statistical prediction method, any biological significance of its results would need to be confirmed by laboratory studies.
Another strength of this study was the ability to control for ancestry differences between cases and controls within each ethnic group using ancestry informative markers. As previously described, cases in this study were ascertained from a population registry, while controls came from a variety of sources including random digit dialing, Health Care Financing Administration (Medicare) rolls, and community sources such as churches and senior centers 47. This may explain why the percentage of Amerindian ancestry was higher among Latino controls than cases: controls were more likely to have Central American heritage, while cases were more likely to be third- or higher-generation US residents and of Mexican ancestry. Controlling for this difference in ancestry (population stratification) by including genetic ancestry, as determined by an extensive panel of ancestry informative markers, in the logistic models increases confidence that observed differences between cases and controls in NER pathway genes are not due to ancestral differences. For Latinos, adjustment for genetic ancestry moved the association toward the null for most SNPs or haplotypes, suggesting some population stratification, but confounding of the gene-disease association by population stratification did not appear extensive. For African Americans, the results were almost identical with and without adjustment for genetic ancestry, suggesting that population stratification was minimal. Because population stratification depends on differences in allele frequencies and disease risks among ethnic groups, the minimal impact of population stratification observed in this study cannot be generalized to other studies with different SNPs and different admixed populations.
Comparisons of our results for each gene in relation to previously reported literature are discussed in detail below.
In the current study, Asp312Asn (rs1799793) was not significantly associated with lung cancer risk among either Latinos or African Americans. In contrast, the Gln/Gln genotype of Lys751Gln (rs13181) was associated with increased lung cancer risk among Latinos but not among African Americans. The only other study of ERCC2 and lung cancer among African Americans also reported a null association between the Lys751Gln Gln/Gln genotype and lung cancer (OR=1.03; 95% CI: 0.40-2.65) and did not report on other ERCC2 variants 8. These variants have been assessed in twenty studies of Asians and Caucasians with mixed results 5, 6, 8-11, 13, 17-21, 23, 24, 26, 29, 36-38, 41. A recent meta-analysis of ERCC2 polymorphisms in 11 populations found that the Asp312Asn polymorphism was not associated with risk of lung cancer 68, and that the Lys751Gln Gln/Gln genotype yielded a pooled OR of 1.30 (95% CI: 1.13-1.49) with data from 15 study populations. This association was confined to Caucasians (OR=2.25; 95% CI: 0.97-5.23) and was not apparent in Asian populations (OR=1.02; 95% CI: 0.20-5.27) 68. However, the null result could be due to the low frequency of Gln/Gln among Asians (≤ 2% in 3 of the 4 Asian studies included in the meta-analysis) 68. More recent studies also showed no association between lung cancer risk and Asp312Asn polymorphisms in either Asians 13, 24, 37 or Caucasians 9, while one 36 of five 9, 13, 19, 24, 36 recent studies showed a significantly increased risk of lung cancer associated with the Lys/Gln genotype of Lys751Gln. The functional impact of the ERCC2 polymorphisms is yet to be clarified. A recent study showed that the variants of the Arg156Arg, Asp312Asn, and Lys751Gln polymorphisms were all associated with decreased mRNA expression 69; however, another study showed that the variants of Asp312Asn and Lys751Gln and the double variant (Asp312Asn/Lys751Gln) had no impact on nucleotide excision repair capacity or the basal transcription of ERCC2 70.
Ethnic differences in the associations between ERCC2 variants and lung cancer risk suggest either that those polymorphisms may be important only in certain ethnic groups or that the presence or absence of associations may result from different linkage patterns between the genotyped SNPs and the causal SNPs. The allele frequencies and linkage disequilibrium patterns of ERCC2 polymorphisms vary considerably among Europeans, Africans, and Asians 50. Thus, it is important to examine the association between ERCC2 haplotypes and the risk of lung cancer, as haplotype analysis may point to the important region(s) of the gene that warrant further examination. Furthermore, lung cancer risk may be attributable not to individual SNPs but to haplotypes, which may reflect the joint effect of multiple SNPs.
For Latinos, both the haplotype and principal components analyses of ERCC2 suggested that block 2 and block 3 may be important regions associated with the risk of lung cancer. The strongest association was for block 3, which spans the 5′ upstream region of ERCC2. Given the association observed in Latinos, further examination and sequencing of the 5′ upstream region of ERCC2 may be warranted, since it may contain important regulatory sequences and polymorphisms influencing the expression of ERCC2.
Among Latinos, interaction analyses showed that the association between lung cancer risk and ERCC2 haplotypes was confined to non-smokers. Similar findings have been reported by three other studies in other ethnic groups 9, 11, 38. A possible explanation is that the extensive damage due to the high dose of carcinogens among heavy smokers overwhelms the DNA repair capacity of ERCC2, and the “protective” advantages of certain genotypes or haplotypes are attenuated or obliterated under such conditions.
Too few studies have examined ERCC5 variants in relation to lung cancer risk for consistent results to have emerged. Among African Americans in this study, those with the His/His genotype of Asp1104His had a statistically significantly higher lung cancer risk. Although similar results were reported by the only other study among African Americans, those results were not statistically significant because of the small number of study subjects (71 cases and 71 controls) 7. Significantly higher lung cancer risk among His1104 carriers has also been observed among Caucasians, Mexican Americans, Asian Americans 7 and Koreans 14. In contrast, among Latinos, we observed a non-statistically significant lower risk of lung cancer for those with the His/His genotype. A lower risk of lung squamous cell carcinoma for His carriers was also suggested in a study among Japanese subjects 24. However, a study among Chinese found no association of His1104 genotype or two ERCC5 haplotype blocks with lung cancer risk 26. On the other hand, a study among Caucasians reported increased lung cancer risk with the rare haplotype (CCCGA) formed by rs732321, rs4150360, rs3759500, rs3818356, and rs4771436 19. Since we genotyped only one SNP in ERCC5, we were not able to perform haplotype analyses.
Among African Americans, our analysis suggested a possible interaction between ERCC5 Asp1104His and smoking on lung cancer risk, with ever smokers carrying the His/His genotype having the highest risk of lung cancer. Two studies reported a similar interaction between Asp1104His and smoking on the development of lung cancer 7, 14.
The functional impact of Asp1104His polymorphism is currently unknown though the resulting amino acid substitution may potentially affect the structural integrity of the protein. Future laboratory assessment is necessary to determine the functional impact of this polymorphism.
Among Latinos, none of the five LIG1 SNPs included in this study were significantly associated with lung cancer risk, although the number of rs20579 A alleles showed a borderline significant trend toward increasing risk (p=0.07). For African Americans, the rs20579 A allele was significantly associated with decreased lung cancer risk, while the rs439132 G allele was significantly associated with increased risk. A study among Eastern and Central Europeans showed that subjects heterozygous for rs20579 had an increased risk of young-onset lung cancer compared with those with the homozygous wildtype genotype 15. In addition, the same study reported that the variant G allele of rs3730931 was associated with an increased risk of early-onset lung cancer, which was not observed in our study. Neither our study nor the study by Michiels et al. 19 found any association between the rs20581 (Asp802Asp) or rs156641 variants and lung cancer risk.
Among Latinos, neither our haplotype nor our principal components analyses revealed any association between LIG1 variants and lung cancer risk. For African Americans, our haplotype and principal components analyses suggested that variation in rs3730931 and rs20579, or in regions linked to those two SNPs, may be associated with lung cancer risk. Similarly, the only other study of lung cancer risk and LIG1 haplotypes reported a statistically significant association 19, though different choices of SNPs and a study population with a different ethnic background make it difficult to compare the results of their haplotype analysis with ours.
Among Latinos, RAD23B Ala249Val variants were not significantly associated with lung cancer risk. We did not assess the Ala249Val polymorphism among African Americans since the minor allele frequency was low (4%). A study among Chinese reported an elevated lung cancer risk associated with having either Ala/Val or Val/Val genotypes 26. Another study also observed a higher frequency of the Val allele among lung cancer cases compared to controls (0.18 vs. 0.15) although not statistically significant 19.
A major limitation of this study is the relatively small sample size, which may have limited the statistical power to detect weak SNP-disease associations and increased the probability of spurious significant results. The sample size may also have been insufficient to detect gene-environment interactions; therefore, the results of the gene-smoking analysis should be viewed as exploratory. In addition, SNP coverage of the genes examined in this study is sparse, so the negative findings do not necessarily preclude their importance in the development of lung cancer. Further studies should incorporate greater coverage of variation in NER pathway genes. Nevertheless, this is one of the few studies examining the association between NER SNPs and lung cancer among Latinos and African Americans.
In conclusion, among Latinos, the current study showed that ERCC2 may be associated with risk of lung cancer especially among non-smokers, and that smoking together with ERCC2, ERCC5, and LIG1 may have a joint influence on the development of lung cancer. For African Americans, we found that ERCC5 and LIG1 were independently associated with lung cancer risk. Thus, our study and others have suggested that different elements of the pathway may be important in the different ethnic groups resulting either from different linkage patterns, genetic backgrounds, and/or exposure histories. These results need to be confirmed by future large-scale studies among Latinos and African Americans.
This work was supported by a grant from the National Institute of Environmental Health Sciences (R01 ES06717). Dr. Jeffrey S. Chang was also supported by the National Cancer Institute (R25 CA112355). We thank Dr. John Belmont of Baylor College of Medicine for the collection of Mayan DNA samples.