Base excision repair (BER) is the primary DNA damage repair mechanism for repairing small base lesions resulting from oxidation and alkylation damage. This study examines the association between 24 single-nucleotide polymorphisms (SNPs) belonging to five BER genes (XRCC1, APEX1, PARP1, MUTYH and OGG1) and lung cancer among Latinos (113 cases and 299 controls) and African-Americans (255 cases and 280 controls). The goal was to evaluate the differences in genetic contribution to lung cancer risk by ethnic groups. Analyses of individual SNPs and haplotypes were performed using unconditional logistic regressions adjusted for age, sex and genetic ancestry. Four SNPs among Latinos and one SNP among African-Americans were significantly (P<0.05) associated with either risk of all lung cancer or non-small cell lung cancer (NSCLC). However, only the association between XRCC1 Arg399Gln (rs25487) and NSCLC among Latinos (odds ratio associated with every copy of Gln = 1.52; 95% confidence interval: 1.01–2.28) had a false-positive report probability of <0.5. Arg399Gln is a SNP with some functional evidence and has been shown previously to be an important SNP associated with lung cancer, mostly for Asians. Since the analyses were adjusted for genetic ancestry, the observed association between Arg399Gln and NSCLC among Latinos is unlikely to be confounded by population stratification; however, this result needs to be confirmed by additional studies among the Latino population. This study suggests that there are genetic differences in the association between BER pathway and lung cancer between Latinos and African-Americans.
Base excision repair (BER) is the primary mechanism for repairing small base lesions in DNA resulting from oxidation and alkylation damage (1). The major genes involved in the BER pathway include MUTYH, OGG1, APEX1, XRCC1 and PARP1 (1). The role of BER in cancer has been evaluated extensively, and studies have shown that various genetic variants of BER genes may be associated with cancers, especially those caused by tobacco use. Although many studies have been published on the association between BER genes and lung cancer, only one included African-Americans (2), who have the highest lung cancer incidence in the USA (3), and only one included Latinos (4), who have the lowest lung cancer incidence in the USA (3). Although African-Americans generally smoke more (in both prevalence and amount) than Latinos, smoking pattern alone does not completely explain the difference in their incidences of lung cancer, suggesting a role for inherited variation (5,6). This study examines 24 single-nucleotide polymorphisms (SNPs) in five BER genes (XRCC1, APEX1, PARP1, MUTYH and OGG1) in the development of lung cancer among Latinos and African-Americans in the San Francisco Bay Area. The goal was to evaluate differences in the genetic contribution to the development of lung cancer between Latinos and African-Americans.
Primary lung cancers newly diagnosed among San Francisco Bay Area residents between September 1998 and March 2003 were identified through the Northern California Cancer Center’s rapid case ascertainment program. In addition, there was one hospital in our recruitment area that did not participate in Northern California Cancer Center’s rapid case ascertainment program. For that hospital, cases were identified by an independent effort similar to Northern California Cancer Center’s rapid ascertainment method. A letter was sent to subjects’ treating physicians inquiring whether subjects had any contraindications to participate in the study, and if the response was negative, subjects were sent a letter describing the purpose of the study and a postcard to return for refusal. Subjects who did not refuse participation were telephoned for a short interview to obtain information on ethnicity, prediagnostic smoking history, occupational history and dietary habits. Subjects who self-identified as Latinos or African-Americans were asked to participate in a more detailed in-person interview and to donate blood or buccal specimens.
Recruitment of control subjects has been described in detail previously (7). Briefly, control subjects were recruited from three sources: random-digit dialing, Health Care Financing Administration records and community-based recruitment (e.g. health fairs, churches and senior centers). Controls were frequency matched to cases on age, sex and race/ethnicity (Latino or African-American) with a control to case ratio of ~2 to 1. Participating control subjects were asked to complete in-person interviews and donate a blood and/or buccal specimen.
The study was approved by the Committee on Human Research of the University of California, San Francisco, and by the Institutional Review Boards of all collaborating institutions.
SNPs were selected from several sources and covered the 10 kb upstream and downstream regions of the genes to include SNPs associated with regulatory elements. Some SNPs were selected from available databases including HapMap (8) (rs1001581, rs17655484, rs2030404, rs2256507, rs25486, rs25496, rs2682562, rs2854510, rs3213255, rs3213403, rs7148997 and rs1041856) and SNP500 (9) (rs3213274, rs1760944 and rs3219149). Other SNPs were commonly studied SNPs identified in the literature [rs3547 (Gln632Gln), rs25487 (Arg399Gln), rs1799780, rs1799782 (Arg194Trp), rs25489 (Arg280His), rs2682585, rs1130409 (Asp148Glu), rs3219466 and rs1052133 (Ser326Cys)].
One hundred eighty-four ancestry informative markers were included to account for potential population stratification among the two admixed populations (Latinos and African-Americans) in this study. These 184 markers were assembled using DNA from Europeans (San Francisco Bay Area, n=47), West Africans (Bantu and Nilo Saharan speakers, Nigeria, n=46) and Amerindians (Mayans, Guatemala, n=46). The mean difference in allele frequencies between these three parental populations ranged from 0.43 to 0.49.
Genotyping was performed at the University of California Davis Genome Center on an Illumina BeadStation 500G Golden Gate custom panel using unamplified DNA extracted from blood. For six subjects with insufficient DNA from blood, genotyping was performed on whole-genome amplified blood or buccal DNA samples as described previously (10). Genotyping reproducibility was verified with duplicates and the reproducibility averaged 99.99% for 31 unamplified DNA duplicates (ranged from 99.86 to 100%), 99.39% for 18 whole-genome amplified blood DNA duplicates (98.93–99.60%) and 98.49% for 28 whole-genome amplified buccal DNA duplicates (96.11–99.73%).
All Latino (n=113) and African-American (n=255) cases and all available Latino controls (n=299) were genotyped. A random sample of African-American controls (n=280) was selected for genotyping due to budget constraints. The random sample of African-American controls maintained the equal distribution of the matching variables (age and sex) between cases and controls (Table I). In addition to matching for age and sex, adjustment for confounding was done in the analytical stage by including these two variables in the statistical models.
Although there are known epidemiologic and biological differences between small cell lung cancer and non-small cell lung cancer (NSCLC), BER of DNA damage is ubiquitous across many cell types, so all lung cancer types were initially combined for analysis to maximize sample size for statistical power. In addition, separate analyses were performed for NSCLC. No separate analyses were performed with small cell lung cancer or the finer NSCLC subtypes (squamous cell carcinoma, adenocarcinoma or large cell) because the smaller number of cases for small cell lung cancer or each NSCLC subtype lacked statistical power. Separate analyses were conducted for the two ethnic groups, Latino and African-American. SNPs that had a minor allele frequency <5% or were not in Hardy–Weinberg equilibrium (false discovery rate adjusted P-value<0.05) among control subjects were excluded from further analysis. All SNP– or haplotype–lung cancer association analyses were adjusted for genetic ancestry (percent European and Amerindian ancestry), which was estimated using 184 ancestry informative markers and a maximum likelihood-based program written in R specifically developed for this project based on the methods described by Chakraborty et al. (11) and Hanis et al. (12). In addition, all analytical models were adjusted for the two matching variables, age and sex. Although smoking is an established risk factor for lung cancer, since smoking does not directly determine one’s genotype, it does not satisfy the definition of a confounder. Smoking may be indirectly associated with BER genes through race/ethnicity but we accounted for this by stratifying on self-reported race/ethnicity and adjusting for genetic ancestry. Sensitivity analysis adjusting for smoking did not change our results (data not shown).
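The SNP quality filter described above (minor allele frequency and Hardy–Weinberg equilibrium among controls) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a one-degree-of-freedom chi-square test for Hardy–Weinberg proportions and the Benjamini–Hochberg procedure for the false discovery rate adjustment, neither of which is specified in the text. The SNP names and genotype counts are hypothetical.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the less common allele, from genotype counts."""
    freq_a = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(freq_a, 1.0 - freq_a)

def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def filter_snps(control_counts, maf_cutoff=0.05, fdr_cutoff=0.05):
    """Keep SNPs with MAF >= cutoff and FDR-adjusted HWE p-value >= cutoff."""
    names = list(control_counts)
    raw = [hwe_chi_square_p(*control_counts[name]) for name in names]
    adjusted = benjamini_hochberg(raw)
    return [
        name
        for name, q_value in zip(names, adjusted)
        if minor_allele_frequency(*control_counts[name]) >= maf_cutoff
        and q_value >= fdr_cutoff
    ]

# Hypothetical control genotype counts: (common homozygote, heterozygote, rare homozygote)
example_counts = {
    "snp_a": (81, 18, 1),   # in HWE, MAF = 0.10
    "snp_b": (98, 2, 0),    # MAF = 0.01, too rare
    "snp_c": (50, 0, 50),   # no heterozygotes, strongly out of HWE
}
kept = filter_snps(example_counts)
```

With these made-up counts, only the first SNP passes both filters.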
Individual SNPs were analyzed without assuming a mode of inheritance by including two indicator terms, one for the heterozygous and one for the homozygous variant genotype, in the unconditional logistic regression model. In addition, a test for trend was performed to assess the odds ratio (OR) associated with a one-copy increment of the variant allele.
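The two genotype codings described above can be illustrated with a short sketch. The function names and the genotype counts are hypothetical, and the odds ratios computed here are crude (unadjusted) 2x2 estimates, whereas the study's estimates come from logistic regression adjusted for age, sex and genetic ancestry.

```python
def codominant_terms(variant_copies):
    """Two indicator terms: (heterozygous, homozygous variant)."""
    if variant_copies not in (0, 1, 2):
        raise ValueError("variant_copies must be 0, 1 or 2")
    return (1 if variant_copies == 1 else 0, 1 if variant_copies == 2 else 0)

def trend_term(variant_copies):
    """Single term for the trend test: number of variant alleles carried."""
    if variant_copies not in (0, 1, 2):
        raise ValueError("variant_copies must be 0, 1 or 2")
    return variant_copies

def crude_odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Unadjusted odds ratio for a genotype versus the reference genotype."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)

# Hypothetical genotype counts keyed by number of variant alleles.
cases = {0: 50, 1: 40, 2: 10}
controls = {0: 100, 1: 80, 2: 10}

or_heterozygous = crude_odds_ratio(cases[1], controls[1], cases[0], controls[0])
or_homozygous = crude_odds_ratio(cases[2], controls[2], cases[0], controls[0])
```

The codominant coding lets the heterozygous and homozygous odds ratios differ freely, while the trend coding constrains each additional variant allele to multiply the odds by the same factor.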
Haplotype analyses were performed for the two genes (XRCC1 and APEX1) with more than one SNP genotyped. First, the haplotype blocks for each gene were defined by the method proposed by Gabriel et al. (13) using Haploview software (14). Haplotypes for each block were then estimated by the expectation–maximization algorithm using the SAS macro HAPPY written by Kraft and Chen (http://www.hsph.harvard.edu/faculty/kraft/soft.htm), which includes the SAS PROC HAPLOTYPE with the ‘stepem’ option based on the haplotype estimation SNPHAP program (15). Haplotype trend regressions were performed to estimate the OR associated with a one-copy increment of a specific haplotype compared with the most common haplotype (reference group) (16,17). Haplotypes with frequency <5% were combined into one group for analysis. The probabilities of having different haplotype combinations were incorporated as weights in the regression model to account for the uncertain phases of haplotypes. Global tests for the association between haplotypes of a haplotype block and lung cancer were performed by comparing the full model with the haplotype variables to the submodel without the haplotype variables using the log-likelihood ratio test.
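The global test described above compares two nested models by their log-likelihoods. A minimal sketch of that comparison follows, assuming the two models have already been fitted elsewhere; the log-likelihood values and the number of haplotype terms are hypothetical. The chi-square tail probability is computed with the standard library only, using the recurrence for the regularized upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail = math.erfc(math.sqrt(half))  # closed form for df = 1
        k = 1
    else:
        tail = math.exp(-half)             # closed form for df = 2
        k = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def likelihood_ratio_test(loglik_full, loglik_reduced, extra_params):
    """Return (statistic, p-value) for nested models differing by extra_params terms."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical fitted log-likelihoods: the full model adds 2 haplotype terms.
statistic, p_value = likelihood_ratio_test(-250.0, -253.0, extra_params=2)
```

The same comparison applies to any pair of nested logistic models, with `extra_params` set to the number of terms dropped from the full model.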
In addition to haplotype analysis, principal components analysis (PCA) was performed with XRCC1 and APEX1. The PCA method captures the linkage disequilibrium pattern within a gene but does not require one to estimate haplotypes with unknown phase and is more powerful than the haplotype-based approach (18). First, principal components that capture the correlation structure between SNPs within a gene were generated. Then, the principal components that explained 80% or more of the variance were modeled for their association with lung cancer status by unconditional logistic regression. The 80% cut off was shown to have sufficient statistical power (18). The results from the PCA were also used to guide us in combining haplotypes for analyses.
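The component selection step (retaining the principal components that together explain 80% or more of the variance) can be sketched as below. It assumes the eigenvalues of the SNP correlation matrix have already been computed; the eigenvalues shown are hypothetical, and the function name is introduced here for illustration only.

```python
def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of
    variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues for a gene with four genotyped SNPs.
n_components = components_for_variance([6.0, 2.5, 1.0, 0.5])
```

Here the first component explains 60% of the variance and the first two explain 85%, so two components would be carried forward into the logistic regression.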
False-positive report probabilities (FPRPs) for the significant results (P<0.05) were calculated to account for potential false positives. FPRP is defined as the probability of no true association between a genetic variant and disease given the statistically significant finding; the magnitude of FPRP depends on the prior probability of the association between a genetic variant and the disease and the statistical power of the test (19). SNPs with some biological evidence for functional impact were assigned prior probabilities of 0.10–0.25; other SNPs were assigned prior probabilities of 0.001–0.01. A significant finding with a FPRP <0.50 warrants replication by future studies. As other investigators may have different views on assigning prior probabilities, FPRPs were presented with prior probabilities ranging from 0.001 to 0.25.
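The dependence of FPRP on the prior probability and statistical power can be made concrete with a short sketch. It uses the standard form FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P-value, 1 − β is the power and π is the prior probability. The power value below is a hypothetical input, not a figure from this study, and the helper that recovers a P-value from a reported OR and 95% confidence interval assumes a normal approximation on the log-odds scale.

```python
import math

def two_sided_p_from_ci(odds_ratio, ci_lower, ci_upper):
    """Approximate two-sided P-value from an OR and its 95% confidence interval."""
    standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)
    z = abs(math.log(odds_ratio)) / standard_error
    # Two-sided normal tail probability: erfc(|z| / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))

def fprp(p_value, power, prior):
    """False-positive report probability for a significant finding."""
    false_positive_mass = p_value * (1.0 - prior)
    true_positive_mass = power * prior
    return false_positive_mass / (false_positive_mass + true_positive_mass)

# P-value implied by the per-allele estimate reported for Arg399Gln.
p_arg399gln = two_sided_p_from_ci(1.52, 1.01, 2.28)

# FPRP across a range of priors, with a hypothetical power of 0.5.
assumed_power = 0.5
fprp_by_prior = {
    prior: fprp(p_arg399gln, assumed_power, prior)
    for prior in (0.25, 0.1, 0.01, 0.001)
}
```

For a fixed P-value and power, FPRP rises steeply as the prior falls, which is why SNPs without functional evidence need much stronger statistical evidence to reach an FPRP below 0.5.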
Gene–smoking interactions were assessed using analyses stratified by smoking status with individual SNPs or haplotypes with a main effect P≤0.10. Tests for interaction were performed by including product terms between smoking status and SNPs or haplotypes in the unconditional logistic regression model. P-value for interaction was obtained by the log-likelihood ratio test comparing the full model with product terms with the submodel without the product terms.
Among Latinos, cases had higher household income and a higher percentage of ever smokers compared with the controls (Table I). Among Latinos who ever smoked, cases smoked more pack-years compared with the controls. Latino cases had a higher percentage of European and a lower percentage of Amerindian genetic ancestry. Among African-Americans, controls had more years of education and a lower percentage of ever smokers compared with the cases. In addition, among African-American smokers, cases smoked more pack-years compared with controls. The distribution of genetic ancestry estimates was similar among African-American cases and controls.
SNPs included in the analyses are presented in supplementary Table 1 (available at Carcinogenesis Online). All SNPs were in Hardy–Weinberg equilibrium after accounting for multiple testing (false discovery rate>0.05). Three SNPs (rs25496, rs3213274 and rs3219466) among Latinos and four SNPs (rs25489, rs1799780, rs3219149 and rs3219466) among African-Americans were excluded because the minor allele frequencies were <5%.
Two SNPs were significantly associated with risk of all lung cancer (rs2256507 and rs25486) among Latinos (Table II). rs2256507 is located within the 10 kb region downstream from the 3′ end of XRCC1 and is part of the ZNF575 gene. rs25486 is located in the ninth intron of XRCC1 and is in high linkage disequilibrium (r²=0.88) with a non-synonymous SNP, rs25487 (Arg399Gln).
Among African-Americans, the only significant lung cancer association was found with rs17655484. This SNP is located in the 10 kb region upstream from the 5′ end of XRCC1 and belongs to LOC390940.
Four SNPs were significantly associated with risk of NSCLC among Latinos (Table III). Two (rs2256507 and rs25486) of the four SNPs were also significantly associated with risk of all lung cancer, but these associations became stronger with NSCLC. The other two (rs25487 and rs1001581) of the four SNPs were only significantly associated with NSCLC. rs25487 (Arg399Gln) is a non-synonymous SNP located in exon 10 of XRCC1. rs1001581 is located in the second intron of XRCC1.
Among African-Americans, the only significant SNP (rs17655484) associated with all lung cancer was also the only SNP associated with risk of NSCLC; however, the association was stronger for NSCLC than for all lung cancer.
Twelve of the 15 XRCC1 SNPs form a single haplotype block (supplementary Figure 1 is available at Carcinogenesis Online). Global tests for the XRCC1 haplotype association were not significant (P>0.05) for either all lung cancer or NSCLC; however, haplotypes 3 and 5 were associated with a statistically significant reduced risk of NSCLC compared with haplotype 1 (Table IV).
PCA showed that principal component 1 was associated with a significantly increased risk of NSCLC (supplementary Table 2 is available at Carcinogenesis Online), and the SNPs that most strongly correlated with principal component 1 were rs2256507, rs25487, rs25486 and rs1001581, which were the four significant SNPs in the single-SNP analyses.
Guided by the results of PCA, we observed that haplotypes 2–6 shared the same alleles at rs25487, rs25486 and rs1001581 and all had a lower risk (OR<1.0) compared with haplotype 1. Therefore, we combined haplotypes 2–6 into one group, which was associated with a significantly reduced risk of NSCLC compared with haplotype 1 (OR: 0.63; 95% confidence interval: 0.42–0.96).
Compared with the single-SNP analyses, the multilocus analysis of XRCC1 for Latinos did not contribute any additional information; therefore, further analyses were conducted with the four statistically significant SNPs.
Fifteen SNPs of XRCC1 formed a single haplotype block for African-Americans (supplementary Figure 2 is available at Carcinogenesis Online). Global tests for haplotype association were not statistically significant for either all lung cancer or NSCLC; however, haplotype 5 was associated with a significantly reduced risk of all lung cancer or NSCLC compared with haplotype 1 (Table V). Haplotype 5 was the only major haplotype (frequency>5%) carrying the A allele of rs17655484, which contributed to its reduced lung cancer risk as seen in the single-SNP analyses. Similar to the haplotype analyses, no overall association was observed by PCA (supplementary Table 3 is available at Carcinogenesis Online).
Among Latinos, two (rs7148997 and rs1041856) of the four APEX1 SNPs formed a haplotype block (supplementary Figure 3 is available at Carcinogenesis Online). No significant associations were observed with either the haplotype analyses (Table IV) or PCA (supplementary Table 4 is available at Carcinogenesis Online).
Among African-Americans, two (rs7148997 and rs1041856) of the four APEX1 SNPs formed a haplotype block (supplementary Figure 4 is available at Carcinogenesis Online). A significantly reduced risk was observed with the rare haplotype group compared with haplotype 1; however, due to their rarity, these haplotypes are unlikely to contribute significantly to the risk of lung cancer. Furthermore, PCA did not show any significant association (supplementary Table 5 is available at Carcinogenesis Online).
FPRPs were calculated for three significant SNPs (rs2256507, rs25487 and rs1001581) (Table VI). rs25487 (Arg399Gln) of XRCC1 is a non-synonymous SNP with some functional evidence (20–23) and was therefore assigned higher prior probabilities of 0.1–0.25. The FPRPs for the Gln/Gln genotype were 0.68 and 0.42 for the prior probabilities of 0.1 and 0.25, respectively. The FPRPs associated with each copy of Gln allele were 0.45 and 0.21 for the prior probabilities of 0.1 and 0.25, respectively. Since rs2256507 and rs1001581 had little functional information, they were assigned prior probabilities of 0.001–0.01, and their FPRPs were all >0.5. FPRP was not calculated for rs25486 since it is in high linkage disequilibrium with rs25487 (r2=0.88).
rs17655484 was assigned prior probabilities of 0.001–0.01 due to the lack of functional information, and the FPRPs were all >0.5 (Table VI).
Gene–smoking interactions were evaluated for four SNPs (rs2256507, rs25487, rs25486 and rs1001581) with significant main effect using dominant, recessive and log-additive models. No significant SNP–smoking interaction was observed (supplementary Table 6 is available at Carcinogenesis Online).
We were unable to assess for gene–smoking interaction for the only significant SNP (rs17655484) since no non-smoking lung cancer cases carried at least one variant allele.
In the current analysis, four SNPs for Latinos and one SNP for African-Americans in XRCC1 were found to be significantly associated with lung cancer risk. Among these SNPs, the association between Arg399Gln (rs25487) and NSCLC among Latinos has the smallest probability of being false positive with FPRPs <0.5. We found an increased risk of NSCLC for those who carried the Gln/Gln genotype compared with those with the Arg/Arg genotype among Latinos, but not among African-Americans. The only other study with Latino subjects did not observe a significant association between Arg399Gln and adenocarcinoma of the lung; however, the number of Latino cases was small in that study (n<50) (4). The lack of an association between lung cancer risk and Arg399Gln among African-Americans is probably due to the low prevalence of the Gln/Gln genotype in this population (2.1% among controls and 1.6% among all lung cancer cases). A previous study among African-Americans also did not observe a significant association between Arg399Gln and lung cancer risk, and the prevalence of Gln/Gln carriers was also low (3.7% among controls and 1.9% among cases) (2). Two SNP databases, SNP500Cancer (9) and NCBI Entrez SNP (24), also report a low prevalence of the Arg399Gln Gln/Gln genotype among African-Americans ranging from 1.4 to 5.4%, suggesting that the contribution of the Arg399Gln polymorphism to lung cancer risk may be low for African-Americans. In contrast, the prevalence of the Arg399Gln Gln/Gln genotype was higher among Hispanics (4.3–7.1%), Asian/Pacific Rim populations (4.7–29.2%) and Caucasians (8.5–19.4%) as reported by SNP500Cancer and NCBI Entrez SNP. A meta-analysis by Kiyohara et al. (25) showed that the Gln/Gln genotype was significantly associated with an increased risk of lung cancer among Asians but not among Caucasians. Since that meta-analysis, one study among Caucasians reported an increased risk of lung cancer associated with the Gln variant only among light smokers (26).
In contrast, another study among Caucasians reported a decreased lung cancer risk among non-smoking women who carried at least one copy of Gln (27). A third study among Caucasians did not observe any significant association between lung cancer and Arg399Gln (28). One (29) of the two studies (29,30) among Chinese published since the meta-analysis found an increased risk of lung cancer associated with the Gln/Gln genotype. In contrast, a study among an Indian population reported an inverse association between lung cancer and the Gln allele (31). These inconsistencies between racial groups may be due to the difference in environmental exposures (e.g. prevalence and the dose of tobacco smoking). Alternatively, Arg399Gln may be a proxy SNP linked to a true causal SNP, and the linkage patterns may differ across ancestral groups.
Although the association between Arg399Gln and lung cancer requires further confirmation, several studies suggest that this polymorphism may have functional impact. A lower DNA repair capacity has been observed among styrene-exposed workers who carried the Gln/Gln genotype (21,22). Another study showed that among never-smokers, Gln/Gln carriers had a significantly higher level of DNA damage, as measured by DNA adducts, compared with those with either the Arg/Arg or Arg/Gln genotype. Finally, a study reported that among benzene-exposed laboratory workers, those with at least one Gln allele had a lower DNA repair capacity (23).
The current study observed a significant association between lung cancer and one SNP (rs2256507) in the 3′ downstream region of XRCC1 for Latinos and one SNP (rs17655484) in the 5′ upstream region of XRCC1 for African-Americans, suggesting that these regions may warrant further investigation. Specifically, these regions may contain regulatory elements that influence the expression of XRCC1 protein. However, high FPRPs associated with these SNPs suggest that these findings may be due to chance.
A meta-analysis of OGG1 Ser326Cys (rs1052133) using data from seven studies showed that those who carried the Cys/Cys genotype had a small increase in the risk of lung cancer (OR=1.24; 95% confidence interval: 1.01–1.53) (32). Subsequently, another meta-analysis of OGG1 Ser326Cys using 11 studies (including seven from the previous meta-analysis) showed a small but non-significantly increased risk of lung cancer (OR=1.17; 95% confidence interval: 0.88–1.56) (25). This suggests that the impact of Ser326Cys, if real, is likely to be small. In our study, the ORs of NSCLC for those with the Cys/Cys genotype were 1.43 and 1.19 for Latinos and African-Americans, respectively, which were similar in magnitude to results from the meta-analyses, though not statistically significant. Our sample size was too small to detect a significant OR <1.5.
Asp148Glu is a commonly studied SNP for APEX1. Though the Asp148Glu was shown by one study to have functional impact on sensitivity to ionizing radiation (33), no clear association between lung cancer and APEX1 Asp148Glu has been found according to a meta-analysis (25). Since that meta-analysis, one additional study reported no association between lung cancer and Asp148Glu (27), whereas another study reported an increased lung cancer risk for those who carried at least one copy of Glu (26). Consistent with most of the previous studies, we also did not observe any significant association between APEX1 Asp148Glu and lung cancer.
A major limitation of this study is the lack of statistical power to detect weak gene–disease associations (OR≤1.5) due to the small sample size, which may also increase the chance for spurious findings. In addition, SNP coverage is not complete in genes examined by this study such that the null findings do not necessarily suggest that these genes are not important for the development of lung cancer. Nevertheless, this is only the second study for both Latinos and African-Americans to assess the association between BER SNPs and lung cancer and the only study of this type to have adjusted for genetic ancestry to account for potential population stratification.
In summary, this study observed a significantly increased risk of NSCLC associated with the Gln/Gln genotype of Arg399Gln of XRCC1 among Latinos but not among African-Americans, suggesting that there may be genetic differences in the association between the BER pathway and lung cancer between Latinos and African-Americans. Since the analyses were adjusted for genetic ancestry, the observed association between Arg399Gln and NSCLC among Latinos is less likely to have been confounded by population stratification. However, since this is the first study to report this association among Latinos, more studies need to be conducted among Latinos to confirm this result.
National Institute of Environmental Health Sciences (R01 ES06717); National Cancer Institute (R25 CA112355 to J.S.C.).
We thank Dr John Belmont of Baylor College of Medicine and Dr Gabriel Silva of Obras Sociales Santo Hermano Pedro, Antigua, Guatemala, for the collection of Mayan DNA samples and Dr Rick Kittles of Pritzker School of Medicine, University of Chicago for providing the African DNA samples. We thank the University of California Davis Genome Center for performing Illumina genotyping. We thank Drs Anna M. Napoles-Springer and Eliseo J. Perez-Stable of the Department of Medicine, University of California, San Francisco, for their assistance with study design and subject recruitment.
Conflict of Interest Statement: None declared.