DNA repair genes are important for maintaining genomic stability and limiting carcinogenesis. We analyzed all single nucleotide polymorphisms (SNPs) of 125 DNA repair genes covered by the Illumina HumanHap300 (v1.1) BeadChips in a previously conducted genome-wide association study (GWAS) of 1,154 lung cancer cases and 1,137 controls and replicated the top-hit XRCC4 SNPs in an independent set of 597 cases and 611 controls in Texas populations. We found that six of 20 XRCC4 SNPs were associated with a decreased risk of lung cancer with a P value of 0.01 or lower in the discovery dataset, of which the most significant SNP was rs10040363 (P for allelic test = 4.89 × 10−4). Moreover, the data in this region allowed us to impute a potentially functional SNP, rs2075685 (imputed P for allelic test = 1.3 × 10−3). A luciferase reporter assay demonstrated that the rs2075685G>T change in the XRCC4 promoter increased expression of the gene. In the replication study of rs10040363, rs1478486, rs9293329, and rs2075685, however, only rs10040363 achieved a borderline association with a decreased risk of lung cancer in a dominant model (adjusted OR = 0.80, 95% CI = 0.62–1.03, P = 0.079). In the final combined analysis of both the Texas GWAS discovery and replication datasets, the strength of the association was increased for rs10040363 (adjusted OR = 0.77, 95% CI = 0.66–0.89, Pdominant = 5 × 10−4 and P for trend = 5 × 10−4) and rs1478486 (adjusted OR = 0.82, 95% CI = 0.71–0.94, Pdominant = 6 × 10−3 and P for trend = 3.5 × 10−3). Finally, we conducted a meta-analysis of these XRCC4 SNPs with available data from published GWA studies of lung cancer with a total of 12,312 cases and 47,921 controls, in which none of these XRCC4 SNPs was associated with lung cancer risk. It appeared that rs2075685, although associated with increased expression of a reporter gene and lung cancer risk in the Texas populations, did not have an effect on lung cancer risk in other populations.
This study underscores the importance of replication using published data in larger populations.
Lung cancer remains the leading cause of cancer-related deaths in both men and women, with an estimated 157,300 deaths and 222,520 new cases in the United States in 2010, accounting for about 28% of all cancer deaths and 15% of all new cancer cases. Although the major risk factor for lung cancer is cigarette smoking, exposure to ionizing radiation, such as radon and medical imaging, is also a recognized risk factor [2–5]. Environmental carcinogenic agents, however, cause lung cancer only in a minority of exposed individuals, suggesting that inherited susceptibility might contribute to the variation in lung cancer risk [6, 7]. Numerous studies [8–12] have shown that individuals with a family history of lung cancer have an increased risk, which further supports an etiological role of genetic factors in lung cancer risk.
Cellular DNA integrity is constantly threatened by various assaults. DNA damage is caused by both endogenous metabolites, such as reactive oxygen and nitrogen species and lipid peroxidation products, and environmental carcinogens, such as those found in tobacco smoke. DNA damage, if left unrepaired or repaired incorrectly, may result in genetic instability and mutation fixation, subsequently leading to cancer development [13, 14]. Therefore, DNA damage has emerged as a major culprit in cancer. In humans, at least four major repair pathways have evolved to repair most DNA lesions according to their chemical and physical properties. The nucleotide-excision repair (NER) pathway mainly repairs bulky lesions, whereas the base-excision repair (BER) pathway recognizes and removes incorrect and damaged bases. The mismatch repair (MMR) pathway is responsible for correcting replication errors, whereas DNA double-strand break (DSB) repair involves two major pathways, non-homologous end joining (NHEJ) and homologous recombination (HR). DSBs are the most toxic and mutagenic DNA lesions in human cells, because a single DSB can potentially lead to loss of more than 100 million base pairs of genetic information.
Genetic variants in DNA repair genes have been investigated in many association studies of cancer based on either a candidate-gene or pathway approach, with inconsistent results and failure to replicate in later studies, particularly in lung cancer [18–20]. Recently, the genome-wide association study (GWAS) has emerged as a powerful agnostic approach for identifying novel susceptibility loci involved in human diseases. Several recent GWA studies have identified some loci associated with lung cancer, including CHRNA3/5 at chromosome 15q25.1, TERT and CLPTM1L at 5p15.33, BAT3-MSH5 at 6p21.33, a common variant rs1051730 in the nicotinic acetylcholine receptor gene cluster on chromosome 15q24, and HLADQA1 at 6p21.31 [22–28]. However, most of the previously studied candidate genes, including DNA repair genes, were not among the top-hit loci in these GWAS datasets. It is possible that the contribution of each of a large number of genetic variants to lung cancer susceptibility is too weak to be detected in GWA studies.
In the present study, we analyzed all 1,806 SNPs in 125 DNA repair genes covered by the Illumina HumanHap300 (v1.1) BeadChip in 1,154 lung cancer cases and 1,137 controls in a Texas population. Although none of the SNPs achieved genome-wide significance (i.e., a P value < 10−7) for an association with lung cancer risk, 32 SNPs had a P value of < 10−2, of which 6 SNPs (rs10040363, rs4591730, rs1017794, rs1011981, rs9293329 and rs1478486) were located in the XRCC4 gene region (Fig. 1 and Table 1), suggesting that these XRCC4 variants may be associated with lung cancer susceptibility.
To further test the significance of the XRCC4 loci associated with lung cancer susceptibility, we took the following steps to validate and replicate the finding: 1) we used genotype imputation to infer untyped XRCC4 SNPs, thereby increasing the chance of capturing putative, untyped causal and functional SNPs; 2) we performed additional experiments to identify the functional relevance of the observed significant SNPs; 3) we performed a replication study of the observed significant SNPs in an independent set of 597 cases and 611 controls from a Texas population similar to that used in the discovery phase; and 4) we conducted a meta-analysis of the Texas GWA and replication studies with four additional GWA studies, for a total of 12,312 primary lung cancer cases and 47,921 controls.
The study protocols were approved by the Institutional Review Board of the University of Texas M. D. Anderson Cancer Center. Informed consent was obtained from all participants.
The discovery phase included a subset of Texas populations in our ongoing lung cancer case-control studies at The University of Texas M. D. Anderson Cancer Center. The ascertainment and matching criteria of cases and controls and the methods of recruitment, genotyping, and quality control have been described in detail elsewhere [23, 29]. Briefly, the GWAS dataset included 1,154 lung cancer cases and 1,137 controls. Cases were patients with newly diagnosed and histopathologically confirmed lung cancer, and controls were healthy individuals seen for routine care at a multispecialty physician practice in Houston, frequency-matched to the cases by age (in 5-year categories), sex and smoking status. The demographic characteristics of this study population are shown in Table 2.
The replication study used an independent sample drawn from the same case-control population as the GWAS discovery phase, including 602 cases with histopathologically confirmed lung cancer and 618 healthy controls selected with the same matching criteria.
We conducted a meta-analysis of our GWA study and replication study together with genotyping data from four additional GWA studies. These four additional GWA studies were from the National Cancer Institute (NCI), the UK, the International Agency for Research on Cancer (IARC) and deCODE Genetics. We communicated with the corresponding or principal investigators of these studies to request data on genotype frequencies in cancer cases and non-cancer controls for the XRCC4 variants. Details of case and control ascertainment and matching criteria as well as the genotyping in each of these studies have been published previously [24, 25, 27, 28]. Briefly, the NCI study, which included 5,739 cases and 5,848 controls, was based on one population-based case-control study in Italy, a cohort study in Finland and two cohort studies in all U.S. states; the UK study included 1,978 cases from the Genetic Lung Cancer Predisposition Study (GELCAPS) and 1,438 controls from the 1958 Birth Cohort; the IARC study, which included 1,968 cases and 2,598 controls, was based on a lung cancer case-control study conducted in 6 central European countries (Czech Republic, Hungary, Poland, Romania, Russia and Slovakia); and the 876 cases and 36,272 controls in the deCODE Genetics study were drawn from one population-based case-control study from deCODE Genetics in Iceland. Together, the meta-analysis included a total of 12,312 primary lung cancer cases and 47,921 controls, all of European descent. Genotyping of all subjects from the IARC and deCODE Genetics GWA studies was conducted using the HumanHap300K, while that from the UK study was conducted using the HumanHap550K. Three Illumina platforms, i.e., the HumanHap550K, the HumanHap610 and the HumanHap 1 Million Chips, were used in the NCI GWAS.
The discovery dataset had data on 317,498 tagging SNPs, of which 20 SNPs in XRCC4 were covered and screened for replication, which was performed using the Applied Biosystems TaqMan genotyping platform according to the manufacturer’s recommendations. Briefly, the reactions were prepared using TaqMan Universal Master Mix, 80× SNP Genotyping Assay Mix, DNase-free water, and 10 ng of genomic DNA in a final volume of 5 μL per reaction. Both negative and positive controls were included in each plate to ensure the accuracy of the genotyping. The PCR amplification was run, and the plate was read using a TaqMan 7900HT sequence detection system (Applied Biosystems). The analyzed fluorescence results were then auto-called into genotypes using the built-in SDS 2.3 software of the system.
To infer potentially functional SNPs tagged by the 20 genotyped XRCC4 SNPs covered by the HumanHap300 (v1.1) BeadChips, we imputed genotypes using these 20 SNPs within a 307-kb region on chromosome 5q14.2 from 82,407,760 bp to 82,715,135 bp, covering the whole region of the XRCC4 gene. The imputation was conducted using a Hidden Markov Model implemented in MACH v1.0 (http://www.sph.umich.edu/csg/abecasis/MACH/). The imputation method combined the observed GWAS genotype data with the HapMap CEU genotype data as a reference panel (Release 24/Phase II Apr07, on NCBI B36 assembly, dbSNP b126, http://www.hapmap.org/cgi-perl/gbrowse/hapmap24_B36/) and then inferred the untyped genotypes probabilistically. A minor allele frequency (MAF) > 0.01 and an estimated r2 > 0.3 were chosen as thresholds to flag and discard low-quality imputed SNPs without removing many successfully imputed markers [31, 32]. We found that only one imputed SNP, rs2075685, was located in the promoter region (−651G>T) of XRCC4 and was therefore likely to affect its function. We therefore selected this SNP for additional functional studies.
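The quality-control thresholds described above can be illustrated with a minimal sketch. It assumes each imputed SNP has already been summarized by its MAF and its estimated imputation r2; the `ImputedSnp` record, the `passes_qc` helper and the three example SNPs are hypothetical and are not taken from MACH output or from the study data.

```python
from dataclasses import dataclass


@dataclass
class ImputedSnp:
    """Hypothetical summary record for one imputed SNP."""
    rsid: str
    maf: float  # minor allele frequency
    rsq: float  # estimated imputation r2


def passes_qc(snp: ImputedSnp, min_maf: float = 0.01, min_rsq: float = 0.3) -> bool:
    """Keep a SNP only if MAF > 0.01 and estimated r2 > 0.3 (thresholds from the text)."""
    return snp.maf > min_maf and snp.rsq > min_rsq


# Illustrative values only, not study data.
snps = [
    ImputedSnp("snp_a", maf=0.25, rsq=0.95),   # passes both thresholds
    ImputedSnp("snp_b", maf=0.005, rsq=0.90),  # fails on MAF
    ImputedSnp("snp_c", maf=0.12, rsq=0.20),   # fails on r2
]
kept = [s.rsid for s in snps if passes_qc(s)]
```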
To explore a potential function of the imputed SNP rs2075685 located in the promoter region of XRCC4, we further performed a luciferase reporter assay. The human colon cancer cell line HCT116 was obtained from Johns Hopkins University School of Medicine (a gift from Dr. Bert Vogelstein). The human non-small cell lung carcinoma cell line H1299 and the human cervical cancer cell line HeLa were obtained from the American Type Culture Collection (ATCC, Manassas, VA). HCT116 and HeLa cells were cultured in 1× Dulbecco’s modified Eagle’s medium (DMEM) containing 10% fetal bovine serum (Sigma-Aldrich, MO), and H1299 cells were cultured in RPMI 1640 medium with 10% fetal bovine serum at 37°C in 5% CO2.
The 1,001-bp XRCC4 promoter (from −901 to +100 bp relative to the translation start site) was amplified by PCR with the primers 5'-AAGGTACCCCAGGTGGTAAATTCGTCCA-3' (forward) and 5'-AAGCTAGCATCTAAATCCCGCCTTTTCC-3' (reverse), which include the KpnI and NheI restriction sites. The G and T allele versions of the promoter were obtained by using DNA templates from subjects homozygous (GG and TT) for the −651G>T (rs2075685) variant. The PCR products were cloned into the pGL3-Basic firefly luciferase vector (Promega, Madison, WI) at the KpnI and NheI restriction sites. The −651G and −651T constructs were sequenced to confirm the orientation and integrity of each insert (Fig. 2A).
The cultured cells were transiently transfected with 1 μg of each of the XRCC4 reporter plasmids using FuGENE HD (Roche Applied Science, IN) in 12-well culture plates. The Renilla luciferase vector pRL-TK (Promega) was co-transfected as an internal control. The pGL3-Basic vector without the insert was used as a negative assay control. The cells were collected 48 hours after transfection and analyzed for luciferase activity with a Dual-Luciferase Reporter Assay System (Promega). The relative luciferase activity was calculated according to the manufacturer’s instructions using a luminometer (TD-20/20 DLReady, Promega). Promoter activity was calculated for each of the constructs as a ratio of the luciferase activity to that of the pGL3-Basic vector. All transfections were performed in triplicate.
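As a hedged illustration of the calculation just described, the sketch below normalizes each firefly reading to its co-transfected Renilla reading, averages the triplicates, and expresses the result as a ratio to the empty pGL3-Basic vector. The function names and all luminometer readings are hypothetical; the exact arithmetic used by the authors is not given beyond the description above.

```python
from statistics import mean


def normalized_readings(firefly, renilla):
    """Divide each firefly reading by its paired Renilla (internal control) reading."""
    return [f / r for f, r in zip(firefly, renilla)]


def promoter_activity(construct_ff, construct_rl, basic_ff, basic_rl):
    """Mean normalized activity of a construct relative to the empty pGL3-Basic vector."""
    construct = mean(normalized_readings(construct_ff, construct_rl))
    basic = mean(normalized_readings(basic_ff, basic_rl))
    return construct / basic


# Hypothetical triplicate readings (arbitrary light units), for illustration only.
activity_example = promoter_activity(
    construct_ff=[200.0, 220.0, 180.0],
    construct_rl=[10.0, 10.0, 10.0],
    basic_ff=[10.0, 10.0, 10.0],
    basic_rl=[10.0, 10.0, 10.0],
)
```

Normalizing to Renilla first removes well-to-well differences in transfection efficiency before the two alleles are compared.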
Hardy-Weinberg equilibrium (HWE) was tested using a χ2 test with one degree of freedom for each SNP. The differences in the distributions of demographic characteristics, selected variables, and genotypes between cases and controls were examined using the χ2 test. The Haploview program (v4.1) (http://www.broad.mit.edu/mpg) was used to infer the linkage disequilibrium (LD) structure among the 20 SNPs. Logistic regression was used to calculate the odds ratios (ORs) and 95% confidence intervals (CIs) for the association between a single locus and lung cancer risk with and without adjustment for age, sex, and smoking status. The case-control datasets were tested for heterogeneity using the Breslow-Day test. As no significant heterogeneity was identified between the two study population subsets, all raw genotype data were combined for the final analysis. All P values reported here were two-sided. Statistical tests were performed using the SAS statistical software (version 9.1.3; SAS Institute, Cary, NC) and PLINK software (version 1.06; http://pngu.mgh.harvard.edu/purcell/plink/). In the meta-analysis, we combined data from our own studies and from an additional four published GWA studies using Review Manager (RevMan) version 4.3 (Cochrane Collaboration, Copenhagen, Denmark). A random-effects model was used to pool the results, allowing for heterogeneity between studies. Between-study heterogeneity was tested by the χ2-based Q-statistic and the I2 statistic, where an I2 greater than 50% was considered to indicate significant heterogeneity.
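The HWE test named above can be sketched as a goodness-of-fit χ2 test with one degree of freedom on the three genotype counts. This is a generic textbook version, not the SAS or PLINK implementation used in the study; the function name `hwe_chi2` and the genotype counts are hypothetical, and the sketch assumes a polymorphic SNP (both alleles observed).

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value). Assumes both alleles are present, so that no
    expected genotype count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts: one set in perfect HWE, one with a
# clear deficit of heterozygotes.
chi2_ok, p_ok = hwe_chi2(25, 50, 25)
chi2_bad, p_bad = hwe_chi2(40, 20, 40)
```

A SNP would be treated as consistent with HWE when the P value exceeds 0.05, the criterion applied to the controls in this study.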
From the initial screening analysis for significant SNPs of the 125 DNA repair genes (Fig. 1 and Table 1), we identified 32 SNPs of 17 genes with a P value of < 0.01 for an association with risk of lung cancer. These genes included two in BER, two in NER, four in HR and one in MMR. We chose the XRCC4 gene for further investigation, because this gene had four SNPs associated with lung cancer in the study population, and these associations have not been reported before. Therefore, we further assessed the risk associated with all 20 SNPs in XRCC4 covered by the Illumina HumanHap300 (v1.1) BeadChips, which map to a 234-kb region of chromosome 5 spanning 82,412,760 bp to 82,710,135 bp. The 20 SNPs were distributed across intronic (n = 18; 90%) and flanking intergenic regions (n = 2; 10%).
Genotype distributions of all 20 SNPs were in Hardy-Weinberg equilibrium among the controls (P > 0.05). Further examination of the LD pattern among the six top-hit SNPs revealed that, except for rs1478486 and rs9293329, the other four SNPs (i.e., rs10040363, rs4591730, rs1017794, and rs1011981) were in strong LD (r2 = 0.95 and D’ = 1.0 for rs10040363 and rs4591730, r2 = 0.95 and D’ = 0.98 for rs10040363 and rs1017794, r2 = 0.80 and D’ = 0.97 for rs10040363 and rs1011981, and r2 = 0.83 and D’ = 1.0 for rs1017794 and rs1011981) in the controls (Fig. 1B). We then evaluated associations between the three independent SNPs (i.e., rs10040363, rs1478486 and rs9293329) and lung cancer risk in both additive and dominant models and found that, except for rs9293329, which was associated with a decreased lung cancer risk in a dominant model, the other two SNPs (rs10040363 and rs1478486) were nominally significantly associated with a decreased risk of lung cancer in either an additive or a dominant model with adjustment for age, sex, smoking status, and pack-years of smoking (Table 3).
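The two LD measures reported above, r2 and D’, can be illustrated with the standard definitions for a pair of biallelic SNPs. The sketch assumes the haplotype frequency is already known; Haploview estimates it from unphased genotypes, a step not shown here. The function name `ld_measures` and the input frequencies are hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (d_prime, r2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: allele A (30%) always occurs together with allele B (40%).
d_prime_example, r2_example = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.4)
```

In this made-up case D’ is 1 while r2 is only about 0.64, which shows why a pair of SNPs can have D’ = 1.0 and still have r2 below 1 when their allele frequencies differ.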
These six top-hit SNPs map to a region of about 96.6 kb on chromosome 5 extending from 82,412,760 bp to 82,509,401 bp (Fig. 1B), located in the intronic regions of XRCC4, with two SNPs (rs10040363 and rs4591730) in intron 3 and the other four SNPs in intron 1. To identify potential functional SNPs that may be linked with these tagging SNPs, we used imputation to further map the associated SNPs across a 307-kb region on chromosome 5q14.2 from 82,407,760 bp to 82,715,135 bp, spanning the whole region of the XRCC4 gene. Imputation expanded the number of SNPs from 20 observed SNPs to a total of 236 SNPs in the 307-kb region. After exclusion of imputed SNPs based on quality control measures (r2 > 0.3 and MAF > 0.01), 218 SNPs remained in the final imputed dataset (Fig. 2), of which only two imputed SNPs (rs2075685 and rs1056503) were located in the functional regions of XRCC4. The imputed SNP rs1056503 is located in exon 8 of XRCC4 and is in high LD (r2 = 1.0 and D’ = 1.0 in the CEU component of HapMap) with the observed SNP rs1805377, but it was not associated with lung cancer risk (imputed P for allelic test = 0.1151) (data not shown). Notably, the other imputed SNP, rs2075685, located in the promoter region of XRCC4, was found to be in high LD (r2 = 0.90 and D’ = 1.0) with the observed top SNP rs1478486 (imputed P for allelic test = 0.0013) (data not shown) and was predicted to be functional by analysis with the TRANSFAC (Searching Transcription Factor Binding Sites) program.
To further determine the allele-specific effects of −651G>T (rs2075685, which is in strong LD with the top-hit SNP rs1478486; r2 = 0.90 and D’ = 1.0) on XRCC4 promoter activity, we generated two luciferase reporter gene constructs with 1,001 bp of the XRCC4 promoter region (from −901 to +100), containing either G or T at the −651 position (Fig. 3A). As shown in Fig. 3B, the XRCC4 promoter containing the protective T allele displayed significantly higher reporter gene expression than the G allele in the human cervical cancer cell line HeLa (P < 0.01), the human non-small cell lung carcinoma cell line H1299 (P < 0.01), and the human colon adenocarcinoma cell line HCT116 (P < 0.01), suggesting that the G to T allele change at XRCC4 −651 may increase XRCC4 promoter activity in a non-tissue-specific manner.
Among the six top-hit SNPs, four SNPs (i.e., rs10040363, rs4591730, rs1017794 and rs1011981) are in strong LD, and rs10040363 showed the strongest evidence of an association with lung cancer risk in the Texas GWAS discovery dataset. Hence, we selected rs10040363, the other two observed top-hit SNPs (rs1478486 and rs9293329), and the imputed functional SNP rs2075685 for further replication in an additional 602 cases and 618 controls from the same Texas population. Among these subjects, five cases and seven controls failed in the genotyping assays. Thus, the final analysis included 597 lung cancer cases and 611 controls. The frequency distribution of selected characteristics of the cases and controls is presented in Table 2. Because of frequency matching, there was no statistically significant difference in the distributions of age, sex and smoking status between cases and controls in the replication population, which was similar to the population used in the discovery GWAS.
The genotype frequencies of these four SNPs among the controls were all in agreement with Hardy-Weinberg equilibrium (chi-square test: P = 0.269 for rs10040363, P = 0.253 for rs1478486, P = 0.126 for rs9293329, P = 0.382 for rs2075685). The SNP rs10040363 G allele was associated with a decreased risk of lung cancer with borderline statistical significance (adjusted OR = 0.80, 95% CI = 0.62–1.03, P = 0.079) (Table 3) in the replication set. There was a similar, but non-significant, association with the rs1478486 A allele (adjusted OR = 0.83 and 95% CI = 0.65–1.06), but rs9293329 was not associated with a significantly altered risk (Table 3). Except for rs9293329, the trends for association between rs10040363 and rs1478486 and lung cancer risk were in the same direction in both the discovery and replication sets, and therefore the lack of significance in the replication was likely due to limited power because of the smaller sample size of the replication set. For the imputed functional rs2075685, the T allele also tended to be associated with decreased lung cancer risk (adjusted OR = 0.83 and 95% CI = 0.64–1.07, P = 0.152) (data not shown).
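As a rough consistency check on the figures reported for rs10040363 in the replication set, the sketch below recovers the standard error of the log odds ratio from the published 95% CI and derives a two-sided Wald P value. It assumes a symmetric Wald interval on the log scale, which is the usual form for logistic regression output but is not stated explicitly by the authors; the function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(odds_ratio: float, ci_low: float, ci_high: float, z_crit: float = 1.959964):
    """Recover SE(log OR), the Wald z statistic and a two-sided P value from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p_value


# Reported replication result for rs10040363: adjusted OR = 0.80, 95% CI = 0.62-1.03.
se_rep, z_rep, p_rep = wald_from_ci(0.80, 0.62, 1.03)
```

Because the published OR and CI are rounded to two decimals, the recovered P value lands near, not exactly on, the reported P = 0.079.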
Because the observed risks associated with the replicated SNPs were in the same direction in both the discovery and replication datasets, we then combined the two datasets to increase study power. Because rs2075685, which is in strong LD with the top-hit SNP rs1478486 (r2 = 0.90 and D’ = 1.0), was imputed and not directly genotyped, we did not include this SNP in the final combined analysis. Using the Breslow-Day test, we found no statistically significant evidence of heterogeneity between the GWAS population and the replication population for rs10040363, rs1478486, and rs9293329. As shown in Table 3, the strength of association for both the rs10040363 G allele and the rs1478486 A allele was substantially enhanced (adjusted OR = 0.77, 95% CI = 0.66–0.89, P = 5 × 10−4 and P for trend = 5 × 10−4 for rs10040363 and adjusted OR = 0.82, 95% CI = 0.71–0.94, P = 6 × 10−3 and P for trend = 3.5 × 10−3 for rs1478486).
We subsequently conducted a meta-analysis of these four replicated XRCC4 SNPs with a total of 12,312 cases and 47,921 controls using the combined data from the two Texas populations and four published GWA studies. However, the results of the meta-analysis did not show an association with risk of lung cancer for any of these SNPs (Fig. 4). The overall ORs for rs10040363, rs1478486, rs9293329, and rs2075685 were 0.96 (95% CI: 0.86–1.08; Pheterogeneity = 0.002), 0.98 (95% CI: 0.89–1.07; Pheterogeneity = 0.02), 0.99 (95% CI: 0.91–1.07; Pheterogeneity = 0.15), and 0.96 (95% CI: 0.89–1.05; Pheterogeneity = 0.07) by random effects, respectively.
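The random-effects pooling and the heterogeneity statistics (Q and I2) can be sketched as follows. The sketch assumes the DerSimonian-Laird estimator of between-study variance, a common choice for random-effects models; the per-study inputs are hypothetical log odds ratios and standard errors, not the data behind Fig. 4, and the function name `random_effects_meta` is illustrative.

```python
import math


def random_effects_meta(log_ors, ses):
    """DerSimonian-Laird random-effects pooling of per-study log odds ratios.

    Returns a dict with the pooled OR, its 95% CI, Cochran's Q, I2 (in %) and tau2.
    """
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]      # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.959964 * se_pooled),
        "ci_high": math.exp(pooled + 1.959964 * se_pooled),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical studies: three that agree, and two that point in opposite directions.
homogeneous = random_effects_meta([math.log(0.8)] * 3, [0.10, 0.20, 0.15])
heterogeneous = random_effects_meta([math.log(0.5), math.log(2.0)], [0.1, 0.1])
```

When studies disagree, tau2 grows and the pooled confidence interval widens, which is how a protective signal seen in one population can be diluted to a null overall estimate.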
Using the published Texas lung cancer GWAS discovery dataset, we first analyzed 1,806 SNPs of 125 DNA repair genes, among which 32 SNPs of 17 genes were found to have an allele effect on cancer risk with a P value of < 0.01, although no genome-wide significant association was identified. We then assessed the associations between 20 SNPs of XRCC4 (the top-hit gene in the list of 17 genes) and lung cancer risk. We found that, of the 20 SNPs, six (i.e., rs10040363, rs4591730, rs1017794, rs1011981, rs1478486, and rs9293329) were associated with risk of lung cancer with a P value of < 1 × 10−2, and the most significant SNP was rs10040363 (P value for allelic test = 4.89 × 10−4), with an imputed functional SNP, rs2075685, also showing an association (imputed P value for allelic test = 1.3 × 10−3). The minor alleles of the six top-hit observed SNPs appeared to be protective against lung cancer, which is consistent with the data from the luciferase reporter assay demonstrating that the rs2075685G>T change in the XRCC4 promoter increased XRCC4 expression.
In our replication study of three independent top-hit SNPs and one imputed functional SNP, we found that the rs10040363 G allele was associated with a decreased risk of lung cancer with borderline statistical significance, whereas the other three SNPs, i.e., rs1478486, rs9293329 and rs2075685, were not. It is noted, however, that both the rs1478486 A and rs2075685 T alleles showed a trend toward reduced lung cancer risk. In the combined analysis of both GWAS discovery and replication datasets, the strength of the association was increased for rs10040363 (Pdominant = 5.0 × 10−4 and P for trend = 5 × 10−4) and rs1478486 (Pdominant = 6.0 × 10−3 and P for trend = 3.5 × 10−3), and the trends of the risk were consistently in the same direction in both the discovery and replication datasets. In the meta-analysis, however, we did not find evidence of an association between overall lung cancer risk and any of the four XRCC4 SNPs. This underscores the importance of replicating, in different study populations, any finding of an effect of a low-penetrance locus on cancer risk, particularly one from a GWA study.
XRCC4 is a limiting factor in NHEJ, which is required for both normal development and suppression of tumors. It has been recognized that mouse embryonic cells with disruption of XRCC4 show reduced proliferation, radiation hypersensitivity, chromosomal instability, and severely impaired V(D)J recombination. A deficiency in XRCC4 not only results in an increased sensitivity of cells to X-rays but may also give rise to immunodeficiency in animals. Although our experimental data showed that the rs2075685G>T change in the XRCC4 promoter region increased XRCC4 expression, the association with risk for this functional SNP did not achieve statistical significance in our replication dataset (despite its association with a similarly decreased risk). Since rs2075685 is located in the promoter region of XRCC4 and the G>T change in this SNP increases XRCC4 expression, if rs1478486, one of the observed top-hit SNPs in high LD with rs2075685, contributes to the risk of lung cancer, rs2075685 could be the causal SNP linked to rs1478486. It is also likely that rs10040363 in XRCC4, though intronic, is linked to other untyped causal SNPs. It is plausible, however, that disease-associated variants with modest effects may be distributed proportionately between coding and noncoding sequences of the genome. Indeed, several studies have found that ‘functional’ intronic variants are associated with disease occurrence [41, 42]. For example, an intronic SNP in a RUNX1 binding site of SLC22A4, which affects the transcriptional efficiency of SLC22A4, is strongly associated with rheumatoid arthritis. The results from our replication dataset might represent an association of mild to modest effect, but such a weak association was not supported by the meta-analysis.
So far, at least two small studies have reported on the involvement of rs10040363 and rs2075685 in susceptibility to lung cancer [43, 44]. A French study of 151 cases and 172 controls found that variant alleles of rs10040363 and rs2075685 were associated with decreased risk of lung cancer. This direction of association is the same as that seen in our data. However, a candidate gene study from Taiwan of 164 lung cancer patients and 649 healthy controls found that rs2075685 was not associated with lung cancer risk, and this discrepancy could be due to differences in ethnicity, genotyping platform and study size. Again, these discrepancies further underscore the importance of replication to rule out a chance finding, particularly from under-powered studies.
The limitations of the present study include the following: 1) our analysis was limited to non-Hispanic white individuals, the controls were frequency-matched to the cases by age (within 5-year categories), sex, and smoking status, and all the subjects in the discovery and replication datasets were ever smokers; 2) the sample size in the replication phase was smaller than that of the original discovery dataset; and 3) a subgroup meta-analysis was not conducted because only genotyping data were available from the other four GWA studies.
In summary, using the data on 1,806 SNPs of 125 DNA repair genes from a published GWAS with a replication study in Texas populations, we first identified rs10040363 in XRCC4 as associated with lung cancer risk in the study populations. We then identified another variant, rs2075685, tagged by rs1478486, in the XRCC4 promoter, that might increase XRCC4 expression. However, evidence supporting such findings is lacking from both our replication in an independent Texas population and the meta-analysis of four previously published GWA studies. It appeared that rs2075685, although associated with increased expression of a reporter gene and lung cancer risk in the Texas populations, did not have an effect on lung cancer risk in other populations.
We thank Min Zhao, Jianzhong He and Kejing Xu for their laboratory assistance, and Dakai Zhu for his technical support. This study was supported in part by National Institutes of Health grants ES11740 and CA131274 (to Q. W.), CA86390 and CA55769 (to M. R. S.), CA121197 (to C. A.), and CA16672 (to The University of Texas M. D. Anderson Cancer Center). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.