Linkage disequilibrium-mapping studies in Caucasians have indicated an association of the Chr19q13.3 sub-region spanning ERCC2, PPP1R13L, CD3EAP and ERCC1 with several cancers. To refine the region of association and identify potential causal variations among Asians, we performed a fine-mapping study using 32 SNPs (of 39 genotyped) in a 71.654kb sub-region. The study included 384 Chinese lung cancer cases and 387 controls. Seven closely situated SNPs showed significant associations with lung cancer risk in five different genetic models of single-locus associations (adjusted for smoking duration). These were PPP1R13L rs1970764 [OR (95% CI) = 1.58 (1.09-2.29), P = 0.014] in a recessive model and PPP1R13L rs1005165 [OR (95% CI) = 1.25 (1.01-1.54), P = 0.036], CD3EAP rs967591 [OR (95% CI) = 1.40 (1.13-1.75), P = 0.0023], rs735482 [OR (95% CI) = 1.29 (1.03-1.61), P = 0.026], rs1007616 [OR (95% CI) = 0.78 (0.61-1.00), P = 0.046], and rs62109563 [OR (95% CI) = 1.28 (1.03-1.59), P = 0.024] in a log-additive model and ERCC1 rs3212965 [OR (95% CI) = 0.70 (0.52-0.94), P = 0.019] in an over-dominant model. Six haplotype blocks were determined in the sub-region. Using an alternative approach where we performed a haplotype analysis of all significant polymorphisms, rs1970764 was found to be most consistently associated with lung cancer risk. The combined data suggest that the sub-region with the strongest association to lung cancer susceptibility might locate to the 23.173kb from PPP1R13L intron8 rs1970764 to rs62109563 3′ to CD3EAP. These results provide an initial definition, among Asians, of a limited set of risk loci and a narrowed span for lung cancer susceptibility in this sub-region.
The candidate sub-region of chromosome 19q13.3 includes four genes. From 3′→5′, they are ERCC2/XPD (excision repair cross-complementing rodent repair deficiency, complementation group 2/xeroderma pigmentosum complementation group D), PPP1R13L/IASPP/RAI [protein phosphatase 1, regulatory (inhibitor) subunit 13 like/Inhibitory member of the ASPP family/RelA-associated inhibitor], CD3EAP/ASE-1 (CD3e molecule, epsilon-associated protein/antisense to ERCC1), and ERCC1 (excision repair cross-complementing rodent repair deficiency, complementation group 1) (Figure 1). ERCC2 and ERCC1 are involved in DNA repair while PPP1R13L and CD3EAP participate in apoptosis and rRNA transcription, respectively [1, 2]. Genetically determined changes in activity of any of the four genes may play vital roles in carcinogenesis.
Lung cancer is a leading cause of death worldwide. Genetically determined susceptibility may contribute to carcinogenesis, possibly through gene-environment interactions. Three linkage disequilibrium (LD)-mapping studies in Caucasian populations have identified the sub-region encompassing ERCC2, PPP1R13L, CD3EAP and ERCC1 within chromosome 19q13.3 as being associated with risk of basal cell carcinoma and breast cancer and with multiple myeloma prognosis [4, 5, 1, 6]. Fine-mapping of cancer susceptibility related to the four genes in this chromosome region has not yet been performed in Asian populations. Since the allele frequencies of a number of polymorphisms differ markedly between Caucasians and Asians, fine-mapping of the region in Asians may provide a tool for identification of the causal genetic variants.
In previous studies, we have examined all the commonly occurring single nucleotide polymorphisms (SNP) of the four genes at chromosome 19q13.3 in relation to lung cancer risk [7–10]. In order to further refine genetic variant location of lung cancer susceptibility at the same region in Chinese populations, we here used a dense fine-mapping strategy and performed extensive genotyping of SNPs.
The current study group included 384 lung cancer cases and 387 cancer-free controls. Lung cancer cases had a significantly higher occurrence of cancer family history [OR (95% CI) = 11.70 (4.62-29.66), P < 0.0001] and a longer smoking history (> 20 years) (P < 0.0001) compared to the control group. There was no statistically significant difference between cases and controls in mean age or gender. There were more males than females among cases (Table 1).
Thirty-nine polymorphisms were genotyped in the chromosome 19q13.3 region encompassing the 4 genes. Genotyping revealed no variant alleles for rs10418623, rs3212967, and rs3212950. All other polymorphisms were in Hardy-Weinberg equilibrium among controls except rs2097215, rs8112723, rs201704 and rs3212986. These 7 SNPs were therefore excluded, leaving 32 SNPs for the subsequent evaluation.
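The Hardy-Weinberg filter described above can be illustrated with a minimal sketch. It assumes a 1-degree-of-freedom chi-square goodness-of-fit test on the control genotype counts; the text does not state which test variant SNPStats or Plink applied here, so the function name `hwe_chi_square` and the choice of test are illustrative assumptions, and the counts in the usage lines are made up.

```python
import math


def hwe_chi_square(n_hom_major, n_het, n_hom_minor):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Hypothetical helper: takes the three genotype counts observed among
    controls and returns (chi2, p_value).
    """
    n = n_hom_major + n_het + n_hom_minor
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Major-allele frequency estimated from the genotype counts.
    p = (2 * n_hom_major + n_het) / (2 * n)
    q = 1.0 - p
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNPs give chi2 = 0).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: the first set sits exactly on HWE proportions,
# the second has no heterozygotes at all.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

Under the threshold stated in the methods, a SNP would be flagged when the returned P-value for controls falls below 0.05.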
Thirty-two SNPs in 19q13.3 were analyzed using five different genetic models in relation to lung cancer risk, adjusting for smoking duration (Table 2). PPP1R13L rs1970764, rs1005165, CD3EAP rs967591, rs735482, rs1007616, rs62109563, and ERCC1 rs3212965 were all associated with lung cancer risk in at least one of the models. For each SNP, the genetic model with the smallest AIC (Akaike's Information Criterion) value was used to evaluate the association. Six SNPs were associated with lung cancer risk: PPP1R13L rs1970764 [OR, odds ratio (95% CI) = 1.58 (1.09-2.29), P = 0.014] in the recessive model, and PPP1R13L rs1005165 [OR (95% CI) = 1.25 (1.01-1.54), P = 0.036], CD3EAP rs967591 [OR (95% CI) = 1.40 (1.13-1.75), P = 0.0023], rs735482 [OR (95% CI) = 1.29 (1.03-1.61), P = 0.026], rs1007616 [OR (95% CI) = 0.78 (0.61-1.00), P = 0.046], and rs62109563 [OR (95% CI) = 1.28 (1.03-1.59), P = 0.024] in the log-additive model. Of these six, all except rs1007616 (OR below 1) were associated with increased risk. ERCC1 rs3212965 [OR (95% CI) = 0.70 (0.52-0.94), P = 0.019] was associated with lowered lung cancer risk in the over-dominant model. The association of CD3EAP rs967591 [OR (95% CI) = 1.40 (1.13-1.75), P = 0.0023] with lung cancer risk was the strongest using the log-additive model.
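As a rough plausibility check on how the reported odds ratios, confidence intervals and P-values relate to one another, the sketch below back-calculates a standard error from a 95% CI on the log scale and derives an approximate two-sided Wald P-value. This assumes a Wald-type interval; the text does not say how SNPStats computed its P-values, so small differences from the reported values are expected. The function name `wald_p_from_ci` is hypothetical.

```python
import math


def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided P from an OR and its 95% CI.

    Assumes the interval is symmetric on the log-odds scale (Wald interval).
    """
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


# Values as reported in the text for CD3EAP rs967591 (log-additive model)
# and ERCC1 rs3212965 (over-dominant model).
for label, values in {
    "rs967591": (1.40, 1.13, 1.75),
    "rs3212965": (0.70, 0.52, 0.94),
}.items():
    se, z, p = wald_p_from_ci(*values)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, approx P={p:.4f}")
```

The approximate P-values land in the same range as those reported, which is all this check is meant to show; it is not a substitute for the adjusted logistic regression itself.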
The log-transformed P-values from the associations between the 32 single SNPs and lung cancer risk in the five genetic models are illustrated in Figure 2. The 7 closely situated markers from PPP1R13L rs1970764 (thirteenth SNP) to ERCC1 rs3212965 (twenty-seventh SNP) constitute a risk sub-region of 29.707kb on chromosome 19q13.3. This sub-region encompasses three genes: PPP1R13L, CD3EAP and ERCC1. The SNP with the most statistically significant association with lung cancer risk was CD3EAP rs967591 (twentieth SNP). This SNP was statistically significantly associated with lung cancer risk in four different models (adjusted for smoking duration) (Table 2).
The analysis of single-SNP associations with lung cancer risk was further stratified by smoking duration. We have previously reported that carriers of genotypes of rs967591 (G > A) AA, rs735482 (A > C) AC and CC, rs3212961 (C > A) CA and AA, rs2298881 (C > A) CA and AA were at increased risk among the heavy smokers (> 20 years). There was interaction between rs3212961 and smoking duration (Ptrend = 0.03) [11, 10, 8]. In this study, we re-confirmed the aforementioned results and furthermore showed that carriers of rs1005165 (C > T) TT and CT [TT versus CC, OR (95% CI) = 2.24 (1.08-4.65), CT versus CC, OR (95% CI) = 1.73 (1.02-2.93)] or rs8113779 (G > T) TT [TT versus GG, OR(95% CI) = 2.20 (1.06-4.56)] in this sub-region were at increased lung cancer risk among heavy smokers (> 20 years of smoking) (data not shown).
Linkage disequilibrium and haplotype block structure of the 32 SNPs in the genes ERCC2, PPP1R13L, CD3EAP and ERCC1 are illustrated in Figure 1. Six haplotype blocks were identified based on 95% confidence interval bounds of D′ values (Figure 1). A global test of the haplotype distribution between cases and controls showed a statistically different haplotype distribution in block 5 (global P = 0.011). Moreover, a protective haplotype ACGGTATTACG spanning 11 SNPs (encompassing the minor allele of rs1007616 and the major alleles of the remaining 10 SNPs) of PPP1R13L, CD3EAP and ERCC1 in this block [OR (95% CI) = 0.72 (0.53-0.97), P = 0.032, adjusted for smoking duration] was detected after haplotypes with frequency < 0.03 in both cases and controls were excluded (Supplementary Table 1).
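For readers less familiar with the two LD measures used in this analysis, the following minimal sketch computes point estimates of D′ and r2 for a pair of biallelic SNPs from haplotype and allele frequencies. It is not Haploview's block-calling procedure, which works from confidence bounds on D′; the function name `pairwise_ld` and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom
    return d_prime, r2


# Made-up frequencies: allele B occurs only on A-carrying haplotypes,
# so D' is maximal even though the allele frequencies differ.
print(pairwise_ld(0.25, 0.5, 0.25))
```

The example illustrates why the two measures are reported side by side: D′ can reach 1 while r2 stays well below 1 when the allele frequencies of the two SNPs differ.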
In a second approach used to identify causative polymorphisms, linkage was determined between the seven SNPs which were associated with lung cancer risk (rs1970764, rs1005165, rs967591, rs735482, rs1007616, rs62109563, rs3212965) (Supplementary Figure 1). Rs967591, rs735482 and rs1005165 were in tight linkage with high r2 values (Supplementary Figure 1). Therefore, rs735482 and rs1005165 were excluded from further analysis. Haplotype analysis of rs1970764, rs967591, rs1007616, rs62109563, and rs3212965 is shown in Table 3. The P-value of the global haplotype association was 0.023. The second most frequent haplotype, encompassing the variant alleles of rs1970764, rs967591, and rs62109563, was associated with a 1.44-fold (95% CI = 1.05-1.97, P = 0.022) increased risk of lung cancer. The sixth haplotype, associated with a non-statistically significant 1.8-fold increased risk of lung cancer (95% CI = 0.92-3.54, P = 0.088), also included the variant allele of rs1970764 and, in addition, the variant allele of rs3212965. Thus, this analysis points to rs1970764 as being most consistently associated with lung cancer risk.
We have previously identified a susceptibility region on chromosome 19q13.3 in relation to lung cancer risk using a HapMap-based strategy among Chinese (16 tag SNPs). Fine-mapping studies now enable us to narrow down the set of candidate causal polymorphisms. This study is an elaboration of our previous association analysis with lung cancer, including 18 tag SNPs and 14 non-tag SNPs. To the best of our knowledge, this is the first comprehensive fine-mapping of lung cancer (or cancer) susceptibility encompassing ERCC2, PPP1R13L, CD3EAP and ERCC1 on chromosome 19q13.3 among Chinese (and, more broadly, Asian) populations.
We have previously reported that PPP1R13L rs1970764 in intron8, CD3EAP rs967591 in the 5′ UTR and CD3EAP rs735482 in exon3 were associated with increased lung cancer risk in a co-dominant or dominant model after adjustment for smoking duration [11, 10]. In the present study, we report that rs1970764 was associated with lung cancer risk in a recessive model, and that rs967591 and rs735482 were associated with lung cancer risk in a log-additive model as previously reported (Table 2). In addition, we identified 4 new polymorphisms in the vicinity of these three SNPs as being associated with lung cancer risk. These are PPP1R13L rs1005165 near the 5′ end of CD3EAP, CD3EAP rs1007616 in the 3′ UTR, rs62109563 in the 3′ UTR, and ERCC1 rs3212965 in intron5.
We determined 6 haplotype blocks in the sub-region of chromosome 19q13.3. The seven SNPs which were associated with lung cancer risk were partitioned between block 3 and block 5. PPP1R13L rs1970764 is located in block 3. The remaining 6 SNPs were all located in block 5. These 6 SNPs were in strong pair-wise linkage disequilibrium with one another (all D' > 0.8) (Figure 1), implying that they probably detect the same biological effect. Both PPP1R13L rs1970764 in block 3 and CD3EAP rs967591 in block 5 were important constituents of the previously identified “high-risk haplotype” associated with increased risk of several cancers among Caucasians [12–14]. We also proposed the two SNPs as single-locus risk candidates among Chinese [11, 15]. Recent studies of Koreans reported that PPP1R13L rs1970764 was significantly associated with relapse-free and disease-specific survival in a recessive model for rectal cancer, and that the CD3EAP rs967591 AA genotype was associated with lower overall survival in early-stage lung cancer [16, 17]. The two blocks spanning ERCC2, PPP1R13L, CD3EAP and ERCC1 within chromosome 19q13.3 were in high linkage disequilibrium, as also observed in Caucasian Danes.
We attempted to locate the causal sequences and polymorphisms. The haplotype block analysis suggested that the causal genetic variation could locate to the 28.406 kb from ERCC2 intron11 rs238403 to PPP1R13L intron8 rs34231843 in block 3 and to the 12.836kb from PPP1R13L intron1 rs4803817 to ERCC1 intron5 rs3212964 in block 5. However, when this information was combined with the multiple single-marker analysis, our data indicated that the biologically relevant effectors probably locate to the 29.707kb spanning from PPP1R13L intron8 rs1970764 to ERCC1 intron5 rs3212965, as all significant P-values were found in this region. The most probable location is therefore in the 23.173kb spanning from PPP1R13L intron8 rs1970764 to rs62109563 3′ to CD3EAP. This DNA segment contains 6 SNPs with significant P-values. The sub-region span was very similar to the region identified for Caucasian Danes [1, 5, 6]. Using an alternative approach where we performed a haplotype analysis of all significant polymorphisms, rs1970764 was found to be most consistently associated with lung cancer risk.
However, the present case-control study group had a modest sample size and verification should be attempted in larger population-based cohorts.
We performed functional predictions for the 7 SNPs that were significantly associated with lung cancer risk using web-based SNP selection tools: SNPinfo and Polyphen-2. The SNPinfo analysis suggested that rs1005165 (5′ near gene) and rs967591 (5′ UTR) may potentially modify the activity of both PPP1R13L and CD3EAP because of the close proximity of the two genes. Rs1005165, with a high regulatory potential score (RPS) (0.405099) and conservation score (0.781), and rs967591, with an RPS of 0.199765, were predicted to create TFBS (Transcription Factor Binding Sites) or TFBS and splicing sites [ESE or ESS (Exonic Splicing Enhancer or Exonic Splicing Silencer)]. Rs1007616 in the 3′ UTR and rs62109563 near the 3′ end may potentially influence both CD3EAP and ERCC1, since the two SNPs locate to a region with overlapping transcription of CD3EAP and ERCC1. TFBS and microRNA-binding sites were predicted for rs1007616. A benign influence was predicted by Polyphen-2 for rs735482, which is a non-synonymous SNP. Thus, 4 of the SNPs associated with lung cancer risk could potentially be the biologically relevant polymorphisms. An important next step would be to experimentally characterize the potential biological effects of all these candidate SNPs.
In summary, we have fine-mapped the sub-region encompassing ERCC2, PPP1R13L, CD3EAP and ERCC1 on chromosome 19q13.3 in relation to lung cancer susceptibility among Chinese. We observed that, of the 32 SNPs analyzed (39 directly genotyped) covering a 71.654kb region, seven closely situated SNPs were all associated with lung cancer risk. Studying combinations of markers, LD blocks and their haplotypes, we suggest that a sub-region with the strongest association to lung cancer susceptibility might locate to the 23.173kb from PPP1R13L intron8 rs1970764 to rs62109563 3′ to CD3EAP. Further fine-mapping studies of cancer susceptibility in other Asian populations should focus on these loci.
The Chinese Administration Office of Human Genetic Resources approved this protocol. It complied with the principles outlined in the Helsinki Declaration. All study participants granted written or oral informed consent.
The studied population comprised 771 subjects, including 384 cases with lung cancer and 387 cancer-free controls, drawn from the same study population as previously described but with an increased sample size. Briefly, lung cancer diagnosis was based on standard clinical and histological criteria. Eligible cases were previously untreated (recruited prior to chemotherapy or radiotherapy for cancer). Cancer-free controls were identified from the orthopedics wards in the same region. Cancer-free status was confirmed by detailed questioning by a physician. Cancer-free, randomly selected controls were matched to the cases by age (±3 years), sex and ethnicity (population stratification analysis was therefore not carried out). All subjects were unrelated ethnic Han Chinese. All covariate data were obtained from questionnaires. Stratification analyses were defined by gender, age (10-year intervals) and smoking history (20-year intervals) (Table 1).
In the present study, we added more non-tag SNPs to the tag SNPs from our previous studies and combined all data to enable fine-mapping of lung cancer susceptibility on chromosome 19q13.3. As a whole, SNPs were determined in the Chr19q13.3 lung cancer candidate region spanning 45855262 - 45926916 bp (range: 71.654kb) in Genome Build 37.3 of NCBI dbSNP. In total 39 SNPs, comprising 18 tag SNPs (r2 ≥ 0.80 and MAF ≥ 0.05) and 21 non-tag SNPs with MAF ≥ 0.05, were included to capture the variability present across the sub-region in Chinese populations. The tag SNPs covered 90% of the common variation in the sub-region. The risk loci from our previous publications served as the starting point for the selection of non-tag SNPs. More details are presented in Table 4.
The Sequenom MassARRAY iPLEX platform (San Diego, CA, USA) was used for genotyping of ERCC2 rs238418, rs238414, rs2070831, rs50872, rs2097215; PPP1R13L rs8112723, rs201704, rs10418623, rs35209357, rs34231843, rs4803816, rs1005165; CD3EAP rs8113779, rs3212986, rs1007616, rs62109563; and ERCC1 rs3212967, rs3212965, rs3212955, rs3212950. Genotypes of ERCC2 rs1799787, rs3916874, rs3916840, rs50871, rs238403; PPP1R13L rs6966, rs1970764, rs4802252, rs4803817; CD3EAP rs967591, rs1046282, rs735482; and ERCC1 rs3212980, rs3212964, rs3212961, rs11615, rs2298881 were determined by PCR-RFLP, TaqMan or LDR-PCR as in our previous reports [7–11]. Assay design and mass spectrometric genotyping were performed as previously described, with modifications as indicated in Supplementary Material, Methods. Primer and probe sequences are listed in Supplementary Material, Table 2. As previously described, assay design failed for rs50872 of ERCC2, whereas the genotype distribution of rs2070830 in PPP1R13L (strongly) and rs238415 in ERCC2 (slightly) deviated from Hardy-Weinberg equilibrium among the controls. The three SNPs were regenotyped using the Sequenom platform in the present analysis.
For each SNP, the Hardy-Weinberg equilibrium test, allele frequencies, and genotype frequencies were calculated using the SNPStats program and Plink software v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/). The genotype distribution of each SNP among controls was tested for deviation from Hardy-Weinberg equilibrium, which was rejected at P < 0.05. Unconditional logistic regression was applied in SNPStats to calculate ORs (adjusted for smoking duration) and to test for interaction between genotypes and smoking duration. We did not adjust for family history of cancer since the object of the study is genetic susceptibility factors. Five genetic models (co-dominant, dominant, recessive, over-dominant and log-additive) (Table 2) were evaluated for each single-locus case-control association. Haploview software 4.2 and the SNPStats program were used to calculate D' and r2 values between the genotyped SNPs and haplotype frequencies, to generate the D' and r2 map and LD block boundaries (based on 95% confidence bounds on D' values) and to analyze the haplotype associations (OR, adjusted for duration of smoking) of the LD blocks identified. Haplotypes with frequency < 0.03 among both cases and controls were excluded from the analysis.
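The five genetic models named above differ only in how a genotype is turned into the predictor that enters the logistic regression. The sketch below shows the conventional coding, assuming genotypes are expressed as the number of variant alleles carried (0, 1 or 2); the text does not spell out the coding used by SNPStats, so both the coding and the function name `encode_genotype` are illustrative assumptions.

```python
def encode_genotype(variant_allele_count, model):
    """Map a genotype (0, 1 or 2 variant alleles) to regression predictors.

    Returns a tuple because the co-dominant model needs two indicator
    variables, while the other four models need one predictor each.
    """
    g = variant_allele_count
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 variant alleles")
    if model == "co-dominant":
        # Separate effects for heterozygotes and variant homozygotes.
        return (int(g == 1), int(g == 2))
    if model == "dominant":
        # Any copy of the variant allele versus none.
        return (int(g >= 1),)
    if model == "recessive":
        # Two copies of the variant allele versus fewer.
        return (int(g == 2),)
    if model == "over-dominant":
        # Heterozygotes versus both homozygote groups pooled.
        return (int(g == 1),)
    if model == "log-additive":
        # Each additional variant allele multiplies the odds by the same factor.
        return (g,)
    raise ValueError(f"unknown genetic model: {model}")


for model in ("co-dominant", "dominant", "recessive", "over-dominant", "log-additive"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

Read this way, an OR reported for the recessive model compares variant homozygotes with all other genotypes, whereas an OR reported for the log-additive model is a per-allele estimate.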
This study was approved by the Chinese Administration Office of Human Genetic Resources (No. 015).
CONFLICTS OF INTEREST
The authors declare no conflict of interest.
This work was supported by National Natural Science Foundation of China (30571016 and 81072384); Foundation of Key Application Basis Studies, Science and Technology Bureau of Shenyang, People's Republic of China (1081233-1-00-04) and Key Laboratory Foundation of Science and Technology Research, Education Ministry of Liaoning Province, People's Republic of China (2008S222).
This paper has been accepted based in part on peer review conducted by another journal and the authors' response and revisions, as well as expedited peer review in Oncotarget.