|Home | About | Journals | Submit | Contact Us | Français|
Genome-wide association studies have identified susceptibility loci for esophageal squamous cell carcinoma (ESCC). We conducted a meta-analysis of all single-nucleotide polymorphisms (SNPs) that showed nominally significant P-values in two previously published genome-wide scans that included a total of 2961 ESCC cases and 3400 controls. The meta-analysis revealed five SNPs at 2q33 with P< 5 × 10−8, and the strongest signal was rs13016963, with a combined odds ratio (95% confidence interval) of 1.29 (1.19–1.40) and P= 7.63 × 10−10. An imputation analysis of 4304 SNPs at 2q33 suggested a single association signal, and the strongest imputed SNP associations were similar to those from the genotyped SNPs. We conducted an ancestral recombination graph analysis with 53 SNPs to identify one or more haplotypes that harbor the variants directly responsible for the detected association signal. This showed that the five SNPs exist in a single haplotype along with 45 imputed SNPs in strong linkage disequilibrium, and the strongest candidate was rs10201587, one of the genotyped SNPs. Our meta-analysis found genome-wide significant SNPs at 2q33 that map to the CASP8/ALS2CR12/TRAK2 gene region. Variants in CASP8 have been extensively studied across a spectrum of cancers with mixed results. The locus we identified appears to be distinct from the widely studied rs3834129 and rs1045485 SNPs in CASP8. Future studies of esophageal and other cancers should focus on comprehensive sequencing of this 2q33 locus and functional analysis of rs13016963 and rs10201587 and other strongly correlated variants.
Worldwide, esophageal cancer causes more than 400 000 deaths each year (1) and within the People's Republic of China it ranks fourth as a cause of cancer-related deaths (2). As seen throughout the economically developing world, esophageal squamous cell carcinoma (ESCC) predominates, which differs from recent trends in populations of European descent, where esophageal adenocarcinoma rates now exceed ESCC (3). In contrast to low-incidence populations, tobacco smoking and excessive alcohol consumption do not appear to be major risk factors in the geographically limited areas of China with a heavy burden of ESCC (4). Numerous studies show that a family history of esophageal cancer increases risk in China (5), but compared with other cancer types, few studies have comprehensively assessed the contribution of common genetic variants to ESCC risk.
Recently, two genome-wide association studies (GWASs) examined the contribution of common genetic variants to ESCC risk in Han Chinese (6,7). Both studies reported a strong association with single-nucleotide polymorphisms (SNPs) on chromosome 10q23, which harbors a plausible candidate gene, PLCE1. Variants in PLCE1 have also been linked to an inherited nephrotic syndrome and dengue fever shock syndrome (8,9). To further explore the role of common variants in ESCC risk, we combined risk estimate data from the two studies and completed a meta-analysis.
The joint data set included 2961 ESCC cases and 3400 controls, including 2024 cases and 2708 controls from one scan (6) and 937 cases and 692 controls from the other (7). In the joint data set, we examined associations at the previously reported susceptibility locus at 10q23 (Supplementary Material, Table S1); rs2274223, a nonsynonymous SNP in PLCE1 that was independently reported in the previous GWAS, showed a combined per allele odds ratio (OR) [95% confidence interval (CI)] of 1.39 (1.27–1.51) with P= 1.44 × 10−13. Six other SNPs also showed highly significant associations, with the strongest at rs3765524 with a P-value of 3.15 × 10−14.
Using the combined data set, we discovered an association at 2q33 that achieved genome-wide significance. We found five SNPs at this locus with P < 5 × 10−8 in the combined data set (Table 1). The strongest signal was rs13016963, with a combined OR (95%CI) of 1.29 (1.19–1.40) and P = 7.63 × 10−10. These SNPs are in high linkage disequilibrium (LD) and map to a region including CASP8, ALS2CR12 and TRAK2 (Fig. 1). In models conditioned on the most notable marker, rs10201587, the associations for the other five SNPs were attenuated, which suggests that our findings point to a single association signal.
The association between cancer risk and variants at the 2q33 locus, which includes CASP8, has been tested for many tumor types in over 50 studies (10,11). Since the SNPs tested and the genomic location associated varied by study, organ and histology, to better define our association signal, we imputed SNPs with the IMPUTE2 program (12) in the National Cancer Institute (NCI) scan using a hybrid reference of the 1000 Genomes Asian set and the Asian component of the Division of Cancer Epidemiology and Genetics (DCEG) Reference Imputation Set (13). We imputed 4304 SNPs in the 2q33 region with a mean certainty of 80.7% based on the information measure of the IMPUTE2 program and the SNPs associated with cancer risk with P < 1 × 10−5 that are listed in Supplementary Material, Table S2. The imputed SNP (rs6745435) with the strongest association signal was only marginally better than the strongest association for a genotyped SNP in the NCI scan (rs10201587) (Fig. 2). These SNPs were also in high LD and when we tested the 34 imputed SNPs in models conditioned on rs10201587 (Supplementary Material, Table S3), we found that all other SNP associations were attenuated, suggesting that these associations are from a single association signal.
The five genotyped SNPs that exceeded genome-wide significance localize between two recombination hotspots (positions 201819605 and 202211605, medians of 2 kb inferred hotspot intervals), which contain all SNPs with P-value <5 × 10−8. The top SNP in the combined data (rs13016963) and two others (rs9288318 and rs10201587) are located in introns of ALS2CR12 (amyotrophic lateral sclerosis 2 chromosomal, candidate 12) (Fig. 1). These are in high LD with rs10931936 (r2= 0.982) located in an intron of CASP8 (caspase 8, apoptosis-related cysteine peptidase), and rs9288318 (r2= 0.865), rs10201587 (r2= 0.874) and rs7578456 (r2= 0.734), which are located in the region between ALS2CR12 and TRAK2, adding further support to a single locus.
We conducted an inferred ancestral recombination graph (ARG) analysis (14) to identify one or more haplotypes that harbor the variants directly responsible for the detected association signal. We used data from the NCI scan (4732 subjects and 53 genotyped SNPs) to identify the location with the most likely functional SNPs by reconstructing the genealogical history based on haplotypes inferred using the PHASE program (15). The Margarita program determines whether a possible mutation resides on the marginal tree by comparing the frequency of the branches between cases and controls. The most probable pair of haplotypes for each subject was selected for analysis in the Margarita program, and during this process subjects whose highest haplotype pair probability was <50% were excluded from further analysis. We estimated 100 ARG genealogies and calculated permutation P-values (106 permutations) for each of the 53 SNPs. In a comparison of the location and strength of association from the ARG analysis with that from the standard GWAS P-values for each SNP (Fig. 3), both methods indicate that the strongest signal is localized to a region near rs10201587 (chr2: 201 911 036, hg18). The background association signal of the ARG analysis points to a haplotype containing the five SNPs and further distinguishes between risk haplotypes and protective haplotypes. In Figure 4, all haplotypes highlighted in green are predicted not to harbor the risk allele, whereas those in red are predicted to carry the risk allele. The ARG analysis showed that among all the marginal trees, rs10201587 had the lowest permutation P-value and has the greatest ability to segregate cases and controls, but four additional SNPs in near-perfect LD with rs10201587 (rs3769823, rs10931936, rs13016963 and rs9288318) provide the same discrimination when we restricted the data to haplotypes with a frequency of at least 1%. The complete separation of haplotypes with a frequency >1% identified the five genotyped SNPs and 45 imputed SNPs in strong LD with an r2 > 0.8.
In this study, we combined results from two previous GWASs in Han Chinese and discovered a new association between highly correlated variants at 2q33 and risk of ESCC, which maps to the CASP8/ALS2CR12/TRAK2 gene region. The strongest association was observed for rs13016963 [OR (95%CI) of 1.29 (1.19–1.40) and P = 7.63 × 10−10]. In an imputation analysis of over 4000 SNPs, no stronger signal was observed with ESCC risk. The notable SNPs are strongly correlated and reside within the boundaries of two recombination hotspots that we determined in the control subjects of the NCI GWAS. Since the conditional analysis did not preserve independent significance and in fact demonstrated substantial diminution of the tested signal, we conclude that there is one common allele on 2q33 associated with risk for ESCC.
We also used ARG analyses to further explore the region by reconstructing the genealogical history in this stable population to compare the frequency of branches of cases and controls to further localize the most likely functional variant(s). In the ARG analysis based on a permutation test, the results showed that rs10201587 had the strongest signal, but rs13016963 and three other SNPs were in near perfect LD with this SNP among the haplotypes tested. Based on the imputation analysis, there were 45 additional SNPs in strong LD, which could also be responsible for the direct association. Confirmation of the variants role will require additional study, including re-sequence analysis and functional studies of the optimal variants.
Numerous studies have investigated whether variation in the CASP8 gene region alters cancer risk, including cancers of the lung (16), breast (17), pancreas (18), non-Hodgkin lymphoma (19), squamous head and neck cancers (20) and others (10,21). Although a recent GWAS showed that rs13016963 was significantly associated with risk of melanoma (22), most previous studies tested ostensibly functional variants in the CASP8 promoter (−652 6N del, rs3834129), a variant that leads to an amino acid change in the CASP8 gene product (D302H, rs1045485), or a base substitution in the 3′UTR (Ex14–271 A>T). Overall, these studies have produced mixed results, primarily due to their small size and inadequate replication efforts (10,21,23), but they did include one finding of a significant association for rs3834129 with ESCC in Han Chinese (11). However, we found no evidence for an association between rs3834129 and risk of ESCC. rs6747918, which is highly correlated with rs3834129 in Han Chinese (r2= 0.8), had an OR (95% CI) of 1.06 (0.96–1.17) in our study (P= 0.256).
Two previous studies have investigated associations between risk of ESCC and other variants in the 3′ end of CASP8 or further downstream in the region of ALS2CR12 and TRAK2. A small study from our group suggested an association with rs1406121 in ALS2CR12 (24), but this finding was not replicated in a sample of 300 ESCC cases and matched controls (25). In our current larger data set, a proxy for rs1406121 [rs7577057, r2= 0.963 in the Beijing Han Chinese (CHB) population] showed a nominal but not genome-wide significant association with an OR of 0.87 (0.81–0.93), P= 0.00015. This SNP has r2 (0.284) with the strongest signal (rs13016963) seen in our meta-analysis.
In India, Umar et al. (23) reported an association between risk of ESCC and the IVS12–19G>A (rs3769818) variant in CASP8 [per allele OR 3.36 (1.07–10.61), P= 0.039]. In our data, we did not genotype rs3769818 directly, but rs13016963 is in high LD with rs3769818 (r2= 0.733 in the NCI study population) and showed a strong association (Table 1). Furthermore, Umar suggested that the association was limited to men. In our data, we observed no significant difference in the association for men and women [ORmen (95% CI) = 1.30 (1.15–1.46), P= 1.49 × 10−5; ORwomen (95% CI) = 1.20 (1.03–1.40), P= 0.023].
CASP8 encodes caspase-8, a cysteine-aspartic acid protease that in its mature form initiates apoptosis and in its immature form, procaspase-8, helps control cell migration and adhesion (26). Genetic variants altering caspase-8 expression or function have the potential to affect tumorigenesis through either of these pathways. Earlier work from our group showed loss of CASP8 expression from both tumors and esophageal squamous dysplasia (27), the precursor lesion for ESCC. The association signal we see at 2q33 could also be attributed to other genes in the region, ALS2CR12 or TRAK2. The ALS2CR12 protein is strongly expressed in normal squamous esophageal tissue (http://www.proteinatlas.org/ENSG00000155749/normal/esophagus) and is named as a candidate gene for juvenile amyotrophic lateral sclerosis (28), but the connection to cancer is unclear. The SNPs associated with ESCC in this study could also be tagged to TRAK2 (29).
Our meta-analysis identified an association between ESCC risk and a single locus at 2q33 that harbors a plausible candidate gene, CASP8 in ESCC. This locus appears to play a role in the risk of other cancers, but the pattern of association between variants at this locus and different cancer types has varied widely in previous studies, similar to what has been observed for the TERT/CLPTM1L locus on 5p15 (30). Future studies of esophageal and other cancers should focus on comprehensive sequencing of this 2q33 locus and functional analysis of rs13016963 and rs10201587and other strongly correlated variants.
The two studies contributing data to this meta-analysis have been described in detail in the original publications (6,7). After publication, additional GWAS data were available for 133 ESCC cases and 615 controls in one scan and these were included in this analysis (6). The newly genotyped additional data were processed by the Core Genotyping Facility GWAS pipeline and similar quality control (QC) filtering metrics were applied to ensure good quality data retained in the downstream analysis. We excluded the following: (i) samples with missing rates >6%; (ii) loci with missing rates >5%; (iii) samples with abnormal mean heterozygosity values of either >30 or 25%; (iv) gender discordant samples; (v) unexpected duplicate pairs. In total, this analysis used data on 2961 ESCC cases and 3400 controls, including 2024 cases and 2708 controls from one scan (vi) and 937 cases and 692 controls from the other (vii). The NCI scan used the Illumina 660W Quad array for genotyping, whereas the China scan used the Illumina 610 Quad array. The details of these methods and the quality assurance and QC metrics are available in the prior publications (6,7).
From each study, variants with a nominal P-value <0.05 from a two-sided linear trend test were tabulated and beta estimates and standard errors sent to the NCI analytic core. From the China scan, this constituted 29 971 SNPs, and from the NCI scan this constituted 26 177 SNPs.
We designed four TaqMan assays for rs10201587, rs13016963, rs7578456 and rs9288318, respectively, and genotyped a total of 340 samples randomly chosen from the Shanxi or Singapore studies, out of which 303 samples were previously genotyped on the Illumina 660W arrays and passed after genotyping QC filtering. The overall concordance rate was >99.7% between the TaqMan data and GWAS data for these four SNPs in the 303 samples.
We performed a fixed effect meta-analysis using the inverse variance method to estimate the combined ORs and 95% CIs. The P-value for heterogeneity was calculated by Cochran's Q, which is distributed as a χ2 statistic with 1 degree of freedom (df).
For the imputation reference set, we used a combination of 60 CHB+JPT subjects from 1000 genomes low coverage July 2010 data set (31), 29 additional HapMap CHB+JPT subjects and an internal imputation data set of 74 subjects scanned with the Illumina 2.5M chip (13). The IMPUTE2 program (12) was used with the recommended default settings to impute a 3 Mb window (200–203 Mb, hg18) on chr2, which encompasses our newly discovered locus. The association for the imputed SNPs was analyzed using SNPTEST (31) based on allelic dosage for the genotypic term, including adjustments for study, age, sex and the first principal component from the PCA of the population stratification SNPs.
To identify recombination hotspots in the region, we used SequenceLDhot (32), a program that uses the approximate marginal likelihood method (33) and calculates likelihood ratio statistics at a set of possible hotspots. From the controls, 100 individuals were sampled five times without replacement for five independent recombination hotspot analyses and these five random samples are represented as five different colored lines in Figure 1. Specifically, genotypes of 90 SNPs spanning chr2: 201 432 605–202 408 291 were phased using PHASE v2.1 (15) to calculate background recombination rates. The PHASE outcome was used as direct input for the SequenceLDhot program. For the plot, the likelihood ratio statistic values between 201783605 and 202283605 were plotted. The reason we tested a wider range than the plot was to capture flanking hotspots that contained all highly significant signals. The LD was calculated as r2 for 55 SNPs that were genotyped in both data sets within an ~473 kb region bounded by rs12693932 and rs3731707 (chr2: 201 801 640–202 275 009, UCSC genome build hg18), and a heat map was drawn using the snp.plotter program (34).
The genotype data from the NCI scan for 4732 individuals on 53 loci in the 2q33 region were analyzed with the PHASE v2.1.1 program (15) to statistically infer all probable pairs of haplotypes for each individual. The most probable pair of haplotypes for each subject was selected for analysis in the Margarita program (14). A total of 141 (<3%) individuals with ambiguous phasing and a highest haplotype pair probability of <50% were excluded; 4591 (1974 cases and 2617 controls) remained in the downstream ARG analysis. An ensemble of 100 ARGs was inferred and 1 000 000 permutations were done for the best cut (determined by the allelic test P-value) at each marginal tree to estimate the P-value at each locus. Haplogroups with frequency >1% were categorized into protective or at-risk groups.
Conflict of Interest statement. The authors have no conflicts of interest to declare.
This work was supported by the Xinxiang Medical University Key Scientific Program (2009-5), the National Natural Science Foundations of China (30670956, 30971133), 863 HighTech Key Projects (2006AA02A403, 2007AA02Z161), China Key Program on Basic Research (2007CB516812), Special Scientific Programs from Science and Technology Department (2009-8), Health Department (2009-10) and Education Department (2008-7) of Henan Province and the Anhui Provincial Special Scientific Program (2007-7). The Shanghai Men's Health Study (SMHS) was supported by the National Cancer Institute extramural research grant (R01 CA82729). The Shanghai Women's Health Study (SWHS) was supported by the National Cancer Institute extramural research grant (R37 CA70837) and, partially for biological sample collection, National Cancer Institute Intramural Research Program contract NO2-CP-11010 with Vanderbilt University. The Singapore Chinese Health Study (SCHS) was supported by the National Cancer Institute extramural research grants (R01 CA55069, R35 CA53890, R01 CA80205 and R01 CA144034). The Shanxi Upper Gastrointestinal Cancer Genetics Project was supported by the National Cancer Institute Intramural Research Program contract NO2-SC-66211 with the Shanxi Cancer Hospital and Institute, Taiyuan, Shanxi, China. The Nutrition Intervention Trials (NIT) were supported by National Cancer Institute Intramural Research Program contracts NO1-SC-91030 and HHSN261200477001C with the Cancer Institute of the Chinese Academy of Medical Sciences, Beijing, China. This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics.