We conducted a genome-wide association study of gastric cancer (GC) and esophageal squamous cell carcinoma (ESCC) in ethnic Chinese subjects in which we analyzed 551,152 single nucleotide polymorphisms (SNPs). We report a combined analysis of 2,240 GC cases, 2,115 ESCC cases, and 3,302 controls drawn from five studies. In logistic regression models adjusted for age, sex, and study, multiple variants at 10q23 reached genome-wide significance for GC and ESCC independently. A notable signal was rs2274223, a nonsynonymous SNP located in PLCE1, for GC (P = 8.40 × 10⁻⁹; per-allele odds ratio (OR) = 1.31) and ESCC (P = 3.85 × 10⁻⁹; OR = 1.34). The association with GC differed by anatomic subsite: it was stronger for tumors located in the cardia (P = 4.19 × 10⁻¹⁵; OR = 1.57) and absent for those located in the noncardia stomach (P = 0.44; OR = 1.05). Our findings at 10q23 could provide insight into the high incidence rates of both cancers in China.
Gastric cancer and esophageal cancer cause more than 700,000 and 400,000 deaths, respectively, each year and are the second and sixth leading causes of cancer death worldwide¹. For GC, infection with Helicobacter pylori is the primary etiologic factor in all populations, although most infected individuals do not develop cancer. Tobacco smoking and alcohol drinking explain nearly 90% of ESCC cases in the United States and other Western countries², but these exposures are minor factors in high-risk populations in China³ and elsewhere⁴. Risk factors for ESCC in populations with high incidence rates include family history⁵ and dietary deficiencies⁶, but a large proportion of the etiology in these populations remains unexplained. GC and ESCC occur in the Taihang Mountains of North-Central China at some of the highest rates reported for any cancer⁷; over 20% of all deaths in this area have been attributed to these cancers⁸,⁹. However, the causes of these high rates, and of the geographic correlation between these two anatomically adjacent but histologically distinct tumors, have not been determined. Gastric cancers in this area occur primarily in the uppermost portion of the stomach (the proximal 3 cm) and are referred to as gastric cardia cancers, whereas those in the remainder of the stomach are referred to as gastric noncardia cancers. In most other parts of China, gastric noncardia cancers are the predominant upper gastrointestinal tract tumors¹⁰.
To investigate the genetic contribution to these highly fatal diseases in ethnic Chinese subjects, we conducted parallel genome-wide association studies (GWAS) for GC and ESCC with shared controls. Using the Illumina 660W Quad chip, we scanned 4,987 samples from the case-control and case-only components of the Shanxi Upper Gastrointestinal Cancer Genetics Project (Shanxi) and 1,389 samples from a prospective cohort, the Linxian Nutrition Intervention Trials (NIT); both studies were conducted in the Taihang Mountains (Supplementary Table 1). After quality control metrics were applied (Online Methods), 551,152 SNPs were analyzed in 1,625 cases of GC, 1,898 cases of ESCC and 2,100 controls. We used 12,000 SNPs with minimal linkage disequilibrium (pairwise r² < 0.004) to test for population substructure¹¹ and found no significant evidence of substructure within study (data not shown). In a second phase, we optimized TaqMan assays to genotype eight SNPs that were at or near genome-wide significance for GC, ESCC, or both in the genome-wide phase, using an independent set of subjects (615 GC, 217 ESCC and 1,202 controls) from the Shanxi and NIT studies and three additional prospective cohorts (the Shanghai Men's Health Study, the Shanghai Women's Health Study, and the Singapore Chinese Health Study) (Supplementary Table 1). For these eight SNPs, we conducted a combined analysis of 2,240 GC cases, 2,115 ESCC cases, and 3,302 controls (details in Supplementary Table 1).
The results of the initial GWAS for GC and ESCC, which were analyzed independently, are presented as Manhattan plots in Supplementary Figure 1, using P-values from 1-df trend tests in logistic regression models adjusted for age, sex, and study. We found independent genome-wide significant associations at chromosome 10q23 for both GC and ESCC (Tables 1 and 2, Fig. 1). For GC, an association initially observed at chromosome 1q22 was not supported in the combined data (Table 1); additional studies are required to determine whether this locus contributes to GC risk in ethnic Chinese populations.
At 10q23, we analyzed a set of five correlated SNPs in both GC and ESCC, including two nonsynonymous variants. The strongest association for GC was with rs3781264 (P = 3.76 × 10⁻⁹; per-allele OR = 1.36, 95% c.i. 1.23–1.50). The other four SNPs at 10q23 also reached genome-wide significance (Table 1). The associations differed when gastric cancers were divided into the two anatomic subsites. The strongest association for gastric cardia cancer was with rs2274223 (P = 4.19 × 10⁻¹⁵; OR = 1.57, 95% c.i. 1.40–1.76), but there was no association for gastric noncardia cancer (P = 0.44; OR = 1.05, 95% c.i. 0.93–1.20). rs2274223 and other SNPs at 10q23 also reached genome-wide significance for ESCC (P = 3.85 × 10⁻⁹; OR = 1.34, 95% c.i. 1.22–1.48) (Table 2). Results were consistent between the two studies from the high-incidence areas of the Taihang Mountains (Supplementary Table 2). The five SNPs at 10q23, which are in strong pairwise LD (r² from 0.62 to 0.98 in controls), map to the phospholipase C epsilon 1 gene (PLCE1), which lies adjacent to the nucleolar complex associated 3 homolog gene (NOC3L) (Fig. 1).
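The ORs and 95% confidence intervals above come from logistic regression, where the interval is exp(β ± 1.96·SE) on the log-odds scale. The sketch below, assuming symmetric Wald intervals, back-calculates the standard error and an approximate two-sided P-value from the reported numbers and compares it with the conventional genome-wide threshold of 5 × 10⁻⁸ (the threshold is an assumption; the text says "genome-wide significance" without stating a cutoff). The function name is hypothetical, and because the reported values are rounded to two decimals, the recovered P-values are only approximations (roughly 10⁻¹⁴ rather than 4.19 × 10⁻¹⁵ for cardia cancer).

```python
import math

# Conventional genome-wide significance cutoff (assumed, not stated in the text).
GENOME_WIDE_ALPHA = 5e-8
Z_975 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate log-OR, SE, z and two-sided P from a reported OR and 95% CI.

    Assumes the interval is a symmetric Wald interval on the log-odds scale,
    i.e. CI = exp(beta +/- 1.96 * SE), as produced by logistic regression.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = beta / se
    # Two-sided normal tail; erfc keeps precision for very small P-values.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


if __name__ == "__main__":
    reported = {
        "cardia, rs2274223": (1.57, 1.40, 1.76),
        "noncardia, rs2274223": (1.05, 0.93, 1.20),
        "ESCC, rs2274223": (1.34, 1.22, 1.48),
    }
    for label, (or_, lo, hi) in reported.items():
        beta, se, z, p = wald_from_or_ci(or_, lo, hi)
        status = "below" if p < GENOME_WIDE_ALPHA else "above"
        print(f"{label}: beta={beta:.3f} SE={se:.3f} z={z:.2f} P~{p:.2g} ({status} 5e-8)")
```

Run as a script, this places the cardia and ESCC estimates well below 5 × 10⁻⁸ and the noncardia estimate near 0.45, in line with the reported pattern.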
The SNPs at 10q23 in PLCE1 that showed significant associations with GC and ESCC include two missense variants in the coding region, rs2274223 (Arg1927His) and rs3765524 (Ile1777Thr). Further work is required to determine whether either of these SNPs is functionally important, but the findings suggest a single locus associated with risk for both cancers. Notably, when gastric cancers were divided by anatomic location, the association was restricted to tumors of the cardia (Table 1).
PLCE1 is a member of the phospholipase C family of proteins and, uniquely within this family, it interacts with the proto-oncogene product Ras¹², among other proteins. Variants in PLCE1 are known to cause early-onset nephrotic syndrome in humans¹³, but the gene may also be involved in carcinogenesis. PLCE1 knockout mice are resistant to the tumor-promoting effects of 12-O-tetradecanoylphorbol-13-acetate in 7,12-dimethylbenz[a]anthracene-induced skin carcinogenesis¹⁴ and are resistant to intestinal tumor formation when crossed with ApcMin/+ mice¹⁵. In addition, the SNPs lie in a region between two recombination hot spots that also includes NOC3L, which has been linked to control of DNA replication during mitotic clonal expansion¹⁶.
For ESCC, in the first phase we observed a strong independent association with rs738722 at chromosome 22q12 (P = 5.67 × 10⁻⁸; OR = 1.32, 95% c.i. 1.19–1.45) (Table 2), but the association was not statistically significant in the second phase alone. In the combined data the association remained strong (P = 1.41 × 10⁻⁸; OR = 1.30, 95% c.i. 1.19–1.43). This SNP maps to a region within the CHK2 checkpoint homolog gene (CHEK2) but is also in LD with regions of the HscB iron-sulfur cluster co-chaperone homolog gene (HSCB) (Supplementary Fig. 2). Previous studies in Caucasian populations have suggested an association between uncommon variants in CHEK2 (rs2267130 and rs17879961) and risk of upper aerodigestive tract cancers¹⁷,¹⁸, but these SNPs were not included in our scan. Rare variants in CHEK2 have also been associated with susceptibility to breast¹⁹, colorectal, and other cancers²⁰. This association appears promising, but because it lacks independent replication, further studies are needed to validate it.
We also examined loci previously reported in a GWAS²¹ of GC (Supplementary Table 3). Specifically, we examined rs2920297 and rs2294008 at 8q24, both near the prostate stem cell antigen gene (PSCA). We found no associations for total GC, but when we restricted our analysis to gastric noncardia tumors, both SNPs showed associations of similar magnitude to those reported in a recent meta-analysis of East Asian studies²² (for example, rs2294008 OR = 1.35, 95% c.i. 0.94–1.94), although the confidence interval includes 1. For ESCC, we also examined SNPs marking the alcohol-metabolizing genes ADH1B (rs1159918 and rs1042026) and ALDH2 (rs3782886 and rs671) that have been reported in candidate gene studies²³ and in a GWAS²⁴. Overall and in strata defined by alcohol drinking and tobacco smoking, we found no associations with these SNPs (Supplementary Table 4), perhaps because the environmental risk factors for ESCC in our study populations differ from those in previous studies with strong alcohol- and tobacco-related risks. In the Shanxi⁵ and NIT³ studies, the only two studies included in this portion of the analysis, alcoholic beverage and tobacco use are not major ESCC risk factors.
In summary, we conducted parallel genome-wide association studies for GC and ESCC in ethnic Chinese subjects. Variants at 10q23 in PLCE1 showed genome-wide significant associations with gastric cardia cancer and ESCC. These findings suggest that a common genetic mechanism may contribute to the etiology of both cancers. Fine mapping and sequencing of these loci will be required to identify the optimal genetic variants for laboratory studies that could explain these association signals. Additional studies are needed to confirm these findings and to discover additional loci associated with risk for GC and ESCC in populations in East Asia and elsewhere²⁵.
Study participants for the GWAS were drawn from two studies, the Shanxi Upper Gastrointestinal Cancer Genetics Project (Shanxi) and the Linxian Nutrition Intervention Trial (NIT), a prospective cohort. For the second phase, we genotyped additional subjects from Shanxi and NIT as well as subjects from the Shanghai Men's Health Study (SMHS), the Shanghai Women's Health Study (SWHS), and the Singapore Chinese Health Study (SCHS) (Supplementary Table 1). In the case-control portion of the Shanxi study, controls were matched to cases on age and sex, while NIT controls were selected using a case-cohort design and frequency-matched on age and sex. For the SMHS, SWHS, and SCHS cohorts, controls were alive, free of upper gastrointestinal tract cancer, and matched to cases as described in Supplementary Table 1. For the Shanxi and NIT studies, tumor anatomic location was known for all cases and >85% of cases had pathologic confirmation. For the three cohorts added in the second phase, the proportion of gastric cancers with known anatomic location is given in Supplementary Table 1, and pathologic confirmation was available for >95% of cases. All examined esophageal cancers were squamous cell carcinomas (ESCC) and all examined gastric cancers (GC) were adenocarcinomas. Cardia cancers were located in the proximal 3 cm of the stomach, while noncardia cancers were those in the remainder of the stomach. Gastric cancers without location information were included in total GC analyses but excluded from GC anatomic subsite analyses.
Each of the five participating studies obtained informed consent from subjects and approval from its Institutional Review Board(s). The NCI Special Studies Institutional Review Board approved the overall GWAS.
Genome-wide scanning was attempted on 6,384 samples using the Illumina 660W Quad chip. After excluding 8 samples with no observed intensity data, we analyzed the remaining 6,376 samples, including 4,987 from the Shanxi study and 1,389 from the NIT. Genotype clustering was performed jointly with 1,270 previously scanned Caucasian samples to improve calling of low minor allele frequency (MAF) SNPs in the East Asian samples.
Participants were excluded because of: 1) completion rate lower than 94% (n = 485 samples); 2) abnormal heterozygosity, less than 25% or greater than 30% (n = 53, of which 36 were also excluded for low completion rates); 3) discordant expected duplicates (n = 3 pairs); 4) concordant unexpected duplicates (n = 5 pairs, all from Shanxi); 5) gender discordance (n = 55, all from Shanxi); and 6) phenotype exclusions due to ineligibility or incomplete information (n = 46). We checked for relatedness among study subjects using genotypes for all subject pairs with identity-by-state greater than 45%. These pairs were input into the GLU qc.ibds module (http://code.google.com/p/glu-genetics/) to estimate identity-by-descent proportions and infer the degree of relatedness (first or second degree). We found 20 full-sibling pairs, 2 parent-child pairs, and 22 half-sibling pairs. This level of relatedness was not surprising given the geographic proximity of subjects accrued in the two studies. For each of the 22 first-degree relative pairs, we retained one member and excluded the other from the PCA, but all subjects were included in the association analyses. For the 132 known duplicate pairs, genotype concordance was 99.98%.
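The two numeric sample filters and the relative-pair exclusion described above can be sketched as follows. Only the thresholds (94% completion, 25–30% heterozygosity) come from the text; the record layout, field names, and the greedy rule for choosing which relative to drop are hypothetical, for illustration only. Recording every failed reason per sample mirrors how the text counts overlap (36 samples failing both checks).

```python
from dataclasses import dataclass

# Thresholds from the text; the data layout below is hypothetical.
MIN_COMPLETION = 0.94
HET_RANGE = (0.25, 0.30)


@dataclass
class SampleQC:
    sample_id: str
    completion: float      # fraction of attempted SNPs with a genotype call
    heterozygosity: float  # fraction of called SNPs that are heterozygous


def qc_exclusions(samples):
    """Return {sample_id: [reasons]} for samples failing completion or heterozygosity."""
    failures = {}
    for s in samples:
        reasons = []
        if s.completion < MIN_COMPLETION:
            reasons.append("low completion")
        if not (HET_RANGE[0] <= s.heterozygosity <= HET_RANGE[1]):
            reasons.append("abnormal heterozygosity")
        if reasons:
            failures[s.sample_id] = reasons
    return failures


def prune_related(pairs):
    """Return the set of samples to drop so each related pair keeps one member.

    Hypothetical greedy rule: the second member of a pair is dropped unless
    one member was already dropped, in which case the pair is resolved.
    """
    dropped = set()
    for a, b in pairs:
        if a in dropped or b in dropped:
            continue
        dropped.add(b)
    return dropped
```

In the study, the set returned by a rule like `prune_related` would apply only to the PCA; all subjects stayed in the association analyses.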
Using 12,000 SNPs in low linkage disequilibrium (pairwise r² < 0.004)¹¹, we identified and excluded two subjects with less than 90% Asian ancestry based on STRUCTURE analysis (http://pritch.bsd.uchicago.edu/structure.html)²⁶ (Supplementary Fig. 3). Among Shanxi and NIT study subjects passing the QC metrics, principal component analysis showed borderline significant differences between studies but not within studies²⁷. We therefore adjusted for study in all subsequent analyses.
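The text does not describe how the 12,000 low-LD SNPs were selected. One common approach is greedy pruning on genotype r², sketched below under that assumption with hypothetical function names: each SNP is kept only if its squared correlation with every SNP already kept is below the threshold (0.004 here, as in the text). Tools such as PLINK instead prune within sliding windows, which scales to genome-wide data; this all-pairs version is only meant to show the criterion.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Returns 0.0 when either SNP is monomorphic, since the correlation is
    undefined; monomorphic SNPs would normally be filtered out beforehand.
    """
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov * cov / (v1 * v2)


def greedy_ld_prune(snps, max_r2=0.004):
    """Keep a SNP only if its r^2 with every already-kept SNP is below max_r2.

    snps: list of (snp_id, dosages) in the order they should be considered.
    Returns the ids of the kept SNPs, in input order.
    """
    kept = []
    for snp_id, dosages in snps:
        if all(r_squared(dosages, kept_dosages) < max_r2 for _, kept_dosages in kept):
            kept.append((snp_id, dosages))
    return [snp_id for snp_id, _ in kept]
```

Because r² is symmetric in allele coding, a SNP in perfect negative correlation with a kept SNP is pruned just like an exact copy.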
For all subjects in the genome-wide scan phase, we attempted 657,364 SNP assays. For analysis, we removed SNPs with a call rate below 90%, and 551,152 SNPs were advanced to the association analysis. Quantile-quantile plots (Supplementary Fig. 4) for the GC and ESCC case-control analyses, examined separately, showed no evidence of significant problems with population substructure or case-control matching: the unscaled genomic inflation factors (λ) for GC and ESCC were 0.990 and 0.989, respectively, while the corresponding λ₁₀₀₀ values (rescaled to a study of 1,000 cases and 1,000 controls)²⁸ were 0.995 and 0.994, respectively. The Illumina Infinium genotype cluster plots for selected SNPs (rs2274223 and rs3781264) are shown in Supplementary Figure 5.
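The two inflation statistics can be made concrete with a short sketch. λ is the median observed 1-df χ² statistic divided by the expected median of a 1-df χ² distribution (about 0.455), and λ₁₀₀₀ rescales the excess (λ − 1) by (1/n_cases + 1/n_controls) / (1/1000 + 1/1000). The function names are hypothetical, and the case and control counts used in the usage note are assumed to be the scan-phase numbers given earlier in the text.

```python
import statistics

# Median of a 1-df chi-square distribution: (Phi^-1(0.75))^2, about 0.4549.
MEDIAN_CHI2_1DF = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_lambda(chi2_stats):
    """Genomic inflation factor: observed median 1-df chi-square over its expected median."""
    return statistics.median(chi2_stats) / MEDIAN_CHI2_1DF


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale
```

With the scan-phase sample sizes (1,625 GC or 1,898 ESCC cases and 2,100 shared controls), `round(lambda_1000(0.990, 1625, 2100), 3)` gives 0.995 and `round(lambda_1000(0.989, 1898, 2100), 3)` gives 0.994, consistent with the reported values. Because both studies are larger than 1,000 per group, rescaling moves λ toward 1.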
After completion of the genome-wide phase, we selected six SNPs at 10q23, two at 1q22, and two at 22q12 for TaqMan genotyping in our second phase. All ten SNPs were at or near genome-wide significance for total GC, ESCC, or both. For the selected SNPs, we successfully optimized eight TaqMan assays (ABI), while two failed manufacturing or validation. For the second phase using TaqMan, we included samples from the Shanxi and NIT studies that had not been scanned or had failed QC metrics in the genome-wide phase, as well as samples from three prospective cohort studies of subjects of Chinese ethnicity (SMHS, SWHS, and SCHS) (Supplementary Table 1). In total, we completed TaqMan assays on 2,034 subjects. After standard quality control metrics were applied, the overall sample completion rate was 98.8%. Concordance between Illumina and TaqMan genotype calls was greater than 99.4%.
We used logistic regression models to estimate associations between genetic variants and disease risk. Primary models were adjusted for age (in 10-year groups), sex, and study. We report trend models (Tables 1 and 2) but also fit genotype models for comparison (Supplementary Table 3). All reported P-values are based on two-sided tests.
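The reported trend P-values come from logistic regression with additive (0/1/2) genotype coding, adjusted for covariates. As an unadjusted illustration only, the Cochran-Armitage trend test below is the classical 1-df test of a linear trend in case proportion across genotypes; it is asymptotically equivalent to the score test for the genotype term in an unadjusted additive logistic model. It does not reproduce the adjustment for age, sex, and study, which requires the regression model itself. The function name and the genotype counts in the usage note are hypothetical.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Unadjusted 1-df Cochran-Armitage trend test for a biallelic SNP.

    case_counts / control_counts: genotype counts ordered by the number of
    risk alleles (0, 1, 2). Returns (z, two-sided P); z > 0 means cases
    carry more risk alleles than controls.
    """
    r = sum(case_counts)     # total cases
    s = sum(control_counts)  # total controls
    n = r + s
    col = [a + b for a, b in zip(case_counts, control_counts)]  # genotype totals

    # Trend statistic: weighted difference between case and control counts.
    t = sum(w * (a * s - b * r) for w, a, b in zip(scores, case_counts, control_counts))

    # Variance of t under the null hypothesis of no trend.
    var = sum(w * w * c * (n - c) for w, c in zip(scores, col))
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            var -= 2 * scores[i] * scores[j] * col[i] * col[j]
    var *= r * s / n

    z = t / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p
```

For example, `cochran_armitage_trend((50, 200, 150), (150, 200, 50))` returns z = 10.0, while identical case and control genotype distributions give z = 0 and P = 1.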
In the second and combined phases, logistic regression models were adjusted for age, sex, and study. Because previous studies reported interactions between alcohol or tobacco consumption and variants marking the ADH1B or ALDH2 gene loci in relation to ESCC risk, we fit models both adjusted for and stratified by these factors (Supplementary Table 4).
Data analysis and management were performed with GLU (Genotyping Library and Utilities, version 1.0), an open-source suite of tools for the management, storage, and analysis of GWAS data.
The Shanghai Men's Health Study (SMHS) was supported by the National Cancer Institute extramural research grant [R01 CA82729]. The Shanghai Women's Health Study (SWHS) was supported by the National Cancer Institute extramural research grants [R37 CA70837 and 5R37CA070867-13, CANCER RISK REDUCTION AND DIET: A COHORT STUDY OF WOMEN, PI Wei Zheng] and, partially for biological sample collection, National Cancer Institute Intramural Research Program contract NO2-CP-11010 with Vanderbilt University. The studies would not have been possible without the continuing support and devotion of the study participants and staff of the SMHS and SWHS.
The Singapore Chinese Health Study (SCHS) was supported by the National Cancer Institute extramural research grants [R01 CA55069, R35 CA53890, R01 CA80205, and R01 CA144034]. We are indebted to Drs Mimi C Yu and Hin-Peng Lee for their contributions to the establishment of this cohort. The study would not have been possible without the assistance of the Ministry of Health in Singapore in identifying cancer cases through database linkage. We are indebted to the study subjects for their continuing participation and to the staff of the SCHS for their support.
The Shanxi Upper Gastrointestinal Cancer Genetics Project was supported by the National Cancer Institute Intramural Research Program contract NO2-SC-66211 with the Shanxi Cancer Hospital and Institute, Taiyuan, Shanxi, China.
The Nutrition Intervention Trials (NIT) were supported by National Cancer Institute Intramural Research Program contracts NO1-SC-91030 and HHSN261200477001C with the Cancer Institute of the Chinese Academy of Medical Sciences, Beijing, China.
This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, the Division of Cancer Epidemiology and Genetics, and the Center for Cancer Research.