On the basis of the initial genotyping analysis, a new SNP, rs9679290, was found to be associated with RCC risk at a level approaching the threshold for genome-wide significance (P = 5.75 × 10−8, per-allele OR = 1.27, 95% CI: 1.17–1.39) (Fig. A), whereas, in this subset of the GWAS data set, the signals for the two highly correlated SNPs (rs11894252 and rs7579899) identified in the previous GWAS were not as strong (P = 1.35 × 10−3, per-allele OR = 1.17, 95% CI: 1.06–1.28 and P = 2.13 × 10−3, per-allele OR = 1.16, 95% CI: 1.05–1.28, respectively) (Table ). Notably, rs9679290 is not correlated with either rs11894252 or rs7579899 (r2 < 0.1) (Supplementary Material, Table S1).
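The reported odds ratios, confidence intervals and P-values can be cross-checked against one another. The following is a minimal sketch, assuming the intervals are Wald intervals that are symmetric on the log-odds scale (the text does not state how they were computed); the function name `wald_p_from_or_ci` is illustrative and not part of the original analysis.

```python
import math

def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald P-value from an OR and its 95% CI.

    Assumption: the CI was built as log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return se, z, p

# Per-allele estimates reported in the text.
reported = [
    ("rs9679290", 1.27, 1.17, 1.39),
    ("rs11894252", 1.17, 1.06, 1.28),
]
for snp, or_, low, high in reported:
    se, z, p = wald_p_from_or_ci(or_, low, high)
    print(f"{snp}: SE(log OR) = {se:.4f}, z = {z:.2f}, P ~ {p:.2e}")
```

Because the published ORs and interval limits are rounded to two decimals, the recovered P-values should match the reported ones only in order of magnitude, not digit for digit.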
Figure 2. Association results and LD plots for the 2p21 region. (A) The P-values (–log10 scale) of association tests from the SNPs of the initial genotyping analysis (red dots) and from the analysis using imputed SNPs (blue dots). (B) The P-values of the ...
SNPs selected for three stage fine mapping of 2p21 region
To investigate the possibility that a more complex genetic architecture underlies the association with chromosome 2p21 (18), we imputed genotypes across the 120 kb surrounding rs9679290 (which included the two previously reported SNPs) using two publicly available reference data sets: the 1000 Genomes Project March 2010 release (http://www.1000genomes.org/page.php) and Phase III HapMap (19). Of the 304 imputed SNPs tested by association analysis, we observed a promising new signal at rs4953348, which is highly correlated with rs9679290 (P = 2.77 × 10−14, per-allele OR = 1.37, 95% CI: 1.27–1.48) (Fig. A). We did not observe any new significant associations with rare variants [minor allele frequency (MAF) less than 5%] in our data set.
Four correlated SNPs (rs4953348, rs4953346, rs10208823 and rs12617313) among the top hits were genotyped in five studies [American Cancer Society Cancer Prevention Study II Nutrition Cohort (CPS-II) was added to PLCO, ATBC, CEERCC and USKC] (2481 cases and 4203 controls) to validate the association signals (Fig. ). For the five studies combined, rs12617313 achieved genome-wide significance (P = 1.72 × 10−9) (Fig. B and Table ).
Forest plots and heterogeneity tests for the four validated EPAS1 SNPs.
A conditional analysis was performed to determine whether the effect observed for rs12617313 was independent of the previously reported markers in the GWAS, namely rs11894252 and rs7579899 (see Materials and Methods). When adjusted for rs12617313, the associations for rs11894252 and rs7579899 were attenuated (from P = 1.35 × 10−3 and P = 2.12 × 10−3 to P = 4.09 × 10−1 and P = 4.69 × 10−1, respectively). When the two newly genotyped SNPs, rs12617313 and rs4953346, were evaluated in an analysis conditioned on rs11894252, the association signals remained significant within the region (P = 3.08 × 10−4 and 2.70 × 10−4, respectively). A comparable finding was observed after conditioning on rs7579899 (data not shown).
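The logic of a conditional analysis can be illustrated with a small logistic regression in which one marker is tested with and without adjustment for another. The following is a minimal sketch on hypothetical counts, not the study data or the authors' software: the marker names `snp_a` and `snp_b`, the binary carrier coding and the cell counts are all assumptions chosen so that `snp_a` is associated with disease only through its correlation with `snp_b`.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(rows, y, n_iter=25):
    """Newton-Raphson logistic regression; returns (coefficients, Wald P-values)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                grad[i] += (yi - mu) * x[i]
                for j in range(k):
                    info[i][j] += mu * (1.0 - mu) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors from the inverse information matrix of the last iteration.
    pvals = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se = math.sqrt(solve(info, unit)[i])
        pvals.append(math.erfc(abs(beta[i] / se) / math.sqrt(2.0)))
    return beta, pvals

# Hypothetical cells: (snp_a, snp_b, cases, controls). Risk depends on snp_b only.
cells = [(0, 0, 40, 80), (1, 0, 10, 20), (0, 1, 10, 10), (1, 1, 40, 40)]
data = [(a, b, 1) for a, b, ca, co in cells for _ in range(ca)]
data += [(a, b, 0) for a, b, ca, co in cells for _ in range(co)]
y = [d[2] for d in data]

beta_marg, p_marg = fit_logistic([[1.0, d[0]] for d in data], y)
beta_cond, p_cond = fit_logistic([[1.0, d[0], d[1]] for d in data], y)
print(f"snp_a unadjusted:         OR = {math.exp(beta_marg[1]):.2f}, P = {p_marg[1]:.3f}")
print(f"snp_a adjusted for snp_b: OR = {math.exp(beta_cond[1]):.2f}, P = {p_cond[1]:.3f}")
```

In this toy example the unadjusted odds ratio for `snp_a` is 1.5, and it falls to 1.0 once `snp_b` is in the model, which is the pattern of attenuation described above. A per-allele analysis would code each SNP as 0, 1 or 2 copies of the risk allele and would typically include study covariates.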
To investigate whether the previously reported GWAS locus and our new markers were independent, we conducted an interaction test between rs11894252 and rs12617313, by fitting a logistic regression model that includes the main effects of both SNPs and their interaction term and covariates. The result showed that the interaction term was not significant (P = 0.45). Additionally, we performed an analysis of sets of SNPs to examine whether additional SNPs across this region might capture or explain the new signal we are reporting (rs4953346 and rs12617313) as well as the reported GWAS signal (marked by rs11894252). We fit a logistic regression model including a set of SNPs drawn from each of two regions of interest (rs12617313 and rs11894252) and six additional SNPs selected so that the pair-wise linkage disequilibrium (LD) (r2) among all the SNPs in the set was ≤0.2. None of the SNPs except rs12617313 (P = 0.007) showed a significant association (P < 0.1) with RCC risk. Similar analyses using r2 thresholds of 0.4 and 0.6 revealed similar results, suggesting two or more signals.
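The selection of a SNP set under a pairwise LD ceiling can be sketched as a greedy filter. This is an illustration only, under two assumptions: r2 is approximated by the squared Pearson correlation of genotype dosages (0/1/2) rather than estimated from phased haplotypes, and the marker names and genotypes below are hypothetical. The helper names `r_squared` and `prune_by_ld` are not from the original analysis, and the sketch covers only the selection step, not the subsequent joint logistic model.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return cov * cov / (v1 * v2)

def prune_by_ld(genotypes, anchors, max_r2=0.2):
    """Start from the anchor SNPs and greedily add candidates whose r2
    with every SNP already selected is <= max_r2."""
    selected = list(anchors)
    for snp in genotypes:
        if snp in selected:
            continue
        if all(r_squared(genotypes[snp], genotypes[s]) <= max_r2 for s in selected):
            selected.append(snp)
    return selected

# Hypothetical dosages for eight individuals.
genotypes = {
    "lead1": [0, 0, 0, 0, 1, 1, 2, 2],  # stands in for one region's lead SNP
    "lead2": [0, 1, 2, 0, 1, 0, 1, 0],  # stands in for the other region's lead SNP
    "proxy": [0, 0, 0, 0, 1, 1, 2, 2],  # perfect proxy of lead1, so it is dropped
    "cand1": [0, 1, 0, 1, 0, 1, 0, 1],  # r2 with lead2 is about 0.29, so it is dropped
    "cand2": [0, 0, 1, 1, 0, 0, 1, 1],  # weakly correlated with both leads, so it is kept
}
kept = prune_by_ld(genotypes, anchors=["lead1", "lead2"], max_r2=0.2)
print(kept)
```

Raising `max_r2` to 0.4 or 0.6 admits more correlated markers into the set, which corresponds to the sensitivity analyses described above.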
The two previously reported GWAS SNPs, rs11894252 and rs7579899, are strongly correlated (r2 = 1.0), but minimally correlated with the two SNPs that we identified by imputation, rs12617313 and rs4953346 (Fig. C; Supplementary Material, Table S1). In an analysis of 4203 controls, we used SequenceLDhot and identified strong evidence of a recombination hotspot separating the SNPs identified in the GWAS from the new SNPs reported here, rs12617313 and rs4953346 (Fig. C and Supplementary Material, Fig. S1).
To further investigate interactions between smoking status and genetic variants in EPAS1, which were reported by the previous GWAS (13), we conducted a series of pooled analyses stratified by smoking for the two sets of EPAS1 variants (see Materials and Methods). There was a notable interaction between the GWAS SNP (rs7579899) and smoking (P-interaction = 0.036) (Supplementary Material, Table S2). In contrast, no interaction was identified for the two new SNPs (rs12617313 and rs4953346) and smoking (P-interaction = 0.272 and 0.378, respectively).
Since 80% of clear cell RCCs are reported to have VHL somatic inactivation through either genetic or epigenetic mechanisms (21), the entire coding regions of VHL in 507 RCCs were sequenced to investigate whether common germline variants in EPAS1 were associated with VHL alterations in RCCs (see Materials and Methods). We observed that cases with germline EPAS1 variants in the new region were more likely to have tumor VHL alterations, with the strongest association observed for rs12617313 (P = 0.006, OR = 1.82, 95% CI: 1.19–2.80) (Supplementary Material, Table S3). Notably, the high-risk allele, A, was associated with VHL alterations. In contrast, germline EPAS1 variants identified by GWAS were not associated with VHL alterations in RCCs (P > 0.2). No change in results was observed after adjustment for stage or grade.
In a re-sequencing analysis of the 16 coding exons, the 5′ untranslated region (5′ UTR) and exon–intron junctions of EPAS1 (GenBank NM_001430) in 94 cases of RCC (see Materials and Methods), we identified a common synonymous coding variant (c.1908T>C; N636N) in exon 12. This variant, together with two novel 5′ UTR variants (c.-58insC and c.-140G>A), was confirmed in 100 CEPH (Centre d'Etudes du Polymorphisme Humain) controls. Given their low MAF, these variants were not strongly correlated with the new SNPs described in our fine-mapping study (data not shown). The lack of observed coding variation is consistent with the high degree of coding sequence conservation of EPAS1 across species and with the paucity of common EPAS1 missense variants in the public SNP database (http://www.ncbi.nlm.gov/projects/SNP). Furthermore, nucleotide sequence alignment showed that the non-coding EPAS1 region containing the high-risk SNPs (rs12617313, rs4953346 and rs9679290) is evolutionarily conserved among species. As the strongest new signals were clustered in intron 1 of EPAS1, we searched for putative regulatory elements using ORegAnno and other public databases. We identified two transcription factor binding sites (p53 and CCCTC-binding factor) adjacent to the new high-risk variants and a distant iron-responsive element (IRE) in the 5′ UTR of the mRNA that regulates HIF2α (Fig. D). The new signals are not in LD with variants in the transcription factor binding sites or the IRE. In addition, the public microRNA databases examined did not suggest that these are surrogate SNPs in strong LD with signals that map to microRNA coding sequences in chromosome 2p21.