To identify new risk variants for cutaneous basal cell carcinoma, we performed a genome-wide association study of 16 million SNPs identified through whole-genome sequencing of 457 Icelanders. We imputed genotypes for 41,675 Illumina SNP chip-typed Icelanders and their relatives. In the discovery phase, the strongest signal came from rs78378222[C] (odds ratio (OR) = 2.36, P = 5.2 × 10−17), which has a frequency of 0.0192 in the Icelandic population. We then confirmed this association in non-Icelandic samples (OR = 1.75, P = 0.0060; overall OR = 2.16, P = 2.2 × 10−20). rs78378222 is in the 3′ untranslated region of TP53 and changes the AATAAA polyadenylation signal to AATACA, resulting in impaired 3′-end processing of TP53 mRNA. Investigation of other tumor types identified associations of this SNP with prostate cancer (OR = 1.44, P = 2.4 × 10−6), glioma (OR = 2.35, P = 1.0 × 10−5) and colorectal adenoma (OR = 1.39, P = 1.6 × 10−4). However, we observed no effect for breast cancer, a common Li-Fraumeni syndrome tumor (OR = 1.06, P = 0.57, 95% confidence interval 0.88–1.27).
Basal cell carcinoma (BCC) is the most common cancer in people of European ancestry. Sun exposure is the primary risk factor for BCC, but genetic predisposition also plays a substantial role1,2. High-penetrance mutations in Hedgehog pathway genes (PTCH1, PTCH2 and SUFU) cause Gorlin syndrome, also known as basal cell nevus syndrome1. Candidate gene and genome-wide association studies have shown that common, low-penetrance susceptibility alleles exist at MC1R and several other loci3–7.
Previously, we described a large genome-wide association study of the Icelandic population using common SNPs and showed how genotypes can be phased over long distances8,9. For BCC, we initially generated Illumina SNP chip data for 1,366 affected individuals (cases) and 40,309 controls. Haplotype association analysis based on long-range phasing showed that several 0.3-cM haplotypes at 17p13 were strongly associated with BCC. The most significant signals were produced by haplotype A6 (OR = 2.04, P = 2.0 × 10−10), spanning the region chr17:7,186,095–7,425,536, and by a highly correlated haplotype, A8 (OR = 2.00, P = 3.0 × 10−10), spanning an adjacent region, chr17:7,431,901–7,680,389. The region covered by these haplotypes is illustrated in Figure 1.
To search for variants that might not be covered well by the chips, we used high-capacity DNA sequencing techniques to sequence the entire genomes of 457 Icelanders to an average depth of over 10× (Online Methods), which identified approximately 16 million SNPs. To ensure that all the rare risk alleles that might be carried on the A6 or A8 backgrounds would be sequenced, we included ten individuals who carried these haplotypes among the 457 individuals selected for sequencing. Using imputation assisted by long-range haplotype phasing, we used the sequence data to determine the genotypes of the 16 million SNPs in the 41,675 Icelanders who had been genotyped on the SNP chips. Moreover, knowledge of Icelandic genealogy allowed us to propagate genotypic information into individuals for whom we have neither SNP chip nor sequence data, a process we refer to as ‘genealogy-based in silico genotyping’. We refer to the combined method of imputing sequence-derived data into phased chromosomes from chip-typed individuals and using genealogy-based in silico genotyping to infer the sequence of ungenotyped individuals as ‘two-way imputation’ (Supplementary Note).
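As a toy illustration of how genotype information can be propagated through a pedigree, the sketch below computes the genotype probabilities of an ungenotyped child from the risk-allele counts of two genotyped parents under Mendelian transmission. It is not the method used in this study (that is described in the Supplementary Note); the function names and the single-trio setting are assumptions made purely for illustration.

```python
# Toy sketch only: Mendelian transmission in one parent-parent-child trio.
# All names here are hypothetical and not taken from the paper.

def transmit_prob(parent_dosage: int) -> float:
    """Probability that a parent carrying 0, 1 or 2 risk alleles transmits one."""
    if parent_dosage not in (0, 1, 2):
        raise ValueError("dosage must be 0, 1 or 2")
    return parent_dosage / 2.0


def offspring_genotype_probs(father_dosage: int, mother_dosage: int) -> dict:
    """Probabilities that the child carries 0, 1 or 2 risk alleles."""
    tf = transmit_prob(father_dosage)
    tm = transmit_prob(mother_dosage)
    return {
        0: (1 - tf) * (1 - tm),
        1: tf * (1 - tm) + (1 - tf) * tm,
        2: tf * tm,
    }


def expected_dosage(probs: dict) -> float:
    """Expected risk-allele count implied by a genotype distribution."""
    return sum(count * prob for count, prob in probs.items())


# Example: heterozygous father (A/C), non-carrier mother (A/A).
child = offspring_genotype_probs(1, 0)
print(child, expected_dosage(child))
```

An expected dosage of this kind could then enter an association test in place of an observed genotype, which is the general idea behind using relatives to inform ungenotyped individuals.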
We conducted a two-way-imputation–based genome-wide BCC association analysis of the 16 million SNPs, which we designated the ‘discovery phase’. This analysis identified a number of SNPs with strong associations in the region covered by the two haplotypes. The strongest signal (OR = 2.36, P = 5.2 × 10−17) came from rs78378222, located in the 3′ untranslated region of TP53 (Fig. 1 and Table 1). This signal was not only the strongest in the region covered by the two candidate haplotypes, but it was also the strongest signal of the 16 million SNPs observed genome wide. The minor (C, at risk) allele is present in the Icelandic population at a frequency of 0.0192. There was no deviation from Hardy-Weinberg equilibrium, and we observed C/C homozygotes (although they are rare), so the variant is not recessive lethal. The SNP was not on the Illumina chips and is not in HapMap2 or HapMap3. The best on-chip tag SNP was rs4796305, which produced only a modest signal (OR = 1.18, P = 0.0024, minor allele frequency = 0.119, r2 = 0.15). A quantile-quantile plot is shown in Supplementary Figure 1. Aside from the 17p13 locus, all other SNPs that surpassed the P = 3.0 × 10−9 Bonferroni-adjusted threshold for genome-wide significance were in previously published loci5,6.
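Two of the figures in this paragraph can be checked with simple arithmetic. The sketch below assumes a family-wise error rate of 0.05 (not stated in the text) for the Bonferroni threshold, and uses Hardy-Weinberg proportions to estimate how many C/C homozygotes would be expected among the 41,675 chip-typed individuals at a risk-allele frequency of 0.0192. The variable and function names are illustrative.

```python
# Illustrative arithmetic; inputs are the figures quoted in the text,
# except ALPHA, which is an assumed family-wise error rate.
N_SNPS = 16_000_000
ALPHA = 0.05
RISK_ALLELE_FREQ = 0.0192
N_CHIP_TYPED = 41_675


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def hwe_genotype_freqs(q: float) -> dict:
    """Genotype frequencies under Hardy-Weinberg equilibrium for minor allele frequency q."""
    p = 1.0 - q
    return {"A/A": p * p, "A/C": 2 * p * q, "C/C": q * q}


threshold = bonferroni_threshold(ALPHA, N_SNPS)
freqs = hwe_genotype_freqs(RISK_ALLELE_FREQ)
expected_homozygotes = freqs["C/C"] * N_CHIP_TYPED
carrier_freq = freqs["A/C"] + freqs["C/C"]

print(f"Bonferroni threshold: {threshold:.3e}")
print(f"Expected C/C homozygotes: {expected_homozygotes:.1f}")
print(f"Carrier frequency: {carrier_freq:.4f}")
```

Under these assumptions the threshold comes out at about 3.1 × 10−9, in line with the quoted value, and roughly 15 homozygotes would be expected, which is consistent with homozygotes being observed but rare.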
To further investigate the rs78378222 association, we conducted a follow-up phase. The design of the discovery and follow-up phases is detailed in Supplementary Figure 2. We devised a Centaurus10 single-track assay for rs78378222 and used it to genotype all available samples from Icelandic cases with BCC (n = 2,322) and 7,200 controls. Among these, 1,044 out of the 2,322 single-track genotyped cases had not previously been genotyped by the Illumina chip. An association analysis with these 1,044 cases and 7,200 controls yielded OR = 2.19 and P = 3.2 × 10−7 (Table 1, see the Iceland follow-up single-track genotyped category). Note that this is not a fully independent replication of the discovery phase result, as some of the individuals who had been assigned genealogy-based in silico genotypes in the discovery phase were single-track genotyped in the follow-up phase (Supplementary Fig. 2). We then repeated the two-way-imputation–based analysis using a non-overlapping sample that excluded the 1,044 single-tracked cases (Table 1, see the Iceland follow-up two-way–imputation category). Combining the results from the two follow-up groups provided strong evidence for the association of rs78378222 with BCC (OR = 2.25, P = 5.4 × 10−19; Table 1, see the Iceland follow-up phase combined category). Association based on all 2,322 cases who had been single-track genotyped (irrespective of whether or not they had been chip typed) produced comparable results (Supplementary Table 1).
We then typed rs78378222 in replication samples from Denmark, eastern Europe and Spain (the replication sets are described in Supplementary Table 2). We found the risk allele in all the populations tested, with frequencies that seemed to decline with each population’s distance from Iceland (Table 1). Combined, the evidence for replication of the BCC association in non-Icelandic samples was significant and showed no evidence of heterogeneity (OR = 1.75, P = 0.0060, P of heterogeneity (Phet) = 0.27; Table 1). Combined with the Icelandic follow-up phase data, the overall association was highly significant (OR = 2.16, P = 2.2 × 10−20).
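The overall estimate can be approximated from the summary statistics alone. The sketch below assumes a fixed-effect, inverse-variance-weighted model on the log odds ratio scale and back-calculates each standard error from the reported two-sided P value under a normal approximation. This section does not state which model was used to combine the groups, so this is a consistency check under stated assumptions rather than a reproduction of the analysis.

```python
import math
from statistics import NormalDist


def log_or_and_se(odds_ratio: float, p_two_sided: float):
    """Recover log(OR) and its standard error from an OR and a two-sided P value."""
    beta = math.log(odds_ratio)
    z = abs(NormalDist().inv_cdf(p_two_sided / 2.0))
    return beta, abs(beta) / z


def fixed_effect_meta(studies):
    """Inverse-variance-weighted combination of (OR, two-sided P) pairs."""
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, p_value in studies:
        beta, se = log_or_and_se(odds_ratio, p_value)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * beta
    beta_combined = weighted_sum / total_weight
    se_combined = math.sqrt(1.0 / total_weight)
    z = beta_combined / se_combined
    # Two-sided P value; erfc stays accurate far out in the tail.
    p_combined = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(beta_combined), p_combined


# Iceland follow-up phase combined, and non-Icelandic replication (values from the text).
combined_or, combined_p = fixed_effect_meta([(2.25, 5.4e-19), (1.75, 0.0060)])
print(f"combined OR = {combined_or:.2f}, P = {combined_p:.1e}")
```

Because the inputs are rounded, the result can only be expected to agree approximately with the reported overall OR of 2.16 and P of 2.2 × 10−20.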
The second strongest signal in the genome originated from a previously unreported SNP (designated chr17:7,640,788; OR = 2.41, P = 1.1 × 10−13) which was also in the region covered by the two candidate haplotypes (Fig. 1). Based on single-track genotyping of 2,281 cases and 6,858 controls for both SNPs, chr17:7,640,788 was correlated (r2 = 0.61) with rs78378222 and so may capture the same signal. When adjusted for the effect of rs78378222, there was no residual signal from chr17:7,640,788 (ORadj = 1.07, Padj = 0.72), whereas rs78378222 remained significant after adjustment for chr17:7,640,788 (ORadj = 2.00, Padj = 3.0 × 10−5). Therefore, we did not investigate chr17:7,640,788 further. A common germline variant in TP53 (rs1042522, p.Pro72Arg) has been studied extensively for cancer susceptibility, generally with equivocal results11. We saw no evidence that rs1042522 was associated with BCC (OR = 1.00, P = 0.98). We confirmed this by single-track genotyping. Thus, it appears that p.Pro72Arg does not confer risk of BCC and is unrelated to the rs78378222 signal.
p53 is induced by ultraviolet irradiation of skin and is a primary mediator of the tanning response12,13. We looked for associations between rs78378222 and sensitivity of skin to sun in 11,131 samples from Iceland and The Netherlands14,15. There was no significant association between rs78378222[C] and self-reported sun sensitivity (Fitzpatrick score I and II compared to III and IV; OR = 1.14, P = 0.23, 95% confidence interval (CI) 0.92–1.41). We also noted that rs78378222[C] was somewhat more frequent in individuals with tumors at sun-exposed sites (0.0412) than in those with tumors at non–sun-exposed sites (0.0302); however, this difference was not significant (P = 0.18).
Inspection of the sequence surrounding rs78378222 indicated that it occurs in the sole polyadenylation signal of TP53, with the risk-associated variant changing the sequence AATAAA to AATACA, thus disrupting the signal sequence. This class of mutations was first observed in the polyadenylation signal of HBA2 (encoding alpha 2 globin), leading to alpha-thalassemia16. We obtained RNA from blood and adipose tissue from rs78378222[A/C] heterozygotes and rs78378222[A/A] homozygotes. Using RT-PCR with primers internal to the TP53 mRNA, we observed that rs78378222[A/C] heterozygotes expressed somewhat less TP53 transcript than wild-type homozygotes (P = 0.041; Supplementary Fig. 3a). To investigate polyadenylation site usage of wild-type and variant TP53, we selected total RNA samples from rs78378222[A/C] heterozygotes and carried out 3′ rapid amplification of complementary DNA ends (RACE; Supplementary Fig. 3b). Amplification using a TP53 gene-specific forward primer produced a band of 1.3 kb, which is the expected length of correctly terminated mRNA (Supplementary Fig. 3c). However, sequencing of this band indicated that correctly terminated polyA(+) mRNAs were produced predominantly from the wild-type allele, with 73% of mRNAs containing the wild-type A allele and 27% containing the variant C allele (P = 1.6 × 10−6; Supplementary Fig. 3c). We then carried out RT-PCR using a ‘run-on’ reverse primer, located in the genomic sequence approximately 320 bp beyond the normal 3′ end of TP53 (Supplementary Fig. 3b). Sequencing of RT-PCR products from heterozygotes showed that this RNA species consisted almost entirely of variant C allele transcripts (Supplementary Fig. 3d). Taken together, these data suggest that the rs78378222[C] variant impairs proper termination and polyadenylation of the TP53 transcript.
Next, we searched for associations between rs78378222 and 20 major tumor types by cross-referencing genotypes to the Icelandic Cancer Registry and national pathology records. After correcting for multiple phenotype testing, we observed significant associations for prostate cancer, brain cancers and colorectal adenoma (but not colorectal cancer) (Supplementary Table 3). We conducted a follow-up phase in an attempt to get further evidence of these associations. The follow-up phase was analogous to that described in Supplementary Figure 2, and the sample numbers are detailed in Supplementary Table 4. We directly genotyped rs78378222 in all available Icelandic cases with prostate cancer, colorectal adenoma or brain cancer and determined follow-up phase two-way–imputation-based and single-track genotype-based association values (Table 2). For prostate cancer, we further genotyped replication samples from five countries. The association with prostate cancer outside Iceland was significant (OR = 1.63, P = 1.1 × 10−4), as was the combined Iceland follow-up and replication sample result (OR = 1.44, P = 2.4 × 10−6; Table 2). For the colorectal adenoma follow-up phase, single-track genotyping gave a comparable result to two-way imputation, and the combined analysis yielded a significant association result (OR = 1.39, P = 1.6 × 10−4; Table 2 and Supplementary Table 1). However, we still observed no significant association with colorectal cancer after genotyping replication samples from four countries (combined OR = 1.06, P = 0.51, 95% CI 0.89–1.27; Supplementary Table 5). This raises the possibility that rs78378222[C] might predispose to colorectal adenomas with a low propensity for progression to invasive cancer.
The discovery two-way–imputation category of ‘all brain cancers’ contained meningiomas (International Classification of Diseases (ICD) 10 code C70) and gliomas (C71–C72). When glioma and meningioma were considered separately, the association appeared stronger for glioma (OR = 2.50, P = 0.0055) than for meningioma (OR = 1.48, P = 0.36). Although this difference was not significant (P = 0.88), we focused on glioma in the follow-up phase (Table 2). In Iceland, the follow-up phase yielded a suggestive association with glioma (OR = 2.36, P = 0.0036). This was confirmed in two case-control samples of adult glioma from the United States (OR = 2.34, P = 9.2 × 10−4). Combined with the Icelandic data, we obtained firm evidence that rs78378222[C] was associated with glioma (OR = 2.35, P = 1.0 × 10−5; Table 2).
We also investigated potential associations with melanoma (because of the link between rs78378222 and skin cancer risk) and breast cancer (because of the involvement of TP53 mutations in Li-Fraumeni and Li-Fraumeni–like syndromes). We did not see any convincing evidence that rs78378222 confers risk of either of these tumors (OR = 1.07, P = 0.64, 95% CI 0.81–1.42 for melanoma and OR = 1.06, P = 0.57, 95% CI 0.88–1.27 for breast cancer; Supplementary Table 5). There was no evidence of specific associations with estrogen-receptor–negative breast cancer (OR = 1.15, P = 0.61) or high-risk breast cancer (defined as an age at diagnosis under 50 or a history of multiple primary breast cancers; OR = 0.87, P = 0.36).
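For null results such as these, the reported confidence intervals carry the same information as the P values. As a rough check, the sketch below recovers the standard error of the log odds ratio from the width of the 95% CI and derives a two-sided P value under a normal approximation. The function name is illustrative, and small discrepancies from the published P values are expected because the ORs and CI bounds are rounded.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def p_from_or_and_ci(odds_ratio: float, ci_low: float, ci_high: float):
    """Return (SE of log OR, two-sided P) implied by an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, p_value


# Values quoted in the text.
se_melanoma, p_melanoma = p_from_or_and_ci(1.07, 0.81, 1.42)
se_breast, p_breast = p_from_or_and_ci(1.06, 0.88, 1.27)

print(f"melanoma: SE = {se_melanoma:.3f}, P = {p_melanoma:.2f}")
print(f"breast:   SE = {se_breast:.3f}, P = {p_breast:.2f}")
```

Both derived P values fall well above conventional significance thresholds, in agreement with the conclusion that neither association is supported.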
Given the central role of TP53 in tumor biology and its high frequency of somatic mutation17, intensive efforts have been devoted to searching for germline cancer susceptibility variants in the gene. Aside from gain-of-function mutations typically found in Li-Fraumeni and Li-Fraumeni–like syndromes, no germline variants of TP53 have so far been implicated reproducibly in cancer predisposition11,18. Cross-referencing the Icelandic genealogical database with cancer registry records did not reveal any rs78378222[C] carriers with family histories fitting the criteria of Li-Fraumeni or Li-Fraumeni–like syndrome19,20. TP53 mutations leading to Li-Fraumeni and Li-Fraumeni–like syndromes are rare, occurring in 1 in 5,000 to 1 in 20,000 births21,22. Although the penetrance of rs78378222[C] for the cancers it affects is much lower, carriers are expected at frequencies of up to 4% in some populations. The effect of rs78378222[C] on 3′-end processing suggests a new mechanism by which TP53 may promote oncogenesis, with a distinctive spectrum of tumors being affected.
URLs. Picard version 1.17, http://picard.sourceforge.net/.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.
The work at deCODE genetics was funded in part by contract number 202059 (PROMARK) from the 7th Framework Program of the European Union. The Danish study ‘Diet, Cancer and Health’ was supported by grants from the Danish Cancer Society and ‘Europe against cancer’: European Prospective Investigation into Cancer and Nutrition (EPIC). The Inter99 study ‘A population-based primary prevention study on cardiovascular disease and type 2 diabetes’ was initiated by T. Jørgensen (principal investigator), K. Borch-Johnsen (co-principal investigator), H. Ibsen and T.F. Thomsen. The steering committee comprises the former two individuals and C. Pisinger. The establishment of the cohort was financially supported by research grants from the Danish Research Council, The Danish Centre for Health Technology Assessment, Novo Nordisk Inc., Research Foundation of Copenhagen County, the Ministry of Internal Affairs and Health, The Danish Heart Foundation, The Danish Pharmaceutical Association, The Augustinus Foundation, The Ib Henriksen Foundation and the Becket Foundation. The University of California, San Francisco (UCSF) Adult Glioma Study also acknowledges the people who have made substantial contributions to subject recruitment, specimen processing, pathology review and data analysis, including L.S. McCoy, I. Smirnov, J.S. Patoka, M.D. Prados, S.M. Chang and M.S. Berger (Department of Neurological Surgery, UCSF), J.L. Wiemels (Department of Epidemiology and Biostatistics, UCSF) and T. Tihan (Department of Pathology, UCSF). Work at the University of California, San Francisco has been supported by US National Institutes of Health (NIH) grants R01CA52689 and UCSF Brain Tumor SPORE, P50CA097257, as well as by grants from the National Brain Tumor Foundation, the UCSF Lewis Chair in Brain Tumor Research and by donations from families and friends of J. Berardi, H. Glaser, E. Olsen, R.E. Cooper and W. Martinusen. 
Work at the Mayo Clinic has been supported by the Mayo Clinic Brain Tumor SPORE (NIH P50 CA108961), the Mayo Clinic Comprehensive Cancer Center (NIH P30 CA15083) and an American Recovery and Reinvestment Act (ARRA) Recovery grant (NIH NS068222). Members of the Swedish Low-risk Colorectal Cancer Study Group are: D. Edler, Karolinska Universitetssjukhuset, Solna, Stockholm, Sweden; C. Lenander, Mag-tarm-centrum, Ersta sjukhus, Stockholm, Sweden; J. Dalén, St Görans sjukhus, Stockholm, Sweden; F. Hjern, Danderyds sjukhus, Danderyd, Sweden; N. Lundqvist, Norrtälje sjukhus, Norrtälje, Sweden; U. Lindforss, Södertälje sjukhus, Södertälje, Sweden; L. Påhlman, Akademiska sjukhuset, Uppsala, Sweden; K. Smedh, Centrallasarettet, Västerås, Sweden; A. Törnqvist, Centralsjukhuset, Karlstad, Sweden; J. Holm, Länssjukhuset Gävle-Sandviken, Gävle, Sweden; M. Janson, Karolinska Universitetssjukhuset, Huddinge, Huddinge, Sweden; M. Andersson, Universitetssjukhuset, Örebro, Sweden; S. Ekelund, Södersjukhuset, Stockholm, Sweden; and E. Olsson, Mälarsjukhuset, Eskilstuna, Sweden. Work on the US Prostate Cancer sample set was supported in part by the Urological Research Foundation, Prostate SPORE grant (P50 CA90386-05S2) and the Robert H. Lurie Comprehensive Cancer Center grant (P30 CA60553). For the UK Prostate Cancer sample set, the UK Department of Health funded the ProtecT study through the National Institute for Health Research (NIHR) Health Technology Assessment programme (projects 96/20/06, 96/20/99). We acknowledge the contribution of all members of the ProtecT study research group. We acknowledge the support of the NIHR Cambridge Biomedical Research Centre and the National Cancer Research Institute (ProMPT) Prostate Cancer Collaborative. The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the UK Department of Health.
Sample collection in Romania was supported in part by the Romanian National Council For Scientific Research (CNCSIS-UEFISCSU), grant PNII-IDEI 1184/2008. In Spain, J.I.M. is funded by Red Tematica de Investigacion Cooperativa en Cancer RD06/0020/1054.
Note: Supplementary information is available on the Nature Genetics website.
AUTHOR CONTRIBUTIONS
The study was designed and the results were interpreted by S.N.S., P.S., G.M., D.F.G., O.T.M., J.H.O., A.K., U.T., T. Rafnar and K.S. Subject ascertainment and recruitment was carried out by S.N.S., J.G., B.S., K.T., R.R., K.R.B., B.A.N., A.T., K.O., P.R., E.G., K.K., K.H., C.C., V.F., P.G., S.N., F.F., M.D.G.-P., E.S., A.P., A.D.J., A.G., F.R., D.P., V. Soriano, C.R., K.K.A., M.M.v.R., R.G.H.M.C., I.M.v.O., D.-J.v.S., J.A.S., W.H.M.P., B.T.H., J.L.D., F.C.H., D.B., O.C., M.J., I.E.C., V.C., P.B., I.N.M., D.E.D., A.C., D.M., S.K., B.A.A., E.J., R.B.B., G.V.E., F.S., P.H.M., T.S., T.V., O.T.J., H.S., T. Jonsson, J.G.J., L.T., T. Rice, H.M.H., Y.X., D.H.L., B.P.O., M.L.K., P.A.D., V. Steinthorsdottir, A.L., R.S.S., T.O.K., K.B., T. Jørgensen, D.R.W., T.H., O.P., V.J., D.E.N., W.J.C., M.W., J.W., R.B.J., E.N., U.V., L.A.K., R.K., J.I.M., J.H.O., U.T. and T. Rafnar. The sequencing, genotyping and expression analysis was carried out by S.N.S., A.J., J.G., O.T.M., H.J., H.T.H., A.S. and U.T. The statistical and bioinformatics analysis was carried out by S.N.S., P.S., G.M., D.F.G., S.A.G., G.T. and A.K. S.N.S., P.S., D.F.G., T. Rafnar and K.S. drafted the manuscript. All authors contributed to the final version of the paper. Principal collaborators for the case-control population samples were: S.K. (Colorectal Adenoma), G.V.E. (Iceland Prostate), V.J. (Romania Prostate), D.E.N. (UK Prostate), W.J.C. (US Prostate), M.W. (US UCSF Glioma), R.B.J. (US Mayo Clinic Glioma), E.J. and J.I.M. (Spain BCC and Prostate), U.V. and O.P. (Denmark BCC), L.A.K. (Netherlands Prostate), R.K. (Eastern Europe BCC) and J.H.O. (Iceland BCC).
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturegenetics/.