E.A.R., D.T.B., D.F.E. and M.R.S. designed the study and obtained financial support. E.A.R., R.A.H., The UK Testicular Cancer Collaboration and M.R.S. coordinated the studies providing samples for stage 1 and stage 2. P.D. directed genotyping of stage 1. N.R. and C.T. directed genotyping of stage 2. R.L., A.R., D.H., S.H. and S.S. conducted genotyping of stage 2. E.T.D. and P.D. produced and analyzed gene expression data. A.A.A.O., J.M., J.N., D.T.B., C.T. and D.F.E. designed, coordinated and conducted statistical analyses. M.R.S. drafted the manuscript with substantial contributions from C.T., E.A.R. and D.F.E. All authors contributed to the final paper.
We conducted a genome-wide association study for testicular germ cell tumor (TGCT), genotyping 307,666 SNPs in 730 cases and 1,435 controls from the UK and replicating associations in a further 571 cases and 1,806 controls. We found strong evidence for susceptibility loci on chromosome 5 (per allele OR = 1.37 (95% CI = 1.19–1.58), P = 3 × 10⁻¹³), chromosome 6 (OR = 1.50 (95% CI = 1.28–1.75), P = 10⁻¹³) and chromosome 12 (OR = 2.55 (95% CI = 2.05–3.19), P = 10⁻³¹). KITLG, encoding the ligand for the receptor tyrosine kinase KIT, which has previously been implicated in the pathogenesis of TGCT and the biology of germ cells, may explain the association on chromosome 12.
Testicular germ cell tumor (TGCT) is the most common malignancy in men aged 15–45 years. The worldwide incidence of the disease is 7.5 per 100,000, but the rates vary considerably between countries and ancestry groups1. Known risk factors include a family history of the disease, previous germ cell tumor, subfertility, undescended testis (UDT)2 and testicular microlithiasis3, the presence of small foci of intratesticular calcification. There are two main subclasses of TGCT: seminomas show histological features of primordial germ cells, whereas nonseminomas show varying degrees of differentiation toward embryonal and extraembryonal structures. Some tumors show features of both classes. TGCTs are believed to arise from progenitor germ cells through a preinvasive phase of intratubular germ cell neoplasia (ITGCN)4. The peak incidence of nonseminomas is between the ages of 20 and 30 years, whereas seminomas manifest about a decade later5. Most TGCTs in adults are malignant tumors with a strong propensity to metastasize. However, TGCTs are markedly sensitive to radiotherapy and/or chemotherapy and most are cured even if disseminated6.
Several studies have estimated the risk to brothers and fathers of individuals with TGCT to be eight- to tenfold and four- to sixfold, respectively7, much higher than the familial risks for most other cancer classes, which are generally approximately twofold8. However, most families with multiple cases of TGCT include only two affected individuals, usually sibpairs, and extended pedigrees with several cases are exceedingly rare9. A genome-wide genetic linkage study of 179 families by an international consortium did not provide strong evidence for the location of a gene predisposing to TGCT9. However, candidate association studies have indicated that deletions on the Y chromosome that are also associated with infertility are implicated in TGCT susceptibility10.
We carried out a genome-wide association study for TGCT susceptibility alleles using subjects with TGCT from the UK and the Illumina 370K array. Genotype frequencies were compared with those obtained on controls from the UK 1958 Birth Cohort using the Illumina 550K array and data on SNPs common to both arrays. Following quality control and removal of samples with non-European ancestry (see Online Methods), we analyzed results from 307,666 SNPs on 730 TGCT cases and 1,435 controls (UK1). Genotype frequencies between cases and controls were compared using Cochran-Armitage trend tests. There was only slight evidence of inflation of the test statistics (λ = 1.04, see Supplementary Fig. 1 online), indicating that there was little confounding due to population stratification.
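As an illustration of the Cochran-Armitage trend test used for the case-control comparison, the following Python sketch computes the 1-d.f. statistic from genotype counts. The function name, the additive scores (0, 1, 2) and the example counts are assumptions made for illustration; they are not taken from the study data or from the software actually used.

```python
from math import erfc, sqrt


def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Return (chi-square, P value) for the 1-d.f. trend test.

    case_counts and control_counts hold the numbers of subjects carrying
    0, 1 and 2 copies of the tested allele (hypothetical inputs).
    """
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_total = n_cases + n_controls
    genotype_totals = [r + s for r, s in zip(case_counts, control_counts)]

    sum_t_r = sum(t * r for t, r in zip(scores, case_counts))
    sum_t_n = sum(t * n for t, n in zip(scores, genotype_totals))
    sum_t2_n = sum(t * t * n for t, n in zip(scores, genotype_totals))

    # Trend statistic and its variance under the null hypothesis.
    trend = n_total * sum_t_r - n_cases * sum_t_n
    variance = (n_cases * n_controls / n_total) * (
        n_total * sum_t2_n - sum_t_n ** 2
    )
    chi2 = trend * trend / variance

    # Upper tail of a 1-d.f. chi-square distribution.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    chi2, p = cochran_armitage_trend((10, 30, 60), (60, 30, 10))
    print(f"chi2 = {chi2:.2f}, P = {p:.3g}")
```

The statistic is equivalent to N times the squared correlation between allele count and case status, which is why it is sensitive to a per-allele (dose-response) effect.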
To select loci for replication we searched for linkage disequilibrium (LD) blocks in which at least three SNPs each showed significance levels <10⁻⁵. Five such loci were identified, on chromosomes 1, 4, 5, 6 and 12 (Supplementary Table 1 online). We then genotyped one SNP from each locus and two from the chromosome 12 locus in an additional series of 571 UK TGCT cases and 1,806 controls from the UK 1958 cohort (UK2) (Supplementary Table 2 online). SNPs on chromosomes 5, 6 and 12 showed convincing evidence of association after replication (Table 1).
The strongest evidence was obtained for rs995030 and rs1508595, which are located within the same LD block on chromosome 12. SNPs located in adjacent LD blocks showed much weaker evidence of association, suggesting that the causative variant resides within this block. In a multiple regression analysis, there was evidence that both rs995030 and rs1508595 are independently associated with disease risk (P = 0.03 in stage 2, P = 0.0006 overall, compared with a model with only a single risk marker). This suggests that, if there is a single causal variant, it is distinct from either marker. Within this LD block there is only one annotated protein-coding gene, KITLG (also known as stem cell factor or steel), which encodes the ligand for the membrane-bound receptor tyrosine kinase KIT. rs995030 is within the 3′ untranslated region of KITLG and rs1508595 is in intergenic DNA 5′ to KITLG. The possibility that KITLG is the gene through which the association is effected is supported by several previously reported observations. First, the KIT–KITLG system regulates the survival, proliferation and migration of germ cells11, and germline homozygous null mutations of either gene in mice cause infertility as a result of failure of progenitor germ cell development12. Second, germline heterozygous deletions that remove the complete KITLG coding sequence confer a twofold increased risk of TGCT in a spontaneous mouse model of the disease13. Third, somatic missense mutations or amplification of KIT are present in ~25% of human seminomas, although they are very rare in nonseminomas (COSMIC; see URLs section in Online Methods). Paradoxically, the somatic changes of KIT found in seminomas activate its kinase14, whereas the germline deletions of KITLG that modify susceptibility in mice are predicted to reduce it12. Somatic mutations of KIT had previously been found in 14 seminomas included in the current study. 
The frequencies of the rs995030 and rs1508595 genotypes in these cases were not, however, different from those seen in the TGCT series as a whole. The involvement of both germline and somatic mutations of the KITLG–KIT system in mouse and human TGCT implicates KITLG as the target gene of the susceptibility locus on chromosome 12. However, an effect mediated by other genes cannot be excluded.
rs4624820 on chromosome 5 is located ~10 kb 3′ of SPRY4. SPRY4 is an inhibitor of the mitogen-activated protein kinase pathway, which is activated by the KITLG–KIT pathway15. rs210138 on chromosome 6 falls within an intron of BAK1 (BCL2-antagonist/killer 1), which encodes a protein that promotes apoptosis by binding to and antagonizing the apoptosis repressor activity of BCL2 and other antiapoptotic proteins16. Somatic rearrangements of the immunoglobulin heavy chain locus and BCL2 that result in constitutive BCL2 overexpression are found in B-cell chronic lymphocytic leukemia and follicular lymphomas. Expression of BAK1 in testicular germ cells is repressed by the KITLG–KIT pathway and interaction of BAK1 with antiapoptotic proteins is implicated in the germ cell apoptosis that occurs in response to blockade of this pathway16. It is, therefore, plausible that rs210138 exerts its effects on TGCT susceptibility through BAK1, and may influence the same biological pathways as rs995030 and rs1508595, the SNPs in the vicinity of KITLG. The loci on chromosomes 1 and 4 remain candidates for TGCT susceptibility but will require further evaluation.
Location of a disease-associated SNP near a gene does not necessarily implicate the gene in disease susceptibility. To investigate further their biological effectors we searched for associations between rs4624820, rs210138 and rs995030/rs1508595 and expression of genes within 1 Mb of them. In lymphoblastoid cell lines the G allele of rs210138, which is associated with the elevated risk of testis cancer, is associated with lower expression of BAK1 (P = 0.00078) in the extended CEU population (GENEVAR project, see URLs section in Online Methods) (Fig. 1). The association was significant at the 0.05 permutation threshold17, with other associated SNPs in the same LD block significant at the 0.01 level. This result suggests a biologically plausible mechanism by which reduction of BAK1 expression alleviates repression of antiapoptotic proteins, inhibiting apoptosis and hence contributing to neoplastic change. No correlation was observed between rs4624820 and SPRY4 expression or rs995030/rs1508595 and KITLG expression.
We investigated whether the loci on chromosomes 5, 6 and 12 are associated with different risks in subgroups of TGCT cases characterized by specific phenotypic characteristics or risk factors (Supplementary Tables 3–5 online). The OR conferred by the high-risk allele of rs4624820 on chromosome 5 was higher in early-onset cases (Ptrend = 0.006). There was also weak evidence for an age-of-diagnosis effect for the chromosome 12 SNPs, rs995030 (Ptrend = 0.06) and rs1508595 (Ptrend = 0.03). However, for both of these the OR conferred by the high-risk allele was higher in individuals who were older at diagnosis. This result does not seem to be attributable to a higher risk of the older-onset seminoma subclass of TGCT (see below).
None of the three loci showed a significant difference between cases with seminoma (537) compared to nonseminoma (412), cases with testicular maldescent (78) compared to those with normal descent (749), cases with a family history of TGCT (220) compared to those without (1,081) or cases with unilateral (1,238) compared to bilateral (63) disease (Supplementary Tables 3–5). For bilaterality and maldescent, the power to detect a difference was limited because of the small numbers examined. However, the absence of a difference between cases of seminoma compared to nonseminoma suggests that, despite their distinct histological and biological features, these two subclasses of TGCT share a common biological pathway of oncogenesis. Compatible with this notion are the existence of TGCT cases that have mixed pathology5, the lack of evidence that, in cases with bilateral TGCT, the two tumors are more likely than by chance to share histological type and the lack of evidence for clustering of histological type in individual families with multiple TGCT cases18.
The failure to demonstrate a higher OR in familial TGCT cases compared to those without a family history is, however, surprising. Under a simple multiplicative polygenic model, the OR should be elevated by approximately 50% in cases with an affected first-degree relative19. This degree of enrichment is not observed in this study for any of the loci; larger studies are required to further investigate this apparent absence of familial enrichment.
The relative risks of TGCT associated with the common variants discovered in this study, and in particular the greater than twofold risks associated with variants in the vicinity of KITLG, are considerably higher than have been reported from similar studies of breast, colorectal, lung or prostate cancers20. This may, in part, reflect greater underlying biological homogeneity of TGCT compared to other cancer types. Although variants conferring similar relative risks have been found for other cancers (for example, in CHEK2), they are much rarer and population specific, indicative of strong negative selection. The absence of similar selection here is somewhat surprising given that the disease had a high fatality rate before the introduction of modern therapies, although negative selection may have been weaker in the past as the disease was rarer. We note that the disease allele frequencies at KITLG are substantially higher in Europeans than in other ancestry groups, consistent with some adaptive selection, although the variation in frequencies is not extreme as judged against all common SNPs (P ~ 0.3 based on the FST values across HapMap populations).
Each of the SNPs on chromosomes 5, 6 and 12 shows a dose response, such that the estimated risks are compatible with a log-additive model. The relative risk to homozygotes with the high-risk alleles on chromosome 12 is greater than sixfold, and is approximately twofold for the loci on chromosomes 5 and 6. We investigated the combined effect of the loci on chromosomes 5, 6 and 12. There was some evidence for a departure from a multiplicative model for rs4624820 and rs210138, such that the combined risk was greater than the product of the individual risks (a positive interaction; P = 0.02); the other pairwise combinations were compatible with a multiplicative model. According to this model, the highest-risk individuals (males who are homozygous for the high-risk allele for all four risk SNPs, approximately 0.7% of the population) have a predicted risk that is approximately 40 times the risk of the lowest-risk individuals (those who are homozygous for all the low-risk alleles) and approximately four times the population risk. These results raise the possibility that, in conjunction with other known risk factors, these variants may be used in the future for risk prediction, particularly given the availability of relatively simple screening approaches such as testicular ultrasound. Further studies will be required, however, to refine the risk estimates and their interactions before this can be considered in clinical practice.
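The arithmetic behind the log-additive and multiplicative models can be sketched in Python. The per-allele odds ratios below are the three values quoted in the abstract (one per locus); the function names and dictionary keys are hypothetical. Because the text does not give separate estimates for the two chromosome 12 SNPs, and because the sketch ignores the reported interaction between rs4624820 and rs210138, it is not expected to reproduce the approximately 40-fold figure for all four SNPs.

```python
# Per-allele odds ratios quoted in the abstract (one value per locus).
PER_ALLELE_OR = {"chr5": 1.37, "chr6": 1.50, "chr12": 2.55}


def homozygote_relative_risk(locus):
    """Relative risk for two high-risk alleles under a log-additive model."""
    return PER_ALLELE_OR[locus] ** 2


def combined_relative_risk(copies_by_locus):
    """Relative risk versus carrying no high-risk alleles, assuming the
    loci combine multiplicatively (no interaction terms)."""
    risk = 1.0
    for locus, copies in copies_by_locus.items():
        risk *= PER_ALLELE_OR[locus] ** copies
    return risk


if __name__ == "__main__":
    for locus in PER_ALLELE_OR:
        print(locus, round(homozygote_relative_risk(locus), 2))
    all_homozygous = {"chr5": 2, "chr6": 2, "chr12": 2}
    print("combined", round(combined_relative_risk(all_homozygous), 1))
```

Squaring the per-allele estimates gives about 6.5 for chromosome 12 and about 1.9 and 2.25 for chromosomes 5 and 6, in line with the "greater than sixfold" and "approximately twofold" homozygote risks stated above.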
The three susceptibility loci reported here together account for ~7% of the risk to siblings and 10% of the risk to offspring of individuals with TGCT. The results of previous genetic linkage studies suggest that rare, high-penetrance genes are unlikely to account for much of the remaining familial risk. The power to detect the loci on chromosomes 5, 6 and 12 with a genome-wide search of this size was approximately 65%, 80% and 96% respectively, indicating that few further common variants with similar effects are likely to be identified with the current genome-wide arrays. Multiple loci of weaker effect may explain the residual familial risk, and these may be detected by additional association studies of TGCT.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.
Cases of TGCT were identified from a UK study of familial testicular cancer and a national collection of TGCT cases treated within the UK coordinated at the Institute of Cancer Research (ICR). Cases were collected over a 20-year period. All cases had a prior diagnosis of germ cell tumor (GCT) of the testis or within extragonadal sites and all TGCT cases were unrelated. Subjects donated samples and medical information with full informed consent and under national ethical review board approval. Information on clinical status, including type of TGCT, age at diagnosis, the presence of undescended testis (UDT) and laterality of disease was confirmed by reviewing histological reports and clinical notes.
Control samples were drawn from the 1958 Birth Cohort. This is an ongoing follow-up of all persons born in Great Britain during one week in 1958, including a biomedical assessment during 2002–2004 at which blood samples and informed consent were obtained for creation of a genetic resource (National Child Development Study (NCDS), see URLs section below). We transformed cryopreserved peripheral blood lymphocytes into immortalized cultures by infection with Epstein-Barr virus (EBV) and extracted DNA from cell lines using a manual guanidine hydrochloride method. We selected 1,500 individuals of self-reported white ethnicity and representative of sex and each geographical region for genome-wide genotyping. The next available 1,920 male samples from the 1958 Birth Cohort, representative of geographical region, were selected for the replication series.
We genotyped cases in the first phase of the study on the Illumina 370K chip. We utilized controls from the 1958 Birth Cohort that had been previously genotyped on the Illumina 550K chip. Analysis was based on the 310,043 SNPs common to both chips. Validation and replication of cases and controls in the second stage was conducted by Taqman methodology (Applied Biosystems) using the manufacturer's protocols.
From the GWAS data, we eliminated 18 cases with call rates of <95%, 2 cases identified as being of non-European ancestry and 7 cases and 3 controls of probable non-European ancestry on the basis of the GWAS data. The latter were identified by estimating the average identity by state (IBS) among all participants together with the phase II HapMap samples and using multidimensional scaling (removing those with approximately >10% non-European ancestry). There were three pairs of duplicate samples, and in each case the sample with the lower call rate was excluded. After these exclusions, 730 cases and 1,435 controls were used in the final analysis (Supplementary Tables 1 and 2 online).
We filtered out all SNPs with a call rate <95% in cases or controls, or, for SNPs with a minor allele frequency of 1–5%, with a call rate of <99%. SNPs with a minor allele frequency <1% were excluded. We also excluded SNPs whose genotype frequencies departed from Hardy-Weinberg equilibrium at P < 0.00001 in controls, or P < 10⁻¹² in cases. After these exclusions, we analyzed data on 307,666 SNPs.
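The SNP-level filters can be summarized as a single predicate. The following Python sketch is one reading of the thresholds stated above: the function and parameter names are hypothetical, the Hardy-Weinberg P values are assumed to have been computed beforehand, and two details are assumptions because the text leaves them open (the stricter 99% call rate is applied to cases and controls separately, and a minor allele frequency of exactly 5% is treated as common).

```python
def passes_snp_qc(call_rate_cases, call_rate_controls, maf,
                  hwe_p_controls, hwe_p_cases):
    """Return True if a SNP survives the stage 1 filters described above."""
    # SNPs with a minor allele frequency below 1% are dropped outright.
    if maf < 0.01:
        return False

    # Stricter call-rate requirement for low-frequency SNPs (MAF 1-5%).
    min_call_rate = 0.99 if maf < 0.05 else 0.95
    if call_rate_cases < min_call_rate or call_rate_controls < min_call_rate:
        return False

    # Hardy-Weinberg departures: P < 0.00001 in controls, P < 1e-12 in cases.
    if hwe_p_controls < 1e-5 or hwe_p_cases < 1e-12:
        return False

    return True


if __name__ == "__main__":
    # Hypothetical SNP summaries, for illustration only.
    print(passes_snp_qc(0.97, 0.98, 0.20, 0.4, 0.3))   # common SNP, kept
    print(passes_snp_qc(0.97, 0.98, 0.03, 0.4, 0.3))   # low MAF, call rate too low
```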
In stage 2, we excluded 124 samples that failed two or more of the assays used. The call rates were at least 95% for each SNP in each population. Genotype distributions in each control population for each SNP were consistent with Hardy-Weinberg equilibrium.
We assessed associations between each SNP and disease at stage 1 using a 1-d.f. Cochran-Armitage trend test and a general 2-d.f. χ² test. Inflation in the χ² statistic was assessed using the genomic control approach: we derived an inflation factor (λ) by dividing the median of the lowest 90% of the 1-d.f. statistics by the 45th percentile of a 1-d.f. χ² distribution (0.357). This cutoff was used to avoid inclusion of SNPs likely to be associated with risk. We chose to present P values uncorrected for λ, as the estimated λ (1.04) was very close to 1, making little difference to the significance levels, and to preserve consistency with the stage 2 analysis.
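The inflation factor described above can be computed in a few lines. This Python sketch uses only the standard library and obtains the 1-d.f. chi-square quantile from the normal quantile function; the function names and the simulated input are assumptions made for illustration and do not represent the software used in the study.

```python
from statistics import NormalDist, median


def chi2_1df_quantile(p):
    """Quantile of a 1-d.f. chi-square distribution.

    A 1-d.f. chi-square variable is a squared standard normal, so its
    p-quantile is the square of the normal (1 + p) / 2 quantile.
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def inflation_factor(trend_statistics, keep_fraction=0.90):
    """Median of the lowest 90% of statistics divided by the matching
    expected value (the 45th percentile when keep_fraction is 0.90)."""
    ordered = sorted(trend_statistics)
    n_keep = max(1, int(len(ordered) * keep_fraction))
    observed_median = median(ordered[:n_keep])
    expected_median = chi2_1df_quantile(keep_fraction / 2.0)
    return observed_median / expected_median


if __name__ == "__main__":
    # Idealized null statistics (evenly spaced quantiles), illustration only.
    null_stats = [chi2_1df_quantile((i + 0.5) / 1000) for i in range(1000)]
    print(round(chi2_1df_quantile(0.45), 3))
    print(round(inflation_factor(null_stats), 3))
```

Trimming the top 10% of statistics before taking the median keeps strongly associated SNPs from influencing the estimate, which is the rationale given for the cutoff.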
After stage 2, we conducted 1-d.f. and 2-d.f. tests stratifying by stage. Odds ratios and confidence limits were estimated from the stage 2 data using unconditional logistic regression, stratified by stage. Estimates from stage 2 are given in the text, because these are less subject to the ‘winner's curse’. Modification of the odds ratios by age was assessed using a case-only analysis, assessing the effect of age on SNP genotype in the cases using polytomous regression. The effects of SNP genotypes on tumor type, family history, UDT cases and bilaterality were assessed similarly. The combined effects of multiple SNPs were assessed by fitting multiple logistic regression models, stratified by stage. Evidence for departure from a multiplicative model was assessed by adding an interaction term to the model. To estimate the power to detect each of the associations found, we computed the noncentrality parameter for the test statistic at each stage using the per-allele relative risk from stage 2 and allele frequency. This was used to estimate power on the basis of a bivariate normal distribution for the score statistics after each stage to allow for the correlations in the test statistics. We assumed significance thresholds of P < 10⁻⁵ after stage 1 and P < 10⁻⁷ after stage 2.
Associations with gene expression were investigated on the extended CEU (Caucasians of European descent from Utah) from HapMap. Data are publicly available as of December 15, 2008 (GENEVAR project, see URLs section below). We assessed the association between each SNP and gene expression using Spearman rank correlation between normalized gene expression levels and the count of one of the alleles of the SNP (0, 1 or 2). Significance was assessed by permutation as described previously17.
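A minimal Python sketch of a Spearman rank correlation between allele count and expression, with a simple permutation P value, is given below. It is an illustration only: the function names and example data are hypothetical, and the shuffling scheme shown (permuting expression values for a single SNP-gene pair) is an assumption that may differ from the permutation thresholds of the procedure cited above.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(allele_counts, expression, n_perm=2000, seed=0):
    """Two-sided P value from shuffling expression ranks across samples."""
    rng = random.Random(seed)
    rank_x = average_ranks(allele_counts)
    rank_y = average_ranks(expression)
    observed = abs(pearson(rank_x, rank_y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(rank_y)
        if abs(pearson(rank_x, rank_y)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical allele counts and normalized expression values.
    alleles = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
    expr = [2.1, 1.9, 1.5, 1.6, 1.4, 1.0, 0.8, 2.3, 1.7, 1.1]
    print(spearman(alleles, expr), permutation_p_value(alleles, expr))
```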
COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/; GENEVAR project, www.sanger.ac.uk/genevar; National Child Development Study (NCDS), http://www.cls.ioe.ac.uk/studies.asp?section=000100020003.
We would like to thank the individuals with TGCT and the clinicians involved in their care for participation in this study. We would like to thank D. Dudakia, J. Pugh, H. McDonald and J. Marke for subject recruitment and database entry for the TGCT collections. We acknowledge NHS funding to the NIHR Biomedical Research Centre. We acknowledge use of DNA from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. D.F.E. is a Principal Research Fellow of Cancer Research UK, and the study was supported by the Institute of Cancer Research, Cancer Research UK and the Wellcome Trust.
Note: Supplementary information is available on the Nature Genetics website.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/