|Home | About | Journals | Submit | Contact Us | Français|
Testicular dysgenesis syndrome (TDS) is a common disease that links testicular germ cell cancer, cryptorchidism and some cases of hypospadias and male infertility with impaired development of the testis. The incidence of these disorders has increased over the last few decades, and testicular cancer now affects 1% of the Danish and Norwegian male population.
To identify genetic variants that span the four TDS phenotypes, the authors performed a genome-wide association study (GWAS) using Affymetrix Human SNP Array 6.0 to screen 488 patients with symptoms of TDS and 439 selected controls with excellent reproductive health. Furthermore, they developed a novel integrative method that combines GWAS data with other TDS-relevant data types and identified additional TDS markers. The most significant findings were replicated in an independent cohort of 671 Nordic men.
Markers located in the region of TGFBR3 and BMP7 showed association with all TDS phenotypes in both the discovery and replication cohorts. An immunohistochemistry investigation confirmed the presence of transforming growth factor β receptor type III (TGFBR3) in peritubular and Leydig cells, in both fetal and adult testis. Single-nucleotide polymorphisms in the KITLG gene showed significant associations, but only with testicular cancer.
The association of single-nucleotide polymorphisms in the TGFBR3 and BMP7 genes, which belong to the transforming growth factor β signalling pathway, suggests a role for this pathway in the pathogenesis of TDS. Integrating data from multiple layers can highlight findings in GWAS that are biologically relevant despite having border significance at currently accepted statistical levels.
Infertility, cryptorchidism, small testis size with poor semen quality, and hypospadias are relatively common among men in Western countries and are all risk factors for testicular germ cell cancer (TGCC),1 2 which has been increasing in incidence and is the most common malignancy of young men in many countries.3 Caucasian men are at particularly high risk of testicular cancer, which currently affects ~1% of the Danish and Norwegian male populations.4 It has been proposed that most cases of TGCC, and some cases of hypospadias, cryptorchidism and decreased spermatogenesis (excluding those unrelated to development of the fetal testis) may be symptoms of a testicular dysgenesis syndrome (TDS) that originates from perturbations during fetal development.2 5
Recent progress in basic and epidemiological research on testis cancer has provided evidence that TGCC, in spite of its occurrence in young adult men, has its origin in fetal germ cells with stem cell characteristics,6 7 the so-called intratubular carcinoma in situ (CIS) germ cells.8 According to a widely tested hypothesis, these cells are derived from gonocytes that failed to differentiate to spermatogonia because of dysfunction of Sertoli, peritubular and Leydig cells, the function of which is essential for differentiation of fetal germ cells.6 9 However, the invasive potential of CIS cells is not attained until after the pituitary–gonadal axis is activated during puberty.
Little is known about the causes of TDS, although environmental and lifestyle factors appear to play an important role, as suggested by the rapid increase in incidence of TGCC over a couple of generations.3 The prevalence of mild forms of cryptorchidism and poor semen quality also appear to be increasing.10 11 There is evidence of a genetic contribution, as brothers and fathers of patients with TGCC have a significantly increased risk of TGCC.12 Marked ethnic differences in risk of TGCC among men living in the same areas also indicate a clear genetic component in the aetiology of the disease.3 Three genome-wide association studies (GWASs)13–15 and a replication study16 have recently identified several genetic loci that predispose to TGCC (KITLG, SPRY4, TERT-CLPTM1L, ATF7IP and DMRT1). Altogether, these loci can potentially explain up to 13% of the heritability. The variants with highest effect size were found at 12q21, implicating KITLG/KIT signalling as a pathway involved in TGCC susceptibility.
On the basis of our hypothesis that patients with TDS may be genetically predisposed to as yet unknown environmental factors affecting inter-related pathways during testis development, we conducted a novel GWAS in which we grouped cases with at least one of four TDS phenotypes: TGCC, cryptorchidism, hypospadias, or infertility with low sperm concentration. The GWAS was conducted on a discovery cohort of 926 Danish men, comprising 439 healthy controls and 488 patients affected by at least one TDS phenotype. The cohort was carefully selected with the help of detailed patient records, which provided background information, and additional phenotype characterisation including semen analysis. Candidate markers from the discovery stage were subsequently assessed in an independent replication cohort of 671 Nordic men. The markers were selected using three approaches: (1) conventional single-marker association; (2) an integrative systems biology approach based on augmenting the GWAS data with several complementary types of data; and (3) protein-complex-based pathway analysis, which examined whether groups of related genes in the same functional pathway were jointly associated with the trait of interest.
Systems biology approaches that exploit the complementarities of heterogeneous types of data have recently been successfully applied in GWASs to detect associations that may be missed by single-marker association.17–19 Here we used two such approaches. First, we adapted the MetaRanker tool20 to integrate gene–phenotype associations from several complementary data sources, including targeted knockout experiments in mice, transcriptome time-series expression studies of the developing testis, and protein–protein interactions. In a second approach, we tested protein complexes for association with TDS, in order to jointly examine multiple markers from genes inter-related by prior biological knowledge.21–23
With the above approaches we were able to: (1) identify pleiotropic risk variants for TDS—that is, variants associated with all of the phenotypic subtypes; (2) confirm the risk factors in the KITLG gene found by two recent GWASs on TGCC; and (3) identify genes and protein complexes through integrative systems biology approaches that would not be found by single-marker association.
The discovery cohort comprised 927 individuals of Danish descent: 439 healthy young men with semen concentrations above 60 million sperm/ml (table 1) and 488 cases (212 men with germ cell tumours, 138 men with cryptorchidism (diagnosed by lack of testis descent at birth), 31 men with hypospadias, and 107 infertile men identified using the following criteria: absence of any known cause of infertility (no Y-chromosome microdeletions and no history of radio- or chemo-therapy); absence of varicocele; sperm count below 15 million sperm/ml; and testis volume below 15 ml). Cases with several symptoms of TDS were grouped according to severity (TGCC>hypospadias>cryptorchidism>infertility) (supplementary figure 1 online and supplementary table 1 online).
The replication cohort included 235 controls and 436 TDS cases from Denmark, Sweden and Finland, three Nordic populations that are genetically closely related (supplementary figure 2 online). Cases of cryptorchidism (n=103) were contributed from Finland (Turku area) (n=34) and from Rigshospitalet Denmark (n=69); TGCC samples (n=333) were collected in Sweden (from the Malmö region (n=112) and the Stockholm area (n=135)) and Denmark (n=86); controls were obtained from Rigshospitalet, Denmark (n=184) and the Turku area in Finland (n=51) (table 1).
For investigations of gene expression at the protein level, we used nine samples of adult orchiectomy specimens (from men operated on for testicular tumours), as well as three samples of normal human fetal testis tissue (originating from spontaneous miscarriages at 14–15 weeks of gestation). All tissue samples were paraffin-embedded and stored in the pathology archives at Rigshospitalet.
The local ethics committees approved the study, and all subjects provided informed consent. The tissue samples were investigated as coded (adult testes) or anonymous (fetal testes) sections.
The discovery cohort was genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0, following the manufacturer-supplied protocol. Genotype calling and subsequent analysis were performed using the Birdseed algorithm (of Affymetrix Power Tools), Plink v1.06 and R. Genotyping of 34 associated markers and their call rates (supplementary tables 2 and 3 online), selected from the discovery stage, was performed using the KBiosciences Competitive AlleleSpecific PCR SNP genotyping system (KASPar) as an in-house service at KBiosciences. KASPar assay primers were designed using the Primer Picker software available at http://www.kbioscience.co.uk (KBiosciences). p Values were obtained by the MAX test,24 and OR with confidence limits were retrieved by logistic regression in R.
From the 926 samples included in the initial discovery cohort, we removed 55 for which genotyping call rates were below 96% or their quality control contrasts were below 0.4. Individuals detected to be of non-Scandinavian ancestry (n=11) were excluded (supplementary figure 3 online). In addition, 21 samples with high potential for confounding the association analysis due to risk of consanguinity (n=17) or a large amount of missing data (n=4) were excluded.
We removed markers with low minor allele frequencies (<0.1), an excess of missing genotypes (>0.01) or strong deviation from Hardy–Weinberg equilibrium (<10e–5). The resulting 600 798 markers were tested for association with the disease phenotypes using MAX statistics.24 Multiple testing correction was performed using the maxT permutation method.
The polyclonal antibody, transforming growth factor β receptor III (TGFBR3) (Santa Cruz Biotechnology, Santa Cruz, California, USA; sc-6199), was used according to a standard indirect peroxidase protocol based on a Zymed Histostain kit (Invitrogen, Carlsbad, California, USA), as previously described.25 All tissue samples were formalin-fixed and paraffin-embedded. For negative controls, the primary antibody was replaced with the dilution buffer. For fetal testis (14–15 weeks of gestation, three samples), the same procedure as above was used, except that the tertiary layer was streptavidin-coupled with alkaline phosphatase and development was performed with BCIP NBT as previously described.26
A three-layer MetaRanker analysis20 was applied to integrate information from external phenotype-specific datasets. Ranked lists of all human genes were obtained in each layer, as described below, and then combined into a meta-score. Low ranked genes were considered candidate susceptibility genes.
For each gene, a p value was obtained as the minimal GWAS p value of all markers mapping to the gene according to Affymetrix Genome-Wide Human SNP Array 6.0 annotations. The p values were bias-corrected for gene size, linkage disequilibrium and marker density, on the basis of the (latent) number of independent markers.20 27 All genes were ranked according to the corrected p values.
To identify differentially expressed genes in the developing fetal testis, time-series array expression data from two studies on mouse fetal testis and from one study on human fetal testis were retrieved from the Gene Expression Omnibus and normalised by q-spline normalisation.28 The study on human fetal testis was the only one found in the Gene Expression Omnibus; it provided data on 17 testis samples taken between 9 and 20 weeks of gestation.29 Genes were ranked according to the coefficient of variation (the SD of expression divided by the mean). The mouse studies included data from biological duplicates taken at five time points between the time of the indifferent gonad (11.5 days postcoitus (dpc)) and birth (18.5 dpc),30 respectively at five time points between gestational day 11 and 18.31 For both mouse studies, genes were ranked on the basis of one-way analysis of variance F-tests for differential expression at different time points. A single ranked list of genes that describes the transcriptional regulation in the developing testis was obtained by summing the gene ranks from the three studies (after mapping the mouse genes to their human orthologues).
All targeted alleles with their human orthologues and associated morphologies were obtained from the Mouse Genome Informatics (MGI) database via the BioMart interface. A set of 34 morphologies from the Mammalian Phenotype ontology were manually curated on the basis of their developmental defects of the testis and relation to TDS (supplementary table 4 online). To avoid a bias towards spermatogenesis-related genes, two morphologies, ‘male infertility’ and ‘abnormal spermatogenesis’, were excluded. All descendants to the curated terms were then retrieved from the Mammalian Phenotype ontology, resulting in a total of 122 terms. This set of terms was mapped to 315 human orthologous genes that were targeted in MGI. Next, we ranked all human protein complexes according to their enrichment for these 315 targeted genes. A complex was defined as a central hub protein and all its first-order interaction partners in a network of protein–protein interactions.32 Each complex was tested for enrichment of targeted genes using a hyper-geometric test. In this manner, all sets of genes interacting in a biologically active complex obtained a rank with respect to its enrichment of TDS targeted mouse genes.
For each gene, let Lj denote the rank from layer j. The meta-score was obtained as rank indicates candidate disease genes.
To complement the single-nucleotide polymorphism (SNP) GWAS and the gene-centric meta rank analysis, we applied a protein–protein interaction-based approach that leverages GWAS analysis by testing for the aggregate association across several gene products.21 Protein complexes were defined and delineated by use of our previously published protein–protein interaction database, InWeb.32 We ranked 8342 protein complexes on the basis of their enrichment in genes that harbour low TDS GWAS association p values. Of particular interest were complexes with low scores that also contained genes with low meta-rank from the three-layer MetaRanker analysis.
In this study we genotyped a discovery cohort comprising 439 controls and 488 cases affected by at least one of four TDS sub-phenotypes: testicular germ cell cancer, cryptorchidism, hypospadias or low semen concentration. Samples were analysed using the Affymetrix Genome-Wide Human SNP Array 6.0. After application of stringent quality control criteria, we compared the genotype frequency distributions between cases and controls for 600 798 markers using the MAX test,24 which allows simultaneous investigation of additive, dominant and recessive inheritance. We observed only marginal confounding due to population stratification or other biases, as indicated by the genomic inflation factor (λ) of 1.019 and the quantile–quantile plot of observed versus expected χ2 test statistics (supplementary figure 4 online).
We identified a strong association with all phenotypes of TDS (p<10−6) for nine markers, within the same linkage disequilibrium block, at 2q31.1, with the strongest association for rs17198432 (figure 1). The subset of 212 TGCC cases in our cohort was investigated in a separate single-marker analysis to compare the corresponding TGCC-specific results with those of two recent GWASs on TGCC.13–15 The most significant marker at KITLG was ranked 12th among all markers (p=1.17×10−5). Furthermore, in the TGCC group, the observed genotype frequencies of the KITLG markers were comparable to those previously reported (table 2).
For each data type, all genes were ranked on the basis of their strength of association with the TDS phenotype. These data type-specific ranked lists of genes were then combined into a final ranked list of genes. In order to avoid high ranking of genes that were only supported by a single data type, we used a modification of the MetaRanker20 tool by using the sum rather than the product of the individual gene ranks. Thus all human genes were ranked on the basis of a combination of microarray gene expression time-series studies of the developing fetal testis in mouse and human,29–31 protein–protein interaction data,32 targeted mice knockouts resulting in developmental defects of the testis (MGI), and gene scores based on the single-marker analysis of this GWAS (figure 2; see Materials and methods for details). From this analysis, we selected markers from 14 top-ranked genes, many of which are active during early development (supplementary tables 5 and 6 online). Genes implicated by previous GWASs of TGCC had relatively poor ranks (supplementary tables 7 and 8 online), partly because we used a TDS cohort rather than solely TGCC cases and partly because the genes did not exhibit any significant variable expression during testicular development.
Joint analysis of sets of markers may identify aggregated effects that are significant at a pathway level. Such analysis is motivated by the hypothesis that a set of genetic variations can perturb different components of a certain biologically functional entity, such as a signalling pathway or a protein complex, and each variation can influence the functional entity either individually or in combination. Thus we derived protein complexes from protein–protein interaction data, and tested each complex for significant enrichment of TDS-associated SNPs. A complex consisting of interacting members of the TGFβ superfamily (BMPER, BMP2, BMP7, BMP6 and BMP4) was found to be significantly associated (supplementary table 9 online).
On the basis of the findings from the marker selection approaches described above (TDS single-marker association, integration of gene–phenotype association and pathway analysis) and previously reported KITLG markers with association with TGCC, we selected a total of 39 SNPs (supplementary table 2 online) for subsequent validation in an independent replication cohort of 671 Nordic men. These included 16 SNPs from four independent loci based on single-marker association, and 23 SNPs based on integration of gene association ranks from different data types and protein complex ranks based on pathway analysis. Of the 671 samples, 23 did not meet the quality control standards for the genotyping system used by KBiosciences.
Markers at three genes showed a significant association in the replication cohort: KITLG, TGFBR3 and BMP7 with OR (95% CI) 1.93 (1.39 to 2.69), 1.52 (1.08 to 2.15), 1.28 (1.01 to 1.62) (table 2). Surprisingly, the nine SNPs located in the HOXD region that were identified by the single-marker association in the discovery cohort did not find support in the validation cohort (table 2 and figure 3). Comparison of ORs for the different phenotypes of TDS showed that KITLG was mainly associated with TGCC, but not with the other TDS sub-phenotypes, whereas TGFBR3 and BMP7 showed similar association for all phenotypes (table 3).
For further investigation of TGFBR3 expression at the protein level, we performed an immunohistochemical analysis in fetal and adult testis (figure 4). Leydig cells and peritubular cells were TGFBR3-positive, predominantly at cell membranes, in both normal fetal testes and adult testes with dysgenetic features, such as the presence of CIS cells, hyperplastic Leydig cell micronodules and thickened peritubular membranes. Fetal gonocytes, CIS cells and other types of germ cells were negative, except for a very few scattered CIS cells in adluminal location and some staining in Golgi apparatus/pre-acrosome in round spermatids, which may be unspecific reactions. In contrast with a previous study,33 TGFBR3 protein was not detected in Sertoli cells, probably because of the differences in tissue fixation methods or specificity of the antibodies used.
This study focused on understanding the genetics behind developmental TDS, which is typically associated with male infertility and TGCC, sometimes combined with mild forms of genital malformations. Some studies suggest that a common genetic background for TDS is doubtful,1 and others have even implied that TDS does not exist.34 We believe that TDS is a real syndrome with predominantly environment-dependent pathogenesis and a relatively weak component of genetic predisposition. This means that the prevalence of TDS among Caucasian populations probably depends on the strength of the adverse environmental or lifestyle factors. However, as this study suggests, it is likely that a subset of cases of TDS do share predisposing genetic variants. Here, we identified, using a combination of GWAS and systems biology approaches, genomic variants in TGFBR3 and BMP7 between several TDS phenotypes, most notably cryptorchidism and testicular cancer. TGFBR3 has not previously been reported to be directly associated with male infertility or TDS, but participates in pathways that are highly relevant in terms of involvement in regulation of embryogenesis and oncogenesis. In the single-marker approach, we also found evidence in the discovery cohort for a possible association for rs17198432 and other markers in the HOXD gene cluster.
It should be mentioned that SNP markers found to be significant in GWASs are more often than not in intergenic regions and not in coding or known regulatory regions. Picking the closest gene as the affected one is an assumption; however, cis-eQTL studies35 have shown that it is a reasonable assumption. One of the strengths of our integrative approach is that it takes a gene focus that increases the probability of the gene being involved, as it requires the gene to have relevant expression, as well as protein–protein interactions of relevance to the phenotype.
The integrative systems biology analysis, which combined data from the GWAS with gene–phenotype associations found in complementary data types, including testis developmental expression data and protein–protein complexes associated with testicular developmental defects, identified several disparate loci that are functionally linked to the TGFβ superfamily signalling pathway, which were supported by the replication study. Using GWAS alone, these SNPs would fall into a ‘grey zone’,18 and would not have been selected for replication. Many GWASs suffer from this phenomenon because the number of samples required to attain genome-wide significance can be unfeasibly high.
Among the loci with the highest significance across all four TDS phenotypes was the TGFBR3 gene, which encodes the TGFβ receptor type III, a co-receptor for inhibins and TGFβ1–3, BMP2, BMP4 and BMP7. TGFBR3 and its co-receptors and ligands are expressed in most endocrine tissues, including the testis. TGFBR3 has been identified in human Sertoli cells and Leydig cells, both in the normal testis and in tubules with CIS.36 In this study, we found that TGFBR3 was expressed in both Leydig and peritubular cells in fetal as well as adult testis, but was absent from Sertoli cells. This expression pattern supports the idea that TGFBR3 is essential for embryonic development of the reproductive system, as silencing of the murine Tgfbr3 gene resulted in impaired function of fetal Leydig cells and testicular dysgenesis.37 In addition, the signalling partners of TGFBR3, the family of activin receptors, are present in fetal human testis38 and appear to be dysregulated in testes with TGCC.33 39 Of note, among the loci that scored high in both the pathway analysis and the integration analysis were BMP7 and other bone morphogenetic proteins, which can bind TGFBR3. The presence of several components of the same pathway, which have all previously been implicated in the development of the testis and early reproductive system, reinforces the validity of the identified SNPs as possible genetic factors predisposing to TDS.
The analysis of the TGCC subset of cases confirmed the association of the KITLG locus (12q21) with increased risk of testicular cancer.13 14 However, it has been suggested that SNPs in the KIT and KITLG genes may also be involved in infertility,40 but we did not detect an association for other subtypes of TDS. Thus, the KITLG region should be further studied to pinpoint causative variants within KITLG or its regulatory regions. In the single-marker approach, we also found evidence for a possible association for rs17198432 and other markers in the HOXD gene cluster (supplementary table 10 online). HOXD genes encode a group of transcription factors well known for their importance in morphogenesis during early embryonic development of limbs and genitalia,41 and a direct association with cryptorchidism has been shown in one study.42 As HOXD was not supported in the replication cohort, we have not discussed this in detail, but we mention it as a variant locus that would be worth screening for in other cohorts.
Here, we have successfully applied systems biology methodologies to prioritise important findings in a GWAS of a carefully collected discovery cohort of individuals of Danish ethnic background with detailed clinical records, and a subsequent replication cohort with similar phenotypes. The GWAS data were complemented by integration of multiple data types. This approach enabled us to identify potentially significant variations that would normally not be prioritised on the basis of genome-wide multiple testing corrected p values. Two of the three replicated loci were selected on the basis of systems biology, indicating that integrative analyses can be useful for candidate selection of markers that do not reach genome-wide significance in a single-marker association analysis. The candidates identified in this study, notably those involved in TGFβ superfamily signalling pathways, provide evidence that at least a subset of TDS cases may have a common genetic predisposition. The paucity of informative markers indicates the predominant role of environmental and lifestyle factors in the pathogenesis of this syndrome. However, association with the TGFβ signalling pathway has not been shown previously, and this should be examined further. The findings should be corroborated in further independent GWASs, fine-mapping and sequencing studies, or by mechanistic investigations of the implicated pathways. In any case, integration of prevailing information via systems biology approaches holds promise for enhancing future GWASs.
We would like to thank Betina F Nielsen, Ana Ricci Nielsen and Olga Rigina for their excellent technical assistance, and Rasmus Nielsen and Anders Albrechtsen for statistical advice. We thank all patients for participating in this study, and the health professionals for facilitating our work. The population allele and genotype frequencies were obtained from the data source funded by the Nordic Center of Excellence in Disease Genetics based on samples regionally selected from Finland, Sweden and Denmark.
Funding: This work was supported by the Villum Kann Rasmussen Foundation, a NABIIT grant from the Danish Strategic Research Council, the Novo Nordisk Foundation, the Academy of Finland, Sigrid Juselius Foundation, Foundation for Paediatric Research, Turku University Hospital, the European Commission (FP7/2008-2012: DEER 212844), the Swedish Cancer Society (CAN 2009/817), Gunnar Nissons Cancer Foundation, Malmö University Hospitals Cancer Foundation and King Gustaf V's Jubilee Fund for Cancer Research.
Competing interests: None.
Patient consent: Obtained.
Ethics approval: The local ethics committees in Denmark, Sweden and Finland.
Contributors: MDD, HL, NJ, AJ, SB, RG, TSJ, NES and ERM designed the study; MDD, NW and DE performed experiments; JEN performed the immunohistochemistry; NW, DE, MDD, JDS, TAG, THP, TSJ and RG collected and analysed data; NJ, ERM, AG, YLG, GCC, HEV, JT and GD provided DNA samples and materials; MDD, HL, ERM, NES, NW, DE, SB and RG wrote the manuscript; SB, AJ, TSJ, HL, RG and THP provided technical support, conceptual advice and project coordination.
Provenance and peer review: Not commissioned; externally peer reviewed.