|Home | About | Journals | Submit | Contact Us | Français|
Genes mutated in congenital malformation syndromes are frequently implicated in oncogenesis1,2, but the causative germline and somatic mutations occur in separate cells at different times of an organism’s life. Here we unify these processes for mutations arising in male germ cells that show a paternal age effect3. Screening of 30 spermatocytic seminomas4,5 for oncogenic mutations in 17 genes identified 2 mutations in FGFR3 (both 1948A>G encoding K650E, which causes thanatophoric dysplasia in the germline)6 and 5 mutations in HRAS. Massively parallel sequencing of sperm DNA showed that the FGFR3 mutation increases with paternal age, with a similar mutation spectrum at the K650 codon to that in bladder cancer7,8. Most spermatocytic seminomas show increased immunoreactivity for FGFR3 and/or HRAS. We propose that paternal age effect mutations activate a common “selfish” pathway supporting proliferation in the testis, leading to diverse phenotypes in the next generation including fetal lethality, congenital syndromes and cancer.
Spontaneous germline point mutations in humans occur at average rates of 4-160 × 10−9 per haploid genome, tend to be paternal in origin with a male/female mutation ratio of 2-7, and show only a modest effect of parental age on mutation rate9-13. However a different pattern is evident in some congenital disorders, which arise from specific point mutations that are 2-3 orders of magnitude more common than expected. The causative mutations nearly always originate from the healthy fathers (male/female ratio >10) who, on average, are 2-6 years older than the population mean. We term mutations with these collective properties paternal age effect mutations: the best documented examples occur in the genes FGFR2, FGFR3, HRAS, PTPN11 and RET (Supplementary Table 1 online)14-19. In all cases the mutations exhibit dominant inheritance and encode missense substitutions with gain-of-function properties.
The pathological basis of paternal age effect mutations needs to be explained in the context of normal spermatogenesis, in which progeny of diploid stem cells (spermatogonia) have a choice either to self-renew, or to differentiate through a series of mitotic and meiotic divisions, leading to mature sperm20. Amongst paternal age effect mutations, the properties of the 755C>G transversion in FGFR2 (a cause of Apert syndrome)14, have been investigated in most detail. A quantitative assay showed that the mutation level is elevated (10−6-10−4) in the sperm of many healthy men and increases with age21. However, the mutation levels are usually distributed very unequally between the two FGFR2 alleles, a pattern inconsistent with the notion that the mutations originate from many independent replication errors during spermatogenesis. Instead, rare initial mutations could become enriched because of a selective advantage: this would lead to progressive clonal expansion of the mutant spermatogonia and account for the allelic skewing and paternal age effect21,22. Putative FGFR2-mutant clones have been observed in normal testes, as predicted23. Additional analyses of sperm and testes for a different Apert syndrome FGFR2 mutation24,25, as well as an FGFR3 mutation that causes achondroplasia26-28, support a shared mechanism for the origin of paternal age effect disorders.
The proposed clonal expansions of spermatogonia are reminiscent of the action of oncogenes in neoplasia; consistent with this, somatic FGFR2 mutations identical to those causing Apert syndrome are frequent in endometrial carcinoma29,30. We therefore proposed that the mutant clones in the testis might themselves progress to testicular tumors31. Previous attempts to identify FGFR2 or FGFR3 mutations in common testicular germ cell tumors (seminomas and non-seminomas) yielded negative results31,32, however these tumors occur predominantly in young men (aged 25-35 years) and arise from a fetal precursor state33, which is difficult to reconcile with the proposed origin of paternal age effect mutations. Here we have investigated spermatocytic seminomas, a rare type of testicular germ cell tumor with a later mean age of onset (~54 years). These tumors present as slow growing, well-circumscribed swellings that rarely metastasize and are thought to originate from the adult spermatogonial lineage4,5.
We sequenced mutation hotspots in fibroblast growth factor receptor genes (FGFR1, FGFR2, FGFR3) in 30 spermatocytic seminomas (Supplementary Table 2 online) and found the identical 1948A>G transition in FGFR3 (encoding K650E) in two different tumors (Supplementary Fig. 1a). In both cases, histopathologically normal testis adjacent to the tumor was negative for the mutation. This mutation has previously been identified in the germline heterozygous state in the neonatally lethal skeletal disorder thanatophoric dysplasia type II (TDII, MIM187601)6, and as a somatic mutation in bladder tumors7,8, seborrheic keratoses34, and multiple myeloma35 (Supplementary Table 3 online). The FGFR3 K650E substitution is strongly activating, allowing constitutive autophosphorylation of the intracellular tyrosine kinase domain in the absence of ligand (Supplementary Note online)36,37. A paternal age effect in thanatophoric dysplasia was described previously, but genetic studies were not undertaken38,39.
Based on the birth prevalence of thanatophoric dysplasia (which is caused by several different FGFR3 mutations)6,27 of 2.4 × 10−5 (refs. 38,39) and by analogy with other FGFR2 and FGFR3 mutations21-28, we expected the 1948A>G mutation to be present at average levels of ~10−5 in sperm. Since five other point mutations of the FGFR3 K650 codon (AAG) have been described in germline congenital disorders36,40,41 and three of these also as somatic mutations in tumors8,34,35,42 (Fig. 1a, Supplementary Table 3), we aimed to quantify all 9 possible substitutions at FGFR3 codon 650. We divided each DNA sample into 3 aliquots, 2 of which were spiked with different amounts of diluted genomic DNA heterozygous for the FGFR3 1948A>C substitution, to provide an internal standard for absolute quantification of mutation levels. Samples were digested with BpiI (recognition sequence GAAGAC), to enrich equally for all FGFR3 K650 codon substitutions. During subsequent PCR amplification we used primers containing unique 4-nucleotide tags to identify each sample, then all products for a given spike level were pooled to construct 3 independent libraries for massively parallel sequencing (Fig. 1b).
To assess the accuracy and reproducibility of the assay, we estimated mutation levels for two different FGFR3 K650 substitutions (1948A>G [2 samples] and 1949A>T), when progressively diluted with normal blood DNA. For both 1948A>G samples we found an excellent correspondence between the amount of input DNA and the estimated mutation level, at least down to an input concentration of 10−5; for the 1949A>T sample, mutation levels were underestimated by a factor ~3.3 fold, but were also strongly correlated with the amount of input DNA (Fig. 2a).
We estimated levels of the 9 possible single nucleotide substitutions at the FGFR3 K650 codon in 78 sperm and 8 blood samples obtained from healthy donors. Whereas relatively low counts of all mutations were found in the blood samples, levels of the 1948A>G substitution were often much higher (maximum of 2.1 × 10−4) in sperm samples and were significantly correlated with donor age (Spearman rank rs = 0.34, P = 0.002) (Fig. 2b). 1948A>G was the most prevalent substitution in 66/78 sperm donors (Fig. 2c; Supplementary Table 4 online) and accounted for 73% of total mutations in these samples. Amongst the other potential substitutions, 1949A>C (K650T) accounted for 17% of total mutations in sperm was the most prevalent mutation in 8/78 samples. This change has been described as a constitutional mutation in a few individuals with acanthosis nigricans and mild short stature41, and as a somatic mutation in bladder tumors42 (Supplementary Table 3). The three other substitutions that accounted for total mutation levels >1%, 1949A>T (K650M) and 1950G>C/T (K650N), have also been observed in constitutional disorders (Fig. 2a, Supplementary Table 3)36,40. By contrast, the three point mutations that encode silent, conservative or stop changes (Fig. 1a), which have not been reported as either germline or somatic mutations, were all >1000-fold less prevalent than 1948A>G.
The marked variations in prevalence of different mutations at the K650 codon, and correspondence with the functional effect of the encoded substitution, suggest differential selection of cells expressing mutant proteins. We compared the average levels in sperm of the 9 FGFR3 K650 point mutations with four other measurements for this codon (Fig. 2d, Supplementary Table 3). There was a strong correlation (r = 0.95) with total cases reported of each germline mutation, indicating that the level of mutations in sperm is likely to be the major determinant of the population prevalence of different pathogenic germline mutations. There was also a strong correlation (r = 0.94) with the total cases reported of each somatic mutation in bladder tumors, indicating that similar mutation/selection forces are likely to act in these distinct biological contexts. The correlations were substantially weaker with the mutation distribution observed in seborrheic keratoses (r = 0.33) and with the relative activation potential of the TK measured by in vitro kinase assay (r = 0.27)36: in both contexts, the gain-of-function effect of the K650M mutation appears to surpass that of K650E (Fig. 2d). However, heterozygosity for K650M (or the equivalent K644M in mouse) leads to a viable phenotype in both species40,43, whereas K650E (K644E in mouse) is lethal6,44, which demonstrates that the in vitro kinase measurement does not capture all dimensions of the pathological consequences of these two substitutions (see Supplementary Note).
We screened 14 additional genes in the spermatocytic seminomas, including genes (i) mutated in syndromes exhibiting a strong paternal age effect (HRAS, PTPN11, RET), (ii) involved in signal transduction pathways (mitogen activated protein kinase [MAPK] and phosphoinositide-3 kinase [PI3K]) of proteins encoded by class (i) genes and for which pathogenic activating mutations have been reported (AKT1, BRAF, KRAS, MAP2K1, MAP2K2, NRAS, PIK3CA, SOS1) and (iii) genes for which oncogenic mutations are commonly found in tumors (bladder, thyroid, and endometrial cancers) where paternal age effect mutations have also been described (CTNNB1, EGFR, KIT). Five mutations (all in tumors negative for FGFR3 mutations), were found in HRAS at the Q61 codon hotspot of activating mutations: 3 were 182A>G (Q61R) transitions and 2 were 181C>A (Q61K) transversions. All mutations were apparently homozygous, and were absent in adjacent normal tissue in the four available cases (Supplementary Fig. 1b). Q61R and Q61K substitutions are common in human cancers (Catalogue of Somatic Mutations in Cancer: http://www.sanger.ac.uk/perl/genetics/CGP/cosmic) and are both highly activating in a transformation assay45. In the germline, heterozygous HRAS mutations cause Costello syndrome (MIM218040)16,17,46, but no substitution at Q61 has been identified, which likely reflects lethality of these mutations17,45,47 (see Supplementary Note). No mutations were identified in the other 13 genes screened (Supplementary Table 2). The average age of patients positive for FGFR3 or HRAS mutations (74.9 years) was significantly greater than for mutation-negative cases (57.6 years) (Student t-test P = 0.0009) (Fig. 3a).
To evaluate further the role of FGFR3 and HRAS in spermatocytic seminomas, we examined protein distribution in 26 tumors, including all 7 mutation-positive cases. FGFR3 (ref.48) and HRAS are expressed in subpopulations of cells in normal seminiferous tubules (Fig. 3b insets); pERK1/2, an activated effector of the MAPK pathway (Fig. 4a), is present in the nucleus of a subset of spermatogonia, primary spermatocytes and a subset of round spermatids (not shown). In spermatocytic seminomas, we found strong positive staining for FGFR3 in 5/24 tumors (including 1/2 that were FGFR3 mutation positive) and for HRAS in 15/26 tumors (including 4/5 that were HRAS mutation positive) (Fig. 3b); 18/26 (70%) of tumors were strongly positive for one or both proteins. In most tumors (18/24), including 6/7 mutated cases, the presence of FGFR3 and HRAS was mutually exclusive, but this did not show a simple correlation with the gene mutated in the mutation-positive cases (Fig. 3b). pERK1/2 was detected in 14/25 (56%) of tumors (Supplementary Table 2).
Our results support the proposal31 that clonal expansion resulting from selective advantage of paternal age effect mutations can lead to testicular tumors. To our knowledge this work links for the first time in any organism the processes of mutation in the soma (causing neoplasia) and germline (causing heritable disorders in the next generation), which normally occur in different cells, to a mutational event likely occurring in the same cell. The clonal expansions presumably involve altered dynamics of stem cell self-renewal, through a proliferative advantage (possibly enhanced by preferential survival)49 compared to neighboring non-mutant cells, analogous to the role of oncogenes in cancer. Only weak advantage (selection coefficient of 0.002-0.01 per cell generation)3,23,24 is necessary to account for the observed mutation levels in sperm and paternal age excess (2-6 years) observed for the associated germline disorders. Our data favour a premeiotic origin4,5 for spermatocytic seminoma, which is supported by the observation that transfection of murine spermatogonial stem cells with mutant HRAS (encoding G12V) causes seminomatous tumors50. Men originating tumors containing FGFR3 and HRAS mutations were significantly older than those without mutations (Fig. 3a), suggesting that the mutated tumors represent a distinct pathological subset.
Based on these and previous data21-28, activating mutations in FGFR2, FGFR3 and HRAS promote clonal expansion in the testis. The encoded proteins are physiologically connected, as HRAS lies downstream of FGFRs in the growth factor receptor signaling pathway (Fig. 4a)2,46. By considering the additional genes (RET, PTPN11) subject to paternal age effect mutations (Supplementary Table 1), these connections can be extended. Hence, RET is, like FGFRs, a receptor tyrosine kinase (RTK) that signals through RAS and plays a critical role in spermatogonial renewal51. Overexpression of GDNF, the RET ligand, leads to accumulation of undifferentiated spermatogonia and testicular tumors in older mice52. PTPN11 (encoding SHP2) positively regulates RTK-RAS signaling (Fig. 4a)53.
A central role for abnormal FGFR3-RAS signaling in the origin of spermatocytic seminoma is supported by the immunohistochemical analysis (Fig. 3b). Both FGFR3 and RAS are present in normal testis, but are markedly elevated in spermatocytic seminomas, including cases lacking FGFR3 and HRAS mutations (in which other mechanisms are presumably leading to upregulation of the proteins). The reciprocal pattern of increased FGFR3 or HRAS staining in many tumors suggests that elevation of either component is sufficient for pathway activation. Similar observations have been made in low grade bladder cancer, in which mutations in FGFR3 and RAS genes are mutually exclusive54 and FGFR3 mutation status and protein expression do not always correlate8. The spectrum of FGFR3 K650 codon mutations in sperm is very similar to that present in bladder cancer (Fig. 2d), suggesting that similar mutation-selection mechanisms operate in these distinct cellular contexts.
RAS activates multiple pathways including those typically involved in proliferation (MAPK) and survival (PI3K)2. Activating mutations of genes encoding downstream components of the MAPK pathway (BRAF, RAF1, MEK1 and MEK2), cause neuro-cardio-facial-cutaneous syndromes that overlap with PTPN11 and HRAS mutations (Fig. 4a)2,46; inhibition of the MAPK pathway in mice modeling specific Fgfr2 and Ptpn11 mutations ameliorates their abnormal phenotypes55,56. We observed pERK1/2 staining in 56% of spermatocytic seminomas, indicating pathological activation of the MAPK pathway (Supplementary Table 2). However the PI3K pathway has also been functionally implicated in spermatogonial self-renewal50.
We envisage a range of consequences for these selfish27 mutations occurring in spermatogonial cells (Fig. 4b). Heterozygosity for the most highly activating mutations, such as FGFR3 K650E and HRAS Q61R/K, causes severe, lethal phenotypes when transmitted in the germline6,17. Additional secondary genetic changes at these loci (Supplementary Note) would lead, in combination with other mutations57, to spermatocytic seminoma. Moderately activating mutations, for example those encoding substitutions at G12/G13 in HRAS (Costello syndrome)16,17,46, in FGFR2 (Apert syndrome)14,21-25 and FGFR3 (achondroplasia)15,26,28, lead to clonal expansion that is eventually limited by growth arrest or senescence58 before overt tumors become apparent. We view this process as analogous to that occurring in skin, where a spectrum of activating FGFR3 mutations in keratinocytes leads to seborrheic keratoses34. Diverse mutations in spermatogonia that confer weaker selective advantage may lead to lower levels of enrichment (>1-50 fold) in sperm. As well as encoding missense substitutions, mutations could confer altered gene expression, which shows a high fraction of positively selected changes in testis59. Such subtle effects would be technically challenging to detect, yet could contribute substantially to the burden of human disease, through a common disease-rare variant mechanism60. Cancer-predisposing mutations are especially likely to be favored by this process, posing an increased risk to offspring of older fathers.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.
GenBank: FGFR3 genomic DNA, NT_006081; FGFR3 exon IIIc cDNA, NM_000142; HRAS cDNA, NM_005343. cDNA numbering is given relative to the A (=1) of the ATG initiation codon; note that the FGFR3 K650 codon is alternatively numbered K652 in the exon IIIb spliceform.
Approval for the study was provided by the Oxfordshire Research Ethics Committee C (OxREC C03.076) and the Joint Research Ethics Committees of the Copenhagen and Frederiksberg Communes (KF 01 265848).
We obtained 43 spermatocytic seminomas collected from tissue archives at hospitals in Denmark, Sweden and Oxford, UK, and included mutation and immunohistochemistry data on 30 samples for which >30% (average 64%; range 30%-98%) of exons amplified. The mean age of patients at the time of tumor removal was 61.9 years (range 33-87 years). In addition to having typical morphological features, the tumor type was confirmed by staining with MAGE-A4, while PLAP and OCT3/4 were negative5,57. None of the tumors was invasive.
Single ejaculates from 78 men (aged 22.1-73.9 years) and 8 blood samples from individuals aged 36.6-73 years were donated anonymously. DNA samples from 2 patients heterozygous for FGFR3 1948A>G (TD3 and TD6), one patient heterozygous for 1949A>T (SAD4) and one patient heterozygous for 1948A>C (HCH1-spike, used as the spike DNA) were a kind gift from L. Legeai-Mallet (Hôpital Necker-Enfants Malades, Paris, France).
DNA was extracted from paraffin-embedded samples as described31 and mutation hotspots were analyzed by PCR amplification and DNA sequencing of the following genes: AKT1, BRAF, CTNNB1, EGFR, FGFR1, FGFR2, FGFR3, HRAS, KIT, KRAS, MAP2K1, MAP2K2, NRAS, PIK3CA, PTPN11, RET and SOS1. All PCRs were set up in a sterile UV-hood in a 30 μl reaction volume using 1× FastStart Buffer, 150 μM dNTPs, 0.16 μM primers (forward and reverse), 0.5 U FastStart Taq DNA polymerase and 0.05 U Pwo DNA polymerase (both from Roche). When available, restriction digests were used to screen for the mutations of interest. The PCR primers, conditions and genotyping methods are given in Supplementary Table 5 and the results in Supplementary Table 2. The products were cleaned (30 min at 37 °C in 0.2× Exo I buffer, 10 U Exonuclease I [New England Biolabs] and 2 U Shrimp Alkaline Phosphatase [SAP, United States Biochemical], followed by 15 min at 85 °C), sequenced in both orientations, and run on a ABI 3700 DNA sequencer (Applied Biosystems). Positions where germline and/or somatic mutations have been previously reported were specifically interrogated on chromatograms and scored independently. To evaluate the zygosity status of the tumor samples, we genotyped 4 single nucleotide polymorphisms (SNPs) (rs2071616, rs2659871, rs41279090 and rs2075526).
Representative cores (2 mm) of tumor and adjacent normal tissue (when available) were punched from paraffin-embedded blocks of spermatocytic seminomas and control tissues (epididymis, prostate, classical seminoma and embryonal carcinoma). Two tissue microarrays containing 40 cores each were constructed and sectioned at 4 μm. Larger sections from some tumors were individually stained. Immunohistochemical staining was performed using diluted monoclonal mouse antibodies to FGFR3 (B-9, 1:50), HRAS (F-235, 1:80), OCT-3/4 (C-10, 1:250) (all from Santa Cruz Biotechnology), MAGE-A4 (1:2000; a kind gift from Dr. G. Spagnoli, Basel, Switzerland) and PLAP (1:50; DAKO), and a monoclonal rabbit antibody to pERK1/2 (20G11, 1:150; Cell Signalling Technology), by means of a standard indirect peroxidase method. Deparaffinized and rehydrated sections were microwaved, incubated first in 0.5% H2O2, then in goat serum (Histostain kit, Zymed) before the addition of the primary antibodies overnight at 4°C, while control sections of each specimen were incubated in the dilution buffer alone. After washing in Tris buffer and incubation with a peroxidase conjugated anti-mouse antibody (or anti-rabbit for pERK), the reaction was developed in the presence of 3-amino-9-ethyl carbazole (AEC) and H2O2 (Histostain kit). All sections were counterstained with Mayer’s haematoxylin. The sections were evaluated by two independent observers and semi-quantitative scoring was used to assess the relative abundance of stained cells.
DNA was extracted from blood and whole ejaculates as described21 and concentrations were precisely estimated at 3 dilutions against a dilution series of human genomic DNA (Roche) using the TaqMan PCR conditions (Supplementary Table 6), designed to a unique 91 bp amplicon located on chromosome 16 (courtesy of M. de Gobbi).
Measurements of mutation levels around the FGFR3 K650 codon (cDNA position 1948-1950) were performed using a strategy similar to that described21. Triplicate samples each containing 10 μg of genomic DNA and either 2 ng (spike level 10−4), 0.2 ng (spike level 10−5) or 0 ng (unspiked) of the HCH1-spike DNA were digested with 120 U XbaI and 40 U BpiI (both Fermentas) for 4 h. The digested DNA samples, flanked by 2 lanes of Lambda DNA/Eco91I marker (Fermentas), were electrophoresed in a 0.9% Tris-borate-EDTA (TBE) agarose gel (without ethidium bromide). This double digestion generates a 4,453 bp XbaI-fragment carrying mutant FGFR3 K650 sequences, while the normal BpiI-digested K650 sequence yields 2 fragments of 2,446 bp and 2,007 bp (note that the FGFR3 genomic reference sequence contains a rare G allele at a known A/G SNP rs7688609, located at cDNA position 1953; this nucleotide is shown as A in Fig. 1a and Supplementary Fig. 2).
To select for mutant sequences, a gel slice corresponding to 4.2-4.6 kb (the marker lanes were cut out of the gel, stained with ethidium bromide and the 4,325 bp and 4,822 bp fragments were labelled with dye before being replaced in their original position in the gel), was excised and gel purified with an E.Z.N.A. MicroElute kit (Omega Bio-Tek). PCR amplification was performed (PCR1, Supplementary Table 6) followed by a second round of selection with 30 U BpiI for 4 h in 100 μl volume to yield selected material referred to as ‘Pool2′. For each biological sample, an aliquot of 5 μl of the Pool2 material was used as template for a nested amplification (PCR2) in order to prepare the Illumina libraries (see below).
Genomic samples heterozygous for FGFR3 1948A>G (TD3 and TD6) and 1949A>T (SAD4) mutations were used in reconstruction experiments in which 10 μg of blood genomic DNA was mixed with a diluted series of mutant DNA corresponding to final mutation concentrations of 0 (no added mutant DNA), 3 × 10−6 (0.06 ng), 10−5 (0.2 ng), 3 × 10−5 (0.6 ng), 10−4 (2 ng) and 3 × 10−4 (6 ng) and taken through the same protocol as the sperm and blood samples (i.e. each dilution sample was mixed with 3 dilutions of the HCH1-spike DNA). The dilution samples were analyzed together with the blood and sperm samples.
Mutant genomic samples TD3, TD6 and SAD4 and 4 normal (wild-type) genomic DNA samples were included as controls in the analysis. These samples were taken through the same protocol of amplification with the exception of the omission of the BpiI enzyme from all the incubation steps, hence there was no selection imposed on these samples.
Three independent libraries were prepared for massively parallel sequencing using a modified version of the Illumina protocol “Digital Gene Expression-Tag Profiling with DpnII”. Each library contained a mixture of 112 DNA species and was characterized by a specific amount of the HCH1-spike DNA (library 1 was unspiked, library 2 contained the spike at 10−5 and library 3 contained the spike at 10−4). Primer sequences and reaction conditions are provided in Supplementary Table 6.
Five microlitres of each Pool2 sample was used for PCR amplification (PCR2) using 112 different forward 4nt-tag-Fw primers (containing a common FGFR3 sequence preceded by a unique 4 nt tag and a DpnII cloning site; Supplementary Fig. 2). The tag primers were synthesized, HPLC-purified and checked for synthesis error by mass spectroscopy (Thermo Electron Corporation). After confirming efficient amplification, equal volumes of PCR products were mixed so as to be represented in a roughly equimolar ratio in the library. In parallel, the reconstruction-dilution samples and control DNAs were also amplified with unique 4nt-tag-Fw primers and added to the PCR mix. For each library, pooled PCR products were purified (E.Z.N.A. PCR purification kit), digested with 100 U of DpnII (New England Biolabs) for 1 h 30 min, dephosphorylated using 4 U of SAP (United States Biochemical) for 1 h at 37 °C and heat inactivated for 15 min at 80 °C, followed by purification (E.Z.N.A. MicroElute PCR purification kit) and resuspension in 15 μl sterile water. 10 μl of the purified fragments were ligated to the Illumina Gex Adapter 1 (annealed Adapter1a and Adapter1b sequences) using the Adapter ligation conditions. The ligation reaction was electrophoresed in a 3% Tris-acetate-EDTA (TAE) agarose gel and a slice of the expected size (137 bp) was excised, purified (E.Z.N.A. gel purification kit) and resuspended in 25 μl sterile water. Each library was then enriched by a final amplification (Gex PCR) using 1.5 μl of purified ligated products. The 3 libraries were prepared independently to avoid cross-contamination and were quantified using a fluorometer (Qubit) before being validated by direct sequencing, Pyrosequencing (Biotage) and Bioanalyzer (Agilent).
Massively parallel sequencing of the libraries was performed on 3 different channels of an Illumina GAII sequencer by ServiceXS (Leiden, The Netherlands) for 36 cycles using the Gex-DpnII sequencing primer (Supplementary Fig. 2b).
To estimate the levels of each single nucleotide mutation at the FGFR3 K650 codon (corresponding to cycles 23-25 of the Illumina sequencing scheme [Supplementary Fig. 2b]), we used a Bayesian approach to fit a model to the observed counts at each nucleotide in the sequencing data that allows for bias in frequencies derived from sequencing error and noise introduced by the rounds of PCR and digestion. The model is similar to that used previously for Pyrosequencing21, but allows for a different error structure arising from the GAII Illumina technology. We discarded reads where the minimum Phred quality score for any of the four bases of the tag (cycles 1-4), or three bases of the K650 codon (cycles 23-25), was under 10. We also eliminated reads with apparent frameshift errors or mutations outside the BpiI restriction site (i.e. in the invariant 16 bp primer sequence). Let zi be the log10 frequency of mutation i in the sample. Our prior for zi is normal(−8, 2) for all non-wild-type species. After adding the spike at concentration sj in experiment j (s0 = 0, s1 = 10−5, s2 = 10−4) the normalized frequencies of the non-wild-type mutations are given by
where I is an indicator function that takes the value 0 or 1 depending on whether the mutation in question is the same as the HCH1-spike (1948A>C). We allow each experiment (defined by the level of spike and the lane of the machine on which sequencing was carried out) to have separate bias and signal to noise ratio parameters. Specifically, we assume that the counts for the mutations in the sequencing data are multinomial with frequency vector
where Bj represents the signal to noise ratio for experiment j and wij represents the background for mutation i in experiment j. We assume a uniform(0,1000) prior for each Bj and a uniform(0,100) prior for each wij.
Through use of the multinomial-Dirichlet model, we can integrate over x to give the contribution to the total likelihood from experiment j in a given sample
where , Cij is the number of reads observed with mutation i in experiment j, and . Likelihoods are combined across samples and experiments to give a total likelihood.
To estimate parameters we used Metropolis-Hastings MCMC with sequential update of each parameter, a burn-in period of 3,000 complete parameter updates and a subsequent 10,000 samples. Multiple runs from different starting points were conducted to check for convergence and visual inspection was used to compare observed to expected values as a means of checking model adequacy. Titration data (Fig. 2a), normal wild-type control DNA and biological samples were analyzed together. Samples from the chain are summarized by the posterior mean value and the 95% equal-tailed probability interval (ETPI). Estimated levels of all FGFR3 K650 codon substitutions for sperm and blood samples are given in Supplementary Table 4.
We thank the Oxford Fertility Clinic and anonymous sperm donors for help with obtaining samples; L. Andersen, T. Chin-A-Woeng, K. Clark, K. Cook, A. Fenwick, A. Herlihy, H. Kistrup, J. Lim, J-E. Nielsen, L. Thompson and N. Ward for expert technical support and advice; associates of the Rajpert-De Meyts and Wilkie labs for discussions; staff in numerous pathology departments in Denmark and Lund (Sweden), and G. Turner and I. Roberts, for histopathological support and samples; L. Legeai-Mallet for providing control genomic samples; G. Spagnoli for the MAGE-A4 antibody; M. de Gobbi for information on TaqMan primers for DNA quantification; M. Dighe, R. Heller and K. Kutsche for permission to reproduce photos; and S. Robertson for comments on the manuscript. This work was funded by the Danish Cancer Society (DP08147 to ERDM) and the Wellcome Trust (078666 to AOMW).
Note: Supplementary information is available on the Nature Genetics website.
URLs. Digital Gene Expression-Tag Profiling with DpnII, http://illumina.ucr.edu/ht/documentation/molbiol-docs/DGE-DpnII-Sample-Prep.pdf/view.