J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female 1. Diverse studies have supported Haldane’s contention of a higher average mutation rate in the male germline in a variety of mammals, including humans (e.g. 2,3). Here we present the first direct comparative analysis of male and female germline mutation rates from complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell-lines from which DNA was derived. Most strikingly, in one family we observed that 92% of germline DNMs were from the paternal germline, while, in complete contrast, in the other family 64% of DNMs were from the maternal germline. These observations reveal considerable variation in mutation rates within and between families.
Mutation underlies all heritable genetic variation, and the observation that a mutation has arisen de novo can be highly discriminating for identifying causal pathogenic variation in patients 4-6. Attempts to measure mutation rates in humans fall into two broad categories: direct methods that estimate the number of mutations that have occurred in a known number of generations 7,8, and indirect methods that infer mutation rates from levels of genetic variation within or between species. Previous estimates of germline base substitution rates range from 1.1 × 10−8 to 3 × 10−8 per base per generation 7,9-14. This variation is due, in part, to uncertainty or assumptions in key parameters, such as divergence times between species, generation times and ancestral population sizes. Furthermore, all previous estimates represent an average across multiple generations and/or an average of male and female mutation rates. Consequently, the previous studies provide no information on how mutation rates vary between individuals of either the same or different sexes, or indeed between gametes within an individual. It has been proposed that the mammalian male germline may be more mutagenic than the female because of the greater number of cell divisions 1. Subsequent studies (e.g. 2,3) have suggested that, on average, the male germline is more mutagenic than the female. The most robust recent estimate 3, based on whole-genome sequences of human and chimpanzee, suggests a six-fold difference, averaged across ~5-7 million years of independent evolution of the two lineages.
High-throughput sequencing enables whole-genome analysis of mutation rates in human pedigrees 7, and promises to revolutionize our understanding of how mutation rates vary between sexes, individuals and families. We analyzed lymphoblastoid cell-lines (LCLs) from two parent-offspring trios (CEU and YRI) sequenced genome-wide to greater than 22-fold mapped depth using three different sequencing platforms during the pilot phase of the 1000 Genomes Project (15, Online Methods). We developed three independent probabilistic algorithms to identify candidate de novo mutations (DNMs) from these sequence data (Supplementary Note). From the union of candidate DNMs identified by the three algorithms, 3,236 and 2,750 potential DNMs were selected for experimental validation from the CEU and YRI trios respectively. These numbers are far in excess of the expected number of true germline DNMs, a deliberate choice to maximize our sensitivity to detect DNMs.
We attempted validation of every candidate DNM, using two novel experimental approaches and additional resources from each family to unambiguously distinguish germline DNMs from somatic or cell-line DNMs (Figure 1, Online Methods, Tables S1-3). For the CEU trio these validation experiments were performed on LCL-derived DNA from both the original trio and a third generation of the same family. For the YRI trio these validation experiments were performed on LCL-derived DNA from the trio as well as whole-genome-amplified blood-derived DNA from the same individuals. Using these validation data we classified each putative DNM into one of five categories: (i) germline DNM, (ii) non-germline (somatic or arising in cell culture) DNM, (iii) inherited variant, (iv) false positive, (v) inconclusive (Table 1, Supplementary Note, Table S1 and Figures S1-2). We identified 49 and 35 germline DNMs and 952 and 634 non-germline DNMs in the CEU and YRI trios respectively. The observed ~20:1 ratio of non-germline DNMs to germline DNMs is substantially larger than the 1:1 ratio published previously 4. This difference could be due to the age of the cell-lines (number of passages), the mutagenicity of the cell culture conditions and/or the clonality of the cell-lines. We observed differences in the mutational characteristics of germline DNMs, non-germline DNMs and inherited germline variants, in terms of the ratio of transitions to transversions, the proportion of CpG mutations, the clonality of mutations, their occurrence at sites under selective constraint and the evidence for transcription-coupled repair (Table 1, Supplementary Note, Figures S3-4).
By estimating the false negative rates in discovery and validation of DNMs and quantifying the proportion of the genome that we were able to scrutinize reliably for DNMs (Supplementary Note), we estimated the germline DNM rate in each trio to be 1.17 × 10−8 (95% CI: 0.88 × 10−8 - 1.62 × 10−8) and 0.97 × 10−8 (95% CI: 0.67 × 10−8 - 1.34 × 10−8) for the CEU and YRI trios respectively. The sex-averaged germline mutation rate estimates we derived agree very closely with three other recent studies focusing on sex-averaged mutation rates in the most recent generation 4,7,13. Averaging across these four studies gives a more precise sex-averaged mutation rate of 1.18 × 10−8 (± 0.15 × 10−8), which is less than half of the frequently cited sex-averaged mutation rate of 2.5 × 10−8 derived from human-chimpanzee sequence divergence 14. These apparently discordant estimates can be largely reconciled if the age of the human-chimpanzee divergence is pushed back to 7 million years, as suggested by some interpretations of recent fossil finds 16, and if more recent (and slightly lower), robust genome-wide estimates of sequence divergence are used 17. These considerations suggest a plausible range for the divergence-derived mutation rate of 1.12 × 10−8 to 2.05 × 10−8, which encompasses the averaged contemporary mutation rate above. Moreover, the distribution of mutation rates in the population could contain a long tail of relatively rare individuals with considerably higher mutation rates (perhaps as a result of genetic or environmental factors). In that case, the mean rate across many generations could be considerably greater than the modal rate within a generation.
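The correction described above, from a raw count of validated DNMs to a per-base, per-generation rate, can be sketched as follows. This is a minimal illustration of the general idea, not the procedure in the Supplementary Note: the function name, the single combined false negative rate and the input values are all assumptions chosen for the example.

```python
def germline_mutation_rate(n_dnms, callable_bases, false_negative_rate):
    """Per-base, per-generation germline mutation rate for one trio offspring.

    n_dnms              -- validated germline DNMs observed in the offspring
    callable_bases      -- haploid bases that could be scrutinized reliably (assumed input)
    false_negative_rate -- combined fraction of true DNMs missed in discovery
                           and validation (assumed to be a single number here)
    """
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    # Scale the observed count up to the number of DNMs expected without misses.
    corrected_count = n_dnms / (1.0 - false_negative_rate)
    # The offspring carries two haploid genomes, one from each parent,
    # so the count is spread over twice the callable haploid length.
    return corrected_count / (2.0 * callable_bases)


# Placeholder inputs for illustration only; these are NOT values from the study.
rate = germline_mutation_rate(n_dnms=40, callable_bases=2.0e9, false_negative_rate=0.2)
print(f"{rate:.2e}")  # 1.25e-08
```

Dividing by two haploid genomes yields a sex-averaged rate; a parent-specific rate would instead divide the DNMs assigned to one parent by a single haploid callable length.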
We ascertained for most germline DNMs whether they arose on a paternal or maternal haplotype, using three alternative methods (Online Methods, Supplementary Note, Table S1). Where more than one haplotyping method could be applied to the same DNM (N=17) the results were 100% concordant. Male and female germline mutation rates in the two trios (Figure 2) were significantly different (p < 3 × 10−6, Fisher exact test). In one family, 92% of germline DNMs were from the paternal germline, whereas in the other family only 36% of DNMs were paternal in origin. Although the confidence intervals of some of the parent-specific rates overlap, those of the paternal rates in the two trios do not, and neither do those of the maternal rates. These differences could be due to extensive variation in the number of DNMs in gametes from the same individual or to considerable variation between individuals in their underlying DNM rate. With only a single offspring per family, we cannot distinguish between these two alternatives, but either would give rise to substantial variation in the number of DNMs between offspring of different families. The potential scale of this variation can be appreciated by simply considering that exchanging the paternal gamete in the CEU trio for that in the YRI trio would have resulted in a five-fold difference in the number of mutations seen in the two offspring.
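A Fisher exact test of this kind compares the paternal and maternal DNM counts in the two trios as a 2×2 table. The sketch below implements the two-sided test from the hypergeometric distribution using only the standard library. The counts passed to it are hypothetical, chosen only to give proportions near 92% and 36% paternal; the per-trio counts are not stated in the text above, so the printed value should not be read as the reported p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts (rows: trio A, trio B; columns: paternal, maternal).
# 22/24 is about 92% paternal and 9/25 is 36% paternal.
p_value = fisher_exact_two_sided(22, 2, 9, 16)
print(f"p = {p_value:.2g}")
```

The two-sided definition used here, summing all tables at most as probable as the observed one, is one common convention; other conventions exist and can give slightly different values.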
Some of this variation in mutation rates between families might be explained by differences in parental ages and a dependency of mutation rate on age. Unfortunately, parental ages at conception for these two trios were not available. In any case, the analysis of larger sibships would be required to disentangle fully the effects of parental age from genetic and environmental factors that might also differ between families. Variation in mutation rates between individuals could also be partly explained by a recent relaxation of selective constraint on mutation rates resulting from the lower efficiency of selection in humans as compared to the most recent common ancestor of humans and chimpanzees 16, due to our small effective population size 17. Mutation is a random process and, as a result, considerable variation in numbers of mutations is to be expected between contemporaneous gametes within an individual. If modeled as a Poisson process, the 95% confidence interval on a mean number of ~30 DNMs per gamete (as expected from a mutation rate of ~1 × 10−8) ranges from 20 to 41, a two-fold difference. Truncating selection might act to remove the most mutated gametes and thus reduce this variation among gametes that successfully reproduce. However, any additional heterogeneity in stem cell ancestry or environment, for example variation in the number of cell divisions leading to contemporaneous gametes, would likely increase inter-gamete variation in numbers of mutations.
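The Poisson interval quoted above can be checked with a short calculation. The sketch below assumes an equal-tailed interval, taking the smallest counts whose cumulative probability reaches 2.5% and 97.5%; the function names are illustrative.

```python
from math import exp


def poisson_quantile(mean, q):
    """Smallest k such that P(X <= k) >= q for X ~ Poisson(mean)."""
    pmf = exp(-mean)  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < q:
        k += 1
        pmf *= mean / k  # recurrence: P(X = k) = P(X = k - 1) * mean / k
        cdf += pmf
    return k


def poisson_interval(mean, coverage=0.95):
    """Equal-tailed interval covering at least `coverage` of a Poisson(mean)."""
    tail = (1.0 - coverage) / 2.0
    return poisson_quantile(mean, tail), poisson_quantile(mean, 1.0 - tail)


print(poisson_interval(30))  # (20, 41)
```

With a mean of 30 the interval spans 20 to 41 DNMs per gamete, matching the two-fold range described in the text.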
In summary, while there may be growing concordance in estimates of the average mutation rate in contemporary generations, we have presented evidence of substantial variance in sex-specific mutation rates between families. The variation in mutation rates that we observed is of potential clinical significance, as it suggests that the risk of mis-diagnosing a DNM as being pathogenic could vary substantially between patients.
Advances in sequencing technologies that lower costs and increase fidelity (Supplementary Note) will empower further studies into mutational processes by applying the framework we have established here for estimating sex-specific mutation rates in families. These future studies promise to revolutionise our understanding of mutation processes, and how they vary between individuals and between families as a result of age, genetic background and environmental exposures.
We would like to thank Gil McVean, Tim Massingham, Jeff Thorne, Julie Hussin, Alison Motsinger, Coriell Cell Repositories and members of the 1000 genomes analysis group for their help and support. DFC, SJL, YZ, CT and MEH were funded by the Wellcome Trust [grant number: 077014/Z/05/Z]. JK, FC, YI, MZ, GAR, and PA were funded by the Ministry of Development, Exploration and Innovation (grant #PSR-SIIRI-195) in Quebec and a Genome Quebec Award for Population and Medical Genomics to PA.
Author contributions: MEH and PA conceived of the study; DFC, JEMK, MAD, MD, RC, EAS and PA developed statistical methodologies; DFC, JEMK, MAD, CLH, KVG, EAS, MEH and PA analysed the data; FC, YI, GAR, CT, MZ, SJL and YZ generated validation data; and DFC, PA and MEH wrote the paper.