Human mutation rates are important for understanding many aspects of evolution and medicine, and attempts to estimate them date back to Haldane's prescient 1935 figure of 2 × 10⁻⁵ mutations/gene/generation for the haemophilia gene [1]. This rate is equivalent to 2 × 10⁻⁸ mutations/nucleotide/generation if mutations at 1,000 nucleotides could generate haemophilia. Similarly, Kondrashov's estimate at 20 loci causing Mendelian disorders was 1.8 × 10⁻⁸ mutations/nucleotide/generation [2]. Alternative estimates for human and chimpanzee sequences that are likely to be neutral have also been similar: for example ~2.5 × 10⁻⁸ mutations/nucleotide/generation [3]. Yet the mutation rate depends on local context; it varies over a scale that ranges from pairs of nucleotides (e.g., CpG dinucleotides show an approximately 10× higher rate of base substitution than the average) to entire chromosomes (e.g., the Y chromosome shows a rate several times higher than autosomes because of its restriction to the male germ line, where more cell divisions occur per meiosis) [12]. It has not previously been possible to measure base-substitution mutation rates directly by sequencing human nuclear DNA in families, but this has been done for the mtDNA HVSI, where a controversy has emerged over whether the “pedigree rate” measured in family studies is consistent with the “evolutionary rate” inferred from comparisons of different species or whether it is substantially faster [13]. The ability to measure nuclear rates directly, offered by advances in sequencing technology, now promises additional insights into these areas.
Current next-generation sequencing technologies such as the Illumina platform used here have a high base-calling error rate, perhaps 1%, and have the additional feature that the short reads obtained need to be mapped to the reference sequence; this feature is potentially error prone for non-unique sequences. We overcame base-calling errors by using high-quality calls and high coverage (mean 11× and 20×, respectively) and avoided mapping errors by excluding the extensive duplicated (“palindromic”) and highly repeated sections of the reference sequence from the analysis, as well as applying the filtering criteria listed in . We then tested all candidate mutations by capillary sequencing, and thus we are confident that the false-positive rate in the final dataset is effectively zero. The false-negative rate is more difficult to measure, but three lines of reasoning suggest that it is low. First, relaxing the candidate-mutation filters to include second-class candidates did not identify any additional mutations (). Second, in the capillary verification experiments, about 20 kb was sequenced from both chromosomes, and no unexpected mutations were discovered. Third, all of the expected gold-standard YCC SNPs were detected. Because these are detected by comparison with the reference sequence in the same way as mutations, we can use this measurement to estimate a false-negative rate of <2% at the positions that differ between the DFNY1 and reference sequences. Thus, we conclude that the measured rate is reliable.
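The text does not say how the <2% false-negative bound was derived. One common way to reason about a bound of this kind, given that every expected site was detected, is the exact binomial upper limit for zero misses out of n trials. The sketch below is an illustration under that assumption, not the authors' calculation; the function names and the 95% confidence level are hypothetical choices.

```python
import math

def upper_bound_miss_rate(n_sites, confidence=0.95):
    """Exact one-sided upper bound on the miss rate when 0 of n_sites were missed.

    Solves (1 - p)^n = 1 - confidence for p (assumed binomial model).
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_sites)

def min_sites_for_bound(bound, confidence=0.95):
    """Smallest number of fully detected sites giving an upper bound below `bound`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - bound))

# Hypothetical question: how many known SNPs, all detected, would be needed
# to claim a miss rate below 2% at 95% confidence?
needed = min_sites_for_bound(0.02)
print(needed, upper_bound_miss_rate(needed))
```

Under these assumptions, a bound below 2% requires on the order of 150 fully detected sites; a looser confidence level would require fewer.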
In the current study, two DFNY1-family Y chromosomes separated by 13 generations were resequenced. Because one carries the DFNY1 mutation and the other does not, the question arises as to whether the mutations detected might relate to the DFNY1 phenotype rather than representing the neutral rate. Three of the four can be eliminated as causal because they do not segregate with the phenotype. The fourth (ChrY: 2,971,542 A>T) segregates with the phenotype but lies in a region devoid of genes and seems unlikely to be causal because a compelling candidate mutation—a rearrangement located outside the ~10.15 Mb region scanned here—has been identified (our unpublished data). We therefore conclude that the SNP mutations observed do indeed represent the neutral rate.
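The four substitutions, the ~10.15 Mb scanned, and the 13 generations quoted above together imply a per-nucleotide, per-generation rate. The sketch below computes that point estimate and an exact Poisson 95% interval found by bisection. Treating the mutation count as Poisson and using this particular interval are assumptions made for illustration and may differ from the authors' procedure; all function names are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def bisect(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi] with a sign change."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_poisson_ci(count, alpha=0.05):
    """Exact two-sided interval for the Poisson mean given an observed count."""
    hi_search = 10.0 * count + 50.0
    lower = 0.0
    if count > 0:
        # P(X >= count | mu) = alpha / 2
        lower = bisect(lambda mu: 1.0 - poisson_cdf(count - 1, mu) - alpha / 2.0,
                       0.0, hi_search)
    # P(X <= count | mu) = alpha / 2
    upper = bisect(lambda mu: poisson_cdf(count, mu) - alpha / 2.0, 0.0, hi_search)
    return lower, upper

def mutation_rate_with_ci(mutations, nucleotides, generations, alpha=0.05):
    """Rate per nucleotide per generation, with an exact Poisson interval."""
    exposure = nucleotides * generations
    lower, upper = exact_poisson_ci(mutations, alpha)
    return mutations / exposure, lower / exposure, upper / exposure

# Values taken from the text: 4 mutations, ~10.15 Mb, 13 generations.
rate, ci_low, ci_high = mutation_rate_with_ci(4, 10.15e6, 13)
print(f"{rate:.2e} ({ci_low:.2e} to {ci_high:.2e})")
```

With these inputs the point estimate is about 3.0 × 10⁻⁸ mutations/nucleotide/generation, and the width of the interval reflects how few mutations were observed.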
The measured mutation rate has wide confidence intervals, but these could be narrowed substantially in the future if more sets of related males were sequenced. Such estimates could in principle be more precise than rates inferred from comparisons of related species, which are limited by uncertainties in the fossil record and the generation times of extinct ancestors. No discrepancy between the pedigree and evolutionary rates was evident. Although mutations in cell culture are expected, the contrast between 8/8 mutations in one cell line and 0/8 in the other was not (p = 0.008). This contrast suggests the influence of unknown mutagenic environmental factors or, more likely, a mutagenic genotype specific to DFNY1-101, and it illustrates how different somatic mutation rates can be in related cell lines. Two of the cell line mutations (4,633,474 C>T and 4,980,623 T>G; Figure S1) were mixtures of ancestral and mutant alleles, but the other six were fixed (3,957,219 G>A, 4,939,256 T>C, 12,063,011 C>G, 15,126,873 T>C, 20,627,064 C>G, and 27,095,961 A>G).
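The quoted p = 0.008 for the 8/8 versus 0/8 split is consistent with a two-sided binomial test in which each cell-line mutation is equally likely to arise in either line. The sketch below reproduces that value under this assumption; the choice of test and the function name are illustrative and are not stated in the text.

```python
from math import comb

def two_sided_split_p(in_first, total):
    """Two-sided binomial p-value for a split of `total` events between two
    equally likely groups, with `in_first` events in the first group."""
    extreme = min(in_first, total - in_first)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(total, i) for i in range(extreme + 1)) / 2 ** total
    # Double for the two directions, capped at 1.
    return min(1.0, 2.0 * tail)

# 8 of 8 cell-line mutations fell in one line and none in the other.
print(round(two_sided_split_p(8, 8), 3))
```

Here the p-value is 2 × (1/2)⁸ = 2/256, which rounds to 0.008.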
In conclusion, we have shown that one can use next-generation sequencing technology to measure the very low mutation rate of human nuclear DNA reliably. The mutation rate observed is consistent with that inferred from evolutionary comparisons but can potentially be measured more precisely and provide new insights into human mutation processes.