Despite advances in personal genomics (Ashley et al., 2010; Levy et al., 2007; Pushkarev et al., 2009; Wheeler et al., 2008), gamete genome variation within individuals, especially fine-scale personal recombination activity and germline mutation rates, has so far been largely inaccessible. Bulk analysis of sperm cells with PCR offers high resolution and sensitivity (Jeffreys et al., 2005; Webb et al., 2008) and has been used to demonstrate variable usage of historical recombination hotspots, but it is limited to focused regions of the genome. Cytological approaches can be used to study recombination-related effects in individuals, but these studies use gamete progenitor cells instead of sperm and have several limitations: (1) sample collection requires invasive biopsies; (2) the analysis is performed before the completion of meiosis, so it is not clear whether all of the synaptonemal complexes proceed to full recombination, and each progenitor cell analyzed by this method predicts an average result for four future sperm cells; and (3) cytological staining does not allow high-resolution molecular analysis such as genotyping or sequencing.
There has been increasing interest in performing single-cell genome analysis in human cancers, and one can compare the methods and results used in cancer with those used here for human gamete genomes. One group used FACS to sort individual nuclei from human breast tumors (Navin et al., 2011). The genomes from these nuclei were amplified in microliter volumes and lightly sequenced to ~0.2x coverage. These data were sufficient to construct a rough cell lineage map but did not allow calling of individual bases; rather, low-resolution structural variants were used. Another group used mouth-pipetting to isolate individual cells from hematopoietic and kidney tumors (Hou et al., 2012; Xu et al., 2012), whose genomes were then amplified in microliter volumes. Rather than performing whole-genome analyses, these samples were then put through exome amplification and sequencing, effectively obtaining 30x coverage of only 1% of the genome. Those data were also used to establish lineage relationships between the cells, this time on the basis of point mutations. Their work reveals one of the challenges of performing single-cell analyses on diploid genomes: only 57% of the diploid calls were correct. Without the ability to examine a significant proportion of the whole genome, the studies mentioned above had to rely on a high mutation rate to distinguish single cells. As a consequence, none of these methods has been applied to samples other than late-stage cancers.
In this study, we applied microfluidics to single-cell whole-genome amplification. This technique not only offers a high degree of parallelization but also improves amplification performance. MDA is sensitive to environmental contamination, and extensive sample purification is required for traditional bench-top whole-genome amplification (Hou et al., 2012; Jiang et al., 2011; Xu et al., 2012). More sensitive assays have even revealed contamination in the MDA reagents themselves (Blainey and Quake, 2010). By incorporating the amplification into microfluidic chips, we reduced the reaction volume, and hence the contamination, by ~1000-fold.
Amplification error has been a concern for single-cell whole-genome analysis. Previous microliter-volume single-cell exome studies have shown false discovery rates of 2–3×10^-5 from MDA (Hou et al., 2012; Xu et al., 2012). Using our microfluidic approach on haploid cells, we have reduced the error rate to 4×10^-9 at 5× coverage (binomial probability computed from the per-read error rate). An important feature of single-molecule MDA is its repeated use of the original template molecule. Even if an amplification error occurs in the initial stage, a large fraction of the products will still preserve the correct base from the original template, and the statistical power of multiple coverage discriminates these errors from true genomic variation.
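The binomial reasoning above can be illustrated with a minimal sketch. It treats each read covering a site as an independent trial that is wrong with a fixed per-read error rate, and computes the chance that at least a given number of the reads are wrong. The per-read error rate and the number of erroneous reads required for a false call are not stated in the text, so the values used below are hypothetical placeholders and the output is not meant to reproduce the reported figure. Independence is also an assumption: an early MDA error is copied into downstream products, so real reads are correlated.

```python
from math import comb


def prob_false_call(per_read_error: float, coverage: int, min_wrong: int) -> float:
    """Binomial tail: P(at least `min_wrong` of `coverage` reads are wrong).

    Assumes reads are independent and each is wrong with probability
    `per_read_error` (a simplification; see the note above).
    """
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(min_wrong, coverage + 1)
    )


# Hypothetical per-read error rate, chosen only for illustration.
ASSUMED_PER_READ_ERROR = 0.01

if __name__ == "__main__":
    for needed in (3, 4, 5):
        p = prob_false_call(ASSUMED_PER_READ_ERROR, coverage=5, min_wrong=needed)
        print(f"at least {needed} of 5 reads wrong: {p:.3e}")
```

The probability falls steeply as more of the covering reads are required to agree on the error, which is why even modest coverage of a haploid template can suppress amplification errors.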
Using this microfluidic MDA approach, we report the first genome-wide single-cell analysis of human sperm. We were able to create a personal recombination map for an individual and to measure the rate of de novo mutations in this individual’s germline. Sampling a large set of meioses from a single individual for fine-scale analysis allowed us to uncover individual-specific features that may be buried in population data. P0’s preference for a subset of historical hotspots suggests how individual features contribute to population diversity, and offers a potential solution to the hotspot paradox. We propose that this partial overlap is also the general pattern in individuals: everyone uses a different subset of the historical hotspots; while some hotspots are dying in some people, new recombination activities evolve to refill the hotspot pool; and the partially overlapping patterns of individuals give rise to the population results, with hotspots (still active in many people) and deserts (used by fewer people). Support for this theory comes from single-cell analysis. While P0 has on average 58% overlap with the historical hotspots, this ratio ranges from 0 to 100% for his single cells (Figure S2D). The partially overlapping patterns of individual cells produce P0’s personal recombination landscape.
Transmission distortion has long been known, but the key factors behind it are not clear. Biased segregation during meiosis, differing ability to achieve fertilization, and differing postzygotic viability can all contribute to this phenomenon. In particular, if meiotic drive exists, its molecular mechanism is not known. Our data from 91 cells showed that meiotic drive does not generally appear as whole haplotype blocks, but may occur at individual SNP loci. The most intuitive explanation for this result would be gene conversion. Indeed, we found 5–15 gene conversions in each genome-sequenced cell. This represents a lower bound for the total number of conversions in each single human sperm, since the density of heterozygous SNPs is limited. If both crossover events and gene conversions originate from double-strand breaks and share a recombination mechanism, then they should have the same hotspot overlap ratio. If we match the number of gene conversions at hotspots and further assume there are 1.5 million heterozygous SNPs in the genome, the total number of gene conversions in a single cell would be ~250–800, which is 10–35× the number of crossovers. Previous sperm typing studies have suggested 4–15× as many gene conversions as crossovers, based on data from 3 hotspots (Jeffreys and May, 2004), but this value apparently changes across the genome (Gay et al., 2007).
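As a back-of-the-envelope check on the figures in the preceding paragraph, the sketch below divides each bound of the estimated conversions per cell by the matching fold ratio to see what crossover count per cell it implies. It also computes the mean spacing between heterozygous SNPs, which illustrates why the observed conversion count is only a lower bound. The haploid genome size of roughly 3 Gb is an assumption introduced here for illustration, and the function names are hypothetical; this is not the extrapolation procedure used in the study.

```python
# Assumption (not from the text): approximate haploid human genome size in bp.
ASSUMED_HAPLOID_GENOME_BP = 3.0e9

# Figures quoted in the text.
HET_SNPS = 1.5e6
CONVERSIONS_PER_CELL = (250, 800)   # estimated total, lower and upper bound
FOLD_OVER_CROSSOVERS = (10, 35)     # quoted ratio, lower and upper bound


def implied_crossovers(conversions: float, fold: float) -> float:
    """Crossovers per cell implied by a conversion count and a fold ratio."""
    return conversions / fold


def mean_snp_spacing(genome_bp: float, n_snps: float) -> float:
    """Average distance in bp between heterozygous SNPs, assuming even spread."""
    return genome_bp / n_snps


if __name__ == "__main__":
    for conv, fold in zip(CONVERSIONS_PER_CELL, FOLD_OVER_CROSSOVERS):
        print(f"{conv} conversions at {fold}x -> "
              f"{implied_crossovers(conv, fold):.1f} crossovers per cell")
    spacing = mean_snp_spacing(ASSUMED_HAPLOID_GENOME_BP, HET_SNPS)
    print(f"mean heterozygous SNP spacing: {spacing:.0f} bp")
```

Both bounds imply a similar crossover count per cell (25 and about 23), so the quoted range and ratio are internally consistent. Under the assumed genome size, heterozygous SNPs are on average about 2 kb apart, so a conversion tract that spans no such SNP goes undetected.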
Evolutionary studies have estimated the germline mutation level (Makova and Li, 2002), but recent results from the 1000 Genomes Project (Conrad et al., 2011) are not consistent with the previous findings. The combination of data from our study and the 1000 Genomes Project suggests that the germline mutation rate can vary greatly among different individuals but not among different cells from the same individual. This may explain why the male mutation rate is not always higher than the female rate. DNA methylation also affects genome instability (Li et al., 2012) and C->T point mutation levels, but in opposite ways. A finely tuned methylation level is therefore required for a high-quality sperm genome. The high germline mutation rate at CpA regions (Conrad et al., 2011; Miyoshi et al., 1992) at least suggests a methylation profile different from that of the somatic genome. The fact that cytosine deamination is less well repaired at CpA than at CpG also explains our findings (Wang and Edelmann, 2006).
The ability to study large numbers of single sperm cells has offered several new insights into the biology of meiosis. Studying the germline genome is but one application of single-cell genomics, and we expect that the method described here will find applications in many other fields, including cancer, aging, immunology, and developmental biology.