In the present work, we addressed the problem of how low-abundance DNA mutations, such as those typically induced by environmental mutagens, can be detected directly using ultra-high-throughput sequencing. In principle, such low-abundance mutations can be detected by single-molecule sequencing (25). However, this necessarily involves the sequencing of fragments derived from different cells and precludes insight into the mosaic basis of somatic mutations in tissues or cell populations. Indeed, the absence of robust methods for analyzing genomes at the level of the single cell essentially constrains access to the complexity of biological tissues (26). For example, cancerous tissue is notoriously heterogeneous, with each cell carrying its own unique capabilities for growing into a full-blown tumor (1). The ability to analyze subclonal genetic diversity will greatly expand the accessible clinical information about a particular cancer in a particular patient (27).
Another area of application of single-cell genome sequencing is in monitoring stem cells. The genomic integrity of human embryonic stem cells and induced pluripotent stem cells has been questioned due to a high observed rate of point mutations (28), copy number variations (29) and changes in genome-wide CpG methylation (30). These studies identified potentially hazardous somatic mutations/epi-mutations that had clonally expanded through the population. However, there may also be low-abundance mutations present in the stem cell populations missed by the aforementioned analyses that need to be addressed before clinical use can be considered.
Finally, during development, maturation and aging, the genome of somatic cells is subject to the continuous occurrence of random genome sequence alterations, which gradually diminish intra-organ homogeneity and may lead to loss of coordination of expression among multiple genes in functional pathways or networks (31). Such emerging cell-to-cell variability is amenable to analysis only through single-cell genomics approaches.
In this study, we have taken the first step towards comprehensively analyzing mammalian single-cell genomes using next-generation sequencing. We first showed evidence of an induced mutation load in single ENU-treated cells from a Drosophila cell line (S2). The high level of induction and the consistency of the results across the three ENU-treated cells provide strong evidence that the experimental results for the ENU-treated cells are accurate. Next, we developed a reduced-representation assay to repeat the experiment using MEFs. We observed a consistent mutation frequency across the two single ENU-treated MEFs, similar to the levels found in the treated S2 cells. The mutation frequencies observed in the treated MEFs were two-fold higher than those previously estimated using a lacZ transgenic reporter gene (20). The discrepancy can be explained, at least in part, by the fact that the reporter gene cannot detect mutations that do not inactivate the β-galactosidase enzymatic activity (21).
While our results conclusively indicate elevated mutation frequencies in both S2 cells and MEFs after exposure to ENU, our data do not allow for accurate estimates of the frequency of spontaneous mutations in control cells. For the control MEFs, it is possible to make a comparison with spontaneous mutation frequencies observed with our lacZ reporter system in these same cells (32). While we do find higher mutation frequencies by direct sequencing than with the reporter system (in keeping with the inability of the lacZ positive selection system to detect silent mutations), they are in the same range.
Background mutation frequencies are equal to the sum of the spontaneous mutation frequencies within the individual cells and the background error rate of the assay due to mutations introduced during the WGA step. An error introduced early in MDA would be found, on average, in 12.5% of the sequences in a diploid cell and 6.25% of the sequences in a tetraploid cell (25). The kinetics of the amplification process, i.e. the fact that multiple polymerase molecules may be operating on the original template strand when the initial error is introduced, may further reduce the probability of significant errors occurring. The percentages listed above are averages, however, and it is possible that an error produced early in MDA could be randomly selected for, leading to a false positive call present in the majority of reads aligning at a locus. Additionally, the high degree of allele dropout observed in many of our samples increases the probability that some artifacts produced early in MDA could be found in a significant proportion of reads aligning at a locus. In order to obtain a more quantitative estimate of the spontaneous mutation frequency, more extensive studies are needed, using maximum-likelihood methods for estimating mutation frequencies from the high-throughput sequencing data, for example, as described by Lynch (33).
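The averages quoted above can be reproduced with a small sketch. The counting argument used here is an assumption for illustration and is not spelled out in the text: each genome copy contributes two template strands, and after one copying round each template has one new copy, exactly one of which carries the error. The binomial read-sampling model and the function names `early_error_fraction` and `prob_at_least` are likewise hypothetical; the model ignores amplification bias and allele dropout, both of which the text notes can raise the observed fraction.

```python
from math import comb


def early_error_fraction(ploidy: int) -> float:
    """Expected fraction of sequences carrying an error made in the first copying round.

    Assumption: `ploidy` double-stranded genome copies give 2 * ploidy template
    strands; after one round there are 4 * ploidy strands, one of which is erroneous.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return 1.0 / (4 * ploidy)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability that at least k of n reads carry the error.

    Assumption: reads sample the amplified product independently with
    per-read error probability p (no amplification bias, no allele dropout).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for ploidy in (2, 4):
        fraction = early_error_fraction(ploidy)
        # Chance that the error appears in a majority of reads at 10x depth.
        majority = prob_at_least(6, 10, fraction)
        print(f"ploidy={ploidy} expected fraction={fraction:.4f} "
              f"P(majority at 10 reads)={majority:.2e}")
```

Under these assumptions a majority call from an early error is rare by sampling alone, which is consistent with the point that uneven amplification and allele dropout, not read sampling, are the more plausible sources of such false positives.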
The mutation spectra observed in the treated S2 cells and MEFs agree with data obtained using reporter genes (34–36), providing additional evidence for the accuracy of our measured ENU-induced mutation loads. A common argument against the use of reporter genes is that they may not be representative of genome-wide events, due to both sequence specificity and their dependence on a phenotypic change. ENU is a small direct-acting agent with a lack of sequence specificity and, therefore, it was not surprising that no major differences were found between reporter gene data and our genome-wide unbiased approach.
While this analysis was limited to point mutations, the same methodology can be applied to investigate small insertions and deletions (InDels) and structural variation. Indeed, the paired-end sequencing approach allows us to detect structural alterations as invalid alignments to the reference genome sequence (37). If the mapped locations of the ends of a paired-read have abnormal distances, orientation or chromosomal localization, then a genomic rearrangement is suggested. While we were clearly able to detect such events, ENU is a point mutagen and we did not find an increase of genome rearrangements in the treated cells (not shown).
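The three criteria for an invalid paired-end alignment (abnormal distance, orientation or chromosomal localization) can be sketched as a simple classifier. This is a minimal illustration, not the pipeline used in this study: the `ReadPair` type, the `classify_pair` function, the assumed forward/reverse library orientation and the insert-size thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Mapped positions of the two ends of one paired read (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, min_insert: int = 100, max_insert: int = 600) -> str:
    """Label a read pair as concordant or as one of three discordant classes.

    Assumptions: a forward/reverse library (leftmost end on "+", rightmost
    end on "-") and illustrative insert-size bounds in base pairs.
    """
    # Ends on different chromosomes suggest a translocation-like event.
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the two ends by genomic position before checking orientation.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return "abnormal_orientation"

    # A span outside the expected range suggests a deletion or insertion.
    distance = right_pos - left_pos
    if distance < min_insert or distance > max_insert:
        return "abnormal_distance"

    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 50000, "-")
    print(classify_pair(example))
```

In practice a single discordant pair is weak evidence; a rearrangement would normally be called only when several independent pairs support the same event.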
In summary, these results show for the first time how massively parallel sequencing can be used effectively for measuring random, low-abundance mutations in somatic cells. This opens up the possibility of analyzing intra-tissue heterogeneity of cellular genotypes. Importantly, our methodology provides a direct, comprehensive approach for estimating an individual's risk from exposure to mutagenic agents, such as radiation.