To test whether some observed mutations were present in the starting fibroblasts at low frequency prior to reprogramming, we developed a new digital quantification assay (DigiQ) to quantify the frequencies of 32 mutations in six fibroblast lines using ultra-deep sequencing (Supplementary Figures S3-4). We amplified each mutated region from the genomic DNA of 100,000 cells with a high-fidelity DNA polymerase and sequenced the pooled amplicons with an Illumina Genome Analyzer at an average coverage of 10^6. Although the raw sequencing error is roughly 0.1-1% with the Illumina sequencing platform, detection of rare mutations at a lower frequency is possible with proper quality filtering and careful selection of controls [22]. For each fibroblast line, we included the mutation-carrying hiPS DNA as the positive control and another “mutation-free” DNA sample as the negative control for sequencing errors (Supplementary Methods). Comparison of the allelic counts at the mutation positions between the fibroblast lines and the negative controls allowed us to distinguish rare mutations from sequencing errors and to estimate the detection limit of the assay. Seventeen of the 32 mutations were found in fibroblasts at frequencies of 0.3-1,000 in 10,000, while 15 mutations were not detectable (Supplementary Tables S2-3). In each fibroblast line with more than one detectable rare mutation, the frequencies of the mutations were very similar, suggesting that a small sub-population of each fibroblast line contained all of the pre-existing hiPS mutations, while the rest of the cells lacked any of them.
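The comparison of allelic counts between a fibroblast sample and its negative control can be illustrated with a simple statistical sketch. The example below is not the procedure described in the Supplementary Methods; it assumes a one-sided two-proportion z-test (normal approximation), and the function name `rare_variant_pvalue` and all read counts are hypothetical values chosen only for illustration.

```python
import math


def rare_variant_pvalue(alt_sample, depth_sample, alt_control, depth_control):
    """One-sided two-proportion z-test (an assumed, illustrative test).

    Asks whether the mutant-allele fraction in the fibroblast sample is
    higher than in the negative control, whose mutant reads are taken to
    arise from sequencing error alone.
    """
    p_sample = alt_sample / depth_sample
    p_control = alt_control / depth_control
    pooled = (alt_sample + alt_control) / (depth_sample + depth_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / depth_sample + 1 / depth_control))
    if se == 0:
        # No mutant reads in either library: no evidence of a difference.
        return 1.0
    z = (p_sample - p_control) / se
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical counts at roughly 10^6-fold coverage.
p_signal = rare_variant_pvalue(300, 1_000_000, 100, 1_000_000)
p_noise = rare_variant_pvalue(105, 1_000_000, 100, 1_000_000)
print(f"clear excess over control: p = {p_signal:.3g}")
print(f"within error background:   p = {p_noise:.3g}")
```

With counts like these, a threefold excess of mutant reads over the control is highly significant, whereas a 5% excess is indistinguishable from the error background. The normal approximation is reasonable only when the mutant read counts are not very small; an exact test would be preferable near the detection limit.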
We extended this analysis by asking whether all of the hiPS mutations could have pre-existed in the fibroblast populations. For the 15 mutations not detected with the DigiQ assay, the detection limits can be estimated (Supplementary Methods). The sequencing quality was sufficiently high at 7 of the 15 sites that rare mutations at frequencies of 0.6-5 in 100,000 should be detectable with our assay (Supplementary Table S3). Since 30,000-100,000 fibroblast cells were used in the reprogramming experiments, we can rule out the presence of two mutated genes (NTRK3 and POLR1C) in even one cell of the starting fibroblast population, while five others were present in no more than 1-2 cells.
As another test of the hypothesis that all of the mutations pre-existed in fibroblasts prior to reprogramming, we examined the exomes of two hiPS lines derived from the fibroblast line dH1cf16, which was itself clonally derived from the dH1F fibroblast line and passaged the minimum amount needed to generate enough cells for reprogramming. The two hiPS lines derived from the non-clonal dH1F fibroblast line contained 8 and 3 new mutations, respectively, that were not found in the fibroblasts; we observed a very similar independent mutational load in the clonal lines (6 new mutations in the hiPS line dH1cf16-iPS1 and 2 new mutations in the hiPS line dH1cf16-iPS4). Together, these experiments establish that while some of the reprogramming-associated mutations likely pre-existed in the starting fibroblast cultures, the others occurred during reprogramming and subsequent culture. Specific distributions tend to vary across hiPS lines (Supplementary Table S3).
Mutations occurring during reprogramming could be due in part to a significantly elevated mutation rate during reprogramming. It is also possible that selection plays an important role. We tested the possibility that the mutation rate is elevated because the reprogramming process induces transient repression of p53, RB1, and other tumor suppressor genes, which are known to inhibit reprogramming and are required for normal DNA damage responses.
SV40 Large-T antigen, which inactivates tumor suppressor and DNA damage response genes (including p53 and p105/RB1) [23], was expressed during reprogramming of three analyzed hiPS lines (DF6-9-9, DF19-11, and iPS4.7) [24]. Another hiPS line (FiPS4F-shpRB4.5) was generated while directly knocking down RB1 (Supplementary Figure S5). However, the observed mutational load in these lines was very similar to that in the others, indicating that reprogramming-associated mutations cannot be explained by an elevated mutation rate caused by p53 or RB1 repression.
We also asked whether additional mutations could become fixed during extended passaging. While most of our hiPS lines were sequenced at a fairly low passage number (less than 20), to directly measure the effect of post-reprogramming culture we sequenced one hiPS line (FiPS4F2) at two passages (p9 and p40). All seven mutations identified in the passage 9 line remained fixed in the passage 40 line, and four additional mutations were fixed in the passage 40 line.
To test the possibility that selection is operating during hiPS generation, we performed an enrichment analysis to determine whether reprogramming-associated mutated genes were more likely to be observed in cancer cells than expected for random somatic mutations. We used the COSMIC database as a source of genes commonly mutated in cancer. The reprogramming-associated mutated genes were significantly enriched for genes found mutated in cancer (p=0.0019, Supplementary Materials), which implies that some mutations were selected during reprogramming.
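An enrichment test of this kind can be sketched as a hypergeometric upper-tail probability. The exact test used here is described in the Supplementary Materials and may differ; the sketch below assumes that every gene is equally likely to be mutated by chance, which ignores gene length and local mutation rate. The function name `hypergeom_enrichment_pvalue` and all gene counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genome, n_cancer, n_mutated, n_overlap):
    """P(X >= n_overlap) under a hypergeometric null (assumed model).

    n_genome  : number of genes considered in total
    n_cancer  : genes in the cancer gene list (e.g. from COSMIC)
    n_mutated : genes carrying reprogramming-associated mutations
    n_overlap : mutated genes that are also in the cancer gene list
    """
    total = comb(n_genome, n_mutated)
    upper = min(n_cancer, n_mutated)
    tail = sum(
        comb(n_cancer, k) * comb(n_genome - n_cancer, n_mutated - k)
        for k in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic; converted to a float only at the end.
    return tail / total


# Small case that can be checked by hand: (36 + 4) / 120 = 1/3.
p_small = hypergeom_enrichment_pvalue(10, 4, 3, 2)
print(f"toy example: p = {p_small:.4f}")

# Hypothetical genome-scale counts, for illustration only.
p_large = hypergeom_enrichment_pvalue(20_000, 4_000, 120, 40)
print(f"hypothetical gene counts: p = {p_large:.3g}")
```

A null model weighted by coding sequence length would be a closer match to "random somatic mutation", since longer genes accumulate more mutations by chance.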
As an alternative test of the selection hypothesis, we asked whether mutations associated with reprogramming could be functional based on the nonsynonymous:synonymous (NS:S) ratio. Traditionally, NS:S analysis is applied to germline mutations that have accumulated over a long period of evolutionary time, and it is thus not directly applicable to somatic mutations. However, functional mutations are known to be positively selected in cancers, allowing us to make a direct comparison to mutation characteristics found in cancer genomes. Strikingly, the NS:S ratio is very similar between mutations identified in three recent cancer genome sequencing projects [25,26,27] and the reprogramming-associated mutations we found (2.4:1 and 2.6:1, respectively), indicating that a similar degree of selection pressure may be present.
We also checked whether reprogramming-associated mutations could be providing a common functional advantage, using a pathway enrichment analysis based on Gene Ontology terms [28]. No statistically significant enrichment was identified, indicating that the mutated genes have varied cellular functions. Again, identical results were found when performing the same analysis on mutations identified during the genome sequencing of melanoma, breast cancer, and lung cancer samples [25,26,27]. This lack of enrichment in cancer genomes is generally thought to be due to the presence of many passenger mutations in cancer cells, which could also be true for reprogramming-associated mutations. Nonetheless, these analyses suggest that selection of potentially functional mutations could play a role in amplifying rare mutation-carrying cells and, when coupled with the single-cell bottleneck in hiPS colony picking, could contribute to the fixation of initially low-frequency mutations throughout the entire hiPS cell population.