Defined transcription factors can induce epigenetic reprogramming of adult mammalian cells into induced pluripotent stem cells. Although DNA factors are integrated during some reprogramming methods, it is unknown whether the genome remains unchanged at the single nucleotide level. Here we show that 22 human induced pluripotent stem (hiPS) cell lines reprogrammed using five different methods each contained an average of five protein-coding point mutations in the regions sampled (an estimated six protein coding point mutations per exome). The majority of these mutations were non-synonymous, nonsense, or splice variants, and were enriched in genes mutated or having causative effects in cancers. At least half of these reprogramming-associated mutations pre-existed in fibroblast progenitors at low frequencies, while the rest were newly occurring during or after reprogramming. Thus, hiPS cells acquire genetic modifications in addition to epigenetic modifications. Extensive genetic screening should become a standard procedure to ensure hiPS safety before clinical use.
hiPS cells have the potential to revolutionize personalized medicine by allowing immunocompatible stem cell therapies to be developed1,2. However, questions remain about hiPS safety. For clinical use, hiPS lines must be reprogrammed from cultured adult cells, and could carry a mutational load due to normal in vivo somatic mutation. Furthermore, many hiPS reprogramming methods utilize oncogenes that may increase the mutation rate. Additionally, some hiPS lines have been observed to contain large-scale genomic rearrangements and abnormal karyotypes after reprogramming3. Recent studies also revealed that tumor suppressor genes, including those involved in DNA damage response, have an inhibitory effect on nuclear reprogramming4-9. These findings suggest that the process of reprogramming could lead to an elevated mutational load in hiPS cells.
To probe this issue, we sequenced the majority of the protein-coding exons (exomes) of twenty-two hiPS lines and the nine matched fibroblast lines from which they originated (Table 1). These lines were reprogrammed in seven laboratories using three integrating methods (four-factor retroviral, four-factor lentiviral, and three-factor retroviral) and two non-integrating methods (episomal vector and mRNA delivery into fibroblasts). All hiPS lines were extensively characterized for pluripotency and had normal karyotypes prior to DNA extraction (Supplementary Methods). Protein coding regions in the genome were captured and sequenced from the genomic DNA of hiPS lines and their matched progenitor fibroblast lines using either padlock probes10,11 or in-solution DNA or RNA baits12,13. We searched for single base changes, small insertions/deletions, and alternative splicing variants, and identified 12,000 - 18,000 known and novel variants for each cell line that had sufficient coverage and consensus quality (Table 1).
We identified sites that showed the gain of a new allele in each hiPS line compared with their corresponding matched progenitor fibroblast genome. A total of 124 mutations were validated with capillary sequencing (Figure 1, Table 2, Supplementary Figure S1), which revealed that each mutation was fixed in heterozygous condition in the hiPS lines. No small insertions/deletions were detected. For three hiPS lines (CV-hiPS-B, CV-hiPS-F, PGP1-iPS), the donor’s complete genome sequence obtained from whole blood is publicly available14,15; we used this information to further confirm that all 27 mutations in these lines were bona fide somatic mutations. Because 84% of the expected exomic variants16 were captured at high depth and quality, the predicted load is approximately 6 coding mutations per hiPS genome (see Table 1 for details). The majority of mutations were missense (83/124), nonsense (5/124), or splice variants (4/124). Fifty-three missense mutations were predicted to alter protein function17 (Supplementary Table S1). Fifty mutated genes were previously found to be mutated in some cancers18,19. For example, ATM is a well-characterized tumor suppressor gene found mutated in one hiPS line, while NTRK1 and NTRK3 (tyrosine kinase receptors) can cause cancers when mutated20 and contained damaging mutations in three hiPS lines (CV-hiPS-F, iPS29e, FiPS4F-shpRB4.5) reprogrammed in three labs from different donors. Two NEK kinase genes, a family related to cell division, were mutated in two independent hiPS lines. In addition to cancer-related genes, fourteen of the twenty-two lines contain mutations in genes with known roles in human Mendelian disorders21. Three pairs of hiPS lines (iPS17a and iPS17b, dH1F-iPS8 and dH1F-iPS9, CF-RiPS1.4 and CF-RiPS1.9) shared three, two, and one mutations, respectively; these most likely arose in shared common progenitor cells prior to reprogramming. However, most hiPS lines derived from the same fibroblast line did not share common mutations (Table 2 and Supplementary Table S1).
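The step from the observed average (about five coding mutations per line in the regions sampled) to the predicted exome-wide load (about six) can be illustrated with a short calculation. This is a minimal sketch that assumes the estimate is a simple division of the observed count by the captured fraction; the per-line values actually used are given in Table 1, and the function name is illustrative only.

```python
# Observed average and capture fraction are taken from the text; treating the
# exome-wide estimate as observed / captured_fraction is an assumption.
observed_per_line = 5       # average coding mutations found per hiPS line
captured_fraction = 0.84    # share of expected exomic variants captured


def extrapolate_load(observed, fraction_captured):
    """Scale a count seen in the sampled regions up to the whole exome."""
    if not 0 < fraction_captured <= 1:
        raise ValueError("fraction_captured must be in (0, 1]")
    return observed / fraction_captured


predicted = extrapolate_load(observed_per_line, captured_fraction)
print(f"predicted load: {predicted:.1f} coding mutations per exome")  # 6.0
```

Under this assumption the extrapolated value rounds to the six coding mutations per exome quoted above.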
These data raise the possibility that a significant number of mutations are occurring during or shortly after reprogramming and then become fixed during colony picking and expansion. An alternative hypothesis is that the mutations we found are simply the result of age-accrued biopsy heterogeneity or in vitro fibroblast cell culture. The skin biopsies were collected from donors at ages varying from newborn to 82 years old; biopsy heterogeneity therefore does not appear to play a primary role, as the mutational load is not correlated (R² = 0.046) with donor age (Supplementary Figure S2). We attempted to grow clonal fibroblasts in order to obtain a control for single-cell mutational load, but a direct assessment was not possible due to technical difficulties in mimicking the exact culture conditions (Supplementary Methods). Assuming the skin biopsy is mutation-free, we can use previously published values for the typical mutation rate in culture to obtain an expectation of ten times fewer mutations per genome than we observed (p < 1.27 × 10⁻⁵³; Supplementary Methods), indicating that hiPS mutational load is high compared to normal culture mutational load. We define the term “reprogramming-associated mutations” to describe mutations observed after reprogramming. Reprogramming-associated mutations could be pre-existing at low frequencies in the fibroblast population, occurring during the reprogramming process, or occurring after reprogramming. All reprogramming-associated mutations have become fixed in the hiPS line population.
To test whether some observed mutations were present in the starting fibroblasts at low frequency prior to reprogramming, we developed a new digital quantification assay (DigiQ) to quantify the frequencies of 32 mutations in six fibroblast lines using ultra-deep sequencing (Supplementary Figure S3-4). We amplified each mutated region from the genomic DNA of 100,000 cells with a high-fidelity DNA polymerase and sequenced the pooled amplicons with an Illumina Genome Analyzer at an average coverage of 10⁶. Although the raw sequencing error is roughly 0.1-1% with the Illumina sequencing platform, detection of rare mutations at a lower frequency is possible with proper quality filtering and careful selection of controls22. For each fibroblast line, we included the mutation-carrying hiPS DNA as the positive control and another “mutation-free” DNA sample as the negative control for sequencing errors (Supplementary Methods). Comparison of the allelic counts at the mutation positions between the fibroblast lines and the negative controls allowed us to distinguish rare mutations from sequencing errors, and to estimate the detection limit of the assay. Seventeen of the 32 mutations were found in fibroblasts in a range of 0.3-1000 in ten thousand, while 15 mutations were not detectable (Supplementary Table S2-3). In each fibroblast line with more than one detectable rare mutation, the frequencies of the mutations were very similar, which suggests that a small sub-population of each fibroblast line contained all pre-existing hiPS mutations, while the rest of the cells lacked any of them.
We extended this analysis by asking whether all of the hiPS mutations could have pre-existed in the fibroblast populations. For the 15 mutations not detected with the DigiQ assay, the detection limits can be estimated (Supplementary Methods). The sequencing quality was sufficiently high at 7 of the 15 sites such that rare mutations at frequencies of 0.6-5 in 100,000 should be detectable with our assay (Supplementary Table S3). Since 30,000-100,000 fibroblast cells were used in the reprogramming experiments, we can rule out the presence of two mutated genes (NTRK3 and POLR1C) in even one cell of the starting fibroblast population, while five others were present in no more than 1-2 cells.
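The reasoning that converts a detection limit into a bound on the number of mutant cells can be sketched as follows. The ranges are those quoted above; the function name is hypothetical, and the sketch assumes the detection limit can be read as a per-cell frequency (if it were a per-allele frequency, a heterozygous mutation would double the bound). The site-specific limits behind the stated conclusions are in Supplementary Table S3.

```python
def max_mutant_cells(detection_limit, cells_plated):
    """Upper bound on mutant cells when the frequency lies below the limit.

    Assumes detection_limit is a per-cell frequency (hypothetical reading).
    """
    return detection_limit * cells_plated


# Ranges quoted in the text.
detection_limits = (0.6e-5, 5e-5)   # 0.6-5 in 100,000
cell_counts = (30_000, 100_000)     # fibroblasts used for reprogramming

for limit in detection_limits:
    for cells in cell_counts:
        bound = max_mutant_cells(limit, cells)
        verdict = "excludes even one cell" if bound < 1 else "allows a few cells"
        print(f"limit {limit:.1e}, {cells} cells: "
              f"at most {bound:.2f} mutant cells ({verdict})")
```

A bound below one means that an undetected mutation could not have been carried by even a single cell of the starting population, which is the argument applied to the two excluded genes.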
As another test of the hypothesis that all of the mutations pre-existed in fibroblasts prior to reprogramming, we examined the exomes of two hiPS lines derived from a fibroblast line dH1cf16, which was itself clonally derived from the dH1F fibroblast line and passaged the minimum amount to generate enough cells for reprogramming. The two hiPS lines derived from the non-clonal dH1F fibroblast line contained 8 and 3 new mutations not found in the fibroblasts respectively; we observed a very similar independent mutational load in the clonal lines (6 new mutations in the hiPS line dH1cf16-iPS1 and 2 new mutations in the hiPS line dH1cf16-iPS4). Together, these experiments establish that while some of the reprogramming-associated mutations were likely to pre-exist in the starting fibroblast cultures, the others occurred during reprogramming and subsequent culture. Specific distributions tend to vary across hiPS lines (Supplementary Table S3).
Mutations occurring during reprogramming could be due in part to a significantly elevated mutation rate during reprogramming. It is also possible that selection could play an important role. We tested the possibility that an elevated mutation rate might occur because the reprogramming process might be inducing transient repression of p53, RB1, and other tumor suppressor genes, which are known to inhibit reprogramming and are required for normal DNA damage responses. SV40 Large-T antigen, which inactivates tumor suppressor and DNA damage response genes (including p53 and p105/RB1)23, was expressed during reprogramming of three analyzed hiPS lines (DF6-9-9, DF19-11, and iPS4.7)24. Another hiPS line (FiPS4F-shpRB4.5) was generated while directly knocking down RB1 (Supplementary Figure S5). However, the observed mutational load was very similar in these lines compared to the others, indicating that reprogramming-associated mutations cannot be explained by an elevated mutation rate caused by p53 or RB1 repression.
We also probed if additional mutations could become fixed during extended passaging by extending our analysis of one hiPS line. While most of our hiPS lines were sequenced at fairly low passage number (less than 20), to directly measure the effect of post-reprogramming culture we also sequenced one hiPS line (FiPS4F2) at two passages (p9 and p40). We discovered that all seven mutations identified in the passage 9 line remained fixed in the passage 40 line, but that four additional mutations were found to be fixed in the passage 40 cell line.
To test the possibility that selection is operating during hiPS generation, we performed an enrichment analysis to determine if reprogramming-associated mutated genes were more likely to be observed in cancer cells than random somatic mutation. We used the COSMIC database as a source of genes commonly mutated in cancer. We discovered that the reprogramming-associated mutated genes were significantly enriched for genes found mutated in cancer (p=0.0019, Supplementary Materials), which implies some mutations were selected during reprogramming.
As an alternative test of the selection hypothesis, we asked whether mutations associated with reprogramming could be functional based on the nonsynonymous:synonymous (NS:S) ratio. Traditionally, NS:S analysis is applied to germline mutations that have accumulated over long periods of evolutionary time, and it is therefore not directly applicable to somatic mutations. However, functional mutations are known to be positively selected in cancers, allowing us to make a direct comparison to mutation characteristics found in cancer genomes. Strikingly, the NS:S ratio is very similar between mutations identified in three recent cancer genome sequencing projects25,26,27 and the reprogramming-associated mutations we found (2.4:1 and 2.6:1, respectively), indicating that a similar degree of selection pressure may be present.
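The 2.6:1 figure can be related to the mutation counts reported earlier with a short calculation. This is only a sketch: it assumes, which the text does not state, that the validated mutations not classified as missense, nonsense, or splice variants are all synonymous, and that only missense mutations are counted as nonsynonymous.

```python
# Counts reported for the 124 validated mutations.
validated = 124
missense, nonsense, splice = 83, 5, 4

# Assumption (not stated in the text): all remaining mutations are synonymous.
synonymous = validated - (missense + nonsense + splice)

# Assumption: only missense mutations enter the nonsynonymous count.
ratio = missense / synonymous
print(f"synonymous (assumed): {synonymous}")         # 32
print(f"missense:synonymous = {ratio:.1f}:1")        # 2.6:1
```

Under these assumptions the ratio agrees with the reported value; the exact classification used for the ratio may differ.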
We also checked if reprogramming-associated mutations could be providing a common functional advantage using a pathway enrichment analysis through Gene Ontology terms28. No statistically significant similarity was identified, indicating that mutated genes have varied cellular functions. Again, identical results were found when performing the same analysis on mutations identified during the genome sequencing of melanoma, breast cancer, and lung cancer samples25,26,27. This lack of enrichment in cancer genomes is generally thought to be due to the presence of many passenger mutations in cancer cells, which could also be true for reprogramming-associated mutations. Nonetheless, these analyses suggest that selection of potentially functional mutations could play a role in amplifying rare mutation-carrying cells and, when coupled with the single-cell bottleneck in hiPS colony picking, could contribute to the fixation of initially low-frequency mutations throughout the entire hiPS cell population.
Taken together, our results clearly demonstrate that mutations pre-existing in the fibroblasts and mutations arising during or after reprogramming all contribute to the high mutational load we discovered in hiPS lines. Although we cannot completely rule out the possibility that reprogramming itself is “mutagenic”, our data argue that selection during hiPS reprogramming, colony-picking, and subsequent culture may be contributing factors. A corollary is that, if reprogramming efficiency is improved to a level such that no colony-picking and clonal expansion is necessary, the resulting hiPS cells could potentially be free of mutations.
Despite the power of our experimental approach to accurately identify and characterize reprogramming-associated mutations, their functional significance remains to be shown. This issue parallels a general problem facing the genomics community: high-throughput sequencing technologies have allowed data generation rates to greatly outpace functional interpretation. Additionally, when considering the biological significance of reprogramming-associated mutations, there are two separate functional aspects to consider: whether some of these mutations contributed functionally to the reprogramming of cell fate, and whether some of these mutations could increase disease risk when hiPS-derived cells/tissues are used in the clinic. These two aspects are not necessarily connected. Although the functional effects of the 124 mutations remain to be characterized experimentally, it is nonetheless striking that the observed reprogramming-associated mutational load shares many similarities with that observed in cancer. Furthermore, the observation of mutated genes involved in human Mendelian disorders suggests that the risk for diseases other than cancer needs to be evaluated for hiPS-based therapeutic methods. Future long-term studies must focus on functional characterization of reprogramming-associated mutations in order to further aid the creation of clinical safety standards.
Because safe hiPS cells are critical for clinical application, just as previous findings of large-scale genome rearrangements in hiPS lines led to the introduction of karyotyping as a standard post-reprogramming protocol, routine genetic screening of hiPS lines to ensure that no obviously deleterious point mutations are present must become a standard procedure. Complete exome or genome sequencing of hiPS lines might be an efficient way to screen out hiPS lines that have a high mutational load or that have mutations in genes implicated in development, disease, or tumorigenesis. Further rigorous work on mutation rates and distributions during in vitro culture and reprogramming of hiPS cells, and perhaps human embryonic stem cells, will be essential to help establish clinical safety standards for genomic integrity.
CV-hiPS-F and CV-hiPS-B were reprogrammed from CV Fibroblasts using 4-factor retroviral vectors. PGP1-iPS cells were reprogrammed by Cellular Dynamics using the same four factors in a lentiviral vector from PGP1F fibroblasts29. dH1F-iPS8, dH1F-iPS9, dH1cF16-iPS1, dH1cF16-iPS4, dH1cF16, and dH1F cells were obtained from previous cultures30 reprogrammed with retroviral vectors containing the same factors31. DF-6-9-9, DF-19-11, iPS4.7, and FS cells were obtained from previously existing cultures; the reprogramming process and characterization of lines has been described previously24. iPS11a, iPS11b, iPS17a, iPS17b, iPS29A, iPS29e, Hib11, Hib17, and Hib29 cells were obtained from previous cultures reprogrammed using retroviral vectors encoding three or four factors32. FiPS3F1 and FiPS4F7 were reprogrammed from HFFxF fibroblasts using similar protocols33-35. FiPS4F2 and FiPS4F-shpRB4.5 were reprogrammed using the same 4-factor protocol from IMR90 fibroblasts. The mRNA-derived lines (CF-RiPS1.4, CF-RiPS1.9, and CF Fibroblasts) were obtained from previous cultures36. All hiPS lines were extensively characterized for pluripotency. Fourteen lines were tested for teratoma formation and shown to express all embryonic germ layers in vivo. DNA was extracted from each cell type using Qiagen’s DNeasy kit.
Exome capture was performed with either a library of padlock probes, commercial hybridization capture DNA baits (NimbleGen SeqCap EZ), or RNA baits (Agilent SureSelect), and the resulting libraries were sequenced on an Illumina GA IIx sequencer. Putative mutations were rejected if they were known polymorphisms or contained any minor allele presence in the fibroblast. All candidate mutations were confirmed using capillary Sanger sequencing.
For digital quantification, mutations were PCR-amplified and sequenced using an Illumina GA IIx. These libraries were sequenced to obtain on average one million independent base calls for each location. A binomial test was then used to determine if the observed minor allele frequency could be separated from error and estimate the frequency of each mutation.
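The binomial test described above can be sketched as follows. This is an illustrative version under stated assumptions, not the exact procedure (which is in the Supplementary Materials): the per-base error rate is taken from a negative control, the read counts in the example are hypothetical, and the function names and the significance threshold `alpha` are introduced here for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term (0 < p < 1)."""
    mean = n * p
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if i > mean and term <= total * 1e-17:
            break
    return min(total, 1.0)


def assess_site(alt_reads, depth, error_rate, alpha=1e-3):
    """Test whether the minor-allele count exceeds what errors alone explain.

    alpha is a hypothetical threshold chosen for this sketch.
    """
    p_value = binom_upper_tail(alt_reads, depth, error_rate)
    return {"frequency": alt_reads / depth,
            "p_value": p_value,
            "detected": p_value < alpha}


# Hypothetical counts: one million base calls per site, with the negative
# control showing 100 erroneous minor-allele calls (error rate 1e-4).
depth = 10**6
error_rate = 100 / depth
print(assess_site(300, depth, error_rate))   # well above the error background
print(assess_site(105, depth, error_rate))   # indistinguishable from errors
```

Summing the tail in log space with only the standard library keeps the sketch self-contained; a statistics package would normally be used instead.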
Detailed methods are available in the Supplementary Materials.
We thank J.M. Akey, G.M. Church, S. Ding, J.B. Li and J. Shendure for discussions and suggestions, S. Vassallo for assistance on DNA shearing, G.L. Boulting and S. Ratansirintrawoot for assistance on hiPS cell culturing. This study is supported by NIH R01 HL094963 and UCSD new faculty startup fund to K.Z., a training grant from the California Institute for Regenerative Medicine (TG2-01154), and a CIRM grant (RC1-00116) to L.S.B.G. L.S.B.G. is an Investigator of the Howard Hughes Medical Institute. A.G. is supported by the Focht-Powell Fellowship and a CIRM pre-doctoral Fellowship. Y.H.L. is supported by the A*Star Institute of Medical Biology and Singapore stem cell consortium. Work in the laboratory of J.C.I.B. was supported by grants from MICINN, Sanofi-Aventis and the G. Harold and Leila Y. Mathers and Cellex Foundations.