Genome instability is a potential limitation to the research and therapeutic application of induced pluripotent stem cells (iPSCs). Observed genomic variations reflect the combined activities of DNA damage, cellular DNA damage response (DDR), and selection pressure in culture. To understand the contribution of DDR to the distribution of copy number variations (CNVs) in iPSCs, we mapped CNVs of iPSCs with mutations in the central DDR gene ATM onto genome organization landscapes defined by genome-wide replication timing profiles. We show that following reprogramming the early and late replicating genome is differentially affected by CNVs in ATM-deficient iPSCs relative to wild-type iPSCs. Specifically, the early replicating regions had increased CNV losses during retroviral (RV) reprogramming. This differential CNV distribution was not present after later passage or after episomal reprogramming. Comparison of different reprogramming methods in the setting of defective DDR reveals a unique vulnerability of early replicating open chromatin to RV vectors.
Different types of genetic aberrations, such as point mutations, small-scale copy number variations (CNVs), and large-scale chromosomal changes, have been reported in human embryonic stem cells (hESCs) and iPSCs, collectively referred to as human pluripotent stem cells (hPSCs) [1–12]. The presence of these variations is a considerable concern for therapeutic applications of hPSCs, as evidenced by the recent halt of the first human iPSC clinical study in Japan. While some themes around genomic instability during reprogramming have begun to emerge, our understanding of how genomic aberrations arise in these cells remains limited. Recently, we showed that genome replication timing changes associated with cellular reprogramming shape the CNV landscape in human iPSCs. Replication timing organization is a highly cell-type–specific, spatiotemporally controlled epigenetic property. Replication domains are structural and functional units of the genome with near one-to-one correspondence to topologically associated domains defined by Hi-C chromosome conformation capture. In addition, replication timing clearly influences genomic mutation rates [15,16]. The relationship between mutation rate and spatiotemporal organization of the genome underscores the complexity of genome structure and function, and nuclear reprogramming is a powerful platform for studying that relationship.
Genome stability, DNA replication, and DNA damage response (DDR) are intrinsically linked to higher-order chromatin organization [18,19]. To study the effect of DDR on genome stability during reprogramming and in pluripotency, we set out to investigate genomic aberrations during factor-based reprogramming when the DDR system has been compromised. Comparisons of these DDR-deficient cells with normal cells could reveal additional properties of genomic variations arising during reprogramming. We focused on the gene ATM, in which mutations lead to accumulation of genomic aberrations and result in ataxia telangiectasia (A-T) syndrome. ATM is the central gene involved in DDR and repair. Mutations in ATM result in defective cell cycle checkpoint activation and a reduced capacity for repair of DNA double-strand breaks (DSBs). iPSCs from A-T patients have been generated by multiple laboratories, but the issue of genomic variation has not been comprehensively investigated [20–24]. We analyzed the CNVs of iPSCs derived from A-T patients using a high-resolution single-nucleotide polymorphism (SNP) array and discovered a differential genome-wide distribution relative to replication timing organization, as well as effects of integrating versus nonintegrating reprogramming methods.
Dermal fibroblasts from A-T syndrome patients were obtained from the Coriell Institute for Medical Research Cell Repository (Supplementary Fig. S1; Supplementary Data are available online at www.liebertpub.com/scd). These cells and fibroblasts from a healthy control (hFib2) were cultured and reprogrammed as described. Briefly, 100,000 cells were infected with retroviruses prepared from constructs containing GFP-tagged human OCT4, SOX2, KLF4, and c-MYC cDNA. On day 4, cells were trypsinized and plated to 10-cm dishes with mitotically inactivated mouse embryonic fibroblasts (iMEFs) as feeder cells. The following day, the medium was changed to hESC medium. Medium was replenished daily. At 20–35 days postinfection, colonies with hESC-like morphology and silenced GFP expression were picked and expanded for further analysis.
Human fibroblast lines were cultured in 10% fetal bovine serum and GlutaMAX in Dulbecco's modified Eagle's medium (DMEM; Lonza). Human iPSCs and ESCs were either cultured on iMEFs in DMEM/F12 supplemented with 20% Knockout Serum Replacement (Gibco), 0.1mM β-mercaptoethanol (Gibco), GlutaMAX (Gibco), nonessential amino acids (Gibco), and 10ng/mL FGF2 (Sigma), or in feeder-free conditions on Matrigel in mTeSR medium with 5× supplements (STEMCELL Technologies). Human iPSCs were passaged using 50U/mL type IV collagenase (Gibco) (for feeder culture) or dispase (for feeder-free culture) approximately every 5–7 days.
Cells were treated with 25 or 100ng/mL neocarzinostatin (NCS) or 2 Gy γ-irradiation. At various time points, cells were either fixed for immunostaining or harvested as protein samples for western blotting.
Samples were washed with phosphate-buffered saline (PBS) and fixed in 4% paraformaldehyde for 20min at room temperature. After three washes in PBS, cells were permeabilized in 0.2% Triton X-100 in PBST/3% bovine serum albumin (BSA) for 12min and subsequently washed three times with PBS. After blocking with PBST/3% BSA for 1h, samples were incubated with primary antibodies diluted in PBST/3% BSA overnight at 4°C. The next day, cells were washed three times with PBST and incubated with secondary antibodies—Alexa Fluor 594 anti-goat IgG or Alexa Fluor 488 anti-rabbit IgG (both from Life Technologies) for 1h before further washes. Then samples were stained with DAPI and mounted for imaging with a Nikon Ti fluorescence microscope. Primary antibodies used were Oct4 (19857; Abcam), Nanog (21624; Abcam), SSEA-4 (566218; BD Pharmingen), Tra-1-60 (90232; Chemicon), Tra-1-81 (MAB4381; Millipore), phospho-CHK2 (2661; Cell Signaling Technologies), and phospho-H2A.X (05-636; Millipore).
Cells were lysed with RIPA buffer with protease inhibitors and quantified with Pierce BCA assay reagent. Protein samples were separated through sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to PVDF membranes, which were then blocked with 5% skim milk in TBST and incubated with primary antibodies overnight at 4°C. Primary antibodies were p53 (SC-126; Santa Cruz), β-actin (612656; BD Biosciences), and ATM (No. 2783; Cell Signaling Technology).
RNA was isolated with the RNeasy Kit (Qiagen) according to the manufacturer's instructions. The amount of RNA was quantified using a NanoDrop (NanoDrop Technologies), and RNA quality was checked with an Agilent Bioanalyzer. For microarray analysis, RNA probes were prepared and hybridized to Human Genome U133 Plus 2.0 oligonucleotide microarrays (Affymetrix) as per the manufacturer's instructions. Arrays were processed by the Coriell Institute Genotyping and Microarray Center. Gene expression levels were normalized with the Robust Multichip Average algorithm. Hierarchical clustering was performed, and the heat map generated, using Euclidean distance with the average linkage method. Calculations were performed in R. All microarray data have been deposited with the NCBI Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/) under accession number GSE78716.
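As an illustration of the clustering step only, the following is a minimal Python sketch of agglomerative clustering with Euclidean distance and average linkage. It is not the R code used in this study; the function names, sample names, and expression values are hypothetical, and the naive implementation is meant for small examples.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until only n_clusters remain. profiles maps sample name -> values."""
    clusters = [[name] for name in profiles]

    def mean_distance(pair):
        a, b = pair
        total = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=mean_distance)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return [sorted(cluster) for cluster in clusters]


# Hypothetical normalized expression values (three probes per sample).
toy_profiles = {
    "iPSC_1": [8.0, 2.0, 5.0],
    "iPSC_2": [8.1, 2.2, 5.1],
    "Fib_1": [3.0, 9.0, 1.0],
    "Fib_2": [3.2, 8.8, 1.1],
}
toy_clusters = average_linkage(toy_profiles, n_clusters=2)
```

In this toy input the two hypothetical iPSC samples and the two hypothetical fibroblast samples fall into separate clusters, which is the kind of grouping a heat map dendrogram would display.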
Genomic DNA was purified with the Qiagen Blood and Tissue DNA Kit. Sample handling and hybridization to Affymetrix SNP 6.0 arrays were performed according to the manufacturer's instructions at the Genotyping and Microarray Center of the Coriell Institute for Medical Research. CNV calls were made with the Affymetrix Genotyping Console 4.1.1 algorithm. CNV locations are based on the human genome assembly of March 2006 (hg18). Samples were normalized to the Affymetrix HapMap reference dataset hybridized on the same platform. For CNV calls, regional GC correction, a 10-kb cut-off value, and a minimum of 10 markers were used as analysis configurations. All samples passed quality control requirements. All identified CNVs were included, except for CNVs spanning centromeric regions, where the average marker spacing is too large (>40kb). Perl and Python were used for in silico data analysis and CNV data parsing. R was used for statistical analysis and P-value calculations. CNV calls were experimentally validated using quantitative polymerase chain reaction and a computational approach based on replication timing array data, as shown previously. The validation rate of iPSC-manifested novel CNVs was about 90%. All SNP array data have been deposited in the NCBI Gene Expression Omnibus under accession number GSE78715.
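The size, marker, and centromere criteria above can be expressed as a simple post hoc filter. The sketch below is illustrative only and is not the Genotyping Console configuration itself: the `CnvCall` record, the function names, and all coordinates are hypothetical, and "spanning" a centromeric region is assumed here to mean any overlap with it.

```python
from dataclasses import dataclass

MIN_SIZE_BP = 10_000  # 10-kb size cut-off stated in the text
MIN_MARKERS = 10      # minimum number of array markers stated in the text


@dataclass(frozen=True)
class CnvCall:
    chrom: str
    start: int    # start coordinate in bp
    end: int      # end coordinate in bp
    markers: int  # number of array markers supporting the call
    kind: str     # "gain" or "loss"


def overlaps(call, region):
    """True if the call overlaps a (chrom, start, end) region."""
    chrom, region_start, region_end = region
    return call.chrom == chrom and call.start < region_end and call.end > region_start


def filter_calls(calls, centromeres):
    """Keep calls that pass the size and marker thresholds and do not
    overlap any centromeric region (assumed meaning of 'spanning')."""
    kept = []
    for call in calls:
        if call.end - call.start < MIN_SIZE_BP:
            continue
        if call.markers < MIN_MARKERS:
            continue
        if any(overlaps(call, region) for region in centromeres):
            continue
        kept.append(call)
    return kept


# Hypothetical coordinates for illustration; not real hg18 centromere positions.
toy_centromeres = [("chr1", 1_000_000, 2_000_000)]
toy_calls = [
    CnvCall("chr1", 100_000, 150_000, 25, "loss"),    # passes all criteria
    CnvCall("chr1", 300_000, 305_000, 12, "gain"),    # shorter than 10 kb
    CnvCall("chr2", 400_000, 450_000, 6, "loss"),     # fewer than 10 markers
    CnvCall("chr1", 950_000, 1_100_000, 40, "gain"),  # overlaps the toy centromere
]
kept_calls = filter_calls(toy_calls, toy_centromeres)
```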
Asynchronously cycling iPSCs or fibroblasts were pulse labeled with the nucleotide analog 5-bromo-2-deoxyuridine (BrdU) and sorted into S-phase fractions on the basis of DNA content using flow cytometry. BrdU-labeled DNA from each fraction was immunoprecipitated, amplified, differentially labeled, and cohybridized to a whole-genome comparative genomic hybridization microarray (Roche Nimblegen 6X630k arrays). Data were analyzed as in Lu et al. and deposited in the NCBI Gene Expression Omnibus under accession number GSE78717.
Primary dermal fibroblasts from two A-T patients (Supplementary Fig. S1A) were reprogrammed using retroviral (RV) transduction of the standard Yamanaka factors, as we previously reported. The disease cells were obtained from the Coriell Institute for Medical Research Repository. The exact mutations in the two A-T fibroblast lines (ATF1 and ATF2) are not clear due to the large size of the ATM gene, but both patients are clinically affected. All iPSC lines expressed pluripotency markers (Supplementary Fig. S1B) and yielded teratomas composed of tissues of all three germ layers (Supplementary Fig. S1C).
To confirm that A-T disease iPSCs (A-T iPSCs) retained the ATM deficiency phenotype, DSBs were induced using two different approaches: NCS and γ-irradiation. Immunofluorescence and western blotting confirmed that in ATM-deficient fibroblasts and the resultant iPSCs, CHK2 phosphorylation is dramatically reduced along with less significant upregulation of p53 (Fig. 1 and Supplementary Fig. S1D). Collectively, these data demonstrate that the A-T iPSCs maintain their original ATM deficiency phenotype.
iPSC lines, analyzed at various passages, have a global gene expression pattern indistinguishable from that of a well-characterized hESC line, CHB-8 (Supplementary Fig. S2). We were not able to detect any consistently up- or downregulated genes or pathways in the pluripotent state between A-T iPSCs and iPSCs generated from healthy donor fibroblasts (WT iPSCs). These data show that ATM deficiency does not affect global gene expression in the pluripotent state, similar to previous reports [21–24].
Because ATM is involved in cell cycle checkpoints, we examined the cell cycle distribution of A-T iPSCs, WT iPSCs, and WT hESC lines. We did not observe any obvious difference in the cell cycle profiles among these cell lines (Supplementary Fig. S1E). Under standard culture conditions, these cells maintained a similar passage ratio every 6–7 days and could be maintained in culture for over 50 passages.
We identified CNVs by high-resolution SNP array on six RV-reprogrammed A-T iPSC lines and their parental fibroblasts (Supplementary Tables S1 and S2), including low-passage (immediately after reprogramming, passage 5–8) and high-passage (passage 19–23) iPSCs. We also expanded two previously generated WT iPSC lines to similar passages and analyzed their CNVs (Supplementary Tables S1 and S2).
Similar to what we reported previously, the majority of CNVs in iPSCs were novel, that is, not detected in parental fibroblasts. Novel CNVs detected only in low-passage cells most likely reflect genomic aberrations that arise or are selected for during the reprogramming process, whereas novel CNVs detected only in high-passage cells reflect the result of prolonged culture in the pluripotent state. An initial characterization of the CNVs demonstrated substantial variability in the number of CNVs across the various lines and passages (Fig. 2A and Supplementary Table S1), consistent with what has been previously reported in similar analyses [4–6,26]. Only a minority of iPSC CNVs were also present in parental fibroblasts (Fig. 2B). Of the novel CNVs detected in low-passage iPSCs, most (~70%) were unique and no longer detected at high passage (Fig. 2B; 73.8%±20.9% for WT iPSCs; 69.1%±21.4% for A-T iPSCs). However, compared to WT iPSCs, A-T iPSCs tended to have a higher proportion of unique novel CNV losses at higher passage (Fig. 2B; 57.4%±16.6% for WT and 81.5%±11.5% for A-T). These data demonstrate that CNVs that appear immediately following reprogramming are highly dynamic and that more de novo CNVs accumulate in A-T iPSCs during prolonged pluripotent culture, likely due to the DDR defects.
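The classification of CNVs as novel (absent from parental fibroblasts) and as unique to low passage (absent at high passage) can be sketched as follows. The text does not state how CNVs were matched between samples, so the 50% overlap criterion used here is an assumption, as are the function names and all coordinates.

```python
def overlap_fraction(a, b):
    """Shared length divided by the longer of two CNVs.
    Each CNV is a (chrom, start, end, kind) tuple; CNVs on different
    chromosomes or of different kinds never match."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return shared / max(a[2] - a[1], b[2] - b[1])


def is_present(cnv, others, threshold=0.5):
    """Assumed matching rule: at least `threshold` overlap with any call."""
    return any(overlap_fraction(cnv, other) >= threshold for other in others)


def unique_novel_fraction(low_passage, high_passage, fibroblast):
    """Fraction of novel low-passage CNVs that are not seen at high passage."""
    novel = [c for c in low_passage if not is_present(c, fibroblast)]
    unique = [c for c in novel if not is_present(c, high_passage)]
    return len(unique) / len(novel) if novel else 0.0


# Hypothetical calls for illustration; not data from this study.
toy_fibroblast = [("chr1", 100_000, 200_000, "loss")]
toy_low = [
    ("chr1", 100_000, 200_000, "loss"),  # inherited from the fibroblast
    ("chr2", 500_000, 560_000, "gain"),  # novel, retained at high passage
    ("chr3", 10_000, 40_000, "loss"),    # novel, unique to low passage
    ("chr5", 0, 30_000, "loss"),         # novel, unique to low passage
]
toy_high = [("chr2", 505_000, 560_000, "gain")]
toy_fraction = unique_novel_fraction(toy_low, toy_high, toy_fibroblast)
```

With these toy calls, two of the three novel low-passage CNVs are no longer detected at high passage.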
We failed to detect any consistent CNV losses or gains, but we did detect novel CNV gains in multiple A-T iPSC lines in the immunoglobulin (Ig) regions, in particular the IgLV locus on chr 22, which was amplified in two of the three A-T iPSC lines from ATF1 after reprogramming (Supplementary Fig. S2B). Other Ig loci (IgKV locus on chr 2 and IgH locus on chr 14) were also affected in some lines, although these were inherited from parental fibroblasts. No genomic level studies have been done in A-T patients to suggest that mutations in this locus could be linked to ATM deficiency. However, the finding that Ig loci are CNV hotspots in A-T iPSCs, but not in WT iPSCs, could potentially be connected with the immunological dysfunction seen in A-T patients.
WT and ATM-deficient cells undergoing reprogramming will experience a similar degree of genomic stress due to replicative stress, metabolic changes, and reactive oxygen species production associated with reprogramming. Without functional ATM, some of the damage associated with these stresses will not be properly repaired in A-T iPSCs, and comparison to WT will reveal the influence of ATM-mediated DDR on the genome aberration landscape. We generated A-T iPSC-specific replication timing profiles from a representative iPSC line of each A-T patient. Because the two A-T iPSC replication timing profiles are essentially identical (Pearson correlation 0.94), we combined them into one profile for further analyses (Fig. 3A). Of note, the replication timing profile of A-T iPSCs was also similar to that of WT iPSCs (Pearson correlation 0.87) (Supplementary Fig. S3), consistent with a recent report, where ATM knockdown did not change genome-wide chromatin interaction patterns in mouse pro-B cells.
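For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation between two replication timing profiles sampled at the same genomic positions and then merges them. The text does not say how the two profiles were combined, so the position-wise mean is an assumption; the function names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def combine_profiles(x, y):
    """Assumed combination rule: position-wise mean of the two profiles."""
    return [(a + b) / 2 for a, b in zip(x, y)]


# Hypothetical replication timing values at five shared positions.
profile_patient_1 = [1.2, 0.8, -0.5, -1.1, 0.3]
profile_patient_2 = [1.0, 0.9, -0.6, -1.3, 0.1]

r = pearson(profile_patient_1, profile_patient_2)
combined_profile = combine_profiles(profile_patient_1, profile_patient_2)
```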
Mapping novel CNVs of A-T iPSCs to this replication timing profile, we found that CNVs are contained within replication domains or constant timing regions (CTRs), which correspond to several tandem replication domains that replicate at the same time (Fig. 3B). We also did not find an enrichment of CNVs in timing transition regions of iPSCs or fibroblasts (Fig. 3C), which are thought to be particularly vulnerable to DNA damage, possibly due to the lack of active replication origins [31,32]. These data are similar to what we found with CNVs in WT iPSCs following reprogramming and demonstrate that the distribution of CNVs is related to higher-order replication domain organization.
Next, we analyzed novel CNV distribution following reprogramming as a function of replication timing in A-T iPSCs. At low passage, following reprogramming, WT and A-T iPSCs had similar numbers of gains, but in A-T iPSCs we found a slight predominance of gains in the late replicating domains (Fig. 4A, orange), in contrast to WT, where gains were enriched in early replicating domains (Fig. 4A, purple). A-T iPSCs had a much higher number of losses than WT, and these were distributed equally across early and late replicating domains (Fig. 4B, orange). This finding is also in contrast to WT, where CNV losses were strongly enriched in late replicating domains (Fig. 4B, purple). To exclude a potential contribution of the different genetic backgrounds (instead of ATM deficiency), we separately analyzed CNVs from lines derived from each A-T patient and found similar trends in both (Supplementary Fig. S4A, B). Overall, A-T iPSCs had a higher load of CNVs in the early replicating genome than WT cells. These results suggest a higher vulnerability of early replication domains to genomic stress during reprogramming.
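One way to quantify such a difference in early versus late distribution is sketched below. This is an illustration under stated assumptions and not the analysis performed in this study: the sign convention (positive timing value means early replicating), the choice of a two-sided Fisher's exact test, the function names, and all numbers are hypothetical.

```python
from math import comb


def timing_class(timing_value):
    """Assumed convention: positive values are early replicating, others late."""
    return "early" if timing_value > 0 else "late"


def count_by_timing(cnv_timing_values):
    """Count CNVs falling in early and late replicating regions."""
    counts = {"early": 0, "late": 0}
    for value in cnv_timing_values:
        counts[timing_class(value)] += 1
    return counts


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)  # tolerance for float rounding
    )


# Hypothetical timing values at CNV loss positions; not data from this study.
wt_loss_timing = [-1.2, -0.8, -0.5, 0.4]
at_loss_timing = [0.9, 0.6, -0.7, 1.1, -0.3, 0.2]

wt_counts = count_by_timing(wt_loss_timing)
at_counts = count_by_timing(at_loss_timing)
p_value = fisher_exact_two_sided(
    wt_counts["early"], wt_counts["late"], at_counts["early"], at_counts["late"]
)
```

With so few hypothetical CNVs the test has little power; the sketch is meant only to show how counts per timing class feed into a contingency test.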
To test whether genomic stress during reprogramming is related to the integrative nature of the RV vectors, we generated additional A-T (from the fibroblasts of one of the patients) and WT iPSC lines using episomal (EP) vectors and profiled CNVs in low-passage cells (Supplementary Tables S1 and S2). The CNV distribution of low-passage EP A-T iPSCs was similar to that of EP WT iPSCs (Fig. 4C, D), and both were similar to RV WT iPSCs. Therefore, the differential distribution of CNVs in low-passage A-T iPSCs is only seen during RV reprogramming (Fig. 4A, B), suggesting that integrative vector-mediated reprogramming causes more genomic stress in early replicating regions, a trend that is only uncovered in the absence of normal DDR.
To better understand how replicative stress and the pluripotent genome architecture interact to shape the CNV landscape during regular maintenance culture, we compared CNVs detected in low- and high-passage WT and A-T iPSCs. While A-T iPSCs accrued more CNVs during continued culture, especially losses, at higher passage, both A-T and WT iPSCs had a similar distribution of gains and losses relative to low-passage WT iPSCs (Supplementary Fig. S4C, D): gains enriching in early replicating regions and losses enriching in late replicating regions. The increased vulnerability of early replicating genome to CNVs (Fig. 4A, B) was not seen at the higher passage. These data further support that early replicating regions of the genome are uniquely susceptible to CNVs caused by genomic stress during reprogramming.
Various sources of genomic stress, DDR pathways, cellular proliferation, and selective pressure all converge to generate the CNV landscape in cells. These processes operate on the platform of precise, spatiotemporally regulated, cell-type–specific higher-order chromatin organization. Studying the dynamics of these processes during reprogramming and pluripotent culture is inherently difficult because they are so intertwined. By tracking CNV distribution as a function of replication timing, passage, and reprogramming method in the setting of normal (WT) and impaired (A-T) DDR, we discovered important insights about the origin of genomic variations in iPSCs. Our previous study on CNVs in WT iPSCs identified patterns of genomic instability relative to genome organization and reorganization during reprogramming. In this study, we show that genomic stress associated with integrative vector-based reprogramming differentially affects early replicating regions during the reprogramming process. This genomic stress is revealed when ATM function is impaired and is distinct from the genomic stress associated with maintenance culture of iPSCs.
Because DNA DSBs contribute to CNVs and A-T iPSCs have less efficient DSB repair, the distribution of CNVs in these cells reveals the landscape of DSB lesions before repair. Moreover, CNVs detected at low passage and not present in parental fibroblasts are considered novel and likely arose during reprogramming. Our analyses demonstrate that novel CNV gains and losses are differentially distributed when ATM-mediated repair is impaired during reprogramming. Specifically, the early replicating genome is more susceptible to CNV losses (Fig. 4B). Moreover, by comparing CNVs in low and high passage to the parental fibroblasts, we gain insight into the role of DSB repair in generating CNVs during reprogramming compared to pluripotent culture. In higher passage A-T iPSCs, where novel CNVs reflect genomic instability due to pluripotent culture rather than reprogramming, CNV distributions are similar to those of WT iPSCs. These findings suggest that the propensity for DSBs to generate CNVs during reprogramming differs from that during pluripotent culture.
Finally, we compared sister iPSCs reprogrammed by different methods of delivering the reprogramming factors. By removing ATM-mediated DDR, we revealed that early replicating/open chromatin domains are more vulnerable to genomic stress in retrovirus-mediated reprogramming compared to nonintegrating EP factors. RV integration happens during or after DNA replication [33–35] and could preferentially target early replicating or open chromatin regions, although a genome-wide analysis of RV integration sites relative to higher-order chromatin organization has not been done. Although not tested here, nonrandom genomic stress could be associated with lentiviral vectors and piggyBac transposons and, therefore, the genomic aberration effects of different reprogramming methods remain an important safety consideration for iPSCs. Interestingly, a recent report also showed reprogramming method-related differences in genetic variants. Fortunately, the use of integrative vectors for reprogramming has become less popular as other delivery methods such as Sendai viral vectors and mRNAs have become more efficient and readily available. However, the reprogramming factor delivery method is not the only source of genomic stress. Replicative stress during nuclear reprogramming of cells contributes to genomic instability, as shown in a drug-inducible system.
A significant portion of the genome changes replication timing as a result of nuclear reprogramming. We have previously reported that the way the genome's replication timing changes during reprogramming influences CNV distribution. Specifically, CNV gains accumulate in regions of the genome that change to earlier replication during the reprogramming process. Moreover, this differential distribution of CNVs occurs irrespective of the reprogramming method used. Interestingly, this finding holds true for A-T iPSCs: in both RV and EP reprogramming, CNV gains were enriched in the genome that changes to earlier replication during reprogramming (Supplementary Fig. S5). These results provide further support that the increased CNV losses found in the early replicating genome in RV A-T iPSCs, but not in EP A-T iPSCs, reflect unique genomic stress associated with RV reprogramming that is uncovered by ATM deficiency.
Our finding that iPSC CNV gains and losses are differentially distributed in open and closed chromatin provides further support for differential repair efficiency of the DDR pathways in different chromatin compartments [37,38]. ATM is more efficient at repairing DSBs, especially losses, in euchromatin (in general early replicating) than in heterochromatin (usually late replicating). Therefore, during RV reprogramming of WT cells, most losses accumulate in late replicating regions, whereas gains accumulate in early regions. However, when ATM-mediated repair is not functioning, these differences are eliminated and the result is a more even distribution of gains and losses across the early and late replicating regions, as demonstrated in low-passage A-T iPSCs.
The experiments involving ATM deficiency illustrate how our understanding of the complex interactions between the DDR and pluripotency network can be informed by surveying the mutational landscape in the context of genome spatial organization. Interestingly, Supek and Lehner recently reported a similar relationship between DNA mismatch repair genes and differential mutation rates across the human genome. Additional studies with more targeted interrogation of other DDR network factors are warranted to better understand how these processes interact with nuclear architecture and cell fate change to result in genomic aberrations and how these aberrations impact the utility of iPSCs as platforms to study disease and develop new therapies.
The authors thank Jeanne Carroll and Ruggero Spadafora for critical reading of the article and Norman Gerry at the Coriell Institute for technical assistance. This work was supported by the NICHD grant R00HD061981 to P.H.L. H.L. is supported by Mayo Clinic Center for Individualized Medicine and Mayo Clinic Center for Regenerative Medicine. D.M.G. is supported by R21CA161666 and P01GM085354.
No competing financial interests exist.