Derivation of induced pluripotent stem (iPS) cells requires the expression of defined transcription factors (among Oct3/4, Sox2, Klf4, c-Myc, Nanog and Lin28) in the targeted cells. Lentiviral or standard retroviral gene transfer remains the most robust and commonly used approach. The low overall reprogramming frequency and the higher efficiency of derivation with integrating vectors compared to more recent non-viral approaches suggest that gene activation or disruption via proviral integration sites (IS) may play a role in obtaining the pluripotent phenotype. We provide for the first time an extensive analysis of the lentiviral integration profile in human iPS cells. We identified a total of 78 independent IS in 8 recently established iPS cell lines derived from either human fetal fibroblasts or newborn foreskin fibroblasts after lentiviral gene transfer of Oct4, Sox2, Nanog, and Lin28. The number of IS ranged from 5 to 15 per individual iPS clone, and 75 IS could be assigned to a unique chromosomal location. The different iPS clones had no IS in common. Expression analysis as well as extensive bioinformatic analysis did not reveal functional concordance of the lentiviral targeted genes between the different clones. Interestingly, in 6 of the 8 iPS clones some of the IS were found in pairs, integrated into the same chromosomal location within six base pairs of each other or in very close proximity. Our study supports recent reports that efficient reprogramming of human somatic cells is not dependent on insertional activation or deactivation of specific genes or gene classes.
The reprogramming of murine fibroblasts into “induced pluripotent stem (iPS) cells” was first achieved by Takahashi and Yamanaka via ectopic retroviral expression of defined transcription factors (Oct4, Sox2, Klf4, and c-Myc) known to be required for normal embryogenesis and to be highly expressed in embryonic stem cells. In an independent screen, Yu et al. identified the transcription factors Oct4, Sox2, Nanog, and Lin28 as sufficient to reprogram human somatic cells into iPS cells. Both approaches were soon adapted to various other cell types, to cells from different species, and to somatic cells from patients [3–10]. The transgenes were introduced into target cells using replication-deficient murine retroviruses or lentiviruses. Although the reprogramming procedure is quite robust and has been replicated in many laboratories, the overall efficiency of obtaining fully functional iPS cells is very low; for instance, only 0.02% of transfected cells become iPS cells after retroviral gene transfer, indicating that other factors besides the expression of the ectopic transgenes likely play a role in this process. Several studies indicate that the differentiation status of the starting cell population has an impact on the reprogramming efficiency, but even when homogeneous undifferentiated murine neural stem cells were used as targets the efficiency remained below 3.6%.
The low efficiency of iPS derivation suggests the worrisome possibility that viral vector integration into the genome activates cooperating endogenous proto-oncogenes, and that particular loci must be turned on in this manner to achieve efficient reprogramming. The ability to use episomal vectors for iPS derivation and a recent report showing that protein delivery can induce reprogramming of mouse embryonic fibroblasts suggest that insertional mutagenesis is not required for iPS generation, and are encouraging milestones towards the generation of iPS cells without genetic modification. Due to the higher efficiency and reproducibility of integrating viral gene transfer protocols for reprogramming, this method remains the most common for the derivation of iPS cells. Yu et al. recently optimized the lentiviral-based reprogramming strategy and were able to increase the efficiency almost 100-fold to 1% in human foreskin fibroblasts. Therefore, the issue of whether insertional activation contributes to iPS generation utilizing this system remains, and also affects the assessment of the tumorigenic capacity of cells derived from iPS clones and the use of these vectors to create iPS cells for research applications.
Integrating retroviruses have long been known to be potentially oncogenic via activation of adjacent proto-oncogenes by strong proviral promoter/enhancers. Wild-type replicating viruses were assumed to result in tumors due to repeated random insertions into the genome, eventually resulting in activation of an adjacent gene. Replication-incompetent vectors were presumed to be much less likely to activate an oncogenic gene or genes, given the limited number of insertions in target cells. However, more recently it has been demonstrated that insertion patterns are far from random, with enhanced integration within actively expressed genes and surrounding transcription start sites for the Moloney murine leukemia virus (MLV), whereas insertions of lentiviruses such as the human immunodeficiency virus (HIV) or simian immunodeficiency virus (SIV) are overrepresented in expressed transcriptional units [15–18]. Insertional activation of adjacent genes has been shown to result in biologic effects including immortalization, clonal dominance and malignant transformation both in vitro and in vivo, including in several landmark clinical trials targeting hematopoietic stem cells, a cell type that appears to be particularly susceptible [19–23].
In the present study we investigated the distribution of IS in 8 human iPS cell lines derived from fetal fibroblasts (IMR90) or from newborn foreskin fibroblasts (FS). These human iPS clones were obtained via transduction with lentiviral vectors expressing the Oct4, Sox2, Nanog, and Lin28 transcription factors. We studied the general distribution of integration sites across clones and analyzed whether integration events perturbed expression of interrupted or nearby genes. We also compared integration sites between clones, asking whether specific loci were more likely to be targeted in iPS clones and thus possibly facilitate derivation of cells with primitive characteristics.
The human iPS cell lines IPS(IMR-90)1–4, IPS(FS) 1–4, IMR90 cells (Cat# CCL-186™, ATCC, Manassas, VA) and human newborn foreskin fibroblasts (FS, Cat# CRL-2097™, ATCC) were cultured as previously described. The iPS clones were derived from either IMR90 or FS cells after lentiviral gene transfer (backbone: pSin4-EF2-IRES-Pur) of the human cDNA coding for either OCT4, Sox2, Lin28, or Nanog. Detailed vector maps and sequences can be found on www.addgene.org. Genomic DNA was isolated using the DNeasy DNA Purification Kit according to the manufacturer’s instructions (QIAGEN, Valencia, CA).
Linear amplification-mediated PCR (LAM-PCR) was performed as previously described with the primers and linker cassettes shown in Table S3. Three hundred nanograms of genomic DNA were linearly amplified using an HIV-3′-LTR–specific 5′-biotinylated primer. After second-strand synthesis by random priming, the DNA was digested with either ApoI or TasI and ligated to a linker cassette. Nested PCR was performed using HIV-3′-LTR–specific and linker-specific primers. The amplicons were purified from 2.5% low melting point agarose gels (NuSieve GTG, Cambrex, IA) using the QIAGEN QIAquick Gel Extraction Kit (Valencia, CA) and cloned into the pCR4TOPO vector (Invitrogen, Carlsbad, CA) for sequencing with M13 primers using an ABI Prism Genetic Analyzer (Applied Biosystems, Foster City, CA). Sequences were analyzed with the DNASTAR SeqManII software (Madison, Wisconsin), scanning for the pCR4TOPO vector and assembling sequences with lengths greater than 100 base pairs, a minimum match size of 50 bp, and a percent match requirement of 95%. The trimmed sequences were aligned to Build 36 of the human genome (hg18) using the BLAT server (http://genome.ucsc.edu/) and a local copy of BLAST.
All PCR analyses were performed on 100 ng of RNA-free genomic DNA. Each integration site-specific PCR was performed using one primer within the 3'LTR and a second primer within the flanking genomic sequence. Primers used are listed in Table S3. After separation on 2.5% low melting temperature agarose gels (NuSieve GTG, Cambrex, IA), the amplicons were purified using the QIAquick Gel Extraction Kit (QIAGEN, Valencia, CA) and cloned into the pCR4TOPO vector (Invitrogen, Carlsbad, CA) for sequencing with M13 primers (Table S3).
For Southern blot analysis, 10 µg of genomic DNA was digested with EcoRI and separated on a 0.75% agarose gel. EcoRI cuts once within the vector genome. Following transfer to a nylon membrane, the DNA fragments were hybridized with a radiolabeled puromycin cDNA probe generated by PCR from the original pSin EF2 vectors (primers listed in Table S3). The labeling reaction was performed according to the manufacturer’s recommendations (Amersham Ready-To-Go™ DNA Labelling Beads, GE Healthcare).
The previously published Nimblegen expression data on the iPS cell lines was downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL5876). Relative expression levels for each gene interrupted by lentiviral insertions were determined by comparison to all other iPS-clones combined (for instance the data set of gene expression for genes interrupted by vector insertions for IPS(IMR90)-1 was compared to the data set of IPS(IMR90)-2 to 4 and IPS(FS)-1 to 4). A fold ratio and a p-value were calculated for each comparison and p-values less than 0.05 were considered as significant.
We compared the IS distribution in the iPS cells to in silico generated control sets of insertions in the human genome. Briefly, 10,000 sets of 75 random IS were designed in silico as follows. A TasI or ApoI site in the genome was selected at random using a random number generator. The in silico IS was placed either upstream or downstream (p=0.5) of the site, at a distance matching the size of one of the sequences obtained experimentally. The in silico IS was validated only when a BLAST alignment of the genomic sequence between the restriction site and the IS returned a unique sequence in the genome. This operation was repeated 75 times to obtain a single matching random dataset, and then again for a total of 10,000 datasets of 75 IS each. These control datasets were subjected to the same analyses as the experimental datasets, and the results were used to generate empirical p-values. p-values of less than 0.05 were considered significant.
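The way an empirical p-value can be read off such control datasets is sketched below. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the toy genome length, the single toy feature interval, and the function names are hypothetical, and the sketch places control sites uniformly at random rather than reproducing the restriction-site placement and BLAST uniqueness check described above.

```python
import random

def empirical_p_value(observed, control_values, tail="greater"):
    """Fraction of control datasets whose statistic is at least as extreme as the observed one."""
    if tail == "greater":
        extreme = sum(1 for v in control_values if v >= observed)
    elif tail == "less":
        extreme = sum(1 for v in control_values if v <= observed)
    else:
        raise ValueError("tail must be 'greater' or 'less'")
    return extreme / len(control_values)

def fraction_in_features(positions, features):
    """Fraction of positions falling inside any half-open (start, end) interval."""
    inside = sum(1 for pos in positions if any(s <= pos < e for s, e in features))
    return inside / len(positions)

def random_control_fractions(n_sets, n_sites, genome_length, features, seed=0):
    """Build n_sets control datasets of n_sites uniform random positions each
    (a simplification of the restriction-site based placement in the text)."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_sets):
        positions = [rng.randrange(genome_length) for _ in range(n_sites)]
        fractions.append(fraction_in_features(positions, features))
    return fractions

# Hypothetical toy genome in which the feature of interest covers 30% of all positions.
TOY_GENOME_LENGTH = 1_000_000
TOY_FEATURES = [(0, 300_000)]

controls = random_control_fractions(10_000, 75, TOY_GENOME_LENGTH, TOY_FEATURES)
observed = 53 / 75  # e.g. the fraction of experimental IS inside the feature
p_value = empirical_p_value(observed, controls)
print(f"empirical p = {p_value:.4f}")
```

With 10,000 control datasets the smallest non-zero p-value this estimator can report is 0.0001, which is one reason the number of control sets matters.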
Two gene lists were generated based on the occurrence of integration sites. The first list comprised all genes with integration sites within the coding regions. The second list took into account all genes having an integration site within a 30 kb window. The gene lists were analyzed for enrichment of functional pathways using MetaCore™ (GeneGo Inc., St Joseph, MI) and Ingenuity Pathway Analysis (IPA, Ingenuity Systems, Redwood City, CA). A hypergeometric test was performed to test for equality of the observed proportion of genes mapped to a particular pathway between the gene lists and the reference set. Type I error was controlled by using false discovery rate correction for multiple testing (FDR=0.01).
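The enrichment statistics described here can be illustrated with a short sketch. It assumes a one-sided hypergeometric test and the Benjamini-Hochberg procedure for the FDR correction; the source names neither the sidedness nor the specific FDR procedure, and the commercial tools may compute these differently. All counts in the example are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a reference
    set of N genes, K of which belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 53 tagged genes, reference set of 20,000 genes,
# three pathways with (pathway size, tagged genes in pathway).
pathways = [(200, 2), (50, 1), (1000, 4)]
raw = [hypergeom_sf(k, 20_000, K, 53) for K, k in pathways]
adj = benjamini_hochberg(raw)
significant = [q < 0.01 for q in adj]
print(raw, adj, significant)
```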
Genomic DNA samples from previously established iPS cell clones were used for analysis of integration sites. Four iPS clones were derived from IMR90 fetal fibroblasts (IPS(IMR90)-1 to IPS(IMR90)-4) and four iPS clones were derived from foreskin fibroblasts (IPS(FS)-1 to IPS(FS)-4). The Oct4, Sox2, Nanog, and Lin28 transcription factors were transferred to the cells via a standard third-generation lentiviral vector. These vectors have much of the viral LTR promoter/enhancer region deleted, but still contain a strong internal promoter to drive transgene expression and residual LTR enhancer elements. All clones were tested in a comparative manner for their ES cell-like phenotype, which included telomerase activity, cell surface markers, and genes characterizing human ES cells. They also maintained the developmental potential to differentiate into derivatives of all three primary germ layers. Recently, the clones IPS(IMR90)-4 and IPS(FS)-1 were successfully differentiated in vitro into functional cardiomyocytes.
To identify the lentiviral integration sites we performed linear amplification-mediated PCR (LAM-PCR) on genomic DNA from all 8 iPS clones, followed by shotgun cloning and sequencing. Valid sequences were mapped to the human genome (Build 36, hg18). In order to identify the complete insertion profile and to minimize restriction enzyme bias, we analyzed all samples separately with two different restriction enzymes (ApoI and TasI). Furthermore, Southern blot analysis (Figure 1) of each individual clone roughly confirmed the number of IS identified by LAM-PCR, indicating that the vast majority of IS present in each clone had been identified.
Using this approach, we were able to identify a total of 78 valid IS, ranging from 5 to 15 IS per individual iPS clone (Table 1). An IS was considered valid only if the sequence obtained from the LAM-PCR was correctly juxtaposed to the vector long terminal repeat (LTR) and yielded a best BLAT hit with at least 95% identity over 95% of the length of the sequence (University of California, Santa Cruz [UCSC] Genome Browser, http://genome.ucsc.edu; hg18). Seventy-five of the 78 valid IS could be assigned to a unique chromosomal location. Fifty-three out of 75 (70.7%) could be mapped within transcription units (TU) of RefSeq mRNAs, 69.3% in exons and 1.3% in introns (RefSeq Genes track in the UCSC Genome Browser, Table 1 and Table 2). This IS distribution was significantly different from that of the 10,000 in silico-generated random IS datasets. The profile, with an insertional preference for TU, was very similar to previous IS analyses for HIV-based vector systems used to transduce other target cell populations [26–28]. There was marginal underrepresentation of IS within genomic long terminal repeat elements (LTR) for the iPS IS (60.0% vs. 75.24%, p ≤ 0.021), and overrepresentation of IS contained within a 5 kb window surrounding CpG islands (10.67% vs. 5.33%, p ≤ 0.0444) (Table 1 and Table S1). We also observed a preference for iPS IS within a 30 kb window of proto-oncogenes (Table 3) compared to the random controls (10.17% vs. 3.44%, p ≤ 0.0275).
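The validity criteria stated above (at least 95% identity over at least 95% of the sequence length, and assignment to a unique chromosomal location) can be expressed as a small filter. The sketch below is illustrative only: the `BlatHit` record and its fields are hypothetical stand-ins for parsed alignment output, and treating "unique" as "exactly one hit passes both thresholds" is an assumption rather than the authors' documented rule.

```python
from dataclasses import dataclass

@dataclass
class BlatHit:
    """Hypothetical, simplified record of one alignment of a trimmed LAM-PCR sequence."""
    query_length: int    # length of the trimmed sequence
    aligned_length: int  # number of query bases covered by the alignment
    matches: int         # identical bases within the aligned region
    chrom: str
    position: int

def passes_threshold(hit, min_identity=0.95, min_coverage=0.95):
    """True if the hit has >= 95% identity over >= 95% of the query length."""
    if hit.aligned_length == 0 or hit.query_length == 0:
        return False
    identity = hit.matches / hit.aligned_length
    coverage = hit.aligned_length / hit.query_length
    return identity >= min_identity and coverage >= min_coverage

def unique_location(hits):
    """Return the single passing hit, or None if no hit or several hits pass
    (assumption: several passing hits means the location is ambiguous)."""
    passing = [h for h in hits if passes_threshold(h)]
    return passing[0] if len(passing) == 1 else None

# Toy example with made-up coordinates.
hits = [
    BlatHit(200, 198, 196, "chr1", 1_000),  # high identity and coverage
    BlatHit(200, 150, 150, "chr5", 5_000),  # covers only 75% of the query
]
best = unique_location(hits)
print(best)
```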
The most important finding was the lack of any common integration sites shared between the different iPS clones. The closest distance between two integration sites found in different clones was 557 kb (IPS(IMR90)-2.8 and IPS(FS)-3.7).
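A comparison of this kind amounts to finding the smallest distance between any two sites that lie on the same chromosome but belong to different clones. A minimal sketch follows; the tuple layout and the toy coordinates are hypothetical and do not reproduce the study's data.

```python
from itertools import combinations

def closest_inter_clone_pair(sites):
    """sites: list of (clone, chrom, position) tuples.
    Return (distance, site_a, site_b) for the closest pair of sites that share
    a chromosome but come from different clones, or None if no such pair exists."""
    best = None
    for a, b in combinations(sites, 2):
        if a[0] == b[0] or a[1] != b[1]:
            continue  # skip same-clone pairs and pairs on different chromosomes
        distance = abs(a[2] - b[2])
        if best is None or distance < best[0]:
            best = (distance, a, b)
    return best

# Toy example with made-up coordinates.
toy_sites = [
    ("cloneA", "chr1", 100),
    ("cloneA", "chr1", 150),
    ("cloneB", "chr1", 1_000),
    ("cloneB", "chr2", 120),
]
closest = closest_inter_clone_pair(toy_sites)
print(closest)
```

The pairwise scan is quadratic in the number of sites, which is unproblematic for a dataset of 75 IS.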
To determine if the lentiviral insertions resulted in a pattern of differential expression of interrupted or nearby genes, we used microarray data from IPS(IMR90)-1 to -4 and IPS(FS)-1 to -4 (GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL5876). The expression arrays utilized (NimbleGen Systems) allowed comparative expression analysis for 45 out of the 53 genes with lentiviral insertions. The expression of genes with IS from each individual clone was compared to either the profile of the remaining 3 clones derived from the same parental cells, or that of all 7 other iPS clones combined. For example, the expression of the genes interrupted by insertions in IPS(IMR90)-1 was compared either to the expression of the same genes in IPS(IMR90)-2, IPS(IMR90)-3 and IPS(IMR90)-4, or to that in IPS(IMR90)-2 to -4 and IPS(FS)-1 to -4. Out of 45 evaluable genes, only three (WDR66 and MYST2 in clone IPS(IMR90)-2, p<0.0001, and KIAA0528 in clone IPS(FS)-2, p=0.03) were significantly over-expressed. The expression of two genes in clone IPS(FS)-1 (ACVR2A, p=0.01, and RAF1, p=0.02) was decreased compared to the expression of these genes in the other clones combined (Figure 2 and Table S2).
The lentiviral-tagged genes were investigated for over-representation of functional pathways in MetaCore™ (GeneGo Inc., St Joseph, MI) and Ingenuity Pathway Analysis (IPA, Ingenuity Systems, Redwood City, CA). MetaCore and IPA are web-based software suites incorporating proprietary, manually curated databases of biological pathways, networks, interactions and disease biomarkers. Two gene lists were considered for the analysis: genes with integration sites within the coding regions and those having an integration site within a 30 kb window. A false discovery rate (FDR) cut-off of p < 0.01 failed to identify significantly enriched biological pathways, indicating that no functional pathways were preferentially targeted by the lentiviral integrations (Figures S1A, S1B, S1C, and S1D).
Surprisingly, in 6 of 8 iPS clones two independent insertions were found at the same or a very close chromosomal location. In what we have termed variant 1 double insertions (IPS(IMR90)-2.2a/b, IPS(IMR90)-4.9a/b, IPS(FS)-1.7a/b, IPS(FS)-4.1a/b), the two proviruses were integrated at the exact same location in reverse orientation, head-to-head, with the 3’-LTR junctions showing the expected 4–6 bp duplication at the integration site (Figure 3 A). In variant 2 double insertions, found in IPS(IMR90)-3.4a/b and IPS(FS)-2.4a/b, the integrations were very close but not at identical sites, and were in a tail-to-tail (3’LTR to 3’LTR) configuration (Figure 3 B). In order to confirm our LAM-PCR results and the existence of these unusual IS, we performed conventional PCR on 3 clones harboring double insertions (IPS(IMR90)-4, IPS(FS)-1 and IPS(FS)-4), using primers within the LTR of the provirus and within the gene affected by its integration. In all cases we were able to amplify a specific product of the expected size. Subsequent sequencing of these products confirmed the LAM-PCR results for the location of both integrants in these pairs (Figure 3 C).
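Screening a mapped IS list for such paired insertions can be sketched as a search for sites within the same clone and chromosome that lie within a small window of each other. The example below is a hypothetical illustration: the tuple layout, the coordinates, and the 6 bp window are assumptions, and it does not inspect proviral orientation, so it cannot by itself distinguish the head-to-head from the tail-to-tail configuration.

```python
from collections import defaultdict

def find_close_pairs(sites, max_gap):
    """sites: iterable of (clone, chrom, position) tuples.
    Return (clone, chrom, pos_a, pos_b) for neighbouring sites within the same
    clone and chromosome that are at most max_gap base pairs apart."""
    grouped = defaultdict(list)
    for clone, chrom, pos in sites:
        grouped[(clone, chrom)].append(pos)
    pairs = []
    for (clone, chrom), positions in grouped.items():
        positions.sort()
        # After sorting, only adjacent positions need to be compared.
        for left, right in zip(positions, positions[1:]):
            if right - left <= max_gap:
                pairs.append((clone, chrom, left, right))
    return pairs

# Toy example with made-up coordinates.
toy_sites = [
    ("cloneA", "chr1", 100),
    ("cloneA", "chr1", 104),     # 4 bp from the previous site in the same clone
    ("cloneA", "chr1", 90_000),
    ("cloneB", "chr1", 102),     # close to cloneA's sites, but in a different clone
    ("cloneB", "chr2", 500),
]
double_insertions = find_close_pairs(toy_sites, max_gap=6)
print(double_insertions)
```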
Although the characteristic duplication at the IS of the variant 1 double insertions indicates that they are truly IS on a single allele, as does the finding of one set on the single Y chromosome (IPS(FS)-1.7a/b), we further confirmed the single-allele nature of the double integrants by performing PCR using primers spanning the potential site of insertion in IPS(IMR90)-4, IPS(FS)-1 and IPS(FS)-4. In all cases we were able to detect an amplicon of the expected size (Figure 3 D) and again confirmed the genomic sequences. This result shows that one allele had no insertions, further suggesting that the double insertions must be on the other allele.
Since the pioneering work of Takahashi and Yamanaka first demonstrated the possibility of converting differentiated murine somatic cells into iPS cells with characteristics and functional properties similar to those of embryonic stem cells via the introduction of a set of transcription factors by viral gene transfer, several groups have confirmed the general principle of reprogramming by applying modified methods to cells of different origins and species, including human and non-human primate cells [2–5,8–10,30]. Even though the reprogramming strategies are rapidly evolving and there are recent reports documenting generation of iPS cells without the use of integrating viral vectors [13,14,31,32], the transfer and expression of the crucial transcription factors via retro- or lentiviral gene transfer remains the most commonly utilized reprogramming strategy in the preclinical and laboratory setting, due to its robustness and reproducibility. Moreover, the reprogramming efficiency in the episomal and protein transduction studies was substantially lower in comparison to the viral gene delivery methods, raising the possibility that a selection of de novo mutations in the parental cell population contributed to the induction of the iPS phenotype [13,14].
Whether activation or dysregulation of cellular genes caused by genomic vector insertions contributes to or modifies the process of iPS cell generation or the properties of the resulting cells is important to investigate fully, even if non-integrating delivery systems are eventually utilized for future clinical applications. Are specific integration sites found repeatedly in iPS cell lines, indicating that dysregulation of one or more genes greatly enhances the chance of iPS generation? Or of specific gene classes? Are genes near integration sites reproducibly up- or down-regulated?
Our study is the first to investigate the vector insertion profile in human iPS cell lines. Prior studies reported on integration sites found in murine iPS cells generated utilizing standard MLV retroviral vectors, and did not reveal any recurrent integration sites shared by different iPS clones [33, 34]. Given differences in the ease of transformation of murine versus human cells of all types, and the lower efficiency of iPS generation from human versus murine cells, it is important to carry out insertion studies specifically in human iPS cell lines. Human iPS cells were initially created utilizing either standard MLV or lentiviral HIV vectors [1, 2, 4]. Because MLV vectors have been associated more directly than lentiviral vectors with genotoxicity and insertional proto-oncogene activation in various clinical and experimental systems, we felt iPS cells created utilizing MLV vectors were unlikely to move forward into clinical applications or be utilized by most laboratory investigators [35–37].
We therefore focused on analysis of IS in 8 human iPS cell lines created via transfer of four transcription factors utilizing safety-modified HIV-based lentiviral vectors [2,38]. These vectors drive gene expression with an internal promoter and have the LTR enhancer regions deleted. However, this type of vector has still been shown to perturb the expression of adjacent genes [39, 40]. Even if iPS cells shut down expression of the vector transgenes during the derivation process, it is possible that dysregulation of adjacent genes might persist and impact on iPS properties.
In order to circumvent possible restriction enzyme bias and to increase the probability of identifying all IS present in these clones, we performed independent LAM-PCR reactions using two restriction enzymes. Using this approach, we independently validated 66% of the IS; for the remaining IS the alternative restriction site was either too far from or too close to the LTR-genome junction to allow amplification or unambiguous mapping. The number of IS detected by LAM-PCR also closely approximated the number of IS predicted by Southern blotting (Figure 1 and Table 1), providing reassurance that virtually all IS were identified. It is critical with LAM-PCR to use stringent criteria for identification of IS, including presence of the appropriate LTR junctions and at least 95% matching to the human genome for the entire length of the cloned fragment. We found no IS shared between the 8 examined cell lines. Furthermore, mRNA expression analysis did not reveal any overlapping dysregulation of genes near insertion sites. Pathway and network studies showed a wide functional diversity of the lentivirally tagged genes. Together, these findings indicate that insertional mutagenesis is unlikely to systematically contribute to the reprogramming process.
Proto-oncogenes were over-represented in the list of insertion sites in iPS cells, similar to analysis of the murine dataset obtained by Aoi et al. [33, 39]. The target cells were highly proliferative embryonic or foreskin fibroblasts, and proto-oncogenes may be highly expressed in these target cells and therefore more likely targets. HIV vectors are known to preferentially integrate in genes that are highly expressed.
Although we did not detect any IS shared between independent iPS clones, 6 of the 8 clones were found to harbour unusual “double” insertions, with two proviral insertions located very close to each other on the same chromosomal allele (Figure 3 A and B). To our knowledge, this phenomenon has not been previously reported in any insertion site survey, neither in our own published analysis of 702 MLV IS and 501 SIV IS in rhesus macaques [43,44], nor in insertion site analyses of cells transduced with HIV vectors similar to those used to create iPS clones. The loci with double insertions did not fall into any particular class of genes, nor were they shared between clones or differentially expressed between clones. This suggests that the double insertions have no direct functional link with the success of reprogramming and achieving pluripotency, but are instead an epiphenomenon of the lentiviral reprogramming procedure. In order to obtain iPS clones, there is strong positive selection for cells that were simultaneously transduced with four different vectors, unlike previous MLV and HIV experiments, where integration profiling was carried out on cell populations harbouring fewer copies, and without selection pressure for a high copy number per cell. The target cells were exposed to high-titer stocks of all vectors, and only those that successfully integrated and expressed proviruses carrying all four transgenes were able to grow out as iPS cells. Most of the cells had more than one copy of each vector. Possibly, in cells successfully transduced simultaneously with so many vector copies, events occur during the integration process that somehow favor nearby dual integrants, for instance more than one provirus utilizing the same molecular site as part of the integrase complex, resulting in adjacent integration after tethering to chromatin.
Successful reprogramming to fully functional iPS cells requires not only the correct stoichiometric expression of the factors, but also the subsequent silencing of the viral vectors. It is possible that one or both requirements are better achieved with proviruses integrated close together.
In conclusion, although the sample size of our study as well as of prior studies in the murine system [33, 34] does not unequivocally rule out a contribution of insertional mutagenesis via proviral insertion on the reprogramming of somatic cells to iPS cells, we found no evidence of shared integration site selection between independent iPS clones, and no pattern of gene expression perturbation linked to vector insertions. Thus reprogramming of human somatic cells likely does not depend on activation/deactivation of specific genes induced by lentiviral integration. Our study supports the continued utilization of lentiviral vectors for generation of iPS cells, at least for non-clinical applications, given their improved efficiency compared to most non-integrating vector or protein transfer approaches. Further research is needed to explore the nature and the influence of the double insertions.
This research was supported in part by the Intramural Research Programs of the National Heart, Lung, and Blood Institute and the National Human Genome Research Institute, National Institutes of Health.
Author contributions: T.W.: conception and design, collection and assembly of data, data analysis and interpretation, manuscript writing; A.C.: collection and assembly of data, data analysis; J.Y.M.: collection and assembly of data; X.X.: collection and assembly of data; A.D.N. and B.B.: assembly of data, data analysis and interpretation; J.E.AB.: provision of study materials; T.G.W.: assembly of data, data analysis and interpretation; J.A.T.: provision of study materials, conception and design, final approval of manuscript; C.E.D.: conception and design, data analysis and interpretation, manuscript writing, final approval of manuscript.
Conflict of interest: J.E.AB.: Cellular Dynamics International: Consultancy, Equity Ownership. J.A.T.: Cellular Dynamics International: Equity Ownership, Membership on entity’s Board of Directors. Tactics II Stem Cell Ventures: Consultancy. The other authors have no conflicts of interest to declare.