The advent of induced pluripotent stem (iPS) cells enables, for the first time, the derivation of unlimited numbers of patient-specific stem cells [1–3] and holds great promise for regenerative medicine [4,5]. Recent studies have explored the potential of iPS cell generation combined with gene and cell therapy for disease treatment in mice and humans [4,5]. However, for the therapeutic promise of iPS cell technology to be fully realized, clinically translatable methods will be needed for introducing therapeutic, suicide, drug resistance or reporter genes into human iPS cells. The foreign genetic material should ideally be delivered into ‘safe harbors’, that is, regions of the genome where the integrated material is adequately expressed without perturbing endogenous gene structure or function, through a process that permits precise mapping and minimizes occult genotoxicity. Retroviruses, such as HIV, integrate efficiently into the human genome with a strong bias toward actively transcribed genes [6]. This semi-random integration pattern favors expression of retrovirally encoded transgenes but entails a risk of perturbing the expression of neighboring genes, including cancer-related genes [7–10]. We hypothesized that screening iPS cell clones harboring a single vector copy would enable us to retrieve safe harbor sites that meet the following five criteria: (i) a distance of at least 50 kb from the 5′ end of any gene, (ii) a distance of at least 300 kb from any cancer-related gene, (iii) a distance of at least 300 kb from any microRNA (miRNA) gene, (iv) location outside a transcription unit and (v) location outside ultraconserved regions (UCRs) of the human genome [11]. As the most common insertional oncogenesis event is transactivation of neighboring tumor-promoting genes [7,12], the first two criteria exclude the portions of the human genome located near gene promoters, in particular those of cancer-related genes (Supplementary Table 1). Cancer-related genes were defined as genes functionally implicated in human cancers or the human homologs of genes implicated in cancer in model organisms (list available at http://microb230.med.upenn.edu/protocols/cancergenes.html). Proximity to miRNA genes was adopted as an exclusion criterion because miRNAs are implicated in the regulation of many cellular processes, including cell proliferation and differentiation. As vector integration within a transcription unit can disrupt gene function, either through loss of function of a tumor suppressor gene or through generation of an aberrantly spliced gene product [10], our fourth criterion excludes all sites located inside transcribed genes. Finally, we excluded UCRs, regions that are highly conserved across multiple vertebrates and known to be enriched for enhancers and exons [11].
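The five criteria can be expressed as a simple filter over annotated genomic intervals. The sketch below is illustrative only and is not the authors' pipeline: the `Feature` type, the 0-based half-open coordinate convention, and the choice to measure criteria (ii) and (iii) from the nearest edge of a cancer-related or miRNA gene are assumptions. Real use would require annotation tracks (genes, the cancer gene list, miRNAs, UCRs) loaded from files, and the toy coordinates here are hypothetical.

```python
from dataclasses import dataclass

# Illustrative sketch of the five safe harbor criteria (not the authors' pipeline).
# ASSUMPTION: 0-based, half-open coordinates; distances for criteria (ii) and (iii)
# are measured to the nearest edge of the cancer-related or miRNA gene.


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int          # inclusive
    end: int            # exclusive
    strand: str = "+"   # used only for genes, to locate the 5' end


def distance_to(chrom: str, pos: int, f: Feature):
    """Distance in bp from a point to an interval; 0 if inside, None if on another chromosome."""
    if chrom != f.chrom:
        return None
    if f.start <= pos < f.end:
        return 0
    return f.start - pos if pos < f.start else pos - (f.end - 1)


def five_prime_end(gene: Feature) -> int:
    return gene.start if gene.strand == "+" else gene.end - 1


def is_safe_harbor(chrom, pos, genes, cancer_genes, mirnas, ucrs) -> bool:
    # (i) at least 50 kb from the 5' end of any gene
    if any(g.chrom == chrom and abs(pos - five_prime_end(g)) < 50_000 for g in genes):
        return False
    # (ii) at least 300 kb from any cancer-related gene and (iii) from any miRNA gene
    for f in [*cancer_genes, *mirnas]:
        d = distance_to(chrom, pos, f)
        if d is not None and d < 300_000:
            return False
    # (iv) outside any transcription unit and (v) outside any ultraconserved region
    return all(distance_to(chrom, pos, f) != 0 for f in [*genes, *ucrs])


# Toy annotation (hypothetical coordinates, for illustration only)
genes = [Feature("chr1", 1_000_000, 1_020_000, "+"), Feature("chr2", 100_000, 200_000, "-")]
cancer_genes = [Feature("chr1", 5_000_000, 5_010_000)]
print(is_safe_harbor("chr1", 1_100_000, genes, cancer_genes, [], []))
print(is_safe_harbor("chr1", 4_800_000, genes, cancer_genes, [], []))
```

Checking each criterion in turn and returning early keeps the logic readable; a genome-scale version would replace the linear scans with an interval index.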
We investigated this approach in an iPS cell model for the genetic correction of β-thalassemia major, using a well-characterized globin lentiviral vector [13,14]. We generated a total of 20 iPS cell lines from skin fibroblasts or bone marrow mesenchymal stem cells (MSCs) from four individuals with β-thalassemia major of various genotypes (Supplementary Table 2). All putative thalassemia iPS cell lines (referred to as thal-iPS) exhibited the characteristic morphology of human embryonic stem (hES) cells (Supplementary Fig. 1). Seven putative thal-iPS cell lines (Supplementary Table 2) were selected for further characterization. They expressed human pluripotent cell markers (Tra-1-81, Tra-1-60, SSEA-3, SSEA-4 and Nanog) and pluripotency-related genes at levels similar to those of hES cell lines (Supplementary Figs. 1–3). Their pluripotency was assessed by the formation of teratomas comprising tissues derived from all three germ layers after grafting into immunodeficient mice (Supplementary Figs. 4 and 5). They could be efficiently differentiated in vitro into mesoderm derivatives, such as beating putative cardiomyocytes (Supplementary Movie 1) and hematopoietic progenitor cells (see below). Genotyping confirmed the β-thalassemia mutations (Supplementary Table 2 and Supplementary Fig. 6). Silencing of all four reprogramming transgenes was demonstrated by flow cytometry (in thal-iPS cell lines derived using vectors encoding the four reprogramming factors OCT4, SOX2, KLF4 and c-MYC together with distinct fluorescent proteins [15]; Supplementary Fig. 7) and by quantitative reverse-transcription PCR (qRT-PCR; Supplementary Fig. 8). Demethylation of the OCT4 promoter was assessed and confirmed in the thal-iPS cell lines thal1.52, thal2.1, thal5.10 and thal5.11. All seven thal-iPS cell lines tested exhibited normal male or female karyotypes (Supplementary Fig. 9). To generate transgene-free thal-iPS cells, we selected two thal-iPS cell lines, thal5.10 and thal5.11, each found to contain six copies of the single loxP-flanked polycistronic vector used for reprogramming (fSV2A; Supplementary Fig. 10a), after mapping all six fSV2A copies in each line to the genome (Supplementary Table 3). Several excised thal-iPS cell lines were derived from them after transient Cre expression delivered by an integrase-deficient lentiviral vector (Cre-IDLV). Complete excision of all six copies of the fSV2A vector (Supplementary Fig. 10a–d,f) and absence of Cre-IDLV integration (Supplementary Fig. 10c,e,f) were thoroughly documented. Microarray analysis excluded altered expression of endogenous genes in the vicinity of the six integrated vectors before excision and of the residual promoterless (U3-deleted) lentiviral long terminal repeats (LTRs) after excision (Supplementary Fig. 11). Characterization of two vector-excised lines, thal5.10-Cre8 and thal5.11-Cre23 (derived from lines thal5.10 and thal5.11, respectively), confirmed their preserved pluripotency (Supplementary Figs. 1–3 and 5). Comparative genomic hybridization (CGH) of the excised line thal5.10-Cre8 and the parental MSCs revealed no genetic abnormalities (Supplementary Fig. 12).
To establish thal-iPS cell clones harboring a therapeutic β-globin gene, we generated a lentiviral vector, TNS9.3/fNG, derived from the previously described TNS9 vector [13], which expresses the human β-globin gene cis-linked to the HS2, HS3 and HS4 elements (DNase I hypersensitive sites) of its locus control region. To determine the probability of retrieving sites that meet the safe harbor criteria, we analyzed 5,840 integration sites of the TNS9.3/fNG vector in the thal5.11-Cre23 iPS cell line. This survey revealed that 17.3% of all integrations met all five criteria (Supplementary Table 4), supporting the feasibility of recovering iPS cell clones harboring vector integrations in safe harbors from a relatively small set of clones. We therefore transduced the thal-iPS cell lines thal1.52, thal2.1, thal5.10 and thal5.11 at low multiplicity of infection to isolate thal-iPS cell clones harboring a single TNS9.3/fNG vector copy. Fifteen clones found by quantitative PCR to harbor a single TNS9.3/fNG copy (Supplementary Table 5) were randomly selected. In 13 of them, single-vector integration and clonality were confirmed by Southern blot analysis using two different restriction enzymes and two different probes (Supplementary Fig. 13), and the vector integration sites were mapped to the human genome. One of these 13 clones, thal5.10-2, harbors an integration that meets all five safe harbor criteria. Two additional safe harbor sites were found among 23 other sites that we mapped in multiple-copy thal-iPS cell clones (Supplementary Table 6).
Table 1. Analysis of the globin vector integration site in 13 single-vector-copy thal-iPS cell clones with respect to the five safe harbor criteria.
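The argument that safe harbor clones can be recovered from a relatively small set of clones can be made quantitative with a back-of-the-envelope binomial calculation. The sketch assumes, as a simplification not tested in the text, that each single-copy clone independently carries a qualifying integration with probability 0.173, the fraction observed among the 5,840 mapped TNS9.3/fNG sites. The function names are hypothetical.

```python
import math

# ASSUMPTION: each single-copy clone independently carries a site meeting all five
# criteria with probability p (taken from the 5,840-site survey: 17.3%).
P_SAFE = 0.173


def prob_at_least_one(p: float, n_clones: int) -> float:
    """Probability that at least one of n independent clones carries a qualifying site."""
    return 1.0 - (1.0 - p) ** n_clones


def clones_needed(p: float, target: float = 0.95) -> int:
    """Smallest n such that prob_at_least_one(p, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


print(f"{prob_at_least_one(P_SAFE, 13):.3f}")  # 13 clonal single-copy clones, as screened here
print(clones_needed(P_SAFE, 0.95))
```

Under this simplification, screening 13 clonal single-copy clones gives just over a 90% chance of recovering at least one safe harbor clone, and about 16 clones would be needed to reach 95%.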
To assess vector-encoded β-globin gene expression, we derived hematopoietic progenitors through embryoid body differentiation of the 13 single-copy thal-iPS cell clones and further differentiated them along the erythroid lineage (Supplementary Fig. 14). By the end of this process, most cells exhibited characteristic hematopoietic cell morphology, expressed the erythroid markers glycophorin A and transferrin receptor (CD71) and showed macroscopic hemoglobinization (Supplementary Figs. 14 and 15). The erythroid nature of these thal-iPS cell derivatives was further corroborated by marked induction of well-characterized erythroid-specific genes (Supplementary Fig. 16). Notably, the erythroid progeny of all wild-type and untransduced thal-iPS cell lines expressed α-globin as well as the embryonic and fetal ε- and γ-globins, but not the adult β-globin transcript, similarly to the erythroid progeny of the H1 hES cell line (Supplementary Fig. 17) and in accordance with previous reports [16–18]. As expected, vector-encoded β-globin expression was not detected in undifferentiated thal-iPS cell clones (Supplementary Fig. 17). Upon erythroid differentiation, 12 of the 13 single-copy thal-iPS cell clones expressed detectable vector-encoded β-globin. Expression levels, normalized to endogenous α-globin expression, ranged from 9% to 159% (mean, 53%) of the level produced by a normal endogenous β-globin allele, similar to the levels that we and others have obtained by lentiviral globin gene transfer in murine and human erythroid cells [14]. β-globin expression was confirmed and quantified at the protein level by high-performance liquid chromatography (HPLC) in four clones (Supplementary Table 7 and Supplementary Fig. 18). Notably, clone thal5.10-2 expressed 85% of the level produced by a normal endogenous β-globin allele, demonstrating that a globin vector integrated in a site that meets all five safe harbor criteria and lies >300 kb from the nearest gene 5′ end can express β-globin at high levels.
Using microarrays, we assessed the expression of genes located within 300 kb of the vector insertion site in six single-copy thal-iPS cell clones, both in the undifferentiated state and in their erythroid progeny. This analysis revealed that three of the five integrations eliminated by our safe harbor criteria did indeed perturb the expression of neighboring genes (Supplementary Figs. 19 and 20). Dysregulated expression was detected in a total of five genes located 9 to 275 kb from the vector insertion, whereas no gene located more than 300 kb from the insertion was significantly differentially expressed (P < 0.05). Of note, the safe harbor integration site in clone thal5.10-2 lies in a genomic region with no genes within 300 kb on either side. The microarray analysis did not reveal any statistically significant differentially expressed genes elsewhere in the genome, in this clone or in any other.
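The local part of this analysis amounts to partitioning significantly dysregulated genes by their distance from the insertion. The sketch below assumes per-gene P values have already been computed for a transduced clone versus its control; the `GeneResult` type, the gene names and all coordinates and P values are hypothetical, and no multiple-testing correction is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    name: str       # hypothetical gene identifier
    chrom: str
    start: int      # gene start (bp)
    end: int        # gene end (bp)
    p_value: float  # differential-expression P value, clone vs. control (assumed precomputed)


def distance_to_gene(chrom: str, pos: int, g: GeneResult):
    """Distance from the insertion to the nearest gene edge; 0 if inside, None if on another chromosome."""
    if g.chrom != chrom:
        return None
    if g.start <= pos <= g.end:
        return 0
    return min(abs(pos - g.start), abs(pos - g.end))


def significant_by_window(chrom, pos, genes, window=300_000, alpha=0.05):
    """Return (near, far): significant genes inside vs. outside +/- window of the insertion."""
    near, far = [], []
    for g in genes:
        if g.p_value >= alpha:
            continue  # not significantly differentially expressed
        d = distance_to_gene(chrom, pos, g)
        if d is not None and d <= window:
            near.append((g.name, d))
        else:
            far.append((g.name, d))
    return near, far


genes = [
    GeneResult("GENE_A", "chr5", 1_000_000, 1_050_000, 0.01),
    GeneResult("GENE_B", "chr5", 1_500_000, 1_520_000, 0.30),
    GeneResult("GENE_C", "chr7", 10_000, 20_000, 0.02),
]
near, far = significant_by_window("chr5", 1_200_000, genes)
print(near, far)
```

In the terms used above, an integration with no local effect yields an empty `near` list, and the genome-wide check corresponds to an empty `far` list.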
Our data demonstrate that iPS cell clones whose progeny express a transgene at therapeutic levels from selected chromosomal sites can be generated and identified by screening a limited number of single-copy clones and applying five safe harbor criteria. Approximately half (47.7%) of the clones we obtained under optimized transduction conditions harbored a single vector copy (Supplementary Table 5), and clonality was confirmed in 13 of 15 (86.7%) of these. As the frequency of integrations in sites that meet our five safe harbor criteria is 17.3% (Supplementary Table 4), the overall efficiency of our strategy, the product of these three frequencies, is approximately 7.1%. Three of the five clones eliminated by our safe harbor criteria showed perturbed expression of neighboring endogenous genes, whereas clone thal5.10-2 did not, demonstrating the usefulness of selecting genetically modified iPS cell clones with this strategy and these criteria. Notably, applying our criteria to a series of γ-retroviral and lentiviral integration sites associated with oncogenic events or perturbed endogenous gene expression would eliminate all of these well-characterized deleterious integrations [7–10].
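The overall efficiency is the product of the three stage-wise frequencies given in the text. The short calculation below reproduces it (using 13/15 for clonality); with these rounded inputs the product is about 7.15%, consistent with the reported figure up to rounding of the input percentages. Its reciprocal gives a rough estimate, assuming independent clones, of how many transduced clones must be screened per usable safe harbor clone. Variable names are illustrative.

```python
# Stage-wise frequencies reported in the text
single_copy = 0.477   # clones harboring a single vector copy (Supplementary Table 5)
clonal = 13 / 15      # single-copy clones confirmed clonal by Southern blot
safe_site = 0.173     # integrations meeting all five criteria (Supplementary Table 4)

overall = single_copy * clonal * safe_site
# Expected number of clones screened per usable safe harbor clone (assumes independence)
clones_per_hit = 1 / overall

print(f"overall efficiency: {overall:.2%}")
print(f"clones screened per safe harbor clone: ~{round(clones_per_hit)}")
```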
This approach has the prospect of broad application in the genetic engineering of human iPS cells. Genetic correction through addition of a therapeutic gene into safe harbors in patient-specific iPS cells provides a realistic alternative to targeted gene repair, especially for genetically heterogeneous disorders associated with multiple mutations. In contrast to genome editing strategies, our approach does not require customized targeting vectors with long isogenic ends [19] or the complex genotoxicity screens that are needed when using endonucleases [20,21]. In the latter case, the risk of occult genotoxicity mediated by off-target effects of double-stranded DNA-cleaving agents needs to be weighed against the long-term experience with risk assessment of retroviral vector integration, which can be thoroughly analyzed, as we demonstrate here. Apart from genetic correction, future clinical applications of iPS cells will likely require the addition of drug resistance, reporter or suicide genes to permit in vivo selection, tracking or cell eradication, respectively. To this end, the identification of suitable genomic locations for transgene knock-in is of great importance. Recent studies suggest that genomic sites such as the adeno-associated virus integration site 1 (AAVS1) [20,22] and the human ROSA26 locus [23] can support transgene expression, but data on the safety of these sites are lacking. The screening strategy we describe here should prove useful for the de novo discovery and characterization of putative universal genomic safe harbors. The requirements for a safe harbor are (i) avoidance of genotoxicity and (ii) support of appropriate expression levels and regulation of the integrated transgene. Notably, β-globin gene expression in the safe harbor clone thal5.10-2 was in the therapeutic range, which, based on clinical observations in individuals with homozygous β-thalassemia and hereditary persistence of fetal hemoglobin, is on the order of 30% of α-globin expression [24].
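The text reports vector expression in two different units: percent of a normal endogenous β-globin allele, and, for the therapeutic threshold, fraction of α-globin expression. Relating the two requires an assumption about how much one normal β-globin allele contributes. The sketch below assumes, for illustration only, that two normal β-globin alleles together match α-globin output, so one allele corresponds to about 50% of α-globin; this factor is not stated in the text, and the function name is hypothetical.

```python
# ASSUMPTION (illustrative, not from the study): in normal erythroid cells the two
# beta-globin alleles together match alpha-globin output, so one allele ~ 0.5 x alpha.
ALPHA_FRACTION_PER_ALLELE = 0.5
THERAPEUTIC_ALPHA_FRACTION = 0.30  # approximate threshold cited in the text [24]


def percent_allele_to_alpha_fraction(pct_of_allele: float,
                                     per_allele: float = ALPHA_FRACTION_PER_ALLELE) -> float:
    """Convert 'percent of one normal beta-globin allele' to 'fraction of alpha-globin'."""
    return pct_of_allele / 100.0 * per_allele


frac = percent_allele_to_alpha_fraction(85)  # clone thal5.10-2
print(f"thal5.10-2: {frac:.1%} of alpha-globin; "
      f"above threshold: {frac >= THERAPEUTIC_ALPHA_FRACTION}")
```

Under this assumption, thal5.10-2 corresponds to roughly 42% of α-globin expression, which is consistent with the statement that it falls in the therapeutic range; a different per-allele factor would change the number but not the form of the calculation.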
The potential genotoxicity of the reprogramming process used upstream of our safe harbor strategy also needs to be taken into account. In this study, we used an excisable vector system, selected patient-specific iPS cell lines harboring a relatively low number of reprogramming vector copies and determined their positions in the genome. Because Cre-mediated excision leaves behind a promoterless, U3-deleted LTR, we propose that lines can be selected, as we demonstrate here, on the basis of (i) exclusion of all integrations within exons, to avoid frameshifts, premature termination of translation or translation of abnormal proteins, and (ii) confirmation that residual LTR fragments residing within transcription units do not perturb gene expression. Based on our large integration site data set in human iPS cells, 97% of all lentiviral vector integrations lie outside exons. The need to screen thal-iPS cell lines for residual LTR insertions may be eliminated if efficient generation of human iPS cells with nonintegrating systems becomes a realistic option.
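The two selection rules for residual reprogramming-vector LTRs can be written as a small decision procedure. The sketch assumes all intervals lie on the same chromosome as the LTR positions and uses 0-based half-open coordinates; the caller-supplied `expression_ok` predicate is a hypothetical stand-in for the microarray check, and all names and coordinates are illustrative.

```python
from enum import Enum
from typing import Callable, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open; same chromosome assumed


class LtrCall(Enum):
    REJECT = "exonic: reject the line"
    CHECK_EXPRESSION = "intronic: verify host-gene expression is unperturbed"
    ACCEPT = "outside transcription units"


def classify_ltr(pos: int, exons: Sequence[Interval], transcripts: Sequence[Interval]) -> LtrCall:
    if any(s <= pos < e for s, e in exons):
        return LtrCall.REJECT            # rule (i): no residual LTR inside an exon
    if any(s <= pos < e for s, e in transcripts):
        return LtrCall.CHECK_EXPRESSION  # rule (ii): intronic LTR needs an expression check
    return LtrCall.ACCEPT


def line_acceptable(ltr_positions: Sequence[int], exons: Sequence[Interval],
                    transcripts: Sequence[Interval],
                    expression_ok: Callable[[int], bool]) -> bool:
    """A line passes if no LTR is exonic and every intronic LTR passes the expression check."""
    for pos in ltr_positions:
        call = classify_ltr(pos, exons, transcripts)
        if call is LtrCall.REJECT:
            return False
        if call is LtrCall.CHECK_EXPRESSION and not expression_ok(pos):
            return False
    return True


transcripts = [(1_000, 5_000)]
exons = [(1_000, 1_200), (4_000, 5_000)]
print(classify_ltr(2_000, exons, transcripts))
print(line_acceptable([2_000, 9_000], exons, transcripts, lambda p: True))
```

Separating classification from the line-level decision keeps the exonic exclusion absolute while leaving the intronic case to whatever expression assay is available.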
Ascertaining the lack of perturbation of host-cell gene expression, both locally and genome-wide, as shown here (Supplementary Figs. 19 and 20), provides an important initial safety test. This can be complemented by additional tests for features of neoplastic transformation [25] and, eventually, by serial transplantation studies of iPS cell–derived hematopoietic stem cells in immunodeficient mice, which are currently precluded by the inability to efficiently generate engraftable human hematopoietic stem cells from ES and iPS cells [5,26]. Further evaluation of safe harbors could also include long-term studies in transgenic mice bearing transgenes in syntenic regions, as well as bioinformatics-assisted searches of the accumulated databases of common retroviral integration sites, not associated with any side effect, found in patients treated with retroviral vectors [27–29], although this information is of limited value in the absence of transgene expression data at these sites.
In conclusion, the present study provides a framework and a strategy combining bioinformatics and functional analyses for identifying safe harbors for transgene integration in the human genome. As our understanding of the function of the human genome and of genome-wide interactions advances, the definition of safe harbors will likely be refined over time, eventually building a registry of dependable genomic locations for the safe and effective genetic engineering of human cells.