Human embryonic stem cells (hESCs) are used as platforms for disease study, drug screening and cell-based therapy. To facilitate these applications, it is frequently necessary to genetically manipulate the hESC genome. Gene editing with engineered nucleases enables site-specific genetic modification of the human genome through homology-directed repair (HDR). However, the frequency of HDR remains low in hESCs. We combined efficient expression of engineered nucleases and integration-defective lentiviral vector (IDLV) transduction for donor template delivery to mediate HDR in hESC line WA09. This strategy led to highly efficient HDR with more than 80% of the selected WA09 clones harboring the transgene inserted at the targeted genomic locus. However, a portion of the HDR clones contained the concatemeric IDLV genomic structure at the target site, probably resulting from recombination of the IDLV genomic input before HDR with the target. We found that the integrase protein of IDLV mediated the highly efficient HDR through the recruitment of a cellular protein, LEDGF/p75. This study demonstrates that IDLV-mediated HDR is a powerful and broadly applicable technology to carry out site-specific gene modification in hESCs.
Human pluripotent stem cells (hPSCs), such as embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs), are used to establish surrogate models to study developmental biology and disease pathogenesis in a dish. They can also serve as platforms for drug validation, screening and cell-based therapy. To facilitate these applications, it is frequently necessary to genetically manipulate the hPSC genome. Recent progress in the development of gene editing technologies has significantly advanced the ability to insert, disrupt or repair a gene in hPSCs. These gene editing tools, including zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) system, bind their genomic target and create double-strand DNA breaks (DSBs) (1). DSBs instigate DNA repair through non-homologous end-joining (NHEJ), which is error-prone and frequently leads to gene inactivation (2). Alternatively, in the presence of the sister chromatid or an exogenous donor DNA template, DSBs stimulate homology-directed repair (HDR), resulting in targeted genome modification (3). Both NHEJ and HDR are active in nearly all cell types and organisms. HDR occurs only during S and G2 phases whereas NHEJ occurs throughout the entire cell cycle (4). The frequency of HDR is generally lower and more variable than that of NHEJ, likely because of competition with NHEJ for repairing DSBs and specific suppression of HDR in M and early G1 phases to prevent telomere fusion (5). The obstacle to carrying out HDR is exacerbated in hPSCs by the poor efficiency of introducing the donor template. Intrinsic properties unique to hPSCs also contribute to the poor HDR frequency, as much higher HDR frequencies were observed in murine ESCs (6). The HDR frequency in hPSCs could also be affected by the activity of the targeted gene.
Several studies have shown relatively poor HDR frequencies when a silent gene in hPSCs was targeted compared with an active gene (7–9). These problems, in conjunction with the difficulty of maintaining stable hPSCs in culture, hamper broad application of hPSCs in basic research and in the clinic. Although ongoing efforts have made some progress in boosting the HDR frequency in human ESCs with small-molecule NHEJ inhibitors (10–13), the potential effect of these small molecules on self-renewal and pluripotency of hESCs remains unknown. Data from several studies suggest that the low HDR frequency in hESCs is not likely due to a failure of the engineered nuclease to recognize its genomic target (14,15), since the frequency of NHEJ-induced genetic modifications is significantly higher than that of the HDR-induced genetic modifications with the same nuclease in hESCs. The major limitation seems to be the lack of efficient engagement of the HDR-mediated repair pathway in hESCs. Finding strategies to overcome this obstacle and improve the efficiency of precise gene editing in hPSCs therefore poses a major challenge.
To boost the HDR frequency in human cells, several groups used non-integrating viral vectors to deliver the gene editing system and the donor template. Viral vectors are attractive owing to their high efficiency in transducing human cells. Since these vectors are non-integrating, their genomes would not persist in actively proliferating cells. In addition, viral transduction permits fine control over the number of DNA copies that reach the nucleus to mediate HDR. Vectors derived from adeno-associated virus (AAV) have been shown to achieve high efficiencies of gene targeting (up to 34% with selection) with ZFN in the human U2OS osteosarcoma line (16). Donor template-carrying AAV6 in conjunction with ZFN mRNAs was shown to mediate HDR at frequencies ranging from 19 to 43% with fetal liver-derived hematopoietic progenitor cells (HPCs) in the absence of selection (17). AAV-mediated donor template delivery in the absence of an engineered nuclease also generated highly efficient HDR (up to 30% with selection) in hESCs and iPSCs (18,19). However, these studies were limited only to active loci in human HPCs or PSCs. Whether silent loci can be edited with the AAV vector at a similar frequency remains unclear. When adenoviral (Ad) vectors were used to introduce DSBs, the overall frequency of clones expressing the GFP marker was low, but nearly all the GFP+ clones were derived from HDR (20). However, Coluccio et al. showed in a side-by-side comparison that IDLV, in the presence of ZFN, was more efficient than the Ad vector at mediating HDR in human cells (21). Ad vector-mediated HDR without an engineered nuclease was also shown to work in hESCs and iPSCs (22,23). However, long homology arms (>10 kb) were required to mediate such HDR, making assembly of the donor template and its insertion into the Ad vector backbone more time consuming. Several studies used integration-defective lentiviral vector (IDLV) for gene editing in human cells.
IDLV is derived from mutations in the integrase (IN) of human immunodeficiency virus-1 (HIV-1) that impair the ability of HIV-1 to integrate into the host genome. Although IDLV was shown to be more efficient than AAV and Ad vectors to mediate HDR in some studies (20,21), the overall frequency remained relatively low in hPSCs. In studies using IDLVs to deliver both the ZFNs and the donor template, the HDR frequency was ~3.5% in human ESCs and ~0.6% in human iPSCs in the absence of selection (24,25). Since IDLV does not support efficient transgene expression due to epigenetic silencing of the unintegrated vector genome (26), the low HDR frequency in hPSCs may be enhanced by using different strategies to increase the expression of the engineered nuclease.
We chose to use nucleofection to introduce the expression plasmids for TALEN pairs and CRISPR/Cas9 into hESCs. The study by Zou et al. showed that ZFNs could be efficiently expressed with nucleofection and achieved significant gene correction through HDR in hESCs (27). We found that, under this condition, the donor template delivered via IDLV transduction led to significantly improved HDR frequencies in hESC line WA09. More than 80% of the WA09 clones screened were derived from homologous recombination of the shared homology regions between the donor template and the targeted genomic locus. Importantly, the high HDR frequency with IDLV occurred even though the two genetic loci we tested, the genes encoding hematopoietic-specific Wiskott-Aldrich Syndrome (WAS) protein and motor neuron-specific transcription factor HB9, remained silent in undifferentiated WA09 cells. We also found that highly efficient HDR mediated by IDLV was partly due to the recruitment of the host factor, lens epithelium–derived growth factor p75 splice variant (LEDGF/p75) (28), by the IN protein in IDLV. LEDGF/p75 promotes the repair of cellular DSBs through the endogenous HDR pathway (29). Active recruitment of host HDR proteins by the IDLV IN protein in the pre-integration complex could therefore account for the unique advantage of using IDLV to deliver the donor template for HDR in hESCs. This strategy overcomes a major hurdle in gene editing technology and shows that IDLV-mediated HDR is a powerful and broadly applicable technology to carry out site-specific gene editing in hESCs.
HEK293T cells (CRL 3216, ATCC, Manassas, VA, USA) and HEK293 cells (CRL1573, ATCC) were cultured in Dulbecco's Modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (Hyclone, Logan, UT) and 2 mM GlutaMAX (Life Technologies, Gaithersburg, MD) at 37°C with 5% CO2 incubation. Human embryonic stem cell line WA09 (WiCell, Madison, WI, USA) and derived HDR clones were maintained in ESC medium consisting of DMEM/F12 (1:1) with 20% Knockout Serum Replacement, 2 mM GlutaMAX, 0.1 mM MEM NEAA, 100 U/ml penicillin, 100 mg/ml streptomycin and 0.1 mM 2-mercaptoethanol (Sigma, St Louis, MO, USA), supplemented with 4 ng/ml basic fibroblast growth factor (bFGF). Cells were cultured on irradiated mouse embryonic fibroblasts (MEF) CF-1 (GlobalStem, Rockville, MD, USA) for regular culture and DR4 MEF for G418 or puromycin selection. WA09 cells were transferred to Matrigel (BD Biosciences, San Jose, CA, USA)-coated dishes and cultured with mTeSR1 (Stemcell Technologies, Vancouver, BC, Canada) for at least one passage prior to nucleofection followed by daily culture medium change.
Nucleotide Targeter 2.0 (https://tale-nt.cac.cornell.edu/) was used to identify the TALEN target site. TALENs were assembled and the expression plasmids constructed as described (14,30). The assembly kit was purchased from Addgene (Cambridge, MA) (31). To construct pU6-CRISPR, the fragment containing the U6 promoter and tracrDNA was polymerase chain reaction (PCR) amplified from pX330 (32) (Addgene) and cloned into pBluescript SK (-). To construct the CRISPR expression plasmid, complementary oligonucleotides encoding each sgRNA were annealed and cloned into the BbsI sites in pU6-CRISPR. The sgRNAs were designed according to the recommendations on the Zhang laboratory website (http://crispr.genome-engineering.org). To construct the Cas9 expression plasmid, the CBh-hCas9 fragment was PCR amplified from pX330 and cloned into pBluescript SK (-).
HEK293 cells were grown to 40% confluence in 48-well culture dishes and transfected with 0.14 μg of each TALEN expression plasmid or 0.08 μg pU6-sgRNA together with 0.2 μg pCBh-hCas9 using Lipofectamine 2000 (Invitrogen, San Diego, CA, USA). Forty-eight hours after transfection, the genomic DNA was extracted with Epicentre QuickExtract solution (Epicentre Biotechnologies, Madison, WI, USA). Approximately 4000 genome equivalents of the transfected cells were used in a 25 μl PCR reaction to amplify the region of interest. Primers used for monitoring HDR were WAS-F (AAAGGAAGTTGGGCAGAGGTGAGT), WAS-R (CCATCGATTGTGTGTTGGATGGTCATGGAGGT) and HB9-F (AACGCAGCAAAAAGGCCAAA), HB9-R (ACGCTCGTGACATAATCCCC). PCR amplification was carried out with Hotstar Taq (Qiagen, Valencia, CA, USA) using the following cycling conditions: 95°C for 15 min for initial denaturation; 35 cycles at 94°C for 30 s, 60°C for 30 s, 72°C for 30 s and a final extension at 72°C for 5 min. The activity of the engineered nuclease was measured using the Surveyor nuclease following the manufacturer's instruction (Transgenomic, Omaha, NE, USA). ImageJ was used to measure the indel efficiency by quantifying the intensity of the bands separated by gel electrophoresis. The following formulas were used to calculate the percentage of indel formation (33): $\%\ \text{indel formation} = 100 \times \left[1 - (1 - \text{fraction cleaved})^{1/2}\right]$, where $\text{fraction cleaved} = \text{sum of the cleavage product peaks} / (\text{cleavage products} + \text{parent peak})$.
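The indel estimate described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis script: the function name `percent_indel` and its arguments are hypothetical, and it assumes that the fraction cleaved is expressed as a value between 0 and 1 and that the band intensities have already been quantified (for example in ImageJ).

```python
import math


def percent_indel(cleaved_bands, parent_band):
    """Estimate % indel formation from Surveyor-assay band intensities.

    cleaved_bands: intensities of the cleavage product bands (hypothetical input)
    parent_band:   intensity of the uncleaved parent band (hypothetical input)
    """
    cleaved = sum(cleaved_bands)
    total = cleaved + parent_band
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    # Assumption: fraction cleaved is a proportion in the range [0, 1].
    fraction_cleaved = cleaved / total
    # % indel = 100 x [1 - (1 - fraction cleaved)^(1/2)]
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


if __name__ == "__main__":
    # Illustrative intensities only, not data from this study.
    print(f"{percent_indel([25.0, 25.0], 50.0):.1f}% indel formation")
```

The square root reflects the fact that the assay detects heteroduplexes, so the cleaved fraction is not a linear readout of the mutant allele frequency.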
To construct the donor template-containing IDLV for the WAS gene, the WAS genomic sequence spanning the nuclease cleavage site was PCR amplified using two primer sets: WASTL-F (CCGGAATTCGTGTGGAGAGGAGATGGGAAAGTT) and WASTL-R (CCGGAATTCCATCCATCCAGAGACACAGGGAAG) for the upstream homology arm; WASTR-F (CCATCGATGGTAAGAGTGGATGGAGGAATGAG) and WASTR-R (CCATCGATTGTGTGTTGGATGGTCATGGAGGT) for the downstream homology arm. The neoR gene controlled by the phosphoglycerate kinase (pgk) promoter was inserted between the two homology arms to generate pWAS-T. To generate the pHIV7/WAS for making IDLV, the 3 kb fragment containing the neoR gene flanked by the two homology arms derived from the WAS locus was isolated from pWAS-T by EcoRI and XhoI digestion, blunted with Klenow (New England Biolabs, Ipswich, MA, USA), and cloned into the unique BamHI site in pHIV7. To construct the donor template-containing IDLV for the HB9 gene, the HB9 genomic sequence spanning the nuclease cleavage site was PCR amplified using two primer sets: HB9TL-F (TGCTCTAGAGTCTCACCCTGAGGCCATTC) and HB9TL-R (CCATCGATCTCGAGCTGGGGCGCGGGCTGGTGGC) for the upstream homology arm, HB9TR-F (CCATCGATGAGCCCCGCGGCCCAGCAGGTGCGGC) and HB9TR-R (ACGCGTCGACTCTAGAGTTTGAACGCTCGTGACATAATC) for the downstream homology arm. The purR gene encoding the puromycin N-acetyl-transferase controlled by the pgk promoter was inserted between the two homology arms to generate pHB9-T. To generate the pHIV7/HB9-T for making IDLV, the 2.48 kb fragment containing the purR gene flanked by the two homology arms derived from the HB9 locus was isolated from pHB9-T by XbaI digestion, blunted with Klenow, and cloned into the unique BamHI site in pHIV7.
To generate infectious IDLV, the lentiviral construct was transfected into HEK293T cells with pC-Help or p8.91_Q168A and pCMV-G by calcium phosphate co-precipitation as described previously (34). The harvested vector was concentrated by precipitation with polyethylene glycol (PEG). To determine the relative IDLV titer, HEK293 cells were transduced with serially diluted viral stocks and selected in medium containing 800 μg/ml G418 or 1 μg/ml puromycin. Colony-forming units (CFUs) were scored after two weeks in selection. To carry out HDR, 2 μg of each TALEN plasmid or 1 μg pU6-sgRNA together with 3 μg pCBh-hCas9, with or without 1 μg IDLV construct, were co-transfected into 6 × 10⁵ HEK293 cells using Lipofectamine 2000 in 6-well dishes. This was followed by IDLV transduction 16 h later at a dose of 2000 CFUs. The treated cells were serially diluted, plated and selected in G418- or puromycin-containing medium 48 h after transduction. The genomic DNA from individual neoR or purR colonies was isolated after two weeks using Quick Extract solution and subjected to PCR screening. The PCR primers used for measuring the HDR frequency for the WAS gene were WF1 (TGCTCCCTGCCCAGCTAACAAA) and WR1 (CCATCTTGTTCAATGGCCGATCCC) for the genomic region upstream from the nuclease cleavage site, and WF2 (AGCGCATCGCCTTCTATCGCC) and WR2 (TCTGCCCATCCTTCCATTCACTCA) for the genomic region downstream from the nuclease cleavage site. WF1 and WR2 were located outside of the homology arms to avoid the amplification of the donor template-containing IDLV genome. The amplification was carried out with PCR Master Mix (Promega, Madison, WI, USA), using the following cycling conditions: 95°C for 5 min for initial denaturation; 34 cycles of 94°C for 30 s, 58°C for 30 s and 72°C for 90 s; and a final extension at 72°C for 7 min. The primer set for H-T concatemer detection was R-LTR (CGCACCCATCTCTCTCCTTCTA) and LTR-WPRE (TCCTTCTGCTACGTCCCTTC).
The primer set for the DNA loading control in Figures 1C and 2C was Loading Ctrl-F (AGGGTTTCACTATGAAGGGAGGGA) and Loading Ctrl-R (AGTGGACCAGAACGACCCTTGTTA).
WA09 cells were pre-treated with 10 μM Rock Inhibitor for at least 2 h and dissociated into single cell suspension with 1 mg/ml Accutase (Invitrogen). Approximately 1.5 × 10⁶ cells were mixed with 2.5 μg of each TALEN plasmid or 1.5 μg pU6-sgRNA and 3.5 μg pCBh-hCas9, nucleofected using program B-016 with Nucleofector (Lonza, Basel, Switzerland) and cultured in Matrigel-coated 24-well dishes in mTeSR1 supplemented with 10 μM Rock Inhibitor. This was followed by IDLV transduction at 2000 CFUs 16 h later. Forty-eight hours after transduction, cells were dissociated by Accutase and seeded at a concentration of 1.5 × 10⁴ cells/well into 6-well dishes with irradiated DR4 feeder cells. G418 at a concentration of 600 μg/ml or puromycin at a concentration of 0.5 μg/ml was used for selection 3 days after cell re-plating. Two weeks after drug selection, half of each colony was manually picked and transferred into Quick Extract DNA solution for genomic DNA extraction. PCR screening was carried out as mentioned above. The primers for identifying homozygous HDR clones in Figure 2A were homo Ctrl-F (AAGTCCCCTCTCATGGTCCT) and homo Ctrl-R (GCTCGTCCATCCACATACCT). This primer set flanked the nuclease cleavage site in the WAS locus (Figure 1A) and would generate a 359-bp PCR fragment if homozygous HDR did not occur. The PCR primer sets used for measuring the HDR frequency in the HB9 locus were 9F1 (GAGGAGGGAACATCCTGAGA) and 9R1 (GGGGAACTTCCTGACTAGGG) for the genomic region upstream from the nuclease cleavage site, and 9F2 (GCAACCTCCCCTTCTACGAG) and 9R2 (CCCTCCCCATCATTTCTACA) for the genomic region downstream from the nuclease cleavage site.
A short hairpin RNA (shRNA), L1, targeting LEDGF/p75 was designed as described (29). Two single-stranded oligodeoxyribonucleotides with the sequences of GATCCCCGCAATGAGGATGTGACTAATTCAAGAGATTAGTCACATCCTCATTGCTTTTTGGAAA and AGCTTTTCCAAAAAGCAATGAGGATGTGACTAATCTCTTGAATTAGTCACATCCTCATTGCGGG were synthesized, annealed and cloned into an expression vector containing the H1 promoter. A similar expression plasmid containing the shRNA specific for the firefly luciferase (luc) gene was used as the negative control. To test the knockdown efficiency of L1, 1 or 4 μg of the shRNA plasmid and 1 μg of pmaxGFP plasmid as the transfection efficiency control were co-transfected into HEK293 cells using Lipofectamine 2000 in 6-well dishes. Seventy-two hours after transfection, the transfected cells were harvested and lysed for protein extraction. Approximately 20 μg of the total protein was separated by gel electrophoresis and subjected to western blot analysis using the rabbit anti-LEDGF/p75 antibody (1:1000, Bethyl Laboratories, Montgomery, TX, USA) and the rabbit polyclonal anti-GAPDH antibody (Abcam, Cambridge, MA, USA). Signals were detected with the Odyssey Imaging System (LI-COR, Lincoln, NE, USA) for LEDGF/p75 and X-ray film for GAPDH. Plasmid pCHMWS/GFP-IBD-IRES-Puro used for the IBD competition study and the negative control plasmid pCHMWS/GFP-IRES-Puro were kindly provided by Dr Debyser (KULeuven and IRC KULAK, Belgium). For the LEDGF/p75 knockdown study, 0.5 μg WASCR-2, 1.5 μg pCBh-hCas9 and 2 μg L1 or the luc shRNA control were co-transfected into 6 × 10⁵ HEK293 cells using Lipofectamine 2000 in 6-well dishes. For the IBD competition study, pCHMWS/GFP-IBD-IRES-Puro and pCHMWS/GFP-IRES-Puro were used instead of L1 and the luc shRNA. This was followed by IDLV transduction at a dose of 2000 CFUs 16 h after transfection. Treated cells were serially diluted, plated and selected in G418.
After 2 weeks in selection, neoR colonies were stained with methylene blue (2% in methanol) and counted by the ImageJ system.
Statistical analysis was performed with the GraphPad Prism 6.0 software. Error bars indicate mean ± s.d. Comparisons between groups were carried out using a two-sided t-test. Differences were considered significant when P < 0.05.
We first determined whether using IDLV to deliver the donor template for HDR had any advantage over conventional DNA transfection in human cells. Several TALEN pairs were designed to target a genomic region in intron 6 of the Wiskott-Aldrich Syndrome (WAS) gene (Figure 1A). Mutations in this gene cause profound immunodeficiency in patients (35). The TALEN pairs were targeted to the 5΄ end of intron 6 as mutations in this region were frequently found to affect normal splicing of the WAS transcript (36). Three upstream and six downstream TALENs were designed to cleave the targeted region (Figure 1A). Mixing and matching the upstream and downstream TALENs in HEK293 cells showed that the combination of W3L and W4R generated the highest efficiency of insertion/deletion (indel) formation (Figure 1B). To construct the donor template for HDR, a cassette containing the WAS sequences of the targeted region and the gene encoding neomycin phosphotransferase (neoR) was inserted into a lentiviral vector for IDLV production (Figure 1A). For TALEN expression, we used Lipofectamine to transfect the W3L/W4R expression plasmids into HEK293 cells. Transfected cells were transduced with the IDLVs 16 h later and selected for neoR colonies. In the absence of TALEN, few neoR colonies appeared (15 ± 2, n = 3). As the IN mutation (D116N) in IDLV completely abolished its enzymatic activity (37), emergence of these colonies could be attributed to the preferential incorporation of the linear IDLV genome at naturally occurring DSBs in HEK293 cells. In the presence of W3L/W4R, the number of neoR colonies increased significantly (193 ± 64, n = 3). This increase could be attributed to one of two possibilities. First, DSBs generated by the TALENs at the genomic target stimulated the incorporation of the linear IDLV genome via its two free ends. Second, DSBs stimulated HDR between the genomic locus and the donor template in the IDLV genome.
To differentiate these two possibilities, we used genomic PCR to screen the neoR colonies for HDR. Out of 70 colonies screened, 52 (75%) scored positive for HDR at both homology arms whereas 8 (11%) scored positive at only one arm (Figure 1C and Table 1). Thus, up to 86% of the neoR colonies were derived from HDR with a majority of them containing the expected configuration. In contrast, co-transfection of HEK293 cells with the W3L/W4R expression plasmids and the IDLV construct followed by PCR screening of the neoR colonies exhibited an HDR frequency of lower than 10% (Figure 1C and Table 1). Thus, a minimum of 8-fold enrichment in the clones derived from homologous recombination was achieved when the donor template was delivered through IDLV transduction instead of DNA transfection. Since HEK293 cells can be transfected with high efficiency and multiple copies of the donor template plasmid are expected to be introduced via transfection, it is unlikely that the difference in the HDR frequency between IDLV transduction and plasmid transfection can simply be attributed to the availability of the donor template. One possibility is that the donor template in the context of an IDLV genome has a favored configuration to serve as the substrate for HDR. Another is simply that the IDLV genome in the pre-integration complex of lentiviral vectors is more stable than transfected naked plasmid DNA in the cell.
To ensure that the elevated HDR frequency observed with IDLV transduction is not limited to the engineered nuclease used, we designed two CRISPRs (WASCR-1 and WASCR-2) targeting the same genomic region as the TALEN pairs (Figure 1A). A comparison of the indel frequency in HEK293 cells showed a similar efficiency between the CRISPRs and W3L/W4R (Figure 1D). HEK293 cells were transfected with the WASCR-2/Cas9 plasmids and transduced with the same donor template-containing IDLV shown in Figure 1A. PCR analysis of neoR colonies indicated that out of 105 colonies screened, 70 (67%) of them scored positive for HDR. Among them, 59 (56%) scored positive at both homology arms (Table 1). Co-transfection of HEK293 cells with the WASCR-2/Cas9 and the IDLV plasmids led to the emergence of the neoR colonies, and out of 48 colonies screened, 10 (21%) were derived from HDR at both arms (Table 1). To determine whether an increase in IDLV input can boost the HDR frequency further, we transduced HEK293 cells with 4-fold more IDLV. PCR analysis of neoR colonies showed that 46 (77%) out of 60 neoR colonies screened were positive for HDR at both arms (Table 1). An increase in the IDLV input therefore boosted the likelihood of gaining HDR-derived clones with the expected configuration at both homology arms (Figure 1E). These results strongly suggest that the donor template delivered through IDLV transduction has a significant advantage over DNA transfection to mediate HDR in HEK293 cells. This is irrespective of the engineered nuclease used to generate DSBs as long as the engineered nuclease is expressed efficiently in the target cell.
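The percentages reported for the PCR screens follow directly from the colony counts. The sketch below shows the arithmetic under simple assumptions: the helper `hdr_summary` and its field names are hypothetical, and percentages are rounded to the nearest whole number. The example uses the WASCR-2/Cas9 counts in HEK293 cells given above (105 colonies screened, 59 positive at both arms, 70 positive overall, hence 11 positive at one arm only).

```python
def hdr_summary(screened, both_arms, one_arm=0):
    """Summarize a PCR screen of drug-resistant colonies.

    screened:  number of colonies screened
    both_arms: colonies positive for HDR at both homology arms
    one_arm:   colonies positive for HDR at only one homology arm
    (hypothetical helper; percentages rounded to whole numbers)
    """
    if screened <= 0:
        raise ValueError("at least one colony must be screened")
    if both_arms < 0 or one_arm < 0 or both_arms + one_arm > screened:
        raise ValueError("positive colonies cannot exceed colonies screened")
    return {
        "both_arms_pct": round(100 * both_arms / screened),
        "any_arm_pct": round(100 * (both_arms + one_arm) / screened),
    }


if __name__ == "__main__":
    # WASCR-2/Cas9 with IDLV transduction in HEK293 cells (counts from the text).
    print(hdr_summary(screened=105, both_arms=59, one_arm=11))
```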
To determine whether IDLV can mediate highly efficient HDR in hESCs, WA09 cells were nucleofected with the expression plasmids for W3L/W4R followed by IDLV transduction. NeoR colonies were screened for HDR by PCR. Out of 81 neoR colonies screened, 50 (62%) scored positive for HDR. Among them, 41 (51%) were derived from HDR at both homology arms including one colony with both WAS alleles altered by HDR (Table 1). Since previous studies showed that the genome of IDLV was prone to the formation of concatemers through illegitimate recombination in transduced cells (20,21,24), we measured the fraction of HDR colonies containing the head-to-tail (H-T) IDLV concatemers at the WAS locus. We used a PCR primer pair specific for the IDLV genome to identify those clones containing the H-T structure (Figure 2A). Among 30 randomly picked colonies with HDR at either one or both arms, 15 exhibited the H-T concatemer configuration at the targeted WAS locus (Figure 2A, upper panel). The presence of additional sequences derived from the IDLV vector would interfere with DNA repair or transgene expression from the targeted genomic locus. Excluding those colonies with the concatemeric structure reduces the overall frequency of HDR-derived colonies with the expected genomic modification, but the frequency still remains well above the HDR frequency achieved by plasmid transfection or viral vector transduction shown in other studies. A similar study was carried out using WASCR-2/Cas9 in WA09 cells. Out of 97 neoR colonies screened, 89 (92%) scored positive for HDR and 79 (81%) of those exhibited HDR at both homology arms (Figure 2A and Table 1). It is interesting that the HDR frequency derived from WASCR-2/Cas9 was significantly higher than that from W3L/W4R in WA09 cells (92 versus 62%) whereas W3L/W4R generated HDR-derived colonies more efficiently than WASCR-2/Cas9 in HEK293 cells (86 versus 67%) (Table 1).
This result is consistent with two previous studies demonstrating more efficient indel formation at the genomic target with CRISPR/Cas9 than that with TALEN in human ESCs (15,38). Examination of 30 randomly picked HDR-derived colonies with WASCR-2/Cas9 showed the presence of the H-T structure in 15 of them (Figure 2A, lower panel). Concatemer formation is therefore unavoidable when IDLV is used to deliver the donor template.
To ensure that the high HDR frequency mediated by IDLV is not limited to the WAS locus, we designed a CRISPR, HB9CR-3, targeting a site in exon 3 of the motor neuron-specific HB9 gene (39). A donor template containing the puromycin-resistant (purR) gene flanked by the genomic sequence spanning the HB9CR-3 cleavage site was inserted into a lentiviral vector for IDLV production (Figure 2B). WA09 cells were nucleofected with the HB9CR-3/Cas9 expression plasmids followed by IDLV transduction. In the first round of screening, all nine purR colonies were positive for HDR at both homology arms with two of them exhibiting the H-T concatemer (Figure 2C). Repeated studies with increased IDLV input showed an HDR frequency near 50% or higher at the HB9 locus (Table 2). However, the frequency of colonies containing the H-T concatemer also increased significantly. These data suggest that the high frequency of HDR mediated by IDLV is not limited to a specific genomic locus, but that high-level IDLV input may lead to concatemerization of the IDLV genome, which is not suitable for site-specific gene modification. To apply the IDLV strategy in gene editing, the donor template-containing IDLV input needs to be titrated carefully to optimize the HDR frequency at the targeted genomic locus.
One possibility to explain the significant advantage of using IDLV transduction to mediate HDR over plasmid transfection is that the pre-integration complex of IDLV actively recruits cellular protein(s) involved in the normal process of HDR, leading to the observed increase in clones derived from homologous recombination. We tested this hypothesis by focusing on the HIV-1 IN protein. The cellular factor LEDGF/p75 is known to associate with HIV-1 IN and is involved in HIV pathogenesis (28). LEDGF/p75 binds to chromatin via its PWWP domain at the N terminus and to HIV-1 IN through its integrase binding domain (IBD) at the C terminus (Figure 3A) (40). It was recently shown that LEDGF/p75 promoted the repair of DSBs through the normal homologous recombination pathway (29). LEDGF/p75 carries out this activity by recruiting C-terminal binding protein interacting protein (CtIP) in a DNA damage-dependent manner. CtIP acts together with the MRE11-RAD50-NBS1 complex to promote DNA end resection and generate single-stranded DNA crucial for HDR (41). Thus, it is likely that IDLV, through the interaction of IN with LEDGF/p75, recruits cellular proteins such as CtIP to facilitate HDR (Figure 3A). To test this hypothesis, we first determined whether LEDGF/p75 was involved in IDLV-mediated HDR. As shown in Figure 3B, downregulation of LEDGF/p75 in HEK293 cells with a small hairpin RNA (shRNA) resulted in a significant reduction in the number of neoR colonies, and genomic PCR of the pooled neoR colonies confirmed that LEDGF/p75 reduction indeed lowered the IDLV-mediated HDR frequency (Figure 3B, lower panel). Over-expression of the LEDGF/p75 IBD, which competes with LEDGF/p75 for IN binding (42), also led to a similar reduction in the neoR colony number and the HDR frequency confirmed by genomic PCR (Figure 3C). The data are consistent with the hypothesis that LEDGF/p75 plays an important role in IDLV-mediated HDR.
These studies, however, cannot exclude the possibility that LEDGF/p75 is directly involved in the cellular HDR pathway and that a reduction in its expression by the shRNA or disruption of its activity by IBD leads to a general reduction in the HDR frequency. To directly address the role of the IN protein in IDLV-mediated HDR, we generated IDLV with a packaging plasmid containing the IN mutant Q168A (43). This mutant loses the ability to bind LEDGF/p75 but continues to support viral DNA replication and migration of the pre-integration complex into the nucleus (43,44). This is in contrast to the D116N IN mutant we used to produce the IDLV for HDR in all the studies described above. The D116N mutant contains a mutation in the IN catalytic core domain (37). This mutation abolishes the enzymatic activity of IN but supports viral DNA replication, the migration of the pre-integration complex into the nucleus and, most importantly, the interaction with LEDGF/p75 (44,45). Side-by-side comparison in HEK293 cells transduced with IDLVs containing different IN mutations showed a significant reduction in the HDR frequency at the WAS locus with IDLV(Q168A) relative to that with IDLV(D116N) (Figure 3D). These data suggested that the direct interaction between the IN protein and LEDGF/p75 was important for IDLV-mediated HDR. Based on these studies, we concluded that the interaction between IN in the IDLV pre-integration complex and LEDGF/p75 might facilitate the recruitment of other cellular proteins, such as CtIP, to repair DSBs generated by the engineered nuclease. The ability of the lentiviral IN protein to recruit LEDGF/p75 could therefore explain the unique advantage of IDLV over Ad and AAV vectors to deliver the donor template for HDR in human cells including hESCs.
In the current study, we showed highly efficient HDR mediated by IDLV in human HEK293 cells and hESC WA09 cells. The HDR frequency, including recombination at one or both homology arms, reached more than 80% among the antibiotic-selected colonies in both cell lines. Our previous study and a study by Yang et al. showed that the HDR frequency in hPSCs was not limited by the cleavage activity of an engineered nuclease, as the nuclease generated NHEJ at higher frequencies than HDR at the same genomic locus (14,15). Rather, the frequency of HDR seems to depend on the availability of the donor template at the DSB. The donor template present in the context of the IDLV genome should be well protected by the viral and cellular proteins in the pre-integration complex from degradation by various cellular nucleases. In contrast, a transfected donor plasmid is more likely to be constantly exposed to cellular nucleases, which can degrade the donor template. This hypothesis is consistent with our observation that the same IDLV construct used for making infectious vector, when introduced as a plasmid into target cells, mediated HDR much less efficiently than the infectious IDLV (Figure 1C). Another potential reason for the observed high HDR frequency with IDLV is the presence of the HIV IN protein in the pre-integration complex. Under the normal process of lentiviral infection, the viral pre-integration complex is tethered to the host chromatin through the interaction between the IN protein and LEDGF/p75 to facilitate host genome integration (28). Although the enzymatically inactive D116N IN mutant we used to prepare the IDLV for HDR loses its ability to catalyze viral DNA integration, mutation at this residue in the IN protein continues to support the tethering of the viral DNA to host chromatin (44). In contrast, the Q168A mutation completely abolishes the ability of the IN protein to bind host chromatin (43,44).
The result in Figure 3 demonstrates the importance of chromatin tethering of the viral pre-integration complex through the interaction between IN and LEDGF/p75 for IDLV-mediated HDR. Another possible explanation for the observed high HDR frequencies is the active recruitment of the cellular CtIP protein, an important component for homologous recombination, by the chromatin-associated pre-integration complex of IDLV to stimulate HDR. It was shown that LEDGF/p75 tethered CtIP to the sites of DNA damage via a direct interaction between the two proteins (29). CtIP is involved in DNA end resection and generation of single-strand DNA (ssDNA) at DSBs (46). The ssDNA at DSBs serves as a key intermediate for the assembly of the signaling machinery for homologous recombination–mediated DSB repair. Importantly, these possibilities are not mutually exclusive: the stability of the IDLV pre-integration complex, the chromatin tethering of the pre-integration complex and the active recruitment of HDR proteins may all contribute to the observed high HDR frequency mediated by IDLV. Our observation that disruption of the interaction between the IN protein and LEDGF/p75 reduced the HDR frequency by only half is consistent with this hypothesis (Figure 3D). One caveat is whether LEDGF/p75 can bind both IN and CtIP simultaneously. Daugaard et al. reported that a LEDGF/p75 mutation, D366N, abolished IN binding but continued to support CtIP binding (29), suggesting that these two proteins did not compete for LEDGF/p75 binding.
High HDR frequency relies on efficient expression of the engineered nuclease. In contrast to previous studies that used IDLV to express the engineered nuclease (24,25), our study used plasmid nucleofection to express the engineered nuclease in hESCs. This procedural change is expected to elevate the expression of the engineered nuclease, most likely due to the introduction of multiple copies of the nuclease expression plasmids. A previous study comparing the effect on HDR of I-SceI expressed from either an integrating lentiviral vector or an IDLV showed a much higher HDR frequency in several different human cell lines with the integrating lentiviral vector (47). Together, these studies confirm that IDLV is not suitable for high-level nuclease expression (24,25). Other studies used a combination of Ad vectors to express the engineered nuclease and IDLV to provide the donor template for HDR in human cells. Holkers et al. used this combination to show efficient HDR (~87% with selection) at the AAVS1 locus in human myoblasts (20). Coluccio et al. used a similar strategy and showed an HDR frequency of 10–20% at the AAVS1 locus in two human keratinocyte cell lines but only 0.2–0.3% in primary human keratinocytes (21). Thus, the HDR frequency varies significantly among cell types even with the Ad vector system, which is capable of robust nuclease expression. Although the Ad vector is able to mediate efficient nuclease expression in different cell types, our study shows that nucleofection of the nuclease expression plasmid is sufficient to mediate efficient HDR in hESCs. This approach avoids the time-consuming steps of constructing the Ad nuclease expression vector and producing infectious vector. Our data also show that a significant fraction of the HDR colonies derived from the IDLV strategy described here contain the concatemeric IDLV structure at the targeted genomic site. Several studies using IDLV for HDR have detected a similar structure (20,21,24).
The concatemer is most likely derived from illegitimate recombination among linear IDLV genomes in the transduced cells. This hypothesis is consistent with the observation that an increase in the IDLV input resulted in higher frequencies of WA09 colonies containing the concatemer (Table 2). Although the formation of the concatemer does not seem to interfere with HDR, these tandem repeats are not expected to restore a reading frame in gene repair, nor will they yield the expected transgene expression pattern in the context of gene insertion. To minimize the production of hESC clones with such a structure, the level of the input IDLV needs to be carefully titrated to maximize the generation of HDR clones with the desired gene modification.
Multiple studies carried out in hPSCs have indicated that the HDR frequency at silent genetic loci was generally lower than that at active loci. Zhou et al. used ZFNs to correct the missense mutation in the β-globin gene and obtained a single gene-corrected clone out of 300 clones screened in a sickle cell anemia patient-derived iPSC line (8). Sebastiano et al. carried out a similar study in a different sickle cell anemia iPSC line and achieved ~10% HDR frequency at the β-globin locus (48). Soldner et al. used ZFN-mediated HDR either to introduce mutations into the neuron-specific gene encoding α-synuclein in several hESC lines or to correct the gene mutation in a patient-derived iPSC line, and achieved HDR frequencies ranging from 0 to 2% (49). In contrast to these studies that targeted silent genes, Zhou et al. used ZFNs to target the constitutively active gene encoding phosphatidylinositol glycan class A (PIG-A) and showed much higher HDR frequencies (up to 50%) in both ESC and iPSC lines (27). Hockemeyer et al. used ZFNs to target two active loci in hPSCs, Oct4 and AAVS1, and showed HDR frequencies ranging from 39 to >90% in an hESC line. In contrast, targeting a silent gene encoding the transcription factor PITX3 resulted in an HDR frequency of only 11% in the same hESC line (7). Thus, these studies showed a trend of lower HDR frequency at silent loci in hPSCs. The two loci we evaluated in the current study, WAS and HB9, are active in hematopoietic lineages and motor neurons, respectively, but remain silent in undifferentiated WA09 cells. Yet with the IDLV strategy, we obtained an HDR frequency of >60% at both genetic loci. Our study was carried out with a donor template containing only a ~500 bp homology arm on each side of the neoR marker. This homology arm is the shortest among all the reported studies. The HDR frequency has been reported to be proportional to the size of the homology arm (6).
The short homology arm length in the context of the IDLV genome is therefore not rate-limiting for efficient HDR in either HEK293 or WA09 cells. Together, our studies demonstrate the advantages of using IDLV to mediate HDR for site-specific gene modification in hPSCs.
In summary, we show that IDLV-mediated delivery of the donor template represents a highly efficient strategy to mediate gene editing in hESCs. HDR clones with the desired gene modification could be established at much higher frequencies than with other strategies reported in the literature. We found that the IN protein in the pre-integration complex of IDLV, through its interaction with LEDGF/p75, contributes significantly to the high HDR frequency in human cells. With the demonstrated HDR frequency, the burden of carrying out large-scale gene editing in hESCs can be overcome by this strategy. Although a significant fraction of the HDR-derived hESC clones contain the concatemeric structure, the clones with the desired gene modification can easily be identified by PCR-based assays. As this strategy generates a high frequency of HDR-derived clones, isolating the correct gene-edited clones would be less time-consuming and labor-intensive. A future direction is to identify strategies to reduce the number of hESC clones with the concatemeric structure.
We thank Drs Debyser and Serguera for providing the IBD and the Q168A packaging plasmids used in the study. Research reported in this publication included work performed in the Analytical Cytometry Core and Integrative Genomics Core of City of Hope National Medical Center.
National Key Basic Research Program of China [2015CB964900, in part]; Projects of International Cooperation and Exchanges; Zhejiang Provincial Natural Science Foundation of China [LQ14H080001]; Major Program of National Natural Science Foundation of China; National Cancer Institute of the National Institutes of Health [P30CA033572]. Funding for open access charge: National Key Basic Research Program of China [2015CB964900].
Conflict of interest statement. None declared.