CRISPR RNA-guided endonucleases (RGENs) have rapidly emerged as a facile and efficient platform for genome editing. Here, we use a human cell-based reporter assay to characterize off-target cleavage of Cas9-based RGENs. We find that single and double mismatches are tolerated to varying degrees depending on their position along the guide RNA (gRNA)-DNA interface. We readily detected off-target alterations induced by four out of six RGENs targeted to endogenous loci in human cells by examination of partially mismatched sites. The off-target sites we identified harbor up to five mismatches and many are mutagenized with frequencies comparable to (or higher than) those observed at the intended on-target site. Our work demonstrates that RGENs are highly active even with imperfectly matched RNA-DNA interfaces in human cells, a finding that might confound their use in research and therapeutic applications.
Recent work has demonstrated that clustered, regularly interspaced, short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems1–3 can serve as the basis of a simple and highly efficient method for performing genome editing in bacteria, yeast and human cells, as well as in vivo in whole organisms such as fruit flies, zebrafish and mice4–13. The Cas9 nuclease from S. pyogenes (hereafter simply Cas9) can be guided via simple base pair complementarity between the first 20 nucleotides of an engineered gRNA and a target genomic DNA sequence of interest that lies next to a protospacer adjacent motif (PAM) matching the sequence NGG5–12, 14 (Supplementary Fig. 1). Previous studies performed in vitro14, in bacteria7 and in human cells10 have shown that Cas9-mediated cleavage can be abolished by single mismatches at the gRNA/target site interface, particularly in the last 10–12 nucleotides located in the 3’ end of the 20 nt gRNA targeting region. Although Marraffini and colleagues7 recently performed a systematic investigation of Cas9 RGEN specificity in bacteria, the specificities of RGENs in human cells have not been extensively defined and, to our knowledge, bona fide off-target mutations induced by Cas9 have not been identified in any eukaryotic cell or organism. Understanding the scope of RGEN-mediated off-target effects in human and other eukaryotic cells will be essential if these nucleases are to be used widely for research and therapeutic applications.
To begin to define the specificity determinants of RGENs in human cells, we sought to perform a large-scale test in which we assessed the effects of systematically mismatching various positions within multiple gRNA/target DNA interfaces. To do this, we used a quantitative human cell-based enhanced green fluorescent protein (EGFP) disruption assay previously described by our lab15 that enables rapid quantitation of targeted nuclease activities (Fig. 1a). In this assay, the activities of nucleases targeted to a single integrated EGFP reporter gene can be quantified by assessing loss of fluorescence signal in human U2OS.EGFP cells caused by inactivating frameshift insertion/deletion (indel) mutations introduced by error prone non-homologous end-joining (NHEJ) repair of nuclease-induced double-stranded breaks (DSBs) (Fig. 1a and Methods). For the studies described here, we used three ~100 nt single gRNAs (sgRNAs) targeted to different sequences within EGFP (Supplementary Fig. 2); each of these sgRNAs can efficiently direct Cas9-mediated disruption of EGFP expression (Supplementary Results).
In initial experiments, we tested the effects of single nucleotide mismatches at 19 of 20 nucleotides in the complementary targeting region of our three EGFP-targeted sgRNAs. To do this, we generated variant sgRNAs for each of the three target sites harboring Watson-Crick transversion mismatches at positions 1 through 19 (numbered 1 to 20 in the 3’ to 5’ direction; see Supplementary Fig. 1) and tested the abilities of these various sgRNAs to direct Cas9-mediated EGFP disruption in human cells. (We did not generate variant sgRNAs bearing a substitution at position 20 because this nucleotide is part of the U6 promoter sequence and therefore must remain a guanine to avoid affecting expression.) For EGFP target site #2, single mismatches in positions 1 – 10 of the sgRNA have dramatic effects on associated Cas9 activity (Fig. 1b, middle panel), consistent with previous studies that suggest mismatches at the 5’ end of gRNAs are better tolerated than those at the 3’ end7, 10, 14. However, we found with EGFP target sites #1 and #3 that single mismatches at all but a few positions in the sgRNA appear to be well tolerated, even within the 3’ end of the sequence. Furthermore, the specific positions that are sensitive to mismatch differ for these two targets (Fig. 1b, compare top and bottom panels) – for example, target site #1 is particularly sensitive to a mismatch at position 2 whereas target site #3 is most sensitive to mismatches at positions 1 and 8.
To test the effects of more than one mismatch at the sgRNA/DNA interface, we created a series of variant sgRNAs bearing double Watson-Crick transversion mismatches in adjacent (Fig. 1c) and separated (Fig. 1d) positions and tested the abilities of these to direct Cas9 nuclease activity in human cells using our EGFP disruption assay. All three target sites generally showed greater sensitivity to double alterations in which one or both mismatches occur within the 3’ half of the sgRNA targeting region (Figs. 1c and 1d). However, the magnitude of these effects exhibited site-specific variation, with target site #2 showing the greatest sensitivity to these double mismatches and target site #1 generally showing the least (Figs. 1c and 1d). To test the number of adjacent mismatches that can be tolerated, we constructed variant sgRNAs bearing increasing numbers of mismatched positions ranging from positions 19 to 15 in the 5’ end of the sgRNA targeting region (where single and double mismatches appear to be better tolerated) (Fig. 1e). Testing of these increasingly mismatched sgRNAs revealed that for all three target sites, the introduction of three or more adjacent mismatches results in significant loss of RGEN activity (Fig. 1e). Taken together, our results in human cells confirm that the activities of RGENs can be more sensitive to mismatches in the 3’ half of the sgRNA targeting sequence. However, our data also clearly reveal that the specificity of RGENs is complex and target site-dependent, with single and double mismatches often well tolerated even when one or more mismatches occur in the 3’ half of the sgRNA targeting region. Furthermore, our data also suggest that not all mismatches in the 5’ half of the sgRNA/DNA interface are necessarily well tolerated.
We next sought to determine whether we could identify off-target mutations for RGENs targeted to endogenous human genes. To accomplish this, we used six sgRNAs that target three different sites in the VEGFA gene, one in the EMX1 gene, one in the RNF2 gene, and one in the FANCF gene (Table 1 and Supplementary Table 1). These six sgRNAs efficiently directed Cas9-mediated indels at their respective endogenous loci in human U2OS.EGFP cells as detected by T7 Endonuclease I (T7EI) assay (Online Methods and Table 1). For each of these six RGENs, we then examined dozens of potential off-target sites (ranging in number from 46 to as many as 64) for evidence of nuclease-induced NHEJ-mediated indel mutations in U2OS.EGFP cells. The loci we assessed included all genomic sites that differ by one or two nucleotides as well as subsets of genomic sites that differ by three to six nucleotides and with a bias toward those that had one or more of these mismatches in the 5’ half of the sgRNA targeting sequence (Supplementary Table 2). Using the T7EI assay, we readily identified four off-target sites (out of 53 candidate sites examined) for VEGFA site 1, twelve (out of 46 examined) for VEGFA site 2, seven (out of 64 examined) for VEGFA site 3 and one (out of 46 examined) for the EMX1 site (Table 1 and Supplementary Table 2). No off-target mutations were detected among the 43 and 50 potential sites examined for the RNF2 or FANCF genes, respectively (Supplementary Table 2). The rates of mutation at verified off-target sites were very high, ranging from 5.6% to 125% (mean of 40%) of the rate observed at the intended target site (Table 1). These bona fide off-targets included sequences with mismatches in the 3’ end of the target site and with as many as a total of five mismatches, with most off-target sites occurring within protein coding genes (Table 1). 
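To make the bookkeeping in this comparison concrete, the sketch below shows one way to describe a candidate off-target site relative to its on-target sequence: which positions differ (numbered from the 3’ end, as elsewhere in the text), whether the adjacent PAM matches NGG, and how an off-target mutation rate is expressed as a percentage of the on-target rate. The function names, sequences, and rates are hypothetical and are not taken from Table 1 or Supplementary Table 2.

```python
def mismatch_positions(on_target, candidate):
    """Positions (1 = 3'-most base) at which two equal-length sites differ."""
    if len(on_target) != len(candidate):
        raise ValueError("sites must have the same length")
    n = len(on_target)
    pairs = zip(on_target.upper(), candidate.upper())
    return sorted(n - i for i, (a, b) in enumerate(pairs) if a != b)


def has_ngg_pam(pam):
    """True if a 3-nt PAM matches NGG (any base followed by GG)."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1:] == "GG"


def relative_rate(off_target_pct, on_target_pct):
    """Off-target indel frequency as a percentage of the on-target frequency."""
    if on_target_pct <= 0:
        raise ValueError("on-target rate must be positive")
    return 100.0 * off_target_pct / on_target_pct


# Hypothetical on-target and candidate off-target sequences (5'->3').
on_site = "GACCTTGGAACCTTGGAACC"
off_site = "GTCCTTGGAACCTAGGAACG"

positions = mismatch_positions(on_site, off_site)
in_three_prime_half = [p for p in positions if p <= 10]
ratio = relative_rate(off_target_pct=10.0, on_target_pct=8.0)  # made-up rates
```

With these made-up inputs the candidate carries three mismatches, two of them in the 3’ half, and a ratio above 100 corresponds to an off-target site that is mutated more often than the intended site.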
DNA sequencing of a subset of off-target sites provided additional molecular confirmation that indel mutations occur at the expected RGEN cleavage site (Supplementary Fig. 3).
Having established that RGENs can induce off-target mutations with high frequencies in U2OS.EGFP cells, we next sought to determine whether these nucleases would also have these effects in other types of human cells. We had chosen U2OS.EGFP cells for our initial experiments because we previously used these cells to evaluate the activities of TALENs15, but human HEK293 and K562 cells have been more widely used to test the activities of targeted nucleases. Therefore, we also assessed the activities of the four RGENs targeted to VEGFA sites 1, 2, and 3 and the EMX1 site in HEK293 and K562 cells. We found that each of these four RGENs efficiently induced NHEJ-mediated indel mutations at their intended on-target site in these two additional human cell lines (as assessed by T7EI assay) (Table 1), albeit with somewhat lower mutation frequencies than those observed in U2OS.EGFP cells. Assessment of the 24 off-target sites for these four RGENs originally identified in U2OS.EGFP cells revealed that many were again mutated in HEK293 and K562 cells with frequencies similar to those at their corresponding on-target site (Table 1). As expected, DNA sequencing of a subset of these off-target sites from HEK293 cells provided additional molecular evidence that alterations are occurring at the expected genomic loci (Supplementary Fig. 4). We do not know for certain why four of the off-target sites identified in U2OS.EGFP cells showed no detectable mutations in HEK293 cells, or why eleven showed none in K562 cells. However, we note that many of these off-target sites also showed relatively lower mutation frequencies in U2OS.EGFP cells. Therefore, we speculate that mutation rates of these sites in HEK293 and K562 cells may be falling below the reliable detection limit of our T7EI assay (~2–5%) because RGENs generally appear to have lower activities in HEK293 and K562 cells compared with U2OS.EGFP cells in our experiments.
Taken together, our results in HEK293 and K562 cells provide evidence that the high-frequency off-target mutations we observe with RGENs will be a general phenomenon seen in multiple human cell types.
Our results reveal that predicting the specificity profile of any given RGEN is neither simple nor straightforward. Our EGFP reporter assay experiments show that single and double mismatches can have variable effects on RGEN activity in human cells that do not strictly depend upon their position(s) within the target site. For example, consistent with previously published reports, alterations in the 3’ half of the sgRNA/DNA interface generally have greater effects than those in the 5’ half7, 10, 14; however, single and double mutations in the 3’ end sometimes also appear to be well tolerated whereas double mutations in the 5’ end can greatly diminish activities. In addition, the magnitude of these effects for mismatches at any given position(s) appears to be site-dependent. Comprehensive profiling of a large series of RGENs with testing of all possible nucleotide substitutions (beyond the Watson-Crick transversions used in our EGFP reporter experiments) may help provide additional insights into the range of potential off-targets. In this regard, the recently described bacterial cell-based method of Marraffini and colleagues7 or the in vitro, combinatorial library-based cleavage site-selection methodologies previously applied to ZFNs by Liu and colleagues16 might be useful for generating larger sets of RGEN specificity profiles.
Despite these challenges in comprehensively predicting RGEN specificities, we were able to identify bona fide off-targets of RGENs with relative ease simply by examining a subset of genomic sites that differed from the on-target site by one to five mismatches. Notably, under conditions of our experiments, the frequencies of RGEN-induced mutations at many of these off-target sites were similar to (or higher than) those observed at the intended on-target site, enabling us to identify them using the simple and relatively insensitive T7EI assay (which, as performed in our laboratory, has a reliable detection limit of ~2 to 5% mutation frequency). Because these mutation rates were very high, we were able to avoid using deep sequencing methods previously required to detect much lower frequency ZFN- and TALEN-induced off-target alterations16–19. Our analysis of RGEN off-target mutagenesis in human cells also confirmed the difficulties of predicting RGEN specificities – not all single and double mismatched off-target sites show evidence of mutation whereas some sites with as many as five mismatches can also show alterations. Furthermore, the bona fide off-target sites we identified do not exhibit any obvious bias toward transition or transversion differences relative to the intended target sequence (Supplementary Table 2; grey highlighted rows).
Although we have unveiled off-target sites for a number of RGENs, we note that our identification of these sites was neither comprehensive nor genome-wide in scale. For the six RGENs we studied, we only examined a very small subset of the much larger total number of potential off-target sequences in the human genome (sites that differ by three to six nucleotides from the intended target site; compare Supplementary Tables 2 and 3). Although examining such large numbers of loci for off-target mutations by T7EI assay is neither a practical nor a cost-effective strategy, the use of high-throughput sequencing in future studies might enable the interrogation of larger numbers of candidate off-target sites and provide a more sensitive method for detecting bona fide off-target mutations. For example, such an approach might enable the unveiling of additional off-target sites for the two RGENs for which we failed to uncover any off-target mutations. In addition, an improved understanding both of RGEN specificities and of any epigenomic factors (e.g., DNA methylation and chromatin status) that may influence RGEN activities in cells might also reduce the number of potential sites that need to be examined and thereby make genome-wide assessments of RGEN off-targets more practical and affordable.
It will be interesting to investigate whether the specific choice of RGEN target site can be used to minimize the frequencies of genomic off-target mutations. Given that off-target sites that differ at up to five positions from the intended target site can be efficiently mutated by RGENs, attempting to choose target sites with minimal numbers of off-target sites as judged by simple mismatch counting seems unlikely to be effective; thousands of potential off-target sites that differ by four or five positions within the 20 bp RNA:DNA complementarity region will typically exist for any given RGEN targeted to a sequence in the human genome (see, for example, Supplementary Table 3). It is also possible that the nucleotide content of the gRNA complementarity region might influence the range of potential off-target effects. For example, high GC-content has been shown to stabilize RNA:DNA hybrids20 and therefore might also be expected to make gRNA/genomic DNA hybridization more stable and more tolerant to mismatches. Additional experiments with larger numbers of gRNAs will be needed to assess if and how these two parameters (numbers of mismatched sites in the genome and stability of the RNA:DNA hybrid) influence the genome-wide specificities of RGENs. However, it is important to note that even if such predictive parameters can be defined, the effect of implementing such guidelines would be to further restrict the targeting range of RGENs.
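The two parameters discussed above (the number of mismatched sites in a genome and the GC content of the complementarity region) can be computed with a naive scan such as the sketch below. It is an illustration only and is not the procedure used to compile Supplementary Table 3: it slides a window along both strands of a sequence held in memory, keeps windows followed by an NGG PAM, and counts mismatches against the guide. All names and sequences are hypothetical, and a search of the whole human genome would need an indexed or otherwise optimized approach.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(sequence, guide, max_mismatches):
    """List (strand, start, site, pam, mismatches) for NGG-flanked windows.

    `start` is the 0-based offset on the scanned strand; for "-" hits it is
    measured along the reverse complement of `sequence`.
    """
    guide = guide.upper()
    n = len(guide)
    hits = []
    strands = (("+", sequence.upper()), ("-", reverse_complement(sequence)))
    for strand, seq in strands:
        for start in range(len(seq) - n - 3 + 1):
            pam = seq[start + n:start + n + 3]
            if pam[1:] != "GG":
                continue  # only sites next to an NGG PAM are considered
            site = seq[start:start + n]
            mm = hamming(site, guide)
            if mm <= max_mismatches:
                hits.append((strand, start, site, pam, mm))
    return hits


# Hypothetical guide and a toy "genome" with one perfect and one 3-mismatch site.
guide = "GACCTTGGAACCTTGGAACC"
off_site = "GTCCTTGGAACCTAGGAACG"
toy_genome = "TTT" + guide + "AGG" + "TTTT" + off_site + "TGG" + "AAA"

hits_up_to_3 = find_candidate_sites(toy_genome, guide, max_mismatches=3)
hits_up_to_2 = find_candidate_sites(toy_genome, guide, max_mismatches=2)
guide_gc = gc_content(guide)
```

Raising `max_mismatches` to four or five on a real genome is where the candidate list would be expected to grow into the thousands mentioned in the text, which is why simple mismatch counting is a weak filter on its own.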
Another potential general strategy for reducing RGEN-induced off-target effects might be to reduce the concentrations of gRNA and Cas9 nuclease expressed in the cell. We tested this idea using the RGENs for VEGFA target sites 2 and 3 in U2OS.EGFP cells but found that transfecting less sgRNA- and Cas9-expressing plasmid decreased the mutation rate at the on-target site but did not appreciably change the relative rates of off-target mutations (Table 2 and Supplementary Table 4). Consistent with this, we note that we also observe high-level off-target mutagenesis in two other human cell types (HEK293 and K562 cells) even though the absolute rates of on-target mutagenesis are lower than in U2OS.EGFP cells. Although additional work is clearly needed to further explore this strategy, these initial experiments suggest that reducing expression levels of gRNA and Cas9 in cells is not likely to provide a simple solution for reducing off-target effects. Furthermore, these results also suggest that the high rates of off-target mutagenesis we observe in human cells are not caused by overexpression of sgRNA and/or Cas9.
Our finding that significant off-target mutagenesis can be induced by RGENs in three different human cell types has important implications for broader use of this genome-editing platform. For research applications, the potentially confounding effects of high frequency off-target mutations will need to be considered, particularly for experiments involving either cultured cells or organisms with slow generation times for which the outcrossing of undesired alterations would be challenging. One way to control for such effects might be to utilize multiple RGENs targeted to different DNA sequences to induce the same genomic alteration as off-target effects are not random but instead related to the targeted site. However, for therapeutic applications, our findings clearly indicate that the specificities of RGENs will need to be carefully defined and/or improved if these nucleases are to be used safely in the longer term for treatment of human diseases.
DNA oligonucleotides (Supplementary Table 1) harboring variable 20 nt sequences for Cas9 targeting were annealed to generate short double-strand DNA fragments with 4 bp overhangs compatible with ligation into BsmBI-digested plasmid pMLM3636. Cloning of these annealed oligonucleotides generates plasmids encoding a chimeric +103 single-chain guide RNA with 20 variable 5’ nucleotides under expression of a U6 promoter9, 11. pMLM3636 and the expression plasmid pJDS246 (encoding a codon optimized version of Cas9) used in this study are both available through the non-profit plasmid distribution service Addgene (http://www.addgene.org/crispr-cas).
U2OS.EGFP cells harboring a single integrated copy of an EGFP-PEST fusion gene were cultured as previously described15. For transfections, 200,000 cells were Nucleofected with the indicated amounts of sgRNA expression plasmid and pJDS246 together with 30 ng of a Td-tomato-encoding plasmid using the SE Cell Line 4D-Nucleofector™ X Kit (Lonza) according to the manufacturer’s protocol. Cells were analyzed 2 days post-transfection using a BD LSRII flow cytometer. Transfections for optimizing gRNA/Cas9 plasmid concentration were performed in triplicate and all other transfections were performed in duplicate.
PCR reactions were performed using Phusion Hot Start II high-fidelity DNA polymerase (NEB) with PCR primers and conditions listed in Supplementary Table 2. Most loci amplified successfully using touchdown PCR ([98 °C, 10 s; 72–62 °C, −1 °C/cycle, 15 s; 72 °C, 30 s]10 cycles, [98 °C, 10 s; 62 °C, 15 s; 72 °C, 30 s]25 cycles). PCR for the remaining targets was performed with 35 cycles at a constant annealing temperature of 68 °C or 72 °C and 3% DMSO or 1M betaine, if necessary. PCR products were analyzed on a QIAXCEL capillary electrophoresis system to verify both size and purity. Validated products were treated with ExoSap-IT (Affymetrix) and sequenced by the Sanger method (MGH DNA Sequencing Core) to verify each target site.
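The touchdown program above can be written out cycle by cycle. The sketch below is one reading of the stated conditions, not a thermocycler file from the study: it assumes the annealing temperature starts at 72 °C and drops by 1 °C in each of the 10 touchdown cycles (72 °C down to 63 °C), followed by 25 cycles at a constant 62 °C. Function and parameter names are hypothetical.

```python
def touchdown_annealing_temps(start=72, final=62, step=1,
                              touchdown_cycles=10, constant_cycles=25):
    """Return the annealing temperature (deg C) used in each PCR cycle."""
    temps = []
    for cycle in range(touchdown_cycles):
        # Decrease by `step` per cycle, never going below the final temperature.
        temps.append(max(start - cycle * step, final))
    temps.extend([final] * constant_cycles)
    return temps


def full_program(temps, denature=(98, 10), anneal_s=15, extend=(72, 30)):
    """Expand each cycle into (temperature in deg C, seconds) steps."""
    return [[denature, (t, anneal_s), extend] for t in temps]


annealing = touchdown_annealing_temps()
program = full_program(annealing)
```

Under this reading the program totals 35 cycles, the same count used for the constant-annealing alternative.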
For U2OS.EGFP and K562 cells, 2 × 10⁵ cells were transfected with 250 ng of sgRNA expression plasmid or an empty U6 promoter plasmid (for negative controls), 750 ng of Cas9 expression plasmid, and 30 ng of td-Tomato expression plasmid using the 4D Nucleofector System according to the manufacturer’s instructions (Lonza). For HEK293 cells, 1.65 × 10⁵ cells were transfected with 125 ng of sgRNA expression plasmid or an empty U6 promoter plasmid (for the negative control), 375 ng of Cas9 expression plasmid, and 30 ng of a td-Tomato expression plasmid using Lipofectamine LTX reagent according to the manufacturer’s instructions (Life Technologies). Genomic DNA was harvested from transfected U2OS.EGFP, HEK293, or K562 cells using the QIAamp DNA Blood Mini Kit (QIAGEN), according to the manufacturer’s instructions. To generate enough genomic DNA to amplify the off-target candidate sites, DNA from three Nucleofections (for U2OS.EGFP cells), two Nucleofections (for K562 cells), or two Lipofectamine LTX transfections was pooled together before performing T7EI. This was done twice for each condition tested, thereby generating duplicate pools of genomic DNA representing a total of four or six individual transfections. PCR was then performed using these genomic DNAs as templates as described above and purified using Ampure XP beads (Agencourt) according to the manufacturer’s instructions. T7EI assays were performed as previously described15.
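The quantification step of the T7EI assay is not spelled out here beyond the citation. As a hedged illustration, the sketch below uses an estimate that is widely applied to mismatch-cleavage assays, % indel = 100 × (1 − √(1 − fraction cleaved)); whether this exact formula was used in this study is an assumption. The square root reflects the idea that, after denaturing and reannealing, only duplexes formed from two unmodified strands resist cleavage. The band intensities are invented for the example.

```python
import math


def estimate_indel_percent(parental, cleaved_fragments):
    """Estimate indel frequency (%) from band intensities of a cleavage assay.

    parental: intensity of the uncleaved PCR product band.
    cleaved_fragments: intensities of the cleavage product bands.
    """
    cleaved = sum(cleaved_fragments)
    total = parental + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cleaved / total
    # Assumed relationship: uncleaved fraction = (1 - indel fraction) ** 2
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Invented intensities: 19% of the signal is in the cleaved bands.
example_indel = estimate_indel_percent(parental=81.0, cleaved_fragments=[10.0, 9.0])
no_cleavage = estimate_indel_percent(parental=100.0, cleaved_fragments=[0.0, 0.0])
```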
Purified PCR products used for the T7EI assay were cloned into Zero Blunt TOPO vector (Life Technologies) and plasmid DNAs were isolated using an alkaline lysis miniprep method by the MGH DNA Automation Core. Plasmids were sequenced using an M13 forward primer (5’ – GTAAAACGACGGCCAG – 3’) by the Sanger method (MGH DNA Sequencing Core).
This work was supported by a National Institutes of Health (NIH) Director’s Pioneer Award DP1 GM105378, NIH R01 GM088040, NIH P50 HG005550, Defense Advanced Research Projects Agency (DARPA) W911NF-11-2-0056, and the Jim and Ann Orr Massachusetts General Hospital Research Scholar Award. We thank Shengdar Q. Tsai for helpful discussions and encouragement.
Author Contributions
Y.F., J.D.S., and J.K.J. designed experiments; Y.F., J.A.F., C.K., M.L.M., D.R., and J.D.S. performed experiments; Y.F., M.L.M., D.R., J.D.S., and J.K.J. wrote the manuscript.
Competing Financial Interests
J.K.J. has a financial interest in Transposagen Biopharmaceuticals. J.K.J.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.