Targeted genome editing using engineered nucleases has rapidly transformed from a niche technology to a mainstream method used by many biological researchers. This widespread adoption has been largely fueled by the emergence of the clustered regularly interspaced short palindromic repeat (CRISPR) technology, an important new platform for generating RNA-guided nucleases (RGNs), such as Cas9, with customizable specificities. RGN-mediated genome editing is facile, rapid and has enabled the efficient modification of endogenous genes in a wide variety of biomedically important cell types and novel organisms that have traditionally been challenging to manipulate genetically. Furthermore, a modified version of the CRISPR-Cas9 system has been developed to recruit heterologous domains that can regulate endogenous gene expression or label specific genomic loci in living cells. Although the genome-wide specificities of CRISPR-Cas9 systems remain to be fully defined, the capabilities of these systems to perform targeted, highly efficient alterations of genome sequence and gene expression will undoubtedly transform biological research and spur the development of novel molecular therapeutics for human disease.
The capability to introduce targeted genomic sequence changes into living cells and organisms provides a powerful tool for biological research as well as a potential avenue for therapy of genetic diseases. Frameshift knockout mutations enable reverse genetics and assignment of function, sequence insertions can be used to fuse epitope tags or other functional domains, such as fluorescent proteins, to endogenous gene products, and specific sequence alterations can be used to induce amino acid substitutions for disease modeling, to transfer traits in agricultural crops and livestock, and to correct defective genes for therapeutic applications. For many years, strategies for efficiently inducing precise, targeted genome alterations were limited to certain organisms (for example, using homologous recombination in yeast or recombineering in mice) and often required drug-selectable markers or left behind ‘scar’ sequences associated with the modification method (for example, residual loxP sites from Cre recombinase-mediated excision). Targeted genome editing using customized nucleases provides a new, general method for inducing targeted deletions, insertions and precise sequence changes in a broad range of organisms and cell types. The high efficiency of genome editing obviates the need for additional sequences, such as drug-resistance marker genes, and therefore the need for additional manipulations to remove them.
A crucial first step for performing targeted genome editing is the creation of a DNA double-stranded break (DSB) at the genomic locus to be modified1. Nuclease-induced DSBs can be repaired by one of at least two different pathways that are operative in nearly all cell types and organisms: non-homologous end-joining (NHEJ) and homology-directed repair (HDR) (Fig. 1). NHEJ can lead to the efficient introduction of insertion/deletion mutations (indels) of various lengths, which can disrupt the translational reading frame of a coding sequence or the binding sites of trans-acting factors in promoters or enhancers. HDR-mediated repair can be used to introduce specific point mutations or to insert desired sequences through recombination of the target locus with exogenously supplied DNA ‘donor templates’. When targeted nuclease-induced DSBs are used, the frequencies of these alterations are typically higher than 1% and, in some cases, can be 50% or higher; modifications at these levels enable identification of desired mutations by simple screening, without the need for drug resistance marker selection.
Early methods for targeting DSB-inducing nucleases to specific genomic sites relied on protein-based systems with customizable DNA-binding specificities such as meganucleases, zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). These platforms have enabled important advances, but each has its own set of associated advantages and challenges (Box 1). More recently, a novel platform based on a bacterial CRISPR-Cas system has been developed that is unique and flexible due to its dependence on RNA as the moiety that targets the nuclease to a desired DNA sequence. In contrast to ZFN and TALEN platforms that use protein–DNA interactions for targeting, these RGNs use simple base pairing rules between an engineered RNA and the target DNA site.
Meganucleases, ZFNs, and TALENs have been utilized extensively for genome editing in a variety of different cell types and organisms. Meganucleases are engineered versions of naturally occurring restriction enzymes that typically have extended DNA recognition sequences. Zinc finger nucleases (ZFNs) and Transcription Activator-Like Effector Nucleases (TALENs) are artificial fusion proteins composed of an engineered DNA binding domain fused to a non-specific nuclease domain from the FokI restriction enzyme. Zinc finger and transcription activator-like effector repeat domains with customized specificities can be joined together into arrays capable of binding to extended DNA sequences.
The engineering of meganucleases has been challenging for most academic researchers because the DNA recognition and cleavage functions of these enzymes are intertwined in a single domain86, 87. By contrast, the DNA binding domains of ZFNs and TALENs are distinct from the FokI cleavage domain88, thereby making it more straightforward to modify the DNA-binding specificities of these nucleases. However, robust construction of engineered zinc finger arrays has also proven to be difficult for many laboratories due to the need to account for context-dependent effects between individual finger domains in an array89. Despite the availability of various publicly available methods designed to simplify the challenge of creating ZFNs90-96, these nucleases have not been engineered by a wide range of laboratories. In contrast to zinc fingers, TALE repeat domains appear to be less affected by context-dependence and can be robustly assembled in a modular fashion to recognize virtually any DNA sequence97 using a simple one-to-one code between individual repeats and the four possible DNA nucleotides98, 99. Although much simpler to design than meganucleases or ZFNs, the assembly of DNAs encoding large numbers of highly conserved TALE repeats can require the use of non-standard molecular biology cloning methods. Many user-friendly methods for making such assemblies have been described in the literature100 but the highly repetitive nature of TALEN-coding sequences also creates barriers to their delivery using certain viral vectors, such as lentiviruses101. Nonetheless, the greater simplicity of TALENs relative to meganucleases and ZFNs has led to their adoption over the past several years by a broader range of scientists. The question of whether to utilize these platforms for a given application must be answered on a case-by-case basis and we refer the reader to recent reviews on these different technologies for additional information86, 88, 100.
In this review, we describe how this RNA-guided system works and how it has been applied to perform genome editing across a wide variety of cell types and whole organisms. We also discuss advantages and limitations of this system, including assessment of off-target effects and recent strategies for improving specificity, and how it can be re-purposed for other applications such as regulation of gene expression and selective labeling of the genome. Finally, we consider the next set of challenges that will need to be addressed for this emerging genome editing platform.
CRISPR systems are adaptable immune mechanisms used by many bacteria to protect themselves from foreign nucleic acids, such as viruses or plasmids2-5. The type II CRISPR system incorporates sequences from invading DNA between CRISPR repeat sequences encoded as arrays within the host genome (Fig. 2a). Transcripts from the CRISPR repeat arrays are processed into CRISPR RNAs (crRNAs) each harboring a variable sequence transcribed from the invading DNA, known as the “protospacer” sequence, as well as part of the CRISPR repeat. Each crRNA hybridizes with a second transactivating CRISPR RNA (tracrRNA) and these two RNAs complex with the Cas9 nuclease6. The protospacer-encoded portion of the crRNA directs Cas9 to cleave complementary target DNA sequences, if they are adjacent to short sequences known as “protospacer adjacent motifs” (PAMs). Protospacer sequences incorporated into the CRISPR locus are not cleaved because they are not present next to a PAM sequence.
The type II CRISPR system from Streptococcus pyogenes has been adapted for inducing sequence-specific DSBs and targeted genome editing7. In its simplest and most widely used form, two components must be introduced and/or expressed in cells or an organism to perform genome editing: the Cas9 nuclease and a ‘guide RNA’ (gRNA), consisting of a fusion of a crRNA and a constant tracrRNA (Fig. 2b). Twenty nucleotides at the 5′ end of the gRNA (corresponding to the protospacer sequence of the crRNA; Fig. 2c) direct Cas9 to a specific target DNA site using standard RNA-DNA complementarity base pairing rules. These target sites must lie immediately 5′ of a PAM sequence that matches the canonical form 5′-NGG, although recognition at sites with alternate PAM sequences (e.g. 5′-NAG) has also been reported, albeit with lower efficiency7-9. Thus, with this system, Cas9 nuclease activity can be directed to any DNA sequence of the form N20-NGG simply by altering the first 20 nts of the gRNA to correspond to the target DNA sequence. Type II CRISPR systems from other species of bacteria that recognize alternative PAM sequences and that utilize different crRNA and tracrRNA sequences have also been used to perform targeted genome editing10-12. However, because the most commonly used and extensively characterized system to date is based on the S. pyogenes system, the remainder of this review focuses on this particular platform and its components unless otherwise noted.
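The N20-NGG rule described above lends itself to a simple sequence scan. The following Python sketch is illustrative only: the function names are hypothetical, it assumes an uppercase or lowercase A/C/G/T input with no ambiguity codes, and it considers only the canonical NGG PAM on both strands, ignoring alternate PAMs and any promoter-imposed constraints.

```python
import re

# Complement table for reverse-complementing plain A/C/G/T sequences.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(seq: str, protospacer_len: int = 20):
    """Yield (strand, start, protospacer, pam) for every N20-NGG site.

    `start` is the 0-based index of the protospacer's 5' base on the strand
    being scanned; for the "-" strand it is an index into the reverse
    complement of `seq`, not into `seq` itself.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(scanned):
            yield strand, match.start(), match.group(1), match.group(2)


if __name__ == "__main__":
    # Hypothetical sequence used purely for illustration.
    example = "TTGACCTGAAGCTGACCAGGTACGATCGGATCCTAGG"
    for site in find_target_sites(example):
        print(site)
```

The 20 nt protospacer reported for each site is the sequence that would be placed at the 5′ end of the gRNA; the PAM itself is not part of the gRNA.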
Following the initial demonstrations in 2012 that Cas9 could be programmed to cut various DNA sites in vitro7, a flurry of papers published in 2013 showed that this platform also functions efficiently in a variety of cells and organisms. Initial proof-of-principle studies showed that Cas9 could be targeted to endogenous genes in bacteria8, cultured transformed human cancer cell lines and human pluripotent stem cells in culture13-16, as well as in a whole organism, the zebrafish17. Subsequently, Cas9 has been used to alter genes in yeast18, tobacco19, 20, thale cress19, rice21, 22, wheat21, sorghum23, mice24, 25, rats26, rabbits27, frog28, fruit flies29, 30, silkworm31, and roundworms32 (see Table 1 for a list of these published reports). Cas9-induced DSBs have been used to introduce NHEJ-mediated indel mutations as well as to stimulate HDR with both double-stranded plasmid DNA and single-stranded oligonucleotide donor templates. The capability to introduce DSBs at multiple sites in parallel using the Cas9 system is a unique advantage of this platform relative to meganucleases, ZFNs, or TALENs. For example, expression of Cas9 and multiple gRNAs has been used to induce small and large deletions or inversions between the DSBs14,33-35, and to simultaneously introduce mutations in three genes in rat cells36, five genes in mouse ES cell clones24 and five genes in the somatic cells of a single zebrafish37.
The simplicity of Cas9 targeting has also inspired the generation of large gRNA libraries using array-based oligonucleotide synthesis. These libraries can be engineered to encompass multiple gRNAs for every target gene in a host organism, thereby greatly facilitating forward genetic screens and selections. In contrast to short hairpin RNA libraries, which only mediate gene knockdown, these gRNA libraries have been used with Cas9 nuclease to generate knockout mutations. Libraries comprised of between ~64,000 and ~87,000 distinct gRNAs have been used to demonstrate the capacity for positive and negative forward genetic phenotype screens in both human and mouse cells38-40.
Variant Cas9 nickases that cut one strand, rather than both strands, of the target DNA site have also been shown to be useful for genome editing. Introduction of a D10A or H840A mutation into the RuvC1- or HNH-like nuclease domains present in Cas9 (Fig. 3a)41, 42 results in the generation of nickases that cut either the complementary or non-complementary DNA target strands, respectively, in vitro7, 12, 43 (Fig. 3b and 3c). Consistent with previous studies performed with ZFN-derived nickases44-46, Cas9 nickases can, at some sites, induce HDR with reduced levels of concomitant NHEJ-mediated indels13, 14. However, although at some sites Cas9 nickase can induce HDR with efficiencies similar to those observed with the Cas9 nuclease from which it is derived13, 14, these rates can also be greatly reduced at other sites47, 48. Importantly, the frequencies of indel mutations introduced by nickases have also been reported to be high at certain sites13, 47-49. Although the precise DNA repair pathways by which these various alterations are induced remain as yet undefined, one potential mechanism that has been postulated is that passage of a replication fork through a nuclease-induced nick site might result in a DNA DSB. Additional studies with Cas9 nickases will be needed to better understand locus-dependent differences in the efficiencies of HDR and indel mutation induction.
Although RGNs generally cleave their intended target sites reliably, an important question is to what extent these nucleases induce off-target cleavage events (and therefore unwanted NHEJ-induced indel mutations). To assess RGN specificity, several groups have created variant gRNAs bearing one to four nucleotide mismatches in the complementarity region and then examined the abilities of these molecules to direct Cas9 nuclease activity in human cells at reporter gene50 or endogenous gene14, 51 target sites. These studies showed that mismatches are generally better tolerated at the 5′ end of the 20 nt targeting region of the gRNA than at the 3′ end; this result is consistent with previous experiments performed in vitro and in bacterial cells, which suggested that the 8 – 12 bps at the 3′ end of the targeting sequence (a.k.a. the seed sequence) are crucial for target recognition7, 8, 14, 52, 53. However, the effects of single and double mismatches are not always predictable based on their location within the gRNA targeting region; some mismatches in the 5′ end can have dramatic effects whereas some in the 3′ end do not significantly affect Cas9 activity50. In addition, not all nucleotide substitutions at a given position necessarily have equivalent effects on activity51.
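When comparing a gRNA targeting sequence with a candidate site, it can be useful to tabulate where the mismatches fall relative to the PAM-proximal seed region discussed above. The Python sketch below does only that bookkeeping. The function name and the default seed length of 12 (the upper end of the 8 – 12 bp range) are assumptions made for illustration, and, as noted above, mismatch position alone does not reliably predict Cas9 activity.

```python
def mismatch_profile(protospacer: str, candidate: str, seed_len: int = 12) -> dict:
    """Summarize mismatches between a targeting sequence and a candidate site.

    Both sequences are written 5' to 3' and must have equal length, so the
    PAM-proximal (3') end is the right-hand end of each string.
    """
    if len(protospacer) != len(candidate):
        raise ValueError("sequences must have the same length")
    n = len(protospacer)
    # 1-based positions counted from the 5' end of the targeting region.
    positions = [
        i + 1
        for i, (a, b) in enumerate(zip(protospacer.upper(), candidate.upper()))
        if a != b
    ]
    # A mismatch lies in the seed if it is within seed_len bases of the 3' end.
    in_seed = [p for p in positions if p > n - seed_len]
    return {
        "total": len(positions),
        "positions_from_5prime": positions,
        "in_seed": len(in_seed),
    }
```

For example, a candidate site that differs from a 20 nt targeting sequence only at positions 1 and 20 has two mismatches, one of which (position 20) lies in the seed.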
A reciprocal, and perhaps more relevant, approach for studying specificity is to assess the activities of Cas9 at potential off-target genomic DNA sites (i.e. sites that have a few nucleotide differences compared to the intended target). A number of studies have examined potential off-target sites that differ at one to six positions from the on-target site in human cells47, 48, 50, 51, 54. Collectively, these reports have found cases of off-target mutations at sites that differ by as many as five positions within the protospacer region50 and/or that have an alternative PAM sequence of the form NAG51. Interestingly, indel mutation frequencies at these off-target sites can be high enough (>2-5%) to detect using the relatively insensitive T7 Endonuclease I (T7E1) mutation mismatch assay and can sometimes be comparable to the on-target site mutation frequency48, 50. In addition, more sensitive deep sequencing assays have been used to identify lower frequency off-target mutations48, 55, 56. It is important to note that all of these directed studies only examined a subset of the much larger number of potential off-target sites in the genome. For example, any given 20 nt protospacer will typically have hundreds to thousands of potential off-target sites that differ at four or five positions, respectively, in 6 × 10^9 bps of random DNA. In addition, although it has been suggested that higher GC content at the RNA:DNA hybridization interface might potentially help to stabilize Cas9:gRNA complexes, high rates of mutagenesis have been observed for off-target sites with as little as 30% matched GC-content9, 50.
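The "hundreds to thousands" estimate above can be reproduced with a short back-of-the-envelope calculation. The Python sketch below rests on assumptions that are ours rather than stated in the text: bases are independent and equiprobable, each of the 6 × 10^9 positions is counted once, and a candidate site must also carry an NGG PAM (two fixed bases). The function name is hypothetical.

```python
from math import comb


def expected_offtarget_sites(
    mismatches: int,
    protospacer_len: int = 20,
    genome_bp: float = 6e9,
    pam_fixed_bases: int = 2,
) -> float:
    """Expected number of sites with exactly `mismatches` mismatches.

    Assumes random DNA with equiprobable bases and a PAM with
    `pam_fixed_bases` fully specified positions (2 for NGG).
    """
    # Choose which positions mismatch; each mismatch can be any of 3 other bases.
    p_protospacer = (
        comb(protospacer_len, mismatches) * 3**mismatches / 4**protospacer_len
    )
    p_pam = 0.25**pam_fixed_bases
    return genome_bp * p_protospacer * p_pam


if __name__ == "__main__":
    for k in (4, 5):
        print(k, round(expected_offtarget_sites(k)))
```

Under these assumptions the expectation is roughly 130 sites with four mismatches and roughly 1,300 sites with five. Real genomes are not random sequence, so these figures indicate scale only.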
A somewhat more comprehensive strategy for examining Cas9 specificities is to identify off-target sites from a partially degenerate library of variants that is based on the intended on-target sequence. One recent report identified sites from such libraries based on their abilities to be bound by a catalytically inactive form of Cas9 fused to a transcriptional activation domain (see further discussion below)49. This study found sites that were mismatched by as many as three (and possibly more) positions relative to the on-target site49. These results are similar to those of another study, which used in vitro selection for Cas9 nuclease cleavage activity to identify potential off-target sites from a partially degenerate library of target site variants. Some of the off-target sites identified by these in vitro selections (with up to four mismatches) were also shown to be mutated in human cells9.
A recent study using whole exome sequencing did not find evidence of Cas9-induced off-target mutations in three modified human K562 cell line clones57. Although the authors acknowledge that the high false-negative rate associated with exome sequencing analysis limits interpretation of these data, these results do suggest that with careful target selection it may be possible to isolate Cas9-edited cells with otherwise intact exomes. Additional examples with deeper sequencing coverage and whole genome (rather than whole exome) sequencing will be needed to determine how readily cells that do not have off-target mutations can be isolated. The ability to do so would encourage broader research application of Cas9 technology. However, it is worth noting that deep sequencing the genomes of individual cell clones is neither a sensitive nor an effective approach for defining the full genome-wide spectrum of Cas9 off-target sites.
Overall, the various studies published to date strongly suggest that off-target sites of RNA-guided Cas9 nucleases can be variable in frequency and challenging to predict. For any given target site, it is not currently possible to predict how many mismatches can be tolerated, nor do we fully understand why some sites are cleaved whereas others are not. We also do not know how genomic and/or epigenomic context might affect the frequency of cleavage. Although some initial evidence suggests that DNA methylation may not inhibit Cas9-based genome editing55, it seems plausible that chromatin structure could play a role in off-target site accessibility. A more comprehensive understanding of Cas9 off-target effects will have to await the development of unbiased, global measures of Cas9 specificity in cells.
Even with an incomplete understanding of RNA-guided Cas9 nuclease specificity, researchers have begun to explore various approaches to reduce off-target mutagenic effects. One potential strategy is to test the effects of reducing the concentrations of gRNA and/or Cas9 expressed in human cells. Results with this approach have been mixed, with one group observing proportionately larger decreases in rates of off-target mutagenesis relative to on-target mutagenesis for two gRNAs51 and another group observing nearly proportionate decreases for two other gRNAs50. The use of modified gRNA architectures with truncated 3′ ends (within the tracrRNA-derived sequence) or with two extra guanine nucleotides appended to the 5′ end (just prior to the complementarity region) also yielded better on-target to off-target ratios but again with considerably lower absolute efficiencies of on-target genome editing57, 58.
Another proposed approach for improving specificity involves the use of “paired nickases” in which adjacent off-set nicks are generated at the target site using two gRNAs and Cas9 nickase47, 49, 57 (Fig. 3d), a strategy analogous to one originally performed with pairs of engineered zinc finger nickases46. Cas9 nickases targeted to sites on opposite DNA strands separated by four to 100 bps can efficiently induce indel mutations or HDR events with a single-stranded DNA oligonucleotide donor template in human and mouse cells47, 49, 57. It has been proposed that the concerted action of paired nickases creates a DSB that is then repaired by NHEJ or HDR47, 56. Importantly, paired nickases can reduce Cas9-induced off-target effects of gRNAs in human cells. The addition of a second gRNA and substitution of Cas9 nickase for Cas9 nuclease can lead to lower levels of unwanted mutations at previously known off-target sites of the original gRNA47. However, an as-yet unanswered question is whether the second gRNA can itself induce its own range of Cas9 nickase-mediated off-target mutations in the genome. Multiple studies have shown that single Cas9 nickases can function on their own to induce indel mutations at certain genomic loci13, 47-49, perhaps because an individual nick might be converted to a DSB when a replication fork passes through the locus59, 60. Thus, one important way to improve the paired nickase system might be to modify it so that the activities of the two gRNAs are strictly co-dependent on each other for genome editing activity; that is, so that each gRNA is only active for genome editing when bound to DNA in close proximity to the other.
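Selecting candidate gRNA pairs for the paired nickase approach amounts to finding two target sites on opposite strands within the stated spacing window. The Python sketch below illustrates this under assumptions of our own: the `Site` record and function names are hypothetical, each site is given by the plus-strand coordinates of its protospacer, and "separation" is taken to mean the number of base pairs between the two protospacers. Other definitions of the offset, and any constraints on relative orientation, are not modeled.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Site:
    strand: str  # "+" or "-"
    start: int   # 0-based inclusive start of the protospacer, plus-strand coordinates
    end: int     # exclusive end of the protospacer, plus-strand coordinates


def gap_between(a: Site, b: Site) -> int:
    """Base pairs lying between two protospacers (negative if they overlap)."""
    first, second = sorted((a, b), key=lambda s: s.start)
    return second.start - first.end


def nickase_pairs(sites, min_gap: int = 4, max_gap: int = 100):
    """Return pairs of sites on opposite strands separated by min_gap..max_gap bp."""
    return [
        (a, b)
        for a, b in combinations(sites, 2)
        if a.strand != b.strand and min_gap <= gap_between(a, b) <= max_gap
    ]
```

For instance, given sites at +0..20, -30..50, +60..80 and -200..220, only the two pairs involving the -30..50 site satisfy both the opposite-strand and spacing criteria.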
Our group has recently shown that off-target effects can be substantially reduced by using gRNAs that have been shortened at the 5′ end of their complementarity regions. These truncated gRNAs (tru-gRNAs) bearing 17 or 18 nucleotides of complementarity generally function as efficiently as full-length gRNAs in directing on-target Cas9 activity but show decreased mutagenic effects at off-target sites and enhanced sensitivity to single or double mismatches at the gRNA/DNA interface48. tru-gRNAs are simple to implement and this strategy avoids the technical challenges associated with expressing multiple gRNAs in a single cell for the paired nickase approach. Importantly, tru-gRNAs might also be used in conjunction with other strategies for improving Cas9 specificity (e.g. tru-gRNAs have already been shown to improve the specificity of paired nickases48) and to improve the specificities of dCas9 fusion proteins for non-nuclease applications (described below).
Due to rapid progress in the field, potential users of CRISPR-Cas technology face a variety of choices about how to put it into practice. Here we discuss some of the parameters to consider when implementing the methodology.
It is important to note that the efficiency of Cas9 activity for any given locus can be influenced by the architecture of guide RNA(s) used. As described above, most recent studies have used a single gRNA that is a fusion of a programmable crRNA and part of the tracrRNA, but earlier studies also used a ‘dual gRNA’ configuration in which the crRNA and tracrRNA are expressed separately. In general, studies using single gRNAs have consistently reported substantially higher editing rates than those using dual gRNAs13, 14, 17, 61. These findings suggest that the single gRNA system may be more active than the dual gRNA system, presumably because two components can assemble more efficiently than three components.
In addition, single gRNAs harboring variable lengths of tracrRNA sequence on their 3′ ends have been used by different groups (Supplementary Table 1). Systematic comparisons have generally demonstrated that longer single gRNAs (containing a longer 3′ portion of the tracrRNA sequence) are more active than shorter ones51. The most commonly used single gRNA to date is approximately 100 nts in length (Supplementary Table 1) and has been shown reliably to direct Cas9 to genomic sites in yeast18, tobacco19, 20, thale cress19, rice21, 22, wheat21, sorghum23, mice26, rats26, rabbits27, frog28, fruit flies29, 30, roundworms32, silkworm31 and human somatic and pluripotent stem cells13. The tru-gRNAs described above are shortened versions of this ~100 nt single gRNA.
For gRNAs, the choice of promoter used to express these RNAs can limit the options for potential target sites. For example, the RNA polymerase III-dependent U6 promoter and the T7 promoter require a G or GG, respectively, at the 5′ end of the RNA to be transcribed (Fig. 4a); as a result, standard full-length or tru-gRNAs expressed from these promoters are limited to targeting sites that match the forms GN16-19NGG or GGN15-18NGG, which occur once in every 32 bps or once in every 128 bps, respectively, in random DNA. Paired nickase strategies require the identification of two sites on opposite strands of DNA with an appropriate spacing in between (as described above). One strategy to reduce these targeting range restrictions is to choose sites without regard to the identities of the first or first two bases at the 5′ end (i.e. making gRNAs that are mismatched at these positions). Another potential strategy to bypass these restrictions is to append the required G or GG to the 5′ end of the gRNA, thereby encoding gRNA transcripts that are 1 or 2 nts longer (Fig. 4b). Both of these strategies have been used successfully to produce active gRNAs but with variable efficiencies in the genome-editing activities of the Cas9 nucleases47, 54, 57, 62. Larger-scale studies are needed to clarify the effects of using either mismatched or extended gRNAs on the efficiencies and specificities of RGN-mediated cleavage. Several groups have provided web-based software that facilitates the identification of potential CRISPR RGN target sites in user-defined sequences (for example, the ZiFiT Targeter software17, 48 (http://zifit.partners.org) and the CRISPR Design Tool55 (http://crispr.mit.edu/)).
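The 1-in-32 and 1-in-128 figures quoted above follow from counting the fully specified bases in each site form. The Python sketch below shows the arithmetic under the assumptions that bases in random DNA are independent and equiprobable and that sites on both strands are counted; the function name is hypothetical.

```python
from fractions import Fraction


def expected_spacing(fixed_bases: int, strands: int = 2) -> Fraction:
    """Expected number of bps per matching site in random DNA.

    `fixed_bases` is the number of fully specified positions in the site
    form; each one matches with probability 1/4.
    """
    per_strand = Fraction(1, 4) ** fixed_bases
    return 1 / (per_strand * strands)


# U6 promoter: 5' G plus the GG of the PAM gives 3 fixed bases.
u6_spacing = expected_spacing(3)
# T7 promoter: 5' GG plus the GG of the PAM gives 4 fixed bases.
t7_spacing = expected_spacing(4)
# No promoter constraint: only the GG of the PAM is fixed.
unconstrained_spacing = expected_spacing(2)
```

Here `u6_spacing` evaluates to 32 and `t7_spacing` to 128, matching the figures in the text, whereas a site constrained only by the NGG PAM would be expected once in every 8 bps.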
RGNs have been delivered to a broad range of cell types and organisms using a variety of delivery methods. In cultured mammalian cells, researchers have used electroporation61, Nucleofection13, 50 and Lipofectamine-mediated transfection13, 14, 50 of non-replicating plasmid DNA to transiently express Cas9 and gRNAs. Lentiviral vectors have also been used to constitutively express Cas9 and/or gRNAs in cultured human38, 39 and mouse40 cells. In vitro transcribed RNAs and/or plasmid DNA have been injected directly into embryos; for example, in embryos of zebrafish17, fruit flies29, 30, 63, mice24, 26 and rats26. Both plasmid DNA and RNA have also been injected into the gonads of adult roundworms32, 64-68 and in one study purified Cas9 protein complexed with gRNA was injected69. In addition to animal models and cell lines, Cas9 has been used successfully in multiple plant species including wheat, rice, sorghum, tobacco, and thale cress using a range of standard delivery methods including PEG-mediated transformation of protoplasts, Agrobacterium-mediated transfer in embryos and leaf tissue, and/or bombardment of callus cells with plasmid DNA19-21, 23. For most RGN applications, transient expression of gRNAs and Cas9 is typically sufficient to induce efficient genome editing. Although constitutive expression of RGN components might potentially lead to higher on-target editing efficiencies, extended persistence of these components in the cell might also lead to increased frequencies of off-target mutations, a phenomenon that has been previously reported with ZFNs70.
The existence of CRISPR RGN-induced off-target effects and our inability to comprehensively identify all of these alterations on a genome-wide scale requires investigators for now to account for potentially confounding effects of these undesired mutations. Several strategies might be used to rule out off-target mutations as a potential alternative explanation for any phenotypes observed. For example, complementation with re-introduction of a wild-type gene can be used to confirm the effects of knockout mutations. In addition, similar to the strategy of targeting a gene with multiple RNAi hairpins, one could easily create mutations in the same gene using gRNAs targeted to different sites. Presumably, each gRNA will be expected to have a different range of off-target effects and therefore if the phenotype is observed with each of these different gRNAs it would seem unlikely that undesired mutations are the cause. The ease with which multiple gRNAs can be rapidly designed and constructed makes it simple and feasible to implement this latter type of strategy with the Cas9 system. The high efficacy of the Cas9 nuclease for inducing mutations makes it an attractive choice for creating mutant cell lines and whole organisms in spite of the need to account for off-target effects.
Beyond enabling facile and efficient targeted genome editing, the CRISPR-Cas system has the potential to be used to regulate endogenous gene expression or to label specific chromosomal loci in living cells or organisms. Catalytically inactive or “dead” Cas9 (dCas9), a variant bearing both the D10A and H840A mutations that does not cleave DNA, can be recruited by gRNAs to specific target DNA sites7, 12(Fig. 3e). Targeting of dCas9 to promoters was initially shown to enable repression of gene expression in both Escherichia coli71 and human cells72. Interestingly, dCas9 repressed a bacterial promoter efficiently when recruited with gRNAs that interacted with either strand of sequences located upstream of the promoter; however, when targeting sites downstream of the transcription start point, only gRNAs that interacted with the non-template strand induced dCas9-mediated repression71. dCas9 also provides a general platform for recruitment of heterologous effector domains to specific genomic loci (Fig. 3f). For example, dCas9 fusions to a transcriptional activation domain (VP64 or the p65 subunit of nuclear factor kappa B; NF-κB) or a transcriptional repression domain (the Krüppel associated box (KRAB) domain) have been shown to regulate the expression of endogenous genes in human73-75 and mouse cells76 as well as in bacteria71. Changes in gene expression induced by these dCas9 fusions in human cells thus far appear to be generally lower than those induced by similar TALE-based transcription factors49, 77-80. However, multiplex recruitment of dCas9-based activators using between 2 and 10 sgRNAs targeted to the same promoter can result in substantially higher levels of human gene activation, presumably due to the phenomenon of activator synergy49, 73, 74, 76. This capability of dCas9-based activators to function synergistically is consistent with previous observations for TALE-based activators77, 78 in human cells. 
It will be interesting to see in future experiments whether dCas9 fusions to histone modifiers and TET proteins can also be used to perform targeted “epigenome editing” including the alteration of specific histone modifications and demethylation of particular cytosine bases in human cells, as has been recently described with TAL effector DNA-binding domains75, 81, 82.
An alternative strategy for tethering heterologous effector domains to DNA-bound dCas9-gRNA complexes is to exploit well-defined RNA–protein interaction pairs. This approach uses two engineered components: a gRNA that has one or two RNA binding sites for the phage MS2 coat protein fused to its 3′ end, and a fusion of MS2 coat protein to an effector domain49. Addition of the MS2 RNA binding sequences to the gRNA does not abolish its ability to target dCas9 to specific DNA sites. Furthermore, co-expression of the MS2 coat protein fusion with the hybrid gRNA and dCas9 has been used to recruit activation domains to a gene promoter in human cells49. Although the activation observed appears to be somewhat less robust than that achieved with direct fusions to dCas9, this type of configuration might provide additional options and flexibility for multiplex recruitment of effector domains to a specific promoter. For example, one can imagine leveraging multiple gRNAs and multiple MS2 coat protein binding sites on each gRNA to recruit many copies of different domains to the same promoter.
Evidence to date suggests that the effects of the small number of dCas9-activation domain or dCas9-repression domain fusions tested can be highly specific in human cells as judged by RNA-seq experiments76, 83; however, this may be because not all binding events lead to changes in gene transcription. It remains to be determined whether other dCas9 fusions will prove to be as specific in their genome-wide effects as these dCas9-based activators and repressors or if, like their nuclease counterparts, they will benefit from enhanced specificity conferred by tru-gRNAs.
In yet another application, it has been demonstrated that an EGFP-dCas9 fusion can be used to visualize DNA loci harboring repetitive sequences such as telomeres with just a single gRNA or non-repetitive loci using 26 to 36 gRNAs tiled across a 5 kb region of DNA84. This imaging strategy provides a powerful tool for studying chromosome dynamics and structure and extends the dCas9 system beyond gene expression-based applications.
Progress in the development of Cas9-based technologies over the past eighteen months has been stunning to watch, but many interesting questions and applications remain to be addressed and explored.
First, methods for expanding the targeting range of RNA-guided Cas9 will be important for inducing precise HDR or NHEJ events as well as for implementing the paired nickase and other multiplex strategies. As noted above, a major source of this targeting range restriction for Cas9, paired Cas9 nickases, and dCas9 fusions is the need for a PAM sequence matching the form NGG. The ability to exploit alternative PAM sequences of the form NAG or NNGG has been noted7, 8, 51 but more experiments are needed to ascertain how robustly these sequences are actually recognized and cleaved. Other gRNA-Cas9 platforms with different PAM sequences isolated from Streptococcus thermophilus, Neisseria meningitidis, and Treponema denticola have also been characterized10, 11, 14 and identification of more of these systems from other species85 could further enhance the targeting range of the platform.
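The targeting-range restriction described above can be made concrete with a short sketch that enumerates candidate target sites on both strands of a sequence. This is only an illustration under stated assumptions: the function names are hypothetical, the 20-nt protospacer length is assumed for the Streptococcus pyogenes system, and the PAM is expressed as a regular expression so that the less well-characterized NAG or NNGG forms can be substituted. Finding a site with this scan says nothing about how efficiently it would be recognized or cleaved.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(seq: str, pam_regex: str = "[ACGT]GG",
                      protospacer_len: int = 20):
    """Yield (strand, start, protospacer, pam) for each candidate site.

    Assumptions (not taken from the text): a 20-nt protospacer sits
    immediately 5' of the PAM, and `start` is a 0-based offset on the
    strand being scanned ("+" is the input, "-" its reverse complement).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(
        rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))"
    )
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            yield strand, match.start(), match.group(1), match.group(2)


if __name__ == "__main__":
    example = "A" * 20 + "TGG"  # toy sequence, not a real locus
    for site in find_target_sites(example):
        print(site)
    # Alternative PAM forms can be tried by changing the pattern,
    # e.g. pam_regex="[ACGT]AG" for NAG or "[ACGT][ACGT]GG" for NNGG.
```

Comparing the number of sites returned for NGG alone with the number returned when an alternative PAM pattern is also allowed gives a rough sense of how much the targetable fraction of a region would grow if those PAMs proved usable.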
Second, the field urgently needs to develop unbiased strategies to globally assess the off-target effects of Cas9 nucleases or paired nickases in any genome of interest. Such methods will be crucial for evaluating how effectively improvements described to date actually enhance the specificity of the platform. In addition, although tru-gRNAs and paired nickases can reduce off-target effects, it is possible that further improvements will be needed, especially for therapeutic applications. Ideally, these new strategies could be combined with existing approaches, such as the paired nickases and tru-gRNAs. Examples of such improvements might involve using protein engineering to modify Cas9 and/or modifying the nucleotides used by the gRNA to mediate recognition of the target DNA site. Alternatively, the construction of inducible forms of Cas9 and/or gRNAs might provide a means to regulate the active concentration of these reagents in the cell and thereby improve the ratio of on-target to off-target effects.
Third, methods for efficient delivery and expression of CRISPR-Cas system components will undoubtedly need to be optimized for each particular cell type or organism to be modified. For example, some cell types might be refractory to transfection and/or infection by standard viral vectors. A related challenge will be to develop methods that enable tissue-specific, cell-type-specific or developmental stage-specific expression of the gRNAs and/or the Cas9 nuclease. Strategies that enable efficient expression of large numbers of gRNAs simultaneously from one vector will also allow the multiplexing capability of the technology to be more fully leveraged. Collectively, these advances will be important for research use and therapeutic applications.
Lastly, strategies for shifting the balance away from NHEJ-mediated indel mutations and towards HDR-driven alterations remain a priority for development. Although high rates of HDR can be achieved with the CRISPR RGNs and single-stranded DNA oligonucleotides, competing mutagenic NHEJ also occurs simultaneously. This limitation is particularly problematic when using HDR to induce point mutation changes (as opposed to insertions) in the protospacer part of the target site; these types of successfully altered alleles can still be efficiently re-cut and then mutagenized by NHEJ, thereby reducing the yield of correctly edited sequences. One of the challenges with developing an approach to improve the HDR:NHEJ ratio is that inhibition of NHEJ is likely to be poorly tolerated by most cells, given its central role in normal DNA repair. For therapeutic applications seeking to exploit HDR in particular, reduction or elimination of competing NHEJ will be a critically important requirement.
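The re-cutting concern can be illustrated with a minimal sketch that compares a gRNA targeting sequence with the corresponding site on an HDR-edited allele. The function name, the 20-nt guide length and the toy sequences are assumptions made for illustration. The sketch only reports where the edited site differs from the guide and whether an NGG PAM is still present; it does not predict cleavage, since tolerance for mismatches cannot be inferred from a simple count.

```python
def site_status(guide: str, allele_site: str) -> dict:
    """Compare a guide with an allele site given as protospacer + 3-nt PAM.

    Hypothetical helper: `guide` is the DNA-equivalent targeting sequence
    (assumed 20 nt), and `allele_site` is the same strand of the allele,
    read 5' to 3', with the PAM as its last three bases.
    """
    guide, allele_site = guide.upper(), allele_site.upper()
    if len(allele_site) != len(guide) + 3:
        raise ValueError("allele_site must be the guide length plus a 3-nt PAM")
    protospacer = allele_site[:len(guide)]
    pam = allele_site[len(guide):]
    # 0-based positions counted from the PAM-distal (5') end of the guide.
    mismatches = [i for i, (g, t) in enumerate(zip(guide, protospacer)) if g != t]
    pam_intact = pam[1:] == "GG"  # NGG form only
    return {
        "mismatch_positions": mismatches,
        "pam_intact": pam_intact,
        "perfect_match": not mismatches and pam_intact,
    }


if __name__ == "__main__":
    guide = "GATTACAGATTACAGATTAC"                         # toy 20-nt guide
    edited = guide[:9] + "C" + guide[10:] + "AGG"          # one point change
    print(site_status(guide, edited))
```

An edited allele that still shows an intact PAM and only a single mismatch is the kind of product the paragraph above describes as remaining susceptible to re-cutting; designing the donor so that the edit also disrupts the PAM or introduces further changes in the protospacer is one way such a check might be used, although whether that is sufficient would need to be tested experimentally.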
The simplicity, high efficiency and broad applicability of the RNA-guided Cas9 system have positioned this technology to transform biological and biomedical research. The ease with which researchers can now make changes in the sequence or expression of any gene will enable reverse genetics to be performed in virtually any organism or cell type of interest. In addition, the capability to construct large libraries of gRNAs for altering or regulating genes of interest will enable facile comprehensive forward genetic screens. All of these systems can also be multiplexed by expressing multiple gRNAs in a single cell, thereby further extending the complexity of forward and reverse genetic experiments that can be performed. Although the off-target effects of Cas9 remain to be defined on a genome-wide scale, much progress has already been made toward improving specificity and further advances will undoubtedly come rapidly given the intensity of research efforts in this area. All of these recent and future advances in developing and optimizing Cas9-based systems for genome and epigenome editing should propel the technology forward to therapeutic applications, opening the door to treating a wide variety of human diseases.
J.K.J. is grateful for support from the US National Institutes of Health (NIH) (grants DP1 GM105378 and R01 GM088040), the Defense Advanced Research Projects Agency (grant W911NF-11-2-0056) and The Jim and Ann Orr Massachusetts General Hospital Research Scholar Award. This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under grant number W911NF-11-2-0056. The authors apologize to colleagues whose studies were not cited due to length and reference constraints.
Competing Interests Statement
J.K.J. has financial interests in Editas Medicine and Transposagen Biopharmaceuticals. J.K.J.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.