New technologies have recently emerged that enable targeted editing of genomes in diverse systems. This includes precise manipulation of gene sequences in their natural chromosomal context and addition of transgenes to specific genomic loci. This progress has been facilitated by advances in engineering targeted nucleases with programmable, site-specific DNA-binding domains, including zinc finger proteins and transcription activator-like effectors (TALEs). Recent improvements have enhanced nuclease performance, accelerated nuclease assembly, and lowered the cost of genome editing. These advances are driving new approaches to many areas of biotechnology, including biopharmaceutical production, agriculture, creation of transgenic organisms and cell lines, and studies of genome structure, regulation, and function. Genome editing is also being investigated in preclinical and clinical gene therapies for many diseases.
Genome editing is the introduction of a predetermined sequence change to the chromosomal DNA of a cellular genome. The instructions for almost all functions of living systems are encoded in the genome. Consequently, the ability to easily and precisely add, remove, or exchange DNA sequences within a cellular genome would in principle enable routine reprogramming of biological systems for numerous applications across biotechnology, including medicine, energy, and the environment. The editing of genome sequences in diverse cell types and species has recently become possible through the advent of synthetic nucleases that can be engineered to target almost any site in a complex genome. Although nuclease-mediated DNA cleavage has been known to enhance gene targeting for over fifteen years, genome editing was not widely applied to diverse areas of biotechnology until recently (Figure 1). This rapid growth is the result of the increased availability of public and commercial sources for engineering targeted nucleases (Table 1), as well as significant progress in enhancing and monitoring genomic modifications. Despite the exponential growth in the use of this technology, current methods still do not fulfill the criteria of an ideal genome editing tool: 1) a high frequency of the desired sequence change in the target cell population, 2) no off-target mutations, and 3) rapid, efficient, and low-cost assembly of nucleases that can target any site in the genome. Progress in genome editing has been the subject of several comprehensive reviews [1–3]. Therefore, this review emphasizes the most significant advances in genome editing in the last few years and the corresponding adoption of this technology for new applications.
We also discuss the current challenges and future directions necessary to establish a genome editing technology that is sufficiently robust, efficient, specific, economical, and readily available for routine use in research and biotechnology.
The engineering of enzymes that target specific sequences within complex genomes is a formidable challenge. The most successful approaches to date have been based on modular proteins in which a DNA-binding domain that recognizes the target DNA sequence is fused to an effector domain that catalyzes changes to the structure or function of the target gene. The DNA recognition domain is typically based on the structure of natural DNA-binding proteins, including zinc finger proteins and transcription activator-like effectors. These targeted DNA-binding proteins can be combined with effector domains to create functional enzymes, including synthetic transcription factors, methyltransferases, integrases, nucleases, and recombinases, that modify genes in many cell types and species.
The Cys2-His2 zinc finger domain is the most common DNA-binding motif in the human proteome. It consists of approximately 30 amino acids in a ββα configuration, in which the α-helix projects into the major groove of DNA and recognizes 3–4 contiguous nucleotide bases (Figure 2A). The DNA-binding specificity of synthetic zinc finger domains has been extensively engineered through site-directed mutagenesis and rational design or through the selection of large combinatorial libraries. Collectively, this work has yielded unique zinc finger domains with specificity for almost all of the 64 possible nucleotide triplets. Significantly, the modular structure of the zinc finger motif allows several domains to be linked in tandem, so that extended sequences can be recognized and targeted in multiples of three nucleotides. As a result, it is now theoretically possible to design synthetic zinc finger proteins that bind practically any target in the genome of any species.
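As a rough illustration of how the number of fingers sets the length of the target site, the Python sketch below splits a 5'→3' target into the triplets that individual fingers would recognize and estimates how often a site of a given length would occur by chance. The estimate assumes a random genome with equal base frequencies and an approximate haploid human genome size of 3.1 × 10^9 bp; real genomes are not random, so the numbers are only an order-of-magnitude guide. The reversal in `finger_order` reflects the antiparallel binding of Cys2-His2 arrays, in which the N-terminal finger contacts the 3'-most triplet. All function names are illustrative.

```python
from itertools import product

BASES = "ACGT"

# All 4**3 = 64 triplets that a single zinc finger might be engineered to recognize.
ALL_TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def split_into_triplets(site: str) -> list[str]:
    """Split a 5'->3' target site into consecutive 3-bp subsites, one per finger."""
    site = site.upper()
    if len(site) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    if not set(site) <= set(BASES):
        raise ValueError("target must contain only A, C, G and T")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


def finger_order(site: str) -> list[str]:
    """Subsites listed from the N-terminal finger (F1) to the C-terminal finger.

    Cys2-His2 arrays bind antiparallel to the primary strand, so F1 contacts
    the 3'-most triplet; reversing the 5'->3' list gives F1, F2, F3, ...
    """
    return split_into_triplets(site)[::-1]


def expected_random_hits(site_len: int, genome_len: float = 3.1e9) -> float:
    """Expected chance matches on both strands of a random, equiprobable genome."""
    return 2 * genome_len / 4 ** site_len


if __name__ == "__main__":
    print(len(ALL_TRIPLETS))                   # 64
    print(finger_order("GAAGCTGGT"))           # ['GGT', 'GCT', 'GAA']
    print(round(expected_random_hits(9)))      # one three-finger protein: 23651
    print(round(expected_random_hits(18), 2))  # two three-finger proteins: 0.09
```

Under these assumptions, a single 9-bp site is expected to recur thousands of times in a human-sized genome, whereas an 18-bp combined site is expected to occur less than once, which illustrates why linking more fingers, or pairing two arrays, raises the effective recognition length.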
Despite the numerous successful uses of engineered zinc finger proteins for regulating and editing genes in many species and cell types, the full potential of this technology has not yet been realized. This has largely been attributed to the nuances of zinc finger protein engineering and to failed attempts to adopt the methods in new laboratories. Although the source of the challenges in zinc finger engineering is still unclear, several new methods have become available in recent years that facilitate the rapid assembly and parallel screening of numerous novel zinc finger proteins (Table 1). The modular assembly approach uses a single engineered zinc finger domain for each possible three-base-pair sequence [5,6]. A zinc finger array for any particular target sequence is designed from this library with the assistance of an online web server [6,7] and is then constructed with standard recombinant DNA methods or by commercial gene synthesis [8,9]. An alternative approach, known as “OPEN”, selects new zinc finger proteins from randomized libraries for each new target site. Although this strategy requires considerably more effort and resources than modular assembly, it has been reported to generate functional zinc finger proteins with a higher success rate [10,11]. In 2010, a method termed “context-dependent assembly” (CoDA), which recombines zinc finger domains that have previously been validated to work together, was suggested to be highly effective because it accounts for interactions between neighboring zinc fingers while maintaining the simplicity of modular assembly. An archive of optimized two-finger modules has also been described that yields functional proteins at a high rate and offers a greater targeting range than CoDA. Finally, engineered zinc finger proteins for custom targets are available commercially through Sigma-Aldrich Corporation’s CompoZr Zinc Finger Nuclease platform. This commercial service was created through a partnership with Sangamo Biosciences, Inc. (Richmond, CA) and the licensing of Sangamo’s proprietary methods for assembling zinc finger proteins.
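The core idea of modular assembly, one pre-characterized module per triplet, can be sketched as a simple table lookup. In the Python sketch below, the library contents and the module identifiers (`ZF_GAA` and so on) are hypothetical placeholders, not entries from any real archive or web server; the point is only to show how a target is decomposed and mapped to modules in N- to C-terminal order.

```python
# Hypothetical single-finger module library (triplet -> module identifier).
# The identifiers are invented for illustration; real archives use their own
# names and do not cover all 64 triplets with equally reliable modules.
MODULE_LIBRARY: dict[str, str] = {
    "GAA": "ZF_GAA",
    "GCT": "ZF_GCT",
    "GGT": "ZF_GGT",
    "GCG": "ZF_GCG",
    "TGG": "ZF_TGG",
}


def modular_assembly(site: str, library: dict[str, str]) -> list[str]:
    """Pick one module per triplet, returned in N- to C-terminal order.

    Each finger is chosen independently of its neighbours, which is the
    simplifying assumption that context-aware methods such as CoDA relax.
    """
    site = site.upper()
    if len(site) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    missing = sorted({t for t in triplets if t not in library})
    if missing:
        raise KeyError(f"no module for triplet(s): {', '.join(missing)}")
    # The N-terminal finger binds the 3'-most triplet, so iterate in reverse.
    return [library[t] for t in reversed(triplets)]


if __name__ == "__main__":
    print(modular_assembly("GAAGCTGGT", MODULE_LIBRARY))
    # ['ZF_GGT', 'ZF_GCT', 'ZF_GAA']
```

Because every finger is selected in isolation, this lookup cannot capture the interactions between neighboring fingers that motivated the OPEN and CoDA approaches described above.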
The discovery, reported in 2009, of a simple modular DNA recognition code used by transcription activator-like effectors (TALEs) created another option for engineering programmable DNA-binding proteins [14,15]. TALEs are naturally occurring DNA-binding proteins produced by plant pathogenic bacteria to regulate host gene expression. Whereas each zinc finger domain comprises approximately 30 amino acids and recognizes three base pairs, each TALE repeat typically consists of 34 amino acids and recognizes only a single base pair [14–16] (Figure 2B). The DNA-binding preference of each repeat is determined by just two hypervariable amino acids at positions 12 and 13, called the repeat-variable diresidues (RVDs) [14,15]. Like zinc finger domains, these modular TALE repeats can be linked together to recognize a specific DNA sequence and then fused to transcriptional activation domains or nuclease domains to direct enzyme activity to targeted chromosomal loci [17–20]. Several recently described protocols enable the assembly of custom TALE repeat arrays in only a few days using publicly available reagents [18,20,21] (Table 1). Custom engineered TALEs have also become available commercially through Cellectis Bioresearch (Paris, France) and Life Technologies (Grand Island, NY). The rapid progress of TALE engineering relative to the development of synthetic zinc finger proteins has led many to suggest that this protein motif may be more amenable to reengineering, possibly because of its more modular structure. However, TALEs have been much less studied than zinc finger proteins, and continued work is necessary to fully understand the relative strengths and weaknesses of these technologies.
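The one-repeat-per-base code makes TALE design a direct translation from sequence to RVDs. The Python sketch below assumes the four most commonly used RVD assignments (NI for A, HD for C, NN for G, NG for T) and the observation that NN also accepts A; it deliberately ignores context effects and the 5' thymine that typically precedes natural TALE binding sites. Function names are illustrative.

```python
# Commonly used RVD assignments; NN also recognizes A.
# Treat this table as a simplified assumption, not a complete specificity model.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASES_FOR_RVD = {"NI": {"A"}, "HD": {"C"}, "NN": {"G", "A"}, "NG": {"T"}}


def design_rvds(site: str) -> list[str]:
    """Return one RVD per base; the N-terminal repeat binds the 5'-most base."""
    site = site.upper()
    bad = set(site) - set(RVD_FOR_BASE)
    if bad:
        raise ValueError(f"invalid base(s): {sorted(bad)}")
    return [RVD_FOR_BASE[base] for base in site]


def matches(rvds: list[str], site: str) -> bool:
    """True if every repeat's RVD is predicted to accept the base it faces."""
    site = site.upper()
    return len(rvds) == len(site) and all(
        base in BASES_FOR_RVD.get(rvd, set()) for rvd, base in zip(rvds, site)
    )


if __name__ == "__main__":
    rvds = design_rvds("TCAG")
    print(rvds)                   # ['NG', 'HD', 'NI', 'NN']
    print(matches(rvds, "TCAA"))  # True, because NN also accepts A
    print(matches(rvds, "TCAC"))  # False
```

Unlike the zinc finger case, no reversal is needed: TALE repeats are read in the same direction as the target strand, and each repeat contributes exactly one base.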
Although conventional homologous recombination can be used to introduce sequence changes into the genomic DNA of some species and cell types, this process is not efficient enough for most applications in which genome editing would be useful. However, the synthetic DNA-binding proteins described above can be used to engineer nucleases that target almost any site in a cellular genome [1–3]. These nucleases create targeted double-strand breaks (DSBs) that stimulate the cell's natural DNA repair machinery. Breaks are frequently repaired by non-homologous end joining (NHEJ), an error-prone pathway that often introduces small insertions or deletions at the break site; NHEJ can therefore be used to disrupt a gene or, when two breaks are made, to excise the intervening genomic segment. Alternatively, because a DSB dramatically increases the rate of homologous recombination at that locus, a homologous donor template can be delivered to the cells along with the nuclease to add a gene at that site or to make small substitutions in the gene sequence. Recent advances in these techniques that have enabled more effective genome editing are described below.
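Gene disruption by NHEJ works largely because an insertion or deletion whose length is not a multiple of three shifts the downstream reading frame of a coding sequence. The Python sketch below models an indel at a cut site as a string edit and counts frameshifts. The evenly spread indel sizes in the example are a hypothetical assumption chosen for arithmetic clarity; real NHEJ indel spectra are not generally uniform, and in-frame indels can also disrupt function.

```python
def apply_indel(seq: str, cut: int, indel: int, filler: str = "A") -> str:
    """Model an NHEJ outcome at `cut` (an index between two bases).

    indel < 0 deletes |indel| bases immediately 3' of the cut;
    indel > 0 inserts `filler` repeated `indel` times; 0 leaves seq unchanged.
    """
    if not 0 <= cut <= len(seq):
        raise ValueError("cut site outside sequence")
    if indel < 0:
        if cut - indel > len(seq):
            raise ValueError("deletion runs past end of sequence")
        return seq[:cut] + seq[cut - indel:]
    return seq[:cut] + filler * indel + seq[cut:]


def causes_frameshift(indel: int) -> bool:
    """Indels whose length is not a multiple of 3 shift the downstream frame."""
    return indel % 3 != 0


def frameshift_fraction(indel_sizes: list[int]) -> float:
    """Fraction of non-zero indels that are frameshifts."""
    nonzero = [s for s in indel_sizes if s != 0]
    if not nonzero:
        raise ValueError("no indels supplied")
    return sum(causes_frameshift(s) for s in nonzero) / len(nonzero)


if __name__ == "__main__":
    print(apply_indel("ATGGCCAAA", 3, -2))  # ATGCAAA
    print(apply_indel("ATGGCCAAA", 3, 1))   # ATGAGCCAAA
    # Hypothetical, evenly spread indel sizes from -6 to +6 bp:
    print(frameshift_fraction(list(range(-6, 7))))  # about 0.67 (8 of 12)
```

Under the uniform assumption, roughly two thirds of indels are frameshifts, which is why a single nuclease-induced break in an early exon is often sufficient to knock out a gene in a substantial fraction of cells.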
Engineered zinc finger proteins or TALEs can be fused to the catalytic domain of a restriction endonuclease to generate zinc finger nucleases (ZFNs) or TALE nucleases (TALENs) that create a DSB at the locus of interest. Because the nuclease domain acts as a dimer, two DNA-binding proteins must be engineered to bind adjacent sequences separated by a spacer region, where the catalytic domains can dimerize and cleave the target DNA [1–3]. The catalytic domain most commonly used to induce the DSB is derived from the type IIS restriction endonuclease FokI. Several recent modifications of the FokI domain have increased its activity and specificity. First, a directed evolution strategy was used to identify a hyperactive FokI variant, named Sharkey, that increases cleavage activity in vitro and in vivo. Second, several mutations have been described that prevent unwanted FokI homodimer formation and genotoxic cleavage of off-target sequences. Obligate heterodimer variants of FokI that prevent homodimer formation had been described previously, but they had lower catalytic activity. The new mutations appear to restore the lost nuclease activity while maintaining the strict requirement for heterodimer formation. Finally, FokI variants have been described that act as orthogonal obligate heterodimers, so that independent nuclease pairs can be used together without cross-reactivity or homodimer formation.
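The paired architecture can be made concrete by splitting a target into its two half-sites and the spacer. In the Python sketch below, the left protein reads the top strand and the right protein binds the opposite strand, so its site is reported as the reverse complement. The half-site and spacer lengths are free parameters and the 6-bp values in the example are toy numbers; usable spacer lengths depend on the nuclease architecture and are not modeled. Function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pair_half_sites(target: str, left_len: int, spacer_len: int,
                    right_len: int) -> dict[str, str]:
    """Split a top-strand target into the sites of a dimeric nuclease pair.

    The left DNA-binding protein reads the top strand 5'->3'. The right
    protein binds the bottom strand, so its site is reported 5'->3' on that
    strand, i.e. as the reverse complement of the top-strand bases.
    """
    target = target.upper()
    if len(target) != left_len + spacer_len + right_len:
        raise ValueError("half-site and spacer lengths must sum to target length")
    left = target[:left_len]
    spacer = target[left_len:left_len + spacer_len]
    right_top = target[left_len + spacer_len:]
    return {"left": left, "spacer": spacer, "right": revcomp(right_top)}


if __name__ == "__main__":
    sites = pair_half_sites("GAAGCT" + "ACGTAC" + "AAACCC", 6, 6, 6)
    print(sites)  # {'left': 'GAAGCT', 'spacer': 'ACGTAC', 'right': 'GGGTTT'}
```

Writing both half-sites 5'→3' on their own strands is the form in which each DNA-binding protein would be designed, for example by the triplet or RVD mappings used for zinc fingers and TALEs.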
Although NHEJ-based repair of DSBs is sufficient for gene disruption and the deletion of chromosomal segments, the introduction of new sequences at the nuclease target site requires a homologous donor repair template [28–30]. For certain applications, constructing a homologous donor with homology arms longer than 700 base pairs may be complicated or laborious. Two new approaches simplify this strategy. First, linear donor sequences with as little as 50 base pairs of homology were efficiently integrated into sites of nuclease cleavage in human cells. Second, single-stranded DNA oligonucleotides were used with engineered nucleases, in place of a donor targeting vector, to introduce targeted point mutations, deletions, or insertions of short sequences [32,33]. Importantly, oligonucleotide-based templates contain the minimum genetic information needed to introduce DNA sequence changes, thereby reducing the risk of off-target effects.
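A donor template is, in essence, the desired edit flanked by sequence copied from either side of the cut. The Python sketch below builds that layout from a reference string; the 50-bp default arm length mirrors the short homology reported above for linear donors, and the same layout describes a short single-stranded oligonucleotide template. The function and its parameters are hypothetical, and the string model ignores strand choice and other practical design considerations.

```python
def build_donor(locus: str, cut: int, edit: str, arm_len: int = 50,
                replace_len: int = 0) -> str:
    """Return left arm + edit + right arm for a nuclease cut at index `cut`.

    `replace_len` bases immediately 3' of the cut are swapped for `edit`
    (0 gives a pure insertion). Arms are copied verbatim from `locus`.
    """
    if arm_len < 1 or replace_len < 0:
        raise ValueError("arm_len must be positive and replace_len non-negative")
    start, end = cut - arm_len, cut + replace_len + arm_len
    if start < 0 or end > len(locus):
        raise ValueError("locus too short for the requested homology arms")
    left_arm = locus[start:cut]
    right_arm = locus[cut + replace_len:end]
    return left_arm + edit + right_arm


if __name__ == "__main__":
    locus = "A" * 10 + "GGGG" + "C" * 10
    print(build_donor(locus, 10, "TT", arm_len=5))                 # AAAAATTGGGGC
    print(build_donor(locus, 10, "TT", arm_len=5, replace_len=4))  # AAAAATTCCCCC
```

The first call models a 2-bp insertion at the cut; the second replaces the 4 bases just downstream of the cut, which is how a small substitution or deletion would be encoded in the template.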
Current methods for nuclease-mediated genome editing cannot direct repair exclusively through either the NHEJ or the homologous recombination pathway. Consequently, including a donor vector in the nuclease treatment yields a cell population containing a mixture of cells modified by both pathways. To better monitor this process, a “traffic light” reporter system was created in which NHEJ and homologous recombination events are distinguished by flow cytometric analysis of green and red fluorescence. This study also identified factors that bias the balance between the two repair pathways and confirmed earlier findings that creating single-strand breaks with a nickase, in contrast to DSBs with a nuclease, favors homology-directed repair and minimizes NHEJ [36–38]. Episomal fluorescent reporters of gene repair were later used to enrich, by flow cytometric cell sorting, for cells modified at their endogenous locus. Other methods used to improve the efficiency of genome editing include regulating nuclease activity with small molecules to minimize toxicity and applying transient hypothermia to increase nuclease expression levels.
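Reading out a two-color reporter amounts to gating each cytometry event on two fluorescence thresholds and tallying the quadrants. The Python sketch below assumes that green-only events indicate homology-directed repair and red-only events indicate NHEJ, following the published traffic light reporter design; the gate values and the `Event` type are hypothetical. Negative events are not necessarily unmodified, because such a reporter detects only the subset of repair outcomes that produce a fluorescent protein.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow cytometry event (arbitrary fluorescence units)."""
    green: float
    red: float


def classify_events(events: list[Event], green_gate: float,
                    red_gate: float) -> dict[str, float]:
    """Return the fraction of events in each gate.

    Assumed mapping: green only -> homology-directed repair (HDR),
    red only -> NHEJ. "negative" means below both gates, which does not
    imply the reporter locus is unmodified.
    """
    if not events:
        raise ValueError("no events supplied")
    counts = {"HDR": 0, "NHEJ": 0, "double_positive": 0, "negative": 0}
    for e in events:
        g, r = e.green >= green_gate, e.red >= red_gate
        if g and r:
            counts["double_positive"] += 1
        elif g:
            counts["HDR"] += 1
        elif r:
            counts["NHEJ"] += 1
        else:
            counts["negative"] += 1
    return {k: v / len(events) for k, v in counts.items()}


if __name__ == "__main__":
    sample = [Event(10, 0), Event(0, 10), Event(0, 0), Event(1, 1)]
    print(classify_events(sample, green_gate=5.0, red_gate=5.0))
    # {'HDR': 0.25, 'NHEJ': 0.25, 'double_positive': 0.0, 'negative': 0.5}
```

In practice, gates are set from untreated controls; the fixed thresholds here only stand in for that step.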
The usefulness of genome editing technologies depends largely on achieving single-site specificity in the context of large and complex genomes. However, it is challenging to prove that no other sequences across the whole genome are unintentionally modified. This is particularly important given the observed cytotoxicity of many nucleases, presumably due to off-target DNA cleavage. Analysis had previously been limited to predicting potential off-target sites from in vitro binding profiles. To address these concerns, new methods have been developed for comprehensive mapping of nuclease activity in vitro and in vivo [44,45]. Although the ZFNs analyzed in these studies acted primarily at their intended target site, many previously unknown off-target sites were also revealed. These off-target sites had high sequence homology to the intended ZFN binding site, and these mapping methods should therefore be valuable for designing improved nucleases. Additionally, advances in high-throughput DNA sequencing have enabled direct analysis of genomes at single-base-pair resolution. For example, a recent study sequenced the complete exome of a ZFN-treated clonal cell population and found that only a single point mutation had been created in this process.
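Because off-target sites tend to resemble the intended site, a first-pass computational screen simply looks for near-matches. The Python sketch below is a naive brute-force scan of both strands for windows within a chosen number of mismatches; it is not one of the experimental mapping methods cited above, it treats a single binding site rather than a nuclease pair with its spacer, and its running time grows with genome length times site length, so genome-scale searches would use indexed aligners instead. Function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(genome: str, site: str,
                          max_mismatches: int) -> list[tuple[int, str, str, int]]:
    """Brute-force scan of both strands for sites within `max_mismatches`.

    Returns (top-strand start, strand, top-strand window, mismatches),
    sorted by mismatch count and then by position.
    """
    genome, site = genome.upper(), site.upper()
    k = len(site)
    queries = (("+", site), ("-", revcomp(site)))
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for strand, query in queries:
            mm = mismatches(window, query)
            if mm <= max_mismatches:
                hits.append((i, strand, window, mm))
    return sorted(hits, key=lambda h: (h[3], h[0]))


if __name__ == "__main__":
    genome = "TTTT" + "GAAGCTGGT" + "CCCC" + "GAAGCTGAT" + "TTTTT"
    for hit in candidate_off_targets(genome, "GAAGCTGGT", max_mismatches=1):
        print(hit)
    # (4, '+', 'GAAGCTGGT', 0)
    # (17, '+', 'GAAGCTGAT', 1)
```

The second hit differs from the intended site at a single position, the kind of close homolog that a screen like this would flag for experimental follow-up.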
The advent of genome editing has created a variety of new approaches that are progressively becoming routine methods for interrogating biological systems. The availability of commercial and public custom nucleases has facilitated novel studies of protein glycosylation, gene destabilization, protein localization and dynamics [49,50], chromosomal translocation, and DNA repair. Genome editing tools can be used to model human disease or to generate human or mouse isogenic cell lines that allow robust and uniform transgene expression. Traditionally, gene targeting in animal models has been largely limited to mice, but engineered nucleases have enabled targeted gene modifications in rats, pigs, zebrafish [58,59], frogs, rabbits, cattle, flies, and worms. Furthermore, gene targeting can be performed directly in zygotes, without the need for embryonic stem cells. Genome editing in plants is providing new opportunities in agricultural biotechnology for the production of food and biofuels [65,66]. Finally, genome editing has been applied to generate apoptosis-resistant mammalian cell lines for improved biopharmaceutical production.
The field of gene therapy has typically focused on adding new genes to cells, an approach that has faced a variety of challenges and obstacles. Genome editing provides several distinct means of addressing the limitations of previous gene therapy approaches. First, transgenes can be added to specific “safe harbor” loci in the genome [68,69], in contrast to conventional gene delivery vectors that integrate randomly into chromosomal DNA. This approach was recently explored as a gene therapy for chronic granulomatous disease. Alternatively, disease-causing mutations can be directly corrected by genome editing, as has been done in studies of X-linked SCID, α1-antitrypsin deficiency, sickle cell anemia [72,73], hemophilia B, and p53-related cancer. Genes may also be disrupted by genome editing to produce therapeutic phenotypes. For example, the HIV co-receptor CCR5 has been disrupted in both T cells and hematopoietic stem cells, thereby blocking HIV entry. Clinical trials of this approach are underway (NCT00842634, NCT01044654, and NCT01252641), and at the time of this review, data from the first phase 1 clinical trial have demonstrated improvement in several clinical parameters, with the treatment being well tolerated. The HIV co-receptor CXCR4 has also been targeted in similar preclinical studies. Furthermore, genome editing has been used to enhance cellular immunotherapy by disrupting endogenous T cell receptors [78,79] or, in a clinical trial for malignant glioma (NCT01082926), the glucocorticoid receptor in T cells. Finally, successful genome editing in human embryonic stem cells and induced pluripotent stem cells has opened new avenues for genetic correction or augmentation in regenerative medicine [68,80–82].
Genome editing is rapidly progressing towards a golden era of easily accessible, highly specific enzymes that can directly manipulate genomic targets of interest. In the last two years, there has been an explosion in the number and diversity of applications of this technology (Figure 1). Collectively these advances represent a paradigm shift in the way we manipulate and study complex genomes and cellular processes.
Several challenges and opportunities remain as these technologies move towards widespread adoption. A large-scale study of the in vitro and in vivo DNA-binding properties of TALEs relative to zinc finger proteins would provide insight into the differential activity and capacity for reengineering of these scaffolds. Much work remains to further improve both the specificity of engineered nucleases and the methods used to monitor off-target events, and advances in high-throughput sequencing are facilitating these efforts [43,45,46]. The structure and epigenetic state of the genomic target site are likely as important as the engineered DNA-binding proteins, yet this subject has been largely understudied. The continued development of methods for controlling DNA repair mechanisms will enhance the robustness of genome editing and the uniformity of modified alleles [34–38]. Improved methods for efficient nuclease delivery, particularly in vivo, will be essential for translating genome editing into gene therapies. Recent evidence that AAV-based homologous donor vectors enhance homology-directed repair provides a promising path forward in this area [83–86]. Finally, the development of genome editing methods that do not depend on DNA repair pathways, such as zinc finger recombinases [87–91], may ultimately improve the safety and specificity of genome editing.
This work was supported by The Hartwell Foundation, a Basil O’Connor Starter Scholar Award from the March of Dimes, an NSF Faculty Early Career Development (CAREER) Award (1151035) and an NIH Director’s New Innovator Award (1DP2OD008586).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.