The ability to direct functional domains to specific DNA sequences is a long sought-after goal for studying and engineering biological processes. Transcription activator like effectors (TALEs) from Xanthomonas sp. present a promising platform for designing sequence-specific DNA binding proteins. Here we describe a robust and rapid method for overcoming the difficulty of constructing TALE repeat domains. We synthesized 17 designer TALEs (dTALEs) that are customized to recognize specific DNA binding sites, and demonstrate that dTALEs can specifically modulate transcription of endogenous genes (Sox2 and Klf4) from the native genome in human cells. dTALEs provide a designable DNA targeting platform for the interrogation and engineering of biological systems.
Systematic interrogation and engineering of biological systems in normal and pathological states depend on the ability to manipulate the genome of target cells with efficiency and precision1, 2. Some naturally occurring DNA binding proteins have been engineered to enable sequence-specific DNA perturbation, including designer polydactyl zinc finger (ZFs)3–5 and meganuclease6, 7 proteins. In particular, designer ZFs can be attached to a wide variety of effector domains such as nucleases, transcription effectors, and epigenetic modifying enzymes to carry out site-specific modifications near their DNA binding site. However, due to the lack of a simple correspondence between amino acid sequence and DNA recognition, design and development of sequence-specific DNA binding proteins based on designer ZFs and meganucleases remain difficult and expensive, often involving elaborate screening procedures and long development time on the order of several weeks. Here we developed an alternative DNA targeting platform based on the naturally occurring transcription activator-like effectors (TALEs) from Xanthomonas sp.8–11
TALEs are natural effector proteins secreted by numerous species of Xanthomonas to modulate host gene expression and facilitate bacterial colonization and survival9, 11. Recent studies of TALEs have revealed an elegant code linking the repetitive region of a TALE with its target DNA binding site8, 10. Common among the entire family of TALEs is a highly conserved and repetitive region within the middle of the protein, consisting of tandem repeats of mostly 33 or 34 amino acid segments (Fig. 1a). Repeat monomers differ from each other mainly in amino acid positions 12 and 13 (variable diresidues), and recent computational and functional analyses9, 11 have revealed a strong correlation between unique pairs of amino acids at positions 12 and 13 and the corresponding nucleotide in the TALE binding site (e.g. NI to A, HD to C, NG to T, and NN to G or A; Fig. 1a). The existence of this strong association suggests a potentially designable protein with sequence-specific DNA binding capabilities, and the possibility of applying designer TALEs to specify DNA binding in mammalian cells. However, our ability to test the modularity of the TALE DNA binding code remains limited due to the difficulty in constructing custom TALEs with specific tandem repeat monomers. Early studies have tested the DNA binding properties of TALEs8, 12–16, including two studies that tested artificial TALEs with customized repeat regions13, 14.
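The correspondence quoted above can be written down as a simple lookup. The following is a minimal sketch, not the authors' design software: the names BASE_TO_RVD and rvds_for_target are hypothetical, and the choice of NN for every G is an assumption based only on the associations listed in the text.

```python
# Variable diresidue (repeat positions 12 and 13) chosen for each target
# nucleotide, following the associations quoted in the text.
# Assumption: NN is used for G, although the text notes NN also pairs with A.
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvds_for_target(target):
    """Return one diresidue per nucleotide of the target sequence."""
    target = target.upper()
    unknown = sorted(set(target) - set(BASE_TO_RVD))
    if unknown:
        raise ValueError(f"unsupported nucleotide(s): {unknown}")
    return [BASE_TO_RVD[base] for base in target]


print(rvds_for_target("ACGT"))  # ['NI', 'HD', 'NN', 'NG']
```

Because NN is associated with both G and A, a repeat domain designed this way is expected to be less selective at G positions than at the other three nucleotides.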
A prerequisite for exploring the modularity of TALE repeat monomers is the ability to synthesize designer TALEs with tailored repetitive DNA binding domains. While this has been recently shown to be possible13, 14, 17, the repetitive nature of the TALE DNA binding domains renders routine construction of novel TALEs difficult when using PCR-based gene assembly or serial DNA ligation, and these approaches may not be amenable to high-throughput TALE synthesis. Furthermore, even though commercial services can be employed for the synthesis of novel TALE binding domains14, they present a cost-prohibitive option for large scale TALE construction and testing. Hence a more robust protocol to construct large numbers of designer TALEs would enable ready perturbation of any genome target in many organisms.
To enable high-throughput construction of designer TALEs, we developed a reliable hierarchical ligation-based strategy to overcome the difficulty of constructing TALE tandem repeat domains (Fig. 1b and Supplementary Methods). To reduce the repetitiveness of designer TALEs and to facilitate amplification using PCR, we first optimized the DNA sequence of the four repeat monomers (NI, HD, NN, NG) to minimize repetitiveness while preserving the amino acid sequence. In order to assemble the individual monomers in a specific order, we altered the DNA sequence at the junction between each pair of monomers, similar to the Golden Gate cloning strategy for multi-piece DNA ligation18, 19. Using different codons to represent the junction between each pair of monomers (Gly-Leu), we designed unique 4 base pair sticky-end ligation adapters for each junction (Supplementary Methods). Using this strategy, 4 monomers can be ligated simultaneously to form 4-mer tandem repeats. Three 4-mer repeats can be simultaneously ligated to form the desired 12-mer tandem repeat and subsequently ligated into a backbone vector containing a 0.5 length repeat monomer specifying the 13th nucleotide of the binding site at the C-terminus of the repeat domain, as well as the N- and C-terminal non-repetitive regions from the Xanthomonas campestris pv. armoraciae TALE hax3 (Fig. 1b).
Using this method, we attempted to construct 17 artificial TALEs with specific combinations of 12.5-mer repeats to target 14 base pair DNA binding sites – TALEs require the first letter of the binding site to be a T, and the 12.5-mer repeat targets a 13 bp binding site. We analyzed 2 clones for each of the 17 dTALEs via sequencing and found that all dTALEs were accurately assembled. Furthermore, we were able to construct 16 dTALEs in parallel in 3 days, a time substantially shorter than what is required for constructing a similar number of dTALEs via commercial DNA synthesis, and at a fraction of the cost.
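The relationship between a 14 bp binding site and the pieces that are ligated can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name assembly_plan, the example site, and the use of NN for every G are hypothetical, and the adapter sequences that enforce monomer order are not modeled.

```python
# Diresidue chosen per target nucleotide (assumption: NN is used for G).
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def assembly_plan(site):
    """Split a 14 bp site into three 4-mer blocks plus the half repeat.

    The first base must be T and is not specified by a repeat monomer.
    Bases 2-13 map to the 12-mer (three 4-mers); base 14 maps to the
    0.5 length repeat carried by the backbone vector.
    """
    site = site.upper()
    if len(site) != 14:
        raise ValueError("expected a 14 bp binding site")
    if site[0] != "T":
        raise ValueError("binding site must start with T")
    if set(site) - set(BASE_TO_RVD):
        raise ValueError("binding site must contain only A, C, G, T")
    rvds = [BASE_TO_RVD[base] for base in site[1:]]  # 13 diresidues
    four_mers = [rvds[i:i + 4] for i in range(0, 12, 4)]
    half_repeat = rvds[12]
    return four_mers, half_repeat


# Hypothetical site, chosen only to illustrate the split.
blocks, half = assembly_plan("TACGTACGTACGTA")
for number, block in enumerate(blocks, start=1):
    print(f"4-mer {number}: {'-'.join(block)}")
print(f"half repeat: {half}")
```

In the two-stage scheme described above, each 4-mer block corresponds to one first-round ligation, and the three blocks are joined in the second round before insertion into the backbone.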
The DNA binding code of TALEs was identified based on analysis of TALE binding sites in plant genomes8, 10, and the binding specificity of TALEs has been analyzed using various in vitro and in vivo methods8, 14–17, 20–22. In order to determine whether this code can be used to target DNA in mammalian cells, we designed a fluorescence-based reporter system (Supplementary Fig. 1) by placing the DNA binding site for each dTALE upstream of a minimal CMV promoter driving the fluorescence reporter gene mCherry (Fig. 1c). To generate dTALE transcription factors, we replaced the endogenous nuclear localization signal (NLS) and acidic transcription activation domain (AD) of wild type hax3 with a mammalian NLS derived from the simian virus 40 large T-antigen and the synthetic transcription activation domain VP64 (Fig. 1c). To allow quantitative comparison of dTALE activity, we also fused a self-cleaving green fluorescent protein (GFP) to the C-terminus of each dTALE, so that we could quantify the relative level of dTALE expression using GFP fluorescence measurements.
Co-transfection of a dTALE (dTALE1) and its corresponding reporter plasmid in the human embryonic kidney cell line 293FT led to robust mCherry fluorescence (Fig. 1d). In contrast, transfection of 293FT cells with the reporter construct alone did not yield appreciable levels of fluorescence (Fig. 1d). Therefore, dTALEs are capable of recognizing their target DNA sequences, as predicted by the TALE DNA binding code, in mammalian cells. We quantified the level of reporter induction by measuring the ratio of total mCherry fluorescence intensity between cells co-transfected with a dTALE and its corresponding reporter plasmid, and cells transfected with the reporter plasmid alone. To account for differences in dTALE expression level, we used the total GFP fluorescence from each dTALE transfection as a normalization factor when assessing the fold induction of the reporter.
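The quantification described above can be illustrated with a short calculation. This is only a sketch: the text does not give the exact form of the GFP normalization, so dividing the fold induction by GFP expression relative to a reference transfection is an assumption, and the function names and numbers are hypothetical.

```python
def fold_induction(mcherry_cotransfected, mcherry_reporter_only):
    """Ratio of total mCherry signal: dTALE plus reporter over reporter alone."""
    if mcherry_reporter_only <= 0:
        raise ValueError("reporter-only signal must be positive")
    return mcherry_cotransfected / mcherry_reporter_only


def gfp_normalized_fold(fold, gfp_total, gfp_reference):
    """Scale fold induction by relative dTALE expression.

    Assumption: expression is taken as total GFP divided by the GFP of a
    chosen reference transfection; the source does not state the formula.
    """
    if gfp_total <= 0 or gfp_reference <= 0:
        raise ValueError("GFP signals must be positive")
    return fold / (gfp_total / gfp_reference)


# Hypothetical fluorescence totals in arbitrary units.
raw = fold_induction(500.0, 10.0)
print(raw)                                  # 50.0
print(gfp_normalized_fold(raw, 2.0, 1.0))   # 25.0
```

Under this assumed normalization, a dTALE expressed at twice the reference level has its apparent induction halved, so that constructs are compared per unit of expressed protein.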
Next we asked whether the DNA recognition code is sufficiently modular so that dTALEs could be customized to target any DNA sequence of interest. We first synthesized 13 distinct dTALEs targeting a range of DNA binding sites with diverse DNA sequence compositions (Fig. 2a) and found that 10 out of 13 dTALEs (77%) drove robust mCherry expression (>10-fold) from their corresponding reporters. Three dTALEs exhibited more than 50-fold reporter induction (dTALE1, dTALE4 and dTALE8), and only one out of 13 dTALEs (dTALE11) generated less than 5-fold induction of mCherry reporter expression (Fig. 2a). As a positive control, we constructed an artificial zinc finger-VP64 (ZF-VP64) fusion, where the ZF has previously been shown to activate transcription from a binding site in the human erbB-2 promoter23. This artificial ZF-VP64 protein was tested using the same mCherry reporter assay and demonstrated approximately 16-fold mCherry reporter activation (Fig. 2a).
These data indicate that sequence-specific dTALEs can be designed and synthesized to target a wide spectrum of DNA binding sites at a similar or greater level as artificial ZF-VP64 transcription factors. While most dTALEs exhibited robust transcription activation in our reporter assay, the large range of observed activity suggests that other effects might contribute to dTALE DNA-targeting efficacy. Possible causes might include differences in DNA-interacting capabilities such as binding strength of individual repeat types, context-dependence of monomer binding strength, or complexities of mammalian transcription processes22.
To further characterize the robustness of dTALE activity and DNA binding specificity, we altered the target nucleotides in the binding sites of dTALE1 and dTALE13 to test the impact of mismatch position and number on dTALE activity. In general, we found that dTALE activity is inversely correlated with the number of mismatches (Supplementary Figs. 2 and 3). However, the specific dTALE recognition rules most likely depend on a combination of positional and contextual effects as well as the number of mismatches8, 15, 16, 22, and need to be characterized in greater detail.
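Counting mismatches between a repeat domain and an altered binding site can be sketched as below. This is a simplified illustration that scores a site only by the one-to-one code quoted earlier (with NN accepting G or A); it does not capture the positional and contextual effects noted above, and the names RVD_RECOGNIZES and mismatch_positions are hypothetical.

```python
# Nucleotides each diresidue is associated with, per the code in the text.
RVD_RECOGNIZES = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NN": {"G", "A"},
}


def mismatch_positions(rvds, site):
    """Return 1-based positions where the site base is not recognized.

    `site` lists only the bases read by the repeats, so it excludes the
    leading T of the binding site.
    """
    if len(rvds) != len(site):
        raise ValueError("need exactly one base per repeat")
    return [
        position
        for position, (rvd, base) in enumerate(zip(rvds, site.upper()), start=1)
        if base not in RVD_RECOGNIZES[rvd]
    ]


repeats = ["NI", "HD", "NN", "NG"]
print(mismatch_positions(repeats, "ACGT"))  # []
print(mismatch_positions(repeats, "ACAT"))  # [] because NN also pairs with A
print(mismatch_positions(repeats, "TCGA"))  # [1, 4]
```

The length of the returned list gives the mismatch number and its entries give the mismatch positions, the two variables examined in the experiments above.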
Each fully assembled dTALE has more than 800 amino acids. Therefore we sought to identify the minimal N- and C-terminal capping regions necessary for DNA binding activity. We used Protean (LASERGENE) to predict the secondary structure of the TALE N- and C-termini, and truncations were made at predicted loop regions. We first generated a series of N-terminal dTALE1 truncation mutants and found that transcriptional activity decreased as more of the N-terminus was removed (Fig. 2b,c). Deletion of 48 amino acids from the N-terminus (truncation mutant N1-C0, Fig. 2c) retained the same level of transcriptional activity as dTALE1 with the full-length N-terminus, while deletion of 141 amino acids from the N-terminus (truncation mutant N2-C0, Fig. 2c) retained approximately 80% of transcriptional activity. Because the N1 truncation retained full transcriptional activity, we used truncation position N1 for all subsequent studies.
Similar truncation analysis in the C-terminus revealed that a critical element for DNA binding resides within the first 68 amino acids (Fig. 2b,d). Truncation mutant N1-C3 retained the same level of transcriptional activity as the full C-terminus, whereas truncation mutant N1-C4 reduced dTALE1 activity by more than 50% (Fig. 2d). Therefore, in order to preserve the highest level of dTALE activity, approximately 68 amino acids of the C-terminus of hax3 should be preserved.
The modularity of the TALE code is ideal for designing artificial transcription factors for transcriptional manipulation from the mammalian genome. In order to test whether dTALEs could be used to modulate transcription of endogenous genes, we designed 4 additional dTALEs to directly activate transcription of Sox2, Klf4, c-Myc, and Oct4. dTALE binding sites were selected from the proximal 200 bp promoter region of each gene (Fig. 3a). To assay the DNA binding activity of the 4 new dTALEs, we used the mCherry reporter assay as in previous experiments. Three out of four dTALEs (Sox2-dTALE, Klf4-dTALE, and cMyc-dTALE) exhibited greater than 20-fold mCherry reporter activation (Fig. 3a,b).
To test the activity of dTALEs on endogenous genes, we transfected each dTALE into 293FT cells and quantified mRNA levels of each target gene using qRT-PCR. Sox2-dTALE and Klf4-dTALE upregulated their respective target genes by 5.5 ± 0.1 and 2.2 ± 0.1 fold (Fig. 3c), demonstrating that dTALEs can be used to modulate transcription from the genome. To control for specificity of activation, we transfected 293FT cells in parallel with dTALE1, which was not designed to target either Sox2 or Klf4, and found no change in the level of Sox2 or Klf4 expression relative to the mock control. Interestingly, we observed a statistically significant decrease in the level of Klf4 mRNA in 293FT cells transfected with Sox2-dTALE (approximately a 2-fold reduction). This is potentially due to secondary cross-regulation among reprogramming factors24, 25. Finally, it is worth noting that cMyc-dTALE and Oct4-dTALE did not upregulate their target genes (data not shown). This is not surprising, as different genetic loci may not be equally accessible for activation, possibly due to epigenetic repression. Together, the data demonstrate that dTALEs can be designed to bind and specifically activate transcription from the promoters of endogenous mammalian genes.
The modular nature of the TALE DNA recognition code provides a novel and attractive solution for achieving sequence-specific DNA interaction in mammalian cells. For the first time, sequence-specific DNA binding proteins with predictable binding specificity can be generated in a matter of days, economically using molecular biology methods accessible to most. Future studies exploring the molecular basis of TALE-DNA interaction will likely extend the modular nature of the TALE code for increased precision, specificity, and robustness. Given the ability of dTALEs to efficiently anchor transcription effector modules to endogenous genomic targets, other functional modules, including nucleases26, recombinases27, and epigenetic modifying enzymes2, can be similarly targeted to specific binding sites. The designer TALE toolbox will empower researchers, clinicians, and technologists alike with a new repertoire of programmable precision genome engineering technologies.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
Note: Supplementary information is available on the Nature Biotechnology website.
This work was supported by the Harvard Society of Fellows (F.Z.), National Human Genome Research Institute Center for Excellence in Genomics Science (P50 HG003170, G.M.C.), Department of Energy Genomes to Life (DE-FG02-02ER63445, G.M.C.), Defense Advanced Research Projects Agency (W911NF-08-1-0254, G.M.C.), the Wyss Institute for Biologically Inspired Engineering (G.M.C.), and NIH Transformative R01 (R01 NS073124-01, for F.Z. and P.A.). S.L. was partially supported by a predoctoral fellowship from the European School of Molecular Medicine (S.E.M.M.). We thank the entire Church and Arlotta laboratories for discussion and support.
Competing Financial Interests
The authors declare no competing financial interests.
Author Contributions
F.Z. and L.C. conceived the study. F.Z., L.C., S.L., and S.K. designed, performed, and analyzed all experiments. P.A. supervised the work of S.L., and G.M.C. supervised the work of F.Z., L.C., and S.K. G.M.C., P.A., and F.Z. provided support for this study. F.Z. and L.C. wrote the manuscript with support from all authors.