TAL effectors are proteins secreted by bacterial pathogens into plant cells, where they enter the nucleus and activate expression of individual genes. TAL effectors display a modular architecture that includes a central DNA-binding region comprising a tandem array of nearly identical repeats that are almost all 34 residues long. Residue number 13 in each TAL repeat (one of two consecutive polymorphic amino acids that are termed ‘repeat variable diresidues’, or ‘RVDs’) specifies the identity of a single base; collectively the sequential repeats and their RVDs dictate the recognition of sequential bases along one of the two DNA strands. The modular architecture of TAL effectors has facilitated their extremely rapid development and application as artificial gene targeting reagents, particularly in the form of site-specific nucleases. Recent crystallographic and biochemical analyses of TAL effectors have established the structural basis of their DNA recognition properties and provide clear directions for future research.
TAL effectors are trans-kingdom transcription factors that are secreted by plant pathogenic bacteria in the genus Xanthomonas [1,2]. Diseases caused by the many species and pathovars of Xanthomonas collectively affect a wide variety of plants, including several major crop and ornamental species, and their TAL effectors play critical roles in determining whether the bacterium is able to infect its host. The first TAL effector identified was AvrBs3 from Xanthomonas campestris pv. vesicatoria (a pathogen of pepper). AvrBs3 triggers a plant immune response in strains of pepper that carry the disease resistance gene Bs3. First characterized genetically, AvrBs3 activity was shown to correspond to a DNA fragment on a self-transmissible plasmid that encoded a 125 kilodalton protein on one strand and an 82 kilodalton protein on the opposite strand. A comparison of the two ‘mirror’ reading frames led to the observation that “a remarkable feature of both ORFs is the presence of 17 direct 102 bp repeats which [within each ORF] share 91% to 100% homology with each other”. It was subsequently demonstrated that the open reading frame on the first strand was the avrBs3 gene, providing the archetypal amino acid sequence for this protein class.
Subsequent studies revealed members of the avrBs3 family in a variety of Xanthomonas species, including several that, like avrBs3, act as avirulence factors corresponding specifically to different host resistance genes [6,7], others that contribute to the pathogen’s ability to cause disease in susceptible plants, and some that can play either role depending on whether the plant carries the corresponding resistance gene [9,10]. Members of this protein family are also broadly distributed among diverse isolates of the plant pathogenic bacterium Ralstonia solanacearum, though these are not yet well characterized.
The first clue to the mechanism of TAL effector function came from the observation that the proteins contain functional nuclear localization signals (NLS), shown using a reporter fusion transiently expressed in onion epidermal cells. Shortly thereafter, localization of AvrBs3 itself to the nucleus following delivery by the pathogen during infection was observed and shown to be required for triggering host immunity. Identification of a C-terminal acidic activation domain in the members of the AvrBs3 family, demonstration of the functionality of this domain in a yeast assay, and discovery of the ability of the proteins to bind DNA [14–16] provided further clues regarding the protein domain architecture and interactions displayed by these genetic factors (Figure 1) and gave rise to the moniker “TAL,” which stands for “transcription activator-like”.
In 2004, a genetic study indicated that mutations in the gamma subunit of the general transcription factor IIA (TFIIA) could confer resistance to Xanthomonad infections, thereby suggesting a possible point of interaction between TAL effectors and the plant host transcriptional machinery. Subsequent studies demonstrated that activation of individual plant genes by TAL effectors is linked either to resistance or to susceptibility [20–22] to infection. A pair of reports in 2007 further demonstrated that the avirulence protein AvrBs3 can elicit either a resistance phenotype in pepper plants via direct transcriptional activation of the cognate resistance gene Bs3 [23•] or a susceptibility phenotype in the same species (in the absence of Bs3) by activating several genes, including the cell-size regulator UPA20 [24•]. This analysis resulted in a description of the “upregulated by AvrBs3” (UPA) box: a nucleotide sequence that is conserved among the promoter regions of all the AvrBs3 target genes and is required for activation by AvrBs3. Together, these studies established unequivocally that TAL effectors are trans-kingdom transcription factors.
The number of repeats found in TAL effectors varies from five to over thirty, with an average of roughly 17. Almost all are 34 amino acids in length, and they vary primarily in the identity of the residues at positions 12 and 13 in each repeat, a pair of residues that was termed the ‘repeat variable diresidue’ or ‘RVD’. The repeat region always terminates with an apparently truncated repeat, containing the first 20 residues (including the RVD), that is commonly referred to as a ‘half repeat’. Overall, at least two dozen unique RVDs are observed across the known TAL effectors, of which seven are most common: HD, NG, HG, NN, NS, NI and ‘N*’. N* corresponds to a 33 residue repeat with a missing residue within the RVD loop. Two research groups independently demonstrated, in papers published back-to-back in 2009, that the string of RVDs in a TAL effector defines the length and nucleotide sequence of that effector’s DNA target, via a one-to-one correspondence of specific RVDs to specific nucleotides [25••,26••]. For example, the presence of an ‘HD’ RVD within a repeat corresponds to recognition of a cytosine, whereas an ‘NG’ or ‘HG’ RVD corresponds to thymine. The modular nature of this recognition mechanism suggested that it could be exploited as a ‘code’ to predict TAL effector DNA binding sites, and to create gene targeting proteins using custom arrays of TAL effector repeats.
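The one-to-one ‘code’ described above can be illustrated with a short Python sketch that reads a string of RVDs and returns the predicted target strand. This is a minimal illustration and not a published prediction tool: the function name and the example RVD array are hypothetical, the assignments for HD, NG and HG come from the text, NN is written as the IUPAC purine code ‘R’ and N* as ‘N’ (any base), and the entries for NI (adenine) and NS (any base) are assumptions taken from the wider literature rather than from this section.

```python
# Illustrative RVD-to-base table using IUPAC nucleotide codes.
# HD, NG and HG follow the text; NN (purine) and N* (any base) follow the
# structural discussion; NI -> A and NS -> N are assumptions from the
# wider literature and are included only for illustration.
RVD_TO_BASE = {
    "HD": "C",
    "NG": "T",
    "HG": "T",
    "NN": "R",  # purine: G or A
    "NI": "A",
    "NS": "N",  # any base
    "N*": "N",  # any base
}


def predict_target(rvds):
    """Return the predicted target strand (5' to 3') for an ordered list of RVDs.

    Each repeat is assumed to read exactly one base, in order, on one strand.
    """
    bases = []
    for rvd in rvds:
        if rvd not in RVD_TO_BASE:
            raise ValueError(f"no base assignment for RVD {rvd!r}")
        bases.append(RVD_TO_BASE[rvd])
    return "".join(bases)


if __name__ == "__main__":
    # A made-up four-repeat array, not taken from any natural effector.
    print(predict_target(["HD", "NG", "NN", "NI"]))
```

Because some RVDs tolerate more than one base, the output is a degenerate sequence rather than a single site; a more realistic predictor would weight each RVD-base pairing instead of treating it as all-or-nothing.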
Recent structural studies of TAL effectors (Figure 2), published side-by-side in early 2012, provided a clear view of the structural basis for the DNA recognition ‘code’ described above [27••,28••]. The first structure, of an artificially engineered TAL effector termed ‘dHAX3’, corresponded to a 533 residue construct containing 11 canonical repeats and the half repeat, representing three of the most common RVDs (HD, NG and NS). It was solved in the presence and absence of bound DNA to high resolution (1.85 and 2.4 Å, respectively; PDB entries 3V6T and 3V6P). The second structure, of the naturally occurring TAL effector PthXo1 from X. oryzae (PDB entry 3UGM), was solved using a high-throughput computational structure prediction and phasing strategy [29•]. Although the PthXo1 structure was only solved in the presence of bound DNA, and to much lower resolution (dmin = 3 Å), it contains over 20 repeats bound to two full turns of DNA and illustrated protein-DNA contacts for six separate types of RVDs (HD, NG, HG, NN, NI, and N*) [28••]. The PthXo1 structure also contains two highly basic ‘cryptic’ repeats located at the N-terminus that engage the DNA backbone and an essential 5′ thymine residue that immediately precedes the RVD-specified target nucleotides. Because PthXo1 was crystallized in association with its naturally occurring target DNA sequence, the complex contained several examples of RVD-nucleotide mismatches, such as NG versus cytosine, which provided further insight into the structural and biochemical determinants of RVD nucleotide specificity.
The structures both demonstrate that each TAL repeat forms a left-handed, two-helix bundle, in which the two hypervariable residues in each repeat (at positions 12 and 13) are found at the end of the loop that connects the two helices (Figure 3a). The individual repeats carry a relatively neutral overall charge and self-associate to form a right-handed superhelix that wraps around the DNA major groove along the entire length of the DNA target site. The DNA in both structures adopts an unperturbed canonical B-form duplex conformation. The structure of dHAX3 in the absence of DNA indicates that the effector displays a more extended, slightly unwound conformation, although the protein still displays a right-handed superhelical structure with a slightly longer distance separating individual RVDs [27••]. Modeling the conformation of DNA-free dHAX3 around a DNA duplex indicates that a significantly more extended effector conformation might be required for a DNA target search by the unbound protein. That hypothesis agrees with published small angle X-ray scattering (SAXS) data on the full-length PthA TAL effector in the presence and absence of bound DNA, which indicated at least a two-fold reduction in the length of the effector upon target site binding.
Both crystal structures also demonstrated that sequence-specific contacts between the effector and the DNA are formed solely by the second residue of each RVD (at position 13 in each repeat) to atoms on the major groove edge of each base on a single contiguous strand of the DNA target. In contrast, the first residue in each RVD (position 12, which is usually occupied by an asparagine or a histidine) serves a largely structural role, forming a hydrogen bond between the side chain and the backbone carbonyl oxygen from position 8 (in the first helix) in each repeat. Those contacts likely help establish a pre-bound conformation of the DNA-contacting RVD loop that ameliorates the substantial entropic cost of binding that would otherwise accompany the ordering of each TAL repeat along the entire length of its target.
The majority of observed contacts between individual RVD residues at position 13 and their corresponding nucleotide bases (Figure 3a) represent interactions that are optimized via either (i) directional hydrogen bonds for recognition of nucleotide bases (such as HD to cytosine or NN to a purine ring); (ii) highly complementary packing in the absence of hydrogen bonds (such as between the backbone alpha carbon of a glycine in NG or HG and the extracyclic methyl carbon of a thymine base); or (iii) interactions that appear to achieve reduced (but not completely negligible) specificity through steric exclusion of alternate bases (in particular, the ‘NI’ RVD). A fourth type of interaction is represented by the ‘N*’ RVD, in which truncation of the RVD loop and lack of any side chain at position 13 appear to accommodate any base, presumably with little or no contribution to overall affinity. A recent study has demonstrated that the pattern of contacts and specificity described above can be extended to the recognition of modified bases: an NG or HG repeat (specific for thymine in an unmodified target) can accommodate similar interactions with a 5-methylcytosine, thus making it possible to identify and/or design potential TAL effectors that can discriminate between target sites that contain methylated CpG sequences and those that are unmodified [31•]. Similarly, an even more recent structural and biochemical study from the same group has demonstrated that TAL effectors can also bind DNA-RNA hybrids, doing so by reading out the DNA strand sequence. Binding of the effector protects such structures from RNase H degradation, and implies that TAL effectors may be used as research tools (or even protein therapeutics) in systems where DNA-RNA hybrids are formed [32•].
A recent recognition study, which examined the function of a variety of artificial TAL effectors that were engineered to contain long strings of single types of repeats and RVDs [33••], indicates that ‘HD’ and ‘NN’ (or ‘HN’) repeats (which target cytosines and purines, respectively) make the strongest contribution to overall TAL effector function in transcriptional activation assays as compared to the other most common repeat types, whereas effectors containing strings of ‘NI’ repeats appear to display considerably lower function and reduced specificity. The same study also examined the relative contributions of various other RVD types to TAL effector activity and specificity, and found that an ‘NH’ RVD (which is found rarely in TAL effectors sequenced to date) demonstrates a strong preference for guanine in the TAL target site. Whether these observations reflect DNA binding properties or other unique requirements for TAL effector activity, and whether they will translate to the function of artificial gene targeting proteins that use the TAL effector repeat scaffold, remains to be determined, but such data are clearly of great benefit and have provided important design guidelines for researchers in the field.
An open question regarding TAL effector function is the mechanism by which these highly unusual DNA binding proteins search for and acquire their cognate targets, and the possible role of flanking elements. Possible clues can be found in (i) the relative performance of artificial TAL constructs containing a variety of N- and C-terminal truncations, (ii) an examination of the sequences immediately N-terminal to the central TAL repeats, and (iii) the structure of the PthXo1 TAL effector bound to DNA. Various studies of gene targeting proteins constructed using TAL effector scaffolds appear to indicate that the first 120 to 150 residues are dispensable both for effector and nuclease function, and that further truncations reduce either or both activities [34,35]. The remaining N-terminal region (corresponding roughly to residues 120 to 254) that immediately precedes the beginning of the canonical TAL repeats contains a highly basic region of the protein, with about 11 conserved arginine and lysine residues that contribute significantly to the overall basic charge of the protein. The structure of the PthXo1 effector bound to its DNA target demonstrated the presence of at least two ‘cryptic’ repeats (which were termed the 0 and −1 repeats in that model) that form multiple non-specific interactions to the region of the DNA target immediately 5′ of the first sequence-specific contact (Figure 3b). This same region of the effector also engages an invariable thymine base (found at position zero of nearly all TAL binding sites) with contacts to the protein backbone and the indole ring of a single tryptophan residue that is found in all Xanthomonad TAL effectors.
A more recent structural and biophysical analysis of an extended N-terminal region of dHAX3 (initially identified via limited proteolytic digests) indicates that as many as four additional cryptic repeats are formed immediately upstream of the central repeat region, and that this region provides the bulk of binding energy required for high affinity target binding and sequence-specific recognition [36•].
Thus, a reasonable model for a TAL effector DNA target search would involve a rapid association and dissociation mechanism that is largely dependent upon the highly basic N-terminal flanking region, followed by a secondary ‘annealing’ process during each protein-DNA encounter, in which the central TAL repeats (that individually display limited affinity to the negatively charged DNA target) sample the opposing nucleotide base identity sequentially along the target, wrapping around the DNA as long as the cognate sequence is appropriately complementary to each RVD in turn.
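The two sequence requirements discussed in this section, a thymine at position zero and base-by-base complementarity between each RVD and one strand of the target, can be combined into a simple site scanner, sketched below in Python. It is only a bookkeeping illustration of the sequential sampling idea and does not model binding kinetics or affinity. The function name, the example sequence and the all-or-nothing acceptance sets are hypothetical; the NI (adenine) and NS (any base) entries are assumptions from the wider literature rather than from this review.

```python
# Bases that each RVD is assumed to accept (illustrative, all-or-nothing).
# HD, NG, HG, NN and N* follow the text; NI and NS are assumptions.
ALL_BASES = {"A", "C", "G", "T"}
RVD_ACCEPTS = {
    "HD": {"C"},
    "NG": {"T"},
    "HG": {"T"},
    "NN": {"G", "A"},
    "NI": {"A"},
    "NS": ALL_BASES,
    "N*": ALL_BASES,
}


def find_sites(sequence, rvds, require_t0=True):
    """Return 0-based indices of the base read by the first repeat.

    Only the strand given in `sequence` (5' to 3') is scanned. When
    `require_t0` is true, the base immediately 5' of the site must be T.
    """
    seq = sequence.upper()
    n = len(rvds)
    hits = []
    for start in range(len(seq) - n + 1):
        # Position zero: the base just upstream of the first repeat.
        if require_t0 and (start == 0 or seq[start - 1] != "T"):
            continue
        # Walk the repeats in order, one base per repeat.
        if all(seq[start + i] in RVD_ACCEPTS[rvd] for i, rvd in enumerate(rvds)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Made-up sequence and RVD array, for illustration only.
    print(find_sites("GGTCTGAACCTGA", ["HD", "NG", "NN", "NI"]))
```

To search both strands, the same function could be run on the reverse complement of the sequence. A graded score per RVD-base pair would be a closer analogue of the variable contributions of different repeats described above.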
The biological, bioinformatic and structural studies summarized above have led to an explosion of reports, starting with the initial description of a chimeric TAL effector nuclease in 2010 [37••], that demonstrate the successful creation of a wide variety of gene-targeting reagents using the TAL effector scaffold, as well as a variety of efficient methods for the rapid creation of such reagents that contain investigator-designed, artificial TAL repeat sequences (recently reviewed in [38–41]). Gene targeting reagents using TAL effector scaffolds have included not only TAL nucleases, but also gene-specific activators and repressors [42–44]. These advances have allowed gene targeting reagents created using TAL effector scaffolds to join and perhaps surpass zinc finger nucleases and homing endonucleases (or ‘meganucleases’) as commonly employed tools for a variety of genome editing and correction applications. While carefully controlled comparative studies on the performance of each type of reagent on similar targets, and for similar purposes, represent an outstanding area for additional future investigation, a number of recent in cellulo and in vivo studies suggest that TAL effector-based targeting reagents display robust activity, specificity and low toxicity in a variety of contexts, in addition to their ease of engineering [35,45–52].
The authors’ work in this field has been supported by the NIH (R01 GM098861 to A.J.B. and B.L.S. and R01 GM088277 to P.H.B.), a Searle Scholars Fellowship to P.H.B. and by training grant support from the Northwest Genome Engineering Consortium to A.N-S.M.