Zinc-finger proteins (ZFPs) have long been recognized for their potential to manipulate genetic information because they can be engineered to bind novel DNA targets. Individual zinc-finger domains (ZFDs) bind specific DNA triplet sequences; their apparent modularity has led some groups to propose methods that allow virtually any desired DNA motif to be targeted in vitro. In practice, however, ZFPs engineered using this ‘modular assembly’ approach do not always function well in vivo. Here we report a modular assembly scoring strategy that both identifies combinations of modules least likely to function efficiently in vivo and provides accurate estimates of their relative binding affinities in vitro. Predicted binding affinities for 53 ‘three-finger’ ZFPs, computed based on energy contributions of the constituent modules, were highly correlated (r = 0.80) with activity levels measured in bacterial two-hybrid assays. Moreover, Kd values for seven modularly assembled ZFPs and their intended targets, measured using fluorescence anisotropy, were also highly correlated with predictions (r = 0.91). We propose that success rates for ZFP modular assembly can be significantly improved by exploiting the score-based strategy described here.
The ability to reliably engineer DNA binding proteins that recognize any desired DNA sequence would provide an unprecedented level of control over genetic information; for example, by allowing the creation of site-specific nucleases that specifically alter genomic DNA (1–5). The C2H2 zinc-finger domain (ZFD) is arguably the best characterized DNA binding motif and offers considerable promise for the rational engineering of site-specific DNA binding proteins (6–11). Zinc-finger proteins (ZFPs) consist of multiple individual ZFDs, each of which typically recognizes adjacent sequence triplets in duplex DNA (Figure 1). An individual ZFD comprises a pair of anti-parallel β-strands and one α-helix, which coordinate a zinc ion through conserved pairs of cysteine and histidine residues. In the canonical three-finger domain of the Zif268 transcription factor, the amino acid side chains at positions −1, +3 and +6 relative to the amino-terminal end of the α-helix typically make base-specific contacts with three adjacent nucleotides within the major groove of double-stranded DNA (12). An aspartic acid residue in the +2 position of the DNA recognition helix can specify a fourth nucleotide, resulting in either target-site overlap with an adjacent module or specification of an additional nucleotide at the 3′-end of the target site (13,14).
Several research groups have characterized ZFDs that recognize many of the 64 possible DNA triplets (15–20). In the 'modular assembly' approach, novel ZFPs that recognize variant DNA sites are assembled simply by stringing together individual ZFDs. In practice, however, ZFPs made by modular assembly display a wide range of binding affinities and specificities (15,19,21–23). Although modular assembly has proven useful for some in vivo applications, such as artificial transcription factors, recent work suggests that the success rate of creating artificial zinc-finger nucleases (ZFNs: fusions of engineered zinc fingers to a non-specific nuclease domain) by this method is considerably lower (24,25). These low success rates, together with the inability to predict which ZFPs are likely to function in vivo, have motivated our groups to improve the procedures and design criteria for ZFP engineering (25,26).
The present study was motivated by our observation that, among a small set of modularly assembled ZFPs, those that fail to function in vivo are more likely to contain modules previously shown to have relatively low affinity for their target DNA. This observation suggested that insufficient affinity can contribute to poor function in vivo, and also that it might be possible to predict the affinity of a modularly assembled ZFP using existing affinity data for its component modules. Here we test these hypotheses and demonstrate that both the in vitro binding affinity of a ZFP and its lack of in vivo activity can be predicted from the energy contributions of its component ZFDs. Our approach for predicting the binding of ZFPs to desired target sequences should improve the success rates of modular assembly by steering investigators away from target sites and ZFP combinations least likely to function in vivo.
All ZFDs used in these experiments have been described by the Barbas group (15) and are referred to as ‘Barbas modules’. ZFPs containing desired three-finger (three-module) arrays were assembled by iterative ligation and cloning of restriction fragments encoding ZFDs using reagents and protocols previously described by the Zinc Finger Consortium (http://www.zincfingers.org/) (27). ZFP-encoding fragments were then cloned into vectors for expression as Gal11P-hybrid proteins in the bacterial two-hybrid (B2H) system as previously described (27).
A series of B2H reporter plasmids, each harboring a target binding site for one of 27 different three-finger ZFPs, was constructed by cloning synthetic target oligonucleotides into reporter plasmid pBAC-lacZ as previously described (27). Binding of a Gal11P-ZFP hybrid protein to the target sequence on a B2H reporter plasmid triggers transcriptional activation of a lacZ reporter gene encoding β-galactosidase. In vivo ZFP performance was therefore assayed using a β-galactosidase assay in which ZFP-induced activation of lacZ expression was measured relative to control constructs lacking the ZFP.
Zinc finger–maltose binding protein (MBP) fusion protein constructs were generated by transferring three-finger arrays, assembled as described above, into pHMTC (28). The MBP fusion plasmids were transformed into BL21 Escherichia coli cells (Invitrogen) using standard chemical transformation procedures (29).
For protein expression, 5 ml cultures were grown for 16 h at 30°C with agitation in ZFE broth [Luria Broth (LB), 1.11 mM dextrose, 100 µg/ml ampicillin]. Expansion cultures of 10 ml were inoculated from these overnight cultures (1:100 dilution) and grown to an OD600 of 0.5 before a 2 h induction with isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were harvested by centrifugation for 10 min at 4000g at 4°C and frozen overnight at −20°C. The following day, cells were resuspended in 4 ml WB1 (15 mM HEPES pH 7.8, 200 mM NaCl, 20 µM ZnSO4)/1 mM PMSF/0.1% Nonidet™ P40 (NP-40) and refrozen at −70°C. Cells were then thawed in ice water and centrifuged at 9000g at 4°C for 20 min. To remove remaining nucleic acids, the resulting supernatant was transferred to a new cold tube and polyethyleneimine was added to a final concentration of 0.1%. The supernatant was then incubated for 30 min before a second centrifugation at 16 000g at 4°C for 30 min.
Amylose beads (NEB) were prepared in 50 µl aliquots in 1.5 ml micro-centrifuge tubes according to manufacturer's instructions. Beads were washed (suspended, spun down and supernatant removed) three times in 1 ml WB1/0.1% NP-40 at 4°C and resuspended in 450 µl WB1. For affinity purification, 1 ml of clarified protein supernatant was added to prepared beads, and incubated at 4°C for 30 min. The slurry was centrifuged and the supernatant was removed. The proteins bound to beads were washed two times with 700 µl WB1/0.1% NP-40 and two times with zinc buffer A (ZBA; 10 mM Tris–HCl, pH 7.5, 90 mM KCl, 1 mM MgCl2, 90 µM ZnCl2)/0.1% NP-40 (15). Purified proteins were then eluted in 200 µl ZBA/0.1% NP-40/40 mM maltose for 30 min at room temperature, with gentle agitation. After elution, beads were centrifuged at 16 000g. The supernatant was transferred to a new cold tube and centrifuged again at 16 000g. The supernatant was transferred to a new cold tube and gently stirred to mix protein. Proteins were stored at −70°C in Axygen MaxymumRecovery™ tubes. Protein concentrations were estimated using a Bradford assay against a bovine serum albumin (BSA) standard in ZBA/0.1% NP-40.
Binding reactions were performed in ZBA/0.1% NP-40/0.1 mg/ml non-acetylated BSA (Sigma) for 30 min on ice with 5 nM target DNA. Target sites (shown in Figure 2a) were formed using hairpin DNA oligonucleotides as described (15). HPLC-purified, 3′-6-FAM-labeled oligonucleotides were ordered from Integrated DNA Technologies (Coralville, IA, USA). In each experiment, two serial dilutions of purified ZFP-MBP fusion protein were performed over a range of 1000–0.122 nM. Reported binding affinity values are based on the average of three separate binding experiments, performed on different days, using three separate protein preparations. Fluorescence anisotropy (FA) measurements were made using a Varian Cary Eclipse fluorescence spectrophotometer in L-format configuration. Each value was based on five measurements averaged over 5 s, using a 490 nm excitation wavelength (5 nm slit width) and a 530 nm emission wavelength (20 nm slit width) at 880 V. Background light scattering for each protein sample dilution was measured and subtracted to correct for protein concentration-dependent variation in intensities. Kd values were determined by nonlinear regression (30,31) using Prism (http://www.graphpad.com/prism/Prism.htm).
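The Kd determination described above can be sketched as follows. This is not the Prism procedure used here; it is a minimal sketch under stated assumptions: twofold dilution steps (which reproduce the stated range, since 1000/2^13 ≈ 0.122 nM), a 1:1 binding model that accounts for depletion of free protein by the 5 nM labeled DNA, and no change in probe brightness on binding. Kd is found by a log-spaced grid search; at each candidate Kd the free and bound anisotropies, which enter linearly, are solved by least squares. The names PROTEIN_NM, fraction_bound and fit_kd are hypothetical.

```python
import numpy as np

# Assumed twofold series: 1000, 500, ..., 1000/2**13 (about 0.122) nM, 14 points.
PROTEIN_NM = 1000.0 / 2.0 ** np.arange(14)
DNA_NM = 5.0  # total labeled target DNA in each binding reaction


def fraction_bound(p_total, kd, d_total=DNA_NM):
    """Fraction of DNA bound for 1:1 binding (exact quadratic solution).

    Written as 2P / (S + sqrt(S^2 - 4PD)) to avoid cancellation at low P.
    """
    p_total = np.asarray(p_total, dtype=float)
    s = p_total + d_total + kd
    return 2.0 * p_total / (s + np.sqrt(s * s - 4.0 * p_total * d_total))


def fit_kd(p_total, anisotropy, kd_grid=None):
    """Return (kd, (r_free, r_bound)) minimizing the summed squared error."""
    if kd_grid is None:
        kd_grid = np.logspace(-2, 4, 2401)  # 0.01 nM to 10 uM
    y = np.asarray(anisotropy, dtype=float)
    best = None
    for kd in kd_grid:
        f = fraction_bound(p_total, kd)
        # Observed anisotropy modeled as r_free * (1 - f) + r_bound * f.
        design = np.column_stack([1.0 - f, f])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(kd), (float(coef[0]), float(coef[1])))
    return best[1], best[2]
```

With noise-free synthetic data generated at a known Kd, fit_kd recovers that Kd to within the grid spacing. With 5 nM DNA, the simpler hyperbolic model (free protein ≈ total protein) becomes biased once Kd approaches the DNA concentration, which is why the quadratic form is used in this sketch.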
To test the hypothesis that binding energy contributions of individual ZFDs can be used to predict the in vitro binding affinities and in vivo performance of extended ZFP arrays, 27 three-module ZFPs were constructed by assembling various GNN-specific modules previously characterized by the Barbas group (15). ZFP compositions were chosen to systematically explore a wide range of predicted binding affinities and to test the influence of context on module performance. As shown in Table 1, ZFDs were divided into three affinity classes based on their reported affinity constants measured in a fixed context, namely as the middle finger of a three-finger Zif268 variant (15). Modules whose Zif268 variants had Kd values <10 nM were categorized as 'strong', those with Kd = 10–30 nM as 'moderate' and those with Kd > 30 nM as 'weak' (Table 1). Using three different modules to represent each class, all 27 (3 × 3 × 3) possible combinations of strong, moderate and weak modules across the three finger positions were assembled. To allow direct comparisons among proteins that differ by a single module, ZFPs were designed in subgroups in which only one finger position was varied (Table 2).
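The binning rule and the design space it implies can be written out as a short sketch. The thresholds come from the text; treating Kd values of exactly 10 and 30 nM as moderate is an assumption, and the names affinity_class and CLASS_DESIGNS are hypothetical. Which of the three modules in a class fills a given slot is a separate design choice not modeled here.

```python
from itertools import product

AFFINITY_CLASSES = ("strong", "moderate", "weak")


def affinity_class(kd_nM: float) -> str:
    """Bin a module by the Kd of the three-finger Zif268 variant carrying it at F2."""
    if kd_nM < 10.0:
        return "strong"
    if kd_nM <= 30.0:
        return "moderate"
    return "weak"


# One design per assignment of classes to (F1, F2, F3): 3 ** 3 = 27.
CLASS_DESIGNS = list(product(AFFINITY_CLASSES, repeat=3))
```

For example, affinity_class(25.0) places QSSSLVR, at its previously reported Kd of 25 nM, in the moderate class, whereas the revised estimate of 2.5 nM discussed below would place it in the strong class.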
If one assumes that the binding energy of a three-finger ZFP (ΔG°ZFP) is equal to the sum of the binding energies of its three component ZFDs (ΔG°ZFD) [Equation (1)], it follows that the difference in binding energy between any two ZFPs is the sum, over the three finger positions, of the differences in binding energy between the modules occupying each position [Equation (2)].
Because the ZFDs used in this study were evaluated in the middle (F2) position of a three-finger ZFP, and because the other fingers (F1 and F3) were identical in all of these constructs, the differences in measured binding constants among them should be attributable to differences in binding energy between the F2 ZFDs. Thus, ΔΔG between any two ZFDs can be calculated from the identity relating Gibbs free energy to Kd [Equation (3), with RT = 0.58 kcal/mol].
To compare binding affinity measurements with predicted values, the predicted ΔΔG was calculated as the difference between each ZFP and a standard (STD) ZFP composed entirely of the F2 domain of parental C7 (15).
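Written out from the definitions in the preceding paragraphs, Equations (1)–(4) take the form below. This is a reconstruction and the symbols are ours: Kd(ZFD_i) stands for the Kd of the three-finger Zif268 variant carrying the module at position i in its F2 slot (15), Kd(STD) for that of the variant carrying the parental C7 F2 domain, and ΔG° = RT ln Kd is taken as the convention, so a larger ΔΔG means weaker binding.

```latex
\begin{align}
\Delta G^{\circ}_{\mathrm{ZFP}} &= \sum_{i=1}^{3} \Delta G^{\circ}_{\mathrm{ZFD},i} \tag{1}\\
\Delta\Delta G^{\circ}_{a,b} = \Delta G^{\circ}_{\mathrm{ZFP},a} - \Delta G^{\circ}_{\mathrm{ZFP},b}
  &= \sum_{i=1}^{3} \left( \Delta G^{\circ}_{\mathrm{ZFD},i}(a) - \Delta G^{\circ}_{\mathrm{ZFD},i}(b) \right) \tag{2}\\
\Delta\Delta G^{\circ}_{x,y} &= RT \ln \frac{K_d(x)}{K_d(y)}, \qquad RT \approx 0.58\ \mathrm{kcal/mol} \tag{3}\\
\Delta\Delta G^{\circ}_{\mathrm{pred}} &= \sum_{i=1}^{3} RT \ln \frac{K_d(\mathrm{ZFD}_i)}{K_d(\mathrm{STD})} \tag{4}
\end{align}
```

Equation (4) is Equation (2) with ZFP b taken as the standard ZFP (the STD module at every position), each per-position term evaluated with Equation (3).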
Thus, using Equation (4) and binding constants for ZFP variants published by the Barbas group (15), we predicted ΔΔG values for 27 novel modularly assembled ZFPs constructed from Barbas GNN modules (Figure 2). Predicted ΔΔG values ranged from 2.1 kcal/mol for ZFP #1, which contains three strong modules, to 8.2 kcal/mol for ZFP #27, which contains three weak modules.
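The prediction step amounts to a sum of logarithms and can be sketched as below. The module Kd values from Table 1 and (15) are not reproduced in this text, so any numbers passed in are placeholders; the function names are hypothetical.

```python
import math

RT_KCAL = 0.58  # kcal/mol, the value of RT used in the text


def module_ddg(kd_module_nM: float, kd_std_nM: float) -> float:
    """Equation (3): free-energy penalty of one module relative to the standard F2 domain."""
    return RT_KCAL * math.log(kd_module_nM / kd_std_nM)


def predicted_ddg(module_kds_nM, kd_std_nM: float) -> float:
    """Equation (4): sum the per-module penalties over F1, F2 and F3."""
    return sum(module_ddg(kd, kd_std_nM) for kd in module_kds_nM)
```

For a hypothetical ZFP whose three modules each bind 10-fold more weakly than the standard, predicted_ddg([10.0, 10.0, 10.0], 1.0) gives 3 × 0.58 × ln 10 ≈ 4.01 kcal/mol. Because the score is a plain sum, swapping modules between positions leaves it unchanged; any position dependence must come from elsewhere.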
To evaluate in vivo binding of the 27 modularly assembled ZFPs to their cognate DNA targets, we used a quantitative B2H assay (32). In this assay, binding of a ZFP to its target site activates transcription of a lacZ reporter positioned downstream of an adjacent promoter. Thus, ZFP DNA-binding activity can be assessed by quantifying β-galactosidase activity in ZFP-expressing cells relative to control cells that do not express the ZFP. We chose the B2H system because recently published studies have shown that absence of ZFP activity in this system is an excellent predictor of failure of these proteins to function as ZFNs in human cells (24–26). Expression of the two ZFPs with the most favorable predicted binding energies was toxic to cells, preventing analysis of these constructs. For the remaining 25 ZFPs, the level of lacZ activation observed was in excellent agreement with predicted energies (Figure 2a). Several models describing the relationship between predicted and measured activity were evaluated, with segmental linear regression providing the best fit (r = 0.77; Figure 2b, dashed line). Inspection of the data revealed that the GTA-specific module (QSSSLVR) was present in most of the ZFPs that exhibited significantly greater activation than predicted (Figure 2b, red diamonds). Excluding ZFPs containing this module from the analysis increased the correlation coefficient to 0.86 (Figure 2b, solid line).
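A segmental (two-segment) linear regression of fold activation on predicted ΔΔG can be sketched as follows. This is an assumption about the procedure rather than a copy of it: Prism's implementation may differ in detail. Here the breakpoint x0 is found by grid search, the two lines are kept joined by a hinge term, and r is taken as the Pearson correlation between fitted and observed values. The function name fit_segmental is hypothetical; excluding QSSSLVR-containing ZFPs simply means dropping those rows before fitting.

```python
import numpy as np


def fit_segmental(x, y, n_grid=200):
    """Continuous two-segment linear fit with a grid-searched breakpoint.

    Model: y = b0 + b1 * x + b2 * max(x - x0, 0), so the slope changes by b2 at x0.
    Returns (x0, predict, r), where r is the Pearson correlation between
    fitted and observed values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xs = np.sort(x)
    best = None
    # Search breakpoints between the second-smallest and second-largest x.
    for x0 in np.linspace(xs[1], xs[-2], n_grid):
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - x0, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(x0), coef)
    _, x0, coef = best

    def predict(xq):
        xq = np.asarray(xq, dtype=float)
        return coef[0] + coef[1] * xq + coef[2] * np.maximum(xq - x0, 0.0)

    r = float(np.corrcoef(predict(x), y)[0, 1])
    return x0, predict, r
```

The hinge parametrization keeps the problem linear for each fixed breakpoint, so only x0 needs a search; a finer grid or a local refinement around the best grid point would sharpen the estimate.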
The predictions described above relied on published in vitro binding affinities for ZFPs in which modules were evaluated in a fixed context (15) to estimate the binding contributions of individual modules. In an alternative approach, we predicted ZFP performance by treating the contributions of individual modules as unknowns in a system of linear equations. Briefly, in constructing the 27 different three-finger proteins, each of the nine ZFDs was used approximately 8–10 times (approximately three times in each of the three finger positions; Table 1). Assuming that the energy contributions of individual ZFDs in a ZFP are additive, the B2H activity of each ZFP was modeled as the result of its particular combination of modules (Supplementary Figure 1). Module contributions were calculated using a leave-one-out solution of the linear system, in which the ZFP being predicted was excluded when solving for the contributions. Expected lacZ activation in the B2H assay for each ZFP was then predicted by summing the contributions of its modules. As shown in Figure 3, expected levels of activation computed in this manner were highly correlated with measured B2H activity (r = 0.86).
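The leave-one-out procedure can be sketched as below, under assumptions the text does not spell out: each module contributes one position-independent term, the response is whatever activity measure is being modeled (for instance fold activation or its logarithm), and each reduced system is solved by ordinary least squares. The function names are hypothetical.

```python
import numpy as np


def design_matrix(zfps, modules):
    """One row per ZFP, one column per module; entries count module occurrences."""
    index = {m: j for j, m in enumerate(modules)}
    X = np.zeros((len(zfps), len(modules)))
    for i, fingers in enumerate(zfps):
        for m in fingers:
            X[i, index[m]] += 1.0
    return X


def loo_predictions(X, y):
    """Predict each ZFP from module contributions solved on all other ZFPs."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # hold out ZFP i
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = X[i] @ w  # sum of the held-out ZFP's module contributions
    return preds
```

If the data were exactly additive and the reduced systems kept full rank, every held-out prediction would match its measurement; the correlation between loo_predictions(X, y) and y therefore measures how far real B2H activity departs from additivity.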
The energy contributions obtained by solving this system of linear equations with in vivo B2H activity data indicate that the GTA-specific QSSSLVR module binds with higher affinity than previously estimated. This is consistent with our conclusion from inspection of energies computed from in vitro binding constants (Figure 2b). We therefore estimated a new Kd for this module by finding the value that optimizes the correlation of predicted energies with the B2H data. This approach yielded an estimated Kd of 2.5 nM, 10-fold lower than the previously reported value of 25 nM (15). Incorporating this new estimate improved the correlation between the in vitro energy model and the in vivo fold-activation data (r = 0.86; Figure 2c).
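The re-estimation can be sketched as a one-dimensional scan over candidate Kd values for a single module. The text does not specify the correlation statistic, so this sketch assumes a plain Pearson correlation and, because activity should fall as ΔΔG rises, scores each candidate by −r. Module names, Kd values and the candidate grid in any usage are hypothetical. For scale, by Equation (3) a 10-fold change in a module's Kd (as from 25 to 2.5 nM) shifts each ZFP's predicted ΔΔG by RT ln 10 ≈ 0.58 × 2.303 ≈ 1.34 kcal/mol per occurrence of that module.

```python
import math

import numpy as np

RT_KCAL = 0.58  # kcal/mol


def predicted_ddgs(zfps, kd_table, kd_std):
    """Equation (4) for every ZFP; zfps is a list of module-name tuples."""
    return [sum(RT_KCAL * math.log(kd_table[m] / kd_std) for m in z) for z in zfps]


def rescan_module_kd(zfps, kd_table, kd_std, activity, module, candidates):
    """Pick the candidate Kd for `module` giving the most negative correlation with activity."""
    best_kd, best_score = None, -math.inf
    for kd in candidates:
        trial = {**kd_table, module: kd}  # override one module, keep the rest
        r = np.corrcoef(predicted_ddgs(zfps, trial, kd_std), activity)[0, 1]
        score = -r  # activity is expected to decrease as predicted ddG increases
        if score > best_score:
            best_kd, best_score = kd, score
    return best_kd, float(best_score)
```

A scan like this can only recover a module's effective Kd if that module's usage varies across the ZFP set in a way not collinear with the other modules' contributions; otherwise many candidate values fit equally well.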
To directly evaluate the effects of individual module affinities on in vivo performance, sets of related ZFPs designed to vary at a single module position were analyzed for differences in B2H activity (Table 2). For all three sets of ZFPs in which the F1 position was varied (while F2 and F3 were fixed), the greatest in vivo activity was observed when the F1 position contained a high affinity module; the least activity was observed with a low affinity module in this position. The same trend was observed for all four groups in which the F3 position was varied while the F1 and F2 fingers were fixed. For sets in which the F2 position was varied, only one strong module (TSGSLVR) and one moderate affinity module (QSSSLVR) were tested. In these cases, the moderate affinity module outperformed the high affinity module. These results suggest that the effect of single module substitutions on relative binding affinity can be predicted reliably in most cases.
In summary, three lines of analysis demonstrate that the in vivo performance of ZFPs can be predicted from the DNA-binding affinities of their individual ZFDs: (i) predictions based on in vitro binding constants for modules in a fixed context, (ii) predictions derived from a system of linear equations based on in vivo performance and (iii) analysis of the effects of various single-finger substitutions in vivo.
Our success in estimating the activities of ZFPs in the B2H assay suggested that our scoring scheme could also be applied to predict ZFP affinities in vitro. To test whether activation measured in the B2H assay directly reflects DNA-binding affinity for the desired target site, 9 of the 27 engineered proteins, along with a control Zif268 protein, were chosen for in vitro binding affinity measurements (Figure 4). Kd values were determined by fluorescence anisotropy (FA), a rapid and reproducible solution-based DNA-binding assay that allows computation of the bound fraction of a fluorescently labeled ligand from the slower rotational diffusion that accompanies protein binding (33,34).
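The conversion from polarized intensities to bound fraction can be sketched with the standard steady-state relations. The function names are hypothetical; the G factor is assumed to be measured with horizontally polarized excitation as I_HV/I_HH, and background (scattering) intensities are subtracted channel by channel, as described in the Methods. The optional q is the ratio of bound to free probe brightness; with q = 1 the bound fraction reduces to a linear interpolation between the free and bound anisotropies.

```python
def g_factor(i_hv: float, i_hh: float) -> float:
    """Instrument G factor from horizontally polarized excitation."""
    return i_hv / i_hh


def anisotropy(i_vv, i_vh, g, bg_vv=0.0, bg_vh=0.0):
    """Steady-state anisotropy r = (I_VV - G*I_VH) / (I_VV + 2*G*I_VH), after background subtraction."""
    vv = i_vv - bg_vv
    vh = i_vh - bg_vh
    return (vv - g * vh) / (vv + 2.0 * g * vh)


def fraction_bound_from_anisotropy(r, r_free, r_bound, q=1.0):
    """Bound fraction, correcting for a brightness change q = I_bound / I_free on binding."""
    return (r - r_free) / ((r_bound - r) * q + (r - r_free))
```

If binding changes the brightness of the 6-FAM label, ignoring q biases the bound fraction, and hence the fitted Kd, toward whichever state is brighter.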
As shown in Figure 5, binding affinity constants determined by FA were highly correlated with predicted energies. The two ZFPs with the highest predicted affinities were toxic to bacterial cells, preventing purification of sufficient protein for in vitro analysis. Energies computed from previously published in vitro affinity measurements for modules in a fixed context (15) were linearly correlated with the logarithms of the Kd values measured in our experiments (r = 0.91) (Figure 5a). As before, assuming a Kd of 2.5 nM for the QSSSLVR module significantly improved the correlation (r = 0.97). Predicted in vivo activation levels generated by the leave-one-out linear system method were also highly correlated with experimentally determined binding constants (r = 0.93; Figure 5b). Thus, results obtained using a rapid and reliable spectroscopic method indicate that ZFP binding affinities measured in vitro generally correspond to results obtained in vivo using the B2H system, demonstrating that our rule-based strategy can be used to predict ZFP DNA-binding affinity.
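Why predicted energies should track the logarithm of Kd follows from Equations (3) and (4), assuming additivity holds exactly. Writing Kd(ZFP_STD) for the dissociation constant of the hypothetical standard ZFP built entirely from the C7 F2 domain:

```latex
\begin{align}
\Delta\Delta G^{\circ}_{\mathrm{pred}}
  &= RT \ln \frac{K_d(\mathrm{ZFP})}{K_d(\mathrm{ZFP}_{\mathrm{STD}})} \\
\log_{10} K_d(\mathrm{ZFP})
  &= \log_{10} K_d(\mathrm{ZFP}_{\mathrm{STD}})
     + \frac{\Delta\Delta G^{\circ}_{\mathrm{pred}}}{RT \ln 10},
  \qquad RT \ln 10 \approx 0.58 \times 2.303 \approx 1.34\ \mathrm{kcal/mol}
\end{align}
```

Under this model, log Kd is linear in the predicted ΔΔG, rising by one decade per roughly 1.34 kcal/mol. A correlation coefficient tests the linearity of this relation, not the value of its slope.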
To evaluate the generality of this rule-based approach, we calculated predicted energies for another set of modularly assembled ZFPs that had previously been evaluated using the B2H system (25). From 168 modularly assembled ZFPs, we selected all ZFPs composed of GNN or TGG modules for which published in vitro DNA-binding affinity constants are available [measured in the F2 position of the standard Zif268 variant backbone (15)]. As shown in Figure 6a, based on a segmental linear regression model, binding energies for 24 of these 26 ZFPs are highly correlated (r = 0.80) with reported B2H activity measurements. These results are also in excellent agreement with the results described above and shown in Figure 2b, although slightly higher activation levels were uniformly observed in the latter experiments. Notably, both sets of experiments identify a ΔΔG of ~5 kcal/mol (corresponding to a Kd of ~100 nM) as the threshold for zinc-finger function in vivo (in bacterial cells). We also used the scoring function derived from our own B2H experiments (Figure 2c) to predict B2H activity for the 24 ZFPs evaluated by Ramirez et al. (25). Again, predicted and measured fold-activation scores were in close agreement, with a correlation coefficient of 0.79 (Figure 6b). Taken together, these results suggest that the scoring function developed and evaluated here may be generally applicable to ZFPs assembled from the Barbas lab GNN modules.
Using a rule-based strategy that combines experimentally determined binding energies of individual ZFDs, we were able to compute binding energies for ZFPs made from a particular set of well-characterized GNN modules (15). We also showed that these predicted binding energies are in excellent agreement (r = 0.91; Figure 5a) with binding affinity constants measured directly in vitro. Furthermore, we showed a strong correlation between these computed binding energies and ZFP activities in a B2H system for two different sets of modularly assembled three-finger ZFPs. This is an important advance because a ZFP that lacks activity in the B2H system will also have a high probability of failing to function as a ZFN in human cells (24–26). Thus, using only our scoring method, researchers can now identify target sites that will have a high probability of failing to yield functional zinc-finger arrays by the method of modular assembly. Our rule-based strategy will thus allow researchers to focus their modular assembly efforts on a smaller number of target sites with a higher probability of success.
We believe that our results also provide one potential explanation for the discrepancy between the very high success rates reported in a previous in vitro study (35) and the low in vivo success rates observed for ZFPs in the recent study of Ramirez et al. (25): many of the modules used for modular assembly probably have low affinities. In fact, our data suggest that 30–50% of potential three-finger ZFPs made wholly from the Barbas GNN modules will fail to function in the B2H system, in agreement with the recently published results of Ramirez et al. (25).
Although our results demonstrate that the energy contributions of individual ZFDs in a ZFP array are additive, we believe they also lend support to the notion that context is an important parameter that should be accounted for when engineering multi-finger ZFPs (i.e. that a single ZFD module will not always be optimal or adequate for recognition of its cognate 3-bp subsite in different multi-finger ZFP contexts). For example, our data show that a weak finger will sometimes be found in a nonfunctional ZFP array (when it is joined to other weak ZFDs) but will also sometimes be found in functional arrays when paired with higher affinity ZFDs. Likewise, although strong fingers are sometimes found in functional ZFPs, they can also be found in nonfunctional ones. Furthermore, the use of three strong fingers in a ZFP can lead to toxicity in E. coli cells. Although the precise mechanism of this toxicity is unclear, a reasonable hypothesis is that excessively high affinity allows binding to related but off-target sequences with sufficient affinity to cause biological consequences (essentially, excessive affinity leading to problems of specificity). Thus, our data further reinforce the ideas that individual ZFDs do not function completely independently and that the specific attributes of neighboring fingers matter when engineering a multi-finger ZFP.
The importance of context-dependent effects also suggests that identification of additional ZFDs with variable affinities for GNN triplets may be needed if the efficiency of modular assembly is to be improved. If such ZFDs were available, it might be possible to achieve higher success rates for modular assembly by creating several ZFPs for a given target site so as to identify a combination that balances affinities (and presumably, specificities) of its component ZFDs. A related point is that our findings also suggest one possible reason why more complex selection-based methods that account for context-dependent effects [e.g. the OPEN method recently described by Joung and colleagues (26,32)] may be more successful than modular assembly: these methods are able to balance the overall affinity and specificity of the final ZFP array by identifying optimal combinations from various ZFDs with a range of affinities and specificities for their target 3-bp subsites.
The strong correlations among predicted binding energies, in vivo activities and in vitro binding affinity constants for the ZFPs analyzed in this work suggest that our rule-based approach might be extended to evaluate arrays assembled using GNN modules from other sources (17,19) and non-GNN modules. We have not yet evaluated such modules, but our work demonstrates two ways this could be achieved: (i) by directly measuring in vitro binding constants for modules in the F2 position of a standardized ZFP framework and (ii) by computing individual module contributions to ZFP binding as component variables of a system of linear equations that describe their activities (measured in vivo in this work, although in vitro binding constants could also be used). The energy scoring scheme proposed here will allow researchers to determine whether a modular assembly strategy is likely to be feasible for specific targets of interest, based on currently available well-characterized modules, or whether an alternative selection-based engineering strategy should be considered.
A recent study on the use of ZFNs for homologous recombination cited lack of specificity as a primary determinant of ZFN-mediated toxicity in human cells (24). A likely mechanism for ZFN-induced toxicity is binding to genomic sequences similar to the desired target sequence. As noted above, we observed toxicity in bacterial cells for several ZFPs, even in the absence of a fused nuclease domain, suggesting that ZFP binding to certain sites in genomic DNA can be toxic, particularly for high affinity ZFPs. To our knowledge, this is the first published report of such toxicity in bacterial cells, although it has been observed previously for ZFPs targeting several other sites (Joung,J.K. unpublished data). However, bacterial expression of ZFPs with affinities in the pM range, without toxic effects, has also been reported (19,36,37). High-throughput chip- or microfluidics-based DNA-binding experiments (38–41) could be used to obtain affinity and specificity data for virtually every possible target site for a given ZFP, providing additional insight into ZFP-induced toxicity and into the fundamental rules that govern the affinity and specificity of DNA recognition by zinc-finger DNA-binding proteins.
A correlation between ZFP binding constants measured in vitro and functional activity measured in vivo has also been observed by others using different reporter systems (37). A similar degree of correlation was observed using the B2H system in our study (Supplementary Figure 2). Our results further demonstrate that measurable ZFP activity in an in vitro binding assay does not necessarily translate into adequate function in vivo, in agreement with Beerli et al. (42). However, the energy threshold we determined for ZFP activity in vivo, using B2H assays, corresponds to a Kd of ~100 nM, and thus differs from the estimated threshold Kd of ~10 nM reported as the minimum affinity necessary for ZFP function in mammalian cells (42). The significance of this difference between thresholds determined in bacterial and mammalian cells is difficult to evaluate, given that functional assays and Kd measurements were performed in different laboratories using different assays and with ZFPs containing different numbers of fingers.
Stormo and colleagues (43–47) have shown that the DNA-binding specificity of ZFPs can be effectively predicted from additive energy contributions of individual residues that make base-specific contacts with target site nucleotides. Our results complement this idea by demonstrating that the affinity of ZFPs also can be predicted, using affinity data for component modules. Our application of the binding energy additivity concept differs somewhat from that used by Stormo to predict specificity in that it assumes additivity of energy contributions at the individual finger rather than individual residue level. Also, our approach implicitly includes energetic contributions of residues that are not directly involved in base contacts (e.g. phosphate contacts), as well as energetic contributions resulting from context-dependent effects that presumably occur among recognition helix residues within each finger.
The apparent simplicity of modular assembly has contributed to the current focus on C2H2 ZFDs as the domains of choice for designing custom DNA-binding proteins. Our results make it possible, for the first time, to reliably identify prospective binding sites that are unlikely to yield functional ZFPs by modular assembly using a set of GNN-specific finger modules. By computing the energy contributions of individual ZFDs, the rule-based strategy presented here can provide accurate guidance on both the in vitro binding affinities and the in vivo functionality of engineered ZFPs. We have updated the Zinc Finger Targeter (ZiFiT) web server (http://bindr.gdcb.iastate.edu/ZiFiT) (48) so that it now provides users with a list of potential ZFP-target site pairs for a desired genomic sequence, scored according to the procedures developed and validated in this work.
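To illustrate the kind of scan described above, the sketch below lists 9-bp GNNGNNGNN windows on one strand of a sequence, splits each into triplets (F3 binds the 5′ triplet and F1 the 3′ triplet, as in Zif268), scores it with Equation (4) and flags sites whose predicted ΔΔG falls below the ~5 kcal/mol in vivo threshold reported above. It is not ZiFiT's code: the single-strand scan, the module table and the function names are hypothetical, and a real table would hold the published per-module Kd values (15).

```python
import math
import re

RT_KCAL = 0.58       # kcal/mol
DDG_THRESHOLD = 5.0  # approximate B2H activity threshold from the text, kcal/mol
GNN_SITE = re.compile(r"(?=(G[ACGT]{2}G[ACGT]{2}G[ACGT]{2}))")  # lookahead finds overlapping sites


def score_site(site, module_kd, kd_std):
    """Equation (4) for a 5'->3' 9-bp site; None if any triplet lacks a module."""
    f3, f2, f1 = site[0:3], site[3:6], site[6:9]  # F3 binds the 5' triplet, F1 the 3' triplet
    if not all(t in module_kd for t in (f1, f2, f3)):
        return None
    return sum(RT_KCAL * math.log(module_kd[t] / kd_std) for t in (f1, f2, f3))


def rank_sites(seq, module_kd, kd_std):
    """Return (ddG, position, site, predicted_functional) tuples, best-scoring first."""
    hits = []
    for m in GNN_SITE.finditer(seq.upper()):
        site = m.group(1)
        ddg = score_site(site, module_kd, kd_std)
        if ddg is not None:
            hits.append((ddg, m.start(), site, ddg < DDG_THRESHOLD))
    return sorted(hits)
```

Sorting by predicted ΔΔG puts the sites most likely to yield functional arrays first; a fuller scan would also search the reverse complement.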
Supplementary Data are available at NAR Online.
National Institutes of Health (GM066387 to D.D.); National Science Foundation (DBI0501678 to D.F.V.); National Institutes of Health (GM069906 and GM078369 to J.K.J.); and graduate research assistantships provided by the United States Department of Agriculture (MGET 2001-52100-11506, NSF IGERT0504304 and ISU's Center for Integrated Animal Genomics (CIAG)). Funding for open access charge: National Science Foundation (DBI0501678).
Conflict of interest statement. None declared.
We thank members of our groups and colleagues, especially Fengli Fu, Deepak Reyon, David Wright, Ronnie Winfrey, Ben Lewis, Bob Farnham, Abd Elhamid Azzaz, Les Miller, Gaya Amarasinghe and Vasant Honavar and the referees for their helpful suggestions and valuable feedback. We also thank Guru Rao for the use of his spectrophotometer.