Zinc Finger Targeter (ZiFiT) is a simple and intuitive web-based tool that facilitates the design of zinc finger proteins (ZFPs) that can bind to specific DNA sequences. The current version of ZiFiT is based on a widely employed method of ZFP design, the ‘modular assembly’ approach, in which pre-existing individual zinc fingers are linked together to recognize desired target DNA sequences. Several research groups have described experimentally characterized zinc finger modules that bind many of the 64 possible DNA triplets. ZiFiT leverages the combined capabilities of three of the largest and best characterized module archives by enabling users to select fingers from any of these sets. ZiFiT searches a query DNA sequence for target sites for which a ZFP can be designed using modules available in one or more of the three archives. In addition, ZiFiT output facilitates identification of specific zinc finger modules that are publicly available from the Zinc Finger Consortium. ZiFiT is freely available at http://bindr.gdcb.iastate.edu/ZiFiT/.
Zinc fingers (ZFs), the most abundant DNA-binding motifs encoded in eukaryotic genomes, offer perhaps one of the best understood protein–DNA binding mechanisms (1–4). Engineered zinc finger proteins (ZFPs) have significant potential as tools for gene regulation and genome modification because they can be used to target functional domains to virtually any desired location in any genome. For example, engineered zinc fingers fused to a non-specific nuclease domain can be used to create double-stranded DNA breaks for the purpose of inducing high-efficiency homologous recombination at specific genome loci (5–10).
ZFPs provide a versatile framework for designing proteins with new DNA-binding specificities. One simple method for making ZFPs is to assemble pre-existing single finger ‘modules’ (with known specificities) into multi-finger arrays. Each ZF module recognizes approximately three base pairs of DNA and, when appropriately joined together, the resulting ZF arrays are capable of specifically recognizing longer DNA sequence motifs (Figures 1 and 2). Three research groups have each described and characterized separate archives of ZF modules for constructing multi-finger arrays (11–15). Recently, the Zinc Finger Consortium (http://www.zincfingers.org) has incorporated all three of these archives into a standardized framework that facilitates rapid assembly of multi-finger arrays using a simple restriction digest-mediated cloning strategy (16).
In collaboration with the Zinc Finger Consortium, we have developed a web-based server, ZiFiT (Zinc Finger Targeter), which facilitates the design of zinc finger proteins that recognize specific DNA sequences. ZiFiT has a simple interface through which users provide the DNA sequence of the gene or region within which they wish to search for potential target ZFP-binding sites. Target sites can be either those bound by a single ZFP (Figure 2a) or by a dimeric zinc finger nuclease (in which ZFP arrays bind two ‘half-sites’ separated by a fixed-length ‘spacer’; Figure 2b). ZiFiT output provides users with potential ZFP target DNA sequence(s) within the region of interest, together with a corresponding array of ZF modules needed to construct the desired ZFPs. The output includes information specifying the source of each module and a reference number that uniquely identifies each module within the standardized Zinc Finger Consortium modular assembly archive (16).
A canonical zinc finger module consists of two anti-parallel beta strands and an alpha helix coordinated with a zinc ion via cysteine and histidine contacts (1–4). Amino acids in positions −1 to +6 (numbered relative to the start of the alpha helix) recognize specific DNA triplet sequences, primarily by forming base-specific contacts in the major groove of the double-stranded target DNA (Figure 1a). ZF modules are often referred to according to these ‘recognition’ residues in the alpha helix, listed in N- to C-terminal direction; we refer to the other amino acids in the module as the finger backbone. As illustrated in Figure 1b, a ZFP binds its target DNA site with the amino acids of the recognition helices (from N- to C-terminus) contacting consecutive nucleotides in DNA running in the 3′ to 5′ direction. This can lead to confusion because the DNA target site is typically referred to in the 5′ to 3′ direction.
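The orientation rule described above can be made concrete with a minimal sketch. It assumes a target site written 5′ to 3′ whose length is a multiple of three, and it ignores the Asp-mediated overlap discussed later. The function name `triplets_in_finger_order` is hypothetical and is not part of ZiFiT.

```python
def triplets_in_finger_order(site_5to3: str) -> list[str]:
    """Return the triplets of a 5'->3' target site in N- to C-terminal finger order.

    Successive fingers (N- to C-terminus) contact DNA running 3'->5', so the
    N-terminal finger pairs with the 3'-most triplet. Each triplet is still
    written 5'->3'.
    """
    site = site_5to3.upper()
    if len(site) == 0 or len(site) % 3 != 0:
        raise ValueError("target site length must be a non-zero multiple of 3")
    # Split into triplets as written (5'->3'), then reverse their order.
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    return triplets[::-1]


if __name__ == "__main__":
    # Illustrative site only; not taken from the ZiFiT examples.
    for finger, triplet in enumerate(triplets_in_finger_order("GAAGCTGGG"), start=1):
        print(f"finger {finger} (counting from the N-terminus) -> 5'-{triplet}-3'")
```

For the site 5′-GAAGCTGGG-3′, the N-terminal finger pairs with GGG and the C-terminal finger with GAA, which is the reversal that the text warns can cause confusion.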
ZiFiT was designed to take advantage of ZF module sets developed and characterized by three independent research groups. Users can specify whether modules should be chosen from only one, two or all three of these module sets (designated as Barbas, Sangamo or ToolGen, based on the group that described them—see below). The Zinc Finger Consortium has generated plasmids encoding modules from all three sets in a standardized framework, allowing users to rapidly assemble desired ZF proteins using individual modules from sets of their choice (16). ZiFiT was developed in conjunction with the Zinc Finger Consortium and automatically provides reference numbers for requesting these ready-to-assemble plasmid modules from Addgene, a non-profit plasmid distribution service (http://www.addgene.org/zfc/). It should be noted that the designers of each module set did not necessarily intend for users to mix modules from different sets within a single ZFP array. However, several studies have generated functional ZFPs by combining modules from different sets (10,14).
Barbas Modules—These ZF modules were developed using a combination of phage display and rational design methods by the Barbas laboratory at The Scripps Research Institute (11,12,15). Modules are available for recognition of all GNN triplets, most ANN and CNN triplets, and a few TNN triplets. These Barbas modules were developed under the assumption that individual ZF modules have virtually complete positional independence, i.e. their recognition properties are not dramatically affected by their position within an array or by the identities of neighboring zinc fingers. The current version of ZiFiT includes 49 distinct Barbas modules (see Supplementary Table 1).
Sangamo Modules—Sangamo ZF modules were designed at Sangamo BioSciences Inc., and are currently available for all GNN triplets and a smaller number of ‘non-GNN’ triplets (13,17,18). Sangamo modules were developed under the assumption that the position of a module within a three-finger array can affect its recognition properties (e.g. amino-terminal finger compared with carboxy-terminal finger). Each of the three positions within a three-finger array has a distinct ZF module developed for a given triplet at that position. For this reason, if a user chooses to use Sangamo modules, ZiFiT restricts the user to the design of three-finger arrays in which positional context is preserved. The current version of ZiFiT includes 57 position-specific Sangamo modules (see Supplementary Table 1).
ToolGen Modules—ToolGen modules are naturally occurring human zinc fingers that were identified and characterized by ToolGen Inc. and are available for a variety of nucleotide triplets (14). The current version of ZiFiT includes 35 distinct ToolGen modules (see Supplementary Table 1).
ZiFiT facilitates the design of ZFPs using the ‘modular assembly’ approach in which pre-existing individual zinc finger modules are linked together to recognize desired target sequences. ZiFiT is available at http://bindr.gdcb.iastate.edu/ZiFiT/ or can be accessed under Software Tools from the Zinc Finger Consortium website at http://www.zincfingers.org/software-tools.htm. The ZiFiT website includes instructions and examples as well as several links to other websites that provide background information regarding zinc finger protein design. A FAQs page provides additional guidance for new users.
Single ZFP-binding sites—Individual zinc finger modules can be linked together to form multi-finger arrays that recognize specific sequences in double-stranded genomic DNA (Figure 2a). These multi-finger arrays can be fused to other protein domains, such as transcriptional activation or repression domains, in order to target them to specific locations within large genomes (1–4). Because a single ZF recognition helix typically binds three contiguous nucleotides in DNA, most binding sites for single ZF proteins (which we designate ‘single ZF array binding sites’) have lengths that are multiples of three base pairs. However, certain ZF modules containing aspartic acid in the +2 position of the DNA recognition helix appear to recognize four nucleotides. This can result in ‘target site overlap’ between adjacent ZF modules or, if the Asp-containing module occurs in the amino-terminal position of an array, the requirement for an additional 3′ nucleotide in the ZF array binding site (19).
Dimeric zinc finger nuclease sites—Zinc finger nucleases (ZFNs) consist of a zinc finger array fused to a non-specific dsDNA nuclease (e.g. the nuclease domain of the Type IIS restriction enzyme FokI) (5,6,8,10). ZFNs made with FokI nuclease are catalytically active only as dimers (20). Thus, a full ZFN target site consists of two ZF ‘half-sites’ on complementary DNA strands, separated by a ‘spacer’ of five or six base pairs, as shown in Figure 2b (6,21). In this article, we designate the two ‘half-sites’ together with the spacer as a ‘dimeric ZF nuclease site.’
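The geometry of a dimeric ZF nuclease site can be sketched as a simple scan. This is an illustration only, not the ZiFiT implementation. It assumes the commonly described layout in which the left half-site lies on the bottom strand (so its reverse complement appears on the top strand), followed by the spacer and then the right half-site on the top strand. The function names and the set `TOY_TRIPLETS` are hypothetical placeholders and do not reflect the contents of any module archive.

```python
# Placeholder triplets for illustration; NOT the contents of any real archive.
TOY_TRIPLETS = {"GAA", "GCT", "GGG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_bindable(site: str, triplets: set[str]) -> bool:
    """True if every consecutive triplet of the site is in the allowed set."""
    return all(site[i:i + 3] in triplets for i in range(0, len(site), 3))


def find_zfn_sites(seq, triplets, n_triplets=3, spacers=(5, 6)):
    """Return (start, spacer_length, left_half_site, right_half_site) tuples.

    Both half-sites are reported 5'->3' on their own strands. The strand
    layout is an assumption stated in the accompanying text.
    """
    width = 3 * n_triplets
    hits = []
    for spacer in spacers:
        total = 2 * width + spacer
        for start in range(len(seq) - total + 1):
            # Left half-site is read from the bottom strand.
            left = reverse_complement(seq[start:start + width])
            # Right half-site is read directly from the top strand.
            right = seq[start + width + spacer:start + total]
            if is_bindable(left, triplets) and is_bindable(right, triplets):
                hits.append((start, spacer, left, right))
    return hits


if __name__ == "__main__":
    demo = reverse_complement("GAAGCTGGG") + "ACGTAC" + "GGGGCTGAA"
    print(find_zfn_sites(demo, TOY_TRIPLETS))
```

Because the two arrays bind opposite strands, a sketch of this kind has to inspect both strands of every window, which is consistent with the strand option being unavailable for ZFN searches.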
First-time users must complete a quick and easy registration. After logging in, users indicate which type of ZFP they wish to design, either single zinc finger arrays or dimeric zinc finger nucleases (ZFNs) before proceeding to the main sequence input screen (Figure 3). Check boxes near the top of the page allow users to select which ZF module sets (Barbas, Sangamo, ToolGen) they wish to include in their search. The DNA sequence of interest (i.e. the sequence of the region within which the user wishes to identify potential ZFP target sites) can be submitted either in FASTA format or as a raw DNA sequence in standard 5′ to 3′ orientation (both spaces and numbers are ignored). Users can specify the number of DNA triplets to include in each ZF array target site using drop-down menus below the sequence input box.
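The input handling described above (FASTA or raw sequence, with spaces and numbers ignored) could look roughly like the following sketch. The function name `clean_sequence` is hypothetical. Rejecting characters other than A, C, G and T is an assumption made for the example, since the text does not state how ZiFiT treats ambiguous bases.

```python
def clean_sequence(text: str) -> str:
    """Normalize FASTA or raw DNA input to an upper-case A/C/G/T string."""
    # Drop FASTA header lines, which begin with '>'.
    body_lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    # Ignore whitespace and digits, as is common in numbered sequence listings.
    seq = "".join(
        ch for ch in "".join(body_lines) if not ch.isspace() and not ch.isdigit()
    ).upper()
    # Assumption: anything other than A, C, G, T is treated as an error.
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return seq


if __name__ == "__main__":
    print(clean_sequence(">gene1 example\n1 acgt acgt\n11 ggcc\n"))
```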
Zinc finger proteins with fewer than three modules do not typically possess affinities needed to bind their targets, while target sites of 18 nt (six modules) are typically enough to ensure their uniqueness in eukaryotic genomes (22). ZiFiT restricts target site sizes to 3–8 triplets (corresponding to 3–8 ZF modules) for standard single ZF arrays and to 3 or 4 triplets (corresponding to 3–4 ZF modules) for both the left and right arrays of a dimeric ZF nuclease site. In dimeric ZFN target sites, the length of the spacer region, within which the active dimeric nuclease cleaves, can be defined by the user as either five or six DNA bases. The preferred distance is six bases (6,21).
Advanced options can be accessed by selecting the ‘Advanced’ link in the lower right corner of the input page (as shown in Figure 3; this box toggles to a ‘Basic’ link, which hides the Advanced Options). Advanced options allow the user to: (i) define ‘Triplet Composition’ by specifying the minimum or maximum number of GNN, ANN, CNN, TNN triplets to include in potential target sites; (ii) choose whether to ‘ignore Asp overlap,’ which refers to the target site overlap that can occur with Asp in position +2 of the helix (see the Materials and Methods section) and (iii) choose to search one or both strands of the input DNA sequence for target sites, when searching for a single array site. By definition, both strands are considered in a ZFN site search, therefore this option is not available in the ZFN window.
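A ‘Triplet Composition’ restriction of the kind described in option (i) can be sketched as a filter on candidate sites. This is a minimal illustration; the function names and the dictionary-based way of expressing the limits are assumptions made for the example and do not describe the ZiFiT interface.

```python
from collections import Counter


def triplet_classes(site: str) -> Counter:
    """Count triplets by their first base, e.g. 'GAA' is counted as 'GNN'."""
    return Counter(site[i] + "NN" for i in range(0, len(site), 3))


def passes_composition(site: str, minimum=None, maximum=None) -> bool:
    """Check a 5'->3' site against optional per-class minimum/maximum counts.

    `minimum` and `maximum` map class names such as 'GNN' to integer limits.
    """
    counts = triplet_classes(site.upper())
    for cls, limit in (minimum or {}).items():
        if counts[cls] < limit:
            return False
    for cls, limit in (maximum or {}).items():
        if counts[cls] > limit:
            return False
    return True


if __name__ == "__main__":
    # Require an all-GNN three-triplet site (illustrative sequences).
    print(passes_composition("GAAGCTGGG", minimum={"GNN": 3}))
    print(passes_composition("GAAACTGGG", minimum={"GNN": 3}))
```

Requiring a high minimum number of GNN triplets corresponds to the bias toward GNN-rich sites that the ZiFiT guidance suggests.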
Guidance for using advanced options is provided on the ZiFiT Instructions and FAQ pages. For example, users can adjust ‘Triplet Composition’ to enforce a bias for target sites containing mostly GNN triplets. This can improve chances of successful ZFP design because GNN-specific ZF modules have been more thoroughly characterized than other modules. The ‘Ignore Asp overlap’ option is particularly useful for troubleshooting: for example, Asp in position +2 can cause unexpected results when the target site falls on the very 3′ end of a submitted sequence. An Asp in position +2 of the first module specifies an additional 3′ base. If this base is not available, ZiFiT detects only a partial target site, and partial matches are not returned to the user. Thus, this option should be used only by advanced users or for troubleshooting why a suspected site is not returned by ZiFiT.
ZiFiT searches for potential ZFP target sites in the input DNA sequence, based on the module sets chosen by the user (and other user-specified restrictions described above). For each ‘hit’ within the query DNA sequence, ZiFiT output consists of two components: a ‘DNA target site’ and a table of corresponding ‘ZFP modules’ that could be assembled to recognize that target site. As shown in Figure 4, the double-stranded DNA sequence (displayed at top) represents a potential dimeric ZFN target site identified by ZiFiT. Recognition triplets in the target site are color-coded to match corresponding entries in the table of ZFP modules (displayed below the target site sequence). The table lists ZF modules that have been experimentally shown to bind triplets in the displayed target site. If more than one module is available for a given triplet position, all modules for that position are displayed.
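The search for single ZF array sites can be illustrated with a short sliding-window sketch. This is not the ZiFiT algorithm, and it ignores Asp overlap and position-specific (Sangamo) constraints. The function names and the triplet set `TOY_TRIPLETS` are hypothetical placeholders chosen for the example and do not reflect the contents of any module archive.

```python
# Placeholder triplets for illustration; NOT the contents of any real archive.
TOY_TRIPLETS = {"GAA", "GCT", "GGG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_single_sites(seq, triplets, n_triplets=3, both_strands=True):
    """Return (start, strand, site) for windows whose triplets all have modules.

    `start` is the 0-based offset on the input (top) strand. `site` is written
    5'->3' on the strand that would be bound.
    """
    width = 3 * n_triplets
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        candidates = [("+", window)]
        if both_strands:
            candidates.append(("-", reverse_complement(window)))
        for strand, site in candidates:
            if all(site[i:i + 3] in triplets for i in range(0, width, 3)):
                hits.append((start, strand, site))
    # Hits come out in order of position within the query sequence.
    return hits


if __name__ == "__main__":
    print(find_single_sites("TTGAAGCTGGGTT", TOY_TRIPLETS))
```

A fuller version would attach to each triplet the list of all modules that recognize it, so that every alternative module for a position could be reported, as in the output table described above.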
Each ZF module entry includes the recognition helix sequence of the module, the color-coded DNA sequence of its corresponding target triplet, a reference number for requesting the ZF module, if desired, from the Addgene Zinc Finger Consortium website (http://www.addgene.org/zfc/), and the original source of the module (Barbas, Sangamo or ToolGen). When ZiFiT identifies more than one potential ZFP/target site pair within the query sequence, additional hits are also displayed and can be viewed by scrolling. In the current implementation, the order in which ZiFiT returns potential target sites is determined solely by their position within the query sequence. In some cases, ZiFiT may return a large number of potential target sites and/or several corresponding potential ZFP designs. Tips for choosing among these are provided on the ZiFiT FAQs page.
ZiFiT is designed to facilitate the experimental generation of ZFPs by providing output that includes reference numbers for ready-to-assemble modules generated by the Zinc Finger Consortium. These resources can greatly simplify the ZF protein assembly process. All ZF modules used in ZiFiT can be requested through Addgene, a non-profit plasmid distribution service (see http://www.addgene.org/zfc for additional details). A detailed protocol for assembling and evaluating zinc finger proteins using this strategy has been published recently in Nature Protocols (16). Other research groups have also described PCR-based zinc finger assembly protocols (23,24). Additional general information regarding zinc finger proteins and their applications is available at the Zinc Finger Consortium website at http://www.zincfingers.org.
ZiFiT will be updated regularly to reflect growth in ZFP assembly resources (e.g. collections of zinc finger modules and arrays) and increasing availability of experimental data regarding the in vivo efficacy of specific ZF proteins. While several studies have demonstrated that ZFPs engineered using the modular assembly approach can function successfully, other studies (both published and unpublished) have shown that designed ZFPs function with variable degrees of success (14,25,26). As noted above, in the current version of ZiFiT, the order in which potential ZF array target sites are displayed in the output window corresponds to the order in which they occur within the query DNA sequence—no ranking is implied. A feature we plan to implement in the next version of ZiFiT is a scoring function that will rank and provide information for each hit (i.e. for each ZFP/target site pair) in the ZiFiT output. This scoring information and ranked listing of experimentally validated ZF target sites will provide additional guidance to assist users in designing ZF arrays that are most likely to function. In collaboration with the Zinc Finger Consortium, members of our groups have designed and implemented a Zinc Finger Experiment Database, with the goal of collecting and making accessible results from all available zinc finger experiments. The next version of ZiFiT will be closely integrated with this database. It will provide users with relevant results regarding ZF modules and potential ZF target sites within a query sequence that are identical or highly similar to modules and target sites that have been experimentally tested. We will update ZiFiT to support additional ZF modules and/or methods as they are developed and made available by the zinc finger research community.
A server offering software similar to ZiFiT, named Zinc Finger Tools (http://www.zincfingertools.org) has been developed by the Barbas lab at the Scripps Institute, for use with their ZF module set (27). One significant advantage of ZiFiT is that it offers users the ability to choose modules from three different module sets instead of only the single Barbas set, and users may combine modules from different archives if desired. When using the Barbas Zinc Finger Tools software, a user will only have one potential protein to test for any given target site. Because modular assembly does not always yield functional proteins (14,25) it is important to have multiple potential zinc finger arrays for any given target site. Also, ZiFiT was intentionally designed to complement experimental resources made publicly available by the Zinc Finger Consortium and distributed through Addgene. Zinc Finger Tools offers a scoring function to assist users in choosing among potential zinc finger target sites, but as noted on the Zinc Finger Tools website itself: rigorous empirical validation of the scoring function awaits further experimental data.
We respectfully acknowledge those who have shared their results with the zinc finger research community. We also thank users and referees for providing many helpful suggestions that have improved the ZiFiT server.
This work is supported in part by NIH grant GM066387 (D.D.) and NSF grant DBI 0501678 (D.F.V.), and by graduate research assistantships provided by USDA MGET grant 2001-52100-11506 and the ISU Center for Integrated Animal Genomics (CIAG). J.K.J. is supported by NIH grants GM0699006 and GM072621.
Funding to pay the Open Access publication charges for this article was provided by NSF grant DBI 0501678.
Conflict of interest statement. None declared.