Completion of the Toxoplasma genome sequencing project and the availability of the Toxoplasma genome database make it possible to study large cohorts of genes predicted to play roles in particular cellular events. While gene disruption can be a powerful approach for determining protein function, establishing the subcellular localization of a protein can also provide important clues to its role in a process or pathway. We present here a combination of approaches for improving the efficiency of determining protein localization. First, we generated a T. gondii strain lacking Ku80, which functions with Ku70 in NHEJ DNA repair. In the absence of Ku80, homologous recombination predominates. We have exploited this to introduce a YFP tag into genes at their endogenous loci. We also generated a YFP expression construct containing a LIC cassette for convenient cloning of genes of interest. LIC cloning proved much more efficient than conventional restriction enzyme-mediated subcloning, since a single restriction digest of the vector allows the cloning of any insert with complementary LIC sequences. This makes it well suited to high-throughput cloning of inserts.
Several proteins that have previously been tagged with GFP were used as positive controls to test the effectiveness of this construct and system. ACP, pCNA, MIC3, and ROP1 all targeted properly to their respective organelles and, importantly, this analysis showed that essential genes, such as ACP and pCNA, are amenable to tagging. However, IMC1-YFP and HSP60-YFP failed to show expression, even after multiple attempts. One potential explanation is that the YFP is cleaved from the gene product, a likely scenario for IMC1 after the protein is incorporated into the inner membrane complex (23). Thus, a limitation of the system is that gene products that undergo C-terminal proteolysis or glycosylphosphatidylinositol anchor addition cannot be tagged and visualized.
Homologous recombination generally requires either a double-strand break in one of the two DNA double helices (plasmid or genomic DNA) or a single-strand nick in both helices (2, 22). Early studies performed in yeast determined that plasmids containing a double-strand break in the region of homology to genomic sequences recombined with high efficiency after yeast transformation (28). Linearization of the integrating construct results in ends-in recombination, wherein the entire plasmid integrates into the genome at the region of homology and creates a duplication of the homologous sequence; a chromosomal break does not occur because the break or gap is repaired by gene conversion (reviewed in reference 38). In the absence of construct linearization, proper integration depends on the low probability of a chromosomal break at the targeted locus. After analyzing constructs in supercoiled, linear, or nicked/open circle forms, we confirmed that only constructs linearized within the gene fragment integrated into the homologous locus. Linearization of the construct near the selectable marker did not result in proper integration of YFP into the gene of interest, but it is possible that a proportion of these pyrimethamine-resistant parasites carried the DHFR-TS selectable cassette in the endogenous DHFR-TS locus as a result of double-crossover gene replacement. This would result in YFP-negative, pyrimethamine-resistant parasites. Based on the findings of Fohl and Roos, however, these parasites have a fitness defect compared to parasites with either a normal copy or exogenous copies in addition to the endogenous gene (8). Thus, such parasites may be seen only in situations where tagging of the target locus results in a substantial growth defect.
Linearization within the gene fragment requires the presence of a unique restriction site, which may necessitate amplification of a larger fragment of DNA to include such a site. Fortunately, this was not a hindrance, since we found that LIC cloning of larger fragments (3 to 4 kb) was as efficient as that of shorter fragments (1 to 2 kb). In the case of 49.m03355, a particularly small gene of 240 bp, an additional 2.7 kb of upstream genomic DNA was amplified to include a unique restriction enzyme site. No other genes were present in the additional DNA amplified, but since the targeting sequence inserts by homologous recombination, any gene(s) present would nonetheless be reconstituted after integration.
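The search for a suitable linearization site can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in this study: the enzyme panel is an arbitrary handful of common enzymes with well-known palindromic recognition sequences, the function names (`site_positions`, `unique_cutters`) are hypothetical, the example sequence is invented, and the requirement that the enzyme not cut the vector backbone is an assumption made for the sketch.

```python
import re

# Recognition sequences of a few common restriction enzymes.
# All are palindromic, so scanning one strand is sufficient.
# The choice of enzymes is illustrative, not the panel used in the study.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def site_positions(seq, site):
    """Return 0-based start positions of every occurrence of site in seq."""
    # A lookahead match also reports overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def unique_cutters(fragment, vector_backbone=""):
    """Return {enzyme: position} for enzymes that cut the fragment exactly once.

    Enzymes that also cut the (optional) vector backbone are excluded,
    on the assumption that the construct must be linearized only within
    the gene fragment.
    """
    result = {}
    for enzyme, site in RECOGNITION_SITES.items():
        in_fragment = site_positions(fragment, site)
        in_backbone = site_positions(vector_backbone, site)
        if len(in_fragment) == 1 and not in_backbone:
            result[enzyme] = in_fragment[0]
    return result


if __name__ == "__main__":
    # Invented toy sequence: one EcoRI site, two BamHI sites.
    toy_fragment = "ATGC" + "GAATTC" + "TTTT" + "GGATCC" + "AAAA" + "GGATCC"
    print(unique_cutters(toy_fragment))  # {'EcoRI': 4}
```

If no enzyme in the panel cuts the fragment exactly once, the function returns an empty dictionary, which corresponds to the situation described above in which a larger genomic fragment must be amplified to capture a unique site.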
The positive control YFP-tagged proteins targeted properly to their expected organelles, and many tagged hypothetical proteins showed a pattern consistent with localization to the apical invasion organelles (rhoptries and micronemes). However, additional genes selected for tagging either never showed YFP expression (55.m04618, 8.m00176, 8.m00178, 80.m02161, and 50.m00008) or displayed a pattern suggestive of mistargeting or retention (20.m03958 and 49.m00054) in a subapical location, which is presumably a staging ground for the apical invasion organelles (Table ). Similar results have been seen with some exogenously GFP-tagged genes encoding secretory proteins (49). These gene products do not appear to have a feature in common, e.g., a transmembrane anchor, that could account for the failure of the fusion protein to reach the invasion organelles. Bands detected on immunoblots were of the sizes predicted for the YFP-tagged proteins, implying that the proper genes were tagged. There are several possible scenarios: (i) these proteins may naturally occupy a subapical site; (ii) the precursor is observed in a subapical compartment and the YFP is cleaved from the mature protein before it continues on to the invasion organelles (as may be the case for 49.m00054); or (iii) the YFP tag hinders the protein from moving through the secretory pathway to its final destination. Distinguishing among these possibilities would require producing antibodies to establish the normal location of the protein. We are currently generating and testing additional constructs with smaller epitope tags, which may better facilitate the correct localization of the tagged gene product.
Genes were chosen for tagging based on their potential role in invasion, as indicated by the similarity of their expression profiles to those of known invasion proteins or by their presence in an ESA proteomic screen. While we expected most of these genes to encode proteins of the invasion organelles, there were exceptions. For example, the 49.m03355 protein showed a distinct signal at the extreme apical tip of the parasite, with no overlap with microneme proteins AMA1 or MIC2 or with the inner membrane complex protein IMC1. The pattern is similar to that of TgCAM1 and TgCAM2 (17), which, like 49.m03355, are observed at the apical tip of mature parasites and in developing daughter cells. The resolution of light microscopy is insufficient to ascertain whether 49.m03355 is localized to the apical polar rings or the microtubules comprising the conoid body.
In addition to the YFP vector, we have also engineered constructs containing mCherry, tandem dimer Tomato (tdTomato), and cyan fluorescent reporter proteins (21, 33, 34). We speculate that low pH in the apical invasion organelles may hamper the YFP signal, and assessing additional fluorescent reporter proteins may improve signal detection. Since pCNA-mCherry and pCNA-tdTomato constructs integrated properly and are correctly expressed in the nucleus (data not shown), these vectors can now be used to try tagging genes that did not show fluorescence with YFP. Constructs encoding YFP and cyan fluorescent protein may also be utilized in fluorescence resonance energy transfer interaction studies of two chromosomally tagged proteins. Another application of the tagging method is to identify interacting partners in protein complexes. To this end, we have generated a tagging construct with a tandem affinity purification tag for isolation of protein complexes. The combination of these constructs, the LIC cloning method, and the Δku80 parasite strains should prove useful for more rapid determination of protein localization and interacting partner proteins in the functional analysis of novel proteins.