

HHS Public Access author manuscript; accepted for publication in a peer-reviewed journal.
Nat Methods. Author manuscript; available in PMC 2008 June 20.
Published in final edited form as:
PMCID: PMC2435063

Epitope tagging of endogenous proteins for genome-wide ChIP-chip studies


We developed a strategy to introduce epitope tag–encoding DNA into endogenous loci by homologous recombination–mediated ‘knock-in’. The tagging method is straightforward, can be applied to many loci and several human somatic cell lines, and can facilitate many functional analyses including western blot, immunoprecipitation, immunofluorescence and chromatin immunoprecipitation–microarray (ChIP-chip). The knock-in approach provides a general solution for the study of proteins to which antibodies are substandard or not available.

For ease in constructing directed knock-ins of Flag epitope tags, we developed a universal knock-in vector. The vector contains two multiple cloning sites, sequences that encode a triple Flag epitope tag (3×Flag), a neomycin-resistance gene flanked by loxP sites, and two inverted terminal repeats (Fig. 1). For targeted knock-in of 3×Flag, we inserted sequences homologous to the 5′ and 3′ regions flanking the target locus into the two respective cloning sites and packaged the resulting vector into recombinant adeno-associated virus (rAAV). We then infected cells with the targeting virus and selected neomycin-resistant clones. We identified correctly targeted clones by genomic PCR and then excised the neomycin-resistance gene by infection with adenovirus expressing Cre recombinase. As detailed in Supplementary Methods online, we used this strategy to Flag-tag the C termini of the following five proteins in three different colorectal cancer (CRC) cell lines: STAT3 (signal transducer and activator of transcription 3; Fig. 1c), PTPN14 (protein tyrosine phosphatase non-receptor 14), MRE11 (meiotic recombination 11), CHD7 (chromodomain helicase DNA-binding protein 7) and N, encoding a new protein (Supplementary Fig. 1 online). Knock-in of the 3×Flag epitope was highly efficient, with targeting frequencies of 1–4% across these five knock-in experiments (correctly targeted clones/total neomycin-resistant clones; Supplementary Table 1 online), and the efficiency of Cre-mediated excision of the neomycin cassette was 42–83% (Supplementary Table 1). The resulting epitope-tagged proteins were readily detectable by western blot, immunoprecipitation and immunofluorescence using commercially available anti-Flag antibodies (Fig. 2 and Supplementary Fig. 1).
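Both efficiencies quoted above are simple ratios of clone counts. The sketch below makes that arithmetic explicit. The helper names and the clone counts are hypothetical, and the denominator used for excision efficiency (clones screened after Cre infection) is an assumption, because it is not spelled out in this section.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, validating the clone counts."""
    if whole <= 0:
        raise ValueError("denominator must be a positive clone count")
    if not 0 <= part <= whole:
        raise ValueError("numerator must lie between 0 and the denominator")
    return 100.0 * part / whole


def targeting_frequency(correct: int, neo_resistant: int) -> float:
    # Correctly targeted clones (confirmed by genomic PCR) / all NEO-resistant clones.
    return percent(correct, neo_resistant)


def excision_efficiency(neo_excised: int, screened: int) -> float:
    # Assumption: clones that lost the loxP-flanked NEO cassette after Cre
    # infection / all clones screened after Cre infection.
    return percent(neo_excised, screened)


# Hypothetical counts for illustration only (not data from this study).
print(f"targeting frequency: {targeting_frequency(3, 150):.1f}%")  # targeting frequency: 2.0%
print(f"excision efficiency: {excision_efficiency(10, 16):.1f}%")  # excision efficiency: 62.5%
```

At a targeting frequency of 1–4%, screening about 100 neomycin-resistant clones would be expected to yield roughly 1 to 4 correctly targeted clones.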
Moreover, Flag-STAT3 retained the ability to activate the expression of a target gene [1], and Flag-MRE11 remained associated with known interacting proteins [2], suggesting that the 3×Flag tag is unlikely to interfere with the function of the tagged proteins (Supplementary Fig. 2 online). These data indicate that rAAV-mediated targeting is feasible for multiple loci across several CRC cell lines and that 3×Flag can serve as a universal epitope for several antibody-based applications.

Figure 1
Schematic diagram of tagging endogenous protein with 3×Flag. (a) Targeting (NEO-loxP-3×Flag) vector. L-ITR and R-ITR, left and right inverted terminal repeats, respectively; MCS, multiple cloning site; CMV, cytomegalovirus promoter; NEO ...
Figure 2
3×Flag tagged proteins are detectable by western blot, immunoprecipitation and immunofluorescence. (a) Western blots of wild-type and 3×Flag-tagged STAT3 in DLD1 cells with either anti-STAT3 or anti-Flag. (b) Cell lysates of STAT3 3×Flag ...

To test whether the tagging method can be used for global chromatin immunoprecipitation (ChIP) analyses, which require antibodies of very high specificity, we performed ChIP-chip analyses on homozygously tagged STAT3 and heterozygously tagged CHD7 loci. We performed ChIP using antibodies either to the Flag tag or to native STAT3 or CHD7 proteins. Hybridizations were carried out on tiled microarrays spanning the Encyclopedia of DNA Elements (ENCODE) regions [3]. A histogram of the intensity ratios of oligonucleotide probes after hybridization was skewed in the positive direction, consistent with enrichment of DNA fragments captured by ChIP with the Flag antibody (Fig. 3a and data not shown). We also plotted intensity ratios by their position along each chromosome; representative examples of the STAT3 data are shown in Figure 3b and Supplementary Figure 3 online. We used a computer program incorporating a sliding-window and threshold approach to identify genomic sites enriched for STAT3 binding at high confidence (P < 1 × 10⁻¹⁵) [4,5]. Within the ENCODE regions, we identified 179 binding sites using Flag antibodies and 153 binding sites using STAT3 antibodies in cells expressing Flag-tagged STAT3, and 161 binding sites using STAT3 antibodies in wild-type DLD1 cells (Supplementary Table 2 online). Using conventional ChIP-PCR, we verified 15 of 18 Flag-STAT3 binding sites (83%; Supplementary Fig. 4 online). Moreover, the relative enrichment at each site tested was similar between cells expressing Flag-tagged and wild-type STAT3 (R² = 0.79), indicating that ChIP efficiency was largely independent of the cell line and antibody used. Although these studies used homozygous 3×Flag-tagged STAT3, ChIP-chip analysis of heterozygous 3×Flag-tagged CHD7 yielded similar enrichment, indicating that tagging a single allele of a given gene is sufficient for ChIP analyses (data not shown).
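The sliding-window and threshold idea can be sketched as follows. This is not the authors' program (refs. 4,5): the window size, the percentile threshold, the per-window binomial test and the cutoff are all assumptions chosen for illustration, and the published tool may use a different statistic. Each probe is marked "high" if its log ratio exceeds an array-wide quantile, each window is tested for an excess of high probes, and passing windows are merged into regions. The toy cutoff is far looser than the P < 1 × 10⁻¹⁵ used in the study, because a window of about ten probes cannot reach such small P values under this test.

```python
import math
from typing import List, Sequence, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """One-sided tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_enriched_windows(
    positions: Sequence[int],    # probe coordinates, sorted ascending
    log_ratios: Sequence[float],  # ChIP/input log ratio for each probe
    window_bp: int = 1000,        # assumed window width
    quantile: float = 0.95,       # assumed "high probe" threshold
    p_cutoff: float = 1e-3,       # assumed per-window significance cutoff
) -> List[Tuple[int, int, float]]:
    """Return merged enriched regions as (start, end, smallest window P value)."""
    n_probes = len(positions)
    ranked = sorted(log_ratios)
    threshold = ranked[min(n_probes - 1, int(quantile * n_probes))]
    high = [r > threshold for r in log_ratios]
    p_high = sum(high) / n_probes  # background fraction of high probes
    half = window_bp // 2

    regions: List[Tuple[int, int, float]] = []
    lo = hi = 0
    for centre in positions:
        # Two-pointer window holding probes with centre - half <= pos <= centre + half.
        while positions[lo] < centre - half:
            lo += 1
        while hi < n_probes and positions[hi] <= centre + half:
            hi += 1
        n = hi - lo
        k = sum(high[lo:hi])
        p = binomial_tail(k, n, p_high)
        if p < p_cutoff:
            start, end = positions[lo], positions[hi - 1]
            if regions and start <= regions[-1][1]:
                # Overlaps the previous enriched region: extend it.
                prev_start, prev_end, prev_p = regions[-1]
                regions[-1] = (prev_start, max(prev_end, end), min(prev_p, p))
            else:
                regions.append((start, end, p))
    return regions
```

Testing probe counts rather than averaging raw ratios makes a single outlier probe unable to call a site on its own; the trade-off is that the result depends on the chosen quantile and window width.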

Figure 3
ChIP analysis of wild-type and Flag-tagged STAT3. (a) Histogram of mean signal ratios of Flag-STAT3 chromatin-immunoprecipitated DNA versus random-sheared total genomic DNA. The distinct tail at the right-hand end corresponds to DNA fragments enriched ...

Binding profiles from Flag-STAT3 and wild-type STAT3 ChIP-chip experiments appear strikingly similar for the 0.5-Mb ENCODE regions shown in Figure 3b. To systematically determine the overlap of STAT3 binding sites across the remaining 29.5 Mb of the ENCODE regions, we selected all sites identified by ChIP-chip with antibodies to either Flag or wild-type STAT3 (n = 214) and plotted the maximum mean signal intensity for each site in a scatter plot (Fig. 3c) and a heatmap (Fig. 3d). The plots revealed excellent correlation between sites identified using Flag antibodies and those identified using STAT3 antibodies, suggesting that the vast majority of binding sites identified in the two experiments overlap and supporting the fidelity of ChIP results obtained with the epitope-tagging method. Some of the nonoverlapping sites could be due to differences in antibody sensitivity, subtle variations in growth conditions or experimental variability. However, we think that most of the variation results from threshold effects in data processing (sites falling just below the cutoff in one data set) rather than from a genuine absence of binding. This view is supported both by the heatmap (Fig. 3d) and by visual examination of the raw data (Supplementary Fig. 3). Regardless of these minor differences, the data suggest that the Flag antibodies are specific for STAT3 and that, by and large, the tag does not alter the genomic distribution of STAT3.
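The comparison described above has three steps: take the union of sites called in either experiment, ask which union sites were called in both, and correlate the maximum mean signal per site between the two experiments. The sketch below assumes inclusive (start, end) coordinates on a single chromosome and a per-probe mean signal stored as a position-to-value mapping. All function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: inclusive (start, end) coordinates on one chromosome.
Interval = Tuple[int, int]


def merge_union(*call_sets: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping sites from several call sets into one union list."""
    merged: List[Interval] = []
    for start, end in sorted(iv for calls in call_sets for iv in calls):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(site: Interval, calls: Sequence[Interval]) -> bool:
    """True if `site` shares at least one base with any interval in `calls`."""
    return any(start <= site[1] and site[0] <= end for start, end in calls)


def max_signal(site: Interval, signal: Dict[int, float]) -> float:
    """Maximum mean signal over probes inside `site` (raises if the site has no probes)."""
    return max(v for pos, v in signal.items() if site[0] <= pos <= site[1])


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; its square is the R² of a linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_call_sets(flag_calls, native_calls, flag_signal, native_signal):
    """Union of sites, sites called in both experiments, and signal correlation."""
    union = merge_union(flag_calls, native_calls)
    shared = [s for s in union if overlaps(s, flag_calls) and overlaps(s, native_calls)]
    xs = [max_signal(s, flag_signal) for s in union]
    ys = [max_signal(s, native_signal) for s in union]
    return union, shared, pearson_r(xs, ys)
```

Because the signal is read at every union site, including sites called in only one experiment, a site that narrowly missed the cutoff in one data set still contributes a high value on both axes. This is one way threshold effects, rather than true absence of binding, can show up in such a plot.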

Using the Clover algorithm [6] and all motifs in the TRANSFAC database [7], we tested ChIP-chip hits for enrichment of motifs corresponding to known transcription factor binding sites. As expected, the STAT3 motif was significantly enriched in ChIP-chip hits from cells expressing either Flag-tagged or wild-type STAT3 (P < 0.01; data not shown). Notably, we also detected enrichment of AP-1 and HNF-3 motifs. Interactions of STAT3 with c-Jun (a component of AP-1) and with HNF-3 have been reported previously [8,9], and the enrichment of these motifs may reflect cooperative binding to DNA.
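Clover scores TRANSFAC position weight matrices against background sequences. The sketch below is a deliberately simplified stand-in, not Clover itself. It counts matches to a consensus string and estimates an empirical P value against composition-preserving shuffles of the same sequences. Using TTCNNNGAA as the STAT consensus is an assumption made for illustration. Because that consensus is its own reverse complement, scanning one strand is enough. All names are hypothetical.

```python
import random
import re
from typing import Sequence, Tuple

# Assumption: STAT-family consensus TTCNNNGAA; the lookahead counts overlapping hits.
STAT_SITE = re.compile(r"(?=TTC[ACGT]{3}GAA)")


def count_hits(seqs: Sequence[str], motif) -> int:
    """Total motif occurrences across all sequences (case-insensitive input)."""
    return sum(len(motif.findall(s.upper())) for s in seqs)


def shuffled(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: keeps base composition, scrambles base order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def empirical_enrichment_p(
    seqs: Sequence[str], motif, n_shuffles: int = 1000, seed: int = 0
) -> Tuple[int, float]:
    """Observed hit count and the fraction of shuffled sets with at least as many hits."""
    rng = random.Random(seed)
    observed = count_hits(seqs, motif)
    as_extreme = 0
    for _ in range(n_shuffles):
        background = [shuffled(s, rng) for s in seqs]
        if count_hits(background, motif) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_shuffles + 1)
```

Mononucleotide shuffles are a crude background. Dinucleotide-preserving shuffles or matched genomic background sequences, as Clover allows, usually give more conservative estimates.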

The tagging approach offers several advantages over transgenic expression of recombinant proteins. First, because the epitope-tag sequences are knocked into endogenous loci by homologous recombination, transcriptional regulation by native promoters and enhancers is maintained. Second, the tagging method obviates the need to clone tagged full-length cDNAs, which can be particularly challenging for large transcripts. Third, because the 3×Flag tag serves as a universal epitope for multiple applications, detection methods can be standardized. The targeted knock-in method is also fast and inexpensive. Additionally, we have modified the vector so that the targeting construct can be assembled in one step with very high efficiency (>80%) (Supplementary Fig. 5a–c online) [10]. With the modified vector, an epitope-tagged protein can be generated in approximately half the time required to produce a polyclonal antibody, and that comparison assumes the resulting antibody would even be suitable for ChIP (Supplementary Fig. 5d). In the future, as high-throughput gene-targeting strategies evolve, it should be possible to tag virtually every transcription factor in the same or different cell types. Combining this effort with genome-wide ChIP-chip or ChIP-Seq [11] analyses could facilitate the characterization of transcription factor–DNA interaction networks in mammalian cells, similar to studies currently feasible only in yeast [12].


We thank D. Sedwick for helpful discussions, J. Yu for technical assistance, and P. Harte and G. Crawford for critically reading this manuscript. This work was supported by grants from the US National Institutes of Health (CA127590, U54CA116867), the Concern Foundation and the V Foundation to Z. Wang, and by National Institutes of Health grants KCA103843A and RHD056369A to P.C.S.

Supplementary Material



1. Zhang X, et al. Proc. Natl. Acad. Sci. USA. 2007;104:4060–4064.
2. Cherry SM, et al. Curr. Biol. 2007;17:373–378.
3. The ENCODE Project Consortium. Science. 2004;306:636–640.
4. Scacheri PC, Crawford GE, Davis S. Methods Enzymol. 2006;411:270–282.
5. Scacheri PC, et al. PLoS Genet. 2006;2:e51.
6. Frith MC, et al. Nucleic Acids Res. 2004;32:1372–1381.
7. Wingender E, Dietze P, Karas H, Knuppel R. Nucleic Acids Res. 1996;24:238–241.
8. Ginsberg M, et al. Mol. Cell. Biol. 2007;27:6300–6308.
9. Waris G, Siddiqui A. J. Virol. 2002;76:2721–2729.
10. Bitinaite J, et al. Nucleic Acids Res. 2007;35:1992–2002.
11. Johnson DS, Mortazavi A, Myers RM, Wold B. Science. 2007;316:1497–1502.
12. Lee TI, et al. Science. 2002;298:799–804.