The interpretation of genome sequences requires reliable and standardized methods to assess protein function at high throughput. Here we describe a fast and reliable pipeline to study protein function in mammalian cells based on protein tagging in bacterial artificial chromosomes (BACs). The large size of the BAC transgenes ensures the presence of most, if not all, regulatory elements and results in expression that closely matches that of the endogenous gene. We show that BAC transgenes can be rapidly and reliably generated using 96-well-format recombineering. After stable transfection of these transgenes into human tissue culture cells or mouse embryonic stem cells, the localization, protein-protein and/or protein-DNA interactions of the tagged protein are studied using generic, tag-based assays. The same high-throughput approach will be generally applicable to other model systems.
At a time when the ‘thousand-dollar genome’ seems a realistic goal for the near future, methods for dissecting the functions of the encoded genetic information lag far behind genome sequencing, both in throughput and in quality of the produced data. Genome sequencing and subsequent bioinformatics analysis have made it possible to study the function of genes in mammalian tissue culture cells using systematic reverse-genetic approaches1-3 and have radically improved researchers’ ability to identify human disease genes. Such studies typically identify single genes, whose biological function has often not yet been described. In order to place the proteins these genes encode in pathways, these studies must be followed by detailed molecular-level analysis, of which the most powerful types are protein localization and protein-protein interaction. The power of protein localization and protein-protein interaction studies can be seen from the genome-wide application of GFP localization and tandem affinity tag-based complex purification in the yeast Saccharomyces cerevisiae, which has produced a comprehensive picture of the core proteome of a simple, well-studied model system4-8. The key advantage of yeast for these studies was its efficient intrinsic homologous recombination, which allowed the same tag-coding sequence to be introduced at the endogenous locus of nearly every gene of the genome. The tagged proteins were then systematically analyzed through standardized, generic, tag-based assays.
To transfer this approach to mammalian cells, we require methods that produce data about localization and binding partners on a genome-wide scale. Any such method should satisfy at least two important criteria. First, it must provide reliable and reproducible expression of the tagged protein at levels and patterns matching those of the endogenous counterpart. Second, it must be an efficient and scalable procedure, suitable for high-throughput use.
Protein tagging in mammalian tissue culture cells is typically performed with cDNA-based transgenes that lack the normal endogenous noncoding regulatory information such as introns or 3′ untranslated regions (UTRs) and are usually driven by unrelated ubiquitous or tissue-specific promoters. As a result, they do not reproduce the endogenous regulation at the transcriptional and post-transcriptional levels. Furthermore, the generation of sequence-verified, full-length cDNAs is expensive and laborious, especially for large or rare transcripts. Consequently, comprehensive cDNA libraries are not available for most model organisms.
Transgenes based on large genomic clones such as bacterial artificial chromosomes (BACs) are often large enough to contain complete genes with all their endogenous regulatory sequences. Mapped BAC libraries are typically generated as part of genome sequencing projects and are readily available for most model organisms. The development of homologous recombination–based DNA engineering methods, commonly referred to as recombineering9,10, in Escherichia coli has enabled rapid and robust modification of these large constructs. We and others have previously described the successful use of recombineering to generate BAC transgenes and their use for expression and/or purification studies in mammalian tissue culture cells11,12, worms13, flies14, zebrafish15 and mice16,17. Recently, we have demonstrated that the fidelity of recombineering in E. coli is high enough to permit multiple DNA engineering steps to be carried out in liquid culture, thereby opening up a way for high-throughput application of this approach13.
Here we establish an efficient, generic and scalable approach for BAC-based transgenesis in mammalian tissue culture cells, which we term ‘BAC TransgeneOmics’. We describe high-throughput production of BAC transgenes using a robust procedure for 96-well-format recombineering and establish protocols for efficient, stable transfection of these large constructs. We demonstrate the versatility of this approach for the analysis of protein localization and protein-protein interactions and the mapping of DNA-binding sites of proteins.
The general outline of our approach is shown in the flowchart of Figure 1a. First, we selected a suitable BAC clone containing the gene of interest and tagged it by recombineering in E. coli9,10,13,18. We then stably transfected the purified BAC transgene into cultured mammalian cells11,12 and performed protein localization, protein complex purification or chromatin immunopurification experiments either on transfected pools or after isolation of clones derived from single cells. The success rate at each step of the generation of transgenic cell lines was more than 80% (Fig. 1b), enabling the high-throughput application of this approach.
As a generic protein tag, we selected a modified version of the ‘localization and affinity purification’ (LAP) tag19. This tag consists of (in the order used for purification) enhanced green fluorescent protein (EGFP) for localization and immunopurification, the PreScission protease cleavage site for native elution, S-peptide for a second affinity-purification step and a tobacco etch virus (TEV) protease cleavage site for a second native elution step. We constructed recombineering cassettes for tagging at either the N or C terminus of the protein. The N-terminal cassette (Fig. 2a) has a dual eukaryotic-prokaryotic promoter (PGK-gb2)20 driving a neomycin-kanamycin resistance gene within an artificial intron inside the tag coding sequence. The selection cassette is flanked by two loxP sites and can be permanently removed by Cre recombinase-mediated excision. The C-terminal cassette (Fig. 2b) contains the sequence encoding the tag followed by an internal ribosome entry site (IRES) in front of the neomycin resistance gene. In addition, a short bacterial promoter (gb3) drives the expression of the neomycin-kanamycin resistance gene in E. coli.
To facilitate high-throughput production of the transgenic constructs, we developed a program (BACFinder) that automatically selects the most suitable BAC clone for any given mouse or human gene and generates the sets of PCR primers required for tagging and verification. An added advantage of cross-species BACs (for example, a mouse transgene in human cells) is that they facilitate functional validation of the tagged transgene by specific RNAi knockdown of the endogenous gene product11,13.
We inserted the tagging cassettes, containing 50 nucleotides of PCR-introduced homology arms, into the BAC by recombineering, either behind the start codon (for the N-terminal tag) or in front of the stop codon (for the C-terminal tag) of the gene. All steps of transgene production were carried out in 96-well-plate format (Fig. 2). The E. coli cells that had successfully recombined the cassette were selected for kanamycin resistance in liquid culture. In the test experiment (Fig. 3a), about 90% of the reactions (88 of 96) survived the selection. In the control experiment (Fig. 3b), in which the transformation order was shifted so that the cassette and the BAC did not match, none of the clones grew under selection, indicating that the resistant cells are derived only from the specific recombineering reaction. By plating on selective agar, we determined that each saturated culture was derived from 10-200 independent recombination events (data not shown). We checked two independent clones for each reaction by PCR through the tag insertion point. Of the 88 BACs that grew in selective media, 85 (97%) yielded a PCR product of the expected size (see Supplementary Fig. 1 online).
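The primer layout implied by this step can be sketched in a few lines. This is only an illustration under stated assumptions, not the BACFinder algorithm, which is not described here: the function names, the `cassette_fwd` and `cassette_rev` priming sequences and the 0-based coordinate convention are all hypothetical. Each primer carries a 50-nucleotide homology arm copied from the genomic sequence flanking the insertion point, followed by a short sequence that primes on the tagging cassette.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_tagging_primers(genomic, insert_pos, cassette_fwd, cassette_rev, arm=50):
    """Build forward and reverse primers for inserting a cassette by recombineering.

    genomic      -- genomic sequence around the gene, 5'->3' on the coding strand
    insert_pos   -- 0-based index; the cassette goes between genomic[insert_pos - 1]
                    and genomic[insert_pos]. For an N-terminal tag this is the index
                    just after the start codon; for a C-terminal tag it is the index
                    of the first base of the stop codon.
    cassette_fwd -- hypothetical priming sequence at the 5' end of the cassette
    cassette_rev -- hypothetical priming sequence for the 3' end of the cassette,
                    already written 5'->3' as it appears on the reverse primer
    arm          -- homology arm length in nucleotides (50 in the text)
    """
    if insert_pos < arm or insert_pos + arm > len(genomic):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genomic[insert_pos - arm:insert_pos]
    right_arm = genomic[insert_pos:insert_pos + arm]
    forward = left_arm + cassette_fwd
    # The reverse primer anneals to the opposite strand, so its arm is the
    # reverse complement of the sequence downstream of the insertion point.
    reverse = reverse_complement(right_arm) + cassette_rev
    return forward, reverse
```

For an N-terminal tag, `insert_pos` would be the position of the start codon plus three; for a C-terminal tag, the position of the stop codon itself, so that the tag is read through before translation terminates.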
A PCR check of the original, unmodified BACs showed that in most of the clones that failed to grow in selection, the targeted genomic region was missing (see Supplementary Fig. 1). This correlates well with the estimated 10% of chimeric, rearranged or wrongly mapped clones in the BACs used.
In a test of high-throughput transfection, 86% (67 of 78) of the transfected BACs gave antibiotic-resistant clones. For 90% (60 of 67) of the analyzed cell lines, we detected a distinct band on a western blot, indicating that almost all of the transgenic cells were expressing the transgene (Fig. 3c and Supplementary Table 1 online). In some cases, we observed additional specific bands that might have been caused by protein degradation or by endogenous variations resulting from alternative splicing, specific proteolytic processing in vivo or post-translational modifications.
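The per-step success rates quoted above follow directly from the reported counts, and they can be combined into a rough end-to-end estimate. The sketch below uses only the counts given in the text; treating the steps as independent and multiplying their rates is our simplifying assumption for illustration, because the recombineering and transfection tests were run on different sets of BACs.

```python
# Counts reported in the text: (step description, successes, attempts).
STEPS = [
    ("recombineering: cultures surviving selection", 88, 96),
    ("PCR check: product of expected size", 85, 88),
    ("transfection: antibiotic-resistant clones", 67, 78),
    ("western blot: distinct band detected", 60, 67),
]


def step_rates(steps):
    """Return (description, success rate) for each pipeline step."""
    return [(name, successes / attempts) for name, successes, attempts in steps]


def cumulative_rate(steps):
    """Multiply the per-step rates, assuming the steps are independent."""
    rate = 1.0
    for _, successes, attempts in steps:
        rate *= successes / attempts
    return rate


if __name__ == "__main__":
    for name, rate in step_rates(STEPS):
        print(f"{name}: {rate:.1%}")
    print(f"estimated end-to-end success: {cumulative_rate(STEPS):.1%}")
```

Under that independence assumption, the four steps together give an end-to-end success rate of roughly 68%, with every individual step above 80%.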
The presence of endogenous transcriptional control elements in the transgene should translate into physiological expression levels. However, the integration site and potentially the number of integrated copies will be different in each independent cell line. To assess the copy number of the BAC transgenes in stably transfected cell lines, we used fluorescence in situ hybridization to locate the BAC clone in the nuclei of two transgenic cell lines (Supplementary Fig. 2 online). In both cases we detected the transgene at a single nuclear locus. A fluorescence in situ hybridization probe specific for the endogenous gene locus produced signal of similar size and intensity, indicating that the transgene has integrated in low or single copy number. The resolution of the method does not, however, allow us to precisely quantify the copy number of the transgene.
BAC transfection typically results in a pool of several independent lines. Even at the single-copy level, two independently generated lines might have different expression levels due to position effects and transgene fragmentation before integration. Although single cell–derived clones can be isolated if necessary, this step is very time consuming. As long as transgene expression does not differ substantially between the individual lines, it is preferable to perform the initial analyses directly on the clone pool. To evaluate the expression variation within the clone pools, we determined the percentage of cells expressing GFP in a pool of cells that was obtained after selection for G418 resistance. For 14 of 15 pools, more than 60% of the cells were GFP positive (Supplementary Table 2 online). To look into cell-to-cell variations in expression, we used fluorescence-activated cell sorting to analyze the GFP fluorescence intensity distribution within a pool of HeLa cells stably transfected with a mouse transgene consisting of the chromosome protein HP1β tagged with LAP. We found that the majority of the cells in the pool (59% or 88% for mouse HP1β tagged with LAP at the N or C terminus, respectively) expressed GFP at the same level (Supplementary Fig. 3 online). We also compared the relative expression level of mouse AURKB-LAP in the pool with five clonal cell lines by western blotting, which showed similar expression levels for the tagged and endogenous protein in the pool and clonal cell lines (Supplementary Fig. 3).
Our findings indicate that transgene expression levels do not vary significantly between independent BAC transgenic lines and clone pools, and that the pools can therefore be used directly for downstream analyses. However, we note that this observation may not hold for every transgene and will be gene dependent. In such cases, single cell–derived lines can be generated and screened to identify an appropriate clone for further studies.
We selected 15 well-characterized genes for tagging and detection and reproduced the known localization patterns for 11 of them (Fig. 4a; Supplementary Fig. 3 and Supplementary Table 2). As expected, proteins that are subunits of a complex, such as AURKB and INCENP, localized to the same cell compartments.
To check whether the tag position influences the endogenous protein localization, we analyzed transgene expression for three proteins (HP1β, AURKB and Rab5C) tagged at both the N and C termini (Supplementary Fig. 3). LAP-tagged HP1β was readily detectable at its proper location in the nucleus and showed the expected dynamics through the cell cycle21,22 when tagged with either an N- or a C-terminal tag. In contrast, AURKB showed physiological localization dynamics through the cell cycle23 only when tagged at its C terminus, and Rab5C was found at its proper localization at endosomes only when tagged at its N terminus (Fig. 4a)24. These examples confirm that tagging of both termini is advisable, especially when the localization of the endogenous protein is unknown25.
To assess the utility of the LAP tag for affinity purification at endogenous expression levels, we purified the proteins encoded by the same 15 genes as described above. Because most of these proteins are known to form complexes during mitosis, we arrested cells in prometaphase using nocodazole. We copurified LAP-tagged bait proteins and endogenous prey proteins from cell extracts by two-step affinity purification. First, the bait protein was pulled down with antibody to GFP (see Supplementary Methods online). Next, the recovered protein was specifically eluted by PreScission protease cleavage. The S-peptide part of the tag was then used for a second affinity-purification step. The isolated complexes were further analyzed by SDS-PAGE and silver staining to assess their purity and yield, followed by direct liquid chromatography–linked LTQ Fourier transform mass spectrometry (LC-MS/MS). We then identified protein interaction partners by database mining. Using 6 × 10⁷ cells cultured as monolayers, we were able to recover the baits for 15 of the 15 tested proteins (Supplementary Table 1). In a more wide-ranging selection of samples, we typically recover the bait in about 90% of cases (data not shown). Most of the immunoprecipitates showed a distinctive pattern of bands on a silver-stained SDS-PAGE gel (Fig. 4b), and copurifying proteins previously known to interact with the bait proteins were identified (Supplementary Table 1). For example, the dynein, anaphase promoting complex/cyclosome (APC/C), cohesin and γ-tubulin complexes were purified in their entirety with this technique. For the APC/C complex, this is the first report of the isolation of the entire complex in mammalian cells by tag-based affinity purification.
The determination of interaction sites for DNA binding proteins is another application that can greatly benefit from the BAC TransgeneOmics approach. Methods for genome-wide mapping of DNA binding sites based on chromatin immunopurification (ChIP) coupled with microarray analysis (ChIP-chip)26,27 or sequencing (ChIP-Seq)28-31 are available, but they usually rely on antibodies.
We evaluated the performance of the LAP cassette for ChIP using LAP-tagged transgenic cell lines for the human transcription factors forkhead box A1 (FOXA1), spliced X-box binding protein 1 (XBP1-S) and vitamin D receptor (VDR) in the human breast cancer cell line MCF7. All three lines showed expression patterns consistent with the physiological localization of the endogenous transcription factors (Supplementary Fig. 4 online). Furthermore, the interaction of VDR-LAP with a known VDR target site in MCF732 was dependent on the presence of its endogenous ligand vitamin D3 (Supplementary Fig. 5 online). Genome-scale ChIP-chip analysis (Supplementary Table 3 online) of VDR-LAP identified binding sites for putative target genes (Fig. 5a) that were highly enriched within 1 kb of transcription start sites (Fig. 5b) and may therefore indicate promoter regions.
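The enrichment of binding sites near transcription start sites can be expressed as a simple distance calculation. The following is a minimal sketch under our own assumptions, not the analysis used for Figure 5b: it handles a single chromosome, represents each binding site by one midpoint coordinate, and uses hypothetical function names. Only the 1-kb window is taken from the text.

```python
from bisect import bisect_left


def distance_to_nearest_tss(position, sorted_tss):
    """Distance in bp from a position to the closest TSS in a sorted list."""
    if not sorted_tss:
        raise ValueError("no transcription start sites supplied")
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be nearest.
    neighbours = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - tss) for tss in neighbours)


def fraction_near_tss(site_midpoints, tss_positions, window=1000):
    """Fraction of binding sites whose midpoint lies within `window` bp of a TSS."""
    if not site_midpoints:
        return 0.0
    sorted_tss = sorted(tss_positions)
    near = sum(
        1 for pos in site_midpoints
        if distance_to_nearest_tss(pos, sorted_tss) <= window
    )
    return near / len(site_midpoints)
```

A genome-wide version would repeat the calculation per chromosome and compare the result with the fraction expected for randomly placed sites.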
Recently, binding sites of FOXA1 and XBP1-S have been analyzed by ChIP33,34, and thus ChIP-grade antibodies against the endogenous proteins were available for a comparative analysis with our approach. We performed parallel ChIP-chip analyses with protein-specific antibodies (using the wild-type cell line) or a goat polyclonal antibody to EGFP directed against the LAP tag (using the transgenic lines). This comparison revealed similar binding profiles from the two approaches (Fig. 5c,d), with 87% (FOXA1) and 76% (XBP1-S) overlap of the identified binding sites (Supplementary Table 3).
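One way to quantify such an overlap is to count the sites from one experiment that intersect at least one site from the other. The sketch below is an illustration only: how overlap was defined for Supplementary Table 3 is not stated here, so the half-open interval convention, the requirement of at least one shared base pair and the function name are our assumptions.

```python
from collections import defaultdict


def overlap_fraction(sites_a, sites_b):
    """Fraction of sites in `sites_a` that overlap at least one site in `sites_b`.

    Each site is a (chromosome, start, end) tuple with a half-open interval
    [start, end); two sites overlap if they share at least one base pair.
    """
    if not sites_a:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in sites_b:
        by_chrom[chrom].append((start, end))
    shared = 0
    for chrom, start, end in sites_a:
        candidates = by_chrom.get(chrom, ())
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            shared += 1
    return shared / len(sites_a)
```

Note that the measure is not symmetric: the fraction of tag-based sites confirmed by the protein-specific antibody can differ from the fraction of antibody-based sites confirmed by the tag, so the direction of the comparison should be stated alongside the percentage.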
These results showed that tag-based ChIP analysis produces results comparable to those of the conventional antibody-based approach.
Although BAC tagging provides a convenient, quick way to assess protein function in mammalian cell culture, it is often desirable to study the role of a given protein during development and in adult organisms. With minor modifications, our BAC transgenesis protocol is applicable to embryonic stem (ES) cells. The example in Figure 6a,b shows a BAC-transgenic ES cell line for GFP-tagged PCNA. As in the BAC-transgenic HeLa cells (Fig. 4a), PCNA-LAP showed dispersed nuclear localization in G1, consistent with the role of PCNA during DNA synthesis. The pluripotent ES cells can develop into any cell of the body, and they can be differentiated into many different cell types in vitro35-37. We generated ES-derived transgenic mice using laser-assisted eight-cell embryo injection technology38. At 13.5 days post coitum (d.p.c.), three out of the ten embryos showed fluorescence throughout the embryo, indicating that the transgenic ES cells had efficiently contributed to all cell lineages (Fig. 6c). Mouse embryonic fibroblasts derived from one of the transgenic embryos reproduced the expression pattern observed in the transgenic HeLa and ES cells (Fig. 6d,e). These findings indicate that the BAC transgenes can be used, through ES cells, to study protein function in transgenic mice and to derive transgenic primary cells.
The analysis of protein localization and interaction, when complemented with phenotypic data from genome-scale loss-of-function screens, is often sufficient to unravel the molecular role of a protein of interest. Furthermore, the systematic analysis of protein localization and interaction data on a proteome scale would enable the assignment of putative functions for many proteins that do not produce detectable phenotypes in loss-of-function studies. Although these studies are feasible with the established antibody-based approaches, raising and testing specific antibodies for large sets of proteins can be very expensive and time consuming. In contrast, generic tag-based approaches are much more suitable for genome-scale application, as highlighted by many recent yeast studies4,7,8.
The approach that we present here is the first solution for protein tagging in mammalian tissue culture cells that is comparable with that in yeast, in terms of both throughput and quality of the obtained data. The use of BACs for transgenesis enables the expression of the transgene from its native genomic environment, which includes most, if not all, regulatory elements, a situation closely resembling that of endogenous gene targeting. The method is applicable to very large genes, which are difficult to obtain as cDNAs. Unlike cDNA-derived transgenes, a single BAC transgene covers all alternative splice variants, except in the case of an alternative first (for N-terminal tagging) or last (for C-terminal tagging) exon. In most cases the addition of an exogenous tag has no obvious effect on protein function. There are cases, however, when the tag would not be tolerated, for example because of the presence of an important functional domain close to the tag insertion point. In such cases tagging of the opposite protein terminus is recommended. The tag combination that we use performs very well for protein localization, tandem affinity purification and ChIP. However, for further applications, tagging cassettes with different affinity epitopes and/or fluorescent proteins can be easily generated.
Notably, the BAC tagging approach can be applied, through ES cells, to protein function exploration within the multicellular context of the developing mouse embryo or adult mice. Furthermore, transgenic mice are a source of transgenic primary cells that are typically difficult to generate or maintain by other methods, and thus this method provides a reliable approach for protein tagging in many primary cell types.
The format of the BAC transgenic pipeline permits automation of most experimental steps. Although we have used standard cell culture techniques in the current study, automated cell culture systems are available from several commercial suppliers. Automated imaging and image analysis are rapidly evolving and already permit sophisticated localization-based screens39,40. With the increasing sensitivity of mass spectrometry, protein interaction partners can be rapidly identified upon affinity purification from a relatively small number of cells, and affinity purification and mass spectrometry might soon be feasible in 96-well format. All these approaches will greatly benefit from the method we describe here.
Perhaps the most important step forward with this approach is that it can be easily transferred to any model system that permits stable transgenesis. The scalable format and highly efficient 96-well recombineering approach open up the possibility for rapid generation of comprehensive tagged BAC transgene resources with genome-wide coverage, for which we propose the term ‘TransgeneOmes’. The method described has a success rate of 80-90% at each step of the pipeline, which enables the generation of hundreds of transgenic constructs and tens of transformed cell lines per person per month.
We are grateful to J. Ellenberg and Z. Maliga for stimulating discussions, to K. Neugebauer for the help in establishing the ChIP protocol, and to O. Hudecz, C. Stingl and G. Mitulovic (Institute of Molecular Pathology) and A. Ssykor, M. Biesold, D. Richter, K. Kozak and D. Drechsel (Max Planck Institute for Molecular Cell Biology and Genetics) for excellent assistance. We thank I. Cheesman for helpful discussions. This work has been supported by the 6th Framework Program of the European Union, Integrated Project ‘MitoCheck’ (LSHG-CT-2004-503464), and by NGFN2 grant SMP-RNAi (01GR0402). Work in the laboratories of J.-M.P. and K.M. is supported by Boehringer Ingelheim, the GenAu Program, the Austrian Research Promotion Agency (FFG), the European Science Foundation and the Austrian Science Fund (FWF) via the EuroDynaProgram. A.F.S. received funding from the 6th Framework Program of the European Union, Integrated Project ‘Heroic’ (LSHG-CT-2005-018883). K.P.W. is supported by grant 1R01HG004428-01 from the National Human Genome Research Institute of the US National Institutes of Health. R.K. is supported by a long-term fellowship of the Human Frontier Science Program Organization. Y.T. was supported by the Uehara Memorial Foundation.
Description of the antibodies, plasmids, strains and cell lines used, as well as detailed protocols for 96-well recombineering, BAC transfection, LAP-based localization, purification and ChIP-chip mapping of DNA binding sites, can be found in the Supplementary Methods.
The BACFinder clone search and oligo design tool is available online at http://www.mitocheck.org/cgi-bin/BACfinder.
Database accession codes.
The ChIP-chip data have been submitted to the Gene Expression Omnibus database with accession number GSE10845.
COMPETING INTERESTS STATEMENT
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturemethods/.