Genetic trapping experiments have a long-standing history in Drosophila functional genomics. The classic "enhancer trap" screens utilised P-element-mediated insertion of the E. coli lacZ gene to facilitate relatively unbiased discovery of tissue-specifically expressed genes and enhancers at a genome-wide level [1]. In recent years, additional transposable elements such as piggyBac or Minos have found their way into research applications. Together, this set of transposons provides excellent tools for genome-wide genetic screens. While P-elements and piggyBac show some positional preference for insertion at the 5' regions of target genes, and thus give biased coverage of the genome [4], Minos is reported to show a much more random genomic distribution of insertions [5] and may be better suited to genome-wide screens.
The introduction of the green fluorescent protein (GFP) in research applications opened new avenues for enhancer trapping, since the expression of a fluorescent reporter can be observed directly and detection does not require an enzymatic reaction [7]. A more recent advance in the gene trapping approach is the "protein trap" screen, which aims to create GFP-tagged versions of endogenous proteins under the control of the genes' native regulatory sequences [8]. As is evident from the comprehensive recombination-based efforts in yeast, tagging endogenous proteins in vivo can have tremendous utility for genomics, cell biology and systems biology studies [12]. Protein tagging is likely to be of even greater utility in metazoans, where many different cell types are present.
Protein trap screens utilise artificial reporter-encoding exons to generate fluorescent fusion proteins by random integration into the genome. The reporter is usually a GFP variant, flanked by splice acceptor/donor sites, and carried within a transposable element vector. Integration of the transposon within an intron in the correct orientation results in the transcription and subsequent splicing of the trapping exon into the mature mRNA of the targeted gene. If the trapping exon is in frame with the coding sequence of the host protein, a functional GFP-tagged version of the protein may be produced. A comprehensive screen therefore requires vectors carrying targeting exons in all three reading frames, but even when multiple vectors are employed, the isolation of bona fide protein traps is a relatively rare event.
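The frame requirement described above can be illustrated with a minimal sketch. It assumes the usual intron-phase convention (phase 0, 1 or 2 is the number of bases of a split codon lying upstream of the intron). The `TrapVector` class, the vector names and the 720-nucleotide exon length are hypothetical and chosen only for illustration; they do not describe the constructs used in the published screens, and the model ignores splicing efficiency, expression and protein folding.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrapVector:
    """Hypothetical description of a protein trap construct."""
    name: str
    phase: int        # intron phase the artificial exon is built for (0, 1 or 2)
    exon_length: int  # length of the artificial exon in nucleotides


def is_in_frame(intron_phase: int, same_orientation: bool, vector: TrapVector) -> bool:
    """Return True if the trapping exon can splice in without a frameshift.

    Simplified model: the insertion must be in the gene's orientation, the
    exon must be built for the intron's phase, and the exon length must be a
    multiple of three so that downstream exons stay in frame.
    """
    if intron_phase not in (0, 1, 2):
        raise ValueError("intron phase must be 0, 1 or 2")
    return (
        same_orientation
        and vector.phase == intron_phase
        and vector.exon_length % 3 == 0
    )


# Illustrative vector set covering all three reading frames (values assumed).
VECTORS = [TrapVector(f"trap-phase{p}", phase=p, exon_length=720) for p in (0, 1, 2)]


def compatible_vectors(intron_phase: int, same_orientation: bool = True) -> List[str]:
    """Names of the vectors that would yield an in-frame fusion."""
    return [v.name for v in VECTORS if is_in_frame(intron_phase, same_orientation, v)]


if __name__ == "__main__":
    for phase in (0, 1, 2):
        print(phase, compatible_vectors(phase))
```

In this simplified picture, exactly one of the three vectors matches any given intron, and an insertion in the opposite orientation matches none, which is one reason in-frame traps are expected to be a minority of all insertions.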
Following the pioneering work of Morin et al. [8], the results from a variety of Drosophila protein trap screens have been published [8]. In total, these studies have screened close to 80 million individual embryos or larvae, with only a small fraction of these generating lines that express the GFP tag. The FlyTrap database [13] reports 1,522 fluorescent lines generated from the three largest protein trap screens performed to date [8]. Mapping of their insertion coordinates to the Release 5.3 gene annotation shows that these represent 271 unique genes tagged with protein traps located within introns separating coding exons. With the fly genome containing approximately 14,000 protein-coding genes, the screens have hit fewer than 2% of known fly genes. This is far from genome-wide coverage, clearly a desirable goal for comprehensive functional genomics studies. The restricted success of protein trap screens is especially surprising given that approximately 11,600 Drosophila genes contain introns. Thus, in principle, approximately 80% of fly genes are accessible to protein trapping. Interestingly, although the overall number of unique lines generated in the different screens is relatively small, there is considerable overlap in the tagged genes recovered by the individual screens.
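The coverage figures in this paragraph follow from simple ratios, which the short sketch below reproduces. The three counts are taken from the text (the gene totals are approximate); the variable and function names are introduced here for illustration only.

```python
# Figures quoted in the text above (the gene totals are approximate).
TAGGED_GENES = 271
PROTEIN_CODING_GENES = 14_000
INTRON_CONTAINING_GENES = 11_600


def fraction(part: int, whole: int) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Share of all protein-coding genes hit by existing protein traps.
coverage_of_genome = fraction(TAGGED_GENES, PROTEIN_CODING_GENES)
# Share of genes that contain introns and are therefore accessible in principle.
accessible_share = fraction(INTRON_CONTAINING_GENES, PROTEIN_CODING_GENES)
# Share of the accessible genes that have actually been tagged.
coverage_of_accessible = fraction(TAGGED_GENES, INTRON_CONTAINING_GENES)

if __name__ == "__main__":
    print(f"tagged / protein-coding genes:     {coverage_of_genome:.1%}")
    print(f"intron-containing / protein-coding: {accessible_share:.1%}")
    print(f"tagged / intron-containing genes:   {coverage_of_accessible:.1%}")
```

The first two ratios correspond to the roughly 2% and 80% figures quoted above. The third, the tagged share of intron-containing genes, is not stated in the text and is included only to show how small the recovered set is relative to the accessible one.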
The low efficiency and high degree of overlap between the published screens suggest that there are limitations to the protein trap strategies currently in use. Here we attempt to identify and quantify these limitations, and suggest future strategies that may increase the repertoire of trapped genes. We considered a number of potential factors that could bias protein trap screens, including transposable element integration hotspots, gene architecture, gene expression and protein structure. We constructed a probability model that we used to predict a set of target genes with a high likelihood of successfully receiving a protein trap insertion. Our model predicts that approximately 800 of the genes encoded in the fly genome are permissive for P-element-based protein trapping and, of these, 264 genes have already been tagged in previous studies (with P-element or piggyBac, in previously published and novel screens). A similar analysis based on data from a more limited set of protein traps generated with piggyBac vectors estimates that approximately 3,100 genes are permissive targets with this transposon, and about 2,800 of these have not yet been tagged in previous studies. Comparing the predictions for both transposons, we find that most potential, as yet untagged P-element targets are also good piggyBac targets (448 out of 536 potential P-element targets). Given the apparent importance of transposon insertion bias, it is likely that a transposable element such as Minos, which exhibits a more random insertion preference [5], may be a better vector for future random protein trap screens. Ultimately, it is likely that recombination- or recombineering-based gene targeting techniques will need to be employed to achieve comprehensive coverage of the fly genome.
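The target counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text and treats the "approximately 800" figure as exact for the purpose of the check; the variable names are introduced here for illustration.

```python
# Model predictions quoted in the text above.
P_PERMISSIVE = 800              # approximate number of P-element-permissive genes
P_ALREADY_TAGGED = 264          # of these, already tagged in previous studies
P_UNTAGGED_ALSO_PIGGYBAC = 448  # untagged P-element targets that are also piggyBac targets

# Untagged P-element targets: permissive genes minus those already tagged.
p_untagged = P_PERMISSIVE - P_ALREADY_TAGGED

# Untagged targets predicted for P-element but not for piggyBac.
p_only = p_untagged - P_UNTAGGED_ALSO_PIGGYBAC

# Fraction of untagged P-element targets shared with piggyBac.
shared_fraction = P_UNTAGGED_ALSO_PIGGYBAC / p_untagged

if __name__ == "__main__":
    print(f"untagged P-element targets: {p_untagged}")
    print(f"P-element-only targets:     {p_only}")
    print(f"shared with piggyBac:       {shared_fraction:.1%}")
```

The difference of 536 matches the denominator given in the text, and the shared fraction of more than four fifths is what supports the statement that most untagged P-element targets are also good piggyBac targets.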