We constructed Drosophila melanogaster BAC libraries with 21-kb and 83-kb inserts in the P(acman) system. Clones representing 12-fold coverage and encompassing more than 95% of annotated genes were mapped onto the reference genome. These clones can be integrated into predetermined attP sites in the genome using ΦC31 integrase to rescue mutations. They can be modified through recombineering, for example to incorporate protein tags and assess expression patterns.
Genetic model systems such as Drosophila melanogaster are powerful tools for investigating developmental and cell biological processes, properties of inheritance, the molecular underpinnings of behavior, and the molecular bases of disease 1. The approaches used in model systems rely on the identification of mutations in genes and the characterization of the gene products, often aided by transgenesis techniques 2.
We recently developed a new transgenesis platform for D. melanogaster, the P(acman) (P/ΦC31 artificial chromosome for manipulation) system, that allows modification of cloned fragments by recombineering and germ-line transformation of genomic DNA fragments up to 133 kb in length 3. P(acman) combines a conditionally amplifiable BAC 4, the ability to use recombineering in E. coli for retrieval and manipulation of DNA inserts 5, and bacteriophage ΦC31 integrase-mediated germ-line transformation into the D. melanogaster genome 6,7. Clones are maintained at low-copy number to improve plasmid stability and facilitate recombineering, but can be induced to high-copy number for plasmid isolation to facilitate microinjection of embryos. Recombineering can be used to insert protein tags for in vivo protein localization or acute protein inactivation 8, and to create deletions 9 and point mutations 5 for structure/function analysis. ΦC31-mediated transgenesis integrates DNA constructs at specific pre-determined attP sites dispersed throughout the genome 3,6,7,10, eliminating the need to map integration events and reducing variability in expression due to position effects 10. The technique allows rescue of mutations in large genes 3 and facilitates comparative expression analysis of engineered DNA constructs 7,10–12. Previously, genomic regions of interest were cloned into P(acman) by gap-repair from available mapped BAC clones 3. Here, we describe a more efficient approach: we constructed two genomic BAC libraries in the P(acman) system and mapped the cloned inserts by alignment of paired end sequences to the reference genome sequence.
We engineered a novel P(acman) BAC vector for construction of genomic libraries (Fig. 1a). In addition to the published features 3, we included a polylinker embedded within a mutant α-lacZ fragment. It became apparent that in the low-copy-number condition necessary to ensure stability of large genomic inserts, standard α-lacZ fragments are expressed at insufficient levels to permit reliable blue-white colony screening. We isolated a mutant with significantly enhanced β-galactosidase activity resulting from a premature stop codon in the α-lacZ fragment (Supplementary Fig. 1) that permits blue-white selection for cloned inserts at low-copy number using an automated colony picking device.
To create a resource for manipulation and analysis of D. melanogaster genes, we constructed two P(acman) libraries with different insert sizes (Supplementary Fig. 2). For analysis of most genes, we used the library with an insert size of 20 kb. Ninety percent of protein-coding gene annotations in D. melanogaster are less than 12.1 kb in length, and a 20 kb insert size should provide sufficient flanking genomic sequence to contain most genes, including regulatory sequences required for normal expression. For analysis of large genes and gene complexes, we constructed a library with an insert size of 80 kb. High molecular weight genomic DNA was prepared from the D. melanogaster strain used to produce the reference genome sequence. The DNA was fragmented by partial restriction digestion, and size fractions in the 20 kb and 80 kb ranges were recovered and cloned separately to produce two genomic BAC libraries. The libraries produced from the 20 kb and 80 kb fractions were designated CHORI-322 and CHORI-321, respectively. We stocked 73,728 CHORI-322 clones and 36,864 CHORI-321 clones.
To map P(acman) BACs on the genome, paired end sequences were determined and aligned to the reference genome sequence. We mapped consistent paired ends of 33,314 CHORI-322 clones representing 4.3-fold coverage of the X chromosome and 5.9-fold coverage of the autosomes, and 12,328 CHORI-321 clones representing 8.2-fold coverage of the X chromosome and 9.3-fold coverage of the autosomes. The mapped paired end sequences show that the average insert sizes of the CHORI-322 and CHORI-321 libraries are 21.0 kb (± 4.0 kb) and 83.3 kb (± 21.5 kb), respectively. An additional 18,767 CHORI-322 clones and 11,571 CHORI-321 clones were partially mapped to the genome sequence by alignment of one end sequence only. The two libraries together represent deep coverage of the genome and span most annotated genes (Supplementary Table 1). The mapped CHORI-322 and CHORI-321 clones span 88.9% and 99.3% of annotated genes, respectively. P(acman) clones containing genes and genomic regions of interest can be identified through a web-accessible genome browser (http://pacmanfly.org/) (Fig. 1b) and are available for distribution from the BACPAC Resources Center (http://bacpac.chori.org/).
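The idea of deriving insert sizes from consistently mapped paired ends can be illustrated with a minimal sketch. It is not the mapping pipeline used for the libraries. The `EndHit` record, the consistency rule (same reference arm, end reads pointing toward each other, size below a cutoff) and the example coordinates are all assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import NamedTuple, Optional


class EndHit(NamedTuple):
    """Best alignment of one BAC end read (hypothetical record layout)."""
    chrom: str   # reference arm, e.g. "2L"
    pos: int     # reference coordinate of the clone end
    strand: str  # "+" if the read aligns to the forward strand, else "-"


def insert_size(a: EndHit, b: EndHit, max_size: int = 200_000) -> Optional[int]:
    """Return the implied insert size of a consistent pair, else None."""
    if a.chrom != b.chrom:
        return None
    left, right = sorted((a, b), key=lambda hit: hit.pos)
    # End reads of a single clone should point toward each other.
    if (left.strand, right.strand) != ("+", "-"):
        return None
    size = right.pos - left.pos + 1
    return size if size <= max_size else None


def summarize(pairs, max_size: int = 200_000):
    """Count consistent pairs and report mean and standard deviation of size."""
    sizes = [s for a, b in pairs
             if (s := insert_size(a, b, max_size)) is not None]
    return len(sizes), mean(sizes), stdev(sizes)


# Made-up coordinates, for illustration only.
example_pairs = [
    (EndHit("2L", 100_000, "+"), EndHit("2L", 120_999, "-")),
    (EndHit("3R", 500_000, "-"), EndHit("3R", 481_001, "+")),
    (EndHit("X", 10_000, "+"), EndHit("X", 32_999, "-")),
    (EndHit("X", 10_000, "+"), EndHit("2R", 30_000, "-")),  # inconsistent
]
n_consistent, mean_size, sd_size = summarize(example_pairs)
```

In this sketch, clones with only one aligned end would never reach `insert_size`; they correspond to the partially mapped clones mentioned above and contribute no size estimate.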
We tested the P(acman) library resource for transformation efficiency using clones encompassing several genes. For each gene, we identified a clone containing substantial flanking sequences biased toward the 5’ end of the gene annotation. These clones are likely to include the regulatory sequences necessary for normal expression of the gene. For small genes (≤12 kb), a CHORI-322 clone was preferred over a CHORI-321 clone, as smaller clones tend to have higher transformation efficiencies 3. When a mapped CHORI-322 clone was not available for a small gene (e.g. hh, vas and shi) or sufficient 5’ regulatory sequence did not appear to be present in a mapped CHORI-322 clone (e.g. jar, lt and cta), we chose a CHORI-321 clone instead. In total, we selected 38 clones from the CHORI-322 library (Table 1) and 24 clones from the CHORI-321 library (Table 2). The largest clone, encompassing Hnf4, has an insert size of 105 kb. Each clone was isolated and tested for integration into a genomic attP docking site, either VK37 on chromosome arm 2L or VK33 on chromosome arm 3L 3, using ΦC31 integrase 6,7. The transformation efficiency of each clone was defined as the percentage of G0 fertile crosses that yielded at least one transgenic animal. We were successful in obtaining at least one transformant for all CHORI-322 clones (Table 1) and 13 of the 24 CHORI-321 clones (Table 2). In addition, 16 of 17 CHORI-322 clones used for recombineering-mediated tagging (see below) were successfully integrated (Supplementary Table 2). Moreover, 53 of 72 CHORI-321 clones have been integrated successfully in an independent experiment to generate a set of duplication lines, each carrying a clone from a tiling path of overlapping CHORI-321 clones spanning the entire X chromosome (Ellen Popodi and Thom Kaufman, personal communication). These data show that more than 98% (54/55) of CHORI-322 clones and at least 68% (66/96) of CHORI-321 clones can be successfully integrated. 
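The pooled success rates quoted above follow directly from the clone counts given in the text. The short calculation below reproduces that arithmetic. The function names are illustrative, and the G0 cross counts in the last line are hypothetical numbers used only to show the definition of transformation efficiency.

```python
def percent(successes: int, attempts: int) -> float:
    """Percentage of attempts that succeeded."""
    return 100.0 * successes / attempts


def transformation_efficiency(fertile_g0_crosses: int,
                              crosses_with_transgenics: int) -> float:
    """Percentage of fertile G0 crosses yielding at least one transgenic animal."""
    return percent(crosses_with_transgenics, fertile_g0_crosses)


def pooled(groups: dict) -> tuple:
    """Sum (integrated, tested) counts over groups; return totals and percentage."""
    integrated = sum(ok for ok, _ in groups.values())
    tested = sum(total for _, total in groups.values())
    return integrated, tested, percent(integrated, tested)


# (clones integrated, clones tested), as reported in the text.
chori322 = {"Table 1": (38, 38), "tagging set": (16, 17)}
chori321 = {"Table 2": (13, 24), "X chromosome tiling path": (53, 72)}

ok322, n322, pct322 = pooled(chori322)  # 54 of 55 clones
ok321, n321, pct321 = pooled(chori321)  # 66 of 96 clones

# Hypothetical single clone: 50 fertile G0 crosses, 5 yielding transgenics.
example_efficiency = transformation_efficiency(50, 5)
```

Note that the pooled figures count clones that gave at least one transformant, whereas transformation efficiency is a per-clone percentage of fertile G0 crosses; the two measures answer different questions.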
For all transformants, the presence of the expected DNA fragment sizes at the integration junctions (indicative of site-specific integration at the respective docking site) was confirmed by multiplex PCR that tests simultaneously for the presence of attP, attB, attR and attL sites (Supplementary Fig. 3).
The range of integration efficiencies observed is surprisingly broad. Efficiencies ranged from 0% to 28.1% for CHORI-322 clones and from 0% to 11.6% for CHORI-321 clones. The insert sizes of CHORI-322 clones are very similar to each other, so the observed range suggests that some fragments are less efficiently transformed than others due to sequence content or specific interference between certain fragments and docking sites (e.g. Csp and wg). Notably, the high efficiency observed for some CHORI-321 clones (e.g. CH321-16H04, CH321-64G01 and CH321-79N05) suggests that further optimization of the integration efficiency of large clones is possible.
We tested transgenic insertions of ten CHORI-322 and six CHORI-321 clones for their ability to complement lethal mutations in genes. All CHORI-322 clones tested, encompassing the genes CG6017, chc, dap160, drp1, endo, Eps15, n-syb, sqh, synj and vha100-1, rescue lethal mutations in the corresponding genes. To our knowledge, rescue of mutations in endo, n-syb and vha100-1 using genomic fragments has not been reported previously. Similarly, CHORI-321 clones encompassing the genes cac, Dscam, lt and shakB complement lethal mutations in the corresponding genes. Rescue of cac, lt and shakB using genomic fragments has also not been reported previously. Rescue of a lethal mutation in lt with a 92 kb genomic fragment inserted in euchromatin is surprising, because full expression of lt and several other heterochromatic genes has been shown to be dependent on their heterochromatic context 13. Only one of three clones tested complemented lt lethality, suggesting that essential regulatory elements or sufficient genomic context were absent in the other two clones.
To test the utility of recombineering in P(acman) BACs, we introduced EGFP reporter tags into 17 genes encoding transcription factors with well-documented embryonic expression patterns. We inserted the coding region of EGFP in-frame at the 3’ end of the open reading frame, replacing the stop codon and creating C-terminal protein fusions 14 (Supplementary Fig. 4). Both the untagged and tagged constructs were tested for integration using ΦC31 integrase (Supplementary Table 2). Eleven tagged constructs were tested for expression of the fusion protein. Since this EGFP does not fold efficiently in embryos prior to stage 15, we performed immunohistochemistry on embryos with an anti-GFP antibody (Fig. 2 and Supplementary Fig. 5a,b). EGFP fluorescence could be used to visualize fusion protein expression in live embryos only in the late stages of embryonic development (Supplementary Fig. 5c). The expression patterns of eve, D, cad, Dfd, tll, slp2 and exd are reproduced by the transgenic fusion constructs (Supplementary Discussion). The en and h gene expression patterns appeared to be exceptions (Fig. 2k–l). For h, only two stripes (1 and 5) of expression in the embryo were observed, instead of seven 15. Interestingly, enhancers for stripes 1 and 5 are located in the 7 kb region proximal to the transcription start site, whereas the regulatory elements for the other stripes are located more distally 16. The latter regulatory elements are lacking in CH322-135D17, the clone used to tag h. Hence, the tagged construct is expressed in the pattern expected from the regulatory elements it contains. Similarly, en expression was only observed in 13 stripes and not the head region 17. This may be due to the absence of regulatory regions in the en clone CH322-92I14 (Judith Kassis, personal communication). These experiments suggest that recombineering-mediated deletion of genomic sequences in P(acman) constructs can be used to dissect the control of transcription by cis-regulatory elements.
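The in-frame C-terminal tagging described above, in which the tag coding region replaces the stop codon of the open reading frame, can be illustrated at the sequence level. This is a toy string operation and not the recombineering protocol. The function name and the short sequences are invented for illustration, and the tag fragment shown is not the EGFP coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_terminal_fusion(orf: str, tag_orf: str) -> str:
    """Replace the stop codon of `orf` with `tag_orf`, keeping the reading frame.

    Both inputs are coding sequences written 5' to 3' that end in a stop codon.
    The tag's own stop codon terminates the fusion.
    """
    orf, tag_orf = orf.upper(), tag_orf.upper()
    if len(orf) % 3 or len(tag_orf) % 3:
        raise ValueError("coding sequences must contain whole codons")
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    if tag_orf[-3:] not in STOP_CODONS:
        raise ValueError("tag must end with a stop codon")
    # Drop the host stop codon, then append the tag in the same frame.
    return orf[:-3] + tag_orf


# Invented toy sequences (not real gene or EGFP sequences).
toy_orf = "ATGGCTAAATAA"   # Met-Ala-Lys-stop
toy_tag = "GTGAGCAAGTAA"   # Val-Ser-Lys-stop
fusion = c_terminal_fusion(toy_orf, toy_tag)
```

Because the host stop codon is removed and both sequences contain whole codons, the fusion remains a single open reading frame that ends at the stop codon supplied by the tag.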
In conclusion, we have described a versatile P(acman) BAC library resource for functional analysis of transgenes in D. melanogaster (Supplementary Discussion). We conservatively estimate that the new resource enables in vivo analysis of more than 95% of D. melanogaster genes including large genes, gene complexes and heterochromatic genes (Supplementary Fig. 6). Moreover, protein tagging should prove a valuable alternative to antibody production, particularly when proteins are poorly immunogenic. Finally, the flexibility of recombineering 5 permits the integration of a variety of protein tags for numerous applications 18. The few genes and gene complexes that are too large to be contained within clones in the P(acman) libraries or are otherwise not represented in them can be obtained using the previously described gap-repair procedure 3 and previously mapped and end-sequenced BAC libraries constructed from the same isogenized strain 19,20.
We thank the Washington University Genome Sequencing Center for their excellent BAC end sequencing services. We thank the Bloomington Drosophila Stock Center, NCI-Frederick, N. Copeland (NCI Frederick), A. Hyman (Max Planck Institute), R. Karess (CNRS), R. Ordway (Penn State University), J. Reinitz (Stony Brook University), D. Schmucker (Harvard Medical School), T. Schwarz (Children’s Hospital, Boston), B. Wakimoto (University of Washington), S. Warming (NCI Frederick) and L. Zipursky (UCLA) for reagents. We are especially thankful to J. Bischof, K. Basler (University of Zurich) and F. Karch (University of Geneva) for providing germ-line ΦC31 sources and information about their use. We thank J. Cohen for help with recombineering, N. Giagtzoglou and A. Rajan for help with microscopy, and C. Amemiya and D. Frisch for helpful communications and discussions. We are grateful to B. Wakimoto for critical reading of the manuscript. Confocal microscopy was supported by the BCM Intellectual and Developmental Disabilities Research Center. This work was supported by a grant from the Howard Hughes Medical Institute to H.J.B. and the NIH modENCODE project in collaboration with K.P.W. H.J.B. is an Investigator of the Howard Hughes Medical Institute.
The sequence of the attB-P(acman)-CmR-BW vector and the P(acman) BAC end sequences have been deposited in GenBank under accession numbers FJ931533 and FI329972 to FI494724, respectively.
Methods and associated references are available as supplementary online material at http://www.nature.com/naturemethods/.
Supplementary information is available on the Nature Methods website.