This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but none has been available to date.
We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6~8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences.
The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes.
Large-insert bacterial artificial chromosome (BAC) libraries have been shown to be critical resources for many aspects of molecular and genomic studies [1,2], such as the positional cloning of genes and quantitative trait loci, comparative studies of synteny and gene organization among different species, as well as for local or whole genome physical and genetic mapping and sequencing [6-11]. Arrayed, large-insert DNA libraries have provided the opportunity for researchers to analyze and share information and resources on specific clones [1,2,12,13]. Hundreds of BAC libraries have been constructed for microbe, plant and animal species [1,2,6,7,12,13]. However, only a few large-insert BAC libraries are available to date for insect species, especially lepidopteran insects [10,11,14-17]. This could slow progress for the comprehensive molecular and genomics research of these clades.
Moths and butterflies, members of the insect order Lepidoptera, are the second most diverse group of animals, with at least 150,000 named species. They are widespread members of the ecosystem, playing important roles as pollinators and prey, and are among the most destructive agricultural pests. Clearly, Lepidoptera are under-represented in terms of genomic resources and knowledge relative to their biological and economic status. This research was designed mainly to construct comprehensive BAC library resources for two species of moths, the tobacco hornworm, Manduca sexta, and the tobacco budworm, Heliothis virescens, and one species of butterfly, the Müllerian mimic, Heliconius erato. These species have genome sizes ranging from approximately 400 to 500 Mb/haploid genome (395 Mb for H. erato, 404 Mb for H. virescens, and 500 Mb for M. sexta [J. S. Johnston, pers. communication]) and are widely-used models for studying fundamental problems in neurobiology, olfaction, development, and immune responses (M. sexta); host feeding preferences, evolution of insecticide resistance, and sexual communication systems (H. virescens); and wing pattern mimicry (H. erato). Moths and butterflies are estimated to have diverged from each other at least 50–60 million years ago. The sphingid, M. sexta, is a member of the same superfamily, Bombycoidea, as the domesticated silkworm, Bombyx mori, the current genome model for Lepidoptera [8,9], and the noctuid, H. virescens, is related to other pest noctuids currently being used for genomic studies including Spodoptera frugiperda [16,29] and Helicoverpa armigera. Here, we report the construction and characterization of six large-insert BAC libraries for these species and the first insight into the constitution and evolution of their genomes. 
The libraries will enable a large community of scientists to isolate and study the genes controlling these processes, provide new tools for lepidopteran systematics, and serve as critical resources for comparative genomic studies and genome sequencing of this important group of organisms.
One of the most important steps toward construction of high-quality BAC libraries is preparation of high-quality megabase DNA. Since no procedure was available for preparation of HMW DNA from these insects, we first developed a method for megabase DNA preparation by testing different DNA isolation buffer systems and tissues collected at different developmental stages of the insects. The results showed that the day-10 pupae (males and females) of M. sexta and day-4 pupae (males and females) of H. virescens and H. erato were most suitable for megabase DNA isolation using a buffer system containing 0.1 M NaCl, 10 mM Tris-HCl, 10 mM EDTA, pH 9.4, and 0.15% β-mercaptoethanol. The DNA isolated with this method was not only large in size (> 1000 kb), but also readily digestible and clonable, thus being well-suited for BAC library construction.
The major goal of this study was to develop BAC resources that are widely usable for molecular and genomic studies of the insects, including whole genome physical mapping and sequencing. Therefore, we constructed two BAC libraries for each species with BamHI and EcoRI in the BAC vector pECBAC1. Table 1 summarizes the characteristics of the six BAC libraries constructed. The libraries were named MSB and MSR for M. sexta (MS for the species; B or R for BamHI or EcoRI, respectively), HVB and HVR for H. virescens, and HEB and HER for H. erato. The insert sizes of the library clones were estimated based on a random sample of 200–300 BACs from each library digested with NotI, a relatively rare cutter in lepidopteran DNA, and fractionated on pulsed-field gels. A typical pulsed-field gel pattern for a set of random clones selected from the MSB BAC library is shown in Figure 1. The average insert sizes of the libraries ranged from 150–175 kb, and the proportion of the insert-empty clones was <5%. Each library contained from 19,200 to 21,504 clones, which were arrayed into 384-well microtiter plates. Based on the number of clones and average insert sizes of each library, we estimated that the genome coverage of each library ranged from 6 ×–8 × genome equivalents, with the two combined libraries of each species having a genome coverage of 13.0 ×–16.3 × (Table 1). Based on this and previous studies [1,2,6,7,13], these BAC library resources should be well-suited for many kinds of molecular and genomic research, including whole genome physical mapping and sequencing.
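The coverage estimate described above follows from simple arithmetic: number of clones multiplied by average insert size, divided by haploid genome size. The sketch below illustrates that calculation. The function name, the optional empty-clone correction, and the example inputs (19,200 clones, a 160 kb average insert, a 404 Mb genome) are assumptions chosen from within the ranges reported in the text; they are not the per-library values of Table 1.

```python
def genome_coverage(n_clones, avg_insert_kb, genome_mb, empty_fraction=0.0):
    """Return library coverage in haploid genome equivalents.

    n_clones       -- number of arrayed clones in the library
    avg_insert_kb  -- average insert size in kilobases
    genome_mb      -- haploid genome size in megabases
    empty_fraction -- fraction of insert-empty clones (hypothetical correction)
    """
    effective_clones = n_clones * (1.0 - empty_fraction)
    return effective_clones * avg_insert_kb / (genome_mb * 1000.0)


# Illustrative inputs only, not taken from Table 1.
example = genome_coverage(n_clones=19_200, avg_insert_kb=160, genome_mb=404)
print(f"{example:.1f}x")  # prints 7.6x
```

Summing the values for the two libraries of one species gives the combined coverage, provided both use the same genome size estimate.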
As an independent test of genome coverage, to demonstrate the utility of the libraries, and to isolate BACs containing genes of importance, the libraries were robotically double-spotted onto nylon membrane filters in 3 × 3 format and screened with gene sequence-specific probes that are of interest for studies of the lepidopteran models. These included genes involved in olfaction (MsOR1, MsOR3, and HvHR16), nerve axon growth and guidance (MsNos128, MsEph, MsFasII, and MsPlexA), hormone action (MsE75 and HvPTTH), wing patterning (Hewg, Heptc, and HeCi), Bt toxin action (HvAPN120 and Hvcad), and ribosomal protein structure (HvRpS4, HeRpS5, HeRpS9, HeRpL3, and HeRpL10), some of which have also served as anchor loci for comparative linkage mapping [5,31]. Tables 2, 3 and 4 summarize the library screening results. Although the number of hits for individual probes varied widely, which may reflect the uneven distribution of the clones constructed with a single enzyme, the average number of hits in a library was close to the expected genome coverage estimated by the library insert sizes and the genome size (Table 1). The library genome coverage estimated by hybridization was slightly higher than the values expected by the BAC library insert sizes and the genome DNA content for the libraries constructed for H. virescens (16.7 hits per probe vs. 16.3 ×), and slightly lower for the libraries from M. sexta (12.3 hits per probe vs. 13.0 ×) and H. erato (13.9 hits per probe vs. 15.4 ×).
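The link between coverage and the chance of recovering a given single-copy locus is commonly expressed with the Clarke and Carbon relation, P = 1 - (1 - I/G)^N, where I is the average insert size, G the haploid genome size and N the number of clones. The sketch below applies that standard formula; it is not a calculation reported by the authors, it assumes random, unbiased cloning, and the function names and example inputs are illustrative assumptions.

```python
import math


def prob_at_least_one_clone(n_clones, avg_insert_kb, genome_mb):
    """Clarke-Carbon probability that a given locus is in at least one clone."""
    frac = avg_insert_kb / (genome_mb * 1000.0)  # fraction of genome per clone
    return 1.0 - (1.0 - frac) ** n_clones


def clones_needed(prob, avg_insert_kb, genome_mb):
    """Smallest number of clones giving at least the requested probability."""
    frac = avg_insert_kb / (genome_mb * 1000.0)
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - frac))


# Illustrative inputs only (values within the ranges given in the text).
p = prob_at_least_one_clone(n_clones=19_200, avg_insert_kb=160, genome_mb=404)
n99 = clones_needed(0.99, avg_insert_kb=160, genome_mb=404)
print(f"P(locus present) = {p:.4f}; clones for 99% = {n99}")
```

Under these assumptions the expected number of positive clones per single-copy probe equals the coverage itself, which is why the mean number of hits per probe serves as an independent coverage estimate.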
To validate the libraries further, estimate levels of contamination by microbial or organellar sequences and obtain some information about the constitution of the insect genomes, one 96-well plate per library, thus two 96-well plates per species, was sequenced from both ends of each clone. A total of 246–299 BESs were successfully generated for each species (Table 5). The sequences had an average read length of 630 nucleotides, with an average of 560 Q20 nucleotides. Analysis of the BESs indicated that there was no evidence of contamination with DNA from organelles, the E. coli host, or other microbes that were potentially carried by the DNA source insects. The sequences are registered in the trace archives of GenBank described in Methods [see Additional file 1].
To provide a first insight into the constitution of the insect genomes, we analyzed the BESs using the RepeatMasker program, with an emphasis on the contents of GC and repeat elements including retroelements, DNA transposons, simple repeats, and low complexity repeats (Table 5 [see Additional files 2, 3 and 4]). The GC contents of the genomes ranged from 32.35% (H. erato) to 36.18% (H. virescens), with the genome of the butterfly having 3.34% less GC than those of the moths. A total of 117.702 kb of BESs was obtained from the genome of H. erato. The sequence contained a total of 231 repeats, which is equivalent to 10.01% of the genome. Two hundred six of these repeats were categorized as low complexity (7.86% of the genome). This number contrasted with those of H. virescens and M. sexta. H. virescens had a total of 198.779 kb BESs from which a total of 103 repeats (3.16% of the genome) were identified, of which 79 were categorized as low complexity (1.51% of the genome). Similarly, a total of 180.031 kb of BESs generated from the M. sexta BACs were found to contain a total of 125 repeats (3.34% of the genome), of which 103 were categorized as low complexity (1.86% of the genome). Therefore, 3.16–10.01% of the lepidopteran genomes comprised repeat elements, of which a majority was categorized as low complexity repeats. The overall percentage of repeat elements was approximately 3-fold larger for the butterfly than for the moth genomes; further, the percentage of the low complexity repeats was > 4-fold larger for the butterfly than for the moth genomes. Among the low complexity repeats, 11 were longer than 100 bp (244 bp for the largest low complexity repeat), all of which were obtained from H. erato, whereas all remaining low complexity repeats obtained from M. sexta and H. virescens were shorter than 100 bp.
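As a rough illustration of the quantities compared above, the sketch below computes GC content from a set of sequences and the fold differences between the reported repeat percentages. It does not reproduce RepeatMasker, which the authors used; the function names and toy sequences are hypothetical, and only the percentages in the dictionary are taken from the text.

```python
def gc_percent(sequences):
    """GC content (%) over unambiguous bases (A, C, G, T) of all sequences."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Repeat percentages as reported in the text (all repeats, low complexity).
reported = {
    "H. erato": (10.01, 7.86),
    "H. virescens": (3.16, 1.51),
    "M. sexta": (3.34, 1.86),
}

# Toy sequences for illustration; not real BAC end sequences.
print(gc_percent(["ATGC", "AATT"]))  # prints 25.0

butterfly_all, butterfly_low = reported["H. erato"]
for moth in ("H. virescens", "M. sexta"):
    moth_all, moth_low = reported[moth]
    print(moth, round(butterfly_all / moth_all, 2), round(butterfly_low / moth_low, 2))
```

The ratios come out near 3 for all repeats and above 4 for low complexity repeats, matching the fold differences stated in the paragraph.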
Retrotransposons, transposons and simple repeats were also identified in the BESs, but altogether they comprised <1% of the genomes. Nevertheless, the percentage of simple sequence repeats in the butterfly genome (0.66%) was about 2-fold higher than those of the moth genomes (0.39% and 0.27%). Moreover, a total of 15 retro-elements were identified in the BESs of all three species whereas only 3 DNA transposons were identified, suggesting that retro-elements are generally more abundant than DNA transposons in these genomes.
Using the discontiguous Megablast program to query the database of all organisms available in GenBank, we searched for matches to BESs of the three species by BLASTn after masking with the RepeatMasker program (Tables 6 and 7 [see Additional files 5, 6 and 7]). One hundred eight (43.9%) of the 246 H. erato BESs had a total of 1,364 hits to GenBank sequences, with an average of 12.6 hits per BES (range 1–74 hits) at the default e-value of 3.0–3.5E–140. Of the 299 H. virescens BESs, 134 (44.8%) had a total of 1,128 hits, with each BES having an average of 8.4 hits (range 1–69 hits). Of the 273 M. sexta BESs, 127 (34.1%) had a total of 1,516 hits, with each BES having an average of 11.9 hits (range 1–57 hits; Table 6).
We further examined the distribution of the BLASTn hits of the BESs among different species (Tables 6 and 7 [see Additional files 5, 6 and 7]). Of the 1,364 H. erato hits, 1,226 (89.9%) were from 38 insect species. The majority of the latter hits were from other Heliconius species, including H. melpomene, H. doris, H. himera, H. erato, and H. cydno, whereas the remaining hits were from H. virescens, Helicoverpa armigera (cotton bollworm), and H. zea (corn earworm). Notably, 1,059 (77.6%) of the 1,364 H. erato hits were to H. melpomene alone. Moreover, we found that both ends of three H. erato BACs, LQCBT56 (HER08E08), LQCBU02 (HEB02A01), and LQCBU16 (HEB04B01), each had more than 10 discontiguous sequences homologous to H. melpomene BACs registered in GenBank. These results suggested that this set of BACs may represent homologous regions. Complete sequencing of the three H. erato BACs will provide more information about the extent of microsynteny and evolution between these two species.
In comparison, 859 (83.5%) of the 1,029 hits for H. virescens BESs were from 52 insect species (Table 7). The top four values for BES hits included sequences from B. mori, H. melpomene, H. virescens, and M. sexta. Compared with 27 and 24 lepidopteran species with sequences hit by the H. erato and M. sexta BESs, respectively, 41 lepidopteran species were represented in sequences hit by the H. virescens BESs, even though similar numbers of BESs (246–299) were queried for the three species. This suggested that the H. virescens genome is a broader representative of Lepidoptera than H. erato or M. sexta.
Of the 933 hits for M. sexta BESs, 530 (56.8%) were from 30 insect species, including 340 hits (36.4%) exclusive to known sequences in M. sexta. Moreover, the highest numbers of M. sexta hits included sequences from vertebrates, including mouse (90), human (81), and zebrafish (63), and plants (50). By comparison to the genomic sequences of H. erato and H. virescens, the M. sexta genome seems to have evolved to be less unique as a lepidopteran, showing more similarities to the genomic sequences of other animals and even plants.
It was also found that most of the homologous sequences were shorter than 200 bp. However, large contiguous homologous sequences (>200 bp) were found to be associated with lepidopteran genes encoding proteins involved in hormone metabolism, structural proteins, and metabolic enzymes [see Additional files 5, 6 and 7]. An independent search of the BESs using BLASTx in GenBank and ButterflyBase  yielded an average of 10.7 hits per species with high homology to confirmed coding regions of identified genes at e-values less than 1E-10 and bitscores in a range of 55–383 [see Additional file 8]. Additional hits were to ORFs with high similarity to features associated with retrotransposons, such as reverse transcriptase, gag-pol polyprotein, and endonuclease  and non-LTR transposons found in the silkworm genome, such as TRAS and SART . Due to the limited sequencing information, the BLASTn results and discovery of putative genes are presented as potential features to be confirmed by further analysis.
We have constructed six BAC libraries for three lepidopteran model species (2 moths and 1 butterfly). These libraries not only have large insert sizes (150–175 kb) and deep genome coverage (13 ×–17 ×), but also have a low level of insert-empty clones (<5%) and no detected contamination with DNA from organelles and microbes potentially living on the source insects, as indicated by BES analysis. Moreover, the genome coverage and quality of the libraries have been verified independently by screening high-density filters of the libraries with a set of single-copy genes or ESTs. The observation that none of the libraries was contaminated with microbial DNA potentially carried by the source insects was expected, because the self-contained non-feeding pupal stage used as a DNA source for the library construction had purged its gut at the end of larval development. However, we did observe 6, 21 and 20 short sequences in the He, Hv and Ms BESs, respectively, which were homologous to viral, bacterial, and fungal sequences (Tables 6 and 7 [see Additional files 5, 6 and 7]). We believe that the homologues are real and do not result from sample contamination, because they sit in the middle of BESs. These results perhaps provide a line of preliminary evidence for the presence of microbial sequences in these lepidopteran genomes, possibly by horizontal transfer. Similar findings have been obtained in B. mori. On the other hand, considering the small fraction (~0.5%) of the BAC libraries sampled, a more direct test of organelle contamination could be accomplished by using mitochondrial sequences as probes for hybridization. Furthermore, since the libraries of each species were constructed with two restriction enzymes (EcoRI and BamHI) complementary in the GC content of their restriction sites, the genome coverage should be much better distributed along the genome than that of libraries constructed with a single enzyme [13,36]. 
Therefore, these libraries could provide useful resources for comprehensive genomics research of the three model lepidopterans.
The libraries, library filters and individual clones have been distributed to a number of laboratories and are presently being used for the following studies: 1) walking to wing colour patterning genes from closely linked AFLP sequences in H. erato [17,37]; 2) testing synteny between M. sexta, B. mori, and H. melpomene by chromosomal fluorescence in situ hybridization using BACs containing orthologous genes as probes [5,38]; 3) analysis of full-length coding and regulatory regions for the M. sexta Broad gene (L. Riddiford, personal communication); and 4) analysis of H. virescens HR16 putative odor receptor sequences (F. Gould, personal communication).
The results of this study (Tables 5, 6 and 7) have provided a snapshot of the basic characters of the genomes of a group of ditrysian moths and butterflies which diverged from each other at least 50–60 million years ago. First, the genomes of all three species are AT-rich (64–68%), with the genome of the butterfly (H. erato) having an AT content more than 3% higher than those of the moths (M. sexta and H. virescens). Second, the results show that all three insect genomes contain relatively small fractions of repeat elements (3–10%), including retro-transposons, transposons, simple repeats, and low complexity repeats. These results are in agreement with the small genomes of the species (400–500 Mb/1C), which generally tend to contain smaller fractions of repeat elements. Of these three insect species, the butterfly genome contains 3–5-fold more repeat elements (10.01% all repeats), especially low complexity repeats, than the two moth genomes. Papa reported that the total repetitive sequences accounted for about 26% of the genomic regions linked to wing pattern variation in H. erato. The difference could be an effect of more H. erato-specific repeats documented, sampling of a specific region with a higher average repeat density, or both. Third, whereas the three insect genomes all contain a small number (<1%) of retro-elements, DNA transposons and simple repeats, retro-elements seem much more abundant than DNA transposons, and the butterfly genome is two-fold richer in simple repeats than the two moth genomes. Compared with published information from B. mori, the finding of such a low percentage of repeat contents in these three lepidopteran species is surprising, especially for M. sexta, which is in the same superfamily as the silkworm, Bombycoidea. Xia et al. estimated about 20% of the B. mori genome to be composed of "transposable elements"; further, early work based on Cot hybridization kinetics estimated about 45% of the silkworm genome to be composed of repetitive sequences. More recently Osanai-Futahashi et al. reported that the TEs made up 35% of the silkworm genome and contributed greatly to the genome size. One may argue that we might simply have not identified all the relevant repeats in the BESs, and this possibility is supported by the following evidence. The genome of the butterfly, H. erato, contains extremely large numbers (1,059 of 1,364 hits) of small duplicated sequences or "novel repeats" (not registered in GenBank) which are homologous to three completely sequenced BAC clones (118 kb of AEHM-41C10, 112 kb of AEHM-46M10, and 118 kb of AEHM-7G12) of H. melpomene. This in turn indicates the presence of novel repetitive or duplicated sequences in the H. melpomene genome [see Additional file 5]. Large-scale end sequencing of the complete BAC libraries will uncover more detailed aspects of these butterfly and moth genomes, and provide more information for fundamental studies of lepidopteran insects in general.
The BLAST analysis of the sampled BESs has also provided insights into the evolution of these insect genomes. It is not surprising to find the top hits are to the sequences of lepidopteran species, but it is quite surprising that the highest numbers of M. sexta BES hits were to the sequences of other animals and plants rather than to B. mori (Tables 6 and 7). This finding suggests that although all the genomes have undergone changes since the split from the most recent common ancestor, they may have done so along different trajectories, with the M. sexta genome retaining some sequences in common with plants and animals that have been either lost or modified to a greater extent in H. virescens and H. erato. Such a hypothesis can only be tested when more genomic data are available for these lepidopteran insects. Moreover, the BESs of the butterfly (H. erato) are well-matched only to the sequences of H. melpomene. This suggests that not only is the butterfly more related to H. melpomene than to the two moth species, as expected, but this group has also diverged to a greater extent, resulting in a higher level of species- or evolutionary lineage-specific sequences. This argument is further supported by the finding that 27 of the 76 species having sequence matches to the BESs of H. erato (35.5%) were from other Lepidoptera. This number is 6% higher than that of M. sexta but 9% less than that of H. virescens. By contrast, the total number of top hits to lepidopteran species for H. virescens BESs was 41, or 17 and 14 more than for M. sexta and H. erato, respectively. Therefore, the genome of H. virescens may be a better representative of the genomes of Lepidoptera as a whole (Table 7).
One may argue that the RepeatMasker program might not mask the repeat sequences completely because of the limited amount of repeat elements available in the public database; however, this does not appear to have affected the BLAST results significantly. For instance, B. mori represents the species having the most sequence information in GenBank among the lepidopteran species; however, we found significantly different hits, 23, 264 and 41 for the BESs of H. erato, H. virescens, and M. sexta, respectively (Table 6). Moreover, there are many more Drosophila spp. sequences in GenBank than for any other insect; however, we only observed limited numbers of Drosophila sequence hits: 25, 16 and 36 for the BESs of H. erato, H. virescens, and M. sexta, respectively (not shown); there were no large (≥ 200 bp) hits for any Drosophila sequence, and only one large (≥ 200 bp) hit each for Apis mellifera (honey bee), for the BESs of H. virescens and M. sexta, even though the honey bee genome is also fully sequenced [see Additional files 5, 6 and 7]. Similarly, a large number of top BLASTx hits were to protein sequences in lepidopteran species (12/32) or other insects (17/32), such as Acyrthosiphon pisum (pea aphid) and Tribolium castaneum (red flour beetle), of which relatively few were to Drosophila spp. (3/32).
We constructed six high-quality, deep-coverage BAC libraries, two for each of three lepidopteran model species (H. erato, H. virescens, and M. sexta), with the two libraries of each species built using different restriction enzymes. With average clone insert sizes ranging from 152–175 kb, we estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. This genome coverage should be sufficient for many, if not all, aspects of genomics studies of each species, including genome-wide physical mapping and genome sequencing.
Genomic sequence sample analysis of the moths and butterfly has provided an initial insight into the constitution and evolution of their genomes. Although large-scale genome sequencing is needed to further decipher the genomes of the species, especially their gene contents, the basic characteristics of the repeated sequence portion of each genome are useful information for our understanding of the genomes and their evolution. The high-quality BAC libraries of the insects, together with the gene-containing BACs and BAC end sequences, provide valuable information, resources and tools for comprehensive studies of the insect genomes and for addressing many fundamental questions in Lepidoptera.
To minimize the potential polymorphism of the source DNA for BAC library construction, we sought insects that were as inbred as possible. For each species, we used progeny from a single pair mating to restrict the potential polymorphism of the insects to a maximum of 4 alleles per locus. Because the source strains were at least partially inbred, we expected significantly less polymorphism at many loci. This strategy also minimized the number of haplotypes in a library, since intra-chromosomal exchange (crossing over) occurs only in lepidopteran males. The source of DNA for M. sexta was a colony that was maintained without outcrossing for at least 30 years (L. Riddiford, U. Washington). For H. virescens (F. Gould, North Carolina State U.) and H. erato (O. McMillan, U. Puerto Rico), the DNA source was a colony that had to be replenished periodically from wild populations to avoid inbreeding depression. Whereas M. sexta and H. virescens deposit large numbers of eggs in a short time, enabling a relatively synchronous rearing, H. erato lays only a few eggs each day for several months. Therefore, we collected and froze insects at an appropriate stage (day-4 pupae) based on pilot studies. Consequently, it took more than 6 months to accumulate a sufficient number of animals (~200) to prepare high molecular weight (HMW) DNA for library construction of this species.
To maintain the identity of the DNA source animals, voucher specimens from the same families used for BAC library construction were archived at the Museum of Comparative Zoology, Harvard University (M. sexta, N. Pierce), and at the North Carolina State University Insect Collection (H. virescens and H. erato, F. Gould). The archived specimens included dried adult wings and the corresponding bodies preserved in 70–100% alcohol at -20°C or -80°C.
The pECBAC1 vector was used in the library construction. Vector DNA was isolated by the alkaline lysis method, purified by cesium chloride gradient centrifugation, digested completely with either BamHI or EcoRI, and dephosphorylated with calf intestinal alkaline phosphatase. The digested vector DNA was precipitated, dissolved in TE (10 mM Tris-HCl, 1 mM EDTA, pH 8.0), adjusted to 10 ng/μl, and stored at -20°C before use [1,2,13,43].
HMW DNA was isolated from the insects using frozen pupal tissues and buffer system (see Results) according to the procedure described by Wu et al. . The BAC libraries were constructed using an improved procedure developed in our laboratory [1,2,13,43]. Briefly, HMW genomic DNA plugs were prepared from day-10 pupae (males and females) of M. sexta and day-4 pupae (males and females) of H. virescens and H. erato. DNA was partially digested with BamHI or EcoRI, size-fractionated in a clamped homogeneous electrical field (CHEF) apparatus (Bio-Rad), recovered by electroelution, and then ligated into the BamHI or EcoRI site of the pECBAC1 vector, respectively. The ligated DNA was transformed into E. coli DH10B cells (Invitrogen, USA) by electroporation. The transformed cells were incubated in SOC medium with shaking at 250 rpm, 37°C for 1 h. Recombinant transformants were selected and incubated for 32 h at 37°C on LB agar (Invitrogen, USA) plates containing 12.5 μg/ml chloramphenicol, 0.5 mM IPTG, and 50 μg/ml X-gal.
White colonies were randomly selected and grown in LB medium (Invitrogen, USA). BAC DNA was isolated, digested with NotI, and subjected to CHEF gel electrophoresis. The ligation that gave a transformation efficiency of 200 or more white colonies/μl ligation and that generated clones with the largest inserts was selected for library construction. The BAC colonies were manually arrayed as individual clones in 384-well microtiter plates containing 50 μl LB plus freezing broth with 12.5 μg/ml chloramphenicol [1,2,13,43,44]. After incubation at 37°C for 14 h, the microtiter plates were stored at -80°C. To facilitate their accessibility, all six BAC libraries have been made available to the public at the TAMU GENE finder Genomic Resources Center directed by H.-B. Z.
A GeneTAC G3 robotic workstation (Genomic Solutions, Inc., USA) was used to double-spot the BAC libraries onto 8 × 12-cm Hybond N+ filters (Amersham-Pharmacia, USA) in 3 × 3 format so that each high-density clone filter contained two spots of each clone from four 384-well microtiter plates (1,536 × 2 spots). The filters were processed according to Zhang and Zhang et al. Filter screening was carried out using a non-radioactive detection system (ECL, Amersham/Pharmacia, USA) with X-ray film (Hyperfilm; Amersham/Pharmacia, USA) according to the manufacturer's instructions. All probes used except for putative olfactory receptors (MsOR1, MsOR3, and HvHR16) were verified as single copy by BLASTn search of KAIKObase or BLASTx search of FlyBase to confirm the presence of one chromosomal locus. The amount of hybridizing DNA was adjusted to a range of 30–60 ng per filter based on the intensity of signal obtained after initial screening. We routinely used 0.5 M NaCl in the hybridization buffer, but in some cases increased stringency by lowering it to 0.4 M to reduce background. We re-used filters without stripping until the background became too high to read positive signals reliably or until we detected carry-through; then we treated the filters to remove the probe DNA according to the manufacturer's instructions. Probe DNA was obtained from a variety of sources, including bacterial plasmids containing well-characterized cDNA sequences and PCR fragments amplified from genomic DNA based on sequences registered in GenBank. Insert DNA was amplified by PCR using primers designed from the plasmid vectors or within the insert sequence and purified on Wizard Spin columns (Promega, USA) before labelling with the ECL reagents. Although we were able to detect positive signals on filters hybridized with probes as short as 350–400 bp, probes of 1 kb or greater gave more consistent signal-to-noise ratios.
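The filter layout described above implies a fixed number of plates and filters per library. The sketch below works through that arithmetic for the smallest and largest library sizes given in the text (19,200 and 21,504 clones). The function name and the note about one unused position per 3 × 3 block are inferences from the stated layout (nine positions, four plates spotted in duplicate), not details reported by the authors.

```python
import math

WELLS_PER_PLATE = 384    # microtiter plate format stated in the text
PLATES_PER_FILTER = 4    # four plates per high-density filter
REPLICATES = 2           # each clone is double-spotted
GRID_POSITIONS = 3 * 3   # 3 x 3 spotting format


def filter_layout(n_clones):
    """Return plate and filter counts implied by the spotting scheme."""
    clones_per_filter = WELLS_PER_PLATE * PLATES_PER_FILTER
    return {
        "plates": math.ceil(n_clones / WELLS_PER_PLATE),
        "filters": math.ceil(n_clones / clones_per_filter),
        "spots_per_filter": clones_per_filter * REPLICATES,
        # Inferred: positions in each 3 x 3 block not occupied by a clone spot.
        "unused_positions_per_block": GRID_POSITIONS - PLATES_PER_FILTER * REPLICATES,
    }


for size in (19_200, 21_504):
    print(size, filter_layout(size))
```

For a library of 19,200 clones the last filter would be only half full, since 19,200 is not a multiple of 1,536.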
Ninety-six clones were randomly selected from each of the six BAC libraries and re-arrayed into a 96-well plate. Both ends of each clone were sequenced with the primer 5'-TAATACGACTCACTATAGGG-3' for the T7 end and 5'-GTTTTTTGCGATCTGCCGTTTC-3' for the SP6 end, following the procedure developed at The Institute for Genomic Research [see Additional file 1 for the original library clone names and the corresponding TIGR names]. The sequences are registered in the trace archives of GenBank under the following link: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi? SPECIES_CODE='HELICONIUS ERATO' AND CENTER_NAME='TIGR' or TI#: 908600791–908601036; SPECIES_CODE='HELIOTHIS VIRESCENS' AND CENTER_NAME='TIGR' or TI#: 908601037–908601335; and SPECIES_CODE='MANDUCA SEXTA' AND CENTER_NAME='TIGR' or TI#: 908601336–908601608. The resultant BESs were analyzed with the RepeatMasker program. Finally, the masked BESs were searched against the databases of all organisms with the discontiguous megablast program at NCBI under the default settings.
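As a rough consistency check, the number of end reads attempted per species can be compared with the size of each deposited TI range. The Python sketch below assumes that every identifier in a listed range corresponds to exactly one BES from this study, which the text does not state explicitly; the variable names are illustrative.

```python
# Compare attempted end reads with the size of each deposited TI range.
CLONES_PER_LIBRARY = 96      # clones picked from each library
LIBRARIES_PER_SPECIES = 2    # two libraries were built for each species
ENDS_PER_CLONE = 2           # T7 end and SP6 end

# Inclusive TI ranges as listed in the text.
ti_ranges = {
    "Heliconius erato": (908600791, 908601036),
    "Heliothis virescens": (908601037, 908601335),
    "Manduca sexta": (908601336, 908601608),
}

attempted_per_species = CLONES_PER_LIBRARY * LIBRARIES_PER_SPECIES * ENDS_PER_CLONE

# Assumption: one TI number per deposited BES, ranges inclusive at both ends.
deposited = {sp: last - first + 1 for sp, (first, last) in ti_ranges.items()}

if __name__ == "__main__":
    for species, n in deposited.items():
        fraction = n / attempted_per_species
        print(f"{species}: {n} of {attempted_per_species} reads ({fraction:.0%})")
```

If the one-identifier-per-read assumption holds, the difference between attempted and deposited counts would reflect reads that failed or were not submitted.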
BLASTx searches were carried out against non-redundant protein sequences in GenBank (December 2008) and ButterflyBase version 2.92. Hits with e-values less than 1E-7 and bit scores greater than 50 were evaluated for similarity to coding regions of identified proteins and retrotransposons. High-scoring matches to similar sequences in more than one species were used as a criterion for provisional identification of a bona fide protein.
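The two thresholds stated above can be applied to BLAST results with a simple filter. The Python sketch below is an illustration rather than the procedure used in this study: it assumes the hits are available in the standard 12-column tabular BLAST format (e-value in column 11, bit score in column 12), and the query names, subject names, and scores in `SAMPLE` are invented for demonstration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output, skipping blanks and comments."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def passes(hit: Hit, max_evalue: float = 1e-7, min_bitscore: float = 50.0) -> bool:
    """Keep hits with e-value below 1E-7 and bit score above 50, as in the text."""
    return hit.evalue < max_evalue and hit.bitscore > min_bitscore


# Invented example rows (not data from this study).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["BES_001", "prot_A", "62.5", "120", "45", "0", "1", "360", "10", "129", "2e-30", "118"],
    ["BES_002", "prot_B", "40.0", "60", "36", "0", "1", "180", "5", "64", "1e-5", "48.1"],
    ["BES_003", "prot_C", "55.0", "80", "36", "0", "1", "240", "5", "84", "1e-9", "49.0"],
])

kept = [h for h in parse_tabular(SAMPLE) if passes(h)]

if __name__ == "__main__":
    for h in kept:
        print(h.query, h.subject, h.evalue, h.bitscore)
```

Both conditions must hold, so a hit with a low e-value but a bit score of 50 or less (such as the third invented row) is discarded.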
BAC: Bacterial Artificial Chromosome; BES: BAC end sequence; ORF: open reading frame.
CW participated in the study design, BAC library construction, BAC end sequence data analysis and manuscript preparation. DP, DC and EN participated in the library screening. FS supported the BAC library assembly and filter preparation. SZ conducted the BAC end sequencing. H-BZ participated in the study design, BAC library construction and manuscript preparation. MRG participated in the study design, insect vouchering, library screening, BLASTx search and manuscript preparation. All authors read and approved the final manuscript.
Correspondence of TIGR BAC end sequence names with their library clone names. Table S1 lists TIGR BAC end sequence names and corresponding BAC library clone names.
Classes of repeats in BESs of 3 lepidopteran species analyzed with the RepeatMasker tool. Table S2 includes summaries of types of repeated sequences identified in the BESs of H. erato, H. virescens, and M. sexta using RepeatMasker.
Lists of masked Lepidoptera repeats. Table S3 lists the masked repeats in the BESs of H. erato, H. virescens, and M. sexta, their sequence characteristics, and location in the BESs.
Detailed repeat element contents in the BESs of three lepidopteran models. Table S4 summarizes the types of repeat elements found in the BESs of H. erato, H. virescens, and M. sexta.
Heliconius erato BES BLASTn hits. Table S5 lists the sequences hit by BLASTn search with H. erato BESs and their characteristics.
Heliothis virescens BES BLASTn hits. Table S6 lists the sequences hit by BLASTn search with H. virescens BESs and their characteristics.
Manduca sexta BES BLASTn hits. Table S7 lists the sequences hit by BLASTn search with M. sexta BESs and their characteristics.
High-scoring BES hits identified by BLASTx search. Table S8 lists the high-scoring BLASTx hits in ButterflyBase and GenBank for H. erato, H. virescens, and M. sexta.
We acknowledge L. Riddiford at University of Washington, F. Gould at North Carolina State University, and O. McMillan at University of Puerto Rico for generous gifts of insects; and A. Nighorn, Arizona State University, H. Robertson, University of Illinois, L. Riddiford, X. Zhou, University of Washington, P. Copenhaver, Oregon Health and Science University, O. McMillan, S. Gill, University of California at Riverside, L. Gahan, Clemson University, F. Gould, North Carolina State University, and R. Palli, University of Kentucky, for probes. We also would like to thank the BAC end sequencing team at TIGR for BAC end sequencing, N. Pierce at Harvard University and F. Gould for archiving the insect specimens, and K. Pennoyer, University of Rhode Island, for the BLASTx search. This work was funded by NSF Grant IBN-0208388 to MRG, CW, and H-BZ.