The freshwater snail Biomphalaria glabrata is closely associated with the transmission of human schistosomiasis. An ecologically sound method has been proposed to control schistosomiasis using genetically modified snails to displace endemic, susceptible ones. To assess the viability of this form of biological control, studies towards understanding the molecular makeup of the snail relative to the presence of endogenous mobile genetic elements are being undertaken since they can be exploited for genetic transformation studies. We previously cloned a 1.95 Kb BamHI fragment in B. glabrata (BGR2) with sequence similarity to the human long interspersed nuclear element (LINE or L1). A contiguous, full-length sequence corresponding to BGR2, hereafter named nimbus (BgI), has been identified from a B. glabrata bacterial artificial chromosome (BAC) library. Sequence analysis showed that the 65,764 bp BAC insert contained one full-length, complete nimbus (BgI) element (element I), two full-length elements (elements II and III) containing deletions and flanked by target site duplications, and 10 truncated copies. The intact nimbus (BgI) contained two open reading frames (ORFs 1 and 2) encoding the characteristic hallmark domains found in non-long terminal repeat retrotransposons belonging to the I clade: a nucleic acid binding protein in ORF1 and an apurinic/apyrimidinic endonuclease, reverse transcriptase and RNase H in ORF2. Phylogenetic analysis revealed that nimbus (BgI) is closely related to Drosophila (I factor), mosquito Aedes aegypti (MosquI) and chordate ascidian Ciona intestinalis (CiI) retrotransposons. Nimbus (BgI) represents the first complete mobile element characterized from a mollusk that appears to be transcriptionally active and is widely distributed in snails of the neotropics and the Old World.
Transmission of the parasitic blood fluke, Schistosoma mansoni, one of the causative agents of the chronic debilitating disease, schistosomiasis, persists in the tropics despite efforts in reducing its prevalence. The snail Biomphalaria glabrata is the major intermediate host responsible for transmission of human schistosomiasis in the Western Hemisphere. With schistosomiasis still prevalent in 74 countries and with an estimated 600 million people at risk, the search for an effective vaccine continues (Pearce, 2003; Lebens et al., 2004; Tran et al., 2006), as does the development of novel tools that will target the intermediate snail stage of the parasite’s life cycle.
Variations in snail-host susceptibility and parasite infectivity may have helped shape the genomes of both these organisms (Cribb et al., 2001) and several laboratories are involved in identifying genes that govern the outcome of parasite infection in the snail (Raghavan et al., 2003; Lockyer et al., 2004; Zhang et al., 2004; Bouchut et al., 2007). Data from studies using expressed sequence tags (ESTs) showed that sequences related to retrotransposons occur in B. glabrata (Raghavan et al., 2003). By identifying those genes that render a snail resistant to infection, we can envision future genetic transformation of snails with such gene(s) using mobile genetic elements (MGEs). With the success of germ line transformation in insects (Coates et al., 1998; Jasinskiene et al., 1998; Kidwell and Wattam, 1998) and mollusks (Boulo et al., 2000) using MGEs, it is conceivable that this strategy can be extended to B. glabrata in a way that will make the genetic manipulation of this snail a practical possibility.
Transposable elements are ubiquitous in eukaryotes and have emerged as one of the key molecular evolutionary forces shaping the genome (Fedoroff, 1999). Eukaryotic MGEs are grouped into two major classes (I and II) according to their mechanism of transposition (Finnegan, 1989; Charlesworth et al., 1994). Class I elements transpose via an RNA intermediate, using reverse transcriptase (RT) to copy the sequence that is transposed, while Class II elements transpose directly from DNA to DNA. The Class I elements are further subdivided into sub-classes, namely those that possess long terminal repeats (LTRs) at their ends and those that do not (non-LTRs), with LTR retrotransposons being more efficiently transposed (Eickbush, 1992).
The non-LTR retrotransposable elements or long interspersed nuclear elements (LINE or L1-like elements) are the most abundant (Eickbush, 1994; Malik et al., 1999). They comprise 17% of the human genome and play a powerful role in modulating gene expression. The presence of the L1 retrotransposon was found to reduce gene expression in a mechanism that truncated mRNA (Han and Boeke, 2004) and a synthetic construct showed a 200-fold increase in transposition compared with the original sequence (Han et al., 2004). In addition, a human L1 harboring a retrotransposon cassette was shown to effect changes in the neural progenitor cells both in vitro and in the brains of transgenic mice in vivo (Muotri et al., 2005), demonstrating that synthetic retrotransposons have the potential to be practical tools for manipulating mammalian and perhaps other genomes.
We previously identified a 1.95 Kb BamHI repetitive sequence (BGR2) related to the human L1 and Drosophila melanogaster I factor in B. glabrata (Knight et al., 1992) whose open reading frame (ORF) was found to encode an RT. Subsequent gene discovery studies using ESTs identified a cDNA transcript (Bg37) with 97% sequence identity to the previously reported BGR2 element. In addition to the identification of this transcript, varying levels of RT enzyme activity were detected in the resistant and susceptible snails before and after parasite infection (Raghavan et al., 2003). Using the cDNA Bg37 and the genomic BGR2 as probes, the complete non-LTR element, from here on named nimbus (BgI), was isolated from a B. glabrata (BS-90) bacterial artificial chromosome (BAC) library.
Very few mobile elements have been characterized in mollusks. Aside from the truncated BGR2 element (Knight et al., 1992), pearl from the cupped oyster Crassostrea virginica (molluscan class Bivalvia) is the only other putative non-autonomous DNA transposon to be identified in the phylum Mollusca (Gaffney et al., 2003). Here, we provide the discovery and characterization of nimbus (BgI), an MGE of the freshwater snail B. glabrata and the first full-length, non-LTR retrotransposon from a mollusk.
Snails used in this study were either recent field isolates or laboratory maintained. Of the laboratory maintained snails, two different B. glabrata snail stocks were used. The BS-90 stock (wild type, pigmented) isolated in Salvador, Brazil (Paraense and Correa, 1963), has been maintained in the laboratory since its isolation and serves as a stable stock that displays the parasite-resistant phenotype at any age (either as an adult or juvenile). The M-line stock (albino) was bred for high susceptibility to infection (Newton, 1955) and although genotypic variability among and within these snails exists, this stock continues to be useful as one that displays the parasite susceptibility phenotype. Recent field-isolated species of Biomphalaria (Biomphalaria tenagophila, Biomphalaria straminea, Biomphalaria alexandrina, Biomphalaria pfeifferi, Biomphalaria sudanica, Biomphalaria schrammi) or species of different genera (Bulinus truncatus, Oncomelania hupensis hupensis, Planorbis sp.) that can either serve or not serve as intermediate hosts were collected from several countries in Africa (Old World) and South America (New World). Once in the laboratory, snails were maintained at ambient temperature in aerated water and fed romaine lettuce. Prior to DNA isolation, snails were treated with ampicillin (100 mg/ml) overnight at room temperature to avoid potential contamination from resident bacteria. DNA was isolated from whole individual snails and from adult schistosomes as previously described (Knight et al., 1998).
Haemocyte cDNA libraries were constructed from either normal or parasite-exposed BS-90 snails. The construction of the libraries and processing of individual clones to generate ESTs have previously been described (Raghavan et al., 2003).
Probes ([32P]-labeled) used for Southern hybridization were prepared from purified PCR products (572 bp for BGR2 and 272 bp for Bg37) from recombinant plasmids containing either the BamHI genomic L1-related element (BGR2; GenBank Acc. No. X60372) or the cDNA insert of nimbus (Bg37, GenBank Acc. No. EF413179) identified from the haemocyte ESTs (GenBank Acc. No. AW740305) as described previously (Knight et al., 1992; Raghavan et al., 2003). For the Southern blots, BamHI and PstI genomic DNA digests were separated on a 1% agarose gel, blotted on to Zeta Probe™ membrane (Bio-Rad Laboratories, Hercules, CA) and probed with either the [32P]-labeled BGR2 or Bg37 (1 × 10⁵ cpm/ml each of acid precipitable counts). Hybridization was performed at 65°C for 18 h and membranes washed according to standard stringent conditions (Maniatis et al., 1982). The blots were stripped between hybridizations with the two different probes and tested for residual radioactivity by autoradiography prior to re-probing.
Copy number analysis of nimbus (BgI) was performed by slot blot analysis by comparing the hybridization signal of nimbus (BgI) with that of the single copy B. glabrata ferritin gene using the same blot. Briefly, nylon membrane (Hybond N+, Amersham) cut to fit the slot-blot manifold (BioRad) was equilibrated with deionized water and 6 X sodium chloride-sodium citrate (SSC). After placing the pre-wet membrane on three layers of Whatman 3MM paper (pre-equilibrated with 6 X SSC) in the slot-blot manifold, DNA (10-fold serial dilutions from 10 μg to 1 ng in a total volume of 150 μl tris-EDTA (TE) buffer, pH 8.0) was pipetted into microcentrifuge tubes containing 12 μl of 5 M NaOH and 3 μl 0.5 M EDTA. Samples were boiled (10 min), snap-chilled on ice and neutralized (equal volume of 2 M ammonium acetate) before dispensing into wells. Samples were pulled through the slots by gentle vacuum and wells washed with 2 X SSC. After u.v. cross-linking, blots were cut into strips before hybridizing individual strips with either [32P]-labeled ferritin or the nimbus (BgI) (RT domain) as probes under standard hybridization conditions (Maniatis et al., 1982). Membranes were washed at moderate stringency (1 X SSC, 0.5% SDS at 60°C) before subjecting to autoradiography on X-ray film with intensifying screens at −70°C for 7 days. Densitometric analyses of the autoradiograph were performed on a Macintosh computer using the public domain NIH Image program (developed at the U.S. National Institutes of Health and available on the Internet at http://rsb.info.nih.gov/nih-image/). Autoradiographs were captured digitally and analyzed using the NIH ImageJ program.
Genomic DNA (50 ng) from the various snail isolates was processed for PCR and amplified using Taq polymerase (Promega, WI). Amplified products were resolved by gel electrophoresis (1% agarose) in TBE (0.089 M Tris-borate pH 8.0, 0.089 M boric acid, 0.002 M EDTA) buffer and visualized by u.v. trans-illumination of the ethidium-stained gel. Gene specific primers (GSP) used in PCR were purchased from Sigma Genosys (St. Louis, MO) and designed using the Accelrys GCG program, “Prime” (Deveraux et al., 1984), for the following samples: haemocyte EST (Bg37), Forward primer (Bg37for)- 5′ GCTCCATTAAACCGAACAGAC 3′ and Reverse primer (Bg37rev)- 5′ CCCCGTAGATCATTGCTAAC 3′; Genomic LINE (BGR2) RT domain, Forward primer (BGR2-3)- 5′ ATCACCGACCTACTTGCACC 3′ and Reverse primer (BGR2-4)- 5′ GATTCGGCTTACTGCCTTCC 3′. Cycling conditions were as follows: denaturing at 94°C for 1 min, annealing at 56°C for 1 min and extension at 72°C for 2 min, for 30 cycles.
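As a rough sanity check on the primers listed above, the following sketch reports length, GC fraction and a Wallace-rule melting temperature estimate (2°C per A/T, 4°C per G/C) for each primer. The Wallace rule is a coarse approximation introduced here for illustration only; it is not the algorithm used by the GCG "Prime" program, and the function names are hypothetical.

```python
# Primer sequences copied from the text, written 5' to 3'.
PRIMERS = {
    "Bg37for": "GCTCCATTAAACCGAACAGAC",
    "Bg37rev": "CCCCGTAGATCATTGCTAAC",
    "BGR2-3": "ATCACCGACCTACTTGCACC",
    "BGR2-4": "GATTCGGCTTACTGCCTTCC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2*(A+T) + 4*(G+C).

    This is an assumption made for illustration; it is only a rough
    guide for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"Tm ~{wallace_tm(seq)} C")
```

Such estimates only indicate whether the primers of a pair are broadly compatible with a shared annealing temperature such as the 56°C used here.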
Total mRNA was isolated from whole snails and treated with RNase-free DNase (RQ1, Promega, WI) as described previously (Miller et al., 2001). First strand cDNA was synthesized using the treated mRNA in the presence of an oligo (dT) primer, either in the presence or absence (control reactions with no RT were performed to preclude any DNA contamination) of RT (SuperScript RT, Invitrogen Life Technologies, Carlsbad, CA) as previously described (Miller et al., 2001). Second strand amplifications of the templates were performed using the same GSPs of the haemocyte EST cDNA (Bg37) as described above. As a positive control, the constitutively expressed B. glabrata myoglobin gene (Dewilde et al., 1998) was amplified in parallel using the same conditions as described previously (Miller et al., 2001) and the following primer sets: myo5′primer- 5′ GATGTTCGCCAATGTTCCC 3′ and myo3′primer- 5′ AGCGATCAAGTTTCCCCAG 3′. Another control included a no-cDNA template reaction for both sets of amplification reactions (Bg37 and myoglobin) to rule out any contamination of the PCR reagents used. The PCR products were analyzed on a 1% agarose gel and the bands visualized by u.v. transillumination of the ethidium bromide-stained gel.
High molecular weight DNA was made from the head foot region of the BS-90 (resistant) snail (six adults, 10 mm in diameter) after incubating overnight at room temperature in ampicillin (100 μg/ml). Dissected tissue was plunged directly into liquid nitrogen and crushed into a fine powder at −70°C before resuspending in 3 vol. of 50 mM EDTA (pH 8.0). After the addition of molten agarose at 70°C (InCert, FMC BioProducts, ME) made up in 125 mM EDTA (pH 7.5), the mixture was dispensed into moulds (500 μl) and allowed to solidify for 20 min at 4°C. DNA-impregnated agarose plugs were removed from the moulds and incubated in CTAB (2% w/v cetyl trimethyl ammonium bromide, 1.4 M NaCl, 0.2% 2-mercaptoethanol, 20 mM EDTA, 100 mM Tris-HCl pH 8.0) proteinase K (2 mg/ml) lysis buffer for 24 h at 55°C. Digestion was allowed to proceed for a second day with a single change of buffer before washing the plugs at 10 min intervals (six to eight times) at 55°C in 50 mM EDTA (pH 8.0) followed by washing (five to six times) in TE buffer (10 mM Tris-HCl, 0.1 mM EDTA pH 8.0) at 37°C and four times at room temperature. Washed plugs were incubated for 2 h at 37°C in TE buffer containing 1.0 mM phenylmethylsulphonyl fluoride (PMSF). After one change of buffer, incubation in protease inhibitor was continued overnight at 37°C before repeating the series of washes as described above. Washed plugs were stored indefinitely in TE buffer at 4°C and digested with restriction enzyme HindIII before ligation. Fragments between 50 and 100 Kb were resolved by transverse alternating field electrophoresis (TAFE; Stewart et al., 1988) before ligation into the HindIII site of the BAC vector (pBeloBAC11), performed in collaboration with Molecular Genetics (Huntsville, AL). From three separate ligation reactions 21,952 colonies were obtained with a size range of inserts between 60 and 88 Kb, which translates approximately into two-fold coverage of the 931 Mb estimated genome size of the B. glabrata snail.
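The coverage figure quoted above can be reproduced with a short calculation: coverage is the number of clones multiplied by the mean insert size, divided by the genome size. The sketch below uses the clone count, insert size range and genome size from the text; the 74 Kb midpoint is an assumption (the mean insert size is not stated), and the Clarke-Carbon probability is a standard library-representation formula added here for illustration rather than a calculation reported by the authors.

```python
N_CLONES = 21_952          # colonies obtained (from the text)
GENOME_BP = 931_000_000    # estimated B. glabrata genome size (from the text)


def library_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Genome equivalents represented in the library."""
    return n_clones * mean_insert_bp / genome_bp


def prob_sequence_present(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon estimate of the probability that a given unique
    sequence is present in at least one clone."""
    return 1.0 - (1.0 - mean_insert_bp / genome_bp) ** n_clones


if __name__ == "__main__":
    # 60 and 88 Kb are the reported bounds; 74 Kb is an assumed midpoint.
    for insert_bp in (60_000, 74_000, 88_000):
        cov = library_coverage(N_CLONES, insert_bp, GENOME_BP)
        p = prob_sequence_present(N_CLONES, insert_bp, GENOME_BP)
        print(f"insert {insert_bp // 1000} Kb: {cov:.2f}x coverage, "
              f"P(present) = {p:.2f}")
```

Under these assumptions the coverage lies between roughly 1.4-fold and 2.1-fold, so "approximately two-fold" corresponds to the upper part of the insert size range.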
Individual colonies were picked into 384 well plates (stored at −70°C in 50% glycerol) and gridded onto Luria-agar chloramphenicol (12.5 μg/μl) plates which were overlaid with nylon membranes (Zetaprobe) and probed using the [32P]-labeled PCR products from BGR2 (572 bp) or the transcript Bg37 (272 bp). DNA lifted from colonies (384-well format, high-density gridded membranes) in duplicate were processed for hybridization and washed under high stringency conditions as described in Maniatis et al. (1982). Clones that hybridized to both probes were processed for further sequencing only if signals were present in duplicate lifts and after verification by nested PCR to check for the expected size product using either BGR2 or Bg37 GSPs and DNA from the positive BAC clones as a template.
DNA sequences were analyzed using the Accelrys GCG program (Deveraux et al., 1984), EMBOSS (Rice et al., 2000) and comparisons made with sequences in the protein and nucleic acid public databases using the BLAST algorithm (Altschul et al., 1990). All sequences have been submitted to GenBank with accession numbers (EF413179, nimbus cDNA Bg37; EF413180, nimbus BgI element and EF418587, complete BAC sequence BRIBAC72bg_line5). In addition, all sequences mentioned in this paper are designated with BRI numbers and clone identification numbers.
For phylogenetic analyses, the B. glabrata nimbus (BgI) RT sequence was added to previously reported alignments (Permanyer et al., 2003, 2006). The B. glabrata BGR element (that was a partial sequence of nimbus, BgI) was replaced with the complete nimbus (BgI) RT sequence (GenBank Acc. No. EF413180), the Aedes aegypti mosquI element was added to the alignment and sequences with very large gaps were eliminated from the alignment. The multiple sequences were then clustered using CLUSTAL X 1.83 (Thompson et al., 1997) with pair-wise gap penalties. Phylogenetic analysis was performed using the neighbor-joining method with corrections for multiple substitutions, excluding positions with gaps and an unrooted tree. The tree was drawn using NJ plot (Perrière and Gouy, 1996). Confidence in each node was assessed by 1,000 bootstrap replicates.
Knight et al. (1992) demonstrated that a major repetitive element (BGR2) in the genome of B. glabrata, present as a 1.95 Kb BamHI fragment, was homologous to the human L1-like retrotransposon (Fanning and Singer, 1987). Attempts at finding the corresponding transcript for BGR2 (indicative of its active transcription) by conventional Northern blots were unsuccessful. During gene discovery studies involving haemocytes (immune effector cells), an EST (GenBank Acc. No. AW740305) corresponding to this BamHI repetitive element was identified (Raghavan et al., 2003), offering evidence that this L1-related retrotransposon might be active in the snail. This 476 bp cDNA transcript, named Bg37, was fully sequenced and contained additional sequence extending the 5′ end beyond what was previously available in the original BGR2 sequence. The sequence of this transcript has been submitted to GenBank with the accession number EF413179.
Because of the significant match between the Bg37 transcript and BGR2 (97% sequence similarity) we used Southern hybridization to re-examine whether the transcribed version was organized as the same BamHI repetitive element (BGR2) (Knight et al., 1992) in the snail genome. DNA isolated from either resistant (BS-90) or susceptible (M-line) snails and from the parasite (S. mansoni) was digested (BamHI and PstI), blotted and probed as described in Materials and methods. Blots were probed with the [32P]-labeled 1.95 Kb genomic L1-like element BGR2 (Fig. 1A) and, after stripping, re-probed with the radiolabeled 476 bp nimbus (BgI) transcript Bg37 (Fig. 1B). As expected, results indicated that both probes recognize the same 2 Kb BamHI repetitive fragment within the snail genome as previously described (Knight et al., 1992). Likewise, both probes hybridized to the same repetitive ~3.5 Kb PstI restricted band in the genomes of either the resistant (BS-90) or susceptible (M-line) snails. No signal was detected with these probes against S. mansoni DNA, indicating the absence of this sequence in the parasite.
Nested GSPs based on the 476 bp haemocyte nimbus (BgI) transcript Bg37 were used to amplify a 272 bp fragment by RT-PCR using DNase -treated RNA prepared from parasite-resistant and -susceptible adult and juvenile snails (Fig. 2A lanes 2–5). Results showed that a transcript corresponding to this retrotransposon is actively expressed in both adult and juvenile snails. Myoglobin was used as the housekeeping gene (352 bp) to indicate that equivalent amounts of mRNA were used during the RT-PCR reaction (Fig. 2B lanes 2–5). Since the PCR reactions were performed to saturation, no quantitative basis could be attributed to age-related differential regulation of the L1 related Bg37 transcript. In addition, in both Figs. 2A and 2B lanes 7–10 reflect PCRs performed using Bg37 and myoglobin primers, respectively, on the four snail samples in the absence of RT to rule out DNA contamination. The lack of product from these samples (aside from primer dimers) indicates the absence of DNA contamination in the RNA samples. Lanes 6 and 11 in both Figs. 2A and 2B show the “no template” negative control to rule out any contamination from the PCR reagents.
Copy number analysis for the retrotransposon nimbus (BgI) was deduced by slot blot analysis compared with hybridization of the same blots with the single copy ferritin gene (Adema et al., 2006). Ten-fold serial dilutions of denatured and neutralized B. glabrata BS-90 DNA from 10 μg-1 ng were spotted. The blots were cut into strips before hybridizing individual strips with either the labeled, ferritin gene or nimbus (BgI) RT as probes (Figs. 3A and B, respectively). The probe for nimbus (BgI) detected a signal in as little as 10 ng of B. glabrata DNA, while the single copy ferritin probe detected a signal at the higher concentration (1 μg) of genomic DNA. Nimbus (BgI) was detected at a 100-fold lower concentration of genomic DNA compared with the single-copy ferritin gene. This result indicated that at least 100 copies (including partial copies) were present throughout the B. glabrata genome.
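The reasoning behind the copy number estimate can be written out as simple arithmetic: if a repeat is detected in 100-fold less genomic DNA than a single-copy gene, it is present in at least about 100 copies. The sketch below uses the detection limits reported above; it assumes comparable probe length and labeling efficiency for the two probes, which the text does not state, and the function name is hypothetical. The genome fraction uses the 5,869 bp full-length element size and 931 Mb genome size given elsewhere in the text and is an upper bound for 100 copies, since many copies are truncated.

```python
def min_copy_number(limit_single_copy_ng: float,
                    limit_repeat_ng: float,
                    single_copy_count: int = 1) -> float:
    """Lower-bound copy number from the ratio of the smallest DNA amounts
    giving a detectable signal with each probe."""
    return single_copy_count * limit_single_copy_ng / limit_repeat_ng


# Detection limits from the text: ferritin at 1 ug (1000 ng), nimbus at 10 ng.
copies = min_copy_number(limit_single_copy_ng=1000, limit_repeat_ng=10)

# Upper bound on the genome fraction occupied by that many copies,
# assuming (for illustration) that every copy is a full-length element.
ELEMENT_BP = 5_869
GENOME_BP = 931_000_000
genome_fraction = copies * ELEMENT_BP / GENOME_BP

if __name__ == "__main__":
    print(f"estimated minimum copy number: {copies:.0f}")
    print(f"genome fraction (upper bound): {genome_fraction:.3%}")
```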
Because these studies were conducted with laboratory maintained snails, our concern was that the nimbus (BgI) retrotransposon represents an MGE unique to these snails and not to those in their natural environment. We were, therefore, interested in analyzing the distribution of this element in both field isolates of the same species and snails of other genera from different geographical regions (Old and New World). In Fig. 4, the distribution of the nimbus (BgI) retrotransposon in the genomes of 11 different snails was examined by PCR using GSPs designed from the corresponding nimbus (BgI) cDNA Bg37. Of the Biomphalaria snails examined, all are successful hosts for the transmission of S. mansoni. Geographically, B. sudanica, B. alexandrina and B. pfeifferi (lanes 9–11) are of African (Old World) origin, whereas the others are neotropical (lanes 3–8). DNA samples from snails that transmit schistosomes of other species e.g. B. truncatus (host for Schistosoma haematobium, lane 12) and O. hupensis (host for Schistosoma japonicum, lane 13) were also included in the analysis, as was DNA from Planorbis (lane 8), an organism that represents a pulmonate gastropod that is not an intermediate host for human schistosomes. As shown in Fig. 4, the PCR product (272 bp) corresponding to the expected size of the nimbus (BgI) cDNA transcript (Bg37) product was amplified from DNA from all the snails examined. The uniform distribution of this retrotransposon in all these snails, including phylogenetically distant Pomatiopsidae Oncomelania and more recent Basommatophora snails (Biomphalaria and Bulinus), provides evidence (Dejong et al., 2003) of an ancient colonization of the genomes of these snails by this L1-like retrotransposon.
A genomic BAC library of B. glabrata BS-90 was screened either with the [32P]-labeled nimbus (BgI) transcript Bg37 (272 bp) or the original BGR2 RT region (572 bp) amplified RT-PCR products as probes. Of the 10 BAC clones that cross-hybridized with both probes, five were subjected to preliminary BAC sequencing. Of these, one clone named BRIBAC72bg_line5 was selected for complete sequencing and closure due to the presence of a large number of RT domains. The entire BAC insert (65,764 bp, GenBank Acc. No. EF418587, Fig. 5) contained one complete, full-length nimbus (BgI) element (element I), two full-length elements containing deletions and substitutions (elements II and III) and 10 truncated copies (elements A, B and C), all highlighted in dark grey. In addition, a truncated copy of an unrelated non-LTR retrotransposon CR1 (shown in black), one truncated copy of another non-LTR retrotransposon, RTE (in the reverse orientation, shown in black) and two truncated copies of the LTR retrotransposon gypsy (highlighted in black) were found in this clone. The full-length nimbus (BgI) elements (I, II and III) displayed the characteristic hallmarks of Class I non-LTR retrotransposons: ORF1 (encoding a nucleic acid binding protein) and ORF2 encoding the apurinic/apyrimidinic endonuclease (APE), RT and RNase H (RNH) domains. While element I was complete, elements II and III showed multiple deletions in ORFs 1 and 2. The other ten truncated copies of nimbus (BgI) in the BAC clone contained no ORF1 and only a partial ORF2.
The nucleotide and amino acid sequences of the full-length nimbus (BgI) retrotransposon (GenBank Acc. No. EF413180) are shown in Supplementary Fig. 1A. BLASTX (Altschul et al., 1990) analysis of the coding regions (ORF1 and 2) showed significant sequence similarities (40–50%) to other non-LTR retrotransposons from Aedes aegypti (Tu and Hill, 1999), the fruit fly Drosophila melanogaster (Berezikov et al., 2000) and the sea squirt Ciona intestinalis (Permanyer et al., 2003), with E values of > e−20.
Analysis of ORF1 in the 5,869 bp complete nimbus (BgI) retrotransposon (Fig. 5, element I and Supplementary Fig. 1A) revealed the presence of zinc finger domains. ORF1, also known as the nucleic acid binding domain (NABD, defined by horizontal arrows labeled nucleocapsid), had three zinc finger domains of the CCHC type mainly found in the NABD of retroviruses. Two of the nimbus (BgI) NABD domains had the invariant sequence type CX2CX4HX4C and the other, the variant sequence CX2CX4HX6C (Supplementary Fig. 1A, underlined amino acids). Putative promoter sequences TATA and CAAT (Supplementary Fig. 1A, underlined in bold italics) were present at nucleotide positions −112 and −164, respectively, upstream of the start codon for ORF1. ORF1 was separated from ORF2 by a short stretch of 20 nucleotides. ORF2 encoded the APE (domains defined by horizontal arrows labeled endonuclease), RT (domains defined by horizontal arrows labeled reverse transcriptase) and RNH (domains defined by horizontal arrows labeled RNase H). The non-coding regions (NCRs) comprised a 5′ NCR (565 bp) upstream of the start of ORF1 and a 3′ NCR (234 bp) downstream of the end of ORF2.
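The two zinc finger spacings described above (CX2CX4HX4C and CX2CX4HX6C) translate directly into a regular expression over a protein sequence. The sketch below is a minimal illustration of such a motif scan; the function name is hypothetical and the peptide in the usage example is synthetic, made up purely to exercise the pattern, and is not the nimbus (BgI) ORF1 sequence.

```python
import re

# C-X2-C-X4-H-X4-C (invariant) or C-X2-C-X4-H-X6-C (variant),
# where X is any amino acid.
CCHC_PATTERN = re.compile(r"C.{2}C.{4}H(?:.{4}|.{6})C")

# Total motif length identifies which spacing was matched.
MOTIF_BY_LENGTH = {14: "CX2CX4HX4C", 16: "CX2CX4HX6C"}


def find_cchc_motifs(protein: str):
    """Return (0-based start, matched residues, motif type) for each
    non-overlapping CCHC zinc finger match in a protein sequence."""
    hits = []
    for match in CCHC_PATTERN.finditer(protein.upper()):
        residues = match.group(0)
        hits.append((match.start(), residues, MOTIF_BY_LENGTH[len(residues)]))
    return hits


if __name__ == "__main__":
    # Synthetic peptide (hypothetical): one invariant and one variant motif.
    peptide = "MK" + "CAACAAAAHAAAAC" + "GG" + "CAACAAAAHAAAAAAC"
    for start, residues, kind in find_cchc_motifs(peptide):
        print(start, residues, kind)
```

A pattern of this kind only flags candidate motifs by residue spacing; it says nothing about whether the surrounding sequence folds into a functional zinc finger.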
The 5′ and 3′ NCRs of all nimbus (BgI) elements were complex (Supplementary Fig. 1B). The 5′ NCR contains a highlighted 16 bp repeat element (RE) at the 5′ end (5′RE: 5′-TTTAAATTCTCACTAC-3′) that has a direct counterpart (3′RE: 5′-TTTAAATTTTTACTAC-3′) in the 3′ NCR. While we have been unable to identify a function for these REs, they are intriguing since they differ from each other only by the substitution of two “T” residues in the 3′RE for “C” residues in the 5′RE and are invariant in their positioning relative to the nimbus (BgI) element. The 3′ end of the NCR is characterized by the presence of variable tandem repeats (TCAAn or CAAn). These features are different from the fruit fly (Fawcett et al., 1986) or mosquito (Tu and Hill, 1999) I elements that have either poly (A) or “TAA” tandem repeats. These nimbus (BgI) variable tandem repeats are AT rich (75%), and define the junction of the 3′ target site duplication (TSD) at the end of each element in cis.
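The relationship between the two 16 bp repeat elements can be checked by a position-by-position comparison. The sketch below uses the 5′RE and 3′RE sequences and the TCAA repeat unit given in the text; the function names are hypothetical and introduced only for illustration.

```python
RE_5PRIME = "TTTAAATTCTCACTAC"  # 5'RE, from the text
RE_3PRIME = "TTTAAATTTTTACTAC"  # 3'RE, from the text


def mismatches(a: str, b: str):
    """Return (1-based position, base in a, base in b) for every position
    at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


if __name__ == "__main__":
    for pos, base5, base3 in mismatches(RE_5PRIME, RE_3PRIME):
        print(f"position {pos}: {base5} in 5'RE, {base3} in 3'RE")
    print(f"AT content of the TCAA repeat unit: {at_fraction('TCAA'):.0%}")
```

The comparison shows the two REs differing at positions 9 and 11, each a C in the 5′RE replaced by a T in the 3′RE, and the TCAA repeat unit gives the 75% AT content quoted for the tandem repeats.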
Putative TSDs define the 5′ and 3′ ends of the full-length nimbus (BgI) elements. The TSDs also vary in sequence and size (Supplementary Fig. 1B) in the three full-length elements (Element I TSD: 5′-ACCATTCGACCC-3′; Element II TSD: 5′-ACATG-3′ and Element III TSD: 5′-TGACAGAGA-3′). Multiple insertions of the truncated nimbus (BgI) elements (Fig. 5, regions A, B and C) in close juxtaposition in the BAC clone made the identification of the TSDs on either ends of these elements difficult (Supplementary Fig. 1B). However, both the 5′ and 3′ ends of these truncated elements resemble the 3′ NCR of the full-length elements.
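A target site duplication appears as a short direct repeat immediately flanking both ends of an inserted element, so a simple way to look for one is to find the longest stretch at the end of the upstream flank that is repeated at the start of the downstream flank. The sketch below illustrates that idea with exact matching only; it is not the procedure used in this study, the function name is hypothetical, and the flanking sequences in the example are synthetic apart from the element I TSD taken from the text.

```python
def longest_flanking_repeat(upstream: str, downstream: str, min_len: int = 4) -> str:
    """Longest sequence that ends the upstream flank and also begins the
    downstream flank (exact match). Returns '' if none reaches min_len."""
    upstream, downstream = upstream.upper(), downstream.upper()
    for k in range(min(len(upstream), len(downstream)), min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# TSDs reported in the text for the three full-length elements.
REPORTED_TSDS = {
    "Element I": "ACCATTCGACCC",
    "Element II": "ACATG",
    "Element III": "TGACAGAGA",
}

if __name__ == "__main__":
    # Synthetic flanks (hypothetical) built around the element I TSD.
    upstream_flank = "GGTT" + REPORTED_TSDS["Element I"]
    downstream_flank = REPORTED_TSDS["Element I"] + "AAGT"
    print(longest_flanking_repeat(upstream_flank, downstream_flank))
    for name, tsd in REPORTED_TSDS.items():
        print(f"{name}: {len(tsd)} nt")
```

Exact matching of this kind fails when nearby insertions overlap or the repeats have diverged, which is consistent with the difficulty noted above for the closely juxtaposed truncated elements.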
A schematic diagram of a representative retrotransposable element belonging to the I-clade is shown in Fig. 6A. Phylogenetic analysis was performed using multiple sequence alignments of the RT domain (conserved regions 0–5) comprising 440 amino acids of the nimbus (BgI) element and 88 other known elements of the 14 non-LTR clades. As shown in Fig. 6B, all clades were supported with significant bootstrap values (> 50%) using the neighbour-joining method used by the program NJ plot (Perrière and Gouy, 1996). Based on this analysis we found that nimbus (BgI) belongs to clade I (Malik et al., 1999), along with the CiI element of C. intestinalis (Permanyer et al., 2003), mosquI from Aedes aegypti (Tu and Hill, 1999), I factor from two different species of Drosophila (Fawcett et al., 1986), and the trypanosome L1 and INGI elements (Kimmel et al., 1987; Martin et al., 1995). Since this is an unrooted tree, the relationship between the major groups cannot be assigned definitively.
MGEs, or transposable elements, are major components of eukaryotic genomes but are poorly understood (Craig, 2002). They are one of the principal driving forces in eukaryotic evolution (Charlesworth et al., 1994) and can disrupt genes, induce genomic rearrangements, influence gene expression and mobilize various types of non-autonomous sequences. While non-LTRs had been identified in a large number of metazoans, the lack of a large pool of sequences has hampered the identification of MGEs in mollusks. This paucity will be addressed by the sequencing of the 931 Mb snail genome (Gregory 2003) which is underway (Raghavan and Knight, 2006) and should provide us with more information on the genomic organization, copy number and distribution of this and other MGEs in the snail genome.
It was serendipitous that an L1-like element, BGR2, with an ORF encoding RT and showing similarity to human L1 (Scott et al., 1987) and Drosophila I factor (Fawcett et al., 1986), was discovered in the snail B. glabrata while genomic DNA was being analyzed in restriction fragment length polymorphism (RFLP) studies (Knight et al., 1992). With the availability of snail ESTs and BAC libraries, more of these non-LTR sequences were identified and a functional but variable RT enzyme activity was also demonstrated in the snail (Raghavan et al., 2003). Using the B. glabrata EST transcript (Bg37) and the original genomic BGR2 as probes, we have now identified from a B. glabrata (BS-90) BAC library the corresponding contiguous snail non-LTR retrotransposon that we have named nimbus (BgI) to denote its potential for mobility in the snail genome. Since nimbus clusters with I factor (group V) of non-LTRs (Finnegan, 1989; Busseau et al., 1994; Berezikov et al., 2000), the designation BgI has also been included in the naming of the snail element to signify which clade it belongs to (clade I).
Southern hybridization results (Fig. 1) showing that both genomic (BGR2) and transcript (Bg37) probes hybridize to the same discrete fragments (2.0 Kb BamHI and 3.5 Kb PstI) demonstrate the conserved organization of both active and inactive elements in the genome. Copy number analysis of the nimbus (BgI) element indicated the presence of ~100 copies distributed throughout the B. glabrata genome. While the copy number of nimbus (BgI) is lower than expected compared with other non-LTR elements, variation between copy numbers of non-LTR elements belonging to the various clades has been reported (Permanyer et al., 2006). The mosquI element from A. aegypti, to which nimbus (BgI) is related, is present in only 14 copies.
Nucleotide sequence analysis of the entire insert (65,764 bp) of a BAC clone that satisfied the screening strategy we adopted revealed the presence of several non-LTR sequences in this single clone. The BS-90 BAC clone contained three full-length nimbus (BgI) elements (Fig. 5, elements I, II and III) and 10 truncated copies (Fig. 5, elements A, B and C). Truncated copies of other non-LTR retrotransposons (CR1 and RTE) and the LTR retrotransposon (gypsy) were also present in the BAC sequence. It is therefore possible that this clone represents a transposition hot spot in the B. glabrata genome, a phenomenon that has been observed with the mosquito mosquI element (Tu and Hill, 1999). Transposition activity of non-LTR retrotransposons frequently generates 5′ truncated copies that are unable to transpose because they lack the coding and promoter sequences that are essential for transposition (Luan et al. 1993) and consequently evolve as pseudogenes.
Phylogenetic relationships among the various non-LTR retrotransposons can only be assigned via the RT domain common to all the elements, since this domain is absolutely required to achieve active transposition. The RT domain was used to establish the phylogeny of nimbus (BgI) and the 14 reported clades of the non-LTR retrotransposable elements. In the neighbour-joining tree (Fig. 6B), while most of the clades were supported with significant bootstrap values (≥50%), clade I showed the lowest bootstrap value (70%), in agreement with previous analyses (Malik et al., 1999; Tu and Hill, 1999; Permanyer et al., 2003, 2006). However, we believe our results are the first reported from a gastropod for the full-length sequence of a non-LTR retrotransposon that may be functional, and they may provide some insight into the evolutionary timeline of the colonization of invertebrate genomes by these MGEs.
The full-length nimbus (BgI) elements contained two ORFs, ORF1 and ORF2. ORF1 revealed the presence of three zinc finger motifs of the CCHC type that are characteristic of this domain (Dawson et al., 1997; Apweiler et al., 2001). This region, usually composed of 14 amino acids and resembling the ‘zinc fingers’ of transcription factors, is the only highly conserved sequence element among the retroviral proteins (Katz and Jentoft, 1989) and is required for viral genome packaging and the early infection process. ORF2 codes for an APE, RT and RNH (Fawcett et al., 1986; Feng et al., 1998). Like other active non-LTR retrotransposons, the two ORFs of nimbus (BgI) are flanked by 5′ and 3′ NCRs and a direct repeat indicating possible TSD sites. While these sites are themselves variable in both size (six to 12 nucleotides) and composition (see Results section 3.5), they are present within a larger sequence of 150 bp or more (Finnegan, 1997). The variable TSDs may be generated during the asymmetric cleavage of the target DNA sequence by the nimbus (BgI) endonuclease (APE). While most non-LTR retrotransposons are terminated at the 3′ end either by an A-rich sequence or by tandem TAAn repeats, as in the fruit fly and mosquito I elements, we found variable tandem repeats (TCAAn or CAAn) immediately preceding the 3′ TSD in both the truncated and full-length copies of nimbus (BgI). Several features of this snail element are unique among known elements of the I-clade. These include the complex nature of the highly conserved 5′ and 3′ NCRs, the different types of TSDs, the invariant 5′ and 3′ 16 bp REs, and the variable tandem repeats at the 3′ ends. These novel features warrant further investigation. Intriguingly, the nimbus 5′ and 3′ NCRs were found to be highly conserved in the genome of S. mansoni when screened against the S. mansoni whole genome shotgun database available at either the Sanger Center (using BLASTN) or NCBI (using discontiguous Mega BLAST). Significant nucleotide matches (approximately 80% identity, data not shown) between the nimbus 5′ and 3′ NCRs and parasite sequences might indicate horizontal transfer of this sequence between the snail host and its parasite. Although several MGEs have been characterized from the schistosomes (S. mansoni and S. japonicum; Brindley et al., 2003), none has so far been identified as belonging to the I-clade. In conclusion, nimbus (BgI) is a complete, mobile element of the freshwater snail B. glabrata and the first full-length MGE from a mollusk, a discovery that may provide insights into the snail/parasite relationship and, at a future date, be useful for genetic transformation studies aimed at reducing snail-borne diseases.
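The two 3′-end features discussed above, the variable tandem repeats (TCAAn or CAAn) and the flanking TSDs of six to 12 nucleotides, lend themselves to a simple sequence scan. The following is only an illustrative sketch in Python, not the procedure used in this study. The function names, the minimum of two repeat units, and the requirement that the TSD copies directly abut the element on both sides are all assumptions made for the example.

```python
import re

# Assumption: a "tandem repeat" means at least two consecutive TCAA or CAA
# units at the very 3' end of the element. The paper gives no minimum count.
TANDEM_RE = re.compile(r"((?:TCAA){2,}|(?:CAA){2,})$")


def trailing_tandem_repeat(element_seq):
    """Return the TCAA or CAA repeat run that ends the sequence, or ''."""
    match = TANDEM_RE.search(element_seq.upper())
    return match.group(1) if match else ""


def find_tsd(upstream, downstream, min_len=6, max_len=12):
    """Return the longest direct repeat shared by the two flanks, or None.

    upstream   -- genomic sequence immediately 5' of the element
    downstream -- genomic sequence immediately 3' of the element
    Assumption: the 5' TSD copy ends exactly where the element begins and
    the 3' TSD copy starts exactly where the element ends.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the maximal repeat is reported.
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return None


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(trailing_tandem_repeat("ggatcaatcaatcaa"))
    print(find_tsd("ACGTTTGACCGATA", "GACCGATAGGGCC"))
```

Real elements would need a more tolerant search, since the sketch ignores mismatches within the repeat and any offset between the TSD and the element boundary.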
A) The 5,869 bp sequence of the complete nimbus (BgI) element (GenBank EF413180). Capital letters denote coding sequences and small letters, the non-coding sequences. Position 1 in the numbering scheme is the first nucleotide of the start codon ATG (indicated by bold text) that begins the open reading frame 1 (ORF1; nucleic acid binding protein) coding region. A short, non-coding region of 20 bp separates ORF1 and ORF2. The ORF2 sequence encodes the apurinic/apyrimidinic endonuclease (APE), reverse transcriptase (RT) and RNase H (RNH), and horizontal arrows delineate these three hallmark domains. The amino acids of the various ORFs are shown in single letter codes and also labeled from 1-461 for ORF1 and 1-1,222 for ORF2. The three zinc finger regions composed of the sequence CX2CX4HX4C and the variant CX2CX4HX6C are underlined. The nimbus (BgI) coding region is flanked at either end by the 5′ and 3′ non-coding regions (NCRs) that are represented in small letters. The 3′ NCR consists of a conserved sequence containing a 16 bp repeat element (RE; bold, underlined region and Supplementary Fig. 1B), at the end of which is a stretch of “TCAAn or CAAn” variable tandem repeats (highlighted in bold) that occur prior to the 3′ TSD (capital letters, boxed region) and define the end of one element and the beginning of the next sequence. The 5′ NCR also begins with a 5′ target site duplication (TSD) (capital letters, boxed region), and contains a 16 bp 5′ RE (bold, underlined region and Supplementary Fig. 1B) and putative promoter sequences (CAAT and TATA, italics) upstream of the initiation codon of ORF1. B) Multiple alignment of the 5′ non-coding regions (NCRs) and 3′ NCRs of the nimbus (BgI) elements. The top panel shows the alignment of the 3′ NCRs of all nimbus (BgI) elements in the order they are arranged in the bacterial artificial chromosome (BAC) clone. 
The full-length elements are labeled I, II and III and the truncated transposition cassettes as A, B and C (the individual elements within each cassette are designated A1, A2 and A3; B1 and B2; C1, C2, C3, C4 and C5). The 3′ NCR panel is defined by the name of the element, the beginning bp position, the 3′ NCR conserved domains (shown in red and blue), the 16 bp 3′ repeat element (RE) (shown in green), the variable tandem repeat (TCAAn or CAAn) region (shown in orange), the start of the next region, which may or may not contain the target site duplications (TSDs) (the 3′ TSDs, shown in black, are identifiable in full-length elements only), and the end bp position. The bottom panel shows the alignment of the three 5′ NCRs of the full-length nimbus (BgI) elements in the sequential order in which they appear 5′-3′ in the BAC clone. The full-length elements are labeled as I, II and III. The 5′ NCR panel is defined by the name of the element, the beginning bp position, the end of the previous region that contains the 5′ TSDs (black), the 5′ NCR conserved domains (blue and red) containing the 16 bp 5′ RE (shown in green), and the end bp position.
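The zinc finger spacings given in panel A, CX2CX4HX4C and the variant CX2CX4HX6C, can be written directly as a pattern for scanning a translated ORF. The Python sketch below is offered as an illustration under stated assumptions and is not the method used to annotate the figure. The function name is hypothetical, X is taken to mean any single residue, and the test peptide is invented.

```python
import re

# CX2CX4HX4C and the variant CX2CX4HX6C, with X = any residue (assumption).
# The lookahead lets the scan report every start position, even if two
# candidate motifs were to overlap.
CCHC_RE = re.compile(r"(?=(C[A-Z]{2}C[A-Z]{4}H(?:[A-Z]{4}|[A-Z]{6})C))")


def find_cchc_fingers(protein_seq):
    """Return (1-based start, motif) pairs for CCHC-type zinc finger motifs."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in CCHC_RE.finditer(seq)]


if __name__ == "__main__":
    # Invented peptide carrying one canonical and one variant spacing.
    toy_orf1 = "MK" + "CYNCGKPGHIARDC" + "PLL" + "CFKCGQEGHWAKNRSC" + "A"
    for start, motif in find_cchc_fingers(toy_orf1):
        print(start, motif, len(motif))
```

The canonical spacing yields a 14-residue motif, which agrees with the length quoted for this region in the Discussion, while the variant spacing yields 16 residues.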
We wish to acknowledge the Biowulf PC/Linux cluster at the NIH, Bethesda, MD (http://biowulf.nih.gov), used for the sequence analysis, and Dr. Peter FitzGerald for his help and support with the bioinformatics undertaken for this work. We also thank Maria Ragland for her help in the BAC library construction, Fred Lewis for his support, and Mitzi Sereno for her help in editing this manuscript. Thanks go to the following for sending us the field-isolated snails used in this study: Drs. C.P. de Souza, D. Minchella and E.S. Loker, and to Dr. H. Shizuya for sending us the pBeloBAC11 vector. This work was funded by USF-Sandler Foundation Grant 9000005939 and NIH-NIAID Grant AI-063480-01.
Note: Nucleotide sequence data reported in this paper have been submitted to GenBank under the following accession numbers: EF413179 (nimbus cDNA Bg37), EF413180 (nimbus BgI element) and EF418587 (complete sequence of the BAC clone BRIBAC72bg_line5).