The chloroplasts of Euglena gracilis, which are bounded by three membranes, arose via secondary endosymbiosis of a green alga in a heterotrophic euglenozoan host. Many genes were transferred from the symbiont to the host nucleus. A subset of Euglena nuclear genes, predominantly of symbiont but also of host or other origin, have acquired complex presequences required for chloroplast targeting. This study revealed short introns (41–93 bp), located either in the second half of the presequence-encoding regions or shortly downstream of them, in nine nucleus-encoded E. gracilis genes for chloroplast proteins (Eno29, GapA, PetA, PetF, PetJ, PsaF, PsbM, PsbO, and PsbW). In addition, the E. gracilis Pbgd gene contains two introns in the second half of the presequence-encoding region and one at the border between the presequence- and mature-peptide-encoding regions. Ten of the 12 introns identified in this study within or shortly downstream of presequence-encoding regions have typical eukaryotic GT/AG borders, are T-rich and 45–50 bp long, and share 27–61% pairwise sequence identity. Thus, single recombination events might have been mediated by these cis-spliced introns. Double crossing over between these cis-spliced introns and the trans-spliced introns present in the 5′-UTRs of Euglena nuclear genes is also likely to have occurred. Introns and exon-shuffling could therefore have played an important role in the acquisition of chloroplast-targeting signals in E. gracilis. The results are consistent with a late origin of photosynthetic euglenids.
Euglena gracilis belongs to the order Euglenida, the protist phylum Euglenozoa, and the eukaryotic supergroup Excavata. The phylum Euglenozoa also includes the orders Kinetoplastida (comprising the suborders Trypanosomatina and Bodonina) and Diplonemida. The monophyly of Euglenozoa has been suggested on the basis of several shared morphological features, e.g. discoidal mitochondrial cristae and a characteristic feeding apparatus,1,2 and of molecular phylogenies.3 Moreover, Euglenozoa share the modified base ‘J’ in their nuclear DNA.4 There is little evidence for signalling pathways that regulate nuclear gene expression at the transcriptional level.5,6 The addition of non-coding, capped spliced leaders to nuclear pre-mRNAs via trans-splicing is also common among Euglenozoa.7–12
Euglena gracilis and other phototrophic euglenids possess chloroplasts surrounded by three membranes.13 These arose by a secondary endosymbiotic event in which a euglenozoan host engulfed a green alga.14–16 Chlorarachniophytes (belonging to the supergroup Rhizaria) possess complex green plastids with four envelope membranes and a nucleomorph, obtained via an independent secondary symbiosis.17 Whereas euglenid plastids descended from a prasinophyte, chlorarachniophyte plastids most likely descended from an ulvophyte green algal endosymbiont.18
Many Euglena nuclear genes, mostly of symbiont origin (i.e. resulting from endosymbiotic gene transfer from the nucleus of the primary host cell to the nucleus of the secondary host cell) but also of host or other origin, have acquired presequences for chloroplast targeting. Most presequences required for chloroplast import in Euglena are tripartite, comprising, in order, an N-terminal signal peptide for targeting to the ER, an S/T-rich region resembling the transit peptides of organisms with primary plastids, and a stop-transfer sequence serving as a membrane anchor (class I proteins, which also include the thylakoid-lumen-targeted class IB proteins that possess an additional hydrophobic thylakoid transfer domain).19–22 Consequently, the major part of the protein precursor stays ‘outside’ while passing through the ER, the Golgi apparatus, and membrane vesicles prior to their fusion with the outermost chloroplast membrane.19–21 A recent in-depth analysis of E. gracilis presequences revealed another group, the class II nucleus-encoded plastid protein precursors.22 These lack the putative stop-transfer sequence and possess only an N-terminal signal sequence followed by a transit-peptide-like sequence.22
The complete sequence of the E. gracilis chloroplast genome revealed an unusually high number of introns: group II and group III introns, and even twintrons (introns within introns).23 However, little is known about introns in the nuclear genes of euglenids, as only a few genomic sequences from euglenids are available. Introns in the E. gracilis Lhcbm1 gene (according to the nomenclature of Koziol and Durnford,24 encoding the light-harvesting chlorophyll a/b binding protein of photosystem II), the RbcS genes (encoding the small subunit of RuBisCO), and GapC (encoding cytosolic glyceraldehyde-3-phosphate dehydrogenase) lack consensus splicing borders (5′-GT/AG-3′) and the structural characteristics of group I and II introns, and many of them are flanked by short direct repeats.25–27 These introns can form secondary structures that could potentially bring their 5′- and 3′-ends together, probably without the involvement of spliceosomes.25–28 However, E. gracilis also contains canonical introns, e.g. the 16 introns of the TubC genes (two gene copies encoding gamma-tubulin)28 or the introns in the fibrillarin gene.29,30 The 5′-ends of these introns can potentially base-pair with U1 snRNA, suggesting that they are excised in a spliceosome-dependent manner.29 Introns with GT/AG borders are also present in the beta-tubulin gene of the non-photosynthetic euglenoid flagellate Entosiphon sulcatum.9 Furthermore, the introns in the E. gracilis TubA and TubB genes (encoding alpha- and beta-tubulin, respectively) are of both conventional and non-conventional types.28
Recombination events and exon-shuffling have been discussed by several authors as possible mechanisms for the addition of sequences encoding transit peptides (mitochondrial targeting signals) to nuclear genes for mitochondrial proteins.31–34 Analogously, sequences encoding stroma-targeting peptides might have been added to nuclear genes for chloroplast proteins in organisms possessing primary chloroplasts of cyanobacterial origin (Archaeplastida). Such exon-shuffling could occur via intron-mediated recombination. However, identifying the introns originally involved in exon-shuffling is problematic both for nuclear genes encoding mitochondrial proteins and for nuclear genes encoding proteins targeted to primary chloroplasts. Mitochondria arose via an alpha-proteobacterial endosymbiosis that perhaps dates back to the origin of eukaryotes,35,36 and the cyanobacterial ancestry of primary plastids dates back to the origin of the Archaeplastida.37,38 Since then, many intron integration and excision events have occurred in various lineages,39,40 making it almost impossible to identify the introns that were ancestrally involved in the acquisition of transit peptides. By contrast, secondary chloroplasts are the result of relatively recent endosymbioses of red and green algae in eukaryotic hosts (for reviews, see refs 41–45).
It has been suggested that recombination might have led to the addition of presequences (or at least parts of them) to nuclear genes for chloroplast proteins in organisms possessing secondary plastids.46,47 Perhaps the best evidence so far for intron-mediated recombination in the acquisition of presequences or their parts came from the study of Kilian and Kroth,48 which revealed a single intron either within the presequence region or shortly downstream of it in seven nucleus-encoded genes for plastid proteins (AtpC, FbaC1, PetJ, PsbM, PsbO, PsbU and Tpt1) in the diatom Phaeodactylum tricornutum, which possesses four-membrane-bounded plastids of red algal origin. In this study, we extended this hypothesis to the flagellate E. gracilis, which possesses secondary chloroplasts of green algal origin.
Euglena gracilis (Pringsheim strain Z, SAG 1224–5/25 Collection of Algae, Göttingen, Germany) was cultivated in 100 ml Erlenmeyer flasks containing 50 ml of a modified Cramer and Myers medium49 supplemented with ethanol (0.8%) and adjusted to pH 6.9. The medium was inoculated with 5 × 10⁴ cells per ml. Cells were grown at 27°C under continuous illumination (30 μmol photons m⁻² s⁻¹). Cultures in the exponential growth phase were used for DNA isolation.
Genomic DNA was isolated as described in section 2.3.1 (Preparation of Genomic DNA from Plant Tissue) of Current Protocols in Molecular Biology,50 with the following modifications: cells were harvested by centrifugation at 1000 × g (3 min), washed twice with ice-cold ddH2O, and resuspended in buffer (100 mM Tris–Cl, pH 8; 100 mM EDTA, pH 8; 250 mM NaCl) containing 8 μl of proteinase K (Merck, 20 mg/ml) per 1 ml of buffer. N-lauroylsarcosine (20%; Sigma) was added, and the mixture was incubated in a water bath at 55°C for 1 h. After extraction, centrifugation (6000 × g, 30 min, 4°C), DNA precipitation (2-propanol), centrifugation (7500 × g, 15 min, 4°C) and solubilization (TE buffer, pH 8), RNA was removed (RNase A, 15 min). Thereafter, phenol:chloroform (1:1) and chloroform:isoamyl alcohol (24:1) extractions were performed, each followed by centrifugation (7500 × g, 7 min). One-tenth volume of 3 M sodium acetate (pH 5.2) was added to the upper (aqueous) phase, and DNA was precipitated with 96% ethanol at −20°C, centrifuged (8000 × g, 15 min, 4°C) and washed with 70% ethanol. DNA was resuspended in TE buffer (pH 8).
Primers were derived from six E. gracilis nuclear mRNA sequences encoding chloroplast proteins. Table 1 lists the accession numbers of these mRNAs (see refs 19, 26, 51–54) and the positions of the primer sequences. Four further primer pairs were derived from four E. gracilis nuclear EST sequences (see ref. 22) encoding chloroplast proteins: PetF (ferredoxin), the PsaF subunit of photosystem I, and the PsbM and PsbW subunits of photosystem II. All four ESTs carried the spliced-leader (SL) sequence (TTTTTTTCG) at the 5′-end and were used in a previous analysis of E. gracilis presequences.22 Table 2 lists the e-values and accession numbers of the ESTs used for primer design, and the positions of the primer sequences in these ESTs.
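Before designing primers from an SL-bearing EST, the gene-specific part has to be separated from the spliced leader. The sketch below illustrates one way to do this: it looks for the SL sequence given above near the 5′-end and returns the downstream remainder. The function name and the 40 nt search window are hypothetical choices for illustration, not part of the published workflow.

```python
SL_MOTIF = "TTTTTTTCG"  # spliced-leader sequence as given in the text


def strip_spliced_leader(est: str, window: int = 40) -> tuple[bool, str]:
    """Look for SL_MOTIF within the first `window` nt of an EST (window is an assumption).

    Returns (found, remainder): remainder is the sequence downstream of the motif,
    or the untouched EST if the motif does not occur inside the window.
    """
    pos = est.upper().find(SL_MOTIF, 0, window)  # search only est[0:window]
    if pos == -1:
        return False, est
    return True, est[pos + len(SL_MOTIF):]
```

For example, `strip_spliced_leader("AAC" + SL_MOTIF + "ATGAAG")` returns `(True, "ATGAAG")`. Restricting the search to a 5′ window avoids trimming at an internal T-rich stretch that happens to match the motif.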
Primers were designed with Primer-BLAST (Primer3 and BLAST) to have similar melting temperatures (60°C). Where possible, primers were placed so as to amplify the whole presequence-encoding region and a short stretch downstream of it (or, where our stringent primer-design criteria did not allow this, as much of this region as possible).
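The placement criterion above can be checked for a candidate primer pair by comparing the amplicon with the presequence-encoding region. The sketch below is a hypothetical helper, not a Primer-BLAST feature: it assumes 1-based, inclusive mRNA/EST coordinates, a known start-codon position, and a predicted presequence length in amino acids. The actual primer positions are those in Tables 1 and 2; the numbers in the example are invented.

```python
def amplicon_coverage(cds_start: int, presequence_aa: int,
                      fwd_start: int, rev_end: int) -> tuple[float, int]:
    """Fraction of the presequence-encoding region inside an amplicon, and nt amplified downstream.

    Coordinates are 1-based and inclusive on the mRNA/EST; fwd_start is the 5'-most base of the
    forward primer site and rev_end the 3'-most base of the reverse primer site.
    """
    pre_first = cds_start                          # first nt of the start codon
    pre_last = cds_start + 3 * presequence_aa - 1  # last nt of the presequence-encoding region
    overlap = max(0, min(pre_last, rev_end) - max(pre_first, fwd_start) + 1)
    downstream = max(0, rev_end - pre_last)
    return overlap / (3 * presequence_aa), downstream
```

With an invented start codon at nt 40 and a 127 aa presequence, a primer pair spanning nt 20 to 520 gives `(1.0, 100)`: the whole presequence-encoding region plus 100 nt downstream.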
PCRs were performed in a 50 µl reaction volume with final concentrations of 2 mM Mg2+, 0.2 µM primers and 0.5 mM dNTPs. One hundred nanograms of total E. gracilis DNA and 2.5 U of Taq DNA polymerase (Invitrogen) were used per reaction. Samples were denatured for 5 min at 94°C and subjected to 34 cycles of 30 s denaturation at 94°C, 1 min annealing at 58°C and 2 min extension at 72°C, followed by a final extension of 8 min at 72°C. PCR products were visualized on 1.5% agarose gels (TAE), purified using the QIAquick PCR Purification Kit (Qiagen), and sequenced twice (with the forward and the reverse primer) on an ABI 3130xl Genetic Analyzer (Applied Biosystems) using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), according to the suppliers’ protocols. PCR products were sequenced by the Department of Molecular Biology (Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia).
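The final concentrations above translate into pipetting volumes via C1·V1 = C2·V2. The sketch below shows this arithmetic for one 50 µl reaction. The stock concentrations (10× buffer, 50 mM MgCl2, 10 µM primers, 10 mM dNTPs, 5 U/µl Taq, 100 ng/µl template) are hypothetical examples, and reading “0.2 µM primers” as 0.2 µM of each primer is an assumption.

```python
def stock_volume(final: float, stock: float, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for the stock volume V1 (final and stock in the same units)."""
    return final * total_ul / stock


# (final concentration from the text, hypothetical stock concentration)
REACTION = {
    "PCR buffer (x)": (1.0, 10.0),        # buffer is assumed; not mentioned in the text
    "MgCl2 (mM)": (2.0, 50.0),
    "forward primer (uM)": (0.2, 10.0),   # assumes 0.2 uM of each primer
    "reverse primer (uM)": (0.2, 10.0),
    "dNTPs (mM)": (0.5, 10.0),
}


def pcr_volumes(total_ul: float = 50.0, dna_ng: float = 100.0, dna_ng_per_ul: float = 100.0,
                taq_u: float = 2.5, taq_u_per_ul: float = 5.0) -> dict[str, float]:
    """Pipetting volumes (ul) for one reaction; water makes up the remainder."""
    vols = {name: stock_volume(final, stock, total_ul) for name, (final, stock) in REACTION.items()}
    vols["template DNA"] = dna_ng / dna_ng_per_ul
    vols["Taq DNA polymerase"] = taq_u / taq_u_per_ul
    vols["water"] = total_ul - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("stocks too dilute for this reaction volume")
    return vols
```

With these example stocks the non-water components add up to 13 µl, leaving 37 µl of water per reaction; different stocks change only the inputs, not the calculation.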
Sequence data were analyzed using Chromas, BLAST and CLUSTAL W. Sequence identity between introns was computed by global alignment (the Needle tool of the EMBOSS suite with default settings).55 Because the unusual nucleotide composition of the introns may inflate alignment scores, the statistical significance of each alignment score was assessed by a permutation test. For each pair of introns, 100 000 random permutations of their bases were aligned, and the empirical distribution of scores was computed. Sequences were permuted with Shuffleseq from the EMBOSS suite,55 keeping the consensus splice sites (GT/AG) at their original positions in each permutation.
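The permutation test described above can be sketched as follows. This is a self-contained stand-in for the EMBOSS tools, not a reimplementation of them: it uses a plain Needleman–Wunsch alignment with an assumed linear gap penalty instead of Needle’s default EDNAFULL matrix with affine gaps, and Python’s `random.Random.shuffle` instead of Shuffleseq. As in the text, both introns are shuffled and the GT/AG dinucleotides stay in place; the p-value uses the common (hits + 1)/(n + 1) correction, which is an assumption.

```python
import random

# Assumed scores for illustration; EMBOSS Needle defaults are EDNAFULL with affine gaps
# (open 10, extend 0.5), which this sketch does not reproduce.
MATCH, MISMATCH, GAP = 5, -4, -8


def _pair(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needle(a: str, b: str) -> tuple[int, float]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, identity), where identity = identical columns / alignment length.
    """
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]),
                          S[i - 1][j] + GAP, S[i][j - 1] + GAP)
    # Trace back one optimal path to count identical columns and the alignment length.
    i, j, same, length = n, m, 0, 0
    while i > 0 or j > 0:
        length += 1
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return S[n][m], same / length


def shuffle_keep_splice_sites(seq: str, rng: random.Random) -> str:
    """Shuffle the intron interior, keeping the 5' GT and 3' AG dinucleotides in place."""
    core = list(seq[2:-2])
    rng.shuffle(core)
    return seq[:2] + "".join(core) + seq[-2:]


def permutation_p_value(a: str, b: str, n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled pairs scoring at least the observed score (+1 correction)."""
    rng = random.Random(seed)
    observed, _ = needle(a, b)
    hits = sum(
        needle(shuffle_keep_splice_sites(a, rng), shuffle_keep_splice_sites(b, rng))[0] >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

The study used 100 000 permutations per pair; the smaller default here only keeps the pure-Python sketch fast. Shuffling with fixed splice sites preserves each intron’s base composition, so a high score caused merely by T-richness is also reached by many shuffled pairs and yields a large p-value.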
With total E. gracilis DNA as the template, the PCR products obtained with the primers listed in Tables 1 and 2 were about 50 bp longer than expected for cDNA templates, except those for Pbgd, PsbO and PsbW, which were about 150, 90 and 250 bp longer, respectively.
Sequencing of seven PCR products revealed that each contained one 41–50 bp intron. The Pbgd PCR product contained three introns (48, 46 and 50 bp), the PsbO PCR product one 93 bp intron, and the PsbW PCR product two introns (48 and 195 bp). Notably, the 195 bp intron psbW-i2 lies downstream of the stop codon, in the 3′-UTR of the PsbW gene. Thus, a total of 13 introns were identified in this study. Except for the introns in PsbO and PetJ and the intron in the PsbW 3′-UTR, all introns are 45–50 bp long and have typical eukaryotic GT/AG consensus splicing borders (see Table 3, which also gives the accession numbers of the partial gene sequences containing the introns identified in this study).
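The gel-based size shifts and the sequenced intron lengths can be cross-checked: the extra length of a genomic PCR product should roughly equal the summed length of the introns it contains. The sketch below performs this check on the values reported above, taking intron lengths per gene from the following paragraphs; the ±10 bp tolerance for gel size estimates is an assumption.

```python
# Approximate excess of the genomic PCR product over the cDNA-based expectation (bp),
# and the intron lengths found by sequencing (bp), as reported in the text.
OBSERVED = {
    "Eno29": (50, [48]), "GapA": (50, [50]), "PetA": (50, [45]), "PetF": (50, [46]),
    "PetJ": (50, [41]), "PsaF": (50, [47]), "PsbM": (50, [47]),
    "Pbgd": (150, [48, 46, 50]),
    "PsbO": (90, [93]),
    "PsbW": (250, [48, 195]),
}


def size_consistent(excess_bp: int, introns_bp: list[int], tolerance_bp: int = 10) -> bool:
    """True if the gel-estimated size shift matches the summed intron length within tolerance_bp."""
    return abs(excess_bp - sum(introns_bp)) <= tolerance_bp


total_introns = sum(len(introns) for _, introns in OBSERVED.values())
mismatches = [gene for gene, (excess, introns) in OBSERVED.items()
              if not size_consistent(excess, introns)]
```

All ten products fall within the assumed tolerance (e.g. Pbgd: 48 + 46 + 50 = 144 bp against an excess of about 150 bp), and the count reproduces the 13 introns reported.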
The borders and phase of the 93 bp intron in the PsbO gene could not be determined, because the intron lacks consensus borders and the dinucleotide TG, which is present in the mRNA at the splice junction, also occurs at both intron–exon borders; the intron can therefore be placed at more than one position consistent with the mRNA. The splicing borders of the PsbO intron may be TG/TT, GA/TT or AC/TG. Similar difficulties in determining intron borders have been described for the Lhcb gene encoding the LHCPII protein25 and for GapC encoding cytosolic glyceraldehyde-3-phosphate dehydrogenase,26 because the introns in these genes are flanked by short direct repeats (2–5 bp) and lack consensus splicing borders. The 41 bp intron in the PetJ gene also lacks consensus splicing borders. A guanine is present at both of its intron–exon borders; thus, its splicing borders might be GT/TC or TT/CG, and it is in either phase 1 or phase 2.
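The ambiguity arises because, when the same short sequence sits at both ends of an intron, several cut positions give exactly the same spliced product. The sketch below enumerates every position at which an intron of the observed length can be removed from a genomic fragment to reproduce the mRNA, and reports the candidate border dinucleotides for each. It assumes the fragment contains a single intron; the toy sequences are invented for illustration and are not the PsbO or PetJ data.

```python
def candidate_introns(genomic: str, mrna: str) -> list[tuple[int, int, str]]:
    """Return every (start, end, 'XX/YY') such that genomic[:start] + genomic[end:] == mrna.

    start/end are 0-based, end-exclusive; 'XX/YY' are the first and last two intron bases.
    More than one hit means the intron is flanked by a direct repeat and its exact borders
    (and phase) cannot be inferred from the genomic and mRNA sequences alone.
    """
    intron_len = len(genomic) - len(mrna)
    if intron_len <= 0:
        return []
    hits = []
    for start in range(len(mrna) + 1):
        end = start + intron_len
        if genomic[:start] + genomic[end:] == mrna:
            intron = genomic[start:end]
            hits.append((start, end, f"{intron[:2]}/{intron[-2:]}"))
    return hits


# Toy example: a TG repeat at both exon-intron junctions (hypothetical sequences).
genomic = "GGGA" + "TGTTTTTTTC" + "TGAAA"
mrna = "GGGA" + "TGAAA"
print(candidate_introns(genomic, mrna))
# → [(4, 14, 'TG/TC'), (5, 15, 'GT/CT'), (6, 16, 'TT/TG')]
```

A two-nucleotide repeat yields three equally valid placements, which matches the three alternative border pairs listed above for the PsbO intron.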
With the exception of the 195 bp intron psbW-i2 in the PsbW 3′-UTR (which has GA/GT splicing borders and no direct repeat at its intron–exon borders), all introns identified in this study lie either within the second half of the presequence-encoding region or shortly downstream of it. The 48 bp intron in the Eno29 gene (eno29-i1) lies between amino acid positions 166 and 167 encoded by the Eno29 mRNA (accession number AJ272112), whereas the presequence-encoding region ends at position 161 of this mRNA sequence. The 50 bp gapA-i1 lies between the codons for aa 90 and 91 of the 127 aa GapA presequence. Sharif et al.51 reported a 139 aa presequence for Pbgd, whereas Durnford and Gray22 predicted a length of 151 aa; the two predictions differ at the C-terminus of the presequence region. The 48 bp pbgd-i1 lies within the codon for aa 85, the 46 bp pbgd-i2 between the codons for aa 119 and 120, and the 50 bp pbgd-i3 within the codon for aa 144 of the Pbgd preprotein (i.e. either within the end of the presequence region or shortly downstream of it, depending on which prediction is used). The 45 bp petA-i1 lies downstream of codon 87 of the 147 aa presequence region. The 46 bp petF-i1 is inserted into the codon specifying aa 131 of the 138 aa presequence region of the ferredoxin-encoding EST (accession number EG565162). The 41 bp petJ-i1 lies downstream of either nt 304 or nt 305 of the partial PetJ mRNA sequence (accession number AJ130725), with nt 267 marking the end of the presequence-encoding region. The predicted 144 aa PsaF presequence harbours the 47 bp psaF-i1 downstream of codon 94. The 47 bp psbM-i1 is inserted into codon 131 of the PsbM presequence region (predicted to be 154 aa). The 93 bp psbO-i1 lies about 60 nt downstream of the PsbO presequence-encoding region. Finally, the 48 bp psbW-i1 lies between codons 66 and 67 of the predicted 82 aa PsbW presequence.
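Intron positions in the paragraph above are expressed as codons and phases. The sketch below shows how such positions relate: it converts the number of coding nucleotides upstream of an intron (counted from the first base of the start codon) into the intron phase, the affected codon, and the position relative to the presequence length. The function and field names are hypothetical, and the nucleotide offsets in the example are reconstructed from the stated codon positions (e.g. “between codons 90 and 91” corresponds to 270 upstream coding nucleotides); because the phase of petF-i1 is not given, both possibilities are shown.

```python
from dataclasses import dataclass


@dataclass
class IntronSite:
    phase: int                # 0: between codons; 1 or 2: after the 1st or 2nd base of a codon
    codon: int                # 1-based codon; for phase 0 the codon just upstream of the intron
    in_presequence: bool      # codon lies within the predicted presequence
    relative_position: float  # codon / presequence length (> 0.5 means second half)


def locate_intron(cds_nt_upstream: int, presequence_aa: int) -> IntronSite:
    """Classify an intron from the number of coding nucleotides upstream of it."""
    phase = cds_nt_upstream % 3
    codon = cds_nt_upstream // 3 if phase == 0 else cds_nt_upstream // 3 + 1
    return IntronSite(phase, codon, codon <= presequence_aa, codon / presequence_aa)


gapA = locate_intron(270, 127)      # gapA-i1: between codons 90 and 91 of a 127 aa presequence
petF_p1 = locate_intron(391, 138)   # petF-i1 if in phase 1: interrupts codon 131
petF_p2 = locate_intron(392, 138)   # petF-i1 if in phase 2: also interrupts codon 131
```

For gapA-i1 this gives phase 0 after codon 90, at about 71% of the presequence length, i.e. in its second half.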
Taken together, this study describes 13 new intron sequences in E. gracilis nuclear genes encoding chloroplast proteins. In the genes encoding the chloroplast-targeted proteins Eno29, GapA, PetA, PetF, PetJ, PsaF, PsbO, PsbM and PsbW, one intron was identified within the second half of the presequence-encoding region or shortly downstream of it, whereas in the gene encoding Pbgd, two introns were identified within the presequence-encoding region and one at the border between the presequence- and mature-peptide-encoding regions. Importantly, BLAST searches revealed no significant primary sequence similarity between the introns identified in this study and either the introns of the E. gracilis chloroplast genome or any introns from other organisms in public databases.
Ten of the 13 introns identified in this study are conventional and 45–50 bp long. Introns of similar size (44–53 bp), some of them conventional, have already been described in other E. gracilis nuclear genes.26–29 The only shorter introns known so far in euglenoids are three introns (27, 29 and 31 bp) in the hsp90 gene of the phagotrophic euglenid Peranema trichophorum.56 Of these, only one is conventional. Nevertheless, E. gracilis introns vary widely in size,25–30 the largest identified so far being the conventional 9.2 kb intron i1 in one of the two copies of the gamma-tubulin gene.28
Interestingly, the E. gracilis nuclear gene encoding the chloroplast protein RbcS also contains an intron within the second half of the presequence region. This 53 bp intron is in phase 0 but lacks GT/AG borders.27 In the nuclear Lhcb gene (Lhcbm1), an 86 bp intron approximately separates the presequence- and mature-peptide-coding regions.25 This intron is also non-conventional, and its phase cannot be determined because a TG dinucleotide is present at both intron–exon borders.25 Likewise, the 93 bp intron in the PsbO gene is flanked by TG dinucleotides, and it shares 46% primary sequence identity with the 86 bp intron in Lhcb.
Importantly, 10 of the 14 E. gracilis introns known to lie in the second half of presequence-encoding regions or shortly downstream of them share several characteristic features: they are 45–50 bp long, have consensus GT/AG splicing borders, are AT-rich and especially T-rich, and possess characteristic pyrimidine tracts at their 3′-ends. Moreover, the pairwise primary sequence identity among these 10 introns ranges from 27 to 61% (Table 4). Notably, the 44 and 46 bp conventional introns of the E. gracilis fibrillarin gene29 share 58% primary sequence identity with each other, and their identity to the ten 45–50 bp introns found in this study ranges from 32 to 55% (Table 4). Although not all alignment scores are statistically significant (Table 4), the sequence similarity, together with the other shared characteristics of these 44–50 bp E. gracilis introns, suggests that recombination between them could potentially occur. By comparison, the conventional introns within or shortly downstream of the presequence regions of nuclear-encoded plastid proteins of the diatom Phaeodactylum tricornutum are 183–410 bp long, and pairwise comparison revealed no significant sequence similarity among them.48
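The shared features listed above (length, GT/AG borders, AT and T content, a 3′ pyrimidine tract) are simple to compute for any intron sequence. The sketch below is a hypothetical helper: it simplifies the pyrimidine tract to the uninterrupted C/T run immediately upstream of the terminal AG, whereas real tracts may be interrupted or separated from the AG by a few nucleotides. The example sequence is invented and is not one of the E. gracilis introns.

```python
def intron_features(seq: str) -> dict:
    """Summarize features used above to compare short introns (simplified definitions)."""
    seq = seq.upper()
    n = len(seq)
    tract = 0
    for base in reversed(seq[:-2]):  # walk upstream from just before the 3' AG
        if base not in "CT":
            break
        tract += 1
    return {
        "length": n,
        "gt_ag": seq.startswith("GT") and seq.endswith("AG"),
        "at_fraction": (seq.count("A") + seq.count("T")) / n,
        "t_fraction": seq.count("T") / n,
        "pyrimidine_tract": tract,  # uninterrupted C/T run ending right before the AG
    }


# Invented 21 nt example with GT/AG borders and a T-rich 3' end.
example = intron_features("GTAAGTA" + "T" * 6 + "C" + "T" * 4 + "CAG")
```

For this toy sequence the function reports GT/AG borders, 12 T residues out of 21 nt, and a 12 nt pyrimidine tract ending just before the AG.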
Kilian and Kroth48 suggested ‘semi-exon shuffling’ as a possible mechanism for the acquisition of presequence parts (e.g. signal peptides) in diatoms. In this model, the intron within the presequence-encoding region of a donor gene might have recombined either with the 5′-UTR of an acceptor gene or with its transit-peptide-encoding region (which was likely transferred from the red algal symbiont nucleus to the host nucleus together with the acceptor gene), while a new 3′ AG intron border in the acceptor gene might have been generated from a randomly occurring AG dinucleotide.48 However, the primary sequence similarity among the ten 45–50 bp introns located within or shortly downstream of E. gracilis presequences, and the similarity between the 86 bp intron in Lhcb and the 93 bp intron in PsbO, suggest that exon-shuffling rather than ‘semi-exon shuffling’ is the more likely mechanism for the acquisition of presequences or their parts in E. gracilis.
Two possible scenarios for presequence acquisition via exon-shuffling in euglenids are depicted in Fig. 1. The first involves single recombination events between the cis-spliced introns of a donor gene (carrying the presequence region) and an acceptor gene (Fig. 1A). Importantly, the acceptor may gain not only the presequence region but also the trans-spliced intron required for the addition of the capped spliced leader. The second mechanism involves double crossing over, with one crossover between the trans-spliced introns of the donor and acceptor genes and the other between the adjacent cis-spliced introns of the donor and acceptor genes (Fig. 1B). For illustration, the cis-intron of the donor gene in Fig. 1 is placed exactly at the border between the presequence- and mature-peptide-encoding regions; however, it could also lie within the presequence-encoding region (most likely in its second half). The presequence regions of E. gracilis chloroplast precursor proteins have been predicted to range from 61 to 233 aa,22 and the shortest one currently known (that of Eno29) possibly comprises only 47 aa.51 Thus, the addition of even short parts of a presequence region from donor genes to acceptor genes might have resulted in chloroplast targeting. In addition, the three introns identified in the Pbgd gene might illustrate how presequence-encoding regions were generated via intron-mediated recombination.
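The two scenarios in Fig. 1 differ in which segments of the recombinant gene come from the donor. The sketch below is a toy model under strong simplifying assumptions: genes are ordered lists of named segments, homologous introns are treated as exact crossover points, and the intron at a crossover is represented by the donor copy. The segment names and functions are hypothetical and do not model sequence-level homology.

```python
Segment = tuple[str, str]  # (segment kind, gene of origin)
Gene = list[Segment]


def _index(gene: Gene, kind: str) -> int:
    """Position of the first segment of the given kind (the assumed crossover point)."""
    return next(i for i, (k, _) in enumerate(gene) if k == kind)


def single_crossover(donor: Gene, acceptor: Gene) -> Gene:
    """Scenario A: one crossover inside homologous cis-spliced introns.

    The product keeps everything up to the donor's cis-intron (5'-UTR, trans-intron,
    presequence) and everything downstream of the acceptor's cis-intron.
    """
    d, a = _index(donor, "cis-intron"), _index(acceptor, "cis-intron")
    return donor[:d + 1] + acceptor[a + 1:]


def double_crossover(donor: Gene, acceptor: Gene) -> Gene:
    """Scenario B: crossovers in the trans-introns and in the adjacent cis-introns.

    Only the donor segments between the two crossover points (presequence and cis-intron)
    are transferred into an otherwise unchanged acceptor gene.
    """
    dt, dc = _index(donor, "trans-intron"), _index(donor, "cis-intron")
    at, ac = _index(acceptor, "trans-intron"), _index(acceptor, "cis-intron")
    return acceptor[:at + 1] + donor[dt + 1:dc + 1] + acceptor[ac + 1:]
```

With a donor laid out as 5′-UTR, trans-intron, presequence, cis-intron, mature region, and an acceptor lacking the presequence, scenario A yields a gene whose entire 5′ part comes from the donor, whereas scenario B keeps the acceptor’s own 5′-UTR and trans-intron and imports only the presequence and the cis-intron.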
It was once suggested that euglenids and trypanosomatids might have acquired their plastids before they diverged, followed by plastid loss in the trypanosomatid clade.57 However, a cladistic analysis of gene loss inferred from complete plastid genome sequences,58 together with morphological characters shared by eukaryotrophic and phototrophic euglenids but absent from osmotrophic and bacteriotrophic euglenids and from trypanosomes, strongly suggests a more recent origin of photosynthetic euglenoids.59,60 The presence of short conventional introns (sharing 27–61% sequence identity) within the second half of, or shortly downstream of, Euglena presequence-encoding regions indicates a relatively recent acquisition of chloroplast-targeting signals in Euglena. This is consistent with, and lends further support to, a relatively recent origin of euglenoid secondary plastids, later than the endosymbiosis of the evolutionarily ancient red algae that led to diatoms. In any case, the repertoire for creating novel targeting sequences, or for replacing the transit sequences of the primary host cell with bi- or tripartite presequences, already existed: exon-shuffling at the DNA level via appropriately placed introns enabling recombination. This applies to the α-proteobacterial endosymbiosis leading to mitochondria and to the above-mentioned secondary endosymbiosis leading to chromophytes. Our data suggest that euglenids also made use of this mechanism, probably as the last in this series.
Although nuclear gene sequence data for euglenids are fragmentary, euglenid nuclear genes appear to contain many cis-spliced introns. In contrast, genome-wide data are available for parasitic kinetoplastids, yet very few cis-spliced introns have been reported from trypanosomes so far, including an 11 bp intron in the tRNA(tyr) gene of Trypanosoma cruzi and Trypanosoma brucei61 and 653 and 302 bp introns in the poly(A) polymerase gene of T. brucei and T. cruzi, respectively.62 One might argue that the almost complete loss of cis-spliced introns in trypanosomes resulted from their parasitic lifestyle, as did the overall compaction of trypanosome nuclear genomes, including fairly short intergenic spacers with polycistronic transcription63,64 and overlapping genes.65 However, cis-spliced introns appear to be rare in both parasitic and free-living kinetoplastids, and this general condition could pre-date the adoption of parasitism by the trypanosomatid lineage.66 The euglenid lineage, with its numerous cis-spliced introns, was therefore likely better pre-adapted than the kinetoplastid lineage for the acquisition of chloroplast-targeting presequences, and thus for the successful integration of an algal symbiont.
This work was supported by grants from the Ministry of Education of the Slovak Republic (VEGA 1/0416/09, to J. K.; and VEGA 1/0118/08, to R. V.), by grants from Comenius University, Bratislava, Slovakia (UK/144/2007, and UK/208/2009 to M. V.), and by grant P19683 from the Austrian ‘Fonds zur Förderung der wissenschaftlichen Forschung’ to W. L.
Edited by Satoshi Tabata