The assembly of 20,000 sequencing reads obtained from shotgun and chromosome-specific libraries of the Spiroplasma citri genome yielded 77 chromosomal contigs totaling 1,674 kbp (92%) of the 1,820-kbp chromosome. The largest chromosomal contigs were positioned on the physical and genetic maps constructed from pulsed-field gel electrophoresis and Southern blot hybridizations. Thirty-eight contigs were annotated, resulting in 1,908 predicted coding sequences (CDS) representing an overall coding density of only 74%. Cellular processes, cell metabolism, and structural-element CDS account for 29% of the coding capacity, CDS of external origin such as viruses and mobile elements account for 24% of the coding capacity, and CDS of unknown function account for 47% of the coding capacity. Among these, 21% of the CDS group into 63 paralog families. The organization of these paralogs into conserved blocks suggests that they represent potential mobile units. Phage-related sequences were particularly abundant and include plectrovirus SpV1 and SVGII3 and lambda-like SpV2 sequences. Sixty-nine copies of transposases belonging to four insertion sequence (IS) families (IS30, IS481, IS3, and ISNCY) were detected. Similarity analyses showed that 21% of chromosomal CDS were truncated compared to their bacterial orthologs. Transmembrane domains, including signal peptides, were predicted for 599 CDS, of which 58 were putative lipoproteins. S. citri has a Sec-dependent protein export pathway. Eighty-four CDS were assigned to transport, such as phosphoenolpyruvate phosphotransferase systems (PTS), the ATP binding cassette (ABC), and other transporters. Besides glycolytic and ATP synthesis pathways, it is noteworthy that S. citri possesses a nearly complete pathway for the biosynthesis of a terpenoid.
Spiroplasmas are arthropod-associated bacteria belonging to the class Mollicutes, a group of wall-less microorganisms phylogenetically related to low-G+C-content, Gram-positive bacteria (51). Spiroplasma citri is a helical plant-pathogenic mollicute responsible for the “stubborn” disease of citrus (39). It inhabits the phloem sap of infected plants to which it is transmitted by sap-sucking hemipteran insects in a circulative and propagative manner (31, 32). S. citri can infect a wide range of plant species, including crop and wild plants, as it is transmitted by polyphagous leafhoppers (13). Spiroplasmas are available in pure culture, and their study has therefore benefited from the use of molecular genetics. In particular, the relationships of spiroplasmas with their two hosts, the plant and the leafhopper vector, have been extensively studied (11, 22). In S. citri, the inactivation of genes and functional complementation of mutants have shown that (i) fructose consumption by the spiroplasma is a major cause for symptom production in plants, (ii) the solute binding protein of a putative ABC-type transporter is involved in the insect transmission process, and (iii) spiralin, the major membrane protein, is not essential for helicity, motility, and pathogenicity but is required for efficient transmission by the leafhopper vector (10, 19, 23, 24, 28). To characterize other spiroplasma genes potentially involved in insect transmission and pathogenicity, the genome of S. citri strain GII3-3X is currently being deciphered.
The S. citri genome is characterized by an abundance of extrachromosomal elements, including seven plasmids, pSciA and pSci1 to pSci6, present as 10 to 14 copies per cell. These plasmids are vertically inherited, but some of them could also be horizontally transferred, as they encode proteins involved in partitioning and the cell-to-cell transfer of DNA molecules (12, 40). Plasmids pSci1 to pSci5 encode surface proteins of the S. citri adhesion-related protein (ScARP) family, and pSci6 was previously shown to confer insect transmissibility (9). Therefore, it is likely that the abundance and diversity of plasmids could provide S. citri strain GII3-3X with the ability to quickly adapt to various vector insects and, hence, to be transmitted to diverse host plants. However, chromosome-encoded determinants are also expected to play a role in spiroplasma biology. In S. citri, the chromosome sizes vary from 1.6 to 1.9 Mbp among strains (53, 54), and part of the size variation is thought to result from different amounts of prophage sequences (35). Many S. citri strains are infected by single-stranded DNA-containing filamentous phages (Plectrovirus), whose sequences also occur as partial or full-length prophages integrated into the spiroplasma chromosome (7, 35, 38). Here we report the partial chromosome sequence of S. citri strain GII3-3X and the functional assignment of the predicted coding sequences.
S. citri strain GII3 was originally isolated from the leafhopper Circulifer haematoceps, collected in Morocco in 1980 (49). A triply cloned strain was further produced by plating onto SP4 medium, and one of the clones was further propagated as GII3-3X. Spiroplasmas were grown at 32°C in SP4 medium (48).
The S. citri chromosome was digested with NotI, ApaI, SmaI, BssHII, and SstII, and the restriction fragments were separated by pulsed-field gel electrophoresis (PFGE) with a Bio-Rad (Marnes-la-Coquette, France) CHEF DRIII apparatus. The results indicated that the chromosome contained one single site for the NotI endonuclease and 9, 19, 7, and 15 sites for ApaI, SmaI, BssHII, and SstII, respectively. The full-size circular chromosome was estimated to be 1,820 kbp. Southern blot hybridizations of the DNA fragments were performed with 43 different probes (see Fig. S1 in the supplemental material). From these data, a physical and genetic map comprising 51 restriction sites and 43 genetic markers was constructed (Fig. 1B and C). The previously reported replication origin oriC (55) was located at 1,380 kbp on the ApaB fragment.
Sequencing data were produced following a chromosome map-based approach and a classical shotgun strategy completed by the end sequencing of inserts from a mini bacterial artificial chromosome (miniBAC) library. A first pSMART library of 4,000 clones with 3- to 4-kbp inserts (prepared by Amplicon Express, Pullman, WA) and a second pBluescript library of 2,400 clones with 1- to 3-kbp inserts were produced and end sequenced. Additional data were produced by the end sequencing of a miniBAC library of 900 clones in pECBAC1 with inserts of 15 to 25 kbp obtained by limited Sau3A digestion. In addition, 12 Sau3A plasmid libraries were constructed from restriction fragments (indicated by letters in Fig. 1B) eluted from PFGE gels. After the primary assembly of the data, gap filling was performed by primer walking, PCR, and genome walking. The assembly and editing of 20,000 sequencing reads were performed with the phred-phrap-consed package (20, 21, 26). Incorrect assemblies of repeated sequences were detected by phrap on the basis of abnormally long distances between insert extremities, for instance, those exceeding 4 kbp for plasmid inserts. To overcome these difficulties, the misassembled DNA regions were assembled separately and completed by primer walking. The resulting consensus sequences were reintroduced into the general assembling process to resolve repeated regions and restore a normal scaffold. The phrap assembly yielded 77 chromosomal contigs totaling 1,674 kbp of the 1,820-kbp chromosome and 8 circular contigs, which proved to be an SVTS2-like viral DNA (SVGII3) and seven plasmids, pSciA and pSci1 to pSci6, with sizes ranging from 7.8 to 35.3 kbp (40). Based on the relative positions of the restriction sites and genetic markers, the 20 largest contigs were organized on the chromosome map (Fig. 1A). Eighteen additional contigs, NP01 to NP18, totaling 165 kbp, could not be located on the physical map.
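The insert-distance criterion used to spot misassembled repeats can be illustrated with a short Python sketch. This is not the phred-phrap-consed implementation; the `ReadPair` record, its field names, and the example coordinates are hypothetical, and only the 4-kbp limit for plasmid inserts comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for the two end reads of one plasmid clone."""
    clone: str
    contig: str
    forward_start: int  # leftmost assembled position of the forward read (bp)
    reverse_end: int    # rightmost assembled position of the reverse read (bp)


def flag_suspect_pairs(pairs, max_span_bp=4000):
    """Return clone names whose end reads lie farther apart than max_span_bp."""
    return [
        p.clone
        for p in pairs
        if p.reverse_end - p.forward_start > max_span_bp
    ]


# Invented example coordinates, for illustration only.
pairs = [
    ReadPair("cloneA", "contig1", 1000, 4200),    # span 3,200 bp: plausible
    ReadPair("cloneB", "contig1", 5000, 14500),   # span 9,500 bp: suspect
    ReadPair("cloneC", "contig2", 200, 2100),     # span 1,900 bp: plausible
]
suspects = flag_suspect_pairs(pairs)
print(suspects)
```

Clones flagged in this way would point to regions worth reassembling separately, as described above for the repeated sequences.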
Sequence analyses and annotation were managed with the iANT (Integrated Annotation Tool) Web-based annotation environment developed for Ralstonia solanacearum genome annotation (41). Coding sequences (CDS) were predicted by using the FrameD program (42) trained with known S. citri genes. For the structural and functional assignment of the CDS, they were compared to data in the SP-TrEMBL database using the BLASTP program with a cutoff E value of e−10 (1), submitted to TMPRED for membrane-spanning domains, and aligned to data in the ProDom and Prosite databases for conserved protein domains (5, 16). Signal peptides were predicted by using SignalP 3.0 (8), and transmembrane topology was predicted by using TMHMM (29). The prediction of tRNA was conducted with tRNAscan-SE (33). The genome of Bacillus subtilis was taken as a reference for the gene product and gene name. The gene product and gene name were propagated when at least 50% of each protein was involved in the alignment with at least 20% identity at the amino acid level. Levels of confidence were “blank,” probable, putative, and hypothetical when identity levels were above 80%, 60%, 40%, and 20%, respectively, for a reciprocal coverage of 80%. For reciprocal coverage in the range of 50 to 80%, levels of confidence were probable, putative, and hypothetical when identity levels were above 80%, 60%, and 40%, respectively. Annotated sequence data can be accessed at http://iant.toulouse.inra.fr/bacteria/annotation/cgi/spici.cgi?. Comparisons of proteomes were performed with Molligen 3.0 (http://cbi.labri.fr/outils/molligen/). To determine the number of orthologous CDS between two genomes, the bidirectional best hit was computed from Smith and Waterman SSEARCH local alignments. Best reciprocal hits below a cutoff E value of e−4 were considered significant when more than 50% of the longest protein was involved in the alignment.
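The confidence levels described above amount to a two-tier lookup on identity and reciprocal coverage, which the following Python sketch makes explicit. It is an illustration, not the iANT code: the function name is hypothetical, a coverage "of 80%" is read here as 80% or more, "above" is read as a strict inequality, and combinations that the text does not cover return `None`.

```python
from typing import Optional


def confidence_level(identity_pct: float,
                     reciprocal_coverage_pct: float) -> Optional[str]:
    """Map alignment identity and reciprocal coverage to a confidence label.

    Returns None for combinations the annotation rules do not describe.
    """
    if reciprocal_coverage_pct >= 80:
        # Assumption: "a reciprocal coverage of 80%" means 80% or more.
        tiers = [(80, "blank"), (60, "probable"),
                 (40, "putative"), (20, "hypothetical")]
    elif reciprocal_coverage_pct >= 50:
        tiers = [(80, "probable"), (60, "putative"), (40, "hypothetical")]
    else:
        return None  # below the 50% coverage needed to propagate a name

    for threshold, label in tiers:
        if identity_pct > threshold:  # assumption: "above" is strict
            return label
    return None


print(confidence_level(65, 90))  # high coverage, moderate identity
print(confidence_level(65, 60))  # same identity, lower coverage
```

The same identity value is downgraded by one level when coverage falls into the 50 to 80% range, which is the point of the two-tier rule.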
The S. citri GII3-3X chromosome has a G+C content of 26.1%. It encodes one single 16S-23S-5S rRNA operon and 32 tRNAs, including the two tryptophan tRNAs, tRNATrp(CCA) and tRNATrp(UCA) (14). The tRNA genes are organized into eight clusters. The 38 contigs comprise 1,908 CDS representing an overall coding density of 74%. CDS are symmetrically distributed on both sides of oriC, with a strong bias for the leading strands of chromosome replication (Fig. 2). Every predicted CDS was assigned to a functional category. CDS encoding components of cellular processes, cell metabolism, and structural elements account for 6.2%, 20%, and 3.5% of the CDS, respectively (Fig. 3). CDS of external origin such as viruses and mobile elements represent 20.5% and 3.6% of the coding capacity, respectively, and CDS of unknown function represent 47% of the coding capacity. Among these CDS of unknown function, 21% group into 63 paralog families that cluster mainly in a large region of the chromosome at the opposite side of oriC. The largest family comprises 23 paralogs with an N-terminal A(D/N)LXX pentapeptide repeat motif (PPA) of unknown function. Paralogous CDS group into repeated blocks of 7 to 12 ordered units each (see Fig. S2 in the supplemental material). These blocks are arranged into four distinct types depending on whether they contain the P70, P25, P31, or P29 paralog. The most conserved CDS are found in the PPA and P16 families. The organization of these paralogs into conserved blocks of various sizes suggests that they may represent remnants of former mobile units having undergone many insertion/deletion/recombination events (Fig. S2). No integrase/transposase-encoding CDS were identified in these blocks. Phage-related sequences were homologous to plectrovirus SpV1 (38) and SVTS2 (43) as well as those presumed to be from SpV2, a lambda-like phage found in S. citri cultures (15).
These viral sequences are dispersed into more than 50 sites all along the chromosome. It is noteworthy that most (13/14) of the regions containing plectrovirus sequences larger than 7 kbp (i.e., that might represent full-length prophages) contain truncated CDS. Only one seems to correspond to a linearized, entire SpV1 genome. The integration of a plectrovirus genome into the S. citri chromosome has been shown to occur upon virus infection and to be associated with resistance to subsequent infection (44).
Sixty-nine CDS encoding truncated and full-length transposases were detected. Based on sequence similarities with transposases in the ISFinder database (45), these transposases belong to four distinct insertion sequence (IS) families (IS30, IS481, IS3, and ISNCY). Whereas those of the IS30 and IS481 (formerly classified as IS3) families are all associated with SpV1 sequences, those of the IS3 (one copy) and ISNCY (seven copies of IS1202) families are not.
Similarity analyses showed that 21% of the S. citri chromosomal CDS were truncated compared to their bacterial orthologs, revealing important gene decay. In addition to many transposases and viral CDS, which appeared C-terminally or N-terminally truncated, housekeeping genes were also found to be incomplete. Among these are recA, involved in homologous recombination and shown to be truncated in other strains of S. citri (34); dinB, encoding DNA polymerase IV; and pyrP, encoding uracil permease.
Transmembrane domains, including signal peptides, were predicted for 599 CDS, of which 58 were putative lipoproteins with a cysteine residue at positions 22 to 26. S. citri has a Sec-dependent protein export pathway, as indicated by the presence of the secY, secA, ftsY, and ffh (SRP54) genes and homologs to secE (SPICI03_054) and oxaA or yidC (SPICI01A_062). Eighty-four CDS were involved in the acquisition of various metabolites from the host/environment. They encode components of phosphoenolpyruvate phosphotransferase systems (PTS), ATP binding cassette (ABC) transporters, and various other transporters. In addition to the general enzymes EI and HPr and the glucose-, fructose-, and trehalose-specific PTS permeases that were previously shown to be functional (2, 3, 18, 25), the S. citri genome also encodes several other hypothetical PTS permeases of unknown substrates, most of which are truncated. The occurrence of PTS in spiroplasma genomes is a striking difference from the genomes of other phloem-limited bacteria such as phytoplasmas (30, 36, 47) and the phloem-restricted proteobacterium “Candidatus Liberibacter asiaticus” (17), which do not encode PTS. In the S. citri GII3-3X genome, 32 CDS were predicted to encode components of ABC transporters (27), including 12 permeases and 15 ATP binding and 5 solute binding components, 2 of which are specific for phosphate and oligopeptide, respectively. Other transporters include an Na+-dependent transporter, three cation transporters, one arginine/ornithine antiporter, two amino acid and polyamine permeases, one formate/nitrite transporter, ferritin, a malate permease, an Na+/dicarboxylate symporter, and a NAD transporter. Like the other mollicutes, S. citri has limited synthesis capabilities. In particular, except for the glycine hydroxymethyltransferase, which converts glycine into serine, it lacks most of the enzymes involved in amino acid metabolism. 
In contrast, many genes of the purine and pyrimidine metabolism pathways are present.
Besides the PTS to import sugars, S. citri GII3-3X also possesses a complete glycolytic pathway and lactate dehydrogenase to promote fermentation. Regarding the pentose-phosphate pathway, S. citri GII3-3X lacks the two genes encoding the enzymes that convert glucose-6P into ribulose-5P. Therefore, this pathway appeared deficient for the oxidative phase. In turn, the S. citri GII3-3X genome encodes four enzymes of the nonoxidative phase of this pathway, namely, ribulose-phosphate 3-epimerase (rpe; EC 5.1.3.1), transketolase (tkt; EC 2.2.1.1), ribose-phosphate pyrophosphokinase (prs; EC 2.7.6.1), and, possibly, ribose-5-phosphate isomerase B (rpiB; EC 5.3.1.6). These four genes are present in mycoplasmas but are totally absent in phytoplasmas.
The S. citri genome contains CDS homologous to Clostridium perfringens arcA, arcB, arcC, and arcD, which encode proteins homologous to arginine deiminase (ADI), ornithine carbamoyltransferase (OTC), carbamate kinase (CK), and the arginine/ornithine antiporter (ArcD), respectively. The presence of the ADI operon is consistent with our previous data showing the ability of S. citri GII3-3X to metabolize arginine (18), a feature shared by other mollicutes, including Mycoplasma penetrans and Mycoplasma hominis (37).
A large operon made of eight CDS was predicted to represent the operon of atpB, atpE, atpF, atpH, atpA, atpG, atpD, and atpC, encoding the eight subunits of the classical FoF1-ATP synthase, which is thought to be essential for ATP production. Surprisingly, this ATP synthase is absent in sequenced phytoplasma genomes (30, 36, 47). The S. citri GII3-3X chromosome also encodes a nearly complete pathway for the biosynthesis of a C55 terpenoid (Fig. 4A). Six genes (dxs, dxr, ispD, ispF, ispG, and ispH) dedicated to the synthesis of isopentenyl pyrophosphate through the 2-C-methyl-d-erythritol 4-phosphate/1-deoxy-d-xylulose 5-phosphate (MEP/DOXP) pathway were identified. Only ispE is missing for the pathway to be complete, as is the case for Bacillus subtilis (Fig. 4A). A comparison with B. subtilis also reveals that both bacteria lack the mevalonate pathway for isopentenyl-pyrophosphate (PP) synthesis (Fig. 4A). Other bacteria, such as Xylella fastidiosa and Wolbachia pipientis, which are also associated with plants or arthropods, have a complete MEP/DOXP pathway, and in the class Mollicutes it is also the case for M. penetrans and Mycoplasma gallisepticum. On the contrary, other phloem-restricted bacteria like phytoplasmas and “Ca. Liberibacter asiaticus” lack genes for the MEP/DOXP pathway. Note that in S. citri, the genes involved in the MEP/DOXP pathway are not clustered in an operon; instead, they are scattered over 1 Mbp of the S. citri chromosome (Fig. 4B). A possible use of isopentenyl-PP in spiroplasma cells is the modification of tRNA. Indeed, S. citri has an miaA gene, which encodes the tRNA isopentenyl transferase, which catalyzes the first step of the modification of an adenosine near the tRNA anticodon. Even though S. citri has the uppS gene, whose product converts farnesyl-PP into undecaprenyl-PP, it lacks the gene encoding the enzyme that transforms isopentenyl-PP into farnesyl-PP.
Therefore, the capability of S. citri to synthesize the C55 terpenoid undecaprenyl pyrophosphate remains to be established.
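The statement that the MEP/DOXP pathway is nearly complete can be expressed as a simple set comparison, sketched below in Python. The ordered gene list is the standard seven-step MEP/DOXP pathway rather than something taken from the annotation files, and the function name is hypothetical; the set of detected genes is the one reported above.

```python
# Standard gene order of the MEP/DOXP pathway, from 1-deoxy-D-xylulose
# 5-phosphate synthesis (dxs) to the last step yielding isopentenyl-PP (ispH).
MEP_DOXP_GENES = ["dxs", "dxr", "ispD", "ispE", "ispF", "ispG", "ispH"]

# Genes reported as identified in the S. citri GII3-3X chromosome.
detected_in_s_citri = {"dxs", "dxr", "ispD", "ispF", "ispG", "ispH"}


def missing_steps(pathway, detected):
    """Return pathway genes, in pathway order, that were not detected."""
    return [gene for gene in pathway if gene not in detected]


missing = missing_steps(MEP_DOXP_GENES, detected_in_s_citri)
completeness = 1 - len(missing) / len(MEP_DOXP_GENES)
print(missing, f"{completeness:.0%}")
```

A missing gene in such a comparison may reflect a true absence, an unsequenced gap, or a nonorthologous replacement, so the result is a prompt for further checking rather than a conclusion.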
In summary, S. citri, like other members of the Mollicutes (46), has undergone a reductive genome evolution resulting in the lack of many biosynthetic pathways. Unlike phytoplasmas (30, 36, 47) and like mycoplasmas, S. citri has retained the ability to import sugars via the PTS and to synthesize ATP using an FoF1-ATP synthase. The S. citri chromosome is roughly 1 Mbp larger than most phytoplasma genomes. It contains many CDS of plectrovirus prophages and conserved blocks of unknown function that could also be remnants of viral sequence insertions, a feature shared with phytoplasma genomes, as they contain repeated clusters of genes that could be mobile genetic elements or remnants of ancient phage attacks (4, 50). In S. citri GII3-3X, the finding of hundreds of truncated CDS reveals an important gene decay. A similar gene decay was also reported for a published 85-kbp region of the Spiroplasma kunkelii chromosome (56). It is possible that the reductive evolution of the spiroplasma genome is still ongoing on the way to smaller genomes with the loss of genes that are not necessary. The resulting genomes would have a size closer to that of related mollicutes, around 1 Mbp. By comparison of the annotated region of the S. kunkelii chromosome with the homologous region in the S. citri GII3-3X chromosome, of more than 100 predicted CDS, 66 CDS and a cluster of two tRNAs were found to be conserved and organized in the same order. Despite this overall synteny, S. kunkelii has 18 additional small CDS, most of which encode hypothetical proteins of unknown function, while S. citri possesses 8 additional CDS. Both spiroplasmas have a disrupted uraA or pyrP gene for uracil permease, but S. kunkelii presented three truncated CDS for RNase HII (rnhB), whereas S. citri had the complete rnhB gene. Apart from the important gene decay, the S. citri chromosome is also characterized by an important proportion of species-specific CDS of unknown function (47%).
Comparison of the predicted proteome of S. citri GII3-3X with those of the phylogenetically related Mycoplasma mycoides SC PG1 and Mesoplasma florum showed that they have 425 and 444 orthologous CDS in common, respectively. For comparisons to mollicutes of other phylogenetic branches, the figures vary, from 290 (Mycoplasma genitalium) to 322 (Mycoplasma pulmonis) and 359 (Acholeplasma laidlawii) orthologous CDS. Among the plant-pathogenic mollicutes, which are transmitted by hemipteran insects, S. citri shares only 223, 240, and 242 orthologous CDS with “Candidatus Phytoplasma mali,” “Ca. Phytoplasma asteris,” and “Ca. Phytoplasma australiense,” respectively. These data indicate that in the S. citri genome, the overall gene conservation is more related to the phylogenetic origin than to the ecological niche. Therefore, spiroplasmas and phytoplasmas, despite sharing a similar biological cycle between plants and insects, have rather divergent genome information.
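The ortholog counts above rest on bidirectional best hits filtered by E value and alignment coverage. The following Python sketch shows that logic on invented data. It does not reproduce Molligen or SSEARCH; the tuple layout, function names, and protein identifiers are hypothetical, and the cutoff written as "e−4" is assumed here to mean 1e-4.

```python
def best_hits(hits):
    """Keep, for each query, the hit with the lowest E value.

    hits: iterable of (query, subject, evalue, aligned_fraction_of_longest).
    """
    best = {}
    for query, subject, evalue, coverage in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue, coverage)
    return best


def bidirectional_best_hits(a_vs_b, b_vs_a, max_evalue=1e-4, min_coverage=0.5):
    """Return (a, b) pairs that are each other's best hit and pass both filters."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, (b, evalue, coverage) in best_ab.items():
        back = best_ba.get(b)
        if (back is not None and back[0] == a
                and evalue < max_evalue and back[1] < max_evalue
                and coverage > min_coverage):
            pairs.append((a, b))
    return sorted(pairs)


# Invented alignment results, for illustration only.
a_vs_b = [("A1", "B1", 1e-50, 0.9), ("A1", "B2", 1e-10, 0.7),
          ("A2", "B2", 1e-20, 0.8), ("A3", "B3", 1e-3, 0.9)]
b_vs_a = [("B1", "A1", 1e-48, 0.9), ("B2", "A1", 1e-30, 0.7),
          ("B2", "A2", 1e-20, 0.8), ("B3", "A3", 1e-3, 0.9)]

orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(orthologs)
```

In this toy data set, A2 is rejected because B2 prefers A1, and A3 is rejected because its E value does not pass the cutoff, leaving a single ortholog pair.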
Due to the extremely large number of repeated sequences, the S. citri chromosome sequence could not be completed. However, it is likely that most of the missing data correspond to repeated sequences of viral origin. Despite its abundant direct and inverted repeats, the genome of S. citri is quite stable, certainly due to its being recA deficient. A comparison of the S. citri and Spiroplasma melliferum chromosome maps showed that very few large-scale rearrangements took place during their evolution (52). The sequence data described here sustain the current efforts to establish the S. citri proteome and produce new targeted mutants for a better understanding of the interactions of S. citri with its leafhopper vector. They will also contribute to comparative genomics in the class Mollicutes (6, 46) and may help in an understanding of genome evolution in this class of small bacteria with reduced genomes and provide clues for the identification of the function of the numerous cryptic genes that have been described in the past 15 years.
This work was funded by INRA, Fundecitrus, and the Conseil Régional d'Aquitaine.
We thank Audrey Henri and Marie Weber for excellent technical assistance and Pascal Sirand-Pugnet for helpful discussions.
Published ahead of print on 2 April 2010.
†Supplemental material for this article may be found at http://aem.asm.org/.