The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function.
Phytoplasmas are bacterial plant pathogens in the class Mollicutes that are associated with over 1,000 plant diseases worldwide (39, 78). Phytoplasmas have genomes of between 530 and 1,200 kb, no outer cell wall, a G+C content between 23 and 29 mol%, two rRNA operons, a low number of tRNAs, and a limited set of metabolic enzymes (9, 13, 48, 60). Comparative analysis of the 16S rRNA gene revealed that phytoplasmas form a distinct clade within the class Mollicutes (29, 44, 74). Within this class, the phytoplasmas cluster within the AAA (Asteroleplasma, Anaeroplasma, and Acholeplasma) clade rather than the SEM (Spiroplasma, Mycoplasma, and Entomoplasma) clade (8, 66). Most Mollicutes (including mycoplasmas and spiroplasmas) use UGA as a tryptophan codon in addition to the standard UGG tryptophan codon. In contrast, acholeplasmas and phytoplasmas use UGA as a stop codon (44).
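The codon-usage difference described above can be illustrated with a minimal Python sketch. It is not part of the original study: the function name `translate` and the example sequence are hypothetical, and the table string is the standard genetic code written in TCAG codon order, with UGA (TGA in DNA) reassigned to tryptophan to mimic the mycoplasma/spiroplasma code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Mycoplasma/spiroplasma code: identical except that TGA (UGA) encodes tryptophan.
MYCOPLASMA = {**STANDARD, "TGA": "W"}


def translate(dna, table):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGTGGTGAAAATAA"  # hypothetical: Met, Trp, TGA, Lys, stop
phytoplasma_like = translate(seq, STANDARD)    # TGA read as a stop codon
mycoplasma_like = translate(seq, MYCOPLASMA)   # TGA read as tryptophan
```

Under the first table the hypothetical reading frame ends at the UGA codon, whereas under the second it continues through it, which is why the choice of genetic code matters for ORF prediction in Mollicutes.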
In 2004, the provisional genus status “Candidatus Phytoplasma” was adopted based on the directions outlined previously by Murray and Stackebrandt (34, 52). The distinct position of phytoplasmas is based on 16S rRNA sequence homology and other properties like host range and vector specificity. Based on the “Candidatus” criteria, 26 “Candidatus Phytoplasma” species have been described (23) (http://www.bacterio.cict.fr). In Australia, “Candidatus Phytoplasma australiense” (hereafter abbreviated as “Ca. Phytoplasma australiense”), a member of the 16SrXII-B group, is widespread and associated with several diseases in economically important crops. These diseases include Australian grapevine yellows (61, 75), papaya dieback (41), strawberry lethal yellows (SLY), strawberry green petal (62), and pumpkin yellow leaf curl (81). In New Zealand, “Ca. Phytoplasma australiense” is associated with several plant diseases including SLY (2), phormium lethal yellows (41), Cordyline australis (cabbage tree) sudden decline, and coprosma lethal decline (3).
Sequence analysis based on the 16S rRNA gene showed that phytoplasmas associated with Australian grapevine yellows, strawberry green petal, SLY, papaya dieback, and phormium lethal yellows diseases shared 99.6 to 99.8% sequence homology (62). Streten and Gibb (82) previously showed that “Ca. Phytoplasma australiense” could be differentiated into subgroups based upon differences in both the tuf and ribosomal protein-encoding (rp) genes. The subgroups were referred to as 16SrXII-B tuf-Australia I, rp-A; tuf-New Zealand I, rp-B; and tuf-New Zealand II, rp-C. This level of diversity within “Ca. Phytoplasma australiense” was supported by a previous study by Andersen et al. (4).
Mollicutes are targets for genome sequencing projects due to their small genomes and economic importance in plant and animal diseases. Mycoplasma genitalium was the first mollicute and second bacterium to be fully sequenced (25). Whole-genome projects provide insight into an organism's biology, such as the minimal gene set for survival in a cell-free medium, nutritional requirements, energy metabolism, and pathogenicity factors, and help to clarify host-pathogen interactions (23).
To date, 17 mollicute genomes have been fully sequenced (http://cbi.labri.fr/outils/molligen/home.php), including two phytoplasmas, “Ca. Phytoplasma asteris” strains onion yellows mutant (OY-M) (60) and aster yellows witches’ broom (AY-WB) (9). Information derived from the two phytoplasma genomes includes features such as reduced metabolic functions compared to those of mycoplasmas, an absence of the pentose phosphate cycle, no ATP synthase subunits, and repeated DNA organized in potential mobile units (PMUs) (9, 60).
In this publication, we report the complete genome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A) and a comparative analysis with the two “Ca. Phytoplasma asteris” strains and members of the Mollicutes.
“Ca. Phytoplasma australiense” was transmitted from Gomphocarpus physocarpus (cottonbush) in Queensland, Australia, to periwinkle by grafting. The phytoplasma strain was maintained in periwinkle in an insect-proof glasshouse by periodic grafting. The transmitted phytoplasma strain was confirmed by PCR using specific primers (fMLO1 and rMLO1) that amplify the phytoplasma elongation factor (tuf) gene (76).
Chromosomal “Ca. Phytoplasma australiense” DNA was prepared as described previously by Neimark and Kirkpatrick (56), with modifications described previously by Padovan et al. (63). Instead of midribs, periwinkle flowers were identified as a source of phytoplasma DNA. Agarose plugs containing the phytoplasma DNA were arranged in stacks and separated by pulsed-field gel electrophoresis (PFGE) in a 1% gel using the CHEF DRIII apparatus (Bio-Rad, Munich, Germany) with the following parameters: 6 V/cm, a switch time of 20 to 100 s, 1× Tris-acetate-EDTA, and 14°C for 24 h. Yeast chromosomes (New England Biolabs, Frankfurt, Germany) were used as a molecular size marker.
The unstained chromosomal DNA was electroeluted from the excised PFGE agarose slice and concentrated by ethanol precipitation using glycogen as a carrier. Two shotgun libraries with average insert sizes of 1.5 and 3.5 kb were generated from sonicated DNA. Sheared DNA fragments were blunt ended or flushed with T4 and Klenow polymerase (New England Biolabs, Frankfurt, Germany) and ligated into vector pUC19 (Fermentas, St. Leon-Rot, Germany). The recombinant plasmids were electroporated into Escherichia coli strain DH10B (Invitrogen, Karlsruhe, Germany). Plasmids were isolated from the clones and sequenced using ABI3730XL capillary sequencer systems (Applied Biosystems, Darmstadt, Germany). Additionally, a fosmid library was constructed (pCC1FOS; Epicenter Biotechnologies, Hessisch Oldendorf, Germany) according to the manufacturer's instructions.
Sequences were assembled using Phrap (http://www.genome.bnl.gov/Software/UW/) and the Consed package (version 14.00) (28). Gaps and regions of poor sequence quality were improved by resequencing, primer walking, and long-range PCR. The total sequence data showed a 14-fold coverage and high sequence quality with only one error in 100,000 bases.
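As a rough worked example, the quality figures quoted above can be put on the Phred scale, where Q = -10 log10(P) for an error probability P. The sketch below is illustrative only; the variable names are hypothetical, and the total read volume is an approximation derived from the reported coverage and chromosome length rather than a number given in the text.

```python
import math

genome_bp = 879_324        # chromosome length reported in this paper
error_rate = 1 / 100_000   # "one error in 100,000 bases"
coverage = 14              # reported fold coverage

phred_q = -10 * math.log10(error_rate)     # Phred-scaled consensus quality
expected_errors = genome_bp * error_rate   # errors expected over the chromosome
total_read_bases = genome_bp * coverage    # approximate bases sequenced
```

Under these assumptions the consensus corresponds to roughly Phred Q50, about nine expected errors across the chromosome, and on the order of 12 Mb of raw sequence.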
Glimmer 2.0 was used to predict open reading frames (ORFs) in the finished sequence (19). ORF predictions were manually adjusted using ARTEMIS (70) and FlipORF (BioManager; Entigen Corporation) (22). Similarity searches were carried out using BLASTP (1) against the UniProt database. Functional assignments were determined using the INTERPRO system (7). The results were entered in the Web-based platform HTGA (High-Throughput Genome Annotation) (65) and used for final annotation. tRNA genes were identified by the algorithm described at the Washington University Department of Genetics website (http://www.genetics.wustl.edu/eddy/tRNAscan-SE/) (46).
“Ca. Phytoplasma australiense” metabolic pathways were reconstructed using the Kyoto Encyclopedia of Genes and Genomes database (http://www.genome.jp/kegg/). Membrane transporters were determined using TransportDB (http://www.membranetransport.org/), and insertion sequences (ISs) were identified using the IS Finder program (http://www-is.biotoul.fr/index.html?is_special_name=ISRso11). Inverted repeats were determined by the Inverted Repeats Finder (http://tandem.bu.edu/cgi-bin/irdb/irdb.exe). Comparative genome analysis of “Ca. Phytoplasma australiense,” “Ca. Phytoplasma asteris” (strains OY-M [GenBank accession number AP006628] and AY-WB [GenBank accession number CP000061]), and Mollicutes (Mycoplasma capricolum subsp. capricolum [GenBank accession number CP000123], M. mycoides subsp. mycoides SC [GenBank accession number BX293980], Ureaplasma parvum [GenBank accession number AF222894], Mycoplasma penetrans HF-2 [GenBank accession number BA000026], Mycoplasma gallisepticum strain R [GenBank accession number AE015450], Mycoplasma pneumoniae M129 [GenBank accession number U00089], Mycoplasma genitalium G37 [GenBank accession number L43967], Mycoplasma mobile 163K [GenBank accession number AE017308], Mycoplasma hyopneumoniae 232 [GenBank accession number AE017332], Mycoplasma pulmonis UAB CTIP [GenBank accession number AL445566], and Mycoplasma synoviae [GenBank accession number AE017245]) were conducted using the Molligen 2.0 database (http://cbi.labri.fr/outils/molligen/home.php).
The “Candidatus Phytoplasma australiense” chromosome sequence was deposited in the GenBank, EMBL, and DDBJ nucleotide sequence databases under accession number AM422018.
The “Ca. Phytoplasma australiense” genome comprises one circular chromosome of 879,324 bp with a G+C content of 27% (Fig. 1; see Table S1 in the supplemental material) and a 3.7-kb plasmid (84). ORFs identified in the plasmid were not encoded on the “Ca. Phytoplasma australiense” chromosome. The chromosome contained two rRNA operons, 35 tRNAs (one pseudo-tRNA), two miscellaneous RNA genes (RNase P RNA and transfer mRNA), and 839 predicted ORFs (85) with a minimal size of 30 amino acid (aa) residues (Table 1 and Fig. 1). Five hundred two protein-coding genes had an assigned function, and 337 genes were hypothetical proteins with unknown function (Table 1). In agreement with “Ca. Phytoplasma asteris” strains OY-M (59) and AY-WB (9), UGA was also used as a stop codon for ORF prediction.
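The two ORF criteria mentioned above (UGA read as a stop codon and a minimum length of 30 aa) can be sketched as a naive forward-strand scan. This is an assumption-laden illustration, not the method used in the study: Glimmer relies on a statistical gene model, whereas the hypothetical `find_orfs` below simply looks for ATG-to-stop stretches and ignores the reverse strand and alternative start codons.

```python
STOPS = {"TAA", "TAG", "TGA"}  # TGA treated as a stop codon, as in phytoplasmas


def find_orfs(dna, min_aa=30):
    """Return (start, end) pairs for forward-strand ORFs (ATG ... stop).

    An ORF is kept only if it has at least `min_aa` codons before the stop.
    `end` is exclusive and includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical test sequences: 30 and 29 codons before a TGA stop.
long_enough = "ATG" + "AAA" * 29 + "TGA"
too_short = "ATG" + "AAA" * 28 + "TGA"
orfs_long = find_orfs(long_enough)
orfs_short = find_orfs(too_short)
```

With the 30-aa threshold the first sequence yields one ORF and the second yields none; treating TGA as tryptophan instead would remove that stop and change both results.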
Of the 839 predicted ORFs, 202 (24% of the predicted ORFs), covering 147,146 bp of the chromosome, were present as multiple copies and comprised 58 paralog groups throughout the genome. One hundred forty-three ORFs covering 106,682 bp of the chromosome (12.1% of the genome) were organized into gene clusters referred to as PMUs. A gene cluster was deemed to be a PMU if it contained genes involved in DNA replication, recombination, and repair, such as those encoding phage integrases, replicative DNA helicase, and HimA (DNA binding protein HU). Five PMUs were identified (Fig. 1 and Table 2).
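The percentages quoted above can be checked against the ORF count and the chromosome length given in this paper. The short sketch below (variable and function names are hypothetical) shows that the 24% figure corresponds to the share of predicted ORFs, while the 12.1% figure corresponds to the share of chromosome sequence.

```python
chromosome_bp = 879_324
predicted_orfs = 839

multicopy_orfs, multicopy_bp = 202, 147_146
pmu_orfs, pmu_bp = 143, 106_682


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


multicopy_orf_share = pct(multicopy_orfs, predicted_orfs)  # share of ORFs
multicopy_bp_share = pct(multicopy_bp, chromosome_bp)      # share of sequence
pmu_orf_share = pct(pmu_orfs, predicted_orfs)              # share of ORFs
pmu_bp_share = pct(pmu_bp, chromosome_bp)                  # share of sequence
```

By this arithmetic the multicopy ORFs make up 24.1% of the ORFs but 16.7% of the sequence, and the PMU-associated ORFs make up 17.0% of the ORFs and 12.1% of the sequence.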
PMU1 (~7,600 bp) was repeated five times throughout the genome, covering a total of 46,215 bp. All genes encoded in this mobile group were involved in DNA replication, recombination, and repair (replicative DNA helicase, DNA binding protein HU, and phage integrase); nucleotide transport and metabolism (thymidylate kinase and IMP dehydrogenase/GMP reductase); translation; ribosomal structure; and biogenesis (N6-adenine-specific DNA methylase). Replicative DNA helicases encoded in this PMU had sequence similarity with a LambdaSa2 prophage from Streptococcus agalactiae serotype V (GenBank accession number Q8DXH8). PMU4 (~11,000 bp) was the largest PMU, covering a total of 30,776 bp in the “Ca. Phytoplasma australiense” chromosome, which encoded genes similar to PMU1 except for the inclusion of a single-stranded DNA binding gene similar to Staphylococcus prophage phiPV83 (GenBank accession number Q9MBS1) and DNA-directed RNA polymerase-specialized sigma subunit (fliA). PMU2 and PMU3 encoded genes similar to those included in PMU1 and PMU4 except that these units did not include many of the conserved hypothetical proteins that consistently appeared in PMU1 and PMU4. PMU5 (~7,700 bp) encoded genes for nucleotide transport and metabolism (thymidylate kinase and IMP dehydrogenase/GMP reductase), translation, ribosomal structure, and biogenesis (N6-adenine-specific DNA methylase). The thymidylate kinase (CAM11639 and CAM11700) encoded in this PMU differed from the one found in PMU1 to PMU4 since it showed sequence similarity with orthopoxviruses.
Some of the genes listed in Table 2 as part of the identified PMUs were also located elsewhere on the genome. However, gap analysis and sequence alignments showed that their nucleotide sequences were different from those clustered in mobile units. Generally, paralog genes within PMUs were 99 to 100% similar to each other and only 50% similar to genes that were not clustered in PMUs, i.e., single-stranded DNA binding protein (ssb [CAM11486]), DNA primase (dnaG [CAM11948]), and DNA-binding protein HU (huP [CAM12017]). A single “Ca. Phytoplasma australiense” tmk gene copy that was not part of the two paralog families was also identified. This single copy was only 43.5% similar to those in the paralog families. Multiple amino acid alignment indicated that this single-copy tmk was similar to tmk-b from OY-M and AY-WB. One paralog family containing two ORFs (CAM11639 and CAM11700) was similar to the tmk-a gene from OY-M and AY-WB but also to orthopoxviruses. This tmk gene was encoded on PMU5 (Table 2). The paralog tmk gene from the second group showed very low similarity to the tmk genes in the OY-M and AY-WB chromosomes. We also found clusters that resembled PMUs. These derivatives contained fragmented ORFs similar to those encoded in PMUs. The “Ca. Phytoplasma australiense” genome contained multiple copies of many genes, indicating gene duplication, but it also contained 159 ORFs (19% of the predicted ORFs) of fragmented genes (Fig. 1). Although genes found in PMUs were similar for all three phytoplasmas, generally, the PMUs found in the “Ca. Phytoplasma australiense” chromosome were smaller than those found in strain AY-WB. The largest PMU in “Ca. Phytoplasma australiense” was only ~11,000 bp, compared to the largest in AY-WB being 20,093 bp.
Elements were considered to be associated with DNA insertion and deletion events on the basis of their similarity to phage integrase proteins, transposases from other phytoplasmas, and ISs from members of the IS3 family (40). The coding regions in the “Ca. Phytoplasma australiense” genome that we referred to as phage integrase-like were similar to transposases belonging to the IS30 group of the IS3 family, phytoplasma transposases, phage integrases, and transposases from phages. BlastX searches of “Ca. Phytoplasma australiense” phage integrase-like proteins indicated the presence of several conserved motifs such as the helix-turn-helix (HTH) motif (Fig. 2). Similarly, conserved regions were observed for putative transposases such as the Rve motif, which is the integrase core domain, and the DDE motif (Fig. 2). “Ca. Phytoplasma australiense” ORF CAM11686, located on PMU5 (320 aa), had similarities to both phytoplasma transposases (200 aa) and phage transposases (100 aa). A similar result was found for all full-length “Ca. Phytoplasma australiense” phage integrase-like proteins. A multiple sequence alignment of phytoplasma transposases, phage transposases, IS transposases, phage integrases, and a representative of “Ca. Phytoplasma australiense” phage integrase-like elements (Fig. 2) revealed a consensus of 187 aa. Of these, over 120 aa were found in either or both phage integrases and phage transposases as well as phytoplasmas.
A characteristic of transposable elements is the protein recognition binding sites. They include inverted and direct repeat sequences for transposases and IS elements and two recognition sites (attP and attB) for site-specific recombinases such as phage integrases. We could not locate the two phage integrase recognition sites on the “Ca. Phytoplasma australiense” chromosome. Inverted repeat sequences were identified on the AY-WB genome, and these were used to search for similar sequences on the “Ca. Phytoplasma australiense” genome, but no similar regions were identified. Using the Inverted Repeats Finder program, inverted repeats were located on several sections of the “Ca. Phytoplasma australiense” chromosome; however, none of these flanked IS-like elements or PMUs.
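To make the notion of flanking inverted repeats concrete, the sketch below checks whether the sequence immediately upstream of an element is the reverse complement of the sequence immediately downstream. It is a deliberately simplified illustration and not the algorithm of the Inverted Repeats Finder program: the function names, the `min_len` threshold, and the example sequences are all hypothetical, and only perfect repeats directly adjacent to the element are detected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def flanking_inverted_repeat(dna, start, end, min_len=8):
    """Length of the perfect inverted repeat flanking dna[start:end].

    Walks outward from both ends of the element and counts positions where
    the downstream base is the complement of the mirrored upstream base.
    Returns 0 if the repeat is shorter than `min_len`.
    """
    left, right = dna[:start], dna[end:]
    k = 0
    while (k < min(len(left), len(right))
           and right[k] == reverse_complement(left[-(k + 1)])):
        k += 1
    return k if k >= min_len else 0


# Hypothetical element flanked by a 10-bp inverted repeat.
ir = "GGCATTCAGT"
element = "ATGAAACCCGGGTTT"
flanked = "CC" + ir + element + reverse_complement(ir) + "AA"
unflanked = "CC" + ir + element + ir + "AA"  # direct, not inverted, repeat

start = 2 + len(ir)
end = start + len(element)
repeat_len = flanking_inverted_repeat(flanked, start, end)
no_repeat_len = flanking_inverted_repeat(unflanked, start, end)
```

In this toy case the first construct returns a 10-bp repeat and the second returns 0, which mirrors the kind of negative result reported above for the regions around IS-like elements and PMUs.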
In an analysis of the regions with similarities to phage integrases, four of the nine complete genes had both OrfA and OrfB, which may form a functional OrfA/OrfB fusion protein upon a −1 translational frameshift. Five ORFs (CAM11461, CAM11686, CAM11954, CAM12122, and CAM12145) encoded both the HTH and the catalytic DDE motifs, suggesting that these proteins might be functional. Two full-length phage integrase-like elements (CAM11398 and CAM11541) did not encode a functional N terminus but contained the conserved DDE motif in the integrase catalytic region. Of the 30 fragmented genes with similarities to phage integrases, three encoded the conserved DDE motif but did not contain the HTH conserved motif, while another three encoded the conserved HTH motif but not the DDE motif, which suggests that they may not be functional. OrfA with the HTH DNA binding motif can compete with transposases to bind terminal inverted repeats, and the OrfB protein can catalyze sequence cleavage. The presence of both OrfA and OrfB can also inhibit the formation of an active transposome that includes the transposase, the terminal inverted repeats, and the target DNA. But since terminal inverted repeats around the phage integrase-like proteins were not located on the “Ca. Phytoplasma australiense” chromosome, the exact function of these proteins is unclear. One transposase found in AY-WB PMU1 encoded both OrfA and OrfB, suggesting that it may be able to produce a full-length ORFAB fused-frame transposase (9).
Of the 39 elements with similarity to phage integrases, 20 were similar to those of bacteriophages from Lactobacillus casei (TrEMBL accession number Q6J1X2) and Escherichia coli (TrEMBL accession numbers Q6H9S3, Q8X555, Q7Y2I6, and Q6H9S6), and five of the putative phage integrase coding domains were similar to bacteriophage Sf6 from Shigella flexneri (TrEMBL accession number Q716C2).
The “Ca. Phytoplasma australiense” genome, like the OY-M and the AY-WB genomes, lacked genes for amino sugar, nucleotide sugar, glyoxylate, and dicarboxylate biosynthesis. However, it encoded 13 genes involved in glycolysis and gluconeogenesis. Eight of these genes are essential for glycolysis, and the remaining five genes encode enzymes used in gluconeogenesis. The same eight genes involved in glycolysis were also encoded by the two “Ca. Phytoplasma asteris” strains. Five glycolytic genes (coding for glucose-6-phosphate isomerase, 6-phosphofructokinase, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase, enolase, and pyruvate kinase) had a different gene order in the “Ca. Phytoplasma australiense” chromosome compared to those of the two “Ca. Phytoplasma asteris” strains. In the “Ca. Phytoplasma australiense” chromosome, three glycolytic genes (6-phosphofructokinase, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase, and enolase) were located over 130 kb from the remaining two glycolytic genes. In the “Ca. Phytoplasma asteris” strains, the five glycolytic genes were within a 30-kb region. The presence of these genes in “Ca. Phytoplasma australiense” suggests that a functional glycolytic pathway may exist. All three phytoplasma genomes were missing complete pathways for amino and nucleotide sugar, glyoxylate, and dicarboxylate metabolism.
The “Ca. Phytoplasma australiense” genome encoded only one gene involved in oxidative phosphorylation (inorganic pyrophosphatase). Although the “Ca. Phytoplasma australiense” genome encoded genes for the acyl carrier protein and the fatty acid/phospholipid synthesis protein involved in fatty acid biosynthesis, it encoded only one gene (1-acyl-sn-glycerol-3-phosphate acyltransferase) for glycerolipid metabolism. All three phytoplasma genomes were missing complete pathways for ATP synthesis, fatty acid metabolism, and carbon dioxide fixation.
The “Ca. Phytoplasma australiense” genome lacked genes involved in the synthesis of several essential amino acids. The genome encoded cytosine-specific DNA methylase involved in methionine metabolism, cysteinyl-tRNA synthetase involved in cysteine metabolism, and methyltransferase and N6-adenine-specific methylase involved in histidine, tyrosine, and tryptophan metabolism; these were not found in OY-M or AY-WB, suggesting that these genes were strain specific. All three phytoplasma genomes were missing complete pathways for phenylalanine metabolism, the urea cycle, metabolism of amino groups, and d-glutamine, d-glutamate, d-arginine, d-ornithine, d-alanine, and d-glutathione metabolism.
The “Ca. Phytoplasma australiense” genome carried genes encoding proteins such as thiamine biosynthesis protein and intracellular protease/amidase involved in thiamine metabolism; riboflavin kinase involved in riboflavin metabolism; phosphopantetheinyl transferase (holo-ACP synthase) involved in pantothenate and coenzyme A biosynthesis; and dihydrofolate reductase involved in folate biosynthesis, but it lacked metabolic genes for vitamin B6.
Multiproteome differential queries (Molligen) plus the BLAST database (NCBI) were used to identify “Ca. Phytoplasma australiense”-specific genes. “Ca. Phytoplasma australiense” carried the highest number of strain-specific genes (197) compared to OY-M (86) and AY-WB (66). Some “Ca. Phytoplasma australiense”-specific genes coded for sucrose phosphorylase (gtfA), cytosine-specific DNA methylase, leucyl aminopeptidase (pepA), metallophosphoesterase, riboflavin kinase (ribF), regulatory protein (spxA), restriction endonuclease (RsrlR), S-adenosyl-methyltransferase (mraW), and regulation factor cyclic AMP (fic). “Ca. Phytoplasma australiense” pepA was similar to the proteins found in Rhizobium etli CFN 42 and Agrobacterium tumefaciens strain C58.
Strain-specific metabolic genes were also identified for all three phytoplasmas (Table 3). The percentages of genes in each functional category were analyzed and found to be similar for all three phytoplasmas (data not shown). Most of these genes were involved in translation, membrane transport, or carbohydrate metabolism. Almost 50% of protein-coding genes had unknown function.
Whole-genome alignment of the three phytoplasma genomes allowed insight into gene synteny. Alignments between “Ca. Phytoplasma australiense” and AY-WB (Fig. 3a) and OY-M (Fig. 3b) indicated small sections of gene synteny between the genomes. The longest alignment region was 62 kb (Fig. 3a and b) and is defined by norM at position 629134 in “Ca. Phytoplasma australiense” (CAM11967), position 453835 in AY-WB (AY_WB441), and position 332195 in OY-M (PAM280) and by gpsA at position 691298 in “Ca. Phytoplasma australiense” (CAM12018), position 507182 in AY-WB (AY_WB480), and position 269418 in OY-M (PAM241). This ~62-kb region shows a conserved gene order and consists of genes involved in replication, repair, transcription, translation, membrane transport, protein export, and nucleotide, amino acid, and lipid metabolism. The AY-WB and OY-M genome alignment (Fig. 3c) produced an X-shaped pattern, indicating a conserved gene order of the majority of AY-WB and OY-M genes but in an inverted orientation. The area (Fig. 3c) of ~250 kb was defined by lplA at position 423992 in AY-WB (AY_WB412) and position 354087 in OY-M (PAM309) and by glnQ at position 660824 in AY-WB (AY_WB634) and position 103752 in OY-M (PAM079). The ~62-kb conserved section for all three phytoplasmas was located within this ~250-kb region.
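The anchor-gene coordinates given above are enough to recompute the spans of the conserved regions and their relative orientation. The following sketch is illustrative only; the genome label "CPa", the dictionary layout, and the function `span_and_orientation` are hypothetical, and the simple subtraction assumes that neither region crosses the origin of the circular chromosome.

```python
# Anchor gene positions (bp) as reported in the text.
anchors = {
    "norM": {"CPa": 629134, "AY-WB": 453835, "OY-M": 332195},
    "gpsA": {"CPa": 691298, "AY-WB": 507182, "OY-M": 269418},
    "lplA": {"AY-WB": 423992, "OY-M": 354087},
    "glnQ": {"AY-WB": 660824, "OY-M": 103752},
}


def span_and_orientation(first, second, genome_a, genome_b):
    """Return (span in a, span in b, orientation) for two anchor genes.

    Orientation is 'collinear' when both genomes run in the same direction
    from `first` to `second`, and 'inverted' otherwise.
    """
    da = anchors[second][genome_a] - anchors[first][genome_a]
    db = anchors[second][genome_b] - anchors[first][genome_b]
    orientation = "collinear" if da * db > 0 else "inverted"
    return abs(da), abs(db), orientation


cpa_vs_aywb = span_and_orientation("norM", "gpsA", "CPa", "AY-WB")
cpa_vs_oym = span_and_orientation("norM", "gpsA", "CPa", "OY-M")
aywb_vs_oym = span_and_orientation("lplA", "glnQ", "AY-WB", "OY-M")
```

Under these assumptions the norM-gpsA region spans about 62 kb in “Ca. Phytoplasma australiense” and OY-M and about 53 kb in AY-WB, and the lplA-glnQ region runs in opposite directions in AY-WB and OY-M, consistent with the inverted orientation described for the X-shaped alignment.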
“Ca. Phytoplasma australiense” encodes the complete ABC subfamily capable of importing methionine, cobalt, zinc/manganese, dipeptides/oligopeptides, spermidine/putrescine, and sugars. However, “Ca. Phytoplasma australiense” lacks the periplasmic component for the amino acid transport system. “Ca. Phytoplasma australiense” encodes two copies of the complete ABC transport system for peptide and nickel (see Table S2 in the supplemental material). It also encodes the permease component for the ABC-type arginine system, which was not found in the “Ca. Phytoplasma asteris” strains. “Ca. Phytoplasma australiense” also encodes one putative transporter (CAM11945) that was similar to hemolysin. Signal peptides were identified for three solute binding “Ca. Phytoplasma australiense” ABC transporters as well as some hypothetical proteins (data not shown). All three phytoplasma genomes encoded a large number of membrane transporters responsible for amino acid uptake, inorganic ion uptake, dipeptide/oligopeptide uptake, spermidine/putrescine uptake, sugar uptake, and multidrug resistance as well as some unclassified transporters and other transporters such as cation transport ATPases (see Table S2 in the supplemental material). “Ca. Phytoplasma asteris” strain AY-WB was missing the methionine permease component, while strain OY-M was missing the complete ABC family for methionine. Both “Ca. Phytoplasma asteris” strains were missing the periplasmic component of the cobalt transport system. Both “Ca. Phytoplasma australiense” and strain AY-WB encoded the complete maltose import system compared to strain OY-M, which lacked the periplasmic component.
All three phytoplasma genomes also lacked genes encoding type I and type II secretion pathways, the phosphoenolpyruvate-dependent phosphotransferase system (PTS) involved in membrane transport, and the two-component system involved in signal transduction.
Comparative analysis of the three phytoplasma genomes using multiproteome differential queries showed that each phytoplasma shared a large number of similar genes, 570 from “Ca. Phytoplasma australiense,” 571 from OY-M, and 505 from AY-WB (Fig. 4). The majority of these similar genes had assigned functions. The similar genes shared among the three phytoplasmas included those that were present in multiple copies, such as the DNA binding protein HU, which had 11 copies in “Ca. Phytoplasma australiense,” 15 copies in OY-M, and six copies in AY-WB. The numbers of multicopy, single, and fragmented genes differed among phytoplasmas for any given similar gene; therefore, the total number of similar genes shared between phytoplasma genomes was not identical. The “Ca. Phytoplasma australiense” genome encoded 38 genes similar to those of OY-M, while OY-M shared 15 genes with “Ca. Phytoplasma australiense”; of these similar genes, none were found in AY-WB. For example, the uncharacterized phage-associated protein (gepA) had five copies in “Ca. Phytoplasma australiense” and only one copy in OY-M. The “Ca. Phytoplasma australiense”- and OY-M-specific genes included metabolic genes (Table 3) and mostly hypothetical proteins with unknown function. Similar results were observed for genes shared between “Ca. Phytoplasma australiense” and AY-WB (Table 3). The numbers of similar genes shared between the two “Ca. Phytoplasma asteris” strains were considerably larger, with 79 genes from OY-M and 82 genes from AY-WB (Fig. 4), none of which were encoded on the “Ca. Phytoplasma australiense” genome, therefore suggesting gene conservation between the two closely related “Ca. Phytoplasma asteris” strains.
A three-way genome comparison on the basis of similar protein sequences was used to determine a tax plot for all three phytoplasmas. “Ca. Phytoplasma australiense” was used as the reference phytoplasma. Seventy-one “Ca. Phytoplasma australiense” proteins had similarity to proteins in both the OY-M and AY-WB genomes, 454 proteins were similar to OY-M only, and 314 were similar to AY-WB only. Compared to M. genitalium and M. gallisepticum, on the basis of similar protein sequences, “Ca. Phytoplasma australiense” had 260 proteins similar to both Mycoplasma species, compared to 228 for AY-WB and 259 for OY-M.
The three phytoplasma genomes were compared with M. capricolum subsp. capricolum, M. mycoides subsp. mycoides SC, M. penetrans, M. gallisepticum strain R, M. pneumoniae, M. mobile 163K, M. hyopneumoniae 232, M. pulmonis, M. synoviae, and Ureaplasma urealyticum/Ureaplasma parvum to determine how many phytoplasma genes were conserved in their distant relatives. AY-WB shared the smallest number of genes (175 genes) with the other listed Mollicutes, compared with 202 in “Ca. Phytoplasma australiense” and OY-M. “Ca. Phytoplasma australiense” encodes 184 genes that were not found in the Mollicutes used in this comparison, while OY-M and AY-WB had 82 and 69 unique genes, respectively (data not shown). Of the 184 genes, 137 were coding for hypothetical proteins, while the remaining ORFs encoded genes including transposases, phage integrases, and N6-adenine-specific DNA methyltransferase from PMUs.
Recent phytoplasma genome sequence projects have provided insight into their genetic setup, metabolic capabilities, possible virulence factors, and proteins that might interact with their hosts. Phytoplasma genomic research is still in its infancy but will advance quickly if more sequenced phytoplasma genomes are available. A comparative analysis of several strains of a phytoplasma species living in the same host but displaying differences in virulence may provide insight into pathogenicity factors involved in plant disease.
The “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A) chromosome is 879,324 bp in size and is therefore one of the largest phytoplasma chromosomes to be completely sequenced. One extrachromosomal DNA element (pCPa, 3.7 kb [GenBank accession number DQ119295]) was previously identified in “Ca. Phytoplasma australiense” (85).
The “Ca. Phytoplasma australiense” genome contains PMUs, as defined previously by Bai et al. (9), that are gene clusters that encode elements with similarities to phage integrases and genes involved in replication, repair, or recombination.
We identified five potential mobile clusters in the “Ca. Phytoplasma australiense” genome. These elements were identified based upon the location of the phage integrase-like element; the identification of genes involved in replication, repair, and recombination; as well as their conserved gene organization. The coding regions in the “Ca. Phytoplasma australiense” genome with similarities to phage integrase also have similarities to transposases belonging to the IS30 group of the IS3 family and transposases encoded on phage elements. Some phage integrase-like coding domains may also be involved in chromosome rearrangement since they were similar to the oac gene from bacteriophage Sf6 from Shigella flexneri (16).
Most of the “Ca. Phytoplasma australiense” PMUs contained the tmk gene that encodes thymidylate kinase. This enzyme catalyzes the transfer of a terminal phosphoryl group from ATP to dTMP and is crucial for de novo synthesis and salvage pathways for pyrimidine deoxyribonucleotides (50). “Ca. Phytoplasma australiense” carries multiple copies of the tmk gene in two paralog families. The OY-M phytoplasma genome encodes two tmk genes, tmk-a and tmk-b (50), where multiple copies of tmk-a exist in PMUs (9) but only tmk-b has thymidylate kinase activity (50). One paralog family containing two ORFs (CAM11639 and CAM11700) was similar to the tmk-a gene from OY-M and AY-WB but also to orthopoxviruses (6), and this supports a link between viruses and phytoplasma extrachromosomal DNA (42, 55, 57, 58, 67, 84).
As has been reported previously for the AY-WB phytoplasma (9), “Ca. Phytoplasma australiense” mobile units may undergo transposition reactions in a replicative manner. There are two lines of evidence for this: first, “Ca. Phytoplasma australiense” contained multiple PMUs as well as PMU-like clusters; second, these mobile groups contained genes encoding DNA helicase, primase, and single-stranded DNA binding protein, all of which are involved in DNA replication, recombination, or repair. “Ca. Phytoplasma australiense” PMUs also included single-stranded DNA binding proteins and DNA helicases that were similar to prophage or phage entities that have a role in bacterial diversification by horizontal gene transfer (15). The presence of phage integrase-like sequences within mobile DNA groups indicates the existence of phage-based horizontal transfer (73) or a new mechanism. The mechanism by which “Ca. Phytoplasma australiense” PMUs may have been laterally transferred and accumulated throughout the genome is unknown, since the features that characterize composite transposons, such as inverted repeat regions, were not found. If the PMUs are mobile, they may rely on helper elements to provide conjugative transfer, or they may be packaged into phage particles (45, 71, 81).
“Ca. Phytoplasma australiense” encoded 197 strain-specific genes compared to other phytoplasmas and 184 strain-specific genes compared to the Mollicutes studied. Most of these genes encode hypothetical proteins of unknown function. Although the exact function of PepA in “Ca. Phytoplasma australiense” is unknown, its location on the genome raises interesting possibilities. Upstream from pepA are both a hemolysin-like protein that is a potential virulence factor and a phage integrase-like element that, if functional, can integrate DNA. ABC transporters are located downstream from pepA. Since PepA is known to bind proteins, the presence of these genes upstream and downstream of pepA suggests that it may not be used only as a housekeeping gene (49). A recent review by Matsui et al. (49) suggested that PepA may have secondary functions as a toxin receptor, in vesicular trafficking, or as a site-specific recombination factor, or that it may interact with the ABC-like spermidine/putrescine-binding transporters.
Another strain-specific “Ca. Phytoplasma australiense” protein is the bacterial regulatory protein SpxA. This protein represses the transcription of genes involved in growth and development during unfavorable conditions by binding to RNA polymerase and is commonly found in gram-positive bacteria with low GC content (21, 53, 54, 88), including most Mollicutes. In B. subtilis, Spx exerts positive and negative control over transcription initiation, particularly during oxidative stress (88). Spx also exerts redox-sensitive transcription control over trxA and trxB, two genes that are involved in thiol homeostasis. This control depends on the presence of a CXXC motif (found in the “Ca. Phytoplasma australiense” SpxA protein), which implies that Spx is involved in the cell's response to thiol-specific oxidative (disulfide) stress (54). In mycoplasmas, the thioredoxin reductase system involving TrxA and TrxB is required by the pathogen for defense against reactive oxygen species such as hydrogen peroxide produced by the host (11, 25, 32). This system differs from those of other bacteria that encode catalases, peroxidases, and superoxide dismutases to provide defense against oxidative stress (59, 83). All three phytoplasmas carry sodA, the gene for superoxide dismutase. It was previously reported that OY-M used a defense system distinct from that of mycoplasmas (59). However, since “Ca. Phytoplasma australiense” carries spxA, we speculate that this phytoplasma may use either or both systems in response to oxidative stress within the cell.
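The CXXC motif mentioned above (two cysteines separated by any two residues) can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch only: the function name `find_cxxc_motifs` and the example peptide are hypothetical and invented for illustration, the peptide is not the SpxA sequence, and this is not the procedure used in the study.

```python
import re


def find_cxxc_motifs(protein_seq):
    """Return (1-based position, motif) for every C-X-X-C match.

    A lookahead is used so that overlapping motifs are all reported.
    """
    seq = protein_seq.upper()
    return [
        (match.start() + 1, match.group(1))
        for match in re.finditer(r"(?=(C[A-Z]{2}C))", seq)
    ]


# Hypothetical peptide, invented for illustration; NOT a real SpxA sequence.
example_peptide = "MKLYTSPGCAPCKKAW"
print(find_cxxc_motifs(example_peptide))
```

A motif hit on its own says nothing about redox activity; it only flags candidate cysteine pairs for closer inspection.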
“Ca. Phytoplasma australiense” metabolic pathways are similar to those of OY-M and AY-WB. Essentially, all three phytoplasmas lacked functional metabolic pathways for sugar metabolism, ATP synthesis, CO2 fixation, fatty acid metabolism, the urea cycle, both type I and type II secretion systems, and the PTS. The missing PTS sets phytoplasmas apart from members of the SEM clade, since phytoplasmas are unable to import sugars using this multiprotein system. Phytoplasmas instead may rely on their ABC transporters to import sugars (9).
In bacteria, ATP synthase has eight subunits (24, 26, 30, 37, 72). Mycoplasmas and ureaplasmas encode all eight subunits of the FoF1-type ATP synthase and utilize the transmembrane potential for ATP synthesis (66), but all three phytoplasma genomes sequenced to date lack all eight subunits. Phytoplasma genomes do not encode cytochrome genes and therefore lack a functional oxidative phosphorylation pathway. Studies with B. subtilis and E. coli show that in the absence of oxidative phosphorylation, ATP could be synthesized by substrate-level phosphorylation (72). All three phytoplasma genomes encode P-type ATPases, suggesting that they may be able to generate electrochemical gradients over the membrane (9) by actively transporting the substrate across the membrane and maintaining the gradient potential, thus providing further evidence that phytoplasmas are able to synthesize ATP in the absence of the oxidative phosphorylation pathway.
The number of similar genes between the two “Ca. Phytoplasma asteris” strains and the “Ca. Phytoplasma australiense” strain is indicative of the relationship of the strains. Based on the similarity of genes, OY-M and AY-WB are more closely related to each other. The number of strain-specific genes in “Ca. Phytoplasma australiense” compared to OY-M and AY-WB reflected the larger genome size as well as the evolutionary divergence between “Ca. Phytoplasma australiense” and the two “Ca. Phytoplasma asteris” strains. When the OY-M and AY-WB genomes were aligned, the “X” pattern was characteristic of closely related and recently diverged genomes. This was also observed between M. genitalium and M. pneumoniae (27). Whole-genome alignment between “Ca. Phytoplasma australiense” and the two “Ca. Phytoplasma asteris” genomes showed no conservation of gene synteny, although the number of similar genes was consistent.
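The “X” pattern described above arises when ortholog positions in two genomes fall either on the diagonal of a dot plot (collinear genes) or on the anti-diagonal (genes inverted around the origin or terminus). The sketch below illustrates that idea with a toy classifier. It is an assumption-laden illustration, not the whole-genome alignment method used in the study: the function name, the tolerance value, and the coordinates are all hypothetical, and wrap-around at the chromosome origin is ignored.

```python
def x_pattern_fractions(pairs, len_a, len_b, tol=0.05):
    """Classify ortholog position pairs from two genomes.

    pairs: list of (pos_a, pos_b) start coordinates of orthologous genes.
    Positions are scaled to [0, 1] by genome length. A pair is counted as
    'diagonal' if the scaled positions agree within tol, and as
    'anti-diagonal' if they sum to 1 within tol. Returns the fractions
    (diagonal, anti_diagonal, off_pattern).
    """
    if not pairs:
        raise ValueError("at least one ortholog pair is required")
    diagonal = anti_diagonal = 0
    for pos_a, pos_b in pairs:
        x = pos_a / len_a
        y = pos_b / len_b
        if abs(x - y) <= tol:
            diagonal += 1
        elif abs(x + y - 1.0) <= tol:
            anti_diagonal += 1
    total = len(pairs)
    off_pattern = total - diagonal - anti_diagonal
    return diagonal / total, anti_diagonal / total, off_pattern / total


# Toy coordinates on two hypothetical 1,000-bp genomes (illustration only).
toy_pairs = [(100, 100), (200, 210), (300, 700), (400, 600), (500, 90)]
print(x_pattern_fractions(toy_pairs, 1000, 1000))
```

Under this simplification, two closely related genomes would place most pairs on the two arms of the X, whereas a comparison lacking synteny would leave most pairs off both arms.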
Obligate pathogens rely on their host for certain nutrients such as amino acids, cofactors, nucleotides, and other compounds (87). This is reflected in the large number of important membrane transporters that are retained in the genome (17). All three phytoplasmas encode a large number of ABC transporters, particularly those capable of importing sugars such as maltose, trehalose, sucrose, and palatinose. The OY-M genome carries an incomplete gene/pseudogene gtfA for sucrose phosphorylase that is important for sucrose cleavage (60). This gene is absent from the AY-WB genome (9), but the complete ORF is found in the “Ca. Phytoplasma australiense” genome. This suggests that once sucrose is imported into the cell, “Ca. Phytoplasma australiense” can convert it into glucose and fructose. Spiroplasma citri, an insect-transmitted phloem-limited plant pathogen, requires both glucose and fructose to survive in plant sieve tubes. The PTS is the major import system of carbohydrates in S. citri (5). S. citri can also utilize trehalose, which is the dominant sugar in leafhopper hemolymph (5). S. citri has evolved the capacity to metabolize glucose, fructose, and trehalose and adapt to its host environment (5). It is possible that “Ca. Phytoplasma australiense,” like S. citri, has adapted to survive in its plant host environment by using sucrose phosphorylase to cleave imported sugars and possibly uses sugar ABC transporters to import trehalose for survival in its insect host. However, the use of trehalose to survive in the insect host is purely speculative, since there is no evidence that “Ca. Phytoplasma australiense” can utilize trehalose, although it does have the means to import the sugar.
Virulence genes such as hemolysins and adhesion-related proteins are thought to be involved in bacterial pathogenicity. “Ca. Phytoplasma australiense” carried an ORF (CAM11455) that had sequence similarities to a hemolysin III protein from Staphylococcus epidermidis. Similar findings were reported previously for the AY-WB phytoplasma genome, where two hemolysin-related proteins were identified (9). A conserved domain for a membrane protein was identified within this ORF (CAM11455), which also showed similarities to hemolysin inner membrane protein YqfA from E. coli. “Ca. Phytoplasma australiense” encoded another ORF (CAM11945) with similarities to hemolysin, but it also contained a transmembrane domain, two cystathionine beta synthase domains, and a transporter domain. These cystathionine beta synthase domains are generally found in two or four copies within a protein and may play a regulatory role, but their exact function is unknown (77). The transporter domain may be involved in the modulation of ion substrate transport (47). Although these two ORFs are related to hemolysins, the presence of extra domains suggested that they were not true hemolysins. Therefore, at this stage, direct evidence of pathogenicity factors is missing.
Plant-pathogenic bacteria secrete enzymes capable of degrading plant cell walls, such as cellulases, xylanases, glucanases, pectinases, and proteases (33, 38). None of these enzymes were found in “Ca. Phytoplasma australiense,” but glucanase was encoded in the OY-M genome and may be involved in phytoplasma virulence (60).
Gram-positive bacteria secrete proteins via the main protein translocation system (Sec), where proteins traverse only a single membrane to enter the host cell (20, 86). Phytoplasmas carry some of the genes of the Sec protein translocation pathway, but they lack several genes and signal peptidases of the protein maturation component, such as secB, secG, secF, and secD (66). Although several components of the pathway were missing, the Sec protein translocation system was found to be functional in OY-M (35). One indication of a functional Sec-dependent pathway is the presence of proteins with N-terminal signal peptides that can be cleaved (9, 10, 36). This signal peptide has been found to precede major membrane proteins of phytoplasmas from other groups (10, 12). Such proteins can be secreted via the Sec protein translocation system and might act as part of the virulence machinery, as reported previously for Streptococcus pyogenes (9, 69). Some of the “Ca. Phytoplasma australiense” hypothetical proteins as well as three solute binding ABC transporters contained these N-terminal signal peptides, which adds weight to the speculation that the phytoplasmas use a Sec protein translocation system.
All three phytoplasmas encoded an intracellular multiplication IcmE-like protein that is part of the type IV secretion system and that is involved in phagocytosis (31, 89), intracellular multiplication, and host cell destruction (14, 79). “Ca. Phytoplasma australiense” encoded an ORF (CAM11886) that was similar to the IcmE proteins from Legionella pneumophila and Coxiella burnetii. The type IV secretion systems are homologous to conjugation systems that are used by bacteria to deliver macromolecules such as nucleoprotein complexes and proteins across kingdom barriers (18, 80). Thus, it would be interesting to determine the function, if any, of CAM11886 in pathogenicity.
Parasitic and endosymbiotic bacteria are in a general process of genome reduction because essential metabolites are supplied by the host. This tendency is also observed within the phytoplasmas, where genome sizes down to 530 kb have been reported (48). Phytoplasmas and Mollicutes in general have distinctive genomic features such as a reduced chromosome, low GC content, fewer than 1,000 genes, and a limited metabolic capacity (51). These obligate parasites tend to have chromosomes with a large number of DNA repeats (68) that may contribute to chromosomal rearrangements. This may explain the poor conservation of gene order observed in Mollicutes (68). Buchnera sp. is a mutualistic endosymbiont that plays a key role in the survival and ecology of its host and provides nutrients not available in the host's restricted diet. These include essential nutrients such as the vitamin riboflavin and the amino acid tryptophan. Unlike other Buchnera spp., the Buchnera aphidicola Cc chromosome is only 416 kb (64), and during the process of genome reduction, it has lost most of its metabolic functions, including the essential metabolism of tryptophan and riboflavin. B. aphidicola Cc also lacks most of the transporters encoded by other Buchnera spp. and lacks genes involved in amino sugar and peptidoglycan biosynthesis, suggesting that it may be close to becoming a free-diffusing cell (64). B. aphidicola Cc may be losing its symbiotic capacity and is being complemented by a highly abundant secondary symbiont, Serratia symbiotica, which may be providing the essential nutrients required by the host (64). This differs from phytoplasmas, which retain large numbers of transporters that are used to obtain essential nutrients from the host. This suggests that genome reduction in phytoplasmas may lead to complete reliance on the host cell for survival.
Gene order is not conserved between different phytoplasma strains, but synteny was observed between closely related phytoplasmas. While the manuscript was in preparation, another “Ca. Phytoplasma australiense” genome was sequenced (43). This New Zealand “Ca. Phytoplasma australiense” strain is larger (959,779 bp) (43) than the strain that we describe here. The sequence of that strain is not publicly available yet, so a detailed analysis of the larger genome was not possible. PMUs may be a key factor in chromosome size variation, as suggested previously by Bai et al. (9) for the closely related “Ca. Phytoplasma asteris” strains OY-M and AY-WB. AY-WB may be further along the reductive evolutionary path than OY-M since its genome contains fewer PMU insertions, more truncated or deleted ORFs, more missing metabolic genes, and fewer ORFs shared with some Mollicutes (9). Comparative genomic analysis between the two “Ca. Phytoplasma australiense” strains may reveal that PMUs and/or multicopy sequences could explain the differences in genome sizes.
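The size figures quoted for the two “Ca. Phytoplasma australiense” strains can be put side by side with the PMU content reported for the strain described here (12.1% of the genome) and the stated size difference from OY-M (18,693 bp). The short calculation below uses only those published figures; the variable names are ours, and the result is a back-of-the-envelope comparison rather than evidence about what the additional New Zealand sequence contains.

```python
# Figures taken from the text; integer arithmetic avoids rounding surprises.
AUSTRALIAN_STRAIN_BP = 879_324      # strain described here
NEW_ZEALAND_STRAIN_BP = 959_779     # strain reported in reference 43
LARGER_THAN_OY_M_BP = 18_693        # stated size difference from OY-M
PMU_PER_MILLE = 121                 # 12.1% expressed in parts per thousand

size_difference_bp = NEW_ZEALAND_STRAIN_BP - AUSTRALIAN_STRAIN_BP
pmu_content_bp = AUSTRALIAN_STRAIN_BP * PMU_PER_MILLE // 1000
implied_oy_m_bp = AUSTRALIAN_STRAIN_BP - LARGER_THAN_OY_M_BP

print("Difference between the two australiense strains (bp):", size_difference_bp)
print("Approximate PMU content of the strain described here (bp):", pmu_content_bp)
print("OY-M chromosome size implied by the stated difference (bp):", implied_oy_m_bp)
```

The difference between the two strains (about 80 kb) is of the same order as the PMU content of the smaller strain (about 106 kb), which is compatible with, but does not demonstrate, the suggestion that PMUs or other multicopy sequences account for the size variation.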
The comparative analysis of three full-length phytoplasma genomes has increased our understanding of their genetics, particularly their metabolic capabilities. Since some essential metabolic pathways are completely missing and others are greatly reduced, it is still difficult to obtain a clear picture of the metabolic capacity of phytoplasmas. Nearly 50% of the ORFs found in these phytoplasmas are as yet unassigned, and it is likely that key metabolic enzymes are among those that do not have orthologs in other bacteria. At present, several other genome sequencing projects of phytoplasmas with much smaller genomes (“Ca. Phytoplasma mali” [602 kb] [M. Kube et al., unpublished data] and Western X phytoplasma [670 kb] [L. W. Liefting and B. C. Kirkpatrick, unpublished data]) and different phylogenetic group affiliations are nearly complete. This additional information will give us a more comprehensive view of the essential metabolic pathways and might allow us to predict an evolutionary path for the “large-genome” phytoplasmas.
Published ahead of print on 21 March 2008.