Phytoplasmas are unculturable bacterial plant pathogens transmitted by phloem-feeding hemipteran insects. DNA of phytoplasmas is difficult to purify because of their exclusive phloem location and low abundance in plants. To overcome this constraint, suppression subtractive hybridization (SSH) was modified and used to selectively amplify DNA of the stolbur phytoplasma infecting a periwinkle plant. Plasmid libraries were constructed, and the origins of the DNA inserts were verified by hybridization and PCR screenings. After a single round of SSH, there was still a significant level of contamination with plant DNA (around 50%). However, the modified SSH, which included a second round of subtraction (double SSH), resulted in an increased phytoplasma DNA purity (97%). Results validated double SSH as an efficient way to produce a genome survey for microbial agents unavailable in culture. Assembly of 266 insert sequences revealed 181 phytoplasma genetic loci which were annotated. Comparative analysis of 113 kbp indicated that among 217 protein coding sequences, 83% were homologous to “Candidatus Phytoplasma asteris” (OY-M strain) genes, with hits widely distributed along the chromosome. Most of the stolbur-specific SSH sequences were orphan genes, with the exception of two partial coding sequences encoding proteins homologous to a mycoplasma surface protein and riboflavin kinase.
Phytoplasmas are responsible for plant diseases that damage annual crops as well as perennial cultures such as fruit trees and grapevine (24). These pathogens multiply within the phloem cells of the host plant and are transmitted from plant to plant by phloem-feeding insects (51). As of today, the many diseases induced by phytoplasmas cannot be cured, and the control of disease spread consists of implementing prophylactic measures, such as quarantine, destruction of infected plant material, and pesticide treatment against the insect vectors. Implementing control of phytoplasma-induced diseases requires the taxonomic characterization of the agent, the determination of its plant host range, and the identification of its insect vector(s) (25). All of these studies necessitate the development of methods for diagnosis that rely on the molecular detection of phytoplasma DNA (8, 19). Most phytoplasmas have been classified according to 16S rRNA gene phylogeny and restriction fragment length polymorphism profiles into 14 to 20 groups (24, 49), and 20 “Candidatus Phytoplasma” species have been described (16, 46). Differentiation of phytoplasmas occupying distinct biological niches but displaying less than 3% divergence in 16S rRNA genes must be achieved by comparing genetic loci other than 16S rRNA genes (27). Whereas important progress has been made regarding phytoplasma classification and ecology, phytoplasma phytopathogenicity and mechanisms of phytoplasma transmission by insects are still poorly understood and will benefit from the knowledge and the comparative analysis of phytoplasma genomes.
However, isolating phytoplasma DNA is still hampered by the inability to cultivate phytoplasmas in vitro and by the difficulty in isolating significant amounts of phytoplasmas. Enrichment of phytoplasmas from plant or insect extracts by differential centrifugation or immunoaffinity analysis was obtained but yielded a small number of organisms and could never be generalized (17, 18, 47, 48). Based on the high AT content of phytoplasma DNA, its physical enrichment using a CsCl equilibrium buoyant density gradient in the presence of bisbenzimide allowed others to isolate phytoplasma DNA and to clone phytoplasma genes (7, 21, 44). More recently, the physical isolation and characterization of full-length phytoplasma chromosomes by pulsed-field gel electrophoresis (PFGE) allowed the determination of phytoplasma chromosome size and the cloning of enough genomic DNA to plan the sequencing of a phytoplasma genome (28, 33, 36). A strategy consisting of cloning PFGE-purified phytoplasma DNA into organized lambda phage libraries, completed by PCR amplification and shotgun sequencing, recently led to the determination of the first phytoplasma genome sequence (39). Although this represents a scientific breakthrough, at least the partial sequencing of other phytoplasma genomes is still necessary due to the ecological and genomic diversity of phytoplasma (24, 49). Comparisons between phytoplasma genomes will help to discover species-specific genes that could be related to specific biological properties. In order to improve phytoplasma DNA purification, the use of molecular subtraction was investigated. The suppression subtractive hybridization (SSH) technique (9) was adapted to produce high quantities of enriched phytoplasma DNA from total DNA of infected plants. Enrichment was further increased by introducing a second round of subtraction (double SSH). 
The stolbur disease phytoplasma, the type member of the phylogenetic group 16SrXII-A (24), was chosen because in Europe it affects a wide range of wild and cultivated plants, to which it is specifically transmitted by polyphagous Fulguromorpha planthoppers of the family Cixiidae (12, 14, 51). The stolbur strain PO (for Pyrénées Orientales) was recently transmitted to Catharanthus roseus periwinkles and was selected for this study. The stolbur PO also possesses an interesting phenotype as it induces floral abnormalities such as virescence and phyllody in addition to leaf yellowing.
Double SSH is presented as a new method which provides enough DNA to construct phytoplasma genomic libraries. A preliminary survey of the stolbur phytoplasma genome deduced from the characterization of 181 phytoplasma genetic loci illustrates the potential of this approach.
Healthy or stolbur-infected periwinkle (C. roseus L.) plants were grown in a greenhouse at 25°C during the day and at 20°C at night with a photoperiod of 16 h of daylight and 8 h of darkness. The initial periwinkle plant infected with stolbur phytoplasma strain PO was obtained in 1996 through exposure of healthy periwinkle to natural inoculation in the southwest of France. Stolbur PO was confirmed as a stolbur strain by serology and 16S rRNA gene sequencing. The strain has been maintained in periwinkle since 1996 by graft inoculation.
Leaf midribs (0.5 g) were triturated in polyethylene bags with a ball-crushing apparatus (Bioreba Ag) in 3 ml of buffer (55 mM cetyltrimethylammonium bromide, 1.4 M NaCl, 26 mM β-mercaptoethanol, 20 mM EDTA, 100 mM Tris-HCl, 0.5 mM polyvinylpyrrolidone [40,000 Da], pH 8.0). The triturate was incubated for 30 min at 65°C and subsequently centrifuged for 5 min at 1,500 × g. Proteins were extracted from the supernatant by adding one volume of chloroform:isoamyl alcohol (24:1, vol/vol). After centrifugation for 10 min at 14,000 × g, 0.6 volume of isopropanol was added to the upper phase. The precipitated nucleic acids were recovered by centrifugation at 14,000 × g for 20 min. The pellet was washed twice with 70% ethanol, dried under vacuum, and resuspended in 60 μl of sterile water. Prior to SSH, nucleic acids were treated with RNase A (10 μg/ml). The integrity of the DNA was verified by electrophoresis through a 0.7% agarose gel, and DNA concentration was determined by measuring the optical density of the extract at 260 nm.
The rationale and main steps of double SSH are presented in supplemental Fig. S1. The SSH protocol followed that of the PCR-Select Bacterial Genome Subtraction kit (BD Biosciences) except for the primers and adaptors, which were different (Table 1). Briefly, total DNA from healthy and infected periwinkle (2 μg) was digested overnight at 37°C with either RsaI or HincII restriction endonuclease (MBI Fermentas) to constitute the driver DNA and prepare the tester DNA. For RsaI-SSH, tester DNA (100 ng) from infected periwinkle was ligated in two separate reactions of 10 μl each with either adaptor 1 or adaptor 2R. Then 1 μl of each ligation product was heat denatured and separately hybridized to an excess of driver (600 ng of RsaI-digested DNA from healthy periwinkle) for 1.5 h at 63°C. The two hybridization mixtures were then mixed in the presence of 300 ng of driver and incubated for an overnight hybridization at 63°C. Hybrids carrying both adaptors 1 and 2R were amplified by nested PCR according to the manufacturer's instructions using Taq Advantage cDNA Polymerase Mix (BD Biosciences-Clontech). PCR amplification was performed in a 25-μl reaction volume with 0.4 μM primer P1 (RsaI-SSH). The templates were first heated for 2 min at 72°C to fill the ends and then denatured for 25 s at 94°C. Thermal PCR conditions consisted of 25 cycles (10 s at 94°C, 30 s at 66°C, and 1 min 30 s at 72°C) with a single final extension of 7 min at 72°C. One microliter of a 1:40 dilution of the primary PCR product was submitted to a nested amplification of 18 thermal cycles with primers NP1 and NP2R using the same parameters as above except for the annealing temperature, which was 68°C. For HincII-SSH, tester DNA corresponded to HincII-digested stolbur-infected periwinkle DNA ligated to adaptors 3 and 4R; PCR was carried out with primer P3, and nested PCR was carried out with primers NP3 and NP4R.
For reverse SSH, DNA from infected plants was used as the driver and DNA from healthy plants was used as the tester. For double RsaI-SSH, the adaptors 1 and 2R were first removed from the RsaI-SSH product by RsaI digestion and replaced by the adaptors 3 and 4R. Then, the RsaI-SSH product ligated to the new adaptors was used as a tester, and the RsaI reverse SSH product was used as a driver after the adaptors were removed by RsaI digestion. For double HincII-SSH, the HincII-SSH product was used as a tester after removal of adaptors 3 and 4R by HincII digestion and their replacement with adaptors 1 and 2R. The HincII reverse SSH product was used in this instance as the driver. For double HincII-SSH, PCR was carried out with a classical Taq DNA polymerase (Promega).
The SSH PCR products (1 μl) were ligated overnight at 4°C with 3 units of T4 DNA ligase (Promega) in ligation buffer containing 50 ng of pGEM-T Easy plasmid vector (Promega). Ligations were used to transform Escherichia coli strain DH10B. Ampicillin-resistant white colonies were randomly picked and grown in LB broth containing ampicillin (50 μg/ml). Plasmid purifications were performed with a Wizard Plus SV Miniprep DNA Purification System (Promega). Insert lengths were estimated after EcoRI digestion by agarose gel electrophoresis. For individual hybridization screening, inserts were labeled by PCR amplification with nested SSH primers in the presence of a digoxigenin-11-dUTP-containing deoxynucleoside triphosphate mix (DIG Labeling Mix plus; Roche). Probes were used to hybridize dot blots consisting of NaOH (0.4 M)-denatured healthy or infected plant DNA (10 μg) and of the corresponding plasmid as a positive control (100 ng), spotted on Nytran Super Charge nylon N+ transfer membrane (0.45-μm pore size; Schleicher & Schuell). For large-scale negative screening with reverse SSH probes, probes were labeled with 1 nmol of digoxigenin-11-dUTP by random priming (DIG High Prime kit; Roche), and 2 μg of each plasmid was spotted on the membranes. Dot blots were prehybridized for 1 h and hybridized overnight at 42°C in hybridization solution (50% formamide, 5× SSC [1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate], 2% blocking reagent [Roche], 0.1% lauroylsarcosine, 0.02% sodium dodecyl sulfate [SDS], 100 ng/ml of heat-denatured salmon sperm DNA). Membranes were then washed four times for 15 min in 2× SSC-0.5% SDS at ambient temperature and twice for 30 min in 0.5× SSC-0.1% SDS at 60°C. The presence of hybridized probe was revealed using anti-digoxigenin Fab fragment and CDP-Star as a substrate according to the DIG DNA Labeling and Detection Kit (Roche).
To confirm the phytoplasma origin of SSH sequences having no homologue in sequence databases, sense and antisense primers were designed for each orphan sequence (Table 1) using the primer picking tool of the program Consed (15) and the following parameters: 20 to 25 bp for the primer length and 52 to 60°C for the melting temperature. Amplifications using Taq DNA polymerase (Promega) were performed on both healthy and stolbur PO-infected periwinkle DNA in a 50-μl reaction volume with a 1 μM concentration of each primer pair. Thermal PCR conditions consisted of 40 cycles (30 s at 94°C, 30 s at 55°C, and 1 min 30 s at 72°C). For primer pairs F/RSDH0070, F/RSR02F01, and F/RSR02F02, annealing and elongation temperatures were decreased, respectively, from 55°C to 52°C and from 72°C to 66°C. For primer pair F/RSR01G05 the annealing temperature was increased from 55°C to 60°C.
Sequencing reactions were performed by Genome-Express (Grenoble, France) according to Applied Biosystems (Big Dye Terminator) or Amersham Pharmacia Biotech (ET Terminator) protocols; unincorporated nucleotides were removed by exclusion columns, and all samples were processed on MegaBACE capillary sequencing instruments. The raw sequence chromatograms were assembled and edited using Phred, Phrap, and Consed software programs (10, 11, 15). Homologies between SSH-cloned sequences and known sequences were detected using the BLASTN and the BLASTX algorithms against the nonredundant GenBank database at http://www.ncbi.nlm.nih.gov/BLAST and against the “Ca. Phytoplasma asteris” (OY-M) genome database at http://gib.genes.nig.ac.jp (39) with the phytoplasmal translation code. Homology results were considered significant when more than 30% sequence identity was obtained with an E value below 10⁻¹². Coding sequences (CDS) encoding hypothetical proteins were predicted through compositional analysis by the program FrameD (45) trained on the set of 45 partial stolbur phytoplasma conserved CDS, totaling 7,150 codons (see Fig. S2 in the supplemental material, where these CDS are represented as boxes with a yellow background). The tRNAs were predicted using the tRNAscan-SE program (31). The G+C content, amino acid usage, and codon usage were determined using the program Freqsq at http://www.infobiogen.fr/services/analyze/analyze.php?page=analweb.
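The significance rule stated above (more than 30% sequence identity with an E value below 10⁻¹²) can be written as a simple filter. The following is a minimal sketch, not the authors' pipeline: the `Hit` record, its field names, and the example hits are hypothetical and serve only to illustrate how the two thresholds combine.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is illustrative.
MIN_IDENTITY_PCT = 30.0
MAX_EVALUE = 1e-12


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit (field names are assumptions)."""
    query: str
    subject: str
    identity_pct: float
    evalue: float


def is_significant(hit: Hit) -> bool:
    """Return True when identity exceeds 30% and the E value is below 1e-12."""
    return hit.identity_pct > MIN_IDENTITY_PCT and hit.evalue < MAX_EVALUE


# Made-up hits: only the first one satisfies both conditions.
hits = [
    Hit("insert_A", "subject_1", 78.0, 1e-40),
    Hit("insert_B", "subject_2", 28.0, 1e-20),  # identity too low
    Hit("insert_C", "subject_3", 45.0, 1e-5),   # E value too high
]
significant = [h.query for h in hits if is_significant(h)]
print(significant)
```

Both conditions must hold at once, so a hit with a very low E value is still rejected when its identity is at or below 30%.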
The stolbur SSH sequences were deposited in the EMBL database under the accession numbers AJ970537 to AJ970718.
In order to purify stolbur phytoplasma DNA from total DNA of infected periwinkle plants (tester DNA), a first enrichment step was carried out by subtracting the DNA of healthy periwinkle (driver DNA) from the tester DNA. Two SSH steps were carried out after digestion of DNAs by RsaI or HincII. RsaI-SSH (Fig. 1A, lane 2) and HincII-SSH products (Fig. 1B, lane 2) were clearly different from their respective unsubtracted controls (Fig. 1, lanes 1). The SSH products were cloned to produce two plasmid libraries of 3,000 clones in the case of RsaI-SSH and 8,000 clones in the case of HincII-SSH. First, to determine which proportion of DNA clones corresponded to phytoplasma DNA, 20 plasmids were purified, and their DNA inserts were labeled with digoxigenin-11-dUTP. Individual probes were used to hybridize phytoplasma-infected as well as healthy periwinkle DNA (Fig. 2A, lanes 1 and 2, respectively). Hybridization results indicated that nearly half of the plasmid inserts corresponded to phytoplasma DNA as the corresponding probes recognized only phytoplasma-infected periwinkle DNA. Second, to evaluate the proportion of phytoplasma DNA for a larger number of clones, a total of 263 and 75 plasmids were purified from the RsaI-SSH and HincII-SSH libraries, respectively, and spotted on nylon membrane. An attempt to screen out SSH plasmids carrying periwinkle DNA inserts using healthy periwinkle DNA probe lacked sensitivity due to the high complexity of total DNA probe (data not shown). Plasmids were therefore hybridized to a reverse SSH probe, known to efficiently screen out false-positive DNA of driver origin (38). Results of this negative screening are presented in Fig. 2B for the RsaI-SSH library and summarized in Table 2 for both libraries. Nearly half (52%) of RsaI-SSH clones were not recognized by the reverse SSH probe and were, therefore, determined to be potential phytoplasma DNA. In the case of HincII-SSH, 58% of the clones were determined to be phytoplasma DNA (Table 2). The average insert sizes were 0.9 and 0.8 kbp for RsaI- and HincII-SSH libraries, respectively.
To avoid tedious hybridization screening, a second step of enrichment (double SSH) was performed by subtracting from the SSH product used as tester DNA a driver DNA consisting of reverse SSH product (Fig. 1A and B, lanes 3). This latter product contains most, if not all, of the false-positive DNA fragments remaining after one round of SSH, as shown above. The products of these double SSHs, named double RsaI-SSH and double HincII-SSH products (Fig. 1A and B, lanes 4), were cloned to construct plasmid libraries of 10,000 and 2,000 clones, respectively. The average insert size in both libraries was again around 0.8 kbp (Table 2). Negative screening of double RsaI-SSH plasmids using reverse SSH probe showed that only 2 of the 40 plasmids tested hybridized to the probe (Fig. 2C). In addition, one of these two plasmids (clone SDR0021) (Fig. 2C, spot C01) was demonstrated upon insert sequencing to consist of a chimeric sequence resulting from the ligation of two RsaI fragments, one of plant origin and the other of phytoplasma origin. Therefore, the proportion of phytoplasma DNA in the double RsaI-SSH library was estimated to be 97.5%. Similar results were obtained by screening the double HincII-SSH library, as none of the 40 clones tested was recognized by the reverse SSH probe. These data demonstrated that SSH sequences homologous to sequences present in the reverse SSH product had been efficiently subtracted.
Sequences of 266 plasmid inserts selected through hybridization screening and having a size greater than 0.5 kbp were assembled by library. Editing the assemblies revealed that the redundant inserts totaled 15% of the RsaI- and HincII-SSH libraries. The double RsaI-SSH library was more redundant (20%), whereas the double HincII-SSH library was less redundant (10.6%), essentially due to the effective suppression in the double HincII-SSH product of a 0.6-kbp PCR fragment which was prominent in the HincII-SSH product (Fig. 1B, lanes 2 and 4). Nucleotide discrepancies between redundant sequences allowed the calculation of the apparent frequency of Taq DNA polymerase errors. This could be estimated to be in the range of 1.5 to 1.7 per kbp for both simple SSH libraries, about double (3.5 per kbp) for the double RsaI-SSH library, but much higher (6.9 per kbp) for the double HincII-SSH library (Table 2). This higher frequency corresponded to the use of classical Taq DNA polymerase lacking the proofreading activity present in the Advantage polymerase mix used to amplify RsaI-SSH, double RsaI-SSH, and HincII-SSH products.
Six SSH sequences containing either RsaI or HincII internal restriction sites were proven by BLAST analysis and/or PCR to be chimeric molecules. Sequences SR02A11, SDR0021, and SDH0029 were identified as periwinkle-phytoplasma chimeras, whereas sequences SR02E10, SDR0026, and SDH0066 corresponded to phytoplasma-phytoplasma chimeras. As no adaptor sequences were found, these are expected to have been formed at the beginning of tester DNA preparation.
Assembly and editing of all SSH sequences produced 196 contigs which were compared to the nonredundant databases using BLASTX and BLASTN. Three sequences not previously identified by hybridization screening as being periwinkle DNA were finally found to be homologous to reported plant DNA sequences, 150 sequences matched homologous CDS or nucleotide sequences in the “Ca. Phytoplasma asteris” (OY-M) genome sequence, and 4 sequences encoded partial CDS homologous to protein from bacteria other than “Ca. Phytoplasma asteris.” Finally, 39 sequences corresponded to orphans as there were no homologous sequences in gene databases. The phytoplasma or plant origin for the orphan sequences needed to be determined. Therefore, a pair of specific primers was designed for each orphan sequence (Table 1) except for the clone SR02A11, for which two primer pairs were designed as it contained an unexpected internal RsaI site. A PCR amplification was performed using each primer pair on both healthy and phytoplasma-infected periwinkle DNA (Fig. 3). Results indicated that 12 sequences were of periwinkle origin as they were detected in both DNAs, whereas 27 sequences, including the second RsaI part of SR02A11, were clearly of phytoplasma origin as PCR was only positive for the phytoplasma-infected periwinkle DNA. In summary, among SSH sequences, 181 were demonstrated to be of phytoplasma origin by hybridization, sequence homology, or PCR screening, whereas 15 sequences totaling 7.6 kbp were ultimately of plant origin.
The 181 SSH sequences represent altogether 113,209 nucleotides, which correspond to approximately 13.5% of the chromosome of stolbur phytoplasma strain PO, estimated in the range of 820 to 850 kbp by PFGE (unpublished data). The overall nucleotide identity to “Ca. Phytoplasma asteris” (OY-M) chromosome was 52%. As summarized in Fig. 4A and described in supplemental Table S1, of the 181 sequences, 150 were found to be homologous to sequences present on the “Ca. Phytoplasma asteris” (OY-M) chromosome (83%), 4 were homologous to sequences from other bacterial genomes (2%), and 27 proved to be orphan sequences specific to the stolbur phytoplasma genome (15%). As illustrated in supplemental Fig. S2, the “Ca. Phytoplasma asteris” CDS homologous to stolbur phytoplasma sequences were widely distributed along its chromosome except in the regions between the CDS PAM346 to PAM400 and PAM540 to PAM584, in which no stolbur-equivalent sequences could be identified. These two regions of the “Ca. Phytoplasma asteris” (OY-M) chromosome contain no housekeeping genes. They are constituted by CDS encoding hypothetical proteins having no homologs in protein databases and by clusters of paralogous CDS encoding proteins able to bind, modify, or segregate DNA, as well as ATP-dependent zinc proteases, sigma factors, and thymidylate kinases (39).
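The classification counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the coverage range simply divides the cumulative sequence length by the two PFGE size estimates.

```python
# Counts reported in the text for the 196 assembled contigs.
contigs_total = 196
plant_by_homology = 3      # matched reported plant DNA sequences
oy_m_homologs = 150        # matched the OY-M genome
other_bacteria = 4         # matched other bacterial genomes
orphans = 39               # no database homologue
orphans_plant = 12         # PCR positive on healthy and infected DNA
orphans_phytoplasma = 27   # PCR positive on infected DNA only

# Every contig falls into exactly one homology class.
assert plant_by_homology + oy_m_homologs + other_bacteria + orphans == contigs_total
assert orphans_plant + orphans_phytoplasma == orphans

phytoplasma_loci = oy_m_homologs + other_bacteria + orphans_phytoplasma
plant_loci = plant_by_homology + orphans_plant

# Share of each class among the phytoplasma loci, in percent.
shares = {
    "OY-M homologs": round(100 * oy_m_homologs / phytoplasma_loci),
    "other bacteria": round(100 * other_bacteria / phytoplasma_loci),
    "orphans": round(100 * orphans_phytoplasma / phytoplasma_loci),
}

# Fraction of the chromosome sampled, for both PFGE size estimates.
sequenced_bp = 113_209
coverage_pct = [round(100 * sequenced_bp / size_bp, 1) for size_bp in (850_000, 820_000)]

print(phytoplasma_loci, plant_loci, shares, coverage_pct)
```

The two coverage values bracket the approximately 13.5% quoted in the text.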
Local gene organization was mostly conserved between the two genomes. Indeed, we identified 14 intergenic sequences encompassed by homologous CDS in SSH stolbur inserts and in the “Ca. Phytoplasma asteris” genome. One exception, on the SSH sequence SR01C04, was the CDS corresponding to the conserved PAM448, which was not preceded by a CDS homologous to PAM447 but by a CDS homologous to PAM436. The fact that in the “Ca. Phytoplasma asteris” chromosome, the CDS PAM436 is encoded by the opposite strand more than 14 kbp upstream from PAM448 indicated a local DNA rearrangement in one of the chromosomes.
“Ca. Phytoplasma asteris” CDS homologous to stolbur SSH sequences were often clustered, which may indicate that the SSH libraries could overrepresent some regions of the stolbur phytoplasma genome. As restriction sites recognized by RsaI and HincII contain 50% G+C, libraries could overrepresent a region of higher G+C content. The overall G+C content of the 113-kbp SSH sequences was 29.9%, nearly 2% higher than that of the “Ca. Phytoplasma asteris” genome (28%). To determine whether the libraries overrepresented the G+C-rich regions of the stolbur chromosome, the G+C content of the stolbur genome was estimated by comparing conserved parts of CDS and intergenic sequences common to the two phytoplasmas (Table 3). A set of 45 CDS conserved between both phytoplasmas and uniformly distributed along the “Ca. Phytoplasma asteris” chromosome was chosen (represented as boxes with a yellow background in Fig. S2 in the supplemental material). In a comparison of the G+C content of the 45 partial CDS, an average of 30.95% G+C was found for stolbur phytoplasma sequences (21,450 bp) versus 32.06% found in the corresponding sequences in “Ca. Phytoplasma asteris” (21,600 bp). Similarly, the 14 SSH stolbur intergenic sequences, which were encompassed in both phytoplasmas by homologous CDS, contained 18.9% G+C instead of 22.3% in the corresponding “Ca. Phytoplasma asteris” sequences. If the protein-coding region represents 73% of the stolbur phytoplasma genome, as reported in the case of “Ca. Phytoplasma asteris,” the G+C content of the stolbur phytoplasma chromosome should be about 1.7% less than the percent G+C of the “Ca. Phytoplasma asteris” chromosome, or approximately 26.3%. Taking into account that the overall G+C content of SSH sequences was calculated as 29.9%, we can conclude that the RsaI and HincII SSH libraries slightly overrepresent the G+C-rich regions of the stolbur phytoplasma genome, which most likely contains 26 to 27% G+C.
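The 1.7% difference quoted above is consistent with weighting the coding and intergenic G+C differences by their share of the genome. The sketch below is one reading of that estimate, assuming a plain weighted average with 73% coding sequence; it reproduces the stated figures but is not taken from the authors' own calculation.

```python
# Values reported in the text (percent G+C).
coding_fraction = 0.73                       # assumed equal to the OY-M coding share
cds_gc_stolbur, cds_gc_oym = 30.95, 32.06    # 45 conserved partial CDS
igs_gc_stolbur, igs_gc_oym = 18.9, 22.3      # 14 conserved intergenic sequences
oym_genome_gc = 28.0

# Per-compartment differences (stolbur minus OY-M).
cds_diff = cds_gc_stolbur - cds_gc_oym
igs_diff = igs_gc_stolbur - igs_gc_oym

# Weighted average of the two differences (assumption: linear weighting).
genome_diff = coding_fraction * cds_diff + (1 - coding_fraction) * igs_diff
stolbur_genome_gc = oym_genome_gc + genome_diff

print(round(genome_diff, 2), round(stolbur_genome_gc, 1))
```

Under this weighting the estimate falls well below the 29.9% measured on the SSH inserts, which is the basis for concluding that the libraries favor G+C-rich regions.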
Other interesting features could be drawn from the comparison (Table 3). The conserved parts of the 45 CDS were 0.7% shorter for stolbur phytoplasma (7,150 amino acids) than for “Ca. Phytoplasma asteris” (7,200 amino acids), and the average length of the 14 conserved intergenic sequences was also shorter for stolbur phytoplasma (47.6 bp) than for “Ca. Phytoplasma asteris” (OY-M) (78.1 bp). The tendency of stolbur phytoplasma to use less G+C was confirmed for all codon positions in CDS. Interestingly, this resulted in the use of less proline (−8%), encoded by G+C-rich codons (CCN), or more TTA/G codon instead of CTN to encode leucine. However, stolbur phytoplasma surprisingly uses 13.5% more arginine, which is encoded by CGN, a G+C-rich codon, and by the AGA/G codon, without much change in the proportion of CGA/T and AGA used. Another remarkable feature was the significantly lower use (−23.6%) of cysteine for stolbur phytoplasma, corresponding to a lower use of TGC-encoded cysteine residues (14 compared to 30 for “Ca. Phytoplasma asteris”), whereas TGT codons were similarly used (44 for stolbur phytoplasma and 46 for “Ca. Phytoplasma asteris”).
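The G+C content at each codon position, mentioned above, can be computed directly from an in-frame coding sequence. The function below is a minimal sketch of that calculation, not the Freqsq program used in the study; the function name and the short example sequence are hypothetical.

```python
def gc_by_codon_position(cds: str) -> list:
    """Return percent G+C at codon positions 1, 2, and 3 of an in-frame CDS.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    percentages = []
    for position in range(3):
        bases = cds[position:usable:3]          # every third base from this offset
        gc_count = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc_count / len(bases) if bases else 0.0)
    return percentages


# Hypothetical 7-codon sequence: ATG CCA TTA AGA TGT TGC TAA
example = "ATGCCATTAAGATGTTGCTAA"
print([round(p, 2) for p in gc_by_codon_position(example)])
```

Applied to two sets of homologous CDS, the three values allow a position-by-position comparison of the kind summarized in the text.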
According to BLASTN and tRNAscan-SE results, 15 stolbur SSH sequences were shown to correspond to parts of 16S and 23S rRNA genes, tRNA clusters, or group II intron RNA domains. A BLASTX homology search as well as FrameD-assisted prediction of the coding state of open reading frames allowed the identification of 217 partial or complete CDS (see Table S1 in the supplemental material). Among the 217 partial or complete CDS, 165 were homologous to CDS of the “Ca. Phytoplasma asteris” genome. Four CDS had homologues in bacteria other than “Ca. Phytoplasma asteris.” Two corresponded to S subunits of restriction enzymes (clones SR0028 and SR2E10), a partial CDS was homologous to a surface protein of Mycoplasma agalactiae (clone SR01H10), and another partial CDS was homologous to riboflavin kinase of Coxiella burnetii and Bacillus cereus (clone SR01B07). Finally, 48 CDS corresponded to hypothetical proteins with no equivalent in the “Ca. Phytoplasma asteris” genome. As illustrated in Fig. 4B, CDS predicted from the stolbur phytoplasma SSH sequences were assigned to five classes of cellular functions. The functional assignment for stolbur phytoplasma CDS is similar to the assignment obtained for the “Ca. Phytoplasma asteris” genome (Fig. 4B and C).
None of the stolbur phytoplasma partial CDS were homologous to CDS of “Ca. Phytoplasma asteris” plasmids EcOYM and pOYM (37). However, the CDS predicted from the SR02B12 SSH sequence had significant homology with the “Ca. Phytoplasma asteris” PAM444 (41% amino acid identity) and with three CDS of pBLTVA1 (54% amino acid identity), a plasmid reported to be present in the phytoplasma known as the beet leafhopper-transmitted virescence agent (29). The partial CDS identified on the SR02B12 SSH sequence was also characterized by 17 GXY triplets located after an N-terminal signal peptide, which were absent in PAM444 and pBLTVA1 CDS. In bacteria, GXY motifs known as collagen-like structural domains have mostly been reported in the firmicute group (42), for instance, on extracellular proteins known as virulence factors of group A Streptococcus (32). During insect cell invasion, the stolbur phytoplasma might benefit from possessing an external protein with such collagen-like domains, whose presence on the human mannose binding lectin was demonstrated to stimulate phagocytosis (1).
BLASTX analyses and FrameD predictions revealed a frameshift in 14 of the 217 predicted CDS. Nine frameshifted CDS were PCR amplified and directly sequenced without cloning. All were demonstrated to harbor no frameshift. By comparison to these direct chromosomal sequences, seven frameshifts found on the SSH sequences resulted from single nucleotide deletions in poly(T) or poly(A) tracts. These data demonstrated that no decay affected the genes encoding, for example, the replication initiation/membrane attachment protein, the DNA gyrase alpha subunit, the polyribonucleotide nucleotidyltransferase, the chaperone DnaK, the ribosomal protein S5, and the excinuclease ABC subunit C. These frameshifts could have been introduced either by the in vitro amplification by Taq polymerase or during the SSH plasmid propagation in E. coli.
Comparison of each SSH sequence to the 181 SSH sequences revealed the existence of repeated sequences (designated ** in Table S1 in the supplemental material). The most abundant repeated sequences corresponded to RNA domains and parts of the reverse transcriptase of a type II intron present as a single copy in the “Ca. Phytoplasma asteris” chromosome. Three and two copies of stolbur phytoplasma CDS, respectively, homologous to PAM522 and PAM532 were detected. Both CDS encode conserved hypothetical proteins described only in both phytoplasma genomes and are present as six and two copies, respectively, in the “Ca. Phytoplasma asteris” (OY-M) chromosome. Two sequences encoded partial CDS homologous to thymidylate kinase, and three other sequences corresponded to partial CDS, homologous to ATP-dependent zinc protease (HflB), which are encoded by several gene copies in the “Ca. Phytoplasma asteris” (OY-M) chromosome (39).
SSH was initially developed for generating differentially regulated or tissue-specific cDNA probes and libraries (9). More recently, SSH was used to identify species-specific genes or virulence factors of diverse cultivated bacterial species including proteobacteria, cyanobacteria, and mycoplasmas (22, 34, 40, 41, 50). The effectiveness of SSH in selectively amplifying bacterial DNA from total DNA of an infected host had not yet been explored. This paper presents a modification of the original SSH method, called double SSH, which has been developed for purifying the DNA of the stolbur phytoplasma, a phloem-restricted plant pathogen unavailable in culture. In practice, phytoplasmas are propagated either in planta or in insect vectors. In addition, phytoplasmas multiply at a higher level in experimental herbaceous hosts, such as the periwinkle C. roseus, than they do in woody plant hosts (2). Berges and colleagues evaluated the concentration of various phytoplasmas in C. roseus by competitive PCR as ranging from 2.2 × 10⁸ to 1.5 × 10⁹ phytoplasmas per gram of leaf midrib (2). As stolbur phytoplasma PO has a genome size of about 850 kbp, this represents 85 ng of phytoplasma DNA per gram of periwinkle leaf midrib, from which 100 μg of total DNA can usually be extracted. Therefore, the phytoplasma DNA should not represent more than 0.1% of the total infected-plant DNA. Using competitive PCR, the aster yellows phytoplasma DNA in its insect vector Macrosteles fascifrons was estimated to represent 0.8% of total DNA extracted from the insect (30). The method reported in the present paper enriched phytoplasma DNA from less than 0.1% of total plant DNA to 50% and 100% of the inserts in SSH and double SSH libraries, respectively, providing enrichments of at least 500- and 1,000-fold. These results were obtained regardless of the restriction enzymes used. With the second round of subtraction in double SSH, the false-positive plant DNA was efficiently eliminated. This represents a reduction of false-positive background comparable to that reported to be achieved by mirror orientation selection, which, however, failed to eliminate 18% of nondifferentially expressed brain cDNA (43).
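The abundance and enrichment figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the input values (85 ng of phytoplasma DNA and 100 μg of total DNA per gram of midrib, a starting level of at most 0.1%, and 50% or 100% phytoplasma inserts after SSH or double SSH) are taken directly from the text.

```python
# Illustrative check of the abundance and enrichment figures quoted in the text.
# Function names are hypothetical; all input values come from the paragraph above.

def percent_of_total(target_ng: float, total_ug: float) -> float:
    """Return the target DNA as a percentage of total DNA (1 ug = 1000 ng)."""
    return 100.0 * target_ng / (total_ug * 1000.0)


def fold_enrichment(final_percent: float, initial_percent: float) -> float:
    """Return the ratio of the final to the initial proportion of target DNA."""
    return final_percent / initial_percent


if __name__ == "__main__":
    # 85 ng of phytoplasma DNA in 100 ug of total DNA per gram of midrib
    start = percent_of_total(target_ng=85.0, total_ug=100.0)
    print(f"phytoplasma DNA in total DNA: {start:.3f}%")

    # Enrichment relative to the 0.1% upper bound used in the text
    for method, final in (("SSH", 50.0), ("double SSH", 100.0)):
        print(f"{method}: about {fold_enrichment(final, 0.1):.0f}-fold")
```

Because 0.1% is an upper bound on the starting proportion, the fold values computed this way are lower bounds on the actual enrichment.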
The average size of stolbur phytoplasma cloned DNA was about 0.85 kbp (0.3 to 2.4 kbp), a useful size as it is in the range of bacterial genes and larger than the 0.1 to 0.6 kbp obtained by the phytoplasma whole-genome amplification method recently described (13). Representation of a genome in random genomic libraries primarily depends on genome and insert size and on the number of independent clones. The likelihood that a sequence of interest is present in such a random library can be estimated by a simple statistic based on the Poisson distribution (4). Shotgun libraries with 2,000 and 10,000 inserts of 0.85 kbp (characteristics of double SSH libraries) will, respectively, represent about 87% and 99% of an 850-kbp genome (stolbur PO estimated chromosome size). However, these theoretical values will not be reached with double SSH libraries for the following reasons. First, due to the SSH process, restriction fragments larger than 2 kbp will not be represented in SSH libraries. The use of different restriction enzymes to set up overlapping libraries is therefore necessary to improve genome coverage. Second, PCR amplifies some DNA targets better than others, as demonstrated by the insert redundancy in our SSH libraries. This selectivity of PCR amplification will affect the randomness of phytoplasma genome representation in the SSH libraries. It must be noted that applying a second round of SSH did not result in an increase in the redundancy. For the reasons mentioned above, it is clear that phytoplasma genome sequencing cannot be achieved by using only the double SSH method. To assess the real potential of the method, more SSH clones need to be sequenced in order to evaluate the libraries' depth. Another disadvantage of the double SSH was the increased frequency of Taq polymerase errors introduced by the two additional amplifications. Polymerase errors were found at relatively high frequency in the SSH libraries; as expected for methods with two and four PCR steps, the observed error frequencies were, respectively, twice and four times the frequency observed for a single PCR step (i.e., 0.76 errors per kbp) (3). The use of proofreading DNA polymerases should reduce these errors, even though the ability of such enzymes to efficiently amplify the complex SSH products remains to be determined. Two other artifacts could be attributed to the method: 3% of the inserts were periwinkle-phytoplasma or phytoplasma-phytoplasma chimeras that did not result from the cloning of two different SSH fragments into plasmids, and variations in poly(A) and poly(T) length that did not exist in the original genes had been introduced in 6% of the SSH fragments.
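The theoretical coverage values given above for random libraries can be approximated with the usual Poisson estimate, in which the probability that a given sequence is present in at least one of N clones is 1 − exp(−N × I / G), where I is the insert size and G the genome size. The sketch below assumes this form of the estimate (the exact statistic used in reference 4 may differ slightly), and the function names are hypothetical.

```python
import math

# Poisson estimate of genome representation in a random (shotgun) library.
# Assumption: P = 1 - exp(-N * I / G); function names are illustrative only.


def library_coverage(n_clones: int, insert_kbp: float, genome_kbp: float) -> float:
    """Probability that a given sequence is present in at least one clone."""
    return 1.0 - math.exp(-n_clones * insert_kbp / genome_kbp)


def clones_needed(target: float, insert_kbp: float, genome_kbp: float) -> int:
    """Smallest number of random clones giving at least the target probability."""
    return math.ceil(-math.log(1.0 - target) * genome_kbp / insert_kbp)


if __name__ == "__main__":
    # Values from the text: 0.85-kbp inserts, 850-kbp stolbur PO chromosome
    for n in (2000, 10000):
        p = library_coverage(n, insert_kbp=0.85, genome_kbp=850.0)
        print(f"{n} clones: {100 * p:.1f}% of the genome represented")
```

With 2,000 clones the estimate is close to the value of about 87% quoted in the text, and with 10,000 clones it exceeds 99%. These figures assume fully random sampling, which, as discussed above, does not hold for SSH libraries.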
One advantage of the SSH and double SSH methods is the possibility to clone many phytoplasma partial gene sequences from only 4 μg of total periwinkle DNA (1 g of midribs usually yields 100 μg of DNA). This is an improvement compared to the 20 to 40 g of plant material necessary for the method using a CsCl equilibrium buoyant density gradient (21). Recently, the application of double SSH to other phytoplasmas propagated in periwinkle was also successful in our laboratory for “Candidatus Phytoplasma prunorum” (strain GSFY2) (J. L. Danet and X. Foissac, unpublished data). This indicated that this new method can be applied to other phytoplasmas and could also be applied to other microbes unavailable in culture.
Most of the SSH inserts selected from SSH libraries or randomly sampled from the double SSH libraries were proven to correspond to stolbur phytoplasma DNA. Sequencing of these clones produced a general view of the stolbur phytoplasma genome and pointed out some specific features by comparison to “Ca. Phytoplasma asteris.” “Ca. Phytoplasma asteris” and stolbur phytoplasma represent, respectively, the phylogenetic groups 16Sr-I and 16Sr-XIIA, which belong to a common phylogenetic clade that emerged early during phytoplasma evolution (24, 26). Their 16S rRNA genes are 96% identical, and the chromosome sizes of the strains compared in the present study are nearly the same, 820 to 850 kbp for stolbur phytoplasma (PO) and 860 kbp for “Ca. Phytoplasma asteris” (OY-M). Functional assignment to “Ca. Phytoplasma asteris” (OY-M) genes led to the conclusion that phytoplasma evolved by genome reduction, but surprisingly 40% of the 754 protein-coding genes could not be assigned to a known basic cellular function (39). Among the 181 stolbur phytoplasma DNA sequences identified, 31 sequences (17%) had no equivalent in “Ca. Phytoplasma asteris.” Most of the stolbur phytoplasma-specific CDS encode hypothetical proteins to which no biological features could be associated, except two partial CDS homologous to known proteins of interest. These CDS encode the riboflavin kinase involved in flavin mononucleotide and flavin adenine dinucleotide synthesis and a protein homologous to surface proteins of animal mycoplasmas. “Ca. Phytoplasma asteris” and stolbur phytoplasma share common plant hosts belonging to the Asteraceae and Solanaceae plant families but also woody hosts such as grapevine and strawberry plants in which they often induce similar, if not identical, symptoms. As a specific property, insect transmission from plant to plant of stolbur phytoplasma can be achieved only by Cixiidae, a hemipteran family belonging to the Fulguromorpha suborder (12, 14), whereas “Ca. Phytoplasma asteris” is transmitted by Cicadellidae, which belongs to the Cicadomorpha suborder (35). This insect vector specificity must be associated with specific genetic determinants required for the adaptation to the insect vector, such as a specific adhesin or another protein required for cellular invasion. It would be interesting to know whether genes stol-1H10, ribF, or any of the stolbur-specific genes encoding proteins of unknown function play a role in adaptation of the phytoplasma to its specific insect vector(s). The observation that stolbur phytoplasma certainly uses less proline and cysteine but more arginine than “Ca. Phytoplasma asteris” suggests that these phytoplasmas diverge in their strategy to encode proteins. This could result from different environmental constraints regarding amino acid availability.
The chromosome size of various “Ca. Phytoplasma asteris” strains varies from 660 to 1,130 kbp, and variations in the range of 860 to 1,350 kbp have also been evidenced between different stolbur phytoplasma strains (33). A specific feature of stolbur phytoplasma is the presence of multiple copies of a type II intron with a sequence nearly identical to the unique copy found in the “Ca. Phytoplasma asteris” chromosome and absent in the genome of other Mollicutes. This type of catalytic RNA acting as a mobile genetic element has been reported in various bacterial genomes (5, 6, 23) and could have an influence on stolbur phytoplasma chromosomal plasticity. Repeated uvrD, hflB, tmk, dam, and ssb genes account for 18% of the total genes present on the “Ca. Phytoplasma asteris” chromosome (39). Similar repeats exist in stolbur phytoplasma at least for hflB and tmk, but it is difficult to evaluate from our data the proportion of the stolbur genome these repeats represent.
Variations of the symptoms induced in plants infected by closely related phytoplasma strains, as, for example, presence or absence of virescence, phyllody, elongation of internodes, or witches' broom, have frequently been reported in groups 16Sr-I and 16Sr-X (20, 26) and can also be observed in the group 16Sr-XIIA (unpublished data). Methods such as double SSH will facilitate the comparison of larger sets of genes between phenotypically different strains and might provide a key to the molecular basis of phytoplasma biology, especially the interaction between the plant host and the insect vector. Providing the gene targets for future molecular tools will also help to better document phytoplasma diversity and epidemiology as well as phytoplasma taxonomy, which requires the comparison of genomic loci in the “Ca. Phytoplasma” genus (16).
A.C. was supported by a Ph.D. fellowship from INRA-SPE and Conseil Régional d'Aquitaine, and G.A. was supported by a Ph.D. fellowship from the MENRT.
We gratefully acknowledge Alain Blanchard for critical review of the manuscript and Géraldine Gourgues and Patrick Bonnet for laboratory and greenhouse technical help. We thank Jean-Luc Danet and Wolfgang Jarausch for providing the stolbur PO phytoplasma strain.
†Supplemental material for this article may be found at http://aem.asm.org/.