Xylella fastidiosa is a xylem-dwelling, insect-transmitted, gamma-proteobacterium that causes diseases in many plants, including grapevine, citrus, periwinkle, almond, oleander, and coffee. X. fastidiosa has an unusually broad host range, has an extensive geographical distribution throughout the American continent, and induces diverse disease phenotypes. Previous molecular analyses indicated three distinct groups of X. fastidiosa isolates that were expected to be genetically divergent. Here we report the genome sequence of X. fastidiosa (Temecula strain), isolated from a naturally infected grapevine with Pierce's disease (PD) in a wine-grape-growing region of California. Comparative analyses with a previously sequenced X. fastidiosa strain responsible for citrus variegated chlorosis (CVC) revealed that 98% of the PD X. fastidiosa Temecula genes are shared with the CVC X. fastidiosa strain 9a5c genes. Furthermore, the average amino acid identity of the open reading frames in the strains is 95.7%. Genomic differences are limited to phage-associated chromosomal rearrangements and deletions that also account for the strain-specific genes present in each genome. Genomic islands, one in each genome, were identified, and their presence in other X. fastidiosa strains was analyzed. We conclude that these two organisms have identical metabolic functions and are likely to use a common set of genes in plant colonization and pathogenesis, permitting convergence of functional genomic strategies.
Different microorganisms are able to survive in and colonize the water-conducting vessels (xylem) of plants. The result of this association is either beneficial or detrimental to the plant host. An example of a detrimental association is that of Xylella fastidiosa (38) with diverse plant hosts. X. fastidiosa is a fastidious, insect-transmitted, xylem-inhabiting bacterium known to cause several economically important diseases of both monocotyledonous and dicotyledonous plants (14, 17, 29). These diseases include Pierce's disease (PD) of grapevine and citrus variegated chlorosis (CVC), which have rather distinct symptoms and geographical distributions.
PD, caused by certain strains of X. fastidiosa, is characterized by wilted, shriveled, raisin-like fruit and scorched leaves that detach, leaving bare petioles attached to the canes (37). The bark of affected canes may lignify or mature irregularly, leaving areas of brown bark tissue surrounded by green immature tissue. Delayed and stunted shoot growth occurs in the spring following infection, and chronically infected grapevines eventually die. This devastating disease is a major threat to the viability of the California wine industry. The PD X. fastidiosa strain, whose genome sequence is described here, was isolated in 1998 from a naturally infected grapevine in Temecula, Calif. CVC, on the other hand, is characterized by the presence of small hard fruits of no commercial value and conspicuous spotted chlorosis on the upper leaf surface, resembling the symptoms of zinc deficiency and occasionally accompanied by gum-like extrusions from the spots on the lower surface (4).
X. fastidiosa is phylogenetically placed at the base of the gamma group of Proteobacteria (36). Molecular analyses at the species level have revealed three distinct groups. The grapevine-infecting variants responsible for PD are found in one group, while the citrus-infecting variants responsible for CVC are found in another (6, 7, 16). Initial expectations, based on geographical distributions, host diversity, differential disease symptoms, and molecular analyses, were that organisms from the three groups would be sufficiently different to support taxonomic separation at the subspecies or pathovar level (17, 19, 24).
The genomic sequence of the PD X. fastidiosa Temecula strain has now been determined in order to further elucidate both the molecular basis of X. fastidiosa pathogenicity and the phylogenetic relationships among X. fastidiosa strains. Comparative analyses of the complete sequences and annotations of PD X. fastidiosa and an X. fastidiosa representative of the CVC group isolated from Brazil (33) revealed that these strains exhibit remarkably limited genomic variability and share 95.7% amino acid identity in equivalent regions. There are only three genomic rearrangements, two identified genomic islands, 41 PD X. fastidiosa and 152 CVC X. fastidiosa strain-specific genes, and some genes harboring frameshifts. Our analyses suggest that a common functional genomic strategy may be undertaken to identify means of controlling X. fastidiosa-induced diseases.
Total genomic DNA was isolated from the PD X. fastidiosa Temecula strain. The complete genome sequence was generated by using a combination of ordered cosmid and shotgun strategies (13). Various shotgun libraries with different insert sizes (0.8 to 2.0 kb and 2.0 to 4.5 kb) were constructed from nebulized genomic DNA cloned into pUC18, and a total of 102,348 sequences were generated; 81% of these had at least 400 bases with a Phred quality above 20 (15), providing approximately 13-fold genome coverage. A cosmid library (Lawrist vector) with inserts ranging from 30 to 45 kb was constructed. A total of 2,752 cosmid ends were sequenced; 63% of these had at least 300 bases with a Phred quality above 20, providing approximately 26-fold genome coverage. These cosmid ends were used in the scaffold, and 12 cosmids were selected to be fully sequenced. Sequence gaps were identified by linking information from forward and reverse reads and were closed by primer walking, PCR sequencing, and insert subcloning. Sequences from both ends of most cosmid clones were used to confirm the orientation and integrity of the contigs. The sequences were assembled by using the Phred+Phrap+Consed package (15). All consensus bases have a Phred quality of at least 20. There are no unexplained high-quality discrepancies, and the overall error estimate is less than 1 in every 10,000 bases. Most of the sequencing was performed with BigDye terminators and ABI Prism 3700 DNA sequencers.
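The shotgun coverage figure can be checked with the usual Lander-Waterman arithmetic, c = N × L / G, where N is the number of reads, L the mean usable read length, and G the genome size. The sketch below is only an illustration: it assumes a mean of 400 usable bases per high-quality read (the text gives 400 only as a minimum), and it shows how Phred scores map to error probabilities (Q20 corresponds to 1 error in 100 bases, so the consensus error estimate of less than 1 in 10,000 corresponds to Q40 or better).

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def high_quality_bases(quals: list[int], threshold: int = 20) -> int:
    """Number of bases whose Phred quality is above `threshold`."""
    return sum(1 for q in quals if q > threshold)


def fold_coverage(n_reads: int, mean_len: float, genome_size: int) -> float:
    """Lander-Waterman sequence coverage c = N * L / G."""
    return n_reads * mean_len / genome_size


def expected_uncovered_fraction(c: float) -> float:
    """Expected fraction of the genome not hit by any read, e^(-c)."""
    return math.exp(-c)


# Read count, pass rate, and chromosome size come from the text;
# the 400-base mean usable length is an assumption (it is the stated minimum).
usable_reads = round(102_348 * 0.81)
coverage = fold_coverage(usable_reads, 400, 2_519_802)
print(f"~{coverage:.1f}-fold coverage")  # ~13.2-fold coverage
print(f"expected uncovered fraction: {expected_uncovered_fraction(coverage):.1e}")
```

Under these assumptions the estimate lands close to the approximately 13-fold coverage reported; longer average reads would push it higher.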
Annotation relied primarily on open reading frame (ORF) identification with GLIMMER (10) and GeneMark (5) and on alignment against the National Center for Biotechnology Information protein database. BLASTX searches were carried out to find additional putative protein-coding genes. All ORFs were inspected manually by the annotation team. For each ORF, links to Cluster of Orthologous Groups of Proteins (COG), Protein Family Database (PFAM), and Kyoto Encyclopedia of Genes and Genomes (KEGG) entries were made available. RNA species were identified by using BLASTN (3), secondary structure analysis, and tRNAscan-SE (21). In-house software was used to generate gene maps, lists, comparative CVC X. fastidiosa and PD X. fastidiosa data, and GenBank submissions. For a full list of ORFs, gene maps, and comparative tables, refer to the supplementary material at http://aeg.lbi.ic.unicamp.br/world/xfpd/.
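GLIMMER and GeneMark score candidate ORFs with statistical models of coding sequence, and the sketch below does not reproduce either tool. It only illustrates what a candidate ORF is (an ATG start followed in frame by a stop codon) by scanning the three forward reading frames. The 100-codon minimum is a hypothetical cutoff, and the reverse strand would be handled by running the same scan on the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    `min_codons` counts codons before the stop (hypothetical cutoff).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # the stop codon closes the candidate
    return orfs


print(find_orfs("CCATGCCCTAA", min_codons=1))  # [(2, 2, 11)]
```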
Whole genomes of PD X. fastidiosa and CVC X. fastidiosa were compared at the nucleotide level by using the program MUMmer (11) with default values. At the amino acid level, the genomes were compared by using previously developed programs (33). Genes g and h were considered orthologs if h was the best BLASTP hit for g and vice versa, with e-values of 10⁻⁵ or less in both directions. A gene was considered strain specific if it had no hits in the other genome or if its best hit had an e-value of 10⁻⁵ or more.
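The ortholog and strain-specific criteria above amount to a reciprocal best hit test with an e-value cutoff. A minimal sketch, assuming BLASTP results have already been parsed into (query, subject, e-value) tuples; the gene identifiers and e-values in the example are invented, and genes with a significant but non-reciprocal best hit are left in neither category.

```python
from typing import Iterable

E_CUTOFF = 1e-5  # e-value threshold from the text

Hit = tuple[str, str, float]  # (query gene, subject gene, e-value)


def best_hits(hits: Iterable[Hit]) -> dict[str, tuple[str, float]]:
    """Map each query gene to its lowest-e-value subject."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def classify_genes(genes_a, hits_ab, hits_ba, cutoff=E_CUTOFF):
    """Split genome A's genes into reciprocal-best-hit orthologs and strain-specific genes."""
    best_ab, best_ba = best_hits(hits_ab), best_hits(hits_ba)
    orthologs, specific = [], []
    for gene in genes_a:
        hit = best_ab.get(gene)
        if hit is None or hit[1] >= cutoff:  # no hit, or e-value of 1e-5 or more
            specific.append(gene)
            continue
        partner, _ = hit
        back = best_ba.get(partner)
        # Ortholog only if the partner's best hit points back to the same gene.
        if back is not None and back[0] == gene and back[1] <= cutoff:
            orthologs.append((gene, partner))
    return orthologs, specific


genes = ["pd1", "pd2", "pd3", "pd4"]
ab = [("pd1", "cvc1", 1e-50), ("pd1", "cvc2", 1e-10),
      ("pd2", "cvc2", 1e-3), ("pd3", "cvc3", 1e-20)]
ba = [("cvc1", "pd1", 1e-50), ("cvc3", "pd1", 1e-30), ("cvc3", "pd3", 1e-20)]
print(classify_genes(genes, ab, ba))  # ([('pd1', 'cvc1')], ['pd2', 'pd4'])
```

In this toy input, pd3 has a significant hit (cvc3), but the best hit of cvc3 is pd1, so pd3 is neither an ortholog nor strain specific.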
Oligonucleotide primers were designed for the genomic islands and their flanking regions. PCR analyses were carried out in duplicate in two different laboratories. For a full list of strains and primers, refer to the supplementary material at http://aeg.lbi.ic.unicamp.br/world/xfpd/.
The sequences have been deposited in GenBank with accession numbers AE009442 (chromosome) and AE009443 (plasmid).
The PD X. fastidiosa Temecula genome is composed of a single large circular chromosome (2,519,802 bp) and a small plasmid, pXFPD1.3 (1,345 bp), also reported by others for some PD X. fastidiosa strains (16). Table 1 shows a comparative summary of the main genome features of PD X. fastidiosa Temecula and CVC X. fastidiosa 9a5c. The major differences between these two strains are a 159,503-bp difference in chromosome size and the absence of the large plasmid pXF51 from PD X. fastidiosa Temecula. The variation in the percentages of hypothetical ORFs observed (0.8% for PD X. fastidiosa Temecula and 4.4% for CVC X. fastidiosa 9a5c) could be due to the difference in genome size, as explained below.
Of the 2,066 protein-coding genes annotated in PD X. fastidiosa Temecula, 2,025 (98%) are also present in CVC X. fastidiosa 9a5c. Of these orthologous genes, 94.5% have 80% or more amino acid identity, with an average identity of 95.7%, as shown in Fig. 1. This conservation is distributed along the whole chromosome, and regions of lower identity tend to appear in clusters (Fig. 1). This level of protein identity is comparable to that observed among the orthologous proteins of different Escherichia coli strains (27), Helicobacter pylori strains (1), and Salmonella enterica serovars (22) and thus supports a close relationship between these two Xylella strains. The most conserved PD X. fastidiosa Temecula genes include all those that determine the basic metabolism and cellular functions of the bacterium, which we thus conclude are mostly identical to those previously described for CVC X. fastidiosa 9a5c (33). Energy is generated by the efficient utilization of carbohydrates, including cellulose, but with no predicted catabolism of fatty acids or amino acids as alternative energy sources. In contrast, a complete set of biosynthetic pathways is present, permitting the synthesis of all amino acids, purines, pyrimidines, and nucleotides as well as an extensive array of cofactors and prosthetic groups. Transport systems include those for carbohydrates, ions, amino acids, and peptides as well as those for the extrusion of drugs and toxins.
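The shared-gene and identity percentages in this paragraph are simple summaries over the ortholog table. A minimal sketch, assuming a hypothetical list of per-ortholog percent identities (the real values are in the supplementary material); only the 2,025-of-2,066 count comes from the text.

```python
from statistics import mean


def ortholog_summary(n_genes: int, identities: list[float]) -> dict[str, float]:
    """Summarize ortholog conservation.

    n_genes: protein-coding genes annotated in the query genome.
    identities: percent amino acid identity for each ortholog pair found.
    """
    n_orth = len(identities)
    return {
        "shared_pct": 100 * n_orth / n_genes,
        "pct_at_least_80": 100 * sum(i >= 80 for i in identities) / n_orth,
        "mean_identity": mean(identities),
    }


# Shared fraction reported in the text: 2,025 of 2,066 genes.
print(f"{100 * 2025 / 2066:.1f}% shared")  # 98.0% shared

# Toy identities (hypothetical): three orthologs out of four genes.
print(ortholog_summary(4, [100.0, 90.0, 70.0]))
```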
A total of 106 genes in the PD X. fastidiosa Temecula genome (5.2%), although shared with the CVC X. fastidiosa 9a5c genome, have amino acid identities of 20 to 80%. Among these are 58 genes that are found within phage-related regions and genomic islands. In addition, 18 conserved hypothetical genes in the vicinity of the hemolysin and hemagglutinin genes fall within this group. Interestingly, among the genes with assigned functions that exhibited this higher level of divergence, we found some that may be involved in X. fastidiosa-plant host interactions, including those for fimbrillins and hemagglutinins (attachment and cell aggregation); colicin, hemolysin, and bacteriocin (toxins); and drug resistance and DNA restriction and modification enzymes (see supplementary material for a full list of genes). Thus, there may have been more selective pressure for alterations in these genes to enhance plant-specific bacterial colonization capability. There are also genes in the two genomes that have either a frameshift or an in-frame stop codon (Table 2), suggesting that they are nonfunctional. The most intriguing of these is the polygalacturonase precursor gene, which has a stop codon in CVC X. fastidiosa 9a5c but is intact in PD X. fastidiosa Temecula. In two other partially sequenced Xylella genomes (http://www.jgi.doe.gov/), no frameshift is observed within the polygalacturonase precursor gene. Outside Xylella, this gene shares 65% identity with its Ralstonia solanacearum ortholog (32); orthologs are also present in other necrogenic plant pathogens, such as Xanthomonas campestris pv. campestris, X. axonopodis pv. citri, and Erwinia carotovora. This gene encodes a cell wall-degrading enzyme that facilitates intervessel migration. Its intact status in PD X. fastidiosa Temecula may account for the more aggressive nature of PD compared with CVC (2); in CVC the gene is evidently not essential for disease development, since Koch's postulates have been experimentally fulfilled for strain 9a5c.
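Calling a gene interrupted by a frameshift or an in-frame stop codon comes down to reading the coding sequence codon by codon. A minimal sketch, assuming the standard bacterial stop codons and a CDS that starts at its first base; the sequences below are toy examples, not the polygalacturonase gene. Deleting a single base shifts the frame and exposes a premature TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard bacterial code


def codons(cds: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def internal_stops(cds: str) -> list[int]:
    """0-based indices of in-frame stop codons that occur before the last codon."""
    cs = codons(cds)
    return [k for k, c in enumerate(cs[:-1]) if c in STOP_CODONS]


def looks_interrupted(cds: str) -> bool:
    """Flag a CDS whose length is not a multiple of 3 or that has a premature stop."""
    return len(cds) % 3 != 0 or bool(internal_stops(cds))


intact = "ATGCTGAGCTAA"            # ATG CTG AGC TAA
shifted = intact[:3] + intact[4:]  # one-base deletion: ATG TGA GCT AA
print(internal_stops(intact), looks_interrupted(intact))    # [] False
print(internal_stops(shifted), looks_interrupted(shifted))  # [1] True
```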
PD X. fastidiosa Temecula has 41 strain-specific genes (1.9%), while CVC X. fastidiosa 9a5c has 152 such genes (6.8%) (Table 2). In both strains, more than half of these are hypothetical or conserved hypothetical genes, and a significant proportion are associated with mobile genetic elements. Among the PD X. fastidiosa-specific genes with assigned functions are a hydrolase gene with similarity to genes in Xanthomonas (gi21113352), Pseudomonas (gi15598992), and Salmonella (gi16763691) and a gene for a type II restriction and modification system most similar to that of the cyanobacterium Nostoc (gi547934). Genes for two other proteins, proteic killer and HicA, are shared with Nostoc (gi17232769) and E. coli (gi15804020). The CVC X. fastidiosa-specific genes with assigned functions include a gene for an O-antigen acetylase that is involved in lipopolysaccharide (LPS) modification and is similar to those found in S. enterica (gi16761319), Sinorhizobium meliloti (gi16761319), Mesorhizobium loti (gi13474718), Neisseria meningitidis (gi15795071), and Pseudomonas (gi15600431) (12, 34). They also include an additional drug resistance translocase gene that is most similar to genes identified in Caulobacter crescentus (gi16127299), Mycobacterium tuberculosis (gi15841836), and M. loti (gi13472297); this gene is located, along with 71 other specific genes, on the CVC-specific island described below. The apparently diverse origins of the strain-specific genes with assigned functions are also reflected in conserved hypothetical genes that are similar to genes from a large, unrelated group of bacteria including Xanthomonas, Pseudomonas, Ralstonia, Listeria innocua, Agrobacterium tumefaciens C58, C. crescentus, and even, for one gene in CVC X. fastidiosa, the distantly related eubacterium Microscilla (gi14485002).
It appears that individual genes accumulated in phages and transferable islands as these passed through many bacterial species before being incorporated into the X. fastidiosa genome (26).
Alignment of the PD X. fastidiosa Temecula and CVC X. fastidiosa 9a5c chromosomes, starting from the putative origins of replication, highlighted three chromosomal regions that were translocated and inverted between the two genomes despite their overall identity (Fig. 2). All such reorganization events occurred at least 250,000 bp from the putative origin of replication (Fig. 2A), as previously observed for other bacterial species (18). These three large rearranged chromosomal regions and other small rearrangements were all flanked at one border by a putative phage-related integrase, suggesting that they were phage mediated. The PD X. fastidiosa Temecula chromosome harbors eight clusters of phage-related regions, Xpd1 to Xpd8, none of which is organized in a manner similar to that of the four CVC X. fastidiosa 9a5c prophages described previously (XfP1 to XfP4) (33). The Stretcher global alignment program (25) was used to determine the overall nucleotide identity in a given region, enabling analysis of similarity among the phage-related regions. The Xpd1 region shares 83 and 78% nucleotide identity with CVC X. fastidiosa 9a5c prophages XfP2 and XfP1, respectively. All of the other CVC X. fastidiosa 9a5c prophage regions share less than 50% nucleotide identity with the Xpd phage clusters. Three of the phage-related regions are specific to the PD X. fastidiosa genome, and one is involved in one of the large rearrangements mentioned above. Three phage-related regions (Xpd5, Xpd6, and Xpd8) are highly divergent from the equivalent regions in the CVC X. fastidiosa chromosome, while Xpd1 maintains the same borders as XfP4. We have not considered these phage-related regions to be strain specific because we cannot determine whether the insertion events occurred prior to strain divergence. In addition, some genes are shared among these phage-related regions. Figure 2B is a schematic representation of the PD X. fastidiosa and CVC X. fastidiosa chromosomes illustrating the rearrangements and the relative positions of the prophage clusters.
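Overall nucleotide identity from a global alignment can be illustrated with a plain Needleman-Wunsch alignment. This is a simplified stand-in, not Stretcher itself: Stretcher (EMBOSS) uses a linear-space variant with a substitution matrix and affine gap penalties, whereas the sketch below fills a full quadratic-memory matrix with hypothetical linear scores. Identity is taken as identical aligned columns divided by all alignment columns, gaps included, which is one of several common conventions.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> float:
    """Percent identity of one optimal Needleman-Wunsch global alignment.

    Scores are hypothetical linear penalties, not Stretcher's defaults.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100 * identical / columns if columns else 0.0


print(global_identity("ACGT", "AGT"))  # 75.0 (alignment A-GT / AGT with one gap)
```

The quadratic matrix is fine for short toy sequences; prophage-sized regions of tens of kilobases are why Stretcher's linear-space approach matters in practice.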
Genomic islands specific to each genome were characterized on the basis of marked decreases in protein identities, different GC contents, and codon bias. Two of these islands, one specific to each genome, have higher GC contents, and their relative positions are indicated in Fig. 1. In PD X. fastidiosa Temecula, genomic island PD1 (giPD1) is 15.7 kb long, has 61.2% GC content, and harbors an extra copy of a hemagglutinin gene with a phage-related integrase at one end (Fig. 3A). In CVC X. fastidiosa 9a5c, genomic island CVC1 (giCVC1) is 67 kb long, has 63.3% GC content, and is inserted within tRNA Gly-2 (Fig. 3B). The integrase immediately adjacent to the tRNA Gly-2 gene is highly similar at the nucleotide (93%) and protein (87%) levels to a previously described Pseudomonas putida strain B13 integrase (31) that is associated with the P2 integrase/recombinase family. In Pseudomonas, the integrase is associated with a self-transmissible 105-kb clc element that carries the clcRABDE genes encoding chlorocatechol-degradative enzymes. It is interesting that different integrases can share common integration target sites (39). The integrase characterized for giCVC1 is targeted to the glycine tRNA structural gene (glyV), like the integrase associated with the Pseudomonas self-transmissible element.
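One of the signals used here, locally elevated GC content, can be screened for with a sliding window. A minimal sketch, assuming a hypothetical window size, step, and excess-GC threshold; it flags windows whose GC content exceeds the whole-sequence average by a chosen number of percentage points and does not model protein identity or codon bias.

```python
def gc_content(seq: str) -> float:
    """Percent G+C in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(start, %GC) for each full window of `window` bases, advanced by `step`."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq: str, window: int, step: int,
                    delta: float) -> list[tuple[int, float]]:
    """Windows whose %GC exceeds the whole-sequence average by at least `delta` points."""
    baseline = gc_content(seq)
    return [(i, gc) for i, gc in gc_windows(seq, window, step)
            if gc >= baseline + delta]


# Toy sequence with a GC-rich stretch in the middle.
toy = "AT" * 10 + "GC" * 10 + "AT" * 10
print(high_gc_windows(toy, 20, 20, delta=10))  # [(20, 100.0)]
```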
In an attempt to correlate the presence or absence of the genomic islands with a disease phenotype, PCR analyses were performed to characterize their distributions in different X. fastidiosa strains. Primers were designed for the island borders. Table 3 shows the giCVC1 distribution in 64 strains of X. fastidiosa isolated from different hosts and different geographical regions. The flanking regions of giCVC1 are the same as the corresponding regions in the PD X. fastidiosa Temecula genome, except that the whole region is inverted relative to the origin of replication. PCR with the same flanking primers produced a 4,587-bp product from the PD X. fastidiosa genome, indicating the absence of giCVC1. Across the strains tested, three distinct groups were identified based on the sizes of the amplified products (Table 3). The CVC group, which contains all or part of giCVC1, comprises most of the tested Brazilian strains regardless of host (citrus, coffee, hibiscus, and periwinkle). Surprisingly, the PD group, which lacks giCVC1, was subdivided into two groups. One group comprises strains isolated from grapevine, mulberry, almond (ATCC 35870), and oleander, with an amplified fragment of approximately 4.6 kb. The second group, comprising strains isolated from plum in Brazil and the United States and from almond (ATCC 700965), elm, oak, and periwinkle in the United States, produced a smaller amplified fragment (2.9 kb). These results are consistent with the existence of different groups of Xylella strains in North America and South America, as suggested previously (8, 28), with the exception of the plum strain isolated in Brazil. This strain had a pattern similar to that of the North American strains, which could indicate its recent introduction into Brazil via infected seedlings.
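The size-based grouping can be mimicked in silico: locate the forward primer, locate the reverse complement of the reverse primer downstream of it, and compare the implied product length with the sizes reported above. A minimal sketch, assuming exact primer matches and a hypothetical 10% size tolerance; the toy template and primers are invented, and products that fit neither class (for example, from strains carrying all or part of giCVC1) are returned as unassigned.

```python
from typing import Optional

# Expected product sizes from the text; the 10% tolerance is an assumption.
SIZE_CLASSES = [
    ("~4.6 kb (grapevine, mulberry, oleander, almond ATCC 35870)", 4600),
    ("~2.9 kb (plum, elm, oak, periwinkle, almond ATCC 700965)", 2900),
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_size(template: str, fwd: str, rev: str) -> Optional[int]:
    """Length of the product spanned by two primers on a template, or None.

    Exact matches only. The reverse primer is written 5'->3' on the opposite
    strand, so its reverse complement is searched for downstream of `fwd`.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    site = revcomp(rev)
    end = t.find(site, start + len(fwd))
    if end < 0:
        return None
    return end + len(site) - start


def classify_by_size(size_bp: Optional[int], tol: float = 0.10) -> str:
    """Assign a product size to one of the size classes described in the text."""
    if size_bp is None:
        return "no product"
    for label, expected in SIZE_CLASSES:
        if abs(size_bp - expected) <= tol * expected:
            return label
    return "unassigned"


template = "TTT" + "GATTACA" + "G" * 10 + "CCGGTT" + "AAA"
print(amplicon_size(template, "GATTACA", "AACCGG"))  # 23
print(classify_by_size(4587))
```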
In PD X. fastidiosa, giPD1 is located within the phage-related region Xpd2 (Fig. 3A). PCR analysis of this region in 30 different X. fastidiosa strains revealed a pattern more variable than that obtained for the giCVC1 distribution. The presence or absence of giPD1 could not be correlated with the groups described above, as both PD and CVC strains may contain this island. Careful inspection of the genome around giPD1 enabled us to characterize a 68.8-kb region that could represent the ancient insertion of a prophage and/or a conjugative transposon related to the Tn21 family. Transposon ends similar to those of Tn5053 were detected at both extremities of the proposed region, and a degenerate copy of the transposase was also found within the island. Tn5053 was originally described as a transposon that carries the mercuric resistance operon described for Xanthomonas (20). One interpretation of the PCR results is that giPD1 was already present in the ancestral Xylella genome prior to the divergence of the 30 strains studied here and that the evolution of each strain, irrespective of the plant host, was characterized by multiple losses from this ancestral island. In contrast, giCVC1 appears to be limited to Brazilian X. fastidiosa strains. It is therefore reasonable to infer that this island was acquired relatively recently by the original strain that spread to South America, but before that strain's expansion there. It is interesting that part of giCVC1 was also observed in X. axonopodis pv. citri (9), which causes citrus canker.
The biology of Xylella-induced disease is poorly understood, and the grouping of strains has been a strategy devised in part to develop effective disease management measures. Different techniques have been used to try to establish pathovar or subspecies categories not necessarily focused on the evolution of the group. Qin et al. (30) proposed the following natural groups of strains: 1, citrus and coffee; 2, grapevine, almond, and ragweed; and 3, elm, oak, and plum. Chen et al. (6) proposed three strain groups based on 16S ribosomal DNA sequences: 1, citrus and coffee; 2, grapevine and mulberry; and 3, elm, oak, peach, plum, and periwinkle. Based on our analysis of giCVC1, we propose three groups of strains: 1, citrus, coffee, and possibly other South American strains; 2, grapevine, mulberry, oleander, and some almond strains, all from North America; and 3, elm, oak, plum, periwinkle, and some almond strains, again all from North America. Hendson et al. (16) showed that with the exception of some almond strains, all X. fastidiosa strains isolated from the same host had identical sequences for the intergenic spacer of rRNA genes. The two almond strains that we examined are distinguished by the size of the amplified fragment corresponding to giCVC1 and by the presence of giPD1.
The presence of genomic islands in different closely related strains is known to represent the gain of adaptive traits by an organism; examples include the Mesorhizobium symbiotic island (35) and the LEE islands of enteropathogenic E. coli strains (23). The acquisition of such islands can result in evolution by quantum leaps. A comparative analysis of the two X. fastidiosa genomes permitted the identification of such islands in both genomes, although their adaptive functions remain to be demonstrated. Essentially all of the differences between the PD X. fastidiosa Temecula and CVC X. fastidiosa 9a5c genomes can be accounted for by the numbers and relative positions of clusters of phage-related genes and by insertion or deletion events, which include giPD1 and giCVC1. If prophage regions are excluded and rearrangements are reoriented, the genomes of both X. fastidiosa strains are very similar and colinear. We therefore propose that the evolutionary divergence of the two sequenced X. fastidiosa strains is mainly due to lateral gene transfer mediated mostly by phage vectors. It is noteworthy, however, that fewer lateral gene transfer events can be detected between the two X. fastidiosa genomes than between E. coli strains (27). Despite the genome rearrangements, the most significant conclusion to be drawn from the sequencing of the PD X. fastidiosa Temecula genome is that many of the genes in the two X. fastidiosa strains are highly similar, including not only those involved in basic cellular housekeeping but also many of those likely to have a direct role in pathogenicity. This conclusion suggests that the diseases caused by different X. fastidiosa pathotypes most likely rely on the expression of a common set of genes to allow the bacteria to become established in planta.
This possibility of common pathogenic mechanisms implies that functional genomics studies of the two organisms would share significant common ground, and their integration might accelerate advances in combating both PD and CVC. In this regard, critical cross-infection experiments with PD X. fastidiosa and CVC X. fastidiosa strains and reciprocal hosts would be of great immediate interest to evaluate this hypothesis.
The polygalacturonase gene was amplified and sequenced from 11 strains of X. fastidiosa (see list in supplementary material). In addition to CVC X. fastidiosa 9a5c, all the citrus and coffee strains examined showed the same frameshift, whereas the other strains, including mulberry, almond, and grape isolates, did not.
This project was funded by FAPESP (São Paulo, Brazil), CNPq (Brasília, Brazil), USDA-ARS, the American Vineyard Foundation, and the California Department of Food and Agriculture.
The DNA used was isolated by E. Civerolo at the University of California, Davis, and the sequencing was undertaken by the Agricultural and Environmental Genomics Group of the Organization for Nucleotide Sequencing and Analysis founded by FAPESP. We thank all of the technicians involved in this project.