Xanthomonas is a large genus of bacteria that collectively cause disease on more than 300 plant species. The broad host range of the genus contrasts with stringent host and tissue specificity for individual species and pathovars. Whole-genome sequences of Xanthomonas campestris pv. raphani strain 756C and X. oryzae pv. oryzicola strain BLS256, pathogens that infect the mesophyll tissue of the leading models for plant biology, Arabidopsis thaliana and rice, respectively, were determined and provided insight into the genetic determinants of host and tissue specificity. Comparisons were made with genomes of closely related strains that infect the vascular tissue of the same hosts and across a larger collection of complete Xanthomonas genomes. The results suggest a model in which complex sets of adaptations at the level of gene content account for host specificity and subtler adaptations at the level of amino acid or noncoding regulatory nucleotide sequence determine tissue specificity.
The genus Xanthomonas is a member of the class Gammaproteobacteria and consists of 20 plant-associated species, many of which cause important diseases of crops and ornamentals. Individual species comprise multiple pathogenic variants (pathovars [pv.]). Collectively, members of the genus cause disease on at least 124 monocot species and 268 dicot species, including fruit and nut trees, solanaceous and brassicaceous plants, and cereals (32). They cause a variety of symptoms, including necrosis, cankers, spots, and blight, and they affect a variety of plant parts, including leaves, stems, and fruits (47). The broad host range of the genus contrasts strikingly with the narrow host ranges of individual species and pathovars (96), which also exhibit a marked tissue specificity by colonizing either the xylem or the intercellular spaces of nonvascular, mesophyll tissues.
Morphological and physiological uniformity within Xanthomonas has hampered the establishment of a stable taxonomy reflective both of the phenotypic diversity and of the evolutionary relationships within the genus (32, 86). The current taxonomy, proposed by Vauterin et al. (96) and later substantiated and refined by Rademaker et al. (67), however, is robust. On the basis of molecular genotyping, 20 species (genomic groups) are recognized, with each comprising one to several pathovars. Within this framework, examples are found both of apparent convergent evolution with regard to pathogenic traits, e.g., isolates in different genomic groups that infect the same host(s), and of divergent evolution, e.g., isolates in the same genomic group that infect different hosts or the same host differently (67). By establishing molecular genetic relationships among the more than 140 known members of the genus with distinctive pathogenic characteristics, the current taxonomy sets the stage for informed comparisons directed toward identifying unique determinants of host and tissue specificity as well as pathogenicity factors that are universally important in plant disease.
Complete genome sequences of nine Xanthomonas strains, representing pathovars within four species, have been published. The strains are strain 306 of X. axonopodis pv. citri, which causes citrus canker (18); strain 85-10 of X. axonopodis pv. vesicatoria, the bacterial spot pathogen of pepper and tomato, formerly a pathovar of X. campestris (92); strains 8004, ATCC 33913, and B100 of X. campestris pv. campestris, the causal agent of black rot in crucifers, including the model plant Arabidopsis thaliana (here Arabidopsis) (18, 66, 98); strains KACC10331, MAFF311018, and PXO99A of X. oryzae pv. oryzae, which is responsible for bacterial blight of rice (44, 59, 74); and strain GPE PC73 of X. albilineans, a more distantly related, xylem-limited pathogen of sugarcane with a markedly reduced genome size relative to other Xanthomonas spp. (62).
We report here the complete genome sequence of the nonvascular counterpart of X. campestris pv. campestris, namely, X. campestris pv. raphani (strain 756C, formerly classified as X. campestris pv. armoraciae), which causes bacterial spot of crucifers, including Arabidopsis, and the complete genome sequence of the nonvascular counterpart of X. oryzae pv. oryzae, namely, X. oryzae pv. oryzicola (strain BLS256), which causes bacterial leaf streak of rice. In addition to facilitating functional genomics and DNA-based diagnostics important for understanding and controlling the respective diseases, these genome sequences enable key comparisons within the vascular and nonvascular pairs to shed light on tissue specificity and across the pairs to provide insight into host specificity. They also enable broader comparisons across all the available complete genome sequences to identify candidate specificity determinants as well as candidate genes fundamental to pathogenesis, irrespective of the host and tissue infected. Results of such comparisons are also presented.
Bacterial genomic DNA was randomly sheared by nebulization, end repaired with consecutive BAL31 nuclease and T4 DNA polymerase treatments, and size selected using gel electrophoresis on 1% low-melting-point agarose. After ligation to BstXI adapters, DNA was purified by three rounds of gel electrophoresis to remove excess adapters and the fragments were ligated into the vector pHOS2 (a modified pBR322 vector) linearized with BstXI. The pHOS2 plasmid contains two BstXI cloning sites immediately flanked by sequencing primer binding sites. These features reduce the frequency of nonrecombinant clones and reduce the amount of vector sequences at the end of the reads. The ligation reaction products were electroporated into Escherichia coli. We constructed several shotgun libraries with average insert sizes of 4.5, 7, 11, and 13.5 kb for X. oryzae pv. oryzicola BLS256 and 4, 5, and 9 kb for X. campestris pv. raphani 756C. Clones were plated onto large-format (16- by 16-cm) diffusion plates prepared by layering 150 ml of fresh antibiotic-free agar onto a previously set 50-ml layer of agar containing antibiotic. Colonies were picked for template preparation, inoculated into 384-well blocks containing liquid medium, and incubated overnight with shaking. High-purity plasmid DNA was prepared using a DNA purification robotic workstation custom-built by Thermo CRS (Thermo Scientific) and based on the alkaline lysis miniprep (75) and isopropanol precipitation. DNA precipitate was washed with 70% ethanol, dried, and resuspended in 10 mM Tris-HCl buffer containing a trace of blue dextran. The yield of plasmid DNA was approximately 600 to 800 ng per clone, providing sufficient DNA for at least four sequencing reactions per template. Sequencing was done using a dideoxy sequencing method (76). Two 384-well cycle sequencing reaction plates were prepared from each plate of plasmid template DNA for opposite-end, paired-sequence reads. 
Sequencing reactions were completed using the BigDye Terminator chemistry and standard M13 forward and reverse primers. Reaction mixtures, thermal cycling profiles, and electrophoresis conditions were optimized to reduce the volume of the BigDye Terminator mix and to extend read lengths on the AB3730xl sequencers (Applied Biosystems). Sequencing reactions were set up by the Biomek FX pipetting workstations. Robots were used to aliquot and combine templates with reaction mixtures consisting of deoxy- and fluorescently labeled dideoxynucleotides, DNA polymerase, sequencing primers, and reaction buffer in a 5-μl volume. After 30 to 40 consecutive cycles of amplification, reaction products were precipitated by isopropanol, dried at room temperature, resuspended in water, and transferred to an AB3730xl sequencer. A total of 58,510 and 75,159 successful X. oryzae pv. oryzicola BLS256 and X. campestris pv. raphani 756C reads, respectively, were produced. The average read lengths were 859 bp for X. oryzae pv. oryzicola BLS256 and 796 bp for X. campestris pv. raphani 756C. After initial assembly, gaps were closed in each genome by primer walking on plasmid templates, sequencing genomic PCR products that spanned the gaps, and transposon insertion and sequencing of selected shotgun clones.
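As a back-of-the-envelope check on sequencing depth, the read counts and average read lengths given above can be combined with the finished chromosome lengths reported for the two strains (4,831,739 bp and 4,941,214 bp). The sketch below is illustrative only: it assumes every successful read contributes its full average length, ignores vector and quality trimming, and the function and variable names are hypothetical.

```python
def raw_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Approximate fold coverage: total sequenced bases / genome length."""
    return n_reads * mean_read_len / genome_size


# (successful reads, mean read length in bp, finished genome size in bp)
datasets = {
    "X. oryzae pv. oryzicola BLS256": (58_510, 859, 4_831_739),
    "X. campestris pv. raphani 756C": (75_159, 796, 4_941_214),
}

# Untrimmed estimate; the true assembled depth will be somewhat lower.
coverage = {name: raw_coverage(*values) for name, values in datasets.items()}

for name, depth in coverage.items():
    print(f"{name}: ~{depth:.1f}x raw coverage")
```

Under these assumptions the raw depth comes out at roughly 10-fold for BLS256 and 12-fold for 756C.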
Multiple rounds of assembly were performed, beginning with the shotgun reads and, later, including additional finishing reads. In the final assemblies, 74,930 reads (for X. campestris pv. raphani 756C) and 57,520 reads (for X. oryzae pv. oryzicola BLS256) were trimmed to remove vector and low-quality sequences and then assembled using Celera Assembler (57). The assemblies were evaluated using the Hawkeye assembly diagnosis software (77) to identify and correct any collapsed repeats. Protein-coding genes were identified using the Glimmer, version 3.0, program (20, 21), which includes an algorithm to identify ribosome binding sites for each gene. Transcription terminators were predicted using the TransTermHP tool (42) with parameter settings expected to yield over 90% accuracy. Transfer RNAs were identified with the tRNAScanSE server (51). Regions with neither Glimmer predictions nor RNA genes were searched in all six frames using BLASTx (2) to identify any missed proteins, and all annotations were manually curated as described previously (34). The origin and terminus of replication were determined using GC skew analysis (50), which for X. campestris pv. raphani 756C indicates an origin near position 1 and a terminus near 2,460 kb. The chromosome replication initiator gene dnaA, which is commonly found near the origin, is at position 201 in the X. campestris pv. raphani 756C genome and at position 43 in the X. oryzae pv. oryzicola BLS256 genome. Oligomer skew analysis, which identifies 8-mers preferentially located on the leading strand (73), indicates an origin for X. campestris pv. raphani 756C at position 99 and a terminus at 2,471 kb, based on multiple 8-mers, including CCCTGCCC and CGCCCTGC. These 8-mers occur 409/468 (87.4%) and 856/1,205 (71.0%) times on the leading strand; for CCCTGCCC the likelihood that this occurred by chance is 2.7 × 10^−58. Similarly, in X. oryzae pv. oryzicola BLS256, the 8-mer CCCTGCCC occurs 325/368 (88.3%) times on the leading strand and indicates an origin at position 57 kb and a terminus at 2,473 kb. (The oligomer skew software is available at http://cbcb.umd.edu/~salzberg/docs/skew/.)
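The idea behind locating the origin and terminus by GC skew can be sketched as follows. This is a generic cumulative GC skew calculation, not the software used in this study: it assumes the common convention that the cumulative skew (G − C)/(G + C) reaches its minimum near the origin and its maximum near the terminus, and the window size and function names are arbitrary choices made for illustration.

```python
from typing import List, Tuple


def cumulative_gc_skew(seq: str, window: int = 1000) -> Tuple[List[int], List[float]]:
    """Return window end positions and the running sum of (G-C)/(G+C)."""
    seq = seq.upper()
    positions: List[int] = []
    cumulative: List[float] = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # skip windows without any G or C
            total += (g - c) / (g + c)
        positions.append(start + len(chunk))
        cumulative.append(total)
    return positions, cumulative


def origin_and_terminus(seq: str, window: int = 1000) -> Tuple[int, int]:
    """Estimate origin (skew minimum) and terminus (skew maximum)."""
    positions, cumulative = cumulative_gc_skew(seq, window)
    origin = positions[cumulative.index(min(cumulative))]
    terminus = positions[cumulative.index(max(cumulative))]
    return origin, terminus
```

On a real chromosome the estimates are only as fine as the window size, which is one reason a complementary signal such as the position of dnaA or strand-biased 8-mers is useful.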
The genome sequences compared in this study are listed in Table 1. In the tables and figures throughout, where sequences for multiple strains of the same pathovar were identical, the sequence for only one strain is noted. The genome of X. albilineans GPE PC73, which is reduced and divergent relative to the other genomes, was not included.
The complete nucleotide sequences of a set of 31 phylogenetic marker genes (primarily genes involved in transcription, translation, and replication) were extracted from each of the genomes and aligned to generate a maximum-likelihood tree with 1,000 bootstrap replicates using the MEGA5 program and applying the GTR + G (gamma distribution) model of nucleotide substitution (89; http://www.megasoftware.net). Sequences from Stenotrophomonas maltophilia strain K279a (15) were used as an outgroup.
Whole-genome alignments were carried out using the MAUVE program with default parameters (16).
Nucleotide sequences of transcription activator-like (TAL) effector genes from X. oryzae pv. oryzicola BLS256, X. oryzae pv. oryzae MAFF 311018, and X. oryzae pv. oryzae PXO99A and of pthA of X. axonopodis pv. citri (gi 899439), with the repeat regions removed, were aligned using the software MAFFT (40) with the L-INS-I utility. Pseudogenes were excluded. The repeat region was identified as starting after the sequence CCCCCTGAAC and ending before (A/G)GCATTGTTGC. The Phyml algorithm (30) was used to construct the maximum-likelihood phylogenetic tree on the basis of the alignment with 100 bootstrap replicates. The jModelTest tool (63) identified the TPM1uf+I+G model to be the best according to the Bayesian information criterion, and this model was used by Phyml. TPM1uf is the Kimura 3-parameter model (K81) (41) with unequal base frequencies (uf), invariant sites (I), and gamma-distributed rates (G). Maximum-likelihood phylogenetic trees were also constructed from amino acid alignments using the JTT+G+F model, selected by the ProtTest program (1).
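The repeat-region boundaries stated above lend themselves to a simple pattern match. The following is a minimal sketch, not the script used in the study: it assumes the first occurrence of the start motif and the first subsequent occurrence of the end motif delimit the repeat region, keeps both flanking motifs in the output, and uses a hypothetical function name.

```python
import re
from typing import Optional

START_MOTIF = "CCCCCTGAAC"                  # repeat region begins after this
END_MOTIF = re.compile(r"[AG]GCATTGTTGC")   # repeat region ends before this


def remove_repeat_region(seq: str) -> Optional[str]:
    """Return the sequence with the central repeat region excised.

    Returns None when either boundary motif cannot be found.
    """
    seq = seq.upper()
    start_hit = seq.find(START_MOTIF)
    if start_hit == -1:
        return None
    repeat_start = start_hit + len(START_MOTIF)
    end_hit = END_MOTIF.search(seq, repeat_start)
    if end_hit is None:
        return None
    # Keep everything up to and including the start motif, then resume
    # at the first base of the end motif.
    return seq[:repeat_start] + seq[end_hit.start():]
```

Sequences for which the function returns None would need manual inspection before alignment.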
The annotated protein sequences of each genome in Table 1, 44,658 protein sequences total, were pairwise compared by using BLASTp (11) with an E-value threshold of 1e−10, complexity filter inactivated, and output formatted for the OrthoMCL program (option 8). The BLASTp output was processed with the OrthoMCL program using default parameters (48), resulting in 5,285 clusters of homologous proteins, with each cluster consisting of at least two sequences, which were not necessarily from different strains. Unique sequences were then included as singlets with the set of clusters. Finally, all sequences annotated as transposon related were collected in a separate BLAST database, and sequences in the set of clusters were deleted if they matched sequences in the transposon database file at a threshold of 1e−20. Clusters that had more than 50% of their initial members culled were entirely removed. Also removed were short sequences of less than 60 amino acids. The final set of clustered sequences consisted of 7,040 clusters, ranging from 1 cluster with 81 members to 2,112 singlets.
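The post-clustering filters described above can be sketched in a few lines. This is an illustrative reconstruction, not the in-house scripts: the data structures and function name are hypothetical, the set of transposon-matching sequence identifiers is assumed to have been computed beforehand from the BLAST search, and the order in which the length filter is applied relative to the 50% rule is an assumption.

```python
from typing import Dict, List, Set


def filter_clusters(
    clusters: Dict[str, List[str]],
    transposon_hits: Set[str],
    lengths: Dict[str, int],
    min_length: int = 60,
    max_culled_fraction: float = 0.5,
) -> Dict[str, List[str]]:
    """Drop transposon-related and short sequences from clusters."""
    kept: Dict[str, List[str]] = {}
    for cluster_id, members in clusters.items():
        if not members:
            continue
        survivors = [m for m in members if m not in transposon_hits]
        culled_fraction = (len(members) - len(survivors)) / len(members)
        # Remove the whole cluster when more than half of its members matched
        # the transposon database.
        if culled_fraction > max_culled_fraction:
            continue
        # Assumed to run after the transposon cull: drop short sequences.
        survivors = [m for m in survivors if lengths[m] >= min_length]
        if survivors:
            kept[cluster_id] = survivors
    return kept
```

Singlets can be passed through the same function as one-member clusters.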
On the basis of the strains represented in them, clusters were associated with 1 of 15 categories derived from the two independent classifications of strains as either vascular (V) or nonvascular (N) in their mode of infection and monocot (M) or dicot (D) with respect to their host(s). To determine genes specific, essential, or both specific and essential to different strain categories, Venn diagrams were drawn in R using the VennDiagram package (13).
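One way to read the 15 categories is as the nonempty combinations of the four strain groups VM, VD, NM, and ND (2^4 − 1 = 15). The sketch below illustrates that reading; it is an interpretation rather than the authors' code, and the strain labels and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

GROUPS = ("VM", "VD", "NM", "ND")  # vascular/nonvascular x monocot/dicot


def all_categories() -> List[FrozenSet[str]]:
    """Enumerate every nonempty combination of the four strain groups."""
    return [
        frozenset(combo)
        for size in range(1, len(GROUPS) + 1)
        for combo in combinations(GROUPS, size)
    ]


def categorize(cluster_strains: Iterable[str],
               strain_group: Dict[str, str]) -> FrozenSet[str]:
    """Map the strains present in a cluster to its category."""
    return frozenset(strain_group[s] for s in cluster_strains)


# Hypothetical labels for four of the strains discussed in the text.
STRAIN_GROUP = {
    "Xoo_PXO99A": "VM",   # X. oryzae pv. oryzae: vascular, monocot host
    "Xoc_BLS256": "NM",   # X. oryzae pv. oryzicola: nonvascular, monocot host
    "Xcc_8004": "VD",     # X. campestris pv. campestris: vascular, dicot host
    "Xcr_756C": "ND",     # X. campestris pv. raphani: nonvascular, dicot host
}
```

For example, a cluster found only in the two X. oryzae strains above falls in the category {VM, NM}, a candidate set for monocot-associated genes.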
In-house computational scripts for automation of the analyses and intermediate output files are available at http://brendelgroup.org/JBac11/.
The complete genome sequences and annotation for X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 have been deposited in GenBank under accession numbers CP002789 and AAQN01000001, respectively. The traces have been deposited in the NCBI Trace Archive (http://www.ncbi.nlm.nih.gov/Traces), and the complete assemblies are in the NCBI Assembly Archive (http://www.ncbi.nlm.nih.gov/Traces/assembly).
X. campestris pv. raphani 756C is a chloramphenicol-resistant derivative of strain 756 which can be transformed efficiently and is virulent on Arabidopsis (38, 39; A. J. Bogdanove, unpublished data). Strain 756 was isolated from cabbage in East Asia prior to 1985 (3, 4). The strain was used in early studies of the role of hrpX in bacterial blight and bacterial leaf spot of crucifers (39). Originally classified as X. campestris pv. armoraciae, X. campestris pv. raphani 756C is among a large collection of X. campestris strains that had been isolated from cruciferous plants and classified under six pathovar designations (including X. campestris pv. armoraciae) that were recently regrouped by Fargier and Manceau into X. campestris pv. campestris, which they defined as causing black rot on at least one cruciferous species; X. campestris pv. incanae, causing bacterial blight on garden stock and wallflower; and X. campestris pv. raphani, causing bacterial leaf spot on both cruciferous and solanaceous plants (24). Fargier and colleagues have since shown additional support for this regrouping by multilocus sequence typing (25). Further, the authors identified X. campestris pv. raphani 756C as X. campestris pv. raphani race 3 (24) using the set of differential cultivars and accessions of Brassica spp. established by Vicente et al. (97).
X. oryzae pv. oryzicola BLS256 is a moderately to highly virulent isolate collected in 1984 from a breeding plot at the International Rice Research Institute (IRRI) Experiment Station in the Philippines (C. M. Vera-Cruz, unpublished data). X. oryzae pv. oryzicola BLS256 has been used previously in molecular studies. It was shown to contain a homologue of the hrpXo gene from X. oryzae pv. oryzae PXO99A (69), and it is the original source of the avrRxo1 avirulence gene that triggers gene-for-gene resistance in maize and transgenic rice plants carrying the Rxo1 resistance gene (107, 108).
The X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 genomes are single circular chromosomes of 4,941,214 bp and 4,831,739 bp, respectively, with 65.3% and 64.0% GC contents, respectively (Fig. 1). The X. campestris pv. raphani 756C genome contains 4,535 annotated protein-coding genes and 57 tRNA genes. The X. oryzae pv. oryzicola BLS256 genome contains 4,480 protein-coding genes and 54 tRNA genes. Each genome contains two rRNA operons (Table 2).
A genome-based phylogenetic relationship among the completely sequenced Xanthomonas strains is shown in Fig. 2. This phylogenomic inference was drawn on the basis of the complete sequences of a set of 31 conserved housekeeping genes that are considered to be rarely subjected to horizontal gene transfer (35). The same set of genes was recently used to establish phylogenomic relationships of 106 bacterial and archaeal genomes (102). The Xanthomonas tree is well resolved and robust, with high bootstrap values for all clades. As expected, X. oryzae pv. oryzicola BLS256 and all X. oryzae pv. oryzae strains together form their own phylogenetic group. In contrast, X. campestris pv. raphani 756C, though closely related to the X. campestris pv. campestris strains, forms a distinct clade. The tree also reveals that the X. oryzae strains are more closely related to the X. axonopodis strains than to the X. campestris strains.
A striking feature shared by the Xanthomonas genomes is an abundance of insertion sequence (IS) elements, which are postulated to be important drivers of Xanthomonas genome evolution (55, 92). X. oryzae strains have particularly large numbers and the greatest diversity of IS elements yet observed in the genus (74). X. oryzae pv. oryzicola BLS256 is no exception, with 245 IS elements (including partial ones) representing six distinct families. The X. campestris pv. raphani 756C genome carries only 41 IS elements representing three families. When IS element content is compared across sequenced X. oryzae and X. campestris genomes, the families represented and patterns of amplification are similar within pathovars but distinct across pathovars and across species (see Fig. S1 and Table S1 in the supplemental material). Only three out of the eight IS families detected, IS3, IS4, and IS5, are represented in all strains. Members of the IS5 family are the most numerous. Four families, IS256, IS30, IS630, and ISL3, are represented only in X. oryzae strains. IS1114 (IS5 family) is the only individual IS element present in all the sequenced Xanthomonas genomes, including the X. axonopodis pv. citri 306 and X. axonopodis pv. vesicatoria 85-10 genomes.
In addition to serving as vectors for lateral gene transfer, IS elements can generate other types of genome modifications, including rearrangements, inversions, and deletions, any of which can lead to acquisition, modification, or loss of gene content. Whole-genome alignments of X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 to their respective vascular counterparts reveal numerous rearrangements and inversions within each group, but these are markedly more prevalent among the X. oryzae strains (Fig. 3). The X. oryzae strains also show larger and more numerous insertions and deletions. Not surprisingly, most of the breakpoints in the alignments in both groups are associated with IS elements (see Fig. S2 in the supplemental material).
Although the X. oryzae strains are not obligate pathogens, they conceivably tolerate larger numbers of IS elements and consequent gene disruptions because of a more consistent association with their host plant. Rice is cultivated more extensively than the cruciferous hosts of X. campestris and often across multiple seasons per year, providing a more stable environment than X. campestris might experience. This could alleviate the requirement for some genes and thereby make the genome more tolerant of IS element proliferation. Alternatively or additionally, intensive cultivation of rice might have imposed selection on X. oryzae for the ability to rapidly adapt genetically. Genome plasticity conferred by IS elements may represent a mechanism for this.
Different patterns of IS element distribution in the two otherwise genetically highly similar pathovars support this notion.
Type III effectors (T3Es) are delivered into plant cells via the hrp gene-encoded type III secretion (T3S) system, which is required for pathogenesis in both X. campestris pv. raphani and X. oryzae pv. oryzicola (39, 53). They include proteins that act as virulence factors by promoting disease or avirulence factors by triggering host plant defense, depending on the host genotype. Type III effector genes in the X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 genomes were predicted by BLASTn using a database of identified and candidate T3Es of plant pathogens (http://www.xanthomonas.org).
X. campestris pv. raphani 756C differs markedly with respect to its repertoire of T3E genes from the other Xanthomonas strains, which generally contain a core set of 10 (avrBs2, xopF1, xopK, xopL, xopN, xopP, xopQ, xopR, xopX, and xopZ1) (http://www.xanthomonas.org). X. campestris pv. raphani 756C contains only three of these (xopF1, xopP, and xopR) and four other T3E genes that are less broadly conserved (Table 3). All the T3S genes themselves, including the hpa and hrpW accessory protein genes, are intact (52). Bordering the hrp gene cluster, a novel candidate T3E gene, xopAT, was identified on the basis of the presence of a plant-inducible promoter (PIP) box and a properly positioned −10 box-like sequence upstream of the coding sequence (Table 3; see Table S2 in the supplemental material). The PIP box is a conserved promoter element for the coregulation of hrp and many T3E genes by the activator HrpX (26, 28, 43, 91, 95). Including xopAT, seven of the eight X. campestris pv. raphani 756C candidate T3E genes are preceded by the PIP and −10 motifs (see Table S2 in the supplemental material). Like other T3E genes, xopAT also has an unusual (low) G+C content suggestive of acquisition by horizontal gene transfer (Table 3). Finally, in common with other T3Es, the encoded protein contains eukaryotic motifs for myristoylation and palmitoylation (93) but otherwise has no similarity to other known proteins. X. campestris pv. raphani 756C carries two other genes, avrXccA1 and avrXccA2, that are related to avrXca of X. campestris pv. raphani strain 1067, which was shown to function as an avirulence gene when expressed heterologously in X. campestris pv. campestris 8004 inoculated into Arabidopsis thaliana (60). Some features of these genes, however, including an N-terminal signal peptide for type II secretion, suggest that they are not bona fide T3Es. Overall, across the sequenced Xanthomonas genomes, the T3E gene content of X. campestris pv. raphani 756C is most similar to that of the other X. campestris strains. Of particular interest, X. campestris pv. raphani 756C contains xopAC (synonym avrAC), which in X. campestris pv. campestris 8004 triggers vascular tissue-specific resistance in A. thaliana ecotype Col-0 (104). The gene in X. campestris pv. raphani 756C is highly conserved with its homolog in X. campestris pv. campestris 8004, including its promoter, which contains the same imperfect PIP box and −10 box-like sequences. An xopAC mutant of X. campestris pv. campestris 8004 retained full virulence on A. thaliana ecotype Kashmir and became virulent on A. thaliana ecotype Col-0 following vein inoculation (104). It would be of interest to test whether mutation of the xopAC allele in X. campestris pv. raphani 756C might enable it to colonize the xylem of its hosts.
Twenty-six candidate T3E genes were identified in the X. oryzae pv. oryzicola BLS256 genome, with 19 preceded by the PIP and −10 motifs (Table 4; see Table S3 in the supplemental material). This number does not include the large family of TAL effector genes, discussed in the next section. In addition to each of the 10 core T3E genes, several candidates with homologs in the genomes of only a subset of the Xanthomonas strains or other plant pathogenic bacteria were found. One, xopAF, is present only in X. oryzae pv. oryzicola BLS256 and three Pseudomonas syringae genomes examined, though a homolog exists in X. axonopodis pv. vesicatoria strain 91-118 (5). With the exception of xopAF and four other T3E genes, xopAJ (synonym avrRxo1), xopAK, xopI, and xopO, all of the X. oryzae pv. oryzicola BLS256 T3E genes are present in each of the three X. oryzae pv. oryzae genomes, though xopU and xopY are pseudogenes in X. oryzae pv. oryzae KACC10331. X. oryzae pv. oryzicola was shown to inhibit the rice defense response to several TAL effector avirulence proteins of X. oryzae pv. oryzae; this ability was T3S dependent, indicating that one or more X. oryzae pv. oryzicola T3Es interferes with rice resistance gene-mediated effector recognition or signaling (53). The ones missing from the X. oryzae pv. oryzae strains are attractive candidates. No virulence function has yet been reported for XopAJ (107), but a homolog of XopAF in P. syringae was shown to suppress plant basal immunity (49). Also, HopK1, the P. syringae homolog of XopAK, was reported to suppress the defense-associated hypersensitive reaction triggered by T3E HopPsyA in Nicotiana tabacum and to induce jasmonic acid-responsive genes (33, 36). Four of the five X. oryzae pv. oryzicola BLS256 T3E genes missing from the X. oryzae pv. oryzae strains are conserved in X. axonopodis pv. vesicatoria 85-10, and as noted, the fifth, xopAF, is present in X. axonopodis pv. vesicatoria strain 91-118. 
It is therefore tempting to speculate that one or more of these also play a role in tissue specificity. Across the Xanthomonas genomes, xopC2, which encodes an SKWP-related protein, is present and intact only in the X. oryzae strains, suggesting a possible role in Xanthomonas adaptation to monocots, though an allele is also present in Ralstonia solanacearum, a dicot pathogen.
Upon delivery into host cells via the type III secretion system, TAL effectors enter the nucleus, bind to effector-specific sequences in host gene promoters, and drive transcription of host disease susceptibility (S) genes that facilitate bacterial colonization, symptom development, or pathogen dissemination (for recent reviews, see references 8, 9, and 100). TAL effector specificity is determined by a central domain of tandem, 33- to 35-amino-acid repeats. A polymorphic pair of residues at positions 12 and 13 in each repeat, the repeat-variable diresidue (RVD), specifies individual nucleotides in the binding site, with the four most common RVDs each preferentially associating with one of the four bases. Binding sites, which correspond in length to the number of repeats in the effector, are referred to as upregulated by TAL effector (UPT) boxes.
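The RVD-to-nucleotide correspondence can be illustrated with a short lookup. The text above does not list the four common RVDs; the mapping used here (HD to C, NI to A, NG to T, NN to G) is the widely reported TAL effector code and is supplied as outside knowledge. The function name is hypothetical, and the sketch ignores finer points such as the ability of NN to pair with A as well as G and the specificities of rarer RVDs.

```python
from typing import Dict, Iterable

# Most frequent base associated with each of the four common RVDs.
RVD_TO_BASE: Dict[str, str] = {
    "HD": "C",
    "NI": "A",
    "NG": "T",
    "NN": "G",  # NN is also reported to associate with A
}


def predict_upt_box(rvds: Iterable[str], unknown: str = "N") -> str:
    """Predict a binding-site sequence, one nucleotide per repeat.

    RVDs outside the four common ones are written as `unknown`.
    """
    return "".join(RVD_TO_BASE.get(rvd, unknown) for rvd in rvds)
```

Because each repeat contributes one position, the predicted site has the same length as the RVD sequence, matching the statement that binding sites correspond in length to the number of repeats.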
Though some X. campestris pv. raphani strains and others reported to be X. campestris pv. armoraciae carry TAL effector genes, the X. campestris pv. raphani 756C genome has none. Among Xanthomonas spp. that carry TAL effector genes, X. oryzae stands out in terms of the large numbers present in individual strains and in the diversity across strains. X. oryzae pv. oryzicola BLS256 is no exception: it contains 26 TAL effector genes and 2 TAL effector pseudogenes (see Table S4 and Fig. S3 in the supplemental material). Several TAL effectors of X. oryzae pv. oryzae have been functionally characterized (for a review, see reference 100). They include both virulence factors, which activate host S genes, and avirulence factors, so called because they activate host resistance (R) genes. Function has yet to be assigned for any X. oryzae pv. oryzicola TAL effector, but candidate targets in rice for six X. oryzae pv. oryzicola BLS256 effectors were identified on the basis of UPT box nucleotide sequences predicted from the effector RVDs (56).
As in X. oryzae pv. oryzae KACC10331, X. oryzae pv. oryzae MAFF 311018, and X. oryzae pv. oryzae PXO99A, all of the encoded X. oryzae pv. oryzicola BLS256 TAL effector repeats are either 33 or 34 amino acids long, except the fifth repeat of Tal9b, which is missing 4 amino acids at the end following the RVD. The numbers of RVDs range from 13 to 27 (see Fig. S3 in the supplemental material). The RVD sequences are distinct from one another and from those of the X. oryzae pv. oryzae TAL effectors, suggesting that the two pathovars generally target different host genes. Furthermore, the X. oryzae pv. oryzicola BLS256 and X. oryzae pv. oryzae TAL effectors appear to have differentiated independently (Fig. 4). The maximum-likelihood tree inferred from an alignment of X. oryzae pv. oryzicola BLS256, X. oryzae pv. oryzae MAFF 311018, and X. oryzae pv. oryzae PXO99A TAL effector nucleotide sequences, with the repeat regions removed and using the pthA gene of X. axonopodis pv. citri as the outgroup, places all X. oryzae pv. oryzicola BLS256 TAL effector sequences in a monophyletic clade with 73% bootstrap support. The X. oryzae pv. oryzae TAL effector sequences are also monophyletic, albeit with lower bootstrap support (20%). The low bootstrap support for the X. oryzae pv. oryzae clade reflects uncertainty in the location of the outgroup. However, analyses with 17 other members of the Xanthomonas TAL effector (79) family as outgroups each consistently split the X. oryzae pv. oryzicola BLS256 and X. oryzae pv. oryzae sequences into distinct monophyletic groups (data not shown). X. oryzae pv. oryzae KACC10331 TAL effectors were not included in the analysis due to a discrepancy in the number of these genes in the published genome and previous estimates by Southern blotting suggesting misassembly (44, 105). Nevertheless, the evidence suggests that a single TAL effector gene existed in the common ancestor of X. oryzae pv. oryzicola and X. oryzae pv. oryzae that was recruited for specialization and underwent duplication independently in the two lineages. The tree also suggests that duplication occurred both before and after the split of X. oryzae pv. oryzae MAFF 311018 and X. oryzae pv. oryzae PXO99A. Though phylogenetic resolution is low within the X. oryzae pv. oryzae cluster, there is some indication of gene loss as well. Another possibility is that intragenome recombination, encouraged by the central repeat region, causes inconclusive phylogenetic signals within the X. oryzae pv. oryzae lineage. Analysis of amino acid sequence data (data not shown) also shows a monophyletic X. oryzae pv. oryzicola clade; however, the X. oryzae pv. oryzae clade is split by the outgroup. This tree suggests at least two TAL effector genes in the ancestor, but low bootstrap support and conflicting results from the more informative DNA sequence data (83) argue against this conclusion.
The X. oryzae pv. oryzicola BLS256 TAL effector genes are distributed across 12 loci in the genome (see Fig. S3 in the supplemental material), defined by proximity and shared orientation of the genes at each locus. Five of these contain just one gene, three contain two genes, three contain three genes, and the remaining locus contains six genes and the two pseudogenes. In contrast to TAL effector genes in the sequenced X. oryzae pv. oryzae genomes, which are typically separated by a conserved 989-bp sequence, the X. oryzae pv. oryzicola BLS256 TAL effectors are uniformly preceded immediately by 108 to 134 bp of conserved sequence similar to the start codon proximal end of the 989-bp X. oryzae pv. oryzae sequence, but with a 7-bp insertion and 10 substitutions. Though shorter, this sequence suggests that, as in X. oryzae pv. oryzae, the loci are not operons but clusters of genes driven individually by a conserved promoter. Loci 1, 2, 9, and 12 are followed by the first 40 bp of this sequence, and genes tal2a, tal3a, tal3c, and tal8 are followed immediately by the first 28 bp of the sequence, providing evidence of a history of duplication and recombination with endpoints within the sequence. IS elements or fragments flank some of the genes, outside these putative promoter sequences. Curiously, the two pseudogenes that constitute locus 3 of X. oryzae pv. oryzae PXO99A are each immediately preceded not by the 989-bp sequence but by a 109-bp sequence closely related to the X. oryzae pv. oryzicola BLS256 sequence carrying a similar 7-bp insertion, suggesting that this degenerate locus shares its origin more directly with the X. oryzae pv. oryzicola BLS256 loci, possibly due to a horizontal transfer event.
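Defining loci by proximity and shared orientation amounts to a single pass over genes sorted by position. The sketch below illustrates one way to do this; the distance threshold is not stated in the text, so the `max_gap` value is a placeholder assumption, the gene records in any usage would be hypothetical, and circularity of the chromosome is ignored.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def group_into_loci(genes: List[Gene], max_gap: int = 5000) -> List[List[Gene]]:
    """Group genes into loci by adjacency and shared orientation.

    A gene joins the current locus when it is on the same strand as the
    previous gene and starts within `max_gap` bp of that gene's end.
    """
    loci: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if loci:
            previous = loci[-1][-1]
            same_strand = gene.strand == previous.strand
            close_enough = gene.start - previous.end <= max_gap
            if same_strand and close_enough:
                loci[-1].append(gene)
                continue
        loci.append([gene])
    return loci
```

As a consistency check on the locus sizes given above, 5 × 1 + 3 × 2 + 3 × 3 + 8 = 28, which equals the 26 genes plus 2 pseudogenes spread over 12 loci.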
The rice pattern recognition receptor XA21 confers resistance to X. oryzae pv. oryzae strains that secrete the protein Ax21 (activator of XA21-mediated immunity) (45, 85). A 17-amino-acid peptide, called axYS22, from the Ax21 N terminus is sufficient to trigger XA21-mediated immunity. Sulfation of tyrosine 22 is critical for this activity (45). Ax21 is encoded by the ax21 gene. The genes raxR and raxH and the broadly conserved phoP and phoQ genes encode two-component regulatory systems that are predicted to control expression of ax21 (10, 46). Sulfation of Ax21 is thought to be accomplished by the products of raxP and raxQ, which function in concert to produce 3′-phosphoadenosine 5′-phosphosulfate (PAPS), and by the product of raxST, which is similar to mammalian tyrosyl-sulfotransferases. RaxST is hypothesized to catalyze the transfer of the sulfuryl group from PAPS to the tyrosine(s) of Ax21 (31, 80). Finally, raxA, raxB, and raxC encode components of a type I system for Ax21 secretion (19). The ax21 gene is conserved in all sequenced Xanthomonas spp., with over 89% identity, but rax gene content varies: phoP, phoQ, raxC, raxH, raxP, raxQ, and raxR are present in all strains, but raxA, raxB, and raxST are present only in the X. oryzae strains and in X. axonopodis pv. vesicatoria 85-10 (see Table S5 in the supplemental material). Furthermore, in X. oryzae pv. oryzae KACC10331, which lacks Ax21 activity, there is a frameshift mutation in raxST that results in a premature stop codon, and the raxA and raxB genes are divergent in their C- and N-terminal coding regions relative to the other X. oryzae pv. oryzae strains. In contrast, in X. oryzae pv. oryzicola BLS256 all the rax genes are present and 95 to 99% identical to their counterparts in X. oryzae pv. oryzae PXO99A. None of these variations, however, correlate with host (species) or tissue specificity.
Two-component signal transduction systems (TCSTSs), consisting of a sensor kinase and a response regulatory protein, are used broadly by prokaryotes to modulate gene expression in response to environmental cues (29, 87). TCSTS-related gene content across all the sequenced Xanthomonas genomes is generally similar (see Table S6 and Table S7 in the supplemental material). X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 possess 108 and 99 TCSTS-related genes, respectively. However, the X. campestris and X. oryzae strains are missing several different sets of TCSTS genes that are present, with a few exceptions, in the X. axonopodis strains. Since the X. axonopodis strains are predicted to be most closely related to the ancestral strain (Fig. 2), this observation suggests that the lineages leading to the X. campestris and X. oryzae strains discarded ancestral TCSTS genes no longer necessary for survival as they adapted to their hosts and other features of their environments. Both the X. campestris and the X. oryzae strains also contain a small number of TCSTS genes specific to their lineages (XCR2315, XCR2343, XCR2344, XCR2730, and orthologs in the X. campestris strains and XOC3605, XOC3969, XOC3968, and orthologs in the X. oryzae strains). These may have been acquired during differentiation from the other Xanthomonas strains and could represent host-specific adaptations.
Specific functions of most Xanthomonas TCSTSs are yet to be discerned. Two TCSTSs that have been functionally characterized in Xanthomonas are conserved across all the sequenced strains: the PhoP and PhoQ system (corresponding to PXO_02836 and PXO_02837) is required in X. oryzae pv. oryzae PXO99A for sensing low extracellular levels of Ca²⁺ and expressing virulence genes, including the T3S regulator hrpG (34), and the ColR and ColH system (corresponding to XC_1049 and XC_1050) in X. campestris pv. campestris 8004 also regulates hrp genes as well as genes for stress tolerance (106). Shared regulatory substrates of PhoPQ and ColRH suggest the possibility that differential integration and cross talk among shared TCSTSs may also play a role in host or tissue specificity.
The rpf gene cluster positively regulates the synthesis of extracellular enzymes, extracellular polysaccharide, biofilm dispersal, and virulence in X. campestris pv. campestris by governing synthesis and perception of the intercellular signal DSF (6, 22, 23, 84, 99, 101). Mutations in this cluster also lead to loss of virulence in X. axonopodis pv. citri, X. oryzae pv. oryzicola, and X. oryzae pv. oryzae (12, 82, 84, 90). RpfG is the response regulator implicated in signal transduction in response to DSF (6, 71, 84). RpfG contains an HD-GYP domain and functions as a cyclic di-GMP phosphodiesterase (71). Cyclic di-GMP is a second messenger that was originally identified to be an allosteric activator of cellulose synthesis but is now known to regulate a range of functions in diverse bacteria, including the virulence of several animal and plant pathogens (37, 70). Two other protein domains, named for the conserved amino acid motifs GGDEF and EAL, have been implicated in cyclic di-GMP synthesis and degradation, respectively. HD-GYP, GGDEF, and EAL domains are widely conserved and abundant in bacteria. They are found largely in combination with other signaling domains, suggesting that their activities in cyclic di-GMP turnover can be modulated by environmental cues. The genomes of X. campestris pv. campestris 8004 and X. campestris pv. campestris ATCC 33913 encode 3 proteins with an HD-GYP domain and 35 proteins with a GGDEF and/or EAL domain. X. campestris pv. campestris B100 lacks three genes encoding GGDEF domain proteins that are found in X. campestris pv. campestris 8004 and X. campestris pv. campestris ATCC 33913. The HD-GYP domain proteins are completely conserved across all the sequenced genomes. However, there are fewer genes encoding GGDEF and/or EAL domain proteins in X. campestris pv. raphani 756C than in the X. campestris pv. campestris strains and fewer still in X. oryzae pv. oryzicola BLS256 and again in the X. oryzae pv. oryzae strains. 
Specifically, of the inventory of genes in X. campestris pv. campestris 8004 and X. campestris pv. campestris ATCC 33913, one gene is absent in X. campestris pv. raphani 756C, five additional genes are absent in X. oryzae pv. oryzicola BLS256, and a further five genes (making 11 in total) are absent in the X. oryzae pv. oryzae strains. The X. oryzae pv. oryzae strains also have an additional gene encoding a GGDEF protein that is not found in the X. campestris pv. campestris strains. Functional genomic analysis has revealed that 13 genes encoding HD-GYP, GGDEF, and/or EAL domain proteins significantly contribute to virulence of X. campestris pv. campestris 8004 on Chinese radish (72). Intriguingly, 10 of these genes are retained in the X. oryzae pv. oryzae genomes (see Table S8 in the supplemental material). There are no clear correlations of GGDEF and EAL domain protein content with tissue or host specificity, but as suggested by differences in IS element content discussed above, it is possible that the reduced number of GGDEF and EAL proteins in the X. oryzae clade reflects greater stability and lower complexity in the environmental niche that the X. oryzae strains occupy.
TonB-dependent receptors (TBDRs) are outer membrane proteins of proteobacteria involved in iron uptake (64). TBDRs are also carbohydrate transporters (7, 58, 78) and may play a key role in sensing and exploitation of plant-derived carbon sources. In X. campestris pv. campestris ATCC 33913, they are involved in utilization of plant cell wall compounds and in virulence (7). Indeed, many of the X. campestris pv. campestris ATCC 33913 TBDR genes are linked to genes for carbohydrate uptake and catabolism, as well as signaling (7). TBDRs classically harbor an N-terminal plug domain linked to a C-terminal β-barrel domain, which is inserted in the bacterial outer membrane (14, 27, 61, 81). At their N termini, TBDRs display a signal peptide for transport across the inner membrane and, at the N-terminal end of the plug domain, a conserved stretch of amino acids called the TonB box, tLDXVXV in X. campestris pv. campestris ATCC 33913 (lowercase indicates a less well conserved residue and X represents any amino acid), that interacts with TonB (61, 81).
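As an illustration of how a consensus such as tLDXVXV can be scanned for in a protein sequence, a minimal Python sketch follows. It is not the procedure used in this study. The function name `find_tonb_box`, the two readings of the less well conserved first position (strictly T, or any residue), and the toy sequences are assumptions made for illustration only.

```python
import re

# Two hypothetical readings of the consensus tLDXVXV:
# STRICT requires T at the less well conserved first position,
# RELAXED allows any residue there. X is treated as any amino acid letter.
STRICT = re.compile(r"TLD[A-Z]V[A-Z]V")
RELAXED = re.compile(r"[A-Z]LD[A-Z]V[A-Z]V")


def find_tonb_box(protein_seq, relaxed=True):
    """Return the 0-based start of the first TonB box-like match, or None."""
    pattern = RELAXED if relaxed else STRICT
    match = pattern.search(protein_seq.upper())
    return match.start() if match else None


if __name__ == "__main__":
    # Toy sequences, not real TBDR proteins.
    print(find_tonb_box("MKAATLDAVQVTG"))
    print(find_tonb_box("MKAASLDAVQVTG", relaxed=False))
```

A real search would probably restrict the scan to the region between the signal peptide and the plug domain, since a seven-residue pattern this short can also match by chance elsewhere in a protein.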
As in X. campestris pv. campestris ATCC 33913, TBDRs are overrepresented in X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 relative to other Gram-negative bacteria, consistent with the hypothesis that they are an ancient and important class of proteins in the genus. The X. campestris pv. raphani 756C genome encodes 70 complete TBDRs and 2 truncated TBDRs, 1 missing the plug domain and the other missing the β-barrel domain (see Table S9 in the supplemental material). X. oryzae pv. oryzicola BLS256 contains 36 complete TBDR genes, 2 TBDR genes that lack a detectable TonB box but are otherwise complete, 5 pseudogenes missing the plug domain, and 14 pseudogenes missing the β-barrel domain (see Table S10 in the supplemental material). Three of the pseudogenes missing the plug domain encode a signal peptide and a TonB box, and two missing the β-barrel domain lack a detectable TonB box in the plug domain, indicating that combinations of internal and terminal deletions (or partial duplications) contributed to the large number of pseudogenes in X. oryzae pv. oryzicola BLS256. The relative reduction in the number of intact TBDRs in X. oryzae pv. oryzicola BLS256 is also reflected in the X. oryzae pv. oryzae genomes (see Fig. S4 in the supplemental material). Similar to the GGDEF and EAL domain proteins and consistent with the large numbers of IS elements, smaller numbers of TBDRs in the X. oryzae strains again may reflect adaptation to a closer and more stable association with the host.
The TBDR genes in all of the sequenced Xanthomonas genomes and two Xylella fastidiosa genomes were clustered on the basis of sequence similarity, and a phylogenetic tree was drawn using the neighbor-joining method (see Fig. S4 in the supplemental material). Of the 19 TBDR genes implicated in virulence in X. campestris pv. campestris ATCC 33913 (7), 10 are conserved across all the Xanthomonas genomes (clade A; see Fig. S4 in the supplemental material), suggesting a general role in virulence. One of these may be particularly important, as it is also conserved in the X. fastidiosa genomes (clade A*; see Fig. S4 in the supplemental material). Seven of the X. campestris pv. campestris ATCC 33913 genes cluster only with TBDRs from the other X. campestris strains and the X. axonopodis strains (clade B; see Fig. S4 in the supplemental material). These might be important specifically in pathogenicity for dicots. One of these is conserved in X. fastidiosa (clade B*; see Fig. S4 in the supplemental material). One of the X. campestris pv. campestris ATCC 33913 TBDRs implicated in virulence clusters with genes in only the X. campestris strains (clade C; see Fig. S4 in the supplemental material) and may be important to association with brassicas. The remaining X. campestris pv. campestris ATCC 33913 TBDR gene resides in a cluster missing representatives only from X. campestris pv. raphani 756C and X. oryzae pv. oryzae KACC10331 (clade A′; see Fig. S4 in the supplemental material). No X. oryzae-specific TBDR gene clusters that would reflect adaptations to monocots or to rice specifically were found, though one clade consists of seven X. oryzae pv. oryzicola BLS256 genes and one X. campestris pv. raphani 756C gene (clade D; see Fig. S4 in the supplemental material), suggesting amplification in X. oryzae pv. oryzicola BLS256 of a TBDR gene common to both strains. No clusters comprise genes only from vascular pathogens or only from nonvascular pathogens.
The X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 genomes encode several outer membrane-associated, fimbrial and nonfimbrial adhesin proteins that are similar to their counterparts in animal pathogenic bacteria and in other plant-pathogenic bacteria (see Table S11 in the supplemental material). One of these nonfimbrial proteins is an ortholog of the previously described XadA (Xanthomonas adhesin-like protein A)/XadA1 protein of X. oryzae pv. oryzae (68). XadA1 and its paralog, XadB, play an important role in the early stages of rice infection by X. oryzae pv. oryzae (17). All sequenced Xanthomonas strains encode XadA1, but only the X. oryzae strains encode XadB. Another paralog, called XadA2, is present in X. axonopodis pv. citri 306 and X. axonopodis pv. vesicatoria 85-10 but not in the other sequenced xanthomonads. In addition to XadA1/XadB, the X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 genomes encode another nonfimbrial adhesin-like protein, YapH (Yersinia autotransporter-like protein H). Orthologs of YapH are also present in the X. campestris pv. campestris and X. oryzae pv. oryzae genomes.
A characteristic feature of many adhesins, such as the XadA/XadB and YapH protein families, is the presence of internal amino acid repeats. Differences in the numbers of these repeats and sequence variations in these repeats are observed for the various adhesins of X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256, compared with each other and with the orthologs from the vascular counterparts of these strains (data not shown). The observed differences in the repertoire of adhesins, particularly the sequence variations among them, might contribute to host and tissue specificity. A similar adhesion-based host specificity model was also recently proposed by Mhedbi-Hajri and coworkers, who found a correlation of adhesin repertoires with Xanthomonas host specificity as well as evidence of rapid evolution among adhesin genes (54).
To identify other genes that may be involved in host specificity or tissue specificity, an ab initio, comprehensive comparison across genomes was carried out. All non-transposon-related protein-coding genes were grouped into clusters of homologs, and for each cluster the most restrictive category based on the strains represented in the cluster was identified by using the two 2-way classifications of strains as monocot (M) versus dicot (D) with respect to the host(s) they infect and vascular (V) versus nonvascular (N) with regard to their mode of infection (Table 1). A cluster of homologs was considered to be specific to a category if all cluster members belong to this category, although a homolog may be missing in one or more strains of the category, and none are found elsewhere (see Fig. S5A in the supplemental material). A cluster of homologs was considered to be essential to a category if all strains in the category are represented in the cluster, although there may be additional homologs in other strains (see Fig. S5B in the supplemental material). The intersection of the specific and essential sets defines clusters for which all strains in the category are represented and there are no additional homologs in other strains (Fig. 5). A total of 2,508 clusters are shared in all 10 bacterial strains analyzed; 333 clusters occur only in D strains, 90 occur only in M strains, only 1 cluster appears to be unique to V strains, and none is unique to N strains. Clusters specific and essential to the ND, VD, NM, and VM categories were also found, though they likely suffer bias due to the small numbers and close relatedness of strains within these categories. Because the gene clusters represent putative orthologs, their numbers include cases in which different copy numbers of a gene in different strains result in multiple clusters, some of which are specific to the strains with higher copy numbers of the gene. 
For example, the 333 D-specific clusters may include a cluster of genes homologous to a category-nonspecific cluster because the gene is present in two (sufficiently divergent) copies in the D group strains and present in one copy in all the rest. Also, since the numbers are based on annotated protein-coding genes, there might be errors due to missed genes in some genomes. To refine the sets of D-, M-, and V-specific and essential gene clusters and to minimize possible errors due to unannotated genes, for each cluster a representative protein sequence was compared to every cluster in the complementary category using BLASTp, and the coding sequence of that protein was compared to the genome sequence of every strain in the complementary category by using tBLASTn, with an E-value threshold of 1e−10 in each case. Only clusters for which the representative protein showed no sequence matches in the complementary category were deemed truly category specific and essential. This refinement resulted in 172 D-specific gene clusters, 54 M clusters, and the 1 V cluster (representative sequences of the 2,508 shared clusters, the refined D-, M-, and V-specific and essential clusters, and the ND-, VD-, NM-, and VM-specific and essential clusters, as well as the complete lists of genes in the D-, M-, and V-specific and essential cluster sets, are given in Data Files S1 through S9 in the supplemental material).
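The definitions of specific, essential, and specific-and-essential clusters, together with the subsequent BLAST-based refinement, can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the pipeline used here: the function names, the strain labels, and the representation of BLAST results as a list of best E-values per complementary strain (`None` meaning no hit at all) are all hypothetical, and whether a hit exactly at the threshold counts as a match is likewise an assumption.

```python
from typing import Iterable, Optional, Set


def is_specific(members: Set[str], category: Set[str]) -> bool:
    """All cluster members belong to the category; some category strains
    may be missing, but no strain outside the category is present."""
    return bool(members) and members <= category


def is_essential(members: Set[str], category: Set[str]) -> bool:
    """Every strain in the category is represented; homologs in strains
    outside the category are allowed."""
    return category <= members


def is_specific_and_essential(members: Set[str], category: Set[str]) -> bool:
    """Intersection of the two sets: exactly the strains of the category."""
    return is_specific(members, category) and is_essential(members, category)


def passes_refinement(best_evalues: Iterable[Optional[float]],
                      threshold: float = 1e-10) -> bool:
    """Keep a cluster only if its representative has no match in any strain
    of the complementary category. A match is assumed to be a hit with an
    E-value at or below the threshold."""
    return all(e is None or e > threshold for e in best_evalues)


if __name__ == "__main__":
    dicot = {"d1", "d2", "d3"}        # hypothetical D strains
    cluster = {"d1", "d2"}            # strains represented in one cluster
    print(is_specific(cluster, dicot), is_essential(cluster, dicot))
    print(passes_refinement([None, 1e-3]), passes_refinement([1e-50, None]))
```

Representing each cluster by the set of strains it covers makes the three definitions simple subset tests, and it keeps the sequence-level refinement separate from the presence/absence classification, which mirrors the two-stage description in the text.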
The refined sets of D-, M-, and V-specific and essential clusters were grouped according to functional category. The single V-specific and essential cluster is annotated as a hypothetical protein gene, unfortunately providing no basis for a functional hypothesis. Among the D- and M-specific and essential clusters (see Fig. S6 in the supplemental material), the most highly represented, accounting for a third of the dicot pathogen-specific and essential genes and two-thirds of the monocot pathogen-specific and essential genes, are hypothetical protein genes or genes with no functional category assigned and putative secreted protein genes. Genes for intracellular trafficking and secretion are also highly represented. Consistent with the distributions of TBDR genes, genes for carbohydrate transport and metabolism are the next largest group for the dicot pathogen clusters, but there is only one member of this functional category unique to the monocot pathogen clusters. Less well represented categories form a diverse assortment. These findings suggest a number of possibilities. Type III effectors or other secreted proteins critical to host or tissue specificity might be among the hypothetical proteins or putative secreted proteins. Differences in carbohydrate sensing, uptake, or catabolism may play key roles. Genes among some of the poorly represented functional categories, including secondary metabolite synthesis, transport, or catabolism, lipid transport and catabolism, and defense mechanisms, might also contribute. Overall, the large number of genes that distinguish dicot from monocot pathogens suggests the possibility that complex sets of adaptations determine host specificity. In contrast, minimal distinguishing differences in gene content between the vascular and nonvascular pathogens indicate that subtler differences, at the level of amino acid or noncoding nucleotide polymorphisms, may be the determinants of tissue specificity.
Lu and colleagues (52) found no differences correlating to host or tissue specificity in the gene content of several pathogenesis-associated gene clusters across most of the genomes examined here. The authors concluded that determinative differences likely reside among the secretory and regulatory substrates of those clusters, among polymorphisms within genes or regulatory sequences in the clusters, or in unrelated genes elsewhere in the genome. The complementary, genome-wide comparative analyses presented here, enabled by the completion of the X. campestris pv. raphani 756C and X. oryzae pv. oryzicola BLS256 genomes, further support this conclusion and indeed point to type III effector genes, cyclic di-GMP signaling protein genes, two-component signaling system genes, TBDRs, and several other genes potentially related to interactions of Xanthomonas spp. with plants as candidate determinants of host and tissue specificity. Comparisons across a broader set of genome sequences may help narrow this list. Recently acquired draft sequences of several Xanthomonas genomes will be useful in this regard (40a, 65, 88, 94). At the same time, future improvements of computational methods for large-scale comparisons may reveal polymorphisms within genes and noncoding sequences that will contribute to refined models for host and tissue specificity. Ultimately, in-depth, large-scale functional characterization of candidates will be necessary to pinpoint those that play key roles.
We thank Anne Alvarez for assistance in selecting X. campestris pv. raphani 756C as a representative strain, Sophien Kamoun for providing that strain, Fadi Towfic and the BCB Lab of Iowa State University for assistance with TAL effector phylogenetic analysis, and Jonas Gaiarsa for assistance with IS element analysis.
This work was supported by the USDA-NSF Microbial Genome Sequencing Program (award CSREES 20043560015022).
†Supplemental material for this article may be found at http://jb.asm.org/.
Published ahead of print on 22 July 2011.