General features of the 1448A genome.
The 1448A genome consists of one circular chromosome (5,928,785 bp) and two plasmids, p1448A-A (131,950 bp) and p1448A-B (51,711 bp) (Table and Fig. ). In total, 5,353 ORFs were identified in the 1448A genome. A putative function was assigned to 3,626 (68%) ORFs, with the remaining ORFs (1,727 total) designated as hypothetical proteins (224; 4%), conserved hypothetical proteins (822; 15%), or proteins of unknown function (681; 13%). The classification of 1448A ORFs into biological-role categories is summarized in Fig. S1 in the supplemental material. Transport and binding proteins constitute the largest role category found in 1448A (728 ORFs; 14%), followed by genes involved in cellular processes (708 ORFs; 13%). Approximately 10% of the total ORFs have a regulatory role (535 ORFs) and presumably facilitate adaptation to the different environments faced by 1448A as a pathogen, an epiphyte, and a seed-borne bacterium during different phases of its life cycle. Approximately 5% (256) of the total number of ORFs fall within the mobile genetic element category, with 201 ORFs involved in transposition constituting the majority of this role category. The 1,348 paralogous families identified in 1448A, encompassing 3,177 ORFs (59%), indicate the extent of gene duplication within the genome. Apart from ORFs encoding proteins, the 1448A genome also contains 64 tRNAs, five rRNA operons, and one structural RNA.
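The category percentages quoted above can be rechecked from the raw ORF counts. The following is a minimal sketch, assuming each percentage is the count divided by the 5,353 total ORFs and rounded to the nearest whole number; the dictionary keys and the function name are illustrative labels, not identifiers from the annotation.

```python
# ORF counts quoted in the paragraph above; the key names are illustrative.
TOTAL_ORFS = 5353

counts = {
    "putative function assigned": 3626,
    "hypothetical proteins": 224,
    "conserved hypothetical proteins": 822,
    "proteins of unknown function": 681,
    "transport and binding proteins": 728,
    "cellular processes": 708,
    "regulatory functions": 535,
    "mobile genetic elements": 256,
    "ORFs in paralogous families": 3177,
}


def percent_of_total(count, total=TOTAL_ORFS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


percentages = {name: percent_of_total(n) for name, n in counts.items()}

# The three classes without an assigned function should account for every
# ORF that did not receive a putative function.
unassigned = (
    counts["hypothetical proteins"]
    + counts["conserved hypothetical proteins"]
    + counts["proteins of unknown function"]
)

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ORFs ({pct}%)")
print(f"ORFs without an assigned function: {unassigned}")
```

Under that rounding assumption, the computed values agree with the figures given in the text, and the assigned and unassigned classes together sum to the total ORF count.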
General features of the P. syringae pv. phaseolicola 1448A genome and comparison with the P. syringae pv. tomato DC3000 genome
FIG. 1. Features of the P. syringae pv. phaseolicola 1448A genome. (A) The chromosome. The ORFs on the positive and negative strands are depicted on the outermost and second circle of the figure, respectively. The ORFs are color coded based on the major grouping ...
Comparative genomics with P. syringae pv. tomato DC3000.
Access to complete genomes of pathovars from two of the three P. syringae phylogenetic clusters (53) provides the opportunity to expand comparison to the whole-genome level. With respect to size and other genomic features, 1448A is generally similar to the tomato pathogen DC3000 (Table and Fig. ; see Table S1 in the supplemental material), with 83% of the ORFs shared between DC3000 and 1448A (BLASTP cutoff, E < 10^-15). Furthermore, classification of the ORFs from DC3000 and 1448A into functional-role categories reveals that the genomes of these two pathovars are highly conserved with respect to the ability to encode proteins required for basic physiological and metabolic functions (see Fig. S1 in the supplemental material). Indeed, an analysis of 154 biological processes included in the Genome Properties System (29), a computational method to identify the presence/absence of metabolic pathways and systems in prokaryotic genomes, detected only two differences between the strains. One difference was the presence of an intein in DC3000 (PSPTO3229) where none is found in 1448A. The second difference was an authenticated point mutation in 1448A in the gene for N-formylglutamate amidohydrolase (PSPPH4868), resulting in a truncation and raising the possibility that 1448A may be unable to degrade histidine to glutamate by this pathway (see Table S2 in the supplemental material). In the sections below, we report on features of the 1448A genome important to its pathogenic lifestyle and, when relevant, compare these to the related but distinct pathovar DC3000.
Virulence-implicated ORFs in 1448A and DC3000.
We previously identified 298 ORFs implicated in virulence in the DC3000 genome (10). Within the virulence category, we include effectors (both Avr and Hop) and other proteins of potential benefit to pathogenesis, including those involved in epiphytic fitness. Searches of the 1448A genome using the DC3000 virulence ORFs revealed that 240 (81%) of the DC3000 virulence ORFs are present in 1448A, including genes for many Hop effectors; secretion pathways I, II, and III; cell wall-degrading enzymes; adhesins; and nonribosomal peptide synthases (NRPSs) (see Table S3 in the supplemental material). Examples of virulence factors present in DC3000 but absent from 1448A include genes involved in the synthesis and regulation of the phytotoxin coronatine. On the other hand, a vestigial virulence-related gene, inaZ, which encodes the ice nucleation protein that is responsible for frost damage in plants (30), is present in 1448A but not in DC3000. However, the inaZ gene in 1448A (PSPPH1596 and PSPPH1598) is unlikely to encode a functional gene product, given that the reading frame is interrupted by a transposase (PSPPH1597; ISPsy18 transposase; Mutator family).
Although many virulence factors, such as the hrp genes encoding the TTSS, are highly conserved in both sequence and location between the two pathovars, the sizes and locations of others vary substantially. For example, the effector candidate HopAS1 (PSPPH4736; 1,361 amino acids) is truncated in DC3000 due to a point mutation but is present in a longer version in 1448A. HopX1 (PSPPH1296; formerly AvrPphE) is present and functional in DC3000 but appears to be nonfunctional as an avirulence determinant in 1448A, as previously reported for race 6 strains of P. syringae pv. phaseolicola (62). The effector gene inventories of DC3000 and 1448A are large and characteristically contain a mix of functional and nonfunctional genes that differ among strains and pathovars (10; M. Vencato et al., unpublished data). As experimentally documented and discussed elsewhere, 1448A harbors at least 22 genes (see Table S3 in the supplemental material) encoding proteins that can be delivered by the TTSS into plant cells (14; Vencato et al., unpublished data).
Comparison of virulence genes that are plasmid borne in 1448A with homologs in DC3000.
With regard to differences in location, genes encoding putative homologs of most of the virulence factors that are located on plasmids p1448A-A and p1448A-B are chromosomally located in DC3000. For example, genes encoding 1448A TTSS effectors HopD1 and HopQ1 are located on plasmid p1448A-A (PSPPHA0010 and -A0012), whereas their orthologs in DC3000 (PSPTO0876 and -0877) are clustered with other virulence factor genes on a chromosomal genomic island (34). Similarly, of the four shared virulence factors that are plasmid borne in DC3000, all but one, a copy of levansucrase (PSPPHA0027 on p1448A-A), are found on the 1448A chromosome.
In general, the plasmids present in the two pathovars exhibit significant differences in both size and content. Plasmids p1448A-A and p1448A-B belong to the pPT23A family found in DC3000 and other P. syringae. The small plasmid (p1448A-B) carries the rulAB genes (PSPPHB0003 and -B0004) for UV resistance but lacks type III effector genes. By contrast, in DC3000, the rulAB genes are located on the plasmid that is enriched in virulence factors (pDC3000A) and rulB has a disrupted reading frame. Plasmid p1448A-B does carry an ORF with a potential role in virulence, PSPPHB0059, a homolog of an ExeA-like protein (PMA4326B11) present on plasmid pPMA4326B in P. syringae pv. maculicola (61). ExeA, along with ExeB, is required for type II secretion of the toxin aerolysin in the pathogen Aeromonas hydrophila. No homologs of ExeB were found in 1448A, but a fragment of an ExeA-like ORF (PSPPHA0124) is present in the pathogenicity island (PAI) on p1448A-A.
Other genes of potential significance to pathogenicity found on plasmid p1448A-A but not in DC3000 include homologs of the atypical fimbrial genes found in Salmonella (PSPPHA0063 to -65) and sinR (PSPPHA0062). A fourth gene, safA, frequently but not exclusively associated with this locus in Salmonella, is absent from 1448A. The safA genes encode fimbrial structural proteins, whereas the safBC genes encode usher and chaperone proteins and sinR encodes a LysR transcriptional regulator. All were identified on a 47-kb genomic island in Salmonella enterica serovar Typhimurium and were adjacent to a transposase gene, upstream of safA. The synteny in p1448A-A is inverted compared to Salmonella, with sinR located upstream of safA rather than downstream of safD. Transposases are located on either side of the gene cluster (PSPPHA0061 and PSPPHA0068). It has been recognized that the saf genes are horizontally transferred (20), but a role in virulence has yet to be determined. The horizontal transfer of the saf genes has been linked to the evolution of Salmonella serovars and may be associated with host adaptation. Homologs of saf genes have not been reported previously in a plant-pathogenic pseudomonad.
Comparison of virulence genes in the plasmid-borne PAI between P. syringae pv. phaseolicola strains 1448A (race 6) and 1449B (race 7).
Knowledge of the 1448A genome sequence also provides an opportunity to compare gene content and organization with a race 7 strain of P. syringae pv. phaseolicola, 1449B. This strain contains a well-characterized PAI on its 154-kb plasmid (32) that includes the known effector genes hopAB1, and ORF4, whose product has cysteine protease motifs (31). Also located on the plasmid, but physically separated from the PAI, is hopD1. The main difference between the 1449B PAI and the homologous region in 1448A is the absence from 1448A of a 9,471-nt sequence that carries hopF1. The deletion was flanked on its left and right borders by a chimeric ORF with homology to IS1492b and IS1090 and a region containing several transposon gene fragments, including IS801, respectively (49). Loss of hopF1 helps to explain the virulence of 1448A to the bean cultivar Red Mexican, which carries the matching R1. In 1448A, an insertion of 10 nt in the coding sequence of the homolog of ORF3 in 1449B (32) results in a deduced 218-amino-acid protein (PSPPHA0122; type III effector HopAW1) that includes the catalytic triad CHD motif indicative of the cysteine protease activity found in several effectors. The H and D components of the triad are absent from the peptide encoded by the truncated ORF3 in 1449B (accession number AAD47205).
Attachment factors potentially contributing to virulence.
Bacterial attachment and colonization represent essential stages in the pathogenic processes of many bacteria and depend upon the presence of multiple factors. Attachment factors previously identified in DC3000 include type IV pili, alginate, nonalginate capsular polysaccharide, exopolysaccharides, and filamentous hemagglutinin, many of which are also found in 1448A. For example, all genes required for alginate biosynthesis are present in both pathovars, as well as in Pseudomonas aeruginosa, although regulation of its production in P. syringae probably differs from that in P. aeruginosa. Similarly, a full complement of genes required for synthesis of type IV pili appears to be present, and pilus biosynthesis has been experimentally confirmed in both DC3000 and another strain of P. syringae pv. phaseolicola (50). Although hemagglutinins have not been shown to play a significant role in pseudomonad virulence, three genes encoding filamentous hemagglutinin-like proteins are present in 1448A, as are three genes implicated in biosynthesis of capsular polysaccharides and two involved in exopolysaccharide biosynthesis.
Secretion systems associated with virulence.
An unusual feature of the 1448A chromosome is the presence of a cluster of 12 genes (PSPPH2519 to -2538) encoding TTSS components whose closest homologs are variously in Rhizobium, and Aeromonas spp. or in P. aeruginosa. Mutations in numerous hrp genes encoding the Hrp TTSS abolish the ability of P. syringae pv. phaseolicola to cause disease in beans or to elicit the hypersensitive response in nonhost or resistant plants (39), which suggests that this second, putative TTSS does not deliver effectors into plant cells. No HrpL-responsive promoter sequences are associated with the second set of TTSS genes, and the expression and functions of these genes remain to be tested.
In addition to the TTSS, three other major secretion pathways (I, II, and IV) are found in many pathogenic bacteria, including other pseudomonads (18). Like DC3000, 1448A encodes multiple type I secretion pathway components, although protein substrates implicated in virulence have yet to be identified. The 1448A genome also contains genes for two distinct type II secretion pathways, an occurrence previously observed in P. aeruginosa. Candidate substrates for the type II pathways include two cellulases, two pectate lyases, a pectin lyase, and a polygalacturonase. By contrast, DC3000 contains genes only for a single type II pathway, and the cell wall-degrading enzymes representing the most likely secretion substrates are limited to two cellulases and a pectin lyase.
Type IV secretion systems are employed by various bacteria to secrete diverse substrates and are closely related to systems involved in conjugal transfer of DNA (11). In contrast to DC3000, but similar to P. syringae pv. maculicola (61), 1448A has a large number of genes with high similarity to the type IV secretion genes of Agrobacterium tumefaciens (virB operon and virD4). These genes are found on both plasmids in 1448A, with the large plasmid containing a subset of the virB operon and the small plasmid containing the entire operon, along with the homolog of virD4. However, the fact that these genes are on plasmids, the presence of a virB5 homolog found only in systems transferring DNA (11), and the presence of a small protein between virB5 found only in conjugal systems (11) strongly suggest that the type IV homologs in 1448A are involved in the process of conjugal transfer of DNA rather than virulence-related protein translocation.
Virulence-implicated metabolic, biosynthetic, and regulatory capabilities.
The known sources of carbon potentially available to P. syringae while growing in the host apoplast (intercellular space) are sucrose and gamma-aminobutyric acid (GABA) (25), with the latter also serving as a nitrogen source. The genes required to transport and metabolize these substrates have been reported in DC3000 (10) and are also present in 1448A. As seen in DC3000, the genes encoding the sucrose porin precursor and sucrose-6-phosphate hydrolase (PSPPH5187 and -5192) are clustered, along with a putative regulator (PSPPH5193) and other sugar transporters, on the 1448A chromosome. As in DC3000, 1448A has one GABA permease (PSPPH4937) and multiple copies of GABA transaminase (PSPPH0095, -3457 [authentic frameshift], and -5040) and succinate-semialdehyde dehydrogenase (PSPPH0096, -2572, and -5038 [authentic frameshift]).
The ethylene-forming enzyme (EFE) 1-aminocyclopropane-1-carboxylate (ACC) oxidase (EC 1.14.17.4) has been reported in some strains of P. syringae pv. phaseolicola, such as PK2 (22), and other P. syringae. EFE, which catalyzes the conversion of the precursor 1-aminocyclopropane-1-carboxylate to ethylene in the final step in the synthesis of the phytohormone, is not present in 1448A or DC3000. However, the enzyme ACC deaminase (EC 3.5.99.7), which also uses ACC as a substrate, is present in 1448A (PSPPH1761) and DC3000 (PSPTO3675). Transgenic tomato plants expressing ACC deaminase from Pseudomonas sp. strain 6G5 have demonstrated delayed ripening of tomato fruit due to reduced ethylene synthesis (36).
The genes encoding the enzymes IaaM and IaaH, required for the production of the plant hormone indole-3-acetic acid (IAA; auxin), are present in both pathovars; however, the enzyme indoleacetate-lysine ligase (IaaL), which converts IAA to IAA-lysine, is present in DC3000 (10) but not in 1448A. In DC3000, iaaL is part of the Hrp regulon (21), is located adjacent to the transposable element ISPsy7, and appears to have been inserted into a portion of the genome that otherwise shares a high degree of gene synteny with 1448A.
The wss operon in P. fluorescens SBW25, a plant growth-promoting pseudomonad, encodes the genes required for the synthesis of an acetylated cellulose-like polymer that is associated with fitness of the microbe in the rhizosphere and phyllosphere (23). The homologs of the genes comprising the wss operon, except wssJ, are present in DC3000 (PSPTO1026 to -1034) but not in 1448A.
Global regulatory proteins associated with P. syringae virulence or epiphytic fitness include the alternative sigma factors HrpL and RpoN, the two-component regulators GacS and GacA, quorum-sensing components AhlI and AhlR, and the TetR family activator of the quorum-sensing system AefR (47). Genes encoding all of these proteins are found in 1448A (see Table S1 in the supplemental material). HrpL is unique to P. syringae and other phytopathogens with group I Hrp systems, such as Erwinia amylovora, and genes activated by HrpL are addressed elsewhere (14; Vencato et al., unpublished data).
Nonribosomal peptide and polyketide synthases.
Several P. syringae pv. phaseolicola strains are known to produce the phytotoxin phaseolotoxin, which acts by inhibiting the activity of ornithine carbamoylphosphate transferase (OCTase), an enzyme in the pathway for synthesis of arginine (19). Toxigenic strains produce a phaseolotoxin-insensitive OCTase encoded by argK, which is part of the argK gene cluster (54) that is involved in the biosynthesis of phaseolotoxin. Gonzalez et al. (24) demonstrated via PCR that 1448A contains the argK cluster. However, although they reported that the strain does not produce phaseolotoxin, recent assays indicate that 1448A does produce phaseolotoxin (R. Jackson, unpublished data). As expected, 1448A contains two OCTases, one of which is encoded by argK (PSPPH4319) clustered with other loci associated with toxin production. The presence of a transposase immediately downstream of argK supports the acquisition of the cluster by horizontal transfer (54). DC3000, which does not synthesize the toxin, contains only a single OCTase (PSPTO4164) that is an ortholog of the phaseolotoxin-sensitive enzyme (PSPPH3895) in 1448A.
The 1448A genome has eight regions that contain genes with predicted NRPS and polyketide synthase domains (see Table S4 in the supplemental material). Three of these, including the argK-tox region, appear to be unique to 1448A compared to other sequenced genomes. Syntenic homologs of the remaining five regions are found in the genomes of either DC3000, P. syringae pv. syringae B728a, or both. The homologs include the gene cluster for the biosynthesis of pyoverdin, which is found in all three strains, and for yersiniabactin, which is found in 1448A and DC3000. A cluster that is common to 1448A and B728a (>90% similarity; PSPPH1749 to -1752) contains two NRPS genes, one hybrid NRPS-polyketide synthase, and a possible dioxygenase. Of particular interest is a region shared between 1448A and B728a (>90% similarity and perfect gene synteny; PSPPH2703-2720), which contains a series of genes, each of which is 40 to 80% similar to the genes of the phaseolotoxin-producing argK-tox region, which encodes homologs of the dCTP deaminases, fatty acid desaturases, ornithine aminotransferase, a phosphatase-like protein, a PEP synthase-like protein, and eight proteins of unknown function. Several of the argK-tox genes are missing from this cluster, including argK itself and the gene for Arg-Lys amidinotransferase, and the gene order appears to have undergone at least two rearrangement events. This “pseudophaseolotoxin” region itself sits within a larger region (PSPPH2736 to -2860) that appears to be a lateral-transfer “hotspot,” showing evidence of at least seven different transfer events with distinguishable origins, and is quite close to the yersiniabactin gene cluster (PSPPH2892 to -2904).
Mobile genetic elements.
The percentage of ORFs in 1448A (5%) included in the category of mobile genetic elements is less than that seen in DC3000 (7%). However, as in DC3000, IS elements comprise the majority of ORFs in this role category. Most of the transposases found in 1448A are different from those identified in DC3000, which indicates that they were acquired after the divergence of the lineages leading to the present pathovars. ISPsy18 transposases, which are not present in DC3000, belong to the Mutator family and comprise the most abundant family of IS elements (48 intact copies; 60 total) in the 1448A genome. These transposases belong to the IS256 family and are most similar to ISRSO7 transposase found in Ralstonia solanacearum. Many of the ISPsy18 transposases are either inserted into ORFs or linked to genes with disrupted reading frames (see Table S5 in the supplemental material).
The transposase for insertion sequence element IS801 is present in a number of pseudomonads and has been characterized in P. syringae pv. phaseolicola strain LR781 (51). IS801, a member of the IS91 family, is not present in DC3000, although the IS91 family is represented by the transposases ISPsy3 and ISPsy4. A total of 19 intact and disrupted versions of the IS801 transposase are present in 1448A. The full-length transposase is 410 amino acids, similar to IS801, first isolated from plasmid pMMC7105 of P. syringae pv. phaseolicola strain LR781 (51). Of the four copies of the full-length IS801, one is present on p1448A-A (PSPPHA0083). One of the three chromosomal copies is located close to the origin of replication, upstream of dnaA, the gene encoding the chromosomal replication protein. The 15 copies of IS801 with disrupted reading frames include 10 identical copies of a shorter version of IS801 (IS801s; stop codon after the first 156 amino acids) which are distributed on the chromosome and p1448A-A. Interestingly, if the stop codon in IS801s (TAA in all cases) is replaced by TTC (the codon for phenylalanine), the reading frame then encodes the full-length IS801. Since IS801s retains the left and right ends of IS801 and is present in multiple copies on plasmid p1448A-A and the chromosome, it is likely that the truncated copies of IS801 are capable of being transposed by the intact versions of IS801. The small plasmid, p1448A-B, does not contain IS801.
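The effect of swapping a premature TAA stop codon for TTC can be illustrated with a toy reading frame. This is a minimal sketch, assuming the standard genetic code; the 24-nt sequence and the function names are hypothetical and are not taken from the IS801 sequence, in which the stop codon follows codon 156.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate dna in frame 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def replace_codon(dna, codon_index, new_codon):
    """Return dna with the codon at codon_index (0-based) swapped for new_codon."""
    start = 3 * codon_index
    return dna[:start] + new_codon + dna[start + 3:]


# Hypothetical ORF with an internal TAA stop at codon index 3.
toy_orf = "ATGGCTAAATAAGGTCTGGAATGA"

truncated = translate(toy_orf)
restored = translate(replace_codon(toy_orf, 3, "TTC"))

print(truncated)  # MAK
print(restored)   # MAKFGLE
```

In the toy example, the single codon change extends the translated product through to the next stop codon, which mirrors the relationship described between the short and full-length versions of the transposase.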
Some regions of the genomes in the two pathovars appear to be preferred sites for the insertion of mobile genetic elements (see Table S6 in the supplemental material). For example, unrelated transposon proteins are located adjacent to PSPPH5203 (in 1448A) and PSPTO5595 (encoding glucosamine-fructose-6-phosphate aminotransferase in DC3000), which define one end of a syntenic block near the origins of replication in the two genomes (see below). Similarly, a GTP-binding protein, YchF (PSPTO1101 and PSPPH0988), is adjacent to a site-specific recombinase in DC3000 and a transposase in 1448A. Through close analysis of the plasmids, we have identified a genetic signature sequence of 133 nt that is linked to several effector genes in 1448A and other P. syringae pathovars. The 133-nt sequence was first identified in the 1448A PAI, close to avrPphC and next to the right junction of the avrPphF deletion (see Fig. S2 in the supplemental material). BLAST analysis for short, nearly exact matches showed that this sequence is mosaic: part of it forms the right flank of IS801, while the remainder is found within or next to recombinase integrases. Copies of the sequence were found next to hopH1, hopC1, hopAM1-1 (P. syringae pv. tomato DC3000), avrPpiA1 (P. syringae pv. pisi), and avrRpm1 (P. syringae pv. maculicola), and in each case, the sequence was either part of or next to a phage integrase/site-specific recombinase gene. The sequence may therefore form a common target for integration events, allowing acquisition and deletion of effectors.
The large plasmid (p1448A-A) contains a 4.1-kb region that has been duplicated (coordinates 2,726 to 6,858 and 59,496 to 63,628; PSPPHA0004 to -A0009 and PSPPHA0070 to -A0075; 100% identity at the nucleotide level). Genes present in this region include two polygalacturonases (PSPPHA0006 and -A0072) and truncated versions of the type III effector HopW1 (PSPPHA0009 and -A0075). Full-length copies of HopW1 genes were not identified in 1448A or DC3000. Both duplicated regions are associated with mobile genetic elements, suggesting that they may have been acquired by independent horizontal-transfer events. The duplicated region close to the origin of replication is bordered by an IS801 transposase (PSPPHA0003) on one side and type III effectors (PSPPHA0010 and -A0012; HopD1 and HopQ1 and -2, respectively) and transposases (PSPPHA0013 to -A0015) on the other. The second duplicated region is flanked by a truncated resolvase (PSPPHA0076) on one side and a transposase and a stability/partitioning determinant (PSPPHA0068 and -A0069) on the other.
Identification of the core Pseudomonas genome and ORFs unique to P. syringae.
To identify the core Pseudomonas genome and characterize the P. syringae-specific portion of 1448A, the peptide complement of 1448A was compared to the predicted proteomes of the four complete pseudomonad genomes (10). The results of this comparative analysis are presented in Table S7 in the supplemental material. Using a BLASTP E value cutoff of 10^-5, a total of 3,567 ORFs (67%) were found to be present in all four reference species, 365 (7%) were specific to the two P. syringae pathovars and were found in both, and 392 ORFs (7%) were specific to 1448A (Fig. ).
Comparison of the P. syringae pv. phaseolicola 1448A genome with four other finished Pseudomonas genomes (P.a., P. aeruginosa PAO1; P.p., P. putida KT2440; P.f., P. fluorescens Pf-5; DC3000, P. syringae pv. tomato DC3000).
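The grouping of 1448A ORFs into core, P. syringae-specific, and 1448A-specific classes can be sketched as a set comparison. This is a minimal illustration, not the authors' pipeline: it assumes that, for each ORF, the best BLASTP E value against each reference genome is already available, and that a hit counts when its E value is at or below the cutoff. The ORF names and E values are hypothetical.

```python
from collections import Counter

# Reference genomes named in the comparison; the short labels are illustrative.
REFERENCE_GENOMES = frozenset({"PAO1", "KT2440", "Pf-5", "DC3000"})
E_VALUE_CUTOFF = 1e-5  # assumption: a hit counts when its E value is <= cutoff


def genomes_with_hit(best_e_values, cutoff=E_VALUE_CUTOFF):
    """Return the reference genomes in which an ORF has a hit passing the cutoff."""
    return {genome for genome, e in best_e_values.items() if e <= cutoff}


def classify_orf(best_e_values):
    """Assign a 1448A ORF to one of the classes discussed in the text."""
    present = genomes_with_hit(best_e_values)
    if present == REFERENCE_GENOMES:
        return "core"
    if present == {"DC3000"}:
        return "P. syringae specific"
    if not present:
        return "1448A specific"
    return "other"


# Hypothetical best BLASTP E values per reference genome for four made-up ORFs.
toy_hits = {
    "orf_a": {"PAO1": 1e-80, "KT2440": 1e-75, "Pf-5": 1e-90, "DC3000": 1e-120},
    "orf_b": {"DC3000": 1e-60, "PAO1": 0.5},
    "orf_c": {"KT2440": 2.0},
    "orf_d": {"Pf-5": 1e-30, "DC3000": 1e-40},
}

classes = {orf: classify_orf(hits) for orf, hits in toy_hits.items()}
tally = Counter(classes.values())
print(classes)
```

ORFs with hits in some but not all reference genomes fall into the "other" class here, which corresponds to the ORFs not counted in the three categories reported above.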
The ORFs that were specific and common to the two P. syringae pathovars included those encoding several predicted TTSS effectors, pectic enzymes, insecticidal toxins, regulatory proteins, and lipoproteins. Several of these genes encode virulence factors, discussed above, that are likely to be of particular importance to plant pathogens. The shared TTSS effectors represent a subset that appears to be broadly conserved among P. syringae pathovars (Vencato et al., unpublished data). The pectic enzymes have the potential to play multiple roles in modifying plant cell walls during pathogenesis, although extensive wall disruption has not been reported. The function of insecticidal toxins, previously identified in DC3000 (10), is unknown, but insects may benefit P. syringae as vectors or compete with the bacteria as herbivores (30). The regulatory proteins that are either specific to P. syringae or unique to 1448A will be particularly interesting to investigate because of their potential roles in regulating virulence-related functions and host specificity. The P. syringae-specific lipoproteins are also interesting, given the prevalence of studies predicting bacterial envelope functions among genes reported to be conserved in plant-associated bacteria but lacking in reference genomes (67).
Because a BLASTP E value cutoff of 10^-5 was used in this analysis, the ORFs that are unique to one or both of the strains are unlikely to have domains in common with ORFs in the other pseudomonads, which makes these unique ORFs of particular interest in exploring pathogenesis. However, two caveats must be noted. First, 1448A ORFs that are absent from DC3000 but present in one or more of the other pseudomonads may be potentially involved in fitness functions not directly related to virulence. Thus, some of the ORFs that are unique to 1448A or DC3000 could also be involved in various fitness functions. Second, some of the ORFs that are likely to be important in P. syringae pathogenicity may be members of ubiquitous gene families, such as ABC transporters with predicted specificities for plant-derived sugars and nonribosomal peptide synthases, and therefore not recognized as being unique to one or both strains in this analysis.
Comparative analysis of 1448A and DC3000 based on orthology and synteny.
Relatedness between the two pathovars was further analyzed using reciprocal best hits and a more stringent BLASTP cutoff to identify putative orthologs. At the whole-genome level, reciprocal BLASTP analysis (E value < 10^-15) of the predicted functional ORFs in the 1448A and DC3000 genomes revealed that 4,133 ORFs (81% of predicted 1448A ORFs and 74% of predicted functional DC3000 ORFs) were orthologous. An additional 290 ORFs in 1448A have putative paralogs in DC3000, whereas 693 ORFs are unique to 1448A. Conversely, 1,018 DC3000 ORFs are absent from 1448A. The genes unique to 1448A predominantly encode proteins of unknown function (381; 55% of unique ORFs), including 198 hypothetical proteins and 122 conserved hypothetical proteins, as well as transposable elements (93; 13%). Complete lists of ORFs conserved between the two pathovars, unique to 1448A, and unique to DC3000 can be found in Tables S1 and S8 in the supplemental material. BLASTP analyses provide a one-dimensional perspective on gene content similarity. However, conservation of gene order and location are also important indications of genome conservation. Using the Sybil syntenic-block algorithm, 3,941 of 5,144 chromosomal ORFs (77%) were found to be syntenic in location between these two pathovars (Fig. ). Thus, although the 1448A and DC3000 genomes represent divergent clades of the P. syringae group, their genomes are highly conserved.
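The reciprocal-best-hit criterion used to call putative orthologs can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: hits are supplied as (query, subject, E value, bit score) tuples, the best hit is the one with the highest bit score among hits with E < 10^-15, and the first hit seen wins a tie. All ORF names and scores below are hypothetical.

```python
def best_hits(hits, max_evalue=1e-15):
    """Map each query to its highest-scoring subject among hits with E < max_evalue.

    hits is an iterable of (query, subject, evalue, bitscore) tuples.
    On a bit-score tie, the first hit encountered is kept.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-15):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical BLASTP hits in both directions.
hits_1448a_vs_dc3000 = [
    ("psp_1", "pto_1", 1e-100, 350.0),
    ("psp_1", "pto_2", 1e-40, 150.0),
    ("psp_2", "pto_2", 1e-60, 210.0),
    ("psp_3", "pto_3", 1e-10, 55.0),  # fails the E value cutoff
]
hits_dc3000_vs_1448a = [
    ("pto_1", "psp_1", 1e-100, 350.0),
    ("pto_2", "psp_1", 1e-40, 150.0),
    ("pto_2", "psp_2", 1e-60, 210.0),
    ("pto_3", "psp_3", 1e-10, 55.0),  # fails the E value cutoff
]

orthologs = reciprocal_best_hits(hits_1448a_vs_dc3000, hits_dc3000_vs_1448a)
print(orthologs)  # [('psp_1', 'pto_1'), ('psp_2', 'pto_2')]
```

Requiring the match in both directions is what separates putative orthologs from one-directional hits to paralogs: in the toy data, psp_1 also hits pto_2, but pto_2 prefers psp_2, so only the mutual pairs are reported.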
FIG. 3. Syntenic relationships between P. syringae pathovars. Regions of collinearity in the chromosomes of the two pathovars were identified using the Sybil syntenic-block algorithm. The syntenic blocks were color coded by their positions in the reference chromosome ...
Identification of recombination hotspots.
In a previous study, the DC3000 genome was compared with the genomes of the saprophyte P. putida and the animal pathogen P. aeruginosa for the purpose of identifying lineage-specific regions (LSRs) enriched for genes unique to DC3000 (34). The availability of a genome sequence for a second pathovar now provides the opportunity to determine if the DC3000 LSRs point to regions more widely conserved among the P. syringae pathovars yet lacking in other pseudomonads and therefore represent a distinguishing fingerprint for the P. syringae species. Not only would mapping of shared regions aid in the localization of genes responsible for P. syringae-specific phenotypes, but the genetic makeup of these regions could also shed light on the timing of gene acquisition relative to speciation and pathovar differentiation.
As previously mentioned, the DC3000 and 1448A genomes exhibit extensive regions of conserved sequence and gene order. However, a plot of the predicted DC3000 LSRs relative to the genome alignment reveals that the DC3000 LSR locations do not generally correspond to regions of syntenic conservation between DC3000 and 1448A. In fact, 29/44 DC3000 LSRs exhibit <10% syntenic conservation with the corresponding regions in 1448A. Of the 10/44 LSRs with >70% sequence similarity, only 4 were syntenically conserved between the two pathovars. Representative examples of LSRs with low syntenic conservation are illustrated in Fig. .
FIG. 4. Alignment of the P. syringae pv. tomato DC3000 (Pst DC3000) and P. syringae pv. phaseolicola 1448A (Psp 1448A) genomes in the region encompassing the P. syringae pv. tomato DC3000 LSRs 31 to 36. The red bars indicate colinear regions of similarity, and ...
The DC3000 LSRs were initially characterized with the goal of identifying novel genes involved in DC3000-specific and/or P. syringae-specific virulence, and indeed, the locations of known virulence genes were found to correlate with the LSR locations. However, with the exception of the main Hrp island (LSR13), the lack of LSR conservation between DC3000 and 1448A suggests that relative position within the genome may be of limited use as a guide to shared, novel virulence factors. This is supported by the observation that locations of previously identified virulence genes common to the two pathovars are not strongly conserved. For example, of the 17 Hop virulence proteins present in both pathovars, only 8 are conserved in their relative positions within the genome, and 1 of these, HopAF1, though conserved in relation to neighboring genes, is inverted in its orientation.
Although previously hypothesized to represent conserved syntenic P. syringae-specific cassettes of genes, the LSRs may, in fact, represent recombination hotspots within the genus Pseudomonas, accounting not only for the differences between DC3000 and species such as P. putida and P. aeruginosa, but also for differences among the P. syringae pathovars. Comparison with the genome sequences of additional Pseudomonas species, including the soon-to-be-completed P. syringae pv. syringae B728a genome, should provide further insight into this hypothesis.
Analysis of the genome of the bean pathogen P. syringae pv. phaseolicola 1448A revealed a large degree of conservation with other pseudomonads, especially P. syringae pv. tomato DC3000. However, divergence in ORF content was observed and could be the basis for differential host ranges between the pathovars. Numerous differences in the suite of genes involved in virulence, fitness, survival, and synthesis of natural products were found. Coupled with the differential ORFs with unknown functions, these genes provide a resource for future investigations into pathogenicity and host range specificity.