General H. mustelae Genome Features
The general features of the
H. mustelae genome are summarized in Table , and compared to selected other genomes of members of the
Campylobacterales. The
H. mustelae genome comprises a single circular chromosome of 1,578,097 base-pairs, and like most strains of the ε-proteobacteria selected for sequencing thus far, is plasmid-free. The GC content of the
H. mustelae genome is the second highest among the
Campylobacterales analyzed herein, and is among the highest values reported for members of the genus [
7]. All the
Campylobacterales genomes sequenced to date have similarly high coding densities.
H. mustelae has a slightly higher mean predicted CDS length than many other related bacteria, aided by the fact that the
H. mustelae genome encodes some of the largest proteins ever recorded in this bacterial group (see below). Laterally acquired DNA in bacteria can be identified by local anomalies in GC mol% content, and is often associated with IS elements or tRNA genes [
52]. In pathogenic bacteria in general, such islands are typically associated with significant augmentation in virulence capability [
53], and the
H. pylori cag pathogenicity island is a key determinant of increased potential to cause more severe pathology including gastric ulcers and cancer [
54]. The
cag pathogenicity island of
H. pylori is not present in
H. mustelae, nor in any of the other helicobacters or campylobacters, including
H. acinonychis which diverged relatively recently (ca. 200,000 years ago) from
H. pylori [
30]. Although searching of the
H. mustelae genome with the Alien Hunter program initially identified 23 candidate regions with anomalous nucleotide content, none of these is large enough to be considered an island or islet (data not shown), and they are not characterized by linkage to phage genes, integrases or tRNA genes. A representative example is the region from HMU08130 to HMU08170, which has a GC content of 41.55 mol% and is flanked by direct repeat sequences in HMU08130 and HMU0818; this region encodes a presumptive type I restriction-modification system. The region from HMU08300 to HMU08350 has a GC content of 46.69 mol% and harbours several biosynthesis genes that are also found in
H. pylori,
H. hepaticus,
W. succinogenes and
C. jejuni. Alien Hunter also detected the primary Hsr locus from HMU08520 to HMU08780. Analysis of the GC mol% content around the genome (Fig. ) shows several smaller stretches of anomalous GC content, similar to localized deviations in
H. pylori. However, there is no evidence for a pathogenicity island in
H. mustelae, in contrast to
H. pylori and
H. hepaticus. A curious feature of the
H. mustelae genome is the asymmetric nature of the GC skew pattern ([G-C]/[G+C]; Fig. ). The pattern of GC skew is normally symmetrical for bacterial circular chromosomes [
55]. The lack of symmetry of the
H. mustelae GC skew may be indicative of recent genome rearrangement, as suggested for
Yersinia pestis [
56]. This is most likely to be a localized deletion, since in contrast to
Y. pestis, the GC skew pattern of the
H. mustelae genome does not suggest a transposed genome region. Lack of large-scale synteny between helicobacter genomes (see Comparative Genomics below) complicates identification of such a presumptive deleted region.
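The windowed statistics discussed above can be illustrated with a minimal Python sketch. It computes the GC fraction and the GC skew, using the [G-C]/[G+C] definition given in the text, over fixed windows of a DNA string, and a running sum of the skew, which is a common way to inspect the symmetry of the skew pattern around a circular chromosome. The function names, window size and toy sequence are assumptions made for illustration; this is not the Alien Hunter algorithm, nor the procedure used to draw the figure.

```python
def window_gc_stats(seq, window=1000, step=1000):
    """Return (start, gc_fraction, gc_skew) for each full window of a DNA string."""
    seq = seq.upper()
    stats = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / window
        # GC skew as defined in the text: [G-C]/[G+C]; 0.0 if the window has no G or C.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        stats.append((start, gc_fraction, gc_skew))
    return stats


def cumulative_skew(stats):
    """Running sum of the per-window GC skew values."""
    total = 0.0
    running = []
    for _, _, skew in stats:
        total += skew
        running.append(total)
    return running


if __name__ == "__main__":
    # Hypothetical toy sequence: a G-rich half followed by a C-rich half.
    toy = "GGGGCATA" * 2 + "CCCCGATA" * 2
    stats = window_gc_stats(toy, window=8, step=8)
    for (start, gc, skew), cum in zip(stats, cumulative_skew(stats)):
        print(start, round(gc, 3), round(skew, 3), round(cum, 3))
```

On a real chromosome the window and step would be much larger (an assumption here would be 1 to 10 kb), and windows whose GC fraction deviates strongly from the genome mean would be the candidates for closer inspection.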
| Table 1. General features of the H. mustelae genome, and those of selected Campylobacterales |
The paucity of insertion sequence elements and bacteriophage-related genes in
H. mustelae (Table ) suggests that these mechanisms are not significant agents of diversity generation in this species, a process which is driven in
H. pylori by free recombination [
57]. Thus it was surprising to identify a CRISPR locus and three
cas genes (HMU00230-00250) in the
H. mustelae genome, as these features have recently been identified as a phage resistance mechanism [
58]. CRISPR loci have not been annotated in other helicobacter or campylobacter genomes. HMU00670 encodes another potential phage resistance mechanism, a predicted Abortive Infection (Abi) protein.
H. acinonychis is unusual in having two complete prophages in its genome (one of which is no longer contiguous, due to genome decay), which has been attributed in part to the presence of only six predicted functional restriction-modification loci compared to eleven in
H. pylori [
30]. The
H. mustelae genome also encodes 6 predicted restriction-modification systems.
Virulence-related Genes in H. mustelae
Members of the Campylobacterales whose genomes have been sequenced to date harbour a variable complement of known or inferred virulence factor genes (reviewed comparatively in ref. [
35]). Many of these factors have not actually been studied for biological significance except in
H. pylori, where the linkage to gastric colonization or pathogenesis for some defined traits is generally very clear [
2]. The presence in the
H. mustelae genome of homologues to genes whose products have been linked to colonization, persistence or pathogenesis in related organisms is summarized in Table . As noted above, a
cag pathogenicity island is not present. Relative to
H. pylori, the
H. mustelae genome contains a second (additional) urease operon AB2, that contributes to acid resistance [
41]. There is no full-length
H. mustelae ortholog of the vacuolating cytotoxin VacA of
H. pylori. Homology searches with the
H. pylori vacA gene identified 10 predicted autotransporter (AT) genes, organized into three clusters, plus the Hsr locus, and one singleton HMU08270 (Additional File
1). HMU04240 is a VacA homologue that lacks an autotransporter domain. Significantly, when the autotransporter beta-barrel domain is excluded from the analysis, six of the VacA homologues detected using the full length VacA sequence (HMU00600, HMU00620, HMU00630, HMU01180, HMU01190 and HMU08270) show no significant identity between their passenger domains and database entries. HMU06680 and HMU06730 are annotated as glycine-rich autotransporters, located in AT cluster 3. Regions of HMU06680 display 32% identity to HP0922 (a toxin-like outer membrane protein/VacA paralog) and significant residue identity in the passenger domain to an immunodominant antigen in
H. bilis (accession AAQ14336). In addition, two regions of the HMU06680 protein, residues 100-130 and 1339-1494, show ca. 25% residue identity to passenger-domain regions of VacA, and VacA-like proteins in
H. pylori. Several older studies have suggested that
H. mustelae does not produce vacuolating cytotoxic activity [
59,
60], so the biological significance of these VacA-related proteins is unclear. Of the remaining autotransporters, one is the Hsr variable surface antigen HMU08630, in a genetic configuration similar to strain 4298, flanked by cassettes for alternative epitopes [
44]. The predicted autotransporter HMU08270 comprises 4,094 amino acids, has no significant identity to database entries, and has an unusual autotransporter domain, which is not at the extreme carboxy-terminus. Finally, HMU06740 lacks a predictable signal peptidase cleavage site, and is unusually small for an autotransporter protein. The distinctive wealth of this class of secreted protein in
H. mustelae, the evidence for their production (see below) and the likelihood of their involvement in host interaction, make this bacterium a potentially productive model for exploring autotransporter evolution and biological function.
| Table 2. Presence of genes related to colonization, persistence or pathogenesis in the H. mustelae genome, compared to H. pylori |
Outer membrane proteins are important for the pathogenesis of
H. pylori. The genome of
H. mustelae contains 26 genes that were annotated as encoding putative outer membrane proteins. Phylogenetic analysis of these proteins relative to the categorized
H. pylori OMPs [
61] showed that some of the
H. mustelae OMPs group with
H. pylori orthologs (Fig. ). For example HMU05640, HMU05650, and HMU10680 convincingly cluster in the clade containing the 8 members of the Hof OMP family of
H. pylori, and the three Hof-related
H. mustelae proteins show similar size and C-terminal motif to the
H. pylori Hof proteins (Table ). Apart from the fact that HP0486 is expressed and is not heat-modifiable [
62], suggesting it is not a porin, nothing is known about the function of Hof proteins. Interestingly, a further 12
H. mustelae OMPs cluster in two groups either side of the Hof protein clade (Fig. ). None of the annotated
H. mustelae OMP sequences position phylogenetically in the tight Hop-containing clade that includes BabA, SabA and OipA, and orthologs of these three adhesins are absent in the
H. mustelae genome. One Omp in
H. mustelae, HMU04150, is positioned on the periphery of a clade containing both Hor and Hop proteins of
H. pylori (Fig. ). However HMU04150 lacks the characteristic (AEX [D, N]G) motif present in the
H. pylori Hop proteins, and its carboxy terminal motif (Table ) is more similar to Hor proteins. HMU11950 clusters with the three FecA orthologues of
H. pylori. The majority of the remaining
H. mustelae OMPs are currently unclassified (Table ). They almost all share the properties of relatively small size, and lack of significant-identity database homologues. Some have atypical carboxy terminal sequences for OMPs, and signal peptidase cleavage sites that are not readily predicted. Two of them are expressed (see below), indicating significant production levels, and the biological function of these unclassified OMPs warrants further investigation.
| Table 3. Classification of H. mustelae outer membrane proteins in relation to H. pylori OMPs |
The
H. mustelae genome encodes many other proteins likely to contribute to virulence, based upon information available for homologues (Table ). Close to the origin of replication, in a region distinguished by anomalously low GC content, are two genes related to haemolysis or haemagglutination. HMU00160 encodes a predicted protein with significant homology to haemolysin activators of diverse gram-negatives including
Photorhabdus luminescens,
Burkholderia pseudomallei, and a pathogenicity-island encoded determinant of
E. coli [
63]. HMU00170 Hag/Hly encodes a predicted 227 kDa protein with predicted signal peptide, and containing Pfam motifs for haemagglutination (Haemagg_act; PF05860), filamentous haemagglutinin (Fil_haemagg; PF05594), and an ATP/GTP-binding site motif A (P-loop). Homologues of this protein constitute a large family whose members are widely distributed among gram-negative pathogens, are annotated as either haemagglutinins or haemolysins, but which appear to lack functional characterization. The residue identity with HMU00170 is confined to the first 350 residues of that protein, and is particularly high over the filamentous haemagglutinin region (ca. 35-50% identity). Homologues of this pair of genes are lacking in helicobacters and campylobacters, suggesting this is an
H. mustelae-specific acquisition among the ε-proteobacteria.
H. mustelae has two orthologs (HMU02820 and HMU09010) of HP0508, which has been characterized as a plasminogen binding protein in
H. pylori [
64]. The biological significance of this phenotype in either gastric pathogen is unclear. HMU0700 was annotated as CiaB by virtue of containing a low molecular weight phosphotyrosine protein phosphatase Pfam domain, and significant residue identity to the
C. jejuni CiaB (Campylobacter Invasion Antigen B; Cj0914c) protein. This protein is required for maximal invasion of epithelial cells by
C. jejuni, and is notably exported by the flagellar export apparatus [
65]. Another recently discovered Cia protein, Cj1242 [
66], is not present in the
H. mustelae genome. Homologues of CiaB have been annotated in
W. succinogenes and
H. hepaticus, but not in
H. pylori, which is curious because CiaB is also present in ε-proteobacteria isolated from sea vents [
67], which are located on the deepest branch of the ε-proteobacterial tree. Like
H. pylori,
H. mustelae lacks the N-glycosylation system that contributes to pathogenesis in Campylobacters [
68,
69], and the genes for which are also present in
H. hepaticus,
W. succinogenes and sea vent ε-proteobacteria [
67]. HMU10120 is a homologue of proteins in other Campylobacterales which was first described in
C. jejuni as PEB4a, and which is a major antigen and cell adhesin [
70,
71]. The HMU10120 gene product was detected in the
H. mustelae proteome (see below). Its role as an adhesin warrants further scrutiny, since it surprisingly contains a rotamase Pfam domain. Another candidate virulence/survival determinant in
H. mustelae is HMU12690, which is a homologue of the
H. pylori neutrophil activating protein HP0243 [
72]. This has recently been described as one of three
H. pylori proteins diagnostically predictive for development of gastric cancer [
73]. NapA also has a role in protecting
H. pylori from oxidative stress [
74]. These features, coupled with the high-level expression of the HMU12690 protein in
H. mustelae (see below), suggest that it may be relevant for survival or pathogenesis in the ferret stomach. The
H. mustelae gene HMU06150 is homologous to Cj1327 and Cj1328, two genes involved in sialic acid biosynthesis, and HMU06140 is annotated as an acylneuraminate cytidylyltransferase. Thus,
H. mustelae may decorate its surface with sialic acid.
H. pylori incorporates human blood group antigens into its LPS (reviewed in references [
75-
77]) in a strain dependent manner. Ferrets express a structure equivalent to human blood group A on gastric tissue, and
H. mustelae strains express blood group A antigen in their LPS [
47,
78]. The
H. mustelae genome includes divergent orthologues of ten out of fourteen genes [
77] implicated in
H. pylori LPS biosynthesis/blood group antigen production (Additional File
2). Some of these candidate orthologues are so divergent that they cannot be confidently separated from potential flagellin glycosylation genes (see below). The
H. mustelae repertoire includes a single predicted fucosyl transferase, encoded by HMU12060.
H. pylori fucosyltransferases display low identity to mammalian enzymes. Interestingly, HMU12050 encodes a predicted blood group AB glycosyltransferase, which shows 31-33% BLAST identity against mammalian AB glycosyltransferases, and putative glycosyltransferases from
E. coli O86 and
Haemophilus somnus (Fig. ). To our knowledge, this is the first identification of a bacterial gene for synthesizing mammalian blood group A/B antigen which is known to be actually produced on the bacterial surface. It is expected by analogy with
H. pylori (reviewed in ref. [
75]) that this gene product will contribute to the ability of
H. mustelae to adapt to the gastric environment (immune avoidance), modulate inflammation and immune cell recognition, and exacerbate pathology by triggering autoimmunity. The ferret model provides an excellent platform to test these hypotheses.
There are 4 secretion systems predicted in the
H. mustelae genome (see below), including the flagellum protein export system. Motility conferred by flagella is an essential property for successful colonization of the ferret by
H. mustelae [
79], and the hook and flagellin proteins of
H. mustelae have already been characterized [
45,
46]. The annotation of the
H. mustelae genome revealed a typical set [
80] of Campylobacterales flagellar genes (Additional File
3), for structural components, glycosylation, regulation, and chemotaxis. The number of chemotaxis genes is reduced compared to
H. pylori, with orthologs of
cheV1,
tlpC, and
tlpA apparently being absent. This may be functionally offset by the presence of HMU05990, a putative MCP-type signal transduction protein, which includes a PAS domain sensor sequence (Pfam 08447). This protein is absent in
H. pylori and
H. hepaticus, and its closest homolog is in
C. jejuni.
H. pylori contains several genes including HP0840 and HP0366, whose products result in glycosylation of flagellin with pseudaminic acid [
81], which is required for flagellin assembly into flagellar filaments [
82]. The
H. mustelae genome includes a clear orthologue of HP0840 (designated
flaA1; Additional file
2, Table S2). Two potential homologues of HP0366, HMU06610 and HMU02370, were identified in the
H. mustelae genome. Another group of
H. mustelae genes, HMU11700-HMU11730, shows some relatedness to Cj1311-Cj1317, involved in flagellin sialylation, but their function, and indeed the glycosylation state of
H. mustelae flagellins, is still unknown. A noteworthy feature is the fact that two essential motility genes,
fliK (HMU07800; hook length control protein) and
motA (HMU03580; motor protein) are pseudogenes in the sequenced strain, which we subsequently confirmed to be non-motile (data not shown). The original type strain used for the species description emendation [
17] was motile, as are
H. mustelae isolates from wild ferrets [
18]. In the case of FliK, HMU07800 is flanked upstream by an ORF encoding 277 amino acids, which is preceded by a perfect GG-N10-GC σ54 promoter motif expected for
fliK [
83,
84]. Thus, a frame-shift between HMU07800 and HMU07790 has inactivated
fliK. The gene for MotA also appears to have suffered a frameshift. We assume that these mutations occurred during recent laboratory passage, in a manner similar to frame-shift inactivation of
fliP in
H. pylori strain 26695 [
85], revertants of which can be easily obtained on motility agar at high plating density.
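The σ54 promoter consensus cited above, GG followed by ten unspecified bases and then GC, lends itself to a simple pattern search. The following Python sketch is an illustration only: the function name and the toy sequence are hypothetical, only the given strand is scanned, and a match to this short consensus is of course not evidence of a functional promoter.

```python
import re

# GG-N10-GC written as a regular expression; the lookahead lets
# overlapping candidate sites be reported as well.
SIGMA54_MOTIF = re.compile(r"(?=(GG[ACGT]{10}GC))")


def find_sigma54_candidates(seq):
    """Return (0-based start, matched 14-mer) for each GG-N10-GC site on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in SIGMA54_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical upstream fragment, not taken from the H. mustelae genome.
    fragment = "TT" + "GG" + "A" * 10 + "GC" + "TT"
    print(find_sigma54_candidates(fragment))
```

To cover both strands, the same function could be applied to the reverse complement of the region of interest.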
The 3 other complete or partial protein secretion systems predicted from the
H. mustelae genome are presented in Additional File
4. The Sec system genes are not linked, except for
secD and
secF, which are clustered with
yajC. Like other ε-proteobacteria,
H. mustelae lacks SecB; it has a single
secA gene. The
secE gene was found by homology search, internal to the
tlp gene and in the opposite strand. There is a single
tatB-tatC gene cluster. The
tatA gene is present (HMU02290) but the
tatE gene is apparently absent. It is thus not clear if the
H. mustelae Tat system is functional. Relevant for the abundance of autotransporters, we annotated a gene predicted to encode an Omp85(YaeT) homolog, which has a critical role in outer membrane protein insertion/biogenesis [
86]. Analysis of the Gsp genes suggested the presence of a fragmented or remnant pilin biosynthesis system. The genes encoding GspDEF (also called CtsDEF) are clustered. However, the other ORFs around them have no significant homology to Gsp or type IV pilin-related proteins; a putative pseudopilin gene is present, but it is unusually distant from the cluster. Putative PilT-encoding and prepilin peptidase genes were also found separately on the chromosome, and not near any genes that appear to encode type IV pili or Gsp machinery. Thus there may be a pilin assembly unit in the
H. mustelae genome, which could contribute to pathogenicity, but functional investigation is required. Types IV and VI secretion system components were not found.
The presence of homopolymeric tracts in and between genes has been identified as a potential antigenic variation mechanism in
C. jejuni [
31],
H. pylori [
28] and
H. hepaticus [
29], and has been postulated to compensate for the relative paucity of transcriptional regulators. Disregarding polyA or polyT repeats because of the high genomic AT content, we identified 12 genes potentially affected by variation in copy number of intragenic homopolymers, and 8 potentially affected by intergenic variation (Additional file
5). Only two of the former category showed actual length variation in the shot-gun read data, compared to three of the latter. As expected from other Campylobacterales, the dominant gene function affected was surface architecture, at either protein or carbohydrate level. However, the overall number of genes potentially affected by this putative method of antigenic variation was significantly lower than in
H. pylori, C. jejuni or
H. hepaticus. This may be due to the dominant coverage by the Hsr protein, which is a major antigen, and which changes epitopes by recombination [
44].
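The screen for homopolymeric tracts described above can be sketched in a few lines of Python. The example finds runs of G or C only, disregarding polyA and polyT as in the text, and labels each run as intragenic or intergenic from a list of gene coordinates. The minimum tract length of 7 and the gene coordinates are assumptions made for illustration; the threshold actually used for the annotation is not stated here.

```python
import re


def find_gc_homopolymers(seq, min_len=7):
    """Return (0-based start, base, length) for each run of G or C of at least min_len."""
    pattern = re.compile(r"G{%d,}|C{%d,}" % (min_len, min_len))
    return [(m.start(), m.group()[0], len(m.group()))
            for m in pattern.finditer(seq.upper())]


def classify_tracts(tracts, genes):
    """Label each tract with the gene containing its start, or None if intergenic.

    genes: list of (name, start, end) with 0-based, end-exclusive coordinates.
    """
    labelled = []
    for pos, base, length in tracts:
        owner = next((name for name, start, end in genes if start <= pos < end), None)
        labelled.append((pos, base, length, owner))
    return labelled


if __name__ == "__main__":
    # Hypothetical sequence: a polyA run (ignored), a G8 tract, and a C5 run (too short).
    seq = "AT" + "A" * 9 + "T" + "G" * 8 + "AT" + "C" * 5 + "TA"
    tracts = find_gc_homopolymers(seq)
    print(classify_tracts(tracts, genes=[("geneX", 10, 30)]))
```

Detecting actual length variation, as opposed to potential variation, would additionally require comparing tract lengths across the individual sequencing reads, which this sketch does not attempt.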
The Expressed Proteome of H. mustelae
We prepared sub-cellular fractions from
H. pylori and
H. mustelae, and first compared them by SDS-PAGE (Fig. ). We cultured both species for two days on plates, compared to the five days used for the initial
H. pylori proteome analysis [
87], to minimize development of coccoid forms [
88]. The initial supernatant from harvesting the cells was designated as an extracellular fraction, since it was expected to contain exported proteins. In accordance with the well-documented property of autolysis for
H. pylori [
89], the extracellular fraction of both species shared many bands with the cytosolic fraction of the respective species (Fig. ). However, it was also clear that most of the proteins were apparently not shared between the two species. The greatest number of co-migrating bands between species was observed in the cytosol fraction, while the envelope fractions of the two species contained distinctive protein profiles. The
H. mustelae envelope fraction contained around eight major proteins, fewer than half the number in the
H. pylori envelope fraction, and few if any appeared to be produced by both species, consistent with the predictions from their respective genome sequences.
The dominant proteins in the envelope and cytoplasmic compartments of
H. mustelae were identified by LC-MS. The most abundant 50 proteins in each fraction are presented in Table and Table ; the complete datasets are available in Additional file
6 and Additional file
7. The membrane proteome includes several cytoplasmic proteins that are also known to be highly expressed in
H. pylori, including alkyl hydroperoxide reductase AhpC, flavodoxin and thioredoxin [
87], and bacterioferritin [
90]. Resistance to oxidative stress, and electron transfer functions, are clearly important processes that are performed using similar proteins in the two species. These proteins are all known to form either higher molecular weight aggregates, or membrane associations, which may explain their presence in the insoluble cell fraction. The AhpC protein, originally and mistakenly thought to be
H. pylori-specific [
91], was reported to be produced by several other
Helicobacter species but not
H. mustelae [
92], although the gene was detectable in
H. mustelae by PCR. The abundant soluble urease subunits A and B were also present in the insoluble fraction, as well as the cytosolic fraction, either through aggregation or membrane association in the former. The UreA2 and UreB2 structural sub-units were not detected, even though their mass fingerprints are clearly distinguishable from UreA and UreB (not shown). This non-production under our growth conditions is consistent with the observation that the expression of the Ure2 operon in
H. mustelae only occurs under nickel limitation [
41]. Despite the apparent lack of similarity between the
H. pylori and
H. mustelae proteomes in one-dimensional electrophoresis, when the 20 most abundant proteins detected in
H. pylori by two-dimensional electrophoresis [
87] were cross-compared to the
H. mustelae cytosolic proteome, all 20 were present in the latter sample (Table ). The relative abundances cannot be reliably compared due to differences in the methodologies, and growth phases of cells. The shorter growth period we used is reflected by the lower levels of stress proteins and higher levels of elongation factor EF-Tu in the detected
H. mustelae proteome. Future comparative transcriptomic and proteomic investigations are needed to identify variations in core genome expression between the two species. In addition, we will compare the
H. mustelae transcriptome and proteome after 5 days growth to that of
H. pylori, to clarify comparative issues with the current datasets.
| Table 4. The envelope proteome of H. mustelae determined by LC-MS |
| Table 5. The cytosolic proteome of H. mustelae determined by LC-MS |
| Table 6. Comparison of highly expressed proteins in the two dimensional electrophoresis pattern of H. pylori with abundant cytosolic proteins of H. mustelae detected by LC-MS |
The abundant members of the cell envelope proteome include proteins involved in metabolism (e.g. ATP synthase), transport (e.g. ABC transporter subunits), secretion (SecG, lower amounts of SecA), and several flagellar proteins (Table ). Notable among the most abundant proteins is HMU14250, a hypothetical protein with homology to pseudopilin or pilin subunits (see above). Less than 1% of the expressed cytosolic proteome was annotated as "hypothetical". In contrast, six of the top fifty proteins in the membrane proteome were annotated as "hypothetical", as was 10% of the total detected membrane proteome, validating the gene annotation process, and highlighting the possible contribution of proteins of unknown function to the biology of
H. mustelae. Of the 26 predicted outer membrane proteins in
H. mustelae, only 4, HMU0564, HMU0565, HMU1360 and HMU1367, were detected in the membrane proteome (Table ). The fact that two of these are encoded by contiguous genes and are likely co-transcribed suggests that their successful detection is due to similarly high expression levels. It is likely that some or many of the other predicted outer membrane proteins are actually expressed, but are below the detection limit, estimated to be in the micromolar range. Surface proteins detected in the expressed proteome also included HMU04120, a putative OM component of an efflux system. Of the 10 autotransporter proteins annotated, 7 were detected at relatively high levels. Interestingly, at 1.37 Mol%, the dominant surface ring-forming protein Hsr was not the most highly expressed protein. HMU0118 was detected at 1.72 Mol% and HMU0063 at 1.73 Mol%. HMU0118 is 29% identical to the Hsr protein and HMU0063 is 38.8% identical to Hsr, but in both cases, the identity at the amino terminal exposed part of the molecule is low. Although the Hsr gene was identified and cloned by immunoreactivity with antiserum raised against purified Hsr protein, and this antiserum labeled the surface rings by immunoelectron microscopy [
42], the possibility remains that the surface rings are composed of more than one autotransporter protein. This would contribute to even greater antigenic variability of the
H. mustelae surface caused by recombination of sequences for new epitopes into the expressed Hsr protein [
44].
Comparative Genomics and phylogeny of the ε-Proteobacteria
Alignment of the
H. mustelae genome sequence with those of
W. succinogenes,
C. jejuni,
H. pylori and
H. hepaticus revealed lack of extensive i.e. long-range synteny with any of these genomes (Fig. ), a feature noted for other genomic comparisons within the Campylobacterales [
35]. Although the ACT software allows visualization of mutually reversed homologous sequences, it was noteworthy that comparing
H. mustelae to
W. succinogenes or
C. jejuni seemed to more clearly highlight a vestigial genome backbone than comparing it to
H. pylori or
H. hepaticus (Fig. ). Re-orienting the
H. pylori and
H. hepaticus genomes with the
dnaA genes at co-ordinate 1 partly clarified the rungs of a conserved ladder of homology between the genomes, but this was still largely obscured by the numbers of relative transpositions and reversions of multiple loci between the genomes. The rungs in the ladder are formed by genes including
dnaA, gyrase, a putative metallo-beta-lactamase (shared with
H. hepaticus), and
gatB, the 2-oxoglutarate:acceptor oxidoreductase operon. This analysis also highlighted the degree to which lack of synteny in the compared genomes is due to transposition across the origin-terminus axis, resulting in an X-shaped alignment that is symmetric about the origin of replication as previously noted in other bacteria by Eisen [
94]. This symmetry indicates homologous loci at the same distance from the origin but on the opposite side of the origin, which is explained by the fork replication theory [
95]. The genome alignments highlight the absence of the
H. pylori cag pathogenicity island and the
H. hepaticus genomic island (from HH_232 to HH_303) in the
H. mustelae genome (Fig. ). Relatively few longer stretches of the
H. mustelae genome lack any significant homologues in
H. pylori or
H. hepaticus; those that exist include HMU00600-HMU00690 that includes AT cluster 1; HMU01180-HMU01200 including AT cluster 2; and HMU10860-HMU10880 that encodes a predicted tricarboxylate transport system not found in the other
Helicobacter species.
To explore the phylogenomics of the
Campylobacterales for which genome sequence data were available, we first performed pair-wise alignments of their proteomes and then constructed a matrix based on their relatedness. Using methods derived during our studies of another bacterial group within which genetic distances are very long, the
Lactobacillales [
96], we defined orthologues at protein level, requiring 30% identity over 80% of the sequence lengths. The pair-wise alignment data are presented in Additional file
10. These data indicated that
H. mustelae was closest phylogenetically to
H. hepaticus, followed by
H. pylori and the campylobacters. A tree constructed based on 212 orthologous proteins shared between the respective taxa (Fig. ) showed two major branches, one including the four
Campylobacter genomes. In the second branch,
H. mustelae clustered most closely with
H. pylori and with the enterohepatic species
H. hepaticus.
W. succinogenes was peripheral to the Campylobacter clade. This topology is more concordant with the 16S rRNA phylogeny of Dewhirst [
26] and Gueneau [
25] than with the 23S rRNA gene phylogeny constructed by Dewhirst
et al, which the authors suggested to be more robust [
26]. The positioning of
W. succinogenes in particular by the 23S rRNA gene phylogeny is significantly different from that based on numbers of orthologs in the current study including
H. mustelae. Our exclusion of
W. succinogenes from the helicobacters also conflicts with a phylogeny constructed in the same study by Dewhirst and colleagues [
26], based on 870 shared proteins.
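The orthologue criterion stated above (30% identity over 80% of the sequence lengths) can be written as a simple filter on pairwise alignment results. The Python sketch below is an illustration under stated assumptions: the function and field names are hypothetical, and the 80% requirement is interpreted as applying to both proteins in the pair, which the text does not spell out.

```python
def is_ortholog_pair(identity_pct, aln_len, len_a, len_b,
                     min_identity=30.0, min_coverage=0.80):
    """Apply the identity and coverage thresholds to one pairwise alignment.

    Assumption: the alignment must span at least min_coverage of BOTH proteins.
    """
    coverage_a = aln_len / len_a
    coverage_b = aln_len / len_b
    return identity_pct >= min_identity and min(coverage_a, coverage_b) >= min_coverage


def count_orthologs(hits):
    """Count hits passing the thresholds.

    hits: iterable of (identity_pct, aln_len, len_a, len_b) tuples.
    """
    return sum(1 for hit in hits if is_ortholog_pair(*hit))


if __name__ == "__main__":
    # Hypothetical alignment summaries, not taken from the Additional files.
    hits = [
        (35.0, 250, 300, 280),  # passes both thresholds
        (35.0, 200, 300, 280),  # covers only two thirds of the first protein
        (25.0, 290, 300, 300),  # identity below 30%
    ]
    print(count_orthologs(hits))
```

Counting the passing pairs for every pair of genomes gives the kind of relatedness matrix described in the text.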
We have previously used Supertree analysis to clarify relatedness in distant taxa [
96]. The advantage of this approach is that it combines the maximum likelihood trees constructed from each of hundreds of core proteins, in this case the 212 proteins shared by all taxa (Fig. ). This all-against-all comparison identified numbers of proteins specific to major groups (Fig. ). The three helicobacters constituted a reasonably robust group, with over thirteen hundred core proteins, compared to 1097 in the four campylobacters. The consensus supertree constructed for the eight
Campylobacterales plus outgroup is presented in Fig. . Based on this more restricted set of core proteins,
W. succinogenes still positioned on the edge of the
Helicobacter clade.
H. pylori was most closely related to
H. mustelae. However this
H. pylori-
H. mustelae branch was the least supported by the combined frequencies of the individual maximum likelihood trees, indicating the instability of this phylogenetic relationship. Considering the pairwise comparisons, whereby
H. mustelae was most closely related to
H. hepaticus (Additional file
9, Table S9), the choice of proteins clearly has a profound effect on the phylogeny inferred. The choice of
T. maritima as outgroup may also have affected the outcome, but the number of shared orthologs was only 252 when this taxon was not included in the all-against-all comparison, suggesting this was not a major factor. As for the pair-wise ortholog analysis and the phylogeny based on concatenated core proteins (Fig. ),
W. succinogenes did not cluster among the helicobacters, and the data do not support the notion of revising the nomenclature of these genera [
26].
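The notions of core proteins shared by all taxa and of proteins specific to a major group can be made concrete with set operations on orthologue groups. The Python sketch below assumes that orthologue groups have already been defined and are available as a mapping from a group identifier to the set of taxa in which a member was found; the identifiers, taxon labels and function names are hypothetical.

```python
def core_groups(ortholog_groups, taxa):
    """Return the sorted ids of groups with a member in every taxon listed."""
    required = set(taxa)
    return sorted(gid for gid, present in ortholog_groups.items()
                  if required <= set(present))


def group_specific(ortholog_groups, ingroup, outgroup):
    """Return the sorted ids of groups present in all ingroup taxa and in no outgroup taxon."""
    excluded = set(outgroup)
    return sorted(gid for gid in core_groups(ortholog_groups, ingroup)
                  if not (set(ortholog_groups[gid]) & excluded))


if __name__ == "__main__":
    # Hypothetical toy data: three helicobacters and one campylobacter.
    groups = {
        "g1": {"Hmu", "Hpy", "Hhe", "Cje"},
        "g2": {"Hmu", "Hpy", "Hhe"},
        "g3": {"Hmu", "Cje"},
    }
    helicobacters = ["Hmu", "Hpy", "Hhe"]
    print(core_groups(groups, helicobacters + ["Cje"]))
    print(group_specific(groups, helicobacters, ["Cje"]))
```

Adding or removing a taxon from the list changes the size of the core set, which is one reason the choice of taxa and proteins can alter the inferred phylogeny.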