The results of the genome-based phylogenetic reconstruction suggest that certain changes should be considered in the nomenclature of the genus Xanthomonas. For instance, X. fuscans was recently proposed as a new species [27], but here we show that it should be considered a later heterotypic synonym of X. citri, as previously suggested [18, 31]. Other clades in the standing bacterial nomenclature [63] within the genus Xanthomonas were consistent with the phylogenetic reconstruction. Nevertheless, we observed paraphyly in the genus Xanthomonas when Xylella fastidiosa was included, with X. albilineans falling outside the Xanthomonas group. Our results suggest that X. albilineans, probably along with other early-branching Xanthomonas, should be considered for a new genus designation. However, the relationships between X. albilineans, Xylella and the other Xanthomonas remain unclear. Another feature shared by Xylella fastidiosa and X. albilineans is a reduced genome. The reductions in these genomes were previously shown to be due to independent events [42]. Here we show evidence suggesting that reductive genome evolution could also affect other clades in the genus, such as X. vasicola.
The phylogenetic relationship between X. albilineans, Xylella fastidiosa and the rest of the taxa in the genus Xanthomonas is not clear. X. albilineans is one of the "early-branching species" [7], a group including X. albilineans and X. sacchari that was previously found to be basal in the phylogeny of the genus [7, 35]. The species is also a member of the "hyacinthi" group, a group of species with major differences in the 16S-23S rDNA Intergenic Spacer (ITS) with respect to the other members of the genus [32]. Pieretti and collaborators [42] suggested that Xylella and X. albilineans form a monophyletic clade that is basal to the rest of Xanthomonas, based on a Maximum Likelihood analysis of seven housekeeping genes. Our analyses with over two hundred genes suggest instead that X. albilineans is basal to Xylella and the rest of the taxa in the genus Xanthomonas. Neither analysis obtains good support values for these nodes. The most straightforward explanation is that some regions of the genome support one topology while others support the alternative. This could be due to a considerable number of LGT events in these genomes. Alternatively, it could be due to the large number of changes accumulated in Xylella fastidiosa, as revealed by the length of the corresponding branch (Figure ).
The phylogenetic tree presented in Figure displays identical topology and similar relative branch lengths under the different optimality criteria used for inference (Maximum Likelihood, Bayesian Inference, Maximum Parsimony). The tree supports the monophyly of the species X. campestris, X. oryzae and X. vasicola. The clade "X. axonopodis" contains the species X. fuscans, X. citri, X. axonopodis and X. euvesicatoria. However, the smaller number of sequenced genomes available for these species makes it difficult to support any further observation beyond the close relatedness within the clade with respect to other species.
Interestingly, the phylogeny displays a close relationship between the species X. fuscans and X. citri. To compare their similarity within the same MLSA framework applied to other species of Xanthomonas (e.g., [31]), we constructed a similarity matrix based on the 989 loci employed for the phylogenetic inference (Table ). According to the resulting matrix, a similarity threshold of 99% can differentiate bacteria recognized as belonging to different pathovars (except in X. vasicola, for which pathovars vasculorum and musacearum display a similarity above 99%, possibly due to non-chromosomal sequences). All the species with currently accepted names [63] have within-species similarities above 97%. This value (in accordance with previous MLSA calibrations [31]) also differentiates species outside the X. axonopodis clade, but fails to differentiate X. fuscans and X. citri, suggesting that the two taxa constitute a single species, as previously suggested [18, 31]. This is also supported by the likelihood distances between these two taxa (Figure , Table ). Accordingly, we recommend that the species X. fuscans be regarded as a heterotypic synonym of X. citri.
Table 2. Similarity matrix between genomes
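The use of the 97% and 99% thresholds can be illustrated with a minimal sketch. It assumes that similarity is measured as the proportion of identical positions in a concatenated alignment, ignoring columns with gaps; the exact similarity measure is not specified in this section, so this definition, the function names and the category labels are illustrative assumptions only.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical positions between two aligned sequences.

    Columns in which either sequence has a gap are ignored (an assumption
    of this sketch, not necessarily the measure used in the study).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else float("nan")


def similarity_matrix(alignment):
    """Symmetric similarity matrix as a dict keyed by (name, name) pairs.

    `alignment` maps a genome name to its concatenated, aligned sequence.
    """
    names = sorted(alignment)
    matrix = {(name, name): 1.0 for name in names}
    for p, q in combinations(names, 2):
        value = pairwise_identity(alignment[p], alignment[q])
        matrix[(p, q)] = value
        matrix[(q, p)] = value
    return matrix


def classify(similarity, species_cutoff=0.97, pathovar_cutoff=0.99):
    """Apply the two thresholds discussed in the text to one similarity value."""
    if similarity >= pathovar_cutoff:
        return "same pathovar"
    if similarity >= species_cutoff:
        return "same species, different pathovar"
    return "different species"
```

Under these assumptions, a pair of genomes with a similarity of 0.98 would be placed in the same species but in different pathovars, whereas a pair at 0.95 would be treated as different species.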
Several robust methods for the identification of orthology, multiple sequence alignment and phylogenetic inference have recently been developed (reviewed in [64]). However, a common flexible framework for their joint application in specialized phylogenetic studies, and in MLSA in general, is still required. The BioPerl libraries, including the Bio::Phylo package [65, 66], provide valuable tools for the automation of analyses, but the connections between different steps are often not automated, which makes complete analyses time-consuming. Unus allows the execution of complete workflows in phylogenomics within a single interface, and its current functionalities and limitations underscore the need for a fully structured platform in the field, such as those available for other branches of genomics.
We compared the OGs automatically selected for the phylogenetic assessment with several manually compiled lists of genes. These comparisons indicated that, depending on the genome coverage and annotation of the drafts employed, our selection of OGs broadly agrees with those previously used for phylogenetic inference. Furthermore, the functional distribution of the automatically selected genes exhibits the expected behaviour at different taxonomical levels. Selections at broader taxonomical levels exhibit a larger representation of genes implicated in central metabolism, while the proportion of clade-specific genes increases at narrower taxonomical levels.
The analysis of the distribution of COG categories shows that central metabolism and ribosomal proteins are favoured when comparing distant genomes, as they are in phylogenetic studies based on one or a few loci. Genes in these categories are better suited than genes in other COG categories or unclassified genes because of two characteristics that are important for phylogenetic assessment. Firstly, genes implicated in central metabolism and ribosomal genes are usually single-copy. Genes with in-paralogs are normally avoided in phylogenetic inference, given the difficulty of identifying corresponding genes within sets of paralogs [67], despite some efforts to include them in phylogenetic analyses (e.g., [68]). Secondly, these genes are often present even in genomes from loosely related organisms. Although phylogenetic reconstructions based on gene content have proven successful (e.g., [69]), it is hard to achieve high resolution below the species level with them, and they are not possible with incomplete draft genomes.
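The two criteria described above (single copy and presence in all genomes compared) can be expressed as a simple filter over orthology groups. The sketch below is only an illustration of that filter; the data layout and the function name are hypothetical and do not describe the internals of Unus.

```python
from collections import Counter


def single_copy_core(groups, genomes):
    """Return the ids of orthology groups with exactly one gene per genome.

    `groups` maps a group id to a list of (genome, gene_id) pairs and
    `genomes` lists every genome under comparison. Both structures are
    assumptions made for this illustration.
    """
    wanted = set(genomes)
    selected = []
    for group_id, members in groups.items():
        copies = Counter(genome for genome, _gene in members)
        present_everywhere = set(copies) == wanted
        no_in_paralogs = all(count == 1 for count in copies.values())
        if present_everywhere and no_in_paralogs:
            selected.append(group_id)
    return sorted(selected)
```

Restricting `genomes` to a narrower set of taxa would be expected to retain more groups, which is in line with the larger share of clade-specific genes reported at narrower taxonomical levels.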
Additional genes suitable for phylogenetic analyses were detected through automated identification of orthologs, allowing higher resolution among closely related taxa. These genes are usually not included in MLSA, although they can add important information about relationships within the group. For closely related bacteria (such as the X. oryzae pv. oryzae strains), the importance of such additional information lies in the low variability among genomes. Therefore, the option to select orthologs without a priori knowledge of the genes that will be included allows for flexibility in terms of data availability, as well as optimized phylogenetic resolution at any taxonomic level under study.
A previous study [42] suggested reductive evolution in the genome of X. albilineans, revealed by its small genome (3.77 Mbp) and its high level of putative pseudogenization. We present evidence supporting the hypothesis that reductive genome evolution occurs across the genus and is not restricted to X. albilineans. In our analyses, X. albilineans indeed showed large genomic reductions, but even larger reductions were found in X. vasicola, with recent genomic gains detected only on tip nodes, suggesting a tendency towards reductive evolution followed by the acquisition of genomic regions. The genomic gains on tip nodes can be partly explained by the inclusion of non-chromosomal material in the draft genomes of X. vasicola, although this effect was not observed in other draft genomes in the study that contain non-chromosomal material, such as XamC. An alternative explanation is that the genomic gains arose through recent genetic exchange with other bacteria, as previously suggested for X. vasicola [47]. However, the large ancestral losses cannot be explained by the incompleteness of the genomes, and may reflect an ancestral genomic reduction in the species. The size of the regions involved in such events, and whether they affect restricted functional categories of genes or random regions, is still to be determined.
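The distinction between gains on tip nodes and losses on ancestral branches can be illustrated with a generic parsimony reconstruction of a presence/absence profile. The sketch below uses Fitch parsimony on a rooted tree encoded as nested tuples; it is a stand-in technique chosen for illustration and is not claimed to be the procedure used in this study. The tree encoding, the function names and the resolution of ties at the root are all assumptions.

```python
from collections import Counter


def annotate(node, presence):
    """Bottom-up Fitch pass on a nested-tuple tree whose leaves are genome names.

    `presence` maps each genome name to 1 (gene present) or 0 (gene absent).
    """
    if isinstance(node, str):
        return {"leaf": True, "states": {presence[node]}, "children": []}
    children = [annotate(child, presence) for child in node]
    child_sets = [child["states"] for child in children]
    shared = set.intersection(*child_sets)
    states = shared if shared else set.union(*child_sets)
    return {"leaf": False, "states": states, "children": children}


def count_events(node, parent_state=None, events=None):
    """Top-down pass that counts gains and losses on tip and internal branches."""
    if events is None:
        events = Counter()
    if parent_state in node["states"]:
        state = parent_state
    else:
        # At the root (parent_state is None) an ambiguous state is resolved
        # toward presence; this is an arbitrary choice made for this sketch.
        state = max(node["states"])
        if parent_state is not None:
            kind = "gain" if state == 1 else "loss"
            where = "tip" if node["leaf"] else "internal"
            events[(kind, where)] += 1
    for child in node["children"]:
        count_events(child, state, events)
    return events
```

Summing the resulting counters over all orthology groups would give, for each class of branch, the number of inferred gains and losses. Incomplete draft genomes would inflate the losses inferred on tip branches under this scheme, which is one reason to treat ancestral and tip events separately.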
We identified two clusters of genes with paraphyletic distribution, suggesting lateral gene transfer. One of the clusters, present in X. campestris and the "X. axonopodis" clade, exhibits interesting functional relationships with the Type IV Secretion System (T4SS), and most of its genes are annotated as coding for putative secreted or membrane proteins. Identification of LGT events based only on intrinsic features such as G+C content and the CAI would miss both clusters, showcasing the usefulness of the phylogenetic distribution of orthologs as a complement for the prediction of putative LGT events.
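A minimal sketch of the kind of check implied here is to test whether the genomes carrying a gene correspond to a single clade of the species tree. The nested-tuple tree encoding and the function names below are hypothetical, and the tree is assumed to be rooted. A distribution that does not match a clade is only a candidate for LGT, since independent gene losses can produce the same pattern.

```python
def collect_clades(node, found):
    """Add the leaf set below every node of a nested-tuple tree to `found`."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(
            *(collect_clades(child, found) for child in node)
        )
    found.add(leaves)
    return leaves


def forms_clade(tree, taxa):
    """True if `taxa` is exactly the set of leaves below some node of `tree`."""
    found = set()
    collect_clades(tree, found)
    return frozenset(taxa) in found
```

For a toy tree `((("a", "b"), "c"), ("d", "e"))`, a gene found only in `a` and `b` matches a clade, whereas a gene found only in `a` and `d` does not and would be flagged for closer inspection.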