Traditionally, the assignment of taxonomically appropriate names to microbes was based on phenotypic characters, such as Gram staining or the possession of a cell wall. Currently, with over 350 whole genomes sequenced, there is ongoing debate about re-evaluating the prokaryotic species definition [2]. One such attempt, considering different genomic parameters, was made by Coenye and Vandamme [5], who compared different phylogenetic approaches on the lactic acid bacteria. They concluded that the different approaches agreed well, although they do not necessarily provide much additional information about phylogenetic relationships. In our study, we analysed the overall consistency of the phylogenetic signal of the genome signature in 334 prokaryotic genomes. We tested the congruence of δ* values within genera and within species, and compared the δ* values with their corresponding 16S rDNA sequence identity values.
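This section does not restate how δ* is computed. As a minimal sketch, assuming the commonly used definition of the genome signature (dinucleotide relative abundances ρ*_xy = f*_xy / (f*_x f*_y), with frequencies pooled over both strands, and δ* taken as the mean absolute difference over the 16 dinucleotides), a pairwise comparison could look as follows. The function names are illustrative and are not taken from the study.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols are left unchanged."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def signature(seq):
    """Return the 16 strand-symmetrised dinucleotide relative abundances."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    # Pool counts over the sequence and its reverse complement.
    for strand in (seq, reverse_complement(seq)):
        mono.update(b for b in strand if b in BASES)
        di.update(a + b for a, b in zip(strand, strand[1:])
                  if a in BASES and b in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or without A/C/G/T symbols")
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # Assumption: report 0.0 when a base is absent from the sequence.
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(seq_a, seq_b):
    """Mean absolute difference between two genome signatures."""
    rho_a, rho_b = signature(seq_a), signature(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16
```

Because counts are pooled over both strands, a sequence and its reverse complement yield a dissimilarity of zero under this sketch, so the value does not depend on which strand was deposited.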
For some species (the E. coli group, the Bacillus cereus cluster) it was already known that the members probably belong to the same species, and the low δ* values corroborate this [11]. Comparison of the intrageneric δ* (Fig. ) and intraspecific δ* (Fig. ) shows that various intrageneric values fall well within the range of intraspecific values, suggesting that there are more clusters that may actually constitute one species. This is the case for the different Bartonella spp. and Yersinia spp. It may also hold true for M. bovis and M. tuberculosis, and for L. innocua and L. monocytogenes. For these six groups, the 16S rDNA data support the notion of very close phylogenetic relationships.
In contrast, four extreme intraspecific δ* values are within the intrageneric range. The different B. aphidicola genomes display high genomic dissimilarity values as well as low 16S rDNA sequence identity scores, suggesting these might actually be different species. This is in agreement with an estimated divergence time of over 150 million years [26]. It remains unclear why the different genomes of R. palustris, P. marinus and P. fluorescens display high δ* values while the ribosomal sequences within each species are nearly identical, although very long branches have been observed between members of these species in different phylogenetic studies [22]. It is of note that substantial differences in size and GC percentage are observed between the different P. marinus genomes.
Generally, an inverse relation between δ* and 16S rDNA sequence identity is found, but the resolution of this relation seems low, and therefore δ* values alone seem insufficient to infer reliable phylogenetic relationships. Also, a reliable phylogenetic clustering cannot be inferred from the δ* distance matrices, as bootstrapping is not possible.
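One way to quantify such an inverse relation is a rank correlation between paired δ* values and 16S rDNA identities. The following is a minimal sketch using Spearman's coefficient. It is not necessarily the analysis performed in this study, and the function names are illustrative.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A coefficient close to -1 for paired lists of δ* values and 16S rDNA identities would indicate a consistent inverse relation. Values nearer to 0 would reflect the low resolution described above.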
For some prokaryotic species containing multiple chromosomes, it has been suggested that the secondary chromosome may have been acquired via horizontal gene transfer (HGT) [12]. We find that the genomic dissimilarity between the two primary chromosomes in bacteria is generally low, but it is higher than the genomic dissimilarity between chromosomes from the same species, supporting the HGT hypothesis. Intragenomic dispersal of DNA can ameliorate the dissimilarity in genome signature, obscuring compositional dissimilarities over time [12]. One consequence of this intragenomic dissimilarity is that different chromosomes found in metagenomic analyses cannot readily be grouped into genomes for prokaryotes, though this is a minor problem as most prokaryotes have single-chromosome genomes. We find that each intragenomic comparison of the two chromosomes of the different Vibrio genomes yields a higher δ* value than the average of δ* = 0.009 in intraspecific comparisons (data not shown). Secondary chromosomes are present in all sequenced genomes of Vibrio spp., and if they have been present in each genome since the split of the different Vibrio spp., the different chromosomes in each genome should have had ample time to ameliorate towards more similar dinucleotide frequencies. The fact that the different chromosomes of each Vibrio genome are still dissimilar from each other in composition may be caused by an unstable chromosome II, which is known to be less well conserved between the different Vibrio species than chromosome I [28].
The precise origin of the genome signature is still unknown. For the GC percentage, it has been suggested that certain environmental conditions shape the nucleotide composition [29]. This has also been found to be the case for the genome signature [30], although the exact effect of different conditions on different genome sequences remains unknown. It is likely that mutational pressures direct the shape of the genome signature, but the fact that secondary chromosomes in most cases remain dissimilar from the primary chromosomes underscores our lack of understanding of the factors that shape the nucleotide composition.
In conclusion, the genome signature is more similar between closely related species, and its dissimilarity increases with phylogenetic distance, but this relation by itself seems inadequate to infer phylogenetic relationships. Unfortunately, distance matrices based on single values, as is the case with δ* scores, are not amenable to bootstrapping, so a robust phylogeny cannot be inferred from δ* values for prokaryotes. This parameter does, however, carry a strong phylogenetic signal and can therefore be used to support or contradict a given phylogeny and the resulting taxonomy. The combination of δ* and 16S rDNA data given above for Mycobacteria, Listeria, Prochlorococcus and Buchnera provides convincing evidence for a re-evaluation of these taxonomic relationships. Also, if there are no additional ways to infer relationships (e.g. in the absence of comparable markers, as with multiple chromosomes in metagenomic analyses), the genome signature may help to cluster chromosomes, although the intragenomic δ* scores can be relatively high in multichromosomal prokaryotic genomes.