We have shown that ROL values for specific orthologues are correlated over broad evolutionary distances, and that these correlations remain strong even within specific functional classes of genes and for genes that are not essential for cellular viability. In other words, the constancy of gene conservation levels across bacterial orders seems to result from specific differences in gene function. The strength of the correlations we find here is similar in magnitude to that found in a previous study of correlations between protein evolutionary rates within the Chlamydiaceae [8]. Notably, the Chlamydiaceae are far more closely related than the clades considered here, so a high correlation in that study is not surprising. However, we have also considered selection on a more general scale (gene presence versus gene absence), which likely increases the strength of the correlations. Interestingly, for some orthologues, ROL values have changed considerably across taxonomic groups (we show three examples in Figs. 4 and 5). We propose that these genes have changed in functional importance, resulting in either increased or decreased purifying selection.
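As a minimal sketch of how per-orthologue ROL values from two clades could be compared, the code below computes a Spearman rank correlation (Pearson correlation of ranks, with tied values given their average rank). The choice of a rank correlation, the function names, and the ROL values are all hypothetical illustrations, not the statistic or data used in this study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ROL values for the same six orthologues in two clades.
rol_gamma_beta = [0.4, 1.1, 2.7, 0.9, 3.2, 1.8]
rol_alpha = [0.6, 1.0, 2.9, 1.3, 3.5, 1.5]
print(round(spearman(rol_gamma_beta, rol_alpha), 3))  # prints 0.943
```

A rank correlation is used here only because it is insensitive to the scale of the ROL estimates; a Pearson correlation on the raw values would be an equally reasonable sketch.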
Figure 4 Correlation between gene ROL values for nonessential genes in MultiFun class 2.2 (information transfer, RNA related). The ROL values for γ-β-proteobacteria and α-proteobacteria are plotted. Each point indicates the estimated ROL value …
Figure 5 Phylogenetic pattern of conservation for zur, sspA, and rbfA. Unlike most RNA-related genes (Fig. 4), these three genes have different patterns of conservation, and thus different ROL values, when compared between γ-β-proteobacteria and …
Some essential E. coli genes have orthologues that are consistently lost at high rates among other γ-β-proteobacteria, α-proteobacteria, and Bacilli-Mollicutes, contrary to the high level of conservation expected for essential genes. This is not because these genes are essential only in E. coli and nonessential in other taxa. In Table we list genes that are essential in E. coli K-12 and that have high ROL values (greater than 2.4 in all three bacterial groups studied, Fig. ), together with data from an empirical study of gene essentiality in the γ-proteobacterium Acinetobacter baylyi [ ]. Of the nine listed genes that have an orthologue in Acinetobacter, eight are also essential in Acinetobacter. This suggests, surprisingly, that some genes are lost frequently despite being essential, which is consistent with the view that compensation at other sites in the genome may occur even for "essential" functions.
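The selection behind the table can be sketched as a simple filter: keep genes essential in E. coli whose ROL exceeds 2.4 in every clade, then count how many have an Acinetobacter orthologue and how many of those are essential there. The Gene record, its field names, the clade labels, and the gene entries below are hypothetical stand-ins; only the 2.4 cutoff comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ROL_THRESHOLD = 2.4  # "high ROL" cutoff quoted in the text

# Hypothetical labels for the three bacterial groups.
CLADES = ["gamma-beta", "alpha", "Bacilli-Mollicutes"]


@dataclass
class Gene:
    name: str
    essential_in_ecoli: bool
    rol: Dict[str, float]  # clade label -> estimated ROL
    essential_in_acinetobacter: Optional[bool]  # None = no orthologue


def high_rol_essentials(genes: List[Gene], clades: List[str]) -> List[Gene]:
    """Essential E. coli genes whose ROL exceeds the cutoff in every clade."""
    return [g for g in genes
            if g.essential_in_ecoli
            and all(g.rol[c] > ROL_THRESHOLD for c in clades)]


def acinetobacter_agreement(selected: List[Gene]) -> Tuple[int, int]:
    """Return (genes with an Acinetobacter orthologue, of those also essential)."""
    with_ortho = [g for g in selected if g.essential_in_acinetobacter is not None]
    also_essential = sum(bool(g.essential_in_acinetobacter) for g in with_ortho)
    return len(with_ortho), also_essential


# Hypothetical records, for illustration only.
genes = [
    Gene("geneA", True, dict(zip(CLADES, [3.0, 2.8, 2.6])), True),
    Gene("geneB", True, dict(zip(CLADES, [3.1, 3.1, 3.1])), False),
    Gene("geneC", True, dict(zip(CLADES, [2.5, 2.9, 3.3])), None),
    Gene("geneD", True, dict(zip(CLADES, [1.2, 3.0, 3.0])), True),  # low in one clade
    Gene("geneE", False, dict(zip(CLADES, [3.0, 3.0, 3.0])), None),  # nonessential
]
selected = high_rol_essentials(genes, CLADES)
print([g.name for g in selected], acinetobacter_agreement(selected))
```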
Essential E. coli genes with high ROL values are also essential in Acinetobacter.
Many of the essential genes that are lost at high rates are recent innovations. Of the genes that are essential in E. coli K-12 but are lost at high rates from other γ-β-proteobacteria, 44% (18 out of 41) have a distribution restricted to the γ-β-proteobacteria and are thus likely to be relatively recent additions to the genomic repertoire. In contrast, of the essential genes with low ROL values (less than 2.4), only 0.9% (2 out of 222) are restricted to the γ-β-proteobacteria. Previous work has shown that recently acquired genes tend to be incorporated at the periphery of the cellular network [10]. Such peripheral genes may thus be more easily removed from the genome, because fewer interactions must be compensated for.
These results confirm and extend previous studies of the relationship between essentiality and gene conservation [11]. However, here we have used a phylogenetically corrected measure of gene conservation (ROL). Additionally, we have found that orthologue conservation predicts gene essentiality far better than previously realized [11], most likely because earlier essentiality datasets were less accurate. Finally, we have shown for the first time a correlation between gene conservation and quantitative measures of deletion phenotypes (growth yield, Fig. S2).
Our metric of gene conservation, which takes phylogenetic history into account, provides a considerable improvement over simpler measures such as the fraction of taxa that retain a specific orthologue (retention). Using retention to predict essentiality yields an area under the ROC curve (AUC) of 0.937, meaning that a randomly chosen essential gene is ranked as less likely to be essential than a randomly chosen nonessential gene 6.3% of the time. Using ROL, this misranked fraction falls to 5.3% (an AUC of 0.947), a 16% reduction in the error rate. ROL has the additional advantage of being based on an explicit evolutionary model, which may itself provide biological insights, for example into the relative rates of gene loss versus horizontal transfer (i.e. the ratio of gene loss to gene gain in lineages).
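The AUC figures above can be read as a pairwise ranking probability, which also makes the 16% figure easy to check. The sketch below computes AUC by brute force over all essential/nonessential pairs (ties count one half) and the relative drop in the misranked fraction, 1 − AUC. The function names and the toy ROL values are hypothetical; because a high ROL indicates frequent loss, ROL is negated so that a higher score means "more likely essential".

```python
def auc(scores_pos, scores_neg):
    """P(random positive scores above random negative); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def relative_error_reduction(auc_old, auc_new):
    """Fractional drop in the misranked-pair rate, 1 - AUC."""
    err_old, err_new = 1 - auc_old, 1 - auc_new
    return (err_old - err_new) / err_old


# Hypothetical ROL values; essential genes should be lost less often.
essential_rol = [0.2, 0.5, 1.0]
nonessential_rol = [0.8, 2.5, 3.0]
# Negate ROL so that a higher score means "predicted essential".
print(auc([-r for r in essential_rol], [-r for r in nonessential_rol]))

# Retention AUC 0.937 versus ROL AUC 0.947, as reported in the text.
print(round(relative_error_reduction(0.937, 0.947), 3))  # prints 0.159
```

The quadratic pairwise loop is chosen for transparency; for genome-scale gene sets a rank-sum formulation gives the same value more efficiently.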
We also note that high-throughput experimental assessments of gene essentiality are prone to both false positive and false negative results (i.e. annotating a nonessential gene as essential and vice versa). The level of agreement on essentiality between the two most recent studies of gene essentiality [5] is similar to the level of agreement between each of these studies and ROL (all between 94% and 95%), and far greater than the agreement between the first experimental study of gene essentiality [14] and the two later studies. This suggests that ROL may be a valid and useful means of cross-validating experimental studies, flagging genes that are likely false positives or false negatives for reexamination.
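The cross-validation idea can be sketched as computing percent agreement between two sets of essentiality calls over their shared genes, then listing the discordant genes as candidates for reexamination. Converting ROL into binary calls with a 2.4 cutoff is an assumption made only for this illustration (the text uses 2.4 to separate high from low ROL, not as a classifier), and all gene names and values are hypothetical.

```python
def agreement(calls_a, calls_b):
    """Fraction of shared genes on which two essentiality call sets agree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no genes in common")
    same = sum(calls_a[g] == calls_b[g] for g in shared)
    return same / len(shared)


def discordant(calls_a, calls_b):
    """Shared genes called differently; candidates for reexamination."""
    shared = calls_a.keys() & calls_b.keys()
    return sorted(g for g in shared if calls_a[g] != calls_b[g])


def rol_calls(rol_values, threshold=2.4):
    """Hypothetical rule: call a gene essential if its ROL is below threshold."""
    return {g: r < threshold for g, r in rol_values.items()}


# Hypothetical experimental calls (True = essential) and ROL values.
experiment = {"a": True, "b": False, "c": True, "d": False}
rol = {"a": 0.5, "b": 3.1, "c": 2.9, "d": 0.8}
predicted = rol_calls(rol)
print(agreement(experiment, predicted), discordant(experiment, predicted))
```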