Tree reconciliation is a standard approach for phylogeny-based orthology detection. This method identifies orthologous and paralogous relationships by comparing gene and species trees (28, 29) and thus requires accurate trees of both types. While the divergence patterns among a few representative species (e.g. human, mouse, rat and nematode) can be elucidated in detail, inferring the phylogenetic relationships among numerous or unfamiliar species can yield inaccurate species trees. Furthermore, uncertainties in phylogenetic parameters and lineage sorting can cause inconsistencies among gene trees obtained from a given sequence dataset (3, 14), leading to inaccurate orthology detection. To obtain reliable phylogenetic trees for detecting orthology, recent phylogeny-based methods have used bootstrap analyses (14, 15). The two representative programs, Orthostrapper and Rio, generate a set of bootstrapped trees that provides more confidence than a single gene tree. However, the results depend on arbitrarily chosen levels of bootstrap support (15). In theory, lowering the bootstrap support level increases the probability of detecting orthologs but may also produce many false positives, thereby increasing sensitivity but reducing specificity. Conversely, raising the bootstrap support level leads to lower sensitivity and higher specificity. These weaknesses of bootstrap analyses have been demonstrated in this study (Supplementary Table 2) and in previous studies (14, 15).
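The trade-off described above can be illustrated with a minimal sketch. It assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and a hypothetical per-sequence bootstrap support value. The function names, the support values and the thresholds are invented for illustration and are not taken from Orthostrapper or Rio.

```python
def sensitivity_specificity(predicted, true_orthologs, candidates):
    """Return (sensitivity, specificity) for one set of ortholog calls.

    `candidates` is every homolog that could have been called; it is
    assumed to contain both `predicted` and `true_orthologs`.
    """
    predicted = set(predicted)
    true_orthologs = set(true_orthologs)
    candidates = set(candidates)
    tp = len(predicted & true_orthologs)
    fn = len(true_orthologs - predicted)
    fp = len(predicted - true_orthologs)
    tn = len(candidates - predicted - true_orthologs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def call_orthologs(support, threshold):
    """Call a sequence an ortholog when its support reaches the threshold."""
    return {seq for seq, value in support.items() if value >= threshold}


# Hypothetical bootstrap support (fraction of bootstrapped trees) per homolog.
support = {"s1": 0.95, "s2": 0.80, "s3": 0.55, "s4": 0.60, "s5": 0.20, "s6": 0.10}
truth = {"s1", "s2", "s3"}  # hypothetical curated orthologs

for threshold in (0.5, 0.9):
    calls = call_orthologs(support, threshold)
    print(threshold, sensitivity_specificity(calls, truth, support))
```

With these invented values, the permissive threshold (0.5) recovers all three true orthologs at the cost of one false positive, while the strict threshold (0.9) removes the false positive but misses two true orthologs.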
To overcome the weaknesses of tree reconciliation and bootstrap analyses, we developed a novel algorithm for detecting orthologs based on the concept of minimum evolution and implemented it as Mestortho. We then compared the reliability of Mestortho with that of a phylogeny-based method (Orthostrapper) and the BLASTp-based RBH method using three kinds of datasets (KOG, EXProt-1 and EXProt-2). Since the KOG datasets were originally curated with the BLASTp-based reciprocal best hit method, the reliability of RBH may be overestimated for these datasets. It is therefore noteworthy that the reliability of RBH with the KOG datasets (mean sensitivity: 83.8%; mean specificity: 94.3%) was higher than with the EXProt-1 (67%; 71.5%) and EXProt-2 (56.9%; 67.5%) datasets. The true orthologs of EXProt-1 were defined using the KO system and the BLAST-based KAAS server, and those of EXProt-2 by tree reconciliation; both were derived from the EXProt datasets curated using an EC number identity criterion. Since Mestortho and Orthostrapper did not show this pattern, the KOG datasets appear to be artificially favorable references for the RBH method. We then examined the EXProt-1 datasets. Because the 14 EXProt datasets with more than one KOI were subjected to sequence similarity searches using KAAS, the EXProt-1 datasets are evidently shaped to some degree by sequence similarity. However, the lower reliability of RBH for the EXProt-1 datasets (sensitivity: 67%; specificity: 71.5%) than for the KOG datasets (83.8%; 94.3%) suggests that the EXProt-1 datasets depend less on sequence similarity than the KOG datasets do. Finally, we used the EXProt-2 datasets to evaluate reliability. Unexpectedly, the reliabilities of Orthostrapper with the EXProt-2 datasets were not higher than those obtained with the other datasets. This indicates that the EXProt-2 datasets, although curated by tree reconciliation, are not biased in favor of a phylogeny-based method (Orthostrapper).
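For readers unfamiliar with the reciprocal best hit criterion discussed above, the following is a minimal sketch of the idea. It assumes that similarity scores (for example, BLASTp bit scores) have already been collected into nested dictionaries. The function name, the data layout and the scores are hypothetical and do not reproduce the RBH pipeline evaluated in this study.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit.

    scores_ab[a][b] is the similarity score of query a (genome A) against
    subject b (genome B); scores_ba holds the reverse searches.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical scores for two genes in each of two genomes.
scores_ab = {"a1": {"b1": 200, "b2": 50}, "a2": {"b1": 180, "b2": 60}}
scores_ba = {"b1": {"a1": 195, "a2": 170}, "b2": {"a1": 40, "a2": 65}}

print(reciprocal_best_hits(scores_ab, scores_ba))
```

In this toy case only the pair (a1, b1) is reciprocal. The pair (a2, b2) is not reported because a2 scores higher against b1, which illustrates how a similarity-only criterion can miss a pair and why reference sets curated by the same criterion tend to favor it.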
Regardless of dataset attributes, the mean sensitivities of Mestortho (94.1, 97 and 76.3%) were consistently higher than those of RBH (83.8, 67 and 56.9%) for the KOG, EXProt-1 and EXProt-2 datasets, respectively. Moreover, the higher mean sensitivity of Mestortho (76.3%) than of RBH (56.9%) for the EXProt-2 datasets indicates that Mestortho is better than RBH at discriminating true positives from false negatives, even for datasets that were not defined by sequence similarity (EXProt-2). On the other hand, the RBH method had the highest mean specificity for the KOG datasets and even for the EXProt-2 datasets, although for the EXProt-1 datasets the mean specificity of Mestortho (77.3%) was slightly higher than that of RBH (71.5%). Because the RBH method is a sequence similarity-based approach, this result suggests that it may be more useful than the other methods for detecting true negatives in a given dataset. In contrast, the mean specificity of Mestortho for the EXProt-2 datasets (28.7%) was lower than any mean specificity obtained with Orthostrapper (Supplementary Table 2), suggesting that Mestortho's weakness lies in discriminating false positives from true negatives. In a previous study using the KOG database of eukaryotic genomes (9), RSD showed reliability similar to that of RBH. Furthermore, a recent study of orthology datasets curated by the BLASTp method also showed that RSD has no advantage over RBH in ortholog detection (30). Although the reliability of Mestortho was not directly compared with that of RSD in this study, we expect Mestortho to be more sensitive than RSD, at least, because RSD performs similarly to RBH.
We also argue that the reliabilities of Mestortho, RBH and Orthostrapper are not limited by the small number of EXProt datasets analyzed and are not biased by the occurrence of HGT. In the analysis of 100 random samples of size 21 drawn from the 95 KOG datasets, the sensitivities of Mestortho were clearly higher than those of RBH (Supplementary Figure 2), with the exception of five random samples, despite the small difference (~10%; Mestortho: 94.1%, RBH: 83.8%) between the mean sensitivities of the two methods for the KOG datasets. Because the differences in mean sensitivity are larger for the EXProt datasets (~30% for EXProt-1; ~20% for EXProt-2), the higher sensitivity of Mestortho over RBH should be displayed even more reliably in the 21 EXProt datasets than in the KOG samples. Although HGT events in any bacterial sequence set can generate incorrect orthologs, our estimate of HGT occurrence (0.992%) for the 21 EXProt datasets showed that such events were rare and had minimal impact on the reliabilities obtained in this study.
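The subsampling check described above can be sketched as follows. This is an illustration only: it assumes that per-dataset sensitivities are available as two parallel lists, that samples are drawn without replacement, and that both methods are scored on the same sampled datasets. The function name and the placeholder values are hypothetical and are not the data of this study.

```python
import random
from statistics import mean


def count_samples_where_a_beats_b(sens_a, sens_b, sample_size=21,
                                  n_samples=100, seed=0):
    """Count random samples in which method A has the higher mean sensitivity.

    sens_a[i] and sens_b[i] are the sensitivities of the two methods on
    dataset i; each sample draws `sample_size` datasets without replacement.
    """
    rng = random.Random(seed)
    indices = range(len(sens_a))
    wins = 0
    for _ in range(n_samples):
        chosen = rng.sample(indices, sample_size)
        if mean(sens_a[i] for i in chosen) > mean(sens_b[i] for i in chosen):
            wins += 1
    return wins


# Placeholder sensitivities for 95 datasets (synthetic, for illustration).
rng = random.Random(1)
method_a = [min(1.0, rng.gauss(0.94, 0.10)) for _ in range(95)]
method_b = [min(1.0, rng.gauss(0.84, 0.15)) for _ in range(95)]
print(count_samples_where_a_beats_b(method_a, method_b))
```

The count returned by this sketch plays the role of the "all but five of 100 samples" statement above: the closer it is to the number of samples, the less the comparison depends on which 21 datasets happen to be chosen.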
Sequence alignment can also affect our minimum evolution approach. A change in alignment parameters can alter the evolutionary distance between any two sequences in a given dataset, which can in turn alter the MES of any putative ortholog set. Mestortho results were indeed dependent on changes in alignment parameters (Supplementary Figure 3); only ~60% of the newly generated alignments gave results identical to those of the reference alignments. At the level of individual sequences, however, most sequences (91.3%) were detected as orthologs regardless of alignment details. Although our simulation did not establish the precise degree of robustness against changes in the input alignment, Mestortho appears to generate moderately consistent results, especially at the individual sequence level.
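The two levels of consistency reported above (whole result sets versus individual sequences) can be made concrete with a small sketch. The definitions used here are one plausible reading rather than the exact procedure of this study: the set-level rate is the fraction of perturbed alignments whose ortholog set equals the reference set, and the sequence-level rate is the fraction of reference orthologs recovered, pooled over all perturbed alignments. All names and example sets are hypothetical.

```python
def alignment_robustness(reference, runs):
    """Return (set_level_rate, sequence_level_rate).

    `reference` is the ortholog set obtained from the reference alignment;
    `runs` holds one ortholog set per perturbed alignment.
    """
    reference = set(reference)
    runs = [set(run) for run in runs]
    identical = sum(run == reference for run in runs) / len(runs)
    recovered = sum(len(run & reference) for run in runs)
    return identical, recovered / (len(reference) * len(runs))


# Hypothetical ortholog calls from four re-aligned versions of one dataset.
reference = {"a", "b", "c", "d"}
runs = [{"a", "b", "c", "d"}, {"a", "b", "c"},
        {"a", "b", "c", "d"}, {"a", "b", "d", "e"}]
print(alignment_robustness(reference, runs))
```

In this toy example only half of the runs reproduce the reference set exactly, yet 14 of the 16 reference calls are recovered, which mirrors how the sequence-level figure can be much higher than the set-level figure.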
Mestortho calculates the MES of each of the NJ trees describing the evolution of every putative ortholog subset within a set of homologous sequences. Because the MES depends only on the sum of branch lengths of a phylogenetic tree, our algorithm is independent of topological changes and instabilities of reconstructed trees, and the program has the potential to detect orthologs that do not appear orthologous in the topology of a phylogenetic tree. To verify this property, we performed orthology detection using sequences of globin-related genes. It is generally believed that an ancestral globin gene duplicated to generate globin and myoglobin about 800 million years ago, and that the families of α- and β-globins arose by duplication of the more recent globin gene ~450–500 million years ago (3). If speciation and duplication events had been interleaved over evolutionary time, the related species would each have different numbers of paralogous genes. However, a recent study on globin evolution showed that human, mouse and Port Jackson shark each have three globin-related genes: myoglobin, α-hemoglobin and β-hemoglobin. This implies that the two globin duplication events were followed by the speciation events of the three species (27). In the phylogenetic tree shown in panel a, the topology of the myoglobin group is completely congruent with the divergence of the three species. Therefore, the group can be easily detected as orthologous by tree reconciliation. However, the topologies of the α- and β-hemoglobin groups are incongruent with the species divergence (panel a). From the standpoint of tree reconciliation, the shark hemoglobin sequences should be excluded from the orthologous clusters of α- and β-hemoglobins. However, the previous study reported that the shark α- and β-hemoglobin sequences are orthologous within their respective gene families (27). Despite this conflict between the phylogenetic tree and prior evolutionary knowledge, Mestortho regarded the shark α- and β-hemoglobin sequences as orthologs of the α- and β-hemoglobins of human and mouse, respectively (panel b). There are two co-ortholog groups in the phylogenetic tree (panel a). According to tree reconciliation, each of the two groups contains only co-orthologous sequences. However, the previous evolutionary study of globin-related genes showed that the shark α- and β-hemoglobin duplication preceded the speciation (27), indicating that the co-orthology of the two shark sequences implied by the phylogenetic tree is incorrect. Unlike manual inspection of the phylogenetic tree (panel a), Mestortho detected three human α- and β-hemoglobin sequences as co-orthologs but rejected the shark-related co-orthology (panel b), indicating that our approach to orthology detection may be free from errors inherent to tree reconciliation.
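The idea of scoring putative ortholog subsets by the sum of NJ branch lengths can be sketched in a few lines. This is a simplified illustration, not the Mestortho implementation: it assumes that each candidate subset contains the reference plus exactly one sequence per other species, that a complete pairwise distance table is given, and that the subset with the smallest tree length is preferred. The function names, the sequence labels and the distances are hypothetical.

```python
from itertools import combinations, product


def nj_tree_length(names, dist):
    """Sum of branch lengths of the neighbor-joining tree for `names`.

    `dist` maps frozenset({x, y}) to the pairwise distance between x and y.
    """
    nodes = list(names)
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(nodes, 2)}

    def D(x, y):
        return d[frozenset((x, y))]

    total = 0.0
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair that minimizes the NJ criterion Q(i, j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        # The two branches created by joining i and j always sum to D(i, j).
        total += D(i, j)
        u = ("internal", new_id)
        new_id += 1
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    if len(nodes) == 2:
        total += D(nodes[0], nodes[1])
    return total


def min_length_subset(reference, candidates_by_species, dist):
    """Return (tree_length, subset) for the subset with the shortest NJ tree."""
    best = None
    for choice in product(*candidates_by_species.values()):
        subset = (reference,) + choice
        length = nj_tree_length(subset, dist)
        if best is None or length < best[0]:
            best = (length, subset)
    return best


def pairs(triples):
    return {frozenset((x, y)): value for x, y, value in triples}


# Hypothetical distances: "a" and "b" copies in human (H), mouse (M), shark (S).
dist = pairs([
    ("Ha", "Ma", 0.2), ("Ha", "Sa", 0.6), ("Ma", "Sa", 0.6), ("Mb", "Sb", 0.6),
    ("Ha", "Mb", 1.5), ("Ha", "Sb", 1.5), ("Ma", "Mb", 1.5),
    ("Ma", "Sb", 1.5), ("Sa", "Mb", 1.5), ("Sa", "Sb", 1.5),
])
length, subset = min_length_subset(
    "Ha", {"mouse": ["Ma", "Mb"], "shark": ["Sa", "Sb"]}, dist)
print(subset, length)
```

Because only the total branch length enters the comparison, the choice between subsets does not depend on how the tree of all homologs happens to be resolved, which is the property the globin example above is meant to demonstrate. Enumerating every combination grows quickly with the number of species and paralogs, so this exhaustive loop is suitable only for small illustrative inputs.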
Several methods are available for detecting orthologs among homologous sequences. Unfortunately, all of them, including Mestortho, produce false positives and false negatives at rates that depend on the algorithm used (9). We assume that the orthologs of a reference sequence are under functional constraints different from those of the other orthologous groups in a given homologous sequence dataset. If the orthologous group containing the reference sequence is under evolutionary constraints stronger than or similar to those of the other orthologous groups in the dataset, Mestortho will probably detect the members of that group more accurately, because the evolutionary cost of the ancestral divergence between the two groups (α5+β5 in panel b) is sufficient to yield a difference in their MESs. In our simulation, the MES confidence intervals of 33 orthologous groups showed that True orthologous groups including reference sequences evolved more slowly than False groups, indicating that our assumption for orthology detection is reliable. However, some exceptional cases lead to incorrect orthology detection. First, if the orthologous group of a reference sequence includes a pseudogene-like sequence (αB; panel a) that evolved faster than the other sequences in the dataset, Mestortho would detect the paralog of the pseudogene-like sequence (βB; panel a) as an ortholog. Similarly, if a gene is missing from a species (αB; panel a), the false ortholog (βB; panel a) would be detected as an ortholog. Second, if an ortholog of the reference (αB; panel b) clusters more closely with the paralogs of the reference than with the other true orthologs (αA; panel b), our approach would detect a false positive as an orthologous sequence (panel b). For example, in the EXProt dataset of superoxide dismutase (EC 1.15.1.1), the paralogous gene of P. aeruginosa, instead of the true ortholog of that species, was identified as orthologous to the reference sequence because it clustered more closely with the E. coli reference sequence (panel c).