The distinction between orthologs and paralogs has been a central concept of phylogenomics. And yet, it is only recently that the functional relevance of this distinction has been treated as a hypothesis to be tested. To date, several indirect, sequence-based studies have failed to support this classical model, rather supporting an alternative model of uniform functional divergence, independent of duplication [reviewed in 5]. Recently, Nehrt et al. have compared the functional annotations of orthologs and paralogs between human and mouse. Surprisingly, they report the strongest functional similarity for paralogs, which is expected neither under the classical model nor under the uniform model.
Directly comparing functional annotations is complicated, because they are derived from a variety of sources and by a variety of procedures. The best-known bias is that computationally derived annotations (IEA code) are generally believed to be less reliable than experimentally derived annotations. The computational annotations reflect the algorithms used to propagate annotations, and thus are shared preferentially among proteins with high sequence similarity, among orthologs, or among proteins sharing well-defined domains. Any analysis including these GO annotations will recover the impact of these algorithms, which is indeed what we find when we use all GO annotations. Much of the older literature on function divergence used the EC nomenclature as a measure of function, and thus mixed electronic and experimental annotations without distinguishing them. Thus it is probable that most results based on the EC nomenclature are biased by electronic annotations (i.e., Fig. S15).
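The separation of experimental from electronic annotations described above can be illustrated with a minimal sketch. It assumes annotations are available as `(gene, go_term, evidence_code)` tuples and that "experimental" means the GO evidence codes EXP, IDA, IPI, IMP, IGI and IEP; both the record layout and the exact code set are assumptions made for illustration, not the procedure used in this study.

```python
# Assumed set of experimental GO evidence codes; adjust to the definition in use.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})

def experimental_only(annotations):
    """Keep only (gene, go_term, evidence_code) records with experimental evidence."""
    return [rec for rec in annotations if rec[2] in EXPERIMENTAL_CODES]

# Toy records (hypothetical gene names); the IEA record is electronically inferred.
annotations = [
    ("geneA", "GO:0003677", "IDA"),
    ("geneA", "GO:0005634", "IEA"),
    ("geneB", "GO:0003677", "IMP"),
]
filtered = experimental_only(annotations)
```

With these toy records, the IEA-coded annotation is dropped and the two experimentally supported ones are kept.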
Even limiting ourselves to experimentally derived annotations, there remains a great deal of complexity and bias in the data of functional annotation.
First, different model organisms are studied by different scientific communities, for different purposes, which biases the types of experiments conducted and reported. Moreover, each organism is predominantly annotated by one Model Organism Database team, which differs from others in its data curation and annotation practices. Indeed, we observe significant differences in background functional similarity, depending on the species compared. While part of this variation might be due to biological differences among the species, these differences appear to be mostly due to the artifacts outlined above. Here, we have compared 13 organisms spanning the tree of life (Fig. S17; Table S4), and we have corrected each comparison by the background frequencies of annotations from the relevant genomes. Moreover, we show that results limited to two yeasts are consistent with results averaged over all organisms.
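One way to picture a background correction of this kind is to subtract the mean similarity of arbitrary cross-species gene pairs from the mean similarity of homolog pairs. The sketch below is only an illustration under stated assumptions: it uses Jaccard similarity of GO term sets as a stand-in for the similarity measure, computes the background over all cross-species pairs rather than a random sample, and uses toy gene names and GO identifiers. It is not the correction implemented in this study.

```python
from itertools import product
from statistics import mean

def jaccard(terms_a, terms_b):
    """Jaccard similarity of two GO term sets (stand-in similarity measure)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def background_similarity(genome_x, genome_y):
    """Mean similarity over all cross-species gene pairs."""
    return mean(jaccard(tx, ty)
                for tx, ty in product(genome_x.values(), genome_y.values()))

def excess_similarity(pairs, genome_x, genome_y):
    """Mean similarity of the given homolog pairs minus the background."""
    observed = mean(jaccard(genome_x[gx], genome_y[gy]) for gx, gy in pairs)
    return observed - background_similarity(genome_x, genome_y)

# Toy genomes: gene name -> set of GO terms (hypothetical identifiers).
genome_x = {"x1": {"GO:0000001", "GO:0000002"}, "x2": {"GO:0000003"}}
genome_y = {"y1": {"GO:0000001", "GO:0000002"}, "y2": {"GO:0000004"}}
excess = excess_similarity([("x1", "y1")], genome_x, genome_y)
```

In this toy case the homolog pair is identical in annotation (similarity 1.0), one of the four cross-species pairs matches (background 0.25), and the excess similarity is 0.75.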
Second, each experiment is performed and reported by a given team of investigators, who have a scientific focus and a manner of reporting which are specific to them. This induces a strong bias towards similar annotations derived from the same paper, which mostly affects same-species paralogs. Importantly, there is a bias towards similar annotations even when considering different papers which share at least one co-author. Unless accounted for, this confounding factor leads to a large spurious excess of similarity between same-species paralogs [similar to the results of 7]. Controlling for it leads to the opposite conclusion: a weak excess of similarity between orthologs (Fig. S16). This observation is also corroborated by a recent rebuttal from the GO consortium, which reexamined two case studies from Nehrt et al.'s paper and concluded that the difference in function similarity computed between orthologs and paralogs was mainly due to bias in annotations, not in the underlying functions.
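The authorship confound can be made concrete with a small sketch that keeps only annotation pairs supported by independent papers, meaning different papers with no author in common. The data layout (`(go_term, paper_id)` tuples and a paper-to-authors mapping), the function names, and the toy identifiers are all hypothetical; this illustrates the idea of the control, not the exact procedure used here.

```python
def independent(paper_a, paper_b, authors):
    """True if the two papers are distinct and share no author."""
    if paper_a == paper_b:
        return False
    return authors[paper_a].isdisjoint(authors[paper_b])

def independent_annotation_pairs(ann_a, ann_b, authors):
    """Return (term_a, term_b) pairs whose supporting papers are independent.

    ann_a, ann_b: lists of (go_term, paper_id) for the two genes compared.
    authors: dict mapping paper_id -> set of author names.
    """
    return [(term_a, term_b)
            for term_a, paper_a in ann_a
            for term_b, paper_b in ann_b
            if independent(paper_a, paper_b, authors)]

# Toy data: P1 and P2 share an author; P3 is unrelated to both.
authors = {"P1": {"Smith", "Lee"}, "P2": {"Lee", "Kim"}, "P3": {"Garcia"}}
ann_a = [("GO:0000001", "P1")]
ann_b = [("GO:0000001", "P1"), ("GO:0000002", "P2"), ("GO:0000003", "P3")]
kept = independent_annotation_pairs(ann_a, ann_b, authors)
```

Here only the comparison against the annotation from P3 survives: the pair from the same paper (P1) and the pair from a paper sharing a co-author (P2) are both excluded before any similarity is computed.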
While GO annotations are complex and biased, it nevertheless appears possible to identify and correct these biases, and to detect biologically significant signal. We feel that the use of 13 different species, with diverse annotation levels and evolutionary distances, contributes to the robustness of our results.
Once the biases identified above are accounted for, the signal which emerges can be summarized in three major points: (i) Consistent with the “ortholog conjecture”, or “standard model of phylogenomics”, overall functional similarity is highest between one-to-one orthologs, lowest between paralogs, and intermediate between other orthologs. (ii) There is at best a very weak relation between protein sequence similarity and functional similarity. (iii) The difference between orthologs and paralogs, although consistent with the ortholog conjecture, is weaker than expected under a naive understanding of that model; this is especially true when Molecular Function and Biological Process are considered separately.
The standard model of higher functional similarity among orthologs than paralogs at similar levels of sequence divergence could not be supported until it was explicitly tested. Several recent studies have performed such tests, and found some measure of support for the standard model. On a structural level, there appears to be higher conservation of intron position, of protein structure, and of domain architecture between orthologs. Presumably more relevant to biological function, the conservation of expression patterns appears higher between orthologs than between paralogs, in mammals. On the other hand, Nehrt et al. have found that the expression correlation of human/mouse inparalogs is significantly higher than that of orthologs (but not outparalogs). And a study of the evolution of sub-cellular localization in yeasts did not find any difference between orthologs and paralogs. These contradictory results might be due in part to the overall modest difference between orthologs and paralogs, and in part to differences between different aspects of function.
An intriguing pattern in our results is that we find strong conservation of Cellular Component annotations among orthologs. Contrary to the two other ontologies, sub-cellular localization is an aspect of function which leaves little room for divergent interpretation. Moreover, experimental results are easier to report in similar terms in different species. These factors might allow better detection of the excess conservation of orthologs. Thus, of the 3 ontologies, our results on cellular components are arguably the most conclusive.
As for the two other aspects of protein function captured by the Gene Ontology—Molecular Function and Biological Process—they have more subtle patterns. Molecular Function shows an excess of conservation between orthologs which is weaker than for Cellular Component, but which is strongly significant over all 13 genomes analyzed. This is the aspect of function for which there was previously the most evidence for the “uniform model” of no significant difference between orthologs and paralogs; with the available data, this can now be rejected. This is also the aspect of function for which the absolute value of excess similarity (i.e., excess similarity of homologs over random pairs) is strongest—for both orthologs and paralogs. Thus, Molecular Function appears to be strongly conserved between even distant homologs, which supports the received wisdom of predicting this type of annotation on the basis of conserved protein domains.
Biological Process also has a significant excess of function conservation among orthologs, although weaker than for the Cellular Component. This is surprising, given the wide differences in biology between the species compared. Indeed, throughout the entire range of sequence divergence, orthologs are considerably more similar in function than even same-species paralogs. Of note, the biases which amplify apparent similarity between paralogs are strongest for this aspect of function: not correcting for the sampling bias of orthologs or paralogs detected between species can lead to a spurious excess of conservation of same-species paralogs. Our results contradict the concept of the evolution of cellular context set forth by Nehrt et al. to explain the apparent higher similarity of function of in-paralogs between human and mouse.
This concept was also related to the weak relation between protein sequence divergence and functional divergence. Nehrt et al. speculated that protein function might evolve more as a function of the divergence of cellular context than as a function of protein sequence. They suggested that a comparison of orthologs of different ages might recover an effect of divergence age on functional divergence. Our analysis includes species divergences spanning the range from 36 Mya to 3300 Mya, yet we still do not find a strong relation between functional divergence and protein divergence, nor with species divergence time. These observations suggest that protein function evolves in a very non-clock-like manner. Indeed, clock-like evolution is an expected pattern for neutrally evolving characters, whereas selection is expected to be the major force shaping the evolution of protein function.
The low impact of evolutionary time on average protein function conservation is also apparent if we compare humans to model organisms with very different divergence times. Indeed, the extent of functional similarity of one-to-one orthologs is similar between human and E. coli, human and yeast, human and fly, or human and mouse. This supports the strong relevance of these various species for understanding human biology. In fact, the average similarity over all available one-to-one orthologs is even higher for the more distant E. coli and yeast than for fly or mouse. This is probably due to the fact that only proteins with very strong function conservation are kept as detectable one-to-one orthologs over such long evolutionary spans. We verified this by comparing only proteins which are detected as one-to-one orthologs in triplets of these species. For human-mouse-fly, we do recover a stronger similarity for more closely related species. But for the triplets with yeast or E. coli, this is not the case. In terms of evolutionary biology, this shows that, to some extent, protein function does diverge with time. Yet there is a class of proteins, conserved beyond animals, which on average conserve their function irrespective of divergence time. In terms of annotation procedures for databases, and even design of new experiments, these results show that if a protein is conserved between two species as a one-to-one ortholog, then its function is probably mostly conserved, even if the divergence time is very large.
In conclusion, our analyses corroborate the central tenet of the standard model of phylogenomics: that at similar levels of sequence divergence, orthologs are in general more similar in function than paralogs. But although significant, the difference is modest, and is uneven among different aspects of function (among different ontologies). Furthermore, our results expose other trends unexplained by the standard model, such as differences among subtypes of orthology and paralogy (also observed in other contexts, such as intron conservation), or the lack of interaction between sequence and function divergence. Hence, the standard model has validity, but is of only limited practical use. To further progress in our understanding of the relation between gene evolution and gene function, we need to move beyond the orthology/paralogy dichotomy.