Walter M. Fitch, a towering giant of molecular evolution, passed away on 10 March 2011. His pivotal contributions to the field are manifold, diverse and profound. He developed some of the first algorithms and practical methods for phylogenetic tree construction; pioneered statistical approaches in sequence comparison and phylogenetic analysis; introduced the covarion approach in the study of gene evolution, a simple but powerful concept the full potential of which is not yet realized; and, in the final years of his long and enormously productive career, laid the foundation of viral phylogenetics, which has become essential for the most practical purpose of predicting the dominant influenza strain of the next epidemic. Fitch's many fundamental achievements have been succinctly but clearly outlined in a recent Science Retrospective.
For the special issue of Briefings in Bioinformatics on orthology and orthologs, it seems most appropriate to focus on a single paper of Fitch in which he introduced the concepts and definitions of orthology and paralogy. Appearing in 1970 in a well-respected but relatively low-key journal, Systematic Zoology, under the unassuming (at least, at the time) title ‘Distinguishing homologous from analogous proteins’, this paper attracted little attention immediately upon its publication and was quite poorly cited over the next 25 years. This changed when comparison of gene sets from completely sequenced genomes became the core of comparative and evolutionary genomics (Figure 1). All of a sudden, it became apparent that the straightforward classification of homologs introduced by Fitch in that article was absolutely essential for genomics to make any sense of the genomes pouring in, so the citations have been accumulating at a good pace ever since (Figure 1). I would go as far as to submit that Fitch's paper is the single most important conceptual cornerstone of modern genomics.
Indeed, distinguishing orthologs from paralogs is critical for at least three key tasks:
- reconstruction of genome evolution, including gene losses, horizontal gene transfer and lineage-specific duplication;
- study of major aspects of the evolutionary process, such as the distribution of selection pressure across genes; and
- transfer of functional information from functionally characterized genes to uncharacterized homologs from other organisms, which is the basis of genome annotation.
Between these and related tasks, analysis and classification of homologous relationships among genes is at the center of almost every comparative genomic study. Not only is discrimination of orthologs and paralogs vitally important, it is also decidedly nontrivial at both the fundamental and the technical level, as evidenced by the multitude of methods and resources developed for this purpose, many of which are described in the articles of this special issue of Briefings in Bioinformatics. Moreover, even the conceptual foundations of the orthology paradigm remain open to debate: a recent bioinformatics study has challenged the ‘ortholog conjecture’, namely the common (and crucial to genome annotation) assumption that orthologs are generally more functionally similar than paralogs. So orthology and paralogy are not fixed textbook definitions but evolving concepts, and this is the typical fate of Walter Fitch's ideas and research: he usually addressed only deep problems that remain active research subjects for many decades, if not indefinitely.
Interestingly, Fitch's 1970 article was not dedicated to the classification of homologous relationships. As follows from the title, the bulk of the paper is about distinguishing homology from analogy or, more precisely, estimating the likelihood that a given level of sequence similarity between proteins emerges through convergence (another subject that remains pertinent to this day). Only in the final paragraph of the discussion does Fitch note: ‘It is not sufficient, for example, when reconstructing a phylogeny from amino acid sequences that the proteins be homologous. … there should be two subclasses of homology. Where the homology is the result of gene duplication so that both copies have descended side by side during the history of an organism (for example, α and β hemoglobin) the genes should be called paralogous (para = in parallel). Where the homology is the result of speciation so that the history of the gene reflects the history of the species (for example α hemoglobin in man and mouse) the genes should be called orthologous (ortho = exact). Phylogenies require orthologous, not paralogous, genes’. I hope readers appreciate the crystal clarity of this description, given in the days when the idea of comparing complete genomes was still alien to biologists. For the 30th anniversary of his landmark paper, Fitch revisited the subject and published an equally lucid and succinct discussion of various aspects of homology that arguably remains the best reading on this subject.
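Fitch's two subclasses can be read as a simple rule on a gene tree: two homologous genes are orthologs if their last common ancestor in the tree is a speciation event, and paralogs if it is a duplication event. The following is a minimal Python sketch of that rule, not a method taken from Fitch's paper. It assumes a gene tree whose internal nodes have already been labeled as speciation or duplication events, which is the hard part in practice. The `Node` class, the `classify_pairs` function and the toy hemoglobin tree are hypothetical names and data introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A gene-tree node; internal nodes carry an evolutionary event label."""
    name: str
    event: str = "leaf"  # assumed labels: "speciation", "duplication", "leaf"
    children: list = field(default_factory=list)


def leaves(node):
    """Return the names of all leaf genes below a node."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def classify_pairs(node, result=None):
    """Label every pair of leaf genes by the event at their last common ancestor."""
    if result is None:
        result = {}
    # Genes that sit under different children of this node have this node
    # as their last common ancestor.
    child_leaves = [leaves(child) for child in node.children]
    label = "orthologs" if node.event == "speciation" else "paralogs"
    for left, right in combinations(child_leaves, 2):
        for a in left:
            for b in right:
                result[frozenset((a, b))] = label
    for child in node.children:
        classify_pairs(child, result)
    return result


# Toy tree: a duplication gives rise to the alpha and beta hemoglobin
# lineages, and each lineage later splits at the human-mouse speciation.
tree = Node("globin_ancestor", "duplication", [
    Node("alpha", "speciation", [Node("human_alpha"), Node("mouse_alpha")]),
    Node("beta", "speciation", [Node("human_beta"), Node("mouse_beta")]),
])

relations = classify_pairs(tree)
for pair, label in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", label)
```

In this toy tree, the human and mouse alpha hemoglobin genes come out as orthologs, whereas any alpha and beta pair, within or across species, comes out as paralogs, in line with Fitch's own hemoglobin examples.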
Fitch's 1970 article has duly become a citation classic, but not a record-breaking one: according to ISI Web of Science, by the end of 2010 it had been cited 554 times (Figure 1), a highly respectable but by no account astonishing number. This could seem surprising given the obvious importance of Fitch's concept, but another little scientometric exercise provides a clue. According to PubMed, the term ‘ortholog’ alone (not counting ‘orthologous’ or ‘orthology’) has been used in the title or abstract of 4947 publications (Figure 2). Even if every one of the 554 citing papers were among them, that would account for only about 11%, so almost 90% of the articles in PubMed that use the term do not cite Fitch. The classification of homologs he proposed 40 years ago has become part of the research fabric in evolutionary biology and genomics, so much so that it is used mostly without attribution. It is hard to think of a higher recognition and a better tribute to a great scientist.