The primary product of the OMA pipeline is a list of high-confidence pairs of orthologous genes. This list is available for download, but for most users it is too cumbersome to process and too general for typical applications. Thus, we also combine these pairwise relations into four types of groups. The existence of these different representations (and, more generally, of multiple and at times conflicting definitions of orthologous groups) can be confusing for many users. In this section, we review the four types of groups provided in OMA, and discuss their advantages, limitations and applications (Figure 3).
Figure 3. The four types of ortholog grouping provided in OMA. (a) protein-centric view reports orthologs to a gene of reference; (b) genome pair view lists all orthologs between two species; (c) OMA group view displays sets of genes in which all pairs of genes ...
The most straightforward type of group is the ‘protein-centric view’. In this representation, the OMA Browser provides the user with a list of genes orthologous to a specific gene. This view is appropriate for analyses centered on a single gene or only a few genes of interest, e.g. to predict their function. More typical uses are listed in the table of typical applications of orthologs.
Typical applications of orthologs and their most suitable representation of orthology
For analyses involving mainly pairs of genomes, the ‘genome pair view’ accessible in the ‘Download’ section of the OMA Browser is the most appropriate: it provides a list of all pairs of orthologs between any two genomes specified by the user.
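To make the first two views concrete, the following minimal sketch derives both from a flat list of pairwise orthologs. The gene identifiers, the `<SPECIES>_<gene>` naming scheme and the function names are hypothetical and chosen only for illustration; they do not reflect the OMA download format or the OMA Browser implementation.

```python
# Hypothetical toy list of orthologous pairs; IDs follow an invented
# "<SPECIES>_<gene>" scheme that is NOT the OMA identifier format.
PAIRS = [
    ("HUMAN_g1", "MOUSE_g1"),
    ("HUMAN_g1", "YEAST_g1"),
    ("MOUSE_g1", "YEAST_g1"),
    ("HUMAN_g2", "MOUSE_g2"),
    ("HUMAN_g2", "MOUSE_g3"),
]


def species_of(gene):
    """Return the species prefix of a toy gene ID."""
    return gene.split("_", 1)[0]


def protein_centric_view(pairs, gene):
    """List all genes orthologous to one gene of reference."""
    orthologs = set()
    for a, b in pairs:
        if a == gene:
            orthologs.add(b)
        elif b == gene:
            orthologs.add(a)
    return sorted(orthologs)


def genome_pair_view(pairs, sp1, sp2):
    """List all orthologous pairs between two species, ordered as (sp1, sp2)."""
    result = []
    for a, b in pairs:
        if species_of(a) == sp1 and species_of(b) == sp2:
            result.append((a, b))
        elif species_of(a) == sp2 and species_of(b) == sp1:
            result.append((b, a))
    return sorted(result)


if __name__ == "__main__":
    print(protein_centric_view(PAIRS, "HUMAN_g1"))
    print(genome_pair_view(PAIRS, "HUMAN", "MOUSE"))
```

Both views are simple filters over the same pairwise relations: the first fixes one gene, the second fixes two genomes.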
The third type of group, ‘OMA groups’, consists of groups of genes in which all pairs are orthologs. In a graph representation with genes as nodes and orthology relations as edges, OMA groups correspond to fully connected subgraphs. As a consequence of this definition, each OMA group includes at most one sequence per species (two genes from the same species cannot be orthologous to each other), and, save for inference errors, the gene trees obtained from them should be congruent with the tree of the corresponding species. Indeed, the primary application of OMA groups is to provide input data for phylogenetic inference. In other applications, OMA groups are often less appropriate. For example, evolutionary histories involving a duplication will, by definition, require at least two groups. Similarly, spuriously missed (i.e. false negative) orthology predictions will also result in group fragmentation. Finally, because each protein belongs to at most one group, this representation only captures a subset of all inferred pairs of orthologs.
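The fully connected subgraph criterion can be illustrated with a short check. This is a sketch of the definition only, using invented gene identifiers; it is not the procedure OMA uses to build its groups.

```python
from itertools import combinations


def is_fully_connected(genes, ortholog_pairs):
    """Return True if every pair of distinct genes is linked by an orthology edge."""
    # Store edges as unordered pairs so (a, b) and (b, a) are equivalent.
    edges = {frozenset(pair) for pair in ortholog_pairs}
    unique_genes = sorted(set(genes))
    return all(
        frozenset((a, b)) in edges for a, b in combinations(unique_genes, 2)
    )


# Hypothetical toy data (IDs invented for illustration).
# g1 has one copy per species; g2 was duplicated in the mouse lineage,
# so MOUSE_g2 and MOUSE_g3 are paralogs and share no orthology edge.
PAIRS = [
    ("HUMAN_g1", "MOUSE_g1"),
    ("HUMAN_g1", "YEAST_g1"),
    ("MOUSE_g1", "YEAST_g1"),
    ("HUMAN_g2", "MOUSE_g2"),
    ("HUMAN_g2", "MOUSE_g3"),
]

if __name__ == "__main__":
    print(is_fully_connected(["HUMAN_g1", "MOUSE_g1", "YEAST_g1"], PAIRS))
    print(is_fully_connected(["HUMAN_g2", "MOUSE_g2", "MOUSE_g3"], PAIRS))
```

In the toy data, the three `g1` genes satisfy the criterion, whereas the duplicated family does not: `HUMAN_g2` can be grouped with either mouse copy but not with both, which is why a duplication requires at least two groups and why some inferred pairs are left outside any group.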
The last type of group, ‘hierarchical groups’, consists of groups of genes that have descended from a common ancestral gene within a specific taxonomic range. Thus, by definition, hierarchical groups include both orthologs and in-paralogs with respect to the last common ancestor of the taxonomic range. In terms of the underlying gene trees, hierarchical groups correspond to the leaves of subtrees rooted in the speciation events that define the taxonomic clade in question. By exploring groups across several levels, it is (at least in principle) possible to pinpoint the timing of particular duplication events. Hence, hierarchical groups can convey phylogenetic signal in ways that pairwise orthology/paralogy relations cannot.
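The gene-tree reading of hierarchical groups can be sketched as follows. The sketch assumes a fully labeled gene tree is already available, with every internal node marked as a speciation or a duplication; the `Node` class, the clade labels and the gene identifiers are hypothetical, and the code illustrates the definition rather than the way OMA infers these groups.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str    # "leaf", "speciation" or "duplication"
    label: str = ""    # gene ID for leaves, clade name for speciation nodes
    children: list = field(default_factory=list)


def leaves(node):
    """Collect the gene IDs at the tips of the subtree rooted at node."""
    if node.kind == "leaf":
        return [node.label]
    return [gene for child in node.children for gene in leaves(child)]


def hierarchical_groups(node, clade):
    """Return one group per subtree rooted at a speciation node labeled clade."""
    if node.kind == "speciation" and node.label == clade:
        return [sorted(leaves(node))]
    groups = []
    for child in node.children:
        groups.extend(hierarchical_groups(child, clade))
    return groups


def leaf(gene):
    return Node("leaf", gene)


# Hypothetical gene tree: a duplication occurred after the split from yeast
# but before the human/mouse speciation.
TREE = Node("speciation", "Eukaryota", [
    leaf("YEAST_g1"),
    Node("duplication", "", [
        Node("speciation", "Mammalia", [leaf("HUMAN_g1a"), leaf("MOUSE_g1a")]),
        Node("speciation", "Mammalia", [leaf("HUMAN_g1b"), leaf("MOUSE_g1b")]),
    ]),
])

if __name__ == "__main__":
    print(hierarchical_groups(TREE, "Eukaryota"))
    print(hierarchical_groups(TREE, "Mammalia"))
```

In this toy tree, the family forms a single group at the Eukaryota level (containing both orthologs and in-paralogs) but two groups at the Mammalia level, which places the duplication between the two speciation events. The sketch ignores complications such as gene loss, where a clade may be represented by a single gene without a corresponding speciation node.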