Many soluble and membrane-bound proteins form homooligomeric complexes in a cell, although their oligomerization states are often difficult to characterize.1–8
For example, more than three-fourths of all entries in the Protein Quaternary Structure database are homooligomers,9
while 70% of the enzymes in the BRENDA Enzyme Database† are multimeric, most of them homooligomers. The functional importance of protein oligomerization is difficult to overestimate: it can regulate the activity of many proteins, including enzymes, ion channels, receptors, and transcription factors. Indeed, it has been suggested that large assemblies consisting of many identical subunits have advantageous regulatory properties because they can undergo sensitive phase transitions.10
Oligomerization can also provide sites for allosteric regulation, generate new binding sites at dimer interfaces to increase specificity, and increase diversity in the formation of regulatory complexes.11–16
In addition, oligomerization allows proteins to form large structures without increasing genome size and provides stability, while the reduced surface area of the monomer in a complex can offer protection against denaturation.10,17,18
Recently, analysis of high-throughput protein–protein interaction networks found that there are significantly more self-interacting proteins than expected by chance,19
and that the efficiency of co-aggregation between different protein domains decreases with decreasing sequence identity.20
Several explanations were proposed to account for these observations of self-attraction, including stability and foldability arguments.21,22
It was found, for example, that the predicted energy distributions of homodimers are shifted toward lower energies compared with those of heterodimers.23
The physical effect of a statistically enhanced self-attraction was further modeled to show that interactions between identical random surfaces are stronger than attractive interactions between different random surfaces of the same size.24,25
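The statistical argument can be illustrated with a deliberately simplified toy calculation. The sketch below is not the model of refs. 24,25; it assumes a hypothetical random symmetric contact potential `u`, surfaces made of independently drawn residue types, and a two-fold symmetric docking in which site i of one surface contacts site n-1-i of the other. All function names and parameter values are illustrative assumptions. Under these assumptions, a surface docked against its own copy counts every contact pair twice, so the energy distribution of identical pairs has the same mean but roughly twice the variance of that of different pairs, and therefore a heavier low-energy (strongly attractive) tail.

```python
import random
import statistics


def make_potential(n_types, rng):
    """Hypothetical symmetric contact potential u[x][y] with Gaussian entries."""
    u = [[0.0] * n_types for _ in range(n_types)]
    for x in range(n_types):
        for y in range(x, n_types):
            u[x][y] = u[y][x] = rng.gauss(0.0, 1.0)
    return u


def random_surface(n_sites, n_types, rng):
    """A random surface: one independently drawn residue type per site."""
    return [rng.randrange(n_types) for _ in range(n_sites)]


def interface_energy(a, b, u):
    """Two-fold symmetric docking: site i of `a` contacts site n-1-i of `b`."""
    n = len(a)
    return sum(u[a[i]][b[n - 1 - i]] for i in range(n))


def simulate(n_samples=20000, n_sites=20, n_types=5, seed=0):
    """Sample energies of identical (homo) and different (hetero) surface pairs.

    n_sites is kept even so that no site is paired with itself.
    """
    rng = random.Random(seed)
    u = make_potential(n_types, rng)
    homo, hetero = [], []
    for _ in range(n_samples):
        a = random_surface(n_sites, n_types, rng)
        b = random_surface(n_sites, n_types, rng)
        homo.append(interface_energy(a, a, u))    # surface against its own copy
        hetero.append(interface_energy(a, b, u))  # two unrelated surfaces
    return homo, hetero


def tail_fraction(energies, threshold):
    """Fraction of sampled pairs with energy below `threshold`."""
    return sum(e < threshold for e in energies) / len(energies)


homo, hetero = simulate()
var_ratio = statistics.pvariance(homo) / statistics.pvariance(hetero)
# Threshold for "strong attraction": two standard deviations below the hetero mean.
threshold = statistics.fmean(hetero) - 2 * statistics.pstdev(hetero)
print(f"variance ratio (homo/hetero): {var_ratio:.2f}")
print(f"strongly attractive fraction, homo:   {tail_fraction(homo, threshold):.3f}")
print(f"strongly attractive fraction, hetero: {tail_fraction(hetero, threshold):.3f}")
```

The doubling of the variance in this toy follows only from the symmetry of the self-contact; realistic potentials, surface geometries, and orientational sampling would change the numbers but are outside the scope of this sketch.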
Stability requirements are important, but they are not the only requirements governing protein evolution. Protein evolution optimizes the biological function of a protein and does not necessarily lead to optimal stability or foldability, especially if these properties conflict with functional constraints. Different evolutionary scenarios of protein oligomerization have been discussed in the literature. Some of them propose evolutionary pathways that follow kinetic scenarios of two-state or three-state folding or domain swapping.26–29
At the same time, duplication of homodimers may lead to oligomers of paralogs and may create new protein complexes in evolution.30
Although oligomerization plays an important functional role, the formation of multiple oligomerization interfaces and symmetry requirements put additional constraints on the evolution of constituent monomers and on the complex itself.
Homooligomers provide convenient systems for studying the evolution of protein interactions using only one phylogenetic tree, thus avoiding the ambiguity of finding corresponding branches between different phylogenetic trees for heterooligomeric complexes. At the same time, the evolution of protein interactions cannot be decoded without a detailed analysis of interaction interfaces and binding modes. This in turn requires information on the atomic details of interacting residues for diverse members of a given protein family. In this article, we analyze the general principles of the evolution of homooligomers in terms of their symmetry, interface sizes, and conservation of binding modes, and focus specifically on the evolution of the binding modes of nine homooligomer families. We map different binding modes and oligomerization states onto phylogenetic trees and trace their evolution. First, we find that binding modes tend to be conserved between proteins from the same homooligomeric family that share more than 50% sequence identity, with the trend being more pronounced for close homologs with more than 70% identity. This result is important for inferring, from known complexes, the binding modes of homologs/interologs with unannotated interaction modes or binding sites. Second, we show that the most ancient binding modes tend to involve larger, symmetrical interfaces, while the more recent binding modes exhibit smaller, more asymmetrical interfaces.