Although homomers are central to biology, only anecdotal knowledge exists on their principles of evolution and assembly, and no unifying theory has been proposed. Large increases in structural data in recent years, however, have enabled us to study quaternary structure, that is, the spatial arrangement of subunits, on a data set of 5,375 unique structures. This data set is about tenfold larger than any studied previously (ref. 16; Methods). On the basis of this data set, we quantify how often proteins change their quaternary structure, and identify the evolutionary routes taken to do so. Subsequently, as evolution of a complex can be viewed as assembly over a long timescale, we compare evolutionary routes with (dis)assembly routes probed by mass spectrometry.
Homomers can be separated into two main classes, with open or closed symmetry. The first class corresponds to open structures that would polymerize to infinity in the absence of limiting factors. Such assemblies (for example, tubulin and actin) are rare in our data set (3%), probably because their innate dynamic character renders them difficult to crystallize. In contrast, closed symmetries are finite in space, and most homomers adopt either cyclic or dihedral symmetry (), with only a small fraction (1%) having cubic symmetry (not shown). Throughout, Cn denotes a cyclic complex containing n subunits, and Dn a dihedral complex containing 2n subunits.
Abundance and properties of cyclic and dihedral symmetries
It has long been observed that smaller complexes are more abundant than larger ones, and that even numbers of subunits are favoured over odd numbers (refs 8, 9, 17). Here we confirm this observation, with 62% of complexes being dimers. We quantify the different types of symmetries found in homomers and show that the abundance of complexes with even numbers of subunits is due to the prevalence of dihedral complexes. Whenever both cyclic and dihedral symmetries are possible, we find on average an 11-fold preference for dihedral complexes (). There is an evolutionary explanation for this preference: the probability that a dihedral complex evolves by random mutation should be higher than that for a cyclic complex, for at least two reasons. First, at the level of individual interfaces, most interfaces in dihedral complexes are face-to-face (or back-to-back), whereas all interfaces in cyclic complexes are face-to-back (), and these are less likely to form by random mutation (refs 5, 18). Second, at the level of whole complexes, dihedral complexes can evolve in multiple steps (C1→C2→D2), whereas cyclic complexes must evolve in one step (C1→C4, ).
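As a minimal sketch of why dihedral complexes can account for the excess of even subunit counts, the snippet below lists the cyclic and dihedral symmetries compatible with a given number of identical subunits, using the Cn and Dn notation defined above. The function name `closed_symmetries` is hypothetical, cubic symmetries are ignored, and D1 is left out on the assumption that it is treated as equivalent to C2.

```python
def closed_symmetries(n_subunits: int) -> list[str]:
    """List cyclic and dihedral symmetries open to a homomer of n_subunits.

    Assumptions (not from the source): cubic groups are ignored and D1 is
    omitted because it is geometrically equivalent to C2.
    """
    if n_subunits < 1:
        raise ValueError("a complex needs at least one subunit")
    options = [f"C{n_subunits}"]  # a Cn complex has n subunits
    if n_subunits % 2 == 0 and n_subunits >= 4:
        # a Dn complex has 2n subunits, so only even sizes qualify
        options.append(f"D{n_subunits // 2}")
    return options


if __name__ == "__main__":
    for size in range(1, 9):
        print(size, closed_symmetries(size))
```

Only even sizes of four or more subunits offer a choice between the two symmetry types, which is where the reported preference for dihedral complexes applies.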
Notably, dihedral and cyclic symmetries are geometrically related: a complex with Dn symmetry can be formed from n dimers with C2 symmetry or from two n-mers with Cn symmetry (). If a protein complex has a particular symmetry, we find that homologues are likely to have the same symmetry type. More specifically, for sequence identities >90%, conservation is nearly 100%, whereas in the range of 30–40% sequence identity, conservation is ~70% (Supplementary Fig. 1). Proteins with different degrees of quaternary structure conservation are illustrated in . Thymidylate synthase always exists as a dimer, adenylyltransferase is a dimer in Bacillus subtilis and a hexamer in human (trimer of B. subtilis dimers), whereas two phospholipase A2s have geometrically very distant quaternary structures.
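The geometric relationship between dihedral and cyclic symmetries stated at the start of this paragraph can be sketched in a few lines. The helper names below are hypothetical; the code only encodes the counting rule (n dimers, or two cyclic n-mers, both giving 2n subunits) and says nothing about actual three-dimensional geometry.

```python
def dihedral_building_blocks(n: int) -> list[tuple[int, str]]:
    """Return (copies, symmetry) pairs that can be combined into a Dn complex."""
    if n < 2:
        raise ValueError("Dn is only considered here for n >= 2")
    blocks = [(n, "C2")]  # n dimers with C2 symmetry
    if n != 2:
        # two cyclic n-mers; for D2 this coincides with the dimer route
        blocks.append((2, f"C{n}"))
    return blocks


def subunit_total(copies: int, symmetry: str) -> int:
    """Subunits contributed by `copies` cyclic blocks such as 'C3'."""
    return copies * int(symmetry[1:])


if __name__ == "__main__":
    for n in (2, 3, 7):
        for copies, symmetry in dihedral_building_blocks(n):
            print(f"D{n}: {copies} x {symmetry} = {subunit_total(copies, symmetry)} subunits")
```

For D3, for example, this gives three C2 dimers or two C3 trimers, each totalling six subunits.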
When quaternary structure is not conserved, we speculate that pathways linking geometrically related symmetries represent both evolutionary and assembly routes. For example, a dihedral tetramer (D2) can be described as a dimer of dimers, where a back-to-back dimerization patch forms a first dimer, and a second face-to-face dimerization patch forms the dimer of dimers. This is not true of a cyclic tetramer (C4), where subunits interact in a face-to-back manner, such that two different surface patches are involved in forming an interface (). Therefore, we expect many more dihedral than cyclic tetramers to share evolutionary relationships with dimers. This is illustrated by the pathway from a dimer to a dihedral tetramer () and the disallowed transition from a dimer to a cyclic tetramer.
Following this idea, we looked at evolutionary relationships in terms of sequence similarity between different quaternary structures to unveil the routes most commonly taken to build larger complexes (). Each quaternary structure is represented schematically with the numbers of proteins of each type. Pairs of quaternary structures are connected according to the statistical significance of the number of evolutionary transitions between them. Most pairs have fewer transitions between them than expected in a random model (Methods) as exemplified by monomers (C1) and dihedral tetramers (D2). Other pairs with insignificant numbers of transitions are shown in Supplementary Fig. 2. We find that cyclic dimers, trimers and tetramers share notable numbers of transitions with their dihedral counterparts, supporting the stepwise evolutionary scenario where homomers with dihedral symmetry evolve through cyclic intermediates ().
Notably, in this stepwise scenario, two evolutionary routes lead to a dihedral complex (Dn): either from n dimers or from two cyclic n-mers (). This raised the question of whether it is possible to identify which of these two routes was taken by a given dihedral complex. On the basis of energetic considerations (Supplementary Information 1), we propose that a hierarchy of interface sizes exists within dihedral complexes, and that the larger interface is conserved in evolution. To test this hypothesis, we looked for tetramers homologous to a dimer, as well as hexamers homologous to a dimer or trimer. In this data set ( and Supplementary Table 1) we examined whether the interface within the dimer or trimer corresponded to the largest interface in the homologous tetramer or hexamer. Among 33 tetramers and 19 hexamers studied, 49 complexes conserve the larger interface with the dimeric or trimeric homologue, whereas only 3 conserve their smaller interface ( and Supplementary Table 1). This result implies that the evolutionary route of a homomer can be predicted solely from its interface sizes. Our predictions for the evolutionary pathways of D3, D4 and D5 complexes (Supplementary Fig. 3a) have led us to formulate a general model of homomer evolution (Supplementary Fig. 3b).
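The interface-size rule described in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function name `predicted_intermediate` is hypothetical, the example areas are invented for demonstration, and buried interface area is used as a proxy for interface strength. The final lines recompute the reported tally of 49 out of 52 complexes.

```python
def predicted_intermediate(interface_areas: dict[str, float]) -> str:
    """Pick the candidate subcomplex that buries the largest interface.

    `interface_areas` maps a candidate intermediate symmetry (e.g. "C2",
    "C3") to the interface area it buries in the full complex.
    """
    if not interface_areas:
        raise ValueError("at least one candidate interface is required")
    return max(interface_areas, key=interface_areas.get)


# Hypothetical D3 hexamer: illustrative areas in square angstroms,
# not measured values from the study.
example_d3 = {"C2": 1800.0, "C3": 950.0}

# Counts reported in the text above.
tetramers, hexamers = 33, 19
larger_conserved, smaller_conserved = 49, 3
total = tetramers + hexamers
fraction = larger_conserved / total

if __name__ == "__main__":
    print("predicted intermediate:", predicted_intermediate(example_d3))
    print(f"larger interface conserved in {larger_conserved}/{total} = {fraction:.0%}")
```

With the illustrative values, the dimer (C2) is predicted as the intermediate, so the hypothetical hexamer would be read as a trimer of dimers.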
It is notable that this signature of complex formation (a hierarchy in interface sizes) is conserved throughout evolution. This can be interpreted in at least two different, although not mutually exclusive, ways: (1) once the complex is formed there is no need to change the interface size dramatically, analogous to the classical explanation for the marginal stability of proteins (ref. 20; that is, selective pressure becomes almost non-existent beyond the point where proteins fold); and (2) maintaining a hierarchy of interface strengths is important for a precise order during assembly (refs 21, 22), in which case the largest interface would reflect the main intermediate species during assembly. To test this hypothesis we targeted ten complexes for study using electrospray mass spectrometry ( and Supplementary Table 2).
(Dis)assembly pathways in 16 complexes
Initially we verified that the complexes could be generated intact and corresponded to the stoichiometry described in the Protein Data Bank (PDB). The mass spectra recorded for two hexamers with D3 symmetry and one 14-mer with D7 symmetry revealed that the intact homomer is maintained in each case (). We then induced the disassembly of each complex by carefully changing the ionic strength or by the stepwise addition of partial denaturants. We detected stable subcomplexes corresponding to trimers for the hexameric AUH protein (an RNA-binding protein) and to dimers for MoaC (a molybdenum cofactor biosynthesis protein) (Supplementary Table 2). Examination of the interface sizes shows that in both cases the larger interface is maintained. Similarly, for the Ca²⁺-dependent kinase with D7 symmetry, a dimer is the principal dissociation product and buries the largest interface (). For one complex (PDB entry 1vea) our results were ambiguous, as no intermediate and only monomeric subunits were detected; for another complex (PDB entry 1umg) we predicted a tetramer and detected a dimer. In the latter case, both subcomplexes bury large surfaces (>5,000 Å²), which may bias the use of interface size as a proxy for interface strength. For the remaining complexes, the predicted subcomplex containing the larger interface was observed. These results demonstrate that the largest interface is maintained consistently during disassembly.
To address whether the disassembly process was the reverse of the assembly pathway, we attempted to reassemble a subset of the complexes studied by diluting the denaturant and/or manipulating the ionic strength. In ~50% of the complexes examined we were able to reassemble the original homomer. These results, together with previous studies in which reassembly was found to be strongly dependent on factors such as ionic strength, temperature and concentration of denaturant (refs 23, 24), indicate that disassembly is the reverse of assembly under the appropriate conditions.
To complement our experimental observations, we found six additional complexes for which (dis)assembly intermediates had been reported (). Of these, five match our prediction and one (nucleoside diphosphate kinase) had no intermediate detected. This homomer may either assemble without forming subcomplexes, or subcomplexes may have escaped detection. Alternatively, formation of subcomplexes might involve factors absent from the experimental set-up (ref. 25). Thus, although there are exceptions, we find agreement between the evolutionary pathway and the (dis)assembly pathway in 81% of the cases we examined.
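The 81% figure can be reproduced from the counts given in this section. The breakdown below is one reading of the text, stated as an assumption: of the ten complexes studied by mass spectrometry, two are counted as exceptions (1vea, ambiguous; 1umg, mismatch), and of the six complexes from the literature, five match the prediction.

```python
# Counts taken from the text; how they are grouped is an interpretation.
experimental_total = 10
experimental_exceptions = 2  # 1vea (ambiguous) and 1umg (dimer seen, tetramer predicted)
literature_total = 6
literature_matches = 5       # nucleoside diphosphate kinase showed no intermediate

matches = (experimental_total - experimental_exceptions) + literature_matches
total = experimental_total + literature_total
agreement = matches / total

if __name__ == "__main__":
    print(f"{matches}/{total} = {agreement:.0%}")  # prints "13/16 = 81%"
```

Under this reading, 13 of the 16 complexes agree with the predicted pathway, matching the heading "(Dis)assembly pathways in 16 complexes".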
Overall, through analysis of a large set of homomers, we have shown that the evolutionary pathway of a homomer can be inferred from its atomic structure morphology. This allowed us to predict the (dis)assembly pathway of homomers in solution, and design mass-spectrometry-based experiments to validate our predictions. Results revealed that the (dis)assembly pathway, which takes place on a protein-folding timescale (~seconds), mimics the evolutionary pathway that has taken place over a considerably longer timescale (~millions of years). This is the first time that a general principle for formation and assembly of homomers has been demonstrated. We hope that this will stimulate further studies, as relationships between folding, complex formation and aggregation are only beginning to be explored.