Our results show that the evolutionary rates of complex I subunits contain a significant amount of information about the complex’s quaternary structure. For the matrix arm we found that about 61% of the correlation in evolutionary rate could be explained by the distances between the subunit centers. This is even more striking if we consider that the evolutionary model was derived from eukaryotic sequences and thus should reflect the matrix arm structure in eukaryotes, while our reference structure is from a bacterium. Indeed, the evolutionary 3D model revealed a twist between the N-module and the Q-module when compared to the bacterial structure, a finding that is supported by experimental data ([6], personal communication).
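One way to read the "about 61% explained" statement is as a squared correlation between the pairwise evolutionary rate correlations and the pairwise distances of the subunit centers. The sketch below illustrates that reading only; the exact statistic is not restated in this section, so the use of a squared Pearson coefficient, the function name `variance_explained`, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np

def variance_explained(rate_corr, center_dist):
    """Squared Pearson correlation between the off-diagonal entries
    of two symmetric subunit-by-subunit matrices (assumed statistic)."""
    rate_corr = np.asarray(rate_corr, dtype=float)
    center_dist = np.asarray(center_dist, dtype=float)
    # Use the upper triangle so that each subunit pair is counted once.
    iu = np.triu_indices_from(rate_corr, k=1)
    r = np.corrcoef(rate_corr[iu], center_dist[iu])[0, 1]
    return r ** 2

# Hypothetical subunit centers (arbitrary units), not taken from any structure.
centers = np.array([[0.0, 0.0, 0.0],
                    [10.0, 0.0, 0.0],
                    [0.0, 20.0, 0.0],
                    [0.0, 0.0, 30.0]])
dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)

# Toy rate correlations that fall off linearly with distance.
corr = 1.0 - dist / dist.max()

print(variance_explained(corr, dist))
```

Because the entries of a pairwise matrix are not independent, a significance estimate for such a value would typically require a permutation scheme rather than a standard test on the coefficient.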
In the two models that include the membrane arm, mitochondria-encoded subunits were predicted to be separate from nucleus-encoded subunits, which is in line with previous results. The independent variation of evolutionary rates in the nuclear and mitochondrial genomes [51] may have contributed to the isolation of the mitochondria-encoded subunits in our models. Nevertheless, we stress that the position of the membrane arm core subunits, specifically at the proximal end of the matrix arm in both models, indicates a signal of the physical structure in the evolutionary correlation data. Furthermore, the strict separation of the nucleus-encoded Iβ subunits and their mitochondria-encoded counterparts NADH4 and NADH5 may be explained by other factors, as these two groups also behave differently in experiments [1]. Interestingly, despite its nuclear encoding, the membrane-integral subunit NDUFA1 of the Iα sub-complex [1] is positioned close to the membrane arm core, in particular close to NADH2 (Figures a and b). In T. thermophilus, NADH2 is located between the two subunits NADH1 and NADH4 [5], both of which are known physical interactors of NDUFA1 [21]. A direct physical interaction of NDUFA1 with NADH2 is therefore likely.
The evolutionary correlation failed to identify the correct topology of the membrane arm core. A number of biological reasons could explain such a lack of signal. First, long-range structural constraints [5] may interfere with the distance-dependent structural constraints that are needed for the strength of the evolutionary rate correlation to scale with distance. Second, the formation of OXPHOS super-complexes with complex I dimers may result in correlations between distant subunits. Indeed, despite their positions at opposite ends of the membrane arm, NADH1 and NADH5 show a high correlation in evolutionary rates with each other and with subunit CYTb of complex III [33], consistent with their proximity in OXPHOS complexes organized into respiratory strings [55]. Third, the lack of correlation with physical distance may result from non-adaptive variation in the mitochondria-encoded genes caused by variable and, at least in some eukaryotic taxa, heterogeneous mutation pressure [56]. Indeed, in a number of animal taxa, changes in gene order or mutation pressure led to non-adaptive changes in mitochondrial genes [57]. The mitochondrial genomes of some taxa in our study, such as plants, are clearly different from those in animals (reviewed in [60]) and their genes are likely under different mutation pressures [61]. Fourth, the embedding of the membrane proteins in two dimensions might reduce the evolutionary constraints to maintain interactions in comparison to proteins that are embedded in three dimensions.
The integration of multiple proteins in a single model assumes that the interactions are permanent and non-competitive. This is clearly not the case for the model of 45 proteins because it includes assembly factors. This model therefore cannot exactly represent a physical structure. According to current models, complex I assembles from independent sub-complexes [62]. Of the assembly factors required for this process and included in our study, only NDUFAF1 (AF1) is required for the assembly of the distal membrane arm sub-complex [13]. In our model, AF1 is located close to the matrix arm, which supports an indirect rather than a direct involvement of AF1 in membrane arm assembly [65]. The distal membrane arm further combines with a pre-formed membrane-anchored proximal matrix/membrane arm that contains the subunits NDUFS2 (S2) and NADH1 (1) and possibly NDUFS3 (S3) and NDUFS7 (S7) [62] and whose assembly involves NDUFAF3 (AF3) and possibly C8orf38 (O38) [17]. Although the membrane association of AF3 and O38 is not reflected in our data, they form a tightly co-evolving triple with C2orf56 (O56), which is known to bind the proximal matrix arm subunit S2 [12]. The high correlation in evolutionary rates between AF3, O38, and O56 suggests strong selective constraints on their cooperation during the assembly of the proximal matrix/membrane arm sub-complex. The fourth assembly factor that has been experimentally linked to the proximal membrane arm, C20orf7 (O7) [18], is indeed placed close to the proximal matrix arm subunits S2 and S3 (Figure b, right bottom), and the proximal membrane arm subunits A3, A6, and A8 [2].
After the joining of the two membrane arm intermediates, the proximal matrix arm is further extended. This step involves the NUBPL-mediated assembly of at least one FeS-cluster into the distal matrix arm [11]. In the evolutionary configuration the assembly factor NUBPL is positioned side by side with the permanent subunit NDUFA2 (A2; Figure b, right top). Like NUBPL, A2 is associated with the distal matrix arm [68]. The highly conserved A2 subunit is structurally similar to thioredoxin-like proteins, with a loop region of probably variable conformation that contains two cysteines in human (C24 and C58) [69]. These cysteines can form a reversible disulfide bridge with an in vitro redox potential in the range of the large majority of isopotential FeS-clusters of complex I [69]. Although the cysteines are not fully conserved, FeS-clusters are occasionally bound by serine, histidine, or aspartate [71]. Indeed, the human serine 30 in NDUFA2 is a good candidate for FeS-cluster binding because it is conserved in all species with the notable exception of Trypanosoma, in which it is substituted by cysteine. Together, these observations and the very strong evolutionary rate correlation of A2 and NUBPL support an involvement of A2 in complex I-associated FeS-cluster assembly or maintenance. The peripheral position of A2 and NUBPL in the model could be a consequence of other strong evolutionary constraints not directly related to complex I.
NDUFAF2 (AF2, B17.2L) has also been linked to the assembly of the distal matrix arm [14]. Interestingly, the evolutionary data position AF2 directly beside its paralog NDUFA12 (A12, B17.2) [10]. Like AF2, A12 is known to be associated with the distal matrix arm, to which it is directly recruited from the mitochondrial matrix [68]. The correlation in evolutionary rates and the independent co-loss in multiple complex I-lacking taxa [10] support an evolutionarily conserved functional relationship of AF2 and A12. It is tempting to speculate that AF2 temporarily binds at the binding site of A12, e.g. to stabilize the local structural context, and is later substituted by its paralog. Such close positioning and physical interaction of homologous proteins within the same protein complex is one of the prevailing trends in the “fate” of duplicated proteins in complexes [72]. Complex I appears to add another twist to this pattern in the sense that the predicted interaction is only temporary.
The rate of protein evolution is influenced by diverse factors [73], in particular expression and general functional relatedness [39]. It is therefore even more remarkable that we found physical distance to be the major determinant of the evolutionary rate correlation for the complex I matrix arm. However, this result does not apply to the whole complex. Thus, to establish whether the mirror-tree/MDS combination is a good general method to predict quaternary structures, other complexes need to be analyzed. Furthermore, instead of using the mirror-tree method one could use residue correlation to measure the co-evolution of subunits more directly. Residue correlation has been used to predict contact interfaces for protein pairs [30] and to investigate a rotation-symmetric homo-multimeric complex [76]. A simple implementation would be to integrate pairwise residue correlations [28], or correlations that account for indirect correlations [30] or phylogenetic dependency [76], by in silico two-hybrid [80] into subunit distances and map these into three dimensions by multidimensional scaling.
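The final step of the proposal above, turning subunit-level co-evolution scores into distances and embedding them in three dimensions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the transform `1 - correlation` is one of several possible ways to obtain dissimilarities, classical (Torgerson) multidimensional scaling is only one MDS variant, and the function names and the small correlation matrix are hypothetical.

```python
import numpy as np

def correlation_to_distance(corr):
    """Map correlations in [-1, 1] to dissimilarities in [0, 2].
    The 1 - r transform is an assumption made for this sketch."""
    d = 1.0 - np.asarray(corr, dtype=float)
    np.fill_diagonal(d, 0.0)
    return d

def classical_mds(dist, n_dims=3):
    """Classical (Torgerson) MDS: place points so that their Euclidean
    distances approximate the entries of the symmetric matrix `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix derived from squared distances.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues signal non-Euclidean input; they are clipped to zero.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical subunit-by-subunit co-evolution scores.
subunit_corr = np.array([[1.0, 0.8, 0.2],
                         [0.8, 1.0, 0.3],
                         [0.2, 0.3, 1.0]])
coords = classical_mds(correlation_to_distance(subunit_corr), n_dims=3)
print(coords.shape)  # (3, 3): one 3D coordinate per subunit
```

The resulting coordinates are defined only up to rotation, reflection, and translation, so comparison with a reference structure would require a superposition step.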