Protein complexes are vital molecular machines of the cell [1]; structural characterization of these complexes provides insight into their function [2]. Given the number of undetermined protein complexes [3] and inherent difficulties in experimentally determining the structures of these complexes at atomic resolution [4], there is an acute need to develop computational methods for structural modeling of macromolecular assemblies.
Molecular docking techniques have traditionally been used to predict the structures of binary complexes from the unbound structures of their components. These methods rely on a global search over a large set of possible binary configurations, maximizing geometric and physicochemical complementarity between the two constituent subunits [5–9]. The CAPRI challenge provides a critical assessment of such docking methods [10]. Analysis of CAPRI results shows that in some cases docking methods suffer from relatively low accuracy, especially when the individual protein subunits are modeled rather than experimentally determined, or when their bound and unbound conformations differ significantly [9,11–13]. This limitation has led to the emergence of restrained docking procedures that guide sampling and/or filter docking solutions based on additional sources of information [9,14,15]. Notably, the HADDOCK webserver can incorporate multiple sources of information into its docking procedure [16].
While most docking methods are designed to deal with two molecules, the majority of functional macromolecular assemblies in the cell consist of more than two components. Inspired by targets in previous CAPRI rounds [12], several groups have proposed docking-based modeling methods for symmetric complexes [17–21]. Fewer docking methods have been developed for the significantly more challenging case of asymmetric assembly modeling [14,22]. A major obstacle for macromolecular docking algorithms is selecting near-native models from an ensemble of possible solutions. Knowledge of the overall shape of a complex, even at low resolution, can significantly reduce the ambiguity inherent in such an ensemble when it is used to filter the set of candidate models. Such overall shape information can be obtained by electron microscopy (EM) [23–25] or small-angle X-ray scattering [26,27].
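The filtering idea described above can be illustrated with a toy example. The sketch below is not taken from any published docking pipeline; it assumes a hypothetical representation in which the low-resolution shape is a set of occupied voxel indices and each candidate model is a list of atom coordinates in Å. The function names (`voxel_of`, `envelope_coverage`, `filter_models`) and the 0.9 coverage threshold are illustrative assumptions.

```python
import math

def voxel_of(point, voxel_size):
    """Map a 3D point (in Å) to the integer index of the voxel that contains it."""
    return tuple(math.floor(c / voxel_size) for c in point)

def envelope_coverage(atoms, envelope, voxel_size):
    """Fraction of atoms that fall inside the occupied voxels of the envelope."""
    if not atoms:
        return 0.0
    inside = sum(1 for a in atoms if voxel_of(a, voxel_size) in envelope)
    return inside / len(atoms)

def filter_models(models, envelope, voxel_size, min_coverage=0.9):
    """Keep only candidate models whose atoms lie mostly inside the envelope.

    min_coverage is an arbitrary illustrative threshold, not a recommended value.
    """
    return [m for m in models
            if envelope_coverage(m, envelope, voxel_size) >= min_coverage]

# Toy envelope: a 2 x 2 x 2 block of 10 Å voxels (0 to 20 Å along each axis).
envelope = {(i, j, k) for i in range(2) for j in range(2) for k in range(2)}

# Two hypothetical candidate models: one compact, one extended beyond the envelope.
compact = [(5.0, 5.0, 5.0), (15.0, 5.0, 5.0), (5.0, 15.0, 15.0)]
extended = [(5.0, 5.0, 5.0), (25.0, 5.0, 5.0), (45.0, 5.0, 5.0)]

kept = filter_models([compact, extended], envelope, voxel_size=10.0)
```

In this toy case only the compact model survives, because two of the three atoms of the extended model fall outside the envelope. A realistic filter would score against the density values themselves rather than a binary mask.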
EM is becoming a method of choice for structural visualization of large protein complexes. EM reconstruction techniques provide a density map of a complex at resolutions typically ranging from 5 Å to 25 Å [28]. Following generation of a density map, atomic structures of complex components are often fitted into the map to construct a “quasiatomic” model of the complex [29–31]. Thus, EM data can be used not only to filter docking solutions but also to fit assembly subunits into their density. Rigid fitting techniques rely on a global search for the placement (position and orientation) of a single subunit inside the density map that maximizes the overlap between the model and the map [32]. However, the majority of these techniques are designed to work independently on single subunits, without taking into account protein-protein interaction interfaces.
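To make the notion of "maximizing the overlap between the model and the map" concrete, the following minimal sketch scores placements of a single subunit by the cross-correlation between a density simulated from its atoms and a target map. It is a simplified illustration under stated assumptions, not the procedure of any particular fitting program: the search covers translations only (a real rigid fit also samples orientations), atoms are blurred with an isotropic Gaussian of assumed width `sigma`, and all function names and the toy coordinates are hypothetical.

```python
import itertools
import math

def simulate_density(atoms, shape, voxel_size, sigma):
    """Blur point atoms onto a grid with isotropic Gaussians (crude simulated map)."""
    nx, ny, nz = shape
    grid = []
    for i, j, k in itertools.product(range(nx), range(ny), range(nz)):
        # Coordinates of the voxel center in Å.
        cx, cy, cz = ((i + 0.5) * voxel_size,
                      (j + 0.5) * voxel_size,
                      (k + 0.5) * voxel_size)
        value = 0.0
        for x, y, z in atoms:
            d2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
            value += math.exp(-d2 / (2.0 * sigma ** 2))
        grid.append(value)
    return grid

def cross_correlation(a, b):
    """Pearson correlation coefficient between two flattened density grids."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_translation(atoms, target, shape, voxel_size, sigma, shifts):
    """Exhaustively try each shift and return (score, shift) with the highest correlation."""
    best = None
    for shift in shifts:
        moved = [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in atoms]
        score = cross_correlation(
            simulate_density(moved, shape, voxel_size, sigma), target)
        if best is None or score > best[0]:
            best = (score, shift)
    return best

# Toy problem: a three-atom "subunit" whose true placement is a shift of (6, 6, 6) Å.
shape, voxel_size, sigma = (6, 6, 6), 2.0, 2.0
subunit = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
placed = [(x + 6.0, y + 6.0, z + 6.0) for x, y, z in subunit]
target_map = simulate_density(placed, shape, voxel_size, sigma)

steps = [2.0 * i for i in range(6)]  # candidate shifts 0, 2, ..., 10 Å per axis
shifts = list(itertools.product(steps, repeat=3))
best_score, best_shift = best_translation(
    subunit, target_map, shape, voxel_size, sigma, shifts)
```

Because each subunit is scored against the map on its own, a search of this kind has no term that rewards or penalizes contacts between neighboring subunits, which is the limitation noted above.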
To combine the strengths of molecular docking and molecular fitting approaches, and to overcome their limitations, we have developed the MultiFit method. MultiFit simultaneously positions protein subunits into a density map of a protein assembly by combining geometric principles commonly used in molecular fitting and molecular docking [33]. Here, we describe new algorithms for two of the stages of the MultiFit algorithm that significantly improve the accuracy of the method. In addition, we describe an extension of the MultiFit method for cyclic symmetric assemblies, resulting in a highly efficient algorithm that accurately treats such cases.
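One general reason cyclic symmetry simplifies assembly modeling is that, once the symmetry axis and the placement of one subunit are fixed, the remaining copies follow by rotation. The sketch below illustrates only this geometric fact and is not the MultiFit symmetry algorithm; it assumes the symmetry axis coincides with the z axis, and the function names and toy coordinates are hypothetical. The value n = 7 is chosen for illustration, matching the seven-fold rings of GroEL.

```python
import math

def rotate_about_z(point, angle):
    """Rotate a 3D point about the z axis by the given angle in radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def build_cyclic_ring(subunit, n):
    """Return n copies of a subunit; copy k is rotated by 2*pi*k/n about z."""
    return [[rotate_about_z(p, 2.0 * math.pi * k / n) for p in subunit]
            for k in range(n)]

# Toy subunit: two atoms placed about 30 Å from the assumed symmetry axis.
subunit = [(30.0, 0.0, 0.0), (32.0, 1.5, 4.0)]
ring = build_cyclic_ring(subunit, 7)
```

Under this assumption the search space shrinks from one placement per subunit to a single placement plus the axis, since every other copy is determined by the rotation.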
Below, we outline the MultiFit algorithm and describe the recent algorithmic advances. We then illustrate the method by modeling the structures of the methane monooxygenase enzyme (an asymmetric complex) and the GroEL chaperone (a cyclic symmetric complex), followed by results on a benchmark of 10 complexes. Finally, we discuss the advantages of incorporating EM data in macromolecular docking algorithms.