Advances in electron microscopy (EM) allow for structure determination of large biological assemblies at increasingly higher resolutions. A key step in this process is fitting multiple component structures into an EM-derived density map of their assembly. Here, we describe a web server for this task. The server takes as input a set of protein structures in the PDB format and an EM density map in the MRC format. The output is an ensemble of models ranked by their quality of fit to the density map. The models can be viewed online or downloaded from the website. The service is available at http://salilab.org/multifit/ and http://bioinfo3d.cs.tau.ac.il/.
Macromolecular assemblies are involved in nearly all cellular processes. Determining the structures of these biological machines is crucial for deciphering their function. Recent advances established electron microscopy as a central technique for studying the structures of macromolecular assemblies in different functional states in vitro and in vivo. Because the resolution of an electron microscopy density map is relatively low, fitting of atomic resolution component structures into the density map of the whole assembly is essential. MultiFit is the first web server for achieving this task.
Recent advances have established electron microscopy (EM) as a central technique for studying the structures of macromolecular assemblies in different functional states in vitro and in vivo (1). The resolution of an EM density map is typically better than 25 Å, and can be as high as ~4 Å for highly symmetric structures (2,3). In most cases, however, the resolution is insufficient to construct a full atomic model of a protein complex. To this end, fitting of atomic resolution structures into an EM density map of the whole assembly is essential (4–8).
In the past decade, different algorithms have been developed for fitting a single protein subunit into its density map (9–20). Most methods use a variant of the cross-correlation coefficient as the quality-of-fit measure (21). The position of a protein subunit inside the density map is sampled either exhaustively or by matching precalculated geometric features. Methods for fitting multiple components of large assemblies have also been recently described (22–25). In particular, we have developed the MultiFit module of the Integrative Modeling Platform (IMP, http://www.salilab.org/imp/) software package (23,26). MultiFit simultaneously positions protein subunits into a density map of a protein assembly by combining geometric criteria commonly used in molecular docking and quality-of-fit criteria commonly used in EM fitting. The method was validated in the 2010 EM modeling challenge (http://ncmi.bcm.edu/challenge/).
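As an illustration of the quality-of-fit measure mentioned above, the following minimal sketch computes one common variant of the cross-correlation coefficient (mean-subtracted and normalized) between an experimental map and a map simulated from a placed subunit. The function name, the flat-list representation of voxels and the toy density values are assumptions made for this example; they are not taken from MultiFit or IMP, and other methods may use different variants of the score.

```python
import math


def cross_correlation(map_a, map_b):
    """Mean-subtracted, normalized cross-correlation of two voxel lists.

    Both maps are assumed to be sampled on the same grid and flattened
    to equally long lists of density values. The result lies in [-1, 1].
    """
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal size")
    mean_a = sum(map_a) / len(map_a)
    mean_b = sum(map_b) / len(map_b)
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    denominator = math.sqrt(var_a * var_b)
    if denominator == 0.0:
        return 0.0  # a constant map carries no shape information
    return numerator / denominator


# Toy one-dimensional "maps" (hypothetical values).
experimental = [0.0, 0.2, 0.9, 1.0, 0.3, 0.0]
simulated_good = [0.0, 0.1, 0.8, 1.0, 0.2, 0.0]      # subunit placed well
simulated_shifted = [0.8, 1.0, 0.2, 0.0, 0.0, 0.1]   # subunit misplaced

score_good = cross_correlation(experimental, simulated_good)
score_shifted = cross_correlation(experimental, simulated_shifted)
```

In this toy case the well-placed subunit scores higher than the misplaced one. Because the measure is normalized, it is insensitive to a linear rescaling of the density values in either map.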
Here, we present a web interface to MultiFit. The server takes as input a set of protein structures in the PDB format and an EM density map in the MRC format. The output is an ensemble of models ranked by their quality of fit to the density map. The models can be viewed online or downloaded from the website.
MultiFit is a method for simultaneously fitting atomic-resolution protein structures into the density map of their assembly at resolutions as low as 25 Å. The input is a set of atomic structures of proteins and an EM density map of their assembly. The component positions and orientations are optimized with respect to a scoring function that includes the quality-of-fit of components in the map, the protrusion of components from the map envelope and the shape complementarity between pairs of components. The scoring function is optimized by DOMINO (Discrete Optimization of Multiple INteracting Objects), an exact inference optimizer that efficiently finds the global minimum within a discrete sampling space.
The optimization algorithm is composed of four stages, each sampling assembly models at increasingly higher resolution and accuracy: (i) in the ‘anchor graph segmentation’ stage, an unlabeled segmentation of the density map into regions is calculated using a Gaussian mixture model; the segmented regions correspond approximately to the subunits in the complex; (ii) in the ‘fitting-based assembly configuration’ stage, a set of coarse assembly models is found by an enumeration over possible assignments of subunits to regions, followed by simultaneous local fitting of the subunits in the corresponding regions; (iii) in the ‘docking-based pose refinement’ stage, each of the models found in the ‘configuration’ stage is refined by simultaneous local optimization of the interfaces between pairs of interacting subunits as sampled by local pairwise docking; and (iv) in the ‘rigid body minimization’ stage, each of the models found in the ‘refinement’ stage is further refined using a local Monte Carlo/conjugate gradients minimization procedure.
The default run of the MultiFit web server omits the final ‘rigid body minimization’ stage. Users can explore the ensemble of solutions generated by the first three stages and then refine a subset of the ensemble using a downloaded version of MultiFit.
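The enumeration over assignments of subunits to regions can be pictured with a toy example. The sketch below is not DOMINO and is not taken from the MultiFit code: it is a brute-force enumeration over one-to-one assignments, using a hypothetical table of per-subunit penalties (lower is better) in place of the full scoring function, which also includes pairwise terms such as shape complementarity. The function name and all values are assumptions made for illustration.

```python
from itertools import permutations


def best_assignments(fit_penalty, top_k=3):
    """Rank one-to-one assignments of subunits to map regions.

    fit_penalty[s][r] is a hypothetical penalty for placing subunit s
    in region r (lower is better). Every assignment is enumerated, so
    this is only practical for a handful of subunits.
    """
    n_subunits = len(fit_penalty)
    scored = []
    for regions in permutations(range(n_subunits)):
        total = sum(fit_penalty[s][r] for s, r in enumerate(regions))
        scored.append((total, regions))
    scored.sort()  # lowest total penalty first
    return scored[:top_k]


# Toy penalties for three subunits (rows) and three regions (columns).
penalties = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.6, 0.3],
]
ranked = best_assignments(penalties)
best_total, best_regions = ranked[0]
```

Returning several top-ranked assignments rather than only the best one mirrors the idea of keeping an ensemble of coarse models for later refinement. The number of assignments grows factorially with the number of subunits, which is why an exhaustive loop like this one is only a didactic stand-in for a dedicated discrete optimizer.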
For cyclic symmetric complexes, the symmetry is imposed within the optimization procedure for improved efficiency, such that only symmetric models are sampled. In particular, in ‘fitting-based assembly configuration’ and ‘docking-based pose refinement’, only cyclic symmetric models consistent with the symmetry of the density map are sampled (26).
The MultiFit web server requires as input a set of protein structures in the PDB format, an EM density map of their assembly in the MRC format, and a few parameters (Figure 1). The parameters for the density map include: (i) resolution (Å) (27); (ii) voxel spacing on the grid representing the map (Å); and (iii) the contour level that results in the volume accommodating the molecular mass of the complex. These parameters are included for maps deposited in the EM Data Bank (EMDB) (28).
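For maps that do not come with a recommended contour level, the relation between contour level, voxel spacing and molecular mass can be sketched as follows. The code assumes a typical protein volume of about 1.21 Å³ per Dalton, which is a general rule of thumb and not a value stated by the MultiFit authors; the function name and the toy map are likewise hypothetical. The idea is to choose the threshold so that the voxels at or above it enclose the volume expected for the given mass.

```python
def contour_level_for_mass(voxels, spacing, mass_da, volume_per_da=1.21):
    """Return a density threshold whose enclosed volume matches the mass.

    voxels        -- flat list of density values
    spacing       -- voxel spacing in Å (cubic voxels assumed)
    mass_da       -- molecular mass of the complex in Daltons
    volume_per_da -- assumed protein volume per Dalton in Å^3 (rule of thumb)
    """
    target_volume = mass_da * volume_per_da           # Å^3
    n_voxels = round(target_volume / spacing ** 3)    # voxels to enclose
    if n_voxels < 1 or n_voxels > len(voxels):
        raise ValueError("target volume does not fit the map")
    ranked = sorted(voxels, reverse=True)
    # The n-th highest value is the lowest density still inside the contour.
    return ranked[n_voxels - 1]


# Toy map: 8 voxels at 2 Å spacing (8 Å^3 each), hypothetical 20 Da "complex".
toy_map = [0.0, 0.1, 0.5, 0.9, 0.7, 0.2, 0.05, 0.3]
level = contour_level_for_mass(toy_map, spacing=2.0, mass_da=20.0)
enclosed = sum(1 for v in toy_map if v >= level)
```

Here the target volume of about 24 Å³ corresponds to three voxels, so the threshold is the third-highest density value. For real maps, noise and the way the map was filtered can shift the appropriate level, so a value obtained this way is best treated as a starting point.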
The MultiFit web server operates in two modes: cyclic-symmetric and non-symmetric. In the cyclic-symmetric mode, the symmetry order should be provided (2 for a dimer, 3 for a trimer, etc.). If the arrangement of the input monomer in its native complex follows a different type of symmetry, the user should use the downloaded version of MultiFit. In the non-symmetric mode, a list of subunit PDB files and the number of copies of each subunit are required. The input density map should be pre-segmented to contain only the input set of proteins.
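To make the meaning of the symmetry order concrete, the sketch below builds the copies of a cyclic (C_n) assembly from the coordinates of a single monomer by rotating it in steps of 360°/n. It assumes that the symmetry axis is the z-axis through the origin; in a real map the axis has to be determined first. The function name and coordinates are hypothetical and this is not the MultiFit implementation.

```python
import math


def cyclic_copies(points, order):
    """Return `order` copies of `points` related by C_n symmetry.

    points -- list of (x, y, z) coordinates of one monomer
    order  -- symmetry order n (2 for a dimer, 3 for a trimer, ...)
    Copy k is rotated by k * 360/n degrees about the z-axis (assumed axis).
    """
    if order < 2:
        raise ValueError("symmetry order must be at least 2")
    copies = []
    for k in range(order):
        angle = 2.0 * math.pi * k / order
        c, s = math.cos(angle), math.sin(angle)
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in points])
    return copies


# Two toy atoms of a monomer placed 10-12 Å from the symmetry axis.
monomer = [(10.0, 0.0, 0.0), (12.0, 1.0, 3.0)]
trimer = cyclic_copies(monomer, 3)
```

Because every copy is generated from the same monomer placement, only that one placement needs to be sampled, which is the efficiency gain of restricting the search to symmetric models.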
The server also has an optional input parameter specifying an e-mail address to which a link to the results page will be sent once the job is completed. Alternatively, the user can bookmark a web link to the results page at the time of data submission. The status of the job (queued, running or finished) can be accessed on the queue page.
The computation is performed in real time and the server page is updated once the calculation has finished. The typical running time is about 20 min for assemblies with tens of thousands of atoms. The web server output page displays a table of the top 20 assembly models that best fit the assembly density map, along with their quality-of-fit scores ranked from top left to bottom right (Figure 1). MultiFit lists the optimal as well as suboptimal solutions; when the latter have good scores and are different from the optimal solution, the user should be skeptical about all solutions and further analyze the ensemble.
Each model can be saved as a PDB file and can also be directly opened with UCSF Chimera (19). A compressed file containing all models is available for download. Moreover, the MultiFit output text file can be downloaded. Row i lists the transformation applied to each of the subunits, the model quality-of-fit score, and the geometric complementarity score for model i. This output file can be used as input to IMP for further refinement and analysis. It can also be used as input for refining symmetric complexes using the SymmRef method (29).
With the growing number of macromolecular assemblies characterized by EM, integrative modeling techniques are becoming increasingly useful for a mechanistic understanding of these assemblies (6,30–32). The MultiFit web server was designed to provide a user-friendly web interface to the MultiFit module in the IMP package, for fitting multiple protein structures into their assembly density map.
The Clore Foundation Ph.D. Scholars program (to K.L.); the Israel Science Foundation (1403/09) and the Hermann Minkowski-Minerva Center for Geometry at Tel Aviv University (to H.J.W.); and the Sandler Family Supporting Foundation, National Institutes of Health (R01 GM54762, U54 RR022220, PN2 EY016525 and R01 GM083960), Hewlett-Packard, NetApp, IBM and Intel (to A.S.).
Conflict of interest statement. None declared.
The authors are grateful to Ron Conway, Mike Homer, Hewlett-Packard, NetApp, IBM and Intel for computer hardware gifts.