Structural modeling of macromolecular complexes greatly benefits from interactive visualization capabilities. Here we present the integration of several modeling tools into UCSF Chimera. These include comparative modeling by MODELLER, simultaneous fitting of multiple components into electron microscopy density maps by IMP MultiFit, computing of small-angle X-ray scattering profiles and fitting of the corresponding experimental profile by IMP FoXS, and assessment of amino acid sidechain conformations based on rotamer probabilities and local interactions by Chimera.
Proteins carry out their functions through interactions with other molecules. Of particular interest here are assemblies of multiple proteins, which are often large, dynamic, flexible, and fragile, contributing to the difficulty of determining their structures. Even when single structure determination methods fail, however, atomic models of assemblies can be determined by combining multiple types of experimental data, including those from X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, electron microscopy (EM), small-angle X-ray scattering (SAXS), cross-linking, mass spectrometry (MS), and affinity purification (Alber et al., 2008; Lasker et al., 2010a; Robinson et al., 2007). Computational integration of diverse experimental data into an ensemble of models that best satisfy the data is not yet an entirely automated process. Therefore, visualization software, used to set up calculations, assess the results, and troubleshoot problems, is essential for the quality and efficiency of iterative integrative structure modeling.
A common structure determination approach is the fitting of crystal structures and comparative models into an EM map of the full molecular assembly. The structures can be fit as rigid bodies by sampling globally (Fabiola and Chapman, 2005) or locally (Goddard et al., 2007; Pintilie et al., 2010). Methods of flexible fitting include molecular dynamics (Trabuco et al., 2008), Monte Carlo (Topf et al., 2008), normal mode analysis (Tama et al., 2004), and morphing (Wriggers, 2010; Wriggers and Chacón, 2001). Restraints such as symmetry (Navaza et al., 2002) and intermolecular distances (Rossmann et al., 2001) can be incorporated into the fitting process. Which available method is best depends on many factors, including the resolution and symmetry of the density map, the availability of additional restraints, and the accuracy of component models.
SAXS profiles have been used widely for low-resolution structural characterization of molecules in solution (Petoukhov and Svergun, 2007; Putnam et al., 2007; Schneidman-Duhovny et al., 2010). While a SAXS profile can be converted into an assembly envelope that can in turn be used directly for fitting component molecules (Svergun, 1999), the SAXS measurement has a relatively low information content - the rotationally averaged scattering intensity versus the scattering angle approximately determines only the system’s radial distribution function. Thus, a good use of an experimental SAXS profile is to compare it to a profile computed from a 3D structural model that was derived from other data (Pons et al., 2010; Schneidman-Duhovny et al., 2010; Svergun et al., 1995). Also, changes in assembly conformation or composition under variations of pH, salt, temperature, cofactors, and drugs can be recognized, and candidate models ranked by comparison of experimental and model-derived SAXS profiles.
Atomic assembly models often generate invaluable testable hypotheses. For example, models predict which residues are in contact at intermolecular interfaces and thus may be essential for assembly formation and function. In models built from individual X-ray crystal structures, the sidechain conformations may not reflect those in the complete assembly, either because of the induced fit or modeling errors. Thus, analysis of sidechain rotamers is useful for assessing residue interactions in the complex (Guharoy et al., 2010).
We have recently integrated comparative (homology) protein structure modeling by MODELLER (Fiser et al., 2000; Marti-Renom et al., 2000; Sali and Blundell, 1993), multiple simultaneous fitting into EM maps by IMP MultiFit (Lasker et al., 2010a; Lasker et al., 2009), SAXS profile fitting by IMP FoXS (Forster et al., 2008; Schneidman-Duhovny et al., 2010), and evaluation of sidechain conformations from backbone-dependent and backbone-independent rotamer libraries (Dunbrack, 2002; Lovell et al., 2000) into the UCSF Chimera molecular visualization package. These capabilities augment over 100 tools already provided by Chimera for the interactive analysis of atomic models, density maps, and protein sequences (Couch et al., 2006; Goddard et al., 2005; Goddard et al., 2007; Meng et al., 2006; Morris et al., 2007; Pettersen et al., 2004; Pintilie et al., 2010). Chimera provides graphical user interfaces to simplify setting up input data and parameters for the fitting process, evaluating results, and performing cycles of refinement for building models of macromolecular assemblies. The homology modeling, EM fitting, and SAXS calculations are launched from Chimera and executed remotely via MODELLER- and IMP-based web services (Russel et al., 2011), with results displayed in the molecular visualization environment as they become available. The web service approach allows incorporation of improvements without the user installing new software, and can provide transparent access to more powerful computing resources. Optionally, calculations can also be performed using locally installed copies of MODELLER and IMP. MultiFit and FoXS are part of the Integrative Modeling Platform (IMP) package (Russel et al., 2011) that performs simultaneous optimization of multiple restraint types to generate ensembles of assembly structures consistent with diverse experimental data (Alber et al., 2008; Lasker et al., 2010b; Robinson et al., 2007).
The Chimera user interfaces described here are a first step towards a more comprehensive graphical user interface (GUI) to control and visualize results from this suite of tools.
Next, we describe the current assembly modeling tools and then illustrate their range of capabilities on two example systems, GroEL chaperonin and ARP2/3.
MODELLER is used for homology or comparative modeling of protein three-dimensional structures (Eswar et al., 2001; Marti-Renom et al., 2000). The user provides an initial alignment of the sequence to be modeled (“target”) to the sequence(s) of one or more known structures (“templates”). MODELLER then calculates a set of plausible structures containing all non-hydrogen atoms. MODELLER implements comparative modeling by satisfaction of spatial restraints (Sali and Blundell, 1993) and can perform many additional tasks, including fold assignment, sequence- and structure-based alignments, de novo modeling of loops, and model assessment (Fiser and Sali, 2003; Fiser et al., 2000).
In Chimera, the modeling process can be initiated by input of only the target sequence, or if already available, a sequence alignment including the target and at least one other sequence for which a structure is known. If the input is only the target sequence, BLAST is used via a web service to search the PDB database for potential templates. The user can choose one or more of the hits to be fetched from the PDB and to be included along with the query (target) in a sequence alignment. Sequence alignments in Chimera are displayed in the Multalign Viewer tool (Meng et al., 2006). This tool includes many features, such as automatic sequence-structure communication, calculation of measures of conservation, and simple editing (e.g., adjusting gaps as well as adding and deleting sequences). When the alignment is satisfactory, the user can choose Structure -> Modeller Tools from the Multalign Viewer menu to set up and launch the MODELLER calculation, to be run either locally or via a web service. The process is run in the background and can be monitored with Chimera’s task manager. When the results become available, the models are displayed in Chimera and their associated scores shown in a table (Figure 1). The table lists the GA341 (Melo et al., 2002), zDOPE and DOPE scores (Pieper et al., 2011; Shen and Sali, 2006). Clicking “Fetch Scores” triggers a web service that calculates additional model scores: the estimated RMSD and estimated overlap (Eramian et al., 2008).
The MultiFit module of IMP simultaneously fits atomic structures of components into their assembly EM density map at resolutions as low as 25 Å (Lasker et al., 2009). The component positions and orientations are optimized with respect to a scoring function that includes the quality-of-fit of components in the map, the protrusion of components from the map envelope, and the shape complementarity between pairs of components. The scoring function is optimized by the exact inference optimizer DOMINO that efficiently finds the global minimum in a discrete sampling space. If cyclic symmetry is specified, the symmetry is imposed within the optimization procedure for added efficiency (Lasker et al., 2010a).
Chimera’s MultiFit GUI (under Tools -> Volume Data in the menu) takes as input one or more protein structures and an EM density map. Chimera allows editing structures and maps to generate the desired inputs: structures can be combined and subsets of their atoms selected or deleted, and specific regions of maps can be extracted. The user can further specify the level and resolution of the EM map, as well as the number of copies of subunits. Multiple copies of a structure can be fit assuming cyclic symmetry, or multiple different structures fit without using symmetry. The results are returned as a list of possible configurations with correlation scores indicating their goodness of fit to the density (Figures 2 and 5). Choosing a row in the list displays the corresponding set of structures in the Chimera graphics window. More than one row can be chosen to display multiple sets of results simultaneously for comparison; the different solutions are shown with different colors. A typical run takes a few minutes and can be performed locally or remotely via a web service.
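To make the notion of a correlation score concrete, the following is a minimal sketch of ranking candidate fits by the correlation between an experimental map and a map simulated from each fitted model. It assumes both maps are sampled on the same grid and flattened to lists of density values, and it uses a plain Pearson correlation as a stand-in; the exact score computed by MultiFit is not specified in the text, and the function names are hypothetical.

```python
import math


def correlation(map_a, map_b):
    """Pearson correlation between two density maps on the same grid."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


def rank_solutions(experimental, simulated_by_name):
    """Return (score, name) pairs sorted from best to worst correlation."""
    scored = [(correlation(experimental, sim), name)
              for name, sim in simulated_by_name.items()]
    return sorted(scored, reverse=True)
```

Sorting by this score mirrors the ordered list of configurations shown in the results dialog, although the real tool works on three-dimensional grids and a thresholded map.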
Fast X-Ray Scattering (FoXS), another module of IMP, is a rapid and accurate method for calculating a theoretical SAXS profile given an atomic structure (Schneidman-Duhovny et al., 2010). The method explicitly computes all interatomic distances and models the first solvation layer based on atomic solvent-accessible surface areas. Alternatively, a fast coarse-grained profile can be calculated based on protein Cα positions only. The theoretical SAXS profile can be fitted to an experimental SAXS profile by minimizing a penalty (χ²) function.
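As an illustration of how a profile can be computed from all pairwise distances, here is a minimal sketch based on the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij). It is a simplification and not the FoXS implementation: it assumes a single constant form factor for every point (roughly in the spirit of a Cα-only coarse-grained model) and omits the solvation layer and excluded-volume terms. The function name and parameters are hypothetical.

```python
import math


def debye_profile(coords, q_values, form_factor=1.0):
    """Scattering intensity at each q from 3D points via the Debye formula."""
    n = len(coords)
    # Each unordered pair is computed once and counted twice below.
    distances = [math.dist(coords[i], coords[j])
                 for i in range(n) for j in range(i + 1, n)]
    f2 = form_factor ** 2
    profile = []
    for q in q_values:
        total = n * f2  # self terms: sin(x)/x tends to 1 as r goes to 0
        for r in distances:
            x = q * r
            sinc = 1.0 if x == 0 else math.sin(x) / x
            total += 2.0 * f2 * sinc
        profile.append(total)
    return profile
```

The cost grows with the square of the number of points, which is consistent with the text's observation that running time increases from under a second to minutes as systems grow, and with the motivation for the coarse-grained option.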
The FoXS interface (in the Chimera menu, Tools -> Higher-Order Structure -> Small-Angle X-Ray Profile) takes as input one or more entire structures or the currently selected atoms and, optionally, an experimental SAXS profile. Results are returned as a two-dimensional plot (Figure 3). If an experimental profile is provided, the theoretical profile will be scaled to fit the experimental profile and the χ value representing their quality of fit reported in the legend. Advanced options include whether or not to adjust excluded volume and hydration parameters to improve the fit, whether or not to apply a background adjustment to the experimental data, and whether or not to perform coarse-graining. The typical running time is less than a second for a system of a thousand atoms, and can extend to a few minutes for tens of thousands of atoms. Users can modify the input structures and recalculate the SAXS profile, and multiple results can be shown on the same plot.
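The scaling and quality-of-fit step can be sketched as a weighted least-squares problem. The code below assumes the commonly used definition χ = sqrt((1/N) Σ ((I_exp(q_i) - c·I_model(q_i)) / σ_i)²), with the scale factor c chosen to minimize χ; it does not adjust excluded volume or hydration parameters and applies no background correction, so it covers only the simplest case described above. All names are hypothetical.

```python
import math


def fit_chi(exp_intensity, exp_sigma, model_intensity):
    """Scale a model profile to experimental data; return (scale, chi)."""
    if not (len(exp_intensity) == len(exp_sigma) == len(model_intensity)):
        raise ValueError("all profiles must have the same number of points")
    if not exp_intensity:
        raise ValueError("profiles must not be empty")
    points = list(zip(exp_intensity, exp_sigma, model_intensity))
    # Closed-form weighted least-squares solution for the scale factor.
    numerator = sum(e * m / s ** 2 for e, s, m in points)
    denominator = sum(m * m / s ** 2 for _, s, m in points)
    if denominator == 0:
        raise ValueError("model profile is zero everywhere")
    scale = numerator / denominator
    chi_sq = sum(((e - scale * m) / s) ** 2 for e, s, m in points) / len(points)
    return scale, math.sqrt(chi_sq)
```

Candidate models can then be ranked by calling `fit_chi` once per model against the same experimental profile and sorting by the returned χ, lowest first.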
Further structural refinement can employ the Chimera Rotamers tool. This tool allows viewing and evaluating likely conformations of amino acid sidechains and incorporating them into structures. A residue can be updated to a different conformation of the same type of amino acid residue or mutated into a different type.
Rotamer libraries are catalogs of distinct conformations of amino acid sidechains and their probabilities, usually extracted from a sample of high-quality structures (Dunbrack, 2002; Lovell et al., 2000). The probabilities can reflect not only residue type but also other information, such as the backbone of the residue, defined by the φ and ψ dihedral angles. The Rotamers tool in Chimera further combines data from a rotamer library with the evaluation of local non-bonded interactions to facilitate identifying likely sidechain conformations in the context of an entire structure.
In Chimera, one or more amino acid residues can be selected to indicate positions of interest within a protein structure. The Rotamers tool (under Tools -> Structure Editing in the menu) allows specifying an amino acid residue type, which could differ from that in the structure, and the rotamer library to use. Three library options are provided: the Dunbrack backbone-dependent rotamer library (Dunbrack, 2002), the Richardson backbone-independent library (Lovell et al., 2000) with the author-recommended common-atom values, and the Richardson library with modal (peak) values instead of common-atom. At each selected position, a “bouquet” of rotamers is displayed (Figure 4). The rotamer sidechain torsion (χ) angles and probabilities from the library are listed in a separate window. Choosing one or more rows in the list with the mouse displays only the corresponding rotamers in the main window and hides the others. Importantly, the probabilities in the list are taken from the rotamer library and are not affected by the structural environment, except by φ and ψ angles when the Dunbrack library is used. The rotamer list Columns menu allows calculating the number of clashes (bad contacts) and hydrogen bonds formed by each rotamer with its surroundings, and showing these results as additional columns in the list. If a suitable density map is present, the rotamers can also be evaluated for their fit to the density. The list can be sorted by the values in any column by clicking the column header. The library and local environment information together with interactive viewing of specific rotamers facilitate identifying the best conformation given the entire context. A single rotamer can be chosen and used to replace the pre-existing sidechain coordinates.
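The combination of library probability with environment-derived columns can be sketched as a filter-then-sort over a small table. This is an illustrative data structure only, not Chimera's internal representation; the `Rotamer` class, its fields, and `rank_rotamers` are hypothetical names, and the tie-breaking by hydrogen-bond count is an assumed choice rather than something the tool does automatically.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rotamer:
    chi_angles: Tuple[float, ...]  # sidechain torsion angles in degrees
    probability: float             # taken from the rotamer library
    clashes: int                   # bad contacts with the surroundings
    hbonds: int                    # hydrogen bonds with the surroundings


def rank_rotamers(rotamers: List[Rotamer], max_clashes: int = 0) -> List[Rotamer]:
    """Keep rotamers with at most max_clashes clashes, then sort by
    descending library probability, breaking ties by hydrogen-bond count."""
    kept = [r for r in rotamers if r.clashes <= max_clashes]
    return sorted(kept, key=lambda r: (-r.probability, -r.hbonds))
```

With the default `max_clashes=0`, the function returns the zero-clash rotamers in probability order, which corresponds to sorting the list by the clash column and then reading off the probabilities.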
The Rotamers tool is also useful for revealing cases where an experimental structure has a nonrotameric conformation, possibly due to other local constraints, or suggesting how well a structure might accommodate a particular mutation and what effect that mutation might have on its function.
Bacterial chaperonin GroEL is a widely studied ATP-regulated molecular machine, composed of two back-to-back stacked rings (cis and trans), each containing seven 60 kDa subunits of the same sequence (Horwich et al., 2007; Ranson et al., 2001; Sigler et al., 1998; Zeilstra-Ryalls et al., 1991). In this example, we model the structure of E. coli GroEL based on an EM density map (EMDB: 1042) (Ranson et al., 2001) and structural homologs of the subunit. We then assess models by a SAXS profile and examine sidechain rotamers of a pair of residues that form a salt bridge.
The homology modeling input is the amino acid sequence of the target, E. coli GroEL (UniProt id: P0A6F5) (Blattner et al., 1997; Burland et al., 1995; Hemmingsen et al., 1988). BLAST was launched via a Chimera web service to search the PDB database for known protein structures with sequence similarity to the target (the E. coli structures were filtered out from the search results). The following GroEL structures were chosen as the templates: chains A and B from Mycobacterium tuberculosis chaperonin 60.2 (PDB: 1SJP) (Qamra and Mande, 2004), chain H from Thermus thermophilus chaperonin (PDB: 1WE3) (Shimamura et al., 2004) and chain A from Paracoccus denitrificans chaperonin-60 (PDB: 1IOK) (Fukami et al., 2001). The pairwise identities between the target and template sequences are 59.3, 57.1, 63.2, and 68.0%, respectively (Figure 1).
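For readers unfamiliar with how a pairwise identity is derived from an alignment, the following minimal sketch computes it from two aligned sequence rows. It assumes identity is measured over columns where neither sequence has a gap; other conventions (for example, dividing by the length of the shorter sequence) give different numbers, and the text does not state which convention produced the values above. The function name is hypothetical.

```python
def percent_identity(target, template, gap="-"):
    """Percent identity over alignment columns where both rows have a residue."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(a, b) for a, b in zip(target, template)
               if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)
```

For two toy rows such as `"ACDE-F"` and `"ACDQGF"`, five columns are gap-free and four of them match, giving 80%.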
The homology modeling process was launched as a web service from the Chimera-MODELLER interface. After 20 minutes on the Linux cluster running the service, ten MODELLER models were returned along with statistical measures of model accuracy, GA341 (Melo et al., 2002) and Discrete Optimized Protein Energy (DOPE and zDOPE, raw and normalized, respectively) (Shen and Sali, 2006). A zDOPE score below -1 indicates that the distribution of atom pair distances in the model resembles that found in a large sample of known protein structures. The “Fetch Scores” option on the results dialog was used to calculate two additional scores, Estimated RMSD and Estimated Overlap relative to the true structure (Eramian et al., 2008).
These models can also be assessed by comparison to any of the several previously solved structures of E. coli GroEL. The highest-resolution structure available from the PDB for the intact complex in the apo form is 2NWC (3.02Å) (Kiser et al., 2007). We chose to compare the monomer models to chain A of 2NWC, or 2NWC_A. The ten models were superimposed on 2NWC_A using the Chimera MatchMaker tool, with resulting Cα RMSDs ranging from 1.19 to 1.73Å for all residues present in the structure (2NWC_A contains 524 of the 548 residues found within the UniProt sequence file).
The result so far is a model of a single subunit. While GroEL consists of two back-to-back stacked 7-subunit rings, we modeled one ring at a time using MultiFit’s cyclic symmetry mode.
Next, we used MultiFit to simultaneously fit seven copies of the subunit into the cryo-EM density map of the GroEL cis ring complex and then repeated the procedure for the trans ring (Figure 2). A cryo-EM map of the ATP-bound state of GroEL is available from the EM Data Bank (EMDB: 1042; Ranson et al., 2001). The cis ring can be extracted using the “region bounds” option in the Chimera volume viewer or using Chimera’s segmentation tool (Pintilie et al., 2010). Through the Chimera-MultiFit GUI, the model of the monomer from MODELLER previously identified as the best match to 2NWC_A (lowest-RMSD, see above) and the segmented cryo-EM map were chosen and submitted to the MultiFit web service. The top eleven solutions were returned. Similarly, the trans ring density was extracted and MultiFit was used to model the second heptameric ring. Subsequently, the two ring models were combined into a single atomic structure (Figure 2D) using their positions relative to the original map. The highest-correlation structures of the two rings were combined to give one atomic model, the second-highest-correlation structures of the two rings to give a second model, and so on for a total of six models of the apo form of GroEL.
Superposition of the highest-correlation model with the known structure of E. coli GroEL (PDB: 2NWC) using Chimera’s MatchMaker tool revealed a Cα RMSD of 2.66 Å for 7336 Cα atom pairs.
To further assess the results from previous steps, SAXS profiles of the six model structures and 2NWC were calculated and fit to the experimental profile provided by Chiu and Ludtke (Ludtke et al., 2001). We chose the coarse-grained SAXS profile option (Cα atoms only) because the complexes are large (52,668 atoms), allowing calculating and fitting a theoretical profile in approximately 2 minutes. The best match between the experimental and theoretical profiles is for Model03, with a mediocre χ of 10.2 (Figure 3). The relatively high χ value could be a consequence of several factors. First, the solution conformation of GroEL may differ from the crystal structure, either within subunits or between subunits. Second, it is also possible that multiple conformations are present in solution. Third, the SAXS profile may not have been determined or computed accurately.
Intersubunit salt bridges play a crucial role in the cooperativity of GroEL (Ranson et al., 2001) and its transitions between functional states (Hyeon et al., 2006; Yang et al., 2009). In the T state, residue E386 at the end of helix M in the intermediate domain forms salt bridges with R284, R285 and R197 on adjacent subunits. These salt bridges stabilize the apo form of GroEL’s T state. During the transition from the T state to the R state that occurs with ATP binding and hydration, these contacts are disrupted and E386 forms an intersubunit salt bridge with K80 instead. It is desirable to study the sidechains of these residues, but the resolution of the EM density map is too low for generating a precise model based on the map alone. Instead, we employ the Chimera Rotamers tool. The Dunbrack rotamer library was used in this example. Figure 4 shows the rotamers of E386 and R197 that have zero clashes (bad atomic contacts) with their surroundings (thin wires), so these are the rotamers that “fit” into the structure. The sidechain conformations in the model (which were determined by MODELLER at the monomer-modeling stage) are shown with sticks. For E386, the highest-probability zero-clash rotamer matches the conformation in the model. The probabilities are based only on backbone conformation, so clashes are the more important criterion in the context of the structure. For R197, none of the zero-clash rotamers closely matches the conformation in the model, but these rotamers are no less reasonable, and in fact the rotamer listed third in Figure 4 (third-highest-probability of the zero-clash rotamers) is better positioned to form a salt bridge with E386 than the conformation in the model (4.1Å between NH2 and OE1 vs. 5.9Å). 4.1Å is still too far for a good salt bridge interaction, but as close as possible given the backbone geometry.
The actin-related protein-2/3 (ARP2/3), a seven-protein asymmetric complex, plays a major role in the formation of branched actin-filament networks during diverse processes ranging from cell motility to endocytosis (Goley and Welch, 2006). The bovine ARP2/3 complex structure has been determined by X-ray crystallography (Robinson et al., 2001), revealing its molecular organization. However, a structural understanding of how the ARP2/3 complex mediates actin filament formation is still limited. EM studies have been useful for investigating ARP2/3 function (Egile et al., 2005). To demonstrate the potential application of the Chimera MultiFit tool, we simultaneously fit comparative models of the ARP2/3 subunits into a simulated 15Å density map of ATP-bound ARP2/3 (PDB id: 1TYQ) (Nolen et al., 2004) created using Chimera’s “molmap” command (Figure 5).
Comparative models were calculated separately for each of the seven target subunits. BLAST was launched via Chimera’s web service to search the PDB database for homologous protein structures. Twelve structures were used to model each of the seven target subunits (Table A.1); 1TYQ subunits were filtered out since these were used to generate the assembly density map.
A MODELLER comparative modeling process was then launched from the Chimera-MODELLER interface separately for each target subunit. Modeling a single subunit took 4-7 minutes on the Linux cluster running the MODELLER web service and produced five comparative models and associated quality scores, as described above. The “Fetch Scores” option in the results dialog was used to calculate two additional scores, Estimated Cα-RMSD and Estimated Overlap relative to the hypothetical true structure (Eramian et al., 2008). The model with the best Estimated Cα-RMSD was selected for each subunit (Table A.2).
The space sampled in simultaneous asymmetric fitting is generally larger than that sampled in symmetric fitting, as fewer geometric restraints are provided. Thus, the user has the option of performing either global or local searches via MultiFit. With the global search option, no prior information regarding subunit positions and orientations is used, so no initial placement of the structures is required. Alternatively, the user can approximately place the subunits within the density map and perform local searching, in which case MultiFit will sample possible solutions for each subunit only within a bounding box surrounding the initial placement. These initial placements can come from individual subunit fitting (using, for example, the Chimera Fit in Map tool), manual visual fitting, or other methods based on prior knowledge of the complex. For the ARP2/3 system, running MultiFit with global sampling gave correlation scores for the top ten solutions ranging from 0.8 to 0.86 and took approximately 5 minutes. Comparison of the top-scoring model to the reference crystal structure gave a Cα-RMSD value of 6.09 Å and 31% of Cα atoms superposing within 3.5 Å of their native positions. For local sampling Chimera is first used to launch a MultiFit anchor graph calculation. The anchor graph suggests approximate centroid positions for the seven subunits (Figure 5C). The user can then use the anchor graph as a guide for placing the individual subunits within the density. Once initial positions are set, a MultiFit run with local sampling can be launched from Chimera. With this approach, the top ten solutions for the ARP2/3 assembly had correlation scores ranging from 0.93 to 0.96. The local sampling run took about twice as long as global; although both calculations are parallelized on the web server (otherwise global sampling could take hours), the local search employs finer sampling. 
Comparison of the top ten local fitting solutions to the reference crystal structure gave Cα-RMSD values of 3.9-4.7 Å and 72-80% of Cα atoms superposing within 3.5 Å of their native positions.
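The two comparison measures reported for the fitting solutions can be computed as follows. This minimal sketch assumes the model and reference Cα coordinates are already paired atom by atom and expressed in the same frame (here, the frame of the density map), so no superposition step is performed; the function names are hypothetical.

```python
import math


def ca_rmsd(model, reference):
    """Root-mean-square deviation over paired Cα coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and paired")
    sq_sum = sum(math.dist(m, r) ** 2 for m, r in zip(model, reference))
    return math.sqrt(sq_sum / len(model))


def fraction_within(model, reference, cutoff=3.5):
    """Fraction of Cα atoms lying within cutoff Å of their reference position."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and paired")
    close = sum(1 for m, r in zip(model, reference)
                if math.dist(m, r) <= cutoff)
    return close / len(model)
```

Reporting both values is informative because RMSD is dominated by the worst-placed atoms, whereas the fraction within a cutoff shows how much of the assembly is placed approximately correctly.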
UCSF Chimera is an extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, sequence alignments, docking results, and molecular dynamics trajectories. MODELLER (http://salilab.org/modeller) implements comparative protein structure modeling by satisfaction of spatial restraints and can perform many additional tasks, including de novo modeling of loops and optimization of protein structure with respect to a flexibly defined objective function. IMP (http://salilab.org/imp) is a suite of modules, including MultiFit (http://modbase.compbio.ucsf.edu/multifit/) and FoXS (http://modbase.compbio.ucsf.edu/foxs/), for the integrative structural characterization of macromolecular assemblies.
The current work integrates comparative modeling with MODELLER, multiple-structure fitting into EM density maps with MultiFit, and SAXS profile calculation and comparison with FoXS into the Chimera system, with easy-to-use graphical user interfaces. The calculations can be run locally or via web services hosted by the UCSF Resource for Biocomputing, Visualization, and Informatics.
Chimera includes complete documentation and can be downloaded free of charge for noncommercial use, with versions available for Mac, Windows, and Linux (http://www.rbvi.ucsf.edu/chimera). The interfaces and features described here are available in Chimera daily builds dated September 1, 2011 and later, and will be available in Chimera production releases version 1.6 and higher. In addition, video tutorials illustrating use of the tools described in this paper are available at http://www.rbvi.ucsf.edu/chimera/videodoc/JSB-Yang/index.html.
Limitations should be noted. MultiFit performs rigid-body rather than flexible fitting, and the symmetric fitting currently handles only cyclic symmetries. Furthermore, the Chimera interfaces do not provide access to all options that would be available from running MODELLER or IMP modules directly. There is a trade-off between keeping the interfaces simple and easy to use, versus accommodating more controls and more types of calculations. For example, the Chimera-MODELLER interface allows including water and/or ligand molecules from template structures, but only a single protein chain or subunit can be modeled at a time.
However, the integration described here brings several advantages. The accessibility of MODELLER, MultiFit, and FoXS calculations is enhanced by simple graphical interfaces and the provision of web services, so that local installations are not required. Chimera can be used to search the PDB for modeling templates and to prepare structures, sequence alignments, and density maps as inputs to the modeling and fitting calculations. Results can be analyzed in several ways, including measuring distances and angles, identifying hydrogen bonds and other contacts, coloring to show properties such as residue hydrophobicity and sequence conservation, and superimposing structures.
Chimera’s task manager monitors web service data transfers and execution progress. Advanced users/developers could potentially use their own web services to run calculations launched from Chimera. Chimera’s web services are implemented using the Opal Toolkit (nbcr.net/software/opal) (Krishnan et al., 2009; Krishnan et al., 2006).
To date, structure determination of challenging macromolecular assemblies requires integration of different data types obtained by multiple methods (Alber et al., 2007). An integrated visualization-based platform can greatly facilitate modeling tasks and lower the barrier to their use. We have illustrated the modeling of two multi-protein complexes from sequence to 3D structure using Chimera, MODELLER, and IMP. We expect that integrative modeling protocols, coupled with a user-friendly visualization tool such as Chimera, will become increasingly useful and facilitate maximizing the coverage, accuracy, resolution and efficiency of the structural characterization of macromolecular assemblies.
The authors gratefully acknowledge Prof. Wah Chiu and Dr. Steven Ludtke for providing the SAXS experimental data for the GroEL example. The research of KL was supported by continuous mentorship from Prof. Haim J. Wolfson as well as a fellowship from the Clore Foundation Ph.D Scholars program. KL’s research was carried out in partial fulfillment of the requirements for a Ph.D. degree from TAU. This work was funded by NIH grants R01 GM083960, U54 RR022220, and PN2 EY016525 to AS and P41 RR001081 to TEF.