Current de novo model building procedures generally rely on the presence of structural landmarks from which manual or semi-automated model building is initiated. Pathwalking rapidly constructs first-approach models, represented as Cα backbone traces that are topologically equivalent to the protein’s tertiary structure, without requiring a priori knowledge. Such models serve as initial starting points for further refinement with software such as Rosetta or Modeller (Alber et al., 2007; Bradley et al., 2005; Schröder, Brunger and Levitt, 2007). Pathwalking is unique in that it is completely de novo, sequence-free, template-free, semi-automated and suitable for use on maps from 3 to 7 Å resolution. Unlike most of the modeling tools in cryo-EM, pathwalking does not use a structural template for model building, refinement or evaluation. Furthermore, pathwalking minimizes user intervention, unlike interactive modeling tools such as Gorgon, O or Coot (Baker et al., 2011; Emsley et al., 2010; Jones et al., 1991). X-ray crystallographic tools exist for (semi-)automatic model building; however, these utilities are targeted to higher resolution density maps, though some can potentially be applied to 3–4 Å resolution density maps (Cohen et al., 2004; Cowtan, 2006).
While pathwalking is almost completely automated, many control points have been added to allow for user input regarding potential paths. Evaluated visually, a good path should connect all pseudoatoms such that each is visited only once, contain no intersecting path segments, have reasonable connectivity (bond distances and angles) and have connections that lie within the density map. Additionally, the model is expected to have “realistic” structural features. Regions in the density map shown to have helices should have pseudoatoms and a path arranged helically; regions containing β-sheets should have parallel/anti-parallel strands. Threading the primary sequence onto a path and evaluating it in the context of SSEs and sidechain density can also be used in the evaluation of a model. If the user perceives a problem with the path or wishes to evaluate alternate paths, pathwalking can be run multiple times simply by varying the parameters for pseudoatom placement and/or path searching, adding constraints or manually adjusting “bad” regions of the trace. Such interventions may improve registration of SSEs and sidechains in the density map, which are not explicitly considered in pathwalking.
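Two of the path criteria listed above, visiting every pseudoatom exactly once and keeping reasonable bond distances, are simple to check numerically. The following is a minimal sketch and not part of the pathwalking tools: the function name `check_path`, the 0.5 Å tolerance and the use of the typical 3.8 Å Cα–Cα spacing as the ideal distance are illustrative assumptions.

```python
import math

CA_CA_DISTANCE = 3.8  # typical spacing (in Å) between consecutive Cα atoms
TOLERANCE = 0.5       # assumed tolerance (in Å); hypothetical, tune per map


def check_path(coords, path, ideal=CA_CA_DISTANCE, tol=TOLERANCE):
    """Report simple quality flags for a path through pseudoatoms.

    coords: list of (x, y, z) pseudoatom positions in Å.
    path:   list of indices into coords, in the order they are connected.
    """
    # A valid path is a permutation of all pseudoatom indices.
    visits_each_once = sorted(path) == list(range(len(coords)))
    # Flag consecutive pairs whose separation deviates from the ideal spacing.
    bad_bonds = [
        (i, j)
        for i, j in zip(path, path[1:])
        if abs(math.dist(coords[i], coords[j]) - ideal) > tol
    ]
    return {"visits_each_once": visits_each_once, "bad_bonds": bad_bonds}


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(check_path(atoms, [0, 1, 2, 3]))
    print(check_path(atoms, [0, 2, 1, 3]))
```

Checks of this kind only supplement visual inspection; agreement with the density map and with secondary structure features still has to be judged separately.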
To evaluate pathwalking, we created a large benchmark data set. In the initial test, we examined the TSP solvers for pathwalking in a set of 737 non-redundant protein structures. In this data set, we used the positions of the known Cα atoms as the pseudoatom inputs to e2pathwalker.py. This test showed that a correct path could be identified given reasonably spaced pseudoatoms. In the second benchmark, we considered not only the problem of path tracing but also the problem of placing pseudoatoms in simulated density maps. Our pathwalking approach produced correct topological models in all the examples, though some non-protein-like geometries were observed. In the final set of tests, we examined the entire pathwalking procedure on authentic density maps ranging from ~4–8 Å resolution. This benchmark covered a wide range of fold types and was representative of maps deposited in the EMDB and PDB. While paths through the higher resolution density maps contained a limited number of ambiguities, lower resolution density maps, like the ribosome density map, did not have unambiguous paths and were considerably harder targets. It should also be noted that some of the higher resolution density maps were not uniformly resolved and contained regions where the density was considerably more difficult to evaluate (e.g., the apical domain of GroEL). Overall, the set of simulated and authentic density maps provides a realistic baseline for what users should expect with density maps in the “near-atomic” resolution range.
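The first benchmark treats backbone tracing as a travelling-salesman-style problem over pseudoatoms. As a rough illustration of that idea, the sketch below builds an open path with a nearest-neighbor heuristic and improves it with 2-opt segment reversals. This is a swapped-in toy heuristic, not the TSP solvers used by e2pathwalker.py, and all function names are hypothetical.

```python
import math


def path_length(coords, path):
    """Total length of an open path visiting coords in the given order."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def nearest_neighbor_path(coords, start):
    """Greedy path: always step to the closest unvisited pseudoatom."""
    unvisited = set(range(len(coords))) - {start}
    path = [start]
    while unvisited:
        last = coords[path[-1]]
        nxt = min(unvisited, key=lambda k: math.dist(last, coords[k]))
        unvisited.remove(nxt)
        path.append(nxt)
    return path


def two_opt(coords, path):
    """Reverse sub-segments for as long as that shortens the open path."""
    best = list(path)
    best_len = path_length(coords, best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = path_length(coords, cand)
                if cand_len < best_len - 1e-9:  # ignore floating-point noise
                    best, best_len = cand, cand_len
                    improved = True
    return best


def trace_path(coords):
    """Try every start point and keep the shortest 2-opt-refined path."""
    candidates = (
        two_opt(coords, nearest_neighbor_path(coords, s))
        for s in range(len(coords))
    )
    return min(candidates, key=lambda p: path_length(coords, p))


if __name__ == "__main__":
    # Six pseudoatoms on a line, 3.8 Å apart, listed in scrambled order.
    atoms = [(7.6, 0, 0), (0.0, 0, 0), (15.2, 0, 0),
             (3.8, 0, 0), (19.0, 0, 0), (11.4, 0, 0)]
    print(trace_path(atoms))
```

The heuristic recomputes the full path length for every candidate reversal, so it is only practical for small toy inputs; dedicated TSP solvers scale far better and are less prone to poor local minima.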
In nearly all of our test cases, pathwalking produced topologically correct models (CLICK score close to 1), though the exact amino acid assignment was often out of register, resulting in relatively high RMS deviations when compared to the known structure. The emphasis in pathwalking is that models can be built directly from the density map with correct topologies, despite errors in amino acid assignments. As demonstrated, this level of error can be corrected with additional optimization steps (DiMaio et al., 2009). In GroEL, a single iteration of density-based refinement using Rosetta resulted in improved stereochemistry and geometry and also repaired the vast majority of the sequence shifts, lowering the RMS deviation by 16.4%. Additional rounds of refinement would likely further improve model quality.
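To see why a register error inflates the RMS deviation even when the trace follows the correct backbone, consider the toy calculation below. It is a sketch under stated assumptions: the model and reference are taken to share the same coordinate frame (both sit in the same map, so no superposition is performed), and the helper names `rmsd` and `best_register_shift` are hypothetical. It does not reproduce the CLICK score.

```python
import math


def rmsd(model, reference):
    """RMS deviation between equally long lists of (x, y, z) positions."""
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(sq / len(model))


def best_register_shift(model, reference, max_shift=4):
    """Find the index offset s (model index = reference index + s) that
    minimizes the RMSD over the overlapping residues."""
    results = []
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            m = model[shift:]
            r = reference[:len(reference) - shift]
        else:
            m = model[:shift]
            r = reference[-shift:]
        results.append((rmsd(m, r), shift))
    best_rmsd, best = min(results)
    return best, best_rmsd


if __name__ == "__main__":
    # Reference: ten Cα positions on a line, 3.8 Å apart.
    reference = [(3.8 * k, 0.0, 0.0) for k in range(10)]
    # Model: same trace, but every residue assigned two positions too far along.
    model = [(3.8 * (k + 2), 0.0, 0.0) for k in range(10)]
    print(rmsd(model, reference))
    print(best_register_shift(model, reference))
```

In this toy case the direct RMSD is two residue spacings, while scanning over offsets recovers a zero deviation at a shift of -2, which is the sense in which a topologically correct trace can still score poorly by RMS deviation.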
In the cases where pathwalking did not give the correct fold on the first iteration, models typically did not agree with the secondary structure predictions. In chain Q from the ribosome density map, several strands and loops were transposed (Figure S7). The model visually appeared to agree with the density map; however, it did not agree with the secondary structure, indicating a bad topological path. In this case, it was possible to constrain well-defined regions and calculate an alternate path (Figure S7, row 6).
Our approach requires that a single subunit be accurately segmented from the entire density map. Missing portions or extra density will result in poor pseudoatom placement. Depending on the level of mis-segmentation, pathwalking may not yield the correct protein fold. Therefore, it is imperative that segmentation be as accurate as possible. In practice, segmentation and model building at subnanometer resolutions are usually coupled and, as such, the pathwalking protocol may need to be run iteratively as subunit boundaries are defined.
With pathwalking, it is possible that the connections between pseudoatoms could be adversely affected by non-optimal pseudoatom placement. The TSP solvers do not consider this uncertainty. By adding random perturbations of varying strength to the pseudoatom coordinates and running e2pathwalker.py many times, alternative models can be computed. In most cases, the models in the ensemble will agree topologically, though differences may be seen in poorly resolved regions. Degenerate paths in a “fuzzy” loop may connect the same pseudoatoms in different orders yet still maintain the protein fold. Conversely, the same path may be achieved with a different set of pseudoatoms. In these cases, the user is required to judge which order of connectivity is best based on features in the density map, path geometry and a priori information. Additionally, a user can explicitly add or remove connections based on other biochemical information and/or visual interpretation. In all cases, the best model can generally be selected visually such that it meets basic protein structure requirements.
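The perturbation strategy described above can be sketched as follows. Gaussian noise of several strengths is added to the pseudoatom coordinates, a tracer is rerun on each perturbed set, and the resulting connectivity orders are tallied. The tracer here is a simple nearest-neighbor stand-in rather than e2pathwalker.py, and the noise levels, run counts and function names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def perturb(coords, sigma, rng):
    """Add independent Gaussian noise (std. dev. sigma, in Å) to each coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in coords]


def greedy_trace(coords, start=0):
    """Stand-in tracer: repeatedly connect to the nearest unvisited pseudoatom."""
    unvisited = set(range(len(coords))) - {start}
    path = [start]
    while unvisited:
        last = coords[path[-1]]
        nxt = min(unvisited, key=lambda k: math.dist(last, coords[k]))
        unvisited.remove(nxt)
        path.append(nxt)
    return path


def canonical(path):
    """Treat a path and its reverse as the same connectivity."""
    return tuple(path) if path[0] <= path[-1] else tuple(reversed(path))


def ensemble(coords, sigmas=(0.2, 0.5, 1.0), runs_per_sigma=20, seed=0):
    """Count how often each connectivity order appears across perturbed runs."""
    rng = random.Random(seed)
    counts = Counter()
    for sigma in sigmas:
        for _ in range(runs_per_sigma):
            noisy = perturb(coords, sigma, rng)
            counts[canonical(greedy_trace(noisy))] += 1
    return counts


if __name__ == "__main__":
    atoms = [(3.8 * k, 0.0, 0.0) for k in range(8)]
    for path, n in ensemble(atoms).most_common(3):
        print(n, path)
```

A connectivity that dominates the tally across noise levels is a reasonable candidate for the consensus trace, while regions where the order changes between runs mark the places that need visual inspection or added constraints.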
Map resolution is also a factor in model accuracy. From our benchmarks, it was possible to construct first-approach models even at 7–8 Å resolution with our pathwalking tools. As all density maps vary in composition, quality and resolution, it is difficult to assign hard limits for pathwalking. This is in part due to the various resolution definitions, variability in resolvability of density maps and the SSE content in the protein. The accuracy of pathwalking is a direct reflection of the resolvability of features in a density map. At subnanometer resolutions, α-helices tend to be better resolved than loops and β-sheets, making it possible to construct models for all-helical proteins at lower resolutions (Figures S2–S5). A well-defined map containing mostly helices at 7 Å resolution will undoubtedly yield better results than a poorly resolved density map of an all-β protein at 4.5 Å resolution. Ultimately, the resolvability of structural features dictates the limitations of our approach. Therefore, we cannot specify an absolute resolution range for pathwalking.
Model Validation with Pathwalking
Beyond model construction, our pathwalking procedures can be used to assess de novo model validity and report potential alternative topologies. As in the case of ε15 gp7, alternate models using the pathwalking procedure can highlight potential areas of structural ambiguity. This can be particularly useful when dealing with models where resolvability is limited.
Pathwalking represents the first step in sequence- and template-free modeling in near-atomic resolution density maps. This process is capable of rapidly computing first-approach models for individual subunits in large macromolecular complexes. Additionally, the same utilities can be used to validate models and display alternate topologies. We believe our pathwalking tools will become an important part of model building and validation for the growing number of near-atomic resolution density maps produced by cryo-EM and X-ray crystallography.