Gorgon occupies a unique niche among interactive modeling tools. It is the only software toolkit that has been developed exclusively for modeling protein structure directly from non-atomic resolution density maps. Like many other interactive modeling tools, such as Chimera (Pettersen et al., 2004), UROX (Siebert and Navaza, 2009), COOT (Emsley et al., 2010) and Sculptor (Birmanns et al., 2010), Gorgon has a modular design that uses a set of menu-driven processes to manipulate density maps and/or models. Gorgon not only offers tools for fitting known structures but also provides a set of unique utilities, including feature recognition and correspondence searches, for skeleton-based model construction in near-atomic resolution density maps. Currently, no other modeling toolkit provides such a complete interactive environment for modeling protein structure directly from a density map at non-atomic resolutions.
It should also be noted that the de novo modeling process in Gorgon results in a Cα-only model without consideration of mainchain and sidechain constraints. Such models are less detailed than those generally obtained from COOT or other modeling tools targeted at higher resolution crystallographic data, which include additional density and biophysical constraints. The utilities in Gorgon allow a user to rapidly and robustly build “first-approach” models that generally have the correct topology or protein fold but non-optimized atom positioning and/or assignment. Utilities commonly found in atomic modeling software, such as Ramachandran plots and rotamer libraries, are not used for Cα backbone construction and are thus not part of Gorgon's current toolkit. In the event that a density map does have sufficient resolution to visualize the majority of sidechains, Gorgon's initial model can be further optimized using conventional modeling and refinement software designed for X-ray crystallography.
Although Gorgon is designed around building de novo models directly from the cryo-EM density map, components of Gorgon can be used independently. As an example, secondary structure identification can be performed on nearly all subnanometer resolution density maps without using any of Gorgon's additional features. However, it is important to emphasize that all of Gorgon's tools are focused on subnanometer resolution density maps. Accordingly, Gorgon's “Fit to Density” tool is extremely quick and accurate for fitting atomic models to a density map, but it requires the presence of SSEs and thus would not work on low-resolution density maps.
The extensible framework of Gorgon provides users with a convenient and feature-rich environment for structure annotation at intermediate resolutions. This flexible framework allows for the integration of additional software packages, such as molecular dynamics and modeling programs, and facilitates the rapid development of new algorithms and tools for use within Gorgon's user interface. We envision that this design will allow for the incorporation of new tools, such as flexible fitting and segmentation routines, as the number of subnanometer resolution structures continues to grow.
Pitfalls in de novo modeling
While Gorgon was designed to provide a complete modeling environment for subnanometer to near-atomic resolution density maps, there are certain caveats. Ultimately, Gorgon's utilities are limited by the resolvability of map features and not by the stated map resolution. With the utilities in Gorgon, a density map is processed to produce an annotation, such as a skeleton, SSEs, a correspondence or a model. Therefore, if a map has sufficient nominal resolution but is of low quality, accurate results cannot be obtained from Gorgon or, for that matter, any other modeling software.
In Gorgon, the de novo modeling protocol is based on establishing a sequence-to-structure correspondence using SSEs. This correspondence depends on the ability to accurately predict SSEs in the sequence and identify them in the density map. Secondary structure predictions typically have a success rate of ~80%. Often, consensus predictions are built from many different predictions to reduce potential errors. However, even small errors in secondary structure prediction can result in register shifts, misidentified SSE lengths or even missing SSEs. During model building, these errors are propagated throughout the model, resulting in improper sequence-to-structure assignments. Whether they come from a consensus prediction or from better prediction methods, improvements in the prediction will translate directly to more accurate models.
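The idea of a consensus prediction can be sketched as a per-residue majority vote over several secondary structure strings. This is only an illustration under stated assumptions: the function names, the H/E/C alphabet and the rule that ties fall back to coil are hypothetical choices, not a description of how Gorgon or any particular prediction server builds its consensus.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Per-residue majority vote over equal-length secondary structure strings.

    Assumed alphabet: 'H' (helix), 'E' (strand), 'C' (coil).
    Ties are resolved as coil, which is an illustrative choice.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")

    consensus = []
    for states in zip(*predictions):
        counts = Counter(states)
        top_state, top_count = counts.most_common(1)[0]
        tied = [s for s, c in counts.items() if c == top_count]
        consensus.append(top_state if len(tied) == 1 else "C")
    return "".join(consensus)


def segments(ss, state):
    """Return inclusive, 0-based (start, end) index pairs for runs of `state`."""
    runs, start = [], None
    for i, s in enumerate(ss):
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(ss) - 1))
    return runs


if __name__ == "__main__":
    # Three hypothetical predictions that disagree at the helix ends.
    preds = ["CHHHHHCC", "CCHHHHHC", "CHHHHCCC"]
    ss = consensus_prediction(preds)
    print(ss, segments(ss, "H"))
```

In this toy input the three predictions disagree only at the helix termini, which is exactly where a one-residue disagreement would shift the register of the residues assigned to that SSE.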
Another potential hazard is the lack of anchor points in model building. A correspondence search provides the user with a set of SSEs visualized in the density that can be assigned to sequence. These SSEs serve as anchor points from which a model is “grown” along a skeleton path connecting these elements. To assist in growing and anchoring the backbone trace, sidechain density can also be used, though at present it is only sporadically visible even in the best cryo-EM density maps. As such, regions with few or no anchor points, such as long extended loops, can result in less accurate Cα placement.
The Rotavirus VP6 model illustrates these sources of error. The average error for the backbone trace was ~1.7 Å, indicating a high level of accuracy in describing the fold of the protein. While nearly 50% of all Cα atoms were within 3 Å of their corresponding position in the X-ray structure, ~11% of the Cα atoms had deviations greater than 8 Å. These deviations were often associated either with long loops between anchor points or with regions where the secondary structure prediction was inaccurate. At higher resolutions, sidechain density can aid in the placement of Cα atoms and help eliminate some of these outliers.
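Statistics of this kind can be computed from two matched lists of Cα coordinates. The sketch below is a minimal illustration, assuming the model and the reference structure are already superimposed and matched residue by residue. The function name and the use of a plain mean of per-atom distances are assumptions, since the text does not specify how the average backbone error was defined, so this is not expected to reproduce the reported values.

```python
import math


def ca_deviation_stats(model, reference, near=3.0, far=8.0):
    """Summarize per-residue C-alpha deviations between two matched traces.

    `model` and `reference` are equal-length sequences of (x, y, z) tuples
    in Angstroms, assumed to be already superimposed.
    """
    if not model or len(model) != len(reference):
        raise ValueError("traces must be non-empty and of equal length")

    deviations = [math.dist(a, b) for a, b in zip(model, reference)]
    n = len(deviations)
    return {
        "mean": sum(deviations) / n,
        "frac_within_near": sum(d <= near for d in deviations) / n,
        "frac_beyond_far": sum(d > far for d in deviations) / n,
    }


if __name__ == "__main__":
    # Hypothetical four-residue example: two exact matches, one 4 A
    # deviation and one 10 A outlier.
    model = [(0, 0, 0), (1, 0, 0), (0, 0, 0), (0, 0, 0)]
    reference = [(0, 0, 0), (1, 0, 0), (0, 4, 0), (0, 0, 10)]
    print(ca_deviation_stats(model, reference))
```

Reporting the fraction of outliers alongside the mean is useful because a few badly placed loop residues can coexist with a well-traced core.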
As already mentioned, a backbone trace is built along a path defined by the density skeleton, which is a simplified geometrical representation of the medial axis of the density. This means that atoms are placed linearly in all regions except for helices. β-strands therefore do not have the zig-zag appearance found in atomic resolution models, but rather are just a linear array of Cα atoms.
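Linear placement along a skeleton path can be pictured as resampling a polyline at a fixed arc-length interval. The following is a minimal sketch, not Gorgon's implementation: the function name is hypothetical, and the default spacing of 3.8 Å (the typical distance between consecutive Cα atoms) is an assumption chosen for illustration.

```python
import math


def place_along_path(path, spacing=3.8):
    """Place points at fixed arc-length intervals along a polyline.

    `path` is a sequence of (x, y, z) skeleton points in Angstroms.
    The first point is always kept; later points are interpolated
    linearly within each segment.
    """
    if len(path) < 2:
        raise ValueError("path needs at least two points")

    points = [tuple(path[0])]
    carried = 0.0  # arc length travelled since the last placed point
    for a, b in zip(path, path[1:]):
        seg = math.dist(a, b)
        if seg == 0:
            continue
        pos = spacing - carried  # offset of the next point in this segment
        while pos <= seg:
            t = pos / seg
            points.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
            pos += spacing
        carried = seg - (pos - spacing)
    return points


if __name__ == "__main__":
    # Hypothetical L-shaped skeleton path, 10 A long in total.
    skeleton = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
    for p in place_along_path(skeleton, spacing=2.0):
        print(p)
```

Because the spacing is measured along the path, two points on either side of a sharp bend end up closer together in space than the nominal spacing, and no alternating offsets are introduced, which is why a strand traced this way appears as a straight row of atoms.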
Finally, no explicit refinement tools are used in generating models in Gorgon. Simple distance warnings in Gorgon and visual inspection of density occupancy and bond geometry are essentially the only mechanisms for assessing model quality. As no computational refinement is done from directly inside Gorgon, no force fields or atomistic properties are considered during model building.
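A simple distance warning of this kind might look like the sketch below, which flags consecutive Cα pairs whose separation departs from an expected value. The function name, the 3.8 Å ideal distance and the 0.5 Å tolerance are illustrative assumptions and are not taken from Gorgon.

```python
import math


def distance_warnings(ca_atoms, ideal=3.8, tolerance=0.5):
    """Flag consecutive C-alpha pairs with an unusual separation.

    Returns a list of (index_i, index_j, distance) tuples for every
    neighbouring pair whose distance differs from `ideal` by more
    than `tolerance` (all values in Angstroms).
    """
    warnings = []
    for i, (a, b) in enumerate(zip(ca_atoms, ca_atoms[1:])):
        d = math.dist(a, b)
        if abs(d - ideal) > tolerance:
            warnings.append((i, i + 1, d))
    return warnings


if __name__ == "__main__":
    # Hypothetical trace whose last pair is stretched to about 6 A.
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (13.6, 0.0, 0.0)]
    for i, j, d in distance_warnings(trace):
        print(f"residues {i}-{j}: {d:.2f} A")
```

A check like this catches gross placement errors only. It says nothing about stereochemistry, which is why refinement in external software remains necessary.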
Despite their drawbacks, “first-approach” de novo models are topologically equivalent to a fully refined protein structure. Combined with constraints derived from their density map, these de novo models can be computationally refined with additional modeling software, such as Rosetta, to produce more accurate and stereochemically correct all-atom models.
Modeling and resolution
As all maps vary in composition, quality and resolution, it is difficult to assign an exact resolution cut-off for building models. This is in part due to the various resolution definitions and variability in resolvability of maps at ostensibly the same resolution. Clearly though, model building is easier and more reliable at higher resolution, though still possible even at lower resolutions (Böttcher et al., 1997; Conway et al., 1997; Zhou et al., 2001). Regardless of resolution, for building de novo backbone models, the density map must contain SSEs that can be clearly identified.
As demonstrated with the Rotavirus VP6 and Rice Dwarf Virus P8 capsid protein examples presented here, models could be constructed from their respective density maps despite the difference in resolution. Even where loops were somewhat ambiguous in the case of P8, the presence of well-defined helices allowed for short loops to be built between the helices in the lower domain. However, in the larger β-sheet rich upper domain of P8, no clear path could be found due to the lack of resolution and identifiable anchor points. In VP6, which has ostensibly the same protein fold as P8 but is nearly 3 Å better in resolution, the β-strands were clearly visible, making construction of a complete model possible. It should also be noted that a potential complication in higher resolution (better than 4 Å) structures is feature detection with SSEHunter, as the β-sheets begin to look like a series of parallel densities rather than a thin flat plate. As such, β-sheets at this resolution can simply be treated as a series of loops during modeling.
Model building requires a significant time investment and an understanding of density features at near-atomic resolutions. Size, complexity and quality of the density map all affect the model building process. Even the most experienced users may not be able to build a reliable model in poorly resolved regions of density maps. Therefore, the time and ease of building a de novo model depend on the map quality, the motif of the target protein and the experience of the user. Gorgon attempts to integrate and streamline the model building process in a user-friendly environment. Anecdotally, the GroEL model required approximately three months to construct by hand (Ludtke et al., 2008). With Gorgon, the construction of a similarly sized Rotavirus VP6 model took an experienced Gorgon user a single afternoon.
Since Gorgon's initial release, it has grown to include numerous utilities dedicated to annotating subnanometer resolution density maps from cryo-EM and other structural methodologies. In addition to these utilities, walkthroughs, video tutorials, bug-tracking and biological/computational references have been added to the Gorgon website (http://gorgon.wustl.edu), which details the tools and approaches found in Gorgon. As the number of subnanometer resolution structures continues to grow, we anticipate that the Gorgon user-base will continue to grow rapidly.