Single particle cryo-electron microscopy (cryo-EM) is a technique aimed at determining the structures of large macromolecular complexes in unconstrained, close-to-physiological conditions. The power of the method has been demonstrated in selected studies of highly symmetric molecules, where the resolution attained permitted backbone tracing. However, most molecular complexes appear to exhibit the intrinsic conformational variability they need to perform their functions. It is therefore increasingly recognized that sample heterogeneity constitutes a major methodological challenge for cryo-EM. To overcome it, dedicated experimental and, in particular, computational multi-particle approaches have been developed. Their applications point to a future for cryo-EM as an experimental method uniquely suited to visualizing the conformational modes of large macromolecular complexes and machines.
Cryo-electron microscopy (cryo-EM) in combination with digital image processing (the single particle approach) is well suited to determining the structures of large macromolecular complexes and machines at intermediate resolution (6-30 Å). In the course of single particle image processing, thousands to hundreds of thousands, or even millions, of particle projections are combined to yield an averaged 3-D density map. Structure determination using the single particle technique rests on the premise that the specimen can be purified such that the macromolecular complexes exist in a soluble, isolated and structurally identical form. Only then can the EM images be treated as projections of a single particle. Although modern microscopes can deliver resolution better than 1 Å, the short exposure time imposed by radiation damage to the specimen is a major obstacle: it precludes routinely achieving the total signal-to-noise ratio (SNR) that would be needed for atomic resolution of the determined structures. Nevertheless, structures that permit backbone tracing of the entire secondary structure are at hand, at least for highly symmetric particles such as (icosahedral) virus capsids [3-5] or GroEL.
However, it has become increasingly clear in recent years that the main premise of the single particle approach, namely that the macromolecular complex exists in multiple identical copies of the same structure, is rarely if ever fulfilled. On the one hand, there is compositional heterogeneity. Macromolecular assemblies are unstable to some extent. As biochemical purification protocols are often a compromise between intactness and purity of the complex, the random loss of certain components during preparation can result in subpopulations of incomplete or damaged complexes. On the other hand, there is conformational heterogeneity. Even relatively stable macromolecular machines such as the ribosome undergo large conformational changes during their functional cycle and can therefore occur in a mixture of conformational states [7-11]. Thus, for most of the macromolecular complexes studied, sample heterogeneity appears to be a major obstacle to structure determination, in addition to the difficulties caused by the low SNR of the data. Depending on the extent of the heterogeneity and the degree to which the structural variability is localized, the differences between individual complexes can influence the resulting 3-D map in several ways, not least because the image processing steps themselves (alignment, classification, determination of Euler angles) can be adversely affected.
Variability of the complex that is not localized but uniformly distributed over its structure should not affect the alignment in a major way. This kind of variability will mainly affect the high spatial frequencies of the reconstructed object, resulting in decreased resolution; nevertheless, the 3-D map will be correct in its general features. Another possibility is that a small region of the complex is significantly more flexible than the bulk of the structure. In this case the alignment of the projection data will be dominated by the major, invariable part of the complex, yielding essentially the correct structure, except that the spatial resolution within the flexible domain will be lower than the average resolution of the cryo-EM map. In addition, flexible components may have lower densities than the average density of the map (as a result of “conformational averaging”), thus becoming invisible at the standard density threshold selected to visualize the complex. Substoichiometric occupancy of components of the complex (for example ligands) has similar effects and precludes their direct visualization. The most serious problem, however, appears to be caused by a third type of heterogeneity: major conformational variability, manifested as the coexistence of multiple substates of the complex in the sample. The average over such a mixture will not represent the structure of the complex faithfully. Nor is the solved structure a simple average of the individual complexes, in their respective basic states, whose projections were recorded, since 3D projection alignment of a mixed data set will most likely introduce additional errors in the orientation parameters.
In the presence of strong compositional and/or conformational heterogeneity, it can be prohibitively difficult to determine a structure using the single particle EM technique. The adverse influence of even small structural variations among the macromolecular complexes on the 3D reconstruction becomes increasingly dominant at high spatial frequencies and thus hampers improvement of the resolution for a given specimen. It is therefore now recognized that heterogeneity in the cryo-EM data set is a major limiting factor in structure determination by single particle cryo-EM, and a variety of experimental and computational methods have been proposed to address the problem. At the same time, it is recognized that, although admittedly posing a methodological challenge, the analysis and visualization of the conformational modes of a macromolecular assembly in solution yields biological information of utmost importance. From what can be considered multi-particle cryo-EM, one expects new information about the structural dynamics of macromolecular complexes and machines to emerge in the years to come.
Structural heterogeneity of the specimen significantly complicates the computational methodology required for structure determination. First of all, one has to establish beyond doubt that such variability is indeed present in the data, and that difficulties in structure determination or refinement are not caused by low image quality or simply by mistakes in the computational protocols employed. This is far from trivial, because reliable tools to validate a cryo-EM map in the absence of external information are still lacking. Nevertheless, considerable progress has been made in tackling sample heterogeneity, and a variety of approaches have been developed. In the following, we give an overview of computational multi-particle approaches. Advances in sample purification and grid preparation are also of great significance in this context, but remain beyond the scope of this review.
Careful analysis of EM projection images in 2D is always recommended, as 2D single particle methods are in many instances theoretically sound and much better developed than their 3D counterparts, not to mention much faster to compute. This is particularly true for computational tools relevant to the analysis of conformational flexibility, such as the calculation of variance fields of images, both in real and in Fourier space, eigenanalysis of the variance/covariance matrix of the data, or cluster analysis of aligned images. The most straightforward approach is to compare similar-looking 2D averages and interpret the differences as evidence for flexibility of the complex [15,16].
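As an illustration of variance fields, here is a minimal NumPy sketch. It assumes the images are already aligned and stacked in an array of shape (N, H, W). The function name `variance_fields` and the synthetic data are hypothetical. Pixels whose variance rises well above the noise floor mark candidate flexible or partially occupied regions.

```python
import numpy as np

def variance_fields(stack):
    """Mean image plus per-pixel variance (real space) and per-frequency
    variance (Fourier space) of an aligned stack of shape (N, H, W)."""
    stack = np.asarray(stack, dtype=float)
    mean = stack.mean(axis=0)
    var_real = stack.var(axis=0, ddof=1)
    ft = np.fft.fft2(stack)                  # 2D FFT of every image (last two axes)
    # For complex input np.var averages |x - mean|**2, so the result is real
    var_fourier = ft.var(axis=0, ddof=1)
    return mean, var_real, var_fourier

# Synthetic check: an invariant core plus a small domain present in every other image
rng = np.random.default_rng(0)
core = np.zeros((32, 32)); core[8:24, 8:24] = 1.0
domain = np.zeros((32, 32)); domain[2:6, 14:18] = 1.0
stack = np.array([core + (i % 2) * domain + rng.normal(0.0, 0.1, core.shape)
                  for i in range(40)])
mean, var_real, var_fourier = variance_fields(stack)
peak = np.unravel_index(np.argmax(var_real), var_real.shape)   # lies inside `domain`
```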
While direct analysis of average and variance fields may be cumbersome in practice, eigenanalysis has proved to be an indispensable tool in cryo-EM. Originally, the method was used to detect staining artifacts in the data or slight misalignments due to “rocking” of the molecule on the support grid. However, in the absence of obvious artifacts, it is possible to interpret the results of eigenanalysis in terms of conformational states of the molecule.
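Eigenanalysis of an aligned stack can be sketched as a generic principal component analysis, computed via the singular value decomposition of the mean-centred data matrix. The rows of `Vt` are the eigenvectors ("eigenimages") of the pixel covariance matrix. Projecting each centred image onto them gives its factorial coordinates. The function `eigen_analysis` is hypothetical. Established cryo-EM packages may use different metrics or normalizations.

```python
import numpy as np

def eigen_analysis(stack, n_components=3):
    """PCA of an aligned stack (N, H, W): eigenimages, factorial coordinates
    and eigenvalues (variance captured by each eigenimage)."""
    n = stack.shape[0]
    X = stack.reshape(n, -1).astype(float)
    Xc = X - X.mean(axis=0)                      # centre on the average image
    # Rows of Vt are eigenvectors of the pixel covariance matrix Xc.T @ Xc / (n - 1)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenimages = Vt[:n_components].reshape(n_components, *stack.shape[1:])
    coords = Xc @ Vt[:n_components].T            # factorial coordinates, shape (n, k)
    eigenvalues = s[:n_components] ** 2 / (n - 1)
    return eigenimages, coords, eigenvalues

# Synthetic check: the first 20 images lack a small domain, the last 20 carry it
rng = np.random.default_rng(1)
core = np.zeros((24, 24)); core[6:18, 6:18] = 1.0
domain = np.zeros((24, 24)); domain[1:5, 10:14] = 1.0
stack = np.array([core + (i >= 20) * domain + rng.normal(0.0, 0.1, core.shape)
                  for i in range(40)])
eigenimages, coords, eigenvalues = eigen_analysis(stack)
```

The first factorial coordinate separates the two groups by sign. The first eigenimage is concentrated on the variable domain, which is the kind of result that can be read in terms of conformational states once artifacts have been excluded.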
As powerful as the 2D approach can be, it is not in wide use because of difficulties in interpreting the results unequivocally. A subset of 2D images distinguished by a high degree of homogeneity has to be identified within a larger set of images with the help of clustering techniques. However, what constitutes a homogeneous subset is a subjective decision, and while clustering techniques can serve as guiding tools, the conclusions cannot be supported by any statistical analysis. The observed “flexibility” might quite likely be due to grouping of projection views from slightly different angular directions, and in the absence of a 3D structure this possibility cannot be ruled out. Nevertheless, analysis of 2D data supported by principal component analysis (PCA) can provide unique insight into the conformational stability of the sample, and it has undergone a resurgence in recent years, particularly in conjunction with higher-dimensional methods.
A logical extension of 2D analysis is to carry the findings into 3D structure determination, thereby confirming that the variability observed in 2D reflects the presence of conformers of the protein in 3D. The most robust approaches are based on experimental determination of some of the necessary orientation parameters, as in the Random Conical Tilt (RCT) technique. By recording two micrographs of the same area of the grid, one tilted by 50° to 60° and the other untilted, the technique immediately provides robust estimates of two of the Eulerian angles of the projection images. The third is established by 2D alignment of the selected particle images. Moreover, it is possible (and advisable) to apply cluster analysis to the aligned untilted 2D images in order to verify that they are indeed projections of the same 3D molecule. The presence of distinct 2D class averages opens up the possibility of computing a 3D reconstruction for each of them. A direct consequence is that one can determine the relative orientations of these independent reconstructions and merge them, reducing the artifacts that result from missing Fourier information, the so-called missing cone problem of RCT-derived structures. More importantly, it also becomes possible to deduce the presence of conformers simply by comparing different RCT reconstructions obtained from the same sample [22-25]. Similar results are obtained with the alternative “orthogonal tilt” data collection method.
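The RCT geometry can be made concrete with rotation matrices, under assumptions stated here rather than taken from any particular package: a ZYZ Euler convention, a rotation that maps particle coordinates to microscope coordinates, and hypothetical function names. The tilt-axis azimuth and the tilt angle come from the tilt geometry, and the in-plane angle comes from 2D alignment of the untilted partner image. Because particles in one tilt pair differ only in-plane, their beam directions, expressed in the particle frame, fall on a cone whose half-angle equals the tilt angle.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rct_rotation(tilt_axis_deg, tilt_deg, inplane_deg):
    """Particle-to-microscope rotation, assuming ZYZ: in-plane angle (from 2D
    alignment), then the tilt about y, then the tilt-axis azimuth."""
    phi, theta, psi = np.radians([tilt_axis_deg, tilt_deg, inplane_deg])
    return rot_z(phi) @ rot_y(theta) @ rot_z(psi)

def view_direction(R):
    """Beam direction (microscope z axis) expressed in the particle's own frame."""
    return R.T @ np.array([0.0, 0.0, 1.0])

# Same tilt axis and tilt angle, three different in-plane angles
views = [view_direction(rct_rotation(30.0, 55.0, psi)) for psi in (0.0, 90.0, 180.0)]
cone_angles = [np.degrees(np.arccos(np.clip(v[2], -1.0, 1.0))) for v in views]
```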
More often than not, collecting high-tilt cryo-EM data is extremely challenging, and even if problems with charging and other instabilities are overcome, the data are of unsatisfactory quality. Significant effort is therefore invested in developing computational methods that use only untilted images. The goal is to deliver properly sorted homogeneous subsets of 2D projection images while at the same time establishing the 3D geometrical relations between them. Methods in this class rely on the central section theorem, which states that the Fourier transforms of 2D projections of a 3D object are central sections of the 3D Fourier transform of that object. A straightforward consequence is that the Fourier transforms of any two projections intersect along a line, henceforth called a common line. One can find 3D orientations of the 2D projections such that the pairwise discrepancies between respective common lines are minimized, which yields an ab initio model of the structure [27,28]. Further, given these discrepancies, it should in principle be possible to simultaneously sort the data into homogeneous subsets. In a simpler application of common lines, the orientations of the projections are assumed to be known, and discrepancies along common lines are used only for clustering of the 2D data. The approach can be very effective, as in the demonstration of the “breathing” core of the pyruvate dehydrogenase complex. While promising, methods based on common lines perform poorly for low-SNR data.
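The central section theorem and the common line it implies can be checked numerically with the discrete Fourier transform. Projecting a toy volume along z or along x and taking the 2D FFT reproduces the kz = 0 and kx = 0 planes of its 3D FFT, and both planes contain the ky axis. The axis-aligned projections and the normalized discrepancy measure are simplifying assumptions. Arbitrary orientations require interpolation of the Fourier lines.

```python
import numpy as np

rng = np.random.default_rng(2)
vol = rng.standard_normal((16, 16, 16))     # toy zero-mean density, axes (x, y, z)
F3 = np.fft.fftn(vol)

proj_z = vol.sum(axis=2)                    # projection along z: image over (x, y)
proj_x = vol.sum(axis=0)                    # projection along x: image over (y, z)
F_z, F_x = np.fft.fft2(proj_z), np.fft.fft2(proj_x)

# Central section theorem (exact for the DFT along coordinate axes)
assert np.allclose(F_z, F3[:, :, 0])        # central plane kz = 0
assert np.allclose(F_x, F3[0, :, :])        # central plane kx = 0

def common_line_discrepancy(a, b):
    """Normalised squared difference between two estimates of the same Fourier line."""
    return float(np.sum(np.abs(a - b) ** 2) / np.sum(np.abs(a) ** 2 + np.abs(b) ** 2))

# Both planes contain the ky axis (kx = kz = 0): the common line of the two projections
d_true = common_line_discrepancy(F_z[0, :], F_x[:, 0])
# Pairing the wrong lines (here the kz axis of the x-projection) gives a large discrepancy
d_wrong = common_line_discrepancy(F_z[0, :], F_x[0, :])
```

With noisy data, `d_true` is no longer zero, which is why the approach degrades at low SNR.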
As methods based on common lines are not particularly robust, it is tempting to use more of the information contained in the data, i.e., discrepancies not only along selected 1D lines in Fourier space but between entire 2D images. This approach is based on the “angular continuity” principle. This is the common-sense observation that 2D projections of a 3D object with similar angular directions are similar, while those whose angular directions are far apart are not necessarily similar (for symmetric objects, this obviously holds only within the asymmetric unit), and that these direction-dependent dissimilarities should be reflected by any discrepancy measure. This principle was used for ab initio structure determination and proved to be quite robust. If the structure is expected to exist in multiple states, one can cluster the 2D projection data assigned to the same (or almost the same) projection direction. One then hopes that dissimilarities between class averages corresponding to different substates are larger than those between class averages corresponding to the same substate but slightly different angular directions. All that remains is to reconcile the class averages mutually such that angular consistency is maintained. As long as the heterogeneity is caused by the presence or absence of components of the complex (ligands), one can expect the complexes to have different overall molecular masses, so that projections along the same angular direction have different average densities. In this case, the mutual reconciliation is achieved by labeling the clusters within each angular set according to their average pixel density, which finally leads to the calculation of multiple initial 3D structures of the complex. The method was proposed as “focused classification”, the name reflecting the idea of restricting the 2D analysis to regions of interest selected on the basis of high variability in 3D.
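The mass-based reconciliation step can be sketched as follows. Within each group of images sharing a projection direction, a 1D two-means on the mean pixel density splits the images, and the denser cluster is given the same label (1, e.g. ligand bound) in every group. The grouping by direction is assumed to be given, and the function name and toy data are hypothetical. This covers only the labeling step, not the published focused classification procedure as a whole.

```python
import numpy as np

def label_by_average_density(images, angle_group, n_iter=20):
    """Within each angular group, split images by mean density (1D two-means) and
    label the denser cluster 1 in every group, so labels agree across groups."""
    images = np.asarray(images, dtype=float)
    angle_group = np.asarray(angle_group)
    density = images.reshape(len(images), -1).mean(axis=1)
    labels = np.zeros(len(images), dtype=int)
    for g in np.unique(angle_group):
        idx = np.flatnonzero(angle_group == g)
        d = density[idx]
        lo, hi = d.min(), d.max()                 # initial cluster centres
        heavy = d > 0.5 * (lo + hi)               # nearest-centre assignment
        for _ in range(n_iter):
            if heavy.all() or not heavy.any():
                break
            lo, hi = d[~heavy].mean(), d[heavy].mean()
            heavy = d > 0.5 * (lo + hi)
        labels[idx] = heavy
    return labels

# Toy data: three projection directions, a 4x4 "ligand" present in every other particle
rng = np.random.default_rng(3)
views = [rng.random((16, 16)) for _ in range(3)]
ligand = np.zeros((16, 16)); ligand[4:8, 4:8] = 1.0
truth = np.array([i % 2 for i in range(60)])
groups = np.array([i % 3 for i in range(60)])
images = [views[g] + t * ligand + rng.normal(0.0, 0.1, (16, 16))
          for g, t in zip(groups, truth)]
labels = label_by_average_density(images, groups)
```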
Similar reasoning was applied to the problem of sorting molecules by size, in which case the sorting of the data was supplemented by PCA of the 2D projection images. The latter approach was later extended to the detection of two conformers (with and without a ligand) [32,35]. While the initial formulation of focused classification was general, all applications so far have been restricted to detecting two classes, corresponding to the presence and absence of a ligand. It remains to be seen whether more general applications will follow and to what extent such methods can be extended to studies of the conformational flexibility of macromolecular complexes.
Studies of conformational variability based on initial analysis of 2D cryo-EM projection data proceed in stages, each involving somewhat subjective decisions. For a more straightforward and objective approach, it is preferable to attack the problem directly in 3D. When the imaged sample is a mixture of multiple conformers whose respective basic structures are known or can be guessed, “supervised classification” is a relatively easy method to use. One simply computes projections of the known (or guessed) 3D templates over an exhaustive set of angular directions, compares all cryo-EM 2D data with all templates, and assigns each image to the most similar template. This approach has been used to enhance the occupancy of a ribosomal ligand or to distinguish between conformational states of the ribosome related by a ratchet-like rotation. However, the question of the bias introduced by the choice of initial templates has to be addressed very carefully. Moreover, as substantial prior knowledge is required, it is difficult to use this method to explore new systems or to discover unexpected behavior of the specimen.
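Supervised classification reduces to a nearest-template search once the reference projections exist. This sketch assumes they are precomputed in an array of shape (T, K, H, W), one projection per template t and direction k. Here the toy "directions" are just the three coordinate axes. It also assumes the normalized cross-correlation coefficient as the similarity measure, and the function names are hypothetical. Each image receives both a template label and a direction.

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation coefficient between two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def supervised_classify(images, ref_projections):
    """ref_projections: (T, K, H, W). Each image is assigned the template t and
    direction k of its best-matching reference projection."""
    labels, directions = [], []
    for img in images:
        scores = np.array([[ncc(img, p) for p in projs] for projs in ref_projections])
        t, k = np.unravel_index(np.argmax(scores), scores.shape)
        labels.append(int(t))
        directions.append(int(k))
    return np.array(labels), np.array(directions)

# Toy templates: B differs from A in one corner
rng = np.random.default_rng(4)
vol_a = rng.standard_normal((8, 8, 8))
vol_b = vol_a.copy()
vol_b[:4, :4, :4] = rng.standard_normal((4, 4, 4))
refs = np.array([[v.sum(axis=k) for k in range(3)] for v in (vol_a, vol_b)])
true_t = np.array([i % 2 for i in range(30)])
true_k = np.array([i % 3 for i in range(30)])
images = [refs[t, k] + rng.normal(0.0, 0.5, (8, 8)) for t, k in zip(true_t, true_k)]
labels, directions = supervised_classify(images, refs)
```

The templates enter only through `ref_projections`. Any bias in them is therefore passed straight into the assignment, which is the concern raised above.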
The bias of initial 3D templates can be mitigated, if not eliminated altogether, by using a 3D version of multi-reference alignment, which has been used quite successfully in 2D work. In this method, each 2D EM projection image is aligned with all 2D reference projections of the template volumes (whose number is in principle arbitrary). Orientation parameters are obtained as in 3D projection alignment. In addition, the values of the similarity measure are analyzed, each image is assigned to the most similar 3D template together with the corresponding orientation parameters, and updated template volumes are computed [8,10,33,35,38-42]. The process is iterated until stable structures are obtained. The method constitutes a version of unsupervised classification in which class membership and orientation parameters are estimated simultaneously.
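The iteration can be illustrated with a 2D, translation-only toy of multi-reference alignment (hypothetical functions `best_shift` and `multi_reference_alignment`). Class membership and shift are estimated together, each image is shifted back onto its chosen reference, and the references are re-averaged. The similarity score is the cross-correlation peak divided by the reference norm, an assumption made so that brighter references are not favored. In the 3D version, the shift search becomes a search over projection directions of each template volume, and the class averages become reconstructions.

```python
import numpy as np

def best_shift(ref, img):
    """Circular cross-correlation via FFT: returns the shift s maximising
    sum_x ref[x] * img[x + s], and that maximum divided by ||ref||."""
    cc = np.fft.ifft2(np.conj(np.fft.fft2(ref)) * np.fft.fft2(img)).real
    s = np.unravel_index(np.argmax(cc), cc.shape)
    return s, cc[s] / np.linalg.norm(ref)

def multi_reference_alignment(images, refs, n_iter=5):
    """K-means-style loop: assign each image to its best reference, undo its
    shift, and recompute every reference as the average of its members."""
    refs = [np.array(r, dtype=float) for r in refs]
    for _ in range(n_iter):
        labels, aligned = [], []
        for img in images:
            fits = [best_shift(r, img) for r in refs]
            k = int(np.argmax([score for _, score in fits]))
            shift = fits[k][0]
            labels.append(k)
            aligned.append(np.roll(img, (-shift[0], -shift[1]), axis=(0, 1)))
        labels, aligned = np.array(labels), np.array(aligned)
        refs = [aligned[labels == k].mean(axis=0) if np.any(labels == k) else refs[k]
                for k in range(len(refs))]
    return labels, refs

# Toy data: randomly shifted, noisy copies of a square and of a bar
rng = np.random.default_rng(5)
square = np.zeros((16, 16)); square[2:6, 2:6] = 1.0
bar = np.zeros((16, 16)); bar[8, 2:12] = 1.0
truth = np.array([0] * 10 + [1] * 10)
images = [np.roll((square, bar)[t], tuple(rng.integers(0, 16, 2)), axis=(0, 1))
          + rng.normal(0.0, 0.1, (16, 16)) for t in truth]
labels, refs = multi_reference_alignment(images, [images[0], images[10]])
```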
To make the method computationally feasible, the initial 3D templates are not derived randomly from the data, as is standard in K-means clustering, but are instead determined by one of the 2D-to-3D approaches, for example focused classification. In the case of the viral capsid, Heymann and colleagues used linear combinations of two previously determined cryo-EM maps, of the processed and unprocessed capsids, to obtain a gallery of initial 3D templates. In the study of fatty acid synthetase, the initial 3D templates were obtained by normal mode analysis of the structure determined from the entire data set of projection images. In addition, Brink and co-workers tested the bias introduced by the selection of initial models, and in this case the results of the refinement proved to be robust. For ribosomal complexes, a variation of the method was introduced in which the multi-particle refinement is initiated by adding a “neutral” template (in this case a vacant ribosome) to the pool of determined complexes [43,44]. After convergence of the multi-reference alignment, the pool of templates is enlarged again, until stable subpopulations are obtained.
As in the 2D case, the multi-particle refinement method is a version of the K-means clustering procedure that is invariant with respect to projection-to-volume registration (see Appendix in ). It therefore inherits the advantages and disadvantages of both methods, compounded by the fact that the two algorithms are combined. Multi-reference alignment performs very well when the number of 3D templates sought is known and not too high, good initial guesses of all 3D templates are available, the individual structures are not too dissimilar, and the 2D projection data are distributed evenly among the groups. The main drawback is the substantial computational demand, particularly when a number of different partitions have to be explored or when the templates are featureless.
A version of multi-reference alignment inspired by the principle of Maximum Likelihood (ML) was introduced with the goal of mitigating initial reference bias and improving convergence properties. It was based on the original ML method for alignment of EM data by Sigworth, who considered a likelihood function of the model parameters that includes the conditional probability of observing an image given the model (a 2D or 3D average of the EM data) and a set of orientation parameter values. The method was further extended to a finite-mixture model that includes the membership of each 2D projection image in one of several discrete classes, thus yielding an ML version of the multi-reference alignment algorithm [45,47]. These algorithms assume a model in which the orientation parameters are treated as “fuzzy” random variables. The recursive Expectation-Maximization (E-M) parameter estimation algorithm integrates the images over the continuous distributions of the orientation parameters sought. This has the distinct advantage of improving convergence properties, i.e., the danger of producing artifactual structures from low-SNR data is somewhat mitigated, but it also makes the method extremely costly to compute. To make the method computationally manageable, a heuristic reduction of the search space in rotation angles was introduced as an ad hoc improvement. In the 3D case, a further simplification is that the search for Eulerian angles is carried out only over a rather sparse set of projection directions (angular step of 10°). As these simplifications are essentially heuristic, the convergence properties of the resulting algorithm remain untested. A promising development is a locally adaptive algorithm that takes advantage of the smoothness of the correlation function to improve the computational efficiency of the method.
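A one-dimensional toy makes the E-M structure of the ML finite-mixture approach visible. The sketch assumes circular shifts as the only orientation parameter, white Gaussian noise with a fixed, known `sigma` (a full implementation would also re-estimate it), a uniform prior over shifts, and hypothetical function names. Instead of a hard assignment, every image contributes to every class and every shift in proportion to its posterior probability. That soft weighting is what improves convergence, and looping over all class and shift combinations is what makes the method expensive.

```python
import numpy as np

def all_shifts(v):
    """Row s holds v circularly shifted by s samples (np.roll(v, s))."""
    return np.array([np.roll(v, s) for s in range(v.size)])

def e_step(images, refs, weights, sigma):
    """Posterior p[n, k, s] of class k and shift s for each image under white
    Gaussian noise of std sigma, class priors `weights`, uniform shift prior."""
    shifted = np.array([all_shifts(r) for r in refs])                 # (K, S, L)
    d2 = ((images[:, None, None, :] - shifted[None]) ** 2).sum(-1)    # (N, K, S)
    logp = np.log(weights)[None, :, None] - d2 / (2.0 * sigma ** 2)
    logp -= logp.max(axis=(1, 2), keepdims=True)                      # avoid underflow
    p = np.exp(logp)
    return p / p.sum(axis=(1, 2), keepdims=True)

def m_step(images, p):
    """New references: posterior-weighted averages of images un-shifted by every s."""
    L = images.shape[1]
    unshifted = np.array([[np.roll(x, -s) for s in range(L)] for x in images])  # (N, S, L)
    num = np.einsum("nks,nsl->kl", p, unshifted)
    mass = p.sum(axis=(0, 2))                                          # (K,)
    return num / mass[:, None], mass / len(images)

def ml_classify(images, init_refs, sigma, n_iter=10):
    images = np.asarray(images, dtype=float)
    refs = np.array(init_refs, dtype=float)
    weights = np.full(len(refs), 1.0 / len(refs))
    for _ in range(n_iter):
        p = e_step(images, refs, weights, sigma)
        refs, weights = m_step(images, p)
    p = e_step(images, refs, weights, sigma)
    return p.sum(axis=2).argmax(axis=1), refs, p    # hard labels only for reporting

# Toy data: shifted, noisy copies of two 1D patterns
rng = np.random.default_rng(6)
L = 32
pat_a = np.zeros(L); pat_a[:6] = 1.0
pat_b = np.zeros(L); pat_b[[0, 1, 10, 11]] = 1.0
truth = np.array([0] * 10 + [1] * 10)
images = np.array([np.roll((pat_a, pat_b)[t], rng.integers(L)) + rng.normal(0.0, 0.1, L)
                   for t in truth])
labels, refs, p = ml_classify(images, [images[0], images[10]], sigma=0.3)
```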
While its computational requirements restrict the 3D ML approach to intermediate-resolution analysis, the initial results are promising. ML-based analysis of the 70S ribosome in complex with elongation factor G revealed sample heterogeneity similar to that found earlier using supervised classification, but without relying on prior knowledge of the content of the sample.
A comprehensive approach to 3D analysis of conformational heterogeneity requires methodology for establishing the presence of variability in the 3D map (real-space variance of the reconstruction), deciding the plausible number of 3D structures in the sample, and constructing good initial guesses of these 3D templates. Such an approach is being advanced using the statistical resampling technique of the bootstrap in conjunction with PCA and clustering [14,49,50]. The first step, calculation of the voxel-by-voxel variance of a structure computed as a 3-D reconstruction from a set of 2D projections, is based on a very simple premise. Assuming that the voxels of the reconstructed 3D map can be considered weighted sums of the pixels of the 2D projections, the variances and covariances can be calculated using a variant of the bootstrap technique in which a new set of projections is selected with replacement from the given set of projection data. This selection is repeated a large number of times, a 3D volume is calculated for each resampled set of projections, and the voxel-by-voxel (bootstrap) variance of these volumes is computed. The “correct” variance is obtained from the bootstrap variance using a simple algebraic relation. Regions of the 3D map with high variance indicate possible flexibility of the complex [49,51] or substoichiometric binding of a ligand (Fig.1). This information can be used to initialize a 3D K-means procedure, for example in the form of “focused classification”. Better insight can be gained by taking advantage of the fact that the variances/covariances of the bootstrap volumes are related to those of the original volume in the same way. Thus, it is possible to perform eigenanalysis of the bootstrap volumes and derive eigenvectors (here called eigenvolumes) that reflect the variability of the studied complex.
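The resampling loop can be sketched with a deliberately crude linear "reconstruction": unfiltered back-projection of axis-aligned projections, averaged over images, so that every voxel is a weighted sum of pixels as the premise requires. The rescaling `N * var_boot` is an assumption valid for this toy estimator only, since for a plain average the bootstrap variance is roughly the per-particle variance divided by N. The algebraic relation used for real reconstructions is not reproduced here. Eigenvolumes are obtained by PCA of the centred bootstrap volumes. All names are hypothetical.

```python
import numpy as np

def backproject_mean(images, axes, n):
    """Toy linear reconstruction: each image, a projection along a coordinate axis,
    is smeared back along that axis; the back-projections are averaged."""
    vol = np.zeros((n, n, n))
    for img, a in zip(images, axes):
        vol += np.expand_dims(img, axis=int(a))
    return vol / len(images)

def bootstrap_analysis(images, axes, n, n_boot=300, n_eig=5, seed=0):
    rng = np.random.default_rng(seed)
    images, axes = np.asarray(images), np.asarray(axes)
    N = len(images)
    boots = np.empty((n_boot, n, n, n))
    for b in range(n_boot):
        idx = rng.integers(0, N, size=N)                 # resample with replacement
        boots[b] = backproject_mean(images[idx], axes[idx], n)
    var_boot = boots.var(axis=0, ddof=1)
    var_est = N * var_boot                               # toy rescaling (see lead-in)
    X = boots.reshape(n_boot, -1)
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    eigenvolumes = Vt[:n_eig].reshape(n_eig, n, n, n)    # principal modes of bootstrap volumes
    return var_est, eigenvolumes

# Toy data: the invariant core is set to zero (with this crude back-projection a
# non-uniform core would add orientation-dependent variance); one "ligand" voxel
# is present in every other particle.
n, N = 8, 120
rng = np.random.default_rng(7)
ligand = (2, 5, 6)
axes = rng.integers(0, 3, size=N)
images = []
for i in range(N):
    vol = np.zeros((n, n, n))
    vol[ligand] = float(i % 2)
    images.append(vol.sum(axis=int(axes[i])) + rng.normal(0.0, 0.1, (n, n)))
var_est, eigenvolumes = bootstrap_analysis(images, axes, n)
```

The variance peaks at the partially occupied voxel. The rays through it show weaker, orientation-dependent variance, a reminder that the angular distribution of the data shapes the variance map.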
Further, the eigenvolumes can be computationally projected along the directions of the 2D data. Inner products between these eigenvolume projections and the EM projection images yield factorial coordinates of the images, which tie the 2D projection data to a common 3D reference frame and to the determined variability of the structure. This step also dramatically reduces the complexity of the problem, as a few dominant eigenvolumes (around ten) are usually sufficient to explain the dominant variability of the structure. The reduced dimensionality of the data set makes it possible to conduct exhaustive clustering studies of the factorial coordinates and to determine the number and the structures of the initial 3D templates. These are subsequently refined using the established methods of 3D K-means and multi-reference alignment (Fig.2). The described methodology is appealing in that the determination of the initial templates, and probably more importantly of their number, is separated from the computationally costly 3D multi-reference alignment, and also in that the eigenvolumes immediately yield insight into the expected heterogeneity of the data set. However, high-resolution applications of the methodology have yet to be demonstrated.
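The projection-and-clustering step can be sketched under several assumptions. Projection directions are axis-aligned and known. A single eigenvolume is used: the normalized difference between two toy conformers stands in for what bootstrap PCA would deliver. The projection of the mean volume is subtracted before the inner product, as in ordinary PCA. Plain k-means serves as the clustering method. Function names are hypothetical.

```python
import numpy as np

def factorial_coordinates(images, axes, mean_vol, eigenvolumes):
    """c[n, j] = <P_n(E_j), image_n - P_n(mean_vol)>, where P_n projects a volume
    along the direction assigned to image n (here simply a coordinate axis)."""
    coords = np.empty((len(images), len(eigenvolumes)))
    for n, (img, a) in enumerate(zip(images, axes)):
        resid = img - mean_vol.sum(axis=a)
        for j, ev in enumerate(eigenvolumes):
            coords[n, j] = np.sum(ev.sum(axis=a) * resid)
    return coords

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Toy data: two conformers differing by a small extra density
rng = np.random.default_rng(8)
vol_a = rng.random((8, 8, 8))
vol_b = vol_a.copy(); vol_b[2:4, 2:4, 2:4] += 1.0
truth = np.array([i % 2 for i in range(40)])
axes = np.array([i % 3 for i in range(40)])
images = [(vol_a, vol_b)[t].sum(axis=a) + rng.normal(0.0, 0.3, (8, 8))
          for t, a in zip(truth, axes)]
mean_vol = 0.5 * (vol_a + vol_b)
diff = vol_b - vol_a
eigenvolumes = [diff / np.linalg.norm(diff)]   # stand-in for the bootstrap eigenvolume
coords = factorial_coordinates(images, axes, mean_vol, eigenvolumes)
labels, _ = kmeans(coords, k=2)
```

Clustering operates on a (40, 1) coordinate array rather than on 64-pixel images, which illustrates the reduction in dimensionality described above.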
From its inception, single particle cryo-EM rested on the hypothesis that the imaged particles correspond to complexes of identical composition and conformation. It has become progressively clear that this fundamental assumption is rarely (if ever) realized. The resampling analysis we propose to develop was originally proposed as a means of calculating 3D variance maps to assess the quality and validity of a cryo-EM reconstruction. It will, however, relax the requirement for imaging identical particles and become a powerful tool for studying the conformational behavior of large macromolecular assemblies. It is important to note the fundamental difference between the resampling approach and the 3D multi-reference analysis proposed by others and by us [7,8,11,31,39,47,50]. To be computationally feasible, previous methods required an assumption about the number of conformers and reasonable guesses of their structures. In contrast, the resampling approach requires no prior information, relying instead on the calculation and clustering of factorial coordinates of the projection images, based on eigenanalysis of bootstrap volumes computed using the alignment parameters of the entire data set. In summary, our approach provides a very efficient and elegant way of establishing the number of conformers in the data set, and information about the structure of the conformers is derived directly and objectively from the EM data.
Fulfilling the promise of single particle reconstruction, i.e., routine delivery of atomic-resolution protein structures under close-to-native conditions, although within reach, requires exponentially growing effort. Moreover, without exception, the complexes that are on the way to being determined by cryo-EM at atomic resolution (3-4 Å) have already been crystallized: GroEL, virus capsids, TMV, the ribosome. While comparative studies are of great value and interest, it is possible that for cryo-EM to succeed at an atomic level, the complex has to be sufficiently large and rigid that an X-ray crystallographic effort is also likely to succeed. Combining this possibility with the realization that atomic-resolution cryo-EM work requires costly high-end instrumentation and very significant human and computational resources, it is quite possible that the future of single particle analysis lies elsewhere.
It is indeed in the observation of conformational heterogeneity that cryo-EM reconstruction has proved particularly efficient and uniquely useful. As we hope to have demonstrated, many if not most recent EM structure determination projects demonstrate, discuss, or hint at the plausibility of conformational variability of the complex. As we are currently learning also from other biophysical techniques, e.g. single-molecule fluorescence resonance energy transfer (smFRET) measurements, the dynamic behaviour of macromolecular machines such as the ribosome appears to be even more complex than anticipated. As shown by smFRET studies and multi-particle cryo-EM [41,44,54,55], functional ribosomal complexes can exhibit intrinsic conformational heterogeneity, i.e., the ribosome can spontaneously adopt distinct native-state conformations that are in equilibrium at room temperature. Thus, a metastable energy landscape view of the function of ribosomes and other macromolecular machines is emerging, similar to the models developed for the folding and dynamics of smaller proteins.
Given such a proliferation of findings of conformational variability by EM, it is striking that there are virtually no mathematical methods to assess their validity, short of extending the resolution of the complex to the level at which secondary structure elements become identifiable. It is a serious concern that claims of conformational variability are at the same time used to explain artifactual features of limited-resolution EM maps or any observed differences with respect to X-ray crystallographic maps. The standard method of assessing the quality of EM results is based on evaluating their self-consistency with the help of the Fourier Shell Correlation (FSC). It is generally known that entirely artifactual maps can boast impressive resolution as evaluated by the FSC. Likewise, the FSC resolution of a plausible-looking map is no proof that the map actually contains valid information up to that frequency limit. Nevertheless, there is no shortage of grossly exaggerated resolution claims that subsequently propagate into further analysis of the map, such as fitting of X-ray structures or claims of conformational variability. As awareness of such dangers grows, many researchers invest significant effort in supporting their findings.
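For reference, the FSC between two maps F1 and F2 is, for each shell of Fourier radius s, Re(sum F1·conj(F2)) / sqrt(sum |F1|^2 · sum |F2|^2), with the sums taken over that shell. The sketch below assumes cubic maps and integer shells in Fourier pixels. The function name and the toy "half maps" are hypothetical. Note that `fsc(half1, half1)` is exactly 1 at every shell: the FSC measures agreement between two maps, not their correctness, so errors shared by both halves go undetected.

```python
import numpy as np

def fsc(map1, map2):
    """Fourier Shell Correlation of two cubic maps, one value per integer shell
    radius 0 .. n//2 - 1 (radius in Fourier pixels)."""
    F1, F2 = np.fft.fftn(map1), np.fft.fftn(map2)
    n = map1.shape[0]
    f = np.fft.fftfreq(n) * n                     # integer frequency indices
    kx, ky, kz = np.meshgrid(f, f, f, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    curve = np.full(n // 2, np.nan)
    for s in range(n // 2):
        m = shell == s
        den = np.sqrt(np.sum(np.abs(F1[m]) ** 2) * np.sum(np.abs(F2[m]) ** 2))
        if den > 0:
            curve[s] = np.sum(F1[m] * np.conj(F2[m])).real / den
    return curve

# Two "half maps": the same smooth blob plus independent noise
n = 32
g = np.arange(n) - n / 2
x, y, z = np.meshgrid(g, g, g, indexing="ij")
blob = np.exp(-(x**2 + y**2 + z**2) / (2 * 3.0**2))
rng = np.random.default_rng(9)
half1 = blob + rng.normal(0.0, 0.05, blob.shape)
half2 = blob + rng.normal(0.0, 0.05, blob.shape)
curve = fsc(half1, half2)
```

The curve starts near 1 at low frequency and drops toward 0 once the blob's signal falls below the independent noise in the two halves.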
Future work on the methodology of studying conformational modes by multi-particle analysis has to be accompanied by vigorous efforts to develop new validation methods for the field. Taking into account the central role that the structural dynamics of macromolecules plays in understanding biological processes at the molecular level, we have no doubt that the next frontier of structural biology is the development of reliable methods, both experimental and computational, to study these phenomena. Cryo-EM, with its ability to capture images of molecules in their unconstrained, close-to-native states, is perfectly suited to become a leading tool in studies of dynamic phenomena in macromolecules, particularly in application to large oligomeric complexes.
We thank Marek Kimmel for critical reading of the manuscript. This work was supported by grants from the NIH R01 GM 60635 (to PAP), the DFG (SFB 740 and SFB 449 to CMTS), by the European Union 3D-EM Network of Excellence and by the European Union and Senatsverwaltung für Wissenschaft, Forschung und Kultur Berlin (UltraStructureNetwork, Anwenderzentrum) (to CMTS).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.