Assignment of protein structures using experimentally measured electron density maps is a crucial step in structural biology. If the electron density is obtained at high resolution (< 3 Å), direct structure assignment is possible. However, for some proteins, only low-resolution (typically > 5 Å) electron densities can be obtained, for example by cryo-electron microscopy (cryo-EM)
1–7. Practically, cryo-EM is commonly applied to large molecular complexes, for which the high-resolution structures are often already solved using X-ray crystallography
3,8,9. In this case, the major challenge is the assembly of the existing crystal structures to fit the electron density of the complex. However, other challenges still exist, especially when the structure of a single domain is unavailable
6,7, or an individual protein undergoes a large conformational change upon assembly of the complex. At ~5 Å resolution it is possible to identify single protein domain boundaries and separate the electron densities of the individual proteins. At this resolution, large secondary structure segments can already be assigned, but it is still impossible to directly determine the structure in atomic detail
5. To determine the structure of a single protein within the complex, existing methods often use high-resolution X-ray structures and build homology models to fit the low-resolution density map
5,7, or use ab initio folding guided by the density
6.
We propose that it is possible to directly use the single-domain cryo-EM density to search a large structural database, such as the Protein Data Bank (PDB) or another database of protein models, in order to identify existing structures that best fit the electron density. These matching structures can subsequently be selected to build atomic models using existing methods
5–9. Toward this goal, it is necessary to have a computational method that can rapidly compare the electron density with a large number of structural models (hundreds of thousands of structures per second of CPU time), independent of sequence similarity. Since it is often straightforward to derive the electron density map from a structural model, the essential challenge is to find an algorithm that can quickly compare two electron density distributions.
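As a rough illustration of why deriving a density map from a structural model is straightforward, the following minimal sketch smears atom positions onto a cubic grid with Gaussians. It is not the procedure used in this study: the function name `simulate_density`, the unit weight per atom, and the choice of Gaussian width sigma = resolution / 2 are all assumptions made for the example.

```python
import math

def simulate_density(atoms, grid_size=16, spacing=2.0, resolution=6.0):
    """Gaussian-smear atom positions (in Å) onto a cubic grid centered at the origin.

    Assumptions (not taken from the text): every atom has unit weight, and
    the Gaussian width is sigma = resolution / 2, one of several conventions.
    """
    sigma = resolution / 2.0
    half = (grid_size - 1) * spacing / 2.0
    # Coordinates of the grid points along one axis.
    axis = [i * spacing - half for i in range(grid_size)]
    grid = [[[0.0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    for ax, ay, az in atoms:
        # A 3D isotropic Gaussian factorizes into three 1D Gaussians.
        gx = [math.exp(-((x - ax) ** 2) / (2.0 * sigma ** 2)) for x in axis]
        gy = [math.exp(-((y - ay) ** 2) / (2.0 * sigma ** 2)) for y in axis]
        gz = [math.exp(-((z - az) ** 2) / (2.0 * sigma ** 2)) for z in axis]
        for i in range(grid_size):
            for j in range(grid_size):
                gxy = gx[i] * gy[j]
                row = grid[i][j]
                for k in range(grid_size):
                    row[k] += gxy * gz[k]
    return grid

# Hypothetical three-atom "structure" (coordinates in Å).
density = simulate_density([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)])
```

The cost of this step grows with the number of atoms and grid points, but it is paid once per database structure and can be precomputed, which is why the comparison step is the real bottleneck.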
Electron density may be considered as a type of 3D object. Fast comparison of 3D objects has been a long-standing problem in computer graphics, modeling, and vision. To avoid explicit sampling over the rigid-body degrees of freedom, one tactic is to use a vector of invariant descriptors (fingerprints) to describe the unique 3D features of the object
10–12. In this way, the comparison of 3D objects is reduced to a comparison of two vectors, which is extremely fast. For example, we have previously used curvature distributions as fingerprints in order to describe local surface patches of a protein, and successfully identified local surface similarity between proteins, independent of sequence and fold
13.
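The reduction of 3D comparison to vector comparison can be sketched as follows. This is a minimal illustration, assuming fingerprints are already available as equal-length lists of floats and assuming Euclidean distance as the similarity measure; the function names and the database entries are hypothetical, and the text above does not specify the metric.

```python
import math

def fingerprint_distance(a, b):
    """Euclidean distance between two fingerprints (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_database(query, database, top=10):
    """Return the `top` entries of `database` closest to `query`.

    `database` maps a structure identifier to its fingerprint vector.
    """
    scored = sorted(
        (fingerprint_distance(query, fp), name) for name, fp in database.items()
    )
    return [(name, dist) for dist, name in scored[:top]]

# Hypothetical identifiers and fingerprints, for illustration only.
example_db = {
    "model_A": [0.90, 0.10, 0.30],
    "model_B": [0.20, 0.80, 0.50],
    "model_C": [0.88, 0.12, 0.31],
}
hits = rank_database([0.90, 0.10, 0.30], example_db, top=2)
```

Because the fingerprints are invariant to rigid-body motion, no rotational or translational search appears anywhere in the ranking loop; each comparison costs only a handful of arithmetic operations.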
The specific fingerprint we use in this study to compare electron densities is made up of the 3D Zernike invariants
10,12, which are constructed from an expansion in the 3D Zernike functions. Although other forms of expansion, such as spherical harmonic functions
11,14,15 and Hermite functions
16, have also been used, we choose the 3D Zernike expansion because of its advantage of being a polynomial expansion in Cartesian coordinates
12. Because of their speed and accuracy, 3D Zernike invariants have recently gained popularity in the field of shape retrieval. Novotni and Klein first used 3D Zernike invariants for shape retrieval and found better performance than with spherical harmonic descriptors
12. Sael et al. used 3D Zernike invariants to compare the geometry
17 and electrostatic properties
18 of protein surfaces. Venkatraman et al. further applied Zernike invariants to represent local surface shape for protein-protein docking
19. 3D Zernike invariants were also adapted by Mak et al. to describe the molecular shape of ligands and proteins
20, which was later extended by Grandison et al. to include flexibility in order to describe structural motion
21. More examples of applications of spherical harmonic and 3D Zernike invariants can be found in a recent review by Venkatraman et al.
22. Despite the increasing research interest, application of Zernike invariants to the comparison of electron densities of proteins has not yet been reported.
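For orientation, the standard construction of the 3D Zernike invariants, following the formulation of Novotni and Klein, can be summarized as below. The symbols are introduced here for illustration: ρ denotes the density scaled to lie inside the unit ball, R_nl is the radial polynomial, and Y_l^m are the spherical harmonics; normalization conventions may differ between implementations.

```latex
% 3D Zernike functions, defined for 0 \le l \le n with n - l even and -l \le m \le l
Z_{nl}^{m}(r,\vartheta,\varphi) = R_{nl}(r)\, Y_{l}^{m}(\vartheta,\varphi)

% 3D Zernike moments of the density \rho over the unit ball
\Omega_{nl}^{m} = \frac{3}{4\pi} \int_{|\mathbf{x}| \le 1}
    \rho(\mathbf{x})\, \overline{Z_{nl}^{m}(\mathbf{x})}\, d\mathbf{x}

% Rotation-invariant descriptors: the norm over the index m
F_{nl} = \sqrt{\sum_{m=-l}^{l} \left| \Omega_{nl}^{m} \right|^{2}}
```

A rotation of the object mixes the moments with the same n and l among themselves through a unitary transformation, so the norm F_nl is unchanged; the collection of F_nl up to a chosen maximum order n then serves as the fingerprint vector.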
In this study, we demonstrate the feasibility of using electron density maps of proteins to search a large protein structure database for matching structures. We benchmark this approach on a constructed test set, and also search the entire PDB for structures that fit two experimental cryo-EM electron densities: bovine metarhodopsin I
4 (5.5 Å resolution) and GroEL
4 (6 Å resolution). By ranking the protein structures based on their fingerprint similarity to the query electron density, we successfully identify matching structures among the top hits.