Molecular replacement (MR) is a well established method for phasing of X-ray diffraction patterns for crystals composed of biological macromolecules of known chemical structure but unknown conformation. In MR, the starting point is known structural domains that are presumed to be similar in shape to those in the macromolecular structure which is to be determined. A search is then performed over positions and orientations of the known domains within a model of the crystallographic asymmetric unit so as to best match a computed diffraction pattern with experimental data. Unlike continuous rigid-body motions in Euclidean space and the discrete crystallographic space groups, the set of motions over which molecular replacement searches are performed does not form a group under the operation of composition, which is shown here to lack the associative property. However, the set of rigid-body motions in the asymmetric unit forms another mathematical structure called a quasigroup, which can be identified with right-coset spaces of the full group of rigid-body motions with respect to the chiral space group of the macromolecular crystal. The algebraic properties of this space of motions are articulated here.
Over the past half century, X-ray crystallography has been a wildly successful tool for obtaining structures of biological macromolecules. Aside from finding conditions under which crystals will grow (which largely has been reduced to automated robotic searches), the major hurdle in determining a three-dimensional structure when using X-ray crystallography is that of phasing the diffraction pattern. And while experimental methods such as multiple isomorphous replacement (MIR) and multiple-wavelength anomalous dispersion (MAD) phasing are often used, if the macromolecular system under study is known a priori to consist of components that are similar in structure to solved structures, then the phasing problem can be reduced to a purely computational one, known as a molecular replacement (MR) search. In this article, six-dimensional MR searches for single-domain structures are formulated using the language and tools of modern mathematics. A coherent mathematical description of the MR search space is presented. It is also shown that, more generally, the $6N$-dimensional search space that results for a multi-domain macromolecule or complex constructed from $N$ rigid parts is endowed with a binary operation. This operation is shown not to be associative, and therefore the resulting space is not a group. However, as will be proven here, the result is a mathematical object called a quasigroup.
This concept can be understood graphically at this stage without any notation or formulas. Consider a planar rigid-body transformation applied to the particular gray letter ‘Q’ in the upper-right cell in Fig. 1 . The transformation moves that ‘Q’ from its original (gray) state to a new (black) state. The change in position resulting from the translational part of the transformation can be described by a vector originating at the center of the gray ‘Q’ and terminating at the center of the black one. In this example the translation vector points up and to the right. The transformation also results in an orientational change, which in this case is a counterclockwise rotation by about 25°. If the other gray ‘Q’s are also moved from their initial state in an analogous way so that the relative motion between each corresponding pair of gray and black ‘Q’s is the same, the result will be that shown in Fig. 1 , which represents four cells of an infinite crystal. This is the same as what would result by starting with the cell in the upper right together with both of its ‘Q’s, and treating these three objects as a single rigid unit that is then translated without rotation and copied so as to form a crystal. The resulting set of black ‘Q’s is not the same as would have resulted from the single rigid-body motion of all of the gray ‘Q’s as one infinite rigid unit.
In the scenario in Fig. 1 there is exactly one ‘Q’ in each unit cell before the motion and exactly one in each cell after the motion, where ‘being in the unit cell’ is taken here to mean that the center point of a ‘Q’ is inside the unit cell. It just so happens in the present example that the same ‘Q’ is inside the same cell before and after this particular motion. But this will not always be the case. Indeed, if each new ‘Q’ is moved from its current position and orientation by exactly the same relative motion as before (i.e. if the relative motion in Fig. 1 is applied twice), the result will be the black ‘Q’ in Fig. 2 . In this figure the lightest gray color denotes the original position and orientation, the middle-gray ‘Q’ that is sitting to the upper right of each light one is the same as the black one in Fig. 1 , and now the new black one has moved up and to the right of this middle-gray one. This is the result of two concatenated transformations applied to each ‘Q’. Note that now each black ‘Q’ has moved from its original unit cell into an adjacent one. But if we focus on an individual unit cell, we can forget about the version that has left the cell, and replace it with the one that has entered from another cell. In so doing, the set of continuous rigid-body motions within a crystal becomes a finite-volume object, unlike continuous motions in Euclidean space. This finite-volume object is what is referred to here as a motion space, which is different from the motion group consisting of all isometries of the Euclidean plane that preserve handedness.
Each element of a motion space can be inverted. But this inverse is not simply the inverse of the motion in Fig. 1 . Applying the inverse of each of the rigid-body transformations for each ‘Q’ that resulted in Fig. 1 is equivalent to moving each light-gray ‘Q’ in Fig. 3 to the position and orientation of the new black ones to the lower left. This does not keep the center of the resulting ‘Q’ in the same unit cell, even though the original motion did. But again, we can forget about the version of the ‘Q’ that has left the unit cell under this motion, and replace it with the one that enters from an adjacent cell.
If we were doing this all without rotating, the result simply would be the torus, which is a quotient of the group of Euclidean translations by primitive lattice translations. But because orientations are also involved, the result is more complicated. The space of motions within each unit cell is still a coset space (in this case, of the group of rigid-body motions by a chiral crystallographic space group, due to the lack of symmetry of ‘Q’ under reflections), and such motions can be composed. But unlike a group, this set of motions is non-associative as will be shown later in the paper in numerical examples. This non-associativity makes these spaces of motions a mathematical object called a quasigroup.
The concept of quasigroups has existed in the mathematics literature for more than half a century (see e.g. Bruck, 1958), and remains a topic of interest today (Pflugfelder, 1990; Sabinin, 1999; Smith, 2006; Vasantha Kandasamy, 2002; Nagy & Strambach, 2002). Whereas the advanced mathematical concept of a groupoid has been connected to problems in crystallography (Weinstein, 1996), to the author’s knowledge connections between quasigroups and crystallography have not been made before. Herein a case is made that a special kind of quasigroup (i.e. a motion space) is the natural algebraic structure to describe rigid-body motions within the crystallographic asymmetric unit. Therefore, quasigroups and functions whose arguments are elements of a quasigroup are the proper mathematical objects for articulating molecular replacement problems. Indeed, the quasigroups shown here to be relevant in crystallography have properties above and beyond those in the standard theory. In particular, the quasigroups presented here have an identity and possess a continuum of elements similar to a Lie group.1
The crystallographic space groups have been cataloged in great detail in the crystallography literature. For example, summaries can be found in Bradley & Cracknell (2009), Burns & Glazer (1990), Hahn (2002), Hammond (1997), Julian (2008), Janssen (1973), Ladd (1989), Lockwood & MacMillan (1978), Evarestov & Smirnov (1993) and Aroyo et al. (2010), as well as in various online resources. Treatments of space-group symmetry from the perspective of pure mathematicians can be found in Conway et al. (2001), Engel (1986), Hilton (1963), Iversen (1990), Miller (1972), Nespolo (2008) and Senechal (1980).
Of the 230 possible space groups, only 65 are possible for biological macromolecular crystals (i.e. the chiral/proper ones). The reason for this is that biological macromolecules such as proteins and nucleic acids are composed of constituent parts that have handedness and directionality (e.g. amino acids and nucleic acids, respectively, have C–N and 5′–3′ directionality). This is discussed in greater detail in McPherson (2003), Rhodes (2000), Lattman & Loll (2008) and Rupp (2010). Of these 65, some occur much more frequently than others and these are typically non-symmorphic space groups. For example, more than a quarter of all proteins crystallized to date have $P2_12_12_1$ symmetry, and the three most commonly occurring symmetry groups represent approximately half of all macromolecular crystals (Rupp, 2010; Wukovitz & Yeates, 1995).
The number of proteins in a unit cell, the space group and aspect ratios of the unit cell can be taken as known inputs in MR computations, since they are all provided by experimental observation. From homology modeling, it is often possible to have reliable estimates of the shape of each domain in a multi-domain protein. What remains unknown are the relative positions and orientations of the domains within each protein and the overall position and orientation of the protein molecules within the unit cell.
Once these are known, a model of the unit cell can be constructed and used as an initial phasing model that can be combined with the X-ray diffraction data. This is, in essence, the molecular replacement approach that is now more than half a century old (Rossmann & Blow, 1962; Hirshfeld, 1968; Lattman & Love, 1970; Rossmann, 2001). Powerful software packages for MR include those described in Navaza (1994), Collaborative Computational Project, Number 4 (1994), Vagin & Teplyakov (2010) and Caliandro et al. (2009). Typically these perform rotation searches first, followed by translation searches.
Recently, full six-degrees-of-freedom rigid-body searches and $6N$ degree-of-freedom (DOF) multi-rigid-body searches have been investigated (Jogl et al., 2001; Sheriff et al., 1999; Jamrog et al., 2003; Jeong et al., 2006), where $N$ is the number of domains in each molecule or complex. These methods have the appeal that the false peaks that result when searching the rotation and translation functions separately can be reduced. This paper analyzes the mathematical structure of these search spaces and examines what happens when rigid-body motions in crystallographic environments are concatenated. It is shown that, unlike the symmetry operations of the crystal lattice or rigid-body motions in Euclidean space, the set of motions of a domain (or collection of domains) within a crystallographic unit cell (or asymmetric unit) with faces ‘glued’ in an appropriate way does not form a group. Rather, it has a quasigroup structure lacking the associative property.
The remainder of this paper (which is the first in a planned series) makes the connection between molecular replacement and the algebraic properties of quasigroups. §2 provides a brief review of notation and properties of continuous rigid-body motions and crystallographic symmetry. §3 articulates MR problems in modern mathematical terminology. §4 explains why quasigroups are the appropriate algebraic structures to use for macromolecular MR problems, and derives some new properties of the concrete quasigroup structures that arise in MR applications. Examples illustrate the lack of associativity. §5 focuses on how the quasigroups of motions defined earlier act on asymmetric units. §6 illustrates the non-uniqueness of fundamental domains and constructs mappings between different choices, some of which can be called quasigroup isomorphisms. §7 develops the special algebraic relations associated with projections from quasigroups to the asymmetric units on which they act. §8 returns to MR applications and illustrates several ways in which the algebraic constructions developed in the paper can be used to describe allowable motions of macromolecular domains while remaining consistent with constraints imposed by the crystal structure. Future papers in this series will address the geometric and topological properties of these motion spaces, and connections with harmonic analysis.
This section establishes common notation and reviews the properties of continuous and discrete motions.
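The special Euclidean group, $SE(n)$, consists of all rotation–translation pairs $g = (R, t)$, where $R$ is an $n \times n$ rotation matrix, the set of which forms the special orthogonal group $SO(n)$, and $t \in \mathbb{R}^n$ is a translation vector. The group operation for this group is defined for every $g_1, g_2 \in SE(n)$ as
$$g_1 \circ g_2 = (R_1, t_1) \circ (R_2, t_2) = (R_1 R_2,\; R_1 t_2 + t_1). \qquad (1)$$
From this it is easy to calculate that for any $g \in SE(n)$, $g \circ e = e \circ g = g$ and $g \circ g^{-1} = g^{-1} \circ g = e$, where
$$e = (I, 0) \quad \text{and} \quad g^{-1} = (R^{T},\; -R^{T} t). \qquad (2)$$
Here $I$ is the $n \times n$ identity matrix and $0$ is the null translation vector.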
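The composition and inversion rules for rigid-body motions can be checked numerically. The following is a minimal sketch for the planar case $SE(2)$, assuming the standard rule $(R_1, t_1) \circ (R_2, t_2) = (R_1 R_2, R_1 t_2 + t_1)$; the helper names (`rot`, `compose`, `inverse`, `close`) and the sample motions are illustrative choices, not taken from the paper.

```python
import numpy as np

def rot(theta):
    """2x2 counterclockwise rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def compose(g1, g2):
    """Group product (R1, t1) o (R2, t2) = (R1 R2, R1 t2 + t1)."""
    R1, t1 = g1
    R2, t2 = g2
    return (R1 @ R2, R1 @ t2 + t1)

def inverse(g):
    """Group inverse (R, t)^{-1} = (R^T, -R^T t)."""
    R, t = g
    return (R.T, -R.T @ t)

def close(g, h, tol=1e-12):
    """True if two motions agree up to floating-point round-off."""
    return (np.allclose(g[0], h[0], atol=tol)
            and np.allclose(g[1], h[1], atol=tol))

identity = (np.eye(2), np.zeros(2))

# Three arbitrary planar motions (hypothetical sample values).
g1 = (rot(np.deg2rad(25.0)), np.array([0.4, 0.3]))
g2 = (rot(np.deg2rad(-70.0)), np.array([-1.2, 0.5]))
g3 = (rot(np.deg2rad(130.0)), np.array([2.0, -0.7]))

# In the full group SE(2), composition is associative ...
assoc_ok = close(compose(compose(g1, g2), g3),
                 compose(g1, compose(g2, g3)))

# ... and the inverse works from both the left and the right.
inv_ok = (close(compose(g1, inverse(g1)), identity)
          and close(compose(inverse(g1), g1), identity))
```

Both checks succeed in the full group; this is the baseline against which the behaviour of motions confined to a crystallographic cell can be compared.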
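Two subgroups of $SE(n)$ are $\mathcal{T} = \{(I, t) \mid t \in \mathbb{R}^n\}$ and $\mathcal{R} = \{(R, 0) \mid R \in SO(n)\}$. These are the continuous groups of pure translations and pure rotations. The group of pure translations is isomorphic with $\mathbb{R}^n$ with the operation of addition, i.e. $\mathcal{T} \cong (\mathbb{R}^n, +)$, and the group of pure rotations is isomorphic with $SO(n)$, i.e. $\mathcal{R} \cong SO(n)$, where the operation for $SO(n)$ is matrix multiplication. These subgroups are special because any element $g \in SE(n)$ can be written as a product of pure translations and rotations as $g = (I, t) \circ (R, 0)$.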
Let $\Gamma$ denote the chiral group of discrete symmetries of a macromolecular crystal. $\Gamma$, though discrete, always has an infinite number of elements and can be viewed as a proper subgroup of the group of rigid-body motions, $G = SE(n)$, which is written as $\Gamma < G$, with $<$ denoting proper subgroup.
The group $G = SE(n)$ acts on the set $X = \mathbb{R}^n$ as
$$g \cdot x = R x + t \qquad (5)$$
for all position vectors $x \in X$. Any such position can be expressed as $x = \sum_{i=1}^{n} x_i e_i$, where $\{e_i\}$ is the natural basis for $\mathbb{R}^n$ consisting of orthogonal unit vectors. Alternatively, in crystallographic applications it can be more convenient to write $x = \sum_{i=1}^{n} \chi_i a_i$, where the $a_i$ are the directions from one lattice point to the corresponding one in an adjacent primitive unit cell. Sweeping through values $0 \leq \chi_i \leq 1$ defines a primitive crystallographic unit cell. Whereas $x$ denotes any of a continuum of positions, the set of all discrete translations of the form $t = \sum_{i=1}^{n} z_i a_i$ for all $z_i \in \mathbb{Z}$ forms the Bravais lattice, $\mathbb{L}$, and for any two fixed $t_1, t_2 \in \mathbb{L}$, $t_1 + t_2$ is also in the lattice. The lattice together with addition is the group of primitive lattice translations, $T = (\mathbb{L}, +)$, which is infinite but discrete. $\Gamma$ is the whole group of crystallographic symmetry operations that includes both lattice translations and a chiral point group as subgroups. The space group of a Bravais lattice is a semi-direct product and can be thought of as a discrete version of $SE(n)$. However, a crystal consists of both a Bravais lattice and a motif repeated inside the unit cells. This changes the symmetry, by possibly removing some rotational symmetry operations and possibly introducing some discrete screw displacements.
In general, given any proper subgroup contained in a group (which is denoted as ), including (but not limited to) the case when is , or , and is , left and right cosets are defined, respectively, as
It is well known that a group is divided into disjoint left (or right) cosets, and that only for a normal subgroup, , is it the case that for all . More generally, the left- and right-coset (or quotient) spaces that contain all left or right cosets are denoted, respectively, as and . Normal subgroups are special because and a natural group operation, , can be defined so that is also a group. For example, in equation (4) is a normal subgroup of , meaning that for all and , . This condition is written as , and in fact it can be shown that .
A space, , on which a group, , acts can be divided into disjoint orbits. The set of all of these orbits is denoted as , as this is a kind of quotient space.2 An immediate crystallographic consequence of these definitions is that if is the full chiral symmetry group of a crystal and , then can be identified with the asymmetric unit. Moreover, if is the largest discrete translation group of the crystal (and so also), then can be identified with the primitive unit cell, and so too can the coset space . Since is a normal subgroup of , the unit cell is actually endowed with a group structure, namely periodic addition. For this reason, a unit cell, , in -dimensional space with its opposing faces glued is equivalent to an -dimensional torus,
This can be identified with the box with the operation of addition for all . This fact is implicitly and extensively used in crystallography to expand the density in a unit cell in terms of Fourier series. Furthermore, the translational motion of the contents of a unit cell is easy to handle within the framework of classical mathematics. However, if one wishes to focus attention in MR searches on the asymmetric unit , then there is no associated group operation. An advantage of using is that it is smaller (in terms of volume) than , and therefore when discretizing this space for numerical computations the number of grid points required for a given resolution will be smaller. Furthermore, even in the case when the whole unit cell is considered, though periodic translations are handled in an effortless way within the context of classical Fourier analysis, rotations of the rigid contents within a unit cell of a crystal are somewhat problematic within the classical framework, which provides the motivation for the current work.
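The group structure of the unit cell under periodic addition can be made concrete with a short sketch. It assumes that positions are expressed in fractional coordinates along the lattice directions, so that the unit cell is the box $[0, 1)^n$ and gluing opposing faces amounts to reducing each coordinate modulo 1. The function names and sample points are hypothetical.

```python
import numpy as np

def wrap(x):
    """Reduce fractional coordinates into the unit cell [0, 1)^n."""
    return np.mod(np.asarray(x, dtype=float), 1.0)

def torus_add(x, y):
    """Periodic addition: ordinary addition followed by wrapping."""
    return wrap(np.asarray(x, dtype=float) + np.asarray(y, dtype=float))

def torus_equal(x, y, tol=1e-9):
    """Compare two points on the torus, treating 0 and 1 as the same."""
    d = np.mod(np.asarray(x) - np.asarray(y) + 0.5, 1.0) - 0.5
    return bool(np.all(np.abs(d) < tol))

# Sample fractional positions in a three-dimensional unit cell.
a = np.array([0.7, 0.2, 0.9])
b = np.array([0.6, 0.5, 0.3])
c = np.array([0.8, 0.9, 0.1])

s = torus_add(a, b)  # components that exceed 1 re-enter through the opposite face

# Group axioms: associativity and a two-sided inverse, wrap(-a).
assoc_ok = torus_equal(torus_add(torus_add(a, b), c),
                       torus_add(a, torus_add(b, c)))
inv_ok = (torus_equal(torus_add(a, wrap(-a)), np.zeros(3))
          and torus_equal(torus_add(wrap(-a), a), np.zeros(3)))
```

Because only translations are involved, wrapping commutes with addition and every group axiom survives. This is the property that is lost once rotations are included.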
The set of orbits can be viewed as a region in , denoted as (or for short when the connection between and is clear from the context). Here stands for ‘fundamental domain’. A point in is denoted as , and serves as a representative for each orbit generated by the application of all elements of to a particular . Each point can be thought of as for a unique and , where and can be chosen as the unit cell and the asymmetric unit, respectively.
Typically MR searches are performed by splitting the problem into two steps: first finding the orientation/rotation of a homologous component, and then carrying out a translational/positional search. This method works extremely well for single-domain proteins because the signal-to-noise ratio (SNR) is very high. However, in crystals composed of complex multi-body proteins or complexes, the SNR can be quite low.3
Suppose that a single copy of a macromolecular structure of interest has an electron density $\rho(x)$. That is, there exists a function $\rho: \mathbb{R}^n \rightarrow \mathbb{R}_{\geq 0}$. This says nothing more than that the density is non-negative. This function may be constructed by adding densities of individual domains within the structure. And if thermal motions are taken into account, each of these component densities can be motionally blurred as described in Chirikjian (2010).
This means that the total electron density of the non-solvent part of the crystal will be4
The symmetry group, , and number of copies of the molecule in a given unit cell can both be estimated directly from the experimental data (Matthews, 1968 ). Note that such a function is ‘-periodic’ in the sense that for any ,
Now suppose that before constructing symmetry-related copies of the density , we first move it by an arbitrary . The result will be
There should be no confusion between the single-argument and two-argument versions of the density function; they are actually different functions which are easily distinguished by their arguments. They share the same name ‘’ to avoid a proliferation of notation.
It is easy to see that for any fixed
The in each of these expressions can be taken to be in , but this is wasteful because extends to infinity, and the same result appears whether or is used for any . Therefore, the rigid-body motions of interest are those that can be taken one from each coset . In contrast to an element of , which is denoted as , an element of the fundamental region corresponding to the coset space is denoted as . In other words, is an element of as well as of . The notation is similar to used earlier, but unlike spaces of orbits, since it is possible to have both left-coset spaces and right-coset spaces, the subscript r is used to restrict the discussion to the ‘right’ case, as well as to distinguish from .
There is never any need to consider outside of since
In an X-ray diffraction experiment for a single-domain protein, is not obtained directly. Rather, the magnitude of the Fourier transform of is obtained with held fixed by the physics of the crystal. In general, if are the vectors describing lattice directions, so that each element of the group consists of translations of the form
then the classical Fourier series coefficients for (which for each fixed is a function on ) are denoted as . There is duality between the Fourier expansions for and for the unit cell , and likewise is the unitary dual of .
A goal of molecular replacement is then to find the specific such that best matches with the diffraction pattern, , which is provided from X-ray crystallography experiments.5 In other words, a fundamental goal of molecular replacement is to minimize a cost function of the form
where is some measure of distance, discrepancy or distortion between densities or intensities. For example, , or . Of these, is by far the most popular because it lends itself to computation in either Fourier space or real space via Parseval’s equality. Less detailed versions of equation (12) use in place of , in which case the translational part of shows up as a phase factor that disappears when computing magnitude.
when is extended to take values in . This makes them functions on (or, equivalently, ), in analogy with the way that a periodic function on the real line can be viewed as a function on the circle.
Though the discussion here treats translations and rotations together, the standard approach in molecular replacement is to break up the right-hand side of equation (12) into a part that depends only on the rotational part of , and then a term that depends on a combination of the translational and rotational parts of . This second term is discarded and a pure rotational search is performed. Computationally this is advantageous because the dimensions of the search space are reduced from to , but since the term that is thrown away depends on the rotational part of , this introduces a larger degree of ‘noise’ into the cost function, thereby introducing spurious false peaks in the rotation function that would otherwise not need to be investigated.
In this section an example is used to illustrate graphically the fundamental domain $F_{\Gamma\backslash G}$ for the coset space $\Gamma\backslash G$. Let $g(x, y, \theta)$ be shorthand for $(R_z(\theta), [x, y]^T)$, where $R_z(\theta)$ is a counterclockwise rotation around the $z$ axis by angle $\theta$ and the composition of two motions is defined in equation (1). When $\Gamma = P1$, the $x$ and $y$ components of $g$ span a finite range, which we can take to be a unit square in the $x$–$y$ plane. Then $F_{\Gamma\backslash G}$ can be viewed as a box, with the vertical direction denoting the rotation angle $\theta$. The height of the top horizontal face of the box relative to its bottom is defined by $2\pi$ radians. All opposing faces of the box are glued directly to each other with corresponding points defined by the intersection of lines parallel to coordinate axes and the faces.
This is illustrated in Fig. 4 in which the points on opposing faces in each box are identified. This means that in Fig. 4 (a) the following sets each describe the same point: , , where . Similarly, in Fig. 4 (b) , , where . As a consequence, all eight of the extreme vertices in each figure correspond to the same point.
The algebraic properties established in the following section build on these ideas and will assist in the further mathematical characterization of the MR problem.
Though is a group and G is a group, is not a normal subgroup of (and neither is ). Therefore, unlike the situation in which or , which are again groups, the right-coset spaces and are not groups. However, as will be shown here, it is possible to define a non-associative binary operation for these spaces, which turns them into quasigroups.
As demonstrated in the previous section, the choice of is not unique. Given any and a fixed choice of , we can define to be such that for some . Therefore, we can think of as a mapping that selects one representative of each coset that has the following properties,
With these three properties, it is possible to define a binary operation between any two elements . Namely,
A right (group) action of G on can be defined as
Then, when this expression is evaluated with in place of ,
The relationships between , and are described by the commutative diagram below, where id is the identity map, and id, applied to means that id is applied to and is applied to .
If , then and . Furthermore, if in addition , then . However, if , then an additional operation would be required to ensure that . And herein lies the reason why motions in are a quasigroup rather than a group. Namely, in general
That is, the associative property fails.
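The failure of associativity can be seen in a small numerical sketch. It makes several assumptions not spelled out in this passage: the planar case with $\Gamma = P1$ on a unit square lattice, motions stored as (angle, translation) pairs, the coset representative chosen by reducing the translation modulo 1 and the angle modulo $2\pi$, and the quasigroup product taken to be the representative of the ordinary group product. All function names and sample values are hypothetical.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def compose(g1, g2):
    """Ordinary SE(2) product on (angle, translation) pairs."""
    th1, t1 = g1
    th2, t2 = g2
    return (th1 + th2, rot(th1) @ t2 + t1)

def rep(g):
    """Representative of the right coset (Gamma)g for Gamma = P1:
    left-multiplying by a lattice translation only shifts t by integers."""
    th, t = g
    return (np.mod(th, TWO_PI), np.mod(t, 1.0))

def qmul(g1, g2):
    """Assumed quasigroup product: compose in SE(2), then pick the representative."""
    return rep(compose(g1, g2))

def same(g, h, tol=1e-9):
    """Equality in the fundamental domain, with glued faces identified."""
    dth = np.mod(g[0] - h[0] + np.pi, TWO_PI) - np.pi
    dt = np.mod(g[1] - h[1] + 0.5, 1.0) - 0.5
    return bool(abs(dth) < tol and np.all(np.abs(dt) < tol))

g1 = (np.pi / 4, np.array([0.5, 0.5]))  # rotation by 45 degrees plus a shift
g2 = (0.0, np.array([0.7, 0.2]))        # pure translation
g3 = (0.0, np.array([0.6, 0.3]))        # pure translation

left = qmul(qmul(g1, g2), g3)    # (g1 * g2) * g3
right = qmul(g1, qmul(g2, g3))   # g1 * (g2 * g3)
full = rep(compose(compose(g1, g2), g3))  # representative of the SE(2) product
```

Here `g2` composed with `g3` leaves the unit square and is wrapped back by the lattice vector $(-1, 0)$ before `g1` acts. Rotating that lattice vector by 45° does not give another lattice vector, so `right` differs from `left`, while `left` still agrees with the representative of the unreduced product.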
When it comes to computing inverses, we seek an inverse of that is also in . Unlike a group, there is no a priori guarantee that the left inverse exists, the right inverse exists, and that they are the same. Here we show that indeed left inverses exist, how to compute them, and that in general the left inverse is not a right inverse.
Since we would always define such that , it follows that . Since for some , and
Therefore, applying to both sides gives
But this means that is the left inverse of with respect to the operation . This is true regardless of whether or not and are equal.
But this left inverse is not a right inverse:
And it still fails to be a right inverse.
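A numerical sketch can illustrate the asymmetry between left and right inverses. It assumes the planar case with $\Gamma = P1$ on a unit square lattice, a coset representative obtained by reducing translations modulo 1, a quasigroup product defined as the representative of the ordinary $SE(2)$ product, and a candidate inverse defined as the representative of the group inverse. The names and the sample motion are hypothetical.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def compose(g1, g2):
    """Ordinary SE(2) product on (angle, translation) pairs."""
    return (g1[0] + g2[0], rot(g1[0]) @ g2[1] + g1[1])

def inverse(g):
    """Ordinary SE(2) inverse: (R, t)^{-1} = (R^T, -R^T t)."""
    th, t = g
    return (-th, -rot(-th) @ t)

def rep(g):
    """Coset representative for Gamma = P1: wrap angle and translation."""
    return (np.mod(g[0], TWO_PI), np.mod(g[1], 1.0))

def qmul(g1, g2):
    """Assumed quasigroup product: compose, then take the representative."""
    return rep(compose(g1, g2))

def same(g, h, tol=1e-9):
    """Equality in the fundamental domain, with glued faces identified."""
    dth = np.mod(g[0] - h[0] + np.pi, TWO_PI) - np.pi
    dt = np.mod(g[1] - h[1] + 0.5, 1.0) - 0.5
    return bool(abs(dth) < tol and np.all(np.abs(dt) < tol))

e = (0.0, np.zeros(2))
g = (np.pi / 3, np.array([0.3, 0.1]))   # sample motion inside the unit square

g_linv = rep(inverse(g))                # candidate inverse, wrapped into the cell

left_product = qmul(g_linv, g)          # candidate applied from the left
right_product = qmul(g, g_linv)         # candidate applied from the right
```

Wrapping the group inverse into the cell shifts it by a lattice vector $z$. From the left that shift is absorbed by the coset, so `left_product` is the identity. From the right the shift is rotated by the rotational part of `g` first, leaving the residual translation $Rz$ modulo 1, which is nonzero for this sample.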
In the special case when , it follows that and . Combining these then gives . Furthermore, in this special case, the left inverses computed above also will be right inverses. For example, if and is as in Fig. 4 (b), then and is the same as , which serves as both a left and right inverse, since in this context holds, as usual in a group. Note that if instead we used as in Fig. 4 (a), then in the above example .
In any quasigroup the following equations can be solved for and for any given and that are in the quasigroup:
These solutions are denoted as
(where and are division on the right and left, respectively). But, since the associative law does not hold, we cannot simply apply the inverse of or to obtain the answer. Instead, using the rules established in §4.1,
where is the special element of chosen to ensure that . Similarly,
Here no special choice of is required, and when , is simply the left inverse of .
As stated in §2, the group of rigid-body motions, , acts on points in Euclidean space, , by moving them as . So too, the quasigroup acts on points in to move them to other points in the same space. However, the usual property of a group action,
does not apply for a quasigroup.
If and and we can define a (quasigroup) action of on as
This is illustrated in the diagram below.
Note that since
it follows that
Since acts from the left on , and since , it follows that
Then, upon the application of to both sides,
Also, combining the properties of group and quasigroup actions,
This can be written as
And though it would be too much to expect that the properties of a group action would hold for a quasigroup action, the fact that
As depicted in Fig. 4 , the definition of a fundamental domain is not unique. And since the definition of depends on how is defined, it too is not unique. Let and denote an allowable alternative to and . When examining relationships between candidate fundamental domains, it makes sense to consider allowable mappings of the form
For example, in addition to the two cases shown in Fig. 4 when , valid fundamental domains for can be obtained by translating each horizontal slice in those figures by some continuous and . Hence a continuum of different fundamental domains can exist that correspond to one coset space . Corresponding to each choice, is replaced by a different .
From an algebraic perspective, it is interesting to ask when such domains are equivalent as quasigroups. In other words, we seek special bijections of the form where
Such mappings can be called quasigroup isomorphisms.
The existence of bijections is clear in the example of Fig. 4 , since it is possible to divide up the two fundamental domains into octants, and generate a mapping by permuting these octants and gluing them appropriately. However, it is not clear a priori whether or not such a bijection will preserve the quasigroup operation in the sense of equation (24).
In contrast, the conjugation of by some fixed can be used to define
Then if ,
Therefore, if we define by the equality
it is easy to see that
This is expressed in the following commutative diagram.
In other words, for any fixed , is a quasigroup since is, and the above diagram commutes. But unlike in equation (23) where the quasigroup corresponds to the same coset space, here the coset spaces are different since in general . But this discussion becomes relevant to the issue of constructing different fundamental domains for the same coset space if we restrict the choice of such that . This is achieved easily by restricting , the normalizer of in . When choosing , it follows that
and, therefore, the set of all mappings forms a group under the operation of composition in equation (26), , and this group is isomorphic with where is the centralizer of in . Recall that is the largest subgroup of in which is a normal subgroup, and is the subgroup of consisting of all elements that commute with every element of .
Additional algebraic properties result from the special role that translations play, both in space groups and in continuous Euclidean motions. These are explored in this section.
When viewed as a set rather than a group, . Then a natural projection operator is that simply picks off the translational part of as . When this projection is applied after multiplying two group elements, the result is This is of the same form as the action in equation (5). Therefore, we can write the following diagram, which is equivalent to the equation
This algebraic property gives the geometric structure of a trivial principal fiber bundle, which will have implications for possible geometric interpretations of , which will be explored in the second paper in this series.
Until now, no specific choice was made to identify which representatives of the cosets are used to define . Such a choice would fix the geometric structure of . The general discussion of this is postponed until the second paper in this series. But the case when is now addressed, and it is closely related to the properties of the projection operator discussed previously.
Two possible choices for when were illustrated in Fig. 4 . More generally, the choice of is partially constrained by identifying with . This does not fully define because can be defined in multiple ways (for example in Figs. 4 a and 4 b, this is, respectively, the unit square contained in the first quadrant and centered at the origin).
The (partial) definition
is acceptable because has no rotational or screw symmetry operators, and therefore its action from the left has no effect on the part of . Then it is clear that for any , and
Since a pure translation is of the form , it is possible to compute . Similarly, a translation can be identified as a position via the action where is the origin in . The projection operator relates and as as a special case of equation (29).
This section first reviews the multi-domain molecular replacement problem and then illustrates the applicability of the algebraic concepts developed earlier in this paper.
Consider a multi-domain protein or complex that is known to consist of rigid components, each of which has a high degree of homology to a known protein. Some of these components might also be homologous to each other, but in the absence of any evidence otherwise, the domains will be treated as having different density functions. If the th body/domain in the assemblage has density when described in its own body-fixed reference frame, then for some unknown set of rigid-body motions , the density of the whole unknown structure must be of the form
where . Here are relative rigid-body motions between sequentially numbered bodies. Such a numbering does not require that the bodies form a kinematic chain, though such topological constraints naturally limit the volume of the search space.
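The way in which relative motions between sequentially numbered bodies determine the density of the assemblage can be sketched as follows. This is an illustrative planar example under stated assumptions, not the formulation used in this paper: motions are pairs (angle, translation), each body density is a hypothetical isotropic Gaussian centered at the origin of its own frame, the absolute pose of each body is taken to be the running composition of the first pose with the successive relative motions, and crystal symmetry is ignored. The names `absolute_poses`, `assemblage_density` and `blob` are introduced here for illustration only.

```python
import math
from itertools import accumulate

def compose(g1, g2):
    """Composition of planar rigid-body motions g = (theta, (tx, ty))."""
    th1, (x1, y1) = g1
    th2, (x2, y2) = g2
    c, s = math.cos(th1), math.sin(th1)
    return (th1 + th2, (c * x2 - s * y2 + x1, s * x2 + c * y2 + y1))

def inverse(g):
    """Inverse motion: rotate by -theta and translate by -R(-theta) t."""
    th, (x, y) = g
    c, s = math.cos(th), math.sin(th)
    return (-th, (-(c * x + s * y), s * x - c * y))

def act(g, p):
    """Action of a motion on a point: rotate, then translate."""
    th, (tx, ty) = g
    c, s = math.cos(th), math.sin(th)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def absolute_poses(first_pose, relative_motions):
    """Pose of each body as the running composition of the relative motions."""
    return list(accumulate(relative_motions, compose, initial=first_pose))

def assemblage_density(body_densities, poses, p):
    """Sum of the body densities, each evaluated in its own body-fixed frame."""
    return sum(rho(act(inverse(g), p)) for rho, g in zip(body_densities, poses))

def blob(width):
    """Hypothetical body density: isotropic Gaussian centered at the body origin."""
    return lambda p: math.exp(-(p[0] ** 2 + p[1] ** 2) / (2.0 * width ** 2))

# Three bodies: the pose of the first body, then two relative motions.
poses = absolute_poses((0.0, (1.0, 0.0)),
                       [(math.pi / 2.0, (1.0, 0.0)), (0.0, (1.0, 0.0))])
densities = [blob(0.3), blob(0.4), blob(0.5)]
print(assemblage_density(densities, poses, (2.0, 1.0)))
```

A numbering of this kind only fixes the order in which the relative motions are composed; it does not assume that the bodies are physically connected in a chain.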
If the assemblage/complex/multi-body protein that is formed from these individual domains/bodies is rigid, then symmetry mates in the crystal will all have the same values of . Here takes the place of in the earlier discussion of single-body molecular replacement, and the density becomes
Cost functions analogous to equation (12) follow naturally, but now become functions of , and therefore represent a 6N-dimensional search (six degrees of freedom for each of the N rigid bodies). Direct grid searches of very high-dimensional spaces will always be inadvisable, no matter how rapidly computer technology advances. However, by taking advantage of the quasigroup structure of this search space, gradient descent methods may be appropriate. Such methods are inadvisable when seeking optima in the rotation function, since tremendous ‘noise’ results from discarding non-pure-rotation terms. The high-dimensional search space is far less noisy, because the model that is matched to the diffraction pattern (or, in real space, the Patterson function) is of higher fidelity: all variables are present simultaneously, rather than being searched sequentially, one domain at a time.
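The growth of a direct grid search with the number of rigid bodies can be made concrete with a short calculation. The sketch below assumes six degrees of freedom per body and an arbitrarily chosen, purely illustrative resolution of 30 samples per degree of freedom. The function name `grid_points` is hypothetical.

```python
def grid_points(n_bodies, samples_per_dof, dof_per_body=6):
    """Number of cost-function evaluations in an exhaustive grid search."""
    return samples_per_dof ** (dof_per_body * n_bodies)

# Illustrative resolution only; real searches choose sampling from the data.
for n_bodies in (1, 2, 3):
    print(n_bodies, f"{grid_points(n_bodies, 30):.3e}")
```

At this resolution a single body already requires 729 000 000 evaluations, and each additional body multiplies the count by the same factor, which is why exhaustive sampling does not scale to the multi-domain problem.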
The properties of quasigroups of motions and their actions on points in an asymmetric unit, as well as actions of motion groups on quasigroups, will play a role in various aspects of MR that will be explored in later papers in this series. These include modeling motional smearing such as is the case in static disorder and thermal motion in crystals, and the formulation of optimization problems such as minimizing the cost in equation (12). Such applications involve both the algebraic properties discussed here, and the geometric ones that will be described in the second paper in this series. Nevertheless, it is possible to illustrate at this stage how the concepts of , , , , , and interact naturally in a particular MR-related problem, as discussed below.
Consider a macromolecular structure consisting of two rigid domains. Let and denote the densities of these bodies, each relative to its own body-fixed reference frame. In the case when these locally defined densities have their body-fixed frames coincident with the identity reference frame , then for . If the frame attached to body 2 has a position and orientation of relative to the frame attached to body 1, then the density function for the composite structure (when the reference frame attached to body 1 is the identity) will be
Then, if body 1 is itself moved and body 2 retains its relative spatial relationship to body 1, the result will be
Using the algebraic rules established earlier, the second term can be written as
The extension to the multi-domain case follows in a similar way, and does not require the introduction of new concepts of action.
This is because, in the case of group actions, the solution to is . But in the quasigroup case, the solution to is not . A solution can nevertheless be constructed using the algebraic concepts discussed earlier. Namely, if , then
and similarly for . Therefore, the algebraic constructions presented earlier provide a tool for manipulating different descriptions of densities that arise in MR applications.
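The two-domain manipulation described above can be illustrated numerically for ordinary group actions in the plane, before any crystal symmetry is imposed. The following is a minimal sketch under stated assumptions: motions are pairs (angle, translation), a motion acts on a density by evaluating the density at the inversely moved point, and the two body densities are hypothetical Gaussians. It checks that moving the composite structure by a motion of body 1 is the same as moving body 1 by that motion and body 2 by the composition of that motion with its pose relative to body 1. All function names are introduced here for illustration.

```python
import math

def compose(g1, g2):
    """Composition of planar rigid-body motions g = (theta, (tx, ty))."""
    th1, (x1, y1) = g1
    th2, (x2, y2) = g2
    c, s = math.cos(th1), math.sin(th1)
    return (th1 + th2, (c * x2 - s * y2 + x1, s * x2 + c * y2 + y1))

def inverse(g):
    """Inverse motion: rotate by -theta and translate by -R(-theta) t."""
    th, (x, y) = g
    c, s = math.cos(th), math.sin(th)
    return (-th, (-(c * x + s * y), s * x - c * y))

def act_on_point(g, p):
    """Action of a motion on a point: rotate, then translate."""
    th, (tx, ty) = g
    c, s = math.cos(th), math.sin(th)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def act_on_density(g, rho):
    """Moved density: evaluate the original density at the inversely moved point."""
    return lambda p: rho(act_on_point(inverse(g), p))

def blob(cx, cy, width=0.5):
    """Hypothetical body density: Gaussian centered at (cx, cy) in the body frame."""
    return lambda p: math.exp(-((p[0] - cx) ** 2 + (p[1] - cy) ** 2) / (2.0 * width ** 2))

rho1, rho2 = blob(0.0, 0.0), blob(0.3, -0.2)
g_rel = (0.7, (1.5, 0.4))    # pose of body 2 relative to body 1 (assumed values)
g_body1 = (-0.4, (0.2, 2.0)) # motion applied to body 1 (assumed values)

def composite(p):
    """Density of the two-body structure with body 1 at the identity."""
    return rho1(p) + act_on_density(g_rel, rho2)(p)

# Move the whole structure at once ...
moved = act_on_density(g_body1, composite)

# ... or move each term separately, composing the motions for body 2.
def moved_termwise(p):
    return (act_on_density(g_body1, rho1)(p)
            + act_on_density(compose(g_body1, g_rel), rho2)(p))

for point in [(0.0, 0.0), (1.0, 2.0), (-0.5, 1.7)]:
    print(point, moved(point), moved_termwise(point))
```

The agreement of the two evaluations relies on the group property of ordinary composition. In the quasigroup setting the corresponding rewriting has to use the constructions developed earlier in this paper instead.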
The algebraic structure of the molecular replacement problem in macromolecular crystallography has been articulated here. This includes enumerating the quasigroup structure of the coset space Γ\G, where Γ is the space group of the crystal and G is the continuous group of rigid-body motions. Equipped with these properties, it becomes possible to formulate codes for searching the space of motions of macromolecules in asymmetric units in a way that is not subject to the arbitrariness of a choice of coordinates such as Euler angles, or to the inescapable distortions and singularities that result from coordinate-dependent approaches. Geometric and numerical aspects of the formulation presented here will be investigated in follow-on papers. In such applications, it is important to fix a geometric interpretation of . It will be shown that the algebraic concept of discussed here provides insights into concrete choices for , and the mappings and quasigroup isomorphisms discussed here provide the means to convert between different choices for these domains.
This work was supported by NIH grant No. R01 GM075310. The suggestions by W. P. Thurston, S. Zucker and the anonymous reviewer are greatly appreciated.
1 In the mathematics literature a quasigroup with identity is called a loop (Sabinin, 1999; Smith, 2006; Vasantha Kandasamy, 2002), but since the word ‘loop’ is used in biological contexts to mean a physical serial polymer-like structure with constrained ends, the word ‘quasigroup’ will be used here instead of the mathematical ‘loop’.
2 Some books denote this as , but to be consistent with the definition of action in equation (5), in which acts on the left of , it makes more sense to write in analogy with the way that preserves the order of in the definition of the coset .
3 It should be pointed out that the ‘noise’ here is not noise in the true sense, but rather results from false peaks in rotational correlations arising from restricting the search from a high-dimensional space (e.g. 6N for a system composed of N rigid bodies) to an initial three-dimensional orientational search.
4 Though this is an infinite sum, each has compact support because each protein domain is a finite body, and so convergence is not an issue.