In some cases, relatively simple measures have been used to improve data quality. For example, to detect reflections beyond 4.5 Å from crystals of the SIV (simian immunodeficiency virus) gp120 envelope glycoprotein, it was important to place a small beamstop close to the crystal and to move the detector 400 mm back from it [4]. This served to minimize the background, as the diffraction limit is basically a signal-to-noise issue; most of the noise (which is mainly from background) is contributed by diffuse scattering from the sample and by air scattering of the direct beam. Large crystal-to-detector distances are especially helpful at those synchrotron beamlines where the beam has very small crossfire. In addition, there is the problem of series termination errors, which give rise to ripples next to real density features (see, e.g., Minichino et al. [5] and references therein). Anisotropic diffraction, in which the diffraction limit depends on the direction of the scattering vector, also occurs frequently. This situation can be helped by ellipsoidal truncation and anisotropic scaling, which can, for example, be done on the UCLA (University of California, Los Angeles) web server [6,7] or with the CCP4 (Collaborative Computational Project No. 4) program SCALEIT. The effects of radiation decay in the data sets can be alleviated by applying the so-called zero dose correction [8], provided that each unique reflection was measured a number of times, which may be difficult to achieve in low symmetry space groups.
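The idea behind a zero dose correction can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intensity of a unique reflection decays linearly with accumulated dose; the function name `zero_dose_intensity` and the numbers are hypothetical, and the published procedure [8] should be consulted for the actual decay model and weighting.

```python
import numpy as np

def zero_dose_intensity(doses, intensities):
    """Extrapolate repeated measurements of one unique reflection to zero dose.

    Assumes a linear decay I(dose) = I0 + slope * dose (an assumption of this
    sketch, not necessarily the model used in any given program).
    Returns (I0, slope).
    """
    doses = np.asarray(doses, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # A line cannot be fitted unless the reflection was measured at
    # two or more different doses.
    if doses.size < 2 or doses.max() == doses.min():
        raise ValueError("need at least two measurements at different doses")
    slope, intercept = np.polyfit(doses, intensities, 1)
    return intercept, slope

# Hypothetical reflection measured four times at increasing dose (arbitrary units).
doses = [1.0, 3.0, 5.0, 7.0]
observed = [970.0, 910.0, 850.0, 790.0]
i0, slope = zero_dose_intensity(doses, observed)
```

The guard clause reflects the condition stated above: without repeated measurements of the same unique reflection at different doses, there is nothing to extrapolate from.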
Together, these factors will almost always lead to electron density maps that are noisy and lacking in detail. In the 4–5 Å resolution range, α-helices appear as tubes and β-sheets as walls of density with no indication of where the individual strands might run. Indeed, for the latter, the hydrogen bonding between β-sheet strands is notorious for causing confusion in tracing the path of the polypeptide chain. Of course, even to see this much, some kind of phase information is needed in addition to the intensity data. Unless the macromolecular assembly is mainly made up of α-helical domains, a complete de novo structure determination in this resolution range, without known three-dimensional structures of its components and domains, is extremely challenging and in many cases might be impossible [9]. An important exception is the class of cases where high-order non-crystallographic symmetry is available, such as spherical or cylindrical viruses.
Increasingly, and now very often, the three-dimensional structures of the components or fragments of the molecule or assembly in question are already available, and this opens up an avenue towards generating useful initial phase information. Modern automated molecular replacement (MR) programs such as Phaser [10] or AMoRe [11] can generate good solutions even when the search model is a relatively small fraction of the total scattering mass. Also, programs such as Phaser allow an ensemble of search models to be used, thus widening the radius of convergence of MR, which is an important limitation when only one search model is used. However, for any MR approach to provide meaningful phase information, the three-dimensional structure of a large fraction of the assembly has to be known, and the fragment structures should not change much upon assembly formation. In favorable cases, a simple difference Fourier map calculated with the MR-based model phases can reveal interesting and previously unseen parts of the assembly [12].
Even when MR is able to place the fragments, it remains extremely desirable to have some experimental phase information. This may come from a selenomethionine (Se-Met) multi-wavelength anomalous diffraction (MAD) experiment, or from heavy atom soaks with, for example, the Ta₆Br₁₂ cluster [13], which is especially suited for low resolution work on large assemblies. The heavy atom substructure should then be solvable with phases computed from the MR solution. Optimization of the heavy atom substructure and subsequent density modification can be done with a variety of programs, such as Sharp/Solomon [14,15], Solve/Resolve [16], and others.
Beyond providing direct phase information, these approaches can also independently verify that the structure determination is on the right track, as the substructure obtained with the MR phases must be the same (except for a possible origin shift) as the one obtained independently (e.g., by the combined Patterson-Direct methods approach implemented in ShelxD [17]). The huge advantage of either a SAD (single-wavelength anomalous diffraction) or a MAD data set based on Se-Met or Br-dU (bromodeoxyuridine, if there is nucleic acid in the structure) is that the heavy atom substructure can, in favorable cases, provide further positioning information for the domains or fragments. However, the phasing power of such data sets is often limited at low resolution. A very important aspect of even modest quality experimental phases is that they are free of model bias.
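The consistency check between two independently determined substructures can be sketched as follows. This is a deliberately simplified illustration: it works in fractional coordinates, treats the cell as P1 (no symmetry operators, no restriction to the permitted origins of the space group, no test of the inverted hand), and the function names and tolerance are hypothetical.

```python
import numpy as np

def frac_diff(a, b):
    """Difference a - b in fractional coordinates, reduced to the nearest lattice translation."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d - np.round(d)

def match_up_to_origin_shift(sites_a, sites_b, tol=0.02):
    """Return a shift t such that sites_a + t matches sites_b (mod 1), or None.

    Simplification of this sketch: P1 only, equal numbers of sites,
    one-to-one matching within `tol` (fractional units) along each axis.
    """
    sites_a = np.asarray(sites_a, dtype=float)
    sites_b = np.asarray(sites_b, dtype=float)
    if sites_a.shape != sites_b.shape:
        return None
    # Try mapping the first site of A onto every site of B in turn.
    for candidate in sites_b:
        shift = frac_diff(candidate, sites_a[0])
        used = set()
        consistent = True
        for site in sites_a:
            # Largest per-axis deviation to every site in B after shifting.
            dev = np.abs(frac_diff(sites_b, site + shift)).max(axis=1)
            j = int(np.argmin(dev))
            if dev[j] > tol or j in used:
                consistent = False
                break
            used.add(j)
        if consistent:
            return shift
    return None

# Hypothetical sites: B is A shifted by (0.3, 0.0, 0.25) and listed in another order.
sites_mr = [[0.10, 0.20, 0.30], [0.40, 0.90, 0.25], [0.75, 0.05, 0.60]]
sites_direct = [[0.05, 0.05, 0.85], [0.40, 0.20, 0.55], [0.70, 0.90, 0.50]]
shift = match_up_to_origin_shift(sites_mr, sites_direct)

# Displacing one site by 0.2 along x breaks the agreement.
sites_wrong = [[0.05, 0.05, 0.85], [0.40, 0.20, 0.55], [0.90, 0.90, 0.50]]
no_shift = match_up_to_origin_shift(sites_mr, sites_wrong)
```

In a real space group, the search would additionally have to apply the symmetry operators and consider only the allowed origin shifts and, where appropriate, the enantiomorph.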
Model bias, which is more serious at low resolution, is perhaps the greatest caveat in crystallography because the placed model, in the absence of experimental phases, is the only source of phase information. As phase information dominates maps, even an incorrectly placed or inappropriate model (or both) will inevitably show up in its own density to some extent when a map based on model phases is calculated. In addition to the importance of experimental phases, it is not possible to overemphasize how important the exploitation of real space redundancies (non-crystallographic symmetry) is, or, if multiple crystal forms are available, how important it is to attempt multi-crystal averaging [4]. These approaches effectively improve the inherently poor data-to-parameter ratio, but they assume that the geometrical relationship between the related domains or molecules can be established.
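That phases dominate maps can be illustrated with a one-dimensional toy calculation. The sketch below combines the Fourier amplitudes of one random signal (standing in for the observed data) with the phases of an unrelated random signal (standing in for a wrong model); all names and the random "densities" are hypothetical, and the example is not a crystallographic map calculation.

```python
import numpy as np

def hybrid_map(amplitude_source, phase_source):
    """Synthesize a 1D map from the amplitudes of one signal and the phases of another."""
    f_amp = np.fft.rfft(amplitude_source)
    f_phase = np.fft.rfft(phase_source)
    hybrid = np.abs(f_amp) * np.exp(1j * np.angle(f_phase))
    return np.fft.irfft(hybrid, n=len(amplitude_source))

rng = np.random.default_rng(0)
true_density = rng.normal(size=512)   # supplies the "observed" amplitudes
wrong_model = rng.normal(size=512)    # supplies the model phases

model_phased_map = hybrid_map(true_density, wrong_model)

# Correlation of the map with the source of the phases and of the amplitudes.
cc_with_model = np.corrcoef(model_phased_map, wrong_model)[0, 1]
cc_with_truth = np.corrcoef(model_phased_map, true_density)[0, 1]
```

In this toy setting the map correlates strongly with the signal that supplied the phases and hardly at all with the signal that supplied the amplitudes, which is the essence of model bias.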
In the past, after placing the known fragments and perhaps carrying out some rigid body refinement, not much more optimization could be done. However, there have recently been a number of important technical advances in this area. One of these is B-factor sharpening [18], which involves the application of a negative B-factor to the diffraction data set. This up-weights the highest resolution reflections in the set and can give rise to more detail-rich maps (e.g., visible side chains); it is especially useful if experimental phases are available. Care is needed in applying this, as the weak highest resolution reflections also have the highest errors, and increasing their contributions is likely to increase the overall noise of the map as well. The optimum choice for the sharpening B-factor is the negative of the pseudo Wilson B-factor of the diffraction data [3]. It is also very important to have a reliable bulk solvent model and to correct for data anisotropy. Earlier procedures that worked well with high resolution data gave unstable results for low resolution data sets. New grid search-based iterative parameter optimizations of the bulk solvent model, such as the ones implemented in the newer versions of CNS (Crystallography and NMR [nuclear magnetic resonance] System) [19] and Phenix [20], have successfully overcome this problem.
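The B-factor sharpening described above can be sketched in a few lines. The example assumes the usual isotropic Debye-Waller form applied to structure factor amplitudes, exp(-B/(4d²)) with d the resolution of the reflection in Å; the function name and the numerical values are hypothetical.

```python
import numpy as np

def sharpen_amplitudes(f_obs, d_spacing, b_sharp):
    """Apply an isotropic B-factor to structure factor amplitudes.

    F_sharp = F_obs * exp(-b_sharp / (4 d^2)), with d in Angstrom and
    b_sharp in Angstrom^2. A negative b_sharp sharpens, i.e. it scales up
    the high resolution (small d) reflections.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    d = np.asarray(d_spacing, dtype=float)
    return f_obs * np.exp(-b_sharp / (4.0 * d ** 2))

# Hypothetical amplitudes at 10, 5 and 4 Angstrom resolution, sharpened with
# B = -80 Angstrom^2 (e.g. the negative of an assumed pseudo Wilson B-factor).
f_obs = np.array([100.0, 100.0, 100.0])
d_spacing = np.array([10.0, 5.0, 4.0])
f_sharp = sharpen_amplitudes(f_obs, d_spacing, b_sharp=-80.0)
```

Because the same factor multiplies the measurement errors, the standard deviations of the amplitudes would be scaled in the same way, which is why the noise of the map tends to rise along with its detail.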
Clearly, any attempt at molecular model refinement at resolutions poorer than 3.5 Å requires stronger, additional restraints on the structure. Explicit restraints on secondary structure, typically through some kind of H-bonding potential, are very useful. An exciting recent development is the use of known three-dimensional structures of homologues of the assembly under investigation, through a deformable elastic network (DEN) potential added to the target function used in torsion angle dynamics [21]. DEN allows restrained but still large-scale deviations from a high(er) resolution reference structure and this, in principle, overcomes the main limitation of previous refinement protocols.
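How a deformable network can permit large-scale deviations while still restraining the model can be sketched as follows. The sketch assumes a harmonic restraint on selected atom-pair distances whose equilibrium values are updated during refinement, with two parameters (here called `gamma` and `kappa`) controlling how far and how fast the network may drift from the reference; these names and the exact update rule are assumptions of this illustration and should be checked against [21].

```python
import numpy as np

def den_energy(d_current, d_equilibrium, weight=1.0):
    """Harmonic restraint energy over the selected atom-pair distances."""
    d_current = np.asarray(d_current, dtype=float)
    d_equilibrium = np.asarray(d_equilibrium, dtype=float)
    return weight * np.sum((d_current - d_equilibrium) ** 2)

def den_update(d_equilibrium, d_current, d_reference, gamma, kappa):
    """Move the equilibrium distances of the network (assumed update rule).

    gamma = 0 pulls the network back towards the reference structure,
    gamma = 1 lets it follow the current model; kappa sets the step size.
    """
    d_equilibrium = np.asarray(d_equilibrium, dtype=float)
    d_current = np.asarray(d_current, dtype=float)
    d_reference = np.asarray(d_reference, dtype=float)
    target = gamma * d_current + (1.0 - gamma) * d_reference
    return (1.0 - kappa) * d_equilibrium + kappa * target

# Hypothetical pair distances (Angstrom) in the reference and the current model.
d_reference = np.array([5.0, 8.0, 12.0])
d_current = np.array([5.5, 7.0, 12.0])
d_eq = d_reference.copy()

energy_before = den_energy(d_current, d_eq)
d_eq = den_update(d_eq, d_current, d_reference, gamma=0.5, kappa=0.2)
energy_after = den_energy(d_current, d_eq)
```

With `gamma` between 0 and 1, the equilibrium distances settle between the reference and the current model, so the restraint energy for a model that has genuinely moved away from the reference decreases over successive updates instead of holding the model rigidly in place.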