A combination of molecular replacement and single-wavelength anomalous diffraction phasing has been incorporated into the automated structure-determination platform Auto-Rickshaw. The complete MRSAD procedure includes molecular replacement, model refinement, experimental phasing, phase improvement and automated model building. The improvement over the standard SAD or MR approaches is illustrated by ten test cases taken from the JCSG diffraction data-set database. Poor MR or SAD phases with phase errors larger than 70° can be improved using the described procedure and a large fraction of the model can be determined in a purely automatic manner from X-ray data extending to better than 2.6 Å resolution.
As of May 2009, more than 57 000 three-dimensional structures of biological macromolecules had been deposited in the Protein Data Bank (PDB; Berman et al., 2000 ). With the availability of an ever-increasing number of potential search models among previously determined structures, molecular replacement (MR) has become the predominant technique for the determination of further structures. For the year 2007, it has been reported that more than two thirds of all newly deposited structures in the PDB could be solved using MR (Long et al., 2008 ). Different approaches for MR have been realised, including the use of Patterson map techniques (e.g. Rossmann & Blow, 1962 ; Huber, 1965 ; DeLano & Brünger, 1995 ), structure-factor correlation (Navaza, 1987 ) and statistical targets (Bricogne, 1992 , 1997 ; Read, 2001 ). As a consequence, a number of good and easy-to-use MR programs have become available. Examples include AMoRe (Navaza, 1994 ), MOLREP (Vagin & Teplyakov, 1997 ), CNS (Brünger et al., 1998 ), EPMR (Kissinger et al., 1999 ), QS (Glykos & Kokkinidis, 2000 ) and Phaser (McCoy et al., 2005 ).
In principle, MR can lead to a successful structure determination within hours or even minutes. Often, however, the method is not straightforward in practice. The model derived from an MR solution inherently suffers from model bias, which can become severe, especially when the root-mean-square difference (r.m.s.d.) between the search model and the target structure is high. Reduction of the model bias and model completion can become a challenging issue at resolutions lower than 2.3 Å and often requires iterative time-consuming manual correction of the model using computer graphics alternating with model refinement. The standard methods for bias removal include omission of parts of the model, allowance for model errors in the refinement target functions and map coefficients, map-averaging techniques (Main, 1967; Bricogne, 1976; Kleywegt & Read, 1997) and free-atom modelling, refinement and model building (Perrakis et al., 1999). During refinement, implementation of maximum-likelihood (ML) targets (Murshudov et al., 1997; Brünger et al., 1998) together with $\sigma_A$-weighted map coefficients (Read, 1986) to produce electron-density maps of the form $(2m|F_{\mathrm{obs}}| - D|F_{\mathrm{calc}}|,\ \alpha_{\mathrm{calc}})$ can significantly reduce model bias. ‘Classical’ OMIT maps (Bhat, 1988; Bhat & Cohen, 1984), $\sigma_A$-weighted OMIT maps (Read, 1986, 1990), shake OMIT maps (Zeng et al., 1997) and simulated-annealing OMIT maps (Hodel et al., 1992) are often used for this purpose. The statistical-based reciprocal-space density-modification method (Prime&Switch) can be applied to initial experimental maps or model-phased maps. This has been implemented in the program RESOLVE and performs well at low resolution with marginal models (Terwilliger, 1999, 2000). Recently, an efficient bias-removal protocol ‘Shake&wARP’ has been made available as a web service (Reddy et al., 2003) using a combination of EPMR (Kissinger et al., 1999) and the CCP4 suite of programs (Collaborative Computational Project, Number 4, 1994). Finally, the direct-method program OASIS (Hao et al., 2000) has been extended to perform dual-space molecular-replacement model completion (He et al., 2007).
The second most important phasing method in macromolecular crystallography is single-wavelength anomalous diffraction (SAD). SAD is based on accurately collected anomalous intensity differences arising from the presence of heavy atoms. Naturally, determination of the substructure becomes easier when an anomalous difference Fourier synthesis can be calculated using preliminary phases from an MR solution. The subsequent use of this substructure to generate an unbiased electron-density map (Baker et al., 1995 ) is often referred to as MRSAD (molecular replacement with single-wavelength anomalous diffraction; Schuermann & Tanner, 2003 ).
In the past few years, several automated structure-determination pipelines have been developed with varying degrees of automation and often with rather different goals. These include ACrS (Brunzelle et al., 2003 ), PHENIX (Adams et al., 2002 ), ELVES (Holton & Alber, 2004 ), CRANK (Ness et al., 2004 ), SGXPro (Fu et al., 2005 ), Auto-Rickshaw (Panjikar et al., 2005 ), autoSHARP (Vonrhein et al., 2006 ) and HKL-3000 (Minor et al., 2006 ). Most of them are based on experimental phasing approaches. More recently, software aimed at automatically assembling the set of ‘best’ models for MR has also been developed. Examples are MrBUMP (Keegan & Winn, 2007 ) and BALBES (Long et al., 2008 ). The MR software pipelines make several decisions concerning the actual protocol for sequence alignment and homology modelling, the truncation of the model in regions of uncertain homology and the choice of the MR software engine. The current consensus approach is to derive a variety of models and to try MR for all of them one by one, followed by preliminary refinement and ranking of each potential solution.
Here, we demonstrate that by using some of the abovementioned developments structure solution by a combination of MR and SAD can be automated and that even poor MR or SAD phases can be significantly improved. This approach is useful for the validation of MR solutions and for the reduction of model/phase bias. It is especially practical in cases in which the anomalous signal is not sufficiently strong to solve the structure by experimental phasing but is good enough to bootstrap the structure starting from a preliminary MR solution. The incorporation of the method into Auto-Rickshaw allows the fully automated determination of a large fraction of the structure from X-ray data extending to better than 2.6 Å resolution for most cases studied.
Ten test cases were selected from the JCSG data depository (http://www.jcsg.org/datasets-info.shtml). All of these data sets were collected at the high-energy side of the selenium K absorption edge (Table 1 ). The examples covered maximum resolutions ranging from 1.8 to 2.5 Å, were distributed among various crystal forms and seven different space groups and contained between 116 and 1356 amino-acid residues in the asymmetric unit. The sequence identity of the available search models to the target structures ranged between 36 and 51%.
The program MrBUMP (Keegan & Winn, 2007 ) was used for search-model selection based on the sequence identity to the target structure as the main selection criterion. The quality of the search model was assessed by calculating the r.m.s.d. to the homologous part of the target model. When the final refined target structure was superimposed onto the corresponding search model, the r.m.s.d. values ranged from 0.7 to 2.4 Å based on 72–371 superimposed Cα atom pairs.
The process of MRSAD is shown schematically in Fig. 1. The one common entry point to the MRSAD procedure is a set of heavy-atom sites $X_H$. These sites can be determined from the observed anomalous differences $\Delta F_o$ either via heavy-atom substructure determination by Patterson, direct-methods or dual-space techniques or via model phases $\alpha_{c,\mathrm{MR}}$ resulting from an MR solution or a partial model. The sites are used to compute an initial set of phases $\alpha_{\mathrm{SAD}}$, which are improved by density modification, noncrystallographic symmetry (NCS) averaging (where applicable), phase extension etc. to yield the modified phases $\alpha_{\mathrm{MOD}}$, which in turn are the starting phases for model building. In the second cycle, the model phases $\alpha_{C,\mathrm{SAD}}$ derived from the built partial model are used to update the heavy-atom sites and are then combined with the SAD phases derived from the updated heavy-atom substructure. The resultant combined phases $\alpha_{\mathrm{COMB}}$ are then used again for density modification and model building. The procedure is repeated until most of the structure has been built.
The success of the MRSAD protocol was judged on the basis of the fraction of the total amino-acid residues built as well as by the R free of the refined partial model. In our experience, for structures traced to a reasonable completeness the fraction of the side chains docked is a good indicator of the overall quality of the model (data not shown).
The MRSAD approach has been implemented in the automated structure-determination pipeline Auto-Rickshaw. The respective crystallographic computer programs invoked at every step are depicted in the MRSAD flowchart (Fig. 2). In Auto-Rickshaw, the required input parameters for the MRSAD protocol include only the space group, the number of amino-acid residues per subunit, the number of subunits in the asymmetric unit, the amino-acid sequence of the target structure or a search model and native or anomalous data. The Auto-Rickshaw web server (http://www.embl-hamburg.de/Auto-Rickshaw) allows the user to follow the progress of the structure determination conveniently. It also provides visualization of the resulting model and the possibility to download all relevant files for further inspection. An initial overview of the Auto-Rickshaw framework has been described previously (Panjikar et al., 2005). In the following, the various tasks performed in Auto-Rickshaw are described in more detail.
(i) If a search model for MR is provided as input and if the difference in the unit-cell parameters between the search model and the input X-ray data is larger than 1%, MR is performed using the program MOLREP (Vagin & Teplyakov, 1997 ). Otherwise, this step is skipped and the model is refined directly (see below). (ii) If the amino-acid sequence of the target structure is provided by the user, the MR pipeline BALBES (Long et al., 2008 ) is executed, which uses the models of domains from its own database and refines potential solutions using REFMAC5. Auto-Rickshaw then proceeds to the next step using the best MR model provided by BALBES.
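The entry logic of the MR step can be summarised in a short Python sketch. The function names, the way the 1% unit-cell difference is measured (here the largest relative deviation over the six cell parameters) and the precedence of a supplied search model over a supplied sequence are illustrative assumptions; the text does not specify them, and this is not the Auto-Rickshaw code.

```python
from typing import Optional, Sequence


def max_relative_cell_difference(model_cell: Sequence[float],
                                 data_cell: Sequence[float]) -> float:
    """Largest relative deviation over (a, b, c, alpha, beta, gamma).

    Assumption: the text only says 'larger than 1%'; comparing each
    parameter separately and taking the maximum is one possible reading.
    """
    return max(abs(m - d) / d for m, d in zip(model_cell, data_cell))


def choose_mr_route(model_cell: Optional[Sequence[float]],
                    data_cell: Sequence[float],
                    has_sequence: bool,
                    tolerance: float = 0.01) -> str:
    """Return a (hypothetical) label for the branch taken in the MR step."""
    if model_cell is not None:
        # Case (i): a search model was supplied.
        if max_relative_cell_difference(model_cell, data_cell) > tolerance:
            return "MOLREP"
        # Cells agree within the tolerance: MR is skipped.
        return "refine directly"
    if has_sequence:
        # Case (ii): only the target sequence was supplied.
        return "BALBES"
    raise ValueError("either a search model or a sequence is required")
```

With a search model whose cell deviates by 4% in one axis the sketch returns "MOLREP", whereas a near-isomorphous cell leads directly to refinement.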
This step involves rigid-body refinement of each chain of the MR model using CNS (Brünger et al., 1998 ) at 4 Å resolution. Afterwards, positional and B-factor refinement at 3.0 Å resolution is carried out. The resulting model is then used for refinement with REFMAC5 (Murshudov et al., 1997 ) to the maximum resolution of the provided X-ray data. If the asymmetric unit contains more than one molecule and if the resolution is lower than 1.8 Å, NCS restraints are included in the refinement. Once an R free of less than 30% is reached, the process is terminated; otherwise, it continues with the next step.
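The resolution limits and decision thresholds of this refinement step can be collected in a small Python sketch. The class and function names are hypothetical, and the phrase 'resolution lower than 1.8 Å' is read here as a nominal resolution limit numerically larger than 1.8 Å; both are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RefinementPlan:
    rigid_body_resolution: float  # CNS rigid-body refinement (Å)
    positional_resolution: float  # CNS positional and B-factor refinement (Å)
    final_resolution: float       # REFMAC5, full resolution of the data (Å)
    ncs_restraints: bool          # whether NCS restraints are switched on


def plan_refinement(d_min: float, n_molecules: int) -> RefinementPlan:
    """Build the refinement plan for data extending to d_min (Å)."""
    # NCS restraints: more than one molecule and resolution lower than 1.8 Å.
    use_ncs = n_molecules > 1 and d_min > 1.8
    return RefinementPlan(4.0, 3.0, d_min, use_ncs)


def refinement_is_sufficient(r_free: float) -> bool:
    """True if R free (as a fraction) is below the 30% termination threshold."""
    return r_free < 0.30
```

For a data set such as 1vkn (four molecules, 2.45 Å) the sketch would switch NCS restraints on, and the pipeline would continue to the next step for as long as R free stays at or above 0.30.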
When the resolution of the X-ray data is 2.6 Å or higher, phases calculated from the refined model are subjected to statistical density modification using PIRATE (Cowtan, 2000 ). The resultant phases are then used for automated model building using ARP/wARP (Perrakis et al., 1999 ). When the resolution is lower than 2.6 Å, ‘Prime&Switch’-based density modification and model building are performed using RESOLVE (Terwilliger, 1999 , 2000 ).
This step can only be performed when the input intensity file contains the Friedel pairs. The model phases and the anomalous differences are combined into a single MTZ file using CAD (Collaborative Computational Project, Number 4, 1994 ) and an anomalous difference Fourier map is calculated with FFT (Collaborative Computational Project, Number 4, 1994 ). A peak search is performed with PEAKMAX (Collaborative Computational Project, Number 4, 1994 ) and the site selection is based on the peak-height list produced. Initially, all sites above 5σ (where σ denotes the standard deviation of the anomalous difference Fourier map) are considered. Then, only sites which are above the threshold identified by a drop in the peak height of more than 65% between successive sites are selected. If no such drop can be identified in the peak list, the remainder of the peak list is searched until the peak height reaches 4.5σ. If the substructure model is poor (peak heights between 5σ and 9σ), RESOLVE is invoked for ‘Prime&Switch’ density modification and the resultant phases are used for the substructure solution. If all peak heights are above 13σ, SHELXC and SHELXE (Sheldrick, 2008 ) are used for density modification. SHELXE is executed for 400 cycles using the ‘free-lunch’ algorithm (Caliandro et al., 2007 ). The phases and structure factors are theoretically extended to 1.5 Å if the resolution of the experimental data is between 2.0 and 1.5 Å. The success of the procedure is gauged by the connectivity of the map. If this approach is successful, the next steps are skipped and the procedure continues with automated model building (see below) using ARP/wARP.
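The site-selection rule applied to the peak list can be read in more than one way, so the following Python sketch shows only one possible interpretation. It assumes that peak heights are given in units of σ, that the 'drop' is measured relative to the higher of two successive peaks, and that all peaks down to 4.5σ are kept when no such drop is found; the function name and these choices are assumptions, not the Auto-Rickshaw implementation.

```python
from typing import List, Optional, Sequence


def _cut_at_drop(heights: Sequence[float], max_drop: float) -> Optional[List[float]]:
    """Return the peaks before the first relative drop larger than max_drop."""
    for i in range(len(heights) - 1):
        if (heights[i] - heights[i + 1]) / heights[i] > max_drop:
            return list(heights[:i + 1])
    return None  # no sufficiently large drop between successive peaks


def select_sites(peak_heights: Sequence[float],
                 initial_cutoff: float = 5.0,
                 floor: float = 4.5,
                 max_drop: float = 0.65) -> List[float]:
    """Select heavy-atom candidates from anomalous difference Fourier peaks."""
    peaks = sorted(peak_heights, reverse=True)

    # First pass: only peaks above 5 sigma are considered.
    candidates = [h for h in peaks if h > initial_cutoff]
    selected = _cut_at_drop(candidates, max_drop)
    if selected is not None:
        return selected

    # Second pass: search the remainder of the list down to 4.5 sigma.
    extended = [h for h in peaks if h >= floor]
    selected = _cut_at_drop(extended, max_drop)
    # Assumption: without any drop, every peak down to the floor is kept.
    return selected if selected is not None else extended
```

In this reading, a list such as 25, 22, 20, 6, 5.5σ is cut after the third peak, because the step from 20σ to 6σ is a drop of 70%.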
The automatically refined MR model or the partial model resulting from ARP/wARP or RESOLVE and the heavy-atom sites found from the previous step as well as the anomalous data are used to produce a set of phases using Phaser. The purpose of this step is to validate the initial heavy-atom sites determined from the anomalous difference Fourier map and to find additional heavy sites which could not be detected in the map. When the MR solution is poor and the anomalous difference Fourier map does not generate heavy-atom sites with peaks higher than 5σ, Phaser may still be able to identify some low-occupancy sites. Should Phaser not succeed in producing a list of heavy-atom sites or if the heavy atoms are known from the previous step, OASIS-2006 (Zhang et al., 2007 ) is used for dual-space phasing.
The pipeline can invoke three heavy-atom refinement and phase-calculation programs: MLPHARE (Collaborative Computational Project, Number 4, 1994 ), SHARP (de La Fortelle & Bricogne, 1997 ) and BP3 (Pannu et al., 2003 ; Pannu & Read, 2004 ). Initially, MLPHARE is executed to refine the occupancy of the sites to the maximum resolution of the data. If the figure of merit (FOM) does not exceed 10%, the resolution limit is decreased by 0.2 Å and the sites are refined at the lower resolution. If after this step the FOM has not risen above 15%, SHARP is used for refinement and phase calculation. If SHARP does not succeed, the refinement is continued using BP3.
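The fallback order of the three programs can be sketched in Python as follows. The external programs are represented by caller-supplied functions (hypothetical stand-ins that return a figure of merit or a success flag), 'decreasing the resolution limit by 0.2 Å' is read as adding 0.2 Å to the d-spacing cutoff, and the 15% test is applied to the last MLPHARE run whether or not the retry took place. These are assumptions for illustration, not the actual Auto-Rickshaw code.

```python
from typing import Callable, Tuple


def refine_and_phase(d_min: float,
                     run_mlphare: Callable[[float], float],
                     run_sharp: Callable[[], bool],
                     run_bp3: Callable[[], bool]) -> Tuple[str, float]:
    """Return the program that delivered the phases and the MLPHARE cutoff used.

    run_mlphare(cutoff) is expected to return the FOM as a fraction;
    run_sharp() and run_bp3() are expected to return True on success.
    """
    cutoff = d_min
    fom = run_mlphare(cutoff)
    if fom <= 0.10:
        # FOM does not exceed 10%: retry at 0.2 Å lower resolution.
        cutoff = d_min + 0.2
        fom = run_mlphare(cutoff)
    if fom > 0.15:
        return "MLPHARE", cutoff
    # FOM has not risen above 15%: hand over to SHARP, then to BP3.
    if run_sharp():
        return "SHARP", cutoff
    run_bp3()
    return "BP3", cutoff
```

Passing the programs in as functions keeps the decision logic testable without any crystallographic software being installed.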
The two sets of phases calculated in steps 5 and 6 are combined using SIGMAA (Read, 1986 ). This step is skipped for an MR solution, when only native data are available. The resulting phases are improved by density modification and NCS averaging in PIRATE or RESOLVE.
A beta version of SHELXE (Sheldrick, 2009 ) is used to build a polyalanine model using the phases calculated in the previous step. The updated substructure is used in step 5 if the cycle is repeated, for example when the model is not completed in the current cycle.
The choice of programs for model building depends on the resolution of the X-ray data. If the value for the approximate resolution for 50% solvent content $d_{50}$ [according to the formula $d_{50} = d_{\min}(\mathrm{sc}^{-1} - 1)^{1/3}$, where $d_{\min}$ is the nominal maximum resolution of the X-ray data and sc is the solvent content of the crystal] is higher than 2.6 Å, the initial model building is carried out with ARP/wARP v.7.0.1. The number of building cycles is dependent on the map quality, which is assessed from the number of residues built in the first building cycle. If $d_{\min}$ is less than 2.0 Å and more than 70% of the model is built in the first building cycle, the total number of building cycles is set to five, whereas in all other cases ten building cycles are used. If the maximum resolution is lower than 2.6 Å then RESOLVE is used. When a polyalanine model is available from step 8, it is used as a starting model in ARP/wARP and density-modified phases are used for phased refinement in REFMAC5 for iterative automated model building and side-chain docking. The benefit of the phased refinement is that ARP/wARP usually requires fewer building cycles. Similarly, if the model-building path uses RESOLVE, the polyalanine model is used as a starting model for further building and side-chain docking.
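As a worked illustration of the solvent-corrected resolution, the following Python sketch evaluates the formula given above and applies the stated thresholds. The function names are hypothetical, 'higher than 2.6 Å' is read as a numerically smaller value, and the behaviour at exactly 2.6 Å (assigned here to ARP/wARP) is an assumption.

```python
def d50(d_min: float, solvent_content: float) -> float:
    """Approximate resolution for 50% solvent content (Å).

    d_min is the nominal maximum resolution (Å); solvent_content is a
    fraction between 0 and 1.
    """
    return d_min * (1.0 / solvent_content - 1.0) ** (1.0 / 3.0)


def choose_builder(d_min: float, solvent_content: float) -> str:
    """Pick the model-building program from the solvent-corrected resolution."""
    # Assumption: the boundary value of 2.6 Å is assigned to ARP/wARP.
    if d50(d_min, solvent_content) <= 2.6:
        return "ARP/wARP"
    return "RESOLVE"


def building_cycles(d_min: float, fraction_built_first_cycle: float) -> int:
    """Number of ARP/wARP building cycles: five for good maps, otherwise ten."""
    if d_min < 2.0 and fraction_built_first_cycle > 0.70:
        return 5
    return 10
```

At 50% solvent content the correction factor is unity, so $d_{50}$ equals $d_{\min}$; a crystal with higher solvent content obtains a numerically smaller $d_{50}$, so that a 2.8 Å data set with 70% solvent would still be routed to ARP/wARP in this sketch.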
The model generated in step 9 is now refined to the maximum resolution of the data using REFMAC5. If the resultant R free is lower than 30%, the automated procedure is considered to be complete. Otherwise, if the built model is less than 70% complete (using RESOLVE) or 90% complete (using ARP/wARP), an anomalous difference Fourier map is calculated based on the latest phase set and steps 4–10 are repeated. Auto-Rickshaw checks the improvement after every big cycle (steps 4–10). The improvement is gauged by the standard deviation of the local r.m.s. of the electron-density map after density modification, the total number of residues built, the R free value from the refinement of the model and the absolute peak heights in the anomalous difference Fourier map. For MR based on native data alone, progress is assessed based on the fraction of the model built and the R free from the refinement of the model.
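The decision taken at the end of each big cycle can be sketched in Python as follows. The function name and return labels are hypothetical, R free and completeness are expressed as fractions, and the text does not state what happens when the model is sufficiently complete but R free remains at or above 30%, so that branch is labelled as unspecified rather than guessed.

```python
def next_action(r_free: float, fraction_built: float, builder: str) -> str:
    """Decide how to proceed after the final refinement of a big cycle."""
    if r_free < 0.30:
        return "complete"
    # Completeness threshold depends on the model-building program used.
    threshold = 0.70 if builder == "RESOLVE" else 0.90
    if fraction_built < threshold:
        # New anomalous difference Fourier map, then steps 4-10 again.
        return "repeat steps 4-10"
    # Not described in the text; left open in this sketch.
    return "unspecified"
```

In the actual pipeline the repeat branch is additionally subject to the improvement checks described above (map statistics, residues built, R free and anomalous peak heights), which are not modelled here.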
In the case of MRSAD all of the abovementioned steps are carried out. However, when the substructure cannot be resolved because of a poor-quality model and/or poor anomalous data then the MRSAD protocol switches to a conventional MR recycling protocol. This protocol consists of steps 1–3 and steps 5, 9 and 10. The process is iterated until there is no further improvement from one cycle to the next. The MR recycling protocol can also be invoked using the native data and sequence or model information.
The major goals of the above implementation are to overcome the model bias from an MR solution, to build a more complete model from a partial and possibly fragmented preliminary model, and to use anomalous data in aiding model building in electron-density maps generated from MR phases.
The data sets used to evaluate the MRSAD procedure of Auto-Rickshaw are listed in Table 1 . The ten examples are sorted by increasing strength of the anomalous signal as indicated by the ratio R anom/R p.i.m.. Also shown are the PDB codes of the search models used for MR in each of the cases and the sequence-identity percentages of the search models. In order to evaluate the described MRSAD procedure, a comparison of MRSAD with a purely MR-based structure-determination procedure (Table 2 ) and with a standard SAD phasing procedure (Table 3 ) was performed using the ten test cases.
In Table 2 , three approaches based on the primary phase information from an MR solution are compared with each other: the conventional MR procedure, the MR recycling procedure and the MRSAD procedure described above. The conventional MR procedure simply entails structure solution using MR and subsequent model refinement using CNS and REFMAC5. The MR recycling procedure is based upon iterative improvement of the MR phases. In each phasing cycle (PCMR) the MR phases are improved by model completion using OASIS-2006, density modification using PIRATE/RESOLVE and model building using ARP/wARP. In the MRSAD procedure a phasing cycle (PCMRSAD) consists of steps 4–10 described above. The numbers presented in Table 2 and graphically in Fig. 3 (a) demonstrate that the MRSAD procedure yields a larger fraction of automatically built amino-acid residues and equally low or lower R free values in all cases. The number of phasing cycles is also typically reduced, leading to quicker structure determination. This can be a decisive factor when structure determination is invoked whilst a user is at a synchrotron beamline collecting data, when quick answers are required in order to have an influence on further data-collection strategy. A striking example in this respect is the test case 1vkn. This structure contains four molecules of 339 residues each in the asymmetric unit in space group P21. The maximum resolution of the X-ray data is 2.45 Å. In this case, MR phasing alone was not sufficient to complete the model. Even after a round of MR recycling the free R factor was still above 50% and the model could not be improved any further. In contrast, three rounds of MRSAD cycling produced a model with about 60% of the residues built and about 40% of all side chains docked into the electron density.
The described MRSAD procedure was also compared with a purely SAD phasing approach, as well as with SAD phasing followed by model refinement (Table 3, Fig. 3b). For seven of the ten test cases the substructure could be solved, making them amenable to SAD phasing in the ‘Advanced version’ of Auto-Rickshaw. The remaining three cases (2gi3, 1vmf and 2f4l) were thus not further considered. For two of the seven successful cases, the SAD phases turned out to be so good that most of the structure was built automatically and that the free R factor was already below 30% after model refinement, so that no further improvement by MRSAD was anticipated. For the remaining five cases the improvement of MRSAD over SAD is clearly discernible from the numbers in Table 3 and the graphs in Fig. 3(b). 1vkn is again a striking case: SAD phasing alone was difficult in spite of the rather high R anom/R p.i.m. ratio of 2.4. The automatically built model from SAD phases alone contained only 209 of the 1356 residues present in the asymmetric unit. This partial model was used as input for the MRSAD protocol and was directly fed into Phaser for SAD phasing and substructure completion. Phaser produced 40 heavy-atom sites, corresponding to 32 Se and eight S atoms. The sites were refined in MLPHARE to a maximum resolution of 2.45 Å. The phases from Phaser and MLPHARE were then combined and density modification and NCS averaging with RESOLVE were carried out followed by model building with ARP/wARP. In the first phasing cycle, the MRSAD protocol resulted in the building of 464 residues. This model was refined using REFMAC5 to R work and R free values of 49.7% and 51.4%, respectively. In the second cycle, 682 residues were built and refinement of the model gave R work and R free values of 46.2% and 50.6%, respectively. In the fourth cycle, 917 residues were built and 633 residues were docked into the sequence. Refinement of the model resulted in R work and R free values of 39.1% and 41.1%, respectively. A further round of MRSAD phasing did not improve the total number of residues and R free increased by 3%. Therefore, the procedure was halted at this point. The evolution of the electron density and the model is shown in Fig. 4. This particular example demonstrates that in cases when SAD phases are weak and insufficient to produce a good starting model the MRSAD protocol can rescue the situation.
Since the implementation of the MRSAD phasing protocol in Auto-Rickshaw in August 2007, 84 users have used MRSAD to solve a total of 120 novel structures with resolutions ranging from 2.7 to 1.5 Å and the number of amino-acid residues in the asymmetric unit ranging from 100 to 3000. 46 structures were solved starting from a search model or from sequence information, whilst the remaining structure solutions started from experimental phases. A recent example is the crystal structure of Plasmodium falciparum profilin (Kursula et al., 2008 ), where a partial model (60 residues of 171) was obtained using the three-wavelength Br MAD data sets. This model and the Br peak data set were used as a starting point in the MRSAD protocol, which provided an almost complete model.
The developed protocol can be applied to various kinds of problems. One particularly useful application is the model completion of protein–protein complex structures. As an example, the structure of vascular endothelial growth factor (VEGF-A) in complex with an engineered binding protein was solved using the MRSAD protocol based on a search model available for VEGF and using long-wavelength data (Giese & Skerra, unpublished work). Even for very large structures, such as, for instance, muconate-lactonizing enzyme from Klebsiella pneumoniae (3048 residues and eight subunits in the asymmetric unit; PDB entry 3fcp; Fedorov et al., unpublished work), model completion has successfully been achieved.
The Auto-Rickshaw platform has been installed on a 68 CPU-core cluster at EMBL Hamburg. It is available via a web server at http://www.embl-hamburg.de/Auto-Rickshaw/. Registration and use of the server are free of charge for academic users.
The Auto-Rickshaw platform is undergoing continuous development. This includes the incorporation of new functionalities as well as continuous software upgrades. A number of additional tasks will be incorporated into the MR and MRSAD protocols of the Auto-Rickshaw software pipeline in the future. These include the use of other molecular-replacement programs (e.g. Phaser), use of the SAD function (Skubák et al., 2004 ) in refinement and a link to automatic data-collection software such as DNA (Leslie et al., 2002 ) and automated data-processing systems such as XIA2 (http://www.ccp4.ac.uk/xia/). Another important aspect is the evolution and improvement of the decision making by evaluating an ever larger number of test cases and by extensive parameter screening in order to increase the efficiency of the coded decision making for the described phasing protocols.
We would like to express our thanks to the developers of the various computer programs for their kind permission to use their software in the Auto-Rickshaw pipeline. We also gratefully acknowledge the generous supply of X-ray data from the JCSG data depository by Ashley Deacon. The work was supported in part by the EC-funded BIOXHIT project (contract No. LHSG-CT-2003-503420).