To help users through the various steps from images to a fully processed, scaled and merged data set, several comprehensive software packages have been developed (Pflugrath, 1999; Holton & Alber, 2004; Sauter et al.; Minor et al.; Winter, 2010). Over the last five years, we have developed a set of programs that, together with several third-party programs, make up the autoPROC framework. The collection of modules that makes up this framework is intended as an offline tool for the fully automatic processing of diffraction images from single-sweep or multi-sweep experiments (e.g. multi-wavelength MAD, low-resolution and high-resolution passes, inverse-beam or interleaved-wavelength data collection). The typical steps during this process are (i) image analysis; (ii) spot search; (iii) indexing; (iv) initial analysis of diffraction quality and detector parameters; (v) refinement of initial unit-cell parameters, orientation and mosaicity; (vi) determination of the most likely space group; (vii) integration of all images; and (viii) scaling and merging of integrated intensities (see Fig. 4).
Figure 4 Typical offline data-processing steps.
Since June 2005, autoPROC has been released to members of the Global Phasing industrial consortium as well as various academic beta testers and synchrotron beamlines. It has been extensively used and incorporated in high-throughput pipelines and has seen several updates since then. The latest version is expected to be released to academic users in the first quarter of 2011.
autoPROC is implemented as a series of modules for the various steps shown in Fig. 4. Each module is clearly separated from the others, with a defined set of input and output parameters. The original implementation used mainly MOSFLM and SCALA as the pipeline components. Subsequent developments added support for XDS as the data-processing engine and POINTLESS for space-group determination. Several programs from the CCP4 suite (Collaborative Computational Project, Number 4, 1994) are also used within the pipeline. Additional software components developed exclusively for autoPROC are available to add further functionality and robustness. A collection of auxiliary tools is provided to help the user during automated data processing. Execution of programs is mainly command-driven and in its simplest form can take place through a single command (using all default settings).
Several mechanisms are provided to fine-tune the data processing and decision-making for a particular data set, a specific beamline or instrument, a series of data collections coming from a known crystal form or challenging projects that might require nonstandard parameters. Because a typical synchrotron trip can yield many data sets, a macro facility is implemented that groups a collection of settings so that autoPROC can be applied quickly and easily to a large collection of data sets. This also allows easy incorporation of the software into a larger in-house pipeline, e.g. in drug-discovery programs or structure-based drug design.
8.2. Determining the beam centre
The GETBEAM program is provided in order to help the user to understand the relationship between the image-header values for a specific instrument or beamline and the values expected by the integration program (as driven through autoPROC). It allows the testing of coordinate conventions, the analysis of direct-beam shots and the refinement of input beam-centre coordinate values.
If a direct-beam shot image is given, the position of the largest pixel value in the image array is used. The search is restrained to the neighbourhood of the initial beam-centre value (which is usually obtained from the image header) in order to avoid finding a rogue pixel or zinger, as shown in Fig. 5.
Figure 5 Using GETBEAM to help define direct-beam coordinates: (a) background-only image for 1vq0 (JCSG, 2006) with lines used for calculating correlations between opposite areas; (b) part of a direct-beam shot image with enlarged areas around the direct-beam ...
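The restrained search for the direct beam can be illustrated with a minimal sketch. This is not the GETBEAM implementation: the function name `find_direct_beam`, the square search window and its `half_width` parameter are assumptions made here for illustration, and the image is a toy array rather than real detector data.

```python
def find_direct_beam(image, header_centre, half_width):
    """Return (row, col) of the largest pixel in a square window around header_centre.

    image is a list of rows of pixel values; header_centre is (row, col) in pixels.
    half_width is a hypothetical restraint parameter, not an autoPROC setting.
    """
    row0, col0 = header_centre
    n_rows, n_cols = len(image), len(image[0])
    best_value, best_position = None, None
    for row in range(max(0, row0 - half_width), min(n_rows, row0 + half_width + 1)):
        for col in range(max(0, col0 - half_width), min(n_cols, col0 + half_width + 1)):
            if best_value is None or image[row][col] > best_value:
                best_value, best_position = image[row][col], (row, col)
    return best_position


# Toy 100 x 100 image: direct beam at (52, 47), stronger zinger far away at (5, 90).
image = [[10] * 100 for _ in range(100)]
image[52][47] = 5000
image[5][90] = 60000

unrestrained = find_direct_beam(image, (50, 50), half_width=100)  # window covers whole image
restrained = find_direct_beam(image, (50, 50), half_width=10)     # window near header value
```

With the window covering the whole image the zinger wins; with the window restricted to the vicinity of the header value the direct-beam pixel is returned.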
When no direct-beam shot image is available, a series of normal images can be used. To remove the effect of diffraction spots on these images, a so-called underlay image is constructed (Pflugrath, 1999): for this, the smallest pixel value found in all images at each position is taken. The final image should be free of actual diffraction spots if several images sufficiently far apart in oscillation angle are used. Ideally, the only remaining feature of this image should be the diffuse background coming mainly from the solvent in and around the crystal. In a setting where the direct beam is perpendicular to the detector surface this should be a radially symmetric distribution with the direct-beam coordinates at its centre. Fig. 5 shows a series of lines emanating from the current beam centre, constructed in order to calculate the correlation of pixel values between opposite lines. This score is used either to decide which of the eight possible choices of origin is the most likely or, if well defined features with circular symmetry such as ice rings are present, to refine an initial beam-centre position.
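The two ideas in this paragraph, the underlay image and the correlation between opposite lines, can be sketched as follows. The sampling scheme (number of lines, radii, nearest-pixel sampling) and all function names are assumptions chosen for illustration, and the data are synthetic; the details of the actual GETBEAM scoring are not given in the text.

```python
import math


def underlay(images):
    """Per-pixel minimum over a stack of equally sized images (lists of rows)."""
    n_rows, n_cols = len(images[0]), len(images[0][0])
    return [[min(img[r][c] for img in images) for c in range(n_cols)]
            for r in range(n_rows)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def opposite_line_score(image, centre, radii, n_lines=12):
    """Correlate pixels sampled along lines from centre with those on the opposite lines."""
    n_rows, n_cols = len(image), len(image[0])
    side_a, side_b = [], []
    for k in range(n_lines):
        angle = math.pi * k / n_lines  # each line is paired with the one 180 degrees away
        for radius in radii:
            d_row = round(radius * math.sin(angle))
            d_col = round(radius * math.cos(angle))
            r1, c1 = centre[0] + d_row, centre[1] + d_col
            r2, c2 = centre[0] - d_row, centre[1] - d_col
            if 0 <= r1 < n_rows and 0 <= c1 < n_cols and 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                side_a.append(image[r1][c1])
                side_b.append(image[r2][c2])
    return pearson(side_a, side_b)


# Synthetic data: smooth background centred on (40, 40) plus one strong spot per image.
SIZE, TRUE_CENTRE = 81, (40, 40)
background = [[1000.0 * math.exp(-math.hypot(r - 40, c - 40) / 20.0) for c in range(SIZE)]
              for r in range(SIZE)]
images = []
for spot_row, spot_col in [(20, 25), (55, 60), (30, 70)]:
    frame = [row[:] for row in background]
    frame[spot_row][spot_col] += 5000.0
    images.append(frame)

clean = underlay(images)  # the spots vanish because each sits in only one image
score_true = opposite_line_score(clean, TRUE_CENTRE, radii=range(5, 30, 5))
score_shifted = opposite_line_score(clean, (40, 50), radii=range(5, 30, 5))
```

In this toy case the score is essentially 1 at the true centre and clearly lower at a centre displaced by ten pixels, which is the property that makes the score usable for ranking candidate beam positions.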
A collection of 356 data sets (JCSG, 2006) collected between October 2001 and September 2010 was used to assess how well this method determines the coordinate convention to which the beam-centre values recorded in an image header refer. Nearly half of these data sets (170) had the beam centre recorded as the midpoint of the image and were excluded from further analysis. Of the remaining 186 data sets, three could not be indexed correctly. For the remaining 183 data sets the average distance between refined beam-centre values and the values recorded in the header was 67.9 pixels, whereas the same average distance after using GETBEAM was only 5.4 pixels. This clearly shows the benefit of testing for the coordinate convention of header values using this approach.
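Testing the coordinate convention amounts to scoring a small set of alternative readings of the header values. The sketch below assumes that the eight choices arise from an optional swap of the two coordinates combined with an optional flip along each detector axis, and that the header values have already been converted to pixels; both are assumptions of this example. The `symmetry_score` function is a simple stand-in for the correlation score described above, not the GETBEAM score.

```python
from itertools import product


def candidate_centres(header_xy, n_fast, n_slow):
    """Eight readings of a header beam centre (pixels): axis swap and a flip of either axis."""
    x, y = header_xy
    candidates = {}
    for swap, flip_fast, flip_slow in product((False, True), repeat=3):
        fast, slow = (y, x) if swap else (x, y)
        if flip_fast:
            fast = n_fast - 1 - fast
        if flip_slow:
            slow = n_slow - 1 - slow
        candidates[(swap, flip_fast, flip_slow)] = (fast, slow)
    return candidates


def symmetry_score(image, centre, max_offset=20):
    """Stand-in score: minus the summed mismatch between opposite pixels (0 is best).

    Assumes centre lies at least max_offset pixels away from every image edge.
    """
    fast, slow = centre
    mismatch = 0
    for step in range(1, max_offset + 1):
        for d_fast, d_slow in ((step, 0), (0, step), (step, step), (step, -step)):
            a = image[slow + d_slow][fast + d_fast]
            b = image[slow - d_slow][fast - d_fast]
            mismatch += abs(a - b)
    return -mismatch


# Synthetic 100 x 100 background, radially symmetric about fast = 30, slow = 70.
N = 100
true_fast, true_slow = 30, 70
image = [[10000 - (f - true_fast) ** 2 - (s - true_slow) ** 2 for f in range(N)]
         for s in range(N)]

header_xy = (70, 30)  # toy header that stores the two coordinates in swapped order
candidates = candidate_centres(header_xy, N, N)
best = max(candidates, key=lambda key: symmetry_score(image, candidates[key]))
```

Here `best` identifies the swapped, unflipped reading, whose candidate position coincides with the centre of symmetry of the synthetic background.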
8.3. Multiple lattices
autoPROC allows the detection of multiple lattices and robust indexing of the main lattice (see Fig. 6). This is achieved through an iterative selection of spots matching the current indexing matrix, an approach similar to that presented by Sauter & Poon (2010). Spots that clearly do not match the current orientation matrix are pooled for a second round of indexing: in this way, additional lattices can be detected automatically and their relation to the main lattice can be analysed. Furthermore, spots that cannot be indexed at all within any of the orientation matrices obtained are used to search for possible ice rings in the diffraction images (Fig. 2).
Figure 6 Visualization of multiple lattices in 1vk2 (JCSG, 2006) by autoPROC: both pictures show ‘lattices’ in different colours. The two main lattices are shown in red and blue.
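The partitioning of spots between lattices can be sketched as follows. Only the selection step is shown: the indexing itself (finding a cell and orientation from the pooled spots) is replaced by two cells supplied by hand, and the tolerance, the function names and the toy cubic cells are assumptions made for this example rather than autoPROC behaviour.

```python
import math


def fractional_indices(spot, cell):
    """Fractional (h, k, l): dot products of a reciprocal-space spot with real-space a, b, c."""
    return tuple(sum(s * v for s, v in zip(spot, axis)) for axis in cell)


def matches(spot, cell, tol):
    """True if all three fractional indices lie within tol of an integer."""
    return all(abs(x - round(x)) <= tol for x in fractional_indices(spot, cell))


def assign_lattices(spots, cells, tol=0.15):
    """Give each spot to the first cell it matches; return (spots per cell, leftovers)."""
    remaining, per_cell = list(spots), []
    for cell in cells:
        per_cell.append([s for s in remaining if matches(s, cell, tol)])
        remaining = [s for s in remaining if not matches(s, cell, tol)]
    return per_cell, remaining


def cubic_cell(edge, angle_deg):
    """Cubic cell (real-space a, b, c) rotated by angle_deg about the z axis."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return ((edge * c, edge * s, 0.0), (-edge * s, edge * c, 0.0), (0.0, 0.0, edge))


def spot_for(hkl, cell, edge):
    """Reciprocal-space position of reflection hkl; valid for the cubic toy cell only."""
    return tuple(sum(h * axis[i] for h, axis in zip(hkl, cell)) / edge ** 2 for i in range(3))


EDGE = 50.0
main_cell, minor_cell = cubic_cell(EDGE, 0.0), cubic_cell(EDGE, 20.0)
spots = [spot_for(hkl, main_cell, EDGE)
         for hkl in [(1, 0, 0), (0, 1, 1), (2, 1, 0), (1, 1, 1), (3, 0, 2)]]
spots += [spot_for(hkl, minor_cell, EDGE)
          for hkl in [(1, 0, 1), (0, 1, 2), (2, 1, 0), (1, 2, 1)]]
spots.append((0.0123, 0.0077, 0.0031))  # fits neither lattice

per_cell, leftovers = assign_lattices(spots, [main_cell, minor_cell])
```

Spots left in `leftovers` after all rounds correspond to the pool that, in the text above, is examined for possible ice rings.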
Data processing is performed using the best orientation obtained for the most highly populated lattice (see Fig. 7), but the user can also select any of the minor lattices for integration. However, with the integration programs currently implemented in autoPROC, spots that overlap between the lattices may still be integrated wrongly, and the parameter refinement may switch between lattices for specific crystal orientations (where the lattices are not separated on the data images). Further developments will aim to address the problem of integrating and processing overlapped spots in the presence of multiple lattices.
Figure 7 Determining separate orientation matrices for different lattices in 1vk2 (JCSG, 2006): (a) predictions for the main lattice (fulls, blue; partials, yellow; too wide in ϕ, green); (b) diffraction image without predictions; (c) minor lattice ...
8.4. Consistent indexing
In all cases where exact consistency of indexing is required between the individual sweeps of a multi-sweep data set in which the action of a goniostat has been involved, autoPROC uses an auxiliary program KAPPAROT to calculate the motions of general goniostats (Kappa and Eulerian) as well as those of 2θ arms if applicable. Instrument definitions are flexible and follow simple rules regarding right-handed coordinate systems and axis rotations (Fig. 8).
Based on this description (see the example in Fig. 9), the well defined relation between separate sweeps is maintained and the resulting orientation matrices will be correctly related through the known goniostat motions, provided the complete set of required goniostat angles is written into each image header.
Figure 9 Defining goniostat axes. The so-called Cambridge reference frame follows the definition of MOSFLM (Leslie, 1992).
This is achieved by using a general treatment of multi-axis goniometry and detector geometry first proposed by Thomas (1986) and used in the EEC Workshop on Position-Sensitive Detector Software (Bricogne, 1986) to convert the initial version of the MADNES program, originally written for the Nonius FAST detector (Messerschmidt & Pflugrath, 1987), into an instrument-independent package (Pflugrath, 1997). The same treatment was subsequently implemented in d*TREK and extended by Paciorek et al.
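How known goniostat motions relate the orientations of two sweeps can be sketched with plain rotation matrices. This is not the KAPPAROT code: the axis directions below describe a hypothetical Kappa goniostat (omega and phi along x at datum, kappa axis inclined by 50° to omega), the axis ordering from outermost to innermost is an assumption of this example, and all function names are illustrative.

```python
import math


def rotation(axis, angle_deg):
    """Right-handed rotation matrix about axis by angle_deg (Rodrigues formula)."""
    norm = math.sqrt(sum(v * v for v in axis))
    x, y, z = (v / norm for v in axis)
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    t = 1.0 - c
    return [[c + x * x * t, x * y * t - z * s, x * z * t + y * s],
            [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
            [z * x * t - y * s, z * y * t + x * s, c + z * z * t]]


def matmul(a, b):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    """Transpose of a 3 x 3 matrix."""
    return [list(col) for col in zip(*m)]


def goniostat_rotation(axes, angles_deg):
    """Combined rotation; axes are datum directions ordered from outermost to innermost."""
    total = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for axis, angle in zip(axes, angles_deg):
        total = matmul(total, rotation(axis, angle))
    return total


def relate_sweeps(axes, angles_1, angles_2):
    """Rotation carrying crystal-fixed vectors from the setting of sweep 1 to that of sweep 2."""
    g1 = goniostat_rotation(axes, angles_1)
    g2 = goniostat_rotation(axes, angles_2)
    return matmul(g2, transpose(g1))  # the inverse of a rotation is its transpose


# Hypothetical instrument definition: (omega, kappa, phi) axis directions at datum.
ALPHA = math.radians(50.0)
KAPPA_AXES = [(1.0, 0.0, 0.0), (math.cos(ALPHA), 0.0, math.sin(ALPHA)), (1.0, 0.0, 0.0)]

quarter_turn = relate_sweeps(KAPPA_AXES, (0.0, 0.0, 0.0), (90.0, 0.0, 0.0))
moved = [sum(quarter_turn[i][j] * v for j, v in enumerate((0.0, 1.0, 0.0))) for i in range(3)]
same = relate_sweeps(KAPPA_AXES, (10.0, 40.0, 30.0), (10.0, 40.0, 30.0))
```

Under these assumptions, applying the matrix returned by `relate_sweeps` to the laboratory-frame orientation of the first sweep gives the orientation expected for the second, which is why every goniostat angle must be available in each image header.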
To check the results obtained during data processing, autoPROC converts the XDS orientation information into a form suitable for use with MOSFLM (as seen in Fig. 10). This allows visual inspection of the predictions made on the basis of the current orientation matrix, unit-cell parameters, mosaicity etc.
Figure 10 Visualizing XDS results with MOSFLM: the orientation matrix from XDS is transformed by autoPROC into MOSFLM format, together with distance, beam centre and mosaicity. The resulting descriptions can directly be loaded into MOSFLM, where interactive tools ...
To keep the amount of information given to the user to a minimum, only the most important results (indexing solution, space-group determination, merging statistics, automatic determination of high-resolution limit) are reported, together with some notes and warning messages. Several statistics as well as refined parameters are given either as a function of resolution or as a function of image number. The former allow decisions to be made regarding appropriate resolution cutoffs, whereas the latter can show events or trends during rotation of the crystal (see, for example, Fig. 11).
Figure 11 Scale factor based on background scatter versus image number from XDS. These plots are generated automatically by autoPROC. (a) shows a typical example of the different scattering power of a crystal during a full 180° rotation; (b) shows an event ...
The current version of autoPROC is available free of charge to academics, who should go to http://www.globalphasing.com/autoproc/ for further details. Questions about autoPROC should be sent to proc-develop/at/globalphasing.com.