Fig. 1 shows a simulated X-ray snapshot diffraction pattern from a nanocrystal of Photosystem I (PSI, space group *P*6_{3}, *a* = *b* = 281 Å, *c* = 165 Å) at 1.8 keV, to indicate the idealized features of a typical nanocrystal diffraction pattern [see Kirian *et al.* (2010) for details]. This is a fully spatially coherent simulation of a randomly oriented parallelepiped crystal of 17 × 17 × 30 unit cells (~0.5 µm in size), with a spectral width of 0.1% and beam divergence of 1.5 mrad. The diffracted intensity is given by the intersection of the Ewald sphere with the Fourier transform of the entire crystal, and in this case the crystals are small enough to produce a series of subsidiary intensity maxima surrounding each reciprocal-lattice point. While a single diffraction pattern contains only partially integrated reflection intensities, we describe below how structure factors may be extracted by averaging the intensities from many such partial reflections collected from crystals which vary in size, shape and orientation. Previous simulations (Kirian *et al.*, 2010) showed that, for a 10% size variation, this procedure can produce highly accurate structure factors with as few as some tens of thousands of nanocrystals in the absence of beam divergence, spectral width and crystal mosaicity. We term this Monte Carlo merging because we assume that, in the absence of a goniometer, a sufficient number of randomly oriented crystallites will sample all possible crystal orientations, sizes and shapes approximately uniformly. This assumption is supported by our previous simulations (Kirian *et al.*, 2010). Non-uniform orientation distributions are discussed in §7.
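As a quick check of the simulation parameters quoted above, the following sketch converts the photon energy to a wavelength and the unit-cell counts to crystal edge lengths. The function and constant names are illustrative only and are not taken from the simulation code.

```python
# Illustrative helpers (hypothetical names) for the parameters quoted in the text.
HC_KEV_ANGSTROM = 12.398419843  # h*c in keV * Angstrom


def wavelength_angstrom(energy_kev):
    """Photon wavelength in Angstrom for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def crystal_edges_um(n_cells, cell_lengths_angstrom):
    """Edge lengths (micrometres) of a parallelepiped crystal of n_cells unit cells."""
    # 1 Angstrom = 1e-4 micrometres
    return tuple(n * a * 1e-4 for n, a in zip(n_cells, cell_lengths_angstrom))


if __name__ == "__main__":
    print(wavelength_angstrom(1.8))                            # wavelength at 1.8 keV
    print(crystal_edges_um((17, 17, 30), (281.0, 281.0, 165.0)))  # edges of the PSI crystal
```

Both results are consistent with the quoted size of roughly 0.5 µm along each edge and a wavelength of about 6.9 Å at 1.8 keV.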

If we assume monochromatic plane-wave radiation with incident wavevector **k**_{i} (|**k**_{i}| = 1/λ), the diffracted photon flux *I*_{n} (photons per pulse per pixel) at scattering vector Δ**k** = **k**_{i} − **k**_{o} from the *n*th randomly oriented finite crystal is given in the kinematic theory as

$$I_n(\Delta\mathbf{k}) = J_o\, r_e^{2}\, P\, \Delta\Omega\, |F(\Delta\mathbf{k})|^{2}\, |S_n(\Delta\mathbf{k})|^{2}, \tag{1}$$

where *F*(Δ**k**) is the transform of the average unit cell, *S*_{n}(Δ**k**) is the transform of the truncated crystal lattice (an interference function, similar to a three-dimensional sinc function laid down at reciprocal-lattice points), *J*_{o} is the average incident photon flux density (photons per pulse per area), *r*_{e} is the classical radius of the electron, *P* is a polarization factor and ΔΩ is the solid angle subtended by the detector pixel. The vector Δ**k** is defined in the crystal reference frame. We assume that absorption effects are negligible for our crystals of micron dimensions, although an absorption correction may be needed for the surrounding liquid in which the crystals are suspended.
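A minimal sketch of this kinematic expression follows, assuming the intensity is the product of *J*_{o}, *r*_{e}^{2}, *P*, ΔΩ, |*F*|^{2} and |*S*_{n}|^{2}, and assuming a parallelepiped crystal so that the lattice transform has the standard closed form of a product of sin(π*Nx*)/sin(π*x*) terms in fractional reciprocal coordinates. The function names and the choice of fractional coordinates are illustrative, not part of the original analysis.

```python
import numpy as np

R_E = 2.8179403e-15  # classical electron radius in metres


def lattice_transform_sq(frac, n_cells):
    """|S_n|^2 for a parallelepiped crystal (assumed shape).

    frac    : fractional reciprocal coordinates (h, k, l); may be non-integer
    n_cells : number of unit cells along each lattice direction
    """
    out = 1.0
    for x, n in zip(frac, n_cells):
        den = np.sin(np.pi * x)
        if abs(den) < 1e-12:
            s = float(n)  # limiting value at a reciprocal-lattice point
        else:
            s = np.sin(np.pi * n * x) / den
        out *= s ** 2
    return out


def pixel_intensity(f_sq, s_sq, j_o, pol, d_omega):
    """Photons per pulse per pixel in the kinematic approximation."""
    return j_o * R_E ** 2 * pol * d_omega * f_sq * s_sq


if __name__ == "__main__":
    n = (17, 17, 30)
    # Exactly on a lattice point: |S_n|^2 equals the squared number of unit cells.
    print(lattice_transform_sq((1, 2, 3), n), (17 * 17 * 30) ** 2)
    # Slightly off the lattice point the value falls rapidly.
    print(lattice_transform_sq((1.02, 2.0, 3.0), n))
```

Units are left to the caller: with *J*_{o} in photons per pulse per square metre and |*F*|^{2} in electrons squared, the result is in photons per pulse per pixel.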

For a given detector pixel, the observed Δ**k** may be determined from the geometry of the detector and the crystal orientation. The average molecular transform *F*(Δ**k**) is defined here to be identical for all of the nanocrystals, but the lattice transform *S*_{n}(Δ**k**) depends on the size and shape of the crystal, and may differ significantly from one crystal to the next. However, we assume that the lattice transform always obeys the translational symmetry *S*_{n}(Δ**k**) = *S*_{n}(Δ**k** + **g**_{hkl}), where **g**_{hkl} is any reciprocal-lattice vector with Miller indices *hkl*. For a perfect crystal, *S*_{n}(**g**_{hkl}) is equal to the number of unit cells in the *n*th crystal, and *I*_{n}(**g**_{hkl}) is therefore proportional to the square of the number of unit cells. The integrated lattice transform is proportional to the square root of the number of unit cells, and the integrated reflection intensity is proportional to the number of unit cells (Holton & Frankel, 2010).
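The peak and integrated-intensity scaling can be checked numerically. The sketch below is a one-dimensional illustration that assumes the closed-form lattice transform of a crystal with *N* identical unit cells; it compares the peak value of |*S*_{n}|^{2} with its integral over one reciprocal-lattice period. The function names and the grid size are arbitrary choices made for this example.

```python
import numpy as np


def s_sq_1d(x, n):
    """|S|^2 for a 1D crystal of n unit cells; x is in fractional reciprocal units."""
    x = np.asarray(x, dtype=float)
    num = np.sin(np.pi * n * x)
    den = np.sin(np.pi * x)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = num / den
    # At lattice points (den == 0) use the limiting value n^2.
    return np.where(np.abs(den) > 1e-12, ratio ** 2, float(n) ** 2)


def integrated_s_sq_1d(n, samples=4096):
    """Integral of |S|^2 over one reciprocal-lattice period.

    |S|^2 is periodic with period 1, so the mean over a uniform grid
    covering one period is an accurate estimate of the integral.
    """
    x = np.arange(samples) / samples - 0.5
    return s_sq_1d(x, n).mean()


if __name__ == "__main__":
    for n in (10, 20):
        # Peak value grows as n^2, the integral grows as n.
        print(n, float(s_sq_1d(0.0, n)), integrated_s_sq_1d(n))
```

Doubling the number of unit cells quadruples the peak of |*S*_{n}|^{2} but only doubles its integral, which is the one-dimensional analogue of the integrated reflection intensity scaling linearly with the number of unit cells.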

The structure factors *F*_{hkl} we would like to extract from diffraction data are ideally equal to the unit-cell transform evaluated at a reciprocal-lattice point **g**_{hkl}. Since the probability of observing diffraction precisely at **g**_{hkl} is essentially zero, we instead average intensities that fall within a small sphere centered at the lattice point. We define an integration domain radius δ, such that all intensities for which |Δ**k** − **g**_{hkl}| < δ will be included in the average. After merging the indexed diffraction data in the three-dimensional diffraction volume from crystals that differ in size, shape and orientation, we may then form the average intensity

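$$\langle I_n(\Delta\mathbf{k})\rangle_{hkl,\delta} = J_o\, r_e^{2}\, P\, \Delta\Omega\, \langle |F(\Delta\mathbf{k})|^{2}\, |S_n(\Delta\mathbf{k})|^{2}\rangle_{hkl,\delta}, \tag{2}$$

where ⟨…⟩_{hkl,δ} is understood to mean that we average over the distribution of crystal shapes and sizes, but only include intensity measurements for which |Δ**k** − **g**_{hkl}| < δ is satisfied. If δ is sufficiently small we may write the approximation

$$\langle |F(\Delta\mathbf{k})|^{2}\, |S_n(\Delta\mathbf{k})|^{2}\rangle_{hkl,\delta} \simeq |F_{hkl}|^{2}\, \langle |S_n(\Delta\mathbf{k})|^{2}\rangle_{hkl,\delta}, \tag{3}$$

and the structure-factor magnitudes may be evaluated as

$$|F_{hkl}|^{2} \simeq \frac{\langle I_n(\Delta\mathbf{k})\rangle_{hkl,\delta}}{J_o\, r_e^{2}\, P\, \Delta\Omega\, \langle |S_n(\Delta\mathbf{k})|^{2}\rangle_{hkl,\delta}}. \tag{4}$$

We assume a well defined distribution of crystal shapes and sizes, so that there exists a mean value of ⟨|*S*_{n}(Δ**k**)|^{2}⟩_{hkl,δ}. Since any lattice transform is identical when translated by a reciprocal-lattice vector **g**_{hkl}, ⟨|*S*_{n}(Δ**k**)|^{2}⟩_{hkl,δ} is a constant that does not depend on the specific Miller indices *hkl*: an identical shape transform is laid down around every reciprocal-lattice point. We may therefore extract a quantity proportional to structure factors without any knowledge of the crystal size and shape distribution since

$$|F_{hkl}|^{2} \propto \frac{\langle I_n(\Delta\mathbf{k})\rangle_{hkl,\delta}}{P\, \Delta\Omega}. \tag{5}$$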

Accurate results can be expected from equation (5), provided that a reasonable integration domain radius δ is chosen, and that we measure a sufficient number of diffraction patterns to sample the various crystal shapes, sizes and orientations. The value of δ should be chosen to be smaller than the features of the unit-cell transform, which, according to the Shannon sampling theorem, corresponds to approximately δ < 1/(2*d*) for the largest cell constant *d*. Since intensities are averaged in a Monte Carlo fashion, relying on chance to provide all needed crystal orientations, shapes and sizes, the error in a measured mean intensity *I* is the standard error of the mean σ(*I*)/*N*^{1/2}, where *N* is the number of measurements (pixels) contributing to the particular intensity, and σ(*I*) is the standard deviation in the intensity. An exceedingly small δ will drive down the value of *N*, while an oversized δ may increase the variance in intensities or sample unwanted background counts. The distribution of crystal sizes will have a particularly significant effect on the value of σ(*I*), and a narrow size distribution is clearly preferable, unless the data are scaled according to crystal size prior to merging of intensities. Optimization of δ will depend on beam divergence, spectral width, crystal disorder, mosaicity and so on, all of which have been neglected in our simplified model above. Errors introduced during data processing should also be considered, as discussed in §7.
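The merging procedure and its error estimate can be illustrated with a one-dimensional toy model. This is a sketch under stated assumptions, not the processing used for real data: the unit-cell transform `f_sq` is a hypothetical smooth function, crystal sizes are drawn uniformly from 15 to 19 unit cells, each measurement falls at a uniformly random offset within δ of a lattice point, and beam divergence, spectral width, mosaicity and background are ignored. Reciprocal coordinates are in units of 1/*d*, so the Shannon bound δ < 1/(2*d*) corresponds to δ < 0.5 here.

```python
import numpy as np


def f_sq(q):
    """Hypothetical smooth |F(q)|^2 (toy unit-cell transform)."""
    return 2.0 + np.cos(0.9 * q)


def s_sq(q, n_cells):
    """|S_n(q)|^2 for a 1D crystal with n_cells unit cells."""
    num = np.sin(np.pi * n_cells * q)
    den = np.sin(np.pi * q)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = num / den
    return np.where(np.abs(den) > 1e-12, ratio ** 2,
                    np.asarray(n_cells, dtype=float) ** 2)


def max_delta(largest_cell_constant):
    """Approximate Shannon bound on the integration radius, 1/(2d)."""
    return 1.0 / (2.0 * largest_cell_constant)


def merge(hs, delta, n_meas, rng):
    """Average partial intensities within delta of each lattice point h."""
    sizes = rng.integers(15, 20, size=(n_meas, 1))        # 15..19 unit cells
    offsets = rng.uniform(-delta, delta, size=(n_meas, hs.size))
    q = hs + offsets
    intensities = f_sq(q) * s_sq(q, sizes)                # constant prefactors omitted
    mean = intensities.mean(axis=0)
    # Standard error of the mean, sigma(I) / sqrt(N), using the sample std (ddof=1).
    sem = intensities.std(axis=0, ddof=1) / np.sqrt(n_meas)
    return mean, sem


hs = np.arange(1, 6)
mean, sem = merge(hs, delta=0.05, n_meas=20000, rng=np.random.default_rng(0))
ratio = mean / f_sq(hs)  # should be nearly the same constant for every h

if __name__ == "__main__":
    print("merged / true, normalized:", ratio / ratio.mean())
    print("relative standard error  :", sem / mean)
    print("delta bound for d = 281 A:", max_delta(281.0), "1/Angstrom")
```

In this toy model the merged averages divided by the true |*F*|^{2} values agree across reflections to within the statistical error, so relative structure-factor magnitudes are recovered without using the size distribution. Widening the size range or reducing `n_meas` increases the relative standard error, in line with the discussion of σ(*I*) and *N* above.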