Following on from the success of the previous crystal structure prediction blind tests (CSP1999, CSP2001, CSP2004 and CSP2007), a fifth such collaborative project (CSP2010) was organized at the Cambridge Crystallographic Data Centre. A range of methodologies was used by the participating groups in order to evaluate the ability of the current computational methods to predict the crystal structures of the six organic molecules chosen as targets for this blind test. The first four targets, two rigid molecules, one semi-flexible molecule and a 1:1 salt, matched the criteria for the targets from CSP2007, while the last two targets belonged to two new challenging categories – a larger, much more flexible molecule and a hydrate with more than one polymorph. Each group submitted three predictions for each target it attempted. There was at least one successful prediction for each target, and two groups were able to successfully predict the structure of the large flexible molecule as their first place submission. The results show that while not as many groups successfully predicted the structures of the three smallest molecules as in CSP2007, there is now evidence that methodologies such as dispersion-corrected density functional theory (DFT-D) are able to reliably do so. The results also highlight the many challenges posed by more complex systems and show that there are still issues to be overcome.
This paper reports on the results of the fifth blind test of crystal structure prediction (CSP), an international test hosted periodically by the Cambridge Crystallographic Data Centre (CCDC). We refer to this fifth blind test as CSP2010.
Over the last several decades there has been much research in the field of crystal structure prediction. The grand aim is to develop the ability to reliably predict, by computational methods, how a molecule will crystallize in the solid state, with only the chemical diagram and the crystallization conditions known. This would allow for the prediction of solid-state properties before the molecule or molecules in question had even been synthesized, and could also help determine the likelihood that different polymorphic forms, or as yet unseen polymorphs of currently known structures, exist. This application is of particular importance in the pharmaceutical industry where the presence of different polymorphs can lead to very different and potentially undesirable physical properties of new drugs.
For the last decade the CCDC has held periodic blind tests to assess the current reliability and capabilities of the techniques available in the field. Four blind tests, starting in 1999 and every 2 or 3 years thereafter, have previously been held. Each has required the identification of a set of molecules with known but previously unpublished crystal structures to use as targets for the participants to predict using the various techniques they have developed. This approach is similar to that adopted to monitor and test advances in other areas of predictive modelling, such as protein structure prediction (Moult et al., 2007 ). Recently there has also been a blind test for search methods for the crystal structure prediction of purely inorganic systems (Oganov, 2010 ). Repeating the blind test periodically helps to evaluate advances that have been made in methodologies since the last test, as well as establish the reliability of the techniques which have been successful in previous tests for a given category of target; the small number of targets in any one blind test introduces the possibility of a slightly easier or harder molecule (whose difficulty cannot be easily judged prior to commencement of the test) influencing the results.
This fifth blind test was therefore held to assess the reproducibility of the good results (Neumann et al., 2008 ; Day et al., 2009 ) from the previous blind test, CSP2007, and also to assess the developments in methodologies when applied to more challenging targets than the relatively simple rigid molecules mostly studied thus far. These additional targets better represent cases that would be more likely to be encountered in the pharmaceutical industry.
The organization for this latest blind test, CSP2010, was similar to that used for the previous four evaluations of the field, the results of which have been previously published: CSP1999 (Lommerse et al., 2000 ), CSP2001 (Motherwell et al., 2002 ), CSP2004 (Day et al., 2005 ) and CSP2007 (Day et al., 2009 ). Invitations to participate were sent to 24 research groups known to be active in the field. The test was also advertised through various websites and meetings.
The previous blind test put forward targets for prediction in the following four categories:
These four categories were left the same as those used in CSP2007 so as to facilitate comparison of results. In addition, it was decided to add two new categories that would provide greater challenges:
The new fifth category presents a much greater challenge in terms of flexibility than previously encountered in earlier blind tests, with a large flexible molecule intended to represent those often associated with modern pharmaceuticals. The new sixth category gives an opportunity to study the challenging effects of polymorphism by introducing a molecule for which more than one polymorph is known.
Crystallographers were contacted in August 2009 with a request for unpublished crystal structures that matched one or more of the six categories for the fifth blind test. Crystal structures were collected at the CCDC and assessed for possible inclusion in one of the six categories. To be suitable, a crystal structure had to be of high quality and have all atoms located with no disorder. The crystal structure had to be unpublished and the donor crystallographer had to agree to postpone any publication for the duration of the blind test. Collection of suitable candidates for all six categories proved exceptionally difficult, especially for category 1, where the target molecule is very small with a very restricted set of constituent elements, and also for category 6 where few suitable candidates were available that were not of sufficient interest to be withheld from publication for the duration of this test. Almost 30 submitted crystal structures had to be rejected, either because they did not conform to any of the six categories or because of refinement issues such as disorder.
After considerable effort, one candidate was collected for category 1, four for category 2, eight for category 3, three for category 4, three for category 5 and one for category 6. For those categories where there was more than one candidate, the final target choice was made randomly.
For category 6, the one candidate that was submitted was gallic acid monohydrate, for which two new polymorphs had been found. These complemented the two previously published polymorphs for gallic acid monohydrate, which are located in the Cambridge Structural Database (CSD; Allen, 2002 ) under the KONTIQ CSD reference code family. For the purposes of this blind test, these known forms are referred to as forms (1) and (2). Of the two new forms submitted as candidates for prediction, one [form (4), as recently published by Clarke et al., 2011 ] had one formula unit in the asymmetric unit (i.e. one gallic acid and one water molecule). The other, form (3), was originally solved with two formula units per asymmetric unit. However, analysis after the blind test submissions showed that this solution contained a disordered hydrogen-bonding network and the crystal structure could also be described with an ordered hydrogen-bonding network by doubling the unit cell, as now published (Clarke et al., 2011 ). For the purposes of this blind test, form (3) was therefore deemed inappropriate as a target crystal structure. The main aim for this category, then, was to predict form (4), whose structure has been recently independently published (Demirtaş et al., 2011 ) and see where (if at all) forms (1) and (2) appeared in the ranked list of predictions.
The molecular diagrams and crystallization conditions were sent by e-mail to 15 participant groups on 16 November 2009. Immediately after circulation of the target crystal structures we were made aware that the crystal structure of the molecule selected for category 1 (4-ethynylbenzonitrile) had been solved, was undergoing publication and so would soon be in the public domain. The decision was therefore made to remove this candidate for category 1 and attempt to locate a suitable replacement. Thankfully a suitable candidate was quickly provided and the revised list of target molecules, as detailed in Table 1 , was distributed to participants on 23 November 2009. Following the numbering used in the previous blind tests we refer to these molecules by the Roman numerals (XVI)–(XXI).
The format of this blind test was kept broadly the same as the last blind test, with the exception that a greater length of time was allowed before submission of results. Participants were requested to forward their three ‘official’ predictions for each target molecule to the CCDC, where the experimentally determined crystal structures were held for the duration of the test. As well as these three main predictions, participants were urged to submit an extended list of the crystal structures they generated in order to help post-analysis and to provide insight into the performance of the various methods. The deadline for submissions was 20 August 2010. The experimentally determined crystal structures for all six categories were then circulated to all participants on 23 August 2010 to allow post-analysis of their predictions. Lastly, a workshop was held at the CCDC mid-September 2010 to discuss the results.
We present here results from the 14 participating groups that agreed to publish their results. Details of these 14 participating groups, together with a summary of which targets they attempted and if a match with the experimental structure was observed in their submission, are presented in Table 2 (a).
The methodologies of the participating research groups vary significantly. A summary of the techniques used by each of the groups is presented in Table 2(b), together with key references for most of the methods used. More detailed descriptions are also provided in the associated supplementary material.
Each of the methods employed involved three general steps:
There are two main approaches that can be used for treating the molecular structure during crystal structure prediction. Firstly, the molecule can be treated as rigid throughout the calculations, assuming that the packing forces are too small to significantly distort the molecular geometry. In this case the method used to determine the rigid molecular structure is vitally important, as the effect of the molecular structure on crystal energy calculations can be large (Beyer & Price, 2000 ).
Alternatively, the structure can be considered as flexible with intramolecular bond stretching, angle bending and torsional terms allowed to vary during the search as well as the final energy minimizations. For extremely flexible molecules such as target (XX) the conformational distributions can be reduced to a more manageable level via methods such as analysis of conformational preferences using software such as Mogul (Bruno et al., 2004 ).
There are many diverse methods for generating crystal packing arrangements in order to achieve a variety of plausible packing arrangements. Most participants in this blind test opted to generate large numbers of crystal structures with random or quasi-random variables such as unit-cell parameters and positions and orientations of the molecules. Several groups also elected to use a low-discrepancy Sobol’ sequence (Sobol’, 1967 ; Press et al., 1992 ). This helps ensure a more uniform and thus efficient sampling and avoids the problems of gaps and clusters that purely random sampling can exhibit. Other groups used Monte Carlo types of search, genetic algorithms, grid-based systematic searches or first-principles ab initio random structure searching which allows the possibility of a change in covalent bonding (Pickard & Needs, 2006 , 2011 ).
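The benefit of low-discrepancy sampling over purely random sampling can be illustrated with a minimal sketch. The groups mentioned above used a Sobol' sequence; the code below substitutes the simpler Halton sequence, a different low-discrepancy construction, purely for illustration. The function names and the bounds on the unit-cell parameters are hypothetical and are not taken from any participant's method.

```python
# Illustrative only: Halton sequence as a stand-in for the Sobol' sequence.
PRIMES = (2, 3, 5, 7, 11, 13)  # one prime base per sampled variable


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index: int, dims: int) -> tuple:
    """Return the `index`-th Halton point in the unit hypercube [0, 1)^dims."""
    return tuple(radical_inverse(index, PRIMES[d]) for d in range(dims))


def scale(point, bounds):
    """Map a unit-hypercube point onto (low, high) ranges for each variable."""
    return tuple(low + u * (high - low) for u, (low, high) in zip(point, bounds))


# Hypothetical ranges: cell lengths a, b, c in angstrom; angles in degrees.
CELL_BOUNDS = [(5.0, 30.0)] * 3 + [(60.0, 120.0)] * 3

if __name__ == "__main__":
    for i in range(1, 5):  # index 0 is skipped because it is the origin
        print(scale(halton_point(i, 6), CELL_BOUNDS))
```

Successive points fill the gaps left by earlier ones, which is the property that makes such sequences attractive for generating trial unit cells and molecular placements.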
For the majority of these methods, space-group symmetry is used. These methods search each space group and Z′ separately and so in order to help reduce the computing time required, many groups chose to restrict their search to only the most commonly adopted space groups. This blind test saw two groups electing to search all 230 space groups for some or all of their predictions. Other groups used the alternative approach of generating P1 crystal structures with varying numbers of independent molecules (up to 8) in the unit cell. Space-group symmetry was then identified in the resulting crystal structures, after energy minimization, using packages such as PLATON (Spek, 2009 ).
The final ranking of the crystal structures is still almost exclusively based on the calculated lattice energies of the structures generated by the crystal structure search. Often tens, if not hundreds, of possible structures can exist within a few kJ mol−1 of the calculated global minimum (Day et al., 2004) and therefore extreme accuracy is needed. Successful approaches to calculating these lattice energies include the DFT-D method, which can give more accurate lattice energies (Neumann & Perrin, 2005), and re-minimization of the structures with more sophisticated force fields, such as those using distributed multipoles (Stone, 2005) and additional flexibility (Kazantsev, Karamertzanis, Adjiman & Pantelides, 2011; Day & Cooper, 2010; Görbitz et al., 2010). Moreover, additional or alternative criteria may be used to discriminate between likely and unlikely crystal structures. Such approaches include lattice dynamic contributions (van Eijck, 2001; Anghel et al., 2002) or comparisons to known crystal structures in the CSD (Dey et al., 2006), exploiting any isostructurality relationships (Asmadi et al., 2010a,b).
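The energy-based ranking described here can be sketched in a few lines. This is a minimal illustration, assuming each generated structure already has a calculated lattice energy in kJ mol−1; the function names, structure labels and energies are invented for the example and do not come from any submission.

```python
def rank_by_lattice_energy(energies):
    """Rank structures by lattice energy (lowest first).

    `energies` maps a structure label to its lattice energy in kJ/mol.
    Returns a list of (rank, label, delta_e), where delta_e is the energy
    above the global minimum of the list.
    """
    e_min = min(energies.values())
    ordered = sorted(energies.items(), key=lambda item: item[1])
    return [(rank, label, energy - e_min)
            for rank, (label, energy) in enumerate(ordered, start=1)]


def count_within(ranked, window):
    """Count structures whose delta_e lies within `window` kJ/mol of the minimum."""
    return sum(1 for _, _, delta_e in ranked if delta_e <= window)


if __name__ == "__main__":
    # Hypothetical lattice energies, for illustration only.
    example = {"A": -101.2, "B": -100.9, "C": -98.0, "D": -101.0}
    ranked = rank_by_lattice_energy(example)
    for rank, label, delta_e in ranked:
        print(f"{rank}  {label}  {delta_e:.1f} kJ/mol above minimum")
    print("structures within 1 kJ/mol:", count_within(ranked, 1.0))
```

When many structures fall inside a window of a few kJ mol−1, small errors in the energy model are enough to reorder the list, which is why the accuracy of the lattice energies matters so much.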
This paper is accompanied by a large amount of supplementary material: the coordinates of the experimental crystal structures, lists of predicted crystal structures by each participant, as well as detailed descriptions of methodology, results and post-analysis by most of the participating research groups. Before discussing the results of the predictions, the crystal packings in the X-ray determined crystal structures of the six categories are described.
2-Diazo-3,5-cyclohexadiene-1-one (C6H4N2O) was chosen as the blind test target for category 1 after the initial target, 4-ethynylbenzonitrile, was found to have been previously solved. Molecule (XVI) was crystallized by slow evaporation from ethanol and the crystal structure was solved from X-ray diffraction data collected at 174 K (Britton, 2010). The molecule crystallizes with Z′ = 1 in the orthorhombic space group Pbca. The crystal packing shows diazide-carbonyl and C—H⋯O= interactions (Fig. 1).
1,2-Dichloro-4,5-dinitrobenzene (C6H2Cl2N2O4) was chosen as the blind test target for category 2, although it deviates somewhat from the criteria for this category as the molecule is not truly rigid; the nitro groups allow for some degree of rotational freedom. Crystals were obtained by slow evaporation of methanol and X-ray diffraction data were collected at 174 K (Britton, 2010 ). The molecule crystallizes in the monoclinic space group P21/c with Z′ = 1 (Fig. 2 ).
(1-((4-Chlorophenyl)sulfonyl)-2-oxo-propylidene)diazenium (C9H7ClN2O3S) was the target for category 3. Molecule (XVIII) was crystallized by slow evaporation from ethyl acetate (EtOAc) and the crystal structure was solved from X-ray diffraction data collected at 150 K (Blake, 2010 ). The crystal structure was solved in the orthorhombic space group Pbca with Z′ = 1. The conformational flexibility can be described by three exocyclic torsion angles, as shown in Table 1 . The CN2CO moiety adopts a mostly planar trans configuration (Fig. 3 ).
1,8-Naphthyridinium fumarate (C8H7N2, C4H3O4) was chosen as the target for category 4. This 1:1 salt was formed by slow evaporation from methanol and the crystal structure was solved in the orthorhombic space group Pca21 from data collected at 200 K (MacGillivray, 2010) with Z′ = 1. The packing in this crystal structure is dominated by hydrogen bonds, with linear chains of fumarate and naphthyridinium ions forming alternating connections to these chains (Fig. 4). The crystal structure is isostructural with the entry RABYID in the CSD (Shan et al., 2003) where quinolinium is substituted for 1,8-naphthyridinium (i.e. one nitrogen is replaced by a C—H group).
Benzyl-(4-(4-methyl-5-(p-tolylsulfonyl)-1,3-thiazol-2-yl)phenyl)carbamate (C25H22N2O4S2) was chosen as the target for the new category 5. Molecule (XX) was crystallized by slow evaporation from EtOAc and the crystal structure solved in the monoclinic space group P21/n with Z′ = 1 (Blake, 2010). The conformational flexibility can be described with eight exocyclic torsion angles (Table 1). The molecule adopts an elongated S shape, with the central part of the molecule mostly planar, the greatest deviation from planarity being between the phenyl and thiazole groups with an angle of 13°. The mostly planar mid-section of the molecule forms stacks via a series of weak interactions with C—H and N—H⋯O=S as well as C—H⋯O=C atom–atom contacts (i.e. shorter than the sum of van der Waals radii), as shown in Fig. 5.
Gallic acid monohydrate (C7H6O5·H2O) was chosen as the target for the new category 6. Gallic acid monohydrate had two previously known forms, (1) (Jiang et al., 2000 ) and (2) (Okabe et al., 2001 ). Form (4) of hydrate (XXI) was observed from crystals grown by slow evaporation from methanol in the presence of sarcosine and crystallized in the monoclinic space group P21/c with Z′ = 1 (Clarke et al., 2011 ). The crystal structure is dominated by an extensive hydrogen-bonding network. Unlike forms (1) and (3), no carboxylic acid dimer units are formed, with forms (2) and (4) instead having hydrogen bonds from the carboxylic acid to both water and adjacent gallic acid molecules (Fig. 6 ).
The submitted predictions were compared with each experimentally determined crystal structure using the ‘Crystal Structure Similarity’ feature of the Materials Module of Mercury (Macrae et al., 2008 ). The algorithm used by this feature allows comparison of the molecular packing environment between two or more crystal structures. The reference crystal structure, in this case the experimentally determined crystal structure, is analysed and represented by a reference molecule and a coordination shell of its 14 closest neighbours. This set of distances is then searched for in the predicted crystal structures and if they match to within the default geometric tolerances (distances within 20% and angles within 20°) then the coordination shells are overlaid and a root-mean-squared deviation (RMSD15) of the atomic positions is calculated for all matching molecules. As with previous blind tests, this search was configured to ignore H atoms due to the uncertainty of their positions in X-ray determined crystal structures. If all 15 molecules of the reference and predicted crystal structure matched within the standard tolerances, the crystal structure was determined as having been successfully predicted.
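The final RMSD step of this comparison can be sketched as follows. This is a simplified illustration, not the Mercury algorithm: it assumes the two molecular clusters have already been matched atom-for-atom and overlaid, and it only computes the root-mean-squared deviation over non-H atoms. The function name and the data layout are assumptions made for the example.

```python
import math


def rmsd_non_h(reference, predicted):
    """RMSD (in the units of the input coordinates) over paired non-H atoms.

    Each argument is a list of (element, x, y, z) tuples in the same atom
    order, with the two clusters already overlaid. H atoms are skipped,
    mirroring the choice made for the blind test comparisons.
    """
    if len(reference) != len(predicted):
        raise ValueError("clusters must contain the same number of atoms")
    squared = []
    for (el_ref, *xyz_ref), (el_pred, *xyz_pred) in zip(reference, predicted):
        if el_ref != el_pred:
            raise ValueError("atom order or element types do not match")
        if el_ref == "H":
            continue
        squared.append(sum((a - b) ** 2 for a, b in zip(xyz_ref, xyz_pred)))
    if not squared:
        raise ValueError("no non-H atoms to compare")
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy coordinates in angstrom, invented for illustration.
    ref = [("C", 0.0, 0.0, 0.0), ("O", 1.0, 0.0, 0.0), ("H", 5.0, 5.0, 5.0)]
    pred = [("C", 0.0, 0.0, 0.3), ("O", 1.0, 0.4, 0.0), ("H", 9.0, 9.0, 9.0)]
    print(f"RMSD over non-H atoms: {rmsd_non_h(ref, pred):.3f}")
```

For an RMSD15 value, the input would be the non-H atoms of all 15 molecules in the matched cluster rather than a single molecule.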
For hydrate (XXI) it became apparent that some predictions matched all non-H atoms but not the H-atom positions as located in the target crystal structure. For this molecule we therefore re-ran the crystal structure comparison, but this time elected to include H atoms in the calculation in order to determine if an exact match was present.
Overlays of the X-ray determined crystal structure with some of the predicted structures for targets (XVI)–(XX) can be found in the supplementary material.
All of the participating research groups attempted predictions for molecule (XVI), two of which predicted the observed crystal structure within their three predictions (Table 3). One of these successes (Neumann, Leusen, Kendrick and van de Streek) was submitted as the group’s first prediction, while the other (van Eijck) was submitted as the participant’s second prediction. Both of these successful predictions gave RMSD15 deviations from the experimentally determined crystal structure of less than 0.25 Å.
Outside of the three official predictions, the observed crystal structure was present in the extended lists of five other research groups. The success rates here are comparable to the first three blind tests, while not quite as high as the results observed in the fourth blind test. This may be attributed to some methods having difficulties with many structures close in energy. The very small ΔE in Table 3 , even when the observed structure is found outside of the first three predictions, shows how closely spaced the energies are for this molecule, and the accuracy in lattice energy required for a successful prediction.
Thirteen of the participating research groups attempted predictions for molecule (XVII), two of which predicted the observed crystal structure within their three official predictions (Table 4). As with molecule (XVI), one of these successes (Neumann, Leusen, Kendrick and van de Streek) was submitted as the group’s first prediction, while the other (Price and Habgood) was submitted as that group’s second prediction. Both of these successful predictions gave RMSD15 values of less than 0.13 Å.
Four other research groups submitted the observed crystal structure in their extended list of solutions, with energies between 3.2 and 6.4 kJ mol−1 above their global minimum. The slightly lower rate of success for this category than for the last blind test may be attributed to the fact that molecule (XVII) is not truly rigid, with flexibility in the nitro groups having to be taken into consideration. Despite these additional challenges, the observed crystal structure was still successfully predicted.
Thirteen research groups attempted predictions for the category 3 target, molecule (XVIII), with one group (Neumann, Leusen, Kendrick and van de Streek) successfully predicting the observed crystal structure within their three predictions (Table 5). Once again, this solution was submitted as this group’s first submission, with an RMSD15 from the observed crystal structure of just 0.12 Å.
Three other groups also reported the correct crystal structure in their extended lists of solutions, with one group (Orendt, Grillo, Ferraro and Facelli) close to having a successful prediction as their number 4 structure is a close match to the experimental structure with an RMSD15 value of 0.252 Å.
Eleven participants attempted predictions for the molecular salt (XIX) and two of these predicted the observed crystal structure within the three official predictions (Table 6): van Eijck as the second prediction and Neumann, Leusen, Kendrick and van de Streek as the third prediction, with RMSD15 values of 0.15 and 0.22 Å, respectively. Two other participants located the crystal structure within their extended lists of submissions.
The rate of success in searching for structures with two independent molecules in the asymmetric unit is broadly comparable with that of the last blind test. However, the energetic ranking of the salt structures provided a greater challenge than was experienced with the cocrystal used in 2007. The most successful prediction relied on the use of a supramolecular dimer owing to difficulties with modelling individual ions. Comparison with predictions and the known crystal structure of the similar compound present in CSD entry RABYID also helped to weight some predictions, including the third placed submission made by Neumann, Leusen, Kendrick and van de Streek, which would have been ranked at position 20 by energy alone.
Ten participants attempted predictions for molecule (XX) and two of these predicted the observed crystal structure as their top submission (Day and Cruz-Cabeza; Price, Kazantsev, Karamertzanis, Adjiman and Pantelides). One other group (Neumann, Leusen, Kendrick and van de Streek) also located the observed crystal structure in its extended list of solutions (Table 7 ) at rank 7.
This category was introduced in this blind test as a new challenge and so there are no results from any previous blind tests with which to compare. However, this does appear to be the first case of a molecule of this complexity having been successfully predicted under blind test conditions and then detailed in a refereed publication. The key dependence was on the conformation of the molecule and with eight internal degrees of freedom the problem became one of completeness of the search. One team resolved this by taking into account CSD observations for each of the flexible components to reduce the search to a more manageable size.
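The completeness problem can be made concrete with a small counting sketch. It assumes a systematic grid over the eight torsion angles; the 30° step and the numbers of allowed values per torsion in the restricted case are hypothetical and are not the values used by any participant.

```python
from math import prod


def grid_size(values_per_torsion):
    """Number of starting conformations on a grid with the given number of
    sampled values for each torsion angle."""
    return prod(values_per_torsion)


if __name__ == "__main__":
    # Full grid: eight torsions, each sampled every 30 degrees (12 values).
    full = grid_size([12] * 8)

    # Restricted grid: each torsion limited to a few preferred values, as
    # might follow from database statistics (counts invented for the example).
    restricted = grid_size([2, 3, 2, 2, 3, 2, 2, 2])

    print("full grid:      ", full)
    print("restricted grid:", restricted)
    print("reduction factor:", full // restricted)
```

Each starting conformation would still need a crystal packing search of its own, so a reduction of this kind directly determines whether a search can be run to completion.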
Ten participants attempted predictions for the hydrate (XXI). This category featured the opportunity to find and locate both an unknown polymorph and two polymorphs whose crystal structures had previously been determined. During analysis of the results it became apparent that there is an alternative proton arrangement in the hydrogen-bonding network of form (4) involving the central OH moiety of the acid and the water molecules (see Fig. 7 ). Solutions with both proton conformations were generated by some groups, but no agreement was observed in which form had the lower energy.
In previous blind tests, H-atom placement has been ignored in determining if a participant’s entry matches the target crystal structure, but in this case it was evident that the two groups that submitted a match within their top three submissions (Price and Braun; van Eijck) did so with the p-hydroxy conformation of form (4)alt, not that of the target crystal structure form (4)expt (Fig. 8 ). As the p-hydroxy gallic acid proton shows enlarged displacement parameters, it could be argued that some disorder is present in the structure.
Given this, we present here results for both exact matches including H-atom placement (Table 8 a) and matches for non-H atoms only (Table 8 b). No groups submitted an exact match in their top three solutions. Four groups (Day; van Eijck; Neumann et al.; Price and Braun) had exact matches within their extended lists of submissions. For matches involving only the non-H atoms, two groups located the target crystal structure within their top three solutions (van Eijck; Price and Braun) as their first and third submissions respectively. Both of these groups also located the exact match, but at significantly higher energies of approximately 12 kJ mol−1 above their global minimum. Three other groups (Desiraju et al.; Day; Neumann et al.) also located this crystal structure in their extended lists of submissions.
Tables 8 (c) and (d) show successful matches for the existing polymorphs [forms (1) and (2) in this test]. Six groups located form (1) in their extended lists of submissions, and five groups located form (2). These were generally predicted at high relative energies and rankings, and with no consistency between groups on the stability order between form (1) and (2). This highlights problems in modelling the stability of hydrates.
Table 9 summarizes the approximate computational resources used by some of the participants. Of particular note is the disparity between some of the groups; the range of computational expense seen in CSP2010 varies from a few thousand CPU hours to almost 200 000 CPU hours (which translates to over 22 CPU years). Clearly the resources required for this blind test have increased. A large portion of the total CPU time was devoted to targets (XX) and (XXI), and is therefore clearly dependent upon the complexity of the molecule. Fortunately, the computer systems required to meet this increased need are also now more readily accessible, as shown by several groups reporting increases of computing resource of over an order of magnitude (and sometimes almost two orders of magnitude) over the resources used for their CSP2007 submissions. As computers get progressively faster and with greater numbers of computing cores per processor, the real time required for these computations is decreasing. This makes modern computers more viable for fast prediction of the simpler targets.
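The conversion quoted above, and the effect of parallel hardware on elapsed time, can be checked with a short calculation. The sketch assumes a 365-day year and ideal parallel scaling; the core count in the example is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def cpu_hours_to_years(cpu_hours):
    """Convert CPU hours to CPU years."""
    return cpu_hours / HOURS_PER_YEAR


def wall_clock_days(cpu_hours, cores):
    """Elapsed days if the work is spread perfectly over `cores` cores."""
    return cpu_hours / cores / 24


if __name__ == "__main__":
    total = 200_000  # upper end of the range reported for CSP2010
    print(f"{cpu_hours_to_years(total):.1f} CPU years")
    # Hypothetical cluster size, for illustration only.
    print(f"{wall_clock_days(total, cores=1000):.1f} days on 1000 cores")
```

Real workloads rarely scale perfectly, so the elapsed time from such an estimate should be read as a lower bound.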
The success rate for previous blind tests has shown a fluctuating, but generally upward trend, with particular success shown in the fourth blind test (Day et al., 2009 ). This fifth blind test was designed to see if the successes of the fourth test could be repeated, and also to provide more challenging targets to try to stretch the techniques that have thus far been developed. This test therefore saw the introduction of more flexible molecules, as well as hydrates and salts, significantly increasing the complexity of the challenge.
Success for these tests is a combination of two factors: firstly, the ability to generate all possible crystal structures, and secondly, the ability to evaluate and rank those crystal structures. The search performance can be impacted by methods that are presently unable to search for crystal structures in space groups with higher values of Z, or simply through a lack of computing time and resources. This will lead to an incomplete search, which may cause the correct solution to be missed entirely. For flexible molecules the conformation of the molecule is also of great importance. Failure to use the correct conformation or to allow for flexibility during the search will lead to failure to predict the correct crystal structure, and this problem becomes greater the more flexible the target molecule. Lastly, the crystal structures generated must be ranked, which is often complicated by the fact that most molecules tend to have many distinct crystal packing possibilities within a small energy range (Day et al., 2004), so that the energy differences between crystal structures are generally very small. The identification and use of accurate energy models can often prove to be the most challenging aspect of successful crystal structure prediction. Ranking is further complicated by thermodynamic versus kinetic aspects, i.e. energies alone may not be sufficient; entropies and nucleation kinetics could also be relevant.
Of the groups that participated in the fifth blind test, most attempted solutions for the four targets [(XVI), (XVII), (XVIII) and (XIX)] that matched the criteria of the previous blind test. Overall, the success rates for these four targets were a little lower than for CSP2007, but generally at least as good if not better than the results obtained for CSP1999, CSP2001 and CSP2004. What these results do show, however, is that just as in CSP2007, the method adopted by Neumann, Leusen, Kendrick and van de Streek again excelled, with this group able to successfully predict the crystal structures of the first three categories with their number 1 submission, as well as the fourth category with their number 3 submission. They were the only participants able to generate all target crystal structures within their extended list of submissions. They did so with the lowest RMSD15 values for all except the hydrate crystal structure. This demonstrates the reliability of DFT-D methods to predict the crystal structures of small organic molecules (Asmadi et al., 2009 ; Chan et al., 2011 ). For the fourth category, complete crystal-structure prediction studies were performed for (XIX) and for model compound RABYID from the CSD. The energy landscapes of these two systems were analysed and showed significant similarities. Based on these similarities, it had to be concluded that the experimental structure of (XIX) could be isostructural to the experimental structure of RABYID, and this structure, even though it was ranked 20th by energy (22nd for RABYID), was submitted as the third candidate structure (Kendrick et al., 2011 ).
More complex systems such as salts continue to provide some challenge, perhaps suggesting that a salt should be considered a new, more challenging category than the current ‘cocrystal’ definition of category four.
This test also introduced two new categories that provided much greater challenges to the participants, and 11 of the participating groups attempted at least one of the targets (XX) and (XXI). Particularly encouraging was that two groups (Price et al.; Day et al.) successfully predicted the crystal structure for the largest, most flexible molecule to be included in this series of blind tests. The hydrate target (XXI) proved to be a considerable challenge, even to methods that have been successful for hydrates of o-dihydroxybenzoic acids (Braun et al., 2011), and highlights the many difficulties that such a system can pose to characterization as well as successful structure prediction. However, this is a system that needs to be tackled; water is one of the most complex solvents to model, yet it is also one of the most important.
This blind test has also once again highlighted that the use of generic standard force fields does not lead to good crystal structure prediction results. We have also observed that the more extensive search methods are adequate within the limitations (Z′, no disorder etc.) implicit in the blind test categories, but have to assume the covalent bonding in the chemical diagram and rely on a sufficient number of search structures being refined by the more accurate and expensive model for the lattice energy. Successful prediction of small molecule crystal structures has been shown to require both accuracy of energies and the ability to coordinate the inter- and intramolecular force field contributions. The methods that gave the greatest success were varied and were modified to take on these tougher challenges.
Molecule (XVI), while the simplest of the rigid molecule targets, proved to have many crystal structures close in energy. Transferable empirical potentials had difficulty coping with diazide–carbonyl interactions with induction being a problem. Simple point-charge models used by some groups failed completely for this molecule, although van Eijck did find in post-analysis that one set of charges predicted the observed structure.
This system is the first to be tackled by an ab initio random search method (Misquitta, Pickard and Needs) which does not fix the chemical bonding and uses electronic structure methods during the search, although this approach results in a significant increase in the computing resources required when compared with the methods employed by the other participating groups. This method failed as the search was not extended to eight formula units in the cell. Many of the numerous minima, including the global minimum, corresponded to an isomer with the formation of a bond to give a heterocyclic ring with the two N atoms, showing the promise of this method for cases, such as tautomers, where the covalent bonding is uncertain.
Molecule (XVII) was perhaps not well selected as a target for its category as the molecule was not truly rigid; the orientation of the nitro groups may be affected by intermolecular interactions in the crystal structure. As a result participants were forced to first consider how to deal with this flexibility. The electrostatic potentials of the nitro groups also proved to be unusually challenging to model successfully, although the dispersion proved to be a very important contribution to the lattice energy. These additional issues led to molecule (XVII) being a significantly more difficult problem than previous targets in this category. Despite these extra challenges, the success rate for this category was good compared with previous blind tests.
For molecule (XVIII) flexibility proved to be the key to successfully locating the crystal structure in the search. Some searches missed the crystal structure, with the most fundamental reason being the wrong conformation of the C(N2)C(O) bond. The relative energies of the cis and trans configurations were sufficiently sensitive to the methods being used to cause mis-assignment.
For the salt (XIX), there were again some difficulties with flexibility in the relative orientation of the two fragments; for the acid there is considerable conformational flexibility, and the calculated stability of the conformers differs between the gas phase and the solid.
All groups encountered significant problems with developing suitable methods of evaluating the relative lattice energies of structures containing the different conformers. Plane-wave ab initio methods do not cope well with isolated ions in a vacuum, causing problems with ion-specific reference data calculated with DFT-D methods for force-field parameterization. Induction and charge transfer, which are stronger in molecular salts, limited the transferability of exp-6 potentials which had been fitted to crystal structures of neutral molecules. Successful prediction based on energy (van Eijck) was achieved by the use of a supramolecular dimer, rather than two ions as individual molecules.
Other groups noted the similarity of an existing crystal structure in the CSD, RABYID, the crystal structure of which consisted of the same fumarate ion but with a quinolinium counterion instead of 1,8-naphthyridinium (i.e. where the unprotonated nitrogen is instead a CH moiety). The energy landscapes generated for both salts proved similar enough to encourage the speculation that the two crystal structures could be isostructural, and one group (Neumann et al.) submitted a successful prediction based on this approach (Kendrick et al., 2011).
Molecule (XX) was the first large flexible molecule to feature in the blind tests and proved a considerable challenge. For such highly flexible molecules there is a key dependence on the conformation of the molecule and successful prediction involved succeeding at this early step. One of the main difficulties is the computing power required to make a complete search for all available space groups with a flexible molecule; when all standard orientations about the exocyclic single bonds are considered, there are over one thousand possible conformations. The two successful strategies (Day and Cruz-Cabeza; Price, Kazantsev, Karamertzanis, Adjiman and Pantelides) reduced the search space to a more manageable level, producing innovations in methodology that have been described and contrasted in detail elsewhere (Kazantsev, Karamertzanis, Adjiman, Pantelides, Price, Galek, Day & Cruz-Cabeza, 2011). Day et al. used geometry data for similar systems from the CSD to help limit the search further and considered a set of predefined conformations. Price et al. identified likely ranges of values for the flexible torsions and used an extension to the CrystalPredictor methodology and databases of the ab initio calculations on the isolated molecule to allow the crystal structures and conformations to be simultaneously refined (Kazantsev, Karamertzanis, Adjiman & Pantelides, 2011). Neumann et al. employed a fully flexible molecule, allowing all conformations to be explored during the crystal structure generation step. Use of multipoles and empirical potentials performed better than DFT-D in this case, with both groups using this method (Day et al.; Price et al.) successfully predicting the crystal structure in first place.
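The combinatorial growth in the number of conformations noted above can be made concrete with a short sketch. The number of rotatable bonds and the orientations assigned to each are assumptions made purely for illustration (seven torsions with three staggered orientations each); they are not the actual torsion set of target (XX), and the function names are hypothetical.

```python
from itertools import product
from math import prod
from typing import Iterator, Sequence, Tuple


def count_conformations(orientations_per_torsion: Sequence[int]) -> int:
    """Number of starting conformations: the product of the choices per torsion."""
    return prod(orientations_per_torsion)


def enumerate_conformations(
    torsion_angles: Sequence[Sequence[float]],
) -> Iterator[Tuple[float, ...]]:
    """Yield every combination of torsion angles, one angle per rotatable bond."""
    return product(*torsion_angles)


# Assumed, illustrative input: three staggered orientations (in degrees)
# about each of seven rotatable single bonds.
staggered = (60.0, 180.0, 300.0)
torsions = [staggered] * 7

n_conformations = count_conformations([len(t) for t in torsions])  # 3**7 = 2187
first_conformation = next(enumerate_conformations(torsions))
```

Even this modest assumed set gives more than two thousand starting conformations, each of which would need to be searched across the relevant space groups. This is why restricting the torsion ranges or working from a predefined set of conformations reduces the cost so sharply.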
Hydrate (XXI) proved to be one of the most challenging systems in the blind test. For this molecule two known polymorphs already existed. However, the difficulty in predicting this crystal structure was not due to the availability of two already known polymorphs, but rather that the representation of water–water and water–gallic acid interactions is extremely difficult to model, making the successful prediction of even the known polymorphs a difficult task.
As a hydrate, the hydrogen-bonding network enabled by the water molecules and the various hydrogen-bond donors and acceptors in the acid proved key to successfully predicting the crystal structure, but it is also obvious that the sheer number of different possible hydrogen-bond networks makes the problem a difficult one. The results obtained for form (4) show that with the same placement of non-H atoms there is more than one set of hydrogen positions that is possible. Energetically, the OH conformation observed in the experimental structure is not the most favourable in isolation and, given the nature of X-ray diffraction, the positions of these protons cannot be deemed as unequivocally determined. Indeed, there is evidence of large displacement parameters for the protons involved in the two alternative hydrogen-bond networks. This leads us to consider that the structure is best described as disordered with respect to which network is present. This matter would only be resolved with an in-depth temperature-dependent X-ray and NMR study. A post-blind test polymorphism screen (Braun, personal communication) showed that the ordered form (2) structure is the most stable polymorph at room temperature.
Overall, the systems that gave the most difficulty are those where the molecules can adopt very different low-energy conformations, where current methods may not accurately reflect the energy differences between the conformations in the solid state. Work on improving the estimates of polymorphic energy differences in challenging cases where the polymorphs have different numbers of inter- and intramolecular hydrogen bonds (Karamertzanis et al., 2008) shows that improving the theoretical basis of the methods used to evaluate the lattice energies will lead to further progress.
This fifth blind test has built upon the successes of previous blind tests and shows that a state-of-the-art method for crystal structure prediction is able to reliably predict crystal structures of small rigid and slightly flexible molecules, and methods are emerging that are able to tackle larger more flexible molecules and complex systems such as salts and hydrates.
For each of the six target crystal structures, there was at least one successful prediction under the criteria stated for success at the start of the test [although for the hydrate (XXI) certain protons were incorrectly placed]. The number of successful predictions for each of the first four categories was broadly comparable with the first three blind tests, but slightly lower than in CSP2007, which saw great successes. This may be due to the difficulty of easily gauging a target’s difficulty based on its molecular diagram alone: several of the targets in this blind test showed additional challenges not faced in CSP2007 even though the target molecule met the same selection criteria. One observation that is easily made, however, is that the DFT-D method continues to perform very well for these molecule types, although it does not yet supply a comprehensive solution, as shown by the inability to predict some targets [such as (XIX)] by energy methods alone. Most other successes were based on using realistic models for the intermolecular forces (Stone, 1996), which included a distributed multipole representation of the molecular charge distribution.
For the large flexible molecule [target (XX)] it is promising that two groups were able to successfully predict the crystal structure as their first place entry. In both cases success was achieved by systematic reduction of the problem to more manageable proportions, such as through the use of CSD geometry data to determine the more likely conformations in the experimental crystal structure. More methodological and program development should allow the most thermodynamically stable crystal structures to be computed more readily for molecules of this complexity in the future. The best approach for such complex systems may well be the use of experimental data, including polymorph screening, alongside the calculations to move towards a predictive technology for the understanding and anticipation of polymorphism.
The difficulties faced in this blind test have helped push the participating teams to adopt novel approaches in an attempt to successfully predict the experimental crystal structures. While some challenges remain, such as the need to include a direct consideration of temperature (thermodynamics prescribes that relative stability is a function of temperature), the results achieved in this blind test demonstrate that crystal structure prediction can now be performed reliably for small molecules using a state-of-the-art method. Furthermore, results on the large molecule [target (XX)] as well as the salt [target (XIX)] and the hydrate [target (XXI)] provide encouragement that crystal structure prediction can move on from prediction of small rigid molecules to more complex systems, while highlighting deficiencies in current methods where key developments are still required.
Supplementary material file. DOI: 10.1107/S0108768111042868/bk5106sup1.txt
Supplementary material file. DOI: 10.1107/S0108768111042868/bk5106sup2.zip
Supplementary material file. DOI: 10.1107/S0108768111042868/bk5106sup3.zip
We are grateful to the crystallographers who supplied candidate structures: Professor Doyle Britton [molecules (XVI) and (XVII)], Professor Alexander Blake [molecules (XVIII) and (XX)], Professor Leonard MacGillivray [salt (XIX)] and Professor Michael Zaworotko [molecule (XXI)]. SLP group developments were funded by CPOSS Basic Technology EP/F03573X, http://www.cposs.org.uk. DEB was funded by the Austrian Science fund (FWF) J2897-N17. The Imperial College team (AVK, PGK, CSA and CCP) gratefully acknowledge the Engineering and Physical Sciences Research Council (EPSRC) under the Molecular Systems Engineering grant (EP/E016340) for financial support and the High Performance Computing Cluster at Imperial College London for providing resources for calculations. GMD thanks the Royal Society for funding. AJCC thanks the Pfizer Institute for Pharmaceutical Materials Science for funding. ST, RP and TST thank the Indian Institute of Science for fellowships. GRD thanks the DST for the award of a JC Bose fellowship. BvE thanks Paul Ruttink. The University of Utah team gratefully acknowledges an allocation of computer time from the Center for High Performance Computing at the University of Utah. RJN and AJM would like to acknowledge EPSRC grant EP/F032773/1 for funding.
1Supplementary data for this paper are available from the IUCr electronic archives (Reference: BK5106). Services for accessing these data are described at the back of the journal.