Acta Crystallogr B. 2011 December 1; 67(Pt 6): 535–551. | PMCID: PMC3222142 |
Towards crystal structure prediction of complex organic compounds – a report on the fifth blind test
David A. Bardwell,a* Claire S. Adjiman,b Yelena A. Arnautova,c Ekaterina Bartashevich,d Stephan X. M. Boerrigter,e Doris E. Braun,f Aurora J. Cruz-Cabeza,agh Graeme M. Day,i Raffaele G. Della Valle,j Gautam R. Desiraju,k Bouke P. van Eijck,l Julio C. Facelli,mn Marta B. Ferraro,o Damian Grillo,o Matthew Habgood,f Detlef W. M. Hofmann,pq Fridolin Hofmann,qr K. V. Jovan Jose,s Panagiotis G. Karamertzanis,b Andrei V. Kazantsev,b John Kendrick,t Liudmila N. Kuleshova,p Frank J. J. Leusen,t Andrey V. Maleev,u Alston J. Misquitta,v Sharmarke Mohamed,f Richard J. Needs,v Marcus A. Neumann,w Denis Nikylov,d Anita M. Orendt,m Rumpa Pal,k Constantinos C. Pantelides,b Chris J. Pickard,x Louise S. Price,f Sarah L. Price,f Harold A. Scheraga,c Jacco van de Streek,w Tejender S. Thakur,k Siddharth Tiwari,k Elisabetta Venuti,j and Ilia K. Zhitkovu
aCambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
bImperial College London, England
cCornell University, USA
dSouth Ural State University, Russian Federation
eSSCI, An Aptuit Company, USA
fDepartment of Chemistry, University College London, England
gThe Pfizer Institute for Pharmaceutical Materials Science, University Chemical Laboratory, University of Cambridge, England
hUniversity of Amsterdam, The Netherlands
iDepartment of Chemistry, University of Cambridge, England
jUniversità di Bologna, Italy
kIndian Institute of Science, India
lUniversity of Utrecht, The Netherlands
mCenter for High Performance Computing, University of Utah, USA
nDepartment of Biomedical Informatics, University of Utah, USA
oUniversidad de Buenos Aires, Argentina
pParco Scientifico e Technologico, Italy
qFlexCryst, Germany
rUniversity Erlangen–Nürnberg, Germany
sRuhr-Universität Bochum, Germany
tUniversity of Bradford, England
uVladimir State Humanitarian University, Russian Federation
vCavendish Laboratory, England
wAvant-garde Materials Simulation, Germany
xDepartment of Physics and Astronomy, University College London, England
Received August 1, 2011; Accepted October 16, 2011.
This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
2. Organization and approach
The organization for this latest blind test, CSP2010, was similar to that used for the previous four evaluations of the field, the results of which have been previously published: CSP1999 (Lommerse et al., 2000), CSP2001 (Motherwell et al., 2002), CSP2004 (Day et al., 2005) and CSP2007 (Day et al., 2009). Invitations to participate were sent to 24 research groups known to be active in the field. The test was also advertised through various websites and meetings.
The previous blind test put forward targets for prediction in the following four categories:
- (1) Small, rigid molecules; only the elements C, H, N and O; Z′ = 1 in any space group; up to 25 atoms.
- (2) Rigid molecules; unusual functional groups or elements such as halogens, S, P and B; Z′ = 1 in any space group; up to 30 atoms.
- (3) Moderately flexible molecule with 2–4 internal degrees of freedom; Z′ = 1 in any space group; up to 40 atoms.
- (4) Multiple independent rigid molecules, e.g. solvates, co-crystals, salts or Z′ = 2 structures; any space group; up to 30 atoms.
These four categories were left the same as those used in CSP2007 so as to facilitate comparison of results. In addition, it was decided to add two new categories that would provide greater challenges:
- (5) Molecule with 4–8 internal degrees of freedom; Z′ ≤ 2 in any space group; 50–60 atoms.
- (6) Molecule for which more than one polymorph is known, and which roughly falls into one of the first four categories.
The new fifth category presents a much greater challenge in terms of flexibility than any encountered in earlier blind tests, with a large flexible molecule intended to represent those often associated with modern pharmaceuticals. The new sixth category gives an opportunity to study the challenging effects of polymorphism by introducing a molecule for which more than one polymorph is known.
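The size and flexibility limits of categories (1)–(5) can be written as a simple screening function. The following is only a minimal sketch: the `Candidate` record and `matching_categories` function are hypothetical names, not part of the blind test procedure. It assumes that a molecule with zero internal degrees of freedom counts as rigid, that category (2) can be recognised by the presence of any element other than C, H, N and O (unusual functional groups are not detected), and that the atom limits apply to the total over the independent molecules. Category (6) depends on known polymorphism rather than on molecular size, so it is left out.

```python
from dataclasses import dataclass

CHNO = frozenset({"C", "H", "N", "O"})


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of a candidate crystal structure."""
    n_atoms: int            # atom count (assumed: total over independent molecules)
    internal_dof: int       # internal degrees of freedom; 0 is treated as rigid
    elements: frozenset     # element symbols present
    n_independent: int = 1  # independent molecules in the asymmetric unit


def matching_categories(c: Candidate) -> list[int]:
    """Return the categories (1)-(5) whose stated limits the candidate meets."""
    rigid = c.internal_dof == 0
    single = c.n_independent == 1   # one molecule, Z' = 1
    only_chno = c.elements <= CHNO  # subset test on element symbols
    cats = []
    if rigid and single and only_chno and c.n_atoms <= 25:
        cats.append(1)
    # Assumption: category (2) is flagged by non-CHNO elements only.
    if rigid and single and not only_chno and c.n_atoms <= 30:
        cats.append(2)
    if single and 2 <= c.internal_dof <= 4 and c.n_atoms <= 40:
        cats.append(3)
    if rigid and c.n_independent >= 2 and c.n_atoms <= 30:
        cats.append(4)
    if 4 <= c.internal_dof <= 8 and c.n_independent <= 2 and 50 <= c.n_atoms <= 60:
        cats.append(5)
    return cats
```

For example, a rigid 15-atom molecule containing only C, H and N, held as `Candidate(15, 0, frozenset("CHN"))`, meets the limits of category (1) alone. A real assessment would also have to judge structure quality and disorder, which this sketch does not attempt.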
Crystallographers were contacted in August 2009 with a request for unpublished crystal structures that matched one or more of the six categories for the fifth blind test. Crystal structures were collected at the CCDC and assessed for the possibility of inclusion in one of the six possible categories. To be suitable, a crystal structure had to be of high quality and have all atoms located with no disorder. The crystal structure had to be unpublished and the donor crystallographer had to agree to postpone any publication for the duration of the blind test. Collection of suitable candidates for all six categories proved exceptionally difficult, especially for category 1, where the target molecule is very small with a very restricted set of constituent elements, and also for category 6 where few suitable candidates were available that were not of sufficient interest to be withheld from publication for the duration of this test. Almost 30 submitted crystal structures had to be rejected, either because they did not conform to any of the six categories or because of refinement issues such as disorder.
After considerable effort, one candidate was collected for category 1, four for category 2, eight for category 3, three for category 4, three for category 5 and one for category 6. For those categories where there was more than one candidate, the final target choice was made randomly.
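The random choice among candidates can be illustrated with a short sketch. The text does not say how the draw was actually made, so the uniform draw, the use of Python's `random` module and the `draw_targets` name are all assumptions; only the candidate counts come from the text.

```python
import random

# Candidate counts per category, as reported in the text.
CANDIDATE_COUNTS = {1: 1, 2: 4, 3: 8, 4: 3, 5: 3, 6: 1}


def draw_targets(counts, seed=None):
    """Pick one candidate (1-based index) per category, uniformly at random.

    Assumption: a uniform draw; the actual selection procedure is not described.
    """
    rng = random.Random(seed)
    return {cat: rng.randint(1, n) for cat, n in sorted(counts.items())}
```

Categories with a single candidate, here (1) and (6), always return that candidate, so the draw only matters for categories (2)–(5).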
For category 6, the one candidate that was submitted was gallic acid monohydrate, for which two new polymorphs had been found. These complemented the two previously published polymorphs for gallic acid monohydrate, which are located in the Cambridge Structural Database (CSD; Allen, 2002) under the KONTIQ CSD reference code family. For the purposes of this blind test, these known forms are referred to as forms (1) and (2). Of the two new forms submitted as candidates for prediction, one [form (4), as recently published by Clarke et al., 2011] had one formula unit in the asymmetric unit (i.e. one gallic acid and one water molecule). The other, form (3), was originally solved with two formula units per asymmetric unit. However, analysis after the blind test submissions showed that this solution contained a disordered hydrogen-bonding network and the crystal structure could also be described with an ordered hydrogen-bonding network by doubling the unit cell, as now published (Clarke et al., 2011). For the purposes of this blind test, form (3) was therefore deemed inappropriate as a target crystal structure. The main aim for this category, then, was to predict form (4), whose structure has been recently independently published (Demirtaş et al., 2011) and see where (if at all) forms (1) and (2) appeared in the ranked list of predictions.
The molecular diagrams and crystallization conditions were sent by e-mail to 15 participant groups on 16 November 2009. Immediately after circulation of the targets we were made aware that the crystal structure of the molecule selected for category 1 (4-ethynylbenzonitrile) had been solved, was undergoing publication and so would soon be in the public domain. The decision was therefore made to remove this candidate for category 1 and attempt to locate a suitable replacement. Thankfully a suitable candidate was quickly provided and the revised list of target molecules, as detailed in Table 1, was distributed to participants on 23 November 2009. Following the numbering used in the previous blind tests we refer to these molecules by the Roman numerals (XVI)–(XXI).
| Table 1 Diagrams and crystallization conditions for the targets of CSP2010 |
The format of this blind test was kept broadly the same as that of the last blind test, with the exception that a greater length of time was allowed before submission of results. Participants were requested to forward their three ‘official’ predictions for each target molecule to the CCDC, where the experimentally determined crystal structures were held for the duration of the test. As well as these three main predictions, participants were urged to submit an extended list of the crystal structures they generated in order to help post-analysis and to provide insight into the performance of the various methods. The deadline for submissions was 20 August 2010. The experimentally determined crystal structures for all six categories were then circulated to all participants on 23 August 2010 to allow post-analysis of their predictions. Lastly, a workshop was held at the CCDC in mid-September 2010 to discuss the results.
We present here results from the 14 participating groups that agreed to publish their results. Details of these 14 participating groups, together with a summary of which targets they attempted and whether a match with the experimental structure was observed in their submission, are presented in Table 2(a).
| Table 2(a) List of participants with summary of the targets attempted and rank of any successful predictions |