This section describes illustrative results based on the CSD entries discussed in §1, starting with GEBXOA. This entry is assigned a triple bond between the Ru atoms, whereas the actual bond order from electron counting is 2.5 (Chakravarty et al.). Metal–metal multiple bonds are often assigned correctly, although it is also common for the assigned bond order to be out by 1 in either direction. Missing H atoms in GEBXOA are inferred correctly.
The charges on the metal-containing species in HEWMOL are assigned correctly, the algorithm recognizing that the implied metal oxidation states (MnII and TlI) are all of high probability. The assignment is easy because the [MnCl4]2− ion is in the complete-molecule UF data file, and is therefore known to be invariably di-anionic. Also, thallium has very well defined oxidation-state preferences. The algorithm is often successful in such circumstances. For example, zinc complexes are usually assigned correctly because the metal can only be ZnII; even when errors are made, the oxidation-state check usually indicates that there is a problem. Conversely, structures containing two or more metals that can adopt many oxidation states are much more likely to be assigned incorrectly (although often with low reliability scores), especially when, as is common, none of the metal-containing molecules or ions corresponds to an entry in the complete-molecule UF data file.
The assignment of HEWMOL is not perfect because the algorithm does not identify bonds between the Tl+ ion and the crown ether O atoms (these bonds are present in the CSD representation). This is a common situation in highly ionic complexes (most obviously, when oxygen ligands are coordinated to elements of groups 1 and 2), where the distinction between a metal–oxygen bond and a metal⋯oxygen short nonbonded contact is blurred. It is then difficult for the algorithm to reproduce what is essentially the subjective judgement of a chemist. The identification of metal–oxygen bonds in these types of compounds is an ongoing problem in the CSD. The policy of following authors’ judgements leads to inconsistencies, which places an onus on database users to construct substructure queries with care. Conversely, if bonds were assigned on the basis of strict distance criteria, the result in many cases would be chemically unintuitive.
YAZZOP and BALTUE are assigned correctly, with the redox-active ligand in its correct oxidation state in each. However, these types of structures represent a severe challenge for the algorithm and errors are frequent, although they are often highlighted by oxidation-state warnings. The correct assignment of Re=O double bonds in BALTUE is satisfying given the superficially attractive alternative of assuming the O atoms belong to water molecules with undetermined H-atom positions. Metal–oxygen and metal–nitrogen double bonds are common so it is important to recognize them, and the algorithm tends to perform well in this respect. Bond-length differences between single and double bonds can be substantial (e.g. about 0.4 Å between the mean values of V—OH2 and V=O), which helps.
The algorithm fails to reproduce the CSD charge assignment for BAPYEX, making all species neutral and compounding the felony by awarding a relatively high reliability score of 2. Unfortunately, this is typical: the algorithm performs badly on charge-transfer salts. The authors describe BAPYEX as a biradical (Mochida et al.), which cannot be properly represented in the CSD anyway.
VOMNUH and VOMPAP are both assigned correctly with reliability scores of 2. The algorithm assigns an aromatic representation to the pyrazole ligands in VOMNUH, since both N atoms are bonded to metal and the negative charge is therefore unlikely to be localized on either one of them. An aromatic representation would also be assigned to a pyrazole ligand in which one N atom was metal-bound and the other bonded to boron, since boron is a metalloid. However, if one of the N atoms is bonded to a metal and the other to a non-metal, as in VOMPAP, a non-aromatic representation results.
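The pyrazole rule just described can be sketched as a small decision function. This is only an illustration of the stated rule, not the actual implementation: the function names, the element lists and the example metal (Cu) are hypothetical, and the treatment of cases the text does not mention (for example, neither N atom bonded to a metal) is an assumption.

```python
# Illustrative sketch of the pyrazole rule; names and element lists are
# hypothetical and are not taken from the algorithm's source code.
METALLOIDS = {"B", "Si", "Ge", "As", "Sb", "Te"}  # assumed metalloid list
NONMETALS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Se", "Br", "I"}  # assumed


def is_metal(element: str) -> bool:
    """Treat any element that is neither a listed non-metal nor a metalloid as a metal."""
    return element not in NONMETALS and element not in METALLOIDS


def is_metal_or_metalloid(element: str) -> bool:
    return is_metal(element) or element in METALLOIDS


def pyrazole_representation(n1_neighbour: str, n2_neighbour: str) -> str:
    """Choose a representation from the exocyclic neighbour of each ring N atom.

    Aromatic if at least one N is metal-bound and the other is bound to a
    metal or a metalloid; non-aromatic otherwise (the fallback for cases not
    covered in the text is an assumption).
    """
    neighbours = (n1_neighbour, n2_neighbour)
    any_metal = any(is_metal(e) for e in neighbours)
    all_metal_like = all(is_metal_or_metalloid(e) for e in neighbours)
    return "aromatic" if any_metal and all_metal_like else "non-aromatic"


both_metal = pyrazole_representation("Cu", "Cu")   # both N atoms metal-bound
metal_boron = pyrazole_representation("Cu", "B")   # metal on one N, boron on the other
metal_carbon = pyrazole_representation("Cu", "C")  # metal on one N, non-metal on the other
```

Under these assumptions the first two calls give an aromatic representation and the third a non-aromatic one, mirroring the three cases described in the text.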
The carbene ligands in XONQIB are correctly identified as such (this is often but not always the case). However, the assignment differs from that in the CSD in that the CN bonds in the carbene ligand are assigned as single rather than delocalized. In our view, either representation is defensible.
OFIKOD (Fig. 2) is assigned incorrectly because the missing metal-bound hydrogen is not inferred. However, the incorrect structure is accompanied by warning messages and awarded a reliability score of only 1 (this is the structure to which Table 3 refers). The template-based oxidation-state method detects that the assigned structure implies PtI, which has low probability (p = 0.009). This is a typical result: missing metal-bound H atoms are never added by the algorithm but the error often produces oxidation-state warnings. In the present case, there is an additional clue: without the hydride ligand, the Pt atom appears to be three-coordinate with an unusual ‘T’-shaped geometry. However, the algorithm currently makes no use of metal coordination geometries.
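The way a missing hydride can surface as an oxidation-state warning can be illustrated with a simplified check. The sketch below uses plain charge counting in place of the template-based method, whose details are not given here. The probability table, the threshold, the ligand charges and the overall charge are all hypothetical; only the value p = 0.009 for PtI is taken from the text.

```python
from typing import Optional, Sequence

# Hypothetical probability table: only the Pt(I) value comes from the text,
# the Pt(II) value is a made-up placeholder.
OXIDATION_STATE_PROBABILITY = {
    ("Pt", 1): 0.009,
    ("Pt", 2): 0.7,
}
LOW_PROBABILITY_THRESHOLD = 0.05  # assumed cut-off for issuing a warning


def implied_oxidation_state(complex_charge: int, ligand_charges: Sequence[int]) -> int:
    """Metal oxidation state = overall charge minus the sum of ligand charges."""
    return complex_charge - sum(ligand_charges)


def oxidation_state_warning(
    metal: str, complex_charge: int, ligand_charges: Sequence[int]
) -> Optional[str]:
    """Return a warning string if the implied oxidation state is improbable."""
    state = implied_oxidation_state(complex_charge, ligand_charges)
    p = OXIDATION_STATE_PROBABILITY.get((metal, state), 0.0)
    if p < LOW_PROBABILITY_THRESHOLD:
        return f"{metal} oxidation state {state} has low probability (p = {p})"
    return None


# Hypothetical cationic complex with three neutral ligands.
without_hydride = oxidation_state_warning("Pt", +1, [0, 0, 0])    # implies Pt(I)
with_hydride = oxidation_state_warning("Pt", +1, [0, 0, 0, -1])   # implies Pt(II)
```

In this toy example, omitting the hydride ligand lowers the implied oxidation state by one and triggers the warning, whereas the complete ligand set passes the check.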
XOLSIB and VOLSAR (Fig. 3; both assigned correctly) represent another common and difficult problem on which the algorithm often fails: whether to assume a metal-bound —OR group is an alkoxide or an alcohol with a H atom missing. Again, the oxidation state is usually the biggest clue. In XOLSIB the assumption of missing alcohol H atoms is necessary to achieve a template-based oxidation state estimate of DyIII, which is the only reasonable hypothesis for this element. Conversely, a credible oxidation state is obtained for the Nb atom in VOLSAR with the alkoxide formulation, which is therefore accepted. In general, the algorithm tends to avoid inferring missing H atoms unless their presence is very obvious.
NOLZOE (Fig. 4) is assigned correctly: in particular, the solvate molecule is assigned as tetrahydrofuran despite its near planarity. This reflects the influence of the prior probabilities in Bayes’ formula, tetrahydrofuran occurring in the CSD several hundred times more often than furan. Interestingly (and somewhat to our surprise) the few furan molecules in the CSD are often assigned correctly, e.g. in CSD entries GAGBEV and WOSREB, suggesting that the geometry tests are well chosen for the relevant complete-molecule UF. In contrast, cyclohexane molecules with missing H atoms are almost always assigned as the overwhelmingly more common benzene, suggesting that the geometry tests are less effective for this pair. Cyclohexane geometries in the CSD are very variable (i.e. parameters used in geometry tests have large standard deviations), and we suspect this reduces the discriminatory power of the tests.
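The effect of the prior probabilities can be made concrete with a toy application of Bayes’ formula. The numbers below are assumptions chosen for illustration only: a 300:1 prior ratio stands in for "several hundred times more often", and the likelihoods simply suppose that a near-planar ring fits the furan geometry tests ten times better than the tetrahydrofuran ones.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Bayes' formula: posterior is proportional to prior times likelihood, then normalised."""
    unnormalised = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(unnormalised.values())
    return {name: value / total for name, value in unnormalised.items()}


# Assumed prior ratio of 300:1 in favour of tetrahydrofuran.
priors = {"tetrahydrofuran": 300 / 301, "furan": 1 / 301}

# Assumed likelihoods of the observed near-planar geometry under each hypothesis.
likelihoods = {"tetrahydrofuran": 0.1, "furan": 1.0}

result = posterior(priors, likelihoods)
best = max(result, key=result.get)
```

With these assumed values the geometry evidence favours furan by a factor of 10, but the prior favours tetrahydrofuran by a factor of 300, so tetrahydrofuran still receives the higher posterior probability (30/31).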
As mentioned earlier, the algorithm does not resolve the disorder in QEHLOF (Fig. 5), where H atoms are present only for the major configuration of a disorder assembly. This situation is typical of cases where it is probably better to rely on manual editing than attempt an algorithmic solution. In DEHMAF (Fig. 6), the algorithm correctly assumes that the structure contains methanol disordered by symmetry over two sites rather than half-occupancy ethane-1,2-diol. However, it will always make this type of assumption. In another example, TOLLOW, this gives the wrong answer – the structure is supposed to contain partial occupancy NH2—CH2—CH2—NH2, but the algorithm assumes disordered CH3NH2. There are two H atoms on each solvent carbon in this structure, which suggests the authors intended the former description, but not conclusively (i.e. one of the H atoms on each carbon might have been missing).
The symmetry-imposed disorder in AHALEA (Fig. 7) is resolved correctly. The twelve 1/4 occupancy oxygen sites (generated from three symmetry-independent oxygen sites by a fourfold axis) are correctly partitioned into four groups, each representing a reasonable sulfate-ion geometry, thus demonstrating that the geometry-scoring function is effective.