The symbols for the new IUPAC elements named in November 2016 can introduce subtle ambiguities within cheminformatics software. The ambiguities are described and demonstrated by highlighting inconsistencies between software when handling existing element symbols.
On the 28th November 2016 the International Union of Pure and Applied Chemistry (IUPAC) approved the names and symbols of four new elements: 113 Nihonium (Nh), 115 Moscovium (Mc), 117 Tennessine (Ts), 118 Oganesson (Og). Cheminformatics libraries typically use a centralised dictionary of elements to store and look up symbols in the periodic table. Naïvely adding the new element symbols to this table can introduce unexpected behaviour.
The first ambiguity was noted in the preliminary recommendations from IUPAC. Contracted group abbreviations are common in published chemistry research to make sketches more concise and readable. Common abbreviations include Ph for Phenyl, NO2 for Nitro, COOH for Carboxylic acid, etc. The new symbol Ts is widely used to represent a Tosyl group. Unfortunately the symbol Tn could not be used as it was previously used for Radon-220 (Thoron). It was noted in the preliminary recommendations that, similarly to how Ac is used for both Actinium and Acetyl (and sometimes Acyl), the intended meaning would be clear from the context (Fig. 1).
While it is true that a human may decipher the intended meaning, it is more difficult for software, especially when the compound is no longer associated with its original context. The existence of compounds like PubChem's p-tolylactinium (CID 20712520) instead of the intended p-acetyl structure demonstrates this. The error here was propagated from the original substance submissions: a deprecated ChemSpider entry, and patent sketches extracted by SCRIPDB.
Software for sketching chemical diagrams often allows the input of contracted abbreviations. ChemDraw 15 interprets all Ac labels as Acetyl and it is impossible to add an Actinium atom to a sketch even via the periodic table selection menu or from file or line-notation input. In MarvinSketch 184.108.40.206 entering Ac using the periodic table menu or keyboard short-cuts results in Actinium whilst using the “Label Editor” produces Acetyl. BIOVIA Draw 2017 makes a clear distinction when adding abbreviated atoms and both interpretations can be input. ChemDoodle 7.0.2 always interprets Ac as Actinium when setting the atom label but does allow OAc for Acetoxy. With all of these, there is often little visual indication or feedback as to whether users have entered the input they intended.
To remove the ambiguity between Tosyl and Tennessine the alternative abbreviation Tos can be used. A brief analysis of sketches taken from United States patent applications published in 2015 shows that Ts is used in atom labels 2290 times and Tos 113 times.
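One way a toolkit might surface this kind of clash is to report every reading of a label rather than silently picking one. The following is a minimal sketch only: the two lookup tables and the function names are hypothetical, trimmed to the labels discussed here, and do not reflect how any of the programs above store their data.

```python
# Hypothetical lookup tables, trimmed to labels mentioned in the text.
ELEMENTS = {"Ac": 89, "Cl": 17, "Nh": 113, "Ts": 117}
ABBREVIATIONS = {"Ac": "acetyl", "Ph": "phenyl", "Tos": "tosyl", "Ts": "tosyl"}


def interpretations(label):
    """Return every reading of an atom label as (kind, value) pairs."""
    found = []
    if label in ELEMENTS:
        found.append(("element", ELEMENTS[label]))
    if label in ABBREVIATIONS:
        found.append(("abbreviation", ABBREVIATIONS[label]))
    return found


def is_ambiguous(label):
    """True when a label has more than one reading in the tables above."""
    return len(interpretations(label)) > 1
```

Under these assumed tables, Ts and Ac are reported as ambiguous whereas Tos is not, so a caller could prompt the user or fall back to an explicit policy.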
A more subtle problem may arise with the symbol Nh in software that allows case insensitive atom labels. It is reasonable to accept CL as equal to Cl for chlorine (e.g. PDB HETATMs) but NH (secondary amine) may now unexpectedly be picked up as Nihonium from the internal dictionary.
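A possible guard, sketched below under stated assumptions, is to try an exact match first and to refuse the case-insensitive fallback whenever the label can also be read as a run of one-letter element symbols. The table `ELEMENT_NUMBERS` and both function names are hypothetical, and the table holds only the handful of symbols needed for the illustration.

```python
# Hypothetical, deliberately tiny element table for illustration.
ELEMENT_NUMBERS = {"H": 1, "C": 6, "N": 7, "Cl": 17, "Nh": 113}


def splits_into_single_letter_elements(label):
    """True if every character of a multi-letter label is itself a symbol."""
    return len(label) > 1 and all(ch in ELEMENT_NUMBERS for ch in label)


def lookup_element(label):
    """Return the atomic number for a label, or None if it is not an element."""
    # An exact, case-sensitive match always wins.
    if label in ELEMENT_NUMBERS:
        return ELEMENT_NUMBERS[label]
    # "NH" reads as N followed by H, so do not fold it to "Nh".
    if splits_into_single_letter_elements(label):
        return None
    # Otherwise accept a case-folded match, e.g. "CL" for "Cl".
    return ELEMENT_NUMBERS.get(label.capitalize())
```

With a full periodic table this policy would also decline to fold labels such as CO to Co, which may or may not be wanted; the point is only that the fallback should be a deliberate choice rather than a side effect of adding Nh to the dictionary.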
Support for the SMARTS query language is available in many closed and open-source cheminformatics toolkits. A potential area for ambiguity is again found with Nihonium and the interpretation of other transfermium symbols. The transfermium elements were officially named after the initial release of the Daylight SMARTS toolkit, and in subsequent implementations some symbols are interpreted differently between toolkits, either as an element or as a conjunction (AND) expression. For example, at the time of writing both the CDK and Open Babel interpret [Bh] as [B&h] whilst RDKit interprets it as [#107].
The problem occurs due to the implicit conjunction between adjacent primitive expressions. The new symbol [Nh] could be interpreted as [#113] (element 113) or [N&h] (aliphatic nitrogen and at least one implicit hydrogen). Table 1 lists the transfermium symbols and their different interpretations. Software that generates SMARTS patterns should take extra care to avoid writing ambiguous expressions.
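As an illustration of writing unambiguous expressions, a generator could always emit elements by atomic number and always spell out the conjunction operator. The helper names below are hypothetical and the sketch only builds strings; it does not call or validate against any particular toolkit.

```python
def element_smarts(atomic_number):
    """Atom expression matching an element by atomic number, e.g. [#113]."""
    return f"[#{atomic_number}]"


def conjunction_smarts(primitives):
    """Atom expression joining primitives with an explicit '&' (AND)."""
    return "[" + "&".join(primitives) + "]"


# Element 113, written so it cannot be read as N followed by h.
nihonium = element_smarts(113)
# Aliphatic nitrogen with at least one implicit hydrogen, written explicitly.
nitrogen_with_h = conjunction_smarts(["N", "h"])
```

Neither output relies on how a reader tokenises the two-letter sequence Nh, so both should be read the same way regardless of which interpretation a toolkit applies to [Nh].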
A pragmatic approach to handling the new elements or perhaps all high atomic number elements with a very short half-life could be to simply ignore them. Whilst these elements are unlikely to have a practical application it is unsatisfactory to simply ignore them and we hope this commentary highlights that care should be taken when supporting the new symbols in cheminformatics software.
JWM wrote the manuscript and RAS recognized the ambiguities with SMARTS expressions. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
John W. Mayfield, Email: moc.erawtfosevomtxen@nhoj.
Roger A. Sayle, Email: moc.erawtfosevomtxen@regor.