Home | About | Journals | Submit | Contact Us | Français |

**|**J Cheminform**|**v.2(Suppl 1); 2010**|**PMC2867170

Formats

Article sections

Authors

J Cheminform. 2010; 2(Suppl 1): P36.

Published online 2010 May 4. doi: 10.1186/1758-2946-2-S1-P36

PMCID: PMC2867170

5th German Conference on Cheminformatics: 23. CIC-Workshop

Frank Oellien, Uli Fechner and Thomas Engel

http://www.biomedcentral.com/content/pdf/1758-2946-2-S1-info.pdf5th German Conference on Cheminformatics: 23. CIC-Workshop

8–10 November 2009

Goslar, Germany

Copyright ©2010 Georg et al; licensee BioMed Central Ltd.

The spatial arrangement of a chemical compound plays an important role regarding the related properties or activities. A straightforward approach to encode the geometry is to enumerate pairwise spatial relationships between *k *substructures, like functional groups or subgraphs. This leads to a combinatorial explosion with the number of features of interest and redundant information. The goal of this work is to compute all possible *k*-subsets of spatial points and to extract a single canonical descriptor for each subset in sub-polynomial computation time. More precisely, the problem is to reduce the complexity of *n*_{k }= *n*·(*n *- 1)...(*n *- *k*) possible relationships (patterns or descriptors) for *n *features and *k*-point relationships.

We propose a two-step algorithm to solve this problem. A modified algorithm for the computation of the binomial coefficient computes the *k*-subsets [1] containing the possible combinations of the *n *relevant features. If a *k*-subset is completed in the inner recursion, the algorithm computes a canonical representation for it. By defining a natural order by means of the geometrical center of gravity of the *k *points, we extract *k *patterns that describe the distance to the center of gravity and type of the spatial feature *k * *F*. Then, the algorithm returns a unique identifier for the lexicographically sorted array of patterns. If applicable (), an additional identifier is added which has the form , where *d*_{ij }denotes the geometrical distance between features *i*, *j*. Else (), this step is omitted. Therefore, this approach also considers stereochemistry. Finally, one feature is returned for each *k*-subset resulting in a set of *C*(*n*, *k*) patterns describing the structure.

The main result is that the number of features is reduced from *n*_{k }to *C*(*n*, *k*), which equals the binomial coefficient. This procedure is useful in combination with similarity approaches that use spatial relationships, like pharmacophore searches, fingerprints, or graph kernels. We experimentally validated the algorithm on numerous QSAR benchmark sets in combination with the pharmacophore kernel [2].

Articles from Journal of Cheminformatics are provided here courtesy of **Springer**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |