Shape is a fundamentally important molecular feature that often determines the fate of a compound in terms of molecular interactions with preferred and non-preferred biological targets. Complementarity of binding in small molecule-protein, peptide-receptor, antigen-antibody and protein-protein interactions is key to life and survival, but also to targeting molecules with bioactivity. We review the application of shape in various biological systems such as substrate recognition, ligand specificity / selectivity and antibody recognition in the context of computational methods such as docking, quantitative structure activity relationships, classification models and similarity search algorithms. These in silico pharmacology methods have recently demonstrated the importance and applicability of determining molecular shape in drug discovery, virtual screening and predictive toxicology. The results from recently published studies show that shape and shape-based descriptors are at least as useful as other traditional molecular descriptors.
Form follows function - that has been misunderstood. Form and function should be one, joined in a spiritual union.
Frank Lloyd Wright, 1908
It has long been recognized that determining molecular shape (Box 1), and changes in this property, is essential for understanding the molecules involved in chemical reactions 1. Enzymes can differentiate between functional groups in a molecule through shape recognition, and the biosynthesis of natural products also relies on shape recognition for selective oxidation 2. More recently, chiral recognition was shown at the single-molecule level to involve mutually induced conformational adjustments 3. At lower concentrations of magnesium ions, ribozymes from Escherichia coli and Bacillus subtilis recognize cloverleaf-shaped RNAs rather than hairpin-shaped RNAs, indicating shape recognition 4. Chemical shape interaction has a key role in the senses 5: smell (via hundreds of olfactory receptors), sight (via the receptors responsible for the perception of color), and taste (via the receptors responsible for the perception of bitter, sour, sweet, salt and umami). The olfactory receptors, the visual pigments and the bitter, sweet and umami taste receptors are G-protein coupled receptors. Various studies have reinforced the idea that molecular shape plays a major role in biological activity (6 and references therein). Adding other features to molecular shape, such as complementary electrostatic or steric interactions, would be expected to increase specificity.
The shape (Old English sceap, 'created thing') of an object located in some space refers to the part of space occupied by the object as determined by its external boundary, abstracting from other aspects the object may have, such as its color, its content, its position and orientation in space, and its size. Shape can also be defined more generally as "the appearance of something, especially its outline". In molecular pharmacology, however, the definition of shape does encompass structural features such as depth, size and surface. As can be visualized with a simple Tupperware toy in which small shapes fit into complementary holes (Box Figure A), shape becomes important when a protein packs against another protein (Box Figure B) or when a small molecule must fit into a binding site on a protein surface (Box Figure C). In these cases size and shape complementarity may be essential, in addition to favorable electrostatic and steric interactions, to fulfill the “lock and key” or “induced fit” hypotheses 63.
A concept related to shape is “depth”, defined as the distance of an atom from the nearest surface water molecule 64. Depth is usually applied when describing shape-based features of macromolecules, such as grooves in DNA or binding pockets buried in membranes 65. However, the term depth is more abstract and provides only a limited quantitative measure of shape.
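The definition of depth translates directly into a nearest-neighbor distance. A minimal Python sketch, assuming atom coordinates and the positions of surface water oxygens are already known (deciding which waters count as surface waters is outside its scope); all coordinates are hypothetical.

```python
import math

def atom_depth(atom_xyz, surface_waters):
    """Depth of an atom: distance (Angstrom) to the nearest surface water oxygen."""
    return min(math.dist(atom_xyz, water) for water in surface_waters)

# Hypothetical coordinates (Angstrom) of two surface waters and three atoms.
surface_waters = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0)]
atoms = {
    "exposed": (0.0, 0.0, 7.0),   # close to the solvent boundary
    "groove": (5.0, 0.0, 6.0),    # partly recessed
    "buried": (2.5, 0.0, 0.0),    # far from any surface water
}

for name, xyz in atoms.items():
    print(f"{name:8s} depth = {atom_depth(xyz, surface_waters):.2f} A")
```

Larger values mark atoms lying deeper in a groove or pocket; a single number per atom is exactly why depth gives only a limited measure of shape.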
Irrespective of the definition used, the essence of shape is therefore very useful in describing individual molecules or the nature of interactions between molecules. Hence, the study of shape in molecular pharmacology has gained importance because of its applicability to the in silico drug design techniques (Table 1) that are widely employed to decrease the costs of drug discovery and development. These computational methods enable rapid comparisons between small molecules, or between small molecules and protein receptor sites, based mainly on shape and on other properties such as electrostatics. This review explores the various definitions of shape generally used when describing a molecule or an interaction between molecules and provides examples of biological systems in which shape plays a major role. The applicability of shape in in silico techniques is also detailed, along with future developments for in silico pharmacology. Several published studies illustrate that shape-based methods and descriptors used in classification and other modeling schemes are as useful as traditional molecular properties such as 2D descriptors 7-9.
There are many examples of computational methods that use shape (Table 2). Simple molecular shape analysis, by determining the van der Waals volume of active and inactive compounds, can be insightful for enzymes, for example in elucidating substrate recognition by Pseudomonas fluorescens N3 dioxygenase 10 and in visualizing differences between inhibitors of human cytochrome P450 (CYP) 51 11. Shape descriptors (Box 2) have been found to be important in some recent computational models. For example, a model for protein-protein interaction inhibitors placed the shape descriptor SHP2 at the top of its decision tree 12. Sammon and Kohonen maps built for the human ether-a-go-go potassium ion channel contained the Wiener and Balaban index descriptors, suggesting that molecular shape or topological characteristics are important for binding to this ion channel 13.
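Topological shape descriptors such as the Wiener and Balaban indices can be computed directly from the hydrogen-depleted molecular graph. As a minimal Python sketch (standard library only), the code below uses the standard definitions: the Wiener index W is the sum of bond-count distances over all atom pairs, and Balaban's J = m/(μ+1) Σ over bonds (s_i s_j)^(-1/2), where m is the number of bonds, μ the number of rings and s_i the sum of distances from atom i. The alkane examples are chosen for illustration only and are not taken from the ion channel models cited above.

```python
import math
from collections import deque

def distance_matrix(n_atoms, bonds):
    """Topological (bond-count) distances by breadth-first search from every atom."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    matrix = []
    for start in range(n_atoms):
        dist = [-1] * n_atoms
        dist[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append(dist)
    return matrix

def wiener_index(n_atoms, bonds):
    """Sum of topological distances over all unordered atom pairs."""
    d = distance_matrix(n_atoms, bonds)
    return sum(d[i][j] for i in range(n_atoms) for j in range(i + 1, n_atoms))

def balaban_j(n_atoms, bonds):
    """Balaban J for a connected graph: m/(mu+1) * sum over bonds of 1/sqrt(s_i * s_j)."""
    d = distance_matrix(n_atoms, bonds)
    s = [sum(row) for row in d]            # distance sum of each atom
    m = len(bonds)
    mu = m - n_atoms + 1                   # cyclomatic number (number of rings)
    return m / (mu + 1) * sum(1.0 / math.sqrt(s[i] * s[j]) for i, j in bonds)

# Carbon skeletons (hydrogens omitted): n-butane is a chain, isobutane is branched.
n_butane = (4, [(0, 1), (1, 2), (2, 3)])
isobutane = (4, [(0, 1), (0, 2), (0, 3)])
print(wiener_index(*n_butane), wiener_index(*isobutane))                # 10 9
print(round(balaban_j(*n_butane), 3), round(balaban_j(*isobutane), 3))  # 1.975 2.324
```

The more compact, branched isomer has the lower Wiener index and the higher Balaban J, which is why such indices are read as coarse proxies for molecular shape.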
Many different molecular shape descriptors have been proposed in the literature for small molecules and polymers. The many topological indices and their application to drug discovery are reviewed in 66 and are beyond the scope of this review. Fragment- or substructure-based indices (as used in Free-Wilson analysis) are the 2D descriptors commonly used to describe molecular shape 61. Field-based descriptors and others such as Shape Signatures 53, Zernike descriptors 67, local intersection volume 68 and path-space ratio 69 use 3D information about the molecule and generally describe it more effectively, although they are more computationally intensive. Field methods have broadly been classified into quantum mechanics (QM) based descriptors (PEST and TAE) 70 and non-QM methods such as Comparative Molecular Field Analysis (CoMFA) 42. Shape descriptors capture only the essence of molecular shape, reducing three dimensions to a set of numbers. Hence, these descriptors cannot be used, even qualitatively, to identify the ligand atoms responsible for hydrogen bonding to protein donor atoms.
The co-evolution of ligand-protein interactions with regard to shape has been explored in the study of nuclear hormone receptors (NHRs) that recognize bile salts 14, 15. Bile salts are the main end-metabolites of cholesterol in vertebrates. Evolutionarily early vertebrates such as jawless fish (lampreys and hagfish) use planar 5α (‘allo’) bile salts 15. In contrast, many other vertebrates, including humans and most mammals, use 5β bile salts, which have a ‘bend’ at the junction of the A and B steroid rings. Cross-species comparisons of the selectivity of the farnesoid X receptor (FXR; ‘bile acid receptor’) for structurally diverse bile salts showed that FXR selectivity has changed from a preference for 5α (flat) bile salts (the ‘ancestral’ pattern in sea lamprey and zebrafish) to a preference for 5β bile salts in humans and mice (the ‘recent’ pattern). Computational homology models predicted that this selectivity change was mediated by changes in the shape and size of the ligand binding pocket 15. Using similar computational approaches, vertebrate liver X receptors (LXR; ‘oxysterol receptor’) have also been found to diverge from invertebrate LXR in their ligand specificity 16. Cross-species differences in receptor binding sites can also be inferred using a ligand-based approach such as a pharmacophore. For example, pregnane X receptors (PXRs), broad-specificity NHRs involved in the regulation of liver metabolism 17, show significant cross-species differences in ligand specificity, with a broadening of specificity from teleost fish to mammals and birds 14, 18. These NHRs represent robust model systems for exploring the co-evolution of receptors and ligands in terms of shape and size.
In a recent study identifying highly selective dopamine D4 receptor agonists and antagonists, shape and charge complementarity between the ligand and the receptor microdomain was found to play a major role in the activity of 1,4-disubstituted aromatic piperidine and piperazine ligands (Figure 1) 19. Thus, structurally complementary regions on the receptor and its ligand are required for the activity of these compounds.
Immunoassays based on antibodies are widely used in clinical medicine 20. Common applications include drug of abuse (DOA) screening, endocrinology testing, and therapeutic drug monitoring (TDM). Immunoassays are also employed as sensors for the detection of chemical warfare agents such as nerve gases, and of environmental pollutants 21. Immunoassays may have either narrow specificity for a single target compound (e.g., a drug, vitamin, toxin, or hormone) or broader specificity for a group of structurally related target compounds (e.g., benzodiazepines, opiates). Antibodies use molecular shape to recognize the antigen, which may present problems with similarly shaped but distinct molecules. Immunoassays are limited by the occurrence of false positives (caused by ‘cross-reactive’ compounds), defined as a positive result in the absence of the target compound(s). False positives are a particular limitation of DOA screening assays. For example, many drugs with structural similarity to amphetamine and methamphetamine, such as pseudoephedrine and bupropion, can cross-react with amphetamine screening tests. Several studies of the three-dimensional structures of antibodies bound to drugs targeted by DOA screening tests or TDM assays have provided insight into the specificity of antibody-drug interactions. For example, in X-ray crystallographic structures of antibodies complexed with cocaine 22 (additionally supported by molecular modeling studies 23), the antibody interacts with all portions of the target molecule.
There have been few efforts to use computational methods to predict and identify cross-reactive compounds for clinically used immunoassays. In initial studies, we have examined molecular shape in terms of the 2D similarity of test compounds to the antigen, using MDL public keys fingerprint descriptors. A database (n = 813) of frequently used Food and Drug Administration (FDA)-approved drugs derived from the Clinician’s Pocket Drug Reference, supplemented with drugs of abuse and drug metabolites important in clinical toxicology, was used for searching. We found that ‘within-class’ true positives for three urine toxicology screening assays (barbiturates, benzodiazepines, and tricyclic antidepressants) tend to have high similarity to the target compound (Figure 2A). Particularly for benzodiazepines, however, some within-class compounds (e.g., clobazam and clonazepam) have lower similarity to the target compound diazepam, reflecting the diversity of structural modifications on the basic benzodiazepine core structure. Similarity methods thus provide a quantitative rationale for why compounds such as clobazam and clonazepam generate ‘false negative’ results in an assay in which diazepam was used to generate the assay antibodies.
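The 2D similarity calculation described above reduces to a Tanimoto coefficient between binary fingerprints. The following Python sketch is illustrative only: it assumes fingerprints are already available as sets of 'on' bit positions, and the bit sets and compound labels are invented rather than real MDL public keys for any drug in the study.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for binary fingerprints stored as sets of 'on' bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

# Hypothetical bit sets: NOT real MDL public-key assignments, for illustration only.
target = {3, 8, 15, 42, 77, 101, 128, 150}   # plays the role of the assay's target compound
library = {
    "close_analog": {3, 8, 15, 42, 77, 101, 128, 160},
    "diverse_same_class": {3, 15, 42, 90, 111, 140},
    "unrelated": {5, 19, 60, 88},
}

# Rank library compounds by 2D similarity to the target, most similar first.
ranked = sorted(library, key=lambda name: tanimoto(target, library[name]), reverse=True)
for name in ranked:
    print(f"{name:20s} {tanimoto(target, library[name]):.2f}")
```

A within-class compound carrying heavier structural modifications (here "diverse_same_class") scores well below a close analog, mirroring the clobazam and clonazepam observation.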
In addition, other computational approaches could be used, such as 3D methods that require the development of a pharmacophore, a pattern representing the arrangement of the chemical features important for binding and the distances between them 24, 25 (e.g. the alignment of desipramine and amitriptyline). The advantage of the 3D similarity approach is that it can find structural matches that do not look similar in 2D but possess the key features for 3D mapping. For example, a pharmacophore for amitriptyline and desipramine used to search the same database of over 800 drugs and metabolites retrieved over 150 hits, including many non-tricyclic compounds (Figure 3). The number of hits could be reduced further to 79 by adding a van der Waals surface around one of the molecules to account for its shape (Figure 3B, 3C). The approaches described above represent novel ways to search for potential cross-reacting compounds based on shape, which corresponds to some degree with the antibody-antigen interaction.
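A pharmacophore search of the kind described can be sketched as matching feature types and inter-feature distances within a tolerance. The Python example below is a simplified, hypothetical version: the feature labels (two hydrophobic ring centroids and a positively ionizable nitrogen), the coordinates and the 1.0 Å tolerance are assumptions, each candidate is a single conformer, and no van der Waals shape restriction is applied.

```python
import itertools
import math

def matches_pharmacophore(query, candidate, tol=1.0):
    """True if some assignment of candidate features to the query features has identical
    feature types and every inter-feature distance within `tol` Angstrom of the query's."""
    n = len(query)
    for combo in itertools.permutations(candidate, n):
        if any(c[0] != q[0] for c, q in zip(combo, query)):
            continue  # feature types do not line up for this assignment
        if all(
            abs(math.dist(query[i][1], query[j][1]) - math.dist(combo[i][1], combo[j][1])) <= tol
            for i, j in itertools.combinations(range(n), 2)
        ):
            return True
    return False

# Hypothetical query: two hydrophobic centroids (HYD) and a cationic nitrogen (POS).
query = [("HYD", (0.0, 0.0, 0.0)), ("HYD", (4.5, 0.0, 0.0)), ("POS", (2.2, 5.0, 0.0))]
candidates = {
    "translated_match": [("ACC", (0.0, 0.0, 0.0)), ("HYD", (10.0, 10.0, 10.0)),
                         ("HYD", (14.5, 10.0, 10.0)), ("POS", (12.2, 15.0, 10.0))],
    "nitrogen_too_far": [("HYD", (0.0, 0.0, 0.0)), ("HYD", (4.5, 0.0, 0.0)), ("POS", (2.2, 9.0, 0.0))],
    "no_cation": [("HYD", (0.0, 0.0, 0.0)), ("HYD", (4.5, 0.0, 0.0)), ("ACC", (2.2, 5.0, 0.0))],
}
for name, features in candidates.items():
    print(name, matches_pharmacophore(query, features))
```

Because only types and distances are compared, a hit need not share the 2D scaffold of the query, which is how non-tricyclic compounds can be retrieved.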
Given the enormous domain of shape-based features, it is challenging to characterize shape in algorithms for drug discovery. Docking and scoring algorithms (Table 1) typically use shape either explicitly or implicitly. In an explicit representation, the shape of the ligand is used to position it in the binding site of the target (usually a protein), with multiple conformers evaluated, and this placement can guide the rest of the docking process. Examples of the many available docking algorithms that use shape explicitly are DOCK 26, GOLD 27, GLIDE 12 and AutoDock 28. Some of these methods may also use shape implicitly through their scoring functions (the reader is referred to the individual references for further detail on each method). In an implicit representation, the shape of a molecule is used as a first-level filter, in a “screening” mode, to determine whether or not the molecule fits into the target. The soft docking method 29 and PUZZLE 30, which use the shape complementarity between the surfaces of two interacting molecules as a filter, and many other programs use various representations of shape implicitly.
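Using shape as a first-level filter, in screening mode, can be sketched as a simple steric test on a posed ligand. This Python sketch is a simplification in the spirit of early contact-based scoring, not the algorithm of DOCK, soft docking or PUZZLE: it assumes ligand and pocket atom coordinates are given, rejects poses with any atom pair closer than a clash distance, and requires a minimum number of ligand atoms in contact with the pocket. All coordinates and cutoffs are hypothetical.

```python
import math

def shape_filter(ligand, receptor, clash=2.5, contact=4.5, min_contacts=3):
    """Crude shape-complementarity screen for one ligand pose; returns (passes, n_contacts).
    Any ligand-receptor pair closer than `clash` rejects the pose (steric overlap);
    a ligand atom counts as a contact if its nearest receptor atom is within `contact`."""
    n_contacts = 0
    for atom in ligand:
        nearest = min(math.dist(atom, r) for r in receptor)
        if nearest < clash:
            return False, 0
        if nearest <= contact:
            n_contacts += 1
    return n_contacts >= min_contacts, n_contacts

# Hypothetical pocket-lining atoms around the origin (Angstrom) and three candidate poses.
pocket = [(4.0, 0.0, 0.0), (-4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, -4.0, 0.0), (0.0, 0.0, -4.0)]
poses = {
    "fits": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "too_large": [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 0.0, 2.0)],
    "outside": [(20.0, 20.0, 20.0), (21.0, 20.0, 20.0), (20.0, 21.0, 20.0)],
}
for name, pose in poses.items():
    print(name, shape_filter(pose, pocket))
```

Only poses that survive such a cheap filter would be passed on to a more expensive scoring function.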
Another way to describe shape is in the form of molecular descriptors. Many different molecular shape descriptors have been proposed in the literature for small molecules and polymers (Box 2). It was suggested over twenty years ago that a useful shape descriptor should: a) describe local shape; b) be invariant to coordinate transformations; and c) enable determination of shape complementarity between two or more compounds 31. One group 32 added the requirement that the shape descriptor must provide a means of positioning the shape it describes in a few canonical orientations. These shape descriptors can be used together with a similarity metric such as the Tanimoto coefficient and/or with quantitative structure activity relationship (QSAR) and quantitative structure property relationship (QSPR) algorithms (Table 2).
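The invariance criterion (b) can be illustrated with a simple descriptor: a normalized histogram of all interatomic distances, which does not change when a molecule is rotated or translated. This Python sketch is a hypothetical example, not one of the descriptors cited; the coordinates, the 1 Å bin width and the helper rotate_z are assumptions, and the continuous (real-valued) form of the Tanimoto coefficient is used to compare descriptor vectors. A global distance histogram does not satisfy criterion (a), since it does not describe local shape.

```python
import itertools
import math

def distance_histogram(coords, bin_width=1.0, n_bins=10):
    """Normalized histogram of all interatomic distances. Because it depends only on
    distances, it is unchanged by rotation and translation of the molecule."""
    counts = [0] * n_bins
    pairs = list(itertools.combinations(coords, 2))
    for a, b in pairs:
        counts[min(int(math.dist(a, b) / bin_width), n_bins - 1)] += 1  # long distances in last bin
    return [c / len(pairs) for c in counts]

def continuous_tanimoto(x, y):
    """Tanimoto coefficient generalized to real-valued vectors."""
    xy = sum(a * b for a, b in zip(x, y))
    return xy / (sum(a * a for a in x) + sum(b * b for b in y) - xy)

def rotate_z(coords, angle, shift):
    """Hypothetical helper: rotate about the z axis, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]) for x, y, z in coords]

# Hypothetical heavy-atom coordinates (Angstrom): an extended chain and a compact square.
chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0), (4.4, 0.9, 0.3)]
square = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]

h_chain = distance_histogram(chain)
h_moved = distance_histogram(rotate_z(chain, 0.7, (3.0, -2.0, 5.0)))
print(h_chain == h_moved)                                                # True
print(round(continuous_tanimoto(h_chain, distance_histogram(square)), 3))  # 0.545
```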
Similar to these descriptor notations is another collection of shape representations called the “shape catalog”. For a set of active compounds, a database or catalog of fingerprints can be created by considering all possible shapes that arise from the conformations of these compounds 33. The main advantage of this catalog, apart from its application as a repository of molecular shapes, is that it can be customized by the user for the desired level of similarity cutoff between the compounds 34, 35.
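A shape catalog can be sketched as a lookup from conformer shape fingerprints to the compounds that produced them, queried with a user-chosen similarity cutoff. In this Python sketch the fingerprints are hypothetical sets of 'on' bits standing in for discretized shape features; the data structures and function names are illustrative assumptions, not the implementation described in the cited work.

```python
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def build_catalog(conformer_fps):
    """conformer_fps maps compound id -> list of conformer shape fingerprints (sets).
    Each distinct shape fingerprint is stored once, with every compound that produced it."""
    catalog = {}
    for cid, fps in conformer_fps.items():
        for fp in fps:
            catalog.setdefault(frozenset(fp), set()).add(cid)
    return catalog

def query_catalog(catalog, query_fp, cutoff):
    """Compounds with at least one conformer shape at or above the chosen similarity cutoff,
    reported with their best similarity and sorted from most to least similar."""
    hits = {}
    for fp, cids in catalog.items():
        sim = tanimoto(fp, query_fp)
        if sim >= cutoff:
            for cid in cids:
                hits[cid] = max(sim, hits.get(cid, 0.0))
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical conformer fingerprints for three active compounds.
conformer_fps = {
    "cmpd_A": [{1, 2, 3, 4}, {1, 2, 5, 6}],
    "cmpd_B": [{1, 2, 3, 7}],
    "cmpd_C": [{8, 9, 10}],
}
catalog = build_catalog(conformer_fps)
print(query_catalog(catalog, {1, 2, 3, 4}, cutoff=0.5))  # looser cutoff, more hits
print(query_catalog(catalog, {1, 2, 3, 4}, cutoff=0.9))  # stricter cutoff
```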
A minimal description of molecular shape may also be rendered as a pharmacophore or as pharmacophore fingerprints 24, 25, in which the angles and distances between key molecular features impart some information on the size and shape required for favorable molecular interactions. Another approach that can incorporate shape details (of the ligand directly and of the protein binding site indirectly) is pseudoreceptor modeling, which represents a protein binding site using one or more molecules or conformations and has recently been discussed in detail 36. Pseudoreceptor models can be used to search databases for molecules that are complementary to the model.
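Pharmacophore fingerprints of the kind mentioned above are often built from pairs (or triplets) of features with binned distances. The Python sketch below assumes a two-point scheme: each pair of features contributes a key made of the two feature types and a distance bin. Feature labels, coordinates and bin edges are hypothetical, and angles (which three-point schemes capture) are omitted for brevity.

```python
import itertools
import math

def pharmacophore_pairs(features, bin_edges=(2.0, 4.0, 6.0, 8.0, 10.0)):
    """Two-point pharmacophore fingerprint: the set of (type_a, type_b, distance_bin) keys,
    with the two feature types sorted so that a key does not depend on feature order."""
    keys = set()
    for (t1, p1), (t2, p2) in itertools.combinations(features, 2):
        dist_bin = sum(math.dist(p1, p2) > edge for edge in bin_edges)  # 0 = below first edge
        keys.add((*sorted((t1, t2)), dist_bin))
    return keys

def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical features: hydrogen-bond donor, acceptor and aromatic ring centroid (Angstrom).
ligand = [("DON", (0.0, 0.0, 0.0)), ("ACC", (3.0, 0.0, 0.0)), ("ARO", (0.0, 5.0, 0.0))]
moved = [(t, (x + 7.0, y - 1.0, z + 2.0)) for t, (x, y, z) in ligand]  # same features, translated
analog = [("DON", (0.0, 0.0, 0.0)), ("ACC", (3.0, 0.0, 0.0)), ("ARO", (0.0, 7.0, 0.0))]

fp = pharmacophore_pairs(ligand)
print(sorted(fp))
print(tanimoto(fp, pharmacophore_pairs(moved)), round(tanimoto(fp, pharmacophore_pairs(analog)), 2))
```

Moving the aromatic centroid farther away shifts two distance bins, so the analog shares only one of five keys with the original ligand.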
Shape descriptors have been applied to many pharmacologically relevant problems, as briefly summarized below. They are highly useful for classification 37 and can be modified to act as weights in scoring functions 9. Shape descriptors have also been shown to be applicable to clustering molecules into groups that share similar overall shapes, and hence to finding analogs of lead compounds in drug discovery.
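Clustering by overall shape can be sketched with a simple leader (sphere-exclusion) algorithm. This is an illustrative Python example under stated assumptions: each molecule is reduced to two hypothetical shape descriptors (for example, normalized principal-moment ratios that place molecules on a rod-disc-sphere scale), and the 0.15 distance threshold is arbitrary.

```python
import math

def leader_cluster(descriptors, threshold):
    """Greedy leader clustering: each molecule joins the first cluster whose leader lies
    within `threshold` (Euclidean distance in descriptor space); otherwise it founds a new cluster."""
    leaders, clusters = [], []
    for name, vec in descriptors.items():
        for k, leader in enumerate(leaders):
            if math.dist(vec, descriptors[leader]) <= threshold:
                clusters[k].append(name)
                break
        else:
            leaders.append(name)
            clusters.append([name])
    return clusters

# Hypothetical (npr1, npr2) values: rod-like ~ (0, 1), disc-like ~ (0.5, 0.5), sphere-like ~ (1, 1).
shape_descriptors = {
    "rod_1": (0.10, 0.95),
    "rod_2": (0.15, 0.92),
    "disc_1": (0.50, 0.55),
    "sphere_1": (0.95, 0.98),
    "disc_2": (0.48, 0.60),
}
print(leader_cluster(shape_descriptors, threshold=0.15))
# [['rod_1', 'rod_2'], ['disc_1', 'disc_2'], ['sphere_1']]
```

Leader clustering is fast but order-dependent; members of a cluster containing a lead compound are candidate shape analogs.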
Shape similarity searches can be performed at two levels of complexity: global shape analysis and local shape analysis 38. When the complete structure of one molecule is matched with another, the method is called global shape analysis. Examples of algorithms that use global shape analysis are the distance geometry method 39 and the molecular shape analysis (MSA) method, which uses steric and van der Waals volumes as shape descriptors 40. When a substructure of a molecule is matched with another molecule, the analysis is called local shape analysis. Many algorithms incorporate local shape analysis to identify not only compounds that share part of their core structure with the query molecule but also compounds with other scaffolds that are similar in shape without necessarily being structurally similar 38. Whether the similarity is local or global, the problem reduces to finding “neighborhood” molecules using clues such as a receptor structure 41 or QSAR-based classification results 42. 3D similarity or classification studies can be performed using the alignment-based and alignment-free methods described below.
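The volume-based, global comparison used by MSA can be approximated numerically by counting grid points inside the union of atomic van der Waals spheres. The Python sketch below assumes the two molecules are already superimposed and uses Bondi's carbon van der Waals radius of 1.7 Å and a 0.2 Å grid; it estimates each molecule's volume and their common overlap volume, but it is not the MSA software itself, and the coordinates are hypothetical.

```python
import itertools
import math

def grid_points(atoms, spacing):
    """Centers of cubic grid cells covering every atom sphere; atoms are (x, y, z, radius)."""
    lo = [min(a[i] - a[3] for a in atoms) for i in range(3)]
    hi = [max(a[i] + a[3] for a in atoms) for i in range(3)]
    axes = [
        [lo[i] + (k + 0.5) * spacing for k in range(int((hi[i] - lo[i]) / spacing) + 1)]
        for i in range(3)
    ]
    return itertools.product(*axes)

def inside(point, atoms):
    return any(math.dist(point, a[:3]) <= a[3] for a in atoms)

def vdw_volume(atoms, spacing=0.2):
    """Grid estimate of the van der Waals volume (union of atomic spheres), in cubic Angstrom."""
    return sum(inside(p, atoms) for p in grid_points(atoms, spacing)) * spacing ** 3

def common_overlap_volume(atoms_a, atoms_b, spacing=0.2):
    """Grid estimate of the volume shared by two molecules that are already superimposed."""
    shared = sum(inside(p, atoms_a) and inside(p, atoms_b)
                 for p in grid_points(atoms_a + atoms_b, spacing))
    return shared * spacing ** 3

# Hypothetical superimposed carbon skeletons (x, y, z, radius) in Angstrom.
mol_a = [(0.0, 0.0, 0.0, 1.7), (1.54, 0.0, 0.0, 1.7)]
mol_b = mol_a + [(2.31, 1.33, 0.0, 1.7)]
v_a, v_b = vdw_volume(mol_a), vdw_volume(mol_b)
v_ab = common_overlap_volume(mol_a, mol_b)
print(f"V_A = {v_a:.1f}, V_B = {v_b:.1f}, common = {v_ab:.1f} A^3, fraction of B = {v_ab / v_b:.2f}")
```

A finer grid improves accuracy at a cubic cost in grid points, which is one reason analytic Gaussian volumes are attractive for large searches.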
One approach to classifying compounds based on shape is to overlay them (alignment-based) using a set of guide points. An example is the shape-based, ligand-centric approach called rapid overlay of chemical structures (ROCS; OpenEye Scientific Software, Santa Fe), which performs overlays of conformers of a molecule of interest 37, 43-47. The overlays are fast because molecules are described as atom-centered Gaussian functions, and the overlaid conformers are compared using a shape Tanimoto coefficient. Chemical feature information can also be added and has been found to improve virtual screening results. ROCS has generally been found to perform as well as, or for most targets better than, docking and other virtual screening methods 37, 38, 40, although with some exceptions 46. ROCS has also been used to discriminate between cruzain and cathepsin L inhibitors and to replicate the X-ray conformation of a known cruzain inhibitor 43.
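The atom-centered Gaussian idea can be sketched as follows. This Python example is not OpenEye's implementation: it assumes Gaussian parameters in the style of Grant and Pickup (amplitude p = 2√2 and an exponent chosen so that each Gaussian integrates to the atom's hard-sphere volume), keeps only first-order atom-pair overlaps, and scores a fixed pair of poses, whereas ROCS additionally optimizes the overlay and can add chemical feature terms. Coordinates are hypothetical.

```python
import math

# Assumed Grant-Pickup-style parameters: amplitude P = 2*sqrt(2) and an exponent chosen so that
# each atomic Gaussian integrates to the hard-sphere volume 4/3*pi*r^3.
P = 2.0 * math.sqrt(2.0)

def gaussian_exponent(radius):
    return math.pi * (3.0 * P / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)

def overlap_volume(mol_a, mol_b):
    """First-order Gaussian overlap: analytic overlap integrals summed over all atom pairs.
    Each molecule is a list of (x, y, z, radius) tuples in a fixed pose."""
    total = 0.0
    for xa, ya, za, ra in mol_a:
        for xb, yb, zb, rb in mol_b:
            ai, aj = gaussian_exponent(ra), gaussian_exponent(rb)
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            total += P * P * math.exp(-ai * aj * d2 / (ai + aj)) * (math.pi / (ai + aj)) ** 1.5
    return total

def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto = V_AB / (V_AA + V_BB - V_AB) for the given (unoptimized) poses."""
    v_ab = overlap_volume(mol_a, mol_b)
    return v_ab / (overlap_volume(mol_a, mol_a) + overlap_volume(mol_b, mol_b) - v_ab)

def translate(mol, dx):
    return [(x + dx, y, z, r) for x, y, z, r in mol]

# Hypothetical three-carbon chain (radius 1.7 A) compared with shifted copies of itself.
chain = [(0.0, 0.0, 0.0, 1.7), (1.54, 0.0, 0.0, 1.7), (2.31, 1.33, 0.0, 1.7)]
for shift in (0.0, 1.0, 100.0):
    print(shift, round(shape_tanimoto(chain, translate(chain, shift)), 3))
```

Because every overlap term is a closed-form expression, scores can be recomputed quickly while an optimizer moves one molecule, which is what makes Gaussian overlays fast.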
Quantum chemically derived molecular descriptors are normally considered computationally costly; however, fragment-based approaches, including transferable atom equivalent (TAE) descriptors, surmount this limitation 48, 49 and are alignment-free. TAE descriptors have been used with machine learning methods to model 26 absorption, distribution, metabolism, excretion and toxicity (ADMET) related datasets, demonstrating good internal model validation statistics in most cases 50. These descriptors have also been used to describe ligands and protein binding sites in the Protein Data Bank 51. Using a k nearest neighbor (kNN) pattern recognition approach and variable selection, the active site structure could identify its complementary ligand after screening 1% of the database in over 90% of cases when a representative protein of the same family was present in the training set. The alignment-free TAE descriptors can also be used to generate property-encoded surface translation (PEST) descriptors with a ray-tracing approach in which rays are reflected off the inside of the electron density isosurface. This yields 2D histograms of distance versus surface property, and each bin of the histogram becomes a molecular descriptor. These descriptors have been used with an olfactory database to train a genetic algorithm to distinguish between musk and non-musk compounds 52. A very similar approach to PEST is the alignment-free Shape Signatures method, which produces compact histograms (signatures) that encode molecular shape, shape and polarity, or other properties 53. This method has been used for database similarity searching in a series of drug discovery applications 54. The approach can also be used to build enriched or biased databases of small molecules by customizing the screening database to a particular drug target class, such as G-protein coupled receptors 49.
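The k nearest neighbor idea can be sketched in a few lines: an unknown is assigned the majority label among its k closest training examples in descriptor space. This Python sketch assumes plain Euclidean distance, k = 3 and a hypothetical two-dimensional descriptor space with invented 'active' and 'inactive' labels; it omits the variable selection step used in the cited work.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Majority label among the k training examples nearest to `query` (Euclidean distance).
    training_set is a list of (descriptor_vector, label) pairs."""
    nearest = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical descriptor vectors with known labels.
training_set = [
    ((0.10, 0.20), "active"), ((0.20, 0.10), "active"), ((0.15, 0.25), "active"),
    ((0.90, 0.80), "inactive"), ((0.80, 0.90), "inactive"), ((0.85, 0.75), "inactive"),
]
print(knn_predict(training_set, (0.2, 0.2)))  # active
print(knn_predict(training_set, (0.8, 0.8)))  # inactive
```

With an odd k and two classes there can be no tied vote; real descriptor sets are usually scaled first so that no single descriptor dominates the distance.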
For every molecule, the heights of the bins in the normalized shape and shape-polarity signatures comprise sets of molecular descriptors, which have been used in machine learning models of the 5-HT2B receptor, the human ether-a-go-go potassium ion channel 7, and blood-brain barrier penetration 8. These models were also evaluated against, and in combination with, additional descriptors in the MOE software suite (Chemical Computing Group, Montreal, Canada). Shape Signatures descriptors describing molecular shape and charge were found to slightly outperform shape descriptors alone. These descriptors have recently been used to build classification models for PXR 9, as well as in a hybrid docking and molecular descriptor-based approach that couples the GoldScore with other shape-based scoring functions.
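Turning a signature into descriptors amounts to binning and normalizing a histogram. In the Python sketch below, the ray-trace segment lengths are hypothetical inputs (the ray tracing itself is not implemented), the 1 Å bin width is an assumption, and signatures are compared with an L1 (sum of absolute differences) metric chosen here for simplicity.

```python
def signature(segment_lengths, bin_width=1.0, n_bins=8):
    """Normalized histogram of ray-trace segment lengths; each bin height is one descriptor."""
    counts = [0] * n_bins
    for length in segment_lengths:
        counts[min(int(length / bin_width), n_bins - 1)] += 1  # longer segments go to the last bin
    total = len(segment_lengths)
    return [c / total for c in counts]

def l1_distance(sig_a, sig_b):
    """Sum of absolute bin differences: 0 for identical signatures, 2 for no overlap."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

# Hypothetical segment lengths (Angstrom) for three molecules.
lengths = {
    "mol_a": [1.2, 2.5, 2.7, 3.1, 4.8, 5.5],
    "mol_b": [1.1, 2.4, 2.9, 3.3, 4.6, 6.2],
    "mol_c": [0.4, 0.6, 0.8, 1.5, 1.7, 1.9],
}
sigs = {name: signature(seg) for name, seg in lengths.items()}
print(sigs["mol_a"])  # bin heights usable as a descriptor vector for machine learning
print(round(l1_distance(sigs["mol_a"], sigs["mol_b"]), 3))  # 0.333: similar shapes
print(round(l1_distance(sigs["mol_a"], sigs["mol_c"]), 3))  # 1.667: dissimilar shapes
```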
The concept of bioisosterism (here, the idea that molecules with similar shapes may share similar biological activities) has been applied to drug discovery 55, 56. In a study identifying angiotensin II analogs, four query molecules were used to search for bioisosteres in a database of ~1000 compounds. Based on the search results, 425 compounds were synthesized and tested for angiotensin II inhibition. Of these 425 compounds, only 63 were found to be active; these were compounds that the shape similarity search had identified as most similar to one of the four query structures 57. A good correlation between the shape scores and the inhibitory activity was found across all 425 compounds. However, algorithms that implement this concept have often been criticized for using a shape descriptor derived from a single representative conformer to search databases containing large numbers of compounds. To overcome this limitation, many groups now use 3-10 unique conformers per molecule and also perform enrichment studies on virtual libraries of compounds as an initial filter. Although success stories like the one above support bioisosterism, the hypothesis does not necessarily hold for other types of descriptor or for some targets 58, 59.
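The benefit of scoring several conformers per molecule, and the enrichment studies used to check it, can be illustrated with an enrichment factor (EF): the active rate in the top-ranked fraction of a library divided by the active rate in the whole library. In this Python sketch the per-conformer similarity scores, compound names and the 20% cutoff are all hypothetical.

```python
def best_conformer_scores(conformer_scores):
    """Score each compound by its best-matching conformer rather than a single conformer."""
    return {cid: max(scores) for cid, scores in conformer_scores.items()}

def enrichment_factor(scores, actives, fraction):
    """EF = (active rate in the top `fraction` of the ranked list) / (active rate overall)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(cid in actives for cid in ranked[:n_top])
    return (hits_top / n_top) / (len(actives) / len(ranked))

# Hypothetical similarity-to-query scores for each conformer of ten library compounds.
conformer_scores = {
    "act1": [0.42, 0.81, 0.55], "act2": [0.38, 0.47, 0.77],
    "dec1": [0.60, 0.58], "dec2": [0.52, 0.50], "dec3": [0.49], "dec4": [0.45, 0.44],
    "dec5": [0.40], "dec6": [0.35, 0.30], "dec7": [0.33], "dec8": [0.20],
}
actives = {"act1", "act2"}
first_only = {cid: scores[0] for cid, scores in conformer_scores.items()}
print(enrichment_factor(first_only, actives, 0.2))                               # single conformer
print(enrichment_factor(best_conformer_scores(conformer_scores), actives, 0.2))  # multi-conformer
```

With these made-up scores, ranking by the best-matching conformer places both actives at the top (EF = 5, the maximum possible here), whereas ranking by the first conformer alone misses them (EF = 0).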
These studies and others (Table 2) using different computational methods illustrate the broad applicability of shape-based descriptors which are of value to the pharmaceutical and environmental sciences fields. Shape can adequately describe the ligand-protein interaction indirectly and be used for screening databases to obtain good enrichments with active compounds. Most of these algorithms also have the advantage of enabling fast comparisons 6 and are broadly applicable.
Shape is a fundamentally important molecular feature for describing ligands interacting with receptors, ion channels, enzymes, transporters and an array of other proteins, and for describing complex biological processes. The shape of a protein or (pseudo)receptor binding site can also be used to find molecules with complementary shapes; alternatively, the crystal structure conformations of ligands in the PDB can be used as shape-based search queries. This may point to potential off-targets or alternative targets that represent repurposing opportunities for known drugs. Shape-based approaches have many potential areas for future development in in silico pharmacology 60 (Box 3).
The following represent some potential future uses of molecular shape:
Just as the architect Frank Lloyd Wright had an excellent appreciation of shape, his concept of the union of form and function appears to be generally applicable to shape and its role in molecular pharmacology.
SE kindly acknowledges Penelope Ekins for lending her Tupperware toy and Accelrys for providing Discovery Studio. MDK is supported by K08-GM074238 from the National Institutes of Health. We gratefully acknowledge our collaborators from many of the studies described.