Related Articles

1.  Applications of the Cambridge Structural Database in chemical education
Journal of Applied Crystallography  2010;43(Pt 5):1208-1223.
The educational value of three-dimensional crystal structures in the Cambridge Structural Database (CSD) is discussed in the context of practical use cases and the availability of a free teaching subset of the CSD that can be used in conjunction with WebCSD, an application that provides internet access to CSD information content.
The Cambridge Structural Database (CSD) is a vast and ever growing compendium of accurate three-dimensional structures that has massive chemical diversity across organic and metal–organic compounds. For these reasons, the CSD is finding significant uses in chemical education, and these applications are reviewed. As part of the teaching initiative of the Cambridge Crystallographic Data Centre (CCDC), a teaching subset of more than 500 CSD structures has been created that illustrate key chemical concepts, and a number of teaching modules have been devised that make use of this subset in a teaching environment. All of this material is freely available from the CCDC website, and the subset can be freely viewed and interrogated using WebCSD, an internet application for searching and displaying CSD information content. In some cases, however, the complete CSD System is required for specific educational applications, and some examples of these more extensive teaching modules are also discussed. The educational value of visualizing real three-dimensional structures, and of handling real experimental results, is stressed throughout.
PMCID: PMC2943741  PMID: 20877495
Cambridge Structural Database; crystallographic education; WebCSD
2.  New software for statistical analysis of Cambridge Structural Database data 
Journal of Applied Crystallography  2011;44(Pt 4):882-886.
A new piece of software for statistical analysis of geometrical, chemical and crystallographic data within the Cambridge Structural Database System is described. This software has been written specifically to deal with chemical structure data and crucially provides simultaneous visualization of the three-dimensional structural information.
A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments.
PMCID: PMC3246811  PMID: 22477784
data analysis; computer programs; Cambridge Structural Database; substructure; Vista
3.  Short strong hydrogen bonds in proteins: a case study of rhamnogalacturonan acetylesterase 
The short hydrogen bonds in rhamnogalacturonan acetylesterase have been investigated by structure determination of an active-site mutant, 1H NMR spectra and computational methods. Comparisons are made to database statistics. A very short carboxylic acid carboxylate hydrogen bond, buried in the protein, could explain the low-field (18 p.p.m.) 1H NMR signal.
An extremely low-field signal (at approximately 18 p.p.m.) in the 1H NMR spectrum of rhamnogalacturonan acetylesterase (RGAE) shows the presence of a short strong hydrogen bond in the structure. This signal was also present in the mutant RGAE D192N, in which Asp192, which is part of the catalytic triad, has been replaced with Asn. A careful analysis of wild-type RGAE and RGAE D192N was conducted with the purpose of identifying possible candidates for the short hydrogen bond with the 18 p.p.m. deshielded proton. Theoretical calculations of chemical shift values were used in the interpretation of the experimental 1H NMR spectra. The crystal structure of RGAE D192N was determined to 1.33 Å resolution and refined to an R value of 11.6% for all data. The structure is virtually identical to the high-resolution (1.12 Å) structure of the wild-type enzyme except for the interactions involving the mutation and a disordered loop. Searches of the Cambridge Structural Database were conducted to obtain information on the donor–acceptor distances of different types of hydrogen bonds. The short hydrogen-bond interactions found in RGAE have equivalents in small-molecule structures. An examination of the short hydrogen bonds in RGAE, the calculated pKa values and solvent-accessibilities identified a buried carboxylic acid carboxylate hydrogen bond between Asp75 and Asp87 as the likely origin of the 18 p.p.m. signal. Similar hydrogen-bond interactions between two Asp or Glu carboxy groups were found in 16% of a homology-reduced set of high-quality structures extracted from the PDB. The shortest hydrogen bonds in RGAE are all located close to the active site and short interactions between Ser and Thr side-chain OH groups and backbone carbonyl O atoms seem to play an important role in the stability of the protein structure. These results illustrate the significance of short strong hydrogen bonds in proteins.
PMCID: PMC2483496  PMID: 18645234
short hydrogen bonds; low-field NMR signals; rhamnogalacturonan acetylesterase
4.  An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database 
Journal of biomolecular NMR  2012;53(4):311-320.
Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including 1H, 13C and 15N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001–2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.
PMCID: PMC4308039  PMID: 22689068
NMR; Chemical shift; Proteomics; Database; BMRB
5.  Data mining of metal ion environments present in protein structures 
Journal of inorganic biochemistry  2008;102(9):1765-1776.
Analysis of metal-protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. By measuring the frequency of each amino acid in metal ion binding sites, the positive or negative preferences of each residue for each type of cation were identified. Our approach may be used for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone. The analysis compares data derived separately from high and medium resolution structures from the PDB with those from very high resolution small-molecule structures in the Cambridge Structural Database (CSD). For high resolution protein structures, the distribution of metal-protein or metal-water interaction distances agrees quite well with data from CSD, but the distribution is unrealistically wide for medium (2.0 – 2.5 Å) resolution data. Our analysis of cation B-factors versus average B-factors of atoms in the cation environment reveals substantial numbers of structures contain either an incorrect metal ion assignment or an unusual coordination pattern. Correlation between data resolution and completeness of the metal coordination spheres is also found.
PMCID: PMC2872550  PMID: 18614239
Metalloprotein; protein structure; metal binding
6.  A rule-based algorithm for automatic bond type perception 
Assigning bond orders is an essential step for characterizing a chemical structure correctly in force-field-based simulations. Several methods have been developed to do this; all have advantages as well as limitations. Here, an automatic algorithm is provided for assigning chemical connectivity and bond order for organic molecules regardless of hydrogen atoms; it needs only three-dimensional coordinates and element identities. The algorithm uses hard rules, length rules and conjugation rules to fix the structures. The hard rules determine bond orders based on basic chemical rules; the length rules determine bond order from the distance between two atoms, based on a set of predefined values for different bond types; the conjugation rules determine bond orders using the length information derived from the previous rule, the bond angles and some small structural patterns. The algorithm is extensively evaluated on three datasets and achieves good prediction accuracy on all of them. Finally, the limitations and future improvement of the algorithm are discussed.
PMCID: PMC3557220  PMID: 23113939
Bond type perception; Bond order; Chemical bond; Molecular modeling
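The "length rules" described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the reference lengths are common textbook values for carbon-carbon bonds (about 1.54, 1.34 and 1.20 Å for single, double and triple bonds), not the predefined values used in the paper, and the function names are hypothetical.

```python
import math

# Illustrative carbon-carbon reference lengths in angstroms (textbook values,
# NOT the predefined thresholds of the published algorithm).
CC_REFERENCE_LENGTHS = {1: 1.54, 2: 1.34, 3: 1.20}


def bond_length(xyz_a, xyz_b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(xyz_a, xyz_b)


def guess_cc_bond_order(length):
    """Return the bond order whose reference length is closest to `length`."""
    return min(
        CC_REFERENCE_LENGTHS,
        key=lambda order: abs(CC_REFERENCE_LENGTHS[order] - length),
    )


if __name__ == "__main__":
    # Two carbon atoms placed 1.33 angstroms apart along the x axis.
    c1, c2 = (0.0, 0.0, 0.0), (1.33, 0.0, 0.0)
    print(guess_cc_bond_order(bond_length(c1, c2)))
```

A pure nearest-length rule like this misjudges aromatic and conjugated bonds (about 1.39 Å), which is presumably why the abstract pairs the length rules with hard rules and conjugation rules.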
7.  WebCSD: the online portal to the Cambridge Structural Database 
Journal of Applied Crystallography  2010;43(Pt 2):362-366.
The new web-based application WebCSD is introduced, which provides a range of facilities for searching the Cambridge Structural Database within a standard web browser. Search options within WebCSD include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching.
WebCSD, a new web-based application developed by the Cambridge Crystallographic Data Centre, offers fast searching of the Cambridge Structural Database using only a standard internet browser. Search facilities include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching. Text, chemical diagrams and three-dimensional structural information can all be studied in the results browser using the efficient entry summaries and embedded three-dimensional viewer.
PMCID: PMC3246830  PMID: 22477776
WebCSD; computer programs; database searching; Cambridge Structural Database; similarity searching; substructure; reduced cell
8.  Redetermination of (d-penicillaminato)lead(II) 
In the title coordination polymer, [Pb(C5H9NO2S)]n {systematic name: catena-poly[(μ-2-amino-3-methyl-3-sulfidobutanoato)lead(II)]}, the d-penicillaminate ligand coordinates to the metal ion in an N,S,O-tridentate mode. The S atom acts as a bridge to two neighbouring Pb(II) ions, thereby forming a double thiolate chain. Moreover, the coordinating carboxylate O atom forms bridges to the Pb(II) ions in the adjacent chain. The overall coordination sphere of the Pb(II) ion can be described as a highly distorted pentagonal bipyramid with a void in the equatorial plane between the long Pb–S bonds probably occupied by the stereochemically active inert electron pair. The amino H atoms form N–H⋯S and N–H⋯O hydrogen bonds, resulting in a cluster of four complex units, giving rise to an R⁴₄(16) ring lying in the ab plane. The crystal structure of the title compound has been reported previously [Freeman et al. (1974). J. Chem. Soc. Chem. Commun. pp. 366–367] but the atomic coordinates have not been deposited in the Cambridge Structural Database (refcode DPENPB). Additional details of the hydrogen bonding are presented here.
PMCID: PMC3343873  PMID: 22589847
9.  Automated extraction of chemical structure information from digital raster images 
To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed. But their algorithmic performance and utility in cheminformatic research have not been investigated.
This paper aims to provide a critical review of these systems and also to report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy in extracting molecular substructure patterns.
The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles.
PMCID: PMC2648963  PMID: 19196483
10.  A Study of the Hydration of the Alkali Metal Ions in Aqueous Solution 
Inorganic Chemistry  2011;51(1):425-438.
The hydration of the alkali metal ions in aqueous solution has been studied by large angle X-ray scattering (LAXS) and double difference infrared spectroscopy (DDIR). The structures of the dimethyl sulfoxide solvated alkali metal ions in solution have been determined to support the studies in aqueous solution. The results of the LAXS and DDIR measurements show that the sodium, potassium, rubidium and cesium ions all are weakly hydrated with only a single shell of water molecules. The smaller lithium ion is more strongly hydrated, most probably with a second hydration shell present. The influence of the rubidium and cesium ions on the water structure was found to be very weak, and it was not possible to quantify this effect in a reliable way due to insufficient separation of the O–D stretching bands of partially deuterated water bound to these metal ions and the O–D stretching bands of the bulk water. Aqueous solutions of sodium, potassium and cesium iodide and cesium and lithium hydroxide have been studied by LAXS and M–O bond distances have been determined fairly accurately except for lithium. However, the number of water molecules binding to the alkali metal ions is very difficult to determine from the LAXS measurements as the number of distances and the temperature factor are strongly correlated. A thorough analysis of M–O bond distances in solid alkali metal compounds with ligands binding through oxygen has been made from available structure databases. There is relatively strong correlation between M–O bond distances and coordination numbers also for the alkali metal ions even though the M–O interactions are weak and the number of complexes of potassium, rubidium and cesium with well-defined coordination geometry is very small. 
The mean M–O bond distances in the hydrated sodium, potassium, rubidium and cesium ions in aqueous solution have been determined to be 2.43(2), 2.81(1), 2.98(1) and 3.07(1) Å, which correspond to six-, seven-, eight- and eight-coordination, respectively. These coordination numbers are supported by the linear relationship of the hydration enthalpies and the M–O bond distances. This correlation indicates that the hydrated lithium ion is four-coordinate in aqueous solution. New ionic radii are proposed for four- and six-coordinate lithium(I), 0.60 and 0.79 Å, respectively, as well as for five- and six-coordinate sodium(I), 1.02 and 1.07 Å, respectively. The ionic radii for six- and seven-coordinate K+, 1.38 and 1.46 Å, respectively, and eight-coordinate Rb+ and Cs+, 1.64 and 1.73 Å, respectively, are confirmed from previous studies. The M–O bond distances in dimethyl sulfoxide solvated sodium, potassium, rubidium and cesium ions in solution are very similar to those observed in aqueous solution.
The hydration of alkali metal ions has been studied by large angle X-ray scattering, LAXS, and double difference infrared spectroscopy. The obtained M−O bond distances from LAXS have been compared to relevant crystal structures, conclusions about hydration numbers in aqueous solution have been made, and new ionic radii have been proposed. Hydration numbers of six, seven, eight and eight are proposed for the sodium, potassium, rubidium and cesium ions in aqueous solution.
PMCID: PMC3250073  PMID: 22168370
11.  Atomic resolution studies of carbonic anhydrase II 
The structure of human carbonic anhydrase II has been solved with a sulfonamide inhibitor at 0.9 Å resolution. Structural variation and flexibility are seen on the surface of the protein and are consistent with the anisotropic ADPs obtained from refinement. Comparison with 13 other atomic resolution carbonic anhydrase structures shows that surface variation exists even in these highly ordered isomorphous crystals.
Carbonic anhydrase has been well studied structurally and functionally owing to its importance in respiration. A large number of X-ray crystallographic structures of carbonic anhydrase and its inhibitor complexes have been determined, some at atomic resolution. Structure determination of a sulfonamide-containing inhibitor complex has been carried out and the structure was refined at 0.9 Å resolution with anisotropic atomic displacement parameters to an R value of 0.141. The structure is similar to those of other carbonic anhydrase complexes, with the inhibitor providing a fourth nonprotein ligand to the active-site zinc. Comparison of this structure with 13 other atomic resolution (higher than 1.25 Å) isomorphous carbonic anhydrase structures provides a view of the structural similarity and variability in a series of crystal structures. At the center of the protein the structures superpose very well. The metal complexes superpose (with only two exceptions) with standard deviations of 0.01 Å in some zinc–protein and zinc–ligand bond lengths. In contrast, regions of structural variability are found on the protein surface, possibly owing to flexibility and disorder in the individual structures, differences in the chemical and crystalline environments or the different approaches used by different investigators to model weak or complicated electron-density maps. These findings suggest that care must be taken in interpreting structural details on protein surfaces on the basis of individual X-ray structures, even if atomic resolution data are available.
PMCID: PMC2865367  PMID: 20445237
carbonic anhydrase; structure comparison; metalloproteins; atomic resolution
12.  The Catalytic Mn2+ Sites in the Enolase-Inhibitor Complex - Crystallography, Single Crystal EPR and DFT Calculations 
Crystals of Zn2+/Mn2+ yeast enolase with the inhibitor PhAH (phosphonoacetohydroxamate) were grown under conditions with a slight preference for binding of Zn2+ at the higher affinity site, site I. The structure of the Zn2+/Mn2+ PhAH complex was solved at a resolution of 1.54 Å and the two catalytic metal binding sites, I and II, show only subtle displacement compared to that of the corresponding complex with the native Mg2+ ions. Low temperature echo-detected high field (W-band, 95 GHz) EPR (electron paramagnetic resonance) and 1H ENDOR (electron-nuclear double resonance) were carried out on a single crystal and rotation patterns were acquired in two perpendicular planes. Analysis of the rotation patterns resolved a total of six Mn2+ sites; four symmetry related sites of one type and two out of the four of the other type. The observation of two chemically inequivalent Mn2+ sites shows that Mn2+ ions populate both site I and II and the zero-field splitting (ZFS) tensors of the Mn2+ in the two sites were determined. The Mn2+ site with the larger D-value was assigned to site I based on the 1H ENDOR spectra, which identified the relevant water ligands. This assignment is consistent with the seemingly larger deviation of site I from octahedral symmetry, compared to site II. The ENDOR results gave the coordinates of the protons of two water ligands and adding them to the crystal structure revealed their involvement in a network of H-bonds stabilizing the binding of the metal ions and PhAH. Although specific hyperfine interactions with the inhibitor were not determined, the spectroscopic properties of the Mn2+ in the two sites were consistent with the crystal structure.
Density functional theory (DFT) calculations carried out on a cluster representing the catalytic site, with Mn2+ in site I and Zn2+ in site II, and vice versa, gave overestimated D values on the order of the experimental ones, although the larger D value was found for Mn2+ in site II rather than in site I. This was attributed to the high sensitivity of the ZFS parameters to the Mn-O bond lengths and orientations, such that small, but significant differences between the optimized and crystal structure alter the ZFS considerably, well above the difference between the two sites.
PMCID: PMC2538446  PMID: 17367133
13.  Tetrakis(1,2-dimethoxyethane-κ²O,O′)ytterbium(II) bis(μ₂-phenylselenolato-κ²Se:Se)bis[bis(phenylselenolato-κSe)mercurate(II)]
The title salt, [Yb(C4H10O2)4][Hg2(C6H5Se)6], consists of eight-coordinate homoleptic [Yb(DME)4]2+ dications (DME is 1,2-dimethoxyethane) countered with [Hg2(SePh)6]2− dianions. The cations and anions have twofold rotation and inversion symmetry, respectively. The Yb centre displays a square-antiprismatic coordination geometry and the Hg centre has a distorted tetrahedral coordination environment. One phenylselenolate anion and one methyl group of a DME ligand are disordered over two positions with equal occupancies. This structure is unique in that it represents a less common molecular lanthanide species in which the lanthanide ion is not directly bonded to an anionic ligand. There are no occurrences of the [Hg2(SePh)6]2− dianion in the Cambridge Structural Database (Version of November 2007), but there are similar oligomeric and polymeric Hgx(SePh)y species. The crystal structure is characterized by alternating layers of cations and anions stacked along the c axis.
PMCID: PMC2961915  PMID: 21203084
14.  BioMe: biologically relevant metals 
Nucleic Acids Research  2012;40(Web Server issue):W352-W357.
In this article, we introduce BioMe (biologically relevant metals), a web-based platform for calculation of various statistical properties of metal-binding sites. Users can obtain the following statistical properties: presence of selected ligands in metal coordination sphere, distribution of coordination numbers, percentage of metal ions coordinated by the combination of selected ligands, distribution of monodentate and bidentate metal-carboxyl bindings for ASP and GLU, percentage of particular binuclear metal centers, distribution of coordination geometry, descriptive statistics for a metal ion–donor distance and percentage of the selected metal ions coordinated by each of the selected ligands. Statistics are presented in numerical and graphical forms. The underlying database contains information about all contacts within the range of 3 Å from a metal ion found in the asymmetric crystal unit. The stored information for each metal ion includes Protein Data Bank code, structure determination method, types of metal-binding chains [protein, ribonucleic acid (RNA), deoxyribonucleic acid (DNA), water and other] and names of the bound ligands (amino acid residue, RNA nucleotide, DNA nucleotide, water and other) and the coordination number, the coordination geometry and, if applicable, another metal(s). BioMe is on a regular weekly update schedule. It is accessible at
PMCID: PMC3394320  PMID: 22693222
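The 3 Å contact criterion mentioned in the BioMe abstract can be sketched in a few lines of Python. This is only an illustration of a distance-cutoff search, not BioMe's implementation: the function name, the atom labels and the coordinates are hypothetical, and treating the number of contacts as the coordination number is an assumption made here.

```python
import math


def contacts_within(metal_xyz, atoms, cutoff=3.0):
    """Return (label, distance) pairs for atoms within `cutoff` angstroms
    of the metal ion, sorted by increasing distance."""
    contacts = []
    for label, xyz in atoms:
        d = math.dist(metal_xyz, xyz)
        if d <= cutoff:
            contacts.append((label, d))
    return sorted(contacts, key=lambda pair: pair[1])


# Hypothetical metal site: coordinates are invented for illustration only.
metal = (0.0, 0.0, 0.0)
neighbours = [
    ("HIS-1 NE2", (2.0, 0.0, 0.0)),
    ("HIS-2 NE2", (0.0, 2.1, 0.0)),
    ("HIS-3 ND1", (0.0, 0.0, 2.1)),
    ("HOH O", (-1.2, -1.2, -1.2)),
    ("THR-4 OG1", (3.5, 1.5, 0.0)),  # lies beyond the 3 angstrom cutoff
]

site = contacts_within(metal, neighbours)
coordination_number = len(site)  # assumption: every contact counts as a ligand

if __name__ == "__main__":
    for label, d in site:
        print(f"{label}: {d:.2f}")
    print("coordination number:", coordination_number)
```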
15.  Pentakis(μ₃-N,2-dioxidobenzene-1-carboximidato)di-μ₂-formato-pentakis(1H-imidazole)methanolpentamanganese(III)manganese(II)–methanol–water (1/3.36/0.65)
The title compound, [Mn6(C7H4NO3)5(CHO2)2(C3H4N2)5(CH3OH)]·3.36CH3OH·0.65H2O, or Mn(II)(O2CH)2[15-MCMn(III)N(shi)-5](Im)5(MeOH)·3.36MeOH·0.65H2O (where MC is metallacrown, shi3− is salicylhydroximate, Im is imidazole and MeOH is methanol), contains five Mn(III) ions as members of the metallacrown ring and an Mn(II) atom bound in the central cavity. The central Mn(II) atom is seven-coordinate with a geometry best described as between face-capped trigonal–prismatic and face-capped octahedral. Three Mn(III) ions of the metallacrown ring are six-coordinate with distorted octahedral geometries. Of these six-coordinate Mn(III) ions, two have mirror-plane configurations, while the other has a Δ absolute stereoconfiguration. The remaining two Mn(III) ions have a coordination number of five with a distorted square-pyramidal geometry. The five imidazole ligands are bound to five different Mn(III) ions. Disorder is observed for one of the coordinating imidazole ligands, as the imidazole ligand is disordered over two alternative mutually exclusive positions in a ratio of 0.672 (9) to 0.328 (9). The interstitial voids between the main molecules that constitute the structure are mostly filled with methanol molecules that form hydrogen-bonded chains. Some of the sites of the non-coordinated methanol molecules are not fully occupied, with the remainder of the volume either empty or taken up by ill-defined close to amorphous content. One site was refined as being taken up by either two or one methanol molecules, with an occupancy ratio of 0.628 (5) to 0.343 (5). This disorder might thus be correlated with the disorder of the imidazole ring (an N–H⋯O hydrogen bond between the major moieties of the imidazole and the methanol molecules is observed). On the other side of the disordered imidazole ring the chain of partially occupied methanol molecules originates that extends via O–H⋯O hydrogen bonds to the metal-coordinated methanol molecule.
The three partially occupied methanol molecules were refined to be disordered with two water molecules to take two residual electron density peaks into account (the exact nature of these weak residual electron density peaks cannot be deduced from the X-ray diffraction data alone; the assignment as water is tentative). The occupancy rate for the methanol molecules refined to 0.480 (7). The occupancy rate of the two water molecules refined to 0.34 (1) and 0.31 (2) for each site.
PMCID: PMC3588767  PMID: 23468732
16.  Domain-based small molecule binding site annotation 
BMC Bioinformatics  2006;7:152.
Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites.
Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at . The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB, of which 472 (60%) of predicted interactions identically matched the experimental small molecule and of these, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives.
By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.
PMCID: PMC1435939  PMID: 16545112
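The validation figures quoted in the SMID abstract can be checked with simple arithmetic. The short Python sketch below uses only the counts given above (793, 472 and 344); the variable and function names are illustrative. It reproduces the reported 60% ligand match rate and shows that the 344 well-predicted binding sites are about 73% of the matched ligands, or about 43% of the full validation set.

```python
# Counts quoted in the abstract above.
validated = 793       # experimental small molecule interactions in the validation set
ligand_matched = 472  # predictions that identically matched the experimental ligand
site_matched = 344    # of those, >80% of binding-site residues correctly identified


def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


ligand_rate = percent(ligand_matched, validated)
site_rate_given_ligand = percent(site_matched, ligand_matched)
site_rate_overall = percent(site_matched, validated)

if __name__ == "__main__":
    print(f"ligand matched:            {ligand_rate:.1f}%")
    print(f"site matched (of matched): {site_rate_given_ligand:.1f}%")
    print(f"site matched (overall):    {site_rate_overall:.1f}%")
```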
17.  Validation of archived chemical shifts through atomic coordinates 
Proteins  2010;78(11):2482-2489.
The public archives containing protein information in the form of NMR chemical shift data at the BioMagResBank (BMRB) and of 3D structure coordinates at the Protein Data Bank are continuously expanding. The quality of the data contained in these archives, however, varies. The main issue for chemical shift values is that they are determined relative to a reference frequency. When this reference frequency is set incorrectly, all related chemical shift values are systematically offset. Such wrongly referenced chemical shift values, as well as other problems such as chemical shift values that are assigned to the wrong atom, are not easily distinguished from correct values and effectively reduce the usefulness of the archive. We describe a new method to correct and validate protein chemical shift values in relation to their 3D structure coordinates. This method classifies atoms using two parameters: the per-atom solvent accessible surface area (as calculated from the coordinates) and the secondary structure of the parent amino acid. Through the use of Gaussian statistics based on a large database of 3220 BMRB entries, we obtain per-entry chemical shift corrections as well as Z scores for the individual chemical shift values. In addition, information on the error of the correction value itself is available, and the method can retain only dependable correction values. We provide an online resource with chemical shift, atom exposure, and secondary structure information for all relevant BMRB entries ( and hope this data will aid the development of new chemical shift-based methods in NMR. Proteins 2010. © 2010 Wiley-Liss, Inc.
PMCID: PMC2970900  PMID: 20602353
nuclear magnetic resonance; chemical shift; protein; atom coordinates; validation
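The abstract above mentions per-entry corrections and per-value Z scores derived from Gaussian statistics. The following is only a minimal sketch of that general idea, not the authors' published procedure: it assumes each observed shift already has an expected mean and standard deviation for its atom class, and it estimates the referencing offset as the plain mean residual. The function names and all numbers are hypothetical.

```python
from statistics import mean


def reference_offset(observed, expected_means):
    """Estimate a systematic referencing offset as the mean residual.

    Assumption: every shift in the entry is displaced by the same constant,
    so averaging (observed - expected) recovers that constant.
    """
    return mean(o - m for o, m in zip(observed, expected_means))


def z_scores(observed, expected_means, expected_sds, offset=0.0):
    """Z score of each shift after subtracting the estimated offset."""
    return [
        (o - offset - m) / s
        for o, m, s in zip(observed, expected_means, expected_sds)
    ]


# Hypothetical shifts (ppm) for three atoms, all displaced by +2.5 ppm.
observed = [58.5, 56.5, 60.5]
expected_means = [56.0, 54.0, 58.0]
expected_sds = [1.0, 1.0, 1.0]

offset = reference_offset(observed, expected_means)
scores = z_scores(observed, expected_means, expected_sds, offset)
```

With the offset removed, a large absolute Z score would point to an individual outlier (for example a misassigned atom) rather than to a referencing problem that affects the whole entry.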
18.  DG-AMMOS: A New tool to generate 3D conformation of small molecules using Distance Geometry and Automated Molecular Mechanics Optimization for in silico Screening 
Discovery of new bioactive molecules that could enter drug discovery programs or that could serve as chemical probes is a very complex and costly endeavor. Structure-based and ligand-based in silico screening approaches are nowadays extensively used to complement experimental screening approaches in order to increase the effectiveness of the process and facilitate the screening of thousands or millions of small molecules against a biomolecular target. Both in silico screening methods require as input a suitable chemical compound collection, and most often the 3D structure of the small molecules has to be generated since compounds are usually delivered in 1D SMILES, CANSMILES or in 2D SDF formats.
Here, we describe the new open source program DG-AMMOS which allows the generation of the 3D conformation of small molecules using Distance Geometry and their energy minimization via Automated Molecular Mechanics Optimization. The program is validated on the Astex dataset, the ChemBridge Diversity database and on a number of small molecules with known crystal structures extracted from the Cambridge Structural Database. A comparison is carried out with the free program Balloon and the well-known commercial program Omega, both of which generate 3D structures of small molecules. The results show that the new free program DG-AMMOS is a very efficient 3D structure generator engine.
DG-AMMOS provides fast, automated and reliable access to the generation of 3D conformation of small molecules and facilitates the preparation of a compound collection prior to high-throughput virtual screening computations. The validation of DG-AMMOS on several different datasets shows that the generated structures are generally of equal quality to, and sometimes better than, structures obtained by the other tested methods.
PMCID: PMC2781789  PMID: 19912625
19.  FINDSITELHM: A Threading-Based Approach to Ligand Homology Modeling 
PLoS Computational Biology  2009;5(6):e1000405.
Ligand virtual screening is a widely used tool to assist in new pharmaceutical discovery. In practice, virtual screening approaches have a number of limitations, and the development of new methodologies is required. Previously, we showed that remotely related proteins identified by threading often share a common binding site occupied by chemically similar ligands. Here, we demonstrate that across an evolutionarily related, but distant family of proteins, the ligands that bind to the common binding site contain a set of strongly conserved anchor functional groups as well as a variable region that accounts for their binding specificity. Furthermore, the sequence and structure conservation of residues contacting the anchor functional groups is significantly higher than those contacting ligand variable regions. Exploiting these insights, we developed FINDSITELHM that employs structural information extracted from weakly related proteins to perform rapid ligand docking by homology modeling. In large scale benchmarking, using the predicted anchor-binding mode and the crystal structure of the receptor, FINDSITELHM outperforms classical docking approaches with an average ligand RMSD from native of ∼2.5 Å. For weakly homologous receptor protein models, using FINDSITELHM, the fraction of recovered binding residues and specific contacts is 0.66 (0.55) and 0.49 (0.38) for highly confident (all) targets, respectively. Finally, in virtual screening for HIV-1 protease inhibitors, using similarity to the ligand anchor region yields significantly improved enrichment factors. Thus, the rather accurate, computationally inexpensive FINDSITELHM algorithm should be a useful approach to assist in the discovery of novel biopharmaceuticals.
Author Summary
As an integral part of drug development, high-throughput virtual screening is a widely used tool that could in principle significantly reduce the cost and time to discovery of new pharmaceuticals. In practice, virtual screening algorithms suffer from a number of limitations. The high sensitivity of all-atom ligand docking approaches to the quality of the target receptor structure restricts the selection of drug targets to those for which high-quality X-ray structures are available. Furthermore, the predicted binding affinity is typically strongly correlated with the molecular weight of the ligand, independent of whether or not it really binds. To address these significant problems, we developed FINDSITELHM, a novel threading-based approach that employs structural information extracted from weakly related proteins to perform rapid ligand docking and ranking that is very much in the spirit of homology modeling of protein structures. Particularly for low-quality modeled receptor structures, FINDSITELHM outperforms classical all-atom ligand docking approaches in terms of the accuracy of ligand binding pose prediction and requires considerably less CPU time. As an attractive alternative to classical molecular docking, FINDSITELHM offers the possibility of rapid structure-based virtual screening at the proteome level to improve and speed up the discovery of new biopharmaceuticals.
PMCID: PMC2685473  PMID: 19503616
High-throughput structure determination based on solution Nuclear Magnetic Resonance (NMR) spectroscopy plays an important role in structural genomics. One of the main bottlenecks in NMR structure determination is the interpretation of NMR data to obtain a sufficient number of accurate distance restraints by assigning nuclear Overhauser effect (NOE) spectral peaks to pairs of protons. The difficulty in automated NOE assignment mainly lies in the ambiguities arising both from the resonance degeneracy of chemical shifts and from the uncertainty due to experimental errors in NOE peak positions. In this paper we present a novel NOE assignment algorithm, called HAusdorff-based NOE Assignment (HANA), that starts with a high-resolution protein backbone computed using only two residual dipolar couplings (RDCs) per residue [37, 39], employs a Hausdorff-based pattern matching technique to deduce similarity between experimental and back-computed NOE spectra for each rotamer from a statistically diverse library, and drives the selection of optimal position-specific rotamers for filtering ambiguous NOE assignments. Our algorithm runs in time $O(tn^3 + tn \log t)$, where t is the maximum number of rotamers per residue and n is the size of the protein. Application of our algorithm on biological NMR data for three proteins, namely, human ubiquitin, the zinc finger domain of the human DNA Y-polymerase Eta (pol η) and the human Set2-Rpb1 interacting domain (hSRI) demonstrates that our algorithm overcomes spectral noise to achieve more than 90% assignment accuracy. Additionally, the final structures calculated using our automated NOE assignments have backbone RMSD < 1.7 Å and all-heavy-atom RMSD < 2.5 Å from reference structures that were determined either by X-ray crystallography or traditional NMR approaches. These results show that our NOE assignment algorithm can be successfully applied to protein NMR spectra to obtain high-quality structures.
PMCID: PMC2613371  PMID: 19122773
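The HANA abstract above relies on a Hausdorff-based measure to compare experimental and back-computed NOE spectra. As a hedged illustration only, the sketch below computes the classical symmetric Hausdorff distance between two sets of peak positions. It is not the HANA implementation, which may use a different (for example probabilistic or partial) variant; the function names and peak coordinates are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical 2D peak positions: the back-computed pattern has one peak
# with no experimental counterpart nearby.
experimental = [(0.0, 0.0), (1.0, 0.0)]
back_computed = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

distance = hausdorff(experimental, back_computed)
```

A small distance means every peak in one pattern has a close partner in the other, which is the intuition behind using such a measure to score candidate rotamers against a noisy spectrum.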
21.  Contact replacement for NMR resonance assignment 
Bioinformatics  2008;24(13):i205-i213.
Motivation: Complementing its traditional role in structural studies of proteins, nuclear magnetic resonance (NMR) spectroscopy is playing an increasingly important role in functional studies. NMR dynamics experiments characterize motions involved in target recognition, ligand binding, etc., while NMR chemical shift perturbation experiments identify and localize protein–protein and protein–ligand interactions. The key bottleneck in these studies is to determine the backbone resonance assignment, which allows spectral peaks to be mapped to specific atoms. This article develops a novel approach to address that bottleneck, exploiting an available X-ray structure or homology model to assign the entire backbone from a set of relatively fast and cheap NMR experiments.
Results: We formulate contact replacement for resonance assignment as the problem of computing correspondences between a contact graph representing the structure and an NMR graph representing the data; the NMR graph is a significantly corrupted, ambiguous version of the contact graph. We first show that by combining connectivity and amino acid type information, and exploiting the random structure of the noise, one can provably determine unique correspondences in polynomial time with high probability, even in the presence of significant noise (a constant number of noisy edges per vertex). We then detail an efficient randomized algorithm and show that, over a variety of experimental and synthetic datasets, it is robust to typical levels of structural variation (1–2 AA), noise (250–600%) and missings (10–40%). Our algorithm achieves very good overall assignment accuracy, above 80% in α-helices, 70% in β-sheets and 60% in loop regions.
Availability: Our contact replacement algorithm is implemented in platform-independent Python code. The software can be freely obtained for academic use by request from the authors.
PMCID: PMC2718645  PMID: 18586716
22.  MO Tripeptide Diastereomers (M = 99/99mTc, Re): Models To Identify the Structure of 99mTc Peptide Targeted Radiopharmaceuticals 
Inorganic Chemistry  2007;46(18):7326-7340.
Biologically active molecules, such as many peptides, serve as targeting vectors for radiopharmaceuticals based on 99mTc. Tripeptides can be suitable chelates: they are easily and conveniently synthesized, can be linked to peptide targeting vectors through solid-phase peptide synthesis, and form stable Tc(V)O complexes. Upon complexation with [TcO]3+, two products form; these are syn and anti diastereomers, and they often have different biological behavior. This is the case with the approved radiopharmaceutical [99mTcO]depreotide ([99mTcO]P829, NeoTect) that is used to image lung cancer. [99mTcO]depreotide indeed exhibits two product peaks in its HPLC profile, but assignment of the product peaks to the diastereomers has proven to be difficult because the metal peptide complex is difficult to crystallize for structural analysis. In this study, we isolated diastereomers of [99TcO] and [ReO] complexes of several tripeptide ligands that model the metal chelator region of [99mTcO]depreotide. Using X-ray crystallography, we observed that the early eluting peak (A) corresponds to the anti diastereomer, where the Tc═O group is on the opposite side of the plane formed by the ligand backbone relative to the pendant groups of the tripeptide ligand, and the later eluting peak (B) corresponds to the syn diastereomer, where the Tc═O group is on the same side of the plane as the residues of the tripeptide. 1H NMR and circular dichroism (CD) spectroscopy report on the metal environment and prove to be diagnostic for syn or anti diastereomers, and we identified characteristic features from these techniques that can be used to assign the diastereomer profile in 99mTc peptide radiopharmaceuticals like [99mTcO]depreotide and in 188Re peptide radiotherapeutic agents. Crystallography, potentiometric titration, and NMR results presented insights into the chemistry occurring under physiological conditions.
The tripeptide complexes where lysine is the second amino acid crystallized in a deprotonated metallo-amide form, possessing a short N1–M bond. The pKa measurements of the N1 amine (pKa ~5.6) suggested that this amine is rendered more acidic by both metal complexation and the presence of the lysine residue. Furthermore, peptide chelators incorporating a lysine (like the chelator of [TcO]depreotide) likely exist in the deprotonated form in vivo, comprising a neutral metal center. Deprotonation possibly mediates the interconversion process between the syn and anti diastereomers. The N1 amine group on non-lysine-containing metallopeptides is not as acidic (pKa ~6.8) and does not deprotonate and crystallize as do the metallo-amide species. Three of the tripeptide ligands (FGC, FSC, and FKC) were radiolabeled with 99mTc, and the individual syn and anti isomers were isolated for biodistribution studies in normal female nude mice. The main organs of uptake were the liver, intestines, and kidneys, with the FGC compounds exhibiting the highest liver uptake. In comparing the diastereomers, the syn compounds had substantially higher organ uptake and slower blood clearance than the anti compounds.
PMCID: PMC2270398  PMID: 17691766
23.  Data-Driven High-Throughput Prediction of the 3D Structure of Small Molecules: Review and Progress 
Accurate prediction of the 3D structure of small molecules is essential in order to understand their physical, chemical, and biological properties including how they interact with other molecules. Here we survey the field of high-throughput methods for 3D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system’s performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to the state-of-the-art, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD [99.6% organic, 94.6% metal-organic] whereas the widely used commercial method CORINA predicts structures for 68.5% [98.5% organic, 51.6% metal-organic]. On the common subset of molecules predicted by both methods COSMOS makes predictions with an average speed per molecule of 0.15s [0.10s organic, 0.21s metal-organic], and an average RMSD of 1.57Å [1.26Å organic, 1.90Å metal-organic], and CORINA makes predictions with an average speed per molecule of 0.13s [0.18s organic, 0.08s metal-organic], and an average RMSD of 1.60Å [1.13Å organic, 2.11Å metal-organic]. COSMOS is available through the ChemDB chemoinformatics web portal at:
PMCID: PMC3081951  PMID: 21417267
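Several abstracts in this list, including the COSMOS entry above, report accuracy as an RMSD in Å. The sketch below shows the basic formula for two equally sized lists of matched atomic coordinates. It assumes the two structures are already superimposed and their atoms are in the same order; published evaluations typically perform an optimal superposition first, which is not done here. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched 3D coordinates.

    Assumption: the structures are pre-aligned and atom i in `coords_a`
    corresponds to atom i in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

value = rmsd(predicted, reference)
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMSD more than many slightly misplaced ones.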
24.  Robust structure-based resonance assignment for functional protein studies by NMR 
Journal of Biomolecular NMR  2009;46(2):157-173.
High-throughput functional protein NMR studies, like protein interactions or dynamics, require an automated approach for the assignment of the protein backbone. With the availability of a growing number of protein 3D structures, a new class of automated approaches, called structure-based assignment, has been developed quite recently. Structure-based approaches use primarily NMR input data that are not based on J-coupling and for which connections between residues are not limited by through-bond magnetization transfer efficiency. We present here a robust structure-based assignment approach using mainly HN–HN NOE networks, as well as 1H–15N residual dipolar couplings and chemical shifts. The NOEnet complete search algorithm is robust against assignment errors, even for sparse input data. Instead of a unique and partly erroneous assignment solution, an optimal assignment ensemble with an accuracy equal to or near 100% is given by NOEnet. We show that even low precision assignment ensembles give enough information for functional studies, like modeling of protein complexes. Finally, the combination of NOEnet with a low number of ambiguous J-coupling sequential connectivities yields a high precision assignment ensemble. NOEnet will be available under:
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-009-9390-3) contains supplementary material, which is available to authorized users.
PMCID: PMC2813526  PMID: 20024602
NMR; Assignment; Structure-based; NOE; Network; Chemical shifts; Residual dipolar couplings; NOEnet
25.  Halogen bonds in some dihalogenated phenols: applications to crystal engineering 
IUCrJ  2013;1(Pt 1):49-60.
The preference of Br to form type II contacts over type I is explored by various techniques. The mechanical properties of some dihalogenated phenols are correlated with their structures.
3,4-Dichlorophenol (1) crystallizes in the tetragonal space group I41/a with a short axis of 3.7926 (9) Å. The structure is unique in that both type I and type II Cl⋯Cl interactions are present, these contact types being distinguished by the angle ranges of the respective C—Cl⋯Cl angles. The present study shows that these two types of contacts are utterly different. The crystal structures of 4-bromo-3-chlorophenol (2) and 3-bromo-4-chlorophenol (3) have been determined. The crystal structure of (2) is isomorphous to that of (1) with the Br atom in the 4-position participating in a type II interaction. However, the monoclinic P21/c packing of compound (3) is different; while the structure still has O—H⋯O hydrogen bonds, the tetramer O—H⋯O synthon seen in (1) and (2) is not seen. Rather than a type I Br⋯Br interaction which would have been mandated if (3) were isomorphous to (1) and (2), Br forms a Br⋯O contact wherein its electrophilic character is clearly evident. Crystal structures of the related compounds 4-chloro-3-iodophenol (4) and 3,5-dibromophenol (5) were also determined. A computational survey of the structural landscape was undertaken for (1), (2) and (3), using a crystal structure prediction protocol in space groups P21/c and I41/a with the COMPASS26 force field. While both tetragonal and monoclinic structures are energetically reasonable for all compounds, the fact that (3) takes the latter structure indicates that Br prefers type II over type I contacts. In order to differentiate further between type I and type II halogen contacts, which being chemically distinct are expected to have different distance fall-off properties, a variable-temperature crystallography study was performed on compounds (1), (2) and (4). Length variations with temperature are greater for type II contacts compared with type I. 
The type II Br⋯Br interaction in (2) is stronger than the corresponding type II Cl⋯Cl interaction in (1), leading to elastic bending of the former upon application of mechanical stress, which contrasts with the plastic deformation of (1). The observation of elastic deformation in (2) is noteworthy; in that it finds an explanation based on the strengths of the respective halogen bonds, it could also be taken as a good starting model for future property design. Cl/Br isostructurality is studied with the Cambridge Structural Database and it is indicated that this isostructurality is based on shape and size similarity of Cl and Br, rather than arising from any chemical resemblance.
PMCID: PMC4104968
crystal engineering; crystal structure prediction; elastic deformation; intermolecular interaction
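The abstract above notes that type I and type II halogen contacts are distinguished by the ranges of the two C—X⋯X angles. A minimal sketch of such a geometric classifier follows. The thresholds on the angle difference (15° and 30°) are assumptions chosen for illustration, not values stated in this abstract, and the function name is hypothetical.

```python
def classify_halogen_contact(theta1, theta2,
                             type1_max_diff=15.0, type2_min_diff=30.0):
    """Classify an X...X contact from its two C-X...X angles in degrees.

    Assumed rule: nearly equal angles indicate a type I contact, clearly
    unequal angles (one near 180, one near 90) indicate a type II contact,
    and anything in between is reported as intermediate.
    """
    diff = abs(theta1 - theta2)
    if diff <= type1_max_diff:
        return "type I"
    if diff >= type2_min_diff:
        return "type II"
    return "intermediate"


# Hypothetical angle pairs in degrees.
symmetric = classify_halogen_contact(150.0, 150.0)
bent = classify_halogen_contact(175.0, 95.0)
borderline = classify_halogen_contact(150.0, 130.0)
```

Keeping the thresholds as parameters makes it easy to test how sensitive a database survey is to the exact cut-offs.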
