1.  New software for statistical analysis of Cambridge Structural Database data 
Journal of Applied Crystallography  2011;44(Pt 4):882-886.
A new piece of software for statistical analysis of geometrical, chemical and crystallographic data within the Cambridge Structural Database System is described. This software has been written specifically to deal with chemical structure data and crucially provides simultaneous visualization of the three-dimensional structural information.
A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments.
PMCID: PMC3246811  PMID: 22477784
data analysis; computer programs; Cambridge Structural Database; substructure; Vista
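The cone-angle correction mentioned above can be illustrated with a short sketch. This uses one common formulation, assumed here rather than taken from Mercury. When D–H⋯A angles θ are histogrammed, each bin of deviation from linearity ψ = 180° − θ is divided by the solid angle of its conical shell, 2π(cos ψ₁ − cos ψ₂). Randomly oriented contacts then give a flat distribution instead of one that falls to zero near linearity. The function name and the 10° bin width are illustrative choices.

```python
import numpy as np


def cone_corrected_histogram(angles_deg, bin_width=10.0):
    """Cone-corrected density of D-H...A angles.

    angles_deg: D-H...A angles in degrees (180 = linear contact).
    Returns the bin edges, expressed as deviation from linearity
    psi = 180 - theta (bin_width should divide 90), and the count in each
    bin divided by the solid angle of the corresponding conical shell.
    Angles below 90 degrees (psi > 90) fall outside the bins and are ignored.
    """
    psi = 180.0 - np.asarray(angles_deg, dtype=float)
    edges = np.arange(0.0, 90.0 + bin_width, bin_width)
    counts, _ = np.histogram(psi, bins=edges)
    rad = np.radians(edges)
    # Solid angle between cones of half-angle psi1 and psi2: 2*pi*(cos psi1 - cos psi2)
    solid_angle = 2.0 * np.pi * (np.cos(rad[:-1]) - np.cos(rad[1:]))
    return edges, counts / solid_angle


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Isotropic directions on a hemisphere: cos(psi) is uniform on [0, 1].
    psi = np.degrees(np.arccos(rng.uniform(0.0, 1.0, 200_000)))
    edges, density = cone_corrected_histogram(180.0 - psi)
    # Raw counts rise with psi; the corrected density is roughly flat.
    print(np.round(density / density.mean(), 2))
```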
2.  Applications of the Cambridge Structural Database in chemical education
Journal of Applied Crystallography  2010;43(Pt 5):1208-1223.
The educational value of three-dimensional crystal structures in the Cambridge Structural Database (CSD) is discussed in the context of practical use cases and the availability of a free teaching subset of the CSD that can be used in conjunction with WebCSD, an application that provides internet access to CSD information content.
The Cambridge Structural Database (CSD) is a vast and ever-growing compendium of accurate three-dimensional structures that has massive chemical diversity across organic and metal–organic compounds. For these reasons, the CSD is finding significant uses in chemical education, and these applications are reviewed. As part of the teaching initiative of the Cambridge Crystallographic Data Centre (CCDC), a teaching subset of more than 500 CSD structures that illustrate key chemical concepts has been created, and a number of teaching modules have been devised that make use of this subset in a teaching environment. All of this material is freely available from the CCDC website, and the subset can be freely viewed and interrogated using WebCSD, an internet application for searching and displaying CSD information content. In some cases, however, the complete CSD System is required for specific educational applications, and some examples of these more extensive teaching modules are also discussed. The educational value of visualizing real three-dimensional structures, and of handling real experimental results, is stressed throughout.
PMCID: PMC2943741  PMID: 20877495
Cambridge Structural Database; crystallographic education; WebCSD
3.  The Catalytic Mn2+ Sites in the Enolase-Inhibitor Complex - Crystallography, Single Crystal EPR and DFT Calculations 
Crystals of Zn2+/Mn2+ yeast enolase with the inhibitor PhAH (phosphonoacetohydroxamate) were grown under conditions with a slight preference for binding of Zn2+ at the higher-affinity site, site I. The structure of the Zn2+/Mn2+ PhAH complex was solved at a resolution of 1.54 Å, and the two catalytic metal-binding sites, I and II, show only subtle displacements compared with the corresponding complex with the native Mg2+ ions. Low-temperature echo-detected high-field (W-band, 95 GHz) EPR (electron paramagnetic resonance) and 1H ENDOR (electron–nuclear double resonance) experiments were carried out on a single crystal, and rotation patterns were acquired in two perpendicular planes. Analysis of the rotation patterns resolved a total of six Mn2+ sites: four symmetry-related sites of one type and two of the four sites of the other type. The observation of two chemically inequivalent Mn2+ sites shows that Mn2+ ions populate both site I and site II, and the zero-field splitting (ZFS) tensors of Mn2+ in the two sites were determined. The Mn2+ site with the larger D value was assigned to site I on the basis of the 1H ENDOR spectra, which identified the relevant water ligands. This assignment is consistent with the apparently larger deviation of site I from octahedral symmetry compared with site II. The ENDOR results gave the coordinates of the protons of two water ligands, and adding them to the crystal structure revealed their involvement in a network of hydrogen bonds that stabilizes the binding of the metal ions and PhAH. Although specific hyperfine interactions with the inhibitor were not determined, the spectroscopic properties of Mn2+ in the two sites were consistent with the crystal structure.
Density functional theory (DFT) calculations carried out on a cluster representing the catalytic site, with Mn2+ in site I and Zn2+ in site II and vice versa, gave overestimated D values that were nevertheless of the same order as the experimental ones, although the larger D value was found for Mn2+ in site II rather than in site I. This was attributed to the high sensitivity of the ZFS parameters to the Mn–O bond lengths and orientations: small but significant differences between the optimized and crystal structures alter the ZFS considerably, well beyond the difference between the two sites.
PMCID: PMC2538446  PMID: 17367133
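The D and E values discussed above are parameters of the standard zero-field-splitting term of the spin Hamiltonian, H = D[Sz² − S(S+1)/3] + E(Sx² − Sy²). For high-spin Mn2+ (S = 5/2), the sketch below builds the spin matrices and diagonalizes this term alone, which shows how the six levels split into three Kramers doublets. It is a textbook illustration, not the analysis used in the study. The Zeeman and hyperfine terms needed to simulate real W-band spectra are deliberately left out, and the D and E values in the usage line are arbitrary.

```python
import numpy as np


def spin_matrices(S):
    """Spin matrices (Sx, Sy, Sz) in the |S, m> basis ordered m = S, S-1, ..., -S."""
    m = np.arange(S, -S - 1, -1)
    Sz = np.diag(m).astype(complex)
    # <m+1|S+|m> = sqrt(S(S+1) - m(m+1)); with descending m it lies just above the diagonal.
    Sp = np.diag(np.sqrt(S * (S + 1) - m[1:] * (m[1:] + 1)), k=1).astype(complex)
    Sm = Sp.conj().T
    Sx = (Sp + Sm) / 2.0
    Sy = (Sp - Sm) / 2.0j
    return Sx, Sy, Sz


def zfs_levels(D, E, S=2.5):
    """Eigenvalues of H = D[Sz^2 - S(S+1)/3] + E(Sx^2 - Sy^2), in the units of D and E."""
    Sx, Sy, Sz = spin_matrices(S)
    identity = np.eye(Sz.shape[0])
    H = D * (Sz @ Sz - S * (S + 1) / 3.0 * identity) + E * (Sx @ Sx - Sy @ Sy)
    return np.linalg.eigvalsh(H)


if __name__ == "__main__":
    # Arbitrary illustrative parameters (same units for D and E).
    print(zfs_levels(D=0.05, E=0.01))
```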
4.  Short strong hydrogen bonds in proteins: a case study of rhamnogalacturonan acetylesterase 
The short hydrogen bonds in rhamnogalacturonan acetylesterase have been investigated by structure determination of an active-site mutant, 1H NMR spectra and computational methods. Comparisons are made to database statistics. A very short carboxylic acid carboxylate hydrogen bond, buried in the protein, could explain the low-field (18 p.p.m.) 1H NMR signal.
An extremely low-field signal (at approximately 18 p.p.m.) in the 1H NMR spectrum of rhamnogalacturonan acetylesterase (RGAE) shows the presence of a short strong hydrogen bond in the structure. This signal was also present in the mutant RGAE D192N, in which Asp192, which is part of the catalytic triad, has been replaced with Asn. A careful analysis of wild-type RGAE and RGAE D192N was conducted with the purpose of identifying possible candidates for the short hydrogen bond with the 18 p.p.m. deshielded proton. Theoretical calculations of chemical shift values were used in the interpretation of the experimental 1H NMR spectra. The crystal structure of RGAE D192N was determined to 1.33 Å resolution and refined to an R value of 11.6% for all data. The structure is virtually identical to the high-resolution (1.12 Å) structure of the wild-type enzyme except for the interactions involving the mutation and a disordered loop. Searches of the Cambridge Structural Database were conducted to obtain information on the donor–acceptor distances of different types of hydrogen bonds. The short hydrogen-bond interactions found in RGAE have equivalents in small-molecule structures. An examination of the short hydrogen bonds in RGAE, the calculated pKa values and the solvent accessibilities identified a buried carboxylic acid–carboxylate hydrogen bond between Asp75 and Asp87 as the likely origin of the 18 p.p.m. signal. Similar hydrogen-bond interactions between two Asp or Glu carboxy groups were found in 16% of a homology-reduced set of high-quality structures extracted from the PDB. The shortest hydrogen bonds in RGAE are all located close to the active site, and short interactions between Ser and Thr side-chain OH groups and backbone carbonyl O atoms seem to play an important role in the stability of the protein structure. These results illustrate the significance of short strong hydrogen bonds in proteins.
PMCID: PMC2483496  PMID: 18645234
short hydrogen bonds; low-field NMR signals; rhamnogalacturonan acetylesterase
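A survey of short hydrogen bonds like the one above starts from donor–acceptor distances. The sketch below lists every atom pair closer than a cutoff in a set of coordinates. The 2.5 Å O⋯O cutoff, the atom labels and the coordinates are hypothetical, chosen only to illustrate the computation. Real analyses also check the atom types and the geometry of each pair.

```python
import numpy as np


def short_contacts(coords, labels, cutoff=2.5):
    """Atom pairs closer than `cutoff` angstroms, as (label_i, label_j, distance)."""
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]          # all pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(xyz), k=1)              # each unordered pair once
    keep = dist[i, j] < cutoff
    return [(labels[a], labels[b], float(dist[a, b])) for a, b in zip(i[keep], j[keep])]
```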
5.  A rule-based algorithm for automatic bond type perception 
Assigning bond orders is a necessary step for characterizing a chemical structure correctly in force-field-based simulations. Several methods have been developed for this purpose; each has advantages but also limitations. Here, we provide an automatic algorithm that assigns chemical connectivity and bond orders for organic molecules whether or not hydrogen atoms are present; only three-dimensional coordinates and element identities are required. The algorithm uses hard rules, length rules and conjugation rules to fix the structures. The hard rules determine bond orders from basic chemical rules; the length rules determine bond order from the distance between two atoms, using a set of predefined values for different bond types; and the conjugation rules determine bond orders using the length information derived from the previous rule, the bond angles and some small structural patterns. The algorithm is extensively evaluated on three datasets and achieves good prediction accuracy on all of them. Finally, the limitations of the algorithm and possible future improvements are discussed.
PMCID: PMC3557220  PMID: 23113939
Bond type perception; Bond order; Chemical bond; Molecular modeling
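The length rule can be sketched as a nearest-reference lookup, preceded by a distance-based connectivity step. The covalent radii, the 0.4 Å tolerance and the reference C–C lengths below are assumptions (typical literature values), not the parameters of the published algorithm. The hard and conjugation rules are not reproduced.

```python
import math

# Assumed single-bond covalent radii (angstrom) and bonding tolerance.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.4

# Assumed reference C-C lengths (angstrom); 1.5 denotes an aromatic bond.
CC_REFERENCE = {1: 1.54, 1.5: 1.40, 2: 1.34, 3: 1.20}


def connect(elements, coords):
    """Bonded pairs (i, j, distance) where distance < r_i + r_j + TOLERANCE."""
    bonds = []
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            d = math.dist(coords[i], coords[j])
            if d < COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]] + TOLERANCE:
                bonds.append((i, j, d))
    return bonds


def cc_bond_order(d):
    """Length rule for C-C: the reference order whose length is closest to d."""
    return min(CC_REFERENCE, key=lambda order: abs(CC_REFERENCE[order] - d))
```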
6.  Tetrakis(1,2-dimethoxyethane-κ2O,O′)ytterbium(II) bis(μ2-phenylselenolato-κ2Se:Se)bis[bis(phenylselenolato-κSe)mercurate(II)]
The title salt, [Yb(C4H10O2)4][Hg2(C6H5Se)6], consists of eight-coordinate homoleptic [Yb(DME)4]2+ dications (DME is 1,2-dimethoxyethane) charge-balanced by [Hg2(SePh)6]2− dianions. The cations and anions have twofold rotation and inversion symmetry, respectively. The Yb centre displays a square-antiprismatic coordination geometry, and the Hg centre has a distorted tetrahedral coordination environment. One phenylselenolate anion and one methyl group of a DME ligand are disordered over two positions with equal occupancies. The structure is notable in that it represents a less common type of molecular lanthanide species in which the lanthanide ion is not directly bonded to an anionic ligand. There are no occurrences of the [Hg2(SePh)6]2− dianion in the Cambridge Structural Database (Version of November 2007), but the database contains similar oligomeric and polymeric Hgx(SePh)y species. The crystal structure is characterized by alternating layers of cations and anions stacked along the c axis.
PMCID: PMC2961915  PMID: 21203084
7.  WebCSD: the online portal to the Cambridge Structural Database 
Journal of Applied Crystallography  2010;43(Pt 2):362-366.
The new web-based application WebCSD is introduced, which provides a range of facilities for searching the Cambridge Structural Database within a standard web browser. Search options within WebCSD include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching.
WebCSD, a new web-based application developed by the Cambridge Crystallographic Data Centre, offers fast searching of the Cambridge Structural Database using only a standard internet browser. Search facilities include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching. Text, chemical diagrams and three-dimensional structural information can all be studied in the results browser using the efficient entry summaries and embedded three-dimensional viewer.
PMCID: PMC3246830  PMID: 22477776
WebCSD; computer programs; database searching; Cambridge Structural Database; similarity searching; substructure; reduced cell
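Molecular similarity searching is commonly based on comparing binary structural fingerprints, for example with the Tanimoto coefficient |A ∩ B| / |A ∪ B|. The abstract does not state which fingerprint or similarity measure WebCSD uses, so the sketch below is a generic illustration. Fingerprints are represented as sets of "on" bit indices, and the entry identifiers and the 0.7 threshold are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_search(query, library, threshold=0.7):
    """(identifier, score) pairs with score >= threshold, best match first."""
    hits = [(ident, tanimoto(query, fp)) for ident, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])
```

For example, `similarity_search({1, 2, 3}, {"ENTRY_A": {1, 2, 3}, "ENTRY_B": {7, 8}})` keeps only the identical fingerprint.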
8.  Redetermination of (d-penicillaminato)lead(II) 
In the title coordination polymer, [Pb(C5H9NO2S)]n {systematic name: catena-poly[(μ-2-amino-3-methyl-3-sulfidobutanoato)lead(II)]}, the d-penicillaminate ligand coordinates to the metal ion in an N,S,O-tridentate mode. The S atom bridges two neighbouring PbII ions, thereby forming a double thiolate chain. In addition, the coordinating carboxylate O atom bridges to the PbII ions in the adjacent chain. The overall coordination sphere of the PbII ion can be described as a highly distorted pentagonal bipyramid with a void in the equatorial plane between the long Pb–S bonds, probably occupied by the stereochemically active inert electron pair. The amino H atoms form N–H⋯S and N–H⋯O hydrogen bonds, linking four complex units into a cluster and giving rise to an R⁴₄(16) ring lying in the ab plane. The crystal structure of the title compound has been reported previously [Freeman et al. (1974). Chem. Soc. Chem. Commun. pp. 366–367], but the atomic coordinates have not been deposited in the Cambridge Structural Database (refcode DPENPB). Additional details of the hydrogen bonding are presented here.
PMCID: PMC3343873  PMID: 22589847
9.  Automated extraction of chemical structure information from digital raster images 
To search for chemical structures in research articles, the diagrams or text representing molecules need to be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated.
This paper provides a critical review of these systems and also reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently and in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate further development tailored to a specific chemical database annotation scheme. Our results indicate that ChemReader outperforms existing software programs such as OSRA, Kekule and CLiDE on several sets of sample images from diverse sources, in terms of both the rate of correct outputs and the accuracy of extracting molecular substructure patterns.
The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles.
PMCID: PMC2648963  PMID: 19196483
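The abstract does not describe how ChemReader detects the lines that represent bonds. As a generic illustration of line recognition in a binary raster image, the sketch below implements a basic Hough transform. Each ink pixel votes for every line x cos θ + y sin θ = ρ passing through it, and peaks in the (ρ, θ) accumulator correspond to straight strokes. The function names and the 1° angular step are assumptions.

```python
import numpy as np


def hough_lines(binary):
    """Hough accumulator for lines x*cos(theta) + y*sin(theta) = rho.

    binary: 2-D array whose nonzero pixels are ink (x = column, y = row).
    Theta is sampled at 0, 1, ..., 179 degrees and rho in 1-pixel steps.
    Returns (accumulator, rhos, thetas_deg); accumulator[i, k] counts the
    ink pixels whose rounded rho at thetas_deg[k] equals rhos[i].
    """
    ys, xs = np.nonzero(binary)
    diag = int(np.ceil(np.hypot(*binary.shape)))
    thetas_deg = np.arange(180)
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((rhos.size, thetas_deg.size), dtype=int)
    for k, theta in enumerate(np.radians(thetas_deg)):
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        # Shift rho by diag so that it can be used as a non-negative bin index.
        acc[:, k] = np.bincount(rho + diag, minlength=rhos.size)
    return acc, rhos, thetas_deg


def strongest_line(acc, rhos, thetas_deg):
    """(rho, theta in degrees, votes) at the accumulator maximum."""
    i, k = np.unravel_index(np.argmax(acc), acc.shape)
    return int(rhos[i]), int(thetas_deg[k]), int(acc[i, k])
```

A horizontal stroke in row y = 5 produces a peak near θ = 90°, ρ = 5; a full recognizer would also need peak suppression and segment end-point detection.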
10.  An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database 
Journal of biomolecular NMR  2012;53(4):311-320.
Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including 1H, 13C and 15N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001–2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.
PMCID: PMC4308039  PMID: 22689068
NMR; Chemical shift; Proteomics; Database; BMRB
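Comparing deposited chemical shifts with values predicted from the coordinates amounts to matching atoms and summarizing the differences. The sketch below assumes that both sets are dictionaries keyed by (residue number, atom name) and uses the root-mean-square deviation as the summary statistic. The abstract does not specify the exact comparison metric, and any numbers passed in are illustrative.

```python
import math


def shift_rmsd(observed, predicted):
    """RMSD (ppm) over atoms present in both dicts keyed by (residue_number, atom_name).

    Returns (rmsd, number_of_matched_atoms).
    """
    common = observed.keys() & predicted.keys()
    if not common:
        raise ValueError("no atoms in common")
    squared = [(observed[key] - predicted[key]) ** 2 for key in common]
    return math.sqrt(sum(squared) / len(squared)), len(common)
```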
11.  Data mining of metal ion environments present in protein structures 
Journal of inorganic biochemistry  2008;102(9):1765-1776.
Analysis of metal–protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. By measuring the frequency of each amino acid in metal ion binding sites, the positive or negative preferences of each residue for each type of cation were identified. Our approach may be used for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone. The analysis compares data derived separately from high- and medium-resolution structures from the PDB with those from very high resolution small-molecule structures in the Cambridge Structural Database (CSD). For high-resolution protein structures, the distribution of metal–protein or metal–water interaction distances agrees quite well with data from the CSD, but the distribution is unrealistically wide for medium-resolution (2.0–2.5 Å) data. Our analysis of cation B-factors versus the average B-factors of atoms in the cation environment reveals that a substantial number of structures contain either an incorrect metal ion assignment or an unusual coordination pattern. A correlation between data resolution and the completeness of the metal coordination spheres is also found.
PMCID: PMC2872550  PMID: 18614239
Metalloprotein; protein structure; metal binding
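The B-factor comparison described above can be sketched as a ratio between a metal's B-factor and the mean B-factor of the atoms around it. The 3.0 Å environment radius and the acceptable ratio range of 0.5–1.5 are hypothetical choices for illustration, not the criteria used in the study.

```python
import numpy as np


def metal_b_ratio(metal_xyz, metal_b, env_xyz, env_b, radius=3.0):
    """Metal B-factor divided by the mean B of atoms within `radius` angstroms.

    Returns NaN when no atom lies within the radius.
    """
    env_xyz = np.asarray(env_xyz, dtype=float)
    env_b = np.asarray(env_b, dtype=float)
    dist = np.linalg.norm(env_xyz - np.asarray(metal_xyz, dtype=float), axis=1)
    near = dist < radius
    if not near.any():
        return float("nan")
    return float(metal_b / env_b[near].mean())


def is_suspicious(ratio, low=0.5, high=1.5):
    """Flag ratios outside [low, high] (hypothetical limits); NaN is also flagged."""
    return not (low <= ratio <= high)
```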
12.  catena-Poly[[μ3-hydroxido-tetra-μ2-pyridazine-1:2κ4N:N′;1:3κ2N:N′;2:3κ2N:N′-tetrakis(selenocyanato)-1κN,2κN,3κ2N-trizinc(II)]-μ-cyanido-1:2′κ2C:N]
In the crystal structure of the title compound, [Zn3(NCSe)4(OH)(CN)(C4H4N2)4]n, one of the two crystallographically independent zinc(II) cations is coordinated by two terminal N-bonded selenocyanato anions and two N atoms of two symmetry-related pyridazine ligands in a trigonal-bipyramidal geometry, while the other zinc(II) cation is coordinated by one terminal N-bonded selenocyanato anion, one μ-1,2-cyanido anion and three N atoms of three crystallographically independent pyridazine ligands in a slightly distorted octahedral coordination geometry. The zinc(II) atoms are further connected via a μ3-hydroxido anion into trinuclear building blocks. The formula unit consists of three zinc cations, four selenocyanato anions, one μ3-hydroxido anion, four pyridazine molecules and one cyanido anion. The asymmetric unit contains half of a formula unit. One of the zinc atoms, two selenocyanato anions, two pyridazine ligands and the μ3-hydroxido anion are located on a crystallographic mirror plane, whereas the cyanido anion is located on a twofold rotation axis and is therefore disordered by symmetry. The cyanido anions connect the metal centres into polymeric zigzag chains propagating along the a axis.
PMCID: PMC3007368  PMID: 21588094
13.  Validation of archived chemical shifts through atomic coordinates 
Proteins  2010;78(11):2482-2489.
The public archives containing protein information in the form of NMR chemical shift data at the BioMagResBank (BMRB) and of 3D structure coordinates at the Protein Data Bank are continuously expanding. The quality of the data contained in these archives, however, varies. The main issue for chemical shift values is that they are determined relative to a reference frequency. When this reference frequency is set incorrectly, all related chemical shift values are systematically offset. Such wrongly referenced chemical shift values, as well as other problems such as chemical shift values that are assigned to the wrong atom, are not easily distinguished from correct values and effectively reduce the usefulness of the archive. We describe a new method to correct and validate protein chemical shift values in relation to their 3D structure coordinates. This method classifies atoms using two parameters: the per-atom solvent accessible surface area (as calculated from the coordinates) and the secondary structure of the parent amino acid. Through the use of Gaussian statistics based on a large database of 3220 BMRB entries, we obtain per-entry chemical shift corrections as well as Z scores for the individual chemical shift values. In addition, information on the error of the correction value itself is available, and the method can retain only dependable correction values. We provide an online resource with chemical shift, atom exposure, and secondary structure information for all relevant BMRB entries ( and hope this data will aid the development of new chemical shift-based methods in NMR. Proteins 2010. © 2010 Wiley-Liss, Inc.
PMCID: PMC2970900  PMID: 20602353
nuclear magnetic resonance; chemical shift; protein; atom coordinates; validation
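The per-entry correction and Z scores described above can be sketched with inverse-variance weighting. This assumes that each shift has been assigned to a class (for example atom name, secondary structure and exposure bin) with a reference mean and standard deviation. The weighting scheme, class keys and numbers are illustrative assumptions rather than the published statistics.

```python
import math


def reference_correction(shifts, stats):
    """Per-entry referencing correction, its error, and per-shift Z scores.

    shifts: list of (class_key, observed_ppm).
    stats:  dict class_key -> (mean_ppm, sd_ppm) from a reference database.
    The correction is the inverse-variance weighted mean of (mean - observed).
    """
    num = den = 0.0
    for key, obs in shifts:
        mu, sd = stats[key]
        weight = 1.0 / sd ** 2
        num += weight * (mu - obs)
        den += weight
    correction = num / den
    error = math.sqrt(1.0 / den)
    # Z score of each shift after applying the correction.
    z = [(obs + correction - stats[key][0]) / stats[key][1] for key, obs in shifts]
    return correction, error, z
```

A correction could then be applied only when it is large relative to its error (for example |correction| > 2 × error); that threshold is also an assumption.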
14.  Atomic resolution studies of carbonic anhydrase II 
The structure of human carbonic anhydrase II has been solved with a sulfonamide inhibitor at 0.9 Å resolution. Structural variation and flexibility are seen on the surface of the protein and are consistent with the anisotropic ADPs obtained from refinement. Comparison with 13 other atomic resolution carbonic anhydrase structures shows that surface variation exists even in these highly ordered isomorphous crystals.
Carbonic anhydrase has been well studied structurally and functionally owing to its importance in respiration. A large number of X-ray crystallographic structures of carbonic anhydrase and its inhibitor complexes have been determined, some at atomic resolution. Structure determination of a sulfonamide-containing inhibitor complex has been carried out and the structure was refined at 0.9 Å resolution with anisotropic atomic displacement parameters to an R value of 0.141. The structure is similar to those of other carbonic anhydrase complexes, with the inhibitor providing a fourth nonprotein ligand to the active-site zinc. Comparison of this structure with 13 other atomic resolution (higher than 1.25 Å) isomorphous carbonic anhydrase structures provides a view of the structural similarity and variability in a series of crystal structures. At the center of the protein the structures superpose very well. The metal complexes superpose (with only two exceptions) with standard deviations of 0.01 Å in some zinc–protein and zinc–ligand bond lengths. In contrast, regions of structural variability are found on the protein surface, possibly owing to flexibility and disorder in the individual structures, differences in the chemical and crystalline environments or the different approaches used by different investigators to model weak or complicated electron-density maps. These findings suggest that care must be taken in interpreting structural details on protein surfaces on the basis of individual X-ray structures, even if atomic resolution data are available.
PMCID: PMC2865367  PMID: 20445237
carbonic anhydrase; structure comparison; metalloproteins; atomic resolution
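Comparing isomorphous structures like these usually starts with an optimal rigid-body superposition. The sketch below uses the Kabsch algorithm, a standard method that is not necessarily the one used in the study. Both coordinate sets are centred, the rotation is obtained from the singular value decomposition of their covariance matrix with a sign correction that prevents reflections, and the RMSD is computed after rotation. The inputs must be matched atom lists of equal length.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                    # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```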
15.  MO Tripeptide Diastereomers (M = 99/99mTc, Re): Models To Identify the Structure of 99mTc Peptide Targeted Radiopharmaceuticals 
Inorganic chemistry  2007;46(18):7326-7340.
Biologically active molecules, such as many peptides, serve as targeting vectors for radiopharmaceuticals based on 99mTc. Tripeptides can be suitable chelates: they are easily and conveniently synthesized, can be linked to peptide targeting vectors through solid-phase peptide synthesis, and form stable Tc(V)O complexes. Upon complexation with [TcO]3+, two products form; these are syn and anti diastereomers, and they often have different biological behavior. This is the case with the approved radiopharmaceutical [99mTcO]depreotide ([99mTcO]P829, NeoTect), which is used to image lung cancer. [99mTcO]depreotide indeed exhibits two product peaks in its HPLC profile, but assigning the peaks to the diastereomers has proven difficult because the metal–peptide complex is hard to crystallize for structural analysis. In this study, we isolated diastereomers of [99TcO] and [ReO] complexes of several tripeptide ligands that model the metal-chelator region of [99mTcO]depreotide. Using X-ray crystallography, we observed that the early-eluting peak (A) corresponds to the anti diastereomer, in which the Tc═O group is on the opposite side of the plane formed by the ligand backbone relative to the pendant groups of the tripeptide ligand, and the later-eluting peak (B) corresponds to the syn diastereomer, in which the Tc═O group is on the same side of the plane as the residues of the tripeptide. 1H NMR and circular dichroism (CD) spectroscopy report on the metal environment and prove diagnostic for the syn or anti diastereomer, and we identified characteristic features from these techniques that can be used to assign the diastereomer profile in 99mTc peptide radiopharmaceuticals such as [99mTcO]depreotide and in 188Re peptide radiotherapeutic agents. Crystallography, potentiometric titration and NMR results provided insights into the chemistry occurring under physiological conditions.
The tripeptide complexes where lysine is the second amino acid crystallized in a deprotonated metallo-amide form, possessing a short N1–M bond. The pKa measurements of the N1 amine (pKa ~5.6) suggested that this amine is rendered more acidic by both metal complexation and the presence of the lysine residue. Furthermore, peptide chelators incorporating a lysine (like the chelator of [TcO]depreotide) likely exist in the deprotonated form in vivo, comprising a neutral metal center. Deprotonation possibly mediates the interconversion process between the syn and anti diastereomers. The N1 amine group on non-lysine-containing metallopeptides is not as acidic (pKa ~6.8) and does not deprotonate and crystallize as do the metallo-amide species. Three of the tripeptide ligands (FGC, FSC, and FKC) were radiolabeled with 99mTc, and the individual syn and anti isomers were isolated for biodistribution studies in normal female nude mice. The main organs of uptake were the liver, intestines, and kidneys, with the FGC compounds exhibiting the highest liver uptake. In comparing the diastereomers, the syn compounds had substantially higher organ uptake and slower blood clearance than the anti compounds.
PMCID: PMC2270398  PMID: 17691766
16.  Sterically Demanding Multidentate Ligand Tris[(2-(6-methylpyridyl))methyl]amine Slows Exchange and Enhances Solution State Ligand Proton NMR Coupling to 199Hg(II) 
Inorganic chemistry  2002;41(9):2529-2536.
The solution-state coordination chemistry of Hg(ClO4)2 with tris[(2-(6-methylpyridyl))methyl]amine (TLA) was investigated in acetonitrile-d3 by proton NMR. Although Hg(II) is a d10 metal ion commonly associated with notoriously rapid exchange between coordination environments, as many as six ligand environments were observed to be in slow exchange on the chemical-shift time scale at selected metal-to-ligand ratios. One of these ligand environments showed extensive heteronuclear coupling between protons and 199Hg and was assigned to the complex [Hg(TLA)]2+. The coupling constant 5J(1H–199Hg) = 8 Hz associated with this complex is the first example of five-bond coupling in a nitrogen coordination compound of Hg(II). The spectral complexity of related studies conducted in acetone-d6 precluded analysis of the coordination equilibria. Crystallographic characterization of the T-shaped complex [Hg(TLAH)(CH2COCH3)](ClO4)2 (1), in which two pyridyl rings are pendant, suggested that the acidity of acetone, combined with the poor coordinating ability of the neutral solvent, adds further complexity to the solution equilibria. The complex crystallizes in the triclinic space group P1̄ with a = 9.352(2) Å, b = 12.956(2) Å, c = 14.199(2) Å, α = 115.458(10)°, β = 90.286(11)°, γ = 108.445(11)°, and Z = 2. The Hg–Namine, Hg–Npyridyl and Hg–C bond lengths in the complex are 2.614(4), 2.159(4) and 2.080(6) Å, respectively. The relevance to the development of 199Hg NMR as a metallobioprobe is discussed.
PMCID: PMC1560100  PMID: 11978122
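As a quick consistency check on the reported cell, the volume of a triclinic unit cell follows from V = abc√(1 − cos²α − cos²β − cos²γ + 2 cos α cos β cos γ). The sketch below applies this formula to the parameters quoted above. It gives roughly 1450 Å³, or about 730 Å³ per formula unit with Z = 2; this value is computed here rather than taken from the paper.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (cubic angstrom) from cell lengths (angstrom) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca ** 2 - cb ** 2 - cg ** 2 + 2 * ca * cb * cg)


# Cell of [Hg(TLAH)(CH2COCH3)](ClO4)2 as quoted above (P-1, Z = 2).
V = cell_volume(9.352, 12.956, 14.199, 115.458, 90.286, 108.445)
volume_per_formula_unit = V / 2
```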
17.  A Study of the Hydration of the Alkali Metal Ions in Aqueous Solution 
Inorganic Chemistry  2011;51(1):425-438.
The hydration of the alkali metal ions in aqueous solution has been studied by large-angle X-ray scattering (LAXS) and double difference infrared spectroscopy (DDIR). The structures of the dimethyl sulfoxide solvated alkali metal ions in solution have been determined to support the studies in aqueous solution. The results of the LAXS and DDIR measurements show that the sodium, potassium, rubidium and cesium ions are all weakly hydrated, with only a single shell of water molecules. The smaller lithium ion is more strongly hydrated, most probably with a second hydration shell present. The influence of the rubidium and cesium ions on the water structure was found to be very weak, and this effect could not be quantified reliably because the O–D stretching bands of partially deuterated water bound to these metal ions are insufficiently separated from the O–D stretching bands of the bulk water. Aqueous solutions of sodium, potassium and cesium iodide and of cesium and lithium hydroxide have been studied by LAXS, and M–O bond distances have been determined fairly accurately, except for lithium. However, the number of water molecules bound to the alkali metal ions is very difficult to determine from the LAXS measurements, because the number of distances and the temperature factor are strongly correlated. A thorough analysis of M–O bond distances in solid alkali metal compounds with ligands binding through oxygen has been made from available structure databases. There is a relatively strong correlation between M–O bond distances and coordination numbers even for the alkali metal ions, although the M–O interactions are weak and the number of complexes of potassium, rubidium and cesium with well-defined coordination geometry is very small.
The mean M–O bond distances in the hydrated sodium, potassium, rubidium and cesium ions in aqueous solution have been determined to be 2.43(2), 2.81(1), 2.98(1) and 3.07(1) Å, respectively, corresponding to six-, seven-, eight- and eight-coordination. These coordination numbers are supported by the linear relationship between the hydration enthalpies and the M–O bond distances. This correlation indicates that the hydrated lithium ion is four-coordinate in aqueous solution. New ionic radii are proposed for four- and six-coordinate lithium(I), 0.60 and 0.79 Å, respectively, as well as for five- and six-coordinate sodium(I), 1.02 and 1.07 Å, respectively. The ionic radii for six- and seven-coordinate K+, 1.38 and 1.46 Å, respectively, and for eight-coordinate Rb+ and Cs+, 1.64 and 1.73 Å, respectively, are confirmed from previous studies. The M–O bond distances in dimethyl sulfoxide solvated sodium, potassium, rubidium and cesium ions in solution are very similar to those observed in aqueous solution.
The hydration of alkali metal ions has been studied by large angle X-ray scattering, LAXS, and double difference infrared spectroscopy. The obtained M−O bond distances from LAXS have been compared to relevant crystal structures, conclusions about hydration numbers in aqueous solution have been made, and new ionic radii have been proposed. Hydration numbers of six, seven, eight and eight are proposed for the sodium, potassium, rubidium and cesium ions in aqueous solution.
PMCID: PMC3250073  PMID: 22168370
18.  STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins 
Nucleic Acids Research  2004;32(Web Server issue):W500-W502.
STRIDE is a software tool for secondary structure assignment from atomic resolution protein structures. It implements a knowledge-based algorithm that makes combined use of hydrogen bond energy and statistically derived backbone torsional angle information and is optimized to return resulting assignments in maximal agreement with crystallographers' designations. The STRIDE web server provides access to this tool and allows visualization of the secondary structure, as well as contact and Ramachandran maps for any file uploaded by the user with atomic coordinates in the Protein Data Bank (PDB) format. A searchable database of STRIDE assignments for the latest PDB release is also provided. The STRIDE server is accessible from
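STRIDE combines hydrogen-bond energy with backbone torsion angles, and the server also draws Ramachandran maps. The torsion-angle part starts from a signed dihedral angle computed from four atom positions. The sketch below shows only that standard geometric step (using the usual atan2 formulation); it is not STRIDE's assignment algorithm, and the helper names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> list:
    return [a[k] - b[k] for k in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[k] * b[k] for k in range(3))


def _cross(a: Vec, b: Vec) -> list:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For residue i, phi would be `dihedral(C[i-1], N[i], CA[i], C[i])` and psi would be `dihedral(N[i], CA[i], C[i], N[i+1])`. Plotting those pairs gives a Ramachandran map.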
PMCID: PMC441567  PMID: 15215436
19.  Evidence for a Dual Role of an Active Site Histidine in α-Amino-β-Carboxymuconate-ε-Semialdehyde Decarboxylase 
Biochemistry  2012;51(29):5811-5821.
The previously reported crystal structures of α-amino-β-carboxymuconate-ε-semialdehyde decarboxylase (ACMSD) show a five-coordinate Zn(II)(His)₃(Asp)(OH₂) active site. The water ligand is H-bonded to a conserved His228 residue adjacent to the metal center in ACMSD from Pseudomonas fluorescens (PfACMSD). Site-directed mutagenesis of His228 to tyrosine and glycine in the present study results in complete or significant loss of activity. Metal analysis shows that H228Y and H228G contain iron rather than zinc, indicating that this residue plays a role in the metal selectivity of the protein. As-isolated H228Y displays a blue color, which is not seen in wild-type ACMSD. Quinone staining and resonance Raman analyses indicate that the blue color originates from Fe(III)-tyrosinate ligand-to-metal charge transfer (LMCT). Co(II)-substituted H228Y ACMSD is brown in color and exhibits an EPR spectrum showing a high-spin Co(II) center with a well-resolved ⁵⁹Co (I = 7/2) eight-line hyperfine splitting pattern. The X-ray crystal structures of the as-isolated Fe-H228Y (2.8 Å), Co- (2.4 Å) and Zn-substituted H228Y (2.0 Å resolution) support the spectroscopic assignment of metal ligation of the Tyr228 residue. The crystal structure of Zn-H228G (2.6 Å) was also solved. These four structures show that the water ligand present in WT Zn-ACMSD is either missing (Fe-H228Y, Co-H228Y and Zn-H228G) or disrupted (Zn-H228Y) in response to His228 mutation. Together, these results highlight the importance of His228 for PfACMSD's metal specificity as well as for maintaining a water molecule as a ligand of the metal center. His228 is thus proposed to play a role in activating the metal-bound water ligand for subsequent nucleophilic attack on the substrate.
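The eight-line pattern follows from the standard first-order hyperfine rule: n equivalent nuclei of spin I split an EPR line into 2nI + 1 components. For a single ⁵⁹Co nucleus (n = 1, I = 7/2) this gives:

```latex
N_{\text{lines}} = 2nI + 1 = 2(1)\left(\tfrac{7}{2}\right) + 1 = 8
```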
PMCID: PMC3419591  PMID: 22746257
20.  Halogen bonds in some dihalogenated phenols: applications to crystal engineering 
IUCrJ  2013;1(Pt 1):49-60.
The preference of Br to form type II contacts over type I is explored by various techniques. The mechanical properties of some dihalogenated phenols are correlated with their structures.
3,4-Dichlorophenol (1) crystallizes in the tetragonal space group I4₁/a with a short axis of 3.7926 (9) Å. The structure is unique in that both type I and type II Cl⋯Cl interactions are present, these contact types being distinguished by the angle ranges of the respective C—Cl⋯Cl angles. The present study shows that these two types of contacts are utterly different. The crystal structures of 4-bromo-3-chlorophenol (2) and 3-bromo-4-chlorophenol (3) have been determined. The crystal structure of (2) is isomorphous to that of (1), with the Br atom in the 4-position participating in a type II interaction. However, the monoclinic P2₁/c packing of compound (3) is different; while the structure still has O—H⋯O hydrogen bonds, the tetramer O—H⋯O synthon seen in (1) and (2) is not seen. Rather than a type I Br⋯Br interaction, which would have been mandated if (3) were isomorphous to (1) and (2), Br forms a Br⋯O contact wherein its electrophilic character is clearly evident. Crystal structures of the related compounds 4-chloro-3-iodophenol (4) and 3,5-dibromophenol (5) were also determined. A computational survey of the structural landscape was undertaken for (1), (2) and (3), using a crystal structure prediction protocol in space groups P2₁/c and I4₁/a with the COMPASS26 force field. While both tetragonal and monoclinic structures are energetically reasonable for all compounds, the fact that (3) takes the latter structure indicates that Br prefers type II over type I contacts. In order to differentiate further between type I and type II halogen contacts, which, being chemically distinct, are expected to have different distance fall-off properties, a variable-temperature crystallography study was performed on compounds (1), (2) and (4). Length variations with temperature are greater for type II contacts than for type I. 
The type II Br⋯Br interaction in (2) is stronger than the corresponding type II Cl⋯Cl interaction in (1), leading to elastic bending of the former upon application of mechanical stress, which contrasts with the plastic deformation of (1). The observation of elastic deformation in (2) is noteworthy: because it can be explained by the strengths of the respective halogen bonds, it could also be taken as a good starting model for future property design. Cl/Br isostructurality is studied with the Cambridge Structural Database, and the results indicate that this isostructurality is based on the shape and size similarity of Cl and Br rather than on any chemical resemblance.
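Type I and type II contacts are told apart by the two angles θ1 = C1–X1⋯X2 and θ2 = C2–X2⋯X1. The sketch below computes both angles from Cartesian coordinates and applies a commonly used cut-off: type I if |θ1 − θ2| ≤ 15°, type II if |θ1 − θ2| ≥ 30°. Those thresholds and the function names are assumptions of this example. The abstract only says the types are distinguished by angle ranges.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = [a[k] - b[k] for k in range(3)]
    v2 = [c[k] - b[k] for k in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos_t = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_t))


def classify_contact(c1: Vec, x1: Vec, x2: Vec, c2: Vec,
                     type1_max: float = 15.0,
                     type2_min: float = 30.0) -> Tuple[float, float, str]:
    """Classify an X1···X2 contact from the C1–X1···X2 and C2–X2···X1 angles.
    Threshold values are an assumed convention, not taken from the paper."""
    theta1 = angle_deg(c1, x1, x2)
    theta2 = angle_deg(c2, x2, x1)
    diff = abs(theta1 - theta2)
    if diff <= type1_max:
        kind = "type I"        # roughly symmetric: theta1 ~ theta2
    elif diff >= type2_min:
        kind = "type II"       # bent: e.g. theta1 ~ 180°, theta2 ~ 90°
    else:
        kind = "intermediate"
    return theta1, theta2, kind
```

Feeding in the C, X, X, C coordinates of each short halogen contact from a CIF yields the angle pair and a label. Contacts that fall between the two thresholds are left as "intermediate" rather than forced into either class.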
PMCID: PMC4104968
crystal engineering; crystal structure prediction; elastic deformation; intermolecular interaction
21.  Data-Driven High-Throughput Prediction of the 3D Structure of Small Molecules: Review and Progress 
Accurate prediction of the 3D structure of small molecules is essential in order to understand their physical, chemical, and biological properties including how they interact with other molecules. Here we survey the field of high-throughput methods for 3D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system’s performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to the state-of-the-art, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD [99.6% organic, 94.6% metal-organic] whereas the widely used commercial method CORINA predicts structures for 68.5% [98.5% organic, 51.6% metal-organic]. On the common subset of molecules predicted by both methods COSMOS makes predictions with an average speed per molecule of 0.15s [0.10s organic, 0.21s metal-organic], and an average RMSD of 1.57Å [1.26Å organic, 1.90Å metal-organic], and CORINA makes predictions with an average speed per molecule of 0.13s [0.18s organic, 0.08s metal-organic], and an average RMSD of 1.60Å [1.13Å organic, 2.11Å metal-organic]. COSMOS is available through the ChemDB chemoinformatics web portal at:
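Accuracy above is reported as an average RMSD between predicted and reference structures. A minimal sketch of that metric follows. It assumes the atoms are already matched one-to-one and the two structures are already superimposed. A real comparison, presumably including the one behind the COSMOS/CORINA figures, would first apply an optimal rigid-body fit (e.g. the Kabsch algorithm), which is omitted here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation (same units as the input, e.g. Å) over
    matched atoms. ASSUMES both structures are already superimposed: no
    fitting step is performed here."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, r in zip(pred, ref):
        total += sum((pi - ri) ** 2 for pi, ri in zip(p, r))
    return math.sqrt(total / len(pred))
```

Averaging this value over every molecule that both methods predict gives a per-method mean RMSD of the kind quoted above.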
PMCID: PMC3081951  PMID: 21417267
22.  A crystallographic perspective on sharing data and knowledge 
The crystallographic community is in many ways an exemplar of the benefits and practices of sharing data. Since the inception of the technique, virtually every published crystal structure has been made available to others. This has been achieved through the establishment of several specialist data centres, including the Cambridge Crystallographic Data Centre, which produces the Cambridge Structural Database. Containing curated structures of small organic molecules, some containing a metal, the database has been produced for almost 50 years. This has required the development of complex informatics tools and an environment allowing expert human curation. As importantly, a financial model has evolved which has, to date, ensured the sustainability of the resource. However, the opportunities afforded by technological changes and changing attitudes to sharing data make it an opportune moment to review current practices.
PMCID: PMC4196029  PMID: 25091065
Crystallography; Data; Knowledge; Sharing; Sustainability
23.  Protein Side-Chain Resonance Assignment and NOE Assignment Using RDC-Defined Backbones without TOCSY Data 
Journal of biomolecular NMR  2011;50(4):371-395.
One bottleneck in NMR structure determination lies in the laborious and time-consuming process of side-chain resonance and NOE assignments. Compared to the well-studied backbone resonance assignment problem, automated side-chain resonance and NOE assignments are relatively less explored. Most NOE assignment algorithms require nearly complete side-chain resonance assignments from a series of through-bond experiments such as HCCH-TOCSY or HCCCONH. Unfortunately, these TOCSY experiments perform poorly on large proteins. To overcome this deficiency, we present a novel algorithm, called NASCA (NOE Assignment and Side-Chain Assignment), to automate both side-chain resonance and NOE assignments and to perform high-resolution protein structure determination in the absence of any explicit through-bond experiment to facilitate side-chain resonance assignment, such as HCCH-TOCSY. After casting the assignment problem into a Markov Random Field (MRF), NASCA extends and applies combinatorial protein design algorithms to compute optimal assignments that best interpret the NMR data. The MRF captures the contact map information of the protein derived from NOESY spectra, exploits the backbone structural information determined by RDCs, and considers all possible side-chain rotamers. The complexity of the combinatorial search is reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is employed to find a set of optimal side-chain resonance assignments that best fit the NMR data. These side-chain resonance assignments are then used to resolve the NOE assignment ambiguity and compute high-resolution protein structures. Tests on five proteins show that NASCA assigns resonances for more than 90% of side-chain protons, and achieves about 80% correct assignments. 
The final structures computed using the NOE distance restraints assigned by NASCA have backbone RMSDs of 0.8–1.5 Å from the reference structures determined by traditional NMR approaches.
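NASCA prunes candidate assignments with dead-end elimination (DEE) before an A* search. The sketch below shows the generic original DEE criterion on a toy pairwise energy model, where each position i has discrete choices r. Choice r is eliminated if some other choice t at the same position satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). The energy tables are hypothetical and are not NASCA's scoring function. An exhaustive search over the survivors stands in for the A* step.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """E(i_r, j_s); pair_e stores each pair once under key (i, j) with i < j."""
    return pair_e[(i, j)][(r, s)] if i < j else pair_e[(j, i)][(s, r)]


def dee_prune(self_e, pair_e):
    """Iterate the original DEE criterion (energy minimization) until no
    further choice can be eliminated; returns the surviving choices."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    best_r = self_e[i][r] + sum(
                        min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_r > worst_t:   # r can never beat t: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(assign, self_e, pair_e):
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    e += sum(pair_energy(pair_e, i, assign[i], j, assign[j])
             for i in range(n) for j in range(i + 1, n))
    return e


def best_assignment(alive, self_e, pair_e):
    """Exhaustive search over surviving choices (stand-in for A*)."""
    return min(product(*[sorted(a) for a in alive]),
               key=lambda a: total_energy(a, self_e, pair_e))


# Hypothetical two-position, two-choice example.
SELF_E = [[0.0, 5.0], [0.0, 0.0]]
PAIR_E = {(0, 1): {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0}}
```

DEE only removes choices that provably cannot appear in the minimum-energy solution, so the later search is smaller but reaches the same optimum.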
PMCID: PMC3155202  PMID: 21706248
Nuclear magnetic resonance (NMR); side-chain resonance assignment; nuclear Overhauser effect (NOE) assignment; residual dipolar coupling (RDC); protein structure determination
24.  Domain-based small molecule binding site annotation 
BMC Bioinformatics  2006;7:152.
Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites.
Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at . The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB; 472 (60%) of the predicted interactions identically matched the experimental small molecule, and of these, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives.
By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.
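Transferring a binding site to a query sequence amounts to walking a pairwise alignment and mapping each annotated template residue to the query residue aligned with it. The sketch below assumes the alignment is given as two equal-length gapped strings ('-' marks a gap), and it reports the fraction of site residues that map to an identical residue. That fraction is only a hypothetical stand-in for the "ligand residue identity" term; the abstract does not give SMID-BLAST's actual scoring formula.

```python
from typing import Dict, Set, Tuple


def transfer_site(template_aln: str, query_aln: str,
                  site: Set[int]) -> Tuple[Dict[int, int], float]:
    """Map 0-based template binding-site residue indices to 0-based query
    indices through a gapped pairwise alignment.

    Returns (mapping, identity), where identity is the fraction of site
    residues aligned to an identical query residue (hypothetical metric).
    Site residues aligned to a gap in the query are dropped from the mapping.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned strings must have equal length")
    if not site:
        raise ValueError("binding site must contain at least one residue")
    mapping: Dict[int, int] = {}
    identical = 0
    ti = qi = 0  # ungapped residue counters for template and query
    for t_char, q_char in zip(template_aln, query_aln):
        if t_char != "-" and q_char != "-" and ti in site:
            mapping[ti] = qi
            if t_char == q_char:
                identical += 1
        if t_char != "-":
            ti += 1
        if q_char != "-":
            qi += 1
    return mapping, identical / len(site)
```

For example, with template `ACDE-FG`, query `AC-EHFG` and site residues {1, 2, 4}, residue 2 (D) falls on a query gap and is dropped, while residues 1 and 4 map to query positions 1 and 4.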
PMCID: PMC1435939  PMID: 16545112
25.  BioMe: biologically relevant metals 
Nucleic Acids Research  2012;40(Web Server issue):W352-W357.
In this article, we introduce BioMe (biologically relevant metals), a web-based platform for calculation of various statistical properties of metal-binding sites. Users can obtain the following statistical properties: presence of selected ligands in the metal coordination sphere, distribution of coordination numbers, percentage of metal ions coordinated by the combination of selected ligands, distribution of monodentate and bidentate metal–carboxyl binding for ASP and GLU, percentage of particular binuclear metal centers, distribution of coordination geometry, descriptive statistics for a metal ion–donor distance and percentage of the selected metal ions coordinated by each of the selected ligands. Statistics are presented in numerical and graphical forms. The underlying database contains information about all contacts within 3 Å of a metal ion found in the asymmetric unit of the crystal. The stored information for each metal ion includes the Protein Data Bank code, structure determination method, types of metal-binding chains [protein, ribonucleic acid (RNA), deoxyribonucleic acid (DNA), water and other], names of the bound ligands (amino acid residue, RNA nucleotide, DNA nucleotide, water and other), the coordination number, the coordination geometry and, if applicable, any other metal(s). BioMe is on a regular weekly update schedule. It is accessible at
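The statistics above start from one step: collecting every atom within 3 Å of a metal ion, counting them as the coordination number, and pooling the metal–donor distances. The sketch below shows that step on plain coordinate lists. The input layout (a metal position plus `(label, xyz)` atom tuples) and the function names are assumptions of this example and do not reflect BioMe's database schema.

```python
import math
from collections import Counter
from statistics import mean, median


def coordination_sphere(metal, atoms, cutoff=3.0):
    """Return [(label, distance)] for atoms within `cutoff` Å of the metal,
    sorted by distance."""
    shell = [(label, math.dist(metal, xyz)) for label, xyz in atoms]
    shell = [(label, d) for label, d in shell if d <= cutoff]
    shell.sort(key=lambda item: item[1])
    return shell


def summarize(sites, cutoff=3.0):
    """Coordination-number histogram and descriptive statistics of the
    metal-donor distances over a list of (metal_xyz, atoms) sites."""
    cn_counts = Counter()
    distances = []
    for metal, atoms in sites:
        shell = coordination_sphere(metal, atoms, cutoff)
        cn_counts[len(shell)] += 1
        distances.extend(d for _, d in shell)
    stats = {}
    if distances:
        stats = {"mean": mean(distances), "median": median(distances),
                 "min": min(distances), "max": max(distances)}
    return cn_counts, stats
```

A fixed distance cutoff is the simplest possible rule. It can pick up second-shell atoms or miss long bonds, which is one reason a per-site listing of contacts is useful alongside the summary numbers.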
PMCID: PMC3394320  PMID: 22693222