A new piece of software for statistical analysis of geometrical, chemical and crystallographic data within the Cambridge Structural Database System is described. This software has been written specifically to deal with chemical structure data and crucially provides simultaneous visualization of the three-dimensional structural information.
A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused on structural analysis, such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments.
data analysis; computer programs; Cambridge Structural Database; substructure; Vista
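The cone-angle correction mentioned in the Mercury/Vista abstract compensates for a purely geometric bias in angular distributions. For a randomly placed acceptor, the density of D–H⋯A angles θ is proportional to sin θ, so bins near linearity (180°) are under-populated even when there is no chemical preference. The sketch below is not the Mercury implementation. It divides each histogram bin by the fraction of the sphere that the bin subtends. The function name and the 10° default bin width are illustrative choices.

```python
import math

def cone_corrected_histogram(angles_deg, bin_width=10.0):
    """Histogram D-H...A angles (0-180 degrees) and apply a cone-angle correction.

    Each raw count is divided by the fraction of the unit sphere covered by its bin,
    (cos(lo) - cos(hi)) / 2, which removes the sin(theta) bias of random orientations.
    Returns (raw_counts, corrected_counts).
    """
    n_bins = int(math.ceil(180.0 / bin_width))
    counts = [0] * n_bins
    for angle in angles_deg:
        # an angle of exactly 180 degrees goes into the last bin
        counts[min(int(angle // bin_width), n_bins - 1)] += 1
    corrected = []
    for i, count in enumerate(counts):
        lo = math.radians(i * bin_width)
        hi = math.radians(min((i + 1) * bin_width, 180.0))
        sphere_fraction = (math.cos(lo) - math.cos(hi)) / 2.0
        corrected.append(count / sphere_fraction)
    return counts, corrected
```

With this correction, an isotropic set of angles gives a roughly flat corrected histogram. Any peak that remains near 180° therefore reflects a genuine preference for linear hydrogen bonds.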
The educational value of three-dimensional crystal structures in the Cambridge Structural Database (CSD) is discussed in the context of practical use cases and the availability of a free teaching subset of the CSD that can be used in conjunction with WebCSD, an application that provides internet access to CSD information content.
The Cambridge Structural Database (CSD) is a vast and ever-growing compendium of accurate three-dimensional structures that has massive chemical diversity across organic and metal–organic compounds. For these reasons, the CSD is finding significant uses in chemical education, and these applications are reviewed. As part of the teaching initiative of the Cambridge Crystallographic Data Centre (CCDC), a teaching subset of more than 500 CSD structures has been created that illustrate key chemical concepts, and a number of teaching modules have been devised that make use of this subset in a teaching environment. All of this material is freely available from the CCDC website, and the subset can be freely viewed and interrogated using WebCSD, an internet application for searching and displaying CSD information content. In some cases, however, the complete CSD System is required for specific educational applications, and some examples of these more extensive teaching modules are also discussed. The educational value of visualizing real three-dimensional structures, and of handling real experimental results, is stressed throughout.
Cambridge Structural Database; crystallographic education; WebCSD
Assigning bond orders is an essential step in characterizing a chemical structure correctly for force-field-based simulations. Several methods have been developed for this purpose; each has advantages but also limitations. Here, an automatic algorithm is presented for assigning chemical connectivity and bond orders to organic molecules regardless of whether hydrogen atoms are present; it requires only three-dimensional coordinates and element identities. The algorithm uses hard rules, length rules and conjugation rules to fix the structures. The hard rules determine bond orders from basic chemical rules; the length rules determine bond orders from the distance between two atoms, using a set of predefined values for different bond types; and the conjugation rules determine bond orders using the length information derived from the previous rule, the bond angles and some small structural patterns. The algorithm is extensively evaluated on three datasets and achieves good prediction accuracy on all of them. Finally, the limitations of the algorithm and possible future improvements are discussed.
Bond type perception; Bond order; Chemical bond; Molecular modeling
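The length rules described in the bond-order abstract can be pictured as a nearest-reference lookup. The sketch below is a minimal illustration, not the published algorithm. The reference lengths are rounded textbook values chosen for the example, and the 0.2 Å bonding tolerance, the table name and the function name are all hypothetical.

```python
import math

# Hypothetical reference bond lengths in angstroms (rounded textbook values, not the
# paper's predefined table), keyed by the alphabetically sorted element pair.
REF_LENGTHS = {
    ("C", "C"): {1: 1.54, 2: 1.34, 3: 1.20},
    ("C", "N"): {1: 1.47, 2: 1.28, 3: 1.16},
    ("C", "O"): {1: 1.43, 2: 1.21},
}
BOND_TOLERANCE = 0.2  # hypothetical: longer than single-bond length + tolerance => not bonded

def length_rule(elem_a, elem_b, pos_a, pos_b):
    """Bond order whose reference length is nearest the observed distance,
    or None if the pair is not tabulated or is too far apart to be bonded."""
    table = REF_LENGTHS.get(tuple(sorted((elem_a, elem_b))))
    if table is None:
        return None
    d = math.dist(pos_a, pos_b)
    if d > table[1] + BOND_TOLERANCE:
        return None
    return min(table, key=lambda order: abs(table[order] - d))
```

A benzene C–C distance of about 1.39 Å lies closer to the double-bond reference than to the single-bond one, so this rule alone would mislabel an aromatic ring. Cases of this kind are what the conjugation rules, which also use bond angles and small structural patterns, are meant to resolve.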
The new web-based application WebCSD is introduced, which provides a range of facilities for searching the Cambridge Structural Database within a standard web browser. Search options within WebCSD include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching.
WebCSD, a new web-based application developed by the Cambridge Crystallographic Data Centre, offers fast searching of the Cambridge Structural Database using only a standard internet browser. Search facilities include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching. Text, chemical diagrams and three-dimensional structural information can all be studied in the results browser using the efficient entry summaries and embedded three-dimensional viewer.
WebCSD; computer programs; database searching; Cambridge Structural Database; similarity searching; substructure; reduced cell
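Molecular similarity searching of the kind WebCSD offers is commonly based on comparing binary structural fingerprints with the Tanimoto coefficient. The abstract does not say which fingerprint or metric WebCSD uses. The sketch below therefore simply assumes fingerprints stored as sets of on-bit indices. The function names, the entry identifiers and the 0.7 threshold are illustrative.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / union

def similarity_search(query_fp, database, threshold=0.7):
    """Return (entry_id, score) pairs with score >= threshold, most similar first."""
    scored = ((entry_id, tanimoto(query_fp, fp)) for entry_id, fp in database.items())
    return sorted((hit for hit in scored if hit[1] >= threshold),
                  key=lambda hit: hit[1], reverse=True)
```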
STRIDE is a software tool for secondary structure assignment from atomic resolution protein structures. It implements a knowledge-based algorithm that makes combined use of hydrogen bond energy and statistically derived backbone torsional angle information and is optimized to return resulting assignments in maximal agreement with crystallographers' designations. The STRIDE web server provides access to this tool and allows visualization of the secondary structure, as well as contact and Ramachandran maps for any file uploaded by the user with atomic coordinates in the Protein Data Bank (PDB) format. A searchable database of STRIDE assignments for the latest PDB release is also provided. The STRIDE server is accessible from http://webclu.bio.wzw.tum.de/stride/.
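STRIDE combines hydrogen-bond energies with backbone torsion-angle statistics. Its knowledge-based scoring is not reproduced here. The sketch below covers only the geometric input: it computes the φ (C′–N–Cα–C) and ψ (N–Cα–C–N′) torsions from Cartesian coordinates using the standard atan2 formula, under which clockwise rotation is positive. The helper names are illustrative.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone phi (C'-N-CA-C) and psi (N-CA-C-N') torsions for one residue."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```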
The electronic structure of a genuine paramagnetic des-oxo Mo(V) catalytic intermediate in the reaction of dimethyl sulfoxide reductase (DMSOR) with (CH3)3NO has been probed by EPR, electronic absorption and MCD spectroscopies. EPR spectroscopy reveals rhombic g- and A-tensors that indicate a low-symmetry geometry for this intermediate and a singly occupied molecular orbital (SOMO) that is dominantly metal centered. The excited state spectroscopic data were interpreted in the context of electronic structure calculations, and this has resulted in a full assignment of the observed magnetic circular dichroism (MCD) and electronic absorption bands, a detailed understanding of the metal-ligand bonding scheme, and an evaluation of the Mo(V) coordination geometry and Mo(V)-Sdithiolene covalency as it pertains to the stability of the intermediate and electron transfer regeneration. Finally, the relationship between des-oxo Mo(V) and des-oxo Mo(IV) geometric and electronic structures is discussed relative to the reaction coordinate in members of the DMSOR enzyme family.
DMSO Reductase; dithiolene; molybdenum; magnetic circular dichroism; electronic structure; electron paramagnetic resonance; molecular orbital; redox orbital; reaction coordinate
Divalent zinc triad metal ion complexes of type M(L)2(ClO4)2 (L = N-(2-pyridylmethyl)-N-(2-(methylthio)ethyl)amine) with N4S2 metal coordination spheres were isolated and characterized by X-ray crystallography and variable-temperature proton NMR. Although bis-tridentate chelates have nine geometric isomers, the crystallographically characterized complexes of all three metal ions had trans facial octahedral coordination geometry with Ci symmetry. Despite the low coordination number and geometric preferences of d10 metal ions, which facilitate inter- and intramolecular exchange processes, dilute solutions of these bis-tridentate chelates exhibited slow geometric isomerization. Symmetry, sterics and shielding arguments supported specific isomeric assignments for the major and minor chemical shift environments observed at low temperature. At elevated temperature, rapid intramolecular exchange occurred for all three complexes, but slow intermolecular exchange on the coupling constant time scale was evidenced through detection of J(Hg–H) interactions for [Hg(L)2]2+. These unusual observations are discussed in the context of the zinc triad metal ion coordination chemistry of related bis-tridentate chelates.
Isomerization; NMR; Single crystal X-ray structure; Group 12; bis-tridentate chelate
The electronic properties of Thermus thermophilus CuA in the oxidized form were studied by 1H and 13C NMR spectroscopy. All the 1H and 13C resonances from cysteine and imidazole ligands were observed and assigned in a sequence-specific fashion. The detection of net electron spin density on a peptide moiety is attributed to the presence of a H-bond to a coordinating sulfur atom. This hydrogen-bond is conserved in all natural CuA variants, and is important to maintain the electronic structure of the metal site, rendering the two Cys ligands nonequivalent. The anomalous temperature dependence of the chemical shifts is explained by the presence of a low-lying excited state located about 600 cm−1 above the ground state. The room temperature shifts can be described as the thermal average of a σu* ground state and a πu excited state. These results provide a detailed description of the electronic structure of the CuA site at atomic resolution in solution at physiologically relevant temperature.
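The CuA abstract describes the room-temperature shifts as a thermal average over a ground state and an excited state about 600 cm−1 higher. A minimal two-level Boltzmann sketch of that weighting follows. It assumes equal degeneracies by default and temperature-independent shifts for each state. Real paramagnetic contact shifts carry their own 1/T dependence as well, so treat this as an illustration of the population weighting only. The function names are illustrative.

```python
import math

K_B_CM = 0.6950348  # Boltzmann constant in cm^-1 per kelvin

def excited_population(delta_e_cm, temp_k, g_ground=1, g_excited=1):
    """Boltzmann population of the upper level of a two-level system."""
    w = (g_excited / g_ground) * math.exp(-delta_e_cm / (K_B_CM * temp_k))
    return w / (1.0 + w)

def averaged_shift(shift_ground, shift_excited, delta_e_cm, temp_k):
    """Population-weighted average of the shifts of the two states (equal degeneracies)."""
    p = excited_population(delta_e_cm, temp_k)
    return (1.0 - p) * shift_ground + p * shift_excited
```

With ΔE = 600 cm−1, the excited-state population comes out at roughly 5% near 300 K and grows with temperature. Even a small population of this kind can produce the anomalous temperature dependence of the averaged shifts.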
The structure of the ybeY protein from E. coli, which contains a bound metal ion, is reported at 2.7 Å resolution.
The three-dimensional crystallographic structure of the ybeY protein from Escherichia coli (SwissProt entry P77385) is reported at 2.7 Å resolution. YbeY is a hypothetical protein that belongs to the UPF0054 family. The structure reveals that the protein binds a metal ion in a tetrahedral geometry. Three coordination sites are provided by histidine residues, while the fourth might be a water molecule that is not seen in the diffraction map because of its relatively low resolution. X-ray fluorescence analysis of the purified protein suggests that the metal is a nickel ion. The structure of ybeY and its sequence similarity to a number of predicted metal-dependent hydrolases provide a functional assignment for this protein family. The figures and tables of this paper were prepared using semi-automated tools, termed the Autopublish server, developed by the New York Structural GenomiX Research Consortium, with the goal of facilitating the rapid publication of crystallographic structures that emanate from worldwide Structural Genomics efforts, including the NIH-funded Protein Structure Initiative.
Protein Structure Initiative; metalloproteins; nickel; UPF0054 family
Accurate prediction of the 3D structure of small molecules is essential in order to understand their physical, chemical, and biological properties, including how they interact with other molecules. Here we survey the field of high-throughput methods for 3D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system’s performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement over the state of the art, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD [99.6% organic, 94.6% metal-organic], whereas the widely used commercial method CORINA predicts structures for 68.5% [98.5% organic, 51.6% metal-organic]. On the common subset of molecules predicted by both methods, COSMOS makes predictions with an average time per molecule of 0.15 s [0.10 s organic, 0.21 s metal-organic] and an average RMSD of 1.57 Å [1.26 Å organic, 1.90 Å metal-organic], and CORINA makes predictions with an average time per molecule of 0.13 s [0.18 s organic, 0.08 s metal-organic] and an average RMSD of 1.60 Å [1.13 Å organic, 2.11 Å metal-organic]. COSMOS is available through the ChemDB chemoinformatics web portal at http://cdb.ics.uci.edu/.
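The RMSD values compared in the COSMOS abstract measure how far predicted coordinates lie from experimental ones. The abstract does not describe the superposition procedure. A common choice, sketched below with NumPy, is to superimpose the two conformations optimally with the Kabsch algorithm before computing the RMSD. Atom correspondence between the two structures is assumed to be known, and the function name is illustrative.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)            # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection; forbid it
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                # proper rotation mapping P onto Q
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The determinant check matters for chiral molecules. Without it, the fit could "superimpose" a structure onto its mirror image and report an artificially low RMSD.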
Analysis of metal–protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. By measuring the frequency of each amino acid in metal ion binding sites, the positive or negative preferences of each residue for each type of cation were identified. Our approach may be used for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone. The analysis compares data derived separately from high- and medium-resolution structures from the PDB with those from very-high-resolution small-molecule structures in the Cambridge Structural Database (CSD). For high-resolution protein structures, the distribution of metal–protein or metal–water interaction distances agrees quite well with data from the CSD, but the distribution is unrealistically wide for medium-resolution (2.0–2.5 Å) data. Our analysis of cation B-factors versus the average B-factors of atoms in the cation environment reveals that a substantial number of structures contain either an incorrect metal ion assignment or an unusual coordination pattern. A correlation between data resolution and completeness of the metal coordination spheres was also found.
Metalloprotein; protein structure; metal binding
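The metal-site abstract identifies positive or negative residue preferences from amino-acid frequencies in binding sites. One common way to express such a preference, assumed here rather than taken from the paper, is the log2 ratio of a residue's frequency among metal ligands to its frequency in a background set. Running the function separately on the ligands of each cation type gives per-cation preferences. The function and argument names are hypothetical.

```python
import math
from collections import Counter

def residue_propensities(site_residues, background_residues):
    """Log2 ratio of each residue's frequency among metal ligands to its background frequency.

    Positive scores indicate a preference for the metal site, negative scores an avoidance.
    Residue types absent from the sites (score -inf) or from the background are skipped.
    """
    site = Counter(site_residues)
    background = Counter(background_residues)
    n_site = sum(site.values())
    n_background = sum(background.values())
    scores = {}
    for residue, count in site.items():
        if background[residue] == 0:
            continue
        scores[residue] = math.log2((count / n_site) / (background[residue] / n_background))
    return scores
```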
The title compound, [Cu(C4H10NO)I(C4H11NO)], was obtained unintentionally as the product of an attempted synthesis of a Cu/Zn mixed-metal complex using zerovalent copper, zinc(II) oxide and ammonium iodide in pure 2-(dimethylamino)ethanol, in air. The molecular complex has no crystallographically imposed symmetry. The coordination geometry around the metal atom is distorted square-pyramidal. The equatorial coordination around copper involves donor atoms of the bidentate chelating 2-(dimethylamino)ethanol ligand and the 2-(dimethylamino)ethanolate group, which are mutually trans to each other, with four approximately equal short Cu—O/N bond distances. The axial Cu—I bond is substantially elongated. Intermolecular hydrogen-bonding interactions involving the –OH group of the neutral 2-(dimethylamino)ethanol ligand to the O atom of the monodeprotonated 2-(dimethylamino)ethanolate group of the molecule related by the n-glide plane, as indicated by the O⋯O distance of 2.482 (12) Å, form chains of molecules propagating along .
The reaction of nickel(II) nitrate with potassium selenocyanate and pyridazine leads to crystals of the title compound, [Ni(NCSe)2(C4H4N2)4]·2C4H4N2. The NiII atom is coordinated by two terminal N-bonded selenocyanate anions and four pyridazine ligands within a slightly distorted octahedral geometry. The crystal structure contains two crystallographically independent pyridazine molecules in cavities of the structure, which are not coordinated to the metal centres. The structure is pseudo-C-centered due to the positioning of the discrete coordination complexes; the non-coordinating pyridazine molecules, however, break the C-centering. In the subcell, these ligands are disordered around centres of inversion, which do not coincide with the mid-point of the molecules.
The solution-state coordination chemistry of Hg(ClO4)2 with tris[(2-(6-methylpyridyl))methyl]amine (TLA) was investigated in acetonitrile-d3 by proton NMR. Although Hg(II) is a d10 metal ion commonly associated with notoriously rapid exchange between coordination environments, as many as six ligand environments were observed to be in slow exchange on the chemical shift time scale at select metal-to-ligand ratios. One of these ligand environments was associated with extensive heteronuclear coupling between protons and 199Hg and was assigned to the complex [Hg(TLA)]2+. The 5J(1H–199Hg) = 8 Hz associated with this complex is the first example of five-bond coupling in a nitrogen coordination compound of Hg(II). The spectral complexity of related studies conducted in acetone-d6 precluded analysis of coordination equilibria. Crystallographic characterization of the T-shaped complex [Hg(TLAH)(CH2COCH3)](ClO4)2 (1), in which two pyridyl rings are pendant, suggested that the acidity of acetone, combined with the poor coordinating ability of the neutral solvent, adds further complexity to the solution equilibria. The complex crystallizes in the triclinic space group P1̄ with a = 9.352(2) Å, b = 12.956(2) Å, c = 14.199(2) Å, α = 115.458(10)°, β = 90.286(11)°, γ = 108.445(11)°, and Z = 2. The Hg–Namine, Hg–Npyridyl, and Hg–C bond lengths in the complex are 2.614(4), 2.159(4), and 2.080(6) Å, respectively. The relevance to the development of 199Hg NMR as a metallobioprobe is discussed.
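The cell parameters quoted for complex 1 can be checked for consistency by computing the unit-cell volume, which the abstract does not report. The sketch below applies the standard triclinic formula, V = abc·√(1 − cos²α − cos²β − cos²γ + 2 cos α cos β cos γ). The function name is illustrative, and standard uncertainties are ignored.

```python
import math

def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms from lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)

# Cell parameters of complex 1 as quoted above (standard uncertainties omitted)
v_cell = triclinic_volume(9.352, 12.956, 14.199, 115.458, 90.286, 108.445)
```

For a cubic cell, the square-root factor reduces to 1 and the formula gives a³, which is a quick sanity check on any implementation.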
In the title complex, [Cu(C13H10N3O)2]n, the copper(II) cation is located on a crystallographic inversion centre and adopts an elongated octahedral coordination geometry with the equatorial plane provided by trans-arranged bis-N,O-chelating acylhydrazine groups from two ligands and the apices by the N atoms of two pyridine rings belonging to symmetry-related ligands. The ligand adopts a Z conformation about the C=N double bond. The dihedral angle between the pyridine and phenyl rings is 2.99 (13)°. An intraligand C—H⋯N hydrogen bond is observed. In the crystal, each ligand bridges two adjacent metal ions, forming a (4,4) grid layered structure. π–π stacking interactions [centroid–centroid distances in the range 3.569 (4)–3.584 (9) Å] involving rings of adjacent layers result in the formation of a three-dimensional supramolecular network.
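The π–π stacking contacts in this copper grid structure are characterized by ring centroid–centroid distances. A minimal sketch of that measurement follows. It assumes the ring atoms are already in Cartesian (orthogonal) coordinates in ångströms, so fractional coordinates from a CIF would first have to be converted using the cell matrix. The function names are illustrative.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def centroid_distance(ring_a, ring_b):
    """Centroid-to-centroid distance between two rings given as lists of (x, y, z) atom positions."""
    return math.dist(centroid(ring_a), centroid(ring_b))
```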
High-throughput functional protein NMR studies, such as studies of protein interactions or dynamics, require an automated approach for assigning the protein backbone. With the availability of a growing number of protein 3D structures, a new class of automated approaches, called structure-based assignment, has been developed quite recently. Structure-based approaches primarily use NMR input data that are not based on J-coupling and for which connections between residues are not limited by the efficiency of through-bond magnetization transfer. We present here a robust structure-based assignment approach using mainly HN–HN NOE networks, as well as 1H–15N residual dipolar couplings and chemical shifts. The NOEnet complete search algorithm is robust against assignment errors, even for sparse input data. Instead of a unique and partly erroneous assignment solution, NOEnet gives an optimal assignment ensemble with an accuracy equal to or near 100%. We show that even low-precision assignment ensembles give enough information for functional studies, such as modeling of protein complexes. Finally, the combination of NOEnet with a small number of ambiguous J-coupling sequential connectivities yields a high-precision assignment ensemble. NOEnet will be available at http://www.icsn.cnrs-gif.fr/download/nmr.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-009-9390-3) contains supplementary material, which is available to authorized users.
NMR; Assignment; Structure-based; NOE; Network; Chemical shifts; Residual dipolar couplings; NOEnet
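The NOEnet abstract returns an assignment ensemble from a complete search rather than a single answer. That idea can be illustrated with a toy backtracking search. This is not the NOEnet algorithm: it ignores RDCs, chemical shifts and error tolerance. It simply enumerates every one-to-one mapping of spin systems onto residues for which each observed HN–HN NOE falls on a structural contact, for example an HN–HN distance below some cutoff. All names are hypothetical.

```python
def consistent_assignments(spins, residues, noes, contacts):
    """Enumerate all one-to-one mappings of spin systems onto residues such that every
    observed HN-HN NOE (a pair of spin systems) maps onto a structural contact (a pair
    of residues). Returns a list of dicts spin -> residue: the assignment ensemble."""
    contact_set = {frozenset(pair) for pair in contacts}
    partners = {spin: set() for spin in spins}
    for a, b in noes:
        partners[a].add(b)
        partners[b].add(a)
    ensemble = []

    def extend(index, mapping, used):
        if index == len(spins):
            ensemble.append(dict(mapping))
            return
        spin = spins[index]
        for residue in residues:
            if residue in used:
                continue
            # every NOE partner already placed must be in contact with this residue
            if all(frozenset((residue, mapping[p])) in contact_set
                   for p in partners[spin] if p in mapping):
                mapping[spin] = residue
                used.add(residue)
                extend(index + 1, mapping, used)
                del mapping[spin]
                used.discard(residue)

    extend(0, {}, set())
    return ensemble
```

Consider a four-residue chain with contacts only between sequential neighbours and NOEs A–B, B–C and C–D. The search returns two solutions, the forward mapping and the reversed one. Sparse data therefore yield an ensemble whose members only partly agree, which is the low-precision situation the abstract describes.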
Riboflavin Binding Protein (RBP) binds copper in a 1:1 molar ratio, forming a distinct, well-ordered type II site. The nature of this site has been examined using X-ray absorption and pulsed electron paramagnetic resonance (EPR) spectroscopies, revealing a four-coordinate, oxygen/nitrogen-rich environment. On the basis of an analysis of the Cambridge Structural Database, the average protein-bound copper–ligand bond length of 1.96 Å, obtained by extended X-ray absorption fine structure (EXAFS), is consistent with four-coordinate Cu(I) and Cu(II) models that utilize mixed oxygen and nitrogen ligand distributions. These data suggest a Cu–O3N coordination state for copper bound to RBP. While pulsed EPR studies, including hyperfine sublevel correlation spectroscopy and electron nuclear double resonance, show clear spectroscopic evidence for a histidine bound to the copper, inclusion of a histidine in the EXAFS simulation did not lead to any significant improvement in the fit.
The title compound, [Tb(NCS)3(C18H15OP)3], contains a six-coordinate TbIII cation surrounded by three O-bound triphenylphosphine oxide ligands and three N-bound thiocyanate ligands, each set in a fac arrangement. There are two crystallographically unique TbIII atoms in the asymmetric unit. One TbIII atom resides on a threefold rotation axis, while the other has no imposed crystallographic symmetry. The thiocyanate ligands are bound through N atoms, illustrating the hard–hard bonding principles of metal complex chemistry.
NMR chemical shifts provide important local structural information for proteins. Consistent structure generation from NMR chemical shift data has recently become feasible for proteins with sizes of up to 130 residues, and such structures are of a quality comparable to those obtained with the standard NMR protocol. This study investigates the influence of the completeness of chemical shift assignments on structures generated from chemical shifts. The Chemical-Shift-Rosetta (CS-Rosetta) protocol was used for de novo protein structure generation with various degrees of completeness of the chemical shift assignment, simulated by omission of entries in the experimental chemical shift data previously used for the initial demonstration of the CS-Rosetta approach. In addition, a new CS-Rosetta protocol is described that improves robustness of the method for proteins with missing or erroneous NMR chemical shift input data. This strategy, which uses traditional Rosetta for pre-filtering of the fragment selection process, is demonstrated for two paramagnetic proteins and also for two proteins with solid-state NMR chemical shift assignments.
NMR chemical shift; protein structure prediction; solid-state NMR structure determination; paramagnetic protein; CS-Rosetta
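The CS-Rosetta study simulates incomplete assignments by omitting entries from an experimental chemical-shift list. The abstract does not specify how entries were chosen for omission. The sketch below assumes uniform random omission with a fixed seed so that each completeness level is reproducible. The dictionary layout and the function name are hypothetical.

```python
import random

def drop_shifts(shifts, fraction_kept, seed=0):
    """Simulate an incomplete assignment by keeping a random subset of chemical shifts.

    shifts: dict mapping (residue_number, atom_name) -> shift in ppm (hypothetical layout).
    fraction_kept: value between 0 and 1; the number of entries kept is rounded.
    """
    rng = random.Random(seed)                 # seeded so each completeness level is reproducible
    keys = sorted(shifts)                     # fixed order before sampling
    n_keep = round(len(keys) * fraction_kept)
    kept = rng.sample(keys, n_keep)
    return {key: shifts[key] for key in kept}
```

Omission that targets whole atom types (for example, all side-chain carbons) would be a different, equally plausible scheme. Uniform sampling is only the simplest choice.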
High-throughput structure determination based on solution Nuclear Magnetic Resonance (NMR) spectroscopy plays an important role in structural genomics. One of the main bottlenecks in NMR structure determination is the interpretation of NMR data to obtain a sufficient number of accurate distance restraints by assigning nuclear Overhauser effect (NOE) spectral peaks to pairs of protons. The difficulty in automated NOE assignment mainly lies in the ambiguities arising both from the resonance degeneracy of chemical shifts and from the uncertainty due to experimental errors in NOE peak positions. In this paper we present a novel NOE assignment algorithm, called HAusdorff-based NOE Assignment (HANA), that starts with a high-resolution protein backbone computed using only two residual dipolar couplings (RDCs) per residue [37, 39], employs a Hausdorff-based pattern matching technique to deduce similarity between experimental and back-computed NOE spectra for each rotamer from a statistically diverse library, and drives the selection of optimal position-specific rotamers for filtering ambiguous NOE assignments. Our algorithm runs in time O(tn³ + tn log t), where t is the maximum number of rotamers per residue and n is the size of the protein. Application of our algorithm to biological NMR data for three proteins, namely, human ubiquitin, the zinc finger domain of the human DNA Y-polymerase Eta (pol η) and the human Set2-Rpb1 interacting domain (hSRI), demonstrates that our algorithm overcomes spectral noise to achieve more than 90% assignment accuracy. Additionally, the final structures calculated using our automated NOE assignments have backbone RMSD < 1.7 Å and all-heavy-atom RMSD < 2.5 Å from reference structures that were determined either by X-ray crystallography or traditional NMR approaches. These results show that our NOE assignment algorithm can be successfully applied to protein NMR spectra to obtain high-quality structures.
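HANA scores rotamers by comparing back-computed and experimental NOE peak patterns with a Hausdorff-based measure. The paper's exact scoring, which must cope with noise and missing peaks, is not given in the abstract. The sketch below uses the plain symmetric Hausdorff distance between peak positions, treated as points in chemical-shift space, to pick the closest rotamer. Peak coordinates and rotamer names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def best_rotamer(experimental_peaks, back_computed):
    """Name of the rotamer whose back-computed peak list is closest (Hausdorff) to the data."""
    return min(back_computed, key=lambda name: hausdorff(experimental_peaks, back_computed[name]))
```

A single spurious or missing peak can dominate the plain Hausdorff distance. On real, noisy spectra, a noise-tolerant variant is therefore needed rather than this textbook form.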
The crystal structure of the title compound, C15H14O6·H2O, has been redetermined from single-crystal X-ray data. The structure was originally determined by Peet et al. [J. Heterocycl. Chem. (1995), 32, 33–41] but the atomic coordinates were not reported or deposited in the Cambridge Structural Database. The ethyl substituent is disordered over two sites with refined occupancies of 0.815 (6) and 0.185 (6). The indeno group is almost planar [maximum deviation 0.0922 (14) Å] and makes an angle of 68.81 (4)° with the furan ring. The fused ring molecules are assembled in pairs by intermolecular O—H⋯O hydrogen bonds. The resulting dimers are also hydrogen bonded to the water molecules, forming double-stranded chains running along the a axis.
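The planarity and inter-ring angle quoted for the indeno compound are both derived from least-squares planes. A NumPy sketch of that calculation follows. The plane normal is taken as the right-singular vector with the smallest singular value of the centred coordinates. The maximum deviation is the largest absolute distance of an atom from that plane, and the angle between two planes is reported in the 0–90° range. Orthogonal coordinates in ångströms are assumed, and the function names are illustrative.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through (N, 3) points: returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=float)
    c = pts.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - c)
    return c, vt[-1]

def max_deviation(points):
    """Largest absolute distance of any atom from the least-squares plane."""
    c, n = fit_plane(points)
    return float(np.max(np.abs((np.asarray(points, dtype=float) - c) @ n)))

def interplanar_angle(points_a, points_b):
    """Angle (degrees, 0-90) between the least-squares planes of two atom groups."""
    _, na = fit_plane(points_a)
    _, nb = fit_plane(points_b)
    cos_t = min(1.0, abs(float(na @ nb)))
    return float(np.degrees(np.arccos(cos_t)))
```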
Developing applications for metal-mediated base pairs (metallo-base-pairs) has recently become a high-priority area in nucleic acid research, and physicochemical analyses are important for designing and fine-tuning molecular devices using metallo-base-pairs. In this study, we characterized the HgII-mediated T-T (T-HgII-T) base pair by Raman spectroscopy, which revealed the unique physical and chemical properties of HgII. A characteristic Raman marker band at 1586 cm−1 was observed and assigned to the C4=O4 stretching mode. We confirmed the assignment by the isotopic shift (18O-labeling at O4) and density functional theory (DFT) calculations. The unusually low wavenumber of the C4=O4 stretch suggested that the bond order of the C4=O4 bond was reduced from its canonical value. This reduction of the bond order can be explained if the enolate-like structure (N3=C4-O4−) is involved as a resonance contributor in the thymine ring of the T-HgII-T pair. This resonance includes the N-HgII-bonded state (HgII-N3-C4=O4) and the N-HgII-dissociated state (HgII+ N3=C4-O4−), and the latter contributor reduces the bond order of N-HgII. Consequently, the HgII nucleus in the T-HgII-T pair exhibits a cationic character. Natural bond orbital (NBO) analysis supports the interpretations of the Raman experiments.
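The 18O-labelling check in the T–HgII–T abstract relies on the mass dependence of vibrational frequencies. The sketch below gives a rough guide by treating C4=O4 as an isolated harmonic C=O oscillator. That is an assumption: the real ring mode is coupled to other coordinates, so the measured shift can differ. Under this assumption, the expected downshift follows from the reduced-mass ratio. The function names are illustrative.

```python
import math

# Atomic masses in unified atomic mass units
M_C12, M_O16, M_O18 = 12.0, 15.9949, 17.9992

def reduced_mass(m1, m2):
    return m1 * m2 / (m1 + m2)

def isotope_shifted_wavenumber(nu, m_fixed, m_light, m_heavy):
    """Harmonic diatomic estimate: wavenumber scales as 1/sqrt(reduced mass)."""
    return nu * math.sqrt(reduced_mass(m_fixed, m_light) / reduced_mass(m_fixed, m_heavy))

# Estimated position of the 1586 cm^-1 band after 16O -> 18O substitution at O4
nu_18o = isotope_shifted_wavenumber(1586.0, M_C12, M_O16, M_O18)
```

This estimate places the labelled band near 1548 cm−1, a downshift of roughly 38 cm−1 for a pure C=O stretch. It is a reference point for judging how "C=O-like" the observed marker band is, not a prediction of the measured value.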
In the title compound, [Mn(C13H9N2O)2(CH3OH)2]Cl, the MnIII atom (site symmetry ) is coordinated by two N,O-bidentate 2-(1H-benzimidazol-2-yl)phenolate ligands and two methanol molecules, to generate a distorted trans-MnN2O4 octahedral geometry for the metal ion. The dihedral angle between the aromatic ring systems in the ligand is 16.0 (3)°. In the crystal structure, the complex cations and chloride anions are linked by O—H⋯Cl and N—H⋯Cl hydrogen bonds. The chloride ion lies on a crystallographic twofold axis.
The synthesis of the title salt, [Ag(C6H7N4)2](NO3)3, was carried out using a 1:2 molar ratio of 2,2′-biimidazole and silver nitrate, respectively. The cation has crystallographically imposed C2 symmetry, with the metal atom in an almost linear coordination environment [N—Ag—N = 177.01 (17)°]. The crystal structure displays N—H⋯O and C—H⋯O hydrogen-bonding interactions.
One bottleneck in NMR structure determination lies in the laborious and time-consuming process of side-chain resonance and NOE assignments. Compared to the well-studied backbone resonance assignment problem, automated side-chain resonance and NOE assignments are relatively less explored. Most NOE assignment algorithms require nearly complete side-chain resonance assignments from a series of through-bond experiments such as HCCH-TOCSY or HCCCONH. Unfortunately, these TOCSY experiments perform poorly on large proteins. To overcome this deficiency, we present a novel algorithm, called NASCA (NOE Assignment and Side-Chain Assignment), to automate both side-chain resonance and NOE assignments and to perform high-resolution protein structure determination in the absence of any explicit through-bond experiment to facilitate side-chain resonance assignment, such as HCCH-TOCSY. After casting the assignment problem into a Markov Random Field (MRF), NASCA extends and applies combinatorial protein design algorithms to compute optimal assignments that best interpret the NMR data. The MRF captures the contact map information of the protein derived from NOESY spectra, exploits the backbone structural information determined by RDCs, and considers all possible side-chain rotamers. The complexity of the combinatorial search is reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is employed to find a set of optimal side-chain resonance assignments that best fit the NMR data. These side-chain resonance assignments are then used to resolve the NOE assignment ambiguity and compute high-resolution protein structures. Tests on five proteins show that NASCA assigns resonances for more than 90% of side-chain protons, and achieves about 80% correct assignments. 
The final structures computed using the NOE distance restraints assigned by NASCA have backbone RMSDs of 0.8–1.5 Å from the reference structures determined by traditional NMR approaches.
Nuclear magnetic resonance (NMR); side-chain resonance assignment; nuclear Overhauser effect (NOE) assignment; residual dipolar coupling (RDC); protein structure determination
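The dead-end elimination step in the NASCA abstract prunes candidates that provably cannot appear in the optimal solution. The sketch below implements the standard Goldstein DEE criterion for a generic pairwise energy model. Mapping NASCA's NMR-derived scores onto these "energies" is an assumption, and the data layout is hypothetical. The subsequent A* search over the surviving options is not shown.

```python
def goldstein_dee(choices, self_e, pair_e):
    """Prune options with the Goldstein dead-end elimination criterion.

    choices: dict position -> list of candidate options (rotamers, resonance assignments, ...)
    self_e:  dict (position, option) -> self energy
    pair_e:  dict (pos_i, opt_i, pos_j, opt_j) -> pair energy; missing pairs count as 0
    Option r at position i is pruned if some other live option t satisfies
        E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0,
    because then replacing r by t lowers the energy of any full solution.
    """
    live = {pos: list(opts) for pos, opts in choices.items()}

    def pair(i, r, j, s):
        return pair_e.get((i, r, j, s), pair_e.get((j, s, i, r), 0.0))

    changed = True
    while changed:                      # repeat until no further option can be pruned
        changed = False
        for i in live:
            for r in list(live[i]):
                for t in live[i]:
                    if t == r:
                        continue
                    gap = self_e[(i, r)] - self_e[(i, t)]
                    for j in live:
                        if j != i:
                            gap += min(pair(i, r, j, s) - pair(i, t, j, s) for s in live[j])
                    if gap > 0:
                        live[i].remove(r)
                        changed = True
                        break
    return live
```

Because an option is removed only when some alternative is better for every possible choice at the other positions, the global optimum always survives. The smaller search space is what makes the A* step tractable.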