Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including 1H, 13C and 15N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001–2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.
NMR; Chemical shift; Proteomics; Database; BMRB
A methyl-detected ‘out-and-back’ NMR experiment for obtaining simultaneous correlations of methyl resonances of Valine and Isoleucine/Leucine residues with backbone carbonyl chemical shifts, SIM-HMCM(CGCBCA)CO, is described. The developed pulse-scheme serves the purpose of convenience in recording a single data set for all ILV methyl positions instead of acquiring two separate spectra selective for Valine or Leucine/Isoleucine residues. The SIM-HMCM(CGCBCA)CO experiment can be used for ILV methyl assignments in moderately sized protein systems (up to ~100 kDa) where the backbone chemical shifts of 13Cα, 13Cβ and 13CO are known from prior NMR studies and where some losses in sensitivity can be tolerated for the sake of an overall reduction in NMR acquisition time.
chemical shift assignments; methyl labeling; Enzyme I; methyl TROSY; isotope shifts
Oriented Sample (OS) solid-state NMR spectroscopy can be used to determine the three-dimensional structures of membrane proteins in magnetically or mechanically aligned lipid bilayers. The bottleneck for applying this technique to larger and more challenging proteins is making resonance assignments, which is conventionally accomplished through the preparation of multiple selectively isotopically labeled samples and performing an analysis of residues in regular secondary structure based on Polarity Index Slant Angle (PISA) Wheels and Dipolar Waves. Here we report the complete resonance assignment of the full-length mercury transporter, MerF, an 81-residue protein, which is challenging because of overlapping PISA Wheel patterns from its two trans-membrane helices, by using a combination of solid-state NMR techniques that improve the spectral resolution and provide correlations between residues and resonances. These techniques include experiments that take advantage of the improved resolution of the MSHOT4-Pi4/Pi pulse sequence; the transfer of resonance assignments through frequency alignment of heteronuclear dipolar couplings, or through dipolar coupling correlated isotropic chemical shift analysis; 15N/15N dilute spin exchange experiments; and the use of the proton-evolved local field (PELF) experiment with isotropic shift analysis to assign the irregular terminal and loop regions of the protein, which is the major “blind spot” of the PISA Wheel/Dipolar Wave method.
Solid-state NMR; membrane protein; aligned bilayers; dipolar coupling; chemical shift anisotropy; PISA Wheel; Dipolar Wave
To facilitate rigorous analysis of molecular motions in proteins, DNA, and RNA, we present a new version of ROTDIF, a program for determining the overall rotational diffusion tensor from single- or multiple-field Nuclear Magnetic Resonance (NMR) relaxation data. We introduce four major features that expand the program’s versatility and usability. The first feature is the ability to analyze, separately or together, 13C and/or 15N relaxation data collected at a single or multiple fields. A significant improvement in the accuracy compared to direct analysis of R2/R1 ratios, especially critical for analysis of 13C relaxation data, is achieved by subtracting high-frequency contributions to relaxation rates. The second new feature is an improved method for computing the rotational diffusion tensor in the presence of biased errors, such as large conformational exchange contributions, that significantly enhances the accuracy of the computation. The third new feature is the integration of the domain alignment and docking module for relaxation-based structure determination of multi-domain systems. Finally, to improve accessibility to all the program features, we introduced a graphical user interface (GUI) that simplifies and speeds up the analysis of the data. Written in Java, the new ROTDIF can run on virtually any computer platform. In addition, the new ROTDIF achieves an order of magnitude speedup over the previous version by implementing a more efficient deterministic minimization algorithm. We not only demonstrate the improvement in accuracy and speed of the new algorithm for synthetic and experimental 13C and 15N relaxation data for several proteins and nucleic acids, but also show that careful analysis required especially for characterizing RNA dynamics allowed us to uncover subtle conformational changes in RNA as a function of temperature that were opaque to previous analysis.
rotational diffusion tensor; ELM; ELMDOCK; ROTDIF
A multi-objective genetic algorithm is introduced to predict the assignment of protein solid-state NMR spectra with partial resonance overlap and missing peaks due to broad linewidths, molecular motion, and low sensitivity. This non-dominated sorting genetic algorithm II (NSGA-II) aims to identify all possible assignments that are consistent with the spectra and to compare the relative merit of these assignments. Our approach is modeled after the recently introduced Monte Carlo simulated annealing (MC/SA) protocol, with the key difference that NSGA-II simultaneously optimizes multiple assignment objectives instead of searching for possible assignments based on a single composite score. The multiple objectives include maximizing the number of consistently assigned peaks between multiple spectra (“good connections”), maximizing the number of used peaks, minimizing the number of inconsistently assigned peaks between spectra (“bad connections”), and minimizing the number of assigned peaks that have no matching peaks in the other spectra (“edges”). Using six solid-state NMR protein chemical shift datasets with varying levels of imperfection that was introduced by peak deletion, random chemical shift changes, and manual peak picking of spectra with moderately broad linewidths, we show that the NSGA-II algorithm produces a large number of valid and good assignments rapidly. For high-quality chemical shift peak lists, NSGA-II and MC/SA perform similarly well. However, when the peak lists contain many missing peaks that are uncorrelated between different spectra and have chemical shift deviations between spectra, the modified NSGA-II produces a larger number of valid solutions than MC/SA, and is more effective at distinguishing good from mediocre assignments by avoiding the hazard of suboptimal weighting factors for the various objectives. 
These two advantages, namely diversity and better evaluation, lead to a higher probability of predicting the correct assignment for a larger number of residues. On the other hand, when there are multiple equally good assignments that are significantly different from each other, the modified NSGA-II is less efficient than MC/SA in finding all the solutions. This problem is solved by a combined NSGA-II/MC algorithm, which appears to have the advantages of both NSGA-II and MC/SA. This combination algorithm is robust for the three most difficult chemical shift datasets examined here and is expected to give the highest-quality de novo assignment of challenging protein NMR spectra.
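The multi-objective comparison described above can be illustrated with a minimal Python sketch of Pareto dominance over the four objectives named in the abstract. This is not the authors' NSGA-II implementation: the `AssignmentScore` type, the function names, and the example counts are hypothetical, and only the non-dominated filtering step is shown (no crossover, mutation, or crowding-distance ranking).

```python
from typing import List, NamedTuple


class AssignmentScore(NamedTuple):
    """Hypothetical score record for one candidate assignment."""
    good: int  # consistently assigned peaks between spectra (maximize)
    used: int  # number of used peaks (maximize)
    bad: int   # inconsistently assigned peaks between spectra (minimize)
    edge: int  # assigned peaks with no match in other spectra (minimize)


def as_minimization(score: AssignmentScore) -> tuple:
    # Flip the sign of the maximized objectives so every entry is minimized.
    return (-score.good, -score.used, score.bad, score.edge)


def dominates(a: AssignmentScore, b: AssignmentScore) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    fa, fb = as_minimization(a), as_minimization(b)
    no_worse = all(x <= y for x, y in zip(fa, fb))
    strictly_better = any(x < y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def pareto_front(scores: List[AssignmentScore]) -> List[AssignmentScore]:
    """Keep the candidates that no other candidate dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Made-up counts for three candidate assignments.
candidates = [
    AssignmentScore(good=50, used=60, bad=2, edge=5),
    AssignmentScore(good=48, used=62, bad=1, edge=5),
    AssignmentScore(good=45, used=55, bad=3, edge=6),
]
front = pareto_front(candidates)  # the third candidate is dominated
```

Because no weighting factors are needed to decide dominance, the first two candidates both survive even though neither is better on every objective, which is the property the abstract contrasts with a single composite score.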
The low sensitivity inherent to both the static and magic angle spinning techniques of solid-state NMR (ssNMR) spectroscopy has thus far limited the routine application of multidimensional experiments to determine the structure of membrane proteins in lipid bilayers. Here, we demonstrate the advantage of using a recently developed class of experiments, polarization optimized experiments (POE), for both static and MAS spectroscopy to achieve higher sensitivity and substantial time-savings for 2D and 3D experiments. We used sarcolipin, a single pass membrane protein, reconstituted in oriented bicelles (for oriented ssNMR) and multilamellar vesicles (for MAS ssNMR) as a benchmark. The restraints derived by these experiments are then combined into a hybrid energy function to allow simultaneous determination of structure and topology. The resulting structural ensemble converged to a helical conformation with a backbone RMSD ∼ 0.44 Å, a tilt angle of 24° ± 1°, and an azimuthal angle of 55° ± 6°. This work represents a crucial first step toward obtaining high-resolution structures of large membrane proteins using combined multidimensional O-ssNMR and MAS-ssNMR.
Oriented Solid-State NMR (OSS-NMR); Magic Angle Spinning Solid State NMR; Membrane Proteins; DUMAS-ssNMR; MEIOSIS; Sarcolipin; Magnetically Aligned bicelles; Hybrid Method for Membrane Protein Structure Determination
The power of nuclear magnetic resonance spectroscopy derives from its site-specific access to chemical, structural and dynamic information. However, the corresponding multiplicity of interactions can be difficult to tease apart. Complementary approaches involve spectral editing on the one hand and selective isotope substitution on the other. Here we present a new “redox” approach to the latter: acetate is chosen as the sole carbon source for the extreme oxidation numbers of its two carbons. Consistent with conventional anabolic pathways for the amino acids, [1-13C] acetate does not label α carbons, labels other aliphatic carbons and the aromatic carbons very selectively, and labels the carboxyl carbons heavily. The benefits of this labeling scheme are exemplified by magic angle spinning spectra of microcrystalline immunoglobulin binding protein G (GB1): the elimination of most J-couplings and one- and two-bond dipolar couplings provides narrow signals and long-range, intra- and inter-residue, recoupling essential for distance constraints. Inverse redox labeling, from [2-13C] acetate, is also expected to be useful: although it retains one-bond couplings in the sidechains, the removal of CA-CO coupling in the backbone should improve the resolution of NCACX spectra.
isotope substitution; sparse labeling; MAS NMR structure determination; dipolar truncation; isotope labeled peptone; 13C-acetate
For several of the proteins in the BioMagResBank larger than 200 residues, 60% or fewer of the backbone resonances were assigned. But how reliable are those assignments? In contrast to complete assignments, where it is possible to check whether every triple-resonance Generalized Spin System (GSS) is assigned once and only once, with incomplete data one should compare all possible assignments and pick the best one. But that is not feasible: for example, for 200 residues and an incomplete set of 100 GSS, there are 1.6 × 10^260 possible assignments. In “EZ-ASSIGN”, the protein sequence is divided into smaller unique fragments. Combined with intelligent search approaches, an exhaustive comparison of all possible assignments is now feasible using a laptop computer. The program was tested with experimental data of a 388-residue domain of the Hsp70 chaperone protein DnaK and for a 351-residue domain of a type III secretion ATPase. EZ-ASSIGN reproduced the hand assignments. It did slightly better than the computer program PINE (Bahrami et al., PLoS Comput Biol. 2009 5 (3): e1000307) and significantly outperformed SAGA (Crippen et al., (2010) J Biomol NMR 46, 281–298), AUTOASSIGN (Zimmerman et al., (1997) J Mol Biol 269:592–610), and IBIS (Hyberts and Wagner (2003) J Biomol NMR 26:335–344). Next, EZ-ASSIGN was used to investigate how well NMR data of decreasing completeness can be assigned. We found that the program could confidently assign fragments in very incomplete data. Here, EZ-ASSIGN dramatically outperformed all the other assignment programs tested.
Comprehensive application of solution NMR spectroscopy to studies of macromolecules remains fundamentally limited by the molecular rotational correlation time. For proteins, molecules larger than 30 kDa require complex experimental methods, such as TROSY in conjunction with isotopic labeling schemes that are often expensive and generally reduce the potential information available. We have developed the reverse micelle encapsulation strategy as an alternative approach. Encapsulation of proteins within the protective nano-scale water pool of a reverse micelle dissolved in ultra-low viscosity nonpolar solvents overcomes the slow tumbling problem presented by large proteins. Here, we characterize the contributions from the various components of the protein-containing reverse micelle system to the rotational correlation time of the encapsulated protein. Importantly, we demonstrate that the protein encapsulated in the reverse micelle maintains a hydration shell comparable in size to that seen in bulk solution. Using moderate pressures, encapsulation in ultra-low viscosity propane or ethane can be used to magnify this advantage. We show that encapsulation in liquid ethane can be used to reduce the tumbling time of the 43 kDa maltose binding protein from ~23 ns to ~10 ns. These conditions enable, for example, acquisition of TOCSY-type data resolved on the adjacent amide NH for the 42 kDa encapsulated maltose binding protein dissolved in liquid ethane, which is typically impossible for proteins of such size without use of extensive deuteration or the TROSY effect.
encapsulated proteins; reverse micelles; low viscosity fluids; triple resonance NMR; resonance assignment; structure determination
A through bond, C4′/H4′ selective, “out and stay” type 4D HC(P)CH experiment is introduced which provides sequential connectivity via H4′(i)–C4′(i)–C4′(i−1)–H4′(i−1) correlations. The 31P dimension (used in the conventional 3D HCP experiment) is replaced with evolution of the better dispersed C4′ dimension. The experiment fully utilizes 13C-labeling of RNA by inclusion of two C4′ evolution periods. An additional evolution of H4′ is included to further enhance peak resolution. Band selective 13C inversion pulses are used to achieve selectivity and prevent signal dephasing due to the C4′–C3′ and C4′–C5′ homonuclear couplings. For reasonable resolution, non-uniform sampling is employed in all indirect dimensions. To reduce sensitivity losses, multiple quantum coherences are preserved during shared-time evolution and coherence transfer delays. In the experiment the intra-nucleotide peaks are suppressed whereas inter-nucleotide peaks are enhanced to reduce the ambiguities. The performance of the experiment is verified on a fully 13C, 15N-labeled 34-nt hairpin RNA comprising typical structure elements.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-014-9861-z) contains supplementary material, which is available to authorized users.
RNA resonance assignment; HCP; Selective pulses; Four-dimensional NMR; Non-uniform sampling
Peak-picking Of Noe Data Enabled by Restriction Of Shift Assignments-Client Server (PONDEROSA-C/S) builds on the original PONDEROSA software (Lee et al. in Bioinformatics 27:1727–1728. doi:10.1093/bioinformatics/btr200, 2011) and includes improved features for structure calculation and refinement. PONDEROSA-C/S consists of three programs: Ponderosa Server, Ponderosa Client, and Ponderosa Analyzer. PONDEROSA-C/S takes as input the protein sequence, a list of assigned chemical shifts, and nuclear Overhauser data sets (13C- and/or 15N-NOESY). The output is a set of assigned NOEs and 3D structural models for the protein. Ponderosa Analyzer supports the visualization, validation, and refinement of the results from Ponderosa Server. These tools enable semi-automated NMR-based structure determination of proteins in a rapid and robust fashion. We present examples showing the use of PONDEROSA-C/S in solving structures of four proteins: two that enable comparison with the original PONDEROSA package, and two from the Critical Assessment of automated Structure Determination by NMR (Rosato et al. in Nat Methods 6:625–626. doi:10.1038/nmeth0909-625, 2009) competition. The software package can be downloaded freely in binary format from http://pine.nmrfam.wisc.edu/download_packages.html. Registered users of the National Magnetic Resonance Facility at Madison can submit jobs to the PONDEROSA-C/S server at http://ponderosa.nmrfam.wisc.edu, where instructions and tutorials can be found. Structures are normally returned within 1–2 days.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-014-9855-x) contains supplementary material, which is available to authorized users.
NOE assignment; 3D structure determination; Client server; Semi-automation; Graphical interface for data visualization and refinement; Structure refinement and validation
The Neuronal Ceroid Lipofuscinoses (NCL) are a group of fatal inherited neurodegenerative diseases in humans distinguished by a common clinical pathology, characterized by the accumulation of storage body material in cells and gross brain atrophy. In this study, metabolic changes in three NCL mouse models were examined to identify pathways correlated with neurodegeneration. Two mouse models, the motor neuron degeneration (mnd) mouse and a variant model of late infantile NCL termed the neuronal ceroid lipofuscinosis (nclf) mouse, were investigated experimentally. Both models exhibit a characteristic accumulation of autofluorescent lipopigment in neuronal and non-neuronal cells. The NMR profiles derived from extracts of the cortex and cerebellum from mnd and nclf mice were distinguished according to disease/wildtype status. In particular, a perturbation in glutamine and glutamate metabolism, and a decrease in γ-amino butyric acid (GABA) in the cerebellum and cortices of mnd (adolescent mice) and nclf mice relative to wildtype at all ages were detected. Our results were compared to the Cln3 mouse model of NCL. The metabolism of mnd mice resembled older (6 month) Cln3 mice, where the disease is relatively advanced, while the metabolism of nclf mice was more akin to younger (1–2 months) Cln3 mice, where the disease is in its early stages of progression. Overall, our results allowed the identification of metabolic traits common to all NCL subtypes for the three animal models.
Juvenile Neuronal Ceroid Lipofuscinosis (JNCL); Batten disease; CLN3; NMR; Metabolomics; Neurodegeneration
SHPRH (SNF2, histone linker, PHD, RING, helicase) is a SWI2/SNF2-family ATP-dependent chromatin remodeling factor, and one of the E3 ubiquitin ligases responsible for Ubc13-Mms2-dependent K63 poly-ubiquitination of PCNA (proliferating cell nuclear antigen) that promotes error-free DNA damage tolerance in eukaryotes. In contrast to its functional homologues, S. cerevisiae Rad5 and human HLTF (helicase like transcription factor), SHPRH contains a PHD (plant homeodomain) finger embedded in the ‘minor’ insert region of the core helicase-like domain. PHD fingers are often found in proteins involved in chromatin remodeling and transcription regulation, and are generally considered as ‘readers’ of the methylation state of histone tails, primarily the lysine 4 (K4) residue of histone H3 (H3K4). Here we report the solution NMR structure of the SHPRH PHD domain and investigate whether this domain is capable of recognizing H3K4 modifications. The domain adopts a canonical PHD-finger fold with a central two-stranded anti-parallel β-sheet flanked on both sides by the two interleaved zinc-binding sites. Despite the presence of a subset of aromatic residues characteristic for PHD-fingers that preferentially bind methylated H3K4, NMR titration experiments reveal that SHPRH PHD does not specifically interact with the H3-derived peptides irrespective of K4 methylation. This result suggests that the SHPRH PHD domain might have evolved a function other than recognizing histone modifications.
DNA damage tolerance; template switching; translesion synthesis; ubiquitination; PCNA; HLTF; Rad5; Mms2-Ubc13
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.
Protein NMR Structure Validation; BioMagResDatabase; XPLOR; CNS; CYANA; CS-Rosetta
The HMCM[CG]CBCA experiment (J. Am. Chem. Soc. (2003), 125, 13868–13878) correlates methyl carbon and proton shifts to Cγ, Cβ, and Cα resonances for the purpose of resonance assignments. The relative sensitivity of the HMCM[CG]CBCA sequence experiment is compared to a divide-and-conquer approach to assess whether it is best to collect all of the methyl correlations at once, or to perform separate experiments for each correlation. A straightforward analysis shows that the divide-and-conquer approach is intrinsically more sensitive, and should always be used to obtain methyl-Cγ, Cβ, and Cα correlations. The improvement in signal-to-noise associated with separate experiments is illustrated by the detection of methyl-aliphatic correlations in a 65 kDa protein-DNA complex.
High-pressure NMR spectroscopy has emerged as a complementary approach for investigating various structural and thermodynamic properties of macromolecules. Noticeably absent from the array of experimental restraints that have been employed to characterize protein structures at high hydrostatic pressure is the residual dipolar coupling, which requires the partial alignment of the macromolecule of interest. Here we examine five alignment media that are commonly used at ambient pressure for this purpose. We find that the spontaneous alignment of Pf1 phage, d(GpG) and a C12E5/n-hexanol mixture in a magnetic field is preserved under high hydrostatic pressure. However, DMPC/DHPC bicelles and collagen gel are found to be unsuitable. Evidence is presented to demonstrate that pressure-induced structural changes can be identified using the residual dipolar coupling.
Residual dipolar coupling; High hydrostatic pressure; Alignment media; NMR spectroscopy; Structure calculation
The feasibility of using difference spectroscopy, i.e. subtraction of two correlation spectra at different mixing times, for substantially enhanced resolution in crowded two-dimensional 13C-13C chemical shift correlation spectra is presented. With the analyses of 13C-13C spin diffusion in simple spin systems, difference spectroscopy is proposed to partially separate the spin diffusion resonances of relatively short intra-residue distances from the longer inter-residue distances, leading to a better identification of the inter-residue resonances. Here solid-state magic-angle-spinning (MAS) NMR spectra of the full length M2 protein embedded in synthetic lipid bilayers have been used to illustrate the resolution enhancement in the difference spectra. The integral membrane M2 protein of Influenza A virus assembles as a tetrameric bundle to form a proton-conducting channel that is activated by low pH and is essential for the viral lifecycle. Based on known amino acid resonance assignments from amino acid specific labeled samples of truncated M2 sequences or from time-consuming 3D experiments of uniformly labeled samples, some inter-residue resonances of the full length M2 protein can be identified in the difference spectra of uniformly 13C labeled protein that are consistent with the high resolution structure of the M2 (22–62) protein (Sharma et al. 2010).
difference spectroscopy; solid-state MAS NMR; membrane protein; inter-residue correlation; 13C-13C chemical shift correlation
Relaxation dispersion spectroscopy is one of the most widely used techniques for the analysis of protein dynamics. To obtain a detailed understanding of protein function from the viewpoint of dynamics, it is essential to fit relaxation dispersion data accurately. The grid search method is commonly used for relaxation dispersion curve fits, but it does not always find the global minimum that provides the best-fit parameter set. Also, the fitting quality does not always improve as the grid size increases, although the computational time becomes longer. This is because relaxation dispersion curve fitting suffers from a local minimum problem, which is a general problem in non-linear least squares curve fitting. Therefore, in order to fit relaxation dispersion data rapidly and accurately, we developed a new fitting program called GLOVE that minimizes global and local parameters alternately, and incorporates a Monte-Carlo minimization method that enables fitting parameters to pass through local minima with low computational cost. GLOVE also implements a random search method, which sets up initial parameter values randomly within user-defined ranges. We demonstrate here that the combined use of the three methods can find the global minimum more rapidly and more accurately than grid search alone.
Relaxation dispersion curve fitting; Fitting software; Speed and accuracy; Global fit; Monte Carlo-minimization; Local minimum problem
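The random-start and Monte Carlo minimization ideas in the GLOVE abstract above can be sketched in a few lines of Python. This is a toy under stated assumptions and not GLOVE's code: it fits a single dispersion curve, so the alternation between global and local parameters is not shown; the model is assumed to be the fast-exchange Luz-Meiboom expression; a simple coordinate pattern search stands in for whatever local optimizer GLOVE uses; and all function names, bounds, and defaults are hypothetical.

```python
import math
import random


def r2eff(nu_cpmg, r20, phi, kex):
    """Assumed model: fast-exchange (Luz-Meiboom) effective R2."""
    x = kex / (4.0 * nu_cpmg)
    return r20 + (phi / kex) * (1.0 - math.tanh(x) / x)


def chi2(params, data):
    """Sum of squared residuals over (nu_cpmg, observed R2eff) pairs."""
    r20, phi, kex = params
    return sum((r2eff(nu, r20, phi, kex) - obs) ** 2 for nu, obs in data)


def clamp(params, bounds):
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, bounds)]


def random_start(bounds, rng):
    """Random search step: draw each parameter inside its user-defined range."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def local_minimize(params, data, bounds, sweeps=200):
    """Crude pattern search: try +/- steps per parameter, halve steps on failure."""
    best = list(params)
    best_c = chi2(best, data)
    steps = [0.1 * (hi - lo) for lo, hi in bounds]
    for _ in range(sweeps):
        improved = False
        for i in range(len(best)):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] += sign * steps[i]
                trial = clamp(trial, bounds)
                c = chi2(trial, data)
                if c < best_c:
                    best, best_c, improved = trial, c, True
        if not improved:
            steps = [s * 0.5 for s in steps]
    return best, best_c


def monte_carlo_minimize(data, bounds, rng, n_steps=20, temperature=1.0):
    """Perturb, re-minimize locally, and accept with a Metropolis criterion."""
    current, current_c = local_minimize(random_start(bounds, rng), data, bounds)
    best, best_c = current, current_c
    for _ in range(n_steps):
        kicked = clamp([p * rng.uniform(0.5, 1.5) for p in current], bounds)
        trial, trial_c = local_minimize(kicked, data, bounds)
        if trial_c < current_c or rng.random() < math.exp(
            -(trial_c - current_c) / temperature
        ):
            current, current_c = trial, trial_c
        if trial_c < best_c:
            best, best_c = trial, trial_c
    return best, best_c


# Synthetic, noise-free data from hypothetical "true" parameters.
data = [(nu, r2eff(nu, 10.0, 2.0e5, 1500.0)) for nu in (50, 100, 200, 400, 800)]
bounds = [(1.0, 50.0), (0.0, 1.0e6), (100.0, 5000.0)]  # r20, phi, kex
best_params, best_chi2 = monte_carlo_minimize(data, bounds, random.Random(0))
```

The Metropolis acceptance lets the search occasionally move to a worse local minimum, which is what allows it to leave a basin that a pure downhill search or a coarse grid would stay trapped in.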
An NMR investigation of proteins with known X-ray structures is of interest in a number of endeavors. Performing these studies through nuclear magnetic resonance (NMR) requires the costly step of resonance assignment. The prevalent assignment strategy does not make use of existing structural information and requires uniform isotope labeling. Here we present a rapid and cost-effective method of assigning NMR data to an existing structure—either an X-ray or computationally modeled structure. The presented method, Exhaustively Permuted Assignment of RDCs (EPAR), utilizes unassigned residual dipolar coupling (RDC) data that can easily be obtained by NMR spectroscopy. The algorithm uses only the backbone N–H RDCs from multiple alignment media along with the amino acid type of the RDCs. It is inspired by previous work from Zweckstetter and provides several extensions. We present results on 13 synthetic and experimental datasets from 8 different structures, including two homodimers. Using just two alignment media, EPAR achieves an average assignment accuracy greater than 80%. With three media, the average accuracy is higher than 94%. The algorithm also outputs a prediction of the assignment accuracy, which has a correlation of 0.77 to the true accuracy. This prediction score can be used to establish the needed confidence in assignment accuracy.
Assignment; Residual dipolar coupling; Refinement; Protein; Structure; RDC; NMR
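The idea of matching unassigned RDCs to a known structure, as described in the EPAR abstract above, can be illustrated with a brute-force Python sketch. It is not the EPAR algorithm itself: the back-calculated RDCs are taken as given inputs rather than computed from an alignment tensor, the search is a plain enumeration of permutations that only scales to a handful of residues, and every name and number below is hypothetical.

```python
from itertools import permutations


def assignment_cost(perm, predicted, measured):
    """Squared RDC deviation, summed over residues and alignment media.

    perm[r] is the index of the measured peak assigned to residue r.
    """
    return sum(
        (p - m) ** 2
        for residue, peak in enumerate(perm)
        for p, m in zip(predicted[residue], measured[peak])
    )


def best_assignment(predicted, residue_types, measured, peak_types):
    """Enumerate all type-consistent permutations and keep the lowest cost."""
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(predicted))):
        # Use the known amino acid type of each peak to prune the search.
        if any(residue_types[r] != peak_types[k] for r, k in enumerate(perm)):
            continue
        cost = assignment_cost(perm, predicted, measured)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Made-up N-H RDCs (Hz) in two alignment media for four residues.
predicted = [(5.0, -3.0), (-8.0, 2.0), (1.0, 7.0), (5.5, 6.0)]
residue_types = ["A", "G", "A", "L"]
# Made-up unassigned peaks: the same couplings, shuffled and slightly perturbed.
measured = [(1.2, 6.8), (5.4, 6.1), (4.9, -2.8), (-7.9, 2.1)]
peak_types = ["A", "L", "A", "G"]

perm, cost = best_assignment(predicted, residue_types, measured, peak_types)
# perm[r] gives the peak index matched to residue r
```

The second medium is what separates the first and last residues here: their couplings in the first medium are nearly identical, which mirrors the abstract's observation that accuracy improves as alignment media are added.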
We introduce a Python-based program that utilizes the large database of 13C and 15N chemical shifts in the Biological Magnetic Resonance Bank to rapidly predict the amino acid type and secondary structure from correlated chemical shifts. The program, called PACSYlite Unified Query (PLUQ), is designed to help assign peaks obtained from 2D 13C–13C, 15N–13C, or 3D 15N–13C–13C magic-angle-spinning correlation spectra. We show secondary-structure specific 2D 13C–13C correlation maps of all twenty amino acids, constructed from a chemical shift database of 262,209 residues. The maps reveal interesting conformation-dependent chemical shift distributions and facilitate searching of correlation peaks during amino-acid type assignment. Based on these correlations, PLUQ outputs the most likely amino acid types and the associated secondary structures from inputs of experimental chemical shifts. We test the assignment accuracy using four high-quality protein structures. Based on only the Cα and Cβ chemical shifts, the highest-ranked PLUQ assignments were 40–60 % correct in both the amino-acid type and the secondary structure. For three input chemical shifts (CO–Cα–Cβ or N–Cα–Cβ), the first-ranked assignments were correct for 60 % of the residues, while within the top three predictions, the correct assignments were found for 80 % of the residues. PLUQ and the chemical shift maps are expected to be useful at the first stage of sequential assignment, for combination with automated sequential assignment programs, and for highly disordered proteins for which secondary structure analysis is the main goal of structure determination.
Chemical shift correlation; Amino-acid type assignment; PLUQ; Secondary structure; Protein resonance assignment
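Ranking amino-acid type and secondary structure from a set of correlated shifts, as the PLUQ abstract above describes, might look roughly like the following Python sketch. It does not reproduce PLUQ's query logic or its database: the scoring rule (a sum of squared z-scores) is an assumption, and the reference table holds rough placeholder numbers chosen only for illustration.

```python
def rank_candidates(observed, table):
    """Rank (amino acid, secondary structure) labels by agreement with shifts.

    observed: dict mapping nucleus name to measured shift (ppm).
    table: dict mapping label to {nucleus: (mean_ppm, sd_ppm)}.
    Labels lacking any observed nucleus (e.g. Gly has no CB) are skipped.
    """
    ranked = []
    for label, stats in table.items():
        if not all(nucleus in stats for nucleus in observed):
            continue
        score = sum(
            ((observed[n] - stats[n][0]) / stats[n][1]) ** 2 for n in observed
        )
        ranked.append((score, label))
    ranked.sort()  # lowest score first, i.e. best agreement
    return [label for _, label in ranked]


# Placeholder statistics (ppm); NOT values from the PLUQ database.
toy_table = {
    ("Ala", "helix"): {"CA": (54.8, 1.0), "CB": (18.3, 1.0)},
    ("Ala", "sheet"): {"CA": (51.5, 1.0), "CB": (21.1, 1.2)},
    ("Ser", "helix"): {"CA": (61.0, 1.2), "CB": (63.0, 1.0)},
    ("Ser", "sheet"): {"CA": (57.5, 1.2), "CB": (65.2, 1.2)},
    ("Gly", "coil"): {"CA": (45.3, 1.0)},
}

ranking = rank_candidates({"CA": 55.0, "CB": 18.0}, toy_table)
```

Returning the full ranked list rather than a single answer matches the way the abstract reports accuracy for both the first-ranked prediction and the top three.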
Nuclear magnetic resonance (NMR) spectroscopy has evolved into a powerful tool for fragment-based drug discovery over the last two decades. While NMR has been traditionally used to elucidate the three-dimensional structures and dynamics of biomacromolecules and their interactions, it can also be a very valuable tool for the reliable identification of small molecules that bind to proteins and for hit-to-lead optimization. Here, we describe the use of NMR spectroscopy as a method for fragment-based drug discovery and how to most effectively utilize this approach for discovering novel therapeutics based on our experience.
NMR spectroscopy; fragment-based drug discovery; fragment-based screening; hit identification; fragment libraries