Calicheamicin γ1I (1)
is an enediyne antitumor compound produced by Micromonospora
echinospora ssp. calichensis, and its biosynthetic gene cluster
has been previously reported. Despite extensive analysis and biochemical
study, several genes in the biosynthetic gene cluster of 1 remain functionally unassigned. Using a structural genomics approach
and biochemical characterization, two proteins encoded by genes from
the 1 biosynthetic gene cluster assigned as “unknowns”,
CalU16 and CalU19, were characterized. Structure analysis revealed
that they possess the STeroidogenic Acute Regulatory protein related
lipid Transfer (START) domain known mainly to bind and transport lipids
and previously identified as the structural signature of the enediyne
self-resistance protein CalC. Subsequent study revealed calU16 and calU19 to confer resistance to 1, and reminiscent of the prototype CalC, both CalU16 and CalU19 were
cleaved by 1 in vitro. Through site-directed
mutagenesis and mass spectrometry, we identified the site of cleavage
in each protein and characterized their function in conferring resistance
against 1. This report emphasizes the importance of structural
genomics as a powerful tool for the functional annotation of unknown proteins.
Spatially selective heteronuclear multiple-quantum coherence (SS HMQC) NMR spectroscopy was devised for solution studies of proteins. Due to ‘time-staggered’ acquisition of free induction decays (FIDs) in different slices, SS HMQC allows one to employ long delays for longitudinal nuclear spin relaxation at high repetition rates for the acquisition of the FIDs. To also achieve high intrinsic sensitivity, SS HMQC was implemented by combining a single spatially selective 1H excitation pulse with non-selective 1H 180° pulses. High-quality spectra could be obtained within 66 seconds for a 7.6 kDa uniformly 13C,15N-labeled protein, and within 45 and 90 seconds for, respectively, two uniformly 2H,13C,15N-labeled but isoleucine, leucine and valine methyl group protonated proteins with molecular weights of 7.5 and 43 kDa.
rapid data acquisition; spatially selective NMR; time staggered data acquisition; flip-back pulses; HMQC
The second round of the community-wide initiative Critical Assessment of automated Structure Determination of Proteins by NMR (CASD-NMR-2013) comprised ten blind target datasets, consisting of unprocessed spectral data, assigned chemical shift lists and unassigned NOESY peak and RDC lists, that were made available in both curated (i.e. manually refined) and un-curated (i.e. automatically generated) form. Ten structure calculation programs, using fully automated protocols only, generated a total of 164 three-dimensional structures (entries) for the ten targets, sometimes using both curated and un-curated lists to generate multiple entries for a single target. The accuracy of the entries could be established by comparing them to the corresponding manually solved structure of each target, which was not available at the time the data were provided. Across the entire data set, 71 % of all entries submitted achieved an accuracy relative to the reference NMR structure better than 1.5 Å. Methods based on NOESY peak lists achieved even better results with up to 100 % of the entries within the 1.5 Å threshold for some programs. However, some methods did not converge for some targets using un-curated NOESY peak lists. Over 90 % of the entries achieved an accuracy better than the more relaxed threshold of 2.5 Å that was used in the previous CASD-NMR-2010 round. Comparisons between entries generated with un-curated versus curated peaks show only marginal improvements for the latter in those cases where both calculations converged.
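As a rough illustration of how threshold statistics like those above can be tabulated, the following sketch computes the fraction of entries whose RMSD to a reference structure falls below a cutoff. The function name and the RMSD values are hypothetical and are not taken from the CASD-NMR-2013 data; only the 1.5 Å and 2.5 Å cutoffs come from the text.

```python
def fraction_within(rmsds, cutoff):
    """Return the fraction of RMSD values (in Å) strictly below cutoff."""
    if not rmsds:
        raise ValueError("no entries supplied")
    return sum(1 for r in rmsds if r < cutoff) / len(rmsds)


# Hypothetical RMSD values (Å) for a handful of entries; not real data.
example_rmsds = [0.8, 1.2, 1.4, 1.9, 2.3, 3.1]

strict = fraction_within(example_rmsds, 1.5)   # stricter 1.5 Å threshold
relaxed = fraction_within(example_rmsds, 2.5)  # relaxed 2.5 Å threshold
print(f"within 1.5 Å: {strict:.0%}, within 2.5 Å: {relaxed:.0%}")
```

How the RMSD itself is computed (atom selection, superposition, ordered residue ranges) is a separate choice that this sketch does not address.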
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-015-9953-4) contains supplementary material, which is available to authorized users.
Protein; NMR; Structure determination; Automation; Quality; Validation; Blind testing; NOE; Chemical shift; CASD-NMR; Accuracy; Precision
We performed a comprehensive structure validation of both automated and manually generated structures of the 10 targets of the CASD-NMR-2013 effort. We established that automated structure determination protocols are capable of reliably producing structures of comparable accuracy and quality to those generated by a skilled researcher, at least for small, single domain proteins such as the ten targets tested. The most robust results appear to be obtained when NOESY peak lists are used either as the primary input data or to augment chemical shift data without the need to manually filter such lists. A detailed analysis of the long-range NOE restraints generated by the different programs from the same data showed a surprisingly low degree of overlap. Additionally, we found that there was no significant correlation between the extent of the NOE restraint overlap and the accuracy of the structure. This result was surprising given the importance of NOE data in producing good quality structures. We suggest that this could be explained by the information redundancy present in NOEs between atoms contained within a fixed covalent network.
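One simple way to quantify the overlap between restraint sets from two programs is a Jaccard index over long-range pairs, sketched below. This is an assumption for illustration only: the abstract does not state which overlap measure was used, restraints are simplified here to residue-number pairs rather than atom pairs, and the sequence-separation cutoff of 5 residues for "long-range" is a common convention, not a value given in the text.

```python
def long_range(restraints, min_sep=5):
    """Keep restraints between residues at least min_sep apart in sequence."""
    return {pair for pair in restraints if abs(pair[0] - pair[1]) >= min_sep}


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical residue-pair restraints from two programs (not real data).
prog_a = {(3, 20), (5, 40), (10, 12), (7, 33)}
prog_b = {(3, 20), (8, 41), (10, 12), (7, 33)}

overlap = jaccard(long_range(prog_a), long_range(prog_b))
print(f"long-range overlap: {overlap:.2f}")
```

Pairs are assumed to be stored in a consistent order (lower residue number first) so that the same restraint compares equal across programs.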
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-015-9949-0) contains supplementary material, which is available to authorized users.
Protein; NMR; Structure determination; Quality; Validation; Blind testing; NOE; CASD-NMR
Target-site selection by retroviral integrase (IN) proteins profoundly affects viral pathogenesis. We describe the solution nuclear magnetic resonance structure of the Moloney murine leukemia virus (M-MLV) IN C-terminal domain (CTD) and a structural homology model of the catalytic core domain (CCD). In solution, the isolated MLV IN CTD adopts an SH3 domain fold flanked by a C-terminal unstructured tail. We generated a concordant MLV IN CCD structural model using SWISS-MODEL, MMM-tree and I-TASSER. Using the X-ray crystal structure of the prototype foamy virus IN target capture complex together with our MLV domain structures, residues within the CCD α2 helical region and the CTD β1-β2 loop were predicted to bind target DNA. The role of these residues was analyzed in vivo through point mutants and motif interchanges. Viable viruses with substitutions at the IN CCD α2 helical region and the CTD β1-β2 loop were tested for effects on integration target site selection. Next-generation sequencing and analysis of integration target sequences indicate that the CCD α2 helical region, in particular P187, interacts with the sequences distal to the scissile bonds whereas the CTD β1-β2 loop binds to residues proximal to them. These findings validate our structural model and disclose IN-DNA interactions relevant to target site selection.
Non-structural protein 1 of influenza A virus, NS1A, is a conserved virulence factor comprised of an N-terminal double-stranded RNA (dsRNA)-binding domain (RBD) and a multifunctional C-terminal effector domain (ED), each of which can independently form symmetric homodimers. Here we apply 19F NMR to NS1A from influenza A/Udorn/307/1972 virus (H3N2) labeled with 5-fluorotryptophan (5-F-Trp), and demonstrate that the 19F signal of Trp187 is a sensitive, direct monitor of the ED helix-helix dimer interface. 19F relaxation dispersion data reveal the presence of conformational dynamics within this functionally important protein-protein interface, whose rate is over three orders of magnitude faster than the kinetics of ED dimerization. 19F NMR also affords direct spectroscopic evidence that Trp187, which mediates intermolecular ED:ED interactions required for cooperative dsRNA binding, is solvent exposed in full-length NS1A at concentrations below aggregation. These results have important implications for the diverse roles of this NS1A epitope during influenza virus infection.
19F NMR; 5-Fluorotryptophan; Conformational dynamics; Effector domain; Influenza A virus; Non-structural protein 1
The 500 kDa protein plectin is essential for the cytoskeletal organization of most mammalian cells and it is up-regulated in some types of cancer. Here, we report nearly complete sequence-specific polypeptide backbone, 13Cβ and methyl group resonance assignments for 24 kDa human plectin(4403-4606) containing the C-terminal plectin repeat domain 6.
cytoskeletal linker protein; HCPIN; plakin repeat domain; plectin repeat domain; selective isotope labeling; structural genomics
The bacteriophage λ Q protein is a transcription antitermination factor that controls expression of the phage late genes as a stable component of the transcription elongation complex. To join the elongation complex, λQ binds a specific DNA sequence element and interacts with RNA polymerase that is paused during early elongation. λQ’s interaction with the paused early elongation complex involves interactions between λQ and two regions of RNA polymerase: region 4 of the σ70 subunit and the flap domain of the β subunit. We present the 2.1 Å resolution crystal structure of a portion of λQ containing determinants for interaction with DNA, interaction with region 4 of σ70, and interaction with the β flap. The structure provides a framework for interpreting prior genetic and biochemical analysis and sets the stage for future structural studies to elucidate the mechanism by which λQ alters the functional properties of the transcription elongation complex.
Cytosolic nucleotidase II (cN-II) from Legionella pneumophila (Lp) catalyzes the hydrolysis of GMP and dGMP with sigmoidal kinetics, whereas IMP hydrolysis shows a biphasic curve in plots of initial rate versus substrate concentration. Allosteric modulators of mammalian cN-II did not activate LpcN-II while GTP, GDP and the substrate GMP were specific activators. Crystal structures of the tetrameric LpcN-II revealed an activator binding site at the dimer interface. A double mutation in this allosteric binding site abolished activation, confirming the structural observations. The substrate GMP acting as an activator and partitioning between the allosteric and active site is the basis for the sigmoidicity of the initial velocity versus GMP concentration plot. The LpcN-II tetramer showed differences in subunit organization upon activator binding that are absent in the activator-bound human cN-II structure. This is the first observation of a structural change induced by activator binding in cN-II that may be the molecular mechanism for enzyme activation.
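To make the contrast between hyperbolic and sigmoidal rate curves concrete, the sketch below evaluates the Hill equation, v = Vmax·[S]^h / (K^h + [S]^h), for two Hill coefficients. The Hill equation is used here only as a generic descriptive model; the abstract does not say which kinetic model was fitted to the LpcN-II data, and all parameter values below are hypothetical.

```python
def hill_rate(s, vmax, k_half, h):
    """Initial rate from the Hill equation: v = vmax * s^h / (k_half^h + s^h)."""
    if s < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * s**h / (k_half**h + s**h)


# Hypothetical parameters, chosen only to show the curve shapes.
vmax, k_half = 1.0, 1.0
for s in (0.25, 0.5, 1.0, 2.0, 4.0):
    hyperbolic = hill_rate(s, vmax, k_half, h=1.0)  # Michaelis-Menten-like
    sigmoidal = hill_rate(s, vmax, k_half, h=2.0)   # cooperative, S-shaped
    print(f"[S]={s:<4} h=1: {hyperbolic:.3f}  h=2: {sigmoidal:.3f}")
```

With h greater than 1 the rate stays low at small substrate concentrations and rises steeply near the half-saturation point, which is the shape usually described as sigmoidal.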
5’-nucleotidase; Allostery; Heterotropic activation; Substrate activation; GMP-complexed LpcN-II structure
Pancreatic cancer has a dismal 5-year survival rate of 5.5% that
has not been improved over the past 25 years despite an enormous amount
of effort. Thus, there is an urgent need to identify truly novel yet
druggable protein targets for drug discovery. The human protein DnaJ
homologue subfamily A member 1 (DNAJA1) was previously shown to be
downregulated 5-fold in pancreatic cancer cells and has been targeted
as a biomarker for pancreatic cancer, but little is known about the
specific biological function for DNAJA1 or the other members of the
DnaJ family encoded in the human genome. Our results suggest the overexpression
of DNAJA1 suppresses the stress response capabilities of the oncogenic
transcription factor, c-Jun, and results in the diminution of cell
survival. DNAJA1 likely activates a DnaK protein by forming a complex
that suppresses the JNK pathway, the hyperphosphorylation of c-Jun,
and the anti-apoptosis state found in pancreatic cancer cells. A high-quality
nuclear magnetic resonance solution structure of the J-domain of DNAJA1
combined with a bioinformatics analysis and a ligand affinity screen
identifies a potential DnaK binding site, which is also predicted
to overlap with an inhibitory binding site, suggesting DNAJA1 activity
is highly regulated.
Cyanobacterial phycobiliproteins have evolved to capture light energy over most of the visible spectrum due to their bilin chromophores, which are linear tetrapyrroles that have been covalently attached by enzymes called bilin lyases. We report here the crystal structure of a bilin lyase of the CpcS family from Thermosynechococcus elongatus (TeCpcS-III). TeCpcS-III is a 10-stranded beta barrel with two alpha helices and belongs to the lipocalin structural family. TeCpcS-III catalyzes both cognate and non-cognate bilin attachment to a variety of phycobiliprotein subunits. TeCpcS-III ligates phycocyanobilin, phycoerythrobilin and phytochromobilin to the alpha and beta subunits of allophycocyanin and to the beta subunit of phycocyanin at the Cys82-equivalent position in all cases. The active form of TeCpcS-III is a dimer, which is consistent with the structure observed in the crystal. Using the UnaG protein and its association with bilirubin as a guide, a model for the association between the native substrate, phycocyanobilin, and TeCpcS-III was produced.
bilin lyase; cyanobacteria; fluorescent probes; phycobiliproteins; lipocalins
A high-quality structure of the 68-residue protein CD1104B from Clostridium difficile strain 630 exhibits a distinct all α-helical fold. The structure presented here is the first representative of bacterial protein domain family PF14203 (currently 180 members) of unknown function (DUF4319) and reveals that the side-chains of the only two strictly conserved residues (Glu 8 and Lys 48) form a salt bridge. Moreover, these two residues are located in the vicinity of the largest surface cleft which is predicted to contribute to a surface area involved in protein-protein interactions. This, along with its coding in transposon CTn4, suggests that CD1104B (and very likely all members of Pfam PF14203) functions by interacting with other proteins required for the transfer of transposons between different bacterial species.
CD1104B; PF14203; DUF4319; Transposon; Structural Genomics
We have determined the solution NMR structure of the intermembrane space domain (IMSD) of the human mitochondrial ATPase associated with various activities (AAA) protease known as AFG3-like protein 2 (AFG3L2). Our structural analysis and molecular dynamics results indicate that the IMSD is peripherally bound to the membrane surface. This is a modification to the location of the six IMSDs in a model of the full length yeast hexaoligomeric homolog of AFG3L2 determined at low resolution by electron cryomicroscopy 1. The predicted protein-protein interaction surface, located on the side furthest from the membrane, may mediate binding to substrates as well as prohibitins.
m-AAA protease; Molecular dynamics; NMR structure
Small molecule control of intracellular protein levels allows temporal and dose-dependent regulation of protein function. Recently, we developed a method to degrade proteins fused to a mutant dehalogenase (HaloTag2) using small molecule hydrophobic tags (HyTs). Here, we introduce a complementary method to stabilize the same HaloTag2 fusion proteins, resulting in a unified system allowing bidirectional control of cellular protein levels in a temporal and dose-dependent manner. From a small molecule screen, we identified N-(3,5-dichloro-2-ethoxybenzyl)-2H-tetrazol-5-amine as a nanomolar HALoTag2 Stabilizer (HALTS1) that reduces the Hsp70:HaloTag2 interaction, thereby preventing HaloTag2 ubiquitination. Finally, we demonstrate the utility of the HyT/HALTS system in probing the physiological role of therapeutic targets by modulating HaloTag2-fused oncogenic H-Ras, which resulted in either the cessation (HyT) or acceleration (HALTS) of cellular transformation. In sum, we present a general platform to study protein function, whereby any protein of interest fused to HaloTag2 can be either degraded 10-fold or stabilized 5-fold using two corresponding compounds.
Drug Target Validation; Hydrophobic Tag; Degron; Hsp70; Ubiquitin Proteasome System
At present, only 0.9% of PDB-deposited structures are of membrane proteins in spite of the fact that membrane proteins constitute approximately 30% of total proteins in most genomes from bacteria to humans. Here we address some of the major bottlenecks in the structural studies of membrane proteins and discuss the ability of the new technology, the Single-Protein Production (SPP) system, to help solve these bottlenecks.
SPP; membrane protein; NMR
Gram-negative bacteria have two independent membranes, the inner cytoplasmic membrane and the outer membrane. The outer membrane contains a number of β-barrel proteins such as OmpF, OmpC, OmpA and OmpX. In this paper, we explored the use of the condensed Single Protein Production (cSPP) system for isotope labelling of OmpA and OmpX for NMR structural study, both of which are known to consist of eight β-strands forming a barrel in the outer membrane. Using a deletion strain lacking all major outer membrane proteins, both OmpA and OmpX were expressed well in a 20-fold condensed SPP (cSPP) system. We demonstrated that outer membrane fractions prepared from the cSPP system in M9 medium containing 15N-NH4Cl can be used directly for NMR structural study of the outer membrane proteins, without any further purification, to obtain excellent [1H-15N]-TROSY spectra.
Biomolecular NMR structures are now routinely used in biology, chemistry, and bioinformatics. Methods and metrics for assessing the accuracy and precision of protein NMR structures are beginning to be standardized across the biological NMR community. These include both knowledge-based assessment metrics, parameterized from the database of protein structures, and model vs. data assessment metrics. Online servers are available that provide comprehensive protein structure quality assessment reports, and efforts are in progress by the world-wide Protein Data Bank (wwPDB) to develop a biomolecular NMR structure quality assessment pipeline as part of the structure deposition process. These quality assessment metrics and standards will aid NMR spectroscopists in determining more accurate structures, and increase the value and utility of these structures for the broad scientific community.
As methods for analysis of biomolecular structure and dynamics using nuclear magnetic resonance spectroscopy (NMR) continue to advance, the resulting 3D structures, chemical shifts, and other NMR data are broadly impacting biology, chemistry, and medicine. Structure model assessment is a critical area of NMR methods development, and is an essential component of the process of making these structures accessible and useful to the wider scientific community. For these reasons, the Worldwide Protein Data Bank (wwPDB) has convened an NMR Validation Task Force (NMR-VTF) to work with the wwPDB partners in developing metrics and policies for biomolecular NMR data harvesting, structure representation, and structure quality assessment. This paper summarizes the recommendations of the NMR-VTF, and lays the groundwork for future work in developing standards and metrics for biomolecular NMR structure quality assessment.
High-quality NMR structures of the C-terminal domain comprising residues 484-537 of the 537-residue protein Bacterial chlorophyll subunit B (BchB) from Chlorobium tepidum and residues 9-61 of the 61-residue Asr4154 from Nostoc sp. (strain PCC 7120) exhibit a mixed α/β fold comprised of three α-helices and a small β-sheet packed against the second α-helix. These two proteins share 29 % sequence similarity and their structures are globally quite similar. The structures of BchB(484-537) and Asr4154(9-61) are the first representative structures for the large protein family (Pfam) PF08369, a family of unknown function currently containing 610 members in bacteria and eukaryotes. Furthermore, BchB(484-537) complements the structural coverage of the dark-operating protochlorophyllide oxidoreductase (DPOR).
BchB; DPOR; Asr4154; PF08369; PCP-red; structural genomics
SecA is an intensively studied mechanoenzyme that uses ATP hydrolysis to drive processive extrusion of secreted proteins through a protein-conducting channel in the cytoplasmic membrane of eubacteria. The ATPase motor of SecA is strongly homologous to that in DEAD-box RNA helicases. It remains unclear how local chemical events in its ATPase active site control the overall conformation of an ~100 kDa multidomain enzyme and drive protein transport. In this paper, we use biophysical methods to establish that a single electrostatic charge in the ATPase active site controls the global conformation of SecA. The enzyme undergoes an ATP-modulated endothermic conformational transition (ECT) believed to involve similar structural mechanics to the protein transport reaction. We have characterized the effects of an isosteric glutamate-to-glutamine mutation in the catalytic base, which mimics the immediate electrostatic consequences of ATP hydrolysis in the active site. Calorimetric studies demonstrate that this mutation facilitates the ECT in E. coli SecA and triggers it completely in B. subtilis SecA. Consistent with the substantial increase in entropy observed in the course of the ECT, hydrogen-deuterium exchange mass spectrometry demonstrates that it increases protein backbone dynamics in domain-domain interfaces at remote locations from the ATPase active site. The catalytic glutamate is one of ~250 charged amino acids in SecA, and yet neutralization of its sidechain charge is sufficient to trigger a global order-disorder transition in this 100 kDa enzyme. The intricate network of structural interactions mediating this effect couples local electrostatic changes during ATP hydrolysis to global conformational and dynamic changes in SecA. This network forms the foundation of the allosteric mechanochemistry that efficiently harnesses the chemical energy stored in ATP to drive complex mechanical processes.
SecA; ATPase; thermodynamics; entropy; protein dynamics; allostery; hydrogen-deuterium exchange
For the 10th experiment on Critical Assessment of the techniques of protein Structure Prediction (CASP) the prediction target proteins were broken into independent evaluation units (EUs), which were then classified into template-based modeling (TBM) or free modeling (FM) categories. We describe here how the EUs were defined and classified, what issues arose in the process, and how we resolved them. Evaluation units are frequently not the whole target proteins but their constituent structural domains. However, the assessors from CASP7 on combined more than one domain into one evaluation unit for some targets, which implied that the assessment also included evaluation of the prediction of the relative position and orientation of these domains. In CASP10, we followed and expanded this notion by defining multi-domain evaluation units for a number of targets. These included three EUs, each made of two domains of familiar fold but arranged in a novel manner and for which the focus of evaluation was the inter-domain arrangement. An EU was classified into the TBM category if a template could be found by sequence similarity searches and into FM if a structural template could not be found by structural similarity searches. The EUs that did not fall cleanly in either of these cases were classified case-by-case, often including consideration of the overall quality and characteristics of the predictions.
CASP; CASP10; protein structure; structure prediction; domain definition; evaluation unit; assessment unit; classification
We have found that refinement of protein NMR structures using Rosetta with experimental NMR restraints yields more accurate protein NMR structures than those that have been deposited in the PDB using standard refinement protocols. Using 40 pairs of NMR and X-ray crystal structures determined by the Northeast Structural Genomics Consortium, for proteins ranging in size from 5 – 22 kDa, restrained-Rosetta refined structures fit better to the raw experimental data, are in better agreement with their X-ray counterparts, and have better phasing power compared to conventionally determined NMR structures. For 38 proteins for which NMR ensembles were available and which had similar structures in solution and in the crystal, all of the restrained-Rosetta refined NMR structures were sufficiently accurate to be used for solving the corresponding X-ray crystal structures by molecular replacement. The protocol for restrained refinement of protein NMR structures was also compared with restrained CS-Rosetta calculations. For proteins smaller than 10 kDa, restrained CS-Rosetta, starting from extended conformations, provides slightly more accurate structures, while for proteins in the size range of 10 – 25 kDa the less CPU-intensive restrained-Rosetta refinement protocols provided more accurate structures. The restrained-Rosetta protocols described here can improve the accuracy of protein NMR structures, and should find broad and general use in studies of protein structure and function.
Protein perdeuteration approaches have tremendous value in protein NMR studies, but are limited by the high cost of perdeuterated media. Here, we demonstrate that E. coli cultures expressing proteins using either the condensed single protein production method (cSPP), or conventional pET expression plasmids, can be condensed prior to protein expression, thereby providing high-quality 2H,13C,15N-enriched protein samples at 2.5 - 10% the cost of traditional methods. As an example of the value of such inexpensively-produced perdeuterated proteins, we produced 2H,13C,15N-enriched E. coli cold shock protein A (CspA) and EnvZb in 40X condensed phase media, and obtained NMR spectra suitable for 3D structure determination. The cSPP system was also used to produce 2H,13C,15N-enriched E. coli plasma membrane protein YaiZ and outer membrane protein X (OmpX) in condensed phase. NMR spectra can be obtained for these membrane proteins produced in the cSPP system following simple detergent extraction, without extensive purification or reconstitution. This allows a membrane protein’s structural and functional properties to be characterized prior to reconstitution, or to be used as a probe of the effects of subsequent purification steps on the structural integrity of membrane proteins. We also provide a standardized protocol for production of perdeuterated proteins using the cSPP system. The 10 - 40 fold reduction in costs of fermentation media provided by using a condensed culture system opens the door to many new applications for perdeuterated proteins in spectroscopic and crystallographic studies.
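The relationship between the condensation factor and the quoted cost range can be checked with simple arithmetic. The sketch below assumes, as a simplification not stated in the text, that media cost scales linearly with culture volume, so that an N-fold condensed culture uses 1/N of the labeled medium; the function name is hypothetical.

```python
def media_cost_fraction(condensation_factor):
    """Fraction of the conventional media cost for an N-fold condensed culture,
    assuming cost scales linearly with media volume."""
    if condensation_factor <= 0:
        raise ValueError("condensation factor must be positive")
    return 1.0 / condensation_factor


for factor in (10, 40):
    fraction = media_cost_fraction(factor)
    print(f"{factor}X condensed: {fraction:.1%} of conventional media cost")
```

Under this assumption, 10-fold and 40-fold condensation correspond to 10% and 2.5% of the conventional media cost, matching the 2.5 - 10% range and the 10 - 40 fold reduction quoted above.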
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.
Protein NMR Structure Validation; BioMagResDatabase; XPLOR; CNS; CYANA; CS-Rosetta