The Morita-Baylis-Hillman reaction forms a carbon-carbon bond between the alpha carbon of a conjugated carbonyl compound and a carbon electrophile. The reaction mechanism involves Michael addition of a nucleophile catalyst at the carbonyl beta carbon, followed by bond formation with the electrophile and catalyst dissociation to release the product. We used Rosetta to design 48 proteins containing active sites predicted to carry out this mechanism, of which two show catalytic activity by mass spectrometry (MS). Substrate labeling measured by MS and site-directed mutagenesis experiments show that the designed active-site residues are responsible for activity, although rate acceleration over background is modest. To characterize the designed proteins, we developed a fluorescence-based screen for intermediate formation in cell lysates, carried out microsecond molecular dynamics simulations, and solved X-ray crystal structures. These data indicate a partially formed active site, and suggest several clear avenues for designing more active catalysts.
Here we describe the solution NMR structure of the 120 amino acid fragment of BT_0084, without the N-terminal lipoprotein targeting sequence, encoded in a conjugative transposon (CTn) in the genome of Bacteroides thetaiotaomicron. BT_0084 belongs to a conserved family of TraQ lipoproteins that are encoded at the end of the tra operon, which contains genes essential for transfer of CTns. The structure belongs to the immunoglobulin superfamily and shares structural similarity, albeit with low sequence identity (< 15%), with other proteins involved in pili production for bacterial cell attachment. Although its role in repression of CTn transfer remains to be determined, the structure of BT_0084 reported here represents the first from the Bacteroides TraQ family and should facilitate further understanding of the tra operon-regulated transfer of CTns.
TraQ; DUF3872; PF12988; structural genomics; tra operon; conjugation; pili; CTnDOT; CTnERL
The crystal structure of a putative HNH endonuclease, Gmet_0936 protein from Geobacter metallireducens GS-15, has been determined at 2.6 Å resolution using the single-wavelength anomalous dispersion method. The structure contains a two-stranded anti-parallel β-sheet that is surrounded by two helices on each face, and reveals a Zn ion bound in each monomer, coordinated by residues Cys38, Cys41, Cys73, and Cys76, which likely plays an important structural role in stabilizing the overall conformation. Structural homologs of Gmet_0936 include Hpy99I endonuclease, phage T4 endonuclease VII, and other HNH endonucleases, with these enzymes sharing 15–20% amino acid sequence identity. An overlay of Gmet_0936 and Hpy99I structures shows that most of the secondary structure elements, the catalytic residues, and the zinc binding site (zinc ribbon) are conserved. However, Gmet_0936 lacks the N-terminal domain of Hpy99I, which mediates DNA binding as well as dimerization. Purified Gmet_0936 forms dimers in solution and a dimer of the protein is observed in the crystal, but with a different mode of dimerization as compared to Hpy99I. Gmet_0936 and its N77H variant show a weak DNA binding activity in a DNA mobility shift assay and a weak Mn2+-dependent nicking activity on supercoiled plasmids in low pH buffers. The preferred substrate appears to be acid and heat-treated DNA with AP sites, suggesting Gmet_0936 may be a DNA repair enzyme.
Unlike random heteropolymers, natural proteins fold into unique ordered structures. Understanding how these are encoded in amino-acid sequences is complicated by energetically unfavourable non-ideal features—for example kinked α-helices, bulged β-strands, strained loops and buried polar groups—that arise in proteins from evolutionary selection for biological function or from neutral drift. Here we describe an approach to designing ideal protein structures stabilized by completely consistent local and non-local interactions. The approach is based on a set of rules relating secondary structure patterns to protein tertiary motifs, which make possible the design of funnel-shaped protein folding energy landscapes leading into the target folded state. Guided by these rules, we designed sequences predicted to fold into ideal protein structures consisting of α-helices, β-strands and minimal loops. Designs for five different topologies were found to be monomeric and very stable and to adopt structures in solution nearly identical to the computational models. These results illuminate how the folding funnels of natural proteins arise and provide the foundation for engineering a new generation of functional proteins free from natural evolution.
Malonyl-coenzyme A decarboxylase (MCD) is found from bacteria to humans, has important roles in regulating fatty acid metabolism and food intake, and is an attractive target for drug discovery. We report here four crystal structures of MCD from human, Rhodopseudomonas palustris, Agrobacterium vitis, and Cupriavidus metallidurans at up to 2.3 Å resolution. The MCD monomer contains an N-terminal helical domain involved in oligomerization and a C-terminal catalytic domain. The four structures exhibit substantial differences in the organization of the helical domains and, consequently, the oligomeric states and intersubunit interfaces. Unexpectedly, the MCD catalytic domain is structurally homologous to those of the GCN5-related N-acetyltransferase superfamily, especially the curacin A polyketide synthase catalytic module, with a conserved His-Ser/Thr dyad important for catalysis. Our structures, along with mutagenesis and kinetic studies, provide a molecular basis for understanding pathogenic mutations and catalysis, as well as a template for structure-based drug design.
• Structures of human and bacterial MCDs were determined at up to 2.3 Å resolution
• Distinct tetrameric and dimeric MCD oligomerizations were observed
• Unexpected homology to the GNAT superfamily gives insights into catalytic mechanism
• The structures provide the molecular basis for the disease-causing mutations in MCD
Malonyl-CoA decarboxylase (MCD) is important in fatty acid metabolism. Froese et al. report structures of several MCDs and show that the MCD catalytic domain shares structural homology with the GNAT superfamily. The structures further our understanding of catalysis, pathogenic mutations, and drug design.
The ribosome consists of small and large subunits each comprised of dozens of proteins and RNA molecules. However, the functions of many of the individual protomers within the ribosome are still unknown. Here we describe the solution NMR structure of the ribosomal protein RP-L35Ae from the archaeon Pyrococcus furiosus. RP-L35Ae is buried within the large subunit of the ribosome and belongs to Pfam protein domain family PF01247, which is highly conserved in eukaryotes, present in a few archaeal genomes, but absent in bacteria. The protein adopts a six-stranded anti-parallel β-barrel analogous to the ‘tRNA binding motif’ fold. The structure of the P. furiosus RP-L35Ae presented here constitutes the first structural representative from this protein domain family.
ribosomal protein; L35Ae; PF01247; tRNA binding; solution NMR; structural genomics
Protein domain family YabP (PF07873) is a family of small protein domains that are conserved in a wide range of bacteria and involved in spore coat assembly during the process of sporulation. The 62-residue fragment of Dsy0195 from Desulfitobacterium hafniense, which belongs to the YabP family, exists as a homodimer in solution under the conditions used for structure determination using NMR spectroscopy. The structure of the Dsy0195 homodimer contains two identical 62-residue monomeric subunits, each consisting of five anti-parallel beta strands (β1, 23-29; β2, 31-38; β3, 41-46; β4, 49-59; β5, 69-80). The tertiary structure of the Dsy0195 monomer adopts a cylindrical fold composed of two beta sheets. The two monomer subunits fold into a homodimer about a single C2 symmetry axis, with the interface composed of two anti-parallel beta strands, β1-β1’ and β5b-β5b’, where β5b refers to the C-terminal half of the bent β5 strand, without any domain swapping. Potential functional regions of the Dsy0195 structure were predicted based on conserved sequence analysis. The Dsy0195 structure reported here is the first representative structure from the YabP family.
PF07873; YabP; Dsy0195; Sporulation Protein; Structural Genomics; NMR
The protein Pspto_3016 is a 117-residue member of the protein domain family PF04237 (DUF419), which is to date a functionally uncharacterized family of proteins. In this report, we describe the structure of Pspto_3016 from Pseudomonas syringae solved by both solution NMR and X-ray crystallography at 2.5 Å resolution. In both cases, the structure of Pspto_3016 adopts a “double wing” α/β sandwich fold similar to that of protein YjbR from Escherichia coli and to the C-terminal DNA binding domain of the MotA transcription factor (MotCF) from T4 bacteriophage, along with other uncharacterized proteins. Pspto_3016 was selected by the Protein Structure Initiative of the National Institutes of Health and the Northeast Structural Genomics Consortium (NESG ID PsR293).
Pspto_3016; PF04237; DUF419; structural genomics; 2KFP; 3H9X; double wing; NMR; X-ray crystallography
One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, a so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae.
CASP; protein structure; X-ray crystallography; NMR; structure prediction
Recent studies of signal transduction in bacteria have revealed a unique second messenger, bis-(3′-5′)-cyclic dimeric GMP (c-di-GMP), which regulates transitions between motile states and sessile states, such as biofilms. C-di-GMP is synthesized from two GTP molecules by diguanylate cyclases (DGC). The catalytic activity of DGCs depends on a conserved GG(D/E)EF domain, usually part of a larger multi-domain protein organization. The domains other than the GG(D/E)EF domain often control DGC activation. This paper presents the 1.83 Å crystal structure of an isolated catalytically competent GG(D/E)EF domain from the A1U3W3_MARAV protein from Marinobacter aquaeolei. Co-crystallization with GTP resulted in enzymatic synthesis of c-di-GMP. Comparison with previously solved DGC structures shows a similar orientation of c-di-GMP bound to an allosteric regulatory site mediating feedback inhibition of the enzyme. Biosynthesis of c-di-GMP in the crystallization reaction establishes that the enzymatic activity of this DGC domain does not require interaction with regulatory domains.
Diguanylate cyclase; GG(D/E)EF domain; Cyclic di-GMP; X-ray crystal structure; Structural genomics
Human retinoblastoma binding protein 9 (RBBP9) is an interacting partner of the retinoblastoma susceptibility protein (Rb). RBBP9 is a tumor-associated protein required for pancreatic neoplasia, affects cell cycle control, and is involved in the TGF-β signalling pathway. Sequence analysis suggests that RBBP9 belongs to the α/β hydrolase superfamily of enzymes. The serine hydrolase activity of RBBP9 is required for development of pancreatic carcinomas in part by inhibiting TGF-β antiproliferative signaling through suppressing Smad2/3 phosphorylation. The crystal structure of human RBBP9 confirms the α/β hydrolase fold, with a six-stranded parallel β-sheet flanked by α-helices. The structure of RBBP9 resembles that of the YdeN protein from Bacillus subtilis, which is suggested to have carboxylesterase activity. RBBP9 contains a Ser75-His165-Asp138 catalytic triad, situated in a prominent pocket on the surface of the protein. The side chains of the LxCxE sequence motif, which is important for interaction with Rb, are mostly buried in the structure. Structure-function studies of RBBP9 suggest possible routes for novel cancer drug discovery programs.
α/β hydrolase; pancreatic cancer; protein structure; structural genomics; RBBP9
High-quality NMR structures of the homo-dimeric proteins Bvu3908 (69 residues in the monomeric unit) from Bacteroides vulgatus and Bt2368 (74 residues) from Bacteroides thetaiotaomicron reveal the presence of winged helix-turn-helix (wHTH) motifs mediating tight complex formation. Such homo-dimer formation by winged HTH motifs is otherwise found only in two DNA-binding proteins with known structure: the C-terminal wHTH domain of transcriptional activator FadR from E. coli and protein TubR from B. thuringiensis, which is involved in plasmid DNA segregation. However, the relative orientation of the wHTH motifs is different and residues involved in DNA-binding are not conserved in Bvu3908 and Bt2368. Hence, the proteins of the present study are not very likely to bind DNA, but are likely to exhibit a function that has thus far not been ascribed to homo-dimers formed by winged HTH motifs. The structures of Bvu3908 and Bt2368 are the first atomic resolution structures for PFAM family PF10771, a family of unknown function (DUF2582) currently containing 128 members.
Bvu3908; Bt2368; PF10771; DUF2582; Winged helix-turn-helix; Structural genomics
The protein domain family PF12095 (DUF3571) is a functionally uncharacterized family of small proteins conserved from cyanobacteria to plants that are typically 85 to 95 amino acids in length in cyanobacteria. In this report, we describe the solution NMR structure of the 86-residue protein Asl3597 from Nostoc sp. PCC7120. The structure of Asl3597, which constitutes the first three-dimensional structure from protein family PF12095, has a unique α/β sandwich fold consisting of four anti-parallel β-strands opposite three continuous α-helices. Asl3597 may have a role in the assembly of the hydrophilic subcomplex of the cyanobacterial NAD(P)H complex as suggested by data for the orthologous Chlororespiratory reduction 7 protein from Arabidopsis thaliana.
Asl3597; PF12095; DUF3571; structural genomics; 2KRX; NDH complex; chlororespiratory reduction 7; CRR7
Molecular replacement (MR) is widely used for addressing the phase problem in X-ray crystallography. Historically, crystallographers have had limited success using NMR structures as MR search models. Here we report a comprehensive investigation of the utility of protein NMR ensembles as MR search models, using data for 25 pairs of X-ray and NMR structures solved and refined using modern NMR methods. Starting from NMR ensembles prepared by an improved protocol, FindCore, correct MR solutions were obtained for 22 targets. Based on these solutions, automatic model rebuilding could be done successfully. Rosetta refinement of NMR structures provided MR solutions for another two proteins. We also demonstrate that such properly prepared NMR ensembles and X-ray crystal structures have similar performance when used as MR search models for homologous structures, particularly for targets with sequence identity > 40%.
The protein family (Pfam) PF04536 is a broadly conserved domain family of unknown function (DUF477), with more than 1,350 members in prokaryotic and eukaryotic proteins. High-quality NMR structures of the N-terminal domain comprising residues 41–180 of the 684-residue protein CG2496 from Corynebacterium glutamicum and the N-terminal domain comprising residues 35–182 of the 435-residue protein PG0361 from Porphyromonas gingivalis both exhibit an α/β fold comprised of a four-stranded β-sheet, three α-helices packed against one side of the sheet, and a fourth α-helix attached to the other side. In spite of low sequence similarity (18%) assessed by structure-based sequence alignment, the two structures are globally quite similar. However, moderate structural differences are observed for the relative orientation of two of the four helices. Comparison with known protein structures reveals that the α/β architecture of CG2496(41–180) and PG0361(35–182) has previously not been characterized. Moreover, calculation of surface charge potential and identification of surface clefts indicate that the two domains very likely have different functions.
CG2496; PG0361; CgR26A; PgR37A; PF04536; DUF477; Structural genomics
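A percent identity such as the 18% quoted for CG2496 and PG0361 is typically derived by counting identical residues across the columns of a pairwise alignment. The following is a minimal sketch of that arithmetic only. The function name, the gap character, and the choice of denominator (columns where both sequences have a residue) are assumptions for illustration; the abstract does not state which convention the authors used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Hypothetical helper: the denominator convention is an assumption,
    not taken from the abstract.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    aligned_columns = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if res_a == gap or res_b == gap:
            continue
        aligned_columns += 1
        if res_a == res_b:
            identical += 1

    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported identities are only comparable when the denominator is known.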
The protocols currently used for protein structure determination by NMR depend on the determination of a large number of upper distance limits for proton-proton pairs. Typically, this task is performed manually by an experienced researcher rather than automatically by using a specific computer program. To assess whether it is indeed possible to generate in a fully automated manner NMR structures adequate for deposition in the Protein Data Bank, we gathered ten experimental datasets with unassigned NOESY peak lists for various proteins of unknown structure, computed structures for each of them using different, fully automatic programs, and compared the results to each other and to the manually solved reference structures that were not available at the time the data were provided. This constitutes a stringent “blind” assessment similar to the CASP and CAPRI initiatives. This study demonstrates the feasibility of routine, fully automated protein structure determination by NMR.
Single-stranded DNA (ssDNA) binding proteins are important in basal metabolic pathways for gene transcription, recombination, DNA repair and replication in all domains of life. Their main cellular role is to stabilize melted duplex DNA and protect genomic DNA from degradation. We have uncovered the molecular function of the protein domain family DUF2128 (domain of unknown function; PF09901) as a novel ssDNA binding domain. This bacterial domain strongly associates into a dimer and presents a highly positively charged surface that is consistent with its function in non-specific ssDNA binding. Lactococcus lactis YdbC is a representative of DUF2128. The solution NMR structures of the 20 kDa apo-YdbC dimer and YdbC:dT19G1 complex were determined. The ssDNA-binding energetics to YdbC were characterized by isothermal titration calorimetry. YdbC shows comparable nanomolar affinities for pyrimidine and mixed oligonucleotides, and the affinity is sufficiently strong to disrupt duplex DNA. In addition, YdbC binds with lower affinity to ssRNA, making it a versatile nucleic acid-binding domain. The DUF2128 family is related to the eukaryotic nuclear protein positive cofactor 4 (PC4) family and to the PUR family both by fold similarity and molecular function.
Crystal structures of the putative ATP pyrophosphatase PF0828 reveal a new domain (EGT domain), and a large conformational change upon ATP binding.
ATP pyrophosphatases (ATP PPases) are widely distributed in archaea and eukaryotes. They share an HUP domain at the N-terminus with a conserved PP-motif that interacts with the phosphates of ATP. The PF0828 protein from Pyrococcus furiosus is a member of the ATP PPase superfamily and it also has a 100-residue C-terminal extension that contains a strictly conserved EGG(E/D)xE(T/S) motif, which has been named the EGT-motif. Here, crystal structures of PF0828 alone and in complex with ATP or AMP are reported. The HUP domain contains a central five-stranded β-sheet that is surrounded by four helices, as in other related structures. The C-terminal extension forms a separate domain, named the EGT domain, which makes tight interactions with the HUP domain, bringing the EGT-motif near to the PP-motif and defining the putative active site of PF0828. Both motifs interact with the phosphate groups of ATP. A loop in the HUP domain undergoes a large conformational change to recognize the adenine base of ATP. In solution and in the crystal PF0828 is a dimer formed by the side-by-side arrangement of the HUP domains of the two monomers. The putative active site is located far from the dimer interface.
ATP pyrophosphatases; PF0828; ATP binding; Pyrococcus furiosus
Protein domain family PF09905 (DUF2132) is a family of small domains of unknown function that are conserved in a wide range of bacteria. Here we describe the solution NMR structure of the 80-residue VF0530 protein from Vibrio fischeri, the first structural representative from this protein domain family. We demonstrate that the structure of VF0530 adopts a unique four-helix motif that shows some similarity to the C-terminal double-stranded DNA (dsDNA) binding domain of RecA, as well as other nucleic acid binding domains. Moreover, gel shift binding data indicate a potential dsDNA binding role for VF0530.
DUF2132; PF09905; nucleic acid binding domain; structural genomics
Structural crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy are the predominant techniques for understanding the biological world on a molecular level. Crystallography is constrained by the ability to form a crystal that diffracts well, and NMR is constrained to smaller proteins. Although powerful, these techniques leave many soluble, purified protein samples structurally uncharacterized. Small Angle X-ray Scattering (SAXS) is a solution technique that provides data on the size and multiple conformations of a sample, and can be used to reconstruct a low resolution molecular envelope of a macromolecule. In this study SAXS has been used in a high-throughput manner on a subset of 28 proteins where structural information is available from crystallographic and/or NMR techniques. These crystallographic and NMR structures were used to validate the accuracy of molecular envelopes reconstructed from SAXS data on a statistical level, to compare and highlight complementary structural information that SAXS provides, and to leverage biological information derived by crystallographers and spectroscopists from their structures. All of the ab initio molecular envelopes calculated from the SAXS data agree well with the available structural information. SAXS is a powerful albeit low-resolution technique that can provide additional structural information in a high-throughput and complementary manner to improve the functional interpretation of high-resolution structures.
UNC119 is widely expressed among vertebrates and invertebrates. Here we report that UNC119 recognized the acylated N-terminus of the rod photoreceptor transducin α-subunit (Tα) as well as C. elegans G proteins Odr-3 and Gpa-13. The crystal structure of human UNC119 at 1.95 Å resolution revealed an immunoglobulin-like β-sandwich fold. Pulldowns and isothermal titration calorimetry revealed a tight interaction between UNC119 and acylated Gα peptides. Co-crystallization of UNC119 with an acylated Tα N-terminal peptide at 2.0 Å revealed that the lipid chain is buried deeply into UNC119's hydrophobic cavity. UNC119 bound TαGTP inhibiting its GTPase activity, thereby providing a stable UNC119-TαGTP complex that is capable of diffusing from the inner segment back to the outer segment following light-induced translocation. UNC119 deletion in both mouse and C. elegans leads to G protein mislocalization. These results establish UNC119 as a novel Gα-subunit cofactor that is essential for G-protein trafficking in sensory cilia.
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that packs in all lipocalins with known structure against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631 to which YxeF belongs constituting the first representatives. The results presented here exemplify the impact of structural genomics to enhance our understanding of biology and to generate new biological hypotheses.
A library of quinoxaline derivatives was prepared to target non-structural protein 1 of influenza A (NS1A) as a means to develop anti-influenza drug leads. An in vitro fluorescence polarization assay demonstrated that these compounds disrupted the dsRNA-NS1A interaction to varying extents. Changes of substituent at positions 2, 3 and 6 on the quinoxaline ring led to variance in responses. The most active compounds (35 and 44) had IC50 values in the range of low micromolar concentration without exhibiting significant dsRNA intercalation. Compound 44 was able to inhibit influenza A/Udorn/72 virus growth.
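One standard way SAXS data yield a size estimate is the Guinier approximation, ln I(q) ≈ ln I(0) − (Rg² q²)/3, which holds at small scattering angles. The sketch below fits that line by ordinary least squares and returns the radius of gyration Rg and the forward scattering I(0). It is an illustration under stated assumptions: the abstract does not say which analysis the authors applied, the function name is hypothetical, and the caller is assumed to pass only points from the low-q region where the approximation is valid.

```python
import math


def guinier_fit(q, intensity):
    """Estimate (Rg, I0) from a least-squares line through ln I versus q^2.

    Hypothetical helper. Assumes the supplied points already lie in the
    Guinier region (commonly taken as q * Rg below about 1.3).
    """
    if len(q) != len(intensity) or len(q) < 2:
        raise ValueError("need at least two matched (q, I) points")

    x = [qi * qi for qi in q]              # q^2
    y = [math.log(i) for i in intensity]   # ln I(q); intensities must be > 0
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope >= 0:
        raise ValueError("non-negative slope: data are not Guinier-like")

    rg = math.sqrt(-3.0 * slope)           # slope = -Rg^2 / 3
    i0 = math.exp(intercept)
    return rg, i0
```

For example, synthetic intensities generated from Rg = 20 and I(0) = 100 at q = 0.01 to 0.05 should be recovered to within floating-point error; real data would also need error weighting and a check for aggregation.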
Quinoxaline derivatives; NS1A protein; Influenza A virus; Fluorescence polarization
We describe the RPF web server, a quality assessment tool for protein NMR structures. The RPF server measures the ‘goodness-of-fit’ of the 3D structure with NMR chemical shift and unassigned NOESY data, and calculates a discrimination power (DP) score, which estimates the differences between the fits of the query structures and random coil structures to these experimental data. The DP-score is an accuracy predictor of the query structure. The RPF server also maps local structure quality measures onto the 3D structure using an online molecular viewer, and onto the NMR spectra, allowing refinement of the structure and/or NOESY peak list data. The RPF server is available at: http://nmr.cabm.rutgers.edu/rpf.
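The description of the DP-score as the difference between the fit of the query structure and the fit of random coil structures can be read as a normalized score. The sketch below shows one plausible form, assuming the fit is summarized as an F-measure of recall and precision and that the score is scaled between a random-coil baseline and an ideal upper bound. The abstract does not give the formula, so the function names, the arguments, and the normalization are assumptions for illustration rather than the server's actual implementation.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (0.0 if both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


def dp_score(f_query: float, f_random: float, f_ideal: float) -> float:
    """Scale the query fit between a random-coil baseline and an ideal fit.

    Assumed form: (F_query - F_random) / (F_ideal - F_random).
    A value near 0 means the query fits the data no better than a
    random coil; a value near 1 means it approaches the ideal fit.
    """
    span = f_ideal - f_random
    if span <= 0:
        raise ValueError("ideal fit must exceed the random-coil baseline")
    return (f_query - f_random) / span
```

With this normalization, a query F-measure of 0.75 against a baseline of 0.5 and an ideal of 1.0 gives a score of 0.5.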
The Protein Structure Initiative (PSI) was established in 2000 by the National Institute of General Medical Sciences with the long-term goal of providing 3D (three-dimensional) structural information for most proteins in nature. As advances in genomic sequencing, bioinformatics, homology modelling, and methods for rapid determination of 3D structures of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) converged, it was proposed that our understanding of the biology of protein structure and evolution could be greatly enabled by ‘genomic-scale’ protein structure determination. Over the past 12 years, the PSI has evolved from a testing bed for new methods of sample and structure production to a core component of a wide range of biology programs.