We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links.
structural genomics; protein function prediction; PUA domain-like domains; X-ray crystallography; NMR
Protein domain family PF06855 (DUF1250) is a family of small domains of unknown function found only in bacteria, and mostly in the orders Bacillales and Lactobacillales. Here we describe the solution NMR or X-ray crystal structures of three representatives of this domain family, MW0776 and MW1311 from Staphylococcus aureus and yozE from Bacillus subtilis. All three proteins adopt a four-helix motif similar to sterile alpha motif (SAM) domains. Phylogenetic analysis classifies MW1311 and yozE as functionally equivalent proteins of the UPF0346 family of unknown function, but excludes MW0776, which likely has a different biological function. Our structural characterization of the three domains supports this separation of function. The structures of MW0776, MW1311, and yozE constitute the first structural representatives from this protein domain family.
yozE; MW0776; MW1311; UPF0346; DUF1250; PF06855; SAM domain; X-ray crystal structure; solution NMR; structural genomics
Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions. The analysis of these becomes tedious without a degree of automation. Crystallization, a rate-limiting step in biological X-ray crystallography, is one of these fields. Screening of multiple potential crystallization conditions (cocktails) is the most effective method of probing a protein's phase diagram and guiding crystallization, but the interpretation of results can be time-consuming. To aid this empirical approach, a cocktail distance coefficient was developed to quantitatively compare macromolecule crystallization conditions and outcome. These coefficients were evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and a comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was employed to visualize one of these screens, and the crystallization results from an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) were overlaid on this clustering. This demonstrated a strong correlation between certain chemically related clusters and crystal lead conditions. While this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium and phosphate atoms in the structure. With these in place, the resulting structure of the putative active site demonstrated features consistent with active sites of other phosphatases which are involved in binding the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application, and coupled with hierarchical clustering and the overlay of crystallization outcome, reveals information of biological relevance.
While tested with a single example, the potential applications related to crystallography appear promising, and the distance coefficient, clustering, and hierarchical visualization of results undoubtedly have applications in wider fields.
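The workflow described above (pairwise cocktail distances, hierarchical clustering, overlay of crystallization outcome) can be illustrated with a minimal sketch. The abstract does not give the CDcoeff formula, so the distance below is a stand-in chosen only for illustration: a Bray-Curtis dissimilarity on component concentrations blended with a scaled pH difference. The cocktail compositions, the `ph_weight` parameter, the clustering threshold, and the hit list are all hypothetical, and single-linkage merging is one simple choice of hierarchical clustering among several.

```python
from itertools import combinations

def cocktail_distance(a, b, ph_weight=0.5):
    """Toy distance in [0, 1]; NOT the published CDcoeff (assumed stand-in).

    Blends a Bray-Curtis dissimilarity over component concentrations
    (assumed pre-scaled to comparable units) with a pH gap scaled by 14.
    """
    names = set(a["components"]) | set(b["components"])
    num = sum(abs(a["components"].get(n, 0.0) - b["components"].get(n, 0.0)) for n in names)
    den = sum(a["components"].get(n, 0.0) + b["components"].get(n, 0.0) for n in names)
    chem = num / den if den else 0.0
    ph = abs(a["ph"] - b["ph"]) / 14.0
    return (1 - ph_weight) * chem + ph_weight * ph

def single_linkage(items, dist, threshold):
    """Agglomerative single-linkage clustering, stopped at a distance threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(dist(items[i], items[j]) for i in clusters[x] for j in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return [sorted(c) for c in clusters]

def hit_fraction(cluster, items, hits):
    """Fraction of cocktails in a cluster that produced a crystal hit."""
    return sum(items[i]["name"] in hits for i in cluster) / len(cluster)

# Hypothetical four-cocktail screen and hypothetical crystal hits.
screen = [
    {"name": "A", "components": {"PEG 3350": 20.0, "NaCl": 0.1}, "ph": 7.0},
    {"name": "B", "components": {"PEG 3350": 22.0, "NaCl": 0.1}, "ph": 7.0},
    {"name": "C", "components": {"ammonium sulfate": 2.0}, "ph": 5.0},
    {"name": "D", "components": {"ammonium sulfate": 1.8}, "ph": 5.5},
]
hits = {"A", "B"}

clusters = single_linkage(screen, cocktail_distance, threshold=0.2)
for c in clusters:
    members = [screen[i]["name"] for i in c]
    print(members, "hit fraction:", hit_fraction(c, screen, hits))
```

In this toy screen the two PEG-based cocktails and the two ammonium sulfate cocktails form separate clusters, and overlaying the hits shows which chemical neighbourhood is productive. A real analysis would substitute the published coefficient and a full 1,536-cocktail screen.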
Rational drug design relies on three-dimensional structures of biological macromolecules, especially proteins. Structural genomics high-throughput (HTP) structure determination platforms established by the NIH Protein Structure Initiative are uniquely suited to provide these structures. NMR plays a critical role since (i) many important protein targets do not form single crystals required for X-ray diffraction and (ii) NMR can provide valuable structural and dynamic information on proteins and their drug complexes that cannot be obtained with X-ray crystallography. In this article, recent advances of NMR driven by structural genomics projects are reviewed. These advances promise that future pharmaceutical discovery and design of drugs can increasingly rely on protocols for rapid and accurate NMR structure determination.
protein NMR; structural genomics; structural proteomics; drug discovery; protein interaction networks; structural bioinformatics
While there has been considerable progress in designing protein-protein interactions, the design of proteins that bind polar surfaces is an unmet challenge. We describe the computational design of a protein that binds the acidic active site of hen egg lysozyme and inhibits the enzyme. The design process starts with two polar amino acids that fit deep into the enzyme active site, identifies a protein scaffold that supports these residues and is complementary in shape to the lysozyme active site region, and finally optimizes the surrounding contact surface for high affinity binding. Following affinity maturation, a protein designed using this method bound lysozyme with low nanomolar affinity, and a combination of NMR studies, crystallography and knockout mutagenesis confirmed the designed binding surface and orientation. Saturation mutagenesis with selection and deep sequencing demonstrated that specific designed interactions extending well beyond the centrally grafted polar residues are critical for high affinity binding.
protein-protein interactions; hot spot; Rosetta molecular modeling program; protein engineering and design
We report alterations to the murine leukemia virus (MLV) integrase (IN) protein that decrease its integration frequency at transcription start sites and CpG islands, thereby reducing the potential for insertional activation. The host bromo and extraterminal (BET) proteins Brd2, 3 and 4 interact with the MLV IN protein primarily through the BET protein ET domain. Using solution NMR, protein interaction studies, and next-generation sequencing, we show that the C-terminal tail peptide region of MLV IN is important for the interaction with BET proteins and that disruption of this interaction through truncation mutations affects the global targeting profile of MLV vectors. The use of the unstructured tails of gammaretroviral INs to direct association with complexes at active promoters parallels that used by histones and RNA polymerase II. Viruses bearing MLV IN C-terminal truncations can provide new avenues to improve the safety profile of gammaretroviral vectors for human gene therapy.
A high-quality NMR structure of the helicase associated (HA) domain comprising residues 627–691 of the 753-residue protein BVU_0683 from Bacteroides vulgatus exhibits an all α-helical fold. The structure presented here is the first representative for the large protein domain family PF03457 (currently 742 members) of HA domains. Comparison with structurally similar proteins supports the hypothesis that HA domains bind to DNA and that binding specificity varies greatly within the family of HA domains constituting PF03457.
A6KY75_BACV8; BVU_0683; PF03457; Helicase associated domain; Structural genomics; SANT domain
Protein domain family PF11267 (DUF3067) is a family of proteins of unknown function found in both bacteria and eukaryotes. Here we present the solution NMR structure of the 102-residue Alr2454 protein from Nostoc sp. PCC 7120, which constitutes the first structural representative from this conserved protein domain family. The structure of Nostoc sp. Alr2454 adopts a novel protein fold.
Alr2454 protein; DUF3067; PF11267; Protein Structure Initiative; Solution NMR structure; Structural genomics
The yeast mitochondrial protein Sdh5 is required for the covalent attachment of flavin adenine dinucleotide (FAD) to protein Sdh1, a subunit of the hetero-tetrameric enzyme succinate dehydrogenase (SDH). The NMR structure of Sdh5 represents the first eukaryotic structure of the Pfam family PF03937 and reveals a conserved surface region, which likely represents the Sdh1-Sdh5 interaction interface. Point mutations in this region result in the loss of covalent flavinylation of Sdh1. Moreover, backbone chemical shift perturbation measurements showed that Sdh5 does not bind FAD in vitro, indicating that it does not function as a simple cofactor transporter in vivo.
The Morita-Baylis-Hillman reaction forms a carbon-carbon bond between the alpha carbon of a conjugated carbonyl compound and a carbon electrophile. The reaction mechanism involves Michael addition of a nucleophile catalyst at the carbonyl beta carbon, followed by bond formation with the electrophile and catalyst dissociation to release the product. We used Rosetta to design 48 proteins containing active sites predicted to carry out this mechanism, of which two show catalytic activity by mass spectrometry (MS). Substrate labeling measured by MS and site-directed mutagenesis experiments show that the designed active-site residues are responsible for activity, although rate acceleration over background is modest. To characterize the designed proteins, we developed a fluorescence-based screen for intermediate formation in cell lysates, carried out microsecond molecular dynamics simulations, and solved X-ray crystal structures. These data indicate a partially formed active site, and suggest several clear avenues for designing more active catalysts.
Here we describe the solution NMR structure of the 120 amino acid fragment of BT_0084, without the N-terminal lipoprotein targeting sequence, encoded in a conjugative transposon (CTn) in the genome of Bacteroides thetaiotaomicron. BT_0084 belongs to a conserved family of TraQ lipoproteins that are encoded at the end of the tra operon, which contains genes essential for transfer of CTns. The structure belongs to the immunoglobulin superfamily and shares structural similarity, albeit low sequence identity (< 15%), with other proteins involved in pili production for bacterial cell attachment. Although its role in repression of CTn transfer remains to be determined, the structure of BT_0084 reported here represents the first from the Bacteroides TraQ family and should facilitate further understanding of the tra operon-regulated transfer of CTns.
TraQ; DUF3872; PF12988; structural genomics; tra operon; conjugation; pili; CTnDOT; CTnERL
The crystal structure of a putative HNH endonuclease, Gmet_0936 protein from Geobacter metallireducens GS-15, has been determined at 2.6 Å resolution using the single-wavelength anomalous dispersion method. The structure contains a two-stranded anti-parallel β-sheet that is surrounded by two helices on each face, and reveals a Zn ion bound in each monomer, coordinated by residues Cys38, Cys41, Cys73, and Cys76, which likely plays an important structural role in stabilizing the overall conformation. Structural homologs of Gmet_0936 include Hpy99I endonuclease, phage T4 endonuclease VII, and other HNH endonucleases, with these enzymes sharing 15–20% amino acid sequence identity. An overlay of Gmet_0936 and Hpy99I structures shows that most of the secondary structure elements, the catalytic residues, and the zinc binding site (zinc ribbon) are conserved. However, Gmet_0936 lacks the N-terminal domain of Hpy99I, which mediates DNA binding as well as dimerization. Purified Gmet_0936 forms dimers in solution and a dimer of the protein is observed in the crystal, but with a different mode of dimerization as compared to Hpy99I. Gmet_0936 and its N77H variant show a weak DNA binding activity in a DNA mobility shift assay and a weak Mn2+-dependent nicking activity on supercoiled plasmids in low pH buffers. The preferred substrate appears to be acid- and heat-treated DNA with AP sites, suggesting Gmet_0936 may be a DNA repair enzyme.
Unlike random heteropolymers, natural proteins fold into unique ordered structures. Understanding how these are encoded in amino-acid sequences is complicated by energetically unfavourable non-ideal features—for example kinked α-helices, bulged β-strands, strained loops and buried polar groups—that arise in proteins from evolutionary selection for biological function or from neutral drift. Here we describe an approach to designing ideal protein structures stabilized by completely consistent local and non-local interactions. The approach is based on a set of rules relating secondary structure patterns to protein tertiary motifs, which make possible the design of funnel-shaped protein folding energy landscapes leading into the target folded state. Guided by these rules, we designed sequences predicted to fold into ideal protein structures consisting of α-helices, β-strands and minimal loops. Designs for five different topologies were found to be monomeric and very stable and to adopt structures in solution nearly identical to the computational models. These results illuminate how the folding funnels of natural proteins arise and provide the foundation for engineering a new generation of functional proteins free from natural evolution.
Malonyl-coenzyme A decarboxylase (MCD) is found from bacteria to humans, has important roles in regulating fatty acid metabolism and food intake, and is an attractive target for drug discovery. We report here four crystal structures of MCD from human, Rhodopseudomonas palustris, Agrobacterium vitis, and Cupriavidus metallidurans at up to 2.3 Å resolution. The MCD monomer contains an N-terminal helical domain involved in oligomerization and a C-terminal catalytic domain. The four structures exhibit substantial differences in the organization of the helical domains and, consequently, the oligomeric states and intersubunit interfaces. Unexpectedly, the MCD catalytic domain is structurally homologous to those of the GCN5-related N-acetyltransferase superfamily, especially the curacin A polyketide synthase catalytic module, with a conserved His-Ser/Thr dyad important for catalysis. Our structures, along with mutagenesis and kinetic studies, provide a molecular basis for understanding pathogenic mutations and catalysis, as well as a template for structure-based drug design.
• Structures of human and bacterial MCDs were determined at up to 2.3 Å resolution
• Distinct tetrameric and dimeric MCD oligomerizations were observed
• Unexpected homology to the GNAT superfamily gives insights into catalytic mechanism
• The structures provide the molecular basis for the disease-causing mutations in MCD
Malonyl-CoA decarboxylase (MCD) is important in fatty acid metabolism. Froese et al. report structures of several MCDs and show that the MCD catalytic domain shares structural homology with the GNAT superfamily. The structures further our understanding of catalysis, pathogenic mutations, and drug design.
The ribosome consists of small and large subunits each comprised of dozens of proteins and RNA molecules. However, the functions of many of the individual protomers within the ribosome are still unknown. Here we describe the solution NMR structure of the ribosomal protein RP-L35Ae from the archaeon Pyrococcus furiosus. RP-L35Ae is buried within the large subunit of the ribosome and belongs to Pfam protein domain family PF01247, which is highly conserved in eukaryotes, present in a few archaeal genomes, but absent in bacteria. The protein adopts a six-stranded anti-parallel β-barrel analogous to the ‘tRNA binding motif’ fold. The structure of the P. furiosus RP-L35Ae presented here constitutes the first structural representative from this protein domain family.
ribosomal protein; L35Ae; PF01247; tRNA binding; solution NMR; structural genomics
Protein domain family YabP (PF07873) is a family of small protein domains that are conserved in a wide range of bacteria and involved in spore coat assembly during the process of sporulation. The 62-residue fragment of Dsy0195 from Desulfitobacterium hafniense, which belongs to the YabP family, exists as a homodimer in solution under the conditions used for structure determination using NMR spectroscopy. The structure of the Dsy0195 homodimer contains two identical 62-residue monomeric subunits, each consisting of five anti-parallel beta strands (β1, 23-29; β2, 31-38; β3, 41-46; β4, 49-59; β5, 69-80). The tertiary structure of the Dsy0195 monomer adopts a cylindrical fold composed of two beta sheets. The two monomer subunits fold into a homodimer about a single C2 symmetry axis, with the interface composed of two anti-parallel beta strands, β1-β1’ and β5b-β5b’, where β5b refers to the C-terminal half of the bent β5 strand, without any domain swapping. Potential functional regions of the Dsy0195 structure were predicted based on conserved sequence analysis. The Dsy0195 structure reported here is the first representative structure from the YabP family.
PF07873; YabP; Dsy0195; Sporulation Protein; Structural Genomics; NMR
The protein Pspto_3016 is a 117-residue member of the protein domain family PF04237 (DUF419), which is to date a functionally uncharacterized family of proteins. In this report, we describe the structure of Pspto_3016 from Pseudomonas syringae solved by both solution NMR and X-ray crystallography at 2.5 Å resolution. In both cases, the structure of Pspto_3016 adopts a “double wing” α/β sandwich fold similar to that of protein YjbR from Escherichia coli and to the C-terminal DNA binding domain of the MotA transcription factor (MotCF) from T4 bacteriophage, along with other uncharacterized proteins. Pspto_3016 was selected by the Protein Structure Initiative of the National Institutes of Health and the Northeast Structural Genomics Consortium (NESG ID PsR293).
Pspto_3016; PF04237; DUF419; structural genomics; 2KFP; 3H9X; double wing; NMR; X-ray crystallography
One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, a so far uncharacterized 73-residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae.
CASP; protein structure; X-ray crystallography; NMR; structure prediction
Recent studies of signal transduction in bacteria have revealed a unique second messenger, bis-(3′-5′)-cyclic dimeric GMP (c-di-GMP), which regulates transitions between motile states and sessile states, such as biofilms. C-di-GMP is synthesized from two GTP molecules by diguanylate cyclases (DGC). The catalytic activity of DGCs depends on a conserved GG(D/E)EF domain, usually part of a larger multi-domain protein organization. The domains other than the GG(D/E)EF domain often control DGC activation. This paper presents the 1.83 Å crystal structure of an isolated catalytically competent GG(D/E)EF domain from the A1U3W3_MARAV protein from Marinobacter aquaeolei. Co-crystallization with GTP resulted in enzymatic synthesis of c-di-GMP. Comparison with previously solved DGC structures shows a similar orientation of c-di-GMP bound to an allosteric regulatory site mediating feedback inhibition of the enzyme. Biosynthesis of c-di-GMP in the crystallization reaction establishes that the enzymatic activity of this DGC domain does not require interaction with regulatory domains.
Diguanylate cyclase; GG(D/E)EF domain; Cyclic di-GMP; X-ray crystal structure; Structural genomics
Human retinoblastoma binding protein 9 (RBBP9) is an interacting partner of the retinoblastoma susceptibility protein (Rb). RBBP9 is a tumor-associated protein required for pancreatic neoplasia, affects cell cycle control, and is involved in the TGF-β signalling pathway. Sequence analysis suggests that RBBP9 belongs to the α/β hydrolase superfamily of enzymes. The serine hydrolase activity of RBBP9 is required for development of pancreatic carcinomas in part by inhibiting TGF-β antiproliferative signaling through suppressing Smad2/3 phosphorylation. The crystal structure of human RBBP9 confirms the α/β hydrolase fold, with a six-stranded parallel β-sheet flanked by α-helices. The structure of RBBP9 resembles that of the YdeN protein from Bacillus subtilis, which is suggested to have carboxylesterase activity. RBBP9 contains a Ser75-His165-Asp138 catalytic triad, situated in a prominent pocket on the surface of the protein. The side chains of the LxCxE sequence motif that is important for interaction with Rb are mostly buried in the structure. Structure-function studies of RBBP9 suggest possible routes for novel cancer drug discovery programs.
α/β hydrolase; pancreatic cancer; protein structure; structural genomics; RBBP9
High-quality NMR structures of the homo-dimeric proteins Bvu3908 (69 residues in the monomeric unit) from Bacteroides vulgatus and Bt2368 (74 residues) from Bacteroides thetaiotaomicron reveal the presence of winged helix-turn-helix (wHTH) motifs mediating tight complex formation. Such homo-dimer formation by winged HTH motifs is otherwise found only in two DNA-binding proteins with known structure: the C-terminal wHTH domain of transcriptional activator FadR from E. coli and protein TubR from B. thuringiensis, which is involved in plasmid DNA segregation. However, the relative orientation of the wHTH motifs is different and residues involved in DNA-binding are not conserved in Bvu3908 and Bt2368. Hence, the proteins of the present study are not very likely to bind DNA, but are likely to exhibit a function that has thus far not been ascribed to homo-dimers formed by winged HTH motifs. The structures of Bvu3908 and Bt2368 are the first atomic resolution structures for PFAM family PF10771, a family of unknown function (DUF2582) currently containing 128 members.
Bvu3908; Bt2368; PF10771; DUF2582; Winged helix-turn-helix; Structural genomics
The protein domain family PF12095 (DUF3571) is a functionally uncharacterized family of small proteins conserved from cyanobacteria to plants that are typically 85 to 95 amino acids in length in cyanobacteria. In this report, we describe the solution NMR structure of the 86-residue protein Asl3597 from Nostoc sp. PCC7120. The structure of Asl3597, which constitutes the first three-dimensional structure from protein family PF12095, has a unique α/β sandwich fold consisting of four anti-parallel β-strands opposite three continuous α-helices. Asl3597 may have a role in the assembly of the hydrophilic subcomplex of the cyanobacterial NAD(P)H complex as suggested by data for the orthologous Chlororespiratory reduction 7 protein from Arabidopsis thaliana.
Asl3597; PF12095; DUF3571; structural genomics; 2KRX; NDH complex; chlororespiratory reduction 7; CRR7
Molecular replacement (MR) is widely used for addressing the phase problem in X-ray crystallography. Historically, crystallographers have had limited success using NMR structures as MR search models. Here we report a comprehensive investigation of the utility of protein NMR ensembles as MR search models, using data for 25 pairs of X-ray and NMR structures solved and refined using modern NMR methods. Starting from NMR ensembles prepared by an improved protocol, FindCore, correct MR solutions were obtained for 22 targets. Based on these solutions, automatic model rebuilding could be done successfully. Rosetta refinement of NMR structures provided MR solutions for another two proteins. We also demonstrate that such properly prepared NMR ensembles and X-ray crystal structures have similar performance when used as MR search models for homologous structures, particularly for targets with sequence identity > 40%.
The protein family (Pfam) PF04536 is a broadly conserved domain family of unknown function (DUF477), with more than 1,350 members in prokaryotic and eukaryotic proteins. High-quality NMR structures of the N-terminal domain comprising residues 41–180 of the 684-residue protein CG2496 from Corynebacterium glutamicum and the N-terminal domain comprising residues 35–182 of the 435-residue protein PG0361 from Porphyromonas gingivalis both exhibit an α/β fold comprised of a four-stranded β-sheet, three α-helices packed against one side of the sheet, and a fourth α-helix attached to the other side. In spite of low sequence similarity (18%) assessed by structure-based sequence alignment, the two structures are globally quite similar. However, moderate structural differences are observed for the relative orientation of two of the four helices. Comparison with known protein structures reveals that the α/β architecture of CG2496(41–180) and PG0361(35–182) has previously not been characterized. Moreover, calculation of surface charge potential and identification of surface clefts indicate that the two domains very likely have different functions.
CG2496; PG0361; CgR26A; PgR37A; PF04536; DUF477; Structural genomics
The protocols currently used for protein structure determination by NMR depend on the determination of a large number of upper distance limits for proton-proton pairs. Typically, this task is performed manually by an experienced researcher rather than automatically by using a specific computer program. To assess whether it is indeed possible to generate in a fully automated manner NMR structures adequate for deposition in the Protein Data Bank, we gathered ten experimental datasets with unassigned NOESY peak lists for various proteins of unknown structure, computed structures for each of them using different, fully automatic programs, and compared the results to each other and to the manually solved reference structures that were not available at the time the data were provided. This constitutes a stringent “blind” assessment similar to the CASP and CAPRI initiatives. This study demonstrates the feasibility of routine, fully automated protein structure determination by NMR.