Single-stranded DNA (ssDNA) binding proteins are important in basal metabolic pathways for gene transcription, recombination, DNA repair and replication in all domains of life. Their main cellular role is to stabilize melted duplex DNA and protect genomic DNA from degradation. We have uncovered the molecular function of the domain of unknown function family DUF2128 (PF09901) as a novel ssDNA binding domain. This bacterial domain strongly associates into a dimer and presents a highly positively charged surface that is consistent with its function in non-specific ssDNA binding. Lactococcus lactis YdbC is a representative of DUF2128. The solution NMR structures of the 20 kDa apo-YdbC dimer and YdbC:dT19G1 complex were determined. The energetics of ssDNA binding to YdbC were characterized by isothermal titration calorimetry. YdbC shows comparable nanomolar affinities for pyrimidine and mixed oligonucleotides, and the affinity is sufficiently strong to disrupt duplex DNA. In addition, YdbC binds with lower affinity to ssRNA, making it a versatile nucleic acid-binding domain. The DUF2128 family is related to the eukaryotic nuclear protein positive cofactor 4 (PC4) family and to the PUR family both by fold similarity and molecular function.
Crystal structures of the putative ATP pyrophosphatase PF0828 reveal a new domain (EGT domain), and a large conformational change upon ATP binding.
ATP pyrophosphatases (ATP PPases) are widely distributed in archaea and eukaryotes. They share an HUP domain at the N-terminus with a conserved PP-motif that interacts with the phosphates of ATP. The PF0828 protein from Pyrococcus furiosus is a member of the ATP PPase superfamily and it also has a 100-residue C-terminal extension that contains a strictly conserved EGG(E/D)xE(T/S) motif, which has been named the EGT-motif. Here, crystal structures of PF0828 alone and in complex with ATP or AMP are reported. The HUP domain contains a central five-stranded β-sheet that is surrounded by four helices, as in other related structures. The C-terminal extension forms a separate domain, named the EGT domain, which makes tight interactions with the HUP domain, bringing the EGT-motif near to the PP-motif and defining the putative active site of PF0828. Both motifs interact with the phosphate groups of ATP. A loop in the HUP domain undergoes a large conformational change to recognize the adenine base of ATP. In solution and in the crystal PF0828 is a dimer formed by the side-by-side arrangement of the HUP domains of the two monomers. The putative active site is located far from the dimer interface.
ATP pyrophosphatases; PF0828; ATP binding; Pyrococcus furiosus
Protein domain family PF09905 (DUF2132) is a family of small domains of unknown function that are conserved in a wide range of bacteria. Here we describe the solution NMR structure of the 80-residue VF0530 protein from Vibrio fischeri, the first structural representative from this protein domain family. We demonstrate that the structure of VF0530 adopts a unique four-helix motif that shows some similarity to the C-terminal double-stranded DNA (dsDNA) binding domain of RecA, as well as other nucleic acid binding domains. Moreover, gel shift binding data indicate a potential dsDNA binding role for VF0530.
DUF2132; PF09905; nucleic acid binding domain; structural genomics
Structural crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy are the predominant techniques for understanding the biological world on a molecular level. Crystallography is constrained by the ability to form a crystal that diffracts well, and NMR is constrained to smaller proteins. Although these are powerful techniques, they leave many soluble, purified protein samples structurally uncharacterized. Small Angle X-ray Scattering (SAXS) is a solution technique that provides data on the size and multiple conformations of a sample, and can be used to reconstruct a low-resolution molecular envelope of a macromolecule. In this study, SAXS has been used in a high-throughput manner on a subset of 28 proteins where structural information is available from crystallographic and/or NMR techniques. These crystallographic and NMR structures were used to validate the accuracy of molecular envelopes reconstructed from SAXS data on a statistical level, to compare and highlight complementary structural information that SAXS provides, and to leverage biological information derived by crystallographers and spectroscopists from their structures. All of the ab initio molecular envelopes calculated from the SAXS data agree well with the available structural information. SAXS is a powerful albeit low-resolution technique that can provide additional structural information in a high-throughput and complementary manner to improve the functional interpretation of high-resolution structures.
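As an illustration of how SAXS data report on particle size, the sketch below applies the standard Guinier approximation, ln I(q) = ln I(0) - (Rg^2/3) q^2, which holds only at low angles (roughly q·Rg < 1.3). It is a minimal example on synthetic, noise-free data, not the analysis pipeline used in the study; the function name `guinier_fit` and all numerical values are assumptions chosen for illustration.

```python
import math

def guinier_fit(q, intensity):
    """Fit a straight line to (q^2, ln I) and return (Rg, I0).

    Guinier approximation: ln I(q) = ln I0 - (Rg^2 / 3) * q^2.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("Guinier slope must be negative; check the q range")
    rg = math.sqrt(-3.0 * slope)   # radius of gyration, same length unit as 1/q
    i0 = math.exp(intercept)       # extrapolated forward scattering
    return rg, i0

# Synthetic example (hypothetical values): Rg = 20 A, I0 = 100 arbitrary units.
true_rg, true_i0 = 20.0, 100.0
q_values = [0.01 + 0.005 * k for k in range(10)]   # 1/A; max q*Rg = 1.1
intensities = [true_i0 * math.exp(-(qv * true_rg) ** 2 / 3.0) for qv in q_values]

rg, i0 = guinier_fit(q_values, intensities)
print(f"Rg = {rg:.2f} A, I0 = {i0:.2f}")
```

With real data the fit would need error weighting and a check that the chosen q range stays within the Guinier regime; curvature at low q is a common sign of aggregation.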
UNC119 is widely expressed among vertebrates and invertebrates. Here we report that UNC119 recognized the acylated N-terminus of the rod photoreceptor transducin α-subunit (Tα) as well as C. elegans G proteins Odr-3 and Gpa-13. The crystal structure of human UNC119 at 1.95 Å resolution revealed an immunoglobulin-like β-sandwich fold. Pulldowns and isothermal titration calorimetry revealed a tight interaction between UNC119 and acylated Gα peptides. Co-crystallization of UNC119 with an acylated Tα N-terminal peptide at 2.0 Å revealed that the lipid chain is buried deeply in UNC119's hydrophobic cavity. UNC119 bound TαGTP, inhibiting its GTPase activity and thereby providing a stable UNC119-TαGTP complex that is capable of diffusing from the inner segment back to the outer segment following light-induced translocation. UNC119 deletion in both mouse and C. elegans leads to G protein mislocalization. These results establish UNC119 as a novel Gα-subunit cofactor that is essential for G-protein trafficking in sensory cilia.
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that packs in all lipocalins with known structure against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631 to which YxeF belongs constituting the first representatives. The results presented here exemplify the impact of structural genomics to enhance our understanding of biology and to generate new biological hypotheses.
A library of quinoxaline derivatives was prepared to target non-structural protein 1 of influenza A (NS1A) as a means to develop anti-influenza drug leads. An in vitro fluorescence polarization assay demonstrated that these compounds disrupted the dsRNA-NS1A interaction to varying extents. Changes of substituent at positions 2, 3 and 6 on the quinoxaline ring led to variance in responses. The most active compounds (35 and 44) had IC50 values in the low micromolar range without exhibiting significant dsRNA intercalation. Compound 44 was able to inhibit influenza A/Udorn/72 virus growth.
Quinoxaline derivatives; NS1A protein; Influenza A virus; Fluorescence polarization
We describe the RPF web server, a quality assessment tool for protein NMR structures. The RPF server measures the ‘goodness-of-fit’ of the 3D structure with NMR chemical shift and unassigned NOESY data, and calculates a discrimination power (DP) score, which estimates the differences between the fits of the query structures and random coil structures to these experimental data. The DP-score is an accuracy predictor of the query structure. The RPF server also maps local structure quality measures onto the 3D structure using an online molecular viewer, and onto the NMR spectra, allowing refinement of the structure and/or NOESY peak list data. The RPF server is available at: http://nmr.cabm.rutgers.edu/rpf.
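The scoring idea can be sketched numerically. The example below assumes that the DP score is a normalized F-measure, DP = (F_query - F_random) / (F_ideal - F_random), where F is the harmonic mean of recall and precision; the abstract itself does not give the formula, so this normalization, the function names and all input values are assumptions for illustration and not the server's implementation.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (0.0 if both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)

def dp_score(f_query, f_random, f_ideal=1.0):
    """Assumed normalization: 0 for a random-coil-like fit, 1 for an ideal fit."""
    if f_ideal == f_random:
        raise ValueError("ideal and random-coil F-measures must differ")
    return (f_query - f_random) / (f_ideal - f_random)

# Hypothetical values, chosen only to show the arithmetic.
recall = 0.95      # fraction of NOESY peaks consistent with the query structure
precision = 0.90   # fraction of short proton-proton distances seen in the data
f_query = f_measure(recall, precision)
f_random = 0.60    # assumed F-measure of random coil structures on the same data

dp = dp_score(f_query, f_random)
print(f"F-measure = {f_query:.3f}, DP = {dp:.3f}")
```

Normalizing against the random coil baseline is what lets scores be compared across proteins whose data sets differ in completeness.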
The Protein Structure Initiative (PSI) was established in 2000 by the National Institute of General Medical Sciences with the long-term goal of providing 3D (three-dimensional) structural information for most proteins in nature. As advances in genomic sequencing, bioinformatics, homology modelling, and methods for rapid determination of 3D structures of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) converged, it was proposed that our understanding of the biology of protein structure and evolution could be greatly enabled by ‘genomic-scale’ protein structure determination. Over the past 12 years, the PSI has evolved from a testing bed for new methods of sample and structure production to a core component of a wide range of biology programs.
GmACP3 from Geobacter metallireducens is a specialized acyl carrier protein (ACP) whose gene, gmet_2339, is located near genes encoding many proteins involved in lipopolysaccharide (LPS) biosynthesis, indicating a likely function for GmACP3 in LPS production. By overexpression in Escherichia coli, about 50% holo-GmACP3 and 50% apo-GmACP3 were obtained. Apo-GmACP3 exhibited slow precipitation and non-monomeric behavior by 15N NMR relaxation measurements. Addition of 4′-phosphopantetheine (4′-PP) via enzymatic conversion by E. coli holo-ACP synthase resulted in stable >95% holo-GmACP3 that was characterized as monomeric by 15N relaxation measurements and had no indication of conformational exchange. We have determined a high-resolution solution structure of holo-GmACP3 by standard NMR methods, including refinement with two sets of NH residual dipolar couplings, allowing for a detailed structural analysis of the interactions between 4′-PP and GmACP3. Whereas the overall four-helix bundle topology is similar to previously solved ACP structures, this structure has unique characteristics, including an ordered 4′-PP conformation that places the thiol at the entrance to a central hydrophobic cavity near a conserved hydrogen-bonded Trp-His pair. These residues are part of a conserved WDSLxH/N motif found in GmACP3 and its orthologs. The helix locations and the large hydrophobic cavity are more similar to medium- and long-chain acyl-ACPs than to other apo- and holo-ACP structures. Taken together, structural characterization along with bioinformatic analysis of nearby genes suggests that GmACP3 is involved in lipid A acylation, possibly by atypical long-chain hydroxy fatty acids, and potentially involved in synthesis of secondary metabolites.
PefI; plasmid-encoded fimbriae regulatory protein; solution NMR structure; winged helix-turn-helix; Northeast Structural Genomics Consortium
Psb28 protein; NMR structure; Photosystem II
The solution structure of the Bacillus subtilis protein YndB has been solved using NMR in order to investigate proposed biological functions. The YndB structure exhibits the helix-grip fold, which consists of a β-sheet with two small and one long α-helix, forming a hydrophobic cavity that preferentially binds lipid-like molecules. Sequence and structure comparisons to proteins from eukaryotes, prokaryotes, and archaea suggest that YndB is very similar to the eukaryote protein Aha1, which binds to the middle domain of Hsp90 and induces ATPase activity. Based on these similarities, YndB has been classified as a member of the Activator of Hsp90 ATPase homolog-like protein (AHSA1) family with a function that appears to be related to stress response. An in silico screen of a compound library of ~18,500 lipids was used to identify classes of lipids that preferentially bind YndB. The in silico screen identified, in order of affinity, the chalcone/hydroxychalcone, flavanone, and flavone/flavonol classes of lipids, which was further verified by 2D 1H-15N HSQC NMR titration experiments with trans-chalcone, flavanone, flavone, and flavonol. All of these compounds are typically found in plants as precursors to various flavonoid antibiotics and signaling molecules. The sum of the data suggests an involvement of YndB with the stress response of B. subtilis to chalcone-like flavonoids released by plants due to a pathogen infection. The observed binding of chalcone-like molecules by YndB is likely related to the symbiotic relationship between B. subtilis and plants.
Crystal structures of biosynthetic arginine decarboxylase (ADC, SpeA) from E. coli and C. jejuni are reported.
Biosynthetic arginine decarboxylase (ADC; also known as SpeA) plays an important role in the biosynthesis of polyamines from arginine in bacteria and plants. SpeA is a pyridoxal-5′-phosphate (PLP)-dependent enzyme and shares weak sequence homology with several other PLP-dependent decarboxylases. Here, the crystal structure of PLP-bound SpeA from Campylobacter jejuni is reported at 3.0 Å resolution and that of Escherichia coli SpeA in complex with a sulfate ion is reported at 3.1 Å resolution. The structure of the SpeA monomer contains two large domains, an N-terminal TIM-barrel domain followed by a β-sandwich domain, as well as two smaller helical domains. The TIM-barrel and β-sandwich domains share structural homology with several other PLP-dependent decarboxylases, even though the sequence conservation among these enzymes is less than 25%. A similar tetramer is observed for both C. jejuni and E. coli SpeA, composed of two dimers of tightly associated monomers. The active site of SpeA is located at the interface of this dimer and is formed by residues from the TIM-barrel domain of one monomer and a highly conserved loop in the β-sandwich domain of the other monomer. The PLP cofactor is recognized by hydrogen-bonding, π-stacking and van der Waals interactions.
biosynthetic arginine decarboxylases; SpeA; Campylobacter jejuni; Escherichia coli
The crystal structure of human PI3K p85β iSH2 domain has been determined to 3.3 Å resolution. Comparison with the published structure of the bovine p85β iSH2 domain bound to the influenza A virus nonstructural protein 1 indicates that little or no structural change occurs upon complex formation. Structural analysis of human and bovine p85β iSH2 domains reveals conformational plasticity in the interhelical turn region of the coiled-coil motif.
Phosphatidylinositol 3-kinase (PI3K) proteins actively trigger signaling pathways leading to cell growth, proliferation and survival. These proteins have multiple isoforms and consist of a catalytic p110 subunit and a regulatory p85 subunit. The iSH2 domain of the p85β isoform has been implicated in the binding of nonstructural protein 1 (NS1) of influenza A viruses. Here, the crystal structure of human p85β iSH2 determined to 3.3 Å resolution is reported. The structure reveals that this domain mainly consists of a coiled-coil motif. Comparison with the published structure of the bovine p85β iSH2 domain bound to the influenza A virus nonstructural protein 1 indicates that little or no structural change occurs upon complex formation. By comparing this human p85β iSH2 structure with the bovine p85β iSH2 domain, which shares 99% sequence identity, and by comparing the multiple conformations observed within the asymmetric unit of the bovine iSH2 structure, it was found that this coiled-coil domain exhibits a certain degree of conformational variability or ‘plasticity’ in the interhelical turn region. It is speculated that this plasticity of p85β iSH2 may play a role in regulating its functional and molecular-recognition properties.
phosphatidylinositol 3-kinases; p85β unit; iSH2 domain; influenza virus NS1 protein binding
The de novo design of protein-protein interfaces is a stringent test of our understanding of the principles underlying protein-protein interactions and would enable new approaches to biological and medical challenges. Here we describe a novel motif-based method to computationally design protein-protein complexes with native-like interface composition and interaction density. Using this method we designed a pair of proteins, Prb and Pdar, that heterodimerize with a Kd of 130 nM, 1,000-fold tighter than any previously designed de novo protein-protein complex. Directed evolution identified two point mutations that improve affinity to 180 pM. Crystal structures of complexes containing designed and evolved proteins reveal binding is entirely through the designed interface, making use of specific designed interactions. Surprisingly, in the evolved complex one of the partners is rotated 180 degrees relative to the design model. This work demonstrates that current understanding of protein-protein interfaces is sufficient to rationally design interfaces de novo, and underscores remaining challenges.
Computational design; directed evolution; protein-protein interface; protein interaction; hotspot; Ankyrin Repeat; protein docking
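The reported affinities can be put on a free-energy scale with the standard relation ΔG° = RT ln(Kd), taking Kd relative to a 1 M standard state. The sketch below is a worked calculation only; the temperature of 298.15 K is an assumption, since the abstract does not state the measurement conditions, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant in M."""
    return GAS_CONSTANT * temperature_k * math.log(kd_molar)

kd_designed = 130e-9   # 130 nM, designed Prb-Pdar complex (from the abstract)
kd_evolved = 180e-12   # 180 pM, after two point mutations (from the abstract)

dg_designed = binding_free_energy(kd_designed)
dg_evolved = binding_free_energy(kd_evolved)
fold_improvement = kd_designed / kd_evolved
ddg = dg_evolved - dg_designed   # negative means tighter binding

print(f"Designed: {dg_designed:.2f} kcal/mol")
print(f"Evolved:  {dg_evolved:.2f} kcal/mol")
print(f"Fold improvement: {fold_improvement:.0f}x, ddG = {ddg:.2f} kcal/mol")
```

Under this assumption the two point mutations correspond to roughly a 700-fold gain in affinity, or a little under 4 kcal/mol of additional binding free energy.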
Human interferon-stimulated gene 15 protein (ISG15), also called ubiquitin cross-reactive protein (UCRP), is the first identified ubiquitin-like protein containing two ubiquitin-like domains fused in tandem. The active form of ISG15 is conjugated to target proteins via the C-terminal glycine residue through an isopeptide bond in a manner similar to ubiquitin. The biological role of ISG15 is strongly associated with the modulation of cell immune function, and there is mounting evidence suggesting that many viral pathogens evade the host innate immune response by interfering with ISG15 conjugation to both host and viral proteins in a variety of ways. Here we report nearly complete backbone 1HN, 15N, 13C′, and 13Cα, as well as side chain 13Cβ, methyl (Ile-δ1, Leu, Val), amide (Asn, Gln), and indole N–H (Trp) NMR resonance assignments for the 157-residue human ISG15 protein. These resonance assignments provide the basis for future structural and functional solution NMR studies of the biologically important human ISG15 protein.
Backbone NMR resonance assignment; Human ISG15; Innate immune response; Ubiquitin-like protein
Using the single-protein-production (SPP) system, a protein of interest can be exclusively produced in high yield from its ACA-less gene in Escherichia coli expressing MazF, an ACA-specific mRNA interferase. It is thus feasible to study a membrane protein by solid-state NMR (SSNMR) directly in natural membrane fractions. In developing isotope-enrichment methods, we observed that 13C was also incorporated into phospholipids, generating spurious signals in SSNMR spectra. Notably, with the SPP system a protein can be produced in total absence of cell growth caused by antibiotics. Here, we demonstrate that cerulenin, an inhibitor of phospholipid biosynthesis, can suppress isotope incorporation in the lipids without affecting membrane protein yield in the SPP system. SSNMR analysis of ATP synthase subunit c, an E. coli inner membrane protein, produced by the SPP method using cerulenin revealed that 13C resonance signals from phospholipid were markedly reduced, while signals for the isotope-enriched protein were clearly present.
cSPP; Membrane protein; Phospholipid; biosynthesis; Cerulenin; SSNMR
There is a general need to develop more powerful and more robust methods for structural characterization of homo-dimers, homo-oligomers and multi-protein complexes using solution-state NMR methods. In recent years, there has been increasing emphasis on integrating distinct and complementary methodologies for structure determination of multi-protein complexes. One approach not yet widely used is to obtain intermediate and long-range distance constraints from paramagnetic relaxation enhancements (PRE) and EPR-based techniques such as Double Electron Electron Resonance (DEER), which, when used together, can provide supplemental distance constraints spanning 10-70 Å. In this communication, we describe integration of PRE and DEER data with conventional solution-state NMR methods for structure determination of Dsy0195, a homo-dimer (62 amino acids per monomer) from Desulfitobacterium hafniense. Our results indicate that a combination of conventional NMR restraints with only one or a few DEER distance constraints and a small number of PRE constraints is sufficient for the automatic NMR-based structure determination program CYANA to build a network of inter-chain NOE constraints that can be used to accurately define both the homo-dimer interface and global homo-dimer structure. The use of DEER distances as a source of supplemental constraints as described here has virtually no upper molecular weight limit, and utilization of the PRE constraints is only limited by the ability to make accurate assignments of the protein amide proton and nitrogen chemical shifts.
The New York Consortium on Membrane Protein Structure (NYCOMPS) was formed to accelerate the acquisition of structural information on membrane proteins by applying a structural genomics approach. NYCOMPS comprises a bioinformatics group, a centralized facility operating a high-throughput cloning and screening pipeline, a set of associated wet labs that perform high-level protein production and structure determination by X-ray crystallography and NMR, and a set of investigators focused on methods development. In the first three years of operation, the NYCOMPS pipeline has so far produced and screened 7,250 expression constructs for 8,045 target proteins. Approximately 600 of these verified targets were scaled up to levels required for structural studies, so far yielding 24 membrane protein crystals. Here we describe the overall structure of NYCOMPS and provide details on the high-throughput pipeline.
Membrane proteins; Structural genomics; High throughput; NMR; X-ray
Lin0431 protein from Listeria innocua (UniProtKB/TrEMBL ID Q92EM7/Q92EM7_LISIN) was selected as a target of the Northeast Structural Genomics Consortium (target ID: LkR112). Here, we present the high-quality NMR solution structure of this protein, which is the first representative of the DUF1312 domain family. Lin0431 protein exhibits a β-sandwich topology. Four anti-parallel β-strands form one face of the sandwich and the other three anti-parallel β-strands together with a short α-helix form the other face of the sandwich. Structure alignment by Dali reveals an unexpected structural similarity with domain II of NusG from Aquifex aeolicus. Analyses of the electrostatic protein surface potential and searches for protein surface cavities reveal conserved basic surface cavities in both Lin0431 and domain II of AaeNusG, suggesting that they may bind negatively charged nucleic acids and/or other binding partners. The high structural similarity and similar surface features, despite the lack of recognizable sequence similarity, between Lin0431 and AaeNusG domain II suggest that domain II of NusG and DUF1312 domains have a homologous relationship and may share similar biochemical functions.
structural genomics; Lin0431; NusG
The AT-rich interactive domain (ARID) of human AT-rich interactive domain-containing protein 3A (ARID3A; residues 218-351, NESG ID HR4394C) has been selected for structural characterization by the Northeast Structural Genomics Consortium as part of our Human Cancer Protein Interaction Network (HCPIN) project. ARID3A belongs to the ARID family of DNA-binding proteins and is known to play important roles in embryonic patterning, cell lineage gene regulation, cell cycle control, chromatin remodeling and transcriptional regulation. The solution NMR structure of the ARID3A ARID domain consists of eight α-helices (α0-α7) and a short β-hairpin. Helices α0 and α1 form a V shape, helices α2-α4 and α5-α7 form two U shapes, and the V and the two U shapes pack orthogonally to each other. The NMR structure of the ARID domain of human ARID3A reported here provides a structural basis for elucidating the regulatory mechanisms of ARID3A function, and the molecular mechanism of ARID3A interactions with DNA. It also has potential value in future drug discovery and design.
ARID3A; Bright; Dril1; E2FBP1; Transcription factor; DNA-binding protein
The biochemical and physical factors controlling protein expression level and solubility in vivo remain incompletely characterized. To gain insight into the primary sequence features influencing these outcomes, we performed statistical analyses of results from the high-throughput protein-production pipeline of the Northeast Structural Genomics Consortium. Proteins expressed in E. coli and consistently purified were scored independently for expression and solubility levels. These parameters nonetheless show a very strong positive correlation. We used logistic regressions to determine whether they are systematically influenced by fractional amino acid composition or several bulk sequence parameters including hydrophobicity, sidechain entropy, electrostatic charge, and predicted backbone disorder. Decreasing hydrophobicity correlates with higher expression and solubility levels, but this correlation apparently derives solely from the beneficial effect of three charged amino acids, at least for bacterial proteins. In fact, the three most hydrophobic residues showed very different correlations with solubility level. Leu showed the strongest negative correlation among amino acids, while Ile showed a slightly positive correlation in most data segments. Several other amino acids also had unexpected effects. Notably, Arg correlated with decreased expression and, most surprisingly, solubility of bacterial proteins, an effect only partially attributable to rare codons. However, rare codons did significantly reduce expression despite use of a codon-enhanced strain. Additional analyses suggest that positively but not negatively charged amino acids may reduce translation efficiency in E. coli irrespective of codon usage. While some observed effects may reflect indirect evolutionary correlations, others may reflect basic physicochemical phenomena. 
We used these results to construct and validate predictors of expression and solubility levels and overall protein usability, and we propose new strategies to be explored for engineering improved protein expression and solubility.
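The kind of analysis described above, relating a sequence-composition feature to a binary outcome by logistic regression, can be sketched as follows. This is a toy, single-variable example fitted by plain gradient descent; the sequences, labels, function names and hyperparameters are all invented for illustration and do not reproduce the consortium data or the authors' statistical procedure.

```python
import math

def fraction_of(sequence, residues):
    """Fraction of positions in `sequence` occupied by any of `residues`."""
    return sum(sequence.count(r) for r in residues) / len(sequence)

def fit_logistic(xs, ys, learning_rate=0.5, steps=5000):
    """Fit P(y=1) = sigmoid(w*x + b) by batch gradient descent; returns (w, b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Hypothetical toy data: (sequence, 1 if scored soluble else 0).
toy_data = [
    ("MKEEDLKKAEELLKK", 1),
    ("MLLLAVLLGLLAVLL", 0),
    ("MDEKKLAEEDKKLEA", 1),
    ("MLLIVLLAGLLVLAL", 0),
    ("MKELLAVLDEKLLAK", 1),
    ("MLLKEALLVLLGALL", 0),
]

leu_fraction = [fraction_of(seq, "L") for seq, _ in toy_data]
labels = [label for _, label in toy_data]

w, b = fit_logistic(leu_fraction, labels)
print(f"coefficient for Leu fraction: {w:.2f}, intercept: {b:.2f}")
```

A negative coefficient here means that, in this invented data set, a higher Leu fraction lowers the predicted probability of solubility. A real analysis would use many proteins, several features at once, and significance testing of each coefficient.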
Abl2; ARG; Abl; Abelson Tyrosine Kinase; Abl Related Gene; helices bundle
VPA0419; yiiS; PFAM 04175; structural genomics; GFT NMR