Protein design tests our understanding of protein stability and structure. Successful design methods should allow the exploration of sequence space not found in nature. However, when redesigning naturally occurring protein structures, most fixed backbone design algorithms return amino acid sequences that share strong sequence identity with wild-type sequences, especially in the protein core. This behavior places a restriction on the functional space that can be explored and is not consistent with observations from nature, where sequences of low identity have similar structures. Here, we allow backbone flexibility during design to mutate every position in the core (38 residues) of a four-helix bundle protein. Only small perturbations to the backbone, 1-2 Å, were needed to entirely mutate the core. The redesigned protein, DRNN, is exceptionally stable (melting point > 140 °C). NMR and X-ray crystal structures show that the side chains and backbone were accurately modeled (all-atom RMSD = 1.3 Å).
Computational Protein Design; de novo Protein Design; Flexible Backbone Protein Design
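The all-atom RMSD quoted in the abstract above compares model coordinates with experimental coordinates. The following is a minimal sketch of that calculation, assuming the two structures are already superimposed and their atoms are listed in the same order. The function name `rmsd` and the toy coordinates are illustrative and do not come from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Assumption: both lists hold (x, y, z) tuples in Angstrom, in the same
    atom order, and the structures were superimposed beforehand. No fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example (hypothetical coordinates): every atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experiment = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, experiment))
```

In practice an optimal superposition (for example, a least-squares fit) would precede this step; that part is deliberately left out of the sketch.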
High-quality NMR structures of the homo-dimeric proteins Bvu3908 (69 residues per monomer) from Bacteroides vulgatus and Bt2368 (74 residues) from Bacteroides thetaiotaomicron reveal the presence of winged helix-turn-helix (wHTH) motifs mediating tight complex formation. Such homo-dimer formation by winged HTH motifs is otherwise found only in two DNA-binding proteins with known structure: the C-terminal wHTH domain of transcriptional activator FadR from E. coli and protein TubR from B. thuringiensis, which is involved in plasmid DNA segregation. However, the relative orientation of the wHTH motifs is different and residues involved in DNA-binding are not conserved in Bvu3908 and Bt2368. Hence, the proteins of the present study are unlikely to bind DNA, but are likely to exhibit a function that has thus far not been ascribed to homo-dimers formed by winged HTH motifs. The structures of Bvu3908 and Bt2368 are the first atomic resolution structures for PFAM family PF10771, a family of unknown function (DUF2582) currently containing 128 members.
Bvu3908; Bt2368; PF10771; DUF2582; Winged helix-turn-helix; Structural genomics
De novo proteins provide a unique opportunity for investigating the structure-function relationships of metalloproteins in a minimal, well-defined, and controlled scaffold. Herein, we describe the rational programming of function in a de novo designed di-iron carboxylate protein from the due ferri family. Originally created to catalyze O2-dependent, two-electron oxidation of hydroquinones, the protein was reprogrammed to catalyze the selective N-hydroxylation of arylamines by remodeling the substrate access cavity and introducing a critical third His ligand to the metal binding cavity. Additional second- and third-shell modifications were required to stabilize the His ligand in the core of the protein. These changes resulted in at least a 10⁶-fold increase in the relative rates of the two reactions. This result highlights the potential for using de novo proteins as scaffolds for future investigations of geometric and electronic factors that influence the catalytic tuning of di-iron active sites.
de novo design; metalloproteins; di-iron proteins; four-helix bundle; oxidase
The protein family (Pfam) PF04536 is a broadly conserved domain family of unknown function (DUF477), with more than 1,350 members in prokaryotic and eukaryotic proteins. High-quality NMR structures of the N-terminal domain comprising residues 41–180 of the 684-residue protein CG2496 from Corynebacterium glutamicum and the N-terminal domain comprising residues 35–182 of the 435-residue protein PG0361 from Porphyromonas gingivalis both exhibit an α/β fold comprised of a four-stranded β-sheet, three α-helices packed against one side of the sheet, and a fourth α-helix attached to the other side. In spite of low sequence similarity (18%) assessed by structure-based sequence alignment, the two structures are globally quite similar. However, moderate structural differences are observed for the relative orientation of two of the four helices. Comparison with known protein structures reveals that the α/β architecture of CG2496(41–180) and PG0361(35–182) has previously not been characterized. Moreover, calculation of surface charge potential and identification of surface clefts indicate that the two domains very likely have different functions.
CG2496; PG0361; CgR26A; PgR37A; PF04536; DUF477; Structural genomics
The protocols currently used for protein structure determination by NMR depend on the determination of a large number of upper distance limits for proton-proton pairs. Typically, this task is performed manually by an experienced researcher rather than automatically by using a specific computer program. To assess whether it is indeed possible to generate in a fully automated manner NMR structures adequate for deposition in the Protein Data Bank, we gathered ten experimental datasets with unassigned NOESY peak lists for various proteins of unknown structure, computed structures for each of them using different, fully automatic programs, and compared the results to each other and to the manually solved reference structures that were not available at the time the data were provided. This constitutes a stringent “blind” assessment similar to the CASP and CAPRI initiatives. This study demonstrates the feasibility of routine, fully automated protein structure determination by NMR.
Computationally designing protein-protein interactions with high affinity and desired orientation is a challenging task. Incorporating metal-binding sites at the target interface may be one approach for increasing affinity and specifying the binding mode, thereby improving robustness of designed interactions for use as tools in basic research as well as in applications from biotechnology to medicine. Here we describe a Rosetta-based approach for the rational design of a protein monomer to form a zinc-mediated, symmetric homodimer. Our metal interface design, named MID1 (NESG target ID OR37), forms a tight dimer in the presence of zinc (MID1-zinc) with a dissociation constant <30 nM. Without zinc the dissociation constant is 4 μM. The crystal structure of MID1-zinc shows good overall agreement with the computational model, but only three out of four designed histidines coordinate zinc. However, a histidine-to-glutamate point mutation resulted in four-coordination of zinc, and the resulting metal binding site and dimer orientation closely match the computational model (Cα RMSD = 1.4 Å).
computational protein interface design; protein-protein interaction; metal; zinc; cobalt; homodimer; de novo
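The two dissociation constants reported for MID1 (<30 nM with zinc, 4 μM without) can be converted into a rough estimate of the free energy that zinc contributes to dimerization, using ΔG = RT ln(Kd) relative to a 1 M standard state. The sketch below is a worked arithmetic example under stated assumptions: the temperature of 298.15 K is assumed (the abstract does not give one), and the function names are illustrative. Because the zinc-bound Kd is only an upper limit, the result is a lower bound on the stabilization.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant.

    Assumption: 1 M standard state; the default temperature of 298.15 K is
    a placeholder and is not taken from the abstract.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def stabilization(kd_with, kd_without, temperature_k=298.15):
    """Change in binding free energy (kcal/mol); negative means tighter binding."""
    return (binding_free_energy(kd_with, temperature_k)
            - binding_free_energy(kd_without, temperature_k))


# Kd values from the abstract: 30 nM (upper limit, with zinc) and 4 uM (without zinc).
ddg = stabilization(30e-9, 4e-6)
print(f"ddG <= {ddg:.2f} kcal/mol")
```

Under these assumptions zinc binding accounts for at least about 2.9 kcal/mol of additional dimer stabilization.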
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that packs in all lipocalins with known structure against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631 to which YxeF belongs constituting the first representatives. The results presented here exemplify how structural genomics can enhance our understanding of biology and generate new biological hypotheses.
We show that 1H NMR-based metabonomics of serum allows the diagnosis of early stage I/II epithelial ovarian cancer (EOC) required for successful treatment. Because patient specimens are highly precious, we conducted an exploratory study using a micro-flow probe requiring only 20 μL serum. By use of logistic regression on principal components (PCs) of the NMR profiles, we built a 4-variable model for early stage EOC prediction (training set: 69 EOC specimens, 84 healthy controls; test set: 40 EOC, 44 controls) with operating characteristics estimated for the test set at 80% specificity [95% confidence interval (CI): 65% to 90%], 63% sensitivity (95% CI: 46% to 77%), and an area under the Receiver Operator Characteristic Curve (AUC) of 0.796. Independent validation (50 EOC, 50 controls) of the model yielded 95% specificity (95% CI: 86% to 99.5%), 68% sensitivity (95% CI: 53% to 80%) and an AUC of 0.949. A test on cancer type specificity showed that women with renal cell carcinoma were not incorrectly diagnosed with EOC, indicating that metabonomics bears significant potential for cancer type-specific diagnosis. Our model can potentially be applied for women at high risk for EOC, and our study promises to contribute to developing a screening protocol for the general population.
Ovarian Cancer; Early Stage Detection; Metabonomics; Cancer-type Specificity; NMR; Micro-flow Probe; Principal Component Analysis; Predictive Statistical Model
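The sensitivity and specificity values in the abstract above are binomial proportions, and their 95% confidence intervals can be approximated from the group sizes. The sketch below computes an exact (Clopper-Pearson) interval, which is one common choice; the abstract does not state which method the authors used. The counts used in the example (25 of 40 EOC specimens for 63% sensitivity) are back-calculated assumptions, not values given in the source, and the function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iterations=60):
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, total, alpha=0.05):
    """Exact two-sided binomial confidence interval as (lower, upper)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    # Lower bound: the p at which P(X >= successes) equals alpha / 2.
    if successes == 0:
        lower = 0.0
    else:
        lower = _solve_increasing(
            lambda p: 1 - binom_cdf(successes - 1, total, p), alpha / 2)
    # Upper bound: the p at which P(X <= successes) equals alpha / 2.
    if successes == total:
        upper = 1.0
    else:
        upper = _solve_increasing(
            lambda p: 1 - binom_cdf(successes, total, p), 1 - alpha / 2)
    return lower, upper


# Assumed counts: 25 of 40 test-set EOC specimens classified correctly (about 63%).
low, high = clopper_pearson(25, 40)
print(f"sensitivity 95% CI: {low:.2f} to {high:.2f}")
```

Other interval methods (Wilson, normal approximation) give slightly different bounds for group sizes this small, so agreement with the published intervals should not be read as confirming the method.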
Recording of four-dimensional (4D) spectra for proteins in the solid state has opened new avenues to obtain virtually complete resonance assignments and three-dimensional (3D) structures of proteins. As in solution state NMR, the sampling of three indirect dimensions leads per se to long minimal measurement times. Furthermore, artifact suppression in solid state NMR relies primarily on radio-frequency pulse phase cycling. For an n-step phase cycle, the minimal measurement times of both 3D and 4D spectra are increased n times. To tackle the associated ‘sampling problem’ and to avoid sampling-limited data acquisition, solid state G-Matrix Fourier Transform (SS GFT) projection NMR is introduced to rapidly acquire 3D and 4D spectral information. Specifically, (4,3)D (HA)CANCOCX and (3,2)D (HACA)NCOCX were implemented and recorded for the 6 kDa protein GB1 within about 10% of the time required for acquiring the conventional congeners with the same maximal evolution times and spectral widths in the indirect dimensions. Spectral analysis was complemented by comparative analysis of expected spectral congestion in conventional and GFT NMR experiments, demonstrating that high spectral resolution of the GFT NMR experiments enables one to efficiently obtain nearly complete resonance assignments even for large proteins.
Magic-angle spinning; Chemical shift assignments; GB1; Correlation spectroscopy
The New York Consortium on Membrane Protein Structure (NYCOMPS) was formed to accelerate the acquisition of structural information on membrane proteins by applying a structural genomics approach. NYCOMPS comprises a bioinformatics group, a centralized facility operating a high-throughput cloning and screening pipeline, a set of associated wet labs that perform high-level protein production and structure determination by X-ray crystallography and NMR, and a set of investigators focused on methods development. In the first three years of operation, the NYCOMPS pipeline has so far produced and screened 7,250 expression constructs for 8,045 target proteins. Approximately 600 of these verified targets were scaled up to levels required for structural studies, so far yielding 24 membrane protein crystals. Here we describe the overall structure of NYCOMPS and provide details on the high-throughput pipeline.
Membrane proteins; Structural genomics; High throughput; NMR; X-ray
We describe a computational protocol, called DDMI, for redesigning scaffold proteins to bind to a specified region on a target protein. The DDMI protocol is implemented within the Rosetta molecular modeling program and uses rigid-body docking, sequence design, and gradient-based minimization of backbone and side chain torsion angles to design low energy interfaces between the scaffold and target protein. Iterative rounds of sequence design and conformational optimization were needed to produce models that have calculated binding energies that are similar to binding energies calculated for native complexes. We also show that additional conformation sampling with molecular dynamics can be iterated with sequence design to further lower the computed energy of the designed complexes. To experimentally test the DDMI protocol we redesigned the human hyperplastic discs protein to bind to the kinase domain of p21-activated kinase 1 (PAK1). Six designs were experimentally characterized. Two of the designs aggregated and were not characterized further. Of the remaining four designs, three bound to PAK1 with affinities tighter than 350 μM. The tightest binding design, named Spider Roll, bound with an affinity of 100 μM. NMR-based structure prediction of Spider Roll based on backbone and 13Cβ chemical shifts using the program CS-ROSETTA indicated that the architecture of human hyperplastic discs protein is preserved. Mutagenesis studies confirmed that Spider Roll binds the target patch on PAK1. Additionally, Spider Roll binds to full length PAK1 in its activated state, but does not bind PAK1 when it forms an auto-inhibited conformation that blocks the Spider Roll target site. Subsequent NMR characterization of the binding of Spider Roll to PAK1 revealed a comparably small binding 'on-rate' constant (<< 10⁵ M⁻¹ s⁻¹).
The ability to rationally design the site of novel protein-protein interactions is an important step towards creating new proteins that are useful as therapeutics or molecular probes.
Computational protein design; protein-protein interactions; protein docking; Rosetta molecular modeling program; NMR; CS-ROSETTA
VPA0419; yiiS; PFAM 04175; structural genomics; GFT NMR
analytical methods; clean absorption mode; GFT projection NMR; NMR spectroscopy; resolution enhancement
Structural genomics; GFT NMR; flagella; YvyC; chaperone
Clean absorption mode NMR data acquisition is presented based on mirrored time domain sampling and widely used time-proportional phase incrementation (TPPI) for quadrature detection. The resulting NMR spectra are devoid of dispersive frequency domain peak components. Such components complicate peak identification and shift peak maxima, and thus impede automated spectral analysis. The new approach is also of unique value for obtaining clean absorption mode reduced-dimensionality projection NMR spectra, which can rapidly provide high-dimensional spectral information for high-throughput NMR structure determination.
Clean Absorption Mode NMR; TPPI; GFT projection NMR; RD projection NMR
The NMR structure of the 21 kDa lipocalin FluA, which was previously obtained by combinatorial design, elucidates a reshaped binding site specific for the dye fluorescein resulting from 21 side chain replacements with respect to the parental lipocalin, the naturally occurring bilin-binding protein (BBP). As expected, FluA exhibits the lipocalin fold of BBP, comprising eight antiparallel β-strands forming a β-barrel with an α-helix attached to its side. Comparison of the NMR structure of the free FluA with the X-ray structures of BBP•biliverdin IXγ and FluA•fluorescein complexes revealed significant conformational changes in the binding pocket, which is formed by four loops at the open end of the β-barrel as well as adjoining β-strand segments. An ‘induced fit’ became apparent for the side-chain conformations of Arg 88 and Phe 99, which contact the bound fluorescein in the complex and undergo concerted rearrangement upon ligand binding. Moreover, slower internal motional modes of the polypeptide backbone were identified by measuring transverse 15N backbone spin relaxation times in the rotating frame for the free FluA and also the FluA•fluorescein complex. A reduction of such motions was detected upon complex formation, indicating rigidification of the protein structure and loss of conformational entropy. This hypothesis was confirmed by isothermal titration calorimetry, showing that ligand binding is enthalpy driven, thus overcompensating negative entropy associated with both ligand binding per se and rigidification of the protein. Our investigation of the solution structure and dynamics as well as thermodynamics of lipocalin-ligand interaction does not only provide insight into the general mechanism of small molecule accommodation in the deep and narrow cavity of this abundant class of proteins but will also support the future design of corresponding binding proteins with novel specificities, so-called “anticalins”.
anticalin; bilin-binding protein; ligand binding; lipocalin; protein engineering
Conventional protein structure determination from nuclear magnetic resonance data relies heavily on side-chain proton-proton distances. The necessary side-chain resonance assignment, however, is labor intensive and prone to error. Here we show that structures can be accurately determined without NMR information on the side chains for proteins up to 25 kDa by incorporating backbone chemical shifts, residual dipolar couplings, and amide proton distances into the Rosetta protein structure modelling methodology. These data, which are too sparse for conventional methods, serve only to guide conformational search towards the lowest energy conformations in the folding landscape; the details of the computed models are determined by the physical chemistry implicit in the Rosetta all atom energy function. The new method is not hindered by the deuteration required to suppress nuclear relaxation processes for proteins greater than 15 kDa, and should enable routine NMR structure determination for larger proteins.
For cell regulation, E2-like ubiquitin-fold modifier conjugating enzyme 1 (Ufc1) is involved in the transfer of ubiquitin-fold modifier 1 (Ufm1), a ubiquitin-like protein which is activated by E1-like enzyme Uba5, to various target proteins. Thereby, Ufc1 participates in the very recently discovered Ufm1-Uba5-Ufc1 ubiquitination pathway which is found in metazoan organisms. The structure of human Ufc1 was solved by using both NMR spectroscopy and X-ray crystallography. The complementary insights obtained with the two techniques provided a unique basis for understanding the function of Ufc1 at atomic resolution. The Ufc1 structure consists of the catalytic core domain conserved in all E2-like enzymes and an additional N-terminal helix. The active site Cys116, which forms a thio-ester bond with Ufm1, is located in a flexible loop that is highly solvent accessible. Based on the Ufc1 and Ufm1 NMR structures, a model could be derived for the Ufc1-Ufm1 complex in which the C-terminal Gly83 of Ufm1 may well form the expected thio-ester with Cys116, suggesting that Ufm1-Ufc1 functions as described for other E1-E2-E3 machineries. α-helix 1 of Ufc1 adopts different conformations in the crystal and in solution, suggesting that this helix plays a key role in mediating specificity.
Ufc1; Ufm1; Ubiquitin; E2; Ubiquitin Conjugating Enzyme
The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam PF07449, which currently comprises 50 members, and belongs itself to the ‘thioredoxin-like clan’. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.
Chaperones; GFT NMR; HYAE_ECOLI; Q8ZP25_SALTY; Structural genomics; Thioredoxin
The varicella-zoster virus major transactivator, IE62, contains a potent N-terminal acidic transcriptional activation domain (TAD). Our experiments revealed that the minimal IE62 TAD encompasses amino acids (aa) 19 to 67. We showed that the minimal TAD interacts with the human Mediator complex. Site-specific mutations revealed residues throughout the minimal TAD that are important for both activation and Mediator interaction. The TAD interacts directly with aa 402 to 590 of the MED25 subunit, and site-specific TAD mutations abolished this interaction. Two-dimensional nuclear magnetic resonance spectroscopy revealed that the TAD is intrinsically unstructured. Our studies suggest that transactivation may involve the TAD adopting a defined structure upon binding MED25.
This Perspective, arising from a workshop held in July 2008 in Buffalo NY, provides an overview of the role NMR has played in the United States Protein Structure Initiative (PSI), and a vision of how NMR will contribute to the forthcoming PSI-Biology program. NMR has contributed in key ways to structure production by the PSI, and new methods have been developed which are impacting the broader protein NMR community.
Future of structural genomics; Functional genomics; NMR; Crystallography; NMR methods; Protein Structure Initiative (PSI)
The so far largely uncharacterized central carbon metabolism of the yeast Pichia stipitis was explored in batch and glucose-limited chemostat cultures using metabolic-flux ratio analysis by nuclear magnetic resonance. The concomitantly characterized network of active metabolic pathways was compared to those identified in Saccharomyces cerevisiae, which led to the following conclusions. (i) There is a remarkably low use of the non-oxidative pentose phosphate (PP) pathway for glucose catabolism in S. cerevisiae when compared to P. stipitis batch cultures. (ii) Metabolism of P. stipitis batch cultures is fully respirative, which contrasts with the predominantly respiro-fermentative metabolic state of S. cerevisiae. (iii) Glucose catabolism in chemostat cultures of both yeasts is primarily oxidative. (iv) In both yeasts there is significant in vivo malic enzyme activity during growth on glucose. (v) The amino acid biosynthesis pathways are identical in both yeasts. The present investigation thus demonstrates the power of metabolic-flux ratio analysis for comparative profiling of central carbon metabolism in lower eukaryotes. Although not used for glucose catabolism in batch culture, we demonstrate that the PP pathway in S. cerevisiae has a generally high catabolic capacity by overexpressing the Escherichia coli transhydrogenase UdhA in phosphoglucose isomerase-deficient S. cerevisiae.
Metabolic responses to cofeeding of different carbon substrates in carbon-limited chemostat cultures were investigated with riboflavin-producing Bacillus subtilis. Relative to the carbon content (or energy content) of the substrates, the biomass yield was lower in all cofeeding experiments than with glucose alone. The riboflavin yield, in contrast, was significantly increased in the acetoin- and gluconate-cofed cultures. In these two scenarios, unusually high intracellular ATP-to-ADP ratios correlated with improved riboflavin yields. Nuclear magnetic resonance spectra recorded with amino acids obtained from biosynthetically directed fractional 13C labeling experiments were used in an isotope isomer balancing framework to estimate intracellular carbon fluxes. The glycolysis-to-pentose phosphate (PP) pathway split ratio was almost invariant at about 80% in all experiments, a result that was particularly surprising for the cosubstrate gluconate, which feeds directly into the PP pathway. The in vivo activities of the tricarboxylic acid cycle, in contrast, varied more than twofold. The malic enzyme was active with acetate, gluconate, or acetoin cofeeding but not with citrate cofeeding or with glucose alone. The in vivo activity of the gluconeogenic phosphoenolpyruvate carboxykinase was found to be relatively high in all experiments, with the sole exception of the gluconate-cofed culture.
The intracellular carbon flux distribution in wild-type and pyruvate kinase-deficient Escherichia coli was estimated using biosynthetically directed fractional 13C labeling experiments with [U-13C6]glucose in glucose- or ammonia-limited chemostats, two-dimensional nuclear magnetic resonance (NMR) spectroscopy of cellular amino acids, and a comprehensive isotopomer model. The general response to disruption of both pyruvate kinase isoenzymes in E. coli was a local flux rerouting via the combined reactions of phosphoenolpyruvate (PEP) carboxylase and malic enzyme. Responses in the pentose phosphate pathway and the tricarboxylic acid cycle were strongly dependent on the environmental conditions. In addition, high futile cycling activity via the gluconeogenic PEP carboxykinase was identified at a low dilution rate in glucose-limited chemostat culture of pyruvate kinase-deficient E. coli, with a turnover that is comparable to the specific glucose uptake rate. Furthermore, flux analysis in mutant cultures indicates that glucose uptake in E. coli is not catalyzed exclusively by the phosphotransferase system in glucose-limited cultures at a low dilution rate. Reliability of the flux estimates thus obtained was verified by statistical error analysis and by comparison to intracellular carbon flux ratios that were independently calculated from the same NMR data by metabolic flux ratio analysis.
Escherichia coli MG1655 cells expressing Vitreoscilla hemoglobin (VHb), Alcaligenes eutrophus flavohemoprotein (FHP), the N-terminal hemoglobin domain of FHP (FHPg), and a fusion protein which comprises VHb and the A. eutrophus C-terminal reductase domain (VHb-Red) were grown in a microaerobic bioreactor to study the effects of low oxygen concentrations on the central carbon metabolism, using fractional 13C-labeling of the proteinogenic amino acids and two-dimensional [13C, 1H]-correlation nuclear magnetic resonance (NMR) spectroscopy. The NMR data revealed differences in the intracellular carbon fluxes between E. coli cells expressing either VHb or VHb-Red and cells expressing A. eutrophus FHP or the truncated heme domain (FHPg). E. coli MG1655 cells expressing either VHb or VHb-Red were found to function with a branched tricarboxylic acid (TCA) cycle. Furthermore, cellular demands for ATP and reduction equivalents in VHb- and VHb-Red-expressing cells were met by an increased flux through glycolysis. In contrast, in E. coli cells expressing A. eutrophus hemeproteins, the TCA cycle is running cyclically, indicating a shift towards a more aerobic regulation. Consistently, E. coli cells displaying FHP and FHPg activity showed lower production of the typical anaerobic by-products formate, acetate, and d-lactate. The implications of these observations for biotechnological applications are discussed.