Ionic liquid pretreatment of biomass has been shown to greatly reduce the recalcitrance of lignocellulosic biomass, resulting in improved sugar yields after enzymatic saccharification. However, even under these improved saccharification conditions the cost of enzymes still represents a significant proportion of the total cost of producing sugars and ultimately fuels from lignocellulosic biomass. Much of the high cost of enzymes is due to the low catalytic efficiency and stability of lignocellulolytic enzymes, especially cellulases, under conditions that include high temperatures and the presence of residual pretreatment chemicals, such as acids, organic solvents, bases, or ionic liquids. Improving the efficiency of the saccharification process on ionic liquid pretreated biomass will facilitate reduced enzyme loading and cost. Thermophilic cellulases have been shown to be stable and active in ionic liquids but their activity is typically at lower levels. Cel5A_Tma, a thermophilic endoglucanase from Thermotoga maritima, is highly active on cellulosic substrates and is stable in ionic liquid environments. Here, our motivation was to engineer mutants of Cel5A_Tma with higher activity on 1-ethyl-3-methylimidazolium acetate ([C2mim][OAc]) pretreated biomass. We developed a robotic platform to screen a random mutagenesis library of Cel5A_Tma. Twelve mutants with 25–42% improvement in specific activity on carboxymethyl cellulose and up to 30% improvement on ionic-liquid pretreated switchgrass were successfully isolated and characterized from a library of twenty thousand variants. Interestingly, most of the mutations in the improved variants are located distally to the active site on the protein surface and are not directly involved with substrate binding.
A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence.
A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.
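The trimming step described above can be pictured with a minimal Python sketch. It assumes that each residue already carries a model–map correlation value; the function name, the 0.5 cutoff and the toy data are hypothetical and are not taken from Phenix. The subsequent assignment of segments to the sequence is not shown.

```python
def trim_and_segment(residue_cc, min_cc=0.5):
    """Drop residues with poor map correlation and return the surviving segments.

    residue_cc: list of (residue_number, correlation) pairs in chain order.
    min_cc: hypothetical cutoff below which a residue is removed.
    Returns a list of segments, each a list of consecutive residue numbers.
    """
    segments = []
    current = []
    for resno, cc in residue_cc:
        if cc >= min_cc:
            # A gap in numbering also ends the current segment.
            if current and resno != current[-1] + 1:
                segments.append(current)
                current = []
            current.append(resno)
        elif current:
            # A poorly fitting residue is trimmed and closes the segment.
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Toy chain: residue 3 fits the map poorly and residue 6 is absent.
example = [(1, 0.8), (2, 0.7), (3, 0.2), (4, 0.6), (5, 0.9), (7, 0.75)]
segments = trim_and_segment(example)
print(segments)
```

In this toy case the chain is split at the trimmed residue and at the numbering gap, leaving three segments that would each need to be placed on the sequence.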
morphing; model building; sequence assignment; model–map correlation; loop-building
The functionality of the molecular-replacement pipeline phaser.MRage is introduced and illustrated with examples.
Phaser.MRage is a molecular-replacement automation framework that implements a full model-generation workflow and provides several layers of model exploration to the user. It is designed to handle a large number of models and can distribute calculations efficiently onto parallel hardware. In addition, phaser.MRage can identify correct solutions and use this information to accelerate the search. Firstly, it can quickly score all alternative models of a component once a correct solution has been found. Secondly, it can perform extensive analysis of identified solutions to find protein assemblies and can employ assembled models for subsequent searches. Thirdly, it is able to use a priori assembly information (derived from, for example, homologues) to speculatively place and score molecules, thereby customizing the search procedure to a certain class of protein molecule (for example, antibodies) and incorporating additional biological information into molecular replacement.
molecular replacement; pipeline; automation; phaser.MRage
In an effort to better understand the control of the formation of branched fatty acids in Micrococcus luteus, the structure of β-ketoacyl-ACP synthase III, which catalyzes the initial step of fatty-acid biosynthesis, has been determined.
Micrococcus luteus is a Gram-positive bacterium that produces iso- and anteiso-branched alkenes by the head-to-head condensation of fatty-acid thioesters [coenzyme A (CoA) or acyl carrier protein (ACP)]; this activity is of interest for the production of advanced biofuels. In an effort to better understand the control of the formation of branched fatty acids in M. luteus, the structure of FabH (MlFabH) was determined. FabH, or β-ketoacyl-ACP synthase III, catalyzes the initial step of fatty-acid biosynthesis: the condensation of malonyl-ACP with an acyl-CoA. Analysis of the MlFabH structure provides insights into its substrate selectivity with regard to length and branching of the acyl-CoA. The most structurally divergent region of FabH is the L9 loop region located at the dimer interface, which is involved in the formation of the acyl-binding channel and thus limits the substrate-channel size. The residue Phe336, which is positioned near the catalytic triad, appears to play a major role in branched-substrate selectivity. In addition to structural studies of MlFabH, transcriptional studies of M. luteus were also performed, focusing on the increase in the ratio of anteiso:iso-branched alkenes that was observed during the transition from early to late stationary phase. Gene-expression microarray analysis identified two genes involved in leucine and isoleucine metabolism that may explain this transition.
biofuels; β-ketoacyl-ACP synthase III; iso- and anteiso-branched alkenes; microarray
Cellulases are of great interest for application in biomass degradation, yet the molecular details of the mode of action of glycoside hydrolases during degradation of insoluble cellulose remain elusive. To further improve these enzymes for application under industrial conditions, it is critical to gain a better understanding of not only the details of the degradation process, but also the function of accessory modules.
We fused a carbohydrate-binding module (CBM) from family 2a to two thermophilic endoglucanases. We then applied neutron reflectometry to determine the mechanism of the resulting enhancements.
Catalytic activity of the chimeric enzymes was enhanced up to threefold on insoluble cellulose substrates as compared to wild type. Importantly, we demonstrate that the wild-type enzymes affect primarily the surface properties of an amorphous cellulose film, while the chimeras containing a CBM alter the bulk properties of the amorphous film.
Our findings suggest that the CBM improves the efficiency of these cellulases by enabling digestion within the bulk of the film.
Cellulases; Endoglucanases; Carbohydrate-Binding modules; Cellulose model films; Neutron reflectometry
The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB (http://sbkb.org), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a Functional Sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.
Database; Protein; Protein Production; Structural Biology; Structural Genomics; Technology
Lignin is often overlooked in the valorization of lignocellulosic biomass, but lignin-based materials and chemicals represent potential value-added products that could significantly improve the economics of a biorefinery. Fluctuating crude oil prices and changing fuel specifications are some of the driving factors to develop new technologies that could be used to convert polymeric lignin into low molecular weight lignin and/or monomeric aromatic feedstocks to assist in the displacement of the current products associated with the conversion of a whole barrel of oil. We present an approach to produce these chemicals based on the selective breakdown of lignin during ionic liquid pretreatment.
The lignin breakdown products generated are found to be dependent on the starting biomass, and significant levels were generated on dissolution at 160°C for 6 h. Guaiacol was produced on dissolution of biomass and technical lignins. Vanillin was produced on dissolution of kraft lignin and eucalyptus. Syringol and allyl guaiacol were the major products observed on dissolution of switchgrass and pine, respectively, whereas syringol and allyl syringol were obtained by dissolution of eucalyptus. Furthermore, it was observed that different lignin-derived products could be generated by tuning the process conditions.
We have developed an ionic liquid based process that depolymerizes lignin and converts the low molecular weight lignin fractions into a variety of renewable chemicals from biomass. The generated chemicals (phenols, guaiacols, syringols, eugenol, catechols), their oxidized products (vanillin, vanillic acid, syringaldehyde) and their easily derivatized hydrocarbons (benzene, toluene, xylene, styrene, biphenyls and cyclohexane) already have relatively high market value as commodity and specialty chemicals, green building materials, nylons, and resins.
Lignin valorization; Ionic liquid pretreatment; Renewable chemicals; Biofuels
The statistical effects of translational noncrystallographic symmetry can be characterized by maximizing parameters describing the noncrystallographic symmetry in a likelihood function, thereby unmasking the competing statistical effects of twinning.
In the case of translational noncrystallographic symmetry (tNCS), two or more copies of a component in the asymmetric unit of the crystal are present in a similar orientation. This causes systematic modulations of the reflection intensities in the diffraction pattern, leading to problems with structure determination and refinement methods that assume, either implicitly or explicitly, that the distribution of intensities is a function only of resolution. To characterize the statistical effects of tNCS accurately, it is necessary to determine the translation relating the copies, any small rotational differences in their orientations, and the size of random coordinate differences caused by conformational differences. An algorithm to estimate these parameters and refine their values against a likelihood function is presented, and it is shown that by accounting for the statistical effects of tNCS it is possible to unmask the competing statistical effects of twinning and tNCS and to more robustly assess the crystal for the presence of twinning.
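The origin of the intensity modulation can be seen in the idealized limit of two copies in exactly the same orientation, related by a translation t (in fractional coordinates). The combined structure factor is then F1(h)[1 + exp(2πi h·t)], so the intensity relative to that expected for two unrelated copies is scaled by 1 + cos(2π h·t). The Python sketch below evaluates this factor only; it deliberately ignores the rotational and coordinate differences that the algorithm described above refines, and the function name is an illustrative assumption.

```python
import math


def tncs_modulation(hkl, t_frac):
    """Idealized tNCS intensity factor, 1 + cos(2*pi*h.t).

    hkl: Miller indices (h, k, l).
    t_frac: tNCS translation in fractional coordinates.
    Assumes two identical copies in identical orientations (a simplification).
    """
    phase = 2.0 * math.pi * sum(h * t for h, t in zip(hkl, t_frac))
    return 1.0 + math.cos(phase)


# A translation of half a cell edge along a: reflections with odd h are
# systematically weak and those with even h are systematically strong.
t = (0.5, 0.0, 0.0)
factors = {h: tncs_modulation((h, 0, 0), t) for h in range(4)}
for h, factor in factors.items():
    print(h, round(factor, 6))
```

Because the factor depends on the Miller indices and not only on resolution, intensity statistics that assume a resolution-only distribution are distorted, which is the effect that competes with the signature of twinning.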
translational noncrystallographic symmetry; intensity statistics; twinning; maximum likelihood
Xylan is the second most abundant polysaccharide on Earth, and represents a major component of both dicot wood and the cell walls of grasses. Much knowledge has been gained from studies of xylan biosynthesis in the model plant, Arabidopsis. In particular, the irregular xylem (irx) mutants, named for their collapsed xylem cells, have been essential in gaining a greater understanding of the genes involved in xylan biosynthesis. In contrast, xylan biosynthesis in grass cell walls is poorly understood. We identified three rice genes, Os07g49370 (OsIRX9), Os01g48440 (OsIRX9L), and Os06g47340 (OsIRX14), from glycosyltransferase family 43 as putative orthologs of the putative β-1,4-xylan backbone elongating Arabidopsis IRX9, IRX9L, and IRX14 genes, respectively. We demonstrate that over-expression of the closely related rice genes fully or partly complements the two well-characterized Arabidopsis irregular xylem (irx) mutants: irx9 and irx14. Complementation was assessed by measuring dwarfed phenotypes, irregular xylem cells in stem cross sections, xylose content of stems, xylosyltransferase (XylT) activity of stems, and stem strength. The expression of OsIRX9 in the irx9 mutant resulted in XylT activity of stems that was over double that of wild type plants, and the stem strength of this line increased to 124% above that of wild type. Taken together, our results suggest that OsIRX9/OsIRX9L and OsIRX14 have similar functions to the Arabidopsis IRX9 and IRX14 genes, respectively. Furthermore, our expression data indicate that OsIRX9 and OsIRX9L may function in building the xylan backbone in the secondary and primary cell walls, respectively. Our results provide insight into xylan biosynthesis in rice and how expression of a xylan synthesis gene may be modified to increase stem strength.
xylan; irregular xylem mutants; cell walls; type II cell walls; xylosyltransferase
Single-structure models derived from X-ray data do not adequately account for the inherent, functionally important dynamics of protein molecules. We generated ensembles of structures by time-averaged refinement, where local molecular vibrations were sampled by molecular-dynamics (MD) simulation whilst global disorder was partitioned into an underlying overall translation–libration–screw (TLS) model. Modeling of 20 protein datasets at 1.1–3.1 Å resolution reduced cross-validated Rfree values by 0.3–4.9%, indicating that ensemble models fit the X-ray data better than single structures. The ensembles revealed that, while most proteins display a well-ordered core, some proteins exhibit a ‘molten core’ likely supporting functionally important dynamics in ligand binding, enzyme activity and protomer assembly. Order–disorder changes in HIV protease indicate a mechanism of entropy compensation for ordering the catalytic residues upon ligand binding by disordering specific core residues. Thus, ensemble refinement extracts dynamical details from the X-ray data that allow a more comprehensive understanding of structure–dynamics–function relationships.
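For orientation, the crystallographic R value compares observed and calculated structure-factor amplitudes, and Rfree is the same quantity evaluated only on reflections withheld from refinement. The Python sketch below illustrates the standard definition; the function names, the fixed scale factor and the toy amplitudes are assumptions made for illustration and are not part of the ensemble-refinement code.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum| |Fobs| - k|Fcalc| | / sum|Fobs| over the given reflections."""
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by the free flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (hypothetical); the last reflection belongs to the test set.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 44.0]
free_flags = [False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the test-set reflections never enter the refinement target, a drop in Rfree is evidence that a model explains the data better rather than merely fitting noise.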
It has been clear since the early days of structural biology in the late 1950s that proteins and other biomolecules are continually changing shape, and that these changes have an important influence on both the structure and function of the molecules. X-ray diffraction can provide detailed information about the structure of a protein, but only limited information about how its structure fluctuates over time. Detailed information about the dynamic behaviour of proteins is essential for a proper understanding of a variety of processes, including catalysis, ligand binding and protein–protein interactions, and could also prove useful in drug design.
Currently most of the X-ray crystal structures in the Protein Data Bank are ‘snap-shots’ with limited or no information about protein dynamics. However, X-ray diffraction patterns are affected by the dynamics of the protein, and also by distortions of the crystal lattice, so three-dimensional (3D) models of proteins ought to take these phenomena into account. Molecular-dynamics (MD) computer simulations transform 3D structures into 4D ‘molecular movies’ by predicting the movement of individual atoms.
Combining MD simulations with crystallographic data has the potential to produce more realistic ensemble models of proteins in which the atomic fluctuations are represented by multiple structures within the ensemble. Moreover, in addition to improved structural information, this process, which is called ensemble refinement, can provide dynamical information about the protein. Earlier attempts to do this ran into problems because the number of model parameters needed was greater than the number of observed data points. Burnley et al. now overcome this problem by modelling local molecular vibrations with MD simulations and, at the same time, using a coarse-grain model to describe global disorder of longer length scales.
Ensemble refinement of high-resolution X-ray diffraction datasets for 20 different proteins from the Protein Data Bank produced a better fit to the data than single structures for all 20 proteins. Ensemble refinement also revealed that 3 of the 20 proteins had a ‘molten core’, rather than the well-ordered core found in most proteins: this is likely to be important in various biological functions including ligand binding, filament formation and enzymatic function. Burnley et al. also showed that an HIV enzyme underwent an order–disorder transition that is likely to influence how this enzyme works, and that similar transitions might influence the interactions between the small-molecule drug Imatinib (also known as Gleevec) and the enzymes it targets. Ensemble refinement could be applied to the majority of crystallography data currently being collected, or collected in the past, so further insights into the properties and interactions of a variety of proteins and other biomolecules can be expected.
protein; crystallography; structure; function; dynamics
In X-ray crystallography, molecular replacement and subsequent refinement are challenging at low resolution. We compared refinement methods using synchrotron diffraction data of photosystem I at 7.4 Å resolution, starting from different initial models with increasing deviations from the known high-resolution structure. Standard refinement spoiled the initial models, moving them further away from the true structure and leading to high Rfree-values. In contrast, DEN-refinement improved even the most distant starting model as judged by Rfree, atomic root-mean-square differences to the true structure, significance of features not included in the initial model, and connectivity of electron density. The best protocol was DEN-refinement with initial segmented rigid-body refinement. For the most distant initial model, the fraction of atoms within 2 Å of the true structure improved from 24% to 60%. We also found a significant correlation between Rfree-values and the accuracy of the model, suggesting that Rfree is useful even at low resolution.
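The accuracy measure quoted above, the fraction of atoms within 2 Å of the true structure, can be sketched in a few lines of Python. The sketch assumes that the two models are already in the same frame and that their atoms are listed in the same order; the function name and the toy coordinates are hypothetical.

```python
import math


def fraction_within(model, reference, cutoff=2.0):
    """Fraction of paired atoms whose positions differ by at most `cutoff` (in Å).

    model, reference: equal-length lists of (x, y, z) coordinates, with atom i
    of the model corresponding to atom i of the reference (an assumption).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    close = sum(1 for a, b in zip(model, reference) if math.dist(a, b) <= cutoff)
    return close / len(model)


# Toy coordinates: the four atoms are displaced by 1, 0, 5 and 3 Å.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
fraction = fraction_within(model, reference)
print(f"{fraction:.0%} of atoms within 2 Å")
```

Unlike a root-mean-square difference, this fraction is not dominated by a few badly misplaced atoms, which makes it a useful complement when starting models are far from the target.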
DEN refinement; membrane protein; low-resolution refinement; simulated annealing; free R value
Escherichia coli has the potential to be a powerful biocatalyst for the conversion of lignocellulosic biomass into useful materials such as biofuels and polymers. One important challenge in using E. coli for the transformation of biomass sugars is diauxie, or sequential utilization of different types of sugars. We demonstrate that, by increasing the intracellular levels of the transcription factor XylR, the preferential consumption of arabinose before xylose can be eliminated. In addition, XylR augmentation must be finely tuned for robust coutilization of these two hemicellulosic sugars. Using a novel technique for scarless gene insertion, an additional copy of xylR was inserted into the araBAD operon. The resulting strain was superior at cometabolizing mixtures of arabinose and xylose and was able to produce at least 36% more ethanol than wild-type strains. This strain is a useful starting point for the development of an E. coli biocatalyst that can simultaneously convert all biomass sugars.
X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favour of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
Macromolecular Crystallography; Automation; Phenix; X-ray; Diffraction; Python
Thermophilic fungi have attracted increased interest for their ability to secrete enzymes that deconstruct biomass at high temperatures. However, development of thermophilic fungi as enzyme producers for biomass deconstruction has not been thoroughly investigated. Comparing the enzymatic activities of thermophilic fungal strains that grow on targeted biomass feedstocks has the potential to identify promising candidates for strain development. Thielavia terrestris and Thermoascus aurantiacus were chosen for characterization based on literature precedents.
Thermoascus aurantiacus and Thielavia terrestris were cultivated on various biomass substrates and culture supernatants assayed for glycoside hydrolase activities. Supernatants from both cultures possessed comparable glycoside hydrolase activities when incubated with artificial biomass substrates. In contrast, saccharifications of ionic liquid pretreated switchgrass (Panicum virgatum) revealed that T. aurantiacus enzymes released more glucose than T. terrestris enzymes over a range of protein mass loadings and temperatures. Temperature-dependent saccharifications demonstrated that the T. aurantiacus proteins retained higher levels of activity compared to a commercial enzyme mixture sold by Novozymes, Cellic CTec2, at elevated temperatures. Enzymes secreted by T. aurantiacus released glucose at similar protein loadings to CTec2 on dilute acid, ammonia fiber expansion, or ionic liquid pretreated switchgrass. Proteomic analysis of the T. aurantiacus culture supernatant revealed dominant glycoside hydrolases from families 5, 7, 10, and 61, proteins that are key enzymes in commercial cocktails.
T. aurantiacus produces a complement of secreted proteins capable of higher levels of saccharification of pretreated switchgrass than T. terrestris enzymes. The T. aurantiacus enzymatic cocktail performs at the same level as a commercially available enzymatic cocktail for biomass deconstruction, without strain development or genetic modifications. Therefore, T. aurantiacus provides an excellent platform to develop a thermophilic fungal system for enzyme production for the conversion of biomass to biofuels.
Thermoascus aurantiacus; Thielavia terrestris; GH 61; Polysaccharide monooxygenases; Fungal secretome; Ammonia fiber expansion; Ionic liquid; 1-ethyl-3-methylimidazolium acetate; Switchgrass (Panicum virgatum)
Cdc42 is a Ras-related small G-protein, and functions as a molecular switch in signal transduction pathways linked with cell growth and differentiation. It is controlled by cycling between GTP-bound (active) and GDP-bound (inactive) forms. Nucleotide binding and hydrolysis are modulated by interactions with effectors and/or regulatory proteins. These interactions are centralized in two relatively flexible “Switch” regions as characterized by internal dynamics on multiple timescales (Loh et al., (2001) Biochemistry 40, 4590–4600), and this flexibility may be essential for protein interactions. In the Switch I region, Thr35 seems critical for function, as it is completely invariant in Ras-related proteins. To investigate the importance of conformational flexibility in Switch I of Cdc42, we mutated threonine to alanine, determined the solution structure and characterized the backbone dynamics of the single-point mutant protein, Cdc42(T35A). Backbone dynamics data suggest that the mutation changes the timescale of the internal motions of several residues, with several resonances appearing that are not discernible in wild-type Cdc42 (Adams and Oswald (2007) Biomolecular NMR Assignments 1, 225–227). The mutation does not appear to affect the thermal stability of Cdc42, and chymotrypsin digestion data further suggest that changes in conformational flexibility in Switch I slow proteolytic cleavage relative to wild type. In vitro binding assays show reduced binding of Cdc42(T35A), relative to wild type, to a GTPase binding protein that inhibits GTP hydrolysis in Cdc42. These results suggest that the mutation of T35 leads to the loss of conformational freedom in Switch I that could affect effector/regulatory protein interactions.
Ras GTPase; Signal transduction; Cdc42; Threonine; Alanine; Switch 1 mutant; conformational flexibility; backbone dynamics
Metagenomics approaches provide access to environmental genetic diversity for biotechnology applications, enabling the discovery of new enzymes and pathways for numerous catalytic processes. Discovery of new glycoside hydrolases with improved biocatalytic properties for the efficient conversion of lignocellulosic material to biofuels is a critical challenge in the development of economically viable routes from biomass to fuels and chemicals.
Twenty-two putative ORFs (open reading frames) were identified from a switchgrass-adapted compost community based on sequence homology to related gene families. These ORFs were expressed in E. coli and assayed for predicted activities. Seven of the ORFs were demonstrated to encode active enzymes, encompassing five classes of hemicellulases. Four enzymes were overexpressed in vivo, purified to homogeneity and subjected to detailed biochemical characterization. Their pH optima ranged from 5.5 to 7.5 and they exhibited moderate thermostability up to ~60–70°C.
Seven active enzymes were identified from this set of ORFs, comprising five different hemicellulase activities. These enzymes have been shown to have useful properties, such as moderate thermal stability and broad pH optima, and may serve as the starting points for future protein engineering towards the goal of developing efficient enzyme cocktails for biomass degradation under diverse process conditions.
A density-based procedure is described for improving a homology model that is locally accurate but differs globally. The model is deformed to match the map and refined, yielding an improved starting point for density modification and further model-building.
An approach is presented for addressing the challenge of model rebuilding after molecular replacement in cases where the placed template is very different from the structure to be determined. The approach takes advantage of the observation that a template and target structure may have local structures that can be superimposed much more closely than can their complete structures. A density-guided procedure for deformation of a properly placed template is introduced. A shift in the coordinates of each residue in the structure is calculated based on optimizing the match of model density within a 6 Å radius of the center of that residue with a prime-and-switch electron-density map. The shifts are smoothed and applied to the atoms in each residue, leading to local deformation of the template that improves the match of map and model. The model is then refined to improve the geometry and the fit of model to the structure-factor data. A new map is then calculated and the process is repeated until convergence. The procedure can extend the routine applicability of automated molecular replacement, model building and refinement to search models with over 2 Å r.m.s.d. representing 65–100% of the structure.
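The smoothing and application of per-residue shifts can be illustrated with a short Python sketch. The moving-average window used here is an assumption chosen for simplicity, since the text above does not specify the smoothing function; the function names and the toy coordinates are likewise hypothetical, and the density-based calculation of the raw shifts is not shown.

```python
def smooth_shifts(shifts, half_window=2):
    """Moving-average smoothing of per-residue shift vectors along the chain.

    shifts: list of (dx, dy, dz), one per residue in chain order.
    half_window: number of neighbours on each side to average (hypothetical).
    """
    n = len(shifts)
    smoothed = []
    for i in range(n):
        # The window is truncated at the chain termini.
        window = shifts[max(0, i - half_window):min(n, i + half_window + 1)]
        smoothed.append(tuple(sum(s[k] for s in window) / len(window) for k in range(3)))
    return smoothed


def apply_shifts(residues, shifts):
    """Move every atom of each residue by that residue's shift vector."""
    return [
        [tuple(c + d for c, d in zip(atom, shift)) for atom in atoms]
        for atoms, shift in zip(residues, shifts)
    ]


# Toy example: only the middle residue has a raw shift; smoothing spreads it
# over its neighbours so that the chain deforms without tearing.
raw = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
residues = [[(0.0, 0.0, 0.0)], [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)], [(5.0, 5.0, 5.0)]]
smoothed = smooth_shifts(raw, half_window=1)
morphed = apply_shifts(residues, smoothed)
print(smoothed)
print(morphed)
```

Applying one shift to all atoms of a residue preserves its internal geometry, while smoothing over neighbours keeps adjacent residues moving together; refinement is still needed afterwards to restore the geometry at residue junctions.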
molecular replacement; automation; macromolecular crystallography; structure similarity; modeling; Phenix; morphing
In scientific computing, Fortran was the dominant implementation language throughout most of the second part of the 20th century. The many tools accumulated during this time have been difficult to integrate with modern software, which is now dominated by object-oriented languages.
Driven by the requirements of a large-scale scientific software project, we have developed a Fortran to C++ source-to-source conversion tool named FABLE. This enables the continued development of new methods even while switching languages. We report the application of FABLE in three major projects and present detailed comparisons of Fortran and C++ runtime performances.
Our experience suggests that most Fortran 77 codes can be converted with an effort that is minor (measured in days) compared to the original development time (often measured in years). With FABLE it is possible to reuse and evolve legacy work in modern object-oriented environments, in a portable and maintainable way. FABLE is available under a nonrestrictive open source license. In FABLE the analysis of the Fortran sources is separated from the generation of the C++ sources. Therefore parts of FABLE could be reused for other target languages.
Fortran; C++; Source-to-source conversion; Python; Test-driven development
The foundations and current features of a widely used graphical user interface for macromolecular crystallography are described.
A new Python-based graphical user interface for the PHENIX suite of crystallography software is described. This interface unifies the command-line programs and their graphical displays, simplifying the development of new interfaces and avoiding duplication of function. With careful design, graphical interfaces can be displayed automatically, instead of being manually constructed. The resulting package is easily maintained and extended as new programs are added or modified.
macromolecular crystallography; graphical user interfaces; PHENIX
A low flow rate liquid microjet method for delivery of hydrated protein crystals to X-ray lasers is presented. Linac Coherent Light Source data demonstrate serial femtosecond protein crystallography with micrograms of protein, a reduction of sample consumption by orders of magnitude.
An electrospun liquid microjet has been developed that delivers protein microcrystal suspensions at flow rates of 0.14–3.1 µl min−1 to perform serial femtosecond crystallography (SFX) studies with X-ray lasers. Thermolysin microcrystals flowed at 0.17 µl min−1 and diffracted to beyond 4 Å resolution, producing 14 000 indexable diffraction patterns, or four per second, from 140 µg of protein. Nanoflow electrospinning extends SFX to biological samples that necessitate minimal sample consumption.
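The sample-consumption figures quoted above imply a few derived quantities, worked through in the Python sketch below. The collection time and the suspension volume are estimates that assume continuous flow at 0.17 µl min−1 and a steady rate of four indexable patterns per second; because that rate is a rounded value, the results are approximate.

```python
# Values taken from the text above.
protein_ug = 140.0          # protein consumed, in micrograms
patterns = 14_000           # indexable diffraction patterns
patterns_per_s = 4.0        # indexable patterns per second (rounded)
flow_ul_per_min = 0.17      # flow rate of the thermolysin suspension

# Protein consumed per indexable pattern, in nanograms.
ng_per_pattern = protein_ug * 1000.0 / patterns

# Estimated collection time and suspension volume (assumes steady operation).
collection_min = patterns / patterns_per_s / 60.0
volume_ul = flow_ul_per_min * collection_min

print(f"{ng_per_pattern:.0f} ng of protein per indexable pattern")
print(f"about {collection_min:.0f} min of data collection")
print(f"about {volume_ul:.1f} µl of suspension")
```

On these assumptions, each indexable pattern costs about 10 ng of protein, and the whole data set corresponds to roughly an hour of collection and about 10 µl of suspension.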
serial femtosecond crystallography; nanoflow electrospinning
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
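The idea of adding a ϕ,ψ term to a geometric target can be illustrated with a toy example. The sketch below is not the phenix.refine implementation: the Gaussian peak shape, the peak centres, the width and the weight are all placeholder assumptions chosen only to show how favorable and unfavorable regions might be combined into one smooth landscape. All function and constant names are hypothetical.

```python
import math

# Illustrative peak centres in degrees (phi, psi). These are placeholders,
# not values derived from any reference data set.
FAVORABLE = [(-63.0, -43.0), (-120.0, 130.0)]  # roughly helical and extended regions
UNFAVORABLE = [(0.0, 0.0)]                     # placeholder for a sterically clashing region


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def gaussian_peak(phi, psi, center, width):
    """Periodic two-dimensional Gaussian centred on `center`."""
    dphi = angle_diff(phi, center[0])
    dpsi = angle_diff(psi, center[1])
    return math.exp(-(dphi ** 2 + dpsi ** 2) / (2.0 * width ** 2))


def rama_term(phi, psi, width=30.0, weight=1.0):
    """Toy pseudo-energy: low near favorable peaks, high near unfavorable ones."""
    good = sum(gaussian_peak(phi, psi, c, width) for c in FAVORABLE)
    bad = sum(gaussian_peak(phi, psi, c, width) for c in UNFAVORABLE)
    return weight * (bad - good)


def total_target(geometry_term, phi_psi_pairs, weight=1.0):
    """Add the phi,psi term for every residue to an existing geometry target."""
    return geometry_term + sum(rama_term(phi, psi, weight=weight)
                               for phi, psi in phi_psi_pairs)


# A residue near the helical peak lowers the target; one near the
# placeholder clash region raises it.
print(rama_term(-63.0, -43.0) < rama_term(0.0, 0.0))
```

In a minimizer, such a term would pull backbone torsions toward the favorable regions while the other geometry and data terms remain in the target; the relative weight controls how strongly it competes with them.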
macromolecular crystallography; low resolution; refinement; automation
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Molecular replacement; Automation; Macromolecular crystallography; Rosetta; Phenix
The implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
Approximately 85% of the structures deposited in the Protein Data Bank have been solved using X-ray crystallography, making it the leading method for three-dimensional structure determination of macromolecules. One of the limitations of the method is that the typical data quality (resolution) does not allow the direct determination of H-atom positions. Most hydrogen positions can be inferred from the positions of other atoms and therefore can be readily included in the structure model as a priori knowledge. However, this may not be the case in biologically active sites of macromolecules, where the presence and position of hydrogen is crucial to the enzymatic mechanism. This makes the application of neutron crystallography in biology particularly important, as H atoms can be clearly located in experimental neutron scattering density maps. Without exception, when a neutron structure is determined the corresponding X-ray structure is also known, making it possible to derive the complete structure using both data sets. Here, the implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
structure refinement; neutrons; joint X-ray and neutron refinement; PHENIX