Protein Structure Initiative:Biology (PSI:Biology) is the third phase of PSI where protein structures are determined in high-throughput to characterize their biological functions. The transition to the third phase entailed the formation of PSI:Biology Partnerships which are composed of structural genomics centers and biomedical science laboratories. We present a method to examine the impact of protein structures determined under the auspices of PSI:Biology by measuring their rates of annotations. The mean numbers of annotations per structure and per residue are examined. These are designed to provide measures of the amount of structure to function connections that can be leveraged from each structure.
One result is that PSI:Biology structures are found to have a higher rate of annotations than structures determined during the first two phases of PSI. A second result is that the subset of PSI:Biology structures determined through PSI:Biology Partnerships have a higher rate of annotations than those determined exclusive of those partnerships. Both results hold when the annotation rates are examined either at the level of the entire protein or for annotations that are known to fall at specific residues within the portion of the protein that has a determined structure.
We conclude that PSI:Biology determines structures that are estimated to have a higher degree of biomedical interest than those determined during the first two phases of PSI based on a broad array of biomedical annotations. For the PSI:Biology Partnerships, we see that there is an associated added value that represents part of the progress toward the goals of PSI:Biology. We interpret the added value to mean that team-based structural biology projects that utilize the expertise and technologies of structural genomics centers together with biological laboratories in the community are conducted in a synergistic manner. We show that the annotation rates can be used in conjunction with established metrics, i.e. the numbers of structures and impact of publication records, to monitor the progress of PSI:Biology towards its goals of examining structure to function connections of high biomedical relevance. The metric provides an objective means to quantify the overall impact of PSI:Biology as it uses biomedical annotations from external sources.
Protein Structure Initiative; Structural genomics; Protein annotations; Scientific partnerships; Structure to function relationships
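The two annotation rates described in this abstract (mean annotations per structure and annotations per residue) can be illustrated with a minimal sketch. The record fields, the group labels and the demonstration numbers below are hypothetical, and the per-residue rate is assumed here to be a pooled ratio over all resolved residues; the abstract does not specify the exact definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Structure:
    """One determined structure; every field holds hypothetical demo data."""
    pdb_id: str
    group: str                  # e.g. "partnership" or "non-partnership"
    n_residues: int             # residues with determined coordinates
    n_annotations: int          # annotations mapped to the whole protein
    n_residue_annotations: int  # annotations that fall on resolved residues


def annotation_rates(structures):
    """Return (mean annotations per structure, annotations per resolved residue)."""
    per_structure = mean(s.n_annotations for s in structures)
    # Assumption: pool residues across structures rather than averaging ratios.
    per_residue = (sum(s.n_residue_annotations for s in structures)
                   / sum(s.n_residues for s in structures))
    return per_structure, per_residue


def rates_by_group(structures):
    """Compute both rates separately for each group label."""
    groups = {}
    for s in structures:
        groups.setdefault(s.group, []).append(s)
    return {g: annotation_rates(members) for g, members in groups.items()}


# Made-up records, not real PSI data.
demo = [
    Structure("AAAA", "partnership", 100, 6, 4),
    Structure("BBBB", "partnership", 300, 10, 8),
    Structure("CCCC", "non-partnership", 200, 3, 2),
    Structure("DDDD", "non-partnership", 200, 5, 2),
]
rates = rates_by_group(demo)
```

Comparing `rates["partnership"]` with `rates["non-partnership"]` mirrors the kind of group comparison reported in the abstract, although a real analysis would also need a significance test.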
The software suite Xsolve semi-exhaustively explores key parameters of the X-ray structure-determination process to compute multiple three-dimensional protein structures independently and in parallel from a set of diffraction images. An optimal consensus model for subsequent manual refinement is computed from these structures.
The Joint Center for Structural Genomics (JCSG), one of four large-scale structure-determination centers funded by the US Protein Structure Initiative (PSI) through the National Institute of General Medical Sciences, has been operating an automated distributed structure-solution pipeline, Xsolve, for well over half a decade. During PSI-2, Xsolve solved, traced and partially refined 90% of the JCSG’s nearly 770 MAD/SAD structures at an average resolution of about 2 Å without human intervention. Xsolve executes many well-established, publicly available crystallography software programs in parallel on a commodity Linux cluster, resulting in multiple traces for any given target. Additional software programs have been developed and integrated into Xsolve to further minimize human effort in structure refinement. ConsensusModeler exploits complementarities in traces from Xsolve to compute a single optimal model for manual refinement. Xpleo is a powerful robotics-inspired algorithm to build missing fragments and qFit automatically identifies and fits alternate conformations.
distributed protein-structure determination; consensus models; parallel computing
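The general pattern described above, running many parameter combinations in parallel and then merging complementary traces, can be sketched as follows. This is an illustration only: `run_trace` is a hypothetical stand-in that fakes a result instead of calling any crystallography program, the parameter names are invented, and the per-residue majority vote is a deliberately simple substitute, not the algorithm used by ConsensusModeler.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SEQUENCE = "MKTAYIAKQR"  # made-up ten-residue target


def run_trace(params):
    """Hypothetical stand-in for one phasing and tracing run.

    Returns a partial trace as {residue_number: residue_name}. The toy rule
    below makes each parameter set fail to build one residue, so that
    different runs are complementary.
    """
    resolution_cutoff, n_sites = params
    skip = (int(resolution_cutoff * 10) + n_sites) % len(SEQUENCE)
    return {i + 1: aa for i, aa in enumerate(SEQUENCE) if i != skip}


def explore(grid):
    """Run every parameter combination independently and in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_trace, grid))


def consensus(traces):
    """Merge traces by a per-residue majority vote (simple illustration)."""
    positions = sorted({pos for trace in traces for pos in trace})
    model = {}
    for pos in positions:
        votes = Counter(trace[pos] for trace in traces if pos in trace)
        model[pos] = votes.most_common(1)[0][0]
    return model


grid = list(product([2.0, 2.5, 3.0], [4, 6]))  # hypothetical parameter grid
traces = explore(grid)
model = consensus(traces)
```

Each individual trace here is incomplete, but the merged model covers every residue, which is the sense in which complementary traces can be exploited.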
The Center for Eukaryotic Structural Genomics (CESG) is a “specialized” or “technology development” center supported by the Protein Structure Initiative (PSI). CESG’s mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601 proteins from Homo sapiens, 33 from mouse, 10 from rat, 139 from Galdieria sulphuraria, 35 from Arabidopsis thaliana, 96 from Cyanidioschyzon merolae, 80 from Plasmodium falciparum, 24 from yeast, and about 25 from other eukaryotes. Notably, 30% of all structures of human proteins solved by the PSI Centers were determined at CESG. Whereas eukaryotic proteins generally are considered to be much more challenging targets than prokaryotic proteins, the technology now in place at CESG yields success rates that are comparable to those of the large production centers that work primarily on prokaryotic proteins. We describe here the technological innovations that underlie CESG’s platforms for bioinformatics and laboratory information management, target selection, protein production, and structure determination by X-ray crystallography or NMR spectroscopy.
CESG; LIMS; PSI Materials Repository; Protein Structure Initiative (PSI); PepcDB; Plasmid design; Protein production; Protein structure determination; TEV protease
The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years and has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe.
The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverage of an entire organism. Targets are processed through an extensive combination of bioinformatics and biophysical analyses to efficiently characterize and optimize each target prior to selection for structure determination. The pipeline uses parallel processing methods at almost every step in the process and can adapt to a wide range of protein targets from bacterial to human. The construction, expansion and optimization of the JCSG gene-to-structure pipeline over the years have resulted in many technological and methodological advances and developments. The vast number of targets and the enormous amounts of associated data processed through the multiple stages of the experimental pipeline required the development of a variety of valuable resources that, wherever feasible, have been converted to free-access web-based tools and applications.
structural genomics; Joint Center for Structural Genomics; Protein Structure Initiative
The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three-dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure-aided drug discovery efforts. A number of structural genomics projects with a focus on pathogens have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high-throughput structural biology technologies to the characterization of proteins from the National Institute of Allergy and Infectious Diseases (NIAID) category A–C pathogens and organisms causing emerging or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure-aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High-throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.
CSGID; structural genomics; infectious diseases; drug discovery; structural biology; protein structure
The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005–present) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today’s UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849–851, 2007) has resulted from systematic targeting of large families. PSI’s per-structure contribution to novel leverage was over 4-fold higher than that for non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may take only another ~15 years to cover most sequences in the current UniProt database.
Protein structure determination; Structural genomics; Evolution; Protein universe
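The two shares quoted in this abstract (more than 8% of all depositions, over 20% of novel structures) imply how much more often a PSI structure is novel than a non-PSI one. The sketch below works through that arithmetic; it treats the quoted lower bounds as point values, which is an assumption, and the function name is hypothetical. It does not reproduce the separate novel-leverage metric of Liu et al.

```python
def novelty_enrichment(share_of_all, share_of_novel):
    """Compare how often PSI and non-PSI depositions are novel.

    share_of_all:   fraction of all PDB depositions that came from PSI
    share_of_novel: fraction of all novel structures that came from PSI
    Returns (over-representation of PSI among novel structures,
             ratio of per-structure novelty rates, PSI versus non-PSI).
    """
    over_representation = share_of_novel / share_of_all
    # Novelty rate of PSI is proportional to share_of_novel / share_of_all;
    # for non-PSI it is proportional to (1 - share_of_novel) / (1 - share_of_all).
    rate_ratio = (share_of_novel * (1 - share_of_all)) / (
        share_of_all * (1 - share_of_novel))
    return over_representation, rate_ratio


# Point values taken from the lower bounds quoted in the abstract.
over_rep, rate_ratio = novelty_enrichment(0.08, 0.20)
```

With these inputs PSI is over-represented among novel structures by a factor of about 2.5, and a PSI deposition is a little under three times as likely to be novel as a non-PSI deposition.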
Efforts in structural biology have targeted the systematic determination of all protein structures through experimental determination or modeling. In recent years, 3-D electron cryomicroscopy (cryoEM) has assumed an increasingly important role in determining the structures of large macromolecular assemblies to intermediate resolutions (6–10 Å). While these structures provide a snapshot of the assembly and its components in well-defined functional states, the resolution limits the ability to build accurate structural models. In contrast, sequence-based modeling techniques are capable of producing relatively robust structural models for isolated proteins or domains. In this work, we developed and applied a hybrid modeling approach, utilizing cryoEM density and ab initio modeling to produce a structural model for the core domain of a herpesvirus structural protein, VP26. Specifically, this method, first tested on simulated data, utilizes the cryoEM density map as a geometrical constraint in identifying the most native-like models from a gallery of models generated by ab initio modeling. The resulting model for the core domain of VP26, based on the 8.5-Å resolution herpes simplex virus type 1 (HSV-1) capsid cryoEM structure and mutational data, exhibited a novel fold. Additionally, the core domain of VP26 appeared to have a complementary interface to the known upper-domain structure of VP5, its cognate binding partner. While this new model provides for a better understanding of the assembly and interactions of VP26 in HSV-1, the approach itself may have broader applications in modeling the components of large macromolecular assemblies.
Efforts in structural genomics have targeted the systematic determination of all protein structures primarily using X-ray crystallography and nuclear magnetic resonance. These initiatives have typically focused on domains, single proteins and in some cases small complexes, and as such macromolecular machines are relatively underrepresented. However, in recent years, electron cryomicroscopy (cryoEM) has assumed an increasingly important role in determining the structure of large macromolecular machines in their biologically active states to intermediate resolutions (5–10 Å). Concurrently, modeling techniques, such as comparative and ab initio modeling, have played an increasingly important role in structure determination of small proteins not amenable to other structural techniques. In this work, Baker and colleagues have leveraged ab initio modeling and cryoEM to assess and identify structural models for the macromolecular components within a large complex. Specifically, the cryoEM density can be used to select the most native-like models from a large gallery of potential models. Applied to the smallest herpesvirus capsid protein, VP26 (12 kDa), it was possible to determine its core domain structure (residues 42–105), which helped to elucidate interactions among the structural proteins in the virion. Beyond VP26, these techniques potentially provide a new pathway for accurate structure determination of proteins in their biological and functional states.
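The idea of using a density map to pick the most native-like model from a gallery can be illustrated with a small sketch on simulated data. It assumes that candidate models are already placed in the frame of the map, blurs atoms as isotropic Gaussians, and scores each model by the Pearson correlation between its simulated density and the target map. The function names, grid size and blur width are illustrative choices, not the scoring scheme of the published method.

```python
import numpy as np


def model_to_density(coords, grid_shape, voxel_size=1.0, sigma=2.0):
    """Blur atom positions (N x 3, in angstroms) onto a grid as Gaussians."""
    axes = [np.arange(n) * voxel_size for n in grid_shape]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    density = np.zeros(grid_shape)
    for x, y, z in coords:
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        density += np.exp(-d2 / (2.0 * sigma ** 2))
    return density


def cross_correlation(a, b):
    """Pearson correlation between two density grids of equal shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_models(models, target_map, **kwargs):
    """Return (score, model_index) pairs sorted from best to worst fit."""
    scores = [
        cross_correlation(model_to_density(m, target_map.shape, **kwargs), target_map)
        for m in models
    ]
    return sorted(((s, i) for i, s in enumerate(scores)), reverse=True)


# Simulated test: a random "native" structure, its map, and a small gallery.
rng = np.random.default_rng(0)
native = rng.uniform(4.0, 12.0, size=(20, 3))
target_map = model_to_density(native, (16, 16, 16))
gallery = [
    native + rng.normal(0.0, 6.0, native.shape),  # heavily perturbed decoy
    native,                                       # the native structure itself
    native + rng.normal(0.0, 3.0, native.shape),  # moderately perturbed decoy
]
ranking = rank_models(gallery, target_map)
```

In this toy gallery the native structure is recovered as the top-ranked entry. A practical implementation would also have to search over the placement of each model within the map.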
The ultimate goal of structural biology is to understand the structural basis of proteins in cellular processes. In structural biology, the most critical issue is the availability of high-quality samples. “Structural biology-grade” proteins must be generated in the quantity and quality suitable for structure determination using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The purification procedures must reproducibly yield homogeneous proteins or their derivatives containing marker atom(s) in milligram quantities. The choice of protein purification and handling procedures plays a critical role in obtaining high-quality protein samples. With structural genomics emphasizing a genome-based approach in understanding protein structure and function, a number of unique structures covering most of the protein folding space have been determined and new technologies with high efficiency have been developed. At the Midwest Center for Structural Genomics (MCSG), we have developed semi-automated protocols for high-throughput parallel protein expression and purification. A protein, expressed as a fusion with a cleavable affinity tag, is purified in two consecutive immobilized metal affinity chromatography (IMAC) steps: (i) the first step is an IMAC coupled with buffer exchange, or size exclusion chromatography (IMAC-I), followed by the cleavage of the affinity tag using the highly specific Tobacco Etch Virus (TEV) protease; (ii) the second step is IMAC and buffer exchange (IMAC-II) to remove the cleaved tag and tagged TEV protease. These protocols have been implemented on multidimensional chromatography workstations and, as we have shown, many proteins can be successfully produced at large scale.
All methods and protocols used for purification, some developed by MCSG, others adopted and integrated into the MCSG purification pipeline and more recently the Center for Structural Genomics of Infectious Diseases (CSGID) purification pipeline, are discussed in this chapter.
domain design; expression vectors; gene cloning; protein purification; crystallization screening; quality assessment
The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method (“PCA plots”) for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology.
Biological macromolecules comprise extensive networks of interconnected atoms. These complex coupled networks result in correlated structural dynamics, where atoms and residues move and evolve together as concerted conformational changes. The availability of a wealth of macromolecular structures necessitates the use of robust strategies for analyzing the correlated modes of motion found in molecular ensembles. Current strategies use a combination of least-squares superpositions and statistical analysis of the structural covariance matrix. However, the least-squares treatment implicitly requires that atoms are uncorrelated and that each atom has the same positional uncertainty, two assumptions which are violated in structural ensembles. For example, the atoms in the proteins are connected by chemical bonds, covalent and non-covalent, resulting in strong correlations. Furthermore, different atoms have different variances, because some atoms are known with less precision or have greater mobility. Using maximum likelihood (ML) analysis, we have developed a technique that is markedly more accurate than the classical least-squares approach by accounting for both correlations and heterogeneous variances. The improved ability to accurately analyze the major modes of dynamic structural correlations will benefit a diverse range of biological disciplines, including nuclear magnetic resonance (NMR) spectroscopy, crystallography, molecular dynamics, and molecular evolution.
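The final step of such an analysis, principal components analysis of an atom-by-atom correlation matrix, can be sketched as follows. Note the substitution: this example uses the ordinary sample covariance of an already superposed ensemble, not the maximum likelihood estimate described above, so it illustrates only the PCA step. The function name and the simulated ensemble are hypothetical.

```python
import numpy as np


def correlation_pca(ensemble):
    """PCA of the atom-by-atom correlation matrix of a structural ensemble.

    ensemble: array of shape (n_models, n_atoms, 3), assumed to be superposed.
    Returns (eigenvalues, eigenvectors, correlation_matrix), with eigenvalues
    sorted in descending order and eigenvectors stored as columns.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    n_models = ensemble.shape[0]
    deviations = ensemble - ensemble.mean(axis=0)
    # Covariance between atoms i and j, pooled over models and x, y, z.
    cov = np.einsum("mik,mjk->ij", deviations, deviations) / (3 * n_models)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], corr


# Simulated ensemble: atoms 0 and 1 move together, atom 2 moves independently.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1, 3))
ensemble = np.concatenate(
    [
        shared + 0.1 * rng.normal(size=(200, 1, 3)),
        shared + 0.1 * rng.normal(size=(200, 1, 3)),
        rng.normal(size=(200, 1, 3)),
    ],
    axis=1,
)
eigvals, eigvecs, corr = correlation_pca(ensemble)
```

The leading eigenvector loads mainly on the two coupled atoms. Its per-atom components are the kind of quantity that could be color-coded onto a structure in a PCA plot.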
Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress and now over 53,000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible because of third-generation synchrotron sources, structure phasing approaches using anomalous signal and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advancements provide a robust foundation for structural molecular biology and assure a strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact.
Specific use cases of TOPSAN, an innovative collaborative platform for creating, sharing and distributing annotations and insights about protein structures, such as those determined by high-throughput structural genomics in the Protein Structure Initiative (PSI), are described. TOPSAN is the main annotation platform for JCSG structures and serves as a conduit for initiating collaborations with the biological community, as illustrated in this special issue of Acta Crystallographica Section F. Developed at the JCSG with the goal of opening a dialogue on the novel protein structures with the broader biological community, TOPSAN is a unique tool for fostering distributed collaborations and provides an efficient pathway to peer-reviewed publications.
The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN (‘The Open Protein Structure Annotation Network’), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in insightful structure–function analysis for many proteins and have led to numerous peer-reviewed publications, as exemplified by the articles included in this issue of Acta Crystallographica Section F.
collaborative annotations; structural genomics; Protein Structure Initiative
The ability to adopt complex three-dimensional (3D) structures that can rapidly interconvert between multiple functional states (folding and dynamics) is vital for the proper functioning of RNAs. Consequently, RNA structure and dynamics necessarily determine their biological function. In the post-genomic era, it is clear that RNAs comprise a larger proportion (>50%) of the transcribed genome compared to proteins (≤2%). Yet the determination of the 3D structures of RNAs lags considerably behind those of proteins and to date there are even fewer investigations of dynamics in RNAs compared to proteins. Site specific incorporation of various structural and dynamic probes into nucleic acids would likely transform RNA structural biology. Therefore, various methods for introducing probes for structural, functional, and biotechnological applications are critically assessed here. These probes include stable isotopes such as 2H, 13C, 15N, and 19F. Incorporation of these probes using improved RNA ligation strategies promises to change the landscape of structural biology of supramacromolecules probed by biophysical tools such as nuclear magnetic resonance (NMR) spectroscopy, X-ray crystallography and Raman spectroscopy. Finally, some of the structural and dynamic problems that can be addressed using these technological advances are outlined.
dynamics; FRET; isotopic labeling; ligation; NMR; Raman; RNA dynamics and folding and structure; X-ray crystallography
The Protein Structure Initiative:Biology-Materials Repository (PSI:Biology-MR; MR; http://psimr.asu.edu) sequence-verifies, annotates, stores, and distributes the protein expression plasmids and vectors created by the Protein Structure Initiative (PSI). The MR has developed an informatics and sample processing pipeline that manages this process for thousands of samples per month from nearly a dozen PSI centers. DNASU (http://dnasu.asu.edu), a freely searchable database, stores the plasmid annotations, which include the full-length sequence, vector information, and associated publications for over 130,000 plasmids created by our laboratory, by the PSI and other consortia, and by individual laboratories for distribution to researchers worldwide. Each plasmid links to external resources, including the PSI Structural Biology Knowledgebase (http://sbkb.org), which facilitates cross-referencing of a particular plasmid to additional protein annotations and experimental data. To expedite and simplify plasmid requests, the MR uses an expedited material transfer agreement (EP-MTA) network, where researchers from network institutions can order and receive PSI plasmids without institutional delays. Currently over 39,000 protein expression plasmids and 78 empty vectors from the PSI are available upon request from DNASU. Overall, the MR’s repository of expression-ready plasmids, its automated pipeline, and the rapid process for receiving and distributing these plasmids more effectively allows the research community to dissect the biological function of proteins whose structures have been studied by the PSI.
plasmid; structural biology; Protein Structure Initiative; PSI:Biology; protein expression
In structural biology, the most critical issue is the availability of high-quality samples. “Structural-biology-grade” proteins must be generated in a quantity and quality suitable for structure determination using X-ray crystallography or nuclear magnetic resonance. The additional challenge for structural genomics is the need for high numbers of proteins at low cost, where protein targets quite often have low sequence similarities and unknown properties and are poorly characterized. The purification procedures must reproducibly yield homogeneous proteins or their derivatives containing marker atom(s) in milligram quantities. The choice of protein purification and handling procedures plays a critical role in obtaining high-quality protein samples. The ultimate goal is the same as in structural biology generally, namely to understand the structural basis of proteins in cellular processes, but the structural genomics approach differs: although the functional aspects of an individual protein or family are not ignored, the emphasis is on the number of unique structures, on covering most of the protein folding space, and on developing new high-efficiency technologies. At the Midwest Center for Structural Genomics (MCSG), we have developed semiautomated protocols for high-throughput parallel protein purification. In brief, a protein, expressed as a fusion with a cleavable affinity tag, is purified in two immobilized metal affinity chromatography (IMAC) steps: (i) a first IMAC coupled with a buffer-exchange step and, after tag cleavage using TEV protease, (ii) a second IMAC and buffer exchange to clean up cleaved tags and tagged TEV protease. Size exclusion chromatography is also applied as needed. These protocols have been implemented on the multidimensional chromatography workstations AKTAexplorer and AKTAxpress (GE Healthcare).
All methods and protocols used for purification, some developed in MCSG, others adopted and integrated into the MCSG purification pipeline and more recently the Center for Structural Genomics of Infectious Diseases (CSGID) purification pipeline, are discussed in this chapter.
Single-structure models derived from X-ray data do not adequately account for the inherent, functionally important dynamics of protein molecules. We generated ensembles of structures by time-averaged refinement, where local molecular vibrations were sampled by molecular-dynamics (MD) simulation whilst global disorder was partitioned into an underlying overall translation–libration–screw (TLS) model. Modeling of 20 protein datasets at 1.1–3.1 Å resolution reduced cross-validated Rfree values by 0.3–4.9%, indicating that ensemble models fit the X-ray data better than single structures. The ensembles revealed that, while most proteins display a well-ordered core, some proteins exhibit a ‘molten core’ likely supporting functionally important dynamics in ligand binding, enzyme activity and protomer assembly. Order–disorder changes in HIV protease indicate a mechanism of entropy compensation for ordering the catalytic residues upon ligand binding by disordering specific core residues. Thus, ensemble refinement extracts dynamical details from the X-ray data that allow a more comprehensive understanding of structure–dynamics–function relationships.
It has been clear since the early days of structural biology in the late 1950s that proteins and other biomolecules are continually changing shape, and that these changes have an important influence on both the structure and function of the molecules. X-ray diffraction can provide detailed information about the structure of a protein, but only limited information about how its structure fluctuates over time. Detailed information about the dynamic behaviour of proteins is essential for a proper understanding of a variety of processes, including catalysis, ligand binding and protein–protein interactions, and could also prove useful in drug design.
Currently most of the X-ray crystal structures in the Protein Data Bank are ‘snap-shots’ with limited or no information about protein dynamics. However, X-ray diffraction patterns are affected by the dynamics of the protein, and also by distortions of the crystal lattice, so three-dimensional (3D) models of proteins ought to take these phenomena into account. Molecular-dynamics (MD) computer simulations transform 3D structures into 4D ‘molecular movies’ by predicting the movement of individual atoms.
Combining MD simulations with crystallographic data has the potential to produce more realistic ensemble models of proteins in which the atomic fluctuations are represented by multiple structures within the ensemble. Moreover, in addition to improved structural information, this process, which is called ensemble refinement, can provide dynamical information about the protein. Earlier attempts to do this ran into problems because the number of model parameters needed was greater than the number of observed data points. Burnley et al. now overcome this problem by modelling local molecular vibrations with MD simulations and, at the same time, using a coarse-grain model to describe global disorder over longer length scales.
Ensemble refinement of high-resolution X-ray diffraction datasets for 20 different proteins from the Protein Data Bank produced a better fit to the data than single structures for all 20 proteins. Ensemble refinement also revealed that 3 of the 20 proteins had a ‘molten core’, rather than the well-ordered core found in most proteins: this is likely to be important in various biological functions including ligand binding, filament formation and enzymatic function. Burnley et al. also showed that an HIV enzyme underwent an order–disorder transition that is likely to influence how this enzyme works, and that similar transitions might influence the interactions between the small-molecule drug Imatinib (also known as Gleevec) and the enzymes it targets. Ensemble refinement could be applied to the majority of crystallography data currently being collected, or collected in the past, so further insights into the properties and interactions of a variety of proteins and other biomolecules can be expected.
protein; crystallography; structure; function; dynamics
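The cross-validated Rfree values quoted above compare observed and calculated structure-factor amplitudes on a set of reflections held out from refinement. The sketch below uses the standard crystallographic R-factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, and assumes that both sets of amplitudes are already on a common scale. The amplitudes and the choice of free reflections are made-up illustration values.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """Standard R-factor between observed and calculated amplitudes."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    return float(np.abs(f_obs - f_calc).sum() / f_obs.sum())


def r_work_and_free(f_obs, f_calc, free_mask):
    """Split reflections into working and free sets and score each.

    free_mask marks reflections excluded from refinement (the test set).
    Returns (R_work, R_free).
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    free_mask = np.asarray(free_mask, dtype=bool)
    r_work = r_factor(f_obs[~free_mask], f_calc[~free_mask])
    r_free = r_factor(f_obs[free_mask], f_calc[free_mask])
    return r_work, r_free


# Four made-up reflections; the second and fourth are held out as "free".
f_obs = [100.0, 50.0, 80.0, 40.0]
f_calc = [90.0, 55.0, 80.0, 44.0]
r_work, r_free = r_work_and_free(f_obs, f_calc, [False, True, False, True])
```

A model that lowers R_free, and not only R_work, fits data it was never refined against, which is why the reductions reported above are taken as evidence of a better model rather than of overfitting.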
We demonstrate that it is feasible to determine high-resolution protein structures by electron crystallography of three-dimensional crystals in an electron cryo-microscope (CryoEM). Lysozyme microcrystals were frozen on an electron microscopy grid, and electron diffraction data collected to 1.7 Å resolution. We developed a data collection protocol to collect a full-tilt series in electron diffraction to atomic resolution. A single tilt series contains up to 90 individual diffraction patterns collected from a single crystal with tilt angle increment of 0.1–1° and a total accumulated electron dose less than 10 electrons per angstrom squared. We indexed the data from three crystals and used them for structure determination of lysozyme by molecular replacement followed by crystallographic refinement to 2.9 Å resolution. This proof of principle paves the way for the implementation of a new technique, which we name ‘MicroED’, that may have wide applicability in structural biology.
X-ray crystallography has been used to work out the atomic structure of a large number of proteins. In a typical X-ray crystallography experiment, a beam of X-rays is directed at a protein crystal, which scatters some of the X-ray photons to produce a diffraction pattern. The crystal is then rotated through a small angle and another diffraction pattern is recorded. Finally, after this process has been repeated enough times, it is possible to work backwards from the diffraction patterns to figure out the structure of the protein.
The crystals used for X-ray crystallography must be large to withstand the damage caused by repeated exposure to the X-ray beam. However, some proteins do not form crystals at all, and others only form small crystals. It is possible to overcome this problem by using extremely short pulses of X-rays, but this requires a very large number of small crystals and ultrashort X-ray pulses are only available at a handful of research centers around the world. There is, therefore, a need for other approaches that can determine the structure of proteins that only form small crystals.
Electron crystallography is similar to X-ray crystallography in that a protein crystal scatters a beam to produce a diffraction pattern. However, the interactions between the electrons in the beam and the crystal are much stronger than those between the X-ray photons and the crystal. This means that meaningful amounts of data can be collected from much smaller crystals. However, it is normally only possible to collect one diffraction pattern from each crystal because of beam induced damage. Researchers have developed methods to merge the diffraction patterns produced by hundreds of small crystals, but to date these techniques have only worked with very thin two-dimensional crystals that contain only one layer of the protein of interest.
Now Shi et al. report a new approach to electron crystallography that works with very small three-dimensional crystals. Called MicroED, this technique involves placing the crystal in a transmission electron cryo-microscope, which is a fairly standard piece of equipment in many laboratories. The standard ‘low-dose’ electron beam in one of these microscopes would normally damage the crystal after a single diffraction pattern had been collected. However, Shi et al. realized that it was possible to obtain diffraction patterns without severely damaging the crystal if they dramatically reduced the dose below this standard low-dose level. By reducing the electron dose by a factor of 200, it was possible to collect up to 90 diffraction patterns from the same, very small, three-dimensional crystal, and then, as in X-ray crystallography, work backwards to figure out the structure of the protein. Shi et al. demonstrated the feasibility of the MicroED approach by using it to determine the structure of lysozyme, which is widely used as a test protein in crystallography, with a resolution of 2.9 Å. This proof-of-principle study paves the way for crystallographers to study proteins that cannot be studied with existing techniques.
electron crystallography; electron diffraction; electron cryomicroscopy (cryo-EM); microED; protein structure; microcrystals
The field of Membrane Protein Structural Biology has grown significantly since its first landmark in 1985 with the first three-dimensional atomic resolution structure of a membrane protein. Nearly twenty-six years later, the crystal structure of the beta2 adrenergic receptor in complex with G protein has contributed to another landmark in the field leading to the 2012 Nobel Prize in Chemistry. At present, more than 350 unique membrane protein structures solved by X-ray crystallography (http://blanco.biomol.uci.edu/mpstruc/exp/list, Stephen White Lab at UC Irvine) are available in the Protein Data Bank. The advent of genomics and proteomics initiatives combined with high-throughput technologies, such as automation, miniaturization, integration and third-generation synchrotrons, has increased the rate of membrane protein structure determination. X-ray crystallography is still the only method capable of providing detailed information on how ligands, cofactors, and ions interact with proteins, and is therefore a powerful tool in biochemistry and drug discovery. Yet the growth of membrane protein crystals suitable for X-ray diffraction studies remains a fine art and a major bottleneck in the field. It is often necessary to apply as many innovative approaches as possible. In this review we draw attention to the latest methods and strategies for the production of suitable crystals for membrane protein structure determination. In addition, we highlight the impact that third-generation synchrotron radiation has made in the field, summarizing the latest strategies used at synchrotron beamlines for screening and data collection from such demanding crystals. This article is part of a Special Issue entitled: Structural and biophysical characterisation of membrane protein-ligand binding.
• Overview of the most recent advances regarding the growth of membrane protein crystals
• Rational design of new crystallization screens for membrane proteins
• New automated method for dehydration of membrane proteins
• High-throughput approach in seeding of membrane protein crystals
• Recent developments in membrane protein structure determination
Membrane protein; Crystal dehydration; Crystal seeding; Macromolecular crystallography; In situ data collection; XFEL
An unusual example of how virus structure determination pushes the limits of the molecular replacement method is presented.
The study of virus structures has contributed to methodological advances in structural biology that are generally applicable (molecular replacement and noncrystallographic symmetry are just two of the best known examples). Moreover, structural virology has been instrumental in forging the more general concept of exploiting phase information derived from multiple structural techniques. This hybridization of structural methods, primarily electron microscopy (EM) and X-ray crystallography, but also small-angle X-ray scattering (SAXS) and nuclear magnetic resonance (NMR) spectroscopy, is central to integrative structural biology. Here, the interplay of X-ray crystallography and EM is illustrated through the example of the structural determination of the marine lipid-containing bacteriophage PM2. Molecular replacement starting from an ∼13 Å cryo-EM reconstruction, followed by cycling density averaging, phase extension and solvent flattening, gave the X-ray structure of the intact virus at 7 Å resolution. This in turn served as a bridge to phase, to 2.5 Å resolution, data from twinned crystals of the major coat protein (P2), ultimately yielding a quasi-atomic model of the particle, which provided significant insights into virus evolution and viral membrane biogenesis.
virus structure; phasing methods; data collection; noncrystallographic symmetry
NMR structures of the proteins TM1112 and TM1367 solved by the JCSG in solution at 298 K could be superimposed with the corresponding crystal structures at 100 K with r.m.s.d. values of <1.0 Å for the backbone heavy atoms. For both proteins the structural differences between multiple molecules in the asymmetric unit of the crystals correlated with structural variations within the bundles of conformers used to represent the NMR solution structures. A recently introduced JCSG NMR structure-determination protocol, which makes use of the software package UNIO for extensive automation, was further evaluated by comparison of the TM1112 structure obtained using these automated methods with another NMR structure that was independently solved in another PSI center, where a largely interactive approach was applied.
The NMR structures of the TM1112 and TM1367 proteins from Thermotoga maritima in solution at 298 K were determined following a new protocol which uses the software package UNIO for extensive automation. The results obtained with this novel procedure were evaluated by comparison with the crystal structures solved by the JCSG at 100 K to 1.83 and 1.90 Å resolution, respectively. In addition, the TM1112 solution structure was compared with an NMR structure solved by the NESG using a conventional largely interactive methodology. For both proteins, the newly determined NMR structure could be superimposed with the crystal structure with r.m.s.d. values of <1.0 Å for the backbone heavy atoms, which provided a starting platform to investigate local structure variations, which may arise from either the methods used or from the different chemical environments in solution and in the crystal. Thereby, these comparative studies were further explored with the use of reference NMR and crystal structures, which were computed using the NMR software with input of upper-limit distance constraints derived from the molecular models that represent the results of structure determination by NMR and by X-ray diffraction, respectively. The results thus obtained show that NMR structure calculations with the new automated UNIO software used by the JCSG compare favorably with those from a more labor-intensive and time-intensive interactive procedure. An intriguing observation is that the ‘bundles’ of two TM1112 or three TM1367 molecules in the asymmetric unit of the crystal structures mimic the behavior of the bundles of 20 conformers used to represent the NMR solution structures when comparing global r.m.s.d. values calculated either for the polypeptide backbone, the core residues with solvent accessibility below 15% or all heavy atoms.
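The backbone r.m.s.d. after superposition that these comparisons rely on can be illustrated with a short sketch. This uses the standard Kabsch least-squares superposition as a stand-in; it is not claimed to be the procedure implemented in UNIO or used by the JCSG. The function name is hypothetical, NumPy is assumed to be available, and the two inputs are assumed to list the same atoms in the same order.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """R.m.s.d. between two (N, 3) coordinate sets after optimal superposition.

    P and Q must list equivalent atoms in the same order (for example the
    backbone heavy atoms of two structures of the same protein).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Toy coordinates, not taken from any deposited structure.
    a = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
    rot_z90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    b = a @ rot_z90.T + np.array([5.0, -1.0, 2.0])
    print(kabsch_rmsd(a, b))
```

Applying the same function to every pair of conformers in an NMR bundle, or to every pair of molecules in the asymmetric unit, would give the kind of global r.m.s.d. values that the two 'bundles' are compared on.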
NMR and crystal structure comparison; structure-determination software; reference structures; Thermotoga maritima
In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). 
Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research.
Structural Genomics; High throughput protein production; Construct optimization; Disorder prediction; Ligation independent cloning; Multiple Displacement Amplification; Laboratory Information Management System; Protein Structure Initiative; NMR; T7 Escherichia coli expression system; Wheat Germ Cell Free; NMR microprobe screening; Parallel protein purification; 6X-His tag; HDX-MS; Total gene synthesis; condensed single protein production
Evolution depends on the manner in which genetic variation is translated into new phenotypes. There has been much debate about whether organisms might have specific mechanisms for “evolvability,” which would generate heritable phenotypic variation with adaptive value and could act to enhance the rate of evolution. Capacitor systems, which allow the accumulation of cryptic genetic variation and release it under stressful conditions, might provide such a mechanism. In yeast, the prion [PSI+] exposes a large array of previously hidden genetic variation, and the phenotypes it thereby produces are advantageous roughly 25% of the time. The notion that [PSI+] is a mechanism for evolvability would be strengthened if the frequency of its appearance increased with stress. That is, a system that mediates even the haphazard appearance of new phenotypes, which have a reasonable chance of adaptive value would be beneficial if it were deployed at times when the organism is not well adapted to its environment. In an unbiased, high-throughput, genome-wide screen for factors that modify the frequency of [PSI+] induction, signal transducers and stress response genes were particularly prominent. Furthermore, prion induction increased by as much as 60-fold when cells were exposed to various stressful conditions, such as oxidative stress (H2O2) or high salt concentrations. The severity of stress and the frequency of [PSI+] induction were highly correlated. These findings support the hypothesis that [PSI+] is a mechanism to increase survival in fluctuating environments and might function as a capacitor to promote evolvability.
One controversy in evolutionary biology concerns whether there might be plausible explanations for the rapid evolution of complex traits. An extreme and fascinating example of protein conformational change, the prion, offers a framework for this concept. Prion proteins are responsible for neurodegenerative diseases, instruct us in important aspects of amyloid formation, and furthermore, serve as ancient protein-based units of inheritance, a domain previously reserved for nucleic acids. In yeast, the [PSI+] prion causes read-through of nonsense codons. This has the capacity to rapidly unveil hidden genetic variation that may have adaptive value. The suggestion that [PSI+] might serve as a mechanism for evolvability would be strengthened if the frequency of the prion's appearance increased when the organism was under stress and therefore not ideally adapted to its environment. We investigated genetic and environmental factors that could modify the frequency with which the prion appears. Our high-throughput, genome-wide screen identified genes involved in stress response and signal transduction, whereas our cell-based assays found severe conditions that increased prion formation. Thus [PSI+] provides a possible mechanism for the organism to rapidly acquire new phenotypes in times of stress and potentially increases evolvability.
A yeast prion protein may promote evolvability by providing a mechanism for the rapid evolution of complex traits that is responsive to environmental stress.
One of the major goals of structural genomics projects is to determine the three-dimensional structure of representative members of as many different fold families as possible. Comparative modeling is expected to fill the remaining gaps by providing structural models of homologs of the experimentally determined proteins. However, for such an approach to be successful it is essential that the quality of the experimentally determined structures is adequate. In an attempt to build a homology model for the protein dynein light chain 2A (DLC2A) we found two potential templates, both experimentally determined nuclear magnetic resonance (NMR) structures originating from structural genomics efforts. Despite their high sequence identity (96%), the folds of the two structures are markedly different. This urged us to perform in-depth analyses of both structure ensembles and the deposited experimental data, the results of which clearly identify one of the two models as largely incorrect. Next, we analyzed the quality of a large set of recent NMR-derived structure ensembles originating from both structural genomics projects and individual structure determination groups. Unfortunately, a visual inspection of structures exhibiting lower quality scores than DLC2A reveals that the seriously flawed DLC2A structure is not an isolated incident. Overall, our results illustrate that the quality of NMR structures cannot be reliably evaluated using only traditional experimental input data and overall quality indicators as a reference and clearly demonstrate the urgent need for a tight integration of more sophisticated structure validation tools in NMR structure determination projects. In contrast to common methodologies where structures are typically evaluated as a whole, such tools should preferentially operate on a per-residue basis.
Three-dimensional biomolecular structures provide an invaluable source of biologically relevant information. To learn the most from the wealth of information that these structures can provide, it is of great importance that the quality and accuracy of the protein structure models deposited in the Protein Data Bank are as high as possible. In this work, the authors describe an analysis that illustrates that this is unfortunately not the case for many protein structures solved using nuclear magnetic resonance spectroscopy. They present an example in which two strikingly different models describing the same protein are analyzed using commonly available structure validation tools, and the results of this analysis show one of the two models to be incorrect. Subsequently, using a large set of recently determined structures, the authors demonstrate that unfortunately this example does not stand on its own. The analyses and examples clearly illustrate that relying solely on the experimental data to evaluate structural quality can provide a false sense of correctness and that a combination of multiple sophisticated structure validation tools is required to detect the presence of errors in protein nuclear magnetic resonance structures.
The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities.
Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determination of three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than that of determined structures. With the development of computational science, protein modeling methods may be able to bridge this huge sequence-structure gap. A grand challenge is to predict a three-dimensional protein structure from its primary structure (residue sequence) alone. Predicting residue contact maps is a crucial and promising intermediate step towards final three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can transform protein sequence alignment into structure alignment, which can in turn greatly improve template-based three-dimensional protein structure predictors.
CNNcon, an improved contact map predictor based on multiple neural networks, using six sub-networks and one final cascade-network, was developed in this paper. Both the sub-networks and the final cascade-network were trained and tested with their corresponding data sets. For testing, the target protein was first encoded and then input to its corresponding sub-networks for prediction. After that, the intermediate results were input to the cascade-network to produce the final prediction.
CNNcon accurately predicts, on average, 58.86% of contacts at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450. The comparison results show that the present method performs better than the compared state-of-the-art predictors. In particular, the prediction accuracy remains steady as protein sequence length increases. This indicates that CNNcon overcomes the thin density problem, with which other current predictors have trouble. This advantage makes the method valuable for the prediction of long proteins. As a result, CNNcon could make the effective prediction of long proteins possible.
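To make the 8 Å cutoff concrete, the following minimal sketch builds a residue contact map from coordinates and scores a predicted set of contacts against it. It does not reproduce CNNcon or its evaluation protocol. The function names are hypothetical, and three choices are assumptions: one representative atom per residue, the minimum sequence separation, and accuracy defined as the fraction of predicted contacts that are true contacts.

```python
import math


def contact_map(coords, cutoff=8.0, min_sep=1):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff`.

    coords  : one (x, y, z) tuple per residue, in angstroms (for example
              the C-alpha or C-beta position; which atom is an assumption).
    min_sep : minimum sequence separation j - i for a pair to be counted.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def prediction_accuracy(predicted, true):
    """Fraction of predicted contacts that are present in the true map."""
    if not predicted:
        return 0.0
    return len(predicted & true) / len(predicted)


if __name__ == "__main__":
    # Toy chain: four residues on a straight line, 3.8 angstroms apart.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    true_contacts = contact_map(chain, cutoff=8.0, min_sep=2)
    guess = {(0, 2), (0, 3)}
    print(sorted(true_contacts), prediction_accuracy(guess, true_contacts))
```

Because the number of residue pairs grows quadratically with chain length while the number of contacts grows far more slowly, true contacts become sparser in longer proteins, which is one reason accuracy is usually reported as a function of length.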
The development of experiments that can generate molecular movies of changing chemical structures is a major challenge for physical chemistry. But to realize this dream, we not only need to significantly improve existing approaches, but we also must invent new technologies. Most of the known protein structures have been determined by X-ray diffraction and to a lesser extent by NMR. Though powerful, X-ray diffraction presents limitations for acquiring time-dependent structures. In the case of NMR, ultrafast equilibrium dynamics might be inferred from lineshapes, but the structures of conformations interconverting on such time scales are not realizable.
This Account highlights two-dimensional infrared spectroscopy (2D IR), in particular the 2D vibrational echo, as an approach to time-resolved structure determination. We outline the use of the 2D IR method to completely determine the structure of a protein of the integrin family in a time window of a few picoseconds. As a transmembrane protein, this class of structures has proved particularly challenging for the established structural methodologies of X-ray crystallography and NMR.
We describe the challenges facing multidimensional spectroscopy and compare it with some other methods of structural biology. Then we succinctly discuss the basic principles of 2D IR methods as they relate to time domain and frequency domain experimental and theoretical properties required for protein structure determination. By means of the example of the transmembrane protein, we describe the essential aspects of combined carbon-13 oxygen-18 isotope labels to create vibrational resonance pairs that allow the determination of protein and peptide structures in motion. Finally, we propose a three dimensional structure of the αIIb transmembrane homodimer that includes optimum locations of all side chains and backbone atoms of the protein.
Delocalization among 13C=18O residues on different helices. The vibrational excitation is transferred between modes on different helices on the coherent energy transfer time π/2β.
Multidimensional spectra; vibrational spectra; three dimensional structure; photon echo; vibrational probe; peptide vibrational dynamics