Two new structures of the SRP GTPase Ffh have been determined at 1.1 Å resolution. They provide the basis for a comparative examination of the extensive water structure of the apo conformation of these GTPases. A set of well-defined water-binding positions has been identified in the active site of the two-domain ‘NG’ GTPase, as well as at two functionally important interfaces. The water hydrogen-bonding network accommodates alternate conformations of the protein side chains by undergoing local rearrangements. In one case, it shows how a solute molecule binds within the active site by displacing water molecules without further disrupting the water-interaction network. A subset of the water positions is also well defined in several lower resolution structures, including those of different nucleotide-binding states; these waters appear to function in maintaining the protein structure. Consistent arrangements of surface water among three different ultrahigh-resolution structures provide a framework for beginning to understand how local water structure contributes to protein–ligand and protein–protein binding in the SRP GTPases.
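Comparing water structure between structures comes down to asking which water sites recur. The minimal sketch below assumes that the structures have already been superposed onto a common frame. It pairs water O positions one-to-one by a greedy nearest-neighbour search within a hypothetical 1.0 Å cutoff. The function name and the cutoff are illustrative and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def conserved_waters(
    waters_a: Sequence[Point],
    waters_b: Sequence[Point],
    cutoff: float = 1.0,  # hypothetical matching distance in Å
) -> List[Tuple[int, int, float]]:
    """Greedily pair each water in A with its nearest unused water in B within `cutoff`.

    Assumes both coordinate sets are already superposed onto a common frame.
    Returns (index_in_a, index_in_b, distance) for every matched pair.
    """
    used_b = set()
    pairs = []
    for i, a in enumerate(waters_a):
        best_j, best_d = None, cutoff
        for j, b in enumerate(waters_b):
            if j in used_b:
                continue
            d = math.dist(a, b)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used_b.add(best_j)
            pairs.append((i, best_j, best_d))
    return pairs


waters_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
waters_b = [(0.3, 0.0, 0.0), (5.0, 2.0, 0.0), (9.5, 0.5, 0.0)]
matches = conserved_waters(waters_a, waters_b)
print([(i, j) for i, j, _ in matches])  # [(0, 0), (2, 2)]
```

A greedy match depends on the order of the input list. A global assignment, for example with the Hungarian algorithm, would be more rigorous when water sites are crowded.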
This paper presents a methodology for obtaining candidate conformations of multidomain proteins for use in molecular replacement (MR). For each domain separately, the orientational relationship between the template and the target structure is obtained by MR. The orientational relationships of the individual domains are then used to calculate the relative rotation between those domains in the target conformation, using pose-estimation techniques from robotics and computer vision. With the angle of relative rotation between the domains as a cost function, iterative normal-mode analysis is used to drive the template structure into a candidate conformation that matches the X-ray crystallography data obtained for the target conformation. As validation, the method is applied to three test proteins: ribose-binding protein, lactoferrin and calcium ATPase. In each test case, the orientation and translation of the final candidate conformation are correctly generated by the proposed procedure. The results show that the method can yield usable candidate conformations for MR. It also reveals the structural details of the target conformation, together with its position and orientation in the crystallographic unit cell.
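The step from per-domain MR orientations to a relative inter-domain rotation can be shown with plain rotation matrices. This is a minimal sketch that uses simple matrix composition, not the specific pose-estimation algorithm of the paper. It assumes that `r1` and `r2` are orthonormal rotations in a common frame, each mapping a template domain onto its target counterpart. The rotation angle is recovered from the trace. The helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m: Matrix) -> Matrix:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def rot_z(deg: float) -> Matrix:
    """Rotation about z by `deg` degrees (used here only to build test data)."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


def rotation_angle_deg(r: Matrix) -> float:
    """Rotation angle from the trace: cos(theta) = (tr R - 1) / 2."""
    c = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise


def relative_domain_rotation(r1: Matrix, r2: Matrix) -> Matrix:
    """Rotation of domain 2 relative to domain 1, expressed in the template frame.

    r1, r2: MR rotations mapping each template domain onto its target counterpart.
    Undoing r1 superposes target domain 1 back onto the template, after which
    domain 2 is left rotated by r1^T r2.
    """
    return matmul(transpose(r1), r2)


r_rel = relative_domain_rotation(rot_z(30.0), rot_z(75.0))
print(round(rotation_angle_deg(r_rel), 6))  # 45.0
```

The angle of `r_rel` is the kind of quantity that the normal-mode iterations described above would drive toward its target value.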
The availability of high-intensity synchrotron facilities, technological advances in data-collection techniques and improved data-reduction and crystallographic software have ushered in a new era in high-throughput macromolecular crystallography. Here, the de novo automated crystal structure determination of Q9I4D4, an NAD(P)H-dependent FMN reductase flavoprotein from Pseudomonas aeruginosa PA01, is reported. The structure was solved at 1.28 Å resolution using the anomalous signal from an unusually small number of S atoms. Although this protein lacks the flavodoxin key fingerprint motif [(T/S)XTGXT], it has been confirmed to bind flavin mononucleotide, and the binding site was identified by X-ray crystallography. The protein contains a novel flavin mononucleotide-binding sequence, GSLRSGSYN, that has not been reported previously. Detailed statistics on sulfur phasing and on other factors contributing to structure determination are discussed. Structural comparison of the apoenzyme with the flavin mononucleotide complex shows conformational changes on cofactor binding. NADPH-dependent activity has been confirmed by biochemical assays.
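How weak the anomalous signal from a few S atoms is can be gauged with the commonly used Hendrickson–Teeter estimate of the Bijvoet ratio, ⟨|ΔF±|⟩/⟨|F|⟩ ≈ (2N_A/N_T)^{1/2} f''_A/Z_eff, with Z_eff ≈ 6.7 e. In the sketch below, the atom counts are hypothetical and are not those of Q9I4D4. The value f''(S) ≈ 0.56 e near the Cu Kα wavelength is an approximate tabulated figure.

```python
import math


def bijvoet_ratio(n_anom: int, n_total: int, f_double_prime: float, z_eff: float = 6.7) -> float:
    """Hendrickson-Teeter style estimate of <|dF+-|>/<|F|>.

    n_anom: number of anomalous scatterers (here S atoms)
    n_total: number of non-H protein atoms
    f_double_prime: f'' of the anomalous scatterer at the X-ray wavelength (electrons)
    z_eff: effective scattering per non-H atom (6.7 e is the usual value)
    """
    return math.sqrt(2.0 * n_anom / n_total) * f_double_prime / z_eff


# Hypothetical numbers: 5 S atoms among 1500 non-H atoms; f''(S) ~ 0.56 e near Cu K-alpha.
ratio = bijvoet_ratio(5, 1500, 0.56)
print(f"{100 * ratio:.2f}%")  # 0.68%
```

A signal below 1% illustrates why accurate, highly redundant data matter so much for phasing from sulfur alone.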
Covalent labeling of macromolecules with trace levels (<1%) of a fluorescent dye is proposed as a means to facilitate finding or detecting crystals in crystallization drops. To test the effects of labeled protein concentration on the resulting X-ray diffraction data, experiments were carried out with the model proteins insulin, ribonuclease, lysozyme and thaumatin, which were labeled with the fluorescent dye carboxyrhodamine. All proteins were labeled on their N-terminal amine; in a separate series of experiments, lysozyme was also labeled randomly on lysine side chains. Ribonuclease and N-terminal amine-labeled lysozyme crystals were poorly formed at 10% label concentration and were not used in subsequent diffraction experiments. All model proteins were tested up to 5% labeled protein, and thaumatin and randomly labeled lysozyme gave well formed crystals up to 10% labeled protein. In all cases tested, the presence of the label was found not to significantly affect the quality of the X-ray diffraction data. Qualitative visual-inspection experiments over a range of label concentrations indicated that optimum derivatization levels ranged from 0.025–0.05% for insulin to 0.1–0.25% for thaumatin. Because crystals are the most densely packed phase in a drop, labeled crystals should be the most intense light sources under fluorescent illumination. For both visual and automated crystal detection, light intensity is therefore a simpler and potentially more powerful search parameter than straight lines. Screening experiments using the proteins canavalin, β-lactoglobulins A and B and chymotrypsinogen, all at 0.5% label concentration, demonstrated the utility of this approach for rapidly finding crystals, even when they were obscured by precipitate. Trace-labeled protein is also proposed to be useful for the automated centering of crystals on X-ray beamlines.
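The argument that intensity is a simple search parameter can be illustrated with a minimal sketch. Pixels brighter than a multiple of the median (background) intensity are grouped into 4-connected regions. Regions of at least a few pixels are then reported as crystal candidates. The image is a plain 2D list standing in for a fluorescence micrograph. The threshold factor and minimum region size are hypothetical parameters, not values from the study.

```python
from collections import deque
from statistics import median
from typing import List, Tuple


def bright_regions(image: List[List[float]], factor: float = 5.0,
                   min_pixels: int = 2) -> List[List[Tuple[int, int]]]:
    """Group pixels brighter than `factor` x median intensity into 4-connected regions.

    Regions smaller than `min_pixels` are discarded as noise.
    """
    threshold = factor * median([v for row in image for v in row])
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over bright neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                regions.append(pixels)
    return regions


img = [[1.0] * 5 for _ in range(5)]
img[0][0] = img[0][1] = 50.0               # labeled crystal 1
img[3][3] = img[4][3] = img[3][4] = 40.0   # labeled crystal 2
img[2][0] = 30.0                           # isolated hot pixel
regions = bright_regions(img)
print([len(r) for r in regions])  # [2, 3]
```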
Four case studies are described in which maximum-likelihood molecular replacement, as implemented in the program Phaser, was used to solve structures of protein complexes.
Molecular replacement (MR) generally becomes more difficult as the number of components in the asymmetric unit requiring separate MR models (i.e. the dimensionality of the search) increases. When the proportion of the total scattering contributed by each search component is small, the signal in the search for each component in isolation is weak or non-existent. Maximum-likelihood MR functions enable complex asymmetric units to be built up from individual components with a ‘tree search with pruning’ approach. This method, as implemented in the automated search procedure of the program Phaser, has been very successful in solving many previously intractable MR problems. However, there are a number of cases in which the automated search procedure of Phaser is suboptimal or encounters difficulties. These include cases where there are a large number of copies of the same component in the asymmetric unit or where the components of the asymmetric unit have greatly varying B factors. Two case studies are presented to illustrate how Phaser can be used to best advantage in the standard ‘automated MR’ mode and two case studies are used to show how to modify the automated search strategy for problematic cases.
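The ‘tree search with pruning’ idea can be sketched generically. Components are placed one at a time. Every surviving partial solution is extended with each candidate placement of the next component, and only the best-scoring branches are kept. This is not Phaser's implementation. The scoring function is a hypothetical stand-in for a likelihood score, made up of a per-placement signal plus a clash penalty, and the beam width `keep` is arbitrary.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Placement = Tuple[str, int]  # (component name, index of a candidate placement)


def tree_search(components: Sequence[str],
                candidates: Dict[str, List[int]],
                score: Callable[[List[Placement]], float],
                keep: int = 3) -> List[Tuple[float, List[Placement]]]:
    """Add one component per level; after each level retain only the `keep` best branches."""
    ranked: List[Tuple[float, List[Placement]]] = [(0.0, [])]
    for comp in components:
        expanded = [
            (score(sol + [(comp, cand)]), sol + [(comp, cand)])
            for _, sol in ranked
            for cand in candidates[comp]
        ]
        # Pruning step: sort best-first (stable) and discard all but the top branches.
        expanded.sort(key=lambda item: item[0], reverse=True)
        ranked = expanded[:keep]
    return ranked


# Hypothetical per-placement signal for two copies of A and one copy of B.
signal = {("A", 0): 2.0, ("A", 1): 1.5, ("A", 2): 0.2, ("B", 0): 0.5, ("B", 1): 1.0}


def toy_score(sol: List[Placement]) -> float:
    total = sum(signal[p] for p in sol)
    seen = set()
    for p in sol:
        if p in seen:  # two copies cannot occupy the same site: penalize the clash
            total -= 5.0
        seen.add(p)
    return total


best = tree_search(["A", "A", "B"], {"A": [0, 1, 2], "B": [0, 1]}, toy_score, keep=2)
print(best[0])  # (4.5, [('A', 0), ('A', 1), ('B', 1)])
```

Because the two copies of A are interchangeable, the same solution appears twice here in permuted order. A practical search would merge such equivalent branches rather than spend beam width on them.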
macromolecular crystallography; molecular replacement; maximum likelihood
Methods and resources for obtaining chemically plausible starting models and restraint sets for refinement of ligand complexes are described and some of the potential pitfalls are discussed.
Model building and refinement of complexes between biomacromolecules and small molecules requires sensible starting coordinates as well as the specification of restraint sets for all but the most common non-macromolecular entities. Here, it is described why this is necessary, how it can be accomplished and what pitfalls need to be avoided in order to produce chemically plausible models of the low-molecular-weight entities. A number of programs, servers, databases and other resources that can be of assistance in the process are also discussed.
refinement; model building; ligand complexes; restraint sets; macromolecular crystallography
This paper highlights some of the problems that can arise when attempting to obtain crystal structures of small molecule–protein complexes and how biophysical methods can be used to define and overcome these problems. Many of the techniques mentioned are also applicable to the study of protein–protein complexes and mode-of-action analysis.
In attempts to determine the crystal structure of small molecule–protein complexes, a common frustration is the absence of ligand binding once the protein structure has been solved. While the first structure, even with no ligand bound (apo), can be a cause for celebration, the solution of dozens of apo structures can give an unwanted sense of déjà vu. Much time and material are wasted on unsuccessful experiments, which can seriously affect productivity and morale. There are many reasons for the lack of observed binding in crystals, and this paper highlights some of them. Biophysical methods may be used to confirm binding and to optimize solution conditions, increasing the success rate of crystallizing protein–ligand complexes. As an overwhelming number of biophysical methods is available, some of the factors that need to be considered when choosing the most appropriate technique for a given system are discussed. Finally, a few illustrative examples are given in which biophysical methods have proven helpful in real systems.
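Thermal denaturation, one of the techniques listed in the keywords, is often read out as a shift in the melting temperature (Tm) when a ligand binds. The sketch below estimates Tm as the midpoint of the steepest rise of a melt curve and reports the shift between two curves. The curves are synthetic logistic functions with hypothetical midpoints. Real data would normally be fitted to a model rather than differentiated point by point.

```python
import math
from typing import List, Sequence


def melting_temperature(temps: Sequence[float], signal: Sequence[float]) -> float:
    """Estimate Tm as the midpoint of the interval with the steepest signal increase."""
    best_slope, tm = -math.inf, temps[0]
    for i in range(len(temps) - 1):
        slope = (signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope, tm = slope, 0.5 * (temps[i] + temps[i + 1])
    return tm


def synthetic_melt(temps: Sequence[float], tm: float, width: float = 2.0) -> List[float]:
    """Logistic unfolding curve, standing in for a measured fluorescence trace."""
    return [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]


temps = [25.25 + 0.5 * i for i in range(101)]  # 25.25 to 75.25 degC in 0.5 degC steps
apo = melting_temperature(temps, synthetic_melt(temps, 50.0))
holo = melting_temperature(temps, synthetic_melt(temps, 54.0))
print(apo, holo, holo - apo)  # 50.0 54.0 4.0
```

A positive shift is consistent with ligand stabilization. However, it does not by itself prove that the ligand will be observed in the crystal.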
protein–ligand complexes; isothermal titration calorimetry; dynamic light scattering; nuclear magnetic resonance; thermal denaturation
X-ray structures in the PDB illustrate both the specific recognition of two polypeptide chains in protein–protein complexes and dimeric proteins and their nonspecific interaction at crystal contacts.
Crystal structures deposited in the Protein Data Bank illustrate the diversity of biological macromolecular recognition: transient interactions in protein–protein and protein–DNA complexes and permanent assemblies in homodimeric proteins. The geometric and physicochemical properties of the macromolecular interfaces that may govern the stability and specificity of recognition are explored in complexes and homodimers and compared with those of crystal-packing interactions. Crystal-packing interfaces are found to be usually much smaller; they bury fewer atoms and are less tightly packed than those in specific assemblies. Standard-size interfaces burying 1200–2000 Å² of protein surface occur in protease–inhibitor and antigen–antibody complexes that assemble with little or no conformational change. Short-lived electron-transfer complexes have small interfaces. The larger interfaces observed in signal-transduction complexes and in homodimers correlate with the presence of conformational changes, which are often implicated in biological function. Results of the CAPRI (critical assessment of predicted interactions) blind prediction experiment show that docking algorithms efficiently and accurately predict the mode of assembly of proteins that do not change conformation when they associate. They perform less well in the presence of large conformational changes, and the experiment stimulates the development of novel procedures that can handle such changes.
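The interface sizes quoted above are buried areas, that is, the solvent-accessible surface lost on association, summed over both partners. Given accessible surface areas for each partner alone and for the complex, from any standard program, the arithmetic and binning are shown below. The area values are hypothetical, and the bin labels are shorthand introduced here.

```python
def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total protein surface (Å^2) buried on association: ASA(A) + ASA(B) - ASA(AB)."""
    return sasa_a + sasa_b - sasa_complex


def interface_class(buried: float) -> str:
    """Bin a buried area using the 1200-2000 Å^2 'standard-size' range."""
    if buried < 1200.0:
        return "smaller than standard"
    if buried <= 2000.0:
        return "standard-size"
    return "larger than standard"


# Hypothetical solvent-accessible areas (Å^2) for two partners and their complex.
b = buried_surface_area(8000.0, 6500.0, 12900.0)
print(b, interface_class(b))  # 1600.0 standard-size
```

Some studies instead report half of this value as the interface area per partner. The convention should therefore be checked before comparing numbers.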
macromolecular recognition; Protein Data Bank
A brief summary of the types of restraint defined in refinement dictionaries.
At the resolution available from most macromolecular crystals, the X-ray data alone are insufficient to lead to a chemically reasonable structure, so stereochemical restraints are essential. These usually restrain bond lengths, bond angles, planes and chiral volumes. The definitions of these restraints and the sources of their target values are described. A dictionary entry contains information about the atom types, their connectivity and all the appropriate restraints. Torsion angles are not usually restrained, but they do have optimum values. In the special case of flexible five- and six-membered rings, including pentose and hexose sugars, the ring pucker is defined by combinations of torsion angles, and the pucker affects the positions of substituents.
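The restrained quantities named above are simple functions of atomic coordinates. The sketch computes three of them. The first is a signed chiral volume, whose expected sign a dictionary entry typically specifies. The second is a torsion angle, combinations of which define ring pucker. The third is the weighted squared deviation ((x − x₀)/σ)² that least-squares refinement typically sums over restraints. The function names and the simple weighting are illustrative assumptions, not the format of any particular dictionary or program.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def chiral_volume(centre: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Signed volume (Å^3) spanned by three substituents around a chiral centre."""
    return dot(sub(a, centre), cross(sub(b, centre), sub(c, centre)))


def torsion_deg(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Dihedral angle p1-p2-p3-p4 in degrees, between -180 and 180."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def restraint_term(value: float, ideal: float, sigma: float) -> float:
    """Weighted squared deviation ((value - ideal) / sigma)^2 for one restraint."""
    return ((value - ideal) / sigma) ** 2


origin = (0.0, 0.0, 0.0)
print(chiral_volume(origin, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))  # 1.0
print(round(restraint_term(1.54, 1.53, 0.02), 3))  # 0.25
```

Swapping any two substituents flips the sign of the chiral volume. This is why a volume restraint with a fixed sign enforces the correct hand.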
stereochemistry; restraints; bond lengths; bond angles; protein structure; crystallographic refinement
An automated ligand-fitting procedure is applied to $(F_o - F_c)\exp(i\varphi_c)$ difference density for 200 commonly found ligands from macromolecular structures in the Protein Data Bank to identify ligands from density maps.
A procedure for the identification of ligands bound in crystal structures of macromolecules is described. Two characteristics of the density corresponding to a ligand are used in the identification procedure. One is the correlation of the ligand density with each of a set of test ligands after optimization of the fit of that ligand to the density. The other is the correlation of a fingerprint of the density with the fingerprint of model density for each possible ligand. The fingerprints consist of an ordered list of correlations of each of the test ligands with the density. The two characteristics are scored using a Z-score approach. The correlations are normalized to the mean and standard deviation of correlations found for a variety of mismatched ligand–density pairs, so that the Z scores are related to the probability of observing a particular value of the correlation by chance. The procedure was tested with a set of 200 of the most commonly found ligands in the Protein Data Bank, collectively representing 57% of all ligands in the Protein Data Bank. Using a combination of these two characteristics of ligand density, ranked lists of ligand identifications were made for representative $(F_o - F_c)\exp(i\varphi_c)$ difference density from entries in the Protein Data Bank. In 48% of the 200 cases, the correct ligand was at the top of the ranked list. This approach may be useful in identifying unknown ligands in new macromolecular structures, as well as in determining which ligands in a mixture have bound to a macromolecule.
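The two scores described above can be combined in a few lines. This sketch is not the published implementation. For each candidate, the fit correlation and the fingerprint correlation (a Pearson correlation between fingerprint vectors) are converted to Z scores against hypothetical background distributions from mismatched pairs. The two Z scores are then simply summed. That combination rule, the ligand codes and all numbers are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def z_score(value: float, background: Sequence[float]) -> float:
    """Normalize against correlations from mismatched ligand-density pairs."""
    return (value - mean(background)) / stdev(background)


def rank_ligands(fit_cc: Dict[str, float], density_fp: Sequence[float],
                 model_fps: Dict[str, Sequence[float]],
                 fit_background: Sequence[float], fp_background: Sequence[float]) -> List[str]:
    scores = {}
    for name in fit_cc:
        fp_cc = pearson(density_fp, model_fps[name])
        # Assumed combination rule: sum of the two Z scores.
        scores[name] = z_score(fit_cc[name], fit_background) + z_score(fp_cc, fp_background)
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_ligands(
    fit_cc={"ATP": 0.82, "ADP": 0.78, "HEM": 0.45},
    density_fp=[0.80, 0.75, 0.40],
    model_fps={"ATP": [0.85, 0.78, 0.42], "ADP": [0.70, 0.80, 0.38], "HEM": [0.30, 0.35, 0.90]},
    fit_background=[0.35, 0.40, 0.45, 0.50, 0.30],
    fp_background=[0.0, 0.2, -0.1, 0.3, 0.1],
)
print(ranking)  # ['ATP', 'ADP', 'HEM']
```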
model building; model completion; shape analysis
The most extensive structural information on viruses relates to apparently icosahedral virions and is based on X-ray crystallography and on cryo-electron microscopy single-particle reconstructions. This paper is concerned with the study of the macromolecular complexes that constitute viruses using hybrid structural techniques.
The most extensive structural information on viruses relates to apparently icosahedral virions and is based on X-ray crystallography and on cryo-electron microscopy (cryo-EM) single-particle reconstructions. Both techniques lean heavily on imposing icosahedral symmetry, thereby obscuring any deviation from the assumed symmetry. However, tailed bacteriophages have icosahedral or prolate icosahedral heads with one obvious unique vertex, through which the genome enters during DNA packaging and exits when a host cell is infected. The presence of the tail allows cryo-EM reconstructions in which the special vertex is used to orient the head in a unique manner. Some very large dsDNA icosahedral viruses also develop special vertices that are thought to be required for infecting host cells. Similarly, preliminary cryo-EM data for the small ssDNA canine parvovirus complexed with its receptor suggest that these viruses, previously considered to be accurately icosahedral, might have some asymmetric properties that generate one preferred receptor-binding site on the viral surface. Three systems are compared. Rhinoviruses bind receptor molecules uniformly at all 60 equivalent binding sites. Canine parvovirus appears to have a preferred receptor-binding site. Bacteriophage T4 gains major biological advantages from its unique vertex and tail organelle.
Bacteriophage T4; canine parvovirus; cryo-electron microscopy; image reconstruction; large dsDNA icosahedral viruses; special vertex
A case study showing how the determination of multiple cocrystal structures of the protein tyrosine kinase c-Abl was used to support drug discovery, resulting in a compound effective in the treatment of chronic myelogenous leukaemia.
Chronic myelogenous leukaemia (CML) results from the Bcr-Abl oncoprotein, which has a constitutively activated Abl tyrosine kinase domain. Most chronic-phase CML patients treated with imatinib as first-line therapy maintain excellent durable responses. However, patients whose disease has progressed to advanced-stage CML frequently fail to respond or lose their response to therapy owing to the emergence of drug-resistant mutants of the protein. More than 40 such point mutations have been observed in imatinib-resistant patients. The crystal structures of wild-type and mutant Abl kinase in complex with imatinib and other small-molecule Abl inhibitors were determined with the aims of understanding the molecular basis of resistance and aiding the design and optimization of inhibitors active against the resistance mutants. These results are presented in a way that illustrates the approaches used to generate multiple structures, the type of information that can be gained and the way in which this information is used to support drug discovery.
tyrosine kinase; drug discovery; imatinib; nilotinib
A method is presented for detecting structural homologues of the components in an intermediate-resolution cryo-EM map and for determining their spatial configuration.
Structural analysis of biological machines is essential for inferring their function and mechanism. Nevertheless, owing to their large size and instability, determining the atomic structures of macromolecular assemblies is still considered a challenging task that cannot keep pace with the rapid advances in protein identification. In contrast, lower resolution structural data are becoming increasingly available owing to recent advances in cryo-electron microscopy (cryo-EM) techniques. Once a cryo-EM map is acquired, one of the basic questions is what folds the components in the assembly adopt and how they are arranged. Here, a novel knowledge-based computational method named EMatch is presented that addresses this task for cryo-EM maps at 6–10 Å resolution. The method recognizes and locates possible atomic resolution structural homologues of protein domains in the assembly. The strengths of EMatch are demonstrated on a cryo-EM map of native GroEL at 6 Å resolution.
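Aligning secondary structures in three dimensions relies on geometric relationships that do not change under rigid-body motion. The sketch below is not EMatch's algorithm. It only illustrates one such invariant for a pair of helix axes: the distance between their midpoints and the acute interaxial angle. It then shows how a map-derived pair might be compared with a pair from a candidate homologue. The helix representation (axis start and end points) and the tolerances are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]
Helix = Tuple[Point, Point]  # (start, end) of a helix axis


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def pair_descriptor(h1: Helix, h2: Helix) -> Tuple[float, float]:
    """Return (midpoint distance in Å, acute interaxial angle in degrees) for two helices.

    The acute angle is used because helix polarity is generally not
    recoverable from an intermediate-resolution map.
    """
    mid1 = tuple((s + e) / 2 for s, e in zip(*h1))
    mid2 = tuple((s + e) / 2 for s, e in zip(*h2))
    d1, d2 = _sub(h1[1], h1[0]), _sub(h2[1], h2[0])
    cos_t = sum(x * y for x, y in zip(d1, d2)) / (math.hypot(*d1) * math.hypot(*d2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return math.dist(mid1, mid2), min(angle, 180.0 - angle)


def compatible(map_pair, model_pair, dist_tol: float = 2.0, angle_tol: float = 20.0) -> bool:
    """True if two helix pairs have similar invariants within hypothetical tolerances."""
    (dm, am), (dh, ah) = pair_descriptor(*map_pair), pair_descriptor(*model_pair)
    return abs(dm - dh) <= dist_tol and abs(am - ah) <= angle_tol


map_pair = (((0, 0, 0), (0, 0, 15)), ((10, 0, 0), (10, 0, 15)))
model_pair = (((0, 0, 0), (0, 0, 15)), ((0, 10.5, 15), (0, 10.5, 0)))  # second helix reversed
print(compatible(map_pair, model_pair))  # True
```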
structural bioinformatics; intermediate-resolution cryo-EM maps; three-dimensional alignment of secondary structures; macromolecular assemblies
This paper presents a survey of techniques that explore the surface properties of protein–protein interfaces in order to inform the prediction of probable protein–protein interaction sites on newly determined protein structures.
Several potential applications of structural biology depend on discovering how one macromolecule might recognize a partner. Experiment remains the best way to answer this question, but computational tools can contribute where experiment fails. In such cases, structures may be studied to identify patches of exposed residues that have properties common to interaction surfaces. The locations of these patches can then serve as the basis for further modelling or experimentation. To date, interaction surfaces have been proposed on the basis of unusual physical properties, unusual propensities for particular amino-acid types or an unusually high level of sequence conservation. Using the CXXSurface toolkit, developed as part of the CCP4MG program, a suite of tools for analysing the properties of surfaces and their interfaces in complexes has been prepared and applied. These tools have enabled rapid analysis of known complexes to evaluate the distribution of (i) hydrophobicity, (ii) electrostatic complementarity and (iii) sequence conservation in authentic complexes. The aim is to assess the extent to which these properties may be useful indicators of probable biological function.
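Sequence conservation, the third property listed, is often quantified per alignment column as Shannon entropy, where lower entropy means greater conservation. A candidate surface patch can then be scored by the mean entropy of its residues. This sketch is independent of CXXSurface. The alignment, the patch definitions and the use of a plain mean are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of residue types in one alignment column; gaps ('-') are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def conservation_profile(alignment: Sequence[str]) -> List[float]:
    """Per-position entropy for equal-length aligned sequences (lower = more conserved)."""
    return [column_entropy(col) for col in zip(*alignment)]


def patch_score(profile: Sequence[float], patch: Sequence[int]) -> float:
    """Mean entropy over the alignment positions of a candidate surface patch."""
    return sum(profile[i] for i in patch) / len(patch)


alignment = ["ACDE", "ACDF", "AGDW", "ACEY"]
profile = conservation_profile(alignment)
print([round(h, 3) for h in profile])  # [0.0, 0.811, 0.811, 2.0]
```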
surfaces; electrostatics; hydrophobicity; conservation
The methods presented for growing crystals of protein–ligand complexes fall into four categories: co-expression of the protein with the ligands of interest, use of the ligands during protein purification, cocrystallization and soaking the ligands into existing crystals.
Obtaining diffraction-quality crystals has long been a bottleneck in solving the three-dimensional structures of proteins. Proteins may often be stabilized when they are complexed with a substrate, nucleic acid, cofactor or small molecule. However, these ligands can also induce significant conformational changes in the protein, so ab initio screening may be required to find a new crystal form. This paper presents an overview of strategies for obtaining crystals of protein–ligand complexes in the following areas: (i) co-expression of the protein with the ligands of interest, (ii) use of the ligands during protein purification, (iii) cocrystallization and (iv) soaks.
protein–ligand complexes; crystallization
The performance of the ligand-building module of the ARP/wARP software suite is assessed through a large-scale test on known protein–ligand complexes. The results provide a detailed benchmark and guidelines for future improvements.
The efficiency of the ligand-building module of ARP/wARP version 6.1 has been assessed through extensive tests on a large variety of protein–ligand complexes from the PDB, as available from the Uppsala Electron Density Server. Ligand building in ARP/wARP involves two main steps: automatic identification of the location of the ligand and the actual construction of its atomic model. The first step is most successful for large ligands. The second step, ligand construction, is more powerful with X-ray data at high resolution and ligands of small to medium size. Both steps are successful for ligands with low to moderate atomic displacement parameters. The results highlight the strengths and weaknesses of both the method of ligand building and the large-scale validation procedure and help to identify means of further improvement.
ligand binding; ARP/wARP
Structures of protein complexes offer some of the most interesting insights into biological processes. In this article, the methods required to show that the complex observed is the physiological one are investigated.
Protein in crystal form is at an extremely high concentration and yet retains the complex secondary structure that defines an active protein. The protein crystal itself is made up of a repeating lattice of protein–protein and protein–solvent interactions. The problem that confronts any crystallographer is to distinguish the interactions that are physiological from those that are not. This review explores the tools that can provide such information using the original crystal liquor as a sample. It is aimed at postgraduate and postdoctoral researchers who may be encountering this problem for the first time. Techniques are discussed that provide information on the stoichiometry of complexes as well as low-resolution information on complex structure. Together, these data help to identify the physiological complex.
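Dynamic light scattering, one of the techniques listed, measures a translational diffusion coefficient D. The Stokes–Einstein relation R_h = k_B T/(6πηD) converts D into a hydrodynamic radius under the assumption of a sphere. A complex should then show a larger R_h than its isolated components. The diffusion coefficient below is hypothetical. The default viscosity approximates water at 20 °C; real buffers will differ.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def hydrodynamic_radius_nm(diffusion_m2_per_s: float, temperature_k: float = 293.15,
                           viscosity_pa_s: float = 1.002e-3) -> float:
    """Stokes-Einstein: R_h = k_B T / (6 pi eta D), returned in nanometres."""
    r_m = K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * diffusion_m2_per_s)
    return r_m * 1e9


# Hypothetical diffusion coefficient of 1.0e-10 m^2/s.
print(f"{hydrodynamic_radius_nm(1.0e-10):.2f} nm")  # 2.14 nm
```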
protein complex; size-exclusion chromatography; dynamic light scattering; analytical ultracentrifugation; fluorescence resonance energy transfer
Isothermal titration calorimetry (ITC) provides a full thermodynamic characterization of an interaction in a single experiment. The affinity is an important quantity. However, the additional layer of information provided by the changes in enthalpy and entropy can help in understanding the biology. This is demonstrated with respect to tyrosine kinase-mediated signal transduction.
Isothermal titration calorimetry (ITC) provides data that are highly complementary to high-resolution structural detail. An overview of the methodology is provided. Ultimately, it should be possible to correlate the thermodynamic parameters determined by ITC with the structural perturbations observed on going from the free to the bound state at an atomic level. Currently, thermodynamic data provide some insight into the changes that may occur on complex formation. This is demonstrated here in the context of in vitro quantification of intracellular tyrosine kinase-mediated signal transduction and the specificity of the important interactions. ITC data are used to demonstrate the apparent lack of specificity in the interactions of domains of proteins involved in early signalling from membrane-bound receptors.
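The full thermodynamic picture follows from two fitted ITC parameters through two relations: ΔG = RT ln K_d = −RT ln K_a, with a 1 M standard state, and ΔG = ΔH − TΔS. The minimal sketch below uses a hypothetical K_d of 1 µM and ΔH of −50 kJ mol⁻¹ at 25 °C.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def itc_thermodynamics(kd_molar: float, delta_h_kj: float, temperature_k: float = 298.15):
    """Return (dG, -T*dS) in kJ/mol from a fitted Kd (in M) and dH (in kJ/mol).

    dG = RT ln(Kd) with a 1 M standard state; -T dS = dG - dH.
    """
    delta_g_kj = R * temperature_k * math.log(kd_molar) / 1000.0
    minus_t_delta_s_kj = delta_g_kj - delta_h_kj
    return delta_g_kj, minus_t_delta_s_kj


dg, minus_tds = itc_thermodynamics(kd_molar=1e-6, delta_h_kj=-50.0)
print(f"dG = {dg:.1f} kJ/mol, -TdS = {minus_tds:.1f} kJ/mol")  # dG = -34.2 kJ/mol, -TdS = 15.8 kJ/mol
```

In this hypothetical case, binding is enthalpy-driven and opposed by an unfavourable entropy term. Two interactions with the same K_d could have very different ΔH/TΔS splits.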
isothermal titration calorimetry; formation of complexes; tyrosine kinase-mediated signal transduction
Most Kunitz proteins like BPTI and α-dendrotoxin are stabilized by three disulfide bonds. The crystal structure shows how subtle repacking of non-covalent interactions may compensate for disulfide bond loss in a naturally occurring two-disulfide variant, conkunitzin-S1, the first discovered member of a new conotoxin family.
Cone snails (Conus) are predatory marine mollusks that immobilize prey with venom containing 50–200 neurotoxic polypeptides. Most of these polypeptides are small disulfide-rich conotoxins that can be classified into families according to their respective ion-channel targets and patterns of cysteine–cysteine disulfides. Conkunitzin-S1, a potassium-channel pore-blocking toxin isolated from C. striatus venom, is a member of a newly defined conotoxin family with sequence homology to Kunitz-fold proteins such as α-dendrotoxin and bovine pancreatic trypsin inhibitor (BPTI). While conkunitzin-S1 and α-dendrotoxin are 42% identical in amino-acid sequence, conkunitzin-S1 has only four of the six cysteines normally found in Kunitz proteins. Here, the crystal structure of conkunitzin-S1 is reported. Conkunitzin-S1 adopts the canonical 3₁₀–β–β–α Kunitz fold, complete with additional distinguishing structural features including two completely buried water molecules. The crystal structure, although completely consistent with previously reported NMR distance restraints, provides a greater degree of precision for atomic coordinates, especially for S atoms and buried solvent molecules. The region normally cross-linked by cysteines II and IV in other Kunitz proteins retains a network of hydrogen bonds and van der Waals interactions comparable to those found in α-dendrotoxin and BPTI. In conkunitzin-S1, glycine occupies the sequence position normally reserved for cysteine II, and the special steric properties of glycine allow additional van der Waals contacts with the glutamine residue substituting for cysteine IV. Evolution has thus defrayed the cost of losing a disulfide bond by augmenting and optimizing weaker yet nonetheless effective non-covalent interactions.
conotoxin; BPTI; α-dendrotoxin; native chemical ligation; Conus
An automated ligand-fitting procedure has been developed and tested on 9327 ligands and $(F_o - F_c)\exp(i\varphi_c)$ difference density from macromolecular structures in the Protein Data Bank.
A procedure for fitting ligands to electron-density maps is presented in which a core fragment of the ligand is first fitted to the density and the remainder of the ligand is then extended into density. The approach was tested by fitting 9327 ligands from the Protein Data Bank (PDB), over a wide range of resolutions (most in the range 0.8–4.8 Å). The ligands were fitted into $(F_o - F_c)\exp(i\varphi_c)$ difference density calculated using the corresponding PDB entries without these ligands. The procedure placed 58% of these 9327 ligands within 2 Å (r.m.s.d.) of the coordinates of the atoms in the original PDB entry for that ligand. The success of the fitting procedure was relatively insensitive to ligand size in the range 10–100 non-H atoms. It was only moderately sensitive to resolution: the percentage of ligands placed near the coordinates of the original PDB entry was in the range 58–73% over all resolution ranges tested.
model building; model completion; shape analysis