Protein-DNA interactions are important for many cellular processes; however, structural knowledge of a large fraction of known and putative complexes is still lacking. Computational docking methods aim to predict the architecture of a complex given detailed structures of its constituents. They are becoming an increasingly important tool in the field of macromolecular assemblies, complementing the particularly demanding X-ray crystallography of protein-nucleic acid complexes and providing means for the refinement and integration of low-resolution data coming from rapidly advancing methods such as cryo-electron microscopy.
We present a new coarse-grained force field suitable for protein-DNA docking. The force field is an extension of previously developed parameter sets for protein-RNA and protein-protein interactions. The docking is based on potential energy minimization in translational and orientational degrees of freedom of the binding partners. It allows for fast and efficient systematic search for native-like complex geometry without any prior knowledge regarding binding site location.
We find that the force field gives very good results for bound docking. The quality of predictions in the case of unbound docking varies, depending on the level of structural deviation from bound geometries. We analyze the role of specific protein-DNA interactions in force field performance, with respect to both complex structure prediction and the reproduction of experimental binding affinities. We find that such direct, specific interactions contribute only partially to protein-DNA recognition, indicating an important role for shape complementarity and sequence-dependent DNA internal energy, in line with the concept of an indirect protein-DNA readout mechanism.
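As a rough illustration of docking by energy minimization over rigid-body degrees of freedom, the sketch below scores a bead-based ligand against a bead-based receptor and scans a small grid of translations and rotations. The generic 12-6 Lennard-Jones form, the parameter values, the restriction to rotation about a single axis, and the grid scan (used here in place of a gradient-based minimizer) are all assumptions made for brevity; they are not the force field or search protocol described above.

```python
import math
from itertools import product

def transform(beads, angle, shift):
    """Rotate beads about the z axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in beads]

def pair_energy(r, eps=1.0, sigma=4.0):
    """Generic 12-6 Lennard-Jones term; eps and sigma are hypothetical values."""
    q = (sigma / r) ** 6
    return 4.0 * eps * (q * q - q)

def pose_energy(receptor, ligand):
    """Sum pairwise bead-bead energies between the two rigid partners."""
    total = 0.0
    for a in receptor:
        for b in ligand:
            r = max(math.dist(a, b), 1e-3)  # guard against division by zero
            total += pair_energy(r)
    return total

def grid_scan(receptor, ligand, shifts, angles):
    """Return (energy, shift, angle) of the lowest-energy pose on the grid."""
    best = None
    for shift, angle in product(shifts, angles):
        e = pose_energy(receptor, transform(ligand, angle, shift))
        if best is None or e < best[0]:
            best = (e, shift, angle)
    return best
```

A real protocol would start many minimizations from positions distributed around the receptor and let each one relax continuously; the grid here only shows how a pose is parameterized and scored.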
Accurate prediction of the structure of protein-protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multi-body pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grained representation of a protein-protein complex where each residue is represented by its side chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms protein-protein complexes into a residue contact network, that is, an undirected graph in which residues are nodes connected by edges. This treatment yields a family of interfacial graphs representing a dataset of protein-protein complexes. We then employ a frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein-protein interfaces. The geometrical parameters and frequency of occurrence of each “native” pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using the standard “ZDOCK” benchmark dataset, which was not used in the development of SPIDER. We demonstrate that the SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it outperforms the popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of the CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein–protein docking methods.
Bioinformatics; Amino acids; Centroids; Statistical potential; Delaunay tessellation; Subgraph mining; Motifs; Coarse-grained; ZDOCK; CAPRI
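The idea of turning an interface into a residue contact graph can be sketched as follows. This is a simplified illustration only: a plain distance cutoff between side-chain centroids stands in for Almost-Delaunay tessellation, the 8.0 Å cutoff is an assumed value, and counting residue-type pairs on edges is a trivial two-node stand-in for frequent subgraph mining. The function names are hypothetical.

```python
import math
from collections import Counter

def interface_graph(chain_a, chain_b, cutoff=8.0):
    """Return edges (i, j) linking residue i of chain_a to residue j of chain_b
    when their side-chain centroids lie within `cutoff` angstroms (assumed value).
    Each residue is given as (residue_type, (x, y, z))."""
    edges = []
    for i, (_, centroid_a) in enumerate(chain_a):
        for j, (_, centroid_b) in enumerate(chain_b):
            if math.dist(centroid_a, centroid_b) <= cutoff:
                edges.append((i, j))
    return edges

def edge_patterns(chain_a, chain_b, edges):
    """Count unordered residue-type pairs over the interfacial edges."""
    return Counter(
        tuple(sorted((chain_a[i][0], chain_b[j][0]))) for i, j in edges
    )
```

Pattern counts accumulated over a set of native interfaces would then be the raw material for a frequency-based score; larger patterns require a proper subgraph miner.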
Awareness of the important biological role played by functional non-coding (nc) RNA has grown tremendously in recent years. To perform their tasks, ncRNA molecules typically associate with protein partners, forming ribonucleoprotein complexes. Structural insight into their architectures can be greatly supplemented by computational docking techniques, as they provide means for the integration and refinement of experimental data that are often limited to fragments of larger assemblies or represent multiple levels of spatial resolution. Here, we present a coarse-grained force field for protein-RNA docking, implemented within the framework of the ATTRACT program. Complex structure prediction is based on energy minimization in rotational and translational degrees of freedom of binding partners, with possible extension to include structural flexibility. The coarse-grained representation allows for a fast and efficient systematic docking search without any prior knowledge about complex geometry.
Determination of protein-DNA complex structures with both NMR and X-ray crystallography remains challenging in many cases. High Ambiguity-Driven DOCKing (HADDOCK) is an information-driven docking program that has been used to successfully model many protein-DNA complexes. However, a protein-DNA complex model whereby the protein wraps around DNA has not been reported. Defining the ambiguous interaction restraints for the classical three-Cys2His2 zinc-finger proteins that wrap around DNA is critical because of the complicated binding geometry. In this study, we generated a Zif268-DNA complex model using three different sets of ambiguous interaction restraints (AIRs) to study the effect of the geometric distribution on the docking and used this approach to generate a newly reported Sp1-DNA complex model.
The complex models we generated on the basis of two AIRs with a good geometric distribution in each domain are reasonable in terms of the number of models with a wrap-around conformation, interface root mean square deviation, AIR energy and fraction of native contacts. We derived the modeling approach for generating a three-Cys2His2 zinc-finger-DNA complex model from the results of docking studies using the Zif268-DNA complex and three other crystal complex structures. Furthermore, the Sp1-DNA complex model was calculated with this approach, and the interactions between Sp1 and DNA are in good agreement with those previously reported.
Our docking data demonstrate that two AIRs with a reasonable geometric distribution in each of the three-Cys2His2 zinc-finger domains are sufficient to generate an accurate complex model with protein wrapping around DNA. This approach is efficient for generating a zinc-finger protein-DNA complex model for unknown complex structures in which the protein wraps around DNA. We provide a flowchart showing the detailed procedures of this approach.
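For readers unfamiliar with ambiguous interaction restraints, an AIR in HADDOCK-style docking is usually evaluated through an effective distance that combines all candidate atom-atom distances by an inverse-sixth-power sum, so that a single close contact is enough to satisfy the restraint. The sketch below shows only that formula; the function name is hypothetical, and the selection of residues and the restraint energy term are not reproduced here.

```python
def air_effective_distance(distances):
    """Effective distance for one ambiguous restraint.

    `distances` holds every candidate atom-atom distance (in angstroms) between
    the restrained residue and the possible partner residues. The result is
    d_eff = (sum of d**-6) ** (-1/6), which never exceeds the shortest distance.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)
```

Because the sum is dominated by the shortest distances, adding distant candidate partners barely changes the effective distance, which is what makes the restraint tolerant of ambiguity.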
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, solving the structure of a complex experimentally is often difficult. Computational protein-protein docking approaches can provide a useful alternative. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.
We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation for using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing and then ranked by a scoring function that incorporates a buried surface area term and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method performs better than the others in unbound docking while remaining competitive for bound docking cases.
We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
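The benchmark figures above (a near-native solution within the top N ranks) follow a simple bookkeeping scheme that can be sketched as below. The function names and the toy data layout are assumptions for illustration; the RMSD threshold default mirrors the 2.5 Å value quoted in the text.

```python
def first_hit_rank(ranked_rmsds, threshold=2.5):
    """Return the 1-based rank of the first decoy whose interface RMSD is
    at or below `threshold`, or None if no decoy qualifies."""
    for rank, rmsd in enumerate(ranked_rmsds, start=1):
        if rmsd <= threshold:
            return rank
    return None

def success_rate(cases, top_n, threshold=2.5):
    """Fraction of test cases with a near-native decoy within the top `top_n` ranks.
    Each case is a list of RMSD values ordered by the scoring function."""
    hits = 0
    for ranked_rmsds in cases:
        rank = first_hit_rank(ranked_rmsds, threshold)
        if rank is not None and rank <= top_n:
            hits += 1
    return hits / len(cases)
```

Note that a rate computed over all benchmark cases differs from one computed only over cases with at least one hit, so the denominator should always be stated alongside the percentage.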
Motivation: An effective docking algorithm for antibody–protein antigen complex prediction is an important first step toward design of biologics and vaccines. We have recently developed a new class of knowledge-based interaction potentials called Decoys as the Reference State (DARS) and incorporated DARS into the docking program PIPER based on the fast Fourier transform correlation approach. Although PIPER was the best performer in the latest rounds of the CAPRI protein docking experiment, it is much less accurate for docking antibody–protein antigen pairs than other types of complexes, in spite of incorporating sequence-based information on the location of the paratope. Analysis of antibody–protein antigen complexes has revealed an inherent asymmetry within these interfaces. Specifically, phenylalanine, tryptophan and tyrosine residues highly populate the paratope of the antibody but not the epitope of the antigen.
Results: Since this asymmetry cannot be adequately modeled using a symmetric pairwise potential, we have removed the usual assumption of symmetry. Interaction statistics were extracted from antibody–protein complexes under the assumption that a particular atom on the antibody is different from the same atom on the antigen protein. The use of the new potential significantly improves the performance of docking for antibody–protein antigen complexes, even without any sequence information on the location of the paratope. We note that the asymmetric potential captures the effects of the multi-body interactions inherent to the complex environment in the antibody–protein antigen interface.
Availability: The method is implemented in the ClusPro protein docking server, available at http://cluspro.bu.edu.
Supplementary data are available at Bioinformatics online.
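The difference between a symmetric and an asymmetric pairwise potential comes down to how contacts are tallied before the statistics are converted to energies. The sketch below shows only that counting step; the atom-type labels and the function name are hypothetical, and the conversion to energies with a decoy-based reference state is not reproduced.

```python
from collections import Counter

def contact_counts(contacts, symmetric):
    """Tally interface contacts.

    `contacts` is an iterable of (antibody_atom_type, antigen_atom_type) pairs.
    With symmetric=True the pair is treated as unordered, so the side an atom
    type sits on is ignored. With symmetric=False the order is kept, so the
    same atom type on the antibody and on the antigen is counted separately.
    """
    counts = Counter()
    for antibody_type, antigen_type in contacts:
        if symmetric:
            key = tuple(sorted((antibody_type, antigen_type)))
        else:
            key = (antibody_type, antigen_type)
        counts[key] += 1
    return counts
```

With ordered keys, an enrichment of aromatic atoms on the paratope side shows up as an imbalance between a key and its reversed counterpart, which a symmetric table averages away.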
Prediction of physical protein-protein interactions represents a key challenge in computational systems biology. This study provides a proof-of-principle that high-throughput in silico protein docking results can be used to predict interaction partners.
Deciphering the whole network of protein interactions for a given proteome (‘interactome’) is the goal of many experimental and computational efforts in Systems Biology. Separately, the prediction of the structure of protein complexes by docking methods is a well-established scientific area. To date, docking programs have not been used to predict interaction partners. We provide a proof of principle for such an approach. Using a set of protein complexes representing known interactors in their unbound form, we show that a standard docking program can distinguish the true interactors from a background of 922 non-redundant potential interactors. We additionally show that true interactions can be distinguished from non-likely interacting proteins within the same structural family. Our approach may be put in the context of the proposed ‘funnel-energy model’; the docking algorithm may not find the native complex, but it distinguishes binding partners because of the higher probability of favourable models compared with a collection of non-binders. The potential exists to develop this proof of principle into new approaches for predicting interaction partners and reconstructing biological networks.
interactome; protein docking; protein–protein interaction
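One way to picture the idea that true partners yield a higher probability of favourable docking models is to rank candidates by the fraction of their models that fall below an energy threshold. This is a hypothetical sketch, not the statistic used in the study: the threshold, the scoring scale, and the function names are all assumptions.

```python
def favourable_fraction(energies, threshold):
    """Fraction of docking models whose energy is at or below `threshold`."""
    return sum(1 for e in energies if e <= threshold) / len(energies)

def rank_partners(models_by_candidate, threshold):
    """Order candidate partners from most to least likely interactor.

    `models_by_candidate` maps a candidate name to the list of docking
    energies obtained when docking it against the query protein.
    """
    scored = {
        name: favourable_fraction(energies, threshold)
        for name, energies in models_by_candidate.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

In practice the threshold would need to be calibrated against the background set, since docking energies are not directly comparable across proteins of different size.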
The influenza viruses contain a segmented, single-stranded RNA genome of negative polarity. Each RNA segment is encapsidated by the nucleoprotein and the polymerase complex into ribonucleoprotein particles (RNPs), which are responsible for virus transcription and replication. Despite their importance, information about the structure of these RNPs is scarce. We have determined the three-dimensional structure of a biologically active recombinant RNP by cryo-electron microscopy. The structure shows a nonameric nucleoprotein ring (at 12 Å resolution) with two monomers connected to the polymerase complex (at 18 Å resolution). Docking the atomic structures of the nucleoprotein and polymerase domains, as well as mutational analyses, has allowed us to define the interactions between the functional elements of the RNP and to propose the location of the viral RNA. Our results provide the first model for a functional negative-stranded RNA virus ribonucleoprotein complex. The structure reported here will serve as a framework to generate a quasi-atomic model of the molecular machine responsible for viral RNA synthesis and to test new models for virus RNA replication and transcription.
The influenza viruses cause annual epidemics of respiratory disease and occasional pandemics that constitute a major public-health issue. The recent spillover of avian H5N1 and H1N1 swine influenza viruses to humans poses a serious threat of a new pandemic. These viruses contain a segmented RNA genome, which forms independent ribonucleoprotein particles including the polymerase complex and multiple copies of the nucleoprotein. Each of these ribonucleoprotein particles is replicated and expresses the encoded virus genes independently in the virus-infected cells. To better understand how these processes take place, we have determined the three-dimensional structure of a model ribonucleoprotein particle that contains only 248 nucleotides of virus RNA but is biologically active in vitro and in vivo. The structure shows a circular appearance and includes 9 nucleoprotein monomers, two of which are associated with the polymerase complex. Docking of the available atomic structures of the nucleoprotein and domains of the polymerase complex has permitted us to propose a quasi-atomic model for this ribonucleoprotein particle, and some of the predictions of the model have been confirmed experimentally by site-directed mutagenesis and phenotype analysis in vitro and in vivo.
Coarse grain modelling of macromolecules is a new approach potentially well adapted to address numerous issues, ranging from physics to biology. We propose here an original DNA coarse grain model specifically dedicated to protein–DNA docking, a crucial, but still largely unresolved, question in molecular biology. Using a representative set of protein–DNA complexes, we first show that our model is able to predict the interaction surface between the macromolecular partners taken in their bound form. In a second part, the impact on docking of the DNA sequence and electrostatics, together with the DNA and protein conformations, is investigated. Our results strongly suggest that the overall DNA structure is the main contributor to discriminating the interaction site on cognate proteins. Direct electrostatic interactions between phosphate groups and amino acid side chains strengthen the binding. Overall, this work demonstrates that coarse grain modelling can be a valuable aid for a general and complete description and understanding of protein–DNA association mechanisms.
Computer Simulation; DNA; chemistry; metabolism; Models, Chemical; Models, Molecular; Nucleic Acid Conformation; Protein Structure, Secondary; Proteins; chemistry; metabolism; Thermodynamics; protein–DNA; coarse grain; docking; simulation; ATTRACT
Determining the structure of protein-protein complexes remains a difficult and lengthy process, either by NMR or by X-ray crystallography. Several computational methods based on docking have been developed to support and even serve as possible alternatives to these experimental methods. In this paper, we introduce a new protein-protein docking algorithm, shDock, based on shape complementarity. We characterize the local geometry on each protein surface with a new shape descriptor, the surface-histogram. We measure the complementarity between two surface-histograms, one on each protein, using a modified Manhattan distance. When a match is found between two local protein surfaces, a model is generated for the protein complex, which is then scored by checking for collision between the two proteins. We have tested our algorithm on Version 3 of the ZDOCK protein-protein docking benchmark. We found that for 110 out of the 124 test cases of bound docking in the benchmark, our algorithm was able to generate a model in the top 3600 candidates for the protein complex within an RMSD of 2.5 Å from its native structure. For unbound docking predictions, we found a model within 2.5 Å in the top 3600 models in 54 out of 124 test cases. A comparison with other shape-based docking algorithms demonstrates that our approach gives significantly improved performance for both bound and unbound docking test cases.
protein-protein docking; protein surface; shape descriptor; surface-histogram
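Comparing two local surface descriptors with a Manhattan-type distance can be sketched as below. This uses the plain Manhattan (L1) distance between equal-length histograms; the modification applied in shDock and the construction of the surface-histogram itself are not specified here and are not reproduced. The function names are hypothetical.

```python
def manhattan_distance(hist_a, hist_b):
    """Plain L1 distance between two histograms with the same number of bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

def best_match(query, candidates):
    """Index of the candidate histogram closest to `query` under the L1 distance."""
    return min(
        range(len(candidates)),
        key=lambda i: manhattan_distance(query, candidates[i]),
    )
```

In a docking setting, each sufficiently close pair of local surfaces would seed one candidate complex, which is then checked for collisions as described above.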
Computational approaches to protein-protein docking typically include scoring aimed at improving the rank of the near-native structure relative to the false-positive matches. Knowledge-based potentials improve modeling of protein complexes by taking advantage of the rapidly increasing amount of experimentally derived information on protein-protein association. An essential element of knowledge-based potentials is defining the reference state for an optimal description of the residue-residue (or atom-atom) pairs in the non-interaction state.
The study presents a new Distance- and Environment-dependent, Coarse-grained, Knowledge-based (DECK) potential for scoring of protein-protein docking predictions. Training sets of protein-protein matches were generated based on bound and unbound forms of proteins taken from the DOCKGROUND resource. Each residue was represented by a pseudo-atom at the geometric center of the side chain. To capture the long-range and the multi-body interactions, residues in different secondary structure elements at protein-protein interfaces were considered as different residue types. Five reference states for the potentials were defined and tested. The optimal reference state was selected, and the effect of the cutoff on the distance-dependent potentials was investigated. The potentials were validated on docking decoy sets, showing better performance than the existing potentials used in scoring of protein-protein docking results.
A novel residue-based statistical potential for protein-protein docking was developed and validated on docking decoy sets. The results show that the scoring function DECK can successfully identify near-native protein-protein matches and thus is useful in protein docking. In addition to the practical application of the potentials, the study provides insights into the relative utility of the reference states, the scope of the distance dependence, and the coarse-graining of the potentials.
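The role of the reference state in a knowledge-based potential can be illustrated with the usual inverse-Boltzmann form, in which the energy of a residue pair at a given distance bin is the negative logarithm of the observed frequency divided by the reference frequency. The sketch below is a generic illustration under that assumption; the key layout, the pseudocount, and the function names are hypothetical and do not reproduce the DECK reference states.

```python
import math

def knowledge_based_potential(observed, reference, pseudocount=1e-6):
    """Convert frequencies to energies (in kT units) by inverse Boltzmann.

    Both arguments map (type_a, type_b, distance_bin) to a frequency.
    Energy is -ln(observed / reference); the pseudocount avoids log(0)
    for pairs never observed in the training set.
    """
    return {
        key: -math.log(
            (observed.get(key, 0.0) + pseudocount) / (ref + pseudocount)
        )
        for key, ref in reference.items()
    }

def score_pose(potential, contacts):
    """Sum the energies of all contacts in a pose; unknown keys contribute zero."""
    return sum(potential.get(contact, 0.0) for contact in contacts)
```

Changing the reference frequencies while keeping the observed ones fixed changes every energy, which is why the choice of reference state has to be tested rather than assumed.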
Many hnRNP proteins and snRNPs interact with hnRNA in the nucleus of eukaryotic cells and affect the fate of hnRNA and its processing into mRNA. There are at least 20 abundant proteins in vertebrate cell hnRNP complexes, and their structure and arrangement on specific hnRNAs are likely to be important for the processing of pre-mRNAs. hnRNP I, a basic protein of ca. 58,000 daltons by SDS-PAGE, is one of the abundant hnRNA-binding proteins. Monoclonal antibodies to hnRNP I were produced and full length cDNA clones for hnRNP I were isolated and sequenced. The sequence of hnRNP I (59,632 daltons and pI 9.86) demonstrates that it is identical to the previously described polypyrimidine tract-binding protein (PTB) and shows that it is highly related to hnRNP L. The sequences of these two proteins, I and L, define a new family of hnRNP proteins within the large superfamily of the RNP consensus RNA-binding proteins. Here we describe experiments which reveal new and unique properties of the association of hnRNP I/PTB with hnRNP complexes and of its cellular localization. Micrococcal nuclease digestions show that hnRNP I, along with hnRNP S and P, is released from hnRNP complexes by nuclease digestion more readily than most other hnRNP proteins. This nuclease hypersensitivity suggests that hnRNP I is bound to hnRNA regions that are particularly exposed in the complexes. Immunofluorescence microscopy shows that hnRNP I is found in the nucleoplasm but in addition high concentrations are detected in a discrete perinucleolar structure. Thus, PTB is one of the major proteins that bind pre-mRNAs; it is bound to nuclease-hypersensitive regions of the hnRNA-protein complexes and shows a novel pattern of nuclear localization.
The actin cytoskeleton is a dynamic structure that coordinates numerous fundamental processes in eukaryotic cells. Dozens of actin-binding proteins are known to be involved in the regulation of actin filament organization or turnover and many of these are stimulus-response regulators of phospholipid signaling. One of these proteins is the heterodimeric actin-capping protein (CP), which binds the barbed end of actin filaments with high affinity and inhibits both addition and loss of actin monomers at this end. The ability of CP to bind filaments is regulated by signaling phospholipids, which inhibit the activity of CP; however, the exact mechanism of this regulation and the residues on CP responsible for lipid interactions are not fully resolved. Here, we focus on the interaction of CP with two signaling phospholipids, phosphatidic acid (PA) and phosphatidylinositol (4,5)-bisphosphate (PIP2). Using different methods of computational biology such as homology modeling, molecular docking and coarse-grained molecular dynamics, we uncovered specific modes of high affinity interaction between membranes containing PA/phosphatidylcholine (PC) and plant CP, as well as between PIP2/PC and animal CP. In particular, we identified differences in the binding of membrane lipids by animal and plant CP, explaining previously published experimental results. Furthermore, we pinpoint the critical importance of the C-terminal part of the plant CPα subunit for CP–membrane interactions. We prepared a GST-fusion protein for the C-terminal domain of the plant α subunit and verified this hypothesis with lipid-binding assays in vitro.
The actin cytoskeleton is a prominent feature of eukaryotes and plays a central role in many essential aspects of their lives. This highly malleable structure responds to a wide range of stimuli with rapid changes in organization or dynamics. These responses are thought to be mediated by dozens of actin-binding proteins, the biochemical activities of which have been demonstrated to be tightly controlled by other proteins and/or signal transduction mediators. In this study, we investigated the structural aspects of inhibition of actin-capping protein (CP) by phosphatidic acid (PA) and phosphatidylinositol (4,5)-bisphosphate (PIP2). We employed diverse computational methods in combination with experimental approaches to reveal mechanistic details of the direct interaction of CP with the phospholipid membrane containing either PA or PIP2. Importantly, we found several differences between PA/PIP2–CP interactions from two distinct species, Arabidopsis and chicken, that enable us to explain and expand upon previously published results. Our new data shed light on the nature of interactions between peripheral membrane proteins and PA-containing lipid bilayers. In addition to a description of the phospholipid-mediated regulation of CP activity, our work also significantly contributes to the ongoing debate on structural details of protein interactions with phospholipids.
Structural details of protein–protein interactions are invaluable for understanding and deciphering biological mechanisms. Computational docking methods aim to predict the structure of a protein–protein complex given the structures of its single components. Protein flexibility and the absence of robust scoring functions pose a great challenge in the docking field. Due to these difficulties most of the docking methods involve a two-tier approach: coarse global search for feasible orientations that treats proteins as rigid bodies, followed by an accurate refinement stage that aims to introduce flexibility into the process. The FireDock web server, presented here, is the first web server for flexible refinement and scoring of protein–protein docking solutions. It includes optimization of side-chain conformations and rigid-body orientation and allows a high-throughput refinement. The server provides a user-friendly interface and a 3D visualization of the results. A docking protocol consisting of a global search by PatchDock and a refinement by FireDock was extensively tested. The protocol was successful in refining and scoring docking solution candidates for cases taken from docking benchmarks. We provide an option for using this protocol by automatic redirection of PatchDock candidate solutions to the FireDock web server for refinement. The FireDock web server is available at http://bioinfo3d.cs.tau.ac.il/FireDock/.
Protein–protein docking algorithms aim to predict the structure of a complex given the atomic structures of the proteins that assemble it. The docking procedure usually consists of two main steps: docking candidate generation and their refinement. The refinement stage aims to improve the accuracy of the candidate solutions and to identify near-native solutions among them. During protein–protein interaction, both side chains and backbone change their conformation. Refinement methods should model these conformational changes in order to obtain a more accurate model of the complex. Handling protein backbone flexibility is a major challenge for docking methodologies, since backbone flexibility adds a huge number of degrees of freedom to the search space. FiberDock is the first docking refinement web server, which accounts for both backbone and side-chain flexibility. Given a set of up to 100 potential docking candidates, FiberDock models the backbone and side-chain movements that occur during the interaction, refines the structures and scores them according to an energy function. The FiberDock web server is free and available with no login requirement at http://bioinfo3d.cs.tau.ac.il/FiberDock/.
Pre-mRNA splicing is catalyzed by the spliceosome, a multimegadalton ribonucleoprotein (RNP) complex composed of five snRNPs and numerous proteins. Intricate RNA-RNA and RNP networks, which serve to align the reactive groups of the pre-mRNA for catalysis, are formed and repeatedly rearranged during spliceosome assembly and catalysis. Both the conformation and composition of the spliceosome are highly dynamic, affording the splicing machinery its accuracy and flexibility, and these remarkable dynamics are largely conserved between yeast and metazoans. Because of its dynamic and complex nature, obtaining structural information about the spliceosome represents a major challenge. Electron microscopy has revealed the general morphology of several spliceosomal complexes and their snRNP subunits, and also the spatial arrangement of some of their components. X-ray and NMR studies have provided high-resolution structural information about spliceosomal proteins alone or complexed with one or more binding partners. The extensive interplay of RNA and proteins in aligning the pre-mRNA's reactive groups, and the presence of both RNA and protein at the core of the splicing machinery, suggest that the spliceosome is an RNP enzyme. However, elucidation of the precise nature of the spliceosome's active site awaits the generation of a high-resolution structure of its RNP core.
Spliceosomes contain five snRNPs and numerous non-snRNP proteins. These continuously rearrange during spliceosome assembly and activation so that the reactive groups in the pre-mRNA substrate are correctly aligned for catalysis.
Protein-protein docking is a challenging computational problem in functional genomics, particularly when one or both proteins undergo conformational change(s) upon binding. The major challenge is to define a scoring function soft enough to tolerate these changes and specific enough to distinguish between near-native and "misdocked" conformations.
Using a linear programming (LP) technique, we developed two types of potentials: (i) side chain-based and (ii) heavy atom-based. To achieve this, we considered a set of 161 transient complexes and generated, for each complex, a large set of putative docked structures (decoys) based on a shape complementarity criterion. The demand on the potentials was to yield, for the native (correctly docked) structure, a potential energy lower than that of any of the non-native (misdocked) structures. We show that the heavy atom-based potentials were able to comply with this requirement but the side chain-based ones were not. Thus, despite the smaller number of parameters, the capability of the heavy atom-based potentials to discriminate between native and "misdocked" conformations is improved relative to that of the side chain-based potentials. The performance of the atom-based potentials was evaluated by a jackknife test on a set of 50 complexes taken from the Zdock2.3 decoys set.
Our results show that, using the LP approach, we were able to train our potentials using a dataset of transient complexes only. The newly developed potentials outperform three other known potentials in this test.
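The training requirement described above (the native structure must have lower energy than every decoy) becomes a set of linear inequalities when the energy is linear in its parameters. The sketch below checks those inequalities and finds a feasible parameter vector with a simple perceptron-style update, which is a swapped-in stand-in for a proper LP solver. It handles a single complex, and the feature vectors (for example contact counts per pair type), the margin, and the function names are all assumptions.

```python
def energy(weights, features):
    """Linear energy: dot product of parameters and contact-type counts."""
    return sum(w * f for w, f in zip(weights, features))

def violated(weights, native, decoys, margin=0.0):
    """Return the decoys whose energy does not exceed the native energy by more than `margin`."""
    e_native = energy(weights, native)
    return [d for d in decoys if energy(weights, d) - e_native <= margin]

def fit_weights(native, decoys, steps=1000, rate=0.1, margin=1.0):
    """Search for weights satisfying E(decoy) - E(native) > margin for all decoys.

    Each update moves the weights along (decoy - native) for one violated
    inequality. Returns None if no feasible weights are found within `steps`.
    """
    weights = [0.0] * len(native)
    for _ in range(steps):
        bad = violated(weights, native, decoys, margin)
        if not bad:
            return weights
        decoy = bad[0]
        weights = [
            w + rate * (fd - fn) for w, fd, fn in zip(weights, decoy, native)
        ]
    return None
```

If the inequalities cannot all be satisfied, as reported above for the side chain-based parameterization, a procedure like this simply fails to terminate with a solution, whereas an LP solver reports infeasibility directly.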
In recent years, protein–protein interactions have become the object of increasing attention in many different fields, such as structural biology, molecular biology, systems biology, and drug discovery. From a structural biology perspective, it would be desirable to integrate current efforts into the structural proteomics programs. Given that experimental determination of many protein–protein complex structures is highly challenging, and in the context of current high-performance computational capabilities, different computer tools are being developed to help in this task. Among them, computational docking aims to predict the structure of a protein–protein complex starting from the atomic coordinates of its individual components, and in recent years, a growing number of docking approaches with increased predictive capabilities have been reported. The improvement of speed and accuracy of these docking methods, together with the modeling of the interaction networks that regulate the most critical processes in a living organism, will be essential for computational proteomics. The ultimate goal is the rational design of drugs capable of specifically inhibiting or modifying protein–protein interactions of therapeutic significance. While rational design of protein–protein interaction inhibitors is at a very early stage, the first results are promising.
protein-protein interactions; drug design; protein docking; structural prediction; virtual ligand screening; hot-spots
Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions.
The heterogeneous nuclear ribonucleoprotein (hnRNP) F belongs to the hnRNP H family involved in the regulation of alternative splicing and polyadenylation and specifically recognizes poly(G) sequences (G-tracts). In particular, hnRNP F binds a G-tract of the Bcl-x RNA and regulates its alternative splicing, leading to two isoforms, Bcl-xS and Bcl-xL, with antagonist functions. In order to gain insight into G-tract recognition by hnRNP H members, we initiated an NMR study of human hnRNP F. We present the solution structure of the three quasi RNA recognition motifs (qRRMs) of hnRNP F and identify the residues that are important for the interaction with the Bcl-x RNA by NMR chemical shift perturbation and mutagenesis experiments. The three qRRMs exhibit the canonical βαββαβ RRM fold but additional secondary structure elements are present in the two N-terminal qRRMs of hnRNP F. We show that qRRM1 and qRRM2 but not qRRM3 are responsible for G-tract recognition and that the residues of qRRM1 and qRRM2 involved in G-tract interaction are not on the β-sheet surface as observed for the classical RRM but are part of a short β-hairpin and two adjacent loops. These regions define a novel interaction surface for RNA recognition by RRMs.
Protein-DNA interactions are the physical basis of gene expression and DNA modification. Structural models that reveal these interactions are essential for their understanding. As only a limited number of structures for protein-DNA complexes have been determined by experimental methods, computational methods offer a potential way to fill this gap. We have developed the DISPLAR method to predict DNA binding sites on proteins. Predicted binding sites have been used to assist the building of structural models by docking, either by guiding the docking or by selecting near-native candidates from the docked poses. Here we applied the DISPLAR method to predict the DNA binding sites for 20 DNA-binding proteins whose DNA binding sites have been characterized by NMR chemical shift perturbation. For two of these proteins, the structures of their complexes with DNA have also been determined. With the help of the DISPLAR predictions, we built structural models for these two complexes. Evaluations of both the 20 DNA binding sites and the structural models of the two protein-DNA complexes against experimental results demonstrate the significant promise of our model-building approach.
protein-DNA interaction; interface prediction; interaction sites
RNA-binding proteins play many essential roles in the regulation of gene expression in the cell. Despite the significant increase in the number of structures for RNA–protein complexes in the last few years, the molecular basis of specificity remains unclear even for the best-studied protein families. We have developed a distance and orientation-dependent hydrogen-bonding potential based on the statistical analysis of hydrogen-bonding geometries that are observed in high-resolution crystal structures of protein–DNA and protein–RNA complexes. We observe very strong geometrical preferences that reflect significant energetic constraints on the relative placement of hydrogen-bonding atom pairs at protein–nucleic acid interfaces. A scoring function based on the hydrogen-bonding potential discriminates native protein–RNA structures from incorrectly docked decoys with remarkable predictive power. By incorporating the new hydrogen-bonding potential into a physical model of protein–RNA interfaces with full atom representation, we were able to recover native amino acids at protein–RNA interfaces.
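As an illustration of how a statistics-based potential of this kind can be derived, the following is a minimal sketch that assumes the common inverse-Boltzmann form, E = -ln(P_obs / P_ref), applied to binned geometry counts. The abstract does not state the functional form actually used, so the function name `statistical_potential`, the pseudocount, and the example counts are all hypothetical. The bins could be distance bins, angle bins, or flattened joint distance-angle bins.

```python
import math


def statistical_potential(observed, reference, pseudocount=1.0):
    """Per-bin energy (in units of kT) by inverse Boltzmann: E = -ln(P_obs / P_ref).

    observed  -- counts of hydrogen-bond geometries per bin in native structures
    reference -- counts per bin in a reference (background) distribution
    pseudocount -- added to every bin so that empty bins do not give log(0)
    All names and the functional form are illustrative assumptions.
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed, reference):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-math.log(p_obs / p_ref))
    return energies


# Hypothetical counts: the middle bin is strongly over-represented in native
# structures, so it receives a favorable (negative) energy.
energies = statistical_potential([10, 80, 10], [30, 30, 30])
```

Bins that are populated more often than the reference get negative energies, and rarely populated bins are penalized. Summing such terms over the hydrogen bonds of a docked pose is one plausible way to build a scoring function for discriminating native structures from decoys.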
A novel small nuclear ribonucleoprotein (snRNP) complex containing both U11 and U12 RNAs has been identified in HeLa cell extracts. This U11/U12 snRNP complex can be visualized on glycerol gradients, on native polyacrylamide gels, and by selection with antisense 2'-O-methyl oligoribonucleotides. RNase H-mediated degradation of the U12 snRNA confirmed a direct interaction between the U11 and U12 snRNPs. This snRNP complex is the first to be identified involving low-abundance snRNPs. Selection of the U11/U12 snRNP complex is sensitive to high salt, suggestive of a protein-mediated interaction. Secondary structure analyses revealed several regions of the U11 snRNP accessible for interaction with other RNAs or proteins but no detectable difference between the accessibility of these regions in the U11 monoparticle compared with the U11/U12 snRNP complex. There are also several accessible single-stranded regions in the U12 snRNP, and oligonucleotide-directed RNase H digestion identified nucleotides 28 to 36 of U12 as containing sequences required for the U11/U12 interaction. Both the U12 snRNP and the U11/U12 snRNP complex can be disrupted without altering the cleavage/polyadenylation activity of a nuclear extract.
The influenza A virus genome consists of eight single-stranded negative-sense RNA (vRNA) segments. Although genome segmentation provides advantages such as genetic reassortment, which contributes to the emergence of novel strains with pandemic potential, it complicates the genome packaging of progeny virions. Here we elucidate, using electron tomography, the three-dimensional structure of ribonucleoprotein complexes (RNPs) within progeny virions. Each virion is packed with eight well-organized RNPs that possess rod-like structures of different lengths. Multiple interactions are found among the RNPs. The position of the eight RNPs is not consistent among virions, but a pattern suggests the existence of a specific mechanism for assembly of these RNPs. Analyses of budding progeny virions suggest two independent roles for the viral spike proteins: RNP association on the plasma membrane and the subsequent formation of the virion shell. Our data provide further insights into the mechanisms responsible for segmented-genome packaging into virions.
The influenza A virus genome consists of eight RNA segments, which permits genetic reassortment and contributes to the emergence of novel strains with pandemic potential. Here, electron tomography is used to study the three-dimensional structure of ribonucleoprotein complexes within progeny virions.
Many functionally important proteins in a cell form complexes with multiple chains. Therefore, computational prediction of multiple protein complexes is an important task in bioinformatics. In the development of multiple protein docking methods, it is important to establish a metric for evaluating prediction results in a reasonable and practical fashion. However, because few methods have been developed for multiple protein docking, no study has investigated how accurate structural models of multiple protein complexes must be to allow scientists to gain biological insights.
We generated a series of predicted models (decoys) of various accuracies by our multiple protein docking pipeline, Multi-LZerD, for three multi-chain complexes with 3, 4, and 6 chains. We analyzed the decoys in terms of the number of correctly predicted pair conformations in the decoys.
Results and conclusion
We found that pairs of chains with the correct mutual orientation exist even in decoys with a large overall root mean square deviation (RMSD) to the native. Therefore, in addition to a global structure similarity measure, such as the global RMSD, the quality of models for multiple chain complexes can be better evaluated by using a local measure: the number of chain pairs with correct mutual orientation. We termed the fraction of correctly predicted pairs (RMSD at the interface of less than 4.0 Å) fpair and propose to use it for evaluation of the accuracy of multiple protein docking.
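The fpair measure can be sketched in a few lines. This is a minimal illustration, not the Multi-LZerD implementation: it assumes the interface RMSD of each chain pair has already been computed elsewhere, and it takes the denominator to be whatever set of chain pairs the caller supplies (for example, the pairs that are in contact in the native structure). The function name `fpair`, the dictionary layout, and the example values are hypothetical.

```python
def fpair(pair_rmsds, cutoff=4.0):
    """Fraction of chain pairs whose interface RMSD is below `cutoff` (in Å).

    pair_rmsds -- dict mapping a chain pair, e.g. ("A", "B"), to the interface
                  RMSD of that pair in the decoy relative to the native
    cutoff     -- a pair counts as correct if its RMSD is strictly below this
    """
    if not pair_rmsds:
        raise ValueError("at least one chain pair is required")
    n_correct = sum(1 for rmsd in pair_rmsds.values() if rmsd < cutoff)
    return n_correct / len(pair_rmsds)


# Hypothetical decoy of a three-chain complex: two of the three pairs are
# placed correctly even though one chain is far from its native position.
decoy = {("A", "B"): 2.1, ("A", "C"): 7.5, ("B", "C"): 3.9}
score = fpair(decoy)  # 2 of 3 pairs are below 4.0 Å
```

A decoy like this one could have a large global RMSD while still scoring fpair = 2/3, which is the situation the local measure is meant to capture.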