1.  Structure of a conserved hypothetical protein SA1388 from S. aureus reveals a capped hexameric toroid with two PII domain lids and a dinuclear metal center 
Background
The protein encoded by the SA1388 gene from Staphylococcus aureus was chosen for structure determination to elucidate its domain organization and to confirm our earlier remote-homology-based prediction that it contains a nitrogen regulatory PII protein-like domain. SA1388 was predicted to contain a central PII-like domain and two flanking regions, which together belong to the NIF3-like protein family. Proteins like SA1388 remain poorly studied, and their structural characterization could guide future investigations of their function.
Results
The structure of SA1388 has been solved to 2.0 Å resolution by single-wavelength anomalous dispersion phasing using the anomalous signal from selenium. It reveals a canonical NIF3-like fold containing two domains, with a PII-like domain inserted in the middle of the polypeptide. The N- and C-terminal halves of the NIF3-like domains are involved in dimerization, while the PII domain forms trimeric contacts with symmetry-related monomers. Overall, the NIF3-like domains of SA1388 are organized as a hexameric toroid similar to those of its homologs, E. coli ybgI and the hypothetical protein SP1609 from Streptococcus pneumoniae. The openings on either side of the toroid are partially covered by trimeric "lids" formed by the PII domains. At the junction of the two NIF3 domains, two zinc ions are bound at what appears to be a histidine-rich active site. Well-defined electron density corresponding to an endogenously bound ligand of unknown identity is observed close to the metal site.
Conclusion
SA1388 is the third member of the NIF3-like family of proteins to be structurally characterized; the other two are also hypothetical proteins of unknown function. The structure of SA1388 confirms our earlier prediction that the inserted domain that separates the two NIF3 domains adopts a PII-like fold and reveals an overall capped toroidal arrangement for the protein hexamer. The six PII-like domains form two trimeric "lids" that cap the central cavity of the toroid on either side and provide only small openings to allow regulated entry of small molecules into the occluded chamber. The electron density of the bound ligand may provide important clues to the likely function of NIF3-like proteins.
doi:10.1186/1472-6807-6-27
PMCID: PMC1779786  PMID: 17187687
2.  Analysis of an optimal hidden Markov model for secondary structure prediction 
Background
Secondary structure prediction is a useful first step toward 3D structure prediction. A number of successful secondary structure prediction methods use neural networks, but neural networks are not intuitively interpretable. In contrast, hidden Markov models are interpretable graphical models, and they have been used successfully in many bioinformatics applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models.
Results
Our HMM is designed without prior knowledge. It is chosen from a collection of models of increasing size, using statistical and accuracy criteria. The resulting model has 36 hidden states: 15 that model α-helices, 12 that model coil and 9 that model β-strands. Connections between hidden states and state emission probabilities reflect the organization of protein structures into secondary structure segments. We first analyze the model's features and show how it offers a new view of local structures. We then use it for secondary structure prediction. Our model appears to be very efficient on single sequences, with a Q3 score of 68.8%, more than one point above PSIPRED prediction on single sequences. A straightforward extension of the method allows the use of multiple sequence alignments, raising the Q3 score to 75.5%.
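To make the two evaluation steps concrete, here is a minimal Python sketch, assuming that each decoded hidden state can simply be relabelled with its secondary-structure class. The STATE_CLASS ordering (helix states first, then coil, then strand) is a hypothetical choice that only mirrors the 15/12/9 split reported above; the q3 function is the standard per-residue three-state accuracy.

```python
from typing import Sequence

# Hypothetical ordering of the 36 hidden states: 15 helix (H), 12 coil (C),
# 9 strand (E). Only the counts come from the abstract; the order is assumed.
STATE_CLASS = ["H"] * 15 + ["C"] * 12 + ["E"] * 9


def collapse_path(state_path: Sequence[int]) -> str:
    """Relabel a decoded hidden-state path with its H/C/E classes."""
    return "".join(STATE_CLASS[state] for state in state_path)


def q3(predicted: str, observed: str) -> float:
    """Q3 score: percentage of residues assigned the correct H/E/C class."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    path = [0, 1, 2, 15, 16, 27, 28]          # toy decoded state path
    prediction = collapse_path(path)          # "HHHCCEE"
    print(prediction, round(q3(prediction, "HHHCCCE"), 1))  # HHHCCEE 85.7
```

Because several hidden states share one label, different state paths can yield the same H/E/C string; Q3 only scores the collapsed labels.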
Conclusion
The hidden Markov model presented here achieves valuable prediction results using only a limited number of parameters. It provides an interpretable framework for protein secondary structure architecture. Furthermore, it can be used as a tool for generating protein sequences with a given secondary structure content.
doi:10.1186/1472-6807-6-25
PMCID: PMC1769381  PMID: 17166267
3.  Structure of the yeast histone H3-ASF1 interaction: implications for chaperone mechanism, species-specific interactions, and epigenetics 
Background
The histone H3/H4 chaperone Asf1 (anti-silencing function 1) is required for the establishment and maintenance of proper chromatin structure, as well as for genome stability in eukaryotes. Asf1 participates in both DNA replication-coupled (RC) and replication-independent (RI) histone deposition reactions in vitro and interacts with complexes responsible for both pathways in vivo. Asf1 is known to bind histone H3 directly; however, high-resolution structural information about the geometry of this interaction was previously unavailable.
Results
Here we report the structure of a histone/histone chaperone interaction. We have solved the 2.2 Å crystal structure of the conserved N-terminal immunoglobulin fold domain of yeast Asf1 (residues 2–155) bound to the C-terminal helix of yeast histone H3 (residues 121–134). The structure defines a histone-binding patch on Asf1 consisting of both conserved and yeast-specific residues; mutation of these residues abrogates H3/H4 binding affinity. The geometry of the interaction indicates that Asf1 binds histones H3/H4 in a manner that likely sterically blocks the H3/H3 interface of the nucleosomal four-helix bundle.
Conclusion
These data clarify how Asf1 regulates histone stoichiometry to modulate epigenetic inheritance. The structure further suggests a physical model in which Asf1 contributes to interpretation of a "histone H3 barcode" for sorting H3 isoforms into different deposition pathways.
doi:10.1186/1472-6807-6-26
PMCID: PMC1762009  PMID: 17166288
4.  Molecular modeling and characterization of Vibrio cholerae transcription regulator HlyU 
Background
The SmtB/ArsR family of prokaryotic metal-regulatory transcriptional repressors represses the expression of operons linked to stress-inducing concentrations of heavy metal ions, while derepression results from direct binding of metal ions by these 'metal-sensor' proteins. The HlyU protein from Vibrio cholerae is a positive regulator of the haemolysin gene and also plays an important role in regulating the expression of virulence genes. Although its biochemical properties are understood, its structure and relationship to other protein families remain unknown.
Results
We find that HlyU exhibits structural features common to the SmtB/ArsR family of transcriptional repressors. Analysis of the modeled structure of HlyU reveals that it lacks the key metal-sensing residues unique to the SmtB/ArsR family of repressors, yet its tertiary structure is very similar to that of the family members. HlyU is the only member with positive control of transcription; all other members of the family are repressors. An evolutionary analysis with other SmtB/ArsR family members suggests that HlyU probably arose through gene duplication and mutational events, emerging from an ancestral transcriptional repressor through the loss of its metal-binding sites.
Conclusion
The study indicates that the same protein family can contain both positive regulators of transcription and repressors, with the exact function controlled by the absence or presence of metal-binding sites.
doi:10.1186/1472-6807-6-24
PMCID: PMC1665450  PMID: 17116251
5.  Sequence analysis and structure prediction of type II Pseudomonas sp. USM 4–55 PHA synthase and an insight into its catalytic mechanism 
Background
Polyhydroxyalkanoates (PHA) are biodegradable polyesters produced by many microorganisms, such as the pseudomonads. These polyesters are in great demand, especially in the packaging, medical and paint industries. The enzyme responsible for catalyzing the formation of PHA is PHA synthase. Because structural information is limited, its functional properties, including catalysis, are poorly understood. This study therefore investigates the structural properties and catalytic mechanism of the Type II Pseudomonas sp. USM 4–55 PHA synthase 1 (PhaC1P.sp USM 4–55) by predicting its three-dimensional (3D) model.
Results
Sequence analysis demonstrated that PhaC1P.sp USM 4–55 lacked similarity to all known structures in the databases. PSI-BLAST and HMM Superfamily analyses demonstrated that this enzyme belongs to the alpha/beta hydrolase fold family. A threading approach revealed that the most suitable template was human gastric lipase (PDB ID: 1HLG). Superimposition of the predicted PhaC1P.sp USM 4–55 model on 1HLG, covering 86.2% of the backbone atoms, gave an RMSD of 1.15 Å. The catalytic residues, Cys296, Asp451 and His479, were found to be conserved and located adjacent to each other. In addition, an extension to the catalytic mechanism is proposed in which two tetrahedral intermediates form during PHA biosynthesis. These transition-state intermediates are further postulated to be stabilized by the formation of oxyanion holes. Based on the sequence analysis and the deduced model, Ser297 is postulated to contribute to the formation of the oxyanion hole.
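The RMSD quoted above is the square root of the mean squared distance between paired atoms after superimposition. A minimal sketch, assuming the two models are already superimposed and their backbone atoms already paired (the optimal superposition itself, for example by the Kabsch algorithm, is not shown):

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], template: Sequence[Coord]) -> float:
    """RMSD (Å) between paired atoms of two already-superimposed structures."""
    if not model or len(model) != len(template):
        raise ValueError("coordinate lists must be non-empty and paired")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, template))
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    template = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
    print(rmsd(model, template))  # 1.0
```

Only the atoms included in the superimposition (86.2% of the backbone in the case above) enter the sum, so an RMSD value should always be read together with its coverage.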
Conclusion
The 3D model of the core region of PhaC1P.sp USM 4–55, from residue 267 to residue 484, was developed using computational techniques, and the locations of the catalytic residues were identified. This study highlights, for the first time, the potentially important role of Ser297 in the enzyme's catalytic mechanism.
doi:10.1186/1472-6807-6-23
PMCID: PMC1636056  PMID: 17076907
6.  Structure of vaccinia virus thymidine kinase in complex with dTTP: insights for drug design 
Background
Countermeasures to bioterrorist threats such as those posed by the smallpox virus (variola) include vaccination and drug development. Selective activation of nucleoside analogues by virus-encoded thymidine (dThd) kinases (TK) represents one of the most successful strategies for antiviral chemotherapy, as demonstrated by anti-herpes drugs. Vaccinia virus TK is a close orthologue of variola TK but also shares relatively high sequence identity with human type 2 TK (hTK); achieving drug selectivity relative to the host enzyme is therefore challenging.
Results
To identify differences from hTK that may be exploitable in drug design, we have determined the crystal structure of vaccinia virus TK (VVTK) in complex with thymidine 5'-triphosphate (dTTP). Although most of the active-site residues are conserved between hTK and VVTK, we observe a difference in the conformation of residues Asp-43 and Arg-45. The equivalent residues in hTK hydrogen bond to dTTP, whereas in subunit D of VVTK, Asp-43 and Arg-45 adopt a different conformation that prevents interaction with this nucleotide. Asp-43 and Arg-45 lie in a flexible loop, which is disordered in subunits A, B and C. The observed difference in conformation and flexibility may also explain why VVTK can phosphorylate (South)-methanocarbathymine, whereas no substrate activity with hTK has been reported for this compound.
Conclusion
The difference in conformation for Asp-43 and Arg-45 could thus be used in drug design to generate VVTK/Variola TK-selective nucleoside analogue substrates and/or inhibitors that have lower affinity for hTK.
doi:10.1186/1472-6807-6-22
PMCID: PMC1636055  PMID: 17062140
7.  Autoinsertion of soluble oligomers of Alzheimer's Aβ(1–42) peptide into cholesterol-containing membranes is accompanied by relocation of the sterol towards the bilayer surface 
Background
Soluble Alzheimer's Aβ oligomers autoinsert into neuronal cell membranes, contributing to the pathology of Alzheimer's disease (AD). Elevated serum cholesterol is also a risk factor for AD, but the reason is unknown. We investigated potential connections between these two observations at the membrane level by testing the hypothesis that Aβ(1–42) relocates membrane cholesterol.
Results
Oligomers of Aβ(1–42), but not the monomeric peptide, inserted into cholesterol-containing phosphatidylcholine monolayers with an anomalously low molecular insertion area, suggesting concurrent lipid rearrangement. Membrane neutron diffraction, including isomorphous replacement of specific lipid hydrogens with highly scattering deuterium, showed that Aβ(1–42) insertion was accompanied by outward displacement of membrane cholesterol towards the polar surfaces of the bilayer. Changes in the generalised polarisation of laurdan confirmed that the structural changes were associated with a functional alteration in membrane lipid order.
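Laurdan generalised polarisation (GP) is conventionally computed from the emission intensities in a blue and a red channel. The sketch below assumes the commonly used 440 nm and 490 nm channels, which the abstract does not specify, and hypothetical intensities; higher GP values indicate a more ordered, less hydrated bilayer.

```python
def generalized_polarization(i_440: float, i_490: float) -> float:
    """Laurdan GP = (I_440 - I_490) / (I_440 + I_490), between -1 and +1."""
    total = i_440 + i_490
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return (i_440 - i_490) / total


if __name__ == "__main__":
    # Hypothetical intensities before and after adding peptide.
    before = generalized_polarization(1200.0, 800.0)  # 0.2
    after = generalized_polarization(1000.0, 1000.0)  # 0.0
    print(before, after, after - before)
```

A drop in GP after adding peptide would indicate decreased lipid order; the sign and size of any real change must come from measured spectra.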
Conclusion
Cholesterol is known to regulate membrane lipid order, and this can affect a wide range of membrane mechanisms, including intercellular signalling. Previously unrecognised Aβ-dependent rearrangement of the membrane sterol could have an important role in AD.
doi:10.1186/1472-6807-6-21
PMCID: PMC1657013  PMID: 17052343
8.  The crystal structure of superoxide dismutase from Plasmodium falciparum 
Background
Superoxide dismutases (SODs) are important enzymes in defence against oxidative stress. In Plasmodium falciparum, they may be expected to have special significance since part of the parasite life cycle is spent in red blood cells where the formation of reactive oxygen species is likely to be promoted by the products of haemoglobin breakdown. Thus, inhibitors of P. falciparum SODs have potential as anti-malarial compounds. As a step towards their development we have determined the crystal structure of the parasite's cytosolic iron superoxide dismutase.
Results
The cytosolic iron superoxide dismutase from P. falciparum (PfFeSOD) has been overexpressed in E. coli in a catalytically active form. Its crystal structure has been solved by molecular replacement and refined against data extending to 2.5 Å resolution. The structure reveals a two-domain organisation and an iron centre in which the metal is coordinated by three histidines, an aspartate and a solvent molecule. Consistent with ultracentrifugation analysis, the enzyme is a dimer in which a hydrogen-bonding lattice links the two active centres.
Conclusion
The tertiary structure of PfFeSOD is very similar to those of a number of other iron- and manganese-dependent superoxide dismutases; moreover, the active-site residues are conserved, suggesting a common mechanism of action. Comparison of the dimer interface of PfFeSOD with that of human manganese-dependent superoxide dismutase reveals a number of differences that may underpin the design of parasite-selective superoxide dismutase inhibitors.
doi:10.1186/1472-6807-6-20
PMCID: PMC1618392  PMID: 17020617
9.  LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation 
Background
Identifying pockets on protein surfaces is of great importance for many structure-based drug design applications and protein-ligand docking algorithms. Over the last ten years, many geometric methods for the prediction of ligand-binding sites have been developed.
Results
We present LIGSITEcsc, an extension and implementation of the LIGSITE algorithm. LIGSITEcsc is based on the notion of surface-solvent-surface events and the degree of conservation of the surface residues involved. We compare our algorithm with four other approaches, LIGSITE, CAST, PASS and SURFNET, and evaluate all of them on a dataset of 48 unbound/bound structures and 210 bound structures. LIGSITEcsc performs slightly better than the other tools and achieves success rates of 71% and 75% on these two datasets, respectively.
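The underlying LIGSITE idea can be illustrated on a toy grid: each empty grid point is scanned along the three axes and the four cube diagonals, and a line along which protein is met on both sides counts as an event; points with many events are pocket candidates. The sketch below implements this original protein-solvent-protein variant, not LIGSITEcsc itself, which uses Connolly surface points instead of protein grid points and re-ranks pockets by conservation. The min_events threshold of 5 is an assumed illustrative value.

```python
from itertools import product
from typing import Dict, Set, Tuple

Point = Tuple[int, int, int]

# Three axes plus four cube diagonals; each is scanned in both senses.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]


def _hits_protein(start: Point, step: Point, occupied: Set[Point], size: int) -> bool:
    """Walk from start along step; True if an occupied point is met before leaving the box."""
    x, y, z = start
    dx, dy, dz = step
    while True:
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < size and 0 <= y < size and 0 <= z < size):
            return False
        if (x, y, z) in occupied:
            return True


def buriedness(occupied: Set[Point], size: int) -> Dict[Point, int]:
    """Number of enclosing scan lines (0-7) for every empty point of a size^3 grid."""
    scores: Dict[Point, int] = {}
    for point in product(range(size), repeat=3):
        if point in occupied:
            continue
        scores[point] = sum(
            _hits_protein(point, d, occupied, size)
            and _hits_protein(point, (-d[0], -d[1], -d[2]), occupied, size)
            for d in DIRECTIONS
        )
    return scores


def pocket_points(occupied: Set[Point], size: int, min_events: int = 5) -> Set[Point]:
    """Empty grid points enclosed along at least min_events scan lines."""
    return {p for p, n in buriedness(occupied, size).items() if n >= min_events}


if __name__ == "__main__":
    # Toy "protein": the outer shell of a 5x5x5 grid, enclosing a 3x3x3 cavity.
    shell = {p for p in product(range(5), repeat=3) if 0 in p or 4 in p}
    print(len(pocket_points(shell, 5)))  # 27
```

Real implementations map atoms onto a grid with a spacing of about 1 Å and cluster the surviving points into pockets; both steps are omitted here.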
Conclusion
Using the Connolly surface leads to slight improvements, and re-ranking the predictions by conservation leads to significant improvements in binding-site prediction. A web server for LIGSITEcsc and its source code are available at scoppi.biotec.tu-dresden.de/pocket.
doi:10.1186/1472-6807-6-19
PMCID: PMC1601958  PMID: 16995956
10.  Revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison 
Background
Protein structure comparison is one of the most important problems in computational biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and protein docking.
Results
We propose a novel method to compare protein structures accurately and efficiently. The method can not only reveal divergent evolution but also identify circular permutations and detect active sites. Specifically, we define structure alignment as a multi-objective optimization problem: maximizing the number of aligned atoms while minimizing their root mean square distance. By controlling a single distance-related parameter, we can in theory obtain a variety of optimal alignments corresponding to different optimal matching patterns, ranging from a large matching portion to a small one. The number of variables in our algorithm grows almost linearly with the number of atoms in the protein pair. In addition to a solid theoretical background, numerical experiments demonstrate significant improvement of our approach over existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations and active sites (or structural motifs) can be identified by our method. The software SAMO is available upon request from the authors, or from and .
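The sequence order-independent matching step can be sketched as a maximum bipartite matching between two atom sets, where an edge joins two atoms lying within a distance cutoff. This is a simplified illustration, not SAMO: it assumes the structures are already superimposed and uses Kuhn's augmenting-path algorithm to maximize the number of matched atoms for a fixed cutoff, leaving out the RMSD term of the multi-objective formulation.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def match_atoms(a: Sequence[Coord], b: Sequence[Coord], cutoff: float) -> List[Tuple[int, int]]:
    """Maximum-cardinality matching of atoms in a to atoms in b within cutoff.

    Residue order is ignored, so circularly permuted segments can still pair up.
    """
    neighbours = [[j for j, q in enumerate(b) if math.dist(p, q) <= cutoff] for p in a]
    owner = [-1] * len(b)  # owner[j] = index of the atom in a matched to b[j]

    def augment(i: int, seen: List[bool]) -> bool:
        # Try to match atom i, re-routing earlier matches along an augmenting path.
        for j in neighbours[i]:
            if not seen[j]:
                seen[j] = True
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(len(a)):
        augment(i, [False] * len(b))
    return sorted((i, j) for j, i in enumerate(owner) if i != -1)


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)]
    b = [(0.3, 0.0, 0.0), (-0.3, 0.0, 0.0)]
    print(match_atoms(a, b, cutoff=0.6))  # [(0, 1), (1, 0)]
```

In this toy case the first atom initially takes b[0] and is then re-routed to b[1] so that both atoms are matched. Raising the cutoff generally matches more atoms at the price of a larger RMSD, which mirrors the trade-off controlled by the single distance parameter described above.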
Conclusion
A novel formulation is proposed to accurately align protein structures in the framework of multi-objective optimization, based on a sequence order-independent strategy. A fast and accurate algorithm based on bipartite matching is developed by exploiting the special features of this formulation. Convergence of the computation is shown in experiments and is also proven theoretically.
doi:10.1186/1472-6807-6-18
PMCID: PMC1574323  PMID: 16948858
11.  Similar folds with different stabilization mechanisms: the cases of prion and doppel proteins 
Background
Protein misfolding is the main cause of a group of fatal neurodegenerative diseases in humans and animals. In particular, in prion-related diseases the normal cellular form of the prion protein PrP (PrPC) is converted into the infectious PrPSc through a conformational process during which it acquires a high β-sheet content. Doppel is a protein that shares a similar native fold but has no scrapie isoform. Understanding the molecular determinants of these different behaviours is important for both biomedical and biophysical research.
Results
In this paper, the dynamical and energetic properties of the two proteins in solution are comparatively analyzed by means of long-time-scale, explicit-solvent, all-atom molecular dynamics at different temperatures. The trajectories are analyzed with a recently introduced energy decomposition approach (Tiana et al, Prot. Sci. 2004) aimed at identifying the key residues for the stabilization and folding of the protein. Our analysis shows that Prion and Doppel have two different cores stabilizing the native state, and that the relative contribution of the nucleus to the global stability of the protein is noticeably higher for Doppel than for PrP. Moreover, under misfolding conditions the Doppel core is conserved, while the energy stabilization network of PrP is disrupted.
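As we understand the energy decomposition approach cited above, the matrix of pairwise residue-residue interaction energies is diagonalized, and the eigenvector belonging to the lowest (most stabilizing) eigenvalue highlights the residues that dominate stability. The sketch below shows only that linear-algebra step on a small hypothetical energy matrix, using shifted power iteration so that it needs no external libraries; it is an illustration, not the authors' code.

```python
import math
from typing import List, Sequence


def lowest_eigenvector(m: Sequence[Sequence[float]], iterations: int = 500) -> List[float]:
    """Eigenvector of the smallest eigenvalue of a symmetric matrix m.

    Power iteration is run on B = c*I - m, where c (the largest absolute row sum)
    bounds every eigenvalue of m, so B's dominant eigenvector is the one sought.
    """
    n = len(m)
    c = max(sum(abs(x) for x in row) for row in m)
    b = [[(c if i == j else 0.0) - m[i][j] for j in range(n)] for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # m is the zero matrix; any vector is an eigenvector
        v = [x / norm for x in w]
    return v


def key_residues(m: Sequence[Sequence[float]], top: int) -> List[int]:
    """Indices of the residues with the largest components in that eigenvector."""
    v = lowest_eigenvector(m)
    return sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical interaction energies (kJ/mol): residues 0 and 1 interact strongly.
    energies = [[0.0, -5.0, 0.0],
                [-5.0, 0.0, -0.1],
                [0.0, -0.1, 0.0]]
    print(sorted(key_residues(energies, top=2)))  # [0, 1]
```

For real matrices with hundreds of residues, a dedicated symmetric eigensolver such as numpy.linalg.eigh would be the more robust choice.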
Conclusion
These observations suggest that different sequences can share similar native topology with different stabilizing interactions and that the sequences of the Prion and Doppel proteins may have diverged under different evolutionary constraints resulting in different folding and stabilization mechanisms.
doi:10.1186/1472-6807-6-17
PMCID: PMC1574322  PMID: 16857062
12.  Observation of intermediate states of the human prion protein by high pressure NMR spectroscopy 
Background
Prions, the causative agents of transmissible spongiform encephalopathies (TSEs) in humans and animals, are composed of the infectious isomer, PrPSc, of the cellular prion protein, PrPC. The conversion of PrPC, and thus its propensity to adopt alternative folds, leads to the species-specific propagation of the disease. High pressure is a powerful tool for studying the physico-chemical properties of proteins as well as the dynamics and structure of folding intermediates.
Results
Conformational intermediates of the human prion protein huPrPC were characterized by combining hydrostatic pressure (up to 200 MPa) with two-dimensional NMR spectroscopy. All pressure effects were reversible, and there is virtually no difference in the overall pressure response between the folded core of the N-terminally truncated huPrPC(121–230) and the full-length huPrPC(23–230). The only significant differences in the pressure response of full-length and truncated PrP suggest that E168, H187, T192, E207, E211 and Y226 are involved in a transient interaction with the unfolded N-terminus. High-pressure NMR spectroscopy indicates that the folded core of the human prion protein occurs in two structural states, N1 and N2, in solution, associated with a rather small difference in free enthalpy (3.0 kJ/mol). At atmospheric pressure, approximately 29% of the protein is already in the pressure-favoured conformation N2. A second process represents two possible folding intermediates, I1 and I2, with corresponding average free enthalpies of 10.8 and 18.6 kJ/mol. These could represent preaggregation states of the protein that coexist at ambient pressure with very small populations of approximately 1.2% and less than 0.1%, respectively. Furthermore, the pressure response of the N-terminus indicates that four different regions are in fast equilibrium with non-random structural states whose populations are shifted by pressure.
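The link between free-enthalpy differences, pressure and populations can be sketched with Boltzmann weighting. This is a minimal two-state model, assuming ideal two-state behaviour and a pressure-independent volume change ΔV; the delta_v_ml parameter is hypothetical here, since the abstract gives no volume changes. Because 1 MPa multiplied by 1 mL/mol equals 1 J/mol, the pressure term needs no unit conversion.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def excited_population(delta_g_kj: float, temperature_k: float = 298.15,
                       pressure_mpa: float = 0.0, delta_v_ml: float = 0.0) -> float:
    """Fraction of molecules in the higher-energy state of a two-state equilibrium.

    delta_g_kj is the free-enthalpy difference at ambient pressure; the pressure
    term p*dV (MPa * mL/mol = J/mol) lowers it when dV is negative.
    """
    delta_g = delta_g_kj * 1000.0 + pressure_mpa * delta_v_ml  # J/mol
    k = math.exp(-delta_g / (R * temperature_k))                 # equilibrium constant
    return k / (1.0 + k)


if __name__ == "__main__":
    print(f"{excited_population(10.8):.2%}")  # roughly 1.3% for I1-like energetics
    print(f"{excited_population(18.6):.2%}")
    # A hypothetical dV of -50 mL/mol at 200 MPa strongly shifts the equilibrium.
    print(f"{excited_population(10.8, pressure_mpa=200.0, delta_v_ml=-50.0):.2%}")
```

This simplified model is not meant to reproduce the authors' fitted populations exactly; it only shows how ΔG and pressure set the balance between states.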
Conclusion
We identified pressure-stabilized folding intermediates of the human prion protein. The regions that most strongly reflect the transition to the intermediate states are the β1/α1 loop and the solvent-exposed side of α3. The most pressure-sensitive region (representing mainly intermediate I1) is the loop between β-strand 1 and α-helix 1 (residues 139–141), indicating that this region might be the first entry point for the infectious conformer to convert the cellular protein.
doi:10.1186/1472-6807-6-16
PMCID: PMC1557509  PMID: 16846506
13.  Saturating representation of loop conformational fragments in structure databanks 
Background
Short protein fragments are fundamental starting points in various structure prediction applications, such as fragment-based loop modeling methods and various full-structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks.
Results
We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross-referenced with available structural fragments in the Protein Data Bank (Structure Space). We found that the expansion of the PDB in the last few years has resulted in dense coverage of loop conformational fragments. For each loop of length 8 in the current Sequence Space, there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops, we found that 50% sequence identity generally guarantees structural similarity. The coverage at the 50% sequence identity cutoff drops to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13 and 14, respectively. There is not a single loop in the current Sequence Space, at any length up to 14 residues, that is not matched with a conformational segment sharing at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and as high as 50% for loops of 10 residues or shorter. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that, while the number of sequentially unique sequence segments increased about sixfold during the last five years, there are almost no unique conformational segments among these fragments up to 12 residues long.
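Each coverage figure above answers one question per query loop: does Structure Space contain a fragment of the same length at or above a given sequence identity? A minimal sketch, assuming ungapped comparison of equal-length fragments and a brute-force search (the study's clustering step is not reproduced); all sequences are hypothetical:

```python
from typing import Iterable, Sequence


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length fragments."""
    if not a or len(a) != len(b):
        raise ValueError("fragments must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def coverage(queries: Sequence[str], structure_fragments: Iterable[str],
             cutoff: float = 50.0) -> float:
    """Percentage of query loops matched by a same-length structural fragment
    with at least `cutoff` percent sequence identity."""
    if not queries:
        raise ValueError("no query loops given")
    fragments = list(structure_fragments)
    covered = sum(
        any(len(f) == len(q) and percent_identity(q, f) >= cutoff for f in fragments)
        for q in queries
    )
    return 100.0 * covered / len(queries)


if __name__ == "__main__":
    loops = ["ACDEFGHI", "KLMNPQRS"]          # hypothetical Sequence Space loops
    pdb_fragments = ["ACDEFGHK", "WWWWWWWW"]  # hypothetical Structure Space fragments
    print(coverage(loops, pdb_fragments))     # 50.0
```

At databank scale, the quadratic search here would be replaced by clustering or indexing, which is why the study clusters both spaces before cross-referencing them.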
Conclusion
The results suggest that fragment-based prediction approaches are no longer limited by the completeness of fragments in databanks but rather by the effectiveness of the scoring and search algorithms used to locate them. The current favorable coverage and the trends observed will be further accentuated by the progress of the Protein Structure Initiative, which targets new protein folds and ultimately aims to provide exhaustive coverage of structure space.
doi:10.1186/1472-6807-6-15
PMCID: PMC1574324  PMID: 16820050
14.  A general method for the unbiased improvement of solution NMR structures by the use of related X-Ray data, the AUREMOL-ISIC algorithm 
Background
Rapid and accurate three-dimensional structure determination of biological macromolecules is mandatory to keep up with the vast progress made in identifying primary sequence information. During the last few years, the amount of data deposited in the Protein Data Bank has increased substantially, providing additional information for novel structure determination projects. The key question is how to combine the available database information with the experimental data of the current project, ensuring that only relevant information is used and a correct structural bias is produced. For this purpose, a novel fully automated algorithm based on Bayesian reasoning has been developed. It allows structural information from different sources to be combined consistently to obtain high-quality structures from a limited set of experimental data. The new ISIC (Intelligent Structural Information Combination) algorithm is part of the larger AUREMOL software package.
Results
Our new approach was successfully tested by using the corresponding X-ray structures to improve the solution NMR structures of the Ras-binding domain of Byr2 from Schizosaccharomyces pombe, the Ras-binding domain of human RalGDS (calculated from a limited set of NMR data), and the immunoglobulin-binding domain of protein G from Streptococcus. In all test cases, clearly improved structures were obtained. The greatest danger in using data from other sources is a possible bias towards the added structure: in the worst case, the structure from the additional source is essentially reproduced instead of a refined target structure. We could clearly show that the ISIC algorithm handles these difficulties properly.
Conclusion
In summary, we present a novel fully automated method to combine strongly coupled knowledge from different sources. Combining it with validation tools such as the calculation of NMR R-factors considerably strengthens the impact of the method, since the improvement of the structures can be assessed quantitatively. The ISIC method can be applied to many similar problems where the quality of the obtained three-dimensional structures is limited by the available experimental data, such as the improvement of large NMR structures calculated from sparse experimental data or the refinement of low-resolution X-ray structures. Structures may also be refined using other available structural information, such as homology models.
doi:10.1186/1472-6807-6-14
PMCID: PMC1559696  PMID: 16800891
15.  Prediction of transmembrane helix orientation in polytopic membrane proteins 
Background
Membrane proteins account for up to 30% of coding sequences within genomes. However, their structure determination lags behind that of soluble proteins because of experimental difficulties. It is therefore important to develop reliable computational methods to predict the structures of membrane proteins.
Results
We present a method for predicting transmembrane (TM) helix orientation, an essential step in ab initio modeling of membrane proteins. Our method is based on a canonical model of the heptad repeat originally developed for coiled coils. We identify the helical surface patches that interface with lipid molecules with an accuracy of about 88% from sequence information alone, using an empirical scoring function, LIPS (LIPid-facing Surface), which combines the lipophilicity and conservation of residues in the helix. We test and discuss predictions of helix-lipid interfaces for 162 transmembrane helices from 18 polytopic membrane proteins and present the predicted orientations of TM helices in the TRPV1 channel. We also apply our method to two structures of homologous cytochrome b6f complexes and find a discrepancy in the assignment of TM helices from subunits PetG, PetN and PetL. The results of LIPS calculations and analysis of packing and H-bonding interactions support the helix assignment in the cytochrome b6f structure from green alga but not the assignment of TM helices in the cyanobacterial b6f structure.
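To illustrate how a heptad-repeat model turns a helix sequence into candidate lipid-facing surfaces, the sketch below defines seven faces, each made of positions i, i+3, i+4, i+7 and so on, and scores them by combining the hydrophobicity of the query sequence with residue variability in an alignment. The scoring is a toy stand-in, not the published LIPS function: the HYDROPHOBIC set and the equal-weight sum of the two terms are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

HYDROPHOBIC = set("AILMFVWC")  # assumed illustrative set, not the LIPS lipophilicity scale


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; higher means less conserved."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def face_indices(length: int, offset: int) -> List[int]:
    """Positions offset, offset+3, offset+4, offset+7, ... forming one helix face."""
    return [i for i in range(length) if (i - offset) % 7 in (0, 3, 4)]


def face_scores(alignment: Sequence[str]) -> List[float]:
    """Toy lipid-facing score for each of the seven heptad faces.

    alignment[0] is the query TM segment; all sequences are gap-free and aligned.
    Score = hydrophobic fraction on the face + mean normalized column entropy.
    """
    query = alignment[0]
    if len(query) < 7 or any(len(s) != len(query) for s in alignment):
        raise ValueError("need equal-length sequences of at least 7 residues")
    columns = ["".join(s[i] for s in alignment) for i in range(len(query))]
    max_entropy = math.log2(20)
    scores = []
    for offset in range(7):
        idx = face_indices(len(query), offset)
        lipo = sum(query[i] in HYDROPHOBIC for i in idx) / len(idx)
        variability = sum(column_entropy(columns[i]) for i in idx) / len(idx) / max_entropy
        scores.append(lipo + variability)
    return scores


if __name__ == "__main__":
    def helix(aa: str) -> str:
        # Toy 14-residue helix: face 0 carries aa, every other position is Ser.
        return "".join(aa if i % 7 in (0, 3, 4) else "S" for i in range(14))

    alignment = [helix(aa) for aa in "LIVF"]  # face 0 hydrophobic and variable
    scores = face_scores(alignment)
    print(max(range(7), key=scores.__getitem__))  # 0
```

The design reflects the intuition stated above: lipid-facing residues tend to be both lipophilic and less conserved than residues buried in helix-helix contacts.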
Conclusion
LIPS calculations can be used for the prediction of helix orientation in ab initio modeling of polytopic membrane proteins. We also show with the example of two cytochrome b6f structures that our method can identify questionable helix assignments in membrane proteins. The LIPS server is available online at .
doi:10.1186/1472-6807-6-13
PMCID: PMC1540425  PMID: 16792816
16.  Secondary structure spatial conformation footprint: a novel method for fast protein structure comparison and classification 
Background
Recently a new class of methods for fast protein structure comparison has emerged. We call the methods in this class projection methods as they rely on a mapping of protein structure into a high-dimensional vector space. Once the mapping is done, the structure comparison is reduced to distance computation between corresponding vectors. As structural similarity is approximated by distance between projections, the success of any projection method depends on how well its mapping function is able to capture the salient features of protein structure. There is no agreement on what constitutes a good projection technique and the three currently known projection methods utilize very different approaches to the mapping construction, both in terms of what structural elements are included and how this information is integrated to produce a vector representation.
Results
In this paper we propose a novel projection method that uses secondary structure information to produce the mapping. First, a diverse set of spatial arrangements of triplets of secondary structure elements, a set of structural models, is automatically selected. Then, each protein structure is mapped into a high-dimensional vector of "counts" or footprint, where each count corresponds to the number of times a given structural model is observed in the structure, weighted by the precision with which the model is reproduced. We perform the first comprehensive evaluation of our method together with all other currently known projection methods.
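Once each structure is projected onto a vector of weighted model counts, comparison becomes a vector-distance computation. The sketch below assumes that each detected occurrence of a secondary-structure-triplet model arrives with a precision weight in [0, 1] and uses cosine distance; the input format, the model IDs and the choice of metric are illustrative assumptions rather than the paper's exact definitions.

```python
import math
from typing import Dict, Iterable, Tuple

Footprint = Dict[str, float]


def footprint(occurrences: Iterable[Tuple[str, float]]) -> Footprint:
    """Sum precision weights per structural model to build a sparse count vector."""
    fp: Footprint = {}
    for model_id, precision in occurrences:
        fp[model_id] = fp.get(model_id, 0.0) + precision
    return fp


def cosine_distance(a: Footprint, b: Footprint) -> float:
    """1 - cosine similarity of two sparse footprints (0 = same direction)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical model IDs and precisions for two structures.
    s1 = footprint([("m1", 1.0), ("m1", 0.5), ("m2", 1.0)])
    s2 = footprint([("m1", 0.9), ("m3", 1.0)])
    print(s1, round(cosine_distance(s1, s2), 3))
```

Storing footprints as sparse dictionaries keeps comparisons cheap when only a few of the many structural models occur in any given protein.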
Conclusion
The results of our evaluation suggest that the type of structural information used by a projection method affects the ability of the method to detect structural similarity. In particular, our method that uses the spatial conformations of triplets of secondary structure elements outperforms other methods in most of the tests.
doi:10.1186/1472-6807-6-12
PMCID: PMC1526735  PMID: 16762072
17.  ProFace: a server for the analysis of the physicochemical features of protein-protein interfaces 
Background
Molecular recognition is all-pervasive in biology. Protein molecules are involved in enzyme regulation, immune response, signal transduction, oligomer assembly, etc. Delineating the physical and chemical features of the interface formed by protein-protein association would allow us, on the one hand, to better understand protein interaction networks and, on the other, to design molecules that can engage a given interface and thereby control protein function.
Results
ProFace is a suite of programs that takes as input a file containing the atomic coordinates of a multi-chain molecule and analyzes the interface between any two or more subunits. The interface residues are segregated into spatial patches (if such clustering is possible at an input threshold distance) and/or into core and rim regions. A number of physicochemical parameters defining the interface are tabulated. Among the output files, one lists the interacting residues across the interface. The results can be used to infer whether a particular interface belongs to a homodimeric molecule.
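The abstract does not specify how ProFace forms patches. One simple possibility, shown here only as an assumption, is single-linkage clustering: residues within the input threshold distance of each other are linked, and each connected group forms a patch. The choice of one representative coordinate per residue and the function name `spatial_patches` are hypothetical.

```python
import math


def spatial_patches(coords, threshold):
    """Group interface residues into spatial patches (hypothetical single-linkage sketch).

    coords: list of (x, y, z) positions, one per interface residue
            (e.g. C-alpha atoms; an assumption, not necessarily ProFace's choice).
    threshold: distance cutoff in angstroms; residues at most this far apart are
               linked, and patches are the connected components of that graph.
    Returns a list of patches, each a sorted list of residue indices.
    """
    n = len(coords)
    seen = [False] * n
    patches = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search collects every residue reachable through short links.
        seen[start] = True
        stack, patch = [start], []
        while stack:
            i = stack.pop()
            patch.append(i)
            for j in range(n):
                if not seen[j] and math.dist(coords[i], coords[j]) <= threshold:
                    seen[j] = True
                    stack.append(j)
        patches.append(sorted(patch))
    return patches


coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (30, 0, 0), (33, 0, 0)]
print(spatial_patches(coords, 5.0))
```

With a 5 Å cutoff, residues at x = 0, 3 and 6 Å fall into one patch even though the outer two are 6 Å apart; this chaining is characteristic of single linkage, and a different clustering rule would split them.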
Conclusion
A web server, ProFace (available at ), has been developed for dissecting protein-protein interfaces and deriving various physicochemical parameters.
doi:10.1186/1472-6807-6-11
PMCID: PMC1513576  PMID: 16759379
18.  Characterization of the family of Mistic homologues 
Background
Mistic is a unique Bacillus subtilis protein with virtually no detectable homologues in GenBank, which appears to integrate into the bacterial membrane despite an overall hydrophilic composition. These unusual properties have been shown to be useful for high-yield recombinant expression of other membrane proteins through fusion to the C-terminus of Mistic. To better understand the structure and function of Mistic, we systematically searched for and characterized homologous proteins among closely related bacteria.
Results
Three homologues of Mistic were found with 62% to 93% residue identity, all only 84 residues in length, corresponding to the C-terminal residues of B. subtilis Mistic. In every case, the Mistic gene was found partially overlapping a downstream gene for a K+ channel protein. Residue variation amongst these sequences is restricted to loop regions of the protein's structure, suggesting that secondary structure elements and overall fold have been conserved. Additionally, all three homologues retain the functional ability to chaperone fusion partners to the membrane.
Conclusion
The functional core of Mistic consists of 84 moderately conserved residues that are sufficient for membrane targeting and integration. Understanding the minimal structural and chemical complexity of Mistic will lead to insights into the mechanistic underpinnings of Mistic-chaperoned membrane integration, as well as how to optimize its use for the recombinant heterologous expression of other integral membrane proteins of interest.
doi:10.1186/1472-6807-6-10
PMCID: PMC1471793  PMID: 16704729
19.  Fold-recognition and comparative modeling of human α2,3-sialyltransferases reveal their sequence and structural similarities to CstII from Campylobacter jejuni 
Background
No eukaryotic sialyltransferase (SiaT) has had its 3-D structure determined so far. Sequence alignment algorithms such as BLAST and PSI-BLAST could not detect a homolog of these enzymes in the Protein Data Bank. SiaTs thus belong to the hard/medium target category of the CASP experiments. The objective of the current work is to model the 3-D structures of the human SiaTs that transfer sialic acid in α2,3-linkage, viz. ST3Gal I, II, III, IV, V and VI, using fold-recognition and comparative modeling methods. The pairwise sequence similarity among these six enzymes ranges from 41 to 63%.
Results
Unlike the sequence similarity servers, fold-recognition servers identified CstII, an α2,3/8 dual-activity SiaT from Campylobacter jejuni, as the homolog of all six ST3Gals; the level of sequence similarity between CstII and the ST3Gals is only 15–20%, and the similarity is restricted to well-characterized motif regions of the ST3Gals. Deriving template-target sequence alignments for the entire ST3Gal sequence was not straightforward: the fold-recognition servers could not find a template for the region preceding the L-motif or for the region between the L- and S-motifs. Multiple structural templates were identified to model these regions, and template identification, modeling and evaluation had to be performed iteratively to choose the most appropriate templates. The modeled structures have acceptable stereochemical properties and also provide qualitative rationalizations for some of the site-directed mutagenesis results reported in the literature. Apart from the predicted models, an unexpected but valuable finding from this study is the sequence and structural relatedness of family GT42 and family GT29 SiaTs.
Conclusion
The modeled 3-D structures can be used for docking and other modeling studies, and for the rational identification of residues to be mutated to impart desired properties such as altered stability, substrate specificity, etc. Several studies in the literature have focused on developing tools and/or servers for the large-scale/automated modeling of 3-D protein structures. In contrast, the present study focuses on modeling the 3-D structure of a specific protein of interest to biochemists and illustrates the associated difficulties. It also establishes a sequence/structure relationship between sialyltransferases of two distinct families.
doi:10.1186/1472-6807-6-9
PMCID: PMC1508147  PMID: 16620397
20.  A tale of two ferredoxins: sequence similarity and structural differences 
Background
Sequence similarity between proteins is usually considered a reliable indicator of homology. Pyruvate-ferredoxin oxidoreductase and quinol-fumarate reductase contain ferredoxin domains that bind [Fe-S] clusters and are involved in electron transport. Profile-based methods for sequence comparison, such as PSI-BLAST and HMMer, suggest statistically significant similarity between these domains.
Results
The sequence similarity between these ferredoxin domains resides in the area of the [Fe-S] cluster-binding sites. Although the overall folds of these ferredoxins bear no obvious similarity, the regions of sequence similarity display a remarkable local structural similarity. These short regions with pronounced sequence motifs are incorporated into completely different structural environments. In pyruvate-ferredoxin oxidoreductase (bacterial ferredoxin), the hydrophobic core of the domain is completed by two β-hairpins, whereas in quinol-fumarate reductase (α-helical ferredoxin), the cluster-binding motifs are part of a larger all-α-helical globin-like fold core.
Conclusion
Functionally meaningful sequence similarity may sometimes be reflected only in local structural similarity, but not in global fold similarity. If detected and used naively, such similarities may lead to incorrect fold predictions.
doi:10.1186/1472-6807-6-8
PMCID: PMC1459171  PMID: 16603087
21.  Structural proteomics of minimal organisms: Conservation of protein fold usage and evolutionary implications 
Background
Determining the complete repertoire of protein structures for all soluble, globular proteins in a single organism has been one of the major goals of several structural genomics projects in recent years.
Results
We report that this goal has nearly been reached for several "minimal organisms" (parasites or symbionts with reduced genomes), for which over 95% of the soluble, globular proteins may now be assigned folds (overall 3-D backbone structures). We analyze the structures of these proteins as they relate to cellular functions, and compare the conservation of fold usage between functional categories. We also compare patterns of fold conservation among minimal organisms with those observed between minimal organisms and other bacteria.
Conclusion
We find that proteins performing essential cellular functions closely related to transcription and translation exhibit a higher degree of conservation in fold usage than proteins in other functional categories. Folds related to transcription and translation functional categories were also overrepresented in minimal organisms compared to other bacteria.
doi:10.1186/1472-6807-6-7
PMCID: PMC1488858  PMID: 16566839
22.  Statistical deconvolution of enthalpic energetic contributions to MHC-peptide binding affinity 
Background
MHC Class I molecules present antigenic peptides to cytotoxic T cells, a process that forms an integral part of the adaptive immune response. Peptides are bound within a groove formed by the MHC heavy chain. Previous approaches to MHC Class I-peptide binding prediction have largely concentrated on the peptide anchor residues located at the P2 and C-terminal positions.
Results
A large dataset comprising MHC-peptide structural complexes was created by re-modelling pre-determined X-ray crystallographic structures. Static energetic analysis, following energy minimisation, was performed on the dataset in order to characterise the interactions between bound peptides and the MHC Class I molecule, partitioning the interactions within the groove into van der Waals, electrostatic and total non-bonded energy contributions.
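The abstract does not state which force field was used. Assuming a generic form (a 12-6 Lennard-Jones term for van der Waals interactions, a Coulomb term with a constant dielectric for electrostatics, no cutoffs and no solvation), the Python sketch below shows how pairwise peptide-MHC interaction energies could be partitioned by peptide position into the three contributions named above. The atom tuple layout, the combining rules and the function name `decompose` are hypothetical.

```python
import math
from collections import defaultdict

# Approximate value of e^2 / (4*pi*eps0) in kcal*angstrom/(mol*e^2).
COULOMB = 332.06


def decompose(peptide_atoms, mhc_atoms, dielectric=1.0):
    """Per-position non-bonded interaction energy between a peptide and the MHC groove.

    peptide_atoms: list of (position, xyz, charge, epsilon, rmin_half)
    mhc_atoms:     list of (xyz, charge, epsilon, rmin_half)
    Returns {position: (e_vdw, e_elec, e_total)} in kcal/mol (hypothetical layout).
    """
    vdw = defaultdict(float)
    elec = defaultdict(float)
    for pos, xyz_p, q_p, eps_p, rh_p in peptide_atoms:
        for xyz_m, q_m, eps_m, rh_m in mhc_atoms:
            r = math.dist(xyz_p, xyz_m)
            eps = math.sqrt(eps_p * eps_m)  # geometric-mean well depth
            rmin = rh_p + rh_m              # sum of the two Rmin/2 radii
            ratio6 = (rmin / r) ** 6
            vdw[pos] += eps * (ratio6 ** 2 - 2.0 * ratio6)        # 12-6 Lennard-Jones
            elec[pos] += COULOMB * q_p * q_m / (dielectric * r)   # Coulomb
    # Total non-bonded energy per position is the sum of the two terms.
    return {p: (vdw[p], elec[p], vdw[p] + elec[p]) for p in sorted(vdw)}


res = decompose([(1, (0.0, 0.0, 0.0), 0.5, 0.1, 1.0)],
                [((2.0, 0.0, 0.0), -0.5, 0.1, 1.0)])
print(res)
```

In this toy example the two atoms sit exactly at their combined Rmin, so the van der Waals term equals minus the well depth (about -0.1 kcal/mol), while the opposite charges give a much larger attractive electrostatic term.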
Conclusion
The QSAR techniques of Genetic Function Approximation (GFA) and Genetic Partial Least Squares (G/PLS) algorithms were used to identify key interactions between the two molecules by comparing the calculated energy values with experimentally-determined BL50 data. Although the peptide termini binding interactions help ensure the stability of the MHC Class I-peptide complex, the central region of the peptide is also important in defining the specificity of the interaction. As thermodynamic studies indicate that peptide association and dissociation may be driven entropically, it may be necessary to incorporate entropic contributions into future calculations.
doi:10.1186/1472-6807-6-5
PMCID: PMC1435758  PMID: 16549002
23.  Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds 
Background
As tertiary structure is currently available for only a fraction of known protein families, it is important to assess which parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structures and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomics initiatives (SGI) provide such a sample? What are the approximate total numbers of structure-based superfamilies and folds among soluble globular domains?
Results
To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring the dynamics of the assigned structure set over time, as experimentally solved structures accumulate. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To account for the observed bias, we propose a new non-parametric approach to estimating the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on the dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database.
Conclusion
The set of currently solved protein structures allows structure prediction for approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output should further reduce this bias in the future. The total numbers of structural superfamilies and folds in the COG database are estimated at ~4000 and ~1700, respectively. These numbers are, respectively, four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins.
doi:10.1186/1472-6807-6-6
PMCID: PMC1444916  PMID: 16549009
24.  Identification of similar regions of protein structures using integrated sequence and structure analysis tools 
Background
Understanding protein function from structure is a challenging problem. Sequence-based approaches for finding homology are widely used to annotate both structure and function. 3D structural information about protein domains and their interactions offers a view of structure-function relationships that complements sequence information. We have developed a web site and an API of web services that enable users to submit protein structures and identify statistically significant neighbors, together with the underlying structural environments responsible for each match, using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer-based superfamily predictions to give an integrated view of the prediction of SCOP superfamilies, EC numbers and GO terms, as well as identification of the protein structural environments associated with each prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest.
Results
Users can submit their own queries or use a structure already in the PDB. The databases that a user can currently query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70, CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect the hits found with PSI-BLAST and HMMer as well as with S-BLEST. We have evaluated how well annotation transfer can be performed for SCOP IDs, Gene Ontology (GO) IDs and EC numbers. The method is very efficient and fully automated, generally taking around fifteen minutes for a 400-residue protein.
Conclusion
With structural genomics initiatives determining structures with little, if any, functional characterization, the development of protein structure and function analysis tools is a necessary endeavor. We have developed a useful application towards solving this problem using common structure- and sequence-based analysis tools. These approaches can find statistically significant environments in a database of protein structures, and the method quantifies how closely each environment is associated with a predicted functional annotation.
doi:10.1186/1472-6807-6-4
PMCID: PMC1435900  PMID: 16526955
25.  Fold classification based on secondary structure – how much is gained by including loop topology? 
Background
It has been proposed that secondary structure information can be used to classify protein folds (to some extent). Since this method uses very limited information about protein structure, it is not surprising that it has a higher error rate than approaches that use a full 3D fold description. On the other hand, comparing 3D protein structures is computationally intensive. This raises the question of to what extent the error rate can be decreased with each new source of information, especially if the new information can still be used with simple alignment algorithms.
We consider whether information about closed loops can improve the accuracy of this approach. While the answer appears obvious, we had to overcome two challenges: first, how to encode and compare topological information in such a way that local alignment of strings properly identifies similar structures; second, how to properly measure the effect of new information in a large data sample.
We investigate alternative ways of computing and presenting this information.
Results
We used the set of beta proteins with at most 30% pairwise identity to test the approach; local alignment scores were used to build a tree of clusters, which was evaluated using a new log-odds cluster scoring function. In particular, we derive a closed formula for the probability of obtaining a given score by chance. Parameters of the local alignment function were optimized using a genetic algorithm.
Of the 81 folds that had more than one representative in our data set, log-odds scores registered significantly better clustering in 27 cases, significantly worse clustering in 6 cases, and small differences in the remaining cases. Various notions of significant change or average change were considered, and the results all pointed in the same direction.
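The abstract does not give the string encoding or the scoring scheme. As a hedged illustration of the underlying technique only, the Python sketch below computes a Smith-Waterman local alignment score between two secondary-structure strings with a linear gap penalty. The alphabet ('H', 'E', and any extra loop-topology symbols) and the match, mismatch and gap values are hypothetical placeholders, not the parameters optimized by the genetic algorithm in the study.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score between two symbol strings.

    a, b: strings over a hypothetical alphabet, e.g. 'H' (helix), 'E' (strand),
          plus extra symbols that could encode loop-topology information.
    match, mismatch, gap: placeholder scores (linear gap penalty).
    Only the best score is returned; no traceback is performed.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment start anywhere: this makes it local.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(local_alignment_score("EEHEE", "XEEHX"))
```

Here the best local match is the shared substring "EEH", scoring 3 × 2 = 6; extending it into the mismatching flanks would only lower the score. Keeping only two rows of the matrix keeps memory linear in the length of `b`, which matters when all pairs in a large data set must be scored.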
Conclusion
We found that, on average, properly presented information about loop topology noticeably improves the accuracy of the method, but the benefits vary between fold families as measured by the log-odds cluster score.
doi:10.1186/1472-6807-6-3
PMCID: PMC1434743  PMID: 16524467
