Results 1-15 (15)
 

2.  Alignment of multiple protein structures based on sequence and structure features 
Comparing the structures of proteins is crucial to gaining insight into protein evolution and function. Here, we align the sequences of multiple protein structures by a dynamic programming optimization of a scoring function that is a sum of an affine gap penalty and terms dependent on various sequence and structure features (SALIGN). The features include amino acid residue type, residue position, residue accessible surface area, residue secondary structure state and the conformation of a short segment centered on the residue. The multiple alignment is built by following the ‘guide’ tree constructed from the matrix of all pairwise protein alignment scores. Importantly, the method does not depend on the exact values of various parameters, such as feature weights and gap penalties, because the optimal alignment across a range of parameter values is found. Using multiple structure alignments in the HOMSTRAD database, SALIGN was benchmarked against MUSTANG for multiple alignments as well as against TM-align and CE for pairwise alignments. On average, SALIGN produces a 15% improvement in structural overlap over HOMSTRAD and 14% over MUSTANG, and yields more equivalent structural positions than TM-align and CE in 90% and 95% of cases, respectively. The utility of accurate multiple structure alignment is illustrated by its application to comparative protein structure modeling.
doi:10.1093/protein/gzp040
PMCID: PMC2909824  PMID: 19587024
multiple structure alignment; dynamic programming; guide tree; RMSD; structure overlap
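The abstract of entry 2 describes dynamic programming with an affine gap penalty. The sketch below illustrates that general technique only (a Gotoh-style three-matrix recurrence for the global alignment score of two strings); it is not SALIGN's implementation. The function name `affine_gap_score`, the simple match/mismatch score standing in for SALIGN's weighted sequence and structure features, and all numeric parameter values are assumptions chosen for illustration.

```python
from math import inf


def affine_gap_score(a, b, match=2.0, mismatch=-1.0,
                     gap_open=-3.0, gap_extend=-1.0):
    """Best global alignment score of a and b under an affine gap penalty.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    The match/mismatch score is a hypothetical stand-in for a
    feature-based residue similarity term.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[-inf] * (m + 1) for _ in range(n + 1)]
    X = [[-inf] * (m + 1) for _ in range(n + 1)]
    Y = [[-inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap is charged gap_open; continuing one, gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


identical = affine_gap_score("ACGT", "ACGT")
with_gap = affine_gap_score("ACGT", "AT")
```

With these assumed parameters, aligning "ACGT" to "AT" is best done with one gap of length two between the matched A and T, which is cheaper than two separately opened gaps; that preference for fewer, longer gaps is the point of an affine penalty.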
3.  A survey of integral α-helical membrane proteins 
Membrane proteins serve as cellular gatekeepers, regulators, and sensors. Prior studies have explored the functional breadth and evolution of proteins and families of particular interest, such as the diversity of transport-associated membrane protein families in prokaryotes and eukaryotes, the composition of integral membrane proteins, and family classification of all human G-protein coupled receptors. However, a comprehensive analysis of the content and evolutionary associations between membrane proteins and families in a diverse set of genomes is lacking. Here, a membrane protein annotation pipeline was developed to define the integral membrane genome and associations between 21,379 proteins from 34 genomes; most, but not all of these proteins belong to 598 defined families. The pipeline was used to provide target input for a structural genomics project that successfully cloned, expressed, and purified 61 of our first 96 selected targets in yeast. Furthermore, the methodology was applied (1) to explore the evolutionary history of the substrate-binding transmembrane domains of the human ABC transporter superfamily, (2) to identify the multidrug resistance-associated membrane proteins in whole genomes, and (3) to identify putative new membrane protein families.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-009-9069-8) contains supplementary material, which is available to authorized users.
doi:10.1007/s10969-009-9069-8
PMCID: PMC2780624  PMID: 19760129
Membrane proteins; Superfamily analysis; Multidrug resistance; ABC transporters; Target selection
4.  Structure of the Mammalian 80S Ribosome at 8.7Å Resolution 
We present a structure of the mammalian ribosome determined at ∼8.7Å resolution by cryo-electron microscopy. A molecular model was created by docking homology models of the subunit rRNAs and conserved ribosomal proteins into the density map. We also modeled expansion segments in the subunit rRNAs and identified the positions of 20 novel proteins. In general, we find that many ribosomal proteins interact with the expansion segments to form an integrated framework that may stabilize the mature ribosome. Importantly, our structure gives a snapshot of the mammalian ribosome before the binding of an A-site tRNA. The structure also provides additional support for the idea that movements of the small subunit and L1 stalk occur during the translocation of tRNAs. Finally, new details are presented about novel inter-subunit bridges in the eukaryotic ribosome. These bridges may help reset the conformation of the ribosomal subunits in preparation for the next cycle of chain elongation.
doi:10.1016/j.str.2008.01.007
PMCID: PMC2775484  PMID: 18400176
eukaryotic ribosome; protein translation; tRNA translocation; expansion segments
5.  Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies 
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member), to illustrate how this structure-focused approach can be used to generate hypotheses about sequence–structure–function relationships.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-008-9056-5) contains supplementary material, which is available to authorized users.
doi:10.1007/s10969-008-9056-5
PMCID: PMC2693957  PMID: 19219566
Amidohydrolase and enolase superfamilies; Structural genomics; Structure annotation; Target selection
6.  Target Selection and Annotation for the Structural Genomics of the Amidohydrolase and Enolase Superfamilies 
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member), to illustrate how this structure-focused approach can be used to generate hypotheses about sequence-structure-function relationships.
Summary
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) aimed at maximizing the structural coverage of the amidohydrolase and enolase superfamilies. We selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography, resulting in 61 unique structures thus far.
doi:10.1007/s10969-008-9056-5
PMCID: PMC2693957  PMID: 19219566
Amidohydrolase and enolase superfamilies; structural genomics; structure annotation; target selection
7.  A Kernel for Open Source Drug Discovery in Tropical Diseases 
Background
Conventional patent-based drug development incentives work badly for the developing world, where commercial markets are usually small to non-existent. For this reason, the past decade has seen extensive experimentation with alternative R&D institutions ranging from private–public partnerships to development prizes. Despite extensive discussion, however, one of the most promising avenues—open source drug discovery—has remained elusive. We argue that the stumbling block has been the absence of a critical mass of preexisting work that volunteers can improve through a series of granular contributions. Historically, open source software collaborations have almost never succeeded without such “kernels”.
Methodology/Principal Findings
Here, we use a computational pipeline for: (i) comparative structure modeling of target proteins, (ii) predicting the localization of ligand binding sites on their surfaces, and (iii) assessing the similarity of the predicted ligands to known drugs. Our kernel currently contains 143 and 297 protein targets from ten pathogen genomes that are predicted to bind a known drug or a molecule similar to a known drug, respectively. The kernel provides a source of potential drug targets and drug candidates around which an online open source community can nucleate. Using NMR spectroscopy, we have experimentally tested our predictions for two of these targets, confirming one and invalidating the other.
Conclusions/Significance
The TDI kernel, which is being offered under the Creative Commons attribution share-alike license for free and unrestricted use, can be accessed on the World Wide Web at http://www.tropicaldisease.org. We hope that the kernel will facilitate collaborative efforts towards the discovery of new drugs against parasites that cause tropical diseases.
Author Summary
Open source drug discovery, a promising alternative avenue to conventional patent-based drug development, has so far remained elusive with few exceptions. A major stumbling block has been the absence of a critical mass of preexisting work that volunteers can improve through a series of granular contributions. This paper introduces the results from a newly assembled computational pipeline for identifying protein targets for drug discovery in ten organisms that cause tropical diseases. We have also experimentally tested two promising targets for their binding to commercially available drugs, validating one and invalidating the other. The resulting kernel provides a base of drug targets and lead candidates around which an open source community can nucleate. We invite readers to donate their judgment and in silico and in vitro experiments to develop these targets to the point where drug optimization can begin.
doi:10.1371/journal.pntd.0000418
PMCID: PMC2667270  PMID: 19381286
8.  MODBASE, a database of annotated comparative protein structure models and associated resources 
Nucleic Acids Research  2008;37(Database issue):D347-D354.
MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5 152 695 reliable models for domains in 1 593 209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).
doi:10.1093/nar/gkn791
PMCID: PMC2686492  PMID: 18948282
9.  Prediction of enzyme function by combining sequence similarity and protein interactions 
BMC Bioinformatics  2008;9:249.
Background
A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners.
Results
The method was tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST.
Conclusion
Our method can also be used for proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.
doi:10.1186/1471-2105-9-249
PMCID: PMC2430716  PMID: 18505562
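Entry 9 reports specificity at a fixed coverage for predicting the first three EC digits. The abstract does not define these terms, so the sketch below assumes one common reading: coverage is the fraction of proteins that receive a prediction, and specificity is the fraction of those predictions that are correct. The function names and the toy EC numbers are hypothetical and are not data from the paper.

```python
def first_three_ec_digits(ec_number):
    """Truncate an EC number such as '3.5.1.4' to its first three digits."""
    return ".".join(ec_number.split(".")[:3])


def specificity_and_coverage(predicted, known):
    """Return (specificity, coverage) under the assumed definitions.

    predicted: dict mapping protein id -> predicted EC number, or None
               when the method makes no prediction.
    known:     dict mapping protein id -> annotated EC number.
    """
    made = {pid: ec for pid, ec in predicted.items() if ec is not None}
    correct = sum(
        1 for pid, ec in made.items()
        if first_three_ec_digits(ec) == first_three_ec_digits(known[pid])
    )
    coverage = len(made) / len(known)
    specificity = correct / len(made) if made else 0.0
    return specificity, coverage


# Toy data (hypothetical): five proteins, four predictions, three correct
# at the three-digit level.
known = {"p1": "3.5.1.4", "p2": "4.2.1.11", "p3": "5.1.2.2",
         "p4": "3.5.2.3", "p5": "1.1.1.1"}
predicted = {"p1": "3.5.1.9", "p2": "4.2.1.11", "p3": "5.1.3.2",
             "p4": "3.5.2.3", "p5": None}
spec, cov = specificity_and_coverage(predicted, known)
```

Comparing two methods at the same coverage, as the abstract does, keeps the specificity figures comparable, because a method that predicts for fewer proteins can usually be more selective.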
10.  DBAli tools: mining the protein structure space 
Nucleic Acids Research  2007;35(Web Server issue):W393-W397.
The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions.
doi:10.1093/nar/gkm236
PMCID: PMC1933139  PMID: 17478513
11.  MODBASE: a database of annotated comparative protein structure models and associated resources 
Nucleic Acids Research  2005;34(Database issue):D291-D295.
MODBASE is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB. Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP).
doi:10.1093/nar/gkj059
PMCID: PMC1347422  PMID: 16381869
12.  MODBASE, a database of annotated comparative protein structure models, and associated resources 
Nucleic Acids Research  2004;32(Database issue):D217-D222.
MODBASE (http://salilab.org/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on the MODELLER package for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE uses the MySQL relational database management system for flexible querying and CHIMERA for viewing the sequences and structures (http://www.cgl.ucsf.edu/chimera/). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different data sets. The largest data set contains 1 262 629 models for domains in 659 495 out of 1 182 126 unique protein sequences in the complete Swiss-Prot/TrEMBL database (August 25, 2003); only models based on alignments with significant similarity scores and models assessed to have the correct fold despite insignificant alignments are included. Another model data set supports target selection and structure-based annotation by the New York Structural Genomics Research Consortium; e.g. the 53 new structures produced by the consortium allowed us to characterize structurally 24 113 sequences. MODBASE also contains binding site predictions for small ligands and a set of predicted interactions between pairs of modeled sequences from the same genome.
Our other resources associated with MODBASE include a comprehensive database of multiple protein structure alignments (DBALI, http://salilab.org/dbali) as well as web servers for automated comparative modeling with MODPIPE (MODWEB, http://salilab.org/modweb), modeling of loops in protein structures (MODLOOP, http://salilab.org/modloop) and predicting functional consequences of single nucleotide polymorphisms (SNPWEB, http://salilab.org/snpweb).
doi:10.1093/nar/gkh095
PMCID: PMC308829  PMID: 14681398
13.  EVA: evaluation of protein structure prediction servers 
Nucleic Acids Research  2003;31(13):3311-3315.
EVA (http://cubic.bioc.columbia.edu/eva/) is a web server for evaluation of the accuracy of automated protein structure prediction methods. The evaluation is updated automatically each week, to cope with the large number of existing prediction servers and the constant changes in the prediction methods. EVA currently assesses servers for secondary structure prediction, contact prediction, comparative protein structure modelling and threading/fold recognition. Every day, sequences of newly available protein structures in the Protein Data Bank (PDB) are sent to the servers and their predictions are collected. The predictions are then compared to the experimental structures once a week; the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for a large number of proteins, ranging from hundreds to thousands, depending on the prediction method. This large sample assures that methods are compared reliably. As a result, EVA provides useful information to developers as well as users of prediction methods.
PMCID: PMC169025  PMID: 12824315
14.  Tools for comparative protein structure modeling and analysis 
Nucleic Acids Research  2003;31(13):3375-3380.
The following resources for comparative protein structure modeling and analysis are described (http://salilab.org): MODELLER, a program for comparative modeling by satisfaction of spatial restraints; MODWEB, a web server for automated comparative modeling that relies on PSI-BLAST, IMPALA and MODELLER; MODLOOP, a web server for automated loop modeling that relies on MODELLER; MOULDER, a CPU intensive protocol of MODWEB for building comparative models based on distant known structures; MODBASE, a comprehensive database of annotated comparative models for all sequences detectably related to a known structure; MODVIEW, a Netscape plugin for Linux that integrates viewing of multiple sequences and structures; and SNPWEB, a web server for structure-based prediction of the functional impact of a single amino acid substitution.
PMCID: PMC168950  PMID: 12824331
15.  MODBASE, a database of annotated comparative protein structure models 
Nucleic Acids Research  2002;30(1):255-259.
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10⁻⁴) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server.
PMCID: PMC99112  PMID: 11752309
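The MODBASE abstracts in entries 11, 12 and 15 each report how many unique sequences have at least one model out of the total in the source database. As a worked calculation, the sketch below turns those reported counts into percentages. The counts are taken from the abstracts; the variable and function names are illustrative, and because the three releases draw on different sequence databases the percentages are only roughly comparable.

```python
# (sequences with at least one model, total unique sequences), as reported
# in the 2002, 2004 and 2005/2006 MODBASE abstracts listed above.
releases = {
    "TrEMBL, 23 March 2001": (304_517, 539_171),
    "Swiss-Prot/TrEMBL, 25 August 2003": (659_495, 1_182_126),
    "UniProt, 5 July 2005": (1_094_750, 1_817_889),
}


def modeled_percentage(modeled, total):
    """Percentage of sequences with at least one model, to one decimal."""
    return round(100.0 * modeled / total, 1)


coverage_by_release = {
    name: modeled_percentage(modeled, total)
    for name, (modeled, total) in releases.items()
}

for name, pct in coverage_by_release.items():
    print(f"{name}: {pct}% of sequences have at least one model")
```

On these figures the modeled fraction stays between roughly 56% and 60% even as the number of sequences more than triples.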
