Results 1-23 (23)

1.  CASP10 results compared to those of previous CASP experiments 
Proteins  2013;82(Suppl 2):164-174.
We compare results of the community efforts in modeling protein structures in the tenth CASP experiment, with those in earlier CASPs, particularly in CASP5, a decade ago. There is a substantial improvement in template based model accuracy as reflected in more successful modeling of regions of structure not easily derived from a single experimental structure template, most likely reflecting intensive work within the modeling community in developing methods that make use of multiple templates, as well as the increased number of experimental structures available. Deriving structural information not obvious from a template is the most demanding as well as one of the most useful tasks that modeling can perform. Thus this is gratifying progress. By contrast, overall backbone accuracy of models appears little changed in the last decade. This puzzling result is explained by two factors – increased database size in some ways makes it harder to choose the best available templates, and the increased intrinsic difficulty of CASP targets, as experimental work has progressed to larger and more unusual structures. There is no detectable recent improvement in template free modeling, but again, this may reflect the changing nature of CASP targets.
PMCID: PMC4180100  PMID: 24150928
Protein Structure Prediction; Community Wide Experiment; CASP
2.  CASP9 results compared to those of previous CASP experiments 
Proteins  2011;79(Suppl 10):196-207.
The quality of structure models submitted to CASP9 is analyzed in the context of previous CASPs. Comparison methods are similar to those used in previous papers in this series, with the addition of new methods for looking at model quality in regions not covered by a single best structural template, alignment accuracy, and progress for template free models. Progress in this CASP was again modest, and statistically hard to validate. Nevertheless, there are several positive trends. There is an indication of improvement in overall model quality for the mid-range of template based modeling difficulty, methods for identifying the best model from a set generated have improved, and there are strong indications of progress in the quality of template free models of short proteins. In addition, the new examination of model quality in regions of model not covered by the best available template reveals better performance than had previously been apparent.
PMCID: PMC4180080  PMID: 21997643
Protein Structure Prediction; Community Wide Experiment; CASP
3.  Critical Assessment of Methods of Protein Structure Prediction (CASP) - Round IX 
Proteins  2011;79(Suppl 10):1-5.
This paper is an introduction to the special issue of the journal PROTEINS, dedicated to the ninth CASP experiment to assess the state of the art in protein structure modeling. The paper describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Methods for modeling protein structure continue to advance, although at a more modest pace than in the early CASP experiments. Developments of note are indications of improvement in model accuracy for some classes of target, an improved ability to choose the most accurate of a set of generated models, and evidence of improvement in accuracy for short ‘new fold’ models. In addition, a new analysis of regions of models not derivable from the most obvious template structure has revealed better performance than expected.
PMCID: PMC4180088  PMID: 21997831
Protein Structure Prediction; Community Wide Experiment; CASP
4.  Structural and Functional Impact of Cancer Related Missense Somatic Mutations 
Journal of molecular biology  2011;413(2):495-512.
A number of large scale cancer somatic genome sequencing projects are now identifying genetic alterations in cancers. Evaluation of the effects of these mutations is essential for understanding their contribution to tumorigenesis. We have used SNPs3D, a software suite originally developed for analyzing non-synonymous germ line variants, to identify single base mutations with a high impact on protein structure and function. Two machine learning methods are used, one identifying mutations that destabilize protein three dimensional structure, and the other utilizing sequence conservation, and detecting all types of effects on in vivo protein function. Incorporation of detailed structure information into the analysis allows detailed interpretation of the functional effects of mutations in specific cases. Data from a set of breast and colorectal tumors were analyzed. In known cancer genes, approaching 100% of mutations are found to impact protein function, supporting the view that these methods are appropriate for identifying driver mutations. Overall, 50% to 60% of all somatic missense mutations are predicted to have a high impact on structural stability or to more generally affect the function of the corresponding proteins. This value is similar to the fraction of all possible missense mutations that have high impact, and much higher than the corresponding one for human population SNPs, at about 30%. The majority of mutations in tumor suppressors destabilize protein structure, while mutations in oncogenes operate in more varied ways, including destabilization of the less active conformational states. The set of high impact mutations encompasses the possible drivers.
PMCID: PMC4177034  PMID: 21763698
missense mutation; machine learning; support vector machine; protein structure; oncogene; tumor suppressor
5.  Protein Stability and in Vivo Concentration of Missense Mutations in Phenylalanine Hydroxylase 
Proteins  2011;80(1):61-70.
A previous computational analysis of missense mutations linked to monogenic disease found a high proportion of missense mutations affect protein stability, rather than other aspects of protein structure and function. The purpose of the present study is to relate the presence of such stability damaging missense mutations to the levels of a particular protein present under ‘in vivo’ like conditions, and to test the reliability of the computational methods. Experimental data on a set of missense mutations of the enzyme phenylalanine hydroxylase (PAH) associated with the monogenic disease phenylketonuria (PKU) have been compared with the expected in vivo impact on protein function, obtained using SNPs3D, an in silico analysis package. A high proportion of the PAH mutations are predicted to be destabilizing. The overall agreement between predicted stability impact and experimental evidence for lower protein levels is in accordance with the estimated error rates of the methods. For these mutations, destabilization of protein three dimensional structure is the major molecular mechanism leading to PKU, and results in a substantial reduction of in vivo PAH protein concentration. Although of limited scale, the results support the view that destabilization is the most common mechanism by which missense mutations cause monogenic disease. In turn, this conclusion suggests the general therapeutic strategy of developing drugs targeted at restoring wild type stability.
PMCID: PMC4170182  PMID: 21953985
missense mutation; machine learning; support vector machine; protein structure; protein stability; phenylketonuria (PKU)
6.  GWAS and drug targets 
BMC Genomics  2014;15(Suppl 4):S5.
Genome wide association studies (GWAS) have revealed a large number of links between genome variation and complex disease. Among other benefits, it is expected that these insights will lead to new therapeutic strategies, particularly the identification of new drug targets. In this paper, we evaluate the power of GWAS studies to find drug targets by examining how many existing drug targets have been directly 'rediscovered' by this technique, and the extent to which GWAS results may be leveraged by network information to discover known and new drug targets.
We find that only a very small fraction of drug targets are directly detected in the relevant GWAS studies. We investigate two possible explanations for this observation. First, we find evidence of negative selection acting on drug target genes as a consequence of strong coupling with the disease phenotype, so reducing the incidence of SNPs linked to the disease. Second, we find that GWAS genes are substantially longer on average than drug targets and than all genes, suggesting there is a length related bias in GWAS results. In spite of the low direct relationship between drug targets and GWAS reported genes, we found these two sets of genes are closely coupled in the human protein network. As a consequence, machine-learning methods are able to recover known drug targets based on network context and the set of GWAS reported genes for the same disease. We show the approach is potentially useful for identifying drug repurposing opportunities.
Although GWA studies do not directly identify most existing drug targets, there are several reasons to expect that new targets will nevertheless be discovered using these data. Initial results on drug repurposing studies using network analysis are encouraging and suggest directions for future development.
PMCID: PMC4083410  PMID: 25057111
7.  Target Highlights in CASP9: Experimental Target Structures for the Critical Assessment of Techniques for Protein Structure Prediction 
Proteins  2011;79(Suppl 10):6-20.
One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, a so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae.
PMCID: PMC3692002  PMID: 22020785
CASP; protein structure; X-ray crystallography; NMR; structure prediction
8.  Structural basis for the mechanism and substrate specificity of glycocyamine kinase, a phosphagen kinase family member
Biochemistry  2010;49(9):2031-2041.
Glycocyamine kinase (GK), a member of the phosphagen kinase family, catalyzes the Mg2+-dependent reversible phosphoryl group transfer of the N-phosphoryl group of phospho glycocyamine to ADP to yield glycocyamine and ATP. This reaction helps to maintain the energy homeostasis of the cell in some multicellular organisms that encounter high and variable energy turnover. GK from the marine worm Namalycastis sp. is heterodimeric, with two homologous polypeptide chains, α and β, derived from a common pre-mRNA by mutually exclusive N-terminal alternative exons. The N-terminal exon of GKβ encodes a peptide that is different in sequence and is sixteen amino acids longer than that encoded by the N-terminal exon of GKα. The crystal structures of recombinant GKαβ and GKββ from Namalycastis sp. were determined at 2.6 Å and 2.4 Å resolution, respectively. In addition, the structure of the GKββ was determined at 2.3 Å resolution in complex with a transition state analog, Mg2+-ADP-NO3--glycocyamine. Consistent with the sequence homology, the GK subunits adopt the same overall fold as that of other phosphagen kinases of known structure (the homodimeric creatine kinase (CK) and the monomeric arginine kinase (AK)). As with CK, the GK N-termini mediate the dimer interface. In both heterodimeric and homodimeric GK forms, the conformations of the two N-termini are asymmetric and the asymmetry is different from that reported previously for the homodimeric CKs from several organisms. The entire polypeptide chains of GKαβ are structurally defined and the longer N-terminus of the β subunit is anchored at the dimer interface. In GKββ the 24 N-terminal residues of one subunit and 11 N-terminal residues of the second subunit are disordered. This observation is consistent with a proposal that the GKαβ amino acids involved in the interface formation were optimized once a heterodimer emerged as the physiological form of the enzyme.
As a consequence, the homodimer interface (either solely α or solely β chains) has been corrupted. In the unbound state, GK exhibits an open conformation analogous to that observed with ligand-free CK or AK. Upon binding the transition state analog, both subunits of GK undergo the same closure motion that clasps the transition state analog, in contrast to the transition state analog complexes of CK, where the corresponding transition state analog occupies only one subunit, which undergoes domain closure. The active site environments of the GK, CK and AK at the bound states reveal the structural determinants of substrate specificity. Despite the equivalent binding in both active sites of the GK heterodimer, the conformational asymmetry of the N-termini is retained. Thus, the coupling between the structural asymmetry and negative cooperativity previously proposed for CK is not supported in the case of GK.
PMCID: PMC3519428  PMID: 20121101
9.  Evaluation of disorder predictions in CASP9 
Proteins  2011;79(S10):107-118.
Lack of stable three-dimensional structure, or intrinsic disorder, is a common phenomenon in proteins. Naturally unstructured regions have been shown to be essential for the function of many proteins, and therefore identification of such regions is an important issue. CASP has been assessing the state of the art in predicting disorder regions from amino acid sequence since 2002. Here we present the results of the evaluation of the disorder predictions submitted to CASP9. The assessment is based on the evaluation measures and procedures used in previous CASPs. The balanced accuracy and the Matthews correlation coefficient were chosen as basic measures for evaluating the correctness of binary classifications. The area under the receiver operating characteristic curve was the measure of choice for evaluating probability-based predictions of disorder. The CASP9 methods are shown to perform slightly better than the CASP7 methods but not better than the methods in CASP8. It was also shown that the capability of most CASP9 methods to predict disorder decreases with increasing minimum disorder segment length.
PMCID: PMC3212657  PMID: 21928402
CASP; intrinsically disordered proteins; unstructured proteins; prediction of disordered regions; assessment of disorder prediction
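The two binary-classification measures named in the abstract above can be illustrated with a minimal sketch. This is not the CASP assessors' code: the function names and the 0/1 per-residue labelling (1 = disordered, 0 = ordered) are assumptions made for illustration, and the formulas are the standard textbook definitions of balanced accuracy and the Matthews correlation coefficient.

```python
import math


def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN for two equal-length sequences of 0/1 labels.

    Assumed convention (not from the source): 1 = disordered residue, 0 = ordered.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity; a class with no members contributes 0."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-residue labels for one short target.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(truth, pred)
print(balanced_accuracy(*counts), matthews_cc(*counts))
```

Both measures are insensitive to the ordered/disordered class imbalance in a way that plain accuracy is not, which is presumably why they suit this kind of evaluation.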
10.  Neighbor Overlap Is Enriched in the Yeast Interaction Network: Analysis and Implications 
PLoS ONE  2012;7(6):e39662.
The yeast protein-protein interaction network has been shown to have distinct topological features such as a scale free degree distribution and a high level of clustering. Here we analyze an additional feature which is called Neighbor Overlap. This feature reflects the number of shared neighbors between a pair of proteins. We show that Neighbor Overlap is enriched in the yeast protein-protein interaction network compared with control networks carefully designed to match the characteristics of the yeast network in terms of degree distribution and clustering coefficient. Our analysis also reveals that pairs of proteins with high Neighbor Overlap have higher sequence similarity, more similar GO annotations and stronger genetic interactions than pairs with low ones. Finally, we demonstrate that pairs of proteins with redundant functions tend to have high Neighbor Overlap. We suggest that a combination of three mechanisms is the basis for this feature: The abundance of protein complexes, selection for backup of function, and the need to allow functional variation.
PMCID: PMC3383679  PMID: 22761860
11.  Protein Characterization of a Candidate Mechanism SNP for Crohn's Disease: The Macrophage Stimulating Protein R689C Substitution 
PLoS ONE  2011;6(11):e27269.
High throughput genome-wide association studies (GWAS) are now identifying a large number of genome loci related to risk of common human disease. Each such locus presents a challenge in identifying the relevant underlying mechanism. Here we report the experimental characterization of a proposed causal single nucleotide polymorphism (SNP) in a locus related to risk of Crohn's disease and ulcerative colitis. The SNP lies in the MST1 gene encoding Macrophage Stimulating Protein (MSP), and results in an R689C amino acid substitution within the β-chain of MSP (MSPβ). MSP binding to the RON receptor tyrosine kinase activates signaling pathways involved in the inflammatory response. We have purified wild-type and mutant MSPβ proteins and compared biochemical and biophysical properties that might impact the MSP/RON signaling pathway. Surface plasmon resonance (SPR) binding studies showed that MSPβ R689C affinity to RON is approximately 10-fold lower than that of the wild-type MSPβ and differential scanning fluorimetry (DSF) showed that the thermal stability of the mutant MSPβ was slightly lower than that of wild-type MSPβ, by 1.6 K. The substitution was found not to impair the specific Arg483-Val484 peptide bond cleavage by matriptase-1, required for MSP activation, and mass spectrometry of tryptic fragments of the mutated protein showed that the free thiol introduced by the R689C mutation did not form an aberrant disulfide bond. Together, the studies indicate that the missense SNP impairs MSP function by reducing its affinity to RON and perhaps through a secondary effect on in vivo concentration arising from reduced thermodynamic stability, resulting in down-regulation of the MSP/RON signaling pathway.
PMCID: PMC3210151  PMID: 22087277
14.  Community-wide assessment of GPCR structure modeling and docking understanding 
Nature reviews. Drug discovery  2009;8(6):455-463.
With the recent breakthroughs in G protein-coupled receptor structure, one can now compare experimentally determined structures with the most recent modeling and docking methods. A community-wide blind prediction experiment (GPCR Dock 2008) was conducted in coordination with the publication of the crystal structure of the human adenosine A2A receptor bound to the ligand ZM241385 (Science 322, 1211 (2008)). Twenty-nine participating groups submitted 206 models that were evaluated for the accuracy of the ligand binding mode and the overall receptor model. Several new insights emerged including the critical importance of disulfide bonds in the extracellular loops, helix residue registry, and domain knowledge.
PMCID: PMC2728591  PMID: 19461661
15.  Composition bias and the origin of ORFan genes 
Bioinformatics  2010;26(8):996-999.
Motivation: Intriguingly, sequence analysis of genomes reveals that a large number of genes are unique to each organism. The origin of these genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple measure called ‘composition bias’, based on the deviation of the amino acid composition of a given sequence from the average composition of all proteins of a given genome.
Results: For a set of 47 prokaryotic genomes, we show that the amino acid composition bias of real proteins, random ‘proteins’ (created by using the nucleotide frequencies of each genome) and ‘proteins’ translated from intergenic regions are distinct. For ORFans, we observed a correlation between their composition bias and their relative evolutionary age. Recent ORFan proteins have compositions more similar to those of random ‘proteins’, while the compositions of more ancient ORFan proteins are more similar to those of the set of all proteins of the organism. This observation is consistent with an evolutionary scenario wherein ORFan genes emerged and underwent a large number of random mutations and selection, eventually adapting to the composition preference of their organism over time.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2853687  PMID: 20231229
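The 'composition bias' measure in the abstract above is described only as the deviation of a sequence's amino acid composition from the genome-wide average; the exact formula is not given here. The sketch below therefore assumes a Euclidean distance between 20-dimensional frequency vectors, and averages per-protein compositions to obtain the genome average. Both choices, and all function names, are illustrative assumptions rather than the authors' definition.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def average_composition(sequences):
    """Average the per-protein compositions over all proteins of a genome (assumed scheme)."""
    comps = [composition(seq) for seq in sequences]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}


def composition_bias(sequence, genome_average):
    """Euclidean distance between a sequence's composition and the genome average.

    The distance metric is an assumption; the source does not specify it.
    """
    comp = composition(sequence)
    return math.sqrt(sum((comp[aa] - genome_average[aa]) ** 2 for aa in AMINO_ACIDS))


# Hypothetical two-protein 'genome' used only to exercise the functions.
genome = ["AAGG", "AGAG"]
average = average_composition(genome)
print(composition_bias("AAGG", average), composition_bias("AAAA", average))
```

Under this assumed definition, a sequence matching the genome average scores zero and the score grows as the composition diverges, which matches the qualitative behaviour the abstract describes for recent versus ancient ORFans.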
16.  Outcome of a Workshop on Applications of Protein Models in Biomedical Research 
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
PMCID: PMC2739730  PMID: 19217386
17.  Stochastic noise in splicing machinery 
Nucleic Acids Research  2009;37(14):4873-4886.
The number of known alternative human isoforms has been increasing steadily with the amount of available transcription data. To date, over 100 000 isoforms have been detected in EST libraries, and at least 75% of human genes have at least one alternative isoform. In this paper, we propose that most alternative splicing events are the result of noise in the splicing process. We show that the number of isoforms and their abundance can be predicted by a simple stochastic noise model that takes into account two factors: the number of introns in a gene and the expression level of a gene. The results strongly support the hypothesis that most alternative splicing is a consequence of stochastic noise in the splicing machinery, and has no functional significance. The results are also consistent with error rates tuned to ensure that an adequate level of functional product is produced and to reduce the toxic effect of accumulation of misfolding proteins. Based on simulation of sampling of virtual cDNA libraries, we estimate that error rates range from 1 to 10% depending on the number of introns and the expression level of a gene.
PMCID: PMC2724286  PMID: 19546110
18.  Structural implication of splicing stochastics 
Nucleic Acids Research  2009;37(14):4862-4872.
Even though nearly every human gene has at least one alternative splice form, very little is so far known about the structure and function of resulting protein products. It is becoming increasingly clear that a significant fraction of all isoforms are products of noisy selection of splice sites and thus contribute little to actual functional diversity, and may potentially be deleterious. In this study, we examine the impact of alternative splicing on protein sequence and structure in three datasets: alternative splicing events conserved across multiple species, alternative splicing events in genes that are strongly linked to disease and all observed alternative splicing events. We find that the vast majority of all alternative isoforms result in unstable protein conformations. In contrast to that, the small subset of isoforms conserved across species tends to maintain protein structural integrity to a greater extent. Alternative splicing in disease-associated genes produces unstable structures just as frequently as all other genes, indicating that selection to reduce the effects of alternative splicing on this set is not especially pronounced. Overall, the properties of alternative spliced proteins are consistent with the outcome of noisy selection of splice sites by splicing machinery.
PMCID: PMC2724273  PMID: 19528068
19.  Critical assessment of methods of protein structure prediction—Round VII 
Proteins  2007;69(S8):3-9.
This paper is an introduction to the supplemental issue of the journal PROTEINS, dedicated to the seventh CASP experiment to assess the state of the art in protein structure prediction. The paper describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are improvements in model accuracy relative to that obtainable from knowledge of a single best template structure; convergence of the accuracy of models produced by automatic servers toward that produced by human modeling teams; the emergence of methods for predicting the quality of models; and rapidly increasing practical applications of the methods.
PMCID: PMC2653632  PMID: 17918729
protein structure prediction; community wide experiment; CASP
21.  Proteopedia - a scientific 'wiki' bridging the rift between three-dimensional structure and function of biomacromolecules 
Genome Biology  2008;9(8):R121.
Proteopedia is an interactive wiki-style web resource that presents 3D structural and functional information in a user-friendly manner and allows real-time community annotation.
Many scientists lack the background to fully utilize the wealth of solved three-dimensional biomacromolecule structures. Thus, a resource is needed to present structure/function information in a user-friendly manner to a broad scientific audience. Proteopedia is an interactive, wiki web-resource whose pages have embedded three-dimensional structures surrounded by descriptive text containing hyperlinks that change the appearance (view, representations, colors, labels) of the adjacent three-dimensional structure to reflect the concept explained in the text.
PMCID: PMC2575511  PMID: 18673581
22.  Rigorous performance evaluation in protein structure modelling and implications for computational biology 
In principle, given the amino acid sequence of a protein, it is possible to compute the corresponding three-dimensional structure. Methods for modelling structure based on this premise have been under development for more than 40 years. For the past decade, a series of community wide experiments (termed Critical Assessment of Structure Prediction (CASP)) have assessed the state of the art, providing a detailed picture of what has been achieved in the field, where we are making progress, and what major problems remain. The rigorous evaluation procedures of CASP have been accompanied by substantial progress. Lessons from this area of computational biology suggest a set of principles for increasing rigor in the field as a whole.
PMCID: PMC1609338  PMID: 16524833
protein structure prediction; community wide experiment; critical assessment of structure prediction; computational biology
23.  SNPs3D: Candidate gene and SNP selection for association studies 
BMC Bioinformatics  2006;7:166.
The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level.
The resource has three primary modules. One module identifies which genes are candidates for involvement in a specified disease. A second module provides information about the relationships between sets of candidate genes. The third module analyzes the likely impact of non-synonymous SNPs on protein function. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using simple but effective text profiling. SNP/protein function relationships are derived by two methods, one using principles of protein structure and stability, the other based on sequence conservation. Entries for each gene include a number of links to other data, such as expression profiles, pathway context, mouse knockout information and papers. Gene-gene interactions are presented in an interactive graphical interface, providing rapid access to the underlying information, as well as convenient navigation through the network. Use of the resource is illustrated with aspects of the inflammatory response and hypertension.
The combination of SNP impact analysis, a knowledge based network of gene relationships and candidate genes, and access to a wide range of data and literature allow a user to quickly assimilate available information, and so develop models of gene-pathway-disease interaction.
PMCID: PMC1435944  PMID: 16551372
