PMCC

1.  Disease-Associated Mutations Disrupt Functionally Important Regions of Intrinsic Protein Disorder 
PLoS Computational Biology  2012;8(10):e1002709.
The effects of disease mutations on protein structure and function have been extensively investigated, and many predictors of the functional impact of single amino acid substitutions are publicly available. Most of these predictors are based on protein structure and evolutionary conservation, following the assumption that disease mutations predominantly affect folded and conserved protein regions. However, the prevalence of intrinsically disordered proteins (IDPs) and regions (IDRs) in the human proteome, together with their lack of fixed structure and low sequence conservation, raises the question of what impact disease mutations in IDRs have. Here, we investigate annotated missense disease mutations and show that 21.7% of them are located within such intrinsically disordered regions. We further demonstrate that 20% of disease mutations in IDRs cause local disorder-to-order transitions, a 1.7- to 2.7-fold increase compared with annotated polymorphisms and neutral evolutionary substitutions, respectively. Secondary structure predictions show elevated rates of transition from helices and strands into loops, and vice versa, in the disease mutation dataset. Disease disorder-to-order mutations also affect predicted molecular recognition features (MoRFs) more often than the control mutations do. The repertoire of disorder-to-order transition mutations is limited: the five most frequent mutations (R→W, R→C, E→K, R→H, R→Q) collectively account for 44% of all deleterious disorder-to-order transitions. As a proof of concept, we performed accelerated molecular dynamics simulations on a deleterious disorder-to-order transition mutation of tumor protein p63 and, in agreement with our predictions, observed an increased α-helical propensity of the region harboring the mutation. Our findings highlight the importance of mutations in IDRs and refine the traditional structure-centric view of disease mutations. 
The results of this study offer a new perspective on the role of mutations in disease, with implications for improving predictors of the functional impact of missense mutations.
Author Summary
Intrinsically unstructured or disordered proteins have been implicated in the etiology of a wide spectrum of diseases. However, the molecular mechanisms that relate mutations in intrinsically disordered regions (IDRs) to disease pathogenesis have not been investigated. Disordered proteins do not conform to the prevailing view of deleterious mutations, which links function to structure and evolutionary conservation: intrinsically disordered regions are functional, but lack a fixed three-dimensional structure and in general have low sequence conservation. Here we demonstrate that >20% of disease-associated missense mutations affect IDRs and interfere with their functions. We further show that 20% of deleterious mutations in IDRs induce predicted disorder-to-order transitions. Our predictions are supported by accelerated molecular dynamics simulations that show an increase in helical propensity of the region harboring a disease disorder-to-order transition mutation of tumor protein p63. Our results refine the traditional structure-centric view of disease mutations and offer a new perspective on the role of non-synonymous mutations in disease. Our findings have broad implications for improving predictors of the functional impact of missense mutations, and for the interpretation of novel variants identified in large genome sequencing projects that aim to provide a better understanding of human genetic variation and its relevance to common diseases.
doi:10.1371/journal.pcbi.1002709
PMCID: PMC3464192  PMID: 23055912
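The abstract above reports rates of predicted disorder-to-order transitions, fold increases relative to control sets, and the share of transitions taken by the five most frequent substitutions. The sketch below shows only the bookkeeping behind such numbers, under stated assumptions: each mutation is represented by hypothetical per-residue disorder scores for the wild-type and mutant sequences, a score of 0.5 or more is read as "disordered" (an illustrative cut-off, not necessarily the one the study used), and substitutions are written as hypothetical strings such as "R>W". It does not run any disorder predictor.

```python
from collections import Counter

# ASSUMPTION: disorder scores lie in [0, 1] and >= 0.5 means "disordered".
# This cut-off is illustrative only.
DISORDER_THRESHOLD = 0.5


def is_d_to_o(wt_score: float, mut_score: float,
              threshold: float = DISORDER_THRESHOLD) -> bool:
    """True if the site is predicted disordered in the wild type
    but ordered after the amino acid substitution."""
    return wt_score >= threshold and mut_score < threshold


def d_to_o_rate(score_pairs) -> float:
    """Fraction of (wt_score, mut_score) pairs that are D->O transitions."""
    pairs = list(score_pairs)
    if not pairs:
        return 0.0
    return sum(is_d_to_o(w, m) for w, m in pairs) / len(pairs)


def fold_increase(case_rate: float, control_rate: float) -> float:
    """Ratio of transition rates, e.g. disease mutations vs. polymorphisms."""
    return case_rate / control_rate


def top_k_share(substitutions, k: int = 5):
    """Most frequent substitutions and the fraction of all events they cover."""
    counts = Counter(substitutions)
    top = counts.most_common(k)
    return top, sum(n for _, n in top) / sum(counts.values())
```

With the abstract's 20% rate for disease mutations, `fold_increase(0.20, control_rate)` would return the fold change for whatever control rate is plugged in; the control rates themselves are not given in the abstract.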
2.  Intrinsic Disorder in the Human Spliceosomal Proteome 
PLoS Computational Biology  2012;8(8):e1002641.
The spliceosome is a molecular machine that excises introns from eukaryotic pre-mRNAs. In human cells, this macromolecular complex comprises five RNAs and over one hundred proteins. In recent years, many spliceosomal proteins have been found to exhibit intrinsic disorder, that is, to lack a stable native three-dimensional structure in solution. Building on the previous body of proteomic, structural and functional data, we have carried out a systematic bioinformatics analysis of intrinsic disorder in the proteome of the human spliceosome. We found that almost half of the combined sequence of proteins abundant in the spliceosome is predicted to be intrinsically disordered, at least when the individual proteins are considered in isolation. The distribution of intrinsic order and disorder throughout the spliceosome is uneven, and is related to the various functions performed by the intrinsic disorder of the spliceosomal proteins in the complex. In particular, proteins involved in the secondary functions of the spliceosome, such as mRNA recognition, intron/exon definition and spliceosomal assembly and dynamics, are more disordered than proteins directly involved in assisting splicing catalysis. Conserved disordered regions in spliceosomal proteins are evolutionarily younger and less widespread than ordered domains of essential spliceosomal proteins at the core of the spliceosome, suggesting that disordered regions were added to a preexisting ordered functional core. Finally, the spliceosomal proteome contains a much higher amount of intrinsic disorder predicted to lack secondary structure than the proteome of the ribosome, another large RNP machine. This result is consistent with the different functions currently recognized for proteins in these two complexes.
Author Summary
In eukaryotic cells, introns are spliced out of protein-coding mRNAs by a highly dynamic and extraordinarily plastic molecular machine called the spliceosome. In recent years, multiple regions of intrinsic structural disorder have been found in spliceosomal proteins. Intrinsically disordered regions lack a stable native three-dimensional structure in solution, which makes them structurally flexible and/or able to switch between different conformations. Hence, intrinsically disordered regions are ideal candidates for mediating the spliceosome's plasticity. Intrinsically disordered regions are also frequently the sites of post-translational modifications, which have also been shown to be important in spliceosome dynamics. In this article, we describe the results of a structural bioinformatics analysis focused on intrinsic disorder in the spliceosomal proteome. We systematically analyzed all known human spliceosomal proteins with regard to the presence and type of intrinsic disorder. Almost half of the combined sequence of these spliceosomal proteins is predicted to be intrinsically disordered, and the type of intrinsic disorder in a protein varies with its function and its location in the spliceosome. The parts of the spliceosome that act earlier in the process are more disordered, which corresponds to their role in establishing a network of interactions, while the parts that act later are more ordered.
doi:10.1371/journal.pcbi.1002641
PMCID: PMC3415423  PMID: 22912569
3.  Duplications of the Neuropeptide Receptor VIPR2 Confer Significant Risk for Schizophrenia 
Nature  2011;471(7339):499-503.
Rare copy number variants (CNVs) play a prominent role in the etiology of schizophrenia and other neuropsychiatric disorders (ref. 1). Substantial risk for schizophrenia is conferred by large (>500 kb) CNVs at several loci, including microdeletions at 1q21.1 (ref. 2), 3q29 (ref. 3), 15q13.3 (ref. 2) and 22q11.2 (ref. 4) and microduplication at 16p11.2 (ref. 5). However, these CNVs collectively account for a small fraction (2–4%) of cases, and the relevant genes and neurobiological mechanisms are not well understood. Here we performed a large two-stage genome-wide scan of rare CNVs and report the significant association of copy number gains at chromosome 7q36.3 with schizophrenia (P = 4.0×10⁻⁵, OR = 16.14 [3.06, ∞]). Microduplications with variable breakpoints occurred within a 362 kb region and were detected in 29 of 8,290 (0.35%) patients versus two of 7,431 (0.03%) controls in the combined sample (P = 5.7×10⁻⁷, OR = 14.1 [3.5, 123.9]). All duplications overlapped or were located within 89 kb upstream of the vasoactive intestinal peptide receptor VIPR2. VIPR2 transcription and cyclic-AMP signaling were significantly increased in cultured lymphocytes from patients with microduplications of 7q36.3. These findings implicate altered VIP signaling in the pathogenesis of schizophrenia and suggest VIPR2 as a potential target for the development of novel antipsychotic drugs.
doi:10.1038/nature09884
PMCID: PMC3351382  PMID: 21346763
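The combined-sample counts in the abstract above (29 of 8,290 patients, 2 of 7,431 controls) can be turned into carrier percentages and a crude 2×2 odds ratio with simple arithmetic. This is only a back-of-envelope check: the crude, unstratified OR computed this way comes out near 13.0, somewhat below the reported 14.1, which presumably reflects the authors' own estimation procedure; the abstract does not describe that procedure, and the function names below are illustrative.

```python
def carrier_percent(carriers: int, total: int) -> float:
    """Percentage of individuals carrying the duplication."""
    return 100.0 * carriers / total


def crude_odds_ratio(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """Unstratified 2x2 odds ratio (a*d)/(b*c).

    NOTE: a simple illustration, not the estimator used in the study.
    """
    a = case_carriers                   # cases with the duplication
    b = n_cases - case_carriers         # cases without it
    c = control_carriers                # controls with the duplication
    d = n_controls - control_carriers   # controls without it
    return (a * d) / (b * c)


cases_pct = carrier_percent(29, 8290)     # about 0.35%
controls_pct = carrier_percent(2, 7431)   # about 0.03%
crude_or = crude_odds_ratio(29, 8290, 2, 7431)
print(f"{cases_pct:.2f}% vs {controls_pct:.2f}%, crude OR = {crude_or:.1f}")
```

With only two carrier controls, any odds ratio estimate here is very imprecise, which is also why the reported confidence intervals are so wide.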
4.  Disease mutations in disordered regions—exception to the rule?† 
Molecular Biosystems  2011;8(1):27-32.
Intrinsically disordered proteins (IDPs) have been implicated in a number of human diseases, including cancer, diabetes, neurodegenerative and cardiovascular disorders. Although for some of these conditions molecular mechanisms are now better understood, the big picture connecting the distinct structural properties and functional repertoire of IDPs to pathogenesis and disease progression is still incomplete. Recent studies suggest that signaling and regulatory roles carried out by IDPs require them to be tightly regulated, and that altered IDP abundance may lead to disease. Here, we propose another link between IDPs and disease that takes into account disease-associated missense mutations located in intrinsically disordered regions. We argue that such mutations are more prevalent and have larger functional impact than previously thought. In addition, we demonstrate that deleterious amino acid substitutions that cause disorder-to-order transitions are particularly enriched among disease mutations compared to neutral polymorphisms. Finally, we discuss potential differences in functional outcomes between disease mutations in ordered and disordered regions, and challenge the conventional structure-centric view of missense mutations.
doi:10.1039/c1mb05251a
PMCID: PMC3307532  PMID: 22080206
5.  Mapping copy number variation by population scale genome sequencing 
Nature  2011;470(7332):59-65.
Summary
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
doi:10.1038/nature09708
PMCID: PMC3077050  PMID: 21293372
6.  Identification, Analysis and Prediction of Protein Ubiquitination Sites 
Proteins  2010;78(2):365-380.
Summary
Ubiquitination plays an important role in many cellular processes and is implicated in many diseases. Experimental identification of ubiquitination sites is challenging due to the rapid turnover of ubiquitinated proteins and the large size of the ubiquitin modifier. We identified 141 new ubiquitination sites using a combination of liquid chromatography, mass spectrometry and mutant yeast strains. Investigation of the sequence biases and structural preferences around known ubiquitination sites indicated that their properties were similar to those of intrinsically disordered protein regions. Using a combined set of new and previously known ubiquitination sites, we developed a random forest predictor of ubiquitination sites, UbPred. The class-balanced accuracy of UbPred reached 72%, with the area under the ROC curve at 80%. Application of UbPred showed that high-confidence Rsp5 ubiquitin ligase substrates and proteins with very short half-lives were significantly enriched in predicted ubiquitination sites. Proteome-wide prediction of ubiquitination sites in Saccharomyces cerevisiae indicated that highly ubiquitinated substrates were prevalent among transcription/enzyme regulators and proteins involved in cell cycle control. In the human proteome, cytoskeletal, cell cycle, regulatory and cancer-associated proteins display a higher extent of ubiquitination than proteins from other functional categories. We show that the gain and loss of predicted ubiquitination sites may represent a molecular mechanism behind a number of disease-associated mutations. UbPred is available at http://www.ubpred.org
doi:10.1002/prot.22555
PMCID: PMC3006176  PMID: 19722269
UbPred; protein ubiquitination sites; prediction; post-translational modification; intrinsically disordered protein; unstructured; disordered
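The UbPred abstract above reports two evaluation measures: class-balanced accuracy (the mean of sensitivity and specificity, which is insensitive to the imbalance between sites and non-sites) and the area under the ROC curve. The sketch below computes both from hypothetical labels and predictor scores; the 0.5 decision threshold is an assumption for illustration, and nothing here reproduces UbPred itself. AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of sensitivity and specificity at a fixed score threshold.

    ASSUMPTION: label 1 = ubiquitination site, 0 = non-site;
    threshold 0.5 is illustrative.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(balanced_accuracy(labels, scores), roc_auc(labels, scores))
```

The pairwise AUC is quadratic in the number of examples; for large datasets a rank-based formulation gives the same value more efficiently.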
7.  Graphlet Kernels for Prediction of Functional Residues in Protein Structures 
Abstract
We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets), centered on the vertex of interest. A similarity measure between two vertices is expressed as the inner product of their respective count vectors and is used in a supervised learning framework to classify protein residues. We evaluated our method on two function prediction problems: identification of catalytic residues in proteins, which is a well-studied problem suitable for benchmarking, and the much less explored problem of predicting phosphorylation sites in protein structures. The performance of the graphlet kernel approach was then compared against two alternative methods, a sequence-based predictor and our implementation of the FEATURE framework. On both tasks, the graphlet kernel performed favorably; however, the margin of difference was considerably larger on the problem of phosphorylation site prediction. Although there is evidence that phosphorylation sites are preferentially positioned in intrinsically disordered regions, we provide evidence that, for sites located in structured regions, neither surface accessibility alone nor the averaged measures calculated from the residue microenvironments utilized by FEATURE were sufficient to achieve high accuracy. The key benefit of the graphlet representation is its ability to capture neighborhood similarities in protein structures by enumerating the patterns of local connectivity in the corresponding labeled graphs.
doi:10.1089/cmb.2009.0029
PMCID: PMC2921594  PMID: 20078397
algorithms; graphs; kernel methods; machine learning; protein structure; protein function
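The graphlet kernel described above represents each residue by counts of labeled graphlets rooted at that residue and compares two residues by the inner product of their count vectors. Below is a minimal sketch of that idea, assuming graphlets of at most three vertices, residue labels given as one-letter codes, and a contact graph supplied as an adjacency dictionary (how contacts are defined is left out). The paper's graphlet sizes, labeling scheme and any normalization may differ; the key format used here is a hypothetical encoding that keeps the root's position (center or end of a path) distinct.

```python
from collections import Counter
from itertools import combinations


def rooted_graphlet_counts(adj, labels, v):
    """Count labeled graphlets (1-3 vertices) that contain root vertex v.

    ASSUMPTION: adj maps vertex -> set of neighbors; labels maps
    vertex -> residue label. Graphlets are induced, connected subgraphs.
    """
    counts = Counter()
    nbrs = adj[v]
    counts[("node", labels[v])] += 1
    for u in nbrs:
        counts[("edge", labels[v], labels[u])] += 1
    # Pairs of neighbors of v: a triangle if they touch, else a path centered on v.
    for a, b in combinations(sorted(nbrs), 2):
        kind = "triangle" if b in adj[a] else "path_center"
        counts[(kind, labels[v]) + tuple(sorted((labels[a], labels[b])))] += 1
    # Paths v-a-b where v sits at an end (b is not adjacent to v).
    for a in nbrs:
        for b in adj[a]:
            if b != v and b not in nbrs:
                counts[("path_end", labels[v], labels[a], labels[b])] += 1
    return counts


def graphlet_kernel(c1, c2):
    """Inner product of two sparse graphlet count vectors."""
    return sum(n * c2[key] for key, n in c1.items())


# Hypothetical 4-residue contact graph: 0-1, 1-2, 0-2, 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
labels = {0: "S", 1: "T", 2: "S", 3: "K"}
c0 = rooted_graphlet_counts(adj, labels, 0)
c2 = rooted_graphlet_counts(adj, labels, 2)
print(graphlet_kernel(c0, c2), graphlet_kernel(c0, c0))
```

Storing counts in a `Counter` keeps the vectors sparse; looking up a missing key returns 0 without inserting it, so the kernel only iterates over graphlets present in the first vector.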
9.  Loss of Post-Translational Modification Sites in Disease 
Understanding and predicting the molecular causes of disease is one of the major challenges for biology and medicine. One particular area of interest continues to be the computational analysis of disease-associated amino acid substitutions. To this end, various studies have been performed to identify molecular functions disrupted by disease-causing mutations. Here, we investigate the influence of disease-associated mutations on post-translational modifications. In particular, we study the loss of modification target sites as a consequence of disease mutation. We find that about 5% of disease-associated mutations may affect known modification sites, either partially (4%) or fully (1%), compared to about 2% of putatively neutral polymorphisms. Most of the fifteen post-translational modification types analyzed were found to be disrupted at levels higher than expected by chance. Molecular functions and physicochemical properties at sites of disease mutation were also compared to those of neutral polymorphisms involved in post-translational modification site disruption. Disease-associated mutations in the neighborhood of post-translationally modified sites were found to be enriched in mutations that change the polarity, charge, and hydrophobicity of the wild-type amino acids. Overall, these results further suggest that disruption of modification sites is an important, but not the major, cause of human genetic disease.
PMCID: PMC2813771  PMID: 19908386
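The abstract above splits affected modification sites into "full" and "partial" disruption without defining the terms here. One plausible reading, used below purely as an assumption, is that a mutation fully disrupts a site when it hits the modified residue itself and partially disrupts it when it falls within a flanking window around that residue (a hypothetical ±3 residues). The sketch shows how mutations could be tallied into those categories; the protein identifiers, positions and window size are all illustrative.

```python
from collections import Counter


def classify_site_loss(mutation_pos, site_positions, window=3):
    """Classify one mutation relative to known modification sites.

    ASSUMPTION: "full" = the mutated residue is a modified residue;
    "partial" = it lies within +/- window residues of one.
    The window size is hypothetical.
    """
    if mutation_pos in site_positions:
        return "full"
    if any(abs(mutation_pos - s) <= window for s in site_positions):
        return "partial"
    return "none"


def tally(mutations, sites, window=3):
    """Percentage of (protein_id, position) mutations in each category."""
    counts = Counter(
        classify_site_loss(pos, sites.get(prot, set()), window)
        for prot, pos in mutations
    )
    n = len(mutations)
    return {k: 100.0 * counts[k] / n for k in ("full", "partial", "none")}


mutations = [("P1", 10), ("P1", 12), ("P1", 40), ("P2", 5)]
sites = {"P1": {10}}
print(tally(mutations, sites))
```

Running the same tally over disease mutations and over neutral polymorphisms would give the kind of side-by-side percentages the abstract compares.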
10.  Unfoldomics of human diseases: linking protein intrinsic disorder with diseases 
BMC Genomics  2009;10(Suppl 1):S7.
Background
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) lack stable tertiary and/or secondary structure yet fulfill key biological functions. The recent recognition of IDPs and IDRs is leading to an entire field aimed at their systematic structural characterization and at the determination of their mechanisms of action. Bioinformatics studies showed that IDPs and IDRs are highly abundant in different proteomes and carry out mostly regulatory functions related to molecular recognition and signal transduction. These activities complement the functions of structured proteins. IDPs and IDRs were shown to participate in both one-to-many and many-to-one signaling. Alternative splicing and posttranslational modifications are frequently used to tune IDP functionality. Several individual IDPs were shown to be associated with human diseases, such as cancer, cardiovascular disease, amyloidoses, diabetes, neurodegenerative diseases, and others. This raises questions regarding the involvement of IDPs and IDRs in various diseases.
Results
IDPs and IDRs were shown to be highly abundant in proteins associated with various human maladies. As the number of IDPs related to various diseases was found to be very large, the concepts of the disease-related unfoldome and unfoldomics were introduced. Novel bioinformatics tools were proposed to populate and characterize the disease-associated unfoldome. Structural characterization of the members of the disease-related unfoldome requires specialized experimental approaches. IDPs possess a number of unique structural and functional features that determine their broad involvement in the pathogenesis of various diseases.
Conclusion
Proteins associated with various human diseases are enriched in intrinsic disorder. These disease-associated IDPs and IDRs are real, abundant, diversified, vital, and dynamic. These proteins and regions comprise the disease-related unfoldome, which covers a significant part of the human proteome. Profound association between intrinsic disorder and various human diseases is determined by a set of unique structural and functional characteristics of IDPs and IDRs. Unfoldomics of human diseases utilizes unrivaled bioinformatics and experimental techniques, paves the road for better understanding of human diseases, their pathogenesis and molecular mechanisms, and helps develop new strategies for the analysis of disease-related proteins.
doi:10.1186/1471-2164-10-S1-S7
PMCID: PMC2709268  PMID: 19594884
11.  Functional Anthology of Intrinsic Disorder. III. Ligands, Posttranslational Modifications and Diseases Associated with Intrinsically Disordered Proteins 
Journal of proteome research  2007;6(5):1917-1932.
Currently, understanding the relationships between function, amino acid sequence and protein structure remains one of the major challenges of modern protein science. As many as 50% of eukaryotic proteins are likely to contain functionally important long disordered regions. Many proteins are wholly disordered but still possess numerous biologically important functions. However, the number of experimentally confirmed disordered proteins with known biological functions is substantially smaller than their actual number in nature. Therefore, there is a crucial need for novel bioinformatics approaches that allow current knowledge to be projected from a few experimentally verified examples onto much larger groups of known and potential proteins. The first paper of this series (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.) described a bioinformatics tool for analyzing the functional diversity of intrinsically disordered proteins and its application to >200,000 proteins from the Swiss-Prot database, each annotated with at least one of 875 functional keywords. Using this tool, we found that out of the 711 Swiss-Prot functional keywords associated with at least 20 proteins, 262 were strongly positively correlated with long intrinsically disordered regions, and 302 were strongly negatively correlated. Illustrative examples of functional disorder or order were found for the vast majority of keywords showing the strongest positive or negative correlation with intrinsic disorder, respectively. 
Some 80 Swiss-Prot keywords associated with disorder- and order-driven biological processes and protein functions were described in the first paper (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). The second paper of the series presented 87 Swiss-Prot keywords attributed to cellular components, domains, technical terms, developmental processes and coding sequence diversities that possess strong positive and negative correlation with long disordered regions (Vucetic S., Xie H., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. II. Cellular components, domains, technical terms, developmental processes and coding sequence diversities correlated with long disordered regions. J. Proteome Res.). Protein structure and functionality can be modulated by various posttranslational modifications and/or by the binding of specific ligands. Numerous human diseases are associated with protein misfolding/misassembly/misfunctioning. This work concludes the series of papers dedicated to the functional anthology of intrinsic disorder and describes ~80 Swiss-Prot functional keywords related to ligands, posttranslational modifications and diseases that possess strong positive or negative correlation with the predicted long disordered regions in proteins.
doi:10.1021/pr060394e
PMCID: PMC2588348  PMID: 17391016
Intrinsic disorder; protein structure; protein function; intrinsically disordered proteins; bioinformatics; disorder prediction
12.  Functional Anthology of Intrinsic Disorder. II. Cellular Components, Domains, Technical Terms, Developmental Processes and Coding Sequence Diversities Correlated with Long Disordered Regions 
Journal of proteome research  2007;6(5):1899-1916.
Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes ~90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes and coding sequence diversities possessing strong positive and negative correlation with long disordered regions.
doi:10.1021/pr060393m
PMCID: PMC2588346  PMID: 17391015
Intrinsic disorder; protein structure; protein function; intrinsically disordered proteins; bioinformatics; disorder prediction
13.  Functional Anthology of Intrinsic Disorder. I. Biological Processes and Functions of Proteins with Long Disordered Regions 
Journal of proteome research  2007;6(5):1882-1898.
Identifying relationships between function, amino acid sequence and protein structure represents a major challenge. In this study we propose a bioinformatics approach that identifies functional keywords in the Swiss-Prot database that correlate with intrinsic disorder. A statistical evaluation is employed to rank the significance of these correlations. Protein sequence data redundancy and the relationship between protein length and protein structure were taken into consideration to ensure the quality of the statistical inferences. Over 200,000 proteins from the Swiss-Prot database were analyzed using this approach. The predictions of intrinsic disorder were carried out using PONDR VL3E, a predictor of long disordered regions that achieves an accuracy above 86%. Overall, out of the 710 Swiss-Prot functional keywords that were each associated with at least 20 proteins, 238 were found to be strongly positively correlated with predicted long intrinsically disordered regions, whereas 302 were strongly negatively correlated with such regions. The remaining 170 keywords were ambiguous, without strong positive or negative correlation with the disorder predictions. These functions cover a large variety of biological activities and imply that disordered regions are characterized by a wide functional repertoire. Our results agree well with literature findings, as we were able to find at least one illustrative example of functional disorder or order shown experimentally for the vast majority of keywords showing the strongest positive or negative correlation with intrinsic disorder. This work opens a series of three papers, which enrich the current view of protein structure-function relationships, especially with regard to the functionalities of intrinsically disordered proteins, and provide researchers with a novel tool that could be used to improve the understanding of the relationships between protein structure and function. 
The first paper of the series describes our statistical approach, outlines the major findings and provides illustrative examples of biological processes and functions positively and negatively correlated with intrinsic disorder.
doi:10.1021/pr060392u
PMCID: PMC2543138  PMID: 17391014
Intrinsic disorder; protein structure; protein function; intrinsically disordered proteins; bioinformatics; disorder prediction
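The abstract above ranks Swiss-Prot keywords by how strongly they correlate with predicted long disordered regions, but does not spell out the statistic. As a generic illustration only, and not the authors' procedure, the sketch below scores a keyword with a resampling z-score: the mean predicted disorder fraction of proteins carrying the keyword is compared against the means of many same-sized random samples from a background set. It ignores the redundancy and length corrections the abstract mentions, and the per-protein disorder fractions are assumed to come from some external predictor.

```python
import random
import statistics


def keyword_z_score(keyword_fractions, background_fractions,
                    n_samples=1000, seed=0):
    """Resampling z-score for one keyword.

    ASSUMPTION: each value is a protein's predicted disorder fraction
    in [0, 1]; a positive z means the keyword's proteins are more
    disordered than random same-sized background samples.
    """
    rng = random.Random(seed)  # fixed seed for reproducible scores
    k = len(keyword_fractions)
    observed = statistics.mean(keyword_fractions)
    null_means = [
        statistics.mean(rng.sample(background_fractions, k))
        for _ in range(n_samples)
    ]
    mu = statistics.mean(null_means)
    sd = statistics.stdev(null_means)
    return (observed - mu) / sd


# Hypothetical background of 100 proteins with evenly spread disorder fractions.
background = [i / 100 for i in range(100)]
print(keyword_z_score([0.90, 0.95, 0.97, 0.99], background))
print(keyword_z_score([0.01, 0.02, 0.03, 0.05], background))
```

Keywords could then be sorted by z, with large positive and large negative values corresponding to the "strongly positively" and "strongly negatively" correlated groups.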
14.  Intrinsic Disorder Is a Common Feature of Hub Proteins from Four Eukaryotic Interactomes 
PLoS Computational Biology  2006;2(8):e100.
Recent proteome-wide screening approaches have provided a wealth of information about interacting proteins in various organisms. To test for a potential association between protein connectivity and the amount of predicted structural disorder, the disorder propensities of proteins with various numbers of interacting partners from four eukaryotic organisms (Caenorhabditis elegans, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens) were investigated. The results of PONDR VL-XT disorder analysis show that for all four studied organisms, hub proteins, defined here as those that interact with ≥10 partners, are significantly more disordered than end proteins, defined here as those that interact with just one partner. The proportion of predicted disordered residues, the average disorder score, and the number of predicted disordered regions of various lengths were higher overall in hubs than in ends. A binary classification of hubs and ends into ordered and disordered subclasses using the consensus prediction method showed a significant enrichment of wholly disordered proteins and a significant depletion of wholly ordered proteins in hubs relative to ends in worm, fly, and human. The functional annotation of yeast hubs and ends using GO categories and the correlation of these annotations with disorder predictions demonstrate that proteins with regulation, transcription, and development annotations are enriched in disorder, whereas proteins with catalytic activity, transport, and membrane localization annotations are depleted in disorder. The results of this study demonstrate that intrinsic structural disorder is a distinctive and common characteristic of eukaryotic hub proteins, and that disorder may serve as a determinant of protein interactivity.
Synopsis
From the formulation of Emil Fischer's lock-and-key hypothesis in 1894 until the early 1990s, a dominating and widely accepted concept in molecular biology was the protein structure–function paradigm. According to this concept, a protein can perform its biological function(s) only after folding into a specific rigid 3-D structure. Only recently has the validity of this structure–function paradigm been seriously challenged, primarily through the wealth of counterexamples that have gradually accumulated over the past 15 years. These counterexamples demonstrated that many proteins exist in a natively unfolded (or intrinsically disordered) state, and function without a prerequisite stably folded structure. In many cases, the lack of structure is required for biological function. Previous results have implicated intrinsic disorder as having an important role in protein interactions. The authors generalize this notion by comparing interaction networks from four eukaryotic organisms: yeast, worm, fly, and human. They have found that within these networks the proteins that interact with multiple protein partners (network hubs) are significantly more disordered than proteins that interact with a single protein partner (network ends). The results of this study demonstrate that intrinsic structural disorder is a distinctive and common characteristic of hub proteins, and that disorder may serve as a determinant of protein interactivity.
doi:10.1371/journal.pcbi.0020100
PMCID: PMC1526461  PMID: 16884331
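The hub/end comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-residue disorder scores (for example, from a predictor such as PONDR VL-XT) are already available, and it treats a score of at least 0.5 as "disordered", which is an assumed cutoff. The function names are hypothetical.

```python
from statistics import mean

# Assumed cutoff: residues scoring >= 0.5 are counted as disordered.
DISORDER_THRESHOLD = 0.5


def fraction_disordered(scores, threshold=DISORDER_THRESHOLD):
    """Fraction of residues whose disorder score reaches the cutoff."""
    return sum(s >= threshold for s in scores) / len(scores)


def classify_by_degree(degree):
    """Hub: >= 10 interaction partners; end: exactly 1 partner."""
    if degree >= 10:
        return "hub"
    if degree == 1:
        return "end"
    return None  # intermediate connectivity is left out of the comparison


def compare_hubs_and_ends(proteins):
    """proteins maps a (hypothetical) protein name to (degree, per-residue scores).

    Returns the mean fraction of disordered residues for hubs and for ends.
    """
    groups = {"hub": [], "end": []}
    for degree, scores in proteins.values():
        label = classify_by_degree(degree)
        if label is not None:
            groups[label].append(fraction_disordered(scores))
    return {label: mean(values) for label, values in groups.items() if values}
```

With toy input such as `{"A": (12, [0.9, 0.8, 0.2, 0.6]), "B": (1, [0.1, 0.2, 0.7, 0.3])}`, protein A is a hub with 3 of 4 residues above the cutoff and B is an end with 1 of 4, so the function returns `{"hub": 0.75, "end": 0.25}`. The study itself also reports average scores and disordered-region counts, which this sketch does not compute.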
15.  Serine/arginine-rich splicing factors belong to a class of intrinsically disordered proteins 
Nucleic Acids Research  2006;34(1):305-312.
Serine/arginine-rich (SR) splicing factors play an important role in constitutive and alternative splicing as well as during several steps of RNA metabolism. Despite the wealth of functional information about SR proteins accumulated to date, structural knowledge about the members of this family is very limited. To gain a better insight into structure-function relationships of SR proteins, we performed extensive sequence analysis of SR protein family members and combined it with ordered/disordered structure predictions. We found that SR proteins have properties characteristic of intrinsically disordered (ID) proteins. The amino acid composition and sequence complexity of SR proteins were very similar to those of the disordered protein regions. More detailed analysis showed that the SR proteins, and their RS domains in particular, are enriched in the disorder-promoting residues and are depleted in the order-promoting residues as compared to the entire human proteome. Moreover, disorder predictions indicated that RS domains of SR proteins were completely unstructured. Two different classification methods, the charge-hydropathy measure and the cumulative distribution function (CDF) of the disorder scores, agreed with each other, and both strongly predicted members of the SR protein family to be disordered. This study emphasizes the importance of the disordered structure for several functions of SR proteins, such as spliceosome assembly and interaction with multiple partners. In addition, it demonstrates the usefulness of order/disorder predictions for inferring protein structure from sequence.
doi:10.1093/nar/gkj424
PMCID: PMC1326245  PMID: 16407336
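The charge-hydropathy measure mentioned above can be sketched as follows. This is a simplified illustration, not the authors' code. It averages Kyte-Doolittle hydropathy rescaled to the 0 to 1 range, computes the absolute mean net charge (K/R as +1, D/E as -1, His neutral), and compares the pair against the boundary ⟨H⟩ = (⟨R⟩ + 1.151) / 2.785 commonly quoted in the charge-hydropathy literature. That boundary is taken from outside this abstract and should be checked against the original method description. The sketch also omits any sliding-window smoothing the published method may apply. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_normalized_hydropathy(seq):
    """Rescale each value from [-4.5, 4.5] to [0, 1], then average over the sequence."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def charge_hydropathy_disordered(seq):
    """True if the sequence lies on the disordered side of the assumed boundary."""
    h = mean_normalized_hydropathy(seq)
    r = mean_net_charge(seq)
    boundary_h = (r + 1.151) / 2.785  # assumed literature boundary, see lead-in
    return h < boundary_h
```

For a hypothetical RS-repeat sequence such as `"RS" * 10`, the mean hydropathy (about 0.21) is well below the boundary (about 0.59), so the sketch calls it disordered. A strongly hydrophobic stretch such as `"IVLIVLFAIVL"` falls on the ordered side.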
16.  The importance of intrinsic disorder for protein phosphorylation 
Nucleic Acids Research  2004;32(3):1037-1049.
Reversible protein phosphorylation provides a major regulatory mechanism in eukaryotic cells. Because the residues flanking the relatively limited number of experimentally identified phosphorylation sites are highly variable, reliable prediction of such sites remains difficult. Here we report the development of a new web-based tool for the prediction of protein phosphorylation sites, DISPHOS (DISorder-enhanced PHOSphorylation predictor, http://www.ist.temple.edu/DISPHOS). We observed that amino acid compositions, sequence complexity, hydrophobicity, charge and other sequence attributes of regions adjacent to phosphorylation sites are very similar to those of intrinsically disordered protein regions. Thus, DISPHOS uses position-specific amino acid frequencies and disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites. Based on the estimates of phosphorylation rates in various protein categories, the outputs of DISPHOS are adjusted in order to reduce the total number of misclassified residues. When tested on an equal number of phosphorylated and non-phosphorylated residues, the accuracy of DISPHOS reaches 76% for serine, 81% for threonine and 83% for tyrosine. The significant enrichment in disorder-promoting residues surrounding phosphorylation sites, together with the results obtained by applying DISPHOS to various protein functional classes and proteomes, provides strong support for the hypothesis that protein phosphorylation predominantly occurs within intrinsically disordered protein regions.
doi:10.1093/nar/gkh253
PMCID: PMC373391  PMID: 14960716
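The idea of combining sequence context with disorder information around a candidate site can be sketched as a feature extractor. This is a deliberately simplified stand-in, not DISPHOS itself. It uses plain amino acid composition of a window around each S, T or Y instead of DISPHOS's position-specific frequencies, and it appends the mean disorder score of that window. The half-window of 12 residues and the function names are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_sites(sequence):
    """Zero-based positions of serine, threonine and tyrosine residues."""
    return [i for i, aa in enumerate(sequence) if aa in "STY"]


def site_features(sequence, disorder_scores, position, half_window=12):
    """Window composition (20 values) plus mean disorder for one candidate site.

    The window is clipped at the sequence ends rather than padded.
    """
    if sequence[position] not in "STY":
        raise ValueError("candidate site must be S, T or Y")
    start = max(0, position - half_window)
    end = min(len(sequence), position + half_window + 1)
    window = sequence[start:end]
    # Fraction of each standard amino acid within the window.
    composition = [window.count(aa) / len(window) for aa in AMINO_ACIDS]
    # Average predicted disorder over the same window.
    mean_disorder = sum(disorder_scores[start:end]) / len(window)
    return composition + [mean_disorder]
```

Feature vectors like these would then be passed to a classifier trained on known phosphorylated and non-phosphorylated sites. The choice of classifier and the rate-based output adjustment described in the abstract are outside the scope of this sketch.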
17.  Phosphorylation Variation during the Cell Cycle Scales with Structural Propensities of Proteins 
PLoS Computational Biology  2013;9(1):e1002842.
Phosphorylation at specific residues can activate a protein, lead to its localization to particular compartments, trigger its degradation, and fulfill many other biological functions. Protein phosphorylation is increasingly being studied at a large scale and in a quantitative manner that includes a temporal dimension. By contrast, structural properties of identified phosphorylation sites have so far been investigated in a static, non-quantitative way. Here we combine, for the first time, dynamic properties of the phosphoproteome with protein structural features. At six time points of the cell division cycle we investigate how the variation in the amount of phosphorylation correlates with the protein structure in the vicinity of the modified site. We find two distinct groups of phosphorylation sites: intrinsically disordered regions tend to contain sites with dynamically varying levels, whereas regions with predominantly regular secondary structures retain more constant phosphorylation levels. The two groups show preferences for different amino acids in their kinase recognition motifs: proline and other disorder-associated residues are enriched in the former group, and charged residues in the latter. Furthermore, these preferences scale with the degree of disorder, from regular to irregular and to disordered structures. Our results suggest that the structural organization of the region in which a phosphorylation site resides may serve as an additional control mechanism. They also imply that phosphorylation sites are associated with different time scales that serve different functional needs.
Author Summary
Cells employ protein phosphorylation – the addition of a phosphate group to serine, threonine or tyrosine residues – as a key regulatory mechanism for modulating protein function. Proteomics technologies can now quantify thousands of phosphorylation sites to reveal the dynamics of phosphorylation at each site in response to a biological process. It is known that phosphorylation does not occur randomly with regard to a protein's structure, but so far the relationship between the dynamics of phosphorylation and these structural properties has not been investigated. Here we relate the relative levels of phosphorylation for more than 5,000 sites through the cell cycle to the predicted structural features of the vicinity of the sites. We find that dynamic phosphorylation tends to occur in disordered regions, whereas phosphorylation sites that did not vary as much over the cell cycle are often located in defined secondary structure elements. Kinases that prefer charged amino acids in their substrate motifs are more often associated with unchanging sites, whereas proline-directed protein kinases more frequently phosphorylate cell cycle-regulated sites in disordered regions. The structural organization of the region in which a phosphorylation site resides may therefore serve as an additional control mechanism in kinase-mediated regulation.
doi:10.1371/journal.pcbi.1002842
PMCID: PMC3542066  PMID: 23326221
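The core comparison, how much a site's phosphorylation level varies across time points depending on whether it lies in a disordered region, can be sketched as follows. The abstract does not state which variation measure was used, so this sketch assumes the coefficient of variation (population standard deviation divided by the mean) as a stand-in. It also assumes each site already carries a disordered/ordered label from a structure predictor. Function names are hypothetical.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(levels):
    """Population standard deviation of a site's levels divided by their mean."""
    return pstdev(levels) / mean(levels)


def variation_by_structure(sites):
    """sites: list of (levels across time points, is_disordered) pairs.

    Returns the median coefficient of variation for each structural group.
    """
    groups = {"disordered": [], "ordered": []}
    for levels, is_disordered in sites:
        key = "disordered" if is_disordered else "ordered"
        groups[key].append(coefficient_of_variation(levels))
    return {key: median(values) for key, values in groups.items() if values}
```

In the pattern reported above, the median variation for the disordered group would exceed that of the ordered group. A real analysis would also need a statistical test and normalization of the quantified levels, which this sketch leaves out.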
18.  Binding of Two Intrinsically Disordered Peptides to a Multi-Specific Protein: A Combined Monte Carlo and Molecular Dynamics Study 
PLoS Computational Biology  2012;8(9):e1002682.
The unique ability of intrinsically disordered proteins (IDPs) to fold upon binding to partner molecules makes them functionally well-suited for cellular communication networks. For example, the folding-binding of different IDP sequences onto the same surface of an ordered protein provides a mechanism for signaling in a many-to-one manner. Here, we study the molecular details of this signaling mechanism by applying both Molecular Dynamics and Monte Carlo methods to S100B, a calcium-modulated homodimeric protein, and two of its IDP targets, p53 and TRTK-12. Despite adopting somewhat different conformations in complex with S100B and showing no apparent sequence similarity, the two IDP targets associate in virtually the same manner. As free chains, both target sequences remain flexible and sample their respective bound, natively α-helical states to a small extent. Association occurs through an intermediate state in the periphery of the S100B binding pocket, stabilized by nonnative interactions which are either hydrophobic or electrostatic in nature. Our results highlight the importance of overall physical properties of IDP segments, such as net charge or presence of strongly hydrophobic amino acids, for molecular recognition via coupled folding-binding.
Author Summary
A substantial fraction of our proteins is believed to be partly or completely disordered, meaning that they contain regions that lack a stable folded structure under typical physiological conditions. This feature plays a key role in their functions. For example, it allows them to have many structurally different binding partners, which in turn permits the construction of the intricate signaling and regulatory networks necessary to sustain complex biological organisms such as ourselves. Whereas measuring the binding strengths of associations involving disordered proteins is routine, the binding process itself is still not fully understood. We use two different computational models to study the interactions of a folded protein, S100B, which can bind various disordered peptides. In particular, we compare two peptides whose structures in complex with S100B are known. Our results suggest that, although the peptides assume different structures in the bound state, there are similarities in how they associate with S100B. The ability to model the interplay between proteins computationally is an important complement to experiments, because it can identify crucial steps in the binding process. This is essential for understanding, for example, how single mutations sometimes lead to serious diseases.
doi:10.1371/journal.pcbi.1002682
PMCID: PMC3441455  PMID: 23028280
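The Monte Carlo side of this study rests on the Metropolis acceptance rule: a move that lowers the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(−ΔE/kT). The sketch below applies that rule to a hypothetical two-state toy system ("bound" versus "free") with made-up energies. It only illustrates the sampling rule. It is not the authors' peptide/S100B model, which samples full molecular conformations.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: accept downhill moves, uphill with prob exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def sample_two_state(e_bound, e_free, kT, n_steps, seed=0):
    """Toy chain that always proposes switching state; returns fraction of steps bound."""
    rng = random.Random(seed)
    energies = {"bound": e_bound, "free": e_free}
    state = "free"
    bound_steps = 0
    for _ in range(n_steps):
        proposal = "bound" if state == "free" else "free"
        if metropolis_accept(energies[proposal] - energies[state], kT, rng):
            state = proposal
        if state == "bound":
            bound_steps += 1
    return bound_steps / n_steps
```

Because the proposal is symmetric, the sampled bound fraction should approach the Boltzmann value 1 / (1 + exp(−(E_free − E_bound)/kT)). With `e_bound=-1.0`, `e_free=0.0` and `kT=1.0`, that value is about 0.73.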
19.  A protein domain-based interactome network for C. elegans early embryogenesis 
Cell  2008;134(3):534-545.
Summary
Many protein-protein interactions are mediated through independently folding modular domains. Proteome-wide efforts to model protein-protein interaction or “interactome” networks have largely ignored this modular organization of proteins. We developed an experimental strategy to efficiently identify interaction domains and generated a domain-based interactome network for proteins involved in C. elegans early embryonic cell divisions. Minimal interacting regions were identified for over 200 proteins, providing important information on their domain organization. Furthermore, our approach increased the sensitivity of the two-hybrid system, resulting in a more complete interactome network. This interactome modeling strategy revealed new insights into C. elegans centrosome function and is applicable to other biological processes in this and other organisms.
doi:10.1016/j.cell.2008.07.009
PMCID: PMC2596478  PMID: 18692475
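One simple way to derive a minimal interacting region from several protein fragments that each bind the same partner is to take the residue range they all share. The sketch below assumes fragments are given as 1-based inclusive (start, end) residue ranges and returns their common overlap. This is an illustrative assumption about how such fragment data could be summarized, not a description of the authors' exact procedure. The function name is hypothetical.

```python
def minimal_interacting_region(fragments):
    """Common overlap of interacting fragments, or None if they share no residues.

    fragments: iterable of (start, end) residue ranges, 1-based and inclusive.
    """
    fragments = list(fragments)
    if not fragments:
        raise ValueError("at least one interacting fragment is required")
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None
```

For example, fragments spanning residues 10–200, 50–180 and 40–150 share residues 50–150. Non-overlapping fragments return `None`, which could indicate more than one independent binding region.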
