It is now widely recognized that intrinsically unstructured (or disordered) proteins (IUPs, or IDPs) are found in organisms from all kingdoms of life. In eukaryotes, IUPs are highly abundant and perform a wide range of biological functions, including regulation and signaling. Despite increased interest in understanding the structural biology of IUPs/IDPs, questions remain regarding the mechanisms through which disordered proteins perform their biological function(s). In other words, what are the relationships between disorder and function for IUPs? There are several excellent reviews that discuss the structural properties of IUPs/IDPs since 2005(1-3). Here, we briefly review general concepts pertaining to IUPs and then discuss our structural, biophysical, and biochemical studies of two IUPs, p21 and p27, which regulate the mammalian cell division cycle by inhibiting cyclin-dependent kinases (Cdks). Some segments of these two proteins are partially folded in isolation and they fold further upon binding their biological targets. Interestingly, some portions of p27 remain flexible after binding to and inhibiting Cdk2/cyclin A. This residual flexibility allows otherwise buried tyrosine residues within p27 to be phosphorylated by non-receptor tyrosine kinases (NRTKs). Tyrosine phosphorylation relieves kinase inhibition, triggering Cdk2-mediated phosphorylation of a threonine residue within the flexible C-terminus of p27. This, in turn, marks p27 for ubiquitination and proteasomal degradation, unleashing full Cdk2 activity which drives cell cycle progression. p27, thus, constitutes a conduit for transmission of proliferative signals via post-translational modifications. The term “conduit” is used here to connote a means of transmission of molecular signals which, in the case of p27, correspond to tyrosine and threonine phosphorylation, ubiquitination and, ultimately, proteolytic degradation. 
Transmission of these multiple signals is enabled by the inherent flexibility of p27 which persists even after tight binding to Cdk2/cyclin A. Importantly, activation of the p27 signaling conduit by oncogenic NRTKs contributes to tumorigenesis in some human cancers, including chronic myelogenous leukemia (CML) (4) and breast cancer (5). Other IUPs may participate in conceptually similar molecular signaling conduits and dysregulation of these putative conduits may contribute to other human diseases. Detailed study of these IUPs, both alone and within functional complexes, is required to test these hypotheses and to more fully understand the relationships between protein disorder and biological function.
intrinsically unstructured proteins; disordered proteins; cell signaling; cellular regulation; signaling conduit; post-translational modification; phosphorylation; poly-ubiquitination
Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as intrinsically unstructured proteins (IUPs). IUPs have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation.
Identifying IUPs is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity).
We find that augmenting the feature set with features derived from physicochemical properties of amino acids (such as hydrophobicity and complexity) and using an ensemble method are both beneficial. The IUP predictor is a viable alternative software tool for identifying IUP regions and proteins.
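As an illustration of the kind of physicochemical features mentioned above, the following minimal sketch computes a per-residue sliding-window hydrophobicity and a sequence-complexity value. It is not the authors' feature set: the Kyte-Doolittle scale, Shannon entropy as the complexity measure, the window size, and the function name `window_features` are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (an assumed choice of hydrophobicity index).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_features(seq, window=7):
    """Return one (mean hydropathy, Shannon entropy) pair per residue.

    The window is centred on each residue and truncated at the termini.
    Residues missing from the scale (e.g. 'X') contribute 0.0 hydropathy.
    """
    half = window // 2
    features = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        hydro = sum(KYTE_DOOLITTLE.get(a, 0.0) for a in segment) / len(segment)
        counts = Counter(segment)
        # Shannon entropy in bits: low values indicate low-complexity sequence.
        entropy = -sum(
            (c / len(segment)) * math.log2(c / len(segment))
            for c in counts.values()
        )
        features.append((hydro, entropy))
    return features
```

Feature pairs like these could be fed, one row per residue, to any classifier; low entropy combined with low hydropathy is a common heuristic signature of disorder, though real predictors use many more features.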
Although research during the past decade has shown the functional importance of disorder in proteins, many of the structural and dynamic properties of intrinsically unstructured proteins (IUPs) remain to be elucidated. This review is focused on the role of the extensions of the ribosomal proteins in the early steps of the assembly of the eubacterial 50 S subunit. The recent crystallographic structures of the ribosomal particles have revealed the picture of a complex assembly pathway that condenses the rRNA and the ribosomal proteins into active ribosomes. However, little is known about the molecular mechanisms of this process. It is thought that the long basic r-protein extensions that penetrate deeply into the subunit cores play a key role through disorder-order transitions and/or co-folding mechanisms. A current view is that such structural transitions may facilitate the proper rRNA folding. In this paper, the structures of the proteins L3, L4, L13, L20, L22 and L24, which have been experimentally found to be essential for the first steps of ribosome assembly, have been compared. On the basis of their structural and dynamic properties, three categories of extensions have been identified. Each of them seems to play a distinct function. Among them, only the coil-helix transition that occurs in a phylogenetically conserved cluster of basic residues of the L20 extension appears to be strictly required for the large subunit assembly in eubacteria. The role of α helix-coil transitions in 23 S RNA folding is discussed in the light of the calcium binding protein calmodulin, which shares many structural and dynamic properties with L20.
Flexibility; structural transitions; helix-coil; calmodulin; linker; electrostatic; helix unwinding; unfolding
Intrinsically disordered proteins have long stretches of polypeptide chain that, in the absence of binding partners, do not adopt a single native structure composed of stable secondary and tertiary structure. The prediction of intrinsically disordered regions in proteins from sequence is of increasing interest, as many such regions are being discovered in complete genome sequences and important functional roles are associated with them. We have developed a machine learning approach based on two support vector machines (SVMs) to discriminate disordered regions from sequence. The SVMs are trained and benchmarked on two sets, representing long and short disordered regions. A preliminary version of Spritz was shown to perform consistently well at the recent biennial CASP-6 experiment [Critical Assessment of Techniques for Protein Structure Prediction (CASP), 2004]. The fully developed Spritz method is freely available as a web server at and .
Motivation: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder.
Results: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely.
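The stepwise in silico mutation procedure described above can be sketched as follows. This is only an illustration under stated assumptions: substitutions are drawn uniformly from the 20 standard amino acids (the study may well have used background frequencies or another scheme), and the names `mutate_once`, `mutation_trajectory` and `identity` are hypothetical. In a real analysis, each sequence in the trajectory would be passed to a secondary structure or disorder predictor and the predictions compared with those for the native sequence.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate_once(seq, n_sites, rng):
    """Substitute n_sites distinct positions with a different residue each."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_sites):
        alternatives = [a for a in AMINO_ACIDS if a != chars[pos]]
        chars[pos] = rng.choice(alternatives)
    return "".join(chars)

def mutation_trajectory(seq, n_steps, n_sites, seed=0):
    """Return [native, step 1, ..., step n_steps], drifting toward random."""
    rng = random.Random(seed)  # seeded for reproducibility
    trajectory = [seq]
    for _ in range(n_steps):
        trajectory.append(mutate_once(trajectory[-1], n_sites, rng))
    return trajectory

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Plotting the change in predicted structure against `1 - identity(native, mutant)` along such a trajectory is one way to examine the proportionality between sequence change and prediction change reported in the Results.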
Supplementary Information: Supplementary data are available at Bioinformatics online.
Altered abundance of several intrinsically unstructured proteins (IUPs) has been associated with perturbed cellular signalling that may lead to pathological conditions such as cancer. Therefore, it is important to understand how cells precisely regulate the availability of IUPs. We observe that regulation of transcript clearance, proteolytic degradation, and translational rate contributes to controlling the abundance of IUPs, some of which are present in low amounts and for short periods of time. Abundant phosphorylation and low stochasticity in transcription and translation indicate that the availability of IUPs can be finely tuned. Fidelity in signalling may require that most IUPs are available in appropriate amounts and not present longer than needed.
Lack of stable three-dimensional structure, or intrinsic disorder, is a common phenomenon in proteins. Naturally unstructured regions have been shown to be essential for the function of many proteins, and identification of such regions is therefore an important issue. CASP has been assessing the state of the art in predicting disordered regions from amino acid sequence since 2002. Here we present the results of the evaluation of the disorder predictions submitted to CASP9. The assessment is based on the evaluation measures and procedures used in previous CASPs. The balanced accuracy and the Matthews correlation coefficient were chosen as basic measures for evaluating the correctness of binary classifications. The area under the receiver operating characteristic curve was the measure of choice for evaluating probability-based predictions of disorder. The CASP9 methods are shown to perform slightly better than the CASP7 methods but not better than the methods in CASP8. It was also shown that the capability of most CASP9 methods to predict disorder decreases with increasing minimum disorder segment length.
CASP; intrinsically disordered proteins; unstructured proteins; prediction of disordered regions; assessment of disorder prediction
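The three evaluation measures named above have standard definitions, which the following minimal sketch implements for per-residue labels (1 = disordered, 0 = ordered). It is not the CASP assessment code; the function names are illustrative, and the AUC is computed with the simple pairwise (Mann-Whitney) formulation, which is quadratic in the number of residues.

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Probability that a disordered residue outscores an ordered one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Balanced accuracy and the Matthews coefficient are preferred over plain accuracy here because disordered residues are typically a small minority of the data, so a predictor that labels everything as ordered would otherwise score deceptively well.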
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.
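The notion of a very long contiguous segment with nonregular secondary structure can be made concrete with a short sketch. This is not NORSnet, which is a neural network trained on predicted information; the code below only locates candidate loop runs in a three-state secondary structure string. The function name `long_loop_segments`, the state alphabet (H = helix, E = strand, anything else = loop) and the default length cutoff are assumptions made for the example.

```python
from itertools import groupby

def long_loop_segments(ss, min_len=70, regular="HE"):
    """Return (start, end) half-open intervals of loop runs >= min_len.

    ss is a per-residue secondary structure string; any state not listed
    in `regular` is treated as nonregular (loop).
    """
    segments = []
    pos = 0
    for is_loop, run in groupby(ss, key=lambda state: state not in regular):
        run_len = len(list(run))
        if is_loop and run_len >= min_len:
            segments.append((pos, pos + run_len))
        pos += run_len
    return segments
```

For example, with a deliberately small cutoff, `long_loop_segments("HHHHLLLLLEEELL", min_len=4)` reports the single five-residue loop run at positions 4 to 9 and ignores the two-residue loop at the C-terminus.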
The details of protein structures are important for function. Regions that do not adopt any regular structure in isolation (natively unstructured or disordered regions) initially appeared as a curious exception to this structure–function paradigm. It has become increasingly clear that unstructured regions are fundamental to many roles and that they are particularly important for multicellular organisms. Structural biology is just beginning to apprehend the stunning diversity of these roles. Here, we focused on unstructured regions dominated by a particular type of loop, namely the natively unstructured one. We developed a method that succeeded in the distinction between well-structured and natively unstructured loops. For the development, we did not use any experimental data for unstructured regions; when tested on experimental data, the method performed surprisingly well. Due to its different premises, the method captured very different aspects of unstructured regions than other methods that we tested. We applied the new method to two different problems. The first was the identification of proteins that may be difficult targets for structure determination. The second was the identification of worm proteins that have many interaction partners (more than seven) and unstructured regions. Surprisingly, we found unstructured regions of the loopy type in more than 50% of all the promiscuous worm proteins.
Intrinsically unstructured or disordered proteins are common and functionally important. Prediction of disordered regions in proteins can provide useful information for understanding protein function and for high-throughput determination of protein structures.
In this paper, algorithms are presented to predict long and short disordered regions in proteins, namely the long disordered region prediction algorithm DRaai-L and the short disordered region prediction algorithm DRaai-S. These algorithms are based on the Random Forest machine learning model and on profiles of amino acid indices representing various physicochemical and biochemical properties of the 20 amino acids.
Experiments on DisProt3.6 and CASP7 demonstrate that some sets of the amino acid indices have a strong association with the ordered and disordered status of residues. Our algorithms, which use the profiles of these amino acid indices as input features to predict disordered regions in proteins, outperform those based on amino acid composition and reduced amino acid composition, and also outperform many existing algorithms. Our studies suggest that the combination of amino acid index profiles with the Random Forest learning model is an important complementary method for pinpointing disordered regions in proteins.
CSpritz is a web server for the prediction of intrinsic protein disorder. It is a combination of the previous Spritz with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can elaborate single or multiple predictions for either short or long disorder. The server provides a global output page for download and simultaneous statistics of all predictions. Links are provided to each individual protein, where the amino acid sequence and disorder prediction are displayed along with statistics for the individual protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well with a Sw measure of 49.27 and AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/.
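Combining predictors by averaging their probabilities, as described above, can be sketched in a few lines. This is a generic illustration rather than the CSpritz implementation: the function name `consensus_disorder`, the 0.5 decision threshold and the "D"/"O" output alphabet are assumptions, and the three input tracks stand in for the per-residue outputs of three hypothetical component predictors.

```python
def consensus_disorder(prob_tracks, threshold=0.5):
    """Average per-residue disorder probabilities across predictors.

    prob_tracks: list of equal-length lists, one per predictor.
    Returns (mean probabilities, string of 'D'/'O' calls).
    """
    n_residues = len(prob_tracks[0])
    if any(len(track) != n_residues for track in prob_tracks):
        raise ValueError("all predictors must score the same residues")
    mean = [sum(column) / len(prob_tracks) for column in zip(*prob_tracks)]
    calls = "".join("D" if p >= threshold else "O" for p in mean)
    return mean, calls

# Three hypothetical predictors scoring a three-residue peptide.
tracks = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.3],
    [0.8, 0.0, 0.9],
]
mean, calls = consensus_disorder(tracks)
```

Unweighted averaging treats the component predictors as equally reliable; a weighted mean or a trained meta-classifier are common alternatives when validation data are available.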
Protein intrinsic disorder is becoming increasingly recognized in proteomics research. While lacking structure, many regions of disorder have been associated with biological function. There are many different experimental methods for characterizing intrinsically disordered proteins and regions; nevertheless, the prediction of intrinsic disorder from amino acid sequence remains a useful strategy, especially for many large-scale proteomics investigations. Here we introduce a consensus artificial neural network (ANN) prediction method, which was developed by combining the outputs of several individual disorder predictors. By eight-fold cross-validation, this meta-predictor, called PONDR-FIT, was found to improve the prediction accuracy by 3 to 20%, with an average of 11%, compared to the single predictors, depending on the datasets being used. Analysis of the errors shows that the worst accuracy still occurs for short disordered regions with fewer than ten residues, as well as for the residues close to order/disorder boundaries. Increased understanding of the underlying mechanism by which such meta-predictors give improved predictions will likely promote the further development of protein disorder predictors. PONDR-FIT is available at www.disprot.org.
natively unfolded; intrinsically unstructured; intrinsically disordered; highly flexible; highly dynamic; structurally disordered; predictor; PONDR
Natively unstructured regions are a common feature of eukaryotic proteomes. Between 30% and 60% of proteins are predicted to contain long stretches of disordered residues, and not only have many of these regions been confirmed experimentally, but they have also been found to be essential for protein function. In this study, we directly address the potential contribution of protein disorder in predicting protein function using standard Gene Ontology (GO) categories. Initially we analyse the occurrence of protein disorder in the human proteome and report ontology categories that are enriched in disordered proteins. Pattern analysis of the distributions of disordered regions in human sequences demonstrated that the functions of intrinsically disordered proteins are both length- and position-dependent. These dependencies were then encoded in feature vectors to quantify the contribution of disorder in human protein function prediction using Support Vector Machine classifiers. The prediction accuracies of 26 GO categories relating to signalling and molecular recognition are improved using the disorder features. The most significant improvements were observed for kinase, phosphorylation, growth factor, and helicase categories. Furthermore, we provide predicted GO term assignments using these classifiers for a set of unannotated and orphan human proteins. In this study, the importance of capturing protein disorder information and its value in function prediction is demonstrated. The GO category classifiers generated can be used to provide more reliable predictions and further insights into the behaviour of orphan and unannotated proteins.
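One way to encode the length and position dependence of disorder in a fixed-size feature vector is sketched below. The binning scheme is an assumption for illustration only, not the encoding used in the study: the sequence is split into equal position bins (for example N-terminal, middle, C-terminal), and disordered segments are counted per length class. The name `disorder_features` and the default bin edges are hypothetical.

```python
from itertools import groupby

def disorder_features(mask, n_pos_bins=3, length_edges=(10, 30)):
    """Encode a per-residue disorder mask ('D'/'O') as a feature vector.

    Returns the fraction of disordered residues in each position bin,
    followed by the number of disordered segments in each length class
    (< edges[0], between consecutive edges, >= edges[-1]).
    """
    n = len(mask)
    position_fractions = []
    for b in range(n_pos_bins):
        chunk = mask[b * n // n_pos_bins: (b + 1) * n // n_pos_bins]
        position_fractions.append(chunk.count("D") / len(chunk) if chunk else 0.0)

    length_counts = [0] * (len(length_edges) + 1)
    for state, run in groupby(mask):
        if state == "D":
            run_len = len(list(run))
            # Index of the length class: number of edges the run reaches.
            length_counts[sum(run_len >= edge for edge in length_edges)] += 1
    return position_fractions + length_counts
```

Vectors of this form have the same length for every protein, so they can be concatenated with other sequence-derived features and passed to a classifier such as a support vector machine.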
As a result of high throughput sequencing technologies, there is a growing need to provide fast and accurate computational tools to predict the function of proteins from amino acid sequence. Most methods that attempt to do this rely on transferring function annotations between closely related proteins; however, a large proportion of unannotated proteins are orphans and do not share sufficient similarity to other proteins to be annotated in this way. Methods that target the annotation of these difficult proteins are feature-based methods and utilise relationships between the physical characteristics of proteins and function to make predictions. One important characteristic of proteins that remains unexploited in these feature-based methods is native structural disorder. Disordered regions of proteins are thought to adopt little or no regular structure and have been experimentally linked with the correct functioning of many proteins. Additionally, disordered regions of proteins can be successfully predicted from amino acid sequence. To address the requirement for protein function prediction methods that target the annotation of orphan proteins and explore the use of information describing protein disorder, a machine learning method for predicting protein function from sequence has been implemented. The inclusion of disorder features significantly improves prediction accuracies for many function categories relating to molecular recognition. The practical utility of the method is also demonstrated by providing annotations for a set of orphan and unannotated human proteins.
Disordered regions are segments of the protein chain which do not adopt stable structures. Such segments are often of interest because they have a close relationship with protein expression and functionality. As such, protein disorder prediction is important for protein structure prediction, structure determination and function annotation.
This paper presents our protein disorder prediction server, PreDisorder. It is based on our ab initio prediction method (MULTICOM-CMFR) which, along with our meta (or consensus) prediction method (MULTICOM), was recently ranked among the top disorder predictors in the eighth edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP8). We systematically benchmarked PreDisorder along with 26 other protein disorder predictors on the CASP8 data set and assessed its accuracy using a number of measures. The results show that it compared favourably with other ab initio methods and its performance is comparable to that of the best meta and clustering methods.
PreDisorder is a fast and reliable server which can be used to predict protein disordered regions on genomic scale. It is available at http://casp.rnet.missouri.edu/predisorder.html.
Protein Phosphatase 1 (PP1) is an essential and ubiquitous serine/threonine protein phosphatase that is regulated by more than 100 known inhibitor and targeting proteins. It is currently unclear how protein inhibitors distinctly and specifically regulate PP1 to enable rapid responses to cellular alterations. We demonstrate that two PP1 inhibitors, I-2 and DARPP-32, belong to the class of intrinsically unstructured proteins (IUPs). We show that both inhibitors have distinct preferences for transient local and long range structure. These preferences are likely the structural signature of their interaction with PP1. Furthermore, we show that upon phosphorylation of Thr34 in DARPP-32, which turns DARPP-32 into a potent inhibitor of PP1, neither the local nor the long range structure of DARPP-32 is altered. Therefore, our data suggest a role for these transient 3-dimensional topologies in binding mechanisms that enable extensive contacts with PP1's invariant surfaces. Together, these interactions enable potent and selective inhibition of PP1.
Protein Phosphatase 1; Inhibitor-2; DARPP-32; serine/threonine signaling; NMR spectroscopy
The number and importance of intrinsically unstructured proteins (IUPs) known to be involved in various human disorders are growing rapidly. To test for the general implications of intrinsic disorder in proteins involved in neurodegenerative diseases, disorder prediction tools were applied to three datasets comprising proteins involved in Huntington's disease (HD), Parkinson's disease (PD) and Alzheimer's disease (AD). The results show that, in general, proteins in the disease datasets possess significantly enhanced intrinsic unstructuredness. Most of the disordered proteins in the disease datasets are found to be involved in neuronal activities, signal transduction, apoptosis, intracellular traffic, cell differentiation, etc. These proteins are also found to have a larger number of interactors: as the proportion of disorder (i.e., the length of the unfolded stretch) increased, the size of the interaction network increased as well. Together, these observations suggest that “moonlighting”, i.e. the contextual acquisition of different transient structural conformations, may allow these disordered proteins to act as network “hubs”, and thus they may have a crucial influence on the pathogenicity of neurodegenerative diseases.
Symptoms of vaginal bleeding and abdominal pain are common in cases of ectopic pregnancy (EP), spontaneous abortions (SAB), and complications of an intrauterine pregnancy (IUP). It is important to determine if efforts should focus on differentiating EP from an IUP (IUP + SAB) or a viable IUP from a nonviable gestation (EP + SAB) in women at risk for EP.
This is a retrospective cohort study of women who presented with bleeding or pain or both during the first trimester of pregnancy. The cohort was divided into subjects diagnosed with IUP vs. (EP + SAB). The same cohort was then divided into subjects diagnosed with EP vs. (IUP + SAB). Logistic regression models based on risk factors for both outcomes (EP vs. [IUP + SAB] and IUP vs. [EP + SAB]) were obtained. ROC curves as well as Hosmer-Lemeshow goodness of fit and Akaike's information criterion (AIC) were used.
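For readers unfamiliar with Akaike's information criterion, the sketch below shows how it is computed from a fitted model's log-likelihood, using the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters. The function names and the toy inputs are assumptions for illustration and are not taken from the study data.

```python
import math

def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of binary outcomes under predicted risks."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )

def aic(log_lik, n_params):
    """Akaike's information criterion; lower values indicate a better model."""
    return 2 * n_params - 2 * log_lik
```

When two logistic regression models are fitted to the same cohort with different outcome groupings, the one with the lower AIC gives the better trade-off between fit and complexity.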
Overall, 18.1% (n = 367) of the women were diagnosed with EP, 58.8% (n = 1192) were diagnosed with an SAB, and 23.1% (n = 467) had an ongoing IUP. The area under the ROC curve for the model IUP vs. (EP + SAB) was statistically greater than the model EP vs. (IUP + SAB), p < 0.001. AIC and Hosmer-Lemeshow goodness of fit confirmed the better accuracy of the model comparing IUP vs. (EP + SAB).
Information collected at initial presentation from women at risk for EP to be used for building prediction rules should focus on differentiating a viable from a nonviable pregnancy rather than attempting to distinguish an extrauterine from an intrauterine pregnancy. However, this distinction should not affect current clinical care.
The effects of disease mutations on protein structure and function have been extensively investigated, and many predictors of the functional impact of single amino acid substitutions are publicly available. The majority of these predictors are based on protein structure and evolutionary conservation, following the assumption that disease mutations predominantly affect folded and conserved protein regions. However, the prevalence of the intrinsically disordered proteins (IDPs) and regions (IDRs) in the human proteome together with their lack of fixed structure and low sequence conservation raise a question about the impact of disease mutations in IDRs. Here, we investigate annotated missense disease mutations and show that 21.7% of them are located within such intrinsically disordered regions. We further demonstrate that 20% of disease mutations in IDRs cause local disorder-to-order transitions, which represents a 1.7–2.7 fold increase compared to annotated polymorphisms and neutral evolutionary substitutions, respectively. Secondary structure predictions show elevated rates of transition from helices and strands into loops and vice versa in the disease mutations dataset. Disease disorder-to-order mutations also influence predicted molecular recognition features (MoRFs) more often than the control mutations. The repertoire of disorder-to-order transition mutations is limited, with five most frequent mutations (R→W, R→C, E→K, R→H, R→Q) collectively accounting for 44% of all deleterious disorder-to-order transitions. As a proof of concept, we performed accelerated molecular dynamics simulations on a deleterious disorder-to-order transition mutation of tumor protein p63 and, in agreement with our predictions, observed an increased α-helical propensity of the region harboring the mutation. Our findings highlight the importance of mutations in IDRs and refine the traditional structure-centric view of disease mutations. 
The results of this study offer a new perspective on the role of mutations in disease, with implications for improving predictors of the functional impact of missense mutations.
Intrinsically unstructured or disordered proteins have been implicated in the etiology of a wide spectrum of diseases. However, the molecular mechanisms that relate mutations in intrinsically disordered regions (IDRs) to disease pathogenesis have not been investigated. Disordered proteins do not conform to the prevailing view of deleterious mutations which equates function, structure and evolutionary conservation – intrinsically disordered regions are functional, but lack a fixed three-dimensional structure and in general have low sequence conservation. Here we demonstrate that >20% of disease-associated missense mutations affect IDRs and interfere with their functions. We further show that 20% of deleterious mutations in IDRs induce predicted disorder-to-order transitions. Our predictions are supported by accelerated molecular dynamics simulations that show an increase in helical propensity of the region harboring a disease disorder-to-order transition mutation of tumor protein p63. Our results refine the traditional structure-centric view of disease mutations and offer a new perspective on the role of non-synonymous mutations in disease. Our findings have broad implications for improving predictors of the functional impact of missense mutations, and for interpretation of novel variants identified in large genome sequencing projects that aim to provide a better understanding of human genetic variation and its relevance to common diseases.
For well-structured, rigid proteins, the prediction of rotational tumbling time (τc) using atomic coordinates is reasonably accurate, but is inaccurate for proteins with long unstructured sequences. Under physiological conditions, many proteins contain long disordered segments that play important regulatory roles in fundamental biological events including signal transduction and molecular recognition. Here we describe an ensemble approach to the boundary element method that accurately predicts τc for such proteins by introducing two layers of molecular surfaces whose correlated velocities decay exponentially with distance. Reliable prediction of τc will help to detect intra- and inter-molecular interactions and conformational switches between more-ordered and less-ordered states of the disordered segments. The method has been extensively validated using 12 reference proteins with 14 to 103 disordered residues at the N- and/or C-terminus, and has been successfully employed to explain a set of published results on a system that incorporates a conformational switch.
protein; disordered; unstructured; tumbling; rotational diffusion; boundary element
3D-Jury, the structure prediction consensus method publicly available in the Meta Server, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers.
The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models.
The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature available in the Meta Server.
Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together.
We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted.
MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/
The spliceosome is a molecular machine that performs the excision of introns from eukaryotic pre-mRNAs. In human cells, this macromolecular complex comprises five RNAs and over one hundred proteins. In recent years, many spliceosomal proteins have been found to exhibit intrinsic disorder, that is, to lack stable native three-dimensional structure in solution. Building on the previous body of proteomic, structural and functional data, we have carried out a systematic bioinformatics analysis of intrinsic disorder in the proteome of the human spliceosome. We discovered that almost half of the combined sequence of proteins abundant in the spliceosome is predicted to be intrinsically disordered, at least when the individual proteins are considered in isolation. The distribution of intrinsic order and disorder throughout the spliceosome is uneven, and is related to the various functions performed by the intrinsic disorder of the spliceosomal proteins in the complex. In particular, proteins involved in the secondary functions of the spliceosome, such as mRNA recognition, intron/exon definition and spliceosomal assembly and dynamics, are more disordered than proteins directly involved in assisting splicing catalysis. Conserved disordered regions in spliceosomal proteins are evolutionarily younger and less widespread than ordered domains of essential spliceosomal proteins at the core of the spliceosome, suggesting that disordered regions were added to a preexistent ordered functional core. Finally, the spliceosomal proteome contains a much higher amount of intrinsic disorder predicted to lack secondary structure than the proteome of the ribosome, another large RNP machine. This result agrees with the currently recognized different functions of proteins in these two complexes.
In eukaryotic cells, introns are spliced out of protein-coding mRNAs by a highly dynamic and extraordinarily plastic molecular machine called the spliceosome. In recent years, multiple regions of intrinsic structural disorder have been found in spliceosomal proteins. Intrinsically disordered regions lack stable native three-dimensional structure in solution, which makes them structurally flexible and/or able to switch between different conformations. Hence, intrinsically disordered regions are ideal candidates to account for the spliceosome's plasticity. Intrinsically disordered regions are also frequently the sites of post-translational modifications, which have also been shown to be important in spliceosome dynamics. In this article, we describe the results of a structural bioinformatics analysis focused on intrinsic disorder in the spliceosomal proteome. We systematically analyzed all known human spliceosomal proteins with regard to the presence and type of intrinsic disorder. Almost half of the combined sequence of these spliceosomal proteins is predicted to be intrinsically disordered, and the type of intrinsic disorder in a protein varies with its function and its location in the spliceosome. The parts of the spliceosome that act earlier in the process are more disordered, which corresponds to their role in establishing a network of interactions, while the parts that act later are more ordered.
Meta-BASIC (http://basic.bioinfo.pl) is a novel sensitive approach for recognition of distant similarity between proteins based on consensus alignments of meta profiles. Specifically, Meta-BASIC compares sequence profiles combined with predicted secondary structure by utilizing several scoring systems and alignment algorithms. In our benchmarking tests, Meta-BASIC outperforms many individual servers, including fold recognition servers, and it can compete with meta predictors that base their strength on the structural comparison of models. In addition, Meta-BASIC, which enables detection of very distant relationships even if the tertiary structure for the reference protein is not known, has a high-throughput capability. This new method is applied to 860 PfamA protein families with unknown function (DUF) and provides many novel structure–functional assignments available on-line at http://basic.bioinfo.pl/duf.pl. Detailed discussion is provided for two of the most interesting assignments. DUF271 and DUF431 are predicted to be a nucleotide-diphospho-sugar transferase and an α/β-knot SAM-dependent RNA methyltransferase, respectively.
Orchestration of the lifetimes and conformations of intrinsically unstructured proteins and their mRNAs ensures precision and flexibility in biological control.
The lifetimes and conformations of intrinsically unstructured proteins (IUPs) and their mRNAs are orchestrated to ensure precision, speed and flexibility in biological control.
Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder achieve per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in the human genome, we observed a large difference in predicted disorder content between confirmed and putative human proteins. We investigated the hypothesis that this discrepancy is not real, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which leads to high predicted disorder content.
To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment.
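The second way of generating synthetic sequences described above, translating a coding region with incorrect codon alignment, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function names `translate` and `frameshifted_peptides` are hypothetical, the standard genetic code is assumed, and the abstract does not say how stop codons were handled, so they are simply left in the output as `*`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Stop codons are emitted as '*'; codons containing characters other
    than A, C, G, T are emitted as 'X'. Trailing bases that do not fill
    a complete codon are ignored.
    """
    dna = dna.upper()
    peptide = []
    for i in range(frame, len(dna) - 2, 3):
        peptide.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(peptide)


def frameshifted_peptides(cds):
    """Return the two out-of-frame translations (+1 and +2) of a coding sequence."""
    return [translate(cds, frame) for frame in (1, 2)]


if __name__ == "__main__":
    cds = "ATGGCCAAATAA"  # toy coding sequence, invented for illustration
    print(translate(cds))              # in-frame translation: MAK*
    print(frameshifted_peptides(cds))  # out-of-frame: ['WPN', 'GQI']
```

The same `translate` function applied to a non-coding genomic segment would give the first kind of synthetic sequence mentioned in the text; how such segments are selected is not specified in the abstract.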
Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins.
Our findings provide the first evidence that the current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.
Intrinsically disordered/unstructured proteins (IDPs) are extremely sensitive to proteolysis in vitro, but show no enhanced degradation rates in vivo. Their existence and functioning may be explained if IDPs are preferentially associated with chaperones in the cell, which may offer protection against degradation by proteases. To test this inference, we took pairwise interaction data from high-throughput interaction studies and analyzed them to see whether predicted disorder correlates with the tendency of proteins to bind chaperones. Our major finding is that disorder predicted by the IUPred algorithm actually shows negative correlation with chaperone binding in E. coli, S. cerevisiae, and metazoan species. Since predicted disorder positively correlates with the tendency of partner binding in the interactome, the difference between the disorder of chaperone-binding and non-binding proteins is even more pronounced if normalized to their overall tendency to be involved in pairwise protein–protein interactions. We argue that chaperone binding is primarily required for folding of globular proteins, as reflected in an increased preference for chaperones of proteins in which at least one Pfam domain exists. In terms of the functional consequences of chaperone binding of mostly disordered proteins, we suggest that its primary reason is not the assistance of folding, but promotion of assembly with partners. In support of this conclusion, we show that IDPs that bind chaperones also tend to bind other proteins.
Intrinsically disordered/unstructured proteins (IDPs) defy the classical structure–function paradigm because they exist and function without a well-defined 3-D structure. These proteins are extremely sensitive to degradation in the test tube, but show no enhanced degradation rates in the cell. To resolve this apparent contradiction, we tested whether IDPs are protected by interaction with accessory proteins, chaperones, often implicated in guarding other proteins in the cell. Our major finding is that disorder predicted by the IUPred algorithm actually shows negative correlation with chaperone binding in various species. To explain this finding, we argue that IDPs are protected in the cell from proteases by their special amino acid composition, and also by the tight regulation of intracellular proteases. Thus, the primary reason for their chaperone binding is not protection from degradation, but promotion of assembly with partners.