Contrary to the general belief, many biologically active proteins lack stable tertiary and/or secondary structure under physiological conditions in vitro. These intrinsically disordered proteins (IDPs) are highly abundant in nature, and many of them are associated with various human diseases. The functional repertoire of IDPs complements the functions of ordered proteins. Since IDPs constitute a significant portion of any given proteome, they can be combined in an unfoldome, the portion of the proteome that includes all IDPs (also known as natively unfolded proteins, hence "unfoldome") and describes their functions, structures, interactions, evolution, and so forth. The amino acid sequences and compositions of IDPs are very different from those of ordered proteins, which makes reliable identification of IDPs at the proteome level possible by various computational means. Furthermore, IDPs possess a number of unique structural properties and are characterized by peculiar conformational behavior, including high stability against low pH and high temperature and structural indifference toward unfolding by strong denaturants. These peculiarities were shown to be useful for developing experimental techniques for the large-scale identification of IDPs in various organisms. Some of the computational and experimental tools for unfoldome discovery are discussed in this review.
Toxoplasma gondii is an obligate intracellular parasite of the phylum Apicomplexa, which includes a number of species of medical and veterinary importance. Inhibitors of lysine deacetylases (KDACs) exhibit potent antiparasitic activity, suggesting that interference with lysine acetylation pathways holds promise for future drug targeting. Using high-resolution LC-MS/MS to identify parasite peptides enriched by immunopurification with acetyl-lysine antibody, we recently produced an acetylome of the proliferative intracellular stage of Toxoplasma. In this study, we used similar approaches to greatly expand the Toxoplasma acetylome by identifying acetylated proteins in non-replicating extracellular tachyzoites. The functional breakdown of acetylated proteins in extracellular parasites is similar to that in intracellular parasites, with an enrichment of proteins involved in metabolism, translation, and chromatin biology. Altogether, we have now detected over 700 acetylation sites on a wide variety of parasite proteins of diverse function in multiple subcellular compartments. We found 96 proteins uniquely acetylated in intracellular parasites, 216 uniquely acetylated in extracellular parasites, and 177 proteins acetylated in both states. Our findings suggest that dramatic changes occur at the proteomic level as tachyzoites transition from the intracellular to the extracellular environment, similar to reports documenting significant changes in gene expression during this transition. The expanded dataset also allowed a thorough analysis of the degree of protein intrinsic disorder surrounding lysine residues targeted for this post-translational modification. These analyses indicate that acetylated lysines in proteins from extracellular and intracellular tachyzoites are largely located within similar local environments, and that lysine acetylation preferentially occurs in intrinsically disordered or flexible regions.
parasite; proteomics; acetylation; lysine; Apicomplexa; tachyzoite
Besides being a common threat to farm animals and poultry, coronavirus (CoV) was responsible for the human severe acute respiratory syndrome (SARS) epidemic in 2002–2004. However, many aspects of CoV behavior, including its modes of transmission, are yet to be fully understood. We show that the amount and distribution of protein intrinsic disorder in the viral shell can be used for efficient analysis of the behavior and transmission modes of CoV. The proposed model allows categorization of the various CoVs by the peculiarities of disorder distribution in their membrane (M) and nucleocapsid (N) proteins. This categorization enables quick identification of viruses with similar transmission behaviors, regardless of genetic proximity. Based on this analysis, an empirical model for predicting viral transmission behavior is developed. This model is able to explain some behavioral aspects of important coronaviruses that previously were not fully understood. The new predictor can be a useful tool for better epidemiological, clinical, and structural understanding of the behavior of both newly emerging viruses and viruses that have been known for a long time. A potentially new vaccine strategy could involve searches for viral strains that are characterized by an evolutionary misfit between the peculiarities of the disorder distribution in their shells and their behavior.
The earliest whole-protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle, J. Mol. Biol., 157: 105-132 (1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale.
Using the performance of the C-H plot as the metric, we compared 19 alternative hydropathy scales and found that the Guy (1985) hydropathy scale (Guy, Biophys. J., 47: 61-70 (1985)) was the best of the tested scales for separating large collections of structured proteins and intrinsically disordered proteins (IDPs) on the C-H plot. Next, we developed a new scale, named IDP-Hydropathy, which further improves the discrimination between structured proteins and IDPs. When the C-H plot was applied to a dataset containing 109 IDPs and 563 non-homologous fully structured proteins, the Kyte-Doolittle (1982) hydropathy scale, the Guy (1985) hydropathy scale, and the IDP-Hydropathy scale gave balanced two-state classification accuracies of 79%, 84%, and 90%, respectively, indicating that a very substantial overall improvement is obtained by using different hydropathy scales. A correlation study shows that IDP-Hydropathy is strongly correlated with other hydropathy scales, suggesting that IDP-Hydropathy probably has only minor contributions from amino acid properties other than hydropathy.
We suggest that IDP-Hydropathy would likely be the best scale to use for any type of algorithm developed to predict protein disorder.
Intrinsically disordered proteins; natively unstructured or unfolded proteins; structure and disorder prediction; support vector machines
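As a rough illustration of the C-H plot described above, the sketch below computes the mean normalized Kyte-Doolittle hydropathy and the mean absolute net charge of a sequence and compares them with a linear boundary. The boundary coefficients (2.785 and -1.151) are recalled from the cited 2000 paper and are not stated in the text above, so they should be treated as an assumption. The whole-sequence averaging, the function names, and the counts in the usage note are likewise illustrative. The Guy and IDP-Hydropathy scale values are not given here and are therefore not reproduced.

```python
# Kyte-Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq, scale=KYTE_DOOLITTLE):
    """Mean hydropathy with the scale linearly rescaled to the range [0, 1]."""
    lo, hi = min(scale.values()), max(scale.values())
    return sum((scale[aa] - lo) / (hi - lo) for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge near neutral pH (K, R = +1; D, E = -1)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_plot_call(seq, slope=2.785, intercept=-1.151):
    """Call 'disordered' if the point lies above the assumed boundary line."""
    boundary = slope * mean_hydropathy(seq) + intercept
    return "disordered" if mean_net_charge(seq) > boundary else "ordered"


def balanced_accuracy(tp, fn, tn, fp):
    """Average of the per-class recalls, robust to unequal class sizes."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Swapping in a different hydropathy scale only requires passing another dictionary as `scale`. Balanced accuracy is a natural metric here because the two classes are unequal in size (109 IDPs versus 563 structured proteins); for instance, `balanced_accuracy(tp=50, fn=50, tn=100, fp=0)` averages recalls of 0.5 and 1.0.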
The native state of a protein is usually associated with a compact globular conformation possessing a rigid and highly ordered structure. At the turn of the last century, studies appeared which concluded that many proteins cannot, in principle, form a rigid globular structure in an aqueous environment but are still able to fulfill their specific functions; that is, they are native. The existence of disordered regions allows these proteins to interact with their numerous binding partners. Such interactions are often accompanied by the formation of complexes that possess a more ordered structure than the original components. The functional diversity of these proteins, combined with the variability of signals related to the various intra- and intercellular processes they handle and their capability to produce multi-variant and multi-directional responses, allows them to form a unique regulatory network in a cell. The abundance of disordered proteins inside the cell is precisely controlled at the synthesis and clearance levels as well as via interaction with specific binding partners and posttranslational modifications. Another recently recognized biologically active state of proteins is the functional amyloid. The formation of such functional amyloids is tightly controlled and therefore differs from the uncontrolled formation of pathogenic amyloids, which are associated with the pathogenesis of several conformational diseases. The development of these diseases is likely to be determined by failures of the cellular regulatory systems rather than by the formation of proteinaceous deposits and/or by protofibril toxicity.
protein folding; globular proteins; natively disordered proteins; protein-protein and DNA-protein complexes; amorphous aggregates; amyloid fibrils; functional amyloid; inter- and intramolecular contacts
The intracellular environment represents an extremely crowded milieu, with a limited amount of free water and an almost complete lack of unoccupied space. Obviously, slightly salted aqueous solutions containing low concentrations of a biomolecule of interest are too simplistic to mimic the “real life” situation, where the biomolecule of interest scrambles and wades through the tightly packed crowd. In laboratory practice, such macromolecular crowding is typically mimicked by concentrated solutions of various polymers that serve as model “crowding agents”. Studies under these conditions revealed that macromolecular crowding might affect protein structure, folding, shape, conformational stability, binding of small molecules, enzymatic activity, protein-protein interactions, protein-nucleic acid interactions, and pathological aggregation. The goal of this review is to systematically analyze currently available experimental data on the variety of effects of macromolecular crowding on a protein molecule. The review covers more than 320 papers and therefore represents one of the most comprehensive compendia of the current knowledge in this exciting area.
macromolecular crowding; excluded volume; protein structure; protein folding; protein function; protein-protein interaction; intrinsically disordered protein; protein aggregation
In human membrane proteins, intrinsically disordered regions, that is, regions that lack a well-defined three-dimensional structure under physiological conditions, preferentially occur in the cytoplasmic tails. Many of these proteins are cell receptors that function by recognizing their cognate ligand outside the cell and translating this binding information into an intracellular activation signal. Based on the location of the recognition and signaling (effector) domains, functionally diverse and unrelated cell receptors can be classified into two main families: those in which binding and signaling domains are located on the same protein chain, the so-called single-chain receptors (SRs), and those in which these domains are intriguingly located on separate subunits, the so-called multichain receptors (MRs). Recognition domains of both SRs and MRs are known to be well ordered. In contrast, while the cytoplasmic signaling domains of SRs are also well structured, those of MRs are intrinsically disordered. Despite the important role of receptor signaling in health and disease, an extensive comparative structural analysis of receptor signaling domains has not yet been carried out. In this study, using a variety of prediction algorithms, we show that protein disorder is a characteristic and distinctive feature of receptors with recognition and signaling functions distributed between separate protein chains. We also reveal that disorder distribution patterns are rather similar within SR subclasses, suggesting potential functional explanations. Why did nature select protein disorder to provide intracellular signaling for MRs? Is there any correlation between the disorder profiles of signaling domains and receptor function? These and other questions are addressed in this article.
intrinsically disordered proteins; immune signaling; protein disorder; single-chain receptors; multichain immune recognition receptors; MIRR; T cell receptor; B cell receptor; RTK; receptor tyrosine kinases
Conformational behavior of five homologous proteins, parvalbumins (PAs) from northern pike (α and β isoforms), Baltic cod, and rat (α and β isoforms), was studied by scanning calorimetry, circular dichroism, and bis-ANS fluorescence. The mechanism of the temperature-induced denaturation of these proteins depends dramatically on both the peculiarities of their amino acid sequences and their interaction with metal ions. For example, the pike α-PA melting can be described by two successive two-state transitions with mid-temperatures of 90 and 120 °C, suggesting the presence of two thermodynamic domains. The intermediate state populated at the end of the first transition was shown to bind Ca2+ ions and was characterized by largely preserved secondary structure and increased solvent exposure of hydrophobic groups. Mg2+- and Na+-loaded forms of pike α-PA demonstrated a single two-state transition. Therefore, the mechanism of PA thermal denaturation is controlled by metal binding. It ranged from the absence of a detectable first-order transition (apo-form of pike PA), to a two-state transition (e.g., Mg2+- and Na+-loaded forms of pike α-PA), to more complex mechanisms (Ca2+-loaded PAs) involving at least one partially folded intermediate. Analysis of isolated cavities in the protein structures revealed that the interface between the CD and EF subdomains of Ca2+-loaded pike α-PA is much more loosely packed than in PAs manifesting a single heat-sorption peak. The impairment of interactions between the CD and EF subdomains may cause a loss of structural cooperativity and the appearance of two separate thermodynamic domains. One more peculiar feature of pike α-PA is that, depending on its interactions with metal ions, it can be an intrinsically disordered protein (apo-form) or an ordered protein of mesophilic (Na+-bound state), thermophilic (Mg2+-form), or even hyperthermophilic origin (Ca2+-form).
thermodynamics; cooperativity; thermodynamic domain; structural domain; EF-hand; protein unfolding; protein denaturation; intermediate; metal binding; protein cavities; protein intrinsic disorder; hyperthermophile; allergen
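To make the phrase "two successive two-state transitions" concrete, the following sketch evaluates the populations of the native (N), intermediate (I), and unfolded (U) states for a sequential N ⇌ I ⇌ U scheme using the van't Hoff relation. Only the mid-temperatures (90 and 120 °C) come from the abstract. The transition enthalpies, the neglect of heat-capacity changes, and the function name are assumptions chosen for illustration and are not fitted values from the study.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol*K)


def state_populations(temp_c, tm1_c=90.0, tm2_c=120.0, dh1=300e3, dh2=300e3):
    """Populations (N, I, U) for N <-> I <-> U at temperature temp_c (Celsius).

    dh1 and dh2 are hypothetical van't Hoff enthalpies in J/mol; the change
    in heat capacity on unfolding is ignored for simplicity.
    """
    t = temp_c + 273.15
    tm1 = tm1_c + 273.15
    tm2 = tm2_c + 273.15
    # Equilibrium constant of each step; equals 1 at its mid-temperature.
    k1 = math.exp(-(dh1 / R_GAS) * (1.0 / t - 1.0 / tm1))
    k2 = math.exp(-(dh2 / R_GAS) * (1.0 / t - 1.0 / tm2))
    z = 1.0 + k1 + k1 * k2  # partition function relative to the native state
    return 1.0 / z, k1 / z, k1 * k2 / z


if __name__ == "__main__":
    for temp in (25, 90, 105, 120, 140):
        n, i, u = state_populations(temp)
        print(f"{temp:>4} C  N={n:.3f}  I={i:.3f}  U={u:.3f}")
```

With well-separated mid-temperatures, the intermediate dominates between the two transitions, which is consistent with the description of an intermediate state populated at the end of the first transition.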
Arg96 is a highly conserved residue known to catalyze spontaneous green fluorescent protein (GFP) chromophore biosynthesis. To understand the role of Arg96 in the conformational stability and structural behavior of EGFP, the properties of a series of EGFP mutants bearing substitutions at this position were studied using circular dichroism, steady-state fluorescence spectroscopy, fluorescence lifetime, kinetic and equilibrium unfolding analysis, and acrylamide-induced fluorescence quenching. During protein production and purification, a high yield was achieved for the EGFP/Arg96Cys variant, whereas EGFP/Arg96Ser and EGFP/Arg96Ala were characterized by substantially lower yields, and no protein was produced when Arg96 was substituted by Gly. We have also shown that only EGFP/Arg96Cys possessed relatively fast chromophore maturation, whereas it took EGFP/Arg96Ser and EGFP/Arg96Ala about a year to develop a noticeable green fluorescence. The intensity of the characteristic green fluorescence measured for EGFP/Arg96Cys and EGFP/Arg96Ser (or EGFP/Arg96Ala) was 5- and 50-fold lower, respectively, than that of the nonmodified EGFP. Intriguingly, EGFP/Arg96Cys was shown to be more stable than EGFP toward GdmCl-induced unfolding in both the kinetic and the quasi-equilibrium experiments. In comparison with EGFP, tryptophan residues of EGFP/Arg96Cys were more accessible to the solvent. Taken together, these data suggest that, besides its previously established crucial catalytic role, Arg96 is important for the overall folding and conformational stability of GFP.
green fluorescent protein; enhanced green fluorescent protein; fluorescent protein; point mutation; chromophore structure; conformational stability; circular dichroism
α-Synuclein aggregation and fibrillation are closely associated with the formation of Lewy bodies in neurons and are implicated in the causative pathogenesis of Parkinson's disease and other synucleinopathies. Currently, there is no approved therapeutic agent directed toward preventing the protein aggregation, which has been recently shown to have a key role in the cytotoxic nature of amyloidogenic proteins. Flavonoids, known as plant pigments, belong to a broad family of polyphenolic compounds. Over 4,000 flavonoids have been identified from various plants and foodstuffs derived from plants and have been demonstrated as potential neuroprotective agents. In this study 48 flavonoids belonging to several classes with structures differing in the position of double bonds and ring substituents were tested for their ability to inhibit the fibrillation of α-synuclein in vitro. A variety of flavonoids inhibited α-synuclein fibrillation, and most of the strong inhibitory flavonoids were also found to disaggregate preformed fibrils.
Certain metals lead to an increased risk of Parkinson’s disease (PD), and the aggregation of α-synuclein is implicated in PD pathology. Although α-synuclein fibrillation has been extensively studied in dilute solutions in vitro, the intracellular environment is highly crowded. We show here that certain metals cause a significant acceleration of α-synuclein fibrillation in the presence of high concentrations of various macromolecules, mostly by decreasing the fibrillation lag time. The faster fibrillation in crowded environments in the presence of heavy metals suggests a simple molecular basis for the observed elevated risk of PD due to exposure to metals.
Parkinson’s disease; α-synuclein; crowding; fibrillation; aggregation; metals
Understanding the relationships between function, amino acid sequence, and protein structure continues to represent one of the major challenges of modern protein science. As many as 50% of eukaryotic proteins are likely to contain functionally important long disordered regions. Many proteins are wholly disordered but still possess numerous biologically important functions. However, the number of experimentally confirmed disordered proteins with known biological functions is substantially smaller than their actual number in nature. Therefore, there is a crucial need for novel bioinformatics approaches that allow projection of the current knowledge from a few experimentally verified examples to much larger groups of known and potential proteins. The elaboration of a bioinformatics tool for the analysis of the functional diversity of intrinsically disordered proteins, and the application of this data mining tool to >200,000 proteins from the Swiss-Prot database, each annotated with at least one of the 875 functional keywords, were described in the first paper of this series (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). Using this tool, we have found that out of the 711 Swiss-Prot functional keywords associated with at least 20 proteins, 262 were strongly positively correlated with long intrinsically disordered regions, and 302 were strongly negatively correlated. Illustrative examples of functional disorder or order were found for the vast majority of keywords showing the strongest positive or negative correlation with intrinsic disorder, respectively.
Some 80 Swiss-Prot keywords associated with disorder- and order-driven biological processes and protein functions were described in the first paper (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). The second paper of the series was devoted to the presentation of 87 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes and coding sequence diversities possessing strong positive and negative correlation with long disordered regions (Vucetic S., Xie H., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. II. Cellular components, domains, technical terms, developmental processes and coding sequence diversities correlated with long disordered regions. J. Proteome Res.). Protein structure and functionality can be modulated by various posttranslational modifications and/or by the binding of specific ligands. Numerous human diseases are associated with protein misfolding, misassembly, or misfunctioning. This work concludes the series of papers dedicated to the functional anthology of intrinsic disorder and describes ~80 Swiss-Prot functional keywords that are related to ligands, posttranslational modifications and diseases possessing strong positive or negative correlation with the predicted long disordered regions in proteins.
Intrinsic disorder; protein structure; protein function; intrinsically disordered proteins; bioinformatics; disorder prediction
Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes ~90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes and coding sequence diversities possessing strong positive and negative correlation with long disordered regions.
Intrinsic disorder; protein structure; protein function; intrinsically disordered proteins; bioinformatics; disorder prediction
Identifying relationships between function, amino acid sequence and protein structure represents a major challenge. In this study we propose a bioinformatics approach that identifies functional keywords in the Swiss-Prot database that correlate with intrinsic disorder. A statistical evaluation is employed to rank the significance of these correlations. Protein sequence data redundancy and the relationship between protein length and protein structure were taken into consideration to ensure the quality of the statistical inferences. Over 200,000 proteins from the Swiss-Prot database were analyzed using this approach. The predictions of intrinsic disorder were carried out using the PONDR VL3E predictor of long disordered regions, which achieves an accuracy above 86%. Overall, out of the 710 Swiss-Prot functional keywords that were each associated with at least 20 proteins, 238 were found to be strongly positively correlated with predicted long intrinsically disordered regions, whereas 302 were strongly negatively correlated with such regions. The remaining 170 keywords were ambiguous, without strong positive or negative correlation with the disorder predictions. These functions cover a large variety of biological activities and imply that disordered regions are characterized by a wide functional repertoire. Our results agree well with literature findings, as we were able to find at least one illustrative example of functional disorder or order shown experimentally for the vast majority of keywords showing the strongest positive or negative correlation with intrinsic disorder. This work opens a series of three papers that enriches the current view of protein structure-function relationships, especially with regard to the functionalities of intrinsically disordered proteins, and provides researchers with a novel tool that could be used to improve the understanding of the relationships between protein structure and function.
The first paper of the series describes our statistical approach, outlines the major findings and provides illustrative examples of biological processes and functions positively and negatively correlated with intrinsic disorder.
Intrinsic disorder; protein structure; protein function; intrinsically disordered proteins; bioinformatics; disorder prediction
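The abstract above does not spell out its statistical procedure, so the following is only a generic sketch of how a keyword could be tested for association with predicted long disorder while controlling for protein length. It uses a length-matched resampling test, which may differ from what the authors actually did. The record layout (`length_bin`, `disordered`, `keywords`), the function name, and the number of resamples are all hypothetical.

```python
import random
from collections import defaultdict


def disorder_enrichment(proteins, keyword, n_resamples=2000, seed=0):
    """Return (observed fraction, one-sided empirical p-value) for a keyword.

    proteins: list of dicts with keys
        'length_bin' - any hashable label grouping proteins of similar length
        'disordered' - True if a long disordered region is predicted
        'keywords'   - set of functional keywords for the protein
    The null distribution is built by drawing, for each tagged protein,
    a random protein from the same length bin.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for p in proteins:
        by_bin[p["length_bin"]].append(p["disordered"])

    tagged = [p for p in proteins if keyword in p["keywords"]]
    if not tagged:
        raise ValueError(f"no proteins carry keyword {keyword!r}")
    observed = sum(p["disordered"] for p in tagged) / len(tagged)

    at_least_as_high = 0
    for _ in range(n_resamples):
        draw = [rng.choice(by_bin[p["length_bin"]]) for p in tagged]
        if sum(draw) / len(draw) >= observed:
            at_least_as_high += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_high + 1) / (n_resamples + 1)
```

A small p-value would mark the keyword as positively correlated with predicted disorder; running the same test with the inequality reversed would flag negatively correlated keywords, and keywords significant in neither direction would fall into the ambiguous group.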
Misfolding and self-assembly of proteins in nanoaggregates of different sizes and morphologies (nanoensembles, primary nanofilaments, nanorings, filaments, protofibrils, fibrils, etc.) is a common theme unifying a number of human pathologies termed protein misfolding diseases. Recent studies highlight increasing recognition of the public health importance of protein misfolding diseases, including various neurodegenerative disorders and amyloidoses. It is understood now that the first essential elements in the vast majority of neurodegenerative processes are misfolded and aggregated proteins. Altogether, the accumulation of abnormal protein nanoensembles exerts toxicity by disrupting intracellular transport, overwhelming protein degradation pathways, and/or disturbing vital cell functions. In addition, the formation of inclusion bodies is known to represent a major problem in the production of recombinant therapeutic proteins. Formulation of these therapeutic proteins into delivery systems and their in vivo delivery are often complicated by protein association. Thus, protein folding abnormalities and subsequent events underlie a multitude of human pathologies and difficulties with protein therapeutic applications. The field of medicine therefore can be greatly advanced by establishing a fundamental understanding of key factors leading to misfolding and self-assembly responsible for various protein folding pathologies. This article overviews protein misfolding diseases and outlines some novel and advanced nanotechnologies, including nanoimaging techniques, nanotoolboxes and nanocontainers, complemented by appropriate ensemble techniques, all focused on the ultimate goal to establish etiology and to diagnose, prevent, and cure these devastating disorders.
misfolding; protein aggregation; conformational disease; partially folded intermediate; nanomedicine
Intrinsically disordered, highly charged protein sequences act as entropic bristles (EBs), which, when translationally fused to partner proteins, serve as effective solubilizers by creating both a large favorable surface area for water interactions and large excluded volumes around the partner. By extending away from the partner and sweeping out large molecules, EBs can enable the target protein to fold free from interference. Using both naturally occurring and artificial polypeptides, we demonstrate the successful implementation of intrinsically disordered fusions as protein solubilizers. The artificial fusions discussed herein have low sequence complexity and high net charge but are diversified by means of distinctive amino acid compositions and lengths. Using 6xHis fusions as controls, soluble protein expression enhancements from 65% (EB60A) to 100% (EB250) were observed for a 20-protein portfolio. Additionally, these EBs solubilized targets more effectively than frequently used fusions such as maltose-binding protein, glutathione S-transferase, thioredoxin, and N utilization substance A. Finally, although these EBs possess very distinct physicochemical properties, they did not perturb the structure, conformational stability, or function of the green fluorescent protein or the glutathione S-transferase protein. This work thus illustrates the successful de novo design of intrinsically disordered fusions and presents a promising technology and complementary resource for researchers attempting to solubilize recalcitrant proteins.
intrinsic disorder; protein; solubility; aggregation; translational fusion
The axis inhibition (Axin) scaffold protein colocalizes β-catenin, casein kinase Iα, and glycogen synthase kinase 3β by their binding to Axin's long intrinsically disordered region, thereby yielding structured domains connected by flexible linkers. This complex leads to the phosphorylation of β-catenin, marking it for destruction. Fusing proteins with flexible linkers vastly accelerates chemical interactions between them by colocalizing them. Here we propose that the complex works by random movements of a “stochastic machine,” not by coordinated conformational changes. This noncovalent, modular assembly process allows the various molecular machine components to be used in multiple processes.
Scaffold proteins; molecular machines; signaling; protein complexes
Hepatitis C virus (HCV) infection represents a worldwide health threat that still lacks an efficient protective vaccine and/or an effective drug. Traditional medicine, such as camel milk, is heavily used by a large sector of HCV patients to control the infection because of the high cost of the available standard therapy. Camel milk contains lactoferrin, which plays an important and multifunctional role in innate immunity and specific host defense against microbial infection. Continuing the analysis of the effectiveness of camel lactoferrin against HCV, the current study aimed to separate and purify the native N- and C-lobes from proteolytically cleaved camel lactoferrin (cLF) and to compare their in vitro activities against HCV infection in Huh7.5 cells in order to determine the most active domain.
Lactoferrin and its digested N- and C-lobes were purified on a Mono S 5/50 GL column and a Superdex 200 5/150 column. The purified proteins were assessed in three ways: (1) to test inhibition of intracellular replication, HCV-infected cells were treated with the proteins at different concentrations and time intervals; (2) the proteins were incubated directly with the viral particles (neutralization), and the neutralized viruses were then used to infect cells; (3) the cells were pretreated with the proteins before exposure to the virus. The antiviral potentials of cLF and its lobes were determined using three techniques: (1) RT-nested PCR, (2) real-time PCR, and (3) flow cytometry.
N- and C-lobes were purified in two consecutive steps using Mono S and Superdex 200 columns. The molecular mass of each of the N- and C-lobes was about 40 kDa. cLF and its lobes could prevent HCV entry into Huh7.5 cells, with activity reaching 100%, through direct interaction with the virus. The inhibition of intracellular viral replication by the N-lobe was 2-fold and 3-fold more effective than that by cLF and the C-lobe, respectively.
The native N- and C-lobes generated from camel lactoferrin demonstrated noticeably different potentials against HCV cellular infectivity. The anti-HCV activities ranked as N-lobe > cLF > C-lobe.
Camel lactoferrin; Proteolytic digestion; Purification; N- and C-lobes; Huh7.5 cells; Anti-HCV
Hypothetical proteins comprise roughly half of the predicted gene complement of Toxoplasma gondii and Plasmodium falciparum and represent the largest class of uniquely functioning proteins in these parasites. Following the idea that functional relationships can be informed by the timing of gene expression, we devised a strategy to identify the core set of apicomplexan cell division cycling genes with important roles in parasite division, which includes many uncharacterized proteins. We assembled an expanded list of orthologs from the T. gondii and P. falciparum genome sequences (2781 putative orthologs), compared their mRNA profiles during synchronous replication, and sorted the resulting set of dual cell cycle regulated orthologs (744 total) into protein pairs conserved across many eukaryotic families versus those unique to the Apicomplexa. The analysis identified more than 100 ortholog gene pairs with unknown function in T. gondii and P. falciparum that displayed co-conserved mRNA abundance, dynamics of cyclical expression and similar peak timing that spanned the complete division cycle in each parasite. The unknown cyclical mRNAs encoded a diverse set of proteins with a wide range of mass and showed a remarkable conservation in the internal organization of ordered versus disordered structural domains. A representative sample of cyclical unknown genes (16 total) was epitope tagged in T. gondii tachyzoites yielding the discovery of new protein constituents of the parasite inner membrane complex, key mitotic structures and invasion organelles. These results demonstrate the utility of using gene expression timing and dynamic profile to identify proteins with unique roles in Apicomplexa biology.
Proteins of the p53 family are expressed in vertebrates and in some invertebrate species. The main function of these proteins is to control and regulate the cell cycle in response to various cellular signals, and therefore to control the organism’s development. The regulatory functions of the p53 family members originate mostly from their highly conserved and well-structured DNA-binding domains. Many human diseases (including various types of cancer) are related to missense mutations within this domain. The ordered DNA-binding domains of the p53 family members are surrounded by functionally important intrinsically disordered regions. In this study, substitution rates and propensities in different regions of p53 were analyzed. The analyses revealed that the ordered DNA-binding domain is conserved, whereas disordered regions are characterized by high sequence diversity. This diversity was reflected both in the number of substitutions and in the types of substitutions to which each amino acid was prone. These results support the existence of a positive correlation between protein intrinsic disorder and sequence divergence during the evolutionary process. This higher sequence divergence provides strong support for the existence of disordered regions in p53 in vivo, for if they were structured, they would evolve at rates similar to those of the rest of the protein.
intrinsically disordered proteins; protein evolution; protein-protein interactions; protein-DNA interaction; p53 family
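The comparison of sequence diversity between ordered and disordered regions described in the p53 abstract above can be illustrated with a minimal sketch. It is not the method used in the study: the alignment, the region boundaries, and the function names `column_diversity` and `mean_diversity_by_region` are hypothetical, and diversity is approximated here simply as the number of distinct residues per alignment column.

```python
def column_diversity(aligned):
    """Count distinct residues (gaps ignored) in each alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [len({seq[i] for seq in aligned} - {"-"}) for i in range(length)]


def mean_diversity_by_region(aligned, regions):
    """Average the per-column diversity over each named half-open region."""
    diversity = column_diversity(aligned)
    result = {}
    for name, (start, end) in regions.items():
        values = diversity[start:end]
        result[name] = sum(values) / len(values)
    return result


# Toy alignment (hypothetical, not p53 data): the first five columns are
# identical, the last three vary between sequences.
toy_alignment = ["MKTAYLQE", "MKTAYPGD", "MKTAYSNE"]
toy_regions = {"ordered": (0, 5), "disordered": (5, 8)}
summary = mean_diversity_by_region(toy_alignment, toy_regions)
print(summary)
```

In this toy input the region labelled disordered shows a higher mean diversity than the region labelled ordered, which is the kind of pattern the abstract reports for p53.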
Intrinsically disordered proteins (IDPs) and proteins with long disordered regions are highly abundant in various proteomes. Despite their lack of well-defined ordered structure, these proteins and regions are frequently involved in crucial biological processes. Although in recent years these proteins have attracted the attention of many researchers, IDPs represent a significant challenge for structural characterization since these proteins can impact many of the processes in the structure determination pipeline. Here we investigate the effects of IDPs on the structure determination process and the utility of disorder prediction in selecting and improving proteins for structural characterization. Examination of the extent of intrinsic disorder in existing crystal structures found that relatively few protein crystal structures contain extensive regions of intrinsic disorder. Although intrinsic disorder is not the only cause of crystallization failures and many structured proteins cannot be crystallized, filtering out highly disordered proteins from structure-determination target lists is still likely to be cost effective. Therefore, it is desirable to exclude highly disordered proteins from structure-determination target lists, and we show that disorder prediction can be applied effectively to enrich structure determination pipelines with proteins more likely to yield crystal structures. For structural investigation of specific proteins, disorder prediction can be used to improve targets for structure determination. Finally, a framework for considering intrinsic disorder in the structure determination pipeline is proposed.
Proteomics; Structural genomics; Structural proteomics; Intrinsically disordered protein
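The idea of filtering highly disordered proteins out of a target list, as described in the abstract above, can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the paper: the per-residue scores, the 0.5 score cutoff, the 30% maximum disordered fraction, and the function names are all hypothetical, and real scores would come from a disorder predictor.

```python
def disordered_fraction(scores, threshold=0.5):
    """Fraction of residues whose disorder score is at or above the cutoff."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def select_targets(predictions, max_fraction=0.3):
    """Keep targets whose predicted disordered fraction is small enough."""
    return [
        name
        for name, scores in predictions.items()
        if disordered_fraction(scores) <= max_fraction
    ]


# Hypothetical per-residue disorder scores for two made-up targets.
toy_predictions = {
    "targetA": [0.1, 0.2, 0.1, 0.9],  # 25% of residues predicted disordered
    "targetB": [0.8, 0.9, 0.7, 0.2],  # 75% of residues predicted disordered
}
selected = select_targets(toy_predictions)
print(selected)
```

Both cutoffs are tunable. A stricter `max_fraction` enriches the list for ordered proteins at the cost of discarding more candidates.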
The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
The identification of intrinsically disordered proteins (IDPs) among the targets that fail to form satisfactory crystal structures in the Protein Structure Initiative represents a key to reducing the costs and time for determining three-dimensional structures of proteins. To help in this endeavor, several Protein Structure Initiative Centers were asked to send samples of both crystallizable proteins and proteins that failed to crystallize. The abundance of intrinsic disorder in these proteins was evaluated via computational analysis using Predictors of Natural Disordered Regions (PONDR®), and the potential cleavage sites and corresponding fragments were determined. Then, the target proteins were analyzed for intrinsic disorder by their resistance to limited proteolysis. The rates of tryptic digestion of sample target proteins were compared to those of lysozyme/myoglobin, apo-myoglobin and α-casein as standards of ordered, partially disordered and completely disordered proteins, respectively. At the next stage, the protein samples were subjected to both far-UV and near-UV circular dichroism (CD) analysis. For most of the samples, a good agreement between CD data, predictions of disorder and the rates of limited tryptic digestion was established. Further experimentation is being performed on a smaller subset of these samples in order to obtain more detailed information on the ordered/disordered nature of the proteins.
Intrinsically disordered proteins; protein disorder prediction; Protein Structure Initiative; limited proteolysis
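The determination of potential tryptic cleavage sites and the corresponding fragments, mentioned in the abstract above, can be sketched in a few lines. The abstract does not state which cleavage rule was applied, so this sketch assumes the commonly used one: trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name `tryptic_fragments` and the example sequence are hypothetical.

```python
def tryptic_fragments(sequence):
    """Split a sequence after K or R, except when the next residue is P.

    Assumed rule (not stated in the source): the common trypsin specificity.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i + 1 == len(sequence)
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    # The remainder after the final cleavage site is the last fragment.
    fragments.append(sequence[start:])
    return fragments


# Made-up peptide: cleaved after the first K and the second K,
# but not after R because it is followed by P.
example = tryptic_fragments("AAKGGRPCCKDD")
print(example)
```

This lists only potential sites. In limited proteolysis, which of them are actually cut depends on how accessible and flexible the surrounding region is, which is why digestion rates can serve as a readout of disorder.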
Parkinson’s disease (PD) is a slowly progressive movement disorder that results from the loss of dopaminergic neurons in the substantia nigra, a small area of cells in the mid-brain. PD is a multifactorial disorder with unknown etiology, in which both genetic and environmental factors play important roles. Substantial evidence links α-synuclein, a small highly conserved presynaptic protein with unknown function, to both familial and sporadic PD. Rare familial cases of PD are associated with missense point mutations in α-synuclein, or with the hyper-expression of the wild type protein due to its gene duplication/triplication. Furthermore, α-synuclein was identified as the major component of amyloid fibrils found in Lewy bodies and Lewy neurites, the characteristic proteinaceous deposits that are the diagnostic hallmarks of PD. α-Synuclein is abundant in various regions of the brain and has two closely related homologs, β-synuclein and γ-synuclein. When isolated in solution, the protein is intrinsically disordered, but in the presence of lipid surfaces α-synuclein adopts a highly helical structure that is believed to mediate its normal function(s). A number of different conformational states of α-synuclein have been observed. Besides the membrane-bound form, other critical conformations include a partially folded state that is a key intermediate in aggregation and fibrillation, various oligomeric species, and fibrillar and amorphous aggregates. A number of intrinsic and extrinsic factors that either accelerate or inhibit the rate of α-synuclein aggregation and fibrillation in vitro are known. There is a strong correlation between the conformation of α-synuclein (induced by various factors) and its rate of fibrillation. The aggregation process appears to be branched, with one pathway leading to fibrils and another to oligomeric intermediates that may ultimately form amorphous deposits.
The molecular basis of Parkinson’s disease appears to be tightly coupled to the aggregation of α-synuclein and the factors that affect its conformation. This review focuses on the contributions of Prof. Anthony L. Fink to the field and presents some recent developments in this exciting area.
α-Synuclein; synucleinopathies; aggregation; amyloid; fibril; neurodegeneration; intrinsically disordered protein; NMR; partially folded intermediate