More than seventy years ago, it was speculated that antibody binding depends on unfolded rather than structured protein [1]. Specifically, Linus Pauling suggested that high flexibility enables one antibody molecule to bind to differently shaped antigens. The specific idea was that of conformational selection, in which the flexible antibody fluctuates randomly among different structures and binding by a particular antigen selects the structure that fits from among the conformers in the ensemble [2]. The current body of evidence suggests that there are two broad classes of antibodies, specific and nonspecific. The sequence of a highly specific, high-affinity antibody folds into a specific structure that fits its cognate antigen (with perhaps slight structural shifts of both the antibody and antigen). On the other hand, at least some of the low-affinity, nonspecific antibodies contain binding sites that are disordered in isolation but become differently folded when bound to different partners. A recent assembly of structural data on antibody-antigen interactions supports the early conjectures cited above (manuscript in preparation).
More recently, the involvement of intrinsic disorder in molecular recognition has been suggested to proceed by at least two possible mechanisms [3]. Conformational selection, as described by Pauling [2] and later by Karush [4], is the first mechanism. Alternatively, a local part of the binding region could form an initial interaction, followed by concomitant binding and folding over the remainder of the interface [5]. Although described some time ago [6], this second mechanism was recently recast in terms of folding funnel concepts and called the "fly casting mechanism" [7]. For two recently studied molecular recognition events, experimental evidence has been provided for the latter mechanism [8]. Both of these interfaces are fairly large and extend over significant lengths of the intrinsically disordered proteins. Mixed mechanisms are of course possible, with a subregion of the interface interacting via conformational selection, followed by concomitant binding and folding for the remainder of the interface. The choice between conformational selection and concomitant binding and folding might depend to a significant degree on the overall size of the interaction surface.
The existence of unstructured, or incompletely structured, proteins under physiological conditions began to be reported almost sixty years ago, with several additional reports in the following decades [4]. Since the 1970s, an increasingly strong stream of disordered protein examples has been revealed, and many of these are described in our database of intrinsically disordered proteins [14]. This database also contains a bibliography that is showing explosive growth, especially over the last few years.
The crowded conditions inside the cell have been suggested to cause intrinsically disordered proteins to fold into 3D structure. To test this possibility, intrinsically disordered proteins were subjected to molecular crowding by adding high concentrations of agents such as glucose. Such in vitro molecular crowding experiments led to successful folding of an acid-unfolded globular protein [16] but failed to induce folding in several intrinsically disordered proteins [16]. This perhaps suggests that crowding can deepen the energy well of a protein that folds under appropriate conditions but cannot induce a deep energy well for a protein whose sequence is incommensurate with folding.
In-cell NMR experiments indicate that some proteins or protein regions remain unfolded even when crowding occurs inside a cell [18]. Another in-cell NMR report [21] involving some of the same authors was later retracted because protein leakage from the cells led to misleading data [22]. The earlier experiments [18] may not have suffered from the same leakage problems, which might have been specific to the protein used in the later studies [22]. Overall, these experiments provide additional evidence that intrinsically disordered proteins remain incompletely folded inside the cell, but additional experiments are needed to increase confidence in these results.
A number of different terms have been used to describe these proteins, including rheomorphic [23], natively denatured [24], natively unfolded [25], intrinsically unstructured [26], and several variants of disordered [27]. By now, several reviews on these proteins have appeared [16]. We use "intrinsically disordered" to describe all types of incompletely folded proteins and regions, and we use "natively unfolded" or "intrinsically unstructured" to indicate random-coil-like and pre-molten globular forms. Collapsed random coils, as recently described for polyQ [35], are similar to, if not identical with, the pre-molten globule form, and in our view these structures fit into the "natively unfolded" category. However, there is no consensus in this field regarding nomenclature, which suggests the need for a disordered protein ontology.
Just as the amino acid sequence codes for protein structure, so might the sequence also code for lack of structure, or disorder. Development of a predictor of protein disorder is one way to test the hypothesis that disorder is encoded by the amino acid sequence. Furthermore, study of disorder prediction provides a means to understand "the protein disorder code." For example, Figure 1A shows that "natively unfolded" proteins (a subset of intrinsically disordered proteins that have little or no ordered structure under physiologic conditions and behave as random coils or pre-molten globules [37]) are specifically localized within a unique region of charge-hydropathy phase space, indicating that a combination of low overall hydropathy and high net charge represents a unique structural feature of "natively unfolded" proteins [37]. In more general terms, certain amino acid residues have been found to be highly "order-promoting" (namely cysteine, tryptophan, tyrosine, isoleucine, phenylalanine, valine, leucine, histidine, threonine, and asparagine) while others are highly "disorder-promoting" (namely aspartic acid, methionine, lysine, arginine, serine, glutamine, proline, and glutamic acid) [40]. These order-promoting and disorder-promoting trends are further illustrated by Figure 1B, which depicts the relative amino acid compositions of intrinsically disordered regions available in the DisProt database [15] in comparison with a set of structured (or ordered) proteins [40]. In this case, the amino acid compositions were compared by means of a profiling approach [30].
Figure 1. Peculiarities of the amino acid sequences of intrinsically disordered proteins. A. Mean net charge versus mean hydropathy plot (charge-hydropathy plot) for the set of 275 folded (blue squares) and 91 natively unfolded proteins (red circles). B. Amino-acid ...
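As a minimal sketch of the sequence features discussed above, the following Python computes the two coordinates of a charge-hydropathy plot and the fractions of order-promoting and disorder-promoting residues in a sequence. The residue classes are taken from the lists in the text. The Kyte-Doolittle hydropathy scale, its rescaling to the 0 to 1 range, the treatment of histidine as neutral, and all function names are assumptions made for illustration and may differ from the procedure used to generate Figure 1.

```python
# Kyte-Doolittle hydropathy scale (assumed here; the text does not name a scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue classes as listed in the text (one-letter codes).
ORDER_PROMOTING = set("CWYIFVLHTN")
DISORDER_PROMOTING = set("DMKRSQPE")


def _clean(seq):
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = seq.strip().upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return seq


def mean_hydropathy(seq):
    """Mean hydropathy per residue, rescaled so -4.5 maps to 0 and +4.5 maps to 1."""
    seq = _clean(seq)
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue: K and R count +1, D and E count -1.

    Assumption: histidine and the chain termini are treated as uncharged.
    """
    seq = _clean(seq)
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def promoting_fractions(seq):
    """Return (order-promoting fraction, disorder-promoting fraction)."""
    seq = _clean(seq)
    order = sum(aa in ORDER_PROMOTING for aa in seq) / len(seq)
    disorder = sum(aa in DISORDER_PROMOTING for aa in seq) / len(seq)
    return order, disorder
```

Calling `mean_hydropathy(seq)` and `mean_net_charge(seq)` on a protein sequence gives its position in charge-hydropathy space. Deciding whether that position falls in the "natively unfolded" region requires a boundary that is not given in this text, so the sketch deliberately stops at the coordinates.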
During the fourth Critical Assessment of Structure Prediction (CASP) experiment, we worked with the meeting organizers to carry out disorder prediction on the various protein targets. This trial was deemed a success (C.J. Brown, unpublished), so disorder prediction was included in the subsequent CASP 5, 6, and 7 experiments [45]. Inclusion of disorder prediction in the CASP experiments has stimulated a rapid increase in the number of such predictors, with at least 25 different predictors having been developed by now. A collection of links to many of these is maintained at the Database of Disordered Protein website, http://www.disprot.org.
Several disordered protein predictors have been compared in recent publications [40]. As more disordered proteins have been identified, and as more sophisticated machine learning methods have been applied, the per-residue prediction accuracy has risen from ~70% to ~85%. A likely significant impediment to further improvement is the misclassification of residues in the training sets.
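To make the notion of per-residue accuracy concrete, here is a minimal sketch that scores a prediction against an annotation. The encoding of each residue as 'D' (disordered) or 'O' (ordered) and both function names are hypothetical. The balanced variant is included only because disordered residues are often the minority class; it is an assumption and may not match how the ~70% and ~85% figures above were computed.

```python
def per_residue_accuracy(predicted, annotated):
    """Fraction of residues whose predicted state matches the annotated state.

    Both arguments are equal-length strings over {'D', 'O'} (assumed encoding).
    """
    if not annotated or len(predicted) != len(annotated):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, annotated))
    return matches / len(annotated)


def balanced_accuracy(predicted, annotated):
    """Mean of the accuracy on annotated-disordered and annotated-ordered residues."""
    if not annotated or len(predicted) != len(annotated):
        raise ValueError("inputs must be non-empty and of equal length")
    per_class = []
    for state in "DO":
        # Predictions at the positions annotated with this state.
        calls = [p for p, a in zip(predicted, annotated) if a == state]
        if not calls:
            raise ValueError(f"annotation contains no '{state}' residues")
        per_class.append(sum(p == state for p in calls) / len(calls))
    return sum(per_class) / len(per_class)
```

Because both scores are measured against the annotation, any residues that are mislabeled in the reference set limit the accuracy a predictor can appear to reach, which is the impediment noted above.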
Application of the disorder predictors to various organisms in the three domains of life, namely, bacteria, archaea, and eukaryotes, reveals a large increase in disorder among the eukaryotes compared to the other two domains [48]. One related speculation is that more disorder is needed for signaling and coordination among the various organelles in the more complex eukaryotic domain [54].
The recent explosion of papers on intrinsically disordered proteins contains many new discoveries by a large number of investigators. There is neither time nor space to adequately cover these important advances. Herein we focus mainly on our own work, and we hope that other researchers in this field will not be offended by this approach. The following are seven short stories that briefly review recent research on disordered proteins published by our group: (1) a bioinformatics study of the relationship between disorder and function in the Swiss Protein Database [55]; (2) an introduction of the molecular recognition feature (MoRF) concept and characterization of various MoRFs and MoRF-binding proteins [58]; (3) the mechanisms by which one disordered region can bind to many partners and by which many different disordered sequences can bind to one site on one protein partner [62], thereby contributing to the complex protein-protein interaction networks that are observed in nature; (4) the observation that regions of mRNA that undergo alternative splicing code for disordered protein much more often than they code for structured protein [64]; (5) a bioinformatics study on conservation of intrinsic disorder in protein domains and protein families [65]; (6) an introduction of the disordered proteins in disease (or D²) concept, which is based on bioinformatics analyses that indicate an abundance of intrinsic disorder in disease-related proteins [38]; and (7) a novel method for drug discovery based on regions of disordered protein [75]. The novel drug discovery method suggests how the observations in the first six studies might be put to practical use.