Although much knowledge about ubiquitination has accumulated to date, numerous questions about specific aspects of this highly complex system remain unanswered. So far, no consensus sequence that determines which lysine of a substrate becomes ubiquitinated has been identified across non-homologous proteins. In addition, the broad range of specificities of the ligases, together with the relative rigidity of their structures, raises a question about the mechanisms of substrate selection. It is difficult to assume that all substrates carry a similar preexisting structure before they bind to the components of the ubiquitination machinery.
Disorder has previously been implicated in various aspects of ubiquitination.30–33
Here, we present several lines of evidence that a significant fraction of Ub sites may be located in intrinsically disordered regions. First, we searched the literature and found a number of experimentally confirmed Ub sites located in disordered regions. Second, despite the large size of the PDB, only 7% of currently known Ub sites in yeast could be confidently mapped to protein structures. Third, the use of disorder predictors, as well as the analysis of sequence, physicochemical and evolutionary properties around Ub sites, showed a higher propensity of Ub sites to be disordered than ordered (the average disorder prediction scores for Ub and non-Ub sites were 0.57±0.01 and 0.44±0.003, while the scores calculated on the experimentally verified disordered and ordered protein regions were 0.66±0.02 and 0.39±0.01, respectively). Fourth, the functional classes of proteins predicted to be over-ubiquitinated also show signatures of structural disorder; however, this evidence may not be independent. One previous study that also examined structural preferences of Ub sites concluded that these sites were preferentially located within loops.87
However, since the Catic et al. study was limited to only 40 Ub sites and was structure-based, it did not account for the presence of disorder, for which structural information was not available.
Locating Ub sites in unstructured regions is compelling when one takes into account the crystal structures of ubiquitin ligases. The structures of ubiquitin ligases contain large cavities and gaps13,14,17,18,88
that may serve to accommodate unstructured substrates. The Cul1 subunit of the SCF complex is rigid and elongated, and the gap between Skp2 and the E2 active site is ~50 Å, presumably so that the complex can bind a wide range of substrates of different sizes.14
Given the rigidity of the SCF complex and the diversity of proteins to which it binds, it is likely that the substrates adopt significant flexibility in order to conform to the rigid scaffold of the SCF complex. Indeed, the structure of β-TrCP1-Skp1 bound to a β-catenin peptide15
indicates that 15 out of 26 residues of the substrate peptide are disordered. Similarly, 14 out of 24 residues of the p27Kip1 substrate in another structure are also disordered.17
In addition, a large distance between the E3 and E2 active sites suggests that the transfer of ubiquitin requires some large-scale movements. It is reasonable to speculate that movement of the substrate is required for the successful transfer and conjugation of the ubiquitin molecule. Thus, large cavities in structures of ubiquitin ligases could serve to accommodate diverse disordered substrates.
Another important result of this work is the development of a Ub site predictor, UbPred. UbPred achieved a balanced accuracy of 72%, and the area under the ROC curve was estimated to be ~80%. We demonstrated the utility of UbPred by: (1) predicting precise Ub sites in a dataset of Rsp5 ubiquitin ligase substrates; (2) establishing the correlation between ubiquitination and protein half-life; (3) identifying functional categories of yeast and human proteins that are likely to be regulated by ubiquitination; and (4) demonstrating potential loss and gain of Ub sites as a consequence of disease mutations in humans. Thus, the initial application of UbPred to various datasets has expanded our understanding of ubiquitination in several biological processes and human diseases.
It should be noted that the UbPred algorithm does not account for E3 binding/recognition sites, which in some cases have been shown to be located distantly from Ub sites. Therefore, UbPred will not predict the ultimate ubiquitination status of a site, since this status depends on whether an E3 binds to the protein. In essence, it outputs the probability that the site is ubiquitinated if other conditions (such as E3 binding) are satisfied. Currently, it is not known whether universal ubiquitination/degradation signals could successfully predict the ubiquitination status of a substrate. Recent evidence suggests that the presence of bona fide degradation signals, such as the destruction-box, KEN-box, PEST regions and specific N-end residues, shows no correlation with protein half-life and has hardly any influence on protein turnover.32
In agreement with this observation, a computational scan of our positive examples for the presence of two degradation signals, a KEN-box (K-E-N) and a destruction box (R-x-x-L, x = any amino acid), showed that only 8 out of 265 substrates carried a KEN-box motif, and only 18 substrates carried a destruction box motif, in the vicinity of their Ub sites. These signals, therefore, could not serve as global predictors of substrate ubiquitination and/or degradation. The disorder status of the substrate seems to be a better global ubiquitination signal than the presence of specific motifs.
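A motif scan of this kind can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name `motifs_near_site`, the 0-based site index, and the default window of 20 residues on each side of the site are all assumptions, since the text does not specify how "vicinity" was defined. Only the two motif patterns (K-E-N and R-x-x-L) come from the description above.

```python
import re

# Motif definitions taken from the text; lookaheads let overlapping
# occurrences (e.g. "RAALRAAL") each be reported.
MOTIFS = {
    "KEN-box": re.compile(r"(?=KEN)"),
    "D-box": re.compile(r"(?=R..L)"),  # R-x-x-L, x = any amino acid
}


def motifs_near_site(sequence, site, window=20):
    """Return 0-based start positions of each motif near a Ub site.

    `site` is the 0-based index of the ubiquitinated lysine.
    `window` is a hypothetical choice: the number of residues searched
    on each side of the site. A motif is counted only if it lies
    entirely within that region.
    """
    lo = max(0, site - window)
    hi = min(len(sequence), site + window + 1)
    region = sequence[lo:hi]
    return {
        name: [lo + m.start() for m in pattern.finditer(region)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Toy sequence made up for illustration; not a real substrate.
    toy = "MARTALKENAAAKGGG"
    print(motifs_near_site(toy, site=12))
    print(motifs_near_site(toy, site=12, window=3))
```

Counting the substrates for which either list is non-empty would give tallies comparable in kind to those reported above, although the exact numbers depend on the window chosen.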
While we were working on this project, another predictor of Ub sites was developed.89
It was trained on 157 Ub sites extracted from a database of ubiquitinated proteins.90
The majority of the Ub sites in this database were extracted from the two large-scale proteomics-based publications,35,36
also used in our work. However, that predictor performed poorly on our newly identified Ub sites (sensitivity = 50.4%; specificity = 55.8%; accuracy = 53.1%; AUC = 54.8%).
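For readers comparing these figures, the reported accuracy of 53.1% equals the mean of the reported sensitivity and specificity, which is how balanced accuracy is defined. The short sketch below reproduces that arithmetic. The function names are hypothetical, and the assumption that the accuracy quoted here is a balanced accuracy is an inference from the numbers, not something the text states.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true Ub sites recovered
    specificity = tn / (tn + fp)  # fraction of non-Ub sites rejected
    return sensitivity, specificity


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Percentages as reported in the text for the other predictor.
    print(round(balanced_accuracy(50.4, 55.8), 1))
```

Because balanced accuracy weights the positive and negative classes equally, a value near 50% corresponds to chance-level performance regardless of how many non-Ub lysines are in the test set.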
To summarize, the involvement of flexible and disordered protein regions in various aspects of the ubiquitination process further emphasizes the functional importance of such regions. Although many functions of disordered regions have already been discovered, we provide computational evidence that ubiquitination has signatures similar to those of other post-translational modifications that rely on unfolded structure.20,27,28,91
Moreover, the development of UbPred represents an attempt to identify candidate Ub sites based on local sequence information. Although the number of experimentally determined Ub sites will grow, and these sites will be added to our training set to improve predictor performance, the current accuracy of UbPred is already useful for predicting novel ubiquitination substrates as well as new sites in known substrates. With an established link between the ubiquitin-proteasome system and a number of human diseases,81,82,92
such predictions, especially when confirmed by experiments, would help to target the degradation of individual proteins more precisely, and may ultimately lead to the development of better drugs.