Formation of amyloid-like fibrils is involved in numerous human protein deposition diseases, but is also an intrinsic property of polypeptide chains in general. Recent progress now allows the aggregation propensity of proteins to be analyzed on a large scale. In this work we used a previously developed predictive algorithm to analyze the propensity of the 34,180 protein sequences of the human proteome to form amyloid-like fibrils. We show that long proteins have, on average, less intense aggregation peaks than short ones. Human proteins involved in protein deposition diseases do not differ extensively from the rest of the proteome, further demonstrating the generality of protein aggregation. We were also able to reproduce some of the results obtained with other algorithms, demonstrating that these results do not depend on the type of computational tool employed. For example, proteins with different subcellular localizations were found to have different aggregation propensities, in relation to the differing efficiencies of the quality control mechanisms in each compartment. Membrane proteins, intrinsically disordered proteins, and folded proteins were confirmed to have very different aggregation propensities, as a consequence of their different structures and cellular microenvironments. In addition, gatekeeper residues at strategic positions in the sequences were found to protect human proteins from aggregation. The results of these comparative analyses highlight the intimate links between the propensity of proteins to form aggregates with β-structure and their biology. In particular, they emphasize the existence of a negative selection pressure that finely modulates protein sequences in order to adapt their aggregation propensity to their biological context.
Amyloid-like fibrils are insoluble proteinaceous fibrillar aggregates with a characteristic structure (the cross-β core) that form and deposit in more than 40 pathological conditions in humans. These include Alzheimer's disease, Parkinson's disease, type II diabetes, and the spongiform encephalopathies. A number of proteins not involved in any disease can also form amyloid-like fibrils in vitro, suggesting that amyloid fibril formation is an intrinsic property of proteins in general. Recent efforts to understand the physicochemical basis of amyloid fibril formation have led to the development of several algorithms capable of predicting a number of aggregation-related parameters of a protein directly from its amino acid sequence. In order to study the predicted aggregation behavior of the human proteome, we ran one of these algorithms on the 34,180 human protein sequences. Our results demonstrate that molecular evolution has acted on protein sequences to finely modulate their aggregation propensities, depending on different parameters related to their in vivo environment. Together with cellular control mechanisms, this natural selection protects proteins from aggregation during their lifetime.