Genome sequencing has enabled us to acquire a vast amount of data, and this has accelerated the discovery of biological rules at the genome level [35], [36]. Compared with the genome and the transcriptome, the proteome is more dynamic and more diverse in composition, modification, interaction, and localization. Many fundamental rules or regular patterns that may explain this diversity remain to be discovered. With the advancement of proteomics, a growing number of studies have begun to mine the rules underlying these large datasets [37].
Protein abundance is one of the most important phenotypic variables of proteins and is precisely controlled. Exploring the proteome-wide relationship between this phenotypic variable and intrinsic protein properties (such as evolutionary, structural, or functional characteristics) can help uncover essential rules governing proteins. Here, we described three correlations between protein abundance and the protein's intrinsic evolutionary, structural, and functional properties, each observed consistently from bacteria to yeast, worm, fly, mouse, and human. (1) Protein abundance correlates positively with both the protein's origination time and its sequence conservation during phylogeny, confirming the conclusions of previous studies [4]–[6] across a wider range of species. (2) Protein abundance correlates negatively with the protein's domain number and positively with its domain coverage, indicating that proteins with simple or compact structures tend to be highly abundant. The lower abundance of proteins with more domains may reflect selection to reduce the risk of mistranslation [6] in multi-domain proteins. The higher domain coverage of more abundant proteins may result from selection to shorten non-functional regions within abundant proteins and thereby minimize transcriptional and translational costs, analogous to the selection on intronic DNA length [7]. (3) Proteins in the mass category tend to be more abundant than those in the information category, consistent with previous conclusions drawn from data on bacteria [11] or a human cell line [14]. Our work is the first to confirm that this pattern holds across species, and our new analyses yielded several additional findings. Intriguingly, the third pattern can be inferred from the first two, highlighting the importance of the relationship between structure and function.
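The structural correlations in pattern (2) rest on two per-protein quantities, domain number and domain coverage, each compared against abundance across a proteome. The sketch below illustrates one way such a comparison could be computed. It is not the authors' pipeline. It assumes that domain coverage is the fraction of residues covered by at least one annotated domain and that a rank-based (Spearman) correlation is used; this section names neither the statistic nor the exact definition. All function names and the four toy proteins are hypothetical.

```python
from typing import List, Sequence, Tuple


def domain_coverage(length: int, domains: Sequence[Tuple[int, int]]) -> float:
    """Assumed definition: fraction of residues (1-based, inclusive
    intervals) covered by at least one domain; overlaps count once."""
    covered = set()
    for start, end in domains:
        covered.update(range(max(1, start), min(length, end) + 1))
    return len(covered) / length


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy proteome: (id, abundance, length, domain intervals).
proteins = [
    ("P1", 5000.0, 120, [(5, 110)]),
    ("P2", 800.0, 300, [(10, 100), (150, 260)]),
    ("P3", 150.0, 900, [(20, 120), (200, 320), (400, 500), (700, 820)]),
    ("P4", 40.0, 1500, [(50, 150), (300, 420), (600, 700), (900, 1000), (1200, 1300)]),
]

abundance = [p[1] for p in proteins]
n_domains = [len(p[3]) for p in proteins]
coverage = [domain_coverage(p[2], p[3]) for p in proteins]

rho_domains = spearman(abundance, n_domains)
rho_coverage = spearman(abundance, coverage)
print(f"Spearman(abundance, domain number)   = {rho_domains:+.2f}")  # -1.00
print(f"Spearman(abundance, domain coverage) = {rho_coverage:+.2f}")  # +1.00
```

A rank-based statistic is used in this sketch because abundance values typically span several orders of magnitude, and Spearman's rho is insensitive to any monotonic transformation such as taking logarithms. On real data, the toy list would be replaced by measured abundances and domain annotations for a whole proteome.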
Taken together, the distribution of protein abundance across the whole proteome displays prominent regularities, despite large differences in protein composition and abundance among the proteomes of diverse biological entities. These regularities appear to be maintained across organs, as analysis of the mouse kidney dataset yielded patterns identical to those from the mouse liver datasets (Fig. S5). The discovery of these regularities demonstrates that quantitative, comprehensive proteomic datasets can provide rich ground for exploring and discovering fundamental rules of nature at the proteome scale.