Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simulations and binding site predictions resulting from evolutionary sequence analysis, are possible and realizable on hundreds of proteins with variate structures and interfaces. We demonstrated this on the 168 proteins of the Mintseris Benchmark 2.0. On the one hand, we evaluated the quality of the interaction signal and the contribution of docking information compared to evolutionary information showing that the combination of the two improves partner identification. On the other hand, since protein interactions usually occur in crowded environments with several competing partners, we realized a thorough analysis of the interactions of proteins with true partners but also with non-partners to evaluate whether proteins in the environment, competing with the true partner, affect its identification. We found three populations of proteins: strongly competing, never competing, and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behavior of a protein in the crowded environment. We showed that partner identification, to some extent, does not depend on the competing partners present in the environment, that certain biochemical classes of proteins are intrinsically easier to analyze than others, and that small proteins are not more promiscuous than large ones. Our approach brings to light that the knowledge of the binding site can be used to reduce the high computational cost of docking simulations with no consequence in the quality of the results, demonstrating the possibility to apply coarse-grain docking to datasets made of thousands of proteins. Comparison with all available large-scale analyses aimed to partner predictions is realized. We release the complete decoys set issued by coarse-grain docking simulations of both true and false interacting partners, and their evolutionary sequence analysis leading to binding site predictions. Download site: http://www.lgm.upmc.fr/CCDMintseris/
Protein-protein interactions (PPI) are at the heart of the molecular processes governing life and constitute an increasingly important target for drug design. Given their importance, it is vital to determine which protein interactions have functional relevance and to characterize the protein competition inherent to crowded environments, as the cytoplasm or the cellular organelles. We show that combining coarse-grain molecular cross-docking simulations and binding site predictions based on evolutionary sequence analysis is a viable route to identify true interacting partners for hundreds of proteins with a variate set of protein structures and interfaces. Also, we realize a large-scale analysis of protein binding promiscuity and provide a numerical characterization of partner competition and level of interaction strength for about 28000 false-partner interactions. Finally, we demonstrate that binding site prediction is useful to discriminate native partners, but also to scale up the approach to thousands of protein interactions. This study is based on the large computational effort made by thousands of internautes helping World Community Grid over a period of 7 months. The complete dataset issued by the computation and the analysis is released to the scientific community.