It is now widely accepted that protein-protein interactions (PPIs) play a fundamental role in the vast majority of molecular events that occur in living organisms. Proteins can interact to form stable macromolecular assemblies that perform many complex biological functions. They can also form transient interactions that collectively constitute dynamic networks regulating how organisms operate. PPIs are also of crucial importance to bacteria and viruses, which interfere with the host PPI network during infection [1]. Consequently, protein-protein binding sites are becoming major targets for novel drug design strategies [3].
Early studies identified shape complementarity, surface hydrophobicity and charge complementarity as key factors in recognition [4]. More recently, the increasing availability of structural data on protein-protein complexes [6] has led to a more refined picture of PPI mechanisms. Emerging structural and functional properties of transient interactions include conformational changes and disorder-to-order transitions upon binding, the sequence conservation of interface residues, the existence of multi-specific proteins, and the role of post-translational modifications [7].
A number of methods for predicting PPIs have been developed, targeting two distinct aspects of the problem: protein-protein binding site prediction and protein docking. In the former case, the challenge is to identify the surface residues involved in forming protein-protein complexes; see [8] for recent reviews. In contrast, docking methods aim to predict the structures of known, generally binary, complexes from the structures of the separate proteins, using scoring functions based on shape, electrostatic and hydrophobic factors to locate optimal conformations. Substantial progress has been made in the docking field in recent years. The best algorithms can now correctly predict the structures of most complexes when no major conformational change occurs upon binding, and promising developments are being made in the treatment of conformational changes [11]. However, it has been pointed out that the scoring functions used in docking perform very poorly when the aim is to predict binding affinities [13]. Notably, "cross-docking" studies, in which binary protein complexes are separated and the isolated proteins are all docked against each other using a successful multiple-minimization docking algorithm [15], have demonstrated that it is very hard to distinguish between "true" (native) and "false" complexes. Similar difficulties were found using the top-performing ClusPro [17] web server [J. Martin, unpublished results]. In another, larger-scale study using a different docking algorithm [18], docking scores were biased in favor of true complexes, yet in the vast majority of cases false complexes were still scored better than true ones.
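The cross-docking comparison described above can be summarized, for each protein, by the rank of its native partner among all the partners it was docked against. The sketch below is illustrative only: it assumes a precomputed all-against-all score matrix in which lower scores are better (energy-like). The function names and toy scores are hypothetical and are not output from any of the docking programs cited.

```python
from typing import Dict, List


def native_ranks(scores: List[List[float]], partner: Dict[int, int]) -> Dict[int, int]:
    """Rank (1 = best) of each protein's native partner among all its docking partners.

    Hypothetical helper: assumes scores[i][j] is the docking score of protein i
    against protein j and that lower scores are better. Self-docking is ignored.
    """
    ranks = {}
    for i, row in enumerate(scores):
        native_score = row[partner[i]]
        # Count false partners (neither self nor native) that score strictly better.
        better = sum(
            1 for j, s in enumerate(row)
            if j != i and j != partner[i] and s < native_score
        )
        ranks[i] = better + 1
    return ranks


def fraction_outscored(ranks: Dict[int, int]) -> float:
    """Fraction of proteins for which at least one false partner beats the native one."""
    return sum(1 for r in ranks.values() if r > 1) / len(ranks)


# Toy example (made-up scores): two binary complexes, (0, 1) and (2, 3).
scores = [
    [0.0, -50.0, -55.0, -40.0],
    [-50.0, 0.0, -30.0, -45.0],
    [-55.0, -30.0, 0.0, -60.0],
    [-40.0, -45.0, -60.0, 0.0],
]
partner = {0: 1, 1: 0, 2: 3, 3: 2}

ranks = native_ranks(scores, partner)
print(ranks)                      # {0: 2, 1: 1, 2: 1, 3: 1}
print(fraction_outscored(ranks))  # 0.25
```

In this toy case protein 0 scores better with the false partner 2 than with its native partner 1, so one protein in four is outscored. The strict inequality means ties are counted in favor of the native partner, which is a choice made for the sketch.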
The fact that false complexes obtain good scores in cross-docking studies raises two important and orthogonal questions. Are scoring functions so poor that they cannot discriminate interacting from non-interacting proteins (as suggested by the observations of [13]), or does this result, at least in part, reflect a physical reality? Unfortunately, there is virtually no experimental data on the strength of the interactions in "false" complexes. These complexes could reflect weak, nonspecific interactions that either occur in the cytoplasm or are avoided through mechanisms such as compartmentalization.
The fact that biological interactions in the cell are tightly orchestrated by localization and co-regulation mechanisms indeed suggests that significant nonspecific interactions may be common. It has been proposed that co-localization is necessary to control specific interactions, given the size of cells and the lifetime of individual proteins [19]. So far, nonspecific interactions have been addressed only marginally in the literature, but they certainly deserve more attention. While localization and co-regulation are the rule in healthy cells, singular events also occur in which localization breaks down, for example when mitochondrial proteins are released into the cytoplasm during the early phase of apoptosis, or when viral or bacterial proteins interfere with host PPIs during infection. Recent studies also suggest that weak interactions play an important role in complex systems. A pioneering simulation of the bacterial cytoplasm showed that proteins interacting through hard-sphere potentials diffuse too fast compared to experiment, and that adding weak nonspecific attractions between all proteins could correct this behavior [20]. Another recent study suggests that nonspecific binding acts as an evolutionary constraint that shapes PPI networks and limits the number of different proteins in genomes [21]. Ultimately, a full understanding of protein networks can only be achieved if we address nonspecific as well as specific interactions.
In this paper, we use computational docking to investigate what can be termed the "twilight zone" of protein-protein interactions, building on an interesting outcome of earlier cross-docking experiments: "false" complexes seem to favor interfaces containing residues belonging to "true" interaction sites [15]. This suggests that randomly chosen protein partners dock in a non-random fashion. Using a non-redundant data set of 198 proteins, we explore the tendency of randomly chosen partners to aggregate at localized regions on the surface of each protein. We analyze the shape and compositional bias of the resulting interfaces and assess the potential of this approach for predicting biologically relevant protein binding sites. We test our procedure on phosphatidylethanolamine-binding protein (PEBP), a kinase inhibitor with multiple known partners.
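The tendency of arbitrary partners to dock at localized regions can be quantified, for a single target protein, as the fraction of docked partners whose interface includes each surface residue. The sketch below is a hypothetical illustration, not the procedure used in this paper: it assumes that interface residues have already been extracted from each docked pose (for example with a distance cutoff), and the 0.5 frequency threshold, residue numbers and "known" site are made-up values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def interface_frequency(interfaces: List[Set[int]], surface: Iterable[int]) -> Dict[int, float]:
    """Fraction of docked partners whose interface contains each surface residue."""
    counts = Counter(res for iface in interfaces for res in iface)
    n = len(interfaces)
    # Counter returns 0 for residues never seen in any interface.
    return {res: counts[res] / n for res in surface}


def predicted_site(freq: Dict[int, float], threshold: float) -> Set[int]:
    """Residues contacted by at least `threshold` of the arbitrary partners."""
    return {res for res, f in freq.items() if f >= threshold}


def precision_recall(predicted: Set[int], known: Set[int]) -> Tuple[float, float]:
    """Overlap of a predicted patch with a known (biologically relevant) site."""
    tp = len(predicted & known)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(known) if known else 0.0
    return precision, recall


# Toy example (hypothetical): interfaces produced by four arbitrary partners on one target.
surface = range(1, 11)
interfaces = [{1, 2, 3}, {2, 3, 4}, {2, 3, 9}, {7, 8}]
freq = interface_frequency(interfaces, surface)
site = predicted_site(freq, threshold=0.5)
print(site)                               # {2, 3}
print(precision_recall(site, {2, 3, 4}))  # (1.0, 0.6666666666666666)
```

Residues 2 and 3 are contacted by three of the four partners and form the predicted patch, while residues contacted by a single partner fall below the threshold. Comparing such a patch with an experimentally known site is one simple way to ask whether arbitrary docking points toward biologically relevant interfaces.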