Many biological functions of proteins occur through specific recognition among protein molecules. Knowledge of protein–protein interactions, particularly three-dimensional structural information of protein–protein complexes, is crucial for understanding the biochemical and physiological functions of proteins.1
Recently, the number of tertiary structures of protein complexes has been increasing through the efforts of structural biologists; however, it is still smaller than the number of known protein–protein interactions.4
Therefore, accurate prediction of protein complex structures is required for further experimental studies. Protein–protein docking simulation is one of the popular approaches to predicting protein complex structures.7
Docking procedures generally consist of two main steps: a sampling step and a subsequent scoring step. A large number of complex models are generated in the sampling step. The problem of searching the high-dimensional conformational space to create a collection of complex models has been studied by various research groups.10
However, several issues remain, such as how to introduce conformational flexibility when generating near-native models for targets that undergo large conformational changes.9
In the scoring step, a scoring function is used to select near-native models from the many complex models generated in the sampling step. The scoring functions presently available evaluate complex models in terms of surface complementarity22
along with the electrostatic filter,10
the atomic contact energy (ACE)27
or the statistical potentials based on the pairs of interacting residues,28
including hydrogen bonds and van der Waals interactions. However, selecting the correct solutions remains difficult in the structure prediction of many different heterodimers.9
As previous studies have pointed out,1
heterodimer complexes vary not only in biological function and three-dimensional structure but also in interaction mode. For example, there are heterodimers with electrostatically dominant interfaces, those with hydrophobically dominant interfaces, and those with neither type of dominant interface but with high or low shape complementarity. In contrast, scoring functions based on the statistical analysis of heterodimer interactions are usually designed to select the complex models with the interaction mode that is most abundant in the known complexes, and thus a single scoring function will not be enough to evaluate the diverse protein–protein interfaces. In addition, the identification of interaction modes, i.e., the classification of heterodimer complexes, has usually been based on the interface characteristics observed in experimentally determined heterodimer structures. However, for building a native dimer structure, information about the difference between interacting and noninteracting sites will be more important, because even a weak interface can be the native interface if no better interface exists.
Several pioneering works have already proposed multiple scoring functions, each optimized for a type of protein function.10
However, they focused on only two types: enzyme–inhibitor and antibody–antigen complexes. The other heterodimers, such as those related to signal transduction and to gene transcription and translation, were classified together as other types.32
This is probably because the small number of known complex structures makes it difficult to find functional similarities among these heterodimers and to categorize them. Thus, classifying heterodimers by using information other than protein function will facilitate the construction of multiple scoring functions.
In this study, we addressed the problem of selecting the correct solutions from the many complex models in the scoring step by considering the various features of heterodimers. First, we classified the native interacting sites with reference to decoy structures, searching for scoring-function parameters that discriminate near-native models from decoy models. As the scoring function, we used a weighted linear combination of three complementarity scores, for hydrophobicity, electrostatic potential, and shape, at the protein–protein interface.35
This function indicates the total degree of complementarity of the three surface features over the interface. Four heterodimer clusters were found with our classification scheme. Four scoring functions were then constructed as multiple scoring functions, each optimized for one heterodimer type.
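
The idea of a weighted linear combination of three complementarity scores, with a separate weight set per heterodimer type, can be illustrated with a minimal Python sketch. This is not the implementation used in the study. The class and function names, the two type labels, the weight values, and the toy scores are all hypothetical and chosen only to show how different weight sets can reorder the same candidate models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceScores:
    """Complementarity scores of one complex model (hypothetical container)."""
    hydrophobicity: float
    electrostatics: float
    shape: float


@dataclass(frozen=True)
class Weights:
    """Weights of the linear combination; the values used below are placeholders."""
    hydrophobicity: float
    electrostatics: float
    shape: float


def total_score(s: InterfaceScores, w: Weights) -> float:
    """Weighted linear combination of the three complementarity scores."""
    return (w.hydrophobicity * s.hydrophobicity
            + w.electrostatics * s.electrostatics
            + w.shape * s.shape)


def rank_models(models: dict[str, InterfaceScores], w: Weights) -> list[str]:
    """Return model names ordered from highest to lowest total score."""
    return sorted(models, key=lambda name: total_score(models[name], w), reverse=True)


# Illustrative weight sets for two hypothetical heterodimer types (not fitted values).
WEIGHTS_BY_TYPE = {
    "electrostatic_dominant": Weights(0.5, 2.0, 1.0),
    "hydrophobic_dominant": Weights(2.0, 0.5, 1.0),
}

# Toy candidate models with made-up complementarity scores.
MODELS = {
    "model_a": InterfaceScores(hydrophobicity=1.0, electrostatics=0.25, shape=0.5),
    "model_b": InterfaceScores(hydrophobicity=0.25, electrostatics=1.0, shape=0.5),
}

if __name__ == "__main__":
    for kind, weights in WEIGHTS_BY_TYPE.items():
        print(kind, rank_models(MODELS, weights))
```

With these placeholder numbers, the electrostatics-weighted function ranks model_b first and the hydrophobicity-weighted function ranks model_a first, which is the behavior that motivates using more than one scoring function.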