Protein-protein interactions and complex formation play a central role in a broad range of biological processes, including hormone-receptor binding, protease inhibition, antibody-antigen interaction and signal transduction [1]. As structural genomics projects proceed, we are confronted with an increasing number of structurally known proteins that are functionally uncharacterised. Identifying how two proteins interact will be particularly important for elucidating their functions and for designing inhibitors [2]. Although they produce around 50 percent false positive interactions [3], high-throughput interaction discovery methods such as the yeast two-hybrid system suggest thousands of protein-protein interactions and therefore also imply that a large fraction of all proteins interact with other proteins [4].
Since many biological interactions occur in transient complexes whose structures often cannot be determined experimentally, it is important to develop computational docking methods that can predict the structure of complexes with adequate accuracy [5].
Docking algorithms aim to predict the orientation in which two proteins are likely to bind under natural conditions. They can be split into a sampling step followed by a scoring step. In the first step, a collection of putative complex structures is generated by scanning the full conformational space. The putative complexes are then ranked using scoring functions based on geometric and chemical complementarity.
Various methods are used to scan the conformational space for geometric complementarity (for a general introduction and an overview of the different docking methods, see Halperin et al. 2002 [6]). One of the most widely used docking approaches is based on the Fast Fourier Transform (FFT), which was introduced into docking by Katchalski-Katzir et al. in 1992 [7]. An important aspect of the docking procedure is the representation of the proteins. Most FFT-based methods use a grid representation [7-11], in which each protein is mapped onto a 3D grid and the grid cells are assigned different values representing the surface or the interior of the protein (Figure ). Additional grids or complex numbers can be used to represent specific properties thought to play a crucial role in protein interactions, such as hydrophobicity or electrostatics [8,10-12].
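The grid representation can be illustrated with a minimal sketch, assuming NumPy, a single atomic radius for all atoms, and atom coordinates already translated into the grid box. A cell belongs to the protein if its centre lies within that radius of an atom centre, and a protein cell counts as surface if at least one of its six face neighbours lies outside the protein. This is only one simple surface definition, and the function names (`protein_mask`, `surface_and_interior`, `assign_grid`) are hypothetical, not taken from any of the cited programs.

```python
import numpy as np


def protein_mask(coords, radius, n, spacing):
    """Mark the cells of an n x n x n grid whose centre lies within `radius`
    (Angstrom) of any atom centre.

    Assumes `coords` (shape (N, 3)) has already been shifted into the box
    [0, n * spacing)^3 and that all atoms share one radius.
    """
    ticks = (np.arange(n) + 0.5) * spacing  # cell-centre coordinates per axis
    centres = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1)
    mask = np.zeros((n, n, n), dtype=bool)
    for atom in np.asarray(coords, dtype=float):
        dist2 = np.sum((centres - atom) ** 2, axis=-1)
        mask |= dist2 <= radius ** 2
    return mask


def surface_and_interior(mask):
    """Split protein cells into surface and interior.

    A protein cell is interior only if all six face neighbours are protein
    cells; cells outside the box count as solvent.
    """
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = mask.copy()
    for ax in range(3):
        for shift in (1, -1):
            # Neighbour in direction -shift along `ax`, cropped back to the box.
            neighbour = np.roll(padded, shift, axis=ax)[1:-1, 1:-1, 1:-1]
            interior &= neighbour
    surface = mask & ~interior
    return surface, interior


def assign_grid(surface, interior, interior_value):
    """Surface cells get 1, interior cells get `interior_value`, solvent 0."""
    grid = np.zeros(surface.shape)
    grid[surface] = 1.0
    grid[interior] = interior_value
    return grid


# One spherical "atom" as a toy protein.
mask = protein_mask([[5.0, 5.0, 5.0]], radius=3.0, n=10, spacing=1.0)
surface, interior = surface_and_interior(mask)
receptor = assign_grid(surface, interior, -6.0)  # larger protein: interior -6
ligand = assign_grid(surface, interior, 1.0)     # second protein: interior 1
```

The same mask yields both kinds of grid; only the interior value differs, which is what makes the treatment of the two proteins asymmetric.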
During docking, one grid is rotated relative to the other through a specified set of rotations, and for each rotation the geometric correlation for all translations is calculated in a single step in Fourier space. The geometric complementarity of the proteins is evaluated by summing the products of the values of overlapping cells. In most approaches, the surface cells of both proteins are assigned a value of one; consequently, the more surface cells are in contact with surface cells of the other protein, the higher the geometric score. The interior of the larger protein is assigned a negative value (-6 in our docking program), which yields a 'punishing' negative contribution as soon as the second protein overlaps interior cells of the first. The interior cells of the second protein are assigned a value of one; this asymmetric treatment of the two proteins 'softens' the surfaces slightly [13].
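For a fixed rotation, the score for every translation t is the correlation C(t) = sum over x of a(x) * b(x + t), where a is the grid of the larger protein and b that of the second protein. By the correlation theorem, all translations are obtained at once as C = IFFT(conj(FFT(a)) * FFT(b)). The sketch below assumes NumPy, treats the grids as periodic (so in practice they must be padded to at least the combined extent of both proteins to avoid wrap-around), and omits the outer loop over rotations. `direct_correlation` is a brute-force check only, and all function names are hypothetical.

```python
import numpy as np


def fft_correlation(receptor, ligand):
    """Cyclic correlation C[t] = sum_x receptor[x] * ligand[x + t] for every
    translation t of the grid at once (indices taken modulo the grid size)."""
    r_hat = np.fft.fftn(receptor)
    l_hat = np.fft.fftn(ligand)
    # Correlation theorem for real grids: C = IFFT(conj(FFT(a)) * FFT(b)).
    return np.real(np.fft.ifftn(np.conj(r_hat) * l_hat))


def direct_correlation(receptor, ligand):
    """Brute-force evaluation of the same sum, used only as a check."""
    out = np.zeros(receptor.shape)
    for t in np.ndindex(*receptor.shape):
        # np.roll by -t gives shifted[x] = ligand[x + t] (mod grid size).
        shifted = np.roll(ligand, shift=tuple(-s for s in t), axis=(0, 1, 2))
        out[t] = np.sum(receptor * shifted)
    return out


def best_translation(receptor, ligand):
    """Translation with the highest geometric score for one rotation."""
    scores = fft_correlation(receptor, ligand)
    t = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in t), float(scores[t])


# Random stand-in grids using the cell values from the text.
rng = np.random.default_rng(0)
receptor = rng.choice([0.0, 1.0, -6.0], size=(6, 6, 6))  # surface 1, interior -6
ligand = rng.choice([0.0, 1.0], size=(6, 6, 6))          # surface and interior 1
t_best, score_best = best_translation(receptor, ligand)
```

The FFT version costs O(N^3 log N) per rotation for an N^3 grid, whereas the direct sum over all translations costs O(N^6), which is why the correlation is computed in Fourier space.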
FFT-based docking methods were originally developed for bound docking, in which a complex structure is split into its subunits and the docking algorithm predicts the complex structure from these subunits. Bound docking gives good results in most cases, i.e. a near-native complex structure usually appears at the first rank of the prediction output. However, when the complex is to be predicted from 3D structures determined in the unbound state, most docking procedures do find a near-native solution, but not among the top ranks. This can be explained by the conformational differences between the complexed and the unbound structures. The development of unbound docking methods that can predict the near-native complex even when conformational changes occur is therefore the most challenging current task in the field of protein-protein docking.
Every protein-protein interaction depends on the amino acids involved in it. Several attempts to evaluate the importance of the 20 amino acids for protein-protein docking have been published [14-18]. Various amino acid properties, such as hydrophobicity, interface propensity, electrostatic properties and flexibility, have been tested for their relevance to docking. Two different approaches have been taken. On the one hand, these properties have been used to detect the interface region of proteins before docking [16,19-26]; on the other hand, differences between the amino acids have been used to identify the near-native structure of a complex among all structures showing a high geometric correlation [27].
However, some of these properties lead to contradictory ratings of the amino acids. For example, methionine has a high propensity to occur in complex interfaces [18], which would suggest assigning it an important role; at the same time, methionine has a large side chain that might cause clashes in rigid-body docking even for the near-native complex, which should not be 'punished'.
The flexibility of amino acid side chains is the main reason for the unsatisfactory results of unbound docking. Previous work has attempted to truncate or collapse highly flexible side chains, such as those of arginine, lysine, asparagine, glutamine and methionine, by assigning low numerical values to the cells representing them [10,28,29]. Other approaches to side-chain flexibility include docking with different copies of the unbound subunits [30] or using rotamer libraries in the refinement step.
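Truncating flexible side chains in the grid might look like the following minimal sketch, assuming NumPy, a per-cell array with the three-letter name of the residue occupying each cell, and a flag marking cells that hold side-chain (rather than backbone) atoms. The single replacement value `low_value` and the function name are hypothetical; the cited approaches differ in their details.

```python
import numpy as np

# Residues whose side chains are treated as highly flexible in the text.
FLEXIBLE = ("ARG", "LYS", "ASN", "GLN", "MET")


def soften_flexible_side_chains(grid, residue, is_side_chain, low_value=0.0):
    """Return a copy of `grid` in which cells occupied by side-chain atoms of
    flexible residues are set to `low_value`.

    grid:          cell values (e.g. surface 1, interior -6)
    residue:       array of the same shape with three-letter residue names
    is_side_chain: boolean array, True where the cell holds side-chain atoms
    """
    flexible = np.isin(residue, FLEXIBLE) & is_side_chain
    softened = grid.copy()
    softened[flexible] = low_value
    return softened


# Flattened toy example with four cells.
grid = np.array([1.0, -6.0, 1.0, 1.0])
residue = np.array(["ARG", "ARG", "ALA", "LYS"])
is_side_chain = np.array([True, False, True, True])
softened = soften_flexible_side_chains(grid, residue, is_side_chain)
```

Only the side-chain cells of the flexible residues change; the backbone cell of arginine keeps its interior value.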
Since it is nearly impossible to decide which property and which scale is best in each individual case, we optimised amino-acid-specific weighting factors for rigid-body unbound-unbound protein-protein docking. To this end, the grid representation of the proteins was extended so that it takes into account the amino acid represented by each cell. The value assigned to each cell is composed of a value for the surface or interior of the protein and a weighting factor that depends on the amino acid (Figure ). These weighting factors were optimised separately for three classes of complexes, following the classification of the dataset [31]: enzyme-inhibitor/substrate, antibody-antigen and others.
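How the surface/interior value and the amino acid weighting factor are combined is not specified at this point; the sketch below assumes a simple product, NumPy, and one dictionary of factors per complex class. The weights in the usage lines are arbitrary placeholders, not the optimised factors, and `weighted_grid` and `grid_for_class` are hypothetical names.

```python
import numpy as np

COMPLEX_CLASSES = ("enzyme-inhibitor/substrate", "antibody-antigen", "others")


def weighted_grid(base_grid, residue, weights, default=1.0):
    """Combine each cell's surface/interior value with the weighting factor of
    the amino acid occupying it (assumed here: a simple product).

    Residues missing from `weights` (and empty cells) use `default`.
    """
    lookup = np.vectorize(lambda aa: weights.get(aa, default), otypes=[float])
    return base_grid * lookup(residue)


def grid_for_class(base_grid, residue, weights_by_class, complex_class):
    """Select the weighting factors optimised for one class of complex."""
    if complex_class not in weights_by_class:
        raise ValueError(f"unknown complex class: {complex_class!r}")
    return weighted_grid(base_grid, residue, weights_by_class[complex_class])


# Arbitrary placeholder factors; every unlisted amino acid keeps weight 1.0.
weights_by_class = {c: {} for c in COMPLEX_CLASSES}
weights_by_class["others"] = {"MET": 0.5, "LYS": 2.0}

base = np.array([1.0, -6.0, 1.0])         # surface, interior, surface
residue = np.array(["MET", "ALA", "LYS"])
weighted = grid_for_class(base, residue, weights_by_class, "others")
```

Keeping the factors in a per-class table means the same base grid can be reweighted for each class without recomputing the surface and interior cells.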
This protein representation can be used in two ways. First, the weighted values can be assigned to the cells before the geometric correlation is calculated. This might improve prediction accuracy without increasing the required computation time, but the chosen parameters must be able to distinguish clashing complex structures from those with primarily surface-surface contacts. Second, the representation can be used to rerank the structures proposed by the geometric correlation. This publication focuses on the second approach.
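The reranking approach can be sketched as follows: candidate complexes kept from the geometric FFT search are rescored with amino-acid-weighted grids and re-sorted. This minimal sketch assumes NumPy, the same cyclic correlation convention as the FFT step, precomputed weighted ligand grids for each sampled rotation, and ranking by the weighted score alone; how the weighted score is combined with the geometric score or with other postfilters is not covered, and all names are hypothetical.

```python
import numpy as np


def weighted_score(receptor_w, ligand_w, t):
    """sum_x receptor_w[x] * ligand_w[x + t] for a single translation t, using
    the same cyclic convention as the FFT correlation."""
    shifted = np.roll(ligand_w, shift=tuple(-s for s in t), axis=(0, 1, 2))
    return float(np.sum(receptor_w * shifted))


def rerank(candidates, receptor_w, rotated_ligands_w):
    """Re-sort candidates from the geometric search by their weighted score.

    candidates:        list of (rotation_index, translation) pairs
    rotated_ligands_w: rotated_ligands_w[k] is the weighted ligand grid for
                       rotation k
    Returns (score, rotation_index, translation) tuples, best first.
    """
    scored = [
        (weighted_score(receptor_w, rotated_ligands_w[k], t), k, t)
        for k, t in candidates
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy grids: two weighted receptor cells and a single-cell ligand.
receptor_w = np.zeros((4, 4, 4))
receptor_w[0, 0, 0] = 2.0
receptor_w[1, 0, 0] = 1.0
ligand_w = np.zeros((4, 4, 4))
ligand_w[0, 0, 0] = 1.0
candidates = [(0, (3, 0, 0)), (0, (0, 0, 0)), (0, (1, 0, 0))]
ranking = rerank(candidates, receptor_w, [ligand_w])
```

Only the retained candidates are rescored, so the weighted grids never enter the FFT search itself; this corresponds to the second way of using the representation described above.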
The aim of our current work is to find new, independent criteria for reranking proposed complex structures that describe properties of near-native complex structures other than those captured by our previously published postfilter [27]. Integrating the different approaches described in the literature would require a larger programming effort but is expected to yield further improvements.