Protein interactions determine the outcome of most cellular processes, and the analysis of protein interaction networks is therefore crucial for understanding how cells function. Recent advances in experimental methods for identifying protein-protein interactions have provided extensive data on protein interaction networks. While for some organisms, such as yeast, the networks are close to completion and their reliability is relatively high [1], for many other organisms the protein interaction data contain many false positives and coverage remains low. For example, it has been estimated that fewer than 10% of all human protein interactions have been experimentally determined [2]. Moreover, protein interaction networks contain many self-interacting proteins [3], but because homooligomers are difficult to characterize unambiguously by experiment, such interactions are usually poorly annotated and largely neglected in large-scale network mapping.
One way to fill this gap and provide a more reliable and comprehensive biomolecular interaction network is to use computational methods to predict and verify protein interactions. Many computational approaches exist; some are based on the genomic context, co-evolution, co-expression or co-occurrence patterns of potentially interacting proteins and their genes [4]. Another group of methods relies on similarity between proteins with unknown interactions and homologous proteins with experimentally observed interactions [5]–[8]. It has been suggested, however, that interaction partners can be reliably inferred only for close homologs [9]–[12], and that annotations transferred from one protein to its homolog may be wrong even for close homologs if the two have different binding specificities. Since binding specificity is usually determined by the structural and sequence features of the interaction interface, it is essential to detect and transfer binding sites correctly. Current binding site prediction methods use evolutionary conservation of binding site sequence motifs, the structures of available complexes or, when no such data are available, docking. To verify and guide inference-based predictions, one needs to establish that the query protein, whose interactions are unknown, is similar to its homologs at the observed binding sites. Our recently developed method and server, the Inferred Biomolecular Interaction Server (IBIS) [13], [14], clusters similar binding sites found in homologous proteins according to their conservation of sequence and structure and then calculates position-specific scoring matrices (PSSMs) from the binding site alignments. Together with other measures, these PSSMs are used to rank binding sites and to gauge their biological relevance to the query protein (). Although the server handles five types of interactions (protein-protein, protein-small molecule, protein-nucleic acid, protein-peptide and protein-ion), in this work we focus only on protein-protein interactions.
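To make the PSSM step more concrete, the sketch below builds a generic log-odds PSSM from aligned binding-site residues and scores a query protein's residues at the aligned positions. It is not the IBIS implementation: the function names `build_pssm` and `score_query`, the uniform background frequencies, the background-weighted pseudocount and the choice to score gaps as zero are all illustrative assumptions.

```python
import math
from collections import Counter

# Assumption: a uniform background over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM_BG = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(aligned_sites, pseudocount=1.0, background=None):
    """Build a log2-odds PSSM from equal-length aligned binding-site strings.

    Each string holds the residues of one binding site in alignment order;
    '-' (or any symbol outside the background alphabet) is treated as a gap
    and ignored when counting. Returns one dict per column: residue -> log2 odds.
    """
    if not aligned_sites:
        raise ValueError("need at least one aligned binding site")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned binding sites must have the same length")
    bg = background or UNIFORM_BG

    pssm = []
    for col in range(length):
        counts = Counter(site[col] for site in aligned_sites if site[col] in bg)
        n = sum(counts.values())
        column = {}
        for aa, q in bg.items():
            # Pseudocount mass is distributed according to the background,
            # so the column probabilities still sum to 1.
            p = (counts[aa] + pseudocount * q) / (n + pseudocount)
            column[aa] = math.log2(p / q)
        pssm.append(column)
    return pssm


def score_query(pssm, query_site):
    """Sum the column log-odds of the query residues aligned to the site.

    Gaps or unknown residues in the query contribute 0.
    """
    if len(query_site) != len(pssm):
        raise ValueError("query must be aligned to the PSSM columns")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, query_site))


if __name__ == "__main__":
    # Hypothetical three-residue binding sites from three homologs.
    sites = ["KRE", "KRD", "KKE"]
    pssm = build_pssm(sites)
    print(score_query(pssm, "KRE"), score_query(pssm, "AAA"))
```

A query whose residues match the conserved site composition scores well above zero, while an unrelated query scores negatively; in practice such a score would be only one of several measures used to rank a binding site.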
In this paper we assess how well homology inference can be used to annotate the biological partners and interfaces of protein-protein interactions even when the native complex is not present in the structural database, and we examine which factors influence the accuracy of this approach. First, we find that IBIS predicts protein interaction partners with 88% sensitivity and 67% specificity, and predicts binding site locations with 72% recall and 70% precision. Interestingly, accuracy increases considerably when all available structures of homologous complexes are used, compared with using only a non-redundant set of structural complexes. Second, we show that there is a trade-off between specificity and sensitivity depending on whether we use only conserved binding site clusters or clusters supported by a single observation (singletons). Finally, we address the prediction of biological interfaces that are absent from the PDB asymmetric unit and must be reconstructed by applying crystallographic symmetry operations. We show that IBIS can reconstruct almost half of such interfaces without prior knowledge of the query protein's crystal parameters.
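The performance figures above rely on standard confusion-matrix definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and, for binding sites, recall and precision of predicted versus observed interface residues. The sketch below shows these textbook formulas; how true and false positives are counted in this study (for example, whether binding-site recall is computed per residue) is not specified in this section, so the residue-set interpretation and the function names `classification_metrics` and `site_overlap` are assumptions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity) for binary partner predictions.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Empty denominators yield 0.0 rather than raising.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def site_overlap(predicted, observed):
    """Return (recall, precision) of a predicted binding site.

    Assumption: a site is represented as a set of residue identifiers
    (e.g. residue numbers), and a hit is a residue present in both sets.
    """
    predicted, observed = set(predicted), set(observed)
    hits = len(predicted & observed)
    recall = hits / len(observed) if observed else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical counts and residue numbers, for illustration only.
    print(classification_metrics(tp=9, fp=1, tn=3, fn=1))
    print(site_overlap({10, 11, 12, 15}, {11, 12, 15, 20, 21}))
```

With the hypothetical inputs, the partner predictions give a sensitivity of 0.9 and a specificity of 0.75, and the predicted site recovers 3 of 5 observed residues (recall 0.6) with 3 of its 4 residues correct (precision 0.75).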