Protein interactions are thought to be largely mediated by interactions between structural domains. Databases such as iPfam relate interactions in protein structures to known domain families. Here, we investigate how the domain interactions from the iPfam database are distributed in protein interactions taken from the HPRD, MPact, BioGRID, DIP and IntAct databases.
We find that known structural domain interactions can only explain a subset of 4–19% of the available protein interactions, nevertheless this fraction is still significantly bigger than expected by chance. There is a correlation between the frequency of a domain interaction and the connectivity of the proteins it occurs in. Furthermore, a large proportion of protein interactions can be attributed to a small number of domain interactions. We conclude that many, but not all, domain interactions constitute reusable modules of molecular recognition. A substantial proportion of domain interactions are conserved between E. coli, S. cerevisiae and H. sapiens. These domains are related to essential cellular functions, suggesting that many domain interactions were already present in the last universal common ancestor.
Our results support the concept of domain interactions as reusable, conserved building blocks of protein interactions, but also highlight the limitations currently imposed by the small number of available protein structures.