Identification of molecular interactions is an essential step towards a better understanding of various cellular processes. Recent advances in functional genomics have helped uncover thousands of protein–protein interactions (1–9). Studying interactions at the protein level, though extremely valuable for understanding the molecular machinery of a cell, does not provide insight into interaction specificity at the domain level. Most often, only a fraction of a protein directly interacts with its biological partners. Since the majority of proteins (two-thirds in prokaryotes and four-fifths in eukaryotes) are multi-domain proteins (10), an interaction between two proteins (whether stable or transient) often involves binding of two or more domains. Thus, understanding protein interactions at the domain level is a logical step towards understanding the precise atomic details of interactions.
Over the last few years, researchers have focused their attention on discovering and understanding protein domain (domain–domain) interactions. One way to infer domain–domain interactions is by studying three-dimensional (3D) structures. iPfam (11) and 3did (12) are two databases that contain information on known domain–domain interactions inferred from PDB entries (13). The number of known domain–domain interactions is still limited mostly by the availability of 3D structures. Although many thousands of protein interactions are known, only a small fraction of them have solved protein structures, which prevents us from uncovering all possible domain-level interactions. Domain interactions inferred from structural data can explain only ~5% of protein interactions in Saccharomyces cerevisiae and ~19% of protein interactions in Homo sapiens
). In recent years, several computational approaches have been proposed in an effort to unearth previously unrecognized domain–domain interactions on a genome scale. These include approaches based on correlated sequence signatures (15), maximum-likelihood estimation (16), phylogenetic profiling (17), statistical significance (18), domain pair exclusion analysis (19), a random decision forest framework (20), sequence co-evolution (21), the parsimony principle (22), domain fusion, GO (23) functional annotations and combinations thereof (24).
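To make the coverage figures above concrete, the following minimal sketch shows one way such a fraction could be computed: a protein interaction is counted as explained if at least one domain of the first protein and one domain of the second form a known domain–domain interaction. The function name, the toy proteins and the Pfam-style identifiers are all hypothetical, and this is not necessarily the procedure used in the cited study.

```python
from itertools import product

def explained_fraction(ppis, protein_domains, known_ddis):
    """Fraction of protein pairs containing at least one known domain pair."""
    # Store domain pairs as frozensets so (A, B) and (B, A) are equivalent.
    known = {frozenset(pair) for pair in known_ddis}
    explained = 0
    for p, q in ppis:
        # All cross-protein domain combinations for this interaction.
        pairs = product(protein_domains.get(p, ()), protein_domains.get(q, ()))
        if any(frozenset((a, b)) in known for a, b in pairs):
            explained += 1
    return explained / len(ppis) if ppis else 0.0

# Toy data (hypothetical identifiers, for illustration only).
protein_domains = {
    "P1": ["PF_A", "PF_B"],
    "P2": ["PF_C"],
    "P3": ["PF_D"],
    "P4": ["PF_A"],
}
ppis = [("P1", "P2"), ("P2", "P3"), ("P1", "P4"), ("P3", "P4")]
known_ddis = [("PF_C", "PF_B"), ("PF_A", "PF_A")]

fraction = explained_fraction(ppis, protein_domains, known_ddis)
print(fraction)
```

In this toy example two of the four protein interactions contain a known domain pair, so half of the network is explained. With real data the same calculation would be limited by how many domain pairs have structural support.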
While computational approaches have greatly contributed to the discovery and understanding of domain–domain interactions, the ever-increasing sets of predicted domain–domain interactions remain scattered across a variety of formats and sources. This has created a need for a comprehensive resource that collates all known and predicted domain–domain interactions from various sources under one roof.
We present here DOMINE, a comprehensive database of protein domain interactions using Pfam-A (26) domain definitions, which collates known and predicted domain–domain interactions from 10 different sources. By making the existing datasets more accessible, this database will serve as a valuable resource to those working in the field of protein and domain interactions. DOMINE may serve not only as a reference for experimentalists who test for new protein and domain interactions, but also as a consolidated dataset for bioinformaticians who seek to test ideas regarding the underlying factors that control the topological structure of interaction networks.
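As a rough illustration of what collating interactions from several sources involves, the sketch below merges domain pairs from multiple sources into a single table, treating each pair as unordered and recording which sources report it. The source labels, the Pfam-style identifiers and the function name are hypothetical placeholders, not DOMINE's actual schema or data.

```python
from collections import defaultdict

def collate(sources):
    """Merge domain pairs from several sources into {pair: set of source names}."""
    merged = defaultdict(set)
    for name, pairs in sources.items():
        for a, b in pairs:
            # Sort so that (A, B) and (B, A) map to the same key.
            key = tuple(sorted((a, b)))
            merged[key].add(name)
    return dict(merged)

# Hypothetical inputs: two structure-derived sets and one predicted set.
sources = {
    "structure_db_1": [("PF_A", "PF_B"), ("PF_C", "PF_C")],
    "structure_db_2": [("PF_B", "PF_A")],
    "prediction_1": [("PF_A", "PF_B"), ("PF_D", "PF_A")],
}

merged = collate(sources)
for pair, support in sorted(merged.items()):
    print(pair, sorted(support))
```

Keeping the set of supporting sources for each pair, rather than a single flag, would let a user separate interactions observed in structures from those that are only predicted, or rank predictions by how many independent methods agree.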