Recent developments in high-throughput technologies have made it possible to systematically discover physical and functional interactions between proteins (Bork et al.
; Chen and Jeong, 2009
; Hu et al.
; Lin et al.
2009; Parrish et al.
; Vidal, 2001
). Protein–protein interactions (PPIs), though extremely valuable towards a better understanding of protein functions and cellular processes, do not provide any direct information about the regions/domains, defined as structural or functional sub-units, within the proteins that mediate the interaction (Jothi et al.
). Most often, it is only a fraction of a protein that directly interacts with its biological partners. Given that a majority of all proteins are multi-domain proteins (Jothi et al.
) and that interactions between two proteins are often characterized by interactions between a pair of constituent domains, understanding interactions at the domain level is a critical step towards (i) a thorough understanding of PPI networks and their evolution; (ii) precise identification of binding sites; (iii) insight into the causes of deleterious mutations at interaction sites; and, most importantly, (iv) development of drugs to inhibit pathological protein interactions (Pawson and Nash, 2003
). In addition, information derived from known domain–domain interactions (DDIs) is increasingly used to understand binding interfaces (Akiva et al.
; Gong et al.
; Shoemaker et al.
), which in turn can help discover unrecognized PPIs (Schuster-Bockler and Bateman, 2007).
Many aspects of cell signaling, trafficking and targeting are governed by interactions between globular protein domains and, in some cases, between a globular domain and a short peptide segment (Neduva et al.
). Interactions between globular domains have drawn increased attention over the last few years. Three-dimensional structures or models are a great aid to understanding the details of how protein or domain interactions are mediated. Recent studies suggest that the limiting factor is no longer the number of protein structures, but the number of 3D templates on which to model interactions (Aloy and Russell, 2004
). This has created an urgent need in the community to identify the most comprehensive possible set of interaction templates. Given that it has been estimated that there are about 10 000 interaction types and that it will take more than 20 years before we know a full representative set (Aloy and Russell, 2004
), it is important that we expedite the process of identifying all interactions at the domain level to fully understand the structural and evolutionary aspects of protein interactions and complexes (Itzhaki et al.
). Understanding interactions at the domain level will move us a step closer towards understanding critical molecular details of how interaction networks are constructed, which in turn will help illuminate cellular processes (Pawson and Nash, 2003).
Although high-throughput techniques used for experimental determination of PPIs can be used to infer interaction between individual domains (Ikeuchi et al.
; Sleno and Emili, 2008
), to our knowledge, no study has used such approaches to detect DDIs on a genomic scale. One way to infer DDIs is to study 3D structures (Aloy and Russell, 2006
; Finn et al.
; Littler and Hubbard, 2005
; Stein et al.
; Russell et al.
). Unfortunately, the number of known DDIs is still limited mostly by the availability of 3D structures, as the number of PPIs with known structures is far smaller than the number of known interactions. This prevents us from uncovering all possible domain-level interactions. Moreover, DDIs inferred from structural data could explain no more than 20% of the PPIs for any of the Escherichia coli
, Saccharomyces cerevisiae
, Caenorhabditis elegans
, Drosophila melanogaster
and Homo sapiens
organisms (Itzhaki et al.
; Schuster-Bockler and Bateman, 2007
). To expedite the discovery of DDIs, several computational approaches have been proposed in recent years to unearth previously unrecognized DDIs on a genome scale.
Attempts have been made to understand DDIs using a hypothesis based on correlated mutations at interaction sites (Jothi et al.
; Kann et al.
), generally referred to as the co-evolution principle (Pazos and Valencia, 2008
). Many other methods rely solely on PPI networks to infer DDIs. One of the first was the Association Method, which seeks domain pairs that co-occur more often in interacting protein pairs than expected by chance (Sprinzak and Margalit, 2001
). This idea was later extended using a maximum likelihood estimation approach where domain interaction probability is optimized using an expectation maximization algorithm (Deng et al.
; Liu et al.
). Other groups have proposed probabilistic network models (Gomez and Rzhetsky, 2002
; Nye et al.
), machine learning algorithms (Chen and Liu, 2005
, 2006), phylogenetic profiling (Pagel et al.
) and integrative models (Lee et al.
; Ng et al.
) to study domain interactions. More recently, a distinct class of methods has emerged in which, given a PPI network, the goal is to find an optimal set of DDIs that together could explain or justify the set of all interactions in the PPI network. For instance, the DPEA method (Riley et al.
) introduced a new measure for each potentially interacting domain pair, called the E-score, which measures the reduction in the likelihood of observing the given PPI network when that domain pair is excluded. A variant of this method was proposed later (Wang et al.
). Similar optimization frameworks were proposed to identify the minimal set of DDIs that could explain the set of all PPIs (Guimaraes et al.
; Singhal and Resat, 2007).
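To make the association idea mentioned above more concrete, the following is a minimal sketch that scores each domain pair by how enriched it is among interacting protein pairs relative to all protein pairs. The function name `association_scores`, the toy proteins and domains, and the exact enrichment ratio are assumptions chosen for illustration; they are not the published formulation of the Association Method.

```python
from collections import Counter
from itertools import combinations, product


def association_scores(protein_domains, interactions):
    """Score domain pairs by enrichment among interacting protein pairs.

    score = (share of interacting protein pairs containing the domain pair)
          / (share of all protein pairs containing the domain pair)
    A score above 1 means the pair co-occurs in interacting proteins more
    often than expected from its overall frequency. Self-interactions are
    ignored in this simplified version.
    """
    interacting = {frozenset(pair) for pair in interactions}
    in_all = Counter()          # protein pairs containing each domain pair
    in_interacting = Counter()  # interacting protein pairs containing it
    n_all = n_interacting = 0

    for a, b in combinations(sorted(protein_domains), 2):
        n_all += 1
        is_interacting = frozenset((a, b)) in interacting
        n_interacting += is_interacting
        # A set, so each domain pair is counted once per protein pair.
        domain_pairs = {
            frozenset((x, y))
            for x, y in product(protein_domains[a], protein_domains[b])
        }
        for dp in domain_pairs:
            in_all[dp] += 1
            if is_interacting:
                in_interacting[dp] += 1

    if n_interacting == 0:
        return {}
    return {
        dp: (in_interacting[dp] / n_interacting) / (in_all[dp] / n_all)
        for dp in in_interacting
    }


# Hypothetical toy data: protein -> set of domains, plus observed PPIs.
protein_domains = {
    "P1": {"A"}, "P2": {"B"}, "P3": {"A", "C"}, "P4": {"B"}, "P5": {"C"},
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]
scores = association_scores(protein_domains, interactions)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(score, 2))
```

On this toy input the A/B domain pair ranks first, since it appears in all three interacting protein pairs but in only four of the ten possible protein pairs. A score like this uses only co-occurrence counts, which is why later methods moved to likelihood-based and optimization-based formulations.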
In this study, we explore an alternative approach called K-GIDDI (knowledge-guided inference of DDIs) for predicting DDIs from cross-species PPI network data. K-GIDDI begins by constructing an initial DDI network from cross-species PPI networks, which is then expanded by inferring additional DDIs using a divide-and-conquer biclustering algorithm guided by Gene Ontology (GO) information (Ashburner et al.
). The DDI network is expanded by identifying partial-complete bipartite sub-networks, guided by GO molecular function terms, and adding the edges needed to make them complete bipartite sub-networks. The presumption is that the newly added edges represent DDIs that are missing from the initial network, possibly because the underlying PPI networks are not yet complete.
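The expansion step can be illustrated with a small sketch. It assumes that two groups of domains have already been formed (for example, domains sharing a GO molecular function term) and that a block counts as "partial-complete" when a sufficient fraction of its possible edges is already present. The function name `missing_edges`, the `min_density` cutoff, and the toy domain identifiers are hypothetical; the divide-and-conquer biclustering algorithm itself is not reproduced here.

```python
from itertools import product


def missing_edges(left, right, ddi_edges, min_density=0.75):
    """Edges needed to complete a nearly complete bipartite block.

    left, right : disjoint sets of domains (assumed here to be groups that
                  share a GO molecular function term)
    ddi_edges   : iterable of known domain-domain pairs
    min_density : assumed cutoff on the fraction of left-right pairs that
                  must already be present before the block is completed
    """
    known = {frozenset(edge) for edge in ddi_edges}
    block = {frozenset((u, v)) for u, v in product(left, right)}
    if not block:
        return set()
    density = len(block & known) / len(block)
    if density < min_density:
        return set()  # too sparse to treat as a partial-complete block
    return block - known


# Hypothetical toy block: three of the four possible edges are known.
left = {"D1", "D2"}
right = {"D3", "D4"}
ddi_edges = [("D1", "D3"), ("D1", "D4"), ("D2", "D3")]
predicted = missing_edges(left, right, ddi_edges)
print([sorted(edge) for edge in predicted])
```

Here the only absent edge, between D2 and D4, is proposed as a new DDI. Raising `min_density` makes the expansion more conservative, at the cost of proposing fewer candidate interactions.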
The predicted DDIs are evaluated against a set of known DDIs (Finn et al.
; Stein et al.
) inferred from PDB structure data (Berman et al.
), and predictions from previous approaches stored in the DOMINE database (Raghavachari et al.
). Our results indicate that K-GIDDI can reliably predict DDIs, and that its performance is comparable to, if not better than, that of previous approaches. Most importantly, K-GIDDI's novel network expansion procedure allows prediction of DDIs that are otherwise not identifiable by methods that rely only on PPI data. This is significant because information derived from these novel DDIs could be used to understand binding interfaces, which in turn can help discover unrecognized PPIs.
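As a simple illustration of how predictions might be compared against a reference set of known DDIs, the sketch below computes the overlap as precision and recall over unordered domain pairs. The choice of these two measures, the function name `evaluate`, and the toy pairs are assumptions for illustration, not necessarily the evaluation protocol used in this study.

```python
def evaluate(predicted, known):
    """Precision and recall of predicted DDIs against known DDIs.

    Pairs are treated as unordered, so ("A", "B") matches ("B", "A").
    """
    pred = {frozenset(pair) for pair in predicted}
    gold = {frozenset(pair) for pair in known}
    hits = pred & gold
    precision = len(hits) / len(pred) if pred else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy sets of domain pairs.
predicted = [("A", "B"), ("B", "C"), ("C", "D")]
known = [("B", "A"), ("C", "D"), ("E", "F"), ("G", "H")]
precision, recall = evaluate(predicted, known)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because structurally derived reference sets are themselves incomplete, a low precision under this kind of comparison does not necessarily mean the unmatched predictions are wrong.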