J Mol Biol. Author manuscript; available in PMC Oct 20, 2006.
Published in final edited form as:
PMCID: PMC1618801
NIHMSID: NIHMS12751
Co-evolutionary Analysis of Domains in Interacting Proteins Reveals Insights into Domain–Domain Interactions Mediating Protein–Protein Interactions
Raja Jothi,1* Praveen F. Cherukuri,1,2 Asba Tasneem,3 and Teresa M. Przytycka1*
1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
2Bioinformatics Program Boston University, Boston, MA 02215, USA
3Booz Allen Hamilton Inc., Rockville, MD 20852, USA
*Corresponding authors; E-mail addresses of the corresponding authors: jothi/at/ncbi.nlm.nih.gov; przytyck/at/ncbi.nlm.nih.gov
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such interaction data, though extremely valuable towards a molecular-level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that the interacting domain pair(s) for a given interaction exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain–domain interactions. Given a protein–protein interaction, the proposed method predicts the domain pair(s) most likely to mediate the protein interaction. We applied this method to the yeast interactome to predict domain–domain interactions, and used known domain–domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain–domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.
Keywords: co-evolution, protein–protein interaction, domain–domain interaction
Abbreviations used: MLE, Maximum Likelihood Estimation; PDB, Protein Data Bank; RCDP, Relative Co-evolution of Domain Pairs; SLA, Sequence Lengths Assigned; DPEA, Domain Pair Exclusion Analysis; RDFF, Random Decision Forest Framework
Post-genomic advances in molecular biology have helped uncover the intricate interplay between proteins in metabolic, signaling and regulatory pathways. Identification of protein–protein interactions is an essential step towards a better understanding of various cellular processes. Various high-throughput experimental techniques such as mass spectrometry, yeast two-hybrid, and tandem affinity purification have been used to discover and generate large scale protein interaction data.1–8 In addition, several computational approaches towards predicting protein–protein interactions have been proposed in an effort to complement experimental methods.9–23
Protein–protein interaction networks, though extremely valuable towards molecular level understanding of cells, do not provide insights on interaction specificity at the domain level. Most often, it is only a fraction of a protein that directly interacts with its biological partners. Since two thirds of proteins in prokaryotes and four fifths of proteins in eukaryotes are multidomain proteins,24,25 interaction between two proteins (either stably or transiently) often involves binding of pair(s) of domains. Importantly, understanding interaction at the domain level is a critical step towards a thorough understanding of the protein–protein interaction networks and their evolution.
Statistical analysis of domain co-occurrence in interacting proteins (known as the Association Method) has been used towards predicting protein–protein interactions.26–29 The underlying idea behind this type of approach is to identify domain pairs that co-occur significantly more often in interacting proteins than in non-interacting proteins, and to use this information to identify new protein–protein interactions. Deng et al.30 applied the Maximum Likelihood Estimation (MLE) method to infer domain–domain interactions from a given protein interaction network. They estimated the probability of interaction between every domain pair, and used it to predict protein–protein interactions. Instead of identifying domain interactions with the goal of predicting new protein–protein interactions, Nye et al.31 proposed the lowest p-value method, which focuses on identifying domain–domain interactions that are most likely to mediate a given protein–protein interaction. They performed a comprehensive comparative analysis of the association,26 MLE,30 and the lowest p-value31 methods and showed that (i) the overall prediction accuracy of the MLE and the lowest p-value methods is only as good as that of a random method, which is about 55%, and (ii) the association method was the worst among the four with a prediction accuracy of about 52%. Chen and Liu32 used the Random Decision Forest Framework (RDFF) to predict domain–domain interactions, which are then used to predict protein–protein interactions. Recently, Riley et al.33 extended ideas from earlier methods26–28,30 by adding a likelihood ratio test to assess the contribution of each potential domain–domain interaction to the likelihood of a given set of protein–protein interactions. They demonstrated that their method performs considerably better than the association and MLE methods.
At the structural level, a considerable amount of work has been done to understand protein/domain interactions. Littler and Hubbard34 performed a comprehensive analysis of domain–domain interactions observed in protein structures in an effort to understand comparative orientation and interacting surfaces of structurally unsolved domain pairs. Gong et al.35 used geometric properties/tools such as accessible surface area and Voronoi diagrams on Protein Data Bank (PDB) crystal structures of protein complexes to detect domain interaction interfaces. Shoemaker et al.36 used a conserved binding mode analysis on domain–domain interactions inferred from PDB crystal structures to detect binding surfaces of biological relevance. Neduva et al.37 studied interactions that involve binding between a globular domain in one protein and a short linear motif (pattern of three to eight residues), and proposed a systematic approach to discover these motifs. For a comprehensive review of structure-based interaction studies, we refer the reader to Aloy and Russell's recent review on protein interactions.38
Structural information is necessary and critical for a full understanding of molecular/domain level interaction. However, because the number of interactions with known protein structures is far smaller than the total number of interactions, it is difficult to understand domain level interactions at the genomic scale from structures alone. Here, we attempt to understand domain–domain interactions at the sequence level. Specifically, we investigate the relative degree of co-evolution of domains in interacting proteins to understand whether or not interacting domain pairs exhibit a higher level of co-evolution than those that are non-interacting. The concept of co-evolution has been widely applied in predicting protein–protein interactions solely based on sequence information14,16,39–44 as well as gene expression data.45,46 The underlying assumption behind this concept is that the interacting partners must co-evolve so that changes in a protein's binding surface are complemented in the interface of its partner.14,47,48 We use co-evolutionary analysis to study protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase, and nuclear pore complexes, and show that the degree of co-evolution of interacting domain pair(s) between two interacting proteins is higher than that of non-interacting domain pairs. We then develop a computational method, Relative Co-evolution of Domain Pairs (RCDP), to test the generality of the observed behavior, and to predict large-scale domain–domain interactions using the yeast interactome. The proposed method, given a pair of interacting proteins, predicts the domain pair(s) most likely to mediate the interaction. Predicted domain–domain interactions are validated against a set of known domain–domain interactions found in PDB49 crystal structures (as reported in iPfam50). We finally compare RCDP's prediction results with those from two other methods, and show that the RCDP method can predict known domain interactions that are missed by the other methods.
Co-evolution of domain pairs in interacting proteins
To assess the degree of co-evolution of domain pairs in interacting proteins, we considered protein–protein interactions in the Saccharomyces cerevisiae (yeast) genome. A schematic overview of assessing the degree of co-evolution between two protein/ domain families is shown in Figure 1. In this type of analysis, multiple sequence alignments of two proteins/domains for a common set of species are used to construct phylogenetic trees and similarity matrices. The degree of co-evolution of the two domains is measured by computing a linear correlation coefficient of the two similarity matrices, which implicitly compares the evolutionary histories of the two domains.
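The correlation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the two similarity matrices are square, symmetric, and indexed by the same ordered set of species, and the function names (`upper_triangle`, `pearson`, `coevolution_score`) and the toy matrices are hypothetical.

```python
import math
from itertools import combinations


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one matrix is constant")
    return cov / math.sqrt(var_x * var_y)


def coevolution_score(sim_a, sim_b):
    """Correlate two similarity matrices built over the same ordered species set."""
    if len(sim_a) != len(sim_b):
        raise ValueError("matrices must cover the same set of species")
    return pearson(upper_triangle(sim_a), upper_triangle(sim_b))


# Toy 4-species matrices with made-up values; sim_b is sim_a scaled by 2,
# so the two families have perfectly correlated pairwise values.
sim_a = [[0.0, 0.2, 0.5, 0.7],
         [0.2, 0.0, 0.4, 0.6],
         [0.5, 0.4, 0.0, 0.3],
         [0.7, 0.6, 0.3, 0.0]]
sim_b = [[2 * value for value in row] for row in sim_a]
score = coevolution_score(sim_a, sim_b)
```

Only the entries above the diagonal are used, since a symmetric matrix carries each species pair twice and the diagonal holds no comparative information.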
Figure 1
A schematic overview of the co-evolutionary analysis. Multiple sequence alignments of two yeast proteins for a common set of species are constructed, followed by the construction of their phylogenetic trees and similarity matrices. The extent of agreement ...
To study the relative degree of co-evolution of domain pairs in interacting proteins, interacting proteins are first assigned with Pfam Hidden Markov Model (HMM)51 profiles. Then, as shown in Figure 2, the correlation (agreement) scores, measuring the degree of co-evolution of all possible domain pairs between the two interacting proteins, are computed. Multiple sequence alignment for domain D in protein P is constructed by extracting those regions in P's multiple sequence alignment that correspond to D (see Materials and Methods for more details). Under the co-evolution hypothesis, which assumes that interacting domains undergo correlated mutations, domain pairs that are mediating the interaction between two proteins are expected to have co-evolved, and thus are expected to have high correlation score. To test this hypothesis, we began by examining yeast interactions supported by at least one PDB crystal structure. For a given protein–protein interaction, correlation scores of the interacting domain pairs (inferred from crystal structures) are compared against those domain pairs not known to interact to see whether or not the interacting domain pairs do really exhibit relatively high level of co-evolution. If two interacting proteins, P and Q, have two domains each, then there are a total of four domain–domain interaction possibilities between them.
Figure 2
Relative degree of co-evolution of domains in interacting proteins. (a) Domain architecture of proteins P and Q (shown using gray boxes) that are known to interact (interaction sites are shown as black boxes). (b) Correlation (agreement) scores, measuring ...
Interactions in F1-ATPase complex
The F1-ATP synthase is a five-subunit catalytic core (in a stoichiometry of 3α, 3β, 1γ, 1δ, and 1ε), which uses transmembrane proton motive force generated by photosynthesis or oxidative phosphorylation to drive the synthesis of ATP from ADP and inorganic phosphate. The central stalk, comprising 3α, 3β, and 1γ subunits, links the F1 complex to the nine-subunit transmembrane channel through which the protons are pumped (F0 complex). The rod-shaped asymmetrical γ-subunit rotates inside a cylinder made of three α and β-subunits, arranged alternately,52–56 making contacts with α and β-subunits.
In this complex, we focused our attention on three interactions among the α, β, and γ chains (genes ATP1, ATP2, ATP3, respectively). The corresponding yeast proteins for the α, β, and γ chains (YBL099w, YJR121w, YBR039w, respectively) were assigned with Pfam domains. The α-subunit (YBL099w) was inferred to contain three domains: beta-barrel domain (PF02874, 2e-18), nucleotide-binding domain (PF00006, 3e-122), and C-terminal domain (PF00306, 2e-37). The β-subunit (YJR121w), a close homolog of the α-subunit, was inferred to contain the same three domains as well. The γ-subunit (YBR039w) contained just the ATP synthase domain (PF00231, 1e-130). Figure 3(a) depicts the domain architecture of the sequences along with the true domain–domain interactions between the subunits. The results from co-evolutionary analysis (using orthologs from 22 species) are listed as tables in Figure 3(b), which clearly shows that those domain pairs that interact do, in fact, have relatively high correlation scores. A cartoon of one of several bovine mitochondrial F1-ATPase crystal structures (PDB: 1h8e), supporting the interactions, is shown in Figure 3(c). Despite the complexity of this system, it is remarkable that the degree of co-evolution among the interacting domain pairs is clearly higher than that among the non-interacting domain pairs. In particular, between the α and the β chains, seven out of the nine domain pairs that are known to interact have higher correlation than the two noninteracting domain pairs. In comparison, an approach that picks seven domain pairs out of the nine possibilities at random (without replacement) will have a 0.028 probability (p-value) of getting all its seven picks correct (truly interacting domain pairs).
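The 0.028 figure quoted above for the α/β chains can be checked with a short calculation. The sketch below assumes that picking domain pairs "at random (without replacement)" corresponds to a hypergeometric model; the function name `prob_exact_hits` and its parameter names are hypothetical.

```python
from math import comb


def prob_exact_hits(total_pairs, true_pairs, picks, hits):
    """Probability that exactly `hits` of `picks` random picks (without
    replacement) fall among the `true_pairs` interacting domain pairs."""
    ways_to_hit = comb(true_pairs, hits)
    ways_to_miss = comb(total_pairs - true_pairs, picks - hits)
    return ways_to_hit * ways_to_miss / comb(total_pairs, picks)


# F1-ATPase alpha/beta chains: nine possible domain pairs, seven known to
# interact, seven picks, all seven correct.
p = prob_exact_hits(total_pairs=9, true_pairs=7, picks=7, hits=7)
print(round(p, 3))  # 0.028, i.e. 1/36
```

With seven picks out of nine, there are C(9,7) = 36 equally likely selections and only one of them contains all seven interacting pairs, which gives 1/36 ≈ 0.028.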
Figure 3
Interactions among alpha (ATP1), beta (ATP2), and gamma (ATP3) chains of the ATPase. (a) Protein sequences are shown using thick colored lines: red for the alpha chain, green for the beta chain, blue for the gamma chain, and black for alpha or beta chain. ...
Yeast Sec23p/Sec24p heterodimer
Sec23p (YPR181c) and Sec24p (YIL109c) are components of the Sec23p-Sec24p heterodimeric complex of the COPII vesicle, which carries proteins from the endoplasmic reticulum (ER) to the Golgi complex.57 YPR181c and YIL109c are structurally related proteins, each consisting of five distinct Pfam domains: zinc finger (PF04810, <1e-19), alpha/beta trunk domain (PF04811, <1e-124), beta-barrel domain (PF08033, <2e-36), helical domain (PF04815, <1e-43), and C-terminal gelsolin-like domain (PF00626, 1e-17). The domain architectures for both the proteins are given in Figure 4(a), showing the lone inter-chain domain–domain interaction between the two alpha/beta trunk domains.
Figure 4
Interaction between Sec23 (YPR181c) and Sec24 (YIL109c) components of the COPII coat of ER-golgi vesicles. (a) Protein sequences are shown using thick gray lines, and Pfam domain annotations are shown using colored rectangular boxes (not drawn to scale). ...
There are several intra-chain domain–domain interactions within each chain, involving all five domains (not shown in the Figure). Since our analysis considers only inter-chain domain pairs (one in each chain), we considered only the lone inter-chain interaction between the alpha/beta trunk domains of the Sec23p and Sec24p chains. The correlation scores for all possible inter-chain domain pairs are listed as a table in Figure 4(b), and the cartoon of the PDB crystal structure (PDB: 1m2v), supporting the interaction, is shown in Figure 4(c). It is evident from our analysis that the interacting domain pair indeed has relatively high correlation. However, it was not the pair with the highest correlation. The domain pair of beta-barrel domain (PF04811) of Sec23p and alpha/beta trunk domain (PF08033) of Sec24p had the highest correlation. Since the trunk and beta-barrel domains interact within each chain, and both chains are remarkably conserved through evolution and are related, it may not be unreasonable to expect them to have a high correlation. Considering the fact that the lone true interacting domain pair has the second highest correlation score, an approach that picks two domain pairs out of the 25 possibilities at random (without replacement) will have a 0.04 probability (p-value) of picking the lone interacting domain pair.
Interactions in DNA-directed RNA polymerase complex
From the DNA-directed RNA polymerase complex in yeast, we considered two protein–protein interactions involving subunit Rpb8. There are several PDB crystal structures that show the Rpb8 subunit interacting with the smaller subunits Rpb3 and Rpb11, and the largest subunit Rpb1. The results of our domain-level co-evolutionary analysis for interactions between subunits Rpb8 (YOR224c) and Rpb3 (YIL021w), and subunits Rpb8 and Rpb1 (YDL140c) are shown in Figure 5(a) and (b), respectively. YOR224c was inferred to contain just one domain: RNA_pol_Rpb8 (PF03870, 4e-92). YIL021w was assigned with dimerisation domain (PF01193, 1e-19) and insert domain (PF01000, 7e-41), and YDL140c was assigned with seven domains: clamp domain (PF04997, 7e-178), active site domain (PF00623, 5e-188), pore domain (PF04983, 2e-69), funnel domain (PF05000, 2e-50), cleft domain (PF04998, 2e-170), and two mobile module domains (PF04992, 4e-97 and PF04990, 1e-78).
Figure 5
Inferred domain–domain interactions in DNA-directed RNA polymerase complex. Protein sequences are shown using thick gray lines, and the domain annotations are shown using colored rectangular boxes (not drawn to scale). The names of the protein ...
The correlation results for the interaction between Rpb3 and Rpb8 subunits (using orthologs from a common set of 15 species), listed as a table in Figure 5(a), show that the interacting domain pair (PF01193, PF03870), shown in the cartoon of crystal structure (PDB: 1y1v), has a higher level of co-evolution when compared to the non-interacting domain pair (PF01000, PF03870). The correlation results are not that clear for the interaction between the Rpb1 and Rpb8 subunits (refer to the table in Figure 5(b)). While two out of the three interacting domain pairs have high correlation scores, the interaction between the funnel domain (PF05000) and RNA_pol_Rpb8 (PF03870) has the lowest correlation score among all possible domain pairs. An approach that picks four domain pairs out of the six possibilities at random (without replacement) will have a 0.086 probability (p-value) of getting three out of its four picks correct (truly interacting domain pairs).
An interacting domain pair with a low correlation (false negative in some sense) could be explained using one or both of the following reasons. When assessing the degree of co-evolution between two domains, we tend to ignore the number of interacting partners a domain may have. Even though the co-evolutionary hypothesis for interacting domains assumes that the interacting domains undergo correlated mutations, more specifically, it is actually the binding surfaces that undergo correlated mutations. A domain having multiple interacting partners may use distinct patches on its surface to interact with each of its partners.34,58 Each binding region of a domain is highly specific to its interacting partner. Thus, the surface patches used by a domain to interact with many interacting partners may undergo independent correlated mutations with their corresponding interacting partner. As a result, mutations at different surface patches of a domain need not be correlated (see Figure 6). Consequently, the degree of co-evolution between two interacting domains, one or both with multiple interacting partners, may actually be suppressed, resulting in a low correlation score. Thus, it may not always be the case that a pair of interacting domains, each of which has multiple interacting partners, has a high correlation score.
Figure 6
Uncorrelated set of correlated mutations. Each rectangular box is a cartoon representation of a multiple sequence alignment of a family of orthologous proteins/domains. There are a total of six families, A, B, C, D, E, and F. The binding residues of interaction, ...
Mutations occurring in interacting domains may not be correlated due to many other biological constraints imposed on them. There may be cases in which the mutations at a binding surface in a domain may not be followed by compensatory mutations at the binding surface of its interacting partner. Since protein binding surfaces are relatively more conserved than the rest of the sequence,59,60 domains with many interacting partners, and thus many surface patches, are likely to be relatively more conserved than those with few interacting partners.61,62 For each interacting domain, from its multiple sequence alignment, we computed the sequence identity of each orthologous domain in reference to the yeast domain. The alignments for Rpb8 and Rpb1 had orthologs in a common set of 17 species, including yeast. The average sequence identities for the interacting domains, along with the number of known interacting partners are listed in Table 1. Interestingly, at least for this example, those domains with many interacting partners are relatively more conserved than those with few partners.
Table 1
The average sequence identities of interacting domains between subunits Rpb1 (YDL140c) and Rpb8 (YOR224c) of DNA-directed RNA polymerase, along with the number of known interacting partners for each domain
Exportin Cse1p complexed with its cargo
Nuclear pore complexes serve as a medium for exchange of macromolecules between the nucleus and the cytoplasm. Carrier proteins that shuttle between the nucleus and the cytoplasm enable active transport of large molecules through these pore complexes. Importin-alpha Srp1 (YNL189w), which acts as a carrier for many nuclear trafficking processes, binds cargo in the cytoplasm, moves through the nuclear pore and releases the cargo in the nucleus. The nuclear envelope protein Cse1p (YGL238w), a yeast homolog of mammalian CAS, recycles importin-alpha from the nucleus back to the cytoplasm, thereby allowing it to participate in multiple rounds of nuclear import.63,64
To understand the degree of co-evolution between the interacting domains in this complex, we first assigned Pfam domains to the proteins. Cse1 (YGL238w) was assigned with importin-beta N-terminal domain (PF03810, 2e-22), Cse1 domain containing HEAT repeats (PF08506, 7e-275), and Cas/Cse C terminus domain (PF03378, 5e-77). Importin-alpha Srp1 (YNL189w) was assigned with importin beta binding domain (PF01749, 1e-45) and eight Armadillo repeats (PF00514, <5e-6). The domain architecture and the domain-level interactions in this complex are shown in Figure 7. The co-evolution scores for all domain pairs between these chains are listed as a table in Figure 7(b). Three out of the five interacting domain pairs have high correlation, implying a high level of co-evolution. The remaining two interacting domain pairs do not have high correlation scores. An approach that picks five domain pairs out of the 29 possibilities at random (without replacement) will have a 0.019 probability (p-value) of getting two out of its five picks correct (truly interacting domain pairs).
Figure 7
Interaction between importin alpha Srp1 (YNL189w) and nuclear export receptor Cse1 (YGL238w). (a) Protein sequences are shown using thick gray lines, and Pfam domain annotations are shown using colored rectangular boxes (not drawn to scale). The names ...
For each interacting domain, from its multiple sequence alignment, we computed the sequence identity of each orthologous domain in reference to the yeast domain. The average sequence identities for the interacting domains, along with the number of known interacting partners, are listed in Table 2. The Table shows that domains with many interacting partners are relatively more conserved than those with few partners. This is consistent with other findings suggesting that hubs (proteins/domains with numerous interacting partners) are relatively more conserved.59–62 If true, this could possibly explain why domains PF03378 and PF08506, with average sequence identities ~34% and ~46%, respectively, have low correlation with domain PF00514_8, which happens to have a relatively high average sequence identity at ~75%.
Table 2
The average sequence identities of interacting domains between importin alpha Srp1 (YNL189w) and nuclear export receptor Cse1 (YGL238w), along with the number of known interacting partners for each domain
Predicting large-scale domain–domain interactions from the yeast interactome
Motivated by our results that interacting domain pairs (in interacting proteins) have higher correlation compared to non-interacting domain pairs, we developed a method to test the generality of the observed trend, and to predict large-scale domain–domain (i.e. Pfam–Pfam) interactions using the yeast interactome. First, we assigned the interacting yeast protein sequences with Pfam domains (see Materials and Methods). We then considered only those interactions involving proteins with at least 50% of their sequence lengths assigned (SLA) with Pfam domain(s), which we refer to as “test set SLA ≥50%”. A cutoff of 50% was chosen as a compromise between being sufficiently small to provide enough interactions, and large enough for assigned domains to contain sufficient binding sites. In addition, we imposed a restriction that the interacting proteins have orthologs in at least a common set of ten species. This resulted in a set of 1180 interactions among 654 proteins.
For each protein–protein interaction, we computed the correlation scores of all possible domain pairs between the two proteins, and inferred the domain pair(s) with the highest correlation score to be the interacting pair that is most likely to mediate the protein–protein interaction. For our set of 1180 protein–protein interactions, we inferred a total of 1222 domain–domain interactions (Supplementary material S3), 960 of which are unique. In order to validate our predictions, we used the iPfam database,50 which contains the list of known domain–domain interactions inferred from PDB crystal structures. We found that 206 out of our 1222 predictions (109 out of the 960 unique predictions ≈11.35%) are in iPfam.
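The selection step described above can be sketched as follows. This is an illustration under stated assumptions rather than the RCDP implementation: the correlation scores are taken as already computed, the "pair(s)" wording is read as allowing tied top scores to yield more than one predicted pair, and the function name `predict_interacting_pairs`, the tolerance `tol`, and the domain labels and scores are all hypothetical.

```python
def predict_interacting_pairs(scores, tol=1e-12):
    """Return the domain pair(s) whose correlation score ties for the maximum.

    `scores` maps (domain_in_P, domain_in_Q) tuples to correlation scores
    for one protein-protein interaction. Returning every pair within `tol`
    of the best score is an assumption about how ties are handled.
    """
    if not scores:
        return []
    best = max(scores.values())
    return sorted(pair for pair, score in scores.items() if best - score <= tol)


# Made-up scores for two interacting proteins with two domains each,
# giving four candidate domain pairs.
scores = {
    ("D1", "E1"): 0.41,
    ("D1", "E2"): 0.78,
    ("D2", "E1"): 0.78,
    ("D2", "E2"): 0.35,
}
predicted = predict_interacting_pairs(scores)
print(predicted)  # [('D1', 'E2'), ('D2', 'E1')]
```

Applied once per protein–protein interaction, a rule of this kind can return more predicted domain pairs than there are interactions, which is in line with 1222 predictions arising from 1180 interactions.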
Supplementary material S3
RCDP prediction results for the test set 1, containing 1222 domain-domain interactions.
If we restricted our set to only those interactions involving proteins with at least 75% (instead of 50% before) of their sequence lengths assigned with Pfam domain(s), the percentage of predictions in iPfam jumps by 52%. Our restricted set, referred to as “test set SLA ≥75%”, contained a total of 374 protein–protein interactions among 298 proteins. For this set, we inferred a total of 392 domain–domain interactions (Supplementary material S5), 336 of which are unique. 58 out of the 336 unique predictions (≈17.26%) are in iPfam. The increase in the prediction accuracy from 11.35% (for the test set SLA ≥50%) to 17.26% (for the test set SLA ≥75%) indicates that the higher the SLA, the better the prediction accuracy. This is understandable considering the fact that the binding region in an interacting protein needs to be contained in one of that protein's domains in order to have any possibility of it being identified.
Supplementary material S5
RCDP prediction results for the test set 2, containing 394 domain-domain interactions.
We compared our prediction results with those of Chen and Liu's RDFF method32 and Riley et al.'s DPEA method.33 The objective of this comparison is to find what percent of RCDP's predictions are confirmed by the other two methods. Chen and Liu used 9834 yeast protein interactions to infer 4366 domain–domain interactions, of which 2475 are between Pfam-A domains while the rest involve Pfam-B domains. Riley et al.'s DPEA method is a statistical approach, which uses the expectation maximization (EM) algorithm as a subroutine. They used a network of 26,032 protein–protein interactions from 69 organisms to infer a total of 3005 domain–domain interactions, of which 1812 are between Pfam-A domain pairs. The comparison summary is shown in Figure 8(a), in which we refer to our method as RCDP (relative co-evolution of domain pairs). Although our analysis shows that the predictions by the RCDP method are more likely to be in iPfam than those by the RDFF or DPEA methods, one needs to keep in mind that the prediction accuracies of the three methods are not directly comparable as they all use different datasets of varying sizes. However, the dataset used in our study is a subset of those used by Chen and Liu, and Riley et al. So, one would expect a good fraction of our predictions to be confirmed by the other two methods. Interestingly, only about 5% of RCDP's predictions are confirmed by both the RDFF and DPEA methods (Figure 8(b)). About 14% of RCDP's predictions are confirmed by DPEA alone, and about 23% of RCDP's predictions are confirmed by RDFF alone. Overall, 31% of RCDP's predictions are confirmed by DPEA and/or RDFF, indicating that RCDP can predict known domain–domain interactions missed by the other two. Thus, the RCDP method can be used with other methods to detect unrecognized domain–domain interactions on a genome scale with wider coverage.
Figure 8
(a) An indirect comparison of RCDP's prediction results with those of RDFF32 and DPEA33 methods. The predictions were validated against the known domain–domain interactions found in PDB crystal structures (as inferred in iPfam50). The prediction ...
Validation of predicted domain–domain interactions and estimating the true predictive power of the RCDP method
We used domain pairs found to interact in PDB49 crystal structures, as reported in the iPfam database,50 as our gold standard to verify the predicted domain–domain interactions. The iPfam database defines two domains from two different chains to be interacting if and only if they are close enough in at least one PDB complex to form an interaction. We consider a predicted interaction between domain Pi in protein P and domain Qj in protein Q to be a true interaction (true positive) if and only if iPfam lists this pair of domains as interacting based on evidence from one or more PDB crystal structures.
An interacting protein pair P and Q is said to contain an iPfam domain–domain interaction xy if domain x is present in protein P and domain y is present in protein Q, or vice versa. A given protein–protein interaction may contain more than one iPfam domain–domain interaction, i.e. out of all possible domain pairs between the two interacting proteins, there may be more than one domain pair listed in iPfam.
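The containment rule above can be written as a small lookup. This is a sketch under stated assumptions: the gold standard is represented as a plain list of unordered domain pairs standing in for iPfam (no actual iPfam query is made), and the function name `known_pairs_in_interaction` is hypothetical. The example uses the Rpb3/Rpb8 domains and the interacting pair (PF01193, PF03870) reported earlier in the text.

```python
def known_pairs_in_interaction(domains_p, domains_q, known_pairs):
    """Return the candidate pairs (x in P, y in Q) found in the gold standard.

    Gold-standard pairs are treated as unordered, so a listed pair (y, x)
    also matches the candidate (x, y): the "or vice versa" case.
    """
    known = {frozenset(pair) for pair in known_pairs}
    return {
        (x, y)
        for x in domains_p
        for y in domains_q
        if frozenset((x, y)) in known
    }


# Rpb3 (YIL021w) carries PF01193 and PF01000; Rpb8 (YOR224c) carries PF03870.
rpb3_domains = ["PF01193", "PF01000"]
rpb8_domains = ["PF03870"]
# Stand-in gold standard, deliberately listed in the opposite order.
gold_standard = [("PF03870", "PF01193")]

found = known_pairs_in_interaction(rpb3_domains, rpb8_domains, gold_standard)
print(found)  # {('PF01193', 'PF03870')}
```

An interaction is then said to contain a known domain–domain interaction whenever the returned set is non-empty, and the set can hold more than one pair.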
We consider only those domain pairs found to interact in PDB crystal structures, as reported in iPfam, as true positives. Absence of a domain pair in iPfam does not necessarily mean that the two domains do not interact. Thus, it may not be fair to consider those predictions (without PDB evidence) as false positives. It could very well be that a good fraction of them could be biologically occurring domain interactions. A simple case would be a true protein–protein interaction, none of whose possible domain pairs are in iPfam. In order to estimate the true predictive power of the RCDP method, we tested it on a validation set comprising only those protein–protein interactions satisfying all of the following conditions: (i) is between proteins with at least 50% of their sequence lengths assigned with Pfam domain(s), (ii) is not between two one-domain proteins, (iii) contains a domain pair that is known to interact as per iPfam, and (iv) is between proteins having orthologs in at least a common set of ten species. The validation set contained a total of 109 protein–protein interactions (Supplementary material S6), comprising a total of 109 unique domain interactions that are in iPfam.
Supplementary material S6
Validation set, containing 109 yeast protein interactions with SLA ≥ 50%.
Ideally, a good domain–domain interaction prediction method should be able to recover all 109 unique known domain–domain interactions present in the validation set. To measure the percentage of recovery, we use the sensitivity measure, which is the ratio of the number of unique true positives to the number of unique positives (which is 109). On the validation set of 109 protein–protein interactions, RCDP predicted 109 unique domain–domain interactions, out of which 63 are in iPfam. This resulted in a sensitivity of 63/109=57.8%. This is an underestimate because there may be more than one domain pair mediating a given protein–protein interaction, and since RCDP is designed to find only the pair(s) with the highest correlation, it may not be able to recover all 109 unique domain–domain interactions present in the set.
While it is important for any good method to be able to recover the 109 known domain–domain interactions from the validation set, it is equally important that every predicted domain–domain interaction is correct. To measure the accuracy of our predictions, we used the positive predictive value (PPV) defined as:
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$
where TP is the number of predicted domain pairs that are known to be true (in iPfam), and FP is the number of predicted domain pairs that are not in iPfam. RCDP predicted a total of 147 domain–domain interactions (109 of them are unique) for the validation set, out of which 94 are in iPfam, giving a PPV of 63.95%. To ensure that the 63.95% prediction accuracy of RCDP is not due to chance, we compared it against a random method. For this, we used the exact same random strategy used by Nye et al.31, which, for a given protein–protein interaction, picks a domain pair at random out of all possible domain pairs. Since there could be more than one interacting domain pair within each interacting protein pair, there is a certain probability that the domain pair picked at random is a true interaction. We performed 100,000 runs of this random method on our validation set. The p-value of obtaining a prediction accuracy of ≥63.95% by chance is 1.05×10^−2 (z-score: 2.39). The performance of RCDP versus the random method is shown in Figure 9. On average, the random method is expected to have 55.19 (±0.01)% of its predictions in iPfam (Figure 9). RCDP outperforms random by about 9 percentage points, which is significant, considering the fact that Nye et al. showed, on a different dataset, that the random method performs as well as three other popular methods for predicting domain–domain interactions. In particular, Nye et al. showed that one can expect their “lowest p-value” method, Deng et al.'s MLE method,30 and the random method to have about 55% of their predictions to be true, and Sprinzak et al.'s association method26 to have about 52% of its predictions to be true. Since the interaction dataset and domain annotations (SCOP domains) used in Nye et al.'s study are different from those used in this study, the results are not directly comparable.
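The PPV calculation and the random comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the input layout (one `(domains_P, domains_Q)` tuple per protein–protein interaction), and the representation of iPfam pairs as `frozenset` objects are hypothetical, and the empirical p-value is simply the fraction of random runs that reach the observed PPV.

```python
import random


def ppv(true_positives, false_positives):
    """Positive predictive value: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


def random_baseline_ppv(interactions, ipfam_pairs, runs=1000, seed=0):
    """PPV of each run of a random method that picks one domain pair,
    uniformly among all possible pairs, per protein-protein interaction."""
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        hits = 0
        for domains_p, domains_q in interactions:
            # Choosing one domain from each protein independently and
            # uniformly is a uniform choice over all m x n domain pairs.
            pair = frozenset((rng.choice(domains_p), rng.choice(domains_q)))
            hits += pair in ipfam_pairs
        values.append(hits / len(interactions))
    return values


def empirical_p_value(random_values, observed):
    """Fraction of random runs whose PPV is at least the observed PPV."""
    return sum(v >= observed for v in random_values) / len(random_values)


# Counts reported in the text: 94 of 147 predictions are in iPfam.
print(round(ppv(94, 147 - 94) * 100, 2))  # 63.95
```

With the real validation set, `random_baseline_ppv` would be run 100,000 times, as in the text, and the result passed to `empirical_p_value` together with the observed PPV.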
Figure 9
Domain–domain interaction predictions results for 109 yeast protein–protein interactions, each of which (i) is between proteins with at least 50% of their sequence lengths assigned with Pfam domain(s), (ii) is not an interaction between (more ...)
Here, we performed co-evolutionary analysis of domains in interacting proteins to assess whether or not interacting domain pairs exhibit a higher level of co-evolution than non-interacting domain pairs of a given protein–protein interaction. We used yeast protein–protein interactions from DNA-directed RNA polymerase complex and F1-ATPase complex, among others, in our investigation. Our results indicated that interacting domain pairs exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by the results, we designed a method, called RCDP, to confirm the observed trend, and to predict large-scale domain–domain (i.e. Pfam-Pfam) interactions using the yeast interactome. A total of 1222 domain–domain interactions from 1180 protein–protein interactions were predicted, out of which 109 are found in PDB (as reported in iPfam). Through comparison of our predictions with those from two other methods, we showed that the RCDP method can predict known domain–domain interactions missed by the other two methods.
The proposed RCDP method may not be suitable for predicting domain–domain interactions between homodimers (interaction between two copies of the same protein). The reason for this is that the domain pairs with the highest correlation will be inter-chain homodomains, which will have the maximum correlation score of one. Although this makes RCDP's results predictable for homodimers, in reality, it is mostly the case that homodimers are mediated by inter-chain homodomain interactions. In our set of 1180 protein–protein interactions (SLA ≥50%), we had 71 cases of a protein interacting with itself. For this set of 71 interactions, we predicted 112 domain–domain interactions, out of which 84 (75%) are found in PDB (as reported in iPfam).
Although relative co-evolution of interacting domains can be used to predict domain–domain interactions between two interacting proteins, some limitations apply to any method based on co-evolution, and these can cause false positives and false negatives. First of all, this type of analysis assumes that interacting domains/proteins co-evolve, i.e. undergo correlated mutations, which may not always be true due to numerous other biological constraints on the interacting domains/proteins. If domain A with multiple interacting partners undergoes correlated mutations with its interacting partners, then there is a danger of it not having a high correlation with its partners due to “uncorrelated set of correlated mutations” (see Figure 4).
Predicted domain–domain interactions are only as good as the accuracy of the protein–protein interactions used. Domain–domain interactions are predicted under the premise that the given protein–protein interaction is accurate. If a protein–protein interaction is a false positive, then one should not expect the predicted domain–domain interaction to be true. Various studies65–68 have reported that anywhere between 40–60% of the reported protein–protein interactions could be false-positives, which could potentially explain the false-positives (≈34%) in predictions by the RCDP method.
Not all interactions are mediated by pairs of globular domains. There are many that involve binding of a domain in one protein to a short region (approximately three to eight residues) in another.69,70 Detecting these short “linear motifs” using sequence comparison is difficult due to their tendency to reside in disordered regions in proteins, and their limited conservation outside of closely related species.37 Thus, there is a possibility that the set of interacting residues may not be part of the domains assigned to a protein. This could lead to incorrect prediction of domain–domain interactions in such cases. In addition, there may be more than one domain pair mediating a protein–protein interaction, and since RCDP is designed to find only the pair(s) with the highest correlation, it may not be able to recover all interacting domain pairs.
The orthology detection procedure used in this study may not be sufficiently rigorous for detecting orthologs. We did attempt to use a very stringent reciprocal BLAST best-hits approach. But, because of its stringent nature, and our requirement that interacting proteins have at least ten orthologs from a common set of species, we were unable to obtain a large enough dataset to make any statistical conclusion. Another issue is that of closely related paralogs. Since many genes in eukaryotes are known to have numerous in-paralogs (due to recent duplications), it is difficult to establish one-to-one orthology relationships. In our tests on a few cases, including one in-paralog rather than another had little or no effect on the co-evolutionary analysis.
Despite these limitations, the RCDP method proves to be extremely useful for inferring domain–domain interactions. Unlike sophisticated statistical methods, which require a training set, the RCDP method can directly be used on a given protein–protein interaction to predict the domain pair that is most likely to mediate the interaction. The RCDP, DPEA, and RDFF methods share only a small fraction of their predictions, indicating that each can detect known domain–domain interactions missed by the others. Used together, they can detect unrecognized domain–domain interactions on a genome scale with wider coverage.
The RCDP method is simple and easy to implement (an implementation of the RCDP algorithm is available), and can be used as a tool to guide experimentalists in discovering previously unrecognized domain–domain interactions. In the future, we would like to investigate whether there is a possibility of transitivity in co-evolution. That is, if A interacts with B, and B interacts with C, will A and C exhibit a high degree of co-evolution (assuming that A and C do not interact) because of their association with a common interacting partner B? If they do, it would be interesting to know the biological reasons/constraints that require them to co-evolve.
Construction of multiple sequence alignments, phylogenetic trees, and similarity matrices
For each protein–protein interaction, multiple sequence alignments for the two proteins were constructed using MUSCLE71 by searching for their respective orthologs in 93 eukaryotic genomes (Supplementary material S1). Orthologs were obtained by performing a stringent BLAST search.72 For a given query protein, the best hit in a genome with e-value <1e-5, sequence identity of at least 35% and an alignment length of at least 75% of the length of both the query and the hit sequence was considered to be an ortholog. Sequence identity and alignment length constraints were enforced to eliminate partial hits from consideration. The multiple sequence alignment for domain D in protein P is constructed by extracting those regions in P's multiple sequence alignment that correspond to D.
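The ortholog criteria above can be expressed as a small filter over BLAST hits. This is a sketch only: the `Hit` record and its field names are hypothetical, and two details that the text does not specify are assumed here, namely that "best hit" means the lowest e-value and that the thresholds are applied before the best hit per genome is chosen.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record for one BLAST hit; field names are illustrative.
    genome: str
    evalue: float
    identity: float   # percent identity over the aligned region
    align_len: int    # alignment length in residues
    query_len: int
    hit_len: int


def passes_thresholds(hit, max_evalue=1e-5, min_identity=35.0, min_coverage=0.75):
    """Apply the e-value, identity, and alignment-length criteria."""
    return (
        hit.evalue < max_evalue
        and hit.identity >= min_identity
        # alignment must span at least 75% of BOTH query and hit sequence
        and hit.align_len >= min_coverage * hit.query_len
        and hit.align_len >= min_coverage * hit.hit_len
    )


def pick_orthologs(hits):
    """Return {genome: best qualifying hit}.

    Assumption: 'best' is the lowest e-value among hits that pass
    the thresholds; genomes with no qualifying hit are omitted.
    """
    orthologs = {}
    for hit in hits:
        if not passes_thresholds(hit):
            continue
        current = orthologs.get(hit.genome)
        if current is None or hit.evalue < current.evalue:
            orthologs[hit.genome] = hit
    return orthologs
```

Requiring coverage of both the query and the hit is what removes partial hits, for example a single shared domain inside a much longer protein.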
Supplementary material S1
Organisms (taxids) used for ortholog search.
In order to be able to compare the evolutionary histories of two domains, we require that both domains have orthologs in at least a common set of ten species. Multiple sequence alignments of both domains for a common set of species were constructed, followed by the construction of phylogenetic trees and similarity matrices using the algorithms provided in the ClustalW suite.73
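The common-species requirement can be illustrated with a short helper. It is a sketch under assumptions: each domain alignment is taken to be a dictionary mapping a species name to its aligned sequence, and the function name is hypothetical. Tree and similarity-matrix construction (done with ClustalW in the study) is not reproduced here.

```python
def restrict_to_common_species(alignment_a, alignment_b, min_common=10):
    """Restrict two domain alignments to their shared species.

    alignment_a, alignment_b: dict mapping species -> aligned sequence.
    Returns (species, rows_a, rows_b) with rows in the same species order,
    or None if fewer than `min_common` species are shared.
    """
    shared = sorted(set(alignment_a) & set(alignment_b))
    if len(shared) < min_common:
        return None
    rows_a = [alignment_a[s] for s in shared]
    rows_b = [alignment_b[s] for s in shared]
    return shared, rows_a, rows_b
```

Keeping both alignments in the same species order matters because the two similarity matrices are later compared entry by entry.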
Assessment of the agreement between the evolutionary histories of two domains
The extent of agreement between the evolutionary histories of two domains is assessed by comparing their phylogenetic trees. For comparison of phylogenetic trees, we follow the standard practice of comparing the corresponding similarity matrices.14,16,39–44 The extent of agreement between two similarity matrices, A and B, is evaluated using Pearson's correlation coefficient, given by:
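$$r = \frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left(A_{ij}-\bar{A}\right)\left(B_{ij}-\bar{B}\right)}{\sqrt{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left(A_{ij}-\bar{A}\right)^{2}}\;\sqrt{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left(B_{ij}-\bar{B}\right)^{2}}}$$
where n is the number of species (rows/columns) represented in the matrices, $A_{ij}$ and $B_{ij}$ are the evolutionary distances between species i and j in the tree of domains A and B, respectively, and $\bar{A}$ and $\bar{B}$ are the mean values of all $A_{ij}$ and $B_{ij}$, respectively. The value of r ranges from −1.0 to +1.0, with higher r indicating greater agreement between the two matrices, and thus a higher level of co-evolution between the corresponding families.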
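As an illustration, Pearson's correlation between two similarity matrices can be computed as below. The sketch assumes that the matrices are symmetric, are given as nested lists with rows and columns in the same species order, and that only the entries above the diagonal (i < j) enter the sums; the function names are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Entries above the diagonal (i < j) of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n - 1) for j in range(i + 1, n)]


def matrix_correlation(a, b):
    """Pearson's r between the off-diagonal entries of matrices a and b."""
    xs, ys = upper_triangle(a), upper_triangle(b)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("matrices must have the same size, with n >= 3")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)
```

Two matrices whose distances differ only by a constant scale factor give r = 1, which is also why a domain compared with itself (as in the homodimer case) always attains the maximum score.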
Inferring domain–domain interactions
For every interacting protein pair, P and Q, all possible domain–domain interactions between them are considered. Let protein P contain domains {P1, P2, …, Pm} and protein Q contain domains {Q1, Q2, …, Qn}. The correlation of evolutionary histories of all possible domain pairs between P and Q is computed, and the domain pair PiQj with the highest level of co-evolution (whose evolutionary histories correlate the most) is inferred to be the one (or one of many domain–domain contacts) that is most likely to mediate the interaction between P and Q. If more than one domain pair has the highest correlation score, all domain pairs with the highest score are inferred to be interacting. Interestingly, more often than not, more than one domain pair mediates a given protein–protein interaction.
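The selection step described above can be sketched as follows. This is not the released RCDP implementation: the function name, the `correlation` callback (any function returning the correlation score for a domain pair), and the tolerance used to detect ties are assumptions made for the example.

```python
def infer_domain_pairs(domains_p, domains_q, correlation, tol=1e-12):
    """Return (best_pairs, best_score) for an interacting protein pair.

    domains_p, domains_q: domains of proteins P and Q.
    correlation: callable (domain_p, domain_q) -> correlation score r.
    All pairs whose score is within `tol` of the maximum are returned,
    so that ties are all inferred to be interacting.
    """
    scores = {
        (dp, dq): correlation(dp, dq) for dp in domains_p for dq in domains_q
    }
    best_score = max(scores.values())
    best_pairs = [pair for pair, s in scores.items() if best_score - s <= tol]
    return best_pairs, best_score


# Toy example with made-up scores for a 2 x 2 domain layout.
toy_scores = {
    ("P1", "Q1"): 0.41, ("P1", "Q2"): 0.87,
    ("P2", "Q1"): 0.87, ("P2", "Q2"): 0.12,
}
pairs, score = infer_domain_pairs(
    ["P1", "P2"], ["Q1", "Q2"], lambda dp, dq: toy_scores[(dp, dq)]
)
print(pairs, score)  # [('P1', 'Q2'), ('P2', 'Q1')] 0.87
```

Because only the top-scoring pair(s) are reported, a second, lower-scoring pair that also mediates the interaction would be missed, which is the limitation noted in the discussion.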
Protein–protein interaction test set
Protein–protein interaction data for Saccharomyces cerevisiae (yeast) from the DIP database74 (February 2005 release) were used. This set contained a total of 17,471 interactions underlying 4931 yeast proteins. For domain definition, we used the Pfam database of Hidden Markov Model (HMM) profiles.51 Only Pfam-A profiles were used to assign domain definitions to the 4931 interacting proteins, using an e-value cutoff of 1e-3.
Only interacting proteins with at least 50% of their sequence lengths assigned onto Pfam domain(s), and interactions involving them, were considered. This reduced the number of interactions to 3266 among 1397 proteins. Because of the limitation that interacting protein pairs have orthologs in at least a common set of ten species, our final test set contained 1180 interactions, underlying 654 proteins (Supplementary material S2). We also considered a restricted set of interactions, with each interacting protein having at least 75% of its sequence length assigned onto Pfam domain(s). This restricted set contained a total of 374 interactions among 298 proteins (Supplementary material S4).
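The sequence-length-assigned (SLA) filter can be sketched as below. The text does not say how overlapping Pfam matches are counted, so this example assumes that each residue is counted once; the function names and the 1-based inclusive coordinate convention are also assumptions.

```python
def sequence_length_assigned(seq_len, domain_regions):
    """Fraction of residues covered by at least one domain region.

    domain_regions: iterable of (start, end), 1-based and inclusive.
    Overlapping regions are merged so no residue is counted twice
    (an assumption; the source does not describe overlap handling).
    """
    covered = 0
    last_end = 0
    for start, end in sorted(domain_regions):
        start = max(start, last_end + 1)  # skip residues already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / seq_len


def filter_by_sla(interactions, sla, min_sla=0.5):
    """Keep interactions in which both proteins meet the SLA threshold.

    interactions: list of (protein_a, protein_b) identifiers.
    sla: dict mapping protein identifier -> SLA fraction.
    """
    return [
        (a, b) for a, b in interactions
        if sla[a] >= min_sla and sla[b] >= min_sla
    ]
```

Calling `filter_by_sla` with `min_sla=0.75` corresponds to the restricted set described in the text.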
Supplementary material S2
Test set 1, containing 1180 yeast protein interactions with SLA ≥ 50%.
Supplementary material S4
Test set 2, containing 374 yeast protein interactions with SLA ≥ 75%.
Supplementary information
Supplementary data associated with this article and an implementation of the RCDP algorithm are available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/RCDP/. The supplementary data comprise: S1, organisms used in ortholog search; S2, test set 1, containing 1180 yeast protein interactions with SLA ≥50%; S3, RCDP prediction results for test set 1, containing 1222 domain–domain interactions; S4, test set 2, containing 374 yeast protein interactions with SLA ≥75%; S5, RCDP prediction results for test set 2, containing 394 domain–domain interactions; S6, validation set, containing 109 yeast protein interactions with SLA ≥50%.
Acknowledgments
We thank S. Balaji for useful comments and suggestions. This work was supported by the intramural research program of the National Library of Medicine, National Institutes of Health.
1. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. [PubMed]
2. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98:4569–4574. [PubMed]
3. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147. [PubMed]
4. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415:180–183. [PubMed]
5. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736. [PubMed]
6. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543. [PMC free article] [PubMed]
7. Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, Canadien V, et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 2005;433:531–537. [PubMed]
8. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440:637–643. [PubMed]
9. Dandekar T, Snel B, Huynen M, Bork P. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998;23:324–328. [PubMed]
10. Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA. Protein interaction maps for complete genomes based on gene fusion events. Nature. 1999;402:86–90. [PubMed]
11. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. Detecting protein function and protein–protein interactions from genome sequences. Science. 1999;285:751–753. [PubMed]
12. Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N. Use of contiguity on the chromosome to predict functional coupling. In Silico Biol. 1999;1:93–108. [PubMed]
13. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA. 1999;96:4285–4288. [PubMed]
14. Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE. Co-evolution of proteins with their interaction partners. J Mol Biol. 2000;299:283–293. [PubMed]
15. Wojcik J, Schachter V. Protein–protein interaction map inference using interacting domain profile pairs. Bioinformatics. 2001;17(Suppl 1):S296–S305. [PubMed]
16. Pazos F, Valencia A. Similarity of phylogenetic trees as indicator of protein–protein interaction. Protein Eng. 2001;14:609–614. [PubMed]
17. Pazos F, Valencia A. In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins: Struct Funct Genet. 2002;47:219–227. [PubMed]
18. Date SV, Marcotte EM. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nature Biotechnol. 2003;21:1055–1062. [PubMed]
19. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, et al. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science. 2003;302:449–453. [PubMed]
20. Lappe M, Holm L. Unraveling protein interaction networks with near-optimal efficiency. Nature Biotechnol. 2004;22:98–103. [PubMed]
21. Pagel P, Wong P, Frishman D. A domain interaction map based on phylogenetic profiling. J Mol Biol. 2004;344:1331–1346. [PubMed]
22. Kim Y, Subramaniam S. Locally defined protein phylogenetic profiles reveal previously missed protein interactions and functional relationships. Proteins: Struct Funct Genet. 2006;62:1115–1124. [PubMed]
23. Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, et al. Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA. 2005;102:1974–1979. [PubMed]
24. Apic G, Gough J, Teichmann SA. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J Mol Biol. 2001;310:311–325. [PubMed]
25. Chothia C, Gough J, Vogel C, Teichmann SA. Evolution of the protein repertoire. Science. 2003;300:1701–1703. [PubMed]
26. Sprinzak E, Margalit H. Correlated sequence-signatures as markers of protein–protein interaction. J Mol Biol. 2001;311:681–692. [PubMed]
27. Kim WK, Park J, Suh JK. Large scale statistical prediction of protein–protein interaction by potentially interacting domain (PID) pair. Genome Inform Ser Workshop Genome Inform. 2002;13:42–50.
28. Ng SK, Zhang Z, Tan SH. Integrative approach for computationally inferring protein domain interactions. Bioinformatics. 2003;19:923–929. [PubMed]
29. Albrecht M, Huthmacher C, Tosatto SC, Lengauer T. Decomposing protein networks into domain–domain interactions. Bioinformatics. 2005;21(Suppl 2):ii220–ii221. [PubMed]
30. Deng M, Mehta S, Sun F, Chen T. Inferring domain–domain interactions from protein–protein interactions. Genome Res. 2002;12:1540–1548. [PubMed]
31. Nye TM, Berzuini C, Gilks WR, Babu MM, Teichmann SA. Statistical analysis of domains in interacting protein pairs. Bioinformatics. 2005;21:993–1001. [PubMed]
32. Chen XW, Liu M. Prediction of protein–protein interactions using random decision forest framework. Bioinformatics. 2005;21:4394–4400. [PubMed]
33. Riley R, Lee C, Sabatti C, Eisenberg D. Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 2005;6:R89. [PMC free article] [PubMed]
34. Littler SJ, Hubbard SJ. Conservation of orientation and sequence in protein domain–domain interactions. J Mol Biol. 2005;345:1265–1279. [PubMed]
35. Gong S, Park C, Choi H, Ko J, Jang I, Lee J, et al. A protein domain interaction interface database: InterPare. BMC Bioinformatics. 2005;6:207. [PMC free article] [PubMed]
36. Shoemaker BA, Panchenko AR, Bryant SH. Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci. 2006;15:352–361. [PMC free article] [PubMed]
37. Neduva V, Linding R, Su-Angrand I, Stark A, de Masi F, Gibson TJ, et al. Systematic discovery of new recognition peptides mediating protein interaction networks. PLoS Biol. 2005;3:e405. [PMC free article] [PubMed]
38. Aloy P, Russell RB. Structural systems biology: modelling protein interactions. Nature Rev Mol Cell Biol. 2006;7:188–197. [PubMed]
39. Goh CS, Cohen FE. Co-evolutionary analysis reveals insights into protein–protein interactions. J Mol Biol. 2002;324:177–192. [PubMed]
40. Ramani AK, Marcotte EM. Exploiting the co-evolution of interacting proteins to discover interaction specificity. J Mol Biol. 2003;327:273–284. [PubMed]
41. Gertz J, Elfond G, Shustrova A, Weisinger M, Pellegrini M, Cokus S, Rothschild B. Inferring protein interactions from phylogenetic distance matrices. Bioinformatics. 2003;19:2039–2045. [PubMed]
42. Jothi R, Kann MG, Przytycka TM. Predicting protein–protein interaction by searching evolutionary tree automorphism space. Bioinformatics. 2005;21 (Suppl 1):i241–i250. [PMC free article] [PubMed]
43. Pazos F, Ranea JA, Juan D, Sternberg MJ. Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J Mol Biol. 2005;352:1002–1015. [PubMed]
44. Sato T, Yamanishi Y, Kanehisa M, Toh H. The inference of protein–protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics. 2005;21:3482–3489. [PubMed]
45. Fraser HB, Hirsh AE, Wall DP, Eisen MB. Coevolution of gene expression among interacting proteins. Proc Natl Acad Sci USA. 2004;101:9033–9038. [PubMed]
46. Tirosh I, Barkai N. Computational verification of protein–protein interactions by orthologous co-expression. BMC Bioinformatics. 2005;6:40. [PMC free article] [PubMed]
47. Moyle WR, Campbell RK, Myers RV, Bernard MP, Han Y, Wang X. Co-evolution of ligand-receptor pairs. Nature. 1994;368:251–255. [PubMed]
48. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A. Correlated mutations contain information about protein–protein interaction. J Mol Biol. 1997;271:511–523. [PubMed]
49. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucl Acids Res. 2000;28:235–242. [PMC free article] [PubMed]
50. Finn RD, Marshall M, Bateman A. iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 2005;21:410–412. [PubMed]
51. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, et al. The Pfam protein families database. Nucl Acids Res. 2004;32:D138–D141. [PMC free article] [PubMed]
52. Boyer PD, Kohlbrenner WE. Energy Coupling in Photosynthesis. In: Selman B, Selman-Reiner S, editors. Elsevier Science Publishing Co; New York: 1981. pp. 231–240.
53. Cox GB, Jans DA, Fimmel AL, Gibson F, Hatch L. Hypothesis. The mechanism of ATP synthase. Conformational change by rotation of the beta-subunit. Biochim Biophys Acta. 1984;768:201–208. [PubMed]
54. Mitchell P. Molecular mechanics of protonmotive F0F1 ATPases. Rolling well and turnstile hypothesis. FEBS Letters. 1985;182:1–7. [PubMed]
55. Oosawa F, Hayashi S. The loose coupling mechanism in molecular machines of living cells. Advan Biophys. 1986;22:151–183. [PubMed]
56. Abrahams JP, Leslie AG, Lutter R, Walker JE. Structure at 2.8 A resolution of F1-ATPase from bovine heart mitochondria. Nature. 1994;370:621–628. [PubMed]
57. Lederkremer GZ, Cheng Y, Petre BM, Vogan E, Springer S, Schekman R, Walz T, Kirchhausen T. Structure of the Sec23p/24p and Sec13p/31p complexes of COPII. Proc Natl Acad Sci USA. 2001;98:10704–10709. [PubMed]
58. Kim WK, Ison JC. Survey of the geometric association of domain–domain interfaces. Proteins: Struct Funct Genet. 2005;61:1075–1088. [PubMed]
59. Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES. Are protein–protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci. 2004;13:190–202. [PubMed]
60. Wuchty S. Evolution and topology in the yeast protein interaction network. Genome Res. 2004;14:1310–1314. [PubMed]
61. Wuchty S, Oltvai ZN, Barabasi AL. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nature Genet. 2003;35:176–179. [PubMed]
62. Fraser HB, Wall DP, Hirsh AE. A simple dependence between protein evolution rate and the number of protein–protein interactions. BMC Evol Biol. 2003;3:11. [PMC free article] [PubMed]
63. Hood JK, Silver PA. Cse1p is required for export of Srp1p/importin-alpha from the nucleus in Saccharomyces cerevisiae. J Biol Chem. 1998;273:35142–35146. [PubMed]
64. Schroeder AJ, Chen XH, Xiao Z, Fitzgerald-Hayes M. Genetic evidence for interactions between yeast importin alpha (Srp1p) and its nuclear export receptor, Cse1p. Mol Gen Genet. 1999;261:788–795.
65. Mrowka R, Patzak A, Herzel H. Is there a bias in proteome research? Genome Res. 2001;11:1971–1973. [PubMed]
66. Deane CM, Salwinski L, Xenarios I, Eisenberg D. Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics. 2002;1:349–356. [PubMed]
67. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein–protein interactions. Nature. 2002;417:399–403. [PubMed]
68. Sprinzak E, Sattath S, Margalit H. How reliable are experimental protein–protein interaction data? J Mol Biol. 2003;327:919–923. [PubMed]
69. Pawson T, Scott JD. Signaling through scaffold, anchoring, and adaptor proteins. Science. 1997;278:2075–2080. [PubMed]
70. Sudol M. From Src homology domains to other signaling modules: proposal of the ‘protein recognition code’. Oncogene. 1998;17:1469–1474. [PubMed]
71. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. [PMC free article] [PubMed]
72. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. [PubMed]
73. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. Multiple sequence alignment with the Clustal series of programs. Nucl Acids Res. 2003;31:3497–3500. [PMC free article] [PubMed]
74. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The database of interacting proteins: 2004 update. Nucl Acids Res. 2004;32:D449–D451. [PMC free article] [PubMed]