Conceived and designed the experiments: JK SK. Performed the experiments: JK IK J-SY SP. Analyzed the data: JK IK J-SY Y-ES JH SP YSC SK. Wrote the paper: JK IK SK.
PDZ domain-mediated interactions have greatly expanded during metazoan evolution, becoming important for controlling signal flow via the assembly of multiple signaling components. The evolutionary history of PDZ domain-mediated interactions has never been explored at the molecular level. It is of great interest to understand how PDZ domain-ligand interactions emerged and how they became rewired during evolution. Here, we constructed the first human PDZ domain-ligand interaction network (PDZNet) together with binding motif sequences and interaction strengths of ligands. PDZNet includes 1,213 interactions between 97 human PDZ proteins and 591 ligands that connect most PDZ protein-mediated interactions (98%) in a single large network via shared ligands. We examined the rewiring of PDZ domain-ligand interactions throughout eukaryotic evolution by tracing changes in the C-terminal binding motif sequences of the PDZ ligands. We found that interaction rewiring by sequence mutation occurred frequently throughout evolution and contributed substantially to the growth of PDZNet. The rewiring of PDZ domain-ligand interactions provided an effective means of functional innovation in nervous system development. Our findings provide empirical evidence for a network evolution model that highlights the rewiring of interactions as a mechanism for the development of new protein functions. PDZNet will be a valuable resource to further characterize the organization of the PDZ domain-mediated signaling proteome.
Rewiring of interactions is a powerful mechanism for the evolution of organismal complexity. Rewiring among preexisting proteins provides a simple way to build new signaling circuits by redirecting information flow without a gain or loss of genes. In particular, interactions mediated by short linear motifs can easily be changed by mutations during evolution, resulting in a rewiring of interactions. However, how the rewiring of linear motif-mediated interactions facilitates the emergence of new protein functions during evolution is poorly understood. Here, we systematically investigated the rewiring of interactions mediated by PDZ domains, which are among the most common peptide recognition modules. We found that PDZ domain-ligand interactions are frequently rewired by C-terminal sequence mutations in PDZ ligands during evolution. In particular, rewiring of PDZ domain-ligand interactions was involved in the development of neuronal functions and occurred concurrently with the emergence of vertebrates, suggesting that the reorganization of signaling pathways by rewiring PDZ domain-ligand interactions contributed significantly to the evolution of vertebrate nervous systems. Our findings highlight the rewiring of interactions as an effective means of functional innovation, providing new insight into eukaryotic evolution, which cannot be fully explained by the expansion of protein families alone.
PDZ domains are linear motif-mediated protein-protein interaction modules. PDZ domain-ligand interactions have greatly expanded in metazoans and are widely used to assemble signaling complexes, including those found in neuronal synapses. Thus, an understanding of how PDZ domain-ligand interactions have evolved would help elucidate the design principles of the eukaryotic signaling network. Many studies have revealed the evolutionary history of PDZ domain families and their functional roles. However, it remains poorly understood how PDZ domain-mediated interactions are rewired during the evolution of the protein interaction network.
Systematic analysis of interaction rewiring will provide new insights into eukaryotic evolution, which cannot be fully explained by the expansion of protein families alone. Recently, it was suggested that rewiring of interactions is an important mechanism for the evolution of biological systems. Network comparison studies showed that protein interactions frequently change after gene duplication. In particular, linear motifs were suggested to have great potential to rewire interactions because of their high rate of change. Indeed, phosphorylation sites in one species are often lost in other species. Moreover, human-specific phosphorylation sites have recently been examined to identify novel human phenotypes, because the rewiring of kinase interactions may contribute to the emergence of novel biological functions.
Structural information on interacting cellular components (i.e., the structural interactome) would provide a more complete picture of a cell and help elucidate the evolutionary principles of the protein interaction network. Recently, structural information on protein complexes has been mapped onto protein interaction networks. Such interface information would explain evolutionary principles more clearly, such as the model of network evolution by gene duplication and the role of residues surrounding linear motifs in binding specificity. Therefore, to understand the underlying design principles of the PDZ domain-ligand interaction network, detailed interface information at the amino acid level is needed.
In this work, we attempted the first systematic investigation of interaction rewiring in the PDZ domain-ligand interaction network and its role in eukaryotic evolution. We constructed a comprehensive human PDZ domain-ligand interaction network and traced interaction rewiring during evolution. We developed position weight matrices (PWMs) of human PDZ domains from experimental data on PDZ domain-ligand interactions. The binding motif information in PDZNet helped to elucidate the changes in PDZ domain-ligand interactions. We found that PDZ domain-ligand interactions are frequently rewired throughout evolution via mutations of the C-terminal sequences of PDZ ligands. In particular, interaction rewiring coincided with the emergence of vertebrates, and the rewired interactions were largely involved in neuronal signaling, suggesting that nervous system evolution might have been achieved by rewiring interactions among signaling components, such as PDZ proteins and their ligands. Furthermore, the broad specificity of PDZ domains contributes to interaction rewiring by increasing the chance of acquiring PDZ-binding motifs through sequence mutations. Our findings suggest a new approach to the study of eukaryotic evolution that treats the rewiring of interactions as a major evolutionary process for domain-ligand interactions.
To elucidate how PDZ domain-ligand interactions have evolved, an accurate and detailed understanding of their interactions is essential. Furthermore, a network approach is useful for understanding how the evolution of PDZ domain-ligand interactions contributed to eukaryotic evolution, because protein functions may be encoded not in individual proteins but in the relationships between proteins in a protein-protein interaction network. Therefore, we constructed a comprehensive network of PDZ protein-ligand interactions by integrating experimental data on PDZ domain-ligand interactions with protein-protein interaction databases (Figure 1).
We developed a quantitative model of PDZ domain binding strengths from experimental data on PDZ domain-ligand interactions, including interactions between 81 PDZ domains and 217 peptides from a protein array, phage display data for 86 PDZ domains, interactions between 147 PDZ domains and 219 ligands from a database of in vivo PDZ domain-ligand interactions (PDZBase), and literature mining (Figure 1A). This model converts binary interactions between PDZ domains and ligands into PWMs, which quantify the binding strength between a given PDZ domain and peptide sequence based on the physical and chemical properties of binding pocket residues as well as the frequencies of amino acids found in the bound peptides. To capture the binding strengths of PDZ domain-peptide interactions, we combined a machine-learning algorithm with an information theory-based PWM method. We provide the PWMs of human PDZ domains as a resource (Table S1). In this study, we focused on the C-terminal motifs of ligands for the analysis of PDZNet. Although several internal PDZ-binding motifs have been reported, most PDZ domain-ligand interactions are mediated by C-terminal residues, owing to the structural constraint that internal motifs must adopt a β-hairpin fold.
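The frequency part of turning bound peptides into a PWM can be illustrated with a standard log-odds matrix. The sketch below is a minimal illustration under stated assumptions, not the authors' combined method: it omits the machine-learning step based on pocket-residue properties, assumes a uniform amino acid background and a pseudocount of 1, and scores the last four residues of a peptide additively. The peptide set and the names `build_pwm` and `score_peptide` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Build a log-odds PWM (one dict per position) from aligned C-terminal peptides.

    Assumes a uniform background of 1/20 per amino acid (an illustrative choice).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score_peptide(pwm, peptide):
    """Additive score: sum of per-position log-odds over the peptide's last len(pwm) residues."""
    tail = peptide[-len(pwm):]
    if len(tail) != len(pwm):
        raise ValueError("peptide is shorter than the PWM")
    return sum(column[aa] for column, aa in zip(pwm, tail))


# Hypothetical bound peptides sharing an S/T-x-V-COOH pattern.
bound = ["ETDV", "ESDV", "ETEV", "QTSV"]
pwm = build_pwm(bound)
print(round(score_peptide(pwm, "ETDV"), 2), round(score_peptide(pwm, "ARLK"), 2))
```

With this construction a positive score means the C-terminus resembles the bound set more than the background, which matches the text's use of negative scores to mark non-binders; the paper's actual score scale may differ.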
We found that the PWM binding scores closely reflect the experimental affinities of PDZ domain-ligand interactions (Figure 2). Large-scale binding affinities (Kd) of PDZ domain-peptide interactions are available for the SNA1 and ERBIN PDZ domains. The PWM binding scores of these interactions showed a strong positive correlation with the experimental affinities for both SNA1 (R²=0.76) and ERBIN (R²=0.85) PDZ domain-peptide interactions. Moreover, the in vivo binding affinities of PSD-95_1 (the first PDZ domain of PSD-95) for its ligands agreed well with its PWM binding scores (Figure 3A and 3B). The Kds of Kv1.4 and GluR6 for PSD-95_1 have been measured experimentally: Kv1.4 bound PSD-95_1 with high affinity (Kd=1.5 µM), whereas GluR6 bound with low affinity (Kd=160 µM). Consistently, the PWM binding score of Kv1.4 (14.72) was higher than that of GluR6 (6.00; Figure 3B). Next, we measured how accurately the in vivo ligands of PDZ domains can be rediscovered from their binding scores. Even when the interaction data for the target PDZ domain were excluded from the training set, 290 of the 320 known PDZ ligands (91%) ranked in the top 10 percentile of binding scores (Figure 3C). We also found that our PWMs provided reliable predictions for PDZ domains from various species (Table S2). Furthermore, our predicted PWMs agreed well with PWMs derived from experimental data: the PWMs derived from phage display experiments and the predicted PWMs of the MAGI1_2, DLG1_2, and PTN13_2 PDZ domains were nearly identical (Figure S1).
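The rediscovery test can be sketched as a ranking check: score every candidate protein with the held-out domain's PWM, then ask whether each known ligand falls within the top 10% of scores. The sketch below assumes one score per candidate (for example, its C-terminal peptide scored as above) and covers only the evaluation, not the leave-one-domain-out retraining. The function name and toy data are hypothetical.

```python
import math


def top_percentile_hits(scores, known_ligands, percentile=10.0):
    """Count known ligands ranked within the top `percentile` % of all scored candidates.

    scores: dict mapping candidate protein -> binding score for one PDZ domain.
    Ligands absent from `scores` count as misses.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, math.ceil(len(ranked) * percentile / 100.0))
    top = set(ranked[:cutoff])
    hits = sum(1 for ligand in known_ligands if ligand in top)
    return hits, len(known_ligands)


# Toy example: 20 hypothetical candidates whose score equals their index.
scores = {f"P{i}": float(i) for i in range(20)}
hits, total = top_percentile_hits(scores, ["P19", "P18", "P5"])
print(hits, total)  # 2 3
```

Pooling such counts over all domains gives the reported figure: 290 hits out of 320 known ligands is 290/320 ≈ 0.906, or 91%.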
To construct PDZNet with high-confidence interactions, we retained only those PWM-predicted PDZ protein-ligand interactions that were supported by experimental validation. Relating the occurrence of amino acids in a linear motif to the binding specificity of peptide-binding domains is challenging. Because the PWM method treats each amino acid position in a linear motif independently, predicted interactions may include a fraction of false positives. Therefore, we included only interactions supported by experimental evidence. To assemble experimentally validated protein interactions, we integrated 22 different PPI databases containing 101,777 interactions among 11,043 proteins (Figure 1B and 1C).
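The filtering step amounts to intersecting PWM predictions with the union of experimentally reported interactions. A minimal sketch, assuming every database has already been mapped to common protein identifiers (identifier mapping and species filtering, which a real integration of 22 databases would need, are not shown) and treating interactions as undirected. All identifiers and function names are hypothetical.

```python
def merge_ppi_databases(databases):
    """Union of undirected interactions from several PPI sources (each an iterable of pairs)."""
    merged = set()
    for database in databases:
        for a, b in database:
            merged.add(frozenset((a, b)))
    return merged


def experimentally_supported(predicted, merged):
    """Keep predicted (pdz_protein, ligand) pairs found, in either orientation, in the merged PPIs."""
    return [(pdz, ligand) for pdz, ligand in predicted if frozenset((pdz, ligand)) in merged]


# Two toy databases and three PWM-predicted pairs (hypothetical identifiers).
db_a = [("PDZ_1", "LIG_1")]
db_b = [("LIG_2", "PDZ_1"), ("PDZ_2", "LIG_3")]
predicted = [("PDZ_1", "LIG_1"), ("PDZ_1", "LIG_2"), ("PDZ_2", "LIG_9")]
merged = merge_ppi_databases([db_a, db_b])
print(experimentally_supported(predicted, merged))
```

Using `frozenset` keys makes the lookup orientation-independent, since PPI databases do not list partners in a consistent order.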
PDZNet is composed of 97 PDZ proteins and 596 partners with 1,212 interactions (Figure 4A), which are listed in Table S3. PDZ proteins interact with variable numbers of ligands (Figure S2), and most (98%) PDZ proteins are connected in a single large network via shared ligands. From PDZNet, we generated two network projections (Figure 4B), which display PDZ protein-PDZ protein and ligand-ligand connections via common interacting partners. In the “PDZ protein network” (PPN; Figure 4B, left panel), nodes represent PDZ proteins, and two PDZ proteins are connected if they share at least one ligand. In the “PDZ ligand network” (PLN; Figure 4B, right panel), nodes represent PDZ ligands, and two PDZ ligands are connected if they share at least one PDZ protein. On average, a PDZ protein interacts with 17 partners, and a PDZ ligand interacts with three PDZ proteins. We further examined whether this multispecificity is also found at the domain level. For proteins with multiple PDZ domains, PDZNet specifies the interactions mediated by individual domains, yielding 2,643 PDZ domain-ligand interactions (Figure S3). On average, a PDZ domain interacts with 14 ligands, and a ligand interacts with four PDZ domains, suggesting that the complexity of PDZNet originates from the multispecificity of PDZ domains.
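The two projections are the standard one-mode projections of a bipartite graph. The pure-Python sketch below builds the PPN and PLN from (PDZ protein, ligand) pairs, computes the mean number of partners per node, and measures the fraction of PDZ proteins in the largest PPN component (the quantity behind the 98% figure). Function names are hypothetical; the three edges are interactions named in the text.

```python
from collections import defaultdict
from itertools import combinations


def _link_groups(nodes, groups):
    """Connect every pair of nodes that co-occur in at least one group."""
    adjacency = {node: set() for node in nodes}
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def bipartite_projections(edges):
    """Return (ligands_of, proteins_of, PPN, PLN) from (pdz_protein, ligand) edges."""
    ligands_of, proteins_of = defaultdict(set), defaultdict(set)
    for pdz, ligand in edges:
        ligands_of[pdz].add(ligand)
        proteins_of[ligand].add(pdz)
    # PPN: PDZ proteins are linked if they share a ligand.
    ppn = _link_groups(ligands_of, proteins_of.values())
    # PLN: ligands are linked if they share a PDZ protein.
    pln = _link_groups(proteins_of, ligands_of.values())
    return dict(ligands_of), dict(proteins_of), ppn, pln


def mean_degree(partners_of):
    """Average number of partners per node."""
    return sum(len(p) for p in partners_of.values()) / len(partners_of)


def largest_component_fraction(adjacency):
    """Fraction of nodes in the largest connected component (iterative depth-first search)."""
    seen, largest = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        largest = max(largest, size)
    return largest / len(adjacency)


# Three interactions named in the text: LAP2-ARVC, LAP2-CTND2, MAGI2-CTND2.
edges = [("LAP2", "ARVC"), ("LAP2", "CTND2"), ("MAGI2", "CTND2")]
ligands_of, proteins_of, ppn, pln = bipartite_projections(edges)
print(ppn, pln)
print(mean_degree(ligands_of), largest_component_fraction(ppn))
```

PDZ proteins that share no ligand still appear as isolated PPN nodes, so they count against the connected fraction, as they should.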
We discovered interface similarity among PDZ domains that share the same ligands. In the PPN, PDZ protein pairs connected by the same ligands tend to have similar pocket residues (Figure S4). For example, SAP97_1, SAP97_2, PSD-93_1, PSD-93_2, SAP102_2, and PSD-95_1 have similar binding pocket residues and bind the same ligand (AT2B4). This suggests that gene duplications contribute to the multispecific interactions in PDZNet and increase network complexity, because interaction partners arising from gene duplication events tend to share the same interface. Indeed, SAP97, PSD-93, SAP102, and PSD-95 are paralogs, the products of gene duplication events. Interestingly, we also found non-paralogous proteins with similar binding specificities, which suggests that convergent evolution might also contribute to network complexity. For example, two PDZ proteins, LAP2 and MAGI2, interact with the same ligand, CTND2, although they are evolutionarily unrelated. Meanwhile, the PLN provides a complementary, ligand-centered view of PDZNet. We found that connected ligands in the PLN tend to have similar C-terminal sequences. As shown in Figure 4B, two PDZ ligands, ARVC and CTND2, interact with the LAP2 PDZ protein and have the same binding motif (DSWV).
We then asked how PDZ domains and ligands obtained multiple partners during evolution. Gene duplication and subsequent diversification are considered major drivers of network growth. Although gene duplication played a significant role in the evolution of PDZ proteins and ligands, it may not explain how a PDZ domain can interact with multiple non-homologous ligands.
We found that sequence mutations played an important role in the attachment of non-homologous ligands to PDZ domains. On an evolutionary time scale, the compendium of PDZ ligands expands via two processes: (1) the introduction of new PDZ ligands by gene duplication of existing partners, or (2) the de novo evolution of new interactions via the acquisition of PDZ-binding motifs (Figure 5A). To examine the extent to which gene duplication contributed to PDZNet growth, we calculated the paralog fractions in PDZNet, because gene duplication products usually remain recognizable as homologous genes. We discovered that only a relatively small fraction of PDZ ligands that share a common partner were paralogs (33.6%), whereas a significantly larger portion of PDZ proteins that share a common partner were paralogs (54.5%; Wilcoxon rank-sum test, p=1.24×10⁻⁴; Figure 5B).
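The text does not spell out how paralog fractions were aggregated, so the sketch below uses one plausible reading as an assumption: for each node, the fraction of its partner pairs that are paralogs. Applied to PDZ proteins (partners are ligands) it measures ligand paralogy; applied to ligands (partners are PDZ proteins) it measures PDZ protein paralogy. Paralog pairs are assumed to come from an external paralogy resource; the function name and ligand-side identifiers are hypothetical.

```python
from itertools import combinations


def paralog_fractions(partners_of, paralog_pairs):
    """For each node with >= 2 partners, the fraction of its partner pairs that are paralogs.

    partners_of: dict node -> set of partners (e.g. PDZ protein -> ligands).
    paralog_pairs: iterable of (gene_a, gene_b) paralog pairs.
    """
    paralogs = {frozenset(pair) for pair in paralog_pairs}
    fractions = {}
    for node, partners in partners_of.items():
        pairs = list(combinations(sorted(partners), 2))
        if pairs:  # nodes with a single partner contribute no pairs
            fractions[node] = sum(frozenset(p) in paralogs for p in pairs) / len(pairs)
    return fractions


# Ligand side: ligands sharing a PDZ protein (hypothetical identifiers).
ligand_side = paralog_fractions(
    {"PDZ_1": {"LIG_A", "LIG_B", "LIG_C"}, "PDZ_2": {"LIG_D"}}, [("LIG_A", "LIG_B")]
)
# Protein side: SAP97, PSD-93, SAP102 and PSD-95 share the ligand AT2B4 and are paralogs.
family = ["SAP97", "PSD-93", "SAP102", "PSD-95"]
protein_side = paralog_fractions({"AT2B4": set(family)}, combinations(family, 2))
print(ligand_side, protein_side)
```

The two resulting distributions of per-node fractions could then be compared with a Wilcoxon rank-sum test, for example `scipy.stats.ranksums(protein_values, ligand_values)`.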
Next, we examined the sequence evolution of the binding motifs of human PDZ ligands and discovered that a large portion of PDZ ligands acquired their binding motifs via sequence mutations. We examined the C-terminal sequences of PDZ ligands in each PDZ domain-ligand interaction pair across 16 representative species. We found that nearly one-third of human PDZ ligands gained their PDZ domain interactions through C-terminal mutations during evolution (Table S4; experimental evidence for the human PDZ domain-ligand interactions is provided in Table S5). For example, NOS1AP obtained a PDZ-binding motif via sequence mutation and became an interaction partner of the NOS1 PDZ protein in vertebrates. NOS1AP has orthologs in a wide range of species, from yeast to human (Figure 5C). To examine whether the PDZ-binding motif of NOS1AP emerged in vertebrates, we compared the C-terminal sequences and binding scores of invertebrate and vertebrate NOS1AP orthologs. The binding of mouse NOS1AP to the NOS1 PDZ protein has been confirmed experimentally. The C-terminal sequences of the vertebrate orthologs of NOS1AP are identical, whereas those of the invertebrate orthologs vary across species and differ from the vertebrate sequence. Moreover, we searched for evidence of a NOS1AP–NOS1 interaction in invertebrate PPI databases, including the Database of Interacting Proteins (DIP), BioGRID, and Comprehensive Drosophila Interactions (Droidb), but found none. When we compared the binding scores of NOS1AP to NOS1, all invertebrate orthologs showed low binding scores (average binding score=−3.03), whereas the vertebrate orthologs scored high (average binding score=5.27; Figure 5D), indicating that NOS1AP was a non-binder in invertebrates and gained the ability to bind the NOS1 PDZ protein in vertebrates.
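The NOS1AP comparison can be expressed as two small operations: averaging ortholog scores per clade, and checking that the split between binders and non-binders coincides with the clade boundary. The species names and scores below are hypothetical placeholders, not the values behind Figure 5D, and the zero threshold is an assumption borrowed from the text's use of negative scores to mark non-binders.

```python
from collections import defaultdict


def clade_mean_scores(scores, clade_of):
    """Average ortholog binding scores per clade."""
    totals, counts = defaultdict(float), defaultdict(int)
    for species, score in scores.items():
        clade = clade_of[species]
        totals[clade] += score
        counts[clade] += 1
    return {clade: totals[clade] / counts[clade] for clade in totals}


def gained_in_clade(scores, clade_of, clade, threshold=0.0):
    """True if all orthologs in `clade` score above `threshold` and all others score at or below it."""
    inside = [s for species, s in scores.items() if clade_of[species] == clade]
    outside = [s for species, s in scores.items() if clade_of[species] != clade]
    return bool(inside) and all(s > threshold for s in inside) and all(s <= threshold for s in outside)


# Hypothetical scores of one ligand's orthologs against one PDZ domain.
scores = {"fly": -3.0, "worm": -2.0, "mouse": 5.0, "human": 6.0}
clade_of = {"fly": "invertebrate", "worm": "invertebrate", "mouse": "vertebrate", "human": "vertebrate"}
print(clade_mean_scores(scores, clade_of))
print(gained_in_clade(scores, clade_of, "vertebrate"))
```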
Interaction rewiring is an effective evolutionary mechanism given that it reconfigures molecular systems without a gain or loss of genes. We hypothesized that the rewiring of PDZ domain-ligand interactions via sequence mutation contributed to the evolution of the vertebrate nervous system, in which PDZ proteins and ligands play an important role. To determine whether the rewiring of PDZ domain-ligand interactions had a significant impact on vertebrate nervous systems, we calculated the rewiring rates between species from Escherichia coli to humans and examined the changes in the C-terminal sequences of PDZ ligands and the binding specificities of PDZ domains (Figure S5). We found that PDZ domain-ligand interactions were most frequently rewired between invertebrates and vertebrates (Figure 6).
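The rewiring rate itself is defined in Figure S5 rather than in this text, so the sketch below adopts one plausible definition as an explicit assumption: between two clades, the fraction of shared PDZ domain-ligand pairs whose predicted binder status (score above zero) differs. Computing it for consecutive clades along an ordered lineage shows where rewiring concentrates. Names and scores are hypothetical.

```python
def binder_status(scores, threshold=0.0):
    """Map each (pdz_domain, ligand) pair to True if its ortholog score exceeds the threshold."""
    return {pair: score > threshold for pair, score in scores.items()}


def rewiring_rate(scores_a, scores_b, threshold=0.0):
    """Fraction of pairs scored in both clades whose binder status differs between them."""
    status_a = binder_status(scores_a, threshold)
    status_b = binder_status(scores_b, threshold)
    shared = status_a.keys() & status_b.keys()
    if not shared:
        return 0.0
    return sum(status_a[pair] != status_b[pair] for pair in shared) / len(shared)


def rates_along_lineage(lineage, scores_by_clade, threshold=0.0):
    """Rewiring rate between each consecutive pair of clades in an ordered lineage."""
    return {
        (older, newer): rewiring_rate(scores_by_clade[older], scores_by_clade[newer], threshold)
        for older, newer in zip(lineage, lineage[1:])
    }


# Hypothetical ortholog scores keyed by (PDZ domain, ligand).
scores_by_clade = {
    "invertebrates": {("PDZ_1", "LIG_1"): -2.0, ("PDZ_1", "LIG_2"): 3.0,
                      ("PDZ_2", "LIG_3"): -1.0, ("PDZ_2", "LIG_4"): -0.5},
    "vertebrates": {("PDZ_1", "LIG_1"): 4.0, ("PDZ_1", "LIG_2"): 3.5,
                    ("PDZ_2", "LIG_3"): 2.0, ("PDZ_2", "LIG_4"): -0.3},
}
print(rates_along_lineage(["invertebrates", "vertebrates"], scores_by_clade))
```

Here two of the four shared pairs switch from non-binder to binder, giving a rate of 0.5 for the invertebrate-to-vertebrate transition.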
We also examined which biological processes were significantly affected by the rewiring of PDZ domain-ligand interactions. We found that the PDZ ligands that arose in invertebrates and gained their PDZ-binding motifs in vertebrates were significantly enriched for the process of neurological system development (Table S6). For example, with the emergence of vertebrates, the breakpoint cluster region protein, BCR, acquired a PDZ-binding motif and began to interact with the AFADIN PDZ protein by changing its C-terminal sequence from ARLK (binding score=−2.03) to STEV (binding score=5.87). Binding between the PDZ-binding site of BCR and the AFADIN PDZ domain has also been confirmed by immunoprecipitation and NMR chemical shift perturbation experiments. In vertebrates, BCR controls the interaction between AFADIN and RAS GTPase. AFADIN also interacts with the vertebrate-specific Eph-receptor family members EPHA7 and EPHB3, which regulate the morphology and motility of neuronal cells through RAS GTPase. Thus, the interaction between BCR and AFADIN may have evolved to control Eph receptor signaling, which is greatly diversified in vertebrates. Meanwhile, proteins that arose and acquired PDZ domain interaction sites in invertebrates tend to be involved in various cellular processes, such as vesicle-mediated transport, the cell cycle, and RNA splicing (Table S6). These results suggest that the emergence of the vertebrate nervous system integrated preexisting functional units during the evolution of synapse complexity.
We found that metazoan-specific PDZ proteins adopted their ligands from proteins of premetazoan origin. The phylogenetic profile shows the origin of the PDZNet proteins (Figure S6). Many human PDZ ligands were detected in premetazoans, whereas most human PDZ proteins were found only in metazoan species. Nearly one-half of the premetazoan orthologs of human PDZ ligands had binding scores below zero, indicating that these proteins were not interaction partners of PDZ proteins in premetazoan species. However, these proteins acquired PDZ-binding motifs in metazoans and began to interact with metazoan PDZ protein partners. For example, EXOC4 is found in yeast and gained its PDZ-binding motif in vertebrates (Figure 7; the multiple sequence alignment of EXOC4 orthologs is shown in Figure S7). The binding of mouse EXOC4 to the PDZ domain of SAP102 was validated in a yeast two-hybrid system and by pull-down assays. The yeast ortholog of EXOC4 is a component of the exocyst complex, which transports vesicles to the plasma membrane. After gaining a PDZ-binding motif recognized by the SAP102 PDZ domain in vertebrates, EXOC4 participates in NMDA receptor trafficking. This finding suggests that the evolution of metazoan functions required the rewiring of functional modules that already existed in premetazoans, which contributed to network growth. Indeed, previous studies have noted that proteins of premetazoan origin play important roles in metazoan-specific functions, such as synaptic signaling. Together, the premetazoan ancestry of PDZ ligands highlights the role of de novo PDZ domain-ligand interactions in the rewiring that accompanied metazoan evolution.
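The distinction between when a ligand arose and when it acquired its PDZ-binding motif can be read off a phylogenetic profile. A minimal sketch, assuming a linear ordering of clades from earliest to latest diverging and taking the first clade whose ortholog binds (score above zero) as the gain point; this ignores motif losses and regains. Clade labels, category labels, and function names are hypothetical.

```python
def first_clade(lineage, clades):
    """Earliest clade in the ordered lineage that belongs to `clades`, or None."""
    for clade in lineage:
        if clade in clades:
            return clade
    return None


def classify_ligand(lineage, present_in, binds_in):
    """Return (origin clade, motif-gain clade, label) for one ligand's phylogenetic profile."""
    origin = first_clade(lineage, present_in)
    gain = first_clade(lineage, binds_in)
    if gain is None:
        label = "no PDZ-binding motif"
    elif gain == origin:
        label = "arose with motif"
    else:
        label = "preexisting protein rewired"
    return origin, gain, label


# Ordered from earliest- to latest-diverging (a simplifying assumption).
lineage = ["fungi", "invertebrates", "vertebrates"]
# A profile like the one described for EXOC4: present since yeast, motif only in vertebrates.
print(classify_ligand(lineage, {"fungi", "invertebrates", "vertebrates"}, {"vertebrates"}))
```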
Next, we asked which physiological system was most affected by the mutations of PDZNet proteins. Mutations could affect the binding specificity of PDZ-ligand interactions via the replacement of interfacial residues or the destabilization of PDZ domain and ligand structure. If an interaction gained from the evolution of PDZNet had contributed to the development of a certain physiological system, an alteration of the interaction could be associated with genetic diseases caused by a malfunction of the system.
We investigated the disease associations of the PDZNet components and found that many PDZNet proteins are significantly associated with neurological diseases (Figure 8). Human genetic diseases were mapped to the components of PDZNet using disease-gene association data from the Online Mendelian Inheritance in Man (OMIM). Genetic diseases were classified into 20 disease classes based on the physiological system affected. We examined whether a certain disease class was more enriched in the PDZNet components than in the other proteins of the human interactome. Of the 20 disease classes examined, the neurological disease class was the most highly associated with mutations of the PDZNet components (Table S7). For example, a mutation in the PDZ protein NLGNX perturbed its PDZ domain interaction with the ligand protein SNTG2, which has been suggested as a cause of mental retardation and autism. This finding reconfirms the importance of PDZ domain-ligand interactions in the evolution of the nervous system. A morbid map of PDZNet components with the classification of genetic diseases is provided in Table S8 as a resource.
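The text does not name the enrichment test. A common choice, used here as an assumption, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given the interactome as the background, how surprising is the number of PDZNet genes that fall in a disease class? The sketch uses only the standard library (`math.comb`, Python 3.8+); gene names and function names are hypothetical. With 20 classes tested, a multiple-testing correction would normally be applied to the resulting p-values.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def disease_class_enrichment(class_genes, pdznet_genes, interactome_genes):
    """One-sided enrichment of a disease class among PDZNet genes, relative to the interactome."""
    background = set(interactome_genes)
    sample = set(pdznet_genes) & background
    disease = set(class_genes) & background
    overlap = len(sample & disease)
    p_value = hypergeom_sf(overlap, len(background), len(disease), len(sample))
    return overlap, p_value


# Toy universe of 10 hypothetical genes.
interactome = [f"G{i}" for i in range(10)]
pdznet = ["G0", "G1", "G2"]
neuro = ["G0", "G1", "G2"]
print(disease_class_enrichment(neuro, pdznet, interactome))
```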
In this study, we describe the first PDZ protein-ligand interaction network coupled with quantitative binding strength. Our network approaches elucidated how PDZ domains have diversified their binding partners in the organization of various signaling complexes from receptors to downstream signaling relays. Moreover, we showed that de novo evolution of PDZ domain-ligand interactions played an important role in the growth of PDZNet. These findings provide empirical evidence for a network evolution model that highlights the rewiring of interactions as a mechanism of functional innovation.
PDZNet provides information beyond whether two proteins interact. First, PDZNet provides information on the binding interface. High-throughput experiments have provided large-scale PPI information, but identifying which amino acids mediate each interaction has been difficult. The quantitative model of PDZ domain-ligand interactions provides sequence information on domains and linear motifs, enabling a deeper understanding of the mechanisms of their interactions. Second, PDZNet provides the binding strengths of the interactions. The quantitative binding strengths of PDZ domain-ligand interactions help us understand how competition among interaction partners can switch signaling flows.
The multispecificity of PDZ domain-ligand interactions has unique advantages for the evolution of PDZ domain function in the cell signaling network. First, the multispecificity of PDZ domains contributes to the frequent rewiring of PDZ domain-ligand interactions and broadens the range of recognizable sequences, thus increasing the chance that a protein gains a suitable sequence to interact with its partners. Indeed, we found that PDZ domain pockets accept multiple amino acids. We analyzed amino acid preference patterns from the PWMs of human PDZ domains (Figure S8) and found that the degeneracy of binding motifs facilitates the binding of different PDZ ligands to the same PDZ domain. This finding is consistent with a recent study showing that the specificities of PDZ domains lie on a continuum. Second, the multispecificity of PDZ domains enables the combinatorial assembly of signaling complexes that control signaling processes. PDZ proteins interact with many signaling proteins and form preassembled complexes, which are important for the precision of information flow and the fidelity of cell signaling events. An interesting observation from our network approach is that a single PDZ protein is connected to many ligands. These ligands may interact with a PDZ protein in a tissue-specific manner; cell type-dependent expression of the PDZ ligands may then lead to alternative assemblies of signaling complexes, enabling cell type-specific responses to extracellular signals. Indeed, we observed that the ligands of the SAP97 PDZ protein showed tissue-specific expression patterns, allowing the formation of tissue-specific signaling complexes (Figure S9). Third, the multispecific interactions of PDZ domains may enhance the robustness of the signaling processes they mediate.
Signaling networks are known to be robust because several means often exist to achieve one function, so that the failure of one can be compensated by others. In PDZNet, PDZ domains tend to interact with a series of homologous proteins, particularly cell surface receptors. This may ensure reliable transmission of signals mediated by PDZ proteins to the plasma membrane.
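The degeneracy of binding motifs discussed above can be quantified per position with the Shannon entropy of the residue distribution observed in bound peptides: 0 bits means a single accepted residue, and higher values mean the pocket tolerates several. This is a standard information-theoretic measure offered as an illustration, not necessarily how Figure S8 was produced; the peptides and the function name are hypothetical.

```python
import math
from collections import Counter


def position_entropies(peptides):
    """Shannon entropy (bits) of the residue distribution at each aligned position."""
    n = len(peptides)
    entropies = []
    for column in zip(*peptides):
        counts = Counter(column)
        entropies.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


# Hypothetical ligands of one PDZ domain, aligned at positions P-3..P0.
ligands = ["ETDV", "ESDV", "QTEV", "KSAV"]
print(position_entropies(ligands))  # [1.5, 1.0, 1.5, 0.0]
```

In this toy set the C-terminal Val position is fully conserved, while the other positions accept two or three residues each.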
We found that almost one-third of human PDZ ligands obtained their PDZ-binding motifs via C-terminal sequence mutations, a route that offers several evolutionary advantages for PDZ domain-mediated interactions. First, the formation of linear motifs is an efficient mechanism for increasing the number of interactions: the emergence of a short linear motif rarely disrupts protein structure and can require only a few amino acid changes. Second, the de novo evolution of interactions via sequence mutation provides an effective means of functional innovation. Gene duplication is known to play a limited role in the molecular innovation of biochemical function but facilitates the modularization of functional networks through specialization. In contrast, the de novo evolution of interactions allows connections between evolutionarily unrelated functional modules, enabling the reconfiguration of molecular systems. For instance, the gain of the PDZ domain-ligand interaction between the EXOC4 PDZ ligand and the SAP102 PDZ protein is an innovation that bridges two different functional modules. We examined species-specific functional annotations of PDZ ligands and found that yeast EXOC4 participates in vesicle transport with other exocyst complex members, whereas vertebrate EXOC4 regulates NMDAR transport to the postsynaptic membrane by interacting with the SAP102 PDZ domain. Third, when a PDZ protein gains ligands by sequence mutation, it avoids a loss of fitness caused by increased gene dosage. The de novo evolution of PDZ domain-ligand interactions does not increase the copy number of PDZ ligand genes, avoiding an unfavorable increase in protein concentration. In contrast, gaining interactions by duplication may cause a loss of fitness, because proteins that contain linear motifs tend to be intrinsically disordered and dosage sensitive.
We were also interested in whether new PDZ domain interaction sites were acquired via C-terminal point mutations or DNA insertions. On examining the DNA changes in newly acquired PDZ ligands, we found instances of both. For example, the PBK protein of Macaca mulatta acquired the PDZ domain interaction motif “ETDV” via C-terminal point mutations: a single nucleotide substitution (T→C) changed Ile to Thr, and another substitution (C→T) changed a Gln codon to a stop codon (Figure S10A). On the other hand, EXOC4 acquired a new PDZ domain interaction site via DNA insertion in Oryzias latipes (Figure S10B). A large DNA insertion near the region encoding the C-terminus of EXOC4 caused a frameshift, which in turn produced the PDZ domain-binding motif “ITTV.”
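How two point substitutions can create a new C-terminal motif can be checked by translation with the standard genetic code. The DNA fragment below is made up and is not the PBK sequence from Figure S10A: it is chosen so that the two kinds of substitution described for macaque PBK (T→C turning an Ile codon into Thr, C→T turning a Gln codon into a stop) yield an ETDV C-terminus. Function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop.
GENETIC_CODE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: GENETIC_CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def point_substitutions(before, after):
    """List (index, old_base, new_base) for two equal-length sequences."""
    if len(before) != len(after):
        raise ValueError("sequences must have equal length for point substitutions")
    return [(i, x, y) for i, (x, y) in enumerate(zip(before, after)) if x != y]


# Hypothetical 3' coding fragment: Glu-Ile-Asp-Val-Gln ...
wild_type = "GAAATCGATGTGCAG"
# ATC -> ACC (T->C, Ile->Thr) and CAG -> TAG (C->T, Gln->stop).
mutant = "GAAACCGATGTGTAG"
print(translate(wild_type), translate(mutant))  # EIDVQ ETDV
print(point_substitutions(wild_type, mutant))   # [(4, 'T', 'C'), (12, 'C', 'T')]
```

A frameshifting insertion, as described for EXOC4, would instead change every downstream codon, so it cannot be described by `point_substitutions` and would need a sequence alignment.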
We found that the rewiring of PDZ domain-ligand interactions occurred most frequently between invertebrates and vertebrates. This massive rewiring may be connected to the repeated rounds of whole-genome duplication in ancestral vertebrates. According to Ohno's model, when a gene is duplicated, mutations accumulate freely in the redundant copy because it is released from selection. The duplicate copy therefore has a greater chance of developing new functions without altering existing ones. This evolutionary mechanism may have facilitated network rewiring in early vertebrates.
We found that the components of PDZNet are largely associated with neurological diseases. We then asked whether we could identify mutations that affect PDZ-ligand binding and cause genetic diseases. Disruption of the PDZ domain interaction between PICK1 and GluR7 is known to cause seizures, a chronic neurological disease. Mutations in the C-terminal sequence of GluR7 disrupted its PDZ domain interaction with PICK1. To examine whether our quantitative model can predict the effects of mutations in GluR7, we generated the PWM of the PICK1 PDZ domain and calculated binding scores for both the wild-type and mutant forms of GluR7 (Figure S11). The wild type had a high binding score (5.98), whereas the mutant had a much lower binding score (−0.02). This example illustrates how our method can be applied to characterize genetic diseases caused by mutations that affect PDZ domain-ligand interactions.
An important limitation of current biological network studies is the incompleteness of the networks. PDZNet has room for improvement in network coverage in two respects: missing nodes and missing links. To test whether the conclusions of this work are robust to both, we constructed smaller random networks from PDZNet and repeated the analyses. In each trial, 20% of the proteins or interactions were randomly removed from PDZNet. In all tests, the overall organization of the reduced PDZNet remained largely unaltered, and the conclusions, including the difference between the paralog fractions of PDZ proteins and ligands, were retained (Figures S12, S13, S14, S15, S16, S17), supporting the robustness of our findings to future expansion of PDZ domain-ligand interaction data.
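The subsampling test can be sketched as repeatedly deleting 20% of the interactions, or 20% of the proteins together with their interactions, and recomputing a statistic of interest (for example, the paralog-fraction comparison). The function names, the fixed seed, and the number of trials are illustrative assumptions; the text does not state how many trials were run.

```python
import random


def drop_edges(edges, fraction, rng):
    """Randomly remove `fraction` of the interactions."""
    keep = len(edges) - round(len(edges) * fraction)
    return rng.sample(edges, keep)


def drop_nodes(edges, fraction, rng):
    """Randomly remove `fraction` of the proteins together with all their interactions."""
    nodes = sorted({node for edge in edges for node in edge})
    removed = set(rng.sample(nodes, round(len(nodes) * fraction)))
    return [(a, b) for a, b in edges if a not in removed and b not in removed]


def robustness_trials(edges, statistic, mode="edges", fraction=0.2, trials=100, seed=0):
    """Recompute `statistic` on `trials` randomly reduced copies of the network."""
    rng = random.Random(seed)
    shrink = drop_edges if mode == "edges" else drop_nodes
    edges = list(edges)
    return [statistic(shrink(edges, fraction, rng)) for _ in range(trials)]


# Toy network of 10 hypothetical interactions; the statistic here is simply the edge count.
toy_edges = [(f"PDZ_{i}", f"LIG_{i}") for i in range(10)]
print(robustness_trials(toy_edges, len, mode="edges", trials=3))  # [8, 8, 8]
print(robustness_trials(toy_edges, len, mode="nodes", trials=3))
```

Passing a seeded `random.Random` instance keeps each run reproducible while still drawing a different subsample in every trial.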
Because interactome networks are incomplete, expanding network coverage is of significant value. PDZ domain-ligand interactions are relatively difficult to detect using current experimental techniques, because transient interactions are often lost during washing steps. Furthermore, a PDZ domain-ligand interaction often depends on phosphorylation, so it can be missed in protein interaction screens performed under a single condition. Therefore, many PDZ domain-ligand interactions remain to be discovered. We anticipate that putative PDZ domain-ligand interactions with high PWM binding scores, correlated expression, and similar phylogenetic profiles may be used to uncover novel interactions. To this end, we provide a candidate list of PDZ domain-ligand interactions to assist in the discovery of novel interactions (sbi.postech.ac.kr/pdz).
We assembled experimentally confirmed PDZ domain-ligand interactions from various data sources. Specifically, we obtained PDZ domain-peptide binding data from a high-throughput protein array assay of 81 mouse PDZ domains against 217 peptides derived from genome-encoded receptors. We collected in vivo PDZ domain-ligand interactions from the published literature, including peptide binding data for the Drosophila INAD and human AF-6 PDZ domains. Additionally, we used PDZBase, a PDZ domain-ligand interaction database that currently lists 339 in vivo interactions between 145 PDZ domains and 217 ligands. Finally, we obtained binding data for 54 human and 28 worm PDZ domains from a high-throughput binding assay, and for four N-terminal PDZ domains of human INADL from phage display. Together, these data yielded 4,467 experimentally confirmed PDZ domain-ligand interactions.
We collected 563 human PDZ domain sequences from the Pfam repository and, after eliminating redundant sequences, obtained 268 sequences. We then examined the pocket residues of these domains using hidden Markov model (HMM) alignment, removed the sequences that did not align in the pocket region, and finally obtained 241 distinct human PDZ domains.
We developed a two-step approach to quantify the binding strength between PDZ domains and ligands. In this approach, the binding affinity between each PDZ pocket and its corresponding ligand position is predicted individually, on the assumption that the contributions of the ligand positions to the binding affinity are additive, a widely accepted view in the modeling of linear motif interactions. The workflow is summarized in Figure 9.
In the first step, we designed a selectivity space for each pocket (Figure 9, left panel) with 20 axes, each representing the preference for one of the 20 amino acids at the corresponding position of the peptide ligand (Figure S18). To obtain the ligand selectivity of the PDZ domains, we used three types of interaction data: protein arrays of mouse PDZ domains probed with synthesized peptides, collections of individual studies of PDZ domain-ligand interactions, and high-throughput binding assays using phage display. We constructed a multiple sequence alignment (MSA) of the PDZ domains using an HMM of the PDZ domain, extracted the pocket residues from the MSA, and encoded them into feature vectors based on their physicochemical properties. From these feature vectors we constructed 20 training sets, one per amino acid: in each, the feature vectors of pockets preferring that amino acid formed the positive set, and the remaining vectors formed the negative set. We then applied Fisher's Linear Discriminant (FLD) analysis to each training set to train a discriminative axis that distinguishes pockets preferring the given amino acid; the direction vectors of the 20 axes form a projection matrix. Multiplying a pocket's feature vector by the projection matrix locates that pocket in the selectivity space. In this way, each selectivity space captures intrinsic amino acid preferences from binary interaction data.
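A minimal NumPy sketch of training one selectivity axis per amino acid, assuming pocket feature vectors are stored as rows of an array and each pocket's preferred amino acids as a set. This is textbook two-class FLD (direction proportional to S_w^-1 (m_pos - m_neg)); the ridge term and the function names are our assumptions, not details given in the text.

```python
import numpy as np

def fisher_axis(positives, negatives):
    """Two-class Fisher's linear discriminant direction for pocket feature vectors."""
    m_pos = positives.mean(axis=0)
    m_neg = negatives.mean(axis=0)
    # Within-class scatter = sum of each class's (biased) covariance times its size
    s_w = (np.cov(positives, rowvar=False, bias=True) * len(positives)
           + np.cov(negatives, rowvar=False, bias=True) * len(negatives))
    # Assumed small ridge keeps S_w invertible when samples are scarce
    s_w += 1e-6 * np.eye(s_w.shape[0])
    w = np.linalg.solve(s_w, m_pos - m_neg)
    return w / np.linalg.norm(w)

def build_projection_matrix(features, preferred, amino_acids):
    """Stack one Fisher axis per amino acid into an (n_features x n_amino_acids) matrix."""
    axes = []
    for aa in amino_acids:
        mask = np.array([aa in prefs for prefs in preferred])  # pockets preferring aa
        axes.append(fisher_axis(features[mask], features[~mask]))
    return np.column_stack(axes)
```

Locating a pocket in the selectivity space is then `feature_vector @ projection`, one coordinate per amino acid axis.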
In the second step, to build the PWM of a query PDZ domain, we generated an affinity profile representing the relative contributions of the 20 amino acids to binding at each pocket (Figure 9, right panel). Assuming that pockets lying close together in the selectivity space have similar amino acid preferences, we gathered the nearest neighbors of the query pocket in each selectivity space and built the affinity profile from the amino acids they prefer. The pocket residues of the query PDZ domain were encoded into feature vectors using physicochemical properties and located in the selectivity space using the projection matrix described above. We then gathered the ligand residues preferred by the nearest neighbors of the query pocket and estimated the binding affinity contribution of each amino acid at each position.
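The neighbor-gathering step might look like the sketch below. Euclidean distance, the function name, and the stopping rule (stop once `n_residues` residues are collected, here defaulting to the 40 residues mentioned for PWM construction) are our assumptions about details the text leaves open.

```python
import numpy as np

def gather_neighbor_preferences(query_features, train_features, train_preferred,
                                projection, n_residues=40):
    """Pool preferred residues from the query pocket's nearest neighbours.

    train_preferred[k] lists the ligand residues preferred by training pocket k.
    """
    space = train_features @ projection        # training pockets in selectivity space
    query = query_features @ projection        # query pocket in the same space
    order = np.argsort(np.linalg.norm(space - query, axis=1))  # nearest first
    collected = []
    for idx in order:
        for aa in train_preferred[idx]:
            collected.append(aa)
            if len(collected) == n_residues:
                return collected
    return collected  # fewer neighbours' residues than requested
```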
We converted the pocket residue sequences of a PDZ domain into vectors by replacing each amino acid with 10 physicochemical properties (amino acid indices): the number of hydrogen bond donors, polarity, volume, bulkiness, hydrophobicity, isoelectric point, positive charge, negative charge, electron-ion interaction potential, and free energy in water. Each property was normalized to a mean of 0 and a standard deviation of 1.
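A sketch of the encoding, assuming each property is standardized across the 20 amino acids (population standard deviation) and that a pocket vector is the concatenation of its residues' property vectors. The real index values are not reproduced here; the toy table in the test is hypothetical.

```python
import numpy as np

def zscore_table(table):
    """Standardise each property column across amino acids to mean 0, SD 1."""
    aas = sorted(table)
    values = np.array([table[aa] for aa in aas], dtype=float)  # (n_aa, n_props)
    values = (values - values.mean(axis=0)) / values.std(axis=0)
    return {aa: row for aa, row in zip(aas, values)}

def encode_pocket(pocket_residues, normalized):
    """Concatenate the normalised property vectors of the pocket residues."""
    return np.concatenate([normalized[aa] for aa in pocket_residues])
```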
Our goal was to predict the specificity of a PDZ domain without knowledge of its structure, so we designed a method to extract pocket residues from PDZ domain sequences. To locate the pocket residues within a PDZ domain sequence, we constructed an MSA and referenced the known structure of the PSD-95_1 domain. We aligned the PDZ domain sequences with HMM-based alignment software using the Pfam HMM optimized for PDZ domains, and aligned the secondary structure profile of the PSD-95_1 domain with the sequence alignment. Pocket residues were then extracted according to the pocket definitions of Wiedemann et al. (Figure S19).
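Once the pocket positions have been mapped onto alignment columns, extraction reduces to indexing, as in this sketch. The column indices are hypothetical (zero-based) and stand in for the PSD-95_1-derived mapping; a gap at any pocket column rejects the sequence, mirroring the removal of sequences that do not align in the pocket region.

```python
def extract_pocket(aligned_seq, pocket_columns, gap_chars="-."):
    """Return the residues at the pocket columns, or None if any column is a gap."""
    residues = [aligned_seq[c] for c in pocket_columns]
    if any(r in gap_chars for r in residues):
        return None  # sequence does not align in the pocket region
    return "".join(residues)
```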
To estimate the PDZ domain-ligand binding affinity, we adopted an information theory-based PWM method that is widely used to estimate protein-DNA binding affinities. In each selectivity space, 40 preferred amino acids of the pockets neighboring the query were gathered. A PWM was calculated from the four resulting sets of amino acids (one per pocket, i.e., per ligand position) by computing the amino acid frequencies at each ligand position and comparing them with the background frequencies expected for the C-terminal sequences of ligands.
$\mathrm{PWM}(a, i)$ is the affinity contribution of amino acid $a$ at the $i$th position, $f_{a,i}$ is the frequency of amino acid $a$ at the $i$th position in the collected set, and $p_{a,i}$ is the background frequency, defined as the probability of observing amino acid $a$ at the $i$th position in any ligand protein. The constant $k$ was empirically set to 1.921 so that all experimentally confirmed PDZ domain-ligand interactions in PDZNet receive positive binding scores.
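The PWM formula itself, including where the constant k enters, does not survive in this text. The sketch below therefore assumes a plain log2 odds ratio of $f_{a,i}$ to $p_{a,i}$, adds a pseudocount of our own choosing so unseen residues get a finite score, and leaves k out.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(position_sets, background, pseudocount=0.5):
    """Assumed log-odds PWM: pwm[i][a] = log2(f[a,i] / p[a,i]).

    position_sets[i] : amino acids gathered for ligand position i
    background[i][a] : background frequency p[a,i]
    pseudocount      : hypothetical smoothing term, not from the source
    """
    pwm = []
    for i, residues in enumerate(position_sets):
        counts = Counter(residues)
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background[i][aa])
        pwm.append(column)
    return pwm
```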
The PWM was used to calculate the binding score of a potential interaction partner with a given sequence by summing the affinity contributions of its amino acids at each position. The binding score of each peptide $S$ was calculated as

$$\mathrm{Score}(S) = \sum_{i} \mathrm{PWM}(S_i, i)$$

where $\mathrm{PWM}(S_i, i)$ is the affinity contribution of the amino acid $S_i$ at the $i$th position in the matrix and $S_i$ is the amino acid at the $i$th position of the peptide.
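The additive score is a single sum over positions. In this sketch the PWM is a list of per-position dictionaries; we assume position 0 scores the C-terminal residue and read the peptide backwards from its C terminus. The toy PWM values in the test are hypothetical.

```python
def binding_score(pwm, c_terminal_peptide):
    """Sum PWM(S_i, i) over ligand positions, position 0 = C-terminal residue (assumed)."""
    tail = c_terminal_peptide[::-1][:len(pwm)]
    return sum(pwm[i][aa] for i, aa in enumerate(tail))
```

Comparing `binding_score` for a wild-type and a mutant C-terminal tail is the kind of calculation used in the GluR7 example.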
Affinity values of the 5,257 peptides against both the SNA1 and ERBIN PDZ domains were obtained from Wiedemann et al., who assessed peptide affinities by combining experimental measurements (surface plasmon resonance and signal intensities in Boehringer light units) with statistical analyses.
To evaluate the performance of our method, we measured its ability to identify the 217 known binding partners of 145 PDZ domains in PDZBase. Using a standard leave-one-out procedure, we generated a PWM for each domain and used it to produce a genome-wide rank list of interaction candidates. Because each PWM is built without the domain's own interaction data, this test indicates that our method can predict the binding partners of PDZ domains for which no interaction data are available. Most experimentally confirmed PDZ domain-ligand interactions were ranked near the top of the lists (90th to 100th percentile of binding scores; Figure 3C).
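The percentile of a known partner in a genome-wide rank list can be computed as below, assuming scores are held in a dictionary from protein to binding score. Ties are counted as ranked at or below the target, which is our choice; the leave-one-out PWM construction itself is not shown.

```python
def percentile_rank(scores, target):
    """Percentile of `target` among all candidate scores (100 = top-scoring)."""
    at_or_below = sum(1 for s in scores.values() if s <= scores[target])
    return 100.0 * at_or_below / len(scores)
```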
We compiled human protein interactions from a total of 22 existing protein interaction resources: the Biomolecular Interaction Network Database (BIND), the Human Protein Reference Database (HPRD), the Molecular Interaction database (MINT), DIP, IntAct, BioGRID, Reactome, the Protein-Protein Interaction Database (PPID), BioVerse, CCS-HI1, the comprehensive resource of mammalian protein complexes (CORUM), IntNetDB, the Mammalian Protein-Protein Interaction Database (MIPS), the Online Predicted Human Interaction Database (OPHID), Ottawa, PC/Ataxia, Sager, Transcriptome, Complexex, Unilever, the protein-protein interaction database for PDZ domains (PDZBase), and a protein interaction dataset from the literature. We removed low-confidence interactions that were not supported by direct experimental evidence. The final network comprises 101,777 interactions between 11,043 human proteins.
We collected all physical interactions of the PDZ proteins from the integrated PPI network. Because many PDZ proteins contain interaction domains other than PDZ domains, this set may include interactions mediated by those other domains. We therefore removed interactions explained by domain-domain interactions rather than by PDZ domain-ligand interactions. First, we confirmed that PDZ domain-mediated interactions are rarely accompanied by interactions of other domains: no domain-domain interactions were present among the experimentally confirmed PDZ protein-ligand interactions in PDZBase, and domain-domain interactions were enriched only among low-scoring PDZ protein interactions (Figure S20). Based on these observations, we removed domain-domain interactions from PDZNet.
We also removed interactions that could be mediated by other peptide-binding domains, such as SH3 and WW domains, rather than by PDZ domains. We searched the ligands for known peptide-binding motifs and removed motif-containing interactions with low PDZ binding scores. The cut-off was set to the lowest binding score among the experimentally confirmed PDZ domain-peptide interactions in PDZBase; the binding score represents the predicted binding strength between a PDZ domain and the C-terminal sequence of its partner. Subcellular localization information was taken from Swiss-Prot and consensus localization annotations.
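The cut-off rule can be sketched as a filter, assuming each candidate interaction carries its PDZ binding score and a flag saying whether the ligand contains a known SH3/WW-type motif. The motif search itself is not reproduced, and the data layout is hypothetical.

```python
def remove_non_pdz_candidates(interactions, confirmed_scores):
    """Drop motif-containing interactions scoring below the weakest confirmed PDZ interaction.

    interactions: dict mapping (pdz_protein, ligand) -> (binding_score, has_other_motif)
    """
    cutoff = min(confirmed_scores)
    return {pair: (score, motif) for pair, (score, motif) in interactions.items()
            if not (motif and score < cutoff)}
```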
Consider two species, i and j, on a common phylogenetic tree with humans, where species i is more distantly related to humans than species j. If a human PDZ domain-ligand interaction is absent in species i but present in species j, we define the interaction as rewired; the rewiring therefore occurred in the interval between the emergence of species i and that of species j. We also consider all proteins in species i as potential rewiring partners of the PDZ proteins in species j. We thus define the rewiring rate as follows:
where $n_j$ is the number of rewired interactions found in species $j$; $t$ is the divergence time from humans; $p_{all,i}$ is the number of proteins in species $i$ orthologous to human proteins; and $p_{pdz,j}$ is the number of proteins in species $j$ orthologous to human PDZ proteins. Divergence times were obtained from TimeTree (http://www.timetree.net).
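The rate formula is missing from this text, so the sketch below is a hypothetical reading of the definitions: rewired interactions per possible protein pair ($p_{all,i} \times p_{pdz,j}$) per unit time, taking the time interval as the difference between the two species' divergence times from humans. Both the interval and the product form are our assumptions.

```python
def rewiring_rate(n_rewired_j, t_i, t_j, p_all_i, p_pdz_j):
    """Hypothetical rewiring rate: n_j / ((t_i - t_j) * p_all,i * p_pdz,j).

    t_i, t_j : divergence times from humans (e.g., Mya); species i is more distant.
    """
    if t_i <= t_j:
        raise ValueError("species i must be more distant from humans than species j")
    return n_rewired_j / ((t_i - t_j) * p_all_i * p_pdz_j)
```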
To analyze the interactions between orthologous PDZ domains, we calculated binding scores between the C-terminal sequences of orthologous PDZ ligands and the PWMs predicted from the orthologous PDZ domain sequences.
We examined whether particular protein functions were enriched in protein categories defined by the time of protein emergence and the time of PDZ-binding motif acquisition. We classified PDZ ligands into two categories: (1) proteins that arose in invertebrates and acquired PDZ domain interaction sites in vertebrates, and (2) proteins that both arose and acquired PDZ domain interaction sites in invertebrates. We then identified overrepresented functional terms in each group using DAVID for gene set enrichment analysis, with all ligand proteins as the background.
Mutations in PDZNet proteins were mapped to genetic diseases using the gene-disease associations in OMIM. The OMIM database lists associations between 2,929 disease types defined by the Morbid Map (MM) and 1,777 genes. We merged disease subtypes with similar names into single diseases, yielding 1,340 distinct diseases, which were further classified into 20 disease classes according to the physiological system affected. The p values for over- or under-representation of disease-associated genes in PDZNet were obtained from the hypergeometric distribution, calculated independently for each disease class.
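Both tails of the hypergeometric test can be computed exactly with the standard library. In this sketch, N is the background gene population, K the disease-associated genes in one class, n the PDZNet genes, and k the overlap; which gene set serves as the background is not stated in the text and is left to the caller.

```python
from math import comb

def hypergeom_overrep_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / comb(N, n)

def hypergeom_underrep_p(N, K, n, k):
    """P(X <= k), the lower tail used for under-representation."""
    lowest = max(0, n - (N - K))
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(lowest, k + 1)) / comb(N, n)
```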
We created a user-friendly web service that provides a PWM and a rank list of interaction candidates for a given PDZ domain sequence (Figure S21). The automated pipeline extracts pocket residues from the query PDZ domain sequence, predicts binding specificity (represented as a PWM), and generates a genome-wide rank list of potential ligands. The web service also handles common errors: for example, if the query is not a valid PDZ domain sequence or its pocket residues are misaligned, it returns an explanatory message.
Comparisons of the quantitative model- and phage display data-derived PWMs of MAGI1_2, DLG1_2, and PTN13_2.
Distribution of the number of interaction partners of the 97 human PDZ proteins. The maximum number of ligands per PDZ protein is 102, and a human PDZ protein has 12 interaction partners on average.
Network representation of domain-level interactions in PDZNet (best viewed by magnification in a PDF viewer). Domain numbers are presented on the right side of the PDZ protein names with a delimiter (‘_’). The network is composed of 2,643 interactions between 190 PDZ domains and 593 ligands.
Dendrogram of PDZ domains based on the identity of pocket residues. Domain numbers are presented on the right side of the PDZ protein names with a delimiter (‘_’).
Relationship between specificity determining residue (SDR) identity and PWM similarity. Each point represents an orthologous PDZ domain pair.
Phylogenetic profile of human PDZ proteins and ligands across 13 fully sequenced species. The presence (yellow) and absence (black) of orthologs for the 104 PDZ proteins and 554 PDZ ligands are presented.
Multiple sequence alignment (MSA) of EXOC4 orthologs. The MSA was generated using MUSCLE with default options. C-terminal PDZ binding motifs are shown in bold.
Amino acid preference patterns of human PDZ domain pockets. (A–D) Clustering of amino acid preference profiles of 241 human PDZ domain pockets is shown.
Alternative expression of SAP97 ligands across three human tissues. The protein expression levels of SAP97 PDZ protein and its 13 ligands were compared across brain, bone, and epidermis. Protein expression was measured by quantitative mass spectrometry . The protein abundance ratio was defined by the normalized mass spectrometry intensity value relative to the maximum intensity per protein.
Types of DNA modifications that gain PDZ-binding motifs. (A) A point mutation generated a PDZ-binding motif in the C-terminal amino acids of the Macaca mulatta PBK protein. The binding motif is highlighted in the PWM of SAP97_1 (right). Mutations in the PDZ-binding motif are shown in the alignment of DNA sequences (bottom). (B) A DNA segment insertion generated a PDZ-binding motif in the C-terminal amino acids of the Oryzias latipes EXOC4 protein. The binding motif is highlighted in the PWM of SAP102_1 (right). The inserted DNA segment is shown in the alignment of DNA sequences (bottom).
Mutation effects of the C-terminal GluR7 sequence. (A) C-terminal sequences and binding scores of the wild-type and mutant forms of GluR7. (B) The PWM of the PICK1 PDZ domain. The four C-terminal residues of wild-type GluR7 are highlighted.
Repeated analysis of PDZNet by randomly removing 20% of proteins (trial 1). (A) Network representation of PDZNet. (B) Paralog fractions of PDZ ligands that share the same PDZ proteins (left) and PDZ proteins that share the same PDZ ligands (right).
Repeated analysis of PDZNet by randomly removing 20% of proteins (trial 2). (A) Network representation of PDZNet. (B) Paralog fractions of PDZ ligands that share the same PDZ proteins (left) and PDZ proteins that share the same PDZ ligands (right).
Repeated analysis of PDZNet by randomly removing 20% of proteins (trial 3). (A) Network representation of PDZNet. (B) Paralog fractions of PDZ ligands that share the same PDZ proteins (left) and PDZ proteins that share the same PDZ ligands (right).
Repeated analysis of PDZNet by randomly removing 20% of interactions (trial 4). (A) Network representation of PDZNet. (B) Paralog fractions of PDZ ligands that share the same PDZ proteins (left) and PDZ proteins that share the same PDZ ligands (right).
Repeated analysis of PDZNet by randomly removing 20% of interactions (trial 5). (A) Network representation of PDZNet. (B) Paralog fractions of PDZ ligands that share the same PDZ proteins (left) and PDZ proteins that share the same PDZ ligands (right).
Repeated analysis of PDZNet by randomly removing 20% of interactions (trial 6). (A) Network representation of PDZNet. (B) Paralog fractions of PDZ ligands that share the same PDZ proteins (left) and PDZ proteins that share the same PDZ ligands (right).
Discriminating power of selectivity axes. Each boxplot shows distributions of binders and non-binders of an amino acid, which are presented at the top of the plot. Binders are PDZ domain pockets that prefer the amino acid, and non-binders are those domain pockets that do not prefer the amino acid. The vertical axis corresponds to an axis of a selectivity space. Fisher's score (FS) is presented at the top of each plot, indicating the discriminating power of the selectivity axes.
Procedure for extracting pocket residues. (A) Schematic drawing of the PSD-95_1 domain structure. (B) Position of each pocket residue on the structure. (C) The MSA of three representative PDZ domains was constructed using the hidden Markov model that was optimized for the PDZ domain. By adjusting the secondary structural profile on the MSA, the positions of pocket residues were identified. Gray boxes indicate the positions of pocket residues.
Fraction of domain-domain interactions according to the binding scores of all PDZ protein-mediated interactions. The PDZ protein-mediated interactions were binned by binding score, and the fraction of domain-domain interactions was measured for each bin.
Flow chart of web server and a sample output. The web server takes a query PDZ domain sequence and a species name. The outputs are pocket residues, a PWM of the query PDZ domain, and a genome-wide rank list of proteins from the species chosen by the user.
Position weight matrices (PWMs) for 515 human PDZ domains. For resource purposes, homologous PDZ domains are included in the list.
Validation of PWMs on in vivo partners derived from various species.
PDZ domain-ligand interactions in PDZNet.
C-terminal sequences of human PDZ ligand orthologs.
Experimental evidence of human PDZ domain-ligand interactions that emerged via sequence mutations. ‘−’ indicates the absence of an ortholog.
Over-represented gene ontology (GO) terms of PDZNet proteins based on the time point of acquiring PDZ domain interaction sites.
Disease classes associated with mutations of PDZNet components.
A morbid map of PDZNet components with the classification of genetic diseases.
The authors have declared that no competing interests exist.
This work was supported in part by National Research Foundation grants (R31-10100 of the World Class University program, 20100028453 of the NCRC program, and 20100020528) funded by the Korean Ministry of Education, Science, and Technology, Seoul National University Cancer Research Institute (cri-10-2), and a POSTECH BSRI grant. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.