Using large scale experimental datasets, the authors show how modular protein interaction domains such as PDZ, SH3 or WW domains, frequently display unexpected multiple binding specificity. The observed multiple specificity leads to new structural insights and accurately predicts new protein interactions.
Modular protein domains interacting with short linear peptides, such as PDZ, SH3 or WW domains, display a rich binding specificity with significant interplay (or correlation) between ligand residues.The binding specificity of these domains is more accurately described with a multiple specificity model.The multiple specificity reveals new structural insights and predicts new protein interactions.
Modular protein domains have a central role in the complex network of signaling pathways that governs cellular processes. Many of them, called peptide recognition domains, bind short linear regions in their target proteins, such as the well-known SH3 or PDZ domains. These domain–peptide interactions are the predominant form of protein interaction in signaling pathways.
Because of the relative simplicity of the interaction, their binding specificity is generally represented using a simple model, analogous to transcription factor binding: the domain binds a short stretch of amino acids and at each position some amino acids are preferred over other ones. Thus, for each position, a probability can be assigned to each amino acid and these probabilities are often grouped into a matrix called position weight matrix (PWM) or position-specific scoring matrix. Such a matrix can then be represented in a highly intuitive manner as a so-called sequence logo (see Figure 1).
A main shortcoming of this specificity model is that, although intuitive and interpretable, it inherently assumes that all residues in the peptide contribute independently to binding. On the basis of statistical analyses of large data sets of peptides binding to PDZ, SH3 and WW domains, we show that for most domains, this is not the case. Indeed, there is complex and highly significant interplay between the ligand residues. To overcome this issue, we develop a computational model that can both take into account such correlations and also preserve the advantages of PWMs, namely its straightforward interpretability.
Briefly, our method detects whether the domain is capable of binding its targets not only with a single specificity but also with multiple specificities. If so, it will determine all the relevant specificities (see Figure 1). This is accomplished by using a machine learning algorithm based on mixture models, and the results can be effectively visualized as multiple sequence logos. In other words, based on experimentally derived data sets of binding peptides, we determine for every domain, in addition to the known specificity, one or more new specificities. As such, we capture more real information, and our model performs better than previous models of binding specificity.
A crucial question is what these new specificities correspond to: are they simply mathematical artifacts coming out of some algorithm or do they represent something we can understand on a biophysical or structural level? Overall, the new specificities provide us with substantial new intuitive insight about the structural basis of binding for these domains. We can roughly identify two cases.
First, we have neighboring (or very close in sequence) amino acids in the ligand that show significant correlations. These usually correspond to amino acids whose side chains point in the same directions and often occupy the same physical space, and therefore can directly influence each other.
In other cases, we observe that multiple specificities found for a single domain are very different from each other. They correspond to different ways that the domain accommodates its binders. Often, conformational changes are required to switch from one binding mode to another. In almost all cases, only one canonical binding mode was previously known, and our analysis enables us to predict several interesting non-canonical ones. Specifically, we discuss one example in detail in Figure 5. In a PDZ domain of DLG1, we identify a novel binding specificity that differs from the canonical one by the presence of an additional tryptophan at the C terminus of the ligand. From a structural point of view, this would require a flexible loop to move out of the way to accommodate this rather large side chain. We find evidence of this predicted new binding mode based on both existing crystal structures and structural modeling.
Finally, our model of binding specificity leads to predictions of many new and previously unknown protein interactions. We validate a number of these using the membrane yeast two-hybrid approach.
In summary, we show here that multiple specificity is a general and underappreciated phenomenon for modular peptide recognition domains and that it leads to substantial new insight into the basis of protein interactions.
Modular protein interaction domains form the building blocks of eukaryotic signaling pathways. Many of them, known as peptide recognition domains, mediate protein interactions by recognizing short, linear amino acid stretches on the surface of their cognate partners with high specificity. Residues in these stretches are usually assumed to contribute independently to binding, which has led to a simplified understanding of protein interactions. Conversely, we observe in large binding peptide data sets that different residue positions display highly significant correlations for many domains in three distinct families (PDZ, SH3 and WW). These correlation patterns reveal a widespread occurrence of multiple binding specificities and give novel structural insights into protein interactions. For example, we predict a new binding mode of PDZ domains and structurally rationalize it for DLG1 PDZ1. We show that multiple specificity more accurately predicts protein interactions and experimentally validate some of the predictions for the human proteins DLG1 and SCRIB. Overall, our results reveal a rich specificity landscape in peptide recognition domains, suggesting new ways of encoding specificity in protein interaction networks.