Protein-protein interactions (PPIs) are central to understanding how cell functions are integrated. A well-developed theory now explains the physical and chemical forces that underlie the non-covalent chemistry of PPIs. Electrostatic, dipole-dipole, and van der Waals forces are quantified in accepted force fields such as AMBER, which are used for protein structure determination by NMR, modeling of macromolecules, refinement of X-ray structures, macromolecular dynamics, and drug design [1–3]. A more recent knowledge base of PPIs has emerged from the growing number of PPI databases built largely on data from high-throughput techniques such as yeast two-hybrid screens and tandem affinity purification/mass spectrometry analysis, as well as manual curation of the scientific literature. Collectively, BIND, DIP, BioGrid, IntAct, MIPS, MINT, HPRD, YPD, and others contain several hundred thousand PPIs [4–8]. Despite these efforts, we have yet to develop a comprehensive enumeration of PPIs.
One of the major efforts toward a theory that reliably predicts new PPIs centers on minimotifs (also called short linear motifs or SLiMs), which bind to protein domains and provide a key connection between the physical and chemical forces and the large PPI data sets. Minimotifs are contiguous peptide elements in proteins, generally fewer than 15 residues in length, that have a defined function. One class of functions includes binding minimotifs such as those that engage SH2, SH3, PDZ, and a number of other modular protein domains [9]. These minimotifs are of known molecular function and are distinct from motifs predicted de novo by approaches such as MEME, Gibbs Sampler, PRATT, TEIRESIAS, D-MOTIF, and other algorithms [10–14].
While some PPIs involve extensive surface contact, PPIs driven by minimotifs are generally simpler, with a reduced surface of contact. Analysis of minimotif-driven PPIs simplifies the problem of theoretically predicting new PPIs by limiting the residues that need to be considered. Generally, minimotifs are identified either by studying the sequences of a collection of instances that are known to interact with a protein, or by analyzing the permutational space of each position in a peptide sequence that can bind to a domain, using phage display, screening of random peptide libraries, SPOT peptide arrays, or site-directed mutagenesis. Most often, interpretation of these data reduces the series of instances to a consensus minimotif that accounts for degeneracy at each position. Alternatively, degeneracy and variation can be represented quantitatively in a position-specific scoring matrix (PSSM), which has the advantage of capturing, for each position, the probability of each residue across the collection of instances.
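The two representations described above can be contrasted with a minimal sketch. It assumes a simple consensus syntax in which a lowercase `x` stands for any residue, a uniform amino acid background, and a pseudocount of 1; the four peptide instances are hypothetical and serve only as illustration, and none of the function names come from MnM or any other published tool.

```python
import math
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def consensus_to_regex(consensus):
    """Compile a consensus such as 'PxxPxK' into a regex ('x' = any residue)."""
    return re.compile("".join("[A-Z]" if c == "x" else c for c in consensus))

def build_pssm(instances, pseudocount=1.0):
    """Build one column of log2-odds scores per position.

    Assumes equal-length instances and a uniform background (1/20 per residue).
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in instances)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, peptide):
    """Sum the per-position log-odds scores for a peptide of matching length."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))

# Hypothetical instances, for illustration only.
instances = ["PPLPPK", "PALPEK", "PSRPLK", "PTVPQK"]
pssm = build_pssm(instances)
pattern = consensus_to_regex("PxxPxK")
```

The consensus gives a yes/no answer for any candidate peptide, whereas the PSSM ranks candidates, so a peptide that resembles the instances at the degenerate positions scores higher than one that matches only the fixed positions.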
Consensus sequences and PSSMs have some predictive value but have limitations. While high-throughput experimentation has helped us to begin to understand some of the specificity determinants of many minimotifs, sequence alone is not an accurate predictor of novel minimotif instances and does not, by itself, account for the higher degree of specificity observed in minimotif-driven PPIs.
There has been much effort to increase the specificity of minimotif predictions and to reduce false positives [9,15–19]. In the Minimotif Miner (MnM) application for predicting minimotifs, three approaches can be used. Frequency analysis relies on the simple premise that minimotifs with more complex definitions are less likely to produce false positives [15,16]. Since minimotifs must be on the surface of a protein, a surface prediction algorithm can be used to minimize the prediction of buried minimotifs. Likewise, conservation of a minimotif across species provides another measure to reduce false positives. Another major minimotif database, the Eukaryotic Linear Motif server (ELM), uses different filters that are complementary to those of MnM [17,18]. A cell compartment filter identifies minimotifs in appropriate cellular compartments, while a globular domain filter can be used to restrict predictions to intrinsically disordered regions. ELM uses taxonomy differently from MnM, to identify minimotifs in organisms with a conserved minimotif partner. ELM also has a surface filter and a secondary structure prediction filter.
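The premise behind frequency analysis, that more complex minimotif definitions match less often by chance, can be illustrated with a rough calculation. The sketch below is not the MnM implementation: it assumes a hypothetical consensus syntax (`x` for any residue, `[ST]` for a set of allowed residues, a single letter for a fixed residue) and treats residues as independent and uniformly distributed, which real proteomes are not.

```python
def position_sizes(consensus, alphabet_size=20):
    """Return the number of allowed residues at each position of a consensus."""
    sizes = []
    i = 0
    while i < len(consensus):
        if consensus[i] == "x":          # wildcard: any residue allowed
            sizes.append(alphabet_size)
            i += 1
        elif consensus[i] == "[":        # bracketed set, e.g. [ST]
            close = consensus.index("]", i)
            sizes.append(close - i - 1)
            i = close + 1
        else:                            # single fixed residue
            sizes.append(1)
            i += 1
    return sizes

def match_probability(consensus, alphabet_size=20):
    """Chance that a random peptide of the same length matches the consensus."""
    prob = 1.0
    for size in position_sizes(consensus, alphabet_size):
        prob *= size / alphabet_size
    return prob

def expected_chance_hits(consensus, protein_length, alphabet_size=20):
    """Expected number of chance matches in a random protein of a given length."""
    n_positions = len(position_sizes(consensus, alphabet_size))
    windows = max(protein_length - n_positions + 1, 0)
    return windows * match_probability(consensus, alphabet_size)
```

Under these assumptions a definition with two fixed positions, such as YxxP, is expected to match a random 1000-residue sequence about 2.5 times, while one with three fixed positions, such as PxxPxK, is expected to match about 0.12 times.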
In MnM, a query protein (the minimotif source protein) is entered and its sequence is analyzed, in part, for minimotifs that encode putative interactions with target proteins. Each query generally produces many target predictions; however, as with other minimotif prediction programs, a relatively high number of these are false positives. To address this limitation, we have now adapted a concept previously used to identify novel minimotifs [19]. Neduva and Russell examined sets of protein-protein interactions to identify proteins that interacted with a common target protein and shared a unique minimotif signature [19].
Here, our goal was to engineer a new tool with two principal uses. 1) It would improve prediction of new minimotifs in MnM by reducing false-positive predictions: a filter restricts the target predictions to those proteins where the minimotif source protein and the target protein are already known to interact. We implemented several strategies to modulate filter stringency. 2) It can be used to facilitate the study of PPI theory by identifying minimotifs between two proteins that are already known to interact but whose interface is not yet known. For example, in an analysis of Discs, large homolog associated protein (DLGAP-1, NP_075235), Minimotif Miner predicts 123 potential binding motifs; however, only 2 involve a target with a previously known protein-protein interaction with DLGAP-1. Thus, using PPIs reduces the number of minimotif predictions. For the second application, analysis of DLGAP-1 shows that it contains PxxPxK and YxxP minimotifs, which are known to bind to the SH3 and SH2 domains of Crk, respectively [20]. A protein-protein interaction of DLGAP-1 with Crk was previously identified by an array screen of SH3 binding motifs that included Crk peptides [21]. Thus, two mechanisms for this PPI are now suggested.
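The filtering idea can be sketched in a few lines. This is an illustrative sketch only, not the implementation in MnM: the record layout, the function name `filter_by_known_ppi`, and the placeholder targets `TargetA` and `TargetB` are assumptions, and only the DLGAP-1/Crk pair and its two minimotifs are taken from the text.

```python
def filter_by_known_ppi(source, predictions, known_ppis):
    """Keep predictions whose target is a known interaction partner of source.

    Interactions are treated as undirected, so (A, B) and (B, A) are equivalent.
    """
    known = {frozenset(pair) for pair in known_ppis}
    return [p for p in predictions
            if frozenset((source, p["target"])) in known]

# Predicted minimotif targets for the source protein. TargetA and TargetB
# are hypothetical placeholders standing in for unsupported predictions.
predictions = [
    {"motif": "PxxPxK", "domain": "SH3", "target": "Crk"},
    {"motif": "YxxP", "domain": "SH2", "target": "Crk"},
    {"motif": "PxxP", "domain": "SH3", "target": "TargetA"},
    {"motif": "YxxP", "domain": "SH2", "target": "TargetB"},
]

# Known interactions, e.g. as drawn from a PPI database (order is irrelevant).
known_ppis = [("Crk", "DLGAP-1")]

kept = filter_by_known_ppi("DLGAP-1", predictions, known_ppis)
```

Filter stringency could then be varied by changing what goes into `known_ppis`, for example by admitting only interactions that meet a chosen evidence threshold; the specific strategies used in the tool are not shown here.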