Proteins play a fundamental role in establishing the diversity of cellular processes in health or disease systems. This diversity is accomplished by a vast array of protein functions. In fact, a protein rarely has a single function. The majority of proteins are involved in numerous cellular processes, and these multiple functions are made possible by interactions with other molecules. The complexity of interactions is substantially increased by the spatial and temporal diversity of proteins. For example, proteins can be part of distinct complexes within different subcellular compartments or at different stages of the cell cycle. Posttranslational modifications can regulate and further expand the ability of proteins to establish localization- or temporal-dependent interactions. This complexity and functional divergence of interactions is further increased by the simultaneous presence of stable, transient, direct, and indirect protein interactions. Thus, a protein's functions cannot be fully understood without knowledge of its interactions. Characterizing these interactions is therefore critical to understanding the biology of health and disease systems.
Methodologies for studying protein interactions have become a core component of the proteomics field, which aims to define protein abundances, modifications, interactions, and functions. The study of protein interactions is not without difficulty. Protein interactions are inherently dynamic, and complex mixtures of stable and transient interactions routinely co-exist. Immunoaffinity purification (IP) approaches coupled to mass spectrometry (MS) can be used to study a range of these functionally relevant protein associations. Recent years have seen a significant improvement in these affinity-based methods. These improvements include advances in affinity tools, methods for sample preparation, mass spectrometry configurations offering increased sensitivity and accuracy, and bioinformatics approaches allowing a thorough analysis of large-scale or targeted interactome datasets. Therefore, affinity-based methods for identifying protein interactions have grown to encompass a wide variety of techniques with the ability to study diverse biological systems.
This review summarizes recent developments in IP-MS strategies for studying protein-protein interactions. Important considerations and optimization techniques for IP workflows are introduced, providing practical suggestions, as well as concrete examples of studies and applications. One of the greatest challenges in protein interaction studies is recognizing, from the multitude of identified interactions, those that are specifically associated with the isolated protein of interest. Therefore, a section of this review is focused on describing the types of non-specific associations and the current methods used for assessing specificity of interactions. IP-MS strategies have also improved in tackling the challenge of identifying stable and transient interactions, partly by using a combination of cross-linking, MS, and bioinformatics approaches. This review describes some of these approaches, as well as their application to understanding the structure of protein complexes. Critical to characterizing protein interactions is the ongoing development of appropriate bioinformatics tools for mapping interaction networks. An overview of the resources available for analyzing, constructing, and visualizing protein networks is provided, together with a discussion of their advantages and representative literature. While highlighting initial methods and selected developments for each of these critical aspects of protein interaction studies, this review mainly focuses on current approaches and applications.
The characterization of protein-protein interactions requires the successful isolation of protein complexes close to their physiological states. Maintaining and purifying protein associations close to their native form is challenging, and requires the ability to identify interactions of both stable and transient nature. Affinity-based approaches coupled to mass spectrometry analysis can be powerful tools in studying protein-protein interactions due to their simple and fast execution, selectivity, and sensitivity. Additionally, the enrichment of proteins of interest by immunoaffinity purification can provide insights into posttranslational modifications that may regulate protein interactions and functions1-4. In recent years, affinity-based methods have taken multiple shapes and forms due to the development of diverse workflows that can be integrated within a broad range of biological contexts. Yet, despite this diversity, there are several universal steps (Fig. 1), including: 1) generation/preparation of cell or tissue samples expressing the endogenous or affinity-tagged protein of interest, 2) cell or tissue lysis and solubilization of protein complexes under optimized lysis buffer conditions, 3) isolation of the protein of interest with its interaction partners by affinity purification, 4) elution of purified protein complexes, and 5) analysis of co-isolated proteins by mass spectrometry, leading to the identification and quantification of protein-protein interactions. This general outline is open to multiple modifications at every step to ensure efficient protein isolation, and application to various biological systems, research conditions, and experimental goals. The protein's abundance, stability, localization, and physicochemical properties are factors that have to be considered; thus, much thought must be placed into the design of the workflow for isolation and characterization of protein complexes. 
This section outlines critical aspects of affinity-based methods for studying protein interactions.
Affinity-based approaches involve the isolation of a target protein (bait) and its interacting partners (preys) by affinity binding to a capture molecule immobilized on a solid support (resin). Thus, the protein of interest is captured on the resin together with its interaction partners and, after washing steps aimed at eliminating non-specific interactions, the isolated protein complex is eluted and analyzed by mass spectrometry. While this approach is frequently applied to small-scale studies focused on one or several proteins of interest5, 6, recent reports have also tackled large-scale studies where the interactomes of an entire system were characterized.7-10 Proteins can be purified in their endogenous form or through an affinity epitope tag (Fig. 1). The advantage of isolating the endogenous protein is the ability to capture its physiological state, abundance, and interactions within a multitude of systems (e.g., cells, tissues, animal models), without the need for cloning or tagging. This method was successfully implemented in studies of individual endogenous proteins, as well as in large-scale interaction studies. For example, Malovannaya et al. performed a large-scale study of endogenous human regulatory protein complexes, focusing on nuclear transcriptional and signaling proteins.11 Drawbacks of isolating endogenous proteins include the dependence on availability, specificity, and affinity of antibodies recognizing the proteins of interest. Cross-reactivity and cost of the utilized antibodies are often concerns, as well as the possible interference of posttranslational modifications in epitope binding. Additionally, it is not uncommon for an antibody to work well for immunofluorescence or Western blotting, but not for IP experiments.
As an alternative to isolating endogenous proteins, a routinely utilized workflow involves the isolation of epitope-tagged proteins. In this approach an epitope tag is fused to the target protein and IP is performed using an antibody specific to the tag. As the method is independent of the protein's binding properties, it is a universally applicable approach, useful for both low- and high-abundance proteins. This approach is suitable for large-scale and high-throughput experiments, as it can be applied to multiple proteins using a single tag. The workflow can be robust and reproducible, as the experimental conditions can be optimized for specific epitope tag(s). One drawback is the time requirement, since the fusion protein has to be generated and introduced into the biological system of choice. Most importantly, the fusion protein has to be assessed to confirm that the tag is not interfering with the protein's endogenous function, localization, and properties.1, 12-14 This is an important consideration when studying cellular proteins from various systems (e.g., mammalian1, 14-16, bacterial17, 18, yeast13, 19), as well as proteins introduced by pathogens12, 20. Additionally, when the protein is expressed under an exogenous promoter, functional assays have to confirm that its overexpression does not alter its physiological properties and roles. Retrovirus1, 21 or lentivirus10, 22 transductions have been successfully utilized to control the level of overexpression of tagged proteins. Alternatively, expression vectors with inducible promoters (e.g., tetracycline-inducible) can be used to turn the expression of the fusion protein on or off, and avoid side effects or toxicity due to overexpression.2 A recent study showed that by titrating the amounts of tetracycline, the level of the tagged protein can be adjusted to mimic the endogenous levels prior to immunoaffinity purifications23.
Other advancements in expressing tagged proteins at or near endogenous levels include the introduction of tag-encoding DNA into endogenous loci via homologous recombination24, 25, as well as the incorporation of bacterial artificial chromosomes (BACs) for BAC transgene generation allowing the expression of tagged fusion proteins upon stable transfection into mammalian or other biological systems.26-28
A variety of epitope tags have been utilized in immunoaffinity purification experiments (as reviewed in 29, 30). Commonly used single tags include small epitope tags, such as FLAG, six histidines (His)6, c-Myc, and hemagglutinin (HA), or larger tags, such as green fluorescent protein (GFP), glutathione-S-transferase (GST), and protein A (PrA). FLAG, available in 1xFLAG or 3xFLAG versions, is a frequently used tag due to its small size (8 amino acids per 1xFLAG) that limits its interference with protein function.31 3xFLAG tends to be preferred due to its higher affinity than 1xFLAG and the corresponding increased efficiency of isolation.5, 23, 32 In a recent study, Law et al. purified 3xFLAG fusion proteins involved in RNA-directed DNA methylation, demonstrating a role for these complexes in polymerase V-dependent transcript production in plants.33 In another study, 3xFLAG tags were utilized to study virus-host protein interactions during Sindbis virus infection, unraveling host factors targeted by the virus RNA-dependent RNA polymerase and important for virus replication5. FLAG tags have also been applied to global interactome studies, including a large-scale mapping of human protein-protein interactions involved in diverse biological processes, such as proteasome function, translation, and progression through mitosis.34 GFP is another commonly used tag that has emerged as an effective tool for integrating knowledge about protein localization and interactions.35, 36 The increased use of GFP for protein isolations reflects the current emphasis on understanding the dynamics of protein interactions, as it allows placing protein interactions in a specific spatial and temporal cellular context. Cristea et al.35 have demonstrated the effectiveness of using single-step isolations via the GFP tag for characterizing protein complexes. Yeast and mammalian proteins were visualized by direct fluorescence and isolated by rapid immunoaffinity purifications via the GFP tag.
As this study illustrated, the library of GFP-tagged yeast strains generated by O'Shea and Weissman using homologous recombination is a valuable resource for studying localization and interactions of GFP-tagged proteins expressed at endogenous levels in yeast.24 Beyond yeast, affinity purifications via the GFP tag have been widely applied to multiple biological systems, including yeast35, 37, metazoans36, bacteria17, 18, mammalian cells1, 38 or tissue28, 39, 40. For example, Cheeseman et al. studied kinetochore proteins in C. elegans, referring to this combined visualization and purification as a Localization and Affinity Purification (LAP) tag strategy.36, 41 Goldberg et al. characterized interactions specific to distinct histone H3 isoforms in mouse embryonic stem cells, identifying H3.3 unique interactions and their roles in localizing H3.3 to telomeres.38 Additionally, these approaches were powerful for characterizing virus-host and virus-virus protein interactions that proved critical during infections with diverse viruses, including Sindbis5, 42, Human cytomegalovirus12, 20, 43, Herpes simplex virus2, West Nile virus44, and Pseudorabies virus.45 A variation of the GFP isolation approach was designed based on single-chain antibodies raised against a GFP fragment in alpaca, which reduces contamination from antibody fragments.46, 47 The molecular weight of GFP, or other similar large tags, is a concern in IP studies, and the validation of the function and localization of the tagged protein is critical, as shown in 1.
While single-step affinity purification methods have proven effective for capturing low-abundance and weak protein interactions, tandem affinity purifications (TAP) were initially introduced to preserve stable interactions and reduce non-specific associations.48 The original TAP method is based on a double TAP tag composed of a Calmodulin binding peptide (CBP) and an IgG-binding unit of Protein A from Staphylococcus aureus (PrA), allowing for a two-step affinity protein purification strategy. Although first applied in yeast, this TAP strategy has been further developed and optimized for isolations in other biological systems, including mammalian cells14, 49, 50, plants51, 52, viruses53, 54, and bacteria55. In recent years, multiple variations of the TAP tags have been designed for proteomic studies, as reviewed in 56-58. Tags used for tandem affinity purification range from GFP41 and PrA to smaller tags, such as FLAG. An interesting modification was developed by Gloeckner et al.59 based on a Strep/FLAG (SF)-TAP tag to reduce the tag size and interference with protein folding. This method has recently undergone further variations through a triple-tag approach that combines StrepII, FLAG, and yellow fluorescent protein (YFP) for parallel affinity capture (iPAC) and screening of protein interactions, localization, and expression.60 In addition, an interesting spin on the TAP methodology, with a concept similar to Förster resonance energy transfer (FRET), has been developed by Maine et al., who introduced the bimolecular affinity purification system (BAP).61 This approach incorporates two affinity tags, but unlike traditional TAP procedures, each tag is fused to a different protein from a common multi-subunit complex. Thus, the method is applicable to targeted studies of a distinct molecular complex when two (or more) of its components are already known.
As discussed thus far, immunoaffinity purification methods most commonly involve the utilization of antibodies against endogenous proteins or tags for the isolation of protein complexes. In recent years, alternative tools have been developed to complement the use of antibodies. For example, aptamers (nucleic acids), molecularly imprinted polymers (MIPs), and engineered binding proteins (protein scaffolds) have been implemented as substitute affinity molecules, as reviewed by Ruigrok et al.62 Among the advantages of these tools are their low production costs, efficient selection, and in some cases greater tolerance to stringent affinity isolation conditions. In a representative study, DeGrasse63 utilized a single-stranded DNA aptamer to specifically bind and isolate Staphylococcus aureus Enterotoxin B protein. In another example, Wiens et al.64 generated a silicatein-α fusion protein carrying a Glu tag for binding to hydroxyapatite solid matrix, and analyzed the protein interactions by mass spectrometry. Additionally, small molecules have been successfully used in quantitative chemoproteomics approaches to validate targets of drugs and identify complexes that may be preferentially bound.65, 66 Overall, these tools promise to expand the range of methods for studying interactions. Remaining challenges for their widespread use range from experimental considerations (e.g., aptamers are highly sensitive to nucleases present in lysis buffers, small molecules have to be engineered to contain a binding linker to resins) to more practical obstacles (e.g., patent-protected methodologies are oftentimes strictly licensed to a selected number of biotechnology companies).
In order to effectively isolate a protein of interest, the tissue or cells are first subjected to lysis. This lysis has to be performed in a manner that ensures exposure of cellular contents for immunoaffinity purification, without disruption of protein interactions. Several types of lysis are routinely performed in IP experiments. The most common methods involve physical cell (or tissue) disruption, preferably done cryogenically, or direct lysis in optimized detergent-containing buffers. Cryogenic cell lysis is recommended as an optimal choice in numerous IP-MS workflows. The immediate freezing of the sample (cell or tissue) helps to preserve protein complexes close to their cellular state. Subsequent mechanical grinding of the frozen sample provides an effective means to shatter cell walls, cytoskeletal networks, membranes, and vesicles, providing access to various baits of interest and reducing non-specific associations35, 67-69. Cryogenic lysis can be performed using a mortar and pestle if large amounts of starting material are available, or within grinding jars or plastic tubes if starting with smaller amounts of frozen cells and for more consistent grinding. This type of lysis has been extensively utilized by many groups when working with tissue or cell samples, and is usually followed by suspension of the frozen sample powder in an optimized lysis buffer. For example, we successfully applied this workflow to recent studies of protein complexes in bacteria17, yeast18, mammalian cells1, 21, as well as during viral infection.5, 12, 20 Mechanical disruption can also be performed by shearing the cellular sample by passing it through a needle, or by applying temperature shifts using freeze/thaw cycles (for review see 70). This type of lysis may not be the appropriate choice when IPs of the bait protein aim to isolate an intact large structure (e.g., postsynaptic density from brain tissue samples) or organelle (e.g., mitochondria).
In such cases, direct lysis in a detergent- and salt-containing buffer is more suitable. One caveat of whole cell lysis is that proteins with different sub-cellular localizations can mix during the lysis procedure, becoming available for non-specific binding. Therefore, if localization of the bait protein is known, fractionation can be incorporated into an IP workflow to achieve efficient and clean isolation. Fractionation strategies have been utilized in various biological systems and cell types, especially in large-scale studies aimed at characterizing proteomes of organelles (e.g., mitochondria, Golgi, nucleus).71 However, numerous proteins have multiple sub-cellular localizations, in which case IPs can be either performed from whole cell lysates or as parallel isolations from the fractionated sub-cellular compartments (e.g., parallel nuclear and cytoplasmic isolations). In the latter case, localization-dependent unique and shared interactions can be compared by quantitative mass spectrometry. For example, Trinkle-Mulcahy et al. characterized nuclear- and cytoplasmic-specific interactions of the Survival of Motor Neuron (SMN) complex, known to function in both compartments.72 If the mechanisms regulating protein localization are known (e.g., phosphorylation sites), then mutants can be used to trigger the localization of a protein to a certain sub-cellular compartment, and isolations of mutants can be used to define localization-dependent interactions. For example, Greco et al. utilized phospho-mutants to control the nuclear-cytoplasmic shuttling of histone deacetylase 5 (HDAC5) and determine its localization-dependent interactions.1
Regardless of the procedure utilized for sample preparation (e.g., cryogenic lysis or fractionation), the optimization of lysis buffer conditions is one of the most important steps in IP experiments, as it must balance efficient protein solubilization with the preservation of stable and weak interactions. In addition, this step is critical for reducing non-specific interactions, as discussed in section “Determining specificity of interactions”. Lysis buffers have to be tailored to the chemical properties and localization of the bait protein. Most lysis buffers include protease inhibitors to avoid protein degradation, as well as DNases and/or RNases to decrease sample viscosity and facilitate protein isolation. Sonication can also be applied to mechanically shear DNA and RNA molecules. DNases and RNases should be avoided when investigating protein interactions that are facilitated by DNA or RNA molecules. Different types and concentrations of salts and detergents, as well as the pH and ionic strength of the solution, should be considered when optimizing the lysis buffer composition, as this determines the accessibility of the bait for isolation and, therefore, the efficiency of extraction. A list of commonly used detergents and their properties has been described.39 Among these, milder detergents that can be used to study more sensitive protein complexes and protein-lipid interactions include Triton X-100 and NP-40 (depending on the selected concentrations). More stringent detergents (e.g., sodium deoxycholate, digitonin) solubilize lipid molecules, making these suitable for studying interactions of membrane-bound proteins.
Optimization of lysis buffer detergents is also necessary to ensure the isolation of protein complexes close to their native state. For example, Everberg et al. used a combination of milder detergents (i.e., Zwittergent 3-10, Triton X-114) followed by a polymer two-phase partitioning system to enrich for solubilized membrane protein complexes in their native state.73 Additional considerations include the impact of detergents on the selected mass spectrometry analysis workflow. Norris et al. utilized cleavable detergents for the isolation and analysis of intracellular and membrane proteins, as these proved to be less detrimental for matrix-assisted laser desorption/ionization (MALDI) mass spectrometry and could be eliminated prior to MS analyses.74 Further information on the selection of lysis buffer conditions when studying proteins in different biological contexts is provided in several recent studies and reviews.39, 70, 75
In addition to lysis conditions, other factors critical for efficient isolation of protein complexes include the choice of antibodies and tags (as discussed above), as well as the selection of affinity resin and duration of protein purification (Fig. 1). Several types of affinity resins have been commonly used in IP experiments, including natural (e.g., agarose and sepharose beads), inorganic (e.g., glass), and synthetic (e.g., acrylamide-based supports) resins, as reviewed in 76. Magnetic beads are a more recent and continuously improving addition to affinity resins. Important advantages of the latter include the varied chemistry available for binding and their inherent property of surface binding that allows the isolation of a broad range of protein complex sizes. Additionally, since magnetic beads do not require centrifugation, but instead use a magnet for their collection, their easy handling reduces non-specific binding. The impact of the resin type and duration of isolation on the accumulation of non-specific associations is discussed in the “Determining specificity of interactions” section.
The strategies for eluting the isolated protein complexes depend on the purpose of the subsequent analysis, and whether the isolated proteins will be analyzed in their denatured or native forms (Fig. 1). The most common workflows utilize the elution of isolated complexes in a denatured manner, such as with buffers containing sodium dodecyl sulfate (SDS) or lithium dodecyl sulfate (LDS). The advantage of such buffers is the resulting high efficiency of elution, and LDS-based elutions have been recently adapted for sample preparation by in-solution digestion prior to mass spectrometry (e.g., filter-aided sample preparation (FASP77)). However, this type of elution can also trigger the release of non-specific molecules attached to the affinity resin, as well as immunoglobulin (IgG). In a recent study, Antrobus and Borner attempted to reduce the amounts of contaminant IgG by modifying the elution conditions for isolated endogenous proteins.78 In this study, “soft” elution conditions using lower detergent concentrations and a lower temperature for elution decreased the amounts of contaminant IgG without dramatically compromising the efficiency of protein isolation. The type of isolated protein complex may, however, play a role in this efficiency of elution. Another common strategy for achieving a denaturing elution is the use of solutions with acidic or basic pH. Citric acid and trifluoroacetic acid (TFA) are commonly used for acidic elutions21, while ammonium hydroxide in combination with ethylenediaminetetraacetic acid (EDTA) is often preferred for basic pH conditions.35 In addition, protein elution can be performed over a pH gradient, providing a comprehensive view of the nature of the isolated protein-protein interactions, as multiple fractions can be collected and analyzed. This strategy, while labor intensive and less practical for high-throughput proteomic studies, can help reveal the stability of interactions (i.e., whether they are retained at acidic or basic pH conditions).
As an alternative to denaturing strategies, protein complexes can be eluted in a non-denatured way to preserve intact isolated assemblies, possibly for further functional studies. These types of elution are well integrated with recent mass spectrometry technology developments that allow the analysis of native proteins and complexes.79-85 In such studies, elution can be performed by utilizing reagents for competitive binding to the resin. For example, FLAG (1X or 3X) peptides are used for eluting complexes isolated via anti-FLAG antibodies.86, 87 Additionally, a cyclic peptide has been developed for the native elution of complexes isolated via the PrA tag.88 Competitive elutions have also been utilized when purified proteins, rather than antibodies, have been conjugated to resin and used for isolating interacting proteins. For example, Dubois et al. used competition with a synthetic phosphopeptide to elute phosphorylated substrates of sepharose-immobilized 14-3-3 proteins.89 In another example, phenyl phosphate was utilized to elute phosphotyrosine-containing protein complexes involved in receptor tyrosine kinase signaling.90
Upon isolation, a comprehensive characterization of the isolated protein complex(es) can be achieved by taking advantage of the multitude of available mass spectrometry-based workflows (Fig. 1). Depending on the complexity of the isolated assembly, the proteins can be separated by SDS-PAGE (1-D or 2-D), in-solution isoelectric focusing (at the protein or peptide level)85, or directly prepared for mass spectrometry analysis. Protein complexes are routinely analyzed using bottom-up or middle-down approaches, in which the eluted proteins are digested with proteases.82 Bottom-up approaches utilize enzymes that digest the proteins into numerous small to medium length peptides (e.g., trypsin, GluC). Middle-down approaches utilize digestions with proteases that aim to preserve larger parts of the protein intact (e.g., LysC), and are frequently integrated in studies of cross-talk between posttranslational modifications. A combination of enzymes can be used to increase the observed protein sequence coverage.2 Isolated protein complexes can also be analyzed using top-down approaches for studying intact proteins.91 Additionally, intact protein complexes, eluted using non-denaturing methods, can be separated by blue native gel electrophoresis (BN-PAGE).90 This type of gel separation, in combination with mass spectrometry, allows for analysis of structure and function of protein complexes.92, 93 Western blot analyses frequently accompany mass spectrometry analyses for validation or targeted follow-up studies. While not the focus of this review, further information on sample preparation and analyses by mass spectrometry is available in several reviews.94-96
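The difference between digesting with trypsin and with LysC can be illustrated with a minimal in silico digestion sketch. The function `digest`, the example sequence, and the simplified cleavage rules (trypsin cleaves after K or R unless followed by P; LysC cleaves after K) are assumptions made for illustration only; real digestions also involve missed cleavages and other effects that are not modeled here.

```python
def digest(sequence, cleave_after, blocked_by=""):
    """Split a protein sequence C-terminal to the given residues.

    cleave_after: residues after which the protease cuts (simplified rule).
    blocked_by:   residues that prevent cleavage when they directly follow
                  the cleavage site (e.g. proline for trypsin).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in cleave_after and not is_last:
            if sequence[i + 1] not in blocked_by:
                peptides.append(sequence[start:i + 1])
                start = i + 1
    peptides.append(sequence[start:])  # remaining C-terminal peptide
    return peptides


# Hypothetical sequence, not taken from any protein discussed in the text.
SEQ = "MKTAYIAKQRPLSEELRK"

tryptic = digest(SEQ, cleave_after="KR", blocked_by="P")
lysc = digest(SEQ, cleave_after="K")

print("trypsin:", tryptic)
print("LysC:   ", lysc)
```

Under these simplified rules LysC yields fewer and longer peptides than trypsin for the same sequence, which is the property that middle-down workflows rely on.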
Proteins can have numerous direct and indirect interactions that are spatially and temporally regulated and critical for their diverse functions. In view of this complexity of interactions, a great challenge lies in distinguishing specific interactions from numerous potentially non-specific associations. Therefore, substantial effort has been recently devoted toward developing approaches for assessing the specificity of protein interactions. This section presents an overview of the sources of non-specific binding and the strategies utilized to address them.
Common contaminants in IP experiments include proteins that interact with the resin (e.g., magnetic beads, agarose), immunoglobulin molecules (e.g., heavy or light chains of antibodies), and tags (e.g., FLAG, GFP) (Fig. 2A). The choice of resin for IP experiments can aid in reducing non-specific contaminants. For example, Trinkle-Mulcahy et al. indicated that sepharose beads isolated fewer non-specific associations from cytoplasmic extracts, while magnetic beads performed better with nuclear extracts.28, 70 In addition, magnetic beads do not require centrifugation, but instead use a magnet for their collection after IP. This type of surface binding and easy washing may partly contribute to the observation of reduced non-specific binding when comparing magnetic to agarose beads.23 The type of antibodies selected for IP can also impact the level of non-specific associations. Polyclonal antibodies tend to have high affinities for binding, providing higher efficiency of isolation when compared to monoclonal antibodies. However, the use of monoclonal antibodies leads to fewer non-specific associations than polyclonal antibodies. Also, commercially available antibodies against a protein of interest commonly produce higher levels of non-specific binders than custom in-house generated and purified antibodies.
Even more challenging are non-specific associations that bind to the isolated proteins of interest, which may be referred to as non-specific interactions that are particular to the studied complex(es). Once affinity purified on the resin, the co-isolated proteins can act as interacting sites for numerous readily adherent (“sticky”), abundant, or domain-recognizing proteins that are present in the cell lysate. Non-specific binding is, in part, driven by the fact that proteins in solution do not retain the subcellular localization they possess in vivo and, therefore, may have previously non-existent opportunities to bind upon lysis (Fig. 2A). Table 1 lists commonly used methods for identification of non-specific interactions from these different sources of non-specific binding, indicating the advantages and disadvantages of individual methods.
The presence of non-specific interactions from these different sources is influenced by a number of factors that can be experimentally controlled. One of the critical aspects and primary steps in reducing non-specific interactions is the careful optimization of lysis conditions. Lysis buffers can be adjusted to retain specific strong or weak interactions, while reducing non-specific interactions. For example, more stringent lysis buffer compositions (e.g., high salt or detergent concentrations) may be used to primarily preserve strong interactions, while milder conditions can be optimized to retain weaker interactions. While stringent lysis conditions may also be necessary for accessing proteins within membranes or vesicles, one caveat of using high detergent concentrations is the possible denaturing of proteins. Protein denaturation can trigger an additional accumulation of non-specific associations (e.g., heat shock proteins that bind to unfolded proteins97). On the other hand, mild lysis buffer conditions can lead to an increase not only in specific and weak interactions, but also in non-specific binding39. Another critical factor influencing the presence of non-specific associations is the duration of the steps involved in immunoaffinity purification. Cristea et al. demonstrated that the time frame utilized for cell lysis and subsequent incubation of cell lysate with antibody (conjugated to the resin) can significantly impact the cleanliness of the isolation.35 Incubations of a few minutes to a maximum of one hour can help reduce the number and abundance of non-specific associations. A proper balance between the stringency and incubation time can help maintain the majority of relevant interactions, while reducing background (Fig. 2B).
Although optimal lysis and isolation conditions can help reduce the background, contaminants that interact with resins, tags, IgG, or isolated complexes will still be present, even if at lower levels. Therefore, a requirement for all IPs is to design and incorporate appropriate controls. The experimental conditions for the control isolation should be consistent with the entire workflow (i.e., the conditions for isolating the proteins of interest). Most approaches for determining interaction specificity address non-specific associations with the resin, tag, and antibody, as discussed below. Several approaches have also been designed to tackle the issue of non-specific associations with the isolated proteins themselves. These methods are discussed in the section “Assessing interaction specificity using metabolic labeling.”
The most frequently used controls in IPs of tagged bait proteins involve the generation of control cell lines expressing only the tag under the same promoter as the bait. This method has been successfully used by our laboratory to control for non-specific association of proteins to GFP or FLAG tags in studies characterizing functions of protein complexes in mammalian cells, as well as in cells infected with viruses.1, 2, 12, 21, 35, 98 For example, GFP control cell lines were used in a study aimed at determining localization- and phosphorylation-dependent interactions of histone deacetylase 5 (HDAC5)1. In another study, the possibility of temporal changes in non-specific associations was addressed using control isolations at different time points after infection with Sindbis virus42. Liu et al. used TAP tagging and compared the results to control TAP-only IPs to characterize the interactome of the hepatitis B (HBV) viral protein, HBx.99 This study identified a factor involved in apoptosis inhibition during HBV infection. A novel HaloTag technology was developed by Daniels et al. to characterize interactions of RNA polymerases I, II, and III, in which control HaloTag isolations were performed to determine non-specific interactions.100 Yang et al. utilized a GST control to specifically identify ribosomal protein 1 as a novel interacting partner of GST-tagged MSMEG_2731 protein of M. smegmatis (the model organism used to study M. tuberculosis), indicating a role for MSMEG_2731 in the processes of transcription and translation.101 When isolating endogenous proteins, a similar concept is used in designing controls. Resin conjugated with IgG is frequently used to capture proteins that bind non-specifically to the antibody rather than the bait. Malovannaya et al. incorporated additional steps into their IP protocol for elimination of common sources of non-specific associations when isolating an endogenous protein.102 For instance, ultracentrifugation at 100,000 × g prior to bead incubation allowed for removal of proteins that had precipitated out of solution during antibody incubation.
Once protein interactions are identified as unique to bait isolations and absent in the control isolations, the interactions of interest must be further validated. Validation of interactions is frequently carried out by performing reciprocal isolations, i.e. isolation of the newly identified protein as a bait and confirmation of the co-isolation of the initial protein of interest as a prey. However, this strategy is limited by the availability and cost of antibodies recognizing the proteins of interest or the requirement of tagging novel proteins. For example, reciprocal IPs were used to confirm putative interactions identified in a study of mitotic protein complexes in cells expressing LAP-tagged bait proteins.103 The large-scale nature of this study allowed the comparison of numerous samples, and proteins found in multiple isolations or in control purifications were considered contaminants. This combination of control parameters led to relatively low rates of false-positive identifications. While reciprocal isolations are valuable in confirming interactions, these approaches do not distinguish between direct and indirect interactions. Additionally, reciprocal isolations can be challenging depending on the abundances of the bait or prey proteins. If the analyzed prey is abundant and involved in numerous complexes, it may prove difficult to isolate the prey protein at a sufficient level for confirmation of an interaction with a low abundance protein. Alternatively, if the prey protein is of low cellular abundance, its isolation may be difficult and confirmation experiments will depend on the achievable efficiency of isolation. Other common approaches for validation of interactions include assessing co-localization of bait and prey proteins6, 103, as well as determining whether the interaction is direct by using binary interaction assays104 (e.g., yeast two-hybrid).
Following optimization of IP conditions to reduce non-specific contaminants, further analysis based on mass spectrometry data can be applied to recognize the remaining non-specific associations. This is commonly done by statistical analysis of label-free qualitative or quantitative data. The qualitative approach compares the presence or absence of a protein in bait and control isolations, and is most commonly applied to large-scale studies that provide sufficient data for statistical analyses. Due to the continuous recent advancements in quantitative mass spectrometry, the quantitative approach is routinely implemented in IP studies of small or large scale, as described below.
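The qualitative comparison described above can be illustrated with a minimal sketch. It assumes that each isolation has been reduced to a set of identified protein names. The function name, the replicate thresholds, and the example identifiers are hypothetical and are not taken from any of the cited studies.

```python
def qualitative_filter(bait_runs, control_runs, min_bait_hits=2, max_control_hits=0):
    """Keep proteins seen in at least `min_bait_hits` bait isolations
    and in at most `max_control_hits` control isolations.

    bait_runs / control_runs: lists of sets of protein names, one set per isolation.
    Both thresholds are illustrative assumptions.
    """
    candidates = set().union(*bait_runs)
    kept = []
    for protein in sorted(candidates):
        bait_hits = sum(protein in run for run in bait_runs)
        control_hits = sum(protein in run for run in control_runs)
        if bait_hits >= min_bait_hits and control_hits <= max_control_hits:
            kept.append(protein)
    return kept


# Hypothetical example: two bait replicates and two tag-only controls.
bait_runs = [{"BAIT", "PREY1", "TUBB"}, {"BAIT", "PREY1", "HSPA8"}]
control_runs = [{"TUBB", "HSPA8"}, {"TUBB"}]
print(qualitative_filter(bait_runs, control_runs))
```

A presence/absence filter of this kind discards every protein that appears in the controls, which is why abundant true interactors such as tubulin or heat shock proteins can be lost, as discussed in the following paragraph.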
Highly abundant proteins are frequently assigned as common contaminants in IP studies. However, interpreting the specificity of such proteins can be challenging. While their presence in isolated complexes can be derived from non-specific associations, true interactions may also be excluded as a side-effect of their abundances (e.g., tubulin, actin, heat shock proteins). Several large-scale studies of the yeast proteome aimed to address this and other challenges in identifying specific interactions by developing statistical analysis approaches based on the presence or absence of interactions across multiple isolations.105-107 For example, Gavin and colleagues introduced the “socio-affinity” index (SAI) as a tool to assign interaction specificity.105 The SAI describes the tendency of two proteins to be present in reciprocal isolations or to associate in isolations of other tagged proteins, which allows retention of highly abundant proteins that would otherwise be removed.105 In another large-scale study, Krogan and colleagues applied two mass spectrometry methods, MALDI-TOF and LC-MS/MS, to identify interactomes of TAP-tagged yeast proteins with high coverage and confidence.106 Using two rounds of machine learning algorithms trained on the curated MIPS database108 of protein complexes, probabilities were assigned to each pairwise interaction. The combined dataset from the above-mentioned studies105, 106 was later analyzed with the purification enrichment (PE) scoring system, which takes into account evidence for and against interactions with the goal of decreasing the presence of false-positives.107 A recent study by Babu et al. utilized the PE score to analyze membrane-protein complexes in yeast, providing insights into the organization of eukaryotic membranes.10 Similar and alternative computational approaches have also been employed in IP studies in mammalian systems. For example, Jeronimo et al. studied human protein complexes involved in transcription and RNA processing using several computational approaches.109 First, the Mascot110 scores for proteins in control isolations were used to account for non-specific binding of proteins to the resin and abundance of proteins in the isolation. Second, an interaction reliability (IR) score was assigned to each interaction, which was based on both the Mascot score and its network connections. Another human interactome study conducted by Ewing et al. used FLAG-tagged bait proteins with diverse biological functions for IP and high-throughput mass spectrometry analysis.34 To remove non-specific interactions, in addition to removing frequently associated proteins, a partial least squares-based regression model was built to assign interaction confidence scores to observed prey proteins.
In more recent IP-MS studies, quantitative information from mass spectrometry has become a valuable tool for the analysis of interaction specificity. For example, Rinner et al. used a MasterMap integrating MS1 signals from several LC-MS runs to identify specific interaction partners of FoxO3A.111 To increase specificity and sensitivity of analysis, proteins isolated in HA-FoxO3A IP were sequentially added in increasing amounts into control isolations. As a result, protein profiles extracted from LC-MS analyses showed progressive enrichment of specific interacting partners, while non-specific interactions remained identified at constant levels. Information on the number of MS/MS spectra matched to each protein (spectral counts) derived from IP-MS analysis is becoming increasingly popular for label-free quantification. To account for protein length and variation between separate MS analyses, an approach utilizing normalized spectral abundance factors (NSAF) was developed.112 NSAF values are calculated from the total number of spectra for each protein, normalized to its length and the total number of spectra for all proteins in the sample.112 Therefore, NSAF values can provide a view of the relative abundance for each protein among co-isolated proteins, highlighting proteins that may be prominent interactions. Sardiu et al. utilized NSAF values for building protein networks of chromatin remodeling complexes.113 To determine probable contaminants, an NSAF ratio was calculated for each protein by comparing IPs of FLAG-tagged bait and FLAG control.113 In a recent modification of this strategy, Tsai et al. integrated NSAF values with the estimated proteome abundance (PAX) values from PaxDb to build protein interaction networks for proteins with limited known functions.6 PAX values are reflective of the approximate total cellular abundance of proteins.114 Therefore, normalization of NSAF to PAX values6 provides a means for assessing the relative enrichment of proteins within an isolated complex, while correcting for the possible bias resulting from their total cellular abundances. This approach led to the identification of SIRT7 interactions with nucleolar chromatin remodeling complexes, providing insight into its role in rDNA transcription.6 While we expect PAX values to continue providing useful information in interaction studies, several considerations must be taken into account. Proteins are well known to have differential expression levels in distinct cell types and tissues. While PAX values have already been derived for proteins in multiple tissue types, including brain, heart, liver, and lung, information is still unavailable for many cell types and many proteins. Nevertheless, this concern is partly addressed by the fact that many highly abundant proteins, such as cytoskeletal and heat shock proteins, tend to be consistently abundant across different cell types. Therefore, even if the current PAX values may not reflect the precise levels of many medium or low abundance proteins, these values may still allow for correction of the more highly-abundant proteins. One additional concern that remains more difficult to address is the fact that proteomes can be significantly altered by environmental stimuli or pathogens. As an example, viruses can trigger substantial overexpression of certain proteins (e.g., 10-20-fold increase in p53 levels in cytomegalovirus-infected cells115). Therefore, future analyses of PAX values in the presence of different environmental and pathogenic stressors would provide a means for increasingly accurate normalizations.
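As an illustration of the NSAF and PAX normalizations discussed above, the following sketch assumes the standard NSAF definition: the spectral count of each protein is divided by its length, and the result is divided by the sum of these length-normalized counts over all proteins in the sample. The function names, the floor used for proteins absent from the control, and all input values are hypothetical. PAX values are assumed to be supplied as a dictionary of abundance estimates.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    spectral_counts: {protein: number of spectra}
    lengths:         {protein: length in amino acids}
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def nsaf_ratio(bait_nsaf, control_nsaf, floor=1e-6):
    """Bait/control NSAF ratio; `floor` (an assumption) avoids division by zero
    for proteins that were not detected in the control isolation."""
    return {p: v / max(control_nsaf.get(p, 0.0), floor) for p, v in bait_nsaf.items()}


def nsaf_over_pax(nsaf_values, pax):
    """Divide NSAF by an estimate of cellular abundance (PAX).
    Proteins without a usable PAX value are skipped rather than guessed."""
    return {p: v / pax[p] for p, v in nsaf_values.items() if pax.get(p)}


# Hypothetical two-protein sample: equal NSAF, very different cellular abundance.
values = nsaf({"A": 100, "B": 50}, {"A": 500, "B": 250})
enrichment = nsaf_over_pax(values, {"A": 1000.0, "B": 10.0})
print(values, enrichment)
```

In this toy case both proteins have the same NSAF, but protein B is far less abundant in the cell, so its NSAF/PAX value is 100-fold higher, which is the kind of correction for cellular abundance described in the text.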
Other computational algorithms utilizing spectral counting for assessing specificity of interactions were developed to expand application to different experimental workflows (e.g., use of control and/or reciprocal IPs in the experimental setup). In a global proteomic analysis of deubiquitinating (Dub) enzymes, Sowa et al. developed the CompPass software that uses two scoring metrics to identify high-confidence candidate interacting proteins from parallel, non-reciprocal datasets obtained from FLAG-HA-Dub IPs.116 A Z-score was applied to weigh the presence of unique interactions in multiple immune complexes, and a D-score was used to take into account the abundance of the interactions (total spectral counts, TSC) and their reproducibility. The effectiveness of this approach for generating the Dub interactome was demonstrated by validation analyses using reciprocal IPs, parallel IPs in different cell lines, and comparisons to previously reported interactions. Interestingly, when CompPass was compared to the SAI approach developed by Gavin, et al.105, only 47% of interactions were found to overlap. This observation could be due to different experimental workflows (e.g., cell lysis conditions, sample preparation) or the fact that reciprocal IPs were not included in the initial CompPass assessment. Additionally, when CompPass was compared to the NSAF method used by Sardiu et al.113, and D-scores were calculated based on NSAF or TSC values, the overlap of identified bona fide interactions increased to 87%. One experimental workflow difference between these studies was that Sardiu et al. utilized several control IPs in their analysis, while the CompPass study relied on multiple parallel, non-reciprocal IPs. The CompPass algorithm was successfully applied to studies of E7 human papillomavirus protein interactions, bromodomain protein 4 associations that are important for its transcription regulatory functions, and others.117-119
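To make the two CompPass metrics more concrete, the sketch below computes a Z-score and a D-score from total spectral counts. The review does not give the formulas, so the forms used here are an assumption based on the commonly cited description of these scores: the Z-score compares the TSC of a prey in one bait isolation with its mean and standard deviation across all baits, and D = sqrt((k/f)^p × TSC), where k is the number of baits, f is the number of baits in which the prey was detected, and p is the number of replicates in which it was detected. The function name and the data layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def comppass_like_scores(tsc):
    """Return {(bait, prey): (z_score, d_score)}.

    tsc: {bait: {prey: [TSC in replicate 1, TSC in replicate 2, ...]}}
    The scoring formulas are an assumed, simplified form (see lead-in).
    """
    baits = list(tsc)
    k = len(baits)
    preys = {prey for runs in tsc.values() for prey in runs}
    scores = {}
    for prey in preys:
        # Average TSC of this prey in every bait isolation (0 if not detected).
        avg = [float(mean(tsc[b][prey])) if prey in tsc[b] else 0.0 for b in baits]
        f = sum(1 for x in avg if x > 0)
        mu, sd = mean(avg), pstdev(avg)
        for bait, x in zip(baits, avg):
            if x == 0:
                continue
            p = sum(1 for count in tsc[bait][prey] if count > 0)
            z = (x - mu) / sd if sd > 0 else 0.0
            d = math.sqrt((k / f) ** p * x)
            scores[(bait, prey)] = (z, d)
    return scores


# Hypothetical data: prey "P" is unique to bait B1, prey "C" is found with every bait.
example = {
    "B1": {"P": [4, 4], "C": [9, 9]},
    "B2": {"C": [9, 9]},
    "B3": {"C": [9, 9]},
    "B4": {"C": [9, 9]},
}
print(comppass_like_scores(example))
```

In this toy dataset the ubiquitous prey receives a lower D-score than the unique prey despite its higher spectral count, which reflects the intended weighting toward reproducible, bait-specific associations.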
In 2010, the Nesvizhskii laboratory introduced the SAINT (significance analysis of interactome) algorithm, which not only generates confidence scores for protein-protein interactions based on mass spectrometry data, but also builds a probability model for the distributions of true and false interactions.7,120 This approach was applied to the construction of global kinase and phosphatase interaction networks in yeast, and further optimized for easy implementation in the analysis of various datasets.120 The SAINT algorithm combines experimental data for interactions between bait and prey proteins with values derived from negative controls (e.g., control tag isolations) through a semi-supervised approach. The requirement for negative controls can be avoided if sufficient numbers of distantly connected baits are used for IPs (unsupervised approach).120 Overall, SAINT has proved to be a useful tool for the analysis of both large- and small-scale datasets, as even single-bait IPs performed with sufficient numbers of control IPs can be analyzed by this method.
As an alternative to spectral counting, a specialized scoring system termed MiST (mass spectrometry interaction statistics) was devised by Jager et al. for identification of HIV-1-host protein interactions following transfection of HIV-1 proteins into HEK293 or Jurkat T cells.121 The MiST algorithm was designed to overcome some of the limitations of aforementioned spectral counting-based approaches (e.g., their reliance on the abundance of interactions that can vary depending on intracellular abundances of bait and prey, efficiency of IP experiments, and detection by MS). MiST analysis accounted for protein abundance as indicated by peak intensities, reproducibility of abundance across replicates, and specificity of interactions across all purifications.
Overall, these label-free techniques for assessing interaction specificity can be easily incorporated into small- or large-scale experiments, and do not require expensive reagents when compared to labeling approaches (see section below). However, such methods may not reliably detect small changes in protein abundances and may not comprehensively address non-specific binding to the isolated complexes (Fig. 2A).
Techniques incorporating metabolic labeling were introduced into proteomics workflows to address the need for accurate relative quantification. Incorporation of stable isotopes into proteins in cell culture provides global labeling of all expressed proteins and a means for simultaneous processing of samples for comparison, thus minimizing experimental variations and improving accuracy in relative quantification. Metabolic labeling was first utilized by Oda et al. in the form of 15N labeling in yeast.122 Mann and colleagues developed the SILAC (stable isotope labeling by amino acids in cell culture) approach, in which heavy amino acids containing 2H, 13C, and/or 15N were introduced into growing mammalian cells.123 This method simplified quantification because the expected mass differences between heavy and light peptides were known.
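The predictability of the heavy/light mass difference can be shown with a short calculation. The sketch assumes one particular labeling scheme, 13C6-lysine (+6.0201 Da) and 13C6,15N4-arginine (+10.0083 Da), which is a common choice but is not specified in the text. The function name and the peptide sequences are hypothetical.

```python
# Monoisotopic mass shifts per labeled residue, in Da (assumed labeling scheme).
LABEL_SHIFT_DA = {"K": 6.0201, "R": 10.0083}


def silac_mz_shift(peptide, charge, shifts=LABEL_SHIFT_DA):
    """Expected m/z difference between the heavy and light form of a peptide.

    The mass shift is the sum of the shifts of all labeled residues in the
    sequence; dividing by the charge state gives the spacing seen in MS1.
    """
    mass_shift = sum(shifts.get(residue, 0.0) for residue in peptide)
    return mass_shift / charge


# A tryptic peptide ending in one lysine, observed at charge 2+.
print(silac_mz_shift("LVNELTEFAK", 2))
# A peptide with a missed cleavage carries both a lysine and an arginine.
print(silac_mz_shift("AKR", 2))
```

Because tryptic peptides usually end in lysine or arginine, nearly every peptide in such an experiment carries at least one label, so heavy and light partners appear as pairs with a known spacing.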
More recently, metabolic labeling has been used to distinguish specific from non-specific interactions in IP experiments. The I-DIRT (Isotopic Differentiation of Interactions as Random or Targeted) method was developed by Chait and colleagues, in a study of the yeast DNA polymerase ε complex, to overcome the challenge of determining contaminants that bind non-specifically to the isolated protein complex (Fig. 2A).124 In this approach, cells expressing an affinity-tagged protein are grown in light isotopic medium, while wild-type cells are cultured in heavy medium. As the IP is performed from a 1:1 mixture of light- and heavy-cultured cells, specific associations to the tagged bait are detected by MS analysis to have light peptides only, while non-specific interactions have light and heavy peptides. Because the complexity of the mixed sample is increased in this approach, especially for non-specific components, Tsai et al. demonstrated that a targeted “SRM-like” I-DIRT approach can be used to increase identification of specific and low abundance interactions of interest in mammalian cells.6 One limitation of the I-DIRT technique is that fast-exchanging interactions can be falsely assigned as non-specific. To differentiate between specific stable, specific transient, and non-specific interactions, a transient I-DIRT strategy using cross-linking in cell culture was developed and applied to study the multi-subunit complex NuA3.125 In the MS analysis, stable interactions produced 100% light peptides, non-specific interactions produced 50% light, while transient interactions generated peptides in an intermediate range (between 50% and 100% light).
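The interpretation of I-DIRT data described above can be sketched as a classification by the fraction of light signal. The expected values (about 100% light for stable, about 50% for non-specific, and intermediate for transient interactions) come from the text; the numeric cut-offs and the function names are illustrative assumptions, and real data would need thresholds justified by measurement error and replicate variation.

```python
def light_fraction(light_intensity, heavy_intensity):
    """Fraction of the total peptide signal that comes from the light (tagged) cells."""
    return light_intensity / (light_intensity + heavy_intensity)


def classify_idirt(light_intensity, heavy_intensity, stable_min=0.9, nonspecific_max=0.6):
    """Assign an interaction class from the light fraction.

    stable_min and nonspecific_max are hypothetical cut-offs around the ideal
    values of 1.0 (specific, stable) and 0.5 (non-specific).
    """
    fraction = light_fraction(light_intensity, heavy_intensity)
    if fraction >= stable_min:
        return "specific (stable)"
    if fraction <= nonspecific_max:
        return "non-specific"
    return "specific (transient)"


for light, heavy in [(100, 0), (50, 50), (75, 25)]:
    print(light, heavy, classify_idirt(light, heavy))
```

The intermediate class is only meaningful for the transient I-DIRT variant with cross-linking; in the original workflow, an intermediate light fraction would more likely reflect exchange during the isolation.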
While I-DIRT was developed for experiments with tagged proteins, a separate approach was designed for the study of endogenous protein complexes. The QUICK (quantitative immunoprecipitation combined with knockdown) approach integrates metabolic labeling with RNAi to assess interaction specificity, as demonstrated for β-catenin and Cbl interactions.126 By RNAi-mediated knockdown of a protein of interest in light-labeled cell culture, while preserving expression of the target protein in heavy-labeled culture, specific interactions can be distinguished from non-specific interactions through comparison of light and heavy peptide intensities. This approach was applied in several studies, including the assessment of Stat3 interactions and the involvement of human leucine-rich repeat kinase 2 (Lrrk2) in cytoskeleton function.127, 128 Retention of specific versus non-specific interactions also depends on whether purification is done before mixing (MAP, mixing after purification) or after mixing (PAM, purification after mixing) of cell lysates when using SILAC.129 The PAM-SILAC approach is useful in distinguishing stable interactions from background interactions, but can miss dynamic interactions. To overcome this challenge, Huang and colleagues proposed a combination of time-controlled (Tc) PAM-SILAC and MAP-SILAC.130 As shown, dynamic interactions had significantly increased SILAC ratios with the MAP-SILAC approach, while behaving like background proteins (1:1 SILAC ratio) in the PAM-SILAC approach. This method was used for studying the human 26S proteasome, the tumor suppressor PTEN, and others.130, 131
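The logic of combining PAM-SILAC and MAP-SILAC measurements can be summarized in a few lines. In this sketch a ratio is defined as the signal from the bait-expressing culture divided by the signal from the control culture, so background proteins sit near 1. The cut-off of 2.0 and the function name are hypothetical and would need to be set from the spread of the background population in real data.

```python
def classify_by_pam_map(pam_ratio, map_ratio, cutoff=2.0):
    """Combine SILAC ratios from purification-after-mixing (PAM) and
    mixing-after-purification (MAP) experiments.

    stable:     enriched even when lysates are mixed before purification
    dynamic:    enriched only when lysates are mixed after purification
    background: close to 1:1 in both experiments
    """
    if pam_ratio >= cutoff:
        return "stable"
    if map_ratio >= cutoff:
        return "dynamic"
    return "background"


# Hypothetical ratios for three proteins: (PAM, MAP).
for name, (pam, mapr) in {"P1": (8.0, 9.0), "P2": (1.1, 6.0), "P3": (1.0, 1.2)}.items():
    print(name, classify_by_pam_map(pam, mapr))
```

A dynamic interactor exchanges between light and heavy pools once lysates are mixed, which is why its PAM ratio collapses toward 1:1 while its MAP ratio remains high.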
As metabolic labeling is not applicable to analyzing all sample types (e.g., tissues) and requires culturing cells for several passages in heavy media to ensure stable isotope incorporation, chemical labeling methods provide valuable alternatives. Chemical labeling is typically done after IP, either at the protein or peptide level. Therefore, this approach cannot account for variations in sample processing prior to the point of labeling. Chemical labeling has been utilized for the analysis of specificity of interactions. ICAT (isotope-coded affinity tag) was developed by Aebersold and co-workers for use in quantification in MS analyses and was applied to the characterization of protein complexes.132, 133 In this approach, chemical labeling of cysteine residues on intact proteins was performed with either heavy or light ICAT reagents in bait and control purifications, respectively. Specific interactions were distinguished from background proteins by comparison of heavy and light ICAT-labeled peptides. However, the limitations of this method include loss of information about non-cysteine containing peptides, and a reduced ability to characterize differences in post-translational modifications. Chemical labeling with isobaric tags has also been implemented in studies of interaction specificity. iTRAQ, a multiplex isotopic labeling method for amine-containing N-termini and lysine residues of peptides134, was utilized for distinguishing specific interactions in IP experiments.135 As the methodology for multiplexed chemical labeling continues to advance, we envisage that these approaches will become more extensively incorporated into studies of protein complexes and determination of specificity of interactions.
Following the isolation of protein complexes, mass spectrometry analysis, and determination of interaction specificity, the frequently large number of identified putative interactions presents a challenge for determining biologically significant targets, even in small-scale studies with a single bait protein. Therefore, detailed computational analysis is required, involving determination of ontological protein relationships and construction of functional protein networks, to deduce the biological significance of identified protein complexes (Fig. 3). This section discusses the availability and advantages of recently developed computational tools. These include resources for generation and visualization of protein network maps, and deduction of interaction clusters based on functional enrichment and co-localization patterns. Due to the high number and variety of such tools, Tables 2 and 3 aim to provide a representative list of commonly used software tools and database resources to date.
As a first step in interaction data analysis, all proteins must have an associated identifier, which allows retrieval of known attributes (e.g., chemical and biological feature sets) from relational databases. While the selection of identifiers is not trivial, the choice is primarily between two distinct classes: gene-centric identifiers or protein/organism-specific identifiers. For the former, mapping of protein groups back to gene symbols often provides the greatest coverage across databases and does not a priori restrict attribute retrieval by organism. In contrast, selection of database-specific (UniProt accessions, NCBI gi numbers) or species-specific (SGD, MGD, FlyBase) identifiers is often advantageous for retrieving the most accurate functional attributes for the intended target protein/protein group. Unless the software tool provides user control of organism selection, the latter identifier system is recommended, particularly since many well-developed knowledgebases, such as UniProt, allow facile conversion of accessions to other identifiers. As a second step, gathering general information for a list of putative protein interactions (e.g., protein length, amino acid sequence, phylogenetic data, protein abundance) is often beneficial for subsequent interaction network analysis, as this information may be used to generate a richer picture of the topology of individual protein clusters or larger protein networks. For example, NSAF and PAX abundance values, discussed earlier in this review, reveal valuable information on the extent of enrichment of individual proteins within a protein cluster and could point to a housekeeping role for highly represented interactions. The calculation of such values depends on the availability of general information on protein properties, in particular protein length and global protein abundance. Protein interaction databases, discussed in detail below, are useful resources for obtaining such protein characteristics, but the major protein knowledgebases, such as UniProt (Universal Protein Resource Database)136, PIR (Protein Information Resource)137, and RefSeq (NCBI Reference Sequence collection)138, are primarily used for this purpose.
Having gathered initial details on all protein interactions to be analyzed, a common approach used for reconstructing protein complexes and performing network analysis involves the utilization of protein interaction databases (PIDs). These PIDs often contain both direct and indirect protein functional relationships, which have been collected from published literature, experimental data, text-mined abstracts, and computationally-based predictions representing the likelihood of interaction. Table 2 presents a detailed list of universally applicable PIDs. Information about proteins and protein interactions is collected by the process of curation (e.g. MINT139, IntAct140, 141, MPact142) or by employment of prediction software that analyzes previously published data by text-mining (e.g. Predictome143, 144). Many databases are built on both approaches and are complemented by data deposition from scientists in the appropriate field in an effort to generate the most comprehensive database possible (e.g. HPRD145, DIP146, MPIDB147). Generally, the curation approach, which relies primarily on experimental data, is preferred to prediction-based computational tools, as it is based on manual gathering of information from published literature performed by recruited expert scientists. This is why many databases are compiled solely on the basis of manual data curation, with some being even more stringent in their data collection and including only information from experimental data (e.g. MINT139, DIP146). Since variation in the interpretation of data due to human error is possible, it is useful to perform data analysis based on information drawn from multiple PIDs. This strategy was recently applied by Meixner et al.128, utilizing two independent databases (HPRD145 and BioGRID148) for characterizing interactions of the Leucine-rich Repeat Kinase 2.
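The use of several PIDs in parallel can be illustrated by counting how many independent sources report each interaction. The sketch assumes that interactions have already been exported from each database as pairs of identifiers in the same namespace. The function name, the threshold, and the example pairs are hypothetical and do not represent actual database content.

```python
from collections import Counter


def supported_pairs(*databases, min_sources=2):
    """Return the interactions reported by at least `min_sources` databases.

    Each database is an iterable of (protein_a, protein_b) pairs. Pairs are
    treated as undirected, so (A, B) and (B, A) count as the same interaction.
    """
    counts = Counter()
    for database in databases:
        # Normalize orientation and remove duplicates within one database.
        unique_pairs = {tuple(sorted(pair)) for pair in database}
        for pair in unique_pairs:
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_sources}


# Hypothetical exports from two databases for the same bait.
db_one = {("LRRK2", "X1"), ("LRRK2", "X2")}
db_two = {("X1", "LRRK2"), ("LRRK2", "X3")}
print(supported_pairs(db_one, db_two))
```

Requiring agreement between sources trades coverage for confidence; setting `min_sources=1` instead returns the union of all reported interactions.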
In addition to the variation in the method of data collection, PIDs also differ in other aspects, including the range of covered species, how comprehensive a given database is, and the specificity of input data (e.g., protein-protein, protein-polysaccharide, protein-small molecule interactions). Thus, it is important to take such differences into account when selecting the databases that best suit a particular study. Multiple databases cover a broad range of species, without particular limitations, including IntAct140, 141, BIND149, DIP146, and MINT139, while others are more specialized and focus on specific species or a subset of commonly studied ones. The MPact142 and the HPID150 databases, for example, include only data from the yeast species Saccharomyces cerevisiae and from humans, respectively, while the MITOP2151 database is built on data exclusively from yeast, mouse, Arabidopsis thaliana, Neurospora, and humans. There are also databases, such as HPID, that integrate information from other resources like BIND, DIP, and HPRD. Such “secondary” databases collect information from multiple primary databases, offering a comprehensive set of information about putative protein interactions (e.g. IMEx152). At times, more specialized databases can allow for more specific analyses, especially when a particular type of protein or system is the focus of investigation. One such case is the MatrixDB153 database, which aims to report data primarily from extracellular protein interactions and protein-polysaccharide associations. Another is the HIV-1 Human Protein Interaction Database154, which, as implied, collects data specifically on HIV-1 viral proteins and their known protein interactions.
One concern with the utilization of a wide variety of interaction databases over the years has been the fact that with such a vast array of data available, reported and analyzed in many different formats and on multiple platforms, a universal system is required to standardize all molecular interaction data. The initiative to solve this problem was launched by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI)155 with the objective to establish community standards for proteomics data representation and to facilitate exchange, verification, and comparison of interactions data. As a result, the International Molecular Exchange (IMEx) consortium (Table 2) to standardize protein interaction data curation was created152, and data providers (e.g., DIP, IntAct, MINT, BioGRID, MatrixDB) have agreed to coordinate literature curation efforts and share information to decrease data redundancy. Another step forward was the introduction of standard data formats, such as Molecular Interaction (MI) extensible markup language (XML), MITAB156, and mzIdentML157.
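As an example of how a standard format simplifies data exchange, the following sketch reads one record in the tab-delimited MITAB format. It assumes the 15-column MITAB 2.5 layout, in which the first two columns hold the identifiers of interactors A and B in `database:identifier` form, column 7 holds the interaction detection method, and column 13 holds the source database. The function name and the example record, including its placeholder identifiers, are hypothetical.

```python
def parse_mitab_line(line):
    """Extract a few fields from one MITAB 2.5 record (15 tab-separated columns)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 15:
        raise ValueError("expected at least 15 tab-separated columns")

    def first_identifier(field):
        # Multiple identifiers are separated by '|'; each has the form 'db:id'.
        first = field.split("|")[0]
        return first.split(":", 1)[1] if ":" in first else first

    return {
        "interactor_a": first_identifier(columns[0]),
        "interactor_b": first_identifier(columns[1]),
        "detection_method": columns[6],
        "source_database": columns[12],
    }


# Placeholder record: '-' marks an empty column in MITAB.
fields = ["uniprotkb:BAIT_ID", "uniprotkb:PREY_ID|refseq:OTHER_ID"] + ["-"] * 13
fields[6] = "method-field"
fields[12] = "source-field"
print(parse_mitab_line("\t".join(fields)))
```

Because every compliant database exports the same column order, one small parser of this kind can be reused across resources, which is the practical benefit of the standardization effort described above.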
The next step in data analysis involves the generation of network maps and their visualization. Some interaction databases have incorporated tools for these purposes (e.g., STRING158, VisANT144, Bacteriome159 in Table 2). STRING, for example, creates interactive protein networks with customizable view options of interaction clusters built on the basis of fusion, co-expression, experimental, neighborhood, and text-mining data, among other methods. However, protein interaction databases are best utilized when complemented by other tools specifically developed for visualization and representation of protein networks and maps. Table 3 provides a list of some of these commonly utilized computational tools. Of note, Cytoscape160 is a widely used software platform that supports genetic and molecular interaction datasets in multiple formats and allows for advanced data analysis and modeling. This effective tool is commonly used for functional protein annotation and visual mapping of interactions from small- and large-scale studies. In addition, numerous Cytoscape plug-ins are available to integrate external functional attributes to the network of interest, including tools for protein clustering by gene ontology annotations, as discussed below. Other examples of useful resources include Pathway Palette161 and Arena3D162. The first is practical for representing proteomics data in the context of biological pathways or protein networks on the basis of peptide rather than protein input data. Arena3D has been specifically developed for three-dimensional representations of complex data sets and larger protein networks. Such platforms have been incorporated in recent analyses of proteomics data from affinity isolation-based studies.128, 163
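Before a network is loaded into a visualization tool, it is essentially a list of protein pairs. The sketch below builds an undirected network from such pairs and splits it into connected components, which is one simple way to see candidate clusters. It uses only the Python standard library and is not the algorithm of any tool named above. The function names and the example edges are hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Undirected adjacency map {protein: set of neighbors} from (a, b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group proteins that are linked directly or through other proteins."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical interaction list containing two separate clusters.
network = build_network([("A", "B"), ("B", "C"), ("D", "E")])
print(connected_components(network))
```

Connected components are a coarse grouping; dedicated tools apply more refined clustering, for example by edge confidence or shared annotation.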
Interpretation of large datasets of protein interactions can be a challenging task but it is facilitated greatly by analysis of clustered protein associations based on gene ontology (GO) annotations. Functional analysis based on GO annotations can be utilized before or after constructing the protein network maps. This piece of the puzzle provides valuable functional information about the putative biological roles of identified protein complexes and can point researchers in the right direction for further follow-up studies. The three main GO categories that are particularly informative for interpretation of protein interactions are biological process, cellular component, and molecular function (Fig. 3). Biological process clusters reveal a general process or pathway that protein complexes may be involved in, for example cell cycle or transcription; cellular component clusters provide information about the putative localization of identified proteins within cellular compartments or established complexes; and molecular function points to the role of protein in the cell with regard to its activity, for example, kinase or demethylase activity. If the functional analysis based on GO annotations is performed prior to constructing protein networks, this analysis can help narrow down the list of putative interaction partners to specific categories of interest (e.g., protein localization or biological role). Therefore, protein interactions from each category can be further subjected to network analysis to highlight any well-defined protein complexes. Additionally, in cases where the localization of the bait is strictly distinct (e.g., nucleus, mitochondria), contaminating protein interactions restricted to other compartments can be easily filtered if the interaction data are first clustered by GO localization.
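The statistical enrichment of a GO term within a list of interactions is often assessed with a hypergeometric (one-sided Fisher) test, and the sketch below assumes that choice; individual tools may use different tests and multiple-testing corrections. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, annotated_in_sample):
    """Probability of observing at least `annotated_in_sample` annotated proteins
    when `sample` proteins are drawn from a `population` in which `annotated`
    proteins carry the GO term (upper tail of the hypergeometric distribution).
    """
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(annotated_in_sample, upper + 1)
    )
    return favorable / comb(population, sample)


# Hypothetical case: 40 of 50 identified interactions carry a term that
# annotates 400 of 20,000 proteins in the background proteome.
print(hypergeom_enrichment_p(20000, 400, 50, 40))
```

The choice of background matters: using the whole proteome rather than the set of proteins detectable in the experiment tends to overstate enrichment for abundant, well-annotated protein classes.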
Numerous resources are available for GO annotation. Table 3 provides a list of key features for some of these resources, such as GOA164, GOMiner165, and GO::TermFinder166. Particularly useful are multiple plug-ins developed for Cytoscape.160 For instance, ClueGO167, BiNGO168 and DAVID169 were specifically developed to provide GO classification, statistical enrichment of GO terms, and subsequent visualization. These are commonly used in large- and small-scale proteomics studies. For example, Echeverria et al. integrated ClueGO in an analysis of the heat shock protein Hsp90 molecular chaperone machinery to build a functional map of protein clusters linked to biological processes.170 In another study, Doolittle and Gomez utilized DAVID as a tool to analyze the biological role of predicted protein interactions formed during Dengue virus infection in human and insect host cells.171
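Statistical enrichment of a GO term is often assessed with a one-sided hypergeometric test, which asks how likely it is to observe at least the given number of annotated proteins in a list drawn at random from the background proteome. The Python sketch below shows this calculation with hypothetical counts; it is an illustration of the general test, not a description of the exact statistics implemented in any of the tools named above, and a real analysis would additionally correct for testing many terms.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    background : number of proteins in the background set (N)
    annotated  : background proteins carrying the GO term (K)
    sample     : number of proteins in the interaction list (n)
    hits       : proteins in the list carrying the GO term (k)
    """
    total = comb(background, sample)
    upper = min(sample, annotated)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts (assumption): 20 background proteins, 5 annotated
# with the term, 4 proteins in the interaction list, 3 of them annotated.
p_value = hypergeom_enrichment_p(background=20, annotated=5, sample=4, hits=3)
```

With these counts the tail probability is (10 x 15 + 5 x 1) / 4845 = 155/4845, or roughly 0.032.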
Overall, these tools can aid in analyzing large datasets obtained from immunoaffinity purification studies by constructing protein interactome networks. Identification of novel complexes and/or functional relationships is often complemented by orthogonal validation of selected protein associations of interest, as described in the section on “Determining specificity of interactions”. The resulting functional networks can serve as a foundation for understanding the biological roles of the observed associations.
Cross-linking strategies coupled to mass spectrometry analyses have become fundamental tools for studying protein interactions and complexes. This section describes some of the main cross-linking methods in use today and their application to protein interaction studies. In particular, approaches to gain structural insights into protein complexes or to identify specific stable and transient interactions in vitro and in vivo will be discussed.
Cross-linking is the process of joining molecules through formation of covalent bonds. Reagents for protein cross-linking include moderately-reactive reagents for chemical cross-linking and highly-reactive intermediates generated during photo-cross-linking.172 Currently, chemical cross-linking is more frequently used in MS studies than photo-cross-linking. The specificity of cross-linking reagents determines the targeted protein groups to be linked (e.g., amines, sulfhydryls, carboxyls), while the size of the cross-linker determines the distance between captured proteins (Fig. 4). To optimize cross-linking strategies for MS analysis (i.e., to reduce the complexity of cross-linking samples and improve detection of cross-linked peptides), cross-linking reagents with several new features were developed. These features include affinity tags, chemically- or MS-reversible cross-linkers, and isotope labels for ease of detection and identification by MS. Several comprehensive reviews describe the structures and properties of available cross-linking reagents.172-174 Figure 4 illustrates how different cross-linking strategies can be utilized to study protein interactions within existing mass spectrometry workflows.
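Because the size of the cross-linker limits how far apart two linked residues can be, identified cross-links are often interpreted as upper-bound distance restraints. The following Python sketch checks hypothetical cross-linked residue pairs against a distance cutoff. The coordinates, residue labels, and the 30 Å cutoff are illustrative assumptions; the appropriate cutoff depends on the reagent's spacer arm, side-chain length, and protein flexibility.

```python
import math


def check_restraints(coords, crosslinks, max_distance):
    """Report whether each cross-linked residue pair lies within max_distance.

    coords       : dict mapping residue label -> (x, y, z) in angstroms
    crosslinks   : list of (residue_1, residue_2) pairs identified by MS
    max_distance : upper distance bound implied by the cross-linker
    """
    results = []
    for res1, res2 in crosslinks:
        d = math.dist(coords[res1], coords[res2])
        results.append((res1, res2, round(d, 2), d <= max_distance))
    return results


# Hypothetical residue coordinates from a structural model (assumption).
coords = {
    "K10": (0.0, 0.0, 0.0),
    "K55": (3.0, 4.0, 0.0),
    "K120": (0.0, 0.0, 40.0),
}
crosslinks = [("K10", "K55"), ("K10", "K120")]

results = check_restraints(coords, crosslinks, max_distance=30.0)
```

A pair that violates the cutoff does not necessarily indicate a false identification; it may instead reflect conformational flexibility or an intermolecular link between copies of the same protein.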
Protein cross-linking coupled to MS analysis is a useful tool for structural characterization of heterogeneous protein complexes and serves as an alternative to crystallography- and NMR-based approaches. In earlier studies that utilized cross-linking coupled to mass spectrometry analysis, several challenges had to be addressed.175-177 Cross-linking reagents with different properties had to be tested to maximize product diversity for in-depth structural analysis. In addition, the amount of available starting material and the instrument sensitivity had a critical impact on the detection of cross-linked sites. Due to these limitations, early approaches were applicable to the analysis of smaller protein complexes. In recent years, cross-linking methodologies have improved through enrichment for and labeling of cross-linked peptides for easier detection, improved sensitivity in MS analysis, and development of automated algorithms for database searching.172, 173, 178
As a result of improved MS instrument sensitivity, cross-linking combined with MS analysis has been successfully used in a variety of structural studies, including analysis of the yeast 19S proteasome lid, yeast RNA Pol II, human protein phosphatase 2A (PP2A) complexes, and the phage DNA packaging machinery.179-183 Sharon et al. cross-linked the endogenous yeast 19S proteasome lid complex with the amine-reactive reagent bis(sulfosuccinimidyl)suberate (BS3) and analyzed gel bands corresponding to cross-linked proteins by MALDI-TOF/TOF. This analysis identified additional interactions within the complex.179 To improve detection of cross-linked peptides, Chen et al. applied charge-based enrichment of BS3-cross-linked peptides and high-resolution MS analysis for elucidation of a 15-subunit Pol II complex structure.180 Cross-linking using disuccinimidyl tartrate (DST) has been integrated with mutagenesis analysis in studies of phage packaging mechanisms.181 More recently, Aebersold and colleagues used isotopically light- or heavy-labeled disuccinimidyl suberate (DSS) cross-linkers to study the topology of affinity-purified human protein phosphatase 2A (PP2A) in complex with its adaptor proteins.182 To identify isotopically coded cross-linked peptides, the xQuest search engine was employed.178 The algorithm works by first analyzing MS spectra for isotopic peptide pairs, which are separately sequenced by MS/MS and analyzed for the presence of common ions or isotopically shifted, cross-linked ions. Caveats of this strategy include the requirement for efficient labeling with isotopically light or heavy cross-linkers and a labor-intensive scoring procedure.174 An alternative workflow that does not make use of isotopically labeled cross-linkers was developed by Goodlett and co-workers.184 This strategy utilizes the Popitam software185, and the cross-linked peptides are considered to be modified at unknown residues with unknown modifications.
The resulting cross-linked pair of peptides would have matching modifications, in which one peptide from the pair is modified with a mass equivalent to the mass of the other peptide in the pair plus the cross-linker, and vice versa. Limitations of this approach include the limited ability of Popitam to match the theoretical spectrum for a single peptide to the more complex spectrum of a cross-linked peptide pair, as well as the requirement for manual verification of cross-linked peptide spectra.186 To address these challenges, a database containing every possible cross-linked product was generated for use with a SEQUEST-style search.186 Additionally, xComb, a publicly available database processing tool, was introduced for use with standard proteomics search engines to simplify the identification of cross-linked peptides.187 Other bioinformatics tools were also developed to aid the interpretation of complex cross-linked peptide spectra.188-193
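The mass relationship underlying these database strategies can be sketched as follows: the mass of a cross-linked product is the sum of the two peptide masses plus the mass added by the cross-linker, so each peptide in the pair appears to carry a modification equal to its partner's mass plus the linker. The Python sketch below enumerates every candidate pair from a small peptide list and matches an observed precursor mass within a ppm tolerance. The peptide names and masses are hypothetical, the linker mass of 138.068 Da is used as an assumed value for a suberate-type linker, and the code is a conceptual illustration, not a reimplementation of xComb or any other tool.

```python
from itertools import combinations_with_replacement


def candidate_pairs(peptide_masses, linker_mass):
    """Enumerate all peptide pairs (including homotypic) with their cross-linked mass."""
    items = sorted(peptide_masses.items())
    candidates = []
    for (name_a, mass_a), (name_b, mass_b) in combinations_with_replacement(items, 2):
        candidates.append((name_a, name_b, mass_a + mass_b + linker_mass))
    return candidates


def apparent_modification(partner_mass, linker_mass):
    """Mass that one peptide appears to carry: its partner plus the linker."""
    return partner_mass + linker_mass


def match_precursor(observed_mass, candidates, tol_ppm):
    """Return candidates whose mass is within tol_ppm of the observed mass."""
    return [
        (a, b, m)
        for a, b, m in candidates
        if abs(m - observed_mass) / m * 1e6 <= tol_ppm
    ]


# Hypothetical monoisotopic peptide masses in Da (assumption, not real data).
peptide_masses = {"P1": 1000.5, "P2": 1500.75, "P3": 2000.0}
LINKER_MASS = 138.068  # assumed mass added by the cross-linker

candidates = candidate_pairs(peptide_masses, LINKER_MASS)
hits = match_precursor(2639.320, candidates, tol_ppm=10.0)
```

The number of candidates grows roughly with the square of the number of peptides, which is one reason such searches become demanding for large complexes or whole proteomes.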
Although the development of search algorithms for cross-linked peptide spectra contributed significantly to the optimization of cross-linking and MS workflows, interpretation of spectra remains labor-intensive, partly due to the low abundance of cross-linked peptides. Several strategies that address this challenge have been proposed, including enrichment of cross-linked peptides using affinity tags194-197, detection by isotope-labeling182, 198-201, fluorescence-labeling,202 or reporter tag labeling,194, 203 as well as cleavable cross-linkers194, 200, 201, 204-207. For example, Chowdhury et al. developed the CLIP (click-enabled linker for interacting proteins) reagent that contains an alkyne tag for enrichment of cross-linked peptides and a detection tag (NO₂) for identification of cross-linked peptides in MS/MS spectra.196 Zelter et al. labeled the C-termini of cross-linked peptides with stable isotopes during digestion in the presence of H₂¹⁸O, leading to peptide identification based on signature isotopic peak distributions in MS spectra.208 This method overcomes the need for isotope-coded cross-linkers, detection of peptide mass shifts, and manual inspection. To characterize the yeast 20S proteasome complex, Kao et al. utilized a disuccinimidyl sulfoxide (DSSO) cross-linker with collision-induced dissociation (CID)-cleavable sites that cause separation of cross-linked peptides at the MS/MS level, allowing the identification of peptide chain fragment ions at the MS3 level.201 Overall, the development of new cross-linking reagents in conjunction with careful mass spectrometry analysis has contributed significantly to the generation of structural views of protein complexes. In combination with computational modeling approaches, cross-linking enables reconstruction of the architectures of heterogeneous protein complexes at high levels of resolution182, and is expected to continue to impact this field of study.
Understanding the composition and assembly of macromolecular complexes can provide valuable insight into their functions. Additionally, transient interactions, such as enzyme-substrate interactions, are known to contribute significantly to the dynamic regulation of protein functions. However, these transient interactions are particularly challenging to capture by traditional IP experiments. For example, in the previously mentioned study by Malovannaya et al.102, variability between the abundances (i.e., spectral counts) of co-isolated proteins was considered indicative of non-specific binding. However, exclusion of proteins that demonstrated significant variation in their spectral counts between isolations may simultaneously eliminate transient interactions (false negatives). Cross-linking methods have provided effective tools for studying both stable and transient interactions in vitro and in vivo. As an example of an in vitro study, Lambert et al. utilized BS3 cross-linking to identify both the transient interaction itself and the specific residues involved in binding of the small heat-shock protein Hsp21 to its substrate, malate dehydrogenase (MDH).209 Cross-linking has also been successfully incorporated into IP-MS/MS workflows for identification of transient or weak interactions under stringent purification conditions. One major challenge is performing in vivo cross-linking to capture stable, as well as spatially and temporally transient, specific interactions.210 Guerrero et al. developed the QTAX (quantitative analysis of tandem affinity purified cross-linked protein complexes) approach to determine the composition of the yeast 26S proteasome.211 This approach employs a similar concept to metabolic labeling methodologies (i.e., I-DIRT124 and SILAC123, see section on “Determining specificity of interactions”), integrating it with formaldehyde cross-linking in cell culture and tandem affinity purification.
Notably, proteins at the core of the 26S proteasome were represented by smaller numbers of peptides, consistent with their likely low accessibility to trypsin. Reversible cross-linking may be used to resolve this issue. One such method, ReCLIP (reversible cross-link immuno-precipitation), was recently proposed by Smith et al.212 Cross-linking via the cell-permeable, thiol-cleavable reagents DSP (dithiobis succinimidyl propionate) and DTME (dithio-bismaleimidoethane) can be reversed using a reducing agent such as dithiothreitol (DTT). Another example of cross-linking in cell culture for identifying transient interactions is the study of the dynamic NuA3 histone acetyltransferase complex.213 Through integration of metabolic labeling using I-DIRT (see section on “Determining specificity of interactions”) with cross-linking and IP-MS/MS, transient interactions were identified and placed in the context of interaction specificity.125
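The trade-off noted earlier in this section, in which filtering on spectral-count variability may also discard transient interactors, can be illustrated with a simple coefficient-of-variation calculation. The Python sketch below is a generic illustration with hypothetical spectral counts and an arbitrary cutoff; it does not reproduce the scoring procedure of Malovannaya et al. or of any other cited study.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """Sample standard deviation divided by the mean of replicate spectral counts."""
    m = mean(counts)
    return stdev(counts) / m if m > 0 else float("inf")


def flag_variable(spectral_counts, cv_cutoff):
    """Return the proteins whose replicate counts vary more than cv_cutoff."""
    return sorted(
        protein
        for protein, counts in spectral_counts.items()
        if coefficient_of_variation(counts) > cv_cutoff
    )


# Hypothetical spectral counts across three replicate isolations (assumption).
spectral_counts = {
    "PREY_STABLE": [20, 22, 19],
    "PREY_VARIABLE": [2, 15, 0],
}

flagged = flag_variable(spectral_counts, cv_cutoff=0.5)
```

A flagged protein is only a candidate for exclusion: high variability is expected for non-specific binders but also for weak or transient partners, which is why cross-linking or quantitative labeling is used to tell the two apart.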
While numerous studies have shown the effective use of cross-linking in cell culture, its use in animal models was only recently demonstrated. Bai et al. utilized in vivo time-controlled transcardiac perfusion cross-linking214 with 4% formaldehyde to study amyloid precursor protein (APP) interactions in mice.215 This method allowed for limited cross-linking to occur in tissue, followed by purification of protein complexes in the presence of high salt and detergent concentrations, conditions that are particularly useful for studying membrane-bound proteins. As a control for non-specific interactions, parallel immunoaffinity purifications of APP paralogues were performed. While the utility of this approach was apparent from the identification of both known and novel APP interactions, several well-characterized APP-binding proteins were absent. These results highlight some of the remaining difficulties in this approach, including the precise selection of the time for in vivo cross-linking, the accessibility of the interacting surfaces for cross-linking, and the optimization of lysis buffer conditions.
A novel cross-linking reagent called PIR (protein interaction reporter), which contains labile bonds with a reporter group and an affinity tag, was developed in the Bruce laboratory with the goal of reducing sample and data output complexity in the MS/MS workflow.216 The labile bonds of PIR can be selectively cleaved following either LC separation by photo-activation or during the mass spectrometry analysis by any dissociation mechanism, thus releasing peptides that can be identified by a standard MS/MS workflow. A reporter group marks the spectra containing these cleaved products. PIR technology was shown to be effective in cell culture, as well as for cross-linking of virions.205, 217, 218
One approach that has not yet been integrated with mass spectrometry, but has the potential to be used in IP-MS/MS workflows, is the metabolic incorporation of modified amino acids that can generate cross-links in growing eukaryotic cells.219 Photo-inducible amino acids that resemble leucine and methionine were metabolically incorporated into protein chains and activated by UV light to initiate cross-linking reactions. This approach addresses the limitation of currently available cross-linking reagents that cannot efficiently penetrate cell membranes to achieve high levels of cross-linking.
In summary, recent developments in cross-linking approaches with utility in cell culture or in vivo supply new pipelines for distinguishing stable from transient interactions. Therefore, it is expected that these approaches will further contribute to elucidating the spatial and temporal regulation of cellular pathways involved in the proper maintenance of cellular functions, as well as pathways crucial in disease-associated mechanisms.
As summarized in this review, developments in IP-MS experimental workflows have significantly aided the identification and characterization of protein interactions in different biological contexts. Given the increasing sensitivity of mass spectrometry, the interpretation of the resulting large datasets of protein interactions has been made possible by developments in computational tools for constructing and visualizing interaction networks. These functional protein networks can be powerful in guiding follow-up biological studies for elucidating the functions of identified interactions. Furthermore, advances in label-free and labeling quantitative approaches for assessing specificity of interactions have contributed to improving the outcome and reliability of IP-MS experiments. With the current expansion of the proteomics field, a better understanding of protein interactions within their temporal and spatial biological context is timely and necessary. Distinguishing stable from transient interactions within a cellular pathway can provide important insights into its normal functions, as well as its dysregulation in disease. This is made possible by the continuous integration of multidisciplinary methods within the proteomics arsenal of tools. For example, cross-linking has become an integral part of protein interaction studies using mass spectrometry-based methods. Similarly, the combination of microscopy and proteomics has placed protein expression, interactions and posttranslational modifications in the context of localization. Therefore, these are exciting years in advancing our understanding of protein function. We envisage that in coming years, the methodologies will continue to expand and add to our ability to study the temporal and spatial dynamics of protein interactions.
Furthermore, knowledge of interactions within different cell types, tissues, or following different environmental stimuli or pathogen infection will be necessary for creating a more complete view of protein regulation. Identifying protein interactions involved in the initiation or progression of disease can provide an array of condition-specific targets for development of new therapeutics.
We are grateful for funding from NIH/NIDA grant DP1DA026192, NIH grant R21AI102187, and HFSPO award RGY0079/2009-C to IMC, and an NSF graduate fellowship to HGB. We also thank Amanda J. Guise and Todd M. Greco in the Cristea lab for critical reading of the manuscript.
Ileana M. Cristea performed her graduate research at the Michael Barber Center for Mass Spectrometry, University of Manchester, U.K., under the supervision of Simon Gaskell. She then pursued her postdoctoral work in the mass spectrometry laboratory of Brian Chait at The Rockefeller University. She is currently an Assistant Professor in the Department of Molecular Biology at Princeton University, where her laboratory focuses on understanding mechanisms that control the fate of cells under invasion by pathogens, with particular emphasis on the viral modulation of chromatin remodeling machineries. Her research employs a multi-disciplinary approach, integrating modern mass spectrometry and proteomics with molecular biology, biochemistry, and virology. She is the recipient of the Bordoli Prize from the British Mass Spectrometry Society (2001), NIDA Avant-Garde Award for HIV/AIDS Research (2008), Human Frontiers Science Program Young Investigator Award (2009), Early Career Award in Mass Spectrometry from the American Chemical Society NJ Section (2011), and the American Society for Mass Spectrometry Research Award (2012).
Yana V. Miteva completed her B.A. degree in Biochemistry and Molecular Biology at the Pennsylvania State University, during which she performed research in the Robert Schlegel lab and completed a one-year science exchange program at The University Louis Pasteur, France. She then held a research specialist position in the Physiology Department at the University of Pennsylvania in the lab of Todd Lamitina. In 2008 she joined Princeton University and is currently finalizing her Ph.D. thesis work in the Cristea lab, investigating the biological roles of deacetylating complexes during viral infection.
Hanna G. Budayeva completed her B.S. degree in Biochemistry at Rider University, NJ. Her undergraduate research in the laboratory of Bryan Spiegelberg focused on in vitro binding assays for characterizing protein interactions involved in transcription regulation. She is presently a Ph.D. student in the Cristea laboratory at Princeton University. In her research, she utilizes systems biology approaches to study changes in sirtuin interactions and downstream pathways during environmental stresses. She is the recipient of a National Science Foundation Graduate Research Fellowship.