Protein-protein interactions are important for nearly all biological processes, and it is known that aberrant protein-protein interactions can lead to human disease and cancer. Recent evidence has suggested that protein interaction interfaces represent a new class of attractive targets for drug development. Full characterization of protein interaction networks of protein complexes and their dynamics in response to various cellular cues will provide essential information for understanding how protein complexes work together in cells to maintain cell viability and normal homeostasis. Affinity purification coupled with quantitative mass spectrometry has become the primary method for studying in vivo protein interactions of protein complexes and whole organism proteomes. Recent developments in sample preparation and affinity purification strategies allow the capture, identification, and quantification of protein interactions of protein complexes that are stable, dynamic, transient, and/or weak. Current efforts have mainly focused on generating reliable, reproducible, and high confidence protein interaction data sets for functional characterization. The availability of increasing amounts of information on protein interactions in eukaryotic systems and new bioinformatics tools allow functional analysis of quantitative protein interaction data to unravel the biological significance of the identified protein interactions. Existing studies in this area have laid a solid foundation toward generating a complete map of in vivo protein interaction networks of protein complexes in cells or tissues.
Essential biological processes in cells such as DNA replication, transcription, translation, protein degradation, and cell cycle control are carried out by an assembly of protein molecules that interact and form multisubunit protein complexes (1–5). These macromolecular complexes are important cellular machineries that work hand in hand to maintain normal cell homeostasis. Although the function of each individual subunit is important, the function of the complex as a whole is not simply the sum of its individual components (6–8). Traditional biochemical methods focus on characterizing a single protein. Although still valid and useful, these methods cannot account for the complexity of macromolecular protein machines. Recently, technological developments in mass spectrometry-based proteomics approaches have made comprehensive characterization of protein complexes possible by enabling the determination of dynamic protein complex compositions, stoichiometries, posttranslational modifications, assemblies, structures, and protein interaction networks (8–17). This review will report the advances in the study of protein-protein interactions by mass spectrometry.
Protein-protein interactions are important for nearly all biological processes, and it is known that aberrant protein-protein interactions can lead to human disease and cancer (18–20). Sufficient evidence has demonstrated that modulation of protein-protein binding represents an emerging therapeutic paradigm and that protein interaction interfaces represent a new class of attractive targets for drug development. For example, inactivation of tumor suppressor p53 has been implicated in various cancers, and one mechanism of inactivation is the inappropriate interaction with the E3 ubiquitin ligase MDM2, leading to excessive degradation of p53 (21). Disruption of this interaction by small molecule inhibitors has shown promise as a potential tool for cancer treatment (21–25). Current work has illustrated how generating protein interaction networks of protein complexes and profiling the interaction differences between normal and disease states would be greatly beneficial for identifying potential molecular targets for mechanism-driven drug discovery.
Several methodologies including the yeast two-hybrid system, protein microarrays, fluorescence imaging, and affinity purification-mass spectrometry have been developed for studying protein-protein interactions (26). The sensitivity, specificity, efficiency, versatility, and accuracy offered by affinity purification (AP)-MS make this strategy the method of choice for mapping system-wide, in vivo protein interaction networks in various organisms (27–31). Similar strategies have also been applied to study protein interactions of protein complexes (12, 25, 32–38) and protein interaction networks within the context of cellular pathways (39–42).
In recent years, quantitative mass spectrometry has been successfully coupled with affinity purification to study protein-protein interactions. This combination of methods generates highly reliable interaction data by using quantified protein interaction information to distinguish specific protein interactors from nonspecific background proteins (14, 43, 44). This is advantageous because nonspecific binding to the affinity matrix cannot be completely eliminated in resin-based affinity purification processes. In this review, we focus on the recent advances in quantitative profiling of protein interaction networks of protein complexes. These include novel sample preparation strategies to generate comprehensive protein interaction networks of various natures and new bioinformatics tools for their functional characterization.
With the development of new affinity tags and antibodies, AP-MS-based strategies have taken center stage for the study of multisubunit protein complexes. The versatility of AP-MS methods has allowed various sample preparation strategies to be developed and incorporated to capture and identify various types of protein interactions including those that are stable, dynamic, transient, and/or weak (45–50). Basic AP-MS work flows follow the same general process beginning with affinity purification using a tagged bait protein followed by protein digestion, LC-MS/MS analysis, and database searching. Subsequent assessment of the data quality and extraction of functional information by computational analysis are often carried out as well (51). In this process, experimental variables such as the affinity tag, cell lysis condition, sample handling, incubation time, and washing conditions are carefully assessed to allow acquisition of the desired type of interaction data with high reliability because each of these variables is capable of affecting the purification specificity and protein interaction stability. Most of the applications of AP-MS to date involve either one-step or tandem affinity purification under native conditions in physiological buffers. Tandem affinity purification (TAP) can reduce nonspecific binding more effectively, but this prolonged purification process preserves only stable components of protein complexes and limits the recovery of more transient interactions. Therefore, one-step purification is preferred to capture more diversified protein interactions under native conditions. Although minimum nonspecific binding is desired, the stringency of the purification conditions often needs to be compromised to maintain the more loosely bound interactors. Ultimately, quantitative mass spectrometry is required to distinguish specific from nonspecific interactions and obtain higher confidence and easily interpretable interaction data. 
Co-purified proteins are categorized as specific or nonspecific based on the relative abundance of the protein in purified extracts containing the tagged bait versus control. Specific interaction partners are significantly more abundant in the purified sample containing the tagged bait than in a control sample and therefore generate an abundance ratio greater than 1, whereas the comparable ratios of nonspecifically bound background proteins are close to 1 (14, 43). Recently, it has been suggested that, instead of wild type untagged cells, RNA interference knockdown cells can also be used as controls for determining specific interactions quantitatively (52).
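The ratio-based categorization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein names and peptide intensities are invented, the per-protein ratio is taken as the median of per-peptide ratios, and the cutoff of 2.0 is an arbitrary placeholder rather than a value recommended by the cited studies.

```python
from statistics import median

def protein_ratio(peptide_pairs):
    """Median of per-peptide (bait sample / control sample) intensity ratios."""
    return median(bait / control for bait, control in peptide_pairs)

def classify(ratio, cutoff=2.0):
    """Call a protein 'specific' when its ratio exceeds the (assumed) cutoff."""
    return "specific" if ratio > cutoff else "background"

# Hypothetical peptide intensities: (tagged-bait sample, control sample)
proteins = {
    "interactor_A": [(900.0, 100.0), (1200.0, 150.0), (700.0, 100.0)],
    "sticky_protein": [(500.0, 480.0), (300.0, 330.0)],
}

ratios = {name: protein_ratio(pairs) for name, pairs in proteins.items()}
calls = {name: classify(r) for name, r in ratios.items()}
```

In this toy data set the first protein has a ratio well above 1 and is called specific, whereas the second has a ratio close to 1 and is treated as background.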
Various quantification techniques based on stable isotope labeling have been developed and used in mass spectrometry-based proteomics approaches for protein quantitation (13, 14). These include metabolic labeling (e.g. SILAC) (53) and chemical labeling (e.g. ICAT, ICPL, and iTRAQ) (54–56). Recently, label-free methods have also shown promise as an alternative strategy for fast quantification with limited cost (57, 58). Metabolic labeling is carried out in vivo at the protein level, whereas chemical labeling is performed in vitro at either the protein or peptide level. Although each method has unique advantages and limitations as reviewed elsewhere (11, 13, 59), they all are theoretically applicable for AP-QMS analysis. In practice, however, metabolic labeling-based approaches appear to be more widely used for studying protein interactions (43). In the wake of this rapid evolution of new quantification techniques, it has become apparent that the quantitative technique itself is not the critical factor for determining the types of interactions that can be isolated. This is because the nature of the protein interactions captured during native AP-MS is highly influenced by experimental conditions and thus controlled by sample preparation and purification strategies (44–46, 60). In this review, we classify two general strategies for the isolation and quantitative determination of specific interactors of protein complexes under native conditions based on sample preparation techniques: 1) purification after mixing (PAM) AP-QMS and 2) mixing after purification (MAP) AP-QMS. The details are illustrated in Fig. 1.
Metabolic labeling such as SILAC is generally considered an accurate strategy for quantitative mass spectrometry (14, 53). This is because metabolic labeling allows in vivo incorporation of stable isotope-labeled amino acids into the entire proteome during cell culture, leading to unbiased labeling, no sample loss, and minimum variation during labeling. However, to ensure full incorporation of labeled amino acids, multiple rounds of doubling are required (61). The versatility of SILAC allows quantitative analysis to be coupled with various sample preparation methods for affinity purification (45). Moreover, the commonly used amino acids (Arg/Lys) for labeling allow every tryptic peptide to carry at least one label for quantitation, thus increasing coverage and accuracy. As shown in Fig. 1A, a typical SILAC-based AP-QMS protocol involves in vivo metabolic labeling, cell lysis, lysate mixing, and purification prior to mass spectrometric analysis of the resulting samples (14, 62). In this process, protein purification is carried out after mixing the cell lysates from two types of differentially labeled cells (i.e. cells with a tagged subunit and non-tagged cells as a control). This preparation strategy is termed PAM (45). PAM-SILAC-based strategies have not only been applied for identification of specific protein interactions in signal transduction pathways (43, 62) but also utilized for studying protein complexes (44, 45, 50, 52, 63–66). The PAM strategy allows effective discrimination against purification background as well as quantification of specific but stable interactors based on their peptide relative abundance measured during MS analysis (45, 49). The main advantage of the PAM AP-QMS method is its simplicity in sample handling and reduction of experimental variations during purification. 
Although other quantitative approaches with stable isotope labeling at the protein level can be coupled with the PAM method, nearly all of the applications of PAM AP-QMS so far are limited to PAM-SILAC (43).
We have recently introduced a new sample preparation strategy, MAP, coupled with SILAC (i.e. MAP-SILAC) to quantify stable and dynamic protein interactions of protein complexes (45). In the MAP method, affinity purification of protein complexes is first carried out from comparable cell lines separately prior to sample mixing for quantitation as illustrated in Fig. 1B. In comparison with the PAM method, the MAP approach is much more flexible and can be used with any quantitative technique using stable isotope labeling such as SILAC, ICAT, ICPL, and iTRAQ. This is because the MAP method is compatible with stable isotope labeling implemented at any step (e.g. cell culture, cell lysate, peptide digests, or MS analysis) during the entire sample handling process. The MAP method has been coupled with the SILAC approach to study protein interactions of proteasome complexes (45), COP9 signalosomes (50), and RAD52 protein complexes (67). MAP has also been used in combination with ICAT to map androgen receptor-interacting proteins in mammalian cells (68) and prion-interacting proteins (69) and with iTRAQ to study protein interactions of complexes containing the insulin receptor substrate, Chico, in response to insulin signaling (70). The MAP strategy is highly useful and effective for quantitative analysis, especially for cases where purification must be done individually. However, it should be noted that separate purifications of two or more samples can introduce experimental variability that can be mistaken for biological variation. Therefore, caution should be taken when preparing MAP AP-QMS samples. Biological replicates and subsequent validation are a necessity.
It has been demonstrated that both PAM- and MAP-SILAC strategies can be successfully utilized to determine specific and stable protein interactions based on their relative abundance ratio measurements (45, 50). One technical obstacle remaining with PAM-SILAC studies is that quantitative assignment of bona fide but dynamic interactors can be extremely challenging. These dynamic interactors associate and dissociate with their binding partners at high on/off rates. In PAM-SILAC, because all proteins are present in both the light and heavy labeled forms during purification, the fast on/off rates of dynamically interacting proteins will result in an equilibrium between the two isotopically labeled forms of the proteins that bind to the bait. The relative abundance ratios of these interactors will therefore be similar to those of background proteins after a certain time interval, which is dependent on the kinetic parameters of the individual binding partners. As a result, specific but dynamically interacting proteins cannot be effectively distinguished from background proteins through the relative abundance ratio measurements and are often classified as nonspecific interactors (false negatives) (Fig. 2). This problem has been partially solved by decreasing the purification time or using a time-controlled (Tc) PAM-SILAC experiment (45). Such strategies can decrease the extent of dissociation, and some dynamic interactions can be preserved with increased SILAC ratios for their identification. Protein interactors with extremely high on/off rates, however, will still remain indistinguishable from the background proteins as determined by SILAC ratios (45). This is problematic because many regulatory interacting proteins and proteins involved in signaling events are transient and dynamic by nature. Efficient capture and identification of this subpopulation of interactors is highly important for understanding how protein complexes are regulated. 
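The collapse of SILAC ratios for dynamic interactors can be illustrated with a toy exchange model. The sketch below is not taken from the cited work; it assumes that the bait-bound pool starts fully in the bait-side (heavy) label, that the free pool is an equal heavy/light mixture in large excess, and that rebinding is fast, so the bound heavy fraction relaxes toward 50% with a single rate constant `k_off`. All rate constants and times are hypothetical.

```python
import math

def heavy_fraction(k_off, t):
    """Fraction of bait-bound interactor still carrying the heavy label at time t."""
    return 0.5 + 0.5 * math.exp(-k_off * t)

def silac_ratio(k_off, t):
    """Heavy/light ratio of the bound interactor after purification time t."""
    f = heavy_fraction(k_off, t)
    return math.inf if f >= 1.0 else f / (1.0 - f)

purification_s = 3600.0  # hypothetical 1 h purification

stable = silac_ratio(1e-5, purification_s)    # slow off-rate: ratio stays high
dynamic = silac_ratio(1e-2, purification_s)   # fast off-rate: ratio approaches 1
shortened = silac_ratio(1e-3, 600.0)          # same interactor, shorter purification
standard = silac_ratio(1e-3, purification_s)
```

Under these assumptions, shortening the purification raises the ratio of an interactor with an intermediate off-rate, which mirrors the rationale of the time-controlled PAM-SILAC experiment, whereas an interactor with a very fast off-rate remains at a ratio near 1 regardless.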
To address this issue, we developed the MAP-SILAC approach to allow complete elimination of such interaction exchange of labeled proteins by incorporating separate purification processes of the compared samples. This preserves the specificity of protein interactions during purification and thus retains high SILAC ratios for all specific interactions including those that are stable and dynamic. Compared with PAM-SILAC, which is effective for quantifying stable interactors, MAP-SILAC is more advantageous for unambiguous quantitative identification of both stable and dynamic components of protein complexes (45). A combined approach, however, has proven even more beneficial, allowing effective identification of and differentiation between both stably associated and dynamic interactions (45).
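One way to picture how the combined approach separates the three populations is a simple two-ratio decision rule. The following sketch is an assumption-laden illustration, not the published analysis: the single shared cutoff, the function name, and the example ratios are all hypothetical.

```python
def classify_interactor(map_ratio, pam_ratio, cutoff=2.0):
    """Assign a protein to a class from its MAP-SILAC and PAM-SILAC ratios.

    Assumed rule: a low MAP ratio means background (conservative, even if the
    PAM ratio happens to be high); high in both means stable; high in MAP
    only means dynamic.
    """
    if map_ratio <= cutoff:
        return "background"
    if pam_ratio > cutoff:
        return "stable"
    return "dynamic"

# Hypothetical (MAP ratio, PAM ratio) pairs
measurements = {
    "core_subunit": (20.0, 18.0),
    "regulatory_factor": (15.0, 1.2),
    "abundant_contaminant": (1.1, 1.0),
}

labels = {name: classify_interactor(m, p) for name, (m, p) in measurements.items()}
```

A protein such as the hypothetical `regulatory_factor` would be missed as a false negative by PAM-SILAC alone but is recovered as a dynamic interactor once the MAP ratio is considered.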
We first used the combined MAP- and Tc-PAM-SILAC approaches to decipher dynamic interactors of the human 26 S proteasome complexes (Fig. 2) (45). In this work, 67 proteasome-interacting proteins (PIPs) were identified, 14 of which would have been classified incorrectly as nonspecific using only the traditional PAM-SILAC approach. Additionally, using this approach, we could clearly distinguish 35 stable interacting PIPs and 16 dynamic interactors from nonspecific background. Nearly half of the dynamic PIPs belong to the ubiquitin-proteasome pathway including one of the proteasome subunits, ADRM1/hRpn13, and the main deubiquitinating enzyme, UCH37, which are susceptible to regulation because of their special functions in the proteasomal degradation pathway (45). It is known that UCH37 is recruited to the proteasome through ADRM1, and consistent with this notion, both UCH37 and ADRM1 display similar interaction dynamics with the proteasome. Interestingly, ADRM1/Rpn13 has recently been identified as a novel ubiquitin receptor playing a critical role in ubiquitin chain recognition and disassembly at the proteasome (71), consistent with the nature of its interaction with proteasomes.
The combination of PAM- and MAP-SILAC-based approaches has also been applied to unravel the dynamics of the human COP9 signalosome (CSN) complex (50), human TATA-binding protein transcription complexes (49), and yeast eIF2B-eIF2 and cyclin-Cdc28 complexes (72). In the study of the human TATA-binding complex, it was determined that all known TATA-binding protein-associated factors (TAFs) were specific but stable interactors with only one of them, BTAF1, displaying the characteristics of dynamic interacting proteins during affinity purification as discussed above. Interestingly, it was found that the dynamic nature of the BTAF1 interaction with the TATA-binding protein complex is cell cycle-dependent, providing a basis for further exploration into the mechanism of its control (49). These results provide additional support to the notion that this kind of dynamic interaction is biologically relevant and subject to cellular regulation. More recently, the same strategy has been adopted to study stable and dynamic interactors of mitogen-activated protein kinase kinase kinase 2 (MEKK2) complexes from metabolically labeled mouse embryonic fibroblasts derived from target gene knock-out and wild type mice (73), further demonstrating the feasibility and versatility of the combined PAM- and MAP-SILAC strategy to study dynamic interactions of protein complexes from samples of various origins.
In summary, the combination of PAM- and MAP-SILAC represents a powerful and general proteomics tool for identification and characterization of dynamically interacting proteins, offering unique information on a specific subset of biologically important interactions and prompting directional studies into regulatory mechanisms of protein complexes. As mentioned earlier, MAP and PAM strategies can be coupled with other quantitative techniques; however, the combined approach requires labeling methods at the protein level. Although chemical labeling at the protein level is feasible, metabolic labeling appears to be much more attractive for this type of study because of its inherent advantages (14, 74).
Although stable isotope labeling-based techniques generate accurate protein quantitation by mass spectrometry, such experiments can be quite costly because of expensive labeling reagents, especially when large numbers of samples are needed for comparative analysis. Because of its simplicity and capability for multiplex comparison, label-free quantitation has recently emerged as an alternative quantitative strategy in AP-QMS to map and characterize protein interactions of affinity-purified protein complexes using spectral counting (32, 57, 75, 76). Because mixing is not performed during sample preparation prior to MS analysis for quantitation in label-free AP-QMS analysis, affinity purification of compared samples is carried out separately as in the MAP method (Fig. 1C). The label-free AP-QMS approach is exemplified by two separate analyses in which bona fide protein-protein interactions of several protein complexes have been identified using multiple bait purifications: 1,278 nonredundant interacting proteins from 27 TAPs from human chromatin remodeling and nutrient sensing complexes (32) and 534 nonredundant interacting proteins from 11 TAPs from Rpd3 histone deacetylase complexes in yeast (33). In addition, Sardiu et al. (33) created a probabilistic deletion network that not only identified interacting proteins of the Rpd3 complexes but also detailed the composition of subcomplexes and the intricate interactions between the stable subunits of the complex. Because of the use of multiple baits for these studies, the interaction data generated allowed topological analysis of the protein complexes and thus provided more detailed information about protein complex composition and subpopulations. This is an advantage for studying protein complexes because multiplex stable isotope labeling experiments are more costly and technically challenging. 
However, compared with isotope labeling techniques, label-free quantitation using spectral counting is less accurate for calculating peptide relative abundances. Because each sample for comparison is handled separately in the entire analysis, reproducibility at each step would be critical to minimize experimental variations to obtain reliable interaction data.
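A minimal sketch of a spectral counting comparison is given below. It is a simplification made for illustration and is not the procedure used in the cited studies: counts are padded with a pseudocount so that proteins absent from the control do not cause division by zero, and each run is normalized by its total count to offset differences in sampling depth. The protein names, counts, and pseudocount are hypothetical.

```python
def normalized_counts(counts, proteins, pseudocount=1.0):
    """Pad missing proteins with a pseudocount, then scale the run to sum to 1."""
    padded = {p: counts.get(p, 0) + pseudocount for p in proteins}
    total = sum(padded.values())
    return {p: c / total for p, c in padded.items()}

def enrichment(bait_counts, control_counts, pseudocount=1.0):
    """Ratio of normalized spectral counts, bait purification over control."""
    proteins = sorted(set(bait_counts) | set(control_counts))
    bait = normalized_counts(bait_counts, proteins, pseudocount)
    ctrl = normalized_counts(control_counts, proteins, pseudocount)
    return {p: bait[p] / ctrl[p] for p in proteins}

# Hypothetical spectral counts from separately purified samples
bait_run = {"complex_subunit": 19, "background_1": 49, "background_2": 29}
control_run = {"background_1": 49, "background_2": 29}

scores = enrichment(bait_run, control_run)
```

Because every run is processed independently, any difference in purification or instrument performance feeds directly into these ratios, which is why reproducibility at each step matters so much for label-free data.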
Protein-protein interactions in cells are highly dynamic and can be modulated by a host of cellular cues, the nature of which is dependent on binding strength, specificity, and kinetics. Protein interactions have been generally categorized into two major categories: stable and transient/weak interactions. Stable interactions can survive various purification conditions and thus can be easily studied using either native PAM or MAP AP-QMS strategies, whereas transiently and weakly interacting proteins are more susceptible to changes in experimental conditions and are often lost during the native purification process. To obtain a comprehensive picture of protein interaction networks of a given protein complex, it is essential to stabilize these protein interactions during the purification process. This can be accomplished using in vivo chemical cross-linking to freeze protein interactions prior to cell lysis, thereby generating a snapshot of the in vivo protein interaction network (77). In combination with AP-MS, in vivo chemical cross-linking has been successfully used to map protein interactions in cells or tissues (46–48, 78–84). Because of the formation of covalent bonds between interacting partners after cross-linking, affinity purification of the cross-linked complexes under fully denaturing conditions is preferred to eliminate nonspecific purification background and prevent the formation of non-covalent interactions after cell lysis. In addition, quantitative analysis is necessary to effectively distinguish specific interactors from nonspecific background as discussed above. To this end, we have developed a general strategy named QTAX for quantitative analysis of tandem affinity-purified in vivo cross-linked (X) protein complexes (Fig. 1D) (46, 47). Briefly, in vivo cross-linking with formaldehyde is carried out to effectively freeze all types of protein interactions as they occur in the cell prior to lysis and purification. 
After cell lysis, equal amounts of proteins from the two differentially labeled populations of cells are mixed, and cross-linked products are tandem affinity-purified under fully denaturing conditions using a “TAP” tag, the histidine-biotin (HB) tag (46, 48). The HB tag is the only known TAP tag that is capable of tandem affinity purification under fully denaturing conditions and thus allows stringent washing conditions to significantly reduce nonspecific background. This is important because high background is a major problem that often hampers in vivo cross-linking approaches. Purified protein complexes are then analyzed by SILAC-based quantitative mass spectrometry for unambiguous identification and quantification of specific interacting proteins of a given protein complex. It is worth noting that the PAM method is used in the QTAX strategy. Because protein interactions are chemically attached through in vivo cross-linking and purification is carried out under fully denaturing conditions, no interaction exchange between the light and heavy labeled proteins will occur in the QTAX experiment. This is completely different from what has been observed when the PAM method is coupled with native AP-QMS as discussed earlier (45). As a result, the PAM method is best suited to analyze in vivo cross-linked protein complexes, permitting minimum experimental variation and warranting reliable samples for subsequent comparison.
We have used the QTAX strategy to map yeast 26 S PIPs to elucidate molecular mechanisms underlying ubiquitin-proteasome degradation pathways (46, 47). The QTAX-based tag team approach enabled the determination of a proteasome interaction network containing at least 471 proteins (47), significantly more than the total number of proteins identified by previous reports using other AP-MS methods. Biochemical validation has confirmed that the QTAX strategy is effective for capturing and identifying transient interactions that can only be detected after cross-linking. Protein interaction network analysis has implicated proteasome complex involvement in various cellular processes through its connectivity with 35 gene ontology (GO) protein complexes (47). Additionally, we have shown that the value of SILAC ratios obtained from this type of analysis correlates well with the interaction specificity of PIPs with the proteasome complex (47). This is exemplified by the fact that all of the PIPs identified with “high” ratios (i.e. those present only in the tagged sample) are key factors involved in ubiquitin-proteasome degradation pathways (47).
Compared with homobifunctional N-hydroxysuccinimide esters used for in vivo cross-linking (77), formaldehyde cross-linking appears to be more widely adopted for studying a wide range of protein complexes. This is most likely because of the unique physicochemical properties of formaldehyde: (i) water solubility and cell membrane permeability, (ii) negligible reagent-induced protein rearrangements, and (iii) reversible short cross-linked bonds (2–3 Å) that endure nonphysiological conditions and still maintain structural integrity (85, 86). Although in vivo chemical cross-linking has been effective for capturing a wide range of protein interactions, cross-linking efficiency is dependent on the chemical features of cross-linkers as well as the structures of interacting proteins and the accessibility of cross-linkable amino acids of the protein complexes. To obtain the best results, careful optimization of cross-linking conditions for each protein complex needs to be carried out experimentally.
Apart from the HB tag (46–48, 80, 82, 84), antibodies (78, 81, 87) and epitope-based tags such as Myc (79) and hemagglutinin-FLAG (83) have been used for similar applications. Unlike HB tag-based affinity purification, antibody-based affinity purification must be carried out under native conditions, which will result in higher nonspecific background and thus interfere with subsequent analyses. In addition, cross-linking may diminish epitope recognition significantly to reduce purification efficiency.
Other types of isotope labeling quantitation methods such as iTRAQ have been used with the QTAX strategy as well. Because most cross-linking reagents target free amine groups at N termini and lysines, any chemical labeling for quantitation at the protein level requiring similar chemistry is not possible. However, isotope labeling at the peptide level using methods such as iTRAQ can be applied because enzymatic digestion results in peptides with free N termini for labeling. This can be achieved by revising the QTAX strategy to incorporate the MAP rather than the PAM method as follows: in vivo cross-linked protein complexes are isolated separately from each cell population, digested, labeled with iTRAQ reagents, and then mixed for MS comparison. Markham and co-workers (87) have used a similar strategy to study the in vivo brain interactome of the amyloid precursor protein (APP). In this study, in vivo cross-linked complexes of APP as well as amyloid precursor-like proteins (APLP1 and APLP2) were immunoprecipitated separately from mouse brain tissue for quantitative analysis. The results confirmed eight known interactions of APP and identified more than 30 additional proteins that reside in spatial proximity to APP in the brain, demonstrating the feasibility of this method (87). When isotope labeling is not feasible, label-free methods can be used with a modified version of the QTAX strategy in which every step in the analysis process is performed separately for each sample including in vivo cross-linking, purification, digestion, and MS analysis (83).
In summary, QTAX is a powerful and adaptable technique for studying protein complexes, capable of comprehensively characterizing protein-protein interactions of all types including stable, transient, and/or weak interactions in a single analysis (46, 47). Variations at different steps can be incorporated into the QTAX strategy, but the original protocol (Fig. 1D) (46) remains the superior method as it exploits the optimal combination of sample preparation (PAM), affinity purification (HB tag), and quantitation (SILAC) methods, thereby minimizing experimental variations, nonspecific binding, and sample loss, generating the most reliable quantitation results of protein interactions.
The cellular environment is in a state of constant flux to meet the ever changing needs of a cell. As such, cellular components are in a state of dynamic equilibrium, constantly responding to internal and external cellular signals. The dynamic changes of protein interaction networks of protein complexes at different physiological states may reflect how protein complexes are involved in various biological processes such as the cell cycle, apoptosis, stress response, and DNA damage repair. Over the last decade, comparative expression proteomics has generated considerable interest in using quantitative mass spectrometry to study cell type-specific proteomes, proteome stress response, and cancer proteomes with a special interest in looking for biomarkers (88). Although expression proteomics studies are extremely useful, targeted quantitative analysis of dynamic changes in protein interaction networks of protein complexes will provide more mechanistic details to further our understanding of protein biological functions. In comparison with the large number of expression proteomics studies, only a handful of studies have been reported on linking dynamics of protein interaction networks of protein complexes to cellular processes or signaling pathways using AP-QMS strategies (64, 65, 70, 76, 84, 89, 90). With the development of new methodologies for quantifying protein complexes, we anticipate that this area of research will attract more attention in the near future. A few examples under different cellular cues are highlighted here.
We recently carried out a comprehensive characterization of cell cycle phase-specific interaction networks of the yeast 26 S proteasome by QTAX (84). The work was aimed at gaining insight into and a more complete understanding of cell cycle-dependent regulation of the 26 S proteasome. In this study, we generated G1, S, and M phase-specific proteasome interaction networks and successfully identified 677 PIPs, 266 of which were not previously identified from unsynchronized cells. Furthermore, based on the dynamic changes in SILAC ratios across the three cell cycle phases, PIPs were quantitatively profiled and grouped using a profile vector-based clustering approach. We were able to classify PIPs based on the dynamic changes in their associations with the proteasome as the cell progresses through the cell cycle. Using this clustering approach, we identified and generated 20 functionally significant groups of PIPs, three of which are enriched with cell cycle-related functions. This is the first study to use the QTAX strategy to study the dynamics of interaction networks of a macromolecular protein complex, which can be generalized for studying other protein complexes.
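The idea of grouping PIPs by the shape of their SILAC ratio profiles across cell cycle phases can be sketched as follows. This is a hypothetical simplification and should not be read as the published clustering algorithm: each protein's ratios at G1, S, and M are reduced to a vector of up (+1), unchanged (0), or down (-1) steps between consecutive phases, using an assumed log2 change threshold, and proteins sharing a vector are grouped together. The protein names and ratios are invented.

```python
import math
from collections import defaultdict

def profile_vector(ratios, min_log2_change=1.0):
    """Encode changes between consecutive phases as +1, 0, or -1."""
    vector = []
    for earlier, later in zip(ratios, ratios[1:]):
        delta = math.log2(later / earlier)
        if delta >= min_log2_change:
            vector.append(1)
        elif delta <= -min_log2_change:
            vector.append(-1)
        else:
            vector.append(0)
    return tuple(vector)

def group_by_profile(pips):
    """Group protein names by their shared profile vector."""
    groups = defaultdict(list)
    for name, ratios in pips.items():
        groups[profile_vector(ratios)].append(name)
    return dict(groups)

# Hypothetical SILAC ratios at (G1, S, M)
pips = {
    "pip_a": (2.0, 8.0, 8.0),
    "pip_b": (3.0, 12.0, 11.0),
    "pip_c": (16.0, 4.0, 4.0),
}

groups = group_by_profile(pips)
```

Here `pip_a` and `pip_b` share the same profile (increased association from G1 to S, unchanged from S to M) and fall into one group, whereas `pip_c` shows the opposite trend and forms its own group.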
Several studies have attempted to decipher phosphorylation-regulated protein interaction dynamics of protein complexes in response to extracellular signaling (70, 76, 90), at cell cycle phases (89), and in the Neurospora circadian clock (65). Identification and quantification of protein interaction dynamics as well as variations in phosphorylation states of protein complexes at a given state or time point need to be carried out for correlation analysis. For example, Baker et al. (65) used AP-QMS to study phosphorylation-dependent regulation of a circadian oscillator, FREQUENCY (FRQ) complex. Using this strategy, the temporal dynamics of endogenous FRQ protein complex-interacting partners were identified and quantified using the PAM-SILAC approach by comparing a time-specific sample with a pooled reference sample. Next, time-dependent phosphorylation of FRQ throughout the circadian cycle was identified, quantified, and correlated with the dynamics of the FRQ interaction network. Integration of results suggested that circadian rhythms and FRQ interaction dynamics are regulated by FRQ phosphorylation states.
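The correlation analysis mentioned above can be illustrated with a minimal sketch. It assumes that, for each time point, a phosphorylation measure of the bait and a quantitative ratio of one interactor are available as two equally long series, and it uses the Pearson coefficient as one possible measure of association. The function name and the numerical values are hypothetical, and this is not the analysis used in the cited studies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long series of measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (not data from any cited study):
# phosphorylation level of the bait and log2 ratio of one interactor
# measured at the same five time points.
phospho_level = [0.10, 0.25, 0.60, 0.85, 0.90]
interactor_log2_ratio = [1.8, 1.5, 0.7, 0.2, 0.1]

r = pearson_r(phospho_level, interactor_log2_ratio)
print(f"Pearson r = {r:.2f}")
```

A strongly negative coefficient in such a comparison would be consistent with an interaction that weakens as phosphorylation increases, although a correlation alone does not establish regulation.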
In summary, understanding how protein complex interaction networks respond to exogenous and endogenous cellular cues can provide important details for the dynamic regulation of protein complex functions. Furthermore, by linking posttranslational modifications with protein interaction dynamics, additional regulatory mechanisms and response pathways can be hypothesized.
Continued improvements in the sensitivity of mass spectrometry-based technologies have dramatically enhanced the identification of proteins from AP-QMS experiments. However, improved identification of bona fide interaction partners comes together with increased numbers of contaminant proteins. Generating a list of interacting proteins with high confidence for subsequent functional analysis is the first step toward understanding the biological significance of the identified protein interactions (Fig. 3). Although quantitative mass spectrometry can be used to distinguish between nonspecific and specific interactors, establishing a quantitative threshold cutoff to identify putative specific interactions while filtering out false positives is not trivial. Due to a multitude of variations in AP-QMS strategies, the quality of data obtained is highly variable, and the threshold criteria should be assessed carefully with each type of experimental design to generate a reliable interaction data set. There are two general techniques that have been used to attain appropriate thresholds. The first method uses experimentally derived set points based on the quantitative values of known interactors through literature mining (91) or by using alternative validation techniques to test a subset of interactors (46). The second method uses statistical calculations that are based on experimental variation to calculate data reproducibility and the probability that the data set of identified interactors could be obtained by chance rather than because of specific interactions (62, 67, 70).
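The two general ways of setting a threshold can be sketched as follows. This is a simplified illustration under stated assumptions: quantitative values are log2 ratios (purification versus control), the background distribution is summarized by its mean and standard deviation, and the function names, the two-standard-deviation default, and the example values are hypothetical rather than taken from the cited studies.

```python
import statistics

def empirical_cutoff(log2_ratios, known_interactors):
    """Experimentally derived set point: the lowest log2 ratio
    observed among interactors already known from the literature."""
    observed = [log2_ratios[p] for p in known_interactors if p in log2_ratios]
    if not observed:
        raise ValueError("no known interactor was quantified")
    return min(observed)

def statistical_cutoff(background_log2_ratios, n_sd=2.0):
    """Statistical set point: mean + n_sd standard deviations of the
    log2 ratios of presumed nonspecific (background) proteins."""
    mean = statistics.mean(background_log2_ratios)
    sd = statistics.stdev(background_log2_ratios)
    return mean + n_sd * sd

def specific_interactors(log2_ratios, cutoff):
    """Proteins whose log2 ratio meets or exceeds the cutoff."""
    return sorted(p for p, r in log2_ratios.items() if r >= cutoff)

# Hypothetical example values
ratios = {"P1": 2.1, "P2": 0.1, "P3": 1.2, "P4": -0.3}
cutoff = empirical_cutoff(ratios, known_interactors=["P1", "P3"])
print(cutoff, specific_interactors(ratios, cutoff))
```

Whichever set point is chosen, it should be re-derived for each experimental design, because the spread of background ratios depends on the labeling and purification strategy.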
Once a threshold value has been determined, the final data set of high confidence interacting proteins should be validated and assessed to determine biological significance. The general work flow is illustrated in Fig. 3. With smaller data sets (<10), traditional biochemical and genetic methods can be used for validation and examination of the biological consequence of the protein-protein interactions identified. In many cases, lists from AP-QMS experiments are extensive (upward of hundreds of proteins) and exceed the practical limit for interaction validation using traditional biochemical methods. Therefore, the most common approach is to first confirm a small subset of identified interactions using alternative strategies followed by a series of bioinformatics techniques to validate and extract biologically relevant information from the complete data set. Typically, this bioinformatics validation encompasses any combination or order of the following steps: creating interaction network maps from open access protein-protein interaction databases, creating topology networks of protein complex subunits and associated proteins, clustering “like behaving” interacting proteins, and identifying functional trends/enrichments in the protein list.
PPI network analysis is an effective method for both validation of large numbers of protein interactions and visualization of the connectivity of the associated proteins (47, 87, 92). Depending on how many AP-QMS interaction data sets are acquired, different information can be generated through PPI network analysis including protein interaction network mapping and network topology.
When only one or a few baits are used in AP-QMS experiments, not enough information about interaction connectivity among proteins can be generated. Therefore, protein interaction network analysis will rely on known interactions among the identified proteins that can be obtained from multiple public protein interaction databases including MPact (93), IntAct (94), Human Protein Interaction Database (HPID) (95), BioGRID (96), Saccharomyces Genome Database (SGD) (97), Human Protein Reference Database (HPRD) (98), Database of Interacting Proteins (DIP) (99), Molecular INTeraction database (MINT) (100), Biocarta, and Biomolecular Interaction Network Database (BIND) (101). The PPI network mapping is carried out in the following two steps: 1) extracting annotated protein-protein interactions between the identified interactors and subunits of the protein complex and among identified interactors from public databases to create a network and 2) visualizing the created network with bioinformatics software designed to map and analyze protein interactions, e.g. Cytoscape. With this type of analysis, an interaction network map of the protein complex can be generated that will display the connectivity among subunits of the protein complex and its associated proteins. In addition, PPI network mapping determines the list of identified specific interacting proteins that have not been previously annotated and adds these interactions to the network as novel interactors. These interactors may be biologically relevant for further validation and functional analysis. It has been demonstrated that this general PPI network analysis strategy can serve as an alternative strategy to validate large data sets of protein interactions obtained from AP-QMS experiments when biochemical validation is not feasible (47). 
In our study of the 26 S proteasome complex using the QTAX strategy, more than 80% of proteasome-interacting proteins identified were mapped to the proteasome interaction network based on the known interaction data, thereby supporting the validity of the identified interactions (47). Similar analysis has also been performed to validate other protein-protein interaction data (67, 84, 87, 91, 92, 102).
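The first step of PPI network mapping described above can be sketched in a few lines. The example assumes that annotated interactions have already been exported from one or more public databases as pairs of protein names; the function name and the protein identifiers are hypothetical. The resulting edge list could then be loaded into a visualization tool such as Cytoscape for the second step.

```python
def map_to_known_network(identified, complex_subunits, known_interactions):
    """Keep annotated interactions among complex subunits and identified
    interactors, then split interactors into mapped and novel sets."""
    identified = set(identified)
    nodes = identified | set(complex_subunits)

    # Step 1: extract annotated interactions whose two partners are both
    # in the data set (undirected, so each pair is stored once).
    edges = set()
    for a, b in known_interactions:
        if a != b and a in nodes and b in nodes:
            edges.add(frozenset((a, b)))

    # Interactors with at least one annotated edge are "mapped";
    # the rest have no prior annotation and are candidate novel interactors.
    connected = set()
    for edge in edges:
        connected |= edge
    mapped = sorted(identified & connected)
    novel = sorted(identified - connected)
    return edges, mapped, novel

# Hypothetical example: two subunits (S1, S2) and three identified interactors
edges, mapped, novel = map_to_known_network(
    identified={"P1", "P2", "P3"},
    complex_subunits={"S1", "S2"},
    known_interactions=[("S1", "P1"), ("P1", "P2"), ("P3", "X9"), ("S1", "S2")],
)
print(f"{len(mapped)} of {len(mapped) + len(novel)} interactors mapped; novel: {novel}")
```

The fraction of mapped interactors is the kind of summary figure that can support the validity of a data set when individual biochemical validation is not feasible.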
Network topology analysis generally requires a large number (>10) of protein interaction data sets and has been previously applied for large scale AP-MS data from various organisms (27–31, 41, 102). Only recently has this method been applied to the study of protein complexes using label-free AP-QMS strategies (32, 33, 38) with multiple baits. Various strategies and algorithms have been developed for network topology analysis that have been evaluated and reviewed in detail elsewhere (103–105). Nevertheless, most of the strategies use co-purification profiles to assign pairwise interaction scores and/or probabilistic measurements to group proteins into three main topological clusters: cores, modules, and attachments (103–108). Core proteins stably associate and stoichiometrically co-purify with every protein complex subunit. Modules and attachments, on the other hand, are considered to be loosely bound interactors and are usually recovered at substoichiometric levels. Modules are groups of two or more proteins that usually purify together but are not part of the core complex. Attachments are proteins that do not stably associate with core complexes or modules and are only present in some purifications (29). Identification of topological cores, modules, and attachments of a given protein complex can offer details about the molecular function and assembly of protein complexes.
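The distinction among cores, modules, and attachments can be made concrete with a deliberately simplified sketch. Published algorithms assign pairwise scores or probabilities to co-purification profiles; the version below replaces scoring with exact presence or absence across baits, which is an assumption made only to keep the example short. Function and protein names are hypothetical.

```python
from collections import defaultdict

def classify_topology(purifications):
    """purifications maps each bait to the set of proteins it co-purified.
    Returns (core, modules, attachments) using presence/absence profiles."""
    baits = list(purifications)
    all_preys = set().union(*purifications.values())

    # Co-purification profile: the set of baits with which a protein was found.
    profile = {
        prey: frozenset(b for b in baits if prey in purifications[b])
        for prey in all_preys
    }

    # Core: proteins recovered in every purification.
    core = sorted(p for p, prof in profile.items() if len(prof) == len(baits))

    # Non-core proteins sharing an identical profile are grouped together.
    groups = defaultdict(list)
    for prey, prof in profile.items():
        if len(prof) < len(baits):
            groups[prof].append(prey)

    # Modules: two or more proteins that purify together outside the core.
    modules = sorted(sorted(g) for g in groups.values() if len(g) >= 2)
    # Attachments: proteins seen only in some purifications, with no partner.
    attachments = sorted(g[0] for g in groups.values() if len(g) == 1)
    return core, modules, attachments

# Hypothetical example with three baits
core, modules, attachments = classify_topology({
    "bait1": {"c1", "c2", "m1", "m2", "a1"},
    "bait2": {"c1", "c2", "m1", "m2"},
    "bait3": {"c1", "c2"},
})
print(core, modules, attachments)
```

Real data sets are noisy, so exact profile matching would normally be replaced by a similarity score with a threshold, as in the strategies reviewed in the references cited above.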
Clustering analysis is often used to group protein interactors into functional modules based on user-defined criteria. This analysis can be applied to cluster proteins from multibait AP-QMS data to obtain information on protein complex composition. In addition, it can be used for clustering proteins into functional modules based on their interaction dynamics in response to cellular perturbations. The two commonly used clustering methods are hierarchical and partitioning clustering (105, 109), although several other techniques have been used to handle different types of data sets (84, 110, 111). It is important to note that clustering methods are heuristics, and there is no single universal clustering algorithm for every type of data. Therefore, it is best to assess multiple clustering approaches for obtaining optimal results. Ideally, a clustering approach is able to group similarly behaving proteins into mid- to high density clusters that are functionally related. We recently explored the dynamics of cell cycle phase-specific interaction networks of the yeast 26 S proteasome (84). We have clustered the PIPs by the characteristics of their cell cycle phase-dependent SILAC ratios using a profile vector-based clustering approach. This method separates PIPs based on a set of constraints that we have defined rather than by using the known heuristic clustering algorithms such as hierarchical and k-means methods. Although both heuristic clustering methods produced PIP clusters with assorted enrichments, the profile vector-based method produced more mid-high density clusters and had significantly more enrichments, suggesting that more biologically relevant information was extracted by this approach. The profile vector-based method resulted in a total of 20 clusters enriched for molecular functions, seven enriched in protein complexes, and 10 enriched with pathways. 
This included three different clusters enriched with biological functions and pathways pertaining to the cell cycle or cell differentiation. This analysis permits the separation of large data sets into smaller, functional modules that are easier to validate and assess for biological significance. In addition, proteins clustered together are thought to carry similar functions; therefore, potential functions of proteins with previously unannotated functions can be proposed based on the known protein functions within the same cluster.
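The idea of grouping proteins by the shape of their quantitative profile, rather than by a distance-based heuristic, can be sketched as follows. The constraints actually used in the proteasome study (84) are not reproduced here; this example assumes a simple rule in which each transition between consecutive phases (G1 to S, S to M) is encoded as an increase, a decrease, or no change in log2 SILAC ratio, with a hypothetical tolerance of 0.5 log2 units.

```python
import math
from collections import defaultdict

def profile_vector(ratios, tol=0.5):
    """Encode each phase-to-phase transition as +1 (increase),
    -1 (decrease), or 0 (change within +/- tol log2 units)."""
    logs = [math.log2(r) for r in ratios]
    vector = []
    for previous, current in zip(logs, logs[1:]):
        delta = current - previous
        if delta > tol:
            vector.append(1)
        elif delta < -tol:
            vector.append(-1)
        else:
            vector.append(0)
    return tuple(vector)

def cluster_by_profile(silac_ratios, tol=0.5):
    """Group proteins that share the same profile vector."""
    clusters = defaultdict(list)
    for protein, ratios in silac_ratios.items():
        clusters[profile_vector(ratios, tol)].append(protein)
    return {vec: sorted(members) for vec, members in clusters.items()}

# Hypothetical SILAC ratios in (G1, S, M) order
clusters = cluster_by_profile({
    "A": (1.0, 2.0, 2.0),
    "B": (1.0, 4.0, 4.1),
    "C": (4.0, 2.0, 1.0),
})
for vec, members in clusters.items():
    print(vec, members)
```

Because cluster membership is defined by explicit constraints, every cluster has a direct interpretation (for example, "increases from G1 to S and then remains unchanged"), which is one practical difference from hierarchical or k-means output.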
One commonly used approach for discerning biologically meaningful information from proteomics data is to group interacting proteins according to their annotated functional categories. This typically involves extracting biological annotations, usually GO terms, of each gene/protein in the data set and highlighting those that are statistically overrepresented (enriched). This functional enrichment analysis, when applied to AP-QMS studies, can increase the probability of identifying biological processes in which the protein complex of interest may be involved. Such analysis can help investigators determine which protein interactors are the most likely to be biologically significant and establish a priority list of proteins requiring validation and further investigation. In most cases, functional enrichment analysis is not performed on the entire data set but used to validate interacting modules or clusters as was the case in the aforementioned cell cycle-specific 26 S proteasome interaction network study (84). In this case, functional enrichment analysis specifically identified three clusters of proteins that were enriched in processes pertaining to the cell cycle. By prioritizing proteins from these clusters, we identified the mitogen-activated protein kinase Fus3, a specific interactor identified by only three to four peptides (four in G1 phase and three in S phase) from 677 PIPs as a high priority interacting protein. Upon further biochemical validation, we verified that Fus3 displays cell cycle-regulated interaction with the 26 S proteasome in yeast, providing the first physical evidence linking the proteasome to the mating response pathway. Without functional enrichment analysis, this highly interesting protein interaction may have been passed over for more abundant proteins identified in the analysis.
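Statistical overrepresentation of an annotation is commonly assessed with a one-sided hypergeometric test, shown here as a minimal sketch. The choice of test and the variable names are assumptions for illustration; individual tools may use modified statistics and usually apply a multiple testing correction across all terms, which is omitted here.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value: the probability of observing at
    least `hits` annotated proteins in a list of `sample` proteins drawn
    from a background of `population` proteins, of which `annotated`
    carry the annotation (e.g. a GO term)."""
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("inconsistent counts")
    lower = max(hits, 0)
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(lower, upper + 1)
    )
    return favorable / comb(population, sample)

# Hypothetical counts: a cluster of 30 proteins contains 8 proteins with a
# term that annotates 100 of 6000 background proteins.
p = enrichment_p_value(population=6000, annotated=100, sample=30, hits=8)
print(f"p = {p:.3g}")
```

The result depends strongly on the background chosen (whole proteome versus all proteins identified in the experiment), so the background should be stated whenever enrichment values are reported.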
The data analysis processes described above and outlined in Fig. 3 are intensive and require significant knowledge of bioinformatics techniques to obtain statistically significant results. Often, data analysis is the rate-limiting step in the interpretation of AP-QMS results. In recent years, several types of free access software have become available to aid with analysis of proteomics data, particularly AP-MS data. Table I details the unique features and advantages of five of these open access platforms: search tool for recurring instances of neighboring genes (STRING) (112), protein interaction network analysis (PINA) (113), Data Analysis Tool Extension (DAnTE) (114), SuperHirn (115), and database for annotation, visualization and integrated discovery (DAVID) (116). In the following paragraphs, we will provide a brief overview of how these software products can be used to accomplish the data analysis steps outlined above.
Recently, a handful of web-based platforms have been developed to accomplish extraction, creation, and visualization of PPI networks in a single, easy to use software suite. The two platforms that we find to be the most convenient and useful are STRING (112, 117) and PINA (113), each with its own unique features. The STRING platform integrates and weights PPI information from several sources, covering about 2.5 million proteins from 630 organisms (112). This megadatabase supplements the compiled information from PPI databases with functional association predictions from genomic contexts, co-expression data, and literature mining to calculate confidence scores for each interaction. Following user-friendly data upload (input gene/protein list), the STRING platform creates an interactive PPI map that has several display modes. In the confidence view, as the name implies, the user can visualize the confidence of each interaction displayed as weighted edges. Additionally, by clicking on an edge, the evidence summary of each interaction can be examined individually. With the evidence view, the interface displays color-coded edges that delineate the evidence of interaction. In this way, the user can view all interaction types for the entire PPI network rather than manually checking each interaction in the confidence view. Lastly, the actions view displays known functional relationships between two interacting proteins (e.g. inhibition or activation, posttranslational modification, or expression). Conveniently, all of the information generated by STRING can be saved and downloaded in user-friendly formats (PNG, SVG, XML, FASTA, and TXT).
Similar to STRING, PINA compiles and maps interaction data from six public PPI databases (113) and provides network construction, filtering, and visualization tools to generate interactive PPI networks from protein lists (input UniProt accession numbers). However, unlike STRING, the PINA platform is not limited to searching a single list of proteins but is capable of handling queries from a list of protein pairs or two lists of proteins (113), making it one of the most versatile network creation tools available. The network construction and analysis tools of PINA also have display options for visualizing what they term detection methods (evidence) as well as the types of interactions (actions). In addition to these features, PINA provides protein annotations of protein domains and GO terms. With the GO enrichment analysis tool, PINA identifies significantly enriched GO terms of the PPI network and allows the user to visualize and filter proteins that share similar GO annotations. Additionally, unique graph-theoretical tools provide centrality measures that combine topological data (diameter, degree distribution, shortest path distribution, and clustering coefficient) to identify topologically important proteins of the PPI network such as hubs or bottlenecks. In this way, the user can identify and prioritize proteins that may be central for network connectivity for further validation. All network information can be saved on the PINA web site or downloaded for further use (113).
Clustering analysis requires a significant amount of computational effort and expertise. For those who do not want to attempt the analysis themselves and those who do not have access to a bioinformatics collaborator, there are a couple of downloadable software packages that can cluster proteomics data. We will describe two of these: 1) DAnTE and 2) SuperHirn. DAnTE is downloadable software that can accept tabular quantitative data and will perform both hierarchical and k-means clustering analyses that can be visualized as heat maps (114). There are no explicit requisites for the type of quantitative analysis input; any abundance measurements are suitable once converted into proper file format (such as Excel or CSV) (114). In addition to clustering, DAnTE also includes features such as data normalization, reproducibility analysis, and the ability to handle missing values (114). Each of these measures can be visualized by one of several statistical plots including histograms, box plots, and correlation diagrams (114). SuperHirn, created by Ruedi Aebersold and co-workers (115), is downloadable software capable of clustering AP-QMS data. With SuperHirn, only label-free quantitative data can be analyzed; however, what is unique about this software is that it can directly analyze FT-MS/MS data files (centroid or raw). Full details of the SuperHirn software can be accessed in the original publication (115) or online. Briefly, SuperHirn detects, tracks, and combines features in LC-MS patterns to generate a MasterMap that can be used for protein profiling. Additionally, this platform can be used for data quality assessment and subsequent identification and exclusion of low quality LC-MS runs (115). In sum, SuperHirn nicely manages data processing, database searching, protein identification, and quantification analyses in one software platform.
In addition to publicly available network mapping and clustering tools, there are many tools that can be used to analyze gene lists for biological annotations. Each of these tools has its own unique features and advantages that have been reviewed in detail by Khatri and Drăghici (118) and evaluated in Huang et al. (116). One of the most useful and comprehensive tools available that we will briefly discuss in this review is DAVID bioinformatics resources (116). DAVID consists of an integrated biological knowledge base that compiles information from over 40 publicly available gene annotation sources (119). By offering a vast variety of biological annotations including molecular function, biological process, protein complex, phenotype, and disease association, to name a few, DAVID is the most comprehensive, easy to use database we assessed. The resources available also include a suite of tools for calculating the statistical relevance of each enrichment term, charting and clustering enriched proteins, and visualizing enriched gene lists (116). What makes DAVID especially easy to use is that it can accept any gene list, supports several different species and over 35 different gene identifier formats (e.g. lists of UniProt or Swiss-Prot accession numbers, official gene symbols, Affymetrix ID, etc.), and is capable of automated batch conversion from one ID format to another. All of the ID conversions, annotation charts, tables, clusters, and statistical analyses can be downloaded as text files. Additionally, annotation clusters can be viewed and downloaded as heat map images.
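Two of the topological quantities named above, degree and clustering coefficient, can be computed from an edge list with a short, generic sketch. It does not reproduce how PINA combines its centrality measures; the function names, the hub criterion (a minimum degree), and the example network are assumptions made for illustration.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)

def degrees(adjacency):
    """Number of interaction partners per protein."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

def clustering_coefficient(adjacency, node):
    """Fraction of a protein's neighbor pairs that also interact."""
    neighbors = adjacency.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1 for u in neighbors for v in neighbors
        if u < v and v in adjacency[u]
    )
    return 2.0 * links / (k * (k - 1))

def hubs(adjacency, min_degree):
    """Proteins with at least min_degree partners (a simple hub criterion)."""
    return sorted(n for n, d in degrees(adjacency).items() if d >= min_degree)

# Hypothetical network: a triangle A-B-C plus one extra partner of A
network = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")])
print(degrees(network), hubs(network, min_degree=3))
print(clustering_coefficient(network, "A"))
```

Highly connected proteins identified in this way are reasonable first candidates for follow-up, keeping in mind that well-studied proteins tend to have more annotated interactions in public databases.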
In summary, although no one software suite has been created to handle all data analysis steps, by using combinations of multiple platforms, the desired analysis can be achieved. Therefore, with thoughtful planning and careful execution, anyone is capable of analyzing AP-QMS interaction data.
Affinity purification coupled with quantitative mass spectrometry has been demonstrated to be a powerful proteomics tool for mapping protein interaction networks of protein complexes and elucidating complex dynamics associated with various physiological conditions. Although significant progress has been made in methods of sample preparation, tagging, purification, quantitation, and data analysis, new advancements are required to generate more comprehensive, reproducible, and high confidence interaction data sets. Although stable isotope labeling-based quantitative techniques continue to hold their front row seats, new improvements in quantitation accuracy of label-free methods (120, 121) are highly desired for comparing large scale protein interaction networks. The development of the QTAX strategy shows great promise for generating an in vivo snapshot of protein-protein interactions of all natures that occur under physiological conditions in cells or tissues. Current studies have set forth only the initial steps of the long term goal of understanding protein interactions, dynamic composition, posttranslational modifications, assembly, and structure of protein complexes in living systems. The limitation lies in the availability of membrane-permeable cross-linkers that can capture every protein contact with various distances between protein-protein binding interfaces. A recent report on structural elucidation of the RNA polymerase II-transcription factor IIF complex by in vitro cross-linking and mass spectrometry (122) shed some light on future work toward in vitro and in vivo structural analyses of macromolecular machinery. With increased information on protein interactions in mammalian systems and new developments in bioinformatics tools, functional analysis of protein interaction networks that aims to understand protein sociology in cells may become accessible in the foreseeable future. 
Integration of chemistry, biochemistry, molecular biology, mass spectrometry, and bioinformatics disciplines is essential for developing the next generation of tools to provide better understanding of protein complexes and the functional links found in interaction networks for improved diagnostics as well as treatment of human diseases.
We thank Dr. Chaity Aiken for critical reading of the manuscript.
* This work was supported, in whole or in part, by National Institutes of Health Grant GM-74830 (to L. H.).
1 The abbreviations used are: