Protein-protein interaction networks represent an important aspect of systems biology. Understanding the plant protein-protein interaction network and interactome will provide crucial insights into the regulation of plant developmental, physiological, and pathological processes. In this review, we first define the concepts of the plant interactome and the protein-protein interaction network and discuss the significance of plant interactome studies. We then compare the pros and cons of different strategies for interactome mapping, including the yeast two-hybrid system (Y2H), affinity purification mass spectrometry (AP-MS), bimolecular fluorescence complementation (BiFC), and in silico prediction. The application of these platforms to specific plant biology questions is further discussed. Recent advances reveal the great potential of plant protein-protein interaction networks and the interactome to elucidate molecular mechanisms for signal transduction, stress responses, cell cycle control, pattern formation, and other processes. Mapping the plant interactome in model species will provide important guidelines for the future study of plant biology.
Protein-protein interactions have always been an important consideration in the study of gene function. Physical interactions between proteins are relevant to a variety of biological processes, including signal transduction, homeostasis control, stress responses, plant defense and organ formation [1-7]. At the molecular level, protein-protein interactions are important in protein phosphorylation, transcriptional co-factor recruitment, post-translational modification of enzymes for activation or deactivation, assembly of the cytoskeleton, transporter activation, and many other processes [4,8-14]. Protein-protein interactions thus play essential roles in many physiological, pathological and developmental processes in essentially all organisms.
In the age of systems biology, the accumulation of protein-protein interaction data has enabled the systems-level study of protein interaction networks. Several major techniques, such as yeast two-hybrid (Y2H) and affinity purification mass spectrometry (AP-MS), have been used to survey protein-protein interaction networks in a variety of model species. The protein-protein interaction network generally refers to the network based on the physical interactions among proteins, as shown in Fig. (1), where each protein is represented as a node and each physical interaction as an edge. The protein-protein interaction network is different from the genetic interaction network, which is characterized to elucidate how genes function together in biological processes [16-19]. The genetic network has been successfully characterized with synthetic genetic array (SGA) analysis in yeast and with related approaches in C. elegans [17-20]. Despite its great potential, the application of SGA-type approaches in plants is complicated by the diploid genome, low-throughput breeding, and the frequent duplication of genes and gene functions. The genetic interaction network is thus related to, but distinct from, the protein 'physical interaction' network, which is established from protein interactions rather than genetic interactions [5,22]. The genome-level protein-protein interaction network is also referred to as the 'protein interactome' or simply the 'interactome'.
The yeast two-hybrid system (Y2H) and immuno-coprecipitation coupled with tandem mass spectrometry (AP-MS) are the two major platforms that have been used to build interactomes in different species. The study of the global interactome is most advanced in relatively simple biological systems such as yeast [24,25]. Both Y2H and AP-MS were applied to map a comprehensive interactome in yeast, the first model species with an available interactome [18,24-27]. Following the yeast interactome, various large-scale efforts have helped to define the protein interactome in several model organisms, including fruit fly [28,29], C. elegans, and human [31-33].
Despite the progress in other species, no global plant interactome based on experimental data has been published yet. Several projects funded by the National Science Foundation (NSF) have already initiated the process, but the only available Arabidopsis interactome is based on in silico prediction. Plant interactome mapping will help to elucidate the important signal transduction pathways for physiological, pathological and developmental processes. In particular, protein-protein interaction networks will allow us to identify the so-called hubs of the networks, the highly connected nodes that often correspond to genes with important functions [5,6]. In this review, we first discuss several major platforms for plant protein-protein interaction network analysis and their advantages and limitations as applied to plant biology. The application of these platforms to specific plant biology questions is then covered in detail. The future directions of plant interactome research are discussed at the end of the review.
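The node-and-edge representation and the notion of a hub can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the protein names are placeholders rather than real interactions, the function names `build_network` and `find_hubs` are hypothetical, and defining a hub by a fixed minimum number of partners is a simplification chosen here for clarity.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            # Keep a homodimer as a node, but do not add a self-edge.
            network[a]
            continue
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def find_hubs(network, min_degree):
    """Return (protein, degree) for proteins with at least min_degree partners."""
    hubs = [
        (protein, len(partners))
        for protein, partners in network.items()
        if len(partners) >= min_degree
    ]
    # Highest degree first; ties are broken alphabetically.
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: placeholder names, not real interactions.
toy_interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P5", "P1"),
]
toy_network = build_network(toy_interactions)
toy_hubs = find_hubs(toy_network, min_degree=3)
print(toy_hubs)
```

In this toy network only P1 reaches the assumed threshold of three partners. In practice the threshold would need to be chosen relative to the degree distribution of the measured network.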
The plant interactome can be mapped with either experimental or in silico methods. Despite the progress made with many different in silico prediction approaches, experimental approaches such as Y2H, BiFC, and AP-MS are still believed to be the most reliable for plant interactome mapping. As summarized in Table 1, we review the concepts, applications, and pros and cons of each platform.
The yeast two-hybrid system was first introduced in 1989, and many improvements have since been developed to meet the various needs of studying protein-protein interactions [23,35]. Y2H takes advantage of the modular structure of a yeast transcription factor, the GAL4 protein, which binds to the upstream activating sequence (UAS) to activate downstream gene expression. The GAL4 protein has two functional domains, the transcription activation domain and the DNA-binding domain. In Y2H, the bait protein and the target protein are each fused to one of the two domains. Binding of the two proteins brings the two domains together to reconstitute a functional GAL4 transcription factor, which in turn activates the UAS-driven expression of reporter genes [23,36,37].
Despite the breakthroughs it has enabled, the Y2H technology has some serious drawbacks. First, it is often associated with a high false positive rate. Second, in the classic Y2H system the interactions must take place in the nucleus. Many current Y2H vectors encode nuclear localization signals to transport cytoplasmic proteins into the nucleus so that their interactions can be studied. However, this can further increase the false positive rate. In particular, membrane proteins and proteins with strong signals for cytoplasmic localization are difficult to study with Y2H [7,33]. Third, proteins toxic to yeast cells are difficult to study in the Y2H system. Modified vectors with lower expression of the fusion proteins may improve the performance of Y2H for toxic proteins. To address these problems and various other needs, several systems have been developed to improve the technology, including hSos/Ras recruitment systems, the split-ubiquitin system, three-protein systems, small ligand-dependent systems, dual-bait systems, reverse two-hybrid systems, and mammalian and bacterial cell two-hybrid systems.
Y2H has been broadly used in plant biology to identify interactions among proteins; however, most previous research focused on a particular group of genes rather than on the systems level. A Y2H-based interaction network was built for essentially all Arabidopsis MADS domain proteins and revealed both specific heterodimers and homodimers. Proteins involved in similar developmental processes clustered together, and this information will help to predict the function of uncharacterized MADS domain proteins. In a similar study, the interactions between MYB proteins and R/B-like BHLH proteins were characterized, which helped to distinguish the functions of different MYB proteins with similar sequences. A few comprehensive networks have been established for a specific group of genes or a particular biological process. A Y2H-based protein-protein interaction network was built for genes involved in abiotic stress and seed germination in rice. More recently, a network of 116 representative rice kinases and 254 of their interacting proteins was mapped to understand the roles of protein kinases. Even though this work revealed significant novel information about protein kinase interactions and functions, it has, together with a similar study using AP-MS, highlighted two major limitations of the Y2H technology. First, the AP-MS method identified more components of the protein complexes than Y2H did. Because Y2H mainly examines the interaction between two proteins (a binary interaction), it may not identify all the interactions in a complex, since indirect interactions are not detected by the technique. Second, Y2H results have to be validated, because the assay only reports on interactions that occur outside of the plant cell. To actually interact, two proteins need to be present in the same subcellular location at the same time [4,48]. Y2H cannot take these factors into consideration.
Y2H-based mapping of the Arabidopsis interactome is leading to the identification of more protein-protein interactions, and this information is becoming available through The Arabidopsis Information Resource (TAIR). Considering the limitations of the Y2H platform, it is important to complement the Y2H-based interactome with other platforms such as AP-MS and BiFC to reduce false positives and to study the dynamics of the network. As mentioned above, the Y2H assay examines protein-protein interactions in yeast, outside of the plant cell, whilst both BiFC and AP-MS detect interactions in planta. If two proteins have compatible interaction domains but are never co-expressed or co-localized, Y2H will report a false positive interaction, whereas BiFC and AP-MS are unlikely to yield a positive result in such a case.
AP-MS combines affinity purification with mass spectrometry and has been broadly used to study protein complexes. In the AP-MS method, a bait protein is fused to an affinity tag such as a His-tag, FLAG-tag, or TAP-tag for in vivo expression. Multi-component protein complexes can then be isolated directly from the cell lysate through affinity purification steps and analyzed by downstream MS or MS/MS analysis. Searching the MS/MS spectra against a protein sequence database yields the candidate target proteins involved in the interaction. The advantages of AP-MS are clear. First, the method can be conducted under native physiological conditions, so that it reflects in vivo binding. Second, the approach allows us to probe the dynamics of protein-protein interactions, since interactions under various conditions can be analyzed. Third, the approach can be used to pull down entire protein complexes [49-52]. In addition, the recent development of cross-linking methods has enabled the study of membrane proteins with AP-MS. Formaldehyde and dithiobis(succinimidyl propionate) (DSP) are the reagents most often used to cross-link transient protein-protein interaction partners. Formaldehyde couples nearby primary amines, and the reaction can be reversed by heating. DSP is a membrane-permeable chemical that forms amide bonds with primary amines [24,54,55].
Despite these significant advantages, the approach is not perfect. As with Y2H, AP-MS is often associated with a high false positive rate [48,51-53]. Affinity-based protein purification is often associated with several types of protein contaminants. First, abundant proteins such as actin, tubulin and ribosomal proteins are often found in the sample. In plants, RuBisCO is often a major contaminant for AP-MS in green tissues, but many of these contaminants can easily be ruled out by statistical analysis. Second, proteins that interact with unfolded peptides, such as heat shock proteins, may be purified together with the targeted protein. Third, proteins that interact with the affinity matrices can also be found. For example, proteins such as STK38, arginine N-methyltransferase-5, and the 52-kDa Ro protein were often found in FLAG immunoprecipitates from human cells. In plants, the background proteins in TAP purifications include heat-shock proteins, RuBisCO, cytoskeleton proteins, ribosomal proteins, and others [51,66]. Fourth, external proteins such as keratin are often found in the sample; these can easily be ruled out. To eliminate false positives, careful experimental design and proper controls are needed to allow researchers to discriminate between contaminants and true interacting proteins. Tandem affinity purification (TAP) has been reported to help avoid the isolation of proteins with affinity for the distal tag or the matrix; however, TAP tags were often found to interfere with protein function [50,52,57-59]. In addition, control proteins expressed at similar levels and carrying the same tag as the bait proteins can help to identify true interacting proteins. High-throughput studies therefore have an advantage over low-throughput studies: a contaminant protein is often isolated with many unrelated bait proteins, a result that is easily recognized through analysis of the high-throughput data.
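The observation that a contaminant tends to co-purify with many unrelated baits can be sketched as a simple frequency filter. This is a minimal sketch, not a published scoring method: the function names, the toy purification lists, and the cut-off of 50% of baits are all assumptions made for illustration, and real analyses typically also use control purifications and quantitative information.

```python
from collections import Counter


def flag_frequent_preys(purifications, max_fraction=0.5):
    """Flag proteins that co-purify with more than max_fraction of all baits.

    purifications maps each bait name to the set of proteins identified
    in its purification. The 0.5 default is an arbitrary assumption.
    """
    n_baits = len(purifications)
    counts = Counter()
    for bait, preys in purifications.items():
        # Count each prey once per bait and ignore the bait itself.
        counts.update(set(preys) - {bait})
    return {prey for prey, n in counts.items() if n / n_baits > max_fraction}


def remove_contaminants(purifications, contaminants):
    """Return the purification lists with flagged contaminants removed."""
    return {
        bait: set(preys) - contaminants - {bait}
        for bait, preys in purifications.items()
    }


# Hypothetical toy data: bait and partner names (B1, X1, ...) are placeholders.
toy_purifications = {
    "B1": {"RuBisCO", "HSP70", "X1"},
    "B2": {"RuBisCO", "HSP70", "X2"},
    "B3": {"RuBisCO", "X1"},
    "B4": {"RuBisCO", "HSP70", "X3"},
}
toy_contaminants = flag_frequent_preys(toy_purifications)
toy_filtered = remove_contaminants(toy_purifications, toy_contaminants)
print(sorted(toy_contaminants))
print(toy_filtered["B3"])
```

A fixed frequency cut-off will also discard genuine partners that happen to be shared by many baits, so the threshold trades false positives against false negatives.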
It has been shown that AP-MS fails to recover many previously documented protein-protein interactions [60,61]. The implications could be two-fold. On the one hand, AP-MS may miss true interactions and thus produce false negatives. On the other hand, the interaction may simply not exist under the physiological conditions studied. One of the advantages of AP-MS is its ability to capture the dynamics of protein-protein interactions, since the in vivo analysis allows us to probe interactions under specific conditions. Negative results can therefore be very valuable, as they help to identify which interactions truly occur under a given condition. It is thus important to analyze the protein-protein interaction network with more than one approach.
AP-MS has recently been adapted to study plant protein-protein interaction networks. Among the different platforms, tandem affinity purification (TAP) followed by mass spectrometry (MS)-based protein identification is the most common AP-MS platform for elucidating protein-protein interaction networks in planta [23,47,49,57,58,62,63]. As in the Y2H studies, protein kinases became one of the earliest targets for AP-MS-based networks because of their importance in signal transduction. Van Leene and colleagues used an integrated approach combining the Gateway cloning system, transient expression in suspension cell cultures, TAP purification, and MALDI tandem mass spectrometry to study the cell cycle-related protein-protein interaction network. Forty-two protein-protein associations were identified for the six core cell cycle proteins, of which 28 were new interactions. Despite this significant progress, the approach is limited to the cell culture system and is thus not suitable for many other studies, such as whole-plant stress responses and plant-insect interactions. A transient expression system for intact plants is therefore necessary. Another major advance in AP-MS is the recent development of cross-linking approaches to identify interactions involving membrane proteins, which has greatly broadened the range of AP-MS applications.
Bimolecular fluorescence complementation (BiFC) is based on the reconstitution of two non-fluorescent fragments of a split GFP variant into a fluorescent protein complex. The bait and target proteins are each fused to one of the two fragments of the fluorescent protein. Binding of the bait and target proteins brings the two complementary fragments together to reassemble the fluorescent protein, which can be observed by fluorescence microscopy. Because of the autofluorescence of plant tissues, Yellow Fluorescent Protein (YFP) and Red Fluorescent Protein (RFP) are often used instead of Green Fluorescent Protein (GFP) for plant BiFC studies [66,67]. BiFC can be used to explore protein-protein interactions in various cellular compartments and to localize the interaction complex, which is a major advantage of the technique. In addition, BiFC is a sensitive assay that enables the detection of weak and transient interactions, primarily due to the stability of the reconstituted YFP complexes. Despite these advantages, the application of the technique to network and whole-interactome mapping is limited by several major disadvantages. First, the technique is not optimal for high-throughput assays, considering the effort required for construct making and the many possible combinations of bait and target proteins. Second, the slow maturation of the reconstituted GFP/YFP/CFP (Cyan Fluorescent Protein) often compromises the detection of dynamic changes in protein-protein interactions in real time [69-71]. Various efforts have been put into the development of YFP-based methods, considering that some YFP variants can mature within a few seconds [72-74]. Third, high-level expression of the split fluorescent protein fragments can lead to the detection of non-specific binding, which results in false positives. Fourth, the most important limitation of the technology as applied to plant biology is the autofluorescence of plant cells, which often interferes with the fluorescent signal of the BiFC experiment. The development of red fluorescent proteins with long excitation and emission wavelengths will help to partially overcome this problem. BiFC assays have been successfully applied in several plant species, including Arabidopsis and tobacco [76,77]. For example, BiFC has enabled the study of the different protein complexes formed by various CBL/CIPK proteins.
In addition to the experimental methods, in silico methods that combine computational modeling with previously published interactions have also been used to establish 'predicted' networks. However, the reliability of such analyses is often controversial. Moreover, the prediction usually yields a static network, which lacks information on the dynamics of protein-protein interactions. In silico analyses often integrate multiple data types, including gene co-expression, co-localization, functional category, and the occurrence of orthologs or interologs, to derive a global network for a species [79,80]. The major basis for building network connections is the so-called 'interolog': an interaction predicted between two proteins because their sequence-similar counterparts interact with one another in another model species, such as yeast or human. However, almost 45% of the predicted interolog proteins in Arabidopsis are annotated as having unknown function, which raises questions about the reliability of the interolog-based network. Increasing the coverage and depth of gene functional and ontology annotation will greatly improve the reliability of interolog-based network prediction in Arabidopsis. The interolog-based method has led to the prediction of a global interactome in Arabidopsis. Recently, a database was built to curate the predicted and published protein-protein interactions in Arabidopsis.
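The core of interolog-based prediction, transferring an interaction from a reference species to the orthologs of both partners, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `predict_interologs`, the ortholog table, and all protein identifiers are hypothetical, and the additional evidence types mentioned above (co-expression, co-localization, functional category) are deliberately left out.

```python
def predict_interologs(reference_interactions, orthologs):
    """Transfer interactions from a reference species to a target species.

    reference_interactions: iterable of (protein_a, protein_b) pairs observed
        in the reference species (for example yeast).
    orthologs: dict mapping a reference protein to the set of its orthologs
        in the target species.
    Returns a set of predicted target-species pairs, each stored as a sorted
    tuple so that (x, y) and (y, x) count as the same interaction.
    """
    predicted = set()
    for a, b in reference_interactions:
        # A pair is transferred only if both partners have orthologs.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted


# Hypothetical toy data: Y* are reference proteins, A* are target orthologs.
toy_reference = [("Y1", "Y2"), ("Y2", "Y3")]
toy_orthologs = {
    "Y1": {"A1"},
    "Y2": {"A2", "A3"},  # a duplicated gene in the target species
    # Y3 has no known ortholog, so Y2-Y3 cannot be transferred.
}
toy_predicted = predict_interologs(toy_reference, toy_orthologs)
print(sorted(toy_predicted))
```

The duplicated ortholog (A2 and A3) shows how gene duplication inflates the number of predicted pairs, since the sketch cannot tell which copy retained the interaction.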
As mentioned above, the only available global interactome in plants is based on in silico methods, and an experimentally derived plant interactome is yet to be built. However, several plant protein-protein interaction networks have been built for particular groups of proteins to address a specific biological question or to target an individual pathway. We summarize these studies here from the perspective of how protein-protein interaction networks have been employed to address various biological questions.
One of the major functions of protein-protein interactions is to regulate signal transduction in plants. Many interacting proteins are important components of signal transduction pathways, and the dynamics of their interactions often define the output of the pathway. Protein kinases and phosphatases are important components of signal transduction pathways. In particular, higher plants generally encode larger protein kinase and phosphatase families, presumably because of the complicated defense and stress-related pathways involved. A comprehensive network for rice protein kinases was built to study the potential functions of these kinases. The research allowed a better prediction of the molecular and biological functions of kinases, particularly in relation to plant defense. In addition, two-component systems are important in the early stages of signal transduction. A Y2H-based network was built to characterize the two-component system involved in cytokinin signal transduction. The results indicated that cytokinin receptors can interact with a variety of different proteins, potentially leading to different signaling outcomes. Besides the work described above, most protein-protein networks are in some way related to the signal transduction pathways of particular biological processes, including abiotic stress responses, development and organ formation, and plant defense against insects and pathogens [8,63,82].
In an earlier study, Cooper et al. employed Y2H to build a protein-protein interaction network with more than 200 genes involved in abiotic stress and seed germination. The protein-protein interaction network was integrated with global gene expression and quantitative trait loci (QTL) data to reach a systems-level understanding of how the rice response to abiotic and biotic factors is regulated. The network led to the discovery of some disease resistance-related genes. Moreover, a certain level of functional correlation between monocot and dicot homologs was found through the network, which indicates that such a network could have broader implications for gene discovery and functional characterization. More recently, a Y2H-based network was built in wheat to characterize 97 interaction pairs among 73 proteins involved in abiotic stress response and development. Almost all of the bait proteins, along with their interactors, were interconnected into a single network. The research shed light on the complex interactions among the transcription factors involved in flower development, ABA signaling, abiotic stress, and other processes.
The relationship between abiotic stress responses and plant development can be found at the signal transduction level and is well reflected in protein-protein network studies. Abiotic stress responses sometimes mimic certain stages of plant development. For example, leaf cells undergoing dehydration stress express some of the same genes that embryonic cells express during development or seed desiccation. The Y2H network for stress response and seed development built by Cooper et al. revealed a broad overlap between the genes involved in developmental processes and those involved in stress responses. For example, a D-REP DOF ZFP protein was found to have overlapping functions in seed development and stress response. Similarly, in wheat, the Y2H network built by Tardif revealed connections among flower initiation, vernalization, ABA signaling and abiotic stress, which are mediated by several important transcription factors. Protein-protein networks were also built for other gene families important for developmental processes, including the MADS genes and MYB genes. The interaction network for MADS genes revealed that proteins involved in similar developmental processes clustered together, which helped to predict the function of uncharacterized MADS domain proteins. Similarly, the interactions between MYB proteins and R/B-like BHLH proteins were characterized, which helped to distinguish the functions of different MYB proteins with similar sequences. In addition, the interactions between CTR proteins and ethylene receptors were probed with both Y2H and BiFC, which further confirmed the role of CTR in ethylene signal transduction and fruit ripening.
No study has yet been dedicated specifically to a defense-related protein-protein interaction network. However, the interaction network built for protein kinases led to the identification of the potential involvement of E3 ubiquitin ligases in pathogen defense signaling mediated by receptor-like kinases. In particular, the rapidly expanding kinase gene families are more likely to be involved in defense. This finding agrees with the general assumption that kinases are involved in defense and with the observation that defense-related gene families in plants often undergo rapid expansion.
Van Leene et al. studied the Arabidopsis cell cycle protein-protein interaction network using a high-throughput AP-MS approach that integrates transient protein expression in suspension cell culture, TAP affinity purification, and MALDI tandem MS protein identification. In total, 42 protein-protein interactions were identified, 28 of which were new. The results mapped the interactions around six core cell cycle-related proteins and revealed important regulatory mechanisms for the cell cycle.
Ubiquitination is a key regulatory mechanism that controls protein abundance, localization, and activity in eukaryotic cells. Maor et al. carried out a large-scale affinity purification of ubiquitinated proteins from Arabidopsis cell suspension culture and identified the purified proteins with multidimensional protein identification technology (MudPIT). Using stringent protein identification criteria, a total of 294 proteins specifically bound by the GST-tagged ubiquitin binding domains were identified. Among these proteins, 56 putative ubiquitinated proteins were shown to carry 85 ubiquitinated lysine residues, which confirms the enrichment of the target class of proteins. The analysis provided an overview of the ubiquitinated Arabidopsis proteome, which can help to understand proteome dynamics and regulation.
RNA polyadenylation is an important process for RNA maturation and the regulation of RNA turnover. A protein-protein interaction network for the mRNA polyadenylation machinery was built to identify the important genes involved in the process. The network identified several key genes at hub positions, and the Y2H data were also validated with AP-MS. More importantly, the network combined with gene expression profiling revealed some sub-complexes that may be involved in RNA processing in a tissue-specific and developmental stage-specific manner.
As discussed above, the study of protein-protein interaction networks and the protein interactome is at the cutting edge of efforts to expand our understanding of biological processes and networks. Mapping the plant interactome will be important for elucidating the molecular mechanisms of signal transduction, abiotic stress responses, plant defense, organ formation and many other biological events [4-6,9,49,85]. The study of protein-protein interaction networks will thus also have significant implications for yield increase, pest management, stress resistance, the regulation of biomass composition, and various other crop and biofuel-relevant traits [4-6,9,11,13,14,85,86].
Despite the progress, several challenges need to be addressed to fulfill the promise of the plant interactome and protein-protein interaction networks. First, the dynamics of a network need to be surveyed. Protein-protein interaction networks are highly dynamic, and interactions in vivo are often transient. The dynamics of the network often determine its output. It is thus important to understand the dynamics of the protein-protein interaction network in order to dissect its role in regulating dynamic biological processes. Such dynamics are reflected in a comparative study of a network of five 14-3-3 isoforms and 150 target proteins in barley using both Y2H and AP-MS. Many more interactions were identified in the Y2H assay, which may well reflect the dynamics of the network: AP-MS identifies the transient, in vivo interactions that occur under a given condition, whilst Y2H only reports whether two proteins have the interacting domains and are able to interact. Y2H is thus limited to the study of the static network and yields very limited dynamic information. Both BiFC and AP-MS can be used to study the dynamics of protein networks [49,78]. BiFC is probably the best option for studying system dynamics because it allows in vivo observation of protein interactions through fluorescent signals and thus the intracellular localization of the interaction. However, the time resolution with which network dynamics can be resolved depends on protein stability and turnover rate. AP-MS can also be used to study the dynamics of protein-protein interactions; however, a suitable transient expression system is required to enable high-throughput AP-MS.
Besides network dynamics, the comprehensiveness of the network is also an important consideration. The more comprehensive the plant interactome we obtain, the better our systems-level understanding of the regulation of various biological processes will be [5,87]. The recent efforts to map the Arabidopsis interactome with Y2H will certainly enhance our knowledge of how proteins interact with one another in planta and what these interactions mean for various biological processes. However, mapping the interactome with a single method may lead to severely biased data, given the inherent limitations of the Y2H technology. It is therefore important to complement the current research with other technologies such as AP-MS and BiFC. In the interactome studies in yeast and human, both AP-MS and Y2H were used to study the protein-protein interaction network. We expect the same to apply to plants, where more than one technology will be necessary to map the plant interactome and to explore the dynamics of the network during physiological, pathological and developmental processes.
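One simple way to combine datasets from several platforms is to record, for every protein pair, which methods support it. The Python sketch below illustrates this idea under stated assumptions: the function names and the toy pairs are hypothetical, and AP-MS results are treated as bait-prey pairs even though AP-MS reports co-complex membership rather than direct binary contacts.

```python
from collections import defaultdict


def evidence_by_pair(datasets):
    """Map each protein pair to the set of methods that reported it.

    datasets: dict mapping a method name (for example "Y2H") to an iterable
    of (protein_a, protein_b) pairs. Pairs are stored as sorted tuples so
    that the order of the two partners does not matter.
    """
    support = defaultdict(set)
    for method, pairs in datasets.items():
        for a, b in pairs:
            support[tuple(sorted((a, b)))].add(method)
    return dict(support)


def pairs_with_support(support, min_methods=2):
    """Return the pairs reported by at least min_methods methods, sorted."""
    return sorted(
        pair for pair, methods in support.items() if len(methods) >= min_methods
    )


# Hypothetical toy data: protein names are placeholders.
toy_datasets = {
    "Y2H": [("A", "B"), ("B", "C"), ("C", "D")],
    "AP-MS": [("B", "A"), ("C", "D"), ("D", "E")],
}
toy_support = evidence_by_pair(toy_datasets)
toy_confirmed = pairs_with_support(toy_support)
print(toy_confirmed)
```

Pairs seen by only one method are not necessarily wrong: as discussed above, a Y2H-only pair may never co-localize in planta, while an AP-MS-only pair may be an indirect or condition-specific association.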
Overall, protein-protein interaction networks and the interactome are important topics in plant systems biology. With more and more protein-protein interaction network information and the availability of a global-scale interactome, we believe that protein interaction information will guide us toward a deeper understanding of the molecular and systems-level mechanisms of various physiological, pathological and developmental processes in plants.