Most biological processes within the cell are carried out by proteins that physically interact to form stoichiometrically stable complexes. Even in the relatively simple model organism Saccharomyces cerevisiae (budding yeast), these complexes are composed of many subunits that work in a coherent fashion. These complexes interact with individual proteins or other complexes to form functional modules and pathways that drive the cellular machinery. Therefore, a faithful reconstruction of the entire set of complexes (the 'complexosome') from the physical interactions among proteins (the 'interactome') is essential for understanding not only complex formation but also higher-level cellular organization.
Since the advent of "high-throughput" techniques in molecular biology, several screens have been introduced to infer physical interactions among proteins on a large ("genome-wide") scale. These have helped to catalogue a significant number of protein interactions in organisms such as yeast, thereby fueling computational techniques to systematically mine and analyse protein complexes from protein-protein interaction (PPI) networks; for a survey of these methods, see [1].
Though these methods have helped to identify a considerable complement of complexes in organisms such as yeast, a crucial aspect that has been overlooked is the 'dynamics' of complexes. Many, if not all, complexes are dynamic entities whose subunits assemble at a particular sub-cellular space and time to perform a particular function and disassemble afterwards. However, the lack of suitable temporal information (the time at which a pair of proteins interact) in currently available high-throughput interaction datasets makes it difficult to computationally predict and study this dynamic behaviour of complexes. For example, if a subset of proteins in one complex is also involved in the formation of another complex at a different time, then existing complex detection methods working solely on PPI networks cannot disambiguate the two complexes; instead, they produce a single fused cluster containing proteins from both complexes as one predicted complex. This severely impacts not only the accuracy of the predictions but, more critically, our understanding of the underlying cellular organization. Indeed, in a recent (2010) survey, Przytycka et al. [2] emphasize that this lack of temporal information may have led to many cellular processes being wrongly understood. They suggest that if suitable information about the 'timing activities' of proteins can be obtained, the dynamical nature of the organizational principles guiding protein interaction networks and complexes can be better understood.
In this direction, several studies have begun to examine the temporal behaviour of proteins within PPI networks [3-7]. These studies primarily integrate time information, in the form of gene expression profiles of proteins, with the topological characteristics (positioning of proteins) of PPI networks. They have revealed several insights into cellular mechanisms that could not have been obtained without time information, thereby supporting the claims of Przytycka et al. [2]. The most important of these findings is the presence of two distinct kinds of 'hub' proteins within PPI networks - 'date hubs' and 'party hubs' - reported by Han et al. [3].
However, all these works have been limited to studying the temporal behaviour of individual proteins, pairs of proteins or small groups of proteins in PPI networks. Since proteins seldom perform their functions in isolation, a deeper understanding of this behaviour can be obtained by studying larger functional groups of proteins. In our work, we study the temporal behaviour of whole protein complexes. We do this by first identifying a suitable "time of reference" onto which the dynamic behaviour of protein subunits within complexes can be mapped, and then using this reference to study the dynamic assembly and disassembly of whole complexes. We chose the four phases of the yeast cell cycle as this time of reference. Experiments on this reveal an interesting relationship between the 'staticness' of a protein (constant expression across cell cycle phases) and its potential "reusability" across several phase-based complexes: 'static' proteins tend to be highly "reused" across complexes assembled and disassembled during different phases. We suspect that this pattern might be a biological design principle governing underlying cellular functions. Going further, we provide a new classification of proteins based on their temporal participation in complexes, and show that our classification provides additional support and alternative explanations for earlier classifications such as the 'date' and 'party' hubs of Han et al. [3].
A brief survey of works incorporating temporal information into analysis of PPI networks
Most existing works have primarily integrated gene expression profiles with PPI networks to study the relationship between dynamics of proteins and their positioning within networks. Here, we briefly summarize some of these works.
Correlation between topological positioning of proteins in PPI network and their expression profiles
Based on an analysis of a high-confidence yeast PPI network, Han et al. (2004) [3] reported an interesting dichotomy of hubs in PPI networks - 'date' hubs and 'party' hubs. Both date hubs and party hubs interact with multiple proteins, but date hubs interact with only one protein at a time (context), while party hubs interact with multiple proteins at the same time (context). Han et al. reported a strong correlation between the topological positioning of these hub proteins in PPI networks and their expression profiles: party hubs are 'modular' and are highly co-expressed with their neighbors, while date hubs are 'central' and are not co-expressed with their neighbors. Though this finding was critically questioned by Batada et al. [4,5], the existence of such a dichotomy is now increasingly being accepted [6,7], and it paved the way for the simultaneous analysis of network topologies and gene expression profiles.
Taking this further, Komurov et al. (2007) [7] studied how proteins with different expression dynamics were positioned in the yeast PPI network. Komurov et al. calculated the statistical expression variance (EV) of each gene in the yeast genome across 272 experiments compiled from SGD [8]. An EV close to 0 indicated a gene with the lowest variance (least dynamic), while an EV close to 1 indicated a gene with the highest variance (most dynamic). Using a high-confidence PPI network comprising 5456 interactions among 2315 proteins, Komurov et al. compared the EVs of proteins with those of their neighbors in the network, and found a strikingly high correlation between the two. This suggested that proteins had expression dynamics similar to those of their immediate neighbors in the network, and confirmed earlier findings (2001) [9] that co-regulated proteins frequently interacted with each other. Carrying this forward, Komurov et al. extended the date-party hub hypothesis of Han et al. [3] by proposing 'family' hubs. Komurov et al. reported that family hubs were constitutively expressed and interacted with their neighbors to form 'static' modules, while party hubs were dynamically co-expressed with their neighbors to form 'dynamic' modules. These static and dynamic modules were enriched with specialized functions.
Yu et al. (2007) [10] studied the topological positioning of hubs in the yeast PPI network, and reported that 'date' hubs show high betweenness and are therefore inter-modular, while 'party' hubs show a high clustering coefficient and are therefore intra-modular. More recently (2011), Patil et al. [11] classified hubs in PPI networks using a combination of gene co-expression correlation and co-expression stability among interacting proteins. The co-expression stability measures the extent to which a pair of proteins is constitutively co-expressed, that is, how "stable" the co-expression is. Based on these two measures, Patil et al. found that hubs showing high co-expression correlation as well as high stability with their neighbors (which they call 'Category 1' hubs) were likely to be intra-modular, while hubs showing low co-expression correlation but high stability with their neighbors ('Category 2' hubs) were likely to be inter-modular. Many of the Category 2 hubs were involved in transient interactions, and corresponded to 'date' hubs.
The 'dynamics' of complex formation during the yeast cell cycle
de Lichtenberg et al. (2005) [12] studied the dynamics of complex formation during the yeast cell cycle. They constructed a PPI network comprising 300 proteins (184 dynamic and 116 static) using Y2H and TAP/MS screens. Extraction of complexes from these screens and comparisons with known complexes from MIPS [13] revealed 29 heavily intraconnected modules (complexes or complex variants) that existed at different "time points" during the cell cycle. Further, most complexes contained both constitutively expressed (static) and periodically expressed (dynamic) proteins. More interestingly, almost all eukaryotic complexes were assembled just-in-time, in contrast to the just-in-time synthesis observed in bacteria. Just-in-time assembly meant that most subunits of a complex were pre-transcribed, while a few subunits were transcribed only when required to assemble the final complex. This was more advantageous than just-in-time synthesis because only a few components of an entire complex had to be tightly regulated to control the timing of the final assembly. Holding off on the last components enabled the cell to prevent complexes from being "switched on" at the wrong times.
Our study of protein 'dynamics' in complexes
The works discussed above provide ample evidence that our understanding of underlying cellular principles can be enhanced by studying the dynamics of proteins together with their topologies in PPI networks. However, these works are limited to studying pairs of proteins (neighbors) within PPI networks. Since proteins seldom perform their functions in isolation, a deeper understanding can be obtained by studying the dynamics of larger functional groups of proteins. In our work, we study the dynamics of proteins through their participation in complexes.
Methodology
It is not straightforward to study the dynamics of whole complexes by directly correlating the gene expression profiles of their constituent proteins: this involves computing expression correlations simultaneously among multiple proteins (and not just among pairs), which is not easy. To devise a simpler way, we "discretize" the profiling of proteins so that each protein can be assigned a unique discrete time during which it is active. Essentially, we first choose a suitable 'time of reference' consisting of discrete intervals of time. We then map each protein to a unique interval on this reference based on its peak expression, such that two proteins falling within the same interval can reasonably be considered "co-expressed" or simultaneously active, while those falling within different intervals are considered "not co-expressed". Once proteins have been profiled in this way, we map all constituent proteins within complexes onto this reference to understand the dynamic behaviour of whole complexes. This makes our analysis both simpler and more insightful, as we shall demonstrate.
Here, we use the yeast cell cycle as our discrete time of reference and its phases as our intervals. The cell cycle is a highly controlled process for the duplication of cells. The yeast (eukaryotic) cell cycle consists of four distinct progressive phases: G1 (Gap 1) → S (Synthesis) → G2 (Gap 2) → M (Mitosis). For each protein involved in the yeast cell cycle, we determine the phase in which the protein shows peak expression and map it to that phase. We then study the dynamic behaviour of whole complexes using the peak phases of the constituent proteins.
Of course, by adopting only the cell cycle as our time of reference, we are able to study only cell cycle-related complexes. We chose the cell cycle for two reasons. Firstly, it is a highly controlled process with distinct temporal phases, which makes it easy to bin proteins uniquely into the phases. Secondly, the availability of gene expression data for most of the cell-cycle proteins makes it convenient to compute the phases.
Experimental set up
We considered the four yeast PPI networks shown in Table 1 for our experiments. All four networks are built from raw TAP-MS interaction data from two large-scale screens by Gavin et al. [14] and Krogan et al. [15]. However, datasets produced from large-scale screens are known to contain a considerable number of spurious (false positive) interactions. Therefore, we first filter the datasets before performing our experiments. We used four reliability scoring schemes, namely Iterative-CD [16], FS Weight [17], Purification Enrichment (Consolidated network) [18] and Bootstrap-based [19], to score the interactions within the network and filter out the noisy (spurious) interactions. These scoring schemes are described in detail in the corresponding references; in summary, each scheme assigns a confidence score (in the range 0 to 1) to each interaction in the PPI network. These scores account for the technical uncertainties in the underlying experiments and therefore reflect the reliability of the interactions. Interactions with scores below a certain threshold (here, 0.20) are discarded, and the remaining interactions are retained for our experiments.
Table 1. Yeast PPI networks used in our analysis
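The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the protein names (`P1` to `P5`), the scores and the function name `filter_interactions` are hypothetical, and only the 0.20 cut-off comes from the text.

```python
# Minimal sketch of threshold-based filtering of scored interactions.
# All names and scores below are hypothetical illustration data.

def filter_interactions(scored, threshold=0.20):
    """Keep interactions whose confidence score is not below the threshold.

    scored: dict mapping a (protein_a, protein_b) pair to a score in [0, 1].
    """
    return {pair: score for pair, score in scored.items() if score >= threshold}

scored_interactions = {
    ("P1", "P2"): 0.85,
    ("P1", "P3"): 0.42,
    ("P2", "P4"): 0.12,  # below 0.20, discarded
    ("P3", "P5"): 0.20,  # exactly at the threshold, retained
}

reliable = filter_interactions(scored_interactions)
print(sorted(reliable))
```

Whether an interaction scoring exactly 0.20 is kept is an assumption of this sketch; the text only states that interactions scoring below the threshold are discarded.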
We employed a recent (2010) complex detection method, MCL-CAw [20], to predict complexes from the four networks for our study. MCL-CAw clusters the PPI network using only topological information to identify dense subnetworks, which are output as its predicted complexes. We further used the hand-curated yeast complexes from the Wodak CYC2008 catalogue [21] to substantiate the findings.
Assigning cell cycle phases to proteins
We assigned a unique cell cycle phase (G1, S, G2 or M) to each protein based on the phase in which it showed peak expression. We call this procedure Peak Expression Discretization (PED). To compute these phases we used Cyclebase (http://www.cyclebase.org/) [22]. Cyclebase averages gene expression datasets obtained from multiple microarray studies to compute the approximate phase of peak expression for each protein (see the accompanying figure). If a protein shows maximum expression in exactly one phase, it is labeled 'dynamic' along with the corresponding peak phase; if it shows maximum expression in more than one phase, it is labeled 'static'. Of the 6114 yeast proteins considered, 5514 were labeled 'static' and the remaining 600 'dynamic'. Of these 'dynamic' proteins, 576 had a distinct peak phase, while the remaining 24 were labeled 'uncertain'.
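The labelling rule of PED can be sketched as below. This is an illustrative reading of the rule, not the Cyclebase procedure itself: the per-phase expression values, the protein names and the `tolerance` parameter (which decides when two phases count as tied for the maximum) are assumptions introduced here.

```python
# Minimal sketch of Peak Expression Discretization (PED) under assumed inputs.
PHASES = ("G1", "S", "G2", "M")

def peak_expression_discretization(expression, tolerance=0.0):
    """Label one protein from its averaged per-phase expression levels.

    expression: dict mapping each phase in PHASES to an expression level.
    tolerance:  phases within this distance of the maximum count as peaks
                (hypothetical parameter, not taken from the source).
    Returns ("dynamic", phase) for a single peak phase, else ("static", None).
    """
    top = max(expression[phase] for phase in PHASES)
    peaks = [phase for phase in PHASES if top - expression[phase] <= tolerance]
    if len(peaks) == 1:
        return ("dynamic", peaks[0])
    return ("static", None)

# Hypothetical profiles: protA peaks in G1 only, protB is flat across phases.
profiles = {
    "protA": {"G1": 0.9, "S": 0.2, "G2": 0.1, "M": 0.1},
    "protB": {"G1": 0.5, "S": 0.5, "G2": 0.5, "M": 0.5},
}
labels = {name: peak_expression_discretization(p) for name, p in profiles.items()}
print(labels)
```

The sketch does not model the 'uncertain' label, since the text does not state the criterion used to assign it.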
Studying temporal characteristics of PPI networks
To begin with, we integrated the computed cell cycle phases of proteins with our PPI networks and analysed the network dynamics, as shown in Table 2. The table shows that interactions among static proteins (static-static, S-S) dominated the networks (for example, 94.69% in Consol3.19). This is crucial for maintaining the stability of the network. The static-dynamic (S-D) and dynamic-dynamic (D-D) interactions formed relatively small fractions of the networks (for example, S-D: 4.6% and D-D: 0.716% in the Consol3.19 network).
Table 2. Analysis of 'dynamism' in the four yeast PPI networks
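The breakdown reported in Table 2 amounts to counting interactions by the labels of their two endpoints. A minimal sketch, using hypothetical proteins and labels, could look like this:

```python
from collections import Counter

KINDS = ("S-S", "S-D", "D-D")

def interaction_type_percentages(interactions, label):
    """Percentage of static-static, static-dynamic and dynamic-dynamic edges.

    interactions: iterable of (protein_a, protein_b) pairs.
    label:        dict mapping each protein to "static" or "dynamic".
    """
    counts = Counter()
    for a, b in interactions:
        # 0, 1 or 2 dynamic endpoints select S-S, S-D or D-D respectively.
        n_dynamic = (label[a] == "dynamic") + (label[b] == "dynamic")
        counts[KINDS[n_dynamic]] += 1
    total = sum(counts.values())
    if total == 0:
        return {kind: 0.0 for kind in KINDS}
    return {kind: 100.0 * counts[kind] / total for kind in KINDS}

# Hypothetical toy network, for illustration only.
toy_label = {"s1": "static", "s2": "static", "s3": "static",
             "d1": "dynamic", "d2": "dynamic"}
toy_edges = [("s1", "s2"), ("s2", "s3"), ("s1", "d1"), ("d1", "d2")]
percentages = interaction_type_percentages(toy_edges, toy_label)
print(percentages)
```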
Further, we noticed that some of the dynamic partners of static proteins peaked in different cell cycle phases. In other words, a single static protein was involved in transient interactions with dynamic proteins peaking in different phases. These static proteins were enriched with a variety of Gene Ontology (GO) terms, the prominent ones being signal transduction and transcription. This indicated that these were likely "multipurpose" in nature. Their positioning in PPI networks showed that many of these static proteins were connected to different functional regions and they formed hubs in the networks. This indicated that 'staticness' or constitutive expression of a protein might be linked to the extent of "multipurpose" functions the protein was involved in, and also to the 'central' positioning of the protein in the PPI network.
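One way to find such "multipurpose" static proteins is to collect, for each static protein, the peak phases of its dynamic partners and keep those with partners in more than one phase. The sketch below uses hypothetical proteins and assumes that any protein without a peak phase is static.

```python
from collections import defaultdict

def multiphase_static_proteins(interactions, peak_phase):
    """Static proteins whose dynamic partners peak in two or more phases.

    interactions: iterable of (protein_a, protein_b) pairs.
    peak_phase:   dict mapping each dynamic protein to its peak phase;
                  proteins absent from it are treated as static (assumption).
    Returns a dict mapping each such static protein to the set of phases.
    """
    partner_phases = defaultdict(set)
    for a, b in interactions:
        for candidate, partner in ((a, b), (b, a)):
            if candidate not in peak_phase and partner in peak_phase:
                partner_phases[candidate].add(peak_phase[partner])
    return {p: phases for p, phases in partner_phases.items() if len(phases) >= 2}

# Hypothetical toy data: S1 interacts with dynamic proteins from G1 and M.
toy_peak_phase = {"D1": "G1", "D2": "M", "D3": "S"}
toy_interactions = [("S1", "D1"), ("D2", "S1"), ("S2", "D3"), ("S1", "S2")]
multiphase = multiphase_static_proteins(toy_interactions, toy_peak_phase)
print(multiphase)
```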
Studying dynamics of complexes in PPI networks
Next, we performed our intended study on protein complexes; the workflow is shown in the accompanying figure.
A case study of cyclin-CDK complexes
Firstly, we present an interesting case study to motivate our analysis. Upon clustering the Consolidated network using MCL-CAw, we obtained the following cluster containing Cdc28 (Ybr160w): {Ybr160w, Ygr108w, Ypr119w, Ydl155w, Ylr210w, Ypr120c, Ygr109c, Ymr199w, Ypl256c, Yal040c}. When we mapped the cell cycle phase data to the proteins in this cluster, we noticed that the proteins were expressed during different phases: Ybr160w - Static, Ygr108w - M, Ypr119w - G2, Ydl155w - S, Ylr210w - S, Ypr120c - G1, Ygr109c - G1, Ymr199w - G1/S, Ypl256c - G1, and Yal040c - M (see the accompanying figure). This revealed the existence of multiple 'time-based' complexes fused within this large cluster. Therefore, we decomposed the cluster into multiple complexes based on the phases, assigning the static Ybr160w to each of the complexes. Validation against the literature [23] confirmed that Cdc28 (Ybr160w) is a cyclin-dependent kinase (CDK) that participates in multiple complexes with its cyclin partners, and each of our segregated complexes matched a validated CDK-cyclin complex in the Wodak catalogue [21].
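The decomposition can be illustrated with the cluster and phase labels listed above. The sketch below groups the dynamic proteins by peak phase and adds every static protein to each group; keeping "G1/S" as a group of its own is an assumption of this sketch, since the text does not say how that label was handled.

```python
from collections import defaultdict

# Phase labels for the Cdc28 cluster, as listed in the text.
cluster_phases = {
    "Ybr160w": "Static", "Ygr108w": "M", "Ypr119w": "G2", "Ydl155w": "S",
    "Ylr210w": "S", "Ypr120c": "G1", "Ygr109c": "G1", "Ymr199w": "G1/S",
    "Ypl256c": "G1", "Yal040c": "M",
}

def decompose_by_phase(phases):
    """Split a cluster into phase-based groups sharing its static proteins."""
    static = {protein for protein, phase in phases.items() if phase == "Static"}
    groups = defaultdict(set)
    for protein, phase in phases.items():
        if phase != "Static":
            groups[phase].add(protein)
    # Each static protein is assigned to every phase-based group.
    return {phase: members | static for phase, members in groups.items()}

time_based = decompose_by_phase(cluster_phases)
for phase, members in time_based.items():
    print(phase, sorted(members))
```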
This procedure demonstrated, firstly, how incorporating time information helped to identify time-based complexes accurately which was not possible using only topology information from PPI networks. Secondly and more interestingly, the "reusability" of the 'static' protein Cdc28 across multiple complexes further hinted towards a possible relationship between 'staticness' and participation in multiple complexes or roles.
A global study of temporal "reusability" of proteins in complexes
We next performed a large-scale study of all complexes predicted from the yeast PPI networks to further confirm this potential link between 'staticness' and temporal reusability of proteins in complexes. To do this, we first grouped the proteins within complexes into two sets: the proteins that were specialized or unique to a single complex, and the proteins that were shared among multiple complexes. We call the specialized proteins "cores" and the shared proteins "attachments". If there is a link between 'staticness' and temporal reusability of proteins, we expect the attachment proteins to be more highly enriched in 'staticness' than the cores. We state this as our hypothesis and then test it.
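The split into cores and attachments follows directly from counting how many complexes each protein belongs to. A minimal sketch with hypothetical complexes:

```python
from collections import Counter

def split_cores_attachments(complexes):
    """Split each complex into its core and attachment proteins.

    complexes: dict mapping a complex name to a collection of proteins.
    A protein found in exactly one complex is a core protein of that complex;
    a protein shared by two or more complexes is an attachment.
    Returns a dict mapping each complex name to (core_set, attachment_set).
    """
    membership = Counter(p for members in complexes.values() for p in set(members))
    result = {}
    for name, members in complexes.items():
        core = {p for p in members if membership[p] == 1}
        attachments = {p for p in members if membership[p] > 1}
        result[name] = (core, attachments)
    return result

# Hypothetical complexes: protein "x" is shared, the rest are unique.
toy_complexes = {"C1": {"a", "b", "x"}, "C2": {"c", "x"}, "C3": {"d", "e"}}
split = split_cores_attachments(toy_complexes)
print(split)
```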
Hypothesis: We expect 'staticness' to be more enriched in attachments than in cores of complexes.
Testing our hypothesis: Let λ_s(X) denote the number of static proteins in a set X, and λ_d(X) the number of dynamic proteins in X. Using these counts, we define the enrichment E of static (dynamic) proteins among the attachments and cores of a set of complexes 𝒞. For a complex C ∈ 𝒞, the enrichment in its attachments Attach(C) is computed from λ_s(Attach(C)) (respectively λ_d(Attach(C))), and the relative enrichment RE(Attach(C)) is the enrichment of static proteins relative to that of dynamic proteins in Attach(C). The enrichment and relative enrichment for cores are defined in a similar way. See an example calculation in the accompanying figure. The overall enrichment and relative enrichment for 𝒞 are obtained by averaging over all complexes.
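The displayed definitions are not reproduced in this text. One form consistent with the description, assumed here rather than quoted from the source, normalizes the static and dynamic counts by the number of attachment proteins and takes their ratio (the subscripted symbols E_s and E_d are notation introduced for this sketch):

```latex
E_s(\mathrm{Attach}(C)) = \frac{\lambda_s(\mathrm{Attach}(C))}{|\mathrm{Attach}(C)|},
\qquad
E_d(\mathrm{Attach}(C)) = \frac{\lambda_d(\mathrm{Attach}(C))}{|\mathrm{Attach}(C)|},

RE(\mathrm{Attach}(C)) = \frac{E_s(\mathrm{Attach}(C))}{E_d(\mathrm{Attach}(C))}
                       = \frac{\lambda_s(\mathrm{Attach}(C))}{\lambda_d(\mathrm{Attach}(C))}.
```

Under this assumption the normalizing term cancels in the ratio, so the relative enrichment depends only on the two counts, and a value above 1 means static proteins outnumber dynamic ones in the attachments.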
Table 3 shows these values for the complexes predicted from the four yeast PPI networks. These values clearly show that the attachment proteins were considerably more enriched in 'staticness' than the core proteins, thus supporting our hypothesis. For example, in the Consolidated network, the relative enrichment of 'staticness' for the attachments was RE(Attach) = 3.402, against RE(Core) = 0.839 for the cores.
Table 3. Analysis of 'dynamism' in cores and attachments of complexes predicted from PPI networks
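A computation of this kind can be sketched as follows. The sketch assumes that enrichment is the fraction of static (or dynamic) proteins in a set and that relative enrichment is the ratio of the two; neither formula is quoted from the source. Skipping sets in which the ratio is undefined (empty sets, or sets with no dynamic protein) is a further assumption, and all proteins below are hypothetical.

```python
def relative_enrichment(proteins, is_static):
    """Return (E_static, E_dynamic, RE) for a set of proteins.

    Assumed definitions: E_static and E_dynamic are fractions of the set,
    and RE is E_static / E_dynamic. Returns None when RE is undefined.
    """
    if not proteins:
        return None
    n_static = sum(1 for p in proteins if is_static[p])
    n_dynamic = len(proteins) - n_static
    if n_dynamic == 0:
        return None
    size = len(proteins)
    return (n_static / size, n_dynamic / size, n_static / n_dynamic)

def mean_relative_enrichment(groups, is_static):
    """Average RE over the groups for which it is defined."""
    values = [r[2] for r in (relative_enrichment(g, is_static) for g in groups)
              if r is not None]
    return sum(values) / len(values) if values else None

# Hypothetical labels and attachment sets, for illustration only.
toy_is_static = {"a": True, "b": True, "c": True, "d": False, "e": False}
toy_attachments = [{"a", "b", "c", "d"}, {"a", "d"}, {"a", "b"}]
print(mean_relative_enrichment(toy_attachments, toy_is_static))
```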
When we mapped some of these complexes back onto the PPI network, we found many of the shared 'static' proteins to be involved in "multiphase" interactions: several dynamic proteins peaking in different phases interacted with these shared 'static' proteins to form dynamic complexes. In other words, the static proteins formed "anchors" for dynamic proteins to form dynamic complexes. These findings hinted at a biological design principle of temporal "reusability" of 'static' proteins across complexes. Sharing static rather than dynamic proteins among complexes ensured that the generic proteins were maintained throughout all phases for "reuse", while only the dynamic proteins had to be transcribed 'just-in-time' to assemble the required complexes. This agreed strongly with the findings of de Lichtenberg et al. [12]. We analysed some of these shared 'static' proteins and found many to be kinases that were involved in activating or deactivating cell cycle complexes. For example, Cdc20 was involved in deactivating the Anaphase Promoting Complex/Cyclosome to allow cell division to enter the M phase.
On the other hand, Table 3 also shows that there was not much difference between the enrichments of static and dynamic proteins in the cores, indicating that static and dynamic proteins were equally capable of being part of cores. In other words, specialized sets of proteins may be either static or dynamic. This agreed with the findings of Komurov et al. [7] that static and dynamic proteins were equally capable of forming core functional modules: the static proteins formed 'static modules' while the dynamic proteins formed 'dynamic modules', both of which were involved in vital functions of the cell.
Relating our findings to previous studies
Based on the analyses here, we relate our findings to the previously discussed studies combining PPI network and gene expression data by Han et al. [3], Komurov and White [7], Yu et al. [10] and Patil et al. [11], and to the work on essential proteins by Pereira-Leal et al. [6]. We provide a new classification of proteins, based on their participation in complexes, into static "reused" and static/dynamic "specialized" (non-reused) proteins. We relate this classification to the classifications of hubs in the previous works, as shown in Table 4.
Table 4. Relating our findings to those of existing works
The hub proteins that Han et al. and Komurov and White categorized as 'date' and 'party' hubs correspond to the static reused proteins and the dynamic specialized proteins within complexes, respectively, in our study. The static reused proteins interact transiently with different sets of proteins to form different temporal complexes (for example, Cdk kinases), and thereby correspond to 'date' hubs. The dynamic proteins come together to form dynamic complexes at a particular time and disassemble afterwards; these correspond to the 'party' hubs (for example, dynamic proteins forming the APC/C complex in the G1/S phases). The 'family' hubs of Komurov and White correspond to the static specialized proteins that form static complexes (for example, the ribosomal complexes). Further, the Category 2 and Category 1 hubs of Patil et al. correspond to our static reused and static specialized proteins, respectively. Relating to Yu et al.'s characterization of hubs as inter-modular or intra-modular, we note that the static reused hubs are shared among complexes and are therefore inter-modular, while the static/dynamic specialized hubs are found within complexes and are therefore intra-modular. Finally, relating to Pereira-Leal et al.'s findings, we note that many of our reused proteins are involved in multi-purpose roles (for example, kinases), and such proteins tend to be essential. These relationships are summarized in Table 4. Therefore, our study provides alternative explanations and additional evidence, based on temporal participation in complexes, for the classifications of hubs from previous studies.
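The correspondences described in this paragraph can be captured in a small lookup, shown here as an illustrative sketch. The function `classify`, its arguments and the dictionary layout are hypothetical; the mapping entries restate only what the paragraph says, and a dynamic protein shared by several complexes is left unclassified because the text does not name that case.

```python
def classify(is_static, n_complexes):
    """Classify a protein by staticness and the number of complexes it joins.

    Returns None for cases the classification in the text does not name.
    """
    if n_complexes < 1:
        return None
    if n_complexes > 1:
        return "static reused" if is_static else None
    return "static specialized" if is_static else "dynamic specialized"

# Correspondences to earlier hub classifications, as stated in the text.
RELATED = {
    "static reused": {
        "Han et al.; Komurov and White": "date hub",
        "Patil et al.": "Category 2 hub",
        "Yu et al.": "inter-modular",
    },
    "static specialized": {
        "Komurov and White": "family hub",
        "Patil et al.": "Category 1 hub",
        "Yu et al.": "intra-modular",
    },
    "dynamic specialized": {
        "Han et al.; Komurov and White": "party hub",
        "Yu et al.": "intra-modular",
    },
}

kind = classify(is_static=True, n_complexes=3)
print(kind, RELATED[kind])
```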