The validity of network evolution models have been measured mainly by the resulting network topology, such as a power-law degree distribution, hierarchical modularity and dissortativity as observed in real PPI networks. Accordingly, the DD model has been thought of as the principal mechanism for PPI network evolution. Here, we dissect the history of PPI network evolution by inspecting several protein age-dependent patterns such as interaction density, homodimeric frequency, and the 3-D spatial arrangement of subunits within multiprotein complexes. The age-dependencies are shown to be very effective in discriminating the validity of different models as summarized in . The tested aspects of age-dependency were independent of topologies as well as of each other, and are thus highly useful as orthogonal criteria for valid models. Importantly, the age-dependent interaction patterns provided insights on PPI evolution, suggesting evidence against the DD model as the dominant mode of PPI network evolution, instead supporting an alternative model, the CG model.
Properties of yeast PPI networks and the tested network evolution models.
In the CG model, we view the PPI network as sparse and dynamic protein crystals per se. The CG model mimics the process of growing protein crystals in solution by sequentially adding each protein. Despite the huge differences in time scale and heterogeneous composition, PPI network evolution likely obeys similar constraints on growing protein crystals. In the CG model, a protein complex or a tightly linked module is analogous to individual crystals, and the number and membership of modules are not pre-defined but rather emerge naturally in each growing step. Crystals grow around multiple nuclei just as protein networks consist of multiple modules/complexes. New modules are generated as the genome size increases and novel function evolves in higher organisms, in a manner similar to how a new crystal forms occasionally through new nucleation events.
The CG model exploits two keys ideas, the first being that the chance of new connection is proportional to the availability of free surface, which is a feature readily recognized by a new protein molecule; this results in an anti-preferential attachment (AP) rule. Although the same surface of a protein can be involved in multiple interactions with different partners through spatial and temporal differentiation, such a factor uniformly increases the capacity of interactions in any protein. Therefore, the connection probability is still positively correlated with the available surface area. These results agree with those of Kim et al. 
, which show that the evolutionary rate is anti-correlated with available surface area. There, multi-interface hubs were nearly four times more frequent than single-interface hubs, reflecting the dominant connection mode of the AP rule. The second key idea is that once an initial connection is made, the subsequent connections are localized to the neighbors of the initial partner within the same module. This localized connection enforces high modularity, similar to that observed in real PPI networks.
At the basis of the crystal growth model is the notion that new interactions form preferentially within existing physical complexes (enforcing modularity), and thus are limited by available protein surface area (the AP rule). Thus modularity & the AP rule both arise due to simple physical constraints of which proteins are most accessible to each other. Recently, Levy and colleagues has shown that the successive steps of homo-oligomeric assembly mimics the evolutionary pathway 
. The CG model expands this idea, where crystal growth reproduces the evolution of the entire PPI network.
Given that the CG model follows an AP rule, how does it generate scale-freeness or “the rich get richer” connectivity? In the CG model, the network grows by anchoring and extension, where a node increases its degree either by becoming an anchor node (anchoring) or by being the neighbor of the anchor node (extension). Therefore, the highly connected nodes have greater chances to increase their degree within each module because they have more opportunities to have anchors as their neighbors. Therefore, the CG model implicitly implements the preferential attachment (PA) rule within each module in a manner similar to the DD model, where the nodes increase their degree by having duplicating genes as their neighbors.
Our result suggests that the CG model is a more plausible mechanism for PPI network evolution than the DD model. First, all the age-dependent aspects tested agree well with the CG model but disagree with the DD model. Second, the CG model is more comprehensive than the DD model in that the CG model can accommodate both gene duplication and horizontal gene transfer as the origins of new nodes (genes). Practically, the DD model may be applicable only to ~20% of the yeast proteome having identifiable duplicates 
. The CG model also embodies the rapid divergence of gene duplicates 
by the AP rule, which avoids competition for the same interface on common partners and connects to new partners with less occupied surfaces. Finally, the CG model is more robust than the DD model. The DD model shows a highly variable degree distribution depending upon parameters and network sizes 
. In contrast, the CG model shows stable characteristics regardless of network size or different module definition methods. Taken together, these strongly suggest that the DD model is unlikely to be the principal, and strongly unlikely to be the sole, mechanism of PPI network evolution.
The age-dependency of interaction density also sheds light on a more fundamental question regarding the mechanism of PPI network evolution. It has been hypothesized that inherent features of proteins, such as stickiness and hydrophobicity are dominant factors in shaping the global network structure 
. However, the observed age-dependency is inconsistent with such a hypothesis and suggests that a stochastic process played a major role. For example, the yeast PPI network shows the patterns of both DABE,AE/BE
(the row IV in ). The connection probability cannot depend solely upon a feature such as protein length or surface hydrophobicity because no single feature (F) can satisfy FAE/BE
(with common FABE
) and FAE/BE
(with common FFu
Power-law distributions have been commonly observed in various types of networks, such as the Internet, social networks, and biological networks. However, the growth of a PPI network poses unique constraints compared to other types of networks. For example, in an airline or railroad network, each new connection is made by considering the context of global network topology (e.g., to minimize average path length), which seems intuitively unlikely to be the case in PPI networks. The CG model follows two simple constraints of available free surface and localized connection, which are physically plausible and depend only on local context but not global topology. With these minimal assumptions analogous to growing protein crystals, the CG model recapitulates remarkably well the age-dependencies as well as the network topologies of the yeast PPI networks.