Analyses of biological networks have revealed modular structures. Parter et al. found that bacterial species living in variable habitats have metabolic networks with significantly higher modularity than bacterial species living in less variable habitats. According to one explanation, because modularity promotes evolvability and thus enables bacteria to adapt quickly to varying environments, a more modular metabolic network is an evolutionarily favored trait for species living in open habitats such as soil and sea. In other words, evolution selects for high modularity in species living in these varying habitats (edge 1 in the figure). The robustness of metabolic networks, a concept related to modularity, as measured by the maintenance of a phenotype (e.g., growth) under perturbation (e.g., mutation or gene loss), has been shown, both in vivo and in simulation, to have arisen from fluctuating environments. An alternative explanation can be formulated from the other direction: because species with more modular metabolic networks are more capable of adapting to environmental change, they colonize a wider range of habitats, giving rise to the observation that bacteria living in varying habitats have more modular metabolic networks (edge 2 in the figure). In another recent study of an Archaea data set, no such relationship between modularity and habitat variability was found, which calls for further investigation of alternative explanations.
A feedback loop between modularity and habitat variability. Two different explanations of the association of the modularity score with the habitat variability.
Modularity, as a graph-theoretic concept applied to biological networks, can be quantified in different ways. In the works of Parter et al. and Kreimer et al., modularity is based on the definition of Newman and Girvan. This definition quantifies the extent to which the connectivity of a network exhibits a modular structure, that is, communities with most connections falling within, rather than across, communities. Roughly speaking, the modularity score Q (see Methods), a quantity associated with a partition of the network, indicates how much more likely an edge is to fall inside a community of that partition than would be expected if each node of a given degree chose its neighbors at random. The partition of nodes that yields the maximum Q value is regarded as the community structure of the graph, and that maximum score is taken to be the graph's modularity.
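As a minimal sketch of the score described above, the following Python snippet evaluates the standard Newman-Girvan Q for a given partition of an undirected, unweighted graph. The toy graph (two triangles joined by one edge), the function names, and the use of NumPy are illustrative assumptions and are not taken from the Methods of this paper.

```python
import numpy as np


def modularity(adj, labels):
    """Newman-Girvan modularity Q of one partition of an undirected graph.

    Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)                      # k_i
    two_m = degrees.sum()                          # 2m: each edge counted twice
    expected = np.outer(degrees, degrees) / two_m  # k_i * k_j / 2m
    same = labels[:, None] == labels[None, :]      # True where c_i == c_j
    return float(((adj - expected) * same).sum() / two_m)


def two_triangles():
    """Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3."""
    adj = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        adj[i, j] = adj[j, i] = 1.0
    return adj


adj = two_triangles()
q_triangles = modularity(adj, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])     # everything in one community
print(q_triangles, q_single)
```

On this toy graph, placing each triangle in its own community gives Q = 5/14, whereas lumping all nodes into one community gives Q = 0. The function only scores a given partition; finding the partition that maximizes Q is the separate optimization problem.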
Although the modularity score depends on the community structure, similar modularity scores may arise from different community structures. It is natural to ask (and is currently unknown) whether a given level of modularity (high or low) in metabolic networks results from acquiring similar community structures or from achieving different ones. More specifically, assuming that network modularity plays an adaptive role, as in the first explanation (see the figure), is it the modularity score that confers higher fitness regardless of the community structure giving rise to it, or is the community structure the unit of selection, with modularity conserved only as a consequence? If modularity is achieved via similar community structures, the community structure might be the unit of selection under different environments. Consequently, any observed association of modularity with environmental features or growth conditions raises the question of whether the correlation arises from similar community structures (which, by definition, would have similar modularity scores) or from different community structures with similar modularity scores.
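The distinction between the score and the structure can be illustrated with a small, purely hypothetical example. The sketch below scores two different partitions of an 8-node ring and compares them with the Rand index. The Rand index is an assumption chosen here for illustration, not necessarily the measure of community structure difference used in this work, and the paper compares networks of different species rather than two partitions of one graph.

```python
from itertools import combinations

import numpy as np


def modularity(adj, labels):
    """Newman-Girvan Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i == c_j]."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    same = labels[:, None] == labels[None, :]
    return float(((adj - np.outer(degrees, degrees) / two_m) * same).sum() / two_m)


def rand_index(labels_a, labels_b):
    """Fraction of node pairs that both partitions treat alike (together or apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy graph: a ring of 8 nodes, each linked to its two neighbors.
n = 8
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = 1.0

part_a = [0, 0, 0, 0, 1, 1, 1, 1]  # communities {0,1,2,3} and {4,5,6,7}
part_b = [1, 1, 0, 0, 0, 0, 1, 1]  # communities {2,3,4,5} and {6,7,0,1}

q_a = modularity(ring, part_a)
q_b = modularity(ring, part_b)
similarity = rand_index(part_a, part_b)
print(q_a, q_b, similarity)
```

Both partitions score Q = 0.25, yet they agree on only 12 of the 28 node pairs (Rand index 3/7). Equal scores therefore do not imply equal community structures.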
In this work, we analyzed metabolic networks of species spanning three kingdoms of life by computing their community structures and modularity scores (see Methods for details on metabolic network reconstruction). We compared differences in community structure against differences in modularity and against genetic distance to investigate the correlation, or lack thereof, among the three. The results suggest that differences in community structure do not parallel differences in the modularity scores we compute, except when community structures are extremely similar. That is, larger differences in community structure do not necessarily imply larger differences in modularity score, and vice versa, which indicates convergent evolution of modularity via different underlying community structures. To further understand the evolutionary driving force behind such convergent evolution, we revisited the analysis of Parter et al., which first associated modularity with habitat variability, but considered other aspects of microbial lifestyle, including temperature preference and oxygen requirement. We also confirmed the finding of Kreimer et al. that the size of the metabolome (the number of enzymes) is a major determinant of the modularity score, even after the score is normalized in a way that is believed to make it size-independent on general (non-metabolic) networks.
From a computational perspective, a contribution of this paper is an improved heuristic for modularity optimization, based on spectral decomposition and using a self-organizational merge-and-resplit refinement. The goal of this improvement is to identify higher modularity scores and the corresponding community structures deterministically and efficiently. We show, on well-studied benchmark data sets, that compared to the original algorithm of Newman and some other existing algorithms, our algorithm achieves higher Q scores at the cost of only a moderate increase in running time.
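For orientation, the base spectral step that such heuristics build on can be sketched as follows. This is a single leading-eigenvector bisection in the style of Newman's spectral method. It is not the merge-and-resplit refinement proposed here, whose details this section does not give. The function name and the toy graph are illustrative assumptions.

```python
import numpy as np


def leading_eigenvector_split(adj):
    """Bisect a graph by the signs of the leading eigenvector of B.

    B_ij = A_ij - k_i * k_j / 2m is the modularity matrix. For a split
    encoded as s_i in {+1, -1}, the modularity is Q = s^T B s / 4m.
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    b = adj - np.outer(degrees, degrees) / two_m
    # eigh handles symmetric matrices and returns eigenvalues in ascending
    # order, so the last column is the leading eigenvector.
    _, eigvecs = np.linalg.eigh(b)
    leading = eigvecs[:, -1]
    s = np.where(leading >= 0, 1.0, -1.0)
    q = float(s @ b @ s / (2.0 * two_m))  # 4m == 2 * two_m
    return s.astype(int), q


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

labels, q = leading_eigenvector_split(adj)
print(labels, q)
```

On the toy graph, the split separates the two triangles and yields Q = 5/14. Which triangle receives +1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. A complete method would apply such splits recursively and then refine the result.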