Search tips
Search criteria 


Logo of mbioJournal InfoAuthorsReviewersBoard of EditorsJournals ASM.orgmBiomBio Article
mBio. 2010 Sep-Oct; 1(4): e00169-10.
Published online 2010 October 5. doi:  10.1128/mBio.00169-10
PMCID: PMC2953006

Functional Molecular Ecological Networks


Biodiversity and its responses to environmental changes are central issues in ecology and for society. Almost all microbial biodiversity research focuses on “species” richness and abundance but not on their interactions. Although a network approach is powerful in describing ecological interactions among species, defining the network structure in a microbial community is a great challenge. Also, although the stimulating effects of elevated CO2 (eCO2) on plant growth and primary productivity are well established, its influences on belowground microbial communities, especially microbial interactions, are poorly understood. Here, a random matrix theory (RMT)-based conceptual framework for identifying functional molecular ecological networks was developed with the high-throughput functional gene array hybridization data of soil microbial communities in a long-term grassland FACE (free air, CO2 enrichment) experiment. Our results indicate that RMT is powerful in identifying functional molecular ecological networks in microbial communities. Both functional molecular ecological networks under eCO2 and ambient CO2 (aCO2) possessed the general characteristics of complex systems such as scale free, small world, modular, and hierarchical. However, the topological structures of the functional molecular ecological networks are distinctly different between eCO2 and aCO2, at the levels of the entire communities, individual functional gene categories/groups, and functional genes/sequences, suggesting that eCO2 dramatically altered the network interactions among different microbial functional genes/populations. Such a shift in network structure is also significantly correlated with soil geochemical variables. In short, elucidating network interactions in microbial communities and their responses to environmental changes is fundamentally important for research in microbial ecology, systems microbiology, and global change.


Microorganisms are the foundation of the Earth’s biosphere and play integral and unique roles in various ecosystem processes and functions. In an ecosystem, various microorganisms interact with each other to form complicated networks. Elucidating network interactions and their responses to environmental changes is difficult due to the lack of appropriate experimental data and an appropriate theoretical framework. This study provides a conceptual framework to construct interaction networks in microbial communities based on high-throughput functional gene array hybridization data. It also first documents that elevated carbon dioxide in the atmosphere dramatically alters the network interactions in soil microbial communities, which could have important implications in assessing the responses of ecosystems to climate change. The conceptual framework developed allows microbiologists to address research questions unapproachable previously by focusing on network interactions beyond the listing of, e.g., the number and abundance of species. Thus, this study could represent transformative research and a paradigm shift in microbial ecology.


An ecosystem is a complex system in which various species interact with each other to form complicated networks (1). Through such network interactions, an ecosystem is capable of accomplishing systems level functions (e.g., nutrient cycling, ecosystem stability) which could not be achieved by individual populations. Explaining and predicting network structures, dynamics, and the underlying mechanisms are essential parts of ecology. Although ecological networks of biological communities have been intensively studied in macrobial ecology (15), very limited studies have been carried out in microbial communities due to their vast diversity and as-yet-uncultivated status (68).

Massive amounts of data on microbial community diversity and dynamics across various spatial and temporal scales can be generated with metagenomics and associated technologies, such as high-throughput sequencing and microarrays (9, 10), which offer an unprecedented opportunity to examine network interactions among different microbial species/populations (6). Recently, microarray-based high-throughput technologies, such as GeoChip (10), have been developed and are enabling microbial ecologists to address a variety of ecological questions on a community-wide scale (11, 12). However, identification of network structure based on such high-throughput microarray data is challenging.

Various network methods have been developed for inferring cellular networks based on gene expression data (13), such as differential equation-based network methods, Bayesian network methods, and relevance/coexpression network methods (14). The correlation-based relevance network methods are most commonly used for identifying cellular networks (14) based on gene expression data because of their computational simplicity and the nature of microarray data (typically noisy, highly dimensional, and significantly undersampled) (13). However, most methods for relevance network analysis use arbitrary thresholds, which are often determined based on known biological information (15). Thus, the network structure largely depends on the thresholds chosen. It is difficult to select appropriate thresholds, especially for poorly studied organisms. We have previously developed a novel random matrix theory (RMT)-based approach to automatically identify cellular networks from microarray data (16, 17). Our results indicated that this approach is a reliable, sensitive, and robust tool for modular network identification and gene function prediction through high-throughput genomic data (16, 18, 19).

Here, we describe the novel RMT-based network approach (16, 17) to delineate and characterize functional molecular ecological networks (MENs) in microbial communities based on GeoChip hybridization data by addressing the following questions. (i) How can the functional molecular ecological networks in microbial communities be identified based on high-throughput GeoChip hybridization data? (ii) Does elevated CO2 (eCO2) have an impact on the functional network structure of soil microbial communities? To answer these questions, a novel conceptual framework of MENs has been developed for identifying and characterizing interaction networks in microbial communities based on high-throughput GeoChip hybridization data derived from the microbial communities under both eCO2 and ambient CO2 (aCO2) in a multifactor grassland FACE (free air, carbon dioxide enrichment) experiment, BioCON (biodiversity, CO2, and nitrogen deposition), at the Cedar Creek Ecosystem Science Reserve in Minnesota (20). Our results indicated that the functional ecological networks in microbial communities can be discerned using the RMT-based network approach and that eCO2 has a significant impact on network interactions of microbial communities.


Molecular ecological networks.

The detection and quantitation of microorganisms often rely on individual genes or gene-like DNA fragments such as 16S rRNA genes, functional genes, and intergenic regions. Based on gene abundance data, a network graph can be developed to represent the ecological interactions (links) of different gene markers (nodes) in a microbial community (21). Strictly speaking, the ecological networks determined in this way should reflect the interactions among different microbial populations carrying the OTUs (operational taxonomic units) or functional genes of interest rather than individual “species” in a microbial community. Thus, to avoid confusion, we especially refer to such molecule-based networks in microbial communities as MENs, in which OTUs or functional genes (nodes) are connected by pairwise interactions (links). In addition, MENs derived from phylogenetic gene markers (i.e., 16S rRNA gene sequencing data) are referred to as phylogenetic MENs (pMENs), whereas MENs derived from functional gene markers (e.g., GeoChip hybridization data) are called functional MENs (fMENs).

A general framework of MEN analysis is illustrated in Fig. 1. First, high-throughput metagenomic data (e.g., large-scale sequencing and functional gene array hybridization) are collected and appropriately transformed. Then, a pairwise Pearson correlation between any two genes is estimated based on the gene abundance data and the absolute value of the pairwise correlation coefficient is used to measure their similarity. As a result, a similarity matrix is obtained, which is subsequently transformed into an adjacency matrix by applying a threshold to the correlation values based on an RMT approach. Once the adjacency matrix, which measures the strengths of the connections between nodes, is defined, module analysis and network characterization are performed. Summing the strengths of the connections of each gene with all of the other connected genes yields a single network parameter, connectivity, which represents how strongly that gene is connected to all of the other genes in the network. In addition, the relationships of topological network characteristics (e.g., connectivity) to the sample traits of interest are evaluated to understand the importance of network properties in determining community functions.

Overview of RMT-based MEN analysis. Six key steps are outlined here for molecular ecological analysis. A typical figure is placed in each step to highlight the main characteristics of such types of analysis.

Identification of functional MENs.

Based on RMT, two universal extreme distributions of the nearest-neighbor spacing distribution of eigenvalues are predicted. One is Gaussian orthogonal ensemble (GOE) statistics, which reflects the random properties of complex systems. The other is Poisson distribution, which is related to the system-specific, nonrandom properties of complex systems (22). The two predictions should be applicable to ecological communities if they are universal to complex systems based on RMT. Thus, we assume that there is a transition of the nearest-neighbor spacing distribution of eigenvalues from GOE to Poisson distributions, and this transition can serve as a reference point to distinguish random noise from system-specific, nonrandom properties embedded in high-throughput metagenomic data. This reference point is mathematically defined and can be used as a threshold to identify MENs in an automatic and objective fashion (16).

To discern the ecological network structure in microbial communities, GeoChip-based microbial community functional diversity data from aCO2 and eCO2 were analyzed using the RMT-based network approach (16, 17). Clear transitions of the nearest-neighbor spacing distribution of eigenvalues from GOE to Poisson distribution were observed for grassland microbial communities under aCO2 and eCO2, as indicated by the existence of a similarity threshold in Table 1. These results suggested that the two universal predictions based on RMT are applicable to the microbial communities examined. The transition points were used as the similarity thresholds (st) for network construction. Based on the RMT approach, the thresholds were determined to be 0.80 for both microbial communities (Table 1). If two genes have a correlation larger than this threshold, 0.80, this correlation is highly significant (P < 0.0018) based on the Fisher transformation test. This is also consistent with our previous analyses of gene coexpression networks which showed that the RMT-based approach is able to effectively remove random noise inherent in high-throughput microarray hybridization data (16, 23). Thus, these results suggested that the RMT-based network approach can be used to identify the network structure of microbial communities based on array hybridization data.

Major topological properties of the empirical MENs of microbial communities under eCO2 and aCO2 and their associated random MENs

General characteristics of functional MENs.

To understand whether the identified fMENs have a network topology similar to that of other complex systems, several important general network features, such as scale free, small world, modular, and hierarchical (15, 24, 25), were examined. A scale-free network is a network whose connectivity follows a power law, at least asymptotically; that is, a few nodes in the network have many connections with other nodes while most of the nodes have a few connections. Also, a small-world network is a network in which most of the nodes are not neighbors of one another but most of the nodes can be reached from every other node by a small number of steps.

The degree distributions (or connectivity) in all of the constructed fMENs fit the power law model well, with typical correlation values of −0.89 to −0.94, respectively (Table 1; see Fig. S1 in the supplemental material), indicating that the fMENs in these microbial communities exhibited scale-free behavior, at least approximately. Also, the path length and clustering coefficients were significantly different from those of the corresponding random networks with the same network size and average number of links (Table 1) and were comparable to those in other networks displaying small-world behavior, indicating that the MENs in these microbial communities show typical small-world characteristics.

Many networks in biological and engineering systems are modular (24). In the fMENs examined here, a module is a group of functional genes that are highly connected among themselves but have few connections with the functional genes belonging to other modules. The two fMENs examined were modular, with a significantly higher modularity (M) than those from the corresponding random networks (Table 1).

Hierarchy is a central organizing principle of complex networks, but there is no formal definition of hierarchical topology (26). One of the most important signatures of hierarchical modular organizations is that the scaling of clustering coefficients follows C(k) ~ k−γ, in which k is connectivity and γ is a constant. By log transformation, we will have log[C(k)] ~ −γ log(k); that is, the logarithms of clustering coefficients have a linear relationship with the logarithms of connectivity. The clustering coefficients for the MENs examined followed log[C(k)] ~ −γ log(k) (r = −0.31 ~ −0.85, P < 0.001), suggesting that all of the MENs examined here appeared to be hierarchical. However, it should be cautioned that global network properties such as the average shortest pathway, degree distribution, and clustering coefficient may fail to capture potentially important network structure features (27).

Impact of eCO2 on the architecture of whole MENs.

Although identical thresholds were used to define the network, the network size was considerably bigger under eCO2 than under aCO2 (Table 1). Also, the network composition was substantially different. Only 129 (43%) nodes of the fMENs were shared by eCO2 and aCO2. However, the connectivity values for the genes shared by these two networks were significantly correlated (r = 0.379, P < 0.001).

Various network indexes were calculated separately for both fMENs under eCO2 and aCO2. To test their statistical significance, 100 random networks corresponding to both fMENs were also generated, respectively. The standard deviations of individual indexes were estimated based on their corresponding random networks and used for the Student t test of their significance between eCO2 and aCO2. All of these indexes were significantly different (P < 0.001) between aCO2 and eCO2 (see Table S2 in the supplemental material), indicating that the overall network structures of these two microbial communities were distinctly different. Interestingly, compared to aCO2, the fMENs at eCO2 generally had significantly higher connectivity, shorter path lengths, higher clustering efficiencies, and more modules (Table 1), which are key network properties in terms of system efficiency and robustness (15, 28). All of the above results suggested that eCO2 could have a significant impact on the overall architecture of the fMENs in these grassland microbial communities and that the overall network composition and structure are not well conserved between eCO2 and aCO2.

Totals of seven and eight modules with more than five nodes were obtained for the networks under aCO2 and eCO2, respectively (see Fig. S2 in the supplemental material). The sizes of these modules vary substantially, ranging from 6 to 59 nodes (see Fig. S2B and D in the supplemental material). These networks also differed significantly from each other in modularity (M) (Table 1). Fisher’s exact test showed that no modules can be statistically paired between eCO2 and aCO2, suggesting that these two networks are even less conserved at the modular level.

Effects of eCO2 on the network interactions of functional gene categories/groups.

To examine the effects of eCO2 on the network interactions at the levels of functional gene categories/groups, the network members were classified based on their associated functional gene groups. Consistent with the overall network topology, the network structures, as measured by average connectivity, average clustering coefficients, average path lengths, and modularity, were significantly different between eCO2 and aCO2 for the genes involved in C, N, P, and S cycling (see Fig. S3 in the supplemental material). They were also distinctly different from the corresponding random networks. Similar to the entire networks, all of these subnetworks had significantly higher connectivity, shorter path lengths, and higher clustering efficiencies under eCO2 than under aCO2 (see Fig. S3 in the supplemental material).

The impact of eCO2 on network structure could also be reflected at the functional gene group level. Because most (57%) of the nodes were not shared by these two networks (see Table S3 in the supplemental material), for making a meaningful direct comparison, only the shared nodes from individual functional gene groups in these two networks were considered. As expected, the network complexities of individual functional gene groups, as measured by the node number, the average connectivity of the shared nodes, and the Shannon diversity of the connectivity among various functional groups, were also considerably different between aCO2 and eCO2 (Fig. 2; see Table S3 in the supplemental material). The network complexity for most functional gene groups was generally higher under eCO2 than under aCO2. For instance, the numbers of nodes for C fixation (pmL, rbcL), N fixation (nifH), and sulfate reduction (dsrA) were substantially higher under eCO2 than under aCO2 (Fig. 2). These results suggested that eCO2 substantially altered the network structure among various microbial functional gene groups in this grassland ecosystem.

Distributions of major functional genes in the network under aCO2 (blue) and eCO2 (red). The distribution of genes varies substantially among different functional groups. The gene designations are explained in the legend to Fig. 3.

Differential influences of eCO2 on network interactions of individual functional genes/sequences.

To determine how eCO2 affects the network structure of individual functional genes/sequences, the connectivity and clustering coefficients for the individual nodes shared by these two networks were estimated, followed by paired t tests to examine their statistical significance. Significant differences in connectivity (P = 8.7 × 10−11) or clustering coefficients (P = 1.63 × 10−5) were observed between these two networks under eCO2 and aCO2. These results suggested that eCO2 significantly shifted the network structure of individual functional genes.

Since the structure of the entire networks is too complicated to display, only very limited numbers of key nodes with higher connectivity were considered. The top six functional genes with the highest connectivities under eCO2 were examined (Fig. 3). The network interactions among the top six functional genes of the entire networks under eCO2 (Fig. 3A) were distinctly different from the network interactions of the corresponding genes under aCO2 (Fig. 3B). These six genes under eCO2 had far more complicated network interactions than the corresponding genes under aCO2 in terms of network size, connectivities, and clustering coefficients (Fig. 3C). In contrast, the identities of the highest-ranked genes under aCO2 (see Fig. S4A in the supplemental material), based on connectivities, were substantially different from those under eCO2 (Fig. 3A). The network interactions of these top key genes under aCO2 are also quite different from the corresponding genes under eCO2 (see Fig. S4A and B in the supplemental material).

Impact of eCO2 on the network interactions of key functional genes. (A) Network interactions of the top six functional genes with the highest connectivities under eCO2. (B) Network interactions of the corresponding functional genes under aCO2. Each node ...

eCO2 had differential influences on the network interactions of various individual functional genes/sequences (see Table S3 in the supplemental material). Due to limited space, it is not possible to describe them in detail here. Thus, we picked nifH genes as an example because our previous results showed that the total abundance of nifH genes increased much more than that of other functional genes under eCO2 (29) and the connectivities for all of the shared nifH genes in N fixation were significantly different between eCO2 and aCO2 based on a pairwise t test (P < 0.001) (see Table S4 in the supplemental material). The N-fixing populations monitored by nifH genes had far more complex network interactions with other functional groups of diverse phylogenetic compositions under eCO2 (Fig. 4A) than under aCO2 (Fig. 4B). Also, under eCO2, several nifH hubs (e.g., 110630622, 89512768, and 76667345) were observed and each formed a separate module interacting with various other functional groups (Fig. 4). In addition, the top nifH network hub from an uncultivated bacterium (GenBank ID: 110630622) (see Table S4 in the supplemental material) had positive interactions with many functional gene groups of diverse phylogenetic origins (indicated by the colors of the nodes), such as those involved in N fixation, denitrification, C fixation, C degradation, sulfate reduction, sulfur oxidation, and P utilization (Fig. 5A). Positive interactions may reflect commonly preferred conditions or cooperative behaviors such as cross feeding, syntrophic interactions, and mutualistic interactions (6). However, the same N fixer had very few interactions with other functional gene groups under aCO2 (Fig. 5B; see Table S4 in the supplemental material). These results also suggested that eCO2 dramatically changed the network interactions of various microbial functional genes/populations in the grassland ecosystem.

Network interactions of microorganisms containing nifH genes under eCO2 (A) and aCO2 (B). Microorganisms containing nifH genes formed complex network interactions with other functional groups, and some nifH-containing populations serve as central hubs ...
Network interactions of a nifH hub under both eCO2 and aCO2. The nifH-containing uncultivated microorganism had intensive positive interactions with many functional groups of diverse phylogenetic origins under eCO2 (A) but very simple interactions with ...

Association of network structure with ecological functional traits.

Since microorganisms mediate important biogeochemical cycles of C, N, and P in soils, one intriguing question is whether the network interactions altered under eCO2 are relevant to soil geochemistry and plant productivity. To discern the relationships among microbial network interactions, soil properties, and plant variables, Mantel tests were performed. Because using many unrelated individual variables may mask the signature of any significant variables, the trait-based gene significance measure (30), defined as the square of the correlation between the signal intensity of a gene and each soil or plant variable, was used to identify common subsets of soil and plant variables important to network interactions. Partial Mantel tests revealed very strong correlations between gene connectivity and the gene significance of the selected soil variables based on all of the genes detected or on subsets of the genes involved in recalcitrant C degradation (P < 0.05) or N fixation (P < 0.01) under eCO2. Also, a strong correlation between gene connectivity and the gene significance of the selected plant variables was obtained based on all of the genes detected that are involved in N cycling (P < 0.001). However, none of them were significant (P < 0.05) under aCO2. These results suggested that the microbial community network interactions were, to some extent, related to soil and plant variables and that eCO2 could have a significant impact on such relationships.


Microorganisms play critical roles in biogeochemical cycling of C, N, S, P, and various metals, but the precise roles of many microorganisms in these cycles are unknown (31). Elucidating their network interactions and linking them to ecosystem processes and functions is difficult. Although the availability of community-wide metagenomic data across many replicated samples offers an unprecedented opportunity to examine the network interactions in microbial communities, identifying network structure with such high-throughput metagenomic data is very challenging. Based on high-throughput metagenomic data such as array hybridization, a conceptual framework for studying functional network interactions in microbial communities has been developed in this study. The applications of network approaches to microbial communities could also provide a general framework for assessing the consequences of environmental disturbances at the whole-community level, which can serve as the first step toward a predictive microbial ecology within the context of global environmental change (6).

The RMT-based approach presented here provides a reliable, sensitive, and robust tool for identifying MENs with several main advantages. First, this approach was developed based on the two universal laws of RMT, and hence it is based on a sound theoretical foundation. Thus, it should be applicable to a variety of complex systems such as cells, communities, and ecosystems. Second, the threshold for defining a MEN is automatically defined based on the data structure itself rather than artificially chosen, and thus, no ambiguity occurs in identifying MENs. Third, since RMT is powerful for removing noise from nonrandom, system-specific features, the identified network is reliable, as clearly demonstrated in identifying transcriptional networks (16). In contrast to other approaches, such as those based on permutation testing (7, 32), the thresholds of correlation defined by RMT for defining networks are generally substantially higher. Consequently, the networks identified should be highly robust. Fourth, this RMT approach has potential for analyzing heterogeneous ecological data sets (e.g., hybridizations, sequencing, geochemistry) or combinations thereof. This could be particularly important for linking network structure to ecosystem functioning. In addition, unlike local similarity analysis (7), which deals with time series data, this approach can be used to analyze both spatial and temporal data sets. However, the networks in this study are constructed based on Pearson correlation, which assumes a linear relationship between correlated variables. Some other, nonlinear, correlation methods, such as local similarity analysis (7), should be considered for further improvement of the RMT-based network approach.

Knowledge of network topology (e.g., scale free and small world) is important because the same or similar architecture among different types of networks (e.g., biological, physical, and social networks) may reveal common organizing principles of complex systems, and the shape of the degree distribution greatly affects the stability of the complex systems (15). Several recent studies indicated that most food webs did not display small-world patterns and scale-free structure, except for a few (33, 34). Food webs could be fitted with different functional forms, including power law, truncated power law, exponential, and uniform distributions (34). It appears that most food webs follow exponential distribution (35). Also, most of the mutualistic networks examined follow the truncated power law distribution. In contrast to the above two types of ecological networks, the fMENs presented here displayed small-world effects and the power law distribution, which are more similar to many cellular networks (15) such as protein-protein interaction networks, gene expression networks, and some metabolic networks. Further studies are needed to understand the unique characteristics, origins, evolutionary mechanisms, and dynamics of MENs.

Modularity is an inherent characteristic of many large complex systems, including many technological, biological (e.g., protein-protein interaction networks, gene expression), and social networks (15, 24, 26, 28, 36). In cellular networks, modularity is an evolved property that could enhance the flexibility of generation of various phenotypes during development (28). Modularity of gene regulation is essential to handle diverse and complex stimuli and responses. In ecology, a module is a group of species that interact strongly among themselves but little with species in other modules (known as compartmentalization) (37). Modularity in an ecological community may reflect habitat heterogeneity, physical contact, functional association, divergent selection, and/or phylogenetic clustering of closely related species (36). Modules with their component species may even be the key units of coevolution. Food webs are traditionally considered representative examples of ecological modularity. Recently, it was shown that all larger pollination networks are modular (36). Similar to food webs and pollination networks, our results showed that functional MENs are also modular. The presence of genes in the same modules could signify that the microorganisms carrying these genes have similar ecological niches. In addition, many different functional genes, such as nifH, belong to different modules and serve as hubs in different modules. This could be a unique predominant feature in ecological networks due to the existence of many redundant populations.

The fMENs identified possess the general features of many cellular networks with the hierarchical, modular, small-world, or scale-free network architecture, which could have important implications for the robustness and functional stability of ecosystems (15, 28). A small-world pattern facilitates efficient, rapid communication among different members within a system so the system can make quick responses to environmental changes such as elevated atmospheric CO2. On the other hand, the short path length will allow the local perturbations to reach the whole network quickly so that that network structure, as well as the functions, could be altered. However, the characteristics of modularity will help to minimize the effects of local perturbations on the system as a whole by containing perturbations and damage at a local level (28), whereas the hierarchical organization of various modules ensures that communication between modules and network hubs is relatively quick. Also, while a scale-free network is unperturbed by the random loss of nodes, it is vulnerable to attacks to network hubs (15). However, such vulnerability can be reduced or improved by the nonrandom organization of similar functional genes as hubs in multiple modules. As a result, a loss of one module hub will not have too much of an impact on the functional stability of the system as a whole. Therefore, as a whole, the overall microbial community could rapidly respond to environmental changes and remain robust in the face of random and specific perturbations via the balance of the advantages and disadvantages of various network topological characteristics for system functional stability.

Understanding the responses of biological communities to environmental change, especially anthropogenic change, is a central issue in ecology and evolution and for society. Due to the increased input of C into soil and associated chemistry under eCO2, it is also expected that the composition and structure of microbial communities will be markedly altered under eCO2, as demonstrated by our previous study (29). Although it is known that environmental changes such as acidification, habitat modification, and hydrological disturbance have profound effects on ecological networks (38), this has not yet been explored in microbial communities due to the lack of appropriate technologies. The MENs developed in this study provide an appropriate framework in which to explore the possible effects of environmental changes on microbial community structure. Also, our results indicated that under eCO2, the network interactions for most of the functional gene groups become more complex than those under aCO2. This is consistent with our previous results showing that both the functional and phylogenetic structures of microbial communities were dramatically altered by eCO2 (29). Besides, such a shift in network structure is significantly correlated with soil geochemical variables. These results, along with those of our previous studies (29), suggested that global climate change factors such as eCO2 have a significant impact on not only the functional structure of grassland microbial communities but also their network interactions.

In summary, the analytical approaches and results presented here are important for studies on ecology, global change biology, and microbiology. First, traditionally, most biodiversity studies of microbial communities have just used information on the number of species and the abundance of each species, but no sufficient experimental data with many genes/populations across different samples were available for characterizing the interactions among different microorganisms. This study provides a novel conceptual framework for studying network interactions among different microbial populations, which is an essential component of biodiversity studies. Also, understanding the responses and mechanisms of biological communities to global change is a central goal for ecologists (39, 40). As demonstrated by this study, the network interactions for most of the functional gene groups become more complex under eCO2 than under aCO2. Thus, it is apparent that global climate change factors such as eCO2 could have a fundamental impact on network interactions at the levels of entire communities, individual functional gene categories/groups, and functional genes/sequences. To our knowledge, this is the first study to demonstrate the changes in the network interactions of microbial communities in response to eCO2. In addition, metagenomics has emerged as a cutting-edge 21st-century science but one of the greatest challenges is how to use such information to understand community-level functional processes. This study provides unique tools for discerning network interactions based on metagenomic data. Thus, in short, the MEN framework developed should have a profound impact on the study of biodiversity, global change biology, ecosystem ecology, and systems microbiology.


Sampling site and GeoChip data collection.

Network analysis was conducted with the microbial communities of the soil samples from the Biocon experiment located at the Cedar Creek Ecosystem Science Reserve in Minnesota (45°N, 93°W). Plots were established in 1997 on a secondary successional grassland on a sandy outwash soil after removing the previous vegetation (41). The main Biocon field experiment has 296 (of a total of 371) evenly distributed plots (2 by 2 m) in six 20-meter-diameter rings, three for an aCO2 concentration of 368 µmol/mol and three for an eCO2 concentration of 560 µmol/mol using a FACE system (42). In this study, 24 plots (12 for aCO2 and 12 for eCO2, all with 16 species and no additional N supply) were used. The experimental analyses of these plots (e.g., soil chemistry, plants, GeoChip hybridization, and data preprocessing) are described elsewhere (29). Since this study focused on the impact of eCO2 on ecosystem functional processes, only the genes involved in the cycling of nutrient such as carbon (C), nitrogen (N), phosphorus (P), and sulfur (S) were used for network analysis. Because we are more interested in the network interactions among different microbial functional groups, only the representative commonly used signature genes for various functional gene groups were selected for the network analysis. In most cases, only those genes detected in half or more than half of the total samples (majority rule) were kept for subsequent network constructions.

Network construction.

Two MENs were constructed. The experimental data used for constructing fMENs were generated by GeoChip-based microarray analysis (29). The GeoChip hybridization intensity data were log transformed before the construction of a Pearson correlation matrix, which is commonly used for constructing gene expression networks (30). Logarithmic transformation improves data statistical properties by stabilizing the variations in signal intensity. The correlation matrix was then converted to a similarity matrix, which measures the degree of concordance between the abundance profiles of genes across different samples by taking the absolute values of the correlation matrix (30, 43). Subsequently, an adjacency matrix, which encodes the connection strength between each pair of nodes, was derived from the similarity matrix by applying an appropriate threshold, st, which was defined using the RMT-based network approach as previously described (16, 17).

Network characterization.

The Cytoscape 2.6.0 (44) software was used to visualize the network graphs. Other information about genes, e.g., taxonomy, relative abundance, and edge information, e.g., weights and positive and negative correlations, was also imported into the software and visualized in the network figures. Since we are interested in the impact of eCO2 on network interactions, the fMENs were constructed separately under aCO2 and eCO2.

Various indexes, including average degree (connectivity) (45), betweenness (46), stress and eigenvector centrality (46), average clustering coefficient (47, 48), vulnerability (49), average geodesic distance (45), geodesic efficiency and harmonic geodesic distance (50), density and transitivity (51), and connectedness (52), were used to describe the properties of individual nodes in the network and the overall topologies or structures of the different networks. In general, the network index, connectivity (ki), is calculated by summing the strengths of the connections (i.e., links) of each gene (i.e., node) with all of the other connected genes in the network. Connectivity represents how strongly a gene is connected to other genes and is one of the most commonly used network indexes. The definitions and calculations of other indexes are provided in Table S1 in the supplemental material. Most calculations were accomplished through the sna and igraph packages in the R project (53, 54). Those overall topological indexes in part II (see Table S1 in the supplemental material) describe the overall network topology from different angles, and thus they are useful in characterizing various fMENs identified under different conditions.

To characterize the modularity property of fMENs, each network was separated into modules, which were usually considered functional units in biological systems (36, 55). Modularity (M) measures the extent to which nodes have more links within their own modules than expected if linkage were random. The modules were detected by the modularity detection method as described previously (56). After scanning all of the branches of the hierarchical tree of a graph, the level with the maximum modularity score was used to separate the graph into multiple dense subgraphs. The modularity of each network (M) was calculated as previously described (56).

Random network construction and network comparison.

Since only a single data point of each overall network index was available for each network parameter, standard statistical analysis could not be performed to assess their statistical significance. Thus, random networks were generated using the Maslov-Sneppen procedure (57). This method keeps the numbers of nodes and links unchanged but rewires all of the links’ positions in the fMENs so the sizes of networks are the same and the random rewired networks are comparable to the original ones. For each network identified in this study, a total of 100 randomly rewired networks were generated and all of the network indexes were calculated individually. Then, the average and standard deviation for each index of all of the random networks were obtained. The statistical Z test was used to test the differences between the indexes of the fMEN and random networks. Meanwhile, for comparisons of the network indexes under different conditions, the Student t test was employed using the standard deviations derived from corresponding random networks.

Gene significance based on a sample trait.

In gene expression network analyses, gene significance (GSil) is the correlation between the expression profile of the ith gene and the lth sample trait, Tl (30). The higher the GSil value, the more biologically significant is gene i with respect to sample trait l. In this study, gene significance was defined as follows:


where xi is the ith gene signal intensities i [set membership] {1, ..., n} and Tl is the lth sample trait (e.g., soil pH, N content, total plant biomass) [l [set membership] {1, ..., q}]. Since the measurement units for different traits vary, all of the trait data were standardized before statistical analysis.

Massive soil and plant trait data from this long-term experimental site are available (20, 29, 41) as described above, and they were used for estimating gene significance. The coefficient of correlation between each gene and each soil or plant variable was calculated across 12 replicate samples under both eCO2 and aCO2, respectively. Thus, the gene significance matrix, GSnxq, was obtained.

Relationships of microbial interaction networks with soil and plant variables.

To discern the relationships between the fMENs and soil properties and plant variables, Mantel tests were performed. Since many unrelated individual variables could mask the signature of any significant variables, a common subset of soil or plant variables were selected for Mantel tests as follows. First, the correlations between the gene significance of the lth soil or plant variable (GSil) and the connectivity of individual genes (ki) were calculated across all of the genes detected. The statistical significances of these correlations were estimated based on P values. Second, all of the soil or plant variables with significant r values (P < 0.1) under either eCO2 or aCO2 were selected and combined as a common set of soil or plant variables for Mantel tests. The following soil variables were selected: the percentages of C or N at depths of 0 to 10, 10 to 20, 20 to 40, and 40 to 60 cm; the proportions of soil moisture at depths of 0 to 17, 42 to 59, and 83 to 100 cm; and soil pH. The following plant variables were also selected: total root biomass, percent belowground C, aboveground total biomass, aboveground N, aboveground C/N ratio, coarse roots 0 to 20 (g/m2), annual root ingrowth fine roots (g/m2), forb biomass, and biomass for four plant species, Amorpha canescens (legume), Andropogon gerardii (C4), Lespedeza capitata (legume), and Sorghastrum nutans (C4). In addition, simple or partial Mantel tests were performed for the connectivity of all of the genes detected and all of the selected soil or plant variables to examine the relationships between network structure (i.e., connectivity) and soil or plant variables. Mantel tests were performed using the programs available in the R vegan package (58).


This work has been partially supported through contract DE-SC0004601 and contract DE-AC02-05CH11231 (as part of ENIGMA, a Scientific Focus Area) by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomics: GTL Foundational Science, the United States Department of Agriculture (project 2007-35319-18305) through the NSF-USDA Microbial Observatories Program, and the Oklahoma Bioenergy Center (OBC).

We all contributed intellectual input and assistance to this study and manuscript preparation. The original concept, framework, and strategies were developed by J.Z. All of the data analyses and network construction were performed by Y.D. with help from F.L., Q.T., X.Z., and Z.H. J.Z. synthesized all of the data and wrote the paper with help from Y.D. and Z.H.


Citation Zhou, J., Y. Deng, F. Luo, Z. He, Q. Tu, et al. 2010. Functional molecular ecological networks. mBio 1(4):e00169-10. doi:10.1128/mBio.00169-10.


TABLE S1 Network topological indexes used in this study.
TABLE S2 Topological comparisons of functional MENs under aCO2 and eCO2.
TABLE S3 Summary of the network complexity of individual functional genes involved in C, N, P, and S cyclings.
TABLE S4 Connectivity of all of the shared nifH genes detected under both aCO2 and eCO2.
FIG S1. Scatter plots showing the scale-free property of the fMENs under both aCO2 and eCO2. The x axis is the node connectivity (k). The y axis is the number of nodes under a given connectivity. The values on both axes were log transformed. Lines and r2 values are the best fit of the data to the model. (A) fMEN under eCO2. (B) fMEN under aCO2.
FIG S2. Modular organization of the fMENs with GeoChip-based metagenomic data. The networks were constructed by the RMT-based approach with the GeoChip data from eCO2 (12 samples) (A) and aCO2 (12 samples) (C). Clear modular architecture was observed in this fMEN. Each node signifies a gene, which could correspond to a microbial population. Colors of the nodes indicate different major functional genes. A blue line indicates a positive interaction between two individual nodes, while a red line indicates a negative interaction. The numbers indicate different modules or submodules determined by the fast greedy modularity optimization method. All of the data showed that the functional MENs have a modular architecture. In addition, the sizes of individual modules are plotted in panels B (eCO2) and D (aCO2).
FIG S3. Effects of eCO2 on network interactions of several key functional gene categories under eCO2 (red) and aCO2 (blue). **, statistically significantly different at P = 0.01 based on the standard deviation derived from random network simulation. No standard deviation can be estimated from a random network for connectivity because the connectivities are identical between the empirical and random networks.
FIG S4. Impact of eCO2 on the network interactions of key functional genes. (A) Network interactions of the top eight functional genes with the highest connectivities under aCO2. (B) Network interactions of the corresponding functional genes under eCO2. The networks were constructed by the RMT-based approach with the GeoChip data. The meanings of some of the symbols are given in the legend to Fig. S2.


1. Montoya J. M., Pimm S. L., Sole R. V. 2006. Ecological networks and their fragility. Nature 442:259–264 [PubMed]
2. Bascompte J., Jordano P., Melian C. J., Olesen J. M. 2003. The nested assembly of plant-animal mutualistic networks. Proc. Natl. Acad. Sci. U. S. A. 100:9383–9387 [PubMed]
3. Bastolla U., Fortuna M. A., Pascual-Garcia A., Ferrera A., Luque B., Bascompte J. 2009. The architecture of mutualistic networks minimizes competition and increases biodiversity. Nature 458:1018–1020 [PubMed]
4. Cattin M. F., Bersier L. F., Banasek-Richter C., Baltensperger R., Gabriel J. P. 2004. Phylogenetic constraints and adaptation explain food-web structure. Nature 427:835–839 [PubMed]
5. Holland M. D., Hastings A. 2008. Strong effect of dispersal network structure on ecological dynamics. Nature 456:792–794 [PubMed]
6. Raes J., Bork P. 2008. Molecular eco-systems biology: towards an understanding of community function. Nat. Rev. Microbiol. 6:693–699 [PubMed]
7. Ruan Q. S., Dutta D., Schwalbach M. S., Steele J. A., Fuhrman J. A., Sun F. Z. 2006. Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinformatics 22:2532–2538 [PubMed]
8. Fuhrman J. A. 2009. Microbial community structure and its functional implications. Nature 459:193–199 [PubMed]
9. Handelsman J., Tiedje J. M., Alvarez-Cohen L., Ashburner M., Cann I. K. O., DeLong E. F., Doolittle W. F., Fraser-Liggett C. M., Godzik A., Gordon J. I., Riley M., Schmid M. B., Reid A. H. 2007. Committee on metagenomics: challenges and functional applications. The National Academies Press, Washington, DC
10. He Z., Gentry T. J., Schadt C. W., Wu L., Liebich J., Chong S. C., Huang Z., Wu W., Gu B., Jardine P., Criddle C., Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J. 1:67–77 [PubMed]
11. Wang F., Zhou H., Meng J., Peng X., Jiang L., Sun P., Zhang C., Van Nostrand J. D., Deng Y., He Z., Wu L., Zhou J., Xiao X. 2009. GeoChip-based analysis of metabolic diversity of microbial communities at the Juan de Fuca Ridge hydrothermal vent. Proc. Natl. Acad. Sci. U. S. A. 106:4840–4845 [PubMed]
12. Zhou J., Kang S., Schadt C. W., Garten C. T., Jr. 2008. Spatial scaling of functional gene diversity across various microbial taxa. Proc. Natl. Acad. Sci. U. S. A. 105:7768–7773 [PubMed]
13. Gardner T. S., Faith J. J. 2005. Reverse-engineering transcription control networks. Phys. Life Rev. 2:65–88 [PubMed]
14. Zhang B., Horvath S. 2005. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4 :Article17 [PubMed]
15. Barabasi A. L., Oltvai Z. N. 2004. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5:101–115 [PubMed]
16. Luo F., Yang Y., Zhong J., Gao H., Khan L., Thompson D. K., Zhou J. 2007. Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory. BMC Bioinform. 8:299 [PMC free article] [PubMed]
17. Luo F., Zhong J. X., Yang Y. F., Scheuermann R. H., Zhou J. Z. 2006. Application of random matrix theory to biological networks. Phys. Lett. A 357:420–423
18. Yang Y., Harris D. P., Luo F., Xiong W., Joachimiak M., Wu L., Dehal P., Jacobsen J., Yang Z., Palumbo A. V., Arkin A. P., Zhou J. 2009. Snapshot of iron response in Shewanella oneidensis by gene network reconstruction. BMC Genomics 10:131 [PMC free article] [PubMed]
19. Zhou A., He Z. , Redding-Johanson A. M., Mukhopadhyay A., Hemme C. L., Joachimiak M., Luo F., Deng Y., Bender K. S., He Q., Keasling J. D., Stahl D. A., Fields M. W., Hazen T. C., Arkin A. P., Wall J. D., Zhou J. May 2010, posting date. Hydrogen peroxide-induced oxidative stress responses in Desulfovibrio vulgaris Hildenborough. Environ. Microbiol. [Epub ahead of print.] doi: 10.1111/j.1462-2920.2010.02234.x [PubMed] [Cross Ref]
20. Reich P. B., Hobbie S. E., Lee T., Ellsworth D. S., West J. B., Tilman D., Knops J. M., Naeem S., Trost J. 2006. Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440:922–925 [PubMed]
21. Fuhrman J. A., Steele J. A. 2008. Community structure of marine bacterioplankton: patterns, networks, and relationships to function. Aquat. Microb. Ecol. 53:69–81
22. Mehta M. L. 1990. Random matrices, 2nd ed Academic Press, Inc., New York, NY.
23. Luo F., Zhong J., Yang Y., Zhou J. 2006. Application of random matrix theory to microarray data for discovering functional gene modules. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 73:031924 [PubMed]
24. Alon U. 2003. Biological networks: the tinkerer as an engineer. Science 301:1866–1867 [PubMed]
25. Barabasi A. L. 2009. Scale-free networks: a decade and beyond. Science 325:412–413 [PubMed]
26. Muller-Linow M., Hilgetag C. C., Hutt M. T. 2008. Organization of excitable dynamics in hierarchical biological networks. PLoS Comput. Biol. 4:e1000190 [PMC free article] [PubMed]
27. Guimera R., Sales-Pardo M., Amaral L. A. 2007. Classes of complex networks defined by role-to-role connectivity profiles. Nat. Phys. 3:63–69 [PMC free article] [PubMed]
28. Kitano H. 2004. Biological robustness. Nat. Rev. Genet. 5:826–837 [PubMed]
29. He Z., Xu M., Deng Y., Kang S., Kellogg L., Wu L., van Nostrand J. D., Hobbie S. E., Reich P., Zhou J. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol. Lett. 13:564–575 [PubMed]
30. Horvath S., Dong J. 2008. Geometric interpretation of gene coexpression network analysis. PLoS Comput. Biol. 4:e1000117 [PMC free article] [PubMed]
31. Fitter A. H., Gilligan C. A., Hollingworth K., Kleczkowski A., Twyman R. M., Pitchford J. W., Programme N. S. B. 2005. Biodiversity and ecosystem function in soil. Funct. Ecol. 19:369–377
32. Curtis D., North B. V., Sham P. C. 2001. Use of an artificial neural network to detect association between a disease and multiple marker genotypes. Ann. Hum. Genet. 65:95–107 [PubMed]
33. Berlow E. L., Dunne J. A., Martinez N. D., Stark P. B., Williams R. J., Brose U. 2009. Simple prediction of interaction strengths in complex food webs. Proc. Natl. Acad. Sci. U. S. A. 106:187–191 [PubMed]
34. Dunne J. A., Williams R. J., Martinez N. D. 2002. Food-web structure and network theory: the role of connectance and size. Proc. Natl. Acad. Sci. U. S. A. 99:12917–12922 [PubMed]
35. Fortuna M. A., Bascompte J. 2006. Habitat loss and the structure of plant-animal mutualistic networks. Ecol. Lett. 9:278–283 [PubMed]
36. Olesen J. M., Bascompte J., Dupont Y. L., Jordano P. 2007. The modularity of pollination networks. Proc. Natl. Acad. Sci. U. S. A. 104:19891–19896 [PubMed]
37. Bascompte J., Stouffer D. B. 2009. The assembly and disassembly of ecological networks. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364:1781–1787 [PMC free article] [PubMed]
38. Tylianakis J. M., Tscharntke T., Lewis O. T. 2007. Habitat modification alters the structure of tropical host-parasitoid food webs. Nature 445:202–205 [PubMed]
39. Heimann M., Reichstein M. 2008. Terrestrial ecosystem carbon dynamics and climate feedbacks. Nature 451:289–292 [PubMed]
40. Gruber N., Galloway J. N. 2008. An Earth-system perspective of the global nitrogen cycle. Nature 451:293–296 [PubMed]
41. Reich P. B., Knops J., Tilman D., Craine J., Ellsworth D., Tjoelker M., Lee T., Wedin D., Naeem S., Bahauddin D., Hendrey G., Jose S., Wrage K., Goth J., Bengston W. 2001. Plant diversity enhances ecosystem responses to elevated CO2 and nitrogen deposition. Nature 410:809–812 [PubMed]
42. Lewin G. R., Mckintosh E., Mcmahon S. B. 1994. Nmda receptors and activity-dependent tuning of the receptive-fields of spinal-cord neurons. Nature 369:482–485 [PubMed]
43. Horvath S., Zhang B., Carlson M., Lu K. V., Zhu S., Felciano R. M., Laurance M. F., Zhao W., Qi S., Chen Z., Lee Y., Scheck A. C., Liau L. M., Wu H., Geschwind D. H., Febbo P. G., Kornblum H. I., Cloughesy T. F., Nelson S. F., Mischel P. S. 2006. Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target. Proc. Natl. Acad. Sci. U. S. A. 103:17402–17407 [PubMed]
44. Cline M. S., Smoot M., Cerami E., Kuchinsky A., Landys N., Workman C., Christmas R., Avila-Campilo I., Creech M., Gross B., Hanspers K., Isserlin R., Kelley R., Killcoyne S., Lotia S., Maere S., Morris J., Ono K., Pavlovic V., Pico A. R., Vailaya A., Wang P. L., Adler A., Conklin B. R., Hood L., Kuiper M., Sander C., Schmulevich I., Schwikowski B., Warner G. J., Ideker T., Bader G. D. 2007. Integration of biological networks and gene expression data using cytoscape. Nat. Protoc. 2:2366–2382 [PubMed]
45. West D. B. 1996. Introduction to graph theory. Prentice Hall, Upper Saddle River, NJ
46. Brandes U., Erlebach T. 2005. Network analysis: methodological foundations. Springer-Verlag, Berlin, Germany
47. Watts D. J., Strogatz S. H. 1998. Collective dynamics of “small-world” networks. Nature 393:440–442 [PubMed]
48. Ravasz E., Somera A. L., Mongru D. A., Oltvai Z. N., Barabasi A. L. 2002. Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555 [PubMed]
49. Costa L. D., Rodrigues F. A., Travieso G., Boas P. R. V. 2007. Characterization of complex networks: a survey of measurements. Adv. Phys. 56:167–242
50. Latora V., Marchiori M. 2001. Efficient behavior of small-world networks. Phys. Rev. Lett. 87:198701 [PubMed]
51. Wasserman S., Faust K. 1994. Social network analysis: methods and applications. Cambridge University Press, Cambridge, United Kingdom
52. Krackhardt D. 1994. Graph theoretical dimensions of informal organizations. Lawrence Erlbaum, Hillsdale, NJ
53. Butts C. T. 2008. Social network analysis with sna. J. Stat. Softw. 24:1–51 [PubMed]
54. Csardi G. 2006. The Igraph library.
55. Newman M. E. J. 2006. Modularity and community structure in networks. Proc. Natl. Acad. Sci. U. S. A. 103:8577–8582 [PubMed]
56. Clauset A., Newman M. E., Moore C. 2004. Finding community structure in very large networks. Phys. Rev. E 70:066111 [PubMed]
57. Maslov S., Sneppen K. 2002. Specificity and stability in topology of protein networks. Science 296:910–913 [PubMed]
58. Dixon P. 2003. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14:927–930

Articles from mBio are provided here courtesy of American Society for Microbiology (ASM)