Network-theoretic methods and concepts are increasingly used for the systems-biologic analysis of microarray data. We illustrate how network concepts can be used to describe large correlation matrices and to arrive at biologically plausible data reduction techniques. Many alternative approaches for defining gene coexpression networks are possible. Here we define the network adjacency and the gene significance measure in terms of correlations, since this allows us to interpret pairwise relations in terms of angles between scaled versions of the variables. For example, the sample trait based gene significance measure of the $i$th gene is determined by the angle between the $i$th gene expression profile and the sample trait $T$ (Equation 4); the scaled intramodular connectivity of the $i$th gene (Equation 33) is determined by the angle between the $i$th gene expression profile and the module eigengene; and the hub gene significance (Equation 34) is determined by the angle between the module eigengene and the sample trait.
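The angle interpretation above can be made concrete with a minimal sketch. It assumes that "scaled" means centered and normalized to unit length, and that the trait-based gene significance is the absolute correlation with the trait (Equation 4 is not reproduced in this section, so this form is an assumption). The data values and function names are hypothetical.

```python
import math

def scale(v):
    """Center a profile and scale it to unit Euclidean length."""
    mean = sum(v) / len(v)
    centered = [t - mean for t in v]
    norm = math.sqrt(sum(t * t for t in centered))  # zero for a constant profile
    return [t / norm for t in centered]

def cor(x, y):
    """Pearson correlation, computed as the dot product of the scaled vectors."""
    return sum(a * b for a, b in zip(scale(x), scale(y)))

def angle_deg(x, y):
    """Angle in degrees between the scaled versions of x and y."""
    c = max(-1.0, min(1.0, cor(x, y)))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(c))

# Hypothetical data: one gene expression profile and a sample trait over 5 arrays.
gene = [2.1, 3.4, 1.8, 4.0, 3.1]
trait = [1.0, 2.0, 1.0, 3.0, 2.0]

# Assumed form of the trait-based gene significance: absolute correlation.
gene_significance = abs(cor(gene, trait))
theta = angle_deg(gene, trait)
```

A small angle `theta` (or one close to 180 degrees) corresponds to a gene significance near 1, and an angle near 90 degrees to a gene significance near 0.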
The geometric interpretation of gene coexpression network analysis reveals a deep connection to other statistical methods. Since it projects the gene expression profiles onto the hypersphere in an $m$-dimensional Euclidean space, network analysis can be considered a special case of directional statistics. When focusing on the use of module eigengenes, network analysis can be considered a variant of oblique factor analysis.
A high-level view of modules and their centroids (eigengenes) can be used to define eigengene networks. High correlations (small angles) between module eigengenes may suggest close relationships between the corresponding pathways. A low-level view of a single module allows us to provide a geometric interpretation of intramodular network concepts. We use the singular value decomposition of module expression data to characterize approximately factorizable gene coexpression networks, i.e., adjacency matrices that satisfy $a_{ij}^{(q)} \approx CF_i^{(q)} CF_j^{(q)}$ for $i \neq j$. We provide an intuitive formula for the conformity: $CF_i^{(q)} \approx |\mathrm{cor}(x_i^{(q)}, E^{(q)})|^{\beta}$. Since the module eigengene $E^{(q)}$ summarizes the overall behavior of the module, the eigengene conformity $|\mathrm{cor}(x_i^{(q)}, E^{(q)})|^{\beta}$ measures how well gene $i$ conforms to the overall module. This insight led us to coin the term “conformity”. Using the singular values, we propose a measure of eigengene factorizability (Equation 24) that is analogous to the proportion of variance explained by the module eigengene (Equation 22). We also provide a geometric interpretation of network factorizability.
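The quantities in this paragraph can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: the eigengene is taken to be the first right singular vector of the scaled module matrix (found here by power iteration), the proportion of variance explained is $d_1^2 / \sum_j d_j^2$, and the eigengene factorizability is assumed to have the fourth-power form $d_1^4 / \sum_j d_j^4$ (Equation 24 is not reproduced in this section). The sketch uses $\beta = 1$, and the data are hypothetical.

```python
import math

def scale(v):
    """Center a profile and scale it to unit Euclidean length."""
    mean = sum(v) / len(v)
    centered = [t - mean for t in v]
    norm = math.sqrt(sum(t * t for t in centered))
    return [t / norm for t in centered]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def first_singular(rows, iters=200):
    """Power iteration on X^T X; returns (first right singular vector, d_1^2)."""
    m = len(rows[0])
    v = list(rows[0])  # start from a gene profile; a constant vector maps to zero
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]                                   # X v
        w = [sum(s * r[k] for s, r in zip(scores, rows)) for k in range(m)]  # X^T X v
        norm = math.sqrt(dot(w, w))
        v = [t / norm for t in w]
    return v, sum(dot(r, v) ** 2 for r in rows)

# Hypothetical module: 4 genes (rows) measured on 6 arrays; gene 4 is anti-correlated.
raw = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.2, 1.9, 3.3, 3.8, 5.1, 6.2],
    [0.8, 2.3, 2.9, 4.4, 4.7, 5.9],
    [6.1, 5.2, 3.9, 3.1, 1.8, 1.1],
]
X = [scale(g) for g in raw]
n = len(X)
E, d1_sq = first_singular(X)  # E is unit length and centered (it lies in the row space)

cor = [[dot(a, b) for b in X] for a in X]         # rows are scaled, so dot = correlation
sum_d2 = float(n)                                 # sum_j d_j^2 = ||X||_F^2 = n
sum_d4 = sum(c * c for row in cor for c in row)   # sum_j d_j^4 = ||X X^T||_F^2

pve = d1_sq / sum_d2        # proportion of variance explained by the eigengene
ef = d1_sq ** 2 / sum_d4    # ASSUMED form of the eigengene factorizability
kme = [abs(dot(x, E)) for x in X]  # |cor(x_i, E)|, the conformity approximation at beta = 1

# Largest deviation of |cor(x_i, x_j)| from the product of the two conformities.
max_err = max(abs(abs(cor[i][j]) - kme[i] * kme[j])
              for i in range(n) for j in range(n) if i != j)
```

For a tightly coexpressed module such as this toy example, `ef` and `pve` are close to 1 and `max_err` is small, which is the sense in which the module network is approximately factorizable.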
The derivation of Observation 1 in the Methods section highlights a theoretical advantage of the soft-thresholding approach (Equation 2): the resulting weighted network maintains the approximate factorizability of the underlying correlation matrix, $a_{ij}^{(q)} \approx |\mathrm{cor}(x_i^{(q)}, E^{(q)})|^{\beta}\,|\mathrm{cor}(x_j^{(q)}, E^{(q)})|^{\beta}$.
Using several gene coexpression networks from mouse tissues, brain cancer, and yeast, we provide empirical evidence that coexpression modules tend to have high eigengene factorizability and that the maximum conformity assumption (Equation 32) is satisfied for low values of the power β.
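The factorizability-preserving property of soft thresholding noted above can be sketched in one short derivation. It assumes the unsigned power adjacency $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$ for Equation 2 and an approximately factorizable absolute correlation matrix; both are assumptions for illustration, since the equations are not reproduced in this section.

```latex
\begin{align*}
a_{ij}^{(q)}
  &= \bigl|\mathrm{cor}(x_i^{(q)}, x_j^{(q)})\bigr|^{\beta} \\
  &\approx \Bigl( \bigl|\mathrm{cor}(x_i^{(q)}, E^{(q)})\bigr| \,
                  \bigl|\mathrm{cor}(x_j^{(q)}, E^{(q)})\bigr| \Bigr)^{\beta} \\
  &= \bigl|\mathrm{cor}(x_i^{(q)}, E^{(q)})\bigr|^{\beta} \,
     \bigl|\mathrm{cor}(x_j^{(q)}, E^{(q)})\bigr|^{\beta}.
\end{align*}
```

The last step works because a power distributes over a product. A hard threshold is a step function and does not: the indicator that a product of two factors exceeds a cutoff is in general not the product of the indicators for each factor, so a product structure in the correlations need not survive dichotomization.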
We propose eigengene-based analogs of network concepts (Equation 30). While network concepts are functions of the adjacency matrix, eigengene-based network concepts are analogous functions of the eigengene conformities $|\mathrm{cor}(x_i^{(q)}, E^{(q)})|^{\beta}$. Algebraically, eigengene-based network concepts are closely related to “approximate conformity based” network concepts, but they allow for a geometric interpretation.
We use the correspondence between intramodular network concepts and their eigengene-based analogs to provide a geometric interpretation of network concepts. Observation 2 states that network concepts in weighted gene coexpression module networks are approximately equal to their eigengene-based analogs. A major theoretical advantage of eigengene-based network concepts is that they reveal simple relationships. To arrive at particularly simple relationships, we make the maximum conformity assumption (Equation 32) for the results presented in the main text. We provide a rough dictionary for translating between gene coexpression network analysis and the singular value decomposition; it applies if the underlying expression data have high eigengene factorizability (say $EF(X^{(q)}) > 0.95$) and if the maximum conformity assumption (Equation 32) is satisfied. However, even if the maximum conformity assumption does not hold, one can still find simple relationships among the network concepts (Equation 49).
The geometric interpretation of gene coexpression networks facilitates the derivation of several results that should be interesting to network theorists. For example, we argue that highly connected intramodular hub genes cannot be intermediate between two distinct coexpression modules. The geometric interpretation is particularly useful when studying gene significance and module significance measures that are based on a microarray sample trait (Equation 4). To study the relationship between connectivity and gene significance, we propose a novel measure of hub gene significance (Equation 13). We find that the hub gene significance of a module network is determined by the angle between the module eigengene and the microarray sample trait (Equation 34). Our geometric interpretation of coexpression networks allows us to describe situations in which a module has low hub gene significance. Our theoretical derivations for relating module significance to hub gene significance (Equation 37) assume a gene significance measure based on a sample trait. Although this important assumption is violated for the gene significance measure (knock-out essentiality) in the yeast network, it is striking that the relationship between hub gene significance and module significance can still be observed in this application.
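The claim about hub genes can be illustrated with a short angle argument. This is a sketch under assumptions, not the authors' derivation: it treats a hub gene as one whose profile has a high absolute correlation with its own module eigengene $E_1$, and it uses the unsigned angle $\theta(u, v) = \arccos|\mathrm{cor}(u, v)| \in [0, \pi/2]$, which satisfies the triangle inequality. The numerical values are hypothetical.

```latex
\begin{align*}
\theta(x, E_2) &\ge \theta(E_1, E_2) - \theta(x, E_1), \\
|\mathrm{cor}(x, E_2)| &\le \cos\bigl(\theta(E_1, E_2) - \theta(x, E_1)\bigr)
  \quad \text{whenever } \theta(E_1, E_2) \ge \theta(x, E_1).
\end{align*}
% Worked example: uncorrelated eigengenes and a hub gene with |cor(x, E_1)| = 0.9.
\begin{align*}
\theta(E_1, E_2) = 90^\circ, \qquad
\theta(x, E_1) = \arccos(0.9) \approx 25.8^\circ
\;\Longrightarrow\;
|\mathrm{cor}(x, E_2)| \le \cos(64.2^\circ) = \sqrt{1 - 0.9^2} \approx 0.44 .
\end{align*}
```

In this example, a gene that is within about 26 degrees of its own eigengene cannot also be close to the eigengene of a second, distinct module.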
We provide a robustness analysis showing that many of our theoretical results apply even if our underlying assumptions are not satisfied (Text S1, Text S2, and Text S3). We find that the correspondence between network concepts and their eigengene-based analogs is often better in weighted networks than in unweighted networks. Further, we find that the results in weighted networks tend to be more robust than those in unweighted networks with regard to changes in the network construction thresholds (the soft threshold β and the hard threshold, respectively). Thus, weighted coexpression networks are preferable to unweighted networks when a geometric interpretation of network concepts is desirable.
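The sensitivity to the construction threshold can be seen in a small sketch. It assumes the unsigned power form $|r|^{\beta}$ for the weighted adjacency and a simple cutoff for the unweighted one. The function names and the cutoff symbol `tau` are hypothetical labels chosen for illustration.

```python
def soft_adjacency(r, beta):
    """Weighted (soft-threshold) adjacency: absolute correlation raised to beta."""
    return abs(r) ** beta

def hard_adjacency(r, tau):
    """Unweighted (hard-threshold) adjacency: 1 if |r| reaches the cutoff, else 0."""
    return 1 if abs(r) >= tau else 0

r = 0.80  # hypothetical correlation between two genes

# Perturb each threshold slightly and record how much the adjacency moves.
soft_change = abs(soft_adjacency(r, 6) - soft_adjacency(r, 7))
hard_change = abs(hard_adjacency(r, 0.79) - hard_adjacency(r, 0.81))
```

Here the weighted adjacency shifts by about 0.05 when β moves from 6 to 7, whereas the unweighted adjacency flips from 1 to 0 when the cutoff moves from 0.79 to 0.81. Pairs whose correlation sits near the cutoff drive the instability of the unweighted network.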
The correspondence between coexpression module networks and the singular value decomposition can break down when a high soft threshold is used for constructing a weighted network or when dealing with an unweighted network. Thus, eigengene-based concepts do not replace network concepts when describing interaction patterns among genes.
While this article has a theoretical bent, we illustrate the results on three different microarray data sets (human, mouse, and yeast) that are described in our online R software tutorials, in Text S1, Text S2, and Text S3. Our theoretical results also apply to networks composed of genes that are highly correlated with a sample trait. The key assumption underlying our results is high eigengene factorizability $EF(X^{(q)})$. To illustrate this point, Text S4 describes a brain cancer network composed of the 500 genes with the highest absolute correlation with brain cancer survival time. Our results illustrate that the geometric interpretation of gene coexpression networks has important theoretical and practical implications that may guide the development and application of network methods.